A database dynamic partial uncompression mechanism determines when to dynamically uncompress one or more compressed portions of a database table that also includes uncompressed portions. A query may include an express term that specifies whether or not to skip compressed portions. In addition, a query may include associated information that specifies whether or not to skip compressed portions, and one or more thresholds that may be used to determine if the system is too busy to perform uncompression. A display mechanism may also determine whether or not to display compressed portions. The uncompression may occur at the database server or at a client. The database dynamic partial uncompression mechanism thus performs dynamic uncompression in a way that preferably uncompresses one or more compressed portions of a partially compressed database table only when the compressed portions satisfy a query and/or need to be displayed.
9. An article of manufacture comprising:
an uncompression mechanism that determines a first portion of a database table is a first compressed portion that satisfies a query to the database table that includes the first compressed portion and at least one uncompressed portion, determines whether the query includes a parameter that specifies to skip compressed portions, and in response to determining the query includes the parameter that specifies to skip compressed portions, displaying a first result set for the query that does not include the first compressed portion, and in response to determining the query does not include the parameter that specifies to skip compressed portions, uncompressing the first compressed portion and displaying a second result set for the query that includes the uncompressed first portion;
a display mechanism that displays a result set that includes compressed data and determines when and how to display the compressed data in the result set; and
non-transitory computer-readable media bearing the uncompression mechanism.
1. An apparatus comprising:
at least one processor;
a memory coupled to the at least one processor;
a database table residing in the memory that includes at least one portion that is compressed and at least one portion that is uncompressed;
an uncompression mechanism residing in the memory and executed by the at least one processor that determines a first portion of the database table is a first compressed portion that satisfies the query, determines whether the query includes a parameter that specifies to skip compressed portions, and in response to determining the query includes the parameter that specifies to skip compressed portions, displaying a first result set for the query that does not include the first compressed portion, and in response to determining the query does not include the parameter that specifies to skip compressed portions, uncompressing the first compressed portion and displaying a second result set for the query that includes the uncompressed first portion; and
a display mechanism residing in the memory and executed by the at least one processor, the display mechanism displaying a result set that includes compressed data and determining when and how to display the compressed data in the result set.
5. An apparatus comprising:
at least one processor;
a memory coupled to the at least one processor;
a database table residing in the memory that includes at least one portion that is compressed and at least one portion that is uncompressed;
an uncompression mechanism residing in the memory and executed by the at least one processor that performs the steps of:
executing a query to the database table;
determining a first portion of the database table is a first compressed portion that satisfies the query;
determining whether the query includes a parameter that specifies to skip compressed portions;
in response to determining the query does not include the parameter that specifies to skip the compressed portions, uncompressing the first compressed portion in response to the first compressed portion satisfies the query;
a display mechanism residing in the memory and executed by the at least one processor that performs the steps of:
displaying a result set for the query;
in response to the result set for the query includes compressed data, performing the steps of:
in response to the query includes the parameter that specifies to skip compressed portions, not including the first compressed portion in the displayed result set; and
in response to the query does not include the parameter that specifies to skip compressed portions, including the uncompressed first portion in the displayed result set.
2. The apparatus of
3. The apparatus of
4. The apparatus of
6. The apparatus of
7. The apparatus of
8. The apparatus of
10. The article of manufacture of
11. The article of manufacture of
12. The article of manufacture of
This patent application is a continuation of U.S. Ser. No. 11/834,840 filed on Aug. 7, 2007, which is incorporated herein by reference.
1. Technical Field
This disclosure generally relates to computer systems, and more specifically relates to database systems.
2. Background Art
Database systems have been developed that allow a computer to store a large amount of information in a way that allows a user to search for and retrieve specific information in the database. For example, an insurance company may have a database that includes all of its policy holders and their current account information, including payment history, premium amount, policy number, policy type, exclusions to coverage, etc. A database system allows the insurance company to retrieve the account information for a single policy holder among the thousands and perhaps millions of policy holders in its database. Retrieval of information from a database is typically done using queries. A database query typically includes one or more predicate expressions interconnected with logical operators.
Database compression has been known for some time as a way to reduce the size of a table that is not often used. In the prior art, if compression is performed, it is performed on an entire database table. Once a table is compressed, it cannot be queried until it is uncompressed. If the data in the table is then needed, the entire table must be uncompressed, then a query may be executed to access data in the table. The cost in processor overhead of compressing and uncompressing a database table can be significant, especially for large tables. For this reason, compression/uncompression schemes have typically been limited to applications when the likelihood of needing data that has been compressed is low.
The first related application referenced above provides a way to partially compress a portion of a database table without compressing all of the database table. Portions that may be compressed include columns, parts of columns, and rows. When a database table has one or more compressed portions, the issue now arises regarding how to deal with the compressed portions. For example, one suitable way to handle compressed portions would be to uncompress a compressed portion when the portion is touched by a query. Note, however, that a query may cause a table scan to be performed that touches all rows even though most of the rows do not satisfy the query. Uncompressing on first touch in this manner may cause uncompression of portions of the table that are not needed. Without a way to perform dynamic uncompression of portions of a partially compressed database table in an intelligent manner, the partial compression taught in the first related application will have limited value.
A database dynamic partial uncompression mechanism determines when to dynamically uncompress one or more compressed portions of a database table that also includes uncompressed portions. A query may include an express term that specifies whether or not to skip compressed portions. In addition, a query may include associated information that specifies whether or not to skip compressed portions, and one or more thresholds that may be used to determine if the system is too busy to perform uncompression. A display mechanism may also determine whether or not to display compressed portions. The uncompression may occur at the database server or at a client. The database dynamic partial uncompression mechanism thus performs dynamic uncompression in a way that preferably uncompresses one or more compressed portions of a partially compressed database table only when the compressed portions satisfy a query and/or need to be displayed.
The foregoing and other features and advantages will be apparent from the following more particular description, as illustrated in the accompanying drawings.
The disclosure will be described in conjunction with the appended drawings, where like designations denote like elements, and:
The claims and disclosure herein provide a way to dynamically uncompress and display one or more portions of a database table that has one or more portions compressed while other portions of the database table are uncompressed. In one implementation, a parameter may be specified in a query that causes compressed portions to be skipped when the query is executed. Information associated with a query may include a flag or other information that specifies whether or not to include compressed portions of the database table in the result set and where to perform the uncompression. In addition, a display mechanism may determine whether and how to display compressed portions of a database table that are in a result set for a query.
Referring to
Main memory 120 preferably contains data 121, an operating system 122, a database 123, a query 124, a result set 125 for the query 124, and a database dynamic partial uncompression mechanism 126. Data 121 represents any data that serves as input to or output from any program in computer system 100. Operating system 122 is a multitasking operating system known in the industry as i5/OS; however, those skilled in the art will appreciate that the spirit and scope of this disclosure is not limited to any one operating system. Database 123 is any suitable database, whether currently known or developed in the future. Database 123 preferably includes one or more tables that have one or more compressed portions and one or more uncompressed portions. Query 124 is any suitable database query, including an SQL query. The result set for the query 125 includes the results returned from executing query 124.
The database dynamic partial uncompression mechanism 126 performs partial uncompression of portions of a database table according to specified uncompression information 127 and according to the results of an uncompression cost estimator 128, and displays the result set 125 for query 124 using a result set display mechanism 129. The particular method used by the database dynamic partial uncompression mechanism for uncompressing compressed portions of the database depends on the method used for compressing those portions, and preferably restores the data to the state it was in before it was compressed. The uncompression information 127 may include information specified in a query, information associated with a query but not specified in the query, information that determines how rows in a result set are displayed, and information that determines where the uncompression is performed. The uncompression cost estimator 128 is used to estimate the cost of uncompression so an intelligent decision may be made regarding whether uncompression is desirable. For example, if utilization of the processor 110 exceeds a predetermined threshold, the uncompression cost estimator 128 could decide not to do uncompression because the processor is too busy. If IO count exceeds a predetermined threshold, the uncompression cost estimator 128 could decide not to do uncompression because the IO count is too high. If memory utilization exceeds a predetermined threshold, the uncompression cost estimator 128 could decide not to do uncompression because the memory utilization is too high. The result set display mechanism 129 determines whether compressed portions in a result set are displayed, and if so, how they are displayed.
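The threshold checks described for the uncompression cost estimator 128 can be illustrated with a minimal Python sketch. The names `Thresholds`, `SystemLoad`, and `uncompression_allowed` are hypothetical and do not appear in the disclosure; the sketch also assumes that current load figures are supplied by the caller rather than measured, and that a value equal to a threshold does not exceed it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Hypothetical container for the predetermined thresholds."""
    cpu_percent: float     # maximum processor utilization, in percent
    io_count: int          # maximum IO count
    memory_percent: float  # maximum memory utilization, in percent


@dataclass(frozen=True)
class SystemLoad:
    """Hypothetical snapshot of current load, supplied by the caller."""
    cpu_percent: float
    io_count: int
    memory_percent: float


def uncompression_allowed(load: SystemLoad, limits: Thresholds):
    """Return (allowed, reason); uncompression is declined when any
    current value exceeds its predetermined threshold."""
    if load.cpu_percent > limits.cpu_percent:
        return False, "processor too busy"
    if load.io_count > limits.io_count:
        return False, "IO count too high"
    if load.memory_percent > limits.memory_percent:
        return False, "memory utilization too high"
    return True, "ok"
```

In this sketch the checks are evaluated in a fixed order and the first exceeded threshold determines the reported reason; an actual estimator could weigh the factors differently.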
Computer system 100 utilizes well known virtual addressing mechanisms that allow the programs of computer system 100 to behave as if they only have access to a large, single storage entity instead of access to multiple, smaller storage entities such as main memory 120 and DASD device 155. Therefore, while data 121, operating system 122, database 123, query 124, result set for query 125, and database dynamic partial uncompression mechanism 126 are shown to reside in main memory 120, those skilled in the art will recognize that these items are not necessarily all completely contained in main memory 120 at the same time. It should also be noted that the term “memory” is used herein generically to refer to the entire virtual memory of computer system 100, and may include the virtual memory of other computer systems coupled to computer system 100.
Processor 110 may be constructed from one or more microprocessors and/or integrated circuits. Processor 110 executes program instructions stored in main memory 120. Main memory 120 stores programs and data that processor 110 may access. When computer system 100 starts up, processor 110 initially executes the program instructions that make up operating system 122.
Although computer system 100 is shown to contain only a single processor and a single system bus, those skilled in the art will appreciate that a database dynamic partial uncompression mechanism may be practiced using a computer system that has multiple processors and/or multiple buses. In addition, the interfaces that are used preferably each include separate, fully programmed microprocessors that are used to off-load compute-intensive processing from processor 110. However, those skilled in the art will appreciate that these functions may be performed using I/O adapters as well.
Display interface 140 is used to directly connect one or more displays 165 to computer system 100. These displays 165, which may be non-intelligent (i.e., dumb) terminals or fully programmable workstations, are used to provide system administrators and users the ability to communicate with computer system 100. Note, however, that while display interface 140 is provided to support communication with one or more displays 165, computer system 100 does not necessarily require a display 165, because all needed interaction with users and other processes may occur via network interface 150.
Network interface 150 is used to connect computer system 100 to other computer systems or workstations 175 via network 170. Network interface 150 broadly represents any suitable way to interconnect electronic devices, regardless of whether the network 170 comprises present-day analog and/or digital techniques or via some networking mechanism of the future. In addition, many different network protocols can be used to implement a network. These protocols are specialized computer programs that allow computers to communicate across a network. TCP/IP (Transmission Control Protocol/Internet Protocol) is an example of a suitable network protocol.
At this point, it is important to note that while the description above is in the context of a fully functional computer system, those skilled in the art will appreciate that the database dynamic partial uncompression mechanism may be distributed as an article of manufacture in a variety of forms, and the claims extend to all suitable types of computer-readable media that bear instructions that may be executed by a computer. Examples of suitable computer-readable media include recordable media such as floppy disks and CD-RW (e.g., 195 of
The database dynamic partial uncompression mechanism may also be delivered as part of a service engagement with a client corporation, nonprofit organization, government entity, internal organizational structure, or the like. This may include configuring a computer system to perform some or all of the methods described herein, and deploying software, hardware, and web services that implement some or all of the methods described herein. This may also include analyzing the client's operations, creating recommendations responsive to the analysis, building systems that implement portions of the recommendations, integrating the systems into existing processes and infrastructure, metering use of the systems, allocating expenses to users of the systems, and billing for use of the systems.
Referring to
Referring to
The first related application referenced above discloses a way to compress one or more portions of a database table without compressing all portions of the database table. Methods disclosed in the first related application include methods 400 and 500 in
Method 500 in
A simple example is now provided to illustrate the concepts discussed in general terms above. Referring to
Referring to
The Display Uncompressed field 714 specifies whether compressed portions in the query's result set should be displayed uncompressed. The CPU % field 716 specifies a predetermined threshold for processor utilization that may be used in determining whether or not to uncompress one or more compressed portions in a result set. If the current processor utilization exceeds the predetermined threshold, dynamic uncompression may not be done because the processor is too busy. The IO Count field 718 specifies a predetermined threshold for IO Count that may be used in determining whether or not to uncompress one or more compressed portions in a result set. If the current IO count exceeds the predetermined threshold, dynamic uncompression may not be done because the IO count is too high. The Memory Used field 720 specifies a predetermined threshold for memory usage that may be used in determining whether or not to uncompress one or more compressed portions in a result set. If memory usage is above the predetermined threshold in the Memory Used field 720, dynamic uncompression may not be done because the memory utilization is too high. The First Touch field 722 defines a flag that specifies whether or not compressed portions are uncompressed at first touch. When the First Touch flag is set, compressed portions are uncompressed at first touch. When the First Touch flag is cleared, compressed portions are not necessarily uncompressed at first touch. The At Client field 724 defines a flag that specifies where to perform the dynamic uncompression. When the At Client flag 724 is set, dynamic uncompression is done at the client instead of at the database server where the table resides. When the At Client flag 724 is cleared, dynamic uncompression is not necessarily done at the client, but can be done at the database server as well.
The At Client flag 724 allows off-loading the dynamic uncompression to client computer systems that need the data, thereby freeing up the database server from performing uncompression tasks. When both the First Touch flag and the At Client flag are set, uncompression will be performed on the client computer system on first touch of a compressed portion from the application perspective. Uncompressing upon first touch from an application perspective prevents uncompressing rows that are touched only by the database manager (e.g., in performing a table scan) and uncompresses only those compressed portions that are actually touched by the application.
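The combination of the First Touch and At Client flags can be illustrated with a minimal Python sketch of client-side uncompression on first application touch. The `LazyCell` class and `table_scan` function are hypothetical names, and zlib is assumed as the compression method purely for illustration; the disclosure does not name one.

```python
import zlib


class LazyCell:
    """Hypothetical client-side holder for one compressed portion.

    The value is uncompressed only when the application first reads it,
    and the uncompressed value is then kept for later reads.
    """

    def __init__(self, compressed: bytes):
        self._compressed = compressed
        self._value = None
        self.uncompress_count = 0  # how many times uncompression actually ran

    @property
    def value(self) -> str:
        if self._value is None:
            # First touch from the application perspective.
            self._value = zlib.decompress(self._compressed).decode("utf-8")
            self.uncompress_count += 1
        return self._value


def table_scan(cells):
    """Stand-in for a database manager scan: it visits every cell but
    never reads a value, so nothing is uncompressed."""
    return [cell for cell in cells]
```

A short usage example under the same assumptions:

```python
cells = [LazyCell(zlib.compress(text.encode("utf-8"))) for text in ("alpha", "beta")]
scanned = table_scan(cells)   # touches both cells, uncompresses neither
first = scanned[0].value      # the application touches only the first cell
```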
The Secondary flag 712, Display Uncompressed flag 714, CPU % threshold 716, IO Count threshold 718, and Memory Used threshold 720 may all be used by the uncompression cost estimator 128 to determine whether the cost of uncompression is such that uncompression is warranted. If the cost of uncompression is not too high, uncompression may be performed on one or more compressed portions of a database table. In the most preferred implementation, only those compressed portions of a database table that satisfy a query are uncompressed. However, the disclosure and claims herein expressly extend to uncompressing any and all compressed portions of a database table, regardless of whether the compressed portions satisfy a query.
Each of the fields 702-724 shown in
When a portion of a database table has been compressed, various heuristics may be used to determine whether the compressed portion needs to be uncompressed. However, compressing rows that are rarely accessed may give rise to performance penalties if a query performs a full table scan, which touches all rows even though many of the rows are not used. If each row had to be uncompressed for each table scan, the uncompressing of rows could result in substantial overhead that may negate the benefit of performing the compression. This problem may be avoided by adding to the SQL syntax the ability to skip compressed rows or to uncompress compressed rows. This could be done, for example, by defining a “skipCompressed” parameter that could be specified in an SQL query, as shown in
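The effect of a skip-compressed parameter can be illustrated with a minimal Python sketch. It does not reproduce the SQL syntax of the disclosure; instead, a hypothetical `skip_compressed` keyword argument stands in for the “skipCompressed” parameter. The `Row` type and `run_query` function are also hypothetical, the predicate is assumed to apply to an uncompressed key column, and zlib is assumed as the compression method.

```python
import zlib
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Row:
    """Hypothetical row: the key is always stored uncompressed, while the
    payload holds UTF-8 bytes that are zlib-compressed when the flag is set."""
    key: int
    payload: bytes
    compressed: bool = False


def run_query(rows: List[Row],
              predicate: Callable[[int], bool],
              skip_compressed: bool = False) -> List[Tuple[int, str]]:
    """Scan all rows, but uncompress only compressed rows that satisfy
    the predicate, and only when skip_compressed is not requested."""
    result = []
    for row in rows:
        if not predicate(row.key):
            continue  # touched by the scan, never uncompressed
        if row.compressed:
            if skip_compressed:
                continue  # omit the compressed portion from the result set
            text = zlib.decompress(row.payload).decode("utf-8")
        else:
            text = row.payload.decode("utf-8")
        result.append((row.key, text))
    return result
```

With this arrangement a full table scan pays the uncompression cost only for compressed rows that actually satisfy the query, and pays none at all when the caller asks to skip compressed rows.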
Referring to
Referring to
Referring to
The result set display mechanism 129 determines how to display a result set. As shown in table 600 in
Notice this is the same query in
Now let's assume the Display Uncompressed flag 714 is cleared, but a previous query execution caused the first transaction shown in
The uncompression mechanism disclosed and claimed herein allows one or more compressed portions of a database table that includes both compressed portions and uncompressed portions to be dynamically uncompressed. Dynamic uncompression may be performed based on information relating to uncompression that is specified by a user. This information may include information that specifies to skip compressed rows, that specifies various thresholds for determining whether or not to do dynamic uncompression, and that specifies where to perform the dynamic uncompression. In addition, a result set display mechanism determines how to display a result set when the result set may contain one or more compressed portions.
One skilled in the art will appreciate that many variations are possible within the scope of the claims. Thus, while the disclosure is particularly shown and described above, it will be understood by those skilled in the art that these and other changes in form and details may be made therein without departing from the spirit and scope of the claims.
Santosuosso, John M., Barsness, Eric L.
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Mar 08 2013 | International Business Machines Corporation | (assignment on the face of the patent) | / | |||
Nov 01 2013 | BARSNESS, ERIC L | International Business Machines Corporation | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 031534 | /0618 | |
Nov 01 2013 | SANTOSUOSSO, JOHN M | International Business Machines Corporation | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 031534 | /0618 |
Date | Maintenance Fee Events |
Mar 19 2018 | REM: Maintenance Fee Reminder Mailed. |
Sep 10 2018 | EXP: Patent Expired for Failure to Pay Maintenance Fees. |
Date | Maintenance Schedule |
Aug 05 2017 | 4 years fee payment window open |
Feb 05 2018 | 6 months grace period start (w surcharge) |
Aug 05 2018 | patent expiry (for year 4) |
Aug 05 2020 | 2 years to revive unintentionally abandoned end. (for year 4) |
Aug 05 2021 | 8 years fee payment window open |
Feb 05 2022 | 6 months grace period start (w surcharge) |
Aug 05 2022 | patent expiry (for year 8) |
Aug 05 2024 | 2 years to revive unintentionally abandoned end. (for year 8) |
Aug 05 2025 | 12 years fee payment window open |
Feb 05 2026 | 6 months grace period start (w surcharge) |
Aug 05 2026 | patent expiry (for year 12) |
Aug 05 2028 | 2 years to revive unintentionally abandoned end. (for year 12) |