A system and method for improved query processing in a database are provided, where different methods of performing the same task can be adaptively chosen. The system can then adapt to different and changing environments, resulting in improved and more stable performance.
|
18. A method for selecting a primitive to be used in a query of a database, the method comprising:
storing a plurality of flavors for each primitive function, wherein each flavor of each primitive function is a primitive function using a manual loop unrolling and a primitive function that does not use the manual loop unrolling;
exploring, during an explore period, a flavor of a primitive function to select as an active flavor, the primitive function having one or more instructions configured to be executed by a query execution component; and
selecting, during an exploitation phase following the explore period, an active flavor for the primitive function based on data about an execution of one or more queries.
1. A data processing system, comprising:
a computer having a processor;
a database engine having a query execution component on the computer that can adapt an execution of a plurality of queries, wherein each query has one or more primitive functions that are used to evaluate the query, each primitive function having one or more instructions configured to be executed by the query execution component;
the database engine storing a plurality of flavors for each primitive function, wherein each flavor of each primitive function is a primitive function using a manual loop unrolling and a primitive function that does not use the manual loop unrolling;
an expression evaluator on the computer that evaluates each expression in each query; and
a primitive optimizer on the computer that selects a flavor of a primitive function for each primitive function-used in each query.
2. The system of
3. The system of
4. The system of
5. The system of
6. The system of
7. The system of
8. The system of
9. The system of
10. The system of
11. The system of
12. The system of
13. The system of
14. The system of
15. The system of
16. The system of
17. The system of
19. The method of
20. The method of
21. The method of
22. The method of
|
The disclosure generally relates to a database management system and in particular to a method and system for improving data processing performance by means of adaptive selection of a processing method (“Micro Adaptivity”).
A Database Management System (DBMS) is a software package that provides a reliable way to store data in a database and process it. One of the main requirements of a DBMS is to provide a coherent interface that allows application developers to manipulate data stored in a database.
The architecture of most DBMSs follows a multilayer approach, where each layer is responsible for another phase of processing. For example, typical DBMS layers may include:
It is desirable to be able to increase the efficiency and robustness of the query executor which results in overall performance improvement and reduced development time.
To demonstrate the need for a more efficient and robust query executor, an example is provided below.
For a simple SQL query, e.g. “SELECT*FROM some_table WHERE some_attribute >7”, the key operation is the selection (filtering) of tuples that match the “some_attribute >7” predicate. Internally, in database systems, such a selection is implemented using a Primitive function.
The primitive function may be implemented so that it consumes the input column as the “input” array, the constant that is compared against as the “val” column, and it produces an array “result” containing the indices of tuples from “input” which match the predicate, returning the number of such tuples as “found”. A simple implementation of such a function could look like:
for (i=0, found=0; i<n; i++)
return found;
The above implementation uses what is known as control-dependency—depending on the output of a comparison, a different code path is taken. This behavior is called branching. Modern CPUs have branch-predictor functionality, where they try to guess what will be the path taken in a branch to optimize performance. The problem with branch prediction is that, if the guess incorrectly, additional cost is paid (see “Balancing Vectorized Query Execution with Bandwidth-Optimized Storage”; Marcin Zukowski; PhD Thesis, University of Amsterdam, Amsterdam, The Netherlands, 2009, Section 5.2.4.2 or “Micro Adaptivity in a Vectorized Database System”; Bogdan Raducanu; MSc Thesis; Vrije Universiteit Amsterdam, 2012, Section 1.1.2.)
Another implementation of the primitive function may be:
for (i=0, found=0; i<n; i++) {result [found]=i;
}
return found;
This implementation uses what is known as data-dependency. Specifically, for each value in the input, the same code is performed. The output of a comparison influences the “found” variable, but does not change the code path taken. Note, that more instructions are executed, as for every input there's some action done, but there is no branching, so the performance is less sensitive to input.
The performance of these two functions varies greatly depending on the input characteristics as shown in
Since inside the same system various selectivities can be observed (even within the same query) none of these functions would provide an optimal performance overall. It is desirable to provide a method and system that makes query execution not only resistant to, but also benefiting from such performance differences, allowing easy integration with a modern DBMS and it is to this end that the disclosure is directed.
The disclosure is particularly applicable to a Micro Adaptivity system implemented in a Vectorwise database system and it is in this context that the disclosure will be described. It will be appreciated, however, that the Micro Adaptivity system and method has greater utility since the system and method can be used with any database system. The system and method has a potential of improving query execution efficiency in different database systems, translating to overall system performance. Before describing the details of Micro Adaptivity, an overview of an example of a database system on which the Micro Adaptivity may be implemented is described.
The above database system, including the database engine 52, may implement a Micro Adaptivity system and method. The Micro Adaptivity system and method may be implemented in hardware and interact with a database server or DBMS, but may also be implemented in software and be part of the database engine 52 or database management system (DBMS) that executes on the database servers of the database system.
In systems built on designs like the query execution layer 70, the relational query, typically provided by the user in SQL, is internally represented as a tree of Operator objects. Each Operator represents a logic specific for a given relational operation (e.g. selection, aggregation or join). This logic is usually the same regardless of the data types used in a particular query, or even details of the operations performed—for example, Select operator logic is the same if the predicate used is “WHERE x>7” or “WHERE y SIMILAR TO z”. The query-specific data-processing operations are expressed with Primitive functions. For example, one can have a Primitive function selecting values larger than 7 in a column of floating-point numbers, another that adds two integer attributes, another which extracts a day of week from a date, etc.
A Primitive can provide one of many different functions including selection, computation. A primitive may also implement functions internal to the Operator logic, for example hash value computation, finding an element in a hash table, updating an aggregate etc. As a result, the total number of Primitives in the system can be very high (e.g. thousands of functions). Also, it is typical for Primitive objects to consume the majority of the query processing time. As such, it is important to have simple and automatic ways to guarantee good performance of these functions.
The system may also have specialized primitive objects that are generated for each combination of an operation type and input data types. For example, different Primitive functions will be used to compute “WHERE shipdate <‘1998-09-02’ depending if “shipdate” is an attribute of type “date”, “timestamp” etc.
In the embodiment described herein, each Primitive is identified with a “Primitive Signature” representing the above parameter combination. In
In most database systems, a set of default Primitives are created during the system compilation time, and registered in a “Primitive Dictionary” component. The primitive dictionary provides the function of mapping Primitive Signatures onto the matching Primitive implementation. It is important to note that within a single query plan (a set of queries to be performed in a particular order) there are multiple Primitive functions used and a single Primitive function can be used in multiple parts of the query. Each occurrence of a Primitive in a query plan may be known as a “Primitive Instance”.
The efficiency of the Primitives (how efficiently the primitive can be executed, how much processing time is needed to execute the primitive, etc.) described above depends on the algorithm chosen to implement each primitive and the way the code was compiled. In addition, the efficiency of the primitives also may be influenced by the environment: hardware, data distributions, query parameters, concurrent query workload, and the interactions between these elements. The high complexity of computer systems, with their complex cache hierarchies, out-of-order execution capabilities and constraints, SIMD instruction support etc. combined with the dynamic aspects of the environments where the primitives are applied, make it impossible to correctly choose one optimal implementation for a particular primitive even for a known workload.
The problem of multiple possible kinds of Primitives for a given Primitive Signature is addressed by Micro Adaptivity system, which allows multiple Primitive implementations. The microadaptivity system improves the raw performance as well as performance stability in the query-execution layer of the database system. Any improvement in this layer directly translates to query performance improvements.
The micro adaptivity system and method adds several elements to a standard database system as shown in
Primitive Flavors: Instead of having a single Primitive for each Primitive Signature, the system now stores multiple versions of a Primitive for each Primitive Signature. These different versions we call “Primitive Flavors” 84.
Primitive Flavor Libraries: Different Primitive Flavors for the same Primitive Signature can be created using different methods, including: different compilers; different compiler options; different algorithmic implementations. Collections of Primitive Flavors can be compiled together into separate object files known as Primitive Flavor Libraries 86. The system is extended with the ability to link with or dynamically load multiple Primitive Flavor Libraries. Each Primitive Flavor Library registers its Primitive Flavors, under their respective Primitive Signatures, in the Primitive Dictionary, providing additional meta-information (e.g. the source of the Primitive Flavor) useful for e.g. performance profiling or debugging.
Primitive Dictionary Repository 82: This component is extended to be able to store multiple Primitive Flavors for each Primitive Signature. Primitive Flavor Libraries register multiple Primitive Flavors in it. When asked for a particular Primitive Signature, depending on the request type, it can now return either a single (default) Primitive, or a collection of Primitive Flavors, all implementing the functionality of the same Primitive. Primitive Flavors are kept using additional meta-information, which might include: flavor source meta-information (e.g. code version, compiler used, etc), number of times it was used and some performance characteristics, both delivered with primitive as well as gathered during runtime. Primitive Dictionary provides a registration mechanism through which a software component can register and load additional Primitives and Primitive Flavors dynamically on startup or even while the system is active.
Primitive Generation Framework: To generate multiple Primitive Flavors and Primitive Flavor Libraries, a Primitive Generation Framework 85 is used in the microadaptivity system as shown in
Combination of these dimensions can result in very many Primitive Flavors for the same Primitive. Primitive Generation Framework also provides facility to find identical Primitive Flavors in different Primitive Flavor Libraries and annotate them as such or eliminate them, to reduce the number of Primitive Flavors considered during the Query Execution phase.
Query Execution: In the query execution component 79, for each Primitive Instance, the system observes the performance of different Primitive Flavors, and chooses the one it thinks to be optimal at a given moment that may be known as an Active Flavor. In that flavor-optimization phase, a few optimization techniques may be used: exploitation—once an Active Flavor is detected, it is being used for some time, to avoid frequent re-optimization, which carries some cost; exploration—periodically switching to currently inactive flavors to see if their performance might be now better than the Active Flavor; degradation detection—when the performance of the Active Flavor suddenly deteriorates, the system can choose to look at other possibilities. In the implementation in a Vectorwise system, the above logic can be encapsulated in the expression evaluation code, making all changes local and transparent to other parts of the system.
The microadaptivity system may have different kinds of Primitive-generation aspects where different Primitive Flavors with different performance characteristics are possible. The below list is for example only and not exhaustive.
Compiler Variations (Different Compilers and/or Compiler Options
One of the easiest ways to obtain different flavors is to compile the primitive with different compilers and switches.
Branch vs. No-Branch
An example of context-dependent performance are branching and non-branching implementations of Selection primitives. The branching primitives use the if statements to test a predicate while the non-branching primitives use logical operators and index arithmetic to completely remove any branching.
The selection primitive in Listing 1 above accepts as arguments a vector col of ints and its size n, a constant val, and a vector res where to store the result. It produces a selection vector with the indices of the elements in the input vector which have a value strictly less than the constant value. The selection vector is then passed to other primitives. The Branching implementation in Listing 1 uses a branch while the primitive shown in Listing 2 below is branchfree (No-Branching). These implementations are functionally equivalent: they always produce the same result.
The No-Branching implementation is Listing 2 always performs the same number of operations, while with Branching, this depends on the data. If the data is such that the branch is almost never taken, then the Branching implementation will do less work, as it avoids executing the code that generates a result. What is the fastest implementation depends on the data
Loop Fission
Certain implementation of a database management system, such as Vectorwise, uses bloom filters to accelerate hash-table lookups in situations where the key is often not found. Using a bloom filter is faster because hash-table lookup is CPU-intensive and the hash-table may not be in the CPU cache—whereas a bloom filter might fit as it is much smaller, and lookup is quick and simple. Still, the bloom filter may not fit the CPU cache, and the lookup primitive displayed in Listing 5 below then incurs a cache miss in the bf_get( ) call.
Note, that this is a Selection primitive, identifying tuples surviving the bloom filter check, and uses a No-Branching code style.
The loop fussion optimization of this bloom filter check is displayed in Listing 6 above. Rather than identifying the selected tuples inside the same loop, it just collects the Boolean result of bf_get( ) in a temporary array, and then selects from there in a separate loop. The idea behind this loop fussion variant is that it removes all dependencies between iterations of the first loop.
The loop fussion variant, when it sustains a cache miss in bf_get( ), allows the CPU to continue executing the next loop iteration(s), leveraging the large out-of-order execution capabilities of modern CPUs (>100 instructions). This way the CPU will get multiple (up to 5, on Ivy Bridge) loop iterations in execution at any time, leading to 5 concurrent outstanding cache misses, maximizing memory bandwidth utilization. In contrast, the non-ssion variant causes the iterations to wait on each other due to the loop-iteration dependency, thus achieves less concurrent cache misses and therefore lower memory throughput.
Micro-benchmarks were performed in which the number of unique keys was varied from 212 to 227, which required bloom filters with sizes from 4 KB to 131072 KB. The testing revealed that, for large bloom lters, fussion performs better, indeed sometimes 50% faster; whereas for small bloom filters where there are no cache misses fussion can be slower, sometimes by 15%.
Hand-Unrolling
The primitives in the database engine may be template-generated. The templates are used to provide type-specific versions of the same function (e.g. multiplication of integers, of doubles, of short integers, etc.). Additionally, these template macros instantiate all possible parameter combinations of vector vs. constant (e.g multiply values from two vectors, a vector with a constant or a constant with a vector). Template macros insert a body action, such as the multiplication of values, in a loop over the input vector(s).
By changing these templates, as in Listing 7 above, a user manually introduced a well known loop-unrolling optimization into the primitives.
Query Execution Logic Details
With multiple Primitive Flavors available from the Primitive Dictionary for each Primitive Signature, the Query Execution component has to be able to make the right decision on which of the Primitive Flavors to choose for the actual data processing.
In one implementation, the system may track of the performance of each Primitive Flavor for each Primitive Instance. The typical metric used is the number of CPU clocks for each processed record. Thus, for each Primitive Flavor, the time of the last N calls is stored and the average of it is used as the expected cost in the next call.
To choose the Primitive Flavor, a self-learning Primitive Choice Process may be used. In one embodiment, presented in
The process above was shown to choose the optimal flavor with high accuracy and low overhead. The process may be optionally extended with a special initial exploration phase, during which all known flavors are first tested one after another on the input data—this provide complete information for the follow up exploration and exploitation phases.
For example, the process to select/optimize the primitive may be implemented as follows:
In addition to the above described primitive selection, the microadaptivity system may have a data recorder component that record data during the execution of each query. The recorded data may include statistics about the data being processed (e.g. selectivity, cardinality). The recorded data may also include performance of executed function implementations (e.g. CPU cycles spent, number of cache misses). The recorded data may be made persistent, such as being stored in a persistent memory or storage device) so that the recorded data may be reloaded when the system starts. The recorded data may be used to perform decisions on which flavor of a primitive to choose in the next iteration of a given primitive.
Benchmark Results
The MicroAdaptivity system and method was tested on a set of the 22 queries from the TPC-H Benchmark (See [TPC-H]) using a 100 GB generated database.) For the different types of aspects listed above that influence primitive generation, primitives were tested using one of the possible approaches. During the test, how well MicroAdaptivity performed dynamically choosing primitives was measured. In addition, the testing simulated the “optimal” process where, for each function call, the testing measured the time of the fastest of the available flavors.
While the foregoing has been with reference to a particular embodiment of the invention, it will be appreciated by those skilled in the art that changes in this embodiment may be made without departing from the principles and spirit of the disclosure, the scope of which is defined by the appended claims.
Zukowski, Marcin, Boncz, Peter, Raducanu, Bogdan
Patent | Priority | Assignee | Title |
Patent | Priority | Assignee | Title |
5680557, | Feb 16 1994 | The United States of America as represented by the Secretary of the Army | Natural computing system |
590855, | |||
5920855, | Jun 03 1997 | International Business Machines Corporation | On-line mining of association rules |
6292186, | Nov 06 1998 | International Business Machines Corporation | Universal information appliance with parser |
6336124, | Oct 01 1998 | BCL COMPUTERS, INC | Conversion data representing a document to other formats for manipulation and display |
6343275, | Dec 22 1997 | BIG BABOON, INC | Integrated business-to-business web commerce and business automation system |
6463582, | Oct 21 1998 | Fujitsu Limited | Dynamic optimizing object code translator for architecture emulation and dynamic optimizing object code translation method |
6546388, | Jan 14 2000 | International Business Machines Corporation | Metadata search results ranking system |
7086047, | Dec 04 2002 | XILINX, Inc. | Determining hardware generated by high level language compilation through loop optimizations |
7089362, | Dec 27 2001 | Intel Corporation | Cache memory eviction policy for combining write transactions |
7404182, | Oct 03 2003 | Oracle America, Inc | Deferring and combining write barriers for a garbage-collected heap |
7448048, | May 27 2003 | International Business Machines Corporation | Method for performing real-time analytics using a business rules engine on real-time heterogeneous materialized data views |
7685580, | Aug 30 2005 | Oracle America, Inc | Method and apparatus for selectively eliminating write barriers in snapshot-at-the beginning concurrent-marking garbage collectors |
7730079, | Aug 30 2005 | Microsoft Technology Licensing, LLC | Query comprehensions |
7783681, | Dec 15 2006 | Oracle America, Inc | Method and system for pre-marking objects for concurrent garbage collection |
7849444, | Dec 21 2004 | National Instruments Corporation | Test executive with buffer overwrite detection for parameters of user-supplied code modules |
8001021, | May 16 2006 | INTUIT INC. | System and method for compensation query management |
8095738, | Jun 15 2009 | International Business Machines Corporation | Differential caching mechanism based on media I/O speed |
8200527, | Apr 25 2007 | CONCENTRIX CVG CUSTOMER MANAGEMENT DELAWARE LLC | Method for prioritizing and presenting recommendations regarding organizaion's customer care capabilities |
8260803, | Sep 23 2010 | MICRO FOCUS LLC | System and method for data stream processing |
8271965, | Jul 03 2003 | International Business Machines Corporation | Apparatus to guarantee type and initialization safety in multithreaded programs |
8364519, | Mar 14 2008 | DATAINFOCOM USA INC | Apparatus, system and method for processing, analyzing or displaying data related to performance metrics |
8386845, | Oct 04 2010 | Veritas Technologies LLC | Techniques for providing instant disaster recovery |
8417789, | Dec 15 2006 | The Research Foundation of State University of New York | Distributed adaptive network memory engine |
8423981, | Jun 18 2009 | National Instruments Corporation | Compiling a graphical program having a textual language program portion for a real time target |
8606994, | Feb 26 2010 | Red Hat, Inc | Method for adapting performance sensitive operations to various levels of machine loads |
8626725, | Jul 31 2008 | Microsoft Technology Licensing, LLC | Efficient large-scale processing of column based data encoded structures |
8767957, | Oct 29 2008 | Purdue Research Foundation | Communication encryption method and device |
8825959, | Jul 31 2012 | ACTIAN CORPORATION | Method and apparatus for using data access time prediction for improving data buffering policies |
9110946, | Mar 30 2012 | International Business Machines Corporation | Database query optimization |
9171041, | Sep 29 2011 | PIVOTAL SOFTWARE, INC | RLE-aware optimization of SQL queries |
20020083049, | |||
20020165848, | |||
20030005134, | |||
20030120654, | |||
20030220931, | |||
20050084152, | |||
20050167917, | |||
20050209988, | |||
20050216421, | |||
20050222996, | |||
20060167917, | |||
20060200741, | |||
20060212244, | |||
20060224542, | |||
20060293934, | |||
20070067573, | |||
20070240136, | |||
20080059492, | |||
20080066145, | |||
20080120469, | |||
20080235181, | |||
20080243969, | |||
20080244471, | |||
20090007116, | |||
20090249446, | |||
20090254572, | |||
20090300631, | |||
20100114915, | |||
20100235335, | |||
20100250497, | |||
20100268773, | |||
20110078243, | |||
20110093500, | |||
20110208822, | |||
20110225232, | |||
20110270711, | |||
20120130986, | |||
20130173669, | |||
20130179395, | |||
20140201450, | |||
EP754325, | |||
EP855656, |
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Mar 13 2013 | ACTIAN NETHERLANDS B.V. | (assignment on the face of the patent) | / |
Date | Maintenance Fee Events |
Date | Maintenance Schedule |
Nov 22 2025 | 4 years fee payment window open |
May 22 2026 | 6 months grace period start (w surcharge) |
Nov 22 2026 | patent expiry (for year 4) |
Nov 22 2028 | 2 years to revive unintentionally abandoned end. (for year 4) |
Nov 22 2029 | 8 years fee payment window open |
May 22 2030 | 6 months grace period start (w surcharge) |
Nov 22 2030 | patent expiry (for year 8) |
Nov 22 2032 | 2 years to revive unintentionally abandoned end. (for year 8) |
Nov 22 2033 | 12 years fee payment window open |
May 22 2034 | 6 months grace period start (w surcharge) |
Nov 22 2034 | patent expiry (for year 12) |
Nov 22 2036 | 2 years to revive unintentionally abandoned end. (for year 12) |