Methods, systems, and computer program products are provided for query expression evaluation for query optimization. Embodiments include receiving a query including a plurality of predicates; creating a truth table representing the possible solutions to the query; identifying a plurality of intersections of the truth table, the plurality of intersections including each predicate of the query; assigning a selectivity factor to each predicate; determining in dependence upon the selectivity factors for each predicate a selectivity rating for each of the intersections of the truth table; selecting one or more of the intersections in dependence upon the selectivity rating; and determining in dependence upon the selected intersections a percentage of rows in a database table returned by the query; and providing the percentage of rows in the database table to an access plan generator for query optimization.
|
1. A method for query expression evaluation for query optimization, the method comprising:
receiving a query including a plurality of predicates;
creating a truth table representing all of the possible solutions to the query;
condensing the truth table including identifying a plurality of intersections of the truth table, the plurality of intersections including each predicate of the query;
assigning a selectivity factor to each predicate;
determining in dependence upon the selectivity factors for each predicate a selectivity rating for each of the intersections of the truth table;
selecting one or more of the intersections in dependence upon the selectivity rating, wherein selecting one or more of the intersections in dependence upon the selectivity rating further comprises:
determining whether any of the intersections has a selectivity rating of zero; and
when any of the intersections has a selectivity rating of zero, selecting the intersections having a selection rating greater than zero;
determining in dependence upon the selected intersections a percentage of rows in a database table returned by the query; and
providing the percentage of rows in the database table to an access plan generator for query optimization.
7. A system for query expression evaluation for query optimization, the system comprising a computer processor, a computer memory operatively coupled to the computer processor, the computer memory having disposed within the computer memory, computer program instructions configured to:
receive a query including a plurality of predicates;
create a truth table representing all of the possible solutions to the query;
condense the truth table including identifying a plurality of intersections of the truth table, the plurality of intersections including each predicate of the query;
assign a selectivity factor to each predicate;
determine in dependence upon the selectivity factors for each predicate a selectivity rating for each of the intersections of the truth table;
select one or more of the intersections in dependence upon the selectivity rating, wherein selecting one or more of the intersections in dependence upon the selectivity rating further comprises:
determining whether any of the intersections has a selectivity rating of zero; and
when any of the intersections has a selectivity rating of zero, selecting the intersections having a selection rating greater than zero;
determine in dependence upon the selected intersections a percentage of rows in a database table returned by the query; and
provide the percentage of rows in the database table to an access plan generator for query optimization.
13. A computer program product including a computer readable, recordable medium, wherein the computer readable, recordable medium is not a signal, the computer program product for query expression evaluation for query optimization, the computer program product comprising computer instructions configured to:
receive a query including a plurality of predicates;
create a truth table representing all of the possible solutions to the query;
condense the truth table including identifying a plurality of intersections of the truth table, the plurality of intersections including each predicate of the query;
assign a selectivity factor to each predicate;
determine in dependence upon the selectivity factors for each predicate a selectivity rating for each of the intersections of the truth table; and
select one or more of the intersections in dependence upon the selectivity rating, wherein selecting one or more of the intersections in dependence upon the selectivity rating further comprises:
determining whether any of the intersections has a selectivity rating of zero; and
when any of the intersections has a selectivity rating of zero, selecting the intersections having a selection rating greater than zero; and
determining in dependence upon the selected intersections a percentage of rows in a database table returned by the query; and
providing the percentage of rows in the database table to an access plan generator for query optimization.
2. The method of
3. The method of
determining a percentage of rows in the database table returned by each of the intersections having a selection rating greater than zero; and
summing the percentage of rows in the database table returned by each of the intersections having a selection rating greater than zero.
4. The method of
determining whether any of the intersections has a selectivity rating greater than a threshold; and
if any of the intersections has a selectivity rating greater than the threshold, selecting the intersections having a selectivity rating greater than the threshold.
5. The method of
wherein determining in dependence upon the intersections a percentage of rows in a database table returned by the query further comprises determining a projected percentage of rows in the database table determined in dependence upon the subset of intersections.
6. The method of
determining a percentage of rows in the database table returned by each of the intersections of the subset; and
summing the percentage of rows in the database table returned by each of the intersections of the subset;
determining a scalar in dependence upon the selectivity ratings of the intersections of the subset; and
scaling the summed percentage in dependence upon the scalar.
8. The system of
9. The system of
determine a percentage of rows in the database table returned by each of the intersections having a selection rating greater than zero; and
sum the percentage of rows in the database table returned by each of the intersections having a selection rating greater than zero.
10. The system of
determine whether any of the intersections has a selectivity rating greater than a threshold; and
if any of the intersections has a selectivity rating greater than the threshold, select the intersections having a selectivity rating greater than the threshold.
11. The system of
select a subset of intersections, each subset of intersections including each predicate in the query; and
wherein determining in dependence upon the intersections a percentage of rows in a database table returned by the query further comprises determining a projected percentage of rows in the database table determined in dependence upon the subset of intersections.
12. The system of
determine a percentage of rows in the database table returned by each of the intersections of the subset; and
sum the percentage of rows in the database table returned by each of the intersections of the subset;
determine a scalar in dependence upon the selectivity ratings of the intersections of the subset; and
scale the summed percentage in dependence upon the scalar.
14. The computer program product of
determine a percentage of rows in the database table returned by each of the intersections having a selection rating greater than zero; and
sum the percentage of rows in the database table returned by each of the intersections having a selection rating greater than zero.
15. The computer program product of
determine whether any of the intersections has a selectivity rating greater than a threshold; and
if any of the intersections has a selectivity rating greater than the threshold, select the intersections having a selectivity rating greater than the threshold.
16. The computer program product of
select a subset of intersections, each subset of intersections including each predicate in the query; and
wherein determining in dependence upon the intersections a percentage of rows in a database table returned by the query further comprises determining a projected percentage of rows in the database table determined in dependence upon the subset of intersections.
17. The computer program product of
determine a percentage of rows in the database table returned by each of the intersections of the subset; and
sum the percentage of rows in the database table returned by each of the intersections of the subset;
determine a scalar in dependence upon the selectivity ratings of the intersections of the subset; and
scale the summed percentage in dependence upon the scalar.
|
1. Field of the Invention
The field of the invention is data processing, or, more specifically, methods, apparatus, and products for query expression evaluation for query optimization.
2. Description of Related Art
The development of the EDVAC computer system of 1948 is often cited as the beginning of the computer era. Since that time, computer systems have evolved into extremely complicated devices. Today's computers are much more sophisticated than early systems such as the EDVAC. Computer systems typically include a combination of hardware and software components, application programs, operating systems, processors, buses, memory, input/output devices, and so on. As advances in semiconductor processing and computer architecture push the performance of the computer higher and higher, more sophisticated computer software has evolved to take advantage of the higher performance of the hardware, resulting in computer systems today that are much more powerful than just a few years ago.
Information stored on a computer system is often organized in a structure called a database. A relational database is a grouping of related structures called ‘tables,’ which in turn are organized in rows of individual data elements. The rows are often referred to as ‘records,’ and the individual data elements are referred to as ‘fields.’ In this specification generally, therefore, an aggregation of fields is referred to as a ‘data structure’ or a ‘record,’ and an aggregation of records is referred to as a ‘table.’ An aggregation of related tables is called a ‘database.’
A computer system typically operates according to computer program instructions in computer programs. A computer program that supports access to information in a relational database is typically called a database management system or a ‘DBMS.’ A DBMS is responsible for helping other computer programs access, manipulate, and save information in a database.
A DBMS typically supports access and management tools to aid users, developers, and other programs in accessing information in a database. One such tool is the structured query language, ‘SQL.’ SQL is query language for requesting information from a database. Although there is a standard of the American National Standards Institute (‘ANSI’) for SQL, as a practical matter, most versions of SQL tend to include many extensions. Here is an example of a database query expressed in SQL:
select * from stores, transactions
where stores.location = “Minnesota”
and stores.storeID = transactions.storeID
This SQL query accesses information in a database by selecting records from two tables of the database, one table named ‘stores’ and another table named ‘transactions.’ The records selected are those having value “Minnesota” in their store location fields and transactions for the stores in Minnesota. In retrieving the data for this SQL query, an SQL engine will first retrieve records from the stores table and then retrieve records from the transaction table. Records that satisfy the query requirements then are merged in a ‘join.’
Each SQL query is carried out by a sequence of database operations specified as an access plan. An access plan generator creates an access plan for a query. An access plan generator uses an optimizer to optimize the execution of SQL queries against the DBMS based upon database statistics. Another database management statistic useful in creating an access plan includes the percentage of rows of a database table returned by a query. Such information may be used for example in determining whether an access plan includes retrieving a record using a tablescan or an index.
Methods, systems, and computer program products are provided for query expression evaluation for query optimization. Embodiments include receiving a query including a plurality of predicates; creating a truth table representing the possible solutions to the query; identifying a plurality of intersections of the truth table, the plurality of intersections including each predicate of the query; assigning a selectivity factor to each predicate such as by probing database statistics; determining in dependence upon the selectivity factors for each predicate a selectivity rating for each of the intersections of the truth table; selecting one or more of the intersections in dependence upon the selectivity rating; and determining in dependence upon the selected intersections a percentage of rows in a database table returned by the query; and providing the percentage of rows in the database table to an access plan generator for query optimization.
The foregoing and other objects, features and advantages of the invention will be apparent from the following more particular descriptions of exemplary embodiments of the invention as illustrated in the accompanying drawings wherein like reference numbers generally represent like parts of exemplary embodiments of the invention.
Exemplary methods, systems, and products for query expression evaluation for query optimization in accordance with the present invention are described with reference to the accompanying drawings, beginning with
In the example of
In the system of
The arrangement of servers and other devices making up the exemplary system illustrated in
For further explanation,
The exemplary SQL module (260) of
select * from stores, transactions
where stores.storeID = transactions.storeID,
access plan generator (256) may generate the following exemplary access plan for this SQL query:
tablescan stores
join to
index access of transactions
This access plan represents database functions to scan through the stores table and, for each stores record, join all transactions records for the store. The transactions for a store are identified through the storeID field acting as a foreign key. The fact that a selection of transactions records is carried out for each store record in the stores table identifies the join function as iterative.
The exemplary access plan generator (256) of
The exemplary access plan generator (256) also includes an optimizer (254) implemented as computer program instructions that optimize the access plan in dependence upon database management statistics (264). Optimizer (254) optimizes the execution of SQL queries against DBMS (250). Optimizer (254) is implemented as computer program instructions that optimize execution of a SQL query in dependence upon database management statistics (264). Database statistics are typically implemented as metadata of a table, such as, for example, metadata of tables of database (262) or metadata of database indexes. Database statistics may include, for example:
These three database statistics are presented for explanation only, not for limitation.
Another statistic useful in creating an access plan is the percentage of rows of a database table returned by a query. The access plan generator (256) of
The exemplary SQL module (260) of
As mentioned above, query expression evaluation for query optimization in accordance with the present invention is generally implemented with computers, that is, with automated computing machinery. For further explanation,
The access plan generator (256) of
Also stored in RAM (168) is an application (232), a computer program that uses the DBMS (250) to access data stored in a database. Also stored in RAM (168) is an operating system (154). Operating systems useful in computers according to embodiments of the present invention include Unix, Linux, Microsoft NT™, i50S, and many others as will occur to those of skill in the art. Operating system (154), DBMS (250), and application (154) in the example of
The computer (152) of
The exemplary computer (134) of
The example computer of
For further explanation,
The method of
(C1=2 OR C2>4) AND (C3=6 OR C4<=8)
and truth table representing the possible solutions to the query expression:
Truth Table of Possible Solutions to
Query Expression: (C1 = 2 OR C2 > 4) AND (C3 = 6 OR C4 <= 8)
Row
A
B
C
D
Number
C1 = 2
C2 > 4
C3 = 6
C4 <= 8
T/F
0
0
0
0
0
F
1
0
0
0
1
F
2
0
0
1
0
F
3
0
0
1
1
F
4
0
1
0
0
F
5
0
1
0
1
T
6
0
1
1
0
T
7
0
1
1
1
T
8
1
0
0
0
F
9
1
0
0
1
T
10
1
0
1
0
T
11
1
0
1
1
T
12
1
1
0
0
F
13
1
1
0
1
T
14
1
1
1
0
T
15
1
1
1
1
T
In the example above, the query expression contains four predicates C1=2, C2>4, C3=6, and C4<=8 labeled A, B, C, and D respectively. The truth table representing this four predicate query includes sixteen possible solutions to the query in rows 0-15.
Truth tables representing the possible solutions to query expressions may be further simplified or condensed as a plurality of intersections of the query expression. The method of
Intersections of Truth Table for
Query: (C1 = 2 OR C2 > 4) AND (C3 = 6 OR C4 <= 8)
Intersection
A
B
C
D
Number
C1 = 2
C2 > 4
C3 = 6
C4 <= 8
1
X
1
X
1
2
X
1
1
X
3
1
X
X
1
4
1
X
1
X
The semantics of the intersections of the truth table differ from those of the truth table itself in that a value of 1 indicates that a predicate is true while a value of X indicates a disregard for the whether the predicate is true or false. That is, X is a ‘do not care’ with respect to whether the predicate is true or false.
The plurality of intersections (412) above includes each predicate (405) of the query (404) and also implicitly includes the relationships among the predicates (405). The plurality of intersections maintain the relationship among the predicates because each intersection explicitly includes a true solution and implicitly includes other true solutions. For example, intersection row 1 above includes the true solution B (C2>4) AND D (C4<=8) each represented as true by a value of ‘1’ thereby making the query relationship true and implicitly maintains other true solutions because predicates A (C1=2) and C (C3=6) may be either true or false respectively because the value of ‘X’ in the intersections representing either true or false.
The method of
Consider for example the following exemplary intersections having predicates assigned a selectivity factor.
Intersections of Truth Table for
Query Expression: (C1 = 2 OR C2 > 4) AND (C3 = 6 OR C4 <= 8)
A
B
C
D
C1 = 2
C2 > 4
C3 = 6
C4 <= 8
Intersection
Selectivity
Selectivity
Selectivity
Selectivity
Number
Factor = .2
Factor = .7
Factor = .6
Factor = .3
1
X
1
X
1
2
X
1
1
X
3
1
X
X
1
4
1
X
1
X
In the example intersections above, each predicate A, B, C, and D have been assigned a selectivity factor equal to the percentage of rows in the database table containing the predicate. Predicate A (C1=2) is assigned a selectivity factor of 0.2 because C1=2 is true for 20 percent of the rows of the target database table. Predicate B (C2>4) is assigned a selectivity factor of 0.7 because C2>4 is true for 70 percent of the rows of the target database table. Predicate C (C3=6) is assigned a selectivity factor of 0.6 because C3=6 is true for 60 percent of the rows of the target database table. And similarly, predicate D (C4<=8) is assigned a selectivity factor of 0.3 because C4<=8 is true for 30 percent of the rows of the target database table.
The method of
For further explanation, consider the following exemplary intersections whose selectivity ratings have been determined.
Intersections of Truth Table for
Query Expression: (C1 = 2 OR C2 > 4) AND (C3 = 6 OR C4 <= 8)
A
C1 = 2
B
C
D
Inter-
Selectivity
C2 > 4
C3 = 6
C4 <= 8
Selec-
section
Factor =
Selectivity
Selectivity
Selectivity
tivity
Number
.2
Factor = .7
Factor = .6
Factor = .3
Rating
1
X
1
X
1
.21
2
X
1
1
X
.42
3
1
X
X
1
.06
4
1
X
1
X
.12
In the example above, the selectivity rating of intersection 1 is determined to be 0.21 or 21 percent by multiplying the selectivity factor of predicate B (selectivity factor=0.7) by the selectivity factor of predicate D (selectivity factor=0.3), the predicates (B and D) having been identified as true in intersection 1. The selectivity rating of intersection 2 in the example above is determined to be 0.42 or 42 percent by multiplying the selectivity factor of predicate B (selectivity factor=0.7) by the selectivity factor of predicate C (selectivity factor=0.6), the predicates (B and C) having been identified as true in intersection 2. The selectivity rating of intersection 3 in the example above is determined to be 0.06 or 6 percent by multiplying the selectivity factor of predicate A (selectivity factor=0.2) by the selectivity factor of predicate D (selectivity factor=0.3), the predicates (A and D) having been identified as true in intersection 3. Similarly, the selectivity rating of intersection 4 in the example above is determined to be 0.12 or 12 percent by multiplying the selectivity factor of predicate A (selectivity factor=0.2) by the selectivity factor of predicate C (selectivity factor=0.6), the predicates (A and C) having been identified as true in intersection 4.
The method of
A first-tiered method of selecting (418) one or more of the intersections (412) in dependence upon the selectivity rating may be carried out by determining whether any of the intersections has a selectivity rating of zero and selecting the intersections having a selection rating greater than zero if any of the intersections has a selectivity rating of zero as described below with reference to
In some cases, there may be few or no intersections having a selectivity rating of zero but there may be enough intersections having relatively low selectivity ratings that a threshold may be used to select intersections according to the present invention. In such cases, a second-tier methodology uses a threshold greater than zero and in such a second-tier methodology selecting one or more of the intersections in dependence upon the selectivity rating may be carried out by determining whether any of the intersections has a selectivity rating greater than a threshold and selecting the intersections having a selectivity rating greater than the threshold if any of the intersections has a selectivity rating greater than the threshold as described below with reference to
In still other cases, there may be a number of intersections having relatively similar selectivity ratings. In such cases, a third-tiered methodology may be used to select intersections for determining a projected number of rows returned by the query. In such cases, selecting one or more of the intersections in dependence upon the selectivity rating is carried out by selecting a subset of intersections as described below with reference to
After selecting (418) one or more of the intersections (412) in dependence upon the selectivity rating, the method of
Alternatively, and often when there are more selected intersection such as by using the third-tier methodology for intersection selection described below, selecting one or more of the intersections in dependence upon the selectivity rating is carried out by selecting a subset of intersections and determining (422) in dependence upon the intersections (412) a percentage of rows in a database table returned by the query is carried out by determining a projected percentage of rows in the database table determined in dependence upon the subset of intersections as described below with reference to
As mentioned above, selecting one or more of the intersections in dependence upon the selectivity rating may be carried out using a tiered methodology. For further explanation,
For further explanation, consider the following example:
Intersections of Truth Table for
Query Expression: (C1 = 2 OR C2 > 4) AND (C3 = 6 OR C4 <= 8)
A
B
C
D
Inter-
C1 = 2
C2 > 4
C3 = 6
C4 <= 8
Selec-
section
Selectivity
Selectivity
Selectivity
Selectivity
tivity
Number
Factor = .2
Factor = 0
Factor = .6
Factor = .3
Rating
1
X
1
X
1
0
2
X
1
1
X
0
3
1
X
X
1
.06
4
1
X
1
X
.12
In the example above, the selectivity rating of intersection 1 is determined to be zero ‘0’ by multiplying the selectivity factor of predicate B (selectivity factor=0) by the selectivity factor of predicate D (selectivity factor=0.3), the predicates (B and D) having been identified as true in intersection 1. The selectivity rating of intersection 2 in the example above is also determined to be zero ('0) by multiplying the selectivity factor of predicate B (selectivity factor=0) by the selectivity factor of predicate C (selectivity factor=0.6), the predicates (B and C) having been identified as true in intersection 2. The selectivity rating of intersection 3 in the example above is determined to be 0.06 or 6 percent by multiplying the selectivity factor of predicate A (selectivity factor=0.2) by the selectivity factor of predicate D (selectivity factor=0.3), the predicates (A and D) having been identified as true in intersection 3. Similarly, the selectivity rating of intersection 4 in the example above is determined to be 0.12 or 12 percent by multiplying the selectivity factor of predicate A (selectivity factor=0.2) by the selectivity factor of predicate C (selectivity factor=0.6), the predicates (A and C) having been identified as true in intersection 4.
In this first-tier example, two intersections, intersection 1 and intersection 2, are determined to have a selectivity rating of zero. Selecting one or more of the intersections in dependence upon the selectivity rating according to the method of
In many cases, after selecting the intersections having a selection rating greater than zero, or some other small threshold, there are relatively few intersections selected. For further explanation,
The method of
In some relatively rare cases, only one intersection has a selection rating greater than zero. In such cases, summing (606) the percentage (604) of rows in the database table returned by each of the intersections (506) having a selection rating greater than zero includes simply summing the percentage of rows returned by that single non-zero intersection with zero.
As mentioned above, in some embodiments of the present invention, there may be few or no intersections having a selectivity rating of zero but there may be enough intersections having relatively low selectivity ratings such that a threshold may be used to select intersections according to the present invention. In such cases, a second-tier methodology may be used that includes threshold greater than zero for selecting intersections for use in determining a percentage of rows in a database table returned by the query for query optimization. For further explanation, therefore,
If any of the intersections has a selectivity rating greater than the threshold, the method of
For further explanation, consider the following example:
Intersections of Truth Table for
Query Expression: (C1 = 2 OR C2 > 4) AND (C3 = 6 OR C4 <= 8)
A
C1 = 2
B
C
D
Inter-
Selectivity
C2 > 4
C3 = 6
C4 <= 8
Selec-
section
Factor =
Selectivity
Selectivity
Selectivity
tivity
Number
.2
Factor = .01
Factor = .6
Factor = .3
Rating
1
X
1
X
1
.007
2
X
1
1
X
.006
3
1
X
X
1
.14
4
1
X
1
X
.12
In the example above, the selectivity rating of intersection 1 is determined to be 0.007 or 0.7 percent by multiplying the selectivity factor of predicate B (selectivity factor=0.01) by the selectivity factor of predicate D (selectivity factor=0.7), the predicates (B and D) having been identified as true in intersection 1. The selectivity rating of intersection 2 in the example above is determined to be 0.006 or 0.6 percent by multiplying the selectivity factor of predicate B (selectivity factor=0.01) by the selectivity factor of predicate C (selectivity factor=0.6), the predicates (B and C) having been identified as true in intersection 2. The selectivity rating of intersection 3 in the example above is determined to be 0.14 or 14 percent by multiplying the selectivity factor of predicate A (selectivity factor=0.2) by the selectivity factor of predicate D (selectivity factor=0.7), the predicates (A and D) having been identified as true in intersection 3. Similarly, the selectivity rating of intersection 4 in the example above is determined to be 0.12 or 12 percent by multiplying the selectivity factor of predicate A (selectivity factor=0.2) by the selectivity factor of predicate C (selectivity factor=0.6), the predicates (A and C) having been identified as true in intersection 4.
In this first second-tier example, a threshold is established at, for example, one percent. Intersection 1 and intersection 2 are determined to have a selectivity rating below the threshold of one percent. Selecting one or more of the intersections in dependence upon the selectivity rating according to the method of
In many cases, after selecting the intersections having a selection rating greater than the threshold there are relatively few intersections selected. In such cases, determining a percentage of rows in the database table returned by each of the intersections may be carried out as described above with reference to
As mentioned above, in some embodiments, there may be a number of intersections having relatively similar selectivity ratings. In such cases, a third-tiered methodology may be used to select intersections for determining a projected number of rows returned by the query. For further explanation,
For further explanation, consider the following exemplary intersections whose selectivity ratings have been determined.
Intersections of Truth Table for
Query Expression: (C1 = 2 OR C2 > 4) AND (C3 = 6 OR C4 <= 8)
A
C1 = 2
B
C
D
Inter-
Selectivity
C2 > 4
C3 = 6
C4 <= 8
Selec-
section
Factor =
Selectivity
Selectivity
Selectivity
tivity
Number
.2
Factor = .7
Factor = .6
Factor = .3
Rating
1
X
1
X
1
.21
2
X
1
1
X
.42
3
1
X
X
1
.06
4
1
X
1
X
.12
In the example above, selecting (802) a subset (804) of intersections (412) according to the method of
In the method of
For further explanation,
The method of
As mentioned above, total percentage of rows returned by the subset of the intersections is calculated only from the subset. The total percentage of rows returned by the subset is therefore scaled to create a projected percentage. The method of
Σ total intersections/[Σ subset
Where:
As an aid to understanding, the following example case with data is presented for better understanding of determining a projected percentage of rows returned by a query from a subset of intersections as will occur to those of skill in the art. In this example, a ten row database table includes the following data.
Database Table
Row
Number
C1
C2
C3
C4
0
1
1
7
12
1
1
1
7
12
2
2
1
6
12
3
2
5
6
12
4
3
5
7
4
5
4
5
7
5
6
7
5
6
6
7
8
6
6
12
8
9
7
6
13
9
10
8
6
14
In this example the following query expression is (C1=2 OR C2>4) AND (C3=6 OR C4<=8) is applied to the database table having the data above. The following table illustrates that the 8 of the 10 rows (80 percent) are in the result set of the query expression.
Database Table
Row in
Row
Result
Number
C1
C2
C3
C4
Set?
0
1
1
7
12
N
1
1
1
7
12
N
2
2
1
6
12
Y
3
2
5
6
12
Y
4
3
5
7
4
Y
5
4
5
7
5
Y
6
7
5
6
6
Y
7
8
6
6
12
Y
8
9
7
6
13
Y
9
10
8
6
14
Y
The intersections of a truth table representing the possible solutions to the query expression are illustrated in the chart below. The selectivity factors of each predicate in this example are determined by probing the database table above and a selectivity rating for each intersection is included in the table.
Intersections of Truth Table for
Query Expression: (C1 = 2 OR C2 > 4) AND (C3 = 6 OR C4 <= 8)
A
B
C
D
C1 = 2
C2 > 4
C3 = 6
C4 <= 8
Selectivity
Selectivity
Selectivity
Selectivity
Factor =
Factor =
Factor =
Factor =
Select.
Actual
Int. #
.2
.07
.6
.03
Rating
Probe
1
X
1
X
1
.21
.3
2
X
1
1
X
.42
.5
3
1
X
X
1
.06
.0
4
1
X
1
X
.12
.2
Using the method of
A projected percentage of rows returned by the query determined in dependence upon a subset of intersection 2 and intersection 3 by determining a percentage of rows in the database table returned by each of the intersections of the subset. In this example, the database table is probed to determine that intersection 2 returns 0.5 or 50 percent of the rows and intersection 3 returns zero (‘0’) rows.
Determining a projected percentage of rows in this example includes summing the percentage of rows in the database table returned intersection 2 and intersection 3 which in this example is 0.5 or 50 percent.
Determining a projected percentage of rows in this example includes determining a scalar in dependence upon the selectivity ratings of the intersections of the subset (in this case, 0.42+0.06=0.48 percent) and scaling the summed percentage in dependence upon the scalar. Determining a scalar in dependence upon the selectivity ratings of the intersections of the subset is typically carried out by summing the selectivity ratings of all the intersections (in this case 0.21+0.42+0.06+0.12=0.81) and summing the selectivity ratings of the intersections in the subset (in this case 0.42+0.06=0.48), and determining the ratio of the sum of the selectivity ratings of all the intersections and the sum of the selectivity ratings of the intersections of the subset (in this case 0.81/0.48). The scalar is therefore calculated according the following formula:
Σ total intersections/[Σ subset
Where:
In this case the calculated ratio is therefore 0.81/0.48
Scaling the scaling the summed percentage in dependence upon the scalar in this example therefore yields:
(0.81/0.48)*(0.5+0.0)=0.84
The example above yields a projected percentage of rows of 84 percent that is determined from probing only the subset of intersections not all the intersections. The projected percentage of rows which is 84 percent differs slightly from the actual percentage of rows in the database table above returned by the query which is 80 percent.
Exemplary embodiments of the present invention are described largely in the context of a fully functional computer system for query expression evaluation for query optimization. Readers of skill in the art will recognize, however, that the present invention also may be embodied in a computer program product disposed on signal bearing media for use with any suitable data processing system. Such signal bearing media may be transmission media or recordable media for machine-readable information, including magnetic media, optical media, or other suitable media. Examples of recordable media include magnetic disks in hard drives or diskettes, compact disks for optical drives, magnetic tape, and others as will occur to those of skill in the art. Examples of transmission media include telephone networks for voice communications and digital data communications networks such as, for example, Ethernets™ and networks that communicate with the Internet Protocol and the World Wide Web as well as wireless transmission media such as, for example, networks implemented according to the IEEE 802.11 family of specifications. Persons skilled in the art will immediately recognize that any computer system having suitable programming means will be capable of executing the steps of the method of the invention as embodied in a program product. Persons skilled in the art will recognize immediately that, although some of the exemplary embodiments described in this specification are oriented to software installed and executing on computer hardware, nevertheless, alternative embodiments implemented as firmware or as hardware are well within the scope of the present invention.
It will be understood from the foregoing description that modifications and changes may be made in various embodiments of the present invention without departing from its true spirit. The descriptions in this specification are for purposes of illustration only and are not to be construed in a limiting sense. The scope of the present invention is limited only by the language of the following claims.
Faunce, Michael S., Suginaka, Yasunobu
Patent | Priority | Assignee | Title |
11194809, | Dec 02 2016 | International Business Machines Corporation | Predicting performance of database queries |
9213759, | Jun 03 2011 | Hitachi, LTD | System, apparatus, and method for executing a query including boolean and conditional expressions |
9569495, | Aug 21 2014 | International Business Machines Corporation | Feedback mechanism providing row-level filtering earlier in a plan |
9582540, | Aug 21 2014 | International Business Machines Corporation | Feedback mechanism providing row-level filtering earlier in a plan |
Patent | Priority | Assignee | Title |
4829427, | May 25 1984 | Data General Corporation | Database query code generation and optimization based on the cost of alternate access methods |
4956774, | Sep 02 1988 | International Business Machines Corporation | Data base optimizer using most frequency values statistics |
5091852, | Jan 29 1988 | Hitachi, Ltd. | System for optimizing query processing in a relational database |
5930785, | Mar 31 1995 | International Business Machines Corporation | Method for detecting and optimizing queries with encoding/decoding tables |
6381616, | Mar 24 1999 | Microsoft Technology Licensing, LLC | System and method for speeding up heterogeneous data access using predicate conversion |
6567804, | Jun 27 2000 | TERADATA US, INC | Shared computation of user-defined metrics in an on-line analytic processing system |
6697961, | Sep 17 1999 | Nortel Networks Corporation | Method and system for describing predicates in disjuncts in procedures for test coverage estimation |
6721724, | Mar 31 2000 | Microsoft Technology Licensing, LLC | Validating multiple execution plans for database queries |
6748392, | Mar 06 2001 | Microsoft Technology Licensing, LLC | System and method for segmented evaluation of database queries |
6968330, | Nov 29 2001 | International Business Machines Corporation | Database query optimization apparatus and method |
7363299, | Nov 18 2004 | University of Washington | Computing probabilistic answers to queries |
20030009446, | |||
20050065921, | |||
20050097078, | |||
20050108204, | |||
20080201295, | |||
20080235181, |
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Mar 21 2007 | FAUNCE, MICHAEL S | International Business Machines Corporation | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 019053 | /0771 | |
Mar 21 2007 | SUGINAKA, YASUNOBU | International Business Machines Corporation | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 019053 | /0771 | |
Mar 23 2007 | International Business Machines Corporation | (assignment on the face of the patent) | / |
Date | Maintenance Fee Events |
Nov 13 2017 | REM: Maintenance Fee Reminder Mailed. |
Apr 30 2018 | EXP: Patent Expired for Failure to Pay Maintenance Fees. |
Date | Maintenance Schedule |
Apr 01 2017 | 4 years fee payment window open |
Oct 01 2017 | 6 months grace period start (w surcharge) |
Apr 01 2018 | patent expiry (for year 4) |
Apr 01 2020 | 2 years to revive unintentionally abandoned end. (for year 4) |
Apr 01 2021 | 8 years fee payment window open |
Oct 01 2021 | 6 months grace period start (w surcharge) |
Apr 01 2022 | patent expiry (for year 8) |
Apr 01 2024 | 2 years to revive unintentionally abandoned end. (for year 8) |
Apr 01 2025 | 12 years fee payment window open |
Oct 01 2025 | 6 months grace period start (w surcharge) |
Apr 01 2026 | patent expiry (for year 12) |
Apr 01 2028 | 2 years to revive unintentionally abandoned end. (for year 12) |