A method, apparatus, and article of manufacture for incrementally maintaining a summary table derived from a plurality of base tables in a database stored on the computer. A plurality of modifications are applied to the base tables, and the applied modifications are then propagated to the summary table. The propagated modifications include an associated tag column containing a tag value, and the tag value indicates per row of the propagated modifications how many rows are to be added to or deleted from the summary table.
|
1. A method of incrementally maintaining a summary table derived from a plurality of base tables in a database stored on the computer, the method comprising:
(a) applying a plurality of modifications to the base tables; and
(b) propagating the applied modifications to the summary table, wherein the propagated modifications include an associated tag column containing a tag value, and the tag value indicates how many rows are to be added or deleted from the summary table.
59. An article of manufacture embodying logic for performing a method of incrementally maintaining a summary table derived from a plurality of base tables in a database stored on the computer, the method comprising:
(a) applying a plurality of modifications to the base tables; and
(b) propagating the applied modifications to the summary table, wherein the propagated modifications include an associated tag column contain a tag value, and the tag value indicates how many rows are to be added or deleted from the summary table.
30. A computer-implemented apparatus for incrementally maintaining a summary table derived from a plurality of base tables in a database stored on the computer, comprising:
(a) logic, performed by the computer, for:
(1) applying a plurality of modifications to the base tables; and
(2) propagating the applied modifications to the summary table, wherein the propagated modifications include an associated tag column containing a tag value, and the tag value indicates how many rows are to be added or deleted from the summary table.
3. The method of
4. The method of
5. The method of
if the modification is an inserted row, creating a propagated row with values from the inserted row and the tag value equal to +1,
if the modification is a deleted row, creating a propagated row with values from the deleted row and the tag value equal to −1, and
if the modification is an updated row, decomposing the updated row into two propagated rows, wherein a first propagated row contains old values from the updated row and the tag value equal to −1, and a second propagated row contains new values from the updated row and the tag value equal to +1.
6. The method of
7. The method of
8. The method of
9. The method of
11. The method of
12. The method of
13. The method of
applying zero ox more predicates to one or mote columns of the input delta to select the rows that will appear in the output delta; and
deriving the tag column of the output delta directly from the input delta tag column, wherein the tag values are not affected by the selection operation.
14. The method of
selecting one or more columns that are required by the projection operation from the input delta to produce the output delta, and
deriving the output delta tag column directly from the input delta tag column without change to the tag values.
15. The method of
pruning operands with no input delta from the union all operation,
if there is exactly one operand remaining (N=1) after pruning, then converting the union all operation to a selection operation without predicates, applying zero or more predicates to one or more columns of the input delta to select the rows that will appear in the output delta, and deriving the tag column of the output delta directly from the input delta tag column, wherein the tag values are not affected by the selection operation; and
if there is more than one operand remaining (N>1) after pruning, then deriving the output delta tag column by unioning together the tag columns of the input deltas, wherein the tag values are not changed by the union all operation.
16. The method of
17. The method of
18. The method of
generating one or mote join terms for the inner join operation,
adding an output delta tag column to each of the generated join terms, and
injecting a union operation that includes the generated join terms as its operands into the propagating step, and then applying the following steps:
pruning operands with no input delta from the union operation,
if there is exactly one operand remaining (N=1) after pruning, then converting the union operation to a selection operation without predicates, applying zero or more predicates to one or more columns of the input delta to select the rows that will appear in the output delta, and deriving the tag column of the output delta directly from the input delta tag column, wherein the tag values are not affected by the selection operation; and
if there is more than one operand remaining (N>1) after pruning, then deriving the output delta tag column with the following rule: for a given join term that contains K<=M deltas with corresponding input tag columns input-tag1, input-tag2, . . . , input-tagK:
output delta tag=input-tag1*input-tag2* . . . *input-tagK*(−1)**(K+1).
19. The method of
20. The method of
21. The method of
22. The method of
23. The method of
24. The method of
25. The method of
26. The method of
(1) the summary table is defined as an aggregate query,
(2) a FROM clause for the aggregate query contains only inner joins,
(3) a FROM clause for the aggregate query contains only a single table reference,
(4) the aggregate query does not contain a HAVING clause,
(5) aggregate functions in a SELECT list of the query are self-maintainable with respect to insertions and deletions, and
(6) a SELECT DISTINCT operation is not allowed.
27. The method of
28. The method of
29. The method of
32. The apparatus of
33. The apparatus of
34. The apparatus of
if the modification is an inserted row, creating a propagated row with values from the inserted tow and the tag value equal to +1,
if the modification is a deleted row, creating a propagated row with values from the deleted row and the tag value equal to −1, and
if the modification is an updated row, decomposing the updated row into two propagated rows, wherein a first propagated row contains old values from the updated tow and the tag value equal to −1, and a second propagated row contains new values from the updated row and the tag value equal to +1.
35. The apparatus of
36. The apparatus of
37. The apparatus of
38. The apparatus of
40. The apparatus of
41. The apparatus of
42. The apparatus of
applying zero or more predicates to one or more columns of the input delta to select the rows that will appear in the output delta; and
deriving the tag column of the output delta directly from the input delta tag column, wherein the tag values are not affected by the selection operation.
43. The apparatus of
selecting one or more columns that are required by the projection operation from the input delta to produce the output delta, and
deriving the output delta tag column directly from the input delta tag column without change to the tag values.
44. The apparatus of
pruning operands with no input delta from the union all operation,
if there is exactly one operand remaining (N=1) after pruning, then converting the union all operation to a selection operation without predicates, applying zero or mote predicates to one or more columns of the input delta to select the tows that will appear in the output delta, and deriving the tag column of the output delta directly from the input delta tag column, wherein the rag values are not affected by the selection operation; and
if there is more than one operand remaining (N>1) after pruning, then deriving the output delta tag column by unioning together the tag columns of the input deltas, wherein the tag values are not changed by the union all operation.
45. The apparatus of
46. The apparatus of
47. The apparatus of
generating one or more join terms for the inner join operation,
adding an output delta tag column to each of the generated join terms, and
injecting a union operation tag includes the generated join terms as its operands into the propagating step, and then applying the following steps:
pruning operands with no input delta from the union operation,
if there is exactly one operand remaining (N=1) after pruning, then converting the union operation to a selection operation without predicates, applying zero or more predicates to one or more columns of the input delta to select the rows that will appear in the output delta, and deriving the tag column of the output delta directly from the input delta tag column, wherein the tag values are not affected by the selection operation; and
if there is mote than one operand remaining (N>1) after pruning, then deriving the output delta tag column with the following rule: for a given join term that contains K<=M deltas with corresponding input tag columns input-tag1, input-tag2, . . . , input-tagK:
output delta tag=input-tag1*input-tag2* . . . *input-tagK*(−1)**(K+1).
48. The apparatus of
49. The apparatus of
50. The apparatus of
51. The apparatus of
52. The apparatus of
53. The apparatus of
54. The apparatus of
55. The apparatus of
(1) the summary table is defied as an aggregate query,
(2) a FROM clause for the aggregate query contains only inner joins,
(3) a FROM clause for the aggregate query contains only a single table reference,
(4) the aggregate query does not contain a HAVING clause,
(5) aggregate functions in a SECT list of the query are self-maintainable with respect to insertions and deletions, and
(6) a SELECT DISTINCT operation is not allowed.
56. The apparatus of
57. The apparatus of
58. The apparatus of
61. The article of manufacture of
62. The article of manufacture of
63. The article of manufacture of
if the modification is an inserted row, creating a propagated row with values from the inserted row and the tag value equal to +1,
if the modification is a deleted row, creating a propagated row with values from the deleted row and the tag value equal to −1, and
if the modification is an updated row, decomposing the updated row into two propagated rows, wherein a first propagated row contains old values from the updated tow and the tag value equal to −1, and a second propagated row contains new values from the updated row and the tag value equal to +1.
64. The article of manufacture of
65. The article of manufacture of
66. The article of manufacture of
67. The article of manufacture of
69. The article of manufacture of
70. The article of manufacture of
71. The article of manufacture of
applying zero or more predicates to one or more columns of the input delta to select the rows that will appear in the output delta; and
deriving the tag column of the output delta directly from the input delta tag column, wherein the tag values are not affected by the selection operation.
72. The article of manufacture of
selecting one or more columns that are required by the projection operation from the input delta to produce the output delta, and
deriving the output delta tag column directly from the input delta tag column without change to the tag values.
73. The article of manufacture of
pruning operands with no input delta from the union all operation,
if there is exactly one operand remaining (N=1) after pruning, then converting the union all operation to a selection operation without predicates, applying zero or more predicates to one or more columns of the input delta to select the rows that will appear in the output delta, and deriving the tag column of the output delta directly from the input delta tag column, wherein the tag values are not affected by the selection operation; and
if there is more than one operand remaining (N>1) after pruning, then deriving the output delta tag column by unioning together the tag columns of the input deltas, wherein the tag values are not changed by the union all operation.
74. The article of manufacture of
75. The article of manufacture of
76. The article of manufacture of
comprises an inner join operation and the propagating step comprises:
generating one or more join terms for the inner join operation,
adding an output delta tag column to each of the generated join terms, and
injecting a union operation that includes the generated join terms as its operands into the propagating step, and then applying the following steps:
pruning operands with no input delta from the union operation,
if there is exactly one operand remaining (N=1) after pruning, then converting the union operation to a selection operation without predicates, applying zero or more predicates to one or more columns of the input delta to select the rows that will appear in the output delta, and deriving the tag column of the output delta directly from the input delta tag column, wherein the tag values are not affected by the selection operation; and
if there is more than one operand remaining (N>1) after pruning, then deriving the output delta tag column with the following rule: for a given join term that contains K<=M deltas with corresponding input tag columns input-tag1, input-tag2, . . . , input-tagK:
out delta tag=input-tag1*input-tag2* . . . *input-tagK*(−1)**(+1).
77. The article of manufacture of
78. The article of manufacture of
79. The article of manufacture of
81. The article of manufacture of
82. The article of manufacture of
83. The article of manufacture of
84. The article of manufacture of
(1) the summary table is defined as an aggregate query,
(2) a FROM clause for the aggregate query contains only inner joins,
(3) a FROM clause for the aggregate query contains only a single table reference,
(4) the aggregate query does not contain a HAVING clause,
(5) aggregate functions in a SELECT list of the query are self-maintainable with respect to insertions and deletions, and
(6) a SELECT DISTINCT operation is not allowed.
85. The article of manufacture of
86. The article of manufacture of
87. The article of manufacture of
|
This application claims the benefit under 35 U.S.C §119(e) of co-pending and commonly-assigned U.S. provisional application Ser. No. 60/135,277, entitled “INCREMENTAL MAINTENANCE OF AGGREGATED JOIN SUMMARY TABLES”, filed on May 21, 1999, by Roberta J. Cochrane, Mir H. Pirahesh, and Richard S. Sidle, which application is incorporated by reference herein.
1. Field of the Invention
This invention relates in general to database management systems performed by computers, and in particular, to the incremental maintenance of aggregated and join summary tables.
2. Description of Related Art
Computer systems incorporating an Relational DataBase Management System (RDBMS) using a Structured Query Language (SQL) interface are well known in the art. The SQL interface has evolved into a standard language for performing operations in an RDBMS and has been adopted as such by both the American National Standards Institute (ANSI) and the International Standards Organization (ISO).
One application for an RDBMS is a data warehouse. The star schema of a typical data warehouse is comprised of a small number of very large fact tables and some number of relatively small dimension tables. The complex analytical queries that are evaluated in data warehouse environments often join one or more of the fact tables together with some dimension tables to pick up dimension attributes and then aggregate the results of these joins.
A common characteristic of the data warehouse is that data from operational systems are periodically propagated to the warehouse. For example, bulk load append operations may be applied to the fact tables on a nightly basis. Old data may also be periodically purged from the fact tables via delete statements. Dimension tables are typically more static in nature and are updated less often.
The sheer volume of data in the warehouse requires that the number of fact table scans must be minimized. Automatic summary tables, which are also known as “materialized views”, are commonly used in data warehouses to help achieve this goal. Result sets that can be used in the computation of multiple queries are pre-computed and materialized in tables. The queries are then rerouted (either manually by rewriting the query or automatically by sophisticated query compilers) to refer to the summary tables rather than the base fact table data.
However, summary tables must be maintained efficiently to reflect the periodic updates to the data warehouse. Given the vast amount of data involved, there is a need in the art for efficient maintenance methods for summary tables. Moreover, there is a need in the art for maintenance methods that are incremental in nature to avoid full re-computation of the queries that generated the summary tables.
To overcome the limitations in the prior art described above, and to overcome other limitations that will become apparent upon reading and understanding the present specification, the present invention discloses a method, apparatus, and article of manufacture for incrementally maintaining a summary table derived from a plurality of base tables in a database stored on the computer. A plurality of modifications are applied to the base tables, and the applied modifications are then propagated to the summary table. The propagated modifications include an associated tag column containing a tag value, and the tag value indicates per row of the propagated modifications how many rows are to be added to or deleted from the summary table.
Referring now to the drawings in which like reference numbers represent corresponding parts throughout:
In the following description of the preferred embodiment, reference is made to the accompanying drawings, which form a part hereof, and in which is shown by way of illustration a specific embodiment in which the invention may be practiced. It is to be understood that other embodiments may be utilized and structural and functional changes may be made without departing from the scope of the present invention.
As illustrated in
At the heart of the RDBMS is the Database Services module 114. The Database Services module 114 contains several submodules, including a Relational Database System (RDS) 116, Data Manager 118, Buffer Manager 120, and Other Components 122 such as an SQL compiler/interpreter, and including an optimizer function. These submodules support the functions of the SQL language, i.e., definition, access control, retrieval, and update.
Generally, the RDBMS comprises logic and/or data that is embodied in or retrievable from a device, medium, or carrier, e.g., a fixed or removable data storage device, a remote device coupled to the computer by a data communications device, etc. Moreover, this logic and/or data, when read, executed, and/or interpreted by the computer 100, cause the computer 100 to perform the steps necessary to implement and/or use the present invention.
Thus, the present invention may be implemented as a method, apparatus, or article of manufacture using standard programming and/or engineering techniques to produce software, firmware, hardware, or any combination thereof. The term “article of manufacture” as used herein is intended to encompass logic and/or data embodied in or accessible from any device, carrier, or media.
Of course, those skilled in the art will recognize many modifications may be made to this configuration without departing from the scope of the present invention. For example, those skilled in the art will recognize that any combination of the above components, or any number of different components, including computer programs, peripherals, and other devices, may be used to implement the present invention, so long as similar functions are performed thereby.
For most RDBMS software, combinations of tables and views are used to access data stored in tables in the database. A view definition includes a query that, if processed, provides a temporary results table based on the results of the query at that point in time. Using an INSERT statement and an appropriately defined table in the database, the temporary results table can be stored in the database. To refresh this table, the user would need to perform a DELETE from the table and then perform the INSERT again.
Users can directly query against the created table, provided that the users are aware how the results were derived. Generally, the RDBMS is not aware that such a table is any different from any other table in the database. However, this table cannot be used by an optimizer within the RDBMS to improve performance, even though the table may contain data that would drastically improve the performance of other queries.
This leads to the notion of summary tables or materialized views as envisioned by the present invention. These tables are similar to the created table described above, except that the definition of the table is based on a “full select” (much like a view) that is materialized in the table. The columns of the table are based on the elements of the select list of the full select.
As noted above, summary tables must be maintained efficiently to reflect the periodic updates to the database. Given the vast amount of data involved, a critical feature of efficient maintenance methods is that they must be incremental in nature to avoid full re-computation of the queries that generated the summary tables. One class of summary tables that is known as “self-maintainable materialized views” may be maintained incrementally by processing only the set of updated rows without reference to the underlying tables of the query that generated the summary tables. Other summary tables can be maintained incrementally by processing the updated rows and visiting the underlying tables to re-compute some portions of the materialized view query.
Consider summary tables that satisfy the following criteria:
The summary tables of this set that have a single table reference in the FROM clause are self-maintainable but the remaining summary tables of the set that contain inner joins in the FROM clause are not self-maintainable.
The present invention comprises a method to incrementally maintain the set of summary tables that is defined by the preceding criteria. Prior work (see [MQM97]) has addressed the incremental maintenance of this set of summary tables when exactly one of the underlying tables in the FROM clause has been updated. The primary difference and contribution of the present invention is that it supports incremental maintenance of this set of summary tables when one or more of the underlying tables have been updated. In fact, the present invention supports incremental maintenance for all possible updates to the set of underlying tables of such summary tables. The method of this invention can also be applied to the propagation of joins in the absence of aggregation and to the deferred incremental maintenance of joins with and without aggregation.
Incremental Maintenance
As in prior work on incremental maintenance, the present invention is comprised of two independent phases: propagate and apply. Given the update, delete and insert operations on the underlying tables of the summary table, the propagate phase computes the set of rows that must be changed in the summary tables and how each of these rows must be changed. The apply phase simply joins the set of rows to be changed with the existing contents of the summary table and then given the results of this join, makes the appropriate changes to the summary table by updating or deleting existing rows or inserting new rows. The propagate phase is where the present invention handles the changes to the underlying tables of the summary table and is thus the focus of this specification.
Assume that there are two table functions per underlying table which give the sets of deleted and inserted rows for the underlying table. Updated rows are decomposed into two rows each: a deleted row containing the old values of the columns for the updated row and an inserted row containing the new values. These table functions are referred to as D(Ti) and I(Ti), respectively, for i=(1, 2, . . . N) where N is the number of table references in the query of the summary table and Tj is one such table reference. The inner workings of these table functions are not of importance to this discussion, but in the example of a bulk load append operation for the table referenced by Tj, I(Tj) might be implemented by a partial scan of the appended table starting at the first appended row. Note that a given table may appear multiple times in the set {T1, T2, . . . TN} and so, for example, the present invention supports joins of a table with itself.
The set of all changed rows for a given table reference Tj is the union of D(Tj) and I(Tj). This set is the “delta” for the table reference. The deleted and inserted rows of this delta are differentiated by adding a tag column that has value −1 for deletes and +1 for inserts. The delta for a given table reference Tj where the additional column is named “tag” can be expressed in SQL as:
Each relational operation can be thought of as a table function that creates a well-defined output table from each input table. During the propagate phase, the following rules are used to derive an output delta from an input delta for the relational operations of selection, projection, union all and inner join. These rules show how to propagate the tag column values across the operation. They are fully independent of each other which enables the present invention to support incremental maintenance for any combination of these operations in a FROM clause of the materialized view query.
Rules for Selection
The predicates (if any) are applied to the columns of the input delta to select the rows that will appear in the output delta. The tag column of the output delta is derived directly from the input delta tag column. The values of the tag column are not affected by the selection operation.
Rules for Projection
The columns that are required by the projection operation are selected from the input delta to produce the output delta. As for selection, the output delta tag column is derived directly from the input delta tag column without change to the values of this column.
Rules for Union All
Assume a general union all operation that has N operands where N>1. Some operands of the union may not have an input delta since there may be no table references in the operand or the tables referenced by the operand may not have been updated. The operands with no input delta are pruned from the union operation. If there is exactly one operand remaining (N=1), then the union operation is converted to a selection operation without predicates and the corresponding rule is applied. If N>1 after pruning, then the output delta tag column is derived directly from the tag column of the corresponding input row. The tag column values are not changed by the union all operation.
Rules for Inner Join
Assume an N-way inner join operation where M<=N of the join operands have deltas. Consider the 2-way case where both operands have input deltas (i.e. N=M=2). Given operands X and Y and the corresponding deltas named delta-X and delta-Y, then the result of the inner join operation X′><Y′ can be expressed as:
X′><Y′=(X+delta-X)><(Y+delta-Y)=X><Y+X><delta-Y+delta-X><Y+delta-X><delta-Y
Substituting for these in the above equation gives:
output−delta=(X′−delta-X)><delta-Y+delta-X><(Y′−delta-Y)+delta-X><delta-Y=(X′><delta-Y+delta-X><Y)−delta-X><delta-Y
Therefore, it can be seen that the output delta for this 2-way inner join may be computed by evaluating 3 inner joins and then performing set union and set difference operations on the results of these joins. Furthermore, it can be observed that all join combinations for which the join contains at least one delta appear in this equation. In general for all M, to compute the output delta requires the evaluation 2**M−1 joins. The general form of this equation is a set difference of form (A−B) where A is the result of a set union of the 2**(M−1) join terms that each contain an odd number of deltas and B is the result of a set union of the remaining 2**(M−1)−1 join terms that each contain an even number of deltas.
Note that when M<N, the non-delta operands appear unchanged in the join terms of this equation. For example, given operands X Y and Z where X and Y have deltas but Z does not (i.e. Z′=Z), then
output−delta=(X′><delta-Y><Z′+delta-X><Y′><Z′)−delta-X><delta-Y><Z′
However, rather than computing the set difference in the general form of this equation, the tag column of each join term is computed to keep track of which rows in the output delta of the join term are deletions and which are insertions, a union all operation is injected where its operands are all of the join terms of this equation, and then the union all rule is applied to the injected operation.
The rule to calculate the tag column of a given join term that contains K<=M deltas with corresponding input tag columns input-tag1, input-tag2, . . . , input-tagK is:
In summary, the inner join delta propagation rule is comprised of the following steps:
Note that this rule reduces to the simple selection rule when there is exactly one input operand (i.e. N=1) or there is only one input operand with a delta (i.e. M=1).
A set difference operation could have been used here and a corresponding rule created, but when the summary tables contain aggregation, this aggregation can be relied on to collapse the set of deleted and inserted rows for a given group and thus avoid the need to evaluate this set difference. Note that aggregation is not needed in the summary table definition to use this technique. The prior art does describe the use of a count column or a primary key for dealing with duplicates, wherein an internal aggregation is injected to maintain count duplication for joins. See, e.g., J. A. Blakeley, P. A. Larson, and F. W. Tompa, Efficiently Updating Materialized Views, Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 61–71, Washington, D.C, June 1986, and Y. Zhuge, H. Garcia-Molina, J. Hammer, and J. Widom, View Maintenance in a Warehousing Environment, Proceedings of the ACM Sigmod International Conference on Management of Data, pages 316–327, San Jose, Calif., May 1995, which are incorporated by reference herein. In fact, the tag column of the present invention supports this technique as well, although it need not be restricted to only aggregation.
Although the number of join terms generated by this rule is exponential in the number of input deltas, it is unlikely that there will be a large number of deltas in practice since there are typically few fact tables and the dimension tables are rarely updated. There are opportunities to optimize the number join terms for certain types of updates. For example, if it can be guaranteed that the results of the inner join operation can contain no duplicates and that the deltas contain only inserted rows (which may be common in data warehouses due to bulk load append operations), then it is possible to create exactly M join terms each of which contains a single and different delta and then apply a distinct operation on the results of the union operation to eliminate duplicates.
These rules are used to propagate the delta and corresponding tag column from the updated referenced tables through all the relational operations of the FROM clause. The group by relational operation may now be applied to the result to compute the delta changes to the aggregate functions of the summary table. The set of grouping columns is unchanged. The tag column is used to compute the delta change to the aggregate functions as follows:
If C is not nullable, then this can be simplified to SUM(TAG).
In order to guarantee correctness, the base table references in join expressions must be accessed using repeatable read isolation level unless there is an N to 1 relationship from the expression to the base table reference. In the latter case, concurrency can be improved by downgrading the isolation level to cursor stability.
In the example application, the computer system 100 of a credit card company stores credit card customer information, their credit card accounts, and transactions that customers made using credit cards for purchases. A possible database schema, comprising a “star” schema, is illustrated by
The following “CREATE TABLE” statements may be used to create the tables shown in
In this application, all tables in this database are partitioned via the partitioning keys defined above.
Given the following summary table:
Given that the fact tables TRANS and TRANSITEM have been updated, the following is an SQL version of the result of the propagate phase of compilation:
Block 500 represents the RDBMS computing a result set from at least one underlying base table in the database. In the preferred embodiment, the result set is generated using join operations with or without aggregation operations.
Block 502 represents the RDBMS materializing the computed result set in a summary table. In the preferred embodiment, the summary table satisfies the following criteria: (1) the summary table is defined as an aggregate query, (2) a FROM clause for the aggregate query contains only inner joins, (3) a FROM clause for the aggregate query contains only a single table reference, (4) the aggregate query does not contain a HAVING clause, (5) aggregate functions in a SELECT list of the query are self-maintainable with respect to insertions and deletions, and (6) a SELECT DISTINCT operation is not allowed.
Block 600 is a decision block that represents the optimizer determining whether there is one or more update, delete or insert statements in the query. If so, control transfers to Block 602; otherwise control transfers to Block 606.
Block 602 is a decision block that represents the optimizer determining whether there is one or more summary tables or materialized views referencing one or more tables modified by one of the update, insert, or delete statements in the query. If so, control transfers to Block 604; otherwise, control transfers to Block 606.
Block 604 represents the RDBMS compiling in incremental maintenance of the summary table, as further described in
Finally, Block 606 represents the RDBMS performing normal query optimization, building an execution plan, and executing the query.
Block 700 represents the RDBMS propagating the updates by computing a set of rows that must be changed in the summary table and how each of these rows must be changed. This Block is further described in
Block 702 represents the RDBMS applying the updates by associating a change with a row in the summary table and then making appropriate changes to the summary table by updating existing rows, deleting existing rows, and inserting new rows.
Block 800 represents the RDBMS decomposing each of the rows that must be changed in the summary table into a deleted row containing old values of the columns for the row and an inserted row containing new values of the columns for the row. In the case of deletes and inserts, this step only generates one row.
The underlying base tables are referred to as Ti for i=(1, 2, . . . , N, the deleted rows after decomposition are referred to as D(Ti), the inserted rows after decomposition are referred to as I(Ti), and the set of all updated rows for an underlying base table Tj is a union of D(Tj) and I(Tj). Note that each of the underlying base tables may appear multiple times in a set {T1, T2, . . . ,TN}.
The set of all updated rows for an underlying base table is referred to as a delta for the underlying base table. The deleted and inserted rows of the delta are differentiated by adding a tag column to each of the rows. In one embodiment, the tag column has a value of −1 for deleted rows and a value of +1 for inserted rows.
Block 802 represents the RDBMS deriving an output delta from an input delta for one or more relational operations used in computing the result set. This requires propagating the tag column values across each of the relational operations. This Block is further described in
Block 900 is a decision block that represents the RDBMS looping through the relational operations used in computing the result. For each of the operations, control transfers to Block 902; upon completion of the loop, the logic of
Block 902 is a decision block that represents the RDBMS determining whether the next relational operation is a selection operation. If so, control transfers to Block 904, which represents the RDBMS performing the steps illustrated in
Block 906 is a decision block that represents the RDBMS determining whether the next relational operation is a projection operation. If so, control transfers to Block 908, which represents the RDBMS performing the steps illustrated in
Block 910 is a decision block that represents the RDBMS determining whether the next relational operation is a union all operation. If so, control transfers to Block 912, which represents the RDBMS performing the steps illustrated in
Block 914 is a decision block that represents the RDBMS determining whether the next relational operation is an inner join operation. If so, control transfers to Block 916, which represents the RDBMS performing the steps illustrated in
Block 1000 represents the RDBMS applying zero or more predicates to one or more columns of the input delta to select the rows that will appear in the output delta.
Block 1002 represents the RDBMS deriving the tag column of the output delta directly from the input delta tag column, wherein the values of the tag column are not affected by the selection operation.
Block 1100 represents the RDBMS selecting one or more columns that are required by the projection operation from the input delta to produce the output delta.
Block 1102 represents the RDBMS deriving the output delta tag column directly from the input delta tag column without change to the values of this column.
Block 1200 represents the RDBMS pruning operands with no input delta from the union operation. This Block may include pruning operands that reference only table functions and/or pruning operands that reference an underlying base table that has not been updated.
Block 1202 is a decision block that represents the RDBMS determining whether there is exactly one operand remaining (N=1) after pruning. If so, control transfers to Block 1204; otherwise, control transfers to Block 1208.
Block 1204 represents the RDBMS converting the union operation to a selection operation without predicates.
Block 1206 represents the RDBMS invoking the selection function of
Block 1208 represents the RDBMS deriving the output delta tag column by unioning together the tag columns of the input deltas, wherein the tag column values are not changed by the union all operation.
Block 1300 represents the RDBMS generating one or more join terms for the inner join operation.
Block 1302 represents the RDBMS adding an output delta tag column to each of the generated join terms.
Block 1304 represents the RDBMS converting the inner join operation into a union all operation that includes the generated join terms as its operands.
Block 1306 represents the RDBMS recursively invoking the union all function of
This concludes the description of the preferred embodiment of the invention. The following describes some alternative embodiments for accomplishing the present invention. For example, any type of computer, such as a mainframe, minicomputer, or personal computer, could be used with the present invention. In addition, any software program adhering (either partially or entirely to the relational paradigm could benefit from the present invention.
In summary, the present invention discloses a method, apparatus, and article of manufacture for incrementally maintaining a summary table derived from a plurality of base tables in a database stored on the computer. A plurality of modifications are applied to the base tables, and the applied modifications are then propagated to the summary table. The propagated modifications include an associated tag column containing a tag value, and the tag value indicates per row of the propagated modifications how many rows are to be added to or deleted from the summary table.
The foregoing description of the preferred embodiment of the invention has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form disclosed. Many modifications and variations are possible in light of the above teaching.
Cochrane, Roberta Jo, Pirahesh, Mir Hamid, Sidle, Richard Sefton
Patent | Priority | Assignee | Title |
10002153, | May 14 2015 | Deephaven Data Labs LLC | Remote data object publishing/subscribing system having a multicast key-value protocol |
10002154, | Aug 24 2017 | Deephaven Data Labs LLC | Computer data system data source having an update propagation graph with feedback cyclicality |
10002155, | May 14 2015 | Deephaven Data Labs LLC | Dynamic code loading |
10003673, | May 14 2015 | Deephaven Data Labs LLC | Computer data distribution architecture |
10019138, | May 14 2015 | Deephaven Data Labs LLC | Applying a GUI display effect formula in a hidden column to a section of data |
10069943, | May 14 2015 | Deephaven Data Labs LLC | Query dispatch and execution architecture |
10176211, | May 14 2015 | Deephaven Data Labs LLC | Dynamic table index mapping |
10198465, | May 14 2015 | Deephaven Data Labs LLC | Computer data system current row position query language construct and array processing query language constructs |
10198466, | May 14 2015 | Deephaven Data Labs LLC | Data store access permission system with interleaved application of deferred access control filters |
10198469, | Aug 24 2017 | Deephaven Data Labs LLC | Computer data system data source refreshing using an update propagation graph having a merged join listener |
10212257, | May 14 2015 | Deephaven Data Labs LLC | Persistent query dispatch and execution architecture |
10241960, | May 14 2015 | Deephaven Data Labs LLC | Historical data replay utilizing a computer system |
10241965, | Aug 24 2017 | Deephaven Data Labs LLC | Computer data distribution architecture connecting an update propagation graph through multiple remote query processors |
10242040, | May 14 2015 | Deephaven Data Labs LLC | Parsing and compiling data system queries |
10242041, | May 14 2015 | Deephaven Data Labs LLC | Dynamic filter processing |
10346394, | May 14 2015 | Deephaven Data Labs LLC | Importation, presentation, and persistent storage of data |
10353893, | May 14 2015 | Deephaven Data Labs LLC | Data partitioning and ordering |
10430409, | Dec 20 2012 | INFOR US , LLC | Maintenance of active database queries |
10452649, | May 14 2015 | Deephaven Data Labs LLC | Computer data distribution architecture |
10496639, | May 14 2015 | Deephaven Data Labs LLC | Computer data distribution architecture |
10540351, | May 14 2015 | Deephaven Data Labs LLC | Query dispatch and execution architecture |
10552412, | May 14 2015 | Deephaven Data Labs LLC | Query task processing based on memory allocation and performance criteria |
10565194, | May 14 2015 | Deephaven Data Labs LLC | Computer system for join processing |
10565206, | May 14 2015 | Deephaven Data Labs LLC | Query task processing based on memory allocation and performance criteria |
10572474, | May 14 2015 | Deephaven Data Labs LLC | Computer data system data source refreshing using an update propagation graph |
10621168, | May 14 2015 | Deephaven Data Labs LLC | Dynamic join processing using real time merged notification listener |
10642829, | May 14 2015 | Deephaven Data Labs LLC | Distributed and optimized garbage collection of exported data objects |
10657184, | Aug 24 2017 | Deephaven Data Labs LLC | Computer data system data source having an update propagation graph with feedback cyclicality |
10678787, | May 14 2015 | Deephaven Data Labs LLC | Computer assisted completion of hyperlink command segments |
10691686, | May 14 2015 | Deephaven Data Labs LLC | Computer data system position-index mapping |
10783191, | Aug 24 2017 | Deephaven Data Labs LLC | Computer data distribution architecture for efficient distribution and synchronization of plotting processing and data |
10866943, | Aug 24 2017 | Deephaven Data Labs LLC | Keyed row selection |
10909183, | Aug 24 2017 | Deephaven Data Labs LLC | Computer data system data source refreshing using an update propagation graph having a merged join listener |
10915526, | May 14 2015 | Deephaven Data Labs LLC | Historical data replay utilizing a computer system |
10922311, | May 14 2015 | Deephaven Data Labs LLC | Dynamic updating of query result displays |
10929394, | May 14 2015 | Deephaven Data Labs LLC | Persistent query dispatch and execution architecture |
11023462, | May 14 2015 | Deephaven Data Labs LLC | Single input graphical user interface control element and method |
11126662, | Aug 24 2017 | Deephaven Data Labs LLC | Computer data distribution architecture connecting an update propagation graph through multiple remote query processors |
11151133, | May 14 2015 | Deephaven Data Labs, LLC | Computer data distribution architecture |
11238036, | May 14 2015 | Deephaven Data Labs LLC | System performance logging of complex remote query processor query operations |
11249994, | Nov 14 2016 | Deephaven Data Labs LLC | Query task processing based on memory allocation and performance criteria |
11263211, | May 14 2015 | Deephaven Data Labs, LLC | Data partitioning and ordering |
11449557, | Aug 24 2017 | Deephaven Data Labs LLC | Computer data distribution architecture for efficient distribution and synchronization of plotting processing and data |
11514037, | May 14 2015 | Deephaven Data Labs LLC | Remote data object publishing/subscribing system having a multicast key-value protocol |
11556528, | May 14 2015 | Deephaven Data Labs LLC | Dynamic updating of query result displays |
11574018, | Aug 24 2017 | Deephaven Data Labs LLC | Computer data distribution architecture connecting an update propagation graph through multiple remote query processing |
11663208, | May 14 2015 | Deephaven Data Labs LLC | Computer data system current row position query language construct and array processing query language constructs |
11687529, | May 14 2015 | Deephaven Data Labs LLC | Single input graphical user interface control element and method |
11860948, | Aug 24 2017 | Deephaven Data Labs LLC | Keyed row selection |
7359923, | Aug 20 2004 | Meta Platforms, Inc | Online incremental deferred integrity processing and maintenance of rolled in and rolled out data |
7490110, | Mar 24 2006 | International Business Machines Corporation | Predictable query execution through early materialization |
8078655, | Jun 04 2008 | Microsoft Technology Licensing, LLC | Generation of database deltas and restoration |
8170999, | Aug 20 2004 | International Business Machines Corporation | Online incremental deferred integrity processing and maintenance of rolled in and rolled out data |
8566341, | Nov 12 2009 | Oracle International Corporation | Continuous aggregation on a data grid |
8655923, | Dec 19 2008 | SAP SE | Simple aggregate mode for transactional data |
9424304, | Dec 20 2012 | INFOR US , LLC | Maintenance of active database queries |
9471662, | Jun 24 2013 | SAP SE | Homogeneity evaluation of datasets |
9612959, | May 14 2015 | Deephaven Data Labs LLC | Distributed and optimized garbage collection of remote and exported table handle links to update propagation graph nodes |
9613018, | May 14 2015 | Deephaven Data Labs LLC | Applying a GUI display effect formula in a hidden column to a section of data |
9613109, | May 14 2015 | Deephaven Data Labs LLC | Query task processing based on memory allocation and performance criteria |
9619210, | May 14 2015 | Deephaven Data Labs LLC | Parsing and compiling data system queries |
9633060, | May 14 2015 | Deephaven Data Labs LLC | Computer data distribution architecture with table data cache proxy |
9639570, | May 14 2015 | Deephaven Data Labs LLC | Data store access permission system with interleaved application of deferred access control filters |
9672238, | May 14 2015 | Deephaven Data Labs LLC | Dynamic filter processing |
9679006, | May 14 2015 | Deephaven Data Labs LLC | Dynamic join processing using real time merged notification listener |
9690821, | May 14 2015 | Deephaven Data Labs LLC | Computer data system position-index mapping |
9703825, | Oct 17 2013 | SYBASE, Inc. | Maintenance of a pre-computed result set |
9710511, | May 14 2015 | Deephaven Data Labs LLC | Dynamic table index mapping |
9760591, | May 14 2015 | Deephaven Data Labs LLC | Dynamic code loading |
9805084, | May 14 2015 | Deephaven Data Labs LLC | Computer data system data source refreshing using an update propagation graph |
9836494, | May 14 2015 | Deephaven Data Labs LLC | Importation, presentation, and persistent storage of data |
9836495, | May 14 2015 | Deephaven Data Labs LLC | Computer assisted completion of hyperlink command segments |
9858326, | Oct 10 2012 | Microsoft Technology Licensing, LLC | Distributed data warehouse |
9886469, | May 14 2015 | Deephaven Data Labs LLC | System performance logging of complex remote query processor query operations |
9898496, | May 14 2015 | Deephaven Data Labs LLC | Dynamic code loading |
9934266, | May 14 2015 | Deephaven Data Labs LLC | Memory-efficient computer system for dynamic updating of join processing |
Patent | Priority | Assignee | Title |
5377306, | Feb 10 1989 | The Secretary of State for Defence in Her Britannic Majesty's Government | Heuristic processor |
5379422, | Jan 16 1992 | Oracle International Corporation | Simple random sampling on pseudo-ranked hierarchical data structures in a data processing system |
5414839, | Jun 19 1992 | HEWLETT-PACKARD DEVELOPMENT COMPANY, L P | Hybrid lock escalation and de-escalation protocols |
5511116, | Aug 25 1992 | Intellectual Ventures II LLC | Method of creating and accessing value tables in a telecommunication service creation and execution environment |
5584024, | Mar 24 1994 | SPEEDWARE LTEE LTD | Interactive database query system and method for prohibiting the selection of semantically incorrect query parameters |
5590324, | Feb 07 1995 | International Business Machines Corporation | Optimization of SQL queries using universal quantifiers, set intersection, and max/min aggregation in the presence of nullable columns |
5666526, | Sep 02 1993 | Microsoft Technology Licensing, LLC | Method and system for supporting scrollable, updatable database queries |
5706495, | May 07 1996 | International Business Machines Corporation | Encoded-vector indices for decision support and warehousing |
5761657, | Dec 21 1995 | TERADATA US, INC | Global optimization of correlated subqueries and exists predicates |
5793968, | Aug 19 1994 | Cisco Technology, Inc | Scalable distributed computing environment |
5812840, | Mar 24 1994 | SPEEDWARE LTEE LTD | Database query system |
5822751, | Dec 16 1996 | Microsoft Technology Licensing, LLC | Efficient multidimensional data aggregation operator implementation |
5835904, | Oct 31 1995 | Microsoft Technology Licensing, LLC | System and method for implementing database cursors in a client/server environment |
5870746, | Oct 12 1995 | NCR Corporation | System and method for segmenting a database based upon data attributes |
5890150, | Jan 24 1997 | Hitachi, Ltd. | Random sampling method for use in a database processing system and a database processing system based thereon |
5893125, | Jan 27 1995 | Borland Software Corporation | Non-modal database system with methods for incremental maintenance |
5918232, | Nov 26 1997 | SYMPHONYRPM, INC | Multidimensional domain modeling method and system |
5926819, | May 30 1997 | ORACLE INTERNATIONAL CORPORATION OIC | In-line triggers |
6191797, | May 22 1996 | Canon Kabushiki Kaisha | Expression tree optimization for processing obscured graphical objects |
6192460, | Dec 16 1997 | Hewlett Packard Enterprise Development LP | Method and apparatus for accessing data in a shadow set after a failed data operation |
6289335, | Jun 23 1997 | ORACLE INTERNATIONAL CORPORATION OIC | Fast refresh of snapshots containing subqueries |
6298342, | Mar 16 1998 | Microsoft Technology Licensing, LLC | Electronic database operations for perspective transformations on relational tables using pivot and unpivot columns |
6353828, | May 14 1999 | ORACLE INTERNATIONAL CORPORATION OIC | Concurrency control for transactions that update base tables of a materialized view using different types of locks |
6353835, | Aug 03 1998 | RPX Corporation | Technique for effectively maintaining materialized views in a data warehouse |
6408163, | Dec 31 1997 | RPX CLEARINGHOUSE LLC | Method and apparatus for replicating operations on data |
6449609, | Dec 28 1998 | Oracle International Corporation | Using materialized view to process a related query containing a one to many lossless join |
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
May 10 2000 | SIDLE, RICHARD SEFTON | International Business Machines Corporation | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 010828 | /0632 | |
May 11 2000 | COCHRANE, ROBERTA JO | International Business Machines Corporation | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 010828 | /0632 | |
May 15 2000 | PIRAHESH, MIR HAMID | International Business Machines Corporation | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 010828 | /0632 | |
May 16 2000 | International Business Machines Corporation | (assignment on the face of the patent) | / |
Date | Maintenance Fee Events |
Oct 04 2005 | ASPN: Payor Number Assigned. |
Apr 17 2009 | M1551: Payment of Maintenance Fee, 4th Year, Large Entity. |
Aug 16 2013 | REM: Maintenance Fee Reminder Mailed. |
Oct 11 2013 | M1552: Payment of Maintenance Fee, 8th Year, Large Entity. |
Oct 11 2013 | M1555: 7.5 yr surcharge - late pmt w/in 6 mo, Large Entity. |
Apr 18 2017 | M1553: Payment of Maintenance Fee, 12th Year, Large Entity. |
Date | Maintenance Schedule |
Jan 03 2009 | 4 years fee payment window open |
Jul 03 2009 | 6 months grace period start (w surcharge) |
Jan 03 2010 | patent expiry (for year 4) |
Jan 03 2012 | 2 years to revive unintentionally abandoned end. (for year 4) |
Jan 03 2013 | 8 years fee payment window open |
Jul 03 2013 | 6 months grace period start (w surcharge) |
Jan 03 2014 | patent expiry (for year 8) |
Jan 03 2016 | 2 years to revive unintentionally abandoned end. (for year 8) |
Jan 03 2017 | 12 years fee payment window open |
Jul 03 2017 | 6 months grace period start (w surcharge) |
Jan 03 2018 | patent expiry (for year 12) |
Jan 03 2020 | 2 years to revive unintentionally abandoned end. (for year 12) |