Methods, apparatus, systems and articles of manufacture are disclosed. An example partitioned computer database system includes a plurality of nodes, a data director to distribute a plurality of portions of database data across the plurality of nodes, queriers associated with respective ones of the plurality of nodes, the queriers to execute respective sub-queries of respective portions of the database data, and a coordinator to receive a request to query the database data, and merge results of the plurality of sub-queries to form a response to the request.
9. A method, comprising:
randomly or pseudo-randomly distributing respective portions of database data across a plurality of nodes without respect to content of the database data;
decomposing a request to query the database data, by executing an instruction with at least one processor, to form a plurality of sub-queries of respective portions of the database data based on at least one sharding key value, the request including the at least one sharding key value provided as part of a conditional statement;
instructing queriers to execute the sub-queries on respective ones of the nodes and provide results of the execution of the respective ones of the plurality of sub-queries; and
combining results of the plurality of sub-queries, by executing an instruction with at least one processor, to form a response to the request.
14. A non-transitory computer-readable storage medium comprising instructions that, when executed, cause a machine to at least:
randomly or pseudo-randomly distribute respective portions of database data across a plurality of nodes of a partitioned database without respect to content of the database data;
decompose a request to query the database data to form a plurality of sub-queries of respective portions of the database data based on at least one sharding key value, the request including the at least one sharding key value provided as part of a conditional statement;
instruct queriers of respective nodes to execute the sub-queries on respective portions of the database data and provide results of the execution of the respective ones of the plurality of sub-queries; and
combine results of the plurality of sub-queries to form a response to the request.
1. A partitioned database system, comprising:
a plurality of nodes;
a data director to randomly or pseudo-randomly distribute a plurality of portions of database data across the plurality of nodes without respect to content of the database data;
queriers associated with respective ones of the plurality of nodes, the queriers to execute respective ones of a plurality of sub-queries of respective portions of the database data; and
a coordinator to:
receive a request to query the database data, the request including at least one sharding key value provided as part of a conditional statement;
generate the plurality of sub-queries based on the at least one sharding key value;
instruct the queriers to execute the respective ones of the plurality of sub-queries and provide results of the execution of the respective ones of the plurality of sub-queries to the coordinator; and
merge the results of the plurality of sub-queries to form a response to the request.
2. The partitioned database system of
3. The partitioned database system of
4. The partitioned database system of
5. The partitioned database system of
6. The partitioned database system of
form the plurality of sub-queries based on the request; and
send the plurality of sub-queries to others of the queriers.
7. The partitioned database system of
8. The partitioned database system of
10. The method of
12. The method of
send the sub-queries to respective ones of the plurality of nodes;
receive the results of the sub-queries from the respective ones of the nodes; and
combine the results to form the response.
13. The method of
15. The non-transitory computer-readable storage medium of
16. The non-transitory computer-readable storage medium of
17. The non-transitory computer-readable storage medium of
This disclosure relates generally to databases, and, more particularly, to methods, apparatus and systems to aggregate partitioned computer database data.
Data driven security applies big data analytics to security data streams. Security data streams may be generated by collecting data coming from large numbers of machines distributed across large-scale customer systems.
When useful, the same reference numbers will be used in the drawing(s) and accompanying written description to refer to the same or like parts. Connecting lines or connectors shown in the various figures presented are intended to represent example functional relationships and/or physical or logical couplings between the various elements.
In the field of computer security information and event management (SIEM), security operations center analysts need to be able to interactively control data stream aggregation and filtering to identify data stream properties that might otherwise remain unobserved. However, as customer deployments have grown to cloud scale, security data streams have become so large (e.g., hundreds of millions of events) that stream ingestion and aggregation can no longer be handled by a single database node. Accordingly, systems, methods and apparatus that scale beyond the existing limits of a single node are disclosed herein. Some examples disclosed herein scale stream data ingestion by partitioning (e.g., spreading, distributing) the stream data across a plurality of nodes as the stream data is received and ingested (i.e., in substantially real-time).
In known large systems, data that has been partitioned in that way across a plurality of nodes cannot be joined (e.g., aggregated, combined, etc.) without a data re-shuffle. For example, given two tables R and S, R JOIN S is the set of all combinations of tuples in R and S that have common attribute names. Consider, for example, common attributes that are a subset of fields K. To compute R JOIN S, the database system must take each row r in R and find all the tuples s in S where r.k=s.k. To compute this in a distributed system, where both R and S are distributed, each physical node in the cluster must provide the data from either S or R to the other nodes, which can then do a local merge. In a local merge between R and a partition P(S), each row r in R is compared with all the rows in P(S) to ensure the total Cartesian product is considered for the JOIN. The providing of the data between nodes is known as re-shuffling, and makes a real-time JOIN infeasible on known distributed systems. In contrast, according to teachings of this disclosure, the partitioned data can be scalably filtered, joined and aggregated in real-time, across the plurality of nodes without re-shuffling. Currently-available, expensive and complex systems are only capable of processing approximately one million events per second (EPS). In stark contrast, the teachings of this disclosure have been used to demonstrate systems capable of over two million EPS. As EPS is an important benchmark in the field of SIEM, a 2× improvement in EPS represents a significant improvement in database systems, apparatus and methods for SIEM. Such improvements allow SIEM analysts to more quickly detect security events and respond to mitigate them in the computer systems they are monitoring, thereby lessening the chances of, for example, data loss, data theft, computer system unavailability, etc.
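The shuffle-free JOIN described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the disclosed implementation: it assumes in-memory lists of dict-shaped rows, a single join key, and a dimension table that is fully replicated on every node (the function and variable names are hypothetical).

```python
def local_join(fact_partition, dimension_replica, key):
    """Join one node's fact rows against its full local dimension copy.

    Because every node holds a complete replica of the dimension table,
    each fact row finds all matching dimension rows locally, and no
    rows need to be re-shuffled between nodes.
    """
    # Index the dimension rows by the join key (a local hash join).
    index = {}
    for d in dimension_replica:
        index.setdefault(d[key], []).append(d)

    # Each local fact row is matched against the local index only.
    joined = []
    for f in fact_partition:
        for d in index.get(f[key], []):
            joined.append({**d, **f})
    return joined


def distributed_join(partitions, dimension_replica, key):
    """The cluster-wide JOIN is the union of per-node local joins:
    no node ever sends fact or dimension rows to another node."""
    result = []
    for part in partitions:  # in practice, executed in parallel per node
        result.extend(local_join(part, dimension_replica, key))
    return result
```

Because every matching (fact, dimension) pair is guaranteed to co-exist on the node that holds the fact row, the union of the local joins equals the full JOIN without any inter-node data movement.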
Reference will now be made in detail to non-limiting examples, some of which are illustrated in the accompanying drawings.
To partition the database data 102 into the portions 106A-N, the example system 100 of
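The content-independent (random or pseudo-random) distribution recited in the claims can be illustrated with a minimal sketch. The `distribute` function and its parameters are hypothetical, and it assumes rows arrive as an in-memory sequence rather than a live stream:

```python
import random


def distribute(rows, num_nodes, seed=None):
    """Assign each incoming row to a node pseudo-randomly.

    The assignment ignores row content entirely, so no content-based
    skew can concentrate load on any single node.
    """
    rng = random.Random(seed)  # seeded for reproducibility in this sketch
    portions = [[] for _ in range(num_nodes)]
    for row in rows:
        portions[rng.randrange(num_nodes)].append(row)
    return portions
```

A rotating (round-robin) pattern, as mentioned in the examples below, would replace the random draw with `portions[i % num_nodes]`; both are independent of the data's content.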
In the illustrated example of
In the illustrated example of
The example portions 106A-N, the example fact table 112, and the example dimension table 110 may be stored on any number and/or type(s) of computer-readable storage device(s) and/or disk(s) using any number and/or type(s) of data structure(s).
To query the database data 102, the example nodes 104A of
In the illustrated example, the querier 114 of the node 104A, which is acting as the coordinator, forms (e.g., defines, generates, etc.) sub-queries 120B, 120C, . . . 120N to be executed by the queriers 114 of respective nodes 104B-N. In some examples, the coordinator forms a sub-query 120A to be executed by the coordinator (i.e., the querier 114 of node 104A). The example coordinator forms a sub-query 120A-N for each node 104A-N, which may be the same, that stores a portion 106A-N of the database data related to the request 116. Example SQL sub-queries 120A-N are:
The example coordinator of
The example API 132 sends a query 116 to the designated coordinator 118 (e.g., one of the queriers 114). The coordinator 118 performs three actions: (1) determines which nodes need to participate in the resolution of the query 116; (2) creates sub-queries 120A-N, which may be the same, to be computed locally on each participating node 104A-N; and (3) consolidates the individual results 122A-N into a single result set. The determination of the participating nodes is done by, for example, observing the sharding key values resulting from the WHERE condition of the example query 116. If no sharding key is provided as part of the WHERE condition, then one replica of each shard participates in the query 116 (e.g., all shards participate). If a set of sharding key-values is provided as part of the WHERE condition, then the knowledge of which shards manage those key-values is used to determine to which shards each sub-query 120A-N is sent. The creation of the sub-queries 120A-N includes translation of aggregate functions and, if sharding key-values are present, filtering criteria segregated per node 104A-N according to the data managed by each shard. The translation is implemented by, for example, mapping one high-level aggregate function (e.g., from the example AVG(s)) to one or more lower-level aggregate functions to be computed in the sub-queries 120A-N (e.g., COUNT(F.s) and SUM(F.s) in the illustrated example). The filtering conditions, if present, are inserted for the data range managed by each shard (the example sub-queries 120A-N shown above do not use sharding keys). The data shuffling inherent in known distributed JOIN operations is not required in the examples disclosed herein, at least because facts (e.g., alerts) are partitioned and dimensions (e.g., rules) are replicated. This obviates the need to provide either the facts or the dimensions over a network to ensure all combinations of facts and dimensions can be considered during a JOIN operation.
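The coordinator's three actions can be sketched as follows. This is a hypothetical illustration of the described behavior, not the patented implementation: the shard map, the table and column names, and the restriction to a single AVG aggregate are all assumptions made for the sketch.

```python
def participating_shards(shard_map, key_values):
    """Action (1): if sharding-key values are present in the WHERE
    condition, only the shards managing those values participate;
    otherwise one replica of every shard participates."""
    if not key_values:
        return set(shard_map.values())
    return {shard_map[k] for k in key_values}


def make_subquery(table, key_values=None):
    """Action (2): translate the high-level AVG(s) into lower-level
    COUNT(s) and SUM(s) that each shard can compute locally."""
    sql = f"SELECT COUNT(s), SUM(s) FROM {table}"
    if key_values:
        keys = ", ".join(str(k) for k in sorted(key_values))
        sql += f" WHERE shard_key IN ({keys})"
    return sql


def consolidate(partials):
    """Action (3): merge per-shard (count, sum) pairs into one AVG."""
    total_count = sum(c for c, _ in partials)
    total_sum = sum(s for _, s in partials)
    return total_sum / total_count if total_count else None
```

Note that AVG cannot be merged by averaging per-shard averages; decomposing it into COUNT and SUM is what makes the per-shard partial results combinable into an exact global result.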
In some examples, the coordinator identifies whether any nodes 104B-N fail to respond to the sub-queries 120B-N. If/when a node 104B-N fails to respond, the coordinator stores a handoff hint in a hints directory 134 on the affected node 104B-N for handling by cluster management processes.
In the illustrated example of
To enable the client application 126 to communicate with the nodes 104A-N, the example client application 126 of
While an example manner of implementing the partitioned database system 100 is illustrated in
In the illustrated example of
To manage data persistency, the example nodes 104A-N of
To manage the cluster 103, the example nodes 104A-N of
While an example manner of implementing the example cluster 103 and the example nodes 104A-N of
A flowchart representative of example hardware logic or machine-readable instructions for implementing the example nodes 104A-N, and/or, more generally, the example cluster 103 of
As mentioned above, the example process of
“Including” and “comprising” (and all forms and tenses thereof) are used herein to be open ended terms. Thus, whenever a claim employs any form of “include” or “comprise” (e.g., comprises, includes, comprising, including, having, etc.) as a preamble or within a claim recitation of any kind, it is to be understood that additional elements, terms, etc. may be present without falling outside the scope of the corresponding claim or recitation. As used herein, when the phrase “at least” is used as the transition term in, for example, a preamble of a claim, it is open-ended in the same manner as the term “comprising” and “including” are open ended. The term “and/or” when used, for example, in a form such as A, B, and/or C refers to any combination or subset of A, B, C such as (1) A alone, (2) B alone, (3) C alone, (4) A with B, (5) A with C, and (6) B with C.
The program of
The coordinator waits to receive results 122A-N for the sub-queries 120A-N from the identified nodes 104A-N (block 410). When the results 122A-N have been received (block 410), the coordinator combines the results (block 412) and reduces the results to, for example, remove redundant data (block 414). The coordinator sends a response 124 with the results to the client application 126 (block 416), and control exits from the example program of
Returning to block 410, when not all sub-query results 122A-N have been received (block 410), the coordinator determines whether a timeout has occurred (block 418). If a timeout has not occurred (block 418), the coordinator continues to wait for sub-query results 122A-N (block 410). If a timeout has occurred (block 418), the coordinator stores a hinted handoff notice for the node(s) 104A-N from which sub-query results 122A-N have not been received (block 420), and control proceeds to block 412 to combine the sub-query results 122A-N that were received.
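The wait/timeout/hinted-handoff loop of blocks 410-420 can be sketched as follows. The `poll` and `store_hint` callables are hypothetical stand-ins for the cluster's result transport and the hints directory 134:

```python
import time


def gather_results(pending, poll, store_hint, timeout_s):
    """Collect sub-query results until all arrive or a timeout occurs.

    `pending` is the set of nodes still owing results; `poll` returns a
    {node: result} dict of newly arrived answers; `store_hint` records a
    hinted handoff for a silent node. On timeout, the received results
    are still combined by the caller (blocks 412-414 above).
    """
    results = {}
    deadline = time.monotonic() + timeout_s
    while pending and time.monotonic() < deadline:
        for node, result in poll().items():
            if node in pending:
                pending.remove(node)
                results[node] = result
        time.sleep(0.01)  # avoid busy-waiting between polls
    for node in pending:      # nodes that never responded (block 418)
        store_hint(node)      # hinted handoff for cluster management
    return results
```

The design choice mirrored here is that a slow or failed node degrades the answer (partial results plus a recorded hint) rather than blocking the response indefinitely.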
The processor platform 500 of the illustrated example includes a processor 510. The processor 510 of the illustrated example is hardware. For example, the processor 510 can be implemented by one or more integrated circuits, logic circuits, microprocessors, GPUs, DSPs, or controllers from any desired family or manufacturer. The hardware processor may be a semiconductor-based (e.g., silicon-based) device. In this example, the processor implements the example queriers 114, the example coordinator, the example service interface modules 302A-N, the example data management modules 304A-N, the example cluster management modules 306A-N, the example data director 108, the example client application 126, and the example API 132.
The processor 510 of the illustrated example includes a local memory 512 (e.g., a cache). The processor 510 of the illustrated example is in communication with a main memory including a volatile memory 514 and a non-volatile memory 516 via a bus 518. The volatile memory 514 may be implemented by Synchronous Dynamic Random-Access Memory (SDRAM), Dynamic Random-Access Memory (DRAM), RAMBUS® Dynamic Random-Access Memory (RDRAM®) and/or any other type of random access memory device. The non-volatile memory 516 may be implemented by flash memory and/or any other desired type of memory device. Access to the main memory 514, 516 is controlled by a memory controller.
The processor platform 500 of the illustrated example also includes an interface circuit 520. The interface circuit 520 may be implemented by any type of interface standard, such as an Ethernet interface, a universal serial bus (USB), a Bluetooth® interface, a near field communication (NFC) interface, and/or a PCI express interface.
In the illustrated example, one or more input devices 522 are connected to the interface circuit 520. The input device(s) 522 permit(s) a user to enter data and/or commands into the processor 510. The input device(s) can be implemented by, for example, an audio sensor, a microphone, a camera (still or video), a keyboard, a button, a mouse, a touchscreen, a track-pad, a trackball, isopoint and/or a voice recognition system.
One or more output devices 524 are also connected to the interface circuit 520 of the illustrated example. The output devices 524 can be implemented, for example, by display devices (e.g., a light emitting diode (LED), an organic light emitting diode (OLED), a liquid crystal display (LCD), a cathode ray tube (CRT) display, an in-place switching (IPS) display, a touchscreen, etc.), a tactile output device, a printer and/or speaker. The interface circuit 520 of the illustrated example, thus, typically includes a graphics driver card, a graphics driver chip and/or a graphics driver processor. The example dashboard 130 may be displayed on an output device 524.
The interface circuit 520 of the illustrated example also includes a communication device such as a transmitter, a receiver, a transceiver, a modem, a residential gateway, a wireless access point, and/or a network interface to facilitate exchange of data with external machines (e.g., computing devices of any kind) via a network 526. The communication can be via, for example, an Ethernet connection, a digital subscriber line (DSL) connection, a telephone line connection, a coaxial cable system, a satellite system, a line-of-sight wireless system, a cellular telephone system, etc. In some examples of a Wi-Fi system, the interface circuit 520 includes a radio frequency (RF) module, antenna(s), amplifiers, filters, modulators, etc.
The processor platform 500 of the illustrated example also includes one or more mass storage devices 528 for storing software and/or data. Examples of such mass storage devices 528 include floppy disk drives, hard drive disks, CD drives, Blu-ray disc drives, redundant array of independent disks (RAID) systems, and DVD drives. In the illustrated example, the example portions 106A-N, the example fact table 112, and the example dimension table 110 are stored on the mass storage device 528.
Coded instructions 532 including the coded instructions of
From the foregoing, it will be appreciated that example systems, methods, apparatus and articles of manufacture have been disclosed that aggregate partitioned database data, and that make computer operations more efficient by doing so. Thus, through use of the teachings of this disclosure, computers can operate more efficiently by processing database data in real-time at rates that were previously infeasible.
Example methods, systems, apparatus, and articles of manufacture to aggregate partitioned database data are disclosed herein. Further examples and combinations thereof include at least the following.
Example 1 is a partitioned database system that includes:
Example 2 is the partitioned database system of example 1, wherein at least some of the nodes store their respective portions of the database data in a horizontally-arranged fact table.
Example 3 is the partitioned database system of any of examples 1 to 2, wherein the pattern is at least one of a rotating pattern, or a random pattern.
Example 4 is the partitioned database system of any of examples 1 to 3, wherein the queriers implement a distributed interface, the distributed interface to monitor a topology of storage devices associated with the nodes, and a real-time status of the partitioned database system.
Example 5 is the partitioned database system of any of examples 1 to 4, wherein a first of the queriers is to perform the respective sub-query without a shuffle of the respective portion of the database data.
Example 6 is the partitioned database system of any of examples 1 to 5, wherein a first of the sub-queries is a linearly-scalable query.
Example 7 is the partitioned database system of any of examples 1 to 6, wherein the coordinator is a first of the queriers, and the coordinator is to:
Example 8 is the partitioned database system of example 7, wherein the first of the queriers is to decompose the request to form the sub-queries.
Example 9 is the partitioned database system of any of examples 1 to 8, wherein the request is received from a client application.
Example 10 is a method that includes:
Example 11 is the method of example 10, wherein distributing the database data is according to at least one of a rotating pattern, or a random pattern.
Example 12 is the method of any of examples 10 to 11, wherein a first of the sub-queries does not shuffle the respective portion of the database data.
Example 13 is the method of any of examples 10 to 11, wherein a first of the plurality of sub-queries is linearly scalable.
Example 14 is the method of any of examples 10 to 13, wherein receiving the query request and decomposing the query request to form the plurality of sub-queries is performed on a first node of the plurality of nodes, the first node to:
Example 15 is the method of example 14, wherein combining the results includes merging and reducing the results of the sub-queries.
Example 16 is a non-transitory computer-readable storage medium storing instructions that, when executed, cause a machine to at least:
Example 17 is the non-transitory computer-readable storage medium of example 16, wherein a first of the sub-queries is linearly scalable.
Example 18 is the non-transitory computer-readable storage medium of any of examples 16 to 17, including further instructions that, when executed, cause the machine to combine the results by merging and reducing the results of the sub-queries.
Example 19 is the non-transitory computer-readable storage medium of any of examples 16 to 18, wherein a first of the sub-queries does not shuffle the respective portion of the database data.
Example 20 is the non-transitory computer-readable storage medium of any of examples 16 to 19, wherein a distribution pattern of the database data is not dependent on data content.
Although certain example methods, apparatus and articles of manufacture have been disclosed herein, the scope of coverage of this patent is not limited thereto. On the contrary, this patent covers all methods, apparatus and articles of manufacture fairly falling within the scope of the claims of this patent.
Michelis, Pablo A., Stewart, Brian H.
Executed on | Assignor | Assignee | Conveyance | Reel/Frame
Mar 26 2018 | | McAfee, LLC | (assignment on the face of the patent) |
Mar 26 2018 | STEWART, BRIAN H. | McAfee, LLC | Assignment of assignors interest | 045394/0087
Mar 29 2018 | MICHELIS, PABLO A. | McAfee, LLC | Assignment of assignors interest | 045394/0087
Mar 01 2022 | McAfee, LLC | JPMorgan Chase Bank, N.A., as administrative agent and collateral agent | Security interest | 059354/0335
Mar 01 2022 | McAfee, LLC | JPMorgan Chase Bank, N.A., as administrative agent | Corrective assignment to correct the patent titles and remove duplicates in the schedule previously recorded at reel 059354, frame 0335 | 060792/0307