Method, system, and computer program product for computing histogram aggregations

Method, system, and computer program product for computing histogram aggregations
US5960435

A data record transformation that computes histograms and aggregations quickly for an incoming record stream. The data record transformation computes histograms and aggregations in one-step, thereby, avoiding the creation of a large intermediate result. The data record transformation operates in a streaming fashion on each record in an incoming record stream. Little memory is required to operate on one record or a few records at a time. According to a first embodiment, a method, system, and computer program product for transforming sorted data records is provided. A data transformation unit includes a binning module and a histogram aggregation module. The histogram aggregation module processes each binned and sorted record to form an aggregate record in a histogram format in one step. data received in each incoming binned and sorted record is expanded and accumulated in an aggregate record for matching group-by fields. According to a second embodiment, a method, system, and computer program product for transforming unsorted data records is provided. An associative data structure holds a collection of partially aggregated histogram records. A histogram aggregation module processes each binned record to form an aggregate record in a histogram format in one step. input records from the unordered record stream are matched against the collection of partially aggregated histogram records and expanded and accumulated into the aggregate histogram record having matching group-by fields.

PTO Wrapper PDF
Dossier Espace Google

Patent 5960435
Priority Mar 11 1997
Filed Mar 11 1997
Issued Sep 28 1999
Expiry Mar 11 2017
Inventors Haber, Ebe…
Assg.orig Silicon Gr…
Assg.curr RPX Corpor…
Entity Large
Referenced by 74
References 43
Maint.: all paid

CROSS-REFERENCE TO R…
BACKGROUND OF THE IN…
SUMMARY OF THE INVEN…
BRIEF DESCRIPTION OF…
DETAILED DESCRIPTION…

17. A system for transforming a stream of sorted input data records into an aggregate record, each input data record including at least one field to be binned and at least one value field, and each aggregate record including an aggregation result field, wherein the value field stores data to be aggregated, and wherein the aggregation result field includes at least one location and stores aggregated data in a histogram format, the system comprising:

a binning module that, for each input data record in the stream, evaluates data in the input data record field to be binned to determine a bin-index value, wherein the bin-index value identifies locations of the aggregation result field the value field data should be aggregated into; and

a histogram aggregation module that, for each input data record in the stream, aggregates the value field data into the aggregation result field location identified by the bin-index value.

18. A system for transforming a stream of input data records into a partially aggregated record, each input data record including at least one field to be binned and at least one value field, and each partially aggregated record including an aggregation result field, wherein the value field stores data to be aggregated, and wherein the aggregation result field includes at least one location and stores aggregated data in a histogram format, the system comprising:

a histogram aggregation module that, for each input data record in the stream, aggregates the value field data into the aggregation result field location identified by the bin-index value.

8. A method for transforming a stream of input data records into a partially aggregated record, each input data record including at least one group-by field, at least one field to be binned, and at least one value field, and each partially aggregated record including at least one group-by field and an aggregation result field, wherein the input data record group-by field stores data to be matched against data stored in the partially aggregated record group-by field, wherein the value field stores data to be aggregated, and wherein the aggregation result field includes at least one location and stores aggregated data in a histogram format, the method comprising the steps of:

evaluating, for each input data record in the stream, data in the input data record field to be binned to determine a bin-index value, wherein the bin-index value identifies locations of the aggregation result field the value field data should be aggregated into; and

for each input data record in the stream, aggregating the value field data into the aggregation result field location identified by the bin-index value, when data in the input data record group-by field matches data in the partially aggregated record group-by field.

3. A method for transforming a stream of sorted input data records into an aggregate record, each input data record including at least one group-by field, at least one field to be binned, and at least one value field, and each aggregate record including at least one group-by field and an aggregation result field, wherein the input data record group-by field stores data to be matched against data stored in the aggregate record group-by field, wherein the input data records are sorted by the data in the group-by field, wherein the value field stores data to be aggregated, and wherein the aggregation result field includes at least one location and stores aggregated data in a histogram format, the method comprising the steps of:

7. A system for transforming a stream of input data records into a partially aggregated record, each input data record including at least one group-by field, at least one field to be binned, and at least one value field, and each partially aggregated record including at least one group-by field and an aggregation result field, wherein the input data record group-by field stores data to be matched against data stored in the partially aggregated record group-by field, wherein the value field stores data to be aggregated, and wherein the aggregation result field includes at least one location and stores aggregated data in a histogram format, the system comprising:

a histogram aggregation module that, for each input data record in the stream, aggregates the value field data into the aggregation result field location identified by the bin-index value, when data in the input data record group-by field matches data in the partially aggregated record group-by field.

11. A system for transforming a stream of input data records into a partially aggregated record, each input data record including at least one group-by field, at least one field to be binned, and at least one value field, and each partially aggregated record including at least one group-by field and an aggregation result field, wherein the input data record group-by field stores data to be matched against data stored in the partially aggregated record group-by field, wherein the value field stores data to be aggregated, and wherein the aggregation result field includes at least one location and stores aggregated data in a histogram format, the system comprising:

an evaluating means that, for each input data record in the stream, evaluates data in the input data record field to be binned to determine a bin-index value, wherein the bin-index value identifies locations of the aggregation result field the value field data should be aggregated into; and

a histogram aggregation means that, for each input data record in the stream, aggregates the value field data into the aggregation result field location identified by the bin-index value, when data in the input data record group-by field matches data in the partially aggregated record group-by field.

1. A system for transforming a stream of sorted input data records into an aggregate record, each input data record including at least one group-by field, at least one field to be binned, and at least one value field, and each aggregate record including at least one group-by field and an aggregation result field, wherein the input data record group-by field stores data to be matched against data stored in the aggregate record group-by field, wherein the input data records are sorted by the data in the group-by field, wherein the value field stores data to be aggregated, and wherein the aggregation result field includes at least one location and stores aggregated data in a histogram format, the system comprising:

15. A computer program product comprising a computer useable medium having computer program logic recorded thereon for enabling a processor in a computer system to transform a stream of input data records into a partially aggregated record, each input data record including at least one group-by field, at least one field to be binned, and at least one value field, and each partially aggregated record including at least one group-by field and an aggregation result field, wherein the input data record group-by field stores data to be matched against data stored in the partially aggregated record group-by field, wherein the value field stores data to be aggregated, and wherein the aggregation result field includes at least one location and stores aggregated data in a histogram format, the computer program logic comprising:

means for enabling the processor, for each input data record in the stream, to evaluate data in the input data record field to be binned to determine a bin-index value, wherein the bin-index value identifies locations of the aggregation result field the value field data should be aggregated into; and

means for enabling the processor, for each input data record in the stream, to aggregate the value field data into the aggregation result field location identified by the bin-index value, when data in the input data record group-by field matches data in the partially aggregated record group-by field.

10. A system for transforming a stream of sorted input data records into an aggregate record, each input data record including at least one group-by field, at least one field to be binned, and at least one value field, and each aggregate record including at least one group-by field and an aggregation result field, wherein the input data record group-by field stores data to be matched against data stored in the aggregate record group-by field, wherein the value field stores data to be aggregated, and wherein the aggregation result field includes at least one location and stores aggregated data in a histogram format, the system comprising:

evaluating means for evaluating each input data record in the stream, the evaluating means evaluates data in the input data record field to be binned to determine a bin-index value, wherein the bin-index value identifies locations of the aggregation result field the value field data should be aggregated into;

histogram aggregation means that, for each input data record in the stream, aggregates the value field data into the aggregation result field location identified by the bin-index value, when data in the input data record group-by field matches data in the aggregate record group-by field; and

a graphical user-interface means for providing parameters for the histogram aggregation means, the parameters identifying a selected histogram aggregation operation and selected group-by fields.

5. A computer program product comprising a computer useable medium having computer program logic recorded thereon for enabling a processor in a computer system to transform a stream of sorted input data records into an aggregate record, each input data record including at least one group-by field, at least one field to be binned, and at least one value field, and each aggregate record including at least one group-by field and an aggregation result field, wherein the input data record group-by field stores data to be matched against data stored in the aggregate record group-by field, wherein the input data records are sorted by the data in the group-by field, wherein the value field stores data to be aggregated, and wherein the aggregation result field includes at least one location and stores aggregated data in a histogram format, the computer program logic comprising:

2. The system of claim 1, further comprising:

an output module that outputs the aggregated data in the histogram format stored in the aggregation result field of the aggregate record once data in the input data record group-by field does not match data in the aggregate record group-by field.

4. The method of claim 3, further comprising the steps of:

outputting the aggregated data in the histogram format stored in the aggregation result field of the aggregate record once data in the input data record group-by field does not match data in the aggregate record group-by field; and

re-initializing the aggregate record, wherein the data in the aggregate record group-by field is set equal to the data in the input data record group-by field and data in each location of the aggregation result field is set to null.

6. The computer program product of claim 5, wherein the computer program logic further comprises:

means for enabling the processor to output the aggregated data in the histogram format stored in the aggregation result field of the aggregate record once data in the input data record group-by field does not match data in the aggregate record group-by field; and

means for enabling a processor to re-initialize the aggregate record, wherein the data in the aggregate record group-by field is set equal to the data in the input data record group-by field and data in each location of the aggregation result field is set to null.

9. The method of claim 8, further comprising the steps of:

creating a partially aggregated record when the data in the input data record group-by field does not match with any of the data in the existing partially aggregated records group-by fields, wherein the data in the created partially aggregated record group-by field is set equal to the data in the input data record group-by field and data in each location of the aggregation result field of the created partially aggregated record is set to null; and

outputting the aggregated data in the histogram format stored in the aggregation result field of each partially aggregated record once all of the input data records have been processed.

12. The system of claim 11, further comprising a graphical user-interface means that provides parameters for the histogram aggregation means, the parameters identifying a selected histogram aggregation operation and selected group-by fields.

13. The system of claim 12, wherein the graphical user-interface means displays an aggregate panel, the aggregate panel including three windows for displaying columns to aggregate, group-by columns, and columns to remove, a first group of boxes for choosing at least one index for columns to aggregate and distribution columns, and a second group of boxes for selecting the histogram aggregation operation from a group of histogram aggregation operations.

14. The system of claim 13, wherein the group of histogram aggregation operations include at least one of determining a sum, maximum, minimum, count, and average.

16. The computer program product of claim 15, wherein the computer program logic further comprises:

means for creating a partially aggregated record when the data in the input data record group-by field does not match with any of the data in the existing partially aggregated records group-by fields, wherein the data in the created partially aggregated record group-by field is set equal to the data in the input data record group-by field and data in each location of the aggregation result field of the created partially aggregated record is set to null; and

means for outputting the aggregated data in the histogram format stored in the aggregation result field of each partially aggregated record once all of the input data records have been processed.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is related to the following co-pending utility patent applications:

1. Sang'udi et al., "Computer-Related Method, System, and Program Product for Controlling Data Visualization in External Dimension(s)," application Ser. No. 08/748,548, (filed Nov. 12, 1996, and incorporated by reference in its entirety herein), pending.

2. Tesler, "Method, System and Computer Program Product for Visualizing Data Using Partial Hierarchies," Ser. No. 08/813,347, Attorney Docket No. 15-4-103.0151, 1452.0180003, (filed Mar. 7, 1997 and incorporated by reference in its entirety herein), pending.

3. Kohavi and Tesler, "Method, System and Computer Program Product for Visualizing a Decision Tree Classifier," Ser. No. 08/813,336, Attorney Docket No. 15-4-417.00, 1452.2220000, (filed Mar. 7, 1997 and incorporated by reference in its entirety herein), pending.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates generally to data record manipulation.

2. Related Art

Data mining and data visualization tools are called upon to handle ever increasing amounts of data. Databases or flat files can contain large data sets of records. The data records have different fields representing different data attributes or variables. In this way, the records can represent multivariate data sets. Depending upon the type of data involved, a data visualization is displayed to identify data content, characteristics, properties, relationships, trends, and any other aspect of the data. Many types of data visualizations are possible, such as, a scatter plot, geographical or map visualization, and a tree or hierarchy visualization.

Histograms are often used to summarize data records and attributes. A data record transformation unit is used to transform records into corresponding histograms. If data records are transformed into histogram format before performing aggregations on the data, the resulting calculation is costly and inefficient. Processing all records for a large gigabyte data set is slow and expensive.

SUMMARY OF THE INVENTION

The present invention provides a data record transformation that computes histograms and aggregations quickly for an incoming record stream. The data record transformation computes histograms and aggregations in one-step, thereby, avoiding the creation of a large intermediate result. The data record transformation operates in a streaming fashion on each record in an incoming record stream.

According to a first embodiment, a method, system, and computer program product for transforming sorted data records is provided. The data records are sorted on any one or more group-by fields. A data transformation unit includes a binning module and a histogram aggregation module. For each incoming record, the binning module bins numeric and/or categorical data attributes, as necessary.

The histogram aggregation module processes each binned and sorted record. The input binned and sorted record is matched with an aggregated histogram record, to check that it matches on all group-by fields. If so, values from the input record are accumulated into the aggregation result fields of the aggregation record. If not, the partially aggregated histogram record is written as output and then reinitialized with values from the input record. Little memory is required for the first embodiment since it only needs intermediate storage sufficient for recoding the values of one or a few histogram records.

According to a second embodiment, a method, system, and computer program product for transforming unsorted data records is provided. A data transformation unit includes a binning module and a histogram aggregation module. An associative data structure holds a collection of partially aggregated histogram records. For each incoming record, the binning module bins numeric and/or categorical data attributes, as necessary.

The histogram aggregation module processes each binned record. The input binned record is matched against the records in the associative data structure to see if any stored therein match it on all group-by fields. If such a matching record exists, values from the input record are accumulated into the aggregation result fields of the matching aggregation record. If not, the input record is used to initialize a new partially aggregated histogram record, which is added to the associative data structure. The second embodiment requires additional memory to store the associative data structure.

Further features and advantages of the present invention, as well as the structure and operation of various embodiments of the present invention, are described in detail below with reference to the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are incorporated herein and form part of the specification, illustrate the present invention and, together with the description, further serve to explain the principles of the invention and to enable a person skilled in the pertinent art to make and use the invention.

FIG. 1 is a block diagram of a data record transformation unit according to a first embodiment of the present invention.

FIG. 2 is a flowchart of a histogram aggregation routine according to the first embodiment of the present invention.

FIG. 3A is an example new data record, according to the present invention. FIG. 3B is a diagram of an example aggregation record according to the present invention.

FIG. 4 is a block diagram of a data record transformation unit according to a second embodiment of the present invention.

FIG. 5 is a flowchart of a histogram aggregation routine according to the second embodiment of the present invention.

FIG. 6 is a diagram of an example associative data structure according to the second embodiment of the present invention.

FIG. 7 is an example computer system for implementing the present invention.

FIG. 8 is an example client-server data-mining architecture implementing the present invention.

FIGS. 9A, 9B, and 9C are display views for a tool-manager graphical user interface implementing the present invention.

FIGS. 10A and 10B are display views of binning panels used in the tool-manager of FIGS. 9A to 9C.

FIG. 11 is a display view of an aggregation panel used in the tool-manager of FIGS. 9A to 9C.

The present invention is described with reference to the accompanying drawings. In the drawings, like reference numbers indicate identical or functionally similar elements. Additionally, the left-most digit(s) of a reference number identifies the drawing in which the reference number first appears.

DETAILED DESCRIPTION OF THE EMBODIMENTS

1. Overview of the Invention

The present invention provides a data manipulation tool for computing histogram aggregation roll-ups of an incoming stream of record-based data (sorted or unsorted). According to the present invention, histogram roll-ups are generated and defined in a separate operation and do not have to be defined in terms of other primitives. Histogram distribution and aggregation are combined into a single step. Processing is made simpler and more efficient as the creation of a large intermediate result is avoided.

2. Terminology

"Record" can be any type of data record having data attributes, and variables. Any type of data can be used, including but not limited to, business, engineering, science, and other applications. Data sets can be received as data records, flat files, relational or non-relational database files, direct user inputs, or any other data form.

"Binning" refers to any conventional process for assigning data values (numeric and/or categorical) into bins. Bins can be made up of uniform and/or non-uniform clusters of data points.

"Histogram" refers to a distributive representation of data attributes or variables.

3. Example Environment

The present invention is described in terms of an example data mining environment. Given the description herein, it would be obvious to one skilled in the art to implement the present invention in any general computer including, but not limited to, a computer graphics processor (single chip or multiple chips), high-end to low-end graphics workstations, virtual machine (e.g. Java-created application), and network architectures (e.g., client/server, local, intermediate or wide area networks). In one preferred example, the present invention can be implemented as software, firmware, and/or hardware in a data mining tool, such as, the MINESET product released by SILICON GRAPHICS, Inc., and executed on a graphics workstation manufactured by SILICON GRAPHICS, Inc. (e.g., an INDIGO2, INDY, ONYX, O2, or OCTANE workstation). A further example computer system and tool manager implementing the present invention is described below, but is not intended to limit the present invention.

Description in these terms is provided for convenience only. It is not intended that the invention be limited to application in this example environment. In fact, after reading the following description, it will become apparent to a person skilled in the relevant art how to implement alternative environments of the invention.

4. Data Transformation Using Histogram Aggregation

According to a first embodiment, a method, system, and computer program product for transforming sorted data records is provided. As shown in FIG. 1, a data transformation unit 110 is coupled to a database 100. Database 100 can be any type of database (relation and/or non-relational) or flat file storage device. Data transformation unit 110 receives an incoming stream of sorted input records 105. For example, a database management system (not shown) can be used to sort the records. The data records are sorted on any one or more group-by fields.

Data transformation unit 110 includes a binning module 120 and a histogram aggregation module 130. For each incoming record, binning module 120 bins numeric and/or categorical data attributes, as necessary. Any conventional binning technique (uniform or non-uniform) can be used to bin numeric (i.e. real-valued) attributes and categorical attributes. The bins themselves, i.e., the categories into which the values are grouped by the binning process, are enumerated and represented by integers. The output of the binning process, is a new field, called a bin-index field, containing these integers. A value field is a numeric field, containing the data which is to be aggregated. For example, individual incomes (represented by a value field in a collection of records) may be aggregated to give total or average income for a population. As shown in FIG. 3A, after binning, an input record 300 includes one or more group-by fields 310, bin-index field(s) 320, and value field(s) 330.

According to the present invention, histogram aggregation module 130 processes each binned and sorted record in a stream 125 output by binning module 120. Histogram aggregation module 130 processes each binned and sorted record in stream 125 to form an output aggregate record 135 in a histogram format. Data received in each incoming binned and sorted record in stream 125 is expanded and accumulated in an aggregate record for matching group-by fields 115. Expansion and aggregation by histogram aggregation module 130 is performed in one-step without calculation of an intermediate result. This aggregation can be any associative aggregation operation, including but not limited to, determining a sum, maximum, minimum, and/or count. Average may be synthesized from sum and count. Because of the streaming operation, histogram aggregation module 130 requires little memory only for processing one or a few records at a time.

Group-by fields 115 used in matching by histogram aggregation module 130 can be input from any input device and stored in a memory device (not shown) in data transformation module 110. According to one feature of the present invention described below, a graphical user-interface is used to define parameters for the operation of the histogram aggregation module 130, including defining the group-by fields 115 and the type of aggregation operation (sum, max, min, count, and average).

The operation of histogram aggregation module 130, according to the first embodiment of the present invention, is described in even further detail below with respect to FIGS. 2, 3A, and 3B. FIG. 2 is a flowchart of a histogram aggregation routine 200. In step 210, histogram aggregation module 130 reads a new record in stream 125. As shown in FIG. 3A, after binning, an input record 300 includes one or more group-by fields 310, bin-index field(s) 320, and value field(s) 330.

In step 220, new record 300 is compared to a stored aggregate record 334. As shown in FIG. 3B, aggregate record 334 has group-by fields 115 and aggregation result fields 350. The one or more group-by fields 310 in new record 300 are compared against group-by fields 115 in stored aggregate record 334 to determine if all match. Because the input record stream is sorted on the group-by fields, if a match does not occur, this means that the new record is the first to be seen with its pattern of group-by fields, and that the records which matched the old pattern of group-by fields (as stored in the histogram aggregation record) are now completely accumulated. Hence, the histogram aggregation record is written to the output record stream. Once the output has taken place, the histogram aggregation record is reinitialized (step 240). Accordingly, aggregate record 334 is re-initialized having group-by fields 115 from the new record and empty or null values in the aggregation result fields 350.

In step 250, when a match occurs in step 220 or after step 240, the new record 300 is aggregated into aggregate record 334. In particular, the bin-index fields 320 are used to determine the appropriate location within the aggregation result fields 350. This location is then updated to reflect the value field 330. In particular, if the desired aggregation is sum, the location is updated with the sum of its previous value and the value field 330. Any type of aggregation operation can be performed including, but not limited to, determining a sum, maximum, minimum, and count. Average may be synthesized from sum and count.

In step 260, a check is made for the next record in the sorted stream 125. Processing returns to step 210 until all records in stream 115 have been processed. When all records have been processed, the aggregation record 334, representing the last pattern of group-by fields in the input stream is output (step 270). This output aggregation record then includes group-by fields 115 along with aggregation result fields 350 which have values corresponding to an aggregation of data values.

According to a second embodiment, a method, system, and computer program product for transforming unsorted data records is provided. As shown in FIG. 4, data transformation unit 110 receives a stream of unsorted data records 405 from database 100. Data transformation unit 110 includes a binning module 120, as described above, and a histogram aggregation module 430. An associative data structure 432 holds a collection of partially aggregated histogram records. For example, an associative data structure can be efficiently implemented using any of a variety of well-known techniques such as a hash table (Knuth, The Art of Computer Programming, vol 2, pages 506-550, Addison-Wesley, 1973) (incorporated herein by reference) or a red-black tree (Cormen, Leiserson, and Rivest, Introduction to Algorithms (MIT Press, 1990)), from the Standard Template Library (Hewlett-Packard Company, 1994) (incorporated herein by reference).

For each incoming record 405, binning module 120 bins numeric and/or categorical data attributes, as necessary. Histogram aggregation module 430 processes each binned record output from binning module 120 in stream 425 to form an aggregate record in a histogram format in one step. Input records from the unordered record stream 425 are matched against the collection of partially aggregated histogram records in associative data structure 432. Data received in each incoming binned record in stream 425 is expanded and accumulated into the appropriate matching partially aggregated histogram record for matching group-by fields 115. Expansion and aggregation by histogram aggregation module 430 is performed in one-step without calculation of an intermediate result. This aggregation can be any type of aggregation including, but not limited to, determining a sum, maximum, minimum, and/or count. Average may be synthesized from sum and count.

Histogram aggregation module 430 may require substantially more memory than the corresponding module in the first embodiment, since the associative data structure can become large. However, the memory needed by the associative data structure is proportional to the size of the output, rather than the input data set. Since aggregation often dramatically reduces the size of a data set, this difference is a significant savings.

The operation of histogram aggregation module 430, according to the second embodiment of the present invention, is described in even further detail below with respect to FIGS. 5 and 6. FIG. 5 is a flowchart of a histogram aggregation routine 500. In step 510, histogram aggregation module 430 reads a new record in stream 425. As shown in FIG. 3A, after binning, an input record 300 includes one or more group-by fields 310, bin-index field(s) 320, and value field(s) 330.

In step 520, new record 300 is compared against the collection of partially aggregated histogram records in associative data structure 432. As shown in FIG. 6, the associative data structure 432 has a collection of partially aggregated histogram records (601 to 604). For clarity, only four partially aggregated histogram records 601 to 604 are shown, however, the present invention is not so limited. In general, any number of partially aggregated histogram records may be stored. Partially aggregated histogram records 601 to 604 have corresponding group-fields 611-614 and aggregation result fields 651-654. Group-by fields 611 to 614 represent a distribution group-by fields 115.

The one or more group-by fields 310 in new record 300 are matched with the associative data structure to see if any of the contained records 601-604, have matching group-by fields 611 to 614. If no such match exists, the associative data structure is expanded to include a new partially aggregated histogram record, with group-by fields from the input record, and empty aggregation fields (step 525). This newly created record is then used as the matching partially aggregated histogram record. If a match does exist, the appropriate record (one of 601 to 604) is used as the matching partially aggregated histogram record to be passed on to step 530.

In step 530, when a match occurs in step 520, the new record 300 is aggregated into the appropriate partially aggregated histogram record 601, 602, 603 or 604. For example, if group-by fields 310 match group-by fields 611, then the new record 300 is aggregated into partially aggregated histogram record 601. The values in aggregation result field 651 are then updated based on an aggregation operation to include corresponding data values from value fields 330 in new record 300. As in step 250 in the first embodiment, the bin-index field(s) 320 can be used to determine which of the value(s) in aggregation result field 651 are to be updated. Any type of aggregation operation can be performed including but not limited to, determining a sum, maximum, minimum, count, and/or average.

In step 540, a check is made for the next record in the unsorted (or sorted) stream 425. Processing returns to step 510 until all records in stream 425 have been processed. When all records have been processed, the associative data structure is traversed, and each of the aggregated histogram records is output (step 550).

5. Example GUI Computer Environment

FIG. 7 is a block diagram illustrating an example environment in which the present invention can operate. The environment is a computer system 700 that includes one or more processors, such as processor 704. The processor 704 is connected to a communications bus 702. Various software embodiments are described in terms of this example computer system. After reading this description, it will be apparent to a person skilled in the relevant art how to implement the invention using other computer systems and/or computer architectures.

Computer system 700 includes a graphics subsystem 703. Graphics subsystem 703 can be implemented as one or more processor chips. The graphics subsystem 703 can be included as part of processor 704, as shown, or as a separate graphics engine or processor. Graphics data is output from the graphics subsystem 703 to the bus 702. Display interface 705 forwards graphics data from the bus 702 for display on the display unit 706.

Computer system 700 also includes a main memory 708, preferably random access memory (RAM), and can also include a secondary memory 710. The secondary memory 710 can include, for example, a hard disk drive 712 and/or a removable storage drive 714, representing a floppy disk drive, a magnetic tape drive, an optical disk drive, etc. The removable storage drive 714 reads from and/or writes to a removable storage unit 718 in a well known manner. Removable storage unit 718 represents a floppy disk, magnetic tape, optical disk, etc., which is read by and written to by removable storage drive 714. As will be appreciated, the removable storage unit 718 includes a computer usable storage medium having stored therein computer software and/or data.

In alternative embodiments, secondary memory 710 may include other similar means for allowing computer programs or other instructions to be loaded into computer system 700. Such means can include, for example, a removable storage unit 722 and an interface 720. Examples can include a program cartridge and cartridge interface (such as that found in video game devices), a removable memory chip (such as an EPROM, or PROM) and associated socket, and other removable storage units 722 and interfaces 720 which allow software and data to be transferred from the removable storage unit 722 to computer system 700.

Computer system 700 can also include a communications interface 724. Communications interface 724 allows software and data to be transferred between computer system 700 and external devices via communications path 726. Examples of communications interface 724 can include a modem, a network interface (such as Ethernet card), a communications port, etc. Software and data transferred via communications interface 724 are in the form of signals which can be electronic, electromagnetic, optical or other signals capable of being received by communications interface 724, via communications path 726. Note that communications interface 724 provides a means by which computer system 700 can interface to a network such as the Internet.

Graphical user interface module 730 transfers user inputs from peripheral devices 732 to bus 706. These peripheral devices 732 can be a mouse, keyboard, touch screen, microphone, joystick, stylus, light pen, or any other type of peripheral unit. These peripheral devices 732 enable a user to operate and control the data visualization tool of the present invention as described above.

The present invention is described in terms of this example environment. Description in these terms is provided for convenience only. It is not intended that the invention be limited to application in this example environment. In fact, after reading the following description, it will become apparent to a person skilled in the relevant art how to implement the invention in alternative environments.

The present invention is preferably implemented using software running (that is, executing) in an environment similar to that described above with respect to FIG. 7. In this document, the term "computer program product" is used to generally refer to removable storage unit 718 or a hard disk installed in hard disk drive 712. These computer program products are means for providing software to computer system 700.

Computer programs (also called computer control logic) are stored in main memory and/or secondary memory 710. Computer programs can also be received via communications interface 724. Such computer programs, when executed, enable the computer system 700 to perform the features of the present invention as discussed herein. In particular, the computer programs, when executed, enable the processor 704 to perform the features of the present invention. Accordingly, such computer programs represent controllers of the computer system 700.

In an embodiment where the invention is implemented using software, the software may be stored in a computer program product and loaded into computer system 700 using removable storage drive 714, hard drive 712, or communications interface 724. Alternatively, the computer program product may be downloaded to computer system 700 over communications path 726. The control logic (software), when executed by the processor 704, causes the processor 704 to perform the functions of the invention as described herein.

In another embodiment, the invention is implemented primarily in firmware and/or hardware using, for example, hardware components such as application specific integrated circuits (ASICs). Implementation of a hardware state machine so as to perform the functions described herein will be apparent to persons skilled in the relevant art(s).

6. Example Data Network Environment

FIG. 8 further illustrates an overall client/server network environment in which a data mining tool according to the present invention operates. The network environment described herein is illustrative only and is not necessary to practicing the present invention.

A client workstation 800 communicates over a data link 801 with a host server 850 as part of a local area network (LAN), campus-wide network, wide-area-network (WAN), or other network type configuration. Any data communication protocol can be used to transport data.

The client workstation 800 includes a computer system 700 as described above with respect to FIG. 7. The computer system 700 runs a graphical user-interface tool manager 810 for managing operation of a data visualization tool 820 output on visual display 825. Preferably, tool manager 810 and data visualization tool 820 are software modules executed on computer system 700. The host system 850 includes a host processor 860 and a data source 880. Data mover 870 and data miner 890 are included at the host server 850, and preferably, are implemented in software executed by the host processor 860.

FIG. 8 further illustrates the flow of a data visualization tool execution sequence through steps 831-837. First, a user logs onto the client workstation 800 and opens tool manager 810 (step 831). Through the tool manager 810, the user can select and configure any data visualization tool. A configuration file 815 must be set which identifies the content to be displayed in a data visualization. Tool manager 810 then generates a configuration file 815 defining the content of the data visualization and scope of external dimension space (step 832). To facilitate this process, predefined preference panels, templates and/or data palettes can be accessed through menu-driven commands and pop-up display windows to permit the user to easily define the content of a data visualization.

The configuration file 815 is sent over data link 801 to the host server 850 (step 833). A copy of the configuration file 815 is also stored at the client workstation 800 for use by the data visualization tool 825 (step 837). At the host server 850, the data mover 870 extracts data from the data source 880 corresponding to the configuration file 815 (step 834). The data miner 890 receives extracted data from the data mover 870 and generates a data file 895 (step 835). Data miner 890 then sends the data file 895 to the client workstation 800 (step 836).

Finally, in step 838, the data visualization tool 820 uses the data file 895 to generate a data visualization. Any type of data visualization tool can be used, including but not limited to, a visualization tool in the MINESET product manufactured by SILICON GRAPHICS, Inc. For example, scatter and map data visualization tools can be used as described in the application by Sang'udi et al., "Computer-Related Method, System, and Program Product for Controlling Data Visualization in External Dimension(s)," application Ser. No. 08/748,548, (filed Nov. 12, 1996, and incorporated by reference in its entirety herein), pending. A tree visualization tool can be used as described in the application by Joel D. Tesler, "Method, System and Computer Program Product for Visualizing Data Using Partial Hierarchies," Ser. No. 08/813,347, Attorney Docket No. 15-4-103.0151, 1452.0180003, (filed Mar. 7, 1997 and incorporated by reference in its entirety herein), pending. A decision tree classifier visualization tool can be used as described in the application by Kohavi and Tesler, "Method, System and Computer Program Product for Visualizing a Decision Tree Classifier," Ser. No. 08/813,336, Attorney Docket No. 15-4-417.00, 1452.2220000, (filed Mar. 7, 1997 and incorporated by reference in its entirety herein), pending.

According to a further feature of the present invention, a graphical-user-interface is used to provide parameters for data transformation involving histogram aggregation. The graphical user-interface for data transformation is provided in tool manager 810. In particular, a data transformation panel is displayed to permit a user to select and define parameters for configuring the binning module 120 and histogram aggregation modules 130, 430, as described above. An example data record transformation panel 920 is described below with respect to FIGS. 9A to 9C. Data record transform panel 920 allows binning to be defined by selecting a Binning button 922 which opens binning panels. Example binning panels for defining binning parameters are shown in FIGS. 10A and 10B.

Data record transform panel 920 allows histogram aggregation to be defined by selecting an Aggregate button 923 which opens an aggregation panel. An example aggregation panel for defining histogram aggregation parameters is shown in FIG. 11.

FIG. 9A shows data manipulation panels 910, 920, 930 that permit a user to construct a configuration file 815 by defining mapping requirements for data in a scatter data visualizer. For brevity, only main functions of panels 910-930 are highlighted here. Other functions can be added as would be obvious to one skilled in the art given this description.

In panel 910, server name window 912 identifies a server name containing data to be analyzed. Data source window 914 identifies particular database management systems, databases and data files to used in a data visualization.

Data record transformation panel 920 is used to define data transformations to be performed on the data before mapping the data to visual elements. Most operations are applied to columns of data. For example, Add Column option 921 lets a user create a new column based on other columns. Bin Columns option 922 is a data transformation where numerical values in a column are grouped into ranges, e.g., grouping ages into 5-year age groups. Other data transformation operations such as aggregating or classifying data, data ranges, or columns of data can be performed (options 923, 924). Current Columns window 925 lists the columns available for mapping to visual elements after the current data transformations have been applied. Initially, this is the set of columns in the original data. The selected data transformations can add, remove, or modify one or more columns.

Successive data transformations form a "history." The Table History arrow buttons 926 let a user move back and forth in this history. Current Columns window 925 and Current View indicator 927 reflect a user's current position in the table history. Edit History option 928 brings up a new window where the entire history is displayed visually allowing edits, insertions, or removal of data transformations. Edit Prev Op option 929 lets a user edit the previous transformation.

Data destination panel 930 includes a window 935. Window 935 lists visual elements requiring mapping. For a scatter plot, these visual elements include items defining axes, entities, and summary information. Optional items are preceded by an asterisk. Axis 1 lets a user assign a dependent variable to a first axis displayed in a scatter data visualization. Assigning a second dependent variable to a second axis (Axis 2) produces a 2-D chart. Assigning a third dependent variable to a third axis (Axis 3) produces a 3-D chart. Entity is an item used to uniquely identify the entities in the scatter data visualization. Entity-size, entity-color, and entity-label are items used to define data represented by graphical attributes of the entities. The summary item is used to define the data attribute or combination of data attributes representing the summary information in a summary window. For example, a scatter data visualization can be set-up through data panels 910-930 by assigning miles per gallon (mpg) to Axis 1 and cylinder to Axis 2.

Control buttons 936-970 are also included in data destination panel 930. Button 936 indicates the scatter data visualization tool is invoked. Tool Options button 940 displays the an options dialog box for configuring the scatter visualizer. Clear Selected Mapping button 950 clears a selected axis, entity, or summary data assignment mapping. Clear All Mapping button 960 clears all axis, entity, and summary mapping data assignments. Invoke Tool button 970 can be pressed after a configuration file is saved to see the scatter data visualization corresponding to the saved configuration file.

While panels 910-930 reduce the work in creating a configuration file, a configuration file 815 can be constructed manually using a text editor.

FIG. 9B shows data record transform panel 920 used with data destination panel 930 in which a map visualizer is selected. The visual elements window 935 then lists the one or more graphical attributes (e.g. height of bars and color of bars) associated with objects in a map data visualization. Optional items are preceded by an asterisk. For example, clicking Height-Bars lets a user specify the heights of geographic bars on a map data visualization. Clicking Color-Bars lets a user assign colors to the geographic bars.

FIG. 9C shows data record transform panel 920 used with data destination panel 930 in which a tree visualizer is selected. The visual elements window 935 then lists the one or more graphical attributes (e.g. height of bars, color of bars, hierarchy levels) associated with objects in a tree data visualization. Optional items are preceded by an asterisk. For example, clicking Height-Bars lets a user specify the heights of bars on nodes in a tree data visualization. Clicking Color-Bars lets a user assign colors to the node bars.

Data transform panel 920 is described in further detail with respect to FIGS. 10A, 10B, and 11.

7. Transforming the Data

Data Transformations panel 920 lets a user manipulate the tables with which a user wants to work. After a user has selected a table (via the Server Name and Data Source panel described above), its column headings appear in the Current Columns window of the Data Transformations panel. The Data Transformation data manipulation options are as follows:

"Remove Columns" lets a user delete one or more columns that are not relevant to the current visualization or mining.

"Bin Columns" lets a user take a range of values and assign each record to a group (for example, with a range of ages, 0-18, 19-25, 26-35, and so on).

"Aggregate" lets a user find aggregations (sum, min, max, and so on), group data into new columns, or make arrays from a column indexed by other columns.

"Change Types" lets a user change a column's type.

"Add Column" lets a user add a new column based on a mathematical expression.

"Apply Classifier" lets a user use a previously created classifier on a table for labeling new records.

Binning

Binning lets a user sort the information from one or more columns into groups in a new column or columns (for example, with a range of ages, 0-18, 19-25, 26-35, and so on). Click Bin Column to get a dialog box 1000 that lets a user specify the binning options (FIG. 10A). Dialog box 1000 lets a user choose the column that is to be divided into bins, specify the name of the new column to contain values for the bins, set bin thresholds, or specify a range with thresholds at regular intervals.

To specify binning options for one or more columns, select the column name(s), a user chooses the appropriate options, and clicks the Apply button at the bottom of the dialog box 1000. If a user selects only one column for binning, the name of the resulting binned column appears in the New column name box, and a user can type in a new name. In the example shown in FIG. 10A, Age_-- Bin is the name for the new column; in this case, it provides a range of ages. If a user selects more than one column for binning, New column name stays inactive. Next to New column name is a check box labeled Delete original column. When chosen, this option automatically deletes the original column after binning. Click the check box to turn this function on or off.

In the middle of the Bin columns dialogue box 1000 are two tabs for choosing Automatic Thresholds (1020) or User Specified Thresholds (1010). Choosing Automatic Thresholds opens panel 1020 and instructs the computer to suggest the bins. Choosing User Specified Thresholds (1010) lets a user specify the thresholds. If the Automatic Thresholds tab is chosen, a program can use machine learning to suggest bins.

As shown in FIG. 10B, the first choice under Automatic Threshold Computation is between the Automatically choose number of bins and the Group into a number of bins buttons. Click Automatically choose number of bins to let the computer decide the best number of bins. If a user chooses to specify the number of bins, a user clicks Group into: _-- bins, and types the number of bins wanted into the field.

In the Use approach menu, a user can choose between Uniform or Automatic binning. If a user chooses Uniform, the algorithm separates the interval into the specified number of bins; the range of the variable is separated into uniformly sized ranges. The upper and lower bounds for the extreme ranges include any values outside the ranges that were seen in the data. For example, if the values for an attribute are all in the range 5-14, and four ranges are chosen, the thresholds are 8 and 11, corresponding to these ranges:

≦8, >8 to 11, >11 to 14, >14

Uniform lets a user decide whether he or she wants to specify the number of intervals or let the algorithm select a number automatically. The automatic selection of the number of bins for the uniform thresholding is based on a number of bins related to the number of distinct values: the more distinct values, the more ranges are chosen (the relationship is logarithmic).

If a user chooses Automatic, a user also must select a discrete label. The thresholds are chosen so that the distributions of labels at different ranges are as different as possible. This approach continues to split ranges and create thresholds until no additional interval is considered significant. No interval is split if the two subintervals do not contain the minimum number of instances you can specify (this defaults to 5).

The Minimum # instances in any bin text field lets a user specify the minimum number of records in any bin; this prevents the creation of bins with fewer records than the number specified.

If a user clicks Apply, the Tool Manager picks bin thresholds and displays them in the Thresholds for selected column are text field. Output from the process of automatically computing bins appears in a pop-up window, showing progress of the algorithm and any errors that occur.

If a user specifies his or her own thresholds (as shown in FIG. 10A), a user can choose between Use custom thresholds or Use evenly spaced thresholds by clicking either button. Typing in the thresholds and clicking Apply makes those thresholds effective for the selected columns.

The Use custom thresholds text box lets a user enter the range criteria. For example, a user could enter the numbers 18, 30, 50, 60. This results in the following ranges: 0-18, 19-30, 31-50, 51-60, 61+. Note that a user enters only the digits and commas, not the ranges.

To specify equally spaced bins over a range of values, click the Equally Spaced Bins button. This activates the three text fields below it. A user can type the start of the binning range, the end of the range, and the spacing of the bins, respectively, into these fields. If a user is binning a column that is a date, a user can specify units of time for the bin spacing (using the Date units pop-up menu under the text fields). This would permit, for example, a user to bin a time period into bins of three weeks. Dates entered into these fields can be typed in the form "MM/DD/YY". Example time units are years, quarters, months, days, hours, minutes, seconds.

Aggregation

Before describing the features and effects of the Aggregate button, this section provides an introduction to the concept of arrays and distribution as used in the aggregation feature. The Aggregate button lets a user perform simple aggregations (for example, sum, min, max, and so on), make arrays and distribute columns.

Table 3-1 illustrates some sample aggregations/calculations.

TABLE 3-1

______________________________________

Aggregate Example 1

State Age_-- bin

Total $ Spent

______________________________________

CA 0-20 $50

CA 21-40 $454

CA 41-60 $693

NY 0-20 $35

NY 21-40 $541

NY 41-60 $628

______________________________________

If a user makes Total $ Spent into an array indexed by the binned column Age bin, the resulting table, now with only two columns appear as shown in

Table 3-2

TABLE 3-2

______________________________________

Aggregate Example 2

State Total $ Spent [Age_-- bin]

______________________________________

CA [$50, $454, $693]

NY [$35, $541, $628]

______________________________________

In this case, making an array reduces the number of columns by one, and also reduces the number of rows by four. Arrays are useful for the Tree Visualizer tool; they are needed if a user want to use sliders in Scatter Visualizer and Map Visualizer displays.

Distributing columns is similar, but different in several important ways. Instead of producing a single new column holding many values, distributing produces one new column for each value of the index. For example, if in the first table Total $ Spent were not made an array, but instead distributed by Age_-- bin, Table 3-3 would be the result.

TABLE 3-3

______________________________________

Aggregate Example 3

State Total $_-- 0-20

Total $_-- 21-40

Total $_-- 41-60

______________________________________

CA $50 $454 $693

NY $35 $541 $628

______________________________________

Thus, distributing increases the number of columns but decreases the number of rows. If a user has more than one binned column (for example, Age_-- bins and Sex_-- bin), you can make a two-dimensional array (indexed by combinations of Age_-- bin and Sex_-- bin). A user also can distribute and make an array at the same time.

Table 3-4 has two binned columns: one for age, one for sex.

TABLE 3-4

______________________________________

Example of binning

State Age_-- bin

Sex_-- bin

Total $ Spent

______________________________________

CA 0-20 1 $20

CA 0-20 2 $30

CA 21-40 1 $220

CA 21-40 2 $234

CA 41-60 1 $401

CA 41-60 2 $292

______________________________________

If a user make Total $ Spent an array indexed by age, and remove Sex_-- bin, the results are shown in Table 3-5.

TABLE 3-5

______________________________________

Results When Making

State Total $ Spent [Age_-- bin]

______________________________________

CA [$50, $454, $693]

______________________________________

If a user does not remove Sex_-- bin, the results are shown in Table 3-6.

TABLE 3-6

______________________________________

Results When Specifying Sex_-- bin

State Sex_-- bin

Total $ Spent [Age_-- bin]

______________________________________

CA 1 [$20, $220, $401]

CA 2 [$30, $234, $292]

______________________________________

If a user makes an array by both Age_-- bin and Sex_-- bin, the results are shown in Table 3-7.

TABLE 3-7

______________________________________

Results of Making an Array by

State Total $ Spent [Age_-- bin][Sex_-- bin]

______________________________________

CA [$20, $220, $401, $30, $234, $292]

______________________________________

Finally, if a user distributes by Sex_-- bin and index by Age_-- bin, the results are shown in Table 3-8.

TABLE 3-8

______________________________________

Results of Distributing Sex_-- bin and Indexing by Age_-- bin

Total $ Spent [Age_-- bin],

State Sex = 1 Sex = 2

______________________________________

CA [$20, $220, $401]

[$30, $234, $292]

______________________________________

The examples above (with the exception of Table 3-5) had exactly one relevant value for each array element, and the distribution merely rearranged existing data values. In example of Table 3-5, there were two data values for each array element, and these were added. Several aggregation options for data sets contain more than one value to be distributed into a given output array element. The most common option is to add the values (as done in Table 3-5). This is useful when accumulating expenditures into budgets, for example. A user also can take the minimum, maximum, and average of total number of values, as well as count them. When distributing values for a given data set, it is possible that there are no values appropriate for a particular bin. In this case, for MIN, MAX, AVG, and SUM aggregations, the Data mover fills in a value of Null. For COUNT aggregations, the Data mover fills in a value of 0.

The Aggregate Button

A user can use the Aggregate button to create simple aggregations, make arrays, or distribute columns. Clicking this button causes the Aggregate dialog box to appear (FIG. 11). It shows three lists, with the columns in the current table appearing in the middle list. If a user wants to aggregate, distribute, or turn a column into an array, select the name of the column, and click the left arrow button between the left and center lists. Below are pop-up menus that let a user specify indexes (if the result is to be an array) and a distribution column (if the result is to be distributed). In addition, at the bottom of the dialog box are five toggles that let a user specify how different values are to be combined when aggregated: either summed, averaged, the min or max value, or the count. When a user is aggregating number-valued columns, a user can choose any combination of these options. For other types, only count is permitted. If a user chooses more than one option, a user gets more than one result. For example, selecting average and max gives a user one result with average values, and another one holding the max values.

In Aggregate dialog box 1100, the three lists of column names are given below:

Columns to aggregate.

Group-By columns (the default); this keeps the columns unchanged throughout the operation. For each set of records with the same combination of values in the Group-By columns, only one record is output in the resulting table, with values in the aggregated columns summed, averaged, minned, maxed, or counted (depending on the check boxes at the bottom of the panel).

Columns to remove, as can be seen with the Sex_-- bin column in Table 3-5. After a user has finished with the additional aggregate criteria dialog box, the Current Columns text box in the Table Processing window shows the new column names that result from applying these criteria.

8. Conclusion

While various embodiments of the present invention have been described above, it should be understood that they have been presented by way of example only, and not limitation. It will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined in the appended claims. Thus, the breadth and scope of the present invention should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.

INVENTORS:

Haber, Eben M., Rathmann, Peter K.

THIS PATENT IS REFERENCED BY THESE PATENTS:

Patent	Priority	Assignee	Title
10042540,	Jun 22 2006	Microsoft Technology Licensing, LLC	Content visualization
10067662,	Jun 22 2006	Microsoft Technology Licensing, LLC	Content visualization
10282874,	Sep 17 2014	APICA INC	Efficient time-series histograms
10909177,	Jan 17 2017	WORKDAY, INC	Percentile determination system
10922336,	Jun 09 2015	WELLS FARGO BANK, N A	Systems and methods for indexing and aggregating data records
6049797,	Apr 07 1998	THE CHASE MANHATTAN BANK, AS COLLATERAL AGENT	Method, apparatus and programmed medium for clustering databases with categorical attributes
6144966,	May 06 1998	Canon Kabushiki Kaisha	Transformation system and method for transforming target data
6185561,	Sep 17 1998	Affymetrix, Inc.	Method and apparatus for providing and expression data mining database
6278989,	Aug 25 1998	Microsoft Technology Licensing, LLC	Histogram construction using adaptive random sampling with cross-validation for database systems
6282544,	May 24 1999	Computer Associates Think, Inc	Method and apparatus for populating multiple data marts in a single aggregation process
6301579,	Oct 20 1998	RPX Corporation	Method, system, and computer program product for visualizing a data structure
6373483,	Jan 13 1997	RPX Corporation	Method, system and computer program product for visually approximating scattered data using color to represent values of a categorical variable
6434560,	Jul 19 1999	International Business Machines Corporation	Method for accelerated sorting based on data format
6438552,	Oct 02 1998	TERADATA US, INC	SQL-Based automated histogram bin data derivation assist
6526408,	Oct 01 1999	SAS INSTITUTE, INC	System and method for in-process data record summarization and exception testing using virtual data records held in processor memory
6549910,	Oct 02 1998	TERADATA US, INC	SQI-based automated, adaptive, histogram bin data description assist
6633834,	Aug 29 2001	Hewlett Packard Enterprise Development LP	Baselining of data collector data
6694338,	Aug 29 2000	Silicon Valley Bank	Virtual aggregate fields
6711577,	Oct 09 2000	Battelle Memorial Institute	Data mining and visualization techniques
6775819,	Oct 27 1997	KLA-Tencor Corporation	Software system and method for graphically building customized recipe flowcharts
6842176,	Nov 12 1996	RPX Corporation	Computer-related method and system for controlling data visualization in external dimension(s)
6850952,	May 24 1999	Computer Associates Think, Inc.	Method and apparatus for populating multiple data marts in a single aggregation process
6928436,	Feb 28 2002	IBM INTERNATIONAL GROUP BV; International Business Machines Corporation	Interactive generation of graphical visualizations of large data structures
6976072,	Mar 30 2001	Sharp Kabushiki Kaisha	Method and apparatus for managing job queues
7080063,	May 10 2002	Oracle International Corporation	Probabilistic model generation
7110999,	Jul 28 2000	International Business Machines Corporation	Maintaining pre-computed aggregate views incrementally in the presence of non-minimal changes
7131234,	Nov 25 2003	Weyerhaeuser NR Company	Combination end seal and restraint
7228658,	Aug 27 2003	Weyerhaeuser NR Company	Method of attaching an end seal to manufactured seeds
7246020,	Jun 10 2005	ENDO USA, INC	System and method for sorting data
7269586,	Dec 22 1999	Hitachi America, Ltd	Patient rule induction method on large disk resident data sets and parallelization thereof
7356965,	Dec 12 2003	Weyerhaeuser NR Company	Multi-embryo manufactured seed
7532214,	May 25 2005	Spectra AB; Sectra AB	Automated medical image visualization using volume rendering with local histograms
7539677,	Oct 09 2000	Battelle Memorial Institute	Sequential pattern data mining and visualization
7547488,	Dec 15 2004	Weyerhaeuser NR Company	Oriented strand board panel having improved strand alignment and a method for making the same
7552020,	Jun 10 2005	Par Pharmaceutical, Inc.	System and method for sorting data
7555865,	Nov 25 2003	Weyerhaeuser NR Company	Method and system of manufacturing artificial seed coats
7568309,	Jun 30 2004	Weyerhaeuser NR Company	Method and system for producing manufactured seeds
7584205,	Jun 27 2005	Ab Initio Software LLC; Architecture LLC; Ab Initio Technology LLC	Aggregating data with complex operations
7591287,	Dec 18 2003	Weyerhaeuser NR Company	System and method for filling a seedcoat with a liquid to a selected level
7603807,	Nov 26 2003	Weyerhaeuser NR Company	Vacuum pick-up device with mechanically assisted release
7639256,	Oct 08 1999	BLUE YONDER GROUP, INC	System and method for displaying graphs
7654037,	Jun 30 2005	Weyerhaeuser NR Company	Method to improve plant somatic embryo germination from manufactured seed
7716162,	Dec 30 2004	GOOGLE LLC	Classification of ambiguous geographic references
7739096,	Mar 09 2000	Smartsignal Corporation	System for extraction of representative data for training of adaptive process monitoring equipment
7797626,	Feb 13 2003	SAP SE	Managing different representations of information
7830381,	Dec 21 2006	Sectra AB	Systems for visualizing images using explicit quality prioritization of a feature(s) in multidimensional image data sets, related methods and computer products
7890397,	Aug 29 2006	United Services Automobile Association (USAA)	System, method, and computer-readable medium for settling accounts
7894645,	Aug 10 2000	Ohio State University	High-resolution digital image processing in the analysis of pathological materials
7987195,	Apr 08 2008	GOOGLE LLC	Dynamic determination of location-identifying search phrases
7996170,	Jun 10 2005	Par Pharmaceutical, Inc.	System and method for sorting data
8024458,	May 16 2008	Amazon Technologies, Inc.	Tracking the frequency distribution of streaming values
8041129,	May 16 2006	Sectra AB	Image data set compression based on viewing parameters for storing medical image data from multidimensional data sets, related systems, methods and computer products
8122056,	May 17 2007	R2 SOLUTIONS LLC	Interactive aggregation of data on a scatter plot
8239170,	Mar 09 2000	Smartsignal Corporation	Complex signal decomposition and modeling
8275577,	Sep 19 2006	Smartsignal Corporation	Kernel-based method for detecting boiler tube leaks
8295620,	May 16 2006	Sectra AB	Image data set compression based on viewing parameters for storing medical image data from multidimensional data sets, related systems, methods and computer products
8311774,	Dec 15 2006	Smartsignal Corporation	Robust distance measures for on-line monitoring
8620596,	Jun 10 2005	Par Pharmaceutical, Inc.	System and method for sorting data
8620853,	Jul 19 2011	Smartsignal Corporation	Monitoring method using kernel regression modeling with pattern sequences
8691575,	Sep 30 2003	Weyerhaeuser NR Company	General method of classifying plant embryos using a generalized Lorenz-Bayes classifier
8694528,	Apr 08 2008	GOOGLE LLC	Dynamic determination of location-identifying search phrases
8856143,	Dec 30 2004	GOOGLE LLC	Classification of ambiguous geographic references
8965839,	Dec 19 2012	International Business Machines Corporation	On the fly data binning
8977589,	Dec 19 2012	International Business Machines Corporation	On the fly data binning
9053353,	Jun 01 1998	Weyerhaeuser NR Company	Image classification of germination potential of somatic embryos
9213471,	Jun 22 2006	Microsoft Technology Licensing, LLC	Content visualization
9213707,	Oct 12 2012		Ordered access of interrelated data files
9250625,	Jul 19 2011	GE INTELLIGENT PLATFORMS, INC	System of sequential kernel regression modeling for forecasting and prognostics
9256224,	Jul 19 2011	GE INTELLIGENT PLATFORMS, INC	Method of sequential kernel regression modeling for forecasting and prognostics
9298799,	Dec 11 2002	Altera Corporation	Method and apparatus for utilizing patterns in data to reduce file size
9323738,	Dec 30 2004	GOOGLE LLC	Classification of ambiguous geographic references
9495775,	Jun 28 2002	Microsoft Technology Licensing, LLC	System and method for visualization of categories
9633104,	May 03 2013	SAS Institute Inc.	Methods and systems to operate on group-by sets with high cardinality
9922113,	Jun 09 2015	WELLS FARGO BANK, N A	Systems and methods for indexing and aggregating data records

THIS PATENT REFERENCES THESE PATENTS:

Patent	Priority	Assignee	Title
3816726,
4719571,	Mar 05 1986	International Business Machines Corporation; INTERNATIONAL BUSINESS MACHINES CORPORATION, A CORP OF NEW YORK	Algorithm for constructing tree structured classifiers
4868771,	Mar 30 1987	Intel Corporation	Computer image generation with topographical response
4928247,	Aug 13 1987	HEWLETT-PACKARD DEVELOPMENT COMPANY, L P	Method and apparatus for the continuous and asynchronous traversal and processing of graphics data structures
4994989,	Oct 09 1987	Hitachi, Ltd.	Displaying method and apparatus for three-dimensional computer graphics
5043920,	Sep 20 1989	International Business Machines Corporation; INTERNATIONAL BUSINESS MACHINES CORPORATION, ARMONK, NEW YORK 10504 A CORP OF NEW YORK	Multi-dimension visual analysis
5072395,	Apr 12 1990	Motorola, Inc.; MOTOROLA, INC , A CORP OF DELAWARE	Navigation system with easily learned destination selection interface
5150457,	May 02 1990	International Business Machines Corporation	Enhanced visualization using translucent contour surfaces
5164904,	Jul 26 1990	Farradyne Systems, Inc.; FARRADYNE SYSTEMS, INC	In-vehicle traffic congestion information system
5201047,	Dec 21 1989	International Business Machines Corporation	Attribute-based classification and retrieval system
5247666,	May 28 1991	BUCKWOLD, LIA	Data management method for representing hierarchical functional dependencies
5251131,	Jul 31 1991	Oracle International Corporation	Classification of data records by comparison of records to a training database using probability weights
5253333,	Jan 25 1989	Hitachi, Ltd.	Attribute value predicting method and system in the learning system
5282262,	Apr 09 1992	Sony Corporation	Method and apparatus for transforming a two-dimensional video signal onto a three-dimensional surface
5295243,	Dec 29 1989	SAP America, Inc	Display of hierarchical three-dimensional structures with rotating substructures
5307456,	Dec 04 1990	Sony Electronics, INC	Integrated multi-media production and authoring system
5325445,	May 29 1992	EASTMAN KODAK COMPANY A NJ CORP	Feature classification using supervised statistical pattern recognition
5418946,	Sep 27 1991	Fuji Xerox Co., Ltd.	Structured data classification device
5420968,	Sep 30 1993	International Business Machines Corporation	Data processing system and method for displaying dynamic images having visual appearances indicative of real world status
5426780,	Feb 28 1992	Intergraph Software Technologies Company	System for dynamic segmentation analysis using conversion of relational data into object-oriented data
5459829,	Feb 13 1991	Kabushiki Kaisha Toshiba	Presentation support system
5463773,	May 25 1992	Fujitsu Limited	Building of a document classification tree by recursive optimization of keyword selection function
5519865,	Jul 30 1993	Mitsubishi Denki Kabushiki Kaisha	System and method for retrieving and classifying data stored in a database system
5528735,	Mar 23 1993	RPX Corporation	Method and apparatus for displaying data within a three-dimensional information landscape
5553163,	Dec 26 1991	Thomson-CSF	Polytomous segmentation process
5555354,	Mar 23 1993	RPX Corporation	Method and apparatus for navigation within three-dimensional information landscape
5604821,	Feb 28 1992	The University of South Florida	Structure and method for dynamic scene analysis
5634087,	Feb 28 1991	BANK ONE COLORADO, NA, AS AGENT	Rapidly trainable neural tree network
5659731,	Jun 19 1995	DUN & BRADSTREET INC ; OLD D & B OPERATING COMPANY, INC ; NEW DUN & BRADSTREET CORPORATION, THE	Method for rating a match for a given entity found in a list of entities
5671333,	Apr 07 1994	THE CHASE MANHATTAN BANK, AS COLLATERAL AGENT	Training apparatus and method
5675711,	May 13 1994	TREND MICRO INCORPORATED	Adaptive statistical regression and classification of data strings, with application to the generic detection of computer viruses
5675785,	Oct 04 1994	CA, INC	Data warehouse which is accessed by a user using a schema of virtual tables
5675786,	Jan 31 1994		Accessing data held in large databases
5680476,	Jul 03 1991	Robert Bosch GmbH	Method of classifying signals, especially image signals
5694524,	Feb 15 1994	R R DONNELLEY & SONS COMPANY	System and method for identifying conditions leading to a particular result in a multi-variant system
5696964,	Apr 16 1996	NEC Corporation	Multimedia database retrieval system which maintains a posterior probability distribution that each item in the database is a target of a search
5706495,	May 07 1996	International Business Machines Corporation	Encoded-vector indices for decision support and warehousing
5724573,	Dec 22 1995	International Business Machines Corporation	Method and system for mining quantitative association rules in large relational tables
5727199,	Nov 13 1995	International Business Machines Corporation	Database mining using multi-predicate classifiers
5732230,	May 19 1995	Ricoh Company, LTD	Computer user interface for manipulating image fragments using drag, drop and merge operations
5737487,	Feb 13 1996	Apple Inc	Speaker adaptation based on lateral tying for large-vocabulary continuous speech recognition
5748852,	Sep 16 1994	Reena Optics LLC	Fuzzy-logic classification system
5787274,	Nov 29 1995	International Business Machines Corporation	Data mining method and system for generating a decision tree classifier for data records based on a minimum description length (MDL) and presorting of records

ASSIGNMENT RECORDS Assignment records on the USPTO

////////

Executed on	Assignor	Assignee	Conveyance	Frame	Reel	Doc
Mar 11 1997		Silicon Graphics, Inc.	(assignment on the face of the patent)
Aug 07 1997	RATHMANN, PETER	SILICON GRAPHICS, INC	ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS	008680	0406	pdf
Aug 07 1997	HABER, EBEN M	SILICON GRAPHICS, INC	ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS	008680	0406	pdf
Apr 12 2005	SILICON GRAPHICS, INC AND SILICON GRAPHICS FEDERAL, INC EACH A DELAWARE CORPORATION	WELLS FARGO FOOTHILL CAPITAL, INC	SECURITY AGREEMENT	016871	0809	pdf
Oct 17 2006	Silicon Graphics, Inc	General Electric Capital Corporation	SECURITY INTEREST SEE DOCUMENT FOR DETAILS	018545	0777	pdf
Sep 26 2007	General Electric Capital Corporation	MORGAN STANLEY & CO , INCORPORATED	ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS	019995	0895	pdf
Jun 04 2009	Silicon Graphics, Inc	GRAPHICS PROPERTIES HOLDINGS, INC	CHANGE OF NAME SEE DOCUMENT FOR DETAILS	028066	0415	pdf
Dec 24 2012	GRAPHICS PROPERTIES HOLDINGS, INC	RPX Corporation	ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS	029564	0799	pdf

MAINTENANCE FEES AND DATES: Maintenance records on the USPTO

Date	Maintenance Fee Events
Mar 27 2003	M1551: Payment of Maintenance Fee, 4th Year, Large Entity.
Apr 30 2003	ASPN: Payor Number Assigned.
Mar 28 2007	M1552: Payment of Maintenance Fee, 8th Year, Large Entity.
Mar 28 2011	M1553: Payment of Maintenance Fee, 12th Year, Large Entity.

Date	Maintenance Schedule
Sep 28 2002	4 years fee payment window open
Mar 28 2003	6 months grace period start (w surcharge)
Sep 28 2003	patent expiry (for year 4)
Sep 28 2005	2 years to revive unintentionally abandoned end. (for year 4)
Sep 28 2006	8 years fee payment window open
Mar 28 2007	6 months grace period start (w surcharge)
Sep 28 2007	patent expiry (for year 8)
Sep 28 2009	2 years to revive unintentionally abandoned end. (for year 8)
Sep 28 2010	12 years fee payment window open
Mar 28 2011	6 months grace period start (w surcharge)
Sep 28 2011	patent expiry (for year 12)
Sep 28 2013	2 years to revive unintentionally abandoned end. (for year 12)