A computer implemented method for functional integration of metadata for a plurality of databases, includes: creating a single set of classes and instances for the classes for metadata of at least one relational data schema and at least one non-relational data schema for the plurality of databases; defining semantic relationships between the instances based on the structural information in the relational data schema and the non-relational data schema for the plurality of databases; associating the semantic relationships with a property class; creating a single set of triples for the semantic relationships between the instances; storing the single set of triples in a file; and using the set of triples stored in the file to navigate between the plurality of databases to formulate a response to a query.
16. A computer implemented method for functional integration of metadata for a plurality of databases, comprising:
creating a single set of classes and instances for the classes for metadata of at least one relational data schema and at least one non-relational data schema for the plurality of databases, wherein the single set of classes comprises first classes for the metadata and second classes for the instances of the first classes;
defining semantic relationships between the instances of the first classes and the second classes based on the structural information in the relational data schema and the non-relational data schema for the plurality of databases;
creating a first set of triples for the semantic relationships for the at least one relational data schema, wherein a range of one of the first set of triples links to a domain of another of the first set of triples forming a first chain of triples;
creating a second set of triples for the semantic relationships for the non-relational data schema, wherein a range of one of the second set of triples links to a domain of another of the second set of triples forming a second chain of triples, wherein the first chain and the second chain comprise matching attribute values;
storing the first chain and the second chain in a single file; and
using the matching attribute values of the first chain and the second chain to navigate between the plurality of databases to formulate a response to a query.
18. A computer program product for functional integration of metadata for a plurality of databases, comprising:
a computer readable memory having computer readable program code embodied therewith, the computer readable program code configured to:
create a single set of classes and instances for the classes for metadata of at least one relational data schema and at least one non-relational data schema for the plurality of databases, wherein the single set of classes comprises first classes for the metadata and second classes for the instances of the first classes;
define semantic relationships between the instances of the first classes and the second classes based on the structural information in the relational data schema and the non-relational data schema for the plurality of databases;
create a first set of triples for the semantic relationships for the at least one relational data schema, wherein a range of one of the first set of triples links to a domain of another of the first set of triples forming a first chain of triples;
create a second set of triples for the semantic relationships for the non-relational data schema, wherein a range of one of the second set of triples links to a domain of another of the second set of triples forming a second chain of triples, wherein the first chain and the second chain comprise matching attribute values;
store the first chain and the second chain in a single file; and
use the matching attribute values of the first chain and the second chain to navigate between the plurality of databases to formulate a response to a query.
1. A computer implemented method for functional integration of metadata for a plurality of databases, comprising:
creating a single set of classes and instances for the classes for metadata of at least one relational data schema and at least one non-relational data schema for the plurality of databases, wherein the single set of classes comprises first classes for the metadata and second classes for the instances of the first classes;
creating a single set of semantic relationships between the instances of the first classes and the second classes based on structural information in the relational data schema and the non-relational data schema;
creating a single set of triples for the semantic relationships between the instances of the first classes and the second classes; and
storing the single set of triples in a single file;
wherein the creating the single set of the semantic relationships between the instances of the first classes and the second classes based on the structural information in the relational data schema and the non-relational data schema comprises:
defining the semantic relationships between the instances of the first classes and the second classes based on the structural information in the relational data schema and the non-relational data schema for the plurality of databases;
creating a first set of triples for the semantic relationships for the at least one relational data schema, wherein a range of one of the first set of triples links to a domain of another of the first set of triples forming a first chain of triples;
creating a second set of triples for the semantic relationships for the non-relational data schema, wherein a range of one of the second set of triples links to a domain of another of the second set of triples forming a second chain of triples; and
wherein the first chain and the second chain comprise matching attribute values.
6. A computer program product for functional integration of metadata for a plurality of databases, the computer program product comprising:
a computer readable memory having computer readable program code embodied therewith, the computer readable program code configured to:
create a single set of classes and instances for the classes for metadata of at least one relational data schema and at least one non-relational data schema for the plurality of databases, wherein the single set of classes comprises first classes for the metadata and second classes for the instances of the first classes;
create a single set of semantic relationships between the instances of the first classes and the second classes based on structural information in the relational data schema and the non-relational data schema;
create a single set of triples for the semantic relationships between the instances of the first classes and the second classes; and
store the single set of triples in a single file;
wherein the computer readable program code configured to create the single set of the semantic relationships between the instances of the first classes and the second classes based on the structural information in the relational data schema and the non-relational data schema is further configured to:
define the semantic relationships between the instances of the first classes and the second classes based on the structural information in the relational data schema and the non-relational data schema for the plurality of databases;
create a first set of triples for the semantic relationships for the at least one relational data schema, wherein a range of one of the first set of triples links to a domain of another of the first set of triples forming a first chain of triples;
create a second set of triples for the semantic relationships for the non-relational data schema, wherein a range of one of the second set of triples links to a domain of another of the second set of triples forming a second chain of triples; and
wherein the first chain and the second chain comprise matching attribute values.
11. A system, comprising:
a plurality of databases defined by at least one relational data schema and at least one non-relational data schema; and
a server comprising a processor and a computer readable storage medium having computer readable program code embodied therewith, wherein when the computer readable program code is executed by the processor, causes the server to:
create a single set of classes and instances for the classes for metadata of the relational data schema and the non-relational data schema for the plurality of databases, wherein the single set of classes comprises first classes for the metadata and second classes for the instances of the first classes;
create a single set of semantic relationships between the instances of the first classes and the second classes based on structural information in the relational data schema and the non-relational data schema;
create a single set of triples for the semantic relationships between the instances of the first classes and the second classes; and
store the single set of triples in a single file;
wherein when the computer readable program code configured to create the single set of the semantic relationships between the instances of the first classes and the second classes based on the structural information in the relational data schema and the non-relational data schema is executed by the processor, further causes the server to:
define the semantic relationships between the instances of the first classes and the second classes based on the structural information in the relational data schema and the non-relational data schema for the plurality of databases;
create a first set of triples for the semantic relationships for the at least one relational data schema, wherein a range of one of the first set of triples links to a domain of another of the first set of triples forming a first chain of triples;
create a second set of triples for the semantic relationships for the non-relational data schema, wherein a range of one of the second set of triples links to a domain of another of the second set of triples forming a second chain of triples; and
wherein the first chain and the second chain comprise matching attribute values.
2. The method of
creating the single set of classes for the metadata of the relational data schema and the non-relational data schema for the plurality of databases, wherein the single set of classes comprises the first classes for the metadata and the second classes for the instances of the first classes;
creating the single set of the instances for the classes, wherein the single set of the instances comprises first instances for the first classes and second instances for the second classes;
associating the first instances with the first classes; and
associating the second instances with the second classes.
3. The method of
4. The method of
storing the first chain and the second chain in the single file; and
using the matching attribute values of the first chain and the second chain to navigate between the plurality of databases to formulate a response to a query.
5. The method of
7. The computer program product of
create the single set of classes for the metadata of the relational data schema and the non-relational data schema for the plurality of databases, wherein the single set of classes comprises the first classes for the metadata and the second classes for the instances of the first classes;
create the single set of the instances for the classes, wherein the single set of the instances comprises first instances for the first classes and second instances for the second classes;
associate the first instances with the first classes; and
associate the second instances with the second classes.
8. The computer program product of
store the first chain and the second chain in the single file; and
use the matching attribute values of the first chain and the second chain to navigate between the plurality of databases to formulate a response to a query.
9. The computer program product of
10. The computer program product of
12. The system of
create the single set of classes for the metadata of the relational data schema and the non-relational data schema for the plurality of databases, wherein the single set of classes comprises the first classes for the metadata and the second classes for the instances of the first classes;
create the single set of the instances for the classes, wherein the single set of the instances comprises first instances for the first classes and second instances for the second classes;
associate the first instances with the first classes; and
associate the second instances with the second classes.
13. The system of
14. The system of
store the first chain and the second chain in the single file; and
use the matching attribute values of the first chain and the second chain to navigate between the plurality of databases to formulate a response to a query.
15. The system of
17. The method of
19. The method of
The present application is a continuation-in-part of co-pending U.S. patent application Ser. No. 12/179,903, filed on Jul. 25, 2008.
Multiple data models, and the consequent databases, allow business processes to be automated through both custom-built applications and commercial off-the-shelf software package-built applications. Each data model rests upon its own domain of attributes, defined by data schemas. Often, the same business entities exist concurrently in several data schemas, with a combination of database schemas for relational databases and non-relational databases, such as Cubes, Reports, Dashboards, and Scorecards. Often, the attributes defined by the data schemas are differently named, data-typed and constraint-typed. This leads to a multiplicity of definitions of business entities, which creates problems in data integration endeavors, particularly in those directly concerned with information access and analysis.
Two approaches to the problem of data integration include a federated database approach and a data warehousing approach. The federated database approach brings attributes from different data schemas together within a single context or catalog. However, there are two drawbacks. Although the federated database approach accomplishes the structural integration of data, it fails in the functional integration of data. In a federated database, entities are individually cataloged. However, the federated database fails to reconstruct the conceptual entities. For example, assume that a business entity named Orders refers to a family of entities, where an Order has many Items and an Item has many Ship-to destinations. This family of entities would have at least three entities as a consequence of data decomposition under the federated database approach. However, the entity Orders is not reconstructed as a single conceptual entity with the child entities Item and Ship-to. Further, the federated approach does not deal with the metadata of non-relational data schemas.
The data warehousing approach makes a copy of related entities/tables and transforms them into a single entity/table. For example, the entities/tables Customers and Customer Types are placed within a single Customer dimension table by means of denormalization. However, such transformation cannot be accomplished with transaction tables such as Orders and Payments.
Furthermore, the known approaches require that users have a perfect knowledge of the underlying database structures in order to access the data. It is impractical to expect business users to learn the intricacies of the databases. Thus, data integration projects require architects to acquire perfect knowledge of the databases involved, which is a costly, time-consuming, and impractical process.
According to one embodiment of the present invention, a computer implemented method for functional integration of metadata for a plurality of databases, includes: creating a single set of classes and instances for the classes for metadata of at least one relational data schema and at least one non-relational data schema for the plurality of databases; creating a single set of semantic relationships between the instances based on structural information in the relational data schema and the non-relational data schema; and storing the single set of semantic relationships in a file.
In one aspect of the present invention, the creating the single set of classes and instances for the classes for the metadata of the relational data schema and the non-relational data schema for the plurality of databases includes: creating the single set of classes for the metadata of the relational data schema and the non-relational data schema for the plurality of databases; creating the single set of the instances for the classes; and associating the instances with the classes.
In one aspect of the present invention, the creating the single set of the semantic relationships between the instances based on the structural information in the relational data schema and the non-relational data schema includes: defining the semantic relationships between the instances based on the structural information in the relational data schema and the non-relational data schema for the plurality of databases; associating the semantic relationships with a property class; and creating a single set of triples for the semantic relationships between the instances.
In one aspect of the present invention, each of the triples comprises a pair of instances of the set of instances and a property linking together the pair of instances.
In one aspect of the present invention, the single set of triples is stored in the file.
In one aspect of the present invention, the set of triples stored in the file is used to navigate between the plurality of databases to formulate a response to a query.
System and computer program products corresponding to the above-summarized methods are also described and claimed herein.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.
Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java® (Java, and all Java-based trademarks and logos are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both), Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
Aspects of the present invention are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.
The embodiments of the method of the present invention accomplish reverse data decomposition, a process for the functional integration of metadata of data schemas for relational and non-relational databases, where the attributes, and other metadata artifacts from the functional decomposition of data, are integrated into a single, unified conceptual schema. Non-relational databases include data that are derived from a relational database, such as Cubes and Reports.
In one embodiment, the method employs the frame-based knowledge representation technique and the dialect of the Web Ontology Language (OWL). OWL includes classes, individuals, and properties. A class defines a group of individuals that belong together because they share some properties. Individuals are instances of classes, and properties may be used to relate one individual to another. Properties can be used to state relationships between individuals or from individuals to data values. OWL represents the content of information through “triples”, using the format <domain><property><range>. The properties link together instances drawn from their respective classes based on the structural information in the data schemas.
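The <domain><property><range> format can be pictured with a minimal sketch. The `Triple` type, its field names, and the property name `has_Measure` are illustrative assumptions and are not taken from the described embodiment; the cube and measure names follow the Sales_Analysis example discussed in this description.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement in <domain><property><range> form (illustrative type)."""

    domain: str  # the individual the statement is about
    prop: str    # the property relating domain to range (hypothetical name)
    range_: str  # the related individual or data value

    def render(self) -> str:
        # Serialize in the angle-bracket notation used in the description.
        return f"<{self.domain}><{self.prop}><{self.range_}>"


# The Sales_Analysis cube has Sales_Amt as its measure.
MEASURE = Triple("Sales_Analysis", "has_Measure", "Sales_Amt")
print(MEASURE.render())
```

A plain tuple would serve equally well; the named fields only make the roles of domain, property, and range explicit.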
For Sales_Analysis, the instance 507 of the cube is associated with the Cubes class 508. The instance 509 of the Sales_Fact table 420 is associated with the Fact_Tables class 510. The instances 511 of the dimension tables 421-424 of Sales_Analysis are each associated with the Dimension_Tables class 512. The instances 513 of the members of the dimension tables 421-424 are each associated with the Dimension_Members class 514. The instance 515 of the measure for the Sales_Fact table 420 is associated with the Fact_Measures class 516.
For EOQ_Report, the instance 517 of the report is associated with the Report_Names class 518. The instances 519 of the attributes of the report are each associated with the Report_Attributes class 520. The instance 521 of the Category attribute is associated with the Filter_Parameters class 522. The instance 523 of Eoq-catogry.dot.sql is associated with the Query_File_Names class 524. The instance 525 of the C.Colon.Slash is associated with the Query_File_Locations class 526.
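The association of instances with classes described for Sales_Analysis and EOQ_Report can be sketched as a simple mapping. The dictionary layout and the helper `classes_of` are assumptions made for illustration; the class and instance names are those given in this description.

```python
from collections import defaultdict

# Class name -> instances associated with that class (one shared set for
# both the cube schema and the report schema).
CLASS_INSTANCES = {
    "Cubes": ["Sales_Analysis"],
    "Fact_Tables": ["Sales_Fact"],
    "Dimension_Tables": ["Customers_Dim", "Time_Dim", "Item_Dim", "Salesman_Dim"],
    "Fact_Measures": ["Sales_Amt"],
    "Report_Names": ["EOQ_Report"],
    "Report_Attributes": [
        "Company_Name", "Category", "Item_Description",
        "Date", "Sales_Amount", "Sales_Quantity",
    ],
    "Filter_Parameters": ["Category"],
}


def classes_of(class_instances):
    """Invert the mapping: instance name -> set of classes it belongs to."""
    index = defaultdict(set)
    for cls, instances in class_instances.items():
        for inst in instances:
            index[inst].add(cls)
    return dict(index)


INDEX = classes_of(CLASS_INSTANCES)
print(sorted(INDEX["Category"]))  # ['Filter_Parameters', 'Report_Attributes']
```

The inverted index shows that one instance, here Category, may be associated with more than one class.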
The method also defines the semantic relationships between the instances based on the structural information in the data schemas and associates the semantic relationships with a property class (303). As illustrated in
From the classes and instances, the method creates a single set of triples for the semantic relationships between the instances (304).
For the Sales_Analysis cube, the triple 607 indicates that the cube has Sales_Amt as the measure. The triples 608 indicate that the cube has Time_ID, Cust_ID, Item_ID, and Salesman_ID as (foreign) keys. The triples 609 indicate that the cube has the Customers_Dim 421, Item_Dim 423, Salesman_Dim 424, and Time_Dim 422 dimension tables. The triples 610 indicate that the Customers_Dim table 421 has Customer_ID, Customer_Nm, and Cust_Type as members. The triples 611 indicate that the Item_Dim table 423 has Item_ID and Item_Nm as members. The triples 612 indicate that the Salesman_Dim table 424 has Salesman_ID and Salesman_Nm as members. The triples 613 indicate that the Time_Dim table 422 has Time_ID and Date as members. The triples 614 indicate that the Customers_Dim 421, Item_Dim 423, Salesman_Dim 424, and Time_Dim 422 tables have Cust_ID, Item_ID, Salesman_ID, and Time_ID as (primary) keys, respectively.
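The way a range of one triple links to a domain of another to form a chain can be sketched for a subset of the Sales_Analysis triples. The property names (`has_Measure`, `has_Dimension`, `has_Member`) and the function `chains_from` are hypothetical, and the sketch assumes the triples contain no cycles.

```python
from collections import defaultdict

# A subset of the cube triples as (domain, property, range) tuples.
TRIPLES = [
    ("Sales_Analysis", "has_Measure", "Sales_Amt"),
    ("Sales_Analysis", "has_Dimension", "Customers_Dim"),
    ("Sales_Analysis", "has_Dimension", "Time_Dim"),
    ("Customers_Dim", "has_Member", "Customer_Nm"),
    ("Customers_Dim", "has_Member", "Cust_Type"),
    ("Time_Dim", "has_Member", "Date"),
]


def chains_from(start, triples):
    """Return every chain that begins at `start` and ends at a leaf range."""
    by_domain = defaultdict(list)
    for triple in triples:
        by_domain[triple[0]].append(triple)

    def walk(node, path):
        outgoing = by_domain.get(node, [])
        if not outgoing:
            if path:  # a leaf was reached: the chain is complete
                yield path
            return
        for triple in outgoing:
            # The range of this triple becomes the domain of the next one.
            yield from walk(triple[2], path + [triple])

    return list(walk(start, []))


CHAINS = chains_from("Sales_Analysis", TRIPLES)
for chain in CHAINS:
    print(" -> ".join([chain[0][0]] + [t[2] for t in chain]))
```

Each printed line is one chain, for example from the cube through a dimension table to one of its members.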
For EOQ_Report, the triple 615 indicates that the report has Time_ID as a key. The triples 616 indicate that the report has Company_Name, Category, Item_Description, Date, Sales_Amount, and Sales_Quantity as attributes. The triples 617 indicate that the EOQ_Report has Category as a filter parameter, Eoq-cateogry.sql as a query file name, and C:\ as a location of the query file.
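Storing the cube and report triples together and finding an attribute value common to both can be sketched as follows. This is only an illustration under stated assumptions: the property names, the helper functions, and the file name `metadata_triples.json` are hypothetical, and JSON stands in for whatever OWL serialization an embodiment would actually use. Time_ID is used as the matching value because both the cube and the report are described as having it as a key.

```python
import json
import os
import tempfile

CUBE_TRIPLES = [
    ("Sales_Analysis", "has_Key", "Time_ID"),
    ("Sales_Analysis", "has_Key", "Cust_ID"),
    ("Sales_Analysis", "has_Measure", "Sales_Amt"),
]
REPORT_TRIPLES = [
    ("EOQ_Report", "has_Key", "Time_ID"),
    ("EOQ_Report", "has_Attribute", "Sales_Amount"),
    ("EOQ_Report", "has_Filter_Parameter", "Category"),
]


def save_triples(path, triples):
    """Write all triples to one file (JSON used here as a stand-in format)."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump([list(t) for t in triples], handle)


def load_triples(path):
    """Read the triples back as (domain, property, range) tuples."""
    with open(path, encoding="utf-8") as handle:
        return [tuple(t) for t in json.load(handle)]


def shared_values(triples, left, right):
    """Range values that appear under both domains: candidate link points."""
    left_values = {rng for dom, _, rng in triples if dom == left}
    right_values = {rng for dom, _, rng in triples if dom == right}
    return left_values & right_values


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "metadata_triples.json")
    save_triples(path, CUBE_TRIPLES + REPORT_TRIPLES)
    LOADED = load_triples(path)

LINKS = shared_values(LOADED, "Sales_Analysis", "EOQ_Report")
print(LINKS)  # {'Time_ID'}
```

A query that starts in the report could then use such a shared value to cross over to the cube, and vice versa.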
The triples illustrated in
Although the present invention has been described in accordance with the embodiments shown, one of ordinary skill in the art will readily recognize that there could be variations to the embodiments and those variations would be within the spirit and scope of the present invention. Accordingly, many modifications may be made by one of ordinary skill in the art without departing from the spirit and scope of the appended claims.
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Sep 29 2010 | International Business Machines Corporation | (assignment on the face of the patent) | / |
Date | Maintenance Schedule |
Aug 12 2017 | 4 years fee payment window open |
Feb 12 2018 | 6 months grace period start (w surcharge) |
Aug 12 2018 | patent expiry (for year 4) |
Aug 12 2020 | 2 years to revive unintentionally abandoned end. (for year 4) |
Aug 12 2021 | 8 years fee payment window open |
Feb 12 2022 | 6 months grace period start (w surcharge) |
Aug 12 2022 | patent expiry (for year 8) |
Aug 12 2024 | 2 years to revive unintentionally abandoned end. (for year 8) |
Aug 12 2025 | 12 years fee payment window open |
Feb 12 2026 | 6 months grace period start (w surcharge) |
Aug 12 2026 | patent expiry (for year 12) |
Aug 12 2028 | 2 years to revive unintentionally abandoned end. (for year 12) |