Embodiments of the present invention provide techniques for generating ontologies. In one embodiment, techniques are provided for automatically generating an ontology based upon input information. The input information may, for example, be in the form of XSD, XML, WSDL, or WSRP, etc. The automatically generated ontology may be encoded in OWL or other RDF-compliant language. A set of inference rules may also be automatically generated using the input information. The automatically generated ontology and the set of inference rules may be stored in a database for further processing.

Patent: 8214401
Priority: Feb 26 2009
Filed: Feb 26 2009
Issued: Jul 03 2012
Expiry: Feb 27 2030
Extension: 366 days
Assg.orig Entity: Large
15
46
all paid
1. A computer-implemented method for generating an ontology, the method comprising:
receiving, at a processing system, input information in a first format;
annotating, at the processing system, the input information to generate annotated input information, wherein:
the annotating comprises adding a first set of annotations and a second set of annotations to the input information;
the first set of annotations is comprised of relationships between a plurality of entities present in the input information; and
the second set of annotations is comprised of annotations that can be used to generate a set of explicit business rules for the purposes of a specific organization, wherein:
the input information comprises an XSD file and an XML file;
annotating the input information comprises generating an annotated XSD file and an annotated XML file;
and wherein generating the ontology comprises:
converting the annotated XSD file to an OWL schema file; and
converting the annotated XML file to an OWL instance file based upon the OWL schema file; and
generating, at the processing system, based upon the annotated input information, an ontology encoded in a second format, wherein the second format is different from the first format.
16. A system for generating an ontology, the system comprising:
a memory configured to store the ontology; and
a processor coupled to the memory, wherein the processor is configured to:
receive input information in a first format;
annotate the input information to generate annotated input information by adding a first set of annotations and a second set of annotations to the input information;
wherein:
the first set of annotations is comprised of relationships between a plurality of entities present in the input information; and
the second set of annotations is comprised of annotations that can be used to generate a set of explicit business rules for the purposes of a specific organization, wherein:
the input information comprises an XSD file and an XML file;
annotating the input information comprises generating an annotated XSD file and an annotated XML file; and
wherein generating the ontology comprises:
converting the annotated XSD file to an OWL schema file; and
converting the annotated XML file to an OWL instance file based upon the OWL schema file;
and
generate, based upon the annotated input information, an ontology encoded in a second format, wherein the second format is different from the first format.
10. A computer-readable storage medium storing a plurality of instructions for controlling a processor to generate an ontology, the plurality of instructions comprising:
instructions that cause the processor to receive input information in a first format;
instructions that cause the processor to annotate the input information to generate annotated input information, wherein:
the instructions that cause the processor to annotate the input information comprise instructions that cause the processor to add a first set of annotations and a second set of annotations to the input information;
the first set of annotations is comprised of relationships between a plurality of entities present in the input information; and
the second set of annotations is comprised of annotations that can be used to generate a set of explicit business rules for the purposes of a specific organization, wherein:
the input information comprises an XSD file and an XML file;
annotating the input information comprises generating an annotated XSD file and an annotated XML file; and
wherein the instructions that cause the processor to generate the ontology comprise:
instructions that cause the processor to convert the annotated XSD file to an OWL schema file; and
instructions that cause the processor to convert the annotated XML file to an OWL instance file based upon the OWL schema file; and
instructions that cause the processor to generate, based upon the annotated input information, an ontology encoded in a second format, wherein the second format is different from the first format.
2. The method of claim 1 wherein the first format is a format encoded using one of Extensible Markup Language (XML), XML Schema Definition (XSD), Web Services Description Language (WSDL), or Web Services for Remote Portlets (WSRP).
3. The method of claim 1 wherein the second format is a format encoded using Resource Description Framework (RDF) compliant language for encoding an ontology.
4. The method of claim 3 wherein the second format uses Web Ontology Language (OWL).
5. The method of claim 1 wherein the second set of annotations comprises one or more annotations that can be used to generate one or more inference rules based on the input information.
6. The method of claim 1 wherein converting the XSD file to the OWL schema file comprises mapping various XSD elements in the annotated XSD file to corresponding OWL elements based upon a mapping table.
7. The method of claim 1 wherein the generating comprises generating a set of one or more rules based upon the annotated input information.
8. The method of claim 1 wherein the generating comprises:
generating an intermediate ontology in the second format, the intermediate ontology annotated using the second set of annotations; and
removing the second set of annotations from the intermediate ontology.
9. The method of claim 1 further comprising generating an ontology graph based upon the generated ontology.
11. The computer-readable storage medium of claim 10 wherein the first format is a format encoded using one of Extensible Markup Language (XML), XML Schema Definition (XSD), Web Services Description Language (WSDL), or Web Services for Remote Portlets (WSRP).
12. The computer-readable storage medium of claim 10 wherein the second format is a format encoded using Resource Description Framework (RDF) compliant language for encoding an ontology.
13. The computer-readable storage medium of claim 10 wherein the second set of annotations comprises one or more annotations that can be used to generate one or more inference rules based on the input information.
14. The computer-readable storage medium of claim 10 wherein the instructions that cause the processor to generate an ontology comprise instructions that cause the processor to generate a set of one or more rules based upon the annotated input information.
15. The computer-readable storage medium of claim 10 further comprising instructions that cause the processor to generate an ontology graph based upon the generated ontology.

Embodiments of the present invention relate to ontologies, and more specifically to techniques for automatically generating ontologies for enterprise applications.

Many enterprise applications today rely on Extensible Markup Language (XML) based data formats to exchange and share information, for example via the Internet. XML-based data formats that have been used for many enterprise applications include XML Schema Definition (XSD), Web Services Description Language (WSDL), Web Services for Remote Portlets (WSRP), and others. The use of these XML-based data formats, however, limits many enterprise applications to syntactic integration. This is because XML-based documents define only the syntax and structure of data and therefore lack any representation of the semantics of, and relationships within, enterprise data. Accordingly, a different data format is needed to achieve semantic integration for enterprise data.

With the advent of semantic technologies, the importance of ontologies has grown manifold. An ontology is a formal representation of knowledge, generally a formal representation of a set of concepts within a domain and the relationships between those concepts. Ontologies are being used in several different areas including business analytics, enterprise systems, artificial intelligence and the like, and have the potential to be used in several other fields and applications.

An ontology is typically encoded using an ontology language. Several ontology languages are available. The Web Ontology Language (OWL) has become the industry standard for representing web ontologies. OWL can be used to explicitly represent the meaning of terms in vocabularies and the relationships between the terms. OWL thus provides facilities for expressing meaning and semantics that go far beyond the capabilities of XML-based data formats such as XSD, XML, WSRP, or WSDL. OWL is being developed by the Web Ontology Working Group as part of the World Wide Web Consortium (W3C) Semantic Web Activity. OWL is based on Resource Description Framework (RDF). Further information related to OWL and RDF may be found at the W3C website.

Ontologies are often generated manually. Such manual intervention is, however, both time consuming and error prone. On the other hand, conventional automation tools and methodologies that have been developed for generating ontologies do not meet the enterprise business needs. For example, many of these automation tools and methodologies cannot handle large amounts of enterprise data for generating large ontologies. Accordingly, existing implementations of these automation tools are generally limited to handling only a small amount of enterprise data, and often do not scale well when dealing with large amounts of enterprise data. Many of these tools also lack other desired features.

Embodiments of the present invention provide techniques for generating ontologies. In one embodiment, techniques are provided for automatically generating an ontology based upon input information. The input information may, for example, be in the form of XSD, XML, WSDL, or WSRP, etc. The automatically generated ontology may be encoded in OWL or other RDF-compliant language. A set of inference rules may also be automatically generated using the input information. The automatically generated ontology and the set of inference rules may be stored in a database for further processing.

According to an embodiment of the present invention, techniques are provided for generating an ontology. A system may receive input information in a first format. The input information may be annotated. A first set of annotations and a second set of annotations may be added to the input information. Based upon the annotated input information, an ontology is generated. The ontology is generated in a second format that is different from the first format.

In one embodiment, the first format may be XML or XSD. The second format may be RDF-compliant language such as OWL. The first set of annotations comprises one or more annotations that mark relationships between entities present within the input information. The second set of annotations comprises one or more annotations that can be used to generate one or more rules based on the input information.

In one embodiment, the input information may include an XSD document and an XML document. The XSD document and the XML document may be annotated to generate an annotated XSD file and an annotated XML file. In one embodiment, generating an ontology may comprise converting the annotated XSD file to an OWL schema file, and converting the annotated XML file to an OWL instance file. The OWL instance file may be generated based upon the type of information obtained from the OWL schema file. In one embodiment, converting the annotated XSD file to the OWL schema file may comprise mapping the various XSD elements in the annotated XSD file to their corresponding OWL elements based upon a mapping table.

In one embodiment, generating an ontology may comprise generating a set of one or more rules based upon the annotated input information. In one embodiment, generating an ontology may comprise generating an intermediate ontology in the second format, the intermediate ontology annotated using the second set of annotations; and removing the second set of annotations from the intermediate ontology. An ontology graph may be generated based upon the generated ontology.

The foregoing, together with other features and embodiments, will become more apparent when referring to the following specification, claims, and accompanying drawings.

FIG. 1A is a simplified block diagram of a system incorporating an embodiment of the present invention;

FIG. 1B is a simplified flowchart depicting a method for automatically transforming an XSD document to an OWL schema incorporating an embodiment of the present invention;

FIG. 2 is a simplified flowchart depicting a method for automatically generating ontologies according to an embodiment of the present invention; and

FIG. 3 is a simplified block diagram of a computer system that may be used to practice an embodiment of the present invention.

In the following description, for the purposes of explanation, specific details are set forth in order to provide a thorough understanding of embodiments of the invention. However, it will be apparent that the invention may be practiced without these specific details.

Embodiments of the present invention provide techniques for generating ontologies. In one embodiment, techniques are provided for automatically generating an ontology based upon input information. The input information may, for example, be in the form of XSD, XML, WSDL, or WSRP, etc. The automatically generated ontology may be encoded in OWL or other RDF-compliant language. A set of inference rules may also be automatically generated using the input information. The automatically generated ontology and the set of inference rules may be stored in a database for further processing.

An ontology is a formal representation of knowledge. An ontology is a formal representation of a set of concepts within a domain and relationships between the concepts. An ontology is typically encoded using an ontology language such as OWL, other Resource Description Framework (RDF) compliant languages recommended by the W3C, and the like. OWL is based upon RDF, which is a W3C specification standard for describing resources on the web. OWL can be used to explicitly represent the meaning of terms in vocabularies and the relationships between the terms. OWL thus provides facilities for expressing meaning and semantics that go far beyond the capabilities of languages such as XML. OWL is being developed by the Web Ontology Working Group as part of the W3C Semantic Web Activity and is geared for use by applications that need to process the meaning of information instead of just presenting the information to humans. In this manner, OWL facilitates greater machine interpretability of information (e.g., Web content) than other languages.

As an example, an ontology can be viewed as a set of ontological statements. Each ontological statement is called a triple because it comprises three parts/terms: a subject, a predicate, and an object. The subject identifies the entity that the statement describes, the predicate identifies the property, characteristic, relationship, or attribute of the subject, and the object identifies the value of the predicate. As a specific example, an ontology for a purchasing application may include an ontological statement “Purchase Order X has an owner that is Business Corp” or “Purchase Order X is owned by Business Corp.” In the example, “Purchase Order X” is the subject, “owner/own” is the predicate, and “Business Corp” is the object. In ontologies encoded using an RDF-compliant language such as OWL, subjects, predicates, and objects may be represented by triples of Uniform Resource Identifiers (URIs). A URI uniquely identifies the subject, or the predicate, or the object. The URIs may or may not identify locations of searchable data corresponding to the entities described by ontological statements.
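The triple structure described above can be illustrated with a minimal Python sketch. The base URI, the `Triple` type, and the `objects_of` helper are assumptions introduced only for illustration; they are not part of the described system.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One ontological statement: subject, predicate, object (all URIs)."""
    subject: str
    predicate: str
    obj: str


# Hypothetical base URI, chosen only for this illustration.
BASE = "http://example.com/purchasing#"

# "Purchase Order X has an owner that is Business Corp"
statement = Triple(
    subject=BASE + "PurchaseOrderX",
    predicate=BASE + "hasOwner",
    obj=BASE + "BusinessCorp",
)

# An ontology viewed as a set of such statements.
ontology = {statement}


def objects_of(triples, subject, predicate):
    """Return the objects of every statement matching subject and predicate."""
    return sorted(
        t.obj for t in triples
        if t.subject == subject and t.predicate == predicate
    )


owners = objects_of(ontology, BASE + "PurchaseOrderX", BASE + "hasOwner")
print(owners)
```

Using full URIs for all three terms mirrors the RDF convention; whether those URIs resolve to retrievable data is a separate question, as noted above.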

FIG. 1A is a simplified block diagram of a system 100 incorporating an embodiment of the present invention. As depicted in FIG. 1A, system 100 comprises a processing system 102 that is configured to automate generation of ontologies. The ontologies that are automatically generated by processing system 102 may be encoded in OWL, or any other RDF-compliant language.

In the embodiment depicted in FIG. 1A, processing system 102 comprises several modules that facilitate automated generation of ontologies for enterprise applications (e.g., a purchasing application). These modules include a user interface module 110, an annotator 112, an ontology generator 114, an ontology sieve 116, an inference rule generator 118, an ontology loader 119, and a Jena adapter 130. These modules may be implemented in software (e.g., code, instructions, program) executed by a processor of processing system 102, hardware, or combinations thereof. It should be apparent that the modules depicted in FIG. 1A are not intended to limit the scope of the present invention, as recited in the claims. In alternative embodiments, processing system 102 may have more or fewer modules than the ones depicted in FIG. 1A.

In the embodiment depicted in FIG. 1A, the ontology (shown as ontology 124) that is automatically generated by processing system 102 is persisted in database 106. Ontology 124 may be encoded in various different languages. In one embodiment, ontology 124 is encoded using an RDF-compliant language such as the OWL web ontology language and may be queried using an RDF-compliant query language such as SPARQL. Ontology 124 may include data for one or more domains. For example, in one embodiment, ontology 124 may include information for an enterprise. In alternative embodiments, ontology 124 may be persisted using other techniques. Ontology 124 may also be stored remotely from processing system 102. For example, ontology 124 may be stored in a remote location accessible to processing system 102 via a communication network such as communication network 108 depicted in FIG. 1A. Further, ontology 124 may also be stored in a distributed manner in a network environment. While only one ontology 124 is depicted in FIG. 1A, it should be apparent that multiple ontologies may be generated by processing system 102 as described in this application.

Communication network 108 may be any network configured for data communication such as a local area network (LAN), a wide area network (WAN), the Internet, and the like. Various different communication protocols may be used to communicate data through communication network 108 including wired and wireless protocols.

User interface 110 provides an interface for receiving input information from a user of processing system 102 and for outputting information from processing system 102. For example, a user of processing system 102 may input enterprise information related to an enterprise application via user interface 110. For example, the information entered by a user of processing system 102 may be information related to a purchasing application such as information related to the buyer and seller, quantity, price, and other information. The information entered by a user may be encoded in a format such as XSD, XML, WSDL, WSRP, or other XML-based format. As mentioned previously, there is a lack of semantic and relationship representation of information in XML-based documents since only the syntax and structure of the information are defined in XML-based documents.

In one embodiment, input information (e.g., enterprise information) may be received from a device or machine other than directly from a user. For example, input information may be provided by a client system 120 located remotely from processing system 102 and coupled communicatively with processing system 102 via communication network 108. Client system 120 may be for example a computer, a mobile device such as a cell phone, a personal digital assistant (PDA), and the like. In such an embodiment, the client system may be configured to provide input information periodically to processing system 102 via communication network 108.

In one embodiment, processing system 102 is configured to automatically generate an ontology and a set of inference rules based upon the input information received by processing system 102. The automatically generated ontology is encoded in a format that is different from the format of the input information. For example, the input information may be in a format such as XSD, XML, WSDL, WSRP, or other XML-based format, while the automatically generated ontology may, for example, be encoded in OWL, or other RDF-compliant language. The automatically generated ontology may then be persisted in a database such as database 106 depicted in FIG. 1A. The set of inference rules that are automatically generated by processing system 102 may also be persisted in database 106. In one embodiment, the set of inference rules may be stored in an inference rule base 122 in database 106. In this manner, input information received by processing system 102 is automatically transformed from an XML-based data format to an OWL-based ontology and a set of inference rules that are capable of representing the relationship and semantics of the input information.

For purposes of simplifying the following description, it is assumed that the ontology that is generated and stored by processing system 102 (e.g., ontology 124) is encoded in OWL. However, this is not intended to limit the scope of the present invention. Ontology 124 may also be generated and stored in any other RDF-compliant language.

In one embodiment of the present invention, the input information received by processing system 102 (e.g., XML, XSD, WSDL, or WSRP) is first annotated to generate annotated input information. In one embodiment, two types of annotations may be inserted into the input information:

(1) Relationship annotations are annotations that are added to the input information to mark the inherent relationships between entities of the same type (or instances of the same entity) or relationships between two or more different types of entities. For example, an entity may be any one of xsd:element, xsd:complexType, xsd:simpleType, etc., that can be represented by an owl:Class in an RDF/OWL compliant language. An example of a relationship annotation is owl:sameAs, which relates instances of an owl:Class of the same type as same or identical. Another example of a relationship annotation is owl:equivalentClass or owl:intersectionOf, which relates two or more entities of different types.

(2) Inference annotations are annotations that are added to the input information for generating one or more semantic web rules or inference rules. The inference rules that are generated from the inference annotations may be persisted in a database and used for executing queries on the ontology to retrieve related information. These inference rules may be viewed as explicit business rules that are created for the purpose of a specific organization. An example of an inference rule that is generated from the inference annotations may be specified as: IF X is EngineeringChangeOrder with Status=“Open” THEN X is equivalent to EngineeringChangeOrder with Priority=“High”. The inference rule states that EngineeringChangeOrders that have their Status as “Open” are equivalent to those that have their Priority as “High”. This inference aims to retrieve all high priority EngineeringChangeOrders (additional related information) when asked for open EngineeringChangeOrders.

In one embodiment, as described below, relationship annotations may be used to transform the input information received by processing system 102 into corresponding OWL ontologies having rich interconnections such that the entities present in the ontology are extensively related or linked to each other via the relationships. As discussed, relationships in OWL ontology may be represented using various OWL constructs such as owl:subClassOf, owl:equivalentClass, owl:subPropertyOf, owl:TransitiveProperty, owl:sameAs, etc. In one embodiment, one or more inferences may be made from the relationships among the entities in the OWL ontology. For example, given relationship annotations that are specified as A owl:subClassOf B and B owl:subClassOf C, it may be inferred from these given relationship annotations that A owl:subClassOf C. The inferences that are derived from the relationship annotations are inherent and implicit rules (universally true) that are different from the inference rules that are derived from the inference annotations.
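The implicit inference described above (A owl:subClassOf B and B owl:subClassOf C imply A owl:subClassOf C) is a transitive closure. The following Python sketch shows one simple way to compute it; the function name and the pair-based representation are assumptions for illustration and do not describe how processing system 102 or any particular reasoner performs the inference.

```python
def infer_subclass_closure(asserted):
    """Return the asserted (sub, super) pairs plus all pairs implied by
    the transitivity of the subclass relationship."""
    closure = set(asserted)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                # a subClassOf b and b subClassOf d  =>  a subClassOf d
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Relationship annotations given in the text.
asserted = {("A", "B"), ("B", "C")}

# Statements that were not asserted but follow from the assertions.
inferred = infer_subclass_closure(asserted) - asserted
print(inferred)
```

The fixed-point loop is quadratic per pass and is only meant to make the rule concrete; it is not tuned for large ontologies.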

Annotator 112 is configured to receive input information in a format such as XSD, XML, WSDL, WSRP, or other XML-based format and annotate the input information. The input information may be annotated using customized tags. In one embodiment, the annotations generated by annotator 112 may be placed in the standard <xsd:annotation> element using two namespaces, ‘semrel’ (indicating relationship annotations) and ‘seminf’ (indicating inference annotations), to differentiate these tags from other annotated data.

For example, an input XSD document may be annotated as follows:

<xsd:element name="Item" type="Itemtype">
  <xsd:annotation>
    <xsd:documentation>
      <semrel:relationship>
        <semrel:sameAs targetEntity="../../scm/product#Product"/>
      </semrel:relationship>
    </xsd:documentation>
  </xsd:annotation>
</xsd:element>

Accordingly, annotator 112 receives an XSD document as input and generates an annotated XSD document indicating that the ‘Item’ entity of the annotated XSD document is the same as the ‘Product’ entity of the product.xsd document. This ‘sameAs’ relationship may be transformed into ‘owl:sameAs’, ‘owl:equivalentProperty’, or ‘owl:equivalentClass’ in the OWL ontology generated by processing system 102, depending on the specific context into which a particular entity is mapped (that is, whether the entity is going to be mapped into an ‘owl:Class’, an ‘owl:ObjectProperty’, or an ‘owl:DatatypeProperty’), as will be described below.
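To make the annotation layout concrete, the following Python sketch reads relationship annotations back out of an annotated XSD document using the standard library. The namespace URI bound to ‘semrel’ and the `extract_relationships` helper are assumptions made for this illustration; the source does not specify either.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"
# Hypothetical namespace URI for the 'semrel' prefix (not given in the text).
SEMREL_NS = "http://example.com/semrel"

ANNOTATED_XSD = """\
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns:semrel="http://example.com/semrel">
  <xsd:element name="Item" type="Itemtype">
    <xsd:annotation>
      <xsd:documentation>
        <semrel:relationship>
          <semrel:sameAs targetEntity="../../scm/product#Product"/>
        </semrel:relationship>
      </xsd:documentation>
    </xsd:annotation>
  </xsd:element>
</xsd:schema>
"""


def extract_relationships(xsd_text):
    """Return (entity name, relationship kind, target entity) tuples."""
    ns = {"xsd": XSD_NS, "semrel": SEMREL_NS}
    root = ET.fromstring(xsd_text)
    found = []
    for element in root.iterfind(".//xsd:element", ns):
        path = "xsd:annotation/xsd:documentation/semrel:relationship/*"
        for rel in element.iterfind(path, ns):
            kind = rel.tag.split("}", 1)[1]  # strip the "{namespace}" part
            found.append((element.get("name"), kind, rel.get("targetEntity")))
    return found


relationships = extract_relationships(ANNOTATED_XSD)
print(relationships)
```

Because the annotations live inside the standard annotation and documentation elements, a schema processor that does not know the ‘semrel’ vocabulary can simply ignore them.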

As another example, an input XSD document may be annotated as follows:

<xsd:element name="EngineeringChangeOrder" type="EngineeringChangeOrderType">
  <xsd:annotation>
    <xsd:documentation>
      <semrel:relationship>
        <semrel:contains min="1" max="unbounded" targetEntity="../../scm/ItemEBO#ItemEBO"/>
        <semrel:contains min="1" max="1" targetEntity="../..//BOMEBO#BillOfMaterials"/>
      </semrel:relationship>
    </xsd:documentation>
  </xsd:annotation>
</xsd:element>

Accordingly, annotator 112 receives an XSD document as input and generates an annotated XSD document indicating that the entity ‘EngineeringChangeOrder’ contains one or more ItemEBO entities and only one BillOfMaterials (BOMEBO) entity. These relationships may be transformed to ‘owl:Restriction’ with ‘owl:minCardinality’ and ‘owl:maxCardinality’ constraints in the OWL ontology generated by processing system 102, as described below.
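A sketch of how a ‘contains’ annotation with min and max values might be turned into cardinality restrictions follows. The property identifiers (`contains_ItemEBO`, `contains_BillOfMaterials`), the helper name, and the choice to emit one owl:Restriction per bound are assumptions for illustration; the text only states that such relationships may become owl:Restriction elements with owl:minCardinality and owl:maxCardinality constraints.

```python
import xml.etree.ElementTree as ET

OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD_NON_NEG_INT = "http://www.w3.org/2001/XMLSchema#nonNegativeInteger"


def cardinality_restrictions(property_id, min_occurs, max_occurs):
    """Build one owl:Restriction element per stated bound.

    An 'unbounded' value produces no restriction, because there is
    no limit to express.
    """
    restrictions = []
    bounds = (("minCardinality", min_occurs), ("maxCardinality", max_occurs))
    for name, value in bounds:
        if value == "unbounded":
            continue
        restriction = ET.Element(f"{{{OWL}}}Restriction")
        on_property = ET.SubElement(restriction, f"{{{OWL}}}onProperty")
        on_property.set(f"{{{RDF}}}resource", "#" + property_id)
        bound = ET.SubElement(restriction, f"{{{OWL}}}{name}")
        bound.set(f"{{{RDF}}}datatype", XSD_NON_NEG_INT)
        bound.text = str(value)
        restrictions.append(restriction)
    return restrictions


# min="1" max="unbounded": one or more ItemEBO entities.
item_rules = cardinality_restrictions("contains_ItemEBO", "1", "unbounded")
# min="1" max="1": exactly one BillOfMaterials entity.
bom_rules = cardinality_restrictions("contains_BillOfMaterials", "1", "1")

for rule in item_rules + bom_rules:
    print(ET.tostring(rule, encoding="unicode"))
```

In a generated schema, each restriction would typically be wrapped in an rdfs:subClassOf element of the owl:Class for EngineeringChangeOrder.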

As another example, an input XSD document may be annotated using the namespace ‘seminf’ to generate an inference annotation in one embodiment:

<xsd:annotation>
  <xsd:documentation>
    <seminf:relationship>
      <seminf:primaryCondition>
        corecom:EngineeringChangeOrderEBO:Status='open'
      </seminf:primaryCondition>
      <seminf:relationshipType="equivalence">
        corecom:EngineeringChangeOrderEBO:Priority='high'
      </seminf:relationshipType>
    </seminf:relationship>
  </xsd:documentation>
</xsd:annotation>

Accordingly, annotator 112 receives an XSD document as input and generates an inference annotation based on the XSD document received. The inference annotation states that those EngineeringChangeOrders (ECOs) that have their Status as “Open” are equivalent to those that have their Priority as “High”. In other words, an inference rule may be derived from the inference annotations, such as “IF EngineeringChangeOrders with Status=‘Open’ THEN EngineeringChangeOrders with Priority=‘High’”. This inference rule is an explicit one and may not be universally true. Its aim is to retrieve all high priority ECOs whenever open ECOs are requested. The inference annotations may be transformed into semantic web rules, as will be described below.

The relationship annotations generated by annotator 112 ensure that the information that is input to annotator 112 is annotated to reflect relationships between entities present within the input information. These relationship annotations enable ontologies that are generated by processing system 102 based upon the annotations to have rich interconnectivity among the various entities (entities present in the ontology are extensively linked or related via the relationships). As discussed above, the rich interconnectivity among the entities helps to extract relevant inferences. The inference annotations generated by annotator 112 enable processing system 102 to generate one or more inference rules or semantic web rules. The inference annotations enable extra information to be extracted when executing queries on the ontology. For example, an inference rule that is derived from inference annotations may be specified as “IF EngineeringChangeOrders with Status=‘Open’ THEN EngineeringChangeOrders with Priority=‘High’”. This inference rule may be used to extract all high priority ECOs whenever open ECOs are requested. The inferences derived from the relationship annotations are inherent and implicit rules that are different from the inference rules derived from the inference annotations.
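The effect of such an explicit rule on a query can be sketched in Python as follows. The `EquivalenceRule` type, the sample records, and the `query` function are hypothetical and serve only to show how a query for open orders might be widened to include high priority ones; they do not represent the inference rule format actually stored in the rule base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EquivalenceRule:
    """IF primary condition THEN equivalent condition (hypothetical format)."""
    primary: tuple      # (property, value) that triggers the rule
    equivalent: tuple   # (property, value) treated as equivalent


# The rule from the text: Status="Open" is equivalent to Priority="High".
RULE = EquivalenceRule(primary=("Status", "Open"), equivalent=("Priority", "High"))

# Made-up sample data for illustration only.
ORDERS = [
    {"id": "ECO-1", "Status": "Open", "Priority": "Low"},
    {"id": "ECO-2", "Status": "Closed", "Priority": "High"},
    {"id": "ECO-3", "Status": "Closed", "Priority": "Low"},
]


def query(records, prop, value, rules=()):
    """Return ids of records matching the condition or any rule-equivalent one."""
    conditions = [(prop, value)]
    for rule in rules:
        if rule.primary == (prop, value):
            conditions.append(rule.equivalent)
    return [
        record["id"] for record in records
        if any(record.get(p) == v for p, v in conditions)
    ]


plain = query(ORDERS, "Status", "Open")
expanded = query(ORDERS, "Status", "Open", rules=[RULE])
print(plain, expanded)
```

Without the rule the query returns only the open order; with it, the high priority order is returned as additional related information, which is the behavior the text attributes to explicit business rules.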

The following is a list of annotations that may be used to represent relationships for input information received by processing system 102.

a) Property Annotations

b) Class Annotations

c) Instance Annotations

The output of annotator 112 is annotated input information that forms the input to ontology generator 114. In one embodiment, ontology generator 114 is configured to generate and output an intermediate ontology. The intermediate ontology that is generated and output by ontology generator 114 may include the inference annotations that were generated by annotator 112. In one embodiment, processing performed by ontology generator 114 comprises automated XSD to OWL schema conversion that converts annotated XSD files to OWL schema files and automated XML to OWL instance conversion that converts annotated XML documents to OWL instances. The following sections describe each of these conversions performed by ontology generator 114.
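As a rough illustration of a table-driven XSD to OWL schema conversion, the Python sketch below looks up each named XSD construct in a small mapping table. The table contents, the lookup keys, and the rule used to tell datatype properties from object properties are assumptions for this example; the mapping table actually used by ontology generator 114 is not reproduced here.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical mapping table from XSD constructs to OWL constructs.
MAPPING_TABLE = {
    "complexType": "owl:Class",
    "element:builtin": "owl:DatatypeProperty",  # element typed by an XSD built-in
    "element:complex": "owl:ObjectProperty",    # element typed by another entity
}

SCHEMA = """\
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="ItemType">
    <xsd:sequence>
      <xsd:element name="Price" type="xsd:decimal"/>
      <xsd:element name="ReasonCode" type="CodeType"/>
    </xsd:sequence>
  </xsd:complexType>
  <xsd:complexType name="CodeType"/>
</xsd:schema>
"""


def map_schema(xsd_text):
    """Return (name, OWL construct) pairs for each named XSD construct."""
    root = ET.fromstring(xsd_text)
    mapped = []
    for node in root.iter():
        if not node.tag.startswith("{" + XSD_NS + "}") or "name" not in node.attrib:
            continue
        local = node.tag.split("}", 1)[1]
        if local == "element":
            # Simplification: assumes built-in types are written with the
            # literal prefix "xsd:" rather than resolving namespaces.
            builtin = node.get("type", "").startswith("xsd:")
            key = "element:builtin" if builtin else "element:complex"
        else:
            key = local
        if key in MAPPING_TABLE:
            mapped.append((node.get("name"), MAPPING_TABLE[key]))
    return mapped


mapping = map_schema(SCHEMA)
print(mapping)
```

Keeping the correspondences in a table rather than in code means that a new XSD construct can be supported by adding a row, which is one plausible motivation for a table-based design.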

Automated XSD to OWL Schema Conversion

FIG. 1B is a simplified flowchart depicting a method for automatically transforming an XSD document/file to an OWL schema incorporating an embodiment of the present invention. The method depicted in FIG. 1B may be performed by software (e.g., code, program, instructions) executed by a processor, in hardware, or combinations thereof. The method depicted in FIG. 1B is not intended to limit the scope of the invention as recited in the claims.

Referring to FIG. 1B, one or more annotated XSD files are received (step 172). For each annotated XSD file that is received in 172, steps 174, 176, and 178 may be performed, as described below. Processing (step 180) then returns to steps 174, 176, and 178 recursively for each annotated XSD file received in 172.

<owl:Class ...>
  ...
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:allValuesFrom rdf:resource="http://localhost:8989/EnterpriseObjectLibrary/Release2/Core/Common/V2/MERGED_OWL_3#CodeType"/>
      <owl:onProperty>
        <owl:ObjectProperty rdf:ID="has_ReasonCode"/>
      </owl:onProperty>
    </owl:Restriction>
  </rdfs:subClassOf>
  ...
</owl:Class>
...
<rdf:Description rdf:about="#has_ReasonCode">
  <owl:equivalentProperty rdf:resource="#has_TypeCode"/>
</rdf:Description>

In one embodiment, a log file 134 may be maintained to keep track of all the annotated XSD files that have been transformed or converted by ontology generator 114 to corresponding OWL schema files. In this way, log file 134 helps to prevent identical annotated XSD files (e.g., XSDs having the same namespace and schemaLocation as previous XSDs) that are input to the ontology generator 114 at different instances of time from being converted multiple times. For example, log file 134 may keep details of annotated XSD files that have already been converted to corresponding OWL schema files by ontology generator 114. A subsequent identical annotated XSD file that is input to ontology generator 114 will thus not be converted to the corresponding OWL schema file by ontology generator 114. This reduces redundant processing and increases the efficiency of the conversion process.

Log file 134 also helps to determine whether an OWL schema file that is generated for an annotated XSD file is to be merged with an existing OWL schema file or written to a new OWL schema file. For example, if an annotated XSD file with a particular namespace is already listed in log file 134, then the OWL schema file that is generated for a subsequent annotated XSD file having the same namespace will be merged with the existing OWL schema file. Otherwise, the OWL schema file that is generated for this subsequent annotated XSD file is serialized to a new OWL schema file.
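The two roles of log file 134 described above (skipping identical XSD files and choosing between merging and writing a new OWL schema file) can be illustrated with a minimal sketch. The class `ConversionLog`, its method names, and the file names `Common.xsd`, `CommonTypes.xsd`, and `Common.owl` are hypothetical and are not taken from the described embodiment; the sketch only assumes that an XSD file is identified by its namespace and schemaLocation.

```python
from dataclasses import dataclass, field


@dataclass
class ConversionLog:
    """Hypothetical in-memory stand-in for log file 134."""
    # (namespace, schemaLocation) pairs that have already been converted
    converted: set = field(default_factory=set)
    # namespace -> OWL schema file that already holds that namespace
    owl_file_by_namespace: dict = field(default_factory=dict)

    def decide(self, namespace: str, schema_location: str) -> str:
        """Return 'skip', 'merge', or 'new' for an incoming annotated XSD."""
        if (namespace, schema_location) in self.converted:
            return "skip"    # identical XSD seen before: do not convert again
        if namespace in self.owl_file_by_namespace:
            return "merge"   # known namespace: merge into the existing OWL file
        return "new"         # otherwise serialize to a new OWL schema file

    def record(self, namespace: str, schema_location: str, owl_path: str) -> None:
        """Log a finished conversion so later inputs can be checked against it."""
        self.converted.add((namespace, schema_location))
        self.owl_file_by_namespace.setdefault(namespace, owl_path)


log = ConversionLog()
ns = "http://xmlns.oracle.com/EnterpriseObjects/Core/Common/V2"

first = log.decide(ns, "Common.xsd")          # nothing logged yet
log.record(ns, "Common.xsd", "Common.owl")
repeat = log.decide(ns, "Common.xsd")         # same namespace and schemaLocation
sibling = log.decide(ns, "CommonTypes.xsd")   # same namespace, different file
print(first, repeat, sibling)
```

A persistent implementation would write the same two pieces of information to a file rather than keeping them in memory.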

In the manner described above, XSD to OWL schema conversion generates one or more valid OWL schema files for a set of annotated XSD files input to ontology generator 114 while preventing duplicate XSD files from being converted by ontology generator 114. In one embodiment, the valid OWL schema file(s) may be stored in the same hierarchical directory structure as that of the input annotated XSD file(s) in a database such as database 106 of FIG. 1A.

TABLE 1
XSD2OWL Mappings

XSD                              OWL
-------------------------------  ------------------------
xsd:complexType                  owl:Class
Global xsd:element               owl:Class
Local xsd:element
  built-in type                  owl:DatatypeProperty
  complexType                    owl:ObjectProperty
xsd:attribute
  built-in type                  owl:DatatypeProperty
  complexType                    owl:ObjectProperty
xsd:import                       owl:imports
xsd:include                      owl:imports
targetNamespace                  xml:base
xsd:annotation                   rdfs:comment
xsd:minOccurs                    owl:minCardinality
xsd:maxOccurs                    owl:maxCardinality
xsd:type in
  Local element                  owl:allValuesFrom
  Attribute                      owl:allValuesFrom
xsd:ref                          rdf:resource
xsd:name                         rdf:ID
xsd:use
  Optional                       owl:minCardinality = 0
  Required                       owl:minCardinality = 1
xsd:restriction/extension in
  element/complexType            rdfs:subClassOf
  xsd:simpleType                 owl:Class
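As an illustration of how a few rows of Table 1 might be applied, the following minimal sketch walks a small XSD fragment and lists the OWL constructs that the table assigns to it. The sample schema, the function names, and the rule that a built-in type is recognized by its `xsd:` prefix are assumptions made for the example; only the complexType, local element, attribute, minOccurs, and use rows are covered, and the datatype property is written with the spelling used by the OWL vocabulary (`owl:DatatypeProperty`).

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema fragment used only for this illustration.
SAMPLE_XSD = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="http://example.org/demo">
  <xsd:complexType name="IdentificationType">
    <xsd:sequence>
      <xsd:element name="ContextID" type="xsd:string" minOccurs="0"/>
      <xsd:element name="ID" type="IdentifierType"/>
    </xsd:sequence>
    <xsd:attribute name="schemeID" type="xsd:string" use="required"/>
  </xsd:complexType>
</xsd:schema>
"""


def property_kind(type_name):
    """Built-in XSD type -> datatype property; any other type -> object property."""
    # Assumption: built-in types are referenced with the conventional xsd: prefix.
    if type_name.startswith("xsd:"):
        return "owl:DatatypeProperty"
    return "owl:ObjectProperty"


def map_schema(xsd_text):
    """Return (owl_construct, name, detail) rows for top-level complex types."""
    rows = []
    root = ET.fromstring(xsd_text)
    for ctype in root.findall(XSD + "complexType"):
        rows.append(("owl:Class", ctype.get("name"), None))
        for elem in ctype.iter(XSD + "element"):
            rows.append((property_kind(elem.get("type", "")),
                         elem.get("name"), elem.get("type")))
            if elem.get("minOccurs") is not None:
                rows.append(("owl:minCardinality", elem.get("name"),
                             int(elem.get("minOccurs"))))
        for attr in ctype.iter(XSD + "attribute"):
            rows.append((property_kind(attr.get("type", "")),
                         attr.get("name"), attr.get("type")))
            if attr.get("use") in ("optional", "required"):
                rows.append(("owl:minCardinality", attr.get("name"),
                             1 if attr.get("use") == "required" else 0))
    return rows


rows = map_schema(SAMPLE_XSD)
for row in rows:
    print(row)
```

The sketch reports mappings as tuples rather than serializing OWL, which keeps the correspondence with the table rows easy to check.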

Automated XML to OWL Conversion

If the user input information is in XML format, then an annotated XML file is provided as input to ontology generator 114. Ontology generator 114 is configured to transform each annotated XML (data) file to a corresponding OWL instance file. The associated OWL schema files that are generated by the XSD to OWL schema conversion, as described above, may be used to facilitate such a transformation, as described below. In one embodiment, ontology generator 114 may be configured to perform the following steps for the automated XML to OWL conversion:

  <corecom:Identification
      xmlns:corecom="http://xmlns.oracle.com/EnterpriseObjects/Core/Common/V2">
    <corecom:ID>
      ECO117
    </corecom:ID>
    <corecom:ContextID>V1</corecom:ContextID>
    ...
  </corecom:Identification>

The following example is part of the OWL instance created for the XML element shown above when applying the XML to OWL conversion as described above:

  <has_Identification>
    <Identification
        xmlns="http://localhost:8989/EnterpriseObjectLibrary/Release2/Core/Common/V2/MERGED_OWL_3#">
      <has_ID>
        <IdentifierType>
          <hasValue>ECO117</hasValue>
        </IdentifierType>
      </has_ID>
      <has_ContextID> ...
      </has_ContextID>
      ...
    </Identification>
  </has_Identification>
  ...
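The nesting pattern visible in the example above (each element becomes a has_ property that wraps an individual of the element's class, and a text value is placed in hasValue) can be reproduced with a short sketch. This is inferred from the example only and is not the conversion procedure of ontology generator 114. The `RANGE_CLASS` lookup is a hypothetical stand-in for the class information that the generated OWL schema file would supply, namespaces are omitted from the output, and the ContextID element is left out because its converted form is elided in the example.

```python
import xml.etree.ElementTree as ET

SOURCE_XML = """
<corecom:Identification
    xmlns:corecom="http://xmlns.oracle.com/EnterpriseObjects/Core/Common/V2">
  <corecom:ID>
    ECO117
  </corecom:ID>
</corecom:Identification>
"""

# Hypothetical lookup: element name -> class of the individual it points to.
# In the described system this information would come from the OWL schema file.
RANGE_CLASS = {"Identification": "Identification", "ID": "IdentifierType"}


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on qualified tags."""
    return tag.rsplit("}", 1)[-1]


def to_owl_instance(elem):
    """Wrap an XML element as <has_X><Class>...</Class></has_X>."""
    name = local_name(elem.tag)
    prop = ET.Element("has_" + name)
    individual = ET.SubElement(prop, RANGE_CLASS.get(name, name))
    children = list(elem)
    if children:
        # Complex content: convert each child element recursively.
        for child in children:
            individual.append(to_owl_instance(child))
    else:
        # Simple content: keep the trimmed text as the individual's value.
        value = ET.SubElement(individual, "hasValue")
        value.text = (elem.text or "").strip()
    return prop


owl = to_owl_instance(ET.fromstring(SOURCE_XML))
print(ET.tostring(owl, encoding="unicode"))
```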

As described above, ontology generator 114 takes annotated input information (annotated XSD schema files and annotated XML data files) as its input and automatically converts the annotated input information to an intermediate ontology. Specifically, the intermediate ontology is encoded as a set of one or more OWL documents that include the “inference annotations” that were generated by annotator 112 (the intermediate ontology is also sometimes referred to as an annotated ontology). As mentioned previously, the relationship annotations that were generated by annotator 112 are converted by ontology generator 114 to inherent relationships in OWL through the use of various OWL entailments.

In one embodiment, an ontology sieve 116 is provided that is configured to generate a final OWL ontology from the intermediate ontology that is output by ontology generator 114 and provided as input to ontology sieve 116. For example, ontology sieve 116 may be configured to remove the annotations from the annotated ontology to generate a final ontology. Alternatively, ontology sieve 116 may be configured to allow the final ontology to be re-annotated, so that the user can add further annotations (which may be provided using user interface 110), if necessary.

In one embodiment, the intermediate ontology that is output by ontology generator 114 may be input to an inference rule generator 118 that is configured to generate a set of one or more inference rules from the “inference annotations” included in the intermediate ontology. As mentioned previously, inference rules are explicit rules and need not be universally true. For example, an inference rule may be specified as “IF EngineeringChangeOrders with Status=‘Open’ THEN EngineeringChangeOrders with Priority=‘High’”. This inference rule may be used to extract all high priority ECOs whenever there are open ECOs. Accordingly, inference rules that are generated by inference rule generator 118 enable extra information to be extracted from the intermediate ontology and used when executing queries on the ontology (e.g., SPARQL queries).
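The effect of the example rule above can be shown with a minimal sketch that treats the ontology as a set of (subject, predicate, object) triples and applies a single IF/THEN rule to it. The function `apply_rule`, the triple layout, and the individual ECO118 are hypothetical and serve only to illustrate how such a rule yields extra statements; they do not describe the rule format used by inference rule generator 118 or by the database.

```python
def apply_rule(triples, if_pred, if_obj, then_pred, then_obj):
    """Return the new triples implied by: IF (s, if_pred, if_obj) THEN (s, then_pred, then_obj)."""
    inferred = {
        (subject, then_pred, then_obj)
        for (subject, predicate, obj) in triples
        if predicate == if_pred and obj == if_obj
    }
    # Only report statements that were not already asserted.
    return inferred - set(triples)


ecos = {
    ("ECO117", "Status", "Open"),
    ("ECO118", "Status", "Closed"),   # hypothetical second ECO
}

# IF EngineeringChangeOrders with Status="Open" THEN Priority="High"
inferred = apply_rule(ecos, "Status", "Open", "Priority", "High")
print(inferred)
```

A query for high priority ECOs run over the asserted and inferred triples together would then return ECO117 even though its priority was never stated explicitly.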

The inference rules generated by inference rule generator 118 may then be inserted or seeded into database 106 (e.g., an Oracle database). The inference rules may be seeded into database 106 independently of the OWL ontology stored in database 106. In one embodiment, the inference rules may be stored in an inference rule base 122 in database 106. The inference rules stored in inference rule base 122 may be used for executing SPARQL queries against the OWL ontology model saved in database 106.

In one embodiment, an ontology loader 119 is provided that is configured to persist the final ontology generated by ontology sieve 116 into database 106. The final ontology may be encoded in OWL. Ontology loader 119 may also be configured to generate an inferred ontology model/graph based on the final ontology. For example, ontology loader 119 may use an inference engine such as Pellet to generate the inferred ontology model/graph. This inferred ontology model/graph may also be persisted into database 106.

In one embodiment, for example, in an Oracle database, a query API such as an Oracle Jena adapter 130 may be used for storing ontologies, running inference rules on ontologies, executing SPARQL queries, and creating custom rule bases. Jena adapter 130 enables databases, such as Oracle databases, to handle RDF data. In one embodiment, Jena adapter 130 provides Java APIs that enable users to perform ontology-related functions from within a Java program.

FIG. 2 is a simplified flowchart 200 depicting a method for automatically generating one or more ontologies according to an embodiment of the present invention. The method depicted in FIG. 2 may be performed by software (e.g., code, program, instructions) executed by a processor, in hardware, or combinations thereof. The method depicted in FIG. 2 is not intended to limit the scope of the invention as recited in the claims. The embodiment described below assumes that the ontology is generated in an RDF-compliant language such as OWL. However, in alternative embodiments, ontologies may also be generated in other languages.

As depicted in FIG. 2, input information is received (step 202). The input information may be, for example, enterprise information related to an enterprise application. The input information may be encoded in a first format such as XSD, XML, WSRP, WSDL, etc. For example, one or more XSD/XML files comprising the input information may be received in 202.

The input information received in 202 is then annotated (step 204). In one embodiment, two types of annotations may be performed in 204: relationship annotations and inference annotations. Relationship annotations are annotations made to the input information to reflect relationships between entities present within the input information and across the input information. These relationship annotations enable generation of ontologies that have rich interconnectivity. Inference annotations are annotations made to the input information that are used to generate one or more inference rules. Inference annotations enable extra information to be extracted from the input information and used when executing queries on the ontology.

An intermediate ontology is then generated based upon the annotated input information generated in step 204 (step 206). In one embodiment, the intermediate ontology is encoded in a second format that is different from the first format of the input information received in 202. For example, if the input information received in 202 is an XSD/XML document, the intermediate ontology may be encoded in a second language such as OWL. As part of step 206, two conversion processes (XSD to OWL schema and XML to OWL instance) may be performed to generate an intermediate ontology from the annotated input information. As part of the conversion, the relationship annotations present in the annotated input information are transformed into inherent relationships through the use of various OWL entailments, while the inference annotations are included as part of the intermediate ontology generated in step 206.

A final ontology is then generated from the intermediate ontology generated in 206 (step 208). In one embodiment, inference annotations included in the intermediate ontology may be used to generate the inference rules, after which the inference annotations are removed to produce the final ontology. Alternatively, as part of 208, a user can add one or more annotations to the final ontology, if necessary. The additional annotations may be provided via a user interface such as user interface 110 depicted in FIG. 1A. In one embodiment, the final ontology generated in step 208 is encoded in OWL.

A set of one or more inference rules may then be generated from the intermediate ontology generated in step 206 (step 210). The inference rules generated in 210 are explicit rules and need not be universally true. The inference rules can be used to extract extra information from the generated ontology, which may be used for executing queries on the ontology (e.g., SPARQL queries). For example, an inference rule that is generated from the intermediate ontology may be specified as “IF EngineeringChangeOrders with Status=‘Open’ THEN EngineeringChangeOrders with Priority=‘High’”. This inference rule may be used to extract all high priority ECOs whenever there are open ECOs. Steps 208 and 210 may be performed in parallel.

The set of inference rules generated in step 210 and the final ontology generated in 208 may then be stored in a database such as database 106 of FIG. 1A (step 212). The set of inference rules generated in 210 may be seeded into the database independently of the OWL ontology stored in database 106.
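The flow of steps 204 through 212 can be summarized in a minimal sketch that tracks where relationship annotations and inference annotations end up. All names below (`Ontology`, `annotate`, `generate_intermediate`, `sieve`, `generate_rules`, `store`) and the string representation of statements and rules are hypothetical; the sketch models only the data flow described above and performs no actual XSD, XML, or OWL processing.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    statements: list                                            # ontology content
    inference_annotations: list = field(default_factory=list)   # (condition, conclusion) pairs


def annotate(input_info, relationships, inferences):
    """Step 204: attach relationship and inference annotations to the input."""
    return {"input": list(input_info),
            "relationships": list(relationships),
            "inference": list(inferences)}


def generate_intermediate(annotated):
    """Step 206: relationships become part of the ontology; inference annotations ride along."""
    statements = annotated["input"] + annotated["relationships"]
    return Ontology(statements, list(annotated["inference"]))


def sieve(intermediate):
    """Step 208: the final ontology is the intermediate one without inference annotations."""
    return Ontology(list(intermediate.statements), [])


def generate_rules(intermediate):
    """Step 210: one explicit IF/THEN rule per inference annotation."""
    return [f"IF {cond} THEN {concl}"
            for cond, concl in intermediate.inference_annotations]


def store(database, final, rules):
    """Step 212: the ontology and the rule base are stored independently."""
    database["ontology"] = final
    database["rule_base"] = rules


database = {}
annotated = annotate(
    ["EngineeringChangeOrder ECO117"],
    ["has_ReasonCode equivalentProperty has_TypeCode"],
    [('Status="Open"', 'Priority="High"')],
)
intermediate = generate_intermediate(annotated)
store(database, sieve(intermediate), generate_rules(intermediate))
print(database["rule_base"])
```

Because `sieve` and `generate_rules` both read the intermediate ontology and neither modifies it, the two calls can run in either order, which mirrors the statement that steps 208 and 210 may be performed in parallel.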

As described above, techniques are provided for automatically generating an ontology and a set of inference rules based upon input information. The input information may, for example, be in the form of XSD, XML, WSDL, or WSRP, etc. The automatically generated ontology may be encoded in OWL or another RDF-compliant language. This automated generation of ontologies enables semantic and relationship representation of enterprise information that allows for more meaningful and contextually relevant integrations among various enterprise applications. The automated generation of ontologies enables handling of large amounts of enterprise data to generate large ontologies, thereby resolving the scalability problems associated with existing automated tools. The automated generation of ontologies further eliminates manual generation of ontologies, thereby increasing accuracy and reducing development time.

FIG. 3 is a simplified block diagram of a computer system 300 that may be used to practice an embodiment of the present invention. Computer system 300 may serve as a processing system 102 or a client system 120 depicted in FIG. 1A. As shown in FIG. 3, computer system 300 includes a processor 302 that communicates with a number of peripheral subsystems via a bus subsystem 304. These peripheral subsystems may include a storage subsystem 306, comprising a memory subsystem 308 and a file storage subsystem 310, user interface input devices 312, user interface output devices 314, and a network interface subsystem 316.

Bus subsystem 304 provides a mechanism for letting the various components and subsystems of computer system 300 communicate with each other as intended. Although bus subsystem 304 is shown schematically as a single bus, alternative embodiments of the bus subsystem may utilize multiple busses.

Network interface subsystem 316 provides an interface to other computer systems, networks, and portals. Network interface subsystem 316 serves as an interface for receiving data from and transmitting data to other systems from computer system 300.

User interface input devices 312 may include a keyboard, pointing devices such as a mouse, trackball, touchpad, or graphics tablet, a scanner, a barcode scanner, a touch screen incorporated into the display, audio input devices such as voice recognition systems, microphones, and other types of input devices. In general, use of the term “input device” is intended to include all possible types of devices and mechanisms for inputting information to computer system 300. A user may use an input device to enter enterprise information as input to the processing system 102 of FIG. 1A.

User interface output devices 314 may include a display subsystem, a printer, a fax machine, or non-visual displays such as audio output devices, etc. The display subsystem may be a cathode ray tube (CRT), a flat-panel device such as a liquid crystal display (LCD), or a projection device. In general, use of the term “output device” is intended to include all possible types of devices and mechanisms for outputting information from computer system 300.

Storage subsystem 306 provides a computer-readable storage medium for storing the basic programming and data constructs that provide the functionality of the present invention. Software (programs, code modules, instructions) that when executed by a processor provide the functionality of the present invention may be stored in storage subsystem 306. These software modules or instructions may be executed by processor(s) 302. Storage subsystem 306 may also provide a repository for storing data used in accordance with the present invention, for example, the data stored in the diagnostic data repository. For example, storage subsystem 306 provides a storage medium for persisting one or more ontologies and inference rules. Storage subsystem 306 may comprise memory subsystem 308 and file/disk storage subsystem 310.

Memory subsystem 308 may include a number of memories including a main random access memory (RAM) 318 for storage of instructions and data during program execution and a read only memory (ROM) 320 in which fixed instructions are stored. File storage subsystem 310 provides persistent (non-volatile) storage for program and data files, and may include a hard disk drive, a floppy disk drive along with associated removable media, a Compact Disk Read Only Memory (CD-ROM) drive, an optical drive, removable media cartridges, and other like storage media.

Computer system 300 can be of various types including a personal computer, a portable computer, a workstation, a network computer, a mainframe, a kiosk, a server or any other data processing system. Due to the ever-changing nature of computers and networks, the description of computer system 300 depicted in FIG. 3 is intended only as a specific example for purposes of illustrating the preferred embodiment of the computer system. Many other configurations having more or fewer components than the system depicted in FIG. 3 are possible.

Although specific embodiments of the invention have been described, various modifications, alterations, alternative constructions, and equivalents are also encompassed within the scope of the invention. Embodiments of the present invention are not restricted to operation within certain specific data processing environments, but are free to operate within a plurality of data processing environments. Additionally, although embodiments of the present invention have been described using a particular series of transactions and steps, it should be apparent to those skilled in the art that the scope of the present invention is not limited to the described series of transactions and steps.

Further, while embodiments of the present invention have been described using a particular combination of hardware and software, it should be recognized that other combinations of hardware and software are also within the scope of the present invention. Embodiments of the present invention may be implemented only in hardware, or only in software, or using combinations thereof.

The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense. It will, however, be evident that additions, subtractions, deletions, and other modifications and changes may be made thereunto without departing from the broader spirit and scope as set forth in the claims.

Rao, Aditya Ramamurthy, Prakash, Pravin, Rajesh, Narni, Ghosh, Bhaskar Jyoti, Rangarajan, Keshava, Krishnamurthy, Sudharsan, Srinivasan, Nagaraj

Executed on | Assignor | Assignee | Conveyance | Frame/Reel/Doc
Feb 19 2009 | RAO, ADITYA RAMAMURTHY | Oracle International Corporation | ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS) | 0223180692
Feb 19 2009 | RAJESH, NARNI | Oracle International Corporation | ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS) | 0223180692
Feb 19 2009 | GHOSH, BHASKAR JYOTI | Oracle International Corporation | ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS) | 0223180692
Feb 19 2009 | RANGARAJAN, KESHAVA | Oracle International Corporation | ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS) | 0223180692
Feb 19 2009 | PRAKASH, PRAVIN | Oracle International Corporation | ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS) | 0223180692
Feb 26 2009 | | Oracle International Corporation | (assignment on the face of the patent) |
Feb 26 2009 | KRISHNAMURTHY, SUDHARSAN | Oracle International Corporation | ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS) | 0223180692
Feb 26 2009 | SRINIVASAN, NAGARAJ | Oracle International Corporation | ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS) | 0223180692
Maintenance Fee Events
Date | Event
Dec 16 2015 | M1551: Payment of Maintenance Fee, 4th Year, Large Entity.
Dec 19 2019 | M1552: Payment of Maintenance Fee, 8th Year, Large Entity.
Dec 21 2023 | M1553: Payment of Maintenance Fee, 12th Year, Large Entity.

