The system and method for correlating, predicting and diagnosing system component performance data includes capturing knowledge about system behavior, deploying the captured knowledge as baseline system behavior files, evaluating system performance data against the baseline system behavior files, performing predictive and diagnostic analysis when received system performance data exceeds thresholds in the baseline system behavior files, and notifying a user when an analysis result is generated. The method of capturing knowledge about system behavior includes defining problems to be solved, creating datasets that correspond to defined problems, constructing problem scenarios, associating data pattern modules with the problem scenarios, and generating XML definition files that characterize system behavior in terms of the scenarios, modules, and datasets. The system has the capability to activate corrective scripts in the target system and to reconfigure the target system. Detailed information on various example embodiments of the inventions is provided in the Detailed Description below, and the inventions are defined by the appended claims.
6. A system for managing business transactions and infrastructure across a communications network, comprising:
a collection system embedded in an infrastructure system, said infrastructure system including a plurality of infrastructure components, said collection system including one or more data collectors, said data collectors comprising an application-specific plug-in that extracts data from said infrastructure system, wherein:
the application-specific plug-in is programmed to send the management server information identifying the application-specific plug-in when the application-specific plug-in starts;
said data collectors are organized into domains that identify a characteristic selected from the group consisting of distinct areas of the system environment, common functions, application type and shared common attributes;
a management server including a system manager for controlling the management server, said management server receiving the extracted data from the client collection system, the system manager aggregating the extracted data;
an analysis system for performing data analysis on the extracted data;
a communication network between said collection system and said management server providing for communication of extracted data from said collection system to said management system.
1. A system for managing business transactions and infrastructure, comprising:
a collection system embedded in a client system, said client system including a plurality of client system components, said collection system including one or more data collectors, said data collectors comprising an application-specific plug-in that extracts data from said client system, wherein:
the application-specific plug-in is programmed to send the management server information identifying the application-specific plug-in when the application-specific plug-in starts;
said data collectors are organized into domains that identify a characteristic selected from the group consisting of distinct areas of the system environment, common functions, application type and shared common attributes;
a management server including a system manager for controlling the management server, said management server receiving the extracted data from the client collection system, the system manager aggregating the extracted data;
a repository wherein is stored extracted data received from said collection system;
an analysis system for performing data analysis on the extracted data; and
a communication network between said collection system and said management server providing for communication of extracted data from said collection system to said management system.
11. A system for managing business transactions and an infrastructure system, comprising:
a collection system embedded in an infrastructure system, said infrastructure system including a plurality of infrastructure components, said collection system including one or more data collectors, said data collectors comprising an application-specific plug-in that extracts data from said infrastructure system, wherein:
the application-specific plug-in is programmed to send the management server information identifying the application-specific plug-in when the application-specific plug-in starts;
said data collectors are organized into domains that identify a characteristic selected from the group consisting of distinct areas of the system environment, common functions, application type and shared common attributes;
a management server including a system manager for controlling the management server, said management server receiving the extracted data from the client collection system, the system manager aggregating the extracted data;
a repository for storing the extracted data, wherein said management server stores data received from said collection system to said repository;
a reasoning system for performing data analysis on the extracted data;
a communication network between said collection system and said management server providing for communication of extracted data from said collection system to said management system.
2. A system according to
3. A system according to
4. A system according to
5. A system according to
7. A system according to
8. A system according to
9. A system according to
10. A system according to
12. A system according to
13. A system according to
14. A system according to
15. A system according to
16. A system according to
This application is a continuation of U.S. patent application Ser. No. 10/063,232, now U.S. Pat. No. 7,237,023, filed Apr. 2, 2002, which is a continuation-in-part of U.S. patent application Ser. No. 09/681,419, now U.S. Pat. No. 7,065,566, filed on Mar. 30, 2001, both of which are hereby incorporated by reference in their entirety.
This invention relates generally to ensuring business system performance, and more particularly, to correlating and diagnosing performance data collected from components of business systems to achieve information technology goals and business objectives.
In the developing years of business use of electronic systems, business systems were used primarily for accounting and records keeping functions. As these systems grew in capability and features, business managers began to make use of these capabilities and features in new ways to improve business performance. With the widespread acceptance and use of the Internet for conducting on-line commercial and consumer transactions, these business systems have become increasingly complex and geographically distributed. At the same time, there developed increasing demand for higher performance and increased reliability or “uptime” to satisfy these business needs. This has put greater emphasis and visibility on the role of the information technology (IT) infrastructure of e-commerce businesses, and the people that support these systems. Concurrently, there has developed a trend whereby business managers have a need to play a more active role in IT infrastructure decision-making.
Conducting business over the Internet has created many new challenges that are difficult to manage using conventional approaches. Companies with activities that rely on e-commerce struggle to find solutions that will assist with managing increasingly complex infrastructure while satisfying a more demanding customer base. In particular, downtime costs can have a substantial impact on the gross revenues of an e-commerce organization, as well as losses due to brand erosion and customer dissatisfaction. As these companies become increasingly dependent upon e-Business as a significant source of revenue, the success of the overall business is inextricably linked to the health of the IT infrastructure. The lack of tools to communicate critical information concerning the condition of the IT infrastructure to business managers further complicates this picture. A further complexity is that many e-commerce systems are widely distributed over a large geographic area, where a principal means of communications between parts of the system is via the Internet.
Businesses are further faced with the problem of translating IT organization performance goals into objectives that satisfy the needs of the business. While there is a need for a reliable, high performance infrastructure for executing business processes, there is often a lack of understanding of the impact that systems, applications, and process execution breakdowns have on business objectives, because of the inability to measure IT performance against business objectives. Regarding reliability and performance, existing management tools for heterogeneous and complex business processes offer incomplete and inadequate coverage of individual infrastructure elements. There is a lack of a systems management solution that encompasses the entire execution infrastructure as a single entity, capturing the interrelations between systems, applications, and business processes. A solution must be capable of automatically adapting to constant changes in this execution environment.
It is often difficult for IT organizations to relate the impact of process execution breakdowns to business objectives. Since infrastructure failures are viewed as isolated IT events, the impact on the business is noticed only when a product line manager or analyst sees a change in business performance. A solution is needed that will capture the business process workflows and relate potential and actual infrastructure breakdowns to business process execution, enabling IT and business managers to find a collaborative solution. It is desirable to have a solution that is capable of expediting and automating the remediation process.
IT contributions to the core business are often measured using IT-oriented metrics, rather than metrics that support IT contributions to business goals or corporate strategy. A solution is needed that reports IT performance in terms of business metrics rather than infrastructure performance. By combining business and IT metrics, information may be generated that will enable business decisions to be made based on a correlation of IT performance and business goals.
This environment has created a need for a unified solution for monitoring the health of real-time e-business infrastructures to improve the quality, reliability and total cost of ownership of e-business sites. This translates to the bottom line as greater customer satisfaction, a loyal customer base, and increased revenues. It is therefore an object of the present invention to provide a comprehensive solution for correlating collected performance data to detect and identify incipient and immediate system component failures in complex e-business infrastructures.
Another object of the present invention is to enable improved performance of e-commerce systems by diagnosing patterns in collected component performance data to determine a cause of a system component performance reduction in e-business.
Yet another object of the present invention is to provide real-time correlation, prediction and diagnosis of collected performance data from e-commerce system components for evaluating, reporting, and managing e-commerce system configuration, providing predictive and impact analysis, and reducing downtime by detecting incipient failures before there is a significant impact on business performance.
The present invention satisfies the needs for transaction monitoring and infrastructure management of modern business systems by capturing and representing expert knowledge about infrastructure components, and monitoring their behavior. Infrastructure component data is extracted by collectors from applications, operating systems and databases before being analyzed and correlated to detect, predict and diagnose the cause of execution failures. Correlation of the infrastructure component data enables users to associate and analyze data from a wide variety of sources in a simple unifying format. This provides the capability to capture relationships and correlate data from multiple systems in real time. Root cause analysis is performed by correlating and diagnosing the collected data.
In a typical scenario, a plurality of data collectors gathers relevant data about an e-business application, a database, and various web servers. This data is sent to a management system server and analyzed by a reasoning system within the server. Examples of anomalies that may occur include a web server or application server timeout, an application server error, or a database space problem. When an anomaly is detected, the reasoning system performs predictive analysis and diagnostics that determine the specific component contributing to the problem and the server system manager notifies the administrator of an actual or incipient problem, its cause, and its impacts. This process enables the management system server to pinpoint the specific cause of a problem in real-time.
The data collectors are small, individual collectors that capture data reflecting the real-time health and performance of the applications and underlying IT components in the e-business environment. These collectors deliver data to the management server where it is aggregated and analyzed. The system manager records the data in a repository and evaluates it against predefined threshold values. If any of the datapoints violates threshold values, the reasoning system is triggered to perform analysis.
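The record-then-evaluate step described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the plain-list repository, the single numeric upper bound per datapoint, and the callback used to trigger analysis are all assumptions made for the example.

```python
def find_violations(datapoints, thresholds):
    """Return the names of datapoints whose value exceeds its threshold.

    Assumes each threshold is a simple numeric upper bound (hypothetical).
    """
    return [
        name
        for name, value in datapoints.items()
        if name in thresholds and value > thresholds[name]
    ]


def process_batch(datapoints, thresholds, repository, trigger_analysis):
    """Record a batch of collected data, then evaluate it against thresholds.

    `repository` is any list-like store and `trigger_analysis` is any callable;
    both stand in for the repository and reasoning system named in the text.
    """
    repository.append(dict(datapoints))          # record the data first
    violations = find_violations(datapoints, thresholds)
    if violations:                               # any violation triggers analysis
        trigger_analysis(violations)
    return violations
```

A caller would pass one batch of collector data per invocation; datapoints with no configured threshold are stored but never trigger analysis.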
An embodiment of the present invention is a computer-implemented method for correlating and diagnosing system performance data that comprises capturing knowledge about system behavior, deploying the captured knowledge as baseline system behavior files, evaluating monitored system performance datapoints against the baseline system behavior files to identify datasets that have changed states, performing real-time prediction and diagnostic analysis on the datasets that have changed states, and notifying a user of a prediction and diagnostic analysis result. The capturing step may comprise defining problems to be solved, creating datasets that correspond to the defined problems, constructing problem scenarios, associating data patterns with the problem scenario cases, and generating XML definition files that characterize system behavior in terms of the scenarios, modules and datasets. The deploying step may further comprise characterizing the captured system behavior knowledge as XML files. The deploying step may comprise loading XML definition files that characterize system behavior in terms of the scenarios, modules and datasets, receiving system domain, component and datapoint information, creating relationships between components, and evaluating and creating domain and component instances in each domain to form instances of datasets. The evaluating step may comprise receiving system performance datapoints from data collectors, comparing the datapoints against a threshold value to determine a state of the datapoints, analyzing the received performance datapoints when a state changes or exceeds a threshold, correlating the datapoints that exceeded the threshold values with stored datasets in the baseline system behavior files to identify datasets, and sending the correlated related datasets to a reasoning system for prediction and diagnostic analysis. 
The performing step may comprise receiving correlated datasets by a reasoning system, performing prediction and diagnosis analysis by the reasoning system to determine if the received datasets match problem patterns in a knowledge base, and generating an analysis result based on analysis of the received datasets and problem patterns. The generating step may provide an analysis result that identifies a problem and a probable cause of the problem. The generating step may provide an analysis result that predicts an incipient system component failure. The generating step may provide a best estimate of a problem and probable cause of the problem. The evaluating step may comprise parsing XML files into a tree representation, traversing the tree representation and taking actions on specific tree elements, creating and updating domain, component, datapoint, relationship and dataset objects specified by a given tree element, instantiating derived objects by processing newly created components to determine if new relationships need to be created, and instantiating datasets by processing newly created relationships and components to determine if new datasets need to be instantiated. An embodiment of the present invention may be a computer-readable medium containing instructions for controlling a computer system to carry out the steps described above.
Another embodiment of the present invention is a computer readable medium containing a data structure for storing objects for correlating and diagnosing system component performance data that comprises domain objects that identify distinct areas of a system environment, component objects that identify parts of the domains of the system environment, datapoint objects that identify monitored characteristics of system components, dataset objects that comprise logical collections of datapoint objects, relationship templates for connecting two or more components related to one another, scenario objects that identify possible causes for a problem, and module objects that encapsulate stored knowledge. A system manager may instantiate the objects in a repository from XML files. The domain objects may be defined by DomainDef XML definition structures that include a type, category, and attributes of component and datapoint objects, the component objects may be defined by ComponentDef XML definition structures that include a type and attributes of datapoint objects, the datapoint objects may be defined by DataPointDef XML definition structures that contain a name and attributes, the dataset objects may be defined by DataSetDef XML definition structures that include attributes of component objects and datapoint objects, the relationship templates may be defined by RelationTemp definition structures that include identification of related components, the scenario objects may be defined by the Scenario definition structures that include the problem description, probable cause and suggested solutions, and the module objects may encapsulate knowledge. Domain objects may comprise one or more component objects, component objects may comprise one or more datapoint objects, and dataset objects may comprise one or more component objects and one or more datapoint objects. Dataset objects may contain datapoints belonging to one or more component objects.
The DataSetDef objects may comprise a DataSetDef name, a list of DsComponentDef objects, each object may include a pointer to the component definition used in the dataset, a subset of DsDataPointDef objects included in the component definitions needed in the dataset, a list of child DsComponentDef objects related to this component, a parent DsComponentDef object, the DsDataPointDef objects may include a pointer to a DataPointDef object used in the dataset, a trigger flag for specifying whether this datapoint triggers analysis, a trigger threshold at which point analysis is triggered, an analyze flag for specifying whether this datapoint participates in analysis, and a label to uniquely identify the datapoint. The trigger threshold may be selected from the group consisting of good, fair, warning and critical. The DsComponentDef object may further include a constraint selected from the group consisting of possible candidates for this component based on its host, the domain instance, and the component instance. The relationship templates may specify a rule to create relationships between components, the relationship template may include the following attributes: Type for identifying method of creation, OwnerDomainType for identifying owner domain type, OwnerCompType for identifying owner component type, MemberDomainType for identifying member domain type, MemberCompType for identifying member component type, OwnerComp for identifying owner component instances, MemberComp for identifying member component instances, OwnerDomainInst for identifying owner domain instances, MemberDomainInst for identifying member domain instances, and flags to specify that owner and member should be part of the same domain, same component and same host. The data structure may further comprise an engine template for associating analysis with a dataset, the engine including one or more modules that address a specific dataset.
Another embodiment of the present invention is a computer-implemented system for correlating and diagnosing system performance data that comprises an extension environment comprising means for capturing knowledge about system behavior, a system manager that comprises means for deploying the captured knowledge as baseline system behavior files, means for evaluating monitored system performance datapoints against the baseline system behavior files to identify datapoints that have changed states, a reasoning system that comprises means for performing real-time prediction and diagnostic analysis on the datasets, and means for notifying a user of a prediction and diagnostic analysis result. The capturing means may comprise problems to be solved, datasets that correspond to the defined problems, problem scenarios, data pattern modules with the problem scenarios, and XML definition files that characterize system behavior in terms of the scenarios, modules and datasets. The means for deploying may further comprise XML files that characterize the captured system behavior knowledge. The system behavior knowledge may be stored in a repository, encapsulated in XML files and built into the engine and module. The means for deploying may comprise XML definition files that characterize system composition and behavior in terms of the scenarios, modules and datasets, domain information, datapoints and components, and relationships between components. The means for evaluating may comprise system performance data from data collectors, domain, component, datapoint and relationship instances from received data, relationships based on pre-defined templates based on component instances, datasets based on relationships and component instances, evaluated performance data to determine if a threshold value is exceeded, identified datasets containing datapoints that exceed a threshold value, and a reasoning system for receiving the identified datasets.
The means for performing real-time prediction and diagnostic analysis may comprise a reasoning system for receiving identified datasets, a prediction engine and a diagnostic engine in the reasoning system for determining if the received dataset matches a problem pattern in a knowledge base, and an analysis result from the prediction engine and diagnostic engine. The analysis result may identify a problem, a probable cause of the problem, and a suggested solution. The analysis result may predict an incipient system component failure. The analysis result may be a best estimate of a problem and a probable cause of the problem.
Yet another embodiment of the present invention is a method for correlating and diagnosing system performance data that comprises entering knowledge about system behavior data into a repository, receiving system performance data from a target system, comparing the system performance data values with system behavior data values in the persistent store to determine if threshold values have been exceeded, indicating a problem, correlating the performance data that exceeds threshold values with datasets in a repository to identify related datasets, performing predictive and diagnostic analysis of the identified datasets, and notifying a user of a result of the performing predictive and diagnostic analysis step. The system behavior data and the system performance data may be stored in the persistent store. The performing step may further comprise analyzing the identified datasets with stored problem scenario datasets to determine a cause and correction for the data values that exceed threshold values. The method may further comprise activating a corrective script based on a result of the performing predictive and diagnostic analysis step. Another embodiment of the present invention is a computer-readable medium containing instructions for controlling a computer system to carry out the steps described above.
These and other features, aspects, and advantages of the present invention will become understood with regard to the following description, appended claims, and accompanying drawings.
Turning now to
The reasoning system 146 is comprised of a diagnostic engine and a predictive analysis engine. The diagnostic engine can identify patterns in collected data, which allow it to determine the state of the system or a problem that may be present. To accomplish this, the system manager organizes the data into datasets, each of which contains specific datapoints. The system manager stores datasets in a repository. A datapoint describes a specific attribute of a component, such as CPU utilization and available disk space. A dataset is a group of datapoints with a certain value range. For example, a dataset may contain a datapoint for CPU utilization that has a “warning” value, indicating that the CPU is almost fully utilized. When the reasoning system submits a dataset to the diagnostic engine for analysis, the diagnostic engine retrieves the dataset from the repository, examines the patterns in the data, and matches these patterns with information in its knowledge base that best describes the current state, potential problems, or existing problems. If the diagnostic engine finds an anomaly, it determines the probable cause or condition and generates an analysis result that may be viewed using a GUI. Each analysis result describes the problem or condition, its severity, the date and time of occurrence, and short and long-term solutions, if applicable. Problems or conditions can trigger notifications to individuals or groups of individuals, and corrective action scripts to remedy the problem or condition.
The predictive analysis engine analyzes collected data over time to discover trends in a host business system. Predictive analysis discovers trends in degrading performance and potential system failures. When the predictive analysis engine diagnoses a problem or condition, it analyzes datapoint values over time to determine trends in system resources and business processes. If the predictive analysis engine detects a trend, it produces and sends an analysis result to a GUI. The analysis result may be a graph that displays the values of a datapoint or datapoints over time. Predictive analysis estimates the time available until resources are depleted or until a failure occurs, warning the user in advance to enable correction of the problem or condition.
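The pattern-matching step of the diagnostic engine might be sketched as shown below. The knowledge base contents, the `Scenario` fields, the datapoint names, and the matching rule (a scenario applies only when every state in its pattern agrees with the dataset, and the most specific matching pattern wins) are assumptions chosen for illustration; the source does not specify how "best describes" is computed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Scenario:
    """Hypothetical analysis-result fields drawn from the description above."""
    problem: str
    probable_cause: str
    solution: str


# Hypothetical knowledge base: a pattern of datapoint states paired with a scenario.
KNOWLEDGE_BASE = [
    (
        {"fs_percent_used": "critical", "tablespace_status": "warning"},
        Scenario("Database space problem", "File system nearly full", "Free or add disk space"),
    ),
    (
        {"cpu_utilization": "warning"},
        Scenario("High CPU load", "CPU almost fully utilized", "Reduce or rebalance load"),
    ),
]


def diagnose(dataset: dict) -> Optional[Scenario]:
    """Return the scenario whose pattern fully matches the dataset, if any.

    When several patterns match, the one with the most datapoints is preferred
    (an assumed tie-breaking rule).
    """
    best, best_size = None, 0
    for pattern, scenario in KNOWLEDGE_BASE:
        matched = sum(1 for key, state in pattern.items() if dataset.get(key) == state)
        if matched == len(pattern) and matched > best_size:
            best, best_size = scenario, matched
    return best
```

Datapoints in the dataset that no pattern mentions are ignored, which lets informational datapoints travel with the dataset without affecting the match.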
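One simple way to estimate the time remaining until a resource is depleted is a least-squares linear trend over recent samples. The sketch below uses that technique as a stand-in; the source does not state which trend model the predictive analysis engine uses, and the function name and sample format are hypothetical.

```python
def time_until_limit(samples, limit):
    """Estimate time from the last sample until the value reaches `limit`.

    `samples` is a list of (time, value) pairs in time order. A straight line
    is fitted by ordinary least squares. Returns None when there are too few
    samples or the fitted trend is not rising toward the limit.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / var_t
    if slope <= 0:
        return None                      # no upward trend, nothing to predict
    intercept = mean_v - slope * mean_t
    t_at_limit = (limit - intercept) / slope
    last_t = samples[-1][0]
    return max(0.0, t_at_limit - last_t)
```

For example, disk usage samples of 50, 60 and 70 percent taken one hour apart project to 100 percent three hours after the last sample.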
Turning now to
A domain 210, shown in
Also shown in
Turning now to
Turning now to
Turning now to
Turning to
Turning now to
Turning now to
Turning now to
Turning now to
The invention is a complete object-oriented system. The System Manager and other server elements instantiate data objects representing business applications, databases, and operating system resources that are stored in an Object Oriented Database Management System (OODBMS). Using XML representations, the System Manager and other server components instantiate these objects in the Repository. The database architecture contains two types of objects, definition objects and instance objects. In object-oriented terms, definition objects are similar to a class and instance objects are similar to an instance. Definition objects provide a common place to describe instance objects and to store their attributes. TABLE 1 describes the type of instance objects and their corresponding definition objects.
TABLE 1
| Instance Object | Definition Object | Description |
| --- | --- | --- |
| Domain | DomainDef | Domains define a distinct area of an environment being monitored: web server, operating system, database, and business application. A domain groups related components. |
| Component | ComponentDef | A component is a part of a domain in which data is being tracked. For example, a file system, system process, and log file are all components. In the data model, components are modeled as a group of datapoints belonging to the same area, performing a common function, or sharing common attributes. |
| Datapoint | DataPointDef | Datapoints are the monitored characteristics of components or component instances. File system datapoints might include total disk space, amount of free space, and amount of used space on the disk. |
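The split between definition objects and instance objects in TABLE 1 can be sketched with two pairs of classes. The attribute names (`unit`, `values`) and the `record` method are hypothetical and chosen only to show how many instances can share one definition.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataPointDef:
    """Definition object: describes a monitored characteristic once."""
    name: str
    unit: str = ""          # hypothetical attribute


@dataclass(frozen=True)
class ComponentDef:
    """Definition object: a component type and the datapoints it exposes."""
    type: str
    datapoint_defs: tuple


@dataclass
class Component:
    """Instance object: one monitored component described by a ComponentDef."""
    definition: ComponentDef
    name: str
    values: dict = field(default_factory=dict)

    def record(self, datapoint_name, value):
        # Only datapoints declared in the shared definition may be recorded.
        allowed = {d.name for d in self.definition.datapoint_defs}
        if datapoint_name not in allowed:
            raise KeyError(f"{datapoint_name} is not defined for {self.definition.type}")
        self.values[datapoint_name] = value
```

Two file systems on different hosts would be two `Component` instances pointing at the same `ComponentDef`, much as two instances share a class.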
Turning to
Datasets are logical collections of datapoints stored within the Repository. The datapoints in each dataset can belong to any number of components, and any number of datasets can contain the same datapoint. The groupings are based on various types of relationships between the components that contain the datapoints. For example, because database functionality is affected when a file system runs low on space, a dataset might group datapoints that monitor file system space within a domain, in addition to datapoints that monitor Oracle database activity in the same domain. Each dataset represents an attribute that the Reasoning System uses for analysis. The description of a dataset is contained in an object called DatasetDef in the database. The DatasetDef object lists all the components and their datapoints, and defines the dataset. TABLE 2 shows an example dataset.
TABLE 2
| Domain | Component | Datapoint |
| --- | --- | --- |
| Operating system | File system | Percent Used |
| Oracle | Tablespace | Status |
| Application | Server | Status |
This dataset contains three datapoints. Since datapoints are essentially fixed attributes of a component, it is the component that determines how to build a dataset. Therefore, in this example, there are three distinct components and each has a datapoint.
The DatasetDef describes which components and datapoints are included in a dataset, in addition to links to the type of analysis appropriate for the dataset. A dataset instance (referred to simply as a dataset) is an instance of the DatasetDef. A dataset instance is created from the DatasetDef when the system has all the necessary components and relationships to create it. The rules for selecting the components that become part of a dataset are a crucial part of the process. A dataset can contain components that belong to one domain or multiple domains. Hence, the system is capable of performing cross-domain analysis. A dataset instance contains the following information:
Name—the name of the corresponding DatasetDef followed by a unique identifier;
Component List—the list of component instances that are a part of the dataset; and
Datapoint List—a list of datapoint instances that comprise the dataset.
The data pattern is an ordered list of datapoints and their current values or states. This pattern is generated whenever analysis occurs on a dataset.
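A dataset instance and its data pattern can be sketched as follows. The class names, the counter used as the unique identifier, and the `name-id` naming format are assumptions; the source says only that the instance name is the DatasetDef name followed by a unique identifier.

```python
from dataclasses import dataclass
import itertools

# Hypothetical source of unique identifiers for dataset instances.
_next_id = itertools.count(1)


@dataclass
class DatapointInstance:
    label: str
    state: str              # current value or state of the datapoint


@dataclass
class DatasetInstance:
    name: str
    components: list        # component instances that are part of the dataset
    datapoints: list        # datapoint instances, kept in definition order

    def data_pattern(self):
        """Ordered list of (datapoint, current state), built when analysis runs."""
        return [(dp.label, dp.state) for dp in self.datapoints]


def instantiate(dataset_def_name, components, datapoints):
    """Create a dataset instance named after its DatasetDef plus a unique id."""
    name = f"{dataset_def_name}-{next(_next_id)}"
    return DatasetInstance(name, list(components), list(datapoints))
```

Because the pattern is regenerated on each call, it always reflects the states held by the datapoint instances at analysis time.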
Turning to
These special definition files called DatasetDef include DsComponentDef and DsDataPointDef definition structures. The DatasetDef is a special definition file that describes a dataset and contains the following information:
Name—the name of the DatasetDef; the name of the dataset instance is based on this name; and
List of DsComponentDef—a list of references to component definitions with additional dataset specific information.
The DsComponentDef is a special definition structure that contains the following information:
ComponentDef—a pointer to the component definition used in the dataset;
List of DsDataPointDef structures—a subset of the datapoints from the component definition that are needed for the dataset;
List of child DsComponentDef objects—components that must be specifically related to this component; and
Parent DsComponentDef object.
A DsComponentDef may contain an optional set of one or more constraints:
Host—constrains the possible candidates for this component based on its host;
Domain—the domain instance constraint; and
Component—the component instance constraint.
The dataset definition tree is defined hierarchically and includes two types of DsComponentDef structures: root and child. The root DsComponentDef structures are at the top level and the child DsComponentDef structures have a parent DsComponentDef. Root structures are specified when a relationship between two component types is not envisioned ahead of time. Child level structures are specified when relationships are known. This hierarchical structure allows for construction of both simple and complex datasets.
The DsDataPointDef is a special definition structure that contains the following information:
DatapointDef—a pointer to the datapoint definition used in the dataset;
TriggerFlag—specifies whether this datapoint triggers analysis;
TriggerThreshold—the threshold state (good, fair, warning, critical) at which analysis is triggered;
AnalyzeFlag—whether this datapoint participates in analysis; and
Label—used to uniquely identify the datapoint.
Datapoints that have the analyze flag are used to create the pattern for analysis. Those that do not are used for information purposes and to identify the context for the dataset. For example, there could be a datapoint that identifies the name of the file system that is failing. TriggerFlag identifies datapoints that may trigger analysis. Trigger threshold states are used to determine the state at which analysis must be triggered.
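The definition structures described above might be modeled as follows. This is a sketch only: the field names are Python renderings of the listed attributes, and the helper analyzed_labels, the traversal order, and the Host/FileSystem example are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DsDataPointDef:
    datapoint_def: str                        # reference to the datapoint definition
    label: str                                # uniquely identifies the datapoint
    trigger_flag: bool = False                # may this datapoint trigger analysis?
    trigger_threshold: Optional[str] = None   # good, fair, warning, or critical
    analyze_flag: bool = True                 # does it contribute to the pattern?


@dataclass
class DsComponentDef:
    component_def: str                        # reference to the component definition
    datapoints: List[DsDataPointDef] = field(default_factory=list)
    children: List["DsComponentDef"] = field(default_factory=list)
    parent: Optional["DsComponentDef"] = field(default=None, repr=False, compare=False)
    host: Optional[str] = None                # optional host constraint
    domain: Optional[str] = None              # optional domain instance constraint
    component: Optional[str] = None           # optional component instance constraint

    def add_child(self, child):
        """Attach a child definition and record this node as its parent."""
        child.parent = self
        self.children.append(child)
        return child


@dataclass
class DatasetDef:
    name: str
    roots: List[DsComponentDef] = field(default_factory=list)


def analyzed_labels(dataset_def):
    """Labels of datapoints with the analyze flag, visiting parents before children."""
    labels = []
    stack = list(reversed(dataset_def.roots))
    while stack:
        comp = stack.pop()
        labels.extend(dp.label for dp in comp.datapoints if dp.analyze_flag)
        stack.extend(reversed(comp.children))
    return labels


host = DsComponentDef("Host", datapoints=[DsDataPointDef("CpuUtil", label="cpu")])
fs = host.add_child(DsComponentDef("FileSystem", datapoints=[
    DsDataPointDef("FsName", label="fs_name", analyze_flag=False),  # context only
    DsDataPointDef("FsUsage", label="fs_usage", trigger_flag=True,
                   trigger_threshold="warning"),
]))
dsd = DatasetDef("DiskPressure", roots=[host])
print(analyzed_labels(dsd))
```

Here fs_name carries no analyze flag, so it supplies context (the name of the failing file system) without entering the pattern.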
Turning to
Turning now to
Relationships are the dependencies, interactions, and working associations among the domains and component instances that are being monitored. Relationships connect two or more components as being, belonging, or working together. Components often have relationships with one another; for example, a word processing application may depend on the printer connected to a particular host in order to print, and therefore establishes a relationship.
The relationship between components is very important when creating individual datasets. When building a list of associated datapoints, the server looks for these relationships. For example, if the printer and spooler are related to each other and there is a dataset that contains datapoints from both, only datapoints from the related printer and spooler are used. This is crucial to the problem determination that is based on this relation. If the printer is down, only the spooler that is associated with the printer is affected.
The System Manager instantiates datasets based on relationships. As the System Manager collects, stores, and analyzes data from the system, it checks for relationships that exist between the various elements of the business enterprise. A relationship exists when one component relies on another component in order to function. The Collector plug-ins use these types of relationships to determine what data to extract. The Collector plug-ins normally extract relationship data during configuration and initialization, rather than during the normal collection interval. In other words, the Collector plug-ins typically send data about relationships only when something has changed (is re-configured) and when the system starts.
Relationships link any two components; however, the components themselves may belong to any domain. The system supports the dependency relationship between components. That is, Component A depends on Component B for its operation. Relationships are crucial to instantiating a dataset.
Relationships may be specified in the following ways: discovery by plug-in and relationship templates. A Collector plug-in can discover or create a relationship based on knowledge it has about the domain or knowledge that the user has provided through configuration. Templates are XML definition files that define relationships that are created by the System Manager instead of Data Collectors. Templates can also define relationships among components. When the System Manager receives datapoints extracted from two components, it checks the template to determine if a relationship exists. It then uses the relationships to group datapoints within the Repository. Relationship templates allow the designer to specify a rule to create a relationship. The rule contains a regular expression string that can be used to choose the individual components in a relationship. The rule may also contain a flag that indicates that the two entities have something in common. The following attributes can be specified: host where owner or member component resides; domain (instance) to which the owner or member component belongs; component name of the owner or member; require that both components are from the same domain; and require that both components are on the same host. The server creates a relationship automatically when two components matching a relationship template are detected.
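A relationship template of the kind described above could be sketched as below. The specification states that templates are XML definition files; this example instead uses a plain Python object to show only the matching rule. The class names, field names, and component values are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    host: str
    domain: str


@dataclass
class RelationshipTemplate:
    owner_name: str              # regular expression selecting the owner component
    member_name: str             # regular expression selecting the member component
    same_host: bool = False      # require both components on the same host
    same_domain: bool = False    # require both components in the same domain

    def matches(self, owner, member):
        """Return True when the pair satisfies every rule in the template."""
        if not re.fullmatch(self.owner_name, owner.name):
            return False
        if not re.fullmatch(self.member_name, member.name):
            return False
        if self.same_host and owner.host != member.host:
            return False
        if self.same_domain and owner.domain != member.domain:
            return False
        return True


def build_relationships(template, components):
    """Create an (owner, member) relationship for every matching pair."""
    return [
        (owner.name, member.name)
        for owner in components
        for member in components
        if owner is not member and template.matches(owner, member)
    ]


components = [
    Component("spooler-a", host="hostA", domain="os"),
    Component("lp0", host="hostA", domain="printing"),
    Component("lp1", host="hostB", domain="printing"),
]
template = RelationshipTemplate(r"spooler.*", r"lp\d+", same_host=True)
relationships = build_relationships(template, components)
print(relationships)
```

With the same-host flag set, the spooler is related only to the printer on its own host, which mirrors the printer and spooler example in the text.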
Turning now to
Turning to
1. If the ComponentDef's VicinityImpact is of type “System,” implying that this component affects the entire system, all components of this type are searched for and categorized.
2. If the Component that has already been selected has a relationship with another component that matches this DatasetDef, this component is included in the category.
At the end of this search, a list of candidate components that may be included in the datasets is created. The system then creates a candidate dataset consisting of these components and sends them for further analysis. In choosing the candidate dataset, a combinatorial algorithm is used. If there are three components that match a particular dataset, then three candidate datasets are created.
CompIns11 1511-CompIns21 1521-CompIns31 1531
CompIns12 1512-CompIns21 1521-CompIns31 1531
CompIns12 1512-CompIns22 1522-CompIns31 1531
CompIns13 1513-CompIns22 1522-CompIns31 1531
The same result occurs if CompType3 1530 has a VicinityImpact of “System” and has no relationship to any other component.
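One way to picture the combinatorial selection is the sketch below. The relationships between the CompType1 and CompType2 instances are not stated in the text; they are assumed here so that the enumeration reproduces the four candidate datasets listed above, with CompType3 treated as having a VicinityImpact of “System.” The function name and data layout are illustrative, not the patented algorithm.

```python
from itertools import product

# Component instances grouped by type (reference numerals omitted).
instances = {
    "CompType1": ["CompIns11", "CompIns12", "CompIns13"],
    "CompType2": ["CompIns21", "CompIns22"],
    "CompType3": ["CompIns31"],
}

# Assumed relationships, chosen to be consistent with the listed candidates.
related = {
    ("CompIns11", "CompIns21"),
    ("CompIns12", "CompIns21"),
    ("CompIns12", "CompIns22"),
    ("CompIns13", "CompIns22"),
}

# Types whose VicinityImpact is "System" need no relationship to be included.
system_wide = {"CompType3"}


def candidate_datasets(instances, related, system_wide):
    """Enumerate every combination whose adjacent components are related."""
    types = list(instances)
    candidates = []
    for combo in product(*(instances[t] for t in types)):
        consistent = True
        for i in range(len(types) - 1):
            if types[i] in system_wide or types[i + 1] in system_wide:
                continue  # system-wide components always qualify
            if (combo[i], combo[i + 1]) not in related:
                consistent = False
                break
        if consistent:
            candidates.append(combo)
    return candidates


candidates = candidate_datasets(instances, related, system_wide)
for combo in candidates:
    print("-".join(combo))
```

Six combinations are possible in total; the two that pair unrelated CompType1 and CompType2 instances are discarded, leaving four candidates.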
Turning now to
CompIns11 1611-CompIns21 1621-CompIns31 1631-CompIns22 1622
CompIns12 1612-CompIns21 1621-CompIns31 1631-CompIns22 1622
This type of relationship has several benefits: it allows multiple instances of the same type of component to appear in the dataset; and it is possible to analyze the impact of a component and its sub-components in the same dataset. In the example, if CompType2 is in an operating system domain, it is possible to pinpoint the effect of a failure in the operating system more accurately.
Turning to
The system executes triggering and analysis as follows:
1. The Collector plug-ins gather data for each host and domain in the system. For example, each domain-specific Collector plug-in collects information for each host: database, operating system, Web server, and application data.
2. The System Manager receives the collected data, organizes it, and stores it in the Repository as datapoints.
3. As it receives data, the System Manager monitors the values of the collected datapoints to determine if a threshold has been reached. If a datapoint reaches a threshold, the System Manager triggers analysis.
4. The Reasoning System determines what type of analysis is needed and triggers the appropriate analysis engine: the Diagnostic Engine or the Predictive Analysis Engine. Analysis occurs very quickly.
5. The analysis engine determines if there is a problem or condition.
Triggering is the process in which the arrival of a datapoint can trigger analysis on the dataset. Triggering is always evaluated for components that have been marked as “Triggerable” in the DatasetDef. These components must specify the threshold at which triggering should take place in mnemonic terms: Good, Fair, Warning, and Critical. For example, if the datapoint measures CPU utilization, the thresholds determine whether CPU utilization is low (good), medium (fair), high (warning), or very high (critical). This distinction is important in data analysis because it determines how the condition is diagnosed.
There are currently two types of triggers: value trigger and state trigger. A value trigger is a trigger that activates analysis whenever a datapoint meets or exceeds a threshold. For example, if the CPU utilization was set to trigger analysis whenever it was in the warning stage, either warning or critical will trigger analysis.
A state trigger is activated whenever the state of a component changes. This reduces, to a large extent, the amount of analysis that is performed when the datapoint value is always at a triggered state, such as when the CPU is always at warning. The state trigger operates whenever the state of a datapoint changes from being non-triggered to triggered. The state trigger also operates when a datapoint changes state and another datapoint in the same dataset is in a triggered state. For example, if the threshold was set to critical and the previous state was warning, this triggers whenever the state changes to critical. Once a dataset is in the triggered state, any change in the state of any other analyzed datapoint causes analysis. This ensures that the analysis accurately reflects the problem being encountered in the real system.
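The difference between the two trigger types can be illustrated with a short sketch. It is one reading of the behavior described above, not a definitive implementation: the StateTrigger class, its update method, and the cpu and mem labels are hypothetical, and treating the first arrival of a datapoint as a state change is an assumption.

```python
STATES = ["good", "fair", "warning", "critical"]


def is_triggered(state, threshold):
    """True when the state meets or exceeds the threshold state."""
    return STATES.index(state) >= STATES.index(threshold)


def value_trigger(state, threshold):
    """Value trigger: fires on every arrival at or above the threshold."""
    return is_triggered(state, threshold)


class StateTrigger:
    """State trigger: fires when a datapoint crosses into the triggered range,
    or when any datapoint changes state while another one is already triggered."""

    def __init__(self, thresholds):
        self.thresholds = thresholds   # label -> threshold, trigger datapoints only
        self.states = {}               # label -> last state seen

    def update(self, label, state):
        previous = self.states.get(label)
        self.states[label] = state
        if previous == state:
            return False               # no state change, no analysis
        threshold = self.thresholds.get(label)
        if threshold is not None:
            was = previous is not None and is_triggered(previous, threshold)
            if is_triggered(state, threshold) and not was:
                return True            # non-triggered -> triggered
        # A change while another datapoint holds the dataset in a triggered state.
        return any(
            other != label and is_triggered(self.states[other], limit)
            for other, limit in self.thresholds.items()
            if other in self.states
        )


events = [("cpu", "fair"), ("cpu", "warning"), ("cpu", "warning"),
          ("cpu", "critical"), ("mem", "fair"), ("mem", "warning"),
          ("cpu", "good")]
trigger = StateTrigger({"cpu": "warning"})
state_fired = [trigger.update(label, state) for label, state in events]
value_fired = [value_trigger(state, "warning")
               for label, state in events if label == "cpu"]
print(state_fired)
print(value_fired)
```

In this run the value trigger fires on all three consecutive cpu arrivals at warning or critical, while the state trigger fires once on the crossing and then only when mem changes while cpu remains triggered.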
In order to perform analysis, a pattern is extracted from a dataset. The pattern is a list of datapoints and their corresponding states. A pattern for the single-level hierarchy may look like Critical-Fair-Warning. This pattern is then analyzed by the corresponding engine to determine if a certain condition has been met. If a condition has been met, then a result may be created or further datapoint values created in the system. When analysis is triggered for a specific dataset, this dataset is placed in a queue for the engine. The engine then receives a message telling it to look in the queue. The engine can then retrieve datasets from the queue and analyze each of them.
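The pattern extraction and queueing described above can be sketched as follows. The dataset layout, the Engine class, the notify method standing in for the message, and the condition table are assumptions for illustration; the actual engines and their matching logic are not specified in this passage.

```python
from collections import deque


def extract_pattern(dataset):
    """Join the states of analyzed datapoints, in order, into a pattern string."""
    return "-".join(
        state.capitalize()
        for _label, state, analyze in dataset["datapoints"]
        if analyze
    )


class Engine:
    def __init__(self, conditions):
        self.queue = deque()          # datasets waiting for analysis
        self.conditions = conditions  # hypothetical table: pattern -> result text
        self.results = []

    def notify(self):
        """Stand-in for the message telling the engine to look in its queue."""
        while self.queue:
            dataset = self.queue.popleft()
            pattern = extract_pattern(dataset)
            if pattern in self.conditions:
                self.results.append((dataset["name"], self.conditions[pattern]))


dataset = {
    "name": "DiskPressure-1",
    "datapoints": [
        ("cpu", "critical", True),
        ("fs_name", "good", False),   # context only, excluded from the pattern
        ("mem", "fair", True),
        ("disk", "warning", True),
    ],
}
engine = Engine({"Critical-Fair-Warning": "example condition met"})
engine.queue.append(dataset)          # analysis was triggered for this dataset
engine.notify()
print(extract_pattern(dataset))
print(engine.results)
```

Keeping the queue separate from the notification lets several datasets accumulate and be analyzed in arrival order when the engine is next told to look.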
Turning to
Although the present invention has been described in detail with reference to certain preferred embodiments, it should be apparent that modifications and adaptations to those embodiments may occur to persons skilled in the art without departing from the spirit and scope of the present invention as set forth in the following claims.
Wolfe, Brian; Menard, Cody; Murthy, Raghavendra K.
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Apr 04 2007 | Symantec Corporation | (assignment on the face of the patent) | / | |||
Apr 30 2007 | WOLFE, BRIAN | ALTIRIS, INC | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 020084 | /0041 | |
Aug 21 2007 | ALTIRIS, INC | Symantec Corporation | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 019826 | /0191 | |
Feb 28 2012 | Symantec | STEC IP, LLC | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 028271 | /0971 | |
May 24 2012 | STEC IP, LLC | Clouding IP, LLC | CHANGE OF NAME SEE DOCUMENT FOR DETAILS | 028275 | /0896 | |
Aug 29 2014 | Clouding IP, LLC | CLOUDING CORP | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 033709 | /0456 | |
Jan 29 2015 | MARATHON PATENT GROUP, INC | DBD CREDIT FUNDING, LLC | SECURITY INTEREST SEE DOCUMENT FOR DETAILS | 034873 | /0001 | |
Jan 29 2015 | CLOUDING CORP | DBD CREDIT FUNDING, LLC | SECURITY INTEREST SEE DOCUMENT FOR DETAILS | 034873 | /0001 | |
Jan 10 2017 | MARATHON VENTURES S À R L | DBD CREDIT FUNDING LLC, AS COLLATERAL AGENT | SECURITY INTEREST SEE DOCUMENT FOR DETAILS | 041333 | /0001 | |
Jan 10 2017 | NYANZA PROPERTIES | DBD CREDIT FUNDING LLC, AS COLLATERAL AGENT | SECURITY INTEREST SEE DOCUMENT FOR DETAILS | 041333 | /0001 | |
Jan 10 2017 | MARATHON IP GMBH | DBD CREDIT FUNDING LLC, AS COLLATERAL AGENT | SECURITY INTEREST SEE DOCUMENT FOR DETAILS | 041333 | /0001 | |
Jan 10 2017 | ORTHOPHOENIX, LLC | DBD CREDIT FUNDING LLC, AS COLLATERAL AGENT | SECURITY INTEREST SEE DOCUMENT FOR DETAILS | 041333 | /0001 | |
Jan 10 2017 | MEDTECH DEVELOPMENT DEUTSCHLAND GMBH | DBD CREDIT FUNDING LLC, AS COLLATERAL AGENT | SECURITY INTEREST SEE DOCUMENT FOR DETAILS | 041333 | /0001 | |
Jan 10 2017 | VERMILION PARTICIPATIONS | DBD CREDIT FUNDING LLC, AS COLLATERAL AGENT | SECURITY INTEREST SEE DOCUMENT FOR DETAILS | 041333 | /0001 | |
Jan 10 2017 | MUNITECH IP S À R L | DBD CREDIT FUNDING LLC, AS COLLATERAL AGENT | SECURITY INTEREST SEE DOCUMENT FOR DETAILS | 041333 | /0001 | |
Jan 10 2017 | MAGNUS IP GMBH | DBD CREDIT FUNDING LLC, AS COLLATERAL AGENT | SECURITY INTEREST SEE DOCUMENT FOR DETAILS | 041333 | /0001 | |
Jan 10 2017 | BISMARCK IP INC | DBD CREDIT FUNDING LLC, AS COLLATERAL AGENT | SECURITY INTEREST SEE DOCUMENT FOR DETAILS | 041333 | /0001 | |
Jan 10 2017 | 3D NANOCOLOR CORP | DBD CREDIT FUNDING LLC, AS COLLATERAL AGENT | SECURITY INTEREST SEE DOCUMENT FOR DETAILS | 041333 | /0001 | |
Jan 10 2017 | TRAVERSE TECHNOLOGIES CORP | DBD CREDIT FUNDING LLC, AS COLLATERAL AGENT | SECURITY INTEREST SEE DOCUMENT FOR DETAILS | 041333 | /0001 | |
Jan 10 2017 | SYNCHRONICITY IP LLC | DBD CREDIT FUNDING LLC, AS COLLATERAL AGENT | SECURITY INTEREST SEE DOCUMENT FOR DETAILS | 041333 | /0001 | |
Jan 10 2017 | MOTHEYE TECHNOLOGIES, LLC | DBD CREDIT FUNDING LLC, AS COLLATERAL AGENT | SECURITY INTEREST SEE DOCUMENT FOR DETAILS | 041333 | /0001 | |
Jan 10 2017 | TLI COMMUNICATIONS GMBH | DBD CREDIT FUNDING LLC, AS COLLATERAL AGENT | SECURITY INTEREST SEE DOCUMENT FOR DETAILS | 041333 | /0001 |
Date | Maintenance Fee Events |
Jun 17 2013 | M1551: Payment of Maintenance Fee, 4th Year, Large Entity. |
Jul 28 2017 | REM: Maintenance Fee Reminder Mailed. |
Jan 15 2018 | EXP: Patent Expired for Failure to Pay Maintenance Fees. |
Date | Maintenance Schedule |
Dec 15 2012 | 4 years fee payment window open |
Jun 15 2013 | 6 months grace period start (w surcharge) |
Dec 15 2013 | patent expiry (for year 4) |
Dec 15 2015 | 2 years to revive unintentionally abandoned end. (for year 4) |
Dec 15 2016 | 8 years fee payment window open |
Jun 15 2017 | 6 months grace period start (w surcharge) |
Dec 15 2017 | patent expiry (for year 8) |
Dec 15 2019 | 2 years to revive unintentionally abandoned end. (for year 8) |
Dec 15 2020 | 12 years fee payment window open |
Jun 15 2021 | 6 months grace period start (w surcharge) |
Dec 15 2021 | patent expiry (for year 12) |
Dec 15 2023 | 2 years to revive unintentionally abandoned end. (for year 12) |