The dynamic complexity and the operational risk inherent in a system are defined and incorporated into a mathematical model of the system. The mathematical model is emulated to predict the states of instability that can occur within the operation of the system. Dynamic complexity of a service manifests when an observed effect has multiple, seemingly inter-related causes in a many-to-one or many-to-many relationship. Having assessed the dynamic complexity efficiency and the operational risk index of a service (e.g., a business, a process or an information technology), these indexes can be employed to emulate all attributes of the service, thereby determining how the service responds in multiple states of operation, the states in which dynamic complexity can occur, the optimal dynamic complexity efficiency of the service, and the singularities at which the service becomes unstable.
17. A non-transitory computer-readable medium storing instructions that, when executed by a computer processor, cause the computer processor to:
obtain a multi-layer mathematical model of a system, layers of the multi-layer model comprising a process layer, an implementation layer, and a physical layer;
model performance metrics of the multi-layer model under plural sets of operational parameters, said modeling including dimensions of cost, response time and throughput, each of the plural sets of operational parameters defining operational requirements for cost, response time and throughput;
identify at least one adverse event from a rate of change in the performance metrics exceeding at least one predetermined threshold;
determine an occurrence probability of the system transitioning from an initial state having an initial set of operational parameters to each of a plurality of successive states, each of the plurality of successive states specifying the operational requirements of a respective one of the plural sets of operational parameters, the occurrence probability being calculated based on simulation data of the performance metrics under the plural sets of operational parameters;
generate a map relating the at least one adverse event to 1) corresponding instances of the operational requirements of the plural sets of operational parameters and 2) the occurrence probability of the system transitioning from the initial state to the successive states;
determine, based on the map, at least one risk for at least one of the successive states of the system, the at least one risk defining a probability of an outcome of the system including the at least one adverse event;
report the at least one risk to a user; and
modify an information system component of the system based on the risk, the modified information system component causing the system to avoid the at least one risk.
1. A computer implemented method for evaluating operation of a system architecture, comprising:
in a computer processor:
obtaining a multi-layer mathematical model of a system, layers of the multi-layer model comprising a process layer, an implementation layer, and a physical layer;
modeling performance metrics of the multi-layer model under plural sets of operational parameters, said modeling including dimensions of cost, response time and throughput, each of the plural sets of operational parameters defining operational requirements for cost, response time and throughput;
identifying at least one adverse event from a rate of change in the performance metrics exceeding at least one predetermined threshold;
determining an occurrence probability of the system transitioning from an initial state having an initial set of operational parameters to each of a plurality of successive states, each of the plurality of successive states specifying the operational requirements of a respective one of the plural sets of operational parameters, the occurrence probability being calculated based on simulation data of the performance metrics under the plural sets of operational parameters;
generating a map relating the at least one adverse event to 1) corresponding instances of the operational requirements of the plural sets of operational parameters and 2) the occurrence probability of the system transitioning from the initial state to the successive states;
determining, based on the map, at least one risk for at least one of the successive states of the system, the at least one risk defining a probability of an outcome of the system including the at least one adverse event;
reporting the at least one risk to a user;
determining at least one remedy, the at least one remedy identifying a modification to the system architecture to avoid the at least one risk; and
updating the multi-layer model to incorporate the modification.
16. A computer implemented method for evaluating operation of a system architecture, comprising:
in a computer processor:
obtaining a multi-layer mathematical model of a system, layers of the multi-layer model comprising a process layer, an implementation layer, and a physical layer;
modeling performance metrics of the multi-layer model under plural sets of operational parameters, said modeling including dimensions of cost, response time and throughput, each of the plural sets of operational parameters defining operational requirements for cost, response time and throughput;
identifying at least one adverse event from a rate of change in the performance metrics exceeding at least one predetermined threshold;
determining an occurrence probability of the system transitioning from an initial state having an initial set of operational parameters to each of a plurality of successive states, each of the plurality of successive states specifying the operational requirements of a respective one of the plural sets of operational parameters, the occurrence probability being calculated based on simulation data of the performance metrics under the plural sets of operational parameters;
generating a map relating the at least one adverse event to 1) corresponding instances of the operational requirements of the plural sets of operational parameters and 2) the occurrence probability of the system transitioning from the initial state to the successive states;
determining, based on the map, at least one risk for at least one of the successive states of the system, the at least one risk defining a probability of an outcome of the system including the at least one adverse event;
reporting the at least one risk to a user;
determining at least one remedy, the at least one remedy identifying a modification to the system architecture to avoid the at least one risk; and
modifying an information system component of the system based on the modification, the modified information system component causing the system to avoid the at least one risk.
2. The method of
3. The method of
4. The method of
analyzing a state of the system; and
wherein determining the at least one risk includes accessing the lookup table using information on the state of the system.
5. The method of
6. The method of
7. The method of
calculating the probability of the outcome of the system including the at least one adverse event based on the occurrence probability.
8. The method of
9. The method of
10. The method of
11. The method of
modeling performance metrics of the multi-layer model under a first set of operational parameters, said modeling including dimensions of cost, response time and throughput;
generating a second set of operational parameters, the second set being distinct from the first set of operational parameters by one set of variables, the one set of variables including at least one of: failure of a component of the system architecture, a delay of an operation, a change in a sequence of operations, and an alternative mode of operation;
modeling performance metrics of the multi-layer model under the second set of operational parameters, said modeling including dimensions of cost, response time and throughput; and
identifying an adverse event from a rate of change in the performance metrics of the second set of operational parameters relative to the performance metrics of the first set of operational parameters, the rate of change exceeding at least one of the predetermined thresholds.
12. The method of
13. The method of
14. The method of
15. The method of
with the combined solution using the Laplace transform:

Complexity function:

$$h(\sigma) = \int X(\tau)\,\sigma(t-\tau)\,d\tau \qquad (2)$$

Let us denote by $\sigma = \sigma(k)$ the vector that represents the set of metrics defining a domain. From (1) and (2), the impact of complexity on the domain metrics, using the Laplace transform, is given by a system of equations that represents the variations, where $d$ and $s$ denote the two types of complexity and the coefficients are computed by the method proposed in (3), and where $(\sigma_j^{(d)}, \sigma_n^{(d)})$ and $(\sigma_j^{(s)}, \sigma_n^{(s)})$ represent $\sigma$ through different coordinates, and $\sigma_i^{(s)}$ or $\sigma_i^{(d)}$ denotes the complexity derivative of order $i$, expressed in exponential form:

$$\sigma_j^{(i)} = \sum_{k}\sum_{n} C_{n,k}\, e^{zt} \qquad (5)$$

where $z$ is a complex variable that represents the two complexities:

$$z = \sigma^{(s)} + i\,\sigma^{(d)}, \qquad i = \sqrt{-1},$$

with $\sigma^{(s)}$ and $\sigma^{(d)}$ the static and dynamic complexity, respectively.

The set of equations (3), (4) and (5) allows the computation of all domain metrics as a function of varying the two portions of the complexity representation.

We propose an aggregative concept, which we call the Complexial, that represents the aggregated impact produced in each domain $X_0^{(n)}$ of the vector $X_0$, where $X_0^{(1)}$ is performance, $X_0^{(2)}$ denotes cost, $X_0^{(3)}$ means quality of service and $X_0^{(4)}$ represents availability, etc. From the above:

$$\text{Complexial} = \xi = \prod_{n}\big(X_0^{(n)} + X_j^{(n)} + X_n^{(n)} + \cdots\big)$$

where the $X_j^{(n)}$ are the complexity contributions of higher-order perturbations (direct and indirect) of domain metric $n$.
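Equation (5) lends itself to a quick numeric illustration. The following Python sketch evaluates a single term C e^{zt} for assumed values of the static and dynamic complexity components; all numbers here are invented for illustration and are not taken from the text:

```python
import numpy as np

# Assumed illustrative values -- not taken from the patent.
sigma_s = 0.05   # static complexity component, sigma^(s)
sigma_d = 2.0    # dynamic complexity component, sigma^(d)
z = complex(sigma_s, sigma_d)      # z = sigma^(s) + i*sigma^(d)

t = np.linspace(0.0, 20.0, 401)    # simulated time axis
C = 1.0                            # single assumed coefficient C_{n,k}

sigma_j = C * np.exp(z * t)        # one term of the exponential form (5)

# |sigma_j| grows as exp(sigma_s * t): the static component drives the
# trend, while the dynamic component sets the oscillation frequency.
# A rapid blow-up of |sigma_j| would signal an approach to a singularity.
print(f"|sigma_j| at t=20: {abs(sigma_j[-1]):.2f}")
```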
Stage 4: Drive the Emulator (2120)
Once the mathematical model of the subject system or environment has been defined, the model is then emulated. The mathematical model may be constructed as described above.
The outputs of this stage (2120) allow for discovery and identification of when the behavior of the emulated environment or system becomes 'unexpected' due to a sudden change. This may comprise running a number of starting positions and controlling the emulator to run over a number of different time lines under different initial conditions.
In short, to establish a "system limit" due to complexity, two results in particular are identified. First, the system limit due to static complexity (the "ceiling") is the predictable limit that we understand from simple extrapolations, statistical trending and actual experience; the "ceiling" is what is normally understood as the operational limit of a system. Second, the system limit due to dynamic complexity (a singularity), which is unpredictable by conventional methods (e.g., statistical trending), is identified. A singularity may occur at any point in time, yet it is predictable and governable through the present invention's mathematical methods, which emulate interactions, feedback and interferences.
Stage 5: Identify Root Causes of Unexpected Behavior/Singularities and Define Improvements (2125)
Once the mathematical model has been emulated through one or more test scenarios as described above, the results of the emulation can be analyzed to identify the root causes of the various detected results, including adverse effects (e.g., a singularity), and changes to the system to avoid such adverse effects. Inputs at this stage (2125) include the calculated results of emulation from the previous stage (2120), as well as measurements and observations of the actual system to condition and verify the outputs of the previous stage (2120). Outputs of this stage (2125) include improvement directions, which are suggested changes or improvements to the system. Such improvement directions can include utilization scenarios, technology improvements, cost justifications, and immediate actions that can be taken to modify the system. Outputs also include re-engineering directions, which may include long-term transformations, technology improvements, and cost justifications (e.g., re-prioritization of operations).
Operations at this stage (2125) include various methods of analyzing the emulation results, including discovering aging, discovering architecture drawbacks, detecting implementation defects, determining technology limitations, and building and computing further scenarios for emulation. Further, the results of the previous stage (2120) may be quantified and qualified in a number of ways, including assessing the result for each scenario; combining scenarios to build improvement plans; classifying the actions constituting the emulation in terms of resources, implementation, architecture, algorithms, and processes; evaluating cost versus gain (e.g., QoS, throughput, availability, etc.); and defining the plan (e.g., steps, monitoring execution, effort, etc.). A method of determining whether an adverse effect has occurred is described below.
Stage 6: Predict the New Behavior Patterns with the New Dynamic Complexity Using the Emulator (2130)
In response to recommended changes to the system provided in the previous stage (2125), those changes are incorporated into a revised model of the system, and the revised model may be emulated to determine the specific benefits conferred by those changes. Inputs of this stage (2130) include the outputs of the previous stage (2125), as well as defined improvement scenarios. Such improvement scenarios may include changes to the system intended to remove bottlenecks, increase productivity, reduce cost and increase effectiveness, expand more for less cost, and increase scalability. Such improvements may be suggested as a result of a process as described above.
Operations at this stage (2130) include use of the reference predictive emulator to compute the improvement scenarios and define the plan. Further, the emulator may be driven to provide ongoing monitoring of complexity (e.g., over long-term simulated scenarios) to identify degradation due to increases in complexity, determine the impact of such degradation, define actions to address the degradation, and determine the frequency of complexity monitoring and analysis (e.g., continuous, daily, weekly).
Stage 7: Dynamic Complexity Under Control
As a result of the previous stages, once implemented to identify and take preventive action against adverse events resulting from dynamic complexity within an emulated system, the dynamic complexity of the system can be deemed to be controlled and predictable within an acceptable tolerance. An adverse event may be identified based on a rate of change in performance metrics or other characteristics, where one or more of those metrics exceed a threshold rate of change. A singularity may be an example of such an adverse event, as well as other rapid changes to the performance or characteristics of a system. Thus, the results, and particularly the proposed changes to the system, can be exported from the model as recommendations to modify and improve the real-world system corresponding to the model.
Inputs of this stage include the outputs, knowledge and experiences of all previous stages, a change management plan, and information on the identified problems and challenges underlying the system.
The outputs and ongoing states of this stage include a proposal regarding reporting structure, destination, frequencies, and content; the operations of a control function to implement the plan; and ongoing maturity improvements.
Initially, a mathematical model is obtained for emulation (2205). The mathematical model may be constructed according to the process described above.
Embodiments of the invention, as described above, provide for emulating a model system through a number of differing scenarios, where the results of such emulations can be analyzed to identify problems and potential solutions for the system. One method of this emulation is to permutate a set of input parameters, by altering one or more values, to generate one or more additional scenarios for emulation, as illustrated in the sketch below. Such selection of differing parameters is described above.
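As an illustration of this permutation step, a scenario generator might enumerate combinations of deviated parameter values. The parameter names and deviation values below are hypothetical, chosen only to show the mechanics:

```python
from itertools import product

# Baseline (hypothetical) operational parameters.
baseline = {"output_volume": 1000, "external_resources": 10, "servers": 4}

# Candidate deviations per parameter (also hypothetical).
deviations = {
    "output_volume": [1000, 2000, 5000],
    "external_resources": [10, 20],
    "servers": [4, 8],
}

# Each combination is one scenario to feed to the emulator.
scenarios = [dict(zip(deviations, combo))
             for combo in product(*deviations.values())]
print(len(scenarios), "scenarios, e.g.", scenarios[0])
```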
After the results of the first and second performance metrics are obtained, those metrics may be compared (2230) and reported to a user (2230) to determine the effect of the different input parameters on the performance of the system. The performance metrics can be analyzed further, as described below.
At an initial stage, changes to a set of input parameters are identified (2305) and incorporated into a new set of parameters (2310) for emulation. These steps may correspond to step 2215 described above.
Due to the dynamic complexity of a system, an adverse event may only develop after an extended length of operating time, and may develop despite the failure to predict such an adverse event over a shorter length of simulated time. Thus, by extending the simulation through time T2, a model system can be tested more thoroughly to determine whether adverse outcomes result over greater lengths of time. If the resulting performance metrics after time T2 exceed an acceptable threshold (2345), then an adverse outcome is reported (2360). Otherwise, an acceptable outcome can be reported (2350), indicating that the model system performs in a controlled, predictable manner under the given set of input parameters.
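A minimal sketch of this threshold test, assuming a discrete series of simulated performance-metric samples; the function, trace, and threshold below are our own illustration rather than the patent's implementation:

```python
def adverse_events(metric_series, dt, max_rate):
    """Flag time steps where the rate of change of a performance metric
    exceeds a predetermined threshold -- a sketch of the test applied
    after extending the simulation through time T2."""
    events = []
    for i in range(1, len(metric_series)):
        rate = abs(metric_series[i] - metric_series[i - 1]) / dt
        if rate > max_rate:
            events.append((i * dt, rate))
    return events

# Hypothetical response-time trace: stable at first, then a sudden spike
# that only develops late in the extended simulation window.
trace = [0.25, 0.26, 0.25, 0.27, 0.30, 1.80, 3.10]
print(adverse_events(trace, dt=1.0, max_rate=0.5))  # flags t=5.0 and t=6.0
```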
Next, a component is identified that is most proximate to the adverse event (2410). For example, a model system may include a computer server that exhibits a sudden spike in cost (in CPU cycles) to provide a given service, which in turn causes an unacceptable change in production cost for the system as a whole. Once the initial and proximate causes are identified, a path may then be traced between them (2415), where the path encompasses all operations and activities connecting the initial causes to the proximate causes. From this path, a series of components can be identified in the path, each of which can be considered to have contributed to the operations leading to the adverse event (2420). Each of these components can then be evaluated individually for failures, degradation, and other changes in behavior that may have contributed to the adverse event (2430). For example, it may be determined that a computer workstation, in response to the initial causes in combination with degradation over time, exhibited a crash, which in turn contributed to the adverse event further down the stream of operations. In addition, recognizing that other components (outside of this path) may also contribute to an adverse event, those other components may be evaluated in the same manner. With the components contributing to the adverse event identified, those components, as well as the specific problems inherent in each, may be reported for further analysis and remedial measures (2440).
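The path-tracing step (2415-2420) can be sketched as a shortest-path search over a component dependency graph. The graph, component names, and choice of breadth-first search below are assumptions for illustration:

```python
from collections import deque

# Hypothetical dependency graph: edges point downstream in the operation
# flow, from initial cause toward the most proximate component.
graph = {
    "workload_spike": ["workstation"],
    "workstation": ["app_server"],
    "app_server": ["database"],
    "database": ["storage"],
    "storage": [],
}

def trace_path(graph, initial_cause, proximate):
    """Shortest operation path from the initial cause to the component
    most proximate to the adverse event (BFS over the dependency graph)."""
    queue = deque([[initial_cause]])
    seen = {initial_cause}
    while queue:
        path = queue.popleft()
        if path[-1] == proximate:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

# Every component on the returned path is a candidate contributor (2420)
# that can then be evaluated individually for degradation (2430).
print(trace_path(graph, "workload_spike", "database"))
```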
Further description of deconstruction of dynamic complexity and prediction of adverse events, including example applications, is provided in U.S. Pub. No. 2012/0197686, the entirety of which is incorporated herein by reference.
Applications to Additional Systems and Entities
In some example embodiments of the invention, described above, the subject system to be modeled includes a business enterprise, such as a corporation.
In further embodiments, the modeling techniques described above may be applied to a range of other systems and entities, each represented as a multi-layer mathematical model comprising the following layers:
1) A process layer describes and models the processes performed by the system. Each modeled process may be described in this layer as one or more constituent operations performed by the system, including dependencies between those operations, resources, and output. Example process layers include the business layer described above.
2) An implementation layer describes and models the operations and sub-operations performed by the physical layer (described below) to complete the processes of the process layer. An example implementation layer includes the application layer described above.
3) A physical layer describes and models the physical components of the system, including resources. An example of a physical layer includes the infrastructure architecture layer described above.
Applying the modeling techniques described above, a range of systems can be simulated as a multi-layer mathematical model having a process layer, an implementation layer and a physical layer. In some embodiments, one or more such layers may be partially or wholly merged, or otherwise reconfigured to accommodate the particular system being modeled. For example, in a relatively simple system, where processes can be described easily with direct relation to the physical components, the process and implementation layers may be implemented in a common layer. Similarly, the implementation and physical layers may be implemented in a common layer.
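A minimal sketch of this three-layer decomposition as data structures, with all class and field names invented for illustration (the patent does not prescribe a concrete schema):

```python
from dataclasses import dataclass, field

@dataclass
class PhysicalComponent:          # physical layer: servers, storage, ...
    name: str
    capacity: float               # e.g., CPU cycles or IO ops per second

@dataclass
class Operation:                  # implementation layer
    name: str
    runs_on: PhysicalComponent
    cost_per_call: float          # resource units consumed per invocation

@dataclass
class Process:                    # process layer
    name: str
    operations: list = field(default_factory=list)  # ordered dependencies

# A merged process/implementation layer, as the text allows for simple
# systems, would simply attach operations directly to the process.
cpu = PhysicalComponent("app_server_cpu", capacity=1e9)
billing = Process("billing", [Operation("compute_invoice", cpu, 2e6)])
print(billing)
```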
Predictive Risk Assessment
The possibility of an adverse event, as described above, presents an apparent risk to the operation of a system, or even to the integrity of the system itself. An adverse event may be identified based on a rate of change in performance metrics or other characteristics, where one or more of those metrics exceed a threshold rate of change. A singularity, as described above, may be an example of such an adverse event, as well as other rapid changes to the performance or characteristics of a system. By identifying outcomes including adverse events and their causes, as described above, embodiments of the invention can enable a system to be reconfigured to avoid such adverse events.
Further, embodiments of the invention can be applied, in a more comprehensive manner, to the identification and avoidance of a range of adverse events. By modeling performance metrics of a system under a range of operational parameters, the risk of an outcome including an adverse event can be ascertained as a probability. The risk can be qualified by a particular adverse event, as well as a predefined period of time. Several such risks can be reported simultaneously when warranted.
In an example embodiment of identifying and reporting one or more risks, a multi-layer mathematical model of a system may be provided as described above. Layers of the multi-layer model may comprise a process layer, an implementation layer, and a physical layer. Performance metrics of the multi-layer model may be modeled under plural sets of operational parameters, where the performance metrics include dimensions of cost, quality of service and throughput. From these performance metrics, one or more adverse events may be identified based on a rate of change in the performance metrics exceeding at least one predetermined threshold. Given the identified adverse event(s), a map can be generated to relate the adverse event(s) to corresponding instances of the plural sets of operational parameters. Based on this map, one or more risks can be determined and reported, where the risk(s) define a probability of an outcome including the at least one adverse event.
Example embodiments providing predictive risk assessment and management are described in further detail below.
Initially, a mathematical model is obtained for emulation (2505). The mathematical model may be constructed according to the process described above.
Embodiments of the invention, as described above, provide for emulating a model system through a number of differing scenarios, where the results of such emulations can be analyzed to identify problems and potential solutions for the system. One method of this emulation is to permutate a set of input parameters, by altering one or more values, to generate one or more additional scenarios for emulation. Such selection of differing parameters is described above.
In an example embodiment, a first of the sets of parameters may correspond to an initial (i.e., measured or observed) state of a system, or may correspond to a hypothetical or predicted state of the system. Further, additional instances of the sets of parameters may correspond to a range of permutations of the first set of parameters, which may correspond to deviations from the initial state of the system. Such deviations can include the permutations described above, and in particular, 1) output volume, 2) external resource volume, 3) structure of the system architecture, and 4) allocation of resources internal and external to the system architecture.
With the sets of parameters defined, the model may then be simulated under each of the sets of parameters to generate corresponding sets of performance metrics (2510). The sets of performance metrics may be quantified in a manner as described above.
Once the resulting performance metrics are obtained, those metrics may be analyzed as described above.
Given the identified adverse event(s), a map can be generated to relate the adverse event(s) to corresponding instances of the plural sets of operational parameters (2520). An example map is described below.
From the map 2600, one or more risks to the system can be determined. A risk, as described above, may indicate a probability that the system will encounter an outcome that includes an adverse event. Such risks can be calculated through a number of means and may be expressed in a number of different ways, and examples of such analysis and presentation are provided in further detail below. In one example, an occurrence probability may be assigned to each of the operational parameters 2610A-2610N, where the occurrence probability indicates the likelihood that the system will move from the initial state 2605 to the given operational parameters. For example, the set of operational parameters 2610A is assigned an occurrence probability of 3%. Such an occurrence probability may be determined based on historical data about the system, historical simulation data, data about comparable systems, and/or other sources. Based on the occurrence probability of each of the operational parameters 2610A-N, one or more risks (e.g., the probability of an outcome including one or more of the adverse events 2630A-D) can be determined, as sketched below. The risks may be reported to a user, including details of the predicted adverse events and the likelihood of each. The risks may also be further processed, for example, to generate a lookup table, an example of which is described below.
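A minimal sketch of this risk calculation, assuming the successive states are mutually exclusive so that per-state occurrence probabilities can be summed; the state names and all probabilities other than the 3% figure echoed from the example above are invented:

```python
# Hypothetical map: successive state -> (occurrence probability,
# adverse events predicted for that state).
state_map = {
    "params_A": (0.03, ["singularity"]),
    "params_B": (0.10, []),
    "params_C": (0.05, ["cost_spike", "singularity"]),
}

def risk_of(event, state_map):
    """Probability that the system transitions to any successive state
    whose outcome includes the given adverse event (states assumed
    mutually exclusive in this sketch)."""
    return sum(p for p, events in state_map.values() if event in events)

print(f"risk of singularity: {risk_of('singularity', state_map):.0%}")  # 8%
```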
The lookup table 2700 may be accessed using information on a given state of the information system. For example, for diagnostic applications, the state of the system may be analyzed and then compared to entries in the lookup table to determine the risk inherent in the system. The remedial actions 2730, including remedies and/or suggested actions (e.g., modifications to the system) to avoid the risk(s), can also be reported, such that they may be implemented by the system itself.
Prior to implementing embodiments for determining risk as described above, initial risk perception 2805 (phase one) may be incomplete. Accordingly, in phase two (risk modeling) 2810, information is collected as necessary to perform the deconstruction and causal analysis, based on information gathered from experience and from benchmarks of similar situations. From this data, the investigative and provocative scenarios that will reveal the risks and singularities may be built. Using the mathematical formulation and the deconstructed characteristics, dependencies and content behavior, a mathematical emulator that represents the system dynamics and the dynamic complexity is delivered. Using this emulator, scenarios can be deployed under different patterns of initial conditions and dynamic constraints to identify the risk and the conditions under which the risk will occur, as well as the possible mitigation strategies. The emulator can be continuously updated to replicate any changes that may happen over time with impacts on the problem definition, due to the evolution of dynamic complexity, environmental changes or content dynamics. Success is enhanced by the ability to keep the emulator representative, accurate, and able to capture all risks with sound projection of predictions.
After building the emulator in phase two 2810, in phase three 2815 (risk discovery), modified scenarios are run to identify possible risks. By modifying the parameters of each scenario within the emulator, one by one, by group or by domain, to represent possible changes, one may extrapolate each time the point at which the system will hit a singularity and use the corresponding information to diagnose the case. The emulator supports risk categorization based on the severity of impact, the class of mitigation, and many other characteristics that support decision making such as urgency, complexity of implementation of mitigating actions, and the cost dimension.
For each scenario, the ripple effect is particularly important to results interpretation. By using perturbation theory as the analytical vehicle to represent system dynamics, involving direct and indirect effects on components as well as trajectories representing sequences of components, the ripple effect is exerted on tightly or loosely coupled interactions.
Other scenarios may be created during this phase 2815 to explore viable and proactive remedial options that secure an acceptable risk mitigation strategy and allow the system to be fixed prior to realizing negative business outcomes caused by an eventual risk. This last dimension may be considered crucial in risk management, which supposes that most of the risk is discovered during this phase—including risks generated by dynamic complexity.
Risk mitigation involves the identification, assessment, and prioritization of risks (the effect of uncertainty on objectives), followed by the coordinated and economical application of resources to minimize, monitor, and control the impact of unfortunate events or to maximize the realization of opportunities. Risk management's objective is to assure that uncertainty does not deviate the endeavor from the business goals. Thus, in phase four 2820, the information derived in the previous phases is used to mitigate risk to the system. The risk is identified and diagnosed, and remediation plans may then be built ahead of time to eliminate, reduce or at minimum counterbalance the impact of the risk. It is the application of the knowledge gained in the earlier phases that allows us to be ready, with awareness of what may happen and plans for how to remediate the risk. Example embodiments may utilize the knowledge database to continuously monitor systems to cover the risk of both the knowns and the unknowns (e.g., risks) caused by the evolutionary nature of dynamic complexity.
In phase five, risk monitoring 2825, the monitoring process is implemented based on the concept of Optimal Business Control (OBC). Using the database that contains all risk cases generated in phase three 2815 and enhanced with remedial plans in phase four 2820, the system may be put under surveillance using automation technologies. Similar in functionality to what is used for planes, cars, and nuclear plants, the auto piloting capabilities may observe the system in operations to identify eventual dynamic characteristics that may lead to a pre-identified risk situation. If a matching case is found, an alert will be generated and the pre-approved remedial actions will become active.
Each stored case may contain an identifier, a diagnosis, and one or more options for remediation. If the auto piloting system does not find a matching case, but has identified critical symptoms that may cause a risk, the monitoring controller sends back the characteristics to the predictive modeling phase two 2810. The corresponding scenario may be run to estimate the risk, diagnose the case, and propose remedial options, which may then be sent back to the database, enriching the knowledge base with the new case. Using this approach, the auto piloting, monitoring and control system may gradually become more intelligent and exhaustive, which, over time, may serve to considerably reduce the occurrence of risks of adverse events.
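A minimal sketch of one pass of this surveillance loop, with a hypothetical knowledge-base schema and matching rule; the text specifies only that each case carries an identifier, a diagnosis, and remediation options, so everything else here is assumed:

```python
# Each stored case: symptom signature -> (diagnosis, remedial actions).
# The signature-based matching rule is our own simplification.
knowledge_base = {
    ("cache_hit_low", "storage_queue_high"): (
        "data off memory causing storage bottleneck",
        ["expand database cache", "throttle arrival rate"],
    ),
}

def monitor(symptoms):
    """One pass of the auto-piloting loop: alert on a known case,
    otherwise escalate the symptoms back to predictive modeling
    (phase two) so a new case can be run, diagnosed, and stored."""
    case = knowledge_base.get(tuple(sorted(symptoms)))
    if case:
        diagnosis, actions = case
        return f"ALERT: {diagnosis}; pre-approved actions: {actions}"
    return "UNKNOWN: send symptoms back to phase two for scenario runs"

print(monitor(["storage_queue_high", "cache_hit_low"]))
print(monitor(["response_time_rising"]))
```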
Calculation of Risk
Example indicators of risk exhibited by a system may be referred to as the Dynamic Complexity Indicator (Dycom) and the Risk Index (RI). Dycom and RI may be implemented in the embodiments described above.
Dycom may be understood as a vector of dynamic complexity metrics that shows the health of a business's dynamics. Dycom may represent: A) the degree of dependencies among the components forming a business system (a high degree of dependencies indicates a high risk of generating dynamic complexity, which threatens efficiency, increases the unproductive portion of cost and reduces the quality of service); B) the degree of dependencies that produce feedback (for example, one feedback loop could be equivalent to n dependencies, as happens when a production line produces leftover material that needs further treatment); and C) the degree of deepness (elements like priorities, locks, volumes, and discriminant factors such as pay-in at the right moment, default payment, etc.).
All elements of the Dycom vector are computed by perturbation theory, so the indicator is given in the form Dycom = (x1, x2, x3, . . . , xn).
From Dycom, three more management indicators may be derived:
A) Complexity Index (lost opportunity): the loss due to the degree of dependencies among the vector of indicators, also computed by perturbation theory. It is likewise a vector, showing the loss or gain in each business and system process.
B) Complexity Disruptors (vector of causes): the causes that make dynamic complexity visible and eventually disruptive, shown as a vector (where the cause, impact and qualification appear one by one).
C) Operational Risk Index: derived directly from the above indicators.
The metrics that we will use as components to determine the indicators are expanded into a number of ratios/percentages, one for each of the service dynamic complexity metrics xn:
x1: Throughput Index (TI) = Actual Throughput / Maximum Throughput
x2: Cost Efficiency (CE) = Cost of Optimal Service Path / Cost of Actual Service Path
x3: Quality Index (QI) = Expected Quality (as planned) / Perceived Quality (response delayed for whatever reason)
x4: Service Continuity (SC), equivalent to availability and recovery of service = Operable Time / Required Operable Time
x5: Systemic Response Time Index (RTI) = Service Time (as planned) / Response Time (aggregation of service components)
x6: Operational Efficiency (OE) = (Planned Number of People / Actual Number of People) × Effectiveness of Tools (%) and Efficiency of Process (%)
x7: Loss of Service Guarantee (SE) = Current Service Index / Required Service Index (best = 1)
x8: Loss in Quality (LQ) = Perceived Quality / Best Quality (best = 1)
x9: Cache Hit Ratio
The Dynamic Complexity Efficiency gradient (Dycom) of a service is then computed from these metrics, where $c_n$ denote normalization coefficients and $x_n$ is the dynamic complexity impact on a specific indicator.
The Operational Risk Index of a service is therefore equal to

$$1 - AV + \exp(\text{Dycom})$$

where $AV$ is the normalized availability of the $x_n$ in a time window.
The role each metric plays in the formula can be differentiated by applying a weight that represents a qualitative perception or a strategic importance. The Dycom (or DCE) gradient of a service could thus include weights, each associated with one indicator to represent the criticality of one inequality with respect to the set of the other inequalities.
In an example embodiment, the above calculations may be applied as follows. First, each metric in the gradient should be 1 or less (e.g., 0.8 availability, or 0.6 quality of response, meaning good response time is delivered only 60% of the time). Because perception differs from one situation, company, or project to another, each term is then multiplied by a weighting factor. For a space mission, for example, availability is more important than quality, so it is multiplied by a factor that will greatly impact the risk and correspondingly reduce the other factors (i.e., 90% availability is worse than 30% quality). A remaining question is whether to normalize the sum of weights, which affects how elegant the formula is; in one embodiment the sum equals 1.
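The exact functional form of the Dycom gradient is left to the figures. As a minimal sketch, assuming Dycom is a normalized weighted sum of the per-metric impacts x_n with coefficients c_n (our assumption, not the stated formula), the Operational Risk Index given in the text can be computed as follows:

```python
import math

def dycom(x, c):
    """Dynamic Complexity Efficiency gradient.

    Assumed form: a weighted sum of the per-metric dynamic complexity
    impacts x_n with normalization coefficients c_n summing to 1."""
    assert abs(sum(c) - 1.0) < 1e-9, "weights normalized in this sketch"
    return sum(cn * xn for cn, xn in zip(c, x))

def operational_risk_index(x, c, availability):
    """Operational Risk Index = 1 - AV + exp(Dycom), per the text."""
    return 1.0 - availability + math.exp(dycom(x, c))

# Illustrative (assumed) metric impacts x1..x9 and equal weights.
x = [0.9, 0.8, 0.6, 0.95, 0.7, 0.85, 0.9, 0.8, 0.97]
c = [1.0 / len(x)] * len(x)
print(operational_risk_index(x, c, availability=0.98))
```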
Optimal Risk Control Theory
The starting point of risk management, in example embodiments, is the analysis following the causal deconstruction of a system:
A) Discover the environment, its dynamics, the influencers that may provoke a change, the key performance indicators, and the goals from economic, service-quality and volume points of view.
B) Collect the detailed (static) complexity elements: process flows, structures, configurations, technologies and geography. Understand the dynamic complexity: dependencies, interactions, combinatorics, and operating models (scheduling, dispatching, routing mechanisms).
C) Build the mathematical predictive dynamic complexity emulator through top-down hierarchical constructs that show the organizational, logical and physical views of the system (static complexity) and the dependencies, feedback, combinatorial and management parameter patterns (dynamic complexity).
D) Compute the mathematical model to produce the key performance indicators along the three axes: processed volume, service quality and cost. The emulator will also assess the risk by estimating the resource consumption due to dynamic complexity and the risk index associated with that estimate.
E) After a proper validation of accuracy and precision, use the emulator to test scenarios and build the knowledge base.
Such an approach may provide managers a platform to control, plan and identify problems and consequences ahead of situations; in short, it serves both goals of reducing uncertainty and of proactively estimating and fixing problems. It offers additional advantages as well.
In this example application, a hierarchic perturbation model is provided to emulate the complexity of a transactional system comprising: (1) an application server, (2) a processor, (3) a database, (4-5) data storage and (6-8) data tables. In this simple case of an IT system, the transaction hits the application server, which runs a sequence of operational activities through a processor, and then the database tries to execute the work in memory. If the data cannot be found in the database memory, a data storage component will be accessed. This represents the static view everyone knows. Observing the system over time will produce measurements that provide a fixed picture or snapshot of the system at a point in time.
Today a simple system of this nature would be managed by drawing a simplistic correlation between processor utilization and response time; a lack of processor capability would therefore show up as degradation in response time. To maintain an acceptable level of scalability, we may decide to increase the processor power. But in some cases, we will find that this action does not yield the expected improvement. While this does not seem a natural outcome, we were able to pinpoint the cause of this phenomenon by using causal deconstruction and the hierarchic use of perturbation mathematics to expose the impact of dynamic complexity. This is a simple example of a complex challenge.
In the perturbation model, we distinguish between two metrics: 1) the service time, which is the aggregated time the transactions spend alone, with no interruption or contention (no queues) and no impact of dynamic complexity, at all service stations; and 2) the response time, which includes the aforementioned service times plus any time spent resolving contentions, conflicts or delays that may occur at each service station. Service time is generally considered a constant in most existing methods, as it is practically impossible to measure service time due to system perturbations and echoes (measurement tooling, operating system, etc.). Response time degradation has traditionally been considered the indicator of risk; we will see in this example that service time can also carry a risk.
To start, the fundamental question we must ask is, “What if response time degradation is mainly caused by service time degradation, which is supposed to be constant?” A decision based on the correlation between resource utilization and response time degradation due to conflicts, contentions, and delays will not necessarily be able to deliver the right conclusion.
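To see why the utilization/response-time correlation can mislead, consider a single-queue approximation: R = S / (1 - U) for service time S and utilization U. This M/M/1-style formula is purely our illustration (the patent's perturbation model is richer), and the numbers below only loosely echo the tables that follow. If S itself degrades, R blows up even when added processor power lowers U:

```python
def response_time(service_time, utilization):
    """M/M/1-style approximation: R = S / (1 - U).

    Illustrative only -- the patent uses a hierarchic perturbation
    model, not this closed-form queueing formula."""
    assert 0 <= utilization < 1
    return service_time / (1.0 - utilization)

# Constant service time, moderate utilization: the "common wisdom" view.
print(response_time(0.25, 0.56))   # ~0.57 s

# Degraded service time (data off memory), even at lower utilization:
print(response_time(2.50, 0.22))   # ~3.2 s -- a faster CPU doesn't help
```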
The case in Table A, below, was emulated through a perturbation model populated by the static characteristics of the system and using the model libraries to compute the performance numbers. After validation against real-life measurements, using common values of workload and system parameters, the emulator was considered both representative and accurate enough to allow for reproducible scenarios.
TABLE A

Scenario 1. No incident. Data in memory is 100% with no contention.

| Processor Utilization (%) | Response Time (seconds) | Service Time (seconds) | Conflicts/Contentions (%) | Data in Storage (seconds) | Data in Memory (%) | Arrival Rate | System Delivers |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 56.35 | 0.25 | 0.25 | 0 | 9 | 100 | 3 | 2.98 |
TABLE B

Scenario 2. Five percent of the data is off memory.

| Processor Utilization (%) | Response Time (seconds) | Service Time (seconds) | Conflicts/Contentions (%) | Data in Storage (seconds) | Data in Memory (%) | Arrival Rate | System Delivers |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 64.65 | 3.10 | 2.50 | 25.60 | 72 | 95 | 3 | 2.75 |
In examining the differences between scenarios one and two, we noticed that the response time degraded by a factor of 12.4. In this case, common wisdom would suggest that the problem was caused by a lack of processor power, so a decision would be made to increase it. The outcome of a decision to increase processing power is represented in Scenario 3 below.
TABLE C

Scenario 3. Increase processing power.

| Processor Utilization (%) | Response Time (seconds) | Service Time (seconds) | Conflicts/Contentions (%) | Data in Storage (seconds) | Data in Memory (%) | Arrival Rate | System Delivers |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 21.69 | 2.80 | 2.20 | 27 | 72 | 95 | 3 | 2.764 |
Even with the increase in processing power, we saw almost no improvement. This demonstrates the hierarchic impact of dynamic complexity.
Five percent of data requests fell outside the database memory. Therefore, each such request moved to a slower service station that would eventually find the data or go further down in the supply chain, while the transaction remained in a processing state. From this analysis we found that the response time degradation was not due to a lack of resources, but to the fact that the service time was not constant; in fact, it increased to 10 times its original value.
The lessons learned from this case were that: A) The service time, which had been previously used as baseline, is not always constant. B) The relative variations in speeds among service stations can produce complexity patterns that are difficult to measure or derive by simple statistics. C) The speed and intensity of degradation could be greater than any historical data analysis, common sense, and/or popular wisdom can support. D) In these conditions, hitting a singularity point will always come as a big surprise.
So the question becomes, "Is it possible to avoid the singularity?" And, even more important, "Is it possible to learn about it before it becomes too late?" The answer in all cases is yes, and this becomes possible only through advanced mathematics. Predictability, or at least the ability to understand and derive it, therefore becomes part of the requirements in building systems, and layering should be explicitly represented in the emulation process to cover a wider range of dynamic complexity scenarios.
Using the scenarios above, we can extend our predictive analysis even further to expose the effect of lower-level dynamics by increasing the data hit in memory to 100% again and measuring the impact on our ability to do more business transactions.
TABLE D

Scenario 4. Increase the arrival rate of business transactions in Scenario 1 by five times.

| Processor Utilization (%) | Response Time (seconds) | Service Time (seconds) | Conflicts/Contentions (%) | Data in Storage (seconds) | Data in Memory (%) | Arrival Rate | System Delivers |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 273 | 0.25 | 0.25 | 0 | 43 | 100 | 15 | 14.6 |
Scenario 4 allows us to see that the scalability of the system was perfect. A fivefold increase in business transactions used five times more processor power, and the response time and service time were equal with no contentions. The response time and service time remained invariant because the Cache Hit Ratio (CHR) was 100% and there were no contentions for resources. The service time itself remained unchanged (0.25 seconds).
Then, we analyzed what would happen if we again increased the business transactions by five times, as in Scenario 4, but with the data not entirely in memory. In this case the execution of the transaction moved first to the data storage memory and then to the physical storage itself (the spinning disk drive).
TABLE E

Scenario 5. Increase the arrival rate of business transactions in Scenario 2 by five times, with five percent of the data off memory.

| Processor Utilization (%) | Response Time (seconds) | Service Time (seconds) | Conflicts/Contentions (%) | Data in Storage (seconds) | Data in Memory (%) | Arrival Rate | System Delivers |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 151 | 8.1 | 2.29 | 253 | 136 | 95 | 15 | 6.8 |
Scenario 5 was really interesting because it again defied the generally accepted wisdom. Processor utilization went down from the previous case: since a typical business transaction stayed longer in the system, the average processor utilization was lower, which allowed some small improvement in the contention-free service time. But the conflicts became very high, mainly due to a storage bottleneck formed by both direct access and the data-out-of-memory transformation. This was an interesting finding because, under these conditions, the system was only able to deliver 45% of what was requested.
To see how much of the conflicts/contentions could be attributed to a lack of processing power, we computed Scenario 6.
TABLE F

Scenario 6. Increase the processing power for the previous five scenarios.

| Processor Utilization (%) | Response Time (seconds) | Service Time (seconds) | Conflicts/Contentions (%) | Data in Storage (seconds) | Data in Memory (%) | Arrival Rate | System Delivers |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 127 | 7.7 | 2.20 | 250 | 138 | 95 | 15 | 6.9 |
Scenario 6 proved that a more powerful processor would not be able to deliver more workload (only 46% of the demand) and would show little improvement in response time (5%).
Considering our results (as summarized in Figure X), we believe there is broad impact on a number of traditional management methods, which rest on many assumptions and fail to reveal the unknowns needed to deliver robust predictions, including: A) capacity planning management, which makes assumptions about processor capacity; B) investment planning, which does not represent dynamic complexity; C) operational automation, because most alerts are built on partial knowledge; D) testing, which does not account for the dynamical relationships between system components; and E) architecture, which only partially handles dynamics.
While this invention has been particularly shown and described with references to example embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the invention encompassed by the appended claims.
Patent | Priority | Assignee | Title |
6311144, | May 13 1998 | X-ACT SCIENCE INC | Method and apparatus for designing and analyzing information systems using multi-layer mathematical models |
6560569, | May 13 1998 | X-ACT SCIENCE INC | Method and apparatus for designing and analyzing information systems using multi-layer mathematical models |
6904449, | Jan 14 2000 | Accenture Global Services Limited | System and method for an application provider framework |
6990437, | Jul 02 1999 | X-ACT SCIENCE INC | Systems and method for determining performance metrics for constructing information systems |
7031901, | May 13 1998 | X-ACT SCIENCE INC | System and method for improving predictive modeling of an information system |
7035786, | May 13 1998 | X-ACT SCIENCE INC | System and method for multi-phase system development with predictive modeling |
7389211, | May 13 1998 | UNIVERSAL RISK MANAGEMENT LLC | System and method of predictive modeling for managing decisions for business enterprises |
7783468, | Aug 29 2000 | X-ACT SCIENCE INC | Automated system and method for service and cost architecture modeling of enterprise systems |
7881920, | Aug 29 2000 | X-ACT SCIENCE INC | Systemic enterprise management method and apparatus |
20090112668 | | |
20090182593 | | |
20090254411 | | |
20100004963 | | |
20120016714 | | |
20120197686 | | |
20240169121 | | |