Self-learning disaster-avoidance and recovery

US10997015

A machine-learning mechanism of a disaster-avoidance system trains a knowledgebase to associate characteristics of a data-center component with corresponding degrees of vulnerability to failure and with remedial steps that may be undertaken to avoid failure or to reduce adverse effects of a failure. This training is performed as a function of inferences derived from historical records and from extrinsic information sources. The historical records identify past failures of similar components, component characteristics associated with past failures, and results of remedial procedures undertaken in response to past failures or to previous occurrences of the characteristics. The extrinsic sources identify the current existence of external conditions known to be associated with past failures. When a component's total degree of vulnerability exceeds a predefined threshold value, the system assembles a subset of that component's remedial steps into a remedial procedure and directs downstream modules or administrators to implement the procedure.
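The association between characteristics and degrees of vulnerability can be illustrated with a very small sketch. It is not the patent's machine-learning mechanism: it substitutes a plain frequency estimate, assuming each historical record lists the characteristics a component exhibited and whether a failure followed. The function name `learn_vulnerabilities`, the record layout, the 0 to 100 scale, and the sample characteristics are all hypothetical.

```python
from collections import defaultdict


def learn_vulnerabilities(records, scale=100):
    """Estimate a degree of vulnerability for each characteristic.

    Assumption: the degree is the share of historical records exhibiting
    the characteristic that ended in failure, scaled to 0..scale.
    """
    seen = defaultdict(int)    # records in which the characteristic appeared
    failed = defaultdict(int)  # of those, records followed by a failure
    for characteristics, did_fail in records:
        for characteristic in characteristics:
            seen[characteristic] += 1
            if did_fail:
                failed[characteristic] += 1
    return {ch: round(scale * failed[ch] / seen[ch]) for ch in seen}


# Hypothetical historical records: (characteristics exhibited, failure followed?)
history = [
    ({"high_temperature", "old_firmware"}, True),
    ({"high_temperature"}, True),
    ({"old_firmware"}, False),
    ({"high_temperature"}, False),
]

degrees = learn_vulnerabilities(history)
```

With this sample history, `high_temperature` appears in three records and precedes two failures, so it receives a higher degree than `old_firmware`, which precedes one failure in two records.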

Patent 10997015
Priority Feb 28 2019
Filed Feb 28 2019
Issued May 04 2021
Expiry Jul 19 2039 Extension 141 days
Inventors Lindsay, M…
Assg.orig Internatio…
Assg.curr KYNDRYL, I…
Entity Large
Referenced by 1
References 14
Maint.: EXPIRED<2yrs

11. A method for self-learning disaster-avoidance and recovery, the method comprising:

a disaster-avoidance system associating a first component of a data center with a first table of conditions,

where a first condition of the first table identifies a first possible characteristic, a first degree of vulnerability, and a first set of remedial steps,

where the first degree of vulnerability specifies a degree of vulnerability to failure of the first component,

where the first degree of vulnerability is incurred when the first component exhibits the first characteristic, and

where the first set of remedial steps specifies an operation intended to mitigate the first degree of vulnerability;

the system determining that a sum of all degrees of vulnerability identified by the first table exceeds a predetermined vulnerability threshold of the first component,

where exceeding the predetermined vulnerability threshold indicates that a remedial procedure must be performed to address a possible failure of the first component;

the system responding to the determining by inserting into the remedial procedure a subset of all remedial steps identified by conditions of the first table of conditions; and

the system directing downstream components to perform the remedial procedure.
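The steps of the method above can be sketched in a few lines. This is only an illustration under stated assumptions: the claim does not say how the subset of remedial steps is chosen, so the sketch sums the degrees of the conditions whose characteristics the component currently exhibits and collects the steps of those conditions without duplicates. The names `Condition`, `total_vulnerability`, and `build_remedial_procedure`, along with the sample table and threshold, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Condition:
    """One row of a component's table of conditions."""
    characteristic: str
    vulnerability: int
    remedial_steps: list = field(default_factory=list)


def total_vulnerability(table, exhibited):
    # Assumption: only conditions whose characteristic is exhibited contribute.
    return sum(c.vulnerability for c in table if c.characteristic in exhibited)


def build_remedial_procedure(table, exhibited, threshold):
    """Return an ordered list of remedial steps, or [] if under the threshold."""
    if total_vulnerability(table, exhibited) <= threshold:
        return []
    procedure = []
    for condition in table:
        if condition.characteristic in exhibited:
            for step in condition.remedial_steps:
                if step not in procedure:  # avoid repeating a shared step
                    procedure.append(step)
    return procedure


# Hypothetical table of conditions for one data-center component.
table = [
    Condition("disk_over_5_years_old", 30, ["schedule disk replacement"]),
    Condition("storm_warning_in_region", 50,
              ["replicate data offsite", "test backup generator"]),
    Condition("firmware_out_of_date", 25, ["apply firmware update"]),
]
exhibited = {"disk_over_5_years_old", "storm_warning_in_region"}

procedure = build_remedial_procedure(table, exhibited, threshold=60)
```

Here the exhibited conditions sum to 80, which exceeds the assumed threshold of 60, so the procedure contains the three steps from those two conditions. A real system would then hand `procedure` to downstream components or administrators.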

1. A disaster-avoidance system comprising a processor, a memory coupled to the processor, and a computer-readable hardware storage device coupled to the processor, the storage device containing program code configured to be run by the processor via the memory to implement a method for self-learning disaster-avoidance and recovery, the method comprising:

associating a first component of a data center with a first table of conditions,

where a first condition of the first table identifies a first possible characteristic, a first degree of vulnerability, and a first set of remedial steps,

where the first degree of vulnerability specifies a degree of vulnerability to failure of the first component,

where the first degree of vulnerability is incurred when the first component exhibits the first characteristic, and

where the first set of remedial steps specifies an operation intended to mitigate the first degree of vulnerability;

determining that a sum of all degrees of vulnerability identified by the first table exceeds a predetermined vulnerability threshold of the first component,

where exceeding the predetermined vulnerability threshold indicates that a remedial procedure must be performed to address a possible failure of the first component;

responding to the determining by inserting into the remedial procedure a subset of all remedial steps identified by conditions of the first table of conditions; and

directing downstream components to perform the remedial procedure.

17. A computer program product, comprising a first computer-readable storage medium having a computer-readable program code stored therein, the program code configured to be executed by a disaster-avoidance system comprising a processor, a memory coupled to the processor, and a second computer-readable storage medium coupled to the processor, the storage medium containing program code configured to be run by the processor via the memory to implement a method for self-learning disaster-avoidance and recovery, the method comprising:

the disaster-avoidance system associating a first component of a data center with a first table of conditions,