An ftdc interface with the host or user is provided. The interface can include a command application programming interface (API), a data storage Command-Line interface (DS CLI), or a Graphical user interface (GUI). In certain embodiments, the ftdc interface allows a host or user to customize a desired ftdc on a two-tiered system. In the first tier, a host/user selects, from a list of conditions, those conditions upon whose occurrence the controller should perform ftdc. In the second tier, a second selection is made such that, for each first tier item, the host/user selects the level of ftdc (collection and offloading of logs and/or the forcing and offloading of a statesave).

Patent: 8250402
Priority: Mar 24 2008
Filed: Mar 24 2008
Issued: Aug 21 2012
Expiry: Jun 22 2031
Extension: 1185 days
Assg.orig Entity: Large
1
12
EXPIRED
1. A computer-implementable method for providing a first time data collection (ftdc) function to a storage controller comprising:
providing an ftdc interface, the ftdc interface allowing a host to interface with the storage controller; and,
enabling customization of an ftdc operation operating on the storage controller via the ftdc interface; and wherein
the ftdc operation comprises a two tiered operation, a first tier of the two tiered operation enabling a user to identify a condition from a list of conditions where detection of the condition causes initiation of an ftdc operation and a second tier of the two tier operation enabling a user to select a level of data collection for the condition so identified.
13. A non-transitory computer-usable medium embodying computer program code, the computer program code comprising computer executable instructions configured for:
providing a first time data collection (ftdc) interface, the ftdc interface allowing a host to interface with the storage controller; and,
enabling customization of an ftdc operation operating on the storage controller via the ftdc interface; and wherein
the ftdc operation comprises a two tiered operation, a first tier of the two tiered operation enabling a user to identify a condition from a list of conditions where detection of the condition causes initiation of an ftdc operation and a second tier of the two tier operation enabling a user to select a level of data collection for the condition so identified.
7. A system comprising:
a processor;
a data bus coupled to the processor; and
a computer-usable medium embodying computer program code, the computer-usable medium being coupled to the data bus, the computer program code comprising instructions executable by the processor and configured for:
providing a first time data collection (ftdc) interface, the ftdc interface allowing a host to interface with the storage controller; and,
enabling customization of an ftdc operation operating on the storage controller via the ftdc interface; and wherein
the ftdc operation comprises a two tiered operation, a first tier of the two tiered operation enabling a user to identify a condition from a list of conditions where detection of the condition causes initiation of an ftdc operation and a second tier of the two tier operation enabling a user to select a level of data collection for the condition so identified.
2. The method of claim 1 wherein:
the condition is selected from a list of conditions comprising binary conditions, threshold conditions and conditional conditions.
3. The method of claim 1 wherein:
the level of data collection comprises at least one of a collect and offload log operation, a collect and offload save state operation and a collect and offload log and save state operation.
4. The method of claim 1 wherein:
the enabling customization of the ftdc operation enables collection of ftdc data for pseudo error conditions based upon at least one of detection of predetermined threshold values or an occurrence of predetermined conditions.
5. The method of claim 4 wherein:
the pseudo error conditions comprise at least one of a selective reset, a system reset, and a Peer-to-Peer Remote Copy suspend of paths or pairs.
6. The method of claim 1 wherein:
the ftdc interface comprises at least one of an application programming interface (API), a data storage Command-Line interface (DS CLI), and a Graphical user interface (GUI).
8. The system of claim 7 wherein:
the condition is selected from a list of conditions comprising binary conditions, threshold conditions and conditional conditions.
9. The system of claim 7 wherein:
the level of data collection comprises at least one of a collect and offload log operation, a collect and offload save state operation and a collect and offload log and save state operation.
10. The system of claim 7 wherein:
the enabling customization of the ftdc operation enables collection of ftdc data for pseudo error conditions based upon at least one of detection of predetermined threshold values or an occurrence of predetermined conditions.
11. The system of claim 10 wherein:
the pseudo error conditions comprise at least one of a selective reset, a system reset, and a Peer-to-Peer Remote Copy suspend of paths or pairs.
12. The system of claim 7 wherein:
the ftdc interface comprises at least one of an application programming interface (API), a data storage Command-Line interface (DS CLI), and a Graphical user interface (GUI).
14. The computer-usable medium of claim 13, wherein:
the condition is selected from a list of conditions comprising binary conditions, threshold conditions and conditional conditions.
15. The computer-usable medium of claim 13, wherein:
the level of data collection comprises at least one of a collect and offload log operation, a collect and offload save state operation and a collect and offload log and save state operation.
16. The computer-usable medium of claim 13, wherein:
the enabling customization of the ftdc operation enables collection of ftdc data for pseudo error conditions based upon at least one of detection of predetermined threshold values or an occurrence of predetermined conditions.
17. The computer-usable medium of claim 16, wherein:
the pseudo error conditions comprise at least one of a selective reset, a system reset, and a Peer-to-Peer Remote Copy suspend of paths or pairs.
18. The computer-usable medium of claim 16, wherein:
the ftdc interface comprises at least one of an application programming interface (API), a data storage Command-Line interface (DS CLI), and a Graphical user interface (GUI).

1. Field of the Invention

The present invention relates in general to the field of computers and similar technologies, and in particular to software utilized in this field. Still more particularly, the present invention relates to preconditioning a storage controller for automated data collection.

2. Description of the Related Art

Storage controllers, such as those available from the International Business Machines Corporation, are known. In storage controllers, it is known for an enterprise storage server to manage Input/Output (I/O) requests from networked hosts to one or more storage units, such as a direct access storage device (DASD), Redundant Array of Independent Disks (RAID Array), and Just a Bunch of Disks (JBOD). Storage controllers include host bus adapters or interfaces to communicate with hosts over a network and adapters or interfaces to communicate with the storage units.

Data integrity is an important factor in large computer data systems. Thus, backup systems have been developed and integrated into storage controllers to prevent the loss of data in the event of various types of failures. If an error occurs on a storage controller, it is desirable to provide a first time data capture or collection (FTDC) type function (also referred to as a first failure data capture (FFDC) type function) which can support analysis of the error and eventually enable obtaining a root cause and/or resolution of that error. With an FTDC type system, much of the data collection, such as error report (errpt) data, trace logs, etc., is automatically gathered but is not automatically offloaded. This automatically collected data is often collected on a round-robin basis so that new data overlays older data in the collection. This overlay function means that if these logs are not offloaded until well after an error event, the pertinent data can be lost.
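The overlay behavior described above can be illustrated with a small fixed-capacity buffer. This is only a sketch under stated assumptions: the buffer size of four entries, the event strings, and the variable names are hypothetical and do not come from any actual controller implementation.

```python
from collections import deque

# Hypothetical trace buffer; the capacity of 4 entries is an illustrative assumption.
trace_log = deque(maxlen=4)

for seq in range(1, 4):
    trace_log.append(f"event {seq}")
trace_log.append("ERROR: path failure")  # the entry needed for later analysis

# Offloading right after the error still captures the error entry.
offloaded_promptly = list(trace_log)

# If the offload is delayed, newer entries overlay the oldest ones.
for seq in range(4, 8):
    trace_log.append(f"event {seq}")
offloaded_late = list(trace_log)

print("ERROR: path failure" in offloaded_promptly)  # True
print("ERROR: path failure" in offloaded_late)      # False
```

The second check fails because four newer entries have displaced everything that was in the buffer at the time of the error, which is the loss that a prompt, automated offload is meant to avoid.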

Another important aspect of a data collection system is known as a statesave operation, in which control and data structures are accumulated and off-loaded when a statesave is triggered. Many error events can automatically trigger these statesaves (e.g., panics, data storage interrupts (DSIs)), but there are some events for which automatic triggers are not set because the condition is not necessarily considered an error. Some examples of these pseudo-error conditions include selective resets, system resets, and Peer-to-Peer Remote Copy suspends of paths or pairs. Additionally, some of these conditions are not considered errors until some form of threshold is crossed. A notable example of this type of issue is a performance problem where, as long as response time or throughput (e.g., in MB/sec) stays below or above, respectively, some defined threshold, no problem is identified. However, when the thresholds are crossed, the user/host is impacted. In all these examples, a statesave can be manually forced or forced by triggers set in host software. However, such a manual statesave operation often occurs well after the event and, as such, data critical to the analysis is no longer in the collected data.
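The performance example can be sketched as a simple check in which response time must stay below one limit and throughput must stay above another. The class name, function name, and numeric limits below are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class PerfThresholds:
    max_response_ms: float     # response time is acceptable while below this
    min_throughput_mbs: float  # throughput is acceptable while above this


def crossed_threshold(response_ms: float, throughput_mbs: float,
                      limits: PerfThresholds) -> bool:
    """Return True when either metric leaves its acceptable range."""
    return (response_ms > limits.max_response_ms
            or throughput_mbs < limits.min_throughput_mbs)


# Illustrative limits only; real values would depend on the customer environment.
limits = PerfThresholds(max_response_ms=20.0, min_throughput_mbs=100.0)

healthy = crossed_threshold(5.0, 250.0, limits)     # both metrics in range
slow = crossed_threshold(35.0, 250.0, limits)       # response time too high
starved = crossed_threshold(5.0, 40.0, limits)      # throughput too low
print(healthy, slow, starved)
```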

Accordingly, it is desirable to allow a user or host to define automated FTDC for both automatically collected data and for pseudo error type conditions based on the environments and needs of a customer.

In accordance with the present invention, an FTDC interface with the host or user is provided. The interface can include a command application programming interface (API), a data storage Command-Line Interface (DS CLI), or a Graphical User Interface (GUI). In certain embodiments, the FTDC interface allows a host or user to customize a desired FTDC on a two-tiered system. In the first tier, a host/user selects, from a list of conditions, those conditions upon whose occurrence the controller should perform FTDC. In the second tier, a second selection is made such that, for each first tier item, the host/user selects the level of FTDC (collection and offloading of logs and/or the forcing and offloading of a statesave).

More specifically, in one embodiment, the invention relates to a computer-implementable method for providing a first time data collection (FTDC) function to a storage controller. The method includes providing an FTDC interface, the FTDC interface allowing a host to interface with the storage controller; and, enabling customization of an FTDC operation operating on the storage controller via the FTDC interface.

In another embodiment, the invention relates to a system that includes a processor; a data bus coupled to the processor; and a computer-usable medium embodying computer program code, the computer-usable medium being coupled to the data bus. The computer program code includes instructions executable by the processor and configured for providing a first time data collection (FTDC) interface, the FTDC interface allowing a host to interface with the storage controller; and, enabling customization of an FTDC operation operating on the storage controller via the FTDC interface.

In another embodiment, the invention relates to a computer-usable medium embodying computer program code. The computer program code includes computer executable instructions configured for: providing a first time data collection (FTDC) interface, the FTDC interface allowing a host to interface with the storage controller; and, enabling customization of an FTDC operation operating on the storage controller via the FTDC interface.

The above, as well as additional purposes, features, and advantages of the present invention will become apparent in the following detailed written description.

The novel features believed characteristic of the invention are set forth in the appended claims. The invention itself, however, as well as a preferred mode of use, further purposes and advantages thereof, will best be understood by reference to the following detailed description of an illustrative embodiment when read in conjunction with the accompanying drawings, where:

FIG. 1 shows a block diagram of a storage system in which the present invention may be implemented.

FIG. 2 shows a flow chart of the operation of a FTDC system.

FIG. 3 shows a flow chart of the operation of a select condition operation of a FTDC system.

FIG. 4 shows a flow chart of the operation of a condition level operation of a FTDC system.

FIG. 1 shows a block diagram of a storage system 100, which can be a RAID system in which the present invention may be implemented. The system 100 includes a storage controller 110 that can be a RAID controller coupled to a host or client 120. The storage controller 110 is also coupled to a disk array 130 that includes two or more disk drives. The storage controller 110 also includes a processor 114 that processes commands received from the host 120 and executes storage drive routines. The storage controller 110 may also include a cache 116 for temporary storage of recently or often accessed data.

The host 120 includes a processor 124 that executes routines and issues read and write commands to the storage controller 110. The host 120 also includes memory 126, which can include volatile and non-volatile memory. A FTDC system 128 is stored on the memory 126 and may be executed by the processor 124. The storage controller 110 interacts with and controls a plurality of drives 130. The host 120 and the storage controller 110 each include a respective FTDC interface, 140, and 142.

Referring to FIG. 2, within the FTDC interface, when an FTDC system is accessed at step 210, two selection lists are presented (i.e., a two tier selection process is presented). More specifically, with a first list at step 220, the host/user selects conditions under which FTDC will occur on the controller at step 230. These conditions are then translated into appropriate triggers on the controller at step 232. For each condition selected from the first list, a second selection list is presented at step 240. In this list, the host/user can select from a plurality of types of FTDC information.
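The two tier selection of FIG. 2 could be modeled, as a rough sketch, by a mapping from each selected condition (first tier) to a chosen collection level (second tier), which is then validated and turned into trigger records. The condition strings, level strings, record layout, and the function name build_triggers are all hypothetical; the specification does not define a concrete data format.

```python
# Hypothetical first tier list: conditions the host/user may choose from.
AVAILABLE_CONDITIONS = [
    "selective reset",
    "system reset",
    "peer-to-peer remote copy suspend",
]

# Hypothetical second tier list: levels of data collection.
AVAILABLE_LEVELS = ["logs", "statesave", "logs and statesave"]


def build_triggers(selections: dict) -> list:
    """Validate {condition: level} selections and return trigger records."""
    triggers = []
    for condition, level in selections.items():
        if condition not in AVAILABLE_CONDITIONS:
            raise ValueError(f"unknown condition: {condition}")
        if level not in AVAILABLE_LEVELS:
            raise ValueError(f"unknown level: {level}")
        triggers.append({"condition": condition, "level": level})
    return triggers


# The user picks two conditions, then a level for each one.
triggers = build_triggers({
    "system reset": "logs and statesave",
    "selective reset": "logs",
})
print(triggers)
```

Keeping the level attached to each condition, rather than as one global setting, mirrors the per-condition second selection described for step 240.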

Next at step 250, the FTDC system determines whether there are any remaining conditions. To prevent multiple FTDC operations from occurring in the case of recursive/reoccurring errors, either internal or host/user selected limits on the number of FTDCs may be set. This limit can be based on a number of FTDC operations or a time between FTDC operations. This interface may be modified at any time by the host/user and these current settings will survive controller initial machine loads (IMLs).
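One way to picture the limits mentioned above is a small gate that refuses further FTDC operations once a count limit is reached or when too little time has passed since the last one. The text describes a limit based on a count or on a time between operations; this sketch applies both at once for brevity, and the class name FtdcLimiter and its parameters are hypothetical.

```python
class FtdcLimiter:
    """Gate FTDC operations by total count and by minimum spacing in seconds."""

    def __init__(self, max_operations: int, min_interval_s: float):
        self.max_operations = max_operations
        self.min_interval_s = min_interval_s
        self.count = 0
        self.last_time = None  # time of the last allowed operation, if any

    def allow(self, now_s: float) -> bool:
        # Refuse once the configured number of operations has been used up.
        if self.count >= self.max_operations:
            return False
        # Refuse if the previous operation was too recent.
        if self.last_time is not None and now_s - self.last_time < self.min_interval_s:
            return False
        self.count += 1
        self.last_time = now_s
        return True


# Illustrative settings: at most 2 operations, at least 60 seconds apart.
limiter = FtdcLimiter(max_operations=2, min_interval_s=60.0)
results = [limiter.allow(t) for t in (0.0, 10.0, 70.0, 200.0)]
print(results)  # [True, False, True, False]
```

The second request is refused because it arrives 10 seconds after the first, and the fourth because the count limit of two has already been reached.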

FIG. 3 shows a flow chart of the operation of a select condition operation 305 of a FTDC system. More specifically, when selecting conditions, a user may select a binary condition at step 310 (e.g., a condition which includes either a pass or fail (i.e., yes/no) condition). A user may also select a threshold condition, where the FTDC operation will occur if a certain threshold value is either exceeded or not exceeded at step 320. A user may also select a conditional condition (e.g., an if/then type of condition) at step 330.

Some of these conditions may involve setting threshold values and for these conditions, additional input will be allowed beyond the binary “yes/no.”
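The three kinds of conditions in FIG. 3 might be represented as follows. This is a minimal sketch: the class names, the state dictionary, and in particular the reading of an if/then condition as "a precondition holds and a follow-on check also holds" are assumptions made for illustration, not definitions taken from the specification.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class BinaryCondition:
    name: str

    def is_met(self, state: dict) -> bool:
        # Plain yes/no: the named event either occurred or it did not.
        return bool(state.get(self.name, False))


@dataclass
class ThresholdCondition:
    metric: str
    threshold: float               # the additional input beyond yes/no
    trigger_when_exceeded: bool = True

    def is_met(self, state: dict) -> bool:
        exceeded = state[self.metric] > self.threshold
        return exceeded if self.trigger_when_exceeded else not exceeded


@dataclass
class ConditionalCondition:
    if_part: Callable[[dict], bool]
    then_part: Callable[[dict], bool]

    def is_met(self, state: dict) -> bool:
        # Assumed reading of "if/then": check then_part only when if_part holds.
        return self.if_part(state) and self.then_part(state)


# Hypothetical controller state used only to exercise the sketch.
state = {"system_reset": True, "response_ms": 35.0, "active_paths": 0}

binary = BinaryCondition("system_reset")
threshold = ThresholdCondition("response_ms", threshold=20.0)
conditional = ConditionalCondition(
    if_part=lambda s: s["system_reset"],
    then_part=lambda s: s["active_paths"] == 0,
)
print(binary.is_met(state), threshold.is_met(state), conditional.is_met(state))
```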

FIG. 4 shows a flow chart of the operation of a condition level operation 405 of a FTDC system. More specifically, in certain embodiments, when selecting from any of a plurality of types of FTDC information, a user may select to collect and offload logs (e.g. to a product engineering (PE) package) at step 410. A user may select to collect and offload statesave only at step 420. Additionally, a user may select to collect and offload both logs and statesave at step 430.
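The three levels of FIG. 4 can be sketched as an enumeration that maps to the actions a controller would take. The enum name FtdcLevel, the function actions_for, and the action strings are hypothetical placeholders for whatever the controller actually does.

```python
from enum import Enum


class FtdcLevel(Enum):
    LOGS = "logs"                              # step 410
    STATESAVE = "statesave"                    # step 420
    LOGS_AND_STATESAVE = "logs and statesave"  # step 430


def actions_for(level: FtdcLevel) -> list:
    """Return the collection actions implied by the selected level."""
    actions = []
    if level in (FtdcLevel.LOGS, FtdcLevel.LOGS_AND_STATESAVE):
        actions.append("collect and offload logs")
    if level in (FtdcLevel.STATESAVE, FtdcLevel.LOGS_AND_STATESAVE):
        actions.append("collect and offload statesave")
    return actions


for level in FtdcLevel:
    print(level.value, "->", actions_for(level))
```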

The described system may be implemented as a method, apparatus, or article of manufacture using standard programming and/or engineering techniques to produce software, firmware, hardware, or any combination thereof. The term “article of manufacture” refers to code or logic implemented in hardware logic (e.g., magnetic storage medium such as hard disk drives, floppy disks, and tape), optical storage (e.g., CD-ROMs, optical disks, etc.), volatile and non-volatile memory devices (e.g., EEPROMs, ROMs, PROMs, RAMs, DRAMs, SRAMs, firmware, programmable logic, etc.). Code in the computer readable medium is accessed and executed by a processor. The code in which implementations are made may further be accessible through a transmission media or from a file server over a network. Those skilled in the art will recognize the article of manufacture may comprise any information-bearing medium known in the art.

While the present invention has been particularly shown and described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention. Furthermore, as used in the specification and the appended claims, the term “computer” or “system” or “computer system” or “computing device” includes any data processing system including, but not limited to, personal computers, servers, workstations, network computers, main frame computers, routers, switches, Personal Digital Assistants (PDAs), telephones, and any other system capable of processing, transmitting, receiving, capturing and/or storing data.

Saba, Raul E., Stanley, Warren K., Peterson, Beth A., Clark, Brian D., Coronado, Juan A.

Patent | Priority | Assignee | Title
9535780 | Nov 18 2013 | International Business Machines Corporation | Varying logging depth based on user defined policies

Patent | Priority | Assignee | Title
5619644 | Sep 18 1995 | International Business Machines Corporation | Software directed microcode state save for distributed storage controller
6600614 | Sep 28 2000 | Seagate Technology LLC | Critical event log for a disc drive
6754854 | Jun 04 2001 | Google Technology Holdings LLC | System and method for event monitoring and error detection
6947957 | Jun 20 2002 | Unisys Corporation | Proactive clustered database management
7080287 | Jul 11 2002 | LinkedIn Corporation | First failure data capture
7111206 | Sep 19 2001 | Juniper Networks, Inc | Diagnosis of network fault conditions
20040024726
20040025077
20050081156
20050240826
20060195731
20080270776
Executed on | Assignor | Assignee | Conveyance | Frame/Reel/Doc
Feb 26 2008 | STANLEY, WARREN K | International Business Machines Corporation | ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS) | 0206890560
Feb 28 2008 | CLARK, BRIAN D | International Business Machines Corporation | ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS) | 0206890560
Feb 28 2008 | PETERSON, BETH A | International Business Machines Corporation | ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS) | 0206890560
Feb 29 2008 | SABA, RAUL E | International Business Machines Corporation | ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS) | 0206890560
Mar 03 2008 | CORONADO, JUAN A | International Business Machines Corporation | ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS) | 0206890560
Mar 24 2008 | | International Business Machines Corporation | (assignment on the face of the patent) |
Date | Maintenance Fee Events
Feb 10 2016 | M1551: Payment of Maintenance Fee, 4th Year, Large Entity.
Apr 13 2020 | REM: Maintenance Fee Reminder Mailed.
Sep 28 2020 | EXP: Patent Expired for Failure to Pay Maintenance Fees.


Date | Maintenance Schedule
Aug 21 2015 | 4 years fee payment window open
Feb 21 2016 | 6 months grace period start (w surcharge)
Aug 21 2016 | patent expiry (for year 4)
Aug 21 2018 | 2 years to revive unintentionally abandoned end. (for year 4)
Aug 21 2019 | 8 years fee payment window open
Feb 21 2020 | 6 months grace period start (w surcharge)
Aug 21 2020 | patent expiry (for year 8)
Aug 21 2022 | 2 years to revive unintentionally abandoned end. (for year 8)
Aug 21 2023 | 12 years fee payment window open
Feb 21 2024 | 6 months grace period start (w surcharge)
Aug 21 2024 | patent expiry (for year 12)
Aug 21 2026 | 2 years to revive unintentionally abandoned end. (for year 12)