There is provided a system for dynamically resynchronizing, in the event of a system failure, a storage system made up of a plurality of nodes, each of which has mirrored logical volumes respectively divided into a plurality of mirrored logical data partitions. Each of these nodes has the means for accessing a common physical data volume, e.g. a disk drive, in which data in corresponding logical volumes at respective nodes is represented by data stored in common in the physical volume. System recovery at the plurality of nodes after a failure at one of the nodes is carried out by commencing the sequential resynchronization of a logical data volume at a nonfailure node to thereby sequentially resynchronize the partitions of the physical data volume representative of the logical data volume, and indicating as resynchronized those portions of the logical data volume at the failure node represented by said resynchronized partitions of the physical data volume.
1. In a data processor controlled storage system for storing data in a physical volume having a plurality of physical partitions, means for accessing said physical data volume through a plurality of nodes, each node having at least one logical data volume comprising mirrored data corresponding to a logical data volume comprising mirrored data at each of the other nodes, and wherein data in corresponding logical volumes is represented by data stored in common in said physical volume, and means for resynchronizing the logical data volumes of each of said nodes in the event of a storage system failure at one of said nodes comprising:
means for commencing the sequential resynchronization of a logical data volume at a nonfailure node to thereby sequentially resynchronize the partitions of said physical data volume representative of said logical data volume, and means for indicating as resynchronized those portions of the logical data volume at said failure node represented by said resynchronized partitions of said physical data volume.
2. The data processor storage system of
3. The data processor storage system of
4. The data processor storage system of
5. The data processor storage system of
means responsive to said failure for setting resynchronization indicators for the logical volume at the failure node and for the corresponding logical volumes at the nonfailure nodes, and means for removing said resynchronization indicator from the partitions of the logical data volume at said failure node represented by said resynchronized partitions of said physical data volume.
6. In a data processor controlled storage system for storing data in a physical volume having a plurality of physical partitions, and means for accessing said physical data volume through a plurality of nodes, each node having at least one logical data volume comprising mirrored data corresponding to a logical data volume comprising mirrored data at each of the other nodes, and wherein data in corresponding logical volumes is represented by data stored in common in said physical volume, a method of resynchronizing the logical data volumes of each of said nodes in the event of a storage system failure at one of said nodes comprising:
commencing the sequential resynchronization of a logical data volume at a nonfailure node to thereby sequentially resynchronize the partitions of said physical data volume representative of said logical data volume, and indicating as resynchronized those portions of the logical data volume at said failure node represented by said resynchronized partitions of said physical data volume.
7. The resynchronization method of
8. The resynchronization method of
9. The resynchronization method of
10. The resynchronization method of
setting resynchronization indicators for the logical volume at the failure node and for the corresponding logical volumes at the nonfailure nodes responsive to said failure, and removing said resynchronization indicator from the partitions of the logical data volume at said failure node represented by said resynchronized partitions of said physical data volume.
11. In a data processor controlled storage system for storing data in a physical volume having a plurality of physical partitions, and means for accessing said physical data volume through a plurality of nodes, each node having at least one logical data volume comprising mirrored data corresponding to a logical data volume comprising mirrored data at each of the other nodes, and wherein data in corresponding logical volumes is represented by data stored in common in said physical volume, a computer readable medium including a computer program having program code thereon for resynchronizing the logical data volumes of each of said nodes in the event of a storage system failure at one of said nodes comprising:
means for commencing the sequential resynchronization of a logical data volume at a nonfailure node to thereby sequentially resynchronize the partitions of said physical data volume representative of said logical data volume, and means for indicating as resynchronized those portions of the logical data volume at said failure node represented by said resynchronized partitions of said physical data volume.
12. The computer readable medium of
13. The computer readable medium of
14. The computer readable medium of
15. The computer readable medium of
means responsive to said failure for setting resynchronization indicators for the logical volume at the failure node and for the corresponding logical volumes at the nonfailure nodes, and means for removing said resynchronization indicator from the partitions of the logical data volume at said failure node represented by said resynchronized partitions of said physical data volume.
The following patent application, having the same inventors and the same assignee as the present invention and filed concurrently herewith, covers subject matter related to the subject matter of the present invention: "DATA PROCESSOR STORAGE SYSTEMS WITH DYNAMIC RESYNCHRONIZATION OF MIRRORED LOGICAL DATA VOLUMES SUBSEQUENT TO A STORAGE SYSTEM FAILURE", Ser. No. 09/325,405.
1. Technical Field
The present invention is directed to methods and programs for computer storage systems conventionally implemented in disk drive storage and, more particularly, to stored data recovery by resynchronization of stored mirrored logical data volumes after failures in storage systems where the physical volume (PV) is accessed or used by multi-initiators, i.e. a plurality of independently operated data processors.
2. Background of Related Art
In the current data processing environment, there has been a dramatic increase in the availability and capacity of computer storage systems, such as hard disk drives and optical drives. Storage systems associated with workstations may now have capacities of hundreds of gigabytes. However, because of these increased capacities, problems have arisen in storage system recovery after a system failure or like problem. This is particularly the case in storage systems which use mirrored stored logical data volumes. Mirroring is the practice whereby the operating system maintains a plurality of copies of data (usually duplicate or triplicate copies) in order to make data recovery easier in the event of a system failure or like problem. However, all mirrored storage systems require a system resynchronization after a failure, which brings all noncurrent PV partitions used in the mirroring back into agreement with the logical volume partitions of the logical volume group they represent.
By way of background, most AIX™ and UNIX™ based operating systems use some form of stored data mirroring. A basic storage system may be considered to be a hierarchy managed by a logical volume manager and made up of logical volume groups, which are, in turn, made up of a plurality of logical volumes which are physically represented by PVs on the actual disk or hard drive. Each PV is divided into physical partitions (PPs), which are equal-size segments on a disk, i.e. the actual units of space allocation. Data on logical volumes appears contiguous to the user, but can be noncontiguous on the PV. This allows file systems and other logical volumes to be resized and relocated, span multiple PVs and have their contents replicated for greater flexibility and availability in the storage of data. In mirrored systems, a logical volume is divided into a plurality of mirrored logical data partitions, i.e. each logical volume has two or three redundant partitions therein. Such logical volumes and PVs are generally described in the text, AIX 6000 System Guide, Frank Cervone, McGraw-Hill, New York, 1996, pp. 53-56.
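To make this hierarchy concrete, here is a minimal sketch in Python (the class and field names are our own illustration, not actual AIX LVM structures) of a mirrored logical volume whose logical partitions map to noncontiguous physical partitions:

```python
# Illustrative model of the LVM hierarchy described above; names are
# hypothetical, not actual AIX LVM data structures.
from dataclasses import dataclass, field

@dataclass
class PhysicalVolume:
    name: str
    num_partitions: int                        # equal-size PPs, the units of allocation
    data: dict = field(default_factory=dict)   # pp_index -> contents

@dataclass
class LogicalPartition:
    # Mirrored: one logical partition is backed by two or three PPs,
    # which may sit anywhere (noncontiguously) on the PV.
    mirrors: list                              # list of pp_index values
    stale: set = field(default_factory=set)    # pp_index values marked stale

@dataclass
class LogicalVolume:
    name: str
    partitions: list                           # list of LogicalPartition

# A logical volume whose partitions appear contiguous to the user but
# are scattered across the physical volume, each mirrored twice:
pv = PhysicalVolume("hdisk0", num_partitions=128)
lv = LogicalVolume("lv00", [
    LogicalPartition(mirrors=[7, 63]),
    LogicalPartition(mirrors=[12, 90]),
    LogicalPartition(mirrors=[3, 44]),
])
```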
In any event, when mirrored logical volumes (LVs) are first brought on-line or initiated, they must be synchronized. In mirrored LVs, each mirrored partition can be in one of two states: stale or available (unstale). Data may be read from any unstale mirrored partition. On the other hand, in writing, the data must be written to all available (unstale) mirrored partitions before returning. Only partitions that are marked as unstale will be read from and written to. In synchronization or resynchronization, a command such as the AIX "syncvg" command is run, which copies information from an unstale mirror partition to the stale mirror partition and changes the partition designation from stale to unstale.
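Building on the toy structures in the previous sketch, the following illustrates these rules under our own assumptions (it is not the AIX implementation): reads may come from any unstale mirror, writes must reach every unstale mirror before returning, and a syncvg-style pass copies a good mirror over each stale one and clears its stale mark:

```python
def read_lp(pv, lp):
    # Read from any unstale (available) mirror.
    for pp in lp.mirrors:
        if pp not in lp.stale:
            return pv.data.get(pp)
    raise IOError("no unstale mirror available")

def write_lp(pv, lp, payload):
    # Write to all unstale mirrors before returning; stale mirrors are skipped.
    targets = [pp for pp in lp.mirrors if pp not in lp.stale]
    if not targets:
        raise IOError("no unstale mirror available")
    for pp in targets:
        pv.data[pp] = payload

def resync_lp(pv, lp):
    # syncvg-style: copy a good (unstale) mirror over each stale one,
    # then change its designation from stale to unstale.
    good = read_lp(pv, lp)
    for pp in list(lp.stale):
        pv.data[pp] = good
        lp.stale.discard(pp)
```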
In systems with mirrored partitions, after a system failure, e.g. a hangup or crash, the LVs must be resynchronized. In current practice, this resynchronization must take place before the storage system may be accessed again; otherwise, the user may get inconsistent data. This is likely to result from "writes" in flight, i.e. data in the process of being written into specific partitions in LVs at the time of the crash, which may not be completed and which may cause mirrored partitions to have different data. Reference is made to section 6.2.7 on pp. 163-164 of the above Cervone text. Such resynchronization is usually done sequentially, LV by LV and partition by partition. Because of the increased size of current storage systems and the large groups of logical data volumes which may be involved in a resynchronization after a storage system failure, users may be subject to undesirable delays while waiting for the completion of resynchronization in order to access data from storage systems using mirrored volumes.
The above cross-referenced patent application, "DATA PROCESSOR STORAGE SYSTEMS WITH DYNAMIC RESYNCHRONIZATION OF MIRRORED LOGICAL DATA VOLUMES SUBSEQUENT TO A STORAGE SYSTEM FAILURE", which is hereby incorporated by reference, offers a solution to this problem. It provides a system for dynamically resynchronizing in the event of a storage system failure. Immediately after the correction of the problem causing the failure, the resynchronization of the plurality of LVs is commenced, but without waiting for the resynchronization to be completed, data is accessed from a data partition in a portion of one of said LVs. Then, there are means for determining whether the portion of the LV containing the accessed partition has already been resynchronized prior to access, together with means responsive to these determining means for replacing data in the other mirrored partitions corresponding to the accessed data with the accessed data in said accessed partition in the event that the LV has not been resynchronized.
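As a rough illustration of that scheme (our own sketch, reusing the toy structures above; the resynced_upto progress marker is a hypothetical stand-in for whatever bookkeeping the referenced application actually uses), an access that arrives ahead of the sequential pass makes its own partition consistent on the spot:

```python
def access_during_resync(pv, lv, idx, resynced_upto):
    # Read partition idx while the background sequential resync is running.
    lp = lv.partitions[idx]
    data = read_lp(pv, lp)
    if idx >= resynced_upto:
        # Not yet reached by the sequential pass: replace the data in the
        # other mirrored partitions with the accessed data, so this
        # partition is consistent without waiting for the full resync.
        for pp in lp.mirrors:
            pv.data[pp] = data
            lp.stale.discard(pp)
    return data
```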
While this approach is very effective where the physical storage system, i.e. the physical data volume is accessed by only a single data processor, additional problems arise when the PV is accessed by multi-initiators, i.e. more than one independent data processor. Since the partitions in the PVs are shared by logical volumes on different initiators through their respective nodes, the resynchronizing effects of the LVs at these different nodes must be considered during the resynchronization.
The present invention covers accessing a physical data volume through a plurality of independent data processors at a plurality of nodes. Each node has at least one logical data volume comprising mirrored data corresponding to a logical data volume comprising mirrored data at each of the other nodes. Also, data in such corresponding LVs is represented by data stored in common partitions in said PV. When a storage failure occurs at any of the nodes, there is resynchronization of the logical data volumes of each of the nodes comprising commencing the sequential resynchronization of a logical data volume at a nonfailure node to thereby sequentially resynchronize the partitions of the physical data volume representative of said logical data volume, and indicating as resynchronized those portions of the logical data volume at said failure node represented by the resynchronized partitions of said physical data volume.
Usually, each of the logical data volumes comprises a plurality of partitions of mirrored data respectively represented by said physical data volume partitions. Logical data volumes of said nodes may be open or closed, and only open logical data volumes are resynchronized. In the effective operation of the system, the commencing of the sequential resynchronization of a logical data volume at the failure node is subsequent to the commencing of the sequential resynchronization of the logical data volume at the nonfailure node. Best results are achieved with means responsive to said failure for setting resynchronization indicators for the LV at the failure node and for the corresponding logical volumes at the nonfailure nodes, in combination with means for removing said resynchronization indicator from the partitions of the logical data volume at said failure node represented by said resynchronized partitions of said physical data volume.
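The following sketch (again our own illustration, reusing resync_lp from the earlier sketch; Node and needs_resync are hypothetical names) shows the shape of this scheme: a failure sets resynchronization indicators at every node, a nonfailure node sequentially resynchronizes the shared physical partitions, and the failure node's matching indicators are removed without copying the data a second time:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    lv: LogicalVolume                  # this node's view; the PPs are shared
    needs_resync: set = field(default_factory=set)

def on_failure(nodes):
    # Responsive to the failure: set resynchronization indicators for the
    # logical volume at the failure node and at the nonfailure nodes alike.
    for node in nodes:
        for lp in node.lv.partitions:
            node.needs_resync.add(tuple(lp.mirrors))

def resync_from_nonfailure_node(pv, nodes, worker):
    # Sequentially resynchronize the worker (nonfailure) node's LV, partition
    # by partition.  Because corresponding LVs at all nodes are backed by the
    # same PPs, each PP brought current here is current for the failure node
    # as well, so its indicator is simply removed rather than copying again.
    for lp in worker.lv.partitions:
        resync_lp(pv, lp)                      # reuses resync_lp from above
        key = tuple(lp.mirrors)
        for node in nodes:
            node.needs_resync.discard(key)     # indicated as resynchronized
```

The saving in this arrangement is that the failure node never re-reads or re-writes partitions the nonfailure node has already brought current; it merely clears its indicators for them.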
The present invention will be better understood and its numerous objects and advantages will become more apparent to those skilled in the art by reference to the following drawings, in conjunction with the accompanying specification, in which:
Referring to
Application programs 40 and their calls, as controlled by the operating system, are moved into and out of the main random access memory (RAM) 14 and consequently into and out of secondary storage, disk drive 20. As will be subsequently described, the PVs of data dealt with in the present invention are stored within disk drive 20. A read only memory (ROM) 16 is connected to CPU 10 via bus 12 and includes the basic input/output system (BIOS) that controls the basic computer functions. RAM 14, I/O adapter 18 and communications adapter 34 are also interconnected to system bus 12. I/O adapter 18 may be a small computer system interface (SCSI) adapter that communicates with the disk storage device 20. Communications adapter 34 interconnects bus 12 with an outside network, enabling the data processing system to communicate with other such systems over a local area network (LAN) or wide area network (WAN), which of course includes the Internet. I/O devices are also connected to system bus 12 via user interface adapter 22 and display adapter 36. Keyboard 24 and mouse 26 are interconnected to bus 12 through user interface adapter 22. It is through such input devices that the user may interactively make calls to application programs. Display adapter 36 includes a frame buffer 39, which is a storage device that holds a representation of each pixel on the display screen 38. Images may be stored in frame buffer 39 for display on monitor 38 through various components, such as a digital to analog converter (not shown) and the like. By using the aforementioned I/O devices, a user is capable of inputting information to the system through the keyboard 24 or mouse 26 and receiving output information from the system via display 38.
Now, with respect to
The present system shown is a multi-initiator system in which concurrent files are represented, such as the IBM parallel file system. The file systems shown in
The LV data is mirrored data. The mirrored copies of corresponding LV data need not be stored in contiguous or even corresponding positions on the PVs; they may be stored at randomly assigned positions on the disk drives which make up these PVs.
Now, with respect to the flowcharts of
Also, in the following descriptions, we will note that LVs or partitions in such volumes are in need of recovery. This merely indicates that they must be resynchronized.
Now, with reference to
Now, with respect to
Now, with respect to
Now, with respect to
One of the preferred implementations of the present invention is as a routine in an operating system made up of programming steps or instructions resident in RAM 14,
Although certain preferred embodiments have been shown and described, it will be understood that many changes and modifications may be made therein without departing from the scope and intent of the appended claims.
Inventors: McBrearty, Gerald Francis; Shieh, Johnny Meng-Han; Maddalozzo, John, Jr.