In a storage system, snapshots of data are created according to a snapshot algorithm in response to writes. The snapshot algorithm is selected from among a plurality of different snapshot algorithms according to one or more criteria.
1. A method for use with a storage system, comprising:
in response to writes, creating snapshots of data using a snapshot algorithm; and
selecting the snapshot algorithm from among a plurality of different snapshot algorithms according to one or more criteria, wherein selecting the snapshot algorithm from among the plurality of snapshot algorithms according to the one or more criteria comprises selecting the snapshot algorithm from among the plurality of snapshot algorithms according to input variables including at least a first parameter indicating a fraction of a source volume that has been snapped and a second parameter indicating a maximum acceptable response time of the storage system.
2. The method of
3. The method of
4. The method of
monitoring a response time of the storage system; and
comparing the monitored response time to the maximum acceptable response time to compute an error, wherein the error is used to select from among the plurality of snapshot algorithms.
5. The method of
computing a probability of a snap based on the fraction, wherein the probability is used to select from among the plurality of snapshot algorithms.
6. The method of
fuzzifying the probability of the snap by mapping different values of the probability to different first fuzzy values;
fuzzifying the error by mapping different values of the error to different second fuzzy values,
wherein the first and second fuzzy values are part of a rule base used to select from among the plurality of snapshot algorithms.
7. The method of
computing a change in error based on values of the error computed at different times; and
fuzzifying the change in error by mapping different values of the change in error to different third fuzzy values,
wherein the third fuzzy values are also part of the rule base.
8. A storage system comprising:
a processor to:
receive a write request to modify data;
select, based on a rule base, which of plural snapshot algorithms to use for creating snapshots;
apply the selected snapshot algorithm in processing the write request; and
reconcile data blocks modified using different ones of the plural snapshot algorithms.
9. The storage system of
10. The storage system of
11. An article comprising at least one computer-readable storage medium containing instructions that when executed cause a storage system to:
receive a write request to modify data;
select, based on a rule base, which of plural snapshot algorithms to use for creating snapshots; and
apply the selected snapshot algorithm in processing the write request.
12. The article of
13. The article of
This application is a national stage application under 35 U.S.C. §371 of PCT/US2008/079039, filed Oct. 7, 2008.
A storage system, such as a storage array system or a storage area network of storage devices, can be used to store a relatively large amount of data on behalf of an enterprise (e.g., company, business, government agency, etc.). Data and software associated with users and applications can be stored in such a storage system, such that reduced local storage resources can be provided at user terminals.
Protection of data stored in a storage system is desirable. One aspect of data protection is to enable recovery of data in case of faults or corruption due to hardware and/or software failures, or data corruption or loss caused by malware attacks (e.g., attacks caused by a virus or other malicious code designed to cause damage to stored data).
One type of technique that has been used to protect data stored in a storage system is to create point-in-time copies of data as such data is modified by write operations. Point-in-time copies of data are also referred to as “snapshots.” A snapshot can be created when a write occurs. In a snapshot-based storage system, original data can be kept in a source volume of data. Prior to modification of data in the source volume, a snapshot of the data to be modified can be taken. Many snapshots can be taken over time as writes are received at the storage system. If recovery of data is desired for any reason, one or more of the snapshots can be used to recover data back to a prior state, such as before a point in time when corruption or data loss occurred.
There are different algorithms for performing snapshots of data. A first type of snapshot algorithm is referred to as a “copy-on-write” (CoW) snapshot algorithm, in which a write of data causes the storage system to copy the original data from the source volume to a snapshot volume before proceeding with the write. With the copy-on-write snapshot algorithm, the original version of the data is kept in the snapshot volume, whereas the modified version of the data is kept in the source volume.
A second type of snapshot algorithm is a “redirect-on-write” (RoW) snapshot algorithm, in which the write data is redirected to another location (“redirect-on-write location”) that is set aside for a snapshot, while the source volume maintains an original version of the data. The redirect-on-write snapshot algorithm effectively defers the taking of a snapshot until a later point in time. At that later time, snapshots of the original versions of data present in the source volume are taken, and the modified versions of the data are moved from the redirect-on-write location to the source volume.
Typically, a storage system uses just one type of snapshot algorithm (e.g., the copy-on-write snapshot algorithm or the redirect-on-write snapshot algorithm) to create snapshots in response to writes to data in the storage system. Under certain conditions, relying on a single snapshot algorithm can reduce the performance of the storage system. For example, with the copy-on-write snapshot algorithm, a copy penalty is associated with each data write, since the original version of the data has to be copied to the snapshot volume before the data in the source volume is modified. On the other hand, although the redirect-on-write snapshot algorithm avoids the copy penalty at the time of the write, tracking of data and data reconciliation can be more complex.
Some embodiments of the invention are described, by way of example, with respect to the following figures:
In general, according to some embodiments, a flexible snapshot mechanism is provided for a storage system that stores data on one or more storage devices. The storage system can be a storage array of storage devices. Alternatively, the storage system can include a storage area network of storage devices. The flexible snapshot mechanism enables different types of snapshot algorithms to be used at different times, depending upon one or more criteria.
In the ensuing discussion, reference is made to data stored in “storage volumes,” which refer to logical partitions of data contained in the storage system. A “volume” can refer to any collection of data in the storage system. A “source volume” is a volume that contains a version of data prior to modification of the data. A “snapshot volume” refers to a volume that contains point-in-time copies of data, which correspond to versions of data prior to modification of the data. Multiple snapshot volumes can be maintained in the storage system, where the multiple snapshot volumes are taken at different points in time.
Snapshot volumes are typically not accessed during normal operation of the storage system; however, in case of data failure or corruption, the snapshot volumes are accessed to recover data to a prior version.
As noted above, in accordance with some embodiments, different snapshot algorithms can be selectively used in the storage system at different times, depending upon one or more criteria. The multiple different snapshot algorithms can include a copy-on-write snapshot algorithm and a redirect-on-write snapshot algorithm, for example. It is noted that the storage system may also use other snapshot algorithms in other embodiments. The ability to select different snapshot algorithms to use according to one or more criteria can result in improved performance of the storage system, since it may be possible to improve response time by switching to a different snapshot algorithm depending upon the condition of the storage system.
As depicted in
The CPU(s) 104 is (are) connected to a memory 110 in the storage controller 106. The memory 110 can be used to store one or more parameters 112 that define the one or more criteria that the fuzzy control logic 102 uses for selecting between different snapshot algorithms.
The storage controllers 106 are connected over a network 111 to storage subsystems 114. In one example, each storage subsystem 114 can be implemented with an array of storage devices. As depicted in
Although just one source volume 116 is depicted in the storage subsystem 114 of
The combination of the storage controllers 106 and storage subsystems 114 is referred to as a “storage system.” The storage system is accessible by one or more client devices 122 that are connected over a network 124. The client devices 122 are able to submit read and write requests to the storage controllers 106. A read causes a storage controller 106 to retrieve the requested data from a storage subsystem 114 (or plural storage subsystems 114), while a write causes data to be written to the storage subsystem(s) 114 to modify data (or to add data). More generally, a write is considered to “update” data in the storage subsystem, where updating includes modifying existing data or adding new data.
With a copy-on-write snapshot algorithm, every write to the source volume causes the storage system to copy the original data from the source volume before proceeding with the write. Assume there is a source volume 116 including Di data blocks, as depicted in
If a read request accesses a block of data that has not been written to since the creation of the snapshot volume 200, the data will be read from the source volume 116. If a write request occurs (at 210 in
After the copy-on-write is performed, the pointers to the respective data blocks are updated (as shown in
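The copy-on-write behavior described above can be illustrated with a minimal in-memory sketch. It assumes a single snapshot volume, models data blocks as list entries, and represents the snapshot volume's data block pointers as a dictionary that only holds entries for blocks that have already been copied; the class `CowVolume` and its methods are hypothetical and not part of the described system.

```python
from dataclasses import dataclass, field


@dataclass
class CowVolume:
    """Hypothetical model: one source volume plus one copy-on-write snapshot."""

    source: list  # block index -> current (possibly modified) data
    # Snapshot pointers that no longer refer to the source: block -> copied original.
    snapshot: dict = field(default_factory=dict)

    def read_source(self, block: int):
        # Reads of the live volume always go to the source volume.
        return self.source[block]

    def read_snapshot(self, block: int):
        # Unmodified blocks are still shared with the source volume.
        return self.snapshot.get(block, self.source[block])

    def write(self, block: int, data) -> bool:
        """Write data; return True if this write paid the copy penalty."""
        copied = block not in self.snapshot
        if copied:
            # Copy the original to the snapshot volume before modifying the source.
            self.snapshot[block] = self.source[block]
        self.source[block] = data
        return copied


vol = CowVolume(source=["a", "b", "c"])
first = vol.write(1, "B")    # copies "b" into the snapshot
second = vol.write(1, "B2")  # original already preserved, no copy
```

Only the first write to block 1 pays the copy penalty; the second write overwrites the source in place because the original version is already held by the snapshot.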
With the redirect-on-write snapshot algorithm, new writes to the source volume are redirected to another location (“redirect-on-write location”) set aside for the snapshot volume. This avoids the copy-on-write penalty since the write proceeds without first copying the original data to the snapshot volume. But in this case, the original source volume would still contain the original (unmodified) data, and the snapshot volume has the updated block, which is the reverse of the copy-on-write scenario. If a snapshot volume is deleted, the data from the snapshot volume is reconciled back into the source volume.
Basically, with the redirect-on-write snapshot algorithm, the creation of a snapshot volume containing a prior version of modified data is deferred until a later point in time, such as when reconciliation has to be performed. Note also that, with the redirect-on-write snapshot algorithm, when the same data block is modified multiple times, each version of the modified data block that is to be provided in additional snapshot volumes has to be tracked, which can be more complicated than in the copy-on-write scenario.
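The redirect-on-write path and its deferred reconciliation can be sketched in the same way. This is a simplified model under stated assumptions: one snapshot, a dictionary standing in for the redirect-on-write location, and a hypothetical `reconcile()` method that moves modified blocks back into the source volume and returns the original versions that a snapshot would have to retain. It deliberately does not model the multi-version tracking that makes real redirect-on-write bookkeeping harder.

```python
from dataclasses import dataclass, field


@dataclass
class RowVolume:
    """Hypothetical model: one source volume plus a redirect-on-write location."""

    source: list  # still holds the original (snapshot) data
    redirect: dict = field(default_factory=dict)  # block -> redirected new data

    def read_current(self, block: int):
        # The live view prefers redirected data over the untouched source block.
        return self.redirect.get(block, self.source[block])

    def read_snapshot(self, block: int):
        # Until reconciliation, the point-in-time view is just the source volume.
        return self.source[block]

    def write(self, block: int, data) -> None:
        # No copy penalty: new data goes straight to the redirect-on-write location.
        self.redirect[block] = data

    def reconcile(self) -> dict:
        """Deferred work: capture originals, then move modified blocks into the source."""
        originals = {}
        for block, data in self.redirect.items():
            originals[block] = self.source[block]  # what a snapshot must retain
            self.source[block] = data
        self.redirect.clear()
        return originals


vol = RowVolume(source=["a", "b", "c"])
vol.write(2, "C")
vol.write(2, "C2")
before = (vol.read_current(2), vol.read_snapshot(2))
kept = vol.reconcile()
```

The writes themselves are cheap, but the copy work reappears inside `reconcile()`, which is the deferral the text describes.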
Thus, it is apparent from the foregoing that each of the copy-on-write and redirect-on-write algorithms has its benefits and downsides. The ability to selectively and dynamically switch between the different types of algorithms according to some embodiments allows the storage system to take advantage of the different benefits offered by the different types of snapshot algorithms to adapt to changing conditions of the storage system, such that overall performance of the storage system can be improved.
The snap throttle factor, uth(t), is an input to the snapshot control logic 108 of
The term “snap” referred to above is used synonymously with copy-on-write. Thus, the “snap” throttle factor means the factor indicating the percentage of time that copy-on-write is to be used.
In some embodiments, two input parameters (criteria) that are provided to the fuzzy control logic 102 are: (1) the fraction of snapped blocks in the source volume, fsnap (which refers to the fraction of blocks in the source volume that have been copied to a snapshot volume as a result of a write to those blocks), and (2) the reference response time, wrt, which represents the maximum acceptable response time during a snapshot process. As depicted in
According to the input parameters, the fuzzy control logic 102 regulates or “throttles” the rate of copy-on-writes in a dynamic and intelligent manner to reduce system response time. In some embodiments, the fuzzy control logic 102 attempts to minimize the system response time when regulating or throttling the rate of copy-on-writes, based on the snap throttle factor, uth(t).
Because different snapshot algorithms, such as the copy-on-write snapshot algorithm and the redirect-on-write snapshot algorithm, are used at different times, the state of data written under redirect-on-write has to be reconciled with the state of data written under copy-on-write. To provide such reconciliation, a tracking mechanism is provided to track data blocks for which redirect-on-write has been performed. The tracking mechanism includes a redirect-on-write pointer that points to the data block associated with a redirect-on-write. Note that redirect-on-write causes a write to occur to the snapshot volume rather than the source volume; as a result, the redirect-on-write pointer will point to modified data in the snapshot volume. An example is depicted in
In the example of
An example of the above rule is discussed in the context of
Thus, according to the RoW-CoW rule, RoW pointers are used only when a redirect-on-write occurs for first writes to respective data blocks. Each subsequent write to such data blocks will cause the RoW-CoW rule to apply, which will result in a copy-on-write being performed, and the RoW pointer being deleted since the snapshot volume data block pointers 204 will point to the snapped data block as is done for copy-on-writes.
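Under one possible reading of the RoW-CoW rule, the write path can be sketched as follows. The class `MixedVolume`, the `use_row` flag (standing in for the decision made by the fuzzy control logic 102), and the choice to treat the redirected first-write data as superseded by the later write are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MixedVolume:
    """Hypothetical model mixing CoW and RoW writes under the RoW-CoW rule."""

    source: list
    snapped: dict = field(default_factory=dict)  # block -> original kept by CoW
    row_ptr: dict = field(default_factory=dict)  # block -> data redirected by RoW

    def read_current(self, block: int):
        return self.row_ptr.get(block, self.source[block])

    def read_snapshot(self, block: int):
        # A RoW block has no CoW copy yet; its original is still in the source.
        return self.snapped.get(block, self.source[block])

    def write(self, block: int, data, use_row: bool) -> str:
        if block in self.row_ptr:
            # RoW-CoW rule: a later write to a redirected block is done as CoW.
            self.snapped[block] = self.source[block]  # source still holds the original
            self.source[block] = data
            del self.row_ptr[block]  # the RoW pointer is no longer needed
            return "cow"
        if block in self.snapped:
            # Original already preserved: overwrite the source in place.
            self.source[block] = data
            return "overwrite"
        if use_row:
            # First write, RoW chosen: record a RoW pointer to the new data.
            self.row_ptr[block] = data
            return "row"
        # First write, CoW chosen.
        self.snapped[block] = self.source[block]
        self.source[block] = data
        return "cow"


vol = MixedVolume(source=["a", "b"])
steps = [
    vol.write(0, "A", use_row=True),    # first write redirected
    vol.write(0, "A2", use_row=True),   # RoW-CoW rule forces CoW
    vol.write(1, "B", use_row=False),   # plain CoW
    vol.write(1, "B2", use_row=True),   # already snapped
]
```

After the second write to block 0 the RoW pointer is gone and the block looks exactly like one that was handled by copy-on-write from the start, which is what keeps the RoW pointers short-lived.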
Returning again to
To control the response time y(t) output by the storage system, the output y(t) is monitored periodically, once every time interval Tm. How often to monitor is decided based on the maximum acceptable response time wrt (e.g., Tm can be set equal to wrt). However, a shorter monitoring interval (Tm<wrt) can also provide a fast control response from the fuzzy control logic 102 without interrupting the storage controller as much. Effectively, the output y(t) is sampled at intervals of Tm. Each sample is denoted by y(ti), the i-th sample of the output, taken at time ti, where:
$$t_i = T_m \cdot i, \qquad i = 0, 1, 2, \ldots \qquad \text{(Eq. 1)}$$
The output y(ti) is compared with the reference response time wrt to compute the error with respect to it:
$$e(t_i) = y(t_i) - w_{rt} \qquad \text{(Eq. 2)}$$
The change in error is also computed:
$$\Delta e(t_i) = e(t_i) - e(t_{i-1}) \qquad \text{(Eq. 3)}$$
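Eqs. 1-3 amount to a few lines of arithmetic. The sketch below computes the sampling instants and the error sequences from a list of response-time samples. Times are in milliseconds, and treating the error before the first sample as 0 (so that Δe is defined at t0) is an assumption, chosen to be consistent with the initial value e(0)=0 given later in the text.

```python
def sample_times(t_m: int, n: int) -> list:
    """Eq. 1: sampling instants t_i = T_m * i for i = 0, 1, ..., n-1."""
    return [t_m * i for i in range(n)]


def errors(samples: list, w_rt: float) -> tuple:
    """Eqs. 2 and 3 for response-time samples y(t_0), y(t_1), ...

    Assumes the error before the first sample is 0, so delta_e[0] == e[0].
    """
    e = [y - w_rt for y in samples]       # Eq. 2: e(t_i) = y(t_i) - w_rt
    delta_e = []
    previous = 0
    for value in e:
        delta_e.append(value - previous)  # Eq. 3: e(t_i) - e(t_{i-1})
        previous = value
    return e, delta_e


times = sample_times(30, 3)               # T_m = w_rt = 30 ms
e, delta_e = errors([45, 40, 20], 30)     # samples y(t_0), y(t_1), y(t_2) in ms
```

Here e is [15, 10, -10] and Δe is [15, -5, -20]: the response time starts above the 30 ms reference and then falls below it.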
The fuzzy control logic 102 can be considered as a proportional and integral (PI) control logic because of the use of the error e and the change in error Δe.
The next step is the calculation of the probability of a snap at a certain time. To do that, the fraction of snapped blocks at time ti, fsnap(ti), is defined. In addition to indicating the percentage of blocks snapped, fsnap(ti) also determines the probability of further snaps. For example, if 90% of the blocks in a source volume have been snapped, the probability that a further write causes a snap is only 10% (assuming random user access over the volume). This is a consequence of the binomial nature of the snapshot process. The probability of a snap at time ti is:
$$p_{snap}(t_i) = 1 - f_{snap}(t_i) \qquad \text{(Eq. 4)}$$
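Eq. 4 and the uniform-access assumption behind it can be checked with a short sketch: `snap_probability` applies Eq. 4 directly, and `simulated_snap_rate` (a hypothetical helper) estimates how often a uniformly random write lands on a block that has not yet been snapped. The simulation holds the set of snapped blocks fixed, so it estimates the instantaneous probability rather than simulating the whole snapshot process.

```python
import random


def snap_probability(snapped_blocks: int, total_blocks: int) -> float:
    """Eq. 4: p_snap = 1 - f_snap, where f_snap is the fraction of blocks snapped."""
    if total_blocks <= 0:
        raise ValueError("total_blocks must be positive")
    f_snap = snapped_blocks / total_blocks
    return 1.0 - f_snap


def simulated_snap_rate(snapped_blocks: int, total_blocks: int,
                        writes: int = 100_000, seed: int = 0) -> float:
    """Fraction of uniformly random writes that hit a not-yet-snapped block.

    Blocks 0 .. snapped_blocks-1 are treated as already snapped, and that set is
    held fixed, so this estimates the instantaneous probability only.
    """
    rng = random.Random(seed)
    hits = sum(rng.randrange(total_blocks) >= snapped_blocks for _ in range(writes))
    return hits / writes


p = snap_probability(900, 1000)        # 90% snapped, so p_snap is about 0.1
p_sim = simulated_snap_rate(900, 1000)
```

Both values come out close to 0.1, matching the 90% example in the text.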
The probability of a snap psnap(ti), the error e(ti), and the change in error Δe(ti), are the three variables used by the fuzzy control logic 102 to compute the snap throttle factor, uth(ti). To be used in the fuzzy control logic 102 and combined according to fuzzy rules, these three variables are first “fuzzified.” The fuzzification of psnap is done in a straightforward fashion. If the probability of a snap is below or equal to 0.5 (or some other predefined fraction), it is mapped to a Low Probability (LP) fuzzy descriptor. If the probability of a snap is greater than 0.5 (or some other predefined fraction), it is mapped to a High Probability (HP) fuzzy descriptor. The membership function of probability of a snap is therefore defined by:
The final fuzzification of the psnap value is denoted by Fμsnap(μsnap), and is defined as:
A goal in the fuzzification of the error e and the change in error Δe is to map each of them to one of three fuzzy descriptors: Zero (ZE), Positive Error (PE), and Negative Error (NE). These fuzzy descriptors apply to both the error e and the change in error Δe. They indicate whether the error is close to zero or, where an error does exist, whether it is positive or negative. Positive error occurs if the response time y(t) is greater than the reference response time wrt, which means that copy-on-writes are causing the monitored response time to exceed the maximum acceptable response time; this indicates that copy-on-writes should not be performed and that redirect-on-writes should be performed instead. Negative error occurs if the response time y(t) is less than the reference response time wrt, which means that the monitored response time is within the maximum acceptable response time; this indicates that the number of copy-on-writes can be increased.
The fuzzification is first performed via three triangular membership functions μZE, μNE, and μPE, based on the reference response time wrt. The three mathematical membership functions are:
The membership functions of Eqs. 7-9 are defined for the error e, but the same membership functions are used for the change in error Δe by using Δe as the independent variable instead of the error e.
Finally, the error e and the change in error Δe are each mapped to one of the fuzzy descriptors (NE, ZE, or PE). This is accomplished by comparing the values obtained from the three membership functions (Eqs. 7-9). Depending on which of the three has the maximum value, the fuzzy value of the error Fe and the fuzzy value of the change in error FΔe are mapped to one of the fuzzy descriptors NE, ZE, or PE. Mathematically:
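$$F_e = \arg\max_{X \in \{NE,\, ZE,\, PE\}} \mu_e^{X} \qquad \text{(Eq. 10)}$$
$$F_{\Delta e} = \arg\max_{X \in \{NE,\, ZE,\, PE\}} \mu_{\Delta e}^{X} \qquad \text{(Eq. 11)}$$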
The error e is mapped to NE, ZE, or PE depending upon which of μeNE, μeZE, and μePE, respectively, has the largest value; similarly, the change in error Δe is mapped to NE, ZE, or PE depending upon which of μΔeNE, μΔeZE, and μΔePE, respectively, has the largest value.
For example, if the output y(t1) is 45 ms (milliseconds) and the reference response time wrt is 30 ms, then by Eq. 2 the error e is 15 ms. Using Eqs. 7-9, the membership values are μZE=0, μNE=0, and μPE=1. The maximum value corresponds to μPE, so by Eq. 10 the fuzzy value of the error Fe is mapped to Positive Error (PE).
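Because the membership equations (including Eqs. 7-9) are not reproduced in this text, the sketch below substitutes hypothetical membership functions: a triangle of half-width wrt/2 centered on zero for ZE, and ramps that saturate at +wrt/2 and -wrt/2 for PE and NE. These shapes were chosen only because they reproduce the worked example above (e=15 ms with wrt=30 ms gives μZE=0, μNE=0, μPE=1); they are not the functions of the original embodiment. The LP/HP mapping uses the 0.5 threshold described earlier, and the tie-breaking order is a further assumption.

```python
def fuzzify_psnap(p_snap: float, threshold: float = 0.5) -> str:
    """Map the snap probability to LP (<= threshold) or HP (> threshold)."""
    return "LP" if p_snap <= threshold else "HP"


def memberships(x: float, w_rt: float) -> dict:
    """Hypothetical stand-ins for Eqs. 7-9 (not the original functions).

    ZE is a triangle peaking at 0 with half-width w_rt/2; PE and NE are ramps
    that reach 1 at +w_rt/2 and -w_rt/2 and stay there (shoulders).
    """
    a = w_rt / 2.0
    return {
        "ZE": max(0.0, 1.0 - abs(x) / a),
        "NE": min(1.0, max(0.0, -x / a)),
        "PE": min(1.0, max(0.0, x / a)),
    }


def fuzzify(x: float, w_rt: float) -> str:
    """Eqs. 10-11: pick the descriptor with the largest membership value.

    Works for both e and delta e. Ties go to the first label in the order
    ZE, NE, PE (an assumption).
    """
    mu = memberships(x, w_rt)
    return max(mu, key=mu.get)


# Worked example from the text: y = 45 ms, w_rt = 30 ms, so e = 15 ms.
mu_e = memberships(45 - 30, 30)
label_e = fuzzify(45 - 30, 30)
```

With these shapes, `label_e` is "PE", as in the example; an error of 0 maps to ZE and an error of -20 ms maps to NE.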
The rule base can now be built based on the following heuristic criteria: 1) if the response time is high, the error e is fuzzy positive (PE), and the fuzzy control logic 102 has to reduce the number of copy-on-writes occurring, so the snap throttle factor uth is reduced; 2) if the response time is low, the fuzzy control logic 102 can increase the number of copy-on-writes occurring, so the snap throttle factor uth is increased.
An example rule base is provided below:
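Rule | psnap (input) | e (input) | Δe (input) | Δuth (output) |
R1 | HP | PE | PE | −0.2 |
R2 | HP | PE | NE | −0.1 |
R3 | HP | ZE | PE | −0.1 |
R4 | HP | ZE | PE | −0.1 |
R5 | HP | NE | ZE | +0.05 |
R6 | HP | NE | NE | +0.05 |
R7 | LP | PE | PE | −0.05 |
R8 | LP | PE | ZE | −0.05 |
R9 | LP | ZE | PE | −0.05 |
R10 | LP | NE | PE | +0.05 |
R11 | LP | NE | NE | +0.05 |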
The rule base includes 11 rules R1 to R11, where each rule specifies an output value for Δuth based on values of psnap (HP or LP), e (PE, NE, or ZE), and Δe (PE, NE, or ZE).
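Encoded as a lookup table, the rule base becomes a dictionary keyed by the three fuzzy labels. In this sketch, R3 and R4 collapse into one entry because the table lists them with identical inputs and outputs, and returning 0.0 (no change to uth) when no rule matches is an assumption, since the 11 rules do not cover all 18 label combinations. The names `RULES` and `rule_output` are hypothetical.

```python
# Rule base R1-R11: (p_snap label, e label, delta-e label) -> delta u_th.
RULES = {
    ("HP", "PE", "PE"): -0.2,   # R1
    ("HP", "PE", "NE"): -0.1,   # R2
    ("HP", "ZE", "PE"): -0.1,   # R3 and R4 (identical rows in the table)
    ("HP", "NE", "ZE"): +0.05,  # R5
    ("HP", "NE", "NE"): +0.05,  # R6
    ("LP", "PE", "PE"): -0.05,  # R7
    ("LP", "PE", "ZE"): -0.05,  # R8
    ("LP", "ZE", "PE"): -0.05,  # R9
    ("LP", "NE", "PE"): +0.05,  # R10
    ("LP", "NE", "NE"): +0.05,  # R11
}


def rule_output(p_label: str, e_label: str, de_label: str) -> float:
    """Return delta u_th for the matching rule, or 0.0 if no rule fires (assumption)."""
    return RULES.get((p_label, e_label, de_label), 0.0)


# Response time high and rising while snaps are likely: throttle CoW hardest.
strongest_cut = rule_output("HP", "PE", "PE")
```

The largest cut (−0.2) is reserved for the case where snaps are likely and the response time is both too high and still rising.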
In some embodiments, a lower bound (e.g., 0.05) is set for the snap throttle factor so that the value of uth stays in the range [0.05, 1], and the fuzzy control logic 102 checks the value of uth after each execution of one of the rules to ensure that uth is within this range. A snap throttle factor of uth=0.05 means that at least 5% of the writes that cause a copy-on-write will be allowed to proceed. This lower bound is set so that some copy-on-writes can still proceed and make progress, even if only slowly. The choice of the lower bound for uth can be based on empirical observations of actual snapshot processes.
$$u_{th}(t_i) = u_{th}(t_{i-1}) + \Delta u_{th}(t_i) \qquad \text{(Eq. 12)}$$
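Eq. 12, combined with the range check described above, reduces to an addition followed by a clamp. A minimal sketch, with the bounds 0.05 and 1 and the initial value uth(0)=0.05 taken from the text and the function name `update_throttle` hypothetical:

```python
U_MIN, U_MAX = 0.05, 1.0  # allowed range of the snap throttle factor


def update_throttle(u_prev: float, delta_u: float) -> float:
    """Eq. 12, u_th(t_i) = u_th(t_{i-1}) + delta_u_th(t_i), then clamp to [U_MIN, U_MAX]."""
    return min(U_MAX, max(U_MIN, u_prev + delta_u))


u = 0.05                       # initial value u_th(0)
for delta in (+0.05, +0.05, -0.2, +0.05):
    u = update_throttle(u, delta)
```

The −0.2 step would drive uth below zero, so the clamp holds it at 0.05 before the final +0.05 brings it to 0.1.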
According to the throttle factor, the fuzzy control logic 102 selects (at 716) one of the copy-on-write snapshot algorithm and redirect-on-write snapshot algorithm to use for a write request. Note that the fuzzy control logic 102 also takes into account the RoW-CoW rule discussed above when selecting between the snapshot algorithms.
The initial values of uth and e used by the fuzzy control logic 102 when a snapshot volume is first created are uth(0)=0.05 and e(0)=0. The output of the fuzzy rules is the change in the snap throttle factor, Δuth(ti), which is used to obtain the new snap throttle factor uth(ti) according to Eq. 12. The factor uth(ti) is used to decide what percentage of the write requests are to be processed according to copy-on-write and what percentage are to be processed according to redirect-on-write. Table 1 shows an example of the complete rule base.
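One way to turn uth into a per-write decision is sketched below. The credit accumulator is an illustrative choice (a random draw with probability uth would be another), and the class `AlgorithmSelector` together with its three return labels is hypothetical. The RoW-CoW rule is modeled by forcing copy-on-write for any block that already carries a redirect-on-write pointer.

```python
class AlgorithmSelector:
    """Hypothetical per-write chooser between copy-on-write and redirect-on-write."""

    def __init__(self, u_th: float = 0.05):
        self.u_th = u_th          # current snap throttle factor (share of snaps done as CoW)
        self.credit = 0.0         # accumulated share of CoW writes not yet spent
        self.row_blocks = set()   # blocks that currently carry a RoW pointer
        self.snapped = set()      # blocks whose original is already preserved

    def choose(self, block: int) -> str:
        if block in self.snapped:
            return "none"         # no snapshot work needed; overwrite in place
        if block in self.row_blocks:
            # RoW-CoW rule: a later write to a redirected block is handled by CoW.
            self.row_blocks.discard(block)
            self.snapped.add(block)
            return "cow"
        # First write to this block: let roughly a u_th share of these use CoW.
        self.credit += self.u_th
        if self.credit >= 1.0:
            self.credit -= 1.0
            self.snapped.add(block)
            return "cow"
        self.row_blocks.add(block)
        return "row"


selector = AlgorithmSelector(u_th=0.25)
picks = [selector.choose(b) for b in range(8)]
```

With uth=0.25, every fourth first write uses copy-on-write and the rest are redirected. In a running controller, `selector.u_th` would be overwritten with each new uth(ti) from Eq. 12.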
Using the flexible snapshot algorithm discussed above, response times of storage systems that use snapshot mechanisms to provide data protection can be improved.
Instructions of software described above (including the fuzzy control logic 102 and snapshot control logic 108 of
Data and instructions (of the software) are stored in respective storage devices, which are implemented as one or more computer-readable or computer-usable storage media. The storage media include different forms of memory including semiconductor memory devices such as dynamic or static random access memories (DRAMs or SRAMs), erasable and programmable read-only memories (EPROMs), electrically erasable and programmable read-only memories (EEPROMs) and flash memories; magnetic disks such as fixed, floppy and removable disks; other magnetic media including tape; and optical media such as compact disks (CDs) or digital video disks (DVDs). Note that the instructions of the software discussed above can be provided on one computer-readable or computer-usable storage medium, or alternatively, can be provided on multiple computer-readable or computer-usable storage media distributed in a large system having possibly plural nodes. Such computer-readable or computer-usable storage medium or media is (are) considered to be part of an article (or article of manufacture). An article or article of manufacture can refer to any manufactured single component or multiple components.
In the foregoing description, numerous details are set forth to provide an understanding of the present invention. However, it will be understood by those skilled in the art that the present invention may be practiced without these details. While the invention has been disclosed with respect to a limited number of embodiments, those skilled in the art will appreciate numerous modifications and variations therefrom. It is intended that the appended claims cover such modifications and variations as fall within the true spirit and scope of the invention.
Umberger, David K., Navarro, Guillermo
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Oct 06 2008 | NAVARRO, GUILLERMO | HEWLETT-PACKARD DEVELOPMENT COMPANY, L P | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 026051 | /0797 | |
Oct 06 2008 | UMBERGER, DAVID K | HEWLETT-PACKARD DEVELOPMENT COMPANY, L P | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 026051 | /0797 | |
Oct 07 2008 | Hewlett-Packard Development Company, L.P. | (assignment on the face of the patent) | / | |||
Oct 27 2015 | HEWLETT-PACKARD DEVELOPMENT COMPANY, L P | Hewlett Packard Enterprise Development LP | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 037079 | /0001 |
Date | Maintenance Fee Events |
Jul 19 2017 | M1551: Payment of Maintenance Fee, 4th Year, Large Entity. |
Jul 20 2021 | M1552: Payment of Maintenance Fee, 8th Year, Large Entity. |
Date | Maintenance Schedule |
Feb 11 2017 | 4 years fee payment window open |
Aug 11 2017 | 6 months grace period start (w surcharge) |
Feb 11 2018 | patent expiry (for year 4) |
Feb 11 2020 | 2 years to revive unintentionally abandoned end. (for year 4) |
Feb 11 2021 | 8 years fee payment window open |
Aug 11 2021 | 6 months grace period start (w surcharge) |
Feb 11 2022 | patent expiry (for year 8) |
Feb 11 2024 | 2 years to revive unintentionally abandoned end. (for year 8) |
Feb 11 2025 | 12 years fee payment window open |
Aug 11 2025 | 6 months grace period start (w surcharge) |
Feb 11 2026 | patent expiry (for year 12) |
Feb 11 2028 | 2 years to revive unintentionally abandoned end. (for year 12) |