A system and method for dynamic RAID geometries. A computer system comprises client computers and data storage arrays coupled to one another via a network. A data storage array utilizes solid-state drives and flash memory cells for data storage. A storage controller within a data storage array configures a first subset of the storage devices for use in a first RAID layout, the first RAID layout including a first set of redundant data, and configures a second subset of the storage devices for use in a second RAID layout, the second RAID layout including a second set of redundant data. Additionally, when writing a stripe, the controller may select from any of the plurality of storage devices for one or more of the first RAID layout, the second RAID layout, and storage of redundant data by an additional logical device.
0. 1. A computer system comprising:
a data storage subsystem comprising a plurality of storage devices in a redundant array of independent disks (raid) configuration; and
a storage controller configured to:
write a first raid stripe to the plurality of storage devices including:
for each storage device of a subset of the plurality of storage devices, writing within a page of the storage device, user data and a checksum that validates the user data stored on the storage device;
writing within a page of a particular storage device of the plurality of storage devices,
inter-device protection data, the inter-device protection data protecting the user data stored on each storage device of the subset of the plurality of storage devices;
intra-page protection data, the intra-page protection data protecting the inter-device protection data stored on the particular storage device; and
inter-page protection data, the inter-page protection data protecting the checksums stored on each storage device of the subset of storage devices.
0. 15. A non-transitory computer readable storage medium storing program instructions, wherein the program instructions are executable to:
write a raid stripe to a plurality of storage devices in a redundant array of independent disks (raid) configuration, wherein writing the raid stripe includes:
for each storage device of a subset of the plurality of storage devices, writing within a page of the storage device, user data and a checksum that validates the user data stored on the storage device;
writing within a page of a particular storage device of the plurality of storage devices,
inter-device protection data, the inter-device protection data protecting the user data stored on each storage device of the subset of the plurality of storage devices;
intra-page protection data, the intra-page protection data protecting the inter-device protection data stored on the particular storage device; and
inter-page protection data, the inter-page protection data protecting the checksums stored on each storage device of the subset of storage devices.
0. 19. A computer system comprising:
a data storage subsystem comprising a plurality of storage devices in a redundant array of independent drives (raid) configuration; and
a storage controller to:
write a first raid stripe to the plurality of storage devices including:
for each storage device of a subset of the plurality of storage devices, writing within a page of the storage device, user data, and a checksum that validates the user data stored on each storage device of the subset of the plurality of storage devices; and
writing, within a page of a particular storage device of the plurality of storage devices:
inter-device redundancy data, the inter-device redundancy data to protect the user data stored on each storage device of a first subset of the plurality of storage devices,
intra-page error recovery data, the intra-page error recovery data to protect the inter-device redundancy data stored on the particular storage device, and
inter-page protection data, the inter-page protection data to protect the checksums stored on each storage device of the subset of the plurality of storage devices.
0. 26. A method, comprising:
writing, by a storage controller of a data storage subsystem comprising a plurality of storage devices in a redundant array of independent drives (raid) configuration, a first raid stripe to the plurality of storage devices, wherein writing the first raid stripe comprises:
for each storage device of a subset of the plurality of storage devices, writing within a page of the storage device, user data, and a checksum that validates the user data stored on each storage device of the subset of the plurality of storage devices; and
writing, within a page of a particular storage device of the plurality of storage devices:
inter-device redundancy data, the inter-device redundancy data to protect the user data stored on each storage device of a first subset of the plurality of storage devices,
intra-page error recovery data, the intra-page error recovery data to protect the inter-device redundancy data stored on the particular storage device, and
inter-page protection data, the inter-page protection data to protect the checksums stored on each storage device of the subset of the plurality of storage devices.
0. 2. The computer system as recited in
0. 3. The computer system as recited in
0. 4. The computer system as recited in
0. 5. The computer system as recited in
0. 6. The computer system as recited in
0. 7. The computer system as recited in
0. 8. The computer system as recited in
0. 9. A method for use in a computing system, the method comprising:
writing a raid stripe to a plurality of storage devices in a redundant array of independent disks (raid) configuration, wherein writing the raid stripe includes:
for each storage device of a subset of the plurality of storage devices, writing within a page of the storage device, user data and a checksum that validates the user data stored on the storage device;
writing within a page of a particular storage device of the plurality of storage devices,
inter-device protection data, the inter-device protection data protecting the user data stored on each storage device of the subset of the plurality of storage devices;
intra-page protection data, the intra-page protection data protecting the inter-device protection data stored on the particular storage device; and
inter-page protection data, the inter-page protection data protecting the checksums stored on each storage device of the subset of storage devices.
0. 10. The method as recited in
0. 11. The method as recited in
0. 12. The method as recited in
0. 13. The method as recited in
0. 14. The method as recited in
0. 16. The non-transitory computer readable storage medium as recited in
0. 17. The non-transitory computer readable storage medium as recited in
0. 18. The non-transitory computer readable storage medium as recited in
0. 20. The computer system of claim 19, wherein the storage controller is further configured to write a second raid stripe to a second subset of the plurality of storage devices, the first raid stripe having a first raid layout and the second raid stripe having a second raid layout.
0. 21. The computer system of claim 20, wherein the first raid layout is an L+x layout, and the second raid layout is an M+y layout, wherein L, x, M, and y are positive integers, and wherein at least one of: (1) L is not equal to M, or (2) x is not equal to y.
0. 22. The computer system of claim 20, wherein the first raid layout is selected from a first device group and the second raid layout is selected from a second device group.
0. 23. The computer system of claim 19, wherein the plurality of storage devices are solid state storage devices.
0. 24. The computer system of claim 19, wherein the plurality of storage devices comprise flash memory cells.
0. 25. The computer system of claim 19, wherein the computer system is a flash memory based system.
0. 27. The method of claim 26, further comprising writing a second raid stripe to a second subset of the plurality of storage devices, the first raid stripe having a first raid layout and the second raid stripe having a second raid layout.
0. 28. The method of claim 27, wherein the first raid layout is an L+x layout, and the second raid layout is an M+y layout, wherein L, x, M, and y are positive integers, and wherein at least one of: (1) L is not equal to M, or (2) x is not equal to y.
0. 29. The method of claim 27, wherein the first raid layout is selected from a first device group and the second raid layout is selected from a second device group.
0. 30. The method of claim 26, wherein the plurality of storage devices are solid state storage devices.
0. 31. The method of claim 26, wherein the plurality of storage devices comprise flash memory cells.
0. 32. The method of claim 26, wherein the data storage subsystem is a flash memory based system.
The example shown includes an L+1 RAID array, an M+1 RAID array, and an N+1 RAID array. In various embodiments, L, M, and N may all be different, the same, or a combination thereof. For example, RAID array 1210 is shown in partition 1, and the other storage devices 1212 are candidates for other RAID arrays within partition 1. Similarly, RAID array 1220 illustrates a given RAID array in partition 2, and the other storage devices 1222 are candidates for other RAID arrays within partition 2. RAID array 1230 illustrates a given RAID array in partition 3, and the other storage devices 1232 are candidates for other RAID arrays within partition 3.
Within each of the RAID arrays 1210, 1220 and 1230, a storage device P1 provides RAID single parity protection within the respective RAID array, while storage devices D1-DN store user data. Again, the storage of both the user data and the RAID single parity information may rotate among the storage devices D1-DN and P1; for ease of illustration and description, however, user data is described as being stored in devices D1-DN and RAID single parity information as being stored in device P1.
One or more logical storage devices among each of the three partitions may be chosen to provide an additional amount of supported redundancy for one or more given RAID arrays. In various embodiments, a logical storage device may correspond to a single physical storage device. Alternatively, a logical storage device may correspond to multiple physical storage devices. For example, logical storage device Q1 in partition 3 may be combined with each of the RAID arrays 1210, 1220 and 1230. The logical storage device Q1 may provide RAID double parity information for each of the RAID arrays 1210, 1220 and 1230. This additional parity information is generated and stored when a stripe is written to one of the arrays 1210, 1220, or 1230. Further, this additional parity information may cover stripes in each of the arrays 1210, 1220, and 1230. Therefore, the ratio of the number of storage devices storing RAID parity information to the total number of storage devices is lower than it would be if each partition supplied the extra redundancy itself. For example, if each of the partitions used N+2 RAID arrays, then the ratio of the number of storage devices storing RAID parity information to the total number of storage devices is 3(2)/(3(N+2)), or 2/(N+2). In contrast, the ratio for the hybrid RAID layout 1200 is (3+1)/(3(N+1)), or 4/(3(N+1)).
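To make this comparison concrete, the following minimal Python sketch (not from the patent; the values of N are hypothetical) evaluates the two ratios described above for three partitions:

```python
from fractions import Fraction

def parity_ratio_per_partition_double(num_partitions: int, n: int) -> Fraction:
    # Each partition uses its own N+2 array: 2 parity devices per partition.
    return Fraction(num_partitions * 2, num_partitions * (n + 2))

def parity_ratio_hybrid(num_partitions: int, n: int) -> Fraction:
    # Each partition uses an N+1 array plus one shared Q device for all partitions.
    return Fraction(num_partitions + 1, num_partitions * (n + 1))

for n in (5, 10, 20):  # hypothetical values of N
    print(n, parity_ratio_per_partition_double(3, n), parity_ratio_hybrid(3, n))
```

For any N greater than 1, 4/(3(N+1)) is smaller than 2/(N+2), matching the conclusion above.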
It is possible to reduce the above ratio by increasing the number of storage devices used to store user data. For example, rather than utilizing storage device Q1, each of the partitions may utilize a 3N+2 RAID array. In such a case, the ratio of the number of storage devices storing RAID parity information to the total number of storage devices is 2/(3N+2). However, during a reconstruct read operation, (3N+1) storage devices receive a reconstruct read request for a single device failure. In contrast, for the hybrid RAID layout 1200, only N storage devices receive a reconstruct read request for a single device failure.
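The trade-off can be illustrated with another short, hypothetical sketch: the wide 3N+2 layout lowers the parity ratio but widens the reconstruct-read fan-out, whereas the hybrid layout keeps the fan-out at N:

```python
def wide_layout(n: int):
    # 3N+2 array: ratio 2/(3N+2); a single failure requires reads
    # from the remaining 3N+1 devices of that array.
    return 2 / (3 * n + 2), 3 * n + 1

def hybrid_layout(n: int):
    # Hybrid layout 1200: ratio 4/(3(N+1)); a single failure is rebuilt
    # from the N surviving devices of the affected partition's N+1 array.
    return 4 / (3 * (n + 1)), n

for n in (5, 10, 20):  # hypothetical values of N
    print(n, wide_layout(n), hybrid_layout(n))
```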
It is noted that each of the three partitions may utilize a different RAID data layout architecture. A selection of a given RAID data layout architecture may be based on a given ratio of the number of storage devices storing RAID parity information to the total number of storage devices. In addition, the selection may be based on a given number of storage devices which may receive a reconstruct read request during reconstruction. For example, the RAID arrays 1210, 1220 and 1230 may include geometries such as L+a, M+b and N+c, respectively.
In addition, one or more storage devices, such as storage device Q1, may be chosen based on the above or other conditions to provide an additional amount of supported redundancy for one or more of the RAID arrays within the partitions. In an example with three partitions comprising the above RAID arrays and a number Q of storage devices providing extra protection for each of the RAID arrays, the ratio of the number of storage devices storing RAID parity information to the total number of storage devices is (a+b+c+Q)/(L+a+M+b+N+c+Q). For a single device failure, the number of storage devices that receive a reconstruct read request is L, M and N, respectively, for partitions 1 to 3 in the above example. It is noted that the above discussion generally describes 3 distinct partitions in
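A small helper expresses this general ratio for arbitrary L+a, M+b, N+c geometries plus Q extra devices; this is a sketch only, and the geometry values below are hypothetical:

```python
def hybrid_parity_ratio(geometries, q: int) -> float:
    """geometries: list of (data_devices, parity_devices) per partition,
    e.g. [(L, a), (M, b), (N, c)]; q: shared extra-redundancy devices."""
    data = sum(d for d, _ in geometries)
    parity = sum(p for _, p in geometries)
    return (parity + q) / (data + parity + q)

# Hypothetical geometries: L+a = 8+1, M+b = 10+1, N+c = 6+2, with Q = 1.
print(hybrid_parity_ratio([(8, 1), (10, 1), (6, 2)], q=1))
```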
In addition to the above, in various embodiments, when writing a stripe, the controller may select from any of the plurality of storage devices for one or more of the first RAID layout, the second RAID layout, and storage of redundant data by the additional logical device. In this manner, all of these devices may participate in the RAID groups, and the additional logical device may differ from one stripe to the next. In various embodiments, a stripe is a RAID layout on the first subset, plus a RAID layout on the second subset, plus the additional logical device.
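A minimal data-structure sketch (hypothetical device names, not the patent's implementation) of how a stripe may draw its two subsets and its additional logical device from the same pool, with the extra device differing between stripes:

```python
from dataclasses import dataclass

@dataclass
class Stripe:
    first_subset: list   # devices holding the first RAID layout
    second_subset: list  # devices holding the second RAID layout
    extra_device: str    # logical device holding the additional redundancy

pool = [f"dev{i}" for i in range(9)]  # hypothetical shared device pool
stripe_a = Stripe(pool[0:4], pool[4:8], pool[8])
stripe_b = Stripe(pool[1:5], pool[5:9], pool[0])  # different extra device
```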
Referring now to
In block 1302, a RAID engine 178 or other logic within a storage controller 174 determines to use a given number of devices to store user data in a RAID array within each partition of a storage subsystem. A RUSH or other algorithm may then be used to select which devices are to be used. In one embodiment, each partition utilizes a same number of storage devices. In other embodiments, each partition may utilize a different, unique number of storage devices to store user data. In block 1304, the storage controller 174 may determine to support a number of storage devices to store corresponding Inter-Device Error Recovery (parity) data within each partition of the subsystem. Again, each partition may utilize a same number or a different, unique number of storage devices for storing RAID parity information.
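The device-selection step of blocks 1302-1304 can be sketched as follows; a simple hash-ordered pick stands in for the RUSH-style placement, and the device names and counts are hypothetical:

```python
import hashlib

def pick_stripe_devices(partition_devices, stripe_id, data_count, parity_count):
    """Deterministically choose data and parity devices for one stripe within
    a partition. A real system would use RUSH or a similar placement algorithm;
    hashing here merely gives a repeatable, roughly uniform ordering."""
    ranked = sorted(
        partition_devices,
        key=lambda dev: hashlib.sha256(f"{stripe_id}:{dev}".encode()).hexdigest(),
    )
    chosen = ranked[:data_count + parity_count]
    return chosen[:data_count], chosen[data_count:]

data_devs, parity_devs = pick_stripe_devices(
    [f"p1.dev{i}" for i in range(8)], stripe_id=42, data_count=5, parity_count=1)
```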
In block 1306, the storage controller may determine to support a number Q of storage devices to store extra Inter-Device Error Recovery (parity) data across the partitions of the subsystem. In block 1308, both user data and corresponding RAID parity data may be written in selected storage devices. Referring again to
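The write path of blocks 1306-1308 can be sketched as below. Single parity is computed as a bytewise XOR; the extra Q protection is shown as another XOR only to indicate where it is stored, since the passage does not specify the erasure code, and a real implementation would use an independent code (e.g., Reed-Solomon) so that the Q data adds recovery capability:

```python
def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks (single-parity computation)."""
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, b in enumerate(blk):
            out[i] ^= b
    return bytes(out)

# Hypothetical 4 KiB data blocks for one stripe in each of three partitions.
partition_data = [
    [bytes([16 * p + d]) * 4096 for d in range(4)]  # partition p's D1..D4
    for p in range(3)
]

# Per-partition RAID single parity, written to that partition's P1 device.
per_partition_parity = [xor_blocks(blocks) for blocks in partition_data]

# Extra redundancy written to the shared Q device, covering all partitions'
# stripes (illustrative XOR across all data blocks; see note above).
q_parity = xor_blocks([blk for blocks in partition_data for blk in blocks])
```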
If the storage controller 174 detects a condition for performing read reconstruction in a given partition (conditional block 1310), and if the given partition has a sufficient number of storage devices holding RAID parity information to handle the number of unavailable storage devices (conditional block 1312), then in block 1314 the reconstruct read operation(s) is performed with one or more corresponding storage devices within the given partition. The condition may include a storage device within a given RAID array being unavailable due to a device failure, or the device operating below a given performance level. The given RAID array is able to handle a maximum number of unavailable storage devices equal to the number of storage devices storing RAID parity information within the given partition. For example, if RAID array 1210 in partition 1 in the above example is an L+a RAID array, then RAID array 1210 is able to perform read reconstruction utilizing only storage devices within partition 1 when k storage devices are unavailable, where 1<=k<=a.
If the given partition does not have a sufficient number of storage devices holding RAID parity information to handle the number of unavailable storage devices (conditional block 1312), and if there is a sufficient number of Q storage devices to handle the number of unavailable storage devices (conditional block 1316), then in block 1318 the reconstruct read operation(s) is performed with one or more corresponding Q storage devices. One or more storage devices in other partitions, which are storing user data, may be accessed during the read reconstruction. A selection of these storage devices may be based on the manner in which the parity information stored in the one or more Q storage devices is derived. For example, referring again to
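The decision flow of blocks 1310-1318 can be summarized in a short sketch (simplified: it only counts devices and ignores which stripes the Q parity actually covers):

```python
def plan_reconstruction(unavailable: int, partition_parity: int, q_devices: int) -> str:
    """Choose how reconstruct reads are serviced for one partition."""
    if unavailable == 0:
        return "no reconstruction needed"
    if unavailable <= partition_parity:
        return "rebuild using parity within the partition"           # block 1314
    if unavailable <= partition_parity + q_devices:
        return "rebuild using Q devices; may read other partitions"  # block 1318
    return "insufficient redundancy to rebuild"

for failures in range(4):
    print(failures, plan_reconstruction(failures, partition_parity=1, q_devices=1))
```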
It is noted that the above-described embodiments may comprise software. In such an embodiment, the program instructions that implement the methods and/or mechanisms may be conveyed or stored on a computer readable medium. Numerous types of media which are configured to store program instructions are available and include hard disks, floppy disks, CD-ROM, DVD, flash memory, Programmable ROMs (PROM), random access memory (RAM), and various other forms of volatile or non-volatile storage.
In various embodiments, one or more portions of the methods and mechanisms described herein may form part of a cloud-computing environment. In such embodiments, resources may be provided over the Internet as services according to one or more various models. Such models may include Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). In IaaS, computer infrastructure is delivered as a service. In such a case, the computing equipment is generally owned and operated by the service provider. In the PaaS model, software tools and underlying equipment used by developers to develop software solutions may be provided as a service and hosted by the service provider. SaaS typically includes a service provider licensing software as a service on demand. The service provider may host the software, or may deploy the software to a customer for a given period of time. Numerous combinations of the above models are possible and are contemplated.
Although the embodiments above have been described in considerable detail, numerous variations and modifications will become apparent to those skilled in the art once the above disclosure is fully appreciated. It is intended that the following claims be interpreted to embrace all such variations and modifications.
Colgrove, John, Hayes, John, Miller, Ethan, Hong, Bo