One embodiment facilitates data storage. During operation, the system receives data to be stored in a non-volatile memory of a storage device. The system determines, by a flash translation layer module of a control unit which is distinct from the storage device, a physical page address at which the data is to be stored in the non-volatile memory, wherein the flash translation layer module of the control unit determines physical page addresses for data to be stored in a plurality of storage devices. The system stores, by the flash translation layer module of the control unit, a mapping between a logical page address for the data and the physical page address. The system writes the data to the non-volatile memory at the physical page address.
|
1. A computer-implemented method for facilitating data storage, the method comprising:
receiving data to be stored in a non-volatile memory of a storage device of a plurality of storage devices;
determining, by a flash translation layer module of a control unit, a physical page address at which the data is to be stored in the non-volatile memory,
wherein the control unit is distinct from a host and the plurality of storage devices, and wherein the control unit runs separately from the host and the plurality of storage devices,
wherein the flash translation layer module of the control unit determines physical page addresses for first data to be stored in the plurality of storage devices,
wherein the control unit communicates with controllers of the plurality of storage devices, and
wherein the control unit manages a queue pair comprising a submission queue and a completion queue;
placing, by the control unit in the submission queue, a command to write the data at the physical page address;
storing, by the flash translation layer module of the control unit, a mapping between a logical page address for the data and the physical page address; and
writing the data to the non-volatile memory at the physical page address,
wherein a controller of the storage device obtains the command from the submission queue, executes the command, and sends to the control unit a complete notification,
which causes the control unit to place the command in the completion queue.
17. A non-transitory computer-readable storage medium storing instructions that when executed by a computer cause the computer to perform a method, the method comprising:
receiving data to be stored in a non-volatile memory of a storage device of a plurality of storage devices;
determining, by a flash translation layer module of a control unit, a physical page address at which the data is to be stored in the non-volatile memory,
wherein the control unit is distinct from a host and the plurality of storage devices, and wherein the control unit runs separately from the host and the plurality of storage devices,
wherein the flash translation layer module of the control unit determines physical page addresses for first data to be stored in a plurality of storage devices,
wherein the control unit communicates with controllers of the plurality of storage devices, and
wherein the control unit manages a queue pair comprising a submission queue and a completion queue;
placing, by the control unit in the submission queue, a command to write the data at the physical page address;
storing, by the flash translation layer module of the control unit, a mapping between a logical page address for the data and the physical page address; and
writing the data to the non-volatile memory at the physical page address,
wherein a controller of the storage device obtains the command from the submission queue, executes the command, and sends to the control unit a complete notification,
which causes the control unit to place the command in the completion queue.
9. A computer system for facilitating data storage, the system comprising:
a processor; and
a memory coupled to the processor and storing instructions, which when executed by the processor cause the processor to perform a method, the method comprising:
receiving data to be stored in a non-volatile memory of a storage device of a plurality of storage devices;
determining, by a flash translation layer module of a control unit, a physical page address at which the data is to be stored in the non-volatile memory,
wherein the control unit is distinct from a host and the plurality of storage devices, and wherein the control unit runs separately from the host and the plurality of storage devices,
wherein the flash translation layer module of the control unit determines physical page addresses for first data to be stored in the plurality of storage devices,
wherein the control unit communicates with controllers of the plurality of storage devices, and
wherein the control unit manages a queue pair comprising a submission queue and a completion queue;
placing, by the control unit in the submission queue, a command to write the data at the physical page address;
storing, by the flash translation layer module of the control unit, a mapping between a logical page address for the data and the physical page address; and
writing the data to the non-volatile memory at the physical page address,
wherein a controller of the storage device obtains the command from the submission queue, executes the command, and sends to the control unit a complete notification,
which causes the control unit to place the command in the completion queue.
2. The method of
in response to a query from a host for the physical page address, transmitting, by the control unit to the host, the determined physical page address,
wherein the data is held in a volatile memory of the host,
wherein the data is written directly, based on a direct memory access protocol, from the volatile memory of the host to the non-volatile memory of the storage device at the physical page address, and
wherein the host manages a second queue pair comprising a second submission queue and a second completion queue.
3. The method of
wherein in response to successfully writing the data to the non-volatile memory of the storage device, the controller of the storage device sends a complete notification to the host, and
wherein in response to receiving the complete notification, the host updates the second queue pair.
4. The method of
holding the data in a volatile memory of the control unit,
wherein writing the data to the non-volatile memory at the physical page address involves writing the data directly, based on a direct memory access protocol, from the volatile memory of the control unit to the non-volatile memory of the storage device at the physical page address; and
managing, by the control unit, the queue pair comprising the submission queue and the completion queue.
5. The method of
wherein in response to successfully writing the data to the non-volatile memory of the storage device, the controller of the storage device sends to the control unit the complete notification, and
wherein the method further comprises:
in response to receiving the complete notification, updating, by the control unit, the queue pair.
6. The method of
7. The method of
initiating a garbage collection process;
reading, by the control unit, valid data from a plurality of pages of blocks to be recycled, wherein the blocks are associated with the plurality of storage devices;
storing, by the control unit in a temporary data buffer, the valid data read from the plurality of storage devices; and
in response to obtaining a full block of data in the temporary data buffer, writing, by the control unit, the data in the full block to an open block of one of the plurality of storage devices.
8. The method of
writing the logical page address for the data in an out of band region of a page at the physical page address in the non-volatile memory; and
in response to detecting a power loss or a power failure:
reading out the page at the physical page address;
obtaining the corresponding logical page address previously written in the out of band region of the page; and
updating the mapping between the logical page address and the physical page address based on the obtained corresponding logical page address.
10. The computer system of
in response to a query from a host for the physical page address, transmitting, by the control unit to the host, the determined physical page address,
wherein the data is held in a volatile memory of the host,
wherein the data is written directly, based on a direct memory access protocol, from the volatile memory of the host to the non-volatile memory of the storage device at the physical page address, and
wherein the host manages a second queue pair comprising a second submission queue and a second completion queue.
11. The computer system of
wherein in response to successfully writing the data to the non-volatile memory of the storage device, the controller of the storage device sends a complete notification to the host, and
wherein in response to receiving the complete notification, the host updates the second queue pair.
12. The computer system of
holding the data in a volatile memory of the control unit,
wherein writing the data to the non-volatile memory at the physical page address involves writing the data directly, based on a direct memory access protocol, from the volatile memory of the control unit to the non-volatile memory of the storage device at the physical page address; and
managing, by the control unit, the queue pair comprising the submission queue and the completion queue.
13. The computer system of
wherein in response to successfully writing the data to the non-volatile memory of the storage device, the controller of the storage device sends to the control unit the complete notification, and
wherein the method further comprises:
in response to receiving the complete notification, updating, by the control unit, the queue pair.
14. The computer system of
15. The computer system of
initiating a garbage collection process;
reading, by the control unit, valid data from a plurality of pages of blocks to be recycled, wherein the blocks are associated with the plurality of storage devices;
storing, by the control unit in a temporary data buffer, the valid data read from the plurality of storage devices; and
in response to obtaining a full block of data in the temporary data buffer, writing, by the control unit, the data in the full block to an open block of one of the plurality of storage devices.
16. The computer system of
writing the logical page address for the data in an out of band region of a page at the physical page address in the non-volatile memory; and
in response to detecting a power loss or a power failure:
reading out the page at the physical page address;
obtaining the corresponding logical page address previously written in the out of band region of the page; and
updating the mapping between the logical page address and the physical page address based on the obtained corresponding logical page address.
18. The storage medium of
in response to a query from a host for the physical page address, transmitting, by the control unit to the host, the determined physical page address,
wherein the data is held in a volatile memory of the host,
wherein the data is written directly, based on a direct memory access protocol, from the volatile memory of the host to the non-volatile memory of the storage device at the physical page address,
wherein the host manages a second queue pair comprising a second submission queue and a second completion queue,
wherein in response to successfully writing the data to the non-volatile memory of the storage device, the controller of the storage device sends a complete notification to the host, and
wherein in response to receiving the complete notification, the host updates the second queue pair.
19. The storage medium of
holding the data in a volatile memory of the control unit,
wherein writing the data to the non-volatile memory at the physical page address involves writing the data directly, based on a direct memory access protocol, from the volatile memory of the control unit to the non-volatile memory of the storage device at the physical page address;
managing, by the control unit, the queue pair comprising the submission queue and the completion queue;
wherein in response to successfully writing the data to the non-volatile memory of the storage device, the controller of the storage device sends to the control unit the complete notification, and
wherein the method further comprises:
in response to receiving the complete notification, updating, by the control unit, the queue pair.
20. The storage medium of
initiating a garbage collection process;
reading, by the control unit, valid data from a plurality of pages of blocks to be recycled, wherein the blocks are associated with the plurality of storage devices;
storing, by the control unit in a temporary data buffer, the valid data read from the plurality of storage devices; and
in response to obtaining a full block of data in the temporary data buffer, writing, by the control unit, the data in the full block to an open block of one of the plurality of storage devices.
|
This disclosure is generally related to the field of data storage. More specifically, this disclosure is related to a system and method of FPGA-executed flash translation layer (FTL) in multiple solid state drives (SSDs).
The proliferation of the Internet and e-commerce continues to create a vast amount of digital content. Various storage systems and servers have been created to access and store such digital content. In cloud or clustered storage systems, multiple applications may share the underlying system resources (e.g., of the storage devices or drives). A storage system or server can include multiple drives (e.g., a solid state drive (SSD)), and a drive can include non-volatile memory such as NAND flash for persistent storage.
Current SSDs can include a flash translation layer (FTL) running in a device (“device-based FTL”). The computation power and capacity of the device controller can be increased by, e.g., placing more microprocessors in the SSD controller and increasing the internal dynamic random access memory (DRAM) capacity of the SSD. However, the device-based FTL of this more powerful SSD is isolated from the host, such that when the logical block address (LBA) is passed into the SSD, the host is left with no knowledge regarding the corresponding physical block address (PBA), i.e., the physical Not-And (NAND) organization for data placement. Thus, this more powerful SSD, with the device-based FTL, is like a black-box system. When one server is equipped with multiple drives (on the order of tens), an individual drive stands alone and has no communication with its peers. Thus, a single slow drive or a minority of drives which experience a fault can result in the degradation of the system performance. Furthermore, distributing the many more microprocessors inside each of the multiple SSDs running the device-based FTL, and installing firmware on each of the microprocessors, is an overdesign which can lead to a reduced write amplification and an increased wear-leveling of the physical NAND flash. This can result in decreased performance and efficiency of the overall storage system.
Current SSDs can also include a flash translation layer (FTL) running on the host side (“host-based FTL”), which can provide the host with visibility into the LBA-to-PBA mapping. However, the host-based FTL SSDs can consume both the resources of the host central processing unit (CPU) and the capacity of the host DRAM. As the capacity of SSDs continues to increase, both the host CPU consumption and the host DRAM utilization increase as well, resulting in a non-trivial resource consumption. While the host-based FTL can provide the host with flexibility and address the black-box challenges associated with the device-based FTL, this non-trivial resource consumption can decrease the efficiency of the overall storage system.
One embodiment facilitates data storage. During operation, the system receives data to be stored in a non-volatile memory of a storage device. The system determines, by a flash translation layer module of a control unit which is distinct from the storage device, a physical page address at which the data is to be stored in the non-volatile memory, wherein the flash translation layer module of the control unit determines physical page addresses for data to be stored in a plurality of storage devices. The system stores, by the flash translation layer module of the control unit, a mapping between a logical page address for the data and the physical page address. The system writes the data to the non-volatile memory at the physical page address.
In some embodiments, in response to a query from a host for the physical page address, the system transmits, by the control unit to the host, the determined physical page address, wherein the data is held in a volatile memory of the host, wherein the data is written directly, based on a direct memory access protocol, from the volatile memory of the host to the non-volatile memory of the storage device at the physical page address, and wherein the host manages a queue pair comprising a submission queue and a completion queue.
In some embodiments, in response to successfully writing the data to the non-volatile memory of the storage device, a controller of the storage device sends a complete notification to the host. In response to receiving the complete notification, the host updates the queue pair.
In some embodiments, the system holds the data in a volatile memory of the control unit, wherein writing the data to the non-volatile memory at the physical page address involves writing the data directly, based on a direct memory access protocol, from the volatile memory of the control unit to the non-volatile memory of the storage device at the physical page address. The system manages, by the control unit, a queue pair comprising a submission queue and a completion queue.
In some embodiments, in response to successfully writing the data to the non-volatile memory of the storage device, a controller of the storage device sends a complete notification to the control unit. In response to receiving the complete notification, the system updates, by the control unit, the queue pair.
In some embodiments, the plurality of storage devices includes the storage device, and a respective storage device does not include a flash translation layer module.
In some embodiments, the system initiates a garbage collection process. The system reads, by the control unit, valid data from a plurality of pages of blocks to be recycled, wherein the blocks are associated with the plurality of storage devices. The system stores, by the control unit in a temporary data buffer, the valid data read from the plurality of storage devices. In response to obtaining a full block of data in the temporary data buffer, the system writes, by the control unit, the data in the full block to an open block of one of the plurality of storage devices.
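The garbage collection flow above can be illustrated with a minimal sketch. It assumes a hypothetical representation in which each block to be recycled is a list of `(logical_page_address, data, valid)` tuples and a block holds a fixed number of pages. The function name `collect` and the default `pages_per_block` are illustrative assumptions and are not taken from this disclosure.

```python
def collect(victim_blocks, pages_per_block=4):
    """Gather valid pages from blocks to be recycled (possibly from several drives).

    victim_blocks: iterable of blocks; each block is a list of
                   (lpa, data, valid) tuples (hypothetical layout).
    Returns (full_blocks, leftover), where each entry of full_blocks is a
    list of (lpa, data) pairs ready to be written to one open block, and
    leftover is the partially filled temporary data buffer.
    """
    buffer = []       # temporary data buffer held by the control unit
    full_blocks = []  # blocks' worth of valid data ready to be written out
    for block in victim_blocks:
        for lpa, data, valid in block:
            if not valid:
                continue  # stale pages are dropped, not copied
            buffer.append((lpa, data))
            if len(buffer) == pages_per_block:
                # A full block has accumulated: hand it off and start over.
                full_blocks.append(buffer)
                buffer = []
    return full_blocks, buffer
```

In this sketch the valid pages of several partially valid blocks are packed together, so a whole block is written at once rather than page by page.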
In some embodiments, writing the data to the non-volatile memory at the physical page address involves writing the logical page address for the data in an out of band region of a page at the physical page address in the non-volatile memory. Furthermore, in response to detecting a power loss or a power failure, the system: reads out the page at the physical page address; obtains the corresponding logical page address previously written in the out of band region of the page; and updates the mapping between the logical page address and the physical page address based on the obtained corresponding logical page address.
In the figures, like reference numerals refer to the same figure elements.
The following description is presented to enable any person skilled in the art to make and use the embodiments, and is provided in the context of a particular application and its requirements. Various modifications to the disclosed embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the present disclosure. Thus, the embodiments described herein are not limited to the embodiments shown, but are to be accorded the widest scope consistent with the principles and features disclosed herein.
Overview
The embodiments described herein provide a system which reduces the consumption of host resources by providing a control unit (such as a field programmable gate array (FPGA) card) which performs computational processing, address mapping, and background process management for multiple storage drives.
As described above, current SSDs can include a flash translation layer (FTL) running in a device (“device-based FTL”). The computation power and capacity of the device controller can be increased by, e.g., placing more microprocessors in the SSD controller and increasing the internal DRAM capacity of the SSD. However, the device-based FTL of this more powerful SSD is isolated from the host, resulting in the host having no visibility into the LBA-to-PBA mapping, i.e., the physical NAND organization for data placement. Thus, this more powerful SSD, with the device-based FTL, is like a black-box system. When one server is equipped with multiple drives (on the order of tens), an individual drive stands alone and has no communication with its peers. Thus, a single slow drive or a minority of drives which experience a fault can result in the degradation of the system performance. Furthermore, distributing the many more microprocessors inside each of the multiple SSDs running the device-based FTL, and installing firmware on each of the microprocessors, is an overdesign which can lead to a reduced write amplification and an increased wear-leveling of the physical NAND flash. This can result in decreased performance and efficiency of the overall storage system.
Current SSDs can also include a flash translation layer (FTL) running on the host side (“host-based FTL”), which can provide the host with visibility into the LBA-to-PBA mapping and more control over the back-end operations of the SSD (e.g., garbage collection). However, the host-based FTL SSDs can consume both the resources of the host CPU and the capacity of the host DRAM. As the capacity of SSDs continues to increase, both the host CPU consumption and the host DRAM utilization increase as well, resulting in a non-trivial resource consumption. While the host-based FTL can provide the host with flexibility and address the lack of host visibility into the LBA-to-PBA mapping associated with the device-based FTL, this non-trivial resource consumption can decrease the efficiency of the overall storage system.
The embodiments described herein address these problems by providing a control unit which can collaborate with the host cores to handle the majority of the input/output (I/O) processing, and can communicate with the controllers of multiple SSDs. The control unit can also include the FTL module which handles the mapping of the logical to physical addresses for the multiple SSDs, and can manage the background processing of the multiple SSDs. Thus, using the control unit with the FTL module—which handles the address-mapping for the multiple SSDs—can eliminate the overhead involved in the current device-based FTL systems. This can result in a reduced consumption of the host resources.
Furthermore, in some embodiments, the control unit (rather than the host CPU) can both perform computational processing on the data to be stored and handle the submission queue (SQ) and the completion queue (CQ) (referred to together as the “queue pair”). This can also reduce the consumption of the host resources.
Thus, the embodiments described herein provide a system with a control unit which performs computational processing, address mapping, and background process management for multiple storage drives, which decreases the host CPU consumption and the host DRAM utilization, and results in an improved and more efficient storage system.
The term “control unit” refers to a component, unit, or module which can perform the operations described herein. In this disclosure, the control unit is illustrated as, e.g., a field programmable gate array (FPGA) card. The control unit can also be incorporated onto an application-specific integrated circuit (ASIC), e.g., as firmware. The operations of the control unit can also be performed by a specific microprocessor with its own low-level operating system, or spread across multiple ASICs or other ICs. The control unit can also be installed as part of or as its own individual hardware, firmware, or software component (or any combination thereof) which communicates with the host and the storage devices in the manner described herein.
The term “queue pair” refers to a submission queue (SQ) and a completion queue (CQ). Commands to be executed are placed in the SQ, while commands which are completed (or an indication that a command has been completed) are placed in the CQ.
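As an illustration only, a queue pair can be modeled in memory as two first-in, first-out queues. The class name `QueuePair` and its methods below are hypothetical and are not part of this disclosure or of any storage interface specification. The sketch shows only the roles of the SQ and the CQ.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class QueuePair:
    """Hypothetical in-memory model of a submission/completion queue pair."""
    sq: deque = field(default_factory=deque)  # commands waiting to be executed
    cq: deque = field(default_factory=deque)  # commands reported as completed

    def submit(self, command):
        """Place a command to be executed in the SQ."""
        self.sq.append(command)

    def fetch(self):
        """Obtain the oldest pending command, or None if the SQ is empty."""
        return self.sq.popleft() if self.sq else None

    def complete(self, command):
        """Record a completed command in the CQ."""
        self.cq.append(command)


# Usage: one write command moves from the SQ to the CQ.
qp = QueuePair()
qp.submit(("write", 0x10))
cmd = qp.fetch()
qp.complete(cmd)
```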
Exemplary Environment in the Prior Art (Device-Based FTL)
Including more microprocessors and increasing the internal DRAM capacity, as well as installing firmware to run the FTL module via the MCUs, can result in a more powerful SSD with a device-based FTL, e.g., by increasing the number of MCUs 122 and the capacity of DRAMs 130 and 132. However, as described above, multiple SSDs which run a respective device-based FTL can result in multiple, isolated black box systems, where each individual drive stands alone and has no communication with its peers. This lack of communication and visibility can result in degradation of the system performance. The overdesign of the distributed multiple microprocessors can also result in a reduced write amplification and an increased wear-leveling of the physical NAND flash. Thus, the more powerful SSDs (such as SSD 100) with the device-based FTL can result in decreased performance and efficiency of the overall storage system.
Exemplary Environment for Facilitating Data Storage (Control Unit-Based FTL)
The embodiments described herein solve the challenges and inefficiencies associated with the device-based FTL by providing a system with a control unit which includes an FTL module that handles and stores the mapping of logical to physical page addresses for multiple storage drives in a storage system. The control unit can be a field programmable gate array (FPGA) card which performs computational processing, address mapping, and background process management for multiple storage drives. For example, the control unit can communicate with the host CPUs to handle a significant amount (e.g., a large majority) of I/O processing by performing computation or processing of incoming I/O data (as described below in relation to
During operation, the system can store the L2P mapping information in DRAM (e.g., DRAM 230 or 232), and can update the L2P mapping information based on NAND address operations. The system can periodically write the stored mapping information from the volatile memory (DRAM 230 or 232) to the non-volatile memory (table NOR 244). This periodic writing or flushing can be based on a predetermined time period or interval, and can also be based on reaching a predetermined size of the table (or other data structure) which stores the L2P mapping information.
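The two flush triggers described above (an elapsed time interval and a size threshold) can be sketched as follows. This is a minimal illustration under several assumptions: the class name `L2PTable`, the parameter names, and the default values are hypothetical. The size threshold is interpreted here as the number of entries updated since the last flush, and a plain dictionary stands in for the non-volatile table.

```python
import time


class L2PTable:
    """Hypothetical L2P table with time-based and size-based flushing."""

    def __init__(self, flush_interval_s=5.0, flush_size=1024, clock=time.monotonic):
        self.volatile = {}      # full mapping held in volatile memory (e.g., DRAM)
        self.dirty = {}         # entries changed since the last flush
        self.persistent = {}    # stand-in for the non-volatile copy of the table
        self.flush_interval_s = flush_interval_s
        self.flush_size = flush_size
        self.clock = clock
        self.last_flush = clock()

    def update(self, lpa, ppa):
        """Record a mapping and flush if either trigger has fired."""
        self.volatile[lpa] = ppa
        self.dirty[lpa] = ppa
        self.maybe_flush()

    def maybe_flush(self):
        """Flush dirty entries when the size or time threshold is reached."""
        now = self.clock()
        size_hit = len(self.dirty) >= self.flush_size
        time_hit = now - self.last_flush >= self.flush_interval_s
        if size_hit or time_hit:
            self.persistent.update(self.dirty)
            self.dirty.clear()
            self.last_flush = now
            return True
        return False
```

Passing the clock as a parameter is a design choice of the sketch, so the time-based trigger can be exercised without waiting.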
Furthermore, when the system writes one physical page of data to the non-volatile memory (e.g., NAND flash) of an associated storage drive (not shown), the system can also write the corresponding logical page address into the same physical page's out-of-band (OOB) region to maintain the mapping information. If the system experiences a power loss or a power failure, the system can construct the most recent mapping table by reading out the mapping information previously written in the physical page stored in the NAND flash.
The system can thus provide power loss protection by writing the logical page address into the OOB region of a same corresponding physical page. Moreover, by storing the L2P mapping information in table NOR 244, the system can accelerate the loading of multiple high-capacity storage drives. As a result, the embodiments described herein can result in reducing the amount of resources consumed by the host and by each specific SSD, which can lead to an improved and more efficient overall storage system.
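Rebuilding the mapping from the OOB regions after a power loss can be sketched as a scan over the programmed pages. The `Page` structure and the `rebuild_l2p` function are hypothetical. The write sequence number `seq` is an assumption of this sketch, not something the disclosure describes: it is one possible way to choose the newest copy when the same logical page address appears in more than one physical page.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Page:
    data: bytes
    oob_lpa: Optional[int] = None  # logical page address stored in the OOB region
    seq: int = 0                   # assumed write sequence number (hypothetical)


def rebuild_l2p(pages: Dict[int, Page]) -> Dict[int, int]:
    """Reconstruct the logical-to-physical mapping from OOB metadata.

    pages maps a physical page address to the page read back from flash.
    """
    l2p: Dict[int, int] = {}
    newest: Dict[int, int] = {}  # lpa -> highest sequence number seen so far
    for ppa, page in pages.items():
        if page.oob_lpa is None:
            continue  # page carries no mapping information
        lpa = page.oob_lpa
        if lpa not in newest or page.seq > newest[lpa]:
            newest[lpa] = page.seq
            l2p[lpa] = ppa
    return l2p
```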
Exemplary Environment: Control Unit with FTL
Environment 300 depicts communications involved in handling the I/O from the host CPU without any further computation or processing of data by the FPGA. During operation, the system can receive data to be written to a non-volatile memory of a storage device (e.g., to NAND flash of an SSD). The system can hold the data in the host DIMM (as data 312 in DIMM 310). CPU 302 can send to FPGA 322 a write request and a query to obtain a physical page address to which to write the associated stored data (e.g., data 312 in DIMM 310, which has a certain logical page address) (via a communication 350). FPGA 322 can receive the write request and query (communication 350), and FTL module 328 can determine and assign the physical page address for the associated stored data. FPGA 322 can return the assigned physical page address to CPU 302 (communication 350). CPU 302 can check the returned physical page address, and place in SQ 314 a command to write the requested data at the returned physical page address (via a communication 352).
The host (via CPU 302) can work with the SSDs (via, e.g., SSD controllers 332 and 342) to write the data to SSDs 330 and 340. For example, SSD controller 332 can obtain the placed write command from SQ 314, and execute the write command by writing data 312 to the NAND flash of SSD 330 (e.g., to a block 334 of the NAND flash of SSD 330). Similarly, SSD controller 342 can obtain another placed write command from SQ 314, and execute the other write command by writing (part of) data 312 to the NAND flash of SSD 340 (e.g., to a block 344 of the NAND flash of SSD 340). Writing data 312 to block 334 (via a communication 354) or to block 344 (via a communication 356) can be based on a direct memory access (DMA) protocol. Upon successfully executing the write command, SSD controller 332 can send to the host (via a communication 358) a complete notification, which can be a message which causes the host to place the completed command into CQ 314. Similarly, upon successfully executing the other write command, SSD controller 342 can send to the host (via a communication 360) a complete notification, which can be a message which causes the host to place the completed command into CQ 314.
For a read operation, host CPU 302 can send to FPGA 322 a read request and a query (via communication 350) to obtain the physical page address associated with the data to be read (e.g., data previously stored in block 344 of NAND flash of SSD 340 via communication 356). FPGA 322 can receive the read request and query (communication 350), and can determine and return to CPU 302 (via communication 350) the physical page address associated with the logical page address for the data to be read. CPU 302 can check the returned physical page address, and place in SQ 314 a command to read the requested data at the returned physical page address (via a communication 352). SSD controller 342 can obtain the placed read command from SQ 314, and execute the read command by reading data stored in block 344 of the NAND flash of SSD 340. Reading data from block 344 and placing it in DIMM 310 can be based on a DMA protocol. Upon successfully executing the read command, SSD controller 342 can send to the host (via a communication 360) a complete notification, which can be a message which causes the host to place the completed command into CQ 314.
Thus, environment 300 depicts communications involved in handling the I/O from the host CPU without any further computation or processing of data by the control unit, where the control unit is an FPGA which assigns and stores the L2P mapping for data stored in the non-volatile memory of the plurality of storage drives (e.g., the NAND flash of SSDs 330 and 340). In environment 300, the host can hold, in a temporary data buffer of its DIMM (e.g., as data 312), the data to be written to the NAND flash of the SSDs. The host can also store and manage the queue pair (e.g., SQ/CQ 314).
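The division of labor in this environment can be sketched in a simplified form: the control unit assigns and records the physical page address, while the host holds the data and manages the queue pair. All names below (`ControlUnitFTL`, `host_write`, `drive_service`) are hypothetical, and the "least-used drive" placement policy is an assumption of the sketch. A dictionary copy stands in for the DMA transfer, and appending directly to the CQ stands in for the complete notification sent to the host.

```python
from collections import deque


class ControlUnitFTL:
    """Hypothetical FTL that assigns page addresses across several drives."""

    def __init__(self, drive_ids, pages_per_drive):
        self.l2p = {}                               # lpa -> (drive, page)
        self.next_free = {d: 0 for d in drive_ids}  # next unwritten page per drive
        self.pages_per_drive = pages_per_drive

    def assign(self, lpa):
        # Assumed policy: choose the drive with the fewest pages used so far.
        drive = min(self.next_free, key=self.next_free.get)
        page = self.next_free[drive]
        if page >= self.pages_per_drive:
            raise RuntimeError("no free pages")
        self.next_free[drive] = page + 1
        self.l2p[lpa] = (drive, page)
        return (drive, page)

    def lookup(self, lpa):
        return self.l2p[lpa]


def host_write(ftl, sq, lpa):
    """Host queries the control unit, then places a write command in its SQ."""
    ppa = ftl.assign(lpa)
    sq.append(("write", lpa, ppa))
    return ppa


def drive_service(sq, cq, host_memory, nand):
    """Drive controllers drain the SQ and report each completion."""
    while sq:
        op, lpa, ppa = sq.popleft()
        if op == "write":
            nand[ppa] = host_memory[lpa]  # stands in for DMA from the host DIMM
        cq.append((op, lpa, ppa))         # stands in for the complete notification
```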
By placing the FTL in a control unit which manages the L2P mapping for multiple storage drives, the embodiments described herein alleviate the burden of storing the FTL in the DRAM of each SSD or in the DRAM of the host (as in the prior art). This reduces resource consumption on the host side, which can result in an improved and more efficient storage system.
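One way to picture a single FTL serving several drives is the sketch below. The class name `SharedFtl`, the round-robin placement across devices, and the simple next-free-page counter are all assumptions made for illustration; the disclosure does not specify a placement policy.

```python
from itertools import cycle


class SharedFtl:
    """One L2P table serving several drives (hypothetical sketch)."""

    def __init__(self, device_ids, pages_per_device):
        device_ids = list(device_ids)
        self._next_free = {dev: 0 for dev in device_ids}
        self._pages_per_device = pages_per_device
        self._rotation = cycle(device_ids)  # assumed round-robin placement
        self._device_count = len(device_ids)
        self.l2p = {}  # lpa -> (device_id, ppa)

    def assign(self, lpa):
        """Pick a physical page for `lpa` and store the mapping."""
        for _ in range(self._device_count):
            dev = next(self._rotation)
            ppa = self._next_free[dev]
            if ppa < self._pages_per_device:
                self._next_free[dev] = ppa + 1
                self.l2p[lpa] = (dev, ppa)
                return dev, ppa
        raise RuntimeError("no free pages on any device")


# Usage: four logical pages spread across two drives of two pages each.
ftl = SharedFtl([330, 340], pages_per_device=2)
placements = [ftl.assign(lpa) for lpa in (10, 11, 12, 13)]
```

Because the table lives in one place, neither the host nor the individual drives need to hold a copy of it in this sketch, which is the resource saving the paragraph above describes.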
Exemplary Environment: Control Unit with FTL, Queue Pair, and Data to be Written to a Storage Device
In some embodiments, the control unit performs computational processing, address mapping, and background process management across and for multiple storage drives.
Environment 400 can include a host with a CPU 402 and DIMMs 404, 406, 408, and 410. DIMM 406 can include organizational data 407. Environment 400 can also include a control unit (e.g., FPGA 422), multiple storage drives (e.g., SSD 430), and a network interface card (NIC) 440 via which network traffic is received. FPGA 422 can include DRAM 424, including an FTL module 428 and a queue pair which comprises a submission queue (SQ)/completion queue (CQ) 429. SSD 430 can include an SSD controller 432. Similar to SSDs 330 and 340 of
Environment 400 depicts communications involved in handling the I/O from the host CPU with further computation or processing of data by the FPGA. During operation, the system can receive data to be written to a non-volatile memory of a storage device (e.g., to NAND flash of an SSD). The system can receive the data via NIC 440 (via a DMA communication 452), and can hold the data in the DRAM of FPGA 422 (as data 426 in DRAM 424). FPGA 422 can communicate with DIMM 406 to determine organizational data relating to data 426 (via a communication 450). FPGA 422, via FTL module 428, can determine the physical page address to which to write the associated stored data (e.g., data 426 in DRAM 424). FPGA 422 can also place in SQ 429 a command to write the requested data at the determined physical page address.
FPGA 422 can work with the SSDs (via, e.g., SSD controller 432) to write the data to SSD 430. For example, SSD controller 432 can obtain the placed write command from SQ 429, and execute the write command by writing, based on a DMA protocol, data 426 to the NAND flash of SSD 430 (e.g., to a block 434 of the NAND flash of SSD 430). Upon successfully executing the write command, SSD controller 432 can send to FPGA 422 (via communication 456) a complete notification, which can be a message which causes FPGA 422 to place the completed command into CQ 429.
Thus, environment 400 depicts how the control unit performs computational processing and address mapping for multiple storage devices, which results in alleviating the load on the host CPU and DRAM. By placing in the DRAM of the control unit the elements previously placed in the host (i.e., the FTL module, the queue pair, and the data to be written to persistent storage), the embodiments described herein result in reducing both the amount of resources consumed by the host CPU and the amount of space utilized by the host DRAM.
Exemplary Method for Facilitating Data Storage
The system determines, by the FTL module of the control unit, the PPA (operation 536). The system manages, by the control unit, the queue pair (i.e., the submission queue and the completion queue) (operation 538). The system writes, by the control unit, the data to the SSD (by working with the SSD controller) (operation 540). The system confirms, by the SSD, that the data has been written to the SSD by sending a confirmation to the CPU (if the operation is preceded by and reached from operation 516) or the control unit (if the operation is preceded by and reached from operation 540) (operation 542). The system processes, by the CPU or the control unit, the completion queue based on the confirmation (operation 544), and the operation returns.
Improved Garbage Collection: Exemplary Environment and Method
Thus, the embodiments described herein provide improved garbage collection, because the control unit can read valid data on a page-by-page basis from multiple blocks to be recycled from multiple SSDs and can also write data to an open block of an SSD on a block-by-block basis. By reading pages out from multiple SSDs, rather than from a single SSD, the system can more easily and quickly form a whole block of data to be written to an open block as part of the garbage collection process. This can reduce the likelihood that a block is left open. Furthermore, by reducing the number of open blocks, the system can increase the reliability of the data stored in the non-volatile memory.
If the system determines that a full block is obtained in the temporary data buffer (decision 708), the system writes, by the control unit, the data from the full block to an open block of one of the plurality of storage devices (operation 710), and the operation returns.
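The buffer-filling step of this garbage collection process can be sketched as follows. The function name `collect_valid_pages`, the page representation as `(data, is_valid)` tuples, and the block size of four pages are assumptions chosen for illustration only.

```python
from typing import Iterable, List, Tuple

PAGES_PER_BLOCK = 4  # assumed block size, kept small for illustration


def collect_valid_pages(
    victims: Iterable[Tuple[int, List[Tuple[bytes, bool]]]],
    pages_per_block: int = PAGES_PER_BLOCK,
) -> Tuple[List[List[bytes]], List[bytes]]:
    """Gather valid pages from blocks to be recycled on several SSDs.

    `victims` yields (device_id, pages), where each page is (data, is_valid).
    Returns the full blocks ready to be written to open blocks, plus any
    leftover pages still waiting in the temporary data buffer.
    """
    buffer: List[bytes] = []
    full_blocks: List[List[bytes]] = []
    for _device_id, pages in victims:
        for data, is_valid in pages:
            if not is_valid:
                continue  # invalid pages are not copied
            buffer.append(data)  # page-by-page read into the buffer
            if len(buffer) == pages_per_block:
                full_blocks.append(buffer)  # full block: ready to write
                buffer = []
    return full_blocks, buffer


# Usage: valid pages from two SSDs combine into one full block.
victims = [
    (330, [(b"a", True), (b"x", False), (b"b", True)]),
    (340, [(b"c", True), (b"d", True), (b"e", True)]),
]
full, leftover = collect_valid_pages(victims)
```

Neither SSD alone supplies four valid pages in this example, but together they fill a block, which is the effect of reading across multiple SSDs that the text describes.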
Exemplary Computer System and Apparatus
Content-processing system 818 can include instructions, which when executed by computer system 800, can cause computer system 800 to perform methods and/or processes described in this disclosure. For example, content-processing system 818 can include instructions for receiving and transmitting data packets, including a request to write or read data, an I/O request, data to be encoded and stored, a block or a page of data, a PPA, an LPA, and a mapping (communication module 820). Content-processing system 818 can further include instructions for receiving data to be stored in a non-volatile memory of a storage device (communication module 820). Content-processing system 818 can further include instructions for managing and updating a queue pair (queue pair-managing module 822).
Control unit 830 can include a volatile memory 832 and a non-volatile memory 834. Volatile memory 832 can include an FTL module, DRAM for holding data to be stored in persistent non-volatile memory, and a queue pair. Non-volatile memory 834 can include L2P mapping information (e.g., as table NOR 244 of
Storage device 850 can include a controller 852 and a non-volatile memory 854. Storage device 850 can include instructions, which when executed by storage device 850, can cause storage device 850 to perform methods and/or processes described in this disclosure. Storage device 850 can include instructions for obtaining commands from a queue pair of computer system 800 or control unit 830 (communication module 860). Storage device 850 can include instructions for executing the obtained (I/O) commands (data-writing and data-reading module 862). Storage device 850 can include instructions for sending a complete notification to the host or the control unit (communication module 860).
Data 824, 848, and 864 can include any data that is required as input or that is generated as output by the methods and/or processes described in this disclosure. Specifically, data 824, 848, and 864 can store at least: data to be stored, written, loaded, moved, retrieved, accessed, deleted, or copied; a temporary data buffer; an indicator of a controller of a storage device; a physical page of data; a block of data; an acknowledgment that data is successfully committed or has been written to a non-volatile memory; an indicator of a detected power loss; an indicator of a control unit, an SSD, and a host; a table; a data structure; a physical page address (PPA); a logical page address (LPA); a flash translation layer; a mapping between an LPA and a PPA; valid data; invalid data; an indicator of a background process or a garbage collection process; a trigger or condition to begin a background process or a garbage collection process; an out of band region; an indicator of a power loss or a power failure; a queue pair; a submission queue; and a completion queue.
The data structures and code described in this detailed description are typically stored on a computer-readable storage medium, which may be any device or medium that can store code and/or data for use by a computer system. The computer-readable storage medium includes, but is not limited to, volatile memory, non-volatile memory, magnetic and optical storage devices such as disk drives, magnetic tape, CDs (compact discs), DVDs (digital versatile discs or digital video discs), or other media capable of storing computer-readable media now known or later developed.
The methods and processes described in the detailed description section can be embodied as code and/or data, which can be stored in a computer-readable storage medium as described above. When a computer system reads and executes the code and/or data stored on the computer-readable storage medium, the computer system performs the methods and processes embodied as data structures and code and stored within the computer-readable storage medium.
Furthermore, the methods and processes described above can be included in hardware modules. For example, the hardware modules can include, but are not limited to, application-specific integrated circuit (ASIC) chips, field-programmable gate arrays (FPGAs), and other programmable-logic devices now known or later developed. When the hardware modules are activated, the hardware modules perform the methods and processes included within the hardware modules.
The foregoing embodiments described herein have been presented for purposes of illustration and description only. They are not intended to be exhaustive or to limit the embodiments described herein to the forms disclosed. Accordingly, many modifications and variations will be apparent to practitioners skilled in the art. Additionally, the above disclosure is not intended to limit the embodiments described herein. The scope of the embodiments described herein is defined by the appended claims.
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Dec 19 2018 | LI, SHU | Alibaba Group Holding Limited | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 048009 | /0961 | |
Jan 04 2019 | Alibaba Group Holding Limited | (assignment on the face of the patent) | / |