Aspects of a storage device including a plurality of dies and a controller are provided which allow for asymmetric die operation handling so that controller overheads associated with common resource intensive operations may be incurred in the background without delaying subsequent die operations. When the controller receives a command to perform an MLC operation such as programming a number of dies, the controller refrains from performing the MLC operation in one or more of the dies for a period of time while simultaneously performing the MLC operation in a remainder of the dies. Instead, the controller performs another operation, such as an SLC operation, another MLC operation, or a transfer operation, that involves a common resource in these dies during the period of time. Controller overheads associated with these other operations are thus incurred without creating bottlenecks when the number of dies is large, thereby improving storage device performance.
|
1. A storage device, comprising:
a plurality of dies each including a plurality of multi-level cells (MLCs) and a plurality of single-level cells (SLCs); and
a controller configured to refrain from programming data in the MLCs of a number of the dies for a period of time while simultaneously programming one or more of the MLCs in each of a remainder of the dies, and to perform an SLC operation in the number of the dies during the period of time, wherein the number of the dies is based on a length of the period of time and a controller overhead associated with performing the SLC operation.
17. A storage device, comprising:
a plurality of dies each including a plurality of multi-level cells (MLCs) and a plurality of single-level cells (SLCs); and
a controller configured to receive an MLC program command, to refrain from programming data in the MLCs of a number of the dies for a period of time while simultaneously programming one or more of the MLCs in each of a remainder of the dies, and to perform one of an SLC operation, a different MLC operation, or a transfer operation in the number of the dies during the period of time, wherein the number of the dies is based on a length of the period of time and a controller overhead associated with performing the one of the SLC operation, the different MLC operation, or the transfer operation.
8. A storage device, comprising:
a plurality of dies each including a plurality of multi-level cells (MLCs) and a plurality of single-level cells (SLCs); and
a controller configured to refrain from performing a first MLC operation in a number of the dies for a period of time while simultaneously performing the first MLC operation in one or more of the MLCs in each of a remainder of the dies, and to perform one of an SLC operation, a second MLC operation different than the first MLC operation, or a transfer operation in the number of the dies during the period of time, wherein the number of the dies is based on a length of the period of time and a controller overhead associated with performing the one of the SLC operation, the second MLC operation, or the transfer operation.
2. The storage device of
3. The storage device of
4. The storage device of
5. The storage device of
6. The storage device of
7. The storage device of
wherein the controller is further configured to refrain from programming the data in the MLCs of additional ones of the dies for another period of time while simultaneously programming one or more of the MLCs in each of another remainder of the dies, and to perform other SLC operations in the additional ones of the dies during the another period of time.
9. The storage device of
10. The storage device of
11. The storage device of
12. The storage device of
13. The storage device of
14. The storage device of
15. The storage device of
16. The storage device of
wherein the controller is further configured to refrain from performing additional MLC operations in additional ones of the dies for another period of time while simultaneously performing the additional MLC operations in one or more of the MLCs in each of another remainder of the dies, and to perform other SLC operations in the additional ones of the dies during the another period of time.
18. The storage device of
19. The storage device of
|
This disclosure is generally related to electronic devices and more particularly to storage devices.
Storage devices enable users to store and retrieve data. Examples of storage devices include non-volatile memory devices. A non-volatile memory generally retains data after a power cycle. An example of a non-volatile memory is a flash memory, which may include array(s) of NAND cells on one or more dies. Flash memory may be found in solid-state devices (SSDs), Secure Digital (SD) cards, and the like.
A flash storage device may store control information associated with data. For example, a flash storage device may maintain control tables that include a mapping of logical addresses to physical addresses. These control tables are used to track the physical location of logical sectors, or blocks, in the flash memory. The control tables are stored in the non-volatile memory to enable access to the stored data after a power cycle.
When writing data to cells of the flash memory, the flash storage device may identify the physical address of a block associated with a logical address, transfer the data to a number of data latches, and then program the data from the latches to the cells of the block at the identified physical address. Similarly, when reading data from cells of the flash memory, the flash storage device may identify the physical address of the block, sense the stored data in the block at the identified address into the data latches, and then read the data from the latches into a controller of the flash storage device. Including more dies containing these blocks in various types of NAND storage devices (e.g. SSDs, micro SD cards, Universal Serial Bus (USB) drives) may increase the storage capacity of such devices. Moreover, reducing the number of data latches in flash storage devices may save the costs of designing such devices.
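By way of illustration only, the following Python sketch models the write and read paths just described (identify a physical location, move data through the data latches, then program or return it). The class, method, and parameter names are hypothetical and do not represent actual device firmware.

```python
# Conceptual model of the write/read paths described above (illustrative only).
class DieModel:
    def __init__(self, num_latches=2):
        self.latches = [None] * num_latches    # data latches
        self.cells = {}                        # physical address -> stored data

    def program(self, physical_addr, data):
        self.latches[0] = data                         # transfer data into a latch
        self.cells[physical_addr] = self.latches[0]    # program latch contents into the cells
        self.latches[0] = None                         # latch is freed after programming

    def read(self, physical_addr):
        self.latches[0] = self.cells[physical_addr]    # sense cell data into a latch
        return self.latches[0]                         # controller reads data from the latch

die = DieModel()
die.program(physical_addr=0x100, data=b"example data")
print(die.read(physical_addr=0x100))    # b'example data'
```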
Moreover, the flash storage device may suspend and resume program operations (i.e. to perform other operations such as sense operations) within a given time window. During this window, the firmware or controller overhead of the flash storage device for performing operations (e.g. the time used by the controller to prepare and send an operation command to a die) may be incurred in the background while operations themselves are being performed in the foreground. Such hiding of controller overheads (in the background) may reduce latency in completing the operations and improve the performance of the flash storage device. However, as the number of data latches is reduced to save cost, the length of the suspend resume window may be similarly reduced, thus resulting in less opportunity to hide controller overheads behind other operations. This effect on performance may become more significant as the number of dies in the flash storage device increases to improve storage capacity, thereby increasing the amount of controller overhead that may be incurred.
One aspect of a storage device is disclosed herein. The storage device includes a plurality of dies and a controller. The plurality of dies each include a plurality of multi-level cells (MLCs) and a plurality of single-level cells (SLCs). The controller is configured to receive an MLC program command, to refrain from programming data in the MLCs of one of the dies for a period of time while simultaneously programming one or more of the MLCs in each of a remainder of the dies, and to perform an SLC operation in the one of the dies during the period of time.
Another aspect of a storage device is disclosed herein. The storage device includes a plurality of dies and a controller. The plurality of dies each include a plurality of MLCs and a plurality of SLCs. The controller is configured to refrain from performing a first MLC operation in one of the dies for a period of time while simultaneously performing the first MLC operation in one or more of the MLCs in each of a remainder of the dies, and to perform one of an SLC operation, a second MLC operation different than the first MLC operation, or a transfer operation in the one of the dies during the period of time.
A further aspect of a storage device is disclosed herein. The storage device includes a plurality of dies and a controller. The plurality of dies each include a plurality of MLCs and a plurality of SLCs. The controller is configured to receive an MLC program command, to refrain from programming data in the MLCs of a number of the dies for a period of time while simultaneously programming one or more of the MLCs in each of a remainder of the dies, and to perform one of an SLC operation, a different MLC operation, or a transfer operation in the number of the dies during the period of time.
It is understood that other aspects of the storage device will become readily apparent to those skilled in the art from the following detailed description, wherein various aspects of apparatuses and methods are shown and described by way of illustration. As will be realized, these aspects may be implemented in other and different forms and their several details are capable of modification in various other respects. Accordingly, the drawings and detailed description are to be regarded as illustrative in nature and not as restrictive.
Various aspects of the present invention will now be presented in the detailed description by way of example, and not by way of limitation, with reference to the accompanying drawings, wherein:
The detailed description set forth below in connection with the appended drawings is intended as a description of various exemplary embodiments of the present invention and is not intended to represent the only embodiments in which the present invention may be practiced. The detailed description includes specific details for the purpose of providing a thorough understanding of the present invention. However, it will be apparent to those skilled in the art that the present invention may be practiced without these specific details. In some instances, well-known structures and components are shown in block diagram form in order to avoid obscuring the concepts of the present invention. Acronyms and other descriptive terminology may be used merely for convenience and clarity and are not intended to limit the scope of the invention.
The words “exemplary” and “example” are used herein to mean serving as an example, instance, or illustration. Any exemplary embodiment described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other exemplary embodiments. Likewise, the term “exemplary embodiment” of an apparatus, method or article of manufacture does not require that all exemplary embodiments of the invention include the described components, structure, features, functionality, processes, advantages, benefits, or modes of operation.
As used herein, the term “coupled” is used to indicate either a direct connection between two components or, where appropriate, an indirect connection to one another through intervening or intermediate components. In contrast, when a component is referred to as being “directly coupled” to another component, there are no intervening elements present.
In the following detailed description, various aspects of a storage device in communication with a host device will be presented. These aspects are well suited for flash storage devices, such as SSDs and SD cards. However, those skilled in the art will realize that these aspects may be extended to all types of storage devices capable of storing data. Accordingly, any reference to a specific apparatus or method is intended only to illustrate the various aspects of the present invention, with the understanding that such aspects may have a wide range of applications without departing from the spirit and scope of the present disclosure.
When a controller of the storage device writes data into cells of memory during a program operation, the controller may suspend the program operation, perform another operation such as reading data from the cells of the memory, and then resume the suspended program operation. When operating within a suspend resume window, the controller may incur overheads by serially creating and sending operation commands to the dies to be handled in parallel. However, as the number of data latches in the storage device is reduced to save cost, the size of the suspend resume window may shrink. Therefore, the firmware overheads incurred by the controller may be too long to hide within the small suspend resume window, impacting performance of the storage device.
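As a simplified illustration of this constraint (assuming, hypothetically, a 50 μs SLC read and a 10 μs controller overhead, consistent with the roughly one-to-five overhead-to-sense ratio discussed later in this disclosure), the following sketch checks whether a read can still be hidden inside the suspend resume window:

```python
# Illustrative check: an SLC read can be serviced inside the suspend/resume
# window only if the read time plus its controller overhead still fits in the
# remaining window. Numeric values are assumptions, not measured figures.
def can_hide_slc_read(window_remaining_us, slc_read_time_us=50, overhead_us=10):
    return window_remaining_us >= slc_read_time_us + overhead_us

print(can_hide_slc_read(window_remaining_us=200))   # True: many latches, long window
print(can_hide_slc_read(window_remaining_us=40))    # False: fewer latches, window too small
```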
In an attempt to maximize performance of parallel die operations, one approach may be to hide the controller overhead for subsequent NAND operations behind current NAND operations, regardless of (e.g. outside of) suspend resume windows. For example, while the storage device in the foreground is handling one page of data across multiple dies, the controller in the background may serially create context information and send operation commands to the dies to handle a subsequent page of data in parallel. Such an approach may be effective in reducing latency for a small number of dies. However, as the number of dies in the storage device is increased to improve storage capacity, the controller overheads may similarly increase (e.g. lengthen in amount of time). Although the larger number of controller overheads may still be hidden behind longer or slower operations (e.g. program operations), such overheads may not be successfully hidden behind shorter or faster operations (e.g. sense operations). For example, the time for the controller to serially create and send instructions in the background to the larger number of dies to read a subsequent page may be longer than the time for the dies in the foreground to complete a read operation of a current page. As a result, some controller overheads that have not been processed in the background may remain after the read operation is completed and therefore may be processed in the foreground, thereby causing a bottleneck that delays the time to read the subsequent page and that reduces the performance of the storage device.
To address such delays due to larger numbers of dies, the storage device described in the present disclosure performs different operations in one or more of the dies asymmetrically with respect to each other. For example, when the controller receives a command from a host device to perform a QLC program operation in multiple dies, then rather than waiting to complete the program operation for all of the pages in all of the dies prior to performing an SLC read or other central processing unit (CPU)-intensive operation (e.g. as illustrated in
While the above example refers specifically to CPU-intensive operations such as SLC read operations, the present disclosure may similarly be applied to operations involving other common resources such as controller random access memory (RAM) buffers and direct memory access (DMA) speeds. For instance, the controller may intentionally delay programming data in one or more of the dies in order to transfer that data to a buffer in RAM for data relocation while the other dies are busy programming. Moreover, the controller may intentionally delay programming one or more of the dies in order to receive or transmit data using DMA while the other dies are busy programming. By refraining from attempting to perform CPU-intensive operations, or other operations involving common resources, in all dies at the same time, the storage device may scatter the controller overheads associated with these operations such that they occur in the background during different die operations at different times, thereby reducing operation latency and improving storage device performance.
Those of ordinary skill in the art will appreciate that other exemplary embodiments can include more or fewer elements than those shown in
The host device 104 may store data to, and/or retrieve data from, the storage device 102. The host device 104 may include any computing device, including, for example, a computer server, a network attached storage (NAS) unit, a desktop computer, a notebook (e.g., laptop) computer, a tablet computer, a mobile computing device such as a smartphone, a television, a camera, a display device, a digital media player, a video gaming console, a video streaming device, or the like. The host device 104 may include at least one processor 101 and a host memory 103. The at least one processor 101 may include any form of hardware capable of processing data and may include a general purpose processing unit (such as a central processing unit (CPU)), dedicated hardware (such as an application specific integrated circuit (ASIC)), digital signal processor (DSP), configurable hardware (such as a field programmable gate array (FPGA)), or any other form of processing unit configured by way of software instructions, firmware, or the like. The host memory 103 may be used by the host device 104 to store data or instructions processed by the host or data received from the storage device 102. In some examples, the host memory 103 may include non-volatile memory, such as magnetic memory devices, optical memory devices, holographic memory devices, flash memory devices (e.g., NAND or NOR), phase-change memory (PCM) devices, resistive random-access memory (ReRAM) devices, magnetoresistive random-access memory (MRAM) devices, ferroelectric random-access memory (F-RAM), and any other type of non-volatile memory devices. In other examples, the host memory 103 may include volatile memory, such as random-access memory (RAM), dynamic random access memory (DRAM), static RAM (SRAM), and synchronous dynamic RAM (SDRAM) (e.g., DDR1, DDR2, DDR3, DDR3L, LPDDR3, DDR4, and the like). The host memory 103 may also include both non-volatile memory and volatile memory, whether integrated together or as discrete units.
The host interface 106 is configured to interface the storage device 102 with the host 104 via a bus/network 108, and may interface using, for example, Ethernet or WiFi, or a bus standard such as Serial Advanced Technology Attachment (SATA), PCI express (PCIe), Small Computer System Interface (SCSI), or Serial Attached SCSI (SAS), among other possible candidates. Alternatively, the host interface 106 may be wireless, and may interface the storage device 102 with the host 104 using, for example, cellular communication (e.g. 5G NR, 4G LTE, 3G, 2G, GSM/UMTS, CDMA One/CDMA2000, etc.), wireless distribution methods through access points (e.g. IEEE 802.11, WiFi, HiperLAN, etc.), Infra Red (IR), Bluetooth, Zigbee, or other Wireless Wide Area Network (WWAN), Wireless Local Area Network (WLAN), Wireless Personal Area Network (WPAN) technology, or comparable wide area, local area, and personal area technologies.
As shown in the exemplary embodiment of
The storage device 102 also includes a volatile memory 118 that can, for example, include a Dynamic Random Access Memory (DRAM) or a Static Random Access Memory (SRAM). Data stored in volatile memory 118 can include data read from the NVM 110 or data to be written to the NVM 110. In this regard, the volatile memory 118 can include a buffer 121 (e.g. a write buffer or a read buffer) for temporarily storing data. While
The memory (e.g. NVM 110) is configured to store data 119 received from the host device 104. The data 119 may be stored in the cells 116 of any of the memory locations 112. As an example,
Each of the data 119 may be associated with a logical address. For example, the NVM 110 may store a logical-to-physical (L2P) mapping table 120 for the storage device 102 associating each data 119 with a logical address. The L2P mapping table 120 stores the mapping of logical addresses specified for data written from the host 104 to physical addresses in the NVM 110 indicating the location(s) where each of the data is stored. This mapping may be performed by the controller 123 of the storage device. The L2P mapping table may be a table or other data structure which includes an identifier such as a logical block address (LBA) associated with each memory location 112 in the NVM where data is stored. While
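By way of a simplified illustration, the L2P mapping described above behaves like a lookup table keyed by logical address; the dictionary and physical-address format below are hypothetical stand-ins for the table stored in the NVM.

```python
# Toy L2P mapping: logical block address (LBA) -> physical location in the NVM.
l2p_mapping_table = {
    0: {"die": 0, "block": 4, "page": 16},   # illustrative physical addresses
    1: {"die": 1, "block": 7, "page": 3},
}

def translate(lba):
    # The controller translates a host-specified logical address into the
    # physical address where the associated data is actually stored.
    return l2p_mapping_table[lba]

print(translate(0))   # {'die': 0, 'block': 4, 'page': 16}
```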
Referring back to
The NVM 110 includes sense amplifiers 124 and data latches 126 connected to each memory location 112. For example, the memory location 112 may be a block including cells 116 on multiple bit lines, and the NVM 110 may include a sense amplifier 124 on each bit line. Moreover, one or more data latches 126 may be connected to the bit lines and/or sense amplifiers. The data latches may be, for example, shift registers. When data is read from the cells 116 of the memory location 112, the sense amplifiers 124 sense the data by amplifying the voltages on the bit lines to a logic level (e.g. readable as a ‘0’ or a ‘1’), and the sensed data is stored in the data latches 126. The data is then transferred from the data latches 126 to the controller 123, after which the data is stored in the volatile memory 118 until it is transferred to the host device 104. When data is written to the cells 116 of the memory location 112, the controller 123 stores the programmed data in the data latches 126, and the data is subsequently transferred from the data latches 126 to the cells 116.
The storage device 102 includes a controller 123 which includes circuitry such as one or more processors for executing instructions and can include a microcontroller, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), hard-wired logic, analog circuitry and/or a combination thereof.
The controller 123 is configured to receive data transferred from one or more of the cells 116 of the various memory locations 112 in response to a read command. For example, the controller 123 may read the data 119 by activating the sense amplifiers 124 to sense the data from cells 116 into data latches 126, and the controller 123 may receive the data from the data latches 126. The controller 123 is also configured to program data into one or more of the cells 116 in response to a write command. For example, the controller 123 may write the data 119 by sending data to the data latches 126 to be programmed into the cells 116. The controller 123 is further configured to access the L2P mapping table 120 in the NVM 110 when reading or writing data to the cells 116. For example, the controller 123 may receive logical-to-physical address mappings from the NVM 110 in response to read or write commands from the host device 104, identify the physical addresses mapped to the logical addresses identified in the commands (e.g. translate the logical addresses into physical addresses), and access or store data in the cells 116 located at the mapped physical addresses.
The controller 123 and its components may be implemented with embedded software that performs the various functions of the controller described throughout this disclosure. Alternatively, software for implementing each of the aforementioned functions and components may be stored in the NVM 110 or in a memory external to the storage device 102 or host device 104, and may be accessed by the controller 123 for execution by the one or more processors of the controller 123. Alternatively, the functions and components of the controller may be implemented with hardware in the controller 123, or may be implemented using a combination of the aforementioned hardware and software.
In operation, the host device 104 stores data in the storage device 102 by sending a write command to the storage device 102 specifying one or more logical addresses (e.g., LBAs) as well as a length of the data to be written. The interface element 106 receives the write command, and the controller allocates a memory location 112 in the NVM 110 of storage device 102 for storing the data. The controller 123 stores the L2P mapping in the NVM (and the cache 122) to map a logical address associated with the data to the physical address of the memory location 112 allocated for the data. The controller also stores the length of the L2P mapped data. The controller 123 then stores the data in the memory location 112 by sending it to one or more data latches 126 connected to the allocated memory location, from which the data is programmed to the cells 116.
The host 104 may retrieve data from the storage device 102 by sending a read command specifying one or more logical addresses associated with the data to be retrieved from the storage device 102, as well as a length of the data to be read. The interface 106 receives the read command, and the controller 123 accesses the L2P mapping in the cache 122 or otherwise the NVM to translate the logical addresses specified in the read command to the physical addresses indicating the location of the data. The controller 123 then reads the requested data from the memory location 112 specified by the physical addresses by sensing the data using the sense amplifiers 124 and storing them in data latches 126 until the read data is returned to the host 104 via the host interface 106.
When the controller 123 reads data from or writes data to a page 316 of cells 302 (i.e. on a word line 304, 408), the controller may send a command to apply a read voltage or program voltage to the selected word line and a pass through voltage to the other word lines. The read or programmed state of the cell (e.g. a logic ‘0’ or a logic ‘1’ for SLCs) may then be determined based on a threshold voltage of the cells 302. For example, during an SLC read operation, if the threshold voltage of a cell 302 is smaller than the read voltage (i.e. current flows through the cell in response to the read voltage), the controller 123 may determine that the cell stores a logic ‘1’, while if the threshold voltage of the cell 302 is larger than the read voltage (i.e. current does not flow through the cell in response the read voltage), the controller 123 may determine that the cell stores a logic ‘0’. Similarly, during an SLC program operation, the controller may store a logic ‘0’ by sending a command to apply the program voltage to the cell 302 on the word line 304, 408 until the cell reaches the threshold voltage, and during an erase operation, the controller may send a command to apply an erase voltage to the block 402 including the cells 302 (e.g. to a substrate of the cells such as a p-well) until the cells reduce back below the threshold voltage (back to logic ‘1’).
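The SLC read decision above reduces to a threshold comparison, shown in the following sketch; the voltage values are illustrative only.

```python
# SLC read sketch: a cell whose threshold voltage is below the applied read
# voltage conducts and is read as logic '1'; otherwise it is read as logic '0'.
def slc_read(cell_threshold_v, read_voltage_v=2.0):
    return 1 if cell_threshold_v < read_voltage_v else 0

print(slc_read(cell_threshold_v=0.5))   # 1 (erased cell conducts)
print(slc_read(cell_threshold_v=3.0))   # 0 (programmed cell does not conduct)
```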
For cells that store multiple bits (e.g. MLCs, TLCs, etc.), each word line 304, 408 may include multiple pages 316 of cells 302, and the controller may similarly send commands to apply read or program voltages to the word lines to determine the read or programmed state of the cells based on a threshold voltage of the cells. For instance, in the case of TLCs, each word line 304, 408 may include three pages 316, including a lower page (LP), a middle page (MP), and an upper page (UP), respectively corresponding to the different bits stored in the TLC. When programming TLCs, the LP may be programmed first, followed by the MP and then the UP. For example, a program voltage may be applied to the cell on the word line 304, 408 until the cell reaches a first intermediate threshold voltage corresponding to a least significant bit (LSB) of the cell. Next, the LP may be read to determine the first intermediate threshold voltage, and then a program voltage may be applied to the cell on the word line until the cell reaches a second intermediate threshold voltage corresponding to a next bit of the cell (between the LSB and the most significant bit (MSB)). Finally, the MP may be read to determine the second intermediate threshold voltage, and then a program voltage may be applied to the cell on the word line until the cell reaches the final threshold voltage corresponding to the MSB of the cell. Similarly, when reading TLCs, the controller 123 may read the LP to determine whether the LSB stores a logic 0 or 1 depending on the threshold voltage of the cell, the MP to determine whether the next bit stores a logic 0 or 1 depending on the threshold voltage of the cell, and the UP to determine whether the final bit stores a logic 0 or 1 depending on the threshold voltage of the cell.
When the controller 123 attempts to program cells 116, 302 of a selected word line 304, 408 into one of the program states 504, the controller may perform incremental step pulse programming (ISPP) over a number of programming loops or ISPP cycles. For example, a programming voltage (e.g. a high voltage) may be applied to the selected word line 304, 408, a pass through voltage (e.g. a high voltage lower than the programming voltage) may be applied to the other word lines 304, 408, a bit line program voltage (e.g. a low voltage) may be applied on the bit lines 306, 406 connected to the selected cells being programmed on the selected word line, and a bit line inhibit voltage (e.g. a high voltage) may be applied on the bit lines 306, 406 connected to the other cells not being programmed on the selected word line. Applying a high programming voltage to the selected word line and a low voltage to the selected bit lines allows electrons to tunnel from the channel into the charge trapping layer of those selected cells, thereby causing the threshold voltage of the cells to increase. On the other hand, applying a high voltage to unselected bit lines inhibits electrons from tunneling from the channel into the charge trapping layer of those unselected cells, thereby preventing the threshold voltage of those cells from increasing. Thus, bit lines coupled to cells programmed to lower states may be inhibited to prevent the threshold voltage of those cells from increasing while other cells are programmed to higher states. For instance, in the case of TLCs, the bit lines of cells that are first programmed into the A state may be inhibited first, followed by the bit lines of different cells that are programmed into the B state, followed by those that reach the C state, then the D state, and so forth until the remaining cells on the selected word line ultimately reach the G state and all cells on the selected word line have been programmed.
After the programming voltage is applied in one programming loop or ISPP cycle, a program verify voltage (e.g. a low voltage) may be applied to the word line 304, 408 to determine whether the threshold voltage of a cell has increased beyond a respective threshold voltage into an intended program state. If none of the cells have transitioned into an intended programming state, then another programming loop or ISPP cycle is performed in which a higher programming voltage may be applied to further increase the threshold voltage of the cells. Subsequently, a program verify voltage may again be applied to determine whether the threshold voltage of a cell has transitioned into an intended program state. The above process of incrementally increasing the programming voltage and verifying the voltage threshold of the selected cells may be repeated over a number of programming loops. If the cells transition into their respective programming states and the total number of programming loops does not exceed a predetermined loop count, the controller may determine that the cells have entered their intended program states and are thus successfully programmed.
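The ISPP sequence above may be summarized in the following sketch; the starting voltage, step size, loop limit, and the linear cell response are illustrative assumptions rather than characteristics of any particular memory.

```python
# Simplified ISPP loop: apply a program pulse, verify, and repeat with an
# incrementally higher programming voltage until the cell reaches its intended
# state or the predetermined loop count is exceeded.
def ispp_program(target_threshold_v, start_v=16.0, step_v=0.5, max_loops=20):
    cell_threshold_v = 0.0
    program_voltage_v = start_v
    for loop in range(1, max_loops + 1):
        cell_threshold_v += 0.1 * program_voltage_v      # toy model of the program pulse
        if cell_threshold_v >= target_threshold_v:       # program verify step
            return loop                                  # programmed within the loop budget
        program_voltage_v += step_v                      # ISPP increment for the next loop
    raise RuntimeError("program failure: loop count exceeded")

print(ispp_program(target_threshold_v=4.0))   # number of programming loops used
```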
When the controller 123 performs a program operation in a die as described above (e.g. using ISPP), data 119 is transferred into data latches 126 and programmed into the cells 116, 302 of that die. For example, when programming a TLC, data may be stored in a number of latches (e.g. including latches corresponding to the LP, MP, and UP), transferred from the latches to the TLC via the bit line 306, 406, and then programmed using applied voltages on the word line 304, 408 and bit line until the TLC transitions into a respective program state (e.g. A-G). Similarly, when programming a QLC, data may be stored in a number of latches, transferred to the QLC via the bit line, and programmed using applied voltages on the word line and bit line until the QLC transitions into a respective program state (e.g. A-N).
While the data latches 126 are occupied with data for programming, the die may be in a cache busy state. When one of the latches later becomes free (e.g. after programming), the die may enter a cache release state. A cache release may initiate a suspend resume window, during which the controller 123 may suspend the program operation, perform another operation such as an SLC read using the free latch, and then resume the program operation after completing the other operation. For instance,
However, when the number of data latches 126 is reduced to save design costs for the storage device 102, the amount of time that the die may be in cache busy state may be increased (since the time to free one of the data latches after programming may be longer when fewer data latches exist). Therefore, the time before a cache release occurs may be lengthened, resulting in a smaller suspend resume window during the program operation. For instance,
To achieve maximum performance, dies may be operated in parallel. For example, when the controller 123 sends a command to multiple dies to program or read the cells 116, 302 in NAND memory, the total NAND execution time of all dies may be equal to the NAND execution time of one of the dies when the dies are all operating in parallel. However, as the controller 123 sends commands to the dies serially (e.g. one at a time), the amount of time for the controller to issue the program or read command to each die (e.g. the firmware or controller overhead) may be multiplied by the number of dies. Thus, even if dies operate in parallel, multiple controller overheads (one for each die) may be incurred.
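A back-of-envelope model of this effect follows; the 10 μs overhead and 50 μs sense time are illustrative values echoing the roughly one-to-five overhead-to-sense ratio mentioned later in this disclosure.

```python
# Even with fully parallel die operation, commands are issued serially, so the
# aggregate controller overhead grows linearly with the number of dies and
# stays hidden only while it fits behind the current foreground operation.
def overheads_hidden(num_dies, overhead_per_die_us, foreground_op_time_us):
    return num_dies * overhead_per_die_us <= foreground_op_time_us

print(overheads_hidden(4, 10, 50))    # True: 40 us of overhead hides behind a 50 us sense
print(overheads_hidden(32, 10, 50))   # False: 320 us of overhead spills into the foreground
```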
When the number of dies performing a current NAND operation is small, the controller overheads for a subsequent NAND operation may be hidden behind (i.e. performed in the background during) the current NAND operation. For example, while a current page is being read by multiple dies in the foreground, the controller 123 may in the background issue commands to the dies to read a subsequent page.
In this example 700, the controller 123 may receive a command from the host device 104 to program N pages of data across multiple dies in parallel. For instance, the controller may program a metapage (e.g. multiple pages 316 of cells 116, 302 across dies 708) within a metablock (e.g. a group of blocks 402 between different dies 708) of a metadie (e.g. a group of dies 708). In response to the host command, the controller may incur controller overhead 706 associated with page N−1 for dies 0 and 1. For example, the controller may issue a command to die 0 to program some cells in page N−1, followed by a command to die 1 to program other cells in page N−1. The dies 708 may then program the cells of the page N−1 in parallel in response to the commands. In the meanwhile, while page N−1 is being programmed in the foreground, the controller may in the background incur controller overhead 706 associated with page N for dies 0 and 1. For example, the controller may issue a command to die 0 to program some cells in next page N, followed by a command to die 1 to program other cells in next page N, while page N−1 is still being programmed. When page N−1 has completed programming, the dies 708 may then program the cells of the page N in parallel. This process may repeat for subsequent pages until all of the program operations 702 are completed.
After completing the program operations 702, the controller 123 may read X pages of data 119 in parallel across the dies 708. For instance, the controller may read multiple pages 316 of cells 116, 302 within the metapage/block/die to verify whether programming is successful. Accordingly, the controller may incur controller overhead 706 associated with page X−1 for dies 0 and 1. For example, the controller may issue a command to die 0 to read some cells in page X−1, followed by a command to die 1 to read other cells in page X−1. The dies 708 may then sense the cells of the page X−1 in parallel in response to the commands. In the meanwhile, while page X−1 is being sensed in the foreground, the controller may in the background incur controller overhead 706 associated with next page X for dies 0 and 1. For example, the controller may issue a command to die 0 to read some cells in next page X, followed by a command to die 1 to read other cells in next page X, while page X−1 is still being sensed. When page X−1 has completed sensing, the controller overheads 706 for the dies associated with page X may have all been incurred (due to there being few dies), and the dies 708 may then proceed to sense the cells of the page X in parallel. This process may repeat for subsequent pages until all of the read operations 704 are completed.
In the example 700 of
In this example 800, the controller 123 may receive a command from the host device 104 to program N pages of data across multiple dies in parallel, similar to the example 700 of
However, unlike the example 700 of
When MLC operations are performed, the controller CPU(s) (e.g. firmware), DMA, low density parity check (LDPC), and other system components of the storage device 102 may be idle. Moreover, MLC operations such as TLC or QLC program operations may typically be followed by sense operations. Examples of sense operations may include Enhanced Post Write Read (EPWR), header verification, and other SLC reads (e.g. host reads, relocation reads, control information/L2P reads, etc.). For instance, in EPWR, after each program operation, the controller 123 may sense the data 119 that is programmed to verify whether programming is successful. Similarly in header verification, after programming data 119 in SLCs, the controller 123 may fold the data from the SLCs to MLCs along with header data that is read/verified after folding for use in subsequent updating of the L2P mapping table 120, 205.
Typically, after the controller 123 triggers MLC program operations on all dies together (e.g. the blocks are grouped in a meta-block across the dies), the controller performs other operations (e.g. EPWR, header verification, other SLC reads, etc.) when the system components of the storage device are no longer idle. However, similar to the example 800 of
In this example 900, the controller 123 may receive a command from the host device 104 to program N pages of data across multiple dies in parallel, similar to the example 800 of
To address such bottlenecks, the controller 123 may perform program operations 602, 652, 702, 802, 902 (e.g. of MLCs, TLCs, QLCs, PLCs, etc.) asymmetrically across dies 708, 808, 908. For example, instead of initially programming all dies at the same time (e.g. in a meta-page across blocks of different dies) such as described above with respect to
Accordingly, even when all dies have a pending MLC program operation, the controller 123 may intentionally withhold the program operation for one or more dies, and in the meanwhile perform other operation(s) on the one or more dies, to prevent controller overhead bottlenecks from occurring as described above. Such an approach may be advantageous over other approaches which merely focus on maximizing die utilization, e.g. where one of the dies does not have a pending QLC program operation and so the controller performs another operation on the unutilized die. Moreover, the controller may consider the operations performed on all of the dies (e.g. the total controller overheads which may be incurred in all dies) as part of its determination whether to withhold program operation(s) for one or more dies and thereby prevent CPU or common-resource intensive operations from occurring simultaneously on all dies. Such an approach may be advantageous over other approaches that consider dies independently with respect to each other and merely issue a pending operation to a die if the die is free, in which case controller overhead bottlenecks may still occur. As a result, the storage device 102 may remove bottlenecks that may be caused by multiple dies attempting CPU or common resource intensive operations at the same time. Such an advantage may reduce latency and provide a significant boost in performance, as opposed to, for example, simply freeing SLC space on one or more dies while QLC program operations are occurring on other dies.
For instance, referring to the example 1000 of
Next, the controller 123 may issue commands to most of the dies 1008 to program page N−1, including die 1. However, in this case, the controller may select a different die to refrain from programming page N−1 (i.e. die 30), and instead issue a command to that die to perform a SLC read operation at this time (e.g. as referenced by controller overhead 1006 for D30). Accordingly, the die 30 may perform the SLC read operation while the other dies 0-29 and 31 are programming page N−1 (or N−2). Once the die 30 has completed the SLC read operation, the controller may issue the command to die 30 to program page N−1, and the die 30 may proceed accordingly. As a result, the controller overhead 1006 associated with the SLC read operation is small enough (i.e. there is just one overhead for D30) to hide behind the program operation for page N−2, and thus no bottleneck may occur that may delay programming next page N−1. A similar benefit may occur when the controller subsequently selects another die (e.g. die 31) to perform an SLC read operation while other dies are programming the next page (e.g. page N), and the process may repeat with different dies until all the pages have been successfully programmed.
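By way of illustration, the rotation described above may be sketched as follows; the scheduling function, die count, and queued operations are hypothetical, and the withheld die would subsequently be issued the same page's program command once its SLC read completes (not modeled here).

```python
# For each page, one die (rotating from page to page) is withheld from the MLC
# program and issued an SLC read instead, so only a single controller overhead
# must be hidden behind the ongoing program operation in the other dies.
def schedule_page(page, dies, pending_slc_reads):
    withheld = dies[page % len(dies)]          # rotate the withheld die per page
    plan = {}
    for die in dies:
        if die == withheld and pending_slc_reads:
            plan[die] = ("slc_read", pending_slc_reads.pop(0))
        else:
            plan[die] = ("mlc_program", page)
    return plan

dies = list(range(4))                          # small die count for readable output
pending = ["EPWR read", "header verification", "relocation read"]
print(schedule_page(0, dies, pending))         # die 0 performs an SLC read this page
print(schedule_page(1, dies, pending))         # die 1 performs an SLC read next page
```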
While the example 1000 of
In the example 1000 described above, the common-resource intensive operation 1004 may be a CPU-intensive operation such as a sense operation. Sense operations may include, for example, MLC reads during EPWR, header reads (for header verification), other internal reads (e.g. relocation reads, read scrub operations, etc.), or host reads. Such sense operations may include a relatively low ratio of controller overhead 1006 to operation time 1010 (e.g. ⅕ or 10 μs/50 μs). However, in other examples, the CPU-intensive operation may be a write operation that is performed in lower cost controllers, which may include a relatively higher ratio of controller overhead 1006 to operation time 1010 (e.g. due to longer controller overheads from less CPU speed). For instance, assume the storage device 102 includes 16 dies, that the common resource intensive operation 1004 is a SLC write operation, that the time 1010 to perform the SLC write operation (to program data in SLCs of one of the dies) is 140 μs, and that the controller overhead 1006 associated with performing the SLC write operation (the time to issue the command to the die to program the data) is 30 μs (due to less CPU speed). In such case, the controller may select four dies (140 μs/30 μs˜4) to refrain from page programming and instead to perform the SLC write operation, since the controller overhead 1006 for all four dies may be successfully hidden behind the time 1010 for completing the single SLC operation. Thus, assuming that there is pending work for all sixteen dies such as a QLC program operation, the remaining 12 dies may perform their respective QLC operations with respect to a particular page while the selected 4 dies are performing their respective SLC write operations. Similarly, for the next page, the controller may select another four dies (different than the previous four) to refrain from page programming and to perform SLC writing accordingly, and the controller may repeat the process for subsequent pages by rotating between different dies until all the pages have been fully programmed.
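The sizing rule in this example may be expressed as a short calculation: roughly as many controller overheads fit behind a single operation as the operation time divided by the overhead, which bounds the number of dies that can be diverted at once. The helper below is a hypothetical illustration using the 140 μs and 30 μs figures from the example.

```python
# Number of dies that can be diverted to the common-resource operation per page:
# approximately the operation time divided by its controller overhead, capped
# by the total number of dies.
def dies_to_divert(op_time_us, overhead_us, total_dies):
    return min(total_dies, op_time_us // overhead_us)

print(dies_to_divert(op_time_us=140, overhead_us=30, total_dies=16))   # 4 of 16 dies
```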
Furthermore, while the example 1000 described above refers to the common-resource intensive operation 1004 as a CPU-intensive operation (i.e. the operation 1004 involves limited processing power as a common resource shared across dies), the common resource may not be so limited. For example, the common-resource intensive operation 1004 may be another operation that involves common resources shared across multiple dies, such as buffer(s) (e.g. buffer 121) or controller RAM (e.g. volatile memory 118). In one example, when the controller performs data relocation, data may be read from the QLCs into the buffer(s) of controller RAM and then written from the buffer(s) into different QLCs. If relocation is to be performed for multiple dies in parallel, then the buffer(s) may not be large enough to store all of the data to be relocated for all of the dies. Accordingly, instead of performing a relocation operation at the same time for all of the dies, the controller may asymmetrically perform the relocation operations such that the buffer transfers are limited to a number of dies based on the buffer size. For instance, if the buffer size is 128 KB and each relocation operation requires 128 KB of QLC data (or other amount enough for one die), then the controller may select one die at a time for the data relocation and refrain from relocating the data in other dies in the meanwhile. In another example, when the controller attempts to transfer data (e.g. using DMA) from the controller RAM to the data latches 126, or vice-versa, in a large number of dies prior to performing program operation in each die, the delay in waiting for the transfer to complete for all dies may cause a similar bottleneck. Accordingly, instead of performing the DMA operation at the same time for all of the dies, the controller may asymmetrically perform the DMA operations such that the DMA transfers are limited to a number of dies.
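A similar calculation bounds the buffer-limited case above; the function below is an illustrative sketch using the 128 KB figures from the example.

```python
# The shared controller RAM buffer bounds how many dies' relocation (or DMA)
# transfers can be staged at once; the remaining dies are deferred.
def max_parallel_transfers(buffer_size_kb, per_die_chunk_kb):
    return buffer_size_kb // per_die_chunk_kb

print(max_parallel_transfers(buffer_size_kb=128, per_die_chunk_kb=128))   # 1 die at a time
```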
The controller 1102 may receive an MLC program command 1116 from the host device 1106 to program one or more pages of the MLCs 1110 in the blocks 1108 of the dies 1104. For instance, the dies 1104 may be grouped into a meta-die, and the MLC program command may include a logical address corresponding to a meta-block (e.g. a group of the blocks 1108 across the dies 1104) along with the data to be programmed into the MLCs 1110 (e.g. in one or more meta-pages or groups of pages of MLCs 1110 across the dies). In response to the MLC program command 1116, the controller 1102 may perform an MLC operation 1118 in most of the dies 1104. For example, the MLC operation 1118 may correspond to program operation 1002 in
However, the controller may select one or more of the dies 1104 to refrain from performing the MLC operation 1118, and instead may perform a common resource intensive operation 1120 in the selected die. The common resource intensive operation 1120 may correspond to common resource intensive operation 1004 in
After the selected die(s) 1104 complete performance of the common resource intensive operation(s) 1120, the controller may proceed to perform the MLC operation 1118 in those die(s). For instance, after a selected die completes reading SLCs 1112 in response to an SLC read operation, the controller may then issue a command to the selected die to program the MLCs 1110 in response to the MLC program command 1116. The controller may similarly repeat the above process for other dies until the MLC program command 1116 has been completely processed. For instance, if the controller is to program multiple pages of data in response to the MLC program command, the controller may select for each page a different group of dies 1104 to withhold programming and instead perform common resource intensive operations in those selected dies during that time. Thus, the controller overhead(s) associated with issuing the common resource intensive operation(s) 1120 may be hidden behind foreground operations, thereby preventing bottlenecks from occurring which delay execution of subsequent MLC operations 1118.
As represented by block 1202, the controller may receive an MLC program command to program a plurality of dies. The plurality of dies may each include a plurality of MLCs and a plurality of SLCs. For example, referring to
As represented by block 1204, the controller may refrain from performing a first MLC operation in one or more of the dies for a period of time while simultaneously performing the first MLC operation in each of a remainder of the dies. For instance, referring to
As represented by block 1206, the controller may perform an SLC operation, a second MLC operation, or a transfer operation in the one or more of the dies during the period of time. In one example, the SLC operation may comprise one of a read operation or a write operation. For instance, referring to
In another example, as represented by block 1208, the controller may include a volatile memory having a buffer, and the controller may read data from the one or more of the dies into the buffer during the second MLC operation. For instance, referring to
In another example, as represented by block 1210, the controller may receive or transmit data from or to the one or more dies during the second MLC operation. For instance, referring to
As represented by block 1212, the controller may perform the first MLC operation in the one or more dies after performing the SLC operation, the second MLC operation, or the transfer operation. For example, referring to
As represented by block 1214, the controller may suspend performing the first MLC operation in the one or more dies while performing another SLC operation during a suspend resume window. Then, as represented by block 1216, the controller may resume performing the first MLC operation during the suspend resume window after performing the another SLC operation. For instance, referring to
As represented by block 1218, the controller may refrain from performing another MLC operation in another one or more of the dies for another period of time while simultaneously performing the another MLC operation in each of another remainder of the dies. For example, referring to
As represented by block 1220, the controller may perform another SLC operation in the another one or more of the dies during the another period of time. For instance, referring to
In another example, the controller may refrain from performing additional MLC operations in additional ones of the dies for another period of time while simultaneously performing the additional MLC operations in one or more of the MLCs in each of another remainder of the dies. For example, referring to
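The overall flow of blocks 1202 through 1220 may be consolidated into the following sketch. The helper names, die count, and group rotation are hypothetical, and the printed ordering only indicates which commands are issued; in practice the withheld dies perform their common resource operations while the remainder of the dies are programming.

```python
# Consolidated sketch of the asymmetric handling flow: for each page of an MLC
# program command, a rotating group of dies is withheld and given a common
# resource operation (an SLC operation, a different MLC operation, or a
# transfer operation), after which the withheld dies are also programmed.
def issue_mlc_program(dies, page):
    print(f"MLC program of page {page} issued to dies {dies}")

def issue_common_resource_op(die, op):
    print(f"{op} issued to die {die}")

def handle_mlc_program_command(dies, num_pages, group_size, pending_ops):
    for page in range(num_pages):
        start = (page * group_size) % len(dies)
        withheld = [dies[(start + i) % len(dies)] for i in range(group_size)]
        remainder = [d for d in dies if d not in withheld]

        issue_mlc_program(remainder, page)              # program the remainder of the dies
        for die in withheld:                            # other operation in the withheld dies
            if pending_ops:
                issue_common_resource_op(die, pending_ops.pop(0))
        issue_mlc_program(withheld, page)               # program the withheld dies afterward

handle_mlc_program_command(dies=list(range(8)), num_pages=2, group_size=2,
                           pending_ops=["SLC read", "SLC write", "DMA transfer"])
```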
As a result of asymmetric die handling of common-resource intensive operations as described above, bottlenecks may be removed and performance of the storage device may thereby be improved. For example, assume the common-resource intensive operation is an SLC read operation (e.g. in EPWR). Then for TLCs, assuming a total TLC program time of 691.2 ms for all dies, a controller overhead of 6 μs for each SLC operation, a total number of sense operations of 1152 per die, and an average sense time per die of 70 μs, the performance gain of the storage device 102 that may be achieved through asymmetric TLC operation handling may be 3.9% for 16 dies and 22% for 32 dies. Similarly for QLCs, assuming a total QLC program time of 3456 ms for all dies, a controller overhead of 7 μs for each SLC operation, a total number of sense operations of 1536 per die, and an average sense time per die of 130 μs, the performance gain of the storage device 102 that may be achieved through asymmetric QLC operation handling may be 4.2% for 32 dies.
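The 16-die TLC figure above can be approximately reproduced under an additional assumption not spelled out in the text: that with symmetric handling each sense exposes whatever portion of the serial per-die overheads does not fit behind a single sense, and that the gain is measured against the total time with asymmetric handling. The sketch below is only a rough model; the 32-die and QLC figures depend on further parameters (such as how the sense count scales) and are not reproduced here.

```python
# Rough model of the 16-die TLC gain: exposed overhead per sense is the serial
# overhead for all dies minus one sense time, accumulated over all senses and
# compared against the program-plus-sense time with asymmetric handling.
def tlc_gain(num_dies, program_ms=691.2, senses_per_die=1152,
             sense_us=70.0, overhead_us=6.0):
    exposed_us = max(0.0, num_dies * overhead_us - sense_us)      # per-sense bottleneck
    bottleneck_ms = exposed_us * senses_per_die / 1000.0
    asymmetric_total_ms = program_ms + senses_per_die * sense_us / 1000.0
    return bottleneck_ms / asymmetric_total_ms

print(f"{tlc_gain(16):.1%}")   # ~3.9%, matching the 16-die figure above
```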
Accordingly, the storage device described in the present disclosure may provide sustained performance gain for the storage device without increasing controller cost, which may be especially advantageous for low cost controllers. Background operations involving more CPU overheads may be performed using asymmetric operation handling, resulting in improved Quality of Service (QoS). Moreover, bottlenecks may be removed not only in connection with CPU overheads, but also in connection with other shared resources such as buffers and DMA speed.
The various aspects of this disclosure are provided to enable one of ordinary skill in the art to practice the present invention. Various modifications to exemplary embodiments presented throughout this disclosure will be readily apparent to those skilled in the art, and the concepts disclosed herein may be extended to other magnetic storage devices. Thus, the claims are not intended to be limited to the various aspects of this disclosure, but are to be accorded the full scope consistent with the language of the claims. All structural and functional equivalents to the various components of the exemplary embodiments described throughout this disclosure that are known or later come to be known to those of ordinary skill in the art are expressly incorporated herein by reference and are intended to be encompassed by the claims. Moreover, nothing disclosed herein is intended to be dedicated to the public regardless of whether such disclosure is explicitly recited in the claims. No claim element is to be construed under the provisions of 35 U.S.C. § 112(f) in the United States, or an analogous statute or rule of law in another jurisdiction, unless the element is expressly recited using the phrase “means for” or, in the case of a method claim, the element is recited using the phrase “step for.”
Sharma, Amit, Agarwal, Dinesh Kumar, Venugopal, Abhinandan