Examples described herein utilize multi-layer neural networks, such as multi-layer recurrent neural networks, to decode encoded data (e.g., data encoded using one or more encoding techniques). The neural networks and/or recurrent neural networks have nonlinear mapping and distributed processing capabilities which are advantageous in many systems employing the neural network decoders and/or recurrent neural networks. In this manner, neural networks or recurrent neural networks described herein are used to implement error correction coding (ECC) decoders.
|
1. An apparatus, comprising:
a first stage of circuitry configured to receive encoded data, to combine the encoded data, and to evaluate at least one non-linear function using combinations of the encoded data and delayed versions of the combinations of the encoded data to provide intermediate data; and
at least a second stage of circuitry configured to receive the intermediate data and combine the intermediate data using a set of predetermined weights to provide decoded data, the set of predetermined weights based at least in part on an encoding technique associated with the encoded data, wherein the first stage of circuitry and second stage of circuitry comprises a first plurality of multiplication/accumulation units, the first plurality of multiplication/accumulation units each configured to multiply at least one bit of the encoded data with at least one of the set of predetermined weights and sum multiple weighted bits of the encoded data.
15. A method comprising:
mixing encoded data received at a processing unit using a plurality of coefficients and an additional plurality of coefficients, both the plurality of coefficients and the additional plurality of coefficients specific to an encoding technique associated with the encoded data, wherein mixing the encoded data comprises:
calculating, at a first layer of multiplication/accumulation processing units (mac units) of a plurality of mac units, the encoded data and delayed versions of respective outputs of the first layer of mac units with the plurality of coefficients to generate first processing results;
calculating, at additional layers of mac units of the plurality of mac units, the first processing results and delayed versions of at least a portion of the first processing results with the additional plurality of coefficients to generate second processing results; and
providing output data based partly on the second processing results, the output data representative of the encoded data being decoded, such that the output data is an estimate of decoded data based on the encoded data.
9. A method comprising:
receiving, at a computing device that comprises a neural network, signaling indicative of a set of data pairs each comprising known encoded data and decoded data, wherein the known encoded data is encoded with a particular encoding technique,
determining a set of weights for the neural network to decode data with the particular encoding technique based partly on the signaling, coefficient multiplication results associated with the signaling, and delayed versions of the coefficient multiplication results associated with the signaling;
receiving signaling, from a memory of the computing device, indicative of data encoded with the particular encoding technique;
decoding the data using the neural network using the weights, wherein decoding the data using the neural network comprises weighting bits of the data using at least some of the set of weights, and summing selected weighted bits of the data to provide intermediate data, then weighting bits of the intermediate data using at least some of the weights and summing selected weighted bits of the intermediate data and selected delayed versions of the weighted bits of the intermediate data to provide the decoded data; and
writing the decoded data to or reading the decoded data from a memory or storage medium of the computing device.
2. The apparatus of
3. The apparatus of
4. The apparatus of
5. The apparatus of
6. The apparatus of
7. The apparatus of
8. The apparatus of
10. The method of
determining multiple sets of weights for the neural network using additional encoded data, decoded data pairs encoded with other encoding techniques, and delayed versions of additional coefficient multiplication results based on the additional encoded data and the decoded data pairs encoded with the other encoding techniques, wherein the multiple sets of weights each correspond with a different encoding technique;
selecting the set of weights associated with the particular encoding technique; and
providing the set of weights to the neural network or another neural network for use in decoding the data.
11. The method of
receiving further input data encoded with another encoding technique;
selecting a selected set of weights of the multiple sets of weights associated with the another encoding technique;
providing the selected set of weights to the neural network; and
decoding the further input data using the selected set of weights.
12. The method of
13. The method of
14. The method of
16. The method of
delaying, at respective delay units associated with the first layer of mac units, the respective outputs of the first layer of mac units to generate the delayed versions of the respective outputs of the first layer of mac units.
17. The method of
multiplying the encoded data and the delayed versions of the respective outputs of the first layer of mac units with respective coefficients of the plurality of coefficients to generate the first processing results.
18. The method of
obtaining, at the processing unit, signaling indicative of the encoded data from a memory coupled, the signaling indicative of the encoded data including an indication from a touchscreen of a mobile communication device that the encoding technique was utilized.
19. The method of
|
Examples described herein relate to neural networks for use in decoding encoded data. Examples of neural networks are described which may be used with error-correcting coding (ECC) memory, where a neural network may be used to decode encoded data.
Error correction coding (ECC) may be used in a variety of applications, such as memory devices or wireless baseband circuitry. Generally, error correction coding techniques may encode original data with additional bits to describe the original bits which are intended to be stored, retrieved, and/or transmitted. The additional bits may be stored together with the original bits. Accordingly, there may be L bits of original data to be stored and/or transmitted. An encoder may provide N-L additional bits, such that the encoded data may be N bits worth of data. The original bits may be stored as the original bits, or may be changed by the encoder to form the encoded N bits of stored data. A decoder may decode the N bits to retrieve and/or estimate the original L bits, which may be corrected in some examples in accordance with the ECC technique.
Bit flips (e.g., a change in charge at a memory cell) are an occurrence in non-volatile memory devices, where ECC may be applied. Thus, memory devices may operate with complex error correction techniques whose area and power needs are rising, resulting in higher-cost silicon and longer firmware development times.
Multi-layer neural networks and/or multi-layer recurrent neural networks may be used to decode encoded data (e.g., data encoded using one or more encoding techniques). Such neural networks may have nonlinear mapping and distributed processing capabilities which may be advantageous in many systems employing the neural network decoders. In this manner, neural networks described herein may be used to implement error correction coding (ECC) decoders.
An encoder may have L bits of input data (a1, a2, . . . aL). The encoder may encode the input data in accordance with an encoding technique to provide N bits of encoded data (b1, b2, . . . bN). The encoded data may be stored and/or transmitted, or some other action taken with the encoded data, which may introduce noise into the data. Accordingly, a decoder may receive a version of the N bits of encoded data (x1, x2, . . . xN). The decoder may decode the received encoded data into an estimate of the L bits original data (y1, y2, . . . yL).
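By way of illustration only, the encode, store/transmit, and decode flow described above may be sketched as follows. The sketch assumes a simple repetition code with majority-vote decoding, which is not one of the encoding techniques named herein and was chosen only because it is short. The function names `encode`, `noisy_channel`, and `decode`, the flip probability, and the bit values are hypothetical.

```python
import random

def encode(bits, repeat=3):
    """Encode L bits into N = repeat * L bits by repeating each bit."""
    return [b for b in bits for _ in range(repeat)]

def noisy_channel(bits, flip_prob, rng):
    """Flip each bit independently with probability flip_prob (models bit flips)."""
    return [b ^ 1 if rng.random() < flip_prob else b for b in bits]

def decode(received, repeat=3):
    """Majority-vote each group of `repeat` bits to estimate one original bit."""
    groups = [received[i:i + repeat] for i in range(0, len(received), repeat)]
    return [1 if sum(g) * 2 > repeat else 0 for g in groups]

a = [1, 0, 1, 1]                               # L = 4 bits of input data
b = encode(a)                                  # N = 12 bits of encoded data
x = noisy_channel(b, 0.05, random.Random(0))   # received version of the encoded data
y = decode(x)                                  # estimate of the L original bits
print(len(a), len(b), y)
```

In this sketch a single flipped bit within any group of three is corrected, while two flips in the same group are not; a trained neural network decoder as described herein would take the place of `decode`.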
Examples of wireless baseband circuitry may utilize error correction coding (such as low density parity check coding, LDPC). An encoder may add particularly selected N-L bits into an original data of L bits, which may allow a decoder to decode the data and reduce and/or minimize errors introduced by noise, interferences and other practical factors in the data storage and transmission.
There are a variety of particular error correction coding techniques, including low density parity check coding (LDPC), Reed-Solomon coding, Bose-Chaudhuri-Hocquenghem (BCH), and Polar coding. The use of these coding techniques, however, may come at the cost of decreased frequency, channel, and/or storage resource usage efficiency and increased processing complexity. For example, the use of coding techniques may increase the amount of data which may be stored and/or transmitted. Moreover, processing resources may be necessary to implement the encoding and decoding. In some examples, the decoder may be one of the processing blocks that consume the most computational resources in wireless baseband circuitry and/or memory controllers, which may reduce the desirability of existing decoding schemes in many emerging applications such as Internet of Things (IoT) and/or tactile internet where ultra-low power consumption and ultra-low latency are highly desirable.
Examples described herein utilize multi-layer neural networks (NNs), such as multi-layer recurrent neural networks (RNNs), to decode encoded data (e.g., data encoded using one or more encoding techniques). The NNs and/or RNNs have nonlinear mapping and distributed processing capabilities which may be advantageous in many systems employing the neural network decoders. For example, in some non-volatile memory devices where bit flips may degrade memory storage capabilities, NNs and/or RNNs may provide a more robust decoder that may be trained to transform noisy encoded input data to decoded data (e.g., an estimate of the decoded data). Advantageously, such noisy encoded data may be decoded such that the NNs or RNNs reduce and/or correct errors which may be introduced by noise present in the input data. In the example, such noise may be introduced in storing the encoded data in memory that is degraded (e.g., due to bit flips).
Electronic device 110 also includes processing units 112 that may interact with mode configurable control 105, which may be encoded with instructions executable by the processing unit(s) 112. In some implementations, mode configurable control 105 may be implemented as a memory. As used herein, memory may refer to computer readable media, which may include both storage media and communication media. Example computer readable media may include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer readable instructions. The mode configurable control 105 includes the executable instructions 115 for mixing encoded input data with coefficients. For example, the mode configurable control 105 may be implemented using circuitry (e.g., logic), one or more processor(s), microcontroller(s), controller(s), or other elements. The mode configurable control 105 may select certain weights and/or other parameters (e.g., from memory 145 which may store weights) and provide those weights and/or other parameters to one or more of the multiplication/accumulation units and/or memory look-up units of
The processing unit(s) 112 may be implemented using one or more processors, for example, having any number of cores. In some examples, the processing unit(s) 112 may include circuitry, including custom circuitry, and/or firmware for performing functions described herein. For example, circuitry can include multiplication unit/accumulation units for performing the described functions, as described herein. Processing unit(s) 112 can be any type including but not limited to a microprocessor or a digital signal processor (DSP), or any combination thereof. For example, processing unit(s) 112 can include levels of caching, such as a level one cache and a level two cache, a core, and registers. An example processor core can include an arithmetic logic unit (ALU), a bit manipulation unit, a multiplication unit, an accumulation unit, an adder unit, a look-up table unit, a memory look-up unit, or any combination thereof. Examples of processing unit 112 are described herein, for example with reference to
The mode configurable control 105, for example, may be encoded with executable instructions 115 for mixing encoded input data with coefficient data, e.g., to mix noisy encoded input data with coefficient data at the processing units 112. For example, in the context of decoding noisy encoded input data from memory 140 or 145, the executable instructions 115 for mixing encoded input data with coefficient data may include instructions for obtaining the noisy encoded input data from the memory 140 or 145 and transforming that obtained noisy encoded input data at the processing unit(s) 112 into decoded data (e.g., an estimate of the decoded data). For example, the executable instructions 115 for mixing encoded input data with coefficient data may further include instructions for multiplying a portion of the noisy encoded input data with coefficient data to generate a coefficient multiplication result and accumulating the coefficient multiplication result to be further multiplied and accumulated with another portion of the noisy encoded input data and coefficient data, examples of which are described herein. For example, to generate a coefficient multiplication result, a first layer of multiplication/accumulation processing units (MAC units) may calculate the noisy encoded input data with the plurality of coefficients to generate such coefficient multiplication results, or first processing results of the first layer of MAC units. Continuing in the example, to provide the output data, additional layers of MAC units may calculate the first processing results with additional pluralities of coefficients to generate additional coefficient multiplication results, or second processing results of the additional layers of MAC units. The memory look-up units (MLUs) of a last layer of the additional layers of MAC units may provide the decoded data based on the second processing results.
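As a non-limiting sketch, the layered multiply/accumulate processing described above may be modeled in software as follows. The names `mac_unit` and `mac_layer` and all numeric values are hypothetical, and the sketch omits the memory look-up units, so it shows only the multiplication and accumulation steps.

```python
def mac_unit(inputs, coefficients):
    """Multiply each input by its coefficient and accumulate the products."""
    acc = 0.0
    for x, w in zip(inputs, coefficients):
        acc += x * w  # multiply, then accumulate
    return acc

def mac_layer(inputs, coefficient_rows):
    """One layer of MAC units; each row of coefficients feeds one unit."""
    return [mac_unit(inputs, row) for row in coefficient_rows]

noisy_encoded = [0.9, 0.1, 1.1]                    # made-up noisy encoded input data
first_coefficients = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
additional_coefficients = [[0.5, 0.5]]

first_results = mac_layer(noisy_encoded, first_coefficients)        # first processing results
second_results = mac_layer(first_results, additional_coefficients)  # second processing results
print(first_results, second_results)
```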
Accordingly, the executable instructions 115 for mixing encoded input data with coefficient data may include various sets of executable instructions for different types of hardware implementations, such as those shown in
The user interface 114 may be implemented with any of a number of input devices including, but not limited to, a touchscreen, keyboard, mouse, microphone, or combinations thereof. The user interface 114 may receive input from a user, for example, regarding a processing mode selection to specify a processing mode for the processing unit(s) 112. The user interface 114 may communicate the user input to the mode configurable control 105. Example user interfaces 114 include a serial interface controller or a parallel interface controller, which may be configured to communicate with external input devices (e.g., keyboard, mouse, pen, voice input device, touch input device, etc.).
The network 120 may include a wired network or direct-wired connection, and wireless media such as acoustic, radio frequency (RF), microwave, infrared (IR) and other wireless media.
The memory(s) 140, and 145 (or mode configurable control 105, if being implemented as a memory) may be implemented using any storage medium accessible to the processing unit(s) 112. For example, RAM, ROM, solid state memory, flash memory, disk drives, system memory, optical storage, or combinations thereof, may be used to implement the mode configurable control 105 or memory(s) 140, and 145. In storing encoded data in such memories, environmental or other noise may be introduced in the storing process. For example, noise may be introduced in storing the encoded data in memory 140 or 145 that is degraded (e.g., due to bit flips). Accordingly, data obtained from the memory(s) 140 or 145 may be referred to as noisy encoded input data. In some implementations, the mode configurable control 105 may store associations between coefficients and particular encoding techniques described herein.
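One way to picture the stored associations between coefficients and encoding techniques is a simple look-up keyed by technique name. The following is a minimal sketch under that assumption; the table `WEIGHT_SETS`, the function `select_weights`, and the weight values are hypothetical and are not taken from any described implementation.

```python
# Hypothetical table associating an encoding technique with a set of weights.
WEIGHT_SETS = {
    "LDPC": [[0.2, -0.1], [0.4, 0.3]],
    "BCH": [[1.0, 0.0], [0.0, 1.0]],
}

def select_weights(encoding_technique, weight_sets=WEIGHT_SETS):
    """Return the weights stored for the given encoding technique."""
    try:
        return weight_sets[encoding_technique]
    except KeyError:
        raise ValueError(f"no weights stored for {encoding_technique!r}") from None

print(select_weights("BCH"))
```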
The electronic device 110 may be implemented using any of a variety of computing systems, including but not limited to one or more desktop, server, laptop, or other computers. The electronic device 110 generally includes one or more processing unit(s) 112. The computing system 100 may be implemented as a mobile communication device using any user communication device, including but not limited to, a desktop, laptop, cellular phone, tablet, appliance, automobile, or combinations thereof. The electronic device 110 may be programmed with a mobile application (e.g. processing unit(s) 112 and computer readable media encoded with instructions which, when executed, cause the electronic device 110 to perform described functions) for mixing noisy encoded input data with coefficient data. For example, the electronic device 110 may be programmed to receive an indication from a touchscreen of a mobile communication device that a particular encoding technique was utilized for the noisy encoded input data received in a 5G wireless transmission.
It is to be understood that the arrangement of computing systems of the system 100 may be quite flexible, and although not shown, it is to be understood that the system 100 may include many electronic devices 110, which may be connected via the network 120 and can operate in conjunction with each other to perform the systems and methods described herein. The memory 145 and/or the memory 140 may in some examples be implemented using the same media, and in other examples may be implemented using different media. For example, while the memory 140 is shown in
Generally, a neural network may be used including multiple stages of nodes. The nodes may be implemented using processing units (e.g., processing units 112) which may execute one or more functions on inputs received from a previous stage and provide the output of the functions to the next stage of the neural network. The processing units may be implemented using, for example, one or more processors, controllers, and/or custom circuitry, such as an application specific integrated circuit (ASIC) and/or a field programmable gate array (FPGA). In some examples, the processing units may be implemented using any combination of one or more processing units described with respect to
In the example of
The neural network 150 may have a next layer, which may be referred to as a ‘hidden layer’ in some examples. The next layer may include combiner 152, combiner 154, combiner 156, and combiner 158, although any number of elements may be used. While the processing elements in the second stage of the neural network 150 are referred to as combiners, generally the processing elements in the second stage may perform a nonlinear activation function using the input data bits received at the processing element. Any number of nonlinear activation functions may be used. Examples of functions which may be used include Gaussian functions, such as
Examples of functions which may be used include multi-quadratic functions, such as $f(r) = (r^2 + \sigma^2)^{1/2}$. Examples of functions which may be used include inverse multi-quadratic functions, such as $f(r) = (r^2 + \sigma^2)^{-1/2}$. Examples of functions which may be used include thin-plate-spline functions, such as $f(r) = r^2 \log(r)$. Examples of functions which may be used include piece-wise linear functions, such as $f(r) = \tfrac{1}{2}(|r + 1| - |r - 1|)$. Examples of functions which may be used include cubic approximation functions, such as $f(r) = \tfrac{1}{2}(|r^3 + 1| - |r^3 - 1|)$. In these example functions, $\sigma$ represents a real parameter (e.g., a scaling parameter) and $r$ is the distance between the input vector and the current vector. The distance may be measured using any of a variety of metrics, including the Euclidean norm.
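For illustration, the example functions listed above may be written directly in code. The function names below are hypothetical, the scaling parameter is assumed to be positive, and the thin-plate-spline function is assumed to take its limiting value of zero at r = 0.

```python
import math

def multi_quadratic(r, sigma):
    """f(r) = (r^2 + sigma^2)^(1/2)"""
    return math.sqrt(r * r + sigma * sigma)

def inverse_multi_quadratic(r, sigma):
    """f(r) = (r^2 + sigma^2)^(-1/2); assumes r and sigma are not both zero."""
    return 1.0 / math.sqrt(r * r + sigma * sigma)

def thin_plate_spline(r):
    """f(r) = r^2 log(r); the limit as r approaches 0 is used at r = 0."""
    return 0.0 if r == 0 else r * r * math.log(r)

def piecewise_linear(r):
    """f(r) = (|r + 1| - |r - 1|) / 2"""
    return 0.5 * (abs(r + 1) - abs(r - 1))

def cubic_approximation(r):
    """f(r) = (|r^3 + 1| - |r^3 - 1|) / 2"""
    return 0.5 * (abs(r ** 3 + 1) - abs(r ** 3 - 1))

def euclidean_distance(x, c):
    """Distance r between an input vector x and a vector c (Euclidean norm)."""
    return math.sqrt(sum((xi - ci) ** 2 for xi, ci in zip(x, c)))

r = euclidean_distance([0.0, 0.0], [3.0, 4.0])
print(r, multi_quadratic(r, 1.0), piecewise_linear(r))
```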
Each element in the ‘hidden layer’ may receive as inputs selected bits (e.g., some or all) of the input data. For example, each element in the ‘hidden layer’ may receive as inputs the outputs of multiple selected elements (e.g., some or all elements) in the input layer. For example, the combiner 152 may receive as inputs the output of node 168, node 169, node 172, and node 174. While a single ‘hidden layer’ is shown by way of example in
The neural network 150 may have an output layer. The output layer in the example of
In some examples, the neural network 150 may be used to provide L output bits which represent processed data corresponding to m input bits. For example, in the example of
Examples of neural networks may be trained. Training generally refers to the process of determining weights, functions, and/or other attributes to be utilized by a neural network to create a desired transformation of input data to output data. In some examples, neural networks described herein may be trained to transform encoded input data to decoded data (e.g., an estimate of the decoded data). In some examples, neural networks described herein may be trained to transform noisy encoded input data to decoded data (e.g., an estimate of the decoded data). In this manner, neural networks may be used to reduce and/or correct errors which may be introduced by noise present in the input data. In some examples, neural networks described herein may be trained to transform noisy encoded input data to encoded data with reduced noise. The encoded data with reduced noise may then be provided to any decoder (e.g., a neural network and/or other decoder) for decoding of the encoded data. In this manner, neural networks may be used to reduce and/or correct errors which may be introduced by noise.
Training as described herein may be supervised or unsupervised in various examples. In some examples, training may occur using known pairs of anticipated input and desired output data. For example, training may utilize known encoded data and decoded data pairs to train a neural network to decode subsequent encoded data into decoded data. In some examples, training may utilize known noisy encoded data and decoded data pairs to train a neural network to decode subsequent noisy encoded data into decoded data. In some examples, training may utilize known noisy encoded data and encoded data pairs to train a neural network to provide encoded data having less noise than the input noisy encoded data. Examples of training may include determining weights to be used by a neural network, such as neural network 150 of
Examples of training can be described mathematically. For example, consider input data at a time instant (n), given as: $X(n) = [x_1(n), x_2(n), \ldots, x_m(n)]^T$. The center vector for each element in hidden layer(s) of the neural network 150 (e.g., combiner 152, combiner 154, combiner 156, and combiner 158) may be denoted as $C_i$ (for $i = 1, 2, \ldots, H$, where $H$ is the number of elements in the hidden layer).
The output of each element in a hidden layer may then be given as:
$$h_i(n) = f_i(\|X(n) - C_i\|) \quad \text{for } i = 1, 2, \ldots, H \tag{1}$$
The connections between a last hidden layer and the output layer may be weighted. Each element in the output layer may have a linear input-output relationship such that it may perform a summation (e.g., a weighted summation). Accordingly, an output of the i'th element in the output layer at time n may be written as:
$$y_i(n) = \sum_{j=1}^{H} W_{ij} h_j(n) = \sum_{j=1}^{H} W_{ij} f_j(\|X(n) - C_j\|) \tag{2}$$
for $i = 1, 2, \ldots, L$, where $L$ is the number of elements in the output layer and $W_{ij}$ is the connection weight between the j'th element in the hidden layer and the i'th element in the output layer.
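Equations (1) and (2) may be illustrated with a short sketch. It assumes the multi-quadratic function as the activation for every hidden element and the Euclidean norm as the distance; the function names, center vectors, and weights below are made-up values for illustration only.

```python
import math

def distance(x, c):
    """Euclidean norm ||x - c||."""
    return math.sqrt(sum((xi - ci) ** 2 for xi, ci in zip(x, c)))

def multi_quadratic(r, sigma=1.0):
    return math.sqrt(r * r + sigma * sigma)

def hidden_outputs(x, centers, f=multi_quadratic):
    """Equation (1): h_i(n) = f(||X(n) - C_i||) for each hidden element."""
    return [f(distance(x, c)) for c in centers]

def network_outputs(x, centers, weights, f=multi_quadratic):
    """Equation (2): y_i(n) = sum over j of W_ij * h_j(n)."""
    h = hidden_outputs(x, centers, f)
    return [sum(w_ij * h_j for w_ij, h_j in zip(row, h)) for row in weights]

centers = [[0.0, 0.0], [1.0, 1.0]]   # H = 2 center vectors
weights = [[1.0, -1.0]]              # L = 1 output element, so W is 1 x 2
print(network_outputs([0.0, 0.0], centers, weights))
```

Because the output layer is a weighted sum, inference in this sketch involves no iteration once the centers and weights are known.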
Generally, a neural network architecture (e.g., the neural network 150 of
Examples of neural networks may accordingly be specified by attributes (e.g., parameters). In some examples, two sets of parameters may be used to specify a neural network: connection weights and center vectors (e.g., thresholds). The parameters may be determined from selected input data (e.g., encoded input data) by solving an optimization function. An example optimization function may be given as:
$$E = \sum_{n=1}^{M} \|Y(n) - \hat{Y}(n)\|^2 \tag{3}$$
where $M$ is the number of trained input vectors (e.g., trained encoded data inputs), $Y(n)$ is an output vector computed from the sample input vector using Equations (1) and (2) above, and $\hat{Y}(n)$ is the corresponding desired (e.g., known) output vector. The output vector $Y(n)$ may be written as:
$$Y(n) = [y_1(n), y_2(n), \ldots, y_L(n)]^T$$
Various methods (e.g., gradient descent procedures) may be used to solve the optimization function. However, in some examples, another approach may be used to determine the parameters of a neural network, which may generally include two steps—(1) determining center vectors Ci (i=1, 2, . . . , H) and (2) determining the weights.
In some examples, the center vectors may be chosen from a subset of available sample vectors. In such examples, the number of elements in the hidden layer(s) may be relatively large to cover the entire input domain. Accordingly, in some examples, it may be desirable to apply k-means cluster algorithms. Generally, k-means cluster algorithms distribute the center vectors according to the natural measure of the attractor (e.g., if the density of the data points is high, so is the density of the centers). k-means cluster algorithms may find a set of cluster centers and partition the training samples into subsets. Each cluster center may be associated with one of the H hidden layer elements in this network. The data may be partitioned in such a way that the training points are assigned to the cluster with the nearest center. Each cluster center may correspond to one of the minima of an optimization function. An example optimization function for use with a k-means cluster algorithm may be given as:
$$E_{k\_means} = \sum_{j=1}^{H} \sum_{n=1}^{M} B_{jn} \|X(n) - C_j\|^2 \tag{4}$$
where Bjn is the cluster partition or membership function forming an H×M matrix. Each column may represent an available sample vector (e.g., known input data) and each row may represent a cluster. Each column may include a single ‘1’ in the row corresponding to the cluster nearest to that training point, and zeros elsewhere.
The center of each cluster may be initialized to a different randomly chosen training point. Then each training example may be assigned to the cluster center nearest to it. When all training points have been assigned, the average position of the training points for each cluster may be found and the cluster center may be moved to that point. The cluster centers may become the desired centers of the hidden layer elements.
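The procedure just described may be sketched as follows. The fixed iteration count, the random seed, the handling of empty clusters, and the function names `k_means` and `k_means_cost` are assumptions made for this illustration.

```python
import random

def sq_dist(x, c):
    """Squared Euclidean distance ||x - c||^2."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))

def k_means(samples, H, iterations=20, seed=0):
    """Return H center vectors for the given training samples."""
    rng = random.Random(seed)
    # Initialize each center to a different randomly chosen training point.
    centers = [list(s) for s in rng.sample(samples, H)]
    for _ in range(iterations):
        clusters = [[] for _ in range(H)]
        # Assign each training point to the cluster with the nearest center.
        for x in samples:
            nearest = min(range(H), key=lambda j: sq_dist(x, centers[j]))
            clusters[nearest].append(x)
        # Move each center to the average position of its assigned points.
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers

def k_means_cost(samples, centers):
    """Equation (4), with B_jn equal to 1 only for the nearest center."""
    return sum(min(sq_dist(x, c) for c in centers) for x in samples)

samples = [[0, 0], [0, 1], [10, 10], [10, 11]]
centers = k_means(samples, 2)
print(sorted(centers), k_means_cost(samples, centers))
```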
In some examples, for some transfer functions (e.g., the Gaussian function), the scaling factor σ may be determined, and may be determined before determining the connection weights. The scaling factor may be selected to cover the training points to allow a smooth fit of the desired network outputs. Generally, this means that any point within the convex hull of the processing element centers may significantly activate more than one element. To achieve this goal, each hidden layer element may activate at least one other hidden layer element to a significant degree. An appropriate method to determine the scaling parameter σ may be based on the P-nearest neighbor heuristic, which may be given as,
where Cj (for i=1, 2, . . . , H) are the P-nearest neighbors of Ci.
The connection weights may additionally or instead be determined during training. In an example of a neural network, such as neural network 150 of
where $W = \{W_{ij}\}$ is the $L \times H$ matrix of the connection weights, $F$ is an $H \times M$ matrix of the outputs of the hidden layer processing elements whose matrix elements are computed using $F_{in} = f_i(\|X(n) - C_i\|)$ ($i = 1, 2, \ldots, H$; $n = 1, 2, \ldots, M$), and $\hat{Y} = [\hat{Y}(1), \hat{Y}(2), \ldots, \hat{Y}(M)]$ is the $L \times M$ matrix of the desired (e.g., known) outputs. The connection weight matrix $W$ may be found from Equation (5) and may be written as follows:
where F+ is the pseudo-inverse of F. In this manner, the above may provide a batch-processing method for determining the connection weights of a neural network. It may be applied, for example, where all input sample sets are available at one time. In some examples, each new sample set may become available recursively, such as in the recursive-least-squares algorithms (RLS). In such cases, the connection weights may be determined as follows.
First, connection weights may be initialized to any value (e.g., random values may be used). The output vector Y(n) may be computed using Equation (2). The error term ei(n) of each output element in the output layer may be computed as follows:
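$$e_i(n) = y_i(n) - \hat{y}_i(n) \quad (i = 1, 2, \ldots, L)$$
where $\hat{y}_i(n)$ is the i'th element of the desired (e.g., known) output vector.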
The connection weights may then be adjusted based on the error term, for example as follows:
$$W_{ij}(n+1) = W_{ij}(n) + \gamma e_i(n) f_j(\|X(n) - C_j\|) \tag{7}$$
The total error may be computed according to the output from the output layer and the desired (known) data:
$$\epsilon = \|Y(n) - \hat{Y}(n)\|^2 \tag{8}$$
The process may be iterated by again calculating a new output vector, error term, and again adjusting the connection weights. The process may continue until weights are identified which reduce the error to a value equal to or less than a threshold error.
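A minimal sketch of this iterative procedure follows. Several choices are assumptions of the sketch rather than statements about any particular implementation: the weights start at zero instead of random values, the error is computed as desired output minus computed output so that the additive update moves the output toward the target, the error of Equation (8) is summed over all samples in a pass, and the step size, threshold, and function name `train_weights` are hypothetical.

```python
def train_weights(hidden, targets, gamma=0.1, threshold=1e-6, max_passes=10000):
    """Iteratively adjust connection weights.

    hidden[n][j]  is the j'th hidden-layer output for the n'th training sample.
    targets[n][i] is the desired i'th output for the n'th training sample.
    """
    L, H = len(targets[0]), len(hidden[0])
    W = [[0.0] * H for _ in range(L)]
    for _ in range(max_passes):
        total_error = 0.0
        for h, desired in zip(hidden, targets):
            # Output vector per Equation (2).
            y = [sum(W[i][j] * h[j] for j in range(H)) for i in range(L)]
            # Error term (sign convention assumed: desired minus computed).
            e = [desired[i] - y[i] for i in range(L)]
            # Weight adjustment in the form of Equation (7).
            for i in range(L):
                for j in range(H):
                    W[i][j] += gamma * e[i] * h[j]
            total_error += sum(ei * ei for ei in e)
        if total_error <= threshold:  # stop once the error reaches the threshold
            break
    return W

hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # made-up hidden-layer outputs
targets = [[2.0], [3.0], [5.0]]                 # made-up desired outputs
print(train_weights(hidden, targets))
```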
Accordingly, the neural network 150 of
Recall that the structure of neural network 150 of
In examples of supervised learning, the input training samples: [x1(n), x2(n), . . . xm(n)] may be generated by passing the encoded samples [b1(n), b2(n), . . . bm(n)] through some noisy channels and/or adding noise. The supervised output samples may be the corresponding original code [a1(n), a2(n), . . . aL(n)] which may be used to generate [b1(n), b2(n), . . . bm(n)] by the encoder. Once these parameters are determined in offline mode, the desired decoded code-word can be obtained from input data utilizing the neural network (e.g., computing Equation (2)), which may avoid complex iterations and feedback decisions used in traditional error-correcting decoding algorithms. In this manner, neural networks described herein may provide a reduction in processing complexity and/or latency, because some complexity has been transferred to an off-line training process which is used to determine the weights and/or functions which will be used. Further, the same neural network (e.g., the neural network 150 of
The first stage of the neural network 170 includes inputs node 171. The inputs node 171 may receive input data at various inputs of the recurrent neural network. In some examples, the inputs node 171 may include multiple input nodes, such as input node 168, node 169, node 172, and node 174 of
The recurrent neural network 170 includes delay units 175a, 175b, and 175c, which generate delayed versions of the output from the respective combiner units 173a-c based on receiving such output data from the respective combiner units 173a-c. In the example, the output data of combiner units 173a-c may be represented as h(n); and, accordingly, each of the delay units 175a-c delays the output data of the combiner units 173a-c to generate delayed versions of the output data from the combiner units 173a-c, which may be represented as h(n−t). In various implementations, the amount of the delay, t, may also vary, e.g., one clock cycle, two clock cycles, or one hundred clock cycles. That is, the delay unit 175 may receive a clock signal and utilize the clock signal to identify the amount of the delay. In the example of
Continuing in the example of
Generally, a recurrent neural network may include multiple stages of nodes. The nodes may be implemented using processing units (e.g., processing units 112) which may execute one or more functions on inputs received from a previous stage and provide the output of the functions to the next stage of the recurrent neural network. The processing units may be implemented using, for example, one or more processors, controllers, and/or custom circuitry, such as an application specific integrated circuit (ASIC) and/or a field programmable gate array (FPGA). In some examples, the processing units may be implemented using any combination of one or more processing units 112 described with respect to
Examples of recurrent neural network training and inference can be described mathematically. Again, as an example, consider input data at a time instant (n), given as: X(n)=[x1(n), x2(n), . . . xm(n)]T. The center vector for each element in hidden layer(s) of the recurrent neural network 170 (e.g., combiner units 173 including combiner 152, combiner 154, combiner 156, and combiner 158 of
The output of each element in a hidden layer may then be given as:
$$h_i(n) = f_i\left(\lVert X(n) + h_i(n-t) - C_i \rVert\right), \quad i = 1, 2, \ldots, H \tag{9}$$
Here, t may be the delay at the delay unit 175, such that the output of the combiner units 173 depends on a delayed version of that same output. In some examples, this may be referred to as feedback of the combiner units 173. Each of the connections between a last hidden layer and the output layer may be weighted. Each element in the output layer may have a linear input-output relationship such that it may perform a summation (e.g., a weighted summation). Accordingly, an output of the i'th element in the output layer at time n may be written as:
$$y_i(n) = \sum_{j=1}^{H} W_{ij} h_j(n) + W_{ij} h_j(n-t) = \sum_{j=1}^{H} W_{ij} f_j\left(\lVert X(n) + h_j(n-t) - C_j \rVert\right) \tag{10}$$
for (i=1, 2, . . . , L), where L is the number of elements in the output layer and Wij is the connection weight between the j'th element in the hidden layer and the i'th element in the output layer.
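Equations (9) and (10) can be sketched in a few lines of Python. This is a minimal illustration only, not the described circuitry: the Gaussian choice for ƒ, the one-step delay (t = 1), the zero initial feedback, and the way the scalar h_i(n−t) is added to every component of X(n) are all assumptions of the sketch, and the names `gaussian`, `hidden_step`, `output_step`, and `run` are hypothetical.

```python
import math

def gaussian(r, sigma=1.0):
    """Gaussian radial basis function, one possible choice for f_i (assumed)."""
    return math.exp(-(r * r) / (2.0 * sigma * sigma))

def hidden_step(x, h_prev, centers, f=gaussian):
    """Equation (9): h_i(n) = f_i(||X(n) + h_i(n-t) - C_i||).

    x       -- input vector X(n), length m
    h_prev  -- delayed hidden outputs h_i(n-t), one scalar per hidden element
    centers -- H center vectors C_i, each of length m
    The scalar h_i(n-t) is added to every component of X(n); this
    broadcasting is an assumption of the sketch.
    """
    h = []
    for h_i_prev, c in zip(h_prev, centers):
        r = math.sqrt(sum((xk + h_i_prev - ck) ** 2 for xk, ck in zip(x, c)))
        h.append(f(r))
    return h

def output_step(h, weights):
    """Final form of Equation (10): y_i(n) = sum_j W_ij * h_j(n)."""
    return [sum(w_ij * h_j for w_ij, h_j in zip(row, h)) for row in weights]

def run(inputs, centers, weights):
    """Run the recurrence over a sequence of inputs with a delay of t = 1."""
    h_prev = [0.0] * len(centers)   # assumed: no feedback before the first sample
    outputs = []
    for x in inputs:
        h = hidden_step(x, h_prev, centers)
        outputs.append(output_step(h, weights))
        h_prev = h                  # delay unit: hold h(n) for the next step
    return outputs
```

Calling `run` with a list of input vectors, H center vectors, and an L-by-H weight matrix returns one length-L output vector per time step.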
Additionally or alternatively, while
As denoted in the representation of the input data signals, the input data 210a X1(i, i−1) includes a current portion of the input data, at time i, and a previous portion of the input data, at time i−1. For example, a current portion of the input data may be a sample obtained at a certain time period (e.g., at time i), while a previous portion of the input data may be a sample obtained at a time period previous to the certain time period (e.g., at time i−1). Accordingly, the previous portion of the input data may be referred to as a time-delayed version of the current portion of the input data. The portions of the input data at each time period may be obtained in a vector or matrix format, for example. In an example, a current portion of the input data, at time i, may be a single value; and a previous portion of the input data, at time i−1, may be a single value. Thus, the input data 210a X1(i, i−1) may be a vector. In some examples, the current portion of the input data, at time i, may be a vector value; and a previous portion of the input data, at time i−1, may be a vector value. Thus, the input data 210a X1(i, i−1) may be a matrix.
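The pairing of a current portion with a time-delayed portion can be illustrated with a short Python sketch. The function name `pair_with_previous` and the explicit `initial` value used before the first sample are assumptions made for illustration.

```python
def pair_with_previous(samples, initial):
    """Pair each sample at time i with the sample at time i-1.

    Returns a list of (current, previous) tuples, i.e. X(i, i-1).
    `initial` stands in for the missing predecessor of the first sample
    (an assumption of this sketch).
    """
    paired = []
    previous = initial
    for current in samples:
        paired.append((current, previous))
        previous = current   # becomes the time-delayed portion on the next step
    return paired
```

With scalar samples each pair is a two-element vector; with vector samples each pair forms a two-row matrix, matching the two cases described above.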
The processing unit 112 may include multiplication/accumulation (MAC) units 212a-c, 216a-b, and 220; delay units 213a-c, 217a-b, and 221; and memory look-up units (MLUs) 214a-c, 218a-b, and 222 that, when mixed with input data obtained from the memory 145, may generate output data 230 (e.g., B(1)). Each set of MAC units and MLUs having different element numbers may be referred to as a respective stage of combiners for the processing unit 112. For example, a first stage of combiners includes MAC units 212a-c and MLUs 214a-c, operating in conjunction with delay units 213a-c, to form a first stage or “layer,” as referenced with respect to
When the input data is encoded in accordance with an encoding technique, the output data 230 B(1) may be the decoded data (e.g., an estimate of the decoded data) corresponding to the encoded input data. For example, the output data may be the data corresponding to the encoded input data, but having reduced and/or modified noise. In operation, instructions 115, stored at the mode configuration control 105, may be executed to cause the processing unit 112 to configure the MAC units 212a-c, 216a-b, and 220 to multiply and/or accumulate input data 210a, 210b, and 210c and delayed versions of processing results from the delay units 213a-c, 217a-b, and 221 (e.g., respective outputs of the respective layers of MAC units) with coefficient data to generate the output data 230 B(1). For example, the mode configuration control 105 may execute instructions that cause the memory 145 to provide weights and/or other parameters stored in the memory 145, which may be associated with a certain encoding technique, to the MLUs 214a-c, 218a-b, and 222 as weights for the MAC units 212a-c, 216a-b, and 220 and delay units 213a-c, 217a-b, and 221. During operation, the mode configuration control 105 may be used to select weights and/or other parameters in memory 145 based on an indicated encoding technique for the processing unit 112.
As denoted in the representation of the respective outputs of the respective layers of MAC units (e.g., the outputs of the MLUs 214a-c, 218a-b, and 222), the input data to each MAC unit 212a-c, 216a-b, and 220 includes a current portion of input data, at time i, and a delayed version of a processing result, at time i−1. For example, a current portion of the input data may be a sample obtained at a certain time period (e.g., at time i), while a delayed version of a processing result (e.g., at time i−1) may be obtained from the output of the delay units 213a-c, 217a-b, and 221, which is representative of a time period previous to the certain time period (e.g., as a result of the introduced delay). Accordingly, in using such input data, obtained from both a current period and at least one previous period, output data 230 B(1) may be representative of a Markov process. A causal relationship between data from at least a current time period and a previous time period may improve the accuracy of weight estimation when training the coefficient data to be utilized by the MAC units and MLUs of the processing unit 112, or the accuracy of inference on encoded input data or noisy encoded input data when utilizing the processing unit 112. Accordingly, in utilizing delayed versions of output data from the delay units 213a-c, 217a-b, and 221, the recurrent neural network 170 provides decoded data (e.g., an estimate of the decoded data), which may have reduced and/or modified noise. Therefore, a recurrent neural network 170 may operate more efficiently to decode noisy encoded input data, e.g., data that may have been stored in memory cells of a degraded memory device where bit flips have occurred.
In an example of executing such instructions 115 for mixing encoded input data with coefficients, at a first layer of the MAC units 212a-c and MLUs 214a-c, the multiplication/accumulation units 212a-c are configured to multiply and accumulate at least two operands from corresponding input data 210a, 210b, or 210c and an operand from a respective delay unit 213a-c to generate a multiplication processing result that is provided to the MLUs 214a-c. For example, the multiplication/accumulation units 212a-c may perform a multiply-accumulate operation such that three operands, M, N, and T, are multiplied and then added with P to generate a new version of P that is stored in its respective MLU 214a-c. Accordingly, the MLU 214a latches the multiplication processing result until such time that the stored multiplication processing result is to be provided to a next layer of MAC units. The MLUs 214a-c, 218a-b, and 222 may be implemented by any number of processing elements that operate as a memory look-up unit, such as D, T, SR, and/or JK latches.
The MLUs 214a-c, 218a-b, and 222 shown in
Additionally in the example, the MLU 214a provides the processing result to the delay unit 213a. The delay unit 213a delays the processing result (e.g., h1(i)) to generate a delayed version of the processing result (e.g., h1(i−1)) to output to the MAC unit 212a as operand T. While the delay units 213a-c, 217a-b, and 221 of
In the example of a first hidden layer of a recurrent neural network, the MLUs 214a-c may retrieve coefficient data stored in the memory 145, which may be weights to be applied by the first layer of MAC units to both the data from the current period and the data from a previous period (e.g., the delayed versions of first layer processing results). For example, the MLU 214a can be a table look-up that retrieves one or more coefficients (e.g., specific coefficients associated with a first frequency) to be applied to both operands M and N, as well as an additional coefficient to be applied to operand T. The MLUs 214a-c also provide the generated multiplication processing results to the next layer of the MAC units 216a-b and MLUs 218a-b. The additional layers of the MAC units 216a, 216b and MAC unit 220, working in conjunction with the MLUs 218a, 218b and MLU 222, respectively, may continue to process the multiplication results to generate the output data 230 B(1). Using such a circuitry arrangement, the output data 230 B(1) may be generated from the input data 210a, 210b, and 210c.
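One way to picture a single first-layer MAC unit together with its latch and delay unit is the Python sketch below. It takes the reading in which retrieved coefficients weight operands M and N and an additional coefficient weights the delayed operand T. The class name `RecurrentMacCell`, the one-step delay, the zero initial state, and the omission of any non-linear mapping are assumptions of the sketch rather than details of the described circuitry.

```python
class RecurrentMacCell:
    """One MAC unit with a latched result and a one-step delay unit (a sketch)."""

    def __init__(self, coeff_m, coeff_n, coeff_t):
        self.coeff_m = coeff_m   # coefficient applied to operand M
        self.coeff_n = coeff_n   # coefficient applied to operand N
        self.coeff_t = coeff_t   # additional coefficient applied to operand T
        self.delayed = 0.0       # delay-unit output h(i-1); assumed zero at start

    def step(self, m, n):
        """Weight M, N and the delayed result T, then accumulate them."""
        t = self.delayed
        result = self.coeff_m * m + self.coeff_n * n + self.coeff_t * t
        self.delayed = result    # held so it can serve as operand T next step
        return result
```

For instance, a cell built with coefficients 0.5, 0.25, and 0.5 returns 2.0 for the first call `step(2.0, 4.0)` and 3.0 for a second identical call, because the second result includes half of the first.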
Advantageously, the processing unit 112 of system 200 may utilize a reduced number of MAC units and/or MLUs, e.g., as compared to the processing unit 112 of
The coefficient data, for example from memory 145, can be mixed with the input data 210a-210c and delayed version of processing results to generate the output data 230 B(1). For example, the relationship of the coefficient data to the output data 230 B(1) based on the input data 210a-c and the delayed versions of processing results may be expressed as:
$$B(1) = a^{1} \cdot f\left(\sum_{j=1}^{m-1} a^{(m-1)} f_j\left(\sum_{k=1}^{m} a^{(m)} X_k(i)\right)\right) \tag{11}$$
where a(m), a(m-1), a1 are coefficients for the first layer of multiplication/accumulation units 212a-c and outputs of delay units 213a-c; the second layer of multiplication/accumulation units 216a-b and outputs of delay units 217a-b; and last layer with the multiplication/accumulation unit 220 and output of delay unit 221, respectively; and where ƒ(•) is the mapping relationship which may be performed by the memory look-up units 214a-c and 218a-b. As described above, the memory look-up units 214a-c and 218a-b retrieve coefficients to mix with the input data and respective delayed versions of each layer of MAC units. Accordingly, the output data may be provided by manipulating the input data and delayed versions of the MAC units with the respective multiplication/accumulation units using a set of coefficients stored in the memory. The set of coefficients may be associated with a desired encoding technique. The resulting mapped data may be manipulated by additional multiplication/accumulation units and additional delay units using additional sets of coefficients stored in the memory associated with the desired encoding technique. The sets of coefficients multiplied at each stage of the processing unit 112 may represent or provide an estimation of the processing of the input data in specifically-designed hardware (e.g., an FPGA).
Further, it can be shown that the system 200, as represented by Equation (11), may approximate any nonlinear mapping with arbitrarily small error in some examples and the mapping of system 200 may be determined by the coefficients a(m), a(m-1), a1. For example, if such coefficient data is specified, any mapping and processing between the input data 210a-210c and the output data 230 may be accomplished by the system 200. For example, the coefficient data may represent non-linear mappings of the input data 210a-c to the output data B(1) 230. In some examples, the non-linear mappings of the coefficient data may represent a Gaussian function, a piece-wise linear function, a sigmoid function, a thin-plate-spline function, a multi-quadratic function, a cubic approximation, an inverse multi-quadratic function, or combinations thereof. In some examples, some or all of the memory look-up units 214a-c, 218a-b may be deactivated. For example, one or more of the memory look-up units 214a-c, 218a-b may operate as a gain unit with the unity gain. Such a relationship, as derived from the circuitry arrangement depicted in system 200, may be used to train an entity of the computing system 200 to generate coefficient data. For example, using Equation (11), an entity of the computing system 200 may compare input data to the output data to generate the coefficient data.
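As a rough illustration of Equation (11), the nested sums can be evaluated directly in Python. The sketch assumes a sigmoid for ƒ (one of the mappings listed above), indexes the coefficients per unit (`a_m` as one row per first-layer unit, `a_m1` as one value per first-layer output), and leaves out the delayed terms, since Equation (11) as written does not show them. The function names are hypothetical.

```python
import math

def sigmoid(v):
    """Sigmoid mapping, one of the non-linear functions listed in the text."""
    return 1.0 / (1.0 + math.exp(-v))

def equation_11(x, a_m, a_m1, a_1, f=sigmoid):
    """Evaluate B(1) = a1 * f( sum_j a(m-1)_j * f( sum_k a(m)_jk * X_k(i) ) ).

    x    -- input values X_k(i)
    a_m  -- first-layer coefficients, one row per first-layer unit (assumed layout)
    a_m1 -- second-layer coefficients, one per first-layer output (assumed layout)
    a_1  -- final-layer coefficient
    """
    first = [f(sum(c * xk for c, xk in zip(row, x))) for row in a_m]
    second = f(sum(c * h for c, h in zip(a_m1, first)))
    return a_1 * second
```

Passing `f=lambda v: v` makes every memory look-up unit act as a unity-gain stage, which reduces the expression to a purely linear combination of the inputs.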
Each of the multiplication/accumulation units 212a-c, 216a-b, and 220 may include multiple multipliers, multiple accumulation units, and/or multiple adders. Any one of the multiplication/accumulation units 212a-c, 216a-b, and 220 may be implemented using an ALU. In some examples, any one of the multiplication/accumulation units 212a-c, 216a-b, and 220 can include one multiplier and one adder that each perform, respectively, multiple multiplications and multiple additions. The input-output relationship of a multiplication/accumulation unit 212a-c, 216a-b, and 220 may be represented as:
$$B_{out} = \sum_{i=1}^{I} C_i \cdot B_{in}(i)$$
where “I” represents the number of multiplications performed in that unit, Ci represents the coefficients, which may be accessed from a memory such as memory 145, and Bin(i) represents a factor from either the input data 210a-c or an output from the multiplication/accumulation units 212a-c, 216a-b, and 220. In an example, the output of a set of multiplication/accumulation units, Bout, equals the sum of the coefficient data, Ci, multiplied by the output of another set of multiplication/accumulation units, Bin(i). Bin(i) may also be the input data, such that the output of a set of multiplication/accumulation units, Bout, equals the sum of the coefficient data, Ci, multiplied by the input data.
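The input-output relationship described here, in which Bout is the sum of each coefficient Ci multiplied by the corresponding Bin(i), can be written as a short Python function. The name `mac` and the length check are illustrative assumptions.

```python
def mac(coefficients, inputs):
    """Return B_out, the sum over i of C_i * B_in(i)."""
    if len(coefficients) != len(inputs):
        raise ValueError("need exactly one coefficient per input")
    accumulator = 0.0
    for c_i, b_in_i in zip(coefficients, inputs):
        accumulator += c_i * b_in_i   # one multiplication, one accumulation
    return accumulator
```

The loop mirrors the case of one multiplier and one adder performing multiple multiplications and additions in sequence; a unit with multiple multipliers could compute the products in parallel and then sum them.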
Accordingly, the system 200 (e.g., a hardware implementation of recurrent neural network 170) may be used to convert an input code word (e.g., x1(n), x2(n), . . . xm(n)) to an output code word (e.g., B(1)). Examples of the conversion have been described herein with reference to
The mode configuration control 105 may be implemented using circuitry (e.g., logic), one or more processor(s), microcontroller(s), controller(s), or other elements. The mode configuration control 105 may select certain weights and/or other parameters from memory 145 and provide those weights and/or other parameters to one or more of the MAC units and/or MLUs of
While processing element 112 is described in the context of
The system 280, including processing element 112, also includes additional features not highlighted in the processing element 112 of
Advantageously, such delayed versions of processing results, which may be provided as inputs to different or additional hidden layers, may better compensate “higher-order” memory effects in a recurrent neural network 170 that implements one or more processing units 112 of
While processing element 112 is described in the context of
The host 302 may be a host system such as a personal laptop computer, a desktop computer, a digital camera, a mobile telephone, or a memory card reader, among various other types of hosts. The host 302 may include a number of memory access devices (e.g., a number of processors). The host 302 may also be a memory controller, such as where memory system 304 is a memory device (e.g., a memory device having an on-die controller).
The memory system 304 may be a solid state drive (SSD) or other type of memory and may include a host interface 306, a controller 308 (e.g., a processor and/or other control circuitry), and a number of memory device(s) 314. The memory system 304, the controller 308, and/or the memory device(s) 314 may also be separately considered an “apparatus.” The memory device(s) 314 may include a number of solid state memory devices such as NAND flash devices, which may provide a storage volume for the memory system 304. Other types of memory may also be used.
The controller 308 may be coupled to the host interface 306 and to the memory device(s) 314 via a plurality of channels to transfer data between the memory system 304 and the host 302. The interface 306 may be in the form of a standardized interface. For example, when the memory system 304 is used for data storage in the apparatus 300, the interface 306 may be a serial advanced technology attachment (SATA), peripheral component interconnect express (PCIe), or a universal serial bus (USB), among other connectors and interfaces. In general, interface 306 provides an interface for passing control, address, data, and other signals between the memory system 304 and the host 302 having compatible receptors for the interface 306.
The controller 308 may communicate with the memory device(s) 314 (which in some embodiments can include a number of memory arrays on a single die) to control data read, write, and erase operations, among other operations. The controller 308 may include a discrete memory channel controller for each channel (not shown in
The controller 308 may include an ECC encoder 310 for encoding data bits written to the memory device(s) 314 using one or more encoding techniques. The ECC encoder 310 may include a single parity check (SPC) encoder, and/or an algebraic error correction circuit such as one of the group including a Bose-Chaudhuri-Hocquenghem (BCH) ECC encoder and/or a Reed Solomon ECC encoder, among other types of error correction circuits. The controller 308 may further include an ECC decoder 312 for decoding encoded data, which may include identifying erroneous cells, converting erroneous cells to erasures, and/or correcting the erasures. The memory device(s) 314 may, for example, include one or more output buffers which may read selected data from memory cells of the memory device(s) 314. The output buffers may provide output data, which may be provided as encoded input data to the ECC decoder 312. The neural network 150 of
The ECC encoder 310 and the ECC decoder 312 may each be implemented using discrete components such as an application specific integrated circuit (ASIC) or other circuitry, or the components may reflect functionality provided by circuitry within the controller 308 that does not necessarily have a discrete physical form separate from other portions of the controller 308. Although illustrated as components within the controller 308 in
The memory device(s) 314 may include a number of arrays of memory cells (e.g., non-volatile memory cells). The arrays can be flash arrays with a NAND architecture, for example. However, embodiments are not limited to a particular type of memory array or array architecture. Floating-gate type flash memory cells in a NAND architecture may be used, but embodiments are not so limited. The cells may be multi-level cells (MLC) such as triple level cells (TLC) which store three data bits per cell. The memory cells can be grouped, for instance, into a number of blocks including a number of physical pages. A number of blocks can be included in a plane of memory cells and an array can include a number of planes. As one example, a memory device may be configured to store 8 KB (kilobytes) of user data per page, 128 pages of user data per block, 2048 blocks per plane, and 16 planes per device.
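The example geometry above implies a user-data capacity that can be checked with simple arithmetic. The Python sketch below assumes 1 KB means 1024 bytes; the function name and dictionary keys are hypothetical.

```python
KB = 1024  # bytes per kilobyte; binary kilobytes are an assumption here

def device_user_capacity(page_bytes=8 * KB, pages_per_block=128,
                         blocks_per_plane=2048, planes_per_device=16):
    """Multiply out the example geometry into bytes per block, plane, and device."""
    block = page_bytes * pages_per_block
    plane = block * blocks_per_plane
    device = plane * planes_per_device
    return {"block": block, "plane": plane, "device": device}
```

Under that assumption the example works out to 1 MiB of user data per block, 2 GiB per plane, and 32 GiB per device.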
According to a number of embodiments, controller 308 may control encoding of a number of received data bits according to the ECC encoder 310 that allows for later identification of erroneous bits and the conversion of those erroneous bits to erasures. The controller 308 may also control programming the encoded number of received data bits to a group of memory cells in memory device(s) 314.
The apparatus shown in
From the foregoing it will be appreciated that, although specific embodiments have been described herein for purposes of illustration, various modifications may be made while remaining within the scope of the claimed technology. Certain details are set forth herein to provide an understanding of described embodiments of technology. However, other examples may be practiced without various of these particular details. In some instances, well-known circuits, control signals, timing protocols, neural network structures, algorithms, and/or software operations have not been shown in detail in order to avoid unnecessarily obscuring the described embodiments. Other embodiments may be utilized, and other changes may be made, without departing from the spirit or scope of the subject matter presented here.
Block 402 recites “receive known encoded and decoded data pairs, the encoded data encoded with a particular encoding technique.” The known encoded and decoded data pairs may be received by a computing device (e.g., electronic device 110) that includes a neural network, such as the neural network 150 of
Block 404 may follow block 402. Block 404 recites “determine a set of weights for a neural network to decode data encoded with the particular encoding technique.” For example, a neural network (e.g., any of the neural networks described herein) may be trained using the encoded and decoded data pairs received in block 402. The weights may be numerical values which, when used by the neural network, allow the neural network to output decoded data corresponding to encoded input data encoded with a particular encoding technique. The weights may be stored, for example, in the memory 145 of
In some examples, multiple sets of data pairs may be received (e.g., in block 402), with each set corresponding to data encoded with a different encoding technique. Accordingly, multiple sets of weights may be determined (e.g., in block 404), each set corresponding to a different encoding technique. For example, one set of weights may be determined which may be used to decode data encoded in accordance with LDPC coding while another set of weights may be determined which may be used to decode data encoded with BCH coding.
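A toy version of blocks 402 and 404 is sketched below in Python. It is not the training procedure of the described system: simple repetition codes of length 3 and 5 stand in for techniques such as LDPC or BCH coding, the network is a single linear output with no hidden layer, and the weights are fitted by batch gradient descent on squared error. All names, the learning rate, and the epoch count are assumptions.

```python
def train_linear_decoder(pairs, epochs=200, rate=0.5):
    """Fit one weight per encoded bit by batch gradient descent on squared error."""
    n_inputs = len(pairs[0][0])
    weights = [0.0] * n_inputs
    for _ in range(epochs):
        grad = [0.0] * n_inputs
        for encoded, decoded_bit in pairs:
            error = sum(w * x for w, x in zip(weights, encoded)) - decoded_bit
            for k, x in enumerate(encoded):
                grad[k] += error * x
        weights = [w - rate * g / len(pairs) for w, g in zip(weights, grad)]
    return weights

def repetition_pairs(n):
    """Known (encoded, decoded) pairs: each codeword plus every single-bit flip."""
    pairs = []
    for bit in (0, 1):
        word = [bit] * n
        pairs.append((word, bit))
        for k in range(n):
            noisy = list(word)
            noisy[k] ^= 1           # model one bit flip in the stored codeword
            pairs.append((noisy, bit))
    return pairs

# One set of weights per (toy) encoding technique, keyed by codeword length.
weight_sets = {n: train_linear_decoder(repetition_pairs(n)) for n in (3, 5)}
```

Each entry of `weight_sets` could then be stored and later selected according to the encoding technique of the data being decoded.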
Block 406 may follow block 404. Block 406 recites “receive data encoded with the particular encoding technique.” For example, data (e.g., signaling indicative of data) encoded with the particular encoding technique may be retrieved from a memory of a computing device and/or received using a wireless communications receiver. Any of a variety of encoding techniques may have been used to encode the data.
Block 408 may follow block 406. Block 408 recites “decode the data using the set of weights.” By processing the encoded data received in block 406 using the weights, which may have been determined in block 404, the decoded data may be determined. For example, any neural network described herein may be used to decode the encoded data (e.g., the neural network 150 of
Block 410 may follow block 408. Block 410 recites “writing the decoded data to or reading the decoded data from memory.” For example, data decoded in block 408 may be written to a memory, such as the memory device 314 of
In some examples, blocks 406-410 may be repeated for data encoded with different encoding techniques. For example, data may be received in block 406, encoded with one particular encoding technique (e.g., LDPC coding). A set of weights may be selected that is for use with LDPC coding and provided to a neural network for decoding in block 408. The decoded data may be obtained in block 410. Data may then be received in block 406, encoded with a different encoding technique (e.g., BCH coding). Another set of weights may be selected that is for use with BCH coding and provided to a neural network for decoding in block 408. The decoded data may be obtained in block 410. In this manner, one neural network may be used to decode data that had been encoded with multiple encoding techniques.
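The reuse of one network with different weight sets can be sketched as follows. The Python example is only illustrative: the stored weight values are hand-picked for two toy repetition codes that stand in for techniques such as LDPC or BCH coding, the decoder is a single weighted sum with a hard decision, and the names `WEIGHT_SETS`, `decode`, and `decode_with` are hypothetical.

```python
# Hypothetical stored weight sets, one per encoding technique (toy values).
WEIGHT_SETS = {
    "repeat3": [[1 / 3, 1 / 3, 1 / 3]],
    "repeat5": [[0.2, 0.2, 0.2, 0.2, 0.2]],
}

def decode(encoded_bits, weights, threshold=0.5):
    """Same decoder structure for every technique: weighted sums, then a hard decision."""
    return [1 if sum(w * b for w, b in zip(row, encoded_bits)) >= threshold else 0
            for row in weights]

def decode_with(technique, encoded_bits):
    """Select the weight set for the indicated technique, then decode."""
    return decode(encoded_bits, WEIGHT_SETS[technique])
```

Only the selected weights change between calls; the `decode` function itself is shared, which is the point of the repeated blocks 406-410.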
Example method 500 may begin with a block 504 that starts execution of the mixing encoded input data with coefficient data routine. The method may include a block 508 that recites “obtaining encoded data associated with an encoding technique.” As described herein, the processing unit may be configured to obtain a variety of types of input data that may be encoded with various encoding techniques, such as data that has been encoded with a low density parity check coding (LDPC), a Reed-Solomon coding, a Bose-Chaudhuri-Hocquenghem (BCH), and/or Polar coding. In the context of apparatus 300, the encoded data may be obtained from the memory device(s) 314, as described with respect to
Block 508 may be followed by block 512 that recites “retrieving a plurality of coefficients from a memory, the plurality of coefficients associated with the encoding technique.” As described herein, the processing unit may retrieve coefficients for mixing with encoded input data; for example, utilizing a memory look-up unit (MLU). For example, the memory may store (e.g., in a database) associations between coefficients and encoding techniques described herein. For example, the processing unit may request the coefficients from a memory part of the implementing computing device, from a memory part of an external computing device, or from a memory implemented in a cloud-computing device. In turn, the plurality of coefficients may be retrieved from the memory as requested by the processing unit.
Block 512 may be followed by block 516 that recites “calculating, at a first layer of multiplication/accumulation processing units (MAC units), the encoded data with the plurality of coefficients to generate first processing results.” As described herein, the processing unit utilizes the plurality of coefficients such that mixing the coefficients with encoded input data generates output data that reflects the processing of the input data with coefficients by the circuitry of
Block 516 may be followed by block 520 that recites “calculating, at additional layers of MAC units, the first processing results with the additional plurality of coefficients to generate second processing results.” As described herein, the processing unit utilizes additional plurality of coefficients such that mixing the coefficients with certain processing results generates output data that reflects the processing of the input data with coefficients by the circuitry of
Block 520 may be followed by block 524 that recites “providing output data representative of the encoded data being decoded, such that the output data is an estimate of decoded data based on the encoded data.” As described herein, the neural network 150 or recurrent neural network 170 provides output data as output bits which represent the processed data, corresponding to the encoded input data (e.g., m encoded input bits) having been decoded according to the encoding technique. Accordingly, neural networks described herein transform encoded input data to decoded data (e.g., an estimate of the decoded data). Block 524 may be followed by block 528 that ends the example method 500. In some examples, block 512 may be an optional block.
The blocks included in the described example methods 400 and 500 are for illustration purposes. In some embodiments, the blocks may be performed in a different order. In some other embodiments, various blocks may be eliminated. In still other embodiments, various blocks may be divided into additional blocks, supplemented with other blocks, or combined together into fewer blocks. Other variations of these specific blocks are contemplated, including changes in the order of the blocks, changes in the content of the blocks being split or combined into other blocks, etc.
Examples described herein may refer to various components as “coupled” or signals as being “provided to” or “received from” certain components. It is to be understood that in some examples the components are directly coupled one to another, while in other examples the components are coupled with intervening components disposed between them. Similarly, signals may be provided directly to and/or received directly from the recited components without intervening components, but also may be provided to and/or received from the certain components through intervening components.
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Nov 13 2019 | Micron Technology, Inc. | (assignment on the face of the patent) | / | |||
Nov 13 2019 | LUO, FA-LONG | Micron Technology, Inc | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 051002 | /0294 |