Bits in a received word that is based on a codeword of a polar code are decoded to generate decoded bits. A lower-order partial sum is updated based on the decoded bits, and a higher-order partial sum based on the lower-order partial sum is computed. The higher-order partial sum computation is a live computation performed during decoding of a subsequent bit in the received word in some embodiments. In decoding the subsequent bit, nodes in a data dependency graph (DDG) of the polar code may be traversed in a reverse order relative to node indices of at least some of the nodes in the DDG. A reverse order may also be applied to partial sum computations, to combine multiple lower-order partial sums that are based on previously decoded bits according to a reverse order relative to an order in which at least some of the previously decoded bits were decoded.
15. A method comprising:
receiving a word that is based on a codeword of a polar code;
decoding bits in the received word to generate decoded bits;
computing and storing a partial sum based on the decoded bits; and
decoding subsequent bits in the received word using the partial sum in traversing nodes in a data dependency graph (DDG) of the polar code in a reverse order relative to node indices of at least some of the nodes in the DDG.
1. A method comprising:
receiving a word that is based on a codeword of a polar code;
decoding bits in a decoding segment of the received word to generate decoded bits;
updating and storing a lower-order partial sum based on the decoded bits;
starting decoding of a subsequent bit in the received word;
computing a higher-order partial sum based on the lower-order partial sum during the decoding of the subsequent bit in the received word; and
storing the computed higher-order partial sum.
18. An apparatus comprising:
a receiver to receive a word that is based on a codeword of a polar code;
a decoder, coupled to the receiver, to decode bits in the received word to generate decoded bits, to compute and store a partial sum based on the decoded bits, and to decode subsequent bits in the received word using the partial sum in traversing nodes in a data dependency graph (DDG) of the polar code in a reverse order relative to node indices of at least some of the nodes in the DDG.
8. An apparatus comprising:
a receiver to receive a word that is based on a codeword of a polar code; and
a decoder, coupled to the receiver, to decode bits in a decoding segment of the received word to generate decoded bits, to update and store a lower-order partial sum based on the decoded bits, to start decoding of a subsequent bit in the received word, to compute a higher-order partial sum based on the lower-order partial sum during the decoding of the subsequent bit in the received word, and to store the computed higher-order partial sum.
21. A non-transitory processor-readable medium storing instructions which, when executed by one or more processors, cause the one or more processors to perform a method comprising:
receiving a word that is based on a codeword of a polar code;
decoding bits in the received word to generate decoded bits;
computing and storing a partial sum based on the decoded bits; and
decoding subsequent bits in the received word using the partial sum in traversing nodes in a data dependency graph (DDG) of the polar code in a reverse order relative to node indices of at least some of the nodes in the DDG.
14. A non-transitory processor-readable medium storing instructions which, when executed by one or more processors, cause the one or more processors to perform a method comprising:
receiving a word that is based on a codeword of a polar code;
decoding bits in a decoding segment of the received word to generate decoded bits;
updating and storing a lower-order partial sum based on the decoded bits;
starting decoding of a subsequent bit in the received word;
computing a higher-order partial sum based on the lower-order partial sum during the decoding of the subsequent bit in the received word; and
storing the computed higher-order partial sum.
2. The method of
3. The method of
wherein storing the computed higher-order partial sum comprises storing the computed higher-order partial sum to partial sum memory.
4. The method of
retrieving one of the multiple lower-order partial sums from the partial sum memory; and
retrieving another one of the multiple lower-order partial sums from partial sum scratchpad storage.
5. The method of
decoding bits in a subsequent decoding segment in the received word using the higher-order partial sum.
6. The method of
7. The method of
wherein the computing comprises combining multiple lower-order partial sums, including the lower-order partial sum and at least one further lower-order partial sum, which lower-order partial sums are based on previously decoded bits including the generated decoded bits, and
wherein the combining comprises combining the multiple lower-order partial sums according to a reverse order relative to an order in which at least some of the previously decoded bits were decoded.
9. The apparatus of
partial sum storage, coupled to the decoder, storing the lower-order partial sum,
wherein the decoder is configured to update and store the lower-order partial sum by updating the lower-order partial sum in the partial sum storage.
10. The apparatus of
partial sum memory coupled to the decoder,
wherein the decoder is configured to store the computed higher-order partial sum to the partial sum memory.
11. The apparatus of
partial sum scratchpad storage coupled to the decoder,
wherein the decoder is configured to compute the higher-order partial sum based on multiple lower-order partial sums including the lower-order partial sum and at least one further lower-order partial sum, and
wherein the decoder is further configured to retrieve one of the multiple lower-order partial sums from the partial sum memory, and to retrieve another one of the multiple lower-order partial sums from the partial sum scratchpad storage.
12. The apparatus of
wherein the decoder is further configured to decode bits in a subsequent decoding segment in the received word using the higher-order partial sum,
wherein the decoder is configured to decode the bits in the subsequent decoding segment by traversing nodes in a data dependency graph (DDG) of the polar code in a reverse order relative to node indices of at least some of the nodes in the DDG.
13. The apparatus of
wherein the decoder is further configured to decode bits in a subsequent decoding segment in the received word using the higher-order partial sum,
wherein the decoder is configured to compute the higher-order partial sum by combining multiple lower-order partial sums, including the lower-order partial sum and at least one further lower-order partial sum, which lower-order partial sums are based on previously decoded bits including the generated decoded bits, and
wherein the decoder is configured to combine the multiple lower-order partial sums according to a reverse order relative to an order in which at least some of the previously decoded bits were decoded.
16. The method of
combining the partial sum, and at least one further partial sum that is based on previously decoded bits, into a higher-order partial sum according to a reverse order relative to an order in which the decoded bits and the previously decoded bits were decoded.
17. The method of
19. The apparatus of
20. The apparatus of
The present disclosure relates generally to communications and, in particular, to computation of partial sums for decoding polar coded information.
Polar codes are proposed as channel codes for use in future wireless communications. These codes are competitive with state-of-the-art error correction codes and have low encoding complexity. See E. Arikan, “Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels,” IEEE Trans. Inf. Theory, vol. 55, no. 7, pp. 3051-3073, 2009. Successive Cancellation List (SCL) decoding is one option for decoding polar coded information.
It is desired to reduce the time and hardware resources required to decode received signals that are encoded using polar codes.
According to an aspect of the present disclosure, a method involves receiving a word that is based on a codeword of a polar code. Bits in a decoding segment of the received word are decoded, to generate decoded bits. A lower-order partial sum is updated based on the decoded bits. A higher-order partial sum based on the lower-order partial sum is computed during decoding of a subsequent bit in the received word.
In decoding the subsequent bit, nodes in a Data Dependency Graph (DDG) of the polar code may be traversed in a reverse order relative to node indices of at least some of the nodes in the DDG.
A reverse order may also be applied to partial sum computations. For example, the partial sum computation may involve combining multiple lower-order partial sums that are based on previously decoded bits. The multiple lower-order partial sums may be combined according to a reverse order relative to an order in which at least some of the previously decoded bits were decoded.
An apparatus includes a receiver to receive a word that is based on a codeword of a polar code, and a decoder coupled to the receiver. The decoder is configured to decode bits in a decoding segment of the received word to generate decoded bits, to update a lower-order partial sum based on the decoded bits, and to compute a higher-order partial sum based on the lower-order partial sum during decoding of a subsequent bit in the received word.
According to another method, a partial sum is computed based on the decoded bits, and is used in traversing nodes in a DDG of the polar code in a reverse order relative to node indices of at least some of the nodes in the DDG, to decode subsequent bits in the received word. In an embodiment, the partial sum and at least one further partial sum that is based on previously decoded bits are combined into a higher-order partial sum according to a reverse order relative to an order in which the decoded bits and the previously decoded bits were decoded. This combining of the lower-order partial sums could be performed during decoding of the subsequent bits using the higher-order partial sum.
A decoder that is coupled to the receiver could be configured to decode bits in the received word to generate decoded bits, to compute a partial sum based on the decoded bits, and to use the partial sum in traversing nodes in a DDG of the polar code in a reverse order relative to node indices of at least some of the nodes in the DDG, to decode subsequent bits in the received word. The decoder could be further configured to combine lower-order partial sums into a higher-order partial sum according to a reverse order relative to an order in which decoded bits that were used in computing the lower-order partial sums were decoded, during decoding of subsequent bits using the higher-order partial sum.
A non-transitory processor-readable medium could be used to store instructions which, when executed by one or more processors, cause the one or more processors to perform a method as disclosed herein.
Other aspects and features of embodiments of the present disclosure will become apparent to those ordinarily skilled in the art upon review of the following description.
Examples of embodiments of the invention will now be described in greater detail with reference to the accompanying drawings.
A polar SCL decoder decodes N-bit codewords using a successive cancellation polar code algorithm with L decoding paths, where L is the list size of the polar SCL decoder. This type of decoding is based on successive cancellation with an N-by-N polar code. To estimate a decoded bit value ûx, a tree of M stages (M=log2(N)) composed of F and G nodes combines pairs of channel Log Likelihood Ratio (LLR) values with a partial sum of previously decoded bits. An LLR value is a 6-bit signed binary number in an embodiment. The quantization, or bit length, of LLR values could be different in other embodiments. LLR bit length could be selected based on a target block error performance, for example. Although decoding computation accuracy increases with the number of bits in the LLR values, the size of a decoder also increases with the number of bits in the LLR values.
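The F and G node operations are not spelled out in this section. The sketch below assumes the widely used min-sum approximation for F and the standard G update, and saturates results to the 6-bit signed LLR range of the embodiment above. The names saturate, f_node and g_node are hypothetical.

```python
LLR_BITS = 6                          # 6-bit signed LLRs, as in the embodiment above
LLR_MIN = -(1 << (LLR_BITS - 1))      # -32
LLR_MAX = (1 << (LLR_BITS - 1)) - 1   # 31


def saturate(llr: int) -> int:
    """Clip an LLR to the signed 6-bit range [-32, 31]."""
    return max(LLR_MIN, min(LLR_MAX, llr))


def f_node(a: int, b: int) -> int:
    """F node (min-sum approximation, assumed): sign(a)*sign(b)*min(|a|, |b|)."""
    sign = -1 if (a < 0) != (b < 0) else 1
    return saturate(sign * min(abs(a), abs(b)))


def g_node(a: int, b: int, psum_bit: int) -> int:
    """G node: b + a when the partial sum bit is 0, b - a when it is 1."""
    return saturate(b - a if psum_bit else b + a)
```

Saturation matters because a G node can nearly double an LLR magnitude; with 6-bit storage, for example, g_node(30, 20, 0) is clipped to 31.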
Partial sum bits that are used by the G nodes are not shown in
LLR Stage #0 G nodes #0-7 use PSUM bits û0+û1+û2+û3+û4+û5+û6+û7, û1+û3+û5+û7, û2+û3+û6+û7, û3+û7, û4+û5+û6+û7, û5+û7, û6+û7 and û7
LLR Stage #1 G nodes #0-3 use PSUM bits û0+û1+û2+û3, û1+û3, û2+û3 and û3
LLR Stage #1 G nodes #4-7 use PSUM bits û8+û9+û10+û11, û9+û11, û10+û11 and û11
LLR Stage #2 G nodes #0-1 use PSUM bits û0+û1 and û1
LLR Stage #2 G nodes #2-3 use PSUM bits û4+û5 and û5
LLR Stage #2 G nodes #4-5 use PSUM bits û8+û9 and û9
LLR Stage #2 G nodes #6-7 use PSUM bits û12+û13 and û13
LLR Stage #3 G node #0 uses PSUM bit û0
LLR Stage #3 G node #1 uses PSUM bit û2
LLR Stage #3 G node #2 uses PSUM bit û4
LLR Stage #3 G node #3 uses PSUM bit û6
LLR Stage #3 G node #4 uses PSUM bit û8
LLR Stage #3 G node #5 uses PSUM bit û10
LLR Stage #3 G node #6 uses PSUM bit û12
LLR Stage #3 G node #7 uses PSUM bit û14.
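The listing follows a regular pattern. A sketch that regenerates it for any N is shown below, assuming the natural (non-bit-reversed) node and bit ordering of this N=16 example; psum_terms is a hypothetical helper. At stage s each G node reads one bit of a PSm partial sum with m=N/2^(s+1), and decoded bit ûi of that block contributes to PSUM bit j when (i & j) == j.

```python
def psum_terms(n: int, stage: int, node: int) -> list[int]:
    """Indices x of the decoded bits ûx summed (mod 2) into the PSUM bit
    used by G node `node` at LLR stage `stage` of an n-bit polar code."""
    m = n >> (stage + 1)          # partial sum size at this stage (PSm)
    group, j = divmod(node, m)    # which PSm block, and which bit within it
    base = group * 2 * m          # first decoded bit of that block
    # Bit j of the PSm block sums every û(base+i) whose matrix row i has a 1 in column j.
    return [base + i for i in range(m) if (i & j) == j]
```

For example, psum_terms(16, 1, 5) returns [9, 11], matching û9+û11 for Stage #1 G node #5.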
At each stage in
The decoder traverses the LLR tree from right (Stage #0) to left (Stage #3), top to bottom. To estimate the value û0 of the decoded bit #0:
1. Stage #0 F nodes #0-7 combine 16 channel LLR values in up to 8 parallel PE units
2. Stage #1 F nodes #0-3 combine Stage #0 LLR results in up to 4 parallel PE units
3. Stage #2 F nodes #0-1 combine Stage #1 LLR results in up to 2 parallel PE units
4. Stage #3 F node #0 combines Stage #2 LLR results in 1 PE unit.
The decoder uses the Stage #3 (last) LLR results to estimate each decoded bit #x value ûx. If the LLR value>=0, then the estimated value of decoded bit #x is ûx=0, and if the LLR value<0, then the estimated value of decoded bit #x is ûx=1.
The decoder may store the intermediate Stage #0-Stage #2 LLR values, so that estimation of the next decoded bits need not start over at Stage #0. For example, with the intermediate Stage #0-Stage #2 LLR values from the bit #0 decoding available from LLR memory, to estimate the value û1 of the decoded bit #1:
5. Stage #3 G node #0 combines Stage #2 LLR results with partial sum bit û0.
To estimate the value û2 of the decoded bit #2:
6. Stage #2 G nodes #0-1 combine Stage #1 LLR results with partial sum bits û0+û1 and û1
7. Stage #3 F node #1 combines Stage #2 LLR results.
To estimate the value û3 of the decoded bit #3:
8. Stage #3 G node #1 combines Stage #2 LLR results with partial sum bit û2.
To estimate the value û4 of the decoded bit #4:
9. Stage #1 G nodes #0-3 combine Stage #0 LLR results with partial sum bits û0+û1+û2+û3, û1+û3, û2+û3 and û3
10. Stage #2 F nodes #2-3 combine Stage #1 LLR results
11. Stage #3 F node #2 combines Stage #2 LLR results.
The decoder repeats this recursive process until it reaches the last codeword bit ûN-1.
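The traversal in steps 1 to 11 is the recursive successive cancellation schedule. The minimal single-path sketch below assumes min-sum F nodes, natural-order polar encoding x=u·G_N with G_N the Kronecker power of [[1, 0], [1, 1]], and a hypothetical `frozen` mask for bit positions fixed to 0 (frozen bits are not discussed in this section). The recursion runs the F nodes of a stage, decodes the upper half of the bits, then runs the G nodes of that stage with the partial sum of those bits. For N=16 this reproduces the node order listed above, and the hard decision follows the rule that LLR>=0 gives ûx=0.

```python
def f_node(a, b):
    """F node (min-sum approximation, assumed): sign(a)*sign(b)*min(|a|, |b|)."""
    sign = -1 if (a < 0) != (b < 0) else 1
    return sign * min(abs(a), abs(b))


def g_node(a, b, psum_bit):
    """G node: b + a when the partial sum bit is 0, b - a when it is 1."""
    return b - a if psum_bit else b + a


def polar_encode(u):
    """x = u * G_N in natural order: first half is left XOR right, second half is right."""
    if len(u) == 1:
        return list(u)
    h = len(u) // 2
    left, right = polar_encode(u[:h]), polar_encode(u[h:])
    return [a ^ b for a, b in zip(left, right)] + right


def sc_decode(llrs, frozen, offset=0):
    """Decode one path; returns (u_hat, psum), psum being u_hat re-encoded."""
    n = len(llrs)
    if n == 1:                                   # last stage: hard decision
        bit = 0 if frozen[offset] or llrs[0] >= 0 else 1
        return [bit], [bit]
    h = n // 2
    upper, lower = llrs[:h], llrs[h:]
    # F nodes of this stage, then decode the first half of the bits.
    u_left, ps_left = sc_decode([f_node(a, b) for a, b in zip(upper, lower)],
                                frozen, offset)
    # G nodes of this stage consume the partial sum of the bits just decoded.
    u_right, ps_right = sc_decode([g_node(a, b, s) for a, b, s in zip(upper, lower, ps_left)],
                                  frozen, offset + h)
    return u_left + u_right, [a ^ b for a, b in zip(ps_left, ps_right)] + ps_right
```

With noiseless channel LLRs (for example +4 for a 0 and -4 for a 1), sc_decode recovers the encoded bits exactly.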
A Successive Cancellation (SC) decoder tracks one decoding path. After the value of a decoded bit is estimated, the other possible value is ignored. Decoding continues with the next bit, assuming that the previous bits have been correctly estimated when updating partial sum results.
For example, for a codeword length of N=8, there are 2^8=256 possibilities for the estimated values û0 to û7. As codeword length increases, the number of possibilities grows exponentially, and evaluation of all combinations of ûx becomes impractical. By tracking multiple decoding paths according to a list of size L, SCL decoders may offer better decoding performance than SC decoders, with reasonable size and complexity. An SCL decoder monitors the best L decoding paths simultaneously.
Each decoding path from the root (decoded bit #0) of a decoding tree is associated with a Path Metric (PM). A decoding path appends each newly decoded bit to previous estimated values. After the LLR computations for each decoded bit, path metrics are continuously updated using the LLR values as follows:
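if the LLR value>=0: PM[0, i+1]=PM[i] and PM[1, i+1]=PM[i]+|LLR|
if the LLR value<0: PM[0, i+1]=PM[i]+|LLR| and PM[1, i+1]=PM[i],
where PM[i] is the current path metric.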
The best decoding paths have the smallest PM values. If an LLR is less than 0, then the decoded bit is most likely a 1, so the next PM for the estimated value 1 (PM[1, i+1]) remains the same as the current path metric, and the absolute LLR value is added to the PM for the estimated value 0 (PM[0, i+1]), in effect “penalizing” the less likely path with the absolute LLR value. If the LLR value is near 0, then the decision for the value of ûx is unreliable and the PM penalty on the penalized path is small.
For every decoded bit in a decoding tree, each decoding path produces 2 new decoding paths. After the number of decoding paths reaches L, an SCL decoder selects, from the 2L PMs for the 2L candidate decoding paths, the L paths with the lowest PMs, and drops the other L decoding paths. In Cyclic Redundancy Check (CRC)-aided list decoding, a CRC is run against the L selected decoding paths after the last codeword bit ûN-1 is estimated. The decoding path with a successful CRC and the best PM is selected as the decoded codeword. If all of the decoding paths fail the CRC, then the decoding path with the best PM may be selected.
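The path metric update and list pruning can be sketched as follows. A plain sort stands in for the hardware PM sort, and extend_paths, select_codeword and the crc_ok callable are hypothetical names; the CRC itself is not modeled.

```python
def extend_paths(paths, llrs, list_size):
    """One SCL step: split each path on the next bit and keep the best L.

    paths: list of (pm, bits); llrs: last-stage LLR of the next bit, per path.
    """
    candidates = []
    for (pm, bits), llr in zip(paths, llrs):
        likely = 0 if llr >= 0 else 1
        candidates.append((pm, bits + [likely]))                 # PM unchanged
        candidates.append((pm + abs(llr), bits + [1 - likely]))  # penalized by |LLR|
    candidates.sort(key=lambda c: c[0])                           # smallest PM is best
    return candidates[:list_size]                                 # drop the other paths


def select_codeword(paths, crc_ok):
    """CRC-aided selection: best PM among passing paths, else best PM overall."""
    passing = [p for p in paths if crc_ok(p[1])]
    return min(passing or paths, key=lambda p: p[0])[1]
```

Starting from [(0, [])], an LLR of -3 with L=2 yields [(0, [1]), (3, [0])]: the likely value 1 keeps PM 0 and the value 0 is penalized by 3.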
The partial sum bits for the G nodes are computed and updated using the estimated values ûx of previously decoded bits. A partial sum bit is a free-running sum (modulo 2) of some ûx based on the polar code matrix. Partial sum bits are individually computed for all L decoding paths. The partial sum bit that is used by a G node depends on the LLR stage and the node index or ID as shown in
PSUM bit #0=û0+û1+û2+û3+û4+û5+û6+û7
PSUM bit #1=û1+û3+û5+û7
PSUM bit #2=û2+û3+û6+û7
PSUM bit #3=û3+û7
PSUM bit #4=û4+û5+û6+û7
PSUM bit #5=û5+û7
PSUM bit #6=û6+û7
PSUM bit #7=û7.
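Each PSUM bit #k above sums every ûi whose row i of the polar code matrix has a 1 in column k. For the Kronecker-power matrix built from [[1, 0], [1, 1]], that happens exactly when the binary index k is a subset of i, that is, (i & k) == k. The sketch below computes a PSm partial sum this way, accumulating decoded bits one at a time as a decoder would; psum_bits is a hypothetical name.

```python
def psum_bits(u):
    """PSm(0:m-1) for decoded bits u (m = len(u), a power of two)."""
    m = len(u)
    ps = [0] * m
    for i, ui in enumerate(u):        # decoded bits arrive in order
        for k in range(m):
            if (i & k) == k:          # matrix row i has a 1 in column k
                ps[k] ^= ui           # modulo-2 sum
    return ps
```

For example, psum_bits([1, 0, 0, 1, 1, 1, 0, 1]) returns [1, 1, 0, 0, 1, 0, 1, 1].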
Partial sums are denoted herein as PSm(y:z), where m is the size of the partial sum (and the polar code matrix that defines how the partial sum bits are computed) and y:z is the partial sum bit range. The size of the polar code matrix, which is equivalent to the number of partial sum bits, is dependent upon the LLR stage. For example, for N=16, Stage #0 G nodes #0-7 receive partial sum bits from a PS8 partial sum, Stage #1 G nodes #0-3 and #4-7 receive partial sum bits from PS4 partial sums, Stage #2 G nodes #0-1, #2-3, #4-5, and #6-7 receive partial sum bits from PS2 partial sums, and Stage #3 G nodes #0, #1, #2, #3, #4, #5, #6, and #7 receive partial sum bits from PS1 partial sums. The PSUM size relationship is fixed to the DDG LLR stage. The LLR stage and decoded bit index or position in a received word determine the partial sum bit range.
For example, for an N=2048-bit codeword and a decoder with 11 Stages #0 to #10, partial sum bit details for decoded bit 169 are as follows in an embodiment:
LLR Stage #3 G nodes #0-127=PS128(0-127)
LLR Stage #5 G nodes #64-95=PS32(128-159)
LLR Stage #7 G nodes #80-87=PS8(160-167)
LLR Stage #10 G node #84=PS1(168).
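The bit-169 example follows from the binary expansion 169=128+32+8+1. Stage s holds PSm partial sums with m=N/2^(s+1), and the estimate of bit x depends on a G node at stage s exactly when x has the bit of value m set. The partial sum then covers the 2m-aligned block starting at y=x with its low bits cleared. The sketch below assumes, as in the N=16 listing, that the G node index is y/2 plus the offset within the partial sum; psums_for_bit is a hypothetical name.

```python
def psums_for_bit(n, x):
    """(stage, m, first bit, last bit, first G node, last G node) for each
    partial sum that the estimate of decoded bit x depends on."""
    stages = n.bit_length() - 1              # M = log2(N) LLR stages
    out = []
    for stage in range(stages):
        m = n >> (stage + 1)                 # PSm size at this stage
        if x & m:                            # bit x passes through a G node here
            y = x & ~(2 * m - 1)             # first decoded bit of PSm(y:y+m-1)
            out.append((stage, m, y, y + m - 1, y // 2, y // 2 + m - 1))
    return out
```

psums_for_bit(2048, 169) lists stage 3 (PS128(0-127), nodes #0-127), stage 5 (PS32(128-159), nodes #64-95), stage 7 (PS8(160-167), nodes #80-87) and stage 10 (PS1(168), node #84).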
The partial sum computation or update sequence is repetitive. The first half (upper rows) of the estimated values ûy:z contributes only to the first half of the PSUM bits (left columns). This is because the upper right quadrant of a polar code matrix contains all zeros, and partial sum bits are updated based on decoded bits according to the polar code matrix, as noted above. The second half (lower rows) of the estimated values ûy:z contributes to all PSUM bits (all columns). This polar code matrix structure and the corresponding partial sum computation sequence repeat for any polar code matrix size, as shown in
In
PS16(0-7)=PS8(0-7)+PS8(8-15)
PS16(8-15)=PS8(8-15),
and for PS32(0-31):
PS32(0-15)=PS16(0-15)+PS16(16-31)
PS32(16-31)=PS16(16-31).
This pattern can be used to generate polar code matrices, and to compute partial sums, of other sizes or orders. PS64(0-63) and PS128(0-127) polar code matrices, which define computations of higher-order partial sums, are shown in
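The recursion above (first half: lower-order left XOR lower-order right; second half: lower-order right) is enough to build a partial sum of any order from lower-order ones. A sketch with hypothetical names combine and build_psum:

```python
def combine(left, right):
    """PS2m(y:y+2m-1) from PSm(y:y+m-1) and PSm(y+m:y+2m-1)."""
    return [a ^ b for a, b in zip(left, right)] + list(right)


def build_psum(u):
    """PSm for decoded bits u (len(u) a power of two), built recursively."""
    if len(u) == 1:
        return list(u)                     # PS1 is the decoded bit itself
    h = len(u) // 2
    return combine(build_psum(u[:h]), build_psum(u[h:]))
```

For example, build_psum([1, 0, 0, 1, 1, 1, 0, 1]) returns [1, 1, 0, 0, 1, 0, 1, 1], the same PS8 obtained by summing the decoded bits according to the polar code matrix directly.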
According to conventional SCL decoding, a main decoding loop for each codeword bit x involves:
Decoding exits the main decoding loop when the Nth codeword bit is reached. The decoding path with the best PM, and optionally with a valid CRC in CRC-aided list decoding, is set as the decoded codeword.
All of the decoding paths are independent. However, for better performance, the DDG traversal in the first step of the conventional SCL main decoding loop may be executed for all decoding paths simultaneously. At the third step of the conventional SCL main decoding loop, partial sum updating may similarly be performed simultaneously for all decoding paths. After the number of decoding paths exceeds L, decoding of a next codeword bit i+1 cannot start until the PM sort is completed, the surviving decoding paths are selected, and all partial sums are updated. In some implementations, for better throughput and reduced latency, an SCL decoder decodes 2 or more bits in parallel. For example, when 2 bits are decoded, a last-stage F (even index) node and G (odd index) node can be run in parallel, and 4 child paths are generated from each decoding path. The best L paths are selected from the resulting 4L paths during path sorting, and all partial sums are updated.
A partial sum is a linear combination of previously decoded bits. A decoder stores N partial sum bits for each decoding path. The number of partial sum bits to be updated increases as the decoded bit index increases, to a worst case after bit index ((N/2)−1). In an example of N=2048, after decoding bit i=1023, 1024 partial sum bits require an update. For a list size L=32, the decoder must update 32 kb of partial sum bits for the decoding of codeword bit i=1024.
Partial sum bits are typically stored in memory because of the storage size required to accommodate the number of values to be stored. The number of cycles required to update the partial sum bits depends on memory width. For a 256-bit wide memory, for example, a conventional decoder spends 128 cycles updating 32 kb of partial sum bits after codeword bit i=1023, in an implementation with N=2048 and L=32. As shown in the table below, partial sum updates add 1024 cycles to codeword decoding latency in this example. This is a theoretical number, and does not account for decoder overhead and memory access latency.
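| Stage | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
| # of PSUM bits | 1024 | 512 | 256 | 128 | 64 | 32 | 16 | 8 | 4 | 2 | 1 |
| # of visits (G node) | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 128 | 256 | 512 | 1024 |
| # of cycles to update PSUM | 128 | 64 | 32 | 16 | 8 | 4 | 2 | 1 | | | |
| Total # of cycles | 128 | 128 | 128 | 128 | 128 | 128 | 128 | 128 | | | |
Total: 1024 cycles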
Optimal codeword decoding latency is approximately 17920 cycles, with parallel execution of 32 paths with 8 PEs per path, 5-cycle path metric sort per decoded bit, PSUM update and 1024-cycle CRC overhead. In this example, the partial sum updates consume 5.7% of total latency.
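The table entries follow from simple arithmetic. At stage s a G node needs N/2^(s+1) PSUM bits per path, the stage is visited 2^s times per codeword, and one update costs (PSUM bits × L)/(memory width) cycles. The sketch below reproduces the table and the 5.7% figure. Because the table lists cycle counts only for stages 0-7, it assumes that stages whose update is smaller than one memory word (stages 8-10) are not counted; psum_update_cycles is a hypothetical name.

```python
def psum_update_cycles(n, list_size, mem_width):
    """Rows of (stage, PSUM bits, visits, cycles per update, total cycles)."""
    rows = []
    for stage in range(n.bit_length() - 1):       # M = log2(N) stages
        bits = n >> (stage + 1)                    # PSUM bits per path
        visits = 1 << stage                        # G-node visits per codeword
        cycles = bits * list_size // mem_width     # memory words to rewrite
        if cycles:                                 # assumption: sub-word stages not counted
            rows.append((stage, bits, visits, cycles, cycles * visits))
    return rows


rows = psum_update_cycles(2048, 32, 256)
total = sum(r[4] for r in rows)                    # 1024 cycles
share = 100 * total / 17920                        # percent of ~17920-cycle latency
```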
According to an embodiment disclosed herein, partial sum bits in higher-order partial sums are computed (updated) during decoding of subsequent bits in a received word. Instead of dedicating cycles to updating all partial sum bits before starting decoding of a next bit, a decoder performs a partial sum computation during decoding of subsequent bits, and the results of the partial sum computations are used by G nodes to calculate LLR results. For example, partial sum bits could be updated based on decoded bit ûx during LLR computations for decoding subsequent bits.
As noted above, DDG nodes are normally traversed during decoding in a direction from right to left and top to bottom in
Partial sum computation during decoding of subsequent bits of a received word is also referred to herein as “live” partial sum computation. In an embodiment, live partial sum computation uses 3 storage structures, including:
These storage structures need not necessarily be implemented in distinct physical memory devices. For example, the in-flight partial sum storage could be implemented in flip-flops or registers as noted above, and the partial sum memory and the partial sum scratchpad could be implemented in one or more other memory devices. In an embodiment, the partial sum memory and the partial sum scratchpad are implemented using a dual-port memory device, with one port accessing the partial sum memory and the other port accessing the partial sum scratchpad. The in-flight partial sum storage, the partial sum memory, and the partial sum scratchpad could therefore be considered different logical sources of partial sums, but need not be implemented in different physical memory devices.
With live partial sum construction, a decoder main decoding loop for each bit #x may be rewritten as follows:
Live partial sum computation may reduce decoding latency by performing partial sum computations for at least higher-order partial sums during decoding of subsequent bits in a received word.
The “PSUM update” in the main decoding loop for the conventional polar code SCL decoder is a full PSUM update of all PSUM bits to which a current decoded bit contributes. With live PSUM computation, although there may be PSUM updates to PSUM bits in a lower-order PSUM, PSUM computations to update PSUM bits in higher-order PSUMs are performed during decoding of subsequent bits. The lower-order PSUM updates are shown as “In-Flight PSUM Update” in the main decoding loop for a polar code SCL decoder with live PSUM computation in
After each bit in a received word is decoded to generate an estimated value ûx, a partial sum PSm(y:z) is updated in the in-flight PSUM storage 602. This type of partial sum update is represented in
An 8-bit decoding segment is used herein as an illustrative example. Other decoding segment sizes may be used in other embodiments. Similarly, 8 PEs are used in implementing each decoding path in a decoder in an embodiment, but other embodiments could include different numbers of PEs. Although decoding segment length in bits may match the number of PEs, the number of bits per decoding segment may be different from the number of PEs per decoding path.
The PS8(0:7) partial sum shown in the in-flight PSUM storage 602 is written to the PSUM memory 606 when decoding of the 8-bit segment in this example is complete. This writing to memory, as shown at 610, is during decoding of bit #8 in an embodiment.
For this example of an 8-bit decoding segment, the in-flight PSUM storage 602 could also be erased when the PS8(0:7) partial sum is stored to the PSUM memory 606 at 610. The in-flight PSUM storage 602 could instead be over-written with new partial sum bits that are generated as subsequent bits in the received word are decoded.
As the bits #8 to #15 in the next 8-bit segment are decoded, partial sums are computed and stored to the in-flight PSUM storage 602. After bit #15 is decoded, and during decoding of bit #16 in the example shown, PS8(8:15) is transferred from the in-flight PSUM storage 602, to the PSUM scratchpad 604, as shown at 612. To compute the higher-order PS16(0:15), the lower-order PS8(8:15) is available in the PSUM scratchpad 604 as shown at 616, and PS8(0:7) is read from the PSUM memory 606 as shown at 614. PS16(0:15) is computed at 618 and is written to the PSUM memory 606 at 620. The PS16(0:15) computation involves only one memory read operation and only one memory write operation with the PSUM memory 606 in this example.
Partial sum computation proceeds in a similar manner for subsequent segments of bits. A partial sum is transferred from the in-flight PSUM storage 602 to either the PSUM scratchpad 604 or to the PSUM memory 606 after each decoding segment of 8 bits, as shown at 622, 624, 640. After every 16 bits, a higher-order partial sum computation is performed, at 636 for example, using a lower-order partial sum that is read from the PSUM memory 606 at 626, 628 and one or more lower-order partial sums available in the PSUM scratchpad 604, as shown at 630, 632, 634. The newly generated higher-order partial sum is written to the PSUM memory 606 at 638. Entries in the PSUM memory 606 which are no longer needed for decoding of subsequent bits or computation of other partial sums may be overwritten in the PSUM memory 606. As shown at 638, for example, PS16(0:15) and PS8(16:23) are overwritten by PS32(0:31).
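The segment flow above can be modeled behaviorally for one decoding path. In the sketch below, which is not the disclosed hardware, the in-flight PS8 is updated bit by bit. Each completed segment is then merged with PSUM memory entries of the same order in the binary-counter fashion traced above: PS8(0:7) is written to memory, PS8(8:15) is staged in the scratchpad and combined into PS16(0:15), and once PS8(24:31) arrives, PS16(0:15) and PS8(16:23) are replaced by PS32(0:31). Clock cycles, memory ports and the overlap with decoding of the next bit are not modeled, and the class and method names are hypothetical.

```python
SEGMENT = 8  # decoding segment length in bits, as in the example above


def combine(left, right):
    """Higher-order PS2m from two lower-order PSm halves: (left XOR right, right)."""
    return [a ^ b for a, b in zip(left, right)] + list(right)


class LivePsum:
    """In-flight storage, PSUM scratchpad and PSUM memory for one path."""

    def __init__(self):
        self.decoded = 0                # number of bits decoded so far
        self.in_flight = [0] * SEGMENT  # in-flight PS8 (e.g. flip-flops)
        self.scratchpad = None          # lower-order PSUM staged for combining
        self.memory = []                # PSUM memory: list of (first bit, bits)

    def bit_decoded(self, u):
        j = self.decoded % SEGMENT
        # In-flight PSUM update: û_j toggles every PS8 bit k with (j & k) == k.
        for k in range(SEGMENT):
            if (j & k) == k:
                self.in_flight[k] ^= u
        self.decoded += 1
        if j == SEGMENT - 1:
            self._segment_done()

    def _segment_done(self):
        block = (self.decoded - SEGMENT, self.in_flight)
        self.in_flight = [0] * SEGMENT
        # While the newest memory entry has the same order as the new block,
        # stage the block in the scratchpad, read its left partner from memory,
        # and combine the two into the next higher-order partial sum.
        while self.memory and len(self.memory[-1][1]) == len(block[1]):
            self.scratchpad = block
            left_start, left = self.memory.pop()           # memory read
            block = (left_start, combine(left, self.scratchpad[1]))
        self.memory.append(block)                          # memory write
```

After 24 decoded bits the model holds PS16(0:15) and PS8(16:23) in memory; after 32 bits these are replaced by a single PS32(0:31).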
During live PSUM computation, there is only one read operation or write operation involving the PSUM memory 606 per clock cycle. More than one read or write operation involving the PSUM memory 606 could nevertheless be performed over the multiple clock cycles of a PSUM computation. For example, with higher-order PSUMs, the PSUM bits could be written over multiple cycles. Consider an example computation of a PS16 higher-order partial sum. PS16(8:15) could be written in one clock cycle, and PS16(0:7) could be written in a second clock cycle, to perform a PSUM update in reverse order relative to indices of the G nodes that use the PSUM bits.
As discussed in further detail herein, G nodes in a DDG may also be executed in a reverse order relative to node index. Provided that at least as many PSUM bits are available in a clock cycle as there are PEs, live PSUM updating does not add decoding latency. In reverse order decoding, the G function is performed for a subset of G nodes in a DDG using available PSUM bits, while PSUM bits that will be used in other G nodes are being computed.
The PSUM scratchpad 604 is used to temporarily store lower-order partial sums that are used in computing higher-order partial sums. For example, PS8(8:15) contributes more than once to PS16(0:15), because the polar code matrix PS8(8-15) is in both lower quadrants of PS16(0-15) as shown in
Although not explicitly shown in
In conventional polar decoding, partial sums are updated after each bit is decoded, and before decoding of a next bit begins. As described herein, higher-order partial sum computations are performed during decoding of subsequent bits, and in this manner decoding latency could be reduced. In addition, partial sum memory ports may be running continuously for partial sum updates in conventional polar decoding, and partial sum updates may therefore consume significant power. Live higher-order partial sum computations as disclosed herein are performed only after segments of bits are decoded, thereby potentially reducing power consumption associated with partial sum updates. In
Faster or more power efficient memory could be used to store partial sums in conventional decoders, to reduce partial sum update latency or power consumption. For example, flip flops could be used to store partial sums. However, such storage for all partial sums tends to be practical for codeword lengths of at most a few hundred bits. Another hardware-based approach to improving partial sum update performance could involve storing partial sums in multiple shallow but wide memories. Such an approach, however, adds significant cost, and memory form factor could become a physical implementation issue.
The PSUM scratchpad 604 is provided to store lower-order partial sums that are used in computing higher-order partial sums, which may avoid at least some memory access operations with the PSUM memory 606 during higher-order partial sum computation. Consider an example with a codeword length of N=2048 bits, list size L=32 paths, 6-bit LLRs, and decoding of 4 codewords. The decoding of 4 codewords relates to an example implementation, and other embodiments could decode other numbers of codewords. Multi-codeword decoding may utilize existing hardware gates, when those gates would otherwise be idle, to process other codewords. For example, if only one codeword were to be decoded at a time, then the LLR nodes would be idle during each sort stage of decoding that codeword. In a multi-codeword decoding embodiment, another codeword could be processed at an LLR stage using those LLR nodes, instead of having those LLR nodes remain idle during the sort stage of a different codeword. Multi-codeword decoding could also realize a benefit from economy of scale of memory. As memory size increases to provide for multi-codeword decoding, the memory area per bit tends to decrease.
In the above example of 4-codeword decoding, LLR storage for 1024 Stage #0 LLR results is determined as follows:
1024 LLR results × 32 paths × 6 bits per LLR × 4 codewords = 786,432 bits.
LLR storage for Stage #1 (512 LLR results) through Stage #10 (1 LLR result) can be determined similarly. Over all 2047 LLR results from Stage #0 to Stage #10, total LLR storage in this example is 1,572,096 bits, or approximately 1535.3 kb (taking 1 kb as 1024 bits). For the PSUM memory 606, 1024 partial sum bits for 32 paths and 4 codewords occupy 128 kb in this example. Total memory space for the LLR memory and the PSUM memory 606 in this example is therefore approximately 1663.3 kb. In an embodiment, the PSUM scratchpad 604 is used to store only lower-order partial sums that contribute multiple times to a higher-order partial sum. The largest higher-order partial sum in this example is 1024 bits, and the largest lower-order partial sum that contributes multiple times to it is 512 bits. For 32 paths, total PSUM scratchpad 604 storage space is 16 kb (512 bits × 32 paths), which increases total storage space by only about 0.96%.
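The storage figures above can be reproduced with a few lines of Python. This is only a check of the stated arithmetic, assuming 1 kb = 1024 bits (the convention under which 1,572,096 bits comes to about 1535.3 kb); the variable names are illustrative, and the scratchpad is sized for 32 paths only, as in the text.

```python
# Storage check for the worked example: N = 2048, L = 32 paths,
# 6-bit LLRs, 4 codewords decoded together. Assumes 1 kb = 1024 bits.
N = 2048
PATHS = 32
LLR_BITS = 6
CODEWORDS = 4
STAGES = 11  # Stage #0 through Stage #10

# Stage #s produces N / 2**(s + 1) LLR results: 1024, 512, ..., 1
llr_results = [N >> (s + 1) for s in range(STAGES)]
stage0_llr_bits = llr_results[0] * PATHS * LLR_BITS * CODEWORDS
total_llr_bits = sum(llr_results) * PATHS * LLR_BITS * CODEWORDS

# PSUM memory: 1024 partial sum bits per path and per codeword
psum_bits = (N // 2) * PATHS * CODEWORDS

# Scratchpad: only the 512-bit lower-order partial sum, for 32 paths
scratchpad_bits = (N // 4) * PATHS

total_bits = total_llr_bits + psum_bits
overhead = scratchpad_bits / total_bits

print(stage0_llr_bits, total_llr_bits, psum_bits // 1024, scratchpad_bits // 1024)
print(f"{total_bits / 1024:.2f} kb, scratchpad overhead {overhead:.2%}")
```

The script prints 786432, 1572096, 128 and 16, followed by 1663.25 kb and an overhead of 0.96%, matching the rounded values given in the text.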
The foregoing discussion of storage space is an illustrative example, and storage space may be different in other embodiments.
As noted above, each G node combines a respective partial sum bit with LLR values in executing the G function. The 64-by-64 polar code matrix shown in
Reverse order partial sum computation and reverse order traversal of the PS64(0:63) structure in
G Nodes #56-63 (Clock Cycle 1):
G Nodes #48-55 (Clock Cycle 2):
G Nodes #40-47 (Clock Cycle 3):
G Nodes #32-39 (Clock Cycle 4):
G Nodes #0-31 (Over Clock Cycles 5 to 8, 8 Nodes in 8 PEs Per Cycle):
When PS64(0:63) is written to the PSUM memory, it overwrites the existing PS32(0:31), PS16(32:47) and PS8(48:55), which are no longer needed.
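The clock-cycle grouping above can be illustrated with a minimal sketch. It assumes the natural-order polar transform (Kronecker powers of [[1,0],[1,1]] with no bit reversal) and takes PSn(a:b) to mean the decoded bits a..b multiplied by the n-by-n transform over GF(2). Under that assumption PS64(32:63) = [PS16(32:47) XOR PS64(48:63), PS64(48:63)] and PS64(0:31) = PS32(0:31) XOR PS64(32:63), so handling the highest-indexed 8 G nodes first lets every cycle reuse partial sum bits finished in an earlier cycle. The function names, the LLR inputs, and the order used within Cycles 5 to 8 are hypothetical; `g_node` is the standard successive cancellation g update.

```python
import random


def polar_transform(u):
    """u times the natural-order Kronecker power of [[1,0],[1,1]], over GF(2)."""
    if len(u) == 1:
        return list(u)
    h = len(u) // 2
    lo, hi = polar_transform(u[:h]), polar_transform(u[h:])
    return [a ^ b for a, b in zip(lo, hi)] + hi


def g_node(llr_a, llr_b, ps_bit):
    """Standard SC g update: llr_b + (1 - 2 * ps_bit) * llr_a."""
    return llr_b - llr_a if ps_bit else llr_b + llr_a


def reverse_order_g_pass(ps32_0, ps16_32, ps8_48, ps8_56, llr_a, llr_b):
    """Hypothetical schedule: 8 G nodes (one per PE) per clock cycle,
    highest node indices first. ps16_32 holds PS16(32:47), so its index 0
    is bit 32; the other partial sum arguments are indexed the same way."""
    ps64 = [0] * 64
    g_out = [0] * 64
    done = set()
    trace = []
    # (first node, lower-order source bits, first node of the PS64 chunk to XOR in)
    schedule = [
        (56, ps8_56, None),        # Cycle 1: in-flight PS8(56:63) as-is
        (48, ps8_48, 56),          # Cycle 2: PS8(48:55) ^ PS64(56:63)
        (40, ps16_32[8:16], 56),   # Cycle 3: PS16(40:47) ^ PS64(56:63)
        (32, ps16_32[0:8], 48),    # Cycle 4: PS16(32:39) ^ PS64(48:55)
        (24, ps32_0[24:32], 56),   # Cycles 5-8: PS32(j) ^ PS64(j + 32)
        (16, ps32_0[16:24], 48),
        (8, ps32_0[8:16], 40),
        (0, ps32_0[0:8], 32),
    ]
    for cycle, (lo, src, partner) in enumerate(schedule, start=1):
        if partner is None:
            bits = list(src)
        else:
            # The XOR partner must have been produced in an earlier cycle
            assert partner in done
            bits = [a ^ b for a, b in zip(src, ps64[partner:partner + 8])]
        ps64[lo:lo + 8] = bits
        for j in range(lo, lo + 8):  # 8 G nodes, one per PE
            g_out[j] = g_node(llr_a[j], llr_b[j], ps64[j])
        done.add(lo)
        trace.append((cycle, lo, lo + 7))
    return ps64, g_out, trace


random.seed(1)
u = [random.randint(0, 1) for _ in range(64)]
llr_a = [random.randint(-8, 8) for _ in range(64)]  # hypothetical LLRs
llr_b = [random.randint(-8, 8) for _ in range(64)]
ps64, g_out, trace = reverse_order_g_pass(
    polar_transform(u[0:32]), polar_transform(u[32:48]),
    polar_transform(u[48:56]), polar_transform(u[56:64]), llr_a, llr_b)
print(trace[:4])  # [(1, 56, 63), (2, 48, 55), (3, 40, 47), (4, 32, 39)]
```

The inputs are exactly the three stored partial sums that PS64(0:63) later overwrites, plus the in-flight PS8(56:63), and the assembled PS64 equals the direct transform of all 64 bits.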
The example in
Consider again an example of N=2048, 11-stage decoding with 8-bit decoding segmentation. When the decoder starts the LLR computations at Stage #7 for decoded bit x (x=8, 24, 40, . . . ), it copies the in-flight PS8 from the current list position to the PSUM memory, as shown at 610, 622, 640 in
Higher-order partial sums are computed similarly. When the decoder starts the LLR computations at Stage #3 for N=2048 and decoded bit x (x=128, 384, 640, . . . ), it computes the PS128 based on the previous 128 decoded bits and writes it to the PSUM memory for the current list position. The PS128(0:127) computation, for example, uses the in-flight PSUM PS8(120:127) and the PSUM memory PS8(112:119), PS16(96:111), PS32(64:95) and PS64(0:63) from its parent decoding path. When the decoder starts the LLR computations at Stage #2 for decoded bit x (x=256, 768, 1280, 1792), it computes the PS256 based on the previous 256 decoded bits and writes it to the PSUM memory for the current list position. The PS256(0:255) computation, for example, uses the in-flight PSUM PS8(248:255), the PSUM memory PS8(240:247), PS16(224:239), PS32(192:223) and PS64(128:191), and the PSUM memory PS128(0:127) from its parent decoding path. When the decoder starts the LLR computations at Stage #1 for decoded bit x (x=512, 1536), it computes the PS512 based on the previous 512 decoded bits and writes it to the PSUM memory for the current list position. The PS512(0:511) computation uses the in-flight PSUM PS8(504:511), the PSUM memory PS8(496:503), PS16(480:495), PS32(448:479) and PS64(384:447), and the PSUM memory PS128(256:383) and PS256(0:255) from its parent decoding path. When the decoder starts the LLR computations at Stage #0 for decoded bit x (x=1024), it computes the PS1024 based on the previous 1024 decoded bits and writes it to the PSUM memory for the current list position. The PS1024(0:1023) computation uses the in-flight PSUM PS8(1016:1023), the PSUM memory PS8(1008:1015), PS16(992:1007), PS32(960:991) and PS64(896:959), and the PSUM memory PS128(768:895), PS256(512:767) and PS512(0:511) from its parent decoding path.
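The index ranges in these examples follow a single pattern, sketched below under the same natural-order assumption (no bit reversal). PSn over a block is assembled by starting from the in-flight PS8 at the top of the block and repeatedly prepending the next older stored partial sum of equal size, using PS(2m) = [older PSm XOR newer PSm, newer PSm]. The helper names `psum_components`, `build_higher_order`, `stage_for` and `trigger_bits` are hypothetical, and the stored sums are recomputed from known bits here purely for checking.

```python
import random


def polar_transform(u):
    """u times the natural-order Kronecker power of [[1,0],[1,1]], over GF(2)."""
    if len(u) == 1:
        return list(u)
    h = len(u) // 2
    lo, hi = polar_transform(u[:h]), polar_transform(u[h:])
    return [a ^ b for a, b in zip(lo, hi)] + hi


def log2(v):
    return v.bit_length() - 1


def psum_components(n, start=0, seg=8):
    """(size, first bit, last bit) of each lower-order partial sum used to
    build PSn(start:start+n-1), in the reverse order they are combined:
    the in-flight PS<seg> first, the oldest stored partial sum last."""
    end = start + n
    parts = [(seg, end - seg, end - 1)]  # in-flight
    size = seg
    while size < n:
        parts.append((size, end - 2 * size, end - size - 1))  # PSUM memory
        size *= 2
    return parts


def build_higher_order(lower_sums):
    """Fold partial sums given in psum_components order:
    PS(2m) = [older PSm XOR newer PSm, newer PSm]."""
    cur = list(lower_sums[0])
    for older in lower_sums[1:]:
        cur = [a ^ b for a, b in zip(older, cur)] + cur
    return cur


def stage_for(n, N=2048):
    """Stage whose LLR computation needs PSn; Stage #0 has N/2 nodes."""
    return log2(N) - 1 - log2(n)


def trigger_bits(n, N=2048):
    """Decoded-bit indices x at which PSn is computed: odd multiples of n."""
    return list(range(n, N, 2 * n))


print(psum_components(128))
# [(8, 120, 127), (8, 112, 119), (16, 96, 111), (32, 64, 95), (64, 0, 63)]
print(stage_for(128), trigger_bits(512))
# 3 [512, 1536]
```

Only the in-flight PS8 and one stored partial sum of each smaller size are touched, which is why the combination runs in reverse relative to the order in which the underlying bits were decoded.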
This example illustrates how higher-order partial sums are computed, by building up the higher-order partial sums from lower-order partial sums that are based on previously decoded bits. The lower-order partial sums are used in computing the higher-order partial sums in a reverse order relative to the order in which at least some of the previously decoded bits were decoded, and G nodes are traversed during decoding of subsequent bits in a reverse order relative to an order or index of the nodes in a DDG. In
A bit in the received word is decoded at 804 to generate a decoded bit. Decoded bits are output at 808, for additional receiver processing, if it is determined at 806 that decoding has reached the end of the received word. Otherwise, a lower-order partial sum is updated at 810 based on the decoded bit, and is stored to in-flight storage at 812. At 814, a determination is made as to whether decoding has reached the end of a decoding segment. For example, with reference to
At the end of a decoding segment, a higher-order partial sum is computed and stored at 816, during decoding of a subsequent bit in the received word at 804.
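The flow at 804 to 816 can be sketched sequentially as follows. The sketch covers partial sum bookkeeping only: decoded bits are taken as given rather than produced by LLR computations, the dictionary standing in for the PSUM memory is hypothetical, and step 816, which the text describes as running during decoding of the subsequent bit, is simply executed inline. The in-flight update relies on a property of the natural-order transform (an assumption carried over from the sketches above): row p of the 8-by-8 matrix has ones exactly at the columns c whose binary 1s are a subset of those of p.

```python
import random


def polar_transform(u):
    """u times the natural-order Kronecker power of [[1,0],[1,1]], over GF(2)."""
    if len(u) == 1:
        return list(u)
    h = len(u) // 2
    lo, hi = polar_transform(u[:h]), polar_transform(u[h:])
    return [a ^ b for a, b in zip(lo, hi)] + hi


def track_partial_sums(decoded_bits, seg=8):
    """Return a hypothetical PSUM memory: dict (size, first bit) -> partial sum."""
    n_total = len(decoded_bits)
    psum_memory = {}
    in_flight = [0] * seg
    for i, u in enumerate(decoded_bits):  # 804: bit i has been decoded
        p = i % seg
        # 810/812: XOR row p of the segment transform into the in-flight sum
        if u:
            for c in range(seg):
                if (c & ~p) == 0:  # c is a bit-subset of p
                    in_flight[c] ^= 1
        x = i + 1  # number of bits decoded so far
        if p != seg - 1 or x == n_total:
            continue  # 814: not a segment end (806: or end of the word)
        # 816: higher-order partial sum for the block ending at bit x - 1
        size = x & -x  # largest power of two dividing x
        cur, m, lo = in_flight[:], seg, x - seg
        while m < size:
            lo -= m
            older = psum_memory.pop((m, lo))  # no longer needed afterwards
            cur = [a ^ b for a, b in zip(older, cur)] + cur
            m *= 2
        psum_memory[(size, x - size)] = cur
        in_flight = [0] * seg
    return psum_memory


random.seed(3)
bits = [random.randint(0, 1) for _ in range(2048)]
memory = track_partial_sums(bits)
memory64 = track_partial_sums(bits[:64])
```

With a 64-bit input, the memory ends up holding PS32(0:31), PS16(32:47) and PS8(48:55), the stored partial sums that the PS64 example combines with the in-flight PS8(56:63). Popping consumed entries mirrors the overwriting of partial sums that are no longer needed.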
At 902, a partial sum is stored to temporary storage in the form of the PSUM scratchpad, and/or to the PSUM memory, as noted above with reference to
Higher-order partial sum computations at 906 in
Returning to
The decoder 1006 is configured to decode bits in a decoding segment of the received word to generate decoded bits, to update a lower-order partial sum based on the decoded bits, and to compute a higher-order partial sum based on the lower-order partial sum during decoding of a subsequent bit in the received word. The PSUM in-flight storage 1010 is illustrative of partial sum storage for storing lower-order partial sums as bits in each decoding segment are decoded, and the decoder 1006 is configured to update the lower-order partial sums in the PSUM in-flight storage. The decoder 1006 is also configured to store computed higher-order partial sums to the PSUM memory 1014. The PSUM scratchpad memory 1012 is illustrative of temporary partial sum storage, which the decoder 1006 is configured to use during computation of higher-order partial sums. For example, the decoder 1006 could be configured to retrieve lower-order partial sums from the PSUM scratchpad memory 1012 and the PSUM memory 1014 for computation of a higher-order partial sum, and/or to store computation results to one or both of the PSUM scratchpad memory and the PSUM memory. In some embodiments, the decoder 1006 is also, or instead, configured to implement reverse order PSUM computation and/or reverse order DDG node traversal.
In some embodiments, the apparatus 1100, and similarly the apparatus 1000 in
Communication equipment could include the apparatus 1000, the apparatus 1100, or both a transmitter and a receiver and both an encoder and a decoder. Such communication equipment could be user equipment or communication network equipment.
The previous description of some embodiments is provided to enable any person skilled in the art to make or use an apparatus, method, or processor readable medium according to the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles of the methods and devices described herein may be applied to other embodiments. Thus, the present disclosure is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Sep 09 2016 | Huawei Technologies Co., Ltd. | (assignment on the face of the patent) | / | |||
Sep 09 2016 | HAMELIN, LOUIS-PHILIPPE | HUAWEI TECHNOLOGIES CO , LTD | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 039688 | /0782 |
Date | Maintenance Fee Events |
Mar 08 2023 | M1551: Payment of Maintenance Fee, 4th Year, Large Entity. |
Date | Maintenance Schedule |
Sep 24 2022 | 4 years fee payment window open |
Mar 24 2023 | 6 months grace period start (w surcharge) |
Sep 24 2023 | patent expiry (for year 4) |
Sep 24 2025 | 2 years to revive unintentionally abandoned end. (for year 4) |
Sep 24 2026 | 8 years fee payment window open |
Mar 24 2027 | 6 months grace period start (w surcharge) |
Sep 24 2027 | patent expiry (for year 8) |
Sep 24 2029 | 2 years to revive unintentionally abandoned end. (for year 8) |
Sep 24 2030 | 12 years fee payment window open |
Mar 24 2031 | 6 months grace period start (w surcharge) |
Sep 24 2031 | patent expiry (for year 12) |
Sep 24 2033 | 2 years to revive unintentionally abandoned end. (for year 12) |