In one embodiment, a circuit for matrix decomposition is provided. The circuit includes an input circuit for receiving a first matrix. A permutation circuit is coupled to the input circuit and configured to interchange columns of the first matrix according to a selected permutation to produce a second matrix. A systolic array is coupled to the permutation circuit and configured to perform QR decomposition of the second matrix to produce a third matrix and a fourth matrix. A reverse permutation circuit is coupled to the systolic array and configured to interchange rows of the third matrix according to an inverse of the selected permutation to produce a first factor matrix and interchange rows of the fourth matrix according to the inverse of the selected permutation to produce a second factor matrix.
11. A processor-implemented method of decoding multiple input multiple output (MIMO) signals, comprising:
storing a first matrix of inputs from a MIMO receiver;
reordering the first matrix according to a selected permutation that is selected based on a value of a first element of column 1;
inputting the reordered first matrix into a first systolic array;
QR decomposing the first matrix by performing steps by a processor including:
triangularizing the reordered first matrix with the first systolic array to produce a second matrix;
performing an inversion of the second matrix to produce a third matrix; and
performing a left multiplication on the third matrix to produce a fourth matrix;
performing a cross-diagonal transposition on the fourth matrix to produce a fifth matrix;
performing right multiplication on the fifth matrix to produce a sixth matrix;
reverse permuting one of second, third, fourth, fifth or sixth matrices to place the sixth matrix in a non-permuted form;
multiplying the sixth matrix with a received signal vector to produce decoded MIMO signals; and
outputting the decoded MIMO signals.
1. A multiple input multiple output (MIMO) receiver circuit, comprising:
a receiver front end circuit configured to provide a set of channel transfer elements corresponding to radio frequency signals received on a subcarrier of a wireless channel by a plurality of antennas;
a channel estimation circuit coupled to the receiver front end circuit and configured to construct a channel matrix from the set of channel transfer elements;
a preprocessing circuit coupled to the channel estimation circuit, wherein the preprocessing circuit is configured to:
receive the channel matrix; and
reorder columns of the channel matrix into an input matrix according to an ordering that is selected based on a value of a first element of column 1;
a systolic array having boundary cells and internal cells, wherein the preprocessing circuit is coupled to one of the boundary cells and a subset of the internal cells, and the boundary cells and internal cells are configured to perform QR decomposition and minimum mean square error (MMSE) MIMO decoder weight matrix computation operations to produce a weight matrix;
a post-processing circuit coupled to the systolic array and configured to reorder rows of the weight matrix according to the selected ordering; and
an output circuit coupled to the systolic array and configured to multiply the weight matrix by the matrix of unresolved symbols from the receiver front-end circuit to produce an estimate of isolated symbols corresponding to the unresolved symbols.
2. The MIMO receiver circuit of
the selected ordering has the column at column index 1 swapped with the column at a column index i, wherein i is less than or equal to a number of columns in the channel matrix; and
the reorder of the rows of the weight matrix swaps the row at row index 1 with the row at row index i.
3. The MIMO receiver circuit of
4. The MIMO receiver circuit of
5. The MIMO receiver circuit of
the selected ordering has the column at column index 1 swapped with a column i, i being less than or equal to a number of columns in the channel matrix;
the first value in column i is greater than the selected threshold.
6. The MIMO receiver circuit of
7. The MIMO receiver circuit of
a first systolic array having boundary and internal cells, including the one of the boundary cells and a subset of the internal cells coupled to the preprocessing circuit, the first systolic array configured to perform triangularization and back-substitution on the input matrix to produce an output matrix; and
a second systolic array coupled to receive the output matrix of the first systolic array, wherein the second systolic array is configured to perform right and left multiplication operations and cross-diagonal transposition on the output matrix to produce a weight matrix.
8. The MIMO receiver circuit of
the second systolic array is coupled to the output of the first systolic array through the post-processing circuit; and
the post-processing circuit is coupled to the output of the first systolic array and is configured to reorder rows of the weight matrix according to the selected ordering by reordering rows of the matrix output from the first systolic array.
9. The MIMO receiver circuit of
10. The MIMO receiver circuit of
the boundary cells, other than the one boundary cell coupled to one of the pre-processing circuits, are respectively coupled to receive input from the internal cells;
each internal cell is respectively coupled to one of the boundary cells or one of the internal cells to receive a first input;
each internal cell is respectively coupled to one of the internal cells or one of the respective preprocessing circuits to receive a second input;
while the first systolic array is performing triangularization:
the boundary cells are configured to store respective first residues as a result of triangularization of the input matrix and to provide respective inverted residues from the first residues; and
the internal cells are configured to store respective second residues as a result of triangularization of the input matrix; and
while the first systolic array is performing back-substitution:
the boundary cells are configured to respectively multiply the inverted first residues with the first inputs to provide first outputs;
the internal cells are configured to respectively multiply the first inputs with the second residues to provide intermediate results; and
the internal cells are further configured to respectively add the intermediate results with the second inputs to provide second outputs.
12. The method of
13. The method of
14. The method of
15. The method of
16. The method of
17. The method of
18. The method of
19. The method of
The disclosed embodiments generally relate to applications utilizing QR matrix decomposition, and more particularly to multiple input multiple output (MIMO) communication between multiple transmitting antennas and multiple receiving antennas.
Data can be transmitted electromagnetically between a transmitting and a receiving antenna. The transmitter encodes the data into a sequence of symbols selected from a symbol constellation. The transmitting antenna transmits the symbols and the receiving antenna detects the symbols.
Interference from noise and reflections may corrupt the symbols received by the receiving antenna. For a maximum-likelihood detector, the receiver can compare the received signal with the expected received signal for all of the symbols in the constellation. The expected received signal that most closely matches the actual received signal provides the detected symbol.
A measurement of the characteristics of the communication medium helps proper symbol detection. In one example, the transmitter periodically transmits a known pattern of symbols to the receiver and the receiver uses the known pattern to determine the characteristics, such as multiple signal propagation paths, of the communication medium.
The data transfer rate may be increased by transmitting multiple symbols in parallel from multiple transmitting antennas. The detection of the multiple transmitted symbols improves by receiving the symbols with multiple receiving antennas. For maximum-likelihood detection with multiple transmitting antennas, the number of possible combinations of symbols transmitted in parallel is the order of the constellation raised to the power of the number of transmitting antennas. Evaluation of all possible combinations is infeasible for higher order modulation and a large number of antennas.
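The growth of the search space described above can be made concrete with a short calculation. The following is a minimal sketch; the function name `ml_candidate_count` and the example constellation orders and antenna counts are illustrative assumptions, not values taken from the disclosure.

```python
def ml_candidate_count(constellation_order: int, num_tx_antennas: int) -> int:
    """Number of symbol combinations an exhaustive maximum-likelihood
    detector would have to compare against the received signal."""
    return constellation_order ** num_tx_antennas


# Illustrative cases: (constellation order, number of transmitting antennas)
for order, antennas in [(4, 2), (16, 4), (64, 4)]:
    print(order, antennas, ml_candidate_count(order, antennas))
```

Even for four transmitting antennas, moving from a 16-symbol to a 64-symbol constellation multiplies the number of candidates by 256, which is why exhaustive evaluation quickly becomes impractical.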
The disclosed embodiments may address one or more of the above issues.
In one embodiment, a multiple input multiple output (MIMO) receiver circuit is provided. The receiver circuit includes a receiver front-end circuit configured to provide a set of channel transfer elements corresponding to radio frequency signals received on a subcarrier of a wireless channel by a plurality of antennas. Channel estimation circuitry is coupled to the receiver front-end circuit and configured to construct a channel matrix from the set of channel transfer elements. A preprocessing circuit is coupled to the channel estimation circuitry and is configured to receive the channel matrix and reorder columns of the channel matrix into an input matrix according to an ordering that is selected based on first values in at least two of the columns. The receiver circuit includes a systolic array having boundary cells and internal cells. The preprocessing circuit is coupled to one of the boundary cells and a subset of the internal cells, and the boundary cells and internal cells are configured to perform QR decomposition and MMSE operations to produce a weight matrix. A post-processing circuit is coupled to the systolic array and configured to reorder rows of the weight matrix according to the selected ordering. An output circuit is coupled to the systolic array and configured to multiply the weight matrix by the matrix of unresolved symbols from the receiver front-end circuit to produce an estimate of isolated symbols corresponding to the unresolved symbols.
In another embodiment, the selected ordering swaps the column at column index 1 with the column at column index i, where i is less than or equal to the number of columns in the channel matrix. The reordering of the rows of the weight matrix swaps the row at row index 1 with the row at row index i.
The reordering of the channel matrix, in another embodiment, is performed in response to the first element of column 1 having a value less than the first element of column i, wherein i is less than or equal to the number of columns in the channel matrix.
The reordering of the channel matrix is performed in response to the first element of column 1 having a value less than a selected threshold in another embodiment.
In another embodiment, the selected ordering has the column at column index 1 swapped with a column i. The first value in column i is greater than the selected threshold, and i is less than or equal to the number of columns in the channel matrix.
The selected ordering, in another embodiment, has columns sorted according to the value of the first element in each column, the sorting placing the column with the largest value at column index 1.
In another embodiment, the systolic array includes a first systolic array and a second systolic array. The first systolic array has boundary and internal cells, including the one of the boundary cells and a subset of the internal cells coupled to the preprocessing circuit. The first systolic array is configured to perform triangularization and back-substitution on the input matrix to produce an output matrix. The second systolic array is coupled to receive the output matrix of the first systolic array and is configured to perform right and left multiplication operations and cross-diagonal transposition on the output matrix to produce a weight matrix.
The second systolic array, in another embodiment, is coupled to the output of the first systolic array through the post-processing circuit. The post-processing circuit is coupled to the output of the first systolic array and is configured to reorder rows of the weight matrix according to the selected ordering by reordering rows of the matrix output from the first systolic array.
In another embodiment, the one of the boundary cells outputs a reciprocal of a number dependent on the input from the first column of the input matrix.
The boundary cells, other than the one boundary cell coupled to one of the pre-processing circuits, are respectively coupled to receive input from the internal cells in another embodiment. Each internal cell is respectively coupled to one of the boundary cells or one of the internal cells to receive a first input. Each internal cell is respectively coupled to one of the internal cells or one of the respective preprocessing circuits to receive a second input. While the first systolic array is performing triangularization, the boundary cells are configured to store respective first residues as a result of triangularization of the input matrix and to provide respective inverted residues from the first residues. The internal cells are configured to store respective second residues as a result of triangularization of the input matrix. While the first systolic array is performing back-substitution, the boundary cells are configured to respectively multiply the inverted first residues with the first inputs to provide first outputs, the internal cells are configured to respectively multiply the first inputs with the second residues to provide intermediate results, and the internal cells are further configured to respectively add the intermediate results with the second inputs to provide second outputs.
In another embodiment, a processor-implemented method of decoding MIMO signals is provided. A first matrix of inputs from a MIMO receiver is stored. The first matrix is reordered according to a selected permutation that is selected based on first values in at least two of the columns. The reordered first matrix is input into a first systolic array and QR decomposed by a processor by triangularizing the reordered first matrix with the first systolic array to produce a second matrix, performing an inversion of the second matrix to produce a third matrix, and performing a left multiplication on the third matrix to produce a fourth matrix. Cross-diagonal transposition is performed on the fourth matrix to produce a fifth matrix. Right multiplication is performed on the fifth matrix to produce a sixth matrix. One of the second, third, fourth, fifth, or sixth matrices is reverse permuted to place the sixth matrix in a non-permuted form. The sixth matrix is multiplied with a received signal vector to produce decoded MIMO signals.
The reverse permuting includes reordering the fifth matrix according to the transposition of the selected permutation in another embodiment.
In another embodiment, the reverse permuting includes reordering the fourth matrix according to the transposition of the selected permutation.
The reverse permuting, in another embodiment, includes reordering the third matrix according to the transposition of the selected ordering.
In another embodiment, the reverse permuting includes reordering the second matrix according to the transposition of the selected ordering.
The inversion of the second matrix is performed by the first systolic array in another embodiment.
The left multiplication on the third matrix and right multiplication on the fifth matrix are performed by the first systolic array in another embodiment.
In another embodiment, the left multiplication, cross-diagonal transposition, and right multiplication are performed by a second systolic array.
The selected permutation, in another embodiment, swaps the column at index 1 with a column at an index i in response to the first element of the column 1 having a value less than the first element of column i.
In another embodiment, a circuit for matrix factorization is provided. The circuit includes an input circuit configured to receive a first matrix. A permutation circuit is coupled to the input circuit and configured to interchange columns of the first matrix according to a selected permutation to produce a second matrix. A systolic array is coupled to the permutation circuit and configured to perform QR decomposition of the second matrix to produce a third matrix and a fourth matrix. A reverse permutation circuit is coupled to the systolic array and configured to interchange rows of the third matrix according to an inverse of the selected permutation to produce a first factor matrix and interchange rows of the fourth matrix according to the inverse of the selected permutation to produce a second factor matrix.
It will be appreciated that various other embodiments are set forth in the Detailed Description and Claims which follow.
Various aspects and advantages of the invention will become apparent upon review of the following detailed description and upon reference to the drawings, in which:
In multiple input multiple output (MIMO) systems, multiple (M) transmitting antennas transmit respective symbols in parallel to multiple (N) receiving antennas. Each of the receiving antennas receives a weighted sum of the respective symbols transmitted from the transmitting antennas. Various methods exist to decode or separate the symbols transmitted by each transmitting antenna. One such method involves using a QR decomposition of the channel matrix to compute the MIMO decoding weight matrix. In the QR decomposition and MIMO decoding calculation, a systolic array can be used to increase streaming throughput. A systolic array is an interconnected matrix of individual signal processing units, or “cells,” where the cells process individual elements of an input matrix and exchange processed output to perform an overall operation.
In many of the present methods, systolic array signal processing cells must perform calculations that are dominated by a multiplicative inverse (1/x) calculation of an input value (x). Alternatively, 1/sqrt(x) could also be calculated. Small input values, resulting from the weakening of signals during transmission, cause the multiplicative inverse (1/x) or (1/sqrt(x)) to become very large. Unless a larger number of bits is allocated, this may cause inaccuracies in results compared to floating point implementations. It may also cause overflow in the processing cells if the systolic array cannot process values with enough bits to store the multiplicative inverse.
Overflow occurs when a number becomes too large to be represented by the processing hardware. For example, if a processor ALU is n bits wide, then the largest unsigned number that can be output is $2^n - 1$. For practical reasons, a typical ALU keeps only the lower n bits of the result when overflow occurs. In QR decomposition and decoding MIMO symbols, this loss of precision can adversely affect the correct estimation of symbols or the computation of the QR decomposition. The conventional solution is to implement processing cells of the systolic array to allow values having a larger number of bits to be processed. This requires additional hardware, increasing the cost of hardware as well as the area needed to implement the systolic array.
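The effect of a small input on a fixed-point reciprocal can be illustrated numerically. The sketch below assumes a hypothetical 16-bit unsigned word with 8 fractional bits; the word length, the format, and the helper names `wrap_to_n_bits` and `reciprocal_fixed` are illustrative choices and are not specified by the disclosure.

```python
def wrap_to_n_bits(value: int, n_bits: int) -> int:
    """Keep only the lower n bits, as a typical ALU does on overflow."""
    return value & ((1 << n_bits) - 1)


def reciprocal_fixed(x_fixed: int, frac_bits: int) -> int:
    """Reciprocal of a fixed-point value with `frac_bits` fractional bits.

    The real value is x_fixed / 2**frac_bits, so its reciprocal in the same
    format is 2**(2 * frac_bits) / x_fixed (truncated toward zero).
    """
    return (1 << (2 * frac_bits)) // x_fixed


N_BITS = 16     # hypothetical word length
FRAC_BITS = 8   # hypothetical number of fractional bits

strong = 128    # represents 0.5
weak = 1        # represents 1/256, a heavily faded input

for x in (strong, weak):
    exact = reciprocal_fixed(x, FRAC_BITS)
    stored = wrap_to_n_bits(exact, N_BITS)
    print(x, exact, stored, "overflow" if stored != exact else "ok")
```

Under these assumptions the reciprocal of the strong input fits in the word, while the reciprocal of the weak input needs a 17th bit and wraps to zero.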
The disclosed embodiments provide a method and circuit that prevent overflow and reduce hardware requirements of the systolic array by selectably arranging elements of the input matrix in a manner that prevents division operations from becoming dominated by weakened input symbols. The arranging of elements in matrices may also be referred to as reordering, swapping, interchanging, or permuting and such terms are used interchangeably herein.
A model for the communication channel between the M transmitting antennas and the N receiving antennas is:
y=Hx+n
where H is an N×M channel matrix between the N receiving antennas and the M transmitting antennas, x is a column vector of M symbols transmitted from the transmitting antennas, n is a column vector of N received noise elements, and y is a column vector of N signals received at the receiving antennas. Each of the M transmitted symbols in column vector x is a symbol from a constellation having an order of w symbols.
An estimate $\hat{x}$ of the transmitted symbols can be computed by finding a weight matrix W that can multiply the received signal vector y. The weight matrix W can be computed as a minimum mean square error (MMSE) estimate of the inverse of H. The MMSE solution is given by:
$W = (H^H H + \sigma^{-2} I_{n_T})^{-1} H^H$
The MMSE solution above requires the generation of the $H^H H$ matrix. In various solutions the $H^H H$ multiplication can be avoided by using an extended channel matrix defined as:
The estimate $\hat{x}$ is defined in terms of the extended channel matrix as:
$\hat{x} = W y = (H^H H)^{-1} H^H y = H^{\dagger} y$
Both solutions require a matrix inverse of the H matrix. This is accomplished through QR decomposition as follows:
$H = Q R$
$H^{\dagger} = R^{-1} Q^H$
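The relation between the QR factors and the pseudo-inverse can be checked numerically. The following is a minimal sketch in plain Python. It uses modified Gram-Schmidt orthogonalization to obtain Q and R, which is a software stand-in chosen for brevity and is not the systolic array procedure of the disclosure. The example matrix and all function names are illustrative assumptions.

```python
def conj_transpose(A):
    """Hermitian (conjugate) transpose of a matrix stored as a list of rows."""
    return [[A[i][j].conjugate() for i in range(len(A))] for j in range(len(A[0]))]


def matmul(A, B):
    """Plain matrix product of two list-of-rows matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def qr_gram_schmidt(H):
    """Thin QR, H = Q R, by modified Gram-Schmidt (H needs full column rank)."""
    n, m = len(H), len(H[0])
    cols = [[complex(H[i][j]) for i in range(n)] for j in range(m)]
    R = [[0j] * m for _ in range(m)]
    q_cols = []
    for j in range(m):
        norm = sum(abs(a) ** 2 for a in cols[j]) ** 0.5
        R[j][j] = complex(norm)
        q = [a / norm for a in cols[j]]
        q_cols.append(q)
        for k in range(j + 1, m):
            # Remove the component of column k that lies along q.
            R[j][k] = sum(qa.conjugate() * a for qa, a in zip(q, cols[k]))
            cols[k] = [a - R[j][k] * qa for a, qa in zip(cols[k], q)]
    Q = [[q_cols[j][i] for j in range(m)] for i in range(n)]
    return Q, R


def invert_upper(R):
    """Invert an upper triangular matrix by back-substitution, column by column."""
    m = len(R)
    X = [[0j] * m for _ in range(m)]
    for j in range(m):
        for i in range(j, -1, -1):
            rhs = (1.0 if i == j else 0.0) - sum(
                R[i][k] * X[k][j] for k in range(i + 1, j + 1))
            X[i][j] = rhs / R[i][i]
    return X


def pseudo_inverse(H):
    """Pseudo-inverse computed as R^{-1} Q^H."""
    Q, R = qr_gram_schmidt(H)
    return matmul(invert_upper(R), conj_transpose(Q))


# Illustrative 3 x 2 complex channel matrix (values are arbitrary).
H_example = [[1 + 1j, 2], [0.5, 1 - 1j], [2j, 1]]
product = matmul(pseudo_inverse(H_example), H_example)
print(product)  # close to the 2 x 2 identity, up to rounding
```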
In the case of the extended channel matrix solution the QR decomposition of the extended matrix can be expressed as:
By equating the lower block the following solution is obtained:
With this solution the estimate $\hat{x}$ can be expressed as:
where
$W = R^{-1} Q_1^H$
$Q_1$ can be calculated by equating the upper block matrix as:
$H = Q_1 R$, so that $Q_1 = H R^{-1}$
The calculation of the weight matrix through MMSE QR decomposition can be implemented using one or more systolic arrays. A systolic array is an interconnected matrix of individual signal processing units or cells, where overall operation of the systolic array depends upon functions of the individual signal processing cells and the interconnection scheme of such signal processing cells. A clock signal may be applied to a systolic array to control data flow through each cell. Alternatively, operations of an individual cell may be triggered by the arrival of input data objects.
The interconnection scheme of some systolic arrays may include interconnects only between nearest neighbor signal processing cells within a systolic array. However, interconnection schemes are not limited to having only nearest neighbor interconnects.
In matrix processing operations, matrix elements are passed between cells according to element relationship and the function to be performed. For example, matrix multiplication is performed by inputting one row of the matrix at a time from the top of the array, which is passed down the array. The other matrix is input one column at a time from the left hand side of the array and passes from left to right. When each cell has processed one whole row and one whole column, the result of the multiplication is stored in the array and can now be output a row or a column at a time, flowing across or down the array.
The systolic array implementation of the MMSE calculation is advantageous because it is easily scalable as the number of antenna channels used increases. To calculate MMSE in a systolic array, the triangularization operation (which is part of QR decomposition) is performed on the extended channel matrix H and a triangular matrix R is generated. The triangularized matrix R is inverted using back-substitution within the systolic array to generate $R^{-1}$. The $Q_1$ matrix is then generated by left multiplication of the original channel matrix H with $R^{-1}$. $Q_1^H$, the Hermitian transpose of $Q_1$, is generated by special circuitry and wiring between the output and input of the systolic array. The weight matrix W is then generated by right multiplying $Q_1^H$ with $R^{-1}$. An estimate $\hat{x}$ is then computed by multiplying weight matrix W with received signal vector y.
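The sequence of operations described above (triangularization, back-substitution, left multiplication, transposition, right multiplication) can be traced in software. The sketch below makes several assumptions that are not taken from the text: matrices are real-valued, the extended matrix is formed by stacking H on top of sqrt(reg) times the identity, and `reg` stands for whatever scalar multiplies the identity in the MMSE expression. Triangularization uses Givens rotations executed sequentially rather than in a systolic array.

```python
import math


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def triangularize(A):
    """Upper triangular R with R^T R = A^T A, obtained by Givens rotations."""
    A = [list(map(float, row)) for row in A]
    n, m = len(A), len(A[0])
    for j in range(m):
        for i in range(j + 1, n):
            r = math.hypot(A[j][j], A[i][j])
            if r == 0.0:
                continue
            c, s = A[j][j] / r, A[i][j] / r
            for k in range(j, m):
                top, bot = A[j][k], A[i][k]
                A[j][k] = c * top + s * bot    # updated stored value
                A[i][k] = -s * top + c * bot   # rotated value passed on
    return [A[i][:] for i in range(m)]


def invert_upper(R):
    """Back-substitution: invert an upper triangular matrix column by column."""
    m = len(R)
    X = [[0.0] * m for _ in range(m)]
    for j in range(m):
        for i in range(j, -1, -1):
            rhs = (1.0 if i == j else 0.0) - sum(
                R[i][k] * X[k][j] for k in range(i + 1, j + 1))
            X[i][j] = rhs / R[i][i]
    return X


def mmse_weights_qr(H, reg):
    m = len(H[0])
    # Assumed extended matrix: H stacked on sqrt(reg) * I.
    H_ext = [list(row) for row in H] + [
        [math.sqrt(reg) if i == j else 0.0 for j in range(m)] for i in range(m)]
    R = triangularize(H_ext)              # triangularization
    R_inv = invert_upper(R)               # back-substitution
    Q1 = matmul(H, R_inv)                 # Q1 = H R^{-1}
    return matmul(R_inv, transpose(Q1))   # W = R^{-1} Q1^T


# Illustrative 3 x 2 real channel matrix and regularization value.
H = [[0.9, 0.3], [0.2, 0.7], [0.4, 0.1]]
reg = 0.05
W = mmse_weights_qr(H, reg)
print(W)
```

Because $R^T R = H^T H + \text{reg} \cdot I$ under these assumptions, the result satisfies $(H^T H + \text{reg} \cdot I) W = H^T$, which provides a direct check on the computation.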
The systolic array cells may be configured to operate in different modes to perform each function of the MMSE calculation. As such, some systolic array configurations will implement all functions of the MMSE calculation within a single systolic array with a different mode for each function to be performed. Alternatively, the various functions of the MMSE calculation may be performed by separate systolic arrays, where the output matrix of one array is passed as input to the next.
Implementation of the boundary cells is different from the internal cells. Boundary cells are configured to calculate initial values that are passed on to and/or updated by the internal cells. For example, in triangularization, the boundary cells are configured to calculate rotation factors which are passed through and applied by the internal cells.
Systolic arrays for QR decomposition are advantageous in that they are fast and scale easily as the matrix size increases or, in the context of MMSE, as the number of MIMO antennas is increased. However, the MMSE calculation includes division operations, which may lead to overflow or loss of precision if the inputs become too small due to weakening of the symbols during transmission.
Rotation factors are calculated and updated as each element of the matrix is input to and processed by each cell. In calculating rotation factors c and s, the denominator value $r_{new}$ is dependent on the input x. When processing the first input to the boundary cell, no previous value is stored ($r_{old} = 0$), so the denominator value $r_{new}$ is equal to the square root of $x^2$, that is, the magnitude of x. As a consequence, if the first input value x is very small, as a result of signal fading, the value of $1/\sqrt{r_{old}^2 + x^2}$ or $1/r_{new}$ from the above equations will be a very large number, leading to inaccurate results as compared to floating point implementations unless a large number of bits is used for the fixed point operations.
In contrast, when processing subsequent inputs to the boundary cell, the denominator value $r_{new}$ is dependent on both $r_{old}$ and x. If the first input is large, $r_{old}$ will also be large, ensuring that the denominator $r_{new}$ is not dominated by a small value x. It is therefore desirable to have the first input to a boundary cell be a larger value.
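The sensitivity of the boundary cell to the order of its inputs can be shown with a small simulation. The sketch below is real-valued and assumes the common Givens form $r_{new} = \sqrt{r_{old}^2 + x^2}$, $c = r_{old}/r_{new}$, $s = x/r_{new}$; the class name and input values are hypothetical, and nonzero inputs are assumed.

```python
import math


class BoundaryCell:
    """Real-valued sketch of a boundary cell that stores r and emits c and s."""

    def __init__(self):
        self.r = 0.0  # nothing is stored before the first input

    def step(self, x):
        r_new = math.hypot(self.r, x)   # sqrt(r_old^2 + x^2)
        inv_r = 1.0 / r_new             # reciprocal that must fit the word length
        c, s = self.r * inv_r, x * inv_r
        self.r = r_new
        return c, s, inv_r


def largest_reciprocal(inputs):
    """Largest value of 1 / r_new seen while one cell processes the inputs in order."""
    cell = BoundaryCell()
    return max(cell.step(x)[2] for x in inputs)


weak_first = largest_reciprocal([0.001, 0.8])     # faded element enters first
strong_first = largest_reciprocal([0.8, 0.001])   # same elements, strong one first
print(weak_first, strong_first)
```

With the faded element first, the cell must represent a reciprocal near 1000. With the strong element first, the largest reciprocal is 1.25, even though the same two values are processed.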
The disclosed embodiments prevent overflow and loss of precision by ensuring that the first input x to the boundary cell is sufficiently large to avoid overflow in the multiplicative inverse division operation. If the first element in the first column of the input matrix is a small value, the element is swapped with another element of the input matrix. This is performed by permuting the ordering of rows or columns of the matrix prior to QR decomposition with the systolic array. In this manner, the input of small initial values to the boundary cells can be avoided. This method allows fixed point implemented operations to be optimized to use fewer bits and reduce hardware requirements.
In the language of matrices, the columns or rows may be inter-changed to produce equivalent systems, which have the same solution as the original. A matrix may be permuted into an equivalent system and processed as the equivalent system. After processing is complete, the reverse permutation of the permuted result produces the un-permuted result.
The interchanging of rows or columns is represented by the multiplication of a matrix A with a permutation matrix P. Multiplication of permutation matrix P with A produces the permuted matrix B, which can be used for intermediate processing. Once intermediate processing is completed, the original matrix A is restored from the permuted result matrix B by multiplying B with reverse permutation matrix $P^T$. This relationship is shown by:
$P A = B$, and
$A = P^T B$
The reverse permutation matrix $P^T$ is the inverse of the permutation matrix P. The product of $P^T$ and P produces the identity matrix I, so applying the permutation followed by the reverse permutation leaves a matrix unchanged. Example matrix A and permutation matrices are:
Using these matrices, permutation of A with P produces the permuted matrix:
In this example, the permutation process moves the first, second, and third rows to the third, first, and second rows, respectively. Reverse permutation of the permuted matrix B restores the original matrix:
In this manner, the ordering of an input matrix can be changed for processing and restored, once processing is completed, to produce the correct result. The disclosed embodiments selectably permute the input channel matrix to allow elements to be processed in an order that does not result in processing overflow.
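The permutation and reverse permutation described above can be reproduced with a small numeric example. The matrix A below is an illustrative assumption. P is chosen so that the first, second, and third rows of A move to the third, first, and second rows of B, matching the row movement described in the text.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


# Illustrative matrix; any 3 x 3 matrix would do.
A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]

# Row j of P selects which row of A becomes row j of B.
P = [[0, 1, 0],
     [0, 0, 1],
     [1, 0, 0]]

B = matmul(P, A)                      # permuted matrix
A_restored = matmul(transpose(P), B)  # reverse permutation
identity = matmul(transpose(P), P)    # P^T P

print(B)
print(A_restored)
print(identity)
```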
In the example implementation of
$X_{out} = -s \cdot r + c \cdot X_{in}$
$r_{new} = c^{*} \cdot r + s^{*} \cdot X_{in}$
In calculating the value of $X_{out}$, the value of $r_{new}$ is dependent on the values of $X_{in}$ and $r_{old}$, which are calculated from previously processed elements of the channel matrix. Even if $X_{in}$ is a weakened signal, $r_{old}$ will be sufficiently large to ensure that no overflow or loss of precision occurs.
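The two update equations for the internal cell can be exercised directly. In the sketch below, `InternalCell.step` implements the equations as written, while `rotation_factors` is an assumed boundary-cell rule (real c, complex s) included only so that the example has consistent c and s values to apply. The class name, helper name, and input values are hypothetical.

```python
def rotation_factors(r_old, x):
    """Assumed boundary-cell rule: r_new = sqrt(r_old^2 + |x|^2),
    c = r_old / r_new, s = x / r_new."""
    r_new = (r_old ** 2 + abs(x) ** 2) ** 0.5
    return r_old / r_new, x / r_new, r_new


class InternalCell:
    """Applies the rotation (c, s) to its stored value r and its input."""

    def __init__(self):
        self.r = 0j

    def step(self, c, s, x_in):
        x_out = -s * self.r + c * x_in
        # Stored value update: r_new = c* . r + s* . X_in
        self.r = complex(c).conjugate() * self.r + complex(s).conjugate() * x_in
        return x_out


# Two illustrative input rows (first-column element, second-column element).
rows = [(0.8 + 0.2j, 0.3 - 0.1j), (0.1 - 0.4j, 0.5 + 0.6j)]

r_boundary = 0.0
cell = InternalCell()
energies = []
for a, b in rows:
    c, s, r_boundary = rotation_factors(r_boundary, a)
    energy_in = abs(cell.r) ** 2 + abs(b) ** 2
    x_out = cell.step(c, s, b)
    energy_out = abs(cell.r) ** 2 + abs(x_out) ** 2
    energies.append((energy_in, energy_out))

print(energies)
```

Under the assumed rule the rotation is unitary, so the combined energy of the stored value and the output equals the combined energy of the previous stored value and the input at every step.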
In the previous example, pre-reordering is described in terms of column permutations and post-reordering in terms of row permutations. One skilled in the art will recognize that pre-reordering may be performed using row permutations and post-reordering using column permutations as well.
In one embodiment, the channel matrix is permuted to prevent overflow in the first boundary cell. By ensuring that a sufficiently strong element is processed by this boundary cell, a sufficiently strong output from the boundary cell will propagate through the systolic array, preventing overflow from occurring in the other boundary cells as well.
Accordingly, a first input row 451 for input of matrix H is H1, 0, 0, 0 as respectively input to cells 401 through 404. Furthermore, a second input row 452 for input of matrix H includes values 0, H2, 0, 0, respectively input to cells 401 through 404. A third input row 453 for input of matrix H is 0, 0, H3, 0 as respectively input to cells 401 through 404. A fourth input row 454 for input of matrix H does not include any zero padding in the depicted exemplary embodiment; however, input rows after row 454 do include zero padding in the depicted exemplary embodiment. Accordingly, rows 451 through 454 of matrix H may be input as staggered with zero padding for multiplication.
As H is input and triangularization is performed, the c and s values calculated in boundary cell 401 will propagate through the array and be used for calculation in other cells. If overflow is prevented in boundary cell 401, the propagated c and s values will be values sufficient to ensure that Xin values input to other boundary cells are sufficiently large in order to prevent overflow or loss of precision in the other boundary cells.
When triangularization is complete, trained register values of the processing cells contain matrix R. On the right side of systolic array 400, output 460 may be obtained. If the systolic array is configured to perform back-substitution in addition to triangularization, each cell will switch to a back-substitution mode following triangularization, and would use the stored R values to perform the inversion operation. After back-substitution, each cell would be trained to contain $R^{-1}$ values. The inverted matrix $R^{-1}$ would be shifted to outputs 460 on the right side of systolic array 400.
Alternately, if the systolic array were configured to operate in yet another mode to perform the left multiplication operation, the trained values $R^{-1}$ would not be shifted to output but would be maintained within each cell to perform the left multiplication operation. In some embodiments, the trained stored values in a systolic array are referred to as residues and such terms are used interchangeably herein.
The systolic array implementations of
The various embodiments described herein perform reordering according to a number of different permutations. For ease of explanation, the following example permutations are described in terms of sorting columns according to the first element in each column. One skilled in the art will recognize that the channel matrix may be reordered by permuting the rows according to the first element in each row.
In one embodiment, the permutation may entail sorting columns according to the strength of the first element in each of the columns. The columns may be sorted according to a number of various sorting methods such as: bubble sort, insertion sort, shell sort, merge sort, heap sort, quick sort, counting sort, bucket sort, radix sort, etc.
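A sorting permutation of this kind can be sketched in a few lines. The example below uses Python's built-in sort and takes the magnitude of the first element as its "strength"; both choices, along with the function name and the example values, are assumptions made for illustration.

```python
def sort_columns_by_first_element(H):
    """Return (order, reordered), where order[j] is the original index of the
    column placed at position j, strongest first element first."""
    order = sorted(range(len(H[0])), key=lambda j: abs(H[0][j]), reverse=True)
    reordered = [[row[j] for j in order] for row in H]
    return order, reordered


# Illustrative 2 x 4 matrix; only the first row drives the ordering.
H = [[0.2, 0.9, 0.05, 0.6],
     [1, 2, 3, 4]]
order, H_sorted = sort_columns_by_first_element(H)
print(order)
print(H_sorted)
```

Keeping `order` alongside the reordered matrix matters because the same ordering is needed later to reverse the permutation.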
In several embodiments, swapping of two rows is sufficient to prevent overflow. In one such embodiment the first column is swapped with the column having the strongest first element of the respective columns. The permutation method may selectably interchange the first column with the column having the highest value in the first element of the respective columns.
In some embodiments, permutation may be selectably performed based on the value of the first element in the first column. The value of the first element is analyzed to determine whether the un-permuted input matrix will result in overflow. This may be performed by comparing the first value to a selected threshold value. If the value is less than the selected threshold value, permutation is performed according to the implemented permutation method. The selected threshold may be chosen to be the minimum value that will not cause an overflow.
The above permutation methods are provided for exemplary purposes. One skilled in the art will recognize that a number of other permutation methods may be used as well.
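The single-swap permutation with a threshold check can be illustrated as follows. This is a sketch only: it assumes that the "strength" of an element means its magnitude, and the names `select_permutation`, `permute_columns`, and `threshold` are hypothetical rather than taken from the embodiments.

```python
def select_permutation(matrix, threshold):
    """Return a column order for the matrix.

    If the first element of the first column is below the threshold, column 0
    is swapped with the column whose first element has the largest magnitude.
    Otherwise the identity order is returned and no permutation is performed.
    """
    n = len(matrix[0])
    perm = list(range(n))
    if abs(matrix[0][0]) < threshold:
        strongest = max(range(n), key=lambda j: abs(matrix[0][j]))
        perm[0], perm[strongest] = perm[strongest], perm[0]
    return perm

def permute_columns(matrix, perm):
    """Column j of the result is column perm[j] of the input."""
    return [[row[p] for p in perm] for row in matrix]

H = [[0.01, 2.0, 0.5],
     [1.0, 1.0, 1.0]]
perm = select_permutation(H, threshold=0.1)
print(perm)                      # [1, 0, 2]
print(permute_columns(H, perm))  # [[2.0, 0.01, 0.5], [1.0, 1.0, 1.0]]
```

Returning the column order rather than a permuted copy keeps the permutation available for the later reverse permutation step.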
Upper right triangularization is performed on the extended channel matrix H at step 708 using a systolic array, which conditions the systolic array with triangularized matrix R. Back substitution is performed on R at step 710 to obtain inverted matrix R⁻¹. Left multiplication of extended channel matrix H with R⁻¹ is performed at step 712 to provide matrix Q1. Cross diagonal transposition is performed on matrix Q1 at step 713 to produce Q′1. Right multiplication of Q′1 with R⁻¹ is then performed to provide weight matrix W at step 714.
Reverse permutation is performed on weight matrix W with reverse permutation matrix Pᵀ at step 716 to place the matrix in un-permuted form W′. Received symbols matrix y is obtained at step 718 and right multiplied with matrix W′ to obtain an estimate of transmit symbols matrix X at step 720. Estimated data symbols 724 are output from X.
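The sequence of steps 708 through 720 can be traced numerically with a short software model. The sketch below rests on several assumptions that are not stated in this passage: matrices are real-valued, so an ordinary transpose stands in for the transposition step (the cross diagonal data layout of the array is not modeled); the extended channel matrix is formed by stacking H on top of sigma times the identity, as is usual for MMSE; and all function names and the parameter `sigma` are hypothetical.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def triangularize(a):
    """Givens-rotation triangularization; returns the n x n upper-triangular R."""
    rows = [list(r) for r in a]
    n = len(rows[0])
    for j in range(n):
        for i in range(j + 1, len(rows)):
            norm = math.hypot(rows[j][j], rows[i][j])
            if norm == 0.0:
                continue
            c, s = rows[j][j] / norm, rows[i][j] / norm
            for k in range(n):
                u, v = rows[j][k], rows[i][k]
                rows[j][k], rows[i][k] = c * u + s * v, -s * u + c * v
    return [rows[i][:] for i in range(n)]

def invert_upper(r):
    """Invert an upper-triangular matrix by back substitution, column by column."""
    n = len(r)
    inv = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in range(j, -1, -1):
            acc = (1.0 if i == j else 0.0) - sum(
                r[i][k] * inv[k][j] for k in range(i + 1, j + 1))
            inv[i][j] = acc / r[i][i]
    return inv

def mmse_weights(h, sigma, perm):
    """Return the un-permuted weight matrix W' for channel matrix h."""
    n = len(h[0])
    hp = [[row[p] for p in perm] for row in h]           # permuted columns
    ext = hp + [[sigma if i == j else 0.0 for j in range(n)]
                for i in range(n)]                       # assumed extension
    r_inv = invert_upper(triangularize(ext))             # steps 708, 710
    q1 = matmul(hp, r_inv)                               # step 712
    w = matmul(r_inv, transpose(q1))                     # steps 713, 714
    w_unperm = [None] * n
    for j, p in enumerate(perm):                         # step 716: row j of W
        w_unperm[p] = w[j]                               # belongs to symbol perm[j]
    return w_unperm

H = [[0.01, 2.0], [0.5, 1.0]]
x = [[1.0], [-1.0]]
y = matmul(H, x)                                         # noiseless received vector
W = mmse_weights(H, sigma=0.1, perm=[1, 0])
x_hat = matmul(W, y)                                     # step 720
print(x_hat)
```

Because the permutation reorders the columns of H, it reorders the rows of W, so the reverse permutation in this model is a row interchange. The resulting W′ is the same whichever permutation is selected, up to rounding.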
Although the embodiments are primarily described in terms of MMSE decoding, one skilled in the art will recognize that these embodiments may be utilized in a number of other applications that perform QR decomposition. For example, the systolic array may be configured to perform QR decomposition by performing steps 702, 704, 706, 708, 710, and 712 to produce respective Q and R matrices.
Matrix processing block 808 performs the MMSE operation on the extended channel matrix to produce weight matrix W. Matrix processing block 808 contains one or more systolic arrays (not shown) to perform the MMSE operations. The post processing block 822 includes a reverse permutation circuit 824 for performing reverse permutation on the weight matrix W to produce matrix W′. Output circuit 820 multiplies matrix W′ by a symbol selection vector y to output an estimated symbol matrix X.
In some FPGAs, each programmable tile includes a programmable interconnect element (INT 311) having standardized connections to and from a corresponding interconnect element in each adjacent tile. Therefore, the programmable interconnect elements taken together implement the programmable interconnect structure for the illustrated FPGA. The programmable interconnect element INT 311 also includes the connections to and from the programmable logic element within the same tile, as shown by the examples included at the top of
For example, a CLB 302 can include a configurable logic element CLE 312 that can be programmed to implement user logic plus a single programmable interconnect element INT 311. A BRAM 303 can include a BRAM logic element (BRL 313) in addition to one or more programmable interconnect elements. Typically, the number of interconnect elements included in a tile depends on the height of the tile. In the pictured embodiment, a BRAM tile has the same height as five CLBs, but other numbers (e.g., four) can also be used. A DSP tile 306 can include a DSP logic element (DSPL 314) in addition to an appropriate number of programmable interconnect elements. An IOB 304 can include, for example, two instances of an input/output logic element (IOL 315) in addition to one instance of the programmable interconnect element INT 311. As will be clear to those of skill in the art, the actual I/O pads connected, for example, to the I/O logic element 315 are manufactured using metal layered above the various illustrated logic blocks, and typically are not confined to the area of the input/output logic element 315.
In the pictured embodiment, a columnar area near the center of the die (shown shaded in
Some FPGAs utilizing the architecture illustrated in
Note that
The present invention is thought to be applicable to a variety of systolic arrays configured for QR decomposition of a matrix and MIMO decoding. Other aspects and embodiments will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. It is intended that the specification and illustrated embodiments be considered as examples only, with a true scope and spirit of the invention being indicated by the following claims.
Rao, Raghavendar M., Tarn, Hai-Jo, Mazahreh, Raied N.
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Jun 08 2010 | MAZAHREH, RAIED N | Xilinx, Inc | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 024526 | /0075 | |
Jun 09 2010 | RAO, RAGHAVENDAR M | Xilinx, Inc | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 024526 | /0075 | |
Jun 11 2010 | XILINX, Inc. | (assignment on the face of the patent) | / | |||
Jun 11 2010 | TARN, HAI-JO | Xilinx, Inc | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 024526 | /0075 |