Data processing device and method of computing the cosine transform of a matrix

RE46712

A data processing device provides registers that can be formatted as segments containing numbers to which operations can be applied in SIMD fashion. In addition, it is possible to perform operations that combine different segments of one register, or segments at different positions in different registers. By providing specially selected instructions of this kind, it becomes possible to perform multidimensional separable transformations (like the 2-dimensional IDCT) without transposing the numbers in the registers.


Patent RE46712
Priority Mar 18 1998
Filed Apr 28 2014
Issued Feb 13 2018
Expiry Mar 16 2019
Inventors Van Eijndh…
Assg.orig Koninklijk…
Assg.curr Koninklijk…
Entity unknown
Referenced by 0
References 90
Maint.: EXPIRED

0. 10. A data processing device comprising:

an operand storage circuit for storing operands, each operand subdivided into a plurality of segments at respective positions in the operand wherein each operand is subdivided into the same plurality of segments at the same respective positions; and

an instruction execution unit including an instruction decoder and arithmetic circuits wired to execute an instruction containing an opcode and one or more operand references, each operand reference of the instruction referring commonly to the segments of a respective source operand in the operand storage circuit, said instruction causing the instruction decoder to decode the instruction and set the instruction execution unit to execute a plurality of operations consisting only of addition and subtraction operations in parallel and independently of one another to generate a result that is written to a result register subdivided into the same plurality of segments at the same respective positions as the operands, each operation of the plurality of operations combining, by specific wiring of the arithmetic circuits of the instruction execution unit, predetermined segments from one or more of the respective source operands and writing a result of the combining to a segment of the result register, wherein each of the operations of the plurality of operations caused to execute by the instruction combines segments that have mutually different positions in the one or more respective source operands and at least one of the operations caused to execute by the instruction differs from the other operations caused to execute by the instruction.

0. 1. A data processing device comprising

an operand storage circuit for storing operands, each subdivided into a plurality of segments at respective positions in the operand;

an instruction execution unit for executing an instruction containing one or more operand references, each referring commonly to the segments of a respective source operand in the operand storage circuit, said instruction causing the instruction execution unit to execute a plurality of operations in parallel and independently of one another, each operation combining predetermined segments from one or more of the respective source operands, characterized in that at least one of the operations combines segments that have mutually different positions in the one or more respective source operands and/or that at least one of the operations differs from the other operations.

0. 2. A data processing device according to claim 1, wherein said instruction is referred to as a cross instruction, the instruction execution unit also being arranged for executing a parallel instruction containing two or more further operand references each referring commonly to the segments of a respective source operand in the operand storage circuit, said parallel instruction causing the instruction execution unit to execute a plurality of operations in parallel and independently of one another, each operation combining predetermined segments from the source operands having mutually corresponding positions in the two or more referenced further source operands.

0. 3. A data processing device according to claim 2, programmed with a program for computing a composition of a column transformation and a row transformation of a matrix having at least rows and columns,

the column transformation transforming columns each according to a one dimensional column transformation, the column transformation being executed using the parallel instruction, the two or more operands each storing information items for different columns in respective segments according to the column;

the row transformation transforming rows each according to a one dimensional row transformation, the row transformation being executed using the cross instruction, information items for the same row being stored in respective segments of the at least one operand.

0. 4. A data processing device according to claim 3, where the row and column transformation correspond to the same one-dimensional transformation.

0. 5. A data processing device according to claim 1, wherein the operations caused by the instruction comprise computing a sum and a difference of two segments in one of the one or more source operands.

0. 6. A data processing device according to claim 1, wherein the operations caused by the instruction result in the computation of a plurality of component coefficients of a vector transformation, such as an IDCT or DCT, of the numbers stored in the respective segments of the one or more source operands, the data processing device storing the component coefficients in segments at respective positions of a result operand commonly referred to by the instruction.

0. 7. A data processing device according to claim 6, wherein the numbers stored in the segments of two or more of the source operands make up an input vector, which is transformed, the component coefficients of the transformation of the input vector being stored in the segments of two or more result operands.

0. 8. A method of transforming a matrix having at least rows and columns using a processor having segmented operand storage circuits, the method comprising:

computing a composition of a column transformation and a row transformation,

the column transformation transforming columns each according to a one dimensional column transformation, the column transformation being executed using at least one SIMD instruction which causes the processor to process different columns in parallel, using information items for the different columns stored in respective segments of an operand storage circuit referred to in the SIMD instruction;

the row transformation transforming rows each according to a one dimensional row transformation, the row transformation being executed using at least one cross instruction which causes the processor to perform several operations upon information items for the same row in parallel, the information items for the same row being stored in respective segments of an operand storage circuit referred to in the cross instruction, wherein the row and column transformation correspond to the same one-dimensional transformation.

0. 9. A computer readable medium storing a computer program for executing the method according to claim 8.

where C_u = 1/sqrt(2) if u = 0 and C_u = 1 otherwise, and the sums run over the integers from 0 to N−1. This two-dimensional transformation can be computed by first obtaining an intermediate block INT_i,v by a one-dimensional transformation according to
INT_i,v = Σ_u C_u B_u,v cos((2i+1)uπ/2N)
and subsequently applying a one-dimensional transform to the intermediate block
A_i,j = 2/N Σ_v C_v INT_i,v cos((2j+1)vπ/2N)
Thus, the two-dimensional transformation is computed as a composition of two one dimensional transformations, one transforming B into INT and the other transforming INT into A (“composition” of two transformations means that one transformation is applied to the result of applying the other transformation). In the example of the IDCT it does not matter which one-dimensional transformation is applied first: in the example one sums first along the first index u of the block B_u,v and subsequently along the second index v, but that order may be inverted without affecting the end result.
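The order independence stated above can be checked numerically. The following is a minimal sketch in plain Python, not the patented implementation: the function names are hypothetical, the block is held as a list of lists indexed B[u][v], and the 2/N factor is applied after the second one-dimensional pass.

```python
import math

def idct_1d(x):
    """One-dimensional pass: y[i] = sum_u C_u * x[u] * cos((2i+1) u pi / 2N)."""
    n = len(x)
    c = [1 / math.sqrt(2)] + [1.0] * (n - 1)
    return [sum(c[u] * x[u] * math.cos((2 * i + 1) * u * math.pi / (2 * n))
                for u in range(n))
            for i in range(n)]

def transpose(m):
    return [list(col) for col in zip(*m)]

def idct_2d_first_index_first(b):
    """Sum over the first index u (giving INT[i][v]), then over the second index v."""
    n = len(b)
    inter = transpose([idct_1d(col) for col in transpose(b)])   # INT[i][v]
    return [[2 / n * y for y in idct_1d(row)] for row in inter]  # A[i][j]

def idct_2d_second_index_first(b):
    """Same result with the two one-dimensional passes applied in the other order."""
    n = len(b)
    inter = [idct_1d(row) for row in b]                          # T[u][j]
    cols = [idct_1d(col) for col in transpose(inter)]            # cols[j][i]
    return [[2 / n * y for y in row] for row in transpose(cols)]
```

Both functions should agree up to floating-point rounding for any square block, which is the property the row/column decomposition relies on.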

Such a two-stage two-dimensional transformation can be sped up using SIMD instructions. When the numbers B_u,v of the block B are stored as described in the preceding, i.e. with several numbers B_u,v (v=0, 1, 2, 3) of a row in respective segments of a register, the computation of the intermediate block INT_i,v can be performed by transforming a number of columns (all numbers having v=0 in the first column, v=1 in the second column and so on) in parallel.
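As an illustration of this lane-wise behaviour, the sketch below models a register as a Python list of four segments. The names simd_add and simd_sub are hypothetical and stand for ordinary SIMD addition and subtraction; they are not instructions named in the text.

```python
def simd_add(ra, rb):
    """Add segments at corresponding positions of two registers."""
    return [a + b for a, b in zip(ra, rb)]

def simd_sub(ra, rb):
    """Subtract segments at corresponding positions of two registers."""
    return [a - b for a, b in zip(ra, rb)]

# Two registers, each holding the numbers v=0..3 of one row of the block.
row_u0 = [1, 2, 3, 4]
row_u1 = [10, 20, 30, 40]

# One SIMD instruction performs the same step of the column transformation
# for four columns at once: lane v only ever sees numbers from column v.
column_sums = simd_add(row_u0, row_u1)
column_diffs = simd_sub(row_u0, row_u1)
```

Because each lane combines only numbers with the same v, a column transformation built from such instructions never needs to move data between segment positions.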

Similar parallel processing using SIMD instructions is possible if the numbers from the intermediate block INT are stored in the registers so that several numbers of a column are stored in one register, e.g. if the segments of a first register store INT_i,v for i=0..3, v=0, respectively, the segments of a second register store INT_i,v for i=4..7, v=0, the segments of a third register store INT_i,v for i=0..3, v=1 and so on. In this case a number of rows of the intermediate block INT can be transformed in parallel using SIMD instructions.

However, after the computation of the intermediate block INT from the block B, the numbers will not be stored in the registers in this way, with several numbers INT_i,v (i=0..3, v=0) from one column in a register; instead several numbers INT_i,v (i=0, v=0..3) from each row will be stored in each register. This is because the computation of the intermediate block requires separate one dimensional transformations of respective columns, whereas the computation of the final block A requires separate one dimensional transformations of respective rows.

In order to be able to use SIMD instructions for both types of transformations the intermediate block needs to be transposed: the numbers have to be regrouped over the registers. This is a complicated operation: in the example of an 8×8 block with 4-segment registers one needs 16 registers and 32 operations with two inputs for the transposition.

The invention aims at avoiding the transposition. For the transformation of the rows the arrangement of the numbers of the intermediate block wherein registers contain different numbers from the same row is retained, and special instructions are used that combine these numbers from these registers in order to perform the one dimensional transformation in the row that is stored in these registers.

These instructions make it possible to perform a two-dimensional separable transformation without transposition. Without further measures, the combination of such special instructions for one dimension and the SIMD type of operations for two or more further dimensions can be used to perform higher than 2 dimensional transformations as well.

In the most straightforward implementation at least one functional unit is provided that is capable of performing the entire IDCT of a row. In case of an 8-point IDCT using registers that each contain four respective numbers from a column, such an instruction would need two operand registers and two result registers.

FIG. 2 shows an example of a data-flow diagram for an implementation of an 8 point one dimensional IDCT. The data-flow diagram is based on expressions described in an article by C. Loeffler, A. Ligtenberg and G. Moschytz, titled “Practical Fast 1-D DCT Algorithms with 11 multiplications”, published in Proceedings International Conference on Acoustics, Speech and Signal Processing 1989 (ICASSP '89) pages 988-991. At the left, nodes 30a-h symbolize the numbers by means of the value of the index v at positions v=0..7 in the row that has to be transformed. At the right, nodes 32a-h symbolize the transformed numbers by means of the value of the index j at positions j=0..7 in the transformed row. The lines from the nodes 32a-h symbolize data flow of the numbers to different operations and data flow of the results from these operations to other operations or to the transformed numbers. The operations are symbolized as follows. A dot with two solid incoming lines symbolizes summation. A dot with one incoming solid line and one incoming dashed line symbolizes subtraction, the number flowing along the dashed line being subtracted from the number flowing along the solid line. A box with two inputs and two outputs symbolizes rotation and factorization, that is, the computation of (X₁,Y₁) from (X₀,Y₀) according to
X₁ = α(X₀ cos φ − Y₀ sin φ)
Y₁ = α(X₀ sin φ + Y₀ cos φ)

The value of the factor α and an identification of the angle φ are noted on the box; these are predetermined values. The blocks can be implemented using four multiplications, an addition and a subtraction (alternatively three multiplications and three additions can be used).
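The two multiplication counts can be compared with a short sketch. It assumes the usual rotation convention X₁ = α(X₀ cos φ − Y₀ sin φ), Y₁ = α(X₀ sin φ + Y₀ cos φ), with α and φ treated as constants fixed in advance; the function names are hypothetical, and the three-multiplication form shown is one well-known factoring, not necessarily the one used in the device.

```python
import math

def rotate_4mul(x0, y0, alpha, phi):
    """Direct form: four multiplications, one subtraction, one addition."""
    c = alpha * math.cos(phi)   # constants, folded in ahead of time
    s = alpha * math.sin(phi)
    return c * x0 - s * y0, s * x0 + c * y0

def rotate_3mul(x0, y0, alpha, phi):
    """Factored form: three multiplications and three additions/subtractions."""
    c = alpha * math.cos(phi)
    s = alpha * math.sin(phi)
    k_sum = c + s               # constant, precomputed
    k_diff = s - c              # constant, precomputed
    t = c * (x0 + y0)           # multiplication 1, addition 1
    x1 = t - k_sum * y0         # multiplication 2, subtraction
    y1 = t + k_diff * x0        # multiplication 3, addition
    return x1, y1
```

Expanding the factored form gives c·X₀ − s·Y₀ and s·X₀ + c·Y₀ again, so both functions compute the same rotation; the saving comes from sharing the product t between the two outputs.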

In one implementation at least one functional unit is provided which is capable of executing a row-IDCT instruction that causes that functional unit to IDCT-transform the contents of the segments of its operands. In the example of an 8-point IDCT with four segments in each register, this would require two operands to transform a row. Such an instruction requires two result registers in which the numbers that represent the transformation are written in respective segments according to their frequency position in the transformation.

Execution of the IDCT by such a functional unit is much faster than execution by means of individual instructions at least because the combination of numbers stored in segments at different positions in the operands can be realized by wiring in the functional unit. This wiring is specific to the IDCT. In addition, the data-flow diagram of FIG. 3 shows that a considerable amount of parallelism is possible in such a functional unit, so the speed of execution can be increased further by parallel execution of a number of operations.

Thus, the 2-dimensional IDCT transformation can be performed for the columns using arithmetic SIMD instructions to apply a one-dimensional IDCT-transformation to a number of columns in parallel and for the rows using a different, dedicated IDCT instruction to apply a functionally identical IDCT-transformation to a row.

Some processor architectures require that functional units use a standard instruction format, typically containing an opcode, two source register references and a result register reference. In this case each functional unit may have two ports connected to read ports of the register file and one port connected to a write port of a register file. In case of an IDCT instruction which transforms numbers stored in more than one register, more than one result register will be needed to write the transformed numbers. In architectures that allow only one result register this may be realized in various ways, for example by writing the results time-sequentially in logically adjacent result registers. Alternatively, one may use a combination of two instructions issued in parallel to the functional units. Such two instructions would normally be used for two different functional units in parallel. Instead, one uses the combination of the two instructions to program one functional unit that performs IDCT. By using this combination of two instructions, two separate result registers can be specified. In a processor that provides a write port to the register file for each of the instructions that is issued in parallel it is moreover ensured in this way that a write port to the register file is available for both results.

Alternatively, one might define two different types of instruction for the functional units, one for generating half the numbers in a register and another one for generating the other half of the numbers.

More generally, one may provide several dedicated instructions for respective parts of the computation of the IDCT, none of the instructions requiring more than a maximum number (e.g. one) of result registers. In order to select such instructions, one may split the IDCT data-flow diagram into sub-diagrams and assign a dedicated instruction to each sub-diagram. By selecting only sub-diagrams with a limited number of outputs it can be ensured that no more than one result register is required for any of the dedicated instructions.

FIG. 3 shows an example of a split-up into sub-diagrams indicated by dashed boxes 39a-g. Each of these boxes defines the data-flow of one of a number of dedicated instructions, which provide combinations of operations that are executed in parallel to help speed up the computation of the transformation. The required number of segments in the results of each instruction is limited to four. These instructions are especially defined so that the locations of numbers in respective segments correspond to the location required for the SIMD transformation, that is, with the numbers indicated by v=0..3 at the left of FIG. 3 in respective segments of a first register R1 and the numbers indicated by v=4..7 in respective segments of a second register R2.

A first example of a first instruction INS1 R1,R2,R3 corresponding to a first dashed box 39a refers to the two registers R1, R2 as operands. This instruction causes a functional unit to perform the following operations in parallel:

- Sum the number (v=0) in a first segment of the first register R1 and the number (v=4) in the first segment of the second register R2. The result is placed in a first segment of a result register R3.
- Subtract the same numbers from one another and place the result in a second segment of the result register R3.
- Use the numbers in a third segment (v=2) of the first register R1 and the third segment of the second register R2 as X₀ and Y₀ in a rotation with a factor sqrt(2) and a predetermined sine and cosine value. Place the resulting X₁, Y₁ in the third and fourth segment of the result register.

FIG. 4b shows an example of a functional unit 40 for executing the INS1 instruction. The functional unit 40 contains two input sections 42, 46 for receiving the content of the first register R1 and the second register R2 respectively, an instruction decoder 48 for setting the functional unit into action, and arithmetic circuits 44a-c for computing the sum of the first segment S0 of R1 and R2, the difference of the first segment of R1 and R2 and the rotation of the third segment S2 of R1 and R2. The results of these computations are combined into the segments S0-S3 of an output section 49 for writing into the result register R3.
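The behaviour of INS1 can be modelled in a few lines. This is a sketch under stated assumptions: registers are lists of four segments, the difference is taken as the R1 segment minus the R2 segment, the rotation follows the usual convention Y₁ = α(X₀ sin φ + Y₀ cos φ), and the angle phi is left as a parameter because the text only calls it predetermined.

```python
import math

def rotate(x0, y0, alpha, phi):
    """Scaled rotation: X1 = a(X0 cos - Y0 sin), Y1 = a(X0 sin + Y0 cos)."""
    c, s = math.cos(phi), math.sin(phi)
    return alpha * (x0 * c - y0 * s), alpha * (x0 * s + y0 * c)

def ins1(r1, r2, phi):
    """Model of INS1 R1,R2,R3; returns the four segments of R3."""
    x1, y1 = rotate(r1[2], r2[2], math.sqrt(2), phi)
    return [
        r1[0] + r2[0],  # first segment: sum of the first segments (v=0, v=4)
        r1[0] - r2[0],  # second segment: their difference (R1 minus R2 assumed)
        x1,             # third segment: X1 of the rotation
        y1,             # fourth segment: Y1 of the rotation
    ]
```

The second and fourth segments of both operands are not read at all, which reflects that the wiring of a cross instruction picks out only the segments its sub-diagram needs.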

A second example of a second instruction INS2 R3,R4 corresponding to a second dashed box 39b refers to one register R3 as operand. This instruction causes a functional unit to perform the following operations in parallel:

- Sum the numbers stored in the first and fourth segment of the operand register R3 and place the result in a first segment of a result register R4.
- Sum the numbers stored in the second and third segment of the operand register R3 and place the result in a second segment of a result register R4.
- Subtract the number in the third segment of the operand register R3 from the number in the second segment of the operand register R3 and place the result in the third segment of the result register R4.
- Subtract the number in the fourth segment of the operand register R3 from the number in the first segment of the operand register R3 and place the result in the fourth segment of the result register R4.

FIG. 4a shows an example of a functional unit 20 for executing the INS2 instruction. The functional unit 20 contains an input section for receiving the content of the operand register R3, arithmetic units 24a-b, 25a-b for computing the sums and subtractions, an instruction decoder 28 for setting the functional unit 20 into action and an output section 26. The results of the sums and subtractions are combined into the segments S0-S3 of the output section 26 for writing into the result register R4.
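The four operations of INS2 translate directly into a small model. The sketch assumes a register is a list of four segments with index 0 as the first segment; the function name is hypothetical.

```python
def ins2(r3):
    """Model of INS2 R3,R4; returns the four segments of R4."""
    s0, s1, s2, s3 = r3
    return [
        s0 + s3,  # first segment: first + fourth
        s1 + s2,  # second segment: second + third
        s1 - s2,  # third segment: second - third
        s0 - s3,  # fourth segment: first - fourth
    ]
```

All four operations read the same single operand at different segment positions, and two of them differ in kind from the other two, which is what distinguishes such a cross instruction from a lane-wise SIMD instruction.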

A third example of a third instruction INS3 R4,R5,R6 corresponding to a third dashed box 39c refers to two registers R4, R5 as operands. This instruction causes a functional unit to perform the following operations in parallel:

- Sum the numbers stored in the first segment of the first operand register R4 and the fourth segment of the second operand register R5 and place the result in the first segment of the result register R6.
- Sum the numbers stored in the second segment of the first operand register R4 and the third segment of the second operand register R5 and place the result in the second segment of the result register R6.
- Sum the numbers stored in the third segment of the first operand register R4 and the second segment of the second operand register R5 and place the result in the third segment of the result register R6.
- Sum the numbers stored in the fourth segment of the first operand register R4 and the first segment of the second operand register R5 and place the result in the fourth segment of the result register R6.

A fourth example of a fourth instruction INS4 R4,R5,R6 corresponding to a dashed box 39h refers to two registers R4, R5 as operands. This instruction causes a functional unit to perform the following operations in parallel:

- Subtract from the number stored in the first segment of the first operand register R4 the number stored in the fourth segment of the second operand register R5 and place the result in the fourth segment of the result register R6.
- Subtract from the number stored in the second segment of the first operand register R4 the number stored in the third segment of the second operand register R5 and place the result in the third segment of the result register R6.
- Subtract from the number stored in the third segment of the first operand register R4 the number stored in the second segment of the second operand register R5 and place the result in the second segment of the result register R6.
- Subtract from the number stored in the fourth segment of the first operand register R4 the number stored in the first segment of the second operand register R5 and place the result in the first segment of the result register R6.
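INS3 and INS4 can be modelled together as follows. The sketch assumes registers are lists of four segments and that each result segment of INS4 is written exactly once, so that the last subtraction lands in the first segment; the function names are hypothetical.

```python
def ins3(r4, r5):
    """Model of INS3: segment k of R4 plus segment 3-k of R5, written to segment k."""
    return [r4[k] + r5[3 - k] for k in range(4)]

def ins4(r4, r5):
    """Model of INS4: segment k of R4 minus segment 3-k of R5, written to segment 3-k."""
    out = [0] * 4
    for k in range(4):
        out[3 - k] = r4[k] - r5[3 - k]
    return out
```

With R4 = [1, 2, 3, 4] and R5 = [10, 20, 30, 40], ins3 gives [41, 32, 23, 14] and ins4 gives [-6, -17, -28, -39]: the sums come out in the order of the R4 segments and the differences in the mirrored order, matching the final butterfly stage of an 8-point transform split over two result registers.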
A fifth example of a fifth instruction INS5 R1,R2,R7 corresponding to a fourth dashed box 39d refers to two registers R1, R2 as operands. This instruction causes a functional unit to perform the following operations in parallel:

- Place the numbers from the fourth segment of the first source register R1 and the second segment of the second source register R2 into the second and third segment of the result register R7 respectively.
- Use the numbers in a third segment (v=2) of the second register R2 and the second segment of the first register R1 as X₀ and Y₀ in a rotation with a factor 2 and a predetermined sine and cosine value (corresponding to 45 degrees). Place the resulting X₁, Y₁ in the third and fourth segment of the result register. (This rotation can be implemented using fewer multiplications because the sine and cosine of 45 degrees are equal to each other.)

A sixth example of a sixth instruction INS6 R7,R8 corresponding to a sixth dashed box 39e refers to one register R7 as operand. This instruction causes a functional unit to perform the following operations in parallel:

- Sum the numbers stored in the first and third segment of the operand register R7 and place the result in a first segment of a result register R8
- Sum the numbers stored in the second and fourth segment of the operand register R7 and place the result in a fourth segment of a result register R8
- Subtract the number in the third segment of the operand register R7 from the number in the first segment of the operand register R7 and place the result in the third segment of the result register R8
- Subtract the number in the second segment of the operand register R7 from the number in the fourth segment of the operand register R7 and place the result in the second segment of the result register R8
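The fixed wiring of INS6 can also be written as a table, one entry per result segment. This is only an illustrative model: the table layout, the names INS6_WIRING and cross, and the list representation of registers are assumptions made for the sketch.

```python
import operator

# One entry per result segment of R8: (operation, first source segment of R7,
# second source segment of R7). Index 0 is the first segment.
INS6_WIRING = [
    (operator.add, 0, 2),  # first segment:  first + third
    (operator.sub, 3, 1),  # second segment: fourth - second
    (operator.sub, 0, 2),  # third segment:  first - third
    (operator.add, 1, 3),  # fourth segment: second + fourth
]

def cross(wiring, reg):
    """Apply a fixed wiring table to one operand register."""
    return [op(reg[a], reg[b]) for op, a, b in wiring]

def ins6(r7):
    """Model of INS6 R7,R8; returns the four segments of R8."""
    return cross(INS6_WIRING, r7)
```

Describing an instruction by such a table makes it easy to see that the operations are independent of one another, so a functional unit can evaluate all four in parallel.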
A seventh example of a seventh instruction INS7 R8,R9 corresponding to a seventh dashed box 39f refers to one register R8 as operand. This instruction causes a functional unit to perform the following operations in parallel:

- Use the numbers in a first and fourth segment of the source register R8 as X₀ and Y₀ in a rotation with a factor sqrt(2) and a predetermined sine and cosine value. Place the resulting X₁, Y₁ in the first and fourth segment of the result register R9.
- Use the numbers in a second and third segment of the source register R8 as X₀ and Y₀ in a rotation with a factor sqrt(2) and a predetermined sine and cosine value. Place the resulting X₁, Y₁ in the second and third segment of the result register R9.

In these instructions numbers may be represented in the registers as fixed point numbers, all with the same number of bits, so that on multiplication a number of least significant bits are discarded. Almost all fixed point numbers may be defined to be in a range from −1 to +1. The results of the rotation/scalings are an exception; they are preferably fixed point numbers in a range from −2 to 2. It has been found that only insignificant accuracy is lost through rounding when one uses this representation of the numbers and when the data flow graph is split into instructions as described above. Preferably, the additions and/or multiplications in these instructions provide for clipping of results of these instructions if the magnitude of the result exceeds the range of values that can be held in the registers. However, it has been found that if the data flow graph is split into instructions in the way shown above, clipping is not normally necessary.
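The fixed point handling described above can be sketched as follows. The concrete format is an assumption made for illustration: 16-bit signed segments with 15 fractional bits (values from −1 up to just below +1), truncation of the low bits after multiplication, and saturation as the clipping rule. The helper names are hypothetical.

```python
FRACTION_BITS = 15
MAX_VALUE = (1 << FRACTION_BITS) - 1   # largest representable segment value
MIN_VALUE = -(1 << FRACTION_BITS)      # smallest representable segment value

def clip(value):
    """Clip a result to the range that fits in one segment."""
    return max(MIN_VALUE, min(MAX_VALUE, value))

def to_fixed(x):
    """Convert a real number in roughly [-1, 1) to its fixed point integer."""
    return clip(round(x * (1 << FRACTION_BITS)))

def fx_mul(a, b):
    """Multiply two fixed point numbers, discarding the least significant bits."""
    return clip((a * b) >> FRACTION_BITS)

def fx_add(a, b):
    """Add two fixed point numbers with clipping."""
    return clip(a + b)
```

For example, fx_mul(to_fixed(0.5), to_fixed(0.5)) yields 8192, the representation of 0.25, while fx_add(to_fixed(0.75), to_fixed(0.75)) would overflow and is clipped to MAX_VALUE.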

When the data processing device provides for all of these instructions the 8-point IDCT of a row contained in the segments of two registers R1, R2 can be programmed with the following program:

INS1 R1,R2,R3

INS2 R3,R4

INS5 R1,R2,R7

INS6 R7,R8

INS7 R8,R9

INS3 R4,R9,R5

INS4 R4,R9,R6

As a result the numbers making up a row of the IDCT transform will be contained in the segments of registers R5, R6. To transform a complete block these instructions must be repeated for the other rows, with other registers as far as necessary. In a VLIW processor with more than one functional unit, all these instructions INS1-INS7 may be instructions for the same single functional unit, but it is also possible for them to be executed by different functional units. For example, specialized functional units might be provided for the instructions which involve multiplication on one hand and instructions which involve only additions and subtractions on the other hand.

Different grouping of operations into instructions is also possible. For example, one may combine the operations of INS1 and INS2 into one instruction INSA so that execution of INSA R1,R2,R4 is functionally equivalent to successive execution of INS1 R1,R2,X; INS2 X,R4; similarly INS5, INS6, INS7 may be combined into an instruction, so that execution of INSB R1,R2,R9 is equivalent to successive execution of INS5 R1,R2,X; INS6 X,Y; INS7 Y,R9. The instructions INS3 and INS4 can be replaced by SIMD addition and subtraction respectively, when the instruction INS7 is modified so that it puts its results into the segments of the result register in reverse order. However, in this case an additional “reverse order” instruction, which exchanges the contents of segments 0 and 3 with each other and the contents of segments 1 and 2 with each other, is required. This instruction must be applied to the result of the SIMD version of INS4 to get the transformed numbers in the proper order.
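The equivalence claimed for the SIMD replacement can be checked with a small model. It assumes registers are lists of four segments and that INS4 writes each difference to the mirrored segment position; all function names are hypothetical.

```python
def simd_add(ra, rb):
    return [a + b for a, b in zip(ra, rb)]

def simd_sub(ra, rb):
    return [a - b for a, b in zip(ra, rb)]

def reverse_order(reg):
    """Exchange segments 0 and 3, and segments 1 and 2."""
    return [reg[3], reg[2], reg[1], reg[0]]

def ins3(r4, r9):
    """Cross addition: segment k of R4 plus segment 3-k of R9."""
    return [r4[k] + r9[3 - k] for k in range(4)]

def ins4(r4, r9):
    """Cross subtraction, result written to the mirrored segment."""
    return reverse_order([r4[k] - r9[3 - k] for k in range(4)])

r4 = [1, 2, 3, 4]
r9 = [10, 20, 30, 40]
r9_reversed = reverse_order(r9)   # what a modified INS7 would deliver

sums = simd_add(r4, r9_reversed)                    # replaces INS3
diffs = reverse_order(simd_sub(r4, r9_reversed))    # replaces INS4, plus one reversal
```

Here sums equals ins3(r4, r9) and diffs equals ins4(r4, r9), which shows why only the subtraction result needs the extra reverse-order instruction.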

The number of instructions that needs to be executed to transform the block can be reduced by providing one or more functional units which accept the instructions INS1-INS7 and execute the operations in parallel combining different segments of the one or more operands referenced in the instruction. This reduces the time (number of instruction cycles) needed for the transform. Execution of the IDCT by such a functional unit is much faster than execution by means of individual instructions at least because the combination of numbers stored in segments at different positions in the operands can be realized by wiring in the functional unit. This wiring is specific to the IDCT. Of course, a reduction in the required time is already achieved if the functional units provide for only one of the additional instructions INS1-INS7 or any combination of these instructions. If one or more of these instructions are not provided for, their function can be implemented using conventional instructions.

Furthermore, the memory space needed for storing programs is reduced, in particular for programs which involve transformations. This benefit would of course be realized even if the operations in an instruction were not executed in parallel. The reduced program space would result from instructions that involve arbitrary combinations of operations. The particular combinations INS1-INS7, however, are not arbitrary: they have the special property that they provide operations that combine segments as required for computing the IDCT, so as to speed up processing and that furthermore they combine operations that can be executed in parallel to increase the speed of computing the IDCT even further.

The examples given above use registers with four segments to implement an 8-point two-dimensional IDCT, e.g. 64-bit registers with four 16-bit segments. Of course, the invention is not limited to these numbers. One may use segments of a different size, e.g. 8, 12 or 32 bit segments (the segments need not fill the entire register) and/or registers with a different number of bits, e.g. 128 bits. In the latter case a register with 16-bit segments can store 8 numbers, for example an entire row of an 8×8 block, and the 8-point IDCT can be executed as an instruction that requires only one operand register and one result register.
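How segments share one register can be illustrated by packing signed numbers into a single integer. The sketch assumes two's complement segments and that the first segment occupies the least significant bits; neither detail is stated in the text, and the function names are hypothetical.

```python
def pack(segments, width=16):
    """Pack signed segment values into one register-sized integer."""
    mask = (1 << width) - 1
    reg = 0
    for pos, value in enumerate(segments):
        reg |= (value & mask) << (pos * width)   # two's complement per segment
    return reg

def unpack(reg, count, width=16):
    """Recover `count` signed segment values from a packed register."""
    mask = (1 << width) - 1
    sign_bit = 1 << (width - 1)
    out = []
    for pos in range(count):
        raw = (reg >> (pos * width)) & mask
        out.append(raw - (1 << width) if raw & sign_bit else raw)
    return out
```

Four 16-bit segments fill a 64-bit register, and eight of them fill a 128-bit register, so unpack(pack(row), 8) round-trips a full 8-number row in the wider configuration.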

More generally, any kind of program can be speeded up by providing functional units which are capable of executing dedicated instructions involving (preferably parallel) execution of operations which combine operands stored in segments at different positions in the registers. The separable transforms discussed in the preceding are but an example of this. For a given program, suitable dedicated instructions can be found by analyzing the data-flow of the program and isolating often occurring combinations of operations that combine different segments of the same one or two operands. When a suitable instruction is found the instruction decoder 120 and the switch circuit 125 are designed so that the functional unit is capable of handling that instruction.

Preferably these dedicated instructions are combined with a set of SIMD instructions. In this case, one or more functional units, either together or individually, provide a complete set of arithmetic instructions with SIMD data flow (combining pairs of segments at corresponding positions in the operands). In addition at least one functional unit is capable of executing a few selected instructions that combine segments at different positions in one or more operands of the instruction, different, that is, than in the SIMD instruction.

This is particularly useful for any kind of separable transformation, not only for the IDCT. Use can be made of this in, for example, 2-dimensional Fourier transforms or Hadamard transforms, convolutions with 2-dimensional separable kernels (such as a Gaussian kernel) H(x,y) which can be written as H1(x)H2(y), etc., and higher than two dimensional transformations or convolutions. In general, a separable transform uses a one dimensional transformation which takes a series of numbers as input and defines a new series of numbers as output. A separable transformation comprises the composition of two such one-dimensional transformations. A first one-dimensional transformation is computed for each of a set of series, producing a set of new series. A second transformation is computed for a transversal series obtainable by taking numbers from corresponding positions in series from the set of new series.
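The separable-kernel case can be illustrated with a small convolution sketch. It assumes a "full" convolution on lists of lists and uses hypothetical function names; the kernel is given only through its one-dimensional factors h1 and h2, so that H(x,y) = h1[x]·h2[y].

```python
def conv1d(signal, kernel):
    """Full one-dimensional convolution."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out

def transpose(m):
    return [list(col) for col in zip(*m)]

def conv2d_direct(img, h1, h2):
    """Full two-dimensional convolution with the kernel H[x][y] = h1[x] * h2[y]."""
    rows, cols = len(img), len(img[0])
    out = [[0.0] * (cols + len(h2) - 1) for _ in range(rows + len(h1) - 1)]
    for x in range(rows):
        for y in range(cols):
            for a, ka in enumerate(h1):
                for b, kb in enumerate(h2):
                    out[x + a][y + b] += img[x][y] * ka * kb
    return out

def conv2d_separable(img, h1, h2):
    """Same result as two one-dimensional passes: along rows, then along columns."""
    along_rows = [conv1d(row, h2) for row in img]
    return transpose([conv1d(col, h1) for col in transpose(along_rows)])
```

For a kernel with n×n taps the direct form needs n² multiplications per input number, whereas the two passes need about 2n, and each pass has the one-dimensional structure that the SIMD and dedicated instructions are meant to serve.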

In each of these cases, the numbers to be transformed may be stored in segments of operands, with the numbers in each operand belonging to the same row and the position of the segment in which a number is stored being determined, in the same way for each row, by the column in which the number is located. The transformation can then be executed in the row direction using the dedicated instructions, and a number of times in parallel in the direction transverse to the rows by means of SIMD instructions.
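The layout described above can be modelled in software. The following is a hedged sketch, not the implementation of the description: each register is modelled as a Python list of four segments holding one row, a 4×4 unnormalized Hadamard transform is assumed as the separable transformation, and `row_stage1`, `row_stage2`, `simd_add` and `simd_sub` are hypothetical instruction names.

```python
def simd_add(ra, rb):
    """SIMD instruction: combine segments at corresponding positions."""
    return [a + b for a, b in zip(ra, rb)]


def simd_sub(ra, rb):
    """SIMD instruction: subtract segments at corresponding positions."""
    return [a - b for a, b in zip(ra, rb)]


def row_stage1(r):
    """Hypothetical dedicated instruction: combines neighbouring segments."""
    return [r[0] + r[1], r[0] - r[1], r[2] + r[3], r[2] - r[3]]


def row_stage2(r):
    """Hypothetical dedicated instruction: combines segments two apart."""
    return [r[0] + r[2], r[1] + r[3], r[0] - r[2], r[1] - r[3]]


def hadamard_2d(regs):
    """4x4 Hadamard transform with one row per register, no transposition."""
    # Row direction: dedicated instructions work inside each register.
    rows = [row_stage2(row_stage1(r)) for r in regs]
    # Column direction: the same butterflies, but between whole registers,
    # so every column is processed in parallel by SIMD instructions.
    t0, t1 = simd_add(rows[0], rows[1]), simd_sub(rows[0], rows[1])
    t2, t3 = simd_add(rows[2], rows[3]), simd_sub(rows[2], rows[3])
    return [simd_add(t0, t2), simd_add(t1, t3),
            simd_sub(t0, t2), simd_sub(t1, t3)]


H = [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]


def reference(x):
    """Direct evaluation of H * X * H^T, used only to check the result."""
    return [[sum(H[u][i] * x[i][j] * H[v][j]
                 for i in range(4) for j in range(4))
             for v in range(4)] for u in range(4)]


block = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(hadamard_2d(block) == reference(block))
```

The row pass and the column pass use the same butterfly structure. The only difference is whether the two values being combined sit in different segments of one register (dedicated instruction) or in the same segment of two registers (SIMD instruction).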

INVENTORS:

Van Eijndhoven, Josephus Theodorus Johannes; Sijstermans, Fransiscus Wilhelmus

THIS PATENT IS REFERENCED BY THESE PATENTS:

Patent	Priority	Assignee	Title

THIS PATENT REFERENCES THESE PATENTS:

Patent	Priority	Assignee	Title
4689762,	Sep 10 1984	Lockheed Martin Corporation	Dynamically configurable fast Fourier transform butterfly circuit
5128760,	Feb 28 1990	TEXAS INSTRUMENTS INCORPORATED, 13500 NORTH CENTRAL EXPRESSWAY, DALLAS TEXAS 75265 A CORP OF DE	Television scan rate converter
5230057,	Sep 19 1988	Fujitsu Limited	SIMD system having logic units arranged in stages of tree structure and operation of stages controlled through respective control registers
5361367,	Jun 10 1991	The United States of America as represented by the Administrator of the	Highly parallel reconfigurable computer architecture for robotic computation having plural processor cells each having right and left ensembles of plural processors
5404550,	Jul 25 1991	HEWLETT-PACKARD DEVELOPMENT COMPANY, L P	Method and apparatus for executing tasks by following a linked list of memory packets
5410727,	Oct 24 1989	International Business Machines Corporation	Input/output system for a massively parallel, single instruction, multiple data (SIMD) computer providing for the simultaneous transfer of data between a host computer input/output system and all SIMD memory devices
5450603,	Dec 18 1992	Xerox Corporation	SIMD architecture with transfer register or value source circuitry connected to bus
5487133,	Jul 01 1993	Intel Corporation	Distance calculating neural network classifier chip and system
5488570,	Nov 24 1993	Intel Corporation	Encoding and decoding video signals using adaptive filter switching criteria
5493513,	Nov 24 1993	Intel Corporation	Process, apparatus and system for encoding video signals using motion estimation
5493514,	Nov 24 1993	Intel Corporation	Process, apparatus, and system for encoding and decoding video signals
5508942,	Nov 24 1993	Intel Corporation	Intra/inter decision rules for encoding and decoding video signals
5509129,	Nov 30 1993	Texas Instruments Incorporated	Long instruction word controlling plural independent processor operations
5511003,	Nov 24 1993	Intel Corporation	Encoding and decoding video signals using spatial filtering
5515296,	Nov 24 1993	Intel Corporation	Scan path for encoding and decoding two-dimensional signals
5524265,	Mar 08 1994	Texas Instruments Incorporated	Architecture of transfer processor
5528238,	Nov 24 1993	Intel Corporation	Process, apparatus and system for decoding variable-length encoded signals
5532940,	Nov 24 1993	Micron Technology, Inc	Process, apparatus and system for selecting quantization levels for encoding video signals
5535138,	Nov 24 1993	Intel Corporation	Encoding and decoding video signals using dynamically generated quantization matrices
5535410,	Nov 19 1993	Renesas Electronics Corporation	Parallel processor having decoder for selecting switch from the group of switches and concurrently inputting MIMD instructions while performing SIMD operation
5537338,	Nov 24 1993	Intel Corporation	Process and apparatus for bitwise tracking in a byte-based computer system
5539662,	Nov 24 1993	Intel Corporation	Process, apparatus and system for transforming signals using strength-reduced transforms
5539663,	Nov 24 1993	Intel Corporation	Process, apparatus and system for encoding and decoding video signals using temporal filtering
5559722,	Nov 24 1993	Micron Technology, Inc	Process, apparatus and system for transforming signals using pseudo-SIMD processing
5588152,	May 22 1992	International Business Machines Corporation	Advanced parallel processor including advanced support hardware
5594679,	Mar 31 1993	Sony Corporation	Adaptive video signal processing apparatus
5630083,	Mar 01 1994	Intel Corporation	Decoder for decoding multiple instructions in parallel
5636351,	Nov 23 1993	HEWLETT-PACKARD DEVELOPMENT COMPANY, L P	Performance of an operation on whole word operands and on operations in parallel on sub-word operands in a single processor
5638068,	Apr 28 1994	Intel Corporation	Processing images using two-dimensional forward transforms
5649135,	Jan 17 1995	IBM Corporation	Parallel processing system and method using surrogate instructions
5703966,	Jun 27 1995	Intel Corporation	Block selection using motion estimation error
5708836,	Nov 13 1990	International Business Machines Corporation	SIMD/MIMD inter-processor communication
5710935,	Nov 13 1990	International Business Machines Corporation	Advanced parallel array processor (APAP)
5713037,	Nov 27 1991	International Business Machines Corporation	Slide bus communication functions for SIMD/MIMD array processor
5717944,	Nov 13 1990	International Business Machines Corporation	Autonomous SIMD/MIMD processor memory elements
5729758,	Jul 15 1994	Mitsubishi Denki Kabushiki Kaisha	SIMD processor operating with a plurality of parallel processing elements in synchronization
5734877,	Sep 09 1992	MIPS Technologies, Inc	Processor chip having on-chip circuitry for generating a programmable external clock signal and for controlling data patterns
5734921,	Nov 13 1990	International Business Machines Corporation	Advanced parallel array processor computer package
5736948,	Mar 20 1995	Renesas Electronics Corporation	Semiconductor integrated circuit device and control system
5752067,	Nov 13 1990	International Business Machines Corporation	Fully scalable parallel processing system having asynchronous SIMD processing
5754457,	Mar 05 1996	Intel Corporation	Method for performing an inverse cosine transfer function for use with multimedia information
5754871,	Nov 13 1990	International Business Machines Corporation	Parallel processing system having asynchronous SIMD processing
5764787,	Mar 27 1996	Intel Corporation	Multi-byte processing of byte-based image data
5822608,	Nov 13 1990	International Business Machines Corporation	Associative parallel processing system
5870619,	Nov 13 1990	International Business Machines Corporation	Array processor with asynchronous availability of a next SIMD instruction
5878241,	Nov 13 1990	International Business Machine	Partitioning of processing elements in a SIMD/MIMD array processor
5881259,	Oct 08 1996	ARM Limited	Input operand size and hi/low word selection control in data processing systems
5893145,	Dec 02 1996	AMD TECHNOLOGIES HOLDINGS, INC ; GLOBALFOUNDRIES Inc	System and method for routing operands within partitions of a source register to partitions within a destination register
5909572,	Dec 02 1996	GLOBALFOUNDRIES Inc	System and method for conditionally moving an operand from a source register to a destination register
5933650,	Oct 09 1997	ARM Finance Overseas Limited	Alignment and ordering of vector elements for single instruction multiple data processing
5953241,	Aug 16 1995	Microunity Systems Engineering, Inc	Multiplier array processing system with enhanced utilization at lower precision for group multiply and sum instruction
5966528,	Nov 13 1990	International Business Machines Corporation	SIMD/MIMD array processor with vector processing
5991787,	Dec 31 1997	Intel Corporation	Reducing peak spectral error in inverse Fast Fourier Transform using MMX™ technology
6044448,	Dec 16 1997	Altera Corporation	Processor having multiple datapath instances
6047366,	Dec 31 1996	Texas Instruments Incorporated	Single-instruction multiple-data processor with input and output registers having a sequential location skip function
6058465,	Aug 19 1996		Single-instruction-multiple-data processing in a multimedia signal processor
6067613,	Nov 30 1993	Texas Instruments Incorporated	Rotation register for orthogonal data transformation
6092920,	Apr 02 1997	Renesas Electronics Corporation	Method for arranging pixels to facilitate compression/extension of image data
6094715,	Nov 13 1990	International Business Machine Corporation	SIMD/MIMD processing synchronization
6116768,	Nov 30 1993	Texas Instruments Incorporated	Three input arithmetic logic unit with barrel rotator
6119140,	Jan 08 1997	NEC Corporation	Two-dimensional inverse discrete cosine transform circuit and microprocessor realizing the same and method of implementing 8×8 two-dimensional inverse discrete cosine transform
6175892,	Jun 19 1998	Hitachi America. Ltd.	Registers and methods for accessing registers for use in a single instruction multiple data system
6381690,	Aug 01 1995	Hewlett Packard Enterprise Development LP	Processor for performing subword permutations and combinations
6735690,	Jun 21 1999	Altera Corporation	Specifying different type generalized event and action pair in a processor
7509366,	Aug 16 1995	MicroUnity Systems Engineering, Inc.	Multiplier array processing system with enhanced utilization at lower precision
EP424618,
EP444368,
EP656584,
EP680013,
EP755015,
EP847551,
EP723220,
JP2008053652,
JP2008249293,
JP3149348,
JP3199205,
JP4242827,
JP63118842,
JP7141304,
JP8249293,
JP9305423,
JP9305424,
KR100190738,
WO1997008608,
WO1999066393,
WO9731308,
WO9733236,
WO9948025,

ASSIGNMENT RECORDS:

Executed on	Assignor	Assignee	Conveyance	Frame	Reel	Doc
Apr 28 2014		Koninklijke Philips N.V.	(assignment on the face of the patent)

MAINTENANCE FEES AND DATES:

Date	Maintenance Fee Events

Date	Maintenance Schedule
Feb 13 2021	4 years fee payment window open
Aug 13 2021	6 months grace period start (w surcharge)
Feb 13 2022	patent expiry (for year 4)
Feb 13 2024	2 years to revive unintentionally abandoned end. (for year 4)
Feb 13 2025	8 years fee payment window open
Aug 13 2025	6 months grace period start (w surcharge)
Feb 13 2026	patent expiry (for year 8)
Feb 13 2028	2 years to revive unintentionally abandoned end. (for year 8)
Feb 13 2029	12 years fee payment window open
Aug 13 2029	6 months grace period start (w surcharge)
Feb 13 2030	patent expiry (for year 12)
Feb 13 2032	2 years to revive unintentionally abandoned end. (for year 12)