Methods and apparatus for dynamic instruction controlled reconfigurable register file with extended precision

RE40883

A reconfigurable register file integrated in an instruction set architecture capable of extended precision operations, and also capable of parallel operation on lower precision data is described. A register file is composed of two separate files with each half containing half as many registers as the original. The halves are designated even or odd by virtue of the register addresses which they contain. Single width and double width operands are optimally supported without increasing the register file size and without increasing the number of register file ports. Separate extended registers are also employed to provide extended precision for operations such as multiply-accumulate operations.


Patent RE40883
Priority Jul 09 1998
Filed Apr 19 2004
Issued Aug 25 2009
Expiry Oct 09 2018
Inventors Pechanek, …
Assg.orig Altera Cor…
Assg.curr Altera Cor…
Entity Large
Referenced by 15
References 10
Maint.: all paid


9. A processing method for a processing apparatus comprising a reconfigurable register file including an odd register file portion and an even register file portion comprising the steps of:

selecting the odd register file portion or the even register file portion to provide a first value;

selecting the odd register file portion or the even register file portion to provide a second value;

multiplying the first value and the second value to produce a third value;

reading a fourth and a fifth value from the reconfigurable register file;

concatenating the fourth value with the fifth value to produce a concatenated value;

accumulating the third value with the concatenated value to produce a final result value.

13. A processing method for a processing apparatus comprising a reconfigurable register file including an odd register file portion and an even register file portion comprising the steps of:

selecting the odd register file portion or the even register file portion to provide a first value;

selecting the odd register file portion or the even register file portion to provide a second value;

multiplying the first value and the second value to produce a third value;

reading a fourth and a fifth value from the reconfigurable register file;

concatenating an extended value, and the fourth value with the fifth value to produce a concatenated value; and

accumulating the third value with the concatenated value to produce a final result value.

1. A processing apparatus for performing a multiply accumulate operation comprising:

a reconfigurable register file including an odd register file portion and an even register file portion;

a first multiplexer to select the odd register file portion or the even register file portion to provide a first value;

a second multiplexer to select the odd register file portion or the even register file portion to provide a second value;

a multiplier for performing a multiply operation on the first value and the second value to produce a third value; and

an accumulator for accumulating the third value with a fourth value to produce a result value, wherein the fourth value comprises a concatenated even and odd pair of values read from the reconfigurable register file.

6. A processing apparatus for performing an extended precision multiply accumulate operation comprising:

a reconfigurable register file including an odd register file portion and an even register file portion;

a first multiplexer to select the odd register file portion or the second even register file portion to provide a first value;

a second multiplexer to select the odd register file portion or the second even register file portion to provide a second value;

an extended precision register containing an extended value;

a multiplier for performing a multiply operation on the first value and the second value to produce a third value; and

an extended accumulator for accumulating the third value with the extended value concatenated with a fourth value to produce a result value, wherein the fourth value comprises an even and odd pair read from the reconfigurable register file.

17. An apparatus for performing an operation with extended precision, the apparatus comprising:

at least two extended precision registers containing an extended value;

a register file containing a plurality of registers, the register file having at least two read ports;

an execution unit reading a first and a second value through the at least two read ports and connecting said execution unit's output to the at least two extended precision registers;

a multiplexer, in response to a portion of a field in an instruction, selecting one of the at least two extended precision registers to provide a third value to the execution unit, said field in the instruction specifying one of the at least two extended precision registers to be written by the execution unit when the execution unit executes the instruction utilizing the first value, second value, and third value as inputs thereby increasing the precision of the operation.

2. The processing apparatus of claim 1 wherein the accumulator is further for writing the result value to the reconfigurable register file.

3. The processing apparatus of claim 1 wherein the accumulator is further for writing the result value to the reconfigurable register file as an even and odd pair.

4. The processing apparatus of claim 1 wherein the first multiplexer allows for single width accesses to the odd register file portion or the even register file portion.

5. The processing apparatus of claim 4 wherein the second multiplexer allows for single width accesses to the odd register file portion or the even register file portion.

7. The processing apparatus of claim 6 wherein the accumulator is further for writing a first portion of the result value to the reconfigurable register file and a second portion of the result value to the extended precision register.

8. The processing apparatus of claim 6 wherein the accumulator is further for writing a first portion of the result value to the reconfigurable register file as an even and odd pair, and writing a second portion of the result value to the extended precision register.

10. The method of claim 9 wherein the third value and the fourth value comprise an even and odd pair read from the reconfigurable register file.

11. The method of claim 9 further comprising the step of:

storing the final result value to the reconfigurable register file.

12. The method of claim 11 wherein the final result includes an odd portion stored in the odd register file portion and an even portion stored in the even file portion.

14. The processing method of claim 13 further comprising, before the step of concatenating, the step of:

reading the extended value from an extended precision register.

15. The method of claim 13 further comprising the step of:

storing a portion of the final result value to the reconfigurable register file.

16. The method of claim 13 further comprising the step of:

storing a portion of the final result value to an extended precision register.

18. The apparatus of claim 17 wherein the at least two extended precision registers having a first and second precision register, wherein the instruction further controlling whether to write the output of the execution unit to either the first or second precision register.

19. The apparatus of claim 17 wherein the at least two extended precision registers are loadable and readable by an application program.

20. The apparatus of claim 17 wherein the selection of one of the at least two extended precision registers as additional input to the execution unit is determined by a bit carried in the instruction.

21. The apparatus of claim 17 further comprising combinational logic receiving a bit from the instruction as input to determine whether to write output from the execution unit to the at least two extended precision registers.

22. The apparatus of claim 17 wherein the execution unit reads single width data types when reading the at least two read ports.

23. The apparatus of claim 17 wherein the execution unit reads double width data types when reading the at least two read ports.

This application is a Div. of Ser. No. 09/169,255 filed Oct. 9, 1998, now U.S. Pat. No. 6,343,356, and claims benefit of Provisional Application No. 60/092,148 filed Jul. 9, 1998.

FIELD OF THE INVENTION

The present invention relates generally to improvements to processing, and more particularly to advantageous techniques for providing a scalable building block register file which in a first application of the register file provides a low cost lower capacity register file, while in a second application, a higher capacity register file with dynamic reconfiguration support for flexible data type operations is provided. The present invention also relates to advantageous techniques for providing a dynamically reconfigurable register file of variable size width for different levels of data precision operations when executing algorithms demanding variable data types of variable precision requirements and for conducting multiple parallel operations on lower precision data in 32 bit and 64 bit forms.

BACKGROUND OF THE INVENTION

When executing algorithms it is desirable to have a register file that can be organized to more advantageously support processing of the varying data types and formats that dynamically occur in a programming application. For example, a register file of large width for high precision operations can be required in one part of an application while single and multiple parallel operations on lower precision data can be required in a different part of the same application. This desire is offset by the hardware cost to implement a wider register file or the hardware cost to implement additional read and write ports. The problem is how to achieve a dynamically configurable register file with extended precision at a reduced hardware cost without affecting general capabilities including performance.

SUMMARY OF THE INVENTION

The present invention advantageously addresses these problems while achieving a variety of advantages as addressed in further detail below. In one aspect of the present invention, to achieve the effect of a doublewide register file, two single wide register files, each with the same number of registers, are used in combination to provide a single register model that uses fewer read and write ports individually than a single register file of twice the capacity would require. Due to the reduced size of the register files and reduced number of read and write ports, higher performance implementations can be achieved as compared to a single register file of equivalent combined capacity of data width and read and write ports. The architecture designates one reduced register file to contain even register addresses and the other to contain odd register addresses. In a second aspect of this invention, the architecture designates one register file configured as two banks of registers wherein the even and odd registers are selectable by means of the read/write port address lines. In a third aspect of this invention, an additional register set of at least one register can be dynamically associated with any register in the register file to flexibly provide extended precision data width to any selected file register.

By appropriate multiplexing and control logic, single width, double width, and extended precision accessing are made available. By architecture definition, double width accesses are constrained to only work on even-odd register pairs thereby treating the two separate register files as a single addressable file of twice the width of an individual register. By convention and as dictated by the architecture, either the even or odd register file is designated as containing the upper half of the bits in a double width access. Double width accesses may occur on the read, write operations, or both depending on the operation to be performed. In this way, the access width of the register file is doubled without the addition of costly read/write ports or more bits per each register and the number of required read and write ports per half is reduced. The double width register file achieved by this invention provides the single width accesses for a simpler programming model when dealing with data types of single width. Additionally, since the same number of read and write ports exist on both halves, single width accesses across the full even plus odd register address space are possible.
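
The even/odd organization described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware: the class name `SplitRegisterFile`, its method names, and the 16-registers-per-block size are hypothetical, and the odd register is assumed to hold the upper half of a double width value (the text only says that one of the two halves is so designated).

```python
class SplitRegisterFile:
    """Toy model of two 32-bit register blocks addressed as one file."""

    WIDTH = 32
    MASK = (1 << WIDTH) - 1

    def __init__(self, regs_per_block=16):
        self.even = [0] * regs_per_block  # holds R0, R2, R4, ...
        self.odd = [0] * regs_per_block   # holds R1, R3, R5, ...

    def _select(self, reg):
        # The low address bit picks the block; the other bits index into it.
        return (self.odd if reg & 1 else self.even), reg >> 1

    def read(self, reg):
        """Single width read from either block."""
        block, index = self._select(reg)
        return block[index]

    def write(self, reg, value):
        """Single width write to either block."""
        block, index = self._select(reg)
        block[index] = value & self.MASK

    def read_pair(self, even_reg):
        """Double width read, constrained to an even-odd register pair."""
        if even_reg & 1:
            raise ValueError("double width access must name an even register")
        index = even_reg >> 1
        # Assumption: the odd register supplies the upper half.
        return (self.odd[index] << self.WIDTH) | self.even[index]

    def write_pair(self, even_reg, value):
        """Double width write, one half to each block."""
        if even_reg & 1:
            raise ValueError("double width access must name an even register")
        index = even_reg >> 1
        self.even[index] = value & self.MASK
        self.odd[index] = (value >> self.WIDTH) & self.MASK
```

In this model a double width write to the pair starting at R4 places the lower 32 bits in R4 and the upper 32 bits in R5, and each half remains individually readable as a single width register.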

These and other features, aspects and advantages of the invention will be apparent to those skilled in the art from the following detailed description taken together with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1A illustrates a first prior art register file arrangement;

FIG. 1B illustrates a second prior art register file arrangement;

FIG. 1C illustrates a first reconfigurable register file in accordance with the present invention;

FIGS. 1D1 and 1D2 illustrate an exemplary add instruction for use in conjunction with a reconfigurable register file;

FIG. 2 illustrates a ManArray indirect very long instruction word (iVLIW) processor in conjunction with a reconfigurable register file in accordance with the present invention;

FIG. 3A illustrates two x/2 extended precision registers used with the reconfigurable register file for extended precision;

FIG. 3B illustrates four x/4 extended precision registers used with the reconfigurable register file for extended precision;

FIGS. 3C1 and 3C2 illustrate an exemplary MPYXA instruction for use with a reconfigurable register file; and

FIG. 4 illustrates two x/4 extended 1
where Rx and Ry are 32-bit quantities and Rto∥Rte is a 64-bit quantity. In a traditional non-split 32-bit wide register file implementation, it would take 1(Rx)+1(Ry)+2(Rto∥Rte)=4 32-bit read ports and 2(Rto∥Rte←) 32-bit write ports to accommodate this instruction. However, using the two register file blocks described above, this same function can be implemented with 3 read ports and 1 write port per block by using even/odd pairs for the 64-bit quantities.

For operations that do not need 64-bit quantities, the mux on the input to the functional unit is controlled to select the proper register file. As an example, consider the add instruction executing on the ALU that performs the function:
Rt←Rx+Ry
where Rx, Ry, and Rt are 32-bit quantities. If Rx is R1, Ry is R3, and Rt is R5 then the mux on the lower 32-bit inputs selects the odd register file for both inputs. Since the ALU has two read ports on the odd register file this operation is accomplished without any problems. The 32-bit write to R5 is also easily accomplished by only enabling the write for the odd register file. Any combination of even or odd registers can be selected without restrictions.
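
The selection logic for the add example can be sketched as follows. The function names and the dictionary layout are hypothetical, chosen only for illustration; the sketch assumes that the low bit of a register number identifies its register file block, as in the even/odd designation described above.

```python
def bank_of(reg):
    """Return which register file block holds register number `reg`."""
    return "odd" if reg & 1 else "even"


def add_controls(rx, ry, rt):
    """Derive illustrative control settings for the 32-bit add Rt <- Rx + Ry."""
    reads = {"even": 0, "odd": 0}
    for src in (rx, ry):
        reads[bank_of(src)] += 1  # read ports needed on each block
    return {
        "mux_x": bank_of(rx),  # block feeding the first ALU input
        "mux_y": bank_of(ry),  # block feeding the second ALU input
        "reads": reads,
        "write_enable": {
            "even": bank_of(rt) == "even",
            "odd": bank_of(rt) == "odd",
        },
    }


# The example from the text: Rx = R1, Ry = R3, Rt = R5.
controls = add_controls(1, 3, 5)
```

For R1, R3, and R5 both multiplexers select the odd block, two reads land on the odd block, and only the odd write enable is asserted, which matches the behavior described in the paragraph above.
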
Extended Precision

An approach to increasing the width of the register file at a reduced hardware cost comes from taking into consideration where the extra precision gained from a wider register file is really needed. For example, in multiply-accumulate operations, extra precision is needed for the accumulation in some applications to increase the number of times accumulation can occur without overflow. In addition, even though providing extended precision support to all register files is a general case, in specific applications this is usually not required and would be considered unnecessarily expensive to implement. It is also not desirable to explicitly specify which registers are specially enabled, to support extended precision operations. Further, it is not desirable to have additional architecturally defined extended precision accumulator registers in addition to an existing register file. Consequently, for low cost implementations, as well as, for a flexible programming model for extended precision support, the present reconfigurable register file with extended precision invention advantageously addresses such concerns.

To accommodate such specific needs without increasing the number of ports or the width of the entire register file, the reconfigurable register file concept is extended by adding, in the simplest case, a single additional register known as the extended precision register. FIG. 3A illustrates a system 500 employing two (x/2)-bit registers 553 and 555, labeled XH1 and XH0, which are used to extend the precision of the accumulation operation that occurs in the extended accumulator unit 523. The Multiply with Extended Accumulate operation is defined in FIGS. 3C1 and 3C2, which define the MPYXA instruction. The apparatus of FIG. 3A is adapted for an 80-bit extended accumulate operation where a 32×32-bit multiply is carried out by multiplier 521, which produces a 64-bit result that is extended to 80 bits in the accumulate operation of extended accumulator 523. This can be seen in FIG. 3A where, depending upon the least significant bit (LSB) of the target register field in the MPYXA instruction, bit 17 of FIG. 3C1, one of two extended precision registers XH1 553 or XH0 555 is selected via multiplexer 563. The least significant bit of the Register Target field allows the extended precision register to be arbitrarily used with any pair of registers in the register file. This powerful but simple feature allows a programmer to utilize any pair of registers for an extended precision operation without any mode control or specialized accumulator hardware added to the architecture. The inputs of multiplexer 563 are the (x/2)-bit length extended precision input operands XH0 552 and XH1 554. The multiplexer 563 selects XH0 552 when its input control line 556 is a “0”. The multiplexer 563 selects XH1 554 when its input control line 556 is a “1”. The output of multiplexer 563 is signal line 564, which is (x/2) bits wide and is an input to the extended accumulator 523.
The extended output 566 is a partial sum of product value that is stored in the extended precision registers in preparation for the next multiply accumulate operation. The output 566 is written to either XH1 553 or XH0 555 under control of a Write (Wr) signal 562. The pipeline stored LSB of the Rte field 551 is used to control the Wr signal via a logical AND type function where the Wr 562 is passed on to the register depending on the state of the LSB. The AND gates 557 and 559 control this function, where the LSB input to AND 559 is an inverted version 561 of whatever bit appears on line 556. The AND gate outputs 558 and 560 control the writing of the output extended precision data 566 to their extended precision registers. The extended precision registers XH1 553 and XH0 555 are part of the special purpose or miscellaneous registers that are used in the processor and consequently are loadable and readable by the programmer. The read and write buses that accomplish this task for the programmer are not shown in FIG. 3A for reasons of clarity.
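
The 80-bit extended accumulate can be modeled in a few lines. The following is a behavioral sketch only, with several assumptions that the text does not state: operands are treated as signed two's complement values, the odd target register Rto holds the upper 32 bits of the 64-bit pair, the extended register supplies the top 16 bits of the 80-bit accumulator, and the function name `mpyxa_80` and its argument layout are hypothetical.

```python
MASK16 = (1 << 16) - 1
MASK32 = (1 << 32) - 1
MASK80 = (1 << 80) - 1


def to_signed(value, bits):
    """Interpret the low `bits` of `value` as a two's complement number."""
    sign = 1 << (bits - 1)
    value &= (1 << bits) - 1
    return (value & (sign - 1)) - (value & sign)


def mpyxa_80(rx, ry, rto, rte, xh, rt_field):
    """One 32x32 multiply with 80-bit accumulate.

    `xh` is a two-entry list [XH0, XH1]. The LSB of the target register
    field selects which entry extends the accumulator and which entry
    is written back (the role of the multiplexer and the AND gates).
    Returns the new (Rto, Rte) pair and updates `xh` in place.
    """
    sel = rt_field & 1                           # 0 selects XH0, 1 selects XH1
    acc = (xh[sel] << 64) | (rto << 32) | rte    # XH || Rto || Rte
    product = to_signed(rx, 32) * to_signed(ry, 32)
    total = (acc + product) & MASK80             # wrap to 80 bits
    xh[sel] = (total >> 64) & MASK16             # only the selected register is written
    return (total >> 32) & MASK32, total & MASK32
```

Calling the function repeatedly with the returned register pair and the same `xh` list accumulates a sum of products whose upper 16 bits live in the selected extended precision register, while the unselected register is left untouched.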

FIG. 3B depicts a quad extended precision apparatus 600 supporting the MPYXA multiply with extended accumulate instruction of FIG. 3C1, which shows dual 40 bit accumulation 702 and double width 80 bit accumulation 701. In FIG. 3B, four (x/4)-bit registers are provided as partitions of two (x/2)-bit registers 653 and 655, labeled XB3 and XB2 in register 653 and XB1 and XB0 in register 655. The four (x/4)-bit registers are used to extend the precision of the accumulation operation that occurs in the extended accumulator units 621 and 625. The Multiply with Extended Accumulate operation is defined in FIG. 3C1, which defines the MPYXA instruction for dual 40-bit extended accumulates 702. The apparatus of FIG. 3B supports the dual 40-bit extended accumulate operation where two 16×16-bit multiplies 619 and 623 each produce a 32-bit result, each of which is extended to 40 bits in the accumulate operations performed by accumulators 621 and 625, respectively. This operation can be seen in FIG. 3B where, depending upon the least significant bit (LSB) of the target register field in the MPYXA instruction, bit 17 of FIG. 3C1, one of the two extended precision registers, XB3 and XB2 653 or XB1 and XB0 655, is selected via multiplexers 663 and 665. The least significant bit of the Register Target field allows the extended precision register to be arbitrarily used with any pair of registers in the register file. This powerful but simple feature allows a programmer to utilize any pair of registers for an extended precision operation without any mode control or specialized accumulator hardware added to the architecture. The inputs of multiplexers 663 and 665 are the (x/4)-bit length extended precision input operands XB0 622 and XB2 626 for multiplexer 663, and XB1 624 and XB3 628 for multiplexer 665. The multiplexer 663 selects XB0 622 when its input control line 630 is a “0”. The multiplexer 665 selects XB1 624 when its input control line 630 is a “0”.
The multiplexer 663 selects XB2 626 when its input control line 630 is a “1”. The multiplexer 665 selects XB3 628 when its input control line 630 is a “1”. The output 670 of multiplexer 663 is (x/4) bits wide and serves as an input to the extended accumulator 621. The extended output 636 is a partial sum of product value that is stored in the extended precision registers in preparation for the next multiply accumulate operation. The output 672 of multiplexer 665 is (x/4) bits wide and serves as an input to the extended accumulator 625. The extended output 638 is a partial sum of product value that is stored in the extended precision registers in preparation for the next multiply accumulate operation. The output 636 is written to either XB2 or XB0 and the output 638 is written to either XB3 or XB1, all under control of a Write (Wr) signal 648. The pipeline stored LSB of the Rte field 651 is used to control the Wr signal via a logical AND type function where the Wr 648 is passed on to the register depending on the state of the LSB. The AND gates 657 and 659 control this function, where the LSB input to AND 659 is an inverted 661 version of 630. The AND gate outputs 632 and 634 control the writing of the output extended precision data 636 and 638 to their extended precision registers. The partitioned extended precision registers 653 and 655 are part of the special purpose or miscellaneous registers that are used in the processor and consequently are loadable and readable by the programmer. The read and write buses that accomplish this task for the programmer are not shown in FIG. 3B for reasons of clarity.
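
The dual 40-bit case can be sketched in the same behavioral style. Several details here are assumptions made for illustration rather than statements from the text: operands are treated as signed two's complement, the low halfwords of Rx and Ry feed the lane whose 32-bit accumulator sits in Rte, the high halfwords feed the lane held in Rto, and XB0/XB2 extend the Rte lane while XB1/XB3 extend the Rto lane. The function name `mpyxa_dual40` is hypothetical.

```python
MASK8 = (1 << 8) - 1
MASK16 = (1 << 16) - 1
MASK32 = (1 << 32) - 1
MASK40 = (1 << 40) - 1


def to_signed(value, bits):
    """Interpret the low `bits` of `value` as a two's complement number."""
    sign = 1 << (bits - 1)
    value &= (1 << bits) - 1
    return (value & (sign - 1)) - (value & sign)


def mpyxa_dual40(rx, ry, rto, rte, xb, rt_field):
    """Two 16x16 multiplies, each accumulated to 40 bits.

    `xb` is a four-entry list [XB0, XB1, XB2, XB3]. The LSB of the target
    register field selects XB0/XB1 (when 0) or XB2/XB3 (when 1).
    Returns the new (Rto, Rte) pair and updates `xb` in place.
    """
    sel = rt_field & 1
    lo_idx, hi_idx = 2 * sel, 2 * sel + 1
    # Assumed lane assignment: low halfwords -> Rte lane, high halfwords -> Rto lane.
    p_lo = to_signed(rx & MASK16, 16) * to_signed(ry & MASK16, 16)
    p_hi = to_signed((rx >> 16) & MASK16, 16) * to_signed((ry >> 16) & MASK16, 16)
    acc_lo = (((xb[lo_idx] << 32) | rte) + p_lo) & MASK40
    acc_hi = (((xb[hi_idx] << 32) | rto) + p_hi) & MASK40
    xb[lo_idx] = (acc_lo >> 32) & MASK8   # only the selected pair is written
    xb[hi_idx] = (acc_hi >> 32) & MASK8
    return acc_hi & MASK32, acc_lo & MASK32
```

Each lane carries 8 extra bits above its 32-bit register, so a running sum that would have wrapped a 32-bit accumulator spills into the selected (x/4)-bit register instead.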

In a typical application, x is 32-bits, with (x/2)=16-bits and (x/4)=8-bits though different extended precision bit widths are not precluded. The present approach allows dual accumulations of 40-bits of precision for dual 16×16 multiply-accumulates, as specified in the MPYXA instruction FIG. 3C and for the exemplary apparatus shown in FIG. 3B. For 32×32 multiply-accumulate operations, 80-bits of precision are available for the accumulation. The extended precision concept can be further extended to support quad 20 bit accumulations where x is 16-bits and there are 4 extended precision bits. The concept can be further generalized by using more than one x-bit extended precision register and basing the selection of the register extended precision portions on more than the single LSB of the Instruction Rte field. Since a single 32-bit extended precision register provides support for up to two 80-bit extended accumulate operations and up to four 40-bit extended accumulate operations, further extensions, even though feasible, for practical reasons presently appear to be of limited use.
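
The benefit of the extra bits can be quantified with a short calculation. The sketch below assumes signed two's complement operands and an accumulator that starts at zero, neither of which is stated in the text, and the function name is hypothetical. It counts how many worst-case products fit in a signed accumulator of a given width.

```python
def worst_case_accumulations(operand_bits, acc_bits):
    """Number of worst-case signed products that fit without overflow.

    The largest product magnitude for n-bit signed operands is
    (-2**(n-1)) ** 2 == 2**(2n - 2); the largest positive value of a
    signed accumulator of `acc_bits` bits is 2**(acc_bits - 1) - 1.
    """
    worst_product = 1 << (2 * operand_bits - 2)
    return ((1 << (acc_bits - 1)) - 1) // worst_product


# 32x32 products: 64-bit register pair versus the 80-bit extended accumulator.
plain_64 = worst_case_accumulations(32, 64)      # 1
extended_80 = worst_case_accumulations(32, 80)   # 131071

# 16x16 products: 32-bit register versus the 40-bit extended accumulator.
plain_32 = worst_case_accumulations(16, 32)      # 1
extended_40 = worst_case_accumulations(16, 40)   # 511
```

Under these assumptions, 16 extension bits raise the guaranteed overflow-free count for 32×32 products from 1 to 131071, and 8 extension bits raise it for 16×16 products from 1 to 511.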

Due to the nature of many applications, a processor can be designed utilizing a subset of the ManArray architecture that is based upon a single 16×32 register file, i.e. one of the building blocks for a reconfigurable register file. Dual 8×32 register files can be also used to create a reconfigurable 16×32 register file. An important aspect is that a low cost register file design point can be reached by subsetting the ManArray architecture that allows future growth into higher performance processors that remain code compatible with the lower cost subset design. An exemplary apparatus 700 implementing this use of the extended precision concept with a single register file design is shown in FIG. 4.

While the present invention has been described in the context of a number of presently preferred embodiments, it will be recognized that the teachings of the present invention may be advantageously applied to a variety of processing arrays and variously adopted consistent with the claims which follow.

INVENTORS:

Pechanek, Gerald George, Barry, Edwin Franklin

THIS PATENT IS REFERENCED BY THESE PATENTS:

Patent	Priority	Assignee	Title
8115595,	Apr 25 2006	LG Electronics Inc	Reader control system
8115604,	Apr 25 2006	LG Electronics Inc	Reader control system
8378790,	Apr 25 2005	LG Electronics Inc.	Reader control system
8482389,	Apr 25 2005	LG Electronics Inc	Reader control system
8508343,	Apr 25 2005	LG Electronics Inc.	Reader control system
8598989,	Apr 25 2005	LG Electronics Inc.	Reader control system
8604913,	Apr 25 2005	LG Electronics Inc.	Reader control system
8624712,	Apr 25 2005	LG Electronics Inc.	Reader control system
8653948,	Apr 25 2005	LG Electronics Inc	Reader control system
8665066,	Apr 25 2005	LG Electronics Inc	Reader control system
8698604,	Apr 25 2005	LG Electronics Inc	Reader control system
8749355,	Jun 09 2005	LG Electronics Inc.	Reader control system
9235414,	Dec 19 2011	Intel Corporation	SIMD integer multiply-accumulate instruction for multi-precision arithmetic
9672395,	Apr 25 2005	LG Electronics Inc.	Reader control system
9679172,	Apr 25 2005	LG Electronics Inc.	Reader control system

THIS PATENT REFERENCES THESE PATENTS:

Patent	Priority	Assignee	Title
4302818,	Jul 10 1979	Texas Instruments Incorporated	Micro-vector processor
4713749,	Feb 12 1985	Texas Instruments Incorporated	Microprocessor with repeat instruction
4774688,	Oct 15 1985	International Business Machines Corporation	Data processing system for determining min/max in a single operation cycle as a result of a single instruction
5072418,	May 04 1989	Texas Instruments Incorporated; TEXAS INSTRUMENTS INCORPORATED, A CORP OF DE	Series maxium/minimum function computing devices, systems and methods
5644780,	Jun 02 1995	International Business Machines Corporation	Multiple port high speed register file with interleaved write ports for use with very long instruction word (vlin) and n-way superscaler processors
5903919,	Oct 07 1997	SHENZHEN XINGUODU TECHNOLOGY CO , LTD	Method and apparatus for selecting a register bank
6044448,	Dec 16 1997	Altera Corporation	Processor having multiple datapath instances
6078941,	Nov 18 1996	Samsung Electronics Co., Ltd.	Computational structure having multiple stages wherein each stage includes a pair of adders and a multiplexing circuit capable of operating in parallel
6134648,	Mar 15 1996	Round Rock Research, LLC	Method and apparatus for performing an operation mulitiple times in response to a single instruction
6223255,	Feb 03 1995	AVAGO TECHNOLOGIES GENERAL IP SINGAPORE PTE LTD	Microprocessor with an instruction level reconfigurable n-way cache

ASSIGNMENT RECORDS

Executed on	Assignor	Assignee	Conveyance	Frame	Reel	Doc
Apr 19 2004		Altera Corporation	(assignment on the face of the patent)
Aug 24 2006	PTS Corporation	Altera Corporation	ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS	018184	0423	pdf

MAINTENANCE FEES AND DATES:

Date	Maintenance Fee Events
Dec 22 2009	M1552: Payment of Maintenance Fee, 8th Year, Large Entity.
Apr 25 2013	R1551: Refund - Payment of Maintenance Fee, 4th Year, Large Entity.
Apr 25 2013	R1554: Refund - Surcharge for Late Payment, Large Entity.
Aug 06 2013	M1553: Payment of Maintenance Fee, 12th Year, Large Entity.
Mar 14 2014	REM: Maintenance Fee Reminder Mailed.

Date	Maintenance Schedule
Aug 25 2012	4 years fee payment window open
Feb 25 2013	6 months grace period start (w surcharge)
Aug 25 2013	patent expiry (for year 4)
Aug 25 2015	2 years to revive unintentionally abandoned end. (for year 4)
Aug 25 2016	8 years fee payment window open
Feb 25 2017	6 months grace period start (w surcharge)
Aug 25 2017	patent expiry (for year 8)
Aug 25 2019	2 years to revive unintentionally abandoned end. (for year 8)
Aug 25 2020	12 years fee payment window open
Feb 25 2021	6 months grace period start (w surcharge)
Aug 25 2021	patent expiry (for year 12)
Aug 25 2023	2 years to revive unintentionally abandoned end. (for year 12)