In one embodiment of the present invention, a high-speed adder is provided. This adder may incorporate a conversion circuit in a slack propagation timing path to provide for improved performance. The present invention may be incorporated into single or multi-bit adders.
26. A method for adding a first and a second binary number, each having a plurality of bits, comprising:
generating a carry production control signal representing a sum of a plurality of corresponding bits of the first binary number and the second binary number where the carry production control signal comprises two signals A and B that can each have a value of either P or Q; and
converting the A and b signals into two signals, X and Y, representing one of three possible values.
1. An adder circuit for adding a first binary number and a second binary number, the adder comprising:
a carry evaluating circuit to generate a carry production control signal representing a sum of a block of corresponding bits of the first binary number and the second binary number, the carry production control signal comprising two signals A and B that can each have a value of either P or Q; and
a conversion circuit coupled to receive the A and B outputs and to output two signals X and Y, each having a value of either P or Q, with X and Y together representing one of three possible values.
17. A microprocessor comprising:
an arithmetic logic circuit including an adder having a carry evaluating circuit to generate a carry production control signal representing a sum of a block of corresponding bits of the first binary number and the second binary number, the carry production control signal comprising two signals A and B that can each have a value of either P or Q; and
a conversion circuit coupled to receive the A and B outputs, the conversion circuit outputting two signals X and Y, each having a value of either P or Q, with X and Y together representing one of three possible values.
2. An adder as in
3. An adder as in
5. An adder as in
a second circuit coupled to said conversion circuit to receive the X and Y outputs and outputting a first result if an input carry has a value of 1 and a second result if the input carry has a value of 0.
6. An adder as in
7. An adder circuit as in
8. An adder circuit as in
9. An adder circuit as in
10. An adder circuit as in
11. An adder circuit as in
12. An adder as in
13. An adder as in
14. An adder as in
a carry binary number determining circuit, responsive to the first binary number and the second binary number to generate a carry binary number composed of carry bits of a sum of the first binary number and the second binary number, the carry binary number determining circuit having a plurality of circuit stages to operate in series to generate the carry binary number, each circuit stage to partially resolve the carry binary number and at least one circuit stage including at least one of the carry bit evaluating circuits generating a carry production control signal that is coupled between the circuit stages as an input signal to a next circuit stage; and
a combinatorial logic circuit coupled to respective corresponding bits of the first binary number, the second binary number and the carry binary number to generate a corresponding bit of a result binary number.
15. An adder as in
18. A microprocessor as in
a second circuit coupled to said conversion circuit to receive the X and Y outputs and outputting a first result if the input carry has a value of 1 and a second result if the input carry has a value of 0.
19. A microprocessor circuit as in
20. A microprocessor circuit as in
21. A microprocessor circuit as in
22. A microprocessor circuit as in
23. A microprocessor as in
a carry binary number determining circuit, responsive to the first binary number and the second binary number to generate a carry binary number composed of carry bits of a sum of the first binary number and the second binary number, the carry binary number determining circuit having a plurality of circuit stages to operate in series to generate the carry binary number, each circuit stage to partially resolve the carry binary number and at least one circuit stage including at least one of the carry bit evaluating circuits generating a carry production control signal that is passed between the circuit stages as an input signal to a next circuit stage; and
a combinatorial logic circuit responsive to respective corresponding bits of the first binary number, the second binary number and the carry binary number to generate a corresponding bit of a result binary number.
24. A microprocessor as in
25. A microprocessor as in
27. The method of
receiving the X and Y signals and generating a first result if an input carry has a value of 1 and a second result if the input carry has a value of 0.
28. The method of
receiving the X and Y signals and generating a final result utilizing a carry-select evaluator circuit.
29. The method of
determining a final binary result of adding the first and second binary numbers by utilizing a plurality of carry evaluating circuits in a parallel prefix structure to generate a full set of carry bits from the first binary number and the second binary number.
This invention relates to the field of data processing. More particularly, this invention relates to digital adder circuits used within data processing systems.
Addition is one of the most important arithmetic operations to optimize because it is performed so frequently within data processing systems. A problem in producing high-speed adder circuits is that the high-order bits of the result depend, both logically and physically, on the carry-out values from the low-order bits. In other words, the carry out from the top bit of the adder is a function of every input bit. Consequently, addition operations tend to be relatively slow.
For decades, considerable effort has been devoted to designing and developing adder circuits that can operate at high speed. To that end, various addition algorithms have been developed, including adders built from maximally parallel-prefix circuits. Although such maximally parallel-prefix adders are a great improvement over prior designs, the intense focus on improving processor cycle times creates a continuing need to reduce the time required to produce the result of an add operation.
The general concept of carry arbitration will be described to aid in the understanding of the invention. In the general case, the carry $c_{i+1}$ is evaluated by adding two 1-bit binary numbers $a_i$ and $b_i$. There are two general cases defined by the values of $a_i$ and $b_i$. The first case, where there is an output carry request, arises when both operand bits are equal: a 1-carry request occurs if both inputs are 1, whereas a 0-carry request occurs if both inputs are 0. The second case, where there is no output carry request, arises when the operand bits have different values. See Table 1, in which the letter u indicates that there is no output carry request.
TABLE 1
$a_i$, $b_i$ | $c_{i+1}$
0 0 | 0
1 1 | 1
0 1 | u
1 0 | u
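Table 1 maps each operand-bit pair to a carry request. The helper `carry_request` below is a hypothetical name used only for illustration; it is a minimal Python sketch of that mapping, where 'u' is the case that conventional generate/propagate/kill terminology calls "propagate".

```python
def carry_request(a: int, b: int) -> str:
    """Carry request made by one operand-bit pair, following Table 1.

    '0' -> 0-carry request (both bits 0)
    '1' -> 1-carry request (both bits 1)
    'u' -> no carry request (bits differ, so the incoming carry passes through)
    """
    if a not in (0, 1) or b not in (0, 1):
        raise ValueError("operand bits must be 0 or 1")
    if a == b:
        return str(a)
    return "u"


# Reproduce Table 1 row by row.
for a, b in [(0, 0), (1, 1), (0, 1), (1, 0)]:
    print(a, b, carry_request(a, b))
```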
One input pair ($a_i$, $b_i$) may or may not make a carry request. If two input pairs ($a_i$, $b_i$) and ($a_j$, $b_j$) are used, two carry requests may occur at the same time, so these two carry requests must be arbitrated. Note that i and j refer to two adjacent bits (at the first level) or blocks of bits (at subsequent levels) in the calculation; thus, if we are arbitrating between carry requests relating to previously arbitrated blocks of 3 bits, then $i = j + 3$.
TABLE 2 (- denotes don't care)
$a_i$, $b_i$ | $a_j$, $b_j$ | $c_{i+1}$
0 0 | - - | 0
1 1 | - - | 1
0 1 (or 1 0) | 0 0 | 0
0 1 (or 1 0) | 1 1 | 1
0 1 (or 1 0) | 0 1 (or 1 0) | u
The output carry $c_{i+1}$ can be encoded using two wires ($v_i$, $w_i$), as illustrated in Table 3 below.
TABLE 3
$c_{i+1}$ | $v_i$, $w_i$
0 | 0 0
1 | 1 1
u | 0 1
u | 1 0
The signals on the two wires constitute the carry production control signal. The following equations, in which juxtaposition denotes logical AND and + denotes logical OR, satisfy the conditions illustrated in Tables 2 and 3:
$$
\begin{aligned}
v_i &= a_i b_i + (a_i + b_i)\,a_j && (1)\\
w_i &= a_i b_i + (a_i + b_i)\,b_j && (2)
\end{aligned}
$$
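The equations for $v_i$ and $w_i$ can be checked exhaustively against Table 2 using the Table 3 decoding. The sketch below is illustrative only; the names `arbiter2`, `decode` and `reference` are hypothetical, and Python's bitwise `&` and `|` stand in for AND and OR on single bits.

```python
from itertools import product


def arbiter2(ai: int, bi: int, aj: int, bj: int) -> tuple[int, int]:
    """Two-input-pair carry arbiter built from the v_i and w_i equations."""
    vi = (ai & bi) | ((ai | bi) & aj)
    wi = (ai & bi) | ((ai | bi) & bj)
    return vi, wi


def decode(v: int, w: int) -> str:
    """Table 3: equal wires carry a resolved carry; unequal wires mean 'u'."""
    return str(v) if v == w else "u"


def reference(ai: int, bi: int, aj: int, bj: int) -> str:
    """Table 2 stated as a priority rule."""
    if ai == bi:   # non-maskable request from (ai, bi)
        return str(ai)
    if aj == bj:   # maskable request from (aj, bj)
        return str(aj)
    return "u"     # neither pair makes a request


# Exhaustively confirm the equations implement Table 2 under Table 3.
for bits in product((0, 1), repeat=4):
    assert decode(*arbiter2(*bits)) == reference(*bits), bits
print("v_i/w_i equations match Table 2 for all 16 inputs")
```

When $a_i \ne b_i$ the outputs reduce to ($a_j$, $b_j$), which is why an unresolved request may appear on the wires as either (0 1) or (1 0).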
A 16-bit fast carry computation using 2-input-pair carry arbiters is shown in
A 3-input-pair carry arbiter is shown in FIG. 3. The input pair ($a_i$, $b_i$) can make a non-maskable carry request. The input pairs ($a_j$, $b_j$) and ($a_k$, $b_k$) can both make a maskable carry request at the same time; however, the input pair ($a_j$, $b_j$) has priority over the input pair ($a_k$, $b_k$). Only when there is no non-maskable carry request from the input pair ($a_i$, $b_i$) and no maskable carry request from the input pair ($a_j$, $b_j$) is a maskable carry request from the input pair ($a_k$, $b_k$) acknowledged by the output carry $c_{i+1}$, as illustrated in Table 4 below.
TABLE 4 (- denotes don't care)
$a_i$, $b_i$ | $a_j$, $b_j$ | $a_k$, $b_k$ | $c_{i+1}$
0 0 | - - | - - | 0
1 1 | - - | - - | 1
0 1 (or 1 0) | 0 0 | - - | 0
0 1 (or 1 0) | 1 1 | - - | 1
0 1 (or 1 0) | 0 1 (or 1 0) | 0 0 | 0
0 1 (or 1 0) | 0 1 (or 1 0) | 1 1 | 1
0 1 (or 1 0) | 0 1 (or 1 0) | 0 1 (or 1 0) | u
The following equations satisfy Tables 3 and 4:
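$$
\begin{aligned}
v_i &= a_i b_i + (a_i + b_i)\bigl(a_j b_j + (a_j + b_j)\,a_k\bigr) && (3)\\
w_i &= a_i b_i + (a_i + b_i)\bigl(a_j b_j + (a_j + b_j)\,b_k\bigr) && (4)
\end{aligned}
$$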
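The three-input-pair equations can be checked the same way against Table 4. In this sketch (hypothetical names `arbiter3` and `priority_reference`), both outputs use the factor $(a_i + b_i)$, matching Equation (3) and the symmetry between the two wires; the reference simply lets the highest-priority pair with equal bits decide.

```python
from itertools import product


def arbiter3(ai, bi, aj, bj, ak, bk):
    """Three-input-pair carry arbiter: (ai, bi) outranks (aj, bj), which outranks (ak, bk)."""
    vi = (ai & bi) | ((ai | bi) & ((aj & bj) | ((aj | bj) & ak)))
    wi = (ai & bi) | ((ai | bi) & ((aj & bj) | ((aj | bj) & bk)))
    return vi, wi


def priority_reference(*pairs):
    """Table 4 as a rule: the first pair (highest priority) with equal bits wins."""
    for a, b in pairs:
        if a == b:
            return str(a)
    return "u"


def decode(v, w):
    """Table 3 decoding of the two-wire output."""
    return str(v) if v == w else "u"


for ai, bi, aj, bj, ak, bk in product((0, 1), repeat=6):
    got = decode(*arbiter3(ai, bi, aj, bj, ak, bk))
    want = priority_reference((ai, bi), (aj, bj), (ak, bk))
    assert got == want, (ai, bi, aj, bj, ak, bk)
print("three-input-pair equations match Table 4 for all 64 inputs")
```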
Using a similar approach to 2- or 3-input-pair carry arbiters, carry arbiters with any number of input pairs can be derived. However, carry arbiters with more than 4 input pairs are not usually of interest. First, too many series transistors are needed to implement these arbiters, which leads to inefficient CMOS designs. Second, the arbiter cell layout may become too large for the bit slice of a datapath.
The final row is a sum circuit that operates to XOR the input operands and the carry result.
The carry out from the adder of
As soon as $v_i$ and $w_i$ are equal (meaning that the carry has been generated), only single-rail signals need to be routed instead of dual-rail signals. This results in a significant reduction of chip area, especially in the third row, where more room is needed to accommodate signals crossing from the least significant bits to the most significant bits. Thus, the resulting adder is quite compact.
As an example of the use of the above technique, the design of an 80-bit high-speed adder with a moderate chip area will now be considered.
The carry $c_{i+1}$ is evaluated by adding two 1-bit numbers $a_i$ and $b_i$, as shown in Table 1. There are two general cases defined by the values of $a_i$ and $b_i$. The first case, where there is a carry request, arises when both operand bits are equal: a 1-carry request occurs if both inputs are 1, whereas a 0-carry request occurs if both inputs are 0. The second case, where there is no carry request, arises when the operand bits have different values. The letter u indicates that there is no carry request.
We introduce the concept of carry arbitration by taking a four-way carry arbiter as shown in
Only when there is no non-maskable carry request from the input pair (a3, b3) is a maskable carry request from the input pair (a2, b2) acknowledged by the output c. Only when there is no non-maskable carry request from the input pair (a3, b3) and no maskable carry request from the input pair (a2, b2) is a maskable carry request from the input pair (a1, b1) acknowledged by the output c. Only when there are no carry requests from the input pairs (a3, b3), (a2, b2) and (a1, b1) is a carry request from the input pair (a0, b0) acknowledged by the output c. Table 5 outlines the truth table required to implement four-way carry arbiters.
TABLE 5 (- denotes don't care)
a3, b3 | a2, b2 | a1, b1 | a0, b0 | c
00 | - | - | - | 0
11 | - | - | - | 1
01 or 10 | 00 | - | - | 0
01 or 10 | 11 | - | - | 1
01 or 10 | 01 or 10 | 00 | - | 0
01 or 10 | 01 or 10 | 11 | - | 1
01 or 10 | 01 or 10 | 01 or 10 | 00 | 0
01 or 10 | 01 or 10 | 01 or 10 | 11 | 1
01 or 10 | 01 or 10 | 01 or 10 | 01 or 10 | u
Using the same approach, carry arbiters with any number of ways may be derived. The carries may be generated quickly by using carry arbiters combined into a tree structure that exploits the associativity of the carry computation.
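The associativity argument can be checked in software. The sketch below is a behavioural model under stated assumptions, not the patented circuit: it arranges two-way arbiters in a Kogge-Stone-style parallel-prefix tree (a topology swapped in for brevity; the adders described here use four-way arbiters), and it treats the adder's carry-in as an extra lowest-priority pair (cin, cin). The functions `combine`, `prefix_carries` and `reference_carries` are hypothetical names.

```python
import random


def combine(hi, lo):
    """Two-input-pair carry arbiter on encoded pairs; `hi` has priority.

    An unresolved 'u' on `hi` (v != w) passes `lo` through unchanged, so the
    same cell works on raw operand bits and on outputs of earlier arbiters.
    """
    vh, wh = hi
    vl, wl = lo
    return (vh & wh) | ((vh | wh) & vl), (vh & wh) | ((vh | wh) & wl)


def prefix_carries(a, b, width, cin=0):
    """Carries c_1..c_width from a Kogge-Stone-style prefix tree of arbiters."""
    nodes = [((a >> k) & 1, (b >> k) & 1) for k in range(width)]
    # Assumption: the carry-in acts as a lowest-priority pair (cin, cin).
    nodes[0] = combine(nodes[0], (cin, cin))
    span = 1
    while span < width:
        # Each level doubles the run of bits that every node has arbitrated.
        nodes = [combine(nodes[k], nodes[k - span]) if k >= span else nodes[k]
                 for k in range(width)]
        span *= 2
    # Every node now covers bits k..0 plus the carry-in, so its carry is resolved.
    assert all(v == w for v, w in nodes)
    return [v for v, _ in nodes]  # element k is c_{k+1}


def reference_carries(a, b, width, cin=0):
    """c_{k+1} from ordinary integer addition of the low k+1 bits."""
    out = []
    for k in range(width):
        mask = (1 << (k + 1)) - 1
        out.append(((a & mask) + (b & mask) + cin) >> (k + 1))
    return out


rng = random.Random(1)
for _ in range(1000):
    x, y, c = rng.getrandbits(16), rng.getrandbits(16), rng.getrandbits(1)
    assert prefix_carries(x, y, 16, c) == reference_carries(x, y, 16, c)
print("arbiter prefix tree matches integer carries")
```

Because `combine(p, combine(q, r)) == combine(combine(p, q), r)` for every encoded pair, the arbiters can be grouped into any tree shape, which is what allows the carry computation to run in logarithmic depth.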
Theoretically, the more inputs each carry arbiter handles, the faster the carries are generated. However, as discussed earlier, carry arbiters with more than four ways are not usually of practical interest. Four-way carry arbiters and their dynamic CMOS implementation may be chosen because they may achieve advantageous results for an 80-bit design. Other designs, such as 32-bit adders, favor three-way carry arbiters.
Motivated by the dual-rail data encoding used in self-timed design, the carry request out c can be encoded using two wires (aa, bb) as shown in Table 6, below.
TABLE 6
c | aa, bb
0 | 0 0
1 | 1 1
u | 0 1
u | 1 0
Equations 6 and 7 give the behavior defined by Tables 5 and 6.
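$$
\begin{aligned}
\mathit{aa} &= a_3 b_3 + (a_3 + b_3)\bigl(a_2 b_2 + (a_2 + b_2)(a_1 b_1 + (a_1 + b_1)\,a_0)\bigr) && (6)\\
\mathit{bb} &= a_3 b_3 + (a_3 + b_3)\bigl(a_2 b_2 + (a_2 + b_2)(a_1 b_1 + (a_1 + b_1)\,b_0)\bigr) && (7)
\end{aligned}
$$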
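Equations 6 and 7 can be verified against Tables 5 and 6 over all 256 input combinations. In this sketch (hypothetical names `arbiter4` and `table5`), each equation is evaluated from the innermost parenthesis outward, which is the same nesting written with intermediate variables.

```python
from itertools import product


def arbiter4(a3, b3, a2, b2, a1, b1, a0, b0):
    """Four-way carry arbiter, Equations 6 and 7; (a3, b3) has top priority."""
    # Equation 6, innermost term first.
    aa1 = (a1 & b1) | ((a1 | b1) & a0)
    aa2 = (a2 & b2) | ((a2 | b2) & aa1)
    aa = (a3 & b3) | ((a3 | b3) & aa2)
    # Equation 7, identical except that b0 replaces a0.
    bb1 = (a1 & b1) | ((a1 | b1) & b0)
    bb2 = (a2 & b2) | ((a2 | b2) & bb1)
    bb = (a3 & b3) | ((a3 | b3) & bb2)
    return aa, bb


def table5(pairs):
    """Table 5 as a rule: the highest-priority pair with equal bits wins."""
    for a, b in pairs:
        if a == b:
            return str(a)
    return "u"


for bits in product((0, 1), repeat=8):
    pairs = list(zip(bits[0::2], bits[1::2]))  # [(a3,b3), (a2,b2), (a1,b1), (a0,b0)]
    aa, bb = arbiter4(*bits)
    got = str(aa) if aa == bb else "u"          # Table 6 decoding
    assert got == table5(pairs), bits
print("Equations 6 and 7 match Tables 5 and 6 for all 256 inputs")
```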
However, we can take another view of a four-way carry arbiter. If we consider a four-way carry arbiter as a carry generation circuit for a 4-bit addition, then one of the outputs aa and bb can be viewed as the carry out generated with a zero carry-in and the other as the carry out generated with a one carry-in. The direct implementation does not distinguish which output is the carry out generated with a zero carry-in and which is the one generated with a one carry-in. In the modified circuit, the output aa is the carry out generated with a one carry-in and the output bb is the carry out generated with a zero carry-in. This may result in a significant reduction of chip area.
However, the modified implementation requires an input conversion from (0 1) to (1 0). Fortunately, this conversion is straightforward: it consists of one 2-input NAND gate and one 2-input NOR gate per bit. For practical reasons, gates are normally needed anyway to isolate the signals from the main input buses; the difference here is that NAND and NOR gates are used instead of inverters. If the two input buses are designed using a precharge structure, the outputs of the NAND and NOR gates are naturally low (as required in the dynamic implementation) when the buses are precharged high. Furthermore, these NAND and NOR gates can be reused for logic operations in an ALU design.
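The claim that, after conversion, aa is the carry out with a one carry-in and bb is the carry out with a zero carry-in can be checked exhaustively for a 4-bit slice. In this sketch the NAND/NOR conversion is modelled on complemented operand bits, which is an assumption about the bus polarity; by De Morgan's laws, NAND(not a, not b) = a OR b and NOR(not a, not b) = a AND b. The helpers `nand`, `nor`, `convert` and `arbiter4` are hypothetical.

```python
from itertools import product


def nand(x, y):
    return 1 - (x & y)


def nor(x, y):
    return 1 - (x | y)


def convert(a, b):
    """Map (0,1) and (1,0) to (1,0); leave (0,0) and (1,1) unchanged.

    Assumption: the gates see complemented bus values, so
    NAND(~a, ~b) == a | b and NOR(~a, ~b) == a & b.
    """
    na, nb = 1 - a, 1 - b
    return nand(na, nb), nor(na, nb)


def arbiter4(pairs):
    """Four-way arbiter (Equations 6 and 7) on [(a3,b3), (a2,b2), (a1,b1), (a0,b0)]."""
    aa, bb = pairs[-1]                  # start from the lowest-priority pair (a0, b0)
    for a, b in reversed(pairs[:-1]):   # then fold in (a1,b1), (a2,b2), (a3,b3)
        aa = (a & b) | ((a | b) & aa)
        bb = (a & b) | ((a | b) & bb)
    return aa, bb


for x, y in product(range(16), repeat=2):
    bits = [((x >> k) & 1, (y >> k) & 1) for k in (3, 2, 1, 0)]
    aa, bb = arbiter4([convert(a, b) for a, b in bits])
    assert aa == (x + y + 1) >> 4   # carry out of a 4-bit add with carry-in 1
    assert bb == (x + y) >> 4       # carry out of a 4-bit add with carry-in 0
print("after conversion, aa = carry out (cin=1) and bb = carry out (cin=0)")
```

The conversion works because a propagating pair always arrives as (1 0): if every bit propagates, aa inherits the 1 and bb the 0, exactly the carry-in values they represent.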
Consider first a conventional approach to high-speed adder design.
A block diagram of an improved 80-bit adder is shown in FIG. 12. The whole adder is visualized (but not divided) as consisting of five 16-bit groups. The first row is the conversion circuit, which contains 2-input AND and OR gates 1201-1209. The second and third rows are four-way arbiters, of the form discussed previously, that produce the carries within each group. The fourth row, 1211-1219, produces two intermediate sums, one with a zero carry-in and one with a one carry-in. The final row includes multiplexers, which select the final sum result, and three carry arbiters, which generate the boundary carries c16, c32, c48 and c64. The carries of the 16 least significant bits have already been generated after two rows of the carry computation. Compared with the conventional carry-select scheme, the need for group adders has been eliminated: the two intermediate sums are generated directly within the carry generation tree. This may result in a significant reduction of chip area, especially when the groups are long, since group adders also require their own carry computation logic.
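A behavioural model helps to see how the rows of the 80-bit structure fit together. The sketch below is illustrative only: the names `combine`, `group_model` and `add80` are hypothetical, and serial loops stand in for the arbiter trees (the carry values are the same because the arbitration operator is associative, but the loops say nothing about timing). Each group tracks the pair (carry with group carry-in 1, carry with group carry-in 0), builds the two intermediate sums res_1 and res_0, and the boundary carries select between them.

```python
import random

GROUP = 16   # bits per group, as in the 80-bit example
WIDTH = 80   # five 16-bit groups


def combine(hi, lo):
    """Two-way carry arbiter on encoded pairs; `hi` has priority over `lo`."""
    vh, wh = hi
    vl, wl = lo
    return (vh & wh) | ((vh | wh) & vl), (vh & wh) | ((vh | wh) & wl)


def group_model(a, b, base):
    """Rows 1-4 for one group: group carry pair plus intermediate sums.

    Each pair is converted to (a | b, a & b), so a propagating bit becomes
    (1, 0); the running arbiter output then holds (carry for group carry-in 1,
    carry for group carry-in 0) at every bit position.
    """
    res_0 = res_1 = 0
    acc = (1, 0)            # before bit 0: carry-in 1 on the aa rail, 0 on the bb rail
    for k in range(GROUP):
        ak = (a >> (base + k)) & 1
        bk = (b >> (base + k)) & 1
        s = ak ^ bk
        res_1 |= (s ^ acc[0]) << k   # intermediate sum with carry-in 1
        res_0 |= (s ^ acc[1]) << k   # intermediate sum with carry-in 0
        acc = combine((ak | bk, ak & bk), acc)
    return acc, res_0, res_1


def add80(a, b):
    """Behavioural model of the 80-bit carry-select structure (adder carry-in 0)."""
    result, carry = 0, (0, 0)        # resolved carry into the current group
    for g in range(WIDTH // GROUP):
        gpair, res_0, res_1 = group_model(a, b, g * GROUP)
        cin = carry[0]               # c0, then c16, c32, c48, c64
        # Final row: the multiplexer selects this group's sum.
        result |= (res_1 if cin else res_0) << (g * GROUP)
        carry = combine(gpair, carry)  # arbitrate the next boundary carry
    return result


rng = random.Random(0)
for _ in range(1000):
    x, y = rng.getrandbits(WIDTH), rng.getrandbits(WIDTH)
    assert add80(x, y) == (x + y) % (1 << WIDTH)
print("80-bit carry-select model matches integer addition")
```

The group pair returned by `group_model` is never (0 1), because a carry out with carry-in 1 can never be smaller than the carry out with carry-in 0, so the boundary-carry arbitration always resolves.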
It is worth noting that only single-rail signals need to be routed (instead of dual-rail signals) if the signals aa and bb are known to be equal (meaning that the carry has been generated, as either a 1-carry or a 0-carry request).
By exploiting the associativity of the logic equation governing carry generation, the adder illustrated in
To better understand the propagation delays through the adder of
The 5th row has been split into its constituent parts: a 4-way carry evaluator 1307 logic stage and a multiplexer 1311 logic stage. As before, row 4 produces two intermediate sums, res_0 and res_1, with a zero carry-in and a one carry-in, respectively. A node marked “1” is always at logic 1 when the two carry outputs differ, and a node marked “0” is always at logic 0 when the two carry outputs differ. The fifth-row multiplexer 1311 selects the final sum result. The 4-way carry evaluator 1307 may be a carry-select evaluator circuit.
The total propagation delay $T_{pt}$ through all the adder logic stages, rows 1 through 5, is the sum of the individual propagation delays $T_p$ through each of the first three rows plus the longer of the two parallel paths through rows 4 and 5. The propagation time through the 4-way carry evaluator 1307 and the multiplexer 1311 may be considerably longer than the propagation time through the XOR 1309 and the multiplexer 1311. Therefore, the total propagation time can be represented as:
$$
T_{pt} = T_{p,\mathrm{row\,1}} + T_{p,\mathrm{row\,2}} + T_{p,\mathrm{row\,3}} + T_{p,\mathrm{row\,5}}
$$
In essence, the 4th and 5th rows operate in parallel. As the above equation shows, the XOR logic stage 1309 is not in the critical timing path, and the propagation time through this stage is considered a “slack path”. The slack path time $S_{pt}$ is simply the difference between the longest and the shortest propagation paths with regard to the 4th row, and can be expressed as:
$$
S_{pt} = \bigl(T_{p,1307} + T_{p,1311}\bigr) - \bigl(T_{p,1309} + T_{p,1311}\bigr)
$$
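Because the multiplexer delay $T_{p,1311}$ appears in both paths, the slack reduces to the difference between the carry evaluator and XOR delays. The same comparison, derived here from the case analysis in the text, gives the condition under which a stage of delay $T_{p,1301}$ placed in the XOR path does not lengthen the critical path:

$$
\begin{aligned}
S_{pt} &= T_{p,1307} - T_{p,1309},\\
T_{p,1301} + T_{p,1309} + T_{p,1311} \le T_{p,1307} + T_{p,1311}
  \quad&\Longleftrightarrow\quad T_{p,1301} \le S_{pt}.
\end{aligned}
$$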
In
$$
T_{pt} = T_{p,\mathrm{row\,2}} + T_{p,\mathrm{row\,3}} + T_{p,\mathrm{row\,5}}
$$
In this case, the propagation delay through the adder may be reduced by the propagation time through row 1 ($T_{p,1301}$) relative to the adder of FIG. 13.
For the second case, where the propagation time through the path $T_{p,1301} + T_{p,1309} + T_{p,1311}$ is greater than that through the path $T_{p,1307} + T_{p,1311}$, the total propagation delay $T_{pt}$ for the adder in
$$
T_{pt} = T_{p,\mathrm{row\,2}} + T_{p,\mathrm{row\,3}} + \underbrace{\bigl(T_{p,1301} + T_{p,1309}\bigr)}_{\text{row 4}} + \underbrace{T_{p,1311}}_{\text{row 5}}
$$
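A toy timing model makes the two arrangements easy to compare. The sketch below encodes the $T_{pt}$ expressions from the text, with the maximum over the two parallel row-4/row-5 paths written out so that both cases are covered by one formula. The function names, dictionary keys and all delay values are hypothetical, unit-less numbers chosen only for illustration, not measurements of any circuit.

```python
def tpt_fig13(t: dict[str, float]) -> float:
    """Arrangement of FIG. 13: the conversion row 1301 precedes the carry tree."""
    return t["1301"] + t["row2"] + t["row3"] + max(t["1307"], t["1309"]) + t["1311"]


def tpt_slack(t: dict[str, float]) -> float:
    """Conversion moved into the XOR (row 4) path; covers both cases in the text."""
    return t["row2"] + t["row3"] + max(t["1307"], t["1301"] + t["1309"]) + t["1311"]


def slack(t: dict[str, float]) -> float:
    """S_pt = T_p,1307 - T_p,1309 (the multiplexer delay cancels)."""
    return t["1307"] - t["1309"]


# Hypothetical, unit-less delays chosen only for illustration (first case:
# the conversion stage fits inside the slack).
delays = {"1301": 1.0, "row2": 2.0, "row3": 2.0, "1307": 2.5, "1309": 1.0, "1311": 1.0}
print(slack(delays), tpt_fig13(delays), tpt_slack(delays))  # prints: 1.5 8.5 7.5

# Second case: a slower conversion stage no longer fits in the slack.
slower = dict(delays, **{"1301": 2.0})
print(slack(slower), tpt_fig13(slower), tpt_slack(slower))  # prints: 1.5 9.5 8.0
```

Because delays are non-negative, max(T1307, T1301 + T1309) never exceeds T1301 + max(T1307, T1309), so in this model the moved arrangement is never slower than the FIG. 13 arrangement, whichever case applies.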
So in either the first or the second case, the adder of
As described in association with
Current process technologies may allow CMOS circuits operating at over 1 GHz to accommodate about ten gates within a pipeline stage. This indicates that the adder of
In some embodiments of the present invention as shown in
Although illustrative embodiments of the invention have been described in detail herein with reference to the accompanying drawings, it is to be understood that the invention is not limited to those precise embodiments, and that various changes and modifications can be effected therein by one skilled in the art without departing from the scope and spirit of the invention as defined by the appended claims.
The present invention has been described with respect to a limited number of embodiments; those skilled in the art will appreciate numerous modifications and variations therefrom. It is intended that the appended claims cover all such modifications and variations as fall within the true spirit and scope of this present invention.
Executed on | Assignor | Assignee | Conveyance | Reel/Frame |
Sep 28 2001 | Intel Corporation | (assignment on the face of the patent) | | |
Sep 28 2001 | LIU, JIANWEI | Intel Corporation | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 012218/0734 |
Nov 08 2006 | Intel Corporation | MARVELL INTERNATIONAL LTD | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 018515/0817 |
Date | Maintenance Fee Events |
Apr 13 2009 | M1551: Payment of Maintenance Fee, 4th Year, Large Entity. |
Apr 11 2013 | M1552: Payment of Maintenance Fee, 8th Year, Large Entity. |
May 19 2017 | REM: Maintenance Fee Reminder Mailed. |
Nov 06 2017 | EXP: Patent Expired for Failure to Pay Maintenance Fees. |
Date | Maintenance Schedule |
Oct 11 2008 | 4 years fee payment window open |
Apr 11 2009 | 6 months grace period start (w surcharge) |
Oct 11 2009 | patent expiry (for year 4) |
Oct 11 2011 | 2 years to revive unintentionally abandoned end. (for year 4) |
Oct 11 2012 | 8 years fee payment window open |
Apr 11 2013 | 6 months grace period start (w surcharge) |
Oct 11 2013 | patent expiry (for year 8) |
Oct 11 2015 | 2 years to revive unintentionally abandoned end. (for year 8) |
Oct 11 2016 | 12 years fee payment window open |
Apr 11 2017 | 6 months grace period start (w surcharge) |
Oct 11 2017 | patent expiry (for year 12) |
Oct 11 2019 | 2 years to revive unintentionally abandoned end. (for year 12) |