A reciprocal square root for a radix of r is calculated, where S[j] represents the partial result obtained after j iterations of calculation, W[j] a residual, and P[j] the product of an operand X and S[j]. First, appropriate values are set as the initial values S[0], W[0], and P[0]. Second, n iterations of calculation are performed for j=0 to n−1. Each iteration includes selecting a reciprocal square root digit $q_{j+1}$ from the digit set {−a, . . . , −1, 0, 1, . . . , a}, and calculating the recurrence equation of S[j], i.e., $S[j+1] := S[j] + q_{j+1} r^{-j-1}$, the recurrence equation of W[j], i.e., $W[j+1] := rW[j] - (2P[j] + X q_{j+1} r^{-j-1}) q_{j+1}$, and the recurrence equation of P[j], i.e., $P[j+1] := P[j] + X q_{j+1} r^{-j-1}$.
1. A reciprocal square rooting circuit for computing a reciprocal square root S[n] for a radix of r by using a partial result S[j], a residual W[j], and a product P[j] of an operand X and the partial result S[j] after j iterations of calculation, comprising:
as components of a circuit for performing one iteration of n iterations of calculation for j=0 to n−1 by using predetermined recurrence equations,
a digit selection circuit which determines a reciprocal square root digit $q_{j+1}$ from values of the several most significant bits of rW[j] and P[j];
a first 1-digit multiplier which generates $X q_{j+1}$ by multiplying the operand X by an output from the digit selection circuit;
a first carry-save adder which adds 2P[j] and an output from the first 1-digit multiplier;
a second 1-digit multiplier which multiplies an output from the first carry-save adder by an output from the digit selection circuit;
a second carry-save adder which calculates W[j+1] by adding rW[j] and an output from the second 1-digit multiplier;
a third carry-save adder which adds P[j] and the output from the first 1-digit multiplier; and
a converter which converts S[j] into S[j+1] in accordance with the output from the digit selection circuit.
2. A circuit according to
This application is based upon and claims the benefit of priority from the prior Japanese Patent Application No. 2001-165098, filed May 31, 2001, the entire contents of which are incorporated herein by reference.
1. Field of the Invention
The present invention relates to a method of computing a reciprocal square root, a reciprocal square rooting circuit, and a program for causing a computer to compute a reciprocal square root.
2. Description of the Related Art
Reciprocal square rooting circuits are disclosed in Jpn. Pat. Appln. KOKAI Publication Nos. 03-138725 and 09-319561. Jpn. Pat. Appln. KOKAI Publication No. 03-138725 discloses a multiplication type reciprocal square rooting circuit using Newton's method. Jpn. Pat. Appln. KOKAI Publication No. 09-319561 discloses a reciprocal square rooting circuit which has subtraction shift type dividers and square root extractors connected in series with each other and is designed to perform square root extraction in parallel with division.
The circuit disclosed in Jpn. Pat. Appln. KOKAI Publication No. 03-138725 uses multipliers and an initial value memory, and hence an increase in circuit size is inevitable.
According to Jpn. Pat. Appln. KOKAI Publication No. 09-319561, square root extraction is performed in one step per two cycles while division is performed, and square root extraction is performed in one step per cycle after division, resulting in a high latency.
It is, therefore, an object of the present invention to provide a method of computing a reciprocal square root, a reciprocal square rooting circuit, and a program, which realize a smaller amount of hardware than a conventional multiplication type reciprocal square rooting circuit and also realize faster operation than a conventional reciprocal square rooting circuit using dividers and square root extractors.
In order to achieve the above object, according to the first aspect of the present invention, there is provided a method of calculating a reciprocal square root by using a digit selection circuit, 1-digit multipliers, adders, and a converter when a radix is r in a computer system, comprising the steps of:
According to the second aspect of the present invention, there is provided a method of calculating a reciprocal square root by using a digit selection circuit, 1-digit multipliers, adders, and a converter when a radix is 2 in a computer system, comprising the steps of:
According to the third aspect of the present invention, there is provided a method of calculating a reciprocal square root by using a digit selection circuit, 1-digit multipliers, adders, and a converter when a radix is 4 in a computer system, comprising the steps of:
According to the fourth aspect of the present invention, there is provided a reciprocal square rooting circuit for computing a reciprocal square root for a radix of r by using a partial result S[j], a residual W[j], and a product P[j] of an operand X and the partial result S[j] after j iterations of calculation, comprising:
According to the fifth aspect of the present invention, the reciprocal square rooting circuit according to the fourth aspect of the present invention further comprises a register to store a value of S[j], a register to store a value of W[j], a register to store a value of P[j], and a register to store X.
According to the sixth aspect of the present invention, there is provided a program which causes a computer to implement a function of computing a reciprocal square root for a radix of r by using a partial result S[j] obtained after j iterations of calculation, a residual W[j], and a product P[j] of an operand X and the partial result S[j], the function including
According to the seventh aspect of the present invention, there is provided a program which causes a computer to implement a function of computing a reciprocal square root for a radix of 2 by using a partial result S[j] obtained after j iterations of calculation, a residual W[j], and a product P[j] of an operand X (¼<X<1) and the partial result S[j], the function including
According to the eighth aspect of the present invention, there is provided a program which causes a computer to implement a function of computing a reciprocal square root for a radix of 4 by using a partial result S[j] obtained after j iterations of calculation, a residual W[j], and a product P[j] of an operand X (¼<X<1) and the partial result S[j], the function including
Additional objects and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objects and advantages of the invention may be realized and obtained by means of the instrumentalities and combinations particularly pointed out hereinafter.
The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate presently preferred embodiments of the invention, and together with the general description given above and the detailed description of the preferred embodiments given below, serve to explain the principles of the invention.
1. A hardware algorithm for computing a reciprocal square root with a radix r will be described first. Consider the computation of a mantissa part in computing the reciprocal square root of a floating point number. With respect to an operand X defined by ¼<X<1, $S = X^{-1/2}$ is obtained as its reciprocal square root. In this case, $1 < X^{-1/2} < 2$. Assume that X represents an n-digit r-ary fraction, and that r is a power of 2, i.e., $r = 2^{b}$.
$-r^{-n} < X^{-1/2} - S < r^{-n}$ (1)
S that satisfies inequality (1) is calculated down to the nth place as an r-ary number.
In this case, as in the calculation of a square root or the like, a reciprocal square root digit $q_j$ is obtained digit by digit from the most significant one. Letting S[j] be the partial result after j iterations, then
$S[j] = S[0] + \sum_{i=1}^{j} q_i r^{-i}$
where S[0] is the initial value of the partial result.
The recurrence equation of the partial result is
$S[j+1] := S[j] + q_{j+1} r^{-j-1}$ (2)
A reciprocal square root digit $q_{j+1}$ is selected from a redundant digit set {−a, . . . , −1, 0, 1, . . . , a} where r/2≦a<r. The final result is
$S = S[n] = S[0] + \sum_{i=1}^{n} q_i r^{-i}$
The result is computed with n-digit precision.
That is, S that satisfies inequality (1) is obtained.
A residual (or scaled partial remainder) W[j] is defined as
$W[j] = r^{j}(1 - X \cdot S[j]^{2})$ (3)
Substituting j+1 for j in equation (3) yields $W[j+1] = r^{j+1}(1 - X \cdot S[j+1]^{2})$. From equations (2) and (3), the recurrence equation of the residual is obtained as follows:
$W[j+1] := rW[j] - 2X \cdot S[j] q_{j+1} - X q_{j+1}^{2} r^{-j-1}$ (4)
Since this equation includes the term $-2X \cdot S[j] q_{j+1}$, multiplication of the n-digit number X by the j-digit number S[j] is required for the calculation. To avoid this multiplication, X·S[j] is stored and updated by shifts, additions/subtractions, and 1-digit multiplication.
X·S[j] is defined as P[j]. The recurrence equation of W[j] is then rewritten as
$W[j+1] := rW[j] - (2P[j] + X q_{j+1} r^{-j-1}) q_{j+1}$ (5)
In addition, the recurrence equation of P[j] is
$P[j+1] := P[j] + X q_{j+1} r^{-j-1}$ (6)
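The step from the definition of the residual to the multiplication-free recurrences can be written out term by term. The following is a worked expansion under the definitions already given (partial result recurrence, residual definition, and P[j]=X·S[j]); it introduces no new quantities.

```latex
\begin{align*}
W[j+1] &= r^{j+1}\bigl(1 - X\,(S[j] + q_{j+1} r^{-j-1})^{2}\bigr) \\
       &= r \cdot r^{j}\bigl(1 - X\,S[j]^{2}\bigr)
          - r^{j+1} X \bigl(2\,S[j]\,q_{j+1} r^{-j-1} + q_{j+1}^{2} r^{-2j-2}\bigr) \\
       &= r\,W[j] - 2X\,S[j]\,q_{j+1} - X q_{j+1}^{2} r^{-j-1} \\
       &= r\,W[j] - \bigl(2P[j] + X q_{j+1} r^{-j-1}\bigr)\,q_{j+1},
          \qquad P[j] = X\,S[j], \\[4pt]
P[j+1] &= X\,S[j+1] = X\,S[j] + X q_{j+1} r^{-j-1} = P[j] + X q_{j+1} r^{-j-1}.
\end{align*}
```

The only products left are by the single digit $q_{j+1}$; the factor $r^{-j-1}$ is a shift.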
A method of selecting $q_{j+1}$ will be described later.
From inequality (1), $(S - r^{-n})^{2} < X^{-1} < (S + r^{-n})^{2}$ must hold.
Since the minimum and maximum reciprocal square root digit values are −a and a, the above inequality is rewritten for the partial result as $(S[j] - r^{-j}\rho)^{2} < X^{-1} < (S[j] + r^{-j}\rho)^{2}$, where ρ=a/(r−1) is the redundancy factor of the reciprocal square root digit set. Therefore, according to equation (3), the following inequality is obtained as the condition that W[j] should satisfy:
$-2X \cdot S[j]\rho + X\rho^{2} r^{-j} < W[j] < 2X \cdot S[j]\rho + X\rho^{2} r^{-j}$ (7)
At the beginning of the computation, inequality (8) must be satisfied for j=0:
$-2X \cdot S[0]\rho + X\rho^{2} < W[0] = 1 - X \cdot S[0]^{2} < 2X \cdot S[0]\rho + X\rho^{2}$ (8)
Since $2^{-2} < X < 1$, inequality (8) can be satisfied by, for example, letting S[0]=3/2 and W[0]=1−9X/4. In this case, P[0]=3X/2. When ρ=1, S[0]=1 and W[0]=1−X can also be set.
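The initial-value condition can be checked numerically. The following is a minimal sketch, not part of the described circuit: it evaluates the bounds of inequality (8) with exact rational arithmetic for a grid of sample operands. The function name `initial_values_ok` and the sample grid are assumptions made for illustration.

```python
from fractions import Fraction


def initial_values_ok(x, s0, rho):
    """Return True when W[0] = 1 - X*S[0]^2 lies strictly inside the bounds of (8)."""
    x, s0, rho = Fraction(x), Fraction(s0), Fraction(rho)
    w0 = 1 - x * s0 * s0
    lower = -2 * x * s0 * rho + x * rho * rho
    upper = 2 * x * s0 * rho + x * rho * rho
    return lower < w0 < upper


# Sample operands strictly inside (1/4, 1); the grid spacing is arbitrary.
samples = [Fraction(k, 64) for k in range(17, 64)]

# S[0] = 3/2 with rho = 1 (e.g. radix 2, digit set {-1, 0, 1}).
print(all(initial_values_ok(x, Fraction(3, 2), 1) for x in samples))
# S[0] = 1 with rho = 1, the alternative mentioned in the text.
print(all(initial_values_ok(x, 1, 1) for x in samples))
```

With a smaller redundancy factor the choice S[0]=1 no longer works for every operand; for instance `initial_values_ok(Fraction(3, 10), 1, Fraction(2, 3))` is false, which is consistent with initial values being chosen according to the range of X when ρ<1.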
The algorithm for computing the reciprocal square root is comprised of n iterations of calculation of the recurrence equations (2), (5), and (6) for the radix of r. The algorithm can be summarized as follows:
Algorithm[RSQRT]
Step 1:
$X^{-1/2}$ is obtained as S[n]. Since S[n] is in the r-ary signed-digit representation (r-ary SD representation), it must be converted into an ordinary binary representation. This conversion may be performed at the end of the computation or may be performed concurrently with the computation by the on-the-fly conversion used in existing dividers and the like. In addition, $X^{1/2}$ is obtained as P[n].
The speed of the computation can be greatly increased with a small increase in hardware by performing additions/subtractions appearing in the recurrence equations by the use of a redundant representation without carry propagation. Consider therefore implementation using the redundant representation. That is, the residual W[j], partial result S[j], and product P[j] of the operand X and S[j] are expressed in a carry-save form or binary SD representation, and additions/subtractions appearing in these recurrence equations are performed without carry propagation.
The selection of a reciprocal square root digit will be described below.
The reciprocal square root digit $q_{j+1}$ is selected from the redundant digit set {−a, . . . , −1, 0, 1, . . . , a} so that W[j+1] satisfies
$-2X \cdot S[j+1]\rho + X\rho^{2} r^{-j-1} < W[j+1] < 2X \cdot S[j+1]\rho + X\rho^{2} r^{-j-1}$ (9)
Note that $q_{j+1}$ depends on rW[j], X, and S[j].
Letting $(L_k[j], U_k[j])$ be the range of rW[j] in which k (=−a, −a+1, . . . , a) can be selected as $q_{j+1}$, then
$L_k[j] = 2X \cdot S[j](k - \rho) + X(k - \rho)^{2} r^{-j-1}$ (10)
$U_k[j] = 2X \cdot S[j](k + \rho) + X(k + \rho)^{2} r^{-j-1}$ (11)
Note that the lower bound of the range for k=−a and the upper bound of the range for k=a are equal to the lower and upper bounds of rW[j], respectively.
The range of rW[j] in which k−1 is selected as $q_{j+1}$ must be continuous with the range in which k is selected. Hence, $U_{k-1}[j] > L_k[j]$ needs to be satisfied. That is,
$(2\rho - 1)(2X \cdot S[j] + X(2k - 1) r^{-j-1}) > 0$ (12)
needs to be satisfied. This inequality is always satisfied. The left-hand side of inequality (12) represents the overlap between adjacent selection ranges. The digit selection function can be simplified by using this overlap. More specifically, although $q_{j+1}$ depends on rW[j], X, and S[j], $q_{j+1}$ can be determined from estimates (the several most significant bits) of rW[j], X, and S[j].
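The overlap expression follows directly from the definitions of the selection-range bounds. A short worked expansion, using only the symbols defined above:

```latex
\begin{align*}
U_{k-1}[j] - L_k[j]
  &= 2X\,S[j]\bigl((k-1+\rho) - (k-\rho)\bigr)
     + X r^{-j-1}\bigl((k-1+\rho)^{2} - (k-\rho)^{2}\bigr) \\
  &= 2X\,S[j]\,(2\rho-1) + X r^{-j-1}\,(2\rho-1)(2k-1) \\
  &= (2\rho-1)\bigl(2X\,S[j] + X(2k-1)\,r^{-j-1}\bigr).
\end{align*}
```

The second line uses the difference of squares $(k-1+\rho)^{2} - (k-\rho)^{2} = (2\rho-1)(2k-1)$.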
Since P[j]=X·S[j], equations (10) and (11) can be rewritten as
$L_k[j] = 2P[j](k - \rho) + X(k - \rho)^{2} r^{-j-1}$ (13)
$U_k[j] = 2P[j](k + \rho) + X(k + \rho)^{2} r^{-j-1}$ (14)
Therefore, $q_{j+1}$ can be determined from the estimates of rW[j], X, and P[j]. Since the second terms of the right-hand sides of (13) and (14) rapidly decrease as j increases, the digit selection function can be made independent of X except for a few, if any, small j's.
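The recurrences and the selection ranges can be exercised together in software. The following is a minimal sketch under stated assumptions: it uses exact rational arithmetic instead of carry-save estimates, it picks the digit by testing rW[j] directly against the selection-range bounds (13) and (14) rather than against truncated thresholds, and it starts from S[0]=3/2. The function names are hypothetical and do not come from the specification.

```python
from fractions import Fraction


def select_digit(rw, p, x, r, a, j):
    """Return a digit k with L_k[j] < rW[j] < U_k[j], using bounds (13) and (14)."""
    rho = Fraction(a, r - 1)                 # redundancy factor
    scale = Fraction(1, r ** (j + 1))        # r^(-j-1)
    for k in sorted(range(-a, a + 1), key=abs):
        lower = 2 * p * (k - rho) + x * (k - rho) ** 2 * scale
        upper = 2 * p * (k + rho) + x * (k + rho) ** 2 * scale
        if lower < rw < upper:
            return k
    raise ValueError("rW[j] is outside every selection range")


def rsqrt_exact_selection(x, n, r=4, a=2):
    """Run n iterations of recurrences (2), (5), (6); return (S[n], W[n], P[n])."""
    x = Fraction(x)
    s = Fraction(3, 2)                       # S[0]
    w = 1 - x * s * s                        # W[0] = 1 - X*S[0]^2
    p = x * s                                # P[0] = X*S[0]
    for j in range(n):
        q = select_digit(r * w, p, x, r, a, j)
        xq = x * q * Fraction(1, r ** (j + 1))   # X * q * r^(-j-1)
        w = r * w - (2 * p + xq) * q             # recurrence (5)
        p = p + xq                               # recurrence (6)
        s = s + Fraction(q, r ** (j + 1))        # recurrence (2)
    return s, w, p


s, w, p = rsqrt_exact_selection(Fraction(1, 2), 10)
print(float(s), float(p))   # approximations of X^(-1/2) and X^(1/2)
```

Because adjacent ranges overlap, more than one digit may be admissible for a given rW[j]; the sketch simply takes the admissible digit of smallest magnitude. A hardware implementation would instead compare truncated estimates against thresholds, as described next.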
Let $rW[j]_e$ and $P[j]_e$ be the estimates of rW[j] and P[j]. Assume that $rW[j]_e$ and $P[j]_e$ are obtained by truncating rW[j] and P[j] to t and d fractional bits, respectively (note that these are not r-ary digits but binary bits). The digit selection function is expressed by a set of threshold values:
$\{\, m_k(P[j]_e) \mid k \in \{-a+1, \ldots, -1, 0, 1, \ldots, a\} \,\}$
In this case, if $m_k(P[j]_e) \le rW[j]_e < m_{k+1}(P[j]_e)$, then k is selected as $q_{j+1}$.
If W[j] is expressed in the carry-save form and the value obtained by truncating rW[j] to t fractional bits is used as $rW[j]_e$ without any change, then $rW[j]_e \le rW[j] < rW[j]_e + 2^{-t+1}$. Therefore, $m_k(P[j]_e) > \max_{P[j]_e}(L_k[j])$ and $m_k(P[j]_e) - 2^{-t} + 2^{-t+1} \le \min_{P[j]_e}(U_{k-1}[j])$ must be established. That is,
$\max_{P[j]_e}(L_k[j]) < m_k(P[j]_e) \le \min_{P[j]_e}(U_{k-1}[j]) - 2^{-t}$ (15)
must be satisfied. In this case, $\max_{P[j]_e}(L_k[j])$ represents the lower bound of the range of rW[j] in which k can be selected as $q_{j+1}$ when the estimate of P[j] is $P[j]_e$, and $\min_{P[j]_e}(U_{k-1}[j])$ represents the upper bound of the range of rW[j] in which k−1 can be selected as $q_{j+1}$. $m_k(P[j]_e)$ must be a multiple of $2^{-t}$ that satisfies inequality (15). Note that the maximum value of $rW[j]_e$ for which k−1 is selected as $q_{j+1}$ is $m_k(P[j]_e) - 2^{-t}$. The necessary condition for the minimum overlap required for a feasible digit selection is
$\min_{P[j]_e}(U_{k-1}[j]) - \max_{P[j]_e}(L_k[j]) > 2^{-t}$ (16)
When P[j] is expressed in the carry-save form and the value obtained by truncating P[j] to d fractional bits is used as $P[j]_e$, $P[j]_e \le P[j] < P[j]_e + 2^{-d+1}$. According to equations (13) and (14), therefore, for k>0,
$\max_{P[j]_e}(L_k[j]) < 2(P[j]_e + 2^{-d+1})(k - \rho) + X(k - \rho)^{2} r^{-j-1}$,
$\min_{P[j]_e}(U_{k-1}[j]) = 2P[j]_e(k - 1 + \rho) + X(k - 1 + \rho)^{2} r^{-j-1}$ (17)
For k≦0,
$\max_{P[j]_e}(L_k[j]) = 2P[j]_e(k - \rho) + X(k - \rho)^{2} r^{-j-1}$,
$\min_{P[j]_e}(U_{k-1}[j]) > 2(P[j]_e + 2^{-d+1})(k - 1 + \rho) + X(k - 1 + \rho)^{2} r^{-j-1}$ (18)
A digit selection function may be determined from these values. Since they depend on j, a different digit selection function may be determined for different j. In practice, a common selection function can be obtained except for a few, if any, small j's.
Various different specific algorithms can be designed on the basis of this algorithm by setting the radix r, the redundancy factor ρ of the reciprocal square root digit set, the type of representation of the residual W[j] and the product P[j] of operand X and partial result S[j], (carry-save form or binary SD representation), a digit selection function, and the like.
Consider circuit implementation in general. A reciprocal square rooting circuit based on the above algorithm can be implemented as a combinational circuit or a sequential circuit. Pipelining can also be used.
First, consider a circuit for performing one iteration of Step 2. The block diagram of
Referring to
A 1-digit multiplier 2 (15) is a multiplier with a 1-digit multiplier factor, which multiplies the output from the carry-save adder W1 (14) by the output $q_{j+1}$ from the digit selection circuit 11. Carry-save adders W2 (12) are several carry-save adders for calculating W[j+1] by adding rW[j] and the output from the 1-digit multiplier 2 (15). Carry-save adders P (16) are several carry-save adders for calculating P[j+1] by adding P[j] and the output from the 1-digit multiplier 1 (13).
An On-the-fly converter (10) is a converter for calculating S[j+1]= and S[j+1]− from S[j]= and S[j]−, which is mainly comprised of selectors.
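On-the-fly conversion of a signed-digit result can be sketched in software. The sketch below assumes the standard scheme used in dividers: two conventional words are maintained, one holding S[j] and one holding S[j] − r^(−j), and each new digit is appended to one of them by selection, so that no carry propagates. The mapping of these two words onto the two forms named in the text, the function name `on_the_fly`, and the integer scaling by r^j are assumptions made for illustration.

```python
def on_the_fly(digits, r, s0=1):
    """Convert signed digits q_1..q_n to conventional form without carry propagation.

    Returns (A, B) with A = S[n] * r**n and B = A - 1, where
    S[n] = s0 + sum(q_i * r**-i). Each step only selects one of the two
    current words and appends a single non-negative digit in 0..r-1.
    """
    a_word = s0          # S[j] scaled by r**j
    b_word = s0 - 1      # (S[j] - r**-j) scaled by r**j
    for q in digits:
        if q >= 0:
            new_a = a_word * r + q              # append digit q to A
        else:
            new_a = b_word * r + (r + q)        # append digit r-|q| to B
        if q >= 1:
            new_b = a_word * r + (q - 1)        # append digit q-1 to A
        else:
            new_b = b_word * r + (r + q - 1)    # append digit r-1-|q| to B
        a_word, b_word = new_a, new_b
    return a_word, b_word


# Radix-4 example with digits from {-2, ..., 2}.
a_word, b_word = on_the_fly([0, -1, -1, 2, 1, -2], r=4)
print(a_word, b_word)
```

In hardware, "multiply by r and add a digit" is a concatenation, which is why such a converter consists mainly of selectors.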
The 1-digit multiplier 1 (13), carry-save adders W1 (14), 1-digit multiplier 2 (15), and carry-save adders W2 (12) constitute a W updating circuit 21.
When a reciprocal square rooting circuit is to be implemented as a sequential circuit which performs one iteration of Step 2 in each clock cycle, it is comprised of a combinational circuit part and registers. The combinational circuit part is the circuit shown in
Since W[j] and P[j] are in the carry-save form, two registers are required for each of them. In order to avoid variable (j+1-digit) shift of X, $X r^{-j}$ is stored in the register REG-X and shifted to the right by 1 digit in each clock cycle. This sequential circuit computes an n-digit reciprocal square root in n+1 clock cycles. The clock cycle time is a constant independent of n. The amount of hardware is proportional to n. It has a regular circuit structure with a digit-slice feature suitable for VLSI implementation.
Obviously, the above circuit can also be implemented as a sequential circuit for performing more than one iteration of Step 2 per clock cycle.
A reciprocal square rooting circuit can be implemented as a combinational circuit by series-connecting a simple circuit for performing Step 1 to n copies of the circuit for one iteration of Step 2 described above. Shifts are implemented by wiring. The delay (the number of logic gates) of the circuit is proportional to n. The amount of hardware is proportional to $n^{2}$. This circuit has a regular 2-dimensional cellular array structure suitable for VLSI implementation.
2. A hardware algorithm for computing a reciprocal square root for the radix of 2 will be described next.
As a specific example of the algorithm described in “1.”, consider a case wherein the radix r is 2, the reciprocal square root digit set is {−1, 0, 1}, i.e., the redundancy factor ρ is 1, and the residual W[j] and product P[j] are expressed in the carry-save form.
When the radix is 2, the recurrence equations are given as
$S[j+1] := S[j] + q_{j+1} 2^{-j-1}$,
$W[j+1] := 2W[j] - (2P[j] + X q_{j+1} 2^{-j-1}) q_{j+1}$,
$P[j+1] := P[j] + X q_{j+1} 2^{-j-1}$
Since ρ=1, S[0]=1, W[0]=1−X, and P[0]=X can be set for all X as initial values for j=0. In this case, 0<W[0]<¾.
To determine a digit selection function, $L_k$ and $U_k$ are obtained first from equations (13) and (14):
$U_{-1}[j] = 0$
$L_0[j] = -2P[j] + 2^{-j-1}X$
$U_0[j] = 2P[j] + 2^{-j-1}X$
$L_1[j] = 0$
Since X>¼, P[j]>¼, and j≧0,
$\max(L_0[j]) < -3/8$
$\min(U_0[j]) > 1/2$
These bounds are independent of P[j], X, and j. According to inequality (15), therefore, $-3/8 < m_0 \le -2^{-t}$ and $0 < m_1 \le 1/2 - 2^{-t}$ must be satisfied. By setting t=2, $m_0 = -1/4$ and $m_1 = 1/4$ can be obtained, which are independent of j.
This radix-2 version of the algorithm is summarized as follows. Since −2<W[j]<3, W[j] can be expressed in a two's complement carry-save form with a 3-bit integer part (including the sign bit). Therefore, $q_{j+1}$ can be determined from the most significant 6 bits (6 digits in the carry-save form) of 2W[j].
Algorithm [RSQRT_R2]
Step 1:
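The radix-2 recurrences, the initial values S[0]=1, W[0]=1−X, P[0]=X, and the constant thresholds −1/4 and 1/4 derived above can be sketched in software as follows. This is an illustrative model under stated assumptions, not the circuit itself: W[j] and P[j] are kept as exact rationals instead of carry-save words, so the estimate of 2W[j] is a plain truncation to t=2 fractional bits (its error is smaller than the carry-save error the thresholds were derived for). The function name `rsqrt_radix2` is hypothetical.

```python
from fractions import Fraction
from math import floor


def rsqrt_radix2(x, n):
    """Radix-2 reciprocal square root with digit set {-1, 0, 1}.

    Returns (S[n], W[n], P[n]) for an operand 1/4 < X < 1.
    """
    x = Fraction(x)
    if not Fraction(1, 4) < x < 1:
        raise ValueError("operand must satisfy 1/4 < X < 1")
    s, w, p = Fraction(1), 1 - x, x          # S[0], W[0], P[0]
    for j in range(n):
        # Estimate of 2W[j]: truncate (toward minus infinity) to 2 fractional bits.
        est = Fraction(floor(2 * w * 4), 4)
        if est >= Fraction(1, 4):            # est >= m_1
            q = 1
        elif est >= Fraction(-1, 4):         # m_0 <= est < m_1
            q = 0
        else:                                # est < m_0
            q = -1
        xq = x * q * Fraction(1, 2 ** (j + 1))   # X * q * 2^(-j-1)
        w = 2 * w - (2 * p + xq) * q
        p = p + xq
        s = s + Fraction(q, 2 ** (j + 1))
    return s, w, p


s, w, p = rsqrt_radix2(Fraction(1, 2), 24)
print(float(s), float(p))   # approximations of 2^(1/2) and 2^(-1/2)
```

Since the selection depends only on a few leading bits of 2W[j], the comparison chain corresponds to the small adder plus constant comparator described for the digit selection circuit.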
Consider next implementation of the radix-2 version as a sequential circuit which performs one iteration of Step 2 in each clock cycle. The block diagram of
As in the arrangement shown in
In addition to the arrangement in
The digit selection circuit 36 is constituted by a 6-bit carry-propagate adder and a simple constant comparator. This arrangement also requires a buffer for driving $q_{j+1}$. The on-the-fly converter 32 is mainly formed from two 2-to-1 selectors. The 1-digit multipliers 1 (38) and 2 (40) are multiplexers each serving to output 0, the data input itself, or the bit-inverted data of the data input.
The carry-save adders W1 (39) and P (45) are carry-save adders, and the carry-save adder W2 (41) is a 4-2 adder. Taking the truncation errors into consideration, W[j] and P[j] should be calculated with $2^{-n-c}$ precision. In this case,
$c \approx \log_2 n$
3. A radix-4 version of the algorithm for computing a reciprocal square root will be described next.
Consider a case wherein the radix r is 4, the reciprocal square root digit set is {−2, −1, 0, 1, 2}, i.e., the redundancy factor ρ is ⅔, and the residual W[j] and product P[j] are expressed in the carry-save form.
When the radix is 4, the recurrence equations are
$S[j+1] := S[j] + q_{j+1} 4^{-j-1}$,
$W[j+1] := 4W[j] - (2P[j] + X q_{j+1} 4^{-j-1}) q_{j+1}$,
$P[j+1] := P[j] + X q_{j+1} 4^{-j-1}$
The initial values for j=0 are as follows. When X<⅜, S[0]=2, W[0]=1−4X, and P[0]=2X. When ⅜≦X<¾, S[0]=3/2, W[0]=1−9X/4, and P[0]=3X/2. When X≧¾, S[0]=1, W[0]=1−X, and P[0]=X. Although S[0]=3/2, W[0]=1−9X/4, and P[0]=3X/2 could be set for all X, the three-case initial values above are used because they allow the same digit selection function as that for j≧1 to be used for j=0.
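To determine a digit selection function, $\max_{P[j]_e}(L_k[j])$ and $\min_{P[j]_e}(U_k[j])$ are obtained from expressions (17) and (18).
$\min_{P[j]_e}(U_{-2}[j]) > -\tfrac{8}{3}(P[j]_e + 2^{-d+1}) + \tfrac{4}{9}X \cdot 4^{-j}$
$\max_{P[j]_e}(L_{-1}[j]) = -\tfrac{10}{3}P[j]_e + \tfrac{25}{36}X \cdot 4^{-j}$
$\min_{P[j]_e}(U_{-1}[j]) > -\tfrac{2}{3}(P[j]_e + 2^{-d+1}) + \tfrac{1}{36}X \cdot 4^{-j}$
$\max_{P[j]_e}(L_0[j]) = -\tfrac{4}{3}P[j]_e + \tfrac{1}{9}X \cdot 4^{-j}$
$\min_{P[j]_e}(U_0[j]) = \tfrac{4}{3}P[j]_e + \tfrac{1}{9}X \cdot 4^{-j}$
$\max_{P[j]_e}(L_1[j]) < \tfrac{2}{3}(P[j]_e + 2^{-d+1}) + \tfrac{1}{36}X \cdot 4^{-j}$
$\min_{P[j]_e}(U_1[j]) = \tfrac{10}{3}P[j]_e + \tfrac{25}{36}X \cdot 4^{-j}$
$\max_{P[j]_e}(L_2[j]) < \tfrac{8}{3}(P[j]_e + 2^{-d+1}) + \tfrac{4}{9}X \cdot 4^{-j}$
When j≧1, according to inequality (15), the following must be satisfied:
$-\tfrac{10}{3}P[j]_e + \tfrac{25}{144}X < m_{-1}(P[j]_e) \le -\tfrac{8}{3}P[j]_e - \tfrac{16}{3} \cdot 2^{-d} - 2^{-t}$,
$-\tfrac{4}{3}P[j]_e + \tfrac{1}{36}X < m_0(P[j]_e) \le -\tfrac{2}{3}P[j]_e - \tfrac{4}{3} \cdot 2^{-d} - 2^{-t}$,
$\tfrac{2}{3}P[j]_e + \tfrac{4}{3} \cdot 2^{-d} + \tfrac{1}{144}X \le m_1(P[j]_e) \le \tfrac{4}{3}P[j]_e - 2^{-t}$,
$\tfrac{8}{3}P[j]_e + \tfrac{16}{3} \cdot 2^{-d} + \tfrac{1}{9}X \le m_2(P[j]_e) \le \tfrac{10}{3}P[j]_e - 2^{-t}$.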
Since $P[j]_e > X(X^{-1/2} - \tfrac{2}{3} \cdot 4^{-j}) - 2^{-d+1}$ for j≧1, the following equations can be obtained by setting d=6 and t=4:
$m_{-1}(P[j]_e) = -\mathrm{trunc}_4(3(P[j]_e + 2^{-6}))$
$m_0(P[j]_e) = -\mathrm{trunc}_4(P[j]_e + 2^{-6})$
$m_1(P[j]_e) = \mathrm{trunc}_4(P[j]_e + 2^{-6})$
$m_2(P[j]_e) = \mathrm{trunc}_4(3(P[j]_e + 2^{-6}))$
where $\mathrm{trunc}_4(\,)$ is the truncation to 4 fractional bits.
Consider a case wherein j=0. When X<⅜, S[0]=2 and W[0]=1−4X. Therefore, for $m_k(P[j]_e)$ described above, $4W[0] = 4 - 16X > L_{-1}[0]$, $4W[0]_e \ge m_{-1}(P[0]_e)$, $\max_{P[0]_e}(L_0[0]) < m_0(P[0]_e) < \min_{P[0]_e}(U_{-1}[0])$, $\max_{P[0]_e}(L_1[0]) < m_1(P[0]_e) < \min_{P[0]_e}(U_0[0])$, $4W[0] < U_1[0]$, and $4W[0]_e \le m_2(P[0]_e) - 2^{-4}$ hold.
When ⅜≦X<¾, S[0]=3/2 and W[0]=1−9X/4. Therefore, $4W[0] = 4 - 9X > L_{-1}[0]$, $4W[0]_e \ge m_{-1}(P[0]_e)$, $\max_{P[0]_e}(L_0[0]) < m_0(P[0]_e) < \min_{P[0]_e}(U_{-1}[0])$, $\max_{P[0]_e}(L_1[0]) < m_1(P[0]_e) < \min_{P[0]_e}(U_0[0])$, $4W[0] < U_1[0]$, and $4W[0]_e \le m_2(P[0]_e) - 2^{-4}$ hold.
When X≧¾, S[0]=1 and W[0]=1−X. Therefore, $4W[0] = 4 - 4X > L_{-1}[0]$, $4W[0]_e \ge m_{-1}(P[0]_e)$, $\max_{P[0]_e}(L_0[0]) < m_0(P[0]_e) < \min_{P[0]_e}(U_{-1}[0])$, $\max_{P[0]_e}(L_1[0]) < m_1(P[0]_e) < \min_{P[0]_e}(U_0[0])$, $4W[0] < U_1[0]$, and $4W[0]_e \le m_2(P[0]_e) - 2^{-4}$ hold.
In any case, therefore, the same digit selection function as that for j≧1 can be used, and $q_1$ is selected from {−1, 0, 1}.
This radix-4 version of the algorithm is summarized as follows. Since −2<W[j]<2, W[j] is expressed in a two's complement carry-save form with a 2-bit integer part (including the sign bit). Since 0<P[j]<2, P[j] is expressed in an unsigned carry-save form with a 1-bit integer part. Therefore, $q_{j+1}$ can be determined from the most significant 8 (carry-save) bits of 4W[j] and the most significant 7 (carry-save) bits of P[j].
Algorithm [RSQRT_R4]
Step 1:
In this manner, when the radix is 4, appropriate initial values are set for each of the three cases in accordance with the value of X. This allows the same digit selection function as that for j≧1 to be used for j=0 as well.
n iterations of the following calculations are performed from j=0 to n−1 (steps S25 to S28). The reciprocal square root digit $q_{j+1}$ is selected from the digit set {−2, −1, 0, 1, 2} in accordance with the estimates of 4W[j] and P[j]. Then, S[j+1]= and S[j+1]− are obtained by on-the-fly conversion. At the same time, the recurrence equation of the residual W[j], i.e., $W[j+1] := 4W[j] - (2P[j] + X q_{j+1} 4^{-j-1}) q_{j+1}$, and the recurrence equation of the product P[j], i.e., $P[j+1] := P[j] + X q_{j+1} 4^{-j-1}$, are calculated. The obtained value S[n]= is then output as S to be obtained (step S29).
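The radix-4 flow described above can be modeled in software. The following is a sketch under stated assumptions rather than a description of the circuit: W[j] and P[j] are exact rationals instead of carry-save words, the estimates are plain truncations with t=4 and d=6, and the threshold for digit 2 is taken as the mirror image of the threshold for digit −1, i.e., the truncation of 3(P[j]e+2^(−6)). That last choice is an assumption of this sketch. The function names are hypothetical.

```python
from fractions import Fraction
from math import floor


def trunc(value, bits):
    """Truncate toward minus infinity to `bits` fractional bits."""
    return Fraction(floor(value * 2 ** bits), 2 ** bits)


def select_digit_r4(w, p):
    """Select q from {-2, ..., 2} using estimates of 4W[j] (t=4) and P[j] (d=6)."""
    rw_est = trunc(4 * w, 4)
    p_adj = trunc(p, 6) + Fraction(1, 64)    # P[j]_e + 2^-6
    m1 = trunc(p_adj, 4)                     # m_1; m_0 is its negative
    m2 = trunc(3 * p_adj, 4)                 # assumed m_2; m_-1 is its negative
    if rw_est >= m2:
        return 2
    if rw_est >= m1:
        return 1
    if rw_est >= -m1:
        return 0
    if rw_est >= -m2:
        return -1
    return -2


def rsqrt_radix4(x, n):
    """Radix-4 reciprocal square root; returns (S[n], W[n], P[n]) for 1/4 < X < 1."""
    x = Fraction(x)
    if not Fraction(1, 4) < x < 1:
        raise ValueError("operand must satisfy 1/4 < X < 1")
    # Initial value S[0] chosen according to the range of X.
    if x < Fraction(3, 8):
        s = Fraction(2)
    elif x < Fraction(3, 4):
        s = Fraction(3, 2)
    else:
        s = Fraction(1)
    w, p = 1 - x * s * s, x * s              # W[0], P[0]
    for j in range(n):
        q = select_digit_r4(w, p)
        xq = x * q * Fraction(1, 4 ** (j + 1))   # X * q * 4^(-j-1)
        w = 4 * w - (2 * p + xq) * q
        p = p + xq
        s = s + Fraction(q, 4 ** (j + 1))
    return s, w, p


s, w, p = rsqrt_radix4(Fraction(1, 2), 12)
print(float(s), float(p))   # approximations of 2^(1/2) and 2^(-1/2)
```

The three-way choice of S[0] is what lets `select_digit_r4` be used unchanged at j=0; with a single initial value the first step would need its own thresholds.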
Consider implementation of the radix-4 version as a sequential circuit which performs one iteration of Step 2 in each clock cycle. The circuit structure is the same as that of the sequential implementation of the radix-2 version shown in
The digit selection circuit is a combination of an 8-bit carry-propagate adder, a 7-bit carry-propagate adder, and a 15-input combinational circuit. A buffer for driving $q_{j+1}$ is also required. The on-the-fly converter (32) is mainly constituted by a pair of 2-to-1 selectors. The 1-digit multiplier 1 (38) and 1-digit multiplier 2 (40) are multiplexers, each of which outputs 0, the data input itself, the bit-inverted data of the data input, the double of the data input, or the double of the bit-inverted data of the data input. The carry-save adder P (45) and carry-save adder W1 (39) are carry-save adders, and the carry-save adder W2 (41) is a 4-to-2 adder.
According to the embodiment described above, unlike a conventional reciprocal square rooting circuit of a multiplication type based on Newton's method or a simple combination of a divider and a square root extractor, a reciprocal square root is directly calculated by iterations of simple calculation, i.e., shifts, additions/subtractions, and 1-digit multiplication, without first obtaining the square root of the operand. The circuit according to the embodiment is therefore smaller than the circuit of the multiplication type, and faster than a simple combination of a divider and a square root extractor.
In addition, with a method that overlaps division and square root extraction by combining dividers and square root extractors, the latency becomes higher than that of division or square root extraction alone. In the present invention, however, the latency can be made equal to that of division or square root extraction.
According to the present invention, there are provided a method of computing a reciprocal square root, a reciprocal square rooting circuit, and a program, which realize a smaller amount of hardware than a conventional multiplication type reciprocal square rooting circuit and also realize faster operation than a conventional reciprocal square rooting circuit using dividers and square root extractors.
Additional advantages and modifications will readily occur to those skilled in the art. Therefore, the invention in its broader aspects is not limited to the specific details and representative embodiments shown and described herein. Accordingly, various modifications may be made without departing from the spirit and scope of the general inventive concept as defined by the appended claims and their equivalents.
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
May 20 2002 | TAKAGI, NAOFUMI | Semiconductor Technology Academic Research Center | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 012966 | /0294 | |
May 31 2002 | Semiconductor Technology Academic Research Center | (assignment on the face of the patent) | / |
Date | Maintenance Fee Events |
Apr 11 2011 | REM: Maintenance Fee Reminder Mailed. |
Sep 04 2011 | EXP: Patent Expired for Failure to Pay Maintenance Fees. |