A branch history stores execution history information of branch instructions and predicts the presence of a branch instruction and its branch destination. A first return address stack stores, when execution of a call instruction of a subroutine completes, address information of the return destination of the corresponding return instruction. A second return address stack stores, when the presence of a call instruction of a subroutine is predicted, address information of the return destination of the corresponding return instruction. When the presence of a return instruction is predicted and address information is stored in the second return address stack, an output selecting unit selects that address information as the branch prediction result with the highest priority and outputs it.
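The selection priority can be illustrated with a minimal software sketch. Two parts come from the abstract and claims 4 and 5: the second (speculative) stack comes first, then the first (committed) stack, then the branch history's own prediction. Everything else is an assumption. The function name is hypothetical, each stack is a Python list whose last item is the most recent entry, and the sketch ignores invalidation, the return stack pointer, and the call-instruction-state holding unit of claims 6 to 13.

```python
from typing import List, Optional


def select_return_prediction(
    second_ras: List[int],
    first_ras: List[int],
    branch_history_target: Optional[int],
) -> Optional[int]:
    """Choose a predicted return destination in the priority order described above."""
    if second_ras:                  # filled when a call is predicted: highest priority
        return second_ras[-1]       # most recently pushed entry
    if first_ras:                   # filled when a call completes execution
        return first_ras[-1]
    return branch_history_target    # neither stack holds anything: use the branch history
```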

Patent: RE42466
Priority: Nov 30 2004
Filed: Jan 15 2010
Issued: Jun 14 2011
Expiry: Feb 25 2025
Original assignee entity: Large
Fee status: all paid
19. A branch predicting method of performing a branch prediction in a pipeline processor, the branch predicting method comprising:
predicting presence of a branch instruction and a branch destination corresponding to the branch instruction using a branch history that stores execution history information of branch instructions including a call instruction and a return instruction;
a first storing including storing, when presence of a call instruction of a subroutine is predicted at the predicting, address information of a return destination of a return instruction corresponding to the call instruction in a storing unit;
a second storing including storing, when an execution of a call instruction of a subroutine is completed, address information of a return destination of a return instruction corresponding to the call instruction in the storing unit;
an output selecting including:
selecting, when presence of a return instruction is predicted at the predicting, if the address information is stored at the second storing, the address information stored at the first storing as a result of the branch prediction, and
outputting the address information selected; and
expelling an oldest piece of address information from the second return address stack when an execution of any one of the call instructions is completed.
1. A branch predicting apparatus that performs a branch prediction in a pipeline processor, the branch predicting apparatus comprising:
a branch history that stores execution history information of branch instructions including a call instruction and a return instruction, and searches through the execution history information to predict presence of a branch instruction and a branch destination corresponding to the branch instruction;
a first return address stack that stores, when an execution of a call instruction of a subroutine is completed, address information of a return destination of a return instruction corresponding to the call instruction;
a second return address stack that stores, when presence of a call instruction of a subroutine is predicted by the branch history, address information of a return destination of a return instruction corresponding to the call instruction; and
an output selecting unit that selects, when presence of a return instruction is predicted by the branch history, if address information is stored in the second return address stack, the address information stored in the second return address stack as a result of the branch prediction, and outputs the address information selected,
wherein the second return address stack expels an oldest piece of address information when an execution of any one of the call instructions is completed.
2. The branch predicting apparatus according to claim 1, wherein when the branch prediction fails, all contents of the second address stack are erased.
3. The branch predicting apparatus according to claim 1, wherein when the presence of the return instruction is predicted by the branch history, if a plurality of pieces of address information are stored in the second return address stack, the output selecting unit selects valid address information stored in the second return address stack last, as the result of the branch prediction, and outputs the valid address information selected.
4. The branch predicting apparatus according to claim 1, wherein when the presence of the return instruction is predicted by the branch history, if the address information is not stored in the second return address stack, the output selecting unit selects the address information stored in the first return address stack, as the result of the branch prediction, and outputs the address information selected.
5. The branch predicting apparatus according to claim 1, wherein when the presence of the return instruction is predicted by the branch history, if the address information is stored in neither of the first return address stack and the second return address stack, the output selecting unit selects a prediction result of the branch history as the result of the branch prediction, and outputs the prediction result selected.
6. The branch predicting apparatus according to claim 1, further comprising a call-instruction-state holding unit that holds a state of a call instruction including information on whether an execution of the call instruction, of which the presence is predicted by the branch history, is completed, and information on whether a branch prediction of a return instruction corresponding to the call instruction is completed.
7. The branch predicting apparatus according to claim 6, wherein when the branch prediction fails, all contents of the call-instruction-state holding unit are erased.
8. The branch predicting apparatus according to claim 6, further comprising a counter that holds number of call instructions for which the call-instruction-state holding unit holds the state.
9. The branch predicting apparatus according to claim 8, wherein when the branch prediction fails, all contents of the counter are erased.
10. The branch predicting apparatus according to claim 6, wherein when the presence of the return instruction is predicted by the branch history, if it is found by the call-instruction-state holding unit that a call instruction, of which the presence is predicted by the branch history but an execution is not completed, and that a branch prediction of a corresponding return instruction is not completed, is present, the output selecting unit selects the address information stored in the second return address stack as the result of the branch prediction, and outputs the address information selected.
11. The branch predicting apparatus according to claim 10, wherein when the presence of the return instruction is predicted by the branch history, if it is found by the call-instruction-state holding unit that a call instruction, of which the presence is predicted by the branch history but an execution is not completed, and that a branch prediction of a corresponding return instruction is not completed, is present, and if the address information is not stored in the second return address stack, the output selecting unit selects a prediction result of the branch history as the result of the branch prediction, and outputs the prediction result selected.
12. The branch predicting apparatus according to claim 6, wherein when the presence of the return instruction is predicted by the branch history, if it is found by the call-instruction-state holding unit that a call instruction, of which the presence is predicted by the branch history but an execution is not completed, and that a branch prediction of a corresponding return instruction is not completed, is not present, the output selecting unit selects the address information stored in the first return address stack as the result of the branch prediction, and outputs the address information selected.
13. The branch predicting apparatus according to claim 12, wherein when the presence of the return instruction is predicted by the branch history, if it is found by the call-instruction-state holding unit that a call instruction, of which the presence is predicted by the branch history but an execution is not completed, and that a branch prediction of a corresponding return instruction is not completed, is not present, and if the address information is not stored in the first return address stack, the output selecting unit selects a prediction result of the branch history as the result of the branch prediction, and outputs the prediction result selected.
14. The branch predicting apparatus according to claim 6, wherein the call-instruction-state holding unit holds the state of the call instruction, of which the presence is predicted by the branch history, by switching a bit on and off for each instruction.
15. The branch predicting apparatus according to claim 14, wherein when the presence of the call instruction is predicted by the branch history, the call-instruction-state holding unit adds one bit that is an object of management, and sets a value of the bit to “0”,
when the presence of the return instruction is predicted by the branch history, the call-instruction-state holding unit changes a value of a latest bit having a value “0” to “1” from among the bits that are objects of management, and
when an execution of any one of the call instructions is completed, the call-instruction-state holding unit removes an oldest bit from the objects of management.
16. The branch predicting apparatus according to claim 14, wherein
when the presence of the call instruction is predicted by the branch history, the call-instruction-state holding unit adds one bit that is an object of management, and sets a value of the bit to “1”,
when the presence of the return instruction is predicted by the branch history, the call-instruction-state holding unit changes a value of a latest bit having a value “1” to “0” from among the bits that are objects of management, and
when an execution of any one of the call instructions is completed, the call-instruction-state holding unit removes an oldest bit from the objects of management.
17. The branch predicting apparatus according to claim 14, wherein
the call-instruction-state holding unit sets values of all bits possessed to “0” at a time of initialization,
when the presence of the call instruction is predicted by the branch history, the call-instruction-state holding unit adds one bit that is an object of management, and sets a value of the bit to “1”,
when the presence of the return instruction is predicted by the branch history, the call-instruction-state holding unit changes a latest bit having a value “1” to “0” from among the whole bits, and
when an execution of any one of the call instructions is completed, the call-instruction-state holding unit removes an oldest bit from the objects of management.
18. The branch predicting apparatus according to claim 14, wherein
the call-instruction-state holding unit sets values of all bits possessed to “1”, at a time of initialization,
when the presence of the call instruction is predicted by the branch history, the call-instruction-state holding unit adds one bit that is an object of management, and sets a value of the bit to “0”,
when the presence of the return instruction is predicted by the branch history, the call-instruction-state holding unit changes a latest bit having a value “0” to “1” from among the whole bits, and
when an execution of any one of the call instructions is completed, the call-instruction-state holding unit removes an oldest bit from the objects of management.
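The bit protocol of claims 15 and 16 can be made concrete with a small model of the call-instruction-state holding unit, shown below. The bits are held in a Python list whose index 0 is the oldest managed call. The class name, its methods, and the `pending` parameter are hypothetical. Claim 15 corresponds to `pending=0` and claim 16 to `pending=1`. The whole-bit initialization of claims 17 and 18 is not modeled.

```python
from typing import List


class CallStateBits:
    """Hypothetical model of the call-instruction-state holding unit (claims 15, 16)."""

    def __init__(self, pending: int = 0) -> None:
        self.pending = pending        # value written when a call is predicted
        self.done = 1 - pending       # value marking that the matching return was predicted
        self.bits: List[int] = []     # index 0 is the oldest call under management

    def call_predicted(self) -> None:
        self.bits.append(self.pending)              # add one managed bit

    def return_predicted(self) -> None:
        for i in range(len(self.bits) - 1, -1, -1):  # search from the latest bit
            if self.bits[i] == self.pending:
                self.bits[i] = self.done
                return

    def call_completed(self) -> None:
        if self.bits:
            self.bits.pop(0)                        # remove the oldest managed bit

    def has_unreturned_call(self) -> bool:
        return self.pending in self.bits
```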
X_TOP_VALID USEX signal is not ON, if the entry in the return address stack 320 indicated by the return stack pointer 380 is valid, the one of the STK0_SEL to STK3_SEL signals that corresponds to that entry is turned ON, indicating that the return destination address of the return instruction should be acquired from the return address stack 320.

At most one of the X_TOP_SEL signal and the STK0_SEL to STK3_SEL signals is ON at any time. A first selector, which corresponds to the return address selection circuit 391, outputs the content of the entry whose signal is ON to a second selector as a branch prediction result. When none of these signals is ON, the first selector outputs nothing to the second selector.

If one of the X_TOP_SEL and STK0_SEL to STK3_SEL signals is ON and the return hit signal from the branch history 310 is ON, the second selector, which corresponds to the output selection circuit 392, outputs the address received from the first selector to the instruction fetch control unit 110 as the branch prediction result. Otherwise, the second selector outputs the address from the branch history 310 to the instruction fetch control unit 110 as the branch prediction result.
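The two selector stages can be modeled as a pair of functions. The signal names X_TOP_SEL and STK0_SEL to STK3_SEL come from the text. Three things are illustrative assumptions: the dictionaries keyed by signal name, the function names, and raising an error when more than one select signal is ON.

```python
from typing import Dict, Optional


def first_selector(sel: Dict[str, bool], entries: Dict[str, int]) -> Optional[int]:
    """Pass through the entry whose select signal is ON, or None if none is ON."""
    active = [name for name, on in sel.items() if on]
    if len(active) > 1:
        raise ValueError("select signals must be one-hot")
    return entries[active[0]] if active else None


def second_selector(first_out: Optional[int], return_hit: bool, bh_target: int) -> int:
    """Prefer the stack-derived address only when it exists and a return hit is ON."""
    if first_out is not None and return_hit:
        return first_out
    return bh_target                 # otherwise use the branch history's address
```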

Note that, when branch prediction fails, the return address stack 320, the return address stack X 340, the call hit counter 360, the return hit table 370, and the return stack pointer 380 are reset to their initial states.

In the present embodiment, the bits of the return hit table 370 are initialized to 0, and a bit is changed to 1 when a return instruction is detected. Alternatively, the bits may be initialized to 1 and changed to 0 when a return instruction is detected. The structure of the return hit table 370 in that case will also be explained.

FIG. 17 is a logic circuit diagram of the return hit table 370. As shown in the figure, the return hit table 370 holds m bits, XH1 to XHm, whose values are controlled by the XH1_SET to XHm_SET signals. For example, the j-th bit XHj is controlled by the XHj_SET signal.

When a call hit signal from the branch history 310 is turned ON, the XHj_SET signal is turned OFF if the call hit counter 360 holds j−1 at that point, which changes the corresponding bit XHj of the return hit table 370 to 0. In other words, when the call hit signal turns ON, the bit newly indicated by the call hit counter 360 is updated to 0.

When a return hit signal from the branch history 310 is turned ON, the XHj_SET signal is turned ON if every bit above XHj, up to the bit indicated by the call hit counter 360 at that point, has the value 1, which changes the corresponding bit of the return hit table 370 to 1. In other words, when the return hit signal turns ON, the most significant bit of the return hit table 370 that still has the value 0 is updated to 1.

In this way, each bit of the return hit table 370 has the value 0 only while a valid value is present in the corresponding entry of the imaginary return stack X, and has the value 1 otherwise.

FIG. 18 is a logic circuit diagram of the output section of the return hit table 370. As shown in the figure, the USE_X signal output from the return hit table 370 is turned ON when at least one bit of the return hit table 370 does not have the value 1. Such a bit indicates that a valid value is present in the corresponding entry of the imaginary return stack X.
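The XHj_SET rules and the USE_X output can be sketched as functions on a list of bits, using the scheme in which bits are cleared to 0 on a call hit. In the list, index 0 holds XH1. The function names are hypothetical. One behavior is an assumption: `use_x` looks only at XH1 up to the bit indicated by the call hit counter. This follows the walkthrough's wording "between the least significant bit and the bit indicated by the call hit counter", rather than every bit of the table.

```python
from typing import List


def call_hit_update(xh: List[int], counter: int) -> List[int]:
    """Call hit: XHj_SET is OFF for j - 1 == counter, so that bit becomes 0."""
    new = list(xh)
    new[counter] = 0                 # xh[0] is XH1, so XHj lives at index j - 1
    return new


def return_hit_update(xh: List[int], counter: int) -> List[int]:
    """Return hit: XHj_SET is ON when every bit above XHj, up to XH<counter>, is 1."""
    new = list(xh)
    for j in range(1, counter + 1):
        above = xh[j:counter]        # XH(j+1) .. XH(counter), sampled before the update
        if all(b == 1 for b in above):
            new[j - 1] = 1           # only the highest 0 bit actually changes
    return new


def use_x(xh: List[int], counter: int) -> bool:
    """USE_X: some bit among XH1 .. XH<counter> has not yet been set to 1."""
    return any(b == 0 for b in xh[:counter])
```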

Next, the operation of the branch predicting apparatus according to the present embodiment is explained using the execution of the instruction stream in FIG. 6 as an example. FIGS. 19A to 19L are explanatory diagrams of this operation. Note that this explanation uses the scheme in which each bit of the return hit table is initialized to 0.

FIG. 19A shows a scene in which a call instruction is detected at address A. A+8 is stored in X-TOP, the top of the return address stack X, as the return-destination address, and the return stack pointer is decremented by one to −1. In addition, the call hit counter is incremented by one to 1.

Here, suppose the branch history detects a return instruction at address C. A bit with the value 0 is present between the least significant bit of the return hit table and the bit indicated by the call hit counter. The value stored in X-TOP, the latest valid entry in the return address stack X (that is, A+8), is therefore acquired as the branch prediction result. A+8 is the correct branch destination of the return instruction at address C.

FIG. 19B shows a scene after the return instruction at address C is detected. In response, a pop operation on the return address stack X discards the information in X-TOP, and the return stack pointer is incremented by one to 0. In the return hit table, bit XH1, indicated by the call hit counter, changes to 1, recording that the corresponding imaginary entry has been used.

FIG. 19C shows a scene in which execution of the call instruction at address A subsequently completes. In response, the return-destination address A+8 is stored in SKT0 at the top of the return address stack, and the return stack pointer is incremented by one to 1. In addition, the call hit counter is decremented by one to 0, and the return hit table is shifted to discard its least significant bit.

FIG. 19D shows a scene in which a call instruction is subsequently detected at address D. In response, the return-destination address D+8 is stored in X-TOP at the top of the return address stack X, and the return stack pointer is decremented by one to 0. In addition, the call hit counter is incremented by one to 1.

FIG. 19E shows a scene in which a call instruction is subsequently detected at address F. In response, the return-destination address F+8 is pushed to the return address stack X, and the return stack pointer is decremented by one to −1. In addition, the call hit counter is incremented by one to 2.

FIG. 19F shows a scene in which a call instruction is subsequently detected at address H. In response, the return-destination address H+8 is pushed to the return address stack X. Because both of its entries are already occupied, the oldest information, D+8, is pushed out and lost, leaving H+8 in X-TOP and F+8 in X-NXT. In addition, the return stack pointer is decremented by one to −2, and the call hit counter is incremented by one to 3.

Here, suppose the branch history detects a return instruction at address J. A bit with the value 0 is present between the least significant bit of the return hit table and the bit indicated by the call hit counter. The value stored in X-TOP, the latest valid entry in the return address stack X (that is, H+8), is therefore acquired as the branch prediction result. H+8 is the correct branch destination of the return instruction at address J.

FIG. 19G shows a scene after the return instruction at address J is detected. In response, a pop operation on the return address stack X moves the information in X-NXT to X-TOP, and the return stack pointer is incremented by one to −1. In the return hit table, bit XH3, indicated by the call hit counter, changes to 1, recording that the corresponding imaginary entry has been used.

Here, suppose the branch history detects a return instruction at address K. A bit with the value 0 is present between the least significant bit of the return hit table and the bit indicated by the call hit counter. The value stored in X-TOP, the latest valid entry in the return address stack X (that is, F+8), is therefore acquired as the branch prediction result. F+8 is the correct branch destination of the return instruction at address K.

FIG. 19H shows a scene after the return instruction at address K is detected. In response, a pop operation on the return address stack X discards the information in X-TOP, and the return stack pointer is incremented by one to 0. In the return hit table, bit XH3, indicated by the call hit counter, already holds 1, so the next bit, XH2, changes to 1, recording that the corresponding imaginary entry has been used.

Here, suppose the branch history detects a return instruction at address L. A bit with the value 0 (XH1) is still present between the least significant bit of the return hit table and the bit indicated by the call hit counter, so the apparatus tries to acquire address information from X-TOP, the latest entry in the return address stack X. However, X-TOP is already empty, because D+8 was pushed out in FIG. 19F, so no address can be acquired from it. The return address stack cannot supply the correct branch destination either, because execution of the call instruction at address D has not yet completed. In this case, the value predicted by the branch history is adopted as the prediction result.

If the branch history predicts the correct address, the branch prediction succeeds. In the same execution state shown in FIG. 7H, branch prediction always fails. The branch prediction system according to the present embodiment can still succeed here by falling back on the branch history, which improves prediction accuracy.

FIG. 19I shows a scene after the return instruction at address L is detected. In response, a pop operation is performed on the return address stack X, and the return stack pointer is incremented by one to 1. Because bits XH3 and XH2 of the return hit table already hold 1, the next bit, XH1, changes to 1, recording that all the imaginary entries have been used.

Branch prediction accuracy improves further if execution of the call instruction at address D completes before the return instruction at address L is detected. Execution of the call instruction at address D cannot complete until execution of the return instruction at address C has completed. Assume, therefore, that execution of the return instruction at address C completes in the state of FIG. 19H.

FIG. 19J shows a scene in which execution of the return instruction at address C completes in the state of FIG. 19H. In response, a pop operation on the return address stack discards the content of SKT0 at the top, and the return stack pointer is decremented by one to −1.

FIG. 19K shows a scene in which execution of the call instruction at address D subsequently completes. In response, the return-destination address D+8 is stored in SKT0 at the top of the return address stack, and the return stack pointer is incremented by one to 0. In addition, the call hit counter is decremented by one to 2, and the return hit table is shifted to discard its least significant bit.

Here, suppose the branch history detects a return instruction at address L. No bit with the value 0 is present between the least significant bit of the return hit table and the bit indicated by the call hit counter. The value stored in SKT0, the entry of the return address stack indicated by the return stack pointer (that is, D+8), is therefore acquired as the branch prediction result. D+8 is the correct branch destination of the return instruction at address L.

Thus, if call instructions complete execution promptly, the branch prediction system according to the present embodiment predicts return destinations accurately without increasing the number of entries in the return address stack X.

FIG. 19L shows a scene after the return instruction at address L is detected. In response, a pop operation is performed on the return address stack X, and the return stack pointer is incremented by one to 1.
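The walkthrough of FIGS. 19A to 19L can be replayed with a small behavioral model. This is a sketch under stated assumptions, not the circuit described in the text:

- The class and method names are hypothetical.
- Addresses are plain strings such as "A+8".
- The return address stack is a list whose index 0 is the entry SKT0.
- The return stack pointer is an index into that list, where a negative value means the return address has not reached the return address stack yet.

The rule for dropping a stack-X entry when a call completes is an assumed reading of claim 1's "expels an oldest piece of address information". It applies only while that oldest valid imaginary entry is still physically held in the stack, and the walkthrough never triggers it.

```python
import copy
from typing import List


class SecondEmbodimentModel:
    """Hypothetical behavioral model of the second-embodiment predictor."""

    def __init__(self, x_entries: int = 2) -> None:
        self.ras: List[str] = []      # return address stack; index 0 is SKT0 (top)
        self.pointer = 0              # return stack pointer, used as an index into ras
        self.stack_x: List[str] = []  # actual return address stack X; last item is X-TOP
        self.x_entries = x_entries
        self.counter = 0              # call hit counter
        self.hit: List[int] = []      # return hit table; index 0 is XH1, len == counter

    def snapshot(self) -> "SecondEmbodimentModel":
        return copy.deepcopy(self)    # lets alternative continuations be explored

    def call_detected(self, ret_addr: str) -> None:
        if len(self.stack_x) == self.x_entries:
            self.stack_x.pop(0)       # all entries full: the oldest is pushed out
        self.stack_x.append(ret_addr)
        self.pointer -= 1
        self.counter += 1
        self.hit.append(0)            # the new imaginary entry holds a valid value

    def return_detected(self, bh_target: str) -> str:
        if 0 in self.hit:             # USE_X: a valid imaginary entry exists
            pred = self.stack_x[-1] if self.stack_x else bh_target
        elif 0 <= self.pointer < len(self.ras):
            pred = self.ras[self.pointer]
        else:
            pred = bh_target          # nothing usable: keep the branch history result
        if self.stack_x:
            self.stack_x.pop()
        self.pointer += 1
        for i in range(len(self.hit) - 1, -1, -1):
            if self.hit[i] == 0:      # the most significant 0 bit becomes 1
                self.hit[i] = 1
                break
        return pred

    def call_completed(self, ret_addr: str) -> None:
        # Assumes the call was detected first, so self.hit is not empty.
        # Assumption: drop the oldest stack-X entry only if every valid imaginary
        # entry, including the oldest one, is still physically held in stack X.
        if self.hit[0] == 0 and self.hit.count(0) <= len(self.stack_x):
            self.stack_x.pop(0)
        self.ras.insert(0, ret_addr)  # push to the return address stack
        self.pointer += 1
        self.counter -= 1
        self.hit.pop(0)               # shift out the least significant bit

    def return_completed(self) -> None:
        self.ras.pop(0)               # discard SKT0
        self.pointer -= 1
```

Replaying the sequence gives A+8, H+8, and F+8 as predictions. The sequence is: a call at A, a return at C, completion of A, calls at D, F, and H, then returns at J and K. From the FIG. 19H state, the return at L then falls back to the branch history on one path. On the other path, the return at C and the call at D complete first, and the prediction is D+8.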

As described above, in the second embodiment, the validity of entries in the imaginary return address stack X is managed by a management table consisting of only a few bits. Branch prediction therefore remains accurate while the number of entries in the actual return address stack X is kept small.

Note that, in the present embodiment, the bits of the return hit table are used in order starting from the least significant bit. This order is not required. For example, the bits may instead be used starting from the most significant bit, or used like a stack.

In the branch prediction system of the second embodiment, a bit of the management table for imaginary entries is updated when a return instruction is detected. The bit can instead be updated when a call instruction is detected. A branch prediction system that does so is explained below as a third embodiment of the present invention.

FIG. 20 is an explanatory diagram for explaining an outline of the branch prediction system according to the third embodiment. As shown in the figure, in the branch prediction system according to the present embodiment, an X valid table and a call hit counter are used.

The X valid table manages the validity of the respective entries in an imaginary return address stack X. Each bit of the X valid table corresponds to one entry in the imaginary return address stack X and has 0 as its initial value. A bit value of 0 indicates that the corresponding imaginary entry is invalid, and a value of 1 indicates that it is valid.

In the X valid table, each time a call instruction is detected, the value 1 is set in the next bit, in order starting from the least significant bit XV1. Each time a return instruction is detected, the most significant bit holding the value 1 is reset to 0. In addition, each time execution of a call instruction completes, the least significant bit is discarded by a shift operation. Once execution of the call instruction completes, the return address of the corresponding return instruction is stored in the return address stack, so the corresponding entry in the imaginary return address stack X is no longer needed.
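These three update rules map naturally onto an integer bitmask, where bit 0 plays the role of XV1. The hypothetical class below is a sketch under that assumption. It tracks the call hit counter alongside the mask so that a call knows which bit to set. The usage reproduces the FIG. 20 state: five pending calls, with the third and fifth already matched by detected returns.

```python
class XValidTable:
    """Hypothetical bitmask model of the third embodiment's X valid table."""

    def __init__(self) -> None:
        self.bits = 0        # bit 0 is XV1; a 1 marks a valid imaginary entry
        self.counter = 0     # call hit counter: calls detected but not yet completed

    def call_detected(self) -> None:
        self.bits |= 1 << self.counter      # set the next bit above the managed ones
        self.counter += 1

    def return_detected(self) -> bool:
        """Reset the most significant 1 bit; return False if no bit was 1."""
        if self.bits == 0:
            return False
        self.bits &= ~(1 << (self.bits.bit_length() - 1))
        return True

    def call_completed(self) -> None:
        self.bits >>= 1      # shift out XV1
        self.counter -= 1
```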

Since the X valid table needs only one bit per entry of the imaginary return address stack X, it can be implemented at extremely low cost. For example, with a 64-bit address length, providing eight entries in the actual return address stack X requires a total of 512 (64×8) bits, whereas the X valid table requires only 8 bits.
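The cost comparison is simple arithmetic. The hypothetical helper below restates it, assuming each stack-X entry stores one full address and each imaginary entry needs one validity bit.

```python
from typing import Tuple


def storage_bits(address_bits: int, entries: int) -> Tuple[int, int]:
    """Return (bits for a real return address stack X, bits for an X valid table)."""
    stack_x_bits = address_bits * entries   # one full return address per entry
    valid_table_bits = entries              # one validity bit per imaginary entry
    return stack_x_bits, valid_table_bits
```

With 64-bit addresses and eight entries this gives (512, 8), matching the figures above.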

The call hit counter takes 0 as an initial value and is incremented by one every time a call instruction is detected and decremented by one every time execution of the call instruction is completed. Therefore, a value of the call hit counter indicates the number of call instructions that have been detected by the branch history but execution of which has not been completed, that is, the number of entries in the imaginary return address stack X in which the value is stored.

In FIG. 20, the call hit counter holds 5. Five call instructions have not yet completed execution in the execution unit, and the return addresses of their corresponding return instructions are stored in five entries of the imaginary return address stack X.

Return instructions corresponding to some of these call instructions may already have been detected by the branch history. In the example of FIG. 20, this applies to the third and fifth call instructions, so the corresponding bits of the X valid table are 0. Even after a return instruction is detected and its entry becomes unnecessary, the bit for that entry is kept in the X valid table. This keeps the bits shifting out in call order as executions of the call instructions complete.

Assume that, because of cost constraints, the actual return address stack X has only two entries. In this return address stack X, the return address of a return instruction is pushed when a call instruction is detected, and the content of the latest entry is discarded by a pop operation when a return instruction is detected. When a call instruction is detected while all the entries are filled, the content of the oldest entry is pushed out and discarded.

In the example of FIG. 20, three valid entries exist in the imaginary return address stack X, but the actual return address stack X has only two entries. The content of the oldest entry has therefore been pushed out. Nevertheless, the X valid table records that three entries should be present in the return address stack X. This prevents the apparatus from mistakenly acquiring a branch destination address from the return address stack, as happens in the case of FIGS. 7 and 8 according to the second embodiment.

In the branch prediction system according to the present embodiment, when the branch history detects a return instruction and the call hit counter holds 1 or more, the X valid table is checked. If at least one of its bits has the value 1, the most significant bit with the value 1 is changed to 0. The content of the latest entry in the actual return address stack X is then acquired as the branch prediction result.

Because address information may have been discarded by push-out, all entries in the return address stack X may already have been invalidated. In this case, branch prediction can use neither the return address stack X nor the return address stack, so the prediction result of the branch history is adopted as the branch prediction result.

If no bit in the X valid table has the value 1, all entries in the imaginary return address stack X have already been used. The return address is then acquired from the return address stack according to the return stack pointer.
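The third embodiment's prediction decision can be summarized in one function. It is a sketch with a hypothetical name and the following representation:

- X valid table bits are a list with index 0 as XV1 and 1 meaning valid.
- The last item of the stack-X list is its latest entry.
- Index 0 of the return-address-stack list is its top.
- The return stack pointer is an index into that list.

Updating the X valid table after the decision is left out.

```python
from typing import List


def predict_return_third(counter: int, valid_bits: List[int], stack_x: List[int],
                         ras: List[int], pointer: int, bh_target: int) -> int:
    """Pick a return target the way the third embodiment is described above."""
    if counter >= 1 and 1 in valid_bits:
        # A valid imaginary entry exists: use stack X unless push-out emptied it.
        return stack_x[-1] if stack_x else bh_target
    if 0 <= pointer < len(ras):
        return ras[pointer]   # all imaginary entries used: read the return address stack
    return bh_target          # no usable stack entry: keep the branch history result
```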

As described above, in the branch prediction system according to the present embodiment, validity of the respective entries in the imaginary return address stack X is managed by the X valid table. Thus, even when the number of entries in the actual return address stack X is not enough, it is possible to perform branch prediction while keeping high accuracy.

FIG. 21 is a block diagram of the structure of the branch predicting apparatus according to the present embodiment. Since components from the instruction fetch control unit 110 to the address generation reservation station are the same as those according to the first embodiment, explanations thereof are omitted.

A branch predicting apparatus 400 includes a branch history 410, a return address stack 420, a return address arithmetic circuit 430, a return address stack X 440, a return address arithmetic circuit 450, a call hit counter 460, an X valid table 470, a return stack pointer 480, a return address selection circuit 491, and an output selection circuit 492.

The branch history 410, the return address stack 420, the return address arithmetic circuit 430, the return address stack X 440, the return address arithmetic circuit 450, the return stack pointer 480, the return address selection circuit 491, and the output selection circuit 492 correspond to and have the same functions as the branch history 310, the return address stack 320, the return address arithmetic circuit 330, the return address stack X 340, the return address arithmetic circuit 350, the return stack pointer 380, the return address selection circuit 391, and the output selection circuit 392 according to the second embodiment, respectively.

The call hit counter 460 is a device that keeps track of the entry of the imaginary return address stack X, corresponding to the X valid table 470, up to which address information has been stacked. As shown in FIG. 10 according to the second embodiment, the call hit counter 460 has a K-bit latch wide enough to hold the number of imaginary entries, and outputs the value held by this latch as a CALL_HIT_CTR signal. The latch is set to 0 at initialization, incremented by one every time a call hit signal from the branch history 410 is turned ON, and decremented by one every time a call instruction flag from the branch reservation station 150 is turned ON.
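A behavioral sketch of the call hit counter, assuming a one-cycle `step` interface. The class and method names are hypothetical, K is a free parameter, and the case where the call hit signal and the call instruction flag are both ON in the same cycle (treated here as cancelling out) is an assumption the text does not cover.

```python
class CallHitCounter:
    """Sketch of the call hit counter as a K-bit latch (K chosen by the caller)."""

    def __init__(self, k_bits: int) -> None:
        self.max_value = (1 << k_bits) - 1
        self.value = 0  # the latch is cleared at initialization

    def step(self, call_hit: bool, call_done: bool) -> int:
        """Advance one cycle and return the value driven onto CALL_HIT_CTR.

        call_hit:  call hit signal from the branch history (count up).
        call_done: call instruction flag from the branch reservation station (count down).
        Both ON in one cycle cancel out (assumption).
        """
        nxt = self.value + int(call_hit) - int(call_done)
        # K is sized to hold every imaginary entry, so leaving the range
        # signals a modeling error rather than a case the hardware handles.
        if not 0 <= nxt <= self.max_value:
            raise ValueError(f"CALL_HIT_CTR out of range: {nxt}")
        self.value = nxt
        return self.value


ctr = CallHitCounter(k_bits=2)
print(ctr.step(call_hit=True, call_done=False))   # 1
print(ctr.step(call_hit=True, call_done=False))   # 2
print(ctr.step(call_hit=False, call_done=True))   # 1
```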

As shown in FIG. 11 according to the second embodiment, the CALL_HIT_CTR signal output from the call hit counter 460 is decoded by a decoder so that one of the CTR_EQ_0 to CTR_EQ_m signals is turned ON. For example, if the value of the CALL_HIT_CTR signal is 0, CTR_EQ_0 is turned ON, and if the value is m, CTR_EQ_m is turned ON.
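The decoder is a one-hot decode of the counter value. A minimal sketch, with `decode_ctr` as a hypothetical name and a list of booleans standing in for the CTR_EQ_0 to CTR_EQ_m signal lines:

```python
from typing import List


def decode_ctr(call_hit_ctr: int, m: int) -> List[bool]:
    """One-hot decode of CALL_HIT_CTR into CTR_EQ_0 .. CTR_EQ_m."""
    if not 0 <= call_hit_ctr <= m:
        raise ValueError(f"CALL_HIT_CTR={call_hit_ctr} is outside 0..{m}")
    # Element i models CTR_EQ_i: ON exactly when the counter equals i.
    return [i == call_hit_ctr for i in range(m + 1)]


print(decode_ctr(0, 3))  # [True, False, False, False]
print(decode_ctr(3, 3))  # [False, False, False, True]
```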

The X valid table 470 is a device that holds bits indicating the validity of the corresponding entries in the imaginary return address stack X. Each bit of the X valid table 470 has an initial value of 0. A bit is set to 1 when a value is stored in its corresponding imaginary entry, and is reset to 0 when a return instruction is detected and the value of that imaginary entry is used for prediction. In addition, the bit corresponding to the oldest imaginary entry is shifted out every time execution of a call instruction is completed.
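The life cycle of a bit (set on store, reset on use, shifted out on call completion) can be sketched as a small class. This is an illustration under assumptions: `XValidTable` and its method names are hypothetical, the bits are kept least significant first (index 0 is XV1), and the vacated most significant position is assumed to be filled with 0 after a shift.

```python
from typing import List, Optional


class XValidTable:
    """Sketch of the X valid table; bits[0] is XV1, the least significant bit."""

    def __init__(self, m: int) -> None:
        self.bits: List[int] = [0] * m  # every bit starts at 0

    def on_stored(self, j: int) -> None:
        """A return address was stored in imaginary entry j (1-based): mark it valid."""
        self.bits[j - 1] = 1

    def on_used(self) -> Optional[int]:
        """A return consumed the newest valid entry; return its 0-based index."""
        for i in reversed(range(len(self.bits))):
            if self.bits[i]:
                self.bits[i] = 0
                return i
        return None  # no valid imaginary entry remained

    def on_call_completed(self) -> None:
        """Shift out the bit of the oldest imaginary entry (XV1)."""
        self.bits = self.bits[1:] + [0]


t = XValidTable(3)
for j in (1, 2, 3):
    t.on_stored(j)
print(t.on_used(), t.bits)  # 2 [1, 1, 0]
t.on_call_completed()
print(t.bits)               # [1, 0, 0]
```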

FIG. 22 is a logical circuit diagram of a circuit structure of the X valid table 470. As shown in the figure, the X valid table 470 holds information of m bits XV1 to XVm, and values of the respective bits are controlled by XV1_SET to XVm_SET signals. For example, the j-th bit XVj is controlled by an XVj_SET signal.

When a call hit signal from the branch history 410 is turned ON, the XVj_SET signal is turned ON if the value of the call hit counter 460 at that point is j-1, which changes the corresponding bit of the X valid table 470 to 1. In other words, when the call hit signal from the branch history 410 is turned ON, the bit that the call hit counter 460 newly indicates is updated to 1.

When a return hit signal from the branch history 410 is turned ON, the XVj_SET signal is turned OFF if, at that point, all bits of the X valid table 470 more significant than bit j have the value 0, which changes the corresponding bit of the X valid table 470 to 0. In other words, when the return hit signal from the branch history 410 is turned ON, the most significant bit having the value 1 in the X valid table 470 is updated to 0.
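The two XVj_SET conditions can be sketched as a per-bit next-state computation. This is a hedged reading of FIG. 22, not its gate list: `next_x_valid` is a hypothetical name, the return-hit condition is read as "every bit more significant than XVj is 0" (which matches the "in other words" restatement, since clearing an already-0 bit is harmless), and call hit and return hit are assumed never to be ON in the same cycle.

```python
from typing import List


def next_x_valid(bits: List[int], call_hit: bool, return_hit: bool,
                 call_hit_ctr: int) -> List[int]:
    """Compute XV1..XVm for the next cycle; bits[j - 1] holds XVj."""
    nxt = list(bits)
    for j in range(1, len(bits) + 1):
        if call_hit and call_hit_ctr == j - 1:
            nxt[j - 1] = 1  # XVj_SET ON: the bit the counter newly indicates
        elif return_hit and not any(bits[j:]):
            nxt[j - 1] = 0  # XVj_SET OFF: every more significant bit is already 0
    return nxt


print(next_x_valid([0, 0, 0], call_hit=True, return_hit=False, call_hit_ctr=0))  # [1, 0, 0]
print(next_x_valid([1, 1, 1], call_hit=False, return_hit=True, call_hit_ctr=3))  # [1, 1, 0]
```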

In this way, each bit of the X valid table 470 takes the value 1 only when a valid value is present in the corresponding entry of the imaginary return address stack X, and takes the value 0 otherwise.

FIG. 23 is a logical circuit diagram of an output section of the X valid table 470. As shown in the figure, the USE_X signal output from the X valid table 470 is turned ON when at least one bit of the X valid table 470 has the value 1. A bit with the value 1 in the X valid table 470 means that a valid value is present in the corresponding entry of the imaginary return address stack X.
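If the X valid table is packed into an integer bitmask (an assumed encoding in which bit 0 is XV1), USE_X is an OR-reduction, and the most significant valid entry that a return hit clears falls out of `int.bit_length`. Both function names are hypothetical.

```python
def use_x(xv_mask: int) -> bool:
    """USE_X: ON when any of XV1..XVm is 1 (bit 0 of xv_mask is XV1)."""
    return xv_mask != 0


def newest_valid_entry(xv_mask: int) -> int:
    """1-based index j of the most significant XVj that is 1, or 0 if none."""
    return xv_mask.bit_length()


print(use_x(0b000), use_x(0b100))  # False True
print(newest_valid_entry(0b011))   # 2: XV2 is the newest valid entry
```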

Note that, when branch prediction fails, the return address stack 420, the return address stack X 440, the call hit counter 460, the X valid table 470, and the return stack pointer 480 are reset to their initial state.

Next, the operation of the branch predicting apparatus according to the present embodiment will be explained, taking as an example the case in which the instruction stream in FIG. 6 according to the second embodiment is executed. FIGS. 24A to 24I are explanatory diagrams of the operation of the branch predicting apparatus according to the present embodiment. Note that, in the present embodiment, all bits of the X valid table are assumed to be set to 0 at initialization.

FIG. 24A shows a scene in which a call instruction is detected at the address A. A+8 is stored in X-TOP at the top of the return address stack X as the address of the return destination, and the value of the return stack pointer is decremented by one to −1. In addition, the value of the call hit counter is incremented by one to 1, and the value 1 is set in XV1 of the X valid table, the bit indicated by the call hit counter.

Suppose that the branch history then detects a return instruction at the address C. Since a bit with the value 1 is present in the X valid table, the value stored in X-TOP, the latest valid entry in the return address stack X, that is, A+8, is acquired as the branch prediction result. A+8 is the correct branch destination of the return instruction at the address C.

FIG. 24B shows a scene after the return instruction is detected at the address C. In response to the detection of the return instruction, a pop operation is performed on the return address stack X, discarding the information in X-TOP. In addition, the value of the return stack pointer is incremented by one to 0. In the X valid table, the value 1 of the most significant bit XV1 changes to 0, and the X valid table records the fact that the corresponding imaginary entry has been used.

FIG. 24C shows a scene in which execution of the call instruction at the address A is subsequently completed. In response to the completion of the execution of the call instruction, the return destination address A+8 is stored in SKT0 at the top of the return address stack, and the value of the return stack pointer is incremented by one to 1. In addition, the value of the call hit counter is decremented by one to 0. In the X valid table, a shift operation is performed to discard the least significant bit.

FIG. 24D shows a scene in which a call instruction is subsequently detected at the address D. In response to the detection of the call instruction, the return destination address D+8 is stored in X-TOP at the top of the return address stack X, and the value of the return stack pointer is decremented by one to 0. In addition, the value of the call hit counter is incremented by one to 1, and the value 1 is set in XV1 of the X valid table, the bit indicated by the call hit counter.

FIG. 24E shows a scene in which a call instruction is subsequently detected at the address F. In response to the detection of the call instruction, the return destination address F+8 is pushed onto the return address stack X, and the value of the return stack pointer is decremented by one to −1. In addition, the value of the call hit counter is incremented by one to 2, and the value 1 is set in XV2 of the X valid table, the bit indicated by the call hit counter.

FIG. 24F shows a scene in which a call instruction is subsequently detected at the address H. In response to the detection of the call instruction, the return destination address H+8 is pushed onto the return address stack X. However, since both entries are already occupied, the old information D+8 is pushed out and lost. As a result, H+8 is stored in X-TOP and F+8 in X-NXT of the return address stack X. In addition, the value of the return stack pointer is decremented by one to −2, the value of the call hit counter is incremented by one to 3, and the value 1 is set in XV3 of the X valid table, the bit indicated by the call hit counter.

Suppose that the branch history then detects a return instruction at the address J. Since a bit with the value 1 is present in the X valid table, the value stored in X-TOP, the latest valid entry in the return address stack X, that is, H+8, is acquired as the branch prediction result. H+8 is the correct branch destination of the return instruction at the address J.

FIG. 24G shows a scene after the return instruction is detected at the address J. In response to the detection of the return instruction, a pop operation is performed on the return address stack X, and the information in X-NXT moves to X-TOP. In addition, the value of the return stack pointer is incremented by one to −1. In the X valid table, the value 1 of the most significant bit XV3 changes to 0, and the X valid table records the fact that the corresponding imaginary entry has been used.

Suppose that the branch history then detects a return instruction at the address K. Since a bit with the value 1 is present in the X valid table, the value stored in X-TOP, the latest valid entry in the return address stack X, that is, F+8, is acquired as the branch prediction result. F+8 is the correct branch destination of the return instruction at the address K.

FIG. 24H shows a scene after the return instruction is detected at the address K. In response to the detection of the return instruction, a pop operation is performed on the return address stack X, discarding the information in X-TOP. In addition, the value of the return stack pointer is incremented by one to 0. In the X valid table, the value 1 of the most significant bit XV2 changes to 0, and the X valid table records the fact that the corresponding imaginary entry has been used.

Suppose that the branch history then detects a return instruction at the address L. Since a bit with the value 1 is present in the X valid table, an attempt is made to acquire address information from X-TOP, the latest entry in the return address stack X. However, X-TOP is already empty, so no address information can be acquired. In this case, acquiring address information from the return address stack would not yield the correct branch destination either, so the predicted value of the branch history is adopted as the prediction result.

If the branch history predicts the correct address, the branch prediction succeeds. In the scene of FIG. 7H according to the second embodiment, which shows the same instruction execution state, branch prediction always fails; in the branch prediction system according to the present embodiment, falling back on the branch history in this situation can significantly improve the accuracy of branch prediction.

FIG. 24I shows a scene after the return instruction is detected at the address L. In response to the detection of the return instruction, a pop operation is performed on the return address stack X. In addition, the value of the return stack pointer is incremented by one to 1. In the X valid table, the value 1 of the most significant bit XV1 changes to 0, and the X valid table records the fact that all the imaginary entries have been used.
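The walkthrough of FIGS. 24A to 24I can be replayed with a small behavioral model. It is a sketch under stated assumptions, not the apparatus itself: the class and method names are hypothetical, addresses are symbolic strings, `"BH"` is a placeholder for the branch history's own prediction (the text does not say what it predicts at L), and the pointer-based lookup of the actual return address stack is simplified to "read its newest entry" because this section does not exercise that path.

```python
from collections import deque
from typing import Deque, List, Optional


class ReturnPredictorSketch:
    """Hypothetical behavioral model of the third embodiment (FIG. 24)."""

    def __init__(self, x_depth: int = 2, m: int = 4) -> None:
        self.x_depth = x_depth  # physical entries in return address stack X
        self.m = m              # XV1..XVm; must cover the outstanding calls
        self.reset()

    def reset(self) -> None:
        """Initial state; also restored when a branch prediction fails."""
        self.ras_x: Deque[str] = deque(maxlen=self.x_depth)  # right end = X-TOP
        self.ras: List[str] = []   # actual return address stack, last item = top
        self.rsp = 0               # return stack pointer
        self.ctr = 0               # call hit counter
        self.xv = [0] * self.m     # xv[0] is XV1, the least significant bit

    def call_hit(self, call_addr: str) -> None:
        """The branch history detected a call instruction at call_addr."""
        self.ras_x.append(f"{call_addr}+8")  # a full deque drops its oldest entry
        self.rsp -= 1
        self.ctr += 1
        self.xv[self.ctr - 1] = 1            # mark the newly indicated entry valid

    def call_completed(self, call_addr: str) -> None:
        """Execution of the call instruction at call_addr completed."""
        self.ras.append(f"{call_addr}+8")
        self.rsp += 1
        self.ctr -= 1
        self.xv = self.xv[1:] + [0]          # shift out the least significant bit

    def return_hit(self, history_target: Optional[str]) -> Optional[str]:
        """The branch history detected a return; returns the selected target."""
        if self.ctr >= 1 and any(self.xv):
            newest = max(i for i, bit in enumerate(self.xv) if bit)
            self.xv[newest] = 0              # this imaginary entry is now used
            # An entry lost to pushout leaves X-TOP empty: use the branch history.
            target = self.ras_x[-1] if self.ras_x else history_target
        else:
            # Simplified stand-in for the pointer-based lookup (assumption).
            target = self.ras[-1] if self.ras else history_target
        if self.ras_x:
            self.ras_x.pop()                 # pop return address stack X
        self.rsp += 1
        return target


p = ReturnPredictorSketch()
predictions = []
p.call_hit("A")                              # FIG. 24A
predictions.append(p.return_hit("BH"))       # return at C, FIG. 24B
p.call_completed("A")                        # FIG. 24C
for addr in ("D", "F", "H"):                 # FIGS. 24D-24F; D+8 is pushed out
    p.call_hit(addr)
for _ in ("J", "K", "L"):                    # returns at J, K, L; FIGS. 24G-24I
    predictions.append(p.return_hit("BH"))
print(predictions)           # ['A+8', 'H+8', 'F+8', 'BH']
print(p.rsp, p.ctr, p.xv)    # 1 3 [0, 0, 0, 0]
```

The return stack pointer values produced along the way (−1, 0, 1, 0, −1, −2, −1, 0, 1) match those stated for FIGS. 24A to 24I.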

Note that, although not explained here, as in the second embodiment, the accuracy of branch prediction by the branch prediction system according to the present embodiment improves further if execution of a call instruction is completed promptly.

As described above, according to the third embodiment, validity of entries in the imaginary return address stack X is managed by the management table consisting of a small number of bits. Thus, it is possible to perform branch prediction highly accurately while holding down the number of entries in the actual return address stack X.

As a comparison of FIGS. 12 and 22 shows, the branch prediction system according to the present embodiment achieves the same advantages as those explained for the second embodiment with a simpler mechanism.

Note that, in the explanation of the present embodiment, the bits in the X valid table are used in order starting from the least significant bit. However, the bits need not be used in this way. For example, it is also possible to use the bits in order starting from the most significant bit and to use them like a stack. It is also possible to invert the ON/OFF meaning of the bits relative to the present embodiment.

According to the present invention, when a valid entry is present in the second return address stack, an address of a branch destination is acquired from the second return address stack regardless of a value of the return stack pointer. Thus, there is an effect that it is possible to acquire address information stored in the first return address stack and the second return address stack in an appropriate order to perform highly accurate branch prediction.

Furthermore, according to the present invention, when plural valid entries are present in the second return address stack, valid address information stored last in the second return address stack is acquired. Thus, there is an effect that it is possible to perform highly accurate branch prediction even when there are plural call instructions that have been detected by the branch history but execution of which has not been completed.

Moreover, according to the present invention, an address of a branch destination is acquired from the first return address stack when valid address information is not stored in the second return address stack. Thus, there is an effect that it is possible to acquire address information stored in the first return address stack and the second return address stack in an appropriate order to perform highly accurate branch prediction.

Furthermore, according to the present invention, a prediction result of the branch history is used when valid address information is not stored in the first and the second return address stacks. Thus, there is an effect that it is possible to acquire address information stored in the first return address stack and the second return address stack and information of the branch history in an appropriate order to perform highly accurate branch prediction.

Moreover, according to the present invention, validity of an entry in an imaginary second return address stack is managed by the call-instruction-state holding unit. Thus, there is an effect that it is possible to perform branch prediction highly accurately while holding down the number of entries in an actual second return address stack.

Furthermore, according to the present invention, the number of call instructions, information on which is held by the call-instruction-state holding unit, is held in the counter. Thus, there is an effect that it is possible to simplify a mechanism of the call-instruction-state holding unit.

Moreover, according to the present invention, it is judged whether valid address information is stored in the second return address stack based on information stored in the call-instruction-state holding unit. Thus, there is an effect that it is possible to acquire address information stored in the first return address stack and the second return address stack and information of the branch history in an appropriate order to perform highly accurate branch prediction.

Furthermore, according to the present invention, validity of an entry in the imaginary second return address stack is managed by a management table consisting of a small number of bits. Thus, there is an effect that it is possible to perform branch prediction highly accurately while reducing the number of entries in the actual second return address stack to control an increase in cost.

Moreover, according to the present invention, various kinds of information for branch prediction are initialized when the branch prediction fails such that the branch prediction is not continued based on wrong information. Thus, there is an effect that it is possible to perform branch prediction highly accurately.

Although the invention has been described with respect to a specific embodiment for a complete and clear disclosure, the appended claims are not to be thus limited but are to be construed as embodying all modifications and alternative constructions that may occur to one skilled in the art which fairly fall within the basic teaching herein set forth.

Yokoi, Megumi

Assignment: Fujitsu Limited (assignment on the face of the patent), executed Jan 15 2010.
Date Maintenance Fee Events
Aug 22 2011: REM: Maintenance Fee Reminder Mailed.
Sep 06 2011: M1551: Payment of Maintenance Fee, 4th Year, Large Entity.
Sep 06 2011: M1554: Surcharge for Late Payment, Large Entity.
Jul 01 2015: M1552: Payment of Maintenance Fee, 8th Year, Large Entity.
Jul 04 2019: M1553: Payment of Maintenance Fee, 12th Year, Large Entity.


Date Maintenance Schedule
Jun 14 2014: 4 years fee payment window open
Dec 14 2014: 6 months grace period start (w surcharge)
Jun 14 2015: patent expiry (for year 4)
Jun 14 2017: 2 years to revive unintentionally abandoned end. (for year 4)
Jun 14 2018: 8 years fee payment window open
Dec 14 2018: 6 months grace period start (w surcharge)
Jun 14 2019: patent expiry (for year 8)
Jun 14 2021: 2 years to revive unintentionally abandoned end. (for year 8)
Jun 14 2022: 12 years fee payment window open
Dec 14 2022: 6 months grace period start (w surcharge)
Jun 14 2023: patent expiry (for year 12)
Jun 14 2025: 2 years to revive unintentionally abandoned end. (for year 12)