A method for eliminating a branch instruction in a control flow path of a computer program. The method includes providing a computer program having a plurality of basic blocks forming control flow paths, determining a cost of executing a branch instruction terminating a basic block in one of the control flow paths, determining a cost of combining basic blocks when merging the control flow paths, and eliminating the branch instruction from the basic block when the cost of executing the branch instruction is greater than the cost of combining the basic blocks in merging the control flow paths.
41. A computer system comprising: a memory for storing a computer program including combined basic blocks having a combined cost less than a cost of executing a branch instruction when the basic blocks are uncombined, wherein the combined cost includes a resource height increase and a dependence height increase of the combined basic blocks; and a processor for executing the computer program, and wherein the basic blocks are combined if the cost of executing the branch instruction when the blocks are uncombined is greater than the larger of the resource height increase and dependence height increase.
44. A computer system comprising: a memory for storing a computer program including merged basic blocks having an increase in height from being merged which is less than a cost of a branch instruction when the basic blocks are unmerged, wherein the increase in height includes a resource height increase and a dependence height increase of the merged basic blocks, and wherein the basic blocks are merged if the cost of the branch instruction when the basic blocks are unmerged is greater than the larger of the resource height increase and dependence height increase; and a processor for executing the computer program.
31. A computer implemented method for merging control flow paths in a computer program, comprising:
providing a computer program having a plurality of basic blocks forming control flow paths;
determining an increase in height of the basic blocks when merged;
determining that a cost of a branch instruction is more than the increase in height of the basic blocks when merged, wherein the increase in height includes a resource height increase and a dependence height increase of the basic blocks when merged; and
combining contents of the basic blocks to merge control flow paths, if the cost of the branch instruction is more than the larger of the resource height increase and dependence height increase.
1. A computer-implemented method for eliminating a branch instruction in a control flow path of a computer program, comprising:
providing a computer program having a plurality of basic blocks forming control flow paths;
determining a cost of executing a branch instruction in one of the control flow paths, where the cost of executing the branch instruction is based on a predictive ratio of a branch, a mispredict ratio of the branch, and a correctly predicted taken branch penalty in cycles for a processor;
determining a cost of combining basic blocks in merging the control flow paths; and
eliminating the branch instruction from one control flow path whose cost of execution is greater than the cost of combining basic blocks in merging the control flow paths.
49. A computer system for merging control flow paths in a computer program, comprising: a memory for storing a computer program having a plurality of basic blocks forming control flow paths and a processor for executing the computer program; and scheduling means for determining an increase in height of the basic blocks when merged, wherein the increase in height includes a resource height increase and a dependence height increase of the basic blocks when merged, for determining that a cost of a branch instruction is more than the increase in height of the basic blocks when merged, and for combining contents of the basic blocks to merge control flow paths if the cost of the branch instruction is more than the larger of the resource height increase and dependence height increase.
47. A computer system for eliminating a branch instruction in a control flow path of a computer program, comprising:
a memory for storing a computer program having a plurality of basic blocks forming control flow paths, and a processor for executing the computer program;
means for determining a cost of executing a branch instruction in one of the control flow paths;
means for determining a cost of combining basic blocks in merging the control flow paths, wherein the cost of combining the basic blocks includes a resource height increase and a dependence height increase of the combined basic blocks; and
means for eliminating the branch instruction from one control flow path if the cost of execution of the one control path is greater than the larger of the resource height increase and dependence height increase.
15. A computer implemented method for producing a computer control flow path comprising:
providing a computer program having a plurality of basic blocks forming control flow paths;
selecting a first control flow path having at least one first basic block, leaving a second control flow path having at least one second basic block;
determining a cost of executing the first basic block in the first control path and a cost of executing the second basic block in the second control flow path;
determining that a cost of combining the first basic block and the second basic block is less than a cost of executing a branch instruction of the first control flow path, where the cost of combining the first and second basic blocks includes a resource height increase and a dependence height increase of the combined first and second basic blocks;
combining the first basic block and the second basic block to merge at least part of the second control flow path with the first control flow path to produce a computer control flow path having at least one combined basic block, if the cost of executing the branch instruction of the first control flow path is greater than the larger of the resource height increase and dependence height increase.
48. A computer system for producing a computer control flow path in a computer program, comprising:
a compilation means for:
selecting in a computer program a first control flow path having a branch instruction and at least one first basic block, leaving a second control flow path having at least one second basic block;
determining a cost of executing the first basic block in the first control path and a cost of executing the second basic block in the second control flow path;
determining that a cost of combining the first basic block and the second basic block is less than a cost of executing the branch instruction of the first control flow path, wherein the cost of combining the first and second basic blocks includes a resource height increase and a dependence height increase of the combined first and second basic blocks;
combining the first basic block and the second basic block to merge at least part of the second control flow path with the first control flow path to produce a computer control flow path having at least one combined basic block, if the cost of executing the branch instruction of the first control path is greater than the larger of the resource height increase and dependence height increase; and
a memory for storing the compilation means, and a processor for executing the compilation means.
2. The computer implemented method of
3. The computer implemented method of
4. The computer implemented method of
5. The computer implemented method of
6. The computer implemented method of
7. The computer implemented method of
8. The computer implemented method of
9. The computer implemented method of
10. The computer implemented method of
13. The computer implemented method of
14. The computer implemented method of
16. The computer implemented method of
17. The computer implemented method of
18. The computer implemented method of
19. The computer implemented method of
20. The computer implemented method of
21. The computer implemented method of
22. The computer implemented method of
23. The computer implemented method of
26. The computer implemented method of
27. The computer implemented method of
28. The computer implemented method of
29. The computer implemented method of
30. The computer implemented method of
32. The computer implemented method of merging of
33. The computer implemented method of merging of
34. The computer implemented method of merging of
35. The computer implemented method of
36. The computer implemented method of merging of
37. The computer implemented method of
38. The computer implemented method of
39. The computer implemented method of
40. The computer implemented method of
42. The computer system of
43. The computer system of
45. The computer system of
46. The computer system of
1. Field of the Invention
Embodiments of the present invention relate generally to computer systems. More particularly, embodiments of the present invention relate to a system and method for eliminating branch instructions and/or merging control flow paths of a computer program in a computer-based environment.
2. Description of the Background Art
Work has been performed in the area of predicated execution for computer programs. A published algorithm for branch elimination appears in the Ph.D. dissertation by Scott A. Mahlke entitled Exploiting Instruction-Level Parallelism in the Presence of Conditional Branches, Department of Electrical and Computer Engineering, University of Illinois, Urbana, Ill., September, 1996, fully incorporated herein by reference thereto. The objective of Mahlke's approach is to merge control flow paths into “hyperblocks” that are as large as possible. A “hyperblock”, as defined by Mahlke, is a collection of connected basic blocks in which control may only enter through the first block, referred to as the entry block. Control flow may leave from any number of blocks in the hyperblock. All control flow between basic blocks in a hyperblock is removed via if-conversion. The goal of hyperblocks is to intelligently group basic blocks from many different control flow paths into a single manageable block for compiler optimization and scheduling. The formation of hyperblocks is necessary for Mahlke's approach because the IMPACT compiler, in which Mahlke's approach was implemented, does not contain an instruction scheduler capable of moving code across basic block boundaries.
To achieve a desired combined path, Mahlke's algorithm enumerates all possible control flow paths through the scheduling region of the computer program and computes a priority function for each control flow path. A disadvantage of Mahlke's approach is that it must find all possible control flow paths, the number of which can grow exponentially with the number of split or bifurcation points in the region (for example, a sequence of n two-way splits yields 2^n paths).
Another disadvantage of Mahlke's approach is that his aggressive if-conversion routine results in oversubscription of computer resources. In an attempt to solve this problem, David I. August, Wen-mei W. Hwu, and Scott A. Mahlke, in an article entitled A Framework for Balancing Control Flow and Predication, published in the Proceedings of the 30th International Symposium on Microarchitecture, December, 1997, and fully incorporated herein by reference thereto, propose a technique that involves iteratively removing control flow paths and rescheduling hyperblocks until the resources are no longer oversubscribed.
Embodiments of the present invention provide a method for eliminating a branch instruction in a control flow path of a computer program. The method comprises providing a computer program having a plurality of basic blocks forming control flow paths. The computer program may additionally comprise a scheduling region having an entry basic block, at least one exit basic block, and the basic blocks positioned between the entry basic block and the exit basic block of the scheduling region. The method also comprises determining a cost of executing a branch instruction in one of the control flow paths, determining a cost of combining basic blocks in merging the control flow paths, and eliminating the branch instruction from that control flow path when the branch instruction's cost of execution is greater than the cost of combining basic blocks in merging the control flow paths. The branch instruction may terminate one of the basic blocks (e.g., an entry basic block). The method may additionally comprise determining, prior to eliminating, that the branch instruction has a cost of execution that is greater than the cost of combining at least one basic block of one control flow path with at least one basic block of another control flow path. The cost of executing the branch instruction is also greater than the cost of a height increase from combining basic blocks in merging control flow paths, where the height is a resource height or a dependence height. The method may further comprise selecting, as the height increase, the larger of a resource height increase and a dependence height increase. Combining basic blocks in merging control flow paths comprises combining at least one basic block assigned to a first control flow path with at least one basic block assigned to a second control flow path. A computer program produced in accordance with one or more of these methods is also provided under embodiments of the present invention.
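The central decision described above can be sketched as a single comparison. This is a minimal illustration, assuming the branch cost and the two height increases have already been estimated in cycles; the function and parameter names are hypothetical and are not taken from any actual compiler.

```python
def should_eliminate_branch(branch_cost: float,
                            resource_height_increase: float,
                            dependence_height_increase: float) -> bool:
    """Return True when merging the paths (removing the branch) is expected to pay off.

    The height increase charged for merging is the larger of the two
    increases, since either resource pressure or the dependence chain
    may end up limiting the merged block.
    """
    height_increase = max(resource_height_increase, dependence_height_increase)
    # Strict comparison: on a tie the branch is kept.
    return branch_cost > height_increase


if __name__ == "__main__":
    # Hypothetical estimates, in cycles.
    print(should_eliminate_branch(3.0, 1.5, 2.0))  # True: the branch costs more
    print(should_eliminate_branch(1.0, 0.5, 2.0))  # False: merging costs more
```

Using the maximum rather than the sum reflects the text's choice of "the larger of" the two increases.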
Embodiments of the present invention also provide a method for producing a computer control flow path comprising providing a computer program having a plurality of basic blocks forming control flow paths. As indicated above, the computer program may additionally comprise a scheduling region having an entry basic block, at least one exit basic block, and the basic blocks disposed between the entry basic block and the exit basic block. A first control flow path having at least one first basic block is selected, leaving a second control flow path having at least one second basic block. The method for producing a computer control flow path additionally comprises determining a cost of executing the first basic block in the first control flow path and a cost of executing the second basic block in the second control flow path, determining that a cost of combining the first basic block and the second basic block is less than a cost of executing a branch instruction of the first control flow path, and combining the first basic block and the second basic block to merge at least part of the second control flow path with the first control flow path to produce a computer control flow path having at least one combined basic block. The produced computer control flow path may extend within the scheduling region from the entry basic block to the exit basic block of the scheduling region. The cost of executing the branch instruction is greater than the cost of a height increase from combining the first basic block and the second basic block to merge at least part of the second control flow path with the first control flow path. The method, which may be partly or fully combined with other embodiments of the present invention, may further comprise selecting from the combined basic block a third control flow path having at least one third basic block. The third control flow path may comprise at least a subset of the second control flow path.
A computer program produced in accordance with this method is also provided under embodiments of the present invention.
Another embodiment of the present invention provides a method for merging control flow paths in a computer program. The method comprises providing a computer program having a plurality of basic blocks forming control flow paths. For this embodiment, as for the other embodiments, the computer program may also have a scheduling region including an entry basic block and an exit basic block, with the basic blocks disposed between the entry basic block and the exit basic block of the scheduling region. The method, which may be partly or fully combined with other embodiments of the present invention, further includes determining an increase in height of the basic blocks when merged, determining that a cost of a branch instruction is more than the increase in height of the basic blocks when merged, and combining contents of the basic blocks to merge control flow paths. The basic blocks may comprise a first basic block having a first instruction and a second basic block having a second instruction. The height increase of the first and second basic blocks may be the difference between the height of the merged first and second basic blocks and the height of the unmerged first and second basic blocks. The height of the merged first and second basic blocks includes the total number of cycles for the first and second basic blocks when merged, times the sum of the predicted ratio of the first basic block and the predicted ratio of the second basic block. The height of the unmerged first and second basic blocks may include (the predicted ratio of the first basic block times the number of cycles for the first basic block) plus (the predicted ratio of the second basic block times the number of cycles for the second basic block). The height of a respective basic block may be a resource height or a dependence height. A computer program produced in accordance with this method is also provided under embodiments of the present invention.
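The height-increase computation described above can be sketched as follows. This sketch reads "times a predicted ratio of the first basic block and a predicted ratio of the second basic block" as multiplying the merged cycle count by the sum of the two predicted ratios, so that the merged and unmerged heights are comparable expected cycle counts. That reading, the meaning assigned to "predicted ratio", and all names in the code are assumptions for illustration. The same function applies whether the cycle counts are resource heights or dependence heights.

```python
from dataclasses import dataclass


@dataclass
class BlockHeight:
    predicted_ratio: float  # assumed: how often control reaches this block (0..1)
    cycles: float           # the block's resource height or dependence height


def height_increase(first: BlockHeight, second: BlockHeight, merged_cycles: float) -> float:
    """Expected extra cycles from merging two blocks into one predicated block."""
    # Merged: the combined block runs whenever either original block would have run
    # (assumed reading: merged cycles times the sum of the predicted ratios).
    merged = merged_cycles * (first.predicted_ratio + second.predicted_ratio)
    # Unmerged: each block contributes its own cycles, weighted by how often it runs.
    unmerged = (first.predicted_ratio * first.cycles
                + second.predicted_ratio * second.cycles)
    return merged - unmerged


if __name__ == "__main__":
    # Hypothetical then/else blocks taken 60% / 40% of the time.
    res_inc = height_increase(BlockHeight(0.6, 4), BlockHeight(0.4, 3), merged_cycles=5)
    dep_inc = height_increase(BlockHeight(0.6, 6), BlockHeight(0.4, 2), merged_cycles=6)
    # The larger increase is the one compared against the branch cost.
    print(round(res_inc, 2), round(dep_inc, 2), round(max(res_inc, dep_inc), 2))
```

Computing the resource and dependence increases separately and then taking the larger keeps the two measures independent until the final comparison.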
Additional embodiments of the present invention provide articles of manufacture. In one embodiment an article of manufacture comprises a computer-readable medium having instructions for: determining a cost of executing a branch instruction in one of a plurality of control flow paths of a computer program, determining a cost of combining basic blocks in merging the control flow paths of the computer program, and eliminating the branch instruction from the control flow path whose cost of execution is greater than the cost of combining basic blocks in merging the control flow paths. In an additional embodiment an article of manufacture comprises a computer-readable medium having instructions for selecting from a computer program a first control flow path having at least one first basic block, leaving a second control flow path having at least one second basic block, determining a cost of executing the first basic block in the first control path and the cost of executing the second basic block in the second control flow path, determining that a cost of combining the first basic block and the second basic block is less than a cost of executing a branch instruction of the first control flow path, and combining the first basic block and the second basic block to merge at least part of the second control flow path with the first control flow path to produce a computer control flow path having at least one combined basic block. In a further embodiment for an article of manufacture, an article of manufacture comprises a computer-readable medium having instructions for: determining from basic blocks of control flow paths of a computer program an increase in height of the basic blocks when merged, determining that a cost of a branch instruction is more than the increase in height of the basic blocks when merged, and combining contents of the basic blocks to merge control flow paths.
Further embodiments of the present invention provide a computer system. In one embodiment a computer system comprises a computer program including combined basic blocks having a combined cost less than a cost of executing a branch instruction when the basic blocks are uncombined. The computer system may additionally comprise a compilation system having the computer program and/or an instruction scheduler having the computer program. Another embodiment of the computer system comprises a computer program including merged basic blocks having an increase in height from being merged which is less than a cost of a branch instruction when the basic blocks are unmerged. The computer system additionally comprises a compilation system including a compiler having the computer program. The compiler may comprise an instruction scheduler having the computer program.
In at least one additional embodiment, a computer system for eliminating a branch instruction in a control flow path of a computer program, comprises a computer program having a plurality of basic blocks forming control flow paths containing branch instructions, means for determining a cost of executing a branch instruction in one of the control flow paths, means for determining a cost of combining basic blocks in merging the control flow paths, and means for eliminating the branch instruction from one control flow path whose cost of execution is greater than the cost of combining basic blocks in merging the control flow paths. In at least one further additional embodiment a computer system for producing a computer control flow path in a computer program comprises a compilation means for: selecting in a computer program a first control flow path having a branch instruction and at least one first basic block, leaving a second control flow path having at least one second basic block, determining the cost of executing the first basic block in the first control path and the cost of executing the second basic block in the second control flow path, determining that a cost of combining the first basic block and the second basic block is less than a cost of executing the branch instruction of the first control flow path, and combining the first basic block and the second basic block to merge at least part of the second control flow path with the first control flow path to produce a computer control flow path having at least one combined basic block.
Another embodiment includes a computer system for merging control flow paths in a computer program comprising a computer program having a plurality of basic blocks forming control flow paths; and scheduling means for determining an increase in height of the basic blocks when merged, for determining that a cost of a branch instruction is more than the increase in height of the basic blocks when merged, and for combining contents of the basic blocks to merge control flow paths.
These provisions together with the various ancillary provisions and features which will become apparent to those artisans possessing skill in the art as the following description proceeds are attained by devices, assemblies, systems and methods of embodiments of the present invention, various embodiments thereof being shown with reference to the accompanying drawings, by way of example only, wherein:
In the description herein, numerous specific details are provided, such as examples of components and/or methods, to provide a thorough understanding of embodiments of the present invention. One skilled in the relevant art will recognize, however, that an embodiment of the invention can be practiced without one or more of the specific details, or with other apparatus, systems, assemblies, methods, components, materials, parts, and/or the like. In other instances, well-known structures, materials, or operations are not specifically shown or described in detail to avoid obscuring aspects of embodiments of the present invention.
A “computer” for purposes of embodiments of the present invention may be any processor-containing device, such as a mainframe computer, a personal computer, a laptop, a notebook, a microcomputer, a server, or any of the like. A “computer program” may be any suitable program or sequence of coded instructions which are to be inserted into a computer, as is well known to those skilled in the art. Stated more specifically, a computer program is an organized list of instructions that, when executed, causes the computer to behave in a predetermined manner. A computer program contains a list of ingredients (called variables) and a list of directions (called statements) that tell the computer what to do with the variables. The variables may represent numeric data, text, or graphical images.
A “computer-readable medium” for purposes of embodiments of the present invention may be any medium that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. The computer-readable medium can be, by way of example only but not by limitation, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, propagation medium, or computer memory.
Referring now to
Computer memory 16 may be any suitable memory storage device, including random access memory (RAM), cache memory, magnetic media such as a resident hard disk, or other memory storage devices. The term “storage” may refer to computer resources, such as the computer memory 16, that may be employed to store data or instructions used in executing a computer program. For exemplary purposes only and as best illustrated in
The compilation system 40 for various embodiments of the invention would comprise a compiler, a special program that processes statements written in a particular programming language and turns them into machine language or “code” that a processor, such as processor 14, uses. Typically, a programmer writes language statements in a language such as “Pascal” or “C” one line at a time using an editor. The file that is created contains what are called “source statements” or “source code”. The programmer then runs the appropriate language compiler, specifying the name of the file that contains the source statements. When the compiler runs, it first parses (or analyzes) all of the language statements syntactically one after the other and then, in one or more successive stages or “passes”, builds the output code, making sure that statements that refer to other statements are referred to correctly in the final code. Traditionally, the output of the compilation has been called object code or sometimes an object module. The object code is machine code that the processor of the computer can process or “execute” one instruction at a time. Stated alternatively, the compiler translates source code into object code, particularly by looking at the entire piece of source code and collecting and reorganizing the instructions. Compilers may include schedulers, such as instruction scheduler 50, for instruction scheduling. The scheduler is the compiler phase that orders instructions for a pipelined, superscalar, or VLIW architecture so as to maximize the number of function units operating in parallel and to minimize the time they spend waiting for each other.
Examples of instruction scheduling performed by schedulers include, but are not limited to: filling a delay slot, interspersing floating-point instructions with integer instructions to keep both units operating, making adjacent instructions independent (e.g., separating an instruction that writes a register from one that reads it), and separating memory writes to avoid filling the write buffer.
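To make the scheduler's goal of keeping function units busy concrete, the following is a minimal greedy list scheduler for a single basic block. It is a generic textbook-style sketch, not the instruction scheduler 50: it assumes identical, fully pipelined function units, a fixed latency of at least one cycle per instruction, and an input order in which every instruction appears after its predecessors. All names are hypothetical.

```python
from typing import Dict, List


def list_schedule(latency: Dict[str, int],
                  preds: Dict[str, List[str]],
                  order: List[str],
                  units: int) -> Dict[str, int]:
    """Assign each instruction an issue cycle, issuing at most `units` per cycle."""
    issue: Dict[str, int] = {}
    pending = list(order)  # candidates, considered in priority (input) order
    cycle = 0
    while pending:
        issued_now = 0
        for instr in list(pending):  # iterate over a copy so we can remove safely
            if issued_now == units:
                break
            # Ready when every predecessor has issued and its result is available.
            ready = all(p in issue and issue[p] + latency[p] <= cycle
                        for p in preds.get(instr, []))
            if ready:
                issue[instr] = cycle
                pending.remove(instr)
                issued_now += 1
        cycle += 1
    return issue


if __name__ == "__main__":
    latency = {"load": 2, "add": 1, "mul": 3, "store": 1}
    preds = {"add": ["load"], "store": ["add", "mul"]}
    print(list_schedule(latency, preds, ["load", "mul", "add", "store"], units=2))
    # {'load': 0, 'mul': 0, 'add': 2, 'store': 3}
```

With two units, the independent `load` and `mul` issue together in cycle 0; with one unit, `mul` slips to cycle 1 and `store` waits until cycle 4 for its result.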
Continuing to refer to
The module 36 may be implemented in any suitable programming language, or in any combination of software, hardware, or firmware. Thus, the module 36 may include instructions and data and be embodied in a computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as the computer system 10, which may retrieve and execute suitable instructions for operation. Any function ascribed to the module 36 and any of its associated functional files, whether implemented in software, hardware, firmware, or any combination thereof, may be included in the functions of the O.S. 38, since the O.S. 38 may include files from the module 36. In some instances, the functions ascribed to the module 36 may be performed by the processor 14 executing suitable software instructions in cooperation with aspects of the O.S. 38 that may incorporate the module 36. Therefore, it is to be understood that the module 36 may cooperate with aspects of the O.S. 38.
It will be appreciated by those skilled in the relevant art that the term “execute” may mean the process of manipulating code, such as software, for operation on the computer system 10. It will be further appreciated by those skilled in the relevant art that the term “code” may refer to any suitable instructions or data used by the computer system 10 for the purpose of generating instructions that can execute in the computer system 10. As indicated, the term “module” may refer to a software “procedure” or “function” such as a unit of code that may be independently compiled. Thus, a “computer program” may contain not only suitable software program code, but may also contain at least one module 36, and may be independently compiled and executed.
The emulator 44, as well as the compilation system 40 and the O.S. 38, may reside in the computer system 10, more particularly in the computer memory 16 of the computer system 10. For any original instruction, the emulator 44 may substitute an instruction typically associated with a computer system other than the executing computer system 10. Any substitute instruction may be associated with a hardware, software, or firmware representation of a different computer system.
The data storage device 18 may be any suitable storage device, including a compact disk drive, a tape drive, a removable hard disk drive, or a diskette drive. The data storage device 18 may communicate with the I/O adapter 20, which in turn communicates with other components of the computer system 10, in order to retrieve and store data used by the computer system 10. The data storage device 18 typically includes a computer storage medium having stored therein a computer software program and data.
The computer system 10 for embodiments of the present invention includes suitable input/output devices for accepting input information and outputting generated information. Input/output devices may include any suitable storage device, such as a compact disk drive, a tape drive, a removable hard disk drive, or a diskette drive. Suitable input devices include, by way of example only, the keyboard 28, the mouse 30, a touch-screen display (not shown), a touch pad (not shown), a microphone including a voice recognition device (not shown), a network card (not shown), or a modem (not shown). The input devices may communicate with the user interface adapter 26, which in turn communicates with components in the computer system 10 for processing input and output commands. Program code may typically be loaded through a suitable input device and may be stored on the data storage device 18. A copy of the program code, or any portion thereof, may alternatively be placed by the processor 14 into the computer memory 16 for subsequent execution on the computer system 10.
Output devices may include any suitable output devices for presenting generated information to a user, whether a human or a machine, and whether local or remote. Such devices may include, by way of example only, the computer monitor 34, a printer (not shown), an audio speaker with a voice synthesis device (not shown), a network card (not shown), or a modem (not shown). Output devices, such as the monitor 34, may communicate with other components in the computer system 10 through the display adapter 32.
The computer system 10 for various embodiments of the present invention may communicate with communications network 24 via the communications adapter 22, such as a networking card. The communications network 24 may be a local area network, a wide area network, or any other suitable computer network. Any suitable input/output device employed by the module 36 may be coupled to the communications network 24 through the communications adapter 22 and therefore need not be co-located with the computer system 10. Similarly, other portions of the computer system 10, such as the data storage device 18 and the monitor 34, may be coupled to the communications network 24 through the communications adapter 22 and likewise need not be co-located with the computer system 10.
Referring now to
The computer system 10 may manage the processing of the source code 62 through the O.S. 38, which may direct the source code 62 through the compiler optimizer 60, which may generate intermediate code 66 from the source code 62. The intermediate code 66 typically is a list of intermediate-level instructions. Alternatively, the compiler optimizer 60 may generate object code 70 that includes appropriate optimization changes, which could be generated in accordance with various embodiments of the present invention, such as, by way of example, the compiler optimizer features which comprise, inter alia, a system and method for eliminating branch instructions and/or creating and/or merging control flow paths of a computer program.
The compiler optimizer 60, which may be a low-level optimizer, performs if-conversion. More specifically, the compiler optimizer 60 for various embodiments of the present invention performs if-conversion on control flow paths of a computer program, determining when it is beneficial to eliminate branch instructions, such as those terminating a basic block, and to merge control flow paths together using predication. In the compiler optimizer 60, if-conversion is performed immediately before instruction scheduling in the instruction scheduler 50.
For various embodiments of the present invention, the instruction scheduler 50 in the compiler optimizer 60 operates on regions (e.g., scheduling regions) of a computer program. For additional embodiments, the regions operated on by the compiler optimizer may be single-entry, multiple-exit subgraphs of the control flow graph of the computer program. The instruction scheduler 50 may operate globally across a computer program and may move program instructions across one or more basic block boundaries. For further embodiments, the phase ordering within the instruction scheduler 50 may be predicate materialization, then dependence graph construction, followed by branch elimination. The branch elimination process for embodiments of the present invention operates on control flow paths in a computer program, such as control flow paths within a scheduling region.
The output of the compiler optimizer 60 is preferably optimized object code 70 which may then be transmitted to a suitable linker 74 for resolving any undefined computer location references in the optimized object code 70 and for generating executable code 78 that is capable of executing on an output multi-purpose computer system, such as computer system 10, with appropriate input/output devices, such as the keyboard 28 and the mouse 30. It will be appreciated by those artisans skilled in the relevant art that the input of the computer system 10, and the output of the computer system 10, may both be the same, common computer system 10 and are not to be limited to the exemplary configuration disclosed and illustrated.
As previously mentioned, in the compilation system 40, which preferably includes the compiler optimizer 60 having the instruction scheduler 50, “if-conversions” in a computer program are preferably performed before any scheduling is conducted by the instruction scheduler 50. “If-conversion” replaces control dependences on computer program branches with data dependences on predicates. In
Block 94 as best shown in
Block 98 represents the code in block 94 after application of if-conversion. All branches have been eliminated. The instruction setting “x” to 1 is guarded by predicate “p1” which is true if “a” is not equal to 0. The instruction setting “x” to 0 is guarded by predicate “p2” which is true if “a” is equal to 0.
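To make the before-and-after of blocks 94 and 98 concrete, the following Python sketch models if-conversion. It assumes, from the description of block 98, that block 94 corresponds to `if (a != 0) x = 1; else x = 0;`. Python has no predicated instructions, so the hypothetical helper `guarded` only models a predicated write: the value is written when its predicate is true and the destination is otherwise left unchanged. Both functions should return the same value for every input; the second expresses the choice through a pair of predicates instead of a branch between two blocks.

```python
def branchy(a: int) -> int:
    # Before if-conversion: a conditional branch selects one of two blocks.
    if a != 0:
        x = 1
    else:
        x = 0
    return x


def guarded(predicate: bool, new_value, old_value):
    # Models a predicated instruction: the write takes effect only when the
    # guarding predicate is true; otherwise the destination keeps its value.
    return new_value if predicate else old_value


def if_converted(a: int) -> int:
    # After if-conversion: one compare sets a complementary predicate pair,
    # and both assignments are issued, each guarded by its own predicate.
    p1 = a != 0   # p1 is true if a is not equal to 0
    p2 = a == 0   # p2 is true if a is equal to 0
    x = None      # x is undefined until one guarded write takes effect
    x = guarded(p1, 1, x)   # (p1) x = 1
    x = guarded(p2, 0, x)   # (p2) x = 0
    return x
```

For example, `if_converted(7)` returns 1 and `if_converted(0)` returns 0, matching `branchy`.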
The goal is to eliminate potentially difficult-to-predict computer program branches by merging certain control flow paths, or at least subsets thereof. The benefit of merging selected control flow paths is avoiding branch misprediction penalties. A misprediction penalty is the cost of mispredicting the direction of a branch instruction; more specifically, it is typically the number of processor cycles required for the CPU to detect the misprediction and begin executing instructions along the correct control flow path.
Thus, the computer system 10, preferably the compilation system 40 including the compiler optimizer 60, performs “if-conversions” before the instruction scheduler 50 performs instruction scheduling. More specifically, the computer system 10 including the associated compiler optimizer 60 performs “if-conversions” when it is beneficial to eliminate certain computer program branches and merge control flow paths, or at least subsets thereof, before any instruction scheduling and phase ordering takes place within the instruction scheduler 50. It has been discovered that it is beneficial to merge control flow paths, or at least subsets thereof, including their associated basic blocks, when the cost of a computer program branch in a control flow path (i.e., the control flow path that is potentially to be eliminated) is greater than the cost of increasing the number of instructions (e.g., increasing the size of a basic block, or augmenting the instructions of a basic block with instructions from, or by merging with, another basic block) in the control flow path that is not eliminated and that receives the merged instructions (i.e., the critical control flow path). Stated alternatively, it is beneficial to merge control flow paths, or at least subsets thereof, including the associated basic blocks, when the branch cost in a particular control flow path (i.e., the control flow path that is a candidate for elimination) is greater than the cost of the height increase of the merged or combined basic block (i.e., a basic block that has been at least partially combined with another, eliminated, basic block) in the non-eliminated control flow path.
A “basic block” for purposes of various embodiments of the present invention may be a sequence of statements or instructions in a computer program, as is well known to those skilled in the art, especially in the art of compilers. More specifically, a “basic block” may be a sequence of consecutive statements or instructions in which flow of control enters at the beginning and leaves at the end without halting or branching except at the end. A basic block includes a “branch instruction” for determining the next basic block to be executed. Also for purposes of embodiments of the present invention, “cost” may be defined as the number of CPU or processor cycles required to execute a computer instruction or group of computer instructions. The cost of a computer program branch depends on the frequency of execution of the branch, the ability of the microprocessor to predict the branch target correctly, and the penalties associated with both incorrect predictions (mispredictions) and correct predictions.
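The basic-block definition above can be illustrated with the classic leader-based partitioning used in compilers (a standard technique, not one described in this specification). In this Python sketch, the hypothetical `Instr` record marks a branch by giving it a target index. A leader is the first instruction, any branch target, or any instruction immediately following a branch, and each basic block runs from one leader up to the next. In this sketch a block can also end by falling through into the next leader, as block `[4]` does in the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instr:
    text: str
    target: Optional[int] = None  # index of the branch target; None if not a branch


def basic_blocks(code: List[Instr]) -> List[List[int]]:
    # Collect leaders: the first instruction, every branch target, and
    # every instruction that immediately follows a branch.
    leaders = {0}
    for i, ins in enumerate(code):
        if ins.target is not None:
            leaders.add(ins.target)
            if i + 1 < len(code):
                leaders.add(i + 1)
    starts = sorted(idx for idx in leaders if idx < len(code))
    # Each block spans from one leader up to (not including) the next.
    blocks = []
    for k, start in enumerate(starts):
        end = starts[k + 1] if k + 1 < len(starts) else len(code)
        blocks.append(list(range(start, end)))
    return blocks


code = [
    Instr("p = (a != 0)"),
    Instr("if !p goto 4", target=4),
    Instr("x = 1"),
    Instr("goto 5", target=5),
    Instr("x = 0"),
    Instr("return x"),
]
print(basic_blocks(code))  # [[0, 1], [2, 3], [4], [5]]
```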
Processors predict computer program branch targets through hardware mechanisms that predict the direction a branch will take. Such hardware mechanisms also mispredict the direction of a branch at a rate that depends on both the particular mechanism employed and the behavior of the particular branch. Penalties associated with mispredictions include the number of CPU cycles required to detect the misprediction and begin executing instruction(s) on the correct path. Penalties associated with correct predictions include the number of cycles required to begin executing instructions at the target of a correctly predicted taken branch.
The cost of a computer program branch may be determined by the following formula:
$$\mathrm{BranchCost}(b) = \mathrm{TR}(b) \times \bigl(1 - \mathrm{MPR}(b)\bigr) \times \mathrm{CPTBP}(m) + \mathrm{MPR}(b) \times \mathrm{MPP}(m)$$
For various embodiments of the present invention, it has been empirically determined that:
$$\mathrm{MPR}(b) = -1.04357 \times \mathrm{TR}(b)^{2} + 1.1987 \times \mathrm{TR}(b) + 0.0112$$
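A minimal Python sketch of the two formulas above, assuming the branch-cost formula reads TR(b) × (1 − MPR(b)) × CPTBP(m) + MPR(b) × MPP(m), so that correctly predicted taken branches pay CPTBP(m) cycles and mispredictions pay MPP(m) cycles. The function names and the example machine parameters (CPTBP of 1 cycle, MPP of 10 cycles) are illustrative assumptions, not values from this specification.

```python
from typing import Optional


def estimate_mpr(tr: float) -> float:
    # Empirical fit: estimate the mispredict ratio MPR(b) from the taken ratio TR(b).
    if not 0.0 <= tr <= 1.0:
        raise ValueError("taken ratio must lie in [0, 1]")
    return -1.04357 * tr ** 2 + 1.1987 * tr + 0.0112


def branch_cost(tr: float, cptbp: float, mpp: float,
                mpr: Optional[float] = None) -> float:
    # Expected cycles per execution of branch b on machine m.
    # TR(b) * (1 - MPR(b)) approximates the fraction of executions that are
    # taken and correctly predicted (each paying CPTBP(m) cycles); MPR(b) is
    # the fraction mispredicted (each paying MPP(m) cycles).
    if mpr is None:
        mpr = estimate_mpr(tr)
    return tr * (1.0 - mpr) * cptbp + mpr * mpp


# Hypothetical machine parameters, for illustration only.
print(round(estimate_mpr(0.5), 4))                      # 0.3497
print(round(branch_cost(0.5, cptbp=1.0, mpp=10.0), 4))  # 3.8217
```

Passing `mpr` explicitly lets a measured mispredict ratio override the empirical estimate when one is available.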
As will be further explained hereafter, it has been discovered that if the cost in cycles of a computer program branch is greater than the increase in resource height, or the increase in dependence height, then it is beneficial to combine control flow paths. Thus, if the branch cost in cycles minus the increase in cycles due to the resource height increase, or the dependence height increase, is greater than zero, then it is beneficial to merge the control flow path containing that branch with another control flow path. For various embodiments of the present invention, the resource height increase and the dependence height increase are compared to determine which is larger, and the larger of the two is then used to determine whether it is beneficial to merge control flow paths. Therefore, if the cost in cycles of a computer program branch is greater than the larger of the resource height increase and the dependence height increase that would result from the merger, it is beneficial to merge. Stated alternatively, the final benefit is the cost of all branches eliminated by merging the control flow paths (e.g., two control flow paths) minus the larger of the dependence height increase and the resource height increase resulting from the merger.
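The decision rule just described can be sketched in a few lines of Python. The helper names `merge_benefit` and `should_merge` are hypothetical, all quantities are in cycles, and the numbers in the usage lines are illustrative rather than taken from the figures.

```python
from typing import Sequence


def merge_benefit(eliminated_branch_costs: Sequence[float],
                  resource_height_increase: float,
                  dependence_height_increase: float) -> float:
    # Final benefit in cycles: total cost of the branches removed by the
    # merge, less the larger of the two height increases.
    return sum(eliminated_branch_costs) - max(resource_height_increase,
                                              dependence_height_increase)


def should_merge(eliminated_branch_costs: Sequence[float],
                 resource_height_increase: float,
                 dependence_height_increase: float) -> bool:
    # Merge only when the benefit is strictly positive.
    return merge_benefit(eliminated_branch_costs, resource_height_increase,
                         dependence_height_increase) > 0.0


# Illustrative numbers only.
print(should_merge([5.0], 2.5, 0.8))   # True  (5.0 - 2.5 = 2.5 cycles saved)
print(should_merge([3.0], 2.5, 4.0))   # False (3.0 - 4.0 = -1.0)
```

Using the larger of the two increases is a conservative choice: the merged code cannot be shorter than either its resource bound or its dependence bound.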
The term “height” may include “resource height” and/or “dependence height”. The resource height for a control flow path is the number of cycles that a computer system takes to execute the instructions in each basic block on the control flow path. “Resource height” ignores dependences among and/or within a given set of instructions of the basic block(s). Thus, for purposes of resource height, no instruction in a basic block is treated as depending on a value or parameter produced by another instruction in that basic block or in any other basic block. Stated alternatively, for “resource height” all instructions are treated as mutually independent: no instruction depends on any other instruction.
“Dependence height” for a control flow path is also measured as the number of cycles that a computer system takes to execute the instructions in each basic block on a control flow path. However, “dependence height” accounts for dependences among and/or within a given set of instructions of the basic block(s), yet ignores the resources required to execute the instructions. Stated alternatively, “dependence height” takes into account any latency among instructions of a basic block or instructions from another basic block. In other words, for “dependence height” a value or parameter produced by one or more instructions in a basic block is used by one or more other instructions in that basic block or in another basic block. Thus, for “dependence height” the instructions in a basic block are not independent of each other, yet are treated as requiring no resources to execute.
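A minimal sketch of the two notions for a single basic block, under a deliberately simple, hypothetical machine model. Resource height assumes a machine that issues up to `issue_width` instructions of any kind per cycle and ignores dependences; dependence height is the longest latency-weighted chain through the block's dependence graph and ignores resources. Real schedulers model individual functional units; the function names and the model are assumptions for illustration.

```python
import math
from typing import Dict, List, Sequence


def resource_height(num_instructions: int, issue_width: int) -> int:
    # Dependences ignored: only the number of issue slots per cycle matters.
    if issue_width <= 0:
        raise ValueError("issue_width must be positive")
    return math.ceil(num_instructions / issue_width)


def dependence_height(latencies: Sequence[int], deps: Dict[int, List[int]]) -> int:
    # Resources ignored: each instruction starts as soon as its producers finish.
    # Instructions are in program order, so every producer index in deps[i] is < i.
    finish: List[int] = []
    for i, lat in enumerate(latencies):
        ready = max((finish[p] for p in deps.get(i, [])), default=0)
        finish.append(ready + lat)
    return max(finish, default=0)


# Four instructions; instruction 2 uses the results of 0 and 1, and 3 uses 2.
print(resource_height(4, issue_width=2))                     # 2
print(dependence_height([1, 3, 1, 1], {2: [0, 1], 3: [2]}))  # 5
```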
When a combined “resource height” and/or “dependence height” is to be determined for two or more control flow paths, the resource height and/or dependence height of each path is weighted by a probability factor that the computer system will be instructed to execute that particular control flow path rather than the other control flow path(s). Stated alternatively, the number of cycles of the resource height and/or dependence height of each of the two or more control flow paths is multiplied by the respective probability factor associated with that control flow path. The probability factor for a control flow path (identified above, and defined below, as the “predicted ratio”) is the probability or likelihood that a computer system will be instructed to execute the instructions of that control flow path. These multiplications produce a weighted number of cycles (i.e., a weighted resource height and/or a weighted dependence height) for each control flow path. As will be further explained below, the weighted resource heights and/or weighted dependence heights of the control flow paths are then added together to obtain a combined resource height and/or a combined dependence height, respectively.
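The weighting just described is a probability-weighted sum. A minimal Python sketch, with the hypothetical helper name `weighted_height` and illustrative heights and predicted ratios:

```python
from typing import Sequence


def weighted_height(heights: Sequence[float], predicted_ratios: Sequence[float]) -> float:
    # Combined height of unmerged paths: each path's height in cycles
    # multiplied by its predicted ratio, then summed over the paths.
    if len(heights) != len(predicted_ratios):
        raise ValueError("one predicted ratio is needed per path")
    return sum(h * p for h, p in zip(heights, predicted_ratios))


print(weighted_height([6, 10], [0.5, 0.5]))   # 8.0
print(weighted_height([4, 8], [0.75, 0.25]))  # 5.0
```

The same helper applies to resource heights and dependence heights alike; only the per-path heights passed in differ.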
A cycle may be defined as a unit of real time that depends on the speed of the CPU clock. A “pass” is a single cycle in the processing of a set of data, usually performing part of an overall process, for example, a pass of an assembler through a source program or a pass of a sort program through a set of data. As indicated above, the term “predicted ratio” means the probability or likelihood that the computer system will be instructed to execute certain instruction(s) (e.g., basic block(s)) in a control flow path, or any subset thereof, as opposed to instruction(s) in another control flow path. The predicted ratio is determined empirically from past observations and is used when a resource height and/or a dependence height is to be weighted in order to determine the weighted resource height and/or weighted dependence height for control flow paths, in both merged and unmerged states. A “mispredict ratio” is the number of times a computer program branch is mispredicted divided by the number of times that branch is executed.
Referencing now
To determine the total or combined resource height for when subset control flow path 100a and control flow path 104 are merged, reference is now made to
Furthermore, the increase in resource height would be 12 cycles minus 9.5 cycles, or 2.5 cycles. Stated alternatively, the increase (incremental change) in resource height resulting from merging control flow paths, such as subset control flow path 100a and control flow path 104, is the weighted resource height of the merged control flow path (e.g., merged control flow path 106 with merged basic block D–B) minus the sum of the weighted resource heights of the respective control flow paths when unmerged (e.g., the weighted resource height of subset control flow path 100a including its associated basic block B plus the weighted resource height of control flow path 104 including its associated basic block D).
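The computation in the preceding paragraph can be written compactly as follows. The symbols are introduced here for illustration only: p_i is the predicted ratio of unmerged path i and H_res^(i) its resource height; the numbers are the 12-cycle and 9.5-cycle values stated above.

```latex
\Delta H_{\mathrm{res}}
  = H_{\mathrm{res}}^{\mathrm{merged}} - \sum_{i} p_i \, H_{\mathrm{res}}^{(i)}
  = 12 - 9.5
  = 2.5 \ \text{cycles}
```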
Reference now is made to
To determine the total or combined dependence height for when subset control flow path 100a and control flow path 104 are merged, reference is now made to
For determining the cost of executing the branch instructions in control flow path 100, as previously indicated, the following formula is employed:
$$\mathrm{BranchCost}(b) = \mathrm{TR}(b) \times \bigl(1 - \mathrm{MPR}(b)\bigr) \times \mathrm{CPTBP}(m) + \mathrm{MPR}(b) \times \mathrm{MPP}(m)$$
where: TR(b) is the used or taken ratio (predictive ratio) of Branch(b);
For the subset control flow path 100a (including basic block B) of the control flow path 100 in
For one embodiment of the invention, the computer program branch cost and the resource height increase are used to determine if it is beneficial to merge control flow paths. If the computer program branch cost is greater than the resource height increase, then there is benefit in the merger. Thus, for the example pertaining to the illustration of
In a further embodiment of the present invention, the computer program branch cost and the dependence height increase are used to determine if it is beneficial to merge control flow paths. For this embodiment, if the computer program branch cost is greater than the dependence height increase, then there is benefit in a merger. Thus, for the example pertaining to the illustration of
In another embodiment of the present invention, the increase in resource height is compared with the increase in dependence height. More specifically, to determine the final benefit of merging control flow paths, the larger cycle value of the resource height increase and the dependence height increase is selected and used in combination with the computer branch cost in cycles. Thus, if the cycle value of the resource height increase is larger than that of the dependence height increase, the resource height increase is selected for determining the final benefit of merging control flow paths. If the cycle value of the dependence height increase is larger than that of the resource height increase, the dependence height increase is selected for determining the final benefit of merging control flow paths. For the resource height example of
Referring now to
Various embodiments of the present invention provide a method for assigning a basic block in a computer program to a control flow path. More specifically, and by way of illustration only, various embodiments of the present invention provide for a method of assigning each basic block in the assembly 120 of
In one embodiment and also referencing the block flow diagram of
The bias of subset control flow paths 124a and 128a is based on the respective predictive ratios associated with each of them. Thus, if subset control flow path 128a has a predictive ratio of 30% (0.30) and subset control flow path 124a has a predictive ratio of 70% (0.70), the most biased subset control flow path is subset control flow path 124a, and basic block G is chosen and assigned to the control flow path containing basic block E, since it is the most frequent successor basic block on the most biased subset control flow path. If the immediate subset control flow paths 124a and 128a are essentially unbiased with respect to each other, that is, their predictive ratios are essentially equal (e.g., they differ by no more than ±5% (0.05)), then the basic block with the smaller resource height or dependence height is chosen. If basic block G has a smaller resource height in cycles than basic block F, then basic block G is chosen, and vice versa. If basic block F has a smaller dependence height in cycles than basic block G, then basic block F is chosen, and vice versa. If one basic block has the smaller resource height and the other has the smaller dependence height, then the basic block with the smaller resource height (e.g., basic block G) is selected. The control flow path is then extended by applying the “if-instructions” of subparagraphs (i), (ii), and (iii) supra to the newly selected basic block.
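The successor-selection rule above can be sketched in Python as follows. The `Candidate` record and `choose_successor` function are hypothetical names; the caller is assumed to pass only successors not yet assigned to a path. Generalizing from two successors to several is also an assumption: any successor whose predicted ratio is within the 0.05 tolerance of the best one is treated as "essentially unbiased", and ties are broken by smaller resource height, then smaller dependence height.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    name: str
    predicted_ratio: float  # likelihood that control flows to this successor
    res_height: float       # resource height of the successor block, in cycles
    dep_height: float       # dependence height of the successor block, in cycles


def choose_successor(cands: List[Candidate], tolerance: float = 0.05) -> Optional[Candidate]:
    if not cands:
        return None  # no viable successor: the caller starts a new path instead
    best_ratio = max(c.predicted_ratio for c in cands)
    near_best = [c for c in cands if best_ratio - c.predicted_ratio <= tolerance]
    if len(near_best) == 1:
        return near_best[0]  # one successor is clearly the most biased
    # Essentially unbiased: smaller resource height wins, and dependence
    # height breaks any remaining tie.
    return min(near_best, key=lambda c: (c.res_height, c.dep_height))


g = Candidate("G", 0.70, res_height=3, dep_height=5)
f = Candidate("F", 0.30, res_height=2, dep_height=1)
print(choose_successor([f, g]).name)  # G (clearly biased)

g2 = Candidate("G", 0.52, res_height=3, dep_height=5)
f2 = Candidate("F", 0.48, res_height=4, dep_height=2)
print(choose_successor([f2, g2]).name)  # G (unbiased; smaller resource height)
```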
As previously mentioned, under the “if-instruction” of subparagraph (iv) supra, if the immediate subset control flow paths commencing with the entry basic block E contain no basic block selected in accordance with the “if-instruction” of subparagraph (i), (ii), or (iii), then a basic block not assigned to any immediate subset control flow path (e.g., a frequently or the most frequently executed such block) is selected to commence a new control flow path. Thus, if no viable successor basic block of entry basic block E is found in accordance with the “if-instructions” of subparagraphs (i), (ii), and (iii), a basic block not yet on any subset control flow path (e.g., not on subset control flow paths 124a or 128a) is chosen instead. Reasons for not finding a viable successor basic block include that there is no successor basic block in the region, or that every successor basic block has already been assigned to a control flow path. Thus, by way of example only and now referencing
It is to be understood that basic block F, G, J, or I could have been chosen instead of basic block H to commence a new control flow path, if that basic block is more frequently executed than basic block H. Thus, other various embodiments of the present invention provide a method for commencing a control flow path in a computer program. For this embodiment, the control flow path, or at least a subset control flow path, is commenced through the non-selection of a basic block (e.g., basic block G or basic block F) assigned on one of the immediate subset control flow paths (e.g., subset control flow paths 124a and 128a), and through the selection of a basic block (e.g., basic block H:ENTRY) not assigned in any immediate subset control flow path.
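Seeding a new control flow path, as described above, can be sketched as picking the most frequently executed block not yet assigned to any path. The function name and the execution counts below are hypothetical, and "most frequently executed" is one of the options the text allows (it also permits "a frequently executed" block).

```python
from typing import Mapping, Optional, Set


def choose_new_path_seed(exec_freq: Mapping[str, int], assigned: Set[str]) -> Optional[str]:
    # Consider only blocks not yet placed on any control flow path.
    unassigned = [block for block in exec_freq if block not in assigned]
    if not unassigned:
        return None  # every block already belongs to a path
    # Seed the new path with the most frequently executed remaining block.
    return max(unassigned, key=lambda block: exec_freq[block])


# Hypothetical execution counts for the blocks of a region.
freq = {"E": 100, "F": 30, "G": 70, "H": 90, "I": 20, "J": 10}
print(choose_new_path_seed(freq, {"E"}))       # H
print(choose_new_path_seed(freq, {"E", "H"}))  # G
```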
Any control flow path or subset control flow path, including one or more basic blocks associated therewith, may be eliminated or merged with or into another control flow path, or subset control flow path, in accordance with the previously mentioned procedures and principles. The aim is not necessarily to eliminate all control flow paths but one, but rather to eliminate those control flow paths and associated basic blocks for which the computer system 10, including the compilation system 40, estimates that the cost of executing the computer program branch instructions is greater than the cost of merging the control flow paths, more specifically two control flow paths. For each subset control flow path in the computer program, such as in the control flow path assembly 120 of
Various embodiments of the present invention may be combined. By way of example only, after the “if-instructions” routine has been executed (see
Referring in detail now to
To determine the total or combined resource height for when the basic blocks G and I of control flow path 124 and basic blocks F and J of control flow path 128 are merged, the same procedure that was used for the illustrations of
To determine the total or combined dependence height for when the basic blocks G and I of control flow path 124 and basic blocks F and J of control flow path 128 are merged, the same procedure that was used for the illustrations of
As previously indicated, the cost of any branch instruction is a function of the frequency of execution of the branch instruction, the ability of the microprocessor to predict the branch target correctly, and the penalties associated with incorrect and correct predictions. With respect to the control flow path assembly 120 of
$$\mathrm{BranchCost}(b) = \mathrm{TR}(b) \times \bigl(1 - \mathrm{MPR}(b)\bigr) \times \mathrm{CPTBP}(m) + \mathrm{MPR}(b) \times \mathrm{MPP}(m)$$
Typically, the compilation system 40 has access to all ratios and cycles, except the mispredict ratio of a branch instruction. For various embodiments of the present invention and as previously suggested, the following formula is employed to estimate MPR(b) from TR(b):
$$\mathrm{MPR}(b) = -1.04357 \times \mathrm{TR}(b)^{2} + 1.1987 \times \mathrm{TR}(b) + 0.0112$$
As previously mentioned, the resource height increase is the difference between the resource height of the combined control flow path and the weighted resource heights of the control flow paths when they are separate. Suitable source code for this computation is:
Resource Height Increase = mergedResHeight - (path1ResHeight * path1Fraction) - (path2ResHeight * path2Fraction)
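A Python rendering of the resource-height-increase expression, offered as a sketch. The function name is hypothetical, and accepting any number of paths (rather than exactly two) is a generalization; with two paths it reduces to the expression above. The numbers are illustrative.

```python
from typing import Sequence


def resource_height_increase(merged_res_height: float,
                             path_res_heights: Sequence[float],
                             path_fractions: Sequence[float]) -> float:
    # Weighted resource height of the paths while still separate.
    if len(path_res_heights) != len(path_fractions):
        raise ValueError("one fraction is needed per path")
    separate = sum(h * f for h, f in zip(path_res_heights, path_fractions))
    # Extra cycles paid on average by executing the merged path instead.
    return merged_res_height - separate


print(resource_height_increase(7, [6, 4], [0.5, 0.5]))  # 2.0
```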
As also previously mentioned, the dependence height increase computes the increase in dependence height between a split point (e.g., point 194 in
Dependence Height Increase = max(path1Height, path2Height) - (path1Height * path1Fraction) - (path2Height * path2Fraction)
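The dependence-height-increase expression translates directly into Python. In this sketch (hypothetical function name, illustrative numbers), the merged dependence height is taken as the longer of the two paths, as in the expression above, since both predicated paths are issued and the merged code must wait for the longer dependence chain.

```python
def dependence_height_increase(path1_height: float, path1_fraction: float,
                               path2_height: float, path2_fraction: float) -> float:
    # Merged code waits for the longer of the two dependence chains.
    merged = max(path1_height, path2_height)
    # Subtract the probability-weighted heights of the separate paths.
    return merged - path1_height * path1_fraction - path2_height * path2_fraction


print(dependence_height_increase(5, 0.75, 3, 0.25))  # 0.5
```

When the likely path is also the longer one, as here, the increase stays small because the merge mostly lengthens the rarely executed path.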
If the cost in cycles of a computer program branch instruction (e.g., branch instruction 134, which terminates basic block E) is greater than the increase in resource height, or the increase in dependence height, then it is beneficial to combine control flow paths (e.g., to combine basic blocks G and I of control flow path 124 with control flow path 128 and its associated basic blocks F and J). Thus, if the branch cost in cycles minus the resource height increase, or minus the dependence height increase, is greater than zero, then it is beneficial to merge the control flow paths and eliminate the branch instruction. As was previously seen for the example employing the illustrations of
For the computer program branch instruction 134 terminating basic block E in
In another embodiment of the present invention, the computer program branch cost and the dependence height increase are used to determine if it is beneficial to merge control flow paths. If the computer program branch cost is greater than the dependence height increase, then there is benefit in the merger. Thus, for subset basic blocks G and I of control flow path 124, and for basic blocks F and J of control flow path 128, the dependence height increase of 0.8 cycles is used in combination with the computer program branch instruction 134 cost of 8.15 cycles to determine if it is beneficial to merge the subset basic blocks G and I of control flow path 124 with basic blocks F and J of control flow path 128. More particularly, since the computer program branch cost of 8.15 cycles is greater than the dependence height increase of 0.8 cycles, it would then be beneficial to combine basic blocks G and I of control flow path 124 and basic blocks F and J of control flow path 128 to obtain or produce a merged single control flow path 204 terminating in merged basic blocks F–G as shown in
In another embodiment of the present invention, the increase in resource height is compared with the increase in dependence height. More specifically, to determine the final benefit of merging control flow paths, the larger cycle value of the resource height increase and the dependence height increase is selected and used in combination with the computer branch cost in cycles. Thus, if the cycle value of the resource height increase is larger than that of the dependence height increase, the resource height increase is selected for determining the final benefit of merging control flow paths. If the cycle value of the dependence height increase is larger than that of the resource height increase, the dependence height increase is selected for determining the final benefit of merging control flow paths. For the example of
Continuing to refer to
The following source code (C-like pseudocode) repeatedly tests whether control flow paths, or subsets of control flow paths, can and should be combined, and combines them when they should:
performPathSelection ( );
Boolean change = TRUE;
while ( change ) {
    change = FALSE;
    for ( curPath = each selected path in the region ) {
        for ( candPath = each selected path in the region ) {
            // a path is never combined with itself
            if ( candPath == curPath )
                continue;
            // if neither path has been modified since the last
            // time, don't bother trying to combine them again
            if ( !curPath.modified( ) && !candPath.modified( ) )
                continue;
            // if we determine that it is both possible
            // and beneficial to combine these control-flow
            // paths, then do so.
            if ( canCombine ( curPath, candPath ) &&
                 beneficialToCombine ( curPath, candPath ) ) {
                combine ( curPath, candPath );
                curPath.modified ( TRUE );
                change = TRUE;
            }
        }
    }
}
Reference throughout this specification to “one embodiment”, “an embodiment”, or “a specific embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention and not necessarily in all embodiments. Thus, respective appearances of the phrases “in one embodiment”, “in an embodiment”, or “in a specific embodiment” in various places throughout this specification are not necessarily referring to the same embodiment. Furthermore, the particular features, structures, or characteristics of any specific embodiment of the present invention may be combined in any suitable manner with one or more other embodiments. It is to be understood that other variations and modifications of the embodiments of the present invention described and illustrated herein are possible in light of the teachings herein and are to be considered as part of the spirit and scope of the present invention.
Further, at least some of the components of an embodiment of the invention may be implemented by using a programmed general purpose digital computer, by using application specific integrated circuits, programmable logic devices, or field programmable gate arrays, or by using a network of interconnected components and circuits. Connections may be wired, wireless, by modem, and the like.
It will also be appreciated that one or more of the elements depicted in the drawings/figures can also be implemented in a more separated or integrated manner, or even removed or rendered as inoperable in certain cases, as is useful in accordance with a particular application. It is also within the spirit and scope of the present invention to implement a program or code that can be stored in a machine-readable medium to permit a computer to perform any of the methods described above.
Additionally, any signal arrows in the drawings/Figures should be considered only as exemplary, and not limiting, unless otherwise specifically noted. Furthermore, the term “or” as used herein is generally intended to mean “and/or” unless otherwise indicated. Combinations of components or steps will also be considered as being noted, where terminology is foreseen as rendering the ability to separate or combine is unclear.
As used in the description herein and throughout the claims that follow, “a”, “an”, and “the” includes plural references unless the context clearly dictates otherwise. Also, as used in the description herein and throughout the claims that follow, the meaning of “in” includes “in” and “on” unless the context clearly dictates otherwise.
The foregoing description of illustrated embodiments of the present invention, including what is described in the Abstract, is not intended to be exhaustive or to limit the invention to the precise forms disclosed herein. While specific embodiments of, and examples for, the invention are described herein for illustrative purposes only, various equivalent modifications are possible within the spirit and scope of the present invention, as those skilled in the relevant art will recognize and appreciate. As indicated, these modifications may be made to the present invention in light of the foregoing description of illustrated embodiments of the present invention and are to be included within the spirit and scope of the present invention.
Thus, while the present invention has been described herein with reference to particular embodiments thereof, a latitude of modification, various changes and substitutions are intended in the foregoing disclosures, and it will be appreciated that in some instances some features of embodiments of the invention will be employed without a corresponding use of other features without departing from the scope and spirit of the invention as set forth. Therefore, many modifications may be made to adapt a particular situation or material to the essential scope and spirit of the present invention. It is intended that the invention not be limited to the particular terms used in following claims and/or to the particular embodiment disclosed as the best mode contemplated for carrying out this invention, but that the invention will include any and all embodiments and equivalents falling within the scope of the appended claims.
Executed on | Assignor | Assignee | Conveyance | Reel/Frame |
Jun 17 2002 | HANK, RICHARD EUGENE | Hewlett-Packard Company | ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS) | 013214/0436 |
Jun 18 2002 | Hewlett-Packard Development Company, L.P. | (assignment on the face of the patent) | | |
Jan 31 2003 | Hewlett-Packard Company | HEWLETT-PACKARD DEVELOPMENT COMPANY, L P | ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS) | 013776/0928 |
Sep 26 2003 | Hewlett-Packard Company | HEWLETT-PACKARD DEVELOPMENT COMPANY L P | ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS) | 014061/0492 |
Date | Maintenance Fee Events |
May 28 2010 | M1551: Payment of Maintenance Fee, 4th Year, Large Entity. |
Jul 11 2014 | REM: Maintenance Fee Reminder Mailed. |
Nov 28 2014 | EXP: Patent Expired for Failure to Pay Maintenance Fees. |
Date | Maintenance Schedule |
Nov 28 2009 | 4 years fee payment window open |
May 28 2010 | 6 months grace period start (w surcharge) |
Nov 28 2010 | patent expiry (for year 4) |
Nov 28 2012 | 2 years to revive unintentionally abandoned end. (for year 4) |
Nov 28 2013 | 8 years fee payment window open |
May 28 2014 | 6 months grace period start (w surcharge) |
Nov 28 2014 | patent expiry (for year 8) |
Nov 28 2016 | 2 years to revive unintentionally abandoned end. (for year 8) |
Nov 28 2017 | 12 years fee payment window open |
May 28 2018 | 6 months grace period start (w surcharge) |
Nov 28 2018 | patent expiry (for year 12) |
Nov 28 2020 | 2 years to revive unintentionally abandoned end. (for year 12) |