A method for code size reduction, which comprises determining basic blocks in an IR module; grouping the basic blocks having duplicate code into groups; providing weighting values corresponding to different instructions of the module, wherein the weighting values are determined based on a plurality of intermediate representation program codes; determining a weighted size of the module, wherein the weighted size of the module is determined by summing weighted sizes of the basic blocks of the module, and the weighted size of each basic block is determined by summing products of numbers of different instructions of the basic blocks and the corresponding weighting values; removing duplicates in one group to obtain a module having one processed group; determining a weighted size of the module having one processed group; and comparing the weighted size of the module to the weighted size of the module having one processed group.
1. A method for code size reduction executed by a processor, comprising the steps of:
a) determining basic blocks in an intermediate representation module;
b) grouping ones of the basic blocks having duplicate code into groups;
c) providing weighting values corresponding to different instructions of the module, wherein the weighting values are determined based on a plurality of intermediate representation program codes of number i, wherein i is an integer greater than 0;
d) determining a weighted size of the module, wherein the weighted size of the module is determined by summing weighted sizes of the basic blocks of the module, and the weighted size of each basic block is determined by summing products of numbers of different instructions of the basic blocks and the corresponding weighting values;
e) removing duplicates in one of the groups of the module to obtain a module having one processed group;
f) determining a weighted size of the module having one processed group, wherein the weighted size of the module having one processed group is determined by summing weighted sizes of the basic blocks of the module having one processed group, and the weighted size of each basic block is determined by summing products of numbers of different instructions of the basic blocks of the module having one processed group and the corresponding weighting values; and
g) comparing the weighted size of the module to the weighted size of the module having one processed group.
6. An apparatus for code size reduction, comprising:
a computer processor; and
a procedural extraction mechanism operated on the computer processor, wherein the procedural extraction mechanism is used to:
a) determine basic blocks in an intermediate representation module;
b) group ones of the basic blocks having duplicate code into groups;
c) provide weighting values corresponding to different instructions of the module;
d) determine a weighted size of the module;
e) remove duplicates in one of the groups of the module to obtain a module having one processed group;
f) determine a weighted size of the module having one processed group; and
g) compare the weighted size of the module and the weighted size of the module having one processed group;
wherein the weighting values are determined based on a plurality of intermediate representation program codes of number i, wherein i is an integer greater than 0;
wherein the weighted size of the module is determined by summing weighted sizes of the basic blocks of the module, and the weighted size of each basic block is determined by summing products of numbers of different instructions of the basic blocks and the corresponding weighting values;
wherein the weighted size of the module having one processed group is determined by summing weighted sizes of the basic blocks of the module having one processed group, and the weighted size of each basic block is determined by summing products of numbers of different instructions of the basic blocks of the module having one processed group and the corresponding weighting values.
2. The method of
h) removing the one of the groups if the weighted size of the module having one processed group is greater than the weighted size of the module.
3. The method of
repeating step e) to step h) until all groups are processed; and
removing duplicates in remaining groups.
4. The method of
5. The method of
providing the plurality of intermediate representation program codes of number i;
providing a plurality of parameters (xj) corresponding to all intermediate representation instructions, wherein j is an integer greater than 0;
counting a number (ai,j) of each intermediate representation instruction used in each intermediate representation program code;
compiling each intermediate representation program code of the plurality of intermediate representation program codes to an object code;
determining a size (bi) of each object code; and
solving the following equation for the plurality of parameters (xj),
where n is an integer;
wherein the weighting values are selected from the parameters (xj).
7. The apparatus of
8. The apparatus of
9. The apparatus of
10. The apparatus of
provide the plurality of intermediate representation program codes of number i;
provide a plurality of parameters (xj) corresponding to all intermediate representation instructions, wherein j is an integer greater than 0;
count a number (ai,j) of each intermediate representation instruction used in each intermediate representation program code;
compile each intermediate representation program code of the plurality of intermediate representation program codes to an object code;
determine a size (bi) of each object code; and
solve the following equation for the plurality of parameters (xj),
where n is an integer;
wherein the weighting values are selected from the parameters (xj).
1. Technical Field
The present invention relates to a method and apparatus for code size reduction.
2. Related Art
In computer science, program optimization or software optimization is a process of modifying a software program to work more efficiently or use fewer resources. Normally, after being optimized, a computer program can execute commands more rapidly, operate with less memory storage or other resources, or consume less power.
A compiler is a computer program that transforms a source code written in a high-level computer language into a low-level computer language. Usually, a compiler comprises three main parts: the front-end, the middle-end, and the back-end. The front-end parses a text-based programming language into an intermediate representation (IR) of the source code for the middle-end; the middle-end performs optimizations on the IR and generates another IR for the back-end; and the back-end translates the IR from the middle-end into an assembly code.
Many compiler optimization techniques have been developed to reduce the size of generated codes, and one such technique is known as procedural abstraction. Procedural abstraction identifies repeated segments of code, and then extracts the repeated segments of code to construct a subroutine and replace other repeated segments by procedural calls to the newly created subroutine. As a result, the size of the code can be reduced.
Conventional procedural abstraction techniques are not perfect. In some circumstances, the code size reduction measured between a non-optimized IR code and an optimized IR code is greater than the reduction measured between the object codes generated from them, so a saving that appears at the IR level may not be fully realized in the object code.
One embodiment provides a method for code size reduction, which comprises determining basic blocks in an intermediate representation module; grouping those of the basic blocks having duplicate code into groups; providing weighting values corresponding to different instructions of the module, wherein the weighting values are determined based on a plurality of intermediate representation program codes; determining a weighted size of the module, wherein the weighted size of the module is determined by summing weighted sizes of the basic blocks of the module, and the weighted size of each basic block is determined by summing products of numbers of different instructions of the basic blocks and the corresponding weighting values; removing duplicates in one of the groups of the module to obtain a module having one processed group; determining a weighted size of the module having one processed group, wherein the weighted size of the module having one processed group is determined by summing weighted sizes of the basic blocks of the module having one processed group, and the weighted size of each basic block is determined by summing products of numbers of different instructions of the basic blocks of the module having one processed group and the corresponding weighting values; and comparing the weighted size of the module to the weighted size of the module having one processed group.
One embodiment of the present invention comprises an apparatus for code size reduction. The apparatus can comprise a computer processor and a procedural extraction mechanism operated on the computer processor. The procedural extraction mechanism is used to determine basic blocks in an intermediate representation module, group those of the basic blocks having duplicate code into groups, provide weighting values corresponding to different instructions of the module, determine a weighted size of the module, remove duplicates in one of the groups of the module to obtain a module having one processed group, determine a weighted size of the module having one processed group, and compare the weighted size of the module to the weighted size of the module having one processed group. The weighted size of the module having one processed group is determined by summing weighted sizes of the basic blocks of the module having one processed group, and the weighted size of each basic block is determined by summing products of numbers of different instructions of the basic blocks of the module having one processed group and the corresponding weighting values. The weighted size of the module is determined by summing weighted sizes of the basic blocks of the module, and the weighted size of each basic block is determined by summing products of numbers of different instructions of the basic blocks and the corresponding weighting values. The weighting values are determined based on a plurality of intermediate representation program codes.
To better understand the above-described objectives, characteristics and advantages of the present invention, embodiments, with reference to the drawings, are provided for detailed explanations.
The invention will be described according to the appended drawings in which:
The following description is presented to enable any person skilled in the art to make and use the disclosed embodiments, and is provided in the context of a particular application and its requirements. Various modifications to the disclosed embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the disclosed embodiments. Thus, the disclosed embodiments are not limited to the embodiments shown, but are to be accorded the widest scope consistent with the principles and features disclosed herein.
In some embodiments, a new method for code size reduction is disclosed. The new method is demonstrated using an LLVM (Low Level Virtual Machine) infrastructure. The new method is not limited to an LLVM infrastructure, and is applicable to any suitable compiler infrastructure. In some embodiments, the new method is applicable to a computer program, code, module, or the like, wherein a module may contain one or more functions, a function may comprise a plurality of basic blocks, and a basic block comprises a sequence of instructions. A basic block may have one entry point and one exit point. An instruction may be any representation of an element of an executable program, such as a bytecode. In some embodiments, a basic block may have a terminator instruction such as a branch or a function return.
A basic block can have a size. The size of a basic block can be represented by its instruction count, that is, the number of instructions in the basic block. A module M may comprise a plurality of functions (function1), each of which may comprise a plurality of basic blocks BBm; some of these basic blocks are similar to one another, while others are not. The size, S(M), of the module M can be determined by the following equation (1):
Equation (1) can be utilized to evaluate the effect of performing procedural abstraction, or procedural extraction, on a group of similar basic blocks of a module. Equation (1) is initially used to determine the size S(Morig) of the original module, and then to determine the size S(Mextract) of the module in which the duplicates of the basic blocks of one group have been removed. If the size S(Mextract) is less than the size S(Morig), the removal of the duplicates in the group can be considered helpful in the optimization of the module.
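The comparison of S(Morig) with S(Mextract) can be sketched as follows. This is a minimal illustration and not the patented implementation: the toy module layout (a dictionary of function names mapping to lists of basic blocks, each a list of opcode strings), the opcode names, and the shape of the extracted module are all assumptions made for the example.

```python
# Hypothetical toy model: a module maps function names to lists of basic
# blocks, and a basic block is a list of IR opcode strings.

def block_size(block):
    """Size of a basic block, measured as its instruction count."""
    return len(block)

def module_size(module):
    """S(M): sum of the basic-block sizes over every function of the module."""
    return sum(block_size(bb) for blocks in module.values() for bb in blocks)

body = ["load", "add", "mul", "store", "br"]

# Original module: three basic blocks with identical code.
m_orig = {"f": [list(body), list(body)], "g": [list(body)]}

# Assumed result of extraction: each duplicate shrinks to a call plus its
# terminator, and the shared code moves into one new function.
m_extract = {
    "f": [["call", "br"], ["call", "br"]],
    "g": [["call", "br"]],
    "extracted": [["load", "add", "mul", "store", "ret"]],
}

s_orig = module_size(m_orig)        # 15 instructions
s_extract = module_size(m_extract)  # 11 instructions
helpful = s_extract < s_orig
```

In this toy case the extraction is judged helpful because the instruction count drops from 15 to 11; the count alone says nothing about how large each instruction becomes in the object code.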
TABLE 1
line  Procedural abstraction algorithm
      M is an LLVM Module; Prof is a profiling message of a hot-spot basic block
1     select a basic block BBx for comparison
2     for basic block BBy in module M do
3       if BBx is equal to BBy then
4         if BBx is not in any group then
5           push BBx into vector V and mark BBx with group GPi
6           push BBy into vector V and mark BBy with group GPi
7         else
8           push BBy into vector V and mark BBy with group GPi
9         end if
10      end if
11    end for
12    calculate weighted size W(M)
13    for Group GPi in V do
14      copy Module M to another Module M1
15      extract the corresponding members (basic blocks) of GPi from M1
16      calculate weighted size W(M1)
17      if W(M1) > W(M) then
18        remove members of GPi from V
19      end if
20    end for
21    for BBi in V do
22      if BBi is a hot-spot region in Prof then
23        remove BBi from V
24      end if
25    end for
26    Reorganize V
27    for BBi in GPk in V do
28      extract BBi from M
29    end for
30    output M as a transformed file
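Lines 21 to 26 of Table 1 can be sketched as follows. This is only an illustration under stated assumptions: the profile is modeled as a plain set of hot-spot block names, and "Reorganize V" is read here as discarding any group left with fewer than two members, which the table itself does not spell out. The function and variable names are hypothetical.

```python
def drop_hot_blocks(groups, hot_blocks):
    """Remove profiled hot-spot blocks from every group (Table 1, lines 21 to 25)."""
    filtered = [[bb for bb in group if bb not in hot_blocks] for group in groups]
    # Assumption: "Reorganize V" (line 26) drops groups with fewer than two
    # members, since a lone block has no duplicate to share extracted code with.
    return [group for group in filtered if len(group) >= 2]

groups = [["BB1", "BB2", "BB3"], ["BB4", "BB5"]]
hot = {"BB2", "BB5"}  # hypothetical hot-spot blocks taken from Prof
remaining = drop_hot_blocks(groups, hot)  # [["BB1", "BB3"]]
```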
Referring to
In Step S102, the method determines the basic blocks of the module. In one embodiment, the module is an intermediate representation (IR) module. Next, as also shown in lines 2 to 11 of Table 1, the method iteratively traverses all basic blocks of the module to determine basic blocks with duplicate code, and then classifies the basic blocks with duplicate code into the same group. Each classified basic block can be pushed into a vector V. This step continues until all basic blocks are classified.
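The grouping step can be sketched as follows. Table 1 compares basic blocks pairwise; this sketch swaps in a dictionary keyed by the instruction sequence, which yields the same groups for exact duplicates. The block names and opcodes are hypothetical, and "duplicate" is taken to mean an identical instruction sequence.

```python
from collections import defaultdict

def group_duplicates(blocks):
    """Return groups of block names whose instruction sequences are identical."""
    by_code = defaultdict(list)
    for name, instrs in blocks.items():
        by_code[tuple(instrs)].append(name)
    # Only sequences shared by two or more blocks form a group.
    return [names for names in by_code.values() if len(names) > 1]

blocks = {
    "BB1": ["load", "add", "store"],
    "BB2": ["load", "mul", "store"],
    "BB3": ["load", "add", "store"],
    "BB4": ["load", "mul", "store"],
    "BB5": ["ret"],
}
groups = group_duplicates(blocks)  # [["BB1", "BB3"], ["BB2", "BB4"]]
```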
For example, as shown in
Referring to
Next, the method uses the following equation (2) to determine the weighted size of the module. For each basic block of the module, the numbers of the different instructions (Instrk) are multiplied by the corresponding weighting values, and the products are summed to obtain the weighted size of the basic block; the weighted sizes of the basic blocks are then summed to obtain the weighted size (W(M)) of the module.
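The weighted-size computation described above can be sketched as follows. The weighting values in `weights` and the toy module are invented for illustration; in the described method the values would come from the instruction weight table.

```python
from collections import Counter

def weighted_block_size(block, weights):
    """Sum of (instruction count x weighting value) over the opcodes of one block."""
    counts = Counter(block)
    return sum(n * weights[op] for op, n in counts.items())

def weighted_module_size(module, weights):
    """W(M): sum of the weighted sizes of all basic blocks of the module."""
    return sum(
        weighted_block_size(bb, weights)
        for blocks in module.values()
        for bb in blocks
    )

# Hypothetical weighting values, one per IR opcode.
weights = {"load": 8, "add": 6, "store": 8, "br": 5}
module = {"f": [["load", "add", "add", "store", "br"]]}

w = weighted_module_size(module, weights)  # 8 + 2*6 + 8 + 5 = 33
```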
Subsequently, the method creates another module M1 by making a copy of the module M. The method then selects a group of basic blocks of the module M1 and removes the duplicates in the group. Next, the method applies equation (2) to the module M1 in the same manner: for each basic block of the module M1, the numbers of the different instructions (Instrk) are multiplied by the corresponding weighting values and the products are summed to obtain the weighted size of the basic block, and the weighted sizes of the basic blocks are summed to obtain the weighted size W(M1) of the module M1. In Step S105, the method compares the weighted size W(M) with the weighted size W(M1) to determine whether the removal of duplicates in the group creates more benefit than cost. If W(M1) is greater than W(M), the members of the group are removed from the vector V. Otherwise, the group is left in the vector V, and it is extracted together with the other remaining groups in a later optimization process.
In one embodiment, in each cost-benefit analysis, only one group of basic blocks is extracted or processed.
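The per-group cost-benefit analysis can be sketched as follows, with one group processed per trial copy. This is an illustration under assumptions: a module is flattened to a dictionary of block names, extraction is modeled by replacing each member with a single `call` and adding one shared block ending in `ret`, and the weights are invented. A real extraction would also handle arguments, terminators, and live values.

```python
import copy

# Hypothetical weighting values.
WEIGHTS = {"load": 8, "add": 6, "store": 8, "call": 10, "ret": 5}

def weighted_size(module, weights):
    """Sum of weighting values over every instruction of every block."""
    return sum(weights[op] for block in module.values() for op in block)

def extract_group(module, group):
    """Return a copy of the module in which only this group has been extracted."""
    m1 = copy.deepcopy(module)
    m1["extracted_" + group[0]] = m1[group[0]] + ["ret"]  # shared body
    for name in group:
        m1[name] = ["call"]  # each duplicate becomes a call site
    return m1

def profitable_groups(module, groups, weights):
    """Keep a group unless its trial module is larger than the original."""
    base = weighted_size(module, weights)
    kept = []
    for group in groups:
        trial = extract_group(module, group)
        if weighted_size(trial, weights) <= base:
            kept.append(group)
    return kept

body = ["load", "add", "add", "store"]
module = {"BB1": list(body), "BB2": list(body), "BB3": list(body),
          "BB4": ["add"], "BB5": ["add"]}
groups = [["BB1", "BB2", "BB3"], ["BB4", "BB5"]]
kept = profitable_groups(module, groups, WEIGHTS)  # [["BB1", "BB2", "BB3"]]
```

Here the original weighted size is 96. Extracting the first group gives 75, so it is kept; extracting the second gives 115, so it is dropped, because two calls plus the new block cost more than the two one-instruction blocks they replace.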
For example, in the embodiments of
As shown in
Referring to
Referring to
The following demonstrates the generation of the instruction weight table, while the present invention is not limited to such a disclosure.
A plurality of IR (intermediate representation) program codes of number i are provided. The program code may comprise a library, a benchmark program, a function of a benchmark program, a function, or an optimized IR program. Next, a plurality of parameters (xj) are used to correspondingly represent IR (intermediate representation) instructions. The parameters (xj) can be organized into a vector as shown below.
Each IR program code is iteratively examined to count the number of each IR instruction until all IR instructions are exhausted. Thereafter, a set of equations can be obtained.
where i denotes an index between 1 and n, j denotes an index between 1 and m, n and m are integers and m may be equal to n, ai,j represents the number of occurrences of instruction j in the ith IR program code, and bi represents the code size of the compiled ith IR program code.
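The counting step that produces the coefficients ai,j can be sketched as follows. Read against the definitions above, each row i then contributes one relation in which the counts ai,j, weighted by the parameters xj and summed over j, are matched to the size bi. The opcode list and the toy IR programs are hypothetical.

```python
from collections import Counter

def count_matrix(programs, opcodes):
    """Row i holds a[i][j]: how often opcode j occurs in IR program i."""
    rows = []
    for prog in programs:
        counts = Counter(prog)  # missing opcodes count as 0
        rows.append([counts[op] for op in opcodes])
    return rows

opcodes = ["load", "add", "store"]
programs = [
    ["load", "add", "store"],
    ["load", "load", "add", "store"],
    ["add", "add", "store"],
]
a = count_matrix(programs, opcodes)  # [[1, 1, 1], [2, 1, 1], [0, 2, 1]]
```

The sizes bi are not computed here; in the described method they are obtained by compiling each IR program code to object code and measuring the result.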
The set of equations can be converted into a matrix equation (5), $M\vec{x} = \vec{b}$.
An approximate vector $\vec{x}$ can be obtained by solving the equation (5), and each element of the vector $\vec{x}$ represents an estimate of the code size contributed by the corresponding IR instruction, which can be used as a weighting value.
In one embodiment, the vector $\vec{x}$ is determined by calculating a generalized inverse or pseudo-inverse of the matrix M. In one embodiment, the vector $\vec{x}$ is calculated by a constrained least squares method. In one embodiment, the elements of the vector $\vec{x}$ are constrained between 5 and 35.
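One way to realize a bound-constrained least squares fit is projected gradient descent, sketched below in plain Python. The text does not name a specific solver, so this choice is an assumption, as are the function name, the iteration count, and the small example system. The bounds 5 and 35 follow the embodiment above.

```python
def bounded_least_squares(a, b, lo, hi, iters=2000):
    """Minimize ||a x - b||^2 subject to lo <= x[j] <= hi by projected gradient."""
    n_rows, n_cols = len(a), len(a[0])
    # The squared Frobenius norm bounds the largest eigenvalue of a^T a,
    # so its reciprocal is a safe step size for the gradient a^T (a x - b).
    step = 1.0 / sum(v * v for row in a for v in row)
    x = [lo] * n_cols
    for _ in range(iters):
        resid = [sum(a[i][j] * x[j] for j in range(n_cols)) - b[i]
                 for i in range(n_rows)]
        grad = [sum(a[i][j] * resid[i] for i in range(n_rows))
                for j in range(n_cols)]
        # Gradient step followed by projection (clamping) onto the bounds.
        x = [min(hi, max(lo, x[j] - step * grad[j])) for j in range(n_cols)]
    return x

# Hypothetical data: instruction counts for three IR programs and two opcodes,
# with object-code sizes that are consistent with weights of 10 and 20.
a = [[1, 2], [3, 1], [2, 2]]
b = [50, 50, 60]
x = bounded_least_squares(a, b, lo=5.0, hi=35.0)  # close to [10.0, 20.0]
```

With real measurements the system is usually overdetermined and inconsistent, so the result is only an approximate fit; a library routine for bounded linear least squares could be substituted where available.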
In one embodiment, the weighting values can be determined based on a plurality of intermediate representation program codes. In one embodiment, the weighting values can be determined based on a plurality of intermediate representation program codes and sizes thereof. In one embodiment, the weighting values can be determined based on a plurality of intermediate representation program codes, the number of the intermediate representation instructions of each intermediate representation program code, and the size of each intermediate representation program code.
As shown in
In addition, a weighting value generating mechanism 312 is operated on the computer processor 31 to provide a plurality of intermediate representation program codes of number i; provide a plurality of parameters (xj) corresponding to all intermediate representation instructions of an intermediate language; count a number (ai,j) of each intermediate representation instruction used in each intermediate representation program code; compile each IR program code of the plurality of intermediate representation program codes to an object code; determine a size (bi) of each object code; and determine the plurality of parameters (xj) by the following equation,
where n is an integer.
The apparatus 3 can comprise a memory 32, which can be disposed in the same system as the processor 31, or can remotely communicate with the processor 31. The memory 32 can store an instruction weight table, which can be formed according to the above plurality of parameters (xj). The computer processor 31 can select required weighting values from the instruction weight table.
The data structures and code described in this detailed description are typically stored on a non-transitory computer-readable storage medium, which may be any device or medium that can store code and/or data for use by a computer system. The non-transitory computer-readable storage medium includes, but is not limited to, volatile memory, non-volatile memory, magnetic and optical storage devices such as disk drives, magnetic tape, CDs (compact discs), DVDs (digital versatile discs or digital video discs), or other media capable of storing code and/or data now known or later developed.
The methods and processes described in the detailed description section can be embodied as code and/or data, which can be stored in a non-transitory computer-readable storage medium as described above. When a computer system reads and executes the code and/or data stored on the non-transitory computer-readable storage medium, the computer system performs the methods and processes embodied as data structures and code and stored within the non-transitory computer-readable storage medium. Furthermore, the methods and processes described below can be included in hardware modules. For example, the hardware modules can include, but are not limited to, application-specific integrated circuit (ASIC) chips, field-programmable gate arrays (FPGAs), and other programmable-logic devices now known or later developed. When the hardware modules are activated, the hardware modules perform the methods and processes included within the hardware modules.
It will be apparent to those skilled in the art that various modifications and variations can be made to the disclosed embodiments. It is intended that the specification and examples be considered as exemplary only, with a true scope of the disclosure being indicated by the following claims and their equivalents.
Lee, Jenq Kuen, Wang, Shao Chung, Yang, Kun Hua
Patent | Priority | Assignee | Title |
6064819, | Dec 08 1993 | IMEC | Control flow and memory management optimization |
7278137, | Dec 26 2001 | Synopsys, Inc | Methods and apparatus for compiling instructions for a data processor |
8661422, | Feb 08 2008 | Qualcomm Incorporated | Methods and apparatus for local memory compaction |
20060070047, | |||
20110055819, | |||
20120159459, | |||
20120265972, | |||
TW200947343, | |||
TW201246064, | |||
TW505853, |
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Apr 08 2013 | YANG, KUN HUA | National Tsing Hua University | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 030424 | /0117 | |
Apr 08 2013 | WANG, SHAO CHUNG | National Tsing Hua University | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 030424 | /0117 | |
Apr 08 2013 | LEE, JENQ KUEN | National Tsing Hua University | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 030424 | /0117 | |
May 15 2013 | National Tsing Hua University | (assignment on the face of the patent) | / |
Date | Maintenance Fee Events |
Sep 26 2018 | M2551: Payment of Maintenance Fee, 4th Yr, Small Entity. |
Feb 06 2023 | M2552: Payment of Maintenance Fee, 8th Yr, Small Entity. |