A technique for minimizing overhead caused by copying or moving a value from one cluster to another cluster is provided. Multiple operations, for example, a mov operation that moves or copies a value from one cluster to another cluster and a normal operation, may be executed concurrently. Accordingly, access to a register file outside of the cluster may be reduced and the performance of code may be improved.
17. A processor with a clustered architecture, the processor comprising:
an implicit operation generator configured to generate an implicit operation comprising an intra-cluster operation and an inter-cluster operation within a basic block of code which are to be executed concurrently; and
a processing core configured to concurrently execute the intra-cluster operation and the inter-cluster operation included in the implicit operation.
10. A method of minimizing overhead caused by communication between clusters, the method comprising:
generating an implicit operation that implies the presence of a copy operation that is to be executed concurrently with a normal operation within a basic block of code; and
inserting an operand providing operation into the basic block of code to provide an operand value of the copy operation to be executed, in response to the implicit operation being executed.
1. An apparatus for reducing overhead caused by communication between clusters, the apparatus comprising:
an implicit operation generating unit configured to generate an implicit operation that implies the presence of a copy operation that is to be executed concurrently with a normal operation within a basic block of code; and
an operand providing unit configured to insert an operand providing operation into the basic block of code to provide an operand value of the copy operation to be executed, in response to the implicit operation being executed.
2. The apparatus of
3. The apparatus of
4. The apparatus of
5. The apparatus of
a scheduling unit configured to schedule the basic block of code in consideration of the generated implicit operation.
7. The apparatus of
8. The apparatus of
9. The apparatus of
11. The method of
12. The method of
13. The method of
rescheduling the basic block of code in consideration of the generated implicit operation.
14. The method of
15. The method of
16. The method of
18. The processor of
19. The processor of
20. The processor of
This application claims the benefit under 35 USC §119(a) of Korean Patent Application No. 10-2011-0119147, filed on Nov. 15, 2011, in the Korean Intellectual Property Office, the entire disclosure of which is incorporated herein by reference for all purposes.
1. Field
The following description relates to a technique for reducing overhead caused by communication from one cluster to another cluster.
2. Description of the Related Art
An application program that includes large parallelism typically requires a register file that has a great number of ports and registers to concurrently access many operands during processing. However, implementation of such a register file is very difficult and incurs enormous hardware expense.
To address these drawbacks, a clustered architecture has been introduced. In a clustered architecture, an independent register file is provided for each cluster unit. Each register file typically has a small number of ports, and operations are concurrently input to multiple clusters and executed. Because the clustered architecture still allows access to many operands, various application programs can be executed with register files that have a simpler architecture.
However, if one cluster needs to access a register file that is outside of the cluster, for example, if one cluster needs to move data or copy data to a register file of another cluster, overhead may be incurred. In this example, if the cluster has a large number of data move or data copy operations to a register file of another cluster, a large amount of overhead may be incurred and throughput efficiency may be degraded.
In one aspect, there is provided an apparatus for reducing overhead caused by communication between clusters, the apparatus including an implicit operation generating unit configured to generate an implicit operation that implies the presence of a copy operation that is to be executed concurrently with a normal operation within a basic block of code, and an operand providing unit configured to insert an operand providing operation into the basic block of code to provide an operand value of the copy operation to be executed, in response to the implicit operation being executed.
The implicit operation may comprise an additional operand which indicates the presence of the copy operation to be executed concurrently with each of the normal operations.
The operand that indicates the presence of the copy operation may comprise a single bit that is set to either “0” or “1” to represent the absence or the presence of the copy operation.
The copy operation may comprise an operation to copy or move a value from one cluster to another cluster, and the normal operation may comprise an operation within a cluster.
The copy operation may comprise a mov operation and the normal operation may comprise an add operation.
The apparatus may further comprise a scheduling unit configured to schedule the basic block of code in consideration of the generated implicit operation.
The operand providing operation may comprise four operands.
The operand providing operation may comprise a pushmvs operation which is inserted into the basic block of code and which provides an operand value for the copy operation.
The operand providing operation may pair values of its operands and input the pairs sequentially to a hardware buffer.
The implicit operation may read a corresponding operand value for the copy operation from the hardware buffer and execute the copy operation concurrently with the normal operation.
In another aspect, there is provided a method of minimizing overhead caused by communication between clusters, the method including generating an implicit operation that implies the presence of a copy operation that is to be executed concurrently with a normal operation within a basic block of code, and inserting an operand providing operation into the basic block of code to provide an operand value of the copy operation to be executed, in response to the implicit operation being executed.
The implicit operation may comprise an additional operand that indicates the presence of the copy operation to be executed concurrently with the normal operation.
The operand that indicates the presence of the copy operation may consist of a single bit that is set to either “0” or “1” to represent the absence or the presence of the copy operation.
The method may further comprise rescheduling the basic block of code in consideration of the generated implicit operation.
The copy operation may comprise an operation to copy or move a value from one cluster to another cluster, and the normal operation may comprise an operation performed within a cluster.
The operand providing operation may pair values of its operands and input the pairs sequentially to a hardware buffer.
The implicit operation may read a corresponding operand value for the copy operation from the hardware buffer and execute the copy operation concurrently with the normal operation.
In another aspect, there is provided a processor with a clustered architecture, the processor including an implicit operation generator configured to generate an implicit operation comprising an intra-cluster operation and an inter-cluster operation within a basic block of code which are to be executed concurrently, and a processing core configured to concurrently execute the intra-cluster operation and the inter-cluster operation included in the implicit operation.
The inter-cluster operation may comprise at least one of a copy operation and a move operation configured to copy or to move a value from a first cluster to a second cluster, respectively.
The implicit operation may further comprise an operand which indicates the presence of the inter-cluster operation within the implicit operation.
The implicit operation generator may be further configured to analyze a dependence of operations within the basic block of code to determine the intra-cluster operation and the inter-cluster operation to be included in the implicit operation.
Other features and aspects may be apparent from the following detailed description, the drawings, and the claims.
Throughout the drawings and the detailed description, unless otherwise described, the same drawing reference numerals will be understood to refer to the same elements, features, and structures. The relative size and depiction of these elements may be exaggerated for clarity, illustration, and convenience.
The following description is provided to assist the reader in gaining a comprehensive understanding of the methods, apparatuses, and/or systems described herein. Accordingly, various changes, modifications, and equivalents of the methods, apparatuses, and/or systems described herein will be suggested to those of ordinary skill in the art. Also, descriptions of well-known functions and constructions may be omitted for increased clarity and conciseness.
Referring to
The implicit operation generating unit 110 may generate implicit operations. As an example, an implicit operation may imply the presence of a copy operation that is to be executed concurrently with a normal operation in a basic block of code. For example, the copy operation may copy or move a value of one cluster to another cluster. This copy operation is also referred to as an inter-cluster operation. In this example, the copy operation may include an operation, for example, a mov operation. Because such operations typically require regular access to a register file outside of a cluster, the performance of application code that uses a large number of copy operations may be remarkably degraded.
A normal operation is an operation performed within a cluster. For example, a normal operation may be an add operation, a mul operation, and the like. A normal operation does not exchange values between different clusters. That is, a normal operation includes an intra-cluster operation.
An implicit operation is an operation that is defined to execute a normal operation and a copy operation in parallel with each other. For example, an implicit operation may be generated for each of the normal operations within a basic block. In this example, the implicit operation implies the presence of a copy operation to be executed.
For example, the implicit operation generating unit 110 may search for a copy operation to be executed concurrently with a normal operation based on dependence between operations within a basic block. The implicit operation generating unit 110 may store a pair of operations including a found copy operation and the normal operation in a temporary table. In this example, the implicit operation generating unit 110 may refer to the table to check whether each general instruction has a copy instruction to be executed together with it, and generate an implicit operation based on the check result.
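The dependence-based search and the temporary table described above can be sketched roughly as follows. This is only an illustrative Python sketch, not the disclosed implementation: the `Op` record, the `conflicts` test, the greedy first-fit pairing in `build_pair_table`, and the register names are all assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    name: str    # "mov" marks an inter-cluster copy; anything else is a normal operation
    dst: str     # destination register (hypothetical naming)
    srcs: tuple  # source registers


def conflicts(a, b):
    """True if a and b touch registers in a way that forbids reordering or co-issue."""
    return a.dst in b.srcs or b.dst in a.srcs or a.dst == b.dst


def build_pair_table(block):
    """Return {index of normal op: index of copy op to execute with it}.

    Assumed policy: a copy may join a normal op only if it conflicts with
    neither that op nor any operation lying between the two in the block.
    """
    table, used = {}, set()
    for i, op in enumerate(block):
        if op.name == "mov":
            continue
        for j, cand in enumerate(block):
            if cand.name != "mov" or j in used:
                continue
            lo, hi = min(i, j), max(i, j)
            if all(not conflicts(cand, block[k])
                   for k in range(lo, hi + 1) if k != j):
                table[i] = j
                used.add(j)
                break
    return table


block = [
    Op("mov", "drf[1]", ("crf[1]",)),
    Op("add", "crf[3]", ("crf[1]", "crf[2]")),
    Op("mul", "crf[4]", ("crf[3]", "crf[2]")),
    Op("mov", "drf[5]", ("crf[4]",)),
    Op("add", "crf[6]", ("crf[3]", "crf[2]")),
]
pair_table = build_pair_table(block)
```

In this hypothetical block, the second mov reads the result of the mul, so it cannot be paired with the mul itself and is paired with the following add instead.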
The scheduling unit 120 may schedule the basic block in consideration of the generated implicit operations. Because the generation of the implicit operations causes modification of the code within the basic block, re-scheduling may be performed.
The operand providing unit 130 may insert an operand providing operation into an upper portion of the basic block of code. For example, the operand providing operation may provide an operand value of the copy operation to be executed in response to the execution of an implicit operation. An operand value of the copy operation may be provided to the implicit operation before the implicit operation is executed. Thus, operation code may be inserted at the beginning of the basic block to provide operand values of copy operations to the implicit operations before the execution of the implicit operations.
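As a minimal sketch of this insertion step, the following Python code collects the operands of the copy operations implied by a basic block and prepends operand providing operations to the block. The dictionary layout, the function name `insert_operand_providers`, and the choice to group two copy operations per pushmvs (which yields four operands) are assumptions made for illustration.

```python
def insert_operand_providers(block):
    """Prepend pushmvs operations that carry the operands of implied copies.

    block: list of dicts such as
        {"op": "add", "emb": 1, "copy": ("crf[1]", "drf[2]")}
    where "copy" holds the operand pair of the peer copy operation, or None.
    Assumption: if the number of copies is odd, the last pushmvs carries
    only one pair.
    """
    pairs = [ins["copy"] for ins in block if ins["emb"] == 1]
    providers = []
    for k in range(0, len(pairs), 2):
        # Flatten up to two operand pairs into one operand list.
        operands = [reg for pair in pairs[k:k + 2] for reg in pair]
        providers.append({"op": "pushmvs", "operands": operands, "emb": 0})
    # Providers go at the top so the operand values are available
    # before any implicit operation executes.
    return providers + block


block = [
    {"op": "add", "emb": 1, "copy": ("crf[1]", "drf[2]")},
    {"op": "mul", "emb": 0, "copy": None},
    {"op": "sub", "emb": 0, "copy": None},
    {"op": "add", "emb": 1, "copy": ("drf[3]", "crf[4]")},
]
new_block = insert_operand_providers(block)
```

The order of the operands in each pushmvs follows the order in which the implicit operations appear, so that the pairs can later be consumed first in, first out.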
The processing unit 140 may execute the implicit operation. For example, the processing unit 140 may process in parallel the copy operation and the normal operation included in the implicit operation. For example, the processing unit 140 may simultaneously process the copy operation and the normal operation. It should be appreciated that the processing unit 140 may also execute other operations in addition to the implicit operations.
In (a) of
If a normal operation such as an op operation is executed concurrently with a mov operation as shown in (b) of
Generally, scheduling and register allocation with respect to application source code is completed by a compiler. Subsequently, assembly code is generated. In the examples shown in
For example, an emb value of an implicit operation that includes a normal operation and a peer copy operation to be executed together may be set to “1,” and an emb value of an implicit operation that includes a normal operation that does not have a peer copy operation to be executed together may be set to “0.”
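The setting of the emb value can be illustrated with a short Python sketch. The `ImplicitOp` tuple, the `to_implicit` function, and the instruction strings are hypothetical and serve only to show how a single-bit flag may mark the presence or absence of a peer copy operation.

```python
from typing import NamedTuple


class ImplicitOp(NamedTuple):
    normal: str  # text of the original normal operation
    emb: int     # 1 if a peer copy operation executes with it, otherwise 0


def to_implicit(normal_ops, peer_copy):
    """Tag each normal operation with an emb bit.

    peer_copy maps the index of a normal operation to the copy operation
    that is to be executed together with it (assumed input format).
    """
    return [ImplicitOp(op, 1 if i in peer_copy else 0)
            for i, op in enumerate(normal_ops)]


normal_ops = ["add r2, r0, r1", "mul r3, r2, r1", "sub r4, r3, r0", "add r5, r4, r1"]
peer_copy = {0: "mov crf[1], drf[2]", 3: "mov drf[3], crf[4]"}
implicit_ops = to_implicit(normal_ops, peer_copy)
```

Only the flag travels with the normal operation in this sketch; the operands of the copy operation are assumed to be supplied separately.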
Referring to
Accordingly, an operand providing operation may be inserted at a top of the basic block to provide an operand value of the copy operation as shown in
If there is a copy operation that is to be executed while the implicit operation is executed, the implicit operation may execute the copy operation by reading a corresponding operand value from the hardware buffer in a decode stage or a decompression pipeline stage. For example, in response to operations of pushmvs crf[1], drf[2], drf[3], crf[4] being executed in the example illustrated in
Referring to
The implicit operation is an operation to execute a normal operation and a copy operation concurrently, and may be generated for each of the operations within a basic block of code, implying the presence of the copy operation to be executed. In various examples, the implicit operation may further include an additional operand to indicate the presence of the copy operation to be executed in addition to the original normal operation as shown in (c) of
In 320, in consideration of the generated implicit operations, the basic block is re-scheduled. Generally, assembly code is generated after completion of scheduling and register allocation with respect to application source code. In this example, after the scheduling and register allocation with respect to application source code is completed, the implicit operations may be generated before assembly code is generated. Re-scheduling may be performed taking into consideration the generated implicit operations. For example, as a result of the generation of the implicit operations and the scheduling, the first and the fourth implicit operations shown in
In 330, an operand providing operation for providing an operand value of the copy operation to be executed is inserted into a top of the basic block. In response to the first and the fourth implicit operations in the basic block shown in
For example, the operand providing operation may have four operands, and when the operand providing operation itself is executed in a processor, the operand providing operation may pair the operands into two pairs, and sequentially input the pairs to a hardware buffer.
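The pairing of four operands and their sequential input to a hardware buffer may be modeled as below. This Python sketch is a simplified software model under stated assumptions: the buffer is treated as a first-in, first-out queue, each pair is interpreted as (destination, source), the normal operation is an add, and the class and function names are hypothetical.

```python
from collections import deque


class CopyOperandBuffer:
    """Software stand-in for the hardware buffer (assumed to be a FIFO)."""

    def __init__(self):
        self._fifo = deque()

    def pushmvs(self, a, b, c, d):
        # Pair the four operands into two pairs and enqueue them in order.
        self._fifo.append((a, b))
        self._fifo.append((c, d))

    def pop_pair(self):
        return self._fifo.popleft()

    def __len__(self):
        return len(self._fifo)


def execute_implicit_add(regs, buf, dst, src1, src2, emb):
    """Execute an add and, if emb is 1, a copy taken from the buffer.

    Both operations read the register state from before the cycle and
    write afterwards, which models concurrent execution.
    """
    total = regs[src1] + regs[src2]
    copy = None
    if emb == 1:
        copy_dst, copy_src = buf.pop_pair()  # assumed (destination, source) order
        copy = (copy_dst, regs[copy_src])
    regs[dst] = total
    if copy is not None:
        regs[copy[0]] = copy[1]


regs = {"crf[1]": 0, "drf[2]": 7, "drf[3]": 0, "crf[4]": 9,
        "r0": 1, "r1": 2, "r2": 0, "r3": 0, "r4": 0}
buf = CopyOperandBuffer()
buf.pushmvs("crf[1]", "drf[2]", "drf[3]", "crf[4]")
execute_implicit_add(regs, buf, "r2", "r0", "r1", emb=1)  # consumes the first pair
execute_implicit_add(regs, buf, "r3", "r2", "r1", emb=0)  # no copy, buffer untouched
execute_implicit_add(regs, buf, "r4", "r3", "r0", emb=1)  # consumes the second pair
```

Because pairs are consumed in the order they were pushed, the operand order of the operand providing operation has to match the order in which implicit operations with an emb value of 1 are executed.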
In the presence of a copy operation that is to be executed concurrently with the execution of the implicit operation, the implicit operation may read a corresponding operand value from the hardware buffer in a decode stage or a decompression pipeline stage and execute the copy operation. For example, in the example illustrated in
A simple dependence graph consisting of four operations is shown in an upper portion of
The apparatus and method shown in the above examples may contribute to the improvement of the performance of processing an application source code through the use of implicit operations.
Program instructions to perform a method described herein, or one or more operations thereof, may be recorded, stored, or fixed in one or more computer-readable storage media. The program instructions may be implemented by a computer. For example, the computer may cause a processor to execute the program instructions. The media may include, alone or in combination with the program instructions, data files, data structures, and the like. Examples of computer-readable storage media include magnetic media, such as hard disks, floppy disks, and magnetic tape; optical media such as CD ROM disks and DVDs; magneto-optical media, such as optical disks; and hardware devices that are specially configured to store and perform program instructions, such as read-only memory (ROM), random access memory (RAM), flash memory, and the like. Examples of program instructions include machine code, such as produced by a compiler, and files containing higher level code that may be executed by the computer using an interpreter. The program instructions, that is, software, may be distributed over network coupled computer systems so that the software is stored and executed in a distributed fashion. For example, the software and data may be stored by one or more computer readable storage mediums. Also, functional programs, codes, and code segments for accomplishing the example embodiments disclosed herein can be easily construed by programmers skilled in the art to which the embodiments pertain based on and using the flow diagrams and block diagrams of the figures and their corresponding descriptions as provided herein. Also, the described unit to perform an operation or a method may be hardware, software, or some combination of hardware and software. For example, the unit may be a software package running on a computer or the computer on which that software is running.
As a non-exhaustive illustration only, a terminal/device/unit described herein may refer to mobile devices such as a cellular phone, a personal digital assistant (PDA), a digital camera, a portable game console, an MP3 player, a portable/personal multimedia player (PMP), a handheld e-book, a portable laptop PC, a global positioning system (GPS) navigation device, a tablet, and a sensor, and to devices such as a desktop PC, a high definition television (HDTV), an optical disc player, a set-top box, a home appliance, and the like that are capable of wireless communication or network communication consistent with that which is disclosed herein.
A computing system or a computer may include a microprocessor that is electrically connected with a bus, a user interface, and a memory controller. It may further include a flash memory device. The flash memory device may store N-bit data via the memory controller. The N-bit data is processed or will be processed by the microprocessor and N may be 1 or an integer greater than 1. Where the computing system or computer is a mobile apparatus, a battery may be additionally provided to supply operation voltage of the computing system or computer. It will be apparent to those of ordinary skill in the art that the computing system or computer may further include an application chipset, a camera image processor (CIS), a mobile Dynamic Random Access Memory (DRAM), and the like. The memory controller and the flash memory device may constitute a solid state drive/disk (SSD) that uses a non-volatile memory to store data.
A number of examples have been described above. Nevertheless, it should be understood that various modifications may be made. For example, suitable results may be achieved if the described techniques are performed in a different order and/or if components in a described system, architecture, device, or circuit are combined in a different manner and/or replaced or supplemented by other components or their equivalents. Accordingly, other implementations are within the scope of the following claims.
Ahn, Min-wook; Jin, Tai-song; Ahn, Hee-Jin
Patent | Priority | Assignee | Title |
5838984, | Aug 19 1996 | SAMSUNG ELECTRONICS CO , LTD | Single-instruction-multiple-data processing using multiple banks of vector registers |
7647473, | Apr 25 2001 | Fujitsu Limited | Instruction processing method for verifying basic instruction arrangement in VLIW instruction for variable length VLIW processor |
8127117, | May 10 2006 | Qualcomm Incorporated | Method and system to combine corresponding half word units from multiple register units within a microprocessor |
20080126762, | |||
KR100236527, | |||
KR100822612, | |||
KR1020090009959, | |||
KR1020100034976, |
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Jul 04 2012 | AHN, MIN-WOOK | SAMSUNG ELECTRONICS CO , LTD | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 028529 | /0101 | |
Jul 04 2012 | JIN, TAI-SONG | SAMSUNG ELECTRONICS CO , LTD | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 028529 | /0101 | |
Jul 04 2012 | AHN, HEE-JIN | SAMSUNG ELECTRONICS CO , LTD | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 028529 | /0101 | |
Jul 11 2012 | Samsung Electronics Co., Ltd. | (assignment on the face of the patent) | / |
Date | Maintenance Fee Events |
Dec 16 2015 | ASPN: Payor Number Assigned. |
Apr 22 2019 | REM: Maintenance Fee Reminder Mailed. |
Oct 07 2019 | EXP: Patent Expired for Failure to Pay Maintenance Fees. |
Date | Maintenance Schedule |
Sep 01 2018 | 4 years fee payment window open |
Mar 01 2019 | 6 months grace period start (w surcharge) |
Sep 01 2019 | patent expiry (for year 4) |
Sep 01 2021 | 2 years to revive unintentionally abandoned end. (for year 4) |
Sep 01 2022 | 8 years fee payment window open |
Mar 01 2023 | 6 months grace period start (w surcharge) |
Sep 01 2023 | patent expiry (for year 8) |
Sep 01 2025 | 2 years to revive unintentionally abandoned end. (for year 8) |
Sep 01 2026 | 12 years fee payment window open |
Mar 01 2027 | 6 months grace period start (w surcharge) |
Sep 01 2027 | patent expiry (for year 12) |
Sep 01 2029 | 2 years to revive unintentionally abandoned end. (for year 12) |