An apparatus for data processing in a heterogeneous multi-processor environment is provided. The apparatus includes an analysis unit configured to analyze 1) operations to be run in connection with data processing and 2) types and a number of processors available for the data processing, a partition unit configured to dynamically partition data into a plurality of data regions having different sizes based on the analyzed operations and operation-specific processor priority information, which is stored in advance of running the operations, and a scheduling unit configured to perform scheduling by allocating operations to be run in the data regions between the available processors.
11. A method of optimizing data processing in a heterogeneous multi-processor environment, the method comprising:
analyzing, in an optimizing processor,
operations to be run in connection with data processing, by putting on hold operations called over a predetermined period of time, and monitoring the called operations for the predetermined period of time,
a processor type of each processor available for the data processing among processors in the heterogeneous multi-processor environment, and
a quantity of the processors available for the data processing;
dynamically partitioning data into data regions having different sizes, based on the analyzed operations and operation-specific processor priority information, wherein the operation-specific processor priority information is stored in advance of running the operations; and
performing scheduling, by allocating operations to be run in the data regions between the processors available for the data processing.
1. An apparatus for optimizing data processing in a heterogeneous multi-processor environment, the apparatus comprising:
an optimizing processor comprising
an analyzer configured to analyze
operations to be run in connection with data processing, by putting on hold operations called over a predetermined period of time, and monitoring the called operations for the predetermined period of time,
a processor type of each processor available for the data processing among processors in the heterogeneous multi-processor environment, and
a quantity of the processors available for the data processing;
a partitioner configured to dynamically partition data into data regions having different sizes, based on the analyzed operations and operation-specific processor priority information, wherein the operation-specific processor priority information is stored in advance of running the operations; and
a scheduler configured to perform scheduling, by allocating operations to be run in the data regions between the processors available for the data processing.
19. A method of optimizing data processing in a heterogeneous multi-processor environment, the method comprising:
analyzing, in an optimizing processor,
operations to be run in connection with data processing,
a processor type of each processor available for the data processing among processors in the heterogeneous multi-processor environment, and
a quantity of the processors available for the data processing;
dynamically partitioning data into data regions having different sizes, based on the analyzed operations and operation-specific processor priority information, wherein the operation-specific processor priority information is stored in advance of running the operations,
wherein the dynamically partitioning the data comprises determining a direction in which to partition the data, in response to the data being two-dimensional or higher-dimensional, and
wherein the direction in which to partition the data is a direction that produces a smallest quotient when dividing a total quantity of operations to be run by a quantity of data regions; and
performing scheduling, by allocating operations to be run in the data regions between the processors available for the data processing.
10. An apparatus for optimizing data processing in a heterogeneous multi-processor environment, the apparatus comprising:
an optimizing processor comprising:
an analyzer configured to analyze operations to be run in connection with data processing, to analyze a processor type of each processor available for the data processing among processors in the heterogeneous multi-processor environment, and to analyze a quantity of the processors available for the data processing;
a partitioner configured to
dynamically partition data into data regions having different sizes, based on the analyzed operations and operation-specific processor priority information, wherein the operation-specific processor priority information is stored in advance of running the operations, and
determine a direction in which to partition the data, in response to the data being two-dimensional or higher-dimensional, wherein the direction in which to partition the data is a direction that produces a smallest quotient when dividing a total quantity of operations to be run by a quantity of data regions; and
a scheduler configured to perform scheduling, by allocating operations to be run in the data regions between the processors available for the data processing.
2. The apparatus of
3. The apparatus of
4. The apparatus of
5. The apparatus of
6. The apparatus of
7. The apparatus of
8. The apparatus of
9. The apparatus of
12. The method of
13. The method of
14. The method of
15. The method of
16. The method of
17. The method of
18. The method of
This application claims the benefit under 35 U.S.C. §119(a) of Korean Patent Application No. 10-2010-0117050, filed on Nov. 23, 2010, in the Korean Intellectual Property Office, the entire disclosure of which is incorporated herein by reference for all purposes.
1. Field
The following description relates to an apparatus and method for processing data in a heterogeneous multi-processor environment.
2. Description of the Related Art
Heterogeneous multi-processor systems may be systems equipped with heterogeneous processors, for example, a main processor and accelerators. Since the accelerators can run only certain tasks at high speed, the use of the accelerators for all operations may deteriorate the overall performance of the entire heterogeneous multi-processor system. Thus, certain tasks may be run by the accelerators, and other tasks may be run by the main processor.
One of the simplest ways to utilize accelerators in a heterogeneous multi-processor environment is to allocate tasks between processor cores. However, if there is a dependency between the tasks, a main processor and accelerators may not be able to run the tasks at the same time in parallel, and the main processor or the accelerators may often be placed in a standby mode.
There may be certain tasks that are determined to be processed by certain processor cores. In this case, even though there are other processing cores available and the other processing cores may have better processing capabilities than the certain processor cores, the other available processing cores may not be used during run time and may often be placed in a standby mode instead.
The following description relates to a technique of optimizing data processing in a heterogeneous multi-processor environment, which may solve the problem that, in a heterogeneous multi-processor environment, some processors become idle due to an operation dependency while other processors run operations.
In one general aspect, an apparatus for optimizing data processing in a heterogeneous multi-processor environment is provided. The apparatus includes an analysis unit configured to analyze 1) operations to be run in connection with data processing and 2) types of processors and a number of processors available for the data processing, a partition unit configured to dynamically partition data into a plurality of data regions having different sizes based on the analyzed operations and operation-specific processor priority information, the operation-specific processor priority information being stored in advance of running the operations, and a scheduling unit configured to perform scheduling by allocating operations to be run in the data regions between the available processors.
The operation-specific processor priority information may include priority levels of processors to run each operation at high speed.
The scheduling unit may be further configured to manage the operations to be run in the data regions by placing the operations to be run in the data regions in a queue.
The scheduling unit may be further configured to place first in the queue, operations to be run in data regions having an operation dependency therebetween, regardless of a sequence between the data regions.
The scheduling unit may be further configured to place first in the queue operations to be run in data regions having a high operation dependency therebetween.
The partition unit may be further configured to preliminarily partition the data into a plurality of preliminary data regions and incorporate preliminary data regions, whose size is less than a predetermined size, with their neighboring preliminary data regions.
The partition unit may be further configured to determine a direction in which to partition the data in a case in which the data is two-dimensional or higher-dimensional.
The partition unit may be further configured to determine, as the direction in which to partition the data, a direction that produces a smallest quotient when dividing a total number of operations to be run by a number of data regions.
The partition unit may be further configured to additionally partition each data region, whose size is greater than a predetermined size, into smaller data regions.
The number of processors may be more than one.
In another general aspect, a method of optimizing data processing in a heterogeneous multi-processor environment is provided. The method includes analyzing 1) operations to be run in connection with data processing and 2) types of processors and a number of processors available for the data processing, dynamically partitioning data into a plurality of data regions having different sizes based on the analyzed operations and operation-specific processor priority information, the operation-specific processor priority information being stored in advance of running the operations, and performing scheduling by allocating operations to be run in the data regions between the available processors.
The operation-specific processor priority information may include priority levels of processors to run each operation at high speed.
The performing scheduling may include managing the operations to be run in the data regions by placing the operations to be run in the data regions in a queue.
The performing scheduling may further include placing first in the queue operations to be run in data regions having an operation dependency therebetween, regardless of a sequence between the data regions.
The performing scheduling may further include placing first in the queue operations to be run in data regions having a high operation dependency therebetween.
The dynamically partitioning the data may include preliminarily partitioning the data into a plurality of preliminary data regions and incorporating preliminary data regions, whose size is less than a predetermined size, with their neighboring preliminary data regions.
The dynamically partitioning the data may further include determining a direction in which to partition the data in a case in which the data is two-dimensional or higher-dimensional.
The dynamically partitioning the data may further include determining, as the direction in which to partition the data, a direction that produces a smallest quotient when dividing a total number of operations to be run by a number of data regions.
The dynamically partitioning the data may further include additionally partitioning each data region, whose size is greater than a predetermined size, into smaller data regions.
The number of processors may be more than one.
Other features and aspects may be apparent from the following detailed description, the drawings, and the claims.
Throughout the drawings and the detailed description, unless otherwise described, the same drawing reference numerals will be understood to refer to the same elements, features, and structures. The relative size and depiction of these elements may be exaggerated for clarity, illustration, and convenience.
The following description is provided to assist the reader in gaining a comprehensive understanding of the methods, apparatuses, and/or systems described herein. Accordingly, various changes, modifications, and equivalents of the methods, apparatuses, and/or systems described herein will be suggested to those of ordinary skill in the art. Also, descriptions of well-known functions and constructions may be omitted for increased clarity and conciseness.
In a heterogeneous multi-processor environment, while some processors run operations, other processors may be placed in a standby mode due to dependencies between operations. In this case, an apparatus for optimizing data processing in a heterogeneous multi-processor environment may partition data into a plurality of data regions having different sizes based on the processing capabilities of heterogeneous processors, and the apparatus may allow some of the heterogeneous processors available to handle the data regions, thereby 1) minimizing any dependency between operations and 2) improving the data processing capabilities of the heterogeneous processors. The apparatus will hereinafter be described in detail with reference to
The analysis unit 110 analyzes 1) operations to be run in connection with data processing and 2) types of processors and a number of processors available for the data processing.
For example, the analysis unit 110 may acquire the operations to be run in connection with the data processing by putting on hold operations called over a predetermined period of time and monitoring the called operations for the predetermined period of time, instead of instantly running the called operations.
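As a non-limiting illustration, the holding and monitoring of called operations may be sketched as follows. The class name `OperationCollector`, its methods, and the injectable `clock` parameter are hypothetical and are introduced only for this sketch; the description does not specify how the hold is implemented.

```python
import time


class OperationCollector:
    """Holds called operations for a fixed window instead of running them at once."""

    def __init__(self, window_seconds, clock=time.monotonic):
        self.window_seconds = window_seconds  # the "predetermined period of time"
        self.clock = clock                    # injectable so the sketch can be tested
        self.pending = []                     # operations put on hold
        self.window_start = None

    def call(self, name, *args):
        """Record a called operation rather than running it immediately."""
        if self.window_start is None:
            self.window_start = self.clock()
        self.pending.append((name, args))

    def window_elapsed(self):
        """True once the monitoring period has passed since the first held call."""
        return (self.window_start is not None
                and self.clock() - self.window_start >= self.window_seconds)

    def drain(self):
        """Hand the held operations to the analysis step and start a new window."""
        held, self.pending = self.pending, []
        self.window_start = None
        return held
```

In this sketch, a caller would invoke `call()` for each operation, poll `window_elapsed()`, and then pass the result of `drain()` to the partitioning step.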
The analysis unit 110 may acquire the types of processors and the number of processors available from device registration information and device resource utilization information of an operating system (OS).
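As a minimal sketch of this step, the types and number of available processors may be tallied from device records. The record layout (`name`, `type`, `utilization`) and the `busy_threshold` parameter are assumptions made for illustration; the description states only that the information comes from device registration information and device resource utilization information of the OS.

```python
from collections import Counter


def available_processors(devices, busy_threshold=0.9):
    """Count available processors by type.

    devices: list of dicts with hypothetical keys "name", "type", "utilization".
    A processor is treated as available when its utilization is below the
    (assumed) busy_threshold.
    """
    free = [d for d in devices if d["utilization"] < busy_threshold]
    return Counter(d["type"] for d in free)


devices = [
    {"name": "core 1", "type": "main", "utilization": 0.20},
    {"name": "core 2", "type": "main", "utilization": 0.35},
    {"name": "accelerator 1", "type": "accelerator", "utilization": 0.10},
    {"name": "accelerator 2", "type": "accelerator", "utilization": 0.95},
]
counts = available_processors(devices)  # accelerator 2 is skipped as busy
```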
The partition unit 120 partitions data into a plurality of data regions having different sizes based on the operations acquired by the analysis unit 110 and operation-specific processor priority information, which is stored in advance of running the operations. The operation-specific processor priority information may include the priority levels of processors to run each operation at high speed.
It will hereinafter be described how to partition data including six operations, which output their respective screen interfaces on a screen, into a plurality of data regions with reference to
Referring to
Referring to
The partition unit 120 dynamically partitions data into a plurality of data regions having different sizes using the operations acquired by the analysis unit 110, thereby resolving an operation dependency between the data regions. The partition unit 120 uses the operation-specific processor priority information to take into consideration the characteristics of heterogeneous processors having different processing capabilities in a heterogeneous multi-processor environment.
The main processor is a first-priority processor for operation 3. Accelerator 2, accelerator 1, and the main processor are first-, second-, and third-priority processors, respectively, for operation 4. Accelerator 1 and the main processor are first- and second-priority processors, respectively, for operation 5. The main processor is a first-priority processor for operation 6.
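For illustration only, the priority levels stated above for operations 3 through 6 may be held in a lookup table such as the following. The table layout and the helper `priority_level` are hypothetical; the description does not prescribe a storage format, and only the operations whose priorities are stated above are included.

```python
# Operation-specific processor priority information, stored before the
# operations run. Each list is ordered from first priority downward.
PRIORITY = {
    3: ["main"],
    4: ["accelerator 2", "accelerator 1", "main"],
    5: ["accelerator 1", "main"],
    6: ["main"],
}


def priority_level(operation, processor):
    """Return 1 for first priority, 2 for second, and so on.

    Returns None when the processor is not listed for the operation.
    """
    order = PRIORITY[operation]
    return order.index(processor) + 1 if processor in order else None
```

For example, `priority_level(4, "accelerator 1")` yields 2, matching the statement that accelerator 1 is the second-priority processor for operation 4.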
The partition unit 120 incorporates preliminary data regions, whose size is less than a predetermined size, with their neighboring preliminary data regions in order to reduce the number of preliminary data regions and thus to reduce the processing load.
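One possible way to incorporate undersized preliminary data regions into their neighbors is sketched below for one-dimensional regions. The representation of a region as a half-open `(start, end)` pair, the function name, and the rule of merging into the left neighbor when one exists are assumptions for this sketch, not features stated in the description.

```python
def merge_small_regions(regions, min_size):
    """Merge regions smaller than min_size into a neighboring region.

    regions: contiguous, sorted list of half-open (start, end) pairs.
    An undersized region joins its left neighbor; if it has none yet,
    it is carried forward and joined with the region to its right.
    """
    merged = []
    carry = None  # start of an undersized region waiting for a right neighbor
    for start, end in regions:
        if carry is not None:
            start, carry = carry, None
        if end - start >= min_size:
            merged.append((start, end))
        elif merged:
            merged[-1] = (merged[-1][0], end)  # absorb into the left neighbor
        else:
            carry = start
    if carry is not None:  # every region was undersized: keep one merged region
        merged.append((carry, regions[-1][1]))
    return merged


result = merge_small_regions([(0, 2), (2, 10), (10, 11), (11, 20)], min_size=3)
# result == [(0, 11), (11, 20)]
```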
For two-dimensional or higher-dimensional data, the partition unit 120 determines a direction in which to partition the data. For example, the partition unit 120 may be configured to determine, as the direction in which to partition the data, a direction that produces a smallest quotient when dividing a total number of operations to be run by the number of data regions.
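The direction selection may be sketched as follows, under one possible reading of the criterion: for each candidate direction, the total number of operations to be run (counted once per data region in which the operation runs) is divided by the number of data regions, and the direction with the smallest quotient is chosen. The function name, the input layout, and this counting convention are assumptions for illustration.

```python
def choose_direction(candidates):
    """Pick the partition direction with the smallest operations-per-region quotient.

    candidates: {direction: [operations_in_region, ...]}, where each inner
    list holds the operations to be run in one data region produced by
    partitioning along that direction.
    """
    def quotient(regions):
        total_ops = sum(len(ops) for ops in regions)
        return total_ops / len(regions)

    return min(candidates, key=lambda direction: quotient(candidates[direction]))


candidates = {
    "horizontal": [[1, 2], [2, 3], [3]],  # 5 operations / 3 regions
    "vertical": [[1, 2, 3], [2, 3]],      # 5 operations / 2 regions
}
direction = choose_direction(candidates)  # "horizontal"
```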
Referring to
Referring to
As a result of dynamically partitioning the data, as described above with reference to
Referring to
The scheduling unit 130 performs scheduling by allocating operations to be run in data regions obtained by the partition unit 120 between processors determined to be available based on results of analysis performed by the analysis unit 110. For example, the scheduling unit 130 may be configured to place the operations to be run in the data regions in a queue and thus to manage the operations to be run in the data regions.
The placing of the operations to be run in the data regions in the queue regardless of the sequence between the data regions may be one of the simplest ways of scheduling, but may not resolve an operation dependency between the data regions. Thus, the scheduling unit 130 may be configured to place operations to be run in data regions having an operation dependency therebetween first in the queue, regardless of the sequence between the data regions, and thus to reduce the operation dependency. The scheduling unit 130 may also be configured to place operations to be run in data regions having a high operation dependency therebetween in the queue ahead of operations to be run in data regions having a low operation dependency therebetween.
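A minimal sketch of such a queue ordering is given below. It assumes that the operation dependency of each data region can be summarized as a single number (`dependency_degree`), which is a simplification introduced here; the description does not define how dependency is quantified.

```python
from collections import deque


def build_queue(region_ops, dependency_degree):
    """Place operations of highly dependent regions at the front of the queue.

    region_ops: list of (region_id, operation) pairs in region order.
    dependency_degree: {region_id: assumed numeric measure of dependency}.
    Python's sort is stable, so operations with equal dependency keep
    their original relative order.
    """
    ordered = sorted(region_ops,
                     key=lambda item: -dependency_degree.get(item[0], 0))
    return deque(ordered)


queue = build_queue(
    [("A", "op1"), ("B", "op2"), ("C", "op3")],
    {"A": 0, "B": 2, "C": 1},
)
# Region B (highest dependency) is first, then C, then A.
```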
Each of main processor cores 1 and 2 and accelerators 1 and 2 may run operations placed in a queue by removing the operations from the queue. Dependency resolving may be performed on the operations in order from the front of the queue to the rear of the queue, and the main processor cores 1 and 2 and the accelerators 1 and 2 may search for and run first-priority operations, which may be operations designating the main processor cores 1 and 2 and the accelerators 1 and 2 as first-priority processors.
If none of the first-priority operations are dependency-resolved, the main processor cores 1 and 2 and the accelerators 1 and 2 may search the second-priority operations, which may be operations designating the main processor cores 1 and 2 and the accelerators 1 and 2 as second-priority processors, for any dependency-resolved operations and run the dependency-resolved second-priority operations, or may wait a predetermined amount of time and search again for first-priority operations. The main processor cores 1 and 2 and the accelerators 1 and 2 may determine whether to run second-priority operations or to stand by for the predetermined amount of time based on 1) their current state information, 2) the number of first-priority operations yet to be run, and 3) the number of dependency-resolved first-priority operations.
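The search performed by each processor may be sketched as follows. The function name and argument layout are hypothetical, and the sketch always falls back to second-priority operations when no first-priority operation is dependency-resolved; the alternative of waiting a predetermined amount of time and searching again is represented only by the `None` return value.

```python
def pick_operation(queue, processor, priority, resolved):
    """Select the next operation for a processor, scanning front to rear.

    queue: list of operation ids, front of the queue first (modified in place).
    priority: {operation: [processors ordered from first priority downward]}.
    resolved: set of operations whose dependencies are resolved.
    Returns the chosen operation, or None if the processor should stand by.
    """
    for level in (0, 1):  # first-priority pass, then second-priority pass
        for op in queue:
            order = priority[op]
            if op in resolved and len(order) > level and order[level] == processor:
                queue.remove(op)  # run an operation by removing it from the queue
                return op
    return None


priority = {
    3: ["main"],
    4: ["accelerator 2", "accelerator 1", "main"],
    5: ["accelerator 1", "main"],
}
queue = [4, 5, 3]
first = pick_operation(queue, "accelerator 1", priority, resolved={4, 3})
# Operation 5 is not yet resolved, so accelerator 1 takes operation 4,
# for which it is the second-priority processor.
```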
In this manner, an apparatus for optimizing data processing in a heterogeneous multi-processor environment may improve data processing performance by partitioning data into a plurality of data regions having different sizes in consideration of the characteristics of heterogeneous processors having different processing capabilities in a heterogeneous multi-processor environment and processing operations to be run in the data regions so as to minimize any operation dependency between the data regions.
A data processing optimization operation performed by an apparatus for optimizing data processing in a heterogeneous multi-processor environment will hereinafter be described in detail with reference to
For example, the apparatus may acquire the operations to be run in connection with data processing by putting on hold operations called over a predetermined period of time and monitoring called operations, instead of instantly running the called operations.
The apparatus may also acquire the types of processors and the number of processors available from device registration information and device resource utilization information of an operating system (OS).
The apparatus dynamically partitions data into a plurality of data regions having different sizes based on the operations analyzed in operation 910 and operation-specific processor priority information, which is stored in advance (920). For example, the operation-specific processor priority information may include priority levels of processors to run each operation at high speed.
In operation 920, the apparatus may preliminarily partition the data into a plurality of preliminary data regions and incorporate preliminary data regions, whose size is less than a predetermined size, with their neighboring preliminary data regions.
If the data is two-dimensional or higher-dimensional, the apparatus may be configured to determine a direction in which to partition the data. For example, the apparatus may be configured to determine, as the direction in which to partition the data, a direction that produces a smallest quotient when dividing a total number of operations to be run by the number of data regions.
In operation 920, the apparatus may be configured to additionally partition each preliminary data region, whose size is greater than a predetermined size, into smaller data regions. The dynamic partitioning of data into a plurality of data regions has already been described above, and thus, a detailed description thereof will be omitted.
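As a non-limiting sketch, the additional partitioning of oversized regions may be performed as shown below for one-dimensional regions. The half-open `(start, end)` representation and the choice to split each oversized region into nearly equal pieces are assumptions for illustration; the description states only that regions larger than a predetermined size are partitioned into smaller data regions.

```python
def split_large_regions(regions, max_size):
    """Split every region larger than max_size into nearly equal pieces.

    regions: list of half-open (start, end) pairs.
    """
    out = []
    for start, end in regions:
        size = end - start
        if size <= max_size:
            out.append((start, end))
            continue
        pieces = -(-size // max_size)       # ceiling division
        base, extra = divmod(size, pieces)  # first `extra` pieces get one more unit
        cursor = start
        for i in range(pieces):
            step = base + (1 if i < extra else 0)
            out.append((cursor, cursor + step))
            cursor += step
    return out


result = split_large_regions([(0, 4), (4, 14)], max_size=4)
# result == [(0, 4), (4, 8), (8, 11), (11, 14)]
```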
The apparatus performs scheduling by allocating operations to be run in the data regions obtained in operation 920 between the available processors analyzed in operation 910 (930).
For example, the apparatus may be configured to place the operations to be run in the data regions obtained in operation 920 in a queue and thus to manage the operations to be run in the data regions obtained in operation 920. The apparatus may also be configured to place operations to be run in data regions having an operation dependency therebetween first in the queue, regardless of the sequence between the data regions, and thus to reduce the operation dependency. The apparatus may also be configured to place operations to be run in data regions having a high operation dependency therebetween in the queue ahead of operations to be run in data regions having a low operation dependency therebetween.
In a heterogeneous multi-processor environment, various processors such as main processor cores and accelerators may run operations placed in a queue by removing the operations from the queue. Dependency resolving may be run on the operations in order from the front of the queue to the rear of the queue, and the various processors may search for and run first-priority operations, which are operations designating the various processors as first-priority processors.
In a case in which none of the first-priority operations are dependency-resolved, the various processors may search the second-priority operations, which are operations designating the various processors as second-priority processors, for any dependency-resolved operations and run the dependency-resolved second-priority operations, or may wait a predetermined amount of time and search again for first-priority operations. The various processors may determine whether to run second-priority operations or to stand by for the predetermined amount of time based on 1) their current state information, 2) the number of first-priority operations yet to be run, and 3) the number of dependency-resolved first-priority operations.
In this manner, the apparatus can improve data processing performance by partitioning data into a plurality of data regions having different sizes in consideration of the characteristics of heterogeneous processors having different processing capabilities in a heterogeneous multi-processor environment and processing operations to be run in the data regions so as to minimize any operation dependency between the data regions.
The processes, functions, methods and/or software described herein may be recorded, stored, or fixed in one or more computer-readable storage media that include program instructions to be implemented by a computer to cause a processor to execute or perform the program instructions. The media may also include, alone or in combination with the program instructions, data files, data structures, and the like. The media and program instructions may be those specially designed and constructed, or they may be of the kind well-known and available to those having skill in the computer software arts. Examples of computer-readable media include magnetic media, such as hard disks, floppy disks, and magnetic tape; optical media, such as CD-ROM disks and DVDs; magneto-optical media, such as optical disks; and hardware devices that are specially configured to store and perform program instructions, such as read-only memory (ROM), random access memory (RAM), flash memory, and the like. Examples of program instructions include machine code, such as produced by a compiler, and files containing higher level code that may be executed by the computer using an interpreter. The described hardware devices may be configured to act as one or more software modules that are recorded, stored, or fixed in one or more computer-readable storage media, in order to perform the operations and methods described above, or vice versa. In addition, a computer-readable storage medium may be distributed among computer systems connected through a network, and computer-readable codes or program instructions may be stored and executed in a decentralized manner.
A number of examples have been described above. Nevertheless, it should be understood that various modifications may be made. For example, suitable results may be achieved if the described techniques are performed in a different order and/or if components in a described system, architecture, device, or circuit are combined in a different manner and/or replaced or supplemented by other components or their equivalents. Accordingly, other implementations are within the scope of the following claims.
Kim, Jae-Won, Baik, Hyun-Ki, Joung, Gyong-jin, Chung, Hee-Jin
Patent | Priority | Assignee | Title |
11093441, | Nov 20 2017 | Samsung Electronics Co., Ltd. | Multi-core control system that detects process dependencies and selectively reassigns processes |
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Aug 01 2011 | CHUNG, HEE-JIN | SAMSUNG ELECTRONICS CO , LTD | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 026697 | /0367 | |
Aug 01 2011 | BAIK, HYUN-KI | SAMSUNG ELECTRONICS CO , LTD | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 026697 | /0367 | |
Aug 01 2011 | KIM, JAE-WON | SAMSUNG ELECTRONICS CO , LTD | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 026697 | /0367 | |
Aug 01 2011 | JOUNG, GYONG-JIN | SAMSUNG ELECTRONICS CO , LTD | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 026697 | /0367 | |
Aug 03 2011 | Samsung Electronics Co., Ltd. | (assignment on the face of the patent) | / |
Date | Maintenance Fee Events |
Jun 28 2017 | ASPN: Payor Number Assigned. |
Nov 23 2020 | REM: Maintenance Fee Reminder Mailed. |
May 10 2021 | EXP: Patent Expired for Failure to Pay Maintenance Fees. |
Date | Maintenance Schedule |
Apr 04 2020 | 4 years fee payment window open |
Oct 04 2020 | 6 months grace period start (w surcharge) |
Apr 04 2021 | patent expiry (for year 4) |
Apr 04 2023 | 2 years to revive unintentionally abandoned end. (for year 4) |
Apr 04 2024 | 8 years fee payment window open |
Oct 04 2024 | 6 months grace period start (w surcharge) |
Apr 04 2025 | patent expiry (for year 8) |
Apr 04 2027 | 2 years to revive unintentionally abandoned end. (for year 8) |
Apr 04 2028 | 12 years fee payment window open |
Oct 04 2028 | 6 months grace period start (w surcharge) |
Apr 04 2029 | patent expiry (for year 12) |
Apr 04 2031 | 2 years to revive unintentionally abandoned end. (for year 12) |