A multiprocessor system which includes automatic workload distribution is disclosed. As threads execute in the multiprocessor system, an operating system or hypervisor continuously learns the execution characteristics of the threads and saves the information in thread-specific control blocks. The execution characteristics are used to generate thread performance data. As each thread executes, the operating system continuously uses the performance data to steer the thread to the core that will execute its workload most efficiently.
1. A method for automatic workload distribution within a multiprocessor system comprising
measuring performance of an application while executing on processors of the multiprocessor system;
storing data relating to performance of the application on the processors of the multiprocessor system; and,
assigning executing of an application to a processor having characteristics corresponding to processing consumption attributes of the application; and wherein
the storing is within a control data structure of a corresponding application.
7. An apparatus for automatic workload distribution within a multi-core processor comprising
means for measuring performance of an application while executing on processors of the multiprocessor system;
means for storing data relating to performance of the application on the processors of the multiprocessor system; and,
means for assigning executing of an application to a processor having characteristics corresponding to processing consumption attributes of the application; and wherein
the storing is within a control data structure of a corresponding application.
6. A method for automatic workload distribution within a multiprocessor system comprising
measuring performance of an application while executing on processors of the multiprocessor system;
storing data relating to performance of the application on the processors of the multiprocessor system;
assigning executing of an application to a processor having characteristics corresponding to processing consumption attributes of the application;
learning a resource load that the application places on the multiprocessor system; and,
considering the resource load when assigning executing of the application based upon the resource load.
10. An apparatus for automatic workload distribution within a multi-core processor comprising
means for measuring performance of an application while executing on processors of the multiprocessor system;
means for storing data relating to performance of the application on the processors of the multiprocessor system;
means for assigning executing of an application to a processor having characteristics corresponding to processing consumption attributes of the application;
means for learning a resource load that the application places on the multiprocessor system; and,
means for considering the resource load when assigning executing of the application based upon the resource load.
12. A multi-core processor system comprising
a plurality of processor cores;
a memory, the memory storing an automatic workload distribution system, the automatic workload distribution system comprising instructions executable by the multi-core processor for
measuring performance of an application while executing on processors of the multiprocessor system;
storing data relating to performance of the application on the processors of the multiprocessor system; and,
assigning executing of an application to a processor having characteristics corresponding to processing consumption attributes of the application; and wherein,
the storing is within a control data structure of a corresponding application.
15. A multi-core processor system comprising:
a plurality of processor cores;
a memory, the memory storing an automatic workload distribution system, the automatic workload distribution system comprising instructions executable by the multi-core processor for
measuring performance of an application while executing on processors of the multiprocessor system;
storing data relating to performance of the application on the processors of the multiprocessor system;
assigning executing of an application to a processor having characteristics corresponding to processing consumption attributes of the application;
learning a resource load that the application places on the multiprocessor system; and,
considering the resource load when assigning executing of the application based upon the resource load.
2. The method of
comparing hardware utilization statistics stored in the control data structure of the application with characteristics for processors on the system.
3. The method of
the multiprocessor system comprises a performance monitor; and,
the measuring is performed by the performance monitor of the multiprocessor system.
4. The method of
at least one processor of the processors of the multiprocessor system comprises a plurality of cores; and,
the measuring comprises measuring performance of an application while executing on the plurality of cores of the at least one processor; and further comprising
characterizing performance of the plurality of cores based upon the measuring; and,
storing characterization information relating to performance of the plurality of cores of the at least one processor.
5. The method of
the resource load comprises determining at least one of single or double floating point operation, memory usage, and, instructions that use single or multiple cycles.
8. The apparatus of
means for comparing hardware utilization statistics stored in the control data structure of the application with characteristics for processors on the system.
9. The apparatus of
the multiprocessor system comprises a performance monitor; and,
the measuring is performed by the performance monitor of the multiprocessor system.
11. The apparatus of
the resource load comprises determining at least one of single or double floating point operation, memory usage, and, instructions that use single or multiple cycles.
13. The multi-core processor system of
comparing hardware utilization statistics stored in the control data structure of the application with characteristics for processors on the system.
14. The multi-core processor system of
a performance monitor; and wherein
the instructions for measuring cause the performance monitor to measure performance of the application.
16. The multi-core processor system of
the resource load comprises determining at least one of single or double floating point operation, memory usage, and, instructions that use single or multiple cycles.
1. Field of the Invention
The present invention relates to a method for autonomic workload distribution on a multi-core processor and, more specifically, to automating workload distribution on a multi-core processor.
2. Description of the Related Art
In multi-core computer systems, different system resources (such as CPUs, memory, I/O bandwidth, disk storage, etc.) are each used to operate on multiple instruction threads. Challenges associated with efficiently operating these multi-core computer systems only increase as the number and complexity of cores in a multiprocessor computer grows.
One issue relating to the use of multi-core integrated circuits is that it is often difficult to write software that takes advantage of the multiple cores. To take advantage of multi-core processors, tasks often need to be divided into threads, and the threads often need to be distributed onto available cores. An issue relating to distributing threads is how to steer them efficiently. In known systems, workload is sent to cores based on availability and affinity. In other systems, software is written so that particular tasks run on a specific type of core. As the number and type of cores increase, there will be an opportunity to distribute workload in a more intelligent way.
In accordance with the present invention, a multi-core system which includes automatic workload distribution is set forth. More specifically, as threads execute in the multi-core system, an operating system/hypervisor continuously learns the execution characteristics of the threads and saves the information in thread-specific control blocks. The execution characteristics are used to generate thread performance data. As the thread executes, the operating system/hypervisor continuously uses the performance data to steer the thread to a core that will execute the workload most efficiently.
More specifically, in one embodiment, the invention relates to a method for automatic workload distribution within a multiprocessor system. The method includes measuring performance of an application while executing on processors of the multiprocessor system; storing data relating to performance of the application on the processors of the multiprocessor system; and, assigning executing of an application to a processor having characteristics corresponding to processing consumption attributes of the application.
In another embodiment, the invention relates to an apparatus for automatic workload distribution within a multi-core processor. The apparatus includes means for measuring performance of an application while executing on processors of the multiprocessor system; means for storing data relating to performance of the application on the processors of the multiprocessor system; and, means for assigning executing of an application to a processor having characteristics corresponding to processing consumption attributes of the application.
In another embodiment, the invention relates to a multi-core processor system comprising a plurality of processor cores and a memory. The memory stores an automatic workload distribution system. The automatic workload distribution system includes instructions executable by the multi-core processor for measuring performance of an application while executing on processors of the multiprocessor system; storing data relating to performance of the application on the processors of the multiprocessor system; and, assigning executing of an application to a processor having characteristics corresponding to processing consumption attributes of the application.
The present invention may be better understood, and its numerous objects, features and advantages made apparent to those skilled in the art by referencing the accompanying drawings. The use of the same reference number throughout the several figures designates a like or similar element.
Referring now to
As further depicted in
The processing units communicate with other components of system 100 via a system interconnect or fabric bus 150. Fabric bus 150 is connected to one or more service processors 160, a system memory device 161, a memory controller 162, a shared or L3 system cache 166, and/or various peripheral devices 169. A processor bridge 170 can optionally be used to interconnect additional processor groups. Though not shown, it will be understood that the data processing system 100 may also include firmware which stores the system's basic input/output logic, and seeks out and loads an operating system from one of the peripherals whenever the computer system is first turned on (booted).
As depicted in
The system memory device 161 (random access memory or RAM) stores program instructions and operand data used by the processing units in a volatile (temporary) state, including the operating system 161A and application programs 161B. The automatic workload distribution module 161C may be stored in the system memory in any desired form, such as an operating system module, hypervisor component, etc., and is used to optimize the execution of a single-threaded program across multiple cores of the processor units. Although illustrated as a facility within system memory, those skilled in the art will appreciate that the automatic workload distribution module 161C may alternatively be implemented within another component of data processing system 100, or an automatic workload distribution unit may exist as a stand-alone unit (located either on the processor or off of the processor). The automatic workload distribution module 161C is implemented as executable instructions, code and/or control logic, including programmable registers, operative to check performance monitor information for code running on the system 100, to assign priority values to the code using predetermined policies, and to tag each instruction with its assigned priority value so that the priority value is distributed across the system 100 with the instruction, as described more fully below.
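The priority-assignment and tagging behavior described for the workload distribution module can be sketched as follows. This is a minimal illustration, not the patented implementation: the policy thresholds, field names (cache_miss_rate, fp_ratio), and priority values are all assumptions invented for the example; the source says only that priorities are assigned by predetermined policies and attached to each instruction.

```python
# Hypothetical sketch of module 161C's tagging step: map performance-monitor
# information to a priority value via a fixed policy, then attach that value
# to each instruction so it travels with the instruction. All thresholds and
# field names are illustrative assumptions, not from the patent text.

def assign_priority(perf_info: dict) -> int:
    """Map performance-monitor info to a priority value via a fixed policy."""
    if perf_info.get("cache_miss_rate", 0.0) > 0.2:
        return 2  # memory-bound code: elevated priority for memory resources
    if perf_info.get("fp_ratio", 0.0) > 0.5:
        return 1  # floating-point-heavy code
    return 0      # default priority

def tag_instructions(instructions: list, priority: int) -> list:
    """Attach the assigned priority value to each instruction."""
    return [(insn, priority) for insn in instructions]
```

For example, code with a 30% cache miss rate would be assigned priority 2, and every instruction in its stream would carry that tag.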
The system 100 also includes a performance monitor 180. The performance monitor 180 may provide the performance information used by the automatic workload distribution module 161C when performing an automatic workload distribution function. More specifically, as threads execute in the multi-core system, an operating system/hypervisor continuously learns the execution characteristics of the threads and saves the information in thread-specific control blocks. The execution characteristics are used to generate thread performance data. As the thread executes, the operating system/hypervisor continuously uses the performance data to steer the thread to a core that will execute the workload most efficiently.
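The thread-specific control block that accumulates execution characteristics might look like the sketch below. The field names (fp_ratio, mem_accesses, avg_cycles) and the exponential-moving-average update are assumptions for illustration; the source specifies only that characteristics are learned continuously and saved per thread.

```python
# Hypothetical sketch of a thread-specific control block. The smoothing
# scheme and field names are illustrative assumptions, not from the patent.
from dataclasses import dataclass

@dataclass
class ThreadControlBlock:
    thread_id: int
    # Smoothed execution characteristics, refreshed each time slice.
    fp_ratio: float = 0.0      # fraction of floating-point instructions
    mem_accesses: float = 0.0  # memory accesses per 1000 instructions
    avg_cycles: float = 0.0    # average cycles per instruction

    def update(self, sample: dict, alpha: float = 0.25) -> None:
        """Blend a new performance-monitor sample into the running profile."""
        self.fp_ratio = (1 - alpha) * self.fp_ratio + alpha * sample["fp_ratio"]
        self.mem_accesses = (1 - alpha) * self.mem_accesses + alpha * sample["mem_accesses"]
        self.avg_cycles = (1 - alpha) * self.avg_cycles + alpha * sample["avg_cycles"]
```

Calling `update` once per time slice keeps the profile current while damping short-lived spikes, which matches the "continuously learns" behavior described above.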
Referring to
The first time an application is executed in the system, a performance monitor on the core measures the characteristics of the system usage. The monitor analyzes, for example, single or double floating point operations, memory usage (L1, L2, or main memory accesses), instructions that use single or multiple cycles, and other items. The performance monitor learns the resource load that the application puts on the system at step 230. The application, subroutine, or thread is tagged and the performance data is stored at step 240. The performance monitor data is extracted from the performance monitor 180. The hardware performance data is stored within a control data structure of the thread for use by the operating system/hypervisor/cluster scheduler. (The hardware performance data may also be used to characterize performance of the cores of the processor, and characterization information may be stored on the processor.) The scheduler compares the hardware utilization statistics stored in the control data structure of the thread with characteristics for the processors on the system at step 250. The operating system or hypervisor assigns the thread to the core that best matches its hardware capabilities with the measured processing consumption attributes of the workload at step 260.
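The comparison at steps 250/260 can be sketched as a scoring problem: rate each core's capability descriptor against the thread's stored utilization statistics and pick the closest match. The squared-difference scoring function and the attribute names below are assumptions; the source says only that the scheduler compares statistics with processor characteristics and assigns the best match.

```python
# Illustrative sketch of the matching step: compare a thread's stored
# utilization statistics against per-core capability descriptors and pick
# the closest match. The scoring function is an assumption for illustration.

def best_core(thread_profile: dict, cores: dict) -> str:
    """Return the id of the core whose capabilities best fit the profile."""
    def mismatch(caps: dict) -> float:
        # Sum of squared differences over the shared attributes.
        return sum((thread_profile[k] - caps[k]) ** 2 for k in thread_profile)
    return min(cores, key=lambda cid: mismatch(cores[cid]))

# An FP-heavy thread is steered to the FP-strong core.
profile = {"fp_strength": 0.9, "mem_bandwidth": 0.2}
cores = {
    "fp_core":  {"fp_strength": 1.0, "mem_bandwidth": 0.3},
    "mem_core": {"fp_strength": 0.3, "mem_bandwidth": 1.0},
}
assert best_core(profile, cores) == "fp_core"
```

Because the profile is refreshed every time slice, rerunning this match lets the thread migrate when its behavior shifts.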
The data can also be used by the scheduler to intelligently combine workloads on a processor or core. For example, the automatic workload distribution module 161C may determine that it is more efficient to execute a thread that frequently accesses memory together with a thread that accesses data out of the cache on the same core or processor. The data can also be used to match cache latency performance to caches with various latencies and sizes.
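The workload-combining idea above can be sketched as a greedy pairing: co-schedule a thread that frequently goes to main memory with one that mostly hits in cache, so the pair does not contend for the same resource. The pairing heuristic and the memory-intensity metric are illustrative assumptions, not the patented algorithm.

```python
# Hedged sketch of complementary workload pairing: each thread is described
# by a memory-intensity score in [0, 1] (an assumed metric); lightest and
# heaviest memory users are paired onto the same core.

def pair_complementary(threads: dict) -> list:
    """Greedily pair low-memory-traffic threads with high-traffic ones."""
    ordered = sorted(threads, key=lambda t: threads[t])  # by memory intensity
    pairs = []
    while len(ordered) >= 2:
        # Pair the most cache-resident thread with the most memory-bound one.
        pairs.append((ordered.pop(0), ordered.pop(-1)))
    return pairs
```

With four threads of intensities {a: 0.9, b: 0.1, c: 0.5, d: 0.8}, this yields the pairs (b, a) and (c, d), placing the cache-friendly thread b alongside the memory-bound thread a.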
The combination of processors with different processing characteristics within a single system, cluster, or hypervisor execution complex; low-level, non-intrusive processor or core performance monitoring; and a scheduling algorithm that makes dispatching decisions based on measured unit utilization characteristics to route work to the appropriate processor or core provides an advantageous automatic workload distribution system. Additionally, because this process is continuous and performance utilization data is gathered during every time slice, a thread or workload can autonomously move from processor to processor in the complex over time as its workload changes.
Those skilled in the art will appreciate that data processing system 100 can include many additional or fewer components, such as I/O adapters, interconnect bridges, non-volatile storage, ports for connection to networks or attached devices, etc. Because such components are not necessary for an understanding of the present invention, they are not illustrated in
Consequently, the invention is intended to be limited only by the spirit and scope of the appended claims, giving full cognizance to equivalents in all respects.
Capps, Jr., Louis Bennie, Nayar, Naresh, Cook, Thomas Edward, Bell, Jr., Robert H., Shapiro, Michael Jay, Pierson, Bernadette Ann, Dewkett, Thomas J., Newhart, Ronald Edward
Assignments, all to International Business Machines Corporation (Reel/Frame 020549/0291):
Dec 19 2007: assignment on the face of the patent
Jan 29 2008: Bell, Robert H., Jr.
Jan 29 2008: Newhart, Ronald Edward
Jan 30 2008: Nayar, Naresh
Jan 31 2008: Shapiro, Michael Jay
Feb 01 2008: Capps, Louis Bennie, Jr.
Feb 04 2008: Dewkett, Thomas J.
Feb 08 2008: Cook, Thomas Edward
Feb 13 2008: Pierson, Bernadette Ann