In one embodiment, the present invention includes a method for handling a registration message received from a host processor, where the registration message delegates a poll operation with respect to a device from the host processor to another component. Information from the message may be stored in a poll table, and the component may send a read request to poll the device and report a result of the poll to the host processor based on a state of the device. Other embodiments are described and claimed.
11.  A method comprising:
 receiving a registration message from a host processor in an interconnect coupled between the host processor and a device, the registration message to delegate a poll operation with respect to the device to the interconnect;
 storing information regarding a device monitored location, a memory monitored address, and an initial value of the device monitored location in a poll table associated with the interconnect; and
 using a poll delegation logic of the interconnect to send a read request from the interconnect to the device to poll the device and to compare the initial value with a device value obtained from the device monitored location, and reporting a result of the poll to the host processor if the device value is different than the initial value.
1.  An apparatus comprising:
 a core to generate a registration message to delegate a poll operation to an input/output (IO) interconnect;
 a coherent interconnect coupled to the core;
 the IO interconnect coupled to the coherent interconnect, the IO interconnect including a poll table having a plurality of entries each having a register address field to store a register address received in a registration message, a destination address field to store a destination address in a system memory received in the registration message, and an initial value field to store an initial value associated with the register address received in the registration message; and
 at least one device coupled to the IO interconnect to perform an operation for an application executing on the core and including at least one status register, the IO interconnect to poll the at least one status register responsive to information in a poll table entry, and to issue a write transaction to the destination address if a polled value of the at least one status register differs from the initial value, the IO interconnect further including a poll delegation logic to issue a read request to the at least one device at a predetermined interval to perform the poll.
8.  A system comprising:
 a first integrated circuit including:
 at least one core including a first logic to execute a first instruction to set up a monitored address in a memory and a second logic to cause the at least one core to enter a low power state when a predetermined instruction follows the first instruction;
 a first coherent interconnect coupled to the at least one core;
 a first input/output (IO) interconnect including a poll table to store a tuple including a register identifier of a register in an intellectual property (IP) block coupled to the first IO interconnect, the monitored address in the memory, and an initial value associated with the register, and a delegation logic to receive a delegation message from the at least one core, and based on the tuple, obtain a current value of the register until the current value differs from the initial value and responsive to the difference write data to the monitored address; and
 the IP block coupled to the first IO interconnect including the register and to perform a function for an application executing on the at least one core; and
 the memory coupled to the first integrated circuit via a memory interconnect, wherein the at least one core is to exit the low power state responsive to the data being written to the destination address of the memory and to continue execution of the application at a next instruction following the predetermined instruction.
2.  The apparatus of
3.  The apparatus of
4.  The apparatus of
5.  The apparatus of
6.  The apparatus of
7.  The apparatus of
9.  The system of
10.  The system of
12.  The method of
13.  The method of
14.  The method of
15.  The method of
16.  The method of
17.  The method of
Modern computer systems typically include a processor and various other components that are coupled together. In addition, many systems include one or more peripheral or input/output (IO) devices.
To enable communications between software that executes on the processor and operations that may be performed by the other devices, different mechanisms can be used. Common mechanisms include a polling method and an interrupt method. However, neither of these methods is optimal. Conventionally, software either continuously polls status registers on the IO device if the IO device's task is fine-grained, or relies on an asynchronous interrupt through the operating system (OS) if the IO device's task is coarse-grained. While a polling method may ensure good performance, it suffers from drawbacks. First, the core/thread that needs to know the completion status has to continuously check (e.g., via a busy spin operation) a memory mapped input/output (MMIO) status register, which prevents it from entering a low power state. Second, repetitive polling on an uncacheable MMIO address results in a large amount of traffic on a system interconnect. In short, the fast response time comes at the cost of power consumption (a major issue especially for ultra-low power environments) and wasted system resources.
An interrupt method avoids busy spinning of a processor on the status register. While waiting, the core/thread can either context switch to execute another process or enter a lower power state. Completion of the task on the IO device triggers an interrupt into the OS. However, in a typical system, several hundred cache misses and tens of thousands of clock cycles are induced by a kernel interrupt handler. This performance overhead of interrupt handling is not acceptable for many fine-grained logic blocks.
Thus neither polling nor interrupt techniques are satisfactory for a low power application, as polling negates a large portion of any power benefits from using an IO device, while interrupts introduce a large performance penalty.
In various embodiments, a poll delegation technique may be implemented in which an interconnect serves as a delegate in a polling and notification process. In one embodiment the interconnect may be an input/output (IO) interconnect, although the scope of the present invention is not limited in this regard. Using this technique, the interconnect polls IO devices for a host processor such as a central processing unit (CPU) and notifies application software of a given event using one of a number of techniques such as a test and hold operation, or by an update to a user-selected memory location that triggers a processor's exit from a power optimized state. In one embodiment, user-level instructions such as MONITOR/MWAIT may be used to notify application software. In various embodiments, poll delegation may enable a response time as short as polling, and power consumption/resource usage as low as an interrupt-based technique, thus providing user-level notification of IO device status without the need for software polling or interrupts.
In one embodiment the IO interconnect may include specific-purpose hardware to poll status register(s) of an IO device. Then upon a status change, the IO interconnect can issue a write operation (e.g., a coherent write) to the memory address that is being monitored by the host. The coherent write will be detected by the host's monitoring hardware, and cause the thread that is waiting on the address to resume execution. Thus in various embodiments, a processor can stay in a low power state until the IO device is done with its task, and resume execution almost as fast as if it had been busy-spinning. No change is required in the processor core, the cache, the system coherent interconnect, or the IO devices.
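The flow in this paragraph can be sketched as a small software model. This is only an illustration under stated assumptions, not the hardware of any embodiment: the names `Device`, `Interconnect`, `poll_once` and `run_demo`, the dictionary standing in for coherent memory, the address 0x1000, and the "0 means busy" status encoding are all hypothetical. The comparison against an initial value follows the claims rather than this paragraph.

```python
# Hypothetical software model of poll delegation; real embodiments are hardware.

class Device:
    """Stand-in for an IO device with a single status register."""

    def __init__(self):
        self.status = 0  # assumed encoding: 0 = busy, nonzero = done

    def read_status(self):
        return self.status


class Interconnect:
    """Stand-in for the IO interconnect acting as the polling delegate."""

    def __init__(self, memory):
        self.memory = memory      # dict modelling coherent system memory
        self.reads_issued = 0     # read requests sent to the device

    def poll_once(self, device, monitored_addr, initial_value):
        """Issue one read; on a status change, write to the monitored address."""
        self.reads_issued += 1
        value = device.read_status()
        if value != initial_value:
            self.memory[monitored_addr] = value  # models the coherent write
            return True
        return False


def run_demo():
    memory = {0x1000: 0}
    dev = Device()
    ic = Interconnect(memory)
    woke_at = None
    for tick in range(10):
        if tick == 3:
            dev.status = 1        # the device finishes its task at tick 3
        if ic.poll_once(dev, 0x1000, initial_value=0):
            woke_at = tick        # the waiting thread would resume here
            break
    return woke_at, memory[0x1000], ic.reads_issued
```

In this model the host issues no reads at all while it waits; every read request originates from the `Interconnect` object, which is the point of delegating the poll.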
The MONITOR/MWAIT pair of instructions can support inter-thread synchronization. The instruction pair can be available at all privilege levels. MONITOR can be used to enable a CPU to set up monitoring hardware of the CPU to detect stores to an effective address range (typically a cacheline). This address range belongs to a coherent, write-back address range. In one embodiment, cache coherency hardware may monitor for a write to the destination address. When that write occurs, the cache coherency controller will send a message to the processor to come out of the low power state. After this setup, the succeeding MWAIT instruction puts the processor core into a selected low-power state (e.g., a clock-gated state or a power-gated state). When the monitoring hardware detects a store to any byte in the address range, the stalled thread resumes execution from the instruction following MWAIT. Architecturally, MWAIT behaves like a no operation (NOP). While these MONITOR and MWAIT instructions are designed to implement performance and power-optimized inter-thread synchronization, embodiments can leverage the instructions for IO device completion notification.
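The arm, wait, and wake behavior described above can be illustrated with a toy model. It does not execute the actual instructions; the class `MonitorModel`, its method names, and the 64-byte line size are assumptions made for the sketch (the text says only that the range is typically a cacheline).

```python
# Toy model of MONITOR/MWAIT semantics; it does not run the real instructions.

CACHELINE = 64  # assumed line size in bytes


class MonitorModel:
    def __init__(self):
        self.armed_line = None   # index of the monitored cache line, if any
        self.waiting = False     # True while the thread is stalled in "MWAIT"

    def monitor(self, addr):
        """Arm the monitor on the cache line that contains addr."""
        self.armed_line = addr // CACHELINE

    def mwait(self):
        """Stall only if a monitor is armed; otherwise fall through like a NOP."""
        if self.armed_line is not None:
            self.waiting = True

    def store(self, addr):
        """A store to any byte of the armed line wakes the stalled thread."""
        if self.waiting and addr // CACHELINE == self.armed_line:
            self.waiting = False
            self.armed_line = None
            return True
        return False
```

For example, after `monitor(0x1008)` and `mwait()`, a store to 0x2000 leaves the model waiting, while a store to 0x103F (the last byte of the same 64-byte line) wakes it.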
Referring now to 
The one or more cores may be coupled via a coherent interconnect 115 to one or more cache memories 120a. Coherent interconnect 115 may include various hardware, software and/or firmware to implement a cache coherency protocol, such as a modified exclusive shared invalid (MESI) protocol, to maintain a coherent view of information stored within the system. In some implementations, coherent interconnect 115 may be a layered protocol including various layers such as a protocol layer, a link layer and possibly a physical layer (where the system is not on a single die).
In turn, coherent interconnect 115 may be coupled via a hub 120 to a memory controller 130 that in turn may be coupled to a system memory, e.g., dynamic random access memory (DRAM). Note that such memory is not shown in 
In addition, coherent interconnect 115 may be coupled to an upstream side of an IO interconnect 140, which may be of a given communication protocol such as a Peripheral Component Interconnect Express (PCI Express™ (PCIe™)) protocol in accordance with links based on the PCI Express™ Base Specification version 2.0 (published Jan. 17, 2007) (hereafter the PCIe™ Specification), or another such protocol. IO interconnect 140 may include a polling table 150 in accordance with an embodiment of the present invention. While shown as being present in the interconnect, other implementations may locate this table elsewhere in close relation to the interconnect. In turn, various devices, e.g., devices 160₀ and 160₁, which may be IO devices, intellectual property (IP) blocks or so forth, may be coupled to a downstream side of IO interconnect 140.
As seen in 
Referring now to 
Still referring to 
During operation, the application then initializes the monitored location and issues the MONITOR and MWAIT instructions, thus enabling the device to begin executing its task. Various such tasks may be realized, including offloading of specialized functions, graphics processing, physics processing or so forth. As one example, the function may be a specialized calculation such as a fast Fourier transform (FFT). The application thus may pass various information regarding the FFT such as the number of points, the starting address and so forth, prior to execution of the MONITOR/MWAIT instructions.
Accordingly, at this time one or more cores of the host processor may enter a low power state, which may be configurable depending on a type of operation that the device is to perform.
Referring still to 
The generation of the registration message and release message may use help from the OS. The user-level MONITOR and MWAIT instructions may be performed completely in user mode, and the poll delegation operation can be purely hardware. Note that the registration and release are usually only performed at the application initialization and cleanup phases. Therefore their power and performance do not matter. In contrast, the user-level setup and poll detection usually are executed a large number of times. The efficiency of these two steps thus enables efficient system power and performance characteristics.
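The difference in how often each phase runs can be made concrete with a small counter. This is a sketch only: `DelegationSession`, `offload_many`, and the phase names are hypothetical labels for the registration, user-level setup, poll detection, and release steps named in the paragraph above.

```python
# Hypothetical phase counter; it only tallies how often each step would run.

class DelegationSession:
    def __init__(self):
        self.counts = {"register": 0, "setup": 0, "detect": 0, "release": 0}
        self.registered = False

    def register(self):
        """Registration message; may need OS help, done at initialization."""
        self.registered = True
        self.counts["register"] += 1

    def run_task(self):
        """One offloaded task: user-level setup, then hardware poll detection."""
        if not self.registered:
            raise RuntimeError("register before running tasks")
        self.counts["setup"] += 1    # user-mode MONITOR/MWAIT
        self.counts["detect"] += 1   # poll delegation logic detects the change

    def release(self):
        """Release message; may need OS help, done at cleanup."""
        self.registered = False
        self.counts["release"] += 1


def offload_many(n_tasks):
    session = DelegationSession()
    session.register()
    for _ in range(n_tasks):
        session.run_task()
    session.release()
    return session.counts
```

With 1000 tasks, the setup and detection steps each run 1000 times while registration and release run once, which is why only the former two dominate power and performance.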
To support multiple IP accelerators, polling table 150 may be a multi-entry translation table. In one embodiment, each entry 156 contains an MMIO address and the physical memory address that is linked to it. In some implementations, the number of entries in the polling table can be a small number, e.g., N=8. In the extreme case when more than N registers need to be checked at the same time, the user thread can always directly poll the registers instead of using the above method, although a virtualized polling table could instead be used.
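A minimal sketch of such a table follows, assuming a fixed capacity of eight entries as in the N=8 example. The names `PollEntry`, `PollTable`, `register`, `release`, and `lookup` are hypothetical; a `None` result from `register` stands for the case in which the table is full and the user thread would fall back to polling the register directly.

```python
# Hypothetical model of a small multi-entry polling table.

from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PollEntry:
    mmio_addr: int  # MMIO address of the status register to poll
    mem_addr: int   # physical memory address linked to that register


class PollTable:
    def __init__(self, capacity: int = 8):
        self.entries: List[Optional[PollEntry]] = [None] * capacity

    def register(self, mmio_addr: int, mem_addr: int) -> Optional[int]:
        """Fill the first free slot; return its index, or None if full."""
        for index, entry in enumerate(self.entries):
            if entry is None:
                self.entries[index] = PollEntry(mmio_addr, mem_addr)
                return index
        return None  # caller falls back to direct polling

    def release(self, index: int) -> None:
        """Free a slot so that another register can be delegated."""
        self.entries[index] = None

    def lookup(self, mmio_addr: int) -> Optional[int]:
        """Translate an MMIO address to its linked memory address."""
        for entry in self.entries:
            if entry is not None and entry.mmio_addr == mmio_addr:
                return entry.mem_addr
        return None
```

A linear scan is used here because the table is assumed to be small; a virtualized table, as the text mentions, would need a different structure.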
In one embodiment, a message signaled interrupt (MSI-X) feature of PCI that may be in IO interconnect 140 provides hardware that allows devices 160 to issue writes to system memory locations. In MSI-X, the target memory locations are special addresses that will lead to interrupts. In embodiments, addresses in write-back memory space can instead be used as targets of such writes. The property of the target memory location is transparent to MSI-X hardware, which simply delivers a packet from the IO interconnect to the memory system. In some implementations the polling performed by interconnect 140 may cause a poll of registered status register addresses even if the devices that some of the registers represent are not actively computing. This is because in some implementations the poll delegation logic 145 has no knowledge of whether or not a valid entry in its mapping table represents an inactive device. Different mechanisms can be used to provide this information to the logic. For example, a system call can be provided by the OS to allow an application to release a particular entry in the polling table. Alternatively, the IO interconnect 140 could intercept power-state transition commands that are sent to the IP blocks 160 so that it will know which status registers will not be updated any time soon. This information may be included in a status portion of the entries of polling table 150 in such embodiments. It is noted, however, that the cost for the interconnect 140 to poll IO registers is rather low, and as such the power and performance impact of indiscriminate polling may be minimal.
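The two mechanisms mentioned above, an explicit release and interception of power-state commands, can be sketched as follows. Everything in this block is a hypothetical model: the class `StatusAwarePoller`, the `active` flag standing in for the status portion of an entry, and the device identifiers are assumptions made for illustration.

```python
# Hypothetical model of a polling table whose entries carry an "active" status.

class StatusAwarePoller:
    def __init__(self):
        # register address -> {"device": id, "mem_addr": address, "active": bool}
        self.entries = {}

    def register(self, reg_addr, device_id, mem_addr):
        self.entries[reg_addr] = {
            "device": device_id,
            "mem_addr": mem_addr,
            "active": True,
        }

    def release(self, reg_addr):
        """Models an OS-provided call that releases one table entry."""
        self.entries.pop(reg_addr, None)

    def on_power_state_command(self, device_id, powered_on):
        """Models intercepting a power-state command sent to an IP block."""
        for entry in self.entries.values():
            if entry["device"] == device_id:
                entry["active"] = powered_on

    def registers_to_poll(self):
        """Only registers of active devices are worth polling."""
        return sorted(
            addr for addr, entry in self.entries.items() if entry["active"]
        )
```

Since the text notes that indiscriminate polling is cheap, this filtering is an optional refinement rather than a requirement.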
While described herein for a system-on-chip (SoC) configuration, which may be the primary processing component for a computing device such as an embedded, portable or mobile device, other implementations may be used in other systems such as multiprocessor computer systems having a processor coupled to a coherent interconnect, that in turn may be coupled to an IO interconnect via one or more chipsets or other components. Still further, embodiments may be implemented in a multi-chip architecture for a computing device.
Referring now to 
Using an embodiment of the present invention, a process can avoid both the long latency of interrupt handling and the need to busy-spin in a high power state. On a processor that supports MONITOR/MWAIT or similar test and set functions, poll delegation allows the processor to enter power and performance-optimized states while still achieving the quick response time of busy spins. For low-power SoCs that include finer-grained IP blocks, embodiments provide a near-optimal completion notification solution in terms of power and performance.
Embodiments may be implemented in code and may be stored on a storage medium having stored thereon instructions which can be used to program a system to perform the instructions. The storage medium may include, but is not limited to, any type of disk including floppy disks, optical disks, compact disk read-only memories (CD-ROMs), compact disk rewritables (CD-RWs), and magneto-optical disks, semiconductor devices such as read-only memories (ROMs), random access memories (RAMs) such as dynamic random access memories (DRAMs), static random access memories (SRAMs), erasable programmable read-only memories (EPROMs), flash memories, electrically erasable programmable read-only memories (EEPROMs), magnetic or optical cards, or any other type of media suitable for storing electronic instructions.
While the present invention has been described with respect to a limited number of embodiments, those skilled in the art will appreciate numerous modifications and variations therefrom. It is intended that the appended claims cover all such modifications and variations as fall within the true spirit and scope of this present invention.
Harriman, David J., Iyer, Ravishankar, Fang, Zhen, Espig, Michael J.
| Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc | 
| Jun 03 2009 | HARRIMAN, DAVID J | Intel Corporation | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 022811/ | 0512 | |
| Jun 05 2009 | FANG, ZHEN | Intel Corporation | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 022811/ | 0512 | |
| Jun 05 2009 | IYER, RAVISHANKAR | Intel Corporation | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 022811/ | 0512 | |
| Jun 10 2009 | ESPIG, MICHAEL J | Intel Corporation | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 022811/ | 0512 | |
| Jun 11 2009 | Intel Corporation | (assignment on the face of the patent) | / | 
| Date | Maintenance Fee Events | 
| Jan 23 2013 | ASPN: Payor Number Assigned. | 
| Jul 14 2016 | M1551: Payment of Maintenance Fee, 4th Year, Large Entity. | 
| Sep 21 2020 | REM: Maintenance Fee Reminder Mailed. | 
| Mar 08 2021 | EXP: Patent Expired for Failure to Pay Maintenance Fees. | 
| Date | Maintenance Schedule | 
| Jan 29 2016 | 4 years fee payment window open | 
| Jul 29 2016 | 6 months grace period start (w surcharge) | 
| Jan 29 2017 | patent expiry (for year 4) | 
| Jan 29 2019 | 2 years to revive unintentionally abandoned end. (for year 4) | 
| Jan 29 2020 | 8 years fee payment window open | 
| Jul 29 2020 | 6 months grace period start (w surcharge) | 
| Jan 29 2021 | patent expiry (for year 8) | 
| Jan 29 2023 | 2 years to revive unintentionally abandoned end. (for year 8) | 
| Jan 29 2024 | 12 years fee payment window open | 
| Jul 29 2024 | 6 months grace period start (w surcharge) | 
| Jan 29 2025 | patent expiry (for year 12) | 
| Jan 29 2027 | 2 years to revive unintentionally abandoned end. (for year 12) |