In one embodiment, the present invention includes a method for receiving a request in a router from a first endpoint coupled to the router, where the request is for an aggregated completion. In turn, the router can forward the request to multiple target agents, receive a response from each of the target agents, and consolidate the responses into an aggregated completion. Then, the router can send the aggregated completion to the first endpoint. Other embodiments are described and claimed.
1. An apparatus comprising:
a semiconductor die including but not limited to:
a plurality of integrated endpoints; and
a router comprising a plurality of interfaces, one or more of which is coupled to one or more of the plurality of integrated endpoints, and an aggregation logic, responsive to an aggregation request from a particular endpoint of the plurality of integrated endpoints, to combine a plurality of responses from at least some of the plurality of integrated endpoints into a combined response and send the combined response to the particular endpoint, the aggregation request including an aggregation indicator that is a source port identifier having a predetermined value, wherein the predetermined value is reserved for use by at least some of the plurality of integrated endpoints for issuance of aggregation requests and the aggregation request comprises a non-posted request.
13. An apparatus comprising:
a semiconductor die including but not limited to:
a plurality of integrated endpoints; and
a sideband router comprising a plurality of interfaces, one or more of which is coupled to one or more of the plurality of integrated endpoints, and an aggregation logic, responsive to an aggregation request from a particular endpoint of the plurality of integrated endpoints, to combine a plurality of responses from at least some of the plurality of integrated endpoints into a combined response and send the combined response to the particular endpoint, the aggregation request including an aggregation indicator having a source port identifier with a predetermined value, the predetermined value reserved for use by at least some of the plurality of integrated endpoints for issuance of aggregation requests.
19. An apparatus comprising:
a semiconductor die including but not limited to:
a first plurality of integrated endpoints;
a first router coupled to at least some of the first plurality of integrated endpoints and including a first aggregation logic, responsive to a first aggregation request from a first endpoint of the first plurality of integrated endpoints, to combine a plurality of responses from at least some of the first plurality of integrated endpoints into a combined response and send the combined response to the first endpoint, the first aggregation request including an aggregation indicator having a source port identifier with a predetermined value reserved for use by at least some of the first plurality of integrated endpoints for issuance of aggregation requests;
a second plurality of integrated endpoints; and
a second router coupled to at least some of the second plurality of integrated endpoints and including a second aggregation logic, responsive to a second aggregation request from a second endpoint of the second plurality of integrated endpoints, to combine a plurality of responses from at least some of the second plurality of integrated endpoints into a combined response and send the combined response to the second endpoint, the second aggregation request including an aggregation indicator having a source port identifier with a predetermined value reserved for use by at least some of the second plurality of integrated endpoints for issuance of aggregation requests.
2. The apparatus of
3. The apparatus of
4. The apparatus of
5. The apparatus of
6. The apparatus of
8. The apparatus of
9. The apparatus of
10. The apparatus of
11. The apparatus of
12. The apparatus of
14. The apparatus of
15. The apparatus of
16. The apparatus of
18. The apparatus of
20. The apparatus of
22. The apparatus of
This application is a continuation of U.S. patent application Ser. No. 13/248,243, filed Sep. 29, 2011, now U.S. Pat. No. 8,711,875, the content of which is hereby incorporated by reference.
Mainstream processor chips, in both high performance and low power segments, are increasingly integrating additional functionality such as graphics, display engines, security engines, PCIe™ ports (i.e., ports in accordance with the Peripheral Component Interconnect Express (PCI Express™ (PCIe™)) Base Specification version 2.0 (published 2007) (hereafter the PCIe™ specification)) and other PCIe™-based peripheral devices, while maintaining legacy support for devices compliant with a PCI specification such as the Peripheral Component Interconnect (PCI) Local Bus Specification, version 3.0 (published 2002) (hereafter the PCI specification).
Such designs are highly segmented due to varying requirements from the server, desktop, mobile, embedded, ultra-mobile and mobile Internet device segments. Different markets seek to use single chip system-on-chip (SoC) solutions that combine at least some of processor cores, memory controllers, input/output controllers and other segment specific acceleration elements onto a single chip. However, designs that accumulate these features are slow to emerge due to the difficulty of integrating different intellectual property (IP) blocks on a single die. This is especially so, as IP blocks can have various requirements and design uniqueness, and can require many specialized wires, communication protocols and so forth to enable their incorporation into an SoC. As a result, each SoC or other advanced semiconductor device that is developed requires a great amount of design complexity and customization to incorporate different IP blocks into a single device. This is so, as a given IP block typically needs to be re-designed to accommodate interface and signaling requirements of a given SoC.
In many computer systems, an IP block or agent can send a broadcast or multicast request to many or all other agents within the system. When this request is for a read operation, the requesting agent will receive a completion/reply from every agent (for a broadcast) or from each targeted agent (for a multicast) in the system. It is thus the requesting agent's responsibility to aggregate the status and the data of all of these completions. The sending of these multiple completions raises complexity for the requesting agent and consumes bandwidth and other resources.
Embodiments may be used to aggregate completions over a sideband interface. In this way, transmission of multiple unicast read requests in a sideband fabric can be avoided, e.g., when identical registers in multiple agents are to be read or multicast/broadcast completion status is to be determined. In some embodiments an initiating master agent can receive an aggregated completion responsive to a multicast or broadcast non-posted request from that initiating master agent. To identify a request for aggregated completions, a predetermined aggregation indicator may be included in the request. In some embodiments, this indicator may be a predetermined port identifier (ID) that is reserved for all endpoints initiating multicast/broadcast non-posted requests that request a single aggregated completion back from a fabric that couples agents together.
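To make the indicator concrete, the following C sketch models a sideband request header carrying the reserved aggregation source port ID. It is only an illustrative sketch: the structure layout, field names and widths, and the helper function are assumptions rather than IOSF-defined definitions; only the reserved value FEh and its meaning are drawn from the description herein.

```c
/* Minimal sketch of a sideband request header carrying the aggregation
 * indicator. Field names and widths are illustrative assumptions, not
 * IOSF-defined; the reserved source port ID value FEh is taken from the
 * description in the text. */
#include <stdbool.h>
#include <stdint.h>

#define AGG_SRC_PORT_ID 0xFEu  /* reserved: "return one aggregated completion" */

struct sb_request {
    uint8_t dest_port_id;  /* unicast target, multicast group, or broadcast ID */
    uint8_t src_port_id;   /* set to AGG_SRC_PORT_ID to request aggregation */
    uint8_t opcode;        /* e.g., a register read */
    bool    non_posted;    /* aggregation applies to non-posted requests */
};

/* A router (or multi-port agent) can detect an aggregation request as follows. */
static inline bool wants_aggregated_completion(const struct sb_request *req)
{
    return req->non_posted && req->src_port_id == AGG_SRC_PORT_ID;
}
```

An endpoint issuing a broadcast or multicast non-posted read would populate src_port_id with AGG_SRC_PORT_ID, and the fabric would then return exactly one aggregated completion, as described further below.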
Embodiments can be used in many different types of systems. As examples, implementations described herein may be used in connection with semiconductor devices such as processors or other semiconductor devices that can be fabricated on a single semiconductor die. In particular implementations, the device may be a system-on-chip (SoC) or other advanced processor or chipset that includes various homogeneous and/or heterogeneous processing agents, and additional components such as networking components, e.g., routers, controllers, bridge devices, devices, memories and so forth.
Some implementations may be used in a semiconductor device that is designed according to a given specification such as an integrated on-chip system fabric (IOSF) specification issued by a semiconductor manufacturer to provide a standardized on-die interconnect protocol for attaching intellectual property (IP) blocks within a chip, including a SoC. Such IP blocks can be of varying types, including general-purpose processors such as in-order or out-of-order cores, fixed function units, graphics processors, IO controllers, display controllers, media processors among many others. By standardizing an interconnect protocol, a framework is thus realized for a broad use of IP agents in different types of chips. Accordingly, not only can the semiconductor manufacturer efficiently design different types of chips across a wide variety of customer segments, it can also, via the specification, enable third parties to design logic such as IP agents to be incorporated in such chips. And furthermore, by providing multiple options for many facets of the interconnect protocol, reuse of designs is efficiently accommodated. Although embodiments are described herein in connection with this IOSF specification, understand the scope of the present invention is not limited in this regard and embodiments can be used in many different types of systems.
Referring now to
As will be described further below, each of the elements shown in
The IOSF specification includes three independent interfaces that can be provided for each agent, namely a primary interface, a sideband message interface, and a testability and debug interface (a design for test (DFT) and design for debug (DFD), or DFx, interface). According to the IOSF specification, an agent may support any combination of these interfaces. Specifically, an agent can support 0-N primary interfaces, 0-N sideband message interfaces, and optional DFx interfaces. However, according to the specification, an agent must support at least one of these three interfaces.
Fabric 20 may be a hardware element that moves data between different agents. Note that the topology of fabric 20 will be product specific. As examples, a fabric can be implemented as a bus, a hierarchical bus, a cascaded hub or so forth. Referring now to
In various implementations, primary interface fabric 112 implements a split transaction protocol to achieve maximum concurrency. That is, this protocol provides for a request phase, a grant phase, and a command and data phase. Primary interface fabric 112 supports three basic request types: posted, non-posted, and completions, in various embodiments. Generally, a posted transaction is a transaction which when sent by a source is considered complete by the source and the source does not receive a completion or other confirmation message regarding the transaction. One such example of a posted transaction may be a write transaction. In contrast, a non-posted transaction is not considered completed by the source until a return message is received, namely a completion. One example of a non-posted transaction is a read transaction in which the source agent requests a read of data. Accordingly, the completion message provides the requested data.
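As a small illustration of this distinction (using names that are assumptions, not identifiers from the IOSF specification), the three basic request types and the property that only a non-posted request leaves the source waiting for a completion can be sketched as follows:

```c
/* Illustrative classification of the three basic request types; the names
 * here are assumptions and do not come from the IOSF specification. */
#include <stdbool.h>

enum sb_txn_type {
    TXN_POSTED,      /* e.g., a write: complete at the source once sent      */
    TXN_NON_POSTED,  /* e.g., a read: outstanding until a completion returns */
    TXN_COMPLETION   /* carries status, and data in the case of a read       */
};

/* Only non-posted transactions leave the source waiting for a completion. */
static inline bool expects_completion(enum sb_txn_type t)
{
    return t == TXN_NON_POSTED;
}
```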
In addition, primary interface fabric 112 supports the concept of distinct channels to provide a mechanism for independent data flows throughout the system. As will be described further, primary interface fabric 112 may itself include a master interface that initiates transactions and a target interface that receives transactions. The primary master interface can further be sub-divided into a request interface, a command interface, and a data interface. The request interface can be used to provide control for movement of a transaction's command and data. In various embodiments, primary interface fabric 112 may support PCI ordering rules and enumeration.
In turn, sideband interface fabric 116 may be a standard mechanism for communicating all out-of-band information. In this way, special-purpose wires designed for a given implementation can be avoided, enhancing the ability of IP reuse across a wide variety of chips. Thus in contrast to an IP block that uses dedicated wires to handle out-of-band communications such as status, interrupt, power management, fuse distribution, configuration shadowing, test modes and so forth, a sideband interface fabric 116 according to the IOSF specification standardizes all out-of-band communication, promoting modularity and reducing validation requirements for IP reuse across different designs. In general, sideband interface fabric 116 may be used to communicate non-performance critical information, rather than for performance critical data transfers, which typically may be communicated via primary interface fabric 112.
As further illustrated in
Using an IOSF specification, various types of chips can be designed having a wide variety of different functionality. Referring now to
As further seen in
As further seen in
As further seen, fabric 250 may further couple to an IP agent 255. Although only a single agent is shown for ease of illustration in the
Furthermore, understand that while shown as a single die SoC implementation in
As discussed above, in various embodiments all out-of-band communications may be via a sideband message interface. Referring now to
Referring now to
Aggregated completions may be used in various instances. For example, such completions can be used for register shadowing in multiple agents. If registers are shadowed in multiple agents, a master agent can issue a multicast read request to the shadow register in each of these agents and request an aggregated response. If the aggregated response does not match with its expected value of the register being shadowed, the agent can determine that the shadow update has yet to complete, or that an error has occurred. Another use case may be for reading duplicate status registers in multiple agents. For example, if multiple agents include one or more duplicate status registers that are updated on a given condition (e.g., a link status register of multiple PCIe lanes), a master agent can issue a multicast read to these status registers and request an aggregated response. The aggregated response thus provides an indication as to whether a specific condition has been updated in each of the status registers. A still further use case may be for determining completion status for a multicast/broadcast transaction.
In this example, an initiating master agent can send, e.g., a non-posted multicast/broadcast write transaction with a source identifier (ID) having a predetermined value (e.g., a source ID of FEh) that indicates that an aggregated response is requested, and in turn receive a single aggregated completion. A successful response status in the aggregated completion thus indicates to the initiating agent that the write message has successfully completed in all target agents.
Aggregated responses in accordance with an embodiment of the present invention may also be used to determine a power state of agents in the system. An initiating master can send a single non-posted multicast/broadcast write transaction with a source ID indicative of an aggregated response request (e.g., a source ID having a value of FEh) to query the power state of all agents in the system. If the completion is received with a power down status, then the master agent can determine that all agents were powered down. Likewise, if the completion is received with a successful status, the master agent can determine that all agents have power. Conversely, if the completion has a mixed status, the master agent can determine that the system has a mix of powered, unpowered, or otherwise misbehaving agents. And in some embodiments, each agent can have a pre-defined bit to set, such that when set, the bit both indicates that the agent has power and identifies the agent. If the router instead completes the message on behalf of an agent, it indicates the powered-down status and is not able to set that agent's specific bit. Still other use cases may enable a multicast/broadcast read request with aggregation to avoid multiple unicast read requests.
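For the power-state use case, the following minimal sketch illustrates how an initiating master might interpret such an aggregated completion. The status codes, function names, and bit assignments are hypothetical; the description above only provides that each powered agent may set a pre-defined bit that identifies it.

```c
/* A minimal sketch of how an initiating master might interpret the
 * aggregated completion to a power-state query. The status codes and the
 * per-agent presence bits are hypothetical; the description only states
 * that each powered agent may set one pre-defined bit. */
#include <stdint.h>
#include <stdio.h>

enum agg_status { ST_SUCCESS, ST_POWERED_DOWN, ST_MIXED };

static void report_power_state(enum agg_status status, uint32_t presence_bits,
                               uint32_t expected_bits)
{
    if (status == ST_POWERED_DOWN) {
        puts("all queried agents are powered down");
    } else if (status == ST_SUCCESS && presence_bits == expected_bits) {
        puts("all queried agents have power");
    } else {
        /* Mixed status, or bits missing because the router synthesized
         * completions for powered-down or misbehaving endpoints. */
        printf("mixed: powered agents = 0x%08x of expected 0x%08x\n",
               (unsigned)presence_bits, (unsigned)expected_bits);
    }
}

int main(void)
{
    /* Example: agents owning bits 0-3 were queried; bit 2's agent is down. */
    report_power_state(ST_MIXED, 0x0000000Bu, 0x0000000Fu);
    return 0;
}
```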
Messages sent to a broadcast port ID or group port ID (multicast) may be either posted or non-posted. In the case of a non-posted operation, the sender can use the aggregate request indicator as its source port ID if it seeks aggregation of all completions by the fabric and agents with multiple port IDs. In other words, by using this specified port ID (e.g., 0xFE) as a source port ID within a request, a single completion is guaranteed to be returned to the sender responsive to the request. Thus when a non-posted request is sent with this aggregation source port ID, aggregated completions can be collected in the router coupled to the requester, and a single response status is returned.
In various embodiments, routers can apply a “bitwise OR” or a “multi-bit OR” operation to the completion response status they receive before sending the aggregated completion to the ingress port of the requesting agent. When aggregating completions with data, the data returned to the requester can be the bitwise OR of the corresponding data from each completer. If a combination of completion with data and completion without data responses are received by the router, then the aggregated completion can be formed as a completion with data message, where the aggregated response status field is the bitwise OR of the status fields of all received completion messages and the aggregated data is the bitwise OR of the data from all received completion with data messages. In some embodiments, a router may synthesize or create a completion for certain components. For example, a router can synthesize a response for a powered down endpoint, and in some embodiments the response for such endpoints can be considered as a received completion for the purposes of aggregation.
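The aggregation rule itself can be sketched directly. In the illustrative C code below (structure layout and field names are assumptions), the aggregated status is the bitwise OR of all received status fields, the aggregated data is the bitwise OR of the data from every completion that carried data, and the result is marked as a completion with data whenever any contributor had data, matching the behavior described above. A response synthesized by the router for a powered-down endpoint would simply be included among the inputs.

```c
/* Sketch of router-side completion aggregation: bitwise OR of status fields,
 * bitwise OR of data from completions that carried data. The structure
 * layout and field widths are assumptions. */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct sb_completion {
    uint32_t status;    /* multi-bit response status field */
    bool     has_data;  /* completion with data vs. completion without data */
    uint32_t data;      /* valid only when has_data is true */
};

struct sb_completion aggregate(const struct sb_completion *resp, size_t n)
{
    struct sb_completion agg = { .status = 0, .has_data = false, .data = 0 };

    for (size_t i = 0; i < n; i++) {
        agg.status |= resp[i].status;   /* bitwise OR of status fields */
        if (resp[i].has_data) {
            agg.has_data = true;        /* any data makes it a completion with data */
            agg.data |= resp[i].data;   /* bitwise OR of returned data */
        }
    }
    return agg;
}
```

With this rule, a set status bit from any single completer propagates into the aggregated status, so the requester can detect an unsuccessful, powered-down, or mixed outcome from one returned message.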
Sideband agents having multiple port IDs can send a single aggregated completion for non-posted messages received with an aggregation request. Such sideband agents with multiple port IDs that aggregate completions may operate similarly to a router with regard to aggregations. That is, such agents may follow all aggregation rules defined for routers.
Thus as a result of data aggregation in accordance with an embodiment of the present invention, an endpoint that initiates a broadcast or multicast can receive a completion with data response indicating successful, unsuccessful/not supported, powered down, or multicast mixed status.
In contrast to the conventional receipt and processing of separate responses in a requesting agent, embodiments may place the responsibility for aggregation in a system's sideband routers, which may simplify agent design. By placing this responsibility in the router, functionality that would otherwise be duplicated across multiple agents in the system is consolidated into a shared object (the router), which may decrease system gate count and further simplify agent design by allowing each agent to be agnostic of the total size of the sideband network.
Embodiments thus enable aggregation via usage of an aggregation indicator (e.g., a predetermined port ID (e.g., network address)) as the source address to indicate to all routers in the system that they should aggregate completions. Responsive to detection of such a request, the system routers can aggregate both status and data for a given completion.
Referring now to
At block 320, the router can forward the request to the indicated endpoints. For example, for a broadcast request the router can forward the request along to all system agents, while for a multicast request, the router can forward the request to the indicated agents. In some embodiments, the router can determine whether each agent has available resources, e.g., with reference to a credit counter, before sending the requests along.
Still referring to
Control then passes to block 330 where the status from these individual responses can be aggregated. More specifically in one embodiment aggregation logic of the router can operate to aggregate status information and data information separately, e.g., by respective bitwise operations. Of course, rather than a single bit from each individual response, the bitwise ORs may be of multi-bit length. Control then passes to block 340, where a completion can be sent back to the requesting agent with aggregated status and data.
If instead at diamond 315 it is determined that an aggregated completion is not requested, control passes to diamond 350 where it can be determined whether the received request is a non-posted request. If not (that is, the request is a posted request), control passes to block 355 where the request can be forwarded to the indicated endpoints. If instead, the request is a non-posted request, it is forwarded to the indicated endpoints at block 360. Thereafter, individual responses can be received from the indicated endpoints and individual completions can be sent back to the requester (block 370). Thus as seen in
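Putting diamonds 315 and 350 and blocks 320 through 370 together, the router's handling of an incoming request can be sketched as follows. This is a sketch under stated assumptions: the helper routines, the ingress-port field, and the fixed target bound stand in for fabric- and product-specific machinery and are not defined by the IOSF specification.

```c
/* Sketch of the router's request-handling flow described above. Helper
 * routines are declared only; their behavior follows the description. */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define AGG_SRC_PORT_ID 0xFEu  /* reserved aggregation source port ID */
#define MAX_TARGETS     32u    /* arbitrary bound for this sketch */

struct sb_request {
    uint8_t ingress_port;  /* router port the request arrived on (assumed field) */
    uint8_t src_port_id;   /* FEh requests a single aggregated completion */
    bool    non_posted;
};

struct sb_completion { uint32_t status; bool has_data; uint32_t data; };

struct router;  /* opaque, fabric-specific router state */

/* Fabric-specific helpers, assumed here as prototypes only. */
extern bool   has_credits(struct router *rt, const struct sb_request *req);
extern void   forward_to_targets(struct router *rt, const struct sb_request *req);
extern size_t collect_responses(struct router *rt, struct sb_completion *out, size_t max);
extern struct sb_completion aggregate(const struct sb_completion *resp, size_t n);
extern void   send_completion(struct router *rt, uint8_t port, const struct sb_completion *c);
extern void   relay_individual_completions(struct router *rt, const struct sb_request *req);

void handle_request(struct router *rt, const struct sb_request *req)
{
    struct sb_completion resp[MAX_TARGETS];

    if (!has_credits(rt, req))   /* optional resource/credit check before forwarding */
        return;                  /* retry or back-pressure handling omitted */

    if (req->non_posted && req->src_port_id == AGG_SRC_PORT_ID) {
        /* Aggregated completion requested (diamond 315). */
        forward_to_targets(rt, req);                           /* block 320 */
        size_t n = collect_responses(rt, resp, MAX_TARGETS);   /* gather individual responses */
        struct sb_completion agg = aggregate(resp, n);         /* block 330: bitwise OR */
        send_completion(rt, req->ingress_port, &agg);          /* block 340 */
    } else if (req->non_posted) {
        /* Non-posted request without aggregation (diamond 350). */
        forward_to_targets(rt, req);                           /* block 360 */
        relay_individual_completions(rt, req);                 /* block 370 */
    } else {
        /* Posted request: forward only, no completion expected. */
        forward_to_targets(rt, req);                           /* block 355 */
    }
}
```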
Although the SoCs of
Thus as seen, an off-die interface 710 (which in one embodiment can be a direct media interface (DMI)) may couple to a hub 715, e.g., an input/output hub that in turn provides communication between various peripheral devices. Although not shown for ease of illustration in
To provide connection to multiple buses, which may be multi-point or shared buses in accordance with the IOSF specification, an IOSF controller 720 may couple between hub 715 and bus 730, which may be an IOSF bus that thus incorporates elements of the fabric as well as routers. In the embodiment shown in
As further seen in
Still other implementations are possible. Referring now to
As further seen in
Furthermore, to enable communications, e.g., with storage units of a server-based system, a switch port 830 may couple between bus 820 and another IOSF bus 850, which in turn may be coupled to a storage controller unit (SCU) 855, which may be a multi-function device for coupling with various storage devices.
Embodiments may be implemented in code and may be stored on a non-transitory storage medium having stored thereon instructions which can be used to program a system to perform the instructions. The storage medium may include, but is not limited to, any type of disk including floppy disks, optical disks, solid state drives (SSDs), compact disk read-only memories (CD-ROMs), compact disk rewritables (CD-RWs), and magneto-optical disks, semiconductor devices such as read-only memories (ROMs), random access memories (RAMs) such as dynamic random access memories (DRAMs), static random access memories (SRAMs), erasable programmable read-only memories (EPROMs), flash memories, electrically erasable programmable read-only memories (EEPROMs), magnetic or optical cards, or any other type of media suitable for storing electronic instructions.
While the present invention has been described with respect to a limited number of embodiments, those skilled in the art will appreciate numerous modifications and variations therefrom. It is intended that the appended claims cover all such modifications and variations as fall within the true spirit and scope of this present invention.
Murray, Joseph, Fanning, Blaise, Klinglesmith, Michael T., Nair, Mohan K., Lakshmanamurthy, Sridhar, Adler, Robert P., Hunsaker, Mikal C., Verma, Rohit R., Lavelle, Gary J.
Patent | Priority | Assignee | Title
6330647 | Aug 31 1999 | U S BANK NATIONAL ASSOCIATION, AS COLLATERAL AGENT | Memory bandwidth allocation based on access count priority scheme
6430182 | Oct 16 1997 | NEC Corporation | Fabric system and method for assigning identifier for fabric apparatus therefor
6469982 | Jul 31 1998 | Alcatel | Method to share available bandwidth, a processor realizing such a method, and a scheduler, an intelligent buffer and a telecommunication system including such a processor
7065733 | Dec 02 2003 | International Business Machines Corporation | Method for modifying the behavior of a state machine
7415533 | Jun 14 2002 | Juniper Networks, Inc. | Packet prioritization systems and methods using address aliases
7421543 | Dec 02 2004 | Fujitsu Limited | Network device, fiber channel switch, method for shared memory access control, and computer product
8069286 | May 10 2006 | Altera Corporation | Flexible on-chip datapath interface having first and second component interfaces wherein communication is determined based on a type of credit condition
8711875 | Sep 29 2011 | Intel Corporation | Aggregating completion messages in a sideband interface
U.S. Patent Application Publications: 20030227926; 20040208512; 20040218600; 20050010687; 20050120323; 20060101179; 20060277346; 20090006165; 20090248940; 20090296624; 20090300245; 20090310616; 20100106912; 20100220703; 20100235675; 20100250889; 20100293304; 20100312942; 20110032947; 20110238728; 20120051297; 20120303842; 20120303899; 20130054845; 20130089095
Foreign Patent Documents: CN101267376; CN101558589; CN101873339; CN1819555; CN1833415; EP1328104; EP2216722; JP2007135035; WO2010102055; WO2010137572