Systems and methods of processing write transactions provide for combining write transactions on an input/output (I/O) hub according to a protocol between the I/O hub and a processor. Data associated with the write transactions can be flushed to an I/O device without the need for proprietary software and specialized registers within the I/O device.
26. An apparatus comprising:
a processor including:
write logic to transmit a plurality of write transactions to be identified as write combinable to a hub;
receiving logic to receive a write completion for each of the plurality of write transactions from the hub;
protocol logic to transmit a flush signal to the hub to initiate a flush of data associated with the plurality of write transactions in a single transaction to an I/O device in response to detecting a flush event and in response to receiving, with the receiving logic, a last write completion for a last write transaction of the plurality of write transactions.
17. An apparatus comprising:
a hub to be coupled to a first device and a second device, the hub including:
a buffer;
receiving logic to receive a first combinable write transaction and a second combinable write transaction from the first device; and
a write combining module to
store first data associated with the first combinable write transaction in the buffer and send a first write completion signal to the first device in response to storing the first data in the buffer,
store second data associated with the second combinable write transaction in the buffer and send a second write completion signal to the first device in response to storing the second data in the buffer, and
flush the first data and the second data as combined data in a single transaction to the second device in response to receiving a flush signal from the first device.
9. An apparatus comprising: a hub to be coupled to a processor, the hub including:
receiving logic to receive a first write transaction and a second write transaction from the processor, the first and the second write transactions to reference partial data of a cache line within the processor, wherein the first and second write transactions include a write combinable attribute to indicate that the first and second write transactions are write combinable partial write transactions;
combining logic coupled to the receiving logic to combine the partial data of the cache line referenced by the first and second write transactions as write combined data in response to the first and second write transactions including the write combinable attribute to indicate they are write combinable; and
flushing logic coupled to the combining logic to flush the write combined data in a single transaction to an I/O device in response to a flush protocol event.
1. A method comprising:
receiving a plurality of write transactions with a controller hub from a processor, each of the plurality of write transactions being associated with a write combinable attribute to indicate they are a plurality of write combinable write transactions;
buffering data associated with the plurality of write combinable write transactions in a buffer of the controller hub in response to each of the plurality of write transactions being associated with the write combinable attribute to indicate they are the write combinable write transactions;
transmitting a write completion signal corresponding to each of the plurality of write combinable write transactions with the controller hub to the processor in response to buffering the data associated with each of the plurality of write combinable write transactions; and
flushing the data associated with the plurality of write combinable write transactions in the buffer as write combined data in a single transaction to an I/O device in response to receiving a flush signal from the processor after transmitting the write completion signal corresponding to each of the plurality of write combinable write transactions with the controller hub to the processor.
2. The method of
3. The method of
4. The method of
5. The method of
6. The method of
7. The method of
8. The method of
10. The apparatus of
11. The apparatus of
12. The apparatus of
13. The apparatus of
14. The apparatus of
15. The apparatus of
16. The apparatus of
18. The apparatus of
19. The apparatus of
20. The apparatus of
21. The apparatus of
22. The apparatus of
23. The apparatus of
24. The apparatus of
25. The apparatus of
flush signal is from the first device based on a first source identifier associated with the first and second combinable write transactions matching a second source identifier associated with the flush signal.
27. The apparatus of
28. The apparatus of
The present application is related to U.S. patent application Ser. No. 10/402,125, filed on Mar. 28, 2003.
1. Technical Field
One or more embodiments of the present invention generally relate to the processing of computer system transactions. In particular, certain embodiments relate to protocols for combining write transactions.
2. Discussion
As consumer demand for faster processing and enhanced functionality continues to grow, so does the importance of computing efficiency and performance. Modern-day processors use cache memory as one technique to make the processing of data more efficient, where data in the cache memory is allocated by the operating system (OS) one page at a time and each page contains a number of cache entries. Each cache entry usually holds a certain number of words, known as a “cache line” or “cache block”, and an entire line is typically read and cached at once in order to achieve optimum “burst” performance. Unfortunately, processors running certain applications, such as graphics applications, are frequently required to perform pixel writes, which tend to be 8-bit, 16-bit or 32-bit quantities rather than the full cache lines (e.g., 64 bytes) necessary to provide optimum burst performance.
As a result, a conventional processor may not be able to achieve the desired level of performance in some cases. To address this problem, more recent computer architectures have been designed to automatically combine smaller, or partial, writes into larger cache-line writes. This approach is referred to as processor “write-combining”. Processor write-combining is implemented by tagging each page in memory with a write combining (WC) attribute, which indicates whether partial writes from the page can be combined, and buffering the writes on the processor until a full cache line is obtained. The combined writes are typically then sent to their intended destination by way of a chipset input/output (I/O) hub, where the intended destination might be a memory-mapped I/O (MMIO) space of an I/O device. The I/O hub serves as a bridge between the processor/processor bus and an I/O interface (e.g., a bus) that connects to the I/O device containing the MMIO space.
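Processor write-combining, as described above, can be pictured as a per-line buffer that accumulates partial writes from WC-tagged pages and emits one full-line write once every byte of the line has been written. The following is a minimal Python sketch under stated assumptions: the 64-byte line size, the class and method names, and the choice to pass non-WC writes straight through are illustrative and are not taken from the source.

```python
LINE_SIZE = 64  # assumed cache-line size in bytes


class WriteCombiningBuffer:
    """Hypothetical model of processor write-combining for WC-tagged pages."""

    def __init__(self, line_size=LINE_SIZE):
        self.line_size = line_size
        self.lines = {}    # line base address -> bytearray of pending data
        self.valid = {}    # line base address -> set of byte offsets written
        self.emitted = []  # (address, bytes) writes sent toward the I/O hub

    def write(self, addr, data, wc):
        if not wc:
            # Page lacks the WC attribute: the partial write is not combined.
            self.emitted.append((addr, bytes(data)))
            return
        base = addr - (addr % self.line_size)
        offset = addr - base
        if offset + len(data) > self.line_size:
            raise ValueError("sketch assumes a write does not cross a line boundary")
        line = self.lines.setdefault(base, bytearray(self.line_size))
        valid = self.valid.setdefault(base, set())
        line[offset:offset + len(data)] = data
        valid.update(range(offset, offset + len(data)))
        if len(valid) == self.line_size:
            # Every byte of the line is present: send it as one cache-line write.
            self.emitted.append((base, bytes(line)))
            del self.lines[base], self.valid[base]
```

For example, sixteen 4-byte pixel writes to consecutive addresses in a WC page leave a single 64-byte entry in `emitted`, whereas a write to a non-WC page is forwarded unchanged.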
It has been determined, however, that a cache line is not typically an optimal data length for certain I/O interfaces. For example, a single 64-byte cache line is only roughly 69% efficient for write transactions (or “writes”) on a Peripheral Component Interconnect Express (PCI Express) bus, because each write carries fixed packet overhead in addition to its data. While chipset write-combining approaches have recently been developed to make writes from the chipset more efficient from the perspective of the I/O interface, a number of difficulties remain.
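The efficiency figure can be read as payload divided by payload plus a fixed per-write overhead. The Python sketch below assumes an overhead of 29 bytes per write, a value back-calculated from the 69% figure (64 / 93 ≈ 0.69) rather than taken from the PCI Express specification. Under that assumption it shows why combining more than one cache line of data into a single transaction raises link efficiency.

```python
OVERHEAD_BYTES = 29  # assumed per-write overhead, back-calculated from ~69% at 64 bytes


def link_efficiency(payload_bytes, overhead_bytes=OVERHEAD_BYTES):
    """Fraction of transferred bytes that carry payload for a single write."""
    return payload_bytes / (payload_bytes + overhead_bytes)


for payload in (4, 64, 128, 256):
    print(f"{payload:4d}-byte write: {link_efficiency(payload):.1%}")
```

With this overhead, a 64-byte write comes out near 69% and a 256-byte combined write near 90%, so the same data crosses the link in fewer overhead bytes when the chipset combines beyond one cache line.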
One difficulty results from the fact that posting memory writes to an intermediate agent such as a chipset I/O hub can cause problems when the interfaces involved are unordered. Unordered interfaces essentially provide multiple paths for data traveling from a source to a destination. Since some routes may be shorter than others, this “multipath” effect can lead to the execution of instructions out of their intended order. For example, a posted memory write transaction typically completes at the processor before it actually completes at the MMIO space. Posting enables the processor to proceed with the next operation while the posted write transaction is still making its way through the system to its ultimate destination. Because the processor proceeds before the write actually reaches its destination, other events that the OS expects to occur after the write (e.g., a read from the same destination) may pass the write. The result can be unexpected behavior of the computer system.
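The multipath hazard can be illustrated with a toy discrete-event model in which a posted write travels a longer route than a later read to the same MMIO location. The latencies, the single-register destination, and the function name `simulate` are hypothetical; the sketch only shows that treating the write as complete early lets the read overtake it.

```python
import heapq


def simulate(write_latency, read_latency):
    """Return the value seen by a read issued one cycle after a posted write."""
    mmio = {"reg": 0}  # destination register, initially 0
    events = []        # (arrival_time, issue_order, kind, value)
    # The write is posted at t=0; the processor treats it as complete and
    # issues a read of the same register at t=1. Each takes its own path.
    heapq.heappush(events, (0 + write_latency, 0, "write", 42))
    heapq.heappush(events, (1 + read_latency, 1, "read", None))
    observed = None
    while events:
        _, _, kind, value = heapq.heappop(events)  # next arrival at the device
        if kind == "write":
            mmio["reg"] = value
        else:
            observed = mmio["reg"]
    return observed


print(simulate(write_latency=2, read_latency=2))  # 42: write arrives first
print(simulate(write_latency=5, read_latency=2))  # 0: read passes the write
```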
To address this concern, various consumer/producer ordering rules have been developed so that hardware can maintain the ability to use posting to optimize performance without negatively affecting software operation. Indeed, many ordering rules specifically focus on the flushing of posting buffers. One particular chipset write-combining technique relies upon an I/O software driver to enforce ordering rules by notifying the chipset when it is necessary to flush the buffer contents to the I/O device. Unfortunately, the software driver is a proprietary solution that cannot be used by off-the-shelf OS software and application software.
The various advantages of the embodiments of the present invention will become apparent to one skilled in the art by reading the following specification and appended claims, and by referencing the following drawings, in which:
The illustrated processor 16 has a write combining module 17 capable of automatically combining partial writes into cache line writes in order to achieve optimum burst performance from the perspective of the processor 16. Specifically, processor write-combining is implemented by tagging each page in memory with a write combining (WC) attribute, which indicates whether partial writes from the page in question can be combined. Write combinable writes are typically buffered until a full cache line is obtained, although the processor 16 does not have to actually combine the writes in order for the I/O hub 18 to implement write combining. If the writes have the WC attribute, the write transactions 24 (24a-24c) are sent out as write combinable writes. The write transactions 24 therefore instruct (or command) the I/O hub 18 to consider the writes for chipset write combining.
The interface between the processor 16 and the I/O hub 18 may represent a portion of a point-to-point fabric. In such an embodiment, a point-to-point network interconnect is coupled to the processor 16, the I/O hub 18 and various other nodes in the system 10. In the point-to-point fabric topology, each node has a direct link to other nodes in the system. The network interconnect can also have a layered communication protocol in which write transactions are transferred between nodes in packets at a protocol layer. Packets are data structures having a header and a payload, where the header includes “routing information” such as the source address and/or destination address of the packet, and/or a connection identifier that identifies a connection that effectively exists in the network interconnect to transport the packet. Other layers such as transport, routing, link and physical layers can reside beneath the protocol layer in the hierarchy. Table I summarizes one approach to implementing the layered communication protocol.
TABLE I
Layer | Description |
Protocol | Higher level communication protocol between nodes such as power management, cache coherence, ordering, peer to peer I/O, interrupt delivery, etc. |
Transport | End-to-end reliable transmission between two agents |
Routing | Flexible and distributed way to route packets from a source to a destination |
Link | Reliable data transfer and flow control between two directly connected agents & virtualization of the physical channel |
Physical | Electrical transfer of information between two directly connected agents |
The transport and routing layers may be needed for certain platform options only. In desktop/mobile and dual-processor systems, for example, the functionality of the routing layer can be embedded in the link layer. Simply put, layers may be added to or removed from the protocol without departing from the spirit and scope of the illustrated embodiments. Furthermore, other topologies such as ring topologies can be used depending upon scalability and other implementation concerns.
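The packet format described for the point-to-point fabric (a header carrying routing information plus a payload) can be modeled as a small data structure. In this Python sketch the field names, the optional connection identifier default, and the byte layout of the write payload (opcode, WC flag, 8-byte address, data) are hypothetical illustrations, not a format defined by the source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Header:
    source: int                          # source node address
    destination: int                     # destination node address
    connection_id: Optional[int] = None  # optional connection identifier


@dataclass(frozen=True)
class Packet:
    header: Header
    payload: bytes


def make_write_packet(src, dst, addr, data, write_combinable):
    """Build a protocol-layer write packet (hypothetical payload layout)."""
    opcode = 0x01  # assumed opcode for a memory write
    body = bytes([opcode, int(write_combinable)]) + addr.to_bytes(8, "little") + data
    return Packet(Header(source=src, destination=dst), body)
```

Lower layers (routing, link, physical) would consult only the header, which is why the write-combinable flag in this sketch lives in the protocol-layer payload.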
The I/O hub 18 has a buffer 20 and a chipset write combining (CSWC) module 22, where the write combining module 22 receives the write transactions 24 from the processor 16 and stores data 28 associated with the write transactions 24 to the buffer 20. The write combining module 22 also flushes the data 28 to the I/O device 12 according to a protocol between the I/O hub 18 and the processor 16. Thus, proprietary driver software and specialized writes to internal registers of the I/O device 12 are not necessary for proper consumer/producer transaction ordering. As illustrated, the amount of data 28 can also be greater than one cache line in order to maximize performance from the perspective of the I/O interface 14.
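The CSWC module's store-then-flush behavior can be sketched as a buffer of (address, data) pairs that is coalesced into address-contiguous runs at flush time, with each run issued as one device transaction that may be longer than a cache line. The class name, the list-based device log, and the assumption that buffered writes never overlap are illustrative choices for this Python sketch, not details from the source.

```python
class ChipsetWriteCombiner:
    """Hypothetical model of the I/O hub's CSWC module and its buffer."""

    def __init__(self):
        self.buffer = []         # (address, data) from write-combinable writes
        self.device_writes = []  # transactions issued on the I/O interface

    def store(self, addr, data):
        self.buffer.append((addr, bytes(data)))

    def flush(self):
        """Coalesce contiguous buffered data; each run is one device transaction.

        Assumes buffered writes do not overlap one another.
        """
        runs = []
        for addr, data in sorted(self.buffer):
            if runs and runs[-1][0] + len(runs[-1][1]) == addr:
                runs[-1][1].extend(data)  # extends the previous run in place
            else:
                runs.append((addr, bytearray(data)))
        self.device_writes.extend((addr, bytes(data)) for addr, data in runs)
        self.buffer.clear()
        return len(runs)
```

Four consecutive 64-byte lines stored this way leave as a single 256-byte transaction, matching the point that the data 28 can exceed one cache line.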
Turning now to
Meanwhile, the illustrated I/O hub sends a write completion signal 32a-32c back to the processor for each of the write transactions 24, where each write completion signal 32a-32c verifies buffering of the corresponding write transaction. Upon receiving the final write completion signal 32c (i.e., the write completion signal corresponding to the last write combinable write issued to the I/O hub), the processor sends a flush signal (special flush, or “SpcFlush”) 34 to the I/O hub. When the flush signal 34 is received by the I/O hub, the buffered data is flushed to the I/O device and a flush completion signal 36 is returned to the processor. Therefore, the write completion signals 32a-32c are sent before the data is flushed to the I/O device and the write transactions 24 become globally observable after flushing at time tgo. During the time period tfl between sending the flush signal 34 and receiving the flush completion signal 36, the processor refrains from issuing write combinable writes to the same memory region in the I/O device. The protocol illustrated in diagram 30 therefore defines an explicit signaling protocol between the processor and the I/O hub, where the flush signal 34 specifically instructs the I/O hub to flush data.
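The explicit signaling protocol can be traced with two small state machines: the hub completes each write once it is buffered, and the processor sends SpcFlush only after the last completion, holding further write combinable writes until the flush completion arrives. This Python sketch is synchronous for brevity; the class names, message tuples, and the `explicit_flush_sequence` driver are hypothetical.

```python
class Hub:
    """Hypothetical I/O hub side of the explicit (SpcFlush) protocol."""

    def __init__(self):
        self.buffer, self.device = [], []

    def on_write(self, txn_id, data):
        self.buffer.append(data)            # buffer first...
        return ("WriteCompletion", txn_id)  # ...then complete to the processor

    def on_spc_flush(self):
        self.device.append(b"".join(self.buffer))  # single combined transaction
        self.buffer.clear()
        return ("FlushCompletion",)


class Processor:
    """Hypothetical processor side: tracks completions and the tfl window."""

    def __init__(self):
        self.outstanding = set()
        self.flushing = False

    def issue_wc_write(self, txn_id):
        if self.flushing:
            raise RuntimeError("WC writes to this region wait for FlushCompletion")
        self.outstanding.add(txn_id)

    def on_write_completion(self, txn_id):
        self.outstanding.discard(txn_id)

    def send_flush(self):
        if self.outstanding:
            raise RuntimeError("SpcFlush is sent only after the last write completion")
        self.flushing = True  # start of tfl
        return "SpcFlush"

    def on_flush_completion(self):
        self.flushing = False  # end of tfl; the writes are globally observable


def explicit_flush_sequence(chunks):
    hub, cpu, trace = Hub(), Processor(), []
    for txn_id, data in enumerate(chunks):
        cpu.issue_wc_write(txn_id)
        kind, done = hub.on_write(txn_id, data)
        cpu.on_write_completion(done)
        trace.append(kind)
    trace.append(cpu.send_flush())  # a flush event has been detected
    trace.append(hub.on_spc_flush()[0])
    cpu.on_flush_completion()
    return trace, hub.device
```

Calling `explicit_flush_sequence([b"a", b"b", b"c"])` yields three write completions, then SpcFlush, then the flush completion, with the device receiving one combined transaction.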
Once the data is flushed to the I/O device, the I/O hub sends write completion signals 32a-32c back to the processor to verify flushing of each write transaction. Therefore, in the illustrated example the write transactions 24 become globally observable at time tgo′ and the write completion signals 32a-32c are sent after the data is flushed to the I/O device. During the time period tfl′ between detection of the flushing event and receiving the final write completion signal, the processor refrains from issuing order-dependent transactions. Simply put, the processor waits for all previous write combinable writes to complete before proceeding with the next order-dependent event. The protocol illustrated in diagram 30′ therefore represents an implicit timing protocol between the processor and the I/O hub.
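In the implicit timing protocol no flush signal is exchanged; instead, the hub withholds write completions until the data has been flushed, and the processor simply waits for every outstanding completion before an order-dependent transaction. The Python sketch below is hypothetical in its names, and it leaves the hub's flush trigger to the caller because this passage does not specify one.

```python
class ImplicitHub:
    """Hypothetical hub: completions are returned only after data is flushed."""

    def __init__(self):
        self.pending = []  # (txn_id, data) buffered, not yet completed
        self.device = []

    def on_write(self, txn_id, data):
        self.pending.append((txn_id, data))
        return []  # no completion while the data is only buffered

    def flush(self):
        self.device.append(b"".join(d for _, d in self.pending))
        completions = [("WriteCompletion", t) for t, _ in self.pending]
        self.pending.clear()
        return completions  # one completion per flushed write


class ImplicitProcessor:
    """Hypothetical processor: order-dependent work waits during tfl'."""

    def __init__(self):
        self.outstanding = set()

    def issue_wc_write(self, txn_id):
        self.outstanding.add(txn_id)

    def on_write_completion(self, txn_id):
        self.outstanding.discard(txn_id)

    def may_issue_order_dependent(self):
        # Proceed only once every earlier WC write has completed.
        return not self.outstanding
```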
Turning now to
The illustrated processing block 40 provides for receiving a plurality of write transactions from a processor and block 42 provides for storing data associated with the write transactions to a buffer of an I/O hub. Blocks 40 and 42 can be repeated as needed until flushing is required. The data is flushed to an I/O device at block 44 according to a protocol between the I/O hub and the processor. As already discussed, the flushing protocol can be implemented in a number of ways.
Source matching can provide enhanced performance in multi-processor/multi-buffer environments and will now be described in greater detail with reference to
Source matching can be implemented by tagging the buffers 72, 74 with a source identifier that is associated with one or more write combinable write transactions (i.e., a first source identifier). In one approach, the first source identifier is extracted from the initial write combinable write in a series of write combinable writes. Thus, buffer tagging can be incorporated into the processing block 42′ (
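Source matching can be sketched as a hub with several buffers, each tagged with the source identifier of the first write combinable write it accepts; a flush signal then drains only the buffers whose tag matches the identifier carried by the signal. The buffer count, the string identifiers, and the error raised when no buffer is free are assumptions of this Python sketch.

```python
class TaggedBuffer:
    def __init__(self):
        self.source_id = None  # tag taken from the first WC write in a series
        self.data = []


class SourceMatchingHub:
    """Hypothetical hub with several WC buffers tagged by source identifier."""

    def __init__(self, num_buffers=2):
        self.buffers = [TaggedBuffer() for _ in range(num_buffers)]
        self.device = []

    def on_write(self, source_id, data):
        # Reuse the buffer already tagged with this source, else claim a free one.
        buf = next((b for b in self.buffers if b.source_id == source_id), None)
        if buf is None:
            buf = next((b for b in self.buffers if b.source_id is None), None)
            if buf is None:
                raise RuntimeError("no free buffer (a real hub might stall or evict)")
            buf.source_id = source_id
        buf.data.append(data)

    def on_flush(self, source_id):
        # Flush only buffers whose tag matches the flush signal's source.
        for buf in self.buffers:
            if buf.source_id == source_id:
                self.device.append((source_id, b"".join(buf.data)))
                buf.source_id, buf.data = None, []
```

A flush from one processor therefore leaves another processor's partially filled buffer untouched, which is the performance benefit source matching targets in multi-processor, multi-buffer systems.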
Block 102 provides for determining whether a flushing event has occurred. As already discussed, flushing events can include, but are not limited to, the use of ordering fences, encountering implicit locked instructions and encountering interrupts. When a flushing event is encountered, it is determined at block 104 whether a write combine history indicates that one or more combinable write transactions have been issued by the processor. Consulting the write combine history can reduce the negative effects on bandwidth that may result from issuing flush signals for every flushing event that is encountered. The write combine history can be implemented in a number of different ways. For example, in one approach, the write combine history tracks combinable write transactions per processor thread. The write combine history could also track combinable write transactions per I/O hub in addition to per thread. If a combinable write has been previously sent, block 106 provides for sending a flush signal to the appropriate I/O hub. If a combinable write has not been sent, the next hub is selected at block 108 and the determination at block 104 is repeated.
It should also be noted that the chipset could include multiple I/O hubs, where the processor is able to issue flush signals to each of the I/O hubs. If the generation of flush signals is rare, broadcasting the flush signals to all of the I/O hubs is not likely to negatively affect performance. Alternatively, the processor could verify that a given I/O hub has previously been issued one or more write combinable write transactions before sending a flush signal to the hub.
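The processor-side decision in blocks 102 through 108 can be sketched with a write combine history kept per thread and per I/O hub, so that a flushing event produces flush signals only for hubs that actually received combinable writes. The event names, the set-based history, and the choice to clear a thread's history after its flushes are sent are assumptions of this Python sketch, not details from the source.

```python
from collections import defaultdict

FLUSH_EVENTS = {"fence", "locked_instruction", "interrupt"}


class WriteCombineHistory:
    """Hypothetical per-thread, per-hub record of issued WC writes."""

    def __init__(self):
        self._issued = defaultdict(set)  # thread id -> hubs sent WC writes

    def record_wc_write(self, thread, hub):
        self._issued[thread].add(hub)

    def hubs_to_flush(self, thread, all_hubs):
        # Walk every hub (blocks 104/108), keeping those that saw WC writes.
        return [hub for hub in all_hubs if hub in self._issued[thread]]

    def clear(self, thread):
        self._issued[thread].clear()


def on_event(history, thread, event, all_hubs):
    """Return the hubs that receive a flush signal (block 106) for this event."""
    if event not in FLUSH_EVENTS:  # block 102: not a flushing event
        return []
    targets = history.hubs_to_flush(thread, all_hubs)
    history.clear(thread)  # assumed: history resets once flushes are sent
    return targets
```

The broadcast alternative mentioned above would simply return `all_hubs` for every flushing event, trading extra flush traffic for a simpler processor.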
Turning now to
The latency condition typically includes a delay in receiving a next combinable write transaction from the processor. The latency condition may also include the interface to the I/O device being in an idle state. Block 96 provides for sending a write completion signal to the processor for each of the write transactions as the data is flushed to the I/O device, where each write completion signal verifies flushing of a corresponding write transaction.
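A hub following the implicit protocol might check the latency condition on each tick: if no new combinable write has arrived for some interval, or the interface to the I/O device is idle, it flushes the buffered data and returns one completion per flushed write. The threshold value, the tick-based timing, and the decision to treat either condition alone as sufficient (requiring both is an equally plausible reading) are assumptions of this Python sketch.

```python
class LatencyFlushHub:
    """Hypothetical hub for the implicit protocol: flushes on a latency condition."""

    def __init__(self, max_gap):
        self.max_gap = max_gap  # assumed timeout, in arbitrary time units
        self.pending = []       # (txn_id, data) awaiting flush
        self.last_write_time = None
        self.device = []

    def on_write(self, now, txn_id, data):
        self.pending.append((txn_id, data))
        self.last_write_time = now

    def tick(self, now, interface_idle):
        """Check the latency condition; return the completed txn ids if flushed."""
        if not self.pending:
            return []
        gap_expired = now - self.last_write_time >= self.max_gap
        if not (gap_expired or interface_idle):
            return []
        self.device.append(b"".join(d for _, d in self.pending))
        done = [txn_id for txn_id, _ in self.pending]
        self.pending.clear()
        return done  # a write completion is sent for each flushed write
```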
Those skilled in the art can appreciate from the foregoing description that the broad techniques of the embodiments of the present invention can be implemented in a variety of forms. Therefore, while the embodiments of this invention have been described in connection with particular examples thereof, the true scope of the embodiments of the invention should not be so limited since other modifications will become apparent to the skilled practitioner upon a study of the drawings, specification, and following claims.
Ajanovic, Jasmin, Muthrasanallur, Sridhar, Creta, Kenneth C., Spink, Aaron T., Hacking, Lance E.
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Apr 20 2004 | Intel Corporation | (assignment on the face of the patent) | / | |||
Aug 19 2004 | CRETA, KENNETH C | Intel Corporation | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 015739 | /0777 | |
Aug 19 2004 | MUTHRASANALLUR, SRIDHAR | Intel Corporation | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 015739 | /0777 | |
Aug 23 2004 | AJANOVIC, JASMIN | Intel Corporation | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 015739 | /0777 | |
Aug 24 2004 | SPINK, AARON T | Intel Corporation | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 015739 | /0777 | |
Aug 26 2004 | HACKING, LANCE E | Intel Corporation | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 015739 | /0777 |
Date | Maintenance Fee Events |
Mar 14 2013 | M1551: Payment of Maintenance Fee, 4th Year, Large Entity. |
Oct 23 2017 | REM: Maintenance Fee Reminder Mailed. |
Apr 09 2018 | EXP: Patent Expired for Failure to Pay Maintenance Fees. |
Date | Maintenance Schedule |
Mar 09 2013 | 4 years fee payment window open |
Sep 09 2013 | 6 months grace period start (w surcharge) |
Mar 09 2014 | patent expiry (for year 4) |
Mar 09 2016 | 2 years to revive unintentionally abandoned end. (for year 4) |
Mar 09 2017 | 8 years fee payment window open |
Sep 09 2017 | 6 months grace period start (w surcharge) |
Mar 09 2018 | patent expiry (for year 8) |
Mar 09 2020 | 2 years to revive unintentionally abandoned end. (for year 8) |
Mar 09 2021 | 12 years fee payment window open |
Sep 09 2021 | 6 months grace period start (w surcharge) |
Mar 09 2022 | patent expiry (for year 12) |
Mar 09 2024 | 2 years to revive unintentionally abandoned end. (for year 12) |