A shared memory provides data access to a plurality of agents (e.g., processors, cells of processors, I/O controllers, etc.) and includes a memory and a memory controller. The memory controller selectively provides memory access to the agents in both coherent and read current modes of operation. In the coherent mode, the memory controller ensures that the data stored in system memory is accurately and precisely mirrored in all subservient copies of that data as might typically be stored in agent cache memories. Using, for example, a MESI type protocol, the memory controller limits access to memory so that only an "owner" of a particular portion or line of memory has write access and that, during the extension of these write privileges, no other agent has a valid copy of the data subject to being updated. Thus, the memory controller implements a first set of rules in the coherent mode of operation to ensure that all copies of data stored by the agents are coherent with data stored in the memory. In a read current mode of access, a read-once segment of data is copied to an agent with the agent implementing a second set of rules to minimize or eliminate the possibility that the data might become stale prior to use or that it be misused by another agent or process. Thus, in the read current mode, an "uncontrolled" copy of the data is released subject to certain restrictions to be implemented by the receiving agent as defined by the second set of rules. For example, these rules require that the agent's copy of data be provided as an output and then invalidated within a predetermined period of time, and that the agent limit access to the memory during any set of fetch operations to no more than a predetermined maximum block size. Also included is a requirement that the copy of data include only that data contained within a range of memory addresses, the range beginning within a predetermined block of memory addresses and continuing through an end of block address.
These limitations limit the amount of data that might be subject to misuse, particularly in the case of a failure resulting in the inability of a requesting agent to complete a transaction according to the rules.
|
14. A method of providing memory access to a plurality of agents, comprising the steps of:
selectively providing memory access in a coherent mode of operation including implementing a first set of rules insuring that all copies of data stored by the agents are coherent with data stored in said memory; and selectively providing read current memory access implementing a second set of rules for providing read only access to data stored in said memory.
1. A shared memory providing access to a plurality of agents, comprising:
a memory; and a memory controller configured to selectively provide memory access to the agents in both coherent and read current modes of operation, said memory controller implementing a first set of rules in said coherent mode of operation to insure that all copies of data stored by the agents are coherent with data stored in said memory, and a second set of rules for providing a copy of data stored in said memory in said read current mode of operation.
9. A shared memory providing access to a plurality of agents, comprising:
a memory; and a memory controller configured to selectively provide memory access to the agents in both coherent and read current modes of operation, (i) in said coherent mode of operation, said memory controller selectively providing at least two types of access to data stored in said memory, including (a) shared access to data stored in said memory to a plurality of said agents by providing each of said plurality of agents with a copy of said data, and (b) exclusive access to data stored in said memory to a requesting one of said agents wherein all preexisting copies of said data maintained by others of said agents are invalidated prior to providing said requesting agent with said exclusive access and said data stored in said memory is updated to reflect changes made to said data by said requesting agent as part of terminating said exclusive access, said memory controller maintaining a status of each line of data being accessed by one or more of said agents in said coherent mode of operation including identification of each of said agents having a copy of corresponding ones of said lines of data and an indication of the type of access provided thereto, (ii) in said read current mode of operation, said memory controller providing a requesting agent with access to said memory to create a copy of data stored in said memory subject to a set of rules implemented by the agents.
2. The shared memory according to
3. The shared memory according to
4. The shared memory according to
5. The shared memory according to
6. The shared memory according to
7. The shared memory according to
8. The shared memory according to
10. The shared memory according to
11. The shared memory according to
12. The shared memory according to
(i) data supplied in said read current mode of operation is to be sequentially and incrementally transferred by the receiving agent beginning at a starting address through an end of block address; (ii) data supplied in said read current mode of operation is to be flushed by the receiving agent upon completion of a transfer operation; (iii) data supplied in said read current mode of operation is to be flushed by one of the agents upon detection of a first fault condition; (iv) data supplied in said read current mode of operation is to be flushed upon an interruption of a transfer of said data; (v) fetching operations to supply said copy of data are to be terminated upon detection of a second fault condition; and (vi) in response to a third fault condition, fetching operations to supply said copy of data are to be completed through an end of a corresponding DMA block address and said copy of said data is to be flushed.
13. The shared memory according to
15. The method according to
creating a cache copy of data using said read current memory access; transferring to a peripheral device said cache copy of data; and flushing said cache copy of data.
16. The method according to
17. The method according to
19. The method according to
(i) data supplied in said read current mode of operation is to be sequentially and incrementally transferred by a receiving agent beginning at a starting address through an end of block address; (ii) data supplied in said read current mode of operation is to be flushed by a receiving agent upon completion of a transfer operation; (iii) data supplied in said read current mode of operation is to be flushed by one of the agents upon detection of a first fault condition; (iv) data supplied in said read current mode of operation is to be flushed upon an interruption of a transfer of said data; (v) fetching operations performed in said read current mode of operation are to be terminated upon detection of a second fault condition; and (vi) in response to a third fault condition, fetching operations performed to a cache in said read current mode of operation are to be completed through an end of a corresponding DMA block address and said cache is to be flushed.
20. The method according to
|
This invention relates generally to computer memory systems and more particularly to an access method for reading from a shared memory to an I/O device.
In a computer system, problems often arise when more than one device attempts to access information stored in a memory location. While multiple devices can access a single memory location, if one of the accessing devices attempts to update the information in the memory location without informing the other devices that also have access to that memory location, data mismatches may occur, resulting in a loss of data coherency.
To speed access and minimize data latency, some memory accessing devices use a local memory cache. Within the local cache the device may store a copy of information which it has recently accessed. Thus, there may be several copies of information scattered throughout a system.
When a system implements local cache, main and cache memory may be organized into cache lines. A cache line is typically 64 bytes of data. Therefore, when a device attempts to access a specific memory location the cache controller first searches its local cache to determine if it already has a copy of the information available. If a copy of the requested memory location is not currently stored in the local cache, the cache controller attempts to obtain a copy of the cache line from the system memory controller. If the information is available in the local cache, the device will use the information in the local cache. Issues arise when multiple devices attempt to access the same cache line, and each device stores copies of this information in local cache. Not only must access conflicts be resolved, but procedures must be implemented to ensure coherency of the various copies of the data contained in the multiple caches and in the main memory.
Numerous protocols exist which maintain cache coherency across multiple caches and main memory. One such protocol is called MESI. The MESI protocol is described in detail in M. Papamarcos and J. Patel, "A Low Overhead Coherent Solution for Multiprocessors with Private Cache Memories," in Proceedings of the 11th International Symposium on Computer Architecture, IEEE, New York (1984), pp. 348-354, incorporated herein by reference. MESI stands for Modified, Exclusive, Shared, Invalid. Under the MESI protocol, a cache line is categorized according to its use. A modified cache line indicates that the particular line has been written to by the cache that is the current owner of the line. An exclusive cache line indicates that a cache has exclusive ownership of the cache line, which will allow the cache controller to modify the cache line. A shared cache line indicates that one, or more than one, cache(s) have ownership of the line. A shared cache line is considered read only: any device under the cache may read the line, but no device is allowed to write to it. An invalid cache line is a cache line with no owner, whose data may not be valid since the cache no longer owns the cache line.
While MESI is a standard term in the industry, other classifications or nomenclature are frequently employed. Modified cache lines are typically referred to as private dirty. Exclusive cache lines are typically referred to as private cache lines. Private cache lines which have not been updated are typically referred to as private clean cache lines.
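By way of illustration only, the four categories and the access each one permits might be sketched in Python as follows. The names `LineState`, `can_read`, `can_write` and `write` are hypothetical and are not part of any described embodiment; the sketch assumes only the rules stated above, namely that a line must be held modified or exclusive before it may be written.

```python
from enum import Enum


class LineState(Enum):
    MODIFIED = "M"   # also called "private dirty"
    EXCLUSIVE = "E"  # also called "private" ("private clean" if not yet updated)
    SHARED = "S"     # read only, possibly held by several caches
    INVALID = "I"    # no owner; data may not be valid


def can_read(state):
    """Any state other than invalid may be read by a device under the cache."""
    return state is not LineState.INVALID


def can_write(state):
    """Only private (exclusive or modified) ownership permits a write."""
    return state in (LineState.MODIFIED, LineState.EXCLUSIVE)


def write(state):
    """Return the state of the line after a write, or refuse the write."""
    if not can_write(state):
        raise PermissionError("exclusive ownership must be requested first")
    return LineState.MODIFIED
```

A write to an exclusive line therefore yields a modified (private dirty) line, while a write to a shared line is refused until exclusive access has been obtained.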
If a specific device requires access to a specific memory location it will check its local cache to determine if the information is available there. If the information is not currently contained within the local cache, the cache controller will go to main memory to access the information. Before requesting the cache line from the system memory controller, the cache controller decides what type of ownership for the line it will seek (i.e. shared, private, etc.). If a cache wants private ownership of a cache line, the system memory controller will ensure that no cache has the same line. The system memory controller will not allow any other cache to get ownership of this line so the cache's access to the line will be private. If the cache's access to the line is shared, more than one cache may have the same line as shared simultaneously.
A cache must have private or exclusive ownership of a cache line to modify the line. That is, other copies may not be relied upon as being valid until after the data is modified and an updated version is supplied. If the cache line is currently categorized as read only, the cache which needs to update the information must make a request to the system memory controller, for exclusive access to the cache line. The system memory controller then identifies any other caches having access to the cache line, and makes the necessary arrangements for the other caches to relinquish access and for the requesting cache to have exclusive use of the cache line.
One method for a cache to obtain the exclusive use of a cache line is for the memory controller to invalidate other copies of the cache line currently in use. Once other caches' access to the cache line has been invalidated, the remaining cache has exclusive use of the cache line and can update the cache line accordingly.
One of the methods to implement cache coherency is a directory-based system in which, rather than sending each transaction to every other agent or processor in the system, a table is maintained for each cache line which indicates which agent(s) have the cache line. The system memory controller consults the directory to see the status of the cache line before it allows a cache to get data. For example, if another cache has the cache line as private, the memory controller recalls the cache line from that owner.
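The directory lookup described above can be pictured with the following minimal Python sketch. It is an assumption-laden model rather than the controller's actual logic: the class `Directory`, its method names, and the representation of a directory entry as a `(type, owners)` pair are all hypothetical.

```python
class Directory:
    """Toy directory: one entry per cache line, recording type and owners."""

    def __init__(self):
        self.entries = {}  # line address -> (type, frozenset of owner names)

    def _lookup(self, line):
        return self.entries.get(line, ("idle", frozenset()))

    def request_shared(self, line, agent):
        """Grant shared access; return the agents the line was recalled from."""
        kind, owners = self._lookup(line)
        recalled = []
        if kind == "private":
            # Another cache holds the line as private: recall it first.
            recalled = sorted(owners - {agent})
            owners = frozenset()
        self.entries[line] = ("shared", owners | {agent})
        return recalled

    def request_private(self, line, agent):
        """Grant private access; return the agents whose copies are invalidated."""
        _, owners = self._lookup(line)
        invalidated = sorted(owners - {agent})
        self.entries[line] = ("private", frozenset({agent}))
        return invalidated

    def release(self, line, agent):
        """Remove an agent from the entry once it relinquishes the line."""
        kind, owners = self._lookup(line)
        owners = owners - {agent}
        if owners:
            self.entries[line] = (kind, owners)
        else:
            self.entries.pop(line, None)
```

Because the directory names the current holders of each line, only those holders need to be contacted on a recall or invalidation.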
A device first checks its local cache to determine whether the information is available. If the information it requires is not locally available, the cache controller servicing the device sends a read request for shared ownership of the cache line via the system interconnect fabric to the system memory controller. If the requested cache line is shared or is not in use by any other cache, the system memory controller sends the data from main memory to the requesting cache and updates its directory to reflect the new owner of the cache line in shared mode. Once the requesting device has completed its use of the cache line, the cache controller sends a message relinquishing ownership of the cache line to the system memory controller. Upon receipt of this message the system memory controller removes the cache as a shared owner from its directory tag. If the cache controller had requested private access to the cache line and had updated the information stored within the cache line, the cache controller also sends the updated information to the system memory controller. Each of these transactions between the cache controller and the system memory controller consumes a portion of the interconnect fabric bandwidth. Additionally, the system memory controller bandwidth is also affected by these transactions. A reduction in the number of steps required for the cache controller to acquire and relinquish ownership of the cache line would provide a corresponding improvement to overall system performance.
Data requested and accessed by the cache controller can be one of several types. First, the cache controller request and memory access can be associated with payload data. Payload data consists of a large data transfer so that it is often handled as blocks of 1 to 4 KB of data. A second type of data that can be requested and accessed by the cache controller is control data, which is generally smaller in size. This information is typically between 4 and 32 bytes and can be accommodated in one cache line for most applications.
In addition to the shared and private memory requests discussed previously, a third type of access to information stored in system memory exists. A "read current" transaction results in the requesting cache controller receiving coherent data of the cache line at the time of the request but does not affect the ownership of the cache line. A read current transaction guarantees that the information contained within the cache line was up to date at the time the data was obtained from the system memory controller. One of the advantages of a read current request is that, after the device has used the data, the cache controller does not have to relinquish ownership of the cache line since the read current had no effect on the ownership of the cache line within the system. The elimination of this third step in the normal cache controller access of a memory location improves the useable bandwidth in the system fabric.
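The bandwidth effect of dropping the relinquish step can be illustrated with a rough count of fabric transactions. The Python sketch below is only a back-of-the-envelope model: it assumes one request, one data return and (for coherent shared reads) one relinquish message per 64-byte cache line, and the function name `fabric_transactions` is hypothetical.

```python
COHERENT_SHARED_STEPS = ("read request", "data return", "relinquish ownership")
READ_CURRENT_STEPS = ("read request", "data return")


def fabric_transactions(payload_bytes, line_bytes=64, read_current=False):
    """Count fabric messages needed to move a payload one cache line at a time."""
    lines = -(-payload_bytes // line_bytes)  # ceiling division
    steps = READ_CURRENT_STEPS if read_current else COHERENT_SHARED_STEPS
    return lines * len(steps)
```

Under these assumptions a 4 KB payload (64 cache lines) costs 192 messages as coherent shared reads and 128 messages as read current transactions, i.e. one third fewer.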
Contentions can arise between two caches which both attempt to obtain ownership of a cache line. Effectively, after the system memory controller has assigned one cache ownership of the cache line, the second cache can request ownership (prior to the first cache's use of the cache line), resulting in the first cache losing ownership of the cache line. Similarly, after the system memory controller assigns ownership of the cache line to the second cache, but before the second cache has used the cache line, the first cache can again request the cache line. In some cases this contention between the requesting caches can result in the cache line "ping-ponging" between the two contending caches. This prevents either device under the two caches from making forward progress. It also results in increased latency for the devices/processors accessing the cache line.
A need currently exists to improve the useable bandwidth for large payload transfers by eliminating one of the three steps in a normal data transfer. Additionally, a further need exists to eliminate the presence of "ping-ponging" which may occur between contending caches.
These and other objects, features and technical advantages are achieved by a system and method in which, according to an aspect of the invention, a shared memory providing access to a plurality of agents (e.g., processors, cells of processors, I/O controllers, etc.) includes a memory and a memory controller. The memory controller selectively provides memory access to the agents in both coherent and read current modes of operation. In the coherent mode, the memory controller ensures that an agent's cache reflects the most up to date data. Using, for example, a MESI type protocol, the memory controller limits access to memory so that only one "owner" gets the cache line at any time for write access and that, during the extension of these write privileges, no other agent has a valid copy of the data subject to being updated. Thus, the memory controller implements a first set of rules in the coherent mode of operation to ensure that all copies of data stored by the agents are coherent with data stored in the memory. In a read current mode of access, a read-once segment of data is copied to an agent with the agent implementing a second set of rules to eliminate the possibility that the data might become stale prior to use or that it be misused by another agent or process. Thus, in the read current mode, an "uncontrolled" copy of the data is released subject to certain restrictions to be implemented by the receiving agent as defined by the second set of rules.
According to a feature of the invention, the second set of rules requires that the agent's copy of data be provided as an output and then invalidated after use. These rules further require that the uncontrolled data, i.e., the agent's copy, be limited in size and terminate at a predetermined block boundary. These limitations keep the read current data from causing data corruption, including in the scenario where the requesting device experiences a failure such that protection of the data might be compromised.
According to another feature of the invention, the second set of rules requires that the copy of data be accessed by an external device in a sequentially incremental fashion and that, upon completion of the transfer, the copy of data be flushed from the host agent's cache. Again, these rules are directed at ensuring that the data is used immediately and not permitted to become stale, and that the data is flushed, i.e., is read once and only once. Another rule implemented by the agents according to the read current protocol is a requirement that detection of an error condition causes the copy of data to be flushed. These errors include, for example, a failure of the requesting agent such that it cannot comply with or implement one or more of the safeguards required to protect this uncontrolled data from misuse.
According to another feature of the invention, different sets of rules, or subsets of rules, are used and applied to provide payload versus control data and traffic. The dichotomy accommodates differences between these data types, including typical sizes and uses of the data.
According to another aspect of the invention, a shared memory providing access to a plurality of agents includes both a memory and a memory controller. The memory controller selectively provides memory access to the agents in both coherent and read current modes of operation. In the coherent mode of operation, the memory controller follows a prior art cache coherence protocol such as the MESI protocol. In the read current mode of operation, the memory controller provides memory access to an agent so that a copy of data stored in the memory is retrieved subject to the agent implementing a set of rules to avoid the data going stale or being misused. The agents may implement these rules using appropriate control logic.
The read current rules implemented by the agents include:
(i) data supplied in the read current mode of operation is to be sequentially and incrementally transferred by the receiving agent beginning at a starting address through an end of a predetermined block address;
(ii) data supplied in the read current mode of operation is to be flushed by the receiving agent upon completion of a transfer operation;
(iii) data supplied in the read current mode of operation is to be flushed by one of the agents upon detection of a first fault condition;
(iv) data supplied in the read current mode of operation is to be flushed upon an interruption of a transfer of the data;
(v) fetching operations to supply the copy of data are to be terminated upon detection of a second fault condition; and
(vi) in response to a third fault condition, fetching operations to supply the copy of data are to be completed through an end of a corresponding DMA block address and the copy of the data is to be flushed.
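As a non-limiting illustration, rules (i) through (v) above might be modeled on the agent side as in the following Python sketch. The class `ReadCurrentTransfer`, its parameters, and the `fault_after` hook used to simulate a fault are hypothetical; rule (vi), which completes fetching through the end of the DMA block before flushing, is not modeled here.

```python
class ReadCurrentTransfer:
    """Toy agent-side model of a read current transfer within one block."""

    def __init__(self, memory, start, block_size=1024, line_size=64):
        self.memory = memory                      # bytes-like system memory
        self.line_size = line_size
        self.next_addr = start - start % line_size
        # Rule (i): fetching never runs past the end of the block.
        self.block_end = (start // block_size + 1) * block_size
        self.lines = {}                           # uncontrolled copies held

    def fetch_line(self):
        """Fetch the next sequential line, or report the block is exhausted."""
        if self.next_addr >= self.block_end:
            return False
        addr = self.next_addr
        self.lines[addr] = self.memory[addr:addr + self.line_size]
        self.next_addr += self.line_size
        return True

    def flush(self):
        self.lines.clear()

    def run(self, sink, fault_after=None):
        """Transfer lines in order to `sink`; always flush when finished."""
        sent = 0
        try:
            while self.fetch_line():
                if fault_after is not None and sent >= fault_after:
                    break                         # rule (v): stop fetching
                addr = self.next_addr - self.line_size
                sink.append(self.lines.pop(addr))  # each line is read once
                sent += 1
        finally:
            self.flush()                          # rules (ii), (iii), (iv)
        return sent
```

In this model the flush sits in a `finally` clause so that completion, a simulated fault and an unexpected interruption all leave the agent holding no uncontrolled copy.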
According to another aspect of the invention, a method of providing memory access to a plurality of agents selectively provides memory access in a coherent and read current mode of operation. A first set of rules, initiated by the controller, ensures that all copies of data stored by the agents are coherent with data stored in the memory. A second set of rules supporting read current memory access are implemented by the agents without controller monitoring and/or intervention so as to provide the agents with read only access to data stored in the memory. Read current procedures implemented by a participating agent include creating a cache copy of data using the read current memory access, transferring to a peripheral device the cache copy of data, and then flushing the cache copy of data.
The foregoing has outlined rather broadly the features and technical advantages of the present invention in order that the detailed description of the invention that follows may be better understood. Additional features and advantages of the invention will be described hereinafter which form the subject of the claims of the invention. It should be appreciated by those skilled in the art that the conception and specific embodiment disclosed may be readily utilized as a basis for modifying or designing other structures for carrying out the same purposes of the present invention. It should also be realized by those skilled in the art that such equivalent constructions do not depart from the spirit and scope of the invention as set forth in the appended claims.
For a more complete understanding of the present invention, and the advantages thereof, reference is now made to the following descriptions taken in conjunction with the accompanying drawing, in which:
The processor bus 120 connects one or more processor(s) 125 to the system memory controller 130. The system memory controller 130 controls the coherency tasks and may also control activities on the bus including which of the connected components acquires ownership of the processor bus 120. The system memory controller 130's main function is to control access to the main memory of the system or, within cell 0 (105), the DIMMs 135. It should be noted that the DIMMs 135 are but one type of memory, the invention not being limited to the use of any particular storage device, but applicable to any type of main system memory including SRAMs. The main memory, or DIMMs 135 in cell 0 (105), is used to store any system information including, e.g., system, utility and application software, data, configuration files, etc.
Also connected to the processor bus 120 is a host I/O (input/output) bridge 140. The host I/O bridge 140 is the gateway into one or more I/O buses 145. The host I/O bridge 140 controls traffic between the processor bus 120 and the I/O cards 150. Within each I/O bus 145, one or more I/O cards 150 provides the system with external connectivity, i.e., connections to the outside world. These I/O cards 150 provide connections to local and remote peripheral devices including disks (not shown), tape drives (not shown), or any other I/O device. An I/O device is then available to a processor 125 and can access memory 135 through a specific I/O card 150, the respective I/O bus 145 which the I/O card resides within, the host I/O bridge 140, and the processor bus 120.
Overall, the processors 125 of cell 0 (105) perform the required processing, the system memory controller 130 controls all of the coherency actions in the system and the access to the main memory 135, and host I/O bridge 140 controls access between the processor bus 120 and the I/O cards 150 which, in turn, control the flow of data into, or out of, the system.
Referring to
The cache storage unit 415 is divided into cache lines. Shown in
The cache tag 420 consists of a cache line address 445, an I/O bus number 450, a device number 455, and a transaction ID 460. While prior art systems normally only included cache line address 445, the present invention further provides additional fields including I/O bus number 450, device number 455 and transaction ID 460 as part of the cache tag 420. Normally a request for a cache line would include a cache line address, and the requested cache line address would be stored. The I/O bus number 450 is an identifier for the specific I/O bus 145 from which the request was received. Similarly, the device number 455 is an identifier for the specific device 150 which made the cache line request. These two identifiers specify which device made the original request for the cache line. The cache line address 445 is used to determine the cache line boundary. For example, if the byte address is known and the cache line size is 64 bytes, then the lower six bits can be ignored, resulting in alignment to the 64-byte boundary. The cache line address would be equal to the remaining portion.
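The address arithmetic in the example above may be sketched as follows. This is a minimal illustration assuming a 64-byte cache line; the helper names and the `CacheTag` record are hypothetical, with the record simply mirroring the four tag fields named in the text.

```python
from dataclasses import dataclass
from typing import Optional

LINE_SIZE = 64
OFFSET_BITS = LINE_SIZE.bit_length() - 1  # 6 bits select a byte within a line


def cache_line_address(byte_address):
    """Drop the lower six bits; the remaining portion is the line address."""
    return byte_address >> OFFSET_BITS


def line_base(byte_address):
    """Byte address of the start of the line containing byte_address."""
    return byte_address & ~(LINE_SIZE - 1)


@dataclass
class CacheTag:
    cache_line_address: int
    io_bus_number: int
    device_number: int
    transaction_id: Optional[int] = None  # only meaningful for split reads
```

For instance, byte address 0x1234 lies 0x34 bytes into the line that starts at 0x1200, so its cache line address under this scheme is 0x48.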
The transaction ID 460 is used for forward looking I/O bus specifications such as PCI-X, InfiniBand, or NGIO, which use split transactions. A split transaction uses a transaction ID to identify the read requests sent by the I/O device 150 of
Normally, an I/O device 150 makes a read request on an I/O bus 145, which is sent to the I/O bus interface 320 and then sent to the cache unit 315. The I/O device 150's request asks the cache unit 315 to provide the data immediately, and the cache unit 315's cache controller 405 looks to the cache tag 420 to determine if that specific cache line is in use. If the cache line is available and the cache status is alright, the data is supplied right away. If the cache line is not available, the cache controller 405 tells the I/O bus interface 320 to tell the I/O device 150, via the I/O bus 145, to try the read at a later time. In anticipation of I/O device 150's follow up request, the cache unit 315 will request the cache line from the system memory controller 130 through the processor bus interface 305 on the processor bus 120. When the cache line becomes available, the data is brought into the cache unit 315 and will be available when the I/O device 150 makes a subsequent request for the cache line. The I/O device 150 will continue to receive retries until the data becomes available. On a split read the cache controller 405 sends data to the I/O device 150 voluntarily as soon as the cache line data is returned by the system memory controller 130. The data goes out with the transaction ID 460.
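The difference between the retry path and the split read path can be sketched in Python as shown below. The sketch is a simplified model under stated assumptions: the class `CacheUnit`, its method names, and the string status values are hypothetical, and the fetch from the system memory controller is reduced to a single `memory_returns` call.

```python
class CacheUnit:
    """Toy model of how a read is answered: data, retry, or split completion."""

    def __init__(self, memory):
        self.memory = memory   # line address -> data held by system memory
        self.lines = {}        # lines currently present in the cache unit
        self.pending = {}      # line address -> transaction ID (None if retry)

    def read(self, line_addr, transaction_id=None):
        """Handle a read request arriving from the I/O bus interface."""
        if line_addr in self.lines:
            return ("data", self.lines[line_addr])
        # Line not present: remember the request and start fetching it.
        self.pending.setdefault(line_addr, transaction_id)
        if transaction_id is not None:
            return ("split-accepted", None)
        return ("retry", None)

    def memory_returns(self, line_addr):
        """Called when the system memory controller returns the line."""
        data = self.memory[line_addr]
        tid = self.pending.pop(line_addr, None)
        if tid is not None:
            # Split read: push the data out tagged with the transaction ID.
            return ("split-completion", tid, data)
        self.lines[line_addr] = data  # held until the device retries
        return None
```

In the retry case the device must ask again before it sees the data, whereas in the split case the completion is pushed to the device together with its transaction ID.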
Referring back to
The fields for invalid 465, private 475, dirty 480, read current 485, fetch in progress 490 and split read 492 represent the status of the cache line. Invalid, shared, private or dirty represent the set up of the cache line. These fields reflect the ownership of the cache line as far as the system memory controller 130 and the cache controller 405 are concerned and as reflected in the type field 225 and the owner fields 230-240 of FIG. 2. The fetch in progress 490 of
The first field, the I/O bus number 510, represents the specific I/O bus (145 of
Since each I/O bus (145 of
The second field of DMA request 505 shown in
The next field, transaction ID 535, holds the transaction ID for split read requests, originates from the I/O device 150 (
The remaining fields do not come directly from the I/O bus 145 (FIG. 3), but are generated by the I/O bus interface 320 (FIG. 3). The block boundary 540 is used as a stopping point of a data read. A typical block boundary is 1 kilobyte (KB). The block boundary is similar to a commitment boundary point that software provides to the hardware so that the data is read from the starting point to the end of the associated block boundary.
Read current mode field 545 indicates whether a particular DMA read can be requested by the cache unit 315 (
The DMA read/write unit 610 further communicates with the read current unit 605. The DMA read/write unit 610 passes the byte address of the data that is going on the I/O bus 145 and the address and the device number of new requests to read current unit 605. The read current unit 605 examines this information and implements a set of rules that allows the read current mode to be used in a directory based coherent I/O system. The read current unit 605 implements a portion of these read current rules in connection with the cache unit 315 (
The software programmable read current register 705 consists of a set of registers storing data used to implement the read current solutions of the present invention. The preferred embodiment includes two sets of inputs to the software programmable read current register 705. The DMA read/write unit 610 supplies information pertaining to the command, i.e., the address and the device number related to a new read request. This information also goes to the control unit 710 which determines if a read current transaction is applicable to this request. At approximately the same time, the address and the device number are received by the programmable read current registers 705 and, if a match is found with the information stored in the register, the information is supplied to the control unit 710.
The control unit 710 and software programmable read current registers 705 work together: the control unit 710 determines if the command is a read command; the software programmable read current register 705 determines if the address and device number indicate that the software prefers the use of a read current for this transaction. If both the control unit 710 and the software programmable read current register 705 determine that a read current transaction is preferred, the control unit will release the block boundary information. The block boundary information is supplied by the software programmable read current registers 705 and indicates that the software has guaranteed that reads will not exceed a particular block boundary. The control unit 710 determines the type of read current data desired, i.e., graphics or DMA read current. Once this determination is made, the control unit 710 determines the classification of the transaction including whether or not read current is warranted and, if read current is warranted, whether it should be DMA or graphics read current. The control unit 710 supplies the block boundary 540, the read current mode 545 and the payload cache control 550.
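One way to picture the cooperation between the control unit and the programmable registers is the following Python sketch. It rests on several assumptions that go beyond the text: each register is modeled as an address range plus a device number and block boundary, the read current type (DMA or graphics) is stored alongside them, and the names `ReadCurrentRegister` and `classify` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadCurrentRegister:
    base: int            # first address covered (inclusive)
    limit: int           # end of the covered range (exclusive)
    device_number: int
    block_boundary: int  # software guarantees reads stop at this boundary
    mode: str            # "dma" or "graphics" (assumed to be stored here)


def classify(command, address, device_number, registers):
    """Return (block_boundary, mode) if read current applies, else None."""
    if command != "read":
        return None  # control unit check: only reads qualify
    for reg in registers:
        if (reg.device_number == device_number
                and reg.base <= address < reg.limit):
            return (reg.block_boundary, reg.mode)  # register match
    return None
```

A request is thus treated as read current only when both checks agree; a write, an unregistered device, or an address outside every programmed range falls back to ordinary coherent access.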
The control unit 710 also receives, from the cache unit 315, the read/write path through which software accesses the registers. This path is used so that software can read and write to the appropriate register. For example, if software wants to read or write to this register it sends a read or write command that flows through the processor 120 (FIG. 1), through the processor interface 305 (FIG. 3), through the internal bus 310 (FIG. 3), to the cache unit 315 (FIG. 3), and then to the control unit 710.
Whenever a read current occurs, the end of the cache line detection control 715 reads the block boundary 540, read current mode 545, and payload/control 550 from the DMA request and address and I/O bus data from the DMA read/write unit 610 (
For example, if read data is served on the I/O bus 145, the byte address is sent to the end of cache line detection control 715 by the DMA read/write unit 610. This information is used to determine if the cache unit 315 (
The flow chart of
In Step 820, the system determines if a setup is available for this payload block boundary in one of the I/O interfaces' read current registers (705 of FIG. 7), i.e., whether it is possible to use read current for this transaction. If a setup is not already available, Step 825 causes an appropriate read current register to be set up in the I/O interface with the appropriate address range, block boundary value, and device ID. Software programmable read current register in block 705 (
Using the payload block boundary, Step 830 sets up the appropriate DMA structure so that the requesting I/O device 150 (
For example, if a transfer included data from 0 to 1.5 KB and a 1 KB block boundary was used, two blocks of data can be read. The first read will access information from 0 to 1 KB and Step 845 will determine that not all of the information has been transferred. Accordingly, Step 845 would return to Step 810 so the remaining information from 1 to 1.5 KB can be transferred. The remaining information is examined to determine if the remaining data can be rearranged in the hardware to end on a block boundary. If a block boundary of 512 bytes were used, the remaining information may be transferred via read current transactions too.
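The arithmetic of this example can be checked with a short sketch. The function below is hypothetical and assumes byte addresses and a half-open range convention; it splits a transfer into pieces that end on a block boundary and reports any tail that does not.

```python
from typing import List, Optional, Tuple

Range = Tuple[int, int]  # (start, end) in bytes, end exclusive


def split_on_block_boundary(start: int, length: int,
                            block: int) -> Tuple[List[Range], Optional[Range]]:
    """Split [start, start+length) into pieces ending on a block boundary.

    Returns the boundary-terminated pieces and, if the transfer does not
    end on a boundary, the leftover tail (otherwise None).
    """
    if block <= 0:
        raise ValueError("block must be a positive number of bytes")
    end = start + length
    chunks: List[Range] = []
    pos = start
    while pos < end:
        next_boundary = (pos // block + 1) * block
        if next_boundary > end:
            break  # the rest of the transfer stops short of a boundary
        chunks.append((pos, next_boundary))
        pos = next_boundary
    tail = (pos, end) if pos < end else None
    return chunks, tail
```

With a 1 KB boundary, `split_on_block_boundary(0, 1536, 1024)` yields one boundary-terminated piece covering 0 to 1 KB and a tail from 1 KB to 1.5 KB. With a 512-byte boundary, the same transfer yields three boundary-terminated pieces and no tail, which matches the observation that the remaining data could then also use read current.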
Returning to Step 805, if the data being accessed does not represent payload data, but instead consists of control data, the typical characteristics of such data indicate that only a relatively small chunk of data will be involved in the transfer. Typically, the size of this data is 4 bytes, 8 bytes, or 16 bytes, and the data is typically contained in a single cache line. Step 850 determines if device 150 will access this control data as read only. If the device 150 requires write access to the data, Step 860 initiates the traditional DMA read steps necessary to complete the transfer. If, however, the device will only read the data, Step 855 determines if a single transaction can be used to read all of the requested control information bytes and if the use of a read current transaction would be beneficial. If the answer to either of these questions is "no", flow continues to Step 860 and traditional DMA read steps are taken. If the control data can be accessed in a single transaction and if the use of the read current would be beneficial, Step 865 determines if an entry is available in the software programmable read current register 705 (FIG. 7). If an entry is available, it is used for the read current transaction (similar to Step 820); if one is not available, Step 870 attempts to dynamically set up an appropriate read current register in the I/O interface for control traffic.
Since control data is limited in size, block boundaries are not used with control data. Instead of block boundaries, the information obtained from the software programmable read current register consists of checkup information specific to the control data and the I/O interface readings. This prevents hardware from prefetching data.
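The control data branch (Steps 850 through 870) reduces to a short chain of tests, sketched below. The function name, its boolean inputs and the returned labels are assumptions made for illustration; how "beneficial" is evaluated is not modeled.

```python
def control_data_path(read_only: bool, single_transaction: bool,
                      beneficial: bool, entry_available: bool) -> str:
    """Pick the handling for a control data access (illustrative only)."""
    if not read_only:
        # Step 850 -> Step 860: the device needs write access.
        return "traditional_dma_read"
    if not (single_transaction and beneficial):
        # Step 855 -> Step 860: either question answered "no".
        return "traditional_dma_read"
    if entry_available:
        # Step 865: reuse an existing read current register entry.
        return "read_current_existing_entry"
    # Step 870: attempt to set up a register entry for control traffic.
    return "read_current_setup_entry"
```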
Steps 910 and 915 authorized the cache unit 315 (
In Step 930, the cache unit 315 (
When the read request is received, Step 1010 gathers the command, address, and device number and sends the information to the read current unit 605 (FIG. 6). The read current unit 605 accesses the software programmable read current register 705 (
Returning to Step 1015, if the software programmable read current registers 705 and the control unit 710 determine that read current is possible, processing flow continues to Step 1030. If control traffic is involved, as determined by the software programmable read current registers 705 and the control unit 710, Step 1035 is used to set the control traffic block boundary to 0. This ensures no prefetch is done by the cache unit 315. Additionally, the read current mode is set to DMA or graphics read current (from the software programmable read current registers 705 of FIG. 7), the payload/control field is set to control and the request is submitted to the cache unit 315 (FIG. 3). In this case, prefetches will not be used. The information used in Step 1035 is obtained from block boundary 540, read current mode 545 and payload/control 550 as shown in FIG. 7.
Step 1040 determines if the data or the current retry has returned from the cache unit 315 (FIG. 3). This is where the I/O interface 320 is looking for something from the cache unit 315. In Step 1040, read requests are sent to the cache unit 315, and Step 1040 is awaiting a report from the cache unit 315. The cache unit 315 should return the data or a retry request. In Step 1045, if the cache unit 315 has not responded, a determination is made whether a retry should be attempted. If a retry should be attempted, Step 1050 issues the retry on the bus, disconnects and waits for the cache unit 315 to return the data. If a retry should not be attempted, the data is given to the requesting I/O card or device 150 (
Returning to Step 1030 of
Step 1055 of
If, however, either the end of the cache line has been reached in Step 1110, or the I/O device or card 150 (
Step 1210 determines if the data or the current retry has returned from the cache unit 315 (FIG. 3). This is where the I/O interface 320 (
If the answer at Step 1215 is no, then Step 1225 gives data to the cards. If a retry is not being attempted, cache unit 315 (
Returning to Step 1230, if the end of the cache line has gone out on the I/O bus, then Step 1235 issues a flush request to the cache unit 315 (
Still referring to Step 1230, if the end of the cache line has not gone out on the I/O bus, control is transferred to Step 1305 of FIG. 13. Returning to Step 1210 Step 1310 of
If, however, the cache line address is not in the cache tag, the answer in Step 1410 is no, and Step 1415 is encountered. Step 1415 identifies and makes room for a cache entry to be used by the read request. In other words, in Step 1415 an I/O device 150 is requesting to read a particular cache line that does not exist in the cache unit 315. In Step 1415, the cache unit 315 identifies the cache line and makes room for the storage of that particular cache entry. In order to do this, the cache line location needs to be identified for the ultimate storage of the cache line.
In Step 1420, the cache line address, the I/O bus number, the device number and the transaction ID are stored in the tag unit. This output comes from the control unit 710 (FIG. 7), and appears in the form of the DMA read 505 (
In Step 1425, a determination is made as to whether the read current mode is on. If the DMA read current or graphics read current is enabled, the yes branch is followed from Step 1425 with processing continuing at Step 1440. If no read current is enabled, the no path is followed to Step 1430. At Step 1430, the cache line is requested as read shared and invalid is set to 1, share is set to 1, private is set to 0, dirty is set to 0, read current is set to 0 because read current is not requested, and fetch in progress is set to 1. This processing establishes the cache line as shared. The request goes from cache unit 315 (
Referring again to the cache status 425, the split read value is 1 when, as in the instant situation, a split read is being used. If the access were not a split read access, this value would be 0.
Referring again to Step 1425, if read current mode were on, Step 1440 would be encountered and the requested cache line would be read from the system memory controller as read current. In this case, invalid equals 1, share equals 0, private equals 0, dirty equals 0, read current equals 1, fetch in progress equals 1 and split read equals 0.
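The status bits written at Steps 1430 and 1440 can be laid out side by side in a small sketch. The `CacheStatus` record and the `initial_status` function are hypothetical names; the split read value on the read shared path is taken as an input here, which is an assumption of the example rather than something fixed by the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheStatus:
    """Illustrative model of the status bits kept for one cache entry."""
    invalid: int
    share: int
    private: int
    dirty: int
    read_current: int
    fetch_in_progress: int
    split_read: int


def initial_status(read_current_mode_on: bool, split_read: bool) -> CacheStatus:
    """Status recorded when a line is first requested (Step 1430 or 1440)."""
    if read_current_mode_on:
        # Step 1440: requested as read current; split read is 0.
        return CacheStatus(invalid=1, share=0, private=0, dirty=0,
                           read_current=1, fetch_in_progress=1, split_read=0)
    # Step 1430: requested as read shared.
    return CacheStatus(invalid=1, share=1, private=0, dirty=0,
                       read_current=0, fetch_in_progress=1,
                       split_read=1 if split_read else 0)
```

In both cases invalid and fetch in progress are 1, reflecting that the line has been requested but its data has not yet arrived.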
From either Step 1440 or Step 1430, Step 1435 is next encountered. In either case, within Step 1435, a retry is transmitted to the I/O bus interface 320 (FIG. 3). This retry will try to access the device at a later time. From Step 1435, flow continues at Step 1605 of
Referring back to Step 1410 in FIG. 14, if the cache line address is in the cache tag, processing continues at Step 1505 of FIG. 15. A yes answer at Step 1410 means that the cache line address is in the cache tag as described in cache line address 525 (
Referring again to Step 1515, if the read current is not equal to 1, control is transferred to Step 1535.
In summary, the processing shown in this flow chart determines whether the device making the current request is the same device, on the same I/O bus, for which the cache line was accessed as read current. This determination is represented in Step 1520.
If, in Step 1520, the answer is no, Step 1525 sends a retry to the I/O bus interface 320. Step 1525 represents the case in which one I/O device 150 has requested the cache line as read current and a subsequent, different I/O device 150 therefore cannot access this particular cache line.
If instead the answer at Step 1520 is yes, then in Step 1530 a determination is made concerning whether the split read and transaction ID are matched or not. If the cache access is a split read, then the I/O bus number, the device number, and the transaction ID must match to determine that the same I/O device is trying to access the cache line. If these items don't match, Step 1535 is encountered and a send requested data to the I/O bus interface is performed. If a split read is not present, the I/O bus number and the device number are matched against the information contained in the cache tag 420. If a split read is present then the transaction identifier must also be checked to determine the outcome of Step 1530. The transaction ID will match if transaction ID 535 (
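The ownership test of Step 1520 can be modeled as a comparison of the requester's identity with the identity stored in the tag. The `TagEntry` record and the function below are assumptions made for illustration; the later transaction ID check of Step 1530 is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagEntry:
    """Hypothetical view of the identity fields stored with a cache line."""
    cache_line_address: int
    io_bus_number: int
    device_number: int
    transaction_id: int


def step_1520(tag: TagEntry, io_bus_number: int, device_number: int) -> str:
    """Decide whether a request to a read current line may proceed."""
    same_requester = (tag.io_bus_number == io_bus_number
                      and tag.device_number == device_number)
    if not same_requester:
        # Step 1525: another device holds the line as read current.
        return "retry"
    # Same device on the same I/O bus: continue to Step 1530.
    return "continue_to_step_1530"
```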
Referring back to Step 1530, if the answer is a yes and everything matches, processing continues at Step 1525 and a retry is sent to the I/O bus interface. Since the same device and the same I/O bus number are involved, but a different transaction ID is present, no data is transferred.
From Step 1535, after the requested data has been sent to the I/O bus interface, Step 1540 is encountered. In Step 1540, a determination is made concerning whether the I/O bus interface needs more data. If the answer to this question is no, the transaction terminates.
If, however, the I/O bus interface needs more data, then, in Step 1545, a determination is made as to whether the end of the cache line has been reached with the last transfer. If the answer is "yes" then Step 1550 is encountered and a new request for the next cache line is made. In this case, a write address occurred and the cache line address is given to the next cache line address and is no longer in the previous cache line address. This flushes the data. If, however, the answer at Step 1545 is no, then the flow returns to Step 1535 and the requested data is sent to the I/O bus interface again. This loop ensures that additional data is accessed until the end of the cache line is reached.
If read current mode is active in Step 1610, Step 1620 is encountered and a determination is made concerning whether the cache line address is equal to the previous cache line address (which was stored in cache line address 525 of
If, in Step 1625, the cache line address does not cross the block boundary, the answer is no and Step 1630 is encountered. In Step 1630, a determination is made concerning whether the prefetch algorithm allows prefetch to occur. Prefetch operations are performed only if there is bandwidth available to support the prefetch operations. For example, a decision may be made that the prefetch will only occur to a depth of two. If the prefetch has already been performed to a depth of two, additional prefetch may not be performed to conserve system resources. Once the available data is used, additional prefetches are desirable.
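The depth-of-two example can be sketched as a simple counter of prefetched lines that have not yet been used. The class and method names are hypothetical, and the default depth of 2 is taken only from the example above; an actual prefetch algorithm may weigh bandwidth in other ways.

```python
class PrefetchLimiter:
    """Illustrative depth-limited prefetch gate (not the actual algorithm)."""

    def __init__(self, max_depth: int = 2) -> None:
        if max_depth < 0:
            raise ValueError("max_depth must be non-negative")
        self.max_depth = max_depth
        self.outstanding = 0  # prefetched lines not yet consumed

    def may_prefetch(self) -> bool:
        """Step 1630: is another prefetch allowed right now?"""
        return self.outstanding < self.max_depth

    def issue(self) -> bool:
        """Record a prefetch if allowed; return whether it was issued."""
        if not self.may_prefetch():
            return False
        self.outstanding += 1
        return True

    def consume(self) -> None:
        """A prefetched line was used, freeing room for another prefetch."""
        if self.outstanding > 0:
            self.outstanding -= 1
```

Two calls to `issue()` succeed, a third is refused, and a call to `consume()` makes a further prefetch possible again.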
Referring again to Step 1630, if the prefetch algorithm does not allow prefetch to occur, Step 1635 determines if software has terminated the prefetch operation 910 (FIG. 9). If the answer to this question is yes, a return is encountered and the process is ended. If the answer to this question is no, Step 1630 is again encountered.
Referring back to Step 1630, if the prefetch algorithm allows prefetch to occur, Step 1640 is encountered and an entry in the cache is acquired for the prefetch operation. In both Step 1630 and Step 1640 a determination is made as to whether the prefetch line is already available to the cache unit 315 (FIG. 3). From Step 1640, Step 1645 is encountered when an entry is not available from the cache. In Step 1645, the computed cache line address and the device ID, I/O bus number and transaction ID of the original request are set in the tag unit. These are determined from the DMA read request 505 (FIG. 5). The updated, newly computed cache line address 1620 is not used in this Step 1645. This is possible because the device ID, the I/O bus interface and the transaction ID do not change.
Step 1650 requests the cache line from the system memory controller as read current. Within Step 1650, the read current graphics or read current DMA value is decided by the read current mode 545 (
Step 1810 determines if the read current is equal to 1. If the read current is equal to 1, Step 1815 sets invalid equal to 1 and releases the cache entry for future use. If, in Step 1810, the read current is not equal to 1, Step 1820 determines if 1) share is equal to 1 or 2) private is equal to 1 and dirty is equal to 0. If either of these two conditions is true, Step 1825 is encountered. In Step 1825, invalid is set equal to 1, an update_tag for the cache line address is sent to the system memory controller, and the cache entry is released for future use. Finally, if neither condition is true in Step 1820, that is, share is not equal to 1 and either private is not equal to 1 or dirty is equal to 1, Step 1830 is encountered. At Step 1830, invalid is set equal to 1, a write_back transaction is sent giving ownership and data to the system memory controller, and the cache entry is released for future use. A write_back transaction means the ownership of the cache line is being given up.
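The flush decision of Steps 1810 through 1830 can be summarized in a few lines. The function below is an illustrative sketch; its name, its integer arguments and its string return values are assumptions, and the strings simply label which message, if any, goes to the system memory controller. In every branch the entry is marked invalid and released.

```python
def flush_action(read_current: int, share: int, private: int, dirty: int) -> str:
    """Return the message sent to the memory controller on a flush."""
    if read_current == 1:
        # Step 1815: a read current copy is simply dropped.
        return "none"
    if share == 1 or (private == 1 and dirty == 0):
        # Step 1825: clean line, only the directory tag needs updating.
        return "update_tag"
    # Step 1830: otherwise ownership and data are returned.
    return "write_back"
```

A read current line needs no message because the memory controller never tracked that copy as part of the coherent state.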
Referring to
If a recall for graphics read current has been issued, control is passed to Step 2005 of FIG. 20. If a recall has not been issued for graphics read current, then Step 1910 is encountered. In Step 1910, a determination is made whether 1) share is equal to 1 or 2) private equals 1 and dirty equals 0. If either of these two conditions exists, Step 1915 is encountered: invalid is set equal to 1, a recall_ack (no data) is sent to the system memory controller, and the cache entry is released for future use. Alternatively, if neither of the two conditions is present, that is, share is not equal to 1 and either private equals 0 or dirty equals 1, Step 1920 is encountered. In Step 1920, a determination is made as to whether private equals 1 and dirty equals 1. If both of these conditions are true, Step 1925 is encountered: invalid is set equal to 1, a recall_data request is sent to the system memory controller, and the cache entry is released for future use. Finally, if either private equals 0 or dirty equals 0, Step 1930 is encountered and a recall_nack is sent to the system memory controller.
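The recall handling of Steps 1910 through 1930 can be sketched in the same way. This hypothetical function covers only the case in which the recall is not for graphics read current. It returns the response sent to the system memory controller together with a flag telling whether the entry is invalidated and released; the flag is False for recall_nack only because no invalidation is described for Step 1930.

```python
from typing import Tuple


def recall_response(share: int, private: int, dirty: int) -> Tuple[str, bool]:
    """Return (response, entry_released) for a non-graphics recall."""
    if share == 1 or (private == 1 and dirty == 0):
        # Step 1915: clean copy, acknowledge without data.
        return ("recall_ack", True)
    if private == 1 and dirty == 1:
        # Step 1925: modified copy, return the data.
        return ("recall_data", True)
    # Step 1930: nothing to give back.
    return ("recall_nack", False)
```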
Wickeraad, John A., Sharma, Debendra Das, Ebner, Sharon M., Cowan, Joe P., Jackson, Carl H.
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
May 01 2000 | Hewlett-Packard Development Company, L.P. | (assignment on the face of the patent) | / | |||
Jun 28 2000 | DAS SHARMA, DEBENDRA | Hewlett-Packard Company | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 011202 | /0058 | |
Jul 13 2000 | EBNER, SHARON M | Hewlett-Packard Company | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 011202 | /0058 | |
Jul 21 2000 | WICKERAAD, JOHN A | Hewlett-Packard Company | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 011202 | /0058 | |
Aug 04 2000 | COWAN, JOE P | Hewlett-Packard Company | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 011202 | /0058 | |
Aug 09 2000 | JACKSON, CARL H | Hewlett-Packard Company | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 011202 | /0058 | |
Jul 03 2003 | Hewlett-Packard Company | HEWLETT-PACKARD DEVELOPMENT COMPANY, L P | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 013780 | /0741 | |
Oct 27 2015 | HEWLETT-PACKARD DEVELOPMENT COMPANY, L P | Hewlett Packard Enterprise Development LP | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 037079 | /0001 | |
Jan 15 2021 | Hewlett Packard Enterprise Development LP | OT PATENT ESCROW, LLC | PATENT ASSIGNMENT, SECURITY INTEREST, AND LIEN AGREEMENT | 055269 | /0001 | |
Jan 15 2021 | HEWLETT PACKARD ENTERPRISE COMPANY | OT PATENT ESCROW, LLC | PATENT ASSIGNMENT, SECURITY INTEREST, AND LIEN AGREEMENT | 055269 | /0001 | |
Aug 03 2022 | OT PATENT ESCROW, LLC | VALTRUS INNOVATIONS LIMITED | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 061244 | /0298 |