Storage circuitry is provided that is designed to form part of a memory hierarchy. The storage circuitry comprises receiver circuitry for receiving a request to obtain data from the memory hierarchy. Transfer circuitry causes the data to be stored at a selected destination in response to the request, wherein the selected destination is selected in dependence on at least one selection condition. Tracker circuitry tracks the request while the request is unresolved. If at least one selection condition is met then the destination is the storage circuitry and otherwise the destination is other storage circuitry in the memory hierarchy.
13. A method comprising:
receiving, at storage circuitry, a request to obtain data from a memory hierarchy;
selecting a selected destination in the memory hierarchy in dependence on at least one selection condition;
causing the selected destination to obtain the data;
tracking the request while the request is unresolved;
wherein
if at least one selection condition is met then the selected destination is the storage circuitry and otherwise the destination is other storage circuitry; and
causing a higher level storage circuit in the memory hierarchy than the storage circuitry to use a higher prefetch distance with respect to the data when the selected destination is the other storage circuitry.
14. Storage circuitry to form part of a memory hierarchy, the storage circuitry comprising:
means for receiving, at storage circuitry, a request to obtain data from a memory hierarchy;
means for selecting a selected destination in the memory hierarchy in dependence on at least one selection condition;
means for causing the selected destination to obtain the data;
means for tracking the request while the request is unresolved;
wherein
if at least one selection condition is met then the selected destination is the storage circuitry and otherwise the destination is other storage circuitry; and
means for causing a higher level storage circuit in the memory hierarchy than the storage circuitry to use a higher prefetch distance with respect to the data when the selected destination is the other storage circuitry.
1. Storage circuitry to form part of a memory hierarchy, the storage circuitry comprising:
receiver circuitry to receive a request to obtain data from the memory hierarchy;
transfer circuitry to cause the data to be stored at a selected destination in response to the request, wherein the selected destination is selected in dependence on at least one selection condition; and
tracker circuitry to track the request while the request is unresolved;
wherein
if the at least one selection condition is met then the selected destination is the storage circuitry and otherwise the selected destination is other storage circuitry in the memory hierarchy; and
the storage circuitry comprises prefetch control circuitry to cause a higher level storage circuit in the memory hierarchy than the storage circuitry to use a higher prefetch distance with respect to the data when the selected destination is the other storage circuitry.
2. The storage circuitry according to
the at least one selection condition comprises a condition that an occupancy of the tracker circuitry is below a predetermined level.
3. The data processing apparatus according to
the request to obtain data from the memory hierarchy is a prefetch request.
4. The storage circuitry according to
the selected destination is the other storage circuitry; and
the selected destination is lower in the memory hierarchy than the storage circuitry.
5. The storage circuitry according to
the selected destination is a last level cache.
6. The storage circuitry according to
in response to an acknowledgement from the other storage circuitry that the other storage circuitry is to act as the selected destination of the data, the storage circuitry is adapted to respond to the request by indicating that the request is resolved.
7. The storage circuitry according to
when the storage circuitry is selected to be the selected destination of the data, the transfer circuitry is adapted to indicate that the request is resolved in response to the data being obtained.
8. The storage circuitry according to
inhibition circuitry to inhibit, in response to at least one inhibition condition being met, the transfer circuitry from selecting the other storage circuitry as the selected destination.
9. The storage circuitry according to
the at least one inhibition condition comprises a condition that a utilisation level of an other tracking circuitry of the other storage circuitry is above a predetermined utilisation limit.
10. The storage circuitry according to
the at least one inhibition condition comprises a condition that a usage level of the other storage circuitry by a neighbouring storage circuit is above a predetermined usage limit.
11. The storage circuitry according to
the prefetch control circuitry is adapted to cause the higher level storage circuit in the memory hierarchy to use a lower prefetch distance in respect of the data when the selected destination is the storage circuitry.
12. The storage circuitry according to
prefetch control circuitry to cause an origin of the request to issue a further request for the storage circuitry to obtain the data from the memory hierarchy in response to the selected destination being the other storage circuitry.
The present disclosure relates to data processing. In particular, the present disclosure relates to storage circuitry.
A memory hierarchy may consist of a number of storage circuits in the form of a plurality of caches and a main memory (e.g. backed by DRAM). At the top of the hierarchy, storage circuits are comparatively smaller and faster while at the bottom of the hierarchy, the main memory is comparatively large and slow. When a request for data ‘misses’ one of the storage circuits, lower level storage circuits are queried for the requested data and transferred to higher level caches when the data is found. However, only a limited number of such requests can be tracked at a time, and this limits the bandwidth of the memory system. It has previously been proposed to increase this bandwidth by increasing the capacity for tracking requests. However, this can lead to larger sized circuitry that consumes more power and reacts more slowly. It would therefore be desirable to improve the memory bandwidth while avoiding at least some of these disadvantages.
Viewed from a first example configuration, there is provided storage circuitry to form part of a memory hierarchy, the storage circuitry comprising: receiver circuitry to receive a request to obtain data from the memory hierarchy; transfer circuitry to cause the data to be stored at a selected destination in response to the request, wherein the selected destination is selected in dependence on at least one selection condition; and tracker circuitry to track the request while the request is unresolved, wherein if the at least one selection condition is met then the selected destination is the storage circuitry and otherwise the selected destination is other storage circuitry in the memory hierarchy.
Viewed from a second example configuration, there is provided a method comprising: receiving, at storage circuitry, a request to obtain data from a memory hierarchy; selecting a selected destination in the memory hierarchy in dependence on at least one selection condition; causing the selected destination to obtain the data; and tracking the request while the request is unresolved, wherein if at least one selection condition is met then the selected destination is the storage circuitry and otherwise the destination is other storage circuitry.
Viewed from a third example configuration, there is provided storage circuitry to form part of a memory hierarchy, the storage circuitry comprising: means for receiving, at storage circuitry, a request to obtain data from a memory hierarchy; means for selecting a selected destination in the memory hierarchy in dependence on at least one selection condition; means for causing the selected destination to obtain the data; and means for tracking the request while the request is unresolved, wherein if at least one selection condition is met then the selected destination is the storage circuitry and otherwise the destination is other storage circuitry.
The present invention will be described further, by way of example only, with reference to embodiments thereof as illustrated in the accompanying drawings, in which:
Before discussing the embodiments with reference to the accompanying figures, the following description of embodiments is provided.
In accordance with one example configuration there is provided storage circuitry to form part of a memory hierarchy, the storage circuitry comprising: receiver circuitry to receive a request to obtain data from the memory hierarchy; transfer circuitry to cause the data to be stored at a selected destination in response to the request, wherein the selected destination is selected in dependence on at least one selection condition; and tracker circuitry to track the request while the request is unresolved, wherein if the at least one selection condition is met then the selected destination is the storage circuitry and otherwise the selected destination is other storage circuitry in the memory hierarchy.
In these embodiments, the storage circuitry can take the form of a cache such as a level two cache. The storage circuitry receives a request to obtain data from somewhere in the memory hierarchy. This request could be an explicit request or it could be a prefetch request to obtain the data before it is specifically required by a processor core. In such embodiments when the request for data misses in the storage circuitry, the data is obtained from the memory hierarchy. The request for the data is therefore tracked by the tracking circuitry until it can be resolved. In the above embodiments, depending on a selection condition, the data is fetched into the storage circuitry itself or is fetched into other storage circuitry. This latter option can be achieved by the request being converted into a fetch request for a lower level storage circuit. In this way, the storage circuitry is required to do less tracking—the storage circuitry need not track the request any longer because the data is being fetched into other storage circuitry. The request is therefore “handed off” to the other storage circuitry. This enables the storage circuitry to increase its memory bandwidth by effectively relying on other storage circuits in the memory hierarchy to be made responsible for fetching data. As a consequence, more requests can be active simultaneously and thus the overall bandwidth of the memory hierarchy can be improved.
In some embodiments, the at least one selection condition comprises a condition that an occupancy of the tracker circuitry is below a predetermined level. Each outstanding (e.g. unresolved or in-flight) request has a corresponding entry in the tracker circuitry. Accordingly, the number of entries in the tracker circuitry restricts the number of outstanding requests that can be present and therefore restricts the number of data transfers that can occur within the memory hierarchy. Consequently, when the occupancy (e.g. the number of entries) of the tracker circuitry is below a predetermined level, it is appropriate for the storage circuitry itself to obtain data and store it locally. Alternatively, if the occupancy of the tracker circuitry is at or above the predetermined level, then further bandwidth usage may be limited. Accordingly, the storage circuitry may cause a storage circuit other than the storage circuitry in the memory hierarchy to obtain the requested data and store it. In this way, the request need not be represented in the tracker circuitry for an extended period of time and so more requests can be in-flight simultaneously, thereby improving the memory hierarchy bandwidth.
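By way of illustration only, the occupancy-based selection described above might be sketched in software as follows. This is a minimal model rather than a description of any particular circuit: the names `Tracker`, `Destination` and `select_destination`, and the capacity and level values used, are assumptions introduced for this example.

```python
from dataclasses import dataclass, field
from enum import Enum


class Destination(Enum):
    THIS_STORAGE = "storage circuitry"
    OTHER_STORAGE = "other storage circuitry"


@dataclass
class Tracker:
    """Hypothetical model of tracker circuitry: one entry per unresolved request."""
    capacity: int
    entries: set = field(default_factory=set)

    @property
    def occupancy(self) -> int:
        return len(self.entries)


def select_destination(tracker: Tracker, predetermined_level: int) -> Destination:
    """Selection condition: tracker occupancy is below the predetermined level."""
    if tracker.occupancy < predetermined_level:
        # Condition met: the storage circuitry itself obtains and stores the data.
        return Destination.THIS_STORAGE
    # Condition not met: the request is converted for other storage circuitry.
    return Destination.OTHER_STORAGE
```

In this sketch, an occupancy exactly equal to the predetermined level selects the other storage circuitry, matching the "at or above" wording used above.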
In some embodiments, the request to obtain data from the memory hierarchy is a prefetch request. Prefetching is a process used to obtain data without any explicit request for that data being made from the processor core (e.g. processing circuitry that executes a stream of instructions). A prediction is made by a prefetcher as to data that will be required in the near future based on the explicit requests for data that have been made. Using this information, data can be fetched prior to it being required. When the explicit request for that data is eventually made, the act of having prefetched the data will cause it to be in a higher level of the memory hierarchy. Thus, the time taken to obtain the data may be lower. This causes the memory latency (e.g. the period of time between data being explicitly requested and provided) to be reduced.
In some embodiments, the selected destination is lower in the memory hierarchy than the storage circuitry. For example, when the condition is not met, the transfer circuitry of the storage circuitry causes a storage circuit that is at a lower level of the memory hierarchy (e.g. nearer to the main memory) to be the storage circuit that obtains the data from the memory hierarchy. Typically, the act of fetching the data will cause the data to be fetched from an even lower level of the memory hierarchy (e.g. such as a main memory). Consequently, in these embodiments, the transfer circuitry causes the requested data to move further from the main memory and closer to the processing circuitry thereby reducing the memory latency for when the data is ultimately requested. However, at the time that the transfer circuitry makes the request, the data is not being transferred all the way to the storage circuitry.
In some embodiments, the selected destination is a Last Level Cache (LLC). An LLC is considered to be the storage circuit (e.g. cache) that is nearest to the main memory. In some embodiments, the LLC is shared between a number of processor cores. However, this is not necessary, and in some other embodiments, the LLC is dedicated to a particular processor core.
In some embodiments, in response to an acknowledgement from the other storage circuitry that the other storage circuitry is to act as the selected destination of the data, the storage circuitry is adapted to respond to the request by indicating that the request is resolved. When a request to obtain data is received by the receiver circuitry, an entry may be stored in the tracker circuitry until that request has been resolved. If the transfer circuitry then causes the data to be fetched and stored in the other storage circuitry, then that other storage circuitry will fetch the data. Once the request to fetch that data has been received, the other storage circuitry may respond with an acknowledgement indicating that the request for that data to be obtained has been received. At this point, the initial request that was made to the storage circuitry to obtain the data has been resolved since it has been converted into a different request to be satisfied by different storage circuitry. Consequently, the entry in the tracker circuitry of the storage circuitry can be removed. It will be appreciated that since the data need not be retrieved by the storage circuitry, in many cases this will cause the entry in the tracker circuitry to be stored for a substantially smaller period of time than if data had to be obtained by and stored in the storage circuitry. For instance, the period of time taken for a level 2 cache to cause a level 3 cache to obtain data from main memory would be expected to be smaller than the period of time taken for the level 2 cache to obtain the data from main memory itself. The length of time the individual entry is stored in the tracker circuitry can therefore be reduced even though the request is being satisfied, and consequently other requests can be satisfied by the storage circuitry at the same time, thereby increasing the overall bandwidth of the memory hierarchy.
In some embodiments, when the storage circuitry is selected to be the selected destination of the data, the transfer circuitry is adapted to indicate that the request is resolved in response to the data being obtained. Where the transfer circuitry causes the storage circuitry itself to obtain and store the data, the request can be considered to be resolved when the data has been obtained by the storage circuitry. At this point, the request for the storage circuitry to obtain the data as received by the receiver circuitry has been resolved and so the entry can be removed.
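The two ways in which a tracked request becomes resolved (an acknowledgement from the other storage circuitry, or the data being obtained locally) might be modelled as shown below. This is an illustrative sketch only; the class `TrackedStorage` and its method names are hypothetical and are not taken from the embodiments.

```python
class TrackedStorage:
    """Hypothetical model of storage circuitry with tracker entries."""

    def __init__(self):
        self.tracker = {}    # request id -> "handed_off" or "local"
        self.resolved = []   # request ids in the order they were resolved

    def receive(self, request_id, handed_off):
        # An entry is held in the tracker while the request is unresolved.
        self.tracker[request_id] = "handed_off" if handed_off else "local"

    def on_acknowledgement(self, request_id):
        # The other storage circuitry has accepted the converted request,
        # so a handed-off request is resolved without waiting for the data.
        if self.tracker.get(request_id) == "handed_off":
            self._resolve(request_id)

    def on_data_obtained(self, request_id):
        # A locally handled request is resolved only once the data arrives.
        if self.tracker.get(request_id) == "local":
            self._resolve(request_id)

    def _resolve(self, request_id):
        del self.tracker[request_id]
        self.resolved.append(request_id)
```

In such a model the handed-off entry is freed as soon as the acknowledgement arrives, whereas the local entry remains until the data itself has been obtained.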
In some embodiments, the storage circuitry comprises inhibition circuitry to inhibit, in response to at least one inhibition condition being met, the transfer circuitry from selecting the other storage circuitry as the selected destination. In some situations, even if the at least one condition is not met, then it may be undesirable for the selected storage circuit to be anything other than the storage circuitry. For instance, even where the condition is not met, it may be undesirable for the storage circuitry to convert a request for the storage circuitry to obtain data into a request for another storage circuit to obtain that data. Accordingly, the inhibition circuitry is provided in order to cause this situation to be inhibited in response to the at least one inhibition condition being met.
There are a number of situations in which the inhibition condition is met. In some embodiments, the at least one inhibition condition comprises a condition that a utilisation level of an other tracking circuitry of the other storage circuitry is above a predetermined utilisation limit. If the tracking circuitry (e.g. a request buffer) of the other storage circuitry is above a particular utilisation limit then causing the other storage circuitry to fetch the data and store it is unlikely to succeed. It will be appreciated that there is a limit to the number of incoming requests for data that can be converted by the storage circuitry. Therefore when this limit is reached (e.g. when the utilisation limit is reached) it may be inappropriate to perform further conversions.
As an alternative or in addition to the above, in some embodiments, the at least one inhibition condition comprises a condition that a usage level of the other storage circuitry by a neighbouring storage circuit is above a predetermined usage limit. A neighbour of a storage circuit can be considered to be a storage circuit that appears at the same level of that storage circuit in the memory hierarchy. For instance, the neighbours of a level 1 cache would be other level 1 caches. In this way, if the usage level of the other storage circuitry by a neighbour of the storage circuit is above a predetermined usage limit (e.g. 50%) then it may be assumed that the other storage circuitry is being heavily relied upon by that neighbour and it may therefore be inappropriate to use the other storage circuitry for offloading of data requests.
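The two inhibition conditions described above might be combined with the selection condition as in the following sketch, which is provided by way of example only. The function names and the use of fractional levels between 0 and 1 are assumptions made for illustration; the 50% usage limit is the example value mentioned above.

```python
def conversion_inhibited(other_tracker_utilisation: float,
                         utilisation_limit: float,
                         neighbour_usage: float,
                         usage_limit: float = 0.5) -> bool:
    """Return True if at least one inhibition condition is met."""
    # Condition 1: the other storage circuitry's own tracker is too full.
    tracker_too_full = other_tracker_utilisation > utilisation_limit
    # Condition 2: a neighbouring storage circuit relies heavily on the other storage.
    neighbour_too_busy = neighbour_usage > usage_limit
    return tracker_too_full or neighbour_too_busy


def choose_destination(selection_condition_met: bool, inhibited: bool) -> str:
    """Inhibition forces the storage circuitry itself to be the destination."""
    if selection_condition_met or inhibited:
        return "storage circuitry"
    return "other storage circuitry"
```

Here inhibition only ever overrides a decision to hand a request off; it never causes a request to be handed off that would otherwise have been handled locally.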
In some embodiments, the storage circuitry comprises prefetch control circuitry to cause a higher level storage circuit in the memory hierarchy than the storage circuitry to use a higher prefetch distance in respect of the data when the selected destination is the other storage circuitry.
In some embodiments, the prefetch control circuitry is adapted to cause the higher level storage circuit in the memory hierarchy to use a lower prefetch distance in respect of the data when the selected destination is the storage circuitry. It will be appreciated that where the selected storage circuit is the other storage circuitry, the requested data will be stored further away from the processor core than if the selected storage circuit is the storage circuitry itself. Accordingly, where a further access request for the data is made, the access must allow for a higher memory latency. In the form of a prefetch, this is represented by the prefetch distance being increased. In particular, where prefetching occurs, the data will be prefetched further ahead of where it would ordinarily be fetched in order to compensate for the fact that the data will take a longer time to be fetched from the further (other) storage circuit.
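As a simple illustration of this adjustment, the prefetch distance used by the higher level storage circuit might be raised or lowered depending on the selected destination. The step size and the minimum and maximum bounds in the sketch below are hypothetical parameters chosen for the example and are not specified by the embodiments.

```python
def adjust_prefetch_distance(current: int,
                             destination_is_other: bool,
                             step: int = 1,
                             minimum: int = 1,
                             maximum: int = 16) -> int:
    """Return the prefetch distance the higher level storage circuit should use."""
    if destination_is_other:
        # Data lands further from the core, so prefetch further ahead
        # to allow for the higher latency of the later access.
        return min(current + step, maximum)
    # Data lands in the storage circuitry itself, so a lower distance suffices.
    return max(current - step, minimum)
```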
In some embodiments, the storage circuitry comprises prefetch control circuitry to cause an origin of the request to issue a further request for the storage circuitry to obtain the data from the memory hierarchy in response to the selected destination being the other storage circuitry. When the request is converted (e.g. when the request causes the transfer circuitry to select the other storage circuitry as the selected storage circuit) the data is fetched into different storage circuitry. Consequently, prefetch control circuitry can be provided in order to signal the origin of the request to issue a further request for the storage circuitry to obtain the data from the memory hierarchy. This has the effect that the first request will cause the data to move up the memory hierarchy, and the further request will cause the data to move further up the memory hierarchy. Furthermore, by splitting the request for the data to be obtained by the storage circuitry into two requests, the length of time for which a particular request will be represented in the tracker circuitry can be reduced overall. By potentially reducing this time as well as splitting the time in half, it is possible to be more flexible with requests in the memory hierarchy and consequently the overall bandwidth of the memory hierarchy can be improved.
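The splitting of one request into two stages might be pictured with the following sketch, given purely as an example. It assumes a level 2 cache as the storage circuitry and a level 3 cache as the other storage circuitry; the class `Level2Model` and its attributes are hypothetical.

```python
class Level2Model:
    """Hypothetical model of a level 2 cache that may hand requests off."""

    def __init__(self):
        self.data_location = "main memory"
        self.further_requests_signalled = 0

    def handle(self, hand_off: bool) -> None:
        if hand_off:
            # First stage: the converted request moves the data up to the level 3 cache,
            # and the origin is signalled to issue a further request.
            self.data_location = "level 3 cache"
            self.further_requests_signalled += 1
        else:
            # Second stage (or an unconverted request): the data moves into the level 2 cache.
            self.data_location = "level 2 cache"
```

A converted request followed by the further request leaves the data in the level 2 cache, but each stage occupies a tracker entry only for its own, shorter, duration.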
Particular embodiments will now be described with reference to the figures.
A request from the load/store unit 120a, 120b is initially passed to the level 1 cache 130a, 130b where the requested data is returned if that data is present in the level 1 cache 130a, 130b. However, owing to the smaller size of the level 1 cache 130a, 130b only frequently or recently accessed data is likely to be stored here. Consequently, if a “miss” occurs on the level 1 cache 130a, 130b, then the request is forwarded to a level 2 cache 140a, 140b for the requested data. If the request “misses” on the level 2 cache 140a, 140b, then it is forwarded to the level 3 cache 150. In this example, the level 3 cache is shared between the processor cores 110a, 110b. In other words, data that is accessed by one of the processor cores and stored in the level 3 cache 150 may be accessible by the other processor core 110b. The level 3 cache is larger and slower than even the level 2 caches 140a, 140b and is lower in the hierarchy (being closer to the main memory 160). The main memory 160 is shared between the processor cores 110a, 110b and may be backed by, for instance, DRAM. The main memory is typically the slowest of the storage circuits that make up the memory hierarchy 100.
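The forwarding of a request down the hierarchy on each "miss" might be sketched as below. This is a simplified illustration in which each cache is modelled as a dictionary; the function name `lookup` and the example addresses are assumptions, and the sketch does not model the filling of higher level caches on the return path.

```python
def lookup(address, caches, main_memory):
    """Query each cache from highest to lowest level, then fall back to main memory.

    caches: list of (name, contents) pairs ordered from the top of the hierarchy.
    Returns (name of the level that supplied the data, data).
    """
    for name, contents in caches:
        if address in contents:
            return name, contents[address]   # hit at this level
        # miss: forward the request to the next lower level
    return "main memory", main_memory[address]
```

For example, with the hypothetical contents used here:

```python
caches = [("level 1 cache", {0x10: "a"}),
          ("level 2 cache", {0x20: "b"}),
          ("level 3 cache", {0x30: "c"})]
main_memory = {0x10: "a", 0x20: "b", 0x30: "c", 0x40: "d"}
supplier, data = lookup(0x30, caches, main_memory)   # supplied by the level 3 cache
```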
Within this system, prefetching may be used in order to reduce memory latencies. When a processor core 110a executes an instruction that explicitly accesses data that is stored in the main memory 160 it will be appreciated that the load/store unit 120a may have to forward a request that passes through the level 1 cache 130a, the level 2 cache 140a, and the level 3 cache 150 to the main memory 160. The main memory must then search for the requested data and forward the data back through this chain of storage circuits to the load/store unit 120a. Even if the intermediate storage circuits 130a, 140a, 150 can be circumvented, because the main memory 160 is larger and therefore slower, the amount of time required to fetch the requested data and return it to the load/store unit 120a can be high. Consequently, the concept of prefetching can be used. In a prefetcher system, the sequence of explicitly requested memory addresses can be analysed in order for a pattern to be established. When such a pattern has been established, it is possible to pre-emptively request data from the memory hierarchy 100 such that the data can be made available in a higher level cache of the memory hierarchy 100 at the time it is explicitly requested. Consequently, the time taken between the explicit request for that data coming in at the load/store unit 120a and it actually being returned to the load/store unit 120a can be significantly reduced. Prefetching can occur at each level of the memory hierarchy. For instance, data may be prefetched into the level 2 cache 140a as well as the level 1 cache 130a and the level 3 cache 150, or data could be prefetched up through the memory hierarchy at different stages (i.e. at different times in advance of the data being explicitly requested).
Although not relevant to the present technique, different strategies and parameters may be considered for determining when a prefetch occurs, how proactively to obtain data, and how readily a particular pattern should be established.
It will be appreciated that passing messages between different parts of the memory hierarchy 100 uses bandwidth. In many instances it is desirable to make good use of the available memory bandwidth.
In this example, the level 1 cache 130 issues a level 2 prefetch request to the level 2 cache 140. This causes the level 2 cache's fetch circuitry 210 to issue a read request to the main memory 160 via the LLC 150. Note that as illustrated in
It will be appreciated that as a consequence of this process, the lifetime of the request that is received by the level 2 cache 140 is relatively extensive.
It will be appreciated that as a consequence of this process, the lifetime of the level 2 prefetch request is significantly shorter. Consequently, the tracker circuitry 220 of the level 2 cache 140 can be cleared more quickly thereby enabling more bandwidth of memory hierarchy 100 to be used. In particular, this is achieved by virtue of corresponding tracker circuitry at the LLC 150 being used by the level 2 cache 140. In the time that is saved, another request can be tracked by the tracker circuitry—this allows more activity to take place in the hierarchy 100 at once, effectively increasing the bandwidth of the memory hierarchy 100.
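The relationship between entry lifetime and the number of requests that can be sustained may be estimated, as a rough steady-state approximation in the manner of Little's law, by dividing the number of tracker entries by the time for which each entry is held. The sketch below is illustrative only; the entry count and the cycle counts are hypothetical values and do not come from the embodiments.

```python
def sustainable_requests_per_cycle(tracker_entries: int,
                                   entry_lifetime_cycles: float) -> float:
    """Steady-state estimate: entries available divided by how long each is held."""
    return tracker_entries / entry_lifetime_cycles


# Hypothetical figures: 16 tracker entries, 200 cycles to fetch into the
# level 2 cache itself, 50 cycles until the LLC acknowledges a converted request.
local_rate = sustainable_requests_per_cycle(16, 200)
converted_rate = sustainable_requests_per_cycle(16, 50)
```

Under these assumed figures the same tracker circuitry sustains several times as many requests when they are converted, although the estimate ignores the capacity of the tracker circuitry at the LLC 150, which is the subject of the inhibition conditions.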
The first of these situations represents the situation in which conversion may not be possible. In particular, if the lower level cache does not have the ability to perform a request because its own tracker circuitry is nearly full, then it would be inappropriate to convert the request.
In the second example, if another cache is already making heavy use of the lower level cache (e.g. if the level 2 cache 140b is making heavy use of the level 3 cache 150) then again it may be considered to be inappropriate for another level 2 cache 140a to begin converting requests. Accordingly, conversion should be inhibited in these cases.
It will be appreciated that where no change to the prefetch distance occurs, no signal necessarily need be issued by the prefetch control circuitry 240.
Accordingly, it can be seen that by handling the occupancy of the tracker circuitry 220 (e.g. by converting requests so as for those requests to be handled by lower level caches) more requests can be in-flight at the same time and consequently a bandwidth of the memory hierarchy can be improved.
In the present application, the words “configured to . . . ” are used to mean that an element of an apparatus has a configuration able to carry out the defined operation. In this context, a “configuration” means an arrangement or manner of interconnection of hardware or software. For example, the apparatus may have dedicated hardware which provides the defined operation, or a processor or other processing device may be programmed to perform the function. “Configured to” does not imply that the apparatus element needs to be changed in any way in order to provide the defined operation.
Although illustrative embodiments of the invention have been described in detail herein with reference to the accompanying drawings, it is to be understood that the invention is not limited to those precise embodiments, and that various changes, additions and modifications can be effected therein by one skilled in the art without departing from the scope and spirit of the invention as defined by the appended claims. For example, various combinations of the features of the dependent claims could be made with the features of the independent claims without departing from the scope of the present invention.
Montero, Adrian, Dooley, Miles Robert, Pusdesris, Joseph Michael, Bruce, Klas Magnus, Abernathy, Chris
Patent | Priority | Assignee | Title |
8909866, | Nov 06 2012 | Advanced Micro Devices, Inc. | Prefetching to a cache based on buffer fullness |
20110078380, | |||
20160283232, | |||
20170286304, |
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Aug 23 2018 | BRUCE, KLAS MAGNUS | ARM Limited | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 046764 | /0897 | |
Aug 24 2018 | ABERNATHY, CHRIS | ARM Limited | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 046764 | /0897 | |
Aug 28 2018 | MONTERO, ADRIAN | ARM Limited | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 046764 | /0897 | |
Aug 28 2018 | DOOLEY, MILES ROBERT | ARM Limited | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 046764 | /0897 | |
Aug 29 2018 | PUSDESRIS, JOSEPH MICHAEL | ARM Limited | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 046764 | /0897 | |
Aug 31 2018 | ARM Limited | (assignment on the face of the patent) | / |
Date | Maintenance Fee Events |
Aug 31 2018 | BIG: Entity status set to Undiscounted (note the period is included in the code). |
Feb 20 2024 | M1551: Payment of Maintenance Fee, 4th Year, Large Entity. |
Date | Maintenance Schedule |
Sep 15 2023 | 4 years fee payment window open |
Mar 15 2024 | 6 months grace period start (w surcharge) |
Sep 15 2024 | patent expiry (for year 4) |
Sep 15 2026 | 2 years to revive unintentionally abandoned end. (for year 4) |
Sep 15 2027 | 8 years fee payment window open |
Mar 15 2028 | 6 months grace period start (w surcharge) |
Sep 15 2028 | patent expiry (for year 8) |
Sep 15 2030 | 2 years to revive unintentionally abandoned end. (for year 8) |
Sep 15 2031 | 12 years fee payment window open |
Mar 15 2032 | 6 months grace period start (w surcharge) |
Sep 15 2032 | patent expiry (for year 12) |
Sep 15 2034 | 2 years to revive unintentionally abandoned end. (for year 12) |