Disclosed is a technique by which an application can record the changes it makes to physical memory. In the technique, the application specifies a virtual memory region, which is converted to a plurality of cache lines, each of which is monitored for changes by a device connected to a coherence interconnect coupled to the processor caches. The application sends a start signal to start the logging process and an end signal to stop the process. During the logging process, when a change occurs to one of the cache lines, an undo entry corresponding to the change is created and entered into a transaction log residing in persistent memory. The undo entries in the transaction log make the recorded set of changes atomic. If a failure occurs, the recorded changes can be undone as if they never occurred.
11. A computing system comprising:
a plurality of CPUs, each having a cache;
a coherence interconnect coupled to the caches;
a persistent memory containing a log for a transaction; and
a device coupled to the persistent memory and the coherence interconnect;
wherein the device is configured to:
receive a signal to begin logging changes to cache lines in a set of cache lines corresponding to a plurality of physical addresses, the plurality of physical addresses derived from a virtual memory region specified by an application, wherein the signal to begin the logging is received in response to the application sending a command to register the virtual memory region to be tracked for the transaction;
in response to the signal to begin the logging, add a beginning mark for the transaction to the log;
track each of the cache lines in the set of cache lines while the application is executing;
receive an indication that a cache line of the cache lines in the set of cache lines has been modified while the application is executing, the indication being a write-back event for the cache line;
create and add an undo entry corresponding to the cache line to the log; and
add an ending mark for the transaction to the log.
1. A method for logging changes to cache lines in a set of cache lines into a log for a transaction, the log permitting recovery from a failure, comprising:
receiving a signal to begin logging changes to cache lines in the set of cache lines corresponding to a plurality of physical addresses, the plurality of physical addresses derived from a virtual memory region specified by an application, wherein the signal to begin the logging is received in response to the application sending a command to register the virtual memory region to be tracked for the transaction;
in response to the signal to begin the logging, adding a beginning mark for the transaction to the log, wherein the log resides in a persistent memory and the beginning mark indicates that the transaction is active; and
while the transaction is active:
tracking each of the cache lines in the set of cache lines while the application is executing;
receiving an indication that a cache line of the cache lines in the set of cache lines has been modified while the application is executing, the indication being a write-back event for the cache line;
creating and adding an undo entry corresponding to the cache line to the log;
receiving a signal to end the logging; and
in response to the signal to end the logging, adding an ending mark for the transaction to the log, the ending mark indicating that the transaction is inactive.
6. A non-transitory computer-readable medium containing instructions, which when executed by one or more processors, cause the one or more processors to carry out a method for logging changes to cache lines in a set of cache lines into a log for a transaction, the log permitting recovery from a failure, wherein the method comprises:
receiving a signal to begin logging changes to cache lines in the set of cache lines corresponding to a plurality of physical addresses, the plurality of physical addresses derived from a virtual memory region specified by an application, wherein the signal to begin the logging is received in response to the application sending a command to register the virtual memory region to be tracked for the transaction;
in response to the signal to begin the logging, adding a beginning mark for the transaction to the log, wherein the log resides in a persistent memory and the beginning mark indicates that the transaction is active; and
while the transaction is active:
tracking each of the cache lines in the set of cache lines while the application is executing;
receiving an indication that a cache line of the cache lines in the set of cache lines has been modified while the application is executing, the indication being a write-back event for the cache line;
creating and adding an undo entry corresponding to the cache line to the log;
receiving a signal to end the logging; and
in response to the signal to end the logging, adding an ending mark for the transaction in the log, the ending mark indicating that the transaction is inactive.
2. The method of claim 1,
further comprising receiving a transaction id, the transaction id identifying the log among a plurality of transaction logs;
wherein the transaction id is included in the signal to begin the logging and the signal to end the logging.
3. The method of
4. The method of claim 1,
wherein receiving the indication that the cache line has been modified includes receiving the write-back event for the cache line on a coherence interconnect; and
wherein the write-back event occurs while the transaction is active.
5. The method of claim 1, further comprising:
in response to receiving the signal to end the logging, snooping each of the cache lines in the set of cache lines; and
in response to snooping each of the cache lines, determining that a second cache line is modified and creating and adding a second undo entry corresponding to the second cache line to the log for the transaction prior to adding the ending mark.
7. The non-transitory computer-readable medium of claim 6,
wherein the method further comprises receiving a transaction id, the transaction id identifying the log among a plurality of transaction logs; and
wherein the transaction id is included in the signal to begin the logging and the signal to end the logging.
8. The non-transitory computer-readable medium of
9. The non-transitory computer-readable medium of
10. The non-transitory computer-readable medium of claim 6, wherein the method further comprises:
in response to receiving the signal to end the logging, snooping each of the cache lines in the set of cache lines; and
in response to snooping each of the cache lines, determining that a second cache line is modified and creating and adding a second undo entry corresponding to the second cache line to the log for the transaction prior to adding the ending mark.
12. The system of
13. The system of claim 11,
wherein the device is further configured to detect the write-back event relating to the cache line while the transaction is active, on the coherence interconnect, the write-back event providing the indication that the cache line has been modified.
14. The system of claim 11,
wherein one or more CPUs of the plurality of CPUs run the application; and
wherein the application provides an ending signal to the device to cause the device to add the ending mark to the log.
Persistent memory, in the memory/storage hierarchy, resides between system memory and mass storage. Persistent memory is a type of memory that holds its content across power cycles, is byte-addressable, and in some cases has about nine times the density of DRAM. With latencies in the nanosecond range, persistent memory is fast enough to be connected to the memory bus of the CPU as part of main memory so that the CPU can access the data directly rather than perform block I/O operations to buffers for mass storage. It is desirable to make use of the persistence of persistent memory to improve the reliability of applications running in a computing system.
One embodiment is a method for logging changes to cache lines in a set of cache lines corresponding to one or more physical memory ranges into a log for a transaction, where the log permits recovery from a failure. The method includes receiving a signal to begin logging changes to cache lines in the set of cache lines and in response to the signal to begin the logging, adding a beginning mark for the transaction to the log, where the log resides in a persistent memory, and the beginning mark indicates that the transaction is active. The method further includes, while the transaction is active, tracking each of the cache lines in the set of cache lines, receiving an indication that one cache line of the set of cache lines being tracked has been modified, creating and adding an undo entry corresponding to the modified cache line to the transaction log, and receiving a signal to end the logging. The method further includes, in response to the signal to end the logging, adding an ending mark for the transaction in the transaction log, where the ending mark indicates that the transaction is inactive.
Further embodiments include a computer-readable medium containing instructions for carrying out one or more aspects of the above method, and a system configured to carry out one or more aspects of the above method.
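By way of a concrete illustration, the application-facing side of such a method might resemble the following C sketch. The function names pmem_tx_register, pmem_tx_begin, and pmem_tx_end are hypothetical stand-ins for the register-region command and the begin/end logging signals described above; they are not an API defined by the embodiments.

```c
/* Hypothetical application-side use of the logging facility.
 * pmem_tx_register(), pmem_tx_begin(), and pmem_tx_end() are
 * illustrative stand-ins only, not an API defined by the embodiments. */
#include <stddef.h>
#include <string.h>

int  pmem_tx_register(void *vaddr, size_t len); /* returns a transaction id */
void pmem_tx_begin(int tx_id);                  /* device adds beginning mark */
void pmem_tx_end(int tx_id);                    /* device adds ending mark */

void update_record(char *region, size_t len, const char *payload)
{
    /* The virtual memory region is converted to a set of cache lines
     * that the device tracks for the transaction. */
    int tx = pmem_tx_register(region, len);

    pmem_tx_begin(tx);             /* transaction becomes active */

    /* Ordinary stores; no special write instructions are needed, since
     * the device observes write-backs on the coherence interconnect. */
    memcpy(region, payload, len);

    pmem_tx_end(tx);               /* all changes persist atomically */
}
```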
Some of the advantages of the embodiments are that: (1) no special write instructions need to be executed by the application to perform the logging; (2) no read-indirection is needed, that is, no access to a new value in a redo log is needed, because only an undo log is used; (3) no stalling is needed prior to every update; and (4) the logging has minimal effect on the processor caches, in that the entire capacity of the cache remains available (i.e., cache lines are not used as log entries), with only a small increase in the cache miss rate due to snooping of the caches.
Embodiments described herein provide a facility for an application to selectively mark a set of changes (e.g., data modifications, such as writes to memory) it makes to memory as an atomic transaction. In an atomic transaction, all changes that are part of the transaction persist atomically, meaning either all or none of the changes persist upon a system failure. Having this facility means that if an application fails or discovers an error, the application can undo all the changes of the transaction as if they did not occur. The facility makes the application more robust and lessens the likelihood that an application will lose or corrupt data. In addition, storing the set of changes as a transaction in persistent memory assures that the transaction is preserved over events that require the reloading of traditional volatile memory and even power interruptions.
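A minimal sketch of what such an undo log in persistent memory could look like follows, assuming each undo entry records a cache-line address together with the line's prior contents; the structure layout, the 64-byte line size, and the recovery routine are illustrative assumptions, not the embodiments' actual format.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define CACHE_LINE_SIZE 64          /* assumed line size for illustration */

enum log_mark { LOG_BEGIN = 1, LOG_UNDO = 2, LOG_END = 3 };

struct log_entry {
    uint32_t mark;                       /* LOG_BEGIN, LOG_UNDO, or LOG_END */
    uint32_t tx_id;                      /* identifies the log among several logs */
    uint64_t phys_addr;                  /* cache-line address (LOG_UNDO only) */
    uint8_t  old_data[CACHE_LINE_SIZE];  /* prior contents (LOG_UNDO only) */
};

/* After a failure: a log with a beginning mark but no ending mark belongs
 * to a transaction that never became inactive, so its undo entries are
 * applied in reverse, making it as if the changes never occurred.
 * Writing through phys_addr directly is schematic; real recovery code
 * would map the physical address first. */
void recover(struct log_entry *log, size_t n)
{
    if (n == 0 || log[0].mark != LOG_BEGIN)
        return;                          /* no active transaction recorded */
    if (log[n - 1].mark == LOG_END)
        return;                          /* transaction completed; keep changes */
    for (size_t i = n; i-- > 1; )
        if (log[i].mark == LOG_UNDO)
            memcpy((void *)(uintptr_t)log[i].phys_addr,
                   log[i].old_data, CACHE_LINE_SIZE);
}
```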
A virtualization software layer, referred to hereinafter as hypervisor 111, is installed on top of hardware platform 102. Hypervisor 111 makes possible the concurrent instantiation and execution of one or more virtual machines (VMs) 118(1)-118(N). The interaction of a VM 118 with hypervisor 111 is facilitated by the virtual machine monitors (VMMs) 134. Each VMM 134(1)-134(N) is assigned to and monitors a corresponding VM 118(1)-118(N). In one embodiment, hypervisor 111 may be the hypervisor included in VMware's vSphere® virtualization product, available from VMware Inc. of Palo Alto, Calif. In an alternative embodiment, hypervisor 111 runs on top of a host operating system, which itself runs on hardware platform 102. In such an embodiment, hypervisor 111 operates above an abstraction level provided by the host operating system.
After instantiation, each VM 118(1)-118(N) encapsulates a virtual computing machine platform that is executed under the control of hypervisor 111. Virtual devices of a VM 118 are embodied in the virtual hardware platform 120, which includes, but is not limited to, one or more virtual CPUs (vCPUs) 122(1)-122(N), a virtual random access memory (vRAM) 124, a virtual network interface adapter (vNIC) 126, and virtual storage (vStorage) 128. Virtual hardware platform 120 supports the installation of a guest operating system (guest OS) 130, which is capable of executing applications 132. Examples of a guest OS 130 include any of the well-known commodity operating systems, such as the Microsoft Windows® operating system, the Linux® operating system, and the like.
It should be recognized that the various terms, layers, and categorizations used to describe the components in the figures may be referred to differently without departing from their functionality or the spirit or scope of the invention.
Hardware platform 103 supports the installation of an operating system (OS) 136, which is capable of executing applications 135. Examples of OS 136 include any of the well-known commodity operating systems, such as the Microsoft Windows® operating system, the Linux® operating system, and the like.
In one embodiment, CPU 104a has one or more caches 224a, and CPU 104b has one or more caches 224b, which are used to reduce the average cost of accessing data from memory. Memory controllers 230a, 230b transfer cache lines between CPU-Mem 106a and persistent memory 106b and the respective caches 224a, 224b, where a cache line (sometimes called a cache block) generally refers to a block of data of fixed size that is transferred between a memory location and a cache. When memory controller 230a, 230b copies a cache line from CPU-Mem 106a, CPU-Mem 106c, or persistent memory 106b, 106d, respectively, into caches 224a, 224b, a cache entry is created, which may include the copied data as well as the memory location from which the data was copied (which may be called a tag). When CPU 104a, 104b needs to read or write a location in CPU-Mem 106a, 106c or persistent memory 106b, 106d, it first checks for a corresponding entry in respective caches 224a, 224b. Caches 224a, 224b check for the contents of the requested memory location in any cache lines that might contain that address. If CPU 104a, 104b finds that the memory location is in caches 224a, 224b, a cache hit has occurred, and CPU 104a, 104b immediately reads or writes the data in the cache line. However, if CPU 104a, 104b does not find the memory location in caches 224a, 224b, a cache miss has occurred. For a cache miss, caches 224a, 224b allocate a new entry, and the respective memory controller 230a, 230b copies data from CPU-Mem 106a, 106c or persistent memory 106b, 106d. CPU 104a, 104b then accesses the requested memory location from the contents of caches 224a, 224b.
In one embodiment, CPU 104 has one or more caches 224, which are used to reduce the average cost to access data from memory. Data is transferred between CPU-Mem 106a and caches 224 in blocks of fixed size, called cache lines or cache blocks. When a cache line is copied from CPU-Mem 106a into caches 224, a cache entry is created, which includes both the copied data and the requested memory location (called a tag). When CPU 104 requests to read or write a location in CPU-Mem 106a, caches 224 first check for a corresponding entry contained therein. That is, caches 224 search for the contents of the requested memory location in any cache lines that might contain that address. If CPU 104 finds that the memory location resides in caches 224, a cache hit has occurred, and CPU 104 immediately reads or writes the data in the cache line. However, if CPU 104 does not find the memory location in caches 224, a cache miss has occurred. For a cache miss, caches 224 allocate a new entry and copy data from CPU-Mem 106a. The request is then fulfilled from the contents of caches 224. Operation of caches 224a and 224b is similar to that described for caches 224.
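The hit/miss behavior just described can be summarized in a short sketch. A direct-mapped cache is assumed purely to keep the example small; caches 224 would typically be set-associative.

```c
#include <stdint.h>
#include <stdbool.h>

#define LINE_SIZE 64
#define NUM_LINES 512

struct cache_entry {
    bool     valid;
    uint64_t tag;               /* memory location the data was copied from */
    uint8_t  data[LINE_SIZE];
};

static struct cache_entry cache[NUM_LINES];

/* Returns the cached copy of the line containing addr, filling it from
 * memory on a miss via the supplied fill() callback, much as the memory
 * controller copies data into caches 224 on a cache miss. */
uint8_t *access_line(uint64_t addr, void (*fill)(uint64_t line_addr, uint8_t *out))
{
    uint64_t line_addr = addr / LINE_SIZE;
    struct cache_entry *e = &cache[line_addr % NUM_LINES];

    if (e->valid && e->tag == line_addr)
        return e->data;                    /* cache hit: use the line directly */

    fill(line_addr * LINE_SIZE, e->data);  /* cache miss: allocate and copy */
    e->valid = true;
    e->tag   = line_addr;
    return e->data;
}
```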
Cif ports 208, 212, mentioned above, support a coherence protocol, which is designed to maintain cache coherence in a system with many processors, each having its own cache or caches. With FPGA 112 residing in one of the CPU sockets, socket 202b, and having its own cif port 212, FPGA 112 can monitor and participate in the coherence protocol that keeps the processors' caches coherent.
Cache coherence on coherence interconnect 114 is maintained according to a standard coherence protocol, such as the modified, exclusive, shared, invalid (MESI) protocol or the modified, exclusive, shared, invalid, forwarded (MESIF) protocol. In these protocols, a cache line marked invalid signifies that the cache line holds invalid data, so fresh data must be brought into caches 224 from CPU-Mem 106a. Cache lines marked exclusive, shared, or forwarded (in the MESIF protocol) all signify that the cache line has valid data but is clean (not modified), so the cache line can be discarded from the cache without writing its data back to CPU-Mem 106a. A cache line marked modified signifies that the cache line is modified, or dirty, and its data must be written back to CPU-Mem 106a before the cache line is discarded from caches 224.
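The consequence of these states for the embodiments can be stated compactly: only a line in the modified state produces memory traffic on eviction. The following is a schematic rendering of that rule, not a full MESI/MESIF implementation; the write_back helper is a hypothetical name.

```c
#include <stdint.h>

enum mesi_state { INVALID, SHARED, EXCLUSIVE, MODIFIED, FORWARDED /* MESIF only */ };

void write_back(uint64_t line_addr, const uint8_t *data);  /* hypothetical helper */

/* Schematic eviction rule: exclusive, shared, and forwarded lines are
 * valid but clean and can simply be discarded; only a modified (dirty)
 * line must first be written back to CPU-Mem 106a. That write-back is
 * what the device can observe on coherence interconnect 114. */
void evict(enum mesi_state state, uint64_t line_addr, const uint8_t *data)
{
    if (state == MODIFIED)
        write_back(line_addr, data);  /* appears as a write-back transaction */
    /* SHARED, EXCLUSIVE, FORWARDED: discard without memory traffic.
     * INVALID: nothing valid to discard. */
}
```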
The cache coherence protocol is enforced by a cache protocol agent for each cache connected to the coherence interconnect. Each cache protocol agent can initiate and respond to transactions on coherence interconnect 114 by sending and receiving messages on interconnect 114. In the present embodiments, CPU 104 includes a cache protocol agent 209, and FPGA 112 includes a cache protocol agent 220. Cache protocol agent 209 cooperates with cache protocol agent 220 by sending messages, including broadcast messages, over coherence interconnect 114. In the protocol, one of the cache protocol agents among the several agents present is the owner of a set of cache lines and maintains information regarding those cache lines. The other cache protocol agents send messages to the owner agent to request a cache line or to find the status of a cache line owned by the owner agent. The owner agent may service the request directly or request that another cache protocol agent satisfy it.
When a CPU 104 accesses a cache line that is not in its caches 224, at any level of the cache hierarchy, it is cache protocol agent 209 of CPU 104 that requests the cache line from CPU-Mem 106a. Thus, cache protocol agent 209 in CPU 104 issues a load cache line transaction on coherence interconnect 114. The transaction can be 'Load Shared' for sharing the cache line or 'Load Exclusive' for cache lines that will be modified. A cache line loaded as 'Shared' probably will not be modified. In contrast, a cache line loaded as 'Exclusive' is considered potentially dirty because it may be modified at some point, even though it is not certain that it will be. When a cache line is evicted from caches 224 to CPU-Mem 106a, if it is modified, it must be written back to CPU-Mem 106a, from which it originated. The operation of writing the cache line back is performed on coherence interconnect 114 as a write-back transaction and can be monitored to track dirty cache lines. In the case of a write-back transaction, the cache line is actually dirty, rather than merely potentially dirty. In the description that follows, a write-back transaction is converted to and handled as a message, 'WB Data CL', which is received by write-back module 314.
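Putting this together, the device's reaction to a 'WB Data CL' message might look like the sketch below. The helper names are hypothetical, and the assumption that the device can capture the line's prior contents from memory before the write-back data is applied is an illustrative simplification.

```c
#include <stdbool.h>
#include <stdint.h>

#define CL_SIZE 64  /* assumed cache-line size */

/* Hypothetical helpers; the names are illustrative only. */
bool addr_is_tracked(uint64_t phys_addr);              /* in the registered set? */
void read_old_line(uint64_t phys_addr, uint8_t *out);  /* contents before the WB applies */
void log_append_undo(int tx_id, uint64_t phys_addr, const uint8_t *old_data);

/* Invoked when a write-back transaction on coherence interconnect 114
 * arrives as a 'WB Data CL' message at write-back module 314. */
void on_wb_data_cl(int tx_id, uint64_t phys_addr)
{
    uint8_t old_data[CL_SIZE];

    if (!addr_is_tracked(phys_addr))
        return;                  /* not part of the transaction's cache-line set */

    /* The write-back carries the NEW data; the undo entry needs the OLD
     * data, captured here before the line is updated in memory. */
    read_old_line(phys_addr, old_data);
    log_append_undo(tx_id, phys_addr, old_data);
}
```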
To confirm whether a cache line is dirty or not, a cache protocol agent, such as agent 220 in FPGA 112, can snoop the cache line in accordance with the coherence interconnect protocol. If agent 220 determines the cache line is dirty, the snoop triggers a write-back transaction, thereby exposing the dirty cache line that was residing in a cache. Cache protocol agents 209 and 220 also have information regarding the cache lines that are resident in the processors' caches. This information is accessible via coherence interconnect 114. Operation of the coherence protocol and cache protocol agents 209a, 209b for caches 224a and 224b is similar to that described for caches 224.
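Combining this snooping with the logging described above, the handling of the end-of-logging signal might look like the following sketch (helper names again hypothetical): each tracked cache line is snooped so that any remaining dirty lines generate write-backs, and hence undo entries, before the ending mark is added.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helpers. A snoop of a dirty line triggers a write-back,
 * which on_wb_data_cl() above then turns into an undo entry. */
void snoop_line(uint64_t phys_addr);
void log_append_end_mark(int tx_id);

void on_end_logging(int tx_id, const uint64_t *tracked, size_t n)
{
    /* Expose any dirty lines still resident in the processors' caches. */
    for (size_t i = 0; i < n; i++)
        snoop_line(tracked[i]);

    /* Only after every undo entry is in the log does the ending mark make
     * the transaction inactive, and hence the set of changes atomic. */
    log_append_end_mark(tx_id);
}
```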
Also depicted in
Certain embodiments as described above involve a hardware abstraction layer on top of a host computer. The hardware abstraction layer allows multiple contexts to share the hardware resources. In one embodiment, these contexts are isolated from each other, each having at least a user application running therein. The hardware abstraction layer thus provides benefits of resource isolation and allocation among the contexts. In the foregoing embodiments, virtual machines are used as an example for the contexts and hypervisors as an example for the hardware abstraction layer. As described above, each virtual machine includes a guest operating system in which at least one application runs. It should be noted that these embodiments may also apply to other examples of contexts, such as containers not including a guest operating system, referred to herein as "OS-less containers" (see, e.g., www.docker.com). OS-less containers implement operating system-level virtualization, wherein an abstraction layer is provided on top of the kernel of an operating system on a host computer. The abstraction layer supports multiple OS-less containers, each including an application and its dependencies. Each OS-less container runs as an isolated process in user space on the host operating system and shares the kernel with other containers. The OS-less container relies on the kernel's functionality to make use of resource isolation (CPU, memory, block I/O, network, etc.) and separate namespaces and to completely isolate the application's view of the operating environment. By using OS-less containers, resources can be isolated, services restricted, and processes provisioned to have a private view of the operating system with their own process ID space, file system structure, and network interfaces. Multiple containers can share the same kernel, but each container can be constrained to use only a defined amount of resources such as CPU, memory, and I/O.
Certain embodiments may be implemented in a host computer without a hardware abstraction layer or an OS-less container. For example, certain embodiments may be implemented in a host computer running a Linux® or Windows® operating system.
The various embodiments described herein may be practiced with other computer system configurations including hand-held devices, microprocessor systems, microprocessor-based or programmable consumer electronics, minicomputers, mainframe computers, and the like.
One or more embodiments of the present invention may be implemented as one or more computer programs or as one or more computer program modules embodied in one or more computer-readable media. The term computer-readable medium refers to any data storage device that can store data which can thereafter be input to a computer system. Computer-readable media may be based on any existing or subsequently developed technology for embodying computer programs in a manner that enables them to be read by a computer. Examples of a computer-readable medium include a hard drive, network attached storage (NAS), read-only memory, random-access memory (e.g., a flash memory device), a CD (Compact Disc) such as a CD-ROM, a CD-R, or a CD-RW, a DVD (Digital Versatile Disc), a magnetic tape, and other optical and non-optical data storage devices. The computer-readable medium can also be distributed over a network-coupled computer system so that the computer-readable code is stored and executed in a distributed fashion.
Although one or more embodiments of the present invention have been described in some detail for clarity of understanding, it will be apparent that certain changes and modifications may be made within the scope of the claims. Accordingly, the described embodiments are to be considered as illustrative and not restrictive, and the scope of the claims is not to be limited to details given herein but may be modified within the scope and equivalents of the claims. In the claims, elements and/or steps do not imply any particular order of operation, unless explicitly stated in the claims.
Plural instances may be provided for components, operations or structures described herein as a single instance. Finally, boundaries between various components, operations, and data stores are somewhat arbitrary, and particular operations are illustrated in the context of specific illustrative configurations. Other allocations of functionality are envisioned and may fall within the scope of the invention(s). In general, structures and functionality presented as separate components in exemplary configurations may be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component may be implemented as separate components. These and other variations, modifications, additions, and improvements may fall within the scope of the appended claim(s).
Calciu, Irina, Subrahmanyam, Pratap, Kolli, Aasheesh, Gandhi, Jayneel