A system, method, and computer-accessible medium for detecting and logging in-line synchronization primitives are disclosed. One or more in-line synchronization primitives in a computer program are programmatically detected during execution of the computer program. The one or more in-line synchronization primitives are stored in a log.
1. A method comprising:
programmatically detecting one or more in-line synchronization primitives in a computer program during execution of the computer program using dynamic binary compilation to detect the one or more in-line synchronization primitives in the computer program during the execution of the computer program;
replacing each of the one or more in-line synchronization primitives in the computer program with a corresponding substitute synchronization primitive during execution of the computer program;
executing at least one of the substitute synchronization primitives in a manner visible to an operating system;
using the operating system to detect the execution of the at least one of the substitute synchronization primitives;
storing information indicative of each of the detected substitute synchronization primitives in a log, wherein the information comprises an execution order of each of the detected substitute synchronization primitives and a result of each of the detected substitute synchronization primitives;
determining that the execution of the computer program has failed at a point in time; and
using the log to resume the execution of the computer program from the point in time by replaying the execution of the computer program in a manner that guarantees the execution order and the results of each of the detected substitute synchronization primitives as stored in the log.
6. A system comprising:
a storage device;
a CPU; and
a memory coupled to the CPU, wherein the memory stores program instructions which are executable by the CPU to:
detect one or more in-line synchronization primitives in a computer program using dynamic binary compilation to detect the one or more in-line synchronization primitives in the computer program during execution of the computer program;
replace each of the one or more in-line synchronization primitives in the computer program with a corresponding substitute synchronization primitive during execution of the computer program;
execute at least one of the substitute synchronization primitives in a manner visible to an operating system;
use the operating system to detect the execution of the at least one of the substitute synchronization primitives;
store information indicative of each of the detected substitute synchronization primitives in a log on the storage device, wherein the information comprises an execution order of each of the detected substitute synchronization primitives and a result of each of the detected substitute synchronization primitives;
determine that the execution of the computer program has failed at a point in time; and
use the log to resume the execution of the computer program from the point in time by replaying the execution of the computer program in a manner that guarantees the execution order and the results of each of the detected substitute synchronization primitives as stored in the log.
4. A computer-accessible storage medium comprising program instructions, wherein the program instructions are computer executable to implement:
detecting one or more in-line synchronization primitives in a computer program during execution of the computer program using dynamic binary compilation to detect the one or more in-line synchronization primitives in the computer program during the execution of the computer program;
replacing each of the one or more in-line synchronization primitives in the computer program with a corresponding substitute synchronization primitive during execution of the computer program;
executing at least one of the substitute synchronization primitives in a manner visible to an operating system;
using the operating system to detect the execution of the at least one of the substitute synchronization primitives;
storing information indicative of each of the detected substitute synchronization primitives in a log, wherein the information comprises an execution order of each of the detected substitute synchronization primitives and a result of each of the detected substitute synchronization primitives;
determining that the execution of the computer program has failed at a point in time; and
using the log to resume the execution of the computer program from the point in time by replaying the execution of the computer program in a manner that guarantees the execution order and the results of each of the detected substitute synchronization primitives as stored in the log.
2. The method of
intercepting the one or more in-line synchronization primitives during the execution of the computer program using dynamic binary compilation.
3. The method of
5. The computer-accessible storage medium of
7. The system of
1. Field of the Invention
This invention relates to enterprise system management and, more particularly, to continuous availability techniques in multi-server networked environments.
2. Description of the Related Art
The impact of system downtime on productivity is increasing as organizations rely more heavily on information technology. Consequently, organizations may seek to minimize downtime through various approaches designed to increase reliability and availability. Ultimately, the goal of many organizations is to ensure the continuous availability of critical systems.
One approach to continuous availability is the use of redundant hardware executing redundant instances of an application in lockstep. If one instance of an application on one unit of hardware fails, then the instance on the other unit of hardware may continue to operate. However, the redundant hardware is often proprietary, and both the redundant and proprietary natures of the hardware yield a cost that may be prohibitive.
To avoid the expense of special-purpose hardware, software techniques may be used to provide failover of an application. For example, cluster management software may support application failover in a networked environment having two or more servers and a shared storage device. If the failure of an application or its host server is sensed, then a new instance of the application may be started on a functioning server in the cluster. However, software-based failover approaches may fail to preserve the entire context of the application instance on the failed server up to the moment of failure. In the wake of a failure, the new instance of the application is typically started anew. In the process, recent transactions and events may be discarded. Other transactions and events may be left in an indeterminate state.
It is desirable to provide improved methods and systems for continuously available execution environments.
A system, method, and computer-accessible medium for detecting and logging in-line synchronization primitives are disclosed. The method may include programmatically detecting one or more in-line synchronization primitives in a computer program during execution of the computer program. The in-line synchronization primitives may be detected using dynamic binary compilation at runtime. Using dynamic binary compilation, each in-line synchronization primitive may be replaced by or redirected to a substitute synchronization primitive which is visible to the operating system.
The method may further include storing the in-line synchronization primitives in a log. Upon execution of a substitute synchronization primitive in a manner visible to the operating system, the substitute synchronization primitive may be recognized by the operating system and logged accordingly. Any time a synchronization primitive is encountered in the computer program, it may be logged in the same order (with respect to other logged events) and with the same result as encountered during execution.
In one embodiment, execution of the computer program may be deterministically replayed using the log of synchronization primitives. Synchronization primitives in the log may be replayed in the same order and with the same results as originally encountered. In this manner, the state of the computer program at or immediately preceding the point of failure may be restored, and execution of the computer program may continue from the point of failure in a manner transparent to any clients.
A better understanding of the present invention can be obtained when the following detailed description is considered in conjunction with the following drawings, in which:
While the invention is described herein by way of example for several embodiments and illustrative drawings, those skilled in the art will recognize that the invention is not limited to the embodiments or drawings described. It should be understood that the drawings and detailed description thereto are not intended to limit the invention to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope of the invention as defined by the appended claims.
Using the systems, methods, and computer-accessible media described herein, detection, interception, and/or capture of in-line synchronization primitives may be provided. The detection, interception, and/or capture of in-line synchronization primitives may be used to preserve and duplicate application context in a continuously available execution environment.
As used herein, the term “server(s)” or “server(s) 102” may refer collectively to any of the hosts 102A-102F illustrated in
The continuously available execution environment may also be referred to as “software fault tolerance” or “application virtualization.” In one embodiment, the continuously available execution environment may be implemented in software, i.e., without using redundant proprietary hardware executing in lockstep. In one embodiment, the continuously available execution environment may be implemented without recompilation of an operating system kernel. In one embodiment, the continuously available execution environment may be implemented without static recompilation of applications 104. In one embodiment, the continuously available execution environment may be implemented without modification of clients 110, and the failover 105 may be transparent to clients 110. The continuously available execution environment may also be used for migration of applications 104 from server to server for maintenance or performance reasons.
In the example shown in
In various embodiments, the network 120 may comprise any local area network (LAN) such as an intranet or any wide area network (WAN) such as the Internet. The network 120 may use a variety of wired or wireless connection media. Wired connection media may include, for example, Ethernet, Fibre Channel media, or another sufficiently fast connection medium. Wireless connection media may include, for example, a satellite link, a modem link through a cellular service, or a wireless link such as Wi-Fi.
In various embodiments, the multi-server networked environment 100 may employ any of a number of commercially available software products for continuous availability, such as, for example, various products available from VERITAS Software Corporation (Mountain View, Calif.). The software products for continuous availability may be installed and executed on servers 102 which are coupled to the network 120. In one embodiment, the software products for continuous availability may operate transparently to the servers 102 and/or applications 104. In various embodiments, the multi-server networked environment 100 may also employ any of a number of commercially available software products for storage management, such as, for example, various products available from VERITAS Software Corporation (Mountain View, Calif.). The storage management software may provide functionality such as cluster management, volume management, storage virtualization, and/or file system management to organize data on one or more storage devices 130 and/or provide storage access to servers 102 and clients 110.
In one embodiment,
In order to capture the application state 103 at a point in time at or immediately prior to the point of failure, sufficient data about the application state 103 may be logged to enable deterministic restoration of the application state 103.
The application snapshot 132C may comprise the execution state, memory state, transaction state, open network connections, open files, and other suitable state-related data for the application instance 104C at a particular point in time. The file system snapshot 133C may comprise contents and metadata of a file system used by the application instance 104C at a particular point in time. In one embodiment, snapshots of either type may be taken at a regular interval (e.g., once per minute). Further aspects regarding possible implementations of application snapshots are described in U.S. Pat. No. 6,848,106, which is incorporated herein by reference. Further aspects regarding possible implementations of file system snapshots are described in U.S. Pat. No. 6,850,945, which is incorporated herein by reference.
Because snapshots are too resource-intensive to be taken after every event that changes the application state 103C, one or more logs 134C may be used to store data between snapshots which alters the application state 103C. The log 134C may comprise any events that are capable of introducing non-determinism into program execution, including their original order and original results. For example, a log 134C may comprise a record of events and results such as transaction requests from clients 110B of the application, interprocess communication events, TCP/IP events, other file I/O, system calls for random number generation, system calls for a date or time, attempts to acquire semaphores, signal execution, etc. After restoring the state-related data in the application snapshot 132C and/or the file system data in the file system snapshot 133C, the entries in the log 134C may be “replayed” (i.e., encountered in the same order and with the same results as originally experienced) to deterministically restore the application state 103C and continue execution from the point of failure. The most recent snapshots 132C and/or 133C and the log 134C may be used to resume execution of an application 104, including the opening of connections to any clients 110B, from a point in time at or immediately prior to the point of failure. In this manner, the failover 105 from one server 102C to another server 102D may be transparent to any clients 110B.
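By way of a non-limiting illustration, the interplay between a snapshot and a log may be sketched as follows. The sketch is a simplified model under stated assumptions and is not the disclosed implementation: the names `AppInstance`, `SharedStorage`, and `failover` are hypothetical, the application state is reduced to a dictionary of balances, and the choice to discard log entries when a snapshot is taken is an assumption made for brevity.

```python
import copy


class AppInstance:
    """Toy application (hypothetical): state is a dict of account balances."""

    def __init__(self, state=None):
        self.state = state if state is not None else {}

    def apply(self, event):
        # Each event is an (account, amount) pair that alters the state.
        account, amount = event
        self.state[account] = self.state.get(account, 0) + amount


class SharedStorage:
    """Stand-in for shared storage holding the latest snapshot and the log."""

    def __init__(self):
        self.snapshot = {}
        self.log = []

    def take_snapshot(self, app):
        self.snapshot = copy.deepcopy(app.state)
        # Assumption: entries that precede the snapshot are no longer needed.
        self.log.clear()

    def record(self, event):
        self.log.append(event)


def failover(storage):
    """Restore the snapshot on a new instance, then replay the log in order."""
    replacement = AppInstance(copy.deepcopy(storage.snapshot))
    for event in storage.log:
        replacement.apply(event)
    return replacement


storage = SharedStorage()
primary = AppInstance()
events = [("a", 5), ("b", 7), ("a", -2), ("b", 1)]
for i, event in enumerate(events):
    storage.record(event)            # log the event before applying it
    primary.apply(event)
    if i == 1:
        storage.take_snapshot(primary)   # snapshot after the second event

state_at_failure = copy.deepcopy(primary.state)   # the primary "fails" here
secondary = failover(storage)
```

In this sketch only the two events recorded after the snapshot are replayed, and the replacement instance arrives at the same state the first instance held at the point of failure.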
To enable deterministic replay of the application, synchronization instructions 106C should be logged in the same order in which they were used. If the application uses library calls or calls to the kernel for such instructions (e.g., for semaphores), then the synchronization instructions may be detected and logged using conventional techniques (e.g., for monitoring the kernel). However, the application may perform such instructions internally or in-line, such that the instructions are invisible to the operating system (i.e., outside of kernel knowledge). For example, applications running in user mode may avoid calling the kernel for such instructions. As illustrated in
In one embodiment, dynamic binary compilation techniques may be used to detect, intercept, and/or capture the one or more in-line synchronization primitives in the computer program. Dynamic binary compilation is a technique used for generating program code at runtime. Dynamic binary compilation may also be referred to as “dynamic compilation.”
In 510, the dynamic binary compiler may modify the in-line synchronization primitive to permit its logging. The in-line synchronization primitive may be replaced by or redirected to a substitute synchronization primitive which is visible to the operating system (e.g., the kernel 107 or other core element of the operating system). In one embodiment, the dynamic binary compiler 108 may automatically substitute the in-line synchronization primitive with program code to switch the process into the kernel, where the substitute synchronization primitive may be executed. In one embodiment, the replacement code in the application program may comprise a trap, wherein control is transferred to the operating system (e.g., for execution of the substitute synchronization primitive) and then back to the application program. In 512, the substitute synchronization primitive is then executed in a manner which is visible to the operating system (OS), such as by executing the substitute synchronization primitive in kernel mode. Steps 510 and/or 512 may also be referred to as “intercepting” the in-line synchronization primitives. Steps 510 and/or 512 may also be referred to as “simulating” execution of the in-line synchronization primitives. The in-line synchronization primitives 106E shown in
In 514, the substitute synchronization primitive is recognized by the operating system and stored in a log 134E. The synchronization primitive may be logged in the same order (with respect to other logged events) and with the same result as encountered during execution. In one embodiment, each synchronization primitive may be logged at substantially the same time at which it is encountered in the execution of the computer program. Each synchronization primitive may be logged each time it is encountered in the execution of the computer program. Storing or logging the synchronization primitive may also be referred to as “capturing” the synchronization primitive.
Execution of the computer program may continue after the logging in 514. Each in-line synchronization primitive encountered for the first time may be detected and intercepted using dynamic binary compilation as shown in steps 508 through 512. However, a subsequent encounter with the synchronization primitive may bypass steps 508 and 510 and quickly result in the execution and logging of the substitute synchronization primitive in 512 and 514.
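The first-encounter and subsequent-encounter paths described above may be illustrated by the following simplified simulation. It is a sketch under stated assumptions rather than an actual dynamic binary compiler: the program is modeled as a list of `(operation, argument)` tuples instead of machine code, and the names `Kernel`, `Translator`, `inline_try_lock`, and `trap_try_lock` are hypothetical.

```python
class Kernel:
    """Stand-in for the operating system; it observes and logs substitute primitives."""

    def __init__(self):
        self.log = []     # (sequence number, address, operation, result)
        self.locks = {}   # lock name -> owning thread id

    def trap_try_lock(self, addr, lock, thread):
        # Execute the substitute primitive where the OS can see it.
        acquired = lock not in self.locks
        if acquired:
            self.locks[lock] = thread
        # Log the primitive with its order and its result.
        self.log.append((len(self.log), addr, "try_lock", acquired))
        return acquired


class Translator:
    """Toy translator: rewrites in-line primitives once and caches the result."""

    def __init__(self, program, kernel):
        self.program = program
        self.kernel = kernel
        self.cache = {}          # address -> translated instruction
        self.translations = 0    # number of primitives rewritten so far

    def fetch(self, addr):
        if addr not in self.cache:
            op, arg = self.program[addr]
            if op == "inline_try_lock":    # detect the in-line primitive
                op = "trap_try_lock"       # replace it with a trap into the kernel
                self.translations += 1
            self.cache[addr] = (op, arg)
        return self.cache[addr]

    def run(self, thread, addrs):
        results = []
        for addr in addrs:
            op, arg = self.fetch(addr)
            if op == "trap_try_lock":
                results.append(self.kernel.trap_try_lock(addr, arg, thread))
            # Any other instruction (here only "nop") executes unchanged.
        return results


program = [("nop", None), ("inline_try_lock", "L"), ("nop", None)]
kernel = Kernel()
translator = Translator(program, kernel)
first = translator.run(thread=1, addrs=[0, 1, 2])    # first encounter: translate, then trap
second = translator.run(thread=2, addrs=[0, 1, 2])   # later encounter: cached trap only
```

In this sketch the primitive at address 1 is rewritten only once, while both executions are logged by the simulated kernel in order and with their results (the second attempt fails because the lock is already held).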
In one embodiment, any performance penalty suffered due to dynamic binary compilation may be small. After each synchronization primitive is initially encountered in the program code, recognized, and replaced, the application instance 104E will typically run at substantially the same speed as an unmodified version of the same application.
In one embodiment, execution of the computer program may be deterministically replayed using the log 134E.
In 606, it is determined that execution of the computer program 104E has failed on a server 102E at a particular point in time. Failure of the application instance 104E may be caused by a hardware or software fault in the server 102E itself or by a fault in an external entity such as a storage device. In one embodiment, the failure may be sensed automatically by another server 102F (e.g., using conventional cluster management techniques). The failure may also be sensed by another element such as a client 110C, a storage device 130C, or another computer system tasked with oversight of the multi-server networked environment 400.
In 608, the log 134E is used to resume execution of the computer program on another server 102F from the particular point in time. In one embodiment, the most recent valid application snapshot 132E and/or file system snapshot 133E may initially be restored. After restoring the snapshots 132E and/or 133E, entries in the log 134E may be replayed in the same order and with the same results as originally encountered to restore the application state 103E deterministically. The log 134E may comprise any events that are capable of introducing non-determinism into program execution along with the results of such events. For example, the log 134E may comprise a record of events and results such as transaction requests from clients 110C of the application, interprocess communication events, TCP/IP events, other file I/O, system calls for random number generation, system calls for a date or time, attempts to acquire semaphores, signal execution, etc. As discussed above, the log may comprise synchronization primitives 106E that were detected and logged in the proper order using dynamic binary compilation techniques. Replaying the synchronization primitives to restore the application state 103E may comprise executing or simulating execution of the primitives in the same order and with the same results as originally detected and logged. After restoring the snapshots 132E and/or 133E and the log 134E, including the opening of connections to any clients 110C, execution of the application 104 may continue from a point in time at or immediately prior to the point of failure. In this manner, the failover 105 from one server 102E to another server 102F may be transparent to any clients 110C.
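One possible way to simulate logged synchronization primitives during replay, so that their order and results match the log, is sketched below. This is an illustrative assumption and not the disclosed mechanism: the class `ReplayLock` and the log format of `(thread_name, result)` pairs are hypothetical, and the lock is not actually acquired; each attempt simply blocks until the log indicates that it is that thread's turn and then returns the logged result.

```python
import threading


class ReplayLock:
    """Try-lock whose outcomes follow a previously recorded log (hypothetical)."""

    def __init__(self, logged):
        self.logged = logged      # [(thread_name, result), ...] in original order
        self.position = 0         # index of the next log entry to replay
        self.turn = threading.Condition()
        self.observed = []        # order and results actually produced in replay

    def try_acquire(self, thread_name):
        with self.turn:
            # Block until the next log entry belongs to this thread,
            # or until the log is exhausted.
            self.turn.wait_for(
                lambda: self.position >= len(self.logged)
                or self.logged[self.position][0] == thread_name
            )
            if self.position >= len(self.logged):
                raise RuntimeError("replay ran past the end of the log")
            _, result = self.logged[self.position]
            self.position += 1
            self.observed.append((thread_name, result))
            self.turn.notify_all()
            return result


# Order and results as originally logged (made-up values for illustration).
logged = [("t2", True), ("t1", False), ("t2", True), ("t1", True)]
lock = ReplayLock(logged)
results = {"t1": [], "t2": []}


def worker(name, attempts):
    for _ in range(attempts):
        results[name].append(lock.try_acquire(name))


threads = [threading.Thread(target=worker, args=(name, 2)) for name in ("t1", "t2")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Regardless of how the threads happen to be scheduled during replay, the sequence in `lock.observed` matches the logged sequence, which is the property the replay relies upon.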
In one embodiment, the application state 103E restored to the second server 102F may include the substitute synchronization primitives generated according to
Exemplary Computer Systems
Computer system 900 may also include devices such as keyboard & mouse 950, SCSI interface 952, network interface 954, graphics & display 956, hard disk 958, and other nonvolatile storage 960, all of which are coupled to processor 910 by a communications bus. In various embodiments, nonvolatile storage 960 may include optical media devices such as read-only or writable CD or DVD, solid-state devices such as nonvolatile RAM, or any other suitable type of nonvolatile storage. It will be apparent to those having ordinary skill in the art that computer system 900 can also include numerous elements not shown in the figure, such as additional storage devices, communications devices, input devices, and output devices, as illustrated by the ellipsis shown. An example of such an additional computer system device is a Fibre Channel interface.
Those having ordinary skill in the art will readily recognize that the techniques and methods discussed above can be implemented in software as one or more software programs, using a variety of computer languages, including, for example, traditional computer languages such as assembly language, Pascal, and C; object oriented languages such as C++ and Java; and scripting languages such as Perl and Tcl/Tk. In some embodiments, software 940 may comprise program instructions executable, for example by one or more processors 910, to perform any of the functions or methods described above. Also, in some embodiments software 940 can be provided to the computer system via a variety of computer-accessible media including electronic media (e.g., flash memory), magnetic storage media (e.g., hard disk 958, a floppy disk, etc.), optical storage media (e.g., CD-ROM 960), and communications media conveying signals encoding the instructions (e.g., via a network coupled to network interface 954). In some embodiments, separate instances of these programs can be executed on separate computer systems in keeping with the methods described above. Thus, although certain steps have been described as being performed by certain devices, software programs, processes, or entities, this need not be the case and a variety of alternative implementations will be understood by those having ordinary skill in the art.
Additionally, those having ordinary skill in the art will readily recognize that the techniques described above can be utilized in a variety of different storage devices and computer systems with variations in, for example, the number of nodes, the type of operation of the computer system, e.g., cluster operation (failover, parallel, etc.), the number and type of shared data resources, and the number of paths between nodes and shared data resources.
Various modifications and changes may be made to the invention as would be obvious to a person skilled in the art having the benefit of this disclosure. It is intended that the following claims be interpreted to embrace all such modifications and changes and, accordingly, the specifications and drawings are to be regarded in an illustrative rather than a restrictive sense.
Shats, Serge, Pashenkov, Serge
Patent | Priority | Assignee | Title |
4718008, | Jan 16 1986 | International Business Machines Corporation | Method to control paging subsystem processing in a virtual memory data processing system during execution of critical code sections |
4868738, | Aug 15 1985 | LANIER WORLDWIDE, INC , A CORP OF DE | Operating system independent virtual memory computer system |
5280611, | Nov 08 1991 | International Business Machines Corporation | Method for managing database recovery from failure of a shared store in a system including a plurality of transaction-based systems of the write-ahead logging type |
5282274, | May 24 1990 | International Business Machines Corporation; INTERNATIONAL BUSINESS MACHINES CORPORATION, A CORP OF NEW YORK | Translation of multiple virtual pages upon a TLB miss |
5740440, | Jan 06 1995 | WIND RIVER SYSTEMS, INC | Dynamic object visualization and browsing system |
5802585, | Jul 17 1996 | Hewlett Packard Enterprise Development LP | Batched checking of shared memory accesses |
6014513, | Dec 23 1997 | Washington, University of | Discovering code and data in a binary executable program |
6101524, | Oct 23 1997 | International Business Machines Corporation; IBM Corporation | Deterministic replay of multithreaded applications |
6243793, | Jul 27 1995 | Intel Corporation | Protocol for arbitrating access to a shared memory area using historical state information |
6408305, | Aug 13 1999 | International Business Machines Corporation | Access frontier for demand loading pages in object-oriented databases |
6625635, | Nov 02 1998 | International Business Machines Corporation | Deterministic and preemptive thread scheduling and its use in debugging multithreaded applications |
6728950, | Oct 29 1998 | Texas Instruments Incorporated | Method and apparatus for translating between source and target code |
6820218, | Sep 04 2001 | Microsoft Technology Licensing, LLC | Persistent stateful component-based applications via automatic recovery |
6832367, | Mar 06 2000 | International Business Machines Corporation | Method and system for recording and replaying the execution of distributed java programs |
6848106, | Oct 05 1999 | Veritas Technologies LLC | Snapshot restore of application chains and applications |
6850945, | Feb 28 2002 | Veritas Technologies LLC | Systems, methods and apparatus for creating stable disk images |
6854108, | May 11 2000 | International Business Machines Corporation | Method and apparatus for deterministic replay of java multithreaded programs on multiprocessors |
7093162, | Sep 04 2001 | Microsoft Technology Licensing, LLC | Persistent stateful component-based applications via automatic recovery |
7251745, | Jun 11 2003 | SALESFORCE COM, INC | Transparent TCP connection failover |
20020087843, | |||
20020133675, | |||
20030212983, | |||
20040221272, | |||
20040255182, | |||
20060026387, | |||
20060150183, |
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Sep 29 2005 | SHATS, SERGE | VERITAS OPERATING CORPORATION | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 017062 | /0529 | |
Sep 29 2005 | PASHENKOV, SERGE | VERITAS OPERATING CORPORATION | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 017062 | /0529 | |
Sep 30 2005 | Symantec Corporation | (assignment on the face of the patent) | / | |||
Oct 30 2006 | VERITAS Operating Corporation | Symantec Corporation | CHANGE OF NAME SEE DOCUMENT FOR DETAILS | 019872 | /0979 | |
Oct 30 2006 | VERITAS Operating Corporation | Symantec Operating Corporation | CORRECTIVE ASSIGNMENT TO CORRECT THE ASSIGNEE PREVIOUSLY RECORDED ON REEL 019872 FRAME 979 ASSIGNOR S HEREBY CONFIRMS THE ASSIGNEE IS SYMANTEC OPERATING CORPORATION | 027819 | /0462 | |
Jan 29 2016 | Symantec Corporation | Veritas US IP Holdings LLC | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 037697 | /0412 | |
Jan 29 2016 | Veritas US IP Holdings LLC | BANK OF AMERICA, N A , AS COLLATERAL AGENT | SECURITY INTEREST SEE DOCUMENT FOR DETAILS | 037891 | /0001 | |
Jan 29 2016 | Veritas US IP Holdings LLC | WILMINGTON TRUST, NATIONAL ASSOCIATION, AS COLLATERAL AGENT | SECURITY INTEREST SEE DOCUMENT FOR DETAILS | 037891 | /0726 | |
Mar 29 2016 | Veritas Technologies LLC | Veritas Technologies LLC | MERGER AND CHANGE OF NAME SEE DOCUMENT FOR DETAILS | 038455 | /0752 | |
Mar 29 2016 | Veritas US IP Holdings LLC | Veritas Technologies LLC | MERGER AND CHANGE OF NAME SEE DOCUMENT FOR DETAILS | 038455 | /0752 | |
Aug 20 2020 | Veritas Technologies LLC | WILMINGTON TRUST, NATIONAL ASSOCIATION, AS NOTES COLLATERAL AGENT | SECURITY INTEREST SEE DOCUMENT FOR DETAILS | 054370 | /0134 | |
Nov 27 2020 | WILMINGTON TRUST, NATIONAL ASSOCIATION, AS COLLATERAL AGENT | VERITAS US IP HOLDINGS, LLC | TERMINATION AND RELEASE OF SECURITY IN PATENTS AT R F 037891 0726 | 054535 | /0814 |
Date | Maintenance Fee Events |
Mar 26 2014 | M1551: Payment of Maintenance Fee, 4th Year, Large Entity. |
Mar 22 2018 | M1552: Payment of Maintenance Fee, 8th Year, Large Entity. |
Jun 13 2022 | REM: Maintenance Fee Reminder Mailed. |
Nov 28 2022 | EXP: Patent Expired for Failure to Pay Maintenance Fees. |
Date | Maintenance Schedule |
Oct 26 2013 | 4 years fee payment window open |
Apr 26 2014 | 6 months grace period start (w surcharge) |
Oct 26 2014 | patent expiry (for year 4) |
Oct 26 2016 | 2 years to revive unintentionally abandoned end. (for year 4) |
Oct 26 2017 | 8 years fee payment window open |
Apr 26 2018 | 6 months grace period start (w surcharge) |
Oct 26 2018 | patent expiry (for year 8) |
Oct 26 2020 | 2 years to revive unintentionally abandoned end. (for year 8) |
Oct 26 2021 | 12 years fee payment window open |
Apr 26 2022 | 6 months grace period start (w surcharge) |
Oct 26 2022 | patent expiry (for year 12) |
Oct 26 2024 | 2 years to revive unintentionally abandoned end. (for year 12) |