A method and system for asserting a lock in a distributed file system is provided. All distributed locks have a lease for a limited time period, wherein the lease may be renewed periodically. A lock manager data structure is provided to maintain mode compatibility of locks granted to different client nodes for the same object. The process of acquiring or reasserting a lock includes determining whether there are other valid locks in use, as well as whether a valid lock is in operation in a conflicting or compatible mode with a requested lock. A new lock lease may be granted and a lock lease which has expired may be reasserted if a conflicting lease is not present.
1. A computer implemented method for reasserting a lock in a distributed file system comprising:
creating a lock manager data structure in volatile memory for an object assigned a distributed lock;
assigning an identifying number to a file system in which said object is located, said file system identifying number adapted to be incremented upon a restart of a server assigned to said file system;
comparing a file system identifying number provided by a client requesting a lock on said system object with a current file system identifying number in which said object is located; and
granting a lock reassertion request to said requesting client subsequent to breaking a lock if said compared file system identifying numbers are within an appropriate range.
7. A data storage system comprising:
a lock manager data structure for a system object assigned a distributed lock from a server, wherein said data structure is in volatile memory;
an identifying number assigned to a file system in which said object is located, said file system identifying number adapted to be incremented upon a restart of said server assigned to said file system; and
a manager adapted to compare a file system identifying number provided by a client requesting a lock on said system object with a current file system identifying number in which said object is located, and to grant a lock reassertion request to said requesting client subsequent to breaking a lock if said compared file system identifying numbers are within an appropriate range.
13. An article comprising:
a computer-readable signal-bearing medium;
means in the medium for creating a lock manager data structure in volatile memory for an object assigned a distributed lock obtained from a server;
means in the medium for assigning an identifying number to a file system in which said object is located;
means in the medium for incrementing said file system identifying number upon a restart of said server assigned to said file system;
means in the medium for comparing a file system identifying number provided by a client requesting a lock on said system object with a current file system identifying number in which said object is located; and
means in the medium for granting a lock reassertion request to said requesting client subsequent to breaking a lock if said compared file system identifying numbers are within an appropriate range.
2. The method of
3. The method of
4. The method of
5. The method of
6. The method of
8. The system of
9. The system of
10. The system of
11. The system of
12. The system of
14. The article of
15. The article of
16. The article of
17. The article of
18. The article of
19. The article of
1. Technical Field
This invention relates to a system and method for acquiring a distributed lock or reasserting a lock upon which a lock lease has expired. More specifically, the invention relates to respecting lock mode compatibility in a lock acquisition.
2. Description of the Prior Art
A storage area network (“SAN”) is an increasingly popular storage technology.
In a distributed system with shared persistent storage, such as the configuration shown in
Prior art solutions for addressing failure associated with a distributed lock include lease-based locks, versioned locks, lock expiration, and lock reassertion techniques. However, these prior art techniques do not always allow a client node to reassert a lock it previously held while maintaining cache consistency. In addition, the prior art techniques do not always prevent a client node from reasserting a previously held lock when that reassertion should be refused. Accordingly, there is a need for a technique that enables client nodes to reassert a distributed lock while maintaining cache consistency.
This invention comprises a method and system for asserting a distributed lock while maintaining cache coherency and lock mode compatibility.
In a first aspect, a method for reasserting a lock in a distributed file system is provided. A lock manager data structure is created in volatile memory for each object on which a distributed lock is obtained from a server node. Increment of the lock version number for the distributed lock in persistent storage is managed. The lock version number increment is either deferred until lock information is evicted from the volatile memory, or performed in persistent storage prior to grant of a first lock subsequent to a server start. In addition, cache consistency between a client node and the server node is preferably restored upon reassertion of a lock request following loss of a lock lease.
In a second aspect of the invention, a computer system with a lock manager data structure for a system object assigned a distributed lock is provided. The data structure is in volatile memory. A manager adapted to control increment of a lock version number in persistent storage for the distributed lock is provided. The manager either defers increment of the lock version number in persistent storage until lock information is evicted from the volatile memory, or increments the lock version number in persistent storage prior to grant of a first lock subsequent to a server start. A cache manager is preferably provided to restore cache consistency between a client node and the server node in conjunction with reassertion of a lock request from the client node subsequent to loss of a lock lease.
In a third aspect of the invention, an article in a computer-readable signal-bearing medium is provided. Means in the medium are provided to create a lock manager data structure in volatile memory for each system object on which a distributed lock is obtained from a server node. In addition, means in the medium for managing increment of a lock version number for the distributed lock in persistent storage are provided. The managing means may either defer increment of the lock version number until lock information is evicted from the volatile memory, or increment the lock version number in persistent storage prior to granting a first lock subsequent to start of the server node. In addition, means in the medium may be provided to restore cache consistency between a client node and the server node in conjunction with reassertion of a lock request from the client node subsequent to loss of a lock lease.
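The two version-increment policies described in these aspects can be contrasted in a short Python sketch. `PersistentStore`, `LockVersionManager`, and the `defer_until_evict` flag are hypothetical names, and a dictionary stands in for persistent storage. In the second policy the sketch applies the increment whenever an object's lock information is loaded into the volatile cache; because that cache starts empty after a server start, this covers the first grant after a start, though it also re-increments after an eviction.

```python
class PersistentStore:
    """Hypothetical stand-in for the on-disk lock version numbers."""

    def __init__(self):
        self.versions = {}  # object id -> persisted lock version number

    def read(self, obj):
        return self.versions.get(obj, 0)

    def write(self, obj, version):
        self.versions[obj] = version


class LockVersionManager:
    """Serves lock version numbers from a volatile cache (names hypothetical)."""

    def __init__(self, store, defer_until_evict):
        self.store = store
        self.defer_until_evict = defer_until_evict
        self.cache = {}  # volatile memory: object id -> version handed out with grants

    def version_for_grant(self, obj):
        if obj not in self.cache:
            version = self.store.read(obj)
            if not self.defer_until_evict:
                # Policy 2: bump the persisted version whenever lock information
                # is loaded into the (initially empty) cache, which covers the
                # first grant after a server start or restart.
                version += 1
                self.store.write(obj, version)
            self.cache[obj] = version
        return self.cache[obj]

    def evict(self, obj):
        version = self.cache.pop(obj)
        if self.defer_until_evict:
            # Policy 1: the increment is deferred and only written when the
            # lock information leaves volatile memory.
            self.store.write(obj, version + 1)
```

Under these assumptions, the deferred policy saves a persistent write on every cache load, while the eager policy ensures that every version handed out after a restart is newer than any version handed out by the previous server instance.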
Other features and advantages of this invention will become apparent from the following detailed description of the presently preferred embodiment of the invention, taken in conjunction with the accompanying drawings.
In a distributed file system, locks are requested by client nodes and granted by a server node. Every lock has a lock version number that is provided to the client node together with the grant of the lock. In addition, a server node instance is identified by a persistently stored epoch number, which is incremented each time the server node begins managing the storage containing the data to be locked, such as when the server node is restarted. When a client node is granted a lock from the server node, the client node establishes a lease with each server node serving a file system in use by the client node. A unique client node identifier is assigned to the client node when the lease is established. A granted lock is valid as long as the client node maintains a lease with the server node that granted the lock. If a client node fails to renew the lease for a granted lock, the lease expires, and the client node must obtain a new lease, with a new client node identifier, if it needs the lock. If a client node requests a lock that is recorded as owned by a second client node, but the second client node has failed to maintain its lock lease, the requesting client node may recover the lock from the second client node. Such a lock is known as a stolen lock. When a lock is stolen, the server node increments the lock version number on disk. Alternatively, if a lock lease has expired and the lock has not been stolen, the client node may try to reassert the lock. Accordingly, locks may be acquired or reasserted under these conditions while maintaining compatibility among the modes of the granted locks.
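The bookkeeping in the preceding paragraph (a per-instance epoch, one fresh client identifier per lease, and a version bump when an expired holder's lock is stolen) can be sketched in Python as follows. `LockServer` and its methods are hypothetical names; the sketch treats every lock as a single exclusive lock per object, ignoring lock modes, and uses dictionaries in place of persistent storage.

```python
import itertools


class LockServer:
    """Sketch of lease, epoch and lock-version bookkeeping (names hypothetical)."""

    def __init__(self, persisted_epoch):
        # A new server instance runs under the next epoch number.
        self.epoch = persisted_epoch + 1
        self.disk_versions = {}   # stands in for persistent lock version numbers
        self.lease_valid = {}     # client id -> whether its lease is still live
        self.owner = {}           # object -> client id currently holding the lock
        self._next_id = itertools.count(1)

    def establish_lease(self):
        client_id = next(self._next_id)  # every lease gets a fresh client id
        self.lease_valid[client_id] = True
        return client_id

    def expire_lease(self, client_id):
        self.lease_valid[client_id] = False

    def request_lock(self, client_id, obj):
        """Return (lock version, epoch) on grant, or None if another live holder exists."""
        if not self.lease_valid.get(client_id, False):
            raise PermissionError("client must first establish a lease")
        holder = self.owner.get(obj)
        if holder is not None and holder != client_id:
            if self.lease_valid[holder]:
                return None  # conflict with a live lease; resolved via demand messages
            # The holder's lease expired: steal the lock and bump the version on disk.
            self.disk_versions[obj] = self.disk_versions.get(obj, 0) + 1
        self.owner[obj] = client_id
        return self.disk_versions.get(obj, 0), self.epoch
```

In this sketch a client whose lease has expired cannot use its old identifier again; it must call `establish_lease` and receives a new identifier, mirroring the rule described above.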
In a client node, a filesystem is provided, wherein the file system is in the form of a subtree rooted at a particular directory. The root of the tree is the name of the file system that describes the filesystem tree rooted at the root directory. A distributed lock is obtained by a client node from a server node serving the filesystem which contains the file system object metadata. Such locks may include session locks, data locks, and range locks. Each of these types of locks operates in different modes. For example, there are eight different modes of operation for a session lock. Some of these modes are compatible, meaning they may coexist, and others are not. Accordingly, in the process of acquiring or reasserting a lock, it is important to determine whether there are other valid locks in progress, as well as whether a valid lock is operating in a conflicting or compatible mode with a requested lock.
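The eight session lock modes are not listed here, so the following Python sketch illustrates mode compatibility and mode strength with a hypothetical substitute: the four modes IS, IX, S, and X from classic multigranularity locking. The table names and helper functions are assumptions for illustration only. The example is useful because IX and S are neither compatible nor strength-related, which is the kind of pairing the acquisition routine must reject.

```python
# Hypothetical modes (IS, IX, S, X) borrowed from classic multigranularity
# locking; they are NOT the eight session-lock modes of the described system.
AT_LEAST = {
    # mode -> every mode that is at least as strong
    "IS": {"IS", "IX", "S", "X"},
    "IX": {"IX", "X"},
    "S": {"S", "X"},
    "X": {"X"},
}

COMPATIBLE = {
    frozenset(pair)
    for pair in [("IS", "IS"), ("IS", "IX"), ("IS", "S"), ("IX", "IX"), ("S", "S")]
}


def compatible(a, b):
    """True if two different clients may hold modes a and b at the same time."""
    return frozenset((a, b)) in COMPATIBLE


def covers(held, requested):
    """True if an already held mode satisfies a request for `requested`."""
    return held in AT_LEAST[requested]


def strength_related(a, b):
    """True if one mode is at least as strong as the other."""
    return covers(a, b) or covers(b, a)


def conflicting(requested, modes_held_by_others):
    """Modes held by other clients that cannot coexist with the request."""
    return [m for m in modes_held_by_others if not compatible(requested, m)]
```

For example, `conflicting("S", ["IS", "IX", "S"])` returns `["IX"]`, since only the IX holder blocks a shared request.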
When a server node is in a start-up mode of operation, it is assigned a file system to manage. For each file system that the server node is assigned, it must proceed through a routine to open the file system.
Following the process of opening a file system, a new lock may be requested by a client from the server node managing the file system.
If at step 70 it is determined that the lock manager data structure is in the cache in the server node's volatile memory, a query is conducted to determine if the client requesting the lock already holds a lock in a mode that is not strength-related to the requested lock mode 72. A positive response to the query at step 72 results in a denial of the lock request 74. However, a negative response to the query at step 72 results in a subsequent query to determine if the client requesting the lock holds another lock whose mode is equal to or greater than the requested mode 80. A positive response to the query at step 80 returns a communication from the server node to the client with the lock version number and the file system epoch number of the lock it holds 82, and the lock acquisition process completes successfully 84. However, a negative response results in a subsequent query to determine if any other client nodes in communication with the server node hold a lock that conflicts with the mode presented by the client in the lock request 86. If no other client node in communication with the server node holds a lock with a conflicting mode, it must be determined whether the client node requesting the lock already holds a lock for the identified object 118. For example, the client node requesting the lock may hold a read lock for the identified object and now be in the process of acquiring a write lock. A positive response to the query at step 118 results in an upgrade of the existing lock to the mode in the current lock request 122. Alternatively, a negative response to the query at step 118 results in creation of a new lock in the requested mode 120, as the client node does not hold a lock.
Thereafter, an entry for the lock mode associated with the new lock 120 or the modified pre-existing lock 122 is added to the lock manager data structure 124. The lock mode is then communicated to the client node requesting the lock, together with the lock version and the file system epoch number 126, indicating successful completion of acquisition of the new lock 128. Accordingly, the steps presented above outline the process of granting a lock to a client node when the requested lock mode does not conflict with any current locks and associated modes held by other client nodes in the system.
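The acquisition steps above can be traced in a compact Python sketch, with the step numbers from the text kept as comments. `LockEntry`, `LockManager`, and `acquire` are hypothetical names, the modes are a hypothetical IS/IX/S/X set borrowed from classic multigranularity locking, and the cache-miss branch that precedes step 70 is not described here, so the sketch simply assumes an empty entry. The conflict branch at step 86 only reports the conflicting modes rather than resolving them.

```python
from dataclasses import dataclass, field

# Hypothetical modes borrowed from classic multigranularity locking.
AT_LEAST = {"IS": {"IS", "IX", "S", "X"}, "IX": {"IX", "X"}, "S": {"S", "X"}, "X": {"X"}}
INCOMPATIBLE = {frozenset(p) for p in [("IS", "X"), ("IX", "S"), ("IX", "X"), ("S", "X"), ("X", "X")]}


def related(a, b):
    # Strength-related: one mode is at least as strong as the other.
    return a in AT_LEAST[b] or b in AT_LEAST[a]


@dataclass
class LockEntry:
    """Hypothetical lock manager data structure for one object."""
    version: int = 0
    holders: dict = field(default_factory=dict)  # client id -> granted mode


class LockManager:
    def __init__(self, epoch):
        self.epoch = epoch
        self.cache = {}  # volatile memory: object id -> LockEntry

    def acquire(self, client, obj, mode):
        # Cache-miss handling is not described; assume an empty entry.
        entry = self.cache.setdefault(obj, LockEntry())              # step 70
        held = entry.holders.get(client)
        if held is not None and not related(held, mode):              # step 72
            return "denied", None                                      # step 74
        if held is not None and held in AT_LEAST[mode]:               # step 80
            return "granted", (held, entry.version, self.epoch)        # steps 82, 84
        others = [m for c, m in entry.holders.items()
                  if c != client and frozenset((mode, m)) in INCOMPATIBLE]  # step 86
        if others:
            return "conflict", others  # resolved by steps 88-102, not shown here
        # Step 118: upgrade an existing lock (122) or create a new one (120),
        # then record the mode in the lock manager data structure (124).
        entry.holders[client] = mode
        return "granted", (mode, entry.version, self.epoch)            # steps 126, 128
```

For instance, a client holding S that asks for IS receives its existing S lock back (step 80), while the same client asking for IX is denied because IX and S are not strength-related (step 72).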
If, however, the response to the query at step 86 is positive, the conflicting lock mode must be resolved. For each lock in conflict with the requested lock mode 88, a query is conducted to determine if the lease of the existing client node holding the conflicting lock has expired 90. If the response at step 90 is positive, a process for theft of the lease-expired lock is initiated: a flag is set in the lock manager data structure to indicate that the lock has been stolen 96, and the expired client node's lock is then deleted 98. However, if the response at step 90 is negative, the client node requesting the lock is added to a list of client nodes that may hold locks with modes that conflict with the mode of the lock being requested 92. A demand message is then sent to the client node in possession of the conflicting lock 94. Steps 92–98 are conducted for each existing lock in conflict with the mode of the requested lock. For each client node that has received a demand message 100, the server node waits for the client node to acknowledge receipt of the demand message in the form of a signal 102. Such a signal may be a message requesting to downgrade the mode of the existing lock, as illustrated in
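The conflict-resolution loop (steps 88 through 102) is sketched below in Python. `resolve_conflicts`, `LockEntry`, and the callbacks `lease_valid`, `send_demand`, and `wait_for_ack` are hypothetical names. The sketch interprets step 92 as recording the conflicting holder that must answer a demand, and it increments the entry's version on each steal because stolen locks bump the on-disk lock version number; here the `version` field stands in for that persisted value.

```python
from dataclasses import dataclass, field


@dataclass
class LockEntry:
    """Hypothetical lock manager data structure for one object."""
    version: int = 0  # stands in for the persisted lock version number
    holders: dict = field(default_factory=dict)  # client id -> granted mode
    stolen: bool = False


def resolve_conflicts(entry, conflicting_clients, lease_valid, send_demand, wait_for_ack):
    """Steal expired locks and demand live ones; return the clients sent demands."""
    demanded = []
    for client in conflicting_clients:        # step 88
        if not lease_valid(client):           # step 90
            entry.stolen = True               # step 96
            del entry.holders[client]         # step 98
            entry.version += 1                # a steal bumps the lock version
        else:
            demanded.append(client)           # step 92 (interpretation)
            send_demand(client)               # step 94
    for client in demanded:                   # step 100
        wait_for_ack(client)                  # step 102, e.g. a downgrade request
    return demanded
```

In a real server, `wait_for_ack` would block on a network reply; passing it in as a callback keeps the sketch self-contained and testable.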
In addition to acquiring a lock, or requesting an upgrade to an existing lock, a client node may also reassert a lock for a lease that has expired.
Following a positive response to the query at step 150, a subsequent query is conducted to determine if a lock manager data structure for the object identified by the client node exists in the lock manager cache 152. A negative response to the query at step 152 results in creation of a new lock manager data structure for the identified object 154, followed by insertion of the lock manager data structure object into the lock manager cache 156. However, if the response to the query at step 152 is positive, a subsequent query is conducted to determine if the reasserting client node already holds a lock in the requested mode 158. If so, the existing lock mode, lock version, and file system epoch number are communicated to the client node 160, and the lock is successfully reasserted 162. However, if the response to the query at step 158 is negative, another query is conducted to determine if the requested lock exists under an alternative identifier 164. Every time a client node obtains a new lease, it is assigned a client node identifier associated with that lease. An alternative client node identifier is the client node identifier associated with the now-expired lease that was in effect at the time the lock was granted. A positive response to the query at step 164 results in deletion of the lock owned by the alternative identifier 170. Alternatively, a negative response to the query at step 164 results in a subsequent query to determine if any lock mode in the set of existing locks conflicts with the lock reassertion request 166. If there is a conflict, the lock reassertion request is denied 166. However, if there is no lock mode conflict, a new lock is created in the requested lock mode 172.
Similarly, following insertion of the lock manager data structure object into the lock manager cache at step 156, or deletion of the lock owned by an alternative identifier at step 170, a new lock is created in the requested lock mode 172. Following the creation of the new lock 172, the lock mode, the lock version, and the file system epoch number are communicated to the client node reasserting the lock 174, and the lock is granted to the client node 176. Accordingly, the lock reassertion routine enables a client node to reassert a lock whose lease has expired, subject to the conditions described above.
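The reassertion routine (steps 150 through 176) can be sketched in Python as follows, with step numbers kept as comments. The condition tested at step 150 is not spelled out in this passage; the sketch assumes it is an epoch check (the claims compare file system identifying numbers) and passes its outcome in as the hypothetical `epoch_ok` flag, denying the request when it fails. `reassert`, `LockEntry`, `old_ids`, and the read ("R") / write ("W") modes are likewise hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LockEntry:
    """Hypothetical lock manager data structure for one object."""
    version: int = 0
    holders: dict = field(default_factory=dict)  # client id -> granted mode


def compatible(a, b):
    # Hypothetical read ("R") / write ("W") modes: only readers share.
    return a == "R" and b == "R"


def reassert(cache, client, old_ids, obj, mode, epoch, epoch_ok):
    """Return ("granted", (mode, version, epoch)) or ("denied", None)."""
    if not epoch_ok:                                    # step 150 (assumed epoch check)
        return "denied", None
    entry = cache.get(obj)                              # step 152
    if entry is None:
        entry = cache[obj] = LockEntry()                # steps 154, 156
    elif entry.holders.get(client) == mode:             # step 158
        return "granted", (mode, entry.version, epoch)  # steps 160, 162
    else:
        # Step 164: is the lock held under an identifier from an expired lease?
        stale = [c for c in old_ids if c in entry.holders]
        if stale:
            for c in stale:
                del entry.holders[c]                    # step 170
        elif any(not compatible(mode, m)
                 for c, m in entry.holders.items() if c != client):  # step 166
            return "denied", None
    entry.holders[client] = mode                        # step 172
    return "granted", (mode, entry.version, epoch)      # steps 174, 176
```

Here `old_ids` lists the client identifiers from the reasserting client's expired leases, which is how the sketch models the "alternative identifier" lookup at step 164.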
Part of the lock reassertion subroutine not clearly shown in
A client node in need of a lock may either request a new lock or reassert a lock on an expired lock lease. During the routine of acquiring a lock, as shown in
During the routine of downgrading a lock mode for an existing lock, it is possible that the lock mode has been previously downgraded, as shown at step 192 of
A client node granted a lock may hold the lock for a set lease period. Thereafter, the client node may renew the lease. However, if a client node fails to renew a lock lease, the lease will expire.
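A minimal Python sketch of the lease lifecycle described above follows. `Lease`, `start`, `valid`, and `renew` are hypothetical names, and times are plain numbers of seconds supplied by the caller rather than read from a clock, which keeps the example deterministic.

```python
from dataclasses import dataclass


@dataclass
class Lease:
    """Hypothetical lease record; times are plain seconds supplied by the caller."""
    client_id: int
    duration: float
    expires_at: float

    @classmethod
    def start(cls, client_id, duration, now):
        return cls(client_id, duration, now + duration)

    def valid(self, now):
        return now < self.expires_at

    def renew(self, now):
        # An expired lease cannot be revived; the client needs a new lease
        # (and therefore a new client id) instead.
        if not self.valid(now):
            raise ValueError("lease expired; establish a new lease")
        self.expires_at = now + self.duration
```

With a 30-second lease started at time 0 and renewed at time 20, the lease remains valid until time 50; a renewal attempted after that raises an error, and the client must establish a new lease.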
The lock acquisition and lock reassertion routines described herein enable a client node to request a distributed lock while maintaining cache consistency. The increment of the lock version number in persistent storage is either deferred until lock information is evicted from volatile memory, or performed in persistent storage prior to granting a first lock following a start or restart of the server. Incrementing the lock version number in persistent storage before the grant of the first lock after a restart of the server removes any issues associated with the server storing the lock state in volatile memory. In addition, restoring cache consistency between a client node and a server node when the client node reasserts a lock subsequent to a loss of a lock lease enables both the client and the server to recover any data that may have been lost during the loss of the lock lease. When a filesystem epoch number is incremented, the commitment of the new epoch number to persistent storage is deferred until expiration of a lock reassertion grace period. This enables clients previously holding locks to reassert them for a defined time period. Once the filesystem epoch number is committed to persistent storage, a client that failed to reassert its lock within the grace period loses the lock and its ability to reassert it.
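The deferred epoch commit can be illustrated with the following Python sketch. `EpochManager`, `tick`, and `may_reassert` are hypothetical names, a dictionary stands in for persistent storage, and the rule used for the "appropriate range" (accept the current epoch, and the previous epoch only while the grace period is open) is an assumption made for illustration.

```python
class EpochManager:
    """Sketch of deferred epoch commit (names hypothetical)."""

    def __init__(self, disk, grace_period, now):
        self.disk = disk                      # dict standing in for persistent storage
        self.previous = disk["epoch"]
        self.current = self.previous + 1      # new epoch, held only in memory for now
        self.grace_ends = now + grace_period
        self.committed = False

    def tick(self, now):
        # Commit the new epoch only once the reassertion grace period is over.
        if not self.committed and now >= self.grace_ends:
            self.disk["epoch"] = self.current
            self.committed = True

    def may_reassert(self, client_epoch, now):
        # Assumed "appropriate range": the current epoch, or the previous one
        # while the grace period is still open.
        self.tick(now)
        return client_epoch == self.current or (
            not self.committed and client_epoch == self.previous)
```

Because persistent storage keeps the old epoch until the grace period ends, a second server restart inside the grace period derives the same new epoch, so clients from the old epoch still get a full window in which to reassert their locks.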
It will be appreciated that, although specific embodiments of the invention have been described herein for purposes of illustration, various modifications may be made without departing from the spirit and scope of the invention. In particular, a list of granted lock modes may be maintained in volatile memory. An increment of the lock version number in persistent memory may be deferred until the lock information is evicted from the volatile memory. Finally, both the epoch and the lock version can be represented by any incrementable value, such as an alphabetic character or raw bytes, and not necessarily only by a number. Accordingly, the scope of protection of this invention is limited only by the following claims and their equivalents.
Ananthanarayanan, Rajagopal, Rees, Robert M., Becker-Szendy, Ralph A., Guthridge, D. Scott
Executed on | Assignor | Assignee | Conveyance | Reel | Frame | Doc |
Apr 29 2003 | International Business Machines Corporation | (assignment on the face of the patent) | | | | |
Jun 24 2003 | GUTHRIDGE, D SCOTT | International Business Machines Corporation | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 014434 | 0682 | |
Jun 26 2003 | ANANTHANARAYANAN, RAJAGOPAL | International Business Machines Corporation | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 014434 | 0682 | |
Jun 26 2003 | BECKER-SZENDY, RALPH A | International Business Machines Corporation | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 014434 | 0682 | |
Jun 26 2003 | REES, ROBERT M | International Business Machines Corporation | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 014434 | 0682 | |
Mar 31 2014 | International Business Machines Corporation | LinkedIn Corporation | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 035201 | 0479 | |
Date | Maintenance Fee Events |
Sep 11 2006 | ASPN: Payor Number Assigned. |
Apr 16 2010 | M1551: Payment of Maintenance Fee, 4th Year, Large Entity. |
May 30 2014 | REM: Maintenance Fee Reminder Mailed. |
Sep 24 2014 | M1552: Payment of Maintenance Fee, 8th Year, Large Entity. |
Sep 24 2014 | M1555: 7.5 yr surcharge - late pmt w/in 6 mo, Large Entity. |
May 28 2018 | REM: Maintenance Fee Reminder Mailed. |
Nov 19 2018 | EXP: Patent Expired for Failure to Pay Maintenance Fees. |
Date | Maintenance Schedule |
Oct 17 2009 | 4 years fee payment window open |
Apr 17 2010 | 6 months grace period start (w surcharge) |
Oct 17 2010 | patent expiry (for year 4) |
Oct 17 2012 | 2 years to revive unintentionally abandoned end. (for year 4) |
Oct 17 2013 | 8 years fee payment window open |
Apr 17 2014 | 6 months grace period start (w surcharge) |
Oct 17 2014 | patent expiry (for year 8) |
Oct 17 2016 | 2 years to revive unintentionally abandoned end. (for year 8) |
Oct 17 2017 | 12 years fee payment window open |
Apr 17 2018 | 6 months grace period start (w surcharge) |
Oct 17 2018 | patent expiry (for year 12) |
Oct 17 2020 | 2 years to revive unintentionally abandoned end. (for year 12) |