A method for controlling a software lock acquirable by processors in a plurality of nodes of a multiprocessing system is disclosed. The method comprises a first processor of a first node of the plurality of nodes acquiring the lock, and the first processor selectively releasing the lock in a first state that allows other processors within the first node to acquire the lock but that prevents processors in a remote node of the plurality of nodes from obtaining the lock. In another embodiment, a method comprises a first processor of a first node attempting to acquire the lock, the first processor determining whether another processor within the same node is remotely spinning on the lock, and the first processor remotely spinning on the lock in response to determining that another processor in the same node is not remotely spinning on the software lock.
1. A method for controlling a software lock to a shared memory resource that is acquirable by processors in a plurality of nodes of a multiprocessing system, the method comprising:
a first processor of a first node of the plurality of nodes acquiring the software lock;
the first processor selectively releasing the software lock in a local free state by writing to a local software lock that allows another processor within the first node to acquire the software lock but that prevents processors in a remote node of the plurality of nodes from obtaining the software lock in response to a fairness value being in a false state and in response to an indication that the another processor in the same node attempted to acquire the software lock;
the first processor releasing the software lock in a free state that allows all processors within any of the plurality of nodes to attempt to acquire the software lock in response to the fairness value being in a false state and in response to an indication that no other processor in the same node attempted to acquire the software lock; and
the first processor also releasing the software lock in the free state in response to the fairness value being in a true state regardless of an indication that the another processor in the same node attempted to acquire the software lock.
9. A multiprocessing computer system comprising:
a plurality of nodes including a plurality of processors;
wherein a first processor of a first node of the plurality of nodes is configured to acquire a software lock to a shared memory resource;
wherein the first processor is configured to selectively release the software lock in a local free state by writing to a local lock that allows another processor within the first node to acquire the software lock but that prevents processors in a remote node of the plurality of nodes from obtaining the software lock in response to a fairness value being in a false state and in response to an indication that the another processor in the same node attempted to acquire the software lock;
wherein the first processor is further configured to release the software lock in a free state that allows all processors within any of the plurality of nodes to attempt to acquire the software lock in response to the fairness value being in a false state and in response to an indication that no other processor in the same node attempted to acquire the software lock; and
wherein the first processor is further configured to release the software lock in the free state in response to the fairness value being in a true state regardless of an indication that any other processor in the same node attempted to acquire the software lock.
14. A computer readable storage including program instructions executable to implement a method for controlling a software lock to a shared memory resource, the software lock being acquirable by processors in a plurality of nodes of a multiprocessing system, wherein the method comprises:
a first processor of a first node of the plurality of nodes acquiring the software lock;
the first processor selectively releasing the software lock in a local free state that allows another processor within the first node to acquire the software lock but that prevents processors in a remote node of the plurality of nodes from obtaining the software lock in response to a fairness value being in a false state and in response to an indication that the another processor in the same node attempted to acquire the software lock;
the first processor releasing the software lock in a free state that allows all processors within any of the plurality of nodes to attempt to acquire the software lock in response to the fairness value being in a false state and in response to an indication that no other processor in the same node attempted to acquire the software lock; and
the first processor releasing the software lock in the free state in response to the fairness value being in a true state regardless of an indication that any other processor in the same node attempted to acquire the software lock.
2. The method as recited in
3. The method as recited in
the first processor checking the state of a remote lock;
the first processor determining that the state of the remote lock is free; and
the first processor changing the state of the remote lock to remote.
4. The method as recited in
a second processor of a second node of the plurality of nodes attempting to acquire the software lock;
the second processor determining whether another processor within the second node is remotely spinning on the software lock; and
the second processor remotely spinning on the software lock in response to determining that another processor in the second node is not remotely spinning on the software lock.
5. The method as recited in
6. The method as recited in
7. The method as recited in
8. The method as recited in
10. The multiprocessing computer system as recited in
11. The multiprocessing computer system as recited in
a second processor of a second node of the plurality of nodes;
wherein the second processor is configured to attempt to acquire the software lock;
wherein the second processor is further configured to determine whether another processor within the second node is remotely spinning on the software lock; and
wherein the second processor is configured to remotely spin on the software lock in response to determining that another processor in the second node is not remotely spinning on the software lock.
12. The multiprocessing computer system as recited in
13. The multiprocessing computer system as recited in
15. The computer readable storage recited in
the first processor checking the state of a remote lock;
the first processor determining that the state of the remote lock is free; and
the first processor changing the state of the remote lock to remote.
16. The computer readable storage recited in
a second processor of a second node of the plurality of nodes attempting to acquire the software lock;
the second processor determining whether another processor within the second node is remotely spinning on the software lock; and
the second processor remotely spinning on the software lock in response to determining that another processor in the second node is not remotely spinning on the software lock.
17. The computer readable storage recited in
18. The computer readable storage recited in
This application claims the benefit of U.S. Provisional Application No. 60/375,742, entitled “Scalable Hierarchical Spin-Lock”, filed Apr. 26, 2002.
1. Field of the Invention
This invention relates to the field of multiprocessor computer systems and, more particularly, to mechanisms and methods for optimizing spin-lock operations within multiprocessor computer systems.
2. Description of the Related Art
A popular architecture in commercial multiprocessing computer systems is a distributed shared memory architecture. A distributed shared memory architecture includes multiple nodes within which processors and memory reside. The multiple nodes communicate via a network coupled therebetween. When considered as a whole, the memory included within the multiple nodes forms the shared memory for the computer system. Typically, directories are used to identify which nodes have cached copies of data corresponding to a particular address. Coherency activities may be generated via examination of the directories.
Distributed shared memory systems are scaleable, overcoming various limitations associated with shared bus architectures. Since many of the processor accesses are completed within a node, nodes typically have much lower bandwidth requirements upon the network in comparison to the bandwidth requirements a shared bus architecture must provide upon its shared bus to attain comparable performance. The nodes may operate at high clock frequency and bandwidth, accessing the network when needed. Additional nodes may be added to the network without affecting the local bandwidth of the nodes. Instead, only the network bandwidth is affected.
Many distributed shared memory architectures have non-uniform access time to the shared memory. Such architectures are known as non-uniform memory architectures (NUMA). Most systems that form NUMA architectures also have the characteristic of a non-uniform communication architecture (NUCA), in which the access time from a processor to the other processors' caches varies greatly depending on their placement. In particular, node-based NUMA systems, where a group of processors have a much shorter access time to each other's caches than to the other caches, are common. Recently, technology trends have made it attractive to run more than one thread per chip, using either the chip multiprocessor (CMP) and/or the simultaneous multi-threading (SMT) approach. Large servers, built from several such chips, can therefore be expected to form NUCA architectures, since collocated threads will most likely share an on-chip cache at some level.
Due to the popularity of NUMA systems, optimizations directed to such architectures have attracted much attention in the past. For example, optimizations involving the migration and replication of data in NUMA systems have demonstrated a great performance improvement in many applications. In addition, since many of today's applications exhibit a large fraction of cache-to-cache misses, optimizations which consider the NUCA nature of a system may also lead to significant performance enhancements.
One particular problem associated with multiprocessing computer systems having distributed shared memory architectures relates to spin-lock operations. In general, spin-lock operations are associated with software locks that are used by programs to ensure that only one parallel process at a time can access a critical region of memory. A variety of lock implementations have been proposed, ranging from simple spin-locks to advanced queue-based locks. Although simple spin-lock implementations can create very bursty traffic as described below, they are still the most commonly used software lock within computer systems.
Systems employing spin-lock implementations typically require that a given process perform an atomic operation to obtain access to a critical memory region. For example, an atomic test-and-set operation is commonly used. The test-and-set operation is performed to determine whether a lock bit associated with the memory region is cleared and to atomically set the lock bit. That is, the test allows the thread to determine whether the memory region is free of a lock by another thread, and the set operation allows the thread to acquire the lock if the lock bit is cleared. If the test of the lock bit indicates that the memory region is currently locked, the thread initiates a software loop wherein the lock bit is continuously read until the lock bit is detected as cleared, at which time the thread reinitiates the atomic test-and-set operation.
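The test-and-set loop described above can be sketched in C. This is only an illustration under stated assumptions: it uses the C11 `<stdatomic.h>` interface, and the names `tatas_lock`, `tatas_try_acquire`, `tatas_acquire` and `tatas_release` are hypothetical and do not appear in this disclosure.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type: false = lock bit cleared, true = lock bit set. */
typedef struct { atomic_bool locked; } tatas_lock;

/* One atomic test-and-set attempt; returns true if the caller got the lock. */
static bool tatas_try_acquire(tatas_lock *l)
{
    /* atomic_exchange sets the bit and returns its previous value. */
    return !atomic_exchange(&l->locked, true);
}

static void tatas_acquire(tatas_lock *l)
{
    while (!tatas_try_acquire(l)) {
        /* The lock was held: read the bit repeatedly until it is seen
           cleared, then retry the atomic test-and-set. */
        while (atomic_load(&l->locked))
            ;
    }
}

static void tatas_release(tatas_lock *l)
{
    atomic_store(&l->locked, false);   /* clear the lock bit */
}
```

A thread would call `tatas_acquire` before entering the critical region and `tatas_release` on leaving it. Every failed `tatas_try_acquire` is a write to the shared lock word, which is the source of the traffic discussed next.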
When several spinning processors contend for access to the same memory region, a relatively large number of transaction requests may be generated. Since only one of the spinning processors will acquire the lock during such contention, the failed test-and-set operations of the remaining processors result in undesirable requests on the network. Due to this, the latency associated with the release of a lock until the next contender can acquire the lock may be relatively high. In addition, the large number of transactions can further limit the maximum frequency at which ownership of the lock can migrate from processor to processor and node to node. Still further, depending upon the order in which granting of the lock to different processors is performed, the lock (and the data the lock is protecting) may migrate back and forth between nodes, invalidating other copies. This may result in yet additional undesirable network traffic.
Other spin-lock implementations have therefore been proposed to improve performance and reduce network traffic when contention for a lock exists. For example, in some implementations, the burst of refill traffic when a lock is released may be reduced by using an exponential back-off delay in which, after failing to obtain a lock, the requester waits for successively longer periods of time before initiating additional lock operations. In other implementations, queue-based locking methodologies have been employed to reduce network traffic. In a system that implements a queue-based lock, requesting processors contending for a lock are queued in an order. A contending processor generates transactions to acquire the lock only if it is the next in line contender. Numerous variations of queue-based lock implementations are known.
While the various optimizations for spin-lock implementations have in some instances led to enhanced performance, most solutions do not consider or exploit the NUCA characteristics of a distributed shared memory computer system. In addition, many implementations have resulted in relatively high latencies for uncontended locks. A mechanism and methodology are therefore desirable that may exploit the NUCA nature of a multiprocessing system to optimize spin-lock operations without introducing significant latencies for uncontended locks.
Various embodiments of multiprocessing computer systems and methods are provided that implement hierarchical spin locks. In one embodiment, a method for controlling a software lock acquirable by processors in a plurality of nodes of a multiprocessing system is disclosed. The method comprises a first processor of a first node of the plurality of nodes acquiring the lock, and the first processor selectively releasing the lock in a first state that allows other processors within the first node to acquire the lock but that prevents processors in a remote node of the plurality of nodes from obtaining the lock. In this manner, node locality may result wherein a thread that is executing within the same node in which a lock has already been obtained will be more likely to subsequently acquire the lock when it is freed in relation to other contending threads executing in other nodes.
In another embodiment, a method for performing spin lock operations in a computer system is disclosed. The method comprises a first processor of a first node attempting to acquire the lock, the first processor determining whether another processor within the same node is remotely spinning on the lock, and the first processor remotely spinning on the lock in response to determining that another processor in the same node is not remotely spinning on the software lock.
While the invention is susceptible to various modifications and alternative forms, specific embodiments are shown by way of example in the drawings and are herein described in detail. It should be understood, however, that the drawings and detailed description thereto are not intended to limit the invention to the particular form disclosed, but on the contrary, the invention is to cover all modifications, equivalents and alternatives falling within the spirit and scope of the present invention as defined by the appended claims.
Turning now to
Each processing node 12 is a processing node having memory 22 as the shared memory. Processors 16 are high performance processors. In one embodiment, each processor 16 may employ an UltraSPARC® processor architecture. It is noted, however, that any processor architecture may be employed by processors 16.
Typically, processors 16 may include internal instruction and data caches. Therefore, caches 18 are labeled as L2 caches (for level 2, wherein the internal caches are level 1 caches). If processors 16 are not configured with internal caches, then external caches 18 are level 1 caches. It is noted that the “level” nomenclature is used to identify proximity of a particular cache to the processing core within processor 16. Level 1 is nearest the processing core, level 2 is next nearest, etc. Caches 18 provide rapid access to memory addresses frequently accessed by the processor 16 coupled thereto. It is noted that external caches 18 may be configured in any of a variety of specific cache arrangements. For example, set-associative or direct-mapped configurations may be employed by external caches 18. It is noted that in some embodiments, the processors 16 and caches 18 of a node may be incorporated together on a single integrated circuit in a chip multiprocessor (CMP) configuration.
Node interconnect 20 accommodates communication between processors 16 (e.g., through caches 18), memory 22, system interface 24, and I/O interface 26. In one embodiment, node interconnect 20 includes an address bus and related control signals, as well as a data bus and related control signals. Because the address and data buses are separate, a split-transaction bus protocol may be employed upon node interconnect 20. Generally speaking, a split-transaction bus protocol is a protocol in which a transaction occurring upon the address bus may differ from a concurrent transaction occurring upon the data bus. Transactions involving address and data include an address phase in which the address and related control information is conveyed upon the address bus, and a data phase in which the data is conveyed upon the data bus. Additional address phases and/or data phases for other transactions may be initiated prior to the data phase corresponding to a particular address phase. An address phase and the corresponding data phase may be correlated in a number of ways. For example, data transactions may occur in the same order that the address transactions occur. Alternatively, address and data phases of a transaction may be identified via a unique tag.
In alternative embodiments, node interconnect 20 may be implemented as a circuit-switched network or a packet-switched network. In embodiments where node interconnect 20 is a packet-switched network, packets may be sent through the data network using techniques such as wormhole, store and forward, or virtual cut-through. In a circuit-switched network, a particular device may communicate directly with a second device via a dedicated point-to-point link that may be established through a switched interconnect mechanism. To communicate with a different device, a different link is established through the switched interconnect. In some embodiments, separate address and data networks may be employed.
Memory 22 is configured to store data and instruction code for use by processors 16. Memory 22 preferably comprises dynamic random access memory (DRAM), although any type of memory may be used. Memory 22, in conjunction with similar illustrated memories in the other processing nodes 12, forms a distributed shared memory system. Each address in the address space of the distributed shared memory is assigned to a particular node, referred to as the home node of the address. A processor within a different node than the home node may access the data at an address of the home node, potentially caching the data. Therefore, coherency is maintained between processing nodes 12 as well as among processors 16 and caches 18 within a particular processing node 12A-12D. System interface 24 provides internode coherency, and may further perform snooping or other functionality on interconnect 20 to provide intranode coherency.
In various embodiments, data stored in a particular cache, such as cache 18A, may be accessed and used to service requests by other processors (e.g., processor 16B). In addition or alternatively, in various embodiments, portions of memory 22 may be allocated to cache data that is mapped to other nodes (i.e., data having remote home nodes). In this manner, data accessed by a processor 16 may be cached in an allocated location of the memory 22 of that node to allow quicker subsequent accesses to the data.
In addition to maintaining internode coherency, system interface 24 may receive transactions through node interconnect 20 which require a data transfer to or from another processing node 12. System interface 24 performs the transfer, and provides the corresponding data for the transaction upon node interconnect 20, if appropriate. In the embodiment shown, system interface 24 is coupled to a point-to-point network 14. However, it is noted that in alternative embodiments other interconnect structures may be used.
It is noted that the system of
As will be described in further detail below, computer system 10 may be configured to perform optimized spin-lock operations. More particularly, code executing on processors 16 may be configured such that lock implementations result in a higher probability that contended locks will be handed over to threads executing within the same node. In addition, the number of processors spinning on a remotely held lock may be limited. In view of the non-uniform nature of the communication architecture, improved performance may thus be attained by reducing network traffic and by reducing migration of the lock (as well as the data the lock is protecting) from node to node.
In the implementation of
A local lock may be associated with any one of a number of states.
More particularly, in one embodiment, a local lock may be in a “free” state, a “local free” state, or a “remote” state. The free state indicates that the lock is not currently owned by any thread, and allows a processor in any node to acquire the lock. The local free state similarly indicates that no thread currently owns the lock, but allows only processors within the same node to acquire the lock. Finally, the remote state indicates that a thread executing in a remote processor currently holds the lock. Indications of any of these states may be encoded and stored in the coherence unit forming the lock. As will be discussed further below, a local lock may further identify a thread id of either a thread that has acquired the lock or of a thread that has attempted to acquire the lock.
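One possible encoding of these states in a single lock word can be sketched as follows. The numeric values are assumptions made for illustration only; the description does not specify them. The sketch places the three state values above every thread id, so that any value below `FREE` can be read as a thread id.

```c
#include <stdbool.h>

/* Hypothetical encoding: thread ids occupy the values below FREE. */
enum {
    FREE   = 0x10000,   /* not owned; a processor in any node may acquire  */
    L_FREE = 0x10001,   /* not owned; only processors in this node may acquire */
    REMOTE = 0x10002    /* held by a thread executing in a remote node     */
};

/* True if the previous lock content lets the caller take the lock locally. */
static bool lock_was_available(unsigned long old)
{
    return old == FREE || old == L_FREE;
}

/* True if the lock word currently stores a thread id rather than a state. */
static bool holds_thread_id(unsigned long word)
{
    return word < FREE;
}
```

Keeping the state and the thread id in the same word means a single atomic operation can both test the state and record the requesting thread.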
In the following discussion, it is assumed that the system implementation contains a total of two nodes. It is also assumed that the local lock in one of the nodes is initialized in the free state while the mirror of the lock in the other node is initialized in the remote state. Other embodiments are possible that employ larger numbers of nodes.
A thread executing on a given processor 16 may attempt to obtain a software lock by initiating the lock acquisition steps depicted in
In one implementation, steps 202 and 204 may be performed atomically by executing a swap instruction. A swap instruction atomically writes a value to the addressed memory location and returns its original contents. It is noted that in other implementations, other atomic memory primitive operations and/or non-atomic memory primitive operations may alternatively be performed.
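The behavior of such a swap can be approximated in portable C with the C11 `atomic_exchange` operation. This is a sketch, not the instruction used by any particular processor; the use of `atomic_ulong` for the lock word is an assumption.

```c
#include <stdatomic.h>

/* Atomically store new_value at addr and return the original contents. */
static unsigned long swap(atomic_ulong *addr, unsigned long new_value)
{
    return atomic_exchange(addr, new_value);
}
```

If a thread with id 7 swaps its id into the lock and a thread with id 9 does the same afterwards, the second swap returns 7 and the lock word then holds 9. The lock word therefore always records the most recent thread that tried to acquire the lock.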
In step 206, if the content of the local lock indicates the lock was in a free or a local free state, the routine ends. It is noted that in this situation, the thread has acquired the lock, and the id of the thread acquiring the lock is stored with the content of the lock upon completion.
If, on the other hand, the lock was in neither a free nor a local free state, a further check is performed in step 206 to determine whether the lock was in a remote state. If the lock was not in a remote state, an Acquire Slow Path Routine is performed, as discussed below with reference to
In step 302, a variable “Be_fair” is calculated. In one embodiment, the Be_fair value is a boolean value of either true or false. The function of the Be_fair variable will be discussed further below.
In steps 304 and 306, the content of the local lock is checked and the id of the thread performing the routine is written into the local lock. Similar to the previous description, steps 304 and 306 may be performed atomically through execution of, for example, a swap instruction. In step 308, if the content of the lock indicates the lock was in either a free or local free state, the routine ends. It is noted that in this situation, the thread has acquired the lock, and its id is stored in the content of the lock upon completion.
If, on the other hand, the lock was not in a free or a local free state, it is determined whether the lock was in the remote state (step 310). If the lock was in the remote state, the Acquire Remote Routine is performed. On the other hand, if the state of the lock was not in the remote state, a loop delay (i.e., backoff delay) is introduced in step 312, and the steps are repeated beginning at step 304. It is noted that steps 304-312 are repetitively performed until the state of the lock is returned in the free, local free or remote state.
In one embodiment, the delay introduced in step 312 may exponentially increase (e.g., with a capped delay value) during successive iterations of the acquire slow path routine, as desired. This may result in improved spin lock performance. In other embodiments, the delay introduced in step 312 may vary over time in other specific ways.
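A capped exponential delay of this kind might be computed as sketched below. The constant values are hypothetical; the description does not give any, and suitable values would depend on the system.

```c
/* Hypothetical tuning constants; no values are given in the description. */
#define BACKOFF_BASE   64
#define BACKOFF_FACTOR 2
#define BACKOFF_CAP    1024

/* Next delay: grow by the factor, but never beyond the cap. */
static int next_backoff(int b)
{
    int next = b * BACKOFF_FACTOR;
    return next < BACKOFF_CAP ? next : BACKOFF_CAP;
}

/* Busy-wait for b loop iterations; volatile keeps the compiler from
   removing the otherwise empty loop. */
static void backoff_delay(int b)
{
    for (volatile int i = b; i; i--)
        ;
}
```

Starting from `BACKOFF_BASE`, successive delays under these assumed values would be 64, 128, 256, 512 and then 1024 for every later iteration.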
In step 404, the content of the remote lock is checked. If the content of the remote lock indicates the remote lock is in a free state (step 406), the content of the remote lock is changed (i.e., written to) in step 408 to indicate the remote state. In one embodiment, steps 404, 406 and 408 may be performed atomically by executing a compare and swap (CAS) instruction. The CAS (address, expected value, new value) instruction atomically checks the content of the addressed memory location to see if it matches an expected value and, if so, replaces it with a new value. It is noted that in other implementations, other specific instructions may be utilized to determine whether the remote lock is in a free state and, if so, to change the state of the remote lock to the remote state. For example, in other implementations, a test and set operation or a swap operation may be utilized. Still other implementations may employ other atomic memory primitive operations and/or non-atomic memory primitive operations (e.g., load-linked (or load-locked), store-conditional operations).
If, on the other hand, it is determined in step 406 that the remote lock was not in the free state, a loop delay is introduced in step 410, and steps 404 and 406 are repeated. Like the foregoing, the loop delay provided in step 410 may vary over successive iterations. For example, in one embodiment, the delay introduced in step 410 increases exponentially, as desired. It is noted that a thread performing the Acquire Remote Routine will remain in the loop until the remote lock becomes free and its state is changed to remote. When this occurs, the processor spinning on the remote lock has acquired the lock, and the routine ends.
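A single pass through steps 404, 406 and 408 can be sketched with the C11 compare-exchange operation. The state values and the function names `cas` and `try_acquire_remote` are assumptions for illustration. The wrapper returns the value found in memory, so a caller can tell whether the exchange took place.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical state values; the description does not give an encoding. */
#define FREE   0x10000UL
#define REMOTE 0x10002UL

/* cas(address, expected, new): returns the value found at addr; the store
   of new_value happens only if that value equals expected. */
static unsigned long cas(atomic_ulong *addr, unsigned long expected,
                         unsigned long new_value)
{
    /* On failure C11 writes the observed value into expected; on success
       expected already equals the old value. */
    atomic_compare_exchange_strong(addr, &expected, new_value);
    return expected;
}

/* True if the remote lock was free and is now marked remote for the caller. */
static bool try_acquire_remote(atomic_ulong *remote_lock)
{
    return cas(remote_lock, FREE, REMOTE) == FREE;
}
```

A thread in the Acquire Remote Routine would call `try_acquire_remote` repeatedly, inserting the loop delay of step 410 after each failed attempt.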
If the content of the local lock contains a value equaling the id of the releasing thread (e.g., “my_thread id”), the local lock is released in the free state in step 510 (i.e., by writing to the local lock). Alternatively, if the content of the local lock does not contain the id of the releasing thread, the local lock is released in the local free state in step 512. In one embodiment, steps 506, 508 and 510 are performed atomically by executing a compare and swap instruction. In alternative embodiments, other atomic memory primitive operations and/or non-atomic memory primitive operations may be performed.
Thus, it is noted that if another processor within the same node attempted to acquire the lock while it was held by the releasing processor, the lock will not contain a value equaling the id of the releasing thread (e.g., my_thread id), but will instead contain a value equaling the id of the thread that attempted to acquire the lock. Accordingly, based on this, if another processor within the same node attempted to acquire the lock, the releasing thread will release the lock in the local free state in step 512. As stated previously, when released in the local free state, only processors within the same node will be able to acquire the lock. When the thread executing on that processor later acquires the lock in steps 304 and 308 of the Acquire Slow Path Routine, it may ultimately release the lock in the free state, if no other processors within the same node are continuing to attempt to acquire the lock. In this manner, it is more likely that a contended lock will be handed over to threads executing within the same node.
It is noted that the Be_fair value is calculated randomly, and may be provided to prevent excessive bias towards processors within the same node. In the depicted implementation, based on the randomly generated Be_fair value, a processor releasing the lock is selectively forced to release the lock in the free state thus allowing processors in remote nodes to acquire the lock regardless of whether other local processors have attempted to acquire the lock. It is further noted that while the Be_fair value is calculated in step 302 of the Acquire Slow Path Routine, the Be_fair value could be calculated at alternative points in the code execution, as desired.
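The release decision described above can be summarized in a short C sketch. The state values, the `FAIR_FACTOR` value, and the function names are assumptions for illustration, and the standard `rand()` function stands in for whatever random source an implementation would use.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical values; none are specified in the description. */
#define FREE        0x10000UL
#define L_FREE      0x10001UL
#define FAIR_FACTOR 8

/* Random fairness value: true in roughly one call out of FAIR_FACTOR. */
static bool compute_be_fair(void)
{
    return (rand() % FAIR_FACTOR) == 0;
}

/* Release the local lock and return the state it was released in. */
static unsigned long release_sketch(atomic_ulong *lock, unsigned long my_tid,
                                    bool be_fair)
{
    unsigned long expected = my_tid;

    if (be_fair) {                    /* forced fairness: open to all nodes */
        atomic_store(lock, FREE);
        return FREE;
    }
    /* The lock still holds our id only if no local thread tried to take it. */
    if (atomic_compare_exchange_strong(lock, &expected, FREE))
        return FREE;
    atomic_store(lock, L_FREE);       /* a local thread is waiting */
    return L_FREE;
}
```

With a larger `FAIR_FACTOR` the lock stays within a node for longer runs, and with a smaller one it is offered to remote nodes more often.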
The following code sequence may be used to implement one embodiment of the operations depicted in
typedef volatile unsigned long rh_lock;

/* my_tid, my_node_id, Be_fair, swap( ), cas( ), min( ),
   get_remote_lock_addr( ) and the upper-case constants are
   defined outside this listing. */

void rh_acquire(rh_lock *L)
{
    unsigned long tmp;

    tmp = swap(L, my_tid);              /* write our id, fetch old content */
    if (tmp == L_FREE || tmp == FREE)
        return;                         /* lock acquired */
    if (tmp == REMOTE) {
        rh_acquire_remote_lock(L);      /* lock is held in a remote node */
        return;
    }
    rh_acquire_slowpath(L);
}

void rh_acquire_slowpath(rh_lock *L)
{
    unsigned long tmp;
    int b = BACKOFF_BASE, i;

    if ((random( ) % FAIR_FACTOR) == 0)
        Be_fair = TRUE;
    else
        Be_fair = FALSE;

    while (1) {
        for (i = b; i; i--) ;           /* delay */
        b = min(b * BACKOFF_FACTOR, BACKOFF_CAP);
        if (*L < FREE)                  /* lock word still below FREE */
            continue;
        tmp = swap(L, my_tid);
        if (tmp == L_FREE || tmp == FREE)
            break;                      /* lock acquired */
        if (tmp == REMOTE) {
            rh_acquire_remote_lock(L);
            break;
        }
    }
}

void rh_acquire_remote_lock(rh_lock *L)
{
    int b = REMOTE_BACKOFF_BASE, i;

    L = get_remote_lock_addr(L, my_node_id);

    while (1) {
        if (cas(L, FREE, REMOTE) == FREE)
            break;                      /* remote lock changed to REMOTE */
        for (i = b; i; i--) ;           /* delay */
        b = min(b * BACKOFF_FACTOR, REMOTE_BACKOFF_CAP);
    }
}

void rh_release(rh_lock *L)
{
    if (Be_fair)
        *L = FREE;
    else {
        if (cas(L, my_tid, FREE) != my_tid)
            *L = L_FREE;                /* another local thread is waiting */
    }
}
In accordance with the foregoing, when multiple processors in different nodes of computer system 10 contend for a lock, the processors that spin in the node that currently owns the lock may have a greater likelihood of acquiring the lock. In addition, the number of processors spinning on a remotely held lock may be limited (e.g., in the above embodiment, only one processor may spin remotely). As a result, overall network traffic may be reduced, and the frequency of migration of the lock (and the data it is protecting) from node to node may be reduced. Overall improved performance may thus be attained.
While the number of processors spinning on a remotely held lock is limited in the above embodiment (e.g., to only one processor), other embodiments may be possible that do not limit this number. In addition, to avoid starvation when a lock is held in a remote node, the number of processors that spin remotely may be increased over time based on various factors, such as, for example, the expiration of a predetermined time period or a predetermined number of failed acquisition attempts. Similarly, the rate at which a processor spins on a remotely held lock may be increased over time.
Further alternative embodiments may be possible in which the number of processors spinning remotely is limited, but in which no local free state is implemented to prevent processors in remote nodes from obtaining the lock.
Various embodiments may further include receiving, sending or storing instructions and/or data that implement the spin lock implementations of any of
Although the embodiments above have been described in considerable detail, numerous variations and modifications will become apparent to those skilled in the art once the above disclosure is fully appreciated. It is intended that the following claims be interpreted to embrace all such variations and modifications.
Hagersten, Erik E., Radovic, Zoran
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Apr 24 2003 | Sun Microsystems, Inc. | (assignment on the face of the patent) | / | |||
Jul 03 2003 | RADOVIC, ZORAN | Sun Microsystems Inc | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 014422 | /0843 | |
Jul 03 2003 | HAGERSTEN, ERIK E | Sun Microsystems Inc | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 014422 | /0843 | |
Feb 12 2010 | ORACLE USA, INC | Oracle America, Inc | MERGER AND CHANGE OF NAME SEE DOCUMENT FOR DETAILS | 037304 | /0151 | |
Feb 12 2010 | Sun Microsystems, Inc | Oracle America, Inc | MERGER AND CHANGE OF NAME SEE DOCUMENT FOR DETAILS | 037304 | /0151 | |
Feb 12 2010 | Oracle America, Inc | Oracle America, Inc | MERGER AND CHANGE OF NAME SEE DOCUMENT FOR DETAILS | 037304 | /0151 |