A reference-counting garbage collection system utilizing overlooking roots injects eager reference-counting updates into a program. An overlooking roots reference-counting subsumption optimization tool identifies redundant reference-counting updates and removes them, lowering the number of reference-counting update calls and improving execution throughput. The optimization tool also introduces new overlooking root relationships to permit other unnecessary reference-counting updates to be removed during optimization. Reference-counting updates that are specialized based on the overlooking root information are also included in the program.
1. A method for modifying a program to employ reference-counting garbage collection, the method comprising:
generating one or more sets of overlooking roots for one or more respective points in the program, wherein generating one or more sets of overlooking roots comprises (a) and (b):
(a) computing tie functions for procedures in the program, wherein a tie function is a function that, for two roots, gives a set of fields and array types that tie the roots together; and
(b) updating the tie functions, the updating comprising including additional fields or array types that tie the two roots together, the tie functions being updated for relevant statements in the program based on a categorization of the relevant statements, at least one of the relevant statements being categorized into at least one of the following categories: allocations and heap stores; and
introducing reference-counting updates into the program based at least in part using the one or more sets of overlooking roots;
wherein generating one or more sets of overlooking roots comprises performing a fixed-point computation to arrive at a stable set of overlooking roots; and wherein performing the fixed-point computation comprises (i)-(v):
(i) computing a killed overlooking roots set consulting an incoming overlooking roots set and a tie function;
(ii) computing a generated overlooking roots set from a modified version of the incoming overlooking roots set;
(iii) weakly updating the tie function for a procedure;
(iv) computing a meet operation at confluence points; and
(v) repeating (i)-(iv) until a stable set of overlooking roots is calculated.
13. One or more computer-readable storage media comprising computer-executable instructions for performing a method for creating a computer-executable program that employs reference-counting garbage collection, the method comprising:
creating a control-flow graph for an input program;
analyzing the control-flow graph using tie functions to determine overlooking root reference-counting subsumption live ranges, wherein a tie function is a function that, for two roots, gives a set of plural fields and plural array types that tie the two roots together, the tie functions being updated for relevant statements in the input program according to categories of the relevant statements, the categories comprising simple assignments, allocations, heap loads, heap stores, and procedure invocations, and wherein updating the tie functions comprises including additional fields or array types that tie the two roots together; and
inserting reference-counting update commands for reference updates outside of the determined live ranges;
wherein determining overlooking root reference-counting subsumption live ranges comprises performing a fixed-point computation to arrive at a stable set of overlooking roots; and wherein, for a program statement, performing the fixed-point computation comprises repeating (i)-(iv) below until a stable set of overlooking roots is calculated:
(i) computing a killed overlooking roots set consulting an incoming overlooking roots set and a tie function;
(ii) computing a generated overlooking roots set from a modified version of the incoming overlooking roots set;
(iii) weakly updating the tie function for a procedure; and
(iv) computing a meet operation at confluence points.
9. One or more computer-readable storage media comprising a computer-executable system that utilizes reference-counting garbage collection from source code, the system comprising:
a first compiler module configured to represent the source code as an intermediate representation;
an overlooking roots analysis module configured to analyze program statements in the intermediate representation using tie functions to determine sets of overlooking roots for roots in the program statements, wherein a tie function is a function that, for two roots, gives a set of plural fields and plural array types that tie the roots together, the tie functions being updated to include additional fields or array types that tie the two roots together, the tie functions being updated for relevant statements based on a categorization of the relevant statements, wherein at least one of the relevant statements is categorized as an allocation, at least one of the relevant statements is categorized as a heap store, and at least one of the relevant statements is categorized as a procedure invocation;
a live range analysis module configured to determine live ranges based on the sets of overlooking roots; and
an overlooking root RC subsumption optimization module configured to optimize redundant reference-counting updates based at least in part on the determined live ranges;
wherein the overlooking roots analysis module is further configured to, for one or more statements in the program:
compute a set of overlooking roots for the statement using a tie function for the procedure that contains the statement;
update a tie function; and
repeat the computing and updating until the module arrives at a stable set of overlooking roots.
16. One or more computer-readable storage media comprising computer-executable instructions for performing a method for modifying a program to employ reference-counting garbage collection, the method comprising:
generating one or more sets of overlooking roots for one or more points in the program, wherein generating one or more sets of overlooking roots comprises (a) and (b):
(a) computing tie functions for procedures in the program, wherein a tie function is a function that, for two roots, gives a set of fields and array types that tie the two roots together; and
(b) updating the tie functions to include new fields or array types that tie the two roots together, the tie functions being updated for relevant statements in the program based on a categorization of the relevant statements, the relevant statements being categorized into at least one of the following five categories: simple assignments, allocations, heap loads, heap stores, and procedure invocations, such that at least one of the relevant statements is categorized into each of the five categories; and
introducing reference-counting updates into the program based at least in part using the one or more sets of overlooking roots;
wherein generating one or more sets of overlooking roots comprises performing a fixed-point computation to arrive at a stable set of overlooking roots and wherein, for a program statement, performing the fixed-point computation comprises (i)-(v):
(i) computing a killed overlooking roots set consulting an incoming overlooking roots set and a tie function;
(ii) computing a generated overlooking roots set from a modified version of the incoming overlooking roots set;
(iii) weakly updating the tie function for a procedure;
(iv) computing a meet operation at confluence points; and
(v) repeating (i)-(iv) until a stable set of overlooking roots is calculated.
2. The method of
3. The method of
4. The method of
5. The method of
8. The method of
10. The system of
11. The system of
12. The system of
if the reference is overlooked by a live root at every point in the live range; and
if the reference is not redefined in the live range.
14. The computer-readable storage media of
for each of one or more program points, iterating through an analysis which successively expands an overlooking roots set until the overlooking roots set is stable.
15. The computer-readable storage media of
determining, for a given local reference, one or more live ranges for the local reference for which the local reference is not redefined and for which there is an overlooking live root at every point in the live range.
The vast majority of computer systems allow programs to dynamically allocate memory to data structures during execution. While dynamic allocation provides flexibility to programmers, systems which allocate memory must also find a way to identify and deallocate memory locations that are no longer being used during execution. Such techniques, which are generally known as garbage collection, allow for efficient use of memory, and prevent programs from running out of resources.
The efficiency of garbage collection schemes is often measured by "throughput" and "pause time" metrics. Generally, "throughput" refers to the performance of a program under a particular garbage collection technique. Specifically, the throughput of a program can be measured as the inverse of its execution time while using a particular garbage collection scheme. Alternatively, throughput can be measured as the amount of memory reclaimed per unit of program execution time. In the description that follows, throughput refers to the former measure. Pause time, by contrast, is the amount of time during which the main program is prevented from executing while a garbage collector locates and reclaims memory.
Garbage collection methods are typically distinguished by how they identify memory locations that can no longer be reached during execution and by how this identification affects throughput and pause time. For example, one collection technique, called indirect collection, periodically pauses execution of a main program in order to traverse memory references and identify memory locations that are no longer reachable by the program. While indirect-collection techniques usually show relatively high throughput, because they combine reclamation of many memory locations into a single traversal, they tend to have high, and often unbounded, pause times.
By contrast, another technique, known as reference-counting ("RC") garbage collection, reclaims memory using a count maintained against each logically independent unit of data; for example, a count ρ(x) is maintained against a unit of data x. In this example, ρ(x) tallies the references to x and changes as references to x are added and deleted. These count increments and decrements are referred to herein generally as "RC updates." A ρ(x) value of zero means that there are no references to x, at which point it is safe to reclaim x. RC techniques are generally superior to indirect-collection techniques in the pause time metric, because garbage collection calls are usually of bounded time. However, because these techniques call garbage collection routines frequently, throughput can suffer.
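As a minimal sketch of the counting idea only, and not of any particular collector, the following Python models ρ(x) as one dictionary entry per object. The class and method names are hypothetical, new objects are assumed to start with a count of zero, and a real collector would also decrement the counts of objects referenced by a reclaimed object.

```python
class RefCountHeap:
    """Toy model of reference counts: rho maps each live object to its count."""

    def __init__(self):
        self.rho = {}        # object -> current reference count rho(x)
        self.reclaimed = []  # objects reclaimed, in reclamation order

    def allocate(self, obj):
        self.rho[obj] = 0    # assumption: a fresh object starts with no references
        return obj

    def rc_increment(self, obj):
        if obj is not None:  # updates on null references are no-ops
            self.rho[obj] += 1

    def rc_decrement(self, obj):
        if obj is None:
            return
        self.rho[obj] -= 1
        if self.rho[obj] == 0:   # no references remain: safe to reclaim
            del self.rho[obj]
            self.reclaimed.append(obj)


heap = RefCountHeap()
x = heap.allocate("x")
heap.rc_increment(x)   # reference l1 := x created
heap.rc_increment(x)   # reference l2 := x created
heap.rc_decrement(x)   # l1 overwritten: rho(x) == 1
heap.rc_decrement(x)   # l2 goes out of scope: rho(x) == 0, x reclaimed
print(heap.reclaimed)  # ['x']
```

Each RC update here is a separate call, which is why a program that performs many reference assignments also performs many such calls.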
Moreover, traditional RC implementations are typically based on a reachability view of memory management. That is, RC updates are applied only when references are actually created or destroyed (destruction occurring either through a redefinition or through a reference going out of scope), or later. This can cause garbage objects to be held long after the references to them are last used, resulting in a program consuming more memory than needed.
Thus, there remains room for improving the execution time and peak memory usage characteristics of the RC garbage collection technique.
A systematic compiler-oriented methodology for inserting and optimizing RC increments and decrements (collectively referred to as RC updates) is described. The methodology takes into account stack reference lifetimes determined through static program analysis to update the stack contribution to reference counts more eagerly than in the traditional, nondeferred, reachability-based style of RC collection (herein referred to as “classic” RC collection). The methodology has been couched in general terms to cover modern object-oriented instruction sets and features such as exceptions, interior pointers and object pinning.
An optimization called "overlooking reference-counting subsumption" ("ORCS") is also described, which statically identifies and eliminates redundant RC updates on stack references based on an overlooking roots analysis. This optimization can significantly reduce the number of garbage collection calls and improve the throughput of the eager RC collection method described above, as well as that of classic RC collection. In addition, further optimizations are described that introduce new overlooking root relationships in order to enable unnecessary reference-counting updates to be removed during optimization. Among these are optimizations that remove updates to RC chained roots and immortal roots. Optimizations are also described for specializing reference-counting updates based on the overlooking root information.
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
Additional features and advantages will be made apparent from the following detailed description of embodiments that proceeds with reference to the accompanying drawings.
Some existing RC garbage collection techniques ensure an always up-to-date value for ρ(x). That is, using these techniques, an object's reference count is increased or decreased immediately before a reference to it is created or destroyed. Another class of RC techniques maintains ρ(x) lazily and thus does not necessarily update reference counts immediately when a reference is created or destroyed. These lazy techniques are sometimes used to lower the number of calls made to RC functions, which improves throughput. The trade-off, however, is that these lazy techniques potentially allow unused memory to sit for an unacceptably long period of time without being reclaimed.
These techniques contrast with what can be called "eager" RC techniques. Eager techniques are a kind of nondeferred RC garbage collection that can update ρ(x) ahead of references actually materializing and disappearing. For example, if a reference l to an object x is no longer used, then ρ(x) can be decremented before l is overwritten or goes out of scope. Such a technique, if efficiently implemented, could provide more immediate reclamation of memory than existing RC techniques while preserving reference counting's generally well-regarded pause times. Additionally, if a significant number of redundant RC updates could be identified and eliminated before execution, the number of garbage collection calls made during execution could be reduced, improving the throughput of RC techniques generally.
In the past, work was done relating to modifying a program to support eager RC garbage collection and to support limited elimination of redundant RC updates. These techniques, examples of which are described herein, can process a compiler's internal representation (“IR”) of a program to produce a normal form of the program and then perform a liveness analysis on the program to determine reference lifetimes and thus points at which RC updates may be performed eagerly. Then RC updates are injected into the IR to support garbage collection based on the reference lifetimes. Through this analysis and injection at the proper points, the eager RC techniques described herein provide quicker reclamation of memory than other RC techniques, while still providing the bounded pause times which are a hallmark of RC garbage collection. These techniques and systems can be integrated into a compiler, providing garbage collection support during compilation.
Additionally, subsumption techniques utilizing overlooking roots are described which utilize a static program analysis to determine references that are subsumed by other references, and whose RC updates are thus redundant. Previous techniques generated an RC subsumption graph which identified subsumed references. RC updates on these references are then eliminated, reducing the number of RC calls in the program, and thus improving throughput.
These techniques, however, are needlessly restrictive and miss certain types of redundant updates. Techniques described herein improve the detection of subsumed references by using a broader definition of potentially removable updates, based on the concept of overlooking roots. Informally, a root x is said to "overlook" a root y if whatever is reachable from y is also reachable from x. In one implementation, a "root" is a local or static reference. Subsumption processes based on overlooking roots, also known as "overlooking reference-counting subsumption" or "ORCS," as described herein, are more liberal in their identification of removable RC updates and thus improve performance by excising additional redundancy.
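The informal definition of "overlook" can be made concrete with a small Python sketch. It assumes a heap snapshot given as a dictionary from each object to the objects its fields reference, and roots given as a mapping from root names to objects (None for null). This is only a dynamic check on one snapshot; the analysis described herein establishes the relation statically, for all executions, without inspecting the heap. All names are hypothetical.

```python
from typing import Dict, List, Optional, Set

Heap = Dict[str, List[str]]  # object -> objects referenced by its fields


def reachable(heap: Heap, start: Optional[str]) -> Set[str]:
    """All objects reachable from start (including start itself)."""
    seen: Set[str] = set()
    stack = [start] if start is not None else []
    while stack:
        obj = stack.pop()
        if obj not in seen:
            seen.add(obj)
            stack.extend(heap.get(obj, []))
    return seen


def overlooks(heap: Heap, roots: Dict[str, Optional[str]], x: str, y: str) -> bool:
    """True if whatever is reachable from root y is also reachable from root x."""
    return reachable(heap, roots[y]) <= reachable(heap, roots[x])


heap = {"list": ["node1"], "node1": ["node2"], "node2": []}
roots = {"x": "list", "y": "node1", "z": None}
print(overlooks(heap, roots, "x", "y"))  # True
print(overlooks(heap, roots, "y", "x"))  # False
```

Because a null root reaches nothing, every root trivially overlooks it in this model.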
1. Examples of Supported Language Features
The techniques and systems described herein will generally be described with reference to the IR of an input program. This is done because the techniques described herein are generally not language-specific, and also because the techniques can be readily integrated into compilation procedures by being based on the manipulation of a compiler IR.
In various implementations, RC updates could be inserted into a compiler's intermediate representation at various points in a pipeline of phases: either when the IR is at a high level, medium level or after it has been lowered into whichever native code the compiler is configured to produce. Inserting RC updates into the high-level IR permits optimization opportunities that may not be identifiable in other IRs. Implementations that modify high-level IRs must also ensure that downstream phases are aware of RC updates and preserve the invariants that their insertion imposes. In alternative implementations, the analysis and insertion of RC updates could be performed on source programming language code rather than an intermediate representation.
At the IR level, the techniques described herein assume that there are two kinds of pointers relevant to garbage collection: references, which resemble the object references of Java and C#, and interior pointers (“IPs”), which resemble the managed pointers of .NET. Typically, interior pointers are similar to conventional pointers in that they are dereferenceable. However, they are associated with strong typing information and have only a limited set of operations.
As far as logical units of data are concerned in the description herein, there are two kinds: objects that reside on the heap (including arrays), and value types (like struct types) that reside on the stack. While references point to the beginning of objects, interior pointers can point to an object's beginning as well as specific places in the middle, such as, for example, fields and array elements. IPs can also point into the static data area and the stack, in which case they must target the beginning of a value type, a field thereof, or a reference. In one implementation, the syntax S of their definitions is determined by the following exemplary grammar productions:
In these grammar productions, L is the set of local value-type and reference variables, R is the set of local reference variables, T is the set of interior pointers (which are local variables), W is the set of local reference and interior pointer variables (i.e., W=R∪T), I is the set of integer-valued expressions, and F is the set of static, object and value-type fields. unbox is an operator that takes a reference to an object-version of a value type (sometimes called a boxed value type) and returns an interior pointer to its beginning. The “member access” operator (‘.’) extracts a field of an object or a value-type instance, given a reference or an interior pointer to it. The “address of” operator (‘&’) returns the address of a variable, field or array element. Thus, &(L.F) is an interior pointer to a field of a heap object or a field of a value-type stack instance, and &.F is an interior pointer to a static field.
In implementations supporting the grammar productions listed above, interior pointers cannot be stored into the heap or into a field of a value type, and cannot be returned from a function. However, both references and interior pointers are allowed to be passed into functions. These restrictions are similar to those in the .NET standard of Microsoft Corporation. While the productions above do not cover all possible ways of defining interior pointers in .NET (for example, conversions from so-called unmanaged pointers to interior pointers), alternative eager RC and subsumption implementations can be modified in a straightforward manner to deal with these additional cases. Additionally, in an alternative implementation, the techniques described herein can be extended to support a language that allows IPs to be returned from a function.
Some of the descriptions herein also assume a service called findstart(p) provided by a garbage collection allocator that returns a reference to the start of the object enveloping the interior pointer p if p points into the heap, and null otherwise.
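An allocator could implement findstart in many ways. One simple sketch, assuming the allocator keeps the start addresses of live objects sorted and that objects do not overlap, uses a binary search. Addresses are modelled as integers and null as None; ToyHeap is a hypothetical name, not part of any described allocator.

```python
import bisect
from typing import List, Optional, Tuple


class ToyHeap:
    """Hypothetical allocator metadata: (start, size) of each live object."""

    def __init__(self, objects: List[Tuple[int, int]]):
        self.objects = sorted(objects)             # assumed non-overlapping
        self.starts = [start for start, _ in self.objects]

    def findstart(self, p: int) -> Optional[int]:
        """Start of the object enveloping interior pointer p, or None."""
        i = bisect.bisect_right(self.starts, p) - 1  # last object starting at or before p
        if i >= 0:
            start, size = self.objects[i]
            if start <= p < start + size:
                return start
        return None                                # p is not inside any heap object


heap = ToyHeap([(0x1000, 32), (0x1040, 16)])
print(hex(heap.findstart(0x1008)))  # 0x1000
print(heap.findstart(0x1030))       # None (gap between the two objects)
```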
The techniques described herein allow any variable v that is either an interior pointer or a reference to carry an attribute, called pinned, that prevents the garbage collector from reclaiming (or moving) any object that v may point to until v's redefinition or until the end of v's lexical scope.
In various language implementations, value types can contain references. These need to be properly accounted for in compiler-assisted RC collection schemes. Rather than treating them specially, and for the sake of uncluttered explanation, the techniques can assume the execution of a value-type "unwrapping" phase before the RC update insertion phase. This phase replaces every field-possessing value-type variable with a series of variables corresponding to the primitive value-type fields and reference fields directly or indirectly embedded in it. This unwrapping can adjust the signatures of functions that accept or return reference-embedding value types.
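The unwrapping phase can be sketched as a recursive flattening of a value-type variable. The sketch assumes a hypothetical type encoding in which a leaf type is a string (such as "ref" or "int") and a value type is a dictionary from field names to field types. Generated names like v_inner_next are illustrative only; a compiler would use fresh temporaries and would also rewrite the uses and function signatures involved.

```python
from typing import Dict, List, Tuple, Union

# A leaf type is a string; a value type maps field names to field types.
TypeDesc = Union[str, Dict[str, "TypeDesc"]]


def unwrap(var: str, ty: TypeDesc) -> List[Tuple[str, str]]:
    """Replace a value-type variable by one variable per embedded leaf field."""
    if isinstance(ty, str):
        return [(var, ty)]                 # primitive or reference: keep as-is
    flat: List[Tuple[str, str]] = []
    for field_name, field_ty in ty.items():
        # Nested value types are flattened recursively.
        flat.extend(unwrap(f"{var}_{field_name}", field_ty))
    return flat


pair = {"key": "int", "val": "ref", "inner": {"next": "ref", "count": "int"}}
print(unwrap("v", pair))
```

After this phase, every reference that used to be buried in a value type is an ordinary local reference, so the later stages need only handle local references.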
Finally, the techniques support languages where statements in the IR can throw exceptions. Excluding function call instructions, it is assumed herein that when a statement throws an exception, it does so without discharging any of the external state side-effect actions that it would normally perform in the course of program execution, with the external state being the heap, stack and static data. The action of throwing an exception could be implicit, such as when the divisor in a division instruction is zero. For explicitly throwing exceptions, it can be assumed that the IR provides a throw statement, which is allowed to occur only at the end of basic blocks.
2. Examples of Eager Reference Counting and Subsumption Architectures
It should be noted that, as used in this application, the terms “optimize,” “optimized,” “optimization” and the like are terms of art that generally refer to improvement without reference to any particular degree of improvement. Thus, in various scenarios, while an “optimization” may improve one or more aspects of the performance of a system or technique, it does not necessarily require that every aspect of the system or technique be improved. Additionally, in various situations, “optimization” does not necessarily imply improvement of any aspect to any particular minimum or maximum degree. Finally, while an “optimized” system or technique may show performance improvement in one or more areas, it may likewise show a decrease in performance in other areas. In the particular circumstances described below, while optimizations will result in the removal of redundant or superfluous RC updates, possibly providing increased performance, these optimizations should not imply that every possible RC update will be identified or removed.
As
In a typical CFG implementation, nodes in the CFG are basic blocks and arcs depict the control flow between them. CFG edges are of two types: normal arcs that denote the normal flow of control from the end of one basic block to the beginning of another, and exception arcs that represent the flow of control from anywhere within a basic block to the header block of an exception handler. In one implementation, exception header blocks contain a special statement called an exception assignment that catches and assigns the thrown exception to an exception variable. This statement is assumed to have the form x:=catch( ), where catch is an IR opcode, and is classified as a function call instruction for the purposes of this description.
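A minimal sketch of such a CFG representation distinguishes the two arc kinds by storing them in separate successor lists. It assumes statements are kept as plain strings, whereas a real compiler IR would use structured instructions; the class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)
class BasicBlock:
    name: str
    statements: List[str] = field(default_factory=list)
    # Normal arcs: from the end of this block to the start of a successor.
    normal_succs: List["BasicBlock"] = field(default_factory=list)
    # Exception arcs: from anywhere in this block to a handler's header block.
    exception_succs: List["BasicBlock"] = field(default_factory=list)

    def is_exception_header(self) -> bool:
        # Handler headers begin with an exception assignment x := catch().
        return bool(self.statements) and self.statements[0].endswith(":= catch()")


body = BasicBlock("B1", ["r := new T", "y := a / b"])
handler = BasicBlock("H", ["e := catch()", "throw e"])
body.exception_succs.append(handler)   # a / b may throw on a zero divisor
print(handler.is_exception_header(), body.is_exception_header())  # True False
```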
After creation of an IR 125, the IR 125 is passed to an RC injection module 130 that serves to add instrumentation for eager RC garbage collection, and then to an overlooking-root-based optimizations module 140 where RC updates on RC-subsumed references are identified utilizing an overlooking roots analysis and then removed. Moreover, in some implementations, additional optimizations based on the overlooking roots information are performed by this module. Particular implementations of these processes will be described in greater detail below. Finally, the IR with RC instrumentation added to it is passed to a second compiler module 150 for compilation into the executable program 160.
The illustrated RC injection module 130 of
The illustrated overlooking-root-based optimizations module 140 of
The second module illustrated in
The third illustrated module is the overlooking roots subsumption module 260, which again uses overlooking root information developed by the module 240 to identify RC updates that can be culled and to remove these redundant RC updates.
The final illustrated module is the RC update specialization and removal module 270. This module utilizes overlooking roots information gained from the overlooking roots analysis module 240 to substitute specialized versions of RC updates for traditional ones. In various implementations, the overlooking roots relation can be used to encode state or other information, which can eliminate the need for certain checks performed during RC updates. Thus, simplified versions can be substituted which do not implement these checks, improving the efficiency of those particular updates. Additionally, the module 270 also removes extraneous updates which are known to be extraneous due to particular overlooking root information, but which were not removed by the processes of the overlooking roots subsumption module 260. Particular examples of processes for performing the functions of the modules described above are discussed in detail below.
3. Examples of Eager RC Transformation Processes
Generally, the processes described herein for inserting eager RC garbage collection instrumentation comprise three stages.
The process begins at block 420, where the preprocessing module 210 preprocesses the IR, produced by the compiler 100, into a normal form. This normal form provides that references returned from functions are not lost; if these references were not captured, memory leaks could arise. The normal form also provides that the actual-to-formal copying of reference parameters at call sites is automatically handled at later stages and that the definitions and deaths of interior pointers can be ignored by the later stages.
Next, at block 440, the liveness analysis module 220 performs a live-range analysis on local references, modified to model the object lifetime semantics of pinned references. In one implementation, this second stage can be implemented using known live-range analysis techniques which are modified to handle the semantics of pinned references. Next, the RC injection module 230 introduces RC updates against local and heap references, their placement being guided by the liveness information previously derived in the second stage.
3.1 Preprocessing Examples
where ƒ is a function that returns a reference, with an IR statement
where ṙ is a compiler-generated temporary.
Next, at block 520, the preprocessing module 210 introduces fake initializations of formal references. This is performed, in one implementation, by inserting initializations of the form
Next, at process 530, the preprocessing module 210 pairs every IP with a compiler-generated reference called a shadow at various program points. In one implementation, this is done by preceding every definition of an IP with a definition that assigns its shadow to the start of its enveloping object. In addition, a pinned attribute on an IP is carried over to its shadow. The shadowing procedure also comprises following each use of an IP by a fake use of its shadow. In this way, the later stages of the RC injection processes can ignore IPs while knowing that any memory management of objects pointed to by IPs is taken care of by management of the shadows.
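The shadowing step can be sketched over a toy straight-line IR. The sketch assumes each statement records which interior pointer it defines (with the address expression used), which interior pointers it uses, and the pinned attribute of the variable it defines. For simplicity it always defines the shadow with findstart; the specialized forms described below avoid that call where the enveloping object is known statically. The Stmt class and the p~ naming are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Stmt:
    text: str
    ip_def: Optional[str] = None    # interior pointer defined here, if any
    addr: str = ""                  # address expression assigned to ip_def
    ip_uses: Tuple[str, ...] = ()   # interior pointers read here
    pinned: bool = False            # pinned attribute of the defined variable


def shadow(ip: str) -> str:
    return ip + "~"                 # hypothetical name for the shadow of ip


def add_shadows(stmts: List[Stmt]) -> List[Stmt]:
    out: List[Stmt] = []
    for s in stmts:
        if s.ip_def is not None:
            # Precede the IP definition with a definition of its shadow,
            # carrying the pinned attribute over to the shadow.
            out.append(Stmt(f"{shadow(s.ip_def)} := findstart({s.addr})",
                            pinned=s.pinned))
        out.append(s)
        for p in s.ip_uses:
            # Follow each use of an IP with a fake use of its shadow.
            out.append(Stmt(f"fakeuse({shadow(p)})"))
    return out


prog = [
    Stmt("p := &(a[i])", ip_def="p", addr="&(a[i])", pinned=True),
    Stmt("x := p.f", ip_uses=("p",)),
]
for s in add_shadows(prog):
    print(s.text)
```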
Different kinds of IP definitions involve different methods of creating shadows. For instance, if p̃ is the shadow of an interior pointer p, then in one implementation, the preprocessing module 210 inserts an assignment against a definition of p that points it into an array in the heap as follows. (Please note that for the sake of illustration, in this and subsequent examples, IR statements which are inserted by the described techniques will be denoted with the symbol .)
Note from the syntax description above that r is a local reference variable. If, by contrast, p were defined to point into the stack (for example, by assigning the address of r to it), then in one implementation, the following code would be produced:
Other kinds of definitions involving the address-of and unbox operators can be similarly dealt with. In another example implementation, to handle definitions involving an offset calculation on an interior pointer, the compiler inserts basic blocks with the following code:
p := q ± e

is replaced by:

w := (q ± e) − q̃
if w ≥ 0 ∧ w < sz
    p̃ := q̃
else
    p̃ := findstart(q ± e)
end
p := q ± e
In this insertion example, p̃ and q̃ are the shadows of the interior pointers p and q, e is an integer-valued expression, and sz is the statically determined size of the object pointed to by q̃.
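The inserted blocks can be mirrored in Python as a sketch, modelling addresses as integers and null as None. The explicit None check is an addition for this sketch: it sends a null shadow (for example, when q points into the stack) straight to findstart instead of relying on arithmetic against null. The function names and parameters are hypothetical; findstart is passed in as a callable.

```python
from typing import Callable, Optional


def shadow_after_offset(q: int, q_shadow: Optional[int], e: int, sz: int,
                        findstart: Callable[[int], Optional[int]]) -> Optional[int]:
    """Shadow of p for the statement p := q + e (e may be negative).

    q_shadow is the shadow of q (start of its enveloping object, or None);
    sz is the statically known size of that object.
    """
    p = q + e
    if q_shadow is not None:
        w = p - q_shadow          # offset of p from the start of q's object
        if 0 <= w < sz:
            return q_shadow       # p stays inside q's object: reuse q's shadow
    return findstart(p)           # otherwise ask the allocator


def toy_findstart(addr: int) -> Optional[int]:
    # Hypothetical heap with two objects: [1000, 1032) and [2000, 2016).
    for start, size in ((1000, 32), (2000, 16)):
        if start <= addr < start + size:
            return start
    return None


print(shadow_after_offset(1008, 1000, 8, 32, toy_findstart))     # 1000
print(shadow_after_offset(1008, 1000, 1000, 32, toy_findstart))  # 2000
```

The fast path avoids a findstart call in the common case where pointer arithmetic stays within the same object.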
As mentioned above, the process of shadowing interior pointers also includes introducing a fake use of a shadow after each use of an interior pointer in the IR:
It should also be noted that shadow references need not be passed into functions that take in IP parameters because every IP formal parameter is guaranteed after preprocessing to have an associated shadow reference up the call stack whose lifetime subsumes the call. This subsumption makes RC updates on the IP parameter redundant, and thus not necessary. In another implementation, shadow references could be passed into functions, realizing a nondeferred RC scheme that is more aggressive in reclaiming storage.
3.2 Examples of Live-Range Analysis
The live-range analysis is the second stage of the eager RC instrumentation process.
At block 610, a default exception handler is created against the currently analyzed statement for every exception that it could implicitly throw and for which a handler does not already exist. The default handler simply catches and re-throws the exception via the throw statement.
Next at block 620, fake uses for pinned references are added. This is done because an RC decrement cannot be inserted after the last use of a pinned reference r since the object that it targets must be held until its redefinition or until the end of its lexical scope. Furthermore, simply considering r as live throughout a function is not sufficient because an RC decrement is needed just before each of r's redefinitions. Instead, the live ranges of r need to be stretched so that they span the definition points of r and so that they extend until the end of the body of the function that r is found in. This can be done by (a) introducing a fake use of r into each statement that must define r, and by (b) introducing fakeuse(r) as the last statement in basic blocks that return control from the function. After this extension and the ensuing liveness calculations, the insertion process performed by the RC injection module 230 automatically achieves the pinned semantics for r.
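Both extensions can be sketched on a toy block structure, assuming each statement carries its must-define and use sets and each block records whether it returns from the function. The class and function names are hypothetical. Adding r to the use set of a statement that defines r makes r live just before that statement, so the old target of r stays live up to the redefinition, while the trailing fake use keeps the last value live to the end of the function body.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Stmt:
    text: str
    must_defs: Set[str] = field(default_factory=set)
    uses: Set[str] = field(default_factory=set)


@dataclass
class Block:
    stmts: List[Stmt]
    returns: bool = False   # True if this block returns control from the function


def stretch_pinned(blocks: List[Block], pinned: Set[str]) -> None:
    """Extend live ranges of pinned references in place."""
    for b in blocks:
        for s in b.stmts:
            # (a) a fake use of r in every statement that must define r
            s.uses |= s.must_defs & pinned
        if b.returns:
            # (b) fakeuse(r) as the last statement of each returning block
            for r in sorted(pinned):
                b.stmts.append(Stmt(f"fakeuse({r})", uses={r}))


b1 = Block([Stmt("r := new T", must_defs={"r"}),
            Stmt("y := r.f", must_defs={"y"}, uses={"r"})])
b2 = Block([Stmt("r := new T", must_defs={"r"})], returns=True)
stretch_pinned([b1, b2], {"r"})
print([s.text for s in b2.stmts])  # ['r := new T', 'fakeuse(r)']
```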
At block 630, definition and usage sets are generated for the current statement. In one implementation, for a statement s of a basic block, the sets defsmust(s) and usesmay(s) are defined as the sets of local references that must be defined at s and that may be used at s, respectively.
Finally, at block 640, the sets of references that are live at a statement, and that die across it, are generated. In one implementation, this is performed based on the following equation, which relates the local references that are live before and after the statement s:
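$$\mathit{live}_{\mathit{in}}(s) = \bigl(\mathit{live}_{\mathit{out}}(s) - \mathit{defs}_{\mathit{must}}(s)\bigr) \cup \mathit{uses}_{\mathit{may}}(s)$$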
This equation is applied on the function's statements in reverse order, starting from its exit basic block and proceeding to its entry basic block. For the exit basic block, two kinds of references are considered live at its end: (1) those returned from the function, and (2) those expressly thrown (using the throw statement) from basic blocks lacking handlers for the exception.
From the above sets, the set of local references that die across a statement s is
$$\mathit{dieacross}(s) = \bigl(\mathit{live}_{\mathit{in}}(s) \cup \mathit{defs}_{\mathit{must}}(s)\bigr) - \mathit{live}_{\mathit{out}}(s)$$
Hence dieacross(s) is exactly the set of references against which RC decrements are required just after s, assuming three conditions hold: (1) heap references are not defined in s; (2) local references are not both used and defined in s; and (3) the set of local references that may be defined in s (for example, through interior pointers) is the same as defsmust(s). However, the injection process described below is resilient to any of these conditions not holding.
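The following is a minimal Python sketch of this backward pass over a single basic block. It applies the live-in equation in reverse statement order and derives dieacross(s) from the result. Representing a statement as a pair of `(defs_must, uses_may)` sets is an assumption made for illustration, and propagation across basic blocks is omitted.

```python
def analyze_block(stmts, live_at_end):
    """Backward liveness over one basic block.

    stmts: list of (defs_must, uses_may) pairs in program order.
    live_at_end: references live at the end of the block.
    Returns, per statement, a tuple (live_in, live_out, die_across).
    """
    live_out = set(live_at_end)
    facts = []
    for defs_must, uses_may in reversed(stmts):
        live_in = (live_out - defs_must) | uses_may
        die_across = (live_in | defs_must) - live_out
        facts.append((live_in, live_out, die_across))
        live_out = live_in  # live-in of s is live-out of the statement before it
    facts.reverse()
    return facts
```

For x:=newobj() followed by y:=x.f, with y returned from the function and therefore live at the end, the sketch reports that x dies across the second statement and that nothing dies across the first.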
After block 640, the process continues on to block 645, where it is repeated for the next statement.
3.3 Examples of RC Injection
The RC injection stage is the third stage of the eager RC instrumentation process 400.
Generally, this stage inserts RC updates in three steps, using liveness information. The first step injects RC increments and decrements against heap and local references immediately after statements. Only statements that exist before this stage are considered in this step. The second step injects RC increments against references that are thrown using throw and for which there is an exception handler in the function; these increments are injected just before the throw statement. The third step introduces RC decrements on references that die in a basic block into that basic block's exception header, if it has one.
Thus, the process begins at block 705, where the process enters a loop that repeats for each statement in the IR. Inside the loop, the process continues to decision block 710, where the RC injection module determines if the current statement is a call statement. As noted above, in one implementation, exception assignments and the fakedef statement are treated as call and non-call instructions, respectively, for the purposes of process 700. Also, in one implementation, allocation instructions of the form
If, at decision block 710, the module determines that the statement is a call statement, the process continues to block 720, where RC decrements are injected after the call statement for all references that die across the statement. Apart from the RC decrements against them, no other RC updates or assignments are injected. In particular, no RC increments are present before the call against the actual reference parameters because the necessary increments occur on entry into the function. IP arguments as well need no special consideration because they are indirectly taken care of through their shadows, as discussed above with respect to
Thus, for d_i ∈ dieacross(s), a function call r:=f(x, y, . . . ) becomes the following set of instructions:
Here, RC−(r) and RC+(r) represent RC decrement and increment instructions, respectively, on the object targeted by the reference r. If r is null, these operations become no-ops.
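As an illustrative sketch, assuming instructions are modelled as plain strings, the call case reduces to appending one RC− per reference in dieacross(s) and adding nothing before the call:

```python
def inject_call(call, die_across):
    """Return the instruction sequence for a call statement s.

    Only RC decrements against the references in dieacross(s) are added,
    immediately after the call. No RC increments are placed before it,
    because the callee performs the necessary increments on entry.
    """
    return [call] + [f"RC-({d})" for d in sorted(die_across)]
```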
If instead the RC injection module determines at decision block 710 that the statement is not a call statement, more complex RC injections are used. These injections are performed with reference to various sets of references for the current statement. The sets are based on liveness information that can be derived from a static analysis of the IR. In one implementation, they are defined as follows: let ldefs(s) be the set of l-value expressions of all references (stack and heap) that may be defined at a statement s, and let L(Q) be the set of l-values for variables in the set Q. The remaining sets used during RC injection are:
In a preferred implementation, the behavior of the RC injection module 230 depends on whether the compiler can establish that L(defsmust(s)) equals ldefs(s). Thus, at block 730, these l-value sets are compared. Then, at block 740, RC updates are injected based on the comparison.
The process begins at decision block 805, where the RC injection module 230 determines if the sets L(defsmust(s)) and ldefs(s) are equivalent. If they are, references common to defsmust(s) and livein(s) are saved in temporaries so that their old values are available for RC decrements after s; at block 810, these assignments are injected. Next, at block 820, RC increments are injected for all references defined in s. This is followed by injecting decrements against the temporaries at block 830. The temporaries thus allow the former targets of redefined references to be decremented. Finally, at block 840, RC decrements are inserted against local references that die across s, and the process ends.
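The four steps of blocks 810 to 840 can be sketched in Python as below. Instructions are plain strings, and the temporaries are named `t_<ref>`, a hypothetical stand-in for the dotted temporaries used in the text.

```python
def inject_noncall_exact(stmt, defs_must, live_in, die_across):
    """RC injection for a non-call statement s with L(defs_must(s)) == ldefs(s)."""
    saved = sorted(defs_must & live_in)
    out = [f"t_{v} := {v}" for v in saved]            # block 810: keep old targets
    out.append(stmt)
    out += [f"RC+({v})" for v in sorted(defs_must)]   # block 820: new targets
    out += [f"RC-(t_{v})" for v in saved]             # block 830: old targets
    out += [f"RC-({d})" for d in sorted(die_across)]  # block 840: deaths across s
    return out
```

Applied to the trivial assignment z:=z, the sketch produces t_z:=z, z:=z, RC+(z), RC−(t_z). The increment on the new value and the decrement on the saved old value offset each other.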
Note that in the case of the trivial assignment z:=z, the inserted RC updates would cancel out; in particular, the RC increment against z would be balanced by the following RC decrement against the temporary that holds the previous value of z. This is why, in one implementation, formal references are initialized using fakedef statements in the process of
Thus, for a non-call statement s for which L(defsmust(s))=ldefs(s), the following RC injections occur (variables with dot accents represent the temporaries):
If, however, at decision block 805 the RC injection module 230 determines that L(defsmust(s))≠ldefs(s), a different set of injections occurs. First, at block 850, null assignments are injected against the w_i references, which are the references that must be defined in s but are not used in s. This is done because RC decrements against these references have already been injected at an earlier point, since they die before their redefinition in s. They are therefore assigned null to preclude double decrements later, when the decrements against temporaries are made.
Next, at block 860, the module injects assignments to temporaries for references which may be defined in s. In one implementation, these assignments to temporaries apply the dereference operator (‘*’) on l-value expressions in ldefs(s) to obtain the old values of references potentially to be overwritten in s. Next, at block 870, RC increments are made against the potentially new references. Then the process proceeds to block 880, where the temporaries are subjected to RC decrements. Finally, at block 890, RC decrements are injected against the references that die across s and the process ends.
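Similarly, blocks 850 to 890 can be sketched as follows, again with string instructions and hypothetical temporaries named `tt_<j>` (numbered from 1). The caller is assumed to supply ldefs(s) as a list of l-value expressions and the w_i references as `must_not_used`.

```python
def inject_noncall_general(stmt, ldefs, must_not_used, die_across):
    """RC injection for a non-call statement s with L(defs_must(s)) != ldefs(s)."""
    out = [f"{w} := null" for w in sorted(must_not_used)]       # block 850
    temps = [(f"tt_{j}", p) for j, p in enumerate(ldefs, start=1)]
    out += [f"{t} := *{p}" for t, p in temps]                   # block 860: old values
    out.append(stmt)
    out += [f"RC+(*{p})" for _, p in temps]                     # block 870: new values
    out += [f"RC-({t})" for t, _ in temps]                      # block 880: old values
    out += [f"RC-({d})" for d in sorted(die_across)]            # block 890
    return out
```

For a cmpxchg-style statement r:=cmpxchg(p, x, y) with ldefs(s)={p,&r} and w_1=r, the sketch nulls r, saves *p and *&r, applies RC+ to both after the statement, and then decrements both temporaries.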
Thus, for a non-call statement s for which L(defsmust(s))≠ldefs(s), the following RC injections occur:
In an alternative implementation, not every increment and decrement is necessary: if an alias analysis can prove that a p_j will point to a w_q, then the statements w_q:=null, ẗ_j:=*p_j and RC−(ẗ_j) can be omitted.
Returning to the process of
On the other hand, when explicitly thrown references are caught in the same function, the absence of an RC increment against the exception variable must be countered by an RC increment at the point of the throw statement, or earlier.
One final concern remains. If a statement s were to throw an exception, none of the ensuing RC updates shown above would be executed. In this case, the RC increments injected for the throwing statement should not occur, because an exception-throwing s is assumed to create no side effects on the program's external state, as stated above. However, among the RC decrements, those that operate on the local references that die across s should still be performed.
Thus, at block 770, RC decrements are inserted into exception headers for any basic block which could throw an exception. In one implementation, for a basic block B with an exception header B′, RC decrements are made against the set of references
where livein(B) and livein(B′) are the live sets on entry to B and B′. The RC decrements are inserted into B′.
However, at execution time, RC decrements on a subset of D′ will occur in B before an exception is actually thrown. To prevent another decrement in B′ on references that have already died in B, in one implementation the RC− operation is given the following semantics: it resets its operand reference to null after decrementing the reference count of the targeted object. This works because the RC− operation is always introduced at the death point of its operand. Under this implementation, the null assignments made at block 850 are not necessarily required.
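The null-resetting RC− semantics can be illustrated with a small runtime model in Python. The `Obj` class and the `env` dictionary, which stands in for a frame's local references, are hypothetical. Reclamation of objects whose count reaches zero is left out.

```python
class Obj:
    """A heap object carrying a reference count (hypothetical runtime model)."""

    def __init__(self):
        self.rc = 1  # like newobj(): a fresh object starts with a count of 1


def rc_inc(env, name):
    """RC+: increment the count of the object targeted by env[name]."""
    obj = env.get(name)
    if obj is not None:  # RC+ on a null reference is a no-op
        obj.rc += 1


def rc_dec(env, name):
    """RC-: decrement the count, then reset the operand to null.

    Because the operand is nulled, a second RC- on the same reference
    (for example, one injected into an exception header) is a no-op.
    """
    obj = env.get(name)
    if obj is not None:
        obj.rc -= 1
    env[name] = None
```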
3.4 Examples of IR with Injected Eager RC Updates
Specific examples of concrete instructions handled by the above-described techniques are getfield in Java bytecode and ldfld in MSIL (Microsoft Intermediate Language). Both can be represented in the IR as o.f, where o is a local reference and f a field. As noted above, the injection process treats this as a non-call instruction. The following is an example of code emitted by the compiler in a specific instance:
In this example, defsmust(s)={o}, usesmay(s)={o} and ldefs(s)={&o}. Since L(defsmust(s))=ldefs(s), the code generated corresponds to that generated by the process of blocks 810-840 of
Another example is the IR instruction cmpxchg, which mimics the compareExchange method of the System.Threading.Interlocked class in .NET. cmpxchg takes an interior pointer p to a reference, a pair of references x and y and compares x with the reference at p for equality. If equal, the reference at p is replaced by y and the original reference at p is returned. If unequal, only the reference at p is returned. The following shows the code after execution of the insertion process, which regards the statement as a non-call instruction:
In this example, defsmust(s)={r} and ldefs(s)={p,&r}. Thus, depending on whether an alias analysis can prove that p always equals &r, either of the two patterns generated in the process 800 of
Two optimizations are also possible on the injected code in this example. First, *(&r) can be replaced by r. Second, RC+(*(&r)) and RC−(ẗ_1) cancel out, because after the cmpxchg operation r equals ẗ_1. An optimizing compiler could apply these optimizations to make the eager RC-instrumented program more efficient.
4. Examples of RC Subsumption Analysis
4.1 Examples of RC Subsumed References
In the example,
Altogether,
It turns out that of the RC updates introduced into real programs by the eager RC insertion procedures described above, a large number are on local references that are RC subsumed by local references on which RC updates are also introduced. For instance, the RC updates on formal references are often redundant because formal reference parameters are usually RC subsumed by actual reference parameters. The goal of the RC subsumption analysis described herein is to locate such subsumed references. An RC subsumption optimization would then remove RC updates on these references, resulting in fewer garbage collection-related calls during program execution and therefore increased throughput.
While the discussion above is given in the context of eager RC updates, RC subsumption can also occur when RC updates are inserted according to classic RC collection schemes. As an example, consider the IR
in which newobj( ), as before, returns a new object with reference count of 1, and in which RC updates are inserted according to a classic RC collection scheme. Since y points to the same object as x in its live range from Line 4 to Line 5, and since this live range is contained in the live range from Line 1 to Line 7 of x, the RC updates on Lines 2 and 5 are superfluous. (Note that the decrement against y on Line 3 is not superfluous, as it is performed to decrement the reference count of the object that y is about to be swung away from due to the assignment on Line 4.) Thus, as this example shows, RC subsumption can also occur when RC updates are inserted according to a classic RC collection scheme.
4.2 Examples of Previous Techniques
Related techniques attempted to take advantage of the notion of subsumption by defining a local reference y as being always RC subsumed by a local reference x, if:
This definition, which will be referred to herein as “enveloping RC subsumption” (“ERCS”), led to an optimization that was somewhat effective on many programs. Nonetheless, it covered only a limited set of scenarios. Moreover, known algorithms for finding ERCS references (roots that fulfill the ERCS definition) were overly conservative. A conservative algorithm for an already conservative definition resulted in missed opportunities in some of the test results.
A shortcoming of ERCS is that, for a variable to be nontrivially subsumed, it must always be reachable from a particular variable different from itself. For example, consider the following code fragment:
This consists of mutator code, and the RC updates that a classic RC collection scheme would require. If x, y and z are defined for the first time on Lines 2, 4 and 6, and used last on Lines 14, 13 and 12 respectively (ignoring the RC update usages), then even if we assume that the fields f1 and f2 reside in thread-local objects, the object targeted by z will not always be reachable from x or y alone. Thus, despite z satisfying Provision A1 above, it is not an ERCS reference since Provision A3 does not hold relative to x or y alone. And yet, preferably, z should be subsumed since in the example it will always be reachable from either x or y.
Another example is the code fragment below:
In this example, neither Provision A3 nor Provision A1 holds for any of the variables. Hence ERCS yields no subsumption, even though the RC updates for w could be avoided if the coverage jointly provided by u and v were considered.
Another source of ERCS conservatism is that, of the two clauses in Provision A2, the second is overly constraining. The first clause is that y should never be live through a redefinition of itself; ERCS uses it to prevent a dangling reference problem. The second clause is that y should never be live through a redefinition of x; it exists to eliminate the possibility of y's target becoming unreachable from x due to indirect writes of x through pointers. This second requirement is unnecessary if Provision A3 can be computed more precisely.
In another limitation, ERCS references were computed by finding local references that “overlook” an object targeted by a live local reference y, from just before a statement s until their death or possible redefinition. This overlooking root set of y at s was defined in these past techniques as:
where R is the set of local references, liveout(s) is the set of local references that are live just after s,
means y points to the object ω at program point P, and sin and sout are program points just before and just after s.
Because a direct calculation of (s,y) by Equation (4.1) at every possible definition of y is computationally expensive, one implementation of the ERCS techniques approximated it by a peephole examination of a small context around a statement s. For example, in this technique, an approximate (s,y) for the statement y:=x.f is {x} if x is known to target only thread-local objects and if x.f is not written into before x dies.
Although a peephole examination could be useful for a number of important statements, opportunities were missed. For y:=x.f, since ascertaining whether x.f is written into before x dies might require an inspection of all basic blocks reachable from the basic block B in which s occurs, the opportunity was conservatively identified by restricting the inspection to just the basic block B. The approach was to not consider y for subsumption if x did not die before the end of B.
To summarize, subsumption techniques were previously limited both in the opportunities they recognized and the flow-insensitive manner in which they were calculated.
5. Examples of Overlooking Subsumption
5.1 Examples of General Overlooking Subsumption Processes
The processes that follow use a less conservative subsumption definition based on the concept of overlooking roots. To restate, the overlooking relation can be thought of as follows: a root x overlooks a root y at a program point P if whatever object is reachable from y at P is also reachable from x at P without going through y.
This overlooking roots binary relation is irreflexive and transitive at any P. In implementations described herein, the set of ordered pairs that fulfill the overlooking relation at a program point P is denoted olook(P). When (x,y)∈olook(P), x is called the overlooker or the overlooking root, and y the overlookee or the overlooked root. Generally, (x,y) as used herein is called an overlooking pair.
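To make the definition concrete, the following Python sketch evaluates it on a single, fixed heap snapshot, rather than statically over all executions as the analysis must. The dictionary encodings of roots and heap edges are assumptions. Aliasing is treated as overlooking via a zero-length path, null roots are conservatively treated as not overlooked, and roots are assumed to be locals through which no heap path passes, so the "without going through y" clause holds trivially here.

```python
from collections import deque


def reachable(heap, src, dst):
    """True if object dst can be reached from object src along heap edges.

    heap maps an object id to a list of (field, target_object_id) edges.
    A zero-length path counts, so an object reaches itself.
    """
    seen, work = {src}, deque([src])
    while work:
        obj = work.popleft()
        if obj == dst:
            return True
        for _field, target in heap.get(obj, ()):
            if target not in seen:
                seen.add(target)
                work.append(target)
    return False


def overlooks(roots, heap, x, y):
    """Does root x overlook root y in this concrete snapshot?

    roots maps root names to object ids (None for null).
    """
    if x == y:
        return False  # the relation is irreflexive
    tx, ty = roots.get(x), roots.get(y)
    if tx is None or ty is None:
        return False  # null roots are treated conservatively
    return reachable(heap, tx, ty)
```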
Using this definition, the broader idea of overlooking RC subsumption, or ORCS, can be defined as a kind of subsumption more general than ERCS. In ORCS, a live range l of a local reference y is considered subsumed if:
Referring back to the code examples discussed above in Section 4.2, it may be noticed that, although there is no subsumption in these examples according to ERCS, z and w will be subsumed according to ORCS. Thus ORCS provides greater opportunities for removing unnecessary RC updates than does ERCS.
In one implementation, the olook sets mentioned above contain path-insensitive information: if (u,v)∈olook(P), then u overlooks v at P irrespective of the path taken to reach P. This is stronger information than is strictly needed for every implementation of ORCS optimization. What is used, according to the propositions above, is a set of live roots at each point, at least one of which is guaranteed to be an overlooker of the root in question. Techniques described herein with reference to
Next at block 1120, the module performs an RC chaining transformation on the program. According to some implementations, this is performed because certain RC updates can be identified ahead of time as extraneous, since they satisfy RC chaining conditions. By adding roots that overlook these target roots, needless updates to the target roots can be removed during the same RC update removal and specialization process performed for the RC updates added during the eager injection process described above. Particular examples of processes of RC chaining transformations are described below with reference to
Next, at block 1130, the redundant overlooking roots subsumption module 260 performs an ORCS analysis and removes RC updates for references which satisfy the ORCS provisions. Particular examples of processes of ORCS analysis are described below with reference to
5.2 Examples of Overlooking Root Analysis
Before describing specific uses of overlooking roots information, such as RC chaining or ORCS, it is useful to understand processes by which the overlooking roots information is obtained.
As mentioned above with respect to
In one implementation, the analysis is intraprocedural. It handles invoked procedures using reference mutation summaries. A reference mutation summary for a procedure F is understood to be the set of reference fields and reference array types that could be mutated by F.
The illustrated process begins in a loop block 1305 that is performed for each statement. Again, in various implementations, this may not be performed for every statement; instead, the sets may be kept for larger groups, such as basic blocks, and then computed on demand for each statement. In the illustrated implementation, this is performed only for “relevant” statements. Examples of relevant statements are given below. At block 1310, the set olookout(s), which is the olook set just after statement s, is created. In one implementation, the set is given by
$$\mathit{olook}_{\mathit{out}}(s) = \bigl(\mathit{olook}_{\mathit{in}}(s) - \mathit{kill}(s)\bigr) \cup \mathit{gen}(s) \tag{5.1}$$
where olookin(s) is the set of overlooking roots just before s, and kill(s) and gen(s) are the sets of roots that are made by s to stop overlooking a root, and that are made by s to overlook a root, respectively.
In one implementation, when computing olookin(s), there are three possibilities. If s is preceded by a statement s′ in its basic block B, then olookin(s) equals olookout(s′). Otherwise, olookin(s) equals the meet of the olookout sets for the last statements in the predecessor basic blocks of B. If B has no predecessors, as with the entry basic block of the control-flow graph (CFG), olookin(s) equals a special initializing set called olookT. The elements in olookT are the overlooking root pairs on entry to a procedure. For every local reference r that is not a formal reference, it contains the pair (T,r), where T denotes an “undefined” virtual root, because these references have initially undefined values. For every formal reference z, olookT contains the pair (ẑ,z), where ẑ is a virtual root that models the actual parameter corresponding to z. In one implementation, there are no other pairs in olookT, because all the remaining roots, such as the static and virtual roots (including ⊥), do not have initially known overlookers.
However, while equation (5.1) provides a starting point for computing olookout(s), the equation by itself is insufficient, because olookout becomes vacuous in the face of called procedures that may mutate the heap. For example, consider a call statement y:=F(), where the callee F is known to mutate a reference field f. Because f may lie on the heap path by which one root overlooks another, the kill set for the statement, in the absence of more knowledge, would have to be at least olookin(s). With such a conservative estimate, however, all incoming information for the statement would be lost.
This problem is not specific to callees that may mutate the heap; it exists for any instruction that may mutate the heap. Its ramifications in the procedure invocation case, however, are extreme. In the case of a statement like y.f:=x, by contrast, not all of the incoming information has to be killed, because at the very least y will overlook x after the statement (assuming that f is a thread-safe or read-only field).
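The propagation just described can be sketched as a standard iterative dataflow solver over basic blocks. This sketch rests on several assumptions. The meet at confluence points is taken to be set intersection, consistent with the path-insensitive reading of olook. `kill` and `gen` are caller-supplied callables. The string "^z" is a hypothetical name for the virtual root ẑ. Blocks that no information reaches yet are skipped until it does.

```python
def olook_top(locals_, formals):
    """Initial set olook_T on procedure entry."""
    pairs = {("T", r) for r in locals_ if r not in formals}  # undefined virtual root
    pairs |= {("^" + z, z) for z in formals}                 # '^z' models z's actual
    return frozenset(pairs)


def solve_olook(blocks, preds, entry, olook_t, kill, gen):
    """Iterate Eq. (5.1) over basic blocks until the olook_out sets are stable.

    kill(b, olook_in) and gen(b, olook_in) return sets of (overlooker, overlookee)
    pairs for block b.
    """
    out = {b: None for b in blocks}  # None means 'not yet computed' (top)
    changed = True
    while changed:
        changed = False
        for b in blocks:
            incoming = [olook_t] if (b == entry or not preds[b]) else []
            incoming += [out[p] for p in preds[b] if out[p] is not None]
            if not incoming:
                continue  # no information reaches b yet
            olook_in = frozenset.intersection(*incoming)
            new_out = frozenset((olook_in - kill(b, olook_in)) | gen(b, olook_in))
            if new_out != out[b]:
                out[b] = new_out
                changed = True
    return out
```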
5.2.1 Examples of Tie Functions
The heap mutation problem can be addressed using the concept of tie fields and tie array types. A reference field f is said to “tie” a root u to a root v if there exists a point P in the procedure at which f occurs on every heap path by which u overlooks v at P. Similarly, a reference array type A is said to tie u to v if an instance of A occurs on every heap path by which u overlooks v somewhere in the procedure. A tie function T can then be constructed such that given u and v, T(u,v) is the set of all fields and array types that may tie u to v.
Generally, tie fields and tie array types are links in the heap that, when severed, cause the overlooking relation between a pair of roots to be broken. Hence, if a field or an instance of an array type in T(u,v) is updated, then the overlooking relation between u and v could be broken.
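For a single concrete heap, the fields that tie one object to another can be found without enumerating paths. A field f lies on every path from the target of u to the target of v exactly when removing all f-labelled edges disconnects them. The Python sketch below uses that characterization, which is an implementation choice of this example and not necessarily that of the analysis. It handles fields only, not array types, and computes the ties at one point; T(u,v) would accumulate such sets over points.

```python
from collections import deque


def reachable(heap, src, dst, avoid=None):
    """True if dst is reachable from src, skipping edges labelled `avoid`."""
    seen, work = {src}, deque([src])
    while work:
        obj = work.popleft()
        if obj == dst:
            return True
        for fld, target in heap.get(obj, ()):
            if fld != avoid and target not in seen:
                seen.add(target)
                work.append(target)
    return False


def tie_fields(heap, tu, tv):
    """Fields lying on every heap path from object tu to object tv."""
    if not reachable(heap, tu, tv):
        return set()  # no heap path at all, so no overlooking to tie
    fields = {fld for edges in heap.values() for fld, _ in edges}
    # f is on every path iff tv is unreachable once f-edges are removed
    return {f for f in fields if not reachable(heap, tu, tv, avoid=f)}
```

Aliased roots (tu equal to tv) have a zero-length path and hence no ties, which matches the role of an empty T(u,v) in identifying directly overlooking pairs.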
One implementation of the overlooking roots analysis described herein keeps one tie function per procedure. T is initialized to map every pair of roots to the empty set Ø. Then, as new overlooking pairs are generated with every application of equation (5.1), the tie function is updated to include new fields and array types. Once added to an image of T, these fields and array types are never removed; this implementation conforms to a weak update policy.
A potential drawback of weak updating is that it can rapidly dilute the usefulness of the gathered information. The tie function could be updated for each pair in gen(s), but that may needlessly dilute the tie information. In particular, if (u,v)∈olookin(s) and (u,v)∉kill(s), then there is no need to update T(u,v), because whatever ties u to v after s already exists in T(u,v) before s. Hence, in a general fashion, T is updated as:
At block 1320 of
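The selective weak update described above, which skips pairs that already overlooked before s and were not killed, can be sketched in Python as follows. `ties_of` is a hypothetical callback that supplies what may tie a newly generated pair. T is assumed to be a `defaultdict(set)`.

```python
from collections import defaultdict


def weak_update_ties(T, olook_in, kill, gen, ties_of):
    """Grow the tie function T for the pairs generated by one statement s.

    T maps (u, v) to the fields/array types that may tie u to v; its images
    only ever grow (weak update policy).
    """
    surviving = olook_in - kill
    for pair in gen:
        if pair in surviving:
            continue  # its ties are already recorded in T(u, v) before s
        T[pair] |= ties_of(pair)
    return T
```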
5.2.2 Examples of Determining Overlooking Roots
The main benefit of the tie function is that it can be used to enable the kill set for a statement to be more specialized, which in turn prevents the olookout set from losing more information than is necessary. For instance, as described below, in the case of procedure calls, the kill set need only be a subset of {(u,v)|T(u,v)≠Ø}. Another use is in determining dolook sets, as described below.
As mentioned above, in the illustrated implementation, the iterated processes of blocks 1310 and 1320 are performed only on “relevant” statements. In various implementations, statements are deemed “relevant” if they can alter the olook sets. Irrelevant statements, such as those that only side-effect arithmetic variables, propagate their olookin sets to their olookout sets.
For the sake of brevity of description, the analysis described herein with respect to updating the tie functions and olook sets is described assuming all roots to be only references. In one implementation, this is in accordance with the Java programming model. In .NET implementations, roots can also be interior pointers to objects, but extending the descriptions herein to account for these pointers is straightforward.
Assuming x, y, u and v are local or static references, relevant statements can be divided into five categories:
The process begins at block 1410, where the statement's kill set is determined. In one implementation, this can be done by consulting the statement's olookin set and T. Next, at block 1420, the statement's gen set is computed. In one implementation, this can be done using the statement's olookin* set. Finally, at block 1430, the tie function T may be updated at pairs in gen*(s). Particular examples of how statements in each of the categories affect the tie function and other sets are described below in Section 6.
Returning now to the process of
After the sets are determined to be stable, at block 1240 the module computes “directly” overlooking roots sets from the overlooking roots sets using the computed tie functions obtained during the fixed-point analysis. It is thus useful to define a “directly” overlooking roots binary relation:
In most situations, this relation is an aliasing relation. One exception is when virtual roots target multiple objects at the same time.
In one implementation, this is computed as:
$$\mathit{dolook}_{\mathit{in}}(s) = \{(u,v) \mid (u,v) \in \mathit{olook}_{\mathit{in}}(s) \wedge T(u,v) = \emptyset\} \tag{5.5}$$
In one implementation, it is preferable to keep two maps for T, to speed up tie function lookups. The first is for querying all the ties for a root pair, as in Equation (5.2). The second is for determining all the pairs tied by a field or type, which is utilized when determining particular kill sets as described below. After computing directly overlooking roots, the process then ends.
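The two-map representation of T and Equation (5.5) can be sketched together in Python. The class and method names are hypothetical. `add` only ever grows the maps, in keeping with the weak update policy.

```python
from collections import defaultdict


class TieFunction:
    """Tie function T kept as two maps for fast lookups in both directions."""

    def __init__(self):
        self.by_pair = defaultdict(set)  # (u, v) -> fields/array types tying u to v
        self.by_link = defaultdict(set)  # field/array type -> pairs it ties

    def add(self, pair, links):
        """Weakly add ties for a pair; entries are never removed."""
        for link in links:
            self.by_pair[pair].add(link)
            self.by_link[link].add(pair)

    def ties(self, pair):
        return self.by_pair.get(pair, set())

    def pairs_tied_by(self, link):
        return self.by_link.get(link, set())


def dolook_in(olook_in, T):
    """Eq. (5.5): overlooking pairs with no tie, i.e. directly overlooking."""
    return {pair for pair in olook_in if not T.ties(pair)}
```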
5.3 Examples of RC Chaining
In existing techniques, compile-time RC update coalescing is restricted to contiguous RC update sequences occurring within basic blocks. In the techniques described herein, by contrast, a transformation referred to as RC chaining is utilized, which provides a more general coalescing effect when an ORCS optimization is applied to the transformed code.
It should be noticed that the RC updates for
The process begins at block 1610, where the module 250 utilizes the overlooking roots analysis module 240 to identify overlooking root relationships, in particular directly overlooking root relationships, which are used in subsequent RC chaining processes. Particular examples of this process are described above. Next, at block 1620, the module generates an RC chaining graph. An RC chaining graph GC=(V, EC) is an undirected graph in which the nodes stand for live ranges and the edges denote the RC chaining relationship described above.
For the sake of argument, let the resulting graph be called Gl′. What is left is to remove edges so that provision C3 is satisfied. Thus, at block 1725, a loop begins for each program point. At block 1730, for program point P, the module computes a set:
$$\Delta(P) = \{(l_u, l_v) \mid u, v \in \mathit{live}(P) \wedge (u,v) \notin \mathit{dolook}(P) \wedge (v,u) \notin \mathit{dolook}(P)\},$$
where live(P) is the set of live roots at P, and where lu and lv are the live ranges corresponding to u and v at P. Finally, at block 1740, edges occurring in Δ(P) are deleted from Gl′. The process then loops at block 1745 for the next program point. After all the program points are processed in this manner, the resulting graph is the RC chaining graph GC.
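The edge-deletion loop of blocks 1725 to 1745 can be sketched in Python as below. Edges are modelled as frozensets because the graph is undirected, and `live_range_of` is a hypothetical helper. The test for Δ(P) assumes that both u and v are live at P, as the definitions of l_u and l_v imply.

```python
def prune_chaining_edges(edges, points, live, dolook, live_range_of):
    """Delete every edge in Delta(P) from G_l', for each program point P.

    edges: set of frozenset({l_u, l_v}).
    live[P]: roots live at P; dolook[P]: directly overlooking pairs at P.
    live_range_of(P, r): names r's live range at P.
    """
    remaining = set(edges)
    for P in points:
        roots = sorted(live[P])
        for i, u in enumerate(roots):
            for v in roots[i + 1:]:
                # Delta(P): both live, and neither directly overlooks the other
                if (u, v) not in dolook[P] and (v, u) not in dolook[P]:
                    remaining.discard(frozenset({live_range_of(P, u), live_range_of(P, v)}))
    return remaining
```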
By its construction, every connected component in the RC chaining graph represents a set of live ranges across which RC updates can be coalesced. Thus, returning to process 1600, the module next, at block 1630, utilizes the RC chaining graph to generate definitions and uses of a temporary chaining root tC against each connected component c in GC. Generally, one implementation of this process makes assignments against tC and introduces fake uses of it so that its live range tightly spans all the live ranges in c. The assignments are such that, at every point, tC aliases the variables corresponding to the live ranges in c that are also active at that point. Later, when an ORCS optimization is applied to the transformed code, all the live ranges in c will be subsumed by tC, and only RC updates against tC will be retained. The net effect is a coalescing of the RC updates at the overlap points in c. After this generation, the process of
The process then continues to block 1830, where the module precedes every definition u:=e that corresponds to a definition point in D(lu) with a new definition tC:=e. Finally, at block 1840, the module introduces a fake use of tC after every last use of u in lu (meaning every time u is used last in lu). The process of
The tightly-spanning live range of the chaining root created in process 1800 is illustrated in
Note that provision C3 uses the directly overlooking roots relation. If the general overlooking roots relation were used instead, either the reclamation characteristics of the original collection scheme could be affected or dangling references could be created. For instance, suppose that x overlooked y but did not alias it in the overlap region of
In an alternative implementation, the creation of a new spanning live range can sometimes be obviated by copy propagation. As an example, suppose that the live range of y in
5.4 Examples of Overlooking RC Subsumption Analysis
The process begins at block 1910, where the module 260 utilizes the overlooking roots analysis module 240 to identify overlooking root relationships in order to perform an ORCS analysis. Particular examples of this process are described above. Next, at block 1920, live ranges that correspond to overlooked roots, and whose updates can therefore be removed, are computed. Examples of this process are described in greater detail below with respect to
The process begins at block 2005 in a loop for each statement and root. Although this is illustrated as a single loop for brevity, it may properly be thought of as two nested loops: an outer loop over every statement and an inner loop over each root in the statement. At block 2010, for the particular root and statement being iterated on, the module computes a “live cover” of the root after the statement. Block 2020 performs the meet operation on live covers at confluence points. Then, at block 2025, the loop continues for the next root and/or the next statement. After the loop is completed, at decision block 2035 the module determines if the live covers created are stable. If not, the process returns to the loop and is performed again until a fixed point is reached. The stable live covers thus produced are used to determine the live ranges that satisfy the ORCS propositions B1 and B2 discussed above. Finally, at block 2040, the process returns a list of RC updates whose references have nontrivial live covers at the time of the update, and which thus satisfy ORCS proposition B1. RC updates for the live covers among these that also satisfy ORCS proposition B2 can be removed at block 1930. The process then ends.
The live cover of a root r at a point P is defined in one implementation as the set of live roots at least one of which overlooks r at P. Let liver(P,r) denote this set. If liver(P,r) is nonempty at all P in a live range l of r, and if provision B2 is also satisfied, then l is an ORCS live range by the definition above.
Some properties of live cover sets should be noted. First, a subset of a live cover is not necessarily a live cover. For example, if liver(P,r) is {x1,x2}, then {x1} may not be a live cover of r at P. However, every superset (comprising live roots) of a nonempty live cover is a live cover. In one implementation, the former property is called the subset property, and the latter the superset property. The empty set is a special case: it is a trivial live cover.
Because of the subset property, computing liver(P,r) is not necessarily straightforward. One guaranteed live cover at P is:
liver′(P,r)=live(P)∩xproj(olook(P),r) (5.6)
where xproj is the x-projection operator, described below in Section 6 with reference to updating tie functions in procedure invocations. However, liver′(P,r) could be Ø, for example at confluence points, where the meet of the incoming olook sets drops overlookers that hold along only some of the incoming paths. Therefore, various implementations attempt to derive better information, such as establishing a nonempty liver(P,r) even when liver′(P,r) is Ø.
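As a minimal sketch of Equation (5.6), assuming roots are modeled as strings and an olook set as a Python set of (overlooker, overlookee) pairs, the guaranteed live cover can be computed directly. The example also shows how liver′ becomes Ø at a confluence point when the incoming olook sets are combined by plain intersection (the special handling of virtual roots in the meet is ignored here):

```python
from typing import Set, Tuple

Pair = Tuple[str, str]  # (overlooker, overlookee)


def xproj(pairs: Set[Pair], v: str) -> Set[str]:
    """x-projection: first elements of all pairs of the form (u, v)."""
    return {u for (u, w) in pairs if w == v}


def guaranteed_live_cover(live: Set[str], olook: Set[Pair], r: str) -> Set[str]:
    """liver'(P, r) = live(P) ∩ xproj(olook(P), r), Equation (5.6)."""
    return live & xproj(olook, r)


# Predecessor P1: r is overlooked by x.  Predecessor P2: r is overlooked by z.
olook_p1: Set[Pair] = {("x", "r")}
olook_p2: Set[Pair] = {("z", "r")}
live_p1 = {"x", "r"}
live_q = {"x", "z", "r"}

# Confluence point Q: plain intersection of the incoming olook sets.
olook_q = olook_p1 & olook_p2

cover_p1 = guaranteed_live_cover(live_p1, olook_p1, "r")  # {'x'}
cover_q = guaranteed_live_cover(live_q, olook_q, "r")     # set()
```

Here {x, z} is still a live cover at Q, since one of the two overlooks r along each incoming path; recovering such a nonempty liver(Q,r) is what the meet operation on live covers is for.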
If liverin(s,r) and liverout(s,r) are the live covers of r just before and just after a statement s, then, as before:
liverout(s,r)=(liverin(s,r)−KILL(s,r))∪GEN(s,r) (5.7)
When figuring out the KILL(s,r) and GEN(s,r) sets, a few cases are considered. Let sout be the program point just after s. If liver′(sout,r) is nonempty, then by the superset property, a valid liverout′(s,r) is liverin(s,r)∪liver′(sout,r).
Otherwise, if s does not kill (with respect to the overlooking roots relation) any of the roots in liverin(s,r), and if none of these roots die as control flows through s, then liverout(s,r) can be set to liverin(s,r). Thus:
The expression kill(s) in the above is the overlooking roots' kill set, and Ř is the set of all roots. The set diethru(s) contains the roots that die as control flows through s, defined in one implementation as:
diethru(s)=(livein(s)−liveout(s))∪(livein(s)∩defsmust(s)) (5.10)
where livein(s) are the roots that are live on entry to s, and where defsmust(s) are the roots that must be defined in s.
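The sketch below implements Equation (5.10) and the two transfer cases described above for the live cover of a root r across a statement s. It is a hedged illustration: the precise kill test is not reproduced in this text, so it assumes the conservative reading that s must kill no overlooking pair whose overlooker is in the cover, and it falls back to the trivial (empty) live cover when neither case applies. Function names are hypothetical.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]  # (overlooker, overlookee)


def diethru(live_in: Set[str], live_out: Set[str], defs_must: Set[str]) -> Set[str]:
    """Equation (5.10): roots that die as control flows through s."""
    return (live_in - live_out) | (live_in & defs_must)


def transfer_live_cover(
    liver_in: Set[str],
    liver_prime_out: Set[str],
    kill_s: Set[Pair],
    dying: Set[str],
) -> Set[str]:
    """Live cover of r just after s, given the cover just before s.

    liver_prime_out is liver'(s_out, r); kill_s is the overlooking roots'
    kill(s); dying is diethru(s).
    """
    if liver_prime_out:
        # Case 1: superset property, as stated in the text.
        return liver_in | liver_prime_out
    # Case 2 (assumed reading): no pair whose overlooker is in the cover is
    # killed by s, and no member of the cover dies through s.
    killed = {u for (u, _v) in kill_s if u in liver_in}
    if not killed and not (liver_in & dying):
        return set(liver_in)
    # Otherwise: fall back to the trivial (empty) live cover.
    return set()
```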
As previously mentioned, after the live covers are determined, the module combines them at block 2020. At a program point Q lying at the confluence of points P1 and P2, a meet operation combines the live covers liver1 and liver2 of r at P1 and P2, respectively. In one implementation, the meet operation used is:
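$$
\mathit{liver}(Q,r)=
\begin{cases}
\mathit{liver}_1 \cup \mathit{liver}_2 & \text{if } \mathit{liver}_1 \cup \mathit{liver}_2 \subset \mathit{live}(Q) \qquad (5.11)\\
\mathit{liver}'(Q,r) & \text{else if } \mathit{liver}'_1 \cap \mathit{live}(Q) \neq \emptyset \qquad (5.12)\\
\mathit{liver}'(Q,r) & \text{else if } \mathit{liver}'_2 \cap \mathit{live}(Q) \neq \emptyset \qquad (5.13)\\
\mathit{liver}'_1 \cup \mathit{liver}'_2 & \text{otherwise} \qquad (5.14)
\end{cases}
$$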
where liver′1 = liver′(P1, r) and liver′2 = liver′(P2, r).
5.5 Examples of Optimized RC Updates
The process begins at block 2110, where the module 270 obtains overlooking roots information from the overlooking roots analysis module 240 to identify overlooking roots. In one implementation, the overlooking roots information thus obtained contains particular state information about the roots, encoded in overlooking relationships, as described below. In some implementations, the overlooking root information obtained at block 2110 was computed and stored during earlier processes (such as the overlooking roots analysis performed during the RC chaining or ORCS procedures) and is not recomputed at this point. In alternative implementations, the overlooking roots information is computed anew at this point.
5.5.1 Examples of Specializing RC Updates
Two implementations of substituting specialized RC updates are illustrated next. The first is concerned with trial deletion. Reference counting, by itself, cannot detect when a cyclic structure becomes unreachable. To get around this, various RC implementations use a technique called trial deletion, which captures garbage cycles without a full heap traversal.
Trial deletion is based on the following observation: when a reference is swung away from an object whose reference count is at least 2, that object may have become unreachable, because it may be part of one or more cycles of references pointing to each other. Typically, such implementations stash these references away in a “potentially leaked cycles” (PLC) list, so that they can be processed later to reclaim leaked cycles.
Overlooking roots make the following optimization possible: if a root v is overlooked by some other live root, say u, at the time v is swung away from an object b, then v does not have to be put on the PLC list, because b will still be reachable from u at that time. Thus, the decrement against v can be performed as if v pointed to an acyclic object. Hence, at block 2120, decrements that do not stash the updated reference away are substituted when the reference is known to be overlooked at the point of update.
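A minimal runtime sketch of the two decrement flavors follows. All names (Obj, plc, reclaimed, rc_decrement, rc_decrement_overlooked) are hypothetical, and reclamation is simplified to recording the object rather than freeing it and decrementing its children. The general decrement buffers a surviving target on the PLC list; the specialized one, usable when the root is overlooked by a live root at the update, treats the target as acyclic and never buffers it.

```python
from typing import List, Optional


class Obj:
    """A heap object with a reference count (simplified: no outgoing fields)."""

    def __init__(self, name: str, rc: int = 0) -> None:
        self.name = name
        self.rc = rc


plc: List[Obj] = []        # "potentially leaked cycles" list
reclaimed: List[Obj] = []  # stand-in for actually freeing objects


def rc_decrement(obj: Optional[Obj]) -> None:
    """General decrement: a surviving target may be part of a garbage cycle."""
    if obj is None:
        return
    obj.rc -= 1
    if obj.rc == 0:
        reclaimed.append(obj)
    else:
        plc.append(obj)  # examined later by trial deletion


def rc_decrement_overlooked(obj: Optional[Obj]) -> None:
    """Specialized decrement for a root overlooked by a live root.

    The target is still reachable from the overlooker, so it is handled
    like an acyclic object and never stashed on the PLC list.
    """
    if obj is None:
        return
    obj.rc -= 1
    if obj.rc == 0:
        reclaimed.append(obj)
```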
The second implementation involves tracking interesting states of concrete roots using virtual roots. For example, a state can be tracked for a concrete root that is non-null. This is useful information because an RC update on a non-null reference can be substituted by a specialized version that elides the initial null check.
If a concrete root x is directly overlooked by a non-immortal root, then x is naturally non-null. But there may be cases where x is non-null, even though it may not be overlooked by any of the roots discussed above. An example is in the code fragment below:
Thus, in one implementation, it is profitable to track a concrete root's non-null state with a separate non-null root. A statement's gen calculation would then have to include this root, where appropriate, in the computed olook set. In both the examples above, the calculation would add it to the overlookers of x after line 1. Hence, at block 2130, updates that do not perform a null check can be substituted for the updates of roots that are overlooked by a non-null root.
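The compile-time selection at block 2130 can be sketched as follows, assuming each concrete root x has a virtual non-null root whose name is produced by a hypothetical nonnull_root helper, and assuming hypothetical runtime routine names rc_update and rc_update_nonnull:

```python
from typing import Set, Tuple

Pair = Tuple[str, str]  # (overlooker, overlookee)


def nonnull_root(x: str) -> str:
    """Hypothetical name of the virtual root tracking x's non-null state."""
    return f"nonnull({x})"


def choose_rc_update(x: str, olook: Set[Pair]) -> str:
    """Select which (hypothetical) runtime routine to emit for an RC update on x."""
    if (nonnull_root(x), x) in olook:
        return "rc_update_nonnull"  # variant that skips the initial null check
    return "rc_update"  # generic variant that tests x for null first
```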
Virtual roots could also be used to track the aliasing of local and actual references. This application assumes only four kinds of virtual roots—undefined, immortal, pristine, and actual parameter roots—to keep the exposition simple.
For the most part, virtual roots are not distinguished from concrete roots when producing and consuming overlooking root information. But distinctions are sometimes needed. For instance, in ORCS and the trial deletion buffering optimizations, the live overlooker should either be immortal, actual or concrete.
5.5.2 Examples of Utilizing Virtual Immortal and Pristine Roots
Next, at block 2140, RC updates for roots directly overlooked by an immortal root are removed, in one implementation according to the following process. An object can be thought of as “immortal” if, once created, it lasts until the end of the program's execution. RC updates on these objects (examples of which include string literals and GC tables) are not needed, because such objects live, essentially, forever. Unlike in subsumption, the RC updates do not have to be “matched” for elimination; typically, even the removal of an isolated RC update on an immortal object will not compromise program correctness or risk a memory leak.
Past techniques presented a tailored data-flow analysis for finding sets of immortal target variables (local references to immortal objects). With overlooking roots, such a custom analysis can be included in the general overlooking roots process.
This can be done by utilizing a special virtual root which is thought to always target an immortal object. This immortal root is immutable, and “materializes” when the target immortal object is allocated. In one implementation, this is done at the very beginning of program execution, to account for statically allocated data.
Immortal roots allow RC updates on other directly overlooked roots to be removed. Let dolook(P) be defined as any set of ordered pairs that honor the directly overlooking roots relation at a point P. Under this definition, for a given dolook(P) there is always an olook(P) such that dolook(P) ⊆ olook(P). Hence, if the direct overlookers among the overlookers of a root r include an immortal root, then r must be an immortal target variable. In this way, an analysis for overlooking roots can supersede dedicated immortal analyses.
The use of overlooking roots in subsumption analyses makes it possible to go further: it permits the detection of overlookers when roots are loaded from pristine fields. A field f can be defined as being in a “pristine” state from the moment its containing object is allocated up to the moment it is assigned a nonzero value. For example, according to the allocation semantics of virtual execution environments like Java and .NET, the value of f in the pristine state is assured to be an appropriately cast zero. Therefore, if a reference field in the pristine state is loaded into a root y, y will be directly overlooked by an immortal root, and an RC update against y can then be omitted.
This definition can be slightly generalized. Rather than up to the moment at which it is assigned a nonzero value, a reference field can be considered pristine up to the moment at which it points to an object that is not immortal.
The pristine field mechanics can be captured in the framework of overlooking roots by introducing another set of virtual roots called the pristine roots. Consider the following code fragment, which corresponds to an initialization sequence:
Thus, line 3 can be optimized out because the temporary that r.f1 is loaded into (during execution of the update) can be directly overlooked by an immortal root. When r.f1 is overwritten on line 4, (r,f1) should be omitted or removed from the set of r's overlookers. Because (r,f2) will remain a pristine overlooker of r, line 6 can be optimized out as well. While this description uses virtual pristine roots at a very fine, field-level granularity, alternative implementations may use pristine roots of a coarser granularity, such as a single pristine root overlooker per target object, or even one for all objects with pristine fields. Coarser formulations, however, do not encode as much information as finer-grained formulations. Examples of techniques whereby immortal roots are introduced as overlookers are given below.
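The field-level bookkeeping can be sketched as a small tracker. This is a hypothetical helper, not the analysis itself: it follows only stores made through r and ignores aliases of r (in the full analysis, heap stores kill pristine pairs conservatively, as described in Section 6.6), and it uses the generalized definition under which a field stays pristine while it points to an immortal object.

```python
from typing import Dict, Set


class PristineTracker:
    """Field-level pristine roots: (r, f) overlooks r while r.f is pristine."""

    def __init__(self) -> None:
        self.pristine: Dict[str, Set[str]] = {}

    def alloc(self, r: str, ref_fields: Set[str]) -> None:
        """r := allocobj(T): every reference field starts out pristine."""
        self.pristine[r] = set(ref_fields)

    def store(self, r: str, f: str, value_is_immortal: bool) -> None:
        """r.f := x: f stays pristine only while it points to an immortal object."""
        if not value_is_immortal:
            self.pristine.get(r, set()).discard(f)

    def load_needs_rc(self, r: str, f: str) -> bool:
        """t := r.f: while (r, f) still overlooks r, the loaded value is zero or
        immortal, t is directly overlooked by I, and no RC update is needed."""
        return f not in self.pristine.get(r, set())
```

For an initialization sequence on a fresh object r with fields f1 and f2, a load of r.f1 needs no RC update until r.f1 is overwritten with a mortal reference, while loads of r.f2 remain free of RC updates.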
Next, at block 2220 pristine roots are caused to overlook roots targeting objects with pristine fields identified earlier up to the point they are no longer “pristine.” This may be done by automatically generating relations involving pristine roots when computing the olook sets during the overlooking root analysis for roots targeting objects with pristine fields. Thus, in one implementation the virtual pristine roots are not created in the sense that they are actually added to code in the program. Additionally, while
Next, at block 2230, points in the program that assign immortal objects for the first time are identified. This includes, for example, static objects, strings, and temporaries into which pristine fields are loaded, as described above. Next, at block 2240, a virtual immortal root is caused to directly overlook the roots assigned to target immortal objects at the points identified earlier. Similarly to the process for pristine roots discussed above, in one implementation, the overlooking roots analysis is modified to automatically generate relations involving immortal roots when computing the dolook sets for roots that target immortal objects. Note that, in the examples of updating tie functions given below, the virtual immortal root is referred to as I. Finally, at block 2250, RC updates for roots directly overlooked by this immortal root are removed, providing additional optimizations of the program. The process then ends.
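Block 2250 can be sketched as a filter over candidate RC updates. The sketch assumes, following the predicate used for heap stores in Section 6.6, that I directly overlooks a root x when (I,x) is in the olook set and the tie set T(I,x) is empty. The representation of updates as (program point, root) pairs and all function names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Pair = Tuple[str, str]              # (overlooker, overlookee)
Ties = Dict[Pair, FrozenSet[str]]   # tie function T; absent pairs map to Ø
IMMORTAL = "I"                      # the virtual immortal root


def directly_overlooks(u: str, v: str, olook: Set[Pair], tie: Ties) -> bool:
    """Assumed encoding: u directly overlooks v if (u, v) holds with no tying field."""
    return (u, v) in olook and not tie.get((u, v), frozenset())


def prune_immortal_updates(
    updates: List[Tuple[str, str]],
    facts: Dict[str, Tuple[Set[Pair], Ties]],
) -> List[Tuple[str, str]]:
    """Drop RC updates (point, root) whose root is directly overlooked by I there."""
    kept = []
    for point, root in updates:
        olook, tie = facts[point]
        if not directly_overlooks(IMMORTAL, root, olook, tie):
            kept.append((point, root))
    return kept
```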
6. Examples of Particular Tie Function Updating Procedures
6.1 Examples of Updating Tie Functions
Herein follow various examples of procedures and calculations for updating tie functions, according to the techniques described above. Tie functions are updated for relevant statements; in various implementations, they are updated according to the category of the relevant statement. To review, assuming x, y, u and v are local or static references, relevant statements can be divided into five categories:
6.2 Procedure Invocations
These are statements of the form y:=F( . . . ). This statement's analysis uses a reference mutation summary μ(F) for the callee F, if one is available. The summary is transitive, i.e., the summaries of the callees are included in the caller's summary. If no summary is available, as may happen under separate compilation, all tied pairs are killed. Otherwise, only those tied by the fields or array types in μ(F) have to be killed:
The gen set is normally Ø. However, if the program call graph is available, as may happen under whole-program compilation, better gen information can be produced.
We now define the x-projection operator mentioned above. The x-projection on a root v of a set S of ordered pairs is the set of first elements in pairs of the form (u,v) in S. This is expressed as xproj(S,v). Consider a return point Q in the function F, at which a local reference r is returned. We call xproj(olook(Q),r) the return overlooker set of F at Q. Using this definition, the intersection of the return overlooker sets across all the return points of F gives the set of roots that always overlook F's returned value. Let this be called olookret(F).
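A minimal sketch of olookret(F) follows, assuming each return point of F is summarized as a pair (olook(Q), r) of the olook set at Q and the local reference r returned there. Returning Ø for a function with no return points is an assumption of this sketch.

```python
from functools import reduce
from typing import List, Set, Tuple

Pair = Tuple[str, str]  # (overlooker, overlookee)


def xproj(pairs: Set[Pair], v: str) -> Set[str]:
    """x-projection of a set of pairs on root v."""
    return {u for (u, w) in pairs if w == v}


def olookret(return_points: List[Tuple[Set[Pair], str]]) -> Set[str]:
    """Intersection of F's return overlooker sets over all its return points."""
    sets = [xproj(olook, r) for olook, r in return_points]
    if not sets:
        return set()  # assumption: nothing to propagate
    return reduce(lambda a, b: a & b, sets)
```

For example, if the first return point returns r1 with overlookers {I, a} and the second returns r2 with overlookers {I}, only I always overlooks F's returned value.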
Now consider a call statement that invokes F:
For Equation (6.2) to be efficacious, the analysis should be first performed on the callees of a procedure, before being performed on the procedure itself. This can be done by processing the procedures in a postorder traversal of the call graph. The olookret sets for leaf procedures, and procedures at the ends of back edges in the call graph, can be set to Ø.
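The ordering requirement can be sketched with a depth-first postorder over the call graph. The callback analyze(f, callee_ret) is a hypothetical stand-in for the intraprocedural analysis that returns olookret(f). In a DFS postorder, the only callees not yet analyzed when f is processed are those reached through back edges (including self-recursion), and their olookret is taken to be Ø.

```python
from typing import Callable, Dict, List, Set


def analyze_postorder(
    call_graph: Dict[str, List[str]],
    entry: str,
    analyze: Callable[[str, Dict[str, Set[str]]], Set[str]],
) -> Dict[str, Set[str]]:
    """Run a per-procedure analysis in postorder of the call graph."""
    order: List[str] = []
    visited: Set[str] = set()

    def dfs(f: str) -> None:
        visited.add(f)
        for g in call_graph.get(f, []):
            if g not in visited:
                dfs(g)
        order.append(f)  # postorder: after all callees

    dfs(entry)
    results: Dict[str, Set[str]] = {}
    for f in order:
        # Callees at the ends of back edges are not analyzed yet: use Ø.
        callee_ret = {g: results.get(g, set()) for g in call_graph.get(f, [])}
        results[f] = analyze(f, callee_ret)
    return results
```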
The extension makes it possible to derive valuable overlooking information across procedure boundaries, without resorting to a full interprocedural analysis. For instance, it can be determined that the reference returned by the function
6.3 Simple Assignments
There are two types of simple assignment statements discussed herein. The first are statements of the form y:=x. This statement kills pairs in which the overlooker is y, and the overlookee is not x or something overlooked by x. It also kills pairs in which the overlookee is y, and the overlooker is not x or something that overlooks x. Thus,
$$\mathit{kill}(s)=\{(y,v)\mid v\neq x \wedge (x,v)\notin \mathit{olook}_{in}(s)\}\cup\{(u,y)\mid u\neq x \wedge (u,x)\notin \mathit{olook}_{in}(s)\} \tag{6.3}$$
The statement generates two kinds of overlooking pairs: (1) those in which the overlooker is y, and the overlookee is x and whatever is overlooked by x; and (2) those in which the overlookee is y, and the overlooker is x and whatever overlooks x. This gives
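$$\mathit{gen}(s)=\{(y,v)\mid v\neq y \wedge (v=x \vee (x,v)\in \mathit{olook}^{*}_{in}(s))\}\cup\{(u,y)\mid u\neq y \wedge (u=x \vee (u,x)\in \mathit{olook}^{*}_{in}(s))\} \tag{6.4}$$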
Pairs of the form (u,u) should not exist in the gen sets because of the irreflexivity of the olook sets. This is the reason for the predicates v≠y and u≠y in Equation (6.4).
As discussed above, the tie function is updated at only those pairs that are in gen*(s). From Equations (6.4) and (6.5), these are of the form (y,v) or (u,y). For the (y,v) pairs, T is updated to include the ties for (x,v). For the (u,y) pairs, it is updated to include the ties for (u,x). This leads to the update equation
applied at all (u,v)∈gen*(s). In the above equation,
is a concise representation of T(u,v)←T(u,v)∪X.
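The y:=x case can be sketched as a transfer function over an olook set and a tie function. This is an illustration under stated assumptions: olook*_in(s) is treated as olook_in(s), gen*(s) as gen(s), absent tie entries stand for Ø, and y ≠ x. The function name is hypothetical.

```python
from typing import Dict, FrozenSet, Set, Tuple

Pair = Tuple[str, str]             # (overlooker, overlookee)
Ties = Dict[Pair, FrozenSet[str]]  # tie function T; absent pairs map to Ø
EMPTY: FrozenSet[str] = frozenset()


def copy_assign(y: str, x: str, olook_in: Set[Pair], tie: Ties):
    """Transfer for y := x: Equations (6.3) and (6.4) plus the tie update."""
    over_x = {u for (u, v) in olook_in if v == x}   # roots that overlook x
    under_x = {v for (u, v) in olook_in if u == x}  # roots that x overlooks
    kill = {(u, v) for (u, v) in olook_in
            if (u == y and v != x and v not in under_x)
            or (v == y and u != x and u not in over_x)}
    gen = ({(y, v) for v in under_x | {x} if v != y}
           | {(u, y) for u in over_x | {x} if u != y})
    new_tie = dict(tie)
    for (u, v) in gen:
        if u == y:   # (y, v): y inherits the ties of (x, v); T(x, x) = Ø
            extra = tie.get((x, v), EMPTY)
        else:        # (u, y): y inherits the ties of (u, x)
            extra = tie.get((u, x), EMPTY)
        new_tie[(u, v)] = new_tie.get((u, v), EMPTY) | extra
    return (olook_in - kill) | gen, new_tie
```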
The second type are statements of the form y:=c, where c is a constant. Since c is a constant reference, its target can be viewed as the target of an immortal root. In various implementations, the analysis may choose from a range of options for how many immortal roots to model. At one extreme, a unique immortal root may be associated with every different c; the statement can then be treated the same way as y:=x, by substituting x with the immortal root corresponding to c. At the other extreme, a single immortal root, say I, simultaneously targets all immortal objects. This option trades precision for simplicity. All pairs in which the overlookee is y and the overlooker is not I, or in which the overlooker is y, would then have to be killed:
$$\mathit{kill}(s)=\{(u,v)\mid (u\neq I \wedge v=y)\vee u=y\} \tag{6.7}$$
The gen set calculation for this statement should produce the pair (I,y). With a single immortal root, this will be the only pair generated; the pair (y,I) is not generated because I may target more than one object, of which some may not be reachable from y. Thus
gen(s)={(I,y)} (6.8)
T is not updated here, because I has no overlookers.
6.4 Allocations
These are statements of the form y:=allocobj(T). As explained above, the object returned by allocobj(T) can be thought of as the target of a pristine root. As in the immortal case, various options are available for how many pristine roots to consider: there could be one per allocated type, one per field per allocated type, or even one per allocation site. For the sake of simplicity, the implementation described herein assumes a single pristine root, say P, for all allocated objects. With a single pristine root, the same issues that pertained to the calculations in Equations (6.7) and (6.8) apply to the kill and gen set calculations here:
$$\mathit{kill}(s)=\{(u,v)\mid (u\neq P \wedge v=y)\vee u=y\} \tag{6.11}$$
gen(s)={(P,y)} (6.12)
T is also not updated here because P has no overlookers.
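Because y:=c (with a single immortal root I) and y:=allocobj(T) (with a single pristine root P) have the same shape, one sketch covers Equations (6.7), (6.8), (6.11) and (6.12); the function name is hypothetical and the tie function is left unchanged, as stated above.

```python
from typing import Set, Tuple

Pair = Tuple[str, str]          # (overlooker, overlookee)
IMMORTAL, PRISTINE = "I", "P"   # single immortal root and single pristine root


def assign_virtual(y: str, vroot: str, olook_in: Set[Pair]) -> Set[Pair]:
    """Transfer for y := c (vroot = I) or y := allocobj(T) (vroot = P).

    Kill: pairs whose overlookee is y and whose overlooker is not vroot,
    plus every pair in which y is the overlooker.
    Gen: only (vroot, y), because vroot may target many objects.
    """
    kill = {(u, v) for (u, v) in olook_in if (u != vroot and v == y) or u == y}
    return (olook_in - kill) | {(vroot, y)}
```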
6.5 Heap Loads
These are statements of the form y:=x.f. If x points to a thread-local object, or if f is a thread-safe field (i.e., only accessed by a particular thread) or a read-only field, we say that the statement is multithread (MT) safe. For such statements, all pairs in which y is the overlooker must be killed, as must all pairs in which the overlookee is y and the overlooker is neither x nor something that overlooks x. For other statements, all pairs in which y is either the overlooker or the overlookee are killed:
There are a couple of cases in the gen set analysis for this statement. The easiest are the ones where s is not known to be MT-safe. In this case, depending on whether f is an immortal field, the gen set is either Ø or has the single pair (I,y). Fields are immortal if they always target immortal objects (i.e., even when simultaneously mutated by multiple threads). An example is the vtable field that all objects possess in many object-oriented language implementations.
If s is MT-safe, then gen(s) will at least have pairs in which the overlookee is y, and the overlooker is x and whatever overlooks x. These cases yield the following equation:
In Equation (6.15), the second case occurs when f is an immortal field and s is not MT-safe. In the first case, ψ is nonempty only when f is immortal, or when P directly overlooks x. If P directly overlooks x, then y can be considered to be overlooked by I, since f will then be in a pristine state. We therefore obtain the following equation for ψ:
According to Equation (6.17), there may be two types of pairs in gen*(s). The first is (I,y); the tie function has to be updated for it only if s is MT-safe and (I,x)∈olookin*(s). The second type is (u,y), where u≠I. For these pairs, either (u,x)∈olookin*(s) or u=x; in both of these cases, f may tie u to y. If u≠x, then whatever ties u to x may also tie u to y. This leads to the following update of the tie function, performed at all (u,v)∈gen*(s):
The treatment of y:=x[e] is similar, except that instead of a tie field, the discussion involves a tie array type.
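The kill, gen and tie-update equations for heap loads are not reproduced in this text, so the sketch below follows the prose of this section. Its assumptions: olook*_in(s) is treated as olook_in(s), y ≠ x, a single immortal root I and pristine root P are used, and P is taken to directly overlook x when (P,x) is present with an empty tie set (mirroring the predicate used for heap stores). The function name is hypothetical.

```python
from typing import Dict, FrozenSet, Set, Tuple

Pair = Tuple[str, str]
Ties = Dict[Pair, FrozenSet[str]]
EMPTY: FrozenSet[str] = frozenset()
I, P = "I", "P"  # single immortal and single pristine virtual root


def heap_load(y: str, x: str, f: str, olook_in: Set[Pair], tie: Ties,
              mt_safe: bool, f_immortal: bool):
    """Transfer for y := x.f, following the prose of Section 6.5."""
    over_x = {u for (u, v) in olook_in if v == x}
    if mt_safe:
        kill = {(u, v) for (u, v) in olook_in
                if u == y or (v == y and u != x and u not in over_x)}
    else:
        kill = {(u, v) for (u, v) in olook_in if y in (u, v)}

    gen: Set[Pair] = set()
    if mt_safe:
        gen |= {(u, y) for u in over_x | {x} if u != y}
        p_direct = (P, x) in olook_in and not tie.get((P, x), EMPTY)
        if f_immortal or p_direct:
            gen.add((I, y))  # psi: f immortal, or f still pristine
    elif f_immortal:
        gen.add((I, y))

    new_tie = dict(tie)
    for (u, v) in gen:
        if u == I and not (mt_safe and (I, x) in olook_in):
            continue  # (I, y) only from psi or an immortal field: no update
        extra = {f} if u == x else tie.get((u, x), EMPTY) | {f}
        new_tie[(u, v)] = new_tie.get((u, v), EMPTY) | extra
    return (olook_in - kill) | gen, new_tie
```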
6.6 Heap Stores
These are statements of the form y.f:=x. We say that the statement is MT-safe if y points to a thread-local object, or if f is a read-only or thread-safe field. Note that it is not necessarily contradictory for f to be read-only despite this being an update of f; this will be the case if the update is in an object construction sequence. Then all accesses to f in the sequence will still be thread safe because the object being constructed will only be accessible to the initializing thread.
Irrespective of whether it is MT-safe, the statement will kill all overlooking pairs involving the pristine root if the number of pristine roots is one and x is not directly overlooked by the immortal root. This is because the update could then destroy the pristine state of any newly allocated object.
At first glance, it would appear that all pairs in olookin(s) that are tied by f would have to be killed. But, from the standpoint of the overlooking roots relation, the statement has two important properties that permit better kill information. First, it can be proven that the following holds: let s be the statement y.f:=x, which is given to be MT-safe; if m overlooks x just before s, and if m is not the pristine root, then m will also overlook x just after s.
Second, the following can also be proven: let s be the statement y.f:=x, which is not necessarily MT-safe; if n overlooks y just before s, and if n is not the pristine root, then n will also overlook y just after s. Unlike the first relation, this one does not require s to be MT-safe. This is because if n overlooks y just before s, then during the execution of s there will always be a path by which n overlooks y that is free of the specific instance of f updated by s. Hence, from the two relations above and the discussion on killing pairs that involve the pristine root, we obtain:
In Equation (6.21), the predicate (I,x)∉olookin(s) ∨ T(I,x)≠Ø is true if I might not directly overlook x. In that case, the equation includes all pairs in which P is the overlooker.
If s is MT-safe, then all of the overlookers of y, including y, will end up overlooking x as well as whatever is overlooked by x. If s is not MT-safe, but f is known to be an immortal field, then x will be directly overlooked by I. This is subtle, because even if another thread mutates f as s is executed, its target, by definition, remains immortal. Neither y nor x, however, ends up overlooking I, because I could target multiple objects. This gives
ξ in Equation (6.23) is similar to ψ in Equation (6.24). It is usually Ø, except when f is immortal:
From Equation (6.23), a pair in gen*(s) can take one of three forms. If it is (y,x), then only f needs to be added to T(y,x). If it is of the form (y,v), where v≠x, then {f} and T(x,v) have to be added to T(y,v). Pairs in gen*(s) that do not match (y,v) will be of the form (u,v), where either (u,y)∈olookin*(s) or u=I. In both cases, a safe update of T(u,v) is to add T(u,y), {f} and T(x,v) to it. Observing that both T(x,x) and T(y,y) equal Ø, all of these cases can be combined into:
where (u,v)∈gen*(s). The statement y[e]:=x is handled the same way.
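The gen set and the combined tie update for y.f:=x can be sketched as follows; the kill set is not modeled. The sketch treats olook*_in(s) as olook_in(s), takes ξ to be {(I,x)} when f is an immortal field, and applies the combined "safe" update T(u,v) ← T(u,v) ∪ T(u,y) ∪ {f} ∪ T(x,v) literally to every generated pair. The function name is hypothetical.

```python
from typing import Dict, FrozenSet, Set, Tuple

Pair = Tuple[str, str]
Ties = Dict[Pair, FrozenSet[str]]
EMPTY: FrozenSet[str] = frozenset()
I = "I"  # single immortal virtual root


def heap_store_gen(y: str, x: str, f: str, olook_in: Set[Pair], tie: Ties,
                   mt_safe: bool, f_immortal: bool):
    """gen(s) and the tie update for y.f := x (kill set not modeled here)."""

    def t(u: str, v: str) -> FrozenSet[str]:
        return EMPTY if u == v else tie.get((u, v), EMPTY)  # T(x,x) = T(y,y) = Ø

    gen: Set[Pair] = set()
    if mt_safe:
        # y and every overlooker of y now overlook x and whatever x overlooks.
        sources = {y} | {u for (u, v) in olook_in if v == y}
        targets = {x} | {v for (u, v) in olook_in if u == x}
        gen |= {(u, v) for u in sources for v in targets if u != v}
    if f_immortal:
        gen.add((I, x))  # xi: I overlooks x when f is an immortal field

    new_tie = dict(tie)
    for (u, v) in gen:
        # Combined update: add T(u, y), {f} and T(x, v) to T(u, v).
        new_tie[(u, v)] = new_tie.get((u, v), EMPTY) | t(u, y) | {f} | t(x, v)
    return gen, new_tie
```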
6.7 Examples of Meet Operators
The meet operation for the overlooking roots analysis is set intersection, except that overlooking pairs containing the ⊤ root (the undefined virtual root) are dealt with specially. ⊤ can only occur in pairs of the form (⊤,u). This is because when there are no upward-exposed uses of a concrete root x, x will not overlook any other root until it is defined. Until its definition, its only overlooker will be ⊤, after which it will no longer be overlooked by ⊤. Thus, if olook1 and olook2 are two olook sets that reach a confluence point, their meet olook1 ⊓ olook2 at that point can be given by:
Let Ř be the set of all roots. Because ⊓ is an idempotent, commutative and associative operator on sets of ordered pairs drawn from Ř×Ř, the pair (℘(Ř×Ř), ⊓), where ℘ denotes the power set, defines a semilattice. The semilattice, the operator, and the transfer functions in Section 4.2 form a monotone data-flow analysis framework.
7. Computing Environment
The above reference-counting insertion and overlooking-root-based optimization techniques can be performed on any of a variety of computing devices. The techniques can be implemented in hardware circuitry, as well as in software executing within a computer or other computing environment, such as shown in
With reference to
A computing environment may have additional features. For example, the computing environment (2300) includes storage (2340), one or more input devices (2350), one or more output devices (2360), and one or more communication connections (2370). An interconnection mechanism (not shown) such as a bus, controller, or network, interconnects the components of the computing environment (2300). Typically, operating system software (not shown) provides an operating environment for other software executing in the computing environment (2300), and coordinates activities of the components of the computing environment (2300).
The storage (2340) may be removable or non-removable, and includes magnetic disks, magnetic tapes or cassettes, CD-ROMs, CD-RWs, DVDs, or any other medium which can be used to store information and which can be accessed within the computing environment (2300). The storage (2340) stores instructions for the software (2380) implementing the described techniques.
The input device(s) (2350) may be a touch input device such as a keyboard, mouse, pen, or trackball, a voice input device, a scanning device, or another device that provides input to the computing environment (2300). For audio, the input device(s) (2350) may be a sound card or similar device that accepts audio input in analog or digital form, or a CD-ROM reader that provides audio samples to the computing environment. The output device(s) (2360) may be a display, printer, speaker, CD writer, or another device that provides output from the computing environment (2300).
The communication connection(s) (2370) enable communication over a communication medium to another computing entity. The communication medium conveys information such as computer-executable instructions, compressed audio or video information, or other data in a modulated data signal. A modulated data signal is a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media include wired or wireless techniques implemented with an electrical, optical, RF, infrared, acoustic, or other carrier.
The techniques described herein can be described in the general context of computer-readable media. Computer-readable media are any available media that can be accessed within a computing environment. By way of example, and not limitation, with the computing environment (2300), computer-readable media include memory (2320), storage (2340), communication media, and combinations of any of the above.
The techniques herein can be described in the general context of computer-executable instructions, such as those included in program modules, being executed in a computing environment on a target real or virtual processor. Generally, program modules include routines, programs, libraries, objects, classes, components, data structures, etc., which perform particular tasks or implement particular abstract data types. The functionality of the program modules may be combined or split between program modules as desired in various embodiments. Computer-executable instructions for program modules may be executed within a local or distributed computing environment.
For the sake of presentation, the detailed description uses terms like “determine,” “generate,” “interpolate,” and “compute” to describe computer operations in a computing environment. These terms are high-level abstractions for operations performed by a computer, and should not be confused with acts performed by a human being. The actual computer operations corresponding to these terms vary depending on implementation.
In view of the many possible variations of the subject matter described herein, we claim as our invention all such embodiments as may come within the scope of the following claims and equivalents thereto.
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Oct 05 2007 | Microsoft Corporation | (assignment on the face of the patent) | / | |||
Oct 05 2007 | JOISHA, PRAMOD G | Microsoft Corporation | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 019975 | /0082 | |
Oct 14 2014 | Microsoft Corporation | Microsoft Technology Licensing, LLC | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 034542 | /0001 |
Date | Maintenance Fee Events |
Jan 30 2012 | ASPN: Payor Number Assigned. |
Jun 24 2015 | M1551: Payment of Maintenance Fee, 4th Year, Large Entity. |
Sep 16 2019 | REM: Maintenance Fee Reminder Mailed. |
Mar 02 2020 | EXP: Patent Expired for Failure to Pay Maintenance Fees. |
Date | Maintenance Schedule |
Jan 24 2015 | 4 years fee payment window open |
Jul 24 2015 | 6 months grace period start (w surcharge) |
Jan 24 2016 | patent expiry (for year 4) |
Jan 24 2018 | 2 years to revive unintentionally abandoned end. (for year 4) |
Jan 24 2019 | 8 years fee payment window open |
Jul 24 2019 | 6 months grace period start (w surcharge) |
Jan 24 2020 | patent expiry (for year 8) |
Jan 24 2022 | 2 years to revive unintentionally abandoned end. (for year 8) |
Jan 24 2023 | 12 years fee payment window open |
Jul 24 2023 | 6 months grace period start (w surcharge) |
Jan 24 2024 | patent expiry (for year 12) |
Jan 24 2026 | 2 years to revive unintentionally abandoned end. (for year 12) |