The Java virtual machine's heap stores all objects created by a
running Java application. Objects are created by the new never
freed explicitly by the code. Garbage collection is the process of
automatically freeing objects that are no longer referenced by the program.
Why Garbage Collection?
Objects no longer needed by
the program are "garbage" and can be thrown away. When an object is
no longer referenced can be recycled from heap so that the space is made
available new objects.
The garbage collector determines
which objects are no longer referenced by the program and make available the
heap space occupied by such unreferenced objects.
In the
process of freeing unreferenced objects, the garbage collector must run any
finalizers of objects being freed.
To
freeing unreferenced objects, a garbage collector may also combat heap fragmentation. New objects are
allocated, and unreferenced objects are freed such that free portions of heap
memory are left in between portions occupied by live objects. New object may allocated
by extending the size of the heap even though there is enough total unused
space in the existing heap. If there is not enough contiguous free heap space
available into which the new object will fit.
On a
virtual memory system, the extra paging (or swapping) required to service an
ever growing heap can degrade the performance of the executing program. On an
embedded system with low memory, fragmentation could cause the virtual machine
to "run out of memory" unnecessarily.
Advantages, Garbage
collection relieves programmer from the burden of freeing allocated memory. It
helps ensure program integrity. Garbage collection is an important part of
Java's security strategy.
Disadvantage, The JVM
has to keep track of which objects are being referenced by the executing
program, and finalize and free unreferenced objects on the fly. This activity
will likely require more CPU time than would have been required if the program
explicitly freed unnecessary memory. Programmers in a garbage-collected
environment have less control over the scheduling of CPU time devoted to
freeing objects that are no longer needed.
Garbage Collection Algorithms
Garbage collection
algorithm has two basic steps.
It must detect garbage objects.
It must reclaim the heap space used by the garbage objects and
make the space available again to the program.
Garbage
detection is ordinarily accomplished by defining a set of roots and
determining reachability from the roots. If there is some path of
references from the roots, an object is reachable (considered "live"
else considered garbage because there longer use in program execution).
The root set is dependent JVM
implementation, but would always include any object references in the local
variables and operand stack, object references (any class variables, strings constant
pool of loaded classes).
The constant pool of a
loaded class may refer to strings stored on the heap, such as the class name,
superclass name, super interface names, field names, field signatures, method
names, and method signatures.
Two basic
approaches to distinguishing live objects from garbage are reference counting and tracing.
Reference Counting Collectors
Reference
counting garbage collectors distinguish live objects from garbage objects by
keeping a count for each object on the heap. The count keeps track of the
number of references to that object.
Reference counting was an
early garbage collection strategy. In this approach, a reference count is
maintained for each object on the heap. When an object is first created and a
reference to it is assigned to a variable, the object's reference count is set
to one. When any other variable is assigned a reference to that object, the
object's count is incremented. When a reference to an object goes out of scope
or is assigned a new value, the object's count is decremented.
Any object with a reference
count of zero can be garbage collected. When an object is garbage collected may
lead to subsequent garbage collection if any objects refers garbage collected
object.
Advantage, of this
approach is that a reference counting collector can run in small chunks of time
closely interwoven with the execution of the program. This characteristic makes
it particularly suitable for real-time environments where the program can't be
interrupted for very long.
Disadvantage, is that reference
counting does not detect cycles: two or more objects that refer to
one another. Parent object has a reference
to a child object that has a reference back to the parent. These objects will never have a reference count of zero
even though they may be unreachable by the roots of the executing program.
Reference counting is the
overhead of incrementing and decrementing the reference count each time.
Because of the
disadvantages inherent in the reference counting approach, this technique is
currently out of favor.
Tracing Collectors
Tracing garbage collectors
trace out the graph of object references starting with the root nodes. Objects
that are encountered during the trace are marked in some way. Marking is
generally done by either setting flags in the objects themselves or by setting
flags in a separate bitmap. After the trace is complete, unmarked objects are
known to be unreachable and can be garbage collected.
The basic
tracing algorithm is called "mark
and sweep." In the mark phase,
the garbage collector traverses the tree of references and marks each object it
encounters. In the sweep phase, unmarked objects are freed.
In the JVM,
the sweep phase must include finalization of objects.
Compacting Collectors
Garbage collectors of JVM
will likely have a strategy to combat heap fragmentation.
Mark and
sweep collectors are commonly uses two strategies compacting and copying.
Both of these approaches
move objects on the fly to reduce heap fragmentation. Compacting collectors
slide live objects over free memory space toward one end of the heap result to other
end of the heap becomes one large contiguous free area. All references to the
moved objects are updated to refer to the new location.
Updating references to
moved objects is sometimes made simpler by adding a level of indirection to
object references. Instead of referring
directly to objects on the heap, object references refer to a table of object
handles. The object handles refer to the actual objects on the heap. When
an object is moved, only the object handle must be updated with the new
location. All references to the object in the executing program will still
refer to the updated handle, which did not move. While this approach simplifies
the job of heap de-fragmentation, it adds a performance overhead to every object
access.
Copying Collectors
Copying garbage collectors
move all live objects to a new area. As the objects are moved to the new area,
they are placed side by side, thus eliminating any free space that may have
separated them in the old area.
Advantage, of this
approach is that objects can be copied as they are discovered by the traversal
from the root nodes. There are no separate mark and sweep phases. Objects are
copied to the new area on the fly, and forwarding pointers are left in their
old locations. The forwarding pointers allow the garbage collector to detect
references to objects that have already been moved. The garbage collector can
then assign the value of the forwarding pointer to the references so they point
to the object's new location.
A common copying collector
algorithm is called "stop and copy."
In this scheme, the heap is
divided into two regions. Only one of the two regions is used at any time.
Objects are allocated from one of the regions until all the space in that
region has been exhausted. At that point program execution is stopped and the
heap is traversed. Live objects are copied to the other region as they are
encountered by the traversal. When the stop and copy procedure is finished,
program execution resumes. Memory will be allocated from the new heap region
until it too runs out of space. At that point the program will once again be
stopped. The heap will be traversed and live objects will be copied back to the
original region. The cost associated with this approach is that twice as much
memory is needed for a given amount of heap space because only half of the
available memory is used at any time.
Disadvantage, of simple stop and copy collectors is that all live
objects must be copied at every collection.
Generational Collectors
This facet of copying
algorithms can be improved:
- Most
objects created by most programs have very short lives.
- Most
programs create some objects that have very long lifetimes. A major source
of inefficiency in simple copying collectors is that they spend much of
their time copying the same long-lived objects again and again.
Generational collectors
address this inefficiency by grouping objects by age and garbage collecting
younger objects more often than older objects. In this approach, the heap is
divided into two or more sub-heaps, each of which serves one
"generation" of objects. The youngest generation is garbage collected
most often. As most objects are short-lived, only a small percentage of young
objects are likely to survive their first collection. Once an object has
survived a few garbage collections as a member of the youngest generation, the
object is promoted to the next generation: it is moved to another sub-heap.
Each progressively older generation is garbage collected less often than the
next younger generation. As objects "mature" (survive multiple
garbage collections) in their current generation, they are moved to the next
older generation.
The generational collection
technique can be applied to mark and sweep algorithms as well as copying
algorithms. In either case, dividing the heap into generations of objects can
help improve the efficiency of the basic underlying garbage collection
algorithm.
Adaptive Collectors
An
adaptive algorithm the current situation on the heap and adjusts its garbage
collection technique accordingly. It may tweak the parameters of a single
garbage collection algorithm as the program runs. Because every garbage
collection algorithms work better in some situations, while others work better
in other situations. It may switch from one algorithm to another on the fly. Or
it may divide the heap into sub-heaps and use different algorithms on different
sub-heaps simultaneously.
With an adaptive approach,
designers of JVM implementations can choose garbage collection technique and can
use algorithms for which it is best suited.
No comments:
Post a Comment