Saturday, 19 September 2015

Garbage Collection Algorithms - memory recycling

The Java virtual machine's heap stores all objects created by a running Java application. Objects are created by the new never freed explicitly by the code. Garbage collection is the process of automatically freeing objects that are no longer referenced by the program.

Why Garbage Collection?

Objects no longer needed by the program are "garbage" and can be thrown away. When an object is no longer referenced can be recycled from heap so that the space is made available new objects.

The garbage collector determines which objects are no longer referenced by the program and make available the heap space occupied by such unreferenced objects.

In the process of freeing unreferenced objects, the garbage collector must run any finalizers of objects being freed.

To freeing unreferenced objects, a garbage collector may also combat heap fragmentation. New objects are allocated, and unreferenced objects are freed such that free portions of heap memory are left in between portions occupied by live objects. New object may allocated by extending the size of the heap even though there is enough total unused space in the existing heap. If there is not enough contiguous free heap space available into which the new object will fit.

On a virtual memory system, the extra paging (or swapping) required to service an ever growing heap can degrade the performance of the executing program. On an embedded system with low memory, fragmentation could cause the virtual machine to "run out of memory" unnecessarily.

Advantages, Garbage collection relieves programmer from the burden of freeing allocated memory. It helps ensure program integrity. Garbage collection is an important part of Java's security strategy.

Disadvantage, The JVM has to keep track of which objects are being referenced by the executing program, and finalize and free unreferenced objects on the fly. This activity will likely require more CPU time than would have been required if the program explicitly freed unnecessary memory. Programmers in a garbage-collected environment have less control over the scheduling of CPU time devoted to freeing objects that are no longer needed.

Garbage Collection Algorithms

Garbage collection algorithm has two basic steps.
It must detect garbage objects.

It must reclaim the heap space used by the garbage objects and make the space available again to the program.

Garbage detection is ordinarily accomplished by defining a set of roots and determining reachability from the roots. If there is some path of references from the roots, an object is reachable (considered "live" else considered garbage because there longer use in program execution).

The root set is dependent JVM implementation, but would always include any object references in the local variables and operand stack, object references (any class variables, strings constant pool of loaded classes).

The constant pool of a loaded class may refer to strings stored on the heap, such as the class name, superclass name, super interface names, field names, field signatures, method names, and method signatures.

Two basic approaches to distinguishing live objects from garbage are reference counting and tracing.

Reference Counting Collectors

Reference counting garbage collectors distinguish live objects from garbage objects by keeping a count for each object on the heap. The count keeps track of the number of references to that object.

Reference counting was an early garbage collection strategy. In this approach, a reference count is maintained for each object on the heap. When an object is first created and a reference to it is assigned to a variable, the object's reference count is set to one. When any other variable is assigned a reference to that object, the object's count is incremented. When a reference to an object goes out of scope or is assigned a new value, the object's count is decremented.

Any object with a reference count of zero can be garbage collected. When an object is garbage collected may lead to subsequent garbage collection if any objects refers garbage collected object.

Advantage, of this approach is that a reference counting collector can run in small chunks of time closely interwoven with the execution of the program. This characteristic makes it particularly suitable for real-time environments where the program can't be interrupted for very long.

Disadvantage, is that reference counting does not detect cycles: two or more objects that refer to one another. Parent object has a reference to a child object that has a reference back to the parent. These objects will never have a reference count of zero even though they may be unreachable by the roots of the executing program.

Reference counting is the overhead of incrementing and decrementing the reference count each time.

Because of the disadvantages inherent in the reference counting approach, this technique is currently out of favor.

Tracing Collectors

Tracing garbage collectors trace out the graph of object references starting with the root nodes. Objects that are encountered during the trace are marked in some way. Marking is generally done by either setting flags in the objects themselves or by setting flags in a separate bitmap. After the trace is complete, unmarked objects are known to be unreachable and can be garbage collected.

The basic tracing algorithm is called "mark and sweep."  In the mark phase, the garbage collector traverses the tree of references and marks each object it encounters. In the sweep phase, unmarked objects are freed.
In the JVM, the sweep phase must include finalization of objects.

Compacting Collectors

Garbage collectors of JVM will likely have a strategy to combat heap fragmentation.

Mark and sweep collectors are commonly uses two strategies compacting and copying.

Both of these approaches move objects on the fly to reduce heap fragmentation. Compacting collectors slide live objects over free memory space toward one end of the heap result to other end of the heap becomes one large contiguous free area. All references to the moved objects are updated to refer to the new location.

Updating references to moved objects is sometimes made simpler by adding a level of indirection to object references. Instead of referring directly to objects on the heap, object references refer to a table of object handles. The object handles refer to the actual objects on the heap. When an object is moved, only the object handle must be updated with the new location. All references to the object in the executing program will still refer to the updated handle, which did not move. While this approach simplifies the job of heap de-fragmentation, it adds a performance overhead to every object access.

Copying Collectors

Copying garbage collectors move all live objects to a new area. As the objects are moved to the new area, they are placed side by side, thus eliminating any free space that may have separated them in the old area.

Advantage, of this approach is that objects can be copied as they are discovered by the traversal from the root nodes. There are no separate mark and sweep phases. Objects are copied to the new area on the fly, and forwarding pointers are left in their old locations. The forwarding pointers allow the garbage collector to detect references to objects that have already been moved. The garbage collector can then assign the value of the forwarding pointer to the references so they point to the object's new location.

A common copying collector algorithm is called "stop and copy."

In this scheme, the heap is divided into two regions. Only one of the two regions is used at any time. Objects are allocated from one of the regions until all the space in that region has been exhausted. At that point program execution is stopped and the heap is traversed. Live objects are copied to the other region as they are encountered by the traversal. When the stop and copy procedure is finished, program execution resumes. Memory will be allocated from the new heap region until it too runs out of space. At that point the program will once again be stopped. The heap will be traversed and live objects will be copied back to the original region. The cost associated with this approach is that twice as much memory is needed for a given amount of heap space because only half of the available memory is used at any time.

Disadvantage, of simple stop and copy collectors is that all live objects must be copied at every collection.

Generational Collectors

This facet of copying algorithms can be improved:
  1. Most objects created by most programs have very short lives.
  2. Most programs create some objects that have very long lifetimes. A major source of inefficiency in simple copying collectors is that they spend much of their time copying the same long-lived objects again and again.
Generational collectors address this inefficiency by grouping objects by age and garbage collecting younger objects more often than older objects. In this approach, the heap is divided into two or more sub-heaps, each of which serves one "generation" of objects. The youngest generation is garbage collected most often. As most objects are short-lived, only a small percentage of young objects are likely to survive their first collection. Once an object has survived a few garbage collections as a member of the youngest generation, the object is promoted to the next generation: it is moved to another sub-heap. Each progressively older generation is garbage collected less often than the next younger generation. As objects "mature" (survive multiple garbage collections) in their current generation, they are moved to the next older generation.

The generational collection technique can be applied to mark and sweep algorithms as well as copying algorithms. In either case, dividing the heap into generations of objects can help improve the efficiency of the basic underlying garbage collection algorithm.

Adaptive Collectors

An adaptive algorithm the current situation on the heap and adjusts its garbage collection technique accordingly. It may tweak the parameters of a single garbage collection algorithm as the program runs. Because every garbage collection algorithms work better in some situations, while others work better in other situations. It may switch from one algorithm to another on the fly. Or it may divide the heap into sub-heaps and use different algorithms on different sub-heaps simultaneously.

With an adaptive approach, designers of JVM implementations can choose garbage collection technique and can use algorithms for which it is best suited.

No comments:

Post a Comment

Related Posts Plugin for WordPress, Blogger...