Concurrent roots and class unloading

The first part of this miniseries about Shenandoah GC in JDK 14 covered self-fixing barriers. This article discusses concurrent roots processing and concurrent class unloading, both of which aim to reduce GC pause time by moving GC work from the pause to a concurrent phase.

Concurrent roots processing

Once concurrent marking is done, Shenandoah needs to complete the marking and prepare for evacuation. While these are two logically independent operations, they are performed under a single pause that is confusingly named "Final Mark."

While evacuation itself is concurrent in Shenandoah, a few things still need to be done during the pause. These include:

  • Pre-evacuating and updating non-weak roots (for example, thread stacks and strong JNI handles).
  • Pre-evacuating and cleaning up weak roots (for example, string tables and weak JNI handles).
  • Unloading classes.

Since this work is done during the pause, it affects pause times. To minimize these pause times, we want to perform most of these tasks concurrently. This approach is particularly important for any GC roots that are unbounded in size.

We need to pre-evacuate and update all GC roots during the pause in order to maintain the strong invariant: any object that is read from or stored to must be in to-space.

Here is the important caveat: Loads of objects from GC roots do not employ load reference barriers. So, the application has to see the correct copy of the object, and we have to perform the evacuations and updates before releasing the application threads from the pause. In this problem statement lies a relatively simple solution: Ensure that loads from relevant GC roots are guarded by a Load Reference Barrier (LRB) that we call "native LRB," and move the actual updating of those roots to the concurrent phase.
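The idea can be sketched with a small, single-threaded toy model. This is not Shenandoah's code: ToyHeap, Obj, load_root, and update_roots_concurrently are hypothetical names, and a real collector would install the forwarding pointer atomically because application and GC threads race. The sketch only shows that a barrier on the root load lets the root slots be fixed after the pause.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Toy object (hypothetical): either a from-space object in the collection
// set, or an object that is already safe to use.
struct Obj {
  int payload;
  bool in_cset;    // true: must be evacuated before the application uses it
  Obj* forwardee;  // to-space copy, once one exists
  Obj(int p, bool cset) : payload(p), in_cset(cset), forwardee(nullptr) {}
};

struct ToyHeap {
  std::vector<std::unique_ptr<Obj>> objects;  // owns every object and copy
  std::vector<Obj*> roots;                    // GC root slots

  Obj* allocate(int payload, bool in_cset) {
    objects.push_back(std::unique_ptr<Obj>(new Obj(payload, in_cset)));
    return objects.back().get();
  }

  // Return the to-space copy of obj, creating it on first use.
  Obj* evacuate(Obj* obj) {
    if (obj == nullptr || !obj->in_cset) return obj;  // nothing to do
    if (obj->forwardee == nullptr) {
      obj->forwardee = allocate(obj->payload, /*in_cset=*/false);
    }
    return obj->forwardee;
  }

  // Barrier on root loads: the application never sees a from-space object,
  // even if the slot itself has not been updated yet.
  Obj* load_root(std::size_t slot) {
    assert(slot < roots.size());
    return evacuate(roots[slot]);
  }

  // Work that used to happen in the pause: fix up every root slot.
  void update_roots_concurrently() {
    for (Obj*& slot : roots) slot = evacuate(slot);
  }
};
```

In this model, a load through load_root returns the to-space copy while the slot still holds the stale from-space pointer. Running update_roots_concurrently afterwards makes the slot point at that same copy.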

The so-called "weak" roots are special, though. During marking, we might determine that the objects referenced by certain weak roots are no longer reachable. Weak JNI handles are one example. Once a weak JNI handle is declared dead (during final mark), its referent must not be accidentally resurrected, for example, by inserting a reference to the presumed-dead object back into the heap.

Therefore, not only do we have to pre-evacuate and update the weak roots that are reachable (like all other roots), we also need to clean up the weak roots that are not reachable so the application cannot possibly touch and resurrect them.

Moving this cleanup to the concurrent phase requires extra work for the native LRB, which checks whether a weak root is reachable (as told by the marking bitmap). If the weak root is not reachable, the native LRB simply returns NULL, thus pretending to the rest of the JVM that the handle is already cleaned. This process ensures that we do not accidentally make an already-unreachable object reachable again.

In pseudocode, the native LRB looks like this:

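T native_LRB(T* addr) {
  T obj = *addr;            // Load from GC root
  if (is_reachable(obj)) {  // Marked live in the marking bitmap?
    return LRB(obj);        // Regular LRB: yields the to-space copy
  } else {
    return NULL;            // Dead weak root: pretend it is already cleaned
  }
}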
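The pseudocode above can be turned into a small runnable model. The sketch below is an illustration under stated assumptions, not HotSpot code: ToyGC is a hypothetical type, a hash set stands in for the marking bitmap, and a hash map stands in for forwarding pointers. The null check on the loaded value is also our addition.

```cpp
#include <cassert>
#include <unordered_map>
#include <unordered_set>

struct Obj { int payload; };

// Hypothetical stand-in for the collector state after final mark.
struct ToyGC {
  std::unordered_set<const Obj*> marked;      // models the marking bitmap
  std::unordered_map<Obj*, Obj*> forwarding;  // from-space -> to-space copy

  bool is_reachable(const Obj* obj) const { return marked.count(obj) != 0; }

  // Regular LRB: return the to-space copy if the object was evacuated.
  Obj* lrb(Obj* obj) const {
    auto it = forwarding.find(obj);
    return it == forwarding.end() ? obj : it->second;
  }

  // Native LRB: guards loads from weak roots.
  Obj* native_lrb(Obj* const* addr) const {
    assert(addr != nullptr);
    Obj* obj = *addr;                       // load from the GC root
    if (obj == nullptr) return nullptr;     // slot already cleared
    if (is_reachable(obj)) return lrb(obj); // live: hand out to-space copy
    return nullptr;                         // dead: never resurrect it
  }
};
```

A root that points at an unmarked object reads as null through native_lrb even though the slot itself has not been cleaned yet. That is what allows the cleanup to run concurrently.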

Concurrent class unloading

Another large item during the final mark pause used to be class unloading, which is important for applications that make heavy use of class loaders. This is typically the case for application servers and other large applications (e.g., IDEs). However, class unloading is also relevant when using anonymous classes (each of which has its own class loader) and lambdas (similar to anonymous classes).

Class unloading is a complex procedure. It requires determining whether or not classes (or rather, class loaders) are reachable. This check already happens during concurrent marking. Once the reachability of all objects (including class loaders) is established, all unreachable class loaders, their classes, and their auxiliary data structures need to be unlinked and cleaned up. Compiled code that belongs to those classes also needs to be cleaned up.
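As a rough illustration of these steps, consider the toy model below. It is a sketch under simplifying assumptions and bears no resemblance to HotSpot's actual data structures: ToyVM, ToyClassLoader, and ToyCompiledMethod are hypothetical, classes are plain strings, and the marked flag stands for the result of concurrent marking.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct ToyClassLoader {
  std::string name;
  bool marked;                       // reachability found by marking
  std::vector<std::string> classes;  // classes defined by this loader
};

struct ToyCompiledMethod {
  std::string holder_class;  // class this compiled code belongs to
};

struct ToyVM {
  std::vector<ToyClassLoader> loaders;
  std::vector<ToyCompiledMethod> code_cache;

  // Unlink unreachable loaders together with their classes and compiled
  // code. Returns the number of classes that were unloaded.
  std::size_t unload_unreachable() {
    // 1. Collect the classes of every loader that marking did not reach.
    std::vector<std::string> dead;
    for (const ToyClassLoader& l : loaders) {
      if (!l.marked) dead.insert(dead.end(), l.classes.begin(), l.classes.end());
    }
    // 2. Clean compiled code that belongs to those classes.
    code_cache.erase(
        std::remove_if(code_cache.begin(), code_cache.end(),
                       [&](const ToyCompiledMethod& m) {
                         return std::find(dead.begin(), dead.end(),
                                          m.holder_class) != dead.end();
                       }),
        code_cache.end());
    // 3. Unlink the unreachable loaders themselves.
    loaders.erase(
        std::remove_if(loaders.begin(), loaders.end(),
                       [](const ToyClassLoader& l) { return !l.marked; }),
        loaders.end());
    // Postcondition: only reachable loaders remain.
    assert(std::all_of(loaders.begin(), loaders.end(),
                       [](const ToyClassLoader& l) { return l.marked; }));
    return dead.size();
  }
};
```

The model captures only the ordering of the work: reachability comes first, then the compiled code and the loaders are cleaned. It says nothing about how the real collector does this safely while the application is running.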

For the most part, Shenandoah's implementation builds on the work done by ZGC developers in JDK 13. This implementation requires the native barriers described above. It also requires so-called "nmethod entry barriers."

Usually, during the pause, we need to pre-evacuate and update all references that are embedded in all compiled methods. Ideally, we would only pre-evacuate and update references in methods that are currently executing (i.e., reachable from frames on the thread stacks), and handle the other methods concurrently. For this approach to work, we need to handle the scenario where a thread starts executing a method whose embedded references have not been processed yet.

The idea behind nmethod entry barriers is that they are executed whenever a compiled method is called. Before execution is handed over to the method, the GC barrier is invoked to do any work the collector requires. In Shenandoah, this means scanning the method's code for embedded objects (constants) and evacuating and updating them, in order to ensure the strong invariant above. Live nmethods are armed at the final mark pause and disarmed either by GC threads during a concurrent phase or by Java threads when the nmethods are about to be executed.
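The arm/disarm protocol can be sketched as follows. This is a single-threaded toy with hypothetical names (ToyNMethod, ToyGC, fix_and_disarm, call). The forwarding map is assumed to already record each to-space copy, and real entry barriers also have to cope with GC threads and Java threads racing to disarm the same nmethod.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

struct Obj { int payload; };

// Hypothetical compiled method with object constants embedded in its code.
struct ToyNMethod {
  std::vector<Obj*> embedded_oops;
  bool armed = false;
};

struct ToyGC {
  std::unordered_map<Obj*, Obj*> forwarding;  // from-space -> to-space copy

  Obj* resolve(Obj* obj) const {
    auto it = forwarding.find(obj);
    return it == forwarding.end() ? obj : it->second;
  }

  // Done for every live nmethod at the final mark pause.
  void arm(ToyNMethod& nm) const { nm.armed = true; }

  // Shared by GC threads (concurrent phase) and Java threads (on entry):
  // update the embedded references, then disarm.
  void fix_and_disarm(ToyNMethod& nm) const {
    if (!nm.armed) return;  // somebody else already did the work
    for (Obj*& oop : nm.embedded_oops) oop = resolve(oop);
    nm.armed = false;
  }

  // Entry barrier: runs before control is handed to the method body.
  // The "method body" here just returns one of its embedded constants.
  Obj* call(ToyNMethod& nm, std::size_t constant_index) const {
    fix_and_disarm(nm);
    assert(constant_index < nm.embedded_oops.size());
    return nm.embedded_oops[constant_index];
  }
};
```

A method that is called while armed fixes its own constants on entry. A method that is never called stays armed, with stale constants, until a GC thread gets to it.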

The net advantage of concurrent roots processing and concurrent class unloading is that the final mark pause is shorter, and thus global latency is improved, even when the application makes heavy use of class loaders or JNI handles.

Last updated: June 29, 2020