Red Hat OpenShipt JDK lmage

The development of the Shenandoah Garbage Collector (GC) in the upcoming JDK 14 has seen significant improvements. The first one covered here (self-fixing barriers) aims to reduce local latencies that are spent in barrier mid- and slow paths. The second will cover concurrent root processing and concurrent class unloading.

Self-fixing barriers

The self-fixing barriers improvement builds on the load reference barriers that went into JDK 13. A load reference barrier is employed after a load from a reference field or array element and before the loaded object is given to the rest of the application code. In pseudocode, the barrier looks like this:

T load_reference_barrier(T* addr) {
  T obj = *addr;

  // Fast-path: accesses thread-local flag
  if (!is_gc_active()) return obj;

  // Mid-path 1: accesses small bitmap (byte per region, handful of KBs?)
  if (!is_in_collection_set(obj)) return obj;

  // Mid-path 2: accesses fwdptrs in object (entire heap?)
  T fwd = resolve_forwardee(obj);
  if (obj != fwd) return fwd;

  // Slow-path: call to runtime, once per location with non-forwarded object
  return load_reference_barrier_slowpath(obj);
}

The result is that whenever we load an object while the GC is active, and if that object is a reference into the collection set, we dive into resolving the object and possibly into the slow parts of the runtime to do the actual evacuation. Chances are that the object would be evacuated by the GC itself, and the barrier would discover this at the Mid-path 2 check, and then return from there. In the worst case, the barrier would call the runtime and do the whole thing.

The performance-sensitive part of this story is that we would go to the Mid-path 2 check all of the time for relocated objects until the GC cycle is over and GC updates the interesting references. If we do the access in a hot loop, we always walk deep into the barrier while GC is running. And the Mid-path 2 check is expensive because it reaches far and wide.

The idea behind self-fixing barriers is that when we have resolved the object and discovered the forwarded copy, we can just as well update the location right there. Since we are updating the reference to the object copy not in the collection set, on the next barrier invocation we would exit from Mid-path 1.

This is what the barrier looks like with those changes:

T load_reference_barrier(T* addr) {
  T obj = *addr;

  // Fast-path: accesses thread-local flag
  if (!is_gc_active()) return obj;

  // Mid-path 1: accesses small bitmap (byte per region, handful of KBs?)
  if (!is_in_collection_set(obj)) return obj;

  // Mid-path 2: accesses fwdptrs in objects (entire heap?)
  T fwd = resolve_forwardee(obj);

  if (obj != fwd) {
    // Can do the update here
    CAS(addr, fwd, obj);
    return fwd;
  }

  // Slow-path: call to runtime, once per location with non-forwarded object
  fwd = load_reference_barrier_slowpath(obj);
  // Can do the update here
  CAS(addr, fwd, obj);
  return fwd;
}

In other words, as soon as we get the forwardee, we dive into the slow path, where we can stamp the reference to the new copy back into the original address. We do this using a compare-and-set operation in order to avoid a potentially racing update of the same field by another Java thread that we must not override.

Now, notice that we only fail the Mid-path 2 check once per non-updated location. When we fail this check, we can fix it and then never visit the darker parts of the barrier again. With that in mind, we can simplify the barrier by moving the entire update into the slow path itself:

T load_reference_barrier(T* addr) {
  T obj = *addr;

  // Fast-path: accesses thread-local flag
  if (!is_gc_active()) return obj;

  // Mid-path 1: accesses small bitmap (byte per region, handful of KBs?)
  if (!is_in_collection_set(obj)) return obj;

  // Slow-path: call to runtime, once per non-updated location
  return load_reference_barrier_slowpath(obj, addr); // Update is actually here
}

Now we have simpler mutator-side barriers. The complication is in passing addr to the slow path, which requires fiddling with the interpreter, C1, and C2—this is why it is not done this way from the beginning. Also, notice the caveat. For non-updated locations, we used to exit earlier from the Mid-path 2 check. Now, we enter the runtime for them. While this behavior looks worse in the code, doing many Mid-path 2 checks for hot objects that are in the collection set is much more expensive than entering the runtime for fix-up once per location.

These mechanics also leave less work to do for the GC workers during the update-references phase later.

Conclusion

Hopefully, you now have a better sense of how the self-fixing barriers in JDK 14's Shenandoah GC release can help to reduce local latencies in barrier mid- and slow paths. In the next article, I will cover concurrent root processing and concurrent class unloading. Together, these features reduce GC pause times, and thus global latencies, by moving GC work from the pause to the concurrent phase.

Last updated: June 29, 2020