Shenandoah in OpenJDK 17: Sub-millisecond GC pauses

September 16, 2021
Roman Kennke
Related topics:
Java, Open source
Related products:
Red Hat build of OpenJDK

    Our primary motivation for the Shenandoah OpenJDK garbage collection (GC) project is to reduce garbage collection pause times. In JDK 12, we released the original Shenandoah garbage collector. It implements concurrent heap evacuation, which solved the major problem of cleaning (potentially large) heaps without stopping the application. This version was eventually backported to JDK 11. In JDK 14, we implemented concurrent class unloading, and in JDK 16, we added concurrent reference processing, both of which further reduced pause times in those garbage collection operations. The remaining garbage collection operation under pause was thread-stack processing, which we've solved in JDK 17.

    This article introduces the new concurrent thread-stack processing in Shenandoah GC. Processing thread stacks concurrently gives us reliable sub-millisecond pauses in JDK 17.

    Thread processing in Java

    What is thread processing, and why do we need to stop the application for it? Java programs are executed in threads, and each thread owns a stack: a list of stack frames, each frame holding local variables, monitors, and other information related to the currently executed method. Most importantly, in the context of Java garbage collection, a frame holds references to heap objects (e.g., local variables of reference types).

    When a garbage collection cycle is initiated, we first scan all of the threads' stacks to seed marking queues with the references that we find on stacks. We do so at a GC pause (safepoint) because we need a consistent state of the stack at mark start, without the thread's execution concurrently messing with the stack. When we're done, we resume execution and traverse the graph of reachable objects, starting from the references that we found during the initial thread scan.
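The two phases described above can be pictured with a small toy model. This is only a sketch under simplifying assumptions, not Shenandoah's actual code: the `Obj` and `Frame` classes and the `scanStacks` and `mark` methods are hypothetical names invented for illustration, and everything runs on a single thread.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

/** Toy model of root scanning and marking; not Shenandoah's actual code. */
final class MarkFromStacks {

    /** Hypothetical heap object: a name plus outgoing references. */
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    /** Hypothetical stack frame: only the references held by its local variables. */
    static final class Frame {
        final List<Obj> locals = new ArrayList<>();
        Frame(Obj... locals) { Collections.addAll(this.locals, locals); }
    }

    /** Phase 1 (at a pause before JDK 17): seed the marking queue from every frame. */
    static Deque<Obj> scanStacks(List<List<Frame>> threadStacks) {
        Deque<Obj> queue = new ArrayDeque<>();
        for (List<Frame> stack : threadStacks) {
            for (Frame frame : stack) {
                queue.addAll(frame.locals);
            }
        }
        return queue;
    }

    /** Phase 2 (concurrent): traverse the object graph from the seeded references. */
    static Set<Obj> mark(Deque<Obj> queue) {
        Set<Obj> marked = Collections.newSetFromMap(new IdentityHashMap<>());
        while (!queue.isEmpty()) {
            Obj o = queue.poll();
            if (marked.add(o)) {      // first visit: follow its outgoing references
                queue.addAll(o.refs);
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        Obj a = new Obj("a"), b = new Obj("b"), c = new Obj("c");
        Obj garbage = new Obj("garbage"); // referenced by no frame and no live object
        a.refs.add(b);
        b.refs.add(c);
        List<List<Frame>> stacks = List.of(List.of(new Frame(a)));
        Set<Obj> live = mark(scanStacks(stacks));
        System.out.println(live.size() + " live, garbage marked: " + live.contains(garbage));
    }
}
```

In this model only `scanStacks` needs a stable view of the stacks, which is why that step is the one that used to require a pause.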

    Likewise, when evacuating reachable objects into empty regions, we need to update all references on thread stacks to point to the new object locations. We need to do so at a pause because garbage collection load barriers normally act when a reference is loaded from the heap (for example, into a local variable or register). This means that local variables and registers must not hold object references in a state that requires GC intervention: at that point, it is too late to pass through the garbage collection barrier. Invoking a garbage collection barrier for every local variable or register access would quickly run into performance problems.
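To make the stack-update step concrete, here is a minimal sketch. It assumes a side table (`forwarding`) that maps each evacuated object to its copy; that table is a stand-in chosen for illustration, not a description of how Shenandoah records forwarding. The `Obj` class and the `evacuate` and `updateSlots` methods are hypothetical names.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of updating stack references after evacuation; not Shenandoah's code. */
final class UpdateStackRefs {

    /** Hypothetical heap object, tagged with the region it lives in. */
    static final class Obj {
        final String name;
        final int region;
        Obj(String name, int region) { this.name = name; this.region = region; }
    }

    /** Copy an object into a target region and remember old -> new in the side table. */
    static Obj evacuate(Obj old, int targetRegion, Map<Obj, Obj> forwarding) {
        return forwarding.computeIfAbsent(old, o -> new Obj(o.name, targetRegion));
    }

    /** Rewrite every stack slot that still points at an evacuated object. */
    static void updateSlots(List<Obj[]> frames, Map<Obj, Obj> forwarding) {
        for (Obj[] slots : frames) {
            for (int i = 0; i < slots.length; i++) {
                if (slots[i] != null) {
                    slots[i] = forwarding.getOrDefault(slots[i], slots[i]);
                }
            }
        }
    }

    public static void main(String[] args) {
        Obj a = new Obj("a", 1), b = new Obj("b", 2);
        Obj[] frame = { a, b };                 // two local variables in one frame
        Map<Obj, Obj> forwarding = new IdentityHashMap<>();
        evacuate(a, 7, forwarding);             // "a" moves from region 1 to region 7
        updateSlots(Collections.singletonList(frame), forwarding);
        System.out.println(frame[0].region + " " + frame[1].region);
    }
}
```

The slots are plain array elements here, so nothing stops a running thread from reading a stale one; that gap is what the pause (or, in JDK 17, the stack watermark) closes.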

    Scanning and processing thread stacks takes time. Smallish workloads (few threads with small stacks) would probably take very few milliseconds to scan, but large workloads (application servers, I'm looking at you!) can easily take several dozens of milliseconds to process. All of that processing is done while the application is stopped, so it affects the application's overall end-to-end latency.

    Concurrent thread processing in OpenJDK 17

    How can we improve the situation and process thread stacks concurrently? We do so by utilizing a mechanism called stack watermarks (originally implemented by ZGC developers). The central observation is that all the thread stack's action happens in the top-most frame: the currently executed method. All the frames below that are basically static and don't change, so they can safely be scanned concurrently by garbage collection threads. All we need to do is coordinate GC threads with executing threads whenever a stack frame is destroyed (e.g., by a return to the caller or by throwing an exception) and thus falls out of GC processing. This coordination is achieved by the stack watermark, a pointer that tells us which parts of the stack are safe to scan, and a barrier that allows garbage collectors to deal with returns. Figure 1 illustrates the role of the stack watermark in concurrent thread processing.

    Figure 1: The stack watermark in concurrent thread processing.

    Using stack watermarks during garbage collection

    Let's consider an example. Say, at the beginning of marking, during the initial pause, we set the stack watermark to the top-most frame of each thread and arm the thread. That means that we consider all frames as safe for concurrent scanning, but none (yet) for execution. Upon returning from the safepoint, we hand control back to the Java program, and thus need the top frame to be safe for execution. Here, the stack-watermark barrier kicks in and lets the garbage collector handle the top frame (and its caller, for practical reasons). The thread scans the top frames, lowers the watermark accordingly, and resumes its own execution at the point where it left off before the safepoint.

    Meanwhile, GC threads also start scanning the stacks, from the bottom up to the watermark, i.e., in the safe zone. Whenever a thread executes something that would destroy a stack frame, it does a few things:

    1. Lowers the watermark by one frame.
    2. Prevents GC threads from scanning beyond that watermark.
    3. Handles the frame that is now above the watermark, by scanning it for any references.

    The end result is that we will effectively scan all the same frames and references as we would have done at the initial mark pause, but we have done so concurrently, while the program is executing.
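The protocol above can be sketched as a toy model. This is not HotSpot's implementation: the class and method names (`StackWatermarkSketch`, `arm`, `resume`, `returnFromFrame`, `gcScanStep`) are hypothetical, a single lock stands in for the real coordination between threads, references are plain strings, and calls that push new frames are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy, single-lock model of a stack watermark; not HotSpot's implementation. */
final class StackWatermarkSketch {

    /** Hypothetical frame: the references it holds, and whether it was scanned yet. */
    static final class Frame {
        final List<String> refs;
        boolean scanned;
        Frame(String... refs) { this.refs = Arrays.asList(refs); }
    }

    private final List<Frame> frames = new ArrayList<>(); // index 0 = bottom of stack
    private final List<String> found = new ArrayList<>(); // references seen by any scanner
    private int watermark; // frames below this index are left to the GC threads
    private int gcCursor;  // next frame a GC thread will scan, counting from the bottom

    StackWatermarkSketch(List<Frame> initialFrames) { frames.addAll(initialFrames); }

    /** At the initial pause: every frame is open to GC scanning, none is safe to run. */
    synchronized void arm() { watermark = frames.size(); gcCursor = 0; }

    /** Leaving the safepoint: the thread itself handles its top frame and the caller. */
    synchronized void resume() { lowerWatermarkTo(Math.max(0, frames.size() - 2)); }

    /** A return destroys the top frame, so the thread takes over one more frame. */
    synchronized void returnFromFrame() {
        if (watermark >= frames.size()) {
            throw new IllegalStateException("top frame has not been processed");
        }
        frames.remove(frames.size() - 1);
        lowerWatermarkTo(Math.max(0, watermark - 1));
    }

    /** One unit of GC work: scan the next frame below the watermark, if there is one. */
    synchronized boolean gcScanStep() {
        if (gcCursor >= watermark) return false; // never scan at or above the watermark
        scan(frames.get(gcCursor++));
        return true;
    }

    private void lowerWatermarkTo(int target) {
        while (watermark > target) scan(frames.get(--watermark));
    }

    private void scan(Frame f) {
        if (!f.scanned) { f.scanned = true; found.addAll(f.refs); }
    }

    synchronized List<String> found() { return new ArrayList<>(found); }

    public static void main(String[] args) {
        List<Frame> stack = new ArrayList<>();
        for (int i = 0; i < 5; i++) stack.add(new Frame("r" + i));
        StackWatermarkSketch s = new StackWatermarkSketch(stack);
        s.arm();
        s.resume();                // thread scans frames 4 and 3
        s.gcScanStep();            // GC scans frame 0
        s.returnFromFrame();       // frame 4 is destroyed; thread scans frame 2
        while (s.gcScanStep()) { } // GC scans frame 1, then stops at the watermark
        System.out.println(s.found());
    }
}
```

However the two sides interleave in this model, each frame that existed at arming time is scanned exactly once, either by the thread (above the watermark) or by the GC (below it).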

    Benchmarking Shenandoah GC

    So what is the effect of those changes in practice? I have run a number of benchmarks measuring garbage collection pauses. The table below shows the average pause times over all benchmarks in JDK 11, JDK 16, and JDK 17. The difference between JDK 16 and JDK 17 shows the improvement that is achieved by concurrent stack processing. The difference to JDK 11 is shown for completeness and includes the various other improvements from previous releases.

    Table 1: Benchmarks measuring GC pauses in OpenJDK.

    Release   Init mark   Final mark
    JDK 11    421 µs      1294 µs
    JDK 16    321 µs      704 µs
    JDK 17    63 µs       328 µs
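To put the differences in Table 1 into relative terms, the following sketch computes how much shorter each average pause became. It is plain arithmetic on the published averages; the `PauseReduction` class and `reductionPercent` helper are names made up for this example.

```java
import java.util.Locale;

/** Worked arithmetic on the averages in Table 1; the helper name is illustrative. */
final class PauseReduction {

    /** Percentage by which afterMicros is lower than beforeMicros. */
    static double reductionPercent(double beforeMicros, double afterMicros) {
        return 100.0 * (beforeMicros - afterMicros) / beforeMicros;
    }

    private static void report(String label, double beforeMicros, double afterMicros) {
        System.out.println(String.format(Locale.ROOT, "%s: %.1f%% shorter",
                label, reductionPercent(beforeMicros, afterMicros)));
    }

    public static void main(String[] args) {
        // All values are in microseconds and copied from Table 1.
        report("Init mark,  JDK 16 to JDK 17", 321, 63);
        report("Final mark, JDK 16 to JDK 17", 704, 328);
        report("Init mark,  JDK 11 to JDK 17", 421, 63);
        report("Final mark, JDK 11 to JDK 17", 1294, 328);
    }
}
```

By this arithmetic, the step from JDK 16 to JDK 17 shortens the average init-mark pause by about 80% and the average final-mark pause by about 53%.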

    Conclusion

    This article explained how concurrent thread stack processing in Shenandoah GC solves the remaining garbage collection pause-time problem and delivers reliable sub-millisecond garbage collection pauses in JDK 17. To learn more, visit the GitHub repository and OpenJDK Wiki page for the Shenandoah GC project. See the previous articles in this series detailing how Shenandoah improves garbage collection pause times in OpenJDK. You can also download the Red Hat build of OpenJDK.

    Last updated: January 6, 2023

    Related Posts

    • Shenandoah garbage collection in OpenJDK 16: Concurrent reference processing

    • Shenandoah GC in JDK 14, Part 2: Concurrent roots and class unloading

    • DevNation Talk: Shenandoah: An ultra-low pause time garbage collector for OpenJDK

    • How the JVM uses and allocates memory
