
Best practices for Java in single-core containers

April 19, 2022
Ben Evans
Related topics: Containers, Java
Related products: Red Hat OpenShift

    An increasing number of Java applications run in containers. The exact number is hard to determine, because adoption of containers depends upon the market segment and cloud maturity of each particular team or company. However, some data is available—for example, data from New Relic suggests that over 62% of their customers' Java workloads run in containers. Like all data points, this one is an imperfect proxy for the market as a whole, but the report demonstrates that a significant subset of the Java market has already moved to container-based environments. Anecdotal data also tells us that this migration trend is far from over.

    Teams using Java need to pay special attention to some aspects of container-based deployments and adopt a couple of best practices. This article focuses on the choice of garbage collector (GC) and how the default choice is based on available CPUs and memory.

    Java application lifecycle

    The traditional Java application lifecycle consists of a number of phases: bootstrap, intense class loading, and warmup with just-in-time (JIT) compilation, followed by a long-lived steady state lasting for days or weeks with relatively little class loading or JIT activity. This makes sense when we remember that Java started as a server-side technology in an era when JVMs ran on bare metal in data centers.

    In that world, cluster scaling involved ordering more physical machines and having them delivered to your data centers, application version upgrades happened perhaps every few months, and application processes measured their uptime in weeks or months. Scalability of Java applications typically focused on scaling up, with the goal to make Java perform efficiently on large multicore machines with large amounts of memory.

    This model of deployment for Java apps is challenged by cloud deployments in a few distinct but related ways:

    • Containers might live for much shorter time periods (seconds, in some cases).

    • Cluster sizes might be dynamically readjusted or reconfigured (e.g., through Kubernetes).

    • Microservice architectures tend to imply smaller process sizes and shorter lifetimes.

    As a result of these factors, many developers, when migrating their Java applications into containers, try to use the smallest possible containers. This seems to make sense, as cloud-based applications are typically charged by the amount of RAM and CPU they use.

    However, there are some subtleties here that might not be apparent to engineers who are not Java specialists. Let's take a closer look.

    The JVM as a dynamic execution platform

    The JVM is a very dynamic platform that sets certain important parameters at startup time based on the observed properties of the machine that it's running on. These properties include the count and type of the CPUs and the available physical memory, as perceived by the JVM. The behavior of the running application can and will be different when running on differently-sized machines—and this applies to containers too.

    Some dynamic properties that the JVM observes at startup include:

    • JVM Intrinsics: Hand-tuned implementations of performance-critical methods that rely upon specific CPU features (vector support, SIMD, etc.)

    • Sizes of internal threadpools (such as the "common pool")

    • Number of threads used for GC

    Just from this list, you can see that incorrectly defining the resources needed for the container image can cause problems related to GC or common thread operations.
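    To see what the JVM has observed, you can query some of these values at runtime. The following is a minimal sketch using only standard APIs (Runtime and ForkJoinPool); the number of GC threads is not directly exposed through these APIs, so the sketch prints just the values the platform reports:

    import java.util.concurrent.ForkJoinPool;

    // Minimal sketch: print the resources the JVM believes it has.
    // Inside a container, these values reflect the container's limits
    // (subject to the cgroups support discussed later), not the host's.
    public class JvmResources {
        public static void main(String[] args) {
            Runtime rt = Runtime.getRuntime();
            System.out.println("Available processors: " + rt.availableProcessors());
            System.out.println("Max heap (bytes): " + rt.maxMemory());
            // The common pool's parallelism is derived from the processor
            // count observed at startup.
            System.out.println("Common pool parallelism: "
                    + ForkJoinPool.commonPool().getParallelism());
        }
    }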

    However, the problem is fundamentally deeper than this. Current versions of Java, including Java 17, perform some dynamic checks and decide on the GC to use ergonomically (automatically) if a GC is not explicitly specified on the command line.

    To track down the logic for this, let's look at the OpenJDK source code. Specifically, in the src/hotspot/share/gc/shared/gcConfig.cpp file, you can find a C++ method called GCConfig::select_gc(), which calls GCConfig::select_gc_ergonomically() unless a GC is explicitly chosen. The code for this method is:

    void GCConfig::select_gc_ergonomically() {
      if (os::is_server_class_machine()) {
    #if INCLUDE_G1GC
        FLAG_SET_ERGO_IF_DEFAULT(UseG1GC, true);
    #elif INCLUDE_PARALLELGC
        FLAG_SET_ERGO_IF_DEFAULT(UseParallelGC, true);
    #elif INCLUDE_SERIALGC
        FLAG_SET_ERGO_IF_DEFAULT(UseSerialGC, true);
    #endif
      } else {
    #if INCLUDE_SERIALGC
        FLAG_SET_ERGO_IF_DEFAULT(UseSerialGC, true);
    #endif
      }
    }

    The meaning of this code snippet is somewhat obscured by the C++ macros (which are used everywhere in Hotspot's source), but it basically boils down to this: For Java 11 and 17, if you didn't specify a collector, the following rules apply:

    • If the machine is server class, choose G1 as the GC.

    • If the machine is not server class, choose Serial as the GC.

    The Hotspot method that determines whether a machine is server class is os::is_server_class_machine(). Looking at the code for this, you'll find:

    // This is the working definition of a server class machine:
    // >= 2 physical CPU's and >=2GB of memory

    This means that if a Java application runs on a machine or in a container that appears to have fewer than two CPUs or less than 2GB of memory, the Serial algorithm will be used unless the deployment explicitly chooses a specific GC algorithm. This result is usually not what teams want, because Serial typically causes longer stop-the-world (STW) pause times than G1.
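    You can check which collector was chosen ergonomically by inspecting the garbage collector MXBeans. This is a minimal sketch using the standard java.lang.management API; the bean names in the comments are the usual ones for G1 and Serial, but they are implementation details rather than a specified contract:

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;

    // Minimal sketch: report which GC the JVM selected.
    // With G1 the beans are typically named "G1 Young Generation" and
    // "G1 Old Generation"; with Serial, "Copy" and "MarkSweepCompact".
    public class WhichGc {
        public static void main(String[] args) {
            for (GarbageCollectorMXBean gc :
                    ManagementFactory.getGarbageCollectorMXBeans()) {
                System.out.println(gc.getName());
            }
        }
    }

    Alternatively, running java with -Xlog:gc prints the selected collector at startup, and an explicit flag such as -XX:+UseG1GC overrides the ergonomic choice entirely.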

    Let's see this effect in action. As an example application, we'll use HyperAlloc, which is part of Amazon's Heapothesys project. This benchmarking tool is "a synthetic workload which simulates fundamental application characteristics that affect garbage collector latency."

    We spin up a container image from a simple Dockerfile:

    FROM docker.io/eclipse-temurin:17
    RUN mkdir /app
    COPY docker_fs/ /app
    WORKDIR /app
    CMD ["java", "-Xmx1G", "-XX:StartFlightRecording=duration=60s,filename=hyperalloc.jfr", "-jar", "HyperAlloc-1.0.jar", "-a", "128", "-h", "1024", "-d", "60"]

    The HyperAlloc parameters in use are a heap size of 1GB, a simulation run time of 60 seconds, and an allocation rate of 128MB per second. This is the image we will use for a single core.

    We'll also create an image that is identical, except for an allocation rate of 256MB per second to use with a two-core container. The higher allocation rate in the second case is intended to compensate for the larger amount of CPU that is available to HyperAlloc, so that both versions experience the same allocation pressure.

    Java Flight Recorder (JFR) allows us to capture a recording of the entire run. At 60 seconds, the run is quite short, but it provides ample time to demonstrate the JVM's overall behavior in this simple example.

    We are comparing two cases:

    • 1 CPU, 2GB image with 128MB alloc rate (Serial GC)

    • 2 CPUs, 2GB image with 256MB alloc rate (G1 GC)

    The two GC data points that we want to look at are the pause time and GC throughput (expressed as total CPU expended for performing GC) for the separate cases.

    If you would like to explore this example and experiment with your own data, the code for it can be found in the JFR Hacks GitHub repository. The project depends upon JFR Analytics, by Gunnar Morling, which provides an SQL-like interface to query JFR recording files.
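    If you don't want to set up JFR Analytics, the standard jdk.jfr.consumer API is enough to pull the raw pause data out of a recording. The following is a minimal sketch, assuming the hyperalloc.jfr file produced by the image above and the standard jdk.GCPhasePause event:

    import java.nio.file.Path;
    import java.time.Duration;
    import jdk.jfr.consumer.RecordedEvent;
    import jdk.jfr.consumer.RecordingFile;

    // Minimal sketch: sum the stop-the-world pause time in a JFR file.
    public class TotalPauses {
        public static void main(String[] args) throws Exception {
            Duration total = Duration.ZERO;
            for (RecordedEvent event :
                    RecordingFile.readAllEvents(Path.of("hyperalloc.jfr"))) {
                if (event.getEventType().getName().equals("jdk.GCPhasePause")) {
                    total = total.plus(event.getDuration());
                }
            }
            System.out.println("Total STW pause: " + total.toMillis() + " ms");
        }
    }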

    Note: In all the graphs that follow, the timestamps of the runs have been normalized to milliseconds after VM start.

    Let's start with pause time. Figure 1 shows the total pause time for the two cases.

    Figure 1: The one-CPU run has much higher total pauses than the two-CPU run.

    This outcome shows the clear benefit of using G1: All the collections have much shorter pause times. The G1New collections are shorter than Serial's young collections (known as DefNew). However, there are almost three times as many G1New collections as DefNew.

    The reason for this difference is that young collections are always fully STW: because the allocator threads (the application's threads) tend to have high or unpredictable allocation rates, competition between the GC threads and the allocation threads for CPU is not a winning proposition. It is better to accept an STW pause for young collections and keep each one as short as possible.

    G1 is not an "all-or-nothing" collector. Because its work is based on regions, it can collect a few young regions to stay ahead of the current allocation rate and then restart the application threads, leading to a higher number of shorter pauses. We will have more to say about the overall effect of this trade-off later.

    For the old collections, the effect is even more pronounced: G1Old actually experiences a dip in total pause time, whereas SerialOld experiences a clear spike for the old collections. This is because G1Old is a concurrent collector, and so for the majority of the runtime of the collection, it is running alongside the application threads. In our two-CPU example, while G1Old is running, one CPU is being used for GC and one for application threads.

    Figure 2 shows the elapsed time to perform each collection, and contrasts it to the total stop time. This illustrates the concurrent nature of G1Old.

    Figure 2: The one-CPU run spends much more time on a GC than the two-CPU run.

    Sure enough, the dips in total pause time that were associated with G1Old have become peaks in elapsed time. It's also apparent that roughly the same number of old GCs occur whether G1 or Serial is used.

    One obvious question that might be asked at this point is: What is the overall cost, in CPU time, of performing GC? Is it possible that, because there are more young G1 collections than young Serial collections, the overall CPU time used by G1 is higher? To answer this question, take a look at Figure 3, which shows the cumulative time spent in GC for the two collectors.

    Figure 3: The two-CPU run spends more time on GC as a whole than the one-CPU run, but not twice as much time.

    At first glance, it does seem as though G1 uses more CPU than Serial. However, it is worth remembering that the G1 run uses two CPUs and is dealing with twice the allocation rate: it handles double the allocation with less than double the GC CPU time. So on a per-CPU or per-allocation-GB basis, G1 is still more efficient than Serial.

    The overall takeaway is that despite the apparent attractiveness of smaller containers, in almost all cases it is better to run Java processes in containers with two visible CPUs and 2GB of memory and allow G1's concurrent GC to exploit the available resources.

    Conclusion, and a look ahead

    Having clearly seen the effect in this toy example, there's one major remaining question: How does this effect play out for containers in production?

    The answer depends upon the Java version and the kernel support in place: in particular, whether the kernel API known as cgroups is at v1 or v2.

    In the second part of this series, Severin Gehwolf will explain the deep-dive details of exactly how Hotspot detects the container properties and auto-sizes based upon them. You might also want to check out a recent article from Microsoft on containerizing Java applications.

    Last updated: December 14, 2023
