JVM tuning for Red Hat Data Grid on Red Hat OpenShift 4

July 16, 2025
Francisco De Melo Junior
Related topics: Containers, Java, Java Microservices, Runtimes
Related products: Red Hat build of OpenJDK, Red Hat Data Grid


    In Red Hat OpenShift 4, the Operator framework became a fundamental part of daily cluster operations. We previously explained the Data Grid Operator in the blog post How to install and upgrade Data Grid 8 Operator. You can also deploy via a Helm chart, as discussed in How to use Helm charts to deploy Data Grid on OpenShift.

    This article describes a few Java Virtual Machine (JVM) tuning considerations for Red Hat Data Grid deployments. Note that some of this advice also applies to Red Hat JBoss Enterprise Application Platform (EAP) and other middleware products. This post builds on the article How to Use Java Container Awareness in OpenShift 4, which discusses several of those topics.

    Fundamentals of JVM tuning for Red Hat Data Grid

    While containers make it simple to deploy an application, the JVM is bound by the container, which can make tuning difficult.

    Beyond application performance and garbage collection (GC) impact, you should consider additional factors for the container and the OpenShift node itself. Application deployment involves several considerations; the GC collector is just one of many.

    These tuning considerations for Data Grid deployments focus on 3 key aspects:

    • The container deployment and how this impacts the JVM.
    • The Kubernetes/OpenShift deployment and its impacts, such as Quality of Service (QoS) of the pods.
    • The GC and JVM performance in general. While the GC collector is a big factor here, it doesn't entirely control the Java container performance. 

    5 core tuning considerations

    Here are some key points to keep in mind for tuning:

    1. JVM container awareness: The JVM is container-aware, automatically detecting and adhering to the container's boundaries. This is a valuable feature for resource management.
    2. JVM inelasticity: For main Data Grid operations, the JVM is inelastic, meaning it sizes itself once at startup and primarily uses resources.limits for CPU-based calculations (including GC threads, blocking threads, and non-blocking threads).
    3. Allocate adequate CPU resources: Provide sufficient CPU resources for the container. Avoid allocating minimal CPU resources; this might not cause immediate restarts or out-of-memory errors, but it could lead to unresponsive probes and trigger a SIGTERM from the kubelet.
    4. Set appropriate quality of service (QoS): In OpenShift Container Platform 4, resource limits and requests enable sharing and potential overcommitment of cluster node resources. Because this overcommitment can be problematic, set an adequate QoS to avoid performance issues.

      Specifically, setting pod resource requests to the same value as the pod limits gives the pod Guaranteed Quality of Service (QoS); a minimal example follows the summary table below. For production-critical deployments, this approach is recommended. It prevents overcommitment by users and ensures applications do not interfere with critical deployments. See Figure 1.

      Figure 1: Diagram of Quality of Service (QoS).

      However, this does not mean you should always use this setting. Red Hat's recommendation is to deploy critical workloads with Guaranteed QoS. Most applications do not need a high number of CPUs throughout the day; they spike sporadically rather than simultaneously. Therefore, different request and limit values with Burstable QoS can be adequate, or in some cases, BestEffort QoS. Using Guaranteed QoS in all circumstances might be excessive (and even counterproductive). For more information on overcommitting resources, refer to the OpenShift documentation. This also has other implications, such as CPU throttling; see below.

    5. Finally, adjust the deployment method. Select the adequate deployment method in OpenShift Container Platform 4, either Operator or Helm charts. In most cases, the Data Grid Operator should be adequate.
    Core fact | Purpose | Recommended
    JVM is container-aware | JVM boundaries are associated with the container resources via cgroups. | Setting the adequate container memory size will also set the JVM size (heap and off-heap).
    JVM is inelastic | The JVM will not dynamically change its heap and number of threads (more on this in a moment). | Set adequate resources on the container for the JVM to start with.
    Allocate reasonable resources | Avoid allocating scarce resources for the JVM. | Avoid extremely small container sizes for Data Grid in terms of CPU and memory.
    Runtime versus build time | Runtime allows better adaptability. | Avoid setting Java parameters at build time; instead, prefer runtime (e.g., extraJvmOpts).
    Quality of Service (QoS) | The usage of Guaranteed QoS provides more security in case of resource scarcity. | Deploy critical applications with Guaranteed QoS.
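
    To illustrate the QoS point above, the following is a minimal sketch of a container resources stanza that yields Guaranteed QoS. The resources block itself is standard Kubernetes; exactly where it appears depends on whether you deploy via the Operator or the Helm chart, and the values shown are illustrative only:

    # Guaranteed QoS: requests are identical to limits for every resource,
    # so the kubelet classifies the pod as Guaranteed and evicts it last.
    resources:
      requests:
        cpu: "2"
        memory: 2Gi
      limits:
        cpu: "2"
        memory: 2Gi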

    About the number of threads: For all intents and purposes, the Red Hat build of OpenJDK considers the CPU and memory values at the start of the container. Thread and memory calculations are derived from those values, affecting heap size and thread pools. If a value later changes through an update to the cgroups file, the JVM will see it, but its behavior is undefined if the value no longer matches what was in effect at startup. See more information here.
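
    If you want to confirm what the container-aware JVM actually detected, one quick check is to run the java launcher with container logging enabled from a shell inside the Data Grid pod. This assumes the pod image exposes the java binary on the PATH; the logging tags are available in Red Hat build of OpenJDK 11 and later:

    # Prints the CPU count and memory limit the JVM derived from the cgroups files
    java -Xlog:os+container=trace -version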

    Most importantly, benchmark your application so that memory and CPU usage are clearly understood. This will help identify resource requirements and breakpoints in terms of usage and allocation. It also helps understand how the application will behave under non-optimal OpenShift Container Platform scenarios.

    Considerations for Data Grid Operator versus Data Grid via Helm charts

    When using the Data Grid Operator, you must take additional considerations into account for each of the pods:

    • Data Grid Operator pod: If the Data Grid Operator is deploying many clusters, it might need more resources to handle a large number of Infinispan CRs and Cache CRs.
    • Router pod: This is useful for cross-site configuration.
    • Config Listener pod (also known as the tattletale pod, as it provides the configuration the user creates in the Data Grid cluster): You might need to increase its resources if there are many cache Custom Resources for bidirectional reconciliation.

    The following table summarizes these considerations:

    Pod | Purpose | Why tune it?
    Data Grid Operator | Listens for API changes and creates resources like caches and Infinispan clusters. | To handle more Infinispan resources. More caches/clusters require more resources.
    Router pod | Enables cross-site communication. | For scaling up cross-site communication.
    ConfigListener | Listens for adjustments to the YAML files. | More caches/clusters require more resources to listen for changes.

    When using Data Grid Helm charts, be sure to consider container resource scalability. For instance, when there are many memcached listeners, the Infinispan pod's container resources must increase accordingly: the more socket listeners, the more container resources you need to allocate.

    Benchmark

    The main recommendation in any tuning effort is to have a benchmarking methodology that compares a baseline against an alternate scenario. The difference can then be pinpointed to a specific task or series of tasks.

    Steps:

    1. Set a baseline for comparison.
    2. Set a comparative scenario.
    3. Conclude by comparing the baseline versus comparative scenarios.

    For comparison metrics, evaluating scenarios in terms of latency versus footprint versus throughput is always the simplest approach. More often than not, we sacrifice footprint for better throughput or latency.

    GC log collection and GC collectors

    GC log collection is highly recommended in any scenario. It has very low overhead and provides extensive, detailed information about the specific operation of the Data Grid Java process.

    Example configuration:

    -Xlog:gc*=info:file=/tmp/gc.log:time,level,tags,uptimemillis:filecount=10,filesize=1m

    JVM garbage collectors offer numerous configuration options. G1GC, for instance, simplifies JVM deployment: instead of sizing each generation manually, you set a latency target with the MaxGCPauseMillis argument, which streamlines tuning compared to CMS and its many settings.
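
    As a minimal illustration, selecting G1GC (already the default collector in OpenJDK 11 and 17) and setting a pause-time goal requires only two flags; the 200 ms value below is an example, not a recommendation:

    -XX:+UseG1GC -XX:MaxGCPauseMillis=200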

    Furthermore, it's important to note that the non-generational Shenandoah collector is generally unsuitable for workloads involving extensive random memory allocation and deallocation. Therefore, combining it with Data Grid is not recommended unless the application features consistent, ongoing data allocation and deallocation, which is uncommon in Data Grid use cases.

    GC collector | Usage
    Non-generational (e.g., Shenandoah or ZGC) | Avoid for high-performance Data Grid use (e.g., cross-site scenarios and random/high allocation/deallocation).
    G1GC | Generational collector focused on latency (usually has the best performance for high-performance Data Grid, including cross-site scenarios).
    ParallelGC | Generational collector focused on throughput rather than latency.

    For generational workloads, generational collectors will have better performance than non-generational, even concurrent ones.
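
    For reference, the standard HotSpot flags that select the collectors in the table above are listed below; availability depends on the JDK version and build (for example, Shenandoah ships in the Red Hat build of OpenJDK):

    -XX:+UseParallelGC     (generational, throughput-oriented)
    -XX:+UseG1GC           (generational, latency-oriented; the default in OpenJDK 11 and 17)
    -XX:+UseShenandoahGC   (concurrent, non-generational in current builds)
    -XX:+UseZGC            (concurrent, non-generational up to JDK 17)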

    JVM tuning

    JVM tuning for the Data Grid Operator and Data Grid Helm charts can be configured as described below.

    Data Grid Operator:

    spec:
      container:
        cpu: '2'
        extraJvmOpts: '-Xlog:gc*=info:file=/tmp/gc.log:time,level,tags,uptimemillis:filecount=10,filesize=1m'

    Data Grid Helm chart:

    deploy:
      container:
        extraJvmOpts: '-Xlog:gc*=info:file=/tmp/gc.log:time,level,tags,uptimemillis:filecount=10,filesize=1m'

    For more information on Helm chart releases, see the Red Hat Data Grid 8.3 Documentation.

    CPU throttling

    When discussing tuning, including the number of threads and Garbage Collector, CPU throttling is an important topic. CPU throttling occurs when a container's runtime CPU utilization reaches, approaches, or surpasses the set CPU limits, triggering a kernel reaction to cap its utilization.

    This is a consequence of the container being just a process on the OpenShift Container Platform host, as explained in this solution. The kernel has the ability and the prerogative to preempt processes; this is a feature. Below are some additional points to consider, depending on the situation:

    • Throttling occurs when the application's threads have used up the CPU quota derived from the limit; once the quota is exhausted, the process is preempted. Depending on the kernel version, this plays less of a role.
    • The OpenShift Container Platform host might use the Completely Fair Scheduler's (CFS) quota mechanism to implement process limits, and therefore impose a limit on the threads as a consequence, regardless of CPU and memory settings.

    The CPU throttling process is a kernel feature. Setting a different Quality of Service (QoS), which is a kubelet feature, does not prevent the kernel from throttling. For example, throttling occurs when the application maxes out a single CPU, such as when a single-threaded Lightweight Process (LWP) executes on one CPU. The following table summarizes these scenarios.

    Option | Cons | Pros
    Leave the container without any limits | The QoS will be BestEffort. | Not having limits means the kernel will not throttle the application. However, if eviction occurs, this container will be the first. Additionally, Data Grid will use the full OpenShift node as the CPU (or memory) limit.
    Leave the container with limits | The QoS can be Guaranteed or Burstable. | Having limits means the kernel will throttle the application at some point. However, if eviction occurs, this container will not be the first.
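
    To verify whether throttling is actually occurring, you can read the container's CPU controller statistics from inside the pod. The path below assumes cgroups v2 (the default on recent OpenShift 4 releases); cgroups v1 exposes similar counters under /sys/fs/cgroup/cpu/cpu.stat:

    # nr_throttled and throttled_usec increasing over time means the kernel is
    # enforcing the CPU quota derived from the container's CPU limit
    cat /sys/fs/cgroup/cpu.stat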

    Although the kubelet's Guaranteed or Burstable QoS does not prevent CPU throttling from happening, omitting limits does prevent the kernel from throttling. Nonetheless, this has two critical implications:

    • First and most importantly: Data Grid and any Red Hat build of OpenJDK application use the deployment CPU limits for CPU thread calculation. This means leaving no limits on the deployment has consequences.
    • Second: Not setting limits (and requests), which means not setting Guaranteed or Burstable QoS, results in a Best Effort QoS. If the cluster comes under pressure, Best Effort QoS pods will be evicted first, followed by Burstable QoS pods.

    Therefore, if the user sets limits and requests (thereby setting pods as Guaranteed or Burstable QoS), CPU throttling can occur. Avoiding CPU limits on the deployment would prevent this.

    To compromise between eviction and CPU throttling, you can set limits noticeably higher than requests (a larger limit-to-request ratio) so the pod will have Burstable QoS, as shown in the sketch after this list. This compromise will:

    • Reduce (but not completely avoid) CPU throttling.
    • Ensure that if eviction occurs, this pod will not be the first to be evicted.
    • Prevent Data Grid (or any OpenJDK application) from using the full OpenShift node as a limit for memory and/or CPU, which would lead to an excessive number of threads.
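
    A minimal sketch of this compromise is shown below. The resources block is standard Kubernetes, the values are illustrative, and memory requests and limits are kept equal on purpose because the JVM sizes its heap once at startup:

    # Burstable QoS: requests are set, but the CPU limit is well above the request
    resources:
      requests:
        cpu: 500m
        memory: 2Gi
      limits:
        cpu: "2"
        memory: 2Gi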

    Other factors to consider

    Below are some additional points to consider, depending on the situation:

    • Complementary to the above, do not set Xmx to more than 80% of the container memory, which would leave too little space for off-heap. Always leave space for off-heap, even if the cache is not explicitly configured for off-heap (see the illustrative flags after this list).
    • Do not set off-heap for 80%+ of the total memory, even if all caches are off-heap. This is because state transfers temporarily require a significant chunk of heap memory. After the actual cache data, state transfers are the second biggest memory consumer in Data Grid (in short bursts).
    • One should not deploy anything critical in production with less than 2 or 3 CPUs. A lack of CPU will cause timeouts on the execution of CLI commands. If you have more resources, this is the time to use them.
    • Using 2+ CPUs also allows for much better utilization of multi-thread collectors such as ParallelGC and G1GC, taking advantage of multiple CPUs.
    • Avoid deprecated service caches, such as those described in Red Hat Solution 6972352.
    • Avoid deprecated or removed flags, such as using JDK 17 with UseParallelOldGC or --illegal-access=debug. The former will return: Unrecognized VM option 'UseParallelOldGC'.
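
    As a rough illustration of the 80% guidance, assuming a 4Gi container and that you size the heap yourself through extraJvmOpts (both flags are standard OpenJDK options, the values are illustrative, and the Operator may already apply its own heap defaults):

    -Xmx3g                      (explicit heap cap, roughly 75% of a 4Gi container)
    -XX:MaxRAMPercentage=80.0   (alternative: cap the heap as a percentage of container memory)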

    For specific Data Grid recommendations, see this Red Hat article.

    Conclusion

    This article covered JVM tuning, core tuning settings, and benchmarking topics, focusing on Data Grid specificities in OpenShift Container Platform 4. It also raised some aspects of Data Grid installation and customization that can be done via the Data Grid Operator (preferable) or Data Grid Helm charts.

    Regarding tuning and settings, a considerable number of JVM settings can be changed, including—but not limited to—GC and container settings. For GC collectors, G1GC has shown great performance for generational workloads, specifically focusing on latency targets rather than generational memory size.

    This article briefly describes some tuning possibilities for Java applications deployed in OpenShift Container Platform 4. None of the points described here are mandatory rules; everything should be taken with a grain of salt depending on the specific use case. Therefore, benchmarking and comparing performance with a baseline and expectations should be the primary lessons learned.

    For critical deployments, the recommendation is to set Guaranteed or Burstable QoS and avoid BestEffort QoS. This prevents critical pods from being evicted if the OpenShift cluster resources are under pressure or overcommitted. However, the trade-off will be the effect of CPU throttling by the kernel, as these containers will have limits. A compromise for both problems, throttling and eviction, was proposed above: Burstable QoS with limits set noticeably higher than requests. This will prevent pods from being the first to be evicted if resources run out and will avoid significant throttling effects.

    Finally, this article builds on the main article How to Use Java Container Awareness in OpenShift 4, which details how changes in JDK 8u191+ enabled container awareness. This is undoubtedly the largest feature and causes a significant shift in Java application development and deployment inside OpenShift Container Platform 4. The application no longer needs to be explicitly "deployed for container settings"; rather, the JVM detects the container settings and adapts based on the detected cgroups (version 1 or version 2).

    Additional resources

    To learn more, read Java 17: What’s new in OpenJDK's container awareness.

    To learn more about the Data Grid Operator, see the Data Grid Operator Guide.

    To learn more about the Data Grid Helm chart deployment, see here.

    For any other specific inquiries, please open a case with Red Hat support. Our global team of experts can help you with any issues.

    Special thanks to Will Russell and Alexander Barbosa for reviewing this article. Finally, thank you to Vladislav Walek for his input and lessons on OpenShift over the last 5 years of working together.
