Troubleshoot application misbehavior after an OpenShift upgrade

This article explains how to diagnose and address application misbehavior after a Red Hat OpenShift upgrade.

Container awareness is a primary focus, because it dictates how an application behaves inside a container. I therefore consider this article a follow-up to How to use Java container awareness in OpenShift 4, and the second companion piece after How does cgroups v2 impact Java, .NET, and Node.js in OpenShift 4?

I discuss several aspects of this problem in the sections that follow. The topic is extensive, so I will be as concise as possible. This article assumes you are already investigating unexpected application behavior, so it does not cover other migration topics, such as how to perform a migration or how to prepare for one.

Migration

Upgrading your Red Hat OpenShift version should not require changes to your application, and most migrations are painless.

This is generally true if your application is compatible with both cgroups v1 and cgroups v2. In these cases, application behavior remains consistent across upgrades. The following table shows how to identify which cgroups version a node or container uses.

Check	cgroups v1	cgroups v2
`grep cgroup /proc/filesystems`	`nodev cgroup`	`nodev cgroup`, `nodev cgroup2`
`stat -fc %T /sys/fs/cgroup/`	`tmpfs`	`cgroup2fs`
Memory limit file (read with `cat`)	`/sys/fs/cgroup/memory/memory.limit_in_bytes`	`/sys/fs/cgroup/memory.max`
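If you prefer to script these checks, the following minimal Python sketch performs the same detection from inside a container. It assumes the cgroup filesystem is mounted at /sys/fs/cgroup and treats the presence of cgroup.controllers at that root as the marker of the cgroups v2 unified hierarchy. The function names and the v1 "unlimited" threshold are illustrative assumptions, not part of any OpenShift tooling.

```python
from pathlib import Path
from typing import Optional

CGROUP_ROOT = Path("/sys/fs/cgroup")
# Assumption for this sketch: cgroups v1 reports "no limit" as a huge page-aligned
# number (for example 9223372036854771712), so treat anything this large as unlimited.
V1_UNLIMITED_THRESHOLD = 1 << 60


def cgroup_version(root: Path = CGROUP_ROOT) -> int:
    """Return 2 if root is a cgroups v2 unified hierarchy, otherwise 1."""
    # Only the v2 unified hierarchy exposes cgroup.controllers at its root.
    return 2 if (root / "cgroup.controllers").is_file() else 1


def parse_memory_limit(raw: str, version: int) -> Optional[int]:
    """Convert raw memory limit file content to bytes, or None if unlimited."""
    value = raw.strip()
    if version == 2:
        return None if value == "max" else int(value)
    limit = int(value)
    return None if limit >= V1_UNLIMITED_THRESHOLD else limit


def memory_limit_bytes(root: Path = CGROUP_ROOT) -> Optional[int]:
    """Read the memory limit from the file that matches the detected cgroups version."""
    version = cgroup_version(root)
    if version == 2:
        limit_file = root / "memory.max"
    else:
        limit_file = root / "memory" / "memory.limit_in_bytes"
    return parse_memory_limit(limit_file.read_text(), version)
```

Run it inside the pod (for example, through oc exec). In a container with a 1 GiB limit, memory_limit_bytes() would typically return 1073741824; None means no limit was found at that path.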

However, some scenarios can lead to unexpected outcomes; these troublesome scenarios are discussed in the following sections.

Troublesome scenarios

After migrating to a new cluster, the application behaves differently than in the previous version.

Application changes

Although seemingly straightforward, application changes are a common source of problems, especially when image tags are modified. The tag might point to the wrong image, or an outdated image might be cached in the registry.
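One way to rule out a tag problem is to compare the image digests that actually ran before and after the upgrade, rather than the tags. The following Python sketch assumes you saved each pod's JSON (for example, with oc get pod <name> -o json) from both clusters, and it reads the standard status.containerStatuses[].imageID field. The helper names are hypothetical.

```python
import json
from typing import Dict, Optional, Tuple


def load_pod(path: str) -> dict:
    """Load a pod definition saved with `oc get pod <name> -o json`."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def container_digests(pod: dict) -> Dict[str, str]:
    """Map each container name to the image digest it is actually running."""
    digests = {}
    for status in pod.get("status", {}).get("containerStatuses", []):
        image_id = status.get("imageID", "")
        # imageID typically looks like "registry/repo@sha256:<hex>"; keep the digest.
        _, sep, digest = image_id.rpartition("@")
        digests[status["name"]] = digest if sep else image_id
    return digests


def changed_images(before: dict, after: dict) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return containers whose digest differs, or that exist in only one pod."""
    old, new = container_digests(before), container_digests(after)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(old.keys() | new.keys())
        if old.get(name) != new.get(name)
    }
```

For example, changed_images(load_pod("before.json"), load_pod("after.json")) returns an empty dictionary when both clusters ran exactly the same images.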

Workload changes

Workload patterns might change because of Ingress parameters or increased traffic to the application.

You can verify this with the monitoring already built into Red Hat OpenShift 4, such as the Prometheus metrics for resource usage and network traffic (packets received and transmitted).
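To compare traffic before and after the upgrade programmatically, you can query the same metrics through the Prometheus HTTP API (/api/v1/query). This Python sketch only builds the query and parses the instant-vector response, using the cAdvisor metric container_network_receive_packets_total. The base URL (for example, the Thanos Querier route in openshift-monitoring), bearer-token authentication, and the helper names are assumptions you would adapt to your cluster.

```python
import json
from typing import Dict
from urllib.parse import urlencode


def packets_query(namespace: str, window: str = "5m") -> str:
    """PromQL: received packets per second for every pod in a namespace."""
    return (
        "sum by (pod) (rate(container_network_receive_packets_total"
        f'{{namespace="{namespace}"}}[{window}]))'
    )


def query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_vector(body: str) -> Dict[str, float]:
    """Map each pod label to its sample value from an instant-vector response."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    return {
        sample["metric"].get("pod", ""): float(sample["value"][1])
        for sample in payload["data"]["result"]
    }
```

Running the same query against both clusters at comparable times gives a like-for-like view of whether the workload itself changed.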

Instrumentation

Changes to instrumentation, such as a different image or a new sidecar configuration, can cause the application to consume more memory than expected. Even with an identical image, updated instrumentation settings might introduce unexpected overhead or cause the application to misbehave. Because these issues are case-specific, consult your tool vendor to evaluate how a cgroups version migration affects their software.

Non-container-aware applications

An application that is not container-aware sizes itself according to the host hardware rather than the resource limits allocated to its container.

JVM container awareness

When sizing the heap, the JVM relies on the resources it detects: the container limits if it is container-aware, or the host resources if it is not.

If the JVM cannot detect its container limits, the application behaves as non-container-aware.

The application typically behaves consistently if it fails to detect limits on both the source and target hosts, assuming both environments share identical CPU and memory resources. The following table summarizes these scenarios.

Source host state	Target host state	Scenario description
Container-aware	Non-container-aware	Most common scenario: The application detects limits on the source host but fails to detect them on the target host.
Non-container-aware	Non-container-aware	Uncommon scenario: The application must already be deployed as non-container-aware.
Non-container-aware	Container-aware	Rare scenario: The application or its deployment script must dynamically detect host limits.
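The most common scenario is easiest to see with numbers. This Python sketch is a simplified model of the JVM's default maximum heap, which is MaxRAMPercentage (25% by default) of the memory the JVM detects. It deliberately ignores the small-memory rule (MinRAMPercentage) and any explicit -Xmx, and the function names are hypothetical.

```python
from typing import Optional

GIB = 1024 ** 3
MIB = 1024 ** 2


def memory_seen_by_jvm(container_limit: Optional[int], host_memory: int,
                       container_aware: bool) -> int:
    """Memory the JVM sizes itself from: the container limit if detected, else host RAM."""
    if container_aware and container_limit is not None:
        return container_limit
    return host_memory


def default_max_heap(memory: int, max_ram_percentage: float = 25.0) -> int:
    """Simplified default max heap: MaxRAMPercentage of the memory the JVM sees."""
    return int(memory * max_ram_percentage / 100)


limit, host = 2 * GIB, 64 * GIB
aware = default_max_heap(memory_seen_by_jvm(limit, host, container_aware=True))
unaware = default_max_heap(memory_seen_by_jvm(limit, host, container_aware=False))
print(f"container-aware: {aware // MIB} MiB, non-container-aware: {unaware // MIB} MiB")
```

With a 2 GiB limit on a 64 GiB node, the container-aware JVM allows a 512 MiB heap, while the non-container-aware JVM allows 16 GiB. That is far beyond the container limit, so the container is likely to be killed for exceeding its memory limit once the heap grows.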

Off-heap non-container-aware applications

Some off-heap components are not container-aware even when the JVM is. For instance, Netty (depending on the application and configuration) is not always container-aware and can scale according to the number of CPUs on the host. As a result, its off-heap resource use inside the container scales with the host's CPU count.

Non-JVM component awareness

Even if the JVM is container-aware, underlying runtime components might not be. For example, native libraries like jemalloc and glibc reside below the JVM level and do not inherently detect container constraints.

When deployed in an OpenShift cluster with fewer resource restrictions on its nodes, the application scales based on the host resources instead of the container limits.
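To see the gap between what a native library counts and what the container is allowed to use, compare the CPU count visible to the process with the cgroups v2 CPU quota. This Python sketch parses cpu.max ("<quota> <period>", or "max <period>" when no limit is set) and applies glibc's default malloc arena cap of 8 arenas per CPU on 64-bit systems (2 per CPU on 32-bit). glibc derives that CPU count from the CPUs visible to the process, not from the CPU quota. The helper names and the /sys/fs/cgroup/cpu.max path are assumptions.

```python
import os
from pathlib import Path
from typing import Optional


def cpu_limit_from_cpu_max(content: str) -> Optional[float]:
    """Parse cgroups v2 cpu.max ("<quota> <period>" or "max <period>") into CPUs."""
    quota, period = content.split()
    if quota == "max":
        return None  # no CPU limit configured
    return int(quota) / int(period)


def glibc_default_arena_max(cpus: int, is_64bit: bool = True) -> int:
    """glibc's default upper bound on malloc arenas (M_ARENA_MAX)."""
    return (8 if is_64bit else 2) * cpus


def compare(cpu_max_path: str = "/sys/fs/cgroup/cpu.max") -> None:
    """Print the visible CPU count next to the container's CPU limit."""
    visible = os.cpu_count() or 1  # logical CPUs on the host; ignores the cgroups quota
    limit = cpu_limit_from_cpu_max(Path(cpu_max_path).read_text())
    print(f"CPUs visible to the process: {visible}")
    print(f"glibc default arena cap:     {glibc_default_arena_max(visible)}")
    print(f"container CPU limit:         {limit if limit is not None else 'unlimited'}")
```

On a 64-CPU node, the default cap is 512 arenas even when cpu.max limits the container to 2 CPUs.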

cgroups version change

This is a specific instance of the previous use case. An application deployed under cgroups v1 might successfully detect container limits, but fail to do so after migrating to cgroups v2. Refer to the previous table.

When an application is migrated to a different cgroups version, Java runtime environments might fail to detect container memory or CPU limits.

Review the following criteria to identify and resolve these compatibility failures:

When it will fail: Deploying an application that is incompatible with cgroups v2 can cause unexpected behavior in Red Hat OpenShift 4.19.
How to fix it: Upgrade to a cgroups v2-compatible version of your runtime environment.
Workaround: If an upgrade is not possible, set CPU and memory sizing for the runtime manually (for Java, for example, with -Xmx and -XX:ActiveProcessorCount) to mitigate broader cluster issues.
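A container startup script can apply this workaround by deriving explicit JVM settings from the limits it reads. The following Python sketch is a hypothetical helper: the 75% heap fraction is an arbitrary assumption for illustration, and it rounds the CPU quota up to whole processors, as the JVM does when it detects a quota. It assumes a JDK that supports -XX:ActiveProcessorCount.

```python
import math
from typing import List, Optional


def manual_jvm_flags(memory_limit_bytes: Optional[int], cpu_quota_us: Optional[int],
                     cpu_period_us: int = 100_000, heap_fraction: float = 0.75) -> List[str]:
    """Build explicit heap and CPU flags for a JVM that cannot read cgroups v2 limits."""
    flags = []
    if memory_limit_bytes is not None:
        # Leave headroom for off-heap memory (metaspace, threads, direct buffers).
        heap_mib = int(memory_limit_bytes * heap_fraction) // (1024 * 1024)
        flags.append(f"-Xmx{heap_mib}m")
    if cpu_quota_us is not None:
        cpus = max(1, math.ceil(cpu_quota_us / cpu_period_us))
        flags.append(f"-XX:ActiveProcessorCount={cpus}")
    return flags
```

For a 2 GiB memory limit and a quota of 1.5 CPUs, manual_jvm_flags(2 * 1024**3, 150_000) yields -Xmx1536m and -XX:ActiveProcessorCount=2.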

For instance, the removal of cgroups v1 in Red Hat OpenShift 4.19 could explain differences, especially if the application is not cgroups v2 compliant. In this case, the application will behave as non-container-aware.

This is the core topic of the article How does cgroups v2 impact Java, .NET, and Node.js in OpenShift 4?

Unbounded deployments

Unbounded deployments also directly impact application migration.

When deploying an application on Red Hat OpenShift, you can deploy it without resource limits, allowing it to use the full capacity of the host for performance.

This configuration is known as an unbounded application. This type of deployment allows the application to spike when workloads peak, taking advantage of over-provisioned OpenShift nodes.

Java uses container limits rather than requests for resource calculations. For unbounded deployments, host-level specifications directly dictate runtime thread and memory footprint calculations.

CPU impact

Host CPU cores directly affect the application thread count and overall resource footprint, as many heap and off-heap components base their internal configurations on available CPU cores. For example, Netty scales its thread pool according to the host processor count.
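The following Python sketch shows how such defaults grow with the processor count. It models two defaults taken from the Netty and HotSpot sources: Netty 4.x sizes an event loop group at twice the available processors when io.netty.eventLoopThreads is not set, and HotSpot's ParallelGCThreads heuristic uses one thread per CPU up to 8 CPUs and roughly 5/8 of a thread per CPU beyond that. Treat both as simplified models, because exact behavior varies by version and platform.

```python
def netty_default_event_loop_threads(available_processors: int) -> int:
    """Netty 4.x default when io.netty.eventLoopThreads is unset: 2 x processors."""
    return max(1, available_processors * 2)


def hotspot_parallel_gc_threads(cpus: int) -> int:
    """Approximate HotSpot default for ParallelGCThreads with `cpus` CPUs."""
    if cpus <= 8:
        return cpus
    return 8 + (cpus - 8) * 5 // 8


for label, cpus in (("2-CPU container limit", 2), ("64-CPU host", 64)):
    print(f"{label}: {netty_default_event_loop_threads(cpus)} Netty event loop threads, "
          f"{hotspot_parallel_gc_threads(cpus)} parallel GC threads")
```

This prints 4 event loop threads and 2 GC threads for the 2-CPU limit, versus 128 and 43 for the 64-CPU host.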

In an unbounded deployment, a high host CPU count creates an excessively large thread pool. This can lead to CPU throttling as many threads compete for kernel time slices while garbage collection (GC) threads run at the same time. Throttling occurs when the application's threads use up the CPU quota (limit) for a scheduling period; the kernel then preempts the process until the next period starts. Depending on the kernel version, this effect might be less pronounced.
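The arithmetic behind quota exhaustion fits in a few lines of Python. This is a simplified model of CFS bandwidth control that assumes every busy thread runs on its own core for the whole period; real scheduling is noisier, and the function name is hypothetical.

```python
from typing import Tuple


def throttle_split_ms(quota_us: int, period_us: int, busy_threads: int) -> Tuple[float, float]:
    """Return (running ms, throttled ms) per period for fully busy parallel threads."""
    # Together the threads consume busy_threads microseconds of CPU per wall-clock microsecond.
    run_us = min(period_us, quota_us / busy_threads)
    return run_us / 1000, (period_us - run_us) / 1000


# A 2-CPU limit (200 ms of quota per 100 ms period) shared by 16 busy threads:
running, throttled = throttle_split_ms(200_000, 100_000, 16)
print(f"runs for {running} ms, then throttled for {throttled} ms each period")
```

With 16 busy threads, the quota is gone after 12.5 ms and the container is throttled for the remaining 87.5 ms of every period, whereas 2 busy threads would never be throttled.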

Memory impact

The JVM calculates its memory sizes from the host's memory rather than from a container limit, which results in a larger footprint.

Even at baseline idle states, resource consumption can be significantly higher than in strictly bounded environments.

Increased memory use can directly affect garbage collection (GC) performance. For example, with the Garbage-First garbage collector (G1GC), memory use scales upward until a full GC cycle is triggered. Similarly, the triple-mapping mechanism of the non-generational Z Garbage Collector (ZGC) in OpenJDK 17 can significantly increase the container's overall memory footprint.

Although the extent to which you should use unbounded applications is debatable, noisy neighbors and the normal OpenShift workload on the node directly affect an unbounded application, even if its own resource footprint varies.

FAQ

The following frequently asked questions can help you verify and address this problem:

Q1. What is the first step to verify an issue after migration?

A1. Verify whether the cgroups version differs between the source and target clusters. You can do this proactively, before the upgrade.

Q2. What would be a possible troubleshooting flow?

A2. Verify the cgroups difference, then deploy the application and verify its behavior. Benchmark the application to confirm that the problem is reproducible, because intermittent issues are often more challenging to resolve than persistent, predictable failures. Check memory, latency, and CPU usage, looking not only at spikes but also at the number of threads.
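Thread counts are easy to track from inside the pod. On Linux, /proc/<pid>/status reports a Threads: field for each process, and the following Python sketch reads it. The helper names are hypothetical; you would run it in the container (for example, through oc exec) against the PID of your Java process, sampling before and after the upgrade under a comparable load.

```python
from typing import Optional


def parse_thread_count(status_text: str) -> Optional[int]:
    """Extract the Threads: field from the content of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split(":", 1)[1])
    return None


def thread_count(pid: int) -> Optional[int]:
    """Return the current number of threads of a Linux process."""
    with open(f"/proc/{pid}/status", encoding="utf-8") as fh:
        return parse_thread_count(fh.read())
```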

Q3. Is an increase or decrease in garbage collection cycles a sign of a problem?

A3. Not necessarily. Several variables can alter garbage collection (GC) frequency, including unbounded deployments, instrumentation choices, and specific runtime configurations. For instance, in an unbounded deployment, a higher host thread count might result in shorter, more frequent GC cycles.

Q4. Is deploying without boundaries a problem?

A4. Not necessarily. A DevOps team can decide to deploy an unbounded configuration, which is valid and effective in specific runtime environments.

However, the trade-off is that the application operates without constraints, creating specific resource challenges:

For CPU use, the host's CPU count dictates the total thread count and the number of CPU-bound threads, allowing allocations to scale far beyond standard container thresholds during workload spikes. If you deploy multiple unbounded applications that experience simultaneous performance spikes, the host processor can become overwhelmed.
For memory use, the application requests a percentage of the host resource pool rather than the container allocation. This behavior occurs in both unbounded deployments and non-container-aware configurations. Memory use can reach 50%, 60%, 70%, or 80% of total host capacity.
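To see why simultaneous spikes are risky, add up the heap that several unbounded applications on the same node could claim. This Python sketch assumes each JVM sizes its maximum heap as a percentage of host memory (for example, through -XX:MaxRAMPercentage) and ignores off-heap memory, so the real total would be higher. The function name is hypothetical.

```python
from typing import Iterable

GIB = 1024 ** 3


def total_max_heap(host_memory: int, max_ram_percentages: Iterable[float]) -> int:
    """Sum of the maximum heaps of JVMs that each size themselves from host memory."""
    return sum(int(host_memory * pct / 100) for pct in max_ram_percentages)


host = 64 * GIB
claimed = total_max_heap(host, [50, 50, 50])  # three unbounded JVMs at 50% each
print(f"{claimed / GIB:.0f} GiB of heap on a {host / GIB:.0f} GiB node "
      f"({claimed / host:.0%} of host memory)")
```

Three JVMs at 50% each can together claim 96 GiB of heap on a 64 GiB node, 150% of host memory, before any off-heap usage is counted.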

This deployment approach works well in isolated environments, such as standalone microservices or small clusters where workloads do not compete for hardware resources. However, because this configuration prioritizes the application at the expense of the underlying node, it can starve neighboring deployments or consume all of the host's memory.

Data collection

You can gather specific data points to troubleshoot these scenario-based issues. For instance, for Java 11 and later, you can use the VM.info diagnostic command (run with jcmd) to verify heap size, off-heap configuration, container size, and CPU details. If the runtime fails to detect cgroups limits, its output shows this mismatch. This is the most straightforward approach. The -Xlog:os+container=trace and -XshowSettings:system flags can also help:

-Xlog:os+container=trace: Pass this flag at startup to see exactly how Java attempts to discover cgroups limits.
-XshowSettings:system: Add this flag to print the system resources the JVM has access to at startup.
VM.info: Available for Java 11 and later (not available in JDK 8), this command produces a human-readable dump that includes container limits and the complete JVM settings.
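If you collect this data repeatedly, for example from both clusters, a small wrapper keeps the invocations consistent. This Python sketch assumes JDK 11 or later in the container (JDK 8 does not accept -Xlog) and that java and jcmd are on the PATH; the helper names are hypothetical.

```python
import subprocess
from typing import List


def container_detection_cmd(java: str = "java") -> List[str]:
    """JVM startup flags that log cgroups detection and print detected system settings."""
    # -version makes the JVM start, emit the diagnostics, and exit without running an app.
    return [java, "-Xlog:os+container=trace", "-XshowSettings:system", "-version"]


def vm_info_cmd(pid: int, jcmd: str = "jcmd") -> List[str]:
    """jcmd invocation that prints the VM.info report for a running JVM."""
    return [jcmd, str(pid), "VM.info"]


def run(cmd: List[str]) -> str:
    """Run a diagnostic command and return its combined output."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    # These diagnostics can go to stdout or stderr depending on the flag, so keep both.
    return result.stdout + result.stderr
```

For example, saving run(vm_info_cmd(pid)) to a file in each cluster gives you two reports you can compare side by side.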

Isolate resource constraints after your next upgrade

When troubleshooting application disruptions after an OpenShift upgrade, check your cgroups compatibility first. Verifying whether your runtime components accurately detect container-level limits isolates the root cause of unexpected resource footprint spikes and streamlines your diagnostic path.

Overview

Understanding how your application currently runs simplifies the migration process. Determine if the application is compliant with cgroups v2 or if it operates in an unbounded deployment model. Knowing these architectural specifics improves your diagnostic path.

The following table maps common migration scenarios to their corresponding diagnostic actions and verification steps.

Troubleshooting scenario	Diagnostic action	Verification step
Application change	Track the specific application change backward.	Verify elements of application difference, such as the image SHA and image metadata.
Workload change	Track workload and application behaviors using instrumentation and data collection.	Verify elements of workload difference, cluster metrics, and memory allocation.
Instrumentation change or impact	Track the instrumentation overhead, and disable the instrumentation or its new settings to compare.	Verify elements of workload difference not related to application memory use.
Cgroups version change	Check whether the application or its container startup script lacks updates for cgroups v2, or whether the application is not compatible with cgroups v2.	Verify the application's compatibility with cgroups v2.
Unbounded deployment	Check for host configuration differences and noisy neighbor issues that can affect resource distribution.	Verify details on the OpenShift host.

Maintaining a rollback process provides a clear recovery path so your deployment remains stable if issues arise.

Additional resources

The following articles cover similar themes and complement this guide:

How does cgroups v2 impact Java, .NET, and Node.js in OpenShift 4?
Cgroups v2 in OpenJDK container in Openshift 4
Verifying Cgroup v2 Support in OpenJDK Images
What Red Hat middleware software is cgroups v2 compatible?
CPU Throttling even when the container does not reach its CPU Limit explains the impact of throttling when limits are not met.
Red Hat build of OpenJDK container awareness for Kubernetes tuning covers container awareness and optimization configurations.

To learn more about Java container awareness and how it prevents heap decoupling, see How to use Java container awareness in OpenShift 4. For detailed release notes, review Severin Gehwolf's article on cgroups v2 support in OpenJDK 8u372.

Acknowledgments

Thanks to Moises Lozano and Pamela Giz for their contributions to this work.

Last updated: June 19, 2026
