How Kruize optimizes OpenShift workloads

Last year, Red Hat announced the availability of resource optimization for Red Hat OpenShift. This article takes a closer look at Kruize Autotune, the engine that provides the container right-sizing recommendations behind resource optimization.

What is Kruize Autotune?

Kruize provides container right-sizing recommendations in Kubernetes in the form of CPU and memory requests and limits. The request and limit values for both CPU and memory are set to be the same. Kruize derives the recommendations from metrics in a data source such as Prometheus, which can be local or remote. It analyzes resource usage over the past 24 hours (short term), 7 days (medium term), and 15 days (long term) and provides cost-optimized and performance-optimized suggestions for each term on a per-container basis.
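As a minimal sketch of what "request and limit set to the same value" looks like in practice, the Python snippet below builds a Kubernetes-style resources stanza from a single CPU value and a single memory value. The helper name to_resources and the sample values are hypothetical and only for illustration; this is not Kruize's actual output format.

```python
def to_resources(cpu_cores, memory_mib):
    """Build a Kubernetes-style resources stanza with requests equal to limits.

    Hypothetical helper for illustration only.
    """
    values = {
        "cpu": f"{round(cpu_cores * 1000)}m",  # cores -> millicores
        "memory": f"{memory_mib}Mi",
    }
    # Copy the dict so requests and limits can be edited independently later.
    return {"requests": dict(values), "limits": dict(values)}


print(to_resources(0.5, 512))
```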

Kruize also provides capacity and utilization data that can be plotted (for example, as a box plot) to compare resource requests against actual resource utilization and to better understand the recommendations.

Performance-optimized recommendations currently use the 98th percentile for CPU usage for the given term. The usage includes any throttling that may have happened in the term.

Memory recommendations use the maximum value in the observed term plus a buffer. The buffer is the smaller of two values: 20% of that maximum, and the largest interval spike in the observed term.

$$\text{mem}^{\text{Recommendation}} = \text{termMemMax}^{\text{Interval}} + \min\left(0.2 \times \text{termMemMax}^{\text{Interval}},\ \text{termSpikeMax}^{\text{Interval}}\right)$$

Here, Interval refers to the minimum observable duration of the gathered metrics. The default interval is currently 15 minutes.
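The memory formula above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Kruize's implementation: the function name is hypothetical, the inputs are assumed to be one maximum and one spike value per interval (the article does not define how a spike is measured), and the sample data is made up.

```python
def memory_recommendation(interval_max_mib, interval_spike_mib):
    """Term maximum plus min(20% of term maximum, largest interval spike).

    Both arguments are sequences with one value per interval, in MiB.
    How a per-interval spike is measured is an assumption left to the caller.
    """
    if not interval_max_mib or not interval_spike_mib:
        raise ValueError("at least one interval of data is required")
    term_mem_max = max(interval_max_mib)
    term_spike_max = max(interval_spike_mib)
    buffer = min(0.2 * term_mem_max, term_spike_max)
    return term_mem_max + buffer


# Hypothetical data: four 15-minute intervals, values in MiB.
maxima = [700, 800, 750, 780]
spikes = [50, 120, 30, 60]
print(memory_recommendation(maxima, spikes))
```

With this sample data, the largest spike (120 MiB) is smaller than 20% of the term maximum (160 MiB), so the buffer is 120 MiB and the recommendation is 920 MiB.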

Cost-optimized recommendations use the 60th percentile for CPU usage for the given term (including throttling), and the memory recommendation is the same as that of performance-optimized recommendations.
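To make the two CPU profiles concrete, here is a rough Python sketch that picks the 98th and 60th percentiles from a list of CPU usage samples. The article does not say how Kruize computes percentiles, so the nearest-rank method used here is an assumption, as are the function names. The samples are assumed to already include any throttled CPU time.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile (an assumed method, not necessarily Kruize's)."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Smallest rank such that at least pct% of the samples are <= the result.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def cpu_recommendations(usage_cores):
    """Return the CPU value for each profile, in cores."""
    return {
        "performance": percentile(usage_cores, 98),
        "cost": percentile(usage_cores, 60),
    }


# Hypothetical usage samples in cores: 0.01, 0.02, ..., 1.00
usage = [i / 100 for i in range(1, 101)]
print(cpu_recommendations(usage))
```

The gap between the two percentiles is what separates the profiles: the performance profile covers almost every observed peak, while the cost profile accepts that usage will exceed the recommendation in roughly 40% of the samples.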

Note that the algorithms used to arrive at these recommendations are subject to change.

Production cluster example

Let's look at an example to explain the CPU and memory recommendations. Figure 1 shows the resource utilization of the container swatch-tally-service on a production cluster over a 15-day term.

Figure 1: A box plot for container resource utilization over a 15-day term.

In Figure 2, the container currently has a CPU request and limit of 2 cores, a memory request of 2 GiB, and a limit of 4 GiB.

Based on the past 15 days of actual usage, we see the cost-optimized recommendation from Kruize is 0.15 cores for CPU, which is 93% less than what has been set currently, and 957 MiB for memory, which is 53% less.

Figure 2: A cost-optimized recommendation based on the last 15 days of resource utilization.

If the same container needs to be optimized for performance, Figure 3 shows that Kruize recommends a CPU request and limit of 1.18 cores, which is still 41% less than what is currently set. The memory recommendation remains 53% less than what is currently set.

Figure 3: Performance-optimized recommendation based on the last 15 days of resource utilization.
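The percentages quoted in this example can be reproduced with simple arithmetic. The sketch below is illustrative only; the function name is hypothetical, and it assumes the 53% memory figure is measured against the current 2 GiB request, which is what the numbers suggest.

```python
def percent_change(current, recommended):
    """Signed change from current to recommended, as a percentage of current."""
    if current <= 0:
        raise ValueError("current value must be positive")
    return (recommended - current) / current * 100


current_cpu_cores = 2.0
current_mem_request_mib = 2 * 1024  # 2 GiB; assumed baseline for memory

for label, current, recommended in [
    ("CPU, cost-optimized", current_cpu_cores, 0.15),
    ("CPU, performance-optimized", current_cpu_cores, 1.18),
    ("Memory, both profiles", current_mem_request_mib, 957),
]:
    print(f"{label}: {percent_change(current, recommended):.1f}%")
```

The results are reductions of roughly 92.5%, 41%, and 53.3%, which line up with the rounded figures of 93%, 41%, and 53% in the text.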

Staging cluster example

In this example, the container swatch-tally-service now runs on a staging (non-production) cluster. Figure 4 shows the resource utilization over the last 24 hours.

Figure 4: Box plot for container resource utilization over the last 24 hours.

We see that the container currently has a CPU request and limit of 2 cores, a memory request of 1 GiB, and a memory limit of 4 GiB.

Based on the past 24 hours of actual usage, we see the cost-optimized recommendation from Kruize to be 0.1 core for CPU, which is 95% less than what has been set currently, and 4.5 GiB for memory, which is 12% more. The memory recommendation is higher than what has been currently set for two reasons.

First, the max is very close to what has been set. Second, there may be observed spikes, which may push the usage beyond what was set as the limit. Since memory is not a compressible resource, the recommendation is higher to help offset any OOM scenarios (Figure 5).

Figure 5: Cost-optimized recommendation based on the last 24 hours of resource utilization.

On the other hand, the performance-optimized recommendation for the same term is 3.32 cores, which is an increase of 66% compared to the current setting. The memory recommendation is 4.5 GiB, which is an increase of 12%, as shown in Figure 6.

Figure 6: Performance-optimized recommendation based on the last 24 hours of resource utilization.

Warnings in the recommendations

In certain conditions, a recommendation is displayed with a warning. This section discusses these cases in more detail.

Warnings about idle containers:
Containers can idle (< 1 millicore of CPU usage) in the observed term. In Figure 7, we see a container that has been idle for the last 24 hours.

Figure 7: A box plot for a container which has been idling.

In this scenario, Kruize will be unable to generate a recommendation for CPU for the respective term. Figure 8 shows a recommendation for the CPU idling case, which has an empty CPU recommendation with a warning icon.

Figure 8: No CPU recommendation can be generated since the container was idling in the observed period.
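The idle case can be sketched as a guard in front of the CPU recommendation. This is a hedged illustration, not Kruize's logic: the function names and the warning text are hypothetical, and it assumes "idle" means every sample in the term is below the 1 millicore threshold.

```python
IDLE_THRESHOLD_CORES = 0.001  # 1 millicore, the idle threshold quoted in the text


def is_idle(usage_cores):
    """Assumption: idle means every sample in the term is below the threshold."""
    return all(sample < IDLE_THRESHOLD_CORES for sample in usage_cores)


def cpu_recommendation(usage_cores, recommend):
    """Return (value, warning); value is None when the container was idle.

    `recommend` is any function that maps usage samples to a CPU value.
    """
    if is_idle(usage_cores):
        return None, "CPU usage below 1 millicore in the observed term"
    return recommend(usage_cores), None


# `max` stands in for a real recommendation function in this example.
print(cpu_recommendation([0.0002, 0.0004, 0.0001], max))
print(cpu_recommendation([0.0002, 0.25, 0.1], max))
```

Returning the warning alongside an empty value mirrors what the figure shows: the CPU field stays blank and the warning explains why.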

Containers that do not have either a request or limit set in their current configuration:
Figure 9 shows a case where the CPU limit was not set in the current configuration, which results in a warning icon.

Figure 9: CPU limit not set in the current configuration.

Similarly, Figure 10 shows a case where both CPU and memory requests and limits have not been set.

Figure 10: CPU and memory requests and limits not set in the current configuration.
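A check for missing requests and limits could look like the following Python sketch. It inspects a Kubernetes-style resources stanza represented as a dict; the function name and the warning strings are hypothetical and are not the messages Kruize displays.

```python
def config_warnings(resources):
    """List which CPU/memory requests and limits are missing from a resources dict."""
    warnings = []
    for kind in ("requests", "limits"):
        # Treat a missing or empty section as having nothing set.
        section = resources.get(kind) or {}
        for resource in ("cpu", "memory"):
            if resource not in section:
                warnings.append(f"{resource} {kind[:-1]} not set")
    return warnings


# CPU limit missing, as in the first case described above.
print(config_warnings({
    "requests": {"cpu": "2", "memory": "1Gi"},
    "limits": {"memory": "4Gi"},
}))

# Nothing set at all, as in the second case.
print(config_warnings({}))
```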

Important takeaways

For critical production workloads, we recommend setting the performance-optimized configuration recommendation based on the previous 15-day term. To prevent or reduce disruption to production workloads, it would be better to have fewer updates to the container configuration and only do updates if the recommended configuration is significantly different from the current one.

For non-production workloads, it would be wise to optimize for cost and update the configuration more frequently. In this case, you can get the maximum benefit if the configuration is set to the cost-optimized recommendation based on resource usage of the past 24 hours.
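The guidance above can be summarized as a small decision rule. The sketch below is one possible reading, not a Kruize feature: the function names, the term and profile labels, and the 20% threshold for a "significantly different" configuration are all assumptions chosen for illustration.

```python
def pick_recommendation(is_production):
    """Return the (term, profile) pair suggested by the guidance above."""
    if is_production:
        return ("long_term", "performance")  # previous 15 days
    return ("short_term", "cost")            # past 24 hours


def should_update(current, recommended, threshold=0.2):
    """Update only if the relative difference reaches the threshold.

    The 20% default is a hypothetical value; tune it per workload.
    """
    if current <= 0:
        return True  # nothing meaningful is set, so any recommendation applies
    return abs(recommended - current) / current >= threshold


print(pick_recommendation(is_production=True))
print(should_update(current=2.0, recommended=1.18))
print(should_update(current=2.0, recommended=1.9))
```

A higher threshold suits production workloads, where fewer configuration changes mean less disruption; a lower threshold suits non-production workloads that can be updated more often.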

In general, when we optimize an entire cluster in this fashion with Kruize, we see a reduction of more than 40% in overall resource usage, along with the associated cost benefits. We are working on new and exciting recommendations, including recommendations for AI workloads, so stay tuned.

Last updated: November 5, 2025
