I've spent years helping teams troubleshoot node stability issues in production Red Hat OpenShift clusters, and one pattern keeps appearing: nodes with insufficient system reserves running out of memory or experiencing CPU starvation for critical system daemons. The problem has become more pronounced as nodes have grown larger. I've seen clusters running 256 GB worker nodes where system daemons were competing with hundreds of pods for just 1 GB of reserved memory.
Starting with Red Hat OpenShift Container Platform 4.21, that's changing. OpenShift Container Platform will now automatically calculate and allocate system-reserved resources for newly created clusters, along with enforcing CPU limits on system daemons. Designed to improve node stability across the board, these changes have important implications for capacity planning that every OpenShift administrator needs to understand.
This article describes everything you need to know about the default enablement of AutoSizingReserved (OpenShift Container Platform 4.21+) and system-reserved-compressible (OpenShift Container Platform 4.22+).
What's changing
These changes ensure system stability on large nodes while maintaining backward compatibility for existing deployments. Starting with OpenShift Container Platform 4.21, AutoSizingReserved is enabled by default for new clusters. This feature automatically calculates system resource reservations based on node size, so larger nodes get proportionally more reserves for system daemons. Existing clusters keep their current behavior until you explicitly opt in.
Starting with OpenShift Container Platform 4.22, system-reserved-compressible enforcement is also enabled by default on worker nodes. It uses cgroup controls to enforce CPU limits on system daemons, providing predictable CPU allocation during contention. Nodes with a performance profile are automatically excluded from enforcement, and control plane nodes continue using their existing system resource management.
The problem these changes solve
To appreciate why these defaults are changing, it helps to understand how resource reservation worked previously. The kubelet has always supported reserving system resources (e.g., CPU, memory, and ephemeral storage) for node-level daemons like kubelet, CRI-O, and other system services. This ensures that critical system processes have the resources they need even when the node is running at capacity.
However, earlier versions of OpenShift Container Platform disabled the autoSizingReserved parameter by default. This meant administrators had to manually configure system reserves, and many clusters ran with minimal to no reserves beyond basic defaults. A typical default might reserve 1 GB of memory regardless of whether the node had 8 GB or 512 GB of total memory. This worked on smaller nodes but created real problems on large nodes running dense workloads.
I've debugged incidents where massive nodes with hundreds of pods experienced memory pressure, not because workloads exceeded their limits but because system daemons were fighting for scraps. The kubelet might try to manage 300 pods while competing for memory with those same pods. Under memory pressure, the kernel's out-of-memory killer would sometimes terminate system processes instead of workload containers, causing cascading failures.
The new defaults address this by automatically scaling system reserves based on node capacity. Larger nodes get proportionally more reserves, ensuring system stability while still maximizing allocatable capacity for workloads. Additionally, CPU enforcement ensures system daemons stay within their allocated share even on high-CPU-count nodes.
When these changes take effect
Let me break down exactly what's different and when these changes take effect.
AutoSizingReserved becomes default in 4.21
Starting with OpenShift Container Platform 4.21, any new cluster you create will have autoSizingReserved enabled by default. This means the kubelet will automatically calculate how much CPU, memory, and ephemeral storage to reserve for system daemons based on the node's total capacity. The calculation uses carefully tuned formulas that balance protecting system stability with maximizing allocatable resources for workloads.
If you're upgrading from 4.20 to 4.21, the important part for existing clusters is that this feature remains disabled. This is intentional. We don't want to surprise you with different capacity allocations in the middle of an upgrade. Your existing nodes will continue operating exactly as they did before, and you can enable auto-sizing on your own timeline after verifying the capacity impact.
System-reserved-compressible enforcement in 4.22
OpenShift Container Platform 4.22 introduces CPU enforcement for system reserves through a feature called system-reserved-compressible. While 4.21 calculates appropriate CPU reserves, it doesn't strictly enforce them at the cgroup level. System daemons could still burst beyond their allocation if CPU was available, which is fine, but they could also consume more than intended during contention.
It's worth clarifying what this feature actually does. The cgroup v2 compressible CPU capability already exists in the Linux kernel. This feature simply applies that capability to system reservations. When enabled, the kubelet configures the systemd system.slice cgroup (where kubelet, CRI-O, and other system services run) with CPU limits and enforces them through cgroup controls.
OpenShift Container Platform 4.22+ enables this enforcement by default on worker nodes only. Control plane (master) nodes do not have it enabled by default; they continue to use their existing resource management approach so that cluster management operations aren't impacted.
The compressible term refers to how CPU differs from memory as a resource type. CPU is compressible, meaning when a process exceeds its CPU limit, the kernel throttles it, reducing its CPU time without killing the process. This contrasts with memory, which is incompressible. When a process exceeds its memory limit, the out-of-memory (OOM) killer terminates it. This distinction matters because CPU enforcement gracefully degrades performance under pressure, while memory enforcement can cause abrupt failures.
How to calculate the reservation
When AutoSizingReserved is enabled, the kubelet uses specific formulas to determine how much CPU and memory to reserve for system services. Resource reservation for system daemons is an industry-wide practice across managed Kubernetes platforms.
Various cloud providers have developed similar approaches to ensure node stability:
- Google Kubernetes Engine (GKE) uses a tiered memory reservation formula based on total node memory, reserving 25% of the first 4 GB, 20% of the next 4 GB, and smaller percentages for larger memory allocations. Review the GKE documentation on node allocatable resources.
- Azure Kubernetes Service (AKS) implements graduated CPU and memory reservations, with the first core and first 4 GB receiving higher reservation percentages similar to OpenShift's approach. Details are available in the AKS resource reservations documentation.
- Amazon Elastic Kubernetes Service (EKS) calculates reserved resources based on instance size and maximum pod count per node, accounting for both system processes and networking overhead. Refer to the EKS best practices guide.
The upstream Kubernetes documentation provides the foundational concepts for system resource reservations that these implementations build upon.
OpenShift Container Platform's AutoSizingReserved feature follows similar principles, specifically tuned for OpenShift Container Platform architecture and operational requirements. The following formulas reflect extensive testing across diverse production workloads.
Memory calculation formula
The memory reservation, optimized to be less aggressive on smaller nodes, still provides adequate protection for larger nodes:
- First 8 GiB of memory: 1 GiB (flat reservation, matching the old non-dynamic default)
- Next 120 GiB of memory (up to 128 GiB total): 6% of memory
- Above 128 GiB: 2% of any memory above 128 GiB
This table shows memory reservation at common node sizes:
| Total Memory | Reserved Memory | Allocatable Memory |
|---|---|---|
| 8 GiB | 1 GiB | ~7 GiB |
| 16 GiB | 1.48 GiB | ~14.52 GiB |
| 32 GiB | 2.44 GiB | ~29.56 GiB |
| 64 GiB | 4.36 GiB | ~59.64 GiB |
| 128 GiB | 8.2 GiB | ~119.8 GiB |
| 256 GiB | 10.76 GiB | ~245.24 GiB |
| 512 GiB | 15.88 GiB | ~496.12 GiB |
Let's walk through a specific example for a 16 GiB node:
- First 8 GiB: 1.0 GiB (flat)
- Next 8 GiB: 8 × 6% = 0.48 GiB
- Total reserved: 1.48 GiB (leaving ~14.52 GiB for workloads)
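The tiers above can be sketched as a small shell function. This is a minimal sketch that assumes the three tiers exactly as listed; `reserved_memory_gib` is a hypothetical helper name, not the script OpenShift Container Platform ships, and it ignores any rounding a real implementation may apply.

```shell
# Illustrative only: mirrors the memory tiers described in this article.
# reserved_memory_gib is a hypothetical name, not part of OpenShift.
reserved_memory_gib() {
  # $1: total node memory in GiB
  awk -v total="$1" 'BEGIN {
    reserved = 1.0                        # flat 1 GiB covering the first 8 GiB
    if (total > 8) {
      tier = (total < 128 ? total : 128) - 8
      reserved += tier * 0.06             # 6% of memory between 8 and 128 GiB
    }
    if (total > 128) {
      reserved += (total - 128) * 0.02    # 2% of memory above 128 GiB
    }
    printf "%.2f\n", reserved
  }'
}

# Print the reservation for a few node sizes.
for size in 8 16 64 128; do
  echo "${size} GiB node: $(reserved_memory_gib "$size") GiB reserved"
done
```

For a 16 GiB node the function prints 1.48, which matches the worked example.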
CPU calculation formula
The CPU reservation uses a base-plus-increment model but enforces a strict minimum floor.
The logic:
- Base (1st Core): 60 millicores (0.06 CPU)
- Increment: 12 millicores (0.012 CPU) for every additional core beyond the first
- Minimum threshold: Compare this result against a floor of 0.5 CPU. If the calculated value is less than 0.5, the system enforces a reservation of 0.5 CPU.
Let's see how this applies to a smaller worker node with 4 vCPUs:
- Calculate the raw requirement:
  - Base (1st core): 0.06
  - Additional (3 cores): 3 × 0.012 = 0.036
  - Raw total: 0.06 + 0.036 = 0.096 CPU
- Apply the threshold: Since 0.096 is less than the minimum of 0.5, the reservation is raised to the floor value.
- Final reserved: 0.5 vCPU (500 millicores)
Thus, on a 4-core machine, the kubelet reserves 0.5 vCPU for system daemons, leaving 3.5 vCPUs allocatable for pods.
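The same base-plus-increment logic can be written with integer millicores. This is a sketch under the assumptions stated in this article (60 millicores for the first core, 12 for each additional core, a 500 millicore floor); `reserved_cpu_millicores` is a hypothetical helper name, and the function assumes a core count of at least 1.

```shell
# Illustrative only: base-plus-increment CPU reservation with a minimum floor.
# reserved_cpu_millicores is a hypothetical name, not part of OpenShift.
reserved_cpu_millicores() {
  # $1: number of CPU cores on the node (assumed >= 1)
  local cores="$1"
  local base=60        # millicores for the first core
  local per_core=12    # millicores for each additional core
  local floor=500      # minimum reservation in millicores
  local raw=$(( base + per_core * (cores - 1) ))
  if [ "$raw" -lt "$floor" ]; then
    echo "$floor"
  else
    echo "$raw"
  fi
}

# Print the reservation for a few core counts.
for cores in 4 16 64; do
  echo "${cores} cores: $(reserved_cpu_millicores "$cores")m reserved"
done
```

With these constants, the raw value only exceeds the floor on nodes with 38 or more cores, so most small and mid-size nodes end up at exactly 500 millicores.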
CPU enforcement for system daemons
Previously, OpenShift Container Platform calculated CPU reserves as an accounting measure. The kubelet knew that it reserved 0.5 CPU for system processes and factored that into the node's allocatable capacity. But nothing prevented system processes from using more CPU if it was available. This flexibility is generally good—you want system daemons to use idle CPU for housekeeping tasks. The problem arises under contention.
In addition to calculating system-reserved resources, OpenShift Container Platform now enforces CPU limits on system daemons through cgroup-based enforcement.
What is system-reserved-compressible
As described above, the CPU reservation used to be an accounting measure only: OpenShift Container Platform calculated how much CPU to reserve for system processes but did not enforce it. On nodes with high CPU counts, system daemons could therefore consume more CPU than intended, potentially impacting workload performance.
With system-reserved-compressible enabled:
- The kubelet enforces CPU limits on system daemons via systemReservedCgroup: /system.slice.
- It constrains system processes to their allocated CPU share through cgroup controls.
- This improves CPU allocation predictability, especially on nodes with high CPU counts.
How it works
The kubelet configuration now includes the following YAML:

    systemReservedCgroup: /system.slice
    enforceNodeAllocatable:
    - pods
    - system-reserved-compressible

This tells the kubelet to enforce node allocatable limits on both pods (as before) and system-reserved resources. The enforcement happens at the cgroup level, where Linux kernel mechanisms ensure that CPU time is distributed according to the configured weights and limits.
It's important to understand that this isn't a hard cap in the traditional sense. CPU shares in Linux cgroups work proportionally during contention. When the node isn't under CPU pressure—your workload pods are idle and system daemons need CPU to pull images or perform garbage collection—system processes can use more than their reserved share. The unused CPU is available, so the kernel allows it.
How enforcement behaves in practice
The enforcement becomes meaningful when CPU contention occurs. Imagine your node is running at high utilization: workload pods are consuming their CPU requests, system daemons want CPU for ongoing operations, and there aren't enough cycles to satisfy everyone. In this scenario, the cgroup controller constrains system processes to their configured share. If 0.5 CPU is reserved for system daemons on a 4-core node, system.slice will receive approximately that much CPU during contention, and workload pods will receive their expected share of the remaining 3.5 CPUs.
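The arithmetic behind that split can be sketched as follows. This is a simplified model, not what the kernel literally computes: it assumes only two sibling cgroups (system.slice and kubepods.slice) are competing, that both are fully busy, and that their weights are in the same ratio as the reserved and allocatable millicores. `contended_cpu_share` is a hypothetical helper name, and real cgroup weight values will differ numerically.

```shell
# Illustrative only: proportional CPU split between two fully busy cgroups.
# contended_cpu_share is a hypothetical name; the weights are assumptions.
contended_cpu_share() {
  # $1: weight of the cgroup of interest (for example, system.slice)
  # $2: weight of the competing cgroup (for example, kubepods.slice)
  # $3: number of CPUs on the node
  awk -v mine="$1" -v other="$2" -v cpus="$3" 'BEGIN {
    # Under full contention, CPU time is divided in proportion to the weights.
    printf "%.2f\n", cpus * mine / (mine + other)
  }'
}

# 4-core node, weights in a 500:3500 ratio (0.5 CPU reserved, 3.5 allocatable).
echo "system.slice under contention: $(contended_cpu_share 500 3500 4) CPU"
```

When the node is not contended, this split does not apply and either group can use idle CPU.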
What actually happens when system daemons hit their CPU limit? They get throttled. The kernel reduces the CPU time allocated to processes in system.slice, spreading their work over a longer period. This means operations might take longer—an image pull might be slower, or garbage collection might lag—but the processes continue running. This is fundamentally different from memory limits, where exceeding the limit triggers the OOM killer and terminates processes. CPU throttling gracefully degrades performance rather than causing failures, which is exactly what you want for system daemons under pressure.
This predictability is crucial for clusters running latency-sensitive workloads. Without enforcement, I've seen production nodes where system daemons consumed significant CPU during busy periods, indirectly throttling workload containers that expected consistent CPU access. With enforcement, workload pods get more predictable CPU performance even when the node is under heavy load.
Compatibility with performance profiles
The kubelet cannot simultaneously enforce systemReservedCgroup and --reserved-cpus (used by performance profiles in the Node Tuning Operator). This case is handled automatically: when a performance profile with reservedSystemCPUs is detected, systemReservedCgroup is cleared and enforceNodeAllocatable is set to ["pods"] only. Existing performance profile behavior is preserved without any manual changes.
Note: The control plane nodes are not impacted. They are excluded from this change, and their resource reservation behavior remains untouched.
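To see which nodes are likely to be excluded, it can help to list the performance profiles in the cluster along with their reserved CPU sets. The following is a sketch that assumes the Node Tuning Operator's PerformanceProfile resource is installed and that reserved CPUs are set in `spec.cpu.reserved`; `list_reserved_cpus` is a hypothetical helper name.

```shell
# Illustrative only: list each PerformanceProfile and its reserved CPU set.
# list_reserved_cpus is a hypothetical name; requires a logged-in oc session.
list_reserved_cpus() {
  oc get performanceprofile \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.cpu.reserved}{"\n"}{end}'
}

# Usage: list_reserved_cpus
```

Nodes targeted by a profile that appears in this list with a non-empty reserved set are the ones where CPU enforcement on system.slice is not expected.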
How to enable auto-sizing after upgrading
The upgrade process has a specific mechanism to preserve your current configuration.
The pre-upgrade patch (4.20.6): Before upgrading to 4.21, the upgrade path requires your cluster to be on version 4.20.6 first. During this update, a MachineConfig named 50-worker-auto-sizing-disabled is automatically applied to your cluster. This config explicitly forces autoSizingReserved to remain disabled.
Enable the feature in 4.21: Once you have successfully upgraded to OpenShift Container Platform 4.21, the 50-worker-auto-sizing-disabled config persists, keeping the feature off. To enable auto-sizing and allow OpenShift Container Platform to manage system reserves dynamically, you simply need to remove this restriction.
To enable autoSizingReserved, delete the blocking MachineConfig:

    oc delete machineconfig 50-worker-auto-sizing-disabled

Monitor the rollout: Deleting the MachineConfig triggers the Machine Config Operator (MCO) to revert the nodes to the default 4.21 behavior (enabled). The MCO drains, reconfigures, and reboots the nodes in the pool one by one. Ensure your cluster has enough spare capacity to handle the rolling reboot before executing this command.
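One way to follow the rollout is to check the worker pool status and the scheduling state of the worker nodes. This is a minimal sketch that assumes the default pool name `worker` and the standard worker node label; `check_worker_rollout` is a hypothetical helper name.

```shell
# Illustrative only: snapshot of the worker pool rollout.
# check_worker_rollout is a hypothetical name; requires a logged-in oc session.
check_worker_rollout() {
  # The pool status shows UPDATED, UPDATING, and DEGRADED plus machine counts.
  oc get machineconfigpool worker
  # Nodes being reconfigured show SchedulingDisabled while they are drained.
  oc get nodes -l node-role.kubernetes.io/worker
}

# Usage: check_worker_rollout
```

Running the helper repeatedly (or adding `--watch` to the first command) shows nodes cycling through the drain and reboot until the pool reports that it is updated.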
Verify the configuration
After enabling these features (either on a new cluster or after removing the blocking MachineConfig), you can verify the configuration.
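A quick first check from outside the node is to compare its capacity with its allocatable resources. The sketch below assumes a logged-in `oc` session; `show_node_resources` is a hypothetical helper name. Note that allocatable also subtracts eviction thresholds and any kube-reserved values, so the difference is not exactly the system-reserved amount.

```shell
# Illustrative only: print capacity and allocatable CPU and memory for a node.
# show_node_resources is a hypothetical name; requires a logged-in oc session.
show_node_resources() {
  # $1: node name
  oc get node "$1" \
    -o jsonpath='capacity: cpu={.status.capacity.cpu} memory={.status.capacity.memory}{"\n"}allocatable: cpu={.status.allocatable.cpu} memory={.status.allocatable.memory}{"\n"}'
}

# Usage: show_node_resources <node-name>
```

Comparing the output before and after enabling auto-sizing gives a direct view of the capacity impact on each node size.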
Verify AutoSizingReserved:

    # Open a debug shell on the node and check whether auto-sizing is enabled
    oc debug node/<node-name>
    chroot /host
    cat /etc/node-sizing-enabled.env

Verify System-Reserved-Compressible:

    # Check the kubelet configuration for system-reserved-compressible
    oc debug node/<node-name>
    chroot /host
    grep -A2 systemReservedCgroup /etc/kubernetes/kubelet.conf
    grep -A3 enforceNodeAllocatable /etc/kubernetes/kubelet.conf

Expected output:

    systemReservedCgroup: /system.slice
    enforceNodeAllocatable:
    - pods
    - system-reserved-compressible

For nodes that have a performance profile with reservedSystemCPUs configured:

    # Verify systemReservedCgroup is NOT present (no output expected)
    grep systemReservedCgroup /etc/kubernetes/kubelet.conf
    # Verify enforceNodeAllocatable only contains pods
    grep -A2 enforceNodeAllocatable /etc/kubernetes/kubelet.conf

Final thoughts
These changes represent an important maturation of OpenShift Container Platform node management capabilities. By automatically scaling system reserves based on node capacity and enforcing CPU limits on system daemons, OpenShift Container Platform 4.21 and 4.22 provide better out-of-the-box stability while still giving administrators the flexibility to customize when needed.
For new clusters, the defaults should work well for most use cases. You can deploy your workloads knowing that nodes have adequate system reserves without manual tuning. For existing clusters, the upgrade path gives you control over when to adopt the new behavior, allowing you to plan for capacity changes on your timeline.
I've seen too many production incidents caused by insufficient system reserves—nodes crashing under memory pressure, system daemons competing with workloads for CPU, or kubelet becoming unresponsive due to resource starvation. These changes address the root cause of many of those issues, and I expect they'll meaningfully improve cluster stability across the Red Hat OpenShift ecosystem.
If you're planning an upgrade to 4.21, take time to understand how these changes will affect your specific clusters. Test them in non-production environments first, verify the capacity impact, and plan your rollout accordingly. Once enabled, you'll have more predictable, stable nodes that can reliably run the workloads you're deploying.
In rare cases where other slices are running CPU-intensive workloads, contention from slices other than system.slice and kubepods.slice may still impact overall CPU allocation. These changes primarily address the issue of massive nodes only getting 1 GiB of reserved memory despite running hundreds of pods.
For more details about configuring node resources in OpenShift Container Platform, refer to the official documentation for managing node resources.