Prepare to enable Linux pressure stall information on Red Hat OpenShift


March 18, 2026
Qiujie Li
Related topics: Application modernization
Related products: Red Hat OpenShift

    Starting with Red Hat OpenShift 4.21, Linux pressure stall information (PSI) can be enabled using a MachineConfig object. Enabling PSI monitoring makes PSI metrics for CPU, memory, and I/O available for your cluster. Activating PSI helps you monitor and act on actual resource contention, not just utilization.

    Traditional metrics (CPU, memory used) tell you how much is being used, but not how much work is being delayed or stalled. To bridge that gap, PSI can:

    • Reveal hidden bottlenecks before they turn into outages
    • Explain resource contention that utilization alone can't justify
    • Support better resource sizing and autoscaling decisions
    • Improve debugging of resource starvation

    This article shares performance evaluation results for enabling PSI at scale. Enabling PSI has no observable impact on kubelet CPU or memory usage, but because PSI metrics are collected at the node, pod, and container level, it does increase memory usage for Prometheus pods, and the impact grows with large container counts. On a cluster with 500+ test containers, enabling PSI caused Prometheus pod resident set size (RSS) to increase by more than 1.3 GB per Prometheus pod, a 42% increase over baseline memory consumption. Understanding this impact helps you plan cluster infrastructure resources before enabling PSI in production.
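    The 42% figure can be checked against the results table later in the article. A quick arithmetic check using the measured Phase 2 and Phase 3 RSS values for one Prometheus pod:

```python
# Measured RSS for Prometheus pod 0 (values taken from the results table).
baseline_gb = 3.102  # Phase 2: baseline with 500+ test pods
psi_gb = 4.414       # Phase 3: after enabling PSI

increase_gb = psi_gb - baseline_gb
pct = 100 * increase_gb / baseline_gb
print(round(pct))  # 42
```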

    Vocabulary for PSI

    There are two important terms when looking at PSI. The first is PSI itself.

    • PSI: Pressure stall information is a Linux kernel feature that tracks the time that processes spend stalled waiting for CPU, memory, and I/O resources. PSI is reported at two levels: system-wide and per-cgroup.
    • System-wide PSI: Exposed in /proc/pressure/{cpu,memory,io}; represents global pressure outside of any cgroup.
    • Per-cgroup PSI: PSI is also tracked for tasks grouped into cgroups. Each subdirectory under the cgroupfs mount point contains cpu.pressure, memory.pressure, and io.pressure files.

    This article focuses on per-cgroup PSI.
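    Each pressure file uses the same two-line format: a "some" line and a "full" line, each carrying stalled-time percentages over 10/60/300-second windows plus a cumulative total in microseconds. As an illustration, here is a small parser for that format (the sample values are hypothetical):

```python
def parse_pressure(text: str) -> dict:
    """Parse a PSI pressure file (cpu.pressure, memory.pressure, io.pressure).

    Each line reads: "<some|full> avg10=X avg60=Y avg300=Z total=N".
    Returns {"some": {...}, "full": {...}} with float values.
    """
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        result[kind] = {k: float(v) for k, v in (f.split("=") for f in fields)}
    return result

# Hypothetical sample contents of a cgroup's cpu.pressure file:
sample = """\
some avg10=1.53 avg60=1.87 avg300=1.73 total=1088168194
full avg10=0.00 avg60=0.00 avg300=0.00 total=0
"""
stats = parse_pressure(sample)
print(stats["some"]["avg10"])  # 1.53
```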

    • PSI metrics: Kubernetes supports a kubelet configuration option that collects Linux kernel PSI data for CPU, memory, and I/O. The data is collected at the node, pod, and container level, and is exposed at the /metrics/cadvisor endpoint as:
    • container_pressure_cpu_stalled_seconds_total
    • container_pressure_cpu_waiting_seconds_total
    • container_pressure_memory_stalled_seconds_total
    • container_pressure_memory_waiting_seconds_total
    • container_pressure_io_stalled_seconds_total
    • container_pressure_io_waiting_seconds_total

    Test methodology

    We tested the performance impact of enabling PSI on kubelet and Prometheus at scale, with 500+ user containers in a cluster.

    Cluster configuration

    • Infrastructure:
      • 3 master nodes
      • 3 worker nodes
      • 3 infra nodes
      • Nodes are AWS EC2 instances with 4 vCPUs and 16 GB of memory
    • Component isolation:
      • Ingress, monitoring, and registry components deployed exclusively to infrastructure nodes
      • Prometheus configured with persistent storage (PVC-backed)

    Test procedure

    Here's how we implemented the test.

    1. Deploy a fresh cluster and allow stabilization for 1 hour.
    2. Deploy 500+ test pods/containers across worker nodes.
    3. Monitor cluster for 1-2 hours to establish baseline.
    4. Enable PSI.
      • The MachineConfig adds a kernel boot parameter that enables PSI at the kernel level
      • After PSI is enabled, the /proc/pressure/{cpu,memory,io} files are created
    5. Monitor the cluster for several hours. Compaction, garbage collection (GC), and WAL checkpoints happen every 2 hours, and each affects Prometheus RSS. Prometheus pod restarts (caused, for example, by an operator updating nodes or by an admin operation) also affect Prometheus memory RSS.
    6. Collect metrics (listed below) in each phase.
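    The MachineConfig in step 4 might look like the following sketch (the object name and role label are illustrative; psi=1 is the kernel boot parameter that enables PSI on kernels built with CONFIG_PSI_DEFAULT_DISABLED; consult the OpenShift 4.21 documentation for the exact supported object):

    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      name: 99-worker-enable-psi
      labels:
        machineconfiguration.openshift.io/role: worker
    spec:
      kernelArguments:
        - psi=1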

    Metrics monitored

    Here are the metric categories and associated PromQL queries for each:

    Pods/container count

    sum(kube_pod_status_phase{}) by (phase)
    count(kube_pod_container_info)

    PSI metric count

    count({__name__=~"container_pressure_.*"})

    Prometheus memory RSS

    There are two possible queries to retrieve this information:

    container_memory_rss{container="prometheus",namespace="openshift-monitoring"}
    
    container_memory_working_set_bytes{container="prometheus",namespace="openshift-monitoring"}

    Kubelet process CPU

    irate(process_cpu_seconds_total{service="kubelet",job="kubelet"}[1m])*100 * on (node) group_left kube_node_role{ role = "worker" }

    Kubelet process memory

    process_resident_memory_bytes{service="kubelet",job="kubelet"} *  on (node) group_left kube_node_role{ role = "worker" }

    Kubelet slice CPU

    rate(container_cpu_usage_seconds_total{ job=~".*", id =~"/system.slice/kubelet.service"}[1m]) * 100 * on (node) group_left kube_node_role{ role = "worker" }

    Kubelet slice memory

    container_memory_rss{ job=~".*", id =~"/system.slice/kubelet.service"} * on (node) group_left kube_node_role{ role = "worker" }

    Test results

    Our tests produced the results displayed in the following table:

    Metrics | Phase 1: After install | Phase 2: Add 500+ pods (baseline) | Phase 3: Enable PSI | Phase 4: After compact/GC | Phase 5: After Prometheus pod restart
    Prometheus RSS (pod 0/pod 1) | 1.887G/1.956G | 3.102G/3.098G (Phase 1 +1.215G/+1.142G) | 4.414G/4.251G (baseline +1.312G/+1.153G) | 3.827G/3.756G (baseline +725M/+658M; Phase 3 -587M/-495M) | 3.361G/3.314G (baseline +259M/+216M; Phase 4 -466M/-442M)
    Containers | 587 | 1131 (Phase 1 +544) | 1126 (baseline -5) | 1126 | 1149 (Phase 4 +23)
    Running pods | 279 | 823 (Phase 1 +544) | 823 | 823 | 823
    PSI metrics | 11,448 | 24,504 (Phase 1 +13,056) | 24,504 | 24,504 | 24,504
    PSI container metrics from node cAdvisor | 11,556 | 24,612 (Phase 1 +13,056) | 24,612 | 24,612 | 24,612
    Total metrics | 484,597 | 691,997 (Phase 1 +207,400) | 699,697 (baseline +7,700) | 699,976 (Phase 3 +279) | 611,389 (Phase 4 -88,587)

    Prometheus memory impact

    Based on these test results, we found that with 500+ test pods, the maximum Prometheus RSS increase per pod can exceed 1.3 GB.

    • Baseline (deployed 500+ test pods): Established baseline memory usage of 3.0-3.1 GB.
    • PSI enabled: 1.2-1.3 GB RSS increase per Prometheus pod.
    • After compaction/GC/WAL checkpoint: 400-500 MB RSS decrease per pod, but usage increased again until the next compaction cycle.
    • Prometheus pod restart: 400+ MB RSS decrease per pod; compared to baseline, a 200-300 MB increase remains.

    RSS is the physical RAM a process is actually using. You can see the Prometheus memory RSS increase in Figure 1.

    Figure 1: Prometheus memory RSS increased after enabling PSI.

    Kubelet CPU and memory

    There's no significant increase in kubelet CPU or memory after enabling PSI. This was validated for both kubelet process metrics and kubelet systemd slice metrics.

    Figure 2 displays the kubelet process CPU. The yellow line at the top is the sum of 3 workers. Lines below the top line are for each worker. The Y-axis 100 marker is 1 core of CPU.

    Figure 2: Kubelet process CPU has no significant increase after enabling PSI.

    Figure 3 displays kubelet process memory. The yellow line at the top is the sum of 3 workers. Lines below it are for each worker.

    Figure 3: Kubelet process memory has no significant increase after enabling PSI.

    Figure 4 displays the system.slice/kubelet.service CPU. The Y-axis 100 marker is 1 core of CPU. The yellow line at the top is the sum of 3 workers, and lines below it are for each worker.

    Figure 4: Kubelet slice CPU has no significant increase after enabling PSI.

    Figure 5 displays system.slice/kubelet.service memory. The yellow line at the top is the sum of 3 workers. Lines below are for each worker.

    Figure 5: Kubelet slice memory has no significant increase after enabling PSI.

    Understanding PSI metric cardinality

    For each pod, PSI metrics are emitted not only for application containers but also for two additional containers:

    • container="" (pause container - infra)
    • container="POD" (pod cgroup)

    Here is an example of a single PSI metric type (container_pressure_cpu_waiting_seconds_total) for one pod, emitted separately for the pause container, the pod cgroup, and the application container:

    container_pressure_cpu_waiting_seconds_total{container="",id="/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod43e94c3b_60b2_463c_bb0c_bb10d153e49d.slice",image="",name="",namespace="node-density-heavy-0",pod="perfapp-1-1-bc966c69-h6c77"} 1.456709 1769503672619
    
    container_pressure_cpu_waiting_seconds_total{container="POD",id="/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod43e94c3b_60b2_463c_bb0c_bb10d153e49d.slice/crio-d9ee10c5cdfc43b2bf36c7af5e34cffd4c353e09de52556c06ab98ee25d89310",image="",name="k8s_POD_perfapp-1-1-bc966c69-h6c77_node-density-heavy-0_43e94c3b-60b2-463c-bb0c-bb10d153e49d_0",namespace="node-density-heavy-0",pod="perfapp-1-1-bc966c69-h6c77"} 0 1769503667734
    
    container_pressure_cpu_waiting_seconds_total{container="perfapp",id="/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod43e94c3b_60b2_463c_bb0c_bb10d153e49d.slice/crio-cf24cfb805081bc45e300e9c041d123b78bc97167e43831d83bb1b1c1bfd7609.scope",image="quay.io/cloud-bulldozer/perfapp:latest",name="k8s_perfapp_perfapp-1-1-bc966c69-h6c77_node-density-heavy-0_43e94c3b-60b2-463c-bb0c-bb10d153e49d_0",namespace="node-density-heavy-0",pod="perfapp-1-1-bc966c69-h6c77"} 1.295762 1769503671596

    cAdvisor emits PSI metrics for every relevant cgroup. For a pod with a single application container, this means:

    3 containers × 6 PSI metric types = 18 total PSI metrics per pod

    Dropping the "" and "POD" containers would eliminate approximately 66% of the PSI series. Is it possible to reduce the Prometheus scrape effort and resource usage?
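    This arithmetic is consistent with the results table: the 544 test pods added in Phase 2 brought 13,056 new PSI series, or 24 per pod, which is what pods with two application containers each would emit (the two-container figure is our inference from the numbers, not something stated by the test setup):

```python
# PSI series per pod: (app containers + pause "" + "POD" cgroup) x 6 metric types.
def psi_series_per_pod(app_containers: int) -> int:
    return (app_containers + 2) * 6

print(psi_series_per_pod(1))        # 18, as computed above
print(544 * psi_series_per_pod(2))  # 13056, the Phase 2 delta in the results table
```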

    Reducing Prometheus resource usage

    The metrics pipeline has three stages: emit, scrape, and query. In this context, that is:

    cAdvisor → Prometheus → query result

    You can't (yet) suppress emission at the source

    It's not yet possible to configure Kubernetes, cAdvisor, or CRI-O to suppress PSI metrics for the pause and pod cgroups (the "" and "POD" containers) at the source. No configuration options exist for this today, but there is an open GitHub issue discussing the option.

    Metric relabeling

    Configure Prometheus metric relabeling to drop PSI metrics for non-application containers. This approach reduces:

    • Scrape payload size: Less data transferred during collection
    • Series count: Fewer time series stored in Prometheus
    • Head memory: Lower RAM usage for active series
    • WAL churn: Reduced write-ahead log activity

    Implementation example (unsupported)

    The following configuration drops PSI metrics for the POD and pause containers:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cluster-monitoring-config
      namespace: openshift-monitoring
    data:
      config.yaml: |
        kubelet:
          metricRelabelings:
          - sourceLabels:
            - __name__
            - container
            regex: container_pressure_.*;(POD|)
            action: drop

    This configuration is not supported in OpenShift's built-in cluster monitoring operator. Attempting to apply this configuration in OpenShift results in the following error:

    error when patching "cluster-monitoring-config.yaml": admission webhook "monitoringconfigmaps.openshift.io" denied the request: failed to parse data at key "config.yaml": error unmarshaling: unknown field "kubelet"

    The OpenShift cluster monitoring operator only exposes a subset of Prometheus configuration parameters. According to the config map reference for the cluster monitoring operator:

    > Not all configuration parameters for the monitoring stack are exposed. Only the parameters and fields listed in this reference are supported for configuration.

    OpenShift does not support customizing the kubelet ServiceMonitor. Using a custom Prometheus deployment outside of the managed monitoring stack is not recommended for production.

    PromQL query filter

    Filtering in PromQL using container!="", container!="POD" only affects query results and provides no resource savings. This approach is useful for data visualization, but it doesn't address the underlying resource consumption issues.
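    For example, a dashboard query restricted to application containers might look like this (illustrative; it trims what the query returns, not what Prometheus stores):

    rate(container_pressure_cpu_waiting_seconds_total{container!="",container!="POD"}[5m])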

    Test conclusion

    The test shows that enabling PSI has a measurable memory impact on Prometheus pods. With a baseline of 500+ test containers on the cluster, enabling PSI increased Prometheus pod RSS by more than 1.3 GB per pod, a 42% increase over baseline memory consumption.

    However, there is no observable impact on kubelet CPU or memory usage.

    Based on a performance evaluation with 500+ test containers, here are some recommendations when enabling PSI:

    • Prometheus capacity planning: Allocate an additional 1.4 GB of RSS headroom per Prometheus pod before enabling PSI. For clusters exceeding 500 containers, scale this allocation proportionally.
    • Monitoring: Closely monitor Prometheus pod memory usage after enabling PSI.
    • Kubelet performance: Kubelet performance remained stable with 500+ test containers after enabling PSI.
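    The capacity-planning recommendation can be turned into a rough sizing rule of thumb. Linear extrapolation from the single measured data point is our assumption; actual growth depends on churn and metric cardinality, so validate against your own cluster:

```python
# Rough linear extrapolation of the extra Prometheus RSS needed after enabling
# PSI, anchored to the measured point from this evaluation: ~1.4 GB at ~500
# containers. The linear-scaling assumption is ours; validate on your cluster.
MEASURED_EXTRA_GB = 1.4
MEASURED_CONTAINERS = 500

def extra_prometheus_rss_gb(containers: int) -> float:
    return MEASURED_EXTRA_GB * containers / MEASURED_CONTAINERS

print(extra_prometheus_rss_gb(500))   # 1.4
print(extra_prometheus_rss_gb(2000))  # 5.6
```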

    Learn more

    To learn more about PSI, kubelet PSI metrics, and Prometheus, check out these resources:

    • Pressure stall information (PSI) in Linux Kernel documentation
    • Metrics for Kubernetes system components
    • Understand PSI metrics in Kubernetes
    • Config map reference for the cluster monitoring operator
    • API reference for Prometheus operator
