Dynamic VM CPU Workload Rebalancing with Load Aware Descheduler

June 3, 2025
Guoqing Li
Related topics:
Virtualization
Related products:
Red Hat OpenShift


    Overview

    We evaluated the behavior of the Load Aware Descheduler with OpenShift Virtualization on OCP 4.19. This blog explores how the descheduler balances VM distribution based on CPU utilization and node CPU pressure, using the technology preview profile devKubeVirtRelieveAndMigrate. Our data shows how the descheduler can improve overall CPU performance when nodes suffer from CPU contention caused by an imbalanced VM distribution.

    Environment

    This testing was conducted on a cluster with 3 master nodes and 12 worker nodes. Each node has 2 sockets x 16 cores x 2 threads = 64 CPUs and 376Gi of RAM.

    Descheduler profiles & customization

    Profile:

    • devKubeVirtRelieveAndMigrate

    profileCustomizations:

    • devEnableEvictionsInBackground: true
    • devEnableSoftTainter: true
    • devDeviationThresholds: AsymmetricLow
    • devActualUtilizationProfile: PrometheusCPUCombined
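
    As a rough sketch, the settings above could be assembled into a manifest for the descheduler operator. The profile name and the four customizations come from this post, and the 60-second interval is the one used later in the evaluation. The apiVersion, kind, metadata, and mode values are assumptions that should be checked against the operator documentation for your cluster version, as should the exact casing of the profile name.

```python
import json


def build_descheduler_manifest(interval_seconds: int = 60) -> dict:
    """Build a KubeDescheduler-style manifest as a plain dict (illustrative only)."""
    return {
        # Assumed, not stated in the post: API group/version, kind, name, namespace.
        "apiVersion": "operator.openshift.io/v1",
        "kind": "KubeDescheduler",
        "metadata": {
            "name": "cluster",
            "namespace": "openshift-kube-descheduler-operator",
        },
        "spec": {
            "deschedulingIntervalSeconds": interval_seconds,
            # Assumed: a mode that actually evicts rather than only simulating.
            "mode": "Automatic",
            # From the post, written exactly as it appears there.
            "profiles": ["devKubeVirtRelieveAndMigrate"],
            "profileCustomizations": {
                "devEnableEvictionsInBackground": True,
                "devEnableSoftTainter": True,
                "devDeviationThresholds": "AsymmetricLow",
                "devActualUtilizationProfile": "PrometheusCPUCombined",
            },
        },
    }


if __name__ == "__main__":
    # JSON is printed here to keep the sketch dependency-free.
    print(json.dumps(build_descheduler_manifest(), indent=2))
```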

    This profile makes dynamic VM descheduling decisions based on both CPU utilization and the PSI (Pressure Stall Information) CPU metric, which quantifies how much workloads are stalled by CPU contention, often a result of excessive overcommit. Initially, the descheduler balances workloads by evicting VMs from overutilized nodes (those exceeding the cluster average CPU utilization by 10% or more) so they can move to underutilized nodes (those below the cluster average). Once cluster-wide CPU utilization reaches the 80% threshold, the descheduler switches from CPU utilization to PSI CPU metrics. This lets it keep making useful decisions when every node is busy, moving VMs from high-pressure nodes to lower-pressure ones.
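
    The decision rule described above can be sketched in a few lines of Python. This is only a reading of the paragraph, not the descheduler's actual implementation: the function names are hypothetical, the 10% margin is interpreted as percentage points above the cluster average, and the same margin is reused for PSI values purely for illustration.

```python
from statistics import mean

UTILIZATION_MARGIN = 10.0    # assumed: percentage points above the cluster average
PSI_SWITCH_THRESHOLD = 80.0  # cluster-wide CPU utilization (%) from the post


def choose_metric(cpu_util: dict) -> str:
    """Return which signal drives rebalancing for the current cluster load."""
    if mean(list(cpu_util.values())) >= PSI_SWITCH_THRESHOLD:
        return "psi"
    return "utilization"


def classify(values: dict, margin: float = UTILIZATION_MARGIN):
    """Split nodes into (overutilized, underutilized) around the cluster average."""
    avg = mean(list(values.values()))
    over = sorted(n for n, v in values.items() if v >= avg + margin)
    under = sorted(n for n, v in values.items() if v < avg)
    return over, under


def rebalance_plan(cpu_util: dict, cpu_pressure: dict) -> dict:
    """Pick the metric, then list nodes to evict from and candidate targets."""
    metric = choose_metric(cpu_util)
    source = cpu_pressure if metric == "psi" else cpu_util
    over, under = classify(source)
    return {"metric": metric, "evict_from": over, "candidates": under}
```

    With two saturated nodes and two idle ones, `rebalance_plan` selects the utilization metric and marks the saturated nodes as eviction sources. Once the average utilization passes 80%, the same call ranks nodes by their pressure values instead.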

    Evaluation

    Baseline

    [Figure: baseline]

    We deployed 130 VMIs across 6 of the 12 worker nodes using node selectors and zone labels. Each VM ran a stress-ng init script that fully utilized all 4 of its allocated vCPUs. This created a stark imbalance: 6 nodes operated at maximum CPU capacity while the remaining 6 nodes (highlighted in magenta) sat completely idle. Once the descheduler was activated, VMs gradually migrated from the overutilized nodes to the idle ones. The cluster quickly achieved balance, with CPU utilization converging across all nodes and the standard deviation dropping from approximately 50% to just 7%.
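
    The starting standard deviation of roughly 50% follows directly from the layout: six nodes pinned at 100% and six at 0%. A minimal check, assuming the figure refers to the population standard deviation of per-node CPU utilization (the post does not say which estimator was used):

```python
from statistics import mean, pstdev


def utilization_spread(per_node_cpu_percent: list) -> tuple:
    """Return (average, population standard deviation) of per-node CPU utilization."""
    return mean(per_node_cpu_percent), pstdev(per_node_cpu_percent)


# Idealized starting point: 6 saturated workers and 6 idle workers.
before = [100.0] * 6 + [0.0] * 6

if __name__ == "__main__":
    avg, spread = utilization_spread(before)
    print(f"average={avg:.1f}% stddev={spread:.1f}%")
```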

    [Figure: CPU wait time]

    We also observed that the cluster's average CPU utilization increased substantially after the descheduler rebalanced the VMs. This counterintuitive result stems from the initial CPU overcommitment reflected in the vCPU wait time plot above: the vCPUs requested exceeded the total CPU capacity of the active nodes. VMs therefore competed for limited CPU resources, which degraded overall performance. By rebalancing the VM distribution, the descheduler relieved this contention, reducing the average vCPU wait time from over 100% to nearly 0%.
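
    A back-of-the-envelope calculation with the numbers from this post shows why average utilization can rise after rebalancing. It is an idealized upper bound: it assumes each busy vCPU could fully occupy one host CPU (hardware thread) and ignores system overhead, so the measured values will differ.

```python
VMS = 130            # VMIs deployed in the baseline test
VCPUS_PER_VM = 4     # vCPUs allocated to each VM
CPUS_PER_NODE = 64   # 2 sockets x 16 cores x 2 threads
TOTAL_WORKERS = 12


def overcommit_ratio(active_nodes: int) -> float:
    """Requested vCPUs divided by the CPUs available on the nodes hosting VMs."""
    return (VMS * VCPUS_PER_VM) / (active_nodes * CPUS_PER_NODE)


def max_cluster_utilization(active_nodes: int) -> float:
    """Upper bound on cluster-wide CPU utilization (%) for a given placement."""
    demand = VMS * VCPUS_PER_VM
    usable = min(demand, active_nodes * CPUS_PER_NODE)
    return 100.0 * usable / (TOTAL_WORKERS * CPUS_PER_NODE)


if __name__ == "__main__":
    for nodes in (6, 12):
        print(nodes, round(overcommit_ratio(nodes), 2), round(max_cluster_utilization(nodes), 1))
```

    With VMs confined to 6 nodes, 520 vCPUs compete for 384 CPUs (about 1.35x overcommit), so the cluster cannot exceed 50% average utilization. Spread over all 12 nodes, the same demand fits within 768 CPUs and the ceiling rises to roughly 68%.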

    Cluster Upgrade

    [Figure: creation phase]

    For the node upgrade scenario, we kept the descheduler running at a 60-second interval and launched 130 VMIs without node selectors. The default scheduler did a reasonably good job, placing most VMs on 11 of the 12 nodes. However, only a few VMs were scheduled onto node f08-h03. Because the descheduler runs every 60 seconds, it continuously applies and removes soft taints on nodes (according to their utilization) as a hint for the scheduler. It quickly classified node f08-h03 as underutilized and started moving some VMs from other nodes onto it, helping the scheduler converge faster in such cases.

    [Figure: node upgrade]

    We then used a machine config that artificially simulated a node upgrade, rebooting the nodes one after another. As expected, the last node (f08-h05) was drained and eventually had some VMs moved back onto it, resulting in a balanced distribution.

    Node Pressure Rebalancing

    [Figure: node pressure]

    When the cluster's average CPU utilization exceeds 80%, the descheduler begins rebalancing nodes based on PSI pressure metrics. In our deployment of 800 VMs across 12 worker nodes, cluster-wide CPU utilization reached nearly 85%. Initially, several nodes experienced high CPU pressure due to uneven workload distribution. Once the descheduler was activated, we observed a significant improvement: nodes that had previously shown high pressure readings gradually saw their PSI values drop below the 20% threshold. Both the standard deviation and the average of the node pressure metrics declined noticeably, demonstrating that PSI-based descheduling can effectively optimize VM workload distribution.
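
    To make the PSI values more concrete, the sketch below parses the text format the Linux kernel exposes in /proc/pressure/cpu and compares it with the 20% threshold mentioned above. The descheduler in this test consumed its metrics through Prometheus rather than reading this file, so this is only an illustration of what a pressure reading means. The sample numbers are made up, and the choice of the avg10 window is an assumption.

```python
PRESSURE_THRESHOLD = 20.0  # percent, the threshold referenced in the post

# Made-up sample in the kernel's PSI text format (not measured data).
SAMPLE = (
    "some avg10=23.50 avg60=18.20 avg300=9.75 total=123456789\n"
    "full avg10=0.00 avg60=0.00 avg300=0.00 total=0\n"
)


def parse_psi(text: str) -> dict:
    """Parse PSI text into {"some": {"avg10": ..., ...}, "full": {...}}."""
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        result[kind] = {key: float(value) for key, value in (f.split("=") for f in fields)}
    return result


def under_pressure(psi: dict, window: str = "avg10") -> bool:
    """True when the share of time tasks were stalled on CPU exceeds the threshold."""
    return psi["some"][window] > PRESSURE_THRESHOLD


if __name__ == "__main__":
    print(under_pressure(parse_psi(SAMPLE)))
```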

    Important Notes

    Please note that the Load Aware Descheduler is still in technology preview, and there are non-converging corner cases to watch for, such as VMs configured with node selectors or a single VM whose usage exceeds the overutilization threshold.

    Acknowledgement

    This is a collaborative effort within the OpenShift Virtualization Performance and Scale team. We address storage, network performance and scalability challenges, conducting in-depth performance analysis to ensure workloads operate efficiently at scale across the entire infrastructure stack. Special thanks to Simone Tiraboschi, Robert Krawitz, Jenifer Abrams, Shekhar Berry, and Peter Lauterbach.
     

     

    Last updated: June 19, 2025
    Disclaimer: Please note the content in this blog post has not been thoroughly reviewed by the Red Hat Developer editorial team. Any opinions expressed in this post are the author's own and do not necessarily reflect the policies or positions of Red Hat.
