
Dynamic VM CPU Workload Rebalancing with Load Aware Descheduler

June 3, 2025
Guoqing Li
Related topics:
Virtualization
Related products:
Red Hat OpenShift


    Overview

    We evaluate the behavior of the load-aware descheduler with OpenShift Virtualization on OpenShift Container Platform (OCP) 4.19. This blog explores how the load-aware descheduler balances VM distribution based on CPU utilization and node CPU pressure, using the Technology Preview profile devKubeVirtRelieveAndMigrate. Our data demonstrate how the descheduler can help improve overall CPU performance when nodes suffer from CPU contention caused by imbalanced VM distribution.

    Environment

    Testing was conducted on a cluster of 3 master nodes and 12 worker nodes. Each node has 2 sockets x 16 cores x 2 threads = 64 CPUs and 376 GiB of RAM.

    Descheduler profiles & customizations

    Profile:

    • devKubeVirtRelieveAndMigrate

    profileCustomizations:

    • devEnableEvictionsInBackground: true
    • devEnableSoftTainter: true
    • devDeviationThresholds: AsymmetricLow
    • devActualUtilizationProfile: PrometheusCPUCombined
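
For reference, the profile and customizations above are set on the cluster's KubeDescheduler custom resource. The Python sketch below only builds that resource as a dictionary and prints it as JSON, which `oc apply -f -` also accepts. The field names reflect our understanding of the Kube Descheduler Operator API, the capitalized profile value `DevKubeVirtRelieveAndMigrate` is an assumption about how the enum is spelled in the CR, and the 60-second interval is borrowed from the Cluster Upgrade scenario; verify all of them against the documentation for your OpenShift release.

```python
import json


def build_kubedescheduler(interval_seconds: int = 60) -> dict:
    """Build a KubeDescheduler CR mirroring the profile settings listed above.

    Sketch only: field names and the profile enum spelling are assumptions
    to be checked against the operator documentation for your release.
    """
    return {
        "apiVersion": "operator.openshift.io/v1",
        "kind": "KubeDescheduler",
        "metadata": {
            "name": "cluster",
            "namespace": "openshift-kube-descheduler-operator",
        },
        "spec": {
            "managementState": "Managed",
            "mode": "Automatic",  # evict for real instead of only predicting
            "deschedulingIntervalSeconds": interval_seconds,
            # Written devKubeVirtRelieveAndMigrate in the text; the CR value
            # is assumed to be capitalized.
            "profiles": ["DevKubeVirtRelieveAndMigrate"],
            "profileCustomizations": {
                "devEnableEvictionsInBackground": True,
                "devEnableSoftTainter": True,
                "devDeviationThresholds": "AsymmetricLow",
                "devActualUtilizationProfile": "PrometheusCPUCombined",
            },
        },
    }


if __name__ == "__main__":
    # JSON is valid YAML, so this output can be piped to `oc apply -f -`.
    print(json.dumps(build_kubedescheduler(), indent=2))
```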

    This profile makes dynamic VM descheduling decisions based on both CPU utilization and the PSI (Pressure Stall Information) CPU metric, which quantifies how much workloads are stalled by CPU contention, often caused by excessive overcommitment. Initially, the descheduler balances workloads by evicting VMs from overutilized nodes (those exceeding the cluster-average CPU utilization by 10% or more) so that they move to underutilized nodes (those below the cluster average). However, once cluster-wide CPU utilization reaches the 80% threshold, the descheduler switches from CPU utilization to PSI CPU metrics. This lets it make smarter decisions, moving VMs from high-pressure nodes to lower-pressure ones.
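
To make the two-phase decision concrete, here is a toy Python sketch of the node classification described above. It is not the descheduler's implementation: the thresholds (10 points above the average, 80% switch-over) come from the prose, and applying the same deviation rule to PSI values once the switch happens is our simplifying assumption.

```python
from statistics import mean

# Thresholds taken from the description above; the real descheduler derives
# its thresholds from the configured profile customizations.
HIGH_DEVIATION = 10.0      # percentage points above the cluster average
UTILIZATION_SWITCH = 80.0  # cluster-average CPU % at which PSI takes over


def classify_nodes(cpu_util: dict[str, float],
                   cpu_psi: dict[str, float]) -> dict[str, str]:
    """Label each node 'over', 'under', or 'ok' (illustrative sketch only)."""
    # Use CPU utilization normally, and PSI pressure once the cluster is busy.
    busy = mean(cpu_util.values()) >= UTILIZATION_SWITCH
    metric = cpu_psi if busy else cpu_util
    avg = mean(metric.values())
    labels = {}
    for node, value in metric.items():
        if value >= avg + HIGH_DEVIATION:
            labels[node] = "over"   # eviction candidate
        elif value < avg:
            labels[node] = "under"  # preferred destination
        else:
            labels[node] = "ok"
    return labels


if __name__ == "__main__":
    # Moderately loaded cluster: utilization decides.
    print(classify_nodes({"a": 100.0, "b": 0.0}, {"a": 0.0, "b": 0.0}))
    # Busy cluster (average >= 80%): PSI decides.
    print(classify_nodes({"a": 90.0, "b": 90.0, "c": 90.0},
                         {"a": 40.0, "b": 5.0, "c": 5.0}))
```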

    Evaluation

    Baseline

    [Figure: baseline]

    We deployed 130 VMIs across 6 of the 12 worker nodes using node selectors and zone labels. Each VM ran stress-ng init scripts that fully utilized all 4 of its allocated vCPUs. This created a stark imbalance: 6 nodes operated at maximum CPU capacity while the remaining 6 nodes (highlighted in magenta) sat completely idle. After the descheduler was activated, VMs gradually migrated from the overutilized nodes to the idle ones. The cluster quickly reached balance, with CPU utilization converging across all nodes and the standard deviation dropping from approximately 50% to just 7%.
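
The starting figure of roughly 50% follows from the setup itself. Assuming the reported metric is the population standard deviation of per-node CPU utilization (in percent), six saturated nodes and six idle ones give exactly 50 percentage points:

```python
from statistics import pstdev


def utilization_spread(per_node_util: list[float]) -> float:
    """Population standard deviation of per-node CPU utilization (percent).

    Assumption: this is how the imbalance metric in the text is computed.
    """
    return pstdev(per_node_util)


# Baseline from the text: 6 workers at full CPU, 6 workers idle.
baseline = [100.0] * 6 + [0.0] * 6
print(utilization_spread(baseline))  # 50.0
```

A perfectly even cluster would score 0, so the reported drop to about 7% means per-node utilization ended up tightly clustered around the mean.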

    [Figure: vCPU wait time]

    We also observed that the cluster's average CPU utilization increased substantially after the descheduler rebalanced the VMs. This counterintuitive result stemmed from the initial CPU overcommitment, reflected in the vCPU wait time plot above: on the active nodes, the requested vCPUs exceeded total node capacity. VMs competed for limited CPU resources, and the resulting contention degraded overall performance. By rebalancing the VM distribution, the descheduler improved overall CPU performance in this situation, reducing the average vCPU wait time from over 100% to nearly 0%.
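
A quick back-of-the-envelope check, using only figures from this post, shows why the active nodes were overcommitted. It assumes the VMs were spread evenly across the active nodes and ignores CPU consumed by the platform itself:

```python
# Figures from the Environment and Baseline sections.
CPUS_PER_NODE = 2 * 16 * 2  # sockets x cores x threads = 64
VMS = 130
VCPUS_PER_VM = 4


def overcommit_ratio(active_nodes: int) -> float:
    """Busy vCPUs divided by host CPUs.

    Assumes an even VM spread and ignores platform and other pod usage.
    """
    return VMS * VCPUS_PER_VM / (active_nodes * CPUS_PER_NODE)


before = overcommit_ratio(6)   # 520 vCPUs on 384 host CPUs
after = overcommit_ratio(12)   # 520 vCPUs on 768 host CPUs
print(f"before: {before:.2f}x, after: {after:.2f}x")  # before: 1.35x, after: 0.68x
```

A ratio above 1 means there are more busy vCPUs than host CPUs, so some vCPUs must wait for a turn; spreading the same VMs over all 12 workers brings the ratio below 1, which is consistent with vCPU wait time falling to nearly 0%.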

    Cluster Upgrade

    [Figure: creation phase]

    For the node upgrade scenario, we kept the descheduler running at a 60-second interval and launched 130 VMIs without applying node selectors. The default scheduler did a reasonably good job, placing most VMs on 11 of the 12 nodes; however, only a few VMs were scheduled onto node f08-h03. Because the descheduler runs every 60 seconds, it continuously applies and removes soft taints on nodes (according to their utilization) as a hint for the scheduler. It quickly classified node f08-h03 as underutilized and started moving some VMs from other nodes onto it, helping the scheduler converge faster in such cases.
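
The soft-taint hint can be pictured as a small reconciliation step that runs on every descheduling cycle. In this sketch we assume that overutilized nodes are the ones tainted, that the taint uses the Kubernetes `PreferNoSchedule` effect (which the scheduler tries, but is not required, to honor), and that the taint key is a hypothetical placeholder; the real key and rules are internal to the descheduler. The node names are also illustrative.

```python
# Hypothetical taint key; the descheduler's real key is not given in the text.
SOFT_TAINT_KEY = "example.com/cpu-overutilized"


def soft_taint() -> dict[str, str]:
    """The taint as it would appear in a Node's spec.taints list."""
    # PreferNoSchedule is the Kubernetes "soft" effect: a hint, not a ban.
    return {"key": SOFT_TAINT_KEY, "value": "true", "effect": "PreferNoSchedule"}


def plan_soft_taints(labels: dict[str, str],
                     tainted: set[str]) -> tuple[set[str], set[str]]:
    """Return (nodes to taint, nodes to untaint) for one cycle.

    labels maps node name to 'over', 'under', or 'ok';
    tainted is the set of nodes that currently carry the soft taint.
    """
    over = {node for node, label in labels.items() if label == "over"}
    to_add = over - tainted     # newly overutilized nodes get the hint
    to_remove = tainted - over  # nodes that have recovered lose it
    return to_add, to_remove


if __name__ == "__main__":
    add, remove = plan_soft_taints(
        {"node-a": "over", "node-b": "ok", "node-c": "under"}, {"node-b"})
    print("taint:", sorted(add), "untaint:", sorted(remove), soft_taint())
```

Because the plan is recomputed each cycle, a node that cools down loses its taint on the next run, which matches the "continuously applying and removing" behavior described above.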

    [Figure: node upgrade]

    We then applied a machine config that simulated a node upgrade, rebooting the nodes one after another. As expected, the last node in the sequence (f08-h05) was drained and eventually had some VMs moved back onto it, resulting in a balanced distribution.
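
Why the last node ends up empty is easy to see in a toy model that only counts VMs per node: each drained node's VMs land on the least-loaded remaining nodes, so earlier nodes refill from later drains, while the final node has nothing left to refill it. The node names, the even starting distribution, and the simple "move one VM from the busiest to the idlest node" rebalancing loop are hypothetical simplifications, not how the Machine Config Operator or the descheduler actually work.

```python
def rolling_reboot(vms_per_node: dict[str, int]) -> dict[str, int]:
    """Drain and reboot nodes in order; evicted VMs go to the least-loaded
    other node. Toy model: counts VMs only, ignores CPU and constraints."""
    counts = dict(vms_per_node)
    for node in list(counts):
        evicted, counts[node] = counts[node], 0  # cordon and drain
        for _ in range(evicted):
            target = min((n for n in counts if n != node), key=counts.get)
            counts[target] += 1
    return counts


def rebalance(vms_per_node: dict[str, int]) -> dict[str, int]:
    """Toy rebalancer: move one VM from the busiest to the idlest node until
    no two nodes differ by more than one VM."""
    counts = dict(vms_per_node)
    while max(counts.values()) - min(counts.values()) > 1:
        src = max(counts, key=counts.get)
        dst = min(counts, key=counts.get)
        counts[src] -= 1
        counts[dst] += 1
    return counts


if __name__ == "__main__":
    after_upgrade = rolling_reboot({"n1": 10, "n2": 10, "n3": 10, "n4": 10})
    print(after_upgrade)             # the last node rebooted, n4, is empty
    print(rebalance(after_upgrade))  # every node back to 10 VMs
```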

    Node Pressure Rebalancing

    [Figure: node pressure]

    When cluster-average CPU utilization exceeds 80%, the descheduler begins rebalancing nodes based on PSI pressure metrics. In our deployment of 800 VMs across 12 worker nodes, cluster-wide CPU utilization reached nearly 85%. Initially, several nodes experienced high CPU pressure due to uneven workload distribution. Once the descheduler was activated, we observed a significant improvement: nodes that had previously shown high pressure readings gradually saw their PSI values drop below the 20% threshold. Both the standard deviation and the average of node pressure declined noticeably, demonstrating the ability of PSI-based descheduling to optimize VM workload distribution.
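
For readers unfamiliar with PSI: on Linux, CPU pressure is exposed in `/proc/pressure/cpu`, where the `some` line reports the percentage of time, averaged over 10, 60, and 300 seconds, during which at least one runnable task was waiting for a CPU. The descheduler consumes PSI through Prometheus rather than by reading this file, and the averaging window it uses is not stated here, so the parser below only shows what the underlying metric looks like. The sample values are made up.

```python
def parse_psi(text: str) -> dict[str, dict[str, float]]:
    """Parse the contents of a Linux PSI file such as /proc/pressure/cpu."""
    result: dict[str, dict[str, float]] = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()  # "some" or "full", then key=value pairs
        result[kind] = {k: float(v) for k, v in (f.split("=", 1) for f in fields)}
    return result


# Sample content in the kernel's format; the numbers are invented.
SAMPLE = """some avg10=25.31 avg60=22.10 avg300=18.04 total=123456789
full avg10=0.00 avg60=0.00 avg300=0.00 total=0"""

psi = parse_psi(SAMPLE)
# Compare the 10-second "some" average with the 20% threshold cited above.
print(psi["some"]["avg10"] > 20.0)  # True
```

On a live node, `parse_psi(open("/proc/pressure/cpu").read())` would return the current values, provided the kernel was built with PSI enabled.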

    Important Notes

    Please note that the load-aware descheduler is still in Technology Preview, and there are non-converging corner cases to watch for, such as VMs configured with node selectors or a single VM whose usage alone exceeds the overutilization threshold.
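
The single-large-VM case can be reproduced in a toy model. Here, VM load is expressed as a percentage of one node, node load is the sum of its VMs, and a node is overutilized when it sits 10 points above the cluster average (the deviation quoted in the profile description). The two-node cluster, the VM names, and the "evict the smallest VM" rule are hypothetical; this is not the descheduler's logic, only an illustration of why such a VM can bounce between nodes.

```python
def simulate(vms: dict[str, float], placement: dict[str, str],
             nodes: list[str], rounds: int,
             deviation: float = 10.0) -> list[dict[str, str]]:
    """Run toy descheduling rounds; return the VM placement after each round."""
    placement = dict(placement)
    history = []
    for _ in range(rounds):
        load = {n: 0.0 for n in nodes}
        for vm, node in placement.items():
            load[node] += vms[vm]
        avg = sum(load.values()) / len(nodes)
        over = [n for n in nodes if load[n] >= avg + deviation]
        under = [n for n in nodes if load[n] < avg]
        if over and under:
            src = max(over, key=load.get)
            dst = min(under, key=load.get)
            # Evict the smallest VM on the busiest node (toy heuristic).
            vm = min((v for v, n in placement.items() if n == src), key=vms.get)
            placement[vm] = dst
        history.append(dict(placement))
    return history


if __name__ == "__main__":
    # One VM worth 70% of a node: whichever node hosts it is overutilized.
    print([h["big"] for h in simulate({"big": 70.0}, {"big": "A"}, ["A", "B"], 4)])
    # The same load split into two VMs settles after one move.
    print(simulate({"v1": 35.0, "v2": 35.0}, {"v1": "A", "v2": "A"}, ["A", "B"], 3)[-1])
```

With one large VM, the placement alternates between A and B every round, whereas splitting the same load into two smaller VMs reaches a stable, balanced state after a single migration.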

    Acknowledgement

    This is a collaborative effort within the OpenShift Virtualization Performance and Scale team. We address storage, network performance, and scalability challenges, conducting in-depth performance analysis to ensure workloads operate efficiently at scale across the entire infrastructure stack. Special thanks to Simone Tiraboschi, Robert Krawitz, Jenifer Abrams, Shekhar Berry, and Peter Lauterbach.

    Last updated: June 19, 2025
    Disclaimer: Please note the content in this blog post has not been thoroughly reviewed by the Red Hat Developer editorial team. Any opinions expressed in this post are the author's own and do not necessarily reflect the policies or positions of Red Hat.
