Gang autoscaling on OpenShift with Kueue and ProvisionRequest

Coordinated autoscaling for AI/ML workloads on OpenShift with Kueue

June 8, 2026
Kevin Hannon, Michael McCune
Related topics:
Artificial intelligence, Automation and management, Kubernetes
Related products:
Red Hat OpenShift, Red Hat OpenShift Container Platform

    Running high-performance workloads like AI/ML training, high-performance computing (HPC) simulations, or large data processing on Kubernetes introduces a critical challenge: gang scheduling and coordinated autoscaling.

    Imagine you have a distributed machine learning job that requires ten worker pods to start simultaneously. If the cluster only has capacity for eight, traditional Kubernetes scheduling might start those eight pods, leaving the remaining two unschedulable. The Cluster Autoscaler might see those two pending pods and scale up the cluster. However, the first eight pods are useless without the full "gang," wasting compute resources and potentially causing the autoscaler to over-provision or get into a scaling loop before the job can even begin.

    This is where gang autoscaling becomes essential. We need a mechanism that holds the entire group (or gang) of pods, makes sure all required resources are available before any pod is scheduled, and signals the need for new capacity to the autoscaler as a single request. This coordinated approach prevents resource waste and gives latency-sensitive, high-throughput workloads exactly the resources they need from the start.

    This post explores how combining Red Hat build of Kueue, a queueing and resource management tool, with the ProvisioningRequest API brings true gang autoscaling to OpenShift, so your critical workloads start efficiently and reliably.

    Configuring the OpenShift autoscaler to support ProvisioningRequest

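    First, enable the DevPreviewNoUpgrade feature set by editing the cluster FeatureGate resource. Enabling this feature set cannot be undone and prevents minor version updates, so use it only on clusters you can afford to rebuild.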

    featureSet: DevPreviewNoUpgrade
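    For reference, a complete FeatureGate manifest for this change might look like the following sketch. It assumes the cluster-wide FeatureGate object is named cluster, which is the OpenShift default; you can make the same change interactively with oc edit featuregate cluster.

    ```yaml
    apiVersion: config.openshift.io/v1
    kind: FeatureGate
    metadata:
      name: cluster                     # the single cluster-wide FeatureGate object
    spec:
      featureSet: DevPreviewNoUpgrade   # needed here for ProvisioningRequest support in the autoscaler
    ```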

    Next, create the ClusterAutoscaler custom resource (CR) to enable the cluster autoscaler.

    Note

    The values in this example CR are for demonstration only. Consult your cluster administrator before applying them.

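    apiVersion: "autoscaling.openshift.io/v1"
    kind: "ClusterAutoscaler"
    metadata:
      name: "default"               # OpenShift supports a single ClusterAutoscaler named "default"
    spec:
      podPriorityThreshold: -10     # pods with priority below this value do not trigger scale-up
      resourceLimits:
        maxNodesTotal: 24           # counts all nodes in the cluster, not only autoscaled ones
        cores:                      # total CPU cores across the cluster
          min: 8
          max: 128
        memory:                     # total memory across the cluster, in GiB
          min: 4
          max: 256
      logVerbosity: 4
      scaleDown:
        cordonNodeBeforeTerminating: Enabled
        enabled: true
        delayAfterAdd: 10m          # wait after a scale-up before evaluating scale-down
        delayAfterDelete: 5m        # wait after a node deletion before evaluating scale-down
        delayAfterFailure: 30s      # wait after a failed scale-down before retrying
        unneededTime: 5m            # how long a node must be unneeded before removal
        utilizationThreshold: "0.4" # nodes below 40% requested utilization are scale-down candidates
      scaleUp:
        newPodScaleUpDelay: "10s"   # pods younger than this are not considered for scale-up
      expanders: ["Random"]         # pick a random eligible MachineSet when scaling up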

    Next, create a MachineAutoscaler for each MachineSet that the autoscaler should scale. The minReplicas and maxReplicas fields set the minimum and maximum number of machines in the referenced MachineSet.

    apiVersion: "autoscaling.openshift.io/v1beta1"
    kind: "MachineAutoscaler"
    metadata:
      name: "worker-autoscaler"
      namespace: "openshift-machine-api"
    spec:
      minReplicas: 1                # never scale this MachineSet below one machine
      maxReplicas: 12               # never scale this MachineSet above twelve machines
      scaleTargetRef:
        apiVersion: machine.openshift.io/v1beta1
        kind: MachineSet
        name: <machineset_name>     # replace with the name of the MachineSet to scale

    Configuring Red Hat build of Kueue to work with autoscaling

    Red Hat build of Kueue acts as the arbiter for resource requests, managing job queues and making sure cluster capacity is reserved or provisioned before jobs are admitted. Integrating it with autoscaling through ProvisioningRequest requires specific configuration to enable this coordinated behavior.

    To achieve this, you configure a ResourceFlavor, a ClusterQueue, an AdmissionCheck, and a ProvisioningRequestConfig so that Kueue knows when a job requires new capacity. The following sections walk through the complete plumbing that connects a user's queue to automated infrastructure provisioning.

    ResourceFlavor: Defining available resource types

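    This is the simplest piece. A ResourceFlavor describes a class of resources, typically tied to a set of nodes through node labels. In a real-world scenario, you might define one flavor for spot instances and another for on-demand instances. Here, a single flavor named default-flavor, with no node labels, covers all nodes.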

    kind: ResourceFlavor
    apiVersion: kueue.x-k8s.io/v1beta2
    metadata:
      name: "default-flavor"
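    As a sketch of the spot and on-demand case mentioned above, the following pair of flavors ties each one to a set of nodes through nodeLabels. The label key example.com/capacity-type and its values are hypothetical; substitute whatever labels your MachineSets apply to spot and on-demand nodes.

    ```yaml
    apiVersion: kueue.x-k8s.io/v1beta2
    kind: ResourceFlavor
    metadata:
      name: "spot"
    spec:
      nodeLabels:
        example.com/capacity-type: spot        # hypothetical label on spot nodes
    ---
    apiVersion: kueue.x-k8s.io/v1beta2
    kind: ResourceFlavor
    metadata:
      name: "on-demand"
    spec:
      nodeLabels:
        example.com/capacity-type: on-demand   # hypothetical label on on-demand nodes
    ```

    When Kueue assigns one of these flavors to a workload, it adds the flavor's nodeLabels as node selectors on the workload's pods, so each job lands on the matching nodes.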

    ClusterQueue: Managing cluster resource quotas

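    The ClusterQueue manages the resource budget for the workloads it admits. This one grants a nominal quota of 36 CPUs from default-flavor. The quota is an admission limit, not a measure of existing node capacity, which is what allows Kueue to admit jobs that need nodes the cluster does not have yet.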

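    kind: ClusterQueue
    apiVersion: kueue.x-k8s.io/v1beta2
    metadata:
      name: "cluster-queue"         # LocalQueues reference the ClusterQueue by this name
    spec:
      namespaceSelector: {}         # accept workloads from all namespaces
      resourceGroups:
      - coveredResources: ["cpu"]   # resources this group manages quota for
        flavors:
        - name: "default-flavor"
          resources:
          - name: "cpu"
            nominalQuota: 36
      admissionChecksStrategy:
        admissionChecks:
          - name: "sample-prov"
            onFlavors: [default-flavor]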

    The admissionChecksStrategy field is the bridge in this configuration. It tells Kueue that even when a workload fits within the remaining quota (36 CPUs in total), it must not be admitted on default-flavor until the check named sample-prov reports a Ready state.
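    The lifecycle described later in this post starts with a job submitted to a LocalQueue named user-queue. A minimal LocalQueue sketch follows. It assumes the ClusterQueue is named cluster-queue and that users submit jobs in a namespace called ml-team; both names are hypothetical, so adjust them to your setup.

    ```yaml
    apiVersion: kueue.x-k8s.io/v1beta2
    kind: LocalQueue
    metadata:
      name: "user-queue"
      namespace: "ml-team"            # hypothetical namespace where users submit jobs
    spec:
      clusterQueue: "cluster-queue"   # must match the ClusterQueue's metadata.name
    ```

    A LocalQueue is namespaced while a ClusterQueue is cluster-scoped, so each team namespace gets its own LocalQueue that draws on the shared ClusterQueue quota.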

    AdmissionCheck: Controlling job admission gates

    The AdmissionCheck resource acts as an automated cluster gatekeeper.

    kind: AdmissionCheck
    apiVersion: kueue.x-k8s.io/v1beta2
    metadata:
      name: sample-prov
    spec:
      controllerName: kueue.x-k8s.io/provisioning-request
      parameters:
        apiGroup: kueue.x-k8s.io
        kind: ProvisioningRequestConfig
        name: prov-test-config

    Setting controllerName to kueue.x-k8s.io/provisioning-request tells Kueue to handle this check with its built-in ProvisioningRequest controller. The parameters field points to a ProvisioningRequestConfig resource that tells the controller how to request new infrastructure when existing capacity is not enough.

    ProvisioningRequestConfig: Specifying autoscaling parameters

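    This configuration tells Kueue how to build the ProvisioningRequest that the Cluster Autoscaler acts on. The provisioningClassName selects the provisioning behavior: best-effort-atomic-scale-up.autoscaling.x-k8s.io asks the autoscaler to add all the nodes the gang needs in one atomic scale-up, or none of them if that is not possible. The managedResources field lists the resources the autoscaler is responsible for; only pod sets that request at least one of them (here, cpu) are included in the request. Finally, retryStrategy.backoffLimitCount means that if provisioning fails, for example because the cloud provider lacks capacity, Kueue retries up to two times before giving up on the workload.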

    kind: ProvisioningRequestConfig
    apiVersion: kueue.x-k8s.io/v1beta2
    metadata:
      name: prov-test-config
    spec:
      provisioningClassName: best-effort-atomic-scale-up.autoscaling.x-k8s.io
      managedResources:
      - cpu
      retryStrategy:
        backoffLimitCount: 2
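    For GPU-based AI/ML jobs, a variant of this configuration might manage GPUs instead of CPUs and set the retry backoff explicitly. This sketch assumes GPUs are exposed as the nvidia.com/gpu extended resource, as the NVIDIA GPU Operator does, and uses the hypothetical name gpu-prov-config; the backoff values are illustrative.

    ```yaml
    apiVersion: kueue.x-k8s.io/v1beta2
    kind: ProvisioningRequestConfig
    metadata:
      name: gpu-prov-config         # hypothetical name
    spec:
      provisioningClassName: best-effort-atomic-scale-up.autoscaling.x-k8s.io
      managedResources:
      - nvidia.com/gpu              # only pod sets requesting GPUs are sent to the autoscaler
      retryStrategy:
        backoffLimitCount: 2        # retry a failed request up to two times
        backoffBaseSeconds: 60      # initial wait before the first retry
        backoffMaxSeconds: 1800     # upper bound on the wait between retries
    ```

    An AdmissionCheck whose parameters reference gpu-prov-config would then gate a GPU flavor in the same way sample-prov gates default-flavor.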

    How the components flow together

    To understand how these individual configurations interact in a live cluster, you can track the end-to-end lifecycle of a job request as it passes through the provisioning pipeline.

    1. A user submits a job to user-queue, a LocalQueue that points to the ClusterQueue.
    2. Kueue checks the ClusterQueue to see whether enough of the 36-CPU quota remains for the whole job.
    3. If so, Kueue reserves quota for the job but, because the ClusterQueue requires the sample-prov AdmissionCheck, does not admit it yet.
    4. Kueue creates a ProvisioningRequest object covering all of the job's pods, using the settings in the ProvisioningRequestConfig.
    5. The Cluster Autoscaler picks up the request, adds the nodes, and marks the request as Provisioned.
    6. The sample-prov check on the job's workload becomes Ready, and Kueue admits the job, which then runs on the new nodes.
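    To try this flow end to end, you might submit a Job like the following sketch, which mirrors the ten-worker example from the introduction. The namespace ml-team, the name prefix, the image, and the per-pod CPU request are assumptions; the kueue.x-k8s.io/queue-name label is what routes the Job to user-queue. Ten pods at 2 CPUs each request 20 CPUs, which fits within the 36-CPU quota, so Kueue reserves quota and then waits for the provisioning check before admitting the Job.

    ```yaml
    apiVersion: batch/v1
    kind: Job
    metadata:
      generateName: gang-sample-    # hypothetical name prefix
      namespace: ml-team            # hypothetical namespace containing the user-queue LocalQueue
      labels:
        kueue.x-k8s.io/queue-name: user-queue   # route this Job to the LocalQueue
    spec:
      parallelism: 10               # all ten pods run together as one gang
      completions: 10
      suspend: true                 # Kueue unsuspends the Job once it is admitted
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: worker
            image: registry.access.redhat.com/ubi9/ubi-minimal
            command: ["sleep", "60"]
            resources:
              requests:
                cpu: "2"            # only CPU is requested because the ClusterQueue covers only cpu
    ```

    Because the manifest uses generateName, create it with oc create -f rather than oc apply. While the Job waits, you can watch its Workload and the generated ProvisioningRequest, for example with oc get workloads,provisioningrequests -n ml-team.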

    Optimizing cluster use for AI/ML

    Kueue's support for ProvisioningRequest is especially valuable for AI/ML workloads, which typically need gang scheduling and depend on getting the right nodes for the whole job at once.

    This integration of Red Hat build of Kueue and the ProvisioningRequest API solves a long-standing challenge in running high-performance workloads on OpenShift. With true gang autoscaling, partially started jobs no longer waste resources, scaling deadlocks are avoided, and complex multi-pod jobs, such as distributed AI/ML training, start reliably as soon as all required capacity is available. This capability is foundational for optimizing cluster use and increasing the throughput of critical applications.

    What's next: Autoscaling for inference

    While this post focused on provisioning resources for demanding, capacity-guaranteed workloads, the integration of Kueue and the ProvisioningRequest API paves the way for advanced autoscaling of inference workloads. We are exploring how to use the same mechanisms to handle sudden, large spikes in demand for AI/ML model serving, so the system can scale rapidly to meet real-time user needs without maintaining expensive idle capacity.

    Connect with the Kueue community

    Are you running distributed AI/ML, HPC, or other gang-scheduled workloads on OpenShift? Does the challenge of coordinated autoscaling resonate with your team's pain points? We are actively seeking feedback and collaborations to refine these features.
