
Dynamic resource allocation goes GA in Red Hat OpenShift 4.21: Smarter GPU scheduling for AI workloads

March 25, 2026
Harshal Patil
Related topics:
Application modernization, Kubernetes
Related products:
Red Hat OpenShift

    With OpenShift 4.21, Dynamic Resource Allocation (DRA) graduates to General Availability, fundamentally changing how GPU and accelerator resources are requested, allocated, and shared across your cluster. Built on the upstream Kubernetes 1.34 DRA implementation, this release replaces the limitations of the old device plug-in model with a richer, expression-driven framework that understands device attributes, not just device counts.

    This post covers what DRA is, why it matters, what's new in OpenShift 4.21, and how to use it with real examples running on an OpenShift 4.21 cluster with NVIDIA A100 GPUs.

    The problem: Why device plug-ins fall short

    Kubernetes has supported hardware accelerators like GPUs through the device plug-in framework since version 1.8. While functional, device plug-ins have fundamental limitations that become painful at scale, especially for AI/ML workloads:

    • Count-based allocation only: A pod requests nvidia.com/gpu: 1, but has no way to say which GPU it needs. There is no mechanism to filter by model, memory capacity, compute capability, or driver version.
    • No device sharing: A GPU allocated to one container cannot be shared with another, even for lightweight inference workloads that would only use a fraction of the device.
    • No topology awareness: The scheduler is blind to PCIe topology, NVLink connectivity, and NUMA placement. Multi-GPU workloads may land on suboptimal device combinations.
    • No parameterization: Workloads cannot request specific device configurations like MIG profiles or power limits at scheduling time.
    • No cluster autoscaler integration: The autoscaler cannot reason about opaque device resources when deciding whether to add or remove nodes.

    Teams work around these limitations with node labels, taints, tolerations, and custom admission webhooks, but these are brittle, error-prone, and do not scale.
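    For contrast, this is the full extent of what a pod can express under the device plug-in model: a bare count under resources.limits. A minimal sketch (the pod and container names are illustrative):

    ```yaml
    # Device plug-in era: the only knob is a count.
    apiVersion: v1
    kind: Pod
    metadata:
      name: legacy-gpu-pod        # illustrative name
    spec:
      containers:
      - name: training
        image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0
        resources:
          limits:
            nvidia.com/gpu: 1     # "one GPU" -- no way to say which model, memory size, or topology
    ```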

    What is dynamic resource allocation?

    Dynamic resource allocation (DRA) is a Kubernetes API framework under the resource.k8s.io API group that enables workloads to request specialized hardware based on device attributes rather than simple counts. Think of it as the device equivalent of the PersistentVolume/PersistentVolumeClaim model, but for GPUs, FPGAs, NICs, and other accelerators.

    DRA introduces four core API objects:

    • ResourceSlice: Published by DRA drivers on each node. Describes available devices with typed attributes (model, memory, driver version, UUID, and so on).
    • DeviceClass: Defines a category of devices using CEL selector expressions. Created by admins or drivers.
    • ResourceClaim: A workload's request for specific devices. Supports CEL-based filtering, can be shared across pods, and persists independently of pod lifecycle.
    • ResourceClaimTemplate: A template from which Kubernetes auto-generates per-pod ResourceClaims. The generated claim is deleted when its pod terminates.

    The key architectural insight is that DRA drivers publish structured, transparent device information (ResourceSlices) to the API server, and the kube-scheduler itself handles allocation decisions by evaluating CEL expressions against device attributes. No external controller negotiation is needed during scheduling, which makes DRA significantly faster and fully compatible with the cluster autoscaler.
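    As a sketch of how these objects fit together, an administrator could define a DeviceClass that matches only A100-class GPUs by product name. The class name here is illustrative; the attribute key follows the convention the NVIDIA driver publishes in its ResourceSlices:

    ```yaml
    # Hypothetical admin-defined DeviceClass: match any GPU whose productName
    # starts with "NVIDIA A100". ResourceClaims that reference this class
    # inherit its selector and can layer additional CEL filters on top.
    apiVersion: resource.k8s.io/v1
    kind: DeviceClass
    metadata:
      name: a100-gpus             # illustrative name
    spec:
      selectors:
      - cel:
          expression: "device.attributes['gpu.nvidia.com'].productName.startsWith('NVIDIA A100')"
    ```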

    See it in practice

    On a cluster with the NVIDIA DRA driver installed, each node publishes a ResourceSlice describing its GPUs. Here is what the driver advertises for a full A100 GPU:

    {
        "attributes": {
            "architecture":          { "string":  "Ampere" },
            "brand":                 { "string":  "Nvidia" },
            "cudaComputeCapability": { "version": "8.0.0" },
            "cudaDriverVersion":     { "version": "13.0.0" },
            "driverVersion":         { "version": "580.105.8" },
            "productName":           { "string":  "NVIDIA A100-SXM4-40GB" },
            "type":                  { "string":  "gpu" },
            "uuid":                  { "string":  "GPU-ec819aa6-26b9-d90a-00c8-3fcf0a34a0c9" }
        },
        "capacity": {
            "memory": { "value": "40Gi" }
        },
        "name": "gpu-0"
    }

    And for a MIG slice on the same GPU model:

    {
        "attributes": {
            "architecture":          { "string":  "Ampere" },
            "productName":           { "string":  "NVIDIA A100-SXM4-40GB" },
            "profile":               { "string":  "1g.5gb" },
            "type":                  { "string":  "mig" },
            "uuid":                  { "string":  "MIG-e42ee090-5c43-53b2-a164-c6a0b7ac1a57" },
            "parentUUID":            { "string":  "GPU-e40930a0-c463-2611-3473-bc72ac15679a" }
        },
        "capacity": {
            "memory":           { "value": "4864Mi" },
            "multiprocessors":  { "value": "14" }
        },
        "name": "gpu-0-mig-1g5gb-19-5"
    }

    The scheduler can now see the product name, architecture, MIG profile, memory capacity, and more. This is information that was completely invisible in the device plug-in model.

    The NVIDIA DRA driver also creates DeviceClasses automatically:

    $ oc get deviceclasses
    NAME                                        AGE
    gpu.nvidia.com                              12m
    mig.nvidia.com                              12m

    The gpu.nvidia.com class matches full GPUs, while mig.nvidia.com matches MIG slices. Both use CEL selectors against the type attribute published in the ResourceSlice.
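    You can inspect the exact selectors with oc get deviceclass gpu.nvidia.com -o yaml. Conceptually, the gpu.nvidia.com class boils down to something like this (a reconstruction, not the verbatim driver-generated object):

    ```yaml
    apiVersion: resource.k8s.io/v1
    kind: DeviceClass
    metadata:
      name: gpu.nvidia.com
    spec:
      selectors:
      - cel:
          # Match devices the NVIDIA driver advertises as full GPUs
          # (the MIG class would test for type == 'mig' instead).
          expression: "device.attributes['gpu.nvidia.com'].type == 'gpu'"
    ```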

    The road to GA

    DRA's path to General Availability (GA) in OpenShift 4.21 spans multiple Kubernetes and OpenShift releases:

    Release          Kubernetes   DRA status             Milestone
    OpenShift 4.19   1.32         Not available          Upstream DRA beta with structured parameters; classic DRA withdrawn
    OpenShift 4.20   1.33         Technology Preview     DRA enabled behind the TechPreview feature gate, with validation of the NVIDIA driver
    OpenShift 4.21   1.34         General Availability   DRA enabled by default; resource.k8s.io/v1 API; beta APIs removed

    The feature gate DynamicResourceAllocation was promoted to the default feature set (OCPNODE-3779, now closed). Earlier alpha/beta API enablement was removed since the v1 API is now served by default.
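    If you are still on OpenShift 4.20, trying DRA means opting in to the Technology Preview feature set; on 4.21 no FeatureGate change is needed. A sketch of the 4.20-only opt-in (note that this feature set is irreversible and unsupported for production clusters):

    ```yaml
    # OpenShift 4.20 only -- not needed on 4.21, where DRA is on by default.
    # Enabling TechPreviewNoUpgrade cannot be undone and blocks cluster upgrades.
    apiVersion: config.openshift.io/v1
    kind: FeatureGate
    metadata:
      name: cluster
    spec:
      featureSet: TechPreviewNoUpgrade
    ```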

    What's GA in OpenShift 4.21

    Three DRA capabilities reached General Availability in the 4.21 release. The demos in this article were run on an OpenShift 4.21.3 cluster on Google Cloud with three A100-SXM4-40GB worker nodes, each configured with a different GPU layout:

    • worker-1: all-1g.5gb
      • 7x MIG 1g.5gb slices (4.8 GB each)
    • worker-2: all-3g.20gb
      • 2x MIG 3g.20gb slices (19.6 GB each)
    • worker-3: MIG disabled
      • 1x full A100 40 GB

    1. Attribute-based GPU allocation

    This is the headline feature. Pods can now request GPUs based on specific device attributes exposed by a DRA driver, including product name, memory capacity, compute capability, driver version, and MIG profile.

    Requesting a specific MIG profile

    The following ResourceClaimTemplate requests a 1g.5gb MIG slice using a CEL selector against the profile attribute:

    apiVersion: resource.k8s.io/v1
    kind: ResourceClaimTemplate
    metadata:
      name: mig-1g5gb-claim
      namespace: dra-demo
    spec:
      spec:
        devices:
          requests:
          - name: mig
            exactly:
              deviceClassName: mig.nvidia.com
              selectors:
              - cel:
                  expression: "device.attributes['gpu.nvidia.com'].profile == '1g.5gb'"

    A pod references this template and runs the CUDA vectorAdd sample to verify that the GPU is actually usable:

    apiVersion: v1
    kind: Pod
    metadata:
      name: vectoradd-1g5gb
      namespace: dra-demo
    spec:
      restartPolicy: Never
      containers:
      - name: vectoradd
        image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0
        resources:
          claims:
          - name: gpu
      resourceClaims:
      - name: gpu
        resourceClaimTemplateName: mig-1g5gb-claim

    The result:

    $ oc logs vectoradd-1g5gb -n dra-demo
    [Vector addition of 50000 elements]
    Copy input data from the host memory to the CUDA device
    CUDA kernel launch with 196 blocks of 256 threads
    Copy output data from the CUDA device to the host memory
    Test PASSED
    Done

    The pod landed on worker-1 (the node with 1g.5gb slices), and the scheduler matched the claim to a specific MIG device.

    Requesting a full GPU

    Using the gpu.nvidia.com DeviceClass instead of mig.nvidia.com requests a whole GPU:

    apiVersion: resource.k8s.io/v1
    kind: ResourceClaim
    metadata:
      name: full-gpu-claim
      namespace: dra-demo
    spec:
      devices:
        requests:
        - name: gpu
          exactly:
            deviceClassName: gpu.nvidia.com

    The pod landed on worker-3 (the only node with a full GPU), and the allocation shows exactly which device was assigned:

    {
        "device": "gpu-0",
        "driver": "gpu.nvidia.com",
        "pool": "worker-3",
        "request": "gpu"
    }

    No node selectors, no taints, no tolerations. The claim describes what the workload needs, and the scheduler finds a match.

    Device sharing between containers

    Two containers in the same pod can reference the same ResourceClaim, giving both access to the same physical device. This is something the device plug-in framework cannot do.

    apiVersion: resource.k8s.io/v1
    kind: ResourceClaim
    metadata:
      name: shared-mig-claim
      namespace: dra-demo
    spec:
      devices:
        requests:
        - name: mig
          exactly:
            deviceClassName: mig.nvidia.com
            selectors:
            - cel:
                expression: "device.attributes['gpu.nvidia.com'].profile == '3g.20gb'"
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: shared-gpu-pod
      namespace: dra-demo
    spec:
      restartPolicy: Never
      containers:
      - name: vectoradd-1
        image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0
        command: ["sh", "-c", "/cuda-samples/vectorAdd && nvidia-smi -L"]
        resources:
          claims:
          - name: gpu
      - name: vectoradd-2
        image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0
        command: ["sh", "-c", "/cuda-samples/vectorAdd && nvidia-smi -L"]
        resources:
          claims:
          - name: gpu
      resourceClaims:
      - name: gpu
        resourceClaimName: shared-mig-claim

    Both containers ran vectorAdd successfully, and nvidia-smi -L confirms they see the exact same MIG device:

    === Container 1 ===
    [Vector addition of 50000 elements]
    ...
    Test PASSED
    Done
    GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-ce5b3123-f4f2-6250-8428-40617e5f9b9d)
      MIG 3g.20gb     Device  0: (UUID: MIG-90155500-9a09-5016-9746-95ef09bd78a6)
    === Container 2 ===
    [Vector addition of 50000 elements]
    ...
    Test PASSED
    Done
    GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-ce5b3123-f4f2-6250-8428-40617e5f9b9d)
      MIG 3g.20gb     Device  0: (UUID: MIG-90155500-9a09-5016-9746-95ef09bd78a6)

    The same UUID is in both containers. This enables lightweight inference sidecars, monitoring agents, or multi-process training to share a single GPU allocation without wasting resources.
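    Because shared-mig-claim is a standalone ResourceClaim rather than a per-pod template, it can also be referenced by name from a second pod, extending the sharing across pod boundaries. A sketch (the pod and container names are illustrative):

    ```yaml
    apiVersion: v1
    kind: Pod
    metadata:
      name: shared-gpu-observer   # illustrative name
      namespace: dra-demo
    spec:
      restartPolicy: Never
      containers:
      - name: observer
        image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0
        command: ["sh", "-c", "nvidia-smi -L && sleep 3600"]
        resources:
          claims:
          - name: gpu
      resourceClaims:
      - name: gpu
        # Reference the existing claim by name instead of generating a new one.
        resourceClaimName: shared-mig-claim
    ```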

    2. Prioritized alternatives in device requests

    Based on upstream KEP-4816, this feature allows pods to specify a prioritized list of acceptable device types within a single ResourceClaim. The scheduler tries to satisfy requests in priority order and falls back to lower-priority alternatives when preferred devices are unavailable.

    Here is a ResourceClaimTemplate that prefers 1g.5gb MIG slices but falls back to 3g.20gb if none are available:

    apiVersion: resource.k8s.io/v1
    kind: ResourceClaimTemplate
    metadata:
      name: prefer-small-mig
      namespace: dra-demo-priority
    spec:
      spec:
        devices:
          requests:
          - name: gpu
            firstAvailable:
            - name: prefer-1g5gb
              deviceClassName: mig.nvidia.com
              selectors:
              - cel:
                  expression: "device.attributes['gpu.nvidia.com'].profile == '1g.5gb'"
            - name: fallback-3g20gb
              deviceClassName: mig.nvidia.com
              selectors:
              - cel:
                  expression: "device.attributes['gpu.nvidia.com'].profile == '3g.20gb'"

    Step 1: Deploy when a preferred device is available

    The first pod gets a 1g.5gb slice. The allocation's request field confirms the first alternative was selected:

    {
        "device": "gpu-0-mig-1g5gb-19-5",
        "driver": "gpu.nvidia.com",
        "pool": "worker-1",
        "request": "gpu/prefer-1g5gb"
    }

    Step 2: Exhaust the preferred device

    Deploy seven more pods using the same template. All seven 1g.5gb slices on worker-1 are consumed.

    Step 3: Fallback kicks in

    The next pod cannot get a 1g.5gb slice because all seven are taken. The scheduler automatically falls back to 3g.20gb on worker-2:

    $ oc logs priority-fallback -n dra-demo-priority
    [Vector addition of 50000 elements]
    ...
    Test PASSED
    Done
    GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-ce5b3123-f4f2-6250-8428-40617e5f9b9d)
      MIG 3g.20gb     Device  0: (UUID: MIG-90155500-9a09-5016-9746-95ef09bd78a6)

    The allocation confirms the fallback alternative was selected:

    {
        "device": "gpu-0-mig-3g20gb-9-0",
        "driver": "gpu.nvidia.com",
        "pool": "worker-2",
        "request": "gpu/fallback-3g20gb"
    }

    Step 4: Total exhaustion

    When both 1g.5gb and 3g.20gb slices are consumed, the next pod remains Pending:

    $ oc get pod priority-exhausted -n dra-demo-priority
    NAME                 READY   STATUS    RESTARTS   AGE
    priority-exhausted   0/1     Pending   0          10s
    $ oc get pod priority-exhausted -n dra-demo-priority -o jsonpath='{.status.conditions[0].message}'
    0/6 nodes are available: 3 cannot allocate all claims, 3 node(s) had untolerated
    taint(s). still not schedulable, preemption: 0/6 nodes are available:
    6 Preemption is not helpful for scheduling.

    In a heterogeneous cluster, teams no longer need separate deployments for each GPU type. One ResourceClaimTemplate handles the preference logic, and the scheduler does the rest.

    3. Namespace-controlled admin access

    Cluster administrators can gain privileged access to devices already in use by other workloads. This is useful for monitoring, health checks, and debugging, and it does not disrupt those workloads. To use admin access, the namespace must carry a specific label, and the ResourceClaim must set adminAccess: true:

    apiVersion: v1
    kind: Namespace
    metadata:
      name: dra-demo-admin
      labels:
        resource.kubernetes.io/admin-access: "true"
    ---
    apiVersion: resource.k8s.io/v1
    kind: ResourceClaim
    metadata:
      name: admin-gpu-claim
      namespace: dra-demo-admin
    spec:
      devices:
        requests:
        - name: gpu
          exactly:
            adminAccess: true
            deviceClassName: mig.nvidia.com
            selectors:
            - cel:
                expression: "device.attributes['gpu.nvidia.com'].profile == '3g.20gb'"

    In this demo, the 3g.20gb slices on worker-2 are already allocated to workload pods from the prioritized alternatives demo above. The admin monitoring pod is deployed in a separate namespace with admin access enabled:

    apiVersion: v1
    kind: Pod
    metadata:
      name: admin-monitor
      namespace: dra-demo-admin
    spec:
      restartPolicy: Never
      containers:
      - name: monitor
        image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0
        command: ["sh", "-c", "nvidia-smi -L && nvidia-smi && sleep 3600"]
        resources:
          claims:
          - name: gpu
      resourceClaims:
      - name: gpu
        resourceClaimName: admin-gpu-claim

    The admin pod gets access to the in-use device and can run nvidia-smi to inspect it:

    GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-ce5b3123-f4f2-6250-8428-40617e5f9b9d)
      MIG 3g.20gb     Device  0: (UUID: MIG-90155500-9a09-5016-9746-95ef09bd78a6)
    +------------------------------------------------------------------------------+
    | NVIDIA-SMI 580.105.08          Driver Version: 580.105.08     CUDA: 13.0     |
    +--------------------------------+----------------------+----------------------+
    | GPU  Name       Persistence-M  | Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap | Memory-Usage         | GPU-Util  Compute M. |
    |================================+======================+======================|
    |   0  NVIDIA A100-SXM4-40GB  On | 00000000:00:04.0 Off |                   On |
    | N/A   36C   P0    93W /  400W  |                  N/A |     N/A      Default |
    |                                |                      |              Enabled |
    +--------------------------------+----------------------+----------------------+
    | MIG devices:                                                                 |
    +------------------+---------------------+-------------+-----------------------+
    | GPU  GI  CI  MIG | Shared Memory-Usage |         Vol | Shared                |
    |      ID  ID  Dev | Shared BAR1-Usage   | SM      Unc | CE  ENC  DEC  OFA JPG |
    |==================+=====================+=============+=======================|
    |  0    1   0   0  |  107MiB / 20096MiB  | 42        0 |  3    0    2    0   0 |
    |                  |    0MiB / 12210MiB  |             |                       |
    +------------------+---------------------+-------------+-----------------------+

    The allocation confirms admin access was granted:

    {
        "adminAccess": true,
        "device": "gpu-0-mig-3g20gb-9-0",
        "driver": "gpu.nvidia.com",
        "pool": "worker-2",
        "request": "gpu"
    }

    Meanwhile, the original workload pod continues running undisturbed on the same device. This gives SREs and platform teams the ability to monitor GPU health and debug allocation issues in production without evicting running workloads.

    What's next

    DRA continues to evolve upstream. Features currently in alpha or beta in Kubernetes 1.34 that may appear in future OpenShift releases include:

    • Partitionable devices allow drivers to advertise overlapping logical device partitions and reconfigure physical hardware dynamically based on actual allocations.
    • Device taints and tolerations mark devices as degraded or unusable, similar to node taints, with workloads explicitly tolerating tainted devices.
    • Device binding conditions support for network-attached and fabric-attached accelerators that need pre-binding to nodes before pod scheduling.

    To learn more, check out these resources:

    • OpenShift 4.21 documentation: Allocating GPUs to Pods
    • Kubernetes documentation: Dynamic Resource Allocation
    • NVIDIA DRA Driver for GPUs
    • KEP-3063: Dynamic Resource Allocation
    • KEP-4816: Prioritized Alternatives in Device Requests
