With OpenShift 4.21, Dynamic Resource Allocation (DRA) graduates to General Availability, fundamentally changing how GPU and accelerator resources are requested, allocated, and shared across your cluster. Built on the upstream Kubernetes 1.34 DRA implementation, this release moves beyond the limitations of the old device plug-in model to a richer, expression-driven framework that understands device attributes, not just device counts.
This post covers what DRA is, why it matters, what's new in OpenShift 4.21, and how to use it with real examples running on an OpenShift 4.21 cluster with NVIDIA A100 GPUs.
The problem: Why device plug-ins fall short
Kubernetes has supported hardware accelerators like GPUs through the device plug-in framework since version 1.8. While functional, device plug-ins have fundamental limitations that become painful at scale, especially for AI/ML workloads:
- Count-based allocation only: A pod requests nvidia.com/gpu: 1, but has no way to say which GPU it needs. There is no mechanism to filter by model, memory capacity, compute capability, or driver version.
- No device sharing: A GPU allocated to one container cannot be shared with another, even for lightweight inference workloads that would only use a fraction of the device.
- No topology awareness: The scheduler is blind to PCIe topology, NVLink connectivity, and NUMA placement. Multi-GPU workloads may land on suboptimal device combinations.
- No parameterization: Workloads cannot request specific device configurations like MIG profiles or power limits at scheduling time.
- No cluster autoscaler integration: The autoscaler cannot reason about opaque device resources when deciding whether to add or remove nodes.
Teams work around these limitations with node labels, taints, tolerations, and custom admission webhooks, but these are brittle, error-prone, and do not scale.
What is dynamic resource allocation?
Dynamic resource allocation (DRA) is a Kubernetes API framework under the resource.k8s.io API group that enables workloads to request specialized hardware based on device attributes rather than simple counts. Think of it as the PersistentVolume/PersistentVolumeClaim model applied to devices: GPUs, FPGAs, NICs, and other accelerators.
DRA introduces four core API objects:
- ResourceSlice: Published by DRA drivers on each node. Describes available devices with typed attributes (model, memory, driver version, UUID, and so on).
- DeviceClass: Defines a category of devices using CEL selector expressions. Created by admins or drivers.
- ResourceClaim: A workload's request for specific devices. Supports CEL-based filtering, can be shared across pods, and persists independently of pod lifecycle.
- ResourceClaimTemplate: A template from which Kubernetes auto-generates per-pod ResourceClaims. The generated claim is deleted when its pod terminates.
The key architectural insight is that DRA drivers publish structured, transparent device information (ResourceSlices) to the API server, and the kube-scheduler itself handles allocation decisions by evaluating CEL expressions against device attributes. No external controller negotiation is needed during scheduling, which makes DRA significantly faster and fully compatible with the cluster autoscaler.
See it in practice
On a cluster with the NVIDIA DRA driver installed, each node publishes a ResourceSlice describing its GPUs. Here is what the driver advertises for a full A100 GPU:
{
  "attributes": {
    "architecture": { "string": "Ampere" },
    "brand": { "string": "Nvidia" },
    "cudaComputeCapability": { "version": "8.0.0" },
    "cudaDriverVersion": { "version": "13.0.0" },
    "driverVersion": { "version": "580.105.8" },
    "productName": { "string": "NVIDIA A100-SXM4-40GB" },
    "type": { "string": "gpu" },
    "uuid": { "string": "GPU-ec819aa6-26b9-d90a-00c8-3fcf0a34a0c9" }
  },
  "capacity": {
    "memory": { "value": "40Gi" }
  },
  "name": "gpu-0"
}
And for a MIG slice on the same GPU model:
{
  "attributes": {
    "architecture": { "string": "Ampere" },
    "productName": { "string": "NVIDIA A100-SXM4-40GB" },
    "profile": { "string": "1g.5gb" },
    "type": { "string": "mig" },
    "uuid": { "string": "MIG-e42ee090-5c43-53b2-a164-c6a0b7ac1a57" },
    "parentUUID": { "string": "GPU-e40930a0-c463-2611-3473-bc72ac15679a" }
  },
  "capacity": {
    "memory": { "value": "4864Mi" },
    "multiprocessors": { "value": "14" }
  },
  "name": "gpu-0-mig-1g5gb-19-5"
}
The scheduler can now see the product name, architecture, MIG profile, memory capacity, and more. This is information that was completely invisible in the device plug-in model.
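Those typed attributes and capacities can be filtered on directly in a claim. As an illustrative sketch (not from the demo cluster; it assumes the CEL quantity helpers available in Kubernetes 1.34 and the gpu.nvidia.com attribute domain shown above), a ResourceClaim could ask for any full GPU with at least 20Gi of device memory:

```yaml
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: big-memory-gpu        # hypothetical claim name
  namespace: dra-demo
spec:
  devices:
    requests:
    - name: gpu
      exactly:
        deviceClassName: gpu.nvidia.com
        selectors:
        - cel:
            # Compare the advertised memory capacity against 20Gi.
            # quantity() and compareTo() come from the Kubernetes CEL
            # quantity library available to DRA selectors.
            expression: "device.capacity['gpu.nvidia.com'].memory.compareTo(quantity('20Gi')) >= 0"
```

On the demo cluster, a claim like this could only be satisfied by the full A100 on worker-3, since every MIG slice advertises less than 20Gi.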
The NVIDIA DRA driver also creates DeviceClasses automatically:
$ oc get deviceclasses
NAME             AGE
gpu.nvidia.com   12m
mig.nvidia.com   12m
The gpu.nvidia.com DeviceClass matches full GPUs, while mig.nvidia.com matches MIG slices. Both use CEL selectors against the type attribute published in the ResourceSlice.
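For reference, a driver-created DeviceClass is essentially a named CEL match on those attributes. A minimal sketch of what gpu.nvidia.com might contain (the shape is assumed from the ResourceSlice output above; inspect the actual object with oc get deviceclass gpu.nvidia.com -o yaml):

```yaml
apiVersion: resource.k8s.io/v1
kind: DeviceClass
metadata:
  name: gpu.nvidia.com
spec:
  selectors:
  - cel:
      # Match only devices the NVIDIA driver advertises as full GPUs;
      # the MIG class would test for type == 'mig' instead.
      expression: "device.attributes['gpu.nvidia.com'].type == 'gpu'"
```

Workloads then reference the class by name and layer their own selectors on top, so the class handles the "what kind of device" question and the claim handles the "which one" question.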
The road to GA
DRA's path to General Availability (GA) in OpenShift 4.21 spans multiple Kubernetes and OpenShift releases:
| Release | Kubernetes | DRA status | Milestone |
|---|---|---|---|
| OpenShift 4.19 | 1.32 | Not available | Upstream DRA beta with structured parameters; classic DRA withdrawn |
| OpenShift 4.20 | 1.33 | Technology Preview | DRA enabled behind a Technology Preview feature gate in OpenShift, with validation of the NVIDIA DRA driver |
| OpenShift 4.21 | 1.34 | General Availability | DRA enabled by default; resource.k8s.io/v1 API; beta APIs removed |
The feature gate DynamicResourceAllocation was promoted to the default feature set (OCPNODE-3779, now closed). Earlier alpha/beta API enablement was removed since the v1 API is now served by default.
What's GA in OpenShift 4.21
Three DRA capabilities reached General Availability in the 4.21 release. The demos in this article were run on an OpenShift 4.21.3 cluster on Google Cloud with three A100-SXM4-40GB worker nodes, each configured with a different GPU layout:
- worker-1: all-1g.5gb profile, 7x MIG 1g.5gb slices (4.8 GB each)
- worker-2: all-3g.20gb profile, 2x MIG 3g.20gb slices (19.6 GB each)
- worker-3: MIG disabled, 1x full A100 40 GB
1. Attribute-based GPU allocation
This is the headline feature. Pods can now request GPUs based on specific device attributes exposed by a DRA driver, including product name, memory capacity, compute capability, driver version, and MIG profile.
Requesting a specific MIG profile
The following ResourceClaimTemplate requests a 1g.5gb MIG slice using a CEL selector against the profile attribute:
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: mig-1g5gb-claim
  namespace: dra-demo
spec:
  spec:
    devices:
      requests:
      - name: mig
        exactly:
          deviceClassName: mig.nvidia.com
          selectors:
          - cel:
              expression: "device.attributes['gpu.nvidia.com'].profile == '1g.5gb'"
A pod references this template and runs the CUDA vectorAdd sample to verify that the GPU is actually usable:
apiVersion: v1
kind: Pod
metadata:
  name: vectoradd-1g5gb
  namespace: dra-demo
spec:
  restartPolicy: Never
  containers:
  - name: vectoradd
    image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0
    resources:
      claims:
      - name: gpu
  resourceClaims:
  - name: gpu
    resourceClaimTemplateName: mig-1g5gb-claim
The result:
$ oc logs vectoradd-1g5gb -n dra-demo
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
The pod landed on worker-1 (the node with 1g.5gb slices), and the scheduler matched the claim to a specific MIG device.
Requesting a full GPU
Using the gpu.nvidia.com DeviceClass instead of mig.nvidia.com requests a whole GPU:
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: full-gpu-claim
  namespace: dra-demo
spec:
  devices:
    requests:
    - name: gpu
      exactly:
        deviceClassName: gpu.nvidia.com
The pod landed on worker-3 (the only node with a full GPU), and the allocation shows exactly which device was assigned:
{
  "device": "gpu-0",
  "driver": "gpu.nvidia.com",
  "pool": "worker-3",
  "request": "gpu"
}
No node selectors, no taints, no tolerations. The claim describes what the workload needs, and the scheduler finds a match.
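Unlike the ResourceClaimTemplate earlier, a standalone ResourceClaim is referenced by name rather than generated per pod. A minimal pod consuming full-gpu-claim might look like this (the pod name is hypothetical; the image and structure mirror the earlier examples):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: vectoradd-full-gpu     # hypothetical pod name
  namespace: dra-demo
spec:
  restartPolicy: Never
  containers:
  - name: vectoradd
    image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0
    resources:
      claims:
      - name: gpu              # must match an entry under resourceClaims
  resourceClaims:
  - name: gpu
    resourceClaimName: full-gpu-claim   # standalone claim, not a template
```

Because the claim persists independently of the pod, deleting this pod does not release the allocation; the claim keeps the device until the claim itself is deleted.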
Device sharing between containers
Two containers in the same pod can reference the same ResourceClaim, giving both access to the same physical device. This is something the device plug-in framework cannot do.
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: shared-mig-claim
  namespace: dra-demo
spec:
  devices:
    requests:
    - name: mig
      exactly:
        deviceClassName: mig.nvidia.com
        selectors:
        - cel:
            expression: "device.attributes['gpu.nvidia.com'].profile == '3g.20gb'"
---
apiVersion: v1
kind: Pod
metadata:
  name: shared-gpu-pod
  namespace: dra-demo
spec:
  restartPolicy: Never
  containers:
  - name: vectoradd-1
    image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0
    command: ["sh", "-c", "/cuda-samples/vectorAdd && nvidia-smi -L"]
    resources:
      claims:
      - name: gpu
  - name: vectoradd-2
    image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0
    command: ["sh", "-c", "/cuda-samples/vectorAdd && nvidia-smi -L"]
    resources:
      claims:
      - name: gpu
  resourceClaims:
  - name: gpu
    resourceClaimName: shared-mig-claim
Both containers ran vectorAdd successfully, and nvidia-smi -L confirms they see the exact same MIG device:
=== Container 1 ===
[Vector addition of 50000 elements]
...
Test PASSED
Done
GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-ce5b3123-f4f2-6250-8428-40617e5f9b9d)
MIG 3g.20gb Device 0: (UUID: MIG-90155500-9a09-5016-9746-95ef09bd78a6)
=== Container 2 ===
[Vector addition of 50000 elements]
...
Test PASSED
Done
GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-ce5b3123-f4f2-6250-8428-40617e5f9b9d)
MIG 3g.20gb Device 0: (UUID: MIG-90155500-9a09-5016-9746-95ef09bd78a6)
The same UUID is in both containers. This enables lightweight inference sidecars, monitoring agents, or multi-process training to share a single GPU allocation without wasting resources.
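Sharing is not limited to containers in one pod. Because a standalone ResourceClaim persists independently of any pod, separate pods can reference the same claim by name and end up on the same device. A sketch under that assumption (pod names are hypothetical; both reference the shared-mig-claim defined above):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sharer-a               # hypothetical pod name
  namespace: dra-demo
spec:
  restartPolicy: Never
  containers:
  - name: vectoradd
    image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0
    resources:
      claims:
      - name: gpu
  resourceClaims:
  - name: gpu
    resourceClaimName: shared-mig-claim   # same claim as sharer-b
---
apiVersion: v1
kind: Pod
metadata:
  name: sharer-b               # hypothetical pod name
  namespace: dra-demo
spec:
  restartPolicy: Never
  containers:
  - name: vectoradd
    image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0
    resources:
      claims:
      - name: gpu
  resourceClaims:
  - name: gpu
    resourceClaimName: shared-mig-claim
```

Both pods must schedule to the node where the claim was allocated, which the scheduler enforces automatically when it resolves the shared claim.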
2. Prioritized alternatives in device requests
Based on upstream KEP-4816, this feature allows pods to specify a prioritized list of acceptable device types within a single ResourceClaim. The scheduler tries to satisfy requests in priority order and falls back to lower-priority alternatives when preferred devices are unavailable.
Here is a ResourceClaimTemplate that prefers 1g.5gb MIG slices but falls back to 3g.20gb if none are available:
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: prefer-small-mig
  namespace: dra-demo-priority
spec:
  spec:
    devices:
      requests:
      - name: gpu
        firstAvailable:
        - name: prefer-1g5gb
          deviceClassName: mig.nvidia.com
          selectors:
          - cel:
              expression: "device.attributes['gpu.nvidia.com'].profile == '1g.5gb'"
        - name: fallback-3g20gb
          deviceClassName: mig.nvidia.com
          selectors:
          - cel:
              expression: "device.attributes['gpu.nvidia.com'].profile == '3g.20gb'"
Step 1: Deploy when a preferred device is available
The first pod gets a 1g.5gb slice. The allocation's request field confirms the first alternative was selected:
{
  "device": "gpu-0-mig-1g5gb-19-5",
  "driver": "gpu.nvidia.com",
  "pool": "worker-1",
  "request": "gpu/prefer-1g5gb"
}
Step 2: Exhaust the preferred device
Deploy seven more pods using the same template. All seven 1g.5gb slices on worker-1 are consumed.
Step 3: Fallback kicks in
The next pod cannot get a 1g.5gb slice because all seven are taken. The scheduler automatically falls back to 3g.20gb on worker-2:
$ oc logs priority-fallback -n dra-demo-priority
[Vector addition of 50000 elements]
...
Test PASSED
Done
GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-ce5b3123-f4f2-6250-8428-40617e5f9b9d)
MIG 3g.20gb Device 0: (UUID: MIG-90155500-9a09-5016-9746-95ef09bd78a6)
The allocation confirms the fallback alternative was selected:
{
  "device": "gpu-0-mig-3g20gb-9-0",
  "driver": "gpu.nvidia.com",
  "pool": "worker-2",
  "request": "gpu/fallback-3g20gb"
}
Step 4: Total exhaustion
When both 1g.5gb and 3g.20gb slices are consumed, the next pod remains Pending:
$ oc get pod priority-exhausted -n dra-demo-priority
NAME READY STATUS RESTARTS AGE
priority-exhausted 0/1 Pending 0 10s
$ oc get pod priority-exhausted -n dra-demo-priority -o jsonpath='{.status.conditions[0].message}'
0/6 nodes are available: 3 cannot allocate all claims, 3 node(s) had untolerated
taint(s). still not schedulable, preemption: 0/6 nodes are available:
6 Preemption is not helpful for scheduling.
In a heterogeneous cluster, teams no longer need separate deployments for each GPU type. One ResourceClaimTemplate handles the preference logic, and the scheduler does the rest.
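In practice that preference logic plugs straight into a Deployment: each replica gets its own generated claim from the template and tries the alternatives in priority order independently. A hedged sketch (the Deployment name, labels, and sleep command are hypothetical; it references the prefer-small-mig template from above):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-workers             # hypothetical name
  namespace: dra-demo-priority
spec:
  replicas: 4
  selector:
    matchLabels:
      app: gpu-workers
  template:
    metadata:
      labels:
        app: gpu-workers
    spec:
      containers:
      - name: worker
        image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0
        command: ["sleep", "infinity"]   # placeholder long-running workload
        resources:
          claims:
          - name: gpu
      resourceClaims:
      - name: gpu
        # Each replica gets its own auto-generated ResourceClaim,
        # deleted when that pod terminates.
        resourceClaimTemplateName: prefer-small-mig
```

With the cluster layout above, some replicas would land on 1g.5gb slices and the rest would fall back to 3g.20gb, all from a single manifest.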
3. Namespace-controlled admin access
Cluster administrators can gain privileged access to devices already in use by other workloads. This is useful for monitoring, health checks, and debugging, and it does not disrupt those workloads. To use admin access, the namespace must carry a specific label, and the ResourceClaim must set adminAccess: true:
apiVersion: v1
kind: Namespace
metadata:
  name: dra-demo-admin
  labels:
    resource.kubernetes.io/admin-access: "true"
---
apiVersion: resource.k8s.io/v1
kind: ResourceClaim
metadata:
  name: admin-gpu-claim
  namespace: dra-demo-admin
spec:
  devices:
    requests:
    - name: gpu
      exactly:
        adminAccess: true
        deviceClassName: mig.nvidia.com
        selectors:
        - cel:
            expression: "device.attributes['gpu.nvidia.com'].profile == '3g.20gb'"
In this demo, the 3g.20gb slices on worker-2 are already allocated to workload pods from the prioritized alternatives demo above. The admin monitoring pod is deployed in a separate namespace with admin access enabled:
apiVersion: v1
kind: Pod
metadata:
  name: admin-monitor
  namespace: dra-demo-admin
spec:
  restartPolicy: Never
  containers:
  - name: monitor
    image: nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0
    command: ["sh", "-c", "nvidia-smi -L && nvidia-smi && sleep 3600"]
    resources:
      claims:
      - name: gpu
  resourceClaims:
  - name: gpu
    resourceClaimName: admin-gpu-claim
The admin pod gets access to the in-use device and can run nvidia-smi to inspect it:
GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-ce5b3123-f4f2-6250-8428-40617e5f9b9d)
MIG 3g.20gb Device 0: (UUID: MIG-90155500-9a09-5016-9746-95ef09bd78a6)
+-------------------------------------------------------------------------+
|NVIDIA-SMI 580.105.08 Driver Ver:580.105.08 CUDA Ver: 13.0 |
+-------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|=========================+========================+======================|
| 0 NVIDIA A100-SXM4-40GB On |00000000:00:04.0 Off| On |
| N/A 36C P0 93W / 400W | N/A | N/A Default |
| | | Enabled |
+---------------------------+------------------------+----------------------+
| MIG devices: |
+------------------+--------------------+-----------+-----------------------+
| GPU GI CI MIG |Shared Memory-Usage | Vol| Shared |
| ID ID Dev | Shared BAR1-Usage | SM Unc| CE ENC DEC OFA JPG |
|==================+================================+===========+===========|
| 0 1 0 0 |107MiB / 20096MiB | 42 0 | 3 0 2 0 0 |
| | 0MiB / 12210MiB | | |
+------------------+--------------------------------+-----------+-----------+
The allocation confirms admin access was granted:
{
  "adminAccess": true,
  "device": "gpu-0-mig-3g20gb-9-0",
  "driver": "gpu.nvidia.com",
  "pool": "worker-2",
  "request": "gpu"
}
Meanwhile, the original workload pod continues running undisturbed on the same device. This gives SREs and platform teams the ability to monitor GPU health and debug allocation issues in production without evicting running workloads.
What's next
DRA continues to evolve upstream. Features currently in alpha or beta in Kubernetes 1.34 that may appear in future OpenShift releases include:
- Partitionable devices allow drivers to advertise overlapping logical device partitions and reconfigure physical hardware dynamically based on actual allocations.
- Device taints and tolerations mark devices as degraded or unusable, similar to node taints, with workloads explicitly tolerating tainted devices.
- Device binding conditions add support for network-attached and fabric-attached accelerators that must be attached to a node before the pod is scheduled.
To learn more, check out these resources: