
Taint and label GPU node to trigger AMD hardware partitioning

June 12, 2026

Leonardo Ochoa Aday

With your control plane protected and the device config manager (DCM) deployed, you are ready to begin partitioning. Slicing an AMD Instinct GPU requires resetting the underlying device. Because of this, you must clear the target node of all active workloads so that no applications are left trying to use the hardware when the configuration change occurs.

Tainting the GPU node with amd-dcm=up:NoExecute immediately evicts every Pod that does not tolerate the taint and prevents new Pods without that toleration from being scheduled on the node. This ensures that nothing is using the GPUs when partitioning starts. Only Pods with a matching toleration, including DaemonSet Pods that declare one, keep running.

You then label the node with the desired partition profile, which signals DCM to apply the GPU partitioning configuration.

Prerequisites:

In this lesson, you will:

Apply a node taint to safely evict active user workloads from your GPU host.
Label the node with a partition profile to trigger DCM partitioning.

Taint and label the GPU node

To protect your workloads and begin partitioning, start by tainting and then labeling the GPU node.

Taint the GPU node to evict non-essential workloads. Run the following command:
```
oc taint nodes "$NODE_NAME" amd-dcm=up:NoExecute
```
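Before moving on, it can help to confirm that the taint is actually present. The following is a minimal sketch, assuming `oc` is logged in to the cluster; `node_has_dcm_taint` is a hypothetical helper name, not part of `oc` or DCM.

```shell
#!/usr/bin/env bash
# Sketch: confirm that the amd-dcm=up:NoExecute taint is present on a node.
# node_has_dcm_taint is a hypothetical helper, not part of oc or DCM.

node_has_dcm_taint() {
  local node="$1"
  # Print every taint on the node as key=value:effect, one per line,
  # then look for an exact match of the DCM taint.
  oc get node "$node" \
    -o jsonpath='{range .spec.taints[*]}{.key}={.value}:{.effect}{"\n"}{end}' |
    grep -qx 'amd-dcm=up:NoExecute'
}

# Usage:
#   node_has_dcm_taint "$NODE_NAME" && echo "taint applied"
```

The function returns a zero exit status only when the exact key, value, and effect are found, so it can gate later steps in a script.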
After tainting the node, inspect the status of all Pods in the cluster. In some environments, a Prometheus exporter Pod may enter an Error state (rather than Pending) and continue holding GPU resources, which blocks the DCM partitioning process. If this occurs, scale down the Prometheus instance and force-delete the stuck Pod:
```
oc patch prometheus amd-gpu-prometheus -n devmetrics --type='merge' -p '{"spec":{"replicas":0}}'
oc delete pod -n devmetrics prometheus-amd-gpu-prometheus-0 --force --grace-period=0
```
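One way to carry out the inspection described above is to list only the Pods still scheduled on the tainted node and flag any that are neither Running nor Completed. This is a sketch under the assumption that the default `oc get pods` column layout is in use; `pods_not_running_on_node` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: list Pods on a node whose STATUS is neither Running nor Completed.
# pods_not_running_on_node is a hypothetical helper, not part of oc or DCM.

pods_not_running_on_node() {
  local node="$1"
  # With --all-namespaces and --no-headers the columns are:
  # NAMESPACE NAME READY STATUS RESTARTS AGE
  oc get pods --all-namespaces --no-headers \
    --field-selector "spec.nodeName=${node}" |
    awk '$4 != "Running" && $4 != "Completed" { print $1 "/" $2 " " $4 }'
}

# Usage:
#   pods_not_running_on_node "$NODE_NAME"
```

Empty output suggests that no Pod on the node is stuck; a line such as a Pod in the Error state points at a candidate for the scale-down and force-delete shown above.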

With GPU-consuming workloads evicted, label the node to trigger DCM partitioning.

```
oc label node "$NODE_NAME" dcm.amd.com/gpu-config-profile=cpx-profile-nps4 --overwrite
```

Note

The --overwrite flag replaces any existing dcm.amd.com/gpu-config-profile label on the node. Without it, oc label fails if the label is already set to a different value.
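To see what an overwrite would replace, you can read the profile label currently set on the node. This is a minimal sketch; `current_gpu_profile` is a hypothetical helper name, and it prints nothing when the label is absent.

```shell
#!/usr/bin/env bash
# Sketch: print the partition profile currently requested on a node, if any.
# current_gpu_profile is a hypothetical helper, not part of oc or DCM.

current_gpu_profile() {
  local node="$1"
  # The dots inside the label key are escaped so that jsonpath treats
  # "dcm.amd.com/gpu-config-profile" as a single field name.
  oc get node "$node" \
    -o jsonpath='{.metadata.labels.dcm\.amd\.com/gpu-config-profile}'
}

# Usage:
#   echo "current profile: $(current_gpu_profile "$NODE_NAME")"
```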

Wait for DCM to process the profile. Follow the DCM logs to track its progress:

```
oc logs -n openshift-amd-gpu -l app.kubernetes.io/name=device-config-manager -f
```

A successful output will look like this:

```
NEW TRIGGER ALERT FROM NODE LABELS
2026/05/27 09:10:48 Label changed: dcm.amd.com/gpu-config-profile
Old value: 
New value: cpx-profile-nps4
2026/05/27 09:10:48 #####################################
2026/05/27 09:10:48 Partition profile info:
2026/05/27 09:10:48 Selected profile name: cpx-profile-nps4
2026/05/27 09:10:48 #####################################

. . . TRUNCATED . . .

2026/05/27 09:12:10 Successfully Partitioned GPUs of profile 1
2026/05/27 09:12:10 Partition completed successfully
2026/05/27 09:12:10 Label "dcm.amd.com/gpu-config-profile-state" added successfully to node "smc6216gpu.partner-accelerators.redhat.lab"
2026/05/27 09:12:10 AMD SMI shutdown successfully
2026/05/27 09:12:10 #####################################
2026/05/27 09:12:10 PartitionGPU executed successfully
2026/05/27 09:12:10 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2026/05/27 09:12:10 ServicesList [amd-metrics-exporter gpuagent]
2026/05/27 09:12:10 Restarting service skipped for: amd-metrics-exporter.service (was not-loaded at 2026-05-27 09:10:58.654388053 +0000 UTC m=+45.487989488)
2026/05/27 09:12:10 Restarting service skipped for: gpuagent.service (was not-loaded at 2026-05-27 09:11:08.663196594 +0000 UTC m=+55.496798039)
2026/05/27 09:12:10 Cleaning up PreStateDB...
2026/05/27 09:12:10 PreStateDB has been successfully emptied.
2026/05/27 09:12:10 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
```
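If you prefer a scripted check to watching the log stream, you can search the DCM logs for the completion message shown above. This is a sketch, assuming the message text matches the sample output; `dcm_partition_succeeded` is a hypothetical helper name. When several DCM Pods match the selector, their logs are searched together, so the result is not specific to one node.

```shell
#!/usr/bin/env bash
# Sketch: report whether the DCM logs contain the completion message.
# dcm_partition_succeeded is a hypothetical helper, not part of oc or DCM.

dcm_partition_succeeded() {
  # --tail=-1 requests the full log. When a label selector is used,
  # oc logs otherwise returns only the last 10 lines of each Pod.
  oc logs -n openshift-amd-gpu \
    -l app.kubernetes.io/name=device-config-manager --tail=-1 |
    grep -q 'Partition completed successfully'
}

# Usage:
#   dcm_partition_succeeded && echo "DCM reported a completed partition"
```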

Once DCM has completed the partitioning procedure, it labels the node with the result of that operation:

```
oc get nodes -o json | jq '.items[].metadata.labels | with_entries(select(.key | startswith("dcm.amd.com")))'
```

Your output will look like this:

```
  "dcm.amd.com/gpu-config-profile": "cpx-profile-nps4",
  "dcm.amd.com/gpu-config-profile-state": "success"
```

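The state label can also be polled from a script until it reports success. The sketch below assumes the `success` value shown above; the function name `wait_for_dcm_success` and the default timeout and interval are hypothetical choices, not DCM settings. A `success` value left over from an earlier partitioning run would also satisfy this check, so treat it as a convenience rather than proof.

```shell
#!/usr/bin/env bash
# Sketch: poll the DCM state label on a node until it reads "success".
# wait_for_dcm_success and its defaults (600 s timeout, 10 s interval) are
# hypothetical; use a polling interval of at least 1 second.

wait_for_dcm_success() {
  local node="$1" timeout="${2:-600}" interval="${3:-10}"
  local waited=0 state=""
  while [ "$waited" -lt "$timeout" ]; do
    # Escaped dots keep the label key as a single jsonpath field.
    state="$(oc get node "$node" \
      -o jsonpath='{.metadata.labels.dcm\.amd\.com/gpu-config-profile-state}')"
    if [ "$state" = "success" ]; then
      echo "partitioning reported success on ${node}"
      return 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  echo "timed out after ${timeout}s; last state: '${state}'" >&2
  return 1
}

# Usage:
#   wait_for_dcm_success "$NODE_NAME" 900 15
```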
Success! You’ve tainted and labeled the GPU node, triggering DCM to partition your GPU accelerators to the selected profile.

Now it’s time to verify your new GPU partitioning.

Partition AMD Instinct GPU accelerators via device config manager (DCM) in Red Hat OpenShift
