Taint and label GPU node to trigger AMD hardware partitioning
With your control plane protected and the device config manager (DCM) deployed, you are ready to begin partitioning. Slicing an AMD Instinct GPU requires resetting the underlying device. Because of this, you must clear the target node of all active workloads so that no applications are left trying to use the hardware when the configuration change occurs.
Tainting the GPU node with amd-dcm=up:NoExecute immediately evicts every pod that does not tolerate the taint and prevents new pods without that toleration from being scheduled on the node. This ensures that no workloads are using the GPUs before partitioning. Only pods, including DaemonSet pods, that carry the matching toleration remain running.
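Once the taint has been applied (the command appears later in this lesson), a quick check like the one below can show which taints the node carries and which pods are still bound to it. This is a minimal sketch rather than part of the official procedure: the function names node_taints and pods_on_node are hypothetical helpers, and NODE_NAME is assumed to be set as elsewhere in this lesson.

```shell
# Sketch: inspect a tainted node. Helper names are illustrative only.

# Print each taint on the node as key=value:effect, one per line.
node_taints() {
  local node="$1"
  oc get node "$node" \
    -o jsonpath='{range .spec.taints[*]}{.key}={.value}:{.effect}{"\n"}{end}'
}

# List pods in every namespace that are still bound to the node.
# The field selector filters on the server side by spec.nodeName.
pods_on_node() {
  local node="$1"
  oc get pods --all-namespaces \
    --field-selector "spec.nodeName=${node}" \
    -o wide
}
```

For example, run node_taints "$NODE_NAME" followed by pods_on_node "$NODE_NAME". Any pod still listed either tolerates the taint or has not finished terminating.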
You will then label the node with the desired partition profile, which signals DCM to apply the GPU partitioning configuration.
Prerequisites:
In this lesson, you will:
- Apply a node taint to safely evict active user workloads from your GPU host.
- Label the node with a partition profile to trigger DCM partitioning.
Taint and label the GPU node
To protect your workloads and begin partitioning, start by tainting and then labeling the GPU node.
Taint the GPU node to evict non-essential workloads. Run the following command:
oc taint nodes "$NODE_NAME" amd-dcm=up:NoExecute
After tainting the node, inspect the status of all Pods in the cluster. In some environments, a Prometheus exporter Pod may enter an Error state (rather than Pending) and continue holding GPU resources, which blocks the DCM partitioning process. If this occurs, scale down the Prometheus instance and force-delete the stuck Pod:
oc patch prometheus amd-gpu-prometheus -n devmetrics --type='merge' -p '{"spec":{"replicas":0}}'
oc delete pod -n devmetrics prometheus-amd-gpu-prometheus-0 --force --grace-period=0
With GPU-consuming workloads evicted, label the node to trigger DCM partitioning.
oc label node "$NODE_NAME" dcm.amd.com/gpu-config-profile=cpx-profile-nps4 --overwrite
Note
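The --overwrite flag replaces any existing dcm.amd.com/gpu-config-profile label on the node. Without it, the command fails if the label is already set to a different value.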
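If you script this step, it can help to read the current profile label before overwriting it. The sketch below is an illustration, not part of the documented procedure: apply_profile is a hypothetical helper that only issues the label command when the requested profile differs from the value already on the node, since relabeling with an identical value leaves the node object unchanged.

```shell
# Sketch: set the DCM profile label only when the value would change.
# apply_profile is an illustrative helper name, not an oc or DCM command.
apply_profile() {
  local node="$1" profile="$2" current
  # Dots inside the label key are escaped so jsonpath treats it as one key.
  current="$(oc get node "$node" \
    -o jsonpath='{.metadata.labels.dcm\.amd\.com/gpu-config-profile}')"
  if [ "$current" = "$profile" ]; then
    echo "profile already set to ${profile}; nothing to do"
    return 0
  fi
  oc label node "$node" "dcm.amd.com/gpu-config-profile=${profile}" --overwrite
}
```

A call such as apply_profile "$NODE_NAME" cpx-profile-nps4 would then be equivalent to the label command above whenever the node does not already carry that profile.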
Wait for DCM to process the profile.
oc logs -n openshift-amd-gpu -l app.kubernetes.io/name=device-config-manager -f
A successful output will look like this:
NEW TRIGGER ALERT FROM NODE LABELS
2026/05/27 09:10:48 Label changed: dcm.amd.com/gpu-config-profile Old value: New value: cpx-profile-nps4
2026/05/27 09:10:48 #####################################
2026/05/27 09:10:48 Partition profile info:
2026/05/27 09:10:48 Selected profile name: cpx-profile-nps4
2026/05/27 09:10:48 #####################################
. . . TRUNCATED . . .
2026/05/27 09:12:10 Successfully Partitioned GPUs of profile 1
2026/05/27 09:12:10 Partition completed successfully
2026/05/27 09:12:10 Label "dcm.amd.com/gpu-config-profile-state" added successfully to node "smc6216gpu.partner-accelerators.redhat.lab"
2026/05/27 09:12:10 AMD SMI shutdown successfully
2026/05/27 09:12:10 #####################################
2026/05/27 09:12:10 PartitionGPU executed successfully
2026/05/27 09:12:10 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2026/05/27 09:12:10 ServicesList [amd-metrics-exporter gpuagent]
2026/05/27 09:12:10 Restarting service skipped for: amd-metrics-exporter.service (was not-loaded at 2026-05-27 09:10:58.654388053 +0000 UTC m=+45.487989488)
2026/05/27 09:12:10 Restarting service skipped for: gpuagent.service (was not-loaded at 2026-05-27 09:11:08.663196594 +0000 UTC m=+55.496798039)
2026/05/27 09:12:10 Cleaning up PreStateDB...
2026/05/27 09:12:10 PreStateDB has been successfully emptied.
2026/05/27 09:12:10 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Once DCM has completed the partitioning procedure, it labels the node with the result of that operation:
oc get nodes -o json | jq '.items[].metadata.labels | with_entries(select(.key | startswith("dcm.amd.com")))'
Your output will look like this:
"dcm.amd.com/gpu-config-profile": "cpx-profile-nps4",
"dcm.amd.com/gpu-config-profile-state": "success"
Success! You’ve tainted and labeled the GPU node, triggering DCM to partition your GPU accelerators to the selected profile.
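In automation, you may prefer to wait on the state label instead of watching the logs. The following is a minimal sketch under stated assumptions: wait_for_partition_state is a hypothetical helper, the timeout and interval defaults are arbitrary, and the only state value it recognizes is the "success" value shown above. Any other value simply keeps the loop polling until the timeout.

```shell
# Sketch: poll the DCM state label until it reads "success" or time runs out.
# Arguments: node name, timeout in seconds (default 600), poll interval (default 10).
wait_for_partition_state() {
  local node="$1" timeout_s="${2:-600}" interval_s="${3:-10}"
  local elapsed=0 state=""
  while [ "$elapsed" -lt "$timeout_s" ]; do
    # Dots inside the label key are escaped so jsonpath treats it as one key.
    state="$(oc get node "$node" \
      -o jsonpath='{.metadata.labels.dcm\.amd\.com/gpu-config-profile-state}')"
    if [ "$state" = "success" ]; then
      echo "partition state: success"
      return 0
    fi
    sleep "$interval_s"
    elapsed=$((elapsed + interval_s))
  done
  echo "timed out after ${timeout_s}s; last state: ${state:-unset}" >&2
  return 1
}
```

For example, wait_for_partition_state "$NODE_NAME" 900 15 polls every 15 seconds for up to 15 minutes and returns a non-zero exit status if the label never reaches "success".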
Now it’s time to verify your new GPU partitioning.