Partition AMD Instinct GPU accelerators via device config manager (DCM) in Red Hat OpenShift

Learning path | 7 resources | 1 hr and 5 mins | Published on June 12, 2026

Make your AI infrastructure more efficient by partitioning AMD Instinct GPUs via the device config manager in Red Hat OpenShift and validate your setup with a vLLM workload.

Verify control plane toleration

June 9, 2026

Leonardo Ochoa Aday

AMD GPUs cannot be partitioned while workloads are using them. In a later lesson, you will apply an amd-dcm=up:NoExecute taint to the GPU node, which evicts all pods that lack a matching toleration. On vanilla Kubernetes, this requires manually patching kube-system DaemonSets and Deployments with the DCM toleration before tainting.

On Red Hat OpenShift, this step is not required. Critical control-plane DaemonSets (cluster networking, core services, machine-config, etc.) carry a wildcard toleration (operator: Exists) by default, so they survive the NoExecute taint without any manual patching.

The following table summarizes what stays running and what gets evicted when the taint is applied:

| Component | Status | Reason |
| --- | --- | --- |
| etcd, apiserver, controller-manager, scheduler | Running | Wildcard toleration (`operator: Exists`) |
| OVN, Multus, DNS, MCD, node-exporter | Running | Wildcard toleration (`operator: Exists`) |
| DCM | Running | Explicit `amd-dcm=up` toleration |
| device-plugin, node-labeller, metrics-exporter | Evicted | No matching toleration (expected) |

The GPU operands (device-plugin, node-labeller, metrics-exporter) are intentionally evicted; they must release the GPU devices before DCM can repartition the hardware.
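
To preview the effect of the taint before applying it, the matching rule can be sketched as a small shell helper. This is a minimal sketch, not part of the DCM tooling: `preview_dcm_eviction` is a hypothetical function name, the node name is supplied by you, and the sketch ignores `tolerationSeconds`, so treat the output as an estimate.

```shell
#!/usr/bin/env bash
# Sketch: predict which pods on a node would survive an amd-dcm=up:NoExecute taint.
# preview_dcm_eviction is a hypothetical helper name, not an oc subcommand.

preview_dcm_eviction() {
  local node="$1"
  oc get pods --all-namespaces --field-selector "spec.nodeName=${node}" -o json |
    jq -r '
      # True when a single toleration matches the taint amd-dcm=up:NoExecute.
      def tolerates_dcm:
        ((.effect // "") == "" or .effect == "NoExecute")
        and ((.key // "") == "" or .key == "amd-dcm")
        and (.operator == "Exists"
             or ((.operator // "Equal") == "Equal" and (.value // "") == "up"));

      # Print "namespace/name<TAB>status" for every pod on the node.
      .items[]
      | (if any(.spec.tolerations[]?; tolerates_dcm)
         then "Running" else "Evicted" end) as $status
      | "\(.metadata.namespace)/\(.metadata.name)\t\($status)"
    '
}
```

Call it as `preview_dcm_eviction <gpu-node-name>`, replacing the placeholder with your GPU node. Pods reported as Evicted should correspond to the GPU operands listed in the table.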

Prerequisites

Full cluster administrator privileges on your Red Hat OpenShift environment.
The jq command-line utility installed locally for processing JSON outputs.

In this lesson, you will:

Confirm that Red Hat OpenShift control-plane pods already tolerate the DCM taint.
Understand which pods will be evicted and why that is expected.

Verify control plane toleration

First, inspect the tolerations that your control-plane components carry. This confirms which components are already protected and helps you understand the impact of the taint you'll apply later.

Pick any critical control-plane DaemonSet and inspect its tolerations. For example, check the OVN-Kubernetes DaemonSet:
```
oc get daemonset ovnkube-node -n openshift-ovn-kubernetes -o jsonpath='{.spec.template.spec.tolerations}' | jq .
```
You will see a wildcard toleration entry like this:
```
[
 {
   "operator": "Exists"
 }
]
```
The operator: Exists toleration (with no key specified) matches every taint, including amd-dcm=up:NoExecute. This is why no manual patching is needed.
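
To repeat this check across several DaemonSets, the command above can be wrapped in a loop. The following is a minimal sketch under stated assumptions: `has_wildcard_toleration` and `report_wildcard_tolerations` are hypothetical helper names, and you pass the `namespace/name` pairs to inspect yourself.

```shell
#!/usr/bin/env bash
# Sketch: check DaemonSets for the key-less "operator: Exists" toleration.
# Both function names are hypothetical helpers, not oc subcommands.

# Succeeds when the DaemonSet has a toleration with operator Exists
# and no key or effect, which matches every taint.
has_wildcard_toleration() {
  local namespace="$1" name="$2"
  oc get daemonset "$name" -n "$namespace" \
    -o jsonpath='{.spec.template.spec.tolerations}' |
    jq -e 'any(.[]?; .operator == "Exists"
                     and (.key // "") == ""
                     and (.effect // "") == "")' >/dev/null
}

# Report each "namespace/name" argument. A failed lookup is also
# reported as "no wildcard toleration found".
report_wildcard_tolerations() {
  local target
  for target in "$@"; do
    if has_wildcard_toleration "${target%%/*}" "${target#*/}"; then
      echo "${target}: wildcard toleration present"
    else
      echo "${target}: no wildcard toleration found"
    fi
  done
}
```

For example, `report_wildcard_tolerations openshift-ovn-kubernetes/ovnkube-node` checks the DaemonSet inspected above. Add further `namespace/name` arguments for other DaemonSets in your cluster.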
Optional: For additional confidence, verify that the DCM pod itself carries the explicit toleration it needs. Run this command only after the DCM is deployed; the pod does not exist until then.
```
oc get pod -n openshift-amd-gpu -l app.kubernetes.io/name=device-config-manager -o jsonpath='{.items[0].spec.tolerations}' | jq .
```
You should see:
```
[
 {
   "key": "amd-dcm",
   "operator": "Equal",
   "value": "up",
   "effect": "NoExecute"
 },
... TRUNCATED ...
]
```
This explicit toleration (key amd-dcm, operator Equal, value up, effect NoExecute) matches the amd-dcm=up:NoExecute taint exactly, so the DCM pod keeps running while the GPU operands are evicted.
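
If you script the later tainting step, the DCM pod check can act as a gate. The sketch below assumes the namespace and label selector from the command above; `check_dcm_toleration` and its return codes are hypothetical choices for illustration.

```shell
#!/usr/bin/env bash
# Sketch: confirm the DCM pod carries the explicit amd-dcm=up:NoExecute toleration.
# check_dcm_toleration is a hypothetical helper name.
# Returns 0 when the toleration is present, 1 when it is missing,
# and 2 when the pod cannot be found.

check_dcm_toleration() {
  local pods
  pods="$(oc get pod -n openshift-amd-gpu \
    -l app.kubernetes.io/name=device-config-manager -o json)" || return 2

  # The DCM pod does not exist until the DCM is deployed.
  if [ "$(printf '%s' "$pods" | jq '.items | length')" -eq 0 ]; then
    echo "DCM pod not found; deploy the DCM first" >&2
    return 2
  fi

  printf '%s' "$pods" | jq -e '
    .items[0].spec.tolerations
    | any(.[]?; .key == "amd-dcm" and .operator == "Equal"
                and .value == "up" and .effect == "NoExecute")
  ' >/dev/null || {
    echo "DCM pod lacks the amd-dcm=up:NoExecute toleration" >&2
    return 1
  }

  echo "DCM pod tolerates amd-dcm=up:NoExecute"
}
```

A script could then run `check_dcm_toleration || exit 1` before it applies the taint.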

Note

On vanilla Kubernetes clusters, you must manually add the amd-dcm toleration to kube-system Deployments and DaemonSets before tainting. See the upstream DCM documentation for the Kubernetes-specific steps.

Your Red Hat OpenShift control plane is already protected. No toleration patching is required before proceeding with GPU partitioning. You can now configure and deploy the device config manager.
