Verify control plane toleration
AMD GPUs cannot be partitioned while workloads are using them. In a later lesson, you will apply an amd-dcm=up:NoExecute taint to the GPU node, which evicts all pods that lack a matching toleration. On Kubernetes, this requires manually patching kube-system DaemonSets and Deployments with the DCM toleration before tainting.
On Red Hat OpenShift, this step is not required. Critical control-plane DaemonSets (cluster networking, core services, machine-config, etc.) carry a wildcard toleration (operator: Exists) by default, so they survive the NoExecute taint without any manual patching.
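A keyless `operator: Exists` toleration survives the taint because of the rule Kubernetes applies when it compares one toleration against one taint. The function below is a simplified sketch of that rule. `toleration_tolerates_taint` is a hypothetical helper, not an `oc` or DCM command. Under the rule, an empty effect matches any effect, and an empty key matches any key (valid only with `Exists`). `Exists` ignores the value, while `Equal` (the default operator) requires the values to match. The sketch does not model `tolerationSeconds`, which on a `NoExecute` taint only delays eviction.

```shell
# Hypothetical helper (not part of oc or DCM): a simplified sketch of how
# Kubernetes decides whether ONE toleration tolerates ONE taint.
# Arguments: TOL_KEY TOL_OPERATOR TOL_VALUE TOL_EFFECT TAINT_KEY TAINT_VALUE TAINT_EFFECT
# Pass "" for any toleration field that is omitted.
# Returns 0 if the taint is tolerated, 1 otherwise. tolerationSeconds is not modeled.
toleration_tolerates_taint() {
  local tol_key="$1" tol_op="$2" tol_value="$3" tol_effect="$4"
  local taint_key="$5" taint_value="$6" taint_effect="$7"

  # An empty toleration effect matches every taint effect.
  if [ -n "$tol_effect" ] && [ "$tol_effect" != "$taint_effect" ]; then
    return 1
  fi
  # An empty toleration key matches every taint key (valid only with Exists).
  if [ -n "$tol_key" ] && [ "$tol_key" != "$taint_key" ]; then
    return 1
  fi
  case "$tol_op" in
    Exists) return 0 ;;                                     # value is ignored
    ""|Equal) [ "$tol_value" = "$taint_value" ]; return ;;  # "" means Equal
    *) return 1 ;;                                          # unknown operator
  esac
}

# Wildcard toleration, as carried by control-plane DaemonSets such as ovnkube-node
toleration_tolerates_taint "" Exists "" "" amd-dcm up NoExecute && echo "wildcard toleration: pod stays"
# Explicit toleration, as carried by the DCM pod
toleration_tolerates_taint amd-dcm Equal up NoExecute amd-dcm up NoExecute && echo "explicit amd-dcm toleration: pod stays"
# A NoSchedule-only toleration for the same key does not cover a NoExecute taint
toleration_tolerates_taint amd-dcm Exists "" NoSchedule amd-dcm up NoExecute || echo "NoSchedule-only toleration: pod evicted"
```

A pod is evicted only when none of its tolerations passes this check. A pod with no tolerations at all has nothing to match, so it is always evicted.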
The following table summarizes what stays running and what gets evicted when the taint is applied:
| Component | Status | Reason |
|---|---|---|
| etcd, apiserver, controller-manager, scheduler | Running | Wildcard toleration (`operator: Exists`) |
| OVN, Multus, DNS, MCD, node-exporter | Running | Wildcard toleration (`operator: Exists`) |
| DCM | Running | Explicit `amd-dcm=up:NoExecute` toleration |
| device-plugin, node-labeller, metrics-exporter | Evicted | No matching toleration (expected) |
The GPU operands (device-plugin, node-labeller, metrics-exporter) are intentionally evicted; they must release the GPU devices before DCM can repartition the hardware.
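You can also check this table against your own GPU node before you apply the taint. The following is a minimal sketch that assumes `oc` access and the `jq` prerequisite. `dcm_eviction_preview` is a hypothetical helper name, and the filter hard-codes the `amd-dcm=up:NoExecute` taint. The helper reads a pod list as JSON and prints `namespace/name` for every pod that has no toleration matching the taint, using the Kubernetes matching rules. Pods that set `tolerationSeconds` are reported as staying, even though Kubernetes evicts them once that period expires.

```shell
# Hypothetical helper: reads a PodList JSON document on stdin and prints
# namespace/name for each pod with NO toleration matching amd-dcm=up:NoExecute,
# i.e. the pods the taint would evict. tolerationSeconds is ignored.
# Example (replace <gpu-node-name> with your GPU node):
#   oc get pods -A --field-selector spec.nodeName=<gpu-node-name> -o json | dcm_eviction_preview
dcm_eviction_preview() {
  jq -r '
    .items[]
    | select(
        [ .spec.tolerations[]?
          # Empty effect matches any effect; empty key matches any key.
          | select(((.effect // "") == "" or .effect == "NoExecute")
                   and ((.key // "") == "" or .key == "amd-dcm")
                   # Exists ignores the value; Equal (the default) compares it.
                   and (if (.operator // "Equal") == "Exists" then true
                        elif (.operator // "Equal") == "Equal" then (.value // "") == "up"
                        else false end))
        ] | length == 0)
    | "\(.metadata.namespace)/\(.metadata.name)"'
}
```

Based on the table above, the output should include the GPU operand pods and should not include control-plane or DCM pods. Any application pods on the node that lack the toleration will also be listed.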
Prerequisites
- Full cluster administrator privileges on your Red Hat OpenShift environment.
- The `jq` command-line utility installed locally for processing JSON output.
In this lesson, you will:
- Confirm that Red Hat OpenShift control-plane pods already tolerate the DCM taint.
- Understand which pods will be evicted and why that is expected.
Verify control plane toleration
First, inspect the tolerations on your control-plane workloads. This confirms which components are already protected and shows the impact of the taint you will apply later.
Pick any critical control-plane DaemonSet and inspect its tolerations. For example, check the OVN-Kubernetes DaemonSet:

```shell
oc get daemonset ovnkube-node -n openshift-ovn-kubernetes -o jsonpath='{.spec.template.spec.tolerations}' | jq .
```

You will see a wildcard toleration entry like this:
```json
[
  {
    "operator": "Exists"
  }
]
```
The `operator: Exists` toleration (with no key specified) matches every taint, including `amd-dcm=up:NoExecute`. This is why no manual patching is needed.

Optional: For additional confidence, verify that the DCM pod itself carries the explicit toleration it needs. Run this only after DCM is deployed, because the pod does not exist until then.

```shell
oc get pod -n openshift-amd-gpu -l app.kubernetes.io/name=device-config-manager -o jsonpath='{.items[0].spec.tolerations}' | jq .
```

You should see:
```json
[
  {
    "key": "amd-dcm",
    "operator": "Equal",
    "value": "up",
    "effect": "NoExecute"
  },
  ...
]
```

This explicit `amd-dcm=up:NoExecute` toleration is what keeps DCM running after the taint is applied.
Note
On vanilla Kubernetes clusters, you must manually add the amd-dcm toleration to kube-system Deployments and DaemonSets before tainting. See the upstream DCM documentation for the Kubernetes-specific steps.
Your Red Hat OpenShift control plane is already protected, so no toleration patching is required before you proceed with GPU partitioning. You can now configure and deploy the Device Config Manager (DCM).