Configure and deploy the device config manager

June 9, 2026

Leonardo Ochoa Aday

Now that your control-plane workloads are protected with tolerations, you can prepare the infrastructure components that control the accelerator devices. The AMD GPU Operator handles these day-2 hardware operations through a component called the device config manager (DCM).

By default, the DCM is disabled. To enable it, define a ConfigMap that maps partition profiles to your hardware, for example a single monolithic GPU (SPX) or multiple smaller slices (CPX), so that the operator knows which compute and memory layout to apply on each node.

Prerequisites

Control-plane tolerations verified.
The AMD GPU Operator installed in the openshift-amd-gpu namespace.

In this lesson, you will:

Create a ConfigMap with GPU partition profiles.
Patch the DeviceConfig custom resource to enable and configure the DCM, which orchestrates GPU partitioning.

Deploy DCM

To ensure your infrastructure components are mapped correctly, create the DCM profiles in a ConfigMap and deploy them.

Check which compute and memory partitions your target GPU node supports by running the command below on the GPU node itself (for example, over SSH or with oc debug node).

cat /sys/module/amdgpu/drivers/pci\:amdgpu/*/{available_compute_partition,available_memory_partition}

On a standard AMD Instinct MI300X system with an 8-GPU layout, the output looks like this, with one compute-partition line per GPU followed by one memory-partition line per GPU:

SPX, DPX, QPX, CPX
SPX, DPX, QPX, CPX
SPX, DPX, QPX, CPX
SPX, DPX, QPX, CPX
SPX, DPX, QPX, CPX
SPX, DPX, QPX, CPX
SPX, DPX, QPX, CPX
SPX, DPX, QPX, CPX
NPS1, NPS2, NPS4
NPS1, NPS2, NPS4
NPS1, NPS2, NPS4
NPS1, NPS2, NPS4
NPS1, NPS2, NPS4
NPS1, NPS2, NPS4
NPS1, NPS2, NPS4
NPS1, NPS2, NPS4

Note

If some of the partitions (for example, NPS2) are not listed on your MI300X series system, update to the latest firmware and BIOS versions from your vendor. For the MI300X system, the minimum VBIOS version that supports partitioning is 022.040.003.042.
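Before choosing a profile, it can help to confirm that every GPU on the node reports the mode you plan to use. The following is a minimal sketch, not part of the operator: the function name all_gpus_support is hypothetical, and it assumes each sysfs file holds a single comma-separated line, as in the sample output above.

```shell
# Hypothetical helper: succeed only if MODE appears in every given file.
# Each file is assumed to hold one comma-separated line, e.g. "SPX, DPX, QPX, CPX".
all_gpus_support() {
  mode="$1"
  shift
  [ "$#" -gt 0 ] || return 1          # no files given: nothing to confirm
  for file in "$@"; do
    # Strip spaces and newlines, wrap the list in commas, then look for ",MODE,"
    list=",$(tr -d ' \n' < "$file"),"
    case "$list" in
      *",$mode,"*) ;;                 # mode is listed for this GPU
      *) echo "$mode not listed in $file" >&2; return 1 ;;
    esac
  done
}

# Example, run on the GPU node:
# all_gpus_support CPX /sys/module/amdgpu/drivers/pci:amdgpu/*/available_compute_partition
```

Wrapping the list in commas keeps a short name from matching inside a longer one, so the check is on whole entries only.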

Create DCM profiles via ConfigMap. This ConfigMap defines three partition profiles: unpartitioned (spx-profile-nps1), dual-partition (dpx-profile-nps2), and maximum-partition (cpx-profile-nps4). Run the command below to create the ConfigMap on your cluster.

cat <<EOF | tee gpu-partition-profiles.yaml | oc apply -f -
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: config-manager-config
  namespace: openshift-amd-gpu
data:
  config.json: |
    {
      "gpu-config-profiles": {
        "spx-profile-nps1": {
          "skippedGPUs": {
            "ids": []
          },
          "profiles": [
            {
              "computePartition": "SPX",
              "memoryPartition": "NPS1",
              "numGPUsAssigned": 8
            }
          ]
        },
        "dpx-profile-nps2": {
          "skippedGPUs": {
            "ids": []
          },
          "profiles": [
            {
              "computePartition": "DPX",
              "memoryPartition": "NPS2",
              "numGPUsAssigned": 8
            }
          ]
        },
        "cpx-profile-nps4": {
          "skippedGPUs": {
            "ids": []
          },
          "profiles": [
            {
              "computePartition": "CPX",
              "memoryPartition": "NPS4",
              "numGPUsAssigned": 8
            }
          ]
        }
      },
      "gpuClientSystemdServices": {
        "names": ["amd-metrics-exporter", "gpuagent"]
      }
    }
EOF
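As a rough sanity check on what each profile should produce once it is applied, you can work out how many logical GPUs a node would expose. The sketch below assumes the MI300X partition counts per physical GPU (SPX 1, DPX 2, QPX 4, CPX 8); other accelerators may differ, and the function name is illustrative only.

```shell
# Hypothetical helper: logical GPUs a node exposes for a compute partition mode.
# Per-GPU partition counts are assumed for MI300X; verify them for other hardware.
expected_logical_gpus() {
  mode="$1"
  num_gpus="$2"        # corresponds to numGPUsAssigned in the profile
  case "$mode" in
    SPX) per_gpu=1 ;;
    DPX) per_gpu=2 ;;
    QPX) per_gpu=4 ;;
    CPX) per_gpu=8 ;;
    *) echo "unknown compute partition: $mode" >&2; return 1 ;;
  esac
  echo $(( per_gpu * num_gpus ))
}

expected_logical_gpus SPX 8   # prints 8
expected_logical_gpus DPX 8   # prints 16
expected_logical_gpus CPX 8   # prints 64
```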

Patch the DeviceConfig custom resource to reference the ConfigMap and enable DCM.

oc patch deviceconfig amdgpu-driver-install -n openshift-amd-gpu --type='merge' -p '{
  "spec": {
    "configManager": {
      "enable": true,
      "image": "docker.io/rocm/device-config-manager:v1.4.1",
      "imagePullPolicy": "IfNotPresent",
      "config": {
        "name": "config-manager-config"
      }
    }
  }
}'
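After patching, you may want to confirm that the DeviceConfig now carries the DCM settings. This is a sketch rather than an official check: the function name and the OC override variable are hypothetical, while the resource name, namespace, and field path come from the patch above.

```shell
# Hypothetical helper: succeed if spec.configManager.enable reads "true".
# OC may be set to another client command (for example kubectl); it defaults to oc.
dcm_enabled() {
  name="$1"
  namespace="$2"
  enabled=$("${OC:-oc}" get deviceconfig "$name" -n "$namespace" \
    -o jsonpath='{.spec.configManager.enable}') || return 1
  [ "$enabled" = "true" ]
}

# Example:
# dcm_enabled amdgpu-driver-install openshift-amd-gpu && echo "DCM enabled"
```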

To see which DCM versions are publicly available, use Skopeo to list the image tags that AMD has published. The command below is followed by sample output.

skopeo list-tags docker://docker.io/rocm/device-config-manager
{
    "Repository": "docker.io/rocm/device-config-manager",
    "Tags": [
        "v1.3.0",
        "v1.3.0-beta.0",
        "v1.3.1",
        "v1.3.1-beta.0",
        "v1.4.0",
        "v1.4.0-beta.0",
        "v1.4.1"
    ]
}
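If you would rather pick the newest stable tag than read the list by eye, a small filter can do it. This is a sketch under two assumptions: pre-release tags contain the string -beta, as in the list above, and GNU sort with the -V (version sort) option is available. The function name is hypothetical.

```shell
# Hypothetical helper: read one tag per line on stdin and print the newest
# non-beta tag. Relies on GNU sort's -V (version sort) option.
latest_stable_tag() {
  grep -v -- '-beta' | sort -V | tail -n 1
}

# Example with the tags listed above:
printf '%s\n' v1.3.0 v1.3.0-beta.0 v1.3.1 v1.3.1-beta.0 v1.4.0 v1.4.0-beta.0 v1.4.1 \
  | latest_stable_tag   # prints v1.4.1
```

Assuming jq is installed, the live tag list can be fed in by piping skopeo list-tags docker://docker.io/rocm/device-config-manager through jq -r '.Tags[]' and then into latest_stable_tag.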

Wait for the DCM Pod to be ready:

oc wait --for=condition=ready pod \
  -l app.kubernetes.io/name=device-config-manager \
  -n openshift-amd-gpu \
  --timeout=300s
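Note that oc wait can return an error right away if no Pod matches the selector yet, which may happen in the first moments after the patch. A small retry loop is one way to handle that; the retry function below is a generic sketch, not something shipped with the operator.

```shell
# Hypothetical helper: run a command up to MAX times, sleeping DELAY seconds
# between failed attempts. Returns 0 on the first success, 1 if every attempt fails.
retry() {
  max="$1"
  delay="$2"
  shift 2
  attempt=1
  while [ "$attempt" -le "$max" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -lt "$max" ]; then
      sleep "$delay"
    fi
    attempt=$(( attempt + 1 ))
  done
  return 1
}

# Example: try the wait up to 5 times, 10 seconds apart.
# retry 5 10 oc wait --for=condition=ready pod \
#   -l app.kubernetes.io/name=device-config-manager \
#   -n openshift-amd-gpu --timeout=300s
```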

You have verified your GPU's available partition modes, created the partition profiles, and enabled DCM.

In the next lesson, you will taint and label the GPU node to trigger the partitioning process.

Partition AMD Instinct GPU accelerators via device config manager (DCM) in Red Hat OpenShift