Enable GPU acceleration with the Kernel Module Management Operator

In this article we cover the required steps to configure Kernel Module Management Operator (KMM) and use it to deploy an out-of-tree (OOT) kernel module, as well as leveraging other related technologies to build a toolset for hardware enablement. To illustrate that process, we'll leverage the Intel Data Center GPU Flex 140.

What is the Kernel Module Management Operator?

The Kernel Module Management Operator manages, builds, signs, and deploys out-of-tree (OOT) kernel modules and device plug-ins on Red Hat OpenShift Container Platform clusters.

Before KMM, cluster admins had to manually install drivers to multiple nodes. Upgrades were painful and prone to errors from incompatible drivers. Furthermore, workloads might get scheduled to a node with broken drivers causing scheduling issues or missing hardware. KMM solves all of these problems, as we'll see.

KMM is designed to accommodate multiple kernel versions at once for any kernel module, allowing for seamless node upgrades and reduced application downtime. For more information, refer to the Kernel Module Management Operator product documentation.

KMM is also a community project, which you can test on upstream Kubernetes, and there is a Slack community channel.

Prerequisites

For this scenario, we'll require an already working OpenShift environment as we will use it to deploy the different tools on top. Check the documentation for instructions.

KMM will require a registry to push images to. If you've installed on bare metal, ensure the internal registry is enabled and configured (refer to Installing a user-provisioned cluster on bare metal).

Additionally, this tutorial references data available from Intel at the following locations:

Set up Node Feature Discovery Operator

Node Feature Discovery (NFD) detects hardware features available on nodes and advertises those features using nodes labels, so that they can later be used as selector for scheduling decisions.

The NFD Operator automatically adds labels to the nodes that present some characteristics, including if the node has a GPU and which GPU it has.

It's an ideal way to identify which nodes require a kernel module to be enabled for the specific node(s) and later use it to instruct KMM to build it only for those.

We can install it via the following YAML:

---
apiVersion: v1
kind: Namespace
metadata:
  name: openshift-nfd
---
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: openshift-nfd
  namespace: openshift-nfd
spec:
  targetNamespaces:
    - openshift-nfd
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: nfd
  namespace: openshift-nfd
spec:
  channel: "stable"
  installPlanApproval: Automatic
  name: nfd
  source: redhat-operators
  sourceNamespace: openshift-marketplace

Enable GPU acceleration with the Kernel Module Management Operator

Share:

What is the Kernel Module Management Operator?

Prerequisites

Set up Node Feature Discovery Operator

Advanced node labeling

Set up Kernel Module Management Operator

Create a kernel module

Integration with NFD via CRDs

KMM dependencies

Replace an in-tree module with an out-of-tree module

Configure the Driver Toolkit for image building

Manage heterogeneous nodes in the cluster

Enable the Intel Data Center GPU Flex 140

Intel Data Center GPU Flex 140 kernel module

Using prebuilt drivers

Compiling your own driver

Verify the deployment

Simple approach

Using OpenVINO application for image text to image with stable diffusion

Requirements

ImageStream

Containerfile

StorageClass

Persistent Volume Claim

PVC for cache

Running the application

Wrap up

But I do still want to upgrade my cluster!

Products

Build

Quicklinks

Communicate

RED HAT DEVELOPER

Red Hat legal and privacy links

Red Hat legal and privacy links

Report a website issue