Boost GPU efficiency in Kubernetes with NVIDIA Multi-Instance GPU

May 27, 2025
Kuan Feng (IBM)
Related topics: Artificial intelligence, Kubernetes, Operators
Related products: Red Hat AI


    Multi-Instance GPU (MIG) is a technology provided by NVIDIA to expand the performance and value of some of its GPU products. It can partition a GPU into multiple instances, with each instance fully isolated and having its own high-bandwidth memory, cache, and compute cores. This gives administrators the ability to support every workload, small or large, ensuring quality of service (QoS) while extending accelerated computing resources to all users.

    Kubernetes is an open source system for automating the deployment, scaling, and management of containerized applications. Kubernetes provides access to specialized hardware such as GPUs through its device plug-in framework. The NVIDIA GPU Operator uses the Kubernetes operator framework to automate the management of NVIDIA software components for GPU provisioning, including MIG. 

    This article explains the foundations of GPU MIG partitioning, the efficiency challenges of different partitioning approaches, and how to improve efficiency with tools like MIG-Adapter.

    Mixed strategy and MIG profiles

    Since the mixed strategy offers greater flexibility, let's examine how partitions and their profiles are created and used.

    GPU partitioning and strategy

    NVIDIA allows users to customize MIG partitions based on their needs. Using NVIDIA A100 (40GB) as an example, we will explain the flexibility and restrictions in creating MIG partitions. While other GPU types might have different numbers of compute or memory slices, the partitioning rules remain the same. 

    There are 7 compute slices and 8 memory slices on an NVIDIA A100 GPU; partitioning is really about how to divide them. Figure 1 represents the physical layout of the A100 and can help you better understand the partitioning.

    Figure 1: A100 GPU physical layout.

    Creating a MIG partition means combining compute and memory slices. To deliver high performance, the GPU driver stays tightly coupled to the physical layout, but this imposes certain restrictions on partitioning. For example:

    • A user can only create a MIG partition with either 1, 2, 3, 4, or all 7 compute slices.
    • All compute slices in a partition must be physically adjacent.

    In practice, there are 2 partitioning strategies: single or mixed. 

    With the single strategy, all partitions on a GPU must be the same type. That means the user gets 4 options (Figure 2).

    Figure 2: Single partitioning strategy options.

    The mixed strategy provides greater customization flexibility. Figure 3 shows the 19 supported configurations that users can use as profiles for partitioning.

    Figure 3: Mixed partitioning strategy configurations.

    From a user’s point of view, some configurations overlap (e.g. #15–#18 all result in 1x2g.10gb+5x1g.5gb partitions). The NVIDIA GPU Operator simplifies this by offering profiles.

    GPU Operator and MIG Profiles

    The GPU Operator deploys the MIG Manager to manage MIG configuration on nodes in your Kubernetes cluster. It supports both single and mixed strategies. With a mixed strategy, users can label nodes in Kubernetes with profile names defined in a ConfigMap. For example, here is a profile that applies config #15–#18 to all GPUs on the desired node:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: custom-mig-config
    data:
      config.yaml: |
        version: v1
        mig-configs:
          five-1g-one-2g:
            - devices: all
              mig-enabled: true
              mig-devices:
                "1g.10gb": 5
                "2g.20gb": 1
    The GPU Operator already provides many useful profiles in its default ConfigMap. Review them before defining your own.
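
    To apply a profile, a cluster administrator labels the target node with the profile name from the ConfigMap. The following is a minimal sketch of the resulting node metadata; the node name is hypothetical, and nvidia.com/mig.config is the label key the MIG Manager watches:

    apiVersion: v1
    kind: Node
    metadata:
      name: gpu-node-1                          # hypothetical node name
      labels:
        nvidia.com/mig.config: five-1g-one-2g   # profile name from the ConfigMap above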

    Efficiency challenges

    In Kubernetes, different MIG partition types are registered as different resource types, and users must explicitly request the type they want in their resource requirements. A cluster administrator needs to understand how requests are distributed across resource types in the cluster in order to plan the GPU partitions accordingly. This creates many efficiency challenges.
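
    With the mixed strategy, for instance, each MIG type is exposed as its own extended resource (such as nvidia.com/mig-1g.5gb), so a request can only be satisfied by that exact type. Here is a minimal sketch with a hypothetical container image; a Pod requesting 1g.5gb stays pending even if 2g.10gb devices are free:

    apiVersion: v1
    kind: Pod
    metadata:
      name: mig-example
    spec:
      containers:
      - name: cuda-workload
        image: nvidia/cuda:12.4.1-base-ubuntu22.04   # hypothetical image
        resources:
          limits:
            nvidia.com/mig-1g.5gb: 1   # requests exactly one 1g.5gb MIG device; no other MIG type matches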

    Single strategy

    The single strategy exposes a single type of MIG device for all GPUs on the same node. In this scenario:

    • The single strategy does not support the 4g.20gb type, because it could only expose one 4g.20gb partition per GPU, wasting the remaining 3 compute slices and 4 memory slices.
    • It wastes certain compute and memory slices because 7 compute slices and 8 memory slices do not divide evenly (for example, a 3x2g.10gb partition wastes 1 compute slice and 2 memory slices).
    • It is hard to match certain distributions of resource types.

    Single strategy is not widely used in production because of these limitations.

    Mixed strategy

    Mixed strategy provides greater flexibility and efficiency in partitioning. Cluster administrators can leverage every compute/memory slice in the GPU to match a certain distribution of resource requirements. This is the most popular use case for MIG. 

    However, the challenges come from the unpredictable resource requirements of the workloads in the cluster. A large cluster with multiple nodes and multiple GPUs is good for sharing resources across different types of workloads, but the distribution of requested resource types changes dynamically. It often happens that workloads sit pending for a certain MIG type while resources are available, just in unmatched MIG types.

    Dynamic resource allocation

    Partitioning GPUs on nodes and registering MIG devices as resource types for workload resource requirements is also known as static resource allocation. There is separate work on dynamic resource allocation, which uses ResourceClaim, ResourceClaimTemplate, DeviceClass, and related APIs.

    With dynamic resource allocation, GPUs are not partitioned until a Pod with resource requirements comes to the cluster. The device driver will only create partitions to fulfill immediate Pod resource requirements and keep remaining compute and memory slices untouched for future workloads. 

    In theory, this approach satisfies the dynamic distribution of resource requirement types, but it has some critical technical problems stemming from the tight relationship to the GPU physical layout:

    • Additional scheduling delays due to dynamic partitioning and back-and-forth communication between the scheduler and nodes; the scheduler is on the critical path in the cluster and is very sensitive to delays caused by a single workload.
    • GPU resource fragmentation; a MIG partition with more than 1 compute slice must be allocated on adjacent slices. Frequently allocating and freeing MIG partitions leaves gaps between available slices and can render the GPU unable to fulfill certain resource requirements.
    • Increased Pod start time; creating a MIG partition adds lead time, which is critical in scale-to-zero use cases.

    Because of these issues, dynamic resource allocation with NVIDIA MIG is still not ready for production, and it is hard to make progress given the tight coupling between MIG partitions and the GPU physical layout in the driver. So, we're introducing MIG Adapter to improve cluster resource efficiency with the MIG mixed strategy.

    How MIG Adapter works

    In order to leverage free but unmatched MIG types in the system for pending workloads, MIG Adapter temporarily boosts the workload resource requirements to match free MIG resources without changing the GPU partitioning. It defines chains of compatible resource types, then performs Borrow and Return actions for workloads.

    Chains of compatible resource types

    MIG Adapter defines a one-way chain to describe the compatibility of MIG types. For example, for a cluster with all A100 GPUs:

    1g.5gb → 2g.10gb → 3g.20gb → 4g.20gb

    If the cluster uses A100 and H100 together, the chain is:

    1g.5gb → 1g.10gb → 2g.10gb → 2g.20gb → 3g.20gb → 3g.40gb → 4g.20gb → 4g.40gb

    These chains enable MIG Adapter to find replacements when a certain type is not available. 
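
    As a purely hypothetical sketch (MIG Adapter's actual configuration format may differ), such a chain can be thought of as an ordered list of extended resource names, from smallest to largest:

    # Hypothetical representation of the A100 chain; not necessarily MIG Adapter's real config format
    compatibilityChain:
    - nvidia.com/mig-1g.5gb
    - nvidia.com/mig-2g.10gb
    - nvidia.com/mig-3g.20gb
    - nvidia.com/mig-4g.20gb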

    Borrow resource for pending workloads

    MIG Adapter watches for pending pods in the cluster and identifies those waiting for MIG resources. Once a pending pod is identified, it extracts the resource requirements from the Pod spec and looks for available resources along the chain of compatible resources. That means if a pod is pending for a 1g.5gb MIG, MIG Adapter will look for an available 2g.10gb, 3g.20gb, and so on, in the cluster.

    Once an available resource is identified, MIG Adapter restarts the pending pod and patches the new pod with the elevated resource requirements via its mutating admission webhook.
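
    Conceptually, the mutation only moves the requested resource name one or more steps up the chain. A minimal before/after sketch of the container's resources section (the idea of recording the original request is hypothetical):

    # Before: the Pod is pending because no 1g.5gb device is free
    resources:
      limits:
        nvidia.com/mig-1g.5gb: 1

    # After the webhook mutation: the Pod borrows a free 2g.10gb device instead
    # (the original request would need to be recorded, for example in an annotation, so it can be restored later)
    resources:
      limits:
        nvidia.com/mig-2g.10gb: 1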

    Return resource by restoring the workload requests

    MIG Adapter also watches for Terminated or Succeeded Pods in the cluster because they might free up MIG resources. Once a MIG resource becomes available, MIG Adapter goes through all the workloads currently borrowing resources and identifies the one that would return the most when restored.

    For example, if a Pod using a 1g.5gb partition succeeds, and there are Pods bumped from 1g.5gb to 2g.10gb as well as Pods bumped from 1g.5gb to 3g.20gb, the latter will be restored to their original resource requirements.

    Future improvements

    On top of the basic capabilities just described, you can introduce policies to:

    • Restrict resource types allowed to be used for adapting.
    • Restrict maximum room for adapting.
    • Enable/disable restore/preemption for certain workloads.

    Beyond that, the adapting concept can extend past MIG. As long as a chain of compatibility can be defined, you should be able to adapt a workload to any compatible resource type.

    How to use MIG-Adapter

    MIG-Adapter is implemented with the Operator Framework using the Operator SDK. There are two ways to run it: locally or as a Deployment inside the cluster.

    Run locally (recommended)

    Technically, you can run the webhooks locally, but for this to work, you need to generate certificates for the webhook server and store them at /tmp/k8s-webhook-server/serving-certs/tls.{crt,key}. For more details about running the webhook locally, refer to these instructions: Running and deploying the controller.

    Some shell commands to assist certificate generation are kept here.

    The MutatingWebhookConfiguration also needs to be updated with the generated certificates.
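
    For reference, the fields that typically need updating when the webhook server runs locally are clientConfig.url and clientConfig.caBundle. A minimal sketch, with hypothetical names and a hypothetical webhook path (the real values depend on how MIG-Adapter scaffolds its webhook):

    apiVersion: admissionregistration.k8s.io/v1
    kind: MutatingWebhookConfiguration
    metadata:
      name: mig-adapter-mutating-webhook          # hypothetical name
    webhooks:
    - name: pods.mig-adapter.example.com          # hypothetical webhook name
      clientConfig:
        url: https://<your-workstation-ip>:9443/mutate--v1-pod   # hypothetical path; 9443 is the controller-runtime default
        caBundle: <base64-encoded CA certificate that signed the serving certs>
      rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["pods"]
      admissionReviewVersions: ["v1"]
      sideEffects: None
      failurePolicy: Ignore                       # avoids blocking Pod creation if the local webhook is unreachable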

    Run as a Deployment inside the cluster

    Running the MIG Adapter as a Deployment inside the cluster is the same as deploying an Operator. For instructions on deploying MIG Adapter into a cluster, refer to the Operator SDK tutorial.

    If the target cluster is an OpenShift cluster, refer to the documentation for injecting certificates. 
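
    On OpenShift, the service CA operator can handle both sides of this. A minimal sketch using the standard OpenShift annotations (the Service and webhook names are hypothetical):

    # Ask OpenShift to generate a serving certificate for the webhook Service
    apiVersion: v1
    kind: Service
    metadata:
      name: mig-adapter-webhook-service           # hypothetical name
      annotations:
        service.beta.openshift.io/serving-cert-secret-name: mig-adapter-webhook-cert
    spec:
      ports:
      - port: 443
        targetPort: 9443
      selector:
        app: mig-adapter                          # hypothetical label
    ---
    # Ask OpenShift to inject the service CA bundle into the webhook configuration
    apiVersion: admissionregistration.k8s.io/v1
    kind: MutatingWebhookConfiguration
    metadata:
      name: mig-adapter-mutating-webhook          # hypothetical name
      annotations:
        service.beta.openshift.io/inject-cabundle: "true"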

    Next steps

    Now you know how to configure profiles for the NVIDIA GPU Operator to customize MIG partitions for your business needs, and how to leverage MIG Adapter to improve system efficiency. Go ahead and try it!
