How MIG maximizes GPU efficiency on OpenShift AI

February 6, 2025
Ido Pal
Related topics:
Artificial intelligenceContainers
Related products:
Red Hat OpenShift AIRed Hat OpenShift

Modern data science workloads demand high computational power, and graphics processing units (GPUs) are often at the heart of these operations. However, sharing GPU resources efficiently among multiple users or workloads can be challenging. NVIDIA Multi-Instance GPU (MIG) technology offers a solution. This article explores how I tested MIG on Red Hat OpenShift AI using an NVIDIA Ampere architecture GPU, and the benefits it brings to AI and data science teams.

    The NVIDIA MIG solution and test

GPUs in a Kubernetes environment are assigned to pods in a 1:1 ratio by default. This means a single GPU is dedicated to one pod, regardless of whether the workload fully utilizes the GPU's capacity. This limitation can lead to inefficient resource usage, especially for smaller workloads. NVIDIA MIG solves this issue by splitting a single GPU into multiple independent instances that can be used by different pods. This maximizes GPU utilization and ensures resources are not wasted. In the next sections, I will demonstrate how I tested MIG on Red Hat OpenShift AI.
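To illustrate the default 1:1 model, a pod claims a whole GPU through the extended resource name nvidia.com/gpu exposed by the NVIDIA device plugin. The pod and image names in this sketch are hypothetical:

```yaml
# Sketch of a pod spec requesting one full GPU under the default 1:1 model.
apiVersion: v1
kind: Pod
metadata:
  name: cuda-workload          # hypothetical name
spec:
  containers:
    - name: cuda-container
      image: nvcr.io/nvidia/cuda:12.2.0-base-ubi8   # example image
      resources:
        limits:
          nvidia.com/gpu: 1    # binds the entire physical GPU to this pod
```

Even if the container uses a fraction of the card's compute or memory, no other pod can share it.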

    Prepare the environment

    For this test, certain preparatory steps are required to leverage MIG on OpenShift. I used Azure’s Standard_NC24ads_A100_v4 virtual machine (VM), equipped with an NVIDIA A100 PCIe 80GB GPU as an OpenShift worker (Figure 1).

    lspci output from the GPU OpenShift worker
    Figure 1: lspci output from the GPU OpenShift worker.

    Step 1: Install NFD

    First, I installed the Node Feature Discovery (NFD) operator, as shown in Figures 2 and 3.

    NFD operator installation page
    Figure 2: NFD operator installation page.

    This operator detects hardware features and ensures that GPUs are discoverable by the NVIDIA GPU operator.

    NFD daemonset
    Figure 3: NFD daemonset.

    Many labels are added to the node, indicating that the operator has detected its GPU:

    $ oc describe node/ods-cluster-mqt7l-worker-eastus2-fn5w8
    Labels:    beta.kubernetes.io/arch=amd64
               feature.node.kubernetes.io/cpu-cpuid.ADX=true
               feature.node.kubernetes.io/cpu-cpuid.AESNI=true
               ...
               feature.node.kubernetes.io/cpu-cpuid.FMA3=true
               feature.node.kubernetes.io/gpu.present=true
               feature.node.kubernetes.io/gpu.memory=80GB
               feature.node.kubernetes.io/gpu.vendor=nvidia
               feature.node.kubernetes.io/gpu.model=A100

    Step 2: Install the NVIDIA GPU operator

    Next, I installed the NVIDIA GPU operator, which handles the configuration of GPU resources (Figure 4).

    GPU Operator installation page
    Figure 4: GPU Operator installation page.

    I made sure to enable the MIG manager in the ClusterPolicy configuration to facilitate the MIG setup (Figure 5).

    MIG configuration
    Figure 5: MIG configuration.
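For reference, enabling the MIG manager in the ClusterPolicy looks roughly like the excerpt below. The field names follow the NVIDIA GPU operator's ClusterPolicy schema, but treat this as a sketch rather than a complete resource:

```yaml
# Excerpt of a ClusterPolicy -- a sketch, not a full resource.
apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
  name: gpu-cluster-policy
spec:
  migManager:
    enabled: true        # lets the operator reconfigure MIG based on node labels
  mig:
    strategy: single     # expose MIG slices as the plain nvidia.com/gpu resource
```

With the single strategy, every MIG slice appears under the familiar nvidia.com/gpu resource name, so existing pod specs keep working unchanged.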

    Step 3: Check the pods

    There are two ways to make sure all pods under the nvidia-gpu-operator namespace are up and running:

    1. From the CLI:

      $ oc get pods -n nvidia-gpu-operator
    2. From the console, as shown in Figure 6:
    GPU Operator pods
    Figure 6: GPU Operator pods.

    Choose the right MIG configuration

    MIG offers a variety of configurations tailored to different GPU models and workload requirements, so you need to understand which configurations are supported for the NVIDIA A100 80GB GPU. I ran the command oc describe configmap/default-mig-parted-config, explored the available configurations, and selected one that matched my requirements: all-1g.10gb, which divides the GPU into seven instances.

    The following configuration is ideal for workloads that require smaller, dedicated slices of GPU power.

        # H100-80GB, H800-80GB, A100-80GB, A800-80GB, A100-40GB, A800-40GB
        all-1g.10gb:
          # H100-80GB, H800-80GB, A100-80GB, A800-80GB
          - device-filter: ["0x233010DE", "0x233110DE", "0x232210DE", "0x20B210DE", "0x20B510DE", "0x20F310DE", "0x20F510DE", "0x232410DE"]
            devices: all
            mig-enabled: true
            mig-devices:
              "1g.10gb": 7
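For comparison, the same ConfigMap format also supports mixed layouts that combine slice sizes. The default config shipped with the operator includes entries along these lines; this is a sketch, so check oc describe configmap/default-mig-parted-config on your own cluster for the exact profiles:

```yaml
# Sketch of a mixed-profile entry in the mig-parted config format.
all-balanced:
  # A100-80GB and similar devices
  - devices: all
    mig-enabled: true
    mig-devices:
      "1g.10gb": 2   # two small slices
      "2g.20gb": 1   # one medium slice
      "3g.40gb": 1   # one large slice
```

A mixed layout like this suits teams whose workloads vary in size, at the cost of fewer total instances per GPU.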

    Enable and verify MIG

    To verify the setup, I used the nvidia-smi tool to query the GPU status and configuration. MIG was initially disabled, so I enabled it and restarted the node:

    sh-4.4# nvidia-smi -i 0 -mig 1
    Enabled MIG Mode for GPU 00000001:00:00.0
    All done.

    To verify that MIG is enabled for the GPU, I connected to the nvidia-mig-manager pod in OpenShift and used the terminal tab to query GPU=0 configurations with the following command:

    sh-4.4# nvidia-smi -i 0 -q
    ==============NVSMI LOG==============
    Timestamp                           : Tue Dec  5 15:41:13 2023
    Driver Version                      : 535.104.12
    CUDA Version                        : Not Found
    Attached GPUs                       : 1
    GPU 00000001:00:00.0
        Product Name                    : NVIDIA A100 80GB PCIe
        Product Brand                   : NVIDIA
        Product Architecture            : Ampere
        Display Mode                    : Enabled
        Display Active                  : Disabled
        Persistence Mode                : Enabled
        Addressing Mode                 : None
        MIG Mode
            Current                     : Enabled
            Pending                     : Enabled

    After selecting the configuration, I labeled the node with the following command:

    $ oc label node <node-name> nvidia.com/mig.config=all-1g.10gb --overwrite

    The MIG manager pod logs provide insight into the status of the node labeling process (Figure 7).

    MIG-manager logs
    Figure 7: MIG-manager logs.

    Once successful, the node reported multiple allocatable GPUs instead of a single one.

    Let's describe the node to confirm that it recognizes seven GPUs:

    $ oc describe node/ods-cluster-mqt7l-worker-eastus2-fn5w8
    Capacity:
      attachable-volumes-azure-disk:  8
      cpu:                            24
      ephemeral-storage:              133682156Ki
      hugepages-1Gi:                  0
      hugepages-2Mi:                  0
      memory:                         226965748Ki
      nvidia.com/gpu:                 7
      pods:                           250
    Allocatable:
      attachable-volumes-azure-disk:  8
      cpu:                            23500m
      ephemeral-storage:              122127732942
      hugepages-1Gi:                  0
      hugepages-2Mi:                  0
      memory:                         225814772Ki
      nvidia.com/gpu:                 7
      pods:                           250
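Because the single MIG strategy advertises each 1g.10gb slice as nvidia.com/gpu, a pod consumes one slice with exactly the same resource request as before. The pod and image names in this sketch are again hypothetical:

```yaml
# Sketch of a pod spec: with MIG enabled, this request binds one 1g.10gb slice.
apiVersion: v1
kind: Pod
metadata:
  name: small-inference        # hypothetical name
spec:
  containers:
    - name: inference
      image: nvcr.io/nvidia/cuda:12.2.0-base-ubi8   # example image
      resources:
        limits:
          nvidia.com/gpu: 1    # one MIG slice, not the whole card
```

Up to seven such pods can now be scheduled onto the single physical A100.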

    Consume the sliced GPUs via Red Hat OpenShift AI

    With MIG enabled, the OpenShift AI dashboard reflected the increased availability of GPU resources. I could select up to seven GPUs for my workbench (Figure 8). This setup empowers AI and data science teams to run diverse workloads simultaneously without bottlenecks.

    Allocating GPUs in the OpenShift AI notebook
    Figure 8: Allocating GPUs in the OpenShift AI notebook.

    Unlock GPU potential with NVIDIA MIG and OpenShift AI

    NVIDIA MIG technology, integrated with Red Hat OpenShift AI, transforms GPU resource management by facilitating scalable and efficient workloads. By partitioning GPUs into smaller, independent units, organizations can achieve maximum resource utilization, cost savings, and streamlined AI/ML operations. MIG on OpenShift AI helps teams fully harness the power of GPU technology, whether they manage diverse workloads or scale multi-user environments.

    Learn more about using NVIDIA NIM on Red Hat OpenShift AI and the performance results shown by Red Hat AI Performance and Scale when testing NVIDIA GPUs with MIG.

