The benefits of dynamic GPU slicing in OpenShift

May 6, 2025
Gaurav Singh, Harshal Patil, Abhishek Malvankar
Related topics: Artificial intelligence, Data science
Related products: Red Hat AI, Red Hat Enterprise Linux AI, Red Hat OpenShift AI

    In the era of AI and machine learning, efficient resource management is paramount. As Red Hat OpenShift administrators, we face the challenge of deploying intensive AI workloads on a platform where GPUs represent a significant cost. Traditional methods like pre-slicing with NVIDIA’s Multi-Instance GPU (MIG) can lead to resource wastage, especially when the static slices do not align with dynamic workload demands.

    In this article, we will explore how dynamic GPU slicing, enabled by the dynamic accelerator slicer operator, can improve GPU resource management in OpenShift by adjusting allocation to match workload needs.

    The challenge of GPU utilization

    OpenShift supports GPUs through the NVIDIA GPU Operator. However, the conventional approach of pre-slicing GPUs comes with several limitations:

    • Rigid slicing: Predefined MIG slices may not match the actual resource requirements of your pods, leading to underutilization.
    • Static allocation: Allocating GPU slices at node start-up forces administrators to reconfigure the nodes if workloads change, often causing disruptions.
    • Lack of dynamic provisioning: Without the ability to allocate GPU resources on demand, clusters end up either over-provisioning or underutilizing GPUs, both of which can increase operational costs.

    These issues highlight the need for a more flexible approach that enables fractional GPU sharing, ensuring each workload gets only what it needs. 
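To make the cost of rigid slicing concrete, here is a minimal Python sketch of the arithmetic. It assumes one 40 GB GPU pre-sliced into two 20 GB slices and three hypothetical pods that each need about 5 GB. The function name and all numbers are illustrative assumptions, not measurements from a cluster.

```python
# Illustrative arithmetic only: slice sizes and pod requests are assumptions.

def stranded_memory_gb(slice_sizes_gb, pod_requests_gb):
    """Place each pod in the smallest free static slice that fits it.

    Returns (stranded_gb, unplaced_pods): memory locked inside slices that
    are occupied by smaller pods, and the requests that found no free slice.
    """
    free_slices = sorted(slice_sizes_gb)
    stranded = 0
    unplaced = []
    for request in pod_requests_gb:
        fit = next((s for s in free_slices if s >= request), None)
        if fit is None:
            unplaced.append(request)
            continue
        free_slices.remove(fit)
        stranded += fit - request
    return stranded, unplaced


# Static layout: one 40 GB GPU pre-sliced into two 20 GB slices.
print(stranded_memory_gb([20, 20], [5, 5, 5]))  # (30, [5])
# Slices sized to the requests, which is what on-demand slicing aims for.
print(stranded_memory_gb([5, 5, 5], [5, 5, 5]))  # (0, [])
```

In the static case, 30 GB sits idle inside occupied slices and one pod still cannot be placed, which is the underutilization described in the list above.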

    Introducing the dynamic accelerator slicer

    The dynamic accelerator slicer operator is currently in developer preview (DP) and is designed to dynamically allocate and manage GPU slices in OpenShift. Its core objectives include:

    • Dynamic allocation: Provisioning MIG slices based on the precise pod resource requests and limits.
    • Intelligent scheduling: Leveraging Kubernetes scheduling gates to hold pods until the necessary GPU slices are available.
    • Seamless integration: Working alongside the NVIDIA GPU Operator, it manages GPU slices without requiring changes to the pod specifications.
    • Automated lifecycle management: Tracking allocations and automatically releasing GPU slices once workloads complete.

    How dynamic GPU slicing works

    Dynamic GPU slicing leverages the scheduling primitives within OpenShift to optimize resource usage in this three-step process:

    1. Dynamic allocation and placement

      When a pod requests GPU resources, it is held in a pre-scheduled state via Kubernetes scheduling gates. The operator dynamically allocates the required GPU slice only when the workload is ready to run, thereby avoiding the inefficiencies associated with pre-slicing.

    2. Integration with the NVIDIA GPU Operator

      Building on the NVIDIA GPU Operator, the dynamic accelerator slicer introduces an external controller that manages slice allocation. This integration keeps GPU management robust and lets the slicing mechanism fit into the existing ecosystem.

    3. Automated slice lifecycle management

      From allocation to deallocation, the operator handles the entire lifecycle of GPU slices. This automation not only streamlines resource management but also ensures that resources are promptly returned to the pool once the pod completes its execution.
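The three steps above can be pictured with a toy model in Python. This is a sketch under stated assumptions, not the operator's implementation: the `Gpu` class, the first-fit policy, and the `PLACEMENTS` table (modeled on typical A100-40GB MIG layouts) are all hypothetical. A return value of `None` stands in for a pod that stays held by its scheduling gate.

```python
# Toy model of on-demand MIG placement. The placement table is an assumption
# based on typical A100-40GB layouts; it is not read from the operator or GPU.
PLACEMENTS = {
    # profile: (size in slots, allowed start slots)
    "1g.5gb": (1, [0, 1, 2, 3, 4, 5, 6]),
    "2g.10gb": (2, [0, 2, 4]),
    "3g.20gb": (4, [0, 4]),
    "7g.40gb": (8, [0]),
}


class Gpu:
    def __init__(self, uuid, slots=8):
        self.uuid = uuid
        self.owner = [None] * slots  # pod UID occupying each slot, if any

    def allocate(self, pod_uid, profile):
        """Return a placement dict, or None if the pod must stay gated."""
        size, starts = PLACEMENTS[profile]
        for start in starts:
            span = range(start, start + size)
            if all(self.owner[i] is None for i in span):
                for i in span:
                    self.owner[i] = pod_uid
                return {"start": start, "size": size}
        return None

    def release(self, pod_uid):
        """Free every slot held by a finished pod."""
        self.owner = [None if o == pod_uid else o for o in self.owner]


gpu = Gpu("GPU-example")
print(gpu.allocate("pod-a", "3g.20gb"))  # {'start': 0, 'size': 4}
print(gpu.allocate("pod-b", "3g.20gb"))  # {'start': 4, 'size': 4}
print(gpu.allocate("pod-c", "1g.5gb"))   # None: no free slot, pod stays gated
gpu.release("pod-a")                     # pod-a completes, its slice is freed
print(gpu.allocate("pod-c", "1g.5gb"))   # {'start': 0, 'size': 1}
```

The `start` and `size` fields mirror the shape of the `migPlacement` object that appears in the operator's status later in this article.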

    Additionally, the solution extends GPU memory quota management via Kubernetes Quota and Kueue and introduces dynamic policy-driven slicing with efficient packing to meet diverse workload demands.

    Deploying the dev preview in OpenShift

    Deploying the dynamic accelerator slicer is straightforward. Before you begin, ensure you have the necessary dependencies installed:

    • oc CLI: Used to interact with your OpenShift cluster.
    • Operator SDK: Required to run the operator bundle.

    Note:

    Make sure the KUBECONFIG environment variable is set to point to your OpenShift cluster's configuration file.

    Follow these steps to deploy the operator:

    1. Create a new project:

      oc new-project instaslice-system 
    2. Run the operator bundle:

      operator-sdk run bundle quay.io/ibm/instaslice-bundle:0.0.1 -n instaslice-system

    These commands deploy the operator into your OpenShift cluster, allowing it to dynamically manage GPU slices alongside your existing NVIDIA GPU setup.

    Deploying an inference large language model (LLM) 

    Multiple inference instances can be deployed on a single GPU, isolated from one another by NVIDIA MIG technology. Here we have a Red Hat OpenShift Container Platform worker node with four NVIDIA A100 GPUs:

    [root@nvidia-driver-daemonset-417 drivers]# nvidia-smi -L
    GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-42db43a1-526e-626d-ff2d-456bf26d9df0)
    GPU 1: NVIDIA A100-SXM4-40GB (UUID: GPU-143a66c4-bd69-7559-8898-26f9886a2a56)
    GPU 2: NVIDIA A100-SXM4-40GB (UUID: GPU-7d9ad99b-3d06-cf12-bab8-e1f31672e01f)
    GPU 3: NVIDIA A100-SXM4-40GB (UUID: GPU-7cf6fcb8-d173-0224-618c-4813e24a3383)

    These GPUs can be sliced to serve multiple LLMs for inference. Download the sample vLLM model.

    After downloading, edit the file to set the value of HF_TOKEN in huggingface-secret to the token obtained from your Hugging Face account.

    Apply that file using oc apply -f <filename>.

    Note that this sample model is rather large and may take a while to download, depending on your network speed. Confirm that the vLLM deployment is ready:

    $ oc get deployments vllm
    NAME   READY   UP-TO-DATE   AVAILABLE   AGE
    vllm   1/1     1            1           35s

    You can verify that instaslice has provisioned the slice requested by the deployment by inspecting the instaslice object and by querying the GPU through the NVIDIA GPU Operator.

    First, find the UID of the pod running on the host. This is one way to get it:

    $ oc get pods vllm-7dbb49b8f8-znd4s -o json | jq .metadata.uid
    "98c08795-2da3-4598-81b7-538c4e37093b"

    This UID lets us track the allocation status in the instaslice object:

    $ oc get instaslice host-192-168-11-144 -o json | jq .status.podAllocationResults
    {
      "98c08795-2da3-4598-81b7-538c4e37093b": {
        "allocationStatus": {
          "allocationStatusController": "ungated",
          "allocationStatusDaemonset": "created"
        },
        "configMapResourceIdentifier": "a02ae459-0618-4804-83f9-e5ba36af756f",
        "gpuUUID": "GPU-143a66c4-bd69-7559-8898-26f9886a2a56",
        "migPlacement": {
          "size": 4,
          "start": 0
        },
        "nodename": "host-192-168-11-144"
      }
    }
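If you want to check this status from a script rather than by eye, the JSON above can be summarized with a few lines of Python. This is a minimal sketch: `summarize_allocation` is a hypothetical helper, and treating the combination of "ungated" and "created" as "ready" is an assumption drawn only from the sample output, not from documented operator semantics.

```python
import json

# Sample copied from the podAllocationResults output shown above.
SAMPLE = """
{
  "98c08795-2da3-4598-81b7-538c4e37093b": {
    "allocationStatus": {
      "allocationStatusController": "ungated",
      "allocationStatusDaemonset": "created"
    },
    "configMapResourceIdentifier": "a02ae459-0618-4804-83f9-e5ba36af756f",
    "gpuUUID": "GPU-143a66c4-bd69-7559-8898-26f9886a2a56",
    "migPlacement": {
      "size": 4,
      "start": 0
    },
    "nodename": "host-192-168-11-144"
  }
}
"""


def summarize_allocation(results, pod_uid):
    """Return a flat summary for one pod UID, or None if it has no entry."""
    entry = results.get(pod_uid)
    if entry is None:
        return None
    status = entry["allocationStatus"]
    placement = entry["migPlacement"]
    return {
        "node": entry["nodename"],
        "gpu": entry["gpuUUID"],
        "start": placement["start"],
        "size": placement["size"],
        # Assumption: these two values together mean the slice is usable.
        "ready": (
            status["allocationStatusController"] == "ungated"
            and status["allocationStatusDaemonset"] == "created"
        ),
    }


results = json.loads(SAMPLE)
print(summarize_allocation(results, "98c08795-2da3-4598-81b7-538c4e37093b"))
```

In practice, the `SAMPLE` string would be replaced by the output of the `oc get instaslice` command piped into the script.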

    As the output shows, the slice was created dynamically on GPU 1 with a placement of size 4 starting at slot 0. You can confirm this by listing the devices with nvidia-smi, where the slice appears as a 3g.20gb MIG device:

    # nvidia-smi -L
    GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-42db43a1-526e-626d-ff2d-456bf26d9df0)
    GPU 1: NVIDIA A100-SXM4-40GB (UUID: GPU-143a66c4-bd69-7559-8898-26f9886a2a56)
      MIG 3g.20gb     Device  0: (UUID: MIG-b065e8ea-01ff-598f-83fd-3eea1f2869b9)
    GPU 2: NVIDIA A100-SXM4-40GB (UUID: GPU-7d9ad99b-3d06-cf12-bab8-e1f31672e01f)
    GPU 3: NVIDIA A100-SXM4-40GB (UUID: GPU-7cf6fcb8-d173-0224-618c-4813e24a3383)
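To cross-check the listing against the gpuUUID reported by instaslice, the nvidia-smi output can be grouped by GPU. The following Python sketch assumes the line format shown in this article; `mig_devices_by_gpu` and both regular expressions are hypothetical helpers written for that format and may need adjusting for other driver versions.

```python
import re

# Sample copied from the `nvidia-smi -L` output shown above.
SAMPLE = """\
GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-42db43a1-526e-626d-ff2d-456bf26d9df0)
GPU 1: NVIDIA A100-SXM4-40GB (UUID: GPU-143a66c4-bd69-7559-8898-26f9886a2a56)
  MIG 3g.20gb     Device  0: (UUID: MIG-b065e8ea-01ff-598f-83fd-3eea1f2869b9)
GPU 2: NVIDIA A100-SXM4-40GB (UUID: GPU-7d9ad99b-3d06-cf12-bab8-e1f31672e01f)
GPU 3: NVIDIA A100-SXM4-40GB (UUID: GPU-7cf6fcb8-d173-0224-618c-4813e24a3383)
"""

GPU_RE = re.compile(r"^GPU \d+: .*\(UUID: (GPU-[0-9a-f-]+)\)")
MIG_RE = re.compile(r"^\s+MIG (\S+)\s+Device\s+\d+: \(UUID: (MIG-[0-9a-f-]+)\)")


def mig_devices_by_gpu(listing):
    """Map each GPU UUID to a list of (profile, MIG UUID) tuples."""
    devices = {}
    current = None
    for line in listing.splitlines():
        gpu = GPU_RE.match(line)
        if gpu:
            current = gpu.group(1)
            devices[current] = []
            continue
        mig = MIG_RE.match(line)
        if mig and current is not None:
            devices[current].append((mig.group(1), mig.group(2)))
    return devices


for gpu_uuid, migs in mig_devices_by_gpu(SAMPLE).items():
    print(gpu_uuid, migs)
```

Only the GPU whose UUID matches the instaslice allocation should list a MIG device; the other three GPUs remain unsliced.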

    In another terminal, set KUBECONFIG for the OpenShift cluster and run the following command to forward the service port:

    oc port-forward svc/vllm 8000:8000 -n instaslice-system

    Come back to the earlier terminal to query the model:

    $ curl http://localhost:8000/v1/completions \
      -H "Content-Type: application/json" \
      -d '{
        "model": "facebook/opt-125m",
        "prompt": "San Francisco is a",
        "max_tokens": 7,
        "temperature": 0
      }'
    {"id":"cmpl-474b5727068745baa4718925bcfa8f91","object":"text_completion","created":163554,"model":"facebook/opt-125m","choices":[{"index":0,"text":" great place to live.  I","logprobs":null,"finish_reason":"length"}],"usage":{"prompt_tokens":5,"total_tokens":12,"completion_tokens":7}}
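The same request can be issued from Python using only the standard library. This is a minimal sketch, assuming the port-forward from the previous step is active on localhost:8000; the function names are hypothetical, and the payload fields simply mirror the curl example.

```python
import json
import urllib.request


def build_completion_request(base_url, model, prompt, max_tokens=7, temperature=0):
    """Build a POST request for the /v1/completions endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def first_completion_text(response_body):
    """Extract the generated text of the first choice from a response body."""
    return json.loads(response_body)["choices"][0]["text"]


def complete(base_url, model, prompt):
    """Send the request and return the generated text (needs a live server)."""
    request = build_completion_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=30) as response:
        return first_completion_text(response.read())
```

With the port-forward running, calling `complete("http://localhost:8000", "facebook/opt-125m", "San Francisco is a")` should return text comparable to the curl response above.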

    If you delete this workload, instaslice automatically removes the corresponding slices from the GPUs.

    The benefits of dynamic GPU slicing

    Adopting dynamic GPU slicing offers multiple benefits:

    • Increased utilization: Workloads can utilize GPU slices efficiently, reducing the need to reserve entire GPUs when only a fraction is necessary.
    • Cost reduction: By paying only for the resources actively used, organizations can significantly reduce GPU-related expenses.
    • Seamless scaling: Dynamic allocation allows clusters to adapt in real time to changing workloads, ensuring optimal performance and resource distribution.

    Looking ahead: Integration with Kubernetes DRA

    The Kubernetes Dynamic Resource Allocation (DRA) framework aims to provide even finer control over GPU scheduling. While DRA is still evolving, the dynamic accelerator slicer, itself in developer preview, offers a practical way to try dynamic slicing today. It paves the way for future enhancements, ensuring that OpenShift users can continue to push the boundaries of resource efficiency.

    Efficient GPU management is essential for maximizing the performance of AI workloads in OpenShift. The dynamic accelerator slicer operator delivers dynamic GPU slicing that optimizes resource usage, reduces operational costs, and adapts seamlessly to the fluctuating demands of modern AI applications.

    By integrating dynamically with the NVIDIA GPU operator and leveraging OpenShift’s scheduling capabilities, this innovative approach transforms GPU allocation from a static, often inefficient process into a dynamic, on-demand service. Embracing dynamic GPU slicing is a critical step towards unlocking the full potential of AI workloads in the OpenShift environment.
