Boost AI efficiency with GPU autoscaling on OpenShift

August 12, 2025 | Rohit Ralhan
Related topics: Artificial intelligence
Related products: Red Hat OpenShift Container Platform

    In modern applications, autoscaling is crucial to maintaining responsiveness, efficiency, and cost-effectiveness. Workloads often experience fluctuating demand, and scaling enables dynamic resource allocation, preventing performance bottlenecks and ensuring high availability. Without effective scaling, applications risk over-provisioning resources, leading to unnecessary costs, or under-provisioning, which can result in degraded performance and potential downtime.

    Autoscaling in Red Hat OpenShift and Kubernetes ensures responsiveness, efficiency, and cost-effectiveness by dynamically adjusting resources. Key mechanisms include horizontal pod autoscaling (HPA) for scaling pods, vertical pod autoscaling (VPA) for optimizing resource allocation, and cluster autoscaling for managing worker nodes. These features enhance resilience, reduce latency, and optimize costs in cloud-native environments.
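
    For reference, a conventional CPU-based horizontal pod autoscaler is defined with a few lines of YAML. The following is a minimal sketch, assuming a hypothetical deployment named example-deployment in a namespace example-namespace; the custom metrics autoscaler discussed below builds on this same mechanism:

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: example-hpa
      namespace: example-namespace
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: example-deployment   # hypothetical deployment to scale
      minReplicas: 1
      maxReplicas: 5
      metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 80   # add replicas when average CPU utilization exceeds 80%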

    Traditional scaling mechanisms rely on system resource metrics like CPU or memory to trigger scaling actions. However, these mechanisms fall short in some scenarios, especially with more specialized workloads such as GPU-accelerated applications, or when scaling should be driven by signals like the number of incoming requests or the length of a queue.

    Custom metrics autoscaler (KEDA) and Prometheus

    The custom metrics autoscaler operator, based on KEDA (Kubernetes event-driven autoscaler), enables autoscaling using events and custom metrics. It extends the horizontal pod autoscaler by providing external metrics and managing scaling to zero. The operator has two main components: 

    1. The operator controls workload scaling and manages custom resources.

    2. The metrics server supplies external metrics to the OpenShift API for autoscaling.

    To use the custom metrics autoscaler operator, you define a ScaledObject or ScaledJob for your workload. These custom resources (CRs) specify the scaling configuration, including the deployment or job to scale, the metric source that triggers the scaling, and other parameters, such as the minimum and maximum number of replicas. Figure 1 depicts the KEDA architecture.

    Figure 1: An illustration of the KEDA architecture.

    Prometheus is an open source monitoring and alerting toolkit under the Cloud Native Computing Foundation (CNCF). It collects metrics from various sources, stores them as time-series data, and supports visualization through tools like Grafana or other API consumers.
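
    As an illustration of such a query, GPU utilization exported through NVIDIA DCGM (used later in this article) can be inspected in the OpenShift metrics UI or via the Prometheus API with simple PromQL expressions; metric availability depends on your GPU stack:

    # Per-GPU utilization in percent, one time series per GPU
    DCGM_FI_DEV_GPU_UTIL

    # Total utilization summed across all GPUs in the cluster
    sum(DCGM_FI_DEV_GPU_UTIL)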

    Figure 2 provides a summary of how things work at a high level. We will discuss how to set up the custom metrics autoscaler in detail in the sections that follow.

    Figure 2: A high-level overview of the custom metrics autoscaler (KEDA) and Prometheus working together.

    In this diagram:

    • The GPU application receives requests, and GPU resource utilization rises toward a threshold.
    • Prometheus is configured to scrape those utilization metrics.
    • The Prometheus scaler in KEDA is configured and deployed to autoscale the application based on the utilization metrics.

    The procedure

    Before you can initiate the procedure, ensure that you have:

    • Admin access to Red Hat OpenShift Container Platform.
    • Access to the oc CLI.
    • Monitoring of user-defined workloads enabled in OpenShift (a configuration sketch follows this list).
    • An inference application deployed on Red Hat OpenShift (feel free to deploy any model of your preference).
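
    If monitoring of user-defined workloads is not yet enabled, it can typically be turned on through the cluster monitoring ConfigMap. The following is a minimal sketch; check the OpenShift documentation for your version before applying it:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cluster-monitoring-config
      namespace: openshift-monitoring
    data:
      config.yaml: |
        enableUserWorkload: true   # enables monitoring for user-defined projects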

    The procedure is as follows:

    1. Install the custom metrics autoscaler operator.
    2. Create a KedaController instance.
    3. Set up a secret. This is used by the TriggerAuthentication for accessing Prometheus and contains the token and CA certificate.
    4. Create a TriggerAuthentication to describe the authentication parameters.
    5. Create a ScaledObject to define the triggers and how the custom metrics autoscaler should scale your application.

    Next, we will walk through this procedure in detail.

    Install custom metrics autoscaler operator

    First, install the custom metrics autoscaler operator:

    1. In the OpenShift Container Platform web console, click Operators -> OperatorHub.
    2. Choose Custom Metrics Autoscaler from the list of available operators, and click Install.
    3. On the Install Operator page, ensure that the All namespaces on the cluster (default) option is selected for Installation Mode to install the operator in all namespaces.
    4. Ensure that the openshift-keda namespace is selected for Installed Namespace. OpenShift Container Platform creates the namespace if it is not already present in your cluster.
    5. Click Install.
    6. Verify the installation by listing the custom metrics autoscaler operator components:
      1. Navigate to Workloads -> Pods.
      2. Select the openshift-keda project from the drop-down menu and verify that the custom-metrics-autoscaler-operator-* pod is running.
      3. Navigate to Workloads -> Deployments to verify that the custom-metrics-autoscaler-operator deployment is running.
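
    If you prefer the command line, the same components can be checked with oc (the exact pod name suffix will differ in your cluster):

    oc get pods -n openshift-keda
    oc get deployment custom-metrics-autoscaler-operator -n openshift-keda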

    Create a KedaController instance

    Next, create the KedaController instance:

    1. In the OpenShift Container Platform web console, click Operators -> Installed Operators.
    2. Click Custom Metrics Autoscaler.
    3. On the Operator Details page, click the KedaController tab.
    4. On the KedaController tab, click Create KedaController and edit the file. The default values are usually sufficient; however, you can change settings such as the log level and which namespaces to watch (if any), as desired.
    5. Click Create to create the KEDA controller.
    6. Select the openshift-keda project from the drop-down menu and verify that the keda-admission, keda-metrics-apiserver, and keda-operator deployments, as well as the corresponding pods, are running.

    Here is a sample KEDA Controller YAML:

    apiVersion: keda.sh/v1alpha1
    kind: KedaController
    metadata:
      name: keda
      namespace: openshift-keda
    spec:
      admissionWebhooks:
        logEncoder: console
        logLevel: info
      metricsServer:
        logLevel: '0'
      operator:
        logEncoder: console
        logLevel: info
      watchNamespace: ''
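
    If you prefer the CLI over the web console, the same KedaController can be applied from a file and the resulting deployments verified; a sketch, assuming the YAML above is saved as keda-controller.yaml:

    oc apply -f keda-controller.yaml
    oc get deployments -n openshift-keda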

    Figures 3 through 6 depict the OpenShift Container Platform UI where the previous steps to install the custom metrics autoscaler and set up the controller take place.

    Figure 3: In the OpenShift Container Platform UI, select Operators -> OperatorHub, then search for Custom Metrics Autoscaler and click Install.

    Figure 4: On the Install Operator page, select the All namespaces on the cluster (default) option for Installation Mode.

    Figure 5: In the OpenShift Container Platform UI, select Operators -> Installed Operators.

    Figure 6: On the Operator Details page, click the KedaController tab -> Create KedaController and edit the file.

    Set up the secret

    To access Prometheus metrics, we will use bearer authentication by generating a one-year token for the prometheus-k8s service account in the OpenShift monitoring namespace for simplicity. Additionally, we extract the TLS certificate from the same namespace to securely connect to the Prometheus metrics endpoint.

    1. Run the following command:

    oc create secret generic <<secret-name>> \
      --from-literal=ca.crt="$(oc get secret prometheus-k8s-tls -n openshift-monitoring -o jsonpath="{.data['tls\.crt']}" | base64 -d)" \
      --from-literal=token="$(oc create token prometheus-k8s -n openshift-monitoring --duration=8760h)" \
      -n <<to be scaled object namespace>>

    Replace the secret-name and the scaled object namespace with the appropriate values. 
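
    For example, with a hypothetical secret name keda-prometheus-secret and an application namespace gpu-inference, the command and a quick verification would look like this:

    oc create secret generic keda-prometheus-secret \
      --from-literal=ca.crt="$(oc get secret prometheus-k8s-tls -n openshift-monitoring -o jsonpath="{.data['tls\.crt']}" | base64 -d)" \
      --from-literal=token="$(oc create token prometheus-k8s -n openshift-monitoring --duration=8760h)" \
      -n gpu-inference

    # Confirm the secret contains both the ca.crt and token keys
    oc describe secret keda-prometheus-secret -n gpu-inference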

    Create the TriggerAuthentication

    Next, create the TriggerAuthentication:

    1. Create a YAML file trigger-auth.yaml similar to the following: 
    apiVersion: keda.sh/v1alpha1
    kind: TriggerAuthentication
    metadata:
      name: trigger-auth-prometheus
      namespace: <<scaled object namespace>>
    spec:
      secretTargetRef:
      - parameter: bearerToken
        name: <<secret-name created in the above step>>
        key: token
      - parameter: ca
        name: <<secret-name created in the above step>>
        key: ca.crt

    2. Create the CR object: oc create -f trigger-auth.yaml.
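
    A quick check confirms the resource exists in the namespace of the workload you plan to scale (output columns vary slightly across KEDA versions):

    oc get triggerauthentication trigger-auth-prometheus -n <<scaled object namespace>>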

    Create the ScaledObject

    Now, create the ScaledObject:

    1. Create a YAML file scaled-object.yaml similar to the following:
    apiVersion: keda.sh/v1alpha1
    kind: ScaledObject
    metadata:
      annotations:
        scaledobject.keda.sh/transfer-hpa-ownership: 'true'
      name: cm-autoscaler
      namespace: <<scaled object namespace>>
      labels:
        scaledobject.keda.sh/name: cm-autoscaler
    spec:
      maxReplicaCount: 2
      minReplicaCount: 1
      pollingInterval: 10
      scaleTargetRef:
        apiVersion: apps/v1
        kind: <<scaled object kind, e.g., Deployment or StatefulSet>>
        name: <<scaled object name>>
      triggers:
      - authenticationRef:
          name: trigger-auth-prometheus
        metadata:
          authModes: bearer
          metricName: DCGM_FI_DEV_GPU_UTIL
          query: 'sum(DCGM_FI_DEV_GPU_UTIL{instance=~".+", gpu=~".+"})'
          serverAddress: 'https://prometheus-k8s.openshift-monitoring.svc.cluster.local:9091'
          threshold: '90'
        name: DCGM_FI_DEV_GPU_UTIL
        type: prometheus
    2. Create the CR object: oc create -f scaled-object.yaml. This should be created in the scaled object namespace.
    3. View the command output to verify that the custom metrics autoscaler was created: oc get scaledobject cm-autoscaler -n <<scaled object namespace>>.

    Sample output:

    NAME            SCALETARGETKIND      SCALETARGETNAME        MIN   MAX   TRIGGERS     AUTHENTICATION               READY   ACTIVE   FALLBACK   AGE
    scaledobject    apps/v1.Deployment   example-deployment     0     50    prometheus   prom-triggerauthentication   True    True     False      17s

    Make sure READY and ACTIVE are set to True, indicating that everything is working correctly.

    In the above ScaledObject YAML, we use the DCGM_FI_DEV_GPU_UTIL metric to obtain the total GPU utilization across all GPUs. However, you can update the query to fit your requirements and/or use other metric identifiers as applicable. For the full list of identifiers, refer to the DCGM Field Identifiers documentation.
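
    For instance, to scale on the average utilization per GPU rather than the cluster-wide sum, the trigger metadata could be adjusted along these lines (a hypothetical variant with a correspondingly lower threshold, not taken from the configuration above):

    metadata:
      authModes: bearer
      metricName: DCGM_FI_DEV_GPU_UTIL
      query: 'avg(DCGM_FI_DEV_GPU_UTIL{instance=~".+", gpu=~".+"})'
      serverAddress: 'https://prometheus-k8s.openshift-monitoring.svc.cluster.local:9091'
      threshold: '75'

    You can also inspect the HorizontalPodAutoscaler that KEDA manages behind the scenes; its name typically follows the keda-hpa-<scaledobject-name> convention, for example oc get hpa keda-hpa-cm-autoscaler -n <<scaled object namespace>>.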

    Test the autoscaler

    To test the autoscaler, we will use the CUDA Test Generator (dcgmproftester), a CUDA load generator that produces deterministic CUDA workloads for reading and validating GPU metrics. It is a quick way to generate load on a GPU.

    Follow these steps to test the autoscaler:

    1. Navigate to the nvidia-gpu-operator namespace.
    2. Navigate to a pod named nvidia-dcgm-****.
    3. Go to the Terminal tab in the pod.
    4. Run the following command: /usr/bin/dcgmproftester12 --no-dcgm-validation -t 1004 -d 90, where -t specifies the metric for which to generate load and -d specifies the test duration. Add --no-dcgm-validation to let dcgmproftester generate test loads only. (A CLI alternative is sketched after this list.)
    5. Watch the object get scaled.
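
    Alternatively, the same test can be driven from the command line; a sketch, assuming you substitute the actual DCGM pod name and your application namespace:

    # Generate roughly 90 seconds of GPU load from inside the DCGM pod (same flags as step 4)
    oc exec -n nvidia-gpu-operator <nvidia-dcgm-pod-name> -- /usr/bin/dcgmproftester12 --no-dcgm-validation -t 1004 -d 90

    # In another terminal, watch the replicas scale in the application namespace
    oc get pods -n <<scaled object namespace>> -w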

    The following video demonstrates the scaling of the object.

    For more information, refer to these resources:

    1. OpenShift documentation
    2. KEDA documentation
    3. NVIDIA DCGM
