
A practical example of the custom metrics autoscaler

May 5, 2025
Lazhar Halleb
Related topics: Hybrid Cloud, Integration, Operators
Related products: Red Hat OpenShift Container Platform, Red Hat OpenShift Data Foundation


    The custom metrics autoscaler is the Red Hat version of the Kubernetes Event Driven Autoscaler (KEDA) community operator, based on the Cloud Native Computing Foundation (CNCF) project. It is designed to use metrics from different sources, such as pod CPU, memory usage, or other cluster metrics, to decide whether to scale a pod up or down based on the metrics in question.

    Examples of such metrics include CPU, memory, the number of HTTP requests to an ingress endpoint, cluster health, and more. The source of those metrics is called a trigger; it can be Prometheus monitoring, Kafka, a cron schedule, or many others.

    This article demonstrates how to use the custom metrics autoscaler in a manner different from the well-known method. You will learn how a metric collected from the system can be the basis for deciding how to autoscale an application's pod. In this example, one of the metrics collected by Prometheus from the Red Hat OpenShift Data Foundation storage system is used to shut down an application (scaling it down).

    This was a real use case from one of our partners, who wanted to act on their database application automatically when OpenShift Data Foundation is not healthy. That application claims volumes from OpenShift Data Foundation to store its data.

    Use case explanation

    In this situation, a database (DB) application is writing data to persistent storage claimed from OpenShift Data Foundation. If OpenShift Data Foundation has a problem, the Ceph health status changes accordingly to show a degraded or even down state. If that happens, writing data to the volume becomes a problem, as data can be corrupted or even lost depending on the issue causing the health degradation.

    It is in the interest of the data's owner to stop the application so that no more data is written to the DB. The custom metrics autoscaler can be used to avoid manual intervention to stop the DB application, which takes time and can cause damage.

    The cluster used for testing is running OpenShift 4.12 (or above) and has a compatible version of OpenShift Data Foundation installed. This is a 3-node cluster, the minimum number of nodes required for OpenShift Data Foundation.

    There are a large number of metrics monitoring the OpenShift Data Foundation cluster. The one that monitors the global health of the cluster (called ceph_health_status) is the one the custom metrics autoscaler will use. Many issues in the cluster can cause this health status metric to change: disk failure, node failure, etc.

    The ceph_health_status metric can have the value 0 (Health OK), 1 (Health Warning), or 2 (Health Critical). Node failure will be simulated by stopping one of the machines, which makes ceph_health_status go from 0 to 1. This triggers the custom metrics autoscaler, as explained later.

    Overview of installation and configuration

    To begin, the first requirement is to install the custom metrics autoscaler operator. You can follow the installing the custom metrics autoscaler instructions. A minimal CLI alternative is sketched below.
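    If you prefer the CLI over the OperatorHub UI, the install can be expressed as a namespace, an OperatorGroup, and a Subscription. This is a hedged sketch only: the package name, channel, and catalog source below are assumptions to verify against the OperatorHub entry for your cluster version.

    apiVersion: v1
    kind: Namespace
    metadata:
      name: openshift-keda
    ---
    apiVersion: operators.coreos.com/v1
    kind: OperatorGroup
    metadata:
      name: openshift-keda
      namespace: openshift-keda
    ---
    apiVersion: operators.coreos.com/v1alpha1
    kind: Subscription
    metadata:
      name: openshift-custom-metrics-autoscaler-operator
      namespace: openshift-keda
    spec:
      name: openshift-custom-metrics-autoscaler-operator
      channel: stable
      source: redhat-operators
      sourceNamespace: openshift-marketplace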

    That installation procedure requires creating a KedaController object. In this example, the object was created with the same information as the documentation, except the .spec.operator.caConfigMaps configuration, which is not required here.

    The .spec.operator.caConfigMaps configuration is required if you have an external metrics service that you want to connect to. In this example, we use the internal Prometheus metrics collector.
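    For reference, if you did need to trust an external metrics endpoint, the field would sit in the operator section of the KedaController. This is only a hedged sketch; the config map name is hypothetical, and the exact field format should be checked against the KedaController CRD:

    spec:
      operator:
        caConfigMaps:
        - keda-custom-ca    # hypothetical config map holding the extra CA bundle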

    In order to make the creation of KedaController successful, there also needs to be a persistent volume claim (PVC). A PVC was created using the ocs-storagecluster-cephfs storage class from OpenShift Data Foundation. (See Table 2 in the Detailed configuration section).

    You are also required to create a ScaledObject as per the installation procedure. (See Table 5 in the Detailed configuration section).

    The important piece of configuration in that object is the triggers section. The trigger types can be CPU, memory, Kafka, Prometheus, cron, or others. The triggers field is a list of objects, so there can be many triggers activating the autoscaler.

    In this integration test, the Prometheus trigger type is used, and the metric is provided by the cluster's Prometheus metrics collector. The trigger object in this case uses the Ceph health to decide on scaling the application up or down. There is also a parameter called cooldownPeriod, which gives the Horizontal Pod Autoscaler (HPA) a specified period of time to keep watching before applying the scaling. It avoids frequent application disruption.

    The OpenShift Data Foundation Ceph health status metric is collected by the Prometheus monitoring built into Red Hat OpenShift. The Ceph health status is the one metric triggering the autoscaler, and it is part of the trigger's entry configuration. You can refer to the ceph_health_status trigger section in Table 5 in the Detailed configuration section.

    There is also a section to configure the authentication to the Prometheus metrics collector. Details are provided in the next section.

    Detailed configuration

    In this section, you can see the different specific configurations related to this test example.

    Table 1 shows the spec section of the KedaController configuration:

    spec:
      metricsServer:
        auditConfig:
          lifetime:
            maxAge: "2"
            maxBackup: "1"
            maxSize: "50"
          logFormat: json
          logOutputVolumeClaim: kedapvc [1]
          policy:
            omitManagedFields: false
            omitStages:
            - RequestReceived
            rules:
            - level: Metadata
        logLevel: "0"
      operator:
        logEncoder: console
        logLevel: info
      serviceAccount: {}
      watchNamespace: ""

    [1]  See Table 2.

    Table 2 shows the persistent volume claim created for the KedaController object:

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: kedapvc
      namespace: openshift-keda
    spec:
      accessModes:
        - ReadWriteOnce
      volumeMode: Filesystem
      resources:
        requests:
          storage: 2Gi
      storageClassName: ocs-storagecluster-cephfs

    Once the KedaController is created, there must be pods running in the openshift-keda namespace:

    $ oc get pods -n openshift-keda
    NAME                                                  READY   STATUS    RESTARTS       AGE
    custom-metrics-autoscaler-operator-5fd684746c-bmdrj   1/1     Running   0              10d
    keda-admission-644969dc5b-jgqpl                       1/1     Running   0              10d
    keda-metrics-apiserver-85d55b9577-cqgkp               1/1     Running   0              111m
    keda-operator-69b768cfb4-tzvzw                        1/1     Running   0              10d

    Figure 1 shows a Keda pod view in the OpenShift console in developer view.

    Figure 1: Keda pod view in the OpenShift console in developer view.

    If the scaledObject's trigger needs to authenticate to a data source, its custom resource requires a trigger authentication reference.

    The authenticationRef configuration item in the ScaledObject custom resource definition (CRD) refers, by name and kind, to the TriggerAuthentication object created according to the steps in the Configuring the custom metrics autoscaler to use OpenShift Container Platform monitoring documentation.

    First, a service account (SA) (Procedure, Step 2.a) is created in the project (namespace) where the workload application, in this example the DB, is running. The application runs in the default project. The documentation indicates to create a new project, but for this test, that is not necessary.

    Then a secret is created that holds the bearer token from that SA (Procedure, Step 2.b). Creating this secret was optional in Red Hat OpenShift Container Platform (RHOCP) 4.15 and earlier, depending on whether the image registry is disabled. From RHOCP 4.16 onward, it is required, as the authentication bearer secret is no longer created automatically.

    As per the documentation, here is how to create the two objects above. To create the service account:

    oc create serviceaccount kedasa -n default

    To create the secret, Table 3 shows the definition holding the bearer token of the service account:

    apiVersion: v1
    kind: Secret
    metadata:
      name: kedasa-token
      namespace: default
      annotations:
        kubernetes.io/service-account.name: kedasa
    type: kubernetes.io/service-account-token

    When this secret is automatically created in RHOCP 4.15 and earlier, its name will look like the following:

    kedasa-token-xyztw

    Running the following command will get you the exact name: 

    oc get secret -n projectName | grep "serviceaccountName-token"

    The following is an example: 

    $ oc get secret -n default | grep "kedasa-token"
    kedasa-token-2gc22         kubernetes.io/service-account-token   4      140d

    Once that is done, proceed with the next steps (Procedure, Step 3) to create the TriggerAuthentication.

    Note:

    Steps 4 and 5, which create the role and role binding, are required when you need metrics from the namespace where the pods are running. In this application example, that was not needed; what was needed was a binding to the monitoring platform's role.

    For this integration example to work, the cluster monitoring view cluster role shown below was bound to the service account so that it can gather the metric from Prometheus.

    Table 4 shows the TriggerAuthentication object used for this example:

    apiVersion: keda.sh/v1alpha1
    kind: TriggerAuthentication
    metadata:
      name: keda-trigger-auth-prometheus
      namespace: default
    spec:
      secretTargetRef:
      - parameter: bearerToken
        name: kedasa-token-2gc22
        key: token
      - parameter: ca
        name: kedasa-token-2gc22
        key: ca.crt

    To be able to get the cluster monitoring data from OpenShift monitoring (i.e., Prometheus), the monitoring view cluster role must be added to the kedasa service account as follows:

    $ oc adm policy add-cluster-role-to-user cluster-monitoring-view -z kedasa -n default

    The ScaledObject custom resource is the one used to control the horizontal autoscaling of the application, via the Deployment resource or any other resource that can create pods.

    Table 5 shows the scaledObject configuration:

    apiVersion: keda.sh/v1alpha1
    kind: ScaledObject
    metadata:
      name: autoscaler-ceph-helth-postgresql
      namespace: default
    spec:
      cooldownPeriod: 60
      maxReplicaCount: 1
      minReplicaCount: 0
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: postgresql-db
      triggers:
      - type: prometheus
        metadata:
          serverAddress: https://prometheus-k8s.openshift-monitoring.svc.cluster.local:9091
          namespace: default
          metricName: ceph_health_status
          threshold: '2'
          activationThreshold: '1'
          query: 2-ceph_health_status
          authModes: "bearer"
          ignoreNullValues: "false"
          unsafeSsl: "true"
        authenticationRef: 
          name: keda-trigger-auth-prometheus
          kind: TriggerAuthentication

    The different parameters in the scaledObject configuration are as follows:

    • cooldownPeriod: Specifies the period in seconds to wait after the last trigger is reported before scaling the deployment back to 0 if minReplicaCount is set to 0. The default is 300. This is an important parameter in this case study: you do not want to wait too long before shutting down the application if the storage is not healthy. I have chosen 60 seconds for testing purposes only; adjust it according to your needs.

    • maxReplicaCount: Specifies the maximum number of replicas when scaling up. The default is 100. In this example of a DB server, it is set to 1. It can be set to any value for applications that have no problem with multiple replicas writing to the storage (unlike a DB server).

    • minReplicaCount: Specifies the minimum number of replicas when scaling down. In this example, it is set to 0. The application must be shut down if there is a problem with storage. That is the main purpose of the test.

    • scaleTargetRef: This is the target application that is monitored for scaling. In this example, it is the deployment of the DB server. 

    It can be one of kind Deployment, StatefulSet, or CustomResource.

    The following are the different configuration parameters of the trigger:

    • type: This has been explained in the previous sections. In this example, I am using Prometheus from the cluster’s monitoring.

    • serverAddress: This is the address of the service to be queried for the metrics. This is the address of the Prometheus service in the OpenShift monitoring operator that is installed by default in the RHOCP cluster.

    • metricName: This is the metric name in the external.metrics.k8s.io API. If you are using more than one trigger, all metric names must be unique. Here, the OpenShift Data Foundation Ceph cluster health status is exposed as ceph_health_status; the Ceph monitors update it based on many other factors, such as OSD (disk) health.

    • threshold: Specifies the value that triggers scaling. This must be specified as a quoted string value.

    • activationThreshold: This is the threshold to activate the scaling object. It is explained in the documentation.

    • query: This is the Prometheus expression whose result is the value the autoscaler compares against the threshold and activation threshold to decide on scaling the deployment (see the worked example after this list).

    • authenticationRef: This is the authentication object created in the previous steps based on the service account in the workload namespace. KEDA can control the scaled object via this authentication. For more information, check the documentation. 
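    To make the query concrete, here is how its value drives the scaling in this example (a sketch; it assumes the scaler deactivates once the metric value is no longer above activationThreshold, which matches the behavior observed in the test below):

    ceph_health_status = 0 (HEALTH_OK)    ->  query = 2-0 = 2  ->  meets threshold (2)                ->  scale to maxReplicaCount (1)
    ceph_health_status = 1 (HEALTH_WARN)  ->  query = 2-1 = 1  ->  not above activationThreshold (1)  ->  scale to minReplicaCount (0)
    ceph_health_status = 2 (HEALTH_ERR)   ->  query = 2-2 = 0  ->  not above activationThreshold (1)  ->  scale to minReplicaCount (0)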

    For reference, all the files are in the GitHub repo.

    Start the application and verification steps

    The RHOCP cluster is a 3-node cluster with OpenShift Data Foundation installed. Each node has a disk dedicated to OpenShift Data Foundation besides the OS disk.

    We used a simple deployment with a PVC from OpenShift Data Foundation (CephFS again). The application starts when the deployment object is created; a minimal sketch of such a deployment follows.
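    The following is a hedged sketch of what that workload can look like. The container image, the logging script, and the PVC name (db-data-pvc) are illustrative only; the actual deployment in the GitHub repo may differ.

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: postgresql-db
      namespace: default
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: postgresql-db
      template:
        metadata:
          labels:
            app: postgresql-db
        spec:
          containers:
          - name: logger
            image: registry.access.redhat.com/ubi9/ubi
            # Simulate the DB by appending a timestamp to the CephFS-backed volume every 5 seconds.
            command: ["/bin/bash", "-c", "while true; do date >> /data/app.log; sleep 5; done"]
            volumeMounts:
            - name: data
              mountPath: /data
          volumes:
          - name: data
            persistentVolumeClaim:
              claimName: db-data-pvc    # hypothetical PVC using the ocs-storagecluster-cephfs class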

    Once the application is up, one of the servers is shut down (one way to do this is shown below). In this way, the Ceph health status will go from HEALTH_OK to HEALTH_WARN.
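    A hedged way to simulate the node failure from the CLI, assuming you cannot simply power the machine off out of band, is to shut the node down through a debug pod:

    $ oc debug node/master-0 -- chroot /host shutdown -h now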

    Correspondingly, the ceph_health_status value in OpenShift Prometheus monitoring can vary from 0 to 2: 0 means HEALTH_OK, 1 means HEALTH_WARN, and 2 means HEALTH_CRITICAL (also reported as HEALTH_ERR).

    Figure 2 shows the ceph health status in OpenShift Metrics Console.

    Figure 2: Ceph health status in the OpenShift metrics console.

    Based on the trigger’s query, threshold, activation threshold, and the min and max replicas, the horizontal autoscaler will decide to scale up and down the application:

    spec:
      cooldownPeriod: 60 [1]
      maxReplicaCount: 1 [2]
      minReplicaCount: 0 [3]
    …
          metricName: ceph_health_status [7]
          threshold: '2' [8]
          activationThreshold: '1' [9]
          query: 2-ceph_health_status [10]
    …

    Here is how to do the checks:

    1. When the OpenShift Data Foundation cluster is healthy, the application must be up and running.

      In this test, I used a sample pod that writes logs to disk, simulating the DB.

      The deployment is responsible for starting the pods; it uses a Fedora or Red Hat Enterprise Linux (RHEL) Red Hat Universal Base Image (UBI) and starts a logging bash script.

      Running the following command on the cluster will show the status of the pod:

      $ oc get pods -n default

      The following is an example:

      NAME                               READY   STATUS    RESTARTS      AGE
      postgresql-db-5c49b84787-7q62n     1/1     Running   1 (14h ago)   14h

    Figure 3 shows Pod Running status in the OpenShift web console in developer view.

    Figure 3: Pod Running status in the OpenShift console, developer view: managed by the Keda autoscaler.
    2. Now one of the OpenShift Data Foundation servers is going to be stopped. The purpose of this step is to trigger the health warning and, hence, make the ceph_health_status metric that is monitored by the autoscaler go from 0 (HEALTH_OK) to 1 (HEALTH_WARN).

      The node will show NotReady:

      $ oc get nodes
      NAME       STATUS   ROLES                         AGE    VERSION
      master-0  NotReady  control-plane,master,worker   259d   v1.27.8+4fab27b
      master-1   Ready    control-plane,master,worker   259d   v1.27.8+4fab27b
      master-2   Ready    control-plane,master,worker   259d   v1.27.8+4fab27b

      Next, monitor the keda operator pod.

      This is an example:

      $ oc logs keda-operator-69b768cfb4-x4swv -n openshift-keda 
      
      2024-10-25T12:55:07Z INFO scaleexecutor Successfully set ScaleTarget replicas count to ScaledObject minReplicaCount {"scaledobject.Name": "autoscaler-ceph-helth-postgresql", "scaledObject.Namespace": "default", "scaleTarget.Name": "postgresql-db", "Original Replicas Count": 1, "New Replicas Count": 0}

      Then monitor the pods:

      $ oc get pods -n default
      NAME                           READY   STATUS    RESTARTS   AGE
      postgresql-db-5c49b84787-7q62n   1/1   Terminating   0      15h

      The ceph health status can also be monitored:

      $ oc exec -n openshift-storage -it rook-ceph-tools-559cf5bc46-gxpds -- ceph -s| grep HEALTH
          health: HEALTH_WARN

    In the OpenShift monitoring dashboard, you can run the following query: ceph_health_status. You will see the value moving from 0 to 1. You can also query it from the CLI, as sketched below.
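    A hedged CLI alternative, assuming the default thanos-querier route in openshift-monitoring and a user token with cluster monitoring view permissions:

    # Query the current ceph_health_status value through the Prometheus HTTP API.
    TOKEN=$(oc whoami -t)
    HOST=$(oc get route thanos-querier -n openshift-monitoring -o jsonpath='{.spec.host}')
    curl -sk -H "Authorization: Bearer $TOKEN" \
      "https://$HOST/api/v1/query?query=ceph_health_status" | jq '.data.result[0].value'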

    At some point, the server is brought back online. This moves the health metric back to the HEALTH_OK state:

    $ oc get nodes
    NAME       STATUS   ROLES                         AGE    VERSION
    master-0   Ready    control-plane,master,worker   259d   v1.27.8+4fab27b
    master-1   Ready    control-plane,master,worker   259d   v1.27.8+4fab27b
    master-2   Ready    control-plane,master,worker   259d   v1.27.8+4fab27b

    OpenShift Data Foundation health Status is recovered:

    $ oc exec -n openshift-storage -it rook-ceph-tools-5fc865bfd-zrdts -- ceph -s| grep HEALTH
        health: HEALTH_OK

    The logs from the keda-operator pod indicate that the new replica count is now 1:

    2024-10-25T13:14:07Z INFO scaleexecutor Successfully updated ScaleTarget {"scaledobject.Name": "autoscaler-ceph-helth-postgresql", "scaledObject.Namespace": "default", "scaleTarget.Name": "postgresql-db", "Original Replicas Count": 0, "New Replicas Count": 1}

    The custom metrics autoscaler, which uses the Horizontal Pod Autoscaler, watches the metric, gets the new value, and redeploys the pod. HPA creates a new pod from the deployment's ReplicaSet, scaling it back up to the maximum replica count (1 in this example):

    $ oc get pods -n default
    NAME                               READY   STATUS    RESTARTS   AGE
    postgresql-db-5c49b84787-9qzsc     1/1     Running   0          7m6s

    Conclusion

    In this article, we provided a practical example of how to configure the custom metrics autoscaler (known as KEDA in upstream Kubernetes) for a real-world scenario. The metric used was collected from a different namespace and service (OpenShift Data Foundation), rather than from the pod itself. The article also showed how to configure the critical parts of the OpenShift Container Platform cluster to make it all work appropriately.

    Thanks to Pedro Ibanez Requena, Max Cao, and John Kyros for reviewing and approving this article.
