Troubleshooting application performance with Red Hat OpenShift metrics, Part 4: Gathering performance metrics

July 29, 2021
Pavel Macik
Related topics: Kubernetes, Operators
Related products: Developer Sandbox, Red Hat OpenShift

    This series shows how to use Red Hat OpenShift metrics in a real-life performance testing scenario. I used these metrics to run performance tests on the Service Binding Operator. We used the results to performance-tune the Service Binding Operator for acceptance into the Developer Sandbox for Red Hat OpenShift.

    In Part 3, I showed you how we set up OpenShift's monitoring stack to collect runtime metrics for our testing scenarios. I also shared a collector script that ensures the results are preserved on a node that won't crash. Now, we can look at the performance metrics we'll use and how to gather the data we need.

    Read the whole series:

    • Part 1: Performance requirements
    • Part 2: The test environment
    • Part 3: Collecting runtime metrics
    • Part 4: Gathering performance metrics
    • Part 5: Test rounds and results

    CPU and memory usage

    As mentioned in Part 3, some metrics have to be collected for the duration of the test. The most important metrics required by the Developer Sandbox team were the CPU and memory usage of both OpenShift nodes and the tested operators. We also wanted to see the number of resources left in the cluster after the test was done.

    Unfortunately, we ran into problems during our first attempts to run the stress tests. The OpenShift cluster crashed when one of its worker nodes went down. To find out what caused the failure, I needed a more granular view of CPU and memory usage, so I had to collect data from the workloads deployed on those nodes.

    By inspecting the cluster's own Grafana instance while the tests ran, I identified the workloads that our scenario loaded most heavily. They came from a handful of namespaces. I added them, together with the cluster's nodes, to the resources watched for CPU and memory usage by the collector script I shared in the previous article. Table 1 shows these namespaces and workloads.

    Table 1: Workloads to watch for CPU and memory usage.

    Namespace                             Workloads
    openshift-apiserver                   apiserver-* pods
    openshift-kube-apiserver              kube-apiserver-* pods
    openshift-monitoring                  prometheus-k8s-* pods
    openshift-operators                   service-binding-operator-* pods, rhoas-operator-* pods
    openshift-operator-lifecycle-manager  catalog-operator-* pods, olm-operator-* pods

    Resources created in the cluster

    Another one of the metrics requested by the Developer Sandbox team was the number of resources created in the cluster during the test. There were two ways to get that information:

    • If the Prometheus instance was available (meaning the node on which it was deployed had not crashed), I could use a simple Prometheus query:

      sort_desc(cluster:usage:resources:sum)
    • If the Prometheus instance was not available, I had to employ a brute-force approach, using the oc tool to count the number of each resource.
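
    The two approaches can be combined into one helper that tries Prometheus first and falls back to oc. The following is a minimal sketch, not the script used in the tests. It assumes that Prometheus is reachable through the prometheus-k8s route in the openshift-monitoring namespace and that the token of the logged-in user is allowed to query it. The function names are made up for illustration.

```shell
#!/bin/bash
# Sketch only: prefer the Prometheus query, fall back to counting with oc.
# Assumptions (not from the article): the prometheus-k8s route exists in
# openshift-monitoring and the current user's token may query it.

PROM_QUERY='sort_desc(cluster:usage:resources:sum)'

# Ask Prometheus for the per-resource counts; prints the raw JSON response.
prometheus_resource_counts() {
    local host token
    host=$(oc get route prometheus-k8s -n openshift-monitoring -o jsonpath='{.spec.host}') || return 1
    token=$(oc whoami -t) || return 1
    # -G sends the URL-encoded query as GET parameters; -f fails on HTTP errors.
    curl -sfk -G -H "Authorization: Bearer $token" \
        --data-urlencode "query=$PROM_QUERY" \
        "https://$host/api/v1/query"
}

# Brute force: list every object of one resource type and count the lines.
bruteforce_resource_count() {
    oc get "$1" --all-namespaces -o name | wc -l | tr -d ' '
}

# Try Prometheus first; if that fails, count the given resource types with oc.
resource_counts() {
    if ! prometheus_resource_counts; then
        local r
        for r in "$@"; do
            echo "$r;$(bruteforce_resource_count "$r")"
        done
    fi
}
```

    Calling resource_counts pods deployments secrets prints the Prometheus JSON response when the query succeeds. Otherwise it prints one resource;count line per resource type.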

    Service Binding Operator performance and developer experience

    The final set of metrics was related to the performance of the Service Binding Operator itself, from a developer's perspective. Specifically, we wanted to know how long it took to perform the binding after the ServiceBinding resource was created.

    The typical situation for a developer using the Service Binding Operator is to have a backing service and a running application that need to be bound together. The developer sends a ServiceBinding request and expects the Service Binding Operator to perform the binding. The scenario can be split into the following sequence of steps. Each step is shown along with the way to retrieve its timestamp, where one is available:

    1. The ServiceBinding request is sent, processed by OpenShift, and created internally as a resource. (This is the .metadata.creationTimestamp of the ServiceBinding resource.)
    2. The Service Binding Operator picks up the resource while watching for it and processes it. (This is the first "Reconciling ServiceBinding" message for the particular ServiceBinding resource in the Service Binding Operator logs.)
    3. Based on the content of the resource, the Service Binding Operator performs the binding. It collects bindable information from the backing service and injects it into the application.
    4. The application is re-deployed with the bound information injected. (This is the lastTransitionTime of the Available condition with status True in the Deployment resource: .status.conditions[] | select(.type=="Available") | select(.status=="True").lastTransitionTime.)
    5. The ServiceBinding resource is marked as done. (This is the "Done" message for the particular ServiceBinding resource in the Service Binding Operator logs.)

    I defined the following metrics to evaluate the developer experience in this scenario:

    • Time to Ready: The time between the creation of the ServiceBinding resource (1) and completion of the binding (5). That can be further split into the following:
      • Time to Reconcile: The time between the creation of the ServiceBinding resource (1) and when it is picked up by the Service Binding Operator (2).
      • Reconcile to Ready: The time between when the Service Binding Operator picks up the ServiceBinding (2) and completes the binding (5).
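
    Each of these metrics is a plain difference between two of the timestamps from steps (1), (2), and (5). The sketch below shows the arithmetic for a single binding. It assumes GNU date (for -d and %N) and timestamps in a format that date -d understands. The function names are hypothetical.

```shell
#!/bin/bash
# Sketch only: derive the three metrics from the timestamps of one binding.
# Assumes GNU date; timestamps are read as UTC, which is harmless because
# only the differences between them are used.

# Convert a timestamp such as "2021-07-29 10:00:02.250" to epoch seconds
# with a fractional part.
to_epoch() {
    date -u -d "$1" "+%s.%N"
}

# Print the time elapsed between two timestamps in seconds (end - start).
elapsed() {
    LC_ALL=C awk -v s="$(to_epoch "$1")" -v e="$(to_epoch "$2")" \
        'BEGIN { printf "%.3f\n", e - s }'
}

# Arguments: created (1), reconciled (2), done (5).
binding_metrics() {
    echo "Time to Reconcile: $(elapsed "$1" "$2") s"
    echo "Reconcile to Ready: $(elapsed "$2" "$3") s"
    echo "Time to Ready: $(elapsed "$1" "$3") s"
}
```

    With the made-up timestamps 10:00:00, 10:00:02.250, and 10:00:07.750 on the same day, binding_metrics reports a Time to Reconcile of 2.250 s, a Reconcile to Ready of 5.500 s, and a Time to Ready of 7.750 s.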

    At the time of the performance evaluation, these metrics were not collected by OpenShift's monitoring stack or by the Service Binding Operator. So, I had to dig up the information from the data that I could get. As shown earlier, I derived the information I needed from metadata that OpenShift collects about the active user's backing service and the application (especially timestamps), along with the Service Binding Operator logs. I wrote the following script to collect the necessary data from OpenShift and compute the information after the test is complete:

    #!/bin/bash

    if [ -z "$QUAY_NAMESPACE" ]; then
        echo "QUAY_NAMESPACE environment variable needs to be set to a non-empty value"
        exit 1
    fi
    
    DT=$(date "+%F_%T")
    RESULTS=results-$DT
    mkdir -p $RESULTS
    
    USER_NS_PREFIX=${1:-zippy}
    
    # Resource counts
    resource_counts(){
        echo -n "$1;"
        # All resource counts from user namespaces
        echo -n "$(oc get $1 --all-namespaces -o custom-columns=NAMESPACE:.metadata.namespace | grep $USER_NS_PREFIX | wc -l)"
        echo -n ";"
        # All resource counts from all namespaces
        echo "$(oc get $1 --all-namespaces -o name | wc -l)"
    }
    
    # Dig various timestamps out
    timestamps(){
        SBR_JSON=$1
        DEPLOYMENTS_JSON=$2
        SBO_LOG=$3
        RESULTS=$4
    
        jq -rc '((.metadata.namespace) + ";" + (.metadata.name) + ";" + (.metadata.creationTimestamp) + ";" + (.status.conditions[] | select(.type=="Ready").lastTransitionTime))' $SBR_JSON > $RESULTS/tmp.csv
        echo "ServiceBinding;Created;ReconciledTimestamp;Ready;AllDoneTimestamp" > $RESULTS/sbr-timestamps.csv
        for i in $(cat $RESULTS/tmp.csv); do
            ns=$(echo -n $i | cut -d ";" -f1)
            name=$(echo -n $i | cut -d ";" -f2)
            echo -n $ns/$name;
            echo -n ";";
            echo -n $(date -d $(echo -n $i | cut -d ";" -f3) "+%F %T");
            echo -n ";";
            log=$(cat $SBO_LOG | grep $ns)
            date -d @$(echo $log | jq -rc 'select(.msg | contains("Reconciling")).ts' | head -n1) "+%F %T.%N" | tr -d "\n"
            echo -n ";";
            echo -n $(date -d $(echo -n $i | cut -d ";" -f4) "+%F %T");
            echo -n ";";
            done_ts=$(echo $log | jq -rc 'select(.msg | contains("Done")) | select(.retry==false).ts')
            if [ -n "$done_ts" ]; then
                date -d "@$done_ts" "+%F %T.%N"
            else
                echo ""
            fi
        done >> $RESULTS/sbr-timestamps.csv
        rm -f $RESULTS/tmp.csv
    
        jq -rc '((.metadata.namespace) + ";" + (.metadata.name) + ";" + (.metadata.creationTimestamp) + ";" + (.status.conditions[] | select(.type=="Available") | select(.status=="True").lastTransitionTime)) + ";" + (.metadata.managedFields[] | select(.manager=="manager").time)' $DEPLOYMENTS_JSON > $RESULTS/tmp.csv
        echo "Namespace;Deployment;Deployment_Created;Deployment_Available;Deployment_Updated_by_SBO;SB_Name;SB_created;SB_ReconciledTimestamp;SB_Ready;SB_AllDoneTimestamp" > $RESULTS/binding-timestamps.csv
        for i in $(cat $RESULTS/tmp.csv); do
            NS=$(echo -n $i | cut -d ";" -f1);
            echo -n $NS;
            echo -n ";";
            echo -n $(echo -n $i | cut -d ";" -f2);
            echo -n ";";
            echo -n $(date -d $(echo -n $i | cut -d ";" -f3) "+%F %T");
            echo -n ";";
            echo -n $(date -d $(echo -n $i | cut -d ";" -f4) "+%F %T");
            echo -n ";";
            echo -n $(date -d $(echo -n $i | cut -d ";" -f5) "+%F %T");
            echo -n ";";
            # Anchor the match so that, for example, zippy-1 does not also match zippy-10
            grep "^$NS/" $RESULTS/sbr-timestamps.csv
        done >> $RESULTS/binding-timestamps.csv
        rm -f $RESULTS/tmp.csv
    }
    
    # Collect timestamps
    {
    # ServiceBinding resources in user namespaces
    oc get sbr --all-namespaces -o json | jq -r '.items[] | select(.metadata.namespace | contains("'$USER_NS_PREFIX'"))' > $RESULTS/service-bindings.json
    
    # Deployment resources in user namespaces
    oc get deploy --all-namespaces -o json | jq -r '.items[] | select(.metadata.namespace | contains("'$USER_NS_PREFIX'"))' > $RESULTS/deployments.json
    
    # Service Binding Operator log
    oc logs $(oc get $(oc get pods -n openshift-operators -o name | grep service-binding-operator) -n openshift-operators -o jsonpath='{.metadata.name}') -n openshift-operators > $RESULTS/service-binding-operator.log
    
    timestamps $RESULTS/service-bindings.json $RESULTS/deployments.json $RESULTS/service-binding-operator.log $RESULTS
    } &
    
    # Collect resource counts
    {
    RESOURCE_COUNTS_OUT=$RESULTS/resource-count.csv
    echo "Resource;UserNamespaces;AllNamespaces" > $RESOURCE_COUNTS_OUT
    for i in $(cat resources.list); do
        resource_counts $i >> $RESOURCE_COUNTS_OUT;
    done
    } &
    
    wait
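
    The script stores raw timestamps, so the durations still have to be derived from sbr-timestamps.csv. One possible way to do that is sketched below. It assumes GNU date and the column order of the header written by the script above. The function names are not part of the original script.

```shell
#!/bin/bash
# Sketch only: report Time to Ready per binding from sbr-timestamps.csv.
# Expected columns (as written by the collection script):
# ServiceBinding;Created;ReconciledTimestamp;Ready;AllDoneTimestamp

# Print "binding;seconds" for every binding that has an AllDoneTimestamp.
time_to_ready_report() {
    local csv=$1 name created reconciled ready all_done start end
    # Skip the header, then read one binding per line.
    tail -n +2 "$csv" | while IFS=';' read -r name created reconciled ready all_done; do
        # Bindings that never logged "Done" have an empty last column.
        [ -n "$all_done" ] || continue
        start=$(date -u -d "$created" "+%s.%N")
        end=$(date -u -d "$all_done" "+%s.%N")
        LC_ALL=C awk -v n="$name" -v s="$start" -v e="$end" \
            'BEGIN { printf "%s;%.3f\n", n, e - s }'
    done
}

# Print the binding with the largest Time to Ready.
slowest_binding() {
    time_to_ready_report "$1" | LC_ALL=C sort -t ';' -k2,2 -n -r | head -n1
}
```

    For example, slowest_binding results-<timestamp>/sbr-timestamps.csv prints the binding that took the longest from creation to the "Done" message, which is a quick way to spot outliers before plotting the full report.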
    

    Next steps

    This article introduced the metrics we collected to performance-test the Service Binding Operator for acceptance into the Developer Sandbox for Red Hat OpenShift. I showed how we collected both the metrics of interest to the Developer Sandbox team and additional metrics for our own team. Look for Part 5, the final article in this series, where I will present the testing rounds and their results.

    Last updated: January 15, 2025
