
Troubleshooting application performance with Red Hat OpenShift metrics, Part 4: Gathering performance metrics

July 29, 2021
Pavel Macik
Related topics: Kubernetes, Operators
Related products: Developer Sandbox, Red Hat OpenShift


    This series shows how to use Red Hat OpenShift metrics in a real-life performance testing scenario. I used these metrics to run performance tests on the Service Binding Operator. We used the results to performance-tune the Service Binding Operator for acceptance into the Developer Sandbox for Red Hat OpenShift.

    In Part 3, I showed you how we set up OpenShift's monitoring stack to collect runtime metrics for our testing scenarios. I also shared a collector script that ensures the results are preserved on a node that won't crash. Now, we can look at the performance metrics we'll use and how to gather the data we need.

    Read the whole series:

    • Part 1: Performance requirements
    • Part 2: The test environment
    • Part 3: Collecting runtime metrics
    • Part 4: Gathering performance metrics
    • Part 5: Test rounds and results

    CPU and memory usage

    As mentioned in Part 3, some metrics have to be collected for the duration of the test. The most important metrics required by the Developer Sandbox team were the CPU and memory usage of both OpenShift nodes and the tested operators. We also wanted to see the number of resources left in the cluster after the test was done.

    Unfortunately, we ran into problems during our first attempts to run the stress tests. The OpenShift cluster actually crashed when one of its worker nodes came down. Naturally, it was important to know what caused the failure. I needed a more granular view of the CPU and memory usage, so I had to collect data from the workloads deployed on those nodes.

    By inspecting the cluster's own Grafana instance, I identified the workloads that our scenario loaded and stressed the most. I then added them, together with the cluster's nodes, to the watch list of the collector script I shared in the previous article. These workloads came from a handful of namespaces; Table 1 shows them.

    Table 1: Workloads to watch for CPU and memory usage.

    Namespace                              Workloads
    openshift-apiserver                    apiserver-* pods
    openshift-kube-apiserver               kube-apiserver-* pods
    openshift-monitoring                   prometheus-k8s-* pods
    openshift-operators                    service-binding-operator-* pods, rhoas-operator-* pods
    openshift-operator-lifecycle-manager   catalog-operator-* pods, olm-operator-* pods
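    As a quick sanity check of which pods in these namespaces are actually under load, per-pod CPU and memory usage can be spot-checked with `oc adm top`. The sketch below only collects and prints the commands (since running them requires a live cluster with the metrics API available); pipe the output to `sh` to execute them.

```shell
# Spot-check sketch: per-pod CPU/memory for the namespaces in Table 1.
# `oc adm top pods` needs a live cluster, so the commands are collected
# and printed here rather than executed.
cmds=$(for ns in openshift-apiserver openshift-kube-apiserver \
                 openshift-monitoring openshift-operators \
                 openshift-operator-lifecycle-manager; do
    echo "oc adm top pods -n $ns --no-headers"
done)
echo "$cmds"
```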

    Resources created in the cluster

    Another one of the metrics requested by the Developer Sandbox team was the number of resources created in the cluster during the test. There were two ways to get that information:

    • If the Prometheus instance was available (meaning the node on which it is deployed had not crashed), I could use a simple Prometheus query:

      sort_desc(cluster:usage:resources:sum)
    • If the Prometheus instance was not available, I had to employ a brute-force approach, using the oc tool to count the number of each resource.
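    As a sketch of the first option, the query can be issued against the Prometheus HTTP API with curl. The route and token below are placeholders, not values from this setup; the request is only printed so the sketch runs without a cluster.

```shell
# Sketch: query the in-cluster Prometheus for per-type resource counts.
# PROM_ROUTE and TOKEN are placeholders -- on a real cluster, obtain them
# with e.g. `oc get route -n openshift-monitoring` and `oc whoami -t`.
PROM_ROUTE=${PROM_ROUTE:-prometheus-k8s-openshift-monitoring.apps.example.com}
QUERY='sort_desc(cluster:usage:resources:sum)'

# Printed rather than executed, since it needs live cluster credentials.
cmd="curl -sk -H 'Authorization: Bearer \$TOKEN' --data-urlencode 'query=$QUERY' 'https://$PROM_ROUTE/api/v1/query'"
echo "$cmd"
```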

    Service Binding Operator performance and developer experience

    The final set of metrics was related to the performance of the Service Binding Operator itself, from a developer's perspective. Specifically, we wanted to know how long it took to perform the binding after the ServiceBinding resource was created.

    The typical situation for a developer using the Service Binding Operator is to have a backing service and an application running that they want to bind together. So, the developer sends a ServiceBinding request and expects the binding to be done by the Service Binding Operator. The scenario can be split into the following sequence of steps, each shown along with the way to retrieve its timestamp:

    1. The ServiceBinding request is sent, processed by OpenShift, and created internally as a resource (.metadata.creationTimestamp of the ServiceBinding resource).
    2. The Service Binding Operator picks up the resource while watching for it and processes it. (This is the first "Reconciling ServiceBinding" message for the particular ServiceBinding resource in the Service Binding Operator logs.)
    3. Based on the content of the resource, the Service Binding Operator performs the binding. It collects bindable information from the backing service and injects it into the application.
    4. The application is re-deployed with the bound information injected into the Deployment resource (.status.conditions[] | select(.type=="Available") | select(.status=="True").lastTransitionTime).
    5. The ServiceBinding resource is marked as done. (A "Done" message is sent for the particular ServiceBinding resource in the Service Binding Operator logs.)

    I defined the following metrics to evaluate the developer experience in this scenario:

    • Time to Ready: The time between the creation of the ServiceBinding resource (1) and the completion of the binding (5). That can be further split into the following:
      • Time to Reconcile: The time between the creation of the ServiceBinding resource (1) and when it is picked up by the Service Binding Operator (2).
      • Reconcile to Ready: The time between when the Service Binding Operator picks up the ServiceBinding (2) and when it completes the binding (5).
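    Given the three timestamps, the metrics are simple differences. The sketch below computes them with GNU `date` on hypothetical example values; the real values come from the resource metadata and operator logs described above.

```shell
# Hypothetical timestamps illustrating the metric definitions;
# real values come from resource metadata and the operator logs.
created="2021-07-29T10:00:00Z"     # (1) ServiceBinding .metadata.creationTimestamp
reconciled="2021-07-29T10:00:02Z"  # (2) first "Reconciling ServiceBinding" log entry
done_ts="2021-07-29T10:00:07Z"     # (5) "Done" log entry

to_epoch() { date -d "$1" "+%s"; }  # GNU date

time_to_reconcile=$(( $(to_epoch "$reconciled") - $(to_epoch "$created") ))
reconcile_to_ready=$(( $(to_epoch "$done_ts") - $(to_epoch "$reconciled") ))
time_to_ready=$(( $(to_epoch "$done_ts") - $(to_epoch "$created") ))

echo "Time to Ready:      ${time_to_ready}s"    # 7s
echo "Time to Reconcile:  ${time_to_reconcile}s" # 2s
echo "Reconcile to Ready: ${reconcile_to_ready}s" # 5s
```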

    At the time of the performance evaluation, these metrics were not collected by OpenShift's monitoring stack or by the Service Binding Operator, so I had to derive them from the data I could get. As shown earlier, I extracted the information I needed from the metadata that OpenShift keeps about the user's backing service and application (especially timestamps), along with the Service Binding Operator logs. I wrote the following script to collect the necessary data from OpenShift and compute the metrics after the test is complete:

    if [ -z "$QUAY_NAMESPACE" ]; then
        echo "QUAY_NAMESPACE environment variable needs to be set to a non-empty value"
        exit 1
    fi
    
    DT=$(date "+%F_%T")
    RESULTS=results-$DT
    mkdir -p $RESULTS
    
    USER_NS_PREFIX=${1:-zippy}
    
    # Resource counts
    resource_counts(){
        echo -n "$1;"
        # All resource counts from user namespaces
        echo -n "$(oc get $1 --all-namespaces -o custom-columns=NAMESPACE:.metadata.namespace | grep $USER_NS_PREFIX | wc -l)"
        echo -n ";"
        # All resource counts from all namespaces
        echo "$(oc get $1 --all-namespaces -o name | wc -l)"
    }
    
    # Dig various timestamps out
    timestamps(){
        SBR_JSON=$1
        DEPLOYMENTS_JSON=$2
        SBO_LOG=$3
        RESULTS=$4
    
        jq -rc '((.metadata.namespace) + ";" + (.metadata.name) + ";" + (.metadata.creationTimestamp) + ";" + (.status.conditions[] | select(.type=="Ready").lastTransitionTime))' $SBR_JSON > $RESULTS/tmp.csv
        echo "ServiceBinding;Created;ReconciledTimestamp;Ready;AllDoneTimestamp" > $RESULTS/sbr-timestamps.csv
        for i in $(cat $RESULTS/tmp.csv); do
            ns=$(echo -n $i | cut -d ";" -f1)
            name=$(echo -n $i | cut -d ";" -f2)
            echo -n $ns/$name;
            echo -n ";";
            echo -n $(date -d $(echo -n $i | cut -d ";" -f3) "+%F %T");
            echo -n ";";
            log=$(cat $SBO_LOG | grep $ns)
            date -d @$(echo $log | jq -rc 'select(.msg | contains("Reconciling")).ts' | head -n1) "+%F %T.%N" | tr -d "\n"
            echo -n ";";
            echo -n $(date -d $(echo -n $i | cut -d ";" -f4) "+%F %T");
            echo -n ";";
            done_ts=$(echo $log | jq -rc 'select(.msg | contains("Done")) | select(.retry==false).ts')
            if [ -n "$done_ts" ]; then
                date -d "@$done_ts" "+%F %T.%N"
            else
                echo ""
            fi
        done >> $RESULTS/sbr-timestamps.csv
        rm -f $RESULTS/tmp.csv
    
        jq -rc '((.metadata.namespace) + ";" + (.metadata.name) + ";" + (.metadata.creationTimestamp) + ";" + (.status.conditions[] | select(.type=="Available") | select(.status=="True").lastTransitionTime)) + ";" + (.metadata.managedFields[] | select(.manager=="manager").time)' $DEPLOYMENTS_JSON > $RESULTS/tmp.csv
        echo "Namespace;Deployment;Deployment_Created;Deployment_Available;Deployment_Updated_by_SBO;SB_Name;SB_created;SB_ReconciledTimestamp;SB_Ready;SB_AllDoneTimestamp" > $RESULTS/binding-timestamps.csv
        for i in $(cat $RESULTS/tmp.csv); do
            NS=$(echo -n $i | cut -d ";" -f1);
            echo -n $NS;
            echo -n ";";
            echo -n $(echo -n $i | cut -d ";" -f2);
            echo -n ";";
            echo -n $(date -d $(echo -n $i | cut -d ";" -f3) "+%F %T");
            echo -n ";";
            echo -n $(date -d $(echo -n $i | cut -d ";" -f4) "+%F %T");
            echo -n ";";
            echo -n $(date -d $(echo -n $i | cut -d ";" -f5) "+%F %T");
            echo -n ";";
            cat $RESULTS/sbr-timestamps.csv | grep $NS
        done >> $RESULTS/binding-timestamps.csv
        rm -f $RESULTS/tmp.csv
    }
    
    # Collect timestamps
    {
    # ServiceBinding resources in user namespaces
    oc get sbr --all-namespaces -o json | jq -r '.items[] | select(.metadata.namespace | contains("'$USER_NS_PREFIX'"))' > $RESULTS/service-bindings.json
    
    # Deployment resources in user namespaces
    oc get deploy --all-namespaces -o json | jq -r '.items[] | select(.metadata.namespace | contains("'$USER_NS_PREFIX'"))' > $RESULTS/deployments.json
    
    # ServiceBinding Operator log
    oc logs $(oc get $(oc get pods -n openshift-operators -o name | grep service-binding-operator) -n openshift-operators -o jsonpath='{.metadata.name}') -n openshift-operators > $RESULTS/service-binding-operator.log
    
    timestamps $RESULTS/service-bindings.json $RESULTS/deployments.json $RESULTS/service-binding-operator.log $RESULTS
    } &
    
    # Collect resource counts
    {
    RESOURCE_COUNTS_OUT=$RESULTS/resource-count.csv
    echo "Resource;UserNamespaces;AllNamespaces" > $RESOURCE_COUNTS_OUT
    for i in $(cat resources.list); do
        resource_counts $i >> $RESOURCE_COUNTS_OUT;
    done
    } &
    
    wait
    

    Next steps

    This article introduced the metrics we collected to performance-test the Service Binding Operator for acceptance into the Developer Sandbox for Red Hat OpenShift. I showed how we collected both the metrics of interest to the Developer Sandbox team and the additional metrics specific to our own team. Look for Part 5, the final article in this series, where I will present the testing rounds and their results.

    Last updated: January 15, 2025
