Egress IP Scale Testing in OpenShift Container Platform

June 21, 2024
Venkata Anil Kommaddi
Related topics:
Developer Tools, DevOps, Kubernetes
Related products:
Red Hat OpenShift

    Introduction

    This blog post explores how kube-burner-ocp, an opinionated wrapper built on top of kube-burner, can be used to simplify performance and scale testing, and how we leveraged it to evaluate egress IP scalability in OVN-Kubernetes, OpenShift's default CNI plugin. We'll delve into the intricacies of the egress IP feature, its role in traffic management, and how kube-burner and kube-burner-ocp help us understand its behavior under load. This post also serves as a classic example of how the Red Hat Performance and Scale team works with the development team to understand, test, characterize, and improve features with a holistic approach, for the benefit of our customers who rely on OpenShift for their mission-critical workloads on-premises and in the cloud.

    Egress IP is a network abstraction assigned to a set of pods within a project. When a pod is configured with an egress IP, traffic originating from the pod and destined outside the cluster has its source IP address translated by OVN to the egress IP address. This ensures that external resources, such as firewalls, can consistently identify traffic from a set of pods by a single IP address, regardless of which nodes host them.

    A key requirement for an egress IP is that it must be on the same subnet as the OpenShift node's primary interface. The egress IP for pods from a particular namespace/project can live on any one of the worker nodes in the OpenShift cluster.
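
    As a quick illustration, an EgressIP object in OVN-Kubernetes looks roughly like the following minimal sketch (the name, address, and namespace label below are placeholders, not values from our tests):

        apiVersion: k8s.ovn.org/v1
        kind: EgressIP
        metadata:
          name: egressip-example                 # placeholder name
        spec:
          egressIPs:
          - 10.0.34.200                          # placeholder; must be on the nodes' primary subnet
          namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: egressip-example-ns   # selects the namespace whose pods use this IP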

    Packet Flow through egress IPs

    Figure 1: Packet flow through egress IP during the testing

    Let us use the above pictorial representation to understand the flow of traffic during this egress IP testing. A client pod (10.128.2.10) on worker1 periodically sends requests to the external Nginx server (10.0.34.146) with a 1-second interval and a 3-second timeout. When OVN on worker1 processes a packet originating from this client pod, it identifies that the pod is configured with an egress IP, and that this egress IP is hosted on worker2. As a result, OVN on worker1 forwards the packet to OVN on worker2. OVN on worker2 replaces the source IP with the egress IP (SNAT, plus some other magic) and sends the packet to the external Nginx server, which therefore receives the request with the egress IP as the source IP.

    The Nginx server is customized to function as an echo server: it retrieves the source IP of each received packet, which corresponds to the egress IP, and embeds that IP in the data field of the reply packet. As a result, the response packet from the Nginx server includes the egress IP. OVN on worker2 replaces the destination egress IP with the client pod's IP and forwards the packet back to the client pod. The client pod, through our custom-written logic, validates that the response indeed contains the expected egress IP, which confirms a successful egress IP data flow.

    Kube-burner for Egress IP Testing

    Kube-burner is an open source Kubernetes performance and scale test orchestration toolset written in Golang. It provides multi-faceted functionality, the most important of which is summarized below.

    • Create, delete and patch Kubernetes resources at scale
    • Prometheus metric collection and indexing
    • Measurements
    • Alerting

    More details about kube-burner, including its recent inclusion in the prestigious CNCF Sandbox, can be found in the following blog posts: https://www.redhat.com/en/blog/introducing-kube-burner-a-tool-to-burn-down-kubernetes-and-openshift, https://developers.redhat.com/articles/2024/03/04/test-kubernetes-performance-and-scale-kube-burner, and https://www.redhat.com/en/blog/kube-burner-fanning-flames-innovation-cncf-sandbox.

    Kube-burner in its most common usage is invoked by passing a YAML-based configuration file to the executable on the CLI. The configuration file contains global configuration options, such as the endpoints of the Elasticsearch/OpenSearch indexer, followed by a list of jobs, with each job having its own set of supported parameters. Each job can create, delete, or patch objects at a rate defined by the QPS and burst parameters within the job. An example job is one that creates several deployment objects per namespace across several namespaces (determined by the jobIterations parameter) and, through each deployment, creates multiple pods.
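
    A minimal sketch of what such a configuration file might look like is shown below. The job name, namespace, and template path are placeholders rather than the exact configuration used in our tests, and the indexer settings are omitted:

        global:
          measurements:
            - name: podLatency               # built-in kube-burner measurement
        jobs:
          - name: example-deployments        # placeholder job name
            jobIterations: 100               # number of iterations (here, namespaces) to create
            qps: 20                          # sustained object-creation rate
            burst: 20                        # maximum request burst
            namespacedIterations: true       # create a fresh namespace per iteration
            namespace: example               # namespace name prefix
            objects:
              - objectTemplate: templates/deployment.yml   # placeholder template path
                replicas: 1                  # objects rendered from this template per iteration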

    In the context of egress IP test orchestration, kube-burner reads a provided job configuration and creates the specified objects for each job iteration. 

    Each iteration involves:

    • Creating a namespace (our test ran 12,000 iterations, deploying 12,000 namespaces)
    • Deploying pods within the namespace (our test deployed 2 pods per namespace)
    • Creating a cluster-scoped egress IP object with unique egress IP addresses (our test deployed 2 egress IPs to match the 2 pods of the namespace)

    In our test, a configuration file with a job containing egress IP and deployment objects was run with 12,000 iterations. This was done to test the scalability of the solution and to ensure that customer use cases that heavily rely on egress IPs are thoroughly exercised internally, so that bottlenecks can be found and fixed in a quick and reliable manner.

    Kube-burner generates unique egress IP addresses for each job iteration and passes them to both the egress IP and pod resources created as part of that iteration. This binds the egress IPs to the pods and lets the pods validate the egress IP in the response packet, as described in the earlier section.

    Figure 2: Kube-burner job templates used for the testing
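
    As a rough sketch of the idea (not necessarily the exact template shown in Figure 2), an egress IP object template rendered once per iteration could look like the following, reusing the GetIPAddress templating helper that also appears in the probe definitions later in this post:

        apiVersion: k8s.ovn.org/v1
        kind: EgressIP
        metadata:
          name: egressip-obj-{{.Iteration}}
        spec:
          egressIPs:
          {{- range splitList " " (GetIPAddress .eipAddresses .Iteration .addrPerIteration) }}
          - {{.}}
          {{- end }}
          namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: egressip-{{.Iteration}}   # match the namespace created in this iteration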

    Kube-burner renders the above templates for each iteration. For example, in iteration 0, it creates:

    • the namespace egressip-0
    • the cluster-scoped egress IP object egressip-obj-0
    • the pods client-1-0 and client-1-1 in that egressip-0 namespace

    Figure 3: A look at the pod and egress IP resources created by kube-burner

    Users can either manually provision an AWS instance with an Nginx server, or leverage the e2e-benchmarking tool (discussed in the next section), which automates this process. They can then invoke the kube-burner-ocp wrapper to run the egress IP scale testing workload, as shown below:

    kube-burner-ocp egressip --uuid <uuid> --addresses-per-iteration=1 --iterations=10 --external-server-ip=10.0.40.194

    Automation with e2e-benchmarking

    Our CI pipeline for continuous, automated performance and scale testing leverages a wrapper tool called e2e-benchmarking to automate the creation of test environments and the configuration of kube-burner. This tool provisions an AWS instance in the same VPC, subnet, and security groups as the worker nodes (the client must be able to reach the server on ports 9002 to 9064) and runs customized Nginx server containers listening on ports 9002 to 9064. The Nginx server configuration includes adding the egress IP to the return packet's data field, as described in the section on packet flow through egress IPs.

    OVN assigns egress IPs only to OpenShift worker nodes labeled with k8s.ovn.org/egress-assignable. For our testing purposes, we need all OpenShift worker nodes to be eligible for hosting egress IPs, so this tool automatically applies the k8s.ovn.org/egress-assignable label to all OpenShift worker nodes.
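
    After e2e-benchmarking has run, each worker node carries that label; a node's metadata then looks roughly like this sketch (the node name is a placeholder):

        apiVersion: v1
        kind: Node
        metadata:
          name: worker-2                          # placeholder node name
          labels:
            k8s.ovn.org/egress-assignable: ""     # marks the node as eligible to host egress IPs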

    The README in the e2e-benchmarking repo provides details about how to run this workload:

    git clone https://github.com/cloud-bulldozer/e2e-benchmarking
    cd e2e-benchmarking/workloads/kube-burner-ocp-wrapper/
    EXTRA_FLAGS="--addresses-per-iteration=1" AWS_ACCESS_KEY_ID=<aws_key> AWS_SECRET_ACCESS_KEY=<aws_secret> WORKLOAD=egressip ITERATIONS=10 ./run.sh

    Goal of Scale Testing

    The primary goals of this scale testing are to measure startup and failover latency at scale for pods configured with egress IPs. Initially, latency was measured using startup and liveness probes as shown below:

        startupProbe:
          exec:
            command:
            - "/bin/sh"
            - "-c"
            {{- $eips := (splitList " " (GetIPAddress .eipAddresses .Iteration .addrPerIteration) | join "|") }}
            - "curl -s http://${SERVER_HOST}:${SERVER_PORT} | grep -qE '{{$eips}}'"
          periodSeconds: 5
          timeoutSeconds: 3
          failureThreshold: 600
        livenessProbe:
          exec:
            command:
            - "/bin/sh"
            - "-c"
            {{- $eips := (splitList " " (GetIPAddress .eipAddresses .Iteration .addrPerIteration) | join "|") }}
            - "curl -s http://${SERVER_HOST}:${SERVER_PORT} | grep -qE '{{$eips}}'"
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3

    Later, a Golang eipvalidator application (image quay.io/cloud-bulldozer/eipvalidator, prepared from this application) was used to send HTTP requests, validate OVN's use of the egress IP by inspecting responses, measure latency, and expose the results as Prometheus metrics. Our testing was done with this Golang application instead of the startup and liveness probes.
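
    As a purely illustrative sketch (not the actual pod template used in the tests), the validator could replace the probes in the client pod's container spec along these lines. SERVER_HOST and SERVER_PORT mirror the variables used in the probe commands above, while the expected-IP variable name and metrics port are hypothetical:

        containers:
        - name: eipvalidator
          image: quay.io/cloud-bulldozer/eipvalidator
          env:
          - name: SERVER_HOST                 # external Nginx echo server, as in the probe commands above
            value: "10.0.34.146"
          - name: SERVER_PORT
            value: "9002"
          - name: EXPECTED_EIPS               # hypothetical variable name: egress IPs expected in the responses
            value: '{{ splitList " " (GetIPAddress .eipAddresses .Iteration .addrPerIteration) | join "|" }}'
          ports:
          - containerPort: 2112               # assumed port exposing the Prometheus metrics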

    Testing on OpenShift

    AWS imposes limits on the maximum number of IP addresses that can be assigned per network interface on an instance, ranging from 30 to 50 depending on instance size. This means we could test only 30-50 egress IPs per OpenShift worker node. To overcome this limitation, we conducted our tests on a 120-node bare-metal OpenShift cluster in one of our internal labs. Each node is a Dell R640 machine with 80 CPUs, 384 GB of memory, and a 25 Gbps NIC for node/pod networking.

    In a test with 12,000 iterations:

    • 12,000 namespaces were created
    • Each namespace contained 2 pods and 2 egress IPs
    • This resulted in a total of 24,000 pods and 24,000 egress IPs

    Key Findings

    • Startup Latency: To create 24,000 egress IPs (200 per worker node) on the OpenShift cluster, the 99th percentile (P99) startup latency was 8.4 milliseconds. Startup latency refers to the time taken for the application in the pod to begin running, successfully send an HTTP request to the Nginx server, and validate that the HTTP response contains the egress IP (i.e., it includes the request/response round-trip time).
    • Failover Latency: OVN monitors the health of nodes by probing a configured port, typically port 9107. When this probe fails, OVN initiates an egress IP failover to other nodes. To simulate egress IP failover, we blocked the health check port on a worker node that was hosting 200 egress IPs. This triggered OVN to seamlessly fail over the 200 egress IPs to other worker nodes. HTTP requests from some client pods fail during this failover window; once a successful connection is re-established, we record the failover latency for these pods. Initial failover latencies were high (around 41 seconds), prompting a bug report (https://issues.redhat.com/browse/OCPBUGS-32161) that the OVN team successfully addressed, reducing failover latency to a commendable 2 seconds. This is another great example of how diligent scale testing in collaboration with development teams helps establish OpenShift as the industry leader in performance and scalability.
    Disclaimer: Please note the content in this blog post has not been thoroughly reviewed by the Red Hat Developer editorial team. Any opinions expressed in this post are the author's own and do not necessarily reflect the policies or positions of Red Hat.
