Istio Chaos Engineering: I Meant to Do That

April 10, 2018
Don Schenck
Related topics: Containers, Java, Kubernetes, Microservices, Node.js, Service mesh
Related products: Red Hat OpenShift Container Platform

    If you break things before they break, it'll give you a break and they won't break.

    (Clearly, this is management-level material.)

    [This is part six of my ten-week Introduction to Istio Service Mesh series.  My previous article was Part 5: Istio Tracing & Monitoring: Where Are You and How Fast Are You Going?]

    Testing software isn't just challenging, it's important. Testing for correctness is one thing (e.g. "does this function return the correct result?"), but testing for failures in network reliability ("the network is reliable" is the very first of the eight fallacies of distributed computing) is quite another task. One of the challenges is to be able to mimic or inject faults into the system. Doing it in your source code means changing the very code you're testing, which defeats the purpose: the code with the faults added is not the code you want to test, and the code you want to test doesn't have the faults added. Thus the deadly embrace of fault injection and the introduction of Heisenbugs -- defects that disappear when you attempt to observe them.

    Let's see how Istio makes this oh so easy.

    We're All Fine Here Now, Thank You ... How Are You?

    Here's a scenario: Two pods are running our "recommendation" microservice (from our Istio Tutorial), one labeled "v1", the other labeled "v2". As you can see, everything is working just fine:

    (By the way, the number on the right is simply a counter for each pod)

    Everything is working swimmingly. Well... We can't have that now, can we? Let's have some fun and break things -- without changing any source code.

    Give Your Microservice A Break

    Here's the content of the yaml file we'll use to create an Istio route rule that breaks (503, server error) half the time:

    Notice that we're specifying a 503 error be returned 50 percent of the time.

    Here's another screen capture of a curl command loop running against the microservices, after we've implemented the route rule (above) to break things. Notice that once it goes into effect, half of the requests result in 503 errors, regardless of which pod (v1 or v2) is the endpoint:

    To restore normal operation, you simply need to delete the route rule; in our case the command is istioctl delete routerule recommendation-503 -n tutorial. Here, "tutorial" is the name of the Red Hat OpenShift project where this tutorial runs.
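    If it helps to see the idea in miniature, here is a small Python sketch of what "break half the requests without touching the service" means. This is not Istio and not the route rule itself; the recommendation function and the with_abort_fault wrapper are hypothetical stand-ins for the microservice and the sidecar proxy. The point is only that the handler stays unchanged while something in front of it returns the 503s.

```python
import random
from collections import Counter


def recommendation(request_id):
    """Hypothetical stand-in for the unchanged microservice: always succeeds."""
    return 200, f"recommendation response {request_id}"


def with_abort_fault(handler, percent, status, rng):
    """Wrap a handler so that roughly `percent` of calls are aborted before reaching it."""
    def proxied(request_id):
        # rng.random() is uniform in [0, 1); abort when it falls below percent/100.
        if rng.random() * 100 < percent:
            return status, "fault injected"
        return handler(request_id)
    return proxied


rng = random.Random(42)  # fixed seed so repeated runs give the same sequence
proxied = with_abort_fault(recommendation, percent=50, status=503, rng=rng)

# Rough equivalent of the curl loop: call the service many times and tally the status codes.
tally = Counter(proxied(i)[0] for i in range(1000))
print(tally)
```

    Dropping the wrapper and calling recommendation directly is the sketch's version of deleting the route rule: the service was never modified, so there is nothing to undo in the code.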

    Delay Tactics

    Generating 503 errors is helpful when testing the robustness of your system, but anticipating and handling delays is even more impressive -- and probably more common. A slow response from a microservice is like a poison pill that sickens the entire system. Using Istio, you can test your delay-handling code without changing any of your code. In this first example, we are exaggerating the network latency.

    Note that, after testing, you may need (or desire) to change your code, but this is you being proactive instead of reactive. This is the proper code-test-feedback-code-test... loop.

    Here's a route rule that will... Well, you know what? Istio is so easy to use, and the yaml file is so easy to understand, I'll let it speak for itself. I'm sure you'll immediately see what it does:

    Half the time we'll see a seven-second delay. Note that this is not like a sleep command in the source code; Istio is holding the request for seven seconds before completing the round trip. Since Istio supports Jaeger tracing, we can see the effect in this screen capture of the Jaeger UI. Notice the long-running request toward the upper right of the chart -- it took 7.02 seconds:

    This scenario allows you to test and code for network latencies. Of course, removing the route rule removes the delay. Again, I hate to belabor the point, but it's so important. We introduced this fault without changing our source code.
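    The same idea can be sketched for delays. Again, this is a toy model under my own assumptions rather than anything Istio ships: with_delay_fault is a hypothetical wrapper, and the sleep function is passed in so the example can record the delays instead of really waiting seven seconds. Unlike the abort case, every request still succeeds; some of them just arrive late.

```python
import random


def recommendation(request_id):
    """Hypothetical stand-in for the unchanged microservice."""
    return 200, f"recommendation response {request_id}"


def with_delay_fault(handler, percent, delay_seconds, rng, sleep):
    """Wrap a handler so that roughly `percent` of calls are held before being forwarded."""
    def proxied(request_id):
        if rng.random() * 100 < percent:
            sleep(delay_seconds)  # the hold happens outside the service code
        return handler(request_id)
    return proxied


delays = []  # collects the simulated holds; pass time.sleep instead to wait for real
proxied = with_delay_fault(
    recommendation, percent=50, delay_seconds=7.0,
    rng=random.Random(7), sleep=delays.append,
)

statuses = [proxied(i)[0] for i in range(1000)]
print(f"{len(delays)} of {len(statuses)} requests were delayed")
```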

    Never Gonna Give You Up

    Another useful Istio feature related to chaos engineering is the ability to retry a service N more times. The thought is this: requesting a service may result in a 503 error, but a retry may work. Perhaps some odd edge case caused the service to fail the first time. Yes, you want to know about that and fix it. In the meantime, let's keep our system up and running.

    So we want a service to occasionally throw a 503 error, and then have Istio retry the service. Hmmm... If only there was a way to throw a 503 error without changing our code.

    Wait. Istio can do that. We just did that several paragraphs ago.

    Using the following file, we'll have 503 errors being thrown by our "recommendation-v2" service half the time:

    Sure enough, some requests are failing:

    Now we can introduce the Retry feature of Istio, using this nifty configuration:

    We've configured this route rule to retry up to three times, waiting two seconds between attempts. This should reduce (or hopefully eliminate) 503 errors:

    Just to recap: We have Istio tossing 503 errors for half of the requests, and we also have Istio performing three retries after a 503 error. As a result, everything is A-OK. By not giving up, but by using the Retry, we kept our promise.
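    A bit of arithmetic shows why retries help so much. The sketch below assumes every attempt, including each retry, fails independently with the same probability; that is my simplifying assumption, not something the route rules guarantee. The flaky handler and call_with_retries are hypothetical names used only for illustration.

```python
import random


def failure_probability(abort_rate, retries):
    """Chance that the first attempt and every retry all fail, assuming independent attempts."""
    return abort_rate ** (retries + 1)


def call_with_retries(handler, retries):
    """Call handler once, then retry up to `retries` more times while it returns 503."""
    status, body = handler()
    attempts = 1
    while status == 503 and attempts <= retries:
        status, body = handler()
        attempts += 1
    return status, body, attempts


rng = random.Random(2018)


def flaky():
    """Hypothetical service that fails half the time."""
    return (503, "fault injected") if rng.random() < 0.5 else (200, "ok")


results = [call_with_retries(flaky, retries=3) for _ in range(10_000)]
failed = sum(1 for status, _, _ in results if status == 503)

print(f"expected failure rate: {failure_probability(0.5, 3):.4f}")
print(f"simulated failure rate: {failed / len(results):.4f}")
```

    Under that assumption, three retries take a 50 percent failure rate down to 0.5 to the fourth power, or 6.25 percent: reduced a great deal, though not eliminated.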

    Did I mention we're doing all this with no changes to our source code? I may have mentioned that. Two Istio route rules were all it took:

    Never Gonna Let You Down

    Now it's time to turn around and do the opposite; we want a scenario where we're going to wait only a given time span before giving up and deserting our request attempt. In other words, we're not going to slow down everything while waiting for one slow service. Instead, we will bail out of the request and use some sort of fallback position. Don't worry, dear website user... We won't let you down.

    Istio allows us to establish a Timeout limit for a request. If the service takes longer than the Timeout, a 504 (Gateway Timeout) error is returned. Again, this is all done via Istio configuration. We did, however, add a sleep command to our source code (and rebuilt and redeployed the code in a container) to mimic a slow service. There's not really a no-touch way around this; we need slow code.

    After adding the three-second sleep to our recommendation:v2 image (and redeploying the container), we'll add the following timeout rule via an Istio route rule:

    As you can see, we're giving the recommendation service one second before we return a 504 error. After implementing this route rule (and with the three-second sleep built into our recommendation:v2 service), here's what we get:

    Where Have I Heard This Before?

    Repeating, ad nauseam: we are able to set this timeout function with no changes to our source code. The value here is that you can now write your code to respond to a timeout and easily test it using Istio.
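    What might "code that responds to a timeout" look like? Here is a minimal sketch, assuming a caller that treats a 504 as a signal to fall back to a default answer. Everything in it is hypothetical: call_with_timeout only imitates the timeout locally with a thread, and the sleep and timeout are scaled down (0.3 and 0.1 seconds instead of three and one) so the example runs quickly.

```python
import concurrent.futures
import time


def slow_recommendation():
    """Hypothetical slow service; 0.3 s stands in for the three-second sleep."""
    time.sleep(0.3)
    return 200, "recommendation v2"


def call_with_timeout(handler, timeout_seconds):
    """Return the handler's result, or a 504 if it takes longer than the timeout."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(handler)
    try:
        return future.result(timeout=timeout_seconds)
    except concurrent.futures.TimeoutError:
        return 504, "timed out"
    finally:
        executor.shutdown(wait=False)  # do not block the caller on the slow call


def get_recommendation(handler, timeout_seconds):
    """Caller-side logic: use the fallback when the call comes back as a 504."""
    status, body = call_with_timeout(handler, timeout_seconds)
    if status == 504:
        return "fallback: default recommendation"
    return body


print(get_recommendation(slow_recommendation, timeout_seconds=0.1))
print(get_recommendation(slow_recommendation, timeout_seconds=1.0))
```

    With the short timeout the caller gives up and serves the fallback; with the generous one it waits and gets the real response. In the mesh, the timeout side of this would come from the route rule, leaving only the fallback logic in your code.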

    All Together Now

    Injecting chaos into your system, via Istio, is a powerful way to push your code to the limits and test your robustness. Fallbacks, bulkheads, and circuit breaker patterns are combined with Istio's fault injection, delays, retries, and timeouts to support your efforts to build fault-tolerant, cloud-native systems. These technologies, combined with Kubernetes and Red Hat OpenShift, give you the tools needed to move into the future.

    And to give yourself a break.


    All articles in the "Introduction to Istio" series:

    • Part 1: Introduction to Istio; It Makes a Mesh of Things
    • Part 2: Istio Route Rules: Telling Service Requests Where to Go
    • Part 3: Istio Circuit Breaker: How to Handle (Pool) Ejection
    • Part 4: Istio Circuit Breaker: When Failure Is an Option
    • Part 5: Istio Tracing & Monitoring: Where Are You and How Fast Are You Going?
    • Part 6: Istio Chaos Engineering: I Meant to Do That
    • Part 7: Istio Dark Launch: Secret Services
    • Part 8: Istio Smart Canary Launch: Easing into Production
    • Part 9: Istio Egress: Exit Through the Gift Shop
    • Part 10: Istio Service Mesh Blog Series Recap
    Last updated: March 24, 2023
