Istio Tracing & Monitoring: Where Are You and How Fast Are You Going?

The Heisenberg Uncertainty Principle states that you cannot measure an object's position and velocity at the same time. If it's moving, it's not in a location. If it's in a location, then it has no velocity.

Thanks to some awesome open-source software, our microservices running in Red Hat OpenShift (using Kubernetes) can report both their performance and their health. Granted, they can't violate the Uncertainty Principle, but they can help bring certainty to your cloud-native applications. Istio brings tracing and monitoring to your system with very little effort, helping you keep things humming.

[This is part five of my ten-week Introduction to Istio Service Mesh series. My previous article was Part 4: Istio Circuit Breaker: When Failure Is an Option.]

Let's Come To Terms

First, let's agree on some basic terminology.

Tracing refers to the logging of system activity. That sounds pretty broad, but in fact, one of the basic rules of tracing is that data are dumped into a tracing data store without regard to formatting. The consumer of the data is responsible for finding and parsing the relevant data. For Istio, Jaeger is the tracing system used; it follows the OpenTracing Data Model.

Traces are the end-to-end data that make up a request, or unit of work. For example, from the moment a user clicks the button on the web page until the data are returned, and all the microservices in between. One trace represents (or models) that round trip. In the Jaeger UI, traces are displayed with a time axis. An analogy might be an entire chain, made up of individual links. Traces are made up of one or more spans.

Spans are the slices of time represented by the beginning to the end of a unit of work. Using our chain analogy, each span represents a link in the chain. Spans can have zero, one, or many child spans. At the root level, the initial span's elapsed time will equal the elapsed time of the trace that owns it.

Monitoring is the act of watching your system, by a UI and/or automation. Tracing data drives monitoring. Monitoring in Istio is provided by Prometheus and includes a UI. Prometheus supports automated monitoring via Alerts and Alert Managers.

Dropping Breadcrumbs

To implement tracing, the application needs to create a collection of "Spans". Those "Spans" will be exported to Jaeger, so it can create the visualization of the tracing. A "Span" marks, among other data, an operation name, a start timestamp, a finish timestamp. The propagation of the "Span" is done by forwarding Jaeger specific HTTP Request Headers from inbound requests to outbound requests. Depending on your development language and/or your approach, this may require a small modification to your source code. The following Java (using Spring Boot) code gives an example of adding B3 (Zipkin-style) headers to your request in the Spring configuration class:

Here are the header settings used:

If you're using Java, there is a no-code approach that simply requires a few lines added to your Maven POM file and setting some environment variables. These lines added to your POM.XML file will inject the Jaeger Tracer Resolver:

In the docker build configuration file (Dockerfile), the environment variables are set:

Now that everything is wired up, our microservices will start emitting tracing data.

Paradise By The Dashboard: Lite

A simple dashboard, built using Grafana, is included with your Istio installation. With everything up and running in our Red Hat OpenShift PaaS (for this blog post I'm using minishift to run Red Hat OpenShift and Kubernetes) we can open the dashboard by running this command:

open "$(minishift openshift service grafana -u)/d/1/istio-dashboard?refresh=5&ordId=1"

The Grafana Istio dashboard gives you quick insight into how your system is doing. Here's a screen capture of just a small part of the dashboard:

We can see that the "customer" microservice is calling to "preference" version 1, and preference is calling to both version 1 and version 2 of "recommendation". The information available on the Grafana dashboard includes a Dashboard Row with high-level metrics (e.g. Global Request Volume, success rates, 4xx errors), a Server Mesh view with charts for each service, and a Services row with details about each container for each service.

Dig A Little Deeper In The Well

With the proper tracing in place, Istio's out-of-the-box experience allows us to dive deeper into our system's performance. Using the Jaeger UI, we can view traces, see how far and deep they go, and get an idea of where performance might be lagging. Again, using minishift to run Red Hat OpenShift, I was able to use the following command to get to see the Jaeger UI:

minishift openshift service jaeger-query --in-browser

Examining this trace, we can see:

Seven spans.
A total elapsed time of 6.99ms.
The last service in the chain, "recommendation", took .69ms.

This type of graph gives you a visible understanding how just one underperforming service in a chain can hobble your entire system.

Things get very interesting when you run multiple instances of a microservice. I ran the command oc scale --replicas=2 deployment/recommendation-v2 to get two "recommendation:v2" microservices running. Here are my pods after the command:

Back at the Jaeger UI, if I expand the "recommendation" span, I can see the exact pod that is being routed to. This is information you want to know in the event that a particular pod is underperforming. Notice the node_id in this panel:

Where's Everybody Going?

Finally, using the Prometheus UI, I can see what I expect: two out of three requests are being routed to v2 of the recommendation pods; they outnumber the v1 by 2:1 at this point, so this makes sense. As pods are scaled up and down, this graph will display the results. This graph will be particularly helpful when you're performing a Canary Deployment (that will be covered in this blog three weeks from now).

We're Just Getting Started

This is just (as they say) scratching the surface. Jaeger, Grafana, and Prometheus have depths of information that can fill books. This blog post is simply intended to get you headed in the right direction and to let you know what is possible using Istio.

Remember: all this comes built into Istio. Depending on your language (Java, for example) and framework (Spring Boot), you could implement all this with no changes to your source code. Other languages (I'm looking at you, Nodejs, and C#) may require some small code changes. But since tracing is a requirement of a robust cloud-native system, you're going to be making those changes whether you use Istio or not. Why not leverage your effort? For free.

Then you'll know just where you are, and how fast you're going.

With certainty.

All articles in the "Introduction to Istio" series:

Part 1: Introduction to Istio; It Makes a Mesh of Things
Part 2: Istio Route Rules: Telling Service Requests Where to Go
Part 3: Istio Circuit Breaker: How to Handle (Pool) Ejection
Part 4: Istio Circuit Breaker: When Failure Is an Option
Part 5: Istio Tracing & Monitoring: Where Are You and How Fast Are You Going?
Part 6: Istio Chaos Engineering: I Meant to Do That
Part 7: Istio Dark Launch: Secret Services
Part 8: Istio Smart Canary Launch: Easing into Production
Part 9: Istio Egress: Exit Through the Gift Shop
Part 10: Istio Service Mesh Blog Series Recap

Last updated: February 24, 2022

Istio Tracing & Monitoring: Where Are You and How Fast Are You Going?

Share:

Let's Come To Terms

Dropping Breadcrumbs

Paradise By The Dashboard: Lite

Dig A Little Deeper In The Well

Where's Everybody Going?

We're Just Getting Started

All articles in the "Introduction to Istio" series:

The new Multi-Cluster Alerting UI: A developer preview

How to run MicroShift with OpenShift Local and Podman Desktop

Essential Buildpack resources for Node.js developers

How to fine-tune LLMs with Kubeflow Training Operator

Deliver generative AI at scale with NVIDIA NIM on OpenShift AI

Products

Build

Quicklinks

Communicate

RED HAT DEVELOPER

Red Hat legal and privacy links

Red Hat legal and privacy links

Report a website issue