Metrics and traces correlation in Kiali

February 18, 2020
Joel Takvorian
Related topics: Kubernetes, Microservices, Service Mesh
Related products: Developer Tools

    Metrics, traces, and logs might be the Three Pillars of Observability, as you've certainly already heard. This mantra helps us focus our mindset around observability, but it is not a religion. "There is so much more data that can help us have insight into our running systems," said Frederic Branczyk at KubeCon last year.

    These three kinds of signals each have their specifics, but they also have common denominators that we can generalize. They can all be placed on a timeline and they all originate from a workload, so they are timed and sourced, which is a good start for enabling correlation. If there's anything as important as knowing the signals that a system can emit, it's knowing the relationships between those signals and being able to correlate one with another, even when they're not strictly of the same nature. Ultimately, we can postulate that any sort of signal that is timed and sourced is a good candidate for correlation as well, even if we don’t have hard links between them.
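    To make "timed and sourced" concrete, here is a minimal Python sketch of correlation without hard links: signals are matched only on their source and on a time window. The `Signal` type, the `correlate` function, and the workload names are hypothetical and exist only for this illustration; they are not part of Kiali or of any monitoring API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    kind: str         # "metric", "trace", "log", or anything else
    source: str       # workload that emitted the signal
    timestamp: float  # seconds on a shared timeline


def correlate(anchor, signals, window_seconds):
    """Return signals from the anchor's source within +/- window_seconds of it."""
    return [
        s for s in signals
        if s is not anchor
        and s.source == anchor.source
        and abs(s.timestamp - anchor.timestamp) <= window_seconds
    ]


signals = [
    Signal("metric", "reviews-v1", 100.0),
    Signal("trace", "reviews-v1", 102.5),   # same source, close in time
    Signal("log", "reviews-v1", 250.0),     # same source, too far in time
    Signal("trace", "ratings-v1", 101.0),   # close in time, other source
]

related = correlate(signals[0], signals, window_seconds=5.0)
print([(s.kind, s.source) for s in related])
```

    Only the trace emitted by the same workload a few seconds later is returned, even though nothing links it explicitly to the metric.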

    This is, of course, nothing new. Correlation has always been possible; what is really at stake is making it easier, and hence cheaper. What makes correlation easier today? I can see at least one pattern that helps, and that we see more and more in monitoring systems: automatic and consistent sourcing of incoming signals.

    When you use Prometheus in Kubernetes, the Kubernetes service discovery might be enabled and configured for label mapping. As the name suggests, this mechanism maps pods' existing labels to Prometheus labels. In other words, it forwards source context into metrics, which allows filters and aggregations based on that information. This setup contributes to automatic and consistent sourcing. Loki, for instance, does the same for logs. If you can define a context for metrics search and reuse that same context for logs search, then guess what you have? Easier correlation.
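    As a rough illustration of reusing one context for both searches, the sketch below renders the same label set into a PromQL query and a LogQL query. The label names (`app`, `namespace`), their values, and the metric name `http_requests_total` are assumptions; they depend entirely on how label mapping is configured in a given cluster. Label values are not escaped here, which a real tool would need to handle.

```python
def selector(context):
    """Render a label context as a {key="value",...} selector."""
    pairs = ",".join(f'{key}="{value}"' for key, value in sorted(context.items()))
    return "{" + pairs + "}"


# One source context, defined once (hypothetical labels and values).
context = {"namespace": "bookinfo", "app": "reviews"}

# The same selector filters metrics in Prometheus...
promql = f"rate(http_requests_total{selector(context)}[5m])"
# ...and log streams in Loki.
logql = f'{selector(context)} |= "error"'

print(promql)
print(logql)
```

    Because both systems were sourced consistently, the context is written once and the two queries cannot drift apart.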

    But that's just a step, not the end of the journey.

    New correlation feature in Kiali

    In Kiali, our observability console for Istio, we recently started working on correlation. We still have a long way to go, but we're definitely involved. In a previous post, I described how Kiali can help with troubleshooting by navigating between screens (graph, logs, metrics, and traces) while always keeping an active context. We wanted to do more, such as visually correlating traces and metrics, so that when we see an oddly behaving metric we can try to relate it to traces or, the other way around, analyze how metrics behave near high-latency traces.

    To do that, each metric chart in Kiali now has a Span duration legend item that, when clicked, shows the spans on that chart, as you can see in Figure 1.

    Figure 1: Span duration plots displayed along with Istio request duration metric

    Why spans and not traces? This chart is a service-centric view. We only want to show what is strictly related to the service to better correlate with the displayed metrics, while a trace would encompass calls from other services as well. But rest assured, we can jump from a span to its trace, as shown in Figure 2. Kiali now integrates its own traces view along with external links to the Jaeger UI.

     

    Figure 2: Kiali displaying trace details and metadata

    This setup is nice because we can now correlate, for instance, the Istio response time metric with actual traces and view all the metadata associated with a trace, which I’m sure will be a typical scenario in troubleshooting high latencies in Kiali. But it's not only about response time: Kiali can monitor non-Istio metrics as well, such as JVM memory. So, we could also correlate a memory increase with actual traces as shown in Figure 3 (or any other metric).

    Figure 3: Kiali chart showing a spike in spans clearly correlated with an increase in memory and threads used

    Potential pitfall

    There’s a pitfall, though: The spans displayed are limited in number. When the volume of traces is high, this limit is quickly reached. Sampling strategies can be configured with Jaeger to limit the number of ingested traces, but the problem remains: We might miss relevant data. Troubleshooting high latencies often means looking at p99 latencies, or p99.9, or even max. The sharper the look we want to take, the more complete the dataset we need to work from.
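    A small worked example shows why tail percentiles suffer first. The numbers below are contrived for the illustration (1,000 spans, ten of them slow, and a limit that keeps every 100th span); they do not come from Kiali or Jaeger.

```python
def percentile(durations, permille):
    """Nearest-rank percentile; permille=990 means p99, 999 means p99.9."""
    ordered = sorted(durations)
    rank = -(-permille * len(ordered) // 1000)  # ceiling division on integers
    return ordered[max(rank, 1) - 1]


# 1000 spans in arrival order: 10 ms each, except ten slow ones at 2000 ms.
durations_ms = [2000 if i % 100 == 57 else 10 for i in range(1000)]

# Hypothetical limit: only every 100th span is kept.
sampled_ms = durations_ms[::100]

for label, data in (("complete", durations_ms), ("sampled", sampled_ms)):
    print(label, "p99:", percentile(data, 990),
          "p99.9:", percentile(data, 999), "max:", max(data))
```

    On the complete dataset, p99.9 and max both report 2000 ms. On the sampled one, every slow span happens to be dropped, so all three figures report 10 ms and the latency problem is invisible.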

    Today, Kiali tries to show the most relevant spans first, such as the ones with errors or high latency. This tactic is similar to what we can do with tail-based sampling, except Kiali does it at query time. This approach is also imperfect because it makes assumptions about what is relevant, and in any case it will still reach a limit at some point.
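    The idea of ranking spans at query time can be sketched as follows. This is not Kiali's actual implementation: the `Span` type, the `most_relevant` function, and the ranking rule (errors first, then longest duration) are assumptions chosen to mirror the description above.

```python
from dataclasses import dataclass
import heapq


@dataclass(frozen=True)
class Span:
    span_id: str
    duration_ms: float
    has_error: bool


def most_relevant(spans, limit):
    """Keep at most `limit` spans: errors first, then the slowest ones."""
    return heapq.nlargest(
        limit, spans, key=lambda s: (s.has_error, s.duration_ms)
    )


spans = [
    Span("a", 12.0, False),
    Span("b", 950.0, False),
    Span("c", 30.0, True),
    Span("d", 15.0, False),
    Span("e", 400.0, False),
]

shown = most_relevant(spans, limit=3)
print([s.span_id for s in shown])
```

    With a limit of three, the failed span `c` is kept despite its short duration, followed by the two slowest spans. Spans `a` and `d` are dropped, which is exactly the kind of assumption about relevance that can turn out to be wrong.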

    There are several ideas around aggregation that we could consider tackling. Some tools apparently do this already, as shown by Pinterest, and there are several possible approaches (keeping in mind that Kiali is an API-consuming tool that at the moment doesn’t come with persistent storage). Handling traces is still an open field in Kiali and people are welcome to contribute!

    Correlation with exemplars

    When it comes to correlating traces and metrics, there is another option that may come to mind: deep linking metrics and traces through exemplars (see the screenshot in Figure 4).

    Figure 4: Screen capture from a video featuring Rob Skillington at KubeCon 2019, San Diego, using Grafana

    The details are being formalized in the OpenMetrics specification. The idea is to enrich the metrics scraping endpoints with trace IDs associated with one or more metrics. Each such trace is an exemplar: a single representative among potentially many others.

    This ID will not be a Prometheus label, so that it does not affect metric cardinality. The implementation in Prometheus is not finished yet. In Jaeger, we can imagine that the presence of exemplars would influence sampling decisions, but this issue is not relevant today. This is definitely a hot topic among the Prometheus, Grafana, and tracing communities. We are following it with interest for Kiali.
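    To show why an exemplar does not affect cardinality, the sketch below renders a sample line in the style of the OpenMetrics text format, where the exemplar follows a `#` after the sample value. The helper functions, the metric name, and the trace IDs are hypothetical, and the rendering is simplified (no timestamps, no escaping), so treat it as an illustration rather than a conforming encoder.

```python
def render_labels(labels):
    """Render labels as {key="value",...} in a stable order."""
    return "{" + ",".join(f'{k}="{v}"' for k, v in sorted(labels.items())) + "}"


def series_key(name, labels):
    """Identity of a time series: metric name plus labels, nothing else."""
    return name + render_labels(labels)


def render_sample(name, labels, value, exemplar=None):
    """Render one sample line, with an optional (labels, value) exemplar."""
    line = f"{series_key(name, labels)} {value}"
    if exemplar is not None:
        exemplar_labels, exemplar_value = exemplar
        line += f" # {render_labels(exemplar_labels)} {exemplar_value}"
    return line


name, labels = "request_duration_seconds_bucket", {"le": "1"}

# Two scrapes of the same series, each pointing at a different trace.
first = render_sample(name, labels, 11, exemplar=({"trace_id": "abc123"}, 0.67))
second = render_sample(name, labels, 12, exemplar=({"trace_id": "def456"}, 0.81))

print(first)
print(second)
```

    Both lines belong to the same series, `request_duration_seconds_bucket{le="1"}`. The trace ID changes from one scrape to the next, but since it sits outside the label set, no new series is created.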

    However, again, questions might be raised about how representative a single exemplar is among many traces. Correlation is not done just for the sake of correlation, but because it helps solve a real problem. Exemplar linking will help to spot some of these problems, or to figure out which business or technical processes are involved, while looking at metrics. But there's more that we can do in the field of trace and span aggregation to better assess the health of a system and to troubleshoot, not in opposition to exemplars but as a complement in the debugger's toolset.

    So, what's next?

    We will continue to work on correlations and traces, such as considering more signals and easing the troubleshooting path. And why not analytics as well? If you have any suggestions or comments, do not hesitate to get in touch. Remember that Kiali is an open source project and you're welcome to contribute with code, or ideas, or both.

    Thanks to Simon Pasquier, Gary Brown, Alissa Bonas and Juca Paixão Kröhling for reviewing and sharing ideas.

    Last updated: February 13, 2024
