Skip to main content
Redhat Developers  Logo
  • AI

    Get started with AI

    • Red Hat AI
      Accelerate the development and deployment of enterprise AI solutions.
    • AI learning hub
      Explore learning materials and tools, organized by task.
    • AI interactive demos
      Click through scenarios with Red Hat AI, including training LLMs and more.
    • AI/ML learning paths
      Expand your OpenShift AI knowledge using these learning resources.
    • AI quickstarts
      Focused AI use cases designed for fast deployment on Red Hat AI platforms.
    • No-cost AI training
      Foundational Red Hat AI training.

    Featured resources

    • OpenShift AI learning
    • Open source AI for developers
    • AI product application development
    • Open source-powered AI/ML for hybrid cloud
    • AI and Node.js cheat sheet

    Red Hat AI Factory with NVIDIA

    • Red Hat AI Factory with NVIDIA is a co-engineered, enterprise-grade AI solution for building, deploying, and managing AI at scale across hybrid cloud environments.
    • Explore the solution
  • Learn

    Self-guided

    • Documentation
      Find answers, get step-by-step guidance, and learn how to use Red Hat products.
    • Learning paths
      Explore curated walkthroughs for common development tasks.
    • Guided learning
      Receive custom learning paths powered by our AI assistant.
    • See all learning

    Hands-on

    • Developer Sandbox
      Spin up Red Hat's products and technologies without setup or configuration.
    • Interactive labs
      Learn by doing in these hands-on, browser-based experiences.
    • Interactive demos
      Click through product features in these guided tours.

    Browse by topic

    • AI/ML
    • Automation
    • Java
    • Kubernetes
    • Linux
    • See all topics

    Training & certifications

    • Courses and exams
    • Certifications
    • Skills assessments
    • Red Hat Academy
    • Learning subscription
    • Explore training
  • Build

    Get started

    • Red Hat build of Podman Desktop
      A downloadable, local development hub to experiment with our products and builds.
    • Developer Sandbox
      Spin up Red Hat's products and technologies without setup or configuration.

    Download products

    • Access product downloads to start building and testing right away.
    • Red Hat Enterprise Linux
    • Red Hat AI
    • Red Hat OpenShift
    • Red Hat Ansible Automation Platform
    • See all products

    Featured

    • Red Hat build of OpenJDK
    • Red Hat JBoss Enterprise Application Platform
    • Red Hat OpenShift Dev Spaces
    • Red Hat Developer Toolset

    References

    • E-books
    • Documentation
    • Cheat sheets
    • Architecture center
  • Community

    Get involved

    • Events
    • Live AI events
    • Red Hat Summit
    • Red Hat Accelerators
    • Community discussions

    Follow along

    • Articles & blogs
    • Developer newsletter
    • Videos
    • Github

    Get help

    • Customer service
    • Customer support
    • Regional contacts
    • Find a partner

    Join the Red Hat Developer program

    • Download Red Hat products and project builds, access support documentation, learning content, and more.
    • Explore the benefits

Metrics and traces correlation in Kiali

February 18, 2020
Joel Takvorian
Related topics:
KubernetesMicroservicesService mesh
Related products:
Developer Toolset

    Metrics, traces, and logs might be the Three Pillars of Observability, as you've certainly already heard. This mantra helps us focus our mindset around observability, but it is not a religion. "There is so much more data that can help us have insight into our running systems," said Frederic Branczyk at KubeCon last year.

    These three kind of signals do have their specificities, but they also have common denominators that we can generalize. They could all appear on a virtual timeline and they all originate from a workload, so they are timed and sourced, which is a good start for enabling correlation. If there's anything as important as knowing the signals that a system can emit, it's knowing the relationships between those signals and being able to correlate one with another, even when they're not strictly of the same nature. Ultimately, we can postulate that any sort of signal that is timed and sourced is a good candidate for correlation as well, even if we don’t have hard links between them.

    This fact is, of course, not something new. Correlation has always been possible, but the true stake is to make it easier, and hence cheaper. What makes correlation easier today? I can see at least one pattern that helps, and that we see more and more in monitoring systems: An automatic and consistent sourcing of incoming signals.

    When you use Prometheus in Kubernetes, the Kubernetes service discovery might be enabled and configured for label mapping. As the name suggests, this mechanism maps pods' existing labels to Prometheus labels, or in other words, it forwards source context into metrics (hence, allowing filters and aggregations based on that information). This setup participates in automatic and consistent sourcing. Loki, for instance, has the same for logs. If you can define a context for metrics search and reuse that same context for logs search, then guess what you have? Easier correlation.

    But that's just a step, not the end of the journey.

    New correlation feature in Kiali

    In Kiali, our observability console for Istio, we recently started work regarding correlation. We still have a long way to go, but we're definitely involved. In a previous post, I described how Kiali can help with troubleshooting by navigating between screens (graph, logs, metrics, and traces) while always keeping an active context. We wanted to do more, such as visually correlating traces and metrics, so that when we're seeing an oddly behaving metric we can try to relate it with traces—or the other way around, analyze metrics behavior near high-latency traces.

    In order to do that, each metric chart in Kiali has now a Span duration legend item that, when clicked, shows the spans on that chart as you can see in Figure 1.

    Span duration plots displayed along with Istio request duration metric
    Span duration plots displayed along with Istio request duration metric

    Why spans and not traces? This chart is a service-centric view. We only want to show what is strictly related to the service to better correlate with the displayed metrics, while a trace would encompass calls from other services as well. But be reassured, we can jump from a span to its trace as shown in Figure 2. Kiali now integrates its own traces view along with external links to the Jaeger UI.

     

    Kiali displaying trace details and metadata
    Trace details and meta-data

    This setup is nice because we can now correlate, for instance, the Istio response time metric with actual traces and view all the metadata associated with a trace, which I’m sure will be a typical scenario in troubleshooting high latencies in Kiali. But it's not only about response time: Kiali can monitor non-Istio metrics as well, such as JVM memory. So, we could also correlate a memory increase with actual traces as shown in Figure 3 (or any other metric).

    Kiali chart correlating a memory increase with actual traces.
    Spans spike clearly correlated with an increase of memory and threads used

    Potential pitfall

    There’s a pitfall, though: The spans displayed are limited in number. When the volumetry is high, this limit is quickly reached. Sampling strategies can be configured with Jaeger to limit the amount of ingested traces, but the problem remains: We might miss relevant data. Troubleshooting high latencies often means looking at p99 latencies, or p99.9, or even max. The more we want to have a sharp look, the more we need to work from a complete dataset basis.

    Today, Kiali tries to show the most relevant spans first, such as the ones with errors or high latency. This tactic is similar to what we can do with tail-based sampling, except Kiali does it at query time. This setup is also perfectible because it makes assumptions regarding what is relevant, and anyway, it will still reach a limit at some point.

    There are several ideas around aggregation that we can consider tackling. Some tools apparently do this already, like shown here by Pinterest, and there are several possible approaches (keeping in mind that Kiali is an API-consuming tool that at the moment doesn’t come with persistent storage). Handling traces is still an open field in Kiali and people are welcome to contribute!

    Correlation with exemplars

    When it comes to correlating traces and metrics, there is another option that may come to mind: deep linking metrics and traces through exemplars (see the screenshot in Figure 4).

    Screen capture of the video above, featuring Rob Skillington at KubeCon 2019, San Diego, using Grafana
    Screen capture of the video above, featuring Rob Skillington at KubeCon 2019, San Diego, using Grafana

    The details are being formalized in the OpenMetrics specifications. The idea is to enrich the metrics scraping endpoints with trace IDs associated with one or more metrics. That trace is an exemplar (just a single one, among potentially many others).

    This ID will not be a Prometheus label to avoid impacting metric cardinality. The implementation in Prometheus is not finished yet. In Jaeger, we can imagine that the presence of exemplars would influence sampling decisions, but this issue is not relevant today. This is definitely a hot topic among Prometheus/Grafana/tracing communities. We are following it with interest for Kiali.

    However, again, questions might be raised about the representativeness of a single exemplar among many traces. Correlation is not done just for the sake of correlation, but because it helps solve a real problem. Exemplar linking will help to spot some of them or figure out some business/technical processes involved while looking at metrics. But, there's more that we can do in the field of trace/span aggregation in order to better figure out the health of a system and to troubleshoot. (Not as opposed to exemplars, but as a complement in the debugger's toolset.)

    So, what's next?

    We will continue to work on correlations and traces, such as considering more signals and easing the troubleshooting path. And why not analytics as well? If you have any suggestions or comments, do not hesitate to get in touch. Remember that Kiali is an open source project and you're welcome to contribute with code, or ideas, or both.

    Thanks to Simon Pasquier, Gary Brown, Alissa Bonas and Juca Paixão Kröhling for reviewing and sharing ideas.

    Last updated: February 13, 2024

    Recent Posts

    • Tekton joins the CNCF as an incubating project

    • Federated identity across the hybrid cloud using zero trust workload identity manager

    • Confidential virtual machine storage attack scenarios

    • Introducing virtualization platform autopilot

    • Integrate zero trust workload identity manager with Red Hat OpenShift GitOps

    Red Hat Developers logo LinkedIn YouTube Twitter Facebook

    Platforms

    • Red Hat AI
    • Red Hat Enterprise Linux
    • Red Hat OpenShift
    • Red Hat Ansible Automation Platform
    • See all products

    Build

    • Developer Sandbox
    • Developer tools
    • Interactive tutorials
    • API catalog

    Quicklinks

    • Learning resources
    • E-books
    • Cheat sheets
    • Blog
    • Events
    • Newsletter

    Communicate

    • About us
    • Contact sales
    • Find a partner
    • Report a website issue
    • Site status dashboard
    • Report a security problem

    RED HAT DEVELOPER

    Build here. Go anywhere.

    We serve the builders. The problem solvers who create careers with code.

    Join us if you’re a developer, software engineer, web designer, front-end designer, UX designer, computer scientist, architect, tester, product manager, project manager or team lead.

    Sign me up

    Red Hat legal and privacy links

    • About Red Hat
    • Jobs
    • Events
    • Locations
    • Contact Red Hat
    • Red Hat Blog
    • Inclusion at Red Hat
    • Cool Stuff Store
    • Red Hat Summit
    © 2026 Red Hat

    Red Hat legal and privacy links

    • Privacy statement
    • Terms of use
    • All policies and guidelines
    • Digital accessibility

    Chat Support

    Please log in with your Red Hat account to access chat support.