Build a monitoring infrastructure for your Jaeger installation

Build a monitoring infrastructure for your Jaeger installation

When you deploy Jaeger in a production configuration, it makes sense to keep an eye on your Jaeger instance to see if it is performing as expected. After all, an outage in Jaeger means that tracing data is being lost, making it challenging to understand possible problems happening within your production applications.

This article guides you in building a monitoring infrastructure for your Jaeger installation. We’ll first provide a link to ready-made resources for those who just want a quick way to monitor Jaeger.

In the second part, we’ll go deeper and see how we can install all the tools in a Kubernetes cluster, including Prometheus, Grafana, and Jaeger itself, as well as the required tools to customize the alert rules and dashboards using the official Jaeger monitoring mixin.

TL;DR: If you already have a working environment involving Grafana, Prometheus, and Jaeger, you might only need to know where the base dashboards and alert definitions are located. Here they are:

If you are already familiar with mixins, the official Jaeger monitoring mixin is available in our main source repository.

Prerequisites

This guide assumes you have admin access to a Kubernetes cluster. An easy way to get a Kubernetes cluster for testing purposes is running it locally via Minikube.

This guide also requires you to have jsonnet and jb (jsonnet-bundler). They can be installed on your local machine using go get, as follows:

$ go get github.com/google/go-jsonnet/cmd/jsonnet
$ go get github.com/jsonnet-bundler/jsonnet-bundler/cmd/jb

Everything you need to grow your career.

With your free Red Hat Developer program membership, unlock our library of cheat sheets and ebooks on next-generation application development.

SIGN UP

Installing Prometheus, Alertmanager, and Grafana

Installing Prometheus on Kubernetes can be done in several ways. One way to install it is via the project kube-prometheus, but the Prometheus Operator can be directly used instead, as well the community Helm chart for the Prometheus Operator. For this guide, we’ll use kube-prometheus to get a Prometheus, Alertmanager, and Grafana instance.

First, let’s use jb to generate a base jsonnet file describing our installation, adding kube-prometheus as a dependency:

$ jb init
$ jb install \
  github.com/jaegertracing/jaeger/monitoring/jaeger-mixin@master \
  github.com/grafana/jsonnet-libs/grafana-builder@master \
  github.com/coreos/kube-prometheus/jsonnet/kube-prometheus@master

Once that is done, we should have a manifest file named jsonnetfile.json similar to this one:

{
    "dependencies": [
        {
            "name": "mixin",
            "source": {
                "git": {
                    "remote": "https://github.com/jpkrohling/jaeger",
                    "subdir": "monitoring/mixin"
                }
            },
            "version": "1668-Move-Jaeger-mixing-to-main-repo"
        },
        {
            "name": "grafana-builder",
            "source": {
                "git": {
                    "remote": "https://github.com/grafana/jsonnet-libs",
                    "subdir": "grafana-builder"
                }
            },
            "version": "master"
        },
        {
            "name": "kube-prometheus",
            "source": {
                "git": {
                    "remote": "https://github.com/coreos/kube-prometheus",
                    "subdir": "jsonnet/kube-prometheus"
                }
            },
            "version": "master"
        }
    ]
}

The install command should also have created vendor directory with all the jsonnet dependencies. All we need now is a deployment descriptor: create a file named monitoring-setup.jsonnet, with the following contents.

local kp =
  (import 'kube-prometheus/kube-prometheus.libsonnet') +
  {
    _config+:: {
      namespace: 'monitoring',
    },
  };

{ ['00namespace-' + name + '.json']: kp.kubePrometheus[name] for name in std.objectFields(kp.kubePrometheus) } +
{ ['0prometheus-operator-' + name + '.json']: kp.prometheusOperator[name] for name in std.objectFields(kp.prometheusOperator) } +
{ ['node-exporter-' + name + '.json']: kp.nodeExporter[name] for name in std.objectFields(kp.nodeExporter) } +
{ ['kube-state-metrics-' + name + '.json']: kp.kubeStateMetrics[name] for name in std.objectFields(kp.kubeStateMetrics) } +
{ ['alertmanager-' + name + '.json']: kp.alertmanager[name] for name in std.objectFields(kp.alertmanager) } +
{ ['prometheus-' + name + '.json']: kp.prometheus[name] for name in std.objectFields(kp.prometheus) } +
{ ['prometheus-adapter-' + name + '.json']: kp.prometheusAdapter[name] for name in std.objectFields(kp.prometheusAdapter) } +
{ ['grafana-' + name + '.json']: kp.grafana[name] for name in std.objectFields(kp.grafana) }

With that, we are ready to generate our deployment manifests and apply them:

$ jsonnet -J vendor -cm manifests/ monitoring-setup.jsonnet
$ kubectl apply -f manifests/

It might be the case that the Custom Resource Definitions (CRDs) aren’t ready when they are first used, causing messages like:

no matches for kind "ServiceMonitor" in version "monitoring.coreos.com/v1"

In that case, simply apply the manifests again, since they are idempotent.

After a couple of minutes, there should be a few Deployment and Statefulset resources available:

$ kubectl get deployments -n monitoring 
NAME                  READY     UP-TO-DATE   AVAILABLE   AGE
grafana               1/1       1            1           56s
kube-state-metrics    1/1       1            1           56s
prometheus-adapter    1/1       1            1           56s
prometheus-operator   1/1       1            1           57s

$ kubectl get statefulsets -n monitoring
NAME                READY     AGE
alertmanager-main   3/3       60s
prometheus-k8s      2/2       50s

Let’s check that Prometheus is up by connecting directly to the service’s port:

$ kubectl port-forward -n monitoring service/prometheus-k8s 9090:9090
$ firefox http://localhost:9090

Perform the same check for Grafana, where the default credentials are admin for both the username and password.

$ kubectl port-forward -n monitoring service/grafana 3000:3000
$ firefox http://localhost:3000

Installing Jaeger

The Jaeger Operator is installed by default in the “observability” namespace. For this guide, let’s keep it together with Prometheus and Grafana in the “monitoring” namespace. To accomplish that, we’ll get the manifests with curl and replace observability with monitoring, feeding the outcome to kubectl:

$ kubectl create -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/v1.13.1/deploy/crds/jaegertracing_v1_jaeger_crd.yaml
$ curl -s https://raw.githubusercontent.com/jaegertracing/jaeger-operator/v1.13.1/deploy/service_account.yaml | sed 's/observability/monitoring/gi' | kubectl apply -f -
$ curl -s https://raw.githubusercontent.com/jaegertracing/jaeger-operator/v1.13.1/deploy/role.yaml | sed 's/observability/monitoring/gi' | kubectl apply -f -
$ curl -s https://raw.githubusercontent.com/jaegertracing/jaeger-operator/v1.13.1/deploy/role_binding.yaml | sed 's/observability/monitoring/gi' | kubectl apply -f -
$ curl -s https://raw.githubusercontent.com/jaegertracing/jaeger-operator/v1.13.1/deploy/operator.yaml | sed 's/observability/monitoring/gi' | kubectl apply -f -

At the time of writing, the latest version was v1.13.1, so change the URLs above to match the desired version. After a few moments, the Jaeger Operator should be up and running:

$ kubectl get deployment/jaeger-operator -n monitoring
NAME              READY     UP-TO-DATE   AVAILABLE   AGE
jaeger-operator   1/1       1            1           23s

Once the Jaeger Operator is ready, it’s time to create a Jaeger instance called tracing:

kubectl apply -f - <<EOF
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: tracing
  namespace: monitoring
EOF

And in a few moments, a Jaeger instance should be ready:

$ kubectl get deployment/tracing -n monitoring 
NAME      READY     UP-TO-DATE   AVAILABLE   AGE
tracing   1/1       1            1           17s

$ kubectl get ingress -n monitoring 
NAME            HOSTS     ADDRESS           PORTS     AGE
tracing-query   *         192.168.122.181   80        26s

We can access the Jaeger UI by opening the given IP address in a web browser. In this example, it’s http://192.168.122.181/, but your IP will likely be different.

Now that we have everything running, let’s install our business application, instrumented to create spans for each HTTP request it receives:

$ kubectl apply -n default -f https://raw.githubusercontent.com/jaegertracing/jaeger-operator/v1.13.1/deploy/examples/business-application-injected-sidecar.yaml

Once the deployment is ready, we can open a direct connection to the Pod and start sending requests to it:

$ kubectl get -n default deployment/myapp 
NAME      READY     UP-TO-DATE   AVAILABLE   AGE
myapp     1/1       1            1           26s

$ kubectl port-forward deployment/myapp 8080:8080
$ watch -n 0.5 curl localhost:8080

This will generate two HTTP requests per second, and we should see one new trace in the Jaeger UI for every HTTP request.

Create a PodMonitor

At this point, we have a fully functional set of monitoring services: Prometheus, Grafana, Alertmanager, and Jaeger. However, the metrics generated from our Jaeger deployment aren’t being scraped by Prometheus: we need to create a ServiceMonitor or a PodMonitor telling Prometheus where to get our data.

Depending on the component, the metrics are served in different ports:

Component Port
Agent 14271
Collector 14269
Query 16687
All in one 14269

As the Jaeger instance we created didn’t specify a strategy, the default strategy allInOne was selected. Our PodMonitor will then tell Prometheus to get the metrics from the port 14269:

$ kubectl apply -f - <<EOF
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: tracing
  namespace: monitoring
spec:
  podMetricsEndpoints:
  - interval: 5s
    targetPort: 14269
  selector:
    matchLabels:
      app: jaeger
EOF

It might take a couple of minutes for Prometheus to see this new target. Check the Targets page, looking for the target monitoring/tracing/0. Once Prometheus scrapes Jaeger’s metrics endpoint, we can see Jaeger-specific metrics in the Prometheus Graph view. For instance, enter jaeger_collector_traces_saved_by_svc_total and click Execute. The number of traces shown in the graph should be increasing over time, reflecting the number of HTTP requests that we are running against the business application from the previous steps.

Pod monitor

Adapt the mixin

We now have the metrics from our Jaeger instance available in Prometheus, but which metrics should be seen on a dashboard and which alerts should be generated under which circumstances?

While it’s hard to come up with a generic, one-size-fits-all answer to those questions, our friends at Grafana Labs came up with a mixin for Jaeger, providing a starting point for your own dashboards and alerts. The mixin has since been contributed to the Jaeger project and is available under the main repository.

Let’s go back to our original monitoring-setup.jsonnet and add the Jaeger-specific dashboard and alert rules:

local jaegerAlerts = (import 'jaeger-mixin/alerts.libsonnet').prometheusAlerts;
local jaegerDashboard = (import 'jaeger-mixin/mixin.libsonnet').grafanaDashboards;

local kp =
  (import 'kube-prometheus/kube-prometheus.libsonnet') +
  {
    _config+:: {
      namespace: 'monitoring',
    },
    grafanaDashboards+:: {
      'jaeger.json': jaegerDashboard['jaeger.json'],
    },
    prometheusAlerts+:: jaegerAlerts,
  };

{ ['00namespace-' + name + '.json']: kp.kubePrometheus[name] for name in std.objectFields(kp.kubePrometheus) } +
{ ['0prometheus-operator-' + name + '.json']: kp.prometheusOperator[name] for name in std.objectFields(kp.prometheusOperator) } +
{ ['node-exporter-' + name + '.json']: kp.nodeExporter[name] for name in std.objectFields(kp.nodeExporter) } +
{ ['kube-state-metrics-' + name + '.json']: kp.kubeStateMetrics[name] for name in std.objectFields(kp.kubeStateMetrics) } +
{ ['alertmanager-' + name + '.json']: kp.alertmanager[name] for name in std.objectFields(kp.alertmanager) } +
{ ['prometheus-' + name + '.json']: kp.prometheus[name] for name in std.objectFields(kp.prometheus) } +
{ ['prometheus-adapter-' + name + '.json']: kp.prometheusAdapter[name] for name in std.objectFields(kp.prometheusAdapter) } +
{ ['grafana-' + name + '.json']: kp.grafana[name] for name in std.objectFields(kp.grafana) }

And let’s generate the new manifests:

$ jsonnet -J vendor -cm manifests/ monitoring-setup.jsonnet

Only a few manifests should have been changed, but it’s safe to apply all the manifests again:

$ kubectl apply -f manifests/

After a few moments, a new pod for Grafana should replace the previous one:

$ kubectl get pods -n monitoring -l app=grafana
NAME                       READY     STATUS    RESTARTS   AGE
grafana-558647b59-fkmr4    1/1       Running   0          11m
grafana-7bcb7f5b9b-6rv2w   0/1       Pending   0          8s

Tip: When using Minikube, your new pod might get stuck in the Pending state due to Insufficient cpu. You can check the cause by running kubectl describe -n monitoring pod POD_NAME, and manually kill the old pod with kubectl delete -n monitoring pod POD_NAME, or start minikube with a higher value for the flag --cpus.

Once the new Grafana pod is up and running, you should see that Grafana has a new Jaeger dashboard available, displaying the data provided by Prometheus. Similarly, new alert rules should be available in Prometheus: look for the ones with “Jaeger” in the name, such as JaegerCollectorQueueNotDraining.

Dashboard

Wrapping up

Deploying observability tools to bring insights to your business application is a necessity in the world of cloud-native microservices, and it’s essential to be able to monitor how these tools are behaving. This blog post shows one way to get a complete stack up and running in Kubernetes, with the final goal of monitoring Jaeger using its own internal metrics. The same setup can be extended to get Prometheus to scrape the metrics from your business application, and Grafana as the dashboard tool to visualize the data.

Are you using Grafana already to visualize the metrics scraped from Jaeger? Share your dashboards with us, we might have a place for them in the official Jaeger monitoring mixin!

Upcoming

Juraci Paixão Kröhling will present “What are my microservices doing?” at the Red Hat “Develop. Deploy. Deliver continuously.” virtual event, Thursday, October 10, 2019.

Share