Prometheus is an open source monitoring solution that collects metrics from the system and its applications. As a developer, you can query these metrics, use them to create alerts, and use them as a data source for dashboards; one example would be using Prometheus metrics with Grafana.

In this article, I show you how to use Prometheus to monitor a .NET Core application running on Kubernetes. Installation instructions are not included, but I do reference the Prometheus Operator, which you can use to create and configure Prometheus on Kubernetes.

Note: Learn more about Prometheus' support for monitoring Kubernetes and containerized applications deployed on OpenShift.

Open source monitoring with Prometheus

Prometheus organizes data as time series, which are useful for tracking how a numeric value changes over time. Prometheus uses these time series to track the following metric types:

  • Counters: Values that can only increment, like the number of requests handled.
  • Gauges: Values that can go up and down, like memory used.
  • Histograms: Values that are counted in a number of buckets, like response time.

A single metric (like HTTP response time) corresponds to multiple time series, each with a unique set of labels. Thanks to these labels, you can filter queries for specific criteria, such as the HTTP response time for a particular URL.
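
To make these concepts concrete, here is a minimal sketch of a counter and a gauge declared with the prometheus-net library that we'll use later in this article. The metric names, the path label, and the recorded values are hypothetical:

using Prometheus;

public static class AppMetrics
{
  // A counter with a "path" label: Prometheus stores one time series
  // per distinct label value, and the value can only increment.
  private static readonly Counter RequestsHandled = Metrics.CreateCounter(
    "myapp_requests_handled_total", "Number of requests handled.", "path");

  // A gauge tracks a value that can go up and down, like a queue length.
  private static readonly Gauge QueueLength = Metrics.CreateGauge(
    "myapp_queue_length", "Number of items waiting in the queue.");

  public static void RecordRequest(string path, int queueLength)
  {
    RequestsHandled.WithLabels(path).Inc(); // increments the series for this path
    QueueLength.Set(queueLength);           // sets the gauge to the current value
  }
}

Each distinct label value (for example, each path) produces its own time series for the same metric.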

Deploying Prometheus

You can use the Prometheus Operator to create and configure Prometheus on Kubernetes. To set up this example, I started by creating a project with a user that has monitoring permissions. I followed the steps described in Monitoring your own services to create the project and user.

Exposing metrics from .NET Core

We'll use the prometheus-net library to expose metrics from .NET Core. This library includes a package for monitoring .NET Core, and a separate package for monitoring ASP.NET Core. The ASP.NET Core monitoring package includes additional metrics related to the web server.

As described in the prometheus-net README, we need to include the prometheus-net.AspNetCore package:

<ItemGroup>
  <PackageReference Include="prometheus-net.AspNetCore" Version="3.5.0" />
</ItemGroup>

Next, we'll add an endpoint that Prometheus will use to retrieve the metrics:

app.UseEndpoints(endpoints =>
{
  // ...
  endpoints.MapMetrics();
});
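
By default, MapMetrics exposes the metrics at the /metrics path.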

Finally, we enable capturing the HTTP metrics:

public void Configure(IApplicationBuilder app, ...)
{
  // ...
  app.UseRouting();
  app.UseHttpMetrics();
  // ...
}

We'll deploy this application on Red Hat OpenShift, and make it accessible from outside the cluster:

$ oc new-app dotnet:3.1~https://github.com/redhat-developer/s2i-dotnetcore-ex#dotnetcore-3.1-monitor --context-dir app
$ oc expose service s2i-dotnetcore-ex

Now that our application is up and running, we can have a look at the HTTP endpoint that is used by Prometheus at the /metrics path. Notice the different gauges, counters, and histograms exposed by the ASP.NET Core application:

# HELP process_private_memory_bytes Process private memory size
# TYPE process_private_memory_bytes gauge
process_private_memory_bytes 383516672
# HELP process_working_set_bytes Process working set
# TYPE process_working_set_bytes gauge
process_working_set_bytes 229879808
# HELP http_requests_in_progress The number of requests currently in progress in the ASP.NET Core pipeline. One series without controller/action label values counts all in-progress requests, with separate series existing for each controller-action pair.
# TYPE http_requests_in_progress gauge
http_requests_in_progress{method="GET",controller="",action=""} 1
http_requests_in_progress{method="POST",controller="Home",action="Index"} 0
http_requests_in_progress{method="GET",controller="Home",action="Index"} 0
# HELP http_requests_received_total Provides the count of HTTP requests that have been processed by the ASP.NET Core pipeline.
# TYPE http_requests_received_total counter
http_requests_received_total{code="200",method="POST",controller="Home",action="Index"} 1
http_requests_received_total{code="200",method="GET",controller="Home",action="Index"} 1288
http_requests_received_total{code="200",method="GET",controller="",action=""} 4944
# HELP http_request_duration_seconds The duration of HTTP requests processed by an ASP.NET Core application.
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_sum{code="200",method="GET",controller="Home",action="Index"} 0.5861144999999994
http_request_duration_seconds_count{code="200",method="GET",controller="Home",action="Index"} 1288
http_request_duration_seconds_bucket{code="200",method="GET",controller="Home",action="Index",le="0.001"} 1262
...
http_request_duration_seconds_bucket{code="200",method="GET",controller="Home",action="Index",le="+Inf"} 1288
http_request_duration_seconds_sum{code="200",method="GET",controller="",action=""} 8.691159999999982
http_request_duration_seconds_count{code="200",method="GET",controller="",action=""} 4944
...

You can see memory metrics like the process_working_set_bytes gauge. You can also see http_request_duration_seconds, which exposes a histogram for the request duration. The http_request_duration_seconds metric has a time series per code, method, controller, and action, which lets us filter based on those labels. The histogram data is in the http_request_duration_seconds_bucket metric, which defines buckets using the le (less than or equal) label. The histogram also includes a *_count and a *_sum metric.
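
To see where these histogram series come from, here is a minimal sketch of a custom histogram declared with prometheus-net. The metric name, help text, and bucket boundaries are illustrative assumptions, not part of the sample application:

using Prometheus;

public class OrderService
{
  // Each observed value increments _count, is added to _sum, and increments
  // every bucket whose "le" boundary is greater than or equal to the value.
  private static readonly Histogram ProcessingDuration = Metrics.CreateHistogram(
    "order_processing_duration_seconds",
    "Time spent processing an order.",
    new HistogramConfiguration
    {
      // Buckets at 1 ms, 2 ms, 4 ms, ... (10 exponential buckets), plus +Inf.
      Buckets = Histogram.ExponentialBuckets(0.001, 2, 10)
    });

  public void ProcessOrder()
  {
    // NewTimer() measures the elapsed time of the block and records it
    // in the histogram when disposed.
    using (ProcessingDuration.NewTimer())
    {
      // ... order processing logic ...
    }
  }
}

The http_request_duration_seconds histogram shown above is maintained in the same way by the prometheus-net ASP.NET Core middleware.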

Monitoring the .NET application

Now, we need to configure metrics collection for the .NET application. We do this on OpenShift by adding a PodMonitor or ServiceMonitor configuration to the namespace. The OpenShift Prometheus Operator picks up these resources and configures monitoring.

Next, let’s look at the service we’ve deployed. We'll use this information to configure the ServiceMonitor:

$ oc get service s2i-dotnetcore-ex -o yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    openshift.io/generated-by: OpenShiftNewApp
  creationTimestamp: 2020-05-08T11:48:02Z
  labels:
    app: s2i-dotnetcore-ex
  name: s2i-dotnetcore-ex
  namespace: demoproject
  resourceVersion: "22076"
  selfLink: /api/v1/namespaces/demoproject/services/s2i-dotnetcore-ex
  uid: 2aa94ebe-2384-4544-bcbe-b8283bd2db60
spec:
  clusterIP: 172.30.35.187
  ports:
  - name: 8080-tcp
    port: 8080
    protocol: TCP
    targetPort: 8080
  selector:
    app: s2i-dotnetcore-ex
    deploymentconfig: s2i-dotnetcore-ex
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}

Configure the ServiceMonitor

We'll add a ServiceMonitor that matches the app: s2i-dotnetcore-ex label, the 8080-tcp port name, and the demoproject namespace from the service configuration above.

First, we create an example-app-service-monitor.yaml file with the following content:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: prometheus-example-monitor
  name: prometheus-example-monitor
  namespace: demoproject
spec:
  endpoints:
  - interval: 30s
    port: 8080-tcp
    scheme: http
  selector:
    matchLabels:
      app: s2i-dotnetcore-ex

Note that the configuration includes the scrape interval, which in this case is set to 30s.

All that's left to do is add the monitor:

$ oc apply -f example-app-service-monitor.yaml

Querying Prometheus

Now that Prometheus is monitoring our application, we can look at the metrics it has collected. To start, open the OpenShift web user interface (UI) and go to the Advanced > Metrics page. On this page, we can execute Prometheus queries. PromQL, the Prometheus query language, is a simple, expressive way to query the time series that Prometheus has collected.

As an example, we’ll use a query for calculating the 99% quantile response time of the .NET application service:

histogram_quantile(0.99, sum by(le) (rate(http_request_duration_seconds_bucket[5m])))*1000

Figure 1 shows the response-time graph generated by Prometheus.

Figure 1: A histogram graphs a .NET application service's 99% quantile response time.

PromQL queries

In case you're not familiar with PromQL, let's dissect this query. We're using the http_request_duration_seconds_bucket metric from the http_request_duration_seconds histogram that we saw under the /metrics path.

Because these values are ever-incrementing counters, we apply the rate function over a five-minute window. This gives us each bucket's per-second rate of increase, averaged over the last five minutes. The http_request_duration_seconds_bucket metric is split into a number of time series (per code, method, and so on). We don't care about these individual series, so we sum them up, adding the by (le) clause to preserve the separate buckets that make up the histogram. Finally, we use histogram_quantile to obtain the 99% quantile value, and multiply by 1000 to get the time in milliseconds.
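
Using the labels we saw in the metrics output, you can narrow the same query down to a single controller and action; for example, the 99% quantile response time of the Home controller's Index action:

histogram_quantile(0.99, sum by(le) (rate(http_request_duration_seconds_bucket{controller="Home",action="Index"}[5m])))*1000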

Conclusion

In this article, you've learned about using Prometheus to monitor .NET Core applications that are deployed on Kubernetes. If you wanted to continue with the example, you could use the metrics collected by Prometheus to generate alerts and view them in one or more Grafana dashboards. If you're curious, check out OpenShift 4.3's support for accessing Prometheus, the Alerting UI, and Grafana via the web console.
