Lightweight Network Observability Operator without Loki

September 19, 2024
Mehul Modi, Steven Lee
Related topics: DevOps, Kubernetes, Observability, Operators
Related products: Red Hat OpenShift, Red Hat OpenShift Container Platform


    Recently, the Network Observability Operator released version 1.6, which added a major enhancement: network insights for your Red Hat OpenShift cluster without the Loki log aggregation system. The article What's new in Network Observability 1.6 gives a quick overview of this feature. Until this release, Loki had to be deployed alongside Network Observability to store the network flow data.

    In this article, let's look at the advantages and trade-offs of deploying the Network Observability Operator with Loki disabled. Because more metrics are enabled by default with this feature, we'll also demonstrate how those metrics can help in a real-world scenario.

    Configure Network Observability without Loki

    Loki is currently enabled as the data source by default. To configure the Network Observability Operator without Loki, disable Loki in the FlowCollector resource specification:

    loki:
      enable: false
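
    For context, here is a minimal sketch of where that fragment sits in a complete resource. It assumes the flows.netobserv.io/v1beta2 API version and the netobserv namespace; neither value comes from our test setup, so check them against the operator release you run.

    ```yaml
    # Sketch only: a FlowCollector with Loki disabled.
    apiVersion: flows.netobserv.io/v1beta2   # assumption: API version served by your operator release
    kind: FlowCollector
    metadata:
      name: cluster                          # the operator manages a single FlowCollector named "cluster"
    spec:
      namespace: netobserv                   # assumption: namespace where the components are deployed
      loki:
        enable: false                        # flows are no longer written to Loki
    ```

    If a FlowCollector already exists, changing only the loki.enable field on it should be enough; the rest of the specification can stay as it is.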

    When Loki is disabled, metrics continue to be scraped by the OpenShift cluster Prometheus without any additional configuration. The Network Traffic console then uses Prometheus as its source for fetching data.

    Performance and resource utilization gains

    In this section, let's dive into the query performance and resource utilization differences when Network Observability is configured with Loki versus without it. 

    Query performance

    Prometheus queries are blazing fast compared to Loki queries, but don't take my word for it. Let's look at the data from the query performance tests:  

    Test bench environment

    • Test: We ran 50 identical queries without any filters for each of 3 time ranges to render a topology view, once with Loki and once with Prometheus. Such a query requests all K8s Owners for the workloads running in an OpenShift cluster that had network flows associated with them.
    • Test bed: 9 workers and 3 primary nodes, AWS m5.2xlarge machines (8 vCPU, 32 GB RAM).
    • LokiStack size: 1x.small.

    Results

    The following table shows the 90th percentile query times for each time range:

    Time range    Loki       Prometheus
    Last 5m       984 ms     99 ms
    Last 1h       2410 ms    236 ms
    Last 6h       > 10 s     474 ms

    As the time range to fetch network flows gets wider, Loki queries tend to get slower or time out, while Prometheus queries are able to render the data within a fraction of a second.  

    Resource utilization

    In tests conducted on 3 different test beds with varied workloads and network throughput, configuring Network Observability without Loki reduced total memory usage by 45-65% and CPU utilization by 10-20%. Actual resource utilization depends on factors such as network traffic, the FlowCollector sampling rate, and the number of workloads and nodes in the Red Hat OpenShift Container Platform cluster. In addition, you do not need to provision and plan for additional object storage in public clouds for Loki, which reduces cost and improves operational efficiency.

    In our performance tests, kube-burner workloads were used to generate several objects and create heavy network traffic. We used a sampling rate of 1 for all of the following tests:

    • Test bed 1: 25-node cluster. This test takes into account the total number of namespaces, pods, and services in an OpenShift Container Platform cluster, places load on the eBPF agent (the component that probes and generates flows), and represents use cases with a high number of workloads for a given cluster size. For example, Test bed 1 consists of 76 namespaces, 5153 pods, and 2305 services.
    • Test bed 2: 65-node cluster. This test takes into account a high ingress traffic volume.
    • Test bed 3: 120-node cluster. This test takes into account the total number of namespaces, pods, and services in an OpenShift Container Platform cluster, places load on the eBPF agent on a larger scale than Test bed 1, and represents use cases with a high number of workloads for a given cluster size. For example, Test bed 3 consists of 553 namespaces, 6998 pods, and 2508 services.

    The following graphs (Figures 1 and 2) show total vCPU and memory usage for a recommended Network Observability stack: flowlogs-pipeline (the component that enriches the flows with Kubernetes-related information to be stored as logs), eBPF-agent, Kafka, Prometheus, and optionally Loki for production clusters.

    Figure 1: vCPU consumption difference with and without Loki (bar graph for test beds 1, 2, and 3).

    Figure 2: Memory consumption difference with and without Loki (bar graph for test beds 1, 2, and 3).

    Let's look at the amount of estimated storage you need for all the network flows and Prometheus metrics that Network Observability has to store. For context, even when Loki is installed, Network Observability publishes a default set of Prometheus metrics for monitoring dashboards, and it adds additional metrics when Loki is disabled to visualize network flows. Figure 3 shows the estimated amount of storage required to store 15 days of network flows data (when configured with Loki), with Prometheus metrics and Kafka as an intermediary data streaming layer between the eBPF-agent and the flowlogs-pipeline.

    The network flow rate for the three test beds was 10K, 13K, and 30K flows/second, respectively. The storage for Loki includes AWS S3 bucket utilization and its PVC usage. The Kafka PVC storage value assumes 1 day of retention or 100 GB, whichever is reached first.

    Figure 3: Storage consumption difference with and without Loki (Loki, Kafka, and Prometheus storage).

    As seen across the test beds, we find a storage savings of 90% when Network Observability is configured without Loki.   

    Trade-offs

    We saw that using Prometheus as the data source provides impressive performance gains and sub-second query times; however, it also introduces the following constraints:

    • Without storage of network flow data, the Network Observability view in the OpenShift web console no longer provides the Traffic flows table, as shown in Figure 4.
      Figure 4: Disabled Traffic flows table.
    • Per-pod resource granularity is not available, because per-pod labels would give the Prometheus metrics high cardinality, as shown in Figure 5.
      Figure 5: Prometheus metrics displaying high cardinality.

    Should you need per-flow or per-pod granularity for diagnostics and troubleshooting, you have options other than enabling Loki:

    • Export flow logs to your preferred data analytics tool using the .spec.exporters configuration in the FlowCollector resource. Kafka and IPFIX are the currently supported exporters.
    • Create custom metrics with the FlowMetrics API, which Network Observability introduced in this release for metrics that are not available out of the box. The API creates on-demand Prometheus metrics from enriched flow log fields, and those fields can also serve as labels on the custom metrics.
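
    To make the FlowMetrics option concrete, here is a minimal sketch of a custom counter. It assumes the flows.netobserv.io/v1alpha1 API version and the FlowMetric kind with the metricName, type, valueField, direction, and labels fields; the resource and metric names are hypothetical, so verify everything against the API reference for your release.

    ```yaml
    # Sketch only: a custom counter of ingress bytes per destination workload.
    apiVersion: flows.netobserv.io/v1alpha1   # assumption: FlowMetric API version in your release
    kind: FlowMetric
    metadata:
      name: custom-workload-ingress-bytes     # hypothetical resource name
      namespace: netobserv                    # assumption: namespace where Network Observability runs
    spec:
      metricName: custom_workload_ingress_bytes_total   # hypothetical metric name
      type: Counter                           # sum the value field over matching flows
      valueField: Bytes                       # enriched flow field to accumulate
      direction: Ingress
      # Owner-level labels only: a per-pod or per-IP label would raise cardinality.
      labels:
      - DstK8S_Namespace
      - DstK8S_OwnerName
      - DstK8S_OwnerType
    ```

    The label list is the main design choice here, since each distinct combination of label values becomes its own Prometheus time series.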

    Note

    Use this option with caution, as introducing metrics that may have labels with high cardinality increases the cluster's Prometheus resource usage and might impact overall cluster monitoring. 
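
    For the exporters option mentioned above, here is a minimal sketch of a FlowCollector that keeps Loki disabled and forwards enriched flows to Kafka and to an IPFIX collector. The addresses, topic, and port are hypothetical placeholders, and the field names assume the flows.netobserv.io/v1beta2 API.

    ```yaml
    # Sketch only: export enriched flows while Loki stays disabled.
    apiVersion: flows.netobserv.io/v1beta2   # assumption: API version served by your operator release
    kind: FlowCollector
    metadata:
      name: cluster
    spec:
      loki:
        enable: false
      exporters:
      - type: Kafka
        kafka:
          address: "kafka-cluster-kafka-bootstrap.netobserv"   # hypothetical bootstrap address
          topic: netobserv-flows-export                         # hypothetical topic name
      - type: IPFIX
        ipfix:
          targetHost: "ipfix-collector.ipfix.svc.cluster.local" # hypothetical collector host
          targetPort: 4739                                      # hypothetical collector port
    ```

    Either exporter can be used on its own; they are listed together only to show both shapes.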

    • Restricted multitenancy: Prometheus in an OpenShift cluster currently doesn't support multitenancy the way Loki does. Non-admin users can be bound to the cluster-monitoring-view cluster role, which gives them read access to all available Prometheus metrics.

    For example, the following command lets the testuser-0 user view Prometheus metrics:

    oc adm policy add-cluster-role-to-user cluster-monitoring-view testuser-0
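
    If you manage RBAC declaratively (for example through GitOps), a roughly equivalent binding can be written as a manifest. This is a sketch: the binding name is hypothetical, and the name that the oc command generates may differ.

    ```yaml
    # Sketch only: bind testuser-0 to the cluster-monitoring-view cluster role.
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      name: cluster-monitoring-view-testuser-0   # hypothetical binding name
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: cluster-monitoring-view
    subjects:
    - apiGroup: rbac.authorization.k8s.io
      kind: User
      name: testuser-0
    ```

    Because the binding is cluster-wide, the user can read metrics for every namespace, which is the multitenancy limitation described above.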

    Network Observability metrics use case

    Let's look at a scenario that shows how users can benefit from the metrics published by the Network Observability Operator. Say you suspect an anomaly with DNS lookups in your cluster and want to investigate workloads that may be facing DNS latencies. With the DNSTracking feature and enriched Prometheus metrics, you can quickly set up an alert that triggers on high DNS latencies. For example, the following alert is triggered for any workload whose 90th percentile DNS latency exceeds 100 ms:

    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      name: dns-latency-alert
      namespace: netobserv
    spec:
      groups:
      - name: DNSLatencyAlert
        rules:
        - alert: DNSLatencyAlert
          annotations:
            message: |-
              {{ $labels.DstK8S_OwnerName }} in {{ $labels.DstK8S_Namespace }} is experiencing high DNS latencies.
            summary: "Trigger for any workloads experiencing more than 100 ms DNS latency."
          # 90th percentile DNS latency per destination workload over 2 minutes,
          # converted from seconds to milliseconds and compared with 100 ms.
          expr: histogram_quantile(0.9, sum(rate(netobserv_workload_dns_latency_seconds_bucket{DstK8S_Namespace!=""}[2m])) by (le,DstK8S_Namespace,DstK8S_OwnerName))*1000 > 100
          for: 10s
          labels:
            severity: warning
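
    The alert above relies on the DNSTracking feature being enabled and on the workload-level DNS latency metric being exported. Here is a minimal sketch of the corresponding FlowCollector settings, assuming the v1beta2 fields spec.agent.ebpf.features and spec.processor.metrics.includeList. Whether this metric is already part of the default set depends on your release, and we assume here that an explicit includeList replaces the defaults, so list every metric you rely on.

    ```yaml
    # Sketch only: enable DNS tracking and export the metric used by the alert.
    apiVersion: flows.netobserv.io/v1beta2   # assumption: API version served by your operator release
    kind: FlowCollector
    metadata:
      name: cluster
    spec:
      loki:
        enable: false
      agent:
        ebpf:
          features:
          - DNSTracking                      # adds DNS latency and response code fields to flows
      processor:
        metrics:
          includeList:                       # names are written without the netobserv_ prefix
          - namespace_flows_total            # examples of metrics you may want to keep
          - node_ingress_bytes_total
          - workload_ingress_bytes_total
          - workload_dns_latency_seconds     # histogram queried by the DNSLatencyAlert expression
    ```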

    To demonstrate this use case, I configured the CoreDNS erratic plug-in in the openshift-dns namespace to add latencies for the example.org domain using the following configuration:  

    example.org {
        erratic {
            delay 2 100ms
        }
    }

    This configuration adds a 100 ms delay to every other DNS request for example.org. I then created a test pod that performs a DNS lookup for example.org every second, which eventually triggered the DNSLatencyAlert configured earlier in my OpenShift Container Platform cluster. See Figure 6.

    Figure 6: DNS latency detection alert use case, showing the DNSLatencyAlert in the OpenShift Container Platform cluster.
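
    A test pod of the kind described above could look like the following sketch. It is not the exact pod used for Figure 6: the pod name, namespace, and image are assumptions, and any image that ships a shell and getent should work.

    ```yaml
    # Sketch only: resolve example.org once per second to generate DNS traffic.
    apiVersion: v1
    kind: Pod
    metadata:
      name: dns-lookup-test          # hypothetical pod name
      namespace: default             # hypothetical namespace
    spec:
      containers:
      - name: lookup
        image: registry.access.redhat.com/ubi9/ubi   # assumption: image providing sh and getent
        command:
        - /bin/sh
        - -c
        - "while true; do getent hosts example.org; sleep 1; done"
    ```

    With the erratic configuration shown earlier, about half of these lookups are delayed, which is enough to push the 90th percentile latency of the pod's workload over the alert threshold.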

    Similarly, you can set up additional alerts on DNS response codes, for example an alert for DNS lookup failures such as queries that receive NXDOMAIN or SERVFAIL responses, because flow logs and metrics are already enriched with DNS response codes. In addition to metrics for the DNSTracking feature, Network Observability provides metrics for other features, such as round-trip time and packet drops.

    Conclusion

    The Network Observability Operator provides the visibility you need to proactively detect issues within OpenShift cluster networking. Now, with the option to disable Loki, the Network Observability Operator provides a lightweight solution to visualize, diagnose, and troubleshoot networking issues faster and at a lower cost. Network Observability's Prometheus metrics can be used to set up user-defined alerts in your Red Hat OpenShift Container Platform cluster.

    Whether you have already deployed Network Observability or are considering deploying it, we would love to engage with you and hear your thoughts here. Thanks for reading. Special thanks to Joel Takvorian, Julien Pinsonneau, and Sara Thomas for providing information for this article.
