Performance

Building an observability stack for automated performance tests on Kubernetes and OpenShift (part 2)

This is the second in a series of three articles based on a session I delivered at Red Hat Tech Exchange EMEA. In the first article, I presented the rationale and approach for leveraging Red Hat OpenShift or Kubernetes for automated performance testing, and I gave an overview of the setup.

In this article, we will look at building an observability stack. In production, the observability stack can help verify that the system is working correctly and performing well. It can also be leveraged during performance tests to provide insight into how the application performs under load.

An example of what is described in this article is available in my GitHub repository.

Continue reading “Building an observability stack for automated performance tests on Kubernetes and OpenShift (part 2)”

Using a Kotlin-based gRPC API with Envoy proxy for server-side load balancing

These days, microservices-based architectures are being implemented almost everywhere. One business function could be using a few microservices that generate lots of network traffic in the form of messages being passed around. If we can make the way we pass messages more efficient by using a smaller message size, we could use the same infrastructure to handle higher loads.

Protobuf (short for “protocol buffers”) provides language- and platform-neutral mechanisms for serializing structured data for use in communications protocols, data storage, and more. gRPC is a modern, open source remote procedure call (RPC) framework that can run anywhere. Together, they provide an efficient, automatically compressed message format with first-class support for complex data structures, among other benefits that JSON lacks.

Microservices environments require lots of communication between services, and for this to happen, services need to agree on a few things. They need to agree on an API for exchanging data, for example, POST (or PUT) and GET to send and receive messages, and they need to agree on the format of the data (typically JSON). Clients calling the service also need to write lots of boilerplate code to make the remote calls (frameworks!). Protobuf and gRPC provide a way to define the message schema (something JSON cannot do) and to generate skeleton code for consuming a gRPC service (no frameworks required).

Continue reading “Using a Kotlin-based gRPC API with Envoy proxy for server-side load balancing”

Using eXpress Data Path (XDP) maps in RHEL 8: Part 2

Diving into XDP

In the first part of this series on XDP, I introduced XDP and discussed the simplest possible example. Let’s now try to do something less trivial, exploring some more-advanced eBPF features—maps—and some common pitfalls.
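
To make the idea of an XDP map concrete, here is a minimal sketch in C of the kind of program this series deals with; it is not taken from the article itself. It counts incoming packets in a per-CPU array map and lets them pass. It assumes the BTF-style map declaration from recent libbpf headers (older code uses the struct bpf_map_def convention) and compilation with clang -target bpf; all names are illustrative.

    /* xdp_count.c - count packets in a per-CPU array map, then let them pass */
    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    struct {
        __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
        __uint(max_entries, 1);
        __type(key, __u32);
        __type(value, __u64);
    } pkt_count SEC(".maps");

    SEC("xdp")
    int xdp_count_packets(struct xdp_md *ctx)
    {
        __u32 key = 0;
        __u64 *value = bpf_map_lookup_elem(&pkt_count, &key);

        /* the verifier requires the NULL check before dereferencing */
        if (value)
            (*value)++;            /* per-CPU slot, so no atomics needed */

        return XDP_PASS;           /* hand the packet on to the normal stack */
    }

    char _license[] SEC("license") = "GPL";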

XDP is available in Red Hat Enterprise Linux 8, which you can download and run now.

Continue reading “Using eXpress Data Path (XDP) maps in RHEL 8: Part 2”

Achieving high-performance, low-latency networking with XDP: Part I

XDP: From zero to 14 Mpps

In recent years, the kernel community has been using different approaches in the quest for ever-increasing networking performance. While improvements have been measurable in several areas, a new wave of architecture-related security issues and related countermeasures has undone most of the gains, and purely in-kernel solutions for some packet-processing-intensive workloads still lag behind the bypass solution, namely the Data Plane Development Kit (DPDK), by almost an order of magnitude.

But the kernel community never sleeps (almost literally) and the holy grail of kernel-based networking performance has been found under the name of XDP: the eXpress Data Path. XDP is available in Red Hat Enterprise Linux 8, which you can download and run now.
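
For a sense of what such a program looks like, here is a minimal XDP sketch in C, not from the article itself, that simply drops every packet at the driver level; programs of roughly this shape are what raw drop rates such as the 14 Mpps figure are typically measured with. It assumes clang with -target bpf and the libbpf bpf_helpers.h header; file and function names are illustrative.

    /* xdp_drop.c - minimal XDP program: drop every packet as early as possible */
    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    SEC("xdp")
    int xdp_drop_all(struct xdp_md *ctx)
    {
        return XDP_DROP;   /* tell the driver to discard the frame immediately */
    }

    char _license[] SEC("license") = "GPL";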

Continue reading “Achieving high-performance, low-latency networking with XDP: Part I”

Reducing the startup overhead of SystemTap monitoring scripts with syscall_any tapset

A number of the example scripts shipped with the newly released SystemTap 4.0, available in Fedora 28 and 29, use the syscall_any tapset to reduce the amount of time required to convert the scripts into running instrumentation.

This article discusses the particular changes made in the scripts and how you might also use this new tapset to make the instrumentation that monitors system calls smaller and more efficient. (This article is a follow-on to my previous article: Analyzing and reducing SystemTap’s startup cost for scripts.)

The key observation that triggered the creation of the syscall_any tapset was that a number of scripts did not use the syscall arguments. The scripts often used syscall.* and syscall.*.return, but they were only concerned with the particular syscall name and the return value. This type of information for all the system calls is available from the sys_enter and sys_exit kernel tracepoints. Thus, rather than creating hundreds of kprobes, one for each of the individual functions implementing the various system calls, just a couple of tracepoints are used in their place.
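
SystemTap hides these details behind the tapset, but to make the mechanism concrete, here is a rough sketch written directly in C as a standalone BPF program (not anything SystemTap generates) showing how a single sys_enter tracepoint can count every system call by number; the struct layout mirrors the tracepoint format, and names and map sizes are illustrative.

    /* count_syscalls.c - one tracepoint covers every system call entry */
    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    /* layout of the raw_syscalls:sys_enter tracepoint context */
    struct sys_enter_ctx {
        unsigned long long unused;   /* common tracepoint header */
        long id;                     /* system call number */
        unsigned long args[6];
    };

    struct {
        __uint(type, BPF_MAP_TYPE_HASH);
        __uint(max_entries, 512);
        __type(key, __u32);
        __type(value, __u64);
    } counts SEC(".maps");

    SEC("tracepoint/raw_syscalls/sys_enter")
    int count_syscall(struct sys_enter_ctx *ctx)
    {
        __u32 key = (__u32)ctx->id;
        __u64 one = 1, *val;

        val = bpf_map_lookup_elem(&counts, &key);
        if (val)
            __sync_fetch_and_add(val, 1);
        else
            bpf_map_update_elem(&counts, &key, &one, BPF_ANY);

        return 0;
    }

    char _license[] SEC("license") = "GPL";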

Continue reading “Reducing the startup overhead of SystemTap monitoring scripts with syscall_any tapset”

Analyzing and reducing SystemTap’s startup cost for scripts

SystemTap is a powerful tool for investigating system issues, but for some SystemTap instrumentation scripts, the startup times are too long. This article is Part 1 of a series and describes how to analyze and reduce SystemTap’s startup costs for scripts.

We can use SystemTap to investigate this problem and provide some hard data on the time required for each of the passes that SystemTap uses to convert a SystemTap script into instrumentation. SystemTap has a set of probe points marking the start and end of passes from 0 to 5:

  • pass0: Parsing command-line arguments
  • pass1: Parsing scripts
  • pass2: Elaboration
  • pass3: Translation to C
  • pass4: Compilation of C code into kernel module
  • pass5: Running the instrumentation

Continue reading “Analyzing and reducing SystemTap’s startup cost for scripts”

Natively compile Java code for better startup time

Microservices and serverless architectures are being implemented, or are part of the roadmap, in most modern solution stacks. Given that Java is still the dominant language for business applications, the need to reduce Java startup time is becoming more important. Serverless architectures are one area that needs faster startup times, and applications hosted on container platforms such as Red Hat OpenShift can benefit from both a fast Java startup time and a smaller Docker image size.

Let’s see how GraalVM can benefit Java-based programs in terms of speed and size. Of course, these gains are not limited to containers or serverless architectures and can be applied to a variety of use cases.

Continue reading “Natively compile Java code for better startup time”

Improving .NET Core Kestrel performance using a Linux-specific transport

ASP.NET Core is the web framework for .NET Core. Performance is a key feature: the stack is heavily optimized and continuously benchmarked. Kestrel is the name of its HTTP server. In this blog post, we’ll replace Kestrel’s networking layer with a Linux-specific implementation and benchmark it against the default out-of-the-box implementations. The TechEmpower web framework benchmarks are used to compare network-layer performance.

Continue reading “Improving .NET Core Kestrel performance using a Linux-specific transport”

Scaling AMQ 7 Brokers with AMQ Interconnect

Red Hat JBoss AMQ Interconnect provides flexible routing of messages between AMQP-enabled endpoints, including clients, brokers, and standalone services. With a single connection to a network of AMQ Interconnect routers, a client can exchange messages with any other endpoint connected to the network.

AMQ Interconnect can create various topologies to manage a high volume of traffic or define an elastic network in front of AMQ 7 brokers. This article shows a sample AMQ Interconnect topology for scaling AMQ 7 brokers easily.

Continue reading “Scaling AMQ 7 Brokers with AMQ Interconnect”

SystemTap’s BPF Backend Introduces Tracepoint Support

This blog is the third in a series on stapbpf, SystemTap’s BPF (Berkeley Packet Filter) backend. In the first post, Introducing stapbpf – SystemTap’s new BPF backend, I explain what BPF is and what features it brings to SystemTap. In the second post, What are BPF Maps and how are they used in stapbpf, I examine BPF maps, one of BPF’s key components, and their role in stapbpf’s implementation.

In this post, I introduce stapbpf’s recently added support for tracepoint probes. Tracepoints are statically inserted hooks in the Linux kernel onto which user-defined probes can be attached. They can be found in a variety of locations throughout the kernel, including performance-critical subsystems such as the scheduler. Therefore, tracepoint probes must terminate quickly in order to avoid significant performance penalties or unusual behavior in these subsystems. BPF’s lack of loops and its 4K-instruction limit mean that it is well suited to this task.
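
stapbpf generates BPF programs of this kind from SystemTap scripts, but as a rough illustration of what a tracepoint-attached BPF program looks like when written directly in C (this is not stapbpf output; names are illustrative, and libbpf headers are assumed), here is one that hooks the scheduler’s sched_switch tracepoint, increments a counter, and returns immediately:

    /* sched_count.c - a tracepoint probe should do very little and return quickly */
    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    struct {
        __uint(type, BPF_MAP_TYPE_ARRAY);
        __uint(max_entries, 1);
        __type(key, __u32);
        __type(value, __u64);
    } switch_count SEC(".maps");

    SEC("tracepoint/sched/sched_switch")
    int on_sched_switch(void *ctx)
    {
        __u32 key = 0;
        __u64 *val = bpf_map_lookup_elem(&switch_count, &key);

        if (val)
            __sync_fetch_and_add(val, 1);

        return 0;   /* no loops, few instructions: cheap inside the scheduler path */
    }

    char _license[] SEC("license") = "GPL";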

Continue reading “SystemTap’s BPF Backend Introduces Tracepoint Support”
