Performance

Reducing the startup overhead of SystemTap monitoring scripts with syscall_any tapset

A number of the example scripts shipped with the newly released SystemTap 4.0, available in Fedora 28 and 29, now use the syscall_any tapset to reduce the time required to convert the scripts into running instrumentation.

This article discusses the changes made in those scripts and how you can use the new tapset to make instrumentation that monitors system calls smaller and more efficient. (This article is a follow-on to my previous article, Analyzing and reducing SystemTap’s startup cost for scripts.)

The key observation that triggered the creation of the syscall_any tapset was that a number of scripts did not use the syscall arguments at all. The scripts often used syscall.* and syscall.*.return, but they were only concerned with the particular syscall name and the return value. That information is available for all system calls from the sys_enter and sys_exit kernel tracepoints. Thus, rather than creating hundreds of kprobes, one for each of the individual functions implementing the various system calls, just a couple of tracepoints are used in their place.
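
For example, here is a minimal sketch (not taken from the article) of a script that counts system calls by name using only the syscall_any tapset; it assumes the tapset’s name convenience variable (retval is likewise available in the return probe):

    global counts

    probe syscall_any.return {
        counts[name]++
    }

    probe end {
        foreach (n in counts- limit 20)
            printf("%-20s %d\n", n, counts[n])
    }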

Continue reading “Reducing the startup overhead of SystemTap monitoring scripts with syscall_any tapset”

Analyzing and reducing SystemTap’s startup cost for scripts

SystemTap is a powerful tool for investigating system issues, but for some SystemTap instrumentation scripts, the startup times are too long. This article describes how to analyze and reduce SystemTap’s startup costs for scripts.

We can use SystemTap to investigate this problem and provide some hard data on the time required for each of the passes that SystemTap uses to convert a script into instrumentation. SystemTap has a set of probe points marking the start and end of passes 0 through 5 (a short timing example follows the list):

  • pass0: Parsing command-line arguments
  • pass1: Parsing scripts
  • pass2: Elaboration
  • pass3: Translation to C
  • pass4: Compilation of C code into kernel module
  • pass5: Running the instrumentation
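
Before instrumenting stap itself, a quick way to see where the time goes is stap’s own verbose output combined with the -p option, which stops after a given pass; the script name below is illustrative:

    # Report per-pass timing and stop after pass 2 (elaboration)
    stap -v -p2 example.stp

    # Build the kernel module (pass 4) without running it; the compiled
    # module is cached under ~/.systemtap/cache and reused on later runs
    stap -v -p4 example.stp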

Continue reading “Analyzing and reducing SystemTap’s startup cost for scripts”

Troubleshooting FDB table wrapping in Open vSwitch

When most people deploy an Open vSwitch configuration for virtual networking using the NORMAL rule, that is, using L2 learning, they do not think about configuring the size of the Forwarding DataBase (FDB).

When hardware-based switches are used, the FDB size is generally rather large, and that large FDB size is a key selling point. However, for Open vSwitch the default FDB size is rather small: in version 2.9 and earlier it is only 2K entries, and starting with version 2.10 it was increased to 8K entries. Note that in Open vSwitch each bridge has its own FDB table, whose size is individually configurable.

This blog explains the effects of configuring too small an FDB table, how to identify which bridge is suffering from an undersized table, and how to configure the FDB table size appropriately.
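
As a hedged illustration (the bridge name br0 and the 8K value are only examples), a bridge’s learned MAC entries and its FDB size can be inspected and adjusted with standard Open vSwitch commands:

    # List the MAC entries the bridge has currently learned
    ovs-appctl fdb/show br0

    # Raise the per-bridge MAC learning (FDB) table size to 8K entries
    ovs-vsctl set bridge br0 other_config:mac-table-size=8192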

Continue reading “Troubleshooting FDB table wrapping in Open vSwitch”

Configuring the MongoDB WiredTiger memory cache for RHMAP

This article describes how to configure MongoDB’s WiredTiger memory cache in Red Hat Mobile Application Platform (RHMAP). If the WiredTiger cache is allowed to consume all the memory available to a container, high memory usage and Nagios alerts will result.

The WiredTiger storage engine is the default storage engine starting in MongoDB version 3.2. It uses a MultiVersion Concurrency Control (MVCC) architecture for write operations, allowing multiple clients to modify different documents of a collection at the same time.

WiredTiger also caches data and creates checkpoints so that data can be recovered whenever necessary. For example, if a MongoDB image deployed in a container fails, it is useful to be able to recover the data that was not yet persisted. Additionally, WiredTiger can recover data written since the last checkpoint from its journal files. See the journal documentation and the snapshots and checkpoints documentation for more information.
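
The cache size itself is controlled by the storage.wiredTiger.engineConfig.cacheSizeGB setting (or the --wiredTigerCacheSizeGB command-line option). A minimal mongod.conf sketch follows; the 1 GB value is only an illustration and should be sized against the container’s memory limit:

    storage:
      wiredTiger:
        engineConfig:
          cacheSizeGB: 1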

Continue reading “Configuring the MongoDB WiredTiger memory cache for RHMAP”

Improving rsync performance with GlusterFS

Rsync is a particularly tough workload for GlusterFS because, with its defaults, it exercises some of GlusterFS’s worst-case operations. GlusterFS is the core of Red Hat Gluster Storage, an open, software-defined storage (SDS) platform designed to scale out and handle data-intensive tasks across many servers in physical, virtual, or cloud deployments. Since GlusterFS is a POSIX-compatible distributed file system, getting the best performance from rsync requires some tuning on both sides.

In this post, I will go through some of the pain points and the tunables that work around them. Getting rsync to run as fast on GlusterFS as it would on a local file system is not really feasible given GlusterFS’s architecture, but below I describe how to get as close as possible.
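
As a hedged example of the client-side options (not the article’s exact recommendations), two rsync flags that avoid patterns that are especially expensive on a distributed file system are --inplace, which skips the temporary-file create-and-rename step, and --whole-file, which disables the delta-transfer algorithm; the paths are illustrative:

    # GlusterFS volume assumed to be mounted at /mnt/glustervol
    rsync -av --inplace --whole-file /data/ /mnt/glustervol/data/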

Continue reading “Improving rsync performance with GlusterFS”

Natively compile Java code for better startup time

Microservices and serverless architectures are being implemented, or are part of the roadmap, in most modern solution stacks. Given that Java is still the dominant language for business applications, reducing Java’s startup time is becoming more important. Serverless architectures are one area that needs faster startup times, and applications hosted on container platforms such as Red Hat OpenShift can benefit from both faster Java startup and a smaller Docker image size.

Let’s see how GraalVM can benefit Java-based programs in terms of speed and size. Of course, these gains are not bound to containers or serverless architectures and can be applied to a variety of use cases.
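
As a minimal sketch (the class and file names are illustrative), a trivial Java program can be compiled ahead of time with GraalVM’s native-image tool and its startup time compared with the same class run on the JVM:

    // Hello.java: a trivial program for comparing JVM vs. native startup time
    public class Hello {
        public static void main(String[] args) {
            System.out.println("Hello, GraalVM!");
        }
    }

Compiling and timing it might look like this:

    $ javac Hello.java
    $ native-image Hello     # produces a standalone executable (hello by default)
    $ time java Hello        # startup on the JVM
    $ time ./hello           # startup of the native image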

Continue reading “Natively compile Java code for better startup time”

Monitoring Red Hat AMQ 7 with the jmxtrans Agent

Red Hat AMQ 7 includes several tools for monitoring the Red Hat AMQ broker. These tools allow you to get metrics about the performance and behavior of the broker and its resources. Metrics are essential for measuring performance and for identifying the issues that cause poor performance.

The following components are included for monitoring the Red Hat AMQ 7 broker:

  • A management web console based on Hawtio: This console includes perspectives and dashboards for monitoring the most important components of the broker.
  • A Jolokia REST-like API: This provides full access to the broker’s JMX MBeans through HTTP requests.
  • Red Hat JBoss Operations Network: This is an enterprise, Java-based administration and management platform for developing, testing, deploying, and monitoring Red Hat JBoss Middleware applications.

These tools are excellent and fully integrated with the product. However, there are cases where Red Hat AMQ 7 is deployed in environments where other tools, such as jmxtrans, are used to monitor the broker.
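
As a hedged sketch (the paths and file names are assumptions, not the article’s exact configuration), the jmxtrans agent is typically attached to the broker’s JVM by extending JAVA_ARGS in the broker instance’s etc/artemis.profile and pointing it at an XML file that defines the queries and the output writer:

    # etc/artemis.profile of the broker instance (illustrative paths)
    JAVA_ARGS="$JAVA_ARGS -javaagent:/opt/jmxtrans/jmxtrans-agent.jar=/opt/jmxtrans/jmxtrans-agent.xml"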

Continue reading “Monitoring Red Hat AMQ 7 with the jmxtrans Agent”

SystemTap’s BPF Backend Introduces Tracepoint Support

This blog is the third in a series on stapbpf, SystemTap’s BPF (Berkeley Packet Filter) backend. In the first post, Introducing stapbpf – SystemTap’s new BPF backend, I explain what BPF is and what features it brings to SystemTap. In the second post, What are BPF Maps and how are they used in stapbpf, I examine BPF maps, one of BPF’s key components, and their role in stapbpf’s implementation.

In this post, I introduce stapbpf’s recently added support for tracepoint probes. Tracepoints are statically inserted hooks in the Linux kernel onto which user-defined probes can be attached. Tracepoints can be found in a variety of locations throughout the Linux kernel, including performance-critical subsystems such as the scheduler. Therefore, tracepoint probes must terminate quickly in order to avoid significant performance penalties or unusual behavior in these subsystems. BPF’s lack of loops and its 4K-instruction limit guarantee that probe handlers finish quickly, which makes it well suited to this task.
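
For example, here is a minimal sketch of a tracepoint probe (the sched_switch tracepoint and the script name are only illustrations) that can be run with the BPF backend via stap --bpf count_switches.stp:

    global switches

    # Fires on every scheduler context switch
    probe kernel.trace("sched_switch") {
        switches++
    }

    probe end {
        printf("context switches observed: %d\n", switches)
    }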

Continue reading “SystemTap’s BPF Backend Introduces Tracepoint Support”

Red Hat OpenShift Container Platform Load Testing Tips

A large bank in the Association of Southeast Asian Nations (ASEAN) plans to develop a new mobile back-end application using microservices and container technology. The bank expects the platform to support 10,000,000 customers at 5,000 transactions per second (TPS) and has chosen Red Hat OpenShift Container Platform (OCP) as the runtime platform for the application. To ensure that the platform can support the required throughput and future growth, the bank performed internal load testing with its own infrastructure and mock-up services. This article shares the lessons learned from load testing Red Hat OpenShift Container Platform.

Continue reading “Red Hat OpenShift Container Platform Load Testing Tips”

Towards The Ruby 3×3 Performance Goal

This blog post is about my work to improve CRuby performance by introducing new virtual machine instructions and a JIT. It is loosely based on my presentation at RubyKaigi 2017 in Hiroshima, Japan.

As many Ruby people know, the author of Ruby, Yukihiro Matsumoto (Matz), set a very ambitious performance goal for CRuby version 3: it should be 3 times faster than version 2.

Koichi Sasada did a great job improving the performance of CRuby version 2 to about 3 times that of version 1 by introducing a bytecode virtual machine (VM). So I guess it is symbolic to set the same goal for CRuby version 3.

Continue reading “Towards The Ruby 3×3 Performance Goal”
