Developers think of their programs as a serial sequence of operations running as written in the original source code. However, program source code is just a specification for computations. The compiler analyzes the source code and determines if changes to the specified operations will yield the same visible results but be more efficient. It will eliminate operations that are ultimately not visible, and rearrange operations to extract more parallelism and hide latency. These differences between the original program’s source code and the optimized binary that actually runs might be visible when inspecting the execution of the optimized binary via tools like GDB and SystemTap.
To aid with the debugging and instrumentation of binaries the compiler generates debug information to map between the source code and executable binary. The debug information includes which line of source code each machine instruction is associated with, where the variables are located, and how to unwind the stack to get a backtrace of function calls. However, even with the compiler generating this information, a number of non-intuitive effects might be observed when instrumenting a compiler-optimized binary:
Continue reading “Possible issues with debugging and inspecting compiler-optimized binaries”
I recently saw an article from Uber Engineering describing an issue they were having with an increase in latency. The Uber engineers suspected that their code was running out of stack space causing the golang runtime to issue a stack growth, which would introduce additional latency . The engineers ended up modifying the golang runtime with additional instrumentation to report these stack growths to confirm their suspicions. This situation is a perfect example of where SystemTap could have been used.
SystemTap is a tool that can be used to perform live analysis of a running program. It is able to interrupt normal control flow and execute code specified by a SystemTap script, which can allow users to temporarily modify a running program without having to change the source and recompile.
Continue reading “Probing golang runtime using SystemTap”
SystemTap has extensive libraries called tapsets that allow developers to instrument various aspects of the kernel’s operation. SystemTap allows the use of wildcards to instrument multiple locations in particular subsystems. SystemTap has to perform a significant amount of work to create instrumentation for each of the places being probed. This overhead is particularly apparent when using the wildcards for the system call tapset that contains hundreds of entries (
syscall.*.return). For some subsets of data collection, replacing the wildcard-matched syscall probes in SystemTap scripts with the
kernel.trace("sys_enter") and the
kernel.trace("sys_exit") probe will produce smaller instrumentation modules that compile and start up more quickly. In this article, I’ll show a few examples of how this works.
Continue reading “Speed up SystemTap script monitoring of system calls”
A number of the SystemTap script examples in the newly released SystemTap 4.0 available in Fedora 28 and 29 have reduced the amount of time required to convert the scripts into running instrumentation by using the
This article discusses the particular changes made in the scripts and how you might also use this new tapset to make the instrumentation that monitors system calls smaller and more efficient. (This article is a follow-on to my previous article: Analyzing and reducing SystemTap’s startup cost for scripts.)
The key observation that triggered the creation of the
syscall_any tapset was a number of scripts that did not use the
syscall arguments. The scripts often used
syscall.*.return, but they were only concerned with the particular
syscall name and the return value. This type of information for all the system calls is available from the
sys_exit kernel tracepoints. Thus, rather than creating hundreds of kprobes for each of the individual functions implementing the various system calls, there are just a couple of tracepoints being used in their place.
Continue reading “Reducing the startup overhead of SystemTap monitoring scripts with syscall_any tapset”
SystemTap is a powerful tool for investigating system issues, but for some SystemTap instrumentation scripts, the startup times are too long. This article is Part 1 of a series and describes how to analyze and reduce SystemTap’s startup costs for scripts.
We can use SystemTap to investigate this problem and provide some hard data on the time required for each of the passes that SystemTap uses to convert a SystemTap script into instrumentation. SystemTap has a set of probe points marking the start and end of passes from 0 to 5:
- pass0: Parsing command-line arguments
- pass1: Parsing scripts
- pass2: Elaboration
- pass3: Translation to C
- pass4: Compilation of C code into kernel module
- pass5: Running the instrumentation
Articles in this series:
Continue reading “Analyzing and reducing SystemTap’s startup cost for scripts”
You can study source code and manually instrument functions as described in the “Use the dynamic tracing tools, Luke” blog article, but why not make it easier to find key points in the software by adding user-space markers to the application code? User-space markers have been available in Linux for quite some time (since 2009). The inactive user-space markers do not significantly slow down the code. Having them available allows you to get a more accurate picture of what the software is doing internally when unexpected issues occur. The diagnostic instrumentation can be more portable with the user-space markers, because the instrumentation does not need to rely on instrumenting particular function names or lines numbers in source code. The naming of the instrumentation points can also make clearer what event is associated with a particular instrumentation point.
For example, Ruby MRI on Red Hat Enterprise Linux 7 has a number of different instrumentation points made available as a SystemTap tapset. If SystemTap is installed on the system, as described by What is SystemTap and how to use it?, the installed Ruby MRI instrumentation points can be listed with the
stap -L” command shown below. These events show the start and end of various operations in the Ruby runtime, such as the start and end of garbage collection (GC) marking and sweeping.
Continue reading “Making the Operation of Code More Transparent and Obvious with SystemTap”
A common refrain for tracking down issues on computer systems running open source software is “Use the source, Luke.” Reviewing the source code can be helpful in understanding how the code works, but the static view may not give you a complete picture of how things work (or are broken) in the code. The paths taken through code are heavily data dependent. Without knowledge about specific values at key locations in code, you can easily miss what is happening. Dynamic instrumentation tools, such as SystemTap, that trace and instrument the software can help provide a more complete understanding of what the code is actually doing
I have wanted to better understand how the Ruby interpreter works. This is an opportunity to use SystemTap to investigate Ruby MRI internals on Red Hat Enterprise Linux 7. The article What is SystemTap and how to use it? has more information about installing SystemTap. The x86_64 RHEL 7 machine has
ruby-2.0.0648-33.el7_4.x86_64.rpm installed, so the matching
debuginfo RPM is installed to provide SystemTap with information about function parameters and to provide me with human-readable source code. The
debuginfo RPM is installed by running the following command as root:
Continue reading ““Use the dynamic tracing tools, Luke””
This blog is the third in a series on stapbpf, SystemTap’s BPF (Berkeley Packet Filter) backend. In the first post, Introducing stapbpf – SystemTap’s new BPF backend, I explain what BPF is and what features it brings to SystemTap. In the second post, What are BPF Maps and how are they used in stapbpf, I examine BPF maps, one of BPF’s key components, and their role in stapbpf’s implementation.
In this post, I introduce stapbpf’s recently added support for tracepoint probes. Tracepoints are statically-inserted hooks in the Linux kernel onto which user-defined probes can be attached. Tracepoints can be found in a variety of locations throughout the Linux kernel, including performance-critical subsystems such as the scheduler. Therefore, tracepoint probes must terminate quickly in order to avoid significant performance penalties or unusual behavior in these subsystems. BPF’s lack of loops and limit of 4k instructions means that it’s sufficient for this task.
Continue reading “SystemTap’s BPF Backend Introduces Tracepoint Support”
Compared to SystemTap’s default backend, one of stapbpf’s most distinguishing features is the absence of a kernel module runtime. The BPF machinery inside the kernel instead mostly handles its runtime. Therefore it would be very helpful if BPF provided us with a way for states to be maintained across multiple invocations of BPF programs and for userspace programs to be able to communicate with BPF programs. This is accomplished by BPF maps. In this blog post, I will introduce BPF maps and explain their role in stapbpf’s implementation.
What are BPF maps?
BPF maps are essentially generic data structures consisting of key/value pairs. They are created from userspace using the BPF system call, which returns a file descriptor for the map. The key size and value size are specified by the user, allowing for the storage of key/value pairs with arbitrary types. Once a map is created, elements can be accessed from userspace using the BPF system call. Maps are automatically deallocated once the user process that created the map terminates (although it is possible to force the map to persist longer than this process). Stapbpf uses the following function to create new BPF maps.
Continue reading “What are BPF Maps and how are they used in stapbpf”
SystemTap 3.2 includes an early prototype of SystemTap’s new BPF backend (stapbpf). It represents a first step towards leveraging powerful new tracing and performance analysis capabilities recently added to the Linux kernel. In this post, I will compare the translation process of stapbpf with the default backend (stap) and compare some differences in functionality between these two backends.
Continue reading “Introducing stapbpf – SystemTap’s new BPF backend”