William Cohen

William Cohen has been a developer of performance tools at Red Hat for over a decade and has worked on a number of the performance tools in Red Hat Enterprise Linux and Fedora such as OProfile, PAPI, SystemTap, and Dyninst.

Recent Posts

How to use the Linux perf tool to count software events

The Linux perf tool was originally written to allow access to the performance monitoring hardware that counts hardware events, such as instructions executed, processor cycles, and cache misses. However, it can also be used to count software events, which can be useful in gauging how frequently some part of the system software is executed.

Recently someone at Red Hat asked whether there was a way to get a count of system calls being executed on the system. The kernel has a predefined software trace point, raw_syscalls:sys_enter, which collects that exact information; it counts each time a system call is made. To use the trace point events, the perf command needs to be run as root.
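
As a rough sketch of what such a measurement could look like, the Python helper below assembles a perf stat command line that counts the raw_syscalls:sys_enter tracepoint across all CPUs for a fixed interval. The helper name perf_syscall_count_cmd and the one-second sleep window are assumptions made for illustration, not taken from the article. The command is only executed when perf is installed and the script is running as root.

```python
import os
import shutil
import subprocess


def perf_syscall_count_cmd(seconds=1):
    """Build a perf stat command that counts system calls system-wide."""
    return [
        "perf", "stat",
        "-e", "raw_syscalls:sys_enter",  # tracepoint named in the text
        "-a",                            # count on all CPUs
        "sleep", str(seconds),           # length of the measurement window
    ]


if __name__ == "__main__":
    cmd = perf_syscall_count_cmd(1)
    print(" ".join(cmd))
    # Tracepoint events need root, so skip the run otherwise.
    if shutil.which("perf") and os.geteuid() == 0:
        subprocess.run(cmd, check=False)
```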

Continue reading “How to use the Linux perf tool to count software events”

Speed up SystemTap scripts with statistical aggregates

A common question that SystemTap can be used to answer involves how often particular events occur on the system. Some events, such as system calls, can happen frequently and the goal is to make the SystemTap script as efficient as possible.

Using statistical aggregates in place of regular integers is one way to improve the performance of SystemTap scripts. The statistical aggregates record data on a per-CPU basis to reduce the amount of coordination required between processors, allowing information to be recorded with less overhead. In this article, I’ll show an example of how to reduce overhead in SystemTap scripts.
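
The per-CPU idea can be modeled outside SystemTap. The Python sketch below is a toy model only: it is not SystemTap code and does not describe how SystemTap is implemented, and the class name PerCpuAggregate is hypothetical. It shows each CPU updating its own slot, with the slots combined only when a summary is requested. In SystemTap itself, values are added to an aggregate with the <<< operator and read back with extractors such as @count and @sum.

```python
class PerCpuAggregate:
    """Toy model of a per-CPU statistical aggregate (count, sum, min, max)."""

    def __init__(self, ncpus):
        # One independent slot per CPU, so recording never touches shared state.
        self.slots = [{"count": 0, "sum": 0, "min": None, "max": None}
                      for _ in range(ncpus)]

    def record(self, cpu, value):
        slot = self.slots[cpu]
        slot["count"] += 1
        slot["sum"] += value
        slot["min"] = value if slot["min"] is None else min(slot["min"], value)
        slot["max"] = value if slot["max"] is None else max(slot["max"], value)

    def summary(self):
        # The per-CPU slots are combined only when a result is requested.
        used = [s for s in self.slots if s["count"]]
        return {
            "count": sum(s["count"] for s in used),
            "sum": sum(s["sum"] for s in used),
            "min": min((s["min"] for s in used), default=None),
            "max": max((s["max"] for s in used), default=None),
        }


if __name__ == "__main__":
    agg = PerCpuAggregate(ncpus=4)
    for cpu, value in [(0, 5), (1, 7), (0, 2), (3, 9)]:
        agg.record(cpu, value)
    print(agg.summary())
```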

Continue reading “Speed up SystemTap scripts with statistical aggregates”

Speed up SystemTap script monitoring of system calls

SystemTap has extensive libraries called tapsets that allow developers to instrument various aspects of the kernel’s operation. SystemTap allows the use of wildcards to instrument multiple locations in particular subsystems. SystemTap has to perform a significant amount of work to create instrumentation for each of the places being probed. This overhead is particularly apparent when using wildcards for the system call tapset, which contains hundreds of entries (syscall.* and syscall.*.return). For some subsets of data collection, replacing the wildcard-matched syscall probes in SystemTap scripts with the kernel.trace("sys_enter") and kernel.trace("sys_exit") probes will produce smaller instrumentation modules that compile and start up more quickly. In this article, I’ll show a few examples of how this works.

Continue reading “Speed up SystemTap script monitoring of system calls”

How data layout affects memory performance

The mental model most people have of how computer memory (aka Random Access Memory or RAM) operates is inaccurate. The assumption that any access to any byte in memory has the same low cost does not hold on modern processors. In this article, I’ll explain what developers need to know about modern memory and how data layout can affect performance.

Current memory is starting to look more like an extremely fast block storage device. Rather than reading or writing individual bytes, the processor is reading or writing groups of bytes that fill a cache line (commonly 32 to 128 bytes in size). An access to memory requires well over a hundred clock cycles, two orders of magnitude slower than executing an instruction on the processor. Thus, programmers might reconsider the data structures used in their program if they are interested in obtaining better performance.
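
To make the cache-line effect concrete, the following Python sketch counts how many distinct cache lines a loop would touch when it reads one field from every record. It is a simple address model, not a measurement. The 64-byte line size, the 64-byte record, the 4-byte field, and all function names are assumptions chosen for illustration, and the model assumes that no field straddles a line boundary.

```python
def cache_lines_touched(addresses, line_size=64):
    """Count the distinct cache lines covered by a sequence of byte addresses."""
    return len({addr // line_size for addr in addresses})


def field_addresses_aos(n, record_size, field_offset):
    """Addresses of one field when whole records are stored back to back."""
    return [i * record_size + field_offset for i in range(n)]


def field_addresses_soa(n, field_size):
    """Addresses of one field when that field is stored in its own array."""
    return [i * field_size for i in range(n)]


if __name__ == "__main__":
    n = 1024
    # Hypothetical record: 64 bytes in total, of which the loop reads one 4-byte field.
    aos = cache_lines_touched(field_addresses_aos(n, record_size=64, field_offset=0))
    soa = cache_lines_touched(field_addresses_soa(n, field_size=4))
    print(aos, soa)
```

Under these assumptions, reading the field from 1024 back-to-back records touches 1024 cache lines, while reading the same field from a dedicated array touches 64, because sixteen 4-byte values share each line.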

Continue reading “How data layout affects memory performance”

Algorithms != Programs and Programs are not “One size fits all”

You’ve probably been taught that picking an algorithm that has the best Big-O asymptotic cost will yield the best performance. You might be surprised to find that on current hardware, this isn’t always the case. Much of algorithmic analysis assumes very simple costs where the order of operations doesn’t matter. Memory access times are assumed to be the same. However, the difference between a cache hit (a few processor clock cycles) and a cache miss that requires access to main memory (a couple hundred cycles) is immense.
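
A back-of-the-envelope cost model illustrates the point. The Python sketch below estimates total memory-access cycles from the number of accesses and the number of cache misses. The costs of 4 cycles per hit and 200 cycles per miss are assumed stand-ins for "a few" and "a couple hundred" cycles, and the two workloads are hypothetical numbers, not measurements.

```python
def estimated_cycles(accesses, misses, hit_cycles=4, miss_cycles=200):
    """Estimate memory-access cycles given how many accesses miss the cache."""
    hits = accesses - misses
    return hits * hit_cycles + misses * miss_cycles


if __name__ == "__main__":
    # Hypothetical workload A: more accesses, but 99% of them hit the cache.
    cost_a = estimated_cycles(accesses=1_000_000, misses=10_000)
    # Hypothetical workload B: one fifth of the accesses, but half of them miss.
    cost_b = estimated_cycles(accesses=200_000, misses=100_000)
    print(cost_a, cost_b)
```

With these assumed numbers, workload B performs far fewer memory accesses than workload A yet is estimated to spend more than three times as many cycles on them.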

This article series grew out of a discussion between the authors (William Cohen and Ben Woodard) about the disconnect between the ideas of algorithm efficiency typically taught in computer science and computer engineering and what is actually encountered in current computer systems.

Continue reading “Algorithms != Programs and Programs are not “One size fits all””

Reducing the startup overhead of SystemTap monitoring scripts with syscall_any tapset

A number of the SystemTap script examples in the newly released SystemTap 4.0 available in Fedora 28 and 29 have reduced the amount of time required to convert the scripts into running instrumentation by using the syscall_any tapset.

This article discusses the particular changes made in the scripts and how you might also use this new tapset to make the instrumentation that monitors system calls smaller and more efficient. (This article is a follow-on to my previous article: Analyzing and reducing SystemTap’s startup cost for scripts.)

The key observation that triggered the creation of the syscall_any tapset was that a number of scripts did not use the syscall arguments. The scripts often used syscall.* and syscall.*.return, but they were only concerned with the particular syscall name and the return value. This type of information for all the system calls is available from the sys_enter and sys_exit kernel tracepoints. Thus, rather than creating hundreds of kprobes, one for each of the individual functions implementing the various system calls, just a couple of tracepoints are used in their place.

Continue reading “Reducing the startup overhead of SystemTap monitoring scripts with syscall_any tapset”

Analyzing and reducing SystemTap’s startup cost for scripts

SystemTap is a powerful tool for investigating system issues, but for some SystemTap instrumentation scripts, the startup times are too long. This article is Part 1 of a series and describes how to analyze and reduce SystemTap’s startup costs for scripts.

We can use SystemTap to investigate this problem and provide some hard data on the time required for each of the passes that SystemTap uses to convert a SystemTap script into instrumentation. SystemTap has a set of probe points marking the start and end of passes from 0 to 5:

  • pass0: Parsing command-line arguments
  • pass1: Parsing scripts
  • pass2: Elaboration
  • pass3: Translation to C
  • pass4: Compilation of C code into kernel module
  • pass5: Running the instrumentation

Continue reading “Analyzing and reducing SystemTap’s startup cost for scripts”

Making the Operation of Code More Transparent and Obvious with SystemTap

You can study source code and manually instrument functions as described in the “Use the dynamic tracing tools, Luke” blog article, but why not make it easier to find key points in the software by adding user-space markers to the application code? User-space markers have been available in Linux for quite some time (since 2009). Inactive user-space markers do not significantly slow down the code. Having them available allows you to get a more accurate picture of what the software is doing internally when unexpected issues occur. The diagnostic instrumentation can be more portable with user-space markers, because the instrumentation does not need to rely on particular function names or line numbers in the source code. The naming of the instrumentation points can also make clearer what event is associated with a particular instrumentation point.

For example, Ruby MRI on Red Hat Enterprise Linux 7 has a number of different instrumentation points made available as a SystemTap tapset. If SystemTap is installed on the system, as described in “What is SystemTap and how to use it?”, the installed Ruby MRI instrumentation points can be listed with the stap -L command shown below. These events show the start and end of various operations in the Ruby runtime, such as the start and end of garbage collection (GC) marking and sweeping.

Continue reading “Making the Operation of Code More Transparent and Obvious with SystemTap”

“Use the dynamic tracing tools, Luke”

A common refrain for tracking down issues on computer systems running open source software is “Use the source, Luke.” Reviewing the source code can be helpful in understanding how the code works, but the static view may not give you a complete picture of how things work (or are broken) in the code. The paths taken through code are heavily data dependent. Without knowledge about specific values at key locations in code, you can easily miss what is happening. Dynamic instrumentation tools, such as SystemTap, that trace and instrument the software can help provide a more complete understanding of what the code is actually doing.

I have wanted to better understand how the Ruby interpreter works. This is an opportunity to use SystemTap to investigate Ruby MRI internals on Red Hat Enterprise Linux 7. The article “What is SystemTap and how to use it?” has more information about installing SystemTap. The x86_64 RHEL 7 machine has ruby-2.0.0.648-33.el7_4.x86_64.rpm installed, so the matching debuginfo RPM is installed to provide SystemTap with information about function parameters and to provide me with human-readable source code. The debuginfo RPM is installed by running the following command as root:

Continue reading ““Use the dynamic tracing tools, Luke””

Find what capabilities an application requires to successfully run in a container

Many developers would like to run their existing applications in a container with restricted capabilities to improve security. However, it may not be clear which capabilities the application uses, because the code relies on libraries or other code developed elsewhere. The developer could run the application in an unrestricted container that allows all syscalls and capabilities, to avoid hard-to-diagnose failures caused by the application’s use of forbidden capabilities or syscalls. Of course, this eliminates the enhanced security of restricted containers. At Red Hat, we have developed a SystemTap script (container_check.stp) to provide information about the capabilities that an application uses. Read the SystemTap Beginners Guide for information on how to set up SystemTap.

Continue reading “Find what capabilities an application requires to successfully run in a container”
