Making SystemTap instrumentation easier with tapsets

SystemTap is an instrumentation tool included in Red Hat Enterprise Linux (RHEL) and Fedora. Most languages provide a means to reuse code: Python provides modules and C provides libraries. SystemTap's equivalent is the "tapset". Why write everything from scratch in your SystemTap script when you can save time by using probe points and functions defined by existing tapsets? Using tapsets can also make your SystemTap script more portable, as they can hide some of the variations in probe points between different versions of the code being instrumented.

First, I will show you where to find documentation about the probes and functions distributed with SystemTap as tapsets, and then I will explain some of the details contained in a tapset. Other packages outside of SystemTap also include tapsets, and we will take a look at an example from the Ruby language. Finally, we will cover how to develop and use your own local tapset. This discussion of SystemTap tapsets should make it easier for you to develop and distribute SystemTap instrumentation.

Documentation on tapsets

When developing SystemTap, we wanted to make it as easy as possible for people to discover the functions and instrumentation probe points available in the tapsets. Having that information hidden away and difficult to navigate would negate the benefit of tapsets. People would end up "rolling their own" implementations of tapset functionality, only to find out later that an existing tapset does exactly what they want. To make life easier, the tapset source code includes documentation that is extracted by scripts. Having the documentation next to the source encourages developers to update the documentation when they update a part of a tapset.

When we generate a new release of SystemTap, a script updates the website documentation for the current tapsets. The resulting documentation is the SystemTap Tapset Reference Manual. This webpage provides a summary of each probe point and function, and clicking on a name gives a more detailed description of the item. For example, here is the entry for probe::vm.brk:

`probe::vm.brk`

Name: probe::vm.brk—Fires when a brk is requested (i.e., the heap will be resized).
Synopsis: vm.brk
Values:
- address: The requested address.
- name: Name of the probe point.
- length: The length of the memory segment.
Context: The process calling brk.

As seen above, the format is similar to a man page. The documentation includes the values that are available at the probe point. The vm.brk probe point provides three local context values: address for the requested address, name for the name of the probe point itself, and length for the size of the memory segment. Tapset functions are documented the same way. The entry for function::proc_mem_txt below shows that it has one optional argument, pid. If no argument is given, the function returns the text size of the process currently running on the processor. If a pid argument is given, it returns the text size of that process.

`function::proc_mem_txt`

Name: function::proc_mem_txt—Program text (code) size in pages.
Synopsis:
- proc_mem_txt:long()
- proc_mem_txt:long(pid:long)
Arguments: pid—The pid of process to examine.
Description:
- Returns the current process text (code) size in pages, or zero when there is no current process or the number of pages couldn't be retrieved.
- Returns the given process text (code) size in pages, or zero when the process doesn't exist or the number of pages couldn't be retrieved.
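
As a minimal sketch of how the documented names fit together, the script below probes vm.brk and prints its three context values along with the result of proc_mem_txt(). The 10-second timer and the output format are illustrative choices, not taken from the reference manual; execname() and pid() are standard tapset functions.

```systemtap
# Print each brk request with the context values documented for vm.brk.
probe vm.brk
{
  # name, address, and length are supplied by the tapset's probe alias.
  # proc_mem_txt() with no argument reports the text size, in pages,
  # of the process that triggered the probe.
  printf("%s(%d) %s: address=0x%x length=%d text_pages=%d\n",
         execname(), pid(), name, address, length, proc_mem_txt())
}

# Stop after 10 seconds (an arbitrary duration for this example).
probe timer.s(10)
{
  exit()
}
```

Running a script like this generally requires suitable privileges and debug information for the running kernel.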


What is in a tapset

If you want to take a closer look at how the tapsets are implemented, you can look at the files in /usr/share/systemtap/tapset. For the previous examples, we can look at /usr/share/systemtap/tapset/linux/memory.stp for the vm.brk probe point:

/**
 * probe vm.brk - Fires when a brk is requested (i.e. the heap will be resized)
 *
 * @name: name of the probe point
 * @address: the requested address
 * @length: the length of the memory segment
 *
 * Context:
 *  The process calling brk.
 */
probe vm.brk = kernel.function("do_brk_flags")!, kernel.function("do_brk") {
    name = "brk"
    address = $addr
    length = $len
}

Before the probe point definition there is a special comment that is converted into the documentation. The second line of the comment is the summary. The lines beginning with @ describe the three context variables available for use at this probe point. The final Context: section of the comment describes the environment in which the probe point fires.

Following the special comment is the line that maps the vm.brk probe point to the actual locations to instrument, here the Linux kernel's do_brk_flags and do_brk functions. Multiple probe locations are separated by commas. In this probe point the first function, do_brk_flags, is followed by an exclamation point (!), which means that if this function is found, SystemTap places the instrumentation on it and does not attempt to instrument anything after the exclamation point. If the do_brk_flags function is not found, SystemTap instruments the do_brk function. This is one of the ways that SystemTap hides the variations between different versions of the kernel.

At the end of the probe line is an opening brace ({) matched with a closing brace (}) several lines later. This is the body of the probe which defines the context variables: name, address, and length. The name context variable is just a simple string to provide a handy name for the probe. The address and length context variables are obtained from the do_brk_flags (or do_brk) function arguments addr and len, respectively.

Below are the two versions of the SystemTap proc_mem_txt function. The first takes no argument and returns the text size for the current process; the second returns the text size for a specific process (pid). Like the vm.brk probe point, each function is preceded by a special comment that contains its documentation. The two are virtually the same except for how the local task variable is initialized in the body of the function. Both check that the task and the memory map (mm) are valid before computing the size of the program's text region:

/**
 * sfunction proc_mem_txt - Program text (code) size in pages
 *
 * Description: Returns the current process text (code) size in pages,
 * or zero when there is no current process or the number of pages
 * couldn't be retrieved.
 */
function proc_mem_txt:long ()
{
  task = task_current()
  if (_stp_valid_task(task)) {
    mm = @task(task)->mm
    if (mm != 0) {
       s = @mm(mm)->start_code
       e = @mm(mm)->end_code
       return _stp_mem_txt_adjust(s, e)
    }
  }
  return 0
}
/**
 * sfunction proc_mem_txt - Program text (code) size in pages
 *
 * @pid: The pid of process to examine
 *
 * Description: Returns the given process text (code) size in pages,
 * or zero when the process doesn't exist or the number of pages
 * couldn't be retrieved.
 */
function proc_mem_txt:long (pid:long)
{
  task = pid2task(pid)
  if (_stp_valid_task(task)) {
    mm = @task(task)->mm
    if (mm != 0) {
       s = @mm(mm)->start_code
       e = @mm(mm)->end_code
       return _stp_mem_txt_adjust (s, e)
    }
  }
  return 0
}

Using tapsets to abstract away code differences

One benefit of tapsets is a common instrumentation interface that works across different versions of a piece of software. SystemTap is used to instrument a variety of Linux kernels across RHEL and Fedora releases, and the function names and variables available have changed between them. One short example is the vm.pagefault probe below:

probe vm.pagefault = kernel.function("handle_mm_fault@mm/memory.c").call !,
                     kernel.function("__handle_mm_fault@mm/memory.c").call
{
        name = "pagefault"
        write_access = (@defined($flags)
                        ? $flags & FAULT_FLAG_WRITE : $write_access)
        address =  $address
}

The name of the function that handles page faults in the Linux kernel has changed over time, so there are two possible functions to probe: handle_mm_fault and __handle_mm_fault. The ! at the end of the first line's probe point indicates that if the handle_mm_fault function is found, there is no need to instrument the following probe points in the list. If the probe point suffixed by ! is not available, SystemTap instruments the __handle_mm_fault function. There is also a ? suffix, which makes a probe point entirely optional: it is instrumented if it is available and skipped otherwise.

Changes in variable names, locations, or semantics can be handled with @defined(). In this example, the function argument describing whether the fault was caused by a read or a write access changed. Using @defined() selects the proper way to extract that information for write_access based on which variable is available. @choose_defined($flags & FAULT_FLAG_WRITE, $write_access) would be an alternative way of determining write_access.
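
A script that uses the alias sees none of these differences. The following sketch counts page faults per executable, split into reads and writes, using only the write_access value that vm.pagefault provides. The faults array, the 10-second run time, and the top-10 limit are assumptions made for this example.

```systemtap
global faults

# write_access is non-zero for write faults, whichever kernel variable
# the tapset had to read it from.
probe vm.pagefault
{
  faults[execname(), write_access ? "write" : "read"]++
}

# Stop after 10 seconds (an arbitrary duration for this example).
probe timer.s(10)
{
  exit()
}

# Print the ten busiest (executable, access type) pairs, highest count first.
probe end
{
  foreach ([exe, kind] in faults- limit 10)
    printf("%-16s %-5s %d\n", exe, kind, faults[exe, kind])
}
```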

Tapsets for user applications

A number of applications that provide user-space markers are listed on the SystemTap wiki under Applications with built-in User-Space Markers, and some of those include their own tapsets. QEMU, glib2, Java, Perl, Ruby, and SSSD provide their own tapsets to simplify instrumenting the user-space application code. The tapsets for these applications are also placed in the /usr/share/systemtap/tapset directory and are available to SystemTap instrumentation scripts on the system.

For example, the ruby-libs RPM on RHEL 9 installs /usr/share/systemtap/tapset/libruby.so.3.0.stp. The Ruby runtime has a number of user-space markers at key locations that clearly indicate where instrumentation for monitoring particular events should be placed in the Ruby library. Having a tapset simplifies instrumenting Ruby code. Below is the probe for a Ruby method entry. It is similar to the vm.brk probe mentioned earlier. It has a comment describing the probe point and its arguments. The actual mapping of ruby.method.entry to the user-space marker follows. The body of the probe handler maps the user-space marker arguments to more meaningful context variables: classname, methodname, file, and line:

/**
 * probe ruby.method.entry - Fired just before a method implemented in Ruby is entered.
 *
 * @classname: Name of the class (string)
 * @methodname: The method about to be executed (string)
 * @file: The file name where the method is being called (string)
 * @line: The line number where the method is being called (int)
 */
probe ruby.method.entry =
      process("/usr/lib*/libruby.so.3.0").mark("method__entry")
{
        classname  = user_string($arg1)
        methodname = user_string($arg2)
        file = user_string($arg3)
        line = $arg4
}

As with the Linux kernel, having tapsets for user-space applications makes it easier for developers to write portable instrumentation without having to dig through the application code looking for suitable places to instrument. Below, SystemTap lists the arguments for essentially the same location in the Ruby library, first through the raw marker and then through the tapset:

$ stap -L 'process("/usr/lib64/libruby.so.3.0").mark("method__entry")'
process("/usr/lib64/libruby.so.3.0.7").mark("method__entry") $arg1:long $arg2:long $arg3:long $arg4:long
$ stap -L 'ruby.method.entry'
ruby.method.entry classname:string methodname:string file:string line:long $arg1:long $arg2:long $arg3:long $arg4:long

Which would you prefer to use?

Using existing tapsets

If a tapset is in the default tapset directory, /usr/share/systemtap/tapset, SystemTap finds its functions and probe points automatically. For example, we can write the following very simple SystemTap script (tally_ruby_methods.stp) that uses the ruby.method.entry probe point from the previously mentioned Ruby tapset to tally method invocations:

global tally
probe ruby.method.entry { tally[classname,methodname] <<< 1 }

The following shows a run of the tally_ruby_methods.stp script with the snmpcheck program which is written in Ruby:

$ stap tally_ruby_methods.stp -c 'snmpcheck 192.168.1.68'
WARNING: missing unwind/symbol data for module '/usr/bin/snmpcheck'
snmpcheck.rb v1.9 - SNMP enumerator
Copyright (c) 2005-2015 by Matteo Cantoni (www.nothink.org)

[+] Try to connect to 192.168.1.68:161 using SNMPv1 and community 'public'

[!] 192.168.1.68:161 SNMP request timeout
tally["SNMP::Manager::Config","transport"] @count=2 @min=1 @max=1 @sum=2 @avg=1
tally["SNMP::Manager::Config","host"] @count=2 @min=1 @max=1 @sum=2 @avg=1
tally["SNMP::Manager::Config","community"] @count=2 @min=1 @max=1 @sum=2 @avg=1
tally["SNMP::Manager::Config","version"] @count=1 @min=1 @max=1 @sum=1 @avg=1
tally["SNMP::Manager::Config","timeout"] @count=1 @min=1 @max=1 @sum=1 @avg=1
tally["SNMP::Manager::Config","use_IPv6"] @count=1 @min=1 @max=1 @sum=1 @avg=1
tally["SNMP::Manager::Config","max_recv_bytes"] @count=1 @min=1 @max=1 @sum=1 @avg=1
tally["SNMP::Manager::Config","port"] @count=1 @min=1 @max=1 @sum=1 @avg=1
tally["SNMP::Manager::Config","mib_dir"] @count=1 @min=1 @max=1 @sum=1 @avg=1
tally["SNMP::Manager::Config","ignore_oid_order"] @count=1 @min=1 @max=1 @sum=1 @avg=1
tally["SNMP::Manager::Config","retries"] @count=1 @min=1 @max=1 @sum=1 @avg=1
tally["SNMP::Manager::Config","mib_modules"] @count=1 @min=1 @max=1 @sum=1 @avg=1
tally["SNMP::Manager::Config","trap_port"] @count=1 @min=1 @max=1 @sum=1 @avg=1
tally["SNMP::Manager::Config","write_community"] @count=1 @min=1 @max=1 @sum=1 @avg=1
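
The output above comes from SystemTap automatically printing the tally global at exit, since the script never reads it, which is why every statistic is shown for each entry. A possible variant, sketched here, adds an end probe that prints only the call counts, sorted from most to least frequent. The limit of 20 and the Class#method output format are choices made for this example.

```systemtap
global tally

probe ruby.method.entry
{
  tally[classname, methodname] <<< 1
}

# Reading tally here replaces the automatic dump of the whole aggregate.
# Sorting a statistics array by value (the trailing "-") orders it by
# @count, highest first.
probe end
{
  foreach ([cls, meth] in tally- limit 20)
    printf("%6d %s#%s\n", @count(tally[cls, meth]), cls, meth)
}
```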

Developing your own local tapset

The Tapset Developer's Guide offers suggestions for writing tapsets, whether you are adding instrumentation to a package you maintain or using probe points that already exist in a user-space application that does not yet have a SystemTap tapset. If you are developing your own tapset to simplify instrumenting the code but are not ready to place it in the system directory /usr/share/systemtap/tapset that SystemTap searches by default, use the stap command option -I dir to add other non-default directories to the search path.
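
As a sketch of that workflow, a local tapset can live in any directory and be named with -I at run time. The directory ~/stap-local, the file names, the mylocal.brk alias, and the proc_mem_txt_kb() helper are all hypothetical names invented for this example; the alias and function build on vm.brk, proc_mem_txt(), and mem_page_size() from the standard memory tapset.

```systemtap
# ~/stap-local/mylocal.stp: a local tapset file (hypothetical path and names).

# Text size of the current process in KiB rather than pages.
function proc_mem_txt_kb:long ()
{
  return proc_mem_txt() * mem_page_size() / 1024
}

# An alias layered on the existing vm.brk alias, adding one context variable.
probe mylocal.brk = vm.brk
{
  text_kb = proc_mem_txt_kb()
}

# A script in a separate file, for example brk_kb.stp, can then use the alias:
#
#   probe mylocal.brk
#   {
#     printf("%s: brk length=%d text=%dKiB\n", execname(), length, text_kb)
#   }
#
# and be run with the extra search directory:
#
#   stap -I ~/stap-local brk_kb.stp
```

Once the tapset has settled, the same file could be moved into the default tapset directory and the -I option dropped.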

Conclusion

SystemTap tapsets provide a cleaner way to instrument code by hiding some of the probe point details, providing more meaningful names for probe context variables, and including useful functions. Making use of existing tapsets, or creating your own for areas of code that do not currently have them, will help you be more productive.
