SystemTap is an instrumentation tool included in Red Hat Enterprise Linux (RHEL) and Fedora. Most languages provide a means to reuse code: Python provides modules and C provides libraries for encapsulating code for reuse. In SystemTap, these reusable pieces are called "tapsets". Why write everything from scratch in your SystemTap script when you can save time by using probe points and functions defined by existing tapsets? Using tapsets can also make your SystemTap script more portable, because they can hide some of the variations in probe points between different versions of the code being instrumented.
First, I will show you where to find documentation about the probes and functions distributed with SystemTap as tapsets, and then I will explain some of the details contained in a tapset. Packages outside of SystemTap also include tapsets, and we will look at an example from the Ruby language. Finally, we will cover how to make it easier to develop and use your own local tapset. This discussion of SystemTap tapsets should make it easier for you to develop and distribute SystemTap instrumentation.
Documentation on tapsets
When developing SystemTap, we wanted to make it as easy as possible for people to discover the functions and instrumentation probe points available in the tapsets. If information about the tapsets were hidden away and difficult to navigate, it would negate the benefit of tapsets. People would end up "rolling their own" implementations of tapset functionality, only to find out later that an existing tapset does exactly what they want. To make life easier, the tapset source code includes documentation that is extracted by scripts. Keeping the documentation next to the source encourages developers to update the documentation when they update a part of a tapset.
When we generate a new release of SystemTap, a script updates the website documentation on the current tapsets. The resulting documentation is the SystemTap Tapset Reference Manual. This web page provides a summary of each probe point and function, and you can click on a name to get a more detailed description of the item. For example, here is the entry for probe::vm.brk:
- Name:
probe::vm.brk: Fires when a brk is requested (i.e., the heap will be resized).
- Synopsis:
vm.brk
- Values:
address: The requested address.
name: Name of the probe point.
length: The length of the memory segment.
- Context:
The process calling brk.
As seen above, the format is similar to a man page. The documentation includes the values that are available at the probe point. The vm.brk probe point provides three local context values: address to indicate the requested address, name for the name of the probe point itself, and length for the size of the memory segment. Tapset functions are documented the same way. For example, function::proc_mem_txt below has one optional argument, pid (there are actually two overloaded versions of the function). If no argument is given, the function returns the text size of the process currently running on the processor. If a pid argument is given, it returns the text size of that process.
- Name:
function::proc_mem_txt: Program text (code) size in pages.
- Synopsis:
proc_mem_txt:long()
proc_mem_txt:long(pid:long)
- Arguments:
pid: The pid of the process to examine.
- Description:
- Returns the current process text (code) size in pages, or zero when there is no current process or the number of pages couldn't be retrieved.
- Returns the given process text (code) size in pages, or zero when the process doesn't exist or the number of pages couldn't be retrieved.
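Documented values like these are all a script needs in order to use a probe point. The following minimal sketch prints every heap resize request using only the vm.brk values from the entry above, plus the standard tapset functions execname() and pid(). It assumes that the vm.brk alias supports your kernel and that you run the script with sufficient privileges (root, or membership in the stapdev group). The ten-second timer is an arbitrary choice for illustration.

```systemtap
// Print each brk request using the context values documented for vm.brk.
probe vm.brk {
  printf("%s[%d] %s: address=0x%x length=%d\n",
         execname(), pid(), name, address, length)
}

// Stop tracing after ten seconds.
probe timer.s(10) {
  exit()
}
```

Nothing in the script refers to do_brk_flags or do_brk. That detail stays inside the tapset.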
What is in a tapset
If you want to take a closer look at how the tapsets are implemented, you can look at the files in /usr/share/systemtap/tapset. For the previous examples, /usr/share/systemtap/tapset/linux/memory.stp contains the vm.brk probe point:
/**
* probe vm.brk - Fires when a brk is requested (i.e. the heap will be resized)
*
* @name: name of the probe point
* @address: the requested address
* @length: the length of the memory segment
*
* Context:
* The process calling brk.
*/
probe vm.brk = kernel.function("do_brk_flags")!, kernel.function("do_brk") {
name = "brk"
address = $addr
length = $len
}
The special comment before the probe point definition is what gets converted into the documentation. The second line of the comment is the summary. The lines beginning with @ describe the three context variables available at this probe point. The last section of the comment, Context:, describes the environment in which this probe point operates.
Following the special comment is the line that maps the vm.brk probe point to the actual locations to instrument. Multiple probe locations are separated by commas. Here they are the Linux kernel's do_brk_flags and do_brk functions. In this particular probe point, the first function, do_brk_flags, is followed by an exclamation point (!). The exclamation point means that if this function is found, SystemTap places the instrumentation on it and does not attempt to instrument anything after the exclamation point. This is one of the ways that SystemTap hides the variations between different versions of the kernel. If the do_brk_flags function is not found, SystemTap instruments the do_brk function instead.
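To see which of the two functions the alias was bound to on a particular kernel, you can print the probe point that actually fired. The sketch below uses the standard pp() function, which returns the active probe point as a string. It exits after the first hit, so any process that calls brk will end it. The exact text printed depends on your kernel and its debuginfo, so no sample output is shown here.

```systemtap
// Report which underlying kernel function vm.brk resolved to, then stop.
// pp() returns the full probe point, e.g. a kernel.function("...") string.
probe vm.brk {
  printf("vm.brk resolved to %s\n", pp())
  exit()
}
```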
At the end of the probe line is an opening brace ({) matched with a closing brace (}) several lines later. This is the body of the probe, which defines the context variables name, address, and length. The name context variable is a simple string that provides a handy name for the probe. The address and length context variables come from the addr and len arguments of do_brk_flags (or do_brk), respectively.
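Because the alias body runs before your own handler, your handler sees both the friendly context variables and the raw target variables of the probed function. The following sketch prints the two side by side. It relies on SystemTap's $$parms, which pretty-prints all parameters of the probed function. That this works through an alias is an assumption worth confirming on your system with a quick run.

```systemtap
// Compare the alias's context variables with the raw function
// parameters they are derived from (addr and len in do_brk_flags/do_brk).
probe vm.brk {
  printf("address=0x%x length=%d\n", address, length)
  printf("raw parameters: %s\n", $$parms)
  exit()
}
```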
Below are the two versions of the SystemTap proc_mem_txt function. The first takes no argument and returns the text size of the current process. The second returns the text size of a specific process (pid). As with the vm.brk probe point, each function is preceded by a special comment that contains its documentation. The two functions are virtually identical except for how the local task variable is initialized in the body. Both check that the task and its memory map (mm) are valid before computing the size of the program's text region:
/**
* sfunction proc_mem_txt - Program text (code) size in pages
*
* Description: Returns the current process text (code) size in pages,
* or zero when there is no current process or the number of pages
* couldn't be retrieved.
*/
function proc_mem_txt:long ()
{
task = task_current()
if (_stp_valid_task(task)) {
mm = @task(task)->mm
if (mm != 0) {
s = @mm(mm)->start_code
e = @mm(mm)->end_code
return _stp_mem_txt_adjust(s, e)
}
}
return 0
}
/**
* sfunction proc_mem_txt - Program text (code) size in pages
*
* @pid: The pid of process to examine
*
* Description: Returns the given process text (code) size in pages,
* or zero when the process doesn't exist or the number of pages
* couldn't be retrieved.
*/
function proc_mem_txt:long (pid:long)
{
task = pid2task(pid)
if (_stp_valid_task(task)) {
mm = @task(task)->mm
if (mm != 0) {
s = @mm(mm)->start_code
e = @mm(mm)->end_code
return _stp_mem_txt_adjust (s, e)
}
}
return 0
}
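The pid overload lets you watch a process other than the one that happens to be running when a probe fires. The following is a minimal sketch that samples one process's text size once per second for five seconds. It assumes you pass the process with -x PID (or start it with -c), since target() returns that PID. The sample count and interval are arbitrary choices.

```systemtap
// Sample a chosen process's text size once per second, five times.
// Run as: stap proc_txt.stp -x PID   (target() returns that PID)
global samples

probe timer.s(1) {
  printf("pid %d: %d pages of text\n", target(), proc_mem_txt(target()))
  if (++samples >= 5)
    exit()
}
```

The no-argument form is more useful in probes that fire in the context of the process of interest, such as vm.brk.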
Using tapsets to abstract away code differences
A benefit of using tapsets is a common instrumentation interface that works across different versions of a piece of software. SystemTap is used to instrument a variety of Linux kernels in different releases of RHEL and Fedora, and over time both the function names and the variables available in them have changed. One short example is the vm.pagefault probe below:
probe vm.pagefault = kernel.function("handle_mm_fault@mm/memory.c").call !,
kernel.function("__handle_mm_fault@mm/memory.c").call
{
name = "pagefault"
write_access = (@defined($flags)
? $flags & FAULT_FLAG_WRITE : $write_access)
address = $address
}
The name of the kernel function that handles page faults has changed over time, so there are two possible functions to probe: handle_mm_fault and __handle_mm_fault. The ! at the end of the first line's probe point means that if the handle_mm_fault function is found, SystemTap does not instrument the following probe points in the list. If the probe point suffixed by ! is not available, SystemTap instruments the __handle_mm_fault function instead. There is also a ? suffix, which makes a probe point completely optional: SystemTap instruments it if it is available and skips it without an error if it is not.
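To see how ? differs from !, the sketch below marks both page fault handlers as optional and counts hits per function with the standard ppfunc() function. With ?, every point that exists gets instrumented, not just the first. In kernels where handle_mm_fault calls __handle_mm_fault, a single fault may therefore be counted under both names. That double counting is one reason an alias like vm.pagefault uses ! instead. The five-second window is arbitrary.

```systemtap
// Each "?" makes its probe point optional: instrumented if the function
// exists, silently skipped if not. Both are probed when both exist.
global hits

probe kernel.function("handle_mm_fault") ?,
      kernel.function("__handle_mm_fault") ? {
  hits[ppfunc()]++
}

probe timer.s(5) {
  foreach (func_name in hits)
    printf("%-20s %d\n", func_name, hits[func_name])
  exit()
}
```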
Changes in variable names, locations, or semantics can be handled with the @defined() operator. In this example, the function argument that indicates whether the fault was caused by a read or a write access changed between kernel versions. @defined() selects the proper way of extracting that information for write_access, based on which variable is available. Writing @choose_defined($flags & FAULT_FLAG_WRITE, $write_access) would be an alternative way of determining write_access.
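From a script's point of view, all of this version handling is invisible. The minimal sketch below tallies page faults by process and access type using only the vm.pagefault alias and its write_access value, with no reference to $flags or $write_access. The ten-entry limit and five-second window are arbitrary choices for illustration.

```systemtap
// Tally page faults by process and access type. write_access is computed
// by the tapset, whichever kernel argument happens to be available.
global faults

probe vm.pagefault {
  kind = write_access ? "write" : "read"
  faults[execname(), kind]++
}

probe timer.s(5) {
  foreach ([proc, kind] in faults- limit 10)
    printf("%-16s %-5s %d\n", proc, kind, faults[proc, kind])
  exit()
}
```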
Tapsets for user applications
A number of user-space applications that provide user-space markers are listed on the SystemTap wiki under "Applications with built-in User-Space Markers", and some of them include their own tapsets. QEMU, glib2, Java, Perl, Ruby, and SSSD provide their own tapsets to simplify instrumenting the user-space application code. The tapsets for these applications are also placed in the /usr/share/systemtap/tapset directory, so they are available to SystemTap instrumentation scripts on the system.
For example, the ruby-libs RPM on RHEL 9 installs /usr/share/systemtap/tapset/libruby.so.3.0.stp. The Ruby runtime has a number of user-space markers at key locations in the Ruby library, which clearly indicate where to place instrumentation for monitoring particular events. The tapset makes those markers easier to use. Below is the probe for a Ruby method entry. It is similar to the vm.brk probe mentioned earlier: a comment describes the probe point and its values, followed by the mapping of ruby.method.entry to the user-space marker. The body of the probe handler maps the user-space marker arguments to more meaningful context variables: classname, methodname, file, and line:
/**
* probe ruby.method.entry - Fired just before a method implemented in Ruby is entered.
*
* @classname: Name of the class (string)
* @methodname: The method about to be executed (string)
* @file: The file name where the method is being called (string)
* @line: The line number where the method is being called (int)
*/
probe ruby.method.entry =
process("/usr/lib*/libruby.so.3.0").mark("method__entry")
{
classname = user_string($arg1)
methodname = user_string($arg2)
file = user_string($arg3)
line = $arg4
}
As with the Linux kernel, user-space application tapsets make it easier for developers to write portable instrumentation without digging through the application code for suitable places to instrument. Below, SystemTap lists the available arguments for the raw marker and for the tapset probe point, which cover essentially the same location in the Ruby library:
$ stap -L 'process("/usr/lib64/libruby.so.3.0").mark("method__entry")'
process("/usr/lib64/libruby.so.3.0.7").mark("method__entry") $arg1:long $arg2:long $arg3:long $arg4:long
$ stap -L 'ruby.method.entry'
ruby.method.entry classname:string methodname:string file:string line:long $arg1:long $arg2:long $arg3:long $arg4:long
Which would you prefer to use?
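The difference shows up directly in script code. The sketch below prints the class and method name for each call twice: once from the raw marker and once from the tapset alias. It assumes the x86_64 library path shown in the stap -L output above. If you run both probes in one script, each method call produces two lines.

```systemtap
// Raw marker: you must know the library path and what each $argN means.
probe process("/usr/lib64/libruby.so.3.0").mark("method__entry") {
  printf("raw:    %s#%s\n", user_string($arg1), user_string($arg2))
}

// Tapset alias: the path and argument decoding are handled for you.
probe ruby.method.entry {
  printf("tapset: %s#%s\n", classname, methodname)
}
```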
Using existing tapsets
If the tapset is in the default tapset directory, /usr/share/systemtap/tapset, SystemTap finds its functions and probe points automatically. For example, we can write the following very simple SystemTap script (tally_ruby_methods.stp), which uses the ruby.method.entry probe point from the Ruby tapset to tally method invocations:
global tally
probe ruby.method.entry { tally[classname,methodname] <<< 1 }
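The following shows a run of the tally_ruby_methods.stp script with the snmpcheck program, which is written in Ruby. Because the script never reads tally, SystemTap prints the array's statistics automatically when the script exits: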
$ stap tally_ruby_methods.stp -c 'snmpcheck 192.168.1.68'
WARNING: missing unwind/symbol data for module '/usr/bin/snmpcheck'
snmpcheck.rb v1.9 - SNMP enumerator
Copyright (c) 2005-2015 by Matteo Cantoni (www.nothink.org)
[+] Try to connect to 192.168.1.68:161 using SNMPv1 and community 'public'
[!] 192.168.1.68:161 SNMP request timeout
tally["SNMP::Manager::Config","transport"] @count=2 @min=1 @max=1 @sum=2 @avg=1
tally["SNMP::Manager::Config","host"] @count=2 @min=1 @max=1 @sum=2 @avg=1
tally["SNMP::Manager::Config","community"] @count=2 @min=1 @max=1 @sum=2 @avg=1
tally["SNMP::Manager::Config","version"] @count=1 @min=1 @max=1 @sum=1 @avg=1
tally["SNMP::Manager::Config","timeout"] @count=1 @min=1 @max=1 @sum=1 @avg=1
tally["SNMP::Manager::Config","use_IPv6"] @count=1 @min=1 @max=1 @sum=1 @avg=1
tally["SNMP::Manager::Config","max_recv_bytes"] @count=1 @min=1 @max=1 @sum=1 @avg=1
tally["SNMP::Manager::Config","port"] @count=1 @min=1 @max=1 @sum=1 @avg=1
tally["SNMP::Manager::Config","mib_dir"] @count=1 @min=1 @max=1 @sum=1 @avg=1
tally["SNMP::Manager::Config","ignore_oid_order"] @count=1 @min=1 @max=1 @sum=1 @avg=1
tally["SNMP::Manager::Config","retries"] @count=1 @min=1 @max=1 @sum=1 @avg=1
tally["SNMP::Manager::Config","mib_modules"] @count=1 @min=1 @max=1 @sum=1 @avg=1
tally["SNMP::Manager::Config","trap_port"] @count=1 @min=1 @max=1 @sum=1 @avg=1
tally["SNMP::Manager::Config","write_community"] @count=1 @min=1 @max=1 @sum=1 @avg=1
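If you want control over the report format, you can read the array yourself in an end probe. Reading it also turns off the automatic printing. The variant below is a sketch that uses a plain counter instead of a statistics aggregate and prints the ten most frequently called methods. The limit of ten is an arbitrary choice.

```systemtap
// Count calls per class/method and print the ten most frequent at exit.
global tally

probe ruby.method.entry {
  tally[classname, methodname]++
}

probe end {
  foreach ([cls, meth] in tally- limit 10)
    printf("%6d  %s#%s\n", tally[cls, meth], cls, meth)
}
```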
Developing your own local tapset
The Tapset Developer's Guide has suggestions for writing tapsets, whether you are adding instrumentation to a package you maintain or using probe points that already exist in a user-space application that does not yet have a SystemTap tapset. You may be developing your own tapset to simplify instrumenting some code but not yet be ready to place it in /usr/share/systemtap/tapset, the directory SystemTap searches by default. In that case, use the stap command option -I DIR to add other directories to the tapset search path.
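As an illustration, here is a sketch of a small local tapset that builds on the Ruby tapset, together with a script that uses it. The directory ~/tapsets, the file names, the ruby_calls.entry alias, and the ruby_fullname() function are all hypothetical names chosen for this example. The two parts are shown together, but the first part belongs in its own .stp file in the directory you pass to -I.

```systemtap
// --- ~/tapsets/ruby_calls.stp (hypothetical local tapset) ---
// A helper function plus an alias layered on the Ruby tapset's probe.
function ruby_fullname:string (cls:string, meth:string) {
  return cls . "#" . meth
}

probe ruby_calls.entry = ruby.method.entry {
  fullname = ruby_fullname(classname, methodname)
}

// --- show_calls.stp (script that uses the local tapset) ---
// Context variables from ruby.method.entry (file, line) remain visible.
probe ruby_calls.entry {
  printf("%s (%s:%d)\n", fullname, file, line)
}
```

With the tapset part saved under ~/tapsets, the script could then be run as stap -I ~/tapsets show_calls.stp -c 'snmpcheck 192.168.1.68'.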
Conclusion
SystemTap tapsets provide a cleaner way to instrument code by hiding some of the probe point details, providing more meaningful names for probe context variables, and including useful functions. Making use of existing tapsets, or creating your own for areas of code that do not yet have them, will help you be more productive.