A number of the SystemTap script examples in the newly released SystemTap 4.0 available in Fedora 28 and 29 have reduced the amount of time required to convert the scripts into running instrumentation by using the syscall_any
tapset.
This article discusses the particular changes made in the scripts and how you might also use this new tapset to make the instrumentation that monitors system calls smaller and more efficient. (This article is a follow-on to my previous article: Analyzing and reducing SystemTap’s startup cost for scripts.)
The key observation that triggered the creation of the syscall_any
tapset was a number of scripts that did not use the syscall
arguments. The scripts often used syscall.*
and syscall.*.return
, but they were only concerned with the particular syscall
name and the return value. This type of information for all the system calls is available from the sys_entry
and sys_exit
kernel tracepoints. Thus, rather than creating hundreds of kprobes for each of the individual functions implementing the various system calls, there are just a couple of tracepoints being used in their place.
Let's take a concrete look at the impact of using syscall_any.*.return
in place of syscall.*.return
with the errno.stp
example on a Fedora 28 machine. This errno.stp
script provides system-wide information about the system calls that return error codes. It prints a separate line for each combination of PID, system call, and error. At the end of each line is a count of the times that particular combination was seen. Below is a version of the script using syscall.*.return
.
#!/usr/bin/stap # # Copyright (C) 2010, 2018 Red Hat, Inc. # By Dominic Duval, Red Hat Inc. # dduval@redhat.com # # Monitors errors returned by system calls. # # USAGE: stap errno.stp # global execname, errors probe syscall.*.return { errno = retval if ( errno < 0 ) { p = pid() execname[p]=execname(); errors[p, errno, name] <<< 1 } } probe end { printf("\n") printf("%8s %-32s %-16s %-12s %8s\n", "PID", "Syscall", "Process", "Error", "Count") foreach ([pid, error, thissyscall] in errors- limit 20) { printf("%8d %-32s %-16s %-12s %8d\n", pid, thissyscall, execname[pid], error ? errno_str(error) : "", @count(errors[pid, error, thissyscall]) ) } } global prom_arr probe prometheus { foreach ([p, errno, name] in errors- limit 20) { prom_arr[p, errno, name] = @count(errors[p, errno, name]) } @prometheus_dump_array3(prom_arr, "count", "pid", "errno", "name") delete prom_arr }
We can use the stap_time.stp
script discussed in my previous SystemTap article to record the time required by the various SystemTap phases. The stap_time.stp
script will be running in another window.
Below is a run of the old_errorno.stp
script that instruments the system calls with the syscall.*.return
probe points. The command line has -m old_error
to keep the resulting kernel module old_errno.ko
around, the keep temporary files (-k
) option so we can look at the size of the files used to create the instrumentation, and -T 0.25
to run the instrumentation for a quarter second.
$ stap -m old_errno -k old_errno.stp -T 0.25 PID Syscall Process Error Count 8489 read pulseaudio EAGAIN 50 9016 recvmsg Timer EAGAIN 16 9223 recvmsg firefox EAGAIN 11 9016 futex Timer ETIMEDOUT 7 8386 recvmsg gnome-shell EAGAIN 7 25430 recvmsg Web Content EAGAIN 6 2196 futex pmdaperfevent ETIMEDOUT 2 16278 read stapio EAGAIN 1 25430 futex Web Content ETIMEDOUT 1 9792 recvmsg hexchat EAGAIN 1 Keeping temporary directory "/tmp/stapUfJ9KC"
Now we'll replace the syscall.*.return
in the script with syscall_any.return
. The new, improved version has similar output:
$ stap -m new_errno -k new_errno.stp -T 0.25 PID Syscall Process Error Count 8489 read pulseaudio EAGAIN 50 9016 recvmsg Web Content EAGAIN 40 9016 futex Web Content ETIMEDOUT 10 8386 recvmsg gnome-shell EAGAIN 9 11884 recvmsg Web Content EAGAIN 8 9792 recvmsg hexchat EAGAIN 4 2196 futex pmdaperfevent ETIMEDOUT 3 10336 read gnome-terminal- EAGAIN 2 16840 read stapio EAGAIN 1 5262 read qemu-system-x86 EAGAIN 1 1863 recvmsg gsd-color EAGAIN 1 811 openat systemd-journal ENOENT 1 11884 futex Web Content ETIMEDOUT 1 Keeping temporary directory "/tmp/staprI6U1R"
The stap_time
script provided the following output for the various passes of generating the instrumentation:
$ stap stap_time.stp old_errno.stp 80 2592 13033 807 20384 new_errno.stp 96 2594 1593 22 4125
With some minor work, Gnuplot can generate a graph of the data, as seen below. Pass 2 (elaboration) and pass 4 (compilation) are the ones that take a majority of the time in old_errno.stp
using the syscall.*.return
probe points. We see those are greatly reduced in new_errno.stp
using the syscall_any.return
probe point: from 13,033 ms to 1,593 ms for pass 2 and from 20,384 ms to 4,125 ms for pass 4. We also see pass 3 code generation is virtually nonexistent on the graph (reduced from 807 ms to 22 ms).
The new_errno.ko
script is less than 1/3 the size of the old_errno.ko
script:
$ ls -ls *.ko 472 -rw-rw-r--. 1 wcohen wcohen 481016 Nov 1 11:07 new_errno.ko 1528 -rw-rw-r--. 1 wcohen wcohen 1564360 Nov 1 11:06 old_errno.ko
Looking at the *src.c
files in the temporary directories /tmp/stapUfJ9KC/
and /tmp/staprI6U1R/
, we can see that old_errno_src.c
is about four times larger than new_errno_src.c
. This explains the additional overhead for the pass 3 code generation and pass 4 compilation and the larger old_errno.ko
.
$ ls -l /tmp/stapUfJ9KC/*src.c /tmp/staprI6U1R/*src.c -rw-rw-r--. 1 wcohen wcohen 967746 Nov 1 11:07 /tmp/staprI6U1R/new_errno_src.c -rw-rw-r--. 1 wcohen wcohen 4119779 Nov 1 11:06 /tmp/stapUfJ9KC/old_errno_src.c
Conclusion
If your SystemTap script just requires information about the system call name and return value for all the system calls, you should consider using the syscall_any
tapset in place of the syscall.*
to make the instrumentation compile faster and result in small instrumentation.
Additional resources
See my other articles on Red Hat Developers:
- Making the Operation of Code More Transparent and Obvious with SystemTap
- "Use the dynamic tracing tools, Luke"
- Find what capabilities an application requires to successfully run in a container
- How to avoid wasting megabytes of memory a few bytes at a time
- Instruction-level Multithreading to improve processor utilization
- “Don’t cross the streams”: Thread safety and memory accesses at the speed of light
Red Hat Developer has many other articles on SystemTap and performance.