SystemTap 3.2 includes an early prototype of SystemTap's new BPF backend (stapbpf). It represents a first step towards leveraging powerful new tracing and performance analysis capabilities recently added to the Linux kernel. In this post, I will compare the translation process of stapbpf with that of the default backend (stap) and point out some differences in functionality between the two backends.
Stap and stapbpf share common parsing and semantic analysis stages. As input for translation, both backends receive data structures representing a parse tree complete with type information and references to the definitions of all variables and functions being used. A summary of this information can be displayed using the stap command's '-p2' option.
$ cat sample.stp
probe kernel.function("sys_read") {
    printf("hi from sys_read!\n");
    exit()
}

$ stap -p2 sample.stp
# functions
exit:unknown ()
kernel.function("SyS_read@fs/read_write.c:542") /* pc=_stext+0x273da0 */ /* <- kernel.function("SyS_read@fs/read_write.c:542") */

$ stap -p2 --runtime=bpf sample.stp
# functions
_set_exit_status:long ()
exit:unknown ()
# probes
kernel.function("SyS_read@fs/read_write.c:542") /* pc=_stext+0x273da0 */ /* <- kernel.function("SyS_read@fs/read_write.c:542") */
You can see that stapbpf's exit function involves an additional call to _set_exit_status, but otherwise the two backends are probing the same location.
From this point, the translation processes diverge. Stap's goal is to convert the script into a kernel module. To accomplish this, stap first translates the parse tree into the C source code of the desired kernel module. GCC is then used to compile this source code into the actual kernel module. The '-p4' option can be used with the stap command to stop after pass 4 and produce the kernel object file.
# stap -p4 sample.stp
[...]_1316.ko
# staprun [...]_1316.ko
hi from sys_read!
Instead of C, stapbpf translates the script directly into BPF bytecode to be executed by an in-kernel virtual machine. The bytecode is then stored in a BPF-ELF file intended for use by the stapbpf runtime.
# stap -p4 --runtime=bpf sample.stp
stap_1348.bo
# stapbpf stap_1348.bo
hi from sys_read!
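The invocations shown so far differ only in the '--runtime=bpf' flag. As a minimal sketch, a wrapper script could assemble either command line from the flags used in this post. The helper name build_stap_command is hypothetical and not part of SystemTap, and the sketch only builds the argument list without running anything.

```python
import shlex


def build_stap_command(script, runtime=None, last_pass=None, verbose=False):
    """Assemble a stap command line from the flags shown in this post.

    build_stap_command is a hypothetical helper, not part of SystemTap.
    """
    cmd = ["stap"]
    if verbose:
        cmd.append("-v")
    if last_pass is not None:
        cmd.append(f"-p{last_pass}")        # stop after this pass, e.g. -p4
    if runtime is not None:
        cmd.append(f"--runtime={runtime}")  # e.g. --runtime=bpf
    cmd.append(script)
    return cmd


default_cmd = build_stap_command("sample.stp", last_pass=4)
bpf_cmd = build_stap_command("sample.stp", runtime="bpf", last_pass=4)

print(shlex.join(default_cmd))
print(shlex.join(bpf_cmd))
```

On a machine with SystemTap installed, either list could be handed to subprocess.run; that step is left out here because it needs root privileges and a matching kernel.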
Unlike building stap's kernel modules, producing the BPF bytecode requires no external compiler. This helps keep stapbpf's compile times and installation footprint low. With the '-v' option, we can see the duration of each stage of translation.
# stap -v -p4 sample.stp
[...]
Pass 3: translated to C [...] in 0usr/0sys/4real ms.
Pass 4: compiled C [...] in 1330usr/310sys/1559real ms.

# stap -v -p4 --runtime=bpf sample.stp
[...]
Pass 4: compiled BPF into "stap_3792.bo" in 0usr/0sys/0real ms.
Notice that passes 3 and 4 together take 1563 ms for stap but less than 1 ms for stapbpf (which combines passes 3 and 4 into a single pass).
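The totals quoted here come from adding up the "real" (wall-clock) figures on the pass lines. A small sketch of that arithmetic follows; the log text is copied from the verbose output in this post, while the regular expression and function names are my own assumptions about how one might parse it.

```python
import re

# Pass lines copied from the two verbose runs in this post.
STAP_LOG = """\
Pass 3: translated to C [...] in 0usr/0sys/4real ms.
Pass 4: compiled C [...] in 1330usr/310sys/1559real ms.
"""
STAPBPF_LOG = """\
Pass 4: compiled BPF into "stap_3792.bo" in 0usr/0sys/0real ms.
"""

PASS_RE = re.compile(
    r"^Pass (\d+): .* in (\d+)usr/(\d+)sys/(\d+)real ms\.$", re.M
)


def real_ms_by_pass(log):
    """Map pass number -> wall-clock ("real") milliseconds."""
    return {int(m.group(1)): int(m.group(4)) for m in PASS_RE.finditer(log)}


def translation_ms(log):
    """Sum the real time of passes 3 and 4 (a missing pass counts as 0)."""
    times = real_ms_by_pass(log)
    return sum(times.get(p, 0) for p in (3, 4))


print(translation_ms(STAP_LOG))     # 4 + 1559 = 1563
print(translation_ms(STAPBPF_LOG))  # 0
```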
Before a BPF program is loaded into the kernel, it is checked for safety by a BPF verifier built into the kernel. The verifier checks for undesirable behaviors such as out-of-bounds jumps, out-of-bounds stack loads and stores, and reads from uninitialized addresses. It also checks for the presence of unreachable instructions. Any BPF program that does not pass verification will not be loaded into the BPF virtual machine. Although the default stap backend is held to similar standards and is known to be very safe to use, stapbpf has the advantage of inheriting BPF's simpler security model.
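To give a feel for two of these checks, here is a toy sketch of a static pass that flags out-of-bounds jumps and unreachable instructions. It is not the kernel's verifier: the instruction format is a made-up mini-language, and only the jump-target rule (next instruction plus offset) mirrors how BPF jump offsets work.

```python
def check_program(insns):
    """Toy static check over a list of (opcode, offset) tuples.

    Opcodes in this made-up mini-language:
      ("op", None)    any straight-line instruction
      ("jmp", off)    unconditional jump to index + 1 + off
      ("jcond", off)  conditional jump: falls through or jumps
      ("exit", None)  end of program
    Returns a list of error strings; an empty list means "accepted".
    """
    n = len(insns)
    errors = []
    seen = set()
    worklist = [0] if n else []
    while worklist:
        i = worklist.pop()
        if i in seen:
            continue
        seen.add(i)
        opcode, off = insns[i]
        if opcode == "exit":
            successors = []
        elif opcode == "jmp":
            successors = [i + 1 + off]
        elif opcode == "jcond":
            successors = [i + 1, i + 1 + off]
        else:
            successors = [i + 1]
        for target in successors:
            if 0 <= target < n:
                worklist.append(target)
            else:
                errors.append(f"insn {i}: control flow leaves program (target {target})")
    # Anything the walk never visited cannot be executed.
    for i in range(n):
        if i not in seen:
            errors.append(f"insn {i}: unreachable")
    return errors


good = [("op", None), ("jcond", 1), ("op", None), ("exit", None)]
bad = [("op", None), ("jmp", 5), ("exit", None)]

print(check_program(good))
print(check_program(bad))
```

The second program is rejected twice over: its jump lands outside the program, and as a result the final exit instruction can never be reached.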
However, this advantage does come with some trade-offs. For example, BPF does not support writing to kernel memory. Although stap disables this ability by default, it does provide a "guru mode" that acts as an escape hatch for users who wish to have this level of control over their operating system. Because BPF offers no equivalent, stapbpf does not share stap's ability to, for example, administer security band-aids to a live system. Even more restrictive is that, in order to ensure that BPF programs terminate quickly, the verifier rejects any program with loops. While it would be possible for stapbpf to perform loop unwinding, BPF also imposes a limit of 4096 instructions per program, so only loops whose unwound bodies fit within that limit could be supported this way.
# stap --runtime=bpf contains_loops.stp
Error loading /tmp/stapxSM7Kg/stap_8316.bo: bpf program load failed: Invalid argument
[...]
Pass 5: run failed.

# stap --runtime=bpf too_many_insns.stp
Error loading /tmp/stapqxRXi4/stap_8432.bo: bpf program load failed: Argument list too long
[...]
Pass 5: run failed.
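The interaction between loop unwinding (unrolling) and the 4096-instruction limit is simple arithmetic. The sketch below assumes, as a simplification, that every iteration costs the same fixed number of instructions; the function names and the example body size of 20 instructions are illustrative, not measurements from stapbpf.

```python
BPF_MAX_INSNS = 4096  # per-program instruction limit cited in this post


def unrolled_size(body_insns, iterations, other_insns=0):
    """Instruction count after fully unrolling one loop.

    Assumes each iteration costs exactly body_insns instructions, which
    is a simplification: a real translator's per-iteration cost may vary.
    """
    return other_insns + body_insns * iterations


def fits_after_unrolling(body_insns, iterations, other_insns=0):
    """True if the unrolled program stays within the instruction limit."""
    return unrolled_size(body_insns, iterations, other_insns) <= BPF_MAX_INSNS


print(fits_after_unrolling(body_insns=20, iterations=100))   # 2000 insns: True
print(fits_after_unrolling(body_insns=20, iterations=1000))  # 20000 insns: False
print(BPF_MAX_INSNS // 20)  # most iterations a 20-insn body allows: 204
```

Under these assumptions, even a modest loop body exhausts the budget after a couple of hundred iterations, which is why unrolling alone cannot stand in for general loop support.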
The following table summarizes the differences between stap and stapbpf. Features that BPF permits but that are not yet implemented in stapbpf are marked 'possible'.
Feature | stap | stapbpf |
non-blocking probe handlers | yes | yes |
protected probe execution environment | yes | yes |
lock-protected global variables | per probe locking | per operation locking |
kprobes (DWARF) | yes | yes |
kprobes (DWARF-less) | yes | possible |
uprobes | yes | possible |
tracepoint probes | yes | possible |
probe dynamically loaded kernel objects | yes | possible |
timer-based probing | yes | yes |
able to change state in probed program | yes | possible (userspace only) |
means available to bypass protections for advanced users | yes | no |
loop support (for, while, foreach) | yes | limited* |
string support (variables, literals) | yes | limited** |
probe handler length limit | 1000 statements | 4096 instructions |
means available to increase handler length limit | yes | no |
kernel verifies the safety of program | no | yes |
* For and while loops are enabled in begin and end probes. These probes are executed in user space and therefore do not require verification.
** There is support for printf's format string literal.
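The "lock-protected global variables" row contrasts per probe locking with per operation locking. The deterministic simulation below is a generic illustration of why that granularity can matter for a read-then-write update. It is not a description of how either backend is implemented, and this post does not say how a given SystemTap statement maps onto locked operations; the step functions and schedules are assumptions made for the example.

```python
def run(schedule, handlers):
    """Execute handler steps in the order given by schedule.

    handlers maps a name to a list of step functions; each step takes
    the shared state dict and that handler's private locals dict.
    """
    state = {"count": 0}
    local_vars = {name: {} for name in handlers}
    position = {name: 0 for name in handlers}
    for name in schedule:
        step = handlers[name][position[name]]
        step(state, local_vars[name])
        position[name] += 1
    return state["count"]


def read(state, local):
    """One individually atomic operation: load the global."""
    local["tmp"] = state["count"]


def write(state, local):
    """Another individually atomic operation: store the new value."""
    state["count"] = local["tmp"] + 1


handlers = {"A": [read, write], "B": [read, write]}

# Lock held for the whole handler: A finishes before B starts.
whole_handler = run(["A", "A", "B", "B"], handlers)

# Lock held only per operation: steps from A and B may interleave.
per_operation = run(["A", "B", "A", "B"], handlers)

print(whole_handler)  # 2
print(per_operation)  # 1, because B read the counter before A wrote it
```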
As the table shows, stapbpf currently provides only a subset of stap's functionality. However, for systems whose security policies either prevent the use of the full kernel module backend or require software with a security model simpler than stap's, stapbpf aims to provide a convenient way to use this subset.
Last updated: December 12, 2017