Imagine you're a programmer with a problem: Your code is linked to a library that you're unfamiliar with. The code should work, but it doesn't. It almost works, but something is wrong inside the library. Another program works correctly with the same part of the same library. So, now what? It's probably a silly problem, but how will you locate it?
In a scenario like this one, you could be in for some serious source code gazing, documentation digestion, and mailing list archaeology. If that fails, it's time to reach out to human experts and hope for the best. Or, you could try to save time with a clever tool. I was recently hit with several code traps like this one. Fortunately, I'm familiar with SystemTap.
Statement tracing with SystemTap
SystemTap is a general-purpose, systemwide probing and tracing tool that is just about to pass its sixteenth birthday. It can do many things, including performance monitoring, injecting print statements into running programs, and inspecting or even modifying data structures. It can probe any part of the kernel, or a running program or library in C, C++, assembler, Java, Python, and other languages. Given enough access to DWARF debuginfo, SystemTap can reveal the internal operation of the whole software stack.
How does this help you today? By using tracing at the granularity of statements, you can find out which line of which function leads to a divergence of behavior between working and broken cases. Let's look at an example.
Tracing a function in a shared library
If you suspected divergence occurring in a particular function, myfn, of the shared library libfoo.so, you could trace it like this:
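// Probe every source-line statement of myfn in libfoo.so.
probe process("/lib64/libfoo.so").statement("myfn@*:*") {
    // Print the probe point, then the local variables in scope.
    println(pp(), " ", $$vars)
}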
Here, you are asking SystemTap to intercept the instructions of the libfoo.so shared library that correspond to each source-line statement of the myfn function. At each hit, it prints the current probe point (the file and line number) and pretty-prints the local variables. Plop that script into a file foo.stp. Set a $DEBUGINFOD_URLS environment variable if appropriate. Assuming the passing and failing tests are available as binaries, run the script once for each of the two tests:
# stap foo.stp -c "./test_scenario1" | tee test-1.txt
# stap foo.stp -c "./test_scenario2" | tee test-2.txt
If this worked, you should have two text files. Each contains a stream of lines that records the history of every invocation of the function, one line per statement executed, followed by a long list of the names and values of the variables in scope:
process("libfoo.so").statement("myfn@foo.c:1252") [...vars...]
process("libfoo.so").statement("myfn@foo.c:1253") [...vars...]
process("libfoo.so").statement("myfn@foo.c:1277") [...vars...]
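Because every trace line has the same shape, it is easy to turn the text files into structured records before comparing them. The following is a minimal Python sketch, not part of SystemTap. It assumes the variables appear as space-separated name=value pairs after the probe point; the TraceRecord type and parse_trace_line helper are hypothetical names chosen for illustration.

```python
import re
from typing import NamedTuple, Optional

# Matches lines such as:
#   process("libfoo.so").statement("myfn@foo.c:1252") a=0x1 b=0x0
# Assumption: the text after the probe point is a list of name=value pairs.
TRACE_RE = re.compile(
    r'process\("(?P<module>[^"]+)"\)'
    r'\.statement\("(?P<func>[^@"]+)@(?P<file>[^:"]+):(?P<line>\d+)"\)'
    r'\s*(?P<vars>.*)'
)


class TraceRecord(NamedTuple):
    module: str
    func: str
    file: str
    line: int
    variables: dict


def parse_trace_line(text: str) -> Optional[TraceRecord]:
    """Parse one trace line; return None if it is not a statement record."""
    m = TRACE_RE.match(text.strip())
    if m is None:
        return None
    # Split "a=0x1 b=0x0" into {"a": "0x1", "b": "0x0"}; skip tokens without "=".
    variables = dict(
        pair.split("=", 1) for pair in m.group("vars").split() if "=" in pair
    )
    return TraceRecord(
        m.group("module"),
        m.group("func"),
        m.group("file"),
        int(m.group("line")),
        variables,
    )
```

Values are kept as strings on purpose, since some of them might not be plain numbers.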
Differential analysis: Comparing traces
Next, open up the two files in an extra-wide text editor and compare them side-by-side. Segments should be much alike, maybe even the same sequence of line numbers. The key is to look for points where the line number sequences or data meaningfully diverge. A good editor like emacs, or classic Unix tools like perl, sed, and grep can help eliminate near-duplication by searching and replacing common content.
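If eyeballing two long files gets tedious, the line-number comparison can be partly automated. Here is a rough Python sketch that uses the standard difflib module to list the regions where the two line-number sequences differ. The function names are hypothetical, and the regular expression assumes trace lines shaped like the sample output above.

```python
import difflib
import re

# Pull the source line number out of a probe point such as
#   process("libfoo.so").statement("myfn@foo.c:1252")
LINE_RE = re.compile(r'statement\("[^@"]+@[^:"]+:(\d+)"\)')


def line_numbers(trace_lines):
    """Return the source line number of each trace line that has one."""
    numbers = []
    for text in trace_lines:
        m = LINE_RE.search(text)
        if m:
            numbers.append(int(m.group(1)))
    return numbers


def divergences(good_lines, bad_lines):
    """List (tag, good_numbers, bad_numbers) for each differing region."""
    good = line_numbers(good_lines)
    bad = line_numbers(bad_lines)
    # autojunk=False keeps frequently repeated line numbers from being ignored.
    matcher = difflib.SequenceMatcher(None, good, bad, autojunk=False)
    return [
        (tag, good[i1:i2], bad[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
```

To try it on the files produced earlier, pass the lines of each file: divergences(open("test-1.txt").read().splitlines(), open("test-2.txt").read().splitlines()). Each result names the statements that one run executed and the other did not, which is a reasonable place to start reading the source.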
Note: You can also add almost any conceivable filtering logic right into the SystemTap script to reduce or focus the trace.
With practice and a keen eye, you could get a side-by-side view as specific as the one shown in Figure 1.
In Figure 1, note the statement number divergence around line 1470. Note also the suggestive variable must_add_keep_alive changing its value to 0x1, but only in one of the divergent forks.
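A similar check can flag variables whose observed values differ between the two runs. The Python sketch below is only an illustration under a few assumptions: the variables are printed as name=value pairs after the probe point, and the helper names are made up. Expect some noise, because pointer values can legitimately differ from one run to the next.

```python
import re

# One name=value pair from the variable dump, for example "len=0x10".
VAR_RE = re.compile(r'(\w+)=(\S+)')


def values_seen(trace_lines):
    """Map each variable name to the set of values it took in the trace."""
    seen = {}
    for text in trace_lines:
        # Keep only the text after the probe point, where the variables are.
        _, _, tail = text.partition('") ')
        for name, value in VAR_RE.findall(tail):
            seen.setdefault(name, set()).add(value)
    return seen


def differing_variables(good_lines, bad_lines):
    """Return {name: (good_values, bad_values)} where the value sets differ."""
    good = values_seen(good_lines)
    bad = values_seen(bad_lines)
    return {
        name: (sorted(good.get(name, set())), sorted(bad.get(name, set())))
        for name in sorted(set(good) | set(bad))
        if good.get(name, set()) != bad.get(name, set())
    }
```

A variable that takes a value in only one of the traces, as must_add_keep_alive does in Figure 1, would show up in the result with two different value lists.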
Once you have the side-by-side comparison and analysis, you just need to pop open the library source code at the given file and line number, look at the variable, make the final inferences, and write a bug fix.
Conclusion
This sunny day scenario might sound too good to be true, but let me assure you it is not: It's based on not one but three real head-scratchers involving different bugs, programs, and libraries. Even as an experienced programmer, I benefited from a tool that let me cut right through the firehose of activity in an unfamiliar library and focus in on the most salient lines in the code.
Differential, statement-level tracing is a powerful technique, and SystemTap is one of the few tools that does it well. Try it, and check out some of SystemTap's other related examples.
Last updated: September 19, 2022