Making the Operation of Code More Transparent and Obvious with SystemTap

You can study source code and manually instrument functions as described in the “Use the dynamic tracing tools, Luke” blog article, but why not make it easier to find key points in the software by adding user-space markers to the application code? User-space markers have been available in Linux since 2009. Inactive user-space markers do not significantly slow down the code, and having them available gives you a more accurate picture of what the software is doing internally when unexpected issues occur. Diagnostic instrumentation built on user-space markers can also be more portable, because it does not rely on particular function names or line numbers in the source code. Finally, the name of an instrumentation point can make it clearer which event is associated with it.

For example, Ruby MRI on Red Hat Enterprise Linux 7 makes a number of instrumentation points available as a SystemTap tapset. If SystemTap is installed on the system, as described in “What is SystemTap and how to use it?”, the installed Ruby MRI instrumentation points can be listed with the “stap -L” command shown below. These events show the start and end of various operations in the Ruby runtime, such as the start and end of garbage collection (GC) marking and sweeping.

$ stap -L "ruby.**"
ruby.array.create size:long file:string line:long $arg1:long $arg2:long $arg3:long
ruby.cmethod.entry classname:string methodname:string file:string line:long $arg1:long $arg2:long $arg3:long $arg4:long
ruby.cmethod.return classname:string methodname:string file:string line:long $arg1:long $arg2:long $arg3:long $arg4:long
ruby.find.require.entry requiredfile:string file:string line:long $arg1:long $arg2:long $arg3:long
ruby.find.require.return requiredfile:string file:string line:long $arg1:long $arg2:long $arg3:long
ruby.hash.create size:long file:string line:long $arg1:long $arg2:long $arg3:long
ruby.load.entry loadedfile:string file:string line:long $arg1:long $arg2:long $arg3:long
ruby.load.return loadedfile:string $arg1:long $arg2:long $arg3:long
ruby.method.entry classname:string methodname:string file:string line:long $arg1:long $arg2:long $arg3:long $arg4:long
ruby.method.return classname:string methodname:string file:string line:long $arg1:long $arg2:long $arg3:long $arg4:long
ruby.object.create classname:string file:string line:long $arg1:long $arg2:long $arg3:long
ruby.parse.begin parsedfile:string parsedline:long $arg1:long $arg2:long
ruby.parse.end parsedfile:string parsedline:long $arg1:long $arg2:long
ruby.raise classname:string file:string line:long $arg1:long $arg2:long $arg3:long
ruby.require.entry requiredfile:string file:string line:long $arg1:long $arg2:long $arg3:long
ruby.require.return requiredfile:string $arg1:long $arg2:long $arg3:long
ruby.string.create size:long file:string line:long $arg1:long $arg2:long $arg3:long

Ruby and other languages provide tools that give some information about aspects of their operation, such as statistics about garbage collection. However, these tools are often designed to work only within the confines of that language and a single process. Such language-specific tools are less useful for examining problems that span code written in different languages or problems between multiple communicating processes. For instance, existing compiled C libraries are commonly plugged into Python, Ruby, and Java code. Having SystemTap tapsets for the operations and events in these shared libraries can make it easier to write diagnostics that give a clearer understanding of what is occurring across the multiple software components of a complex system.

For example, you might have a Ruby program that uses a GUI built on the GNOME GLib libraries, and you observe pauses during screen updates. You suspect that the pauses might be due to Ruby’s garbage collection, but they could also be caused by operations inside the GLib libraries. The developers of the GLib libraries have included a number of instrumentation points in the libraries. Having instrumentation points in both the Ruby runtime and the GLib libraries allows you to check both areas of code with a single tool, SystemTap.

Over time, developers and maintainers of various software packages have added user-space markers to their applications. At Red Hat, we have enabled that instrumentation where possible in the RHEL packages. On RHEL 7.5, a number of RPMs make SystemTap user-space probes available, as seen with a query of files in /usr/share/systemtap/tapset:

$ cd /usr/share/systemtap/tapset; find -path "*.stp" -exec rpm -qf {} \; |sort |uniq

You should consider adding similar instrumentation probe points to the applications you maintain to make it easier for you and others to investigate issues on complex systems. The article “Adding User Space Probing to an Application (heapsort example)” provides information about how to implement this instrumentation.
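
As a rough sketch of what such a marker can look like in C, the fragment below places two probe points around a small computation using the DTRACE_PROBE macros from <sys/sdt.h>, the header shipped in the systemtap-sdt-devel package on RHEL. The provider name demo, the probe names sum_begin and sum_end, and the function sum_squares are all made up for illustration; they do not come from the heapsort article. The fallback macros are an assumption added only so that the file still compiles on a system without the header.

```c
/* Use the real marker macros when <sys/sdt.h> is available. */
#if defined(__has_include)
#  if __has_include(<sys/sdt.h>)
#    include <sys/sdt.h>
#    define HAVE_SDT 1
#  endif
#endif

#ifndef HAVE_SDT
/* Assumed fallback: no-op stand-ins so the code builds without the header. */
#  define DTRACE_PROBE1(provider, name, a1) \
       do { (void)(a1); } while (0)
#  define DTRACE_PROBE2(provider, name, a1, a2) \
       do { (void)(a1); (void)(a2); } while (0)
#endif

/*
 * Hypothetical instrumented function: sums the squares 1*1 + ... + n*n.
 * The markers record the input when work starts and the input plus the
 * result when work finishes.
 */
long sum_squares(long n)
{
    long total = 0;

    DTRACE_PROBE1(demo, sum_begin, n);          /* marker: work starts */
    for (long i = 1; i <= n; i++)
        total += i * i;
    DTRACE_PROBE2(demo, sum_end, n, total);     /* marker: work ends */

    return total;
}
```

Assuming this function is linked into a binary named ./demo (a hypothetical name) and built with the real header, the two markers should be listed by stap -L 'process("./demo").mark("*")', in the same way as the Ruby probe points shown earlier.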