As every developer knows, debugging an application can be difficult, and you often spend as much or more time debugging an application as you did originally writing it. Every programmer develops their own collection of tools and techniques. Traditionally these have included full-fledged debuggers, instrumentation of the code, and tracing and logging. Each of these has its particular strengths and weaknesses.
Source-level debuggers like GDB, TotalView, and DDT operate outside of the address space of the programs being debugged and use special syscalls like ptrace to attach to processes and to read and write the process’s memory. To allow debugging, they maintain external control over the processes and threads. This external control can be slow and can unacceptably affect process performance because nearly every interaction requires round trips through the kernel and coordination with signals. The interface for debuggers to control processes was designed for a different era, when processes didn’t have many threads, and consequently these interfaces are not well suited to handling applications with many coordinating processes and threads. For example, many debugging functions require quiescing at least some portion of the process. Fully quiescing a process with ephemeral threads can be difficult or nearly impossible if the process being debugged is a complicated multithreaded state machine. A common problem arises when a bug is caused by the relative timing of threads: the simple act of stopping threads and quiescing the process, which debuggers need to do, can make the bug disappear and turn it into a Heisenbug.
Advantages:
- Full access to the process address space, i.e., not limited by scoping rules.
- The ability to stop processes and look around at variables and the state of execution.
- Functions can be executed completely outside of the program’s execution.
- The ability to modify data within the running program.
Disadvantages:
- External control can be slow due to round trips through the kernel.
- Difficulty managing multiple threads and quiescing the system.
- Difficulty reproducing timing-sensitive issues involving multiple threads.
- Especially with complex data structures like containers, the abstractions directly available in the debugger are often not the ones that the programmer is used to.
- Inline functions can disappear even when they are used, while other functions may not be instantiated or linked in.
- External debuggers can be confused by optimizations.
Internal instrumentation and embedded logging
Problems like those mentioned above have led many developers to fall back to the old standby of instrumenting their code. At the simplest level, this can be adding some kind of debug printing or logging at various points in the code. This is simple and effective and has the advantage that the debug tooling is written in the same language as the program. The instrumentation also has the advantage of having direct access to all the data and variables in the context where the instrumentation is inserted. Though debug information and debuggers have gotten much better at handling optimized code, they still aren’t perfect. Instrumentation has the advantage that it isn’t confused by optimizations or complex data representations. This advantage of local context and direct access to data can also be a curse because embedded instrumentation is limited by language scoping. If the data you want to present for debugging purposes isn’t accessible or is out of scope, embedded instrumentation, unlike a source-level debugger, can’t directly access it.
For all the simplicity of embedded instrumentation, it also introduces another set of problems. Do you leave the instrumentation or logging in production code and then turn it off or on with some type of mechanism? How do you select which part of the program you are going to debug? Where should this debugging output go? Will the receiving end of this information have the bandwidth and capacity to deal with the rate at which this information is generated? How do you provide enough synchronization and coordination that threads do not step on each other when writing their debug information?
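To make a few of these questions concrete, here is a minimal sketch using Python's standard `logging` module. It is only an illustration under stated assumptions: the logger name, the `worker` function, and the message format are hypothetical, and a native application would use its own logging facility. It shows instrumentation that stays in the code, is switched on at run time, writes to a chosen destination, and relies on the handler's internal lock so that threads do not interleave their lines.

```python
import io
import logging
import threading


def make_debug_logger(name, stream):
    """Build a logger whose DEBUG output goes to `stream` and starts disabled."""
    logger = logging.getLogger(name)
    logger.handlers.clear()
    logger.propagate = False
    # StreamHandler serializes each emitted record with an internal lock,
    # so lines written by different threads are not interleaved.
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(threadName)s %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.WARNING)  # instrumentation is present but silent
    return logger


def worker(logger, item):
    logger.debug("processing item=%d", item)  # embedded instrumentation point
    return item * 2


def run_workers(logger, count):
    threads = [
        threading.Thread(target=worker, args=(logger, i), name=f"worker-{i}")
        for i in range(count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    out = io.StringIO()
    log = make_debug_logger("demo.instrumentation", out)
    run_workers(log, 4)          # nothing is recorded while disabled
    log.setLevel(logging.DEBUG)  # switched on at run time, no rebuild needed
    run_workers(log, 4)
    print(out.getvalue(), end="")
```

The sketch does not address bandwidth: if every thread logs at a high rate, the single lock and the single stream become the bottleneck.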
Advantages:
- The data structures and abstractions are accessible in the same idiom as the code being debugged.
- Can be very quick because it runs inside the context of the process.
- All accessed functions are linked in and inline functions are available.
- Doesn’t get confused by optimization.
Disadvantages:
- Data that is out of scope can be very hard to access.
- Recompilation is often necessary.
- Active instrumentation isn’t advisable in production code.
- Instrumentation is often ephemeral, or it must be incorporated into a broader debug logging framework.
- Debug logging frameworks can be hard to target for unusual problems.
- High rates of logging in multithreaded applications can introduce challenges of their own, such as requiring synchronization and handling large volumes of data.
- Too much logging in certain areas can affect performance even when it is not enabled.
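The last point in the list can be illustrated with a small sketch, again in Python's standard `logging` module. The function names and the `counter` argument are hypothetical and exist only to make the cost visible. An unguarded call evaluates its arguments before the logger can check whether output is enabled, while a guarded call pays only for a cheap level check.

```python
import logging


def expensive_summary(data, counter):
    """Stand-in for a costly dump of program state; counts its own calls."""
    counter[0] += 1
    return ",".join(str(x) for x in sorted(data))


def unguarded(logger, data, counter):
    # The argument is evaluated before debug() can check the level,
    # so the cost is paid even when DEBUG output is disabled.
    logger.debug("state: %s", expensive_summary(data, counter))


def guarded(logger, data, counter):
    # The level check is cheap; the expensive work runs only when enabled.
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug("state: %s", expensive_summary(data, counter))


def count_expensive_calls(log_func, level, repeats=100):
    """Run `log_func` repeatedly and report how often the costly helper ran."""
    logger = logging.getLogger("demo.disabled_cost")
    if not logger.handlers:
        logger.addHandler(logging.NullHandler())
    logger.propagate = False
    logger.setLevel(level)
    counter = [0]
    for _ in range(repeats):
        log_func(logger, [3, 1, 2], counter)
    return counter[0]
```

With the level set to `logging.WARNING`, `count_expensive_calls(unguarded, ...)` still runs the helper on every iteration, whereas the guarded variant never runs it.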
To resolve many of the challenges presented by internal instrumentation while retaining almost all of its advantages, tracing frameworks have been developed. These are essentially libraries that solve many of the challenging problems presented by embedded logging and internal instrumentation. For example, LTTng allows a developer to instrument code in a way that is low overhead for production code, but then selectively turn various tracepoints on and off as needed to debug a problem. It provides low-overhead synchronization and coordination between threads writing data. It also has several mechanisms to receive the trace data and efficiently process it so that the transport and storage of the trace data aren’t overwhelmed.
Advanced tracing frameworks like LTTng are a huge step forward over simple embedded instrumentation, but they still have some limitations. They can generate and pass through huge amounts of data. Even assuming that storage is no problem, finding the problematic interaction in the huge sheaf of recorded events and piecing together an understanding of the state of a complex application from the logged data can be quite difficult. This can be particularly difficult when the problematic effect and its root cause are temporally dislocated. Furthermore, in spite of all the efforts of the designers of tracing frameworks like LTTng, there is still notable overhead in recording the traced events. Recording usually requires a system call to write the data, or at least synchronization on what could become a highly contested lock.
Advantages:
- Much of the complexity of the tracing framework is already implemented and debugged in the library.
- The tracing tools are sophisticated enough to provide effective ways to handle potentially large amounts of data from the app.
- The tracing library provides well-targeted mechanisms to enable and disable specific instrumentation.
- The data being marshaled to be presented by tracepoints is accessible in the same way as in the rest of the program.
- Tracepoints can be left in production code and can lie latent with minimal performance impact.
Disadvantages:
- The language scoping rules still apply, and data out of scope can still be hard to access.
- The amount of data being logged can affect performance and can still potentially overwhelm storage and the transport channel.
- The tracing subsystem may still affect performance and mask timing-sensitive problems by introducing context switching, synchronization, and syscalls when writing data or logging events.
- If the tracepoints and the data provided are not sufficient to identify the problem, a recompile may be necessary.
- Coming to an understanding of a problem based upon huge amounts of logged data can be difficult.
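The general shape of a tracing library can be sketched in a few lines of Python. This is a toy model and not LTTng's API: the `Tracer` class and its method names are hypothetical. It shows two of the ideas described above: tracepoints that lie latent until enabled by name, and per-thread buffers so that the hot path does not take a shared lock.

```python
import threading


class Tracer:
    """Toy tracer: named tracepoints, enabled at run time, per-thread buffers."""

    def __init__(self):
        self._enabled = set()
        self._local = threading.local()
        self._buffers = []                      # every thread's buffer, for collection
        self._registry_lock = threading.Lock()  # taken once per thread, not per event

    def enable(self, name):
        self._enabled.add(name)

    def disable(self, name):
        self._enabled.discard(name)

    def tracepoint(self, name, **fields):
        if name not in self._enabled:  # latent tracepoint: one set lookup
            return
        buf = getattr(self._local, "buf", None)
        if buf is None:
            buf = self._local.buf = []
            with self._registry_lock:
                self._buffers.append(buf)
        buf.append((name, fields))     # no shared lock on the hot path

    def collect(self):
        """Gather recorded events; intended to be called once writers are idle."""
        with self._registry_lock:
            return [event for buf in self._buffers for event in buf]
```

Even in this toy, everything that is enabled gets recorded, so the volume problem described above remains: the consumer still has to sift through every event.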
Programmatic debugging: the concept
Programmatic debugging is an emerging paradigm in debugging. It is designed to address many of the challenges affecting other debugging paradigms. The general concept is that you insert functions and control code into the process’s address space using the same syscall interface used by source-level debuggers. This code is a special-purpose debugger, written either to solve a particular problem or to illuminate a class of related problems. This debugging code can be inserted into a running process’s address space, so it does not need to be coded into the program’s source code the way that instrumentation is. This allows it to be used without having to edit and recompile the original source.
By having the debugging code live in the process’s address space rather than being back in the debugger, there isn’t the latency of a signal and a context switch introduced when a monitored event is encountered. The event is handled in the context of the process and the thread and so components of the context such as lock state remain intact. This overcomes one of the bigger weaknesses of most source level debuggers, the latency of responding to events.
Another advantage of having a special purpose debugger running in the executing process’s address space is that it can directly access the process’s memory to both read and write it. This can be much faster than the numerous ptrace peeks and pokes and all their associated trips through the kernel to gather or modify data in a process.
When used as a form of smart instrumentation to code, this special purpose debugger can do data reductions at the source making sure that only truly interesting events are logged. This overcomes one of the biggest problems with tracing a large-scale application, managing the volume of data being generated. No matter how advanced the design of a tracing framework, at some point data reduction at the source is the only practical approach. This data reduction can include any logic that can be conceived and implemented in the context of the process. So one capability of programmatic debugging is that it can be fashioned into a kind of smart tracer, which applies logic to events before passing them along.
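As a rough sketch of data reduction at the source, the following Python class keeps running statistics for every event but forwards only the outliers. The class name, the latency threshold, and the `sink` callable are assumptions made for illustration; a real programmatic debugger would apply whatever logic fits the problem at hand.

```python
class ReducingProbe:
    """Keeps running statistics and forwards only events over a threshold."""

    def __init__(self, threshold_ms, sink):
        self.threshold_ms = threshold_ms
        self.sink = sink      # callable that receives the few interesting events
        self.count = 0
        self.total_ms = 0.0

    def on_event(self, name, latency_ms):
        # Every event updates cheap in-process counters...
        self.count += 1
        self.total_ms += latency_ms
        # ...but only outliers are passed on to the tracer or logger.
        if latency_ms > self.threshold_ms:
            self.sink({"name": name, "latency_ms": latency_ms})

    def summary(self):
        mean = self.total_ms / self.count if self.count else 0.0
        return {"events": self.count, "mean_ms": mean}
```

Feeding such a probe a thousand ordinary events and two slow ones sends only the two slow ones downstream, while `summary()` still describes the whole population.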
Advantages:
- Can be very quick because it runs inside the context of the process.
- Can access variables that are out of scope at the locations where data is being recorded.
- Doesn’t require recompilation to change the instrumentation or run different tests.
- Doesn’t clutter up the original source code of an application with tracing or instrumentation.
- Doesn’t affect the performance of production code when not in use.
- Doesn’t require quiescing a program for most operations except initially inserting or changing the instrumentation.
- Fully programmable:
  - Can be programmed to do data reduction before reporting results to a tracer or logger.
  - Can be programmed to only record specific events.
  - Can be used to patch around broken code without having to recompile.
  - Can modify data.
Disadvantages:
- Still cutting edge.
- Can be technically challenging.
Programmatic debugging: conceptually, how does it work?
Programmatic debugging must use the same tool interfaces that other debuggers use; it just applies them in different ways. First, you have to write some code to execute in the context of the process. This will be compiled just like any other function in a program, but since it will be inserted into the address space of another process, it usually requires some special compiler flags.
Since all but the most general-purpose programmatic debuggers need to access program variables, these debugger functions need to be able to locate functions and variables. When these debuggers are compiled, they locate the functions and variables using the debug information that was generated when the target program was compiled. Locating variables this way gives them an advantage over embedded instrumentation in that they are not limited by the scoping rules of the language. If a variable is out of scope in a function, it can still be found in the debug information so long as the proper scope is specified.
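Python has no DWARF debug information, but a loose analogue of reaching a variable that the language's scoping rules would hide is frame introspection. The sketch below is only an analogy, not how a native programmatic debugger resolves symbols: `probe_read` is a hypothetical helper, and `sys._getframe` is a CPython-specific function.

```python
import sys


def probe_read(var_name, depth=1):
    """Search the calling frames for a local named `var_name` and return its value."""
    frame = sys._getframe(depth)  # CPython-specific: the caller's stack frame
    while frame is not None:
        if var_name in frame.f_locals:
            return frame.f_locals[var_name]
        frame = frame.f_back      # move outward to the next enclosing call
    raise LookupError(var_name)


def inner_step():
    # `request_id` is not in scope here, yet the probe can still find it
    # because the proper scope (a calling frame) is searched explicitly.
    return probe_read("request_id")


def handle_request():
    request_id = 42
    return inner_step()
```

Calling `handle_request()` returns the value of `request_id` even though `inner_step` never received it as an argument.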
Inserting these programmatic debuggers into a process’s address space is done using the same tool interfaces, i.e. ptrace, as the source level debuggers use. It basically pokes the function into the process’s address space.
Finally, for this programmatic debugger to actually do anything it must be triggered somehow. Once again using the debug information about the target, it can find functions, lines of code and addresses and mutate the binary to redirect program control flow to the programmatic debugging functions.
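The idea of redirecting control flow to a probe without touching the target's source can be approximated in Python by rebinding a function at run time. This is an analogy only: a native tool rewrites machine code in the target process, whereas this sketch swaps a name in a namespace. The `attach_probe` helper and the `target` namespace are hypothetical.

```python
import functools
import types


def attach_probe(namespace, func_name, probe):
    """Route calls to namespace.func_name through `probe`; return a detach function."""
    original = getattr(namespace, func_name)

    @functools.wraps(original)
    def trampoline(*args, **kwargs):
        probe(func_name, args, kwargs)    # debugging code runs in-process first
        return original(*args, **kwargs)  # then control returns to the target

    setattr(namespace, func_name, trampoline)
    return lambda: setattr(namespace, func_name, original)


# Hypothetical target whose source is left unmodified.
target = types.SimpleNamespace(compute=lambda x: x * x)

if __name__ == "__main__":
    seen = []
    detach = attach_probe(target, "compute", lambda n, a, k: seen.append((n, a)))
    target.compute(3)  # the probe records the call, then the original runs
    detach()           # the original function is restored
    print(seen)
```

The detach function matters as much as the attach: once the probe is removed, the target runs exactly as it did before.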
Comparison of debugging approaches (most table values are not recoverable)
- Approaches listed: source level debuggers; printf, PAPI, Caliper; SystemTap with the dyninst runtime
- Criteria compared:
  - Limited by scoping rules
  - Data exposed in the idiom of the original program
  - Performance impact when active
  - Performance impact when inactive
  - Can be used on running production programs
  - Requires recompilation of target program for a new test
  - Can do data reduction at source
  - Can modify or replace functions on running code
- Surviving values: medium to fast; Extensive when using manual tools.
Programmatic debugging is an emerging paradigm for debugging. It seeks to overcome many of the limitations of other approaches to debugging and is especially useful for complex multithreaded and distributed applications. It allows programmers to write special-purpose tools to debug applications as they run without modifying their source code or needing to recompile. Most of the time, it doesn’t suffer the overhead or the latency of the context switches needed by other kinds of debuggers. It allows direct access to variables that are out of scope at the point in the code where a targeted event happens. It can also be used to filter events down to only the most interesting ones at the source, thereby bypassing some of the data transport and storage requirements that are often a challenge for tracers.
In part two of this series, I will show how userspace SystemTap can be used to quickly develop some simple kinds of programmatic debuggers.
Last updated: February 2, 2017