In this article, I will explain how to use Delve to trace your Go programs and how Delve leverages eBPF under the hood to maximize efficiency and speed. The goal of Delve is to provide developers with a pleasant and efficient Go debugging experience. In that vein, this article focuses on how we optimized the function tracing subsystem so you can inspect your programs and get to root-cause analysis quicker. Delve has two different backends for its tracing implementation: one is ptrace-based, while the other uses eBPF. If you’re unfamiliar with any of these terms, don’t worry; I will explain along the way.
What is program tracing?
Tracing is a technique that allows a developer to see what the program is doing during execution. As opposed to typical debugging techniques, this method does not require direct user interaction. One of the most well-known tracing tools is strace, which allows developers to see which system calls their program makes during execution.
While strace is useful for gaining insight into system calls, the Delve trace command gives you insight into what is happening in user space within your Go programs. It allows you to trace arbitrary functions in your program in order to see the inputs and outputs of those functions. You can also use it to follow the control flow of your program, because it displays which goroutine is executing each traced function. For highly concurrent programs, this can be a quicker way to gain insight into your program's execution than starting a full interactive debugging session.
How to trace Go programs with Delve
Delve allows you to trace your Go programs by invoking the `dlv trace` subcommand. The subcommand accepts a regular expression and will execute your program, setting a tracepoint on each function that matches the regular expression and displaying the results in real time.
The following program is an example:
    package main

    import "fmt"

    func foo(x, y int) (z int) {
        fmt.Printf("x=%d, y=%d, z=%d\n", x, y, z)
        z = x + y
        return
    }

    func main() {
        x := 99
        y := x * x
        z := foo(x, y)
        fmt.Printf("z=%d\n", z)
    }
Tracing this program will give you the following output:
    $ dlv trace foo
    > goroutine(1): main.foo(99, 9801)
    x=99, y=9801, z=0
    >> goroutine(1): => (9900)
    z=9900
    Process 583475 has exited with status 0
As you can see, we supplied `foo` as the regexp, which in this case matched the function of the same name in the main package. The output prefixed with `>` denotes the function being called and shows the arguments the function was called with, while the output prefixed with `>>` denotes the return from the function and the return value associated with it. All input and output lines are prefixed with the goroutine executing at the time.
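To make the role of the regular expression more concrete, here is a minimal sketch of how a pattern can select functions by name. It assumes the pattern is applied unanchored to fully qualified names such as `main.foo`; the `matchFunctions` helper and the symbol list are hypothetical and only illustrate the idea, not Delve's actual implementation.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchFunctions returns the names from candidates that the pattern matches.
// The match is unanchored, which is how Go's regexp package behaves unless
// the pattern itself uses ^ and $.
func matchFunctions(pattern string, candidates []string) ([]string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	var matched []string
	for _, name := range candidates {
		if re.MatchString(name) {
			matched = append(matched, name)
		}
	}
	return matched, nil
}

func main() {
	// Hypothetical symbol list; a real binary contains many more functions.
	funcs := []string{"main.main", "main.foo", "main.fooBar", "fmt.Printf"}

	for _, pattern := range []string{"foo", `^main\.foo$`, `^main\.`} {
		matched, err := matchFunctions(pattern, funcs)
		if err != nil {
			fmt.Println("bad pattern:", err)
			continue
		}
		fmt.Printf("%q matches %v\n", pattern, matched)
	}
}
```

Under that assumption, a short pattern like `foo` can select more functions than you intended, so anchoring the pattern is a reasonable habit when you only want a single function.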
By default, the `dlv trace` command uses the ptrace-based backend; however, adding the `--ebpf` flag will enable the experimental eBPF-based backend. Using the previous example, if we were to invoke the trace subcommand like the following:
    $ dlv trace --ebpf foo
We would receive similar output. However, what happens behind the scenes is much different and significantly more efficient.
The inefficiencies of ptrace
By default, Delve will use the ptrace syscall in order to implement the tracing feature. ptrace is a syscall that allows one process to observe and manipulate another process on the same machine. In fact, on Unix systems, Delve uses ptrace to implement many of the low-level features provided by the debugger, such as reading/writing memory, controlling execution, and more.
While ptrace is a useful and powerful mechanism, it suffers from inherent inefficiencies. First, the fact that ptrace is a syscall means that we must cross the user space/kernel space boundary, which adds overhead every time ptrace is invoked. This is compounded by the number of times we have to invoke ptrace in order to achieve the desired results. Considering the previous example, the following is a rough outline of the tracing implementation steps using ptrace:
- Start the program and attach the debugger using `ptrace(PT_ATTACH)`.
- Set a breakpoint at each function that matches the provided regular expression, using `ptrace` to insert the breakpoint instruction into the traced process's executable memory.
- Additionally, set a breakpoint at each return instruction for that function.
- Continue the program, again using `ptrace(PT_CONT)`.
- Hit the breakpoint at function entry and read the function arguments. This step can involve many `ptrace` calls as we read CPU registers, memory on the stack, and memory in the heap if we must dereference a pointer.
- Continue the program again using `ptrace(PT_CONT)`.
- Hit the breakpoint at function return and go through the same process to read the return values, potentially involving many more calls to `ptrace` to read registers and memory.
- Continue the program again using `ptrace(PT_CONT)`.
- Repeat until the program ends.
Obviously, the more arguments and return values the function has, the more expensive every stop becomes. All the time the debugger spends making these `ptrace` syscalls, the program we are tracing is paused and not executing any instructions. From the user’s perspective, this makes the program run significantly slower than it otherwise would. Now, for development and debugging, maybe this isn’t such a big deal, but time is precious, and we should endeavor to do things as quickly as possible. The quicker your program runs while tracing, the quicker you can get to the root cause of the issue you’re trying to debug.
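To get a feel for how these costs add up, here is a back-of-the-envelope model in Go. The `stopCost` type and the per-stop counts are illustrative assumptions, not measurements of Delve. The model also ignores extra work a real debugger does, such as stepping over each breakpoint, so it likely undercounts.

```go
package main

import "fmt"

// stopCost is a hypothetical model of the work done at one breakpoint stop.
type stopCost struct {
	registerReads int // ptrace calls that fetch CPU registers
	memoryReads   int // ptrace calls that read stack or heap memory
}

// syscalls returns the number of syscalls the model charges for one stop:
// one wait to observe the stop, the reads, and one ptrace call to continue.
func (c stopCost) syscalls() int {
	const waitForStop, resume = 1, 1
	return waitForStop + c.registerReads + c.memoryReads + resume
}

// totalSyscalls estimates the syscalls needed to trace `calls` invocations
// of a function, with one stop at entry and one stop at return.
func totalSyscalls(calls int, entry, ret stopCost) int {
	return calls * (entry.syscalls() + ret.syscalls())
}

func main() {
	// Illustrative numbers only: assume one register read and one memory
	// read at function entry, and the same again at function return.
	entry := stopCost{registerReads: 1, memoryReads: 1}
	ret := stopCost{registerReads: 1, memoryReads: 1}

	for _, calls := range []int{1, 1000, 1000000} {
		fmt.Printf("%8d calls -> about %d syscalls\n", calls, totalSyscalls(calls, entry, ret))
	}
}
```

Even with these optimistic numbers, the syscall count grows linearly with the number of traced calls, and every one of those syscalls happens while the traced program is paused.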
Now, the question becomes: how can we make this better? In the next section, we discuss the new eBPF-based backend and how it improves upon this approach.
How eBPF is faster than ptrace
One of the biggest speed and efficiency improvements we can make is to avoid a lot of the syscall overhead altogether. This is where eBPF comes into play: instead of setting breakpoints on each function, we can set uprobes on function entry and exit and attach small eBPF programs to them. Delve uses the Cilium eBPF Go library to load and interact with the eBPF programs.
Each time the probe is hit, the kernel will invoke our eBPF program and then continue the main program once it has completed. The small eBPF program we write will handle all of the steps listed above at function entry and exit, but without all the syscall context switching, because the eBPF program executes directly within kernel space. Our eBPF program is able to communicate with the debugger in user space via eBPF ring buffer and map data structures, allowing Delve to collect all of the information it needs.
The benefit of this approach is that the time the program we are tracing needs to be paused is significantly decreased. Running our eBPF program when a probe is hit is much quicker than invoking multiple syscalls at function entry and exit.
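As a rough illustration of the user-space side, the sketch below decodes one fixed-size record of the kind an eBPF program could push through a ring buffer. The `traceEvent` layout, the `encodeEvent` and `decodeEvent` helpers, and the little-endian byte order are all assumptions made for this example; this is not Delve's actual wire format, and the example uses only the standard library rather than the Cilium eBPF library.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// traceEvent is a hypothetical fixed-size record written by the kernel side.
type traceEvent struct {
	GoroutineID uint64
	FuncAddr    uint64
	IsReturn    uint8    // 0 = function entry, 1 = function return
	NumValues   uint8    // how many entries of Values are populated
	_           [6]uint8 // padding so Values starts on an 8-byte boundary
	Values      [4]int64 // argument or return values
}

// String renders the event in a human-readable form.
func (e traceEvent) String() string {
	kind := "entry"
	if e.IsReturn == 1 {
		kind = "return"
	}
	n := int(e.NumValues)
	if n > len(e.Values) {
		n = len(e.Values)
	}
	return fmt.Sprintf("goroutine(%d) %s at %#x: %v", e.GoroutineID, kind, e.FuncAddr, e.Values[:n])
}

// encodeEvent stands in for the kernel side and serializes one record.
func encodeEvent(ev traceEvent) ([]byte, error) {
	var buf bytes.Buffer
	if err := binary.Write(&buf, binary.LittleEndian, ev); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decodeEvent parses one raw record as read from the ring buffer.
func decodeEvent(raw []byte) (traceEvent, error) {
	var ev traceEvent
	if err := binary.Read(bytes.NewReader(raw), binary.LittleEndian, &ev); err != nil {
		return traceEvent{}, fmt.Errorf("decoding trace event: %w", err)
	}
	if int(ev.NumValues) > len(ev.Values) {
		return traceEvent{}, fmt.Errorf("record claims %d values, max is %d", ev.NumValues, len(ev.Values))
	}
	return ev, nil
}

func main() {
	// Pretend foo(99, 9801) was just entered on goroutine 1.
	// The function address is made up.
	raw, err := encodeEvent(traceEvent{
		GoroutineID: 1,
		FuncAddr:    0x4a1b20,
		NumValues:   2,
		Values:      [4]int64{99, 9801},
	})
	if err != nil {
		panic(err)
	}
	ev, err := decodeEvent(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(ev)
}
```

The point of the design is that the debugger does one cheap read per event, instead of several syscalls per stop, and the traced program never has to wait for the debugger to finish reading.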
The flow of tracing and debugging using eBPF
Again, using the previous example, the following is a rough outline of the tracing implementation steps using eBPF:
- Start the program and attach using `ptrace(PT_ATTACH)`.
- Load all uprobes into the kernel for each function to trace.
- Continue the program using `ptrace(PT_CONT)`.
- Hit uprobes at function entry / exit. In kernel space, each time a probe is hit, the kernel runs our eBPF program, which gathers function arguments or return values and sends them back to user space. In user space, Delve reads them from the eBPF ring buffer as they arrive.
- Repeat until the program ends.
Using this method, Delve is able to trace a program in significantly less time than with the default ptrace implementation. Now, you may ask, if it is so much more efficient to use this method, why not make it the default? Eventually, it likely will be made the default. But for the time being, development is still ongoing to improve the eBPF-based backend and ensure it has parity with the ptrace-based one. However, you can still use it today by supplying the `--ebpf` flag during the `dlv trace` invocation.
To give a sense of how much more efficient this method is, I measured a different example program running by itself and then under the different tracing methods with the following results.
Program execution: 23.7µs
With eBPF trace: 683.1µs
With ptrace tracing: 2.3s
The numbers speak for themselves!
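To put those three measurements in relative terms, the short program below computes the slowdown factors. The `slowdown` helper is just a name chosen for this example; the durations are the ones quoted above.

```go
package main

import (
	"fmt"
	"time"
)

// slowdown reports how many times longer `traced` is than `baseline`.
func slowdown(baseline, traced time.Duration) float64 {
	return float64(traced) / float64(baseline)
}

func main() {
	// Measurements quoted in the article.
	untraced := 23700 * time.Nanosecond // 23.7µs
	ebpf := 683100 * time.Nanosecond    // 683.1µs
	ptrace := 2300 * time.Millisecond   // 2.3s

	fmt.Printf("eBPF vs. untraced:   %.1fx\n", slowdown(untraced, ebpf))
	fmt.Printf("ptrace vs. untraced: %.0fx\n", slowdown(untraced, ptrace))
	fmt.Printf("ptrace vs. eBPF:     %.0fx\n", slowdown(ebpf, ptrace))
}
```

In other words, for this particular program the eBPF backend made the run roughly 29 times slower than running untraced, while the ptrace backend made it roughly 97,000 times slower, a gap of more than three orders of magnitude between the two backends.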
Why not use uretprobes?
If you're familiar with eBPF and uprobes / uretprobes, you may be asking yourself why we use uprobes for everything as opposed to simply using uretprobes to capture return values. The explanation for this gets relatively complex, but the short version is that the Go runtime needs to inspect the call stack at various times during the execution of a Go program. When uretprobes are attached to a function, they overwrite the return address of that function on the stack. When the Go runtime then inspects the stack, it finds an unexpected return address for the function and ends up fatally exiting the program. To work around this, we simply use uprobes and leverage Delve's ability to inspect the machine instructions of the program to set probes at each return instruction for a function.
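You can see how much Go relies on return addresses by walking the stack yourself. The sketch below uses the public `runtime.Callers` and `runtime.CallersFrames` APIs to resolve each return address on the current goroutine's stack to a function name. It is only an analogy for the stack inspection the runtime performs internally; the `callerNames`, `inner`, and `outer` functions are names invented for this example.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
)

// callerNames walks the current goroutine's stack and returns the function
// names it finds, innermost caller first.
func callerNames() []string {
	pcs := make([]uintptr, 32)
	// Skip two frames: runtime.Callers itself and callerNames.
	n := runtime.Callers(2, pcs)
	frames := runtime.CallersFrames(pcs[:n])

	var names []string
	for {
		frame, more := frames.Next()
		names = append(names, frame.Function)
		if !more {
			break
		}
	}
	return names
}

//go:noinline
func inner() []string { return callerNames() }

//go:noinline
func outer() []string { return inner() }

func main() {
	// Each name is recovered from a return address stored on the stack.
	fmt.Println(strings.Join(outer(), " <- "))
}
```

Every name in that chain is derived from an address found on the stack. If one of those addresses were replaced with something that does not belong to any Go function, which is what a uretprobe does, this kind of walk could no longer make sense of the stack.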
Delve debugs Go code faster with eBPF
The overall goal of Delve is to help developers find bugs in their Go code as quickly as possible. To do this, we leverage the latest methods and tech available and try to push the boundaries of what a debugger can accomplish. Delve leverages eBPF under the hood to maximize efficiency and speed. User space tracing is a great tool for any engineer to have in their toolbox, and we aim to make it efficient and easy to use.