Featured image for: Instant replay: Debugging C and C++ programs with rr.

The common theme in many time-travel movies is to go back in time to find out what went wrong and fix it. Developers also have that desire to go back in time and find why the code broke and fix it. But, often, that crucial step where everything went wrong happened long ago, and the information is no longer available.

The rr project lets programmers examine the entire life of a C or C++ program run, and replay code execution to see what action in the past caused "things to go horribly wrong." rr is packaged with Extra Packages for Enterprise Linux (EPEL) in Red Hat Enterprise Linux (RHEL), and with Fedora 31, 32, 33, and 34.

rr records trace information about the execution of an application. This information allows you to repeatedly replay a particular recording of a failure and examine it in the GNU Debugger (GDB) to better investigate the cause. In addition to replaying the trace, rr lets you run the program in reverse, in essence allowing you "rewind the tape" to see what happened earlier in the execution of the program.

The techniques that rr provides for recording the reproducer for further examination can be a useful addition to traditional core dumps and backtraces, which give a snapshot of an issue at  a particular moment. The rr recording can provide a way for developers to further investigate intermittent problems where only some application runs fail.

Let's see how to set up rr and use it in an example to better illustrate its utility.

Requirements and setup

Because rr uses a number of newer Linux kernel features and specific processor performance monitoring hardware, the environments it runs in are limited. The newer Fedora kernels have the needed support. However, to correctly track and recreate when asynchronous events happen in a thread, rruses performance monitoring hardware event counters that provide deterministic counts of when a thread is interrupted by an asynchronous event. Currently, rr supports these counts only for Intel processors using the Westmere or newer microarchitectures.

Installing rr on Fedora is a simple task, requiring a single RPM. If you are on RHEL, you will need to enable EPEL. Once you have access to the appropriate repositories, you can install rr with the following command:

$ sudo dnf -y install rr

Due to the possibility that hardware performance monitoring counters might leak privileged information, many kernels default to limiting their monitoring capabilities. You will need to run the following command to allow rr access to the performance monitoring counters:

$ sudo sh -c 'echo 1 >/proc/sys/kernel/perf_event_paranoid'

If you want to make that setting for the performance monitoring hardware permanent, you can also add the following line to /etc/sysctl.conf:

kernel.perf_event_paranoid = 1

A simple debugging example

The following is a toy program that computes 2 times 0, 1, 2, and 3 in an array and prints the information:

#include <stdio.h>
#define SIZE 4

void zero(char *a, int size)
{
	while (size>0)
		a[size--] = 0; 
}

void initialize(char *a, int size)
{
	zero(a, size);
}

void multiply(char *a, int size, int mult)
{
	int i;
	for (i=0; i<size; i++)
		a[i] = i * mult;
}

void pr_array(char *a, int size)
{
	int i;
	for (i=0; i<size; i++)
		printf("f(%d)=%d\n", i, a[i]);
}

int main(int argc, char **argv)
{
	char a[SIZE];
	int mult = 2;

	initialize(a, SIZE);
	multiply(a, SIZE, mult);
	pr_array(a, SIZE);
	return 0;
}

Compile it with the usual command, as follows, and include the debug option -g to allow later debugging:

$ gcc -g multiply.c -o multiply

When you run multiply, the results may be very surprising. All the results are zero. The main function is passing in 2 to the multiply function. The loop in that function is very straightforward. How could this possibly go wrong?

$ ./multiply
f(0)=0
f(1)=0
f(2)=0
f(3)=0

Recording the issue

You can investigate what is going wrong with rr. Record a run of the multiply program that demonstrates the issue with the following command. If the problem is intermittent, you could make multiple runs until the issue is seen:

$ rr record ./multiply
rr: Saving execution to trace directory `/home/wcohen/.local/share/rr/multiply-0`.
f(0)=0
f(1)=0
f(2)=0
f(3)=0

Replaying and investigating the issue

To start debugging the issue, use the rr replay command, which puts you into a GDB session:

$ rr replay

You are at the start of execution:

(rr) where
#0 0x00007f7f56b1f110 in _start () from /lib64/ld-linux-x86-64.so.2
#1 0x0000000000000001 in ?? ()
#2 0x00007ffe0bea6889 in ?? ()
#3 0x0000000000000000 in ?? ()

You can continue the program from the start and see that it has the same results as before:

(rr) c
Continuing.
f(0)=0
f(1)=0
f(2)=0
f(3)=0

Program received signal SIGKILL, Killed.
0x0000000070000002 in ?? ()

You can set a breakpoint at return 0 in the main function and work backward from there:

(rr) break multiply.c:37
Breakpoint 1 at 0x401258: file multiply.c, line 37.
(rr) c
Continuing.
f(0)=0
f(1)=0
f(2)=0
f(3)=0

Breakpoint 1, main (argc=1, argv=0x7ffe0bea5c58) at multiply.c:37
37 return 0;

First, we check the values in the array a. Maybe the pr_array function printed wrong values. But that isn't the problem, because according to GDB, the values are all 0:

(rr) print a[0]
$5 = 0 '\000'
(rr) print a[1]
$6 = 0 '\000'
(rr) print a[2]
$7 = 0 '\000'
(rr) print a[3]
$8 = 0 '\000'

Maybe pr_array corrupted the values. Let's set a breakpoint on the entry to pr_array and go backward in execution with a reverse-continue command, to see what the state was before the pr_array function executed. Looks like pr_array is printing the correct values:

(rr) break pr_array
Breakpoint 2 at 0x4011cc: file multiply.c, line 25.
(rr) reverse-continue
Continuing.

Breakpoint 2, pr_array (a=0x7ffe0bea5b58 "", size=4) at multiply.c:25
25 for (i=0; i<size; i++)
(rr) print a[0]
$9 = 0 '\000'
(rr) print a[1]
$10 = 0 '\000'
(rr) print a[2]
$11 = 0 '\000'
(rr) print a[3]
$12 = 0 '\000'

Maybe something is going wrong in the multiply function. Let's set a breakpoint on it and reverse-continue to it:

rr) break multiply
Breakpoint 3 at 0x401184: file multiply.c, line 18.
(rr) reverse-continue
Continuing.

Breakpoint 3, multiply (a=0x7ffe0bea5b58 "", size=4, mult=0) at multiply.c:18
18 for (i=0; i0)
(rr) c
Continuing.

Wait!

What happened to the mult argument? It should be 2. Zero times anything is zero. That explains the results. However, how did the main function's local variable mult end up being zero? It is initialized in main and only passed to the compute function. Let's set a hardware watchpoint on mult and continue the reverse execution of the program:

(rr) up
#1  0x0000000000401247 in main (argc=1, argv=0x7ffe0bea5c58) at multiply.c:35
35		multiply(a, SIZE, mult);
(rr) watch -l mult
Hardware watchpoint 4: -location mult
(rr) reverse-continue
Continuing.

Hardware watchpoint 4: -location mult

Old value = 0
New value = 2
zero (a=0x7ffe0bea5b58 "", size=3) at multiply.c:7
7			a[size--] = 0; 

Ah, now it's becoming clear how things went wrong. The zero function wrote past the end of the a array and overwrote the mult variable even though it wasn't passed in. The size-- statement should be --size. We can see the call to zero hidden in the initialize function:

(rr) where
#0  zero (a=0x7ffe0bea5b58 "", size=3) at multiply.c:7
#1  0x0000000000401173 in initialize (a=0x7ffe0bea5b58 "", size=4)
    at multiply.c:12
#2  0x0000000000401233 in main (argc=1, argv=0x7ffe0bea5c58) at multiply.c:34

If we want, we can use regular GDB continues and play it forward and go through those breakpoints we set earlier again:

(rr) c
Continuing.

Hardware watchpoint 4: -location mult

Old value = 2
New value = 0
zero (a=0x7ffe0bea5b58 "", size=3) at multiply.c:6
6		while (size>0)
(rr) c
Continuing.

Breakpoint 3, multiply (a=0x7ffe0bea5b58 "", size=4, mult=0) at multiply.c:18
18		for (i=0; i<size; i++)
(rr) c
Continuing.

Breakpoint 2, pr_array (a=0x7ffe0bea5b58 "", size=4) at multiply.c:25
25		for (i=0; i<size; i++)

Fixing the issue

Now that we've identified the problem, we can fix the zero function in the "new and improved" multiply2.c program, and things work as expected:

$ gcc -g multiply2.c -o multiply2
$ ./multiply2
f(0)=0
f(1)=2
f(2)=4
f(3)=6

Limitations of rr

Although rr is a useful addition to the tools that help developers find issues in programs, it does have limitations:

  • All the threads that rr records run on a single core, so multithreaded applications will be slower.
  • rr syscall monitoring is not complete, so some syscalls might slip through the cracks and not be recorded in the trace.
  • rr uses ptrace to monitor apps and will not work well with apps that also use ptrace.
  • rr does not monitor processes outside the children of what it is recording, and misses any communication through shared memory to an outside process.
  • rr operates only on very specific processor microarchitectures.

Conclusion

The ability of rr to go backward in the execution of a program to investigate the earlier events that lead to a problem is a useful addition to the developer's set of tools. The example in this article illustrates how you might use rr to track down problems. The example was just a toy, but rr has been used to track down problems in much much more substantial applications such as JavaScriptCore, the Apache httpd server, and Ruby on Rails.

See the following resources for additional information:

Last updated: October 14, 2022