Many programming errors turn up when one function calls another. They can be caused, for instance, because the caller passes bad arguments or because a function is called when it shouldn't be. This article shows the tools offered to meet these challenges by the GNU Debugger (GDB), the standard open source debugger for C and C++ programs.
The commands in this article manipulate stack frames, which represent all the information stored on the stack when one function calls another. GDB allows you to see a lot of information related to each function call, such as local variables, who called what, and much more.
We will talk specifically about the following GDB commands:
backtrace
info locals
backtrace
up
anddown
finish
A recursive program to debug
Every programming tutorial needs a trivial sample program, and this article uses a recursive program because such programs are particularly good tools to explore what happens on the stack.
We'll use a trivial merge sort, which operates by dividing an array in half repeatedly and building a new array from the elements of the two halves. Such programs have to deal with a tricky question: How do we divide an odd number of elements in half? That dilemma forms the basis of the bug we'll uncover in this article.
Here's the buggy program we'll start with:
1 void sort(int* v, int size){
2 if(size == 2){
3 if(v[0] > v[1]) {
4 int tmp = v[0];
5 v[0] = v[1];
6 v[1] = tmp;
7 }
8 return;
9 }
10 int mid = size/2, i = 0, j = size/2, sorted[size], added = 0;
11 sort(v, mid);
12 sort(v+mid, mid);
13 while(i < mid && j < size)sorted[added++]=(v[i] < v[j])?v[i++]:v[j++];
14 while(i < mid) sorted[added++] = v[i++];
15 while(j < size) sorted[added++] = v[j++];
16 for(i = 0; i<size; i++) v[i] = sorted[i];
17 }
18
19 int main(){
20 int vec[] = {7, 3, 5, 4, 8, 2, 1, 6, 0};
21 sort(vec, 9);
22 return 0;
23 }
The code is not particularly beautiful, but hopefully it's readable enough for the purposes of this article. When the program is tested with vectors that have an even number of elements, such as 4 and 8, everything looks good. But when we test the program on an array of 9 elements ranging from 0 to 8, the program produces 1, 2, 3, 4, 5, 6, 7, 8, 0 as output, which is definitely not good.
Basic frame manipulation
Let's see what is happening inside the sort. In particular, we'll want to check that the functions are being called when they should be and are passed the correct arguments. The backtrace
command shows us the sequence of calls, and info locals
shows the local variables in the current function, including arguments passed to it by the caller. Set a breakpoint on line 10, run the program up to that breakpoint, and look at what is going on:
$ g++ -g -O0 merge.cc -o merge
$ gdb -q ./merge
Reading symbols from ./merge...done.
(gdb) break 10
Breakpoint 1 at 0x40063d: file merge.cc, line 10.
(gdb) run
Breakpoint 1, sort (v=0x7fffffffde10, size=9) at merge.cc:10
10 sort(v, mid);
(gdb) backtrace
#0 sort (v=0x7fffffffde10, size=9) at merge.cc:10
#1 0x0000000000400815 in main () at merge.cc:20
(gdb) info locals
mid = 4
i = 0
j = 4
sorted = {640, 0, -146166848, 32767, 656, 0, -149461383, 32767, 0}
added = 0
backtrace
lists the functions on the stack. It shows first the function we're currently in, as frame 0 or #0
. The #0
line is followed by all the functions that made calls leading up to the current function, in reverse order. So the first function in the program to run, main
, is last in the output.
info locals
, as you can imagine, shows you information about all local variables in the current frame. The output shows that mid
is 4. The midpoint is the smaller of the two middle indexes because integer division rounds down. That's fine; let's look at the next call:
(gdb) next
Breakpoint 1, sort (v=0x7fffffffde10, size=4) at merge.cc:10
10 sort(v, mid);
(gdb) backtrace
#0 sort (v=0x7fffffffde10, size=4) at merge.cc:10
#1 0x000000000040064e in sort (v=0x7fffffffde10, size=9) at merge.cc:10
#2 0x0000000000400815 in main () at merge.cc:20
We hit the same breakpoint again because we're in a recursive function. The stack has grown upward, because only frames #0
and #1
existed before, but now #2
appears as well. Because size
is 4, it will be 2 on the next call and will invoke the stop (or base) case, so the GDB won't reach our breakpoint at line 10 again. Instead of using next
until we're out of the program, we'll ask GDB just to finish
the current frame:
(gdb) finish
Run till exit from #0 sort (v=0x7fffffffde10, size=4) at merge.cc:10
sort (v=0x7fffffffde10, size=9) at merge.cc:11
11 sort(v+mid, mid);
(gdb) backtrace
#0 sort (v=0x7fffffffde10, size=9) at merge.cc:11
#1 0x0000000000400815 in main () at merge.cc:20
finish
runs the inferior until the end of the current frame, whereupon GDB prints the result of the call when exiting.
When we look back at the source code, we can see the reason for the problem: The code assumes that mid
can be used as the size for both halves of the divided array, whereas an odd array produces one array that is 1 larger than the other. Let's try fixing the problem by replacing line 11 with sort(v + mid, size - mid)
. Recompiling and running the code gives us:
(gdb) run
Program received signal SIGSEGV, Segmentation fault.
0x000000000040055f in sort (v=<error reading variable: Cannot access memory at address 0x7fffff7fefe8>, size=<error reading variable: Cannot access memory at address 0x7fffff7fefe4>) at merge_size.cc:4
4 void sort(int* v, int size){
Things are even worse now—we got a segmentation fault. Not just that: GDB can't even read the arguments that were passed, so there is something very wrong going on.
We could use the backtrace
command again, but in its bare form it won't be terribly useful, because it will list tens of thousands of frames. (We created runaway recursion that grew the stack until it overflowed the program's memory.) For more usable information, we can ask instead for only the last 10 frames and see whether they show a pattern:
(gdb) backtrace 10
#0 0x000000000040110f in sort (v=<error reading variable: Cannot access memory at address 0x7fffff7fefd8>, size=<error reading variable: Cannot access memory at address 0x7fffff7fefd4>) at merge.cc:1
#1 0x0000000000401201 in sort (v=0x7fffffffe0f0, size=0) at merge.cc:11
#2 0x0000000000401201 in sort (v=0x7fffffffe0f0, size=0) at merge.cc:11
#3 0x0000000000401201 in sort (v=0x7fffffffe0f0, size=0) at merge.cc:11
#4 0x0000000000401201 in sort (v=0x7fffffffe0f0, size=0) at merge.cc:11
#5 0x0000000000401201 in sort (v=0x7fffffffe0f0, size=0) at merge.cc:11
#6 0x0000000000401201 in sort (v=0x7fffffffe0f0, size=0) at merge.cc:11
#7 0x0000000000401201 in sort (v=0x7fffffffe0f0, size=0) at merge.cc:11
#8 0x0000000000401201 in sort (v=0x7fffffffe0f0, size=0) at merge.cc:11
#9 0x0000000000401201 in sort (v=0x7fffffffe0f0, size=0) at merge.cc:11
(More stack frames follow...)
Yes, there does seem to be a pattern: We get infinite recursion when the vector size is 0. But a size of 0 shouldn't even be possible! Why is this happening?
Intermediate frame manipulation
To understand the cause of our latest failure, we'll use something called a conditional breakpoint. This is a breakpoint where GDB stops only if a certain condition is met. A later article will explain conditional breakpoints in more detail. For now, tell GDB to stop the first time it finds that size
is 0:
(gdb) break sort if size == 0
Breakpoint 1 at 0x400566: file merge.cc, line 15.
(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/blarsen/Documents/blog/merge
Breakpoint 1, sort (v=0x7fffffffde18, size=0) at merge.cc:15
15 for(i = 0; i<size; i++) v[i] = sorted[i];
(gdb) backtrace
#0 sort (v=0x7fffffffde18, size=0) at merge.cc:15
#1 0x000000000040064e in sort (v=0x7fffffffde18, size=1) at merge.cc:10
#2 0x000000000040064e in sort (v=0x7fffffffde18, size=3) at merge.cc:10
#3 0x0000000000400673 in sort (v=0x7fffffffde10, size=5) at merge.cc:11
#4 0x0000000000400673 in sort (v=0x7fffffffde00, size=9) at merge.cc:11
#5 0x0000000000400819 in main () at merge.cc:20
Ignore the output claiming that the breakpoint is on line 15; this error is due to a GCC bug. We are actually stopping at line 2.
Using the up
command, we can look around frame #1
(moving "up" the call stack) as if the code was stopped there instead of the frame below.
If we went down the wrong rabbit hole, we can even move up a few frames and finish
that frame (although next
and step
don't work in that context, unfortunately). The opposite of the up
command is down
. Here we run just up
:
(gdb) up
#1 0x0000000000401201 in sort (v=0x7fffffffe108, size=1) at merge.cc:11
11 sort(v, mid);
(gdb) info locals
mid = 0
i = 0
j = 0
sorted = {0}
added = 0
Another command, info frame
, can turn up odd things about the current frame:
(gdb) info frame
Stack level 1, frame at 0x7fffffffdd40:
rip = 0x40064e in sort (merge.cc:10); saved rip = 0x40064e
called by frame at 0x7fffffffdda0, caller of frame at 0x7fffffffdce0
source language c++.
Arglist at 0x7fffffffdd30, args: v=0x7fffffffde10, size=1
Locals at 0x7fffffffdd30, Previous frame's sp is 0x7fffffffdd40
Saved registers:
rbx at 0x7fffffffdd28, rbp at 0x7fffffffdd30, rip at 0x7fffffffdd38
Our problem, as it turns out, is that we have no case to handle size == 1
. If size == 2
, a stop case ends recursion gracefully. But when 3 is divided in half (with integer division) to produce 1, the next iteration of the recursive function skips the stop case and tries to continue to divide the array of length 1 into two arrays.
The fix is to add a new stop condition when size
is 1. The start of the sort
function now looks like this:
1 void sort(int* v, int size){
2 if(size == 2){
3 if(v[0] > v[1]) {
4 int tmp = v[0];
5 v[0] = v[1];
6 v[1] = tmp;
7 }
8 return;
9 } else if (size == 1){
10 return; //size 1 arrays are always ordered
11 }
Recompiling and running through the debugger yields this result:
(gdb) break 23
Breakpoint 1 at 0x40082d: file merge.cc, line 23.
(gdb) run
Starting program: /home/blarsen/Documents/blog/merge
Breakpoint 1, main () at merge.cc:23
23 return 0;
(gdb) print vec
$1 = {0, 1, 2, 3, 4, 5, 6, 7, 8}
Conclusion
This article has presented the basic concepts for thinking about frames when debugging. If you are in a reading mood and would like to understand how GDB can think that you are stopped in a completely unrelated line because of that GCC bug we encountered above, please read The GDB developer's GNU Debugger tutorial: All about debuginfo, which explains how GDB gets debugging information in the first place.