
Suppose you want to use GDB, the GNU debugger for C and C++ programs, to debug a program invoked from a shell script. You might have trouble knowing what is going on in the program because the script might give it a complicated run-time context, setting environment variables in various ways depending on the machine, the architecture, the installed programs, and so on.

A good example of such a script is /usr/bin/firefox. On my Fedora 35 machine, the firefox script is 290 lines long. It mostly sets a lot of environment variables, but it also contains commands to make directories, remove files and directories, and make symbolic links. All these changes can have impacts on the binary when it runs. Near the end of the script, a command invokes (via exec) another script named run-mozilla.sh.

The run-mozilla.sh script itself is 356 lines long. It also sets environment variables and eventually invokes (also via exec) the Firefox binary. Additionally, the script provides options that allow you to debug the Firefox binary with a debugger, though for this article we won't use those options.

Use of a wrapper script to set environment variables and then invoke a binary is fairly common. On my Fedora 35 machine, more than 13 percent of the files in /usr/bin start with either #!/usr/bin/sh or #!/usr/bin/bash. An initial #! string on the first line of a text file is a convention known in Unix and Linux as a shebang. The line specifies the program that should run the script. In short, lots of programs in the directory named bin are not binaries. Some culminate in an exec command to run some other executable, as the run-mozilla.sh script does.
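If you want to estimate a similar figure on your own system, one approach is to read the first line of each file and compare it against the two shebang strings mentioned above. The following is only a minimal sketch: the function name count_shell_scripts is my own invention, and it ignores other spellings such as #!/bin/sh, so the numbers it produces are approximate.

```shell
# count_shell_scripts DIR
# Prints "<matching> <total>": the number of regular files in DIR whose first
# line is #!/usr/bin/sh or #!/usr/bin/bash (optionally followed by options),
# and the number of regular files examined.
# Hypothetical helper, for illustration only.
count_shell_scripts() {
    dir=$1
    total=0
    matching=0
    for f in "$dir"/*; do
        # Skip directories, unreadable files, and an unmatched glob.
        [ -f "$f" ] && [ -r "$f" ] || continue
        total=$((total + 1))
        if head -n 1 "$f" 2>/dev/null |
            LC_ALL=C grep -Eq '^#!/usr/bin/(ba)?sh( |$)'; then
            matching=$((matching + 1))
        fi
    done
    echo "$matching $total"
}

# Example: count_shell_scripts /usr/bin
```

Dividing the first number by the second gives the fraction of files in the directory that are sh or bash scripts.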

In the distant past, when attempting to debug programs associated with similar scripts, I'd examine the script and then set up what I perceived to be the relevant environment variables in an interactive shell session. After doing this, I'd invoke GDB in the usual way on the binary. However, it might take a fair amount of time to understand the wrapper script well enough to create an environment comparable to that created when running the script, and the whole procedure is error-prone.

It turns out that there's a far better and easier way to use GDB to debug binaries invoked via a wrapper script.

Debugging a binary run from a wrapper script via exec

It's common for wrapper scripts to use the shell's exec command to run a binary. The exec command replaces the shell running in the current process with the binary, so the binary keeps the shell's process ID. This is different from a fork and exec, which the shell uses to run other non-builtin commands not prefixed by exec. A fork and exec creates a new process and enables the shell script to continue after the command it invokes has exited.
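The difference is easy to observe by printing process IDs. In this small sketch (not taken from any of the scripts discussed here), an outer shell prints its PID and then starts an inner shell that prints its own, once with exec and once without:

```shell
# Each shell prints its own PID via $$.

# Case 1: exec. The outer shell prints its PID and then replaces itself with
# the inner shell, which runs in the same process and prints the same PID.
exec_pids=$(sh -c 'echo $$; exec sh -c "echo \$\$"')

# Case 2: fork and exec. The trailing ":" gives the outer shell more work to
# do afterward, so the inner shell must run in a child process. (Some shells,
# bash among them, exec the last command of a -c string directly.)
fork_pids=$(sh -c 'echo $$; sh -c "echo \$\$"; :')

printf 'exec:\n%s\n' "$exec_pids"
printf 'fork and exec:\n%s\n' "$fork_pids"
```

The two PIDs printed under "exec:" are identical, while the two printed under "fork and exec:" differ. That unchanged PID is what lets GDB stay attached to the same process across an exec.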

In order to use GDB to debug a binary invoked by the exec command, follow these steps:

  1. Make sure that the script in question uses exec to invoke the program you are debugging. You can identify whether the wrapper uses exec by simply searching for exec in the script. Once you find that command, verify that the exec command invokes the binary you want to debug. For instance, the last line of the /usr/bin/firefox script looks like this:

    exec $MOZ_LAUNCHER $script_args $MOZ_PROGRAM "$@"

    Furthermore, /usr/lib64/firefox/run-mozilla.sh contains the following line in the shell function moz_run_program:

    exec "$prog" ${1+"$@"}

    So the inner and outer wrapper scripts each use an exec command to run the next script or binary.

  2. Find the name of the shell used by the outermost wrapper script, usually specified by the shebang mentioned earlier. Thus, for Firefox, view the first line by entering:

    $ head -n 1 /usr/bin/firefox
    #!/usr/bin/bash

    The output shows that the firefox wrapper script uses the /usr/bin/bash shell to run the script.

  3. Start GDB by debugging the shell rather than the binary that you want to (eventually) debug. For Firefox, the command could look like this:

    $ gdb -q --args /usr/bin/bash /usr/bin/firefox

    The -q option just suppresses the copyright notice and other information that's normally printed by GDB when starting up. The --args option specifies /usr/bin/bash as the executable file to debug, and /usr/bin/firefox as a command-line argument to pass once GDB starts the executable.

  4. Once in GDB, use GDB's catch exec command to cause GDB to stop on an exec system call:

    (gdb) catch exec

  5. Use GDB's run command to start execution:

    (gdb) run

  6. When an exec catchpoint is hit, examine the message to see what binary will be debugged next. If it's just another shell, you probably want to continue. If the next file turns out to be the binary that you're interested in, you can start debugging the binary as usual. Typically, you now place a breakpoint on some function that you know will be hit, and then use continue.
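Steps 2 through 5 are mechanical enough to sketch as a small shell helper. The function below is hypothetical (the name gdb_command_for is mine) and deliberately limited: it only prints a GDB command line rather than running it, and it handles only a shebang of the form #!/path/to/shell, not #!/usr/bin/env or a shebang with a space after the #!. It relies on GDB's -ex option to issue catch exec and run at startup.

```shell
# gdb_command_for SCRIPT [ARGS...]
# Prints (does not run) a GDB command line that debugs the interpreter named
# on SCRIPT's shebang line, passing SCRIPT and ARGS as its arguments.
# Hypothetical helper, for illustration only.
gdb_command_for() {
    script=$1
    shift
    first=
    # Read the first line; accept it even if it lacks a trailing newline.
    IFS= read -r first < "$script" || [ -n "$first" ] || return 1
    case $first in
        '#!/'*) ;;
        *) echo "$script: no usable shebang line" >&2; return 1 ;;
    esac
    interp=${first#'#!'}   # drop the "#!" prefix
    interp=${interp%% *}   # drop any interpreter options
    printf '%s' "gdb -q -ex 'catch exec' -ex run --args $interp $script"
    if [ $# -gt 0 ]; then
        printf ' %s' "$@"
    fi
    printf '\n'
}
```

For a script whose first line is #!/usr/bin/bash, the printed command names /usr/bin/bash as the program to debug and the script as its first argument. GDB then stops at the first exec catchpoint, which is where step 6 takes over.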

An example GDB session follows, to demonstrate the steps just described. Note that, toward the end, I place a breakpoint on the main function and then continue to it.

$ gdb -q --args /usr/bin/bash /usr/bin/firefox
Reading symbols from /usr/bin/bash...
This GDB supports auto-downloading debuginfo from the following URLs:
https://debuginfod.fedoraproject.org/
Enable debuginfod for this session? (y or [n]) y
Debuginfod has been enabled.
To make this setting permanent, add 'set debuginfod enabled on' to .gdbinit.
Reading symbols from /home/kev/.cache/debuginfod_client/65289d3e4b67a5f765c63c7ec51c7f28f753ce08/debuginfo...
(gdb) catch exec
Catchpoint 1 (exec)
(gdb) run
Starting program: /usr/bin/bash /usr/bin/firefox
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[several 'Detaching after fork from child process' messages snipped]
process 876717 is executing new program: /usr/bin/bash
Catchpoint 1 (exec'd /usr/bin/bash), 0x00007ffff7fe7ac0 in _start () from /lib64/ld-linux-x86-64.so.2
(gdb) continue
Continuing.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[several 'Detaching after fork from child process' messages snipped]
process 876717 is executing new program: /usr/lib64/firefox/firefox
Catchpoint 1 (exec'd /usr/lib64/firefox/firefox), 0x00007ffff7fe7ac0 in _start () from /lib64/ld-linux-x86-64.so.2
(gdb) break main
Breakpoint 2 at 0x55555559a7b0: file /usr/src/debug/firefox-100.0-4.fc35.x86_64/browser/app/nsBrowserApp.cpp, line 259.
(gdb) continue
Continuing.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff77ff640 (LWP 876922)]
[Thread 0x7ffff77ff640 (LWP 876922) exited]
Thread 1 "firefox" hit Breakpoint 2, main (argc=1, argv=0x7fffffffdb28, envp=0x7fffffffdb38) at /usr/src/debug/firefox-100.0-4.fc35.x86_64/browser/app/nsBrowserApp.cpp:259
259 int main(int argc, char* argv[], char* envp[]) {
(gdb)

At this point in the session, GDB has stopped at main in the Firefox binary. Debugging can proceed normally from this point. You can set additional breakpoints, continue to those breakpoints, examine the stack, look at variables, use the step or next commands, etc.

Debugging programs invoked via fork and exec from a wrapper script

Things get more complicated when you wish to debug a binary invoked via a fork and then an exec from a wrapper script. One complication is that the script might invoke a number of commands, continuing after each one. You have to take care to debug only the binary of interest. Another complication is that, by default, GDB doesn't follow the child after a fork. Fortunately, GDB offers a command to change that default behavior.

A reasonably simple yet interesting script to consider as an example is /usr/bin/zmore. The zmore script is part of the gzip package (on Fedora systems). It invokes gzip (in decompress mode) on the arguments (which are filenames) provided to the script and then pipes the decompressed output to $PAGER if that environment variable is defined or to more if it isn't. For this discussion, let's assume that PAGER is not defined and that we wish to debug more.

The last six lines of /usr/bin/zmore look like the following. The ${PAGER-more} expansion chooses $PAGER or more based on the criteria I just explained, and the eval command runs the result:

for FILE
do
  test $# -lt 2 ||
    printf '::::::::::::::\n%s\n::::::::::::::\n' "$FILE" || break
  gzip -cdfq -- "$FILE"
done 2>&1 | eval ${PAGER-more}

Also, the first line of /usr/bin/zmore shows that the shell used by the script is /usr/bin/sh.
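The choice of pager comes from the shell's ${PAGER-more} parameter expansion, which can be tried on its own. This sketch wraps the expansion in a function (choose_pager is a name I made up) and prints the result instead of passing it to eval:

```shell
# choose_pager prints the command that "eval ${PAGER-more}" would run.
# Hypothetical helper, for illustration only.
choose_pager() {
    echo "${PAGER-more}"
}

unset PAGER
choose_pager    # PAGER is unset: prints "more"

PAGER=less
choose_pager    # PAGER is set: prints "less"

PAGER=
choose_pager    # PAGER is set but empty: prints an empty line, because
                # ${VAR-default} (without a colon) applies only when unset
```

The last case is worth noting: an empty PAGER still counts as set, so unset PAGER is the reliable way to make zmore fall back to more.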

Let's start by making a test file compressed using gzip:

$ for i in {1..1000}; do echo $i; done | gzip >testfile.gz

I won't show the output here, but I suggest running the following commands to make sure that a suitable test file has been created and that more is being used to output it:

$ unset PAGER
$ zmore testfile.gz
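If you'd like a non-interactive check as well, you can decompress the file with the same gzip flags that zmore uses and count the lines. This sketch builds its own copy of the test file in a scratch directory (so it won't touch the one you just created) and uses a portable loop rather than bash's {1..1000}:

```shell
# Work in a scratch directory so no existing testfile.gz is overwritten.
workdir=$(mktemp -d)

# Same content as the test file: the numbers 1 through 1000, one per line.
i=1
while [ "$i" -le 1000 ]; do
    echo "$i"
    i=$((i + 1))
done | gzip > "$workdir/testfile.gz"

# Decompress with the flags zmore passes to gzip and count the lines.
line_count=$(gzip -cdfq -- "$workdir/testfile.gz" | wc -l | tr -d ' ')
echo "lines: $line_count"

rm -rf "$workdir"
```

A count of 1000 confirms that the file decompresses to the expected content.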

When you use more as your pager, the string "--More--" is shown at the bottom of the terminal window.

Now, to debug more when invoked from zmore, start with the following:

$ gdb -q --args /bin/sh zmore testfile.gz
Reading symbols from /bin/sh...
This GDB supports auto-downloading debuginfo from the following URLs:
https://debuginfod.fedoraproject.org/
Enable debuginfod for this session? (y or [n]) y
Debuginfod has been enabled.
To make this setting permanent, add 'set debuginfod enabled on' to .gdbinit.
Reading symbols from ~/.cache/debuginfod_client/65289d3e4b67a5f765c63c7ec51c7f28f753ce08/debuginfo…
(gdb)

The command used to invoke GDB is similar to that shown earlier for debugging Firefox, except that in this case I included the name of the test file as a third --args argument.

Also, note that I answered y to the "Enable debuginfod" question. Using debuginfod makes debugging this kind of program easy. Without it, you'd need to manually download debuginfo for each of the programs that you're debugging.

As before, issue the catch exec command to make GDB stop when the inferior is performing an exec system call:

(gdb) catch exec
Catchpoint 1 (exec)

Now use two commands to change how fork is handled. The first command, set detach-on-fork off, makes GDB keep control of both the parent and child processes after a fork. The second command, set follow-fork-mode child, causes GDB to debug the child process after a fork (instead of the parent, which is followed by default). These commands produce no output:

(gdb) set detach-on-fork off
(gdb) set follow-fork-mode child
(gdb)
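If you expect to repeat this setup often, the commands can be kept in a GDB command file and loaded with GDB's -x option. The following sketch writes such a file; the file name follow-child.gdb is an arbitrary choice of mine, and the script prints the GDB invocation instead of running it because the session itself is interactive:

```shell
# Write the setup commands from this section to a GDB command file.
# The file name is arbitrary; any writable location works.
cmdfile=${TMPDIR:-/tmp}/follow-child.gdb
cat > "$cmdfile" <<'EOF'
catch exec
set detach-on-fork off
set follow-fork-mode child
EOF

# Print the invocation that would load the file at startup.
echo "gdb -q -x $cmdfile --args /bin/sh zmore testfile.gz"
```

When started this way, GDB has the catchpoint and both fork settings in place before the first prompt.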

Next, use the run command to start the program:

(gdb) run
Starting program: /usr/bin/sh zmore testfile.gz
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[Attaching after Thread 0x7ffff7d65740 (LWP 943122) fork to child process 943125]
[New inferior 2 (process 943125)]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[Attaching after Thread 0x7ffff7d65740 (LWP 943125) fork to child process 943126]
[New inferior 3 (process 943126)]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
process 943126 is executing new program: /usr/bin/gzip
Reading symbols from /lib64/ld-linux-x86-64.so.2...
[Switching to process 943126]
Reading symbols from /lib64/ld-linux-x86-64.so.2...
Thread 3.1 "gzip" hit Catchpoint 1 (exec'd /usr/bin/gzip), 0x00007ffff7fe7ac0 in _start () from /lib64/ld-linux-x86-64.so.2
(gdb)

Execution has stopped at the exec catchpoint for /usr/bin/gzip. If we wanted to debug gzip, we could place breakpoints and continue, but our goal is instead to debug more. Look at the inferiors that GDB knows about by entering info inferiors. Also, since you don't want to debug gzip, detach from that inferior.

(gdb) info inferiors
  Num  Description       Connection           Executable
  1    process 943122    1 (native)           /usr/bin/sh
  2    process 943125    1 (native)           /usr/bin/sh
* 3    process 943126    1 (native)           /usr/bin/gzip
(gdb) detach
Detaching from program: /usr/bin/gzip, process 943126
[Inferior 3 (process 943126) detached]
(gdb)
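The two sh entries in that list are consistent with how a shell runs a pipeline: a loop on the left side of a pipe runs in a forked copy of the shell, and commands inside the loop are children of that copy. The following sketch, which is independent of zmore and GDB, shows this by having a command inside such a loop report its parent's PID:

```shell
# Record the PID of the shell running this snippet.
main_pid=$$
out=$(mktemp)

# The for loop is the left side of a pipeline, as in zmore. The command
# inside it prints its parent's PID, which is the shell process that is
# running the loop.
for f in one
do
    sh -c 'echo $PPID'
done | cat > "$out"

loop_shell_pid=$(cat "$out")
rm -f "$out"

echo "script shell: $main_pid"
echo "loop shell:   $loop_shell_pid"
```

The two PIDs differ. In the GDB session above, the process numbers in the fork messages suggest the same arrangement, with process 943125 (inferior 2) as the parent of the process that became gzip.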

The info inferiors command showed three inferiors: gzip plus two /bin/sh inferiors. Inferior #2 is probably the shell running the for loop whose output is piped to more, so you probably want to switch to inferior #1. But let's not assume that and instead switch to inferior #2, continue, and see what happens:

(gdb) inferior 2
[Switching to inferior 2 [process 943125] (/usr/bin/sh)]
[Switching to thread 2.1 (Thread 0x7ffff7d65740 (LWP 943125))]
#0  arch_fork (ctid=Reading symbols from /lib64/ld-linux-x86-64.so.2...
0x7ffff7d65a10) at ../sysdeps/unix/sysv/linux/arch-fork.h:52
52   ret = INLINE_SYSCALL_CALL (clone, flags, 0, NULL, ctid, 0);
(gdb) continue
Continuing.
[Inferior 2 (process 943125) exited normally]

That inner shell process won't always exit at this point. If it doesn't exit and appears to hang (while it's either reading input or waiting for more to consume its output), use Ctrl-C to interrupt GDB and then use the detach command on inferior #2. After switching to inferior #1 (as shown next), if you see that it is interrupted due to SIGINT, just continue again. These extra steps can be avoided by simply switching to the desired inferior in the first place.

Now let's switch to inferior #1 and continue:

(gdb) inferior 1
[Switching to inferior 1 [process 943122] (/usr/bin/sh)]
[Switching to thread 1.1 (Thread 0x7ffff7d65740 (LWP 943122))]
#0  arch_fork (ctid=0x7ffff7d65a10)
    at ../sysdeps/unix/sysv/linux/arch-fork.h:52
52   ret = INLINE_SYSCALL_CALL (clone, flags, 0, NULL, ctid, 0);
(gdb) continue
Continuing.
[Attaching after Thread 0x7ffff7d65740 (LWP 943122) fork to child process 949916]
[New inferior 4 (process 949916)]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[Attaching after Thread 0x7ffff7d65740 (LWP 949916) fork to child process 949917]
[New inferior 5 (process 949917)]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
process 949917 is executing new program: /usr/bin/more
Reading symbols from /lib64/ld-linux-x86-64.so.2...
[Switching to process 949917]
Reading symbols from /lib64/ld-linux-x86-64.so.2...
Thread 5.1 "more" hit Catchpoint 1 (exec'd /usr/bin/more), 0x00007ffff7fe7ac0 in _start () from /lib64/ld-linux-x86-64.so.2
(gdb)

At this point, we've hit the exec catchpoint for /usr/bin/more, which is what we wanted to debug. Let's put a breakpoint on main and continue to it:

(gdb) break main
Reading symbols from /lib64/ld-linux-x86-64.so.2...
Breakpoint 2 at 0x555555556c40: main. (3 locations)
(gdb) continue
Continuing.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Thread 5.1 "more" hit Breakpoint 2, main (argc=1, argv=0x7fffffffddf8) at text-utils/more.c:2009
2009 {
(gdb)

Now that we're stopped at the main function of /usr/bin/more, let's restore GDB's fork-related settings to their defaults. This is recommended because, should more fork and exec some other program, we want to stay within more instead of following the child, whatever it is:

(gdb) set detach-on-fork on
(gdb) set follow-fork-mode parent
(gdb)

After restoring these settings, you can proceed to debug /usr/bin/more as normal using less esoteric GDB commands.

Using gdbserver to avoid writing to the same terminal as GDB

If you debug a program that produces output, like more in the previous section, output from the debugged program is normally sent to the same terminal as that used by GDB. If you attempt to simply enter continue, you'll find that GDB will stop due to a SIGTTOU signal. Further attempts to continue will repeatedly stop due to the SIGTTOU signal.

This problem can be avoided by running zmore with gdbserver in one terminal and connecting to the program from GDB running in another terminal. The command used to run gdbserver looks like this:

$ gdbserver localhost:12345 zmore testfile.gz
Process zmore created; pid = 1550989
Listening on port 12345

(If port 12345 is already in use, simply pick another port. Make sure you use the same port number when connecting to gdbserver from GDB.)

Connect to gdbserver from GDB as follows:

$ gdb -q
(gdb) target remote localhost:12345
Remote debugging using localhost:12345
Reading /usr/bin/bash from remote target...
[lots of output snipped]

Once connected, the session proceeds as shown earlier except that you need to use the continue command in place of the run command.
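Because the port number has to match in both terminals, it can help to generate the two commands from a single value. This is just a convenience sketch: remote_debug_commands is a hypothetical helper that prints the commands, and the GDB side uses the -ex option to issue target remote at startup.

```shell
# remote_debug_commands PORT PROGRAM [ARGS...]
# Prints the command to run in each terminal, using the same port in both.
# Hypothetical helper, for illustration only.
remote_debug_commands() {
    port=$1
    shift
    echo "terminal 1: gdbserver localhost:$port $*"
    echo "terminal 2: gdb -q -ex 'target remote localhost:$port'"
}

remote_debug_commands 12345 zmore testfile.gz
```

If port 12345 is taken, changing the first argument updates both commands at once.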

Conclusion

This article has shown how to use GDB to debug binaries run from a shell script. The main ideas presented were:

  • Use GDB to debug the shell binary used for running the script. The name of the script plus arguments to the script become arguments to the shell command.
  • Issue GDB's catch exec command to cause GDB to stop when an exec system call is encountered during program execution.
  • When debugging binaries invoked via a fork and exec, two additional commands, set detach-on-fork off and set follow-fork-mode child, change GDB's default behavior with regard to forks.
  • When an exec catchpoint is reached, start debugging the binary as normal if it's the one that you wish to debug. If not, other commands are available to continue either the current inferior or some other inferior until a suitable exec catchpoint is reached.
  • gdbserver can be used in situations where it's confusing to distinguish output from GDB and output from the program, or where normal debugging is simply not possible due to continued receipt of SIGTTOU.