
Many popular Python modules are written in the C language, and bugs in C extensions can cause nasty crashes that Python's error-catching mechanism won't catch. Fortunately, numerous powerful debuggers—notably, the GNU Project Debugger (GDB)—were designed for the C language. In Python 3.9, developers can use these to debug Python programs, and particularly the C extensions included in Python programs.

This article shows how to use the improved Python debug build in Python 3.9. I'll first discuss how we adapted Python to allow developers to use traditional C debuggers, then show you how to use the debug build and GDB to debug C extensions in a Python program.

Getting started with Python 3.9

Python 3.9 is now provided in the Red Hat Enterprise Linux 8.4 AppStream. The command to install the new version is:

$ sudo yum install python3.9

Python 3.9 brings many new features:

  • PEP 584: Union operators added to dict.
  • PEP 585: Type hinting generics in standard collections.
  • PEP 614: Relaxed grammar restrictions on decorators.
  • PEP 616: String methods to remove prefixes and suffixes.
  • PEP 593: Flexible function and variable annotations.
  • A new os.pidfd_open() call that allows process management without races and signals.
  • PEP 615: Relocation of the IANA Time Zone Database to the standard library in the zoneinfo module.
  • An implementation of a topological sort of a graph in the new graphlib module.

See What’s New In Python 3.9 for the full list of changes.
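As a quick illustration of a few of these features, here is a minimal sketch that should run on Python 3.9 or later. The variable names and sample data are made up for this example.

```python
from graphlib import TopologicalSorter

# PEP 584: the | operator merges two dicts; the right operand wins on conflicts.
defaults = {"host": "localhost", "port": 8000}
overrides = {"port": 9000}
merged = defaults | overrides


# PEP 585: built-in collections can be used directly as generic type hints.
def total(values: list[int]) -> int:
    return sum(values)


# PEP 616: remove a known prefix and suffix without manual slicing.
name = "test_gdb.py".removeprefix("test_").removesuffix(".py")

# graphlib: each key maps to the set of nodes that must come before it.
steps = {"link": {"compile"}, "compile": {"configure"}}
order = list(TopologicalSorter(steps).static_order())

print(merged, total([1, 2, 3]), name, order)
```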

Using C debuggers in Python

When a Python executable is highly optimized, such as the one shipped in RHEL, a typical C debugger doesn't work well. The debugger can't read many helpful pieces of information, such as function arguments, type information, and local variables.

Python does have a built-in faulthandler module that prints the Python traceback when a crash occurs. But when a Python object is corrupted (by a buffer overflow or for any other reason), the executable can continue for a long time before crashing. In this case, knowing the crash location is of little use: the crash usually occurs during a garbage collection, when Python visits all Python objects, so it is hard to guess how the object was corrupted.
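To make the fault handler concrete, the following sketch asks the standard faulthandler module to dump the current thread's traceback into a temporary file and reads it back. The helper name dump_current_stack() is hypothetical and exists only for this illustration.

```python
import faulthandler
import tempfile


def dump_current_stack() -> str:
    """Return the traceback that faulthandler writes for the calling thread."""
    # faulthandler writes straight to a file descriptor, so it needs a real file.
    with tempfile.TemporaryFile(mode="w+") as stream:
        faulthandler.dump_traceback(file=stream, all_threads=False)
        stream.seek(0)
        return stream.read()


if __name__ == "__main__":
    print(dump_current_stack())
```

To get the same kind of dump automatically on a fatal signal such as SIGSEGV, a program can call faulthandler.enable() at startup, or the interpreter can be started with the -X faulthandler option.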

Unfortunately, for various reasons, some bugs can be reproduced only on production systems, not on developers' workstations. This adds to the importance of a good debugger.

Python can be built in debug mode, which adds many runtime checks. It helps to detect bugs such as corrupted Python objects. Prior to Python 3.9, a major usability issue was the need to rebuild C extensions in debug mode so they could run with a debug build of Python.
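If you are unsure which kind of build you are running, a best-effort check can be sketched as follows. The is_debug_build() helper is a hypothetical name for this example, and it relies on CPython details: sys.gettotalrefcount() is normally present only in debug builds, and the Py_DEBUG configuration variable may be unavailable on some platforms.

```python
import sys
import sysconfig


def is_debug_build() -> bool:
    """Best-effort check for a CPython debug build."""
    # sys.gettotalrefcount() is normally compiled only into debug builds.
    if hasattr(sys, "gettotalrefcount"):
        return True
    # Py_DEBUG is recorded at configure time; it may be missing on some platforms.
    return bool(sysconfig.get_config_var("Py_DEBUG"))


print("debug build:", is_debug_build())
```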

How we improved the Python debug build

I have been working for three years on the Python debugging experience to make it easier to use a C-language debugger such as GDB on Python. This section discusses the changes to Python that were required.

ABI compatibility

The first practical issue was that C extensions needed to be rebuilt in debug mode to be able to use a Python debug build.

I made the Python debug build compatible at an application binary interface (ABI) level with the Python release build in Python issue 36465. The main PyObject C structure is now the same in release and debug builds.

The debug build no longer defines the Py_TRACE_REFS macro, which caused the ABI incompatibility. If you want the macro, you need to explicitly request it through the ./configure --with-trace-refs build option. See the commit for more details.
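One way to observe this compatibility from Python is to list the file name suffixes that the import system accepts for C extensions. This is only a sketch: the exact output depends on the platform and the build, but on a Unix debug build of Python 3.8 or later the list is expected to include the release-mode suffixes as well as the debug ones.

```python
import importlib.machinery
import sys

# abiflags is Unix-only; fall back to an empty string elsewhere.
abiflags = getattr(sys, "abiflags", "")
print("ABI flags:", repr(abiflags))

# File name suffixes the import system accepts for C extension modules.
suffixes = list(importlib.machinery.EXTENSION_SUFFIXES)
for suffix in suffixes:
    print(suffix)
```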

C extensions are no longer linked to libpython

Another issue was that C extensions were linked to libpython. When a C extension was built in release mode and imported into a Python executable that was built in debug mode, the extension pulled in a version of libpython built in release mode, which was incompatible.

Python functions such as PyLong_FromLong() are already loaded in the running Python process. C extensions inherit these symbols when their dynamic libraries are loaded. Therefore, linking C extensions to libpython explicitly is not strictly required.
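The idea that these symbols already live in the running process can be illustrated from Python itself with ctypes. This is a sketch for illustration rather than how C extensions resolve symbols at load time: ctypes.pythonapi looks up PyLong_FromLong() in the current interpreter, without naming any libpython file.

```python
import ctypes

# Look up the C function in the already-running interpreter.
PyLong_FromLong = ctypes.pythonapi.PyLong_FromLong
PyLong_FromLong.argtypes = [ctypes.c_long]
# The C function returns a new reference to a Python int object.
PyLong_FromLong.restype = ctypes.py_object

value = PyLong_FromLong(42)
print(value, type(value))
```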

I modified how C extensions are built in Python 3.8 so that they are no longer linked to libpython; see Python issue 21536. Some RHEL packages contained C extensions that linked to libpython manually; these had to be modified further.

Compiler optimizations disabled in the debug build

Last but not least, the Python package was modified to build Python in debug mode with gcc -O0 rather than gcc -Og. The -Og option is meant to allow some optimizations that don't interfere with debug information. In practice, GDB is fully usable only on an executable built with -O0, which disables all compiler optimizations.
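To see which optimization level an interpreter reports for its own build, you can inspect the compiler flags recorded by sysconfig. This is a sketch under assumptions: optimization_flags() is a hypothetical helper, the CFLAGS variable is typically only available on Unix-like builds, and the recorded flags describe how the build was configured rather than every object file.

```python
import sysconfig


def optimization_flags(cflags: str) -> list[str]:
    """Return the -O... options found in a compiler flag string."""
    return [flag for flag in cflags.split() if flag.startswith("-O")]


# CFLAGS can be None on platforms that do not record it.
cflags = sysconfig.get_config_var("CFLAGS") or ""
print(optimization_flags(cflags))
```

When several -O options appear, GCC uses the last one on the command line.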

Debugging with GDB in Python 3.9

The Python 3.9 debug build shipped with RHEL 8.4 combines all of these enhancements and is now usable with debuggers. A Python 3.9 executable built in debug mode can import C extensions built in release mode. In short, the python3.9d executable can be used as a seamless drop-in replacement for the usual python3.9 to help you run a debug session.

A special debug build of Python can work with a C debugger pretty much like a C program. This section shows how to use GDB to debug a Python program, plus some special debugger commands Python provides.

Before: Trying GDB on a Python release build

Before showing how debugging works better with the new Python 3.9 debug build, let's start with the release build, which is not usable with GDB.

First, install GDB and the Python 3.9 debug symbols:

$ sudo yum install gdb
$ sudo yum debuginfo-install python39

Create a simple Python program named slow.py to play with GDB:

import time
def slow_function():
    print("Slow function...")
    x = 3
    time.sleep(60 * 10)
slow_function()

Debug slow.py in GDB and interrupt it with Ctrl+C:

$ gdb -args python3.9 slow.py
(gdb) run
Slow function...
^C

Program received signal SIGINT, Interrupt.
0x00007ffff7b790e7 in select () from /lib64/libc.so.6

(gdb) where
#0  select () from /lib64/libc.so.6
#1  pysleep (secs=<optimized out>) at .../Modules/timemodule.c:2036
#2  time_sleep (self=<optimized out>, obj=<optimized out>, self=<optimized out>,
    obj=<optimized out>) at .../Modules/timemodule.c:365
(...)
#7  _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>,
    throwflag=<optimized out>) at .../Python/ceval.c:3487
3487     res = call_function(tstate, &sp, oparg, NULL);
(...)

Note: The previous GDB output was reformatted and truncated to make it easier to read.

If you try to explore the problem, you find that GDB fails to read the function arguments in pysleep():

(gdb) frame 1
#1  0x00007ffff757769a in pysleep (secs=<optimized out>)
    at .../Modules/timemodule.c:2036
2036     err = select(0, (fd_set *)0, (fd_set *)0, (fd_set *)0, &timeout);
(gdb) p secs
$1 = <optimized out>

GDB also fails to read _PyEval_EvalFrameDefault() local variables:

(gdb) frame 7
#7  _PyEval_EvalFrameDefault (tstate=<optimized out>, f=<optimized out>,
    throwflag=<optimized out>)
    at .../Python/ceval.c:3487
3487                res = call_function(tstate, &sp, oparg, NULL);
(gdb) p opcode
$11 = <optimized out>
(gdb) p oparg
$10 = <optimized out>

In the previous output, GDB displays <optimized out> rather than the expected values. Usually, this means that CPU registers are used for these values. Since CPU registers are used for multiple purposes, GDB cannot tell whether a register currently contains the specified function argument or variable, or something else.

In addition, the python3.9 executable is built in release mode with link-time optimization (LTO), profile-guided optimization (PGO), and gcc -O2 optimizations. Because of these optimizations, when debugged functions get inlined by the compiler, GDB's where command can display invalid call stacks.

After: Using GDB on the new debug build

Now install the new Python 3.9 debug build:

$ sudo yum module enable --enablerepo=rhel-CRB python39-devel
$ sudo yum install --enablerepo=rhel-CRB python39-debug
$ sudo yum debuginfo-install python39-debug

These commands enable the python39-devel module, install the python39-debug package from this module, and then install debug symbols. The Red Hat CodeReady Linux Builder repository is enabled in these commands to get the python39-devel module.

Now, run GDB again to debug the same slow.py program, but using python3.9d. Again, interrupt the program with Ctrl+C:

$ gdb -args python3.9d slow.py
(gdb) run
Slow function...
^C

Program received signal SIGINT, Interrupt.
select () from /lib64/libc.so.6

(gdb) where
#0  select () from /lib64/libc.so.6
#1  pysleep (secs=600000000000) at .../Modules/timemodule.c:2036
#2  time_sleep (self=<module at remote 0x7ffff7eb73b0>, obj=600)
    at .../Modules/timemodule.c:365
(...)
#7  _PyEval_EvalFrameDefault (tstate=0x55555575a7e0,
        f=Frame 0x7ffff7ecb850, for file slow.py, line 5, in slow_function (x=3),
        throwflag=0) at .../Python/ceval.c:3487
(...)

Reading the pysleep() function arguments now gives the expected values:

(gdb) frame 1
#1  0x00007ffff754c156 in pysleep (secs=600000000000) at .../Modules/timemodule.c:2036
2036        err = select(0, (fd_set *)0, (fd_set *)0, (fd_set *)0, &timeout);
(gdb) p secs
$1 = 600000000000

Reading _PyEval_EvalFrameDefault() local variables now also gives the expected values:

(gdb) frame 7
#7  _PyEval_EvalFrameDefault (...)
3487                res = call_function(tstate, &sp, oparg, NULL);
(gdb) p opcode
$2 = 161
(gdb) p oparg
$3 = 1

As you can see, the <optimized out> messages are gone. GDB works as expected thanks to the new executable built without compiler optimizations.

Python commands in GDB

Python comes with a libpython3.9(...)-gdb.py GDB extension (implemented in Python) that adds GDB commands prefixed by py-. Expanding this prefix with the tab key shows the available commands:

(gdb) py-<tab><tab>
py-bt  py-bt-full  py-down  py-list  py-locals  py-print  py-up

The py-bt command displays the Python call stack:

(gdb) py-bt
Traceback (most recent call first):
  File "slow.py", line 5, in slow_function
    time.sleep(60 * 10)
  File "slow.py", line 6, in <module>
    slow_function()

The py-locals command lists Python local variables:

(gdb) py-locals
x = 3

The py-print command gets the value of a Python variable:

(gdb) py-print x
local 'x' = 3
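These commands can also be run non-interactively. The sketch below only builds a GDB command line that uses the standard -batch, -ex, and --args options; the gdb_batch_argv() helper is a hypothetical name, and the sketch assumes that the py- commands are auto-loaded in your GDB setup.

```python
import shlex


def gdb_batch_argv(python: str, script: str, commands: list[str]) -> list[str]:
    """Build a gdb command line that runs the given GDB commands in batch mode."""
    argv = ["gdb", "-batch"]
    for command in commands:
        # Each -ex option makes GDB execute one command, in order.
        argv += ["-ex", command]
    # Everything after --args is the program to debug and its arguments.
    argv += ["--args", python, script]
    return argv


argv = gdb_batch_argv("python3.9d", "slow.py", ["run", "py-bt", "py-locals"])
print(shlex.join(argv))
```

Passing the resulting list to subprocess.run() would launch GDB. This is most useful with a script that crashes: GDB stops at the fatal signal and then runs the remaining commands, whereas a script that exits normally leaves no process for py-bt to inspect.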

Additional debug checks

Before the program even runs its first statement, a debug build of Python can detect potential problems. When Python is built in debug mode, many debug checks are executed at runtime to detect bugs in C extensions. For example:

  • Debug hooks are installed on memory allocators to detect buffer overflows and other memory errors.
  • Assertions are made on various function arguments.
  • The garbage collector (gc.collect() function) runs some checks on objects' consistency.

See the Python debug build web page for more details.

Red Hat contributions to the Python debug build

Red Hat contributed the following changes to Python upstream to enhance the Python debug build:

  • Adding assertions in the garbage collection module to make debugging easier with corrupted Python objects: See Python issue 9263. These enhancements were written by Dave Malcolm, maintained as downstream patches in Red Hat Enterprise Linux and Fedora, and pushed upstream in Python 3.8 in 2018. The change adds a new _PyObject_ASSERT() macro that dumps the Python object that caused the assertion failure.
  • Detecting freed memory to avoid crashes when debugging Python: I added _PyObject_IsFreed() and _PyMem_IsFreed() functions. The visit_decref() function used by the Python garbage collector now detects freed memory and dumps the parent object on an attempt to access that memory: see Python issue 9263.
  • Maintenance of python-gdb.py and associated test_gdb regression tests: See Python issue 34989.

Conclusion

Python now works quite well with powerful open source debuggers such as GDB. We suggest you try out a Python debug build and GDB when you encounter a problem, especially a segmentation fault caused by a C extension to Python.
