The GNU Debugger (GDB) has an extensive and growing Python API that can be used to customize and extend GDB as needed. Different object types within the Python API represent different parts of GDB and of the processes being debugged.

Often when writing extensions using the Python API it is useful to cache information about the objects with which you are working. This information might be temporal: it needs to be recorded at a precise moment in time and can't be recalculated later. Or the information might be expensive to calculate, so computing it once and caching the result gives a significant performance boost.

This article describes how information can be cached for different object types within GDB's Python API; some object types provide special support for data caching, while for other object types you'll need to do additional work to manage cached data.

Caching for object files and program spaces

The gdb.Objfile and gdb.Progspace types both support caching data within the object itself using a dictionary that is built into the object; this allows user-defined attributes to be added to the object at any time. For the purposes of the examples in this post we're only going to cache very simple data, a date and time. This is pretty contrived, but it keeps the example code short, and hopefully it is obvious how this simple data could be replaced with something larger in a real extension.
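
As a quick illustration of the underlying mechanism, the following lines, which could be run through GDB's python command, attach an arbitrary attribute to the current program space and read it back (the attribute name _my_note is just an example):

# Store a user-defined attribute on the current program space, then
# read it back; the value lives in the object's built-in dictionary.
pspace = gdb.current_progspace()
pspace._my_note = "anything can be cached here"
print(pspace._my_note)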

Caching for the gdb.Objfile type

The following example shows a user extension that records the time at which GDB first sees each Objfile, and adds a new user-defined command to print the cached information. The extension makes use of GDB's Event API to spot when a new Objfile is loaded into GDB:

import datetime

# Record the time at which GDB first sees each Objfile.  The timestamp
# is stored directly on the Objfile object as a user-defined attribute.
def new_objfile(event):
    objfile = event.new_objfile
    objfile._first_seen_at = datetime.datetime.today()

# The new_objfile event fires each time GDB loads a new Objfile.
gdb.events.new_objfile.connect(new_objfile)

class objfile_time_command(gdb.Command):
    """Print the time at which each Objfile was first seen by GDB."""

    def __init__(self):
        super().__init__("objfile-time", gdb.COMMAND_NONE)

    def invoke(self, args, from_tty):
        for objfile in gdb.objfiles():
            print(f"Filename: {objfile.filename}\t"
                  f"Seen at: {objfile._first_seen_at}")

objfile_time_command()

With the above code placed into a file called per-objfile.py, we can make use of it within a GDB session like this:

> gdb -q
(gdb) source per-objfile.py 
(gdb) file /tmp/hello.x 
Reading symbols from /tmp/hello.x...
(gdb) objfile-time 
Filename: /tmp/hello.x	Seen at: 2024-01-05 15:23:30.451302
(gdb) start
Temporary breakpoint 1 at 0x401198: file /tmp/hello.c, line 18.
Starting program: /tmp/hello.x 

Temporary breakpoint 1, main () at /tmp/hello.c:18
18	  printf ("Hello World\n");
(gdb) objfile-time 
Filename: /tmp/hello.x	Seen at: 2024-01-05 15:23:30.451302
Filename: /lib64/ld-linux-x86-64.so.2	Seen at: 2024-01-05 15:23:35.924193
Filename: system-supplied DSO at 0x7ffff7fcf000	Seen at: 2024-01-05 15:23:35.927639
Filename: /lib64/libc.so.6	Seen at: 2024-01-05 15:23:35.980646
(gdb) 

The great thing about this approach is that as GDB unloads an Objfile, and the corresponding Python object is deleted, the cached data associated with that object is automatically cleaned up too.

Caching for the gdb.Progspace type

The Progspace object type supports caching via custom attributes, just like Objfile. The following extension caches the time at which the executable was changed within a program space. This makes use of the executable_changed event, which is only available in GDB 14.1 and later:

import datetime

# Record the time at which the executable within a program space last
# changed.  The timestamp is stored directly on the Progspace object.
def exec_changed(event):
    # event.reload is True when the current executable was reloaded,
    # e.g. after being rebuilt; only record genuine changes.
    if not event.reload:
        pspace = event.progspace
        pspace._last_changed_at = datetime.datetime.today()

gdb.events.executable_changed.connect(exec_changed)

class exec_changed_time_command(gdb.Command):
    """Print the time at which each program space's executable changed."""

    def __init__(self):
        super().__init__("exec-change-time", gdb.COMMAND_NONE)

    def invoke(self, args, from_tty):
        for pspace in gdb.progspaces():
            print(f"Filename: {pspace.filename}\t"
                  f"Changed at: {pspace._last_changed_at}")

exec_changed_time_command()

If the above is placed into a file called exec-change.py, then it can be used within GDB like this:

> gdb -q
(gdb) source exec-change.py 
(gdb) file /tmp/hello.x 
Reading symbols from /tmp/hello.x...
(gdb) exec-change-time 
Filename: /tmp/hello.x	Changed at: 2024-01-05 15:33:41.973756
(gdb) file /tmp/other.x
Load new symbol table from "/tmp/other.x"? (y or n) y
Reading symbols from /tmp/other.x...
(gdb) exec-change-time 
Filename: /tmp/other.x	Changed at: 2024-01-05 15:34:04.669511
(gdb) 
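
As noted above, the executable_changed event is only available in GDB 14.1 and later. If there's a chance that the extension will be loaded into an older GDB, then one option (just a sketch, other approaches are possible) is to check that the event registry exists before connecting to it:

# Only connect if this GDB is new enough to provide the event.
if hasattr(gdb.events, "executable_changed"):
    gdb.events.executable_changed.connect(exec_changed)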

Problems with cache initialization

There are a couple of problems that I have ignored in the above examples. The first is that GDB creates an initial inferior, within an initial program space, before it sources any user-provided Python scripts. So, if we use the new exec-change-time command before loading an executable, we're going to get an error, like this:

> gdb -q
(gdb) source exec-change.py 
(gdb) exec-change-time 
Python Exception <class 'AttributeError'>: 'gdb.Progspace' object has no attribute '_last_changed_at'
Error occurred in Python: 'gdb.Progspace' object has no attribute '_last_changed_at'
(gdb) 

We can avoid this by using Python's hasattr() function. Here's an updated implementation of the exec_changed_time_command class from the previous example:

class exec_changed_time_command(gdb.Command):
    """Print the time at which each program space's executable changed."""

    def __init__(self):
        super().__init__("exec-change-time", gdb.COMMAND_NONE)

    def invoke(self, args, from_tty):
        for pspace in gdb.progspaces():
            # A Progspace that existed before this script was sourced
            # will not have the cache attribute; report None instead.
            if hasattr(pspace, '_last_changed_at'):
                last_changed_at = pspace._last_changed_at
            else:
                last_changed_at = None
            print(f"Filename: {pspace.filename}\t"
                  f"Changed at: {last_changed_at}")

There's a similar problem with the per-objfile.py example: if an executable is passed to GDB on the command line, then the associated Objfile will be loaded before any Python scripts are sourced, and so the _first_seen_at attribute will be missing. This can be fixed in a similar way to the program space example; a complete fix is left as an exercise for the reader, but a minimal sketch of one possible approach follows.
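
This sketch could be added to the end of per-objfile.py; it stamps any Objfiles that GDB has already loaded when the script is sourced (note that the recorded time for these is when the script was loaded, not when GDB truly first saw the Objfile):

# Add timestamps for any Objfiles that were loaded before this script
# was sourced.
for objfile in gdb.objfiles():
    if not hasattr(objfile, '_first_seen_at'):
        objfile._first_seen_at = datetime.datetime.today()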

A short note on attribute names

Notice that in both of the previous examples, I selected a name for the new attribute that starts with an underscore character. User-defined attributes exist in the same namespace as the attributes that are part of GDB's Python API. As a concrete example, we can't create an attribute called filename on a Progspace object, as GDB's Python API already defines a gdb.Progspace.filename attribute.

It is easy enough to look through the documentation and select an attribute name that GDB is not currently using. However, future releases of GDB often add new attributes and methods to existing types within the API, so care needs to be taken to ensure user-defined attributes don't clash with names added in future releases.

Attributes and methods that are part of GDB's official API always start with a lower-case character, while Python's built-in methods always start and end with a double underscore, for example the __init__ method or the __dict__ attribute. User-defined attributes should avoid both of these naming schemes; starting them with a single underscore, or with a capital letter, will help prevent conflicts.

Another possible issue is the risk of clashing with an attribute added by some other extension. Right now there is no reliable way to prevent this from happening other than picking attribute names that are likely to be unique: selecting longer names, and possibly including the extension name within the attribute name, will help reduce the possibility of a clash. For example, an extension called mytool might use _mytool_first_seen_at rather than just _first_seen_at.

Caching for other types

There are plans to make the above technique available for additional types within GDB's Python API; however, this will not arrive until GDB 15 at the earliest. So, if you want data caching for other types, or if you need to support older releases of GDB, then you're going to have to create and manage your own cache.

The previous examples already hint at how this can be done: there we used the Event API to create new cached data; now we will also use the Event API to clear the cache at the appropriate time.

We'll work through some examples for gdb.InferiorThread and gdb.Inferior object types to see how this can be done.

Caching for the gdb.InferiorThread object type

Here's an example extension that caches the creation time of each thread. As we can't add attributes to an InferiorThread object, we instead use a global dictionary for the cache. In order to avoid the cache filling up with stale threads, we connect to the thread_exited event and use this event to clear entries from the cache:

import datetime

# Map each gdb.InferiorThread to the time at which it was created.
thread_creation_time_cache = {}

def thread_created(event):
    global thread_creation_time_cache
    thread = event.inferior_thread
    thread_creation_time_cache[thread] = datetime.datetime.today()

def thread_deleted(event):
    global thread_creation_time_cache
    thread = event.inferior_thread
    # The thread will be missing from the cache if it was created
    # before this script was sourced.
    if thread in thread_creation_time_cache:
        del thread_creation_time_cache[thread]

gdb.events.new_thread.connect(thread_created)
gdb.events.thread_exited.connect(thread_deleted)

class thread_creation_time_command(gdb.Command):
    """Print the creation time of every thread in every inferior."""

    def __init__(self):
        super().__init__("thread-creation-time", gdb.COMMAND_NONE)

    def invoke(self, args, from_tty):
        global thread_creation_time_cache

        for inferior in gdb.inferiors():
            for thread in inferior.threads():
                if thread in thread_creation_time_cache:
                    time = thread_creation_time_cache[thread]
                else:
                    time = "unknown"
                print(f"Thread: {thread.ptid}: Creation time: {time}")

thread_creation_time_command()

GDB only creates an InferiorThread object when an inferior starts, or when an inferior spawns additional threads. However, we still have to consider the case where the Python extension was not loaded until after the first thread had started. This is handled by checking that the thread is present in the cache before trying to read or delete its data.
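
As with getattr() earlier, the membership test and lookup in the invoke method could be collapsed into a single line by using the dictionary's get() method with a default value:

time = thread_creation_time_cache.get(thread, "unknown")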

Caching for the gdb.Inferior object type

When caching per-inferior data, we run into the same problem as with the Progspace type: GDB creates an initial inferior before the user has a chance to load any Python extensions. As a result, there will be no events associated with the creation of the very first inferior.

To work around this, one solution is to check for preexisting inferiors at the time the Python script is loaded. This is the approach taken in the following example:

import datetime

# Map each gdb.Inferior to the time at which it was created.  For
# inferiors that existed before this script was loaded, the recorded
# time is when the script was loaded.
inferior_creation_time_cache = {}

def add_inferior(inferior):
    global inferior_creation_time_cache
    inferior_creation_time_cache[inferior] = datetime.datetime.today()

def del_inferior(inferior):
    global inferior_creation_time_cache
    del inferior_creation_time_cache[inferior]

def new_inferior(event):
    add_inferior(event.inferior)

def inferior_deleted(event):
    del_inferior(event.inferior)

gdb.events.new_inferior.connect(new_inferior)
gdb.events.inferior_deleted.connect(inferior_deleted)

# Create cache entries for any inferiors that already exist, e.g. the
# initial inferior that GDB creates at startup.
def handle_existing_inferiors():
    for inferior in gdb.inferiors():
        add_inferior(inferior)

handle_existing_inferiors()

class inferior_creation_time_command(gdb.Command):
    """Print the creation time of every inferior."""

    def __init__(self):
        super().__init__("inferior-creation-time", gdb.COMMAND_NONE)

    def invoke(self, args, from_tty):
        global inferior_creation_time_cache

        for inferior in gdb.inferiors():
            time = inferior_creation_time_cache[inferior]
            print(f"Inferior #{inferior.num}: Creation time: {time}")

inferior_creation_time_command()

The inferior_deleted event is used to ensure that the cache is cleaned up as inferiors are removed from GDB.

The handle_existing_inferiors function is called once, when the Python script is first loaded, and adds initial data for all the existing inferiors. Obviously, when the cached data is the time at which the inferior was created, this initial data is not going to be accurate: GDB might have started long before the Python extension was loaded. Still, I wanted to demonstrate this approach as an alternative to ignoring unknown objects, as we did for InferiorThread.

If the above is placed into a file called per-inferior.py, then it can be used like this:

> gdb -q
(gdb) source per-inferior.py 
(gdb) inferior-creation-time 
Inferior #1: Creation time: 2024-01-10 10:05:56.421793
(gdb) add-inferior 
[New inferior 2]
Added inferior 2
(gdb) inferior-creation-time 
Inferior #1: Creation time: 2024-01-10 10:05:56.421793
Inferior #2: Creation time: 2024-01-10 10:06:02.907802
(gdb) 

Conclusion

As GDB's Python API continues to grow, it is becoming possible to write more complex Python extensions. More complex extensions often require data to live for extended periods of time, and this in turn requires that extension authors understand how to correctly manage the data they hold, to avoid excessive memory use or, worse, stale data leading to incorrect results. GDB has two methods for managing such data: custom object attributes, and manually managed caches using events, both of which we have explored.

This is an area of GDB that is under active development, so it is worth checking GDB's NEWS page for each release to see what new features have been added to the Python API.