How to cache data using GDB's Python API

February 5, 2024
Andrew Burgess
Related topics: Linux, Python
Related products: Red Hat Enterprise Linux

    The GNU Debugger (GDB) has an extensive and growing Python API that can be used to customize and extend GDB as needed. Different object types exist within the Python API to represent different parts of GDB and of the processes being debugged.

    Often when writing extensions using the Python API, it is useful to cache information about the objects with which you are working. This information might be temporal: it needs to be recorded at a precise moment in time and can't be recalculated later. Or the information might be expensive to calculate, so computing it once and caching the result gives a significant performance boost.

    This article describes how information can be cached for the different object types within GDB's Python API. Some object types provide special support for data caching, while for others you'll need to do additional work to manage the cached data.

    Caching for object files and program spaces

    The gdb.Objfile and gdb.Progspace types both support caching data within the object itself, using a dictionary that is built into the object; this allows user-defined attributes to be added to the object at any time. For the purposes of the examples in this post we'll only cache very simple data, a date and time. This is pretty contrived, but it keeps the example code short, and it should be obvious how this simple data could be replaced with something larger in a real extension.
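
    To illustrate the mechanism before building a full extension, here is a minimal demonstration from the GDB prompt (the _note attribute name is invented for this example):

    > gdb -q
    (gdb) python gdb.current_progspace()._note = "any data we like"
    (gdb) python print(gdb.current_progspace()._note)
    any data we like
    (gdb)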

    Caching for the gdb.Objfile type

    The following example shows a user extension that records the time when GDB first sees each Objfile, and adds a new user-defined command to print the cached information. The extension makes use of GDB's Event API to spot when a new Objfile is loaded into GDB:

    import datetime
    import gdb
    
    # Record the time at which GDB first sees each objfile, as a
    # user-defined attribute on the gdb.Objfile object itself.
    def new_objfile(event):
        objfile = event.new_objfile
        objfile._first_seen_at = datetime.datetime.today()
    
    gdb.events.new_objfile.connect(new_objfile)
    
    class objfile_time_command(gdb.Command):
        def __init__(self):
            super().__init__("objfile-time", gdb.COMMAND_NONE)
    
        def invoke(self, args, from_tty):
            # Print the cached timestamp for every objfile GDB has loaded.
            for objfile in gdb.objfiles():
                print(f"Filename: {objfile.filename}\t"
                      f"Seen at: {objfile._first_seen_at}")
    
    objfile_time_command()

    With the above code placed into a file named per-objfile.py, we can make use of it within a GDB session like this:

    > gdb -q
    (gdb) source per-objfile.py 
    (gdb) file /tmp/hello.x 
    Reading symbols from /tmp/hello.x...
    (gdb) objfile-time 
    Filename: /tmp/hello.x	Seen at: 2024-01-05 15:23:30.451302
    (gdb) start
    Temporary breakpoint 1 at 0x401198: file /tmp/hello.c, line 18.
    Starting program: /tmp/hello.x 
    
    Temporary breakpoint 1, main () at /tmp/hello.c:18
    18	  printf ("Hello World\n");
    (gdb) objfile-time 
    Filename: /tmp/hello.x	Seen at: 2024-01-05 15:23:30.451302
    Filename: /lib64/ld-linux-x86-64.so.2	Seen at: 2024-01-05 15:23:35.924193
    Filename: system-supplied DSO at 0x7ffff7fcf000	Seen at: 2024-01-05 15:23:35.927639
    Filename: /lib64/libc.so.6	Seen at: 2024-01-05 15:23:35.980646
    (gdb) 

    The great thing about this approach is that when GDB unloads an Objfile and the corresponding gdb.Objfile object is deleted, the cached data associated with that object is automatically cleaned up as well.

    Caching for the gdb.Progspace type

    The Progspace object type supports caching via custom attributes, just like Objfile. The following extension caches the time at which the executable was changed within a program space (this makes use of the executable_changed event, which is only available in GDB 14.1 and later):

    import datetime
    import gdb
    
    def exec_changed(event):
        # event.reload is True when the same executable was simply
        # re-read from disk; only record genuine changes.
        if not event.reload:
            pspace = event.progspace
            pspace._last_changed_at = datetime.datetime.today()
    
    gdb.events.executable_changed.connect(exec_changed)
    
    class exec_changed_time_command(gdb.Command):
        def __init__(self):
            super().__init__("exec-change-time", gdb.COMMAND_NONE)
    
        def invoke(self, args, from_tty):
            for pspace in gdb.progspaces():
                print(f"Filename: {pspace.filename}\t"
                      f"Changed at: {pspace._last_changed_at}")
    
    exec_changed_time_command()

    If the above is placed into a file exec-change.py, then it can be used within GDB like this:

    > gdb -q
    (gdb) source exec-change.py 
    (gdb) file /tmp/hello.x 
    Reading symbols from /tmp/hello.x...
    (gdb) exec-change-time 
    Filename: /tmp/hello.x	Changed at: 2024-01-05 15:33:41.973756
    (gdb) file /tmp/other.x
    Load new symbol table from "/tmp/other.x"? (y or n) y
    Reading symbols from /tmp/other.x...
    (gdb) exec-change-time 
    Filename: /tmp/other.x	Changed at: 2024-01-05 15:34:04.669511
    (gdb) 

    Problems with cache initialization

    There are a couple of problems that I have ignored in the above examples. The first is that GDB creates an initial inferior, within an initial program space, before it sources any user-provided Python scripts. So, if we use the new exec-change-time command before loading an executable, we're going to get an error like this:

    > gdb -q
    (gdb) source exec-change.py 
    (gdb) exec-change-time 
    Python Exception <class 'AttributeError'>: 'gdb.Progspace' object has no attribute '_last_changed_at'
    Error occurred in Python: 'gdb.Progspace' object has no attribute '_last_changed_at'
    (gdb) 

    We can avoid this by using Python's hasattr() function. Here's an updated implementation of the exec_changed_time_command class from the last example:

    class exec_changed_time_command(gdb.Command):
        def __init__(self):
            super().__init__("exec-change-time", gdb.COMMAND_NONE)
    
        def invoke(self, args, from_tty):
            for pspace in gdb.progspaces():
                # The attribute only exists for program spaces whose
                # executable changed after this script was sourced.
                if hasattr(pspace, '_last_changed_at'):
                    last_changed_at = pspace._last_changed_at
                else:
                    last_changed_at = None
                print(f"Filename: {pspace.filename}\t"
                      f"Changed at: {last_changed_at}")

    There's a similar problem with the per-objfile.py example: if an executable is passed to GDB on the command line, the associated Objfile will be loaded before any Python scripts are sourced, and so the _first_seen_at attribute will be missing. This can be fixed in the same way as the program space example; a minimal sketch of one possible fix follows.
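
    For completeness, here is one possible fix, this time using Python's getattr() with a default value rather than hasattr():

    class objfile_time_command(gdb.Command):
        def __init__(self):
            super().__init__("objfile-time", gdb.COMMAND_NONE)
    
        def invoke(self, args, from_tty):
            for objfile in gdb.objfiles():
                # Objfiles loaded before this script was sourced will not
                # have the attribute; report None for those.
                seen_at = getattr(objfile, '_first_seen_at', None)
                print(f"Filename: {objfile.filename}\t"
                      f"Seen at: {seen_at}")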

    A short note on attribute names

    Notice that in both of the previous examples, I selected a name for the new attribute that starts with an underscore character. User-defined attributes exist in the same namespace as the attributes that are part of GDB's Python API. As a concrete example, we can't create an attribute called filename on a Progspace object, as GDB's Python API already defines a gdb.Progspace.filename attribute.

    It is easy enough to look through the documentation and select an attribute name that GDB is not currently using. However, future releases of GDB often add new attributes and methods to existing types within the API, so care needs to be taken to ensure user-defined attributes don't clash with names added in future releases of GDB.

    Attributes and methods that are part of GDB's official API will always start with a lowercase character, and Python's built-in methods always start and end with a double underscore, for example, the __init__ method or the __dict__ attribute. User-defined attributes should avoid these two naming schemes; starting them with a single underscore, or with a capital letter, will help avoid naming conflicts.

    Another possible issue is the risk of clashing with an attribute added by some other extension. Right now there is no reliable way to prevent this from happening other than picking attribute names that are likely to be unique: selecting longer names, and possibly including the extension name within the attribute name, will help reduce the possibility of naming conflicts.
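
    For instance, a hypothetical extension named exttime might embed its own name in the attribute (the _exttime_first_seen_at name is purely illustrative):

    def new_objfile(event):
        # Including the extension name in the attribute reduces the
        # chance of clashing with other extensions or future GDB releases.
        event.new_objfile._exttime_first_seen_at = datetime.datetime.today()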

    Caching for other types

    There are plans to make the above technique available for additional types within GDB's Python API; however, this will not arrive until GDB 15 at the earliest. So, if you want data caching for other types or you need to support older releases of GDB, then you're going to have to create and manage your own cache.

    We've already seen a hint of how this caching can be done in the previous examples: there we used the Event API to create new cached data; now we will also use the Event API to clear the cache at the appropriate time.

    We'll work through some examples for gdb.InferiorThread and gdb.Inferior object types to see how this can be done.

    Caching for the gdb.InferiorThread object type

    Here's an example extension that caches the creation time of each thread. As we can't add attributes to an InferiorThread object, we instead use a global dictionary for the cache. In order to avoid the cache becoming filled with stale threads, we connect to the thread_exited event and use this event to clear entries from the cache:

    import datetime
    import gdb
    
    # Map gdb.InferiorThread objects to the time each was first seen.
    thread_creation_time_cache = {}
    
    def thread_created(event):
        global thread_creation_time_cache
        thread = event.inferior_thread
        thread_creation_time_cache[thread] = datetime.datetime.today()
    
    def thread_deleted(event):
        # Drop exited threads so the cache doesn't fill with stale entries.
        global thread_creation_time_cache
        thread = event.inferior_thread
        if thread in thread_creation_time_cache:
            del thread_creation_time_cache[thread]
    
    gdb.events.new_thread.connect(thread_created)
    gdb.events.thread_exited.connect(thread_deleted)
    
    class thread_creation_time_command(gdb.Command):
        def __init__(self):
            super().__init__("thread-creation-time", gdb.COMMAND_NONE)
    
        def invoke(self, args, from_tty):
            global thread_creation_time_cache
    
            for inferior in gdb.inferiors():
                for thread in inferior.threads():
                    # Threads that started before this script was loaded
                    # will not be in the cache.
                    if thread in thread_creation_time_cache:
                        time = thread_creation_time_cache[thread]
                    else:
                        time = "unknown"
                    print(f"Thread: {thread.ptid}: Creation time: {time}")
    
    thread_creation_time_command()

    GDB only creates an InferiorThread object when an inferior starts, or when an inferior spawns additional threads. However, we still have to consider the case where the Python extension was not loaded until after the first thread had started. This is handled by checking whether the thread is present in the cache before trying to read or delete its data.
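
    As an aside, the membership test and lookup in invoke can be collapsed by using Python's dict.get() with a default value; a sketch of the inner loop:

    for thread in inferior.threads():
        # dict.get() returns the default when the key is absent, so no
        # explicit membership test is needed.
        time = thread_creation_time_cache.get(thread, "unknown")
        print(f"Thread: {thread.ptid}: Creation time: {time}")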

    Caching for the gdb.Inferior object type

    When caching per-inferior data, we run into the same problem as with the Progspace type: GDB creates an initial inferior before the user has a chance to load any Python extensions. As a result, there will be no events associated with the creation of the very first inferior.

    To work around this, one solution is to check for preexisting inferiors at the time the Python script is loaded. This is the approach taken in the following example:

    import datetime
    import gdb
    
    # Map gdb.Inferior objects to the time each was first seen.
    inferior_creation_time_cache = {}
    
    def add_inferior(inferior):
        global inferior_creation_time_cache
        inferior_creation_time_cache[inferior] = datetime.datetime.today()
    
    def del_inferior(inferior):
        global inferior_creation_time_cache
        del inferior_creation_time_cache[inferior]
    
    def new_inferior(event):
        add_inferior(event.inferior)
    
    def inferior_deleted(event):
        del_inferior(event.inferior)
    
    gdb.events.new_inferior.connect(new_inferior)
    gdb.events.inferior_deleted.connect(inferior_deleted)
    
    # The initial inferior already exists by the time this script is
    # sourced, so add cache entries for any inferiors we missed.
    def handle_existing_inferiors():
        for inferior in gdb.inferiors():
            add_inferior(inferior)
    
    handle_existing_inferiors()
    
    class inferior_creation_time_command(gdb.Command):
        def __init__(self):
            super().__init__("inferior-creation-time", gdb.COMMAND_NONE)
    
        def invoke(self, args, from_tty):
            global inferior_creation_time_cache
    
            for inferior in gdb.inferiors():
                time = inferior_creation_time_cache[inferior]
                print(f"Inferior #{inferior.num}: Creation time: {time}")
    
    inferior_creation_time_command()

    The inferior_deleted event is used to ensure that the cache is cleaned up as inferiors are removed from GDB.

    The handle_existing_inferiors function is called once, when the Python script is initially loaded, and adds initial data for all the existing inferiors. Obviously, when the cached data is the time at which the inferior was created, this initial data is not going to be accurate; GDB might have started long before the Python extension was loaded. But I wanted to demonstrate this approach as an alternative to ignoring unknown objects, as we did for InferiorThread.
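
    If a misleading timestamp is unacceptable, a small variation (sketched below) records None for any inferior that already existed when the script was loaded; the invoke method could then report these as unknown, as in the InferiorThread example:

    def handle_existing_inferiors():
        for inferior in gdb.inferiors():
            # The true creation time is unknowable at this point; record
            # None rather than a misleading timestamp.
            inferior_creation_time_cache[inferior] = None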

    If the main example above is placed into a file called per-inferior.py, it can be used like this:

    $ gdb -q
    (gdb) source per-inferior.py 
    (gdb) inferior-creation-time 
    Inferior #1: Creation time: 2024-01-10 10:05:56.421793
    (gdb) add-inferior 
    [New inferior 2]
    Added inferior 2
    (gdb) inferior-creation-time 
    Inferior #1: Creation time: 2024-01-10 10:05:56.421793
    Inferior #2: Creation time: 2024-01-10 10:06:02.907802
    (gdb) 

    Conclusion

    As GDB's Python API continues to grow, it is becoming possible to write more complex Python extensions. More complex extensions often require data to live for extended periods of time, and this in turn requires that extension authors understand how to correctly manage the data they hold, avoiding excessive memory use or, worse, stale data leading to incorrect results. We have explored GDB's two methods for managing such data: custom object attributes, and manually managed caches driven by events.

    This is an area of GDB that is under active development, so it is worth checking GDB's NEWS page for each release to see what new features have been added to the Python API.
