
Querying DWARF For Fun And Profit

January 22, 2015
Petr Machata


    Debugging information provides a view into the source code of an application. It is by no means exhaustive, but many features of the source code are present in the debugging information as well: translation units, functions, scopes, types, variables, etc. It is essential to a range of tools: GDB, SystemTap, OProfile, as well as various developer aids (such as pahole or libabigail).

    So clearly this data can be a useful source of information (hence the name, after all). But short of writing a program against a library such as elfutils' libdw, the only way to access it is through a dumping tool, such as dwarfdump, readelf -w, or eu-readelf -w. Deriving any sort of relationship from a dump is a clumsy undertaking that usually requires a nontrivial amount of scripting to glue the disparate pieces together.

    A new tool that Red Hat has been working on lately addresses the need to query debugging information in a structured manner. It is called dwgrep (DWARF grep). As the name indicates, it is aimed particularly at DWARF, the format generally used on Linux to represent debugging information.

    There are two major families of use cases. The first is automated checking of generated DWARF. This includes looking for instances of a known bug to gauge its impact, finding an obscure DWARF construct to exercise a DWARF consumer that you are writing, and testing DWARF for structural soundness.

    The second use case family is that of writing small, ad-hoc tools for deriving information from DWARF. For example, it's not hard to write a small script that dumps class inheritance information, or one that shows all typedefs and what they resolve to.

    Both families are closely related: in each case, you write a script that describes some relationship between various parts of the debugging information (or of the ELF file in general), and the tool finds instances of that relationship. In addition to pattern matching, dwgrep offers typical general-purpose facilities, such as integer arithmetic and string formatting.

    Note that this article only skims over some of the key DWARF concepts. If you are new to DWARF, I recommend reading Michael Eager's Introduction to the DWARF Debugging Format, which makes a good first exposition.

    shared-lib-calls-exit

    Dwgrep's first version was released recently, and I packaged it for Fedora so that it is easy for people to get their hands on. Getting a package into Fedora involves a whole review process with a meticulous eye toward quality. One of the checks verifies that a shared library does not call exit. It turned out that dwgrep's own library did:

    dwgrep-libzwerg.x86_64: W: shared-lib-calls-exit /usr/lib64/libzwerg.so.0.1 exit@GLIBC_2.2.5
    

    That puzzled me: a library that calls exit is not the sort of interface I would design. Luckily, at that point dwgrep was mature enough to find the offender. (Don't worry about the details of the query; we'll get to those later.)

    $ ./dwgrep/dwgrep ./libzwerg/libzwerg.so.0.1 -e '
            entry (name == "exit") parent @AT_decl_file'
    /home/petr/proj/dwgrep/64/libzwerg/lexer.cc
    

    Aha! lexer.cc is calling exit. Now, lexer.cc is generated by flex from lexer.ll, and indeed lexer.ll contains no open-coded call to exit. But in the generated code, sure enough:

    static void yy_fatal_error (yyconst char* msg , yyscan_t yyscanner)
    {
        (void) fprintf( stderr, "%s\n", msg );
        exit( YY_EXIT_FAILURE );
    }
    

    The Language

    So what are the rules of the expression language?

    Dwgrep operates as a stack machine with a twist. The stack that a query operates on holds values of various types: integers and strings, but also DWARF entries, attributes and other artifacts. The DWARF file that you name on the command line is pushed onto the stack as its initial value:

    $ ./dwgrep/dwgrep ./libzwerg/libzwerg.so.0.1 -e ''
    <Dwarf "./libzwerg/libzwerg.so.0.1">
    

    The query language is a concatenative language, which means that if you have two valid expressions, you can combine them simply by placing them next to each other. Compared to, e.g., Forth, there is a bit more syntax, but the overall philosophy is the same.

    The simplest expression, apart from an empty expression, is a single word, e.g. child, parent, entry, or @AT_decl_file. Words take values from the stack and push values back. When several words are written in a row, each hands its stack over to the next.

    The twist mentioned earlier is this: expressions can return more than once. In ordinary concatenative languages, you have one input stack and one output stack. Dwgrep generalizes this: each word has exactly one input stack, but zero or more output stacks.

    As it turns out, this simple mechanism is very handy for depth-first exploration of DWARF files. Each word simultaneously filters and extends the search space, and the stacks that come out of the end of the query are the solution.
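    To make the "zero or more output stacks" idea concrete, here is a minimal Python sketch of such an evaluator. It is not dwgrep's implementation; it only assumes that a word can be modelled as a function from one stack to an iterator of stacks, and that juxtaposing words feeds each output stack of one word into the next. The words push, upto and odd are hypothetical and exist only for illustration.

```python
from typing import Callable, Iterator, Tuple

# A stack is an immutable tuple whose last element is the top.
Stack = Tuple[object, ...]
# A word consumes one input stack and yields zero or more output stacks.
Word = Callable[[Stack], Iterator[Stack]]


def concat(*words: Word) -> Word:
    """Juxtaposition: every output stack of a word becomes an input of the next."""
    def run(stack: Stack) -> Iterator[Stack]:
        if not words:
            yield stack  # the empty expression passes its input through
            return
        first, rest = words[0], concat(*words[1:])
        for out in first(stack):  # depth-first: finish one branch before the next
            yield from rest(out)
    return run


# Hypothetical example words (not part of dwgrep's vocabulary).
def push(value: object) -> Word:
    def word(stack: Stack) -> Iterator[Stack]:
        yield stack + (value,)
    return word


def upto(stack: Stack) -> Iterator[Stack]:
    """Pop n and yield n stacks with 0..n-1 on top: extends the search space."""
    *rest, n = stack
    for i in range(n):
        yield tuple(rest) + (i,)


def odd(stack: Stack) -> Iterator[Stack]:
    """Yield the stack unchanged only if its top is odd: filters the search space."""
    if stack[-1] % 2 == 1:
        yield stack


query = concat(push(5), upto, odd)
print(list(query(())))  # [(1,), (3,)]
```

    Here upto extends the search space and odd filters it; the stacks that survive to the end are the results. Because concat finishes each branch before trying the next, the exploration is depth-first, as described above.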

    As an example, take the word unit. It expects a stack whose top value is a DWARF object, and produces a number of stacks, one for each compilation unit in the program:

    $ ./dwgrep/dwgrep ./libzwerg/libzwerg.so.0.1 -e 'unit'
    CU 0
    CU 0x493ef
    CU 0x5f304
    CU 0xdaae5
    CU 0xf71e5
    [... etc ...]
    

    In the initial example, entry is a word that takes a DWARF object, and produces its debuginfo entries. That is to say, each input stack of entry is supposed to have a DWARF object on top. That object is discarded, and entry then produces one stack for each entry in that DWARF file, with that entry pushed on top of the stack:

    $ ./dwgrep/dwgrep ./libzwerg/libzwerg.so.0.1 -e 'entry' | grep '^\['
    [b] compile_unit
    [29]    namespace
    [34]    structure_type
    [40]    member
    [4b]    typedef
    [56]    subprogram
    [... etc ...]
    

    name is a word as well: it takes an object and produces its name, which is the file name for a DWARF object, or the DW_AT_name for a DIE:

    $ ./dwgrep/dwgrep ./libzwerg/libzwerg.so.0.1 -e 'entry name'
    /home/petr/proj/dwgrep-older/64/libzwerg/parser.cc
    std
    integral_constant<bool, false>
    value
    value_type
    operator std::integral_constant<bool, false>::value_type
    [... etc ...]
    

    name does not produce anything if there is no name associated with the object.

    parent then takes a DIE and produces its parent (if any), and @AT_decl_file takes a DIE and produces the value of its DW_AT_decl_file attribute (again, if any).
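    The query from the shared-lib-calls-exit section can be mimicked on a toy tree with the same stack-of-stacks model. The sketch below is a simplified, hypothetical illustration: the Die and Dwarf classes, the helper at() standing in for @AT_decl_file, and assert_eq() standing in for the infix assertion (name == "exit") are our own, and we assume the assertion passes a stack through unchanged when the subexpression yields at least one matching value.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List, Optional, Tuple

Stack = Tuple[object, ...]
Word = Callable[[Stack], Iterator[Stack]]


@dataclass(eq=False)
class Die:
    """Hypothetical stand-in for a debug information entry."""
    tag: str
    attrs: Dict[str, object] = field(default_factory=dict)
    children: List["Die"] = field(default_factory=list)
    parent: Optional["Die"] = None

    def add(self, child: "Die") -> "Die":
        child.parent = self
        self.children.append(child)
        return child


@dataclass(eq=False)
class Dwarf:
    """Hypothetical stand-in for a DWARF file: a list of unit root DIEs."""
    units: List[Die]


def concat(*words: Word) -> Word:
    def run(stack: Stack) -> Iterator[Stack]:
        if not words:
            yield stack
            return
        for out in words[0](stack):
            yield from concat(*words[1:])(out)
    return run


def entry(stack: Stack) -> Iterator[Stack]:
    """Pop a Dwarf and yield one stack per DIE, in depth-first order."""
    *rest, dw = stack

    def walk(die: Die) -> Iterator[Die]:
        yield die
        for c in die.children:
            yield from walk(c)

    for root in dw.units:
        for die in walk(root):
            yield tuple(rest) + (die,)


def name(stack: Stack) -> Iterator[Stack]:
    """Replace a DIE with its name; yield nothing if it has none."""
    *rest, die = stack
    if "name" in die.attrs:
        yield tuple(rest) + (die.attrs["name"],)


def parent(stack: Stack) -> Iterator[Stack]:
    """Replace a DIE with its parent, if any."""
    *rest, die = stack
    if die.parent is not None:
        yield tuple(rest) + (die.parent,)


def at(attr: str) -> Word:
    """Hypothetical analogue of @AT_<attr>: the attribute's value, if present."""
    def word(stack: Stack) -> Iterator[Stack]:
        *rest, die = stack
        if attr in die.attrs:
            yield tuple(rest) + (die.attrs[attr],)
    return word


def assert_eq(sub: Word, value: object) -> Word:
    """Assumed semantics of (sub == value): keep the stack unchanged if
    some result of running sub on it has value on top."""
    def word(stack: Stack) -> Iterator[Stack]:
        if any(out[-1] == value for out in sub(stack)):
            yield stack
    return word


# Toy data loosely modelled on the exit() example. In real DWARF the call
# site gets its name through DW_AT_abstract_origin; here it is set directly.
cu = Die("compile_unit", {"name": "lexer.cc"})
fatal = cu.add(Die("subprogram", {"name": "yy_fatal_error", "decl_file": "lexer.cc"}))
fatal.add(Die("GNU_call_site", {"name": "exit"}))
cu.add(Die("subprogram", {"name": "exit", "decl_file": "/usr/include/stdlib.h"}))

# entry (name == "exit") parent @AT_decl_file
query = concat(entry, assert_eq(name, "exit"), parent, at("decl_file"))
print([s[-1] for s in query((Dwarf([cu]),))])  # ['lexer.cc']
```

    Note that the declaration of exit also passes the assertion, but its parent is the CU, which has no DW_AT_decl_file, so that branch simply produces nothing.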

    Dwgrep has an in-depth tutorial, and a syntax reference that is mostly designed to be read as a tutorial as well. If you want to know more, please read through them. The syntax reference is also where you will find an explanation of the subexpression (name == "exit"), which is called an "infix assertion".

    You may also wish to look at the vocabulary of DWARF-related stack value types and the words applicable to them. That document, however, is not designed as a tutorial at all, but rather as an exhaustive reference. This article therefore concentrates on presenting some of that material in a form that is relatively easy to digest.

    Navigating the DWARF Graph

    Typically, you pass the file name of a binary on the command line, and dwgrep pushes an object corresponding to that file onto the stack. For separate debuginfo, all you need to do is name the main file: if the debuginfo file is installed, elfutils' libdw locates it for you automatically behind the scenes. You then operate on that object.

    Alternatively, you can use the word dwopen to convert a string to a DWARF object. This makes it possible to open several files and cross-query them. For example, one could write a query to find source files shared by two different modules:

    $ ./dwgrep/dwgrep '
        "dwgrep/dwgrep" dwopen unit root name
        (== "libzwerg/libzwerg.so.0.1" dwopen unit root name)'
    /home/petr/proj/dwgrep-older/libzwerg/strip.cc
    

    It turns out that the main dwgrep binary shares one source unit with the libzwerg library.
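    Stripped of the stack machinery, the query above amounts to intersecting two lists of CU root names. The Python sketch below shows that computation under our assumption about the (== ...) form: with its left side omitted, it compares the value on top of the stack against every value the right-hand subexpression produces, and keeps the stack if any of them match. The unit names are hypothetical stand-ins for what unit root name would produce on each file.

```python
from typing import Iterable, Iterator


def shared_sources(left: Iterable[str], right: Iterable[str]) -> Iterator[str]:
    """Yield each name from `left` that also occurs in `right`.

    `left` plays the role of the outer `unit root name`; the membership test
    plays the role of the assertion (== ... unit root name).
    """
    right_names = set(right)  # evaluated once here; a simplification
    for n in left:
        if n in right_names:
            yield n


# Hypothetical CU root names for the two modules.
dwgrep_units = ["dwgrep.cc", "options.cc", "libzwerg/strip.cc"]
libzwerg_units = ["libzwerg/parser.cc", "libzwerg/lexer.cc", "libzwerg/strip.cc"]
print(list(shared_sources(dwgrep_units, libzwerg_units)))  # ['libzwerg/strip.cc']
```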

    Dwgrep has fairly solid first-class support for .debug_info and .debug_abbrev, meaning that there are words for direct navigation of entities that live in these sections. Both of them contain units, entries, and attributes. In .debug_info, the concrete types of these entities are called T_CU (for compile unit), T_DIE (for debug info entry) and T_ATTR. In .debug_abbrev, the types are T_ABBREV_UNIT, T_ABBREV and T_ABBREV_ATTR.

    To describe relations among the types, let's use the following notation: "T₁ (W)→ T₂", which reads: given a value of type T₁, applying word W to it produces a value of type T₂. If the arrow is "→*" instead, zero or more values are produced. If it is "→?", at most one value is produced.

    The following words are available for moving from higher-level concepts to lower-level ones and back:

    - T_DWARF (unit)→* T_CU (entry)→* T_DIE (attribute)→* T_ATTR
    - T_DWARF (entry)→* T_DIE
    - T_DWARF (abbrev)→* T_ABBREV_UNIT (entry)→* T_ABBREV (attribute)→* T_ABBREV_ATTR
    - T_CU (root)→ T_DIE
    - T_DIE (unit)→ T_CU
    

    The following are for sideways motion, from .debug_info to .debug_abbrev.

    - T_CU (abbrev)→ T_ABBREV_UNIT
    - T_DIE (abbrev)→ T_ABBREV
    

    Within .debug_info itself, the following words are available for navigating the graph:

    - T_DIE (child)→* T_DIE  # For access to children of a node.
    - T_DIE (parent)→? T_DIE # For access to parent of a node.
    - T_DIE (root)→ T_DIE    # For access from a node to CU DIE.
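    One way to read the →, →* and →? notation is in terms of function signatures: exactly one result maps to a plain return value, zero or more to an iterator, and at most one to an optional value. The Python sketch below applies that reading to toy, hypothetical CU and DIE classes. Since dwgrep selects the behaviour of root by the type of its input, and Python functions cannot share a name that way, the T_DIE variant is called die_root here.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass(eq=False)
class Die:
    """Hypothetical stand-in for T_DIE."""
    tag: str
    children: List["Die"] = field(default_factory=list)
    parent_die: Optional["Die"] = None
    cu: Optional["CU"] = None


@dataclass(eq=False)
class CU:
    """Hypothetical stand-in for T_CU."""
    offset: int
    root_die: Die


def make_cu(offset: int, root_die: Die) -> CU:
    """Build a CU and fill in the parent and unit back-links of its DIE tree."""
    cu = CU(offset, root_die)

    def link(die: Die, parent_die: Optional[Die]) -> None:
        die.cu, die.parent_die = cu, parent_die
        for c in die.children:
            link(c, die)

    link(root_die, None)
    return cu


def root(cu: CU) -> Die:                # T_CU (root)→ T_DIE: exactly one
    return cu.root_die


def unit(die: Die) -> CU:               # T_DIE (unit)→ T_CU: exactly one
    assert die.cu is not None, "DIE is not attached to a unit"
    return die.cu


def child(die: Die) -> Iterator[Die]:   # T_DIE (child)→* T_DIE: zero or more
    yield from die.children


def parent(die: Die) -> Optional[Die]:  # T_DIE (parent)→? T_DIE: at most one
    return die.parent_die


def die_root(die: Die) -> Die:          # T_DIE (root)→ T_DIE, via the unit
    return root(unit(die))


member = Die("member")
struct = Die("structure_type", [member])
cu = make_cu(0x0, Die("compile_unit", [struct]))
print(die_root(member).tag, parent(root(cu)))  # compile_unit None
```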
    

    There are more types than those mentioned, e.g. location expressions and address sets. The DWARF vocabulary reference mentioned above lists them all; we won't go through them in detail here.

    Cooking The Dwarfs

    One word that we haven't used so far is attribute. Let's go back to the original example, slightly modified:

    $ ./dwgrep/dwgrep ./libzwerg/libzwerg.so.0.1 -e '
        entry (name == "exit") ?TAG_GNU_call_site'
    [551cc] GNU_call_site
        low_pc (addr)   0x37d19;
        abstract_origin (ref4)  [5f112];
    

    And now let's dump the attributes:

    $ ./dwgrep/dwgrep ./libzwerg/libzwerg.so.0.1 -e '
        entry (name == "exit") ?TAG_GNU_call_site attribute'
    low_pc (addr)   0x37d19;
    abstract_origin (ref4)  [5f112];
    external (flag_present) true;
    name (strp) exit;
    decl_file (data1)   /usr/include/stdlib.h;
    decl_line (data2)   543;
    

    Curiously, we end up with quite a few more attributes than the previous dump showed. The reason is that the DWARF object we are working with is cooked. .debug_info values come in two flavors: cooked and raw. Raw values remain faithful to the underlying representation, while cooked ones interpret it. Thus in the previous example, the attribute word pulled in attributes from the DIE referenced by DW_AT_abstract_origin. It would likewise pull in attributes from DW_AT_specification, and it would keep pulling if the referenced DIE in turn contained more such attributes. For example, consider the following case:

    $ ./dwgrep/dwgrep ./libzwerg/libzwerg.so.0.1 -e '
        entry (name == "overload_op") ?DW_TAG_GNU_call_site'
    [36f1ba]    GNU_call_site
        low_pc (addr)   0x60ed7;
        abstract_origin (ref4)  [3680b6];
        sibling (ref4)  [36f1e5];
    

    Let's follow the rabbit down the hole:

    $ ./dwgrep/dwgrep ./libzwerg/libzwerg.so.0.1 -e '
        entry (name == "overload_op") ?DW_TAG_GNU_call_site @AT_abstract_origin'
    [3680b6]    subprogram
        abstract_origin (ref4)  [368084];
        linkage_name (strp) _ZN11overload_opC2ESt10shared_ptrI2opE17overload_instance;
        low_pc (addr)   0x600d0;
        high_pc (data8) 33;
        frame_base (exprloc)    0..0xffffffffffffffff:[0:call_frame_cfa];
        object_pointer (ref4)   [3680da];
        GNU_all_call_sites (flag_present)   true;
        sibling (ref4)  [368122];
    
    $ ./dwgrep/dwgrep ./libzwerg/libzwerg.so.0.1 -e '
        entry (name == "overload_op") ?DW_TAG_GNU_call_site
        @AT_abstract_origin @AT_abstract_origin'
    [368084]    subprogram
        specification (ref4)    [34c2b1];
        inline (data1)  DW_INL_not_inlined;
        object_pointer (ref4)   [368093];
        sibling (ref4)  [3680b6];
    
    $ ./dwgrep/dwgrep ./libzwerg/libzwerg.so.0.1 -e '
        entry (name == "overload_op") ?DW_TAG_GNU_call_site
        @AT_abstract_origin @AT_abstract_origin @AT_specification'
    [34c2b1]    subprogram
        external (flag_present) true;
        name (strp) overload_op;
        decl_file (data1)   /home/petr/proj/dwgrep-older/libzwerg/overload.cc;
        decl_line (data1)   205;
        accessibility (data1)   DW_ACCESS_public;
        declaration (flag_present)  true;
        object_pointer (ref4)   [34c2c1];
        sibling (ref4)  [34c2d1];
    

    ... and finally we are at the bottom. attribute brings all of these attributes into one view, pruning duplicates (except for DW_AT_abstract_origin and DW_AT_specification themselves, which seems like a bug).

    $ ./dwgrep/dwgrep ./libzwerg/libzwerg.so.0.1 -e '
        entry (name == "overload_op") ?DW_TAG_GNU_call_site attribute'
    low_pc (addr)   0x60ed7;
    abstract_origin (ref4)  [3680b6];
    sibling (ref4)  [36f1e5];
    abstract_origin (ref4)  [368084];
    linkage_name (strp) _ZN11overload_opC2ESt10shared_ptrI2opE17overload_instance;
    high_pc (data8) 33;
    frame_base (exprloc)    0..0xffffffffffffffff:[0:call_frame_cfa];
    object_pointer (ref4)   [3680da];
    GNU_all_call_sites (flag_present)   true;
    specification (ref4)    [34c2b1];
    inline (data1)  DW_INL_not_inlined;
    external (flag_present) true;
    name (strp) overload_op;
    decl_file (data1)   /home/petr/proj/dwgrep-older/libzwerg/overload.cc;
    decl_line (data1)   205;
    accessibility (data1)   DW_ACCESS_public;
    

    The word raw produces a raw view of the object it is applied to. When we use attribute on such an object, only the attributes that are actually present on that DIE are produced:

    $ ./dwgrep/dwgrep ./libzwerg/libzwerg.so.0.1 -e '
        entry (name == "overload_op") ?DW_TAG_GNU_call_site raw attribute'
    low_pc (addr)   0x60ed7;
    abstract_origin (ref4)  [3680b6];
    sibling (ref4)  [36f1e5];
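    The difference between the cooked and raw views can be modelled in a few lines. The sketch below is a simplified, hypothetical reconstruction based on the behaviour described and shown above, not dwgrep's or libdw's actual code: the cooked view follows DW_AT_abstract_origin and DW_AT_specification links, skips attribute names it has already produced, and, mirroring the dump above, reports the reference attributes themselves at every level. It does not try to reproduce every detail of the real output.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Set, Tuple

# Attributes through which (in this model) cooking continues to another DIE.
REF_ATTRS = ("abstract_origin", "specification")


@dataclass(eq=False)
class Die:
    """Hypothetical DIE: ordered (name, value) pairs; references hold a Die."""
    offset: int
    attrs: List[Tuple[str, object]] = field(default_factory=list)


def raw_attributes(die: Die) -> Iterator[Tuple[str, object]]:
    """Raw view: only the attributes physically present on the DIE."""
    yield from die.attrs


def cooked_attributes(die: Die) -> Iterator[Tuple[str, object]]:
    """Cooked view: follow reference chains, skipping names already produced."""
    seen: Set[str] = set()
    visited: Set[Die] = set()  # guards against reference cycles
    current: Optional[Die] = die
    while current is not None and current not in visited:
        visited.add(current)
        following: Optional[Die] = None
        for name, value in current.attrs:
            if name in REF_ATTRS:
                yield name, value  # reported at every level, as in the dump
                if following is None and isinstance(value, Die):
                    following = value
            elif name not in seen:
                seen.add(name)
                yield name, value
        current = following


# Toy chain shaped like the overload_op example above.
decl = Die(0x34C2B1, [("name", "overload_op"), ("decl_line", 205)])
abstract = Die(0x368084, [("specification", decl), ("inline", "DW_INL_not_inlined")])
concrete = Die(0x3680B6, [
    ("abstract_origin", abstract),
    ("linkage_name", "_ZN11overload_opC2ESt10shared_ptrI2opE17overload_instance"),
    ("low_pc", 0x600D0), ("sibling", 0x368122)])
call_site = Die(0x36F1BA, [("low_pc", 0x60ED7), ("abstract_origin", concrete),
                           ("sibling", 0x36F1E5)])

print([n for n, _ in raw_attributes(call_site)])
# ['low_pc', 'abstract_origin', 'sibling']
print([n for n, _ in cooked_attributes(call_site)])
# ['low_pc', 'abstract_origin', 'sibling', 'abstract_origin', 'linkage_name',
#  'specification', 'inline', 'name', 'decl_line']
```

    The call site's own low_pc and sibling win over those of the concrete subprogram because they are produced first, which matches the first lines of the cooked dump.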
    

    Other words sensitive to a value's "doneness" are child and parent. When child encounters a DIE tagged DW_TAG_imported_unit, it does not yield that DIE; instead, it recursively produces the DIEs referenced by that import point. It also records which import point each imported DIE was brought in through.

    The word parent then makes use of that record. When it traverses to a root node, it checks whether an import point was remembered, and if so, it produces that import point's parent instead of the root DIE.
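    A rough model of this bookkeeping: pair each DIE with a note of the import point it was reached through, let child splice imported units in place of their import points, and let parent step out through the note when it reaches a root. The Python sketch below does that on hypothetical toy classes (Die, View, make); it is our reading of the description above rather than dwgrep's implementation, and it keeps the import point's own note so that nested imports unwind correctly.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass(eq=False)
class Die:
    """Hypothetical DIE. For imported_unit entries, `target` is the imported unit's root."""
    tag: str
    children: List["Die"] = field(default_factory=list)
    target: Optional["Die"] = None
    parent: Optional["Die"] = None


def make(tag: str, *children: Die, target: Optional[Die] = None) -> Die:
    die = Die(tag, list(children), target)
    for c in children:
        c.parent = die
    return die


@dataclass
class View:
    """A cooked DIE: the entry plus the import point it was reached through."""
    die: Die
    via: Optional["View"] = None


def child(view: View) -> Iterator[View]:
    for c in view.die.children:
        if c.tag == "imported_unit" and c.target is not None:
            # Do not yield the import point itself; splice in the imported
            # unit's children, remembering the import point we came through.
            yield from child(View(c.target, View(c, view.via)))
        else:
            yield View(c, view.via)


def parent(view: View) -> Optional[View]:
    p = view.die.parent
    if p is None:
        return None
    if p.parent is None and view.via is not None:
        # p is the root of an imported unit: step out through the import point.
        return parent(view.via)
    return View(p, view.via)


int_type = make("base_type")
partial = make("partial_unit", int_type)
cu = make("compile_unit", make("imported_unit", target=partial), make("variable"))

kids = list(child(View(cu)))
print([k.die.tag for k in kids])      # ['base_type', 'variable']
print(parent(kids[0]).die.tag)        # compile_unit
print(parent(View(int_type)).die.tag) # partial_unit (no import point recorded)
```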

    Now, What?

    Should you need more information, the following resources are associated with the dwgrep project:

    • Dwgrep website: http://pmachata.github.io/dwgrep/
    • Dwgrep project site: https://github.com/pmachata/dwgrep

    Other than that, let me know what you think and what you would like to see. If something doesn't work, please file issues in the issue tracker.

    Last updated: April 6, 2018
