
This article is for people interested in the long-term maintenance of software systems that expose application binary interfaces (a.k.a. ABIs) to other systems. That long-term maintenance involves detecting and analyzing inevitable changes in the ABIs and assessing whether these changes allow the maintained systems to stay compatible with the components with which they interact.

In this article, I describe what happened to the ABI change analysis framework that I worked on during 2018: the Abigail library (Libabigail) and its associated set of tools. The goal is not to list the myriad changes made across releases 1.2, 1.3, 1.4, and 1.5, all of which came out that year. Instead, I will walk you through the main changes and put them in perspective.

Core functionality improvements

These are improvements to the core library. They are thus propagated to all the tools using the library.

General improvement to the leaf change report

Several Libabigail tools can emit change reports using either a default reporting mode or a leaf change reporting mode. In the latter mode, only changes on types, variables, and functions are reported. Unlike in the default reporting mode, the impact of those changes (for example, which function was impacted by a given type change and how) is not reported. In other words, if changes are linked to each other in a tree-like manner, only the leaves are reported in this mode. The kmidiff tool, for instance, uses this mode by default. The abidiff tool can also emit reports in this mode when given the --leaf-changes-only option.

Source locations of the changes on types are now reported in this mode.

The introductory summary of changes emitted at the beginning of change reports in this mode has been improved.

The meaningfulness of change reports in this mode has been generally improved through many little changes, making the output of the kmidiff tool (in particular) much more usable.
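To make the difference between the two reporting modes concrete, here is a minimal sketch of a tree of linked changes and a traversal that keeps only its leaves. This is a toy model, not Libabigail's diff IR or API: the ChangeNode type, the collect_leaves and example_tree functions, and the description strings are all made up for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model of a tree of linked changes. This is NOT Libabigail's diff IR.
struct ChangeNode {
    std::string description;
    std::vector<ChangeNode> children;  // sub-changes that explain this change
};

// Append the descriptions of the leaf nodes found under `node`,
// in depth-first order. Interior nodes (the "impact") are skipped.
void collect_leaves(const ChangeNode& node, std::vector<std::string>& out) {
    if (node.children.empty()) {
        out.push_back(node.description);
        return;
    }
    for (const ChangeNode& child : node.children)
        collect_leaves(child, out);
}

// Hypothetical tree: function1 changed because its parameter type changed,
// which in turn changed because struct Foo gained a data member.
ChangeNode example_tree() {
    ChangeNode foo{"struct Foo: data member added", {}};
    ChangeNode param{"parameter 1 of type 'Foo*' changed", {foo}};
    ChangeNode fn{"function1 changed", {param}};
    return fn;
}
```

Calling collect_leaves on example_tree() keeps only the struct Foo entry, which mirrors the idea that a leaf report shows the type change itself and leaves out the functions it impacts.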

Improved redundancy detection

Whenever Libabigail emits change reports, it avoids reporting the "same" change twice.  For instance, let's suppose we have a type struct Foo like this:

 struct Foo
 {
   int m0;
 };

Suppose that type is used by two functions named function1 and function2, as shown below:

void
function1(struct Foo *a)
{
}

void
function2(struct Foo *b)
{
}

Now let's see how abidiff compares the ABIs of two versions of the binary that contain the definition of struct Foo, function1, and function2, where the only change that occurred is adding a data member to struct Foo, like this:

 struct Foo
 {
   int m0;
+  char m1;
 };

The result of invoking abidiff on the two versions of the binary would be as follows:

$ abidiff test-v0.o test-v1.o
Functions changes summary: 0 Removed, 1 Changed (1 filtered out), 0 Added functions
Variables changes summary: 0 Removed, 0 Changed, 0 Added variable

1 function with some indirect sub-type change:

[C]'function void function1(Foo*)' at test-v1.cc:8:1 has some indirect sub-type changes:
  parameter 1 of type 'Foo*' has sub-type changes:
    in pointed to type 'struct Foo' at test-v1.cc:1:1:
      type size changed from 32 to 64 (in bits)
      1 data member insertion:
        'char Foo::m1', at offset 32 (in bits) at test-v1.cc:4:1

$

Notice how the change to struct Foo is reported as impacting function1. But function2 also uses struct Foo, and abidiff deliberately avoids reporting the struct Foo change in the context of function2 because such a report would be redundant. That is what the "(1 filtered out)" in the summary line refers to.
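The bookkeeping behind "1 Changed (1 filtered out)" can be sketched as follows. This is only an illustration under simplifying assumptions (a type change is identified by the type's name alone); the Impact and Summary types and the report_without_redundancy function are hypothetical and do not reflect how Libabigail implements redundancy detection.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy record: "function F is impacted by a change to type T".
struct Impact {
    std::string function;
    std::string changed_type;
};

struct Summary {
    int reported = 0;
    int filtered_out = 0;
};

// Report each changed type in the context of the first function that uses it.
// Later occurrences of the same type change are counted as filtered out.
Summary report_without_redundancy(const std::vector<Impact>& impacts) {
    std::set<std::string> already_reported;
    Summary summary;
    for (const Impact& impact : impacts) {
        // std::set::insert returns {iterator, true} only for a new element.
        if (already_reported.insert(impact.changed_type).second)
            ++summary.reported;
        else
            ++summary.filtered_out;
    }
    return summary;
}
```

Fed with the two impacts from the example (function1 and function2, both on struct Foo), this sketch counts one reported change and one filtered-out change.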

There are cases, though, where we do want to report redundant changes. For instance, we want to report every function parameter whose type was changed from const char* to char*; those changes should not be considered redundant. Libabigail used to over-filter such changes as redundant. This has now been improved.

Another case of redundancy detection improvement is on enum types. The reporting pass of Libabigail was failing to detect that a given change to an enum type was already reported. So there were cases where the tools would report a given enum type change several times in several different contexts. This was fixed.

Improved categorizing of changes

Whenever Libabigail's comparison engine detects that an ABI artifact (for example, a symbol, type, or declaration) has changed, that change is modeled in an internal (in-memory) representation, a.k.a. the diff IR. The diff IR is a graph where the nodes are the artifact changes (a.k.a. diff nodes). The diff IR is later processed by various passes for various purposes. The purpose of one of those passes is to categorize the changes carried by each diff node. Each change carried by each diff node ends up in one of three big categories: harmful changes, harmless changes, and non-categorized changes.

Later, when the diff IR is traversed for the purpose of emitting change reports, harmless changes can, for instance, be omitted by default. This helps to increase the signal-to-noise ratio of change reports by being able to avoid reporting ABI changes that are deemed not important to users. But this categorizing business (particularly the part where we tried to improve it) seems to be the never-ending kind.

One example of a recent improvement is the "harmless name change" category. Whenever a diff IR node carries a typedef or an enum type name change, Libabigail would consider that change to be harmless. That, in turn, allows the change reporting pass to avoid showing typedef and enum name changes by default, because these have no impact on the ABI of the library we are looking at. This is all fine and dandy unless the diff IR node also carries other changes that might not be considered harmless. Thus, the change categorizing engine of Libabigail has now "tightened up" the conditions under which a typedef or enum type diff IR node is categorized as a harmless name change. For typedef types, a name change is now considered harmless only if there is no change in the textual representation of the underlying type. For enum types, a name change is now considered harmless only if there is no other change in the enum, for example, on the enumerators or on the underlying type.
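The distinction between a pure rename and a rename that hides another change can be illustrated with compile-time checks. In this sketch, the v0 and v1 namespaces stand for two versions of a hypothetical header, and every type name in them is invented. Note that std::is_same compares type identity, which is only an approximation of the "textual representation" comparison described above.

```cpp
#include <type_traits>

// Hypothetical "before" and "after" versions of a public header, kept in
// separate namespaces so that both can live in one translation unit.
namespace v0 {
typedef unsigned int handle_t;
enum color { RED, GREEN };
}

namespace v1 {
typedef unsigned int resource_handle_t;  // renamed; underlying type unchanged
typedef unsigned long wide_handle_t;     // renamed AND underlying type changed
enum colour { RED, GREEN };              // renamed; enumerators unchanged
}

// Typedef renamed with the same underlying type: both names denote one type.
constexpr bool typedef_rename_only =
    std::is_same<v0::handle_t, v1::resource_handle_t>::value;

// Typedef renamed together with a change of the underlying type.
constexpr bool typedef_rename_and_retype =
    !std::is_same<v0::handle_t, v1::wide_handle_t>::value;

// Enum renamed with the same underlying type and the same enumerator values.
constexpr bool enum_rename_only =
    std::is_same<std::underlying_type<v0::color>::type,
                 std::underlying_type<v1::colour>::type>::value &&
    static_cast<int>(v0::RED) == static_cast<int>(v1::RED) &&
    static_cast<int>(v0::GREEN) == static_cast<int>(v1::GREEN);

static_assert(typedef_rename_only, "a pure typedef rename keeps the type");
static_assert(enum_rename_only, "a pure enum rename keeps the enumerators");
```

Under the tightened rules described above, only the first and third cases would qualify as harmless name changes; the wide_handle_t case carries an additional change to the underlying type.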

CV-qualifier changes on return types of functions are now also categorized as harmless by default. Note that these were already categorized as harmless on function parameter types.

Whenever a void* pointer is changed into a more "typed" pointer, that change is now categorized as harmless by default.
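The article does not spell out why the void* case is treated as harmless. One plausible reading, offered here as an assumption, is that on mainstream platforms an untyped and a typed object pointer have the same size and alignment, so a caller passes exactly the same bits either way. The following sketch checks that for a hypothetical pair of functions, read_m0_v0 and read_m0_v1, which stand for two versions of the same API.

```cpp
#include <cassert>

struct Foo {
    int m0;
};

// Version 0 of a hypothetical API takes an untyped pointer...
int read_m0_v0(void* data) { return static_cast<Foo*>(data)->m0; }

// ...and version 1 takes a typed pointer to the same object.
int read_m0_v1(Foo* data) { return data->m0; }

// These hold on mainstream platforms; they are not guaranteed by ISO C++.
static_assert(sizeof(void*) == sizeof(Foo*),
              "untyped and typed pointers have the same size here");
static_assert(alignof(void*) == alignof(Foo*),
              "untyped and typed pointers have the same alignment here");
```

Both functions read the same value from the same object, whether the caller thinks of the argument as a void* or as a Foo*.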

Support of anonymous data members

An anonymous data member is a data member of a struct or a union that has no name. The type of such a data member is either a struct or a union, for instance:

struct Foo
{
  int a;
  struct /* <-- This is an anonymous data member. */
  {
    char b;
    char c;
  };
  int d;
};

The debug information emitted by GCC in the DWARF format describes such anonymous data member constructs, but the DWARF reader and the various internal representations of Libabigail had to be adapted to support them. This is now supported.
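The layout consequences of an anonymous data member can be checked at compile time. This sketch uses an anonymous union, which is the form ISO C++ standardizes; anonymous structs like the one above are standard in C11 and are accepted by GCC and Clang in C++ as an extension. The member names are for illustration only.

```cpp
#include <cstddef>

struct Foo {
    int a;
    union {  // anonymous data member of union type
        char b;
        int c;
    };
    int d;
};

// Members of the anonymous union are named as if they belonged to Foo.
int read_c(const Foo& foo) { return foo.c; }

// Both union members start at the same offset inside Foo...
static_assert(offsetof(Foo, b) == offsetof(Foo, c),
              "union members share their storage");
// ...and the anonymous member sits between a and d.
static_assert(offsetof(Foo, a) < offsetof(Foo, b), "a comes first");
static_assert(offsetof(Foo, c) < offsetof(Foo, d), "d comes last");
```

Because the anonymous member has no name of its own but still occupies space and shifts the offset of d, an ABI analysis tool has to represent it explicitly, which is the adaptation described above.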

Better support for ELF symbol versions

An ELF symbol can have multiple versions, and Libabigail has been supporting this feature for a long time. But when a function symbol S has several versions, and several functions with different names have those different versions of S as their underlying symbols, Libabigail could mistakenly take one function for another. This is because there were cases where Libabigail identified a function using its symbol name without taking the symbol version into account. This has now been fixed, and Libabigail always takes the symbol version into account when identifying a function.
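The confusion described above is essentially a key collision, which a small sketch can show. This is a toy model only: the BoundFunction type, the two indexing functions, and names such as foo_v1 or VERS_1 are hypothetical and do not correspond to Libabigail's internals.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy record: a function and the versioned ELF symbol it is bound to.
struct BoundFunction {
    std::string function_name;
    std::string symbol_name;
    std::string symbol_version;
};

// Index by symbol name only: functions sharing a symbol name collide,
// and the later entry silently replaces the earlier one.
std::size_t distinct_by_name(const std::vector<BoundFunction>& fns) {
    std::map<std::string, std::string> index;
    for (const BoundFunction& f : fns)
        index[f.symbol_name] = f.function_name;
    return index.size();
}

// Index by (symbol name, symbol version): each function keeps its own entry.
std::size_t distinct_by_name_and_version(const std::vector<BoundFunction>& fns) {
    std::map<std::pair<std::string, std::string>, std::string> index;
    for (const BoundFunction& f : fns)
        index[std::make_pair(f.symbol_name, f.symbol_version)] = f.function_name;
    return index.size();
}
```

With two functions bound to S at versions VERS_1 and VERS_2, the name-only index ends up with a single entry, while the name-and-version index keeps both functions apart.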

Support for union types in suppression specifications

Libabigail (and its tools) allows users to suppress change reports based on what they want. Users can provide a file in which they describe the kind of artifacts for which changes should be suppressed. For instance, a user can say that changes to a type named FooPrivateType should not be reported by the tool. To do so, the user would write a suppression specification file that would look like this:

[suppress_type]
  name = FooPrivateType

That suppression specification file would then be passed to the Libabigail tools using appropriate options.

That kind of suppression specification now acts on union types as well.

Default suppression specification for new projects

Libabigail installs default suppression specifications that are used automatically and implicitly by tools like abidiff and abipkgdiff whenever these compare shared libraries identified either by their filename or their soname.

Whenever a particular project team feels the need to define a set of specific suppression rules (for example, to suppress changes detected on types or symbols that are deemed private, based on a specific naming scheme), project members can reach out to Libabigail developers so that together, we come up with a default suppression specification for the project.  Libabigail already installs default suppression specifications for several system libraries, for instance.

In that spirit, a new default suppression specification is now installed for the krb5 and libvirt projects and their libraries.

Tools-specific improvements

fedabipkgdiff

Fedabipkgdiff is a command-line tool to compare the ABI of ELF binaries contained in Fedora packages. It interacts with the Fedora build system to get the packages to act on.

This tool was ported over to Python 3 as part of the general effort to move over to Python 3 in Fedora. It can still be used with Python 2, though.

abipkgdiff

abipkgdiff is a command-line tool to compare the ABI of ELF binaries contained in software packages that are available locally.

When an RPM contains a shared library whose soname is not advertised by the RPM as being "provided," abipkgdiff now considers that shared library to be private to the RPM.  It thus drops that shared library from the set of libraries to compare.  This prevents abipkgdiff from emitting ABI change reports about libraries deemed private to the package.

Conclusion

Several other fixes and improvements were made across the codebase of Libabigail and its associated tools during 2018. These were made possible by users who took the time to report issues they encountered while using the framework or to request enhancements they thought of while trying to adapt it to their environment. I would like to thank them warmly and sincerely.

The Libabigail developers keep working on improving the features of the Libabigail static analysis framework in general, and we hope to hear from you each time you feel like reaching out to us!

Last updated: March 8, 2019