How the GNU C Library handles backward compatibility

One of the GNU C Library’s (glibc’s) unwritten rules is that a program built against an old version of glibc will continue to work against newer versions of glibc. But how does this work? What hidden magic lets you call the same function with different results, just based on when you built your program?

Add magical symbols

This magic is called “compat symbols,” a mechanism that lets glibc and the static linker (the one used at build time) select one of several implementations of a function. For example, if we look at the dynamic symbol table of the 32-bit libc-2.29.so, we see three versions of the glob64 function (in 2017, the glob function was changed to handle dangling symlinks differently, which would cause older programs to crash, but that’s a different story):

$ readelf --dyn-syms -W /lib/libc-2.29.so | grep glob64
   411: 0012d0e0  7183 FUNC    GLOBAL DEFAULT   14 glob64@GLIBC_2.1
   412: 0012edb0  7183 FUNC    GLOBAL DEFAULT   14 glob64@GLIBC_2.2
   413: 000b69a0  7183 FUNC    GLOBAL DEFAULT   14 glob64@@GLIBC_2.27

In your program, you only refer to glob64(). The static linker (the one invoked to build your program) searches for a symbol that starts with glob64 followed by @@ and something else. The @@ tells the static linker that this version is the default version. In this case, the static linker finds glob64@@GLIBC_2.27, because that function’s application binary interface (ABI) last changed in glibc 2.27. The linker replaces @@ with @ to make glob64@GLIBC_2.27, which is stored in your program’s dynamic symbol table.

If the static linker doesn’t find any @@ symbols, it looks for an unversioned symbol, as usual.
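
The build-time selection rule can be modeled in a few lines of C. This is only a sketch of the rule, not the linker's actual implementation: the struct lib_symbol type, the pick_at_build_time function, and the glob64_table array are hypothetical names invented for illustration, and the table simply mirrors the readelf listing shown earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one entry in a library's dynamic symbol table. */
struct lib_symbol {
    const char *name;     /* symbol name, e.g. "glob64" */
    const char *version;  /* version tag, or NULL for an unversioned symbol */
    int is_default;       /* 1 for an "@@" entry, 0 for an "@" entry */
};

/* Illustrative table mirroring the readelf listing for libc-2.29.so. */
const struct lib_symbol glob64_table[] = {
    { "glob64", "GLIBC_2.1",  0 },
    { "glob64", "GLIBC_2.2",  0 },
    { "glob64", "GLIBC_2.27", 1 },
};

/*
 * Model of the build-time choice: prefer the default ("@@") version of
 * the named symbol, fall back to an unversioned symbol, and return NULL
 * if neither exists.
 */
const struct lib_symbol *
pick_at_build_time (const struct lib_symbol *table, size_t count,
                    const char *name)
{
    const struct lib_symbol *unversioned = NULL;

    assert (table != NULL && name != NULL);
    for (size_t i = 0; i < count; i++) {
        if (strcmp (table[i].name, name) != 0)
            continue;
        if (table[i].version != NULL && table[i].is_default)
            return &table[i];
        if (table[i].version == NULL && unversioned == NULL)
            unversioned = &table[i];
    }
    return unversioned;
}
```

Under these assumptions, pick_at_build_time (glob64_table, 3, "glob64") returns the GLIBC_2.27 entry, which corresponds to the version recorded in the program.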

Next, when your program runs, the dynamic linker sees the version numbers on all symbols and links each reference to the correspondingly versioned symbol, because the names now match. The only exception is that the current version of each symbol still has @@ in the shared object, which is matched against @ in your program:

$ readelf --dyn-syms -W myprog.x | grep glob64
     2: 00000000     0 FUNC    GLOBAL DEFAULT  UND glob64@GLIBC_2.27 (2)

Now consider the case where we’ve built a program against version 2.26 of the C library. In that case, glibc’s dynamic symbol table has something like this when you link against it:

   411: 0012d0e0  7183 FUNC    GLOBAL DEFAULT   14 glob64@GLIBC_2.1
   412: 0012edb0  7183 FUNC    GLOBAL DEFAULT   14 glob64@@GLIBC_2.2

The static linker would select the GLIBC_2.2 version as the default symbol, and would add glob64@GLIBC_2.2 to your program’s dynamic symbol table.

If you run that build on a system with glibc 2.27, the dynamic linker sees that you’ve built against version 2.2 of that symbol, and links you to version 2.2 despite there being a newer version available.
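
The run-time half of the story can be sketched the same way. In this minimal model, a reference binds only when both the name and the version tag match, so whether a newer default exists plays no part. The struct versioned_symbol type, the bind_at_run_time function, and the new_glibc_table array are hypothetical names, and the addresses are copied from the earlier readelf listing purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a versioned entry in the library's symbol table. */
struct versioned_symbol {
    const char *name;
    const char *version;
    unsigned long address;  /* illustrative value from the readelf listing */
};

/* Illustrative table: a glibc at 2.27 or later exports all three versions. */
const struct versioned_symbol new_glibc_table[] = {
    { "glob64", "GLIBC_2.1",  0x0012d0e0UL },
    { "glob64", "GLIBC_2.2",  0x0012edb0UL },
    { "glob64", "GLIBC_2.27", 0x000b69a0UL },
};

/*
 * Model of run-time binding: return the entry whose name and version
 * both match the program's reference, or NULL if the library does not
 * provide that version.
 */
const struct versioned_symbol *
bind_at_run_time (const struct versioned_symbol *table, size_t count,
                  const char *name, const char *version)
{
    assert (table != NULL && name != NULL && version != NULL);
    for (size_t i = 0; i < count; i++) {
        if (strcmp (table[i].name, name) == 0
            && strcmp (table[i].version, version) == 0)
            return &table[i];
    }
    return NULL;
}
```

A program built against glibc 2.26 asks for glob64 at GLIBC_2.2, so in this model it binds to the entry at 0x0012edb0 even though the GLIBC_2.27 entry is present in the same table.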

Change your ABI with compatibility

Suppose you want to do something similar in your own library. Consider this example code:

int lookup (int index)
{
 . . .
}

After a few releases, you realize you want to pass a pointer to the thing you want to look up:

int lookup (int index, void *data)
{
 . . .
}

You can’t have two copies of the same function in your library. You don’t want to change the name of the function, so you write something like this:

__asm__(".symver lookup_v2, lookup@@v2");
int lookup_v2 (int index, void *data)
{
 . . .
}

__asm__(".symver lookup_v1, lookup@");
int lookup_v1 (int index)
{
 . . .
}

We now have two differently named functions. The original code is now lookup_v1 and is set to version @, which means “no version tag, but not the default version.” The dynamic symbol table has an entry lookup for this symbol, which is what your older binaries expect.

The new function lookup_v2 is set to version @@v2, where the @@ means it is the default for any newly-linked programs. If you link a program against the new library, an entry lookup@v2 (note one @) is added to its dynamic symbol table.

The last step here is to tell the static linker what versions you’re using and what internal names to hide, using a version file like this one, which we will call mylib.vers:

v1 {
  local: lookup_v1;
};

v2 {
  local: lookup_v2;
};

You specify this version file to the static linker with the --version-script option, like this:

  gcc . . . -Wl,--version-script,mylib.vers

Note the v1 and v2 clauses corresponding to the versions we’re using. We also use the local command to “hide” the internal names of the functions that implement each version. This practice makes a symbol such as lookup_v2 local in scope to your library, and not visible outside it.

Plan ahead

In the case of glibc, versioning was used from the beginning. You can do this, too, if you add a wildcard version to your version script, like this:

v1 {
  *;
};

This script assigns version v1 to every symbol that doesn’t already have a specific version. Of course, you only want to do this before your first release, because versioning your symbols is itself an ABI change. Then, as you develop future versions of your library, you add more version clauses and list new symbols in those new clauses.

Understand compatibility’s limits

Despite the long history of compatibility and its almost magical ability to keep old programs running, there is one scenario that compatibility can’t solve. You can’t run a new program on an old glibc. Well, that’s not exactly true. You can build a new program that’s intended to run on an old glibc if you have a copy of that old glibc and its headers around. The easiest way to do that is to install an older operating system that has the version of glibc you want, which is the typical advice of “build on the oldest platform you want to support,” possibly using a more modern toolset (gcc et al.), such as Red Hat’s Developer Toolset, which was created for this purpose. That way, the new program depends only on compatibility symbols that are available in that old glibc and any newer glibc. Older glibcs cannot, of course, know the future.

Unless you can predict the future, in which case, please already have contacted me.
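
The reason a new program fails on an old glibc can be modeled as a simple set check: the program carries a list of the symbol versions it needs, the library carries a list of the versions it defines, and any needed version that is missing is fatal. The sketch below uses plain string lists and hypothetical names (first_missing_version and the four arrays); the real dynamic linker works from ELF version data rather than C strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative version lists; the array names are hypothetical. */
const char *const old_glibc_versions[] = { "GLIBC_2.1", "GLIBC_2.2" };
const char *const new_glibc_versions[] = { "GLIBC_2.1", "GLIBC_2.2",
                                           "GLIBC_2.27" };
const char *const old_program_needs[] = { "GLIBC_2.2" };
const char *const new_program_needs[] = { "GLIBC_2.27" };

/*
 * Return the first required version that the library does not provide,
 * or NULL if every required version is available.
 */
const char *
first_missing_version (const char *const *required, size_t n_required,
                       const char *const *provided, size_t n_provided)
{
    assert (required != NULL && provided != NULL);
    for (size_t i = 0; i < n_required; i++) {
        int found = 0;

        for (size_t j = 0; j < n_provided; j++) {
            if (strcmp (required[i], provided[j]) == 0) {
                found = 1;
                break;
            }
        }
        if (!found)
            return required[i];
    }
    return NULL;
}
```

In this model, an old program on a new glibc has nothing missing, while a new program on an old glibc is missing GLIBC_2.27, which matches the asymmetry described above.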

Know your nits, picks, and caveats

As this is a short article, a lot of details are glossed over. For example, the @@ syntax is merely a user-visible version of the executable and linkable format (ELF) structure’s internals, which is beyond the scope of this piece.

Although it’s possible for your program to link against two dynamically shared objects (DSOs) that use two different versions of the same symbol (i.e., those two DSOs were built against different glibc versions), such a situation is not supported. The glibc developers do their best to make it work anyway, but if it breaks, you get to keep both pieces.

Similarly, using dlsym to look up symbols in glibc (or any other versioned DSO) can result in using a different version of the symbol than other DSOs you use, with the same caveats.
