Testing… Testing… GCC

The next release of the GNU Compiler Collection, GCC 7, is fast approaching, so in this post I’m going to talk about work I’ve done to make GCC more reliable.

GCC has a large test suite: when I test a patch, roughly 330,000 tests are run, covering various aspects of the compiler, such as:

  • handling valid and invalid syntax in the front-ends
  • verifying that optimization passes are run
  • verifying that the resulting code runs correctly
  • verifying that the debugger can step through the resulting code and that it can print expression values sanely (both of these rely on metadata generated by the compiler)

We’re proud of GCC’s reliability, but there’s always room for improvement. So, for GCC 7, we’ve extended our test coverage in some ways.

The existing test suite uses DejaGnu: this allows us to write our test cases as source files, annotated with “magic” comments: a domain-specific language for expressing how to compile the source files, and what we expect to happen. This makes it relatively easy to turn an issue into a test case, expressing the aspect of the compiler’s behavior that we want to verify, without introducing “brittleness” by relying too much on the exact byte-for-byte output of the compiler.
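As a sketch of what such a test case looks like (the function name here is made up, but the dg- directives are the standard ones used throughout gcc/testsuite):

```c
/* A minimal DejaGnu-style test case.  The "magic" comments tell the
   test harness how to compile this file and what to expect.  */
/* { dg-do compile } */
/* { dg-options "-Wunused-variable" } */

int
foo (void)
{
  int unused;  /* { dg-warning "unused variable" } */
  return 42;
}
```

Note that the directive asserts only that a warning matching "unused variable" is issued on that line; it says nothing about the exact byte-for-byte diagnostic text, which is what keeps such tests from being brittle.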

One limitation of the existing approach relates to the complexity of the software being tested. A compiler is a complicated beast: GCC 7 currently has 332 optimization passes. The first 242 operate on the “GIMPLE” internal representation, a high-level form of the code being compiled, whereas the later 90 passes operate on the “RTL” representation, which is closer to machine instructions.

So testing a specific behavior of, say, the 200th optimization pass requires a source file that reaches that pass untouched by the earlier ones, so that the testing directives we write are still meaningful. As GCC’s optimization passes improve, the earlier passes can transform the code flowing through the pipeline enough that the test cases for later passes start to “bit rot”.

This is a good problem to have, but it still needs a solution. So, in GCC 7, we’ve extended the C frontend so that we can embed fragments of GIMPLE and RTL dumps as the bodies of functions. This allows us to write test cases for specific optimization passes, and assert that a given fragment of IR is handled in a particular way. Integrating this with the C frontend also preserves one of the benefits of our existing approach: as well as unit-testing a particular optimization pass, the same test can continue to run the rest of the compiler, and the resulting code can be run and verified – giving integration testing that the compiler is sane. (I wrote the RTL dump support, others wrote the GIMPLE dump support).

The other new approach is in unit-testing: GCC’s existing testing was almost all done by verifying the externally-visible behavior of the program, but we had very little direct coverage of specific implementation subsystems; this was done in a piecemeal fashion using testing plugins.

To address this, I’ve added a unit-testing suite to GCC 7, which is run automatically during a non-release build. Compilers use many data structures, so the most obvious benefit is that we can directly test corner-cases in these. As a relative newcomer to the project, one of my “pain points” learning GCC’s internals was the custom garbage collector it uses to manage memory. So, I’m very happy that the test suite now has specific test coverage for various aspects of the collector, which should make the compiler more robust when handling very large input files.
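For readers curious what these look like, a selftest in GCC's own sources has approximately this shape (see gcc/selftest.h; the test body and function names below are illustrative, not actual GCC code):

```c
/* Approximate shape of a GCC selftest; only built when checking is
   enabled, so release builds pay no cost for it.  */
#if CHECKING_P

namespace selftest {

/* Verify a corner case of some data structure.  */

static void
test_example ()
{
  ASSERT_EQ (4, 2 + 2);
}

/* Run all of the selftests within this file.  */

void
example_c_tests ()
{
  test_example ();
}

} // namespace selftest

#endif /* CHECKING_P */
```

The per-file "*_c_tests" hook is called from a central runner during the build, so a failing assertion stops the build rather than shipping a broken compiler.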

Another area of testing relates to how we track source code locations. This was originally quite simple, but it has grown over time: early versions of GCC merely tracked file name and line number; column numbers were added in 2004; macro expansions in 2011, and I added tracking of ranges of source code (rather than just points) in GCC 6. To avoid memory bloat within the internal representation, we encode source locations as 32-bit values, which are effectively keys into a database. This database has various heuristics to try to gracefully handle whatever code is thrown at it: large numbers of source files vs. files with a large number of lines vs. files with very wide source lines, and so on.  So, an advantage of the new unit-testing approach is that we can inject various interesting situations into the location-tracking code (e.g. what happens if you have a single very wide line in the middle of lines of more typical length) and assert that it copes with it all, whilst looping over various cases at or close to boundary conditions in the heuristics.
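To make the idea concrete, here is a toy sketch (emphatically not GCC's real line-map code) of packing a line and column into a single 32-bit key, including the kind of graceful degradation and boundary conditions that a unit test would probe:

```c
#include <assert.h>
#include <stdint.h>

/* Toy sketch only: pack a line number and a column number into one
   32-bit key, reserving 12 bits for the column.  Columns too wide to
   fit are clamped to 0 ("unknown"), degrading gracefully rather than
   corrupting neighboring bits.  (Lines above 2^20 are ignored here;
   a real encoder needs further fallbacks.)  */

#define COLUMN_BITS 12
#define MAX_COLUMN ((1u << COLUMN_BITS) - 1)

static uint32_t
encode_location (uint32_t line, uint32_t column)
{
  if (column > MAX_COLUMN)
    column = 0; /* very wide line: give up on column info */
  return (line << COLUMN_BITS) | column;
}

static uint32_t
location_line (uint32_t loc)
{
  return loc >> COLUMN_BITS;
}

static uint32_t
location_column (uint32_t loc)
{
  return loc & MAX_COLUMN;
}
```

A unit test would then loop over columns at, just below, and just above MAX_COLUMN and assert that decoding round-trips where it should and clamps where it must.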

So these changes are “under the hood”, but should mean a more reliable compiler, and they’ve given us more scope for implementing user-visible improvements.

Pre-releases of GCC 7 can already be downloaded if you want to try it out; we hope that the full release will be available in April, and in Fedora 26.


