Testing GCC in the wild
Currently, the GCC testsuite contains more than fifty thousand tests, which together amount to roughly two million lines of code. Since we, the GCC developers, try hard to avoid regressions in the compiler, almost every change to the compiler or to a related library requires adding a new test or a set of tests. Hence the internal testsuite keeps growing at a rapid pace. It goes without saying that a new change should not break any existing test.
Despite all this effort, new bugs still creep in. Because it is best to discover as many bugs as possible before a particular version of the compiler is released, many distribution maintainers try to rebuild all the packages in their distro with a still-unreleased compiler. If they find any bugs, they file reports in the upstream Bugzilla so that the compiler is up to snuff by the time it is released. This rebuild typically happens once a year during the regression-fixes-only stage of development, i.e., the stage in which no new features are introduced and work focuses on fixing bugs.
While a new version of GCC is usually released in April, the Fedora project performs its mass rebuild of all Fedora packages in January or February. Due to limited resources, the mass rebuild is done only on the x86_64 architecture.
Since the point is to test GCC, this is a so-called first-order mass rebuild.
We are looking for several classes of bugs here. These include:
- ICEs (internal compiler errors) occur when the compiler crashes or segfaults. These are always bugs, even if the compiler crashes on code that is invalid or contains undefined behavior. This sort of bug is usually rather easy to deal with, because the compiler is clever enough to produce a preprocessed source file in /tmp.
- False-positive warnings occur when the compiler warns about code even though it should not. Combined with the -Werror command-line option, which turns warnings into errors, such a warning can cause a package build to fail.
- Another class of bugs occurs when the compiler rejects valid code. Whether the code in question is actually valid in a given language is sometimes unclear, and assessing it can require significant expertise; it might even involve raising a DR (Defect Report) with the relevant standards committee.
- Wrong-code issues are the most formidable ones. They very often manifest themselves by causing a package's testsuite to fail. However, such failures can have a host of causes; too often the culprit is undefined behavior in an application that merely seemed to work in the past. Some packages misuse the preprocessor in twisted ways, others rely on internal details of libstdc++ headers, and so on. All of this makes it hard and quite time-consuming to determine whether the bug is in the compiler or in the package. Fortunately, sanitizers such as UBSan (-fsanitize=undefined) can be extremely useful when examining such failures. Another option is to build the package with, e.g., -fno-strict-aliasing or -fno-aggressive-loop-optimizations to see whether the build still fails. If one of these options makes the failure go away, the bug is most likely in the package.
When a compiler bug is found, it is important to reduce the problem to a stand-alone test case. Once added to the testsuite, such a test helps ensure that future versions of the compiler keep handling the code appropriately, including detecting invalid code or unspecified behavior. Admittedly, some problems, such as those involving LTO (link-time optimization), are nearly impossible to reduce to a stand-alone test case. And because not every package uses standard GNU Makefiles, it is sometimes necessary to dig into the internals of a package's build.
The following is a general overview of how we perform a mass rebuild:
First, we rebuild every package in the distro with the new GCC. The number of packages keeps growing: there were 5118 packages in 2008, 10404 in 2011, and 17741 this year. With this many packages, the rebuild naturally has to run in parallel. We start by creating a repository of all the source RPMs and a list of their names. Then we create a second repository, this time with the new GCC and the corresponding libtool packages. Afterwards, we set up mock and chroot configs on every build machine and prepare a script that downloads an SRPM, rebuilds it in mock, and saves the build logs. Finally, we run this script in parallel on every build box, feeding it the list of package names created earlier.
Secondly, we rebuild the failed packages with the old GCC, which quickly identifies packages that failed for reasons unrelated to GCC.
Thirdly, we investigate the FTBFS (Fails To Build From Source) cases that occur only with the new GCC.
One outcome of a mass rebuild is the so-called “porting to” document, whose purpose is to help people port their projects to the new GCC. This year’s “porting to” document is still somewhat in flux, but interested readers might want to take a look at https://gcc.gnu.org/gcc-6/porting_to.html. We also post a summary of each mass rebuild to the Fedora devel mailing list. For the latest summary, see https://firstname.lastname@example.org/message/DH7M2ADHM6XCRFTRRSKZD6MWFUJKHBZK/.
Ideally, all the bugs discovered during the mass rebuild get fixed, either in the compiler or in the affected packages. This annual rebuild should ensure that the compiler is of good quality and ready to be deployed as the new system compiler.