Earlier this year I was asked to bootstrap our core tools (compiler, assembler, linker, and libraries) from the ground up, to help the rest of the team in providing enough infrastructure for bootstrapping an entire OS to POWER8 little endian. Since I spend most of my days working on the upstream development of the GNU Compiler Collection (GCC), prior to this project I hadn't actually worked much with either RHEL's development processes or RPM as a whole. So leading our effort to bootstrap the tools onto a new architecture required a lot of coming up to speed.
Not being one to shy away from learning an entirely new infrastructure, I accepted, and so began a 6-month ordeal fighting with everything from the assembler to GNU Emacs. Having learned so much from this project, I thought it would be good to write some of it down, both for the curious and to help in future bootstrapping efforts.
Consequently, I'd like to give an overview of how an OS is bootstrapped, and share the insights I've gained that can help developers design packages that are easy to bootstrap and bring up on new architectures.
Everything begins with the compiler and associated tools. Without a compiler and assembler there can be no editor, no GNOME, and no kernel. And without an OS, you have to depend on a pre-existing OS to bootstrap your compiler.
Bootstrapping GCC from zero is a little bit of black magic, mixed with a fair amount of frustration and swearing. My general approach was the following:
- Cross build GNU Binary Utilities (binutils) (assembler, linker, and an assortment of other low-level tools) so we could run the assembler/linker/etc from an x86 machine and generate ppc64le binaries. This involved porting upstream ppc64le support into our RHEL 7.x code base.
- Build a minimal cross GCC using the binutils from the previous step. This is basically a cross GCC without target libraries.
- Use the minimal GCC to build minimal GNU C Library (glibc) headers.
- Use the above to build a cross GCC again, but this time with libgcc and some minimal target libraries.
- Build glibc.
- Build a full GCC that uses the glibc recently built. Once this stage is completed, we have a compiler capable of building ppc64le binaries from another host (x86 in my case).
Once we have a bare compiler and libraries working, we can use these tools to manually build the core supporting packages (make, sed, bash, tar, gawk, grep, etc.). And by building these packages manually, I mean completely manually, because we can't depend on ./configure or bash or anything else.
Eventually, we end up with a set of packages that can be run on a ppc64le system (or a simulated QEMU system in my case), where we can build the rest of the packages à la ./configure and make (albeit with many manual hacks thrown in).
Most of the above steps had been automated by the 64-bit ARM team. I heavily modified their scripts, with changes that will hopefully make it into their stage1/stage2 scripts (see Stage1 notes and Stage2 notes).
Once we are past manual building with ./configure and make, we eventually have enough packages to build `rpm', at which point things get progressively easier tools-wise, but progressively harder package-wise. From here on out, packages must be built with rpm, but care must be taken to build things in the appropriate order. The appropriate order is mostly unknown, so a lot of trial and error happens at this stage. Building autoconf may need emacs which may need ImageMagick, at which point you're pretty much building the entire OS.
This stage is the most infuriating and takes the longest. You must break circular dependencies by hand, sometimes by hacking .spec files and sometimes by hacking source files. But eventually you end up with enough RPMs to be able to fire up yum and ultimately mock. Once you get to mock, if you can successfully rebuild all your packages within it, you are done.
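To make the ordering problem concrete, here is a minimal sketch in Python (not something we used during the bootstrap) of how one might compute a build order and spot a circular dependency before hitting it by trial and error. The dependency table is entirely hypothetical and only loosely modeled on the autoconf/emacs example above; in particular, the emacs-needs-autoconf edge is an assumption made up to produce a cycle. It relies on the standard library's graphlib module (Python 3.9 or later).

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical build-time dependencies: package -> packages it needs first.
# These edges are illustrative only, not taken from real .spec files.
build_requires = {
    "make": set(),
    "bash": {"make"},
    "rpm": {"bash", "make"},
    "autoconf": {"emacs"},
    "emacs": {"autoconf"},  # assumed edge that closes a cycle
}


def build_order(deps):
    """Return (order, cycle); exactly one of the two is None."""
    try:
        return list(TopologicalSorter(deps).static_order()), None
    except CycleError as err:
        # The second element of args is the list of nodes forming the cycle.
        return None, err.args[1]


def break_edge(deps, package, dependency):
    """Return a copy of deps with one edge removed.

    This stands in for hacking a .spec file so that `package`
    no longer needs `dependency` during the bootstrap.
    """
    trimmed = {name: set(needs) for name, needs in deps.items()}
    trimmed[package].discard(dependency)
    return trimmed


order, cycle = build_order(build_requires)
print("cycle found:", cycle)

# Pretend we built autoconf without its emacs support for the first pass.
bootstrap_deps = break_edge(build_requires, "autoconf", "emacs")
order, cycle = build_order(bootstrap_deps)
print("bootstrap order:", order)
```

In practice the hard part is discovering the real edges, which is exactly the trial and error described above; a tool like this only helps once you have written them down.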
For the uninitiated, mock is a tool that works in unison with yum to build a given SRPM in a chroot. It takes care of downloading and installing RPM dependencies and building your SRPM in isolation from your build system's setup. It is useful in building and testing packages, as you can build a package for, say, RHEL 7.1, even though your host system is Fedora 20. If you read through the mock build logs, you can see exactly which packages your package needs to build. This is very useful in minimizing your package's dependencies and optimizing the bootstrap process.
Does all this sound hard? It is! Or perhaps not hard so much as time consuming and frustrating. It's at this point that you write down everything that could have been done differently... So here are a few hints and tips for package developers and maintainers, to aid whichever poor soul must port RHEL/Fedora to the inevitable next architecture.
- Do not stream ABI details to disk. Doing so makes it extremely hard to bootstrap an OS and to guarantee a clean ABI.
- Verify and re-verify that you have the minimum number of dependencies. With every OS release, verify that your BuildRequires and Requires are correct in your .spec files. Unnecessary requirements complicate the bootstrap process.
- Try to build your package from a minimal chroot. This exercise will help you determine what is actually needed for your package. You'll be surprised how often ImageMagick is unnecessarily required for some core package.
- Avoid having simpler packages depend on more complex ones... for example, right now, bison depends on Perl, and autoconf depends on emacs.
- libtool is evil. Do you really need it? Can you do without it? If you can't, make sure your package can automatically reconfigure itself if a user touches an .m4 file. And make sure it can be reconfigured with the autotools in the OS version you are building for.
- autoconf is also evil. See above notes on libtool.
- Do not make assumptions about the endianness of the architecture. Glibc has macros for determining this (see <endian.h> and the __BYTE_ORDER macro). Heck, there are even autoconf tests for that if you must.
- When possible, use the Lua extension language for rpm instead of external tool dependencies (say perl or php). Lua is an embedded scripting language available for RPM .spec files that can minimize your package build dependencies.
- RPM spec files have a blessed way of dealing with patches. Don't try to be clever. For instance, groff.spec uses `git' to apply patches. Really, don't be clever. If you dislike RPM's patch syntax/usage, propose a better solution.
- Use .spec file options such as --without-docs or globals like %{bootstrap} to specify shortcuts around building the entire package. 90% of bootstrapping problems stem from missing dependencies for documentation. Providing a %global or an option to avoid building documentation saves a lot of effort in the bootstrapping process.
- Only place tests in the %check section so they can be easily overridden while bootstrapping.
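The tips above about on-disk data and endianness boil down to the same habit: state the byte order and layout explicitly instead of inheriting them from whatever machine did the build. The following is a small Python sketch of that idea, not code from any package mentioned here. The header layout, the MAGIC value, and the function names are all made up for illustration; in C you would reach for <endian.h> instead.

```python
import struct
import sys

# Hypothetical file header: magic (u32), version (u16), payload length (u32).
# The "<" prefix forces little-endian, standard sizes, and no padding, so the
# bytes on disk are the same no matter which host wrote them.
HEADER_FORMAT = "<IHI"
MAGIC = 0x12345678  # made-up value for this sketch


def pack_header(version, length):
    """Serialize the header with an explicit byte order."""
    return struct.pack(HEADER_FORMAT, MAGIC, version, length)


def unpack_header(data):
    """Parse a header written by pack_header on any architecture."""
    magic, version, length = struct.unpack(HEADER_FORMAT, data)
    if magic != MAGIC:
        raise ValueError("unexpected magic; wrong format or byte order")
    return version, length


header = pack_header(7, 4096)
print("host byte order:", sys.byteorder)
print("header bytes:", header.hex())
print("round trip:", unpack_header(header))
```

Had the format string used native order and alignment instead (no prefix, or "@"), both the byte order and the padding would depend on the build host, which is the kind of ABI detail you do not want streamed to disk.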
All in all, bootstrapping is a tedious process, but its pain can be alleviated by following simple guidelines and testing your adherence to them periodically (ideally with every release).
And the end results can be great - Red Hat Enterprise Linux 7.1 Beta is available to all RHEL subscribers today with support for POWER8 Little Endian.
Thanks for reading - as ever, we welcome your feedback via the comments.
Last updated: March 15, 2023