When you pull an OCI image from the registry, you implicitly trust that it contains what its builder claims it does. They may even provide an SBOM for this image, but the SBOM itself must also be trusted. Nothing prevents a builder from reporting an innocuous SBOM, while injecting malware into the image.
Reproducible builds render this sort of undetectable tampering impossible: A user can directly rebuild the image and verify that the result is bit-for-bit identical to the published blobs. In fact, this capability is useful for end users and builders alike because it provides a powerful software supply chain security check against infrastructure compromises. In an era where digital sovereignty is top of mind for many, reproducibility is increasingly important for establishing trust in artifacts and verifying them.
Hummingbird reproducibility
Red Hat recently announced Project Hummingbird, a catalog of hardened minimal container images built to address the needs of environments that demand near-zero CVEs. Hummingbird images are built in Konflux, a Tekton-based software factory with a focus on supply chain security. All images built by Konflux also come with both an SBOM and a SLSA provenance artifact.
Using these Konflux artifacts, it's possible to rebuild any Hummingbird image using just cosign and podman. First, we download and verify the SLSA provenance attestation for the desired image (for now, the key used in this example is only available in the upstream GitLab project):
$ IMAGE=quay.io/hummingbird-hatchling/jq:latest
$ curl -LO https://gitlab.com/redhat/hummingbird/containers/-/raw/fc5c29670347ea2666ec2910a28880f76f5cdc4e/ci/key.pub
$ cosign verify-attestation --key key.pub --insecure-ignore-tlog \
--type slsaprovenance $IMAGE > attestation.json
Run the rebuild script, feeding in the attestation and capturing the final image ID:
$ iid=$(podman run -i --rm --privileged -v /mnt \
quay.io/hummingbird-ci/builder rebuild < attestation.json)
And finally, pull the original image and verify its image ID matches what was rebuilt locally:
$ iid2=$(podman pull $IMAGE)
$ [ "$iid" = "$iid2" ] && echo "Identical"
If it prints Identical, then congratulations! You've successfully reproduced a Hummingbird image.
Note the comparison here is checking the image ID, which is of uncompressed content. This is distinct from the repo digest once it's pushed to a registry, which is calculated from compressed contents. More detailed steps and information are available in the Project Hummingbird upstream repo.
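To make the distinction concrete, here is a minimal sketch that uses a scratch file as a stand-in for a layer tarball. It only illustrates the general principle that compressing content changes its digest while the underlying content digest stays stable. The file name and contents are made up for the example, and real layers are tar archives handled by the container tooling rather than by hand.

```shell
# Illustrative only: a scratch file stands in for an uncompressed layer tarball.
workdir=$(mktemp -d)
printf 'example layer content\n' > "$workdir/layer.tar"

# Digest of the uncompressed bytes.
uncompressed=$(sha256sum "$workdir/layer.tar" | cut -d' ' -f1)

# Digest of the gzip-compressed bytes (-n leaves name and timestamp out of the header).
compressed=$(gzip -n -c "$workdir/layer.tar" | sha256sum | cut -d' ' -f1)

# Decompressing again yields the original bytes, and therefore the original digest.
roundtrip=$(gzip -n -c "$workdir/layer.tar" | gzip -d -c | sha256sum | cut -d' ' -f1)

echo "uncompressed: sha256:$uncompressed"
echo "compressed:   sha256:$compressed"
echo "round trip:   sha256:$roundtrip"
```

The first and last digests match while the middle one differs, which is why comparing image IDs sidesteps any difference in how layers were compressed for the registry.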
Achieving reproducibility is not a one-time task. It's a property that requires constant upkeep as new content is added or as build tooling evolves. For example, Hummingbird images have recently started leveraging chunkah, a tool that splits images into content-based layers for more efficient updates and storage. Part of the onboarding process for this required fixing various reproducibility issues in the splitting process.
To ensure all our images are always reproducible, our CI rebuilds images on every change. The image is built once in Konflux, and then rebuilt once more outside of it using steps similar to the above.
Note: It's important to understand that in this process, we did not rebuild the individual RPMs that go into Hummingbird images. What we've reproduced is only the assembly of those RPMs into an OCI image.
How it works
There are a lot of things that must come together for this to succeed. First, most inputs to the Hummingbird image builds are kept in the Git repo itself. This allows us to do CI testing before any input is updated (this is not very different from how a Rust application may use Cargo.lock). But it also means that given an image and the associated Git commit from which it was built, we know exactly which RPMs were used to produce it.
Both Konflux and the rebuild process above use Buildah for building images. In Buildah 1.41, support was added for reproducible builds using the --source-date-epoch (or SOURCE_DATE_EPOCH environment variable) and --rewrite-timestamp options. SOURCE_DATE_EPOCH is a standardized way to tell tools to use a fixed timestamp for reproducible output. When building Hummingbird images, the SOURCE_DATE_EPOCH comes from the timestamp of the Git commit we're building from.
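As an analogy for what a pinned timestamp buys you, the sketch below archives two directories with identical contents but different modification times. It is not how Buildah implements these options. It assumes GNU tar, and the epoch value and directory layout are made up for the example.

```shell
# Assumed value for illustration. In a Git checkout it would typically come from:
#   SOURCE_DATE_EPOCH=$(git log -1 --format=%ct)
export SOURCE_DATE_EPOCH=1700000000

# Archive directory $1 to stdout with ordering, timestamps and ownership pinned.
make_archive() {
  tar --sort=name --mtime="@${SOURCE_DATE_EPOCH}" \
      --owner=0 --group=0 --numeric-owner \
      -C "$1" -cf - .
}

# Two trees with the same content but different modification times.
workdir=$(mktemp -d)
mkdir -p "$workdir/a" "$workdir/b"
printf 'hello\n' > "$workdir/a/file.txt"
printf 'hello\n' > "$workdir/b/file.txt"
touch -t 200101010000 "$workdir/a/file.txt" "$workdir/a"
touch -t 202006151200 "$workdir/b/file.txt" "$workdir/b"

# Without pinning, the differing timestamps leak into the archives.
plain_a=$(tar -C "$workdir/a" -cf - . | sha256sum | cut -d' ' -f1)
plain_b=$(tar -C "$workdir/b" -cf - . | sha256sum | cut -d' ' -f1)

# With pinning, both archives are byte-for-byte identical.
hash_a=$(make_archive "$workdir/a" | sha256sum | cut -d' ' -f1)
hash_b=$(make_archive "$workdir/b" | sha256sum | cut -d' ' -f1)

echo "plain:  $plain_a vs $plain_b"
echo "pinned: $hash_a vs $hash_b"
```

The same idea applies to image layers: once every timestamp is derived from one agreed value, the time at which the build actually ran no longer affects the output.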
The Git commit can be obtained from the SLSA provenance. But other factors that influence the build output are not captured by the Git commit:
- Konflux uses a containerized version of buildah. For maximum reproducibility, we need the exact buildah image that was used during the build. This is obtained from the SLSA provenance.
- Hummingbird images are built using a multi-stage build process where a builder image creates the target image (using e.g. dnf install --installroot=...). When reproducing builds, we need the exact builder image that was used during the build. This is obtained from the image's SBOM.
And this is what the rebuild command does: It identifies all of these inputs using the SLSA attestation and SBOM and then builds the image.
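If you are curious what the attestation looks like inside, the sketch below decodes one by hand. It assumes the file holds a DSSE envelope whose base64 payload is an in-toto statement with SLSA v0.2 materials, which is what we would expect from the cosign command above. The envelope here is a hand-made, trimmed stand-in: the repository URI and commit hash are placeholders, and how the rebuild command itself reads these fields is not shown.

```shell
# Stand-in data: a trimmed, hand-made in-toto statement wrapped in a DSSE envelope.
# The repository URI and commit hash below are placeholders, not real values.
workdir=$(mktemp -d)
statement='{"predicateType":"https://slsa.dev/provenance/v0.2","predicate":{"materials":[{"uri":"git+https://example.invalid/containers.git","digest":{"sha1":"0123456789abcdef0123456789abcdef01234567"}}]}}'
payload=$(printf '%s' "$statement" | base64 | tr -d '\n')
printf '{"payloadType":"application/vnd.in-toto+json","payload":"%s","signatures":[]}\n' \
  "$payload" > "$workdir/attestation.json"

# Pull the base64 payload out of the envelope and decode it.
# sed keeps the sketch dependency-free; a JSON-aware tool is more robust.
decode_statement() {
  sed -n 's/.*"payload":"\([^"]*\)".*/\1/p' "$1" | base64 -d
}

decoded=$(decode_statement "$workdir/attestation.json")
printf '%s\n' "$decoded"

# List the material URIs recorded in the statement.
material_uris=$(printf '%s\n' "$decoded" | grep -o '"uri":"[^"]*"' | cut -d'"' -f4)
printf 'materials: %s\n' "$material_uris"
```

Each material pairs a URI with a digest, which is the kind of record that lets a rebuild pin its inputs to exact versions rather than to moving tags or branches.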
Reaching reproducibility
Installing RPMs is currently an imperative process. RPMs are unpacked, scriptlets are run in a specific order, and various files are created at runtime, including non-trivial ones like databases. All of these had to be made reproducible. For instance, the RPM SQLite database is in WAL mode by default, which produces non-deterministic journal files, so it must be switched to DELETE mode after installation (sometimes called "parking"). The order in which RPMs are installed was made reproducible by having dnf sort the package list before feeding it to RPM. The order in which OCI annotations are added also matters. Various non-reproducible bits that don't matter or don't belong in images were removed (such as /etc/machine-id).
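The ordering fix is easy to picture with a small sketch. This is not the dnf change itself, just an illustration of the principle: once a list is sorted in a locale-independent way, the order in which the entries arrived no longer affects the result. The helper name and package names are made up for the example.

```shell
# Hypothetical helper: sort a package list bytewise, independent of the user's locale.
normalize_pkglist() {
  LC_ALL=C sort -u
}

# The same packages, discovered in two different orders.
list_one=$(printf '%s\n' zlib bash glibc | normalize_pkglist)
list_two=$(printf '%s\n' glibc zlib bash | normalize_pkglist)

# After normalizing, both lists hash to the same value.
hash_one=$(printf '%s\n' "$list_one" | sha256sum | cut -d' ' -f1)
hash_two=$(printf '%s\n' "$list_two" | sha256sum | cut -d' ' -f1)

printf '%s\n' "$list_one"
echo "hashes: $hash_one vs $hash_two"
```

Setting LC_ALL=C matters here because collation rules vary between locales, and a build that sorts differently on two machines is not reproducible.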
Once we had reproducible builds locally, we also had to adapt the Konflux build process itself to better support reproducibility for production builds. For example, Konflux injects various security-related metadata into built images for use by scanners. For better reproducibility, we moved this logic to live directly in the Containerfile build process rather than in Konflux. The Konflux release pipeline still verifies that this metadata is correct and in line with supply chain security best practices.
Looking ahead
Reproducible builds play a key role in software supply chain security, but that role cannot be only theoretical: reproducibility as a property is worthless until someone actually exercises it. This is why Hummingbird images were designed to be easy to reproduce, with minimal tooling. A major outcome of this effort was the start of discussions to make reproducible builds a priority feature of Konflux. This entails moving a lot of the work that happened in Hummingbird out to shared tooling, where more images can achieve reproducibility. Until then, feel free to grab a Project Hummingbird image and try reproducing it!