Breaking up the Container Monolith
Dan Walsh, of SELinux Coloring Book fame, presented on the work he and his team have been doing with containers. Dan has long been a technical leader in the container and SELinux spaces and is an amazing guy.
If you take a moment to think back to the PDF format, it was originally created by Adobe to represent documents in a consistent way. However, that is not what made it popular and useful. The power of PDF is that the format is open and there are tons of tools to read and write it. That is what made the PDF format ubiquitous. Dan argues the same needs to be true of containers: they need to be every bit as generic as the PDF format, with multiple creation and runtime tools. Competition breeds better software and ultimately better standards.
If you look at containers, they are just Linux in a standard image format. So, what does OpenShift/Kubernetes need to run a container? It needs a standard container image format. It needs a way to pull and push containers to and from registries. It needs a way to explode the container image to disk. It needs a way to execute the container image. Finally, and optionally, it needs a container management API. Red Hat has been working on these problems under the ‘Project Atomic’ flag: a collection of tools, all under one CLI command called atomic.
Standard Container Image Format
The Open Container Initiative (OCI) provides the OCI Image Format to standardize the container image format. This is what allows everything else to be developed, as now there is the “PDF” for containers.
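To make the “PDF” analogy concrete, here is roughly what an OCI image manifest looks like. The digests and sizes below are placeholders, not real values:

```json
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "config": {
    "mediaType": "application/vnd.oci.image.config.v1+json",
    "digest": "sha256:<config-digest>",
    "size": 1234
  },
  "layers": [
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "digest": "sha256:<layer-digest>",
      "size": 56789
    }
  ]
}
```

An image is ultimately just this manifest, a config JSON, and the layer tarballs, which is exactly what lets independent tools read and write it.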
Skopeo (‘remote viewing’ in Greek) is used by the Atomic CLI. Its purpose in life is to view container metadata at the registry without downloading the full image; it can now pull and push images as well. Red Hat worked with CoreOS to split it out into a Go library, which is now available on GitHub.
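A quick sketch of what that looks like in practice, assuming skopeo is installed; the Fedora image name is just an example, and the commands are guarded so the snippet degrades gracefully where skopeo is absent:

```shell
if command -v skopeo >/dev/null 2>&1; then
    # Read the manifest and metadata straight from the registry;
    # no layers are downloaded.
    skopeo inspect docker://docker.io/library/fedora:latest || true

    # skopeo can also copy images between transports,
    # e.g. from a registry into a local OCI layout directory.
    skopeo copy docker://docker.io/library/fedora:latest \
        oci:fedora-oci:latest || true
else
    echo "skopeo not installed; commands shown for illustration"
fi
```

The `|| true` guards are only there because registry access may be unavailable in restricted environments.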
Explode Image to Local Storage
A critical need for containers was a way to mount the container image without actually running it. There are many places where this is useful and required. For example, if you want to run an OpenSCAP scan of a container to make sure there are no critical vulnerabilities in it, you don’t want to actually execute the container; you need a way to mount the data in a standard way. This is now possible with the ‘atomic mount’ command. Of course, this project can also be found on GitHub.
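A rough sketch of that workflow, assuming the atomic CLI is installed; ‘fedora’ is a placeholder image name and mounting generally requires root:

```shell
if command -v atomic >/dev/null 2>&1; then
    mkdir -p /tmp/fedora-rootfs
    # Mount the image's filesystem without executing anything in it...
    atomic mount fedora /tmp/fedora-rootfs || true
    # ...inspect or scan it like any directory tree...
    ls /tmp/fedora-rootfs
    # ...then unmount when done.
    atomic umount /tmp/fedora-rootfs || true
else
    echo "atomic CLI not installed; commands shown for illustration"
fi
```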
Execute the Container
Executing the container is tackled by the other half of the OCI standard, the OCI Runtime Specification, which describes what the container should look like on disk. runc is the default implementation of this standard; as of Docker 1.11, runc is Docker’s default backend. There are a couple of other runtime implementations, such as runv and Clear Containers; however, Docker only supports runc as a backend. The runtime standard is what allows the PDF-like tool-chain ecosystem.
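The on-disk layout the spec describes is just a bundle directory holding a rootfs and a config.json. A minimal sketch, assuming runc is installed (the actual run is left commented out because the rootfs here is empty):

```shell
# Prepare an OCI bundle: a root filesystem plus a config.json.
mkdir -p mycontainer/rootfs
cd mycontainer

if command -v runc >/dev/null 2>&1; then
    # Generate a default config.json describing the container on disk.
    runc spec
    # With rootfs/ populated (e.g. exported from an existing image),
    # the container would be started with:
    #   runc run mycontainer
else
    echo "runc not installed; commands shown for illustration"
fi
```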
These various components make up the bare minimum for running a container. That said, the container space could also use other concepts borrowed from full Linux distributions.
The industry needs a way to sign containers in an open and scalable manner, similar to GPG web-of-trust. Strangely enough, containers can now be signed with GPG; moreover, the signatures can be detached from the image. They are also not specific to any registry. You can now trust containers produced by trusted organizations, as well as verify they have not been altered. You can find a signing example here and one for managing trust here.
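A hedged sketch of what signing looks like with skopeo; the registry name and the ‘dev@example.com’ identity are placeholders, and signing requires a matching GPG key in your keyring:

```shell
if command -v skopeo >/dev/null 2>&1; then
    # Copy an image while attaching a detached GPG signature to it.
    skopeo copy --sign-by dev@example.com \
        docker://registry.example.com/myapp:latest \
        docker://registry.example.com/myapp:signed || true
else
    echo "skopeo not installed; command shown for illustration"
fi
```

On the consuming host, trust in those signatures is then enforced through a policy file (/etc/containers/policy.json).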
Red Hat has developed Atomic Host over the last several years. Atomic Host is an OS based on RHEL but designed for running containers in a lightweight fashion; on Atomic Host, software is shipped as containers. Dan believes the OS of the future will be a very basic OS where system services, and everything else, are installed as containers. The Atomic CLI can now use Skopeo to pull down an image, use OSTree to store the image layers on disk, generate a systemd unit file, and then use runc to run the container. Note that there is no daemon involved here, which is important for the boot process: it allows etcd and flannel to start before the container runtime! Even open source Docker can now run as a system container.
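The system-container flow above can be sketched as follows, assuming the atomic CLI is available; the etcd image name is illustrative and the commands need root:

```shell
if command -v atomic >/dev/null 2>&1; then
    # Pulls with Skopeo, stores layers in OSTree, writes a systemd unit,
    # and runs via runc -- no container daemon in the path.
    atomic install --system --name=etcd \
        registry.access.redhat.com/rhel7/etcd || true
    systemctl start etcd || true
else
    echo "atomic CLI not installed; commands shown for illustration"
fi
```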
Standalone containers are a method to provide normal RHEL content as containers, rather than using RPM. Daemons run in containers with standard ports and volumes prepackaged for standard use-cases. Think of Apache when started at boot, listening on privileged ports, reading local content, etc.
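Wiring such a standalone container into the host might look like this hypothetical systemd unit; the names, bundle path, and container ID are all illustrative:

```ini
# /etc/systemd/system/httpd-container.service (hypothetical)
[Unit]
Description=Apache HTTP Server (containerized)
After=network-online.target

[Service]
# Run the OCI bundle directly with runc; no daemon required.
ExecStart=/usr/bin/runc run --bundle /var/lib/containers/httpd httpd
Restart=on-failure

[Install]
WantedBy=multi-user.target
```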
Read-only container images should be the default, but they are not currently. In production, images should really be immutable. This would allow us to get rid of the COW filesystem, improving storage performance. Shared filesystems would also be a huge benefit with read-only containers. Currently, for a runtime in a cluster to run a container, that container has to be pulled down from a registry and stored on local storage. Pushing new containers to a registry just so all your OpenShift nodes can download them wastes a ton of CPU, storage, and network resources; moving to an NFS storage model would dramatically improve performance and image management. Dan’s team is working on this problem by supporting network-based storage.
Container Image Development Tools
Dockerfile sucks. It’s a fact. It has been four years and we still have to use Dockerfile, which is a horrible version of bash, to describe what is, at its core, just a tarball and a JSON file. We now have Ansible Container, which gives us another way to describe a containerized application; however, it still uses Dockerfile under the hood. We need a way to build images without a container runtime daemon. This brings us to Buildah, an open source tool for building containers without using Dockerfile at all. It also allows us to build containers without all the container development tools inside them (yum, GCC, secrets, etc.). Basically, Buildah produces clean containers, which means the container can be much, much smaller. It even supports reading in a Dockerfile to produce a container, but that kind of defeats the point.
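A hedged sketch of a Buildah build, assuming buildah is installed on a Fedora/RHEL host; the package choice is illustrative and most steps need root:

```shell
if command -v buildah >/dev/null 2>&1; then
    # Start from an empty container -- no base image, no Dockerfile.
    ctr=$(buildah from scratch)

    # Mount the container's root filesystem and install content into it
    # from the *host*, so no yum/GCC/secrets ever land inside the image.
    mnt=$(buildah mount "$ctr")
    dnf install -y --installroot "$mnt" httpd || true

    buildah config --entrypoint '["/usr/sbin/httpd", "-DFOREGROUND"]' "$ctr"
    buildah umount "$ctr" || true
    buildah commit "$ctr" minimal-httpd || true
else
    echo "buildah not installed; commands shown for illustration"
fi
```

Because the package manager runs on the host against the mounted rootfs, the committed image contains only what httpd actually needs.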
Container Management API
Red Hat started working on the CRI-O effort. It is not a fork of Docker; rather, it supports the Kubernetes container management API, or to put it another way, “it implements the Kubelet Container Runtime Interface (CRI) using OCI conformant runtimes”. Kubernetes communicates with CRI-O, which in turn executes the container.
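Wiring the kubelet to CRI-O amounts to pointing it at CRI-O’s socket; these are the historical kubelet flags from this era, and exact flag names and paths vary by Kubernetes version:

```
--container-runtime=remote
--container-runtime-endpoint=unix:///var/run/crio/crio.sock
```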
Dan’s team at Red Hat has been focused on bringing open source development principles and concepts to the container lifecycle. While containers to date have all been open source, the container management system has been one large monolith. Breaking it up into individual components allows for rapid innovation: each component can be developed, forked, or replaced by net-new tools to meet the needs of the container ecosystem. These efforts are now paying off, and the container landscape is much better for it.