
Recently, I've been experimenting with how to build and use composable software catalogs on Kubernetes. Similar to Red Hat Software Collections for Red Hat Enterprise Linux, but adapted for a container context, composable software catalogs let developers add tooling without building a new container image.

This article explains how composable software catalogs use existing container technologies to build on the Software Collections model, and how they can potentially make more options available to container users, simplify builds, and reduce container image sizes.

Software Collections in a containerized world

To understand the need for composable software catalogs, let's go back in time a bit.

Do you remember Software Collections? The motto of this project, backed by Red Hat, was: All versions of any software on your system. Together.

The promise was to build, install, and use multiple versions of software on the same system without affecting system-wide installed packages. In other words, Software Collections provided additional tooling without any change to the current state of the operating system as a whole. It worked well in its time, even winning a Top Innovator Award at DeveloperWeek 2014.

Note: Red Hat Software Collections is available for Red Hat Enterprise Linux 7 and earlier supported releases. Starting with Red Hat Enterprise Linux 8, application streams replace Software Collections.

A new landscape but the same need

Things have changed since 2014. The container revolution popped up and brought features such as execution isolation, file system layering, and volume mounting. This solved quite a lot of problems. Thanks to containers, one could say that the old Software Collections became obsolete. But container orchestrators came along, as well (I'll stick to Kubernetes in this article). Deploying workloads as containers inside pods became standard. Finally, even workloads such as build pipelines or IDEs moved to the cloud and also ran inside containers.

But containers themselves have limitations. At some point, developers start experiencing the same type of need inside containers that Software Collections once tried to solve at the operating system level.

Why revisit Software Collections?

A container is based on a single container image, which is like a template for multiple identical containers. And a container image is optionally based on a single container image parent. To build a container image, you typically start from a basic operating system image. Then you add layers one by one, each on top of the previous one, to provide each additional tool or feature that you need in your container. Thus, each container is based on an image whose layers are overlays in a single inheritance tree. A container image is a snapshot of the current state of an operating system at a given point in time.

For container images, the old promise of Software Collections would be useful. In a container context, the goal of "providing additional tooling without any change to the current state of the operating system" simply becomes "providing additional tooling without having to build a new container image."

A combinatorial explosion of components

Let's take an example:

  • I'd like to run a Quarkus application—let's say the getting started example—directly from source code, in development mode. I will need at least a JDK version and a Maven version on top of the base operating system.
  • I'd also like to test the application with the widest available range of versions and flavors of the JDK, Maven, and the base operating system.

For each combination of the possible variants for those three components (JDK, Maven, and operating system), I would need to build a dedicated container image. With only three variants of each component, that is already 3 × 3 × 3 = 27 images. And what if I also wanted to test with as many Gradle versions as possible? Not to mention including the native build use case, which requires GraalVM. Now imagine the combinatorial explosion that will occur if I decide to also include arbitrary versions of all my preferred tools.

Inheritance versus composition

The current manner of building containers limits us to a single-inheritance model when what we need is composition. Sometimes it would be great to be able to compose additional features or tools inside a container, without having to build a new container image. In fact, we just need to compose container images at runtime. Obviously, allowing that in full generality seems tricky (if not impossible) to implement, at least given the current state of Kubernetes and containers. But what about a more limited case where we would only inject external self-contained tooling or read-only data into an existing container?

Toward composable software catalogs on Kubernetes

Injecting external self-contained tooling or read-only data into a container at runtime is particularly relevant for tools such as Java, Maven, Gradle, and even Node.js, npm, and TypeScript, as well as the growing number of self-contained Go utilities like kubectl and Helm and the Knative or Tekton CLI tools. None of them requires an "installation" process, strictly speaking. To start using them on most Linux variants of a given platform, you only need to download and extract them.

Combining two container technologies

Now let's introduce two container technologies that will allow us to implement this tool injection at runtime:

Container Storage Interface

According to the Kubernetes documentation:

CSI was developed as a standard for exposing arbitrary block and file storage systems to containerized workloads on Container Orchestration Systems (COs) like Kubernetes. With the adoption of the Container Storage Interface, the Kubernetes volume layer becomes truly extensible. Using CSI, third-party storage providers can write and deploy plugins exposing new storage systems in Kubernetes without ever having to touch the core Kubernetes code. This gives Kubernetes users more options for storage and makes the system more secure and reliable.

CSI opens many doors to implementing and integrating storage solutions into Kubernetes. On top of that, the CSI Ephemeral Inline Volumes feature, which is still in beta, allows you to specify a CSI volume, along with its parameters, directly in the pod spec, and only there. This makes it possible to reference, directly inside the pod, the name of a tool to inject into the pod containers.
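As far as I understand the feature, a CSI driver has to advertise that it supports this ephemeral mode through a CSIDriver object registered in the cluster. The following is a minimal sketch of such an object, not the manifest of any particular driver: the driver name tools.csi.example.com is hypothetical, and the attachRequired and podInfoOnMount values are assumptions that fit a node-local driver with no attach step.

```yaml
apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: tools.csi.example.com   # hypothetical driver name, for illustration only
spec:
  attachRequired: false         # assumption: nothing to attach, the volume is node-local
  podInfoOnMount: true          # ask the kubelet to pass pod details with the mount request
  volumeLifecycleModes:
    - Ephemeral                 # allow the driver to be used for inline volumes in a pod spec
```

With Ephemeral listed, a pod can name the driver in a csi volume entry without any PersistentVolume or PersistentVolumeClaim.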

Buildah containers

The buildah tool is a well-known CLI tool that facilitates building Open Container Initiative (OCI) container images. Among many other features, it provides two that are very interesting for us:

  • Creating a container (from an image) that is not executing any command at the start, but can be manipulated, completed, and modified to possibly create a new image from it.
  • Mounting such a container to gain access to its underlying file system.

Buildah containers as CSI volumes

The first attempt at combining CSI and buildah started as a prototype example by the Kubernetes-CSI contributors, csi-driver-image-populator. It was my main inspiration for the work shown in this article.

Providing a very lightweight and simple CSI driver, with the image.csi.k8s.io identifier, csi-driver-image-populator allows container images to be mounted as volumes. Deployed with a DaemonSet, the driver runs on each worker node of the Kubernetes cluster and waits for volume-mount requests. In the following example, a container image reference is specified in the pod as a parameter of the image.csi.k8s.io CSI volume. Using buildah, the corresponding CSI driver pulls the image, creates a container from it, and mounts its file system. The buildah container filesystem is thus available to mount directly as a pod volume. Finally, the pod containers can reference this pod volume and use it:

apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  containers:
    - name: main
      image: main-container-image
      volumeMounts:
        - name: composed-container-volume
          mountPath: /somewhere-to-add-the-composed-container-filesystem
  volumes:
    - name: composed-container-volume
      csi:
        driver: image.csi.k8s.io
        volumeAttributes:
          image: composed-container-image

Upon pod removal, the pod volume is unmounted by the driver, and the buildah container is removed.

Adapting the CSI driver for composable software catalogs

Some aspects of the csi-driver-image-populator prototype do not fit our use case for composable software catalogs:

  • We don't need containers in the pod to have write access to composed image volumes. The whole idea in this article is to inject read-only tools and data to the pod containers through the CSI inline volumes.
  • Sticking to the read-only use case allows us to use a single buildah container for a given tool image, and share its mounted file system with all the pods that reference it. The number of buildah containers then depends only on the number of images provided by the software catalog on the CSI driver side. This opens the door to additional performance optimizations.
  • For both performance and security reasons, we should avoid automatically pulling the container image mounted as a CSI inline volume. Let's pull images by an external component, outside the CSI driver. And let the CSI driver expose only images that were already pulled. Thus we limit the mounted images to a well-defined list of known images. In other words, we stick to a managed software catalog.
  • Finally, for Kubernetes clusters that use an OCI-conformant container runtime (CRI-O, for example), we should be able to reuse images already pulled by the Kubernetes container runtime on the cluster node. This would take advantage of the image-pulling capability of the Kubernetes distribution and comply with its configuration, instead of using a dedicated, distinct mechanism and configuration to pull a new image.

To validate the idea described in this article, the changes just listed were implemented in a newly created CSI driver named csi-based-tool-provider, starting from the csi-driver-image-populator prototype to bootstrap the code.

Providing dedicated tooling images

In general, the new csi-based-tool-provider driver is able to mount, as a pod read-only volume, any file system subpath of any container image. But still, it would be useful to define a typical structure for the container images that would populate such a software catalog. For "no-installation" software such as Java, which is simply delivered as an archive to extract, the most straightforward way to populate the catalog is to use "from scratch" images with the software directly extracted at the root of the filesystem. An example of a Dockerfile for the OpenJDK 11 image would be:

FROM registry.access.redhat.com/ubi8/ubi as builder
WORKDIR /build
RUN curl -L https://github.com/AdoptOpenJDK/openjdk11-binaries/releases/download/jdk-11.0.9.1%2B1/OpenJDK11U-jdk_x64_linux_hotspot_11.0.9.1_1.tar.gz | tar xz

FROM scratch
WORKDIR /
COPY --from=builder /build/jdk-11.0.9.1+1 .

The same holds true for the Maven distribution required by our Quarkus example mentioned earlier. Next, we'll use the Quarkus example as a proof of concept (POC).

Using composable software catalogs with Quarkus

Now let's come back to our Quarkus example. I want to use only an interchangeable basic operating system for my container, without building any dedicated container image. And now I can manage additional tooling through CSI volume mounts on images from my new composable software catalog.

The full deployment looks like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: csi-based-tool-provider-test
spec:
  selector:
    matchLabels:
      app: csi-based-tool-provider-test
  replicas: 1
  template:
    metadata:
      labels:
        app: csi-based-tool-provider-test
    spec:
      initContainers:
      - name: git-sync
        image: k8s.gcr.io/git-sync:v3.1.3
        volumeMounts:
        - name: source
          mountPath: /tmp/git
        env:
        - name: HOME
          value: /tmp
        - name: GIT_SYNC_REPO
          value: https://github.com/quarkusio/quarkus-quickstarts.git
        - name: GIT_SYNC_DEST
          value: quarkus-quickstarts
        - name: GIT_SYNC_ONE_TIME
          value: "true"
        - name: GIT_SYNC_BRANCH
          value: 'main'
      containers:
      - name: main
        image: registry.access.redhat.com/ubi8/ubi
        args:
          - ./mvnw
          - compile
          - quarkus:dev
          - -Dquarkus.http.host=0.0.0.0
        workingDir: /src/quarkus-quickstarts/getting-started
        ports:
          - containerPort: 8080
        env:
          - name: HOME
            value: /tmp
          - name: JAVA_HOME
            value: /usr/lib/jvm/jdk-11
          - name: M2_HOME
            value: /opt/apache-maven-3.6.3
        volumeMounts:
        - name: java
          mountPath: /usr/lib/jvm/jdk-11
        - name: maven
          mountPath: /opt/apache-maven-3.6.3
        - name: source
          mountPath: /src
      volumes:
      - name: java
        csi:
          driver: toolprovider.csi.katalogos.dev
          volumeAttributes:
            image: quay.io/dfestal/csi-tool-openjdk11u-jdk_x64_linux_hotspot_11.0.9.1_1:latest
      - name: maven
        csi:
          driver: toolprovider.csi.katalogos.dev
          volumeAttributes:
            image: quay.io/dfestal/csi-tool-maven-3.6.3:latest
      - name: source
        emptyDir: {}

To clone the example source code from GitHub, I reuse the git-sync utility inside an initContainer of my Kubernetes Deployment, but that's just for the sake of laziness and doesn't relate to the current work.

Making tools available

The first really interesting part of the implementation is:

      ...
      volumes:
      - name: java
        csi:
          driver: toolprovider.csi.katalogos.dev
          volumeAttributes:
            image: quay.io/dfestal/csi-tool-openjdk11u-jdk_x64_linux_hotspot_11.0.9.1_1:latest
      - name: maven
        csi:
          driver: toolprovider.csi.katalogos.dev
          volumeAttributes:
            image: quay.io/dfestal/csi-tool-maven-3.6.3:latest
      ...

This configuration uses the new CSI driver to expose my two tooling images as CSI read-only volumes.

Mounting the tools

The following configuration mounts the Java and Maven installations into the main pod container at the required paths and sets the related environment variables:

      ...
      containers:
      - name: main
        ...
        env:
          ...
          - name: JAVA_HOME
            value: /usr/lib/jvm/jdk-11
          - name: M2_HOME
            value: /opt/apache-maven-3.6.3
        volumeMounts:
        - name: java
          mountPath: /usr/lib/jvm/jdk-11
        - name: maven
          mountPath: /opt/apache-maven-3.6.3
        ...

Note that the pod container owns the final path where the Java and Maven installations will be mounted. So the pod container can also set the related environment variables to the right paths.
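The read-only intent can also be stated explicitly in the pod spec. The sketch below uses the standard readOnly fields of an inline CSI volume and of a volume mount. Whether the csi-based-tool-provider driver requires or honors them is an assumption on my part; the volumes are exposed read-only by the driver in any case.

```yaml
      containers:
      - name: main
        volumeMounts:
        - name: java
          mountPath: /usr/lib/jvm/jdk-11
          readOnly: true      # the container sees the JDK as a read-only mount
      volumes:
      - name: java
        csi:
          driver: toolprovider.csi.katalogos.dev
          readOnly: true      # standard field of an inline CSI volume source
          volumeAttributes:
            image: quay.io/dfestal/csi-tool-openjdk11u-jdk_x64_linux_hotspot_11.0.9.1_1:latest
```

Spelling this out mostly serves as documentation for whoever reads the manifest: the tools are shared, so no container should expect to write to them.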

Using the mounted tools

Finally, the container that will build and run the application source code in development mode can be based on a bare operating system image, and has nothing more to do than call the recommended startup command:

      ...
      - name: main
        image: registry.access.redhat.com/ubi8/ubi
        args:
          - ./mvnw
          - compile
          - quarkus:dev
          - -Dquarkus.http.host=0.0.0.0
        workingDir: /src/quarkus-quickstarts/getting-started
        ...

The example will start on a Red Hat Universal Base Image. But the great thing is that you can make changes, such as switching to an Ubuntu image, and the server will start and run the same way without any other change. And if you want to switch to another version of Maven, just change the reference to the corresponding container image in the maven CSI volume.
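As a sketch of those two changes, only the following lines of the deployment would differ. The ubuntu:20.04 tag is just one possible base image, and the Maven image reference is hypothetical: it stands for whatever other Maven version the catalog actually provides.

```yaml
      containers:
      - name: main
        image: ubuntu:20.04   # was registry.access.redhat.com/ubi8/ubi
        # args, env, workingDir, ports, and volumeMounts stay unchanged
      volumes:
      - name: maven
        csi:
          driver: toolprovider.csi.katalogos.dev
          volumeAttributes:
            # hypothetical catalog image for another Maven version
            image: quay.io/example/csi-tool-maven-3.8.1:latest
```

The mount path and M2_HOME can keep their current values, since they are only paths chosen by the pod. Renaming them to match the new version would be a cosmetic change.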

If you scale up this deployment to ten pods, the same underlying Java and Maven installations will be used. No files will be duplicated on the disk, and no additional containers will be created on the cluster node. Only additional bind mounts will be issued on the cluster node. And the space savings will be the same, however many workloads use the Java and Maven tooling images on this node.

What about performance?

In the very first implementation, the new csi-based-tool-provider driver ran buildah manifest commands to store the various metadata related to mounted images, along with the associated containers and volumes, inside an OCI manifest. Although this design was useful to get a POC working quickly, it required hard locks on the whole CSI mounting and unmounting process (NodePublishVolume and NodeUnpublishVolume CSI requests), in order to avoid concurrent modification of this global index and ensure consistency. Moreover, the buildah container was initially created on the fly at mount time if necessary, and as soon as a given tool was not mounted by any pod container anymore, the corresponding buildah container was removed by the CSI driver.

This design could lead to a mount delay of several seconds, especially when mounting an image for the first time. The driver now uses BadgerDB instead, an embeddable, high-performance, transactional key-value database. This choice allows much better performance and less contention caused by read-write locks. In addition, the list of container images exposed by the driver is now configured through a mounted ConfigMap. Images, as well as their related buildah containers, are managed, created, and cleaned up in background tasks. These two simple changes have reduced the average volume mount delay to a fraction of a second, as shown by the graph of the related Prometheus metric in Figure 1.

Figure 1: The average volume mount delay for updated containers.

On a local Minikube installation, for a simple pod containing one mounted CSI volume with the JDK image mentioned earlier, and one very simple container (doing nothing more than listing the content of the mounted volume and then sleeping), the average delay required to mount the JDK inside the Pod fluctuated between 15 and 20 milliseconds. In comparison with the overall pod startup duration (between 1 and 3 seconds), this is pretty insignificant.

Testing the example

The related code is available in the csi-based-tool-provider GitHub repository, including instructions on how to test it using pre-built container images.

Additional use cases for composable software catalogs

Beyond the example used in this article, we can foresee concrete use cases where such tool injection would be useful. First, it reduces the combinatorial-explosion effect of having to manage, in a single container image, the versioning and lifecycle of both the underlying system and all the various system-independent tools. So it could reduce the overall size of image layers stored on Kubernetes cluster nodes.

Red Hat OpenShift Web Terminal

The Red Hat OpenShift Web Terminal is an example of a tool that could benefit from a software catalog. When opening a web terminal, the OpenShift console starts a pod with a container embedding all the typically required CLI tools. But if you need additional tools, you will have to replace this default container image with your own customized one, built by your own means. This build would not be necessary if we could provide all the CLI tools as volumes in a basic container. Composable software catalogs would also relieve the continuous integration (CI) burden of having to rebuild the all-in-one container image each time one of the tools has to be updated. Going one step further, a catalog should allow using, in the web terminal, exactly the same version of the Kubernetes-related command-line tools (like oc and kubectl) as the version of the underlying OpenShift cluster.

Tekton pipelines

I also imagine how composable software catalogs could be used to inject off-the-shelf build tools into Tekton Task Steps. Here as well, there would be no more need to change and possibly rebuild Step container images each time you want to run your pipeline with different build tool variants or versions.
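To make this more concrete, here is a rough sketch of what such a Task could look like. I haven't tested it: the Task name is hypothetical, it assumes that the cluster lets Tekton pods use inline CSI volumes, and it assumes that the Maven image is laid out like the JDK image, with the distribution extracted at the root of the file system.

```yaml
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: maven-version-check   # hypothetical Task name
spec:
  steps:
    - name: show-versions
      image: registry.access.redhat.com/ubi8/ubi   # plain base image, no build tools inside
      env:
        - name: JAVA_HOME
          value: /usr/lib/jvm/jdk-11
      script: |
        #!/bin/sh
        # Maven comes from the mounted catalog volume, not from the Step image
        /opt/apache-maven-3.6.3/bin/mvn -version
      volumeMounts:
        - name: java
          mountPath: /usr/lib/jvm/jdk-11
        - name: maven
          mountPath: /opt/apache-maven-3.6.3
  volumes:
    - name: java
      csi:
        driver: toolprovider.csi.katalogos.dev
        volumeAttributes:
          image: quay.io/dfestal/csi-tool-openjdk11u-jdk_x64_linux_hotspot_11.0.9.1_1:latest
    - name: maven
      csi:
        driver: toolprovider.csi.katalogos.dev
        volumeAttributes:
          image: quay.io/dfestal/csi-tool-maven-3.6.3:latest
```

Running the pipeline with another JDK or Maven version would then only mean changing the image references in the volumes section, while the Step image stays the same.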

Cloud IDEs

Last but not least, composable software catalogs could benefit the various cloud-enabled IDEs, such as Eclipse Che. The catalogs would make it really easy to:

  • Efficiently switch the Java or Maven installations in a workspace
  • Share these installations among the various containers
  • Have several versions at the same time

Here as well, this new approach could greatly reduce the CI burden. We could stop building and maintaining a container image for each combination of underlying OS and tools. And composable software catalogs would finally unlock the combination of developer tools at runtime according to the developer's needs.

What next?

Although the proof of concept presented in this article is in an early alpha stage, we can already imagine some of the next steps to move it forward.

Welcoming Katalogos

A lot can be built on the foundation of the csi-based-tool-provider. But as a first step, we should certainly set up a wider project dedicated to Kubernetes composable software catalogs. The CSI driver would be its first core component. So we've called this project Katalogos, from the ancient Greek word for a catalog: a register, especially one used for enrollment.

Packaging the project as a complete solution

Once the wider Katalogos project is bootstrapped, these next steps come to mind:

  • Add a Software Catalog Manager component to organize, pull, and manage images as software catalogs and make them available to the CSI driver on each cluster node.
  • Build an operator to install the CSI driver as well as configure the Software Catalog Manager.
  • Define a way to easily inject the required CSI volumes, as well as related environment variables, into pods according to annotations.
  • Provide related tooling and processes to easily build software catalogs that can feed the Software Catalog Manager.
  • Extend the mechanism to support the more complex case of software packages that are not inherently self-contained.

Getting feedback and building a community

This article presented some ideas, with a minimal proof of concept, for a project that, I believe, could meet a real need in the current state of cloud-native development. The article is also a bid to get feedback, spark interest, and gather other use cases where the concept would fit. So, please comment, try the examples, open issues, fork the GitHub repository ... or simply star it.
