Use bootc logically bound images to deploy a Kafka cluster

Logically bound images are a recently added feature in bootc that allows any container image to be managed directly by bootc. One of the many benefits this provides is simplified system management. An application image can be "bound" to a bootc operating system image as part of the bootc container build.

This simplified system management would have been useful for a project I previously worked on. The project was a large web application composed of dozens of microservices running in a Kubernetes cluster. These microservices primarily used Kafka to communicate. Our Kafka instance was also deployed in a Kubernetes cluster using the amq-streams operator.

The operator made the deployment and management of the Kafka cluster simple; however, keeping the cluster stable was a constant battle. One of the main issues was Kafka's heavy memory requirement: the minimum recommendation is 32 GB of memory per Kafka node, with at least 3 nodes. Due to the cost, our Kubernetes nodes were not sized to accommodate this. We likely could have leveraged many of the newer Kubernetes features to stabilize the cluster; however, I wonder if it would have made more sense to run each Kafka node directly on a virtual machine (VM).

We already ran our other critical workloads, like our databases, outside of Kubernetes, and Kafka was equally critical. Running Kafka directly on a cloud-based VM would have made scaling each node easy: just increase the memory in the cloud provider. By using a bootc image with a logically bound Kafka image, I should be able to preserve the benefits of a container workflow while still allowing each node to scale vertically beyond the limits of a Kubernetes cluster. Let's see how this looks in practice.

Building the broker base image

This environment will be based on the basic example from the Kafka image docs. The goal is to create a production-like test environment, so we need 3 broker nodes and 3 controller nodes. Let's start by creating a broker image. Every Kafka broker node shares the same basic configuration, so let's first create a base image with that configuration defined. Each individual broker also requires a few unique parameters, so we will later create additional images based on the kafka-broker image.

Create the directory structure for the image

First, create a directory to contain all the Kafka broker related files:

$ mkdir kafka-broker && cd kafka-broker
$ mkdir -p usr/share/containers/systemd

Create the quadlet container

Next, let's create the quadlet container definition at usr/share/containers/systemd/kafka-broker.container, relative to the kafka-broker directory, so that it ends up at /usr/share/containers/systemd/kafka-broker.container in the image. The quadlet docs explain the structure of this file quite well. The environment variables are the same ones found in the examples from the Kafka image docs, with some adaptations to the hostnames and ports. This is to ensure we can access the nodes when they are running in virtual machines rather than in containers. See below:

[Container]
Image=docker.io/apache/kafka:3.8.0
GlobalArgs=--storage-opt=additionalimagestore=/usr/lib/bootc/storage
PublishPort=9092:9092
Environment=KAFKA_PROCESS_ROLES=broker
Environment=KAFKA_INTER_BROKER_LISTENER_NAME=PLAINTEXT
Environment=KAFKA_CONTROLLER_LISTENER_NAMES=CONTROLLER
Environment=KAFKA_LISTENER_SECURITY_PROTOCOL_MAP=CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
Environment=KAFKA_CONTROLLER_QUORUM_VOTERS=4@kafka-controller-1:9093,5@kafka-controller-2:9093,6@kafka-controller-3:9093
Environment=KAFKA_LISTENERS='PLAINTEXT://:9092'
Environment=KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS=0
[Unit]
Description=Kafka broker
[Install]
WantedBy=default.target
[Service]
Restart=always

Create the bootc image Containerfile

The only other file we need to create is the Containerfile in the root of the kafka-broker directory. A few interesting things to note about this Containerfile:

It installs firewalld and opens port 9092/tcp. Firewalld is used for convenience, and port 9092 is how we will communicate with the Kafka broker service.
It creates a symlink to bind the kafka-broker image to the bootc image.

See below:

FROM registry.redhat.io/rhel9/rhel-bootc:9.4
COPY ./usr/. /usr
RUN <<EOF
set -euo pipefail
# install dependencies
dnf -y install firewalld
dnf clean all
# bind kafka-node image to the bootc image
ln -s /usr/share/containers/systemd/kafka-broker.container /usr/lib/bootc/bound-images.d/kafka-broker.container
# expose Kafka port 9092
firewall-offline-cmd --add-port 9092/tcp
EOF
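
As a quick way to catch a mistyped symlink before booting anything, the binding can be checked against an unpacked or mounted copy of the image's filesystem. This is a minimal sketch, not bootc's own validation: the function name check_bound_images is hypothetical, it assumes the symlinks are absolute (as in the Containerfile above), and it treats a target as valid simply when it contains an Image= line.

```shell
#!/usr/bin/env bash
set -euo pipefail

# check_bound_images ROOT
# ROOT is a directory holding the image's filesystem (use "" for the live system).
# For every symlink in ROOT/usr/lib/bootc/bound-images.d, resolve the absolute
# target under ROOT and report whether it contains an Image= line.
check_bound_images() {
  local root=$1 link target status=0
  for link in "${root}"/usr/lib/bootc/bound-images.d/*; do
    # Skip anything that is not a symlink (including an unmatched glob).
    [ -L "${link}" ] || continue
    target="${root}$(readlink "${link}")"
    if grep -q '^Image=' "${target}" 2>/dev/null; then
      echo "ok: ${link##*/} -> $(grep -m1 '^Image=' "${target}")"
    else
      echo "broken: ${link##*/}"
      status=1
    fi
  done
  return "${status}"
}
```

Calling check_bound_images with the path of an extracted root filesystem prints one line per bound image and returns a non-zero status if any symlink points at a missing or unexpected file.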

Build the container image

Now that the files are in place, let's build the image locally. This will serve as the base image for the 3 broker nodes:

$ podman build . -t localhost/kafka-broker:latest

Building unique Kafka broker nodes

Each broker requires a unique hostname and node_id. This could be accomplished in a few different ways. I found it straightforward to create a unique container image for each node. This would allow me to continue using all the same familiar container build infrastructure in a production pipeline to deploy the nodes. However, you might have other infrastructure that would make more sense. In the end, each image just needs to add a couple of files to define the unique parameters.

Create the directories

Let's create 3 new directories, one for each broker, next to the kafka-broker directory:

$ cd ..
$ mkdir kafka-broker-1
$ mkdir kafka-broker-2
$ mkdir kafka-broker-3

Each broker also needs an etc and usr directory for configuration files. Let's create those now:

$ mkdir -p kafka-broker-1/usr/share/containers/systemd/kafka-broker.container.d
$ mkdir -p kafka-broker-2/usr/share/containers/systemd/kafka-broker.container.d
$ mkdir -p kafka-broker-3/usr/share/containers/systemd/kafka-broker.container.d
$ mkdir -p kafka-broker-1/etc
$ mkdir -p kafka-broker-2/etc
$ mkdir -p kafka-broker-3/etc

Create the unique broker configuration files

First, let's create the hostname file. For this test environment, we'll set each node's hostname to kafka-broker-{id}. So, let's create kafka-broker-1/etc/hostname with the following contents. Do the same for the other two nodes:

kafka-broker-1

The other configuration file will be created at kafka-broker-1/usr/share/containers/systemd/kafka-broker.container.d/10-broker.conf:

[Container]
Environment=KAFKA_NODE_ID=1
Environment=KAFKA_ADVERTISED_LISTENERS='PLAINTEXT://kafka-broker-1:9092'

Finally, create a Containerfile in the root at kafka-broker-1/Containerfile:

FROM localhost/kafka-broker
COPY ./usr/. /usr
COPY ./etc/. /etc

Create the same files for the other two brokers, substituting the ID.
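
Since only the ID differs between brokers, the three sets of files can also be generated in a loop. The following is a minimal sketch that writes exactly the files described above; it assumes it is run from the directory that contains (or should contain) the kafka-broker-1, kafka-broker-2, and kafka-broker-3 directories.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Generate the per-broker files described above for brokers 1 to 3.
for id in 1 2 3; do
  dir="kafka-broker-${id}"
  dropin_dir="${dir}/usr/share/containers/systemd/kafka-broker.container.d"
  mkdir -p "${dir}/etc" "${dropin_dir}"

  # Hostname of the node.
  printf 'kafka-broker-%s\n' "${id}" > "${dir}/etc/hostname"

  # Quadlet drop-in with the broker-specific environment variables.
  cat > "${dropin_dir}/10-broker.conf" <<EOF
[Container]
Environment=KAFKA_NODE_ID=${id}
Environment=KAFKA_ADVERTISED_LISTENERS='PLAINTEXT://kafka-broker-${id}:9092'
EOF

  # Containerfile that layers the unique files on top of the base image.
  cat > "${dir}/Containerfile" <<'EOF'
FROM localhost/kafka-broker
COPY ./usr/. /usr
COPY ./etc/. /etc
EOF
done
```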

Build and push the broker container images

Now that we have defined all the broker specific configuration, let's build a container image for each broker. We will also push each broker image to a registry to enable bootc upgrades:

$ export USER=<your-quay-username>
$ cd kafka-broker-1 && podman build . -t "quay.io/$USER/kafka-broker-1:latest" && cd ..
$ cd kafka-broker-2 && podman build . -t "quay.io/$USER/kafka-broker-2:latest" && cd ..
$ cd kafka-broker-3 && podman build . -t "quay.io/$USER/kafka-broker-3:latest" && cd ..
$ podman push "quay.io/$USER/kafka-broker-1:latest"
$ podman push "quay.io/$USER/kafka-broker-2:latest"
$ podman push "quay.io/$USER/kafka-broker-3:latest"

Building the controller base image

Now we need to build the 3 controller node images. This is a very similar process to building the broker nodes.

Create the directory structure for the image:

$ mkdir kafka-controller && cd kafka-controller
$ mkdir -p usr/share/containers/systemd

Create the quadlet container at usr/share/containers/systemd/kafka-controller.container:

[Container]
Image=docker.io/apache/kafka:3.8.0
GlobalArgs=--storage-opt=additionalimagestore=/usr/lib/bootc/storage
PublishPort=9093:9093
Environment=KAFKA_PROCESS_ROLES=controller
Environment=KAFKA_INTER_BROKER_LISTENER_NAME=PLAINTEXT
Environment=KAFKA_CONTROLLER_LISTENER_NAMES=CONTROLLER
Environment=KAFKA_CONTROLLER_QUORUM_VOTERS=4@kafka-controller-1:9093,5@kafka-controller-2:9093,6@kafka-controller-3:9093
Environment=KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS=0
Environment=KAFKA_LISTENERS=CONTROLLER://:9093
[Unit]
Description=Kafka controller
[Install]
WantedBy=default.target
[Service]
Restart=always

Create the bootc image Containerfile:

FROM registry.redhat.io/rhel9/rhel-bootc:9.4
COPY ./usr/. /usr
RUN <<EOF
set -euo pipefail
# install dependencies
dnf -y install firewalld
dnf clean all
# bind kafka-node image to the bootc image
ln -s /usr/share/containers/systemd/kafka-controller.container /usr/lib/bootc/bound-images.d/kafka-controller.container
# expose Kafka port 9092, 9093
firewall-offline-cmd --add-port 9093/tcp --add-port 9092/tcp
EOF

Build the container image:

$ podman build . -t localhost/kafka-controller:latest

Building unique Kafka controller nodes

Create the directories next to the kafka-controller directory:

$ cd ..
$ mkdir -p kafka-controller-1/usr/share/containers/systemd/kafka-controller.container.d
$ mkdir -p kafka-controller-2/usr/share/containers/systemd/kafka-controller.container.d
$ mkdir -p kafka-controller-3/usr/share/containers/systemd/kafka-controller.container.d
$ mkdir -p kafka-controller-1/etc
$ mkdir -p kafka-controller-2/etc
$ mkdir -p kafka-controller-3/etc

Create the unique controller configuration files (below).

kafka-controller-1/etc/hostname:

kafka-controller-1

kafka-controller-1/usr/share/containers/systemd/kafka-controller.container.d/10-controller.conf:

[Container]
Environment=KAFKA_NODE_ID=4
Environment=KAFKA_ADVERTISED_LISTENERS='PLAINTEXT://kafka-controller-1:9092'

kafka-controller-1/Containerfile:

FROM localhost/kafka-controller
COPY ./usr/. /usr
COPY ./etc/. /etc

Create the same files for the other two controllers, substituting the ID. The controller NODE_IDs start at 4 and count up, so:

kafka-controller-1 -> KAFKA_NODE_ID=4
kafka-controller-2 -> KAFKA_NODE_ID=5
kafka-controller-3 -> KAFKA_NODE_ID=6
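
The node IDs chosen here have to agree with the KAFKA_CONTROLLER_QUORUM_VOTERS value in both base quadlet files. As a rough cross-check, the mapping can be written down once and the voters string derived from it. This is only a sketch; the function names controller_node_id and quorum_voters are hypothetical helpers, not part of Kafka or bootc.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Node ID for controller N is N + 3, because the brokers already use IDs 1 to 3.
controller_node_id() {
  echo $(( $1 + 3 ))
}

# Build the KAFKA_CONTROLLER_QUORUM_VOTERS value for controllers 1..COUNT.
quorum_voters() {
  local count=$1 voters="" n
  for n in $(seq 1 "${count}"); do
    # Prefix a comma only when the list already has an entry.
    voters+="${voters:+,}$(controller_node_id "${n}")@kafka-controller-${n}:9093"
  done
  echo "${voters}"
}

quorum_voters 3
# prints 4@kafka-controller-1:9093,5@kafka-controller-2:9093,6@kafka-controller-3:9093
```

The printed value matches the Environment=KAFKA_CONTROLLER_QUORUM_VOTERS line used in the kafka-broker and kafka-controller quadlet files.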

Build and push the controller container images:

$ export USER=<your-quay-username>
$ cd kafka-controller-1 && podman build . -t "quay.io/$USER/kafka-controller-1:latest" && cd ..
$ cd kafka-controller-2 && podman build . -t "quay.io/$USER/kafka-controller-2:latest" && cd ..
$ cd kafka-controller-3 && podman build . -t "quay.io/$USER/kafka-controller-3:latest" && cd ..
$ podman push "quay.io/$USER/kafka-controller-1:latest"
$ podman push "quay.io/$USER/kafka-controller-2:latest"
$ podman push "quay.io/$USER/kafka-controller-3:latest"

Verify the images

Phew—that was a lot of image builds. Let's take a break and look at all the images we built:

$ podman images
REPOSITORY                                TAG         IMAGE ID      CREATED        SIZE
quay.io/ckyrouac/kafka-controller-3       latest      ddeba64d6bf0  8 minutes ago  1.58 GB
quay.io/ckyrouac/kafka-controller-2       latest      955517c0b3f1  8 minutes ago  1.58 GB
quay.io/ckyrouac/kafka-controller-1       latest      32cfad55950c  8 minutes ago  1.58 GB
localhost/kafka-controller                latest      8b6929f1386f  8 minutes ago  1.58 GB
quay.io/ckyrouac/kafka-broker-3           latest      295f5446fd5c  9 minutes ago  1.58 GB
quay.io/ckyrouac/kafka-broker-2           latest      ef47f4b33421  9 minutes ago  1.58 GB
quay.io/ckyrouac/kafka-broker-1           latest      e26067698d24  9 minutes ago  1.58 GB
localhost/kafka-broker                    latest      bd1f12f4a216  9 minutes ago  1.58 GB
quay.io/centos-bootc/centos-bootc         stream9     ae1314a556bb  4 days ago     1.53 GB
quay.io/centos-bootc/bootc-image-builder  latest      080b71f914e7  5 days ago     741 MB
docker.io/apache/kafka                    3.8.0       b610bd8a193a  2 months ago   384 MB

Perfect, we see the 3 broker and 3 controller images. For a quick sanity check, let's use podman-bootc to quickly boot a controller node and inspect it to make sure Kafka starts at boot:

$ podman-bootc run quay.io/ckyrouac/kafka-controller-1
... wait for it to boot into a shell...
[root@kafka-controller-1 ~]# podman --storage-opt=additionalimagestore=/usr/lib/bootc/storage images
REPOSITORY              TAG         IMAGE ID      CREATED       SIZE        R/O
docker.io/apache/kafka  3.8.0       b610bd8a193a  2 months ago  384 MB      true
[root@kafka-controller-1 ~]# systemctl status kafka-controller
● kafka-controller.service - Kafka controller
     Loaded: loaded (/usr/share/containers/systemd/kafka-controller.container; generated)
     Active: active (running) since Tue 2024-10-15 19:44:18 UTC; 25s ago
   Main PID: 1150 (conmon)
      Tasks: 23 (limit: 11990)
     Memory: 173.3M
        CPU: 1.049s
     CGroup: /system.slice/kafka-controller.service
             ├─libpod-payload-fc0c3de32bf2713f69f2e48537873603a4ab7b874c0fb3345f6b9844ee6884f0
             │ ├─1152 bash /etc/kafka/docker/run
             │ ├─1159 bash /etc/kafka/docker/run
             │ └─1160 /opt/java/openjdk/bin/java -Xmx256M -XX:SharedArchiveFile=/opt/kafka/storage.jsa -Dcom.sun.ma>
             └─runtime
               └─1150 /usr/bin/conmon --api-version 1 -c fc0c3de32bf2713f69f2e48537873603a4ab7b874c0fb3345f6b9844ee>
[root@kafka-controller-1 ~]# sudo podman ps
CONTAINER ID  IMAGE                         COMMAND               CREATED        STATUS        PORTS                             NAMES
fc0c3de32bf2  docker.io/apache/kafka:3.8.0  /etc/kafka/docker...  1 minutes ago  Up 1 minutes  0.0.0.0:9093->9093/tcp, 9092/tcp  systemd-kafka-controller

Great—we see the image is in the bootc container storage, the systemd service is running, and the container is started. Now let's remove the test VM:

$ podman-bootc rm -f 32cfad55950c

Build bootc disk images

For this basic test environment, we're going to create VMs in libvirt. In a production environment, these could be VMs in a cloud provider, bare metal, or another type of system. To provision the libvirt VMs, we're going to use bootc-image-builder to create disk images. We will end up with 6 total disk images, one for each broker and controller.

Configure bootc image builder

First let's create a configuration file that will be applied by bootc image builder to each disk image. This config is responsible for system specific configuration. For our basic test environment, we're going to create a user and define the disk size. This will allow us to easily SSH into the VMs to validate their state, and limit the disk size to avoid eating up too much space. In a production environment, you most likely wouldn't want a user, especially not one with a simple password.

Create a disk-images directory then create a file named bib.toml within that directory with the following contents:

[[customizations.user]]
name = "kafka"
password = "kafka"
groups = ["wheel"]
[[customizations.filesystem]]
mountpoint = "/"
minsize = "4 GiB"

Create the disk image directories:

$ mkdir -p disk-images/kafka-broker-1
$ mkdir -p disk-images/kafka-broker-2
$ mkdir -p disk-images/kafka-broker-3
$ mkdir -p disk-images/kafka-controller-1
$ mkdir -p disk-images/kafka-controller-2
$ mkdir -p disk-images/kafka-controller-3

Build the disk images

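The following command, run from within the disk-images directory, uses bootc image builder to create a disk image from the kafka-broker-1 container image. Repeat this for the 5 other images (kafka-broker-2, kafka-broker-3, kafka-controller-1, kafka-controller-2, kafka-controller-3). For each subsequent build, change the node name in two places: the output volume (./kafka-broker-1:/output) and the image reference at the end of the command: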

$ export USER=<your-quay-username>
$ podman run --rm --privileged \
    -v /var/lib/containers/storage:/var/lib/containers/storage \
    --security-opt label=type:unconfined_t \
    -v "./kafka-broker-1:/output" \
    -v "./bib.toml:/config.toml:ro" \
    quay.io/centos-bootc/bootc-image-builder:latest build --type qcow2 --local "quay.io/$USER/kafka-broker-1"
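
Repeating the command six times by hand is error-prone, so a small loop can help. The sketch below assumes it is run from the disk-images directory next to bib.toml. QUAY_USER and DRY_RUN are hypothetical variable names introduced for this example (the article exports USER instead), and by default the script only prints the commands so they can be reviewed first.

```shell
#!/usr/bin/env bash
set -euo pipefail

# QUAY_USER is a hypothetical variable name; the article exports USER instead.
QUAY_USER="${QUAY_USER:-your-quay-username}"

# Print commands by default; set DRY_RUN=0 to actually run them.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Build a qcow2 disk image for one node using the same arguments as above.
build_disk_image() {
  local node=$1
  run podman run --rm --privileged \
    -v /var/lib/containers/storage:/var/lib/containers/storage \
    --security-opt label=type:unconfined_t \
    -v "./${node}:/output" \
    -v "./bib.toml:/config.toml:ro" \
    quay.io/centos-bootc/bootc-image-builder:latest \
    build --type qcow2 --local "quay.io/${QUAY_USER}/${node}"
}

for role in broker controller; do
  for n in 1 2 3; do
    build_disk_image "kafka-${role}-${n}"
  done
done
```

Running it with DRY_RUN=0 and QUAY_USER set to your registry username performs the six builds one after another.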

After running this command for all 6 nodes, the disk-images directory should look like this:

$ tree disk-images
disk-images
├── kafka-broker-1
│   ├── manifest-qcow2.json
│   └── qcow2
│       └── disk.qcow2
├── kafka-broker-2
│   ├── manifest-qcow2.json
│   └── qcow2
│       └── disk.qcow2
├── kafka-broker-3
│   ├── manifest-qcow2.json
│   └── qcow2
│       └── disk.qcow2
├── kafka-controller-1
│   ├── manifest-qcow2.json
│   └── qcow2
│       └── disk.qcow2
├── kafka-controller-2
│   ├── manifest-qcow2.json
│   └── qcow2
│       └── disk.qcow2
└── kafka-controller-3
    ├── manifest-qcow2.json
    └── qcow2
        └── disk.qcow2

Start the virtual machines

All right, we now have everything we need to create the 6 VM Kafka cluster and start streaming messages. From the directory that contains disk-images, run the following virt-install command for each controller (kafka-controller-1, kafka-controller-2, kafka-controller-3):

$ virt-install \
  --connect "qemu:///system" \
  --name "kafka-controller-1" \
  --cpu host \
  --vcpus "2" \
  --memory "2048" \
  --network network=default \
  --noautoconsole \
  --import --disk "./disk-images/kafka-controller-1/qcow2/disk.qcow2,format=qcow2" \
  --os-variant fedora-eln

Then for each broker (kafka-broker-1, kafka-broker-2, kafka-broker-3), run the following virt-install command:

$ virt-install \
  --connect "qemu:///system" \
  --name "kafka-broker-1" \
  --cpu host \
  --vcpus "2" \
  --memory "4196" \
  --network network=default \
  --noautoconsole \
  --import --disk "./disk-images/kafka-broker-1/qcow2/disk.qcow2,format=qcow2" \
  --os-variant fedora-eln
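
The two commands differ only in the VM name, the disk path, and the memory size, so all six VMs can be created from one loop. This is a sketch under the same assumptions as the commands above (libvirt's default network, disk images under ./disk-images). The helper names run, memory_for, and create_vm and the DRY_RUN variable are hypothetical; by default the script only prints the virt-install commands.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print commands by default; set DRY_RUN=0 to actually create the VMs.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Memory in MiB per role, using the values from the commands above.
memory_for() {
  case $1 in
    controller) echo 2048 ;;
    broker)     echo 4196 ;;
    *) echo "unknown role: $1" >&2; return 1 ;;
  esac
}

# Import the prebuilt disk image for one node as a libvirt VM.
create_vm() {
  local role=$1 n=$2 name="kafka-$1-$2"
  run virt-install \
    --connect "qemu:///system" \
    --name "${name}" \
    --cpu host \
    --vcpus 2 \
    --memory "$(memory_for "${role}")" \
    --network network=default \
    --noautoconsole \
    --import --disk "./disk-images/${name}/qcow2/disk.qcow2,format=qcow2" \
    --os-variant fedora-eln
}

# Controllers first, then brokers, matching the order used above.
for role in controller broker; do
  for n in 1 2 3; do
    create_vm "${role}" "${n}"
  done
done
```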

Give all the VMs a few minutes to start up and get settled. The nodes need to establish communication with one another, and it will take some time for this process to complete.

Let's verify all six VMs are running:

$ virsh --connect qemu:///system list
 Id   Name                 State
------------------------------------
 85   kafka-broker-1       running
 86   kafka-broker-2       running
 87   kafka-broker-3       running
 88   kafka-controller-1   running
 89   kafka-controller-2   running
 90   kafka-controller-3   running

Send a message

Great! All the nodes appear to be running. Let's try producing and consuming a message:

$ echo "hello world" | kcat -b kafka-broker-1:9092 -P -t test-topic
$ kcat -b kafka-broker-1:9092 -C -t test-topic -o beginning
hello world
% Reached end of topic test-topic [0] at offset 1

Metrics

One of the benefits of deploying Kafka on a Kubernetes cluster is the simplicity of enabling metrics. Let's see how easy it is to get metrics set up on the bootc-based Kafka nodes using logically bound images.

Create Prometheus quadlet container

Let's use the node-exporter image provided by Prometheus to get the quadlet container up and running. We'll add the quadlet container to each base image (kafka-broker and kafka-controller). First, let's create the file in the two working directories:

kafka-broker/usr/share/containers/systemd/prometheus.container
kafka-controller/usr/share/containers/systemd/prometheus.container

See below:

[Container]
Image=docker.io/prom/node-exporter:1.8.2
GlobalArgs=--storage-opt=additionalimagestore=/usr/lib/bootc/storage
PublishPort=9100:9100
[Unit]
Description=Prometheus
[Install]
WantedBy=default.target
[Service]
Restart=always

Then, we need to modify each Containerfile to create the symlink to bind the Prometheus image to the bootc image. We do this by adding one line to the Containerfile:

ln -s /usr/share/containers/systemd/prometheus.container /usr/lib/bootc/bound-images.d/prometheus.container

We will also expose the metrics port 9100.

The final kafka-controller Containerfile will look like this (the kafka-broker Containerfile changes in the same way):

FROM registry.redhat.io/rhel9/rhel-bootc:9.4
COPY ./usr/. /usr
RUN <<EOF
set -euo pipefail
# install dependencies
dnf -y install firewalld
dnf clean all
# bind kafka-node image to the bootc image
ln -s /usr/share/containers/systemd/kafka-controller.container /usr/lib/bootc/bound-images.d/kafka-controller.container
# bind prometheus image to the bootc image
ln -s /usr/share/containers/systemd/prometheus.container /usr/lib/bootc/bound-images.d/prometheus.container
# expose Kafka ports 9092 and 9093 and metrics port 9100
firewall-offline-cmd --add-port 9093/tcp --add-port 9092/tcp --add-port 9100/tcp
EOF

Building a new version of the Kafka nodes

Now that we have added Prometheus to the base images, we need to rebuild each node and upgrade them to start serving metrics. In a production environment, this would ideally be automatically handled by a CI/CD pipeline. For this test environment, we will need to manually build and push each image:

$ export USER=<your-quay-username>
$ cd kafka-broker && podman build . -t localhost/kafka-broker:latest && cd ../
$ cd kafka-broker-1 && podman build . -t "quay.io/$USER/kafka-broker-1:latest" && cd ../
$ cd kafka-broker-2 && podman build . -t "quay.io/$USER/kafka-broker-2:latest" && cd ../
$ cd kafka-broker-3 && podman build . -t "quay.io/$USER/kafka-broker-3:latest" && cd ../
$ cd kafka-controller && podman build . -t localhost/kafka-controller:latest && cd ../
$ cd kafka-controller-1 && podman build . -t "quay.io/$USER/kafka-controller-1:latest" && cd ../
$ cd kafka-controller-2 && podman build . -t "quay.io/$USER/kafka-controller-2:latest" && cd ../
$ cd kafka-controller-3 && podman build . -t "quay.io/$USER/kafka-controller-3:latest" && cd ../
$ podman push "quay.io/$USER/kafka-broker-1:latest"
$ podman push "quay.io/$USER/kafka-broker-2:latest"
$ podman push "quay.io/$USER/kafka-broker-3:latest"
$ podman push "quay.io/$USER/kafka-controller-1:latest"
$ podman push "quay.io/$USER/kafka-controller-2:latest"
$ podman push "quay.io/$USER/kafka-controller-3:latest"

Upgrade each bootc node

Now that the images are updated in the registry, we can run bootc upgrade on each node to upgrade each system to the latest image:

$ ssh kafka@kafka-broker-1
$ sudo bootc upgrade
...
$ sudo reboot

Verify Prometheus metrics are exposed

Finally, since we don't have a Prometheus server set up to scrape each node, let's curl each node to validate that the metrics are working:

$ curl kafka-broker-1:9100/metrics
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 0
go_gc_duration_seconds{quantile="0.25"} 0
...
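
To avoid eyeballing the output of six curl calls, the check can be reduced to a sample count per node. This is a minimal sketch: count_samples and report_all are hypothetical helper names, and the script assumes the node hostnames resolve from the machine where it runs, as in the curl command above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Count metric samples in Prometheus text format read from stdin,
# skipping "# HELP" / "# TYPE" comment lines and blank lines.
count_samples() {
  awk '!/^#/ && NF { n++ } END { print n + 0 }'
}

# Print the sample count for every node, or "unreachable" if curl fails.
report_all() {
  local node body
  for node in kafka-broker-{1..3} kafka-controller-{1..3}; do
    if body=$(curl -sf --max-time 5 "${node}:9100/metrics"); then
      echo "${node}: $(printf '%s\n' "${body}" | count_samples) samples"
    else
      echo "${node}: unreachable"
    fi
  done
}
```

Calling report_all after the upgrade should list a non-zero sample count for each of the six nodes; a node that reports unreachable most likely has not rebooted into the new image yet or does not have port 9100 open.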

Conclusion

You now have a fully functional, production-like Kafka instance running as a cluster of bootc systems utilizing logically bound images! While we used local tools to build and test this, you can imagine how to integrate this container image workflow into an existing production pipeline. This lets you scale up your Kafka nodes onto larger systems without sacrificing the container management and security tools you're familiar with. Using logically bound images ensures that each Kafka node reboots quickly on upgrades and that the state of each image is what we expect. This example also demonstrated how easy it is to add additional services to each node.

All of the code to build this virtual Kafka cluster can be found at github.com/ckyrouac/kafka-bootc-cluster.