Logically bound images are a recently added feature in bootc that allows any container image to be managed directly by bootc. One of the many benefits this provides is simplified system management. An application image can be "bound" to a bootc operating system image as part of the bootc container build.
This simplified system management would have been useful for a project I previously worked on. The project was a large web application composed of dozens of microservices running in a Kubernetes cluster. These microservices primarily used Kafka to communicate. Our Kafka instance was also deployed in the Kubernetes cluster using the Strimzi operator.
The operator made deploying and managing the Kafka cluster simple; however, keeping the cluster stable was a constant battle. One of the main issues was Kafka's heavy memory requirement: 32GB of memory is the recommended minimum for a single Kafka node, and a cluster needs at least 3 nodes. Due to the cost, our Kubernetes nodes were not sized to accommodate this large memory requirement. We likely could have leveraged many of the newer Kubernetes features to stabilize the cluster; however, I wonder if it would have made more sense to run each Kafka node directly on a virtual machine (VM).
We already ran our other critical workloads, like our databases, outside of Kubernetes, and Kafka was equally critical. The ability to run Kafka directly on a cloud-based VM would have made scaling each node easy: just increase the VM's memory through the cloud provider. By using a bootc image with a logically bound Kafka image, I should be able to preserve the benefits of a container workflow while still allowing each node to scale vertically beyond the limits of a Kubernetes cluster. Let's see how this looks in practice.
Building the broker base image
This environment is based on the basic example from the Kafka image docs. The goal is to create a production-like test environment, so we need 3 broker nodes and 3 controller nodes. Let's start by creating a broker image. Each Kafka broker node shares the same basic configuration, so let's first create a base image with that configuration defined. Each individual broker will also require some unique parameters, so we will later create additional images based on the kafka-broker image.
Create the directory structure for the image
First, create a directory to contain all the Kafka broker related files:
$ mkdir kafka-broker && cd kafka-broker
$ mkdir -p usr/share/containers/systemd
Create the quadlet container
Next, let's create the quadlet container definition at usr/share/containers/systemd/kafka-broker.container (this becomes /usr/share/containers/systemd/kafka-broker.container inside the image). The quadlet docs explain the structure of this file quite well. The environment variables are the same ones found in the examples from the Kafka image docs, with some adaptations to the hostnames and ports. This ensures we can access the nodes when they are running in virtual machines rather than containers. See below:
[Container]
Image=docker.io/apache/kafka:3.8.0
GlobalArgs=--storage-opt=additionalimagestore=/usr/lib/bootc/storage
PublishPort=9092:9092
Environment=KAFKA_PROCESS_ROLES=broker
Environment=KAFKA_INTER_BROKER_LISTENER_NAME=PLAINTEXT
Environment=KAFKA_CONTROLLER_LISTENER_NAMES=CONTROLLER
Environment=KAFKA_LISTENER_SECURITY_PROTOCOL_MAP=CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
Environment=KAFKA_CONTROLLER_QUORUM_VOTERS=4@kafka-controller-1:9093,5@kafka-controller-2:9093,6@kafka-controller-3:9093
Environment=KAFKA_LISTENERS='PLAINTEXT://:9092'
Environment=KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS=0
[Unit]
Description=Kafka broker
[Install]
WantedBy=default.target
[Service]
Restart=always
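Before baking this file into an image, you can sanity-check it locally. If podman is installed on your build machine, its quadlet generator can print the systemd unit it would produce from this file (an optional check; the generator path can vary by distribution):
$ QUADLET_UNIT_DIRS="$PWD/usr/share/containers/systemd" \
    /usr/lib/systemd/system-generators/podman-system-generator --dryrun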
Create the bootc image Containerfile
The only other file we need to create is the Containerfile in the root of the kafka-broker
directory. A few interesting things to note about this Containerfile:
- It installs firewalld and opens port 9092/tcp. Firewalld is used for convenience, and port 9092 is how we will communicate with the Kafka broker service.
- It creates a symlink to bind the kafka-broker image to the bootc image.
See below:
FROM registry.redhat.io/rhel9/rhel-bootc:9.4
COPY ./usr/. /usr
RUN <<EOF
set -euo pipefail
# install dependencies
dnf -y install firewalld
dnf clean all
# bind kafka-broker image to the bootc image
ln -s /usr/share/containers/systemd/kafka-broker.container /usr/lib/bootc/bound-images.d/kafka-broker.container
# expose Kafka port 9092
firewall-offline-cmd --add-port 9092/tcp
EOF
Build the container image
Now that the files are in place, let's build the image locally. This will serve as the base image for the 3 broker nodes:
$ podman build . -t localhost/kafka-broker:latest
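As an optional sanity check, you can run a throwaway container from the image to confirm the bound-image symlink landed where bootc expects it (this assumes the base image lets you override the command, which bootc base images generally do):
$ podman run --rm localhost/kafka-broker:latest ls -l /usr/lib/bootc/bound-images.d/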
Building unique Kafka broker nodes
Each broker requires a unique hostname and node ID. This could be accomplished in a few different ways; I found it straightforward to create a unique container image for each node, which lets me keep using the same familiar container build infrastructure in a production pipeline to deploy the nodes. However, you might have other infrastructure that would make more sense. In the end, each image just needs to add a couple of files to define the unique parameters.
Create the directories
Let's create 3 new directories, one for each broker:
$ mkdir kafka-broker-1
$ mkdir kafka-broker-2
$ mkdir kafka-broker-3
Each broker also needs an etc and usr directory for configuration files. Let's create those now:
$ mkdir -p kafka-broker-1/usr/share/containers/systemd/kafka-broker.container.d
$ mkdir -p kafka-broker-2/usr/share/containers/systemd/kafka-broker.container.d
$ mkdir -p kafka-broker-3/usr/share/containers/systemd/kafka-broker.container.d
$ mkdir -p kafka-broker-1/etc
$ mkdir -p kafka-broker-2/etc
$ mkdir -p kafka-broker-3/etc
Create the unique broker configuration files
First, let's create the hostname file. For this test environment, we'll set each node's hostname to kafka-broker-{id}. So, let's create kafka-broker-1/etc/hostname with the following contents, and do the same for the other two nodes:
kafka-broker-1
The other configuration file will be created at kafka-broker-1/usr/share/containers/systemd/kafka-broker.container.d/10-broker.conf:
[Container]
Environment=KAFKA_NODE_ID=1
Environment=KAFKA_ADVERTISED_LISTENERS='PLAINTEXT://kafka-broker-1:9092'
Finally, create a Containerfile in the root at kafka-broker-1/Containerfile:
FROM localhost/kafka-broker
COPY ./usr/. /usr
COPY ./etc/. /etc
Create the same files for the other two brokers, substituting the ID.
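If you'd rather not repeat these steps by hand, a short shell loop can generate all three broker directories. This is just a convenience sketch equivalent to the manual steps above:
$ for i in 1 2 3; do
  dir="kafka-broker-$i"
  # create the drop-in and etc directories
  mkdir -p "$dir/usr/share/containers/systemd/kafka-broker.container.d" "$dir/etc"
  # set the unique hostname
  echo "kafka-broker-$i" > "$dir/etc/hostname"
  # write the unique quadlet drop-in
  cat > "$dir/usr/share/containers/systemd/kafka-broker.container.d/10-broker.conf" <<EOF
[Container]
Environment=KAFKA_NODE_ID=$i
Environment=KAFKA_ADVERTISED_LISTENERS='PLAINTEXT://kafka-broker-$i:9092'
EOF
  # write the per-node Containerfile
  cat > "$dir/Containerfile" <<EOF
FROM localhost/kafka-broker
COPY ./usr/. /usr
COPY ./etc/. /etc
EOF
done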
Build and push the broker container images
Now that we have defined all the broker specific configuration, let's build a container image for each broker. We will also push each broker image to a registry to enable bootc upgrades:
$ export USER=<your-quay-username>
$ cd kafka-broker-1 && podman build . -t "quay.io/$USER/kafka-broker-1:latest" && cd ..
$ cd kafka-broker-2 && podman build . -t "quay.io/$USER/kafka-broker-2:latest" && cd ..
$ cd kafka-broker-3 && podman build . -t "quay.io/$USER/kafka-broker-3:latest" && cd ..
$ podman push "quay.io/$USER/kafka-broker-1:latest"
$ podman push "quay.io/$USER/kafka-broker-2:latest"
$ podman push "quay.io/$USER/kafka-broker-3:latest"
Building the controller base image
Now we need to build the 3 controller node images. This is a very similar process to building the broker nodes.
Create the directory structure for the image:
$ mkdir kafka-controller && cd kafka-controller
$ mkdir -p usr/share/containers/systemd
Create the quadlet container definition at usr/share/containers/systemd/kafka-controller.container:
[Container]
Image=docker.io/apache/kafka:3.8.0
GlobalArgs=--storage-opt=additionalimagestore=/usr/lib/bootc/storage
PublishPort=9093:9093
Environment=KAFKA_PROCESS_ROLES=controller
Environment=KAFKA_INTER_BROKER_LISTENER_NAME=PLAINTEXT
Environment=KAFKA_CONTROLLER_LISTENER_NAMES=CONTROLLER
Environment=KAFKA_CONTROLLER_QUORUM_VOTERS=4@kafka-controller-1:9093,5@kafka-controller-2:9093,6@kafka-controller-3:9093
Environment=KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS=0
Environment=KAFKA_LISTENERS=CONTROLLER://:9093
[Unit]
Description=Kafka controller
[Install]
WantedBy=default.target
[Service]
Restart=always
Create the bootc image Containerfile:
FROM registry.redhat.io/rhel9/rhel-bootc:9.4
COPY ./usr/. /usr
RUN <<EOF
set -euo pipefail
# install dependencies
dnf -y install firewalld
dnf clean all
# bind kafka-controller image to the bootc image
ln -s /usr/share/containers/systemd/kafka-controller.container /usr/lib/bootc/bound-images.d/kafka-controller.container
# expose Kafka ports 9092 and 9093
firewall-offline-cmd --add-port 9093/tcp --add-port 9092/tcp
EOF
Build the container image:
$ podman build . -t localhost/kafka-controller:latest
Building unique Kafka controller nodes
Create the directories:
$ mkdir -p kafka-controller-1/usr/share/containers/systemd/kafka-controller.container.d
$ mkdir -p kafka-controller-2/usr/share/containers/systemd/kafka-controller.container.d
$ mkdir -p kafka-controller-3/usr/share/containers/systemd/kafka-controller.container.d
$ mkdir -p kafka-controller-1/etc
$ mkdir -p kafka-controller-2/etc
$ mkdir -p kafka-controller-3/etc
Create the unique controller configuration files (below).
kafka-controller-1/etc/hostname:
kafka-controller-1
kafka-controller-1/usr/share/containers/systemd/kafka-controller.container.d/10-controller.conf:
[Container]
Environment=KAFKA_NODE_ID=4
Environment=KAFKA_ADVERTISED_LISTENERS='PLAINTEXT://kafka-controller-1:9092'
kafka-controller-1/Containerfile:
FROM localhost/kafka-controller
COPY ./usr/. /usr
COPY ./etc/. /etc
Create the same files for the other two controllers, substituting the ID. The controller node IDs start at 4 and count up, so:
- kafka-controller-1 -> KAFKA_NODE_ID=4
- kafka-controller-2 -> KAFKA_NODE_ID=5
- kafka-controller-3 -> KAFKA_NODE_ID=6
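The same convenience loop from the broker section works for the controllers, shifting the node ID by 3 (again, just a sketch of the manual steps above):
$ for i in 1 2 3; do
  dir="kafka-controller-$i"
  # create the drop-in and etc directories
  mkdir -p "$dir/usr/share/containers/systemd/kafka-controller.container.d" "$dir/etc"
  # set the unique hostname
  echo "kafka-controller-$i" > "$dir/etc/hostname"
  # write the unique quadlet drop-in; node IDs start at 4
  cat > "$dir/usr/share/containers/systemd/kafka-controller.container.d/10-controller.conf" <<EOF
[Container]
Environment=KAFKA_NODE_ID=$((i + 3))
Environment=KAFKA_ADVERTISED_LISTENERS='PLAINTEXT://kafka-controller-$i:9092'
EOF
  # write the per-node Containerfile
  cat > "$dir/Containerfile" <<EOF
FROM localhost/kafka-controller
COPY ./usr/. /usr
COPY ./etc/. /etc
EOF
done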
Build and push the controller container images:
$ export USER=<your-quay-username>
$ cd kafka-controller-1 && podman build . -t "quay.io/$USER/kafka-controller-1:latest" && cd ..
$ cd kafka-controller-2 && podman build . -t "quay.io/$USER/kafka-controller-2:latest" && cd ..
$ cd kafka-controller-3 && podman build . -t "quay.io/$USER/kafka-controller-3:latest" && cd ..
$ podman push "quay.io/$USER/kafka-controller-1:latest"
$ podman push "quay.io/$USER/kafka-controller-2:latest"
$ podman push "quay.io/$USER/kafka-controller-3:latest"
Verify the images
Phew—that was a lot of image builds. Let's take a break and look at all the images we built:
$ podman images
REPOSITORY TAG IMAGE ID CREATED SIZE
quay.io/ckyrouac/kafka-controller-3 latest ddeba64d6bf0 8 minutes ago 1.58 GB
quay.io/ckyrouac/kafka-controller-2 latest 955517c0b3f1 8 minutes ago 1.58 GB
quay.io/ckyrouac/kafka-controller-1 latest 32cfad55950c 8 minutes ago 1.58 GB
localhost/kafka-controller latest 8b6929f1386f 8 minutes ago 1.58 GB
quay.io/ckyrouac/kafka-broker-3 latest 295f5446fd5c 9 minutes ago 1.58 GB
quay.io/ckyrouac/kafka-broker-2 latest ef47f4b33421 9 minutes ago 1.58 GB
quay.io/ckyrouac/kafka-broker-1 latest e26067698d24 9 minutes ago 1.58 GB
localhost/kafka-broker latest bd1f12f4a216 9 minutes ago 1.58 GB
quay.io/centos-bootc/centos-bootc stream9 ae1314a556bb 4 days ago 1.53 GB
quay.io/centos-bootc/bootc-image-builder latest 080b71f914e7 5 days ago 741 MB
docker.io/apache/kafka 3.8.0 b610bd8a193a 2 months ago 384 MB
Perfect, we see the 3 broker and 3 controller images. For a quick sanity check, let's use podman-bootc to boot a controller node and inspect it to make sure Kafka starts at boot:
$ podman-bootc run quay.io/ckyrouac/kafka-controller-1
... wait for it to boot into a shell...
[root@kafka-controller-1 ~]# podman --storage-opt=additionalimagestore=/usr/lib/bootc/storage images
REPOSITORY TAG IMAGE ID CREATED SIZE R/O
docker.io/apache/kafka 3.8.0 b610bd8a193a 2 months ago 384 MB true
[root@kafka-controller-1 ~]# systemctl status kafka-controller
● kafka-controller.service - Kafka controller
Loaded: loaded (/usr/share/containers/systemd/kafka-controller.container; generated)
Active: active (running) since Tue 2024-10-15 19:44:18 UTC; 25s ago
Main PID: 1150 (conmon)
Tasks: 23 (limit: 11990)
Memory: 173.3M
CPU: 1.049s
CGroup: /system.slice/kafka-controller.service
├─libpod-payload-fc0c3de32bf2713f69f2e48537873603a4ab7b874c0fb3345f6b9844ee6884f0
│ ├─1152 bash /etc/kafka/docker/run
│ ├─1159 bash /etc/kafka/docker/run
│ └─1160 /opt/java/openjdk/bin/java -Xmx256M -XX:SharedArchiveFile=/opt/kafka/storage.jsa -Dcom.sun.ma>
└─runtime
└─1150 /usr/bin/conmon --api-version 1 -c fc0c3de32bf2713f69f2e48537873603a4ab7b874c0fb3345f6b9844ee>
[root@kafka-controller-1 ~]# sudo podman ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
fc0c3de32bf2 docker.io/apache/kafka:3.8.0 /etc/kafka/docker... 1 minutes ago Up 1 minutes 0.0.0.0:9093->9093/tcp, 9092/tcp systemd-kafka-controller
Great—we see the image is in the bootc container storage, the systemd service is running, and the container is started. Now let's remove the test VM:
$ podman-bootc rm -f 32cfad55950c
Build bootc disk images
For this basic test environment, we're going to create VMs in libvirt. In a production environment, these could be VMs in a cloud provider, bare metal, or another type of system. To provision the libvirt VMs, we're going to use bootc-image-builder to create disk images. We will end up with 6 disk images in total, one for each broker and controller.
Configure bootc image builder
First, let's create a configuration file that bootc-image-builder will apply to each disk image. This config handles system-specific settings. For our basic test environment, we're going to create a user and define the disk size. This will allow us to easily SSH into the VMs to validate their state, and limit the disk size to avoid eating up too much space. In a production environment, you most likely wouldn't want a user like this, especially not one with a simple password.
Create a disk-images directory, then create a file named bib.toml within that directory with the following contents:
[[customizations.user]]
name = "kafka"
password = "kafka"
groups = ["wheel"]
[[customizations.filesystem]]
mountpoint = "/"
minsize = "4 GiB"
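For something closer to production, bootc-image-builder's user customization also accepts an SSH public key instead of a password. A sketch (the key value below is a placeholder for your own public key):
[[customizations.user]]
name = "kafka"
key = "ssh-ed25519 AAAA... user@host"
groups = ["wheel"]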
Create the disk image directories:
$ mkdir -p disk-images/kafka-broker-1
$ mkdir -p disk-images/kafka-broker-2
$ mkdir -p disk-images/kafka-broker-3
$ mkdir -p disk-images/kafka-controller-1
$ mkdir -p disk-images/kafka-controller-2
$ mkdir -p disk-images/kafka-controller-3
Build the disk images
This will run bootc-image-builder to create a disk image from the kafka-broker-1 container image. Run the command from inside the disk-images directory. Repeat it for the 5 other images (kafka-broker-2, kafka-broker-3, kafka-controller-1, kafka-controller-2, kafka-controller-3); for each build, the node name needs to change in two places: the output volume mount and the image reference:
$ export USER=<your-quay-username>
$ podman run --rm --privileged \
-v /var/lib/containers/storage:/var/lib/containers/storage \
--security-opt label=type:unconfined_t \
-v "./kafka-broker-1:/output" \
-v "./bib.toml:/config.toml:ro" \
quay.io/centos-bootc/bootc-image-builder:latest build --type qcow2 --local "quay.io/$USER/kafka-broker-1"
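Alternatively, a loop over all six nodes saves some copying and pasting (run from inside the disk-images directory, like the command above):
$ for node in kafka-broker-1 kafka-broker-2 kafka-broker-3 \
    kafka-controller-1 kafka-controller-2 kafka-controller-3; do
  podman run --rm --privileged \
    -v /var/lib/containers/storage:/var/lib/containers/storage \
    --security-opt label=type:unconfined_t \
    -v "./$node:/output" \
    -v "./bib.toml:/config.toml:ro" \
    quay.io/centos-bootc/bootc-image-builder:latest build --type qcow2 --local "quay.io/$USER/$node"
done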
After running this command for all 6 nodes, the disk-images directory should look like this:
$ tree disk-images
disk-images
├── kafka-broker-1
│ ├── manifest-qcow2.json
│ └── qcow2
│ └── disk.qcow2
├── kafka-broker-2
│ ├── manifest-qcow2.json
│ └── qcow2
│ └── disk.qcow2
├── kafka-broker-3
│ ├── manifest-qcow2.json
│ └── qcow2
│ └── disk.qcow2
├── kafka-controller-1
│ ├── manifest-qcow2.json
│ └── qcow2
│ └── disk.qcow2
├── kafka-controller-2
│ ├── manifest-qcow2.json
│ └── qcow2
│ └── disk.qcow2
└── kafka-controller-3
├── manifest-qcow2.json
└── qcow2
└── disk.qcow2
Start the virtual machines
All right, we now have everything we need to create the 6-VM Kafka cluster and start streaming messages. For each controller (kafka-controller-1, kafka-controller-2, kafka-controller-3), run the following virt-install command:
$ virt-install \
--connect "qemu:///system" \
--name "kafka-controller-1" \
--cpu host \
--vcpus "2" \
--memory "2048" \
--network network=default \
--noautoconsole \
--import --disk "./disk-images/kafka-controller-1/qcow2/disk.qcow2,format=qcow2" \
--os-variant fedora-eln
Then, for each broker (kafka-broker-1, kafka-broker-2, kafka-broker-3), run the following virt-install command:
$ virt-install \
--connect "qemu:///system" \
--name "kafka-broker-1" \
--cpu host \
--vcpus "2" \
--memory "4196" \
--network network=default \
--noautoconsole \
--import --disk "./disk-images/kafka-broker-1/qcow2/disk.qcow2,format=qcow2" \
--os-variant fedora-eln
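To avoid running virt-install six times by hand, here's an equivalent loop (a sketch; it uses the same settings as the commands above, with broker memory at 4096 MiB and controller memory at 2048 MiB):
$ for node in kafka-broker-1 kafka-broker-2 kafka-broker-3 \
    kafka-controller-1 kafka-controller-2 kafka-controller-3; do
  # brokers get more memory than controllers
  case "$node" in
    kafka-broker-*) mem=4096 ;;
    *) mem=2048 ;;
  esac
  virt-install \
    --connect "qemu:///system" \
    --name "$node" \
    --cpu host \
    --vcpus "2" \
    --memory "$mem" \
    --network network=default \
    --noautoconsole \
    --import --disk "./disk-images/$node/qcow2/disk.qcow2,format=qcow2" \
    --os-variant fedora-eln
done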
Give all the VMs a few minutes to start up and get settled. The nodes need to discover and communicate with one another, and it will take some time for this process to complete.
Let's verify all six VMs are running:
$ virsh --connect qemu:///system list
Id Name State
------------------------------------
85 kafka-broker-1 running
86 kafka-broker-2 running
87 kafka-broker-3 running
88 kafka-controller-1 running
89 kafka-controller-2 running
90 kafka-controller-3 running
Send a message
Great! All the nodes appear to be running. Let's try producing and consuming a message with kcat:
$ echo "hello world" | kcat -b kafka-broker-1:9092 -P -t test-topic
$ kcat -b kafka-broker-1:9092 -C -t test-topic -o beginning
hello world
% Reached end of topic test-topic [0] at offset 1
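One note: these kcat commands assume the broker hostnames resolve from your host. With the libvirt default network this often works out of the box via its built-in dnsmasq; if it doesn't, you can look up each VM's address and add /etc/hosts entries yourself (the IP below is a placeholder):
$ virsh --connect qemu:///system domifaddr kafka-broker-1
$ echo "192.168.122.101 kafka-broker-1" | sudo tee -a /etc/hosts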
Metrics
One of the benefits of deploying Kafka on a Kubernetes cluster is the simplicity of enabling metrics. Let's see how easy it is to get metrics set up on the bootc-based Kafka nodes using logically bound images.
Create Prometheus quadlet container
Let's use the node-exporter image provided by Prometheus to get the quadlet container up and running. We'll add the quadlet container into each base image (kafka-broker and kafka-controller). First, let's create the file in the two working directories:
kafka-broker/usr/share/containers/systemd/prometheus.container
kafka-controller/usr/share/containers/systemd/prometheus.container
See below:
[Container]
Image=docker.io/prom/node-exporter:1.8.2
GlobalArgs=--storage-opt=additionalimagestore=/usr/lib/bootc/storage
PublishPort=9100:9100
[Unit]
Description=Prometheus
[Install]
WantedBy=default.target
[Service]
Restart=always
Then, we need to modify each Containerfile to create the symlink to bind the Prometheus image to the bootc image. We do this by adding one line to the Containerfile:
ln -s /usr/share/containers/systemd/prometheus.container /usr/lib/bootc/bound-images.d/prometheus.container
We will also expose the metrics port 9100.
The final kafka-controller Containerfile will look like this (the kafka-broker version is similar):
FROM registry.redhat.io/rhel9/rhel-bootc:9.4
COPY ./usr/. /usr
RUN <<EOF
set -euo pipefail
# install dependencies
dnf -y install firewalld
dnf clean all
# bind kafka-controller image to the bootc image
ln -s /usr/share/containers/systemd/kafka-controller.container /usr/lib/bootc/bound-images.d/kafka-controller.container
# bind prometheus image to the bootc image
ln -s /usr/share/containers/systemd/prometheus.container /usr/lib/bootc/bound-images.d/prometheus.container
# expose Kafka ports 9092 and 9093, plus the metrics port 9100
firewall-offline-cmd --add-port 9093/tcp --add-port 9092/tcp --add-port 9100/tcp
EOF
Building a new version of the Kafka nodes
Now that we have added Prometheus to the base images, we need to rebuild each node and upgrade them to start serving metrics. In a production environment, this would ideally be automatically handled by a CI/CD pipeline. For this test environment, we will need to manually build and push each image:
$ export USER=<your-quay-username>
$ cd kafka-broker && podman build . -t localhost/kafka-broker:latest && cd ..
$ cd kafka-broker-1 && podman build . -t "quay.io/$USER/kafka-broker-1:latest" && cd ..
$ cd kafka-broker-2 && podman build . -t "quay.io/$USER/kafka-broker-2:latest" && cd ..
$ cd kafka-broker-3 && podman build . -t "quay.io/$USER/kafka-broker-3:latest" && cd ..
$ cd kafka-controller && podman build . -t localhost/kafka-controller:latest && cd ..
$ cd kafka-controller-1 && podman build . -t "quay.io/$USER/kafka-controller-1:latest" && cd ..
$ cd kafka-controller-2 && podman build . -t "quay.io/$USER/kafka-controller-2:latest" && cd ..
$ cd kafka-controller-3 && podman build . -t "quay.io/$USER/kafka-controller-3:latest" && cd ..
$ podman push "quay.io/$USER/kafka-broker-1:latest"
$ podman push "quay.io/$USER/kafka-broker-2:latest"
$ podman push "quay.io/$USER/kafka-broker-3:latest"
$ podman push "quay.io/$USER/kafka-controller-1:latest"
$ podman push "quay.io/$USER/kafka-controller-2:latest"
$ podman push "quay.io/$USER/kafka-controller-3:latest"
Upgrade each bootc node
Now that the images are updated in the registry, we can run bootc upgrade on each node to upgrade each system to the latest image:
$ ssh kafka@kafka-broker-1
$ sudo bootc upgrade
...
$ sudo reboot
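To roll the upgrade out to the whole cluster, the same two commands can be looped over every node (a sketch; it assumes SSH access as the kafka user created earlier, with -t so sudo can prompt for its password):
$ for node in kafka-broker-1 kafka-broker-2 kafka-broker-3 \
    kafka-controller-1 kafka-controller-2 kafka-controller-3; do
  ssh -t "kafka@$node" "sudo bootc upgrade && sudo reboot"
done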
Verify Prometheus metrics are exposed
Finally, since we don't have a Prometheus server set up to scrape each node, let's curl each node to validate that metrics are working:
$ curl kafka-broker-1:9100/metrics
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 0
go_gc_duration_seconds{quantile="0.25"} 0
...
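To check every node at once, a quick loop works here too:
$ for node in kafka-broker-1 kafka-broker-2 kafka-broker-3 \
    kafka-controller-1 kafka-controller-2 kafka-controller-3; do
  echo "== $node =="
  curl -s "$node:9100/metrics" | head -n 3
done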
Conclusion
You now have a fully functional, production-like Kafka instance running as a cluster of bootc systems utilizing logically bound images! While we used local tools to build and test this, you can imagine how to integrate this container image workflow into an existing production pipeline. This lets you scale out your Kafka nodes into larger systems without sacrificing the container management and security tools you're familiar with. Using logically bound images ensures each Kafka node will reboot quickly on upgrades and the state of each image is what we expect. This example also demonstrated how easy it is to add additional services to each node.
All of the code to build this virtual Kafka cluster can be found at github.com/ckyrouac/kafka-bootc-cluster.