Apache Kafka has emerged as the leading platform for building real-time data pipelines. Born as a messaging system, mainly for the publish/subscribe pattern, Kafka has established itself as a data-streaming platform for processing data in real time. Today, Kafka is also heavily used for developing event-driven applications, with Apache Kafka acting as the backbone that lets the services in your infrastructure communicate with each other through events. Meanwhile, cloud-native application development is gaining traction thanks to Kubernetes.
Thanks to the abstraction layer provided by Kubernetes, it's easy to move your applications from bare metal to any cloud provider (AWS, Azure, GCP, IBM Cloud, and so on), which also enables hybrid-cloud scenarios. But how do you move your Apache Kafka workloads to the cloud? It's possible, but it's not simple. You could learn all of the Apache Kafka tools for handling a cluster well enough to move your Kafka workloads to Kubernetes, or you could leverage the Kubernetes knowledge you already have by using Strimzi.
Note: Strimzi will be represented at the virtual KubeCon Europe 2020 conference from 17-20 August 2020. See the end of the article for details.
Welcome to Strimzi
Strimzi is an open source project, licensed under the Apache License 2.0, that has been part of the Cloud Native Computing Foundation (CNCF) as a sandbox project since last year. Its main focus is running Apache Kafka on Kubernetes, and it provides container images for Apache Kafka itself, ZooKeeper, and the other components of the Strimzi ecosystem.
Leveraging the Kubernetes Operator pattern, it addresses the whole lifecycle from creating, managing, and monitoring Kafka clusters to managing all the related entities like topics and users. You get a real Kubernetes-native experience for handling all of the components in the Apache Kafka ecosystem.
Extending Kubernetes: The Strimzi custom resource definitions
Strimzi extends the Kubernetes API with new Kafka-related custom resource definitions (CRDs). This means that, in addition to the usual Kubernetes-native resources and objects such as Pod and Deployment, you get a set of custom resources describing Kafka-related components. The main CRD is Kafka, which describes a Kafka cluster to deploy with:
- The number of replicas (brokers) you want.
- The related configuration.
- The listeners for making the brokers accessible from inside or outside the Kubernetes cluster where Kafka is running.
- And many more.
It also describes the ZooKeeper ensemble that Kafka needs in order to work, as well as the configuration of the Strimzi Operators that handle topics and users. Finally, this resource can also deploy Cruise Control for cluster-rebalancing operations.
Using the well-known kubectl tool, you can list all of the Kafka instances running on your Kubernetes cluster:

```
$ kubectl get kafka
NAME         DESIRED KAFKA REPLICAS   DESIRED ZK REPLICAS
my-cluster   3                        3
```
Listing 1: Kafka resources in the Kubernetes cluster.
But Strimzi is not just about the Kafka brokers; it's about the entire ecosystem. With the KafkaTopic and KafkaUser resources, you can create topics with their configuration (partitions, replicas, and so on) and users with the related access control lists (ACLs) for accessing topics, without using any Kafka-specific tool. The KafkaConnect and KafkaConnector resources let you deploy Kafka Connect and configure connectors that use Kafka to move data between different systems (for example, migrating data from one database to another).
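To give an idea of what these resources look like, here is a minimal sketch that creates a topic and a TLS-authenticated user with read access to it on the my-cluster cluster from Listing 3. The names my-topic, my-user, and my-group are hypothetical, and the apiVersion matches the v1beta1 API used in Listing 3. The user's ACLs only take effect if the Kafka resource enables client authentication on a listener and simple authorization, which Listing 3 does not do.

```yaml
# Topic managed by the Topic Operator; the label binds it to the "my-cluster" Kafka resource.
apiVersion: kafka.strimzi.io/v1beta1
kind: KafkaTopic
metadata:
  name: my-topic                  # hypothetical topic name
  labels:
    strimzi.io/cluster: my-cluster
spec:
  partitions: 3
  replicas: 3
  config:
    retention.ms: 604800000       # keep messages for 7 days
---
# User managed by the User Operator. Assumes TLS client authentication and
# simple authorization are enabled on the Kafka cluster (not part of Listing 3).
apiVersion: kafka.strimzi.io/v1beta1
kind: KafkaUser
metadata:
  name: my-user                   # hypothetical user name
  labels:
    strimzi.io/cluster: my-cluster
spec:
  authentication:
    type: tls
  authorization:
    type: simple
    acls:
      # Allow reading and describing my-topic
      - resource:
          type: topic
          name: my-topic
          patternType: literal
        operation: Read
      - resource:
          type: topic
          name: my-topic
          patternType: literal
        operation: Describe
      # Allow consuming as part of the (hypothetical) consumer group my-group
      - resource:
          type: group
          name: my-group
          patternType: literal
        operation: Read
```

After you apply this file with kubectl apply -f, the User Operator stores the generated client certificate and key for a TLS user in a Secret named after the KafkaUser.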
Strimzi also supports mirroring data between two clusters running in different data centers through MirrorMaker, which can be deployed using the KafkaMirrorMaker and KafkaMirrorMaker2 resources. HTTP clients can connect to your Kafka cluster through the Strimzi Kafka Bridge, which is deployed using the KafkaBridge resource.
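A minimal KafkaBridge resource might look like the following sketch. It assumes the my-cluster cluster from Listing 3, whose plain listener is reachable through the my-cluster-kafka-bootstrap service on port 9092, and a hypothetical bridge name. The apiVersion shown is an assumption for Strimzi releases of this period, so check it against the CRDs your release installs.

```yaml
apiVersion: kafka.strimzi.io/v1alpha1   # assumption: verify against your Strimzi release
kind: KafkaBridge
metadata:
  name: my-bridge                         # hypothetical name
spec:
  replicas: 1
  # Bootstrap service of the "my-cluster" Kafka resource (plain listener)
  bootstrapServers: my-cluster-kafka-bootstrap:9092
  http:
    port: 8080                            # port on which the bridge serves its HTTP API
```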
Finally, because a cluster can become unbalanced over time, with some brokers handling more traffic than others, you can use a KafkaRebalance resource to ask the Cruise Control instance to rebalance the cluster so that it meets goals in terms of CPU, network, disk utilization, and so on.
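A sketch of such a request follows. It assumes that the Kafka resource also contains cruiseControl: {} under its spec (Listing 3 does not) and uses a hypothetical resource name. The goals list is optional; if you provide one, it generally has to include the hard goals configured in Cruise Control, and the six below are the ones we understand to be the defaults. The apiVersion is an assumption for Strimzi releases of this period.

```yaml
# Assumes "cruiseControl: {}" has been added to the Kafka resource's spec.
apiVersion: kafka.strimzi.io/v1alpha1   # assumption: verify against your Strimzi release
kind: KafkaRebalance
metadata:
  name: my-rebalance                      # hypothetical name
  labels:
    strimzi.io/cluster: my-cluster        # the Kafka cluster to rebalance
spec:
  # Optional: goals Cruise Control optimizes for; omit the list to use its defaults
  goals:
    - CpuCapacityGoal
    - NetworkInboundCapacityGoal
    - DiskCapacityGoal
    - RackAwareGoal
    - NetworkOutboundCapacityGoal
    - ReplicaCapacityGoal
```

Cruise Control first computes an optimization proposal and reports it in the resource's status. Partitions are only moved after you approve the proposal, for example by annotating the resource with strimzi.io/rebalance=approve.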
How does Strimzi work?
There are different ways to install Strimzi on your Kubernetes cluster: using the YAML files provided with each release, through OperatorHub.io, or by applying a single YAML file directly from the official website, as we do in this example.
Run the following command to deploy the Strimzi Cluster Operator in the default namespace (the URL is quoted so that shells such as zsh do not treat the ? as a wildcard):

```
$ kubectl apply -f 'https://strimzi.io/install/latest?namespace=default'
```
Listing 2: Installing the Strimzi Operators.
At this point, you need a YAML file containing a Kafka resource that describes the Kafka cluster you want to deploy:
```yaml
apiVersion: kafka.strimzi.io/v1beta1
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    version: 2.5.0
    replicas: 3
    listeners:
      plain: {}
      tls: {}
    config:
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      transaction.state.log.min.isr: 2
      log.message.format.version: "2.5"
    storage:
      type: jbod
      volumes:
      - id: 0
        type: persistent-claim
        size: 100Gi
        deleteClaim: false
  zookeeper:
    replicas: 3
    storage:
      type: persistent-claim
      size: 100Gi
      deleteClaim: false
  entityOperator:
    topicOperator: {}
    userOperator: {}
```
Listing 3: Kafka resource describing a cluster.
The Kafka resource in Listing 3 describes a simple Kafka cluster with three brokers, accessible through a "plain" listener (on port 9092) and a TLS-encrypted listener (on port 9093), along with some specific configuration parameters. Both the brokers and ZooKeeper use persistent storage.
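Both listeners in Listing 3 are only reachable from inside Kubernetes. As a sketch, assuming the same v1beta1 listener format, an external listener of type nodeport could be added next to them so that clients outside the cluster can reach the brokers. Other external types, such as loadbalancer, route (on OpenShift), and ingress, also exist; which one fits depends on your environment.

```yaml
# Fragment of the Kafka resource from Listing 3 (only spec.kafka.listeners is shown)
spec:
  kafka:
    listeners:
      plain: {}            # internal, port 9092
      tls: {}              # internal, TLS-encrypted, port 9093
      external:
        type: nodeport     # expose the brokers through Kubernetes NodePort services
        tls: false         # external traffic is not encrypted in this sketch
```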
When you apply the Kafka resource from Listing 3, the Strimzi Cluster Operator deploys the Kafka cluster: it starts the ZooKeeper and Kafka pods and, finally, the Strimzi Operators for handling topics and users. In the end, running kubectl get pods should show all the pods from Listing 4 running, and the cluster is ready for your Kafka clients to exchange messages:
```
NAME                                         READY   STATUS    RESTARTS   AGE
my-cluster-entity-operator-f977bf457-l2rjf   3/3     Running   0          51s
my-cluster-kafka-0                           2/2     Running   0          90s
my-cluster-kafka-1                           2/2     Running   0          90s
my-cluster-kafka-2                           2/2     Running   0          90s
my-cluster-zookeeper-0                       1/1     Running   0          2m58s
my-cluster-zookeeper-1                       1/1     Running   0          2m58s
my-cluster-zookeeper-2                       1/1     Running   0          2m58s
strimzi-cluster-operator-7d6cd6bdf7-8xxvx    1/1     Running   0          17m
```
Listing 4: Strimzi Operator and Kafka cluster-related pods running.
In just a few minutes, an Apache Kafka cluster is up and running on Kubernetes. From there, you can easily scale your cluster up or down by changing the number of replicas, or update the cluster configuration. The Strimzi Cluster Operator watches for changes to the Kafka resource and applies them to the running cluster: it starts new brokers or shuts down existing ones, updates their configuration, and performs a rolling update when needed.
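As a sketch of such a change, the fragment below shows the parts of Listing 3 you might edit before re-applying the file with kubectl apply: a fourth broker and one extra broker setting, log.retention.hours, which is chosen purely for illustration. Kafka does not move existing partitions onto a new broker by itself; redistributing them is one of the jobs of Cruise Control and the KafkaRebalance resource.

```yaml
# Fragment of the Kafka resource from Listing 3 (only the edited parts of spec.kafka)
spec:
  kafka:
    replicas: 4                       # was 3: the operator adds the broker pod my-cluster-kafka-3
    config:
      offsets.topic.replication.factor: 3
      transaction.state.log.replication.factor: 3
      transaction.state.log.min.isr: 2
      log.message.format.version: "2.5"
      log.retention.hours: 72         # illustrative new setting; brokers are rolled if required
```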
This is just the beginning of your journey. Take a look at the official Strimzi website and blog posts to learn more about all of the other resources.
The Strimzi community
As part of CNCF, Strimzi is well integrated with other projects in that ecosystem. First of all, it provides Helm charts for deploying the Strimzi Operators, and it can export metrics to be scraped by a Prometheus server and displayed on Grafana dashboards. You can also enable tracing with Jaeger and OpenTracing to trace messages flowing through the Apache Kafka cluster between clients and other components such as the bridge, Kafka Connect, and MirrorMaker.
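As an illustration of the metrics side, the sketch below enables the Prometheus JMX Exporter on the brokers through the inline metrics property of the v1beta1 Kafka resource. The single rule is a hypothetical minimal example. Real setups typically start from the fuller metrics configuration in the Strimzi examples, and later Strimzi releases may configure metrics differently, so check the documentation for your release.

```yaml
# Fragment of the Kafka resource from Listing 3 (only spec.kafka.metrics is shown)
spec:
  kafka:
    # Prometheus JMX Exporter configuration, inline as in the v1beta1 API
    metrics:
      lowercaseOutputName: true
      rules:
        # Export Kafka server MBeans exposing a "Value" attribute as kafka_server_<type>_<name>
        - pattern: "kafka.server<type=(.+), name=(.+)><>Value"
          name: "kafka_server_$1_$2"
          type: GAUGE
```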
Strimzi also integrates with the Open Policy Agent (OPA) project, which can describe the authorization policies that decide whether clients may produce and consume messages on topics. Finally, the Kubernetes Event-driven Autoscaling (KEDA) project can autoscale event-driven applications using its Apache Kafka-based scaler.
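On the Kafka side, delegating authorization to OPA comes down to a small change in the Kafka resource, sketched below. The OPA service name, port, and policy path in the URL are hypothetical: the decision URL depends on where OPA runs and on the package and rule names of your Rego policy. A listener with client authentication is also assumed, so that OPA knows which user is making each request.

```yaml
# Fragment of the Kafka resource from Listing 3 (only spec.kafka.authorization is shown)
spec:
  kafka:
    authorization:
      type: opa
      # Hypothetical OPA decision endpoint: /v1/data/<policy package path>/<rule>
      url: http://opa:8181/v1/data/kafka/authz/allow
```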
But the Strimzi ecosystem isn't only about integration with other CNCF projects. It also means interacting with a growing community of people working on the project itself.
To engage with the community and help the project gain traction:
- Download and try Strimzi, deploy your Apache Kafka cluster, and maybe discover bugs or suggest new features through the official GitHub repository.
- Improve the documentation if you're not working on the code.
- Spread the word at conferences and meetups.
- Blog about Strimzi and the way you are using it.
Conclusion
This article showed that moving your Apache Kafka workloads to the cloud doesn't have to be painful. Thanks to Strimzi, it takes just a few minutes to get a Kafka cluster up and running on Kubernetes, after which you can start tuning it for production. The best part is that Strimzi is not just about Kafka itself but about its entire ecosystem.
If you want to know more, visit the official website, join the #strimzi channel on the CNCF Slack workspace, or follow the Twitter account. We are really excited to hear from you!
KubeCon Europe 2020
Strimzi will be represented at the virtual KubeCon Europe 2020 conference from August 17-20, 2020:
- 17 August 2020 at 13:00 CEST: “Meet the Maintainers” session.
- 19 August 2020 at 13:00 CEST: Attend our introduction talk, which will explain the basics of how Strimzi works and show a demo of how to easily deploy a Kafka cluster. This demo will be followed by a live Q&A.
- 19 August 2020 at 14:00 CEST: “Meet the Maintainers” session.
In the "Meet the Maintainers" sessions, you can meet all our maintainers, talk with us, and ask questions about things like problems, bugs, or missing features.
Last updated: August 13, 2020