Kubernetes-native Apache Kafka with Strimzi, Debezium, and Apache Camel (Kafka Summit 2020)
Apache Kafka has become the leading platform for building real-time data pipelines. Today, Kafka is heavily used for developing event-driven applications, where it lets services communicate with each other through events. Using Kubernetes for this type of workload requires adding specialized components such as Kubernetes Operators and connectors to bridge the rest of your systems and applications to the Kafka ecosystem.
In this article, we’ll look at how the open source projects Strimzi, Debezium, and Apache Camel integrate with Kafka to speed up critical areas of Kubernetes-native development.
Note: Red Hat is sponsoring the Kafka Summit 2020 virtual conference from August 24-25, 2020. See the end of this article for details.
Kafka on Kubernetes with Strimzi
Strimzi is a Cloud Native Computing Foundation (CNCF) open source project that makes it easier to move Apache Kafka workloads to the cloud. Strimzi relies on the abstraction layer provided by Kubernetes and the Kubernetes Operator pattern. Its main focus is running Apache Kafka on Kubernetes, and it provides container images for Kafka, ZooKeeper, and the other components that are part of the Strimzi ecosystem.
Strimzi extends the Kubernetes API with Kafka-related custom resource definitions (CRDs). The main Kafka CRD describes a Kafka cluster to deploy, along with the ZooKeeper ensemble it requires. But Strimzi is not just for the broker; you can also use it to create and configure topics, and to create users with access to those topics. Strimzi also supports mirroring data between clusters through Kafka MirrorMaker 2.0 custom resources, as well as deploying and managing the Strimzi Kafka Bridge for HTTP clients.
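To make the CRD approach concrete, here is a minimal sketch of a Kafka custom resource that the Strimzi Cluster Operator would reconcile into a running cluster. The cluster name, replica counts, and ephemeral storage are illustrative choices, not requirements:

```yaml
apiVersion: kafka.strimzi.io/v1beta1
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    replicas: 3          # three Kafka brokers
    listeners:
      plain: {}          # plaintext listener on 9092
      tls: {}            # TLS listener on 9093
    storage:
      type: ephemeral    # fine for demos; use persistent-claim in production
  zookeeper:
    replicas: 3          # ZooKeeper ensemble size
    storage:
      type: ephemeral
  entityOperator:
    topicOperator: {}    # enables KafkaTopic custom resources
    userOperator: {}     # enables KafkaUser custom resources
```

Applying this manifest with `kubectl apply -f kafka.yaml` is all it takes; the operator handles pod creation, configuration, and rolling updates from there.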
Change data capture with Debezium
Debezium is a set of distributed services that capture row-level changes in your databases so that your applications can see and respond to those changes. Debezium reads each database's transaction log and records every committed row-level change as an event stream. Applications simply read the streams they're interested in and see all of the events in the order in which they occurred. Debezium is durable and fast, so apps can respond quickly and never miss an event, even when things go wrong.
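As an illustration (abbreviated, with the schema portion omitted), the payload of a Debezium change event for a row update carries the state before and after the change, metadata about its source, and an operation code:

```json
{
  "before": { "id": 1004, "email": "anne@example.com" },
  "after":  { "id": 1004, "email": "anne.k@example.com" },
  "source": {
    "connector": "mysql",
    "db": "inventory",
    "table": "customers"
  },
  "op": "u",
  "ts_ms": 1598276041000
}
```

The `op` field distinguishes creates (`c`), updates (`u`), and deletes (`d`), so consumers can react differently to each kind of change.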
Debezium provides connectors for monitoring the following databases:
- MySQL Connector
- PostgreSQL Connector
- MongoDB Connector
- SQL Server Connector
Debezium connectors record all events to a Red Hat AMQ Streams Kafka cluster, and applications then consume those events through AMQ Streams. Because Debezium is built on the Apache Kafka Connect framework, all of Debezium's connectors are Kafka Connect source connectors. As such, they can be deployed and managed using AMQ Streams' Kafka Connect custom Kubernetes resources.
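A sketch of that deployment model follows, using a KafkaConnector custom resource to register a Debezium MySQL connector. It assumes a Kafka Connect cluster named `my-connect-cluster` with the Debezium plugin on its classpath, and a MySQL instance reachable at `mysql:3306`; all names and credentials here are placeholders:

```yaml
apiVersion: kafka.strimzi.io/v1alpha1
kind: KafkaConnector
metadata:
  name: inventory-connector
  labels:
    strimzi.io/cluster: my-connect-cluster   # target Kafka Connect cluster
spec:
  class: io.debezium.connector.mysql.MySqlConnector
  tasksMax: 1
  config:
    database.hostname: mysql                 # placeholder MySQL service name
    database.port: 3306
    database.user: debezium                  # placeholder credentials
    database.password: dbz
    database.server.id: 184054               # unique ID for the MySQL replication client
    database.server.name: dbserver1          # logical name, used as topic prefix
    database.whitelist: inventory            # capture changes from this database only
    database.history.kafka.bootstrap.servers: my-cluster-kafka-bootstrap:9092
    database.history.kafka.topic: schema-changes.inventory
```

Once applied, change events appear in topics prefixed with the logical server name, such as `dbserver1.inventory.customers`.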
Kafka connectivity with Apache Camel Kafka Connect
The Apache Camel community has built one of the most active open source integration frameworks in the Apache Software Foundation ecosystem. Camel lets you quickly and easily integrate data consumer and producer systems. It also implements the most widely used enterprise integration patterns and incorporates popular interfaces and protocols as they emerge.
The Camel Kafka Connector subproject focuses on using Camel components as Kafka Connect connectors. To this end, the development team built a thin layer between the Camel and Kafka frameworks, which lets you use each Camel component as a Kafka connector in the Kafka ecosystem. More than 340 Camel Kafka connectors support integrations with everything from AWS S3 to Telegram and Slack. All of these connectors can be used with Kafka without writing a single line of code.
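Because Camel Kafka connectors are ordinary Kafka Connect connectors, they can be declared with the same KafkaConnector custom resource shown for Debezium. The following sketch registers the Camel AWS S3 source connector; the bucket name, topic, credentials, and region are all placeholders, and it assumes a Connect cluster named `my-connect-cluster` with the connector archive installed:

```yaml
apiVersion: kafka.strimzi.io/v1alpha1
kind: KafkaConnector
metadata:
  name: s3-source-connector
  labels:
    strimzi.io/cluster: my-connect-cluster
spec:
  class: org.apache.camel.kafkaconnector.awss3.CamelAwss3SourceConnector
  tasksMax: 1
  config:
    topics: s3-topic                               # Kafka topic to write objects to
    camel.source.path.bucketNameOrArn: my-bucket   # placeholder S3 bucket
    camel.component.aws-s3.accessKey: my-access-key    # placeholder credentials
    camel.component.aws-s3.secretKey: my-secret-key
    camel.component.aws-s3.region: eu-west-1
```

With this resource applied, objects landing in the bucket are streamed into the `s3-topic` Kafka topic without any application code.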
New organizations are adopting Apache Kafka as an event backbone every day. Communities like Apache Camel are working on how to speed up development in key areas such as integration. The Debezium community provides specialized connectors that simplify integrating database-generated events from microservices or legacy applications into modern, event-driven architectures. Finally, CNCF projects like Strimzi make it easier to access the benefits of Kubernetes and deploy Apache Kafka workloads in a cloud-native way.
For those who want an open source development model with enterprise support, Red Hat Integration lets you deploy your Kafka-based event-driven architecture on Red Hat OpenShift, the enterprise Kubernetes. Red Hat AMQ Streams, Debezium, and the Apache Camel Kafka Connect connectors are all available with a Red Hat Integration subscription.
Kafka Summit 2020
If you want to know more about running Apache Kafka on Kubernetes, Red Hat is sponsoring the Kafka Summit 2020 virtual conference from August 24-25, 2020. You can join either of the following sessions (note that you must be registered to follow these links):
- Tuesday, August 25, 2020, at 10:00 a.m. PDT: Change Data Capture Pipelines with Debezium and Kafka Streams by Debezium project lead Gunnar Morling.
- Tuesday, August 25, 2020, at 10:30 a.m. PDT: Camel Kafka Connectors: Tune Kafka to “Speak” With (Almost) Everything by Apache Camel engineers Andrea Cosentino and Andrea Tarocchi.
If you want to follow the conversation and talk with the presenters, I’ll be hosting panel discussions with the engineering leads at these times:
- Monday, August 24, 2020, from 10:00 a.m. – 11:00 a.m. PDT
- Monday, August 24, 2020, from 1:00 p.m. – 2:00 p.m. PDT
- Tuesday, August 25, 2020, from 11:00 a.m. – 12:00 p.m. PDT
- Tuesday, August 25, 2020, from 1:00 p.m. – 2:00 p.m. PDT
Finally, we will have more Red Hatters at the sponsored booth throughout the event to answer your questions about running Kafka on Kubernetes.