This 56th edition of the Kafka Monthly Digest covers what happened in the Apache Kafka community in September 2022.
For last month’s digest, see Kafka Monthly Digest: August 2022.
Four bugfix versions, 2.8.2, 3.0.2, 3.1.2 and 3.2.3, have been released to address CVE-2022-34917. This CVE allows malicious clients to allocate large amounts of memory on brokers, potentially leading to OutOfMemoryError and causing denial of service. You should upgrade to one of these releases as soon as possible.
The next minor version, Kafka 3.3, has been released.
Just before 3.3.0 was announced, a couple of blocker JIRAs (KAFKA-14259 and KAFKA-14265) were found. As a result, 3.3.0 was scrapped and, on September 29, José Armando García Sancio published 3.3.1 RC0. The vote passed and 3.3.1 was released on October 3. As always, you can find the complete list of changes in the release notes (for 3.3.0 and 3.3.1) or the release plan on the Kafka wiki.
This new minor release brings many new interesting features, which I'll highlight in the next sections.
Kafka brokers and clients
Updates to the Kafka broker and clients include the following:
Running Kafka in KRaft mode is now production ready. Note that a few features are still missing, such as configuring SCRAM or updating certain dynamic configurations via the Admin API, support for JBOD configurations, and delegation tokens. It's also still not possible to upgrade clusters using ZooKeeper to KRaft; this feature is currently planned to be ready by Kafka 3.5 (KIP-833).
Upgrades in KRaft mode are now supported. If you deploy a KRaft cluster with 3.3, you will be able to upgrade it when Kafka 3.4 is released (KIP-778).
You can now retrieve the total and free space of log directories via the Admin API describeLogDirs() method (KIP-827).
The default partitioner has been updated to spread records uniformly. This partitioner is now used when partitioner.class is not set (KIP-794).
The OffsetFetch request has been updated so it can retrieve the committed offsets for multiple groups at the same time. This can improve the performance of applications or tools managing multiple consumer groups (KIP-709).
Superusers can now create delegation tokens on behalf of other users (KIP-373).
There are now metrics that track log recovery at startup (KIP-831).
Updates to Kafka Connect include the following:
- Source connectors can now provide exactly once semantics (KIP-618).
Updates to Kafka Streams include the following:
A number of recent improvements from the Processor API are now available via the Streams DSL API (KIP-820).
You can now pause and resume Streams topologies. This can be useful to reduce resource usage when processing is not required or when handling operational issues (KIP-834).
Last month, the community submitted six KIPs (KIP-866 to KIP-872; KIP-867 was skipped). I'll highlight a few of them:
KIP-866: ZooKeeper to KRaft Migration. This KIP proposes a mechanism to migrate Kafka clusters currently using ZooKeeper to KRaft. The goal is to allow migrating clusters to KRaft without impacting their availability or consistency.
KIP-868: Metadata Transactions. In KRaft mode, whenever a change in the cluster metadata happens, it is written into the __cluster_metadata topic. Some changes, like a topic creation, map to several records in the topic. In these cases, in order to guarantee the atomicity of the change, the records are put in the same batch. However, the maximum fetch size by members of the quorum, currently 8 kB, limits the maximum batch size and can prevent some changes from being applied. To address this issue, this KIP proposes a lightweight transaction mechanism when writing to the metadata topic so metadata records don't necessarily have to be in the same batch to be applied atomically.
KIP-869: Improve Streams State Restoration Visibility. When a Kafka Streams application rebalances, some tasks may need to restore their state. This process is currently hard to monitor. This KIP's goal is to provide metrics and APIs so users can easily track when tasks are restoring their state and see their progress.
KIP-870: Retention policy based on record event time. Kafka records have a timestamp field that is either set arbitrarily by the producer or set by the broker, using its own wall-clock time, when it receives them. Retention policies are applied by comparing the broker wall-clock time to record timestamps, so records with arbitrary timestamps can cause unexpected behavior. This KIP proposes having two timestamp fields so that one of the timestamps is always set to the broker wall-clock time.
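To make the problem behind KIP-870 more concrete, here is a minimal sketch in plain Java. It is not broker code: the class name, method name and the per-record check are assumptions made for illustration, and real brokers apply time-based retention to log segments rather than individual records. It only shows how the same record can look expired or fresh depending on which timestamp is compared to the broker clock.

```java
import java.time.Duration;

// Illustrative only: a simplified, per-record view of time-based retention.
public class RetentionSketch {

    // A record is considered expired once the chosen timestamp is older
    // than the retention window, measured against the broker wall clock.
    public static boolean isExpired(long timestampMs, long brokerNowMs, long retentionMs) {
        return brokerNowMs - timestampMs > retentionMs;
    }

    public static void main(String[] args) {
        long retentionMs = Duration.ofDays(7).toMillis();
        long brokerNowMs = System.currentTimeMillis();

        // A producer backfills an event that happened 30 days ago.
        // The broker receives it right now.
        long producerTimestampMs = brokerNowMs - Duration.ofDays(30).toMillis();
        long brokerReceiveTimeMs = brokerNowMs;

        System.out.println("expired by producer-set timestamp: "
                + isExpired(producerTimestampMs, brokerNowMs, retentionMs));
        System.out.println("expired by broker receive time: "
                + isExpired(brokerReceiveTimeMs, brokerNowMs, retentionMs));
    }
}
```

With a 7-day retention window, the backfilled record is already past retention when judged by its producer-set timestamp, but not when judged by the time the broker received it. Keeping a broker-set timestamp alongside the producer-set one, as the KIP proposes, would let retention use the second comparison.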
- strimzi-kafka-operator 0.31.1: Strimzi is a Kubernetes Operator for running Kafka. This release adds support for Kafka 3.1.2 and 3.2.3. It's now possible to run multiple cluster operator replicas and use IPv6 addresses in Strimzi-issued certificates. It also deprecates support for Kubernetes 1.16, 1.17 and 1.18.
I selected some interesting blog articles that were published last month:
- Instrumenting Apache Kafka clients with OpenTelemetry
- An Ideation for Kubernetes-native Kafka Connect
- Exploring Popular Open-source Stream Processing Technologies: Part 1 of 2
- Using Pixie to Monitor Strimzi Clusters and Kafka Applications
To learn more about Kafka, visit Red Hat Developer's Apache Kafka topic page.