
This article is the second in a two-part series describing the many ways to run Apache Kafka and the benefits of each. The first article covered distributions for local development and self-managed Kafka. This article talks about Kafka as a Service and "serverless-like" Kafka. We'll conclude with guidance on when you should use each type of distribution.

Kafka as a Service

The popularity of Kafka has made it an attractive service for third-party cloud vendors. Each vendor is different, but some of the common criteria for production Kafka clusters include deployment on multiple availability zones, on-demand scaling, compliance and certification assurance, a predictable cost model, and an open platform for third-party tools and services integrations.

Today, Kafka is over a decade old and there are multiple mature Kafka as a Service offerings able to satisfy many production needs. Although these offerings vary in sizing options, the richness of the user interface, Kafka ecosystem components, and so forth, a key difference is whether Kafka is treated as an infrastructure component or as its own event-streaming category with event-streaming abstractions.

Some Kafka as a Service providers (such as Amazon Managed Streaming for Apache Kafka, Heroku, Instaclustr, and Aiven) expose infrastructure details: virtual machine (VM) sizes, numbers of cores, memory, storage types, broker, Zookeeper, and Kafka Connect details, and so forth. Many critical decisions, such as the infrastructure choice, matching capacity to Kafka workloads, Kafka cluster configuration, and Zookeeper topology, are left to the user. These services resemble infrastructure services that happen to run Kafka on top, a design reflected in their VM-based sizing and pricing models, and they typically offer a larger selection of sizing options. Such providers might be preferred by teams that have infrastructure knowledge and want to control every aspect of a service (even a managed one).

Other Kafka as a Service providers (such as Red Hat OpenShift Streams for Apache Kafka, Confluent Cloud, Amazon MSK Serverless, and Upstash) take a diametrically opposed "Kafka-first" approach. These vendors take care of the details of the infrastructure, the management layer (typically based on Kubernetes), and the Kafka clusters. With these services, the user deals with higher-level, Kafka-focused abstractions such as:

  • Streaming/Kafka/topic-like units of measure (representing normalized, multi-dimensional Kafka capacity) rather than infrastructure capacity
  • Availability guarantees instead of the deployment topology of brokers and Zookeeper
  • Connectors to external systems as an API (regardless of the implementation technology) instead of Kafka Connect cluster deployment and connector deployments

This approach exposes what is important to a Kafka user rather than the underlying infrastructure or implementation choices that make up a Kafka service. In addition, these Kafka-first services offer a consumption-based, Kafka-centric pricing model: the user pays for the Kafka capacity and quality of service they use rather than for provisioned infrastructure with an additional Kafka margin. These services are more suitable for business teams that focus on their business domain and treat Kafka as a commodity tool for solving their business challenges.
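From the application's point of view, this abstraction shows up in the client configuration: everything is expressed in Kafka terms. The sketch below builds a client configuration for a hypothetical managed cluster; the endpoint and credential placeholders are illustrative and not tied to any particular vendor:

```java
import java.util.Properties;

public class ManagedKafkaClient {

    /** Builds a Kafka client configuration for a Kafka-first managed service. */
    public static Properties clientConfig() {
        Properties props = new Properties();
        // The bootstrap URL and credentials are placeholders issued by the
        // service; no broker sizing, storage, or Zookeeper settings appear
        // anywhere in the client.
        props.put("bootstrap.servers", "my-cluster.example.cloud:443");
        props.put("security.protocol", "SASL_SSL");
        props.put("sasl.mechanism", "PLAIN");
        props.put("sasl.jaas.config",
                "org.apache.kafka.common.security.plain.PlainLoginModule required "
                + "username=\"<client-id>\" password=\"<client-secret>\";");
        return props;
    }
}
```

The same `Properties` object can be handed to a `KafkaProducer` or `KafkaConsumer` unchanged; only the connection coordinates differ between vendors.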

Figure 1 rates some popular services along two dimensions: high-level abstractions versus low-level details, and user-managed versus fully autonomous.

Figure 1. Kafka as a Service offerings can be rated for high-level abstractions versus low-level details, and user-managed versus fully autonomous.

"Serverless-like" Kafka

Serverless technologies are a class of SaaS that free users from the work of provisioning and managing servers. Such services eliminate capacity management through built-in high availability and rebalancing, automatic scaling up, and scaling down to zero, and they offer users a pay-per-use pricing model.

Some Kafka-first managed services are blurring the line and getting closer to a serverless-like experience, where the user is interacting with Kafka APIs and everything else is taken care of.

Serverless-like Kafka offerings currently have strengths and weaknesses. On the positive side, Kafka-first services are getting pretty close to a serverless user experience, except for the pricing aspect. Their users don't have to provision infrastructure, and the Kafka clusters come preconfigured for high availability, with partition rebalancing, storage expansion, and auto-scaling (within certain boundaries).

On the negative side, whether or not a Kafka service is called serverless, these offerings still have significant technical and pricing limitations and are not yet fully mature. The services impose limits on message size, partition count, network throughput, and storage, which constrain the use cases for so-called serverless Kafka. Other than Upstash, which charges per message, the remaining serverless Kafka services charge for cluster hours, which goes against the scale-to-zero, pay-per-use ethos of serverless.

That is why today I consider the serverless Kafka category still an aspiration rather than a reality. Nevertheless, these trends set the direction where managed Kafka offerings are headed:

  • Infrastructure and deployment issues completely hidden from the user
  • Kafka-first primitives for capacity, usage, and quality of service
  • An autonomous service lifecycle that doesn't require any user intervention
  • A true pay-for-what-you-use pricing model

What Kafka distribution is right for you?

Table 1 shows the pros and cons of the different models covered in this series. How many types do you need? The answer is more than one.

Table 1: Pros and cons of different types of Kafka distributions

Kafka for local development
  • Pro: Developer-friendly, stateless runtimes for rapid iterative development
  • Con: Not applicable for any purpose other than development

Self-managed Kafka
  • Pro: Allows full customization and tuning for on-premise or cloud deployment
  • Con: Requires 24/7 monitoring, patching, upgrades, and maintenance

Kafka as a Service
  • Pro: Takes away the burden of maintaining, certifying, and operating Kafka clusters
  • Con: Clusters are deployed with common configurations that might not fit custom needs

"Serverless-like" Kafka
  • Pro: Pay-per-use pricing with autonomous management and autoscaling
  • Con: Imposes constraints on the clusters that limit their use cases to simple ones

We'll end by looking at the various requirements you might be dealing with in your environment. You want developer frameworks (such as Quarkus) that can emulate Kafka locally and enable rapid, iterative development. You want a declarative and automated way to repeatedly deploy and update development environments. Depending on your business requirements, you may require highly customized Kafka implementations at the edge or standard implementations across multiple clouds that are all connected. As your organization's adoption of event streaming and sophistication with Kafka grow, you will need more Kafka capabilities.
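The local-emulation experience mentioned above can be sketched in a few lines of Quarkus configuration. This is a sketch, not vendor documentation; the channel and topic names are hypothetical:

```properties
# src/main/resources/application.properties (channel/topic names are hypothetical)

# Bind an incoming channel to a Kafka topic via SmallRye Reactive Messaging.
mp.messaging.incoming.orders.connector=smallrye-kafka
mp.messaging.incoming.orders.topic=orders

# No kafka.bootstrap.servers is set for dev/test mode: Quarkus Dev Services
# detects this and starts a disposable local broker automatically.

# In production, point at a real (for example, managed) cluster:
%prod.kafka.bootstrap.servers=my-cluster.example.cloud:9092
```

The same application then runs unchanged against the throwaway dev broker and the production cluster, which is exactly the rapid, iterative loop described above.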

But there is a paradox. If you are not in the Kafka business, you should work less on Kafka itself and use Kafka more with a focus on tasks that set your business apart. This is possible if you use Kafka through higher-level frameworks like Strimzi that automate many of the operational aspects, or through a Kafka-first service (such as Red Hat OpenShift Streams for Apache Kafka) that takes care of low-level decision-making and relieves you of the responsibility of running Kafka. This way, your teams stop thinking about Kafka and start thinking about how to use Kafka for what matters to your customers.
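To illustrate the operator-driven approach, here is a minimal sketch of a Strimzi `Kafka` custom resource; the cluster name, replica counts, and storage sizes are placeholder values:

```yaml
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster          # placeholder name
spec:
  kafka:
    replicas: 3             # the operator handles broker deployment and rolling updates
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
    storage:
      type: persistent-claim
      size: 100Gi
      deleteClaim: false
  zookeeper:
    replicas: 3             # Zookeeper topology is declared, not hand-managed
    storage:
      type: persistent-claim
      size: 100Gi
  entityOperator:
    topicOperator: {}       # manages topics declared as KafkaTopic resources
```

Applying this resource lets the operator create the cluster and continuously reconcile it, so day-2 operations become declarative edits rather than manual procedures.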

Last updated: January 6, 2023