What is Apache Kafka?
Apache Kafka is a distributed data streaming platform that enables applications to publish, subscribe to, store, and process streams of messages in real time. In a publish/subscribe (pub/sub) system, senders (publishers) push messages to a central point, where the messages are classified. Subscribers then receive the messages that interest them from that central point.
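To make the pub/sub pattern concrete, here is a minimal in-memory sketch in Python. It is not Kafka code and uses no Kafka client library: the `ToyBroker` class, its `subscribe` and `publish` methods, and the topic names are all hypothetical, chosen only to illustrate a central point that classifies messages by topic and hands them to interested subscribers.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class ToyBroker:
    """Hypothetical in-memory stand-in for the central point; not a Kafka API."""

    def __init__(self) -> None:
        # Maps a topic name to the callbacks of its subscribers.
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[str], None]) -> None:
        """Register interest in a topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: str) -> int:
        """Deliver the message to every subscriber of the topic.

        Returns the number of subscribers that received it.
        """
        # .get avoids creating an entry for topics nobody has subscribed to.
        callbacks = self._subscribers.get(topic, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)


broker = ToyBroker()
orders_seen: List[str] = []
broker.subscribe("orders", orders_seen.append)

broker.publish("orders", "order-1 created")       # delivered to one subscriber
broker.publish("payments", "payment-9 settled")   # no subscriber, so it is dropped
```

Note that this toy broker keeps nothing: a message published to a topic with no subscriber at that moment is simply lost.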
Apache Kafka offers a few major advantages over traditional pub/sub systems:
- It stores messages with fault tolerance.
- It processes data streams as they occur, in real time, rather than in batches.
- It guarantees that messages are never overwritten.
- It supports very high throughput.
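One way to picture the storage and never-overwritten properties in the list above is an append-only log, in which each message receives a position (an offset) and readers keep track of their own position. The sketch below is an illustration under that assumption only: `ToyLog`, `append`, and `read_from` are hypothetical names, the log lives in memory on a single machine, and nothing here models fault tolerance or throughput.

```python
from typing import List


class ToyLog:
    """Hypothetical append-only log; illustrative only, not Kafka's storage engine."""

    def __init__(self) -> None:
        self._records: List[str] = []

    def append(self, message: str) -> int:
        """Add a message at the end and return its offset (its position in the log)."""
        self._records.append(message)
        return len(self._records) - 1

    def read_from(self, offset: int) -> List[str]:
        """Return a copy of every message at or after the offset; the log is unchanged."""
        return self._records[offset:]


log = ToyLog()
first = log.append("sensor=21.5")
second = log.append("sensor=21.7")

# Each reader supplies its own starting offset, so reading never removes
# or alters a message, and one reader cannot affect another.
reader_a = log.read_from(first)
reader_b = log.read_from(second)
```

Because the class exposes only `append` and `read_from`, existing entries cannot be modified through its interface, and a reader that starts late can still begin from offset 0.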
Publish/subscribe systems require a broker, which is the central point where messages are published. A typical Kafka cluster contains multiple brokers (Figure 1). Topics are hosted on the brokers, and each topic is split into one or more partitions. We will dive into each of these concepts in the next few sections.