Topics and partitions recap

As a reminder, a topic is a collection of messages that are persisted to disk and replicated across brokers for fault tolerance. Each topic is configured with a retention period, so its messages can be kept for a short time or, potentially, forever.
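
How long a topic keeps its messages can be pictured as a simple time window. The sketch below is a toy model, not Kafka code: `apply_retention` and the `(timestamp_ms, value)` tuples are hypothetical names chosen for illustration. It borrows one real convention, namely that Kafka's topic-level `retention.ms` setting treats `-1` as "no time limit". Real brokers delete whole log segments rather than individual messages, so actual cleanup is coarser than shown here.

```python
def apply_retention(log, retention_ms, now_ms):
    """Return the messages that are still inside the retention window.

    log          -- list of (timestamp_ms, value) tuples, oldest first
    retention_ms -- window length in milliseconds; -1 means keep forever
    now_ms       -- current time in milliseconds
    """
    if retention_ms == -1:
        return list(log)  # no time limit: every message is retained
    cutoff = now_ms - retention_ms
    # Keep only messages written at or after the cutoff.
    return [(ts, value) for ts, value in log if ts >= cutoff]


if __name__ == "__main__":
    messages = [(1_000, "a"), (5_000, "b"), (9_000, "c")]
    print(apply_retention(messages, retention_ms=5_000, now_ms=10_000))
    print(apply_retention(messages, retention_ms=-1, now_ms=10_000))
```

With a 5-second window the oldest message falls outside the cutoff and is dropped, while `-1` keeps all three.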

Figure 14 shows the difference between storing data in a relational database management system (RDBMS) and storing it in a Kafka topic. Kafka appends each event to a log and never overwrites earlier events. In contrast, an RDBMS stores a snapshot of a record, and when you update the record you modify the original in place.

Figure 14: An RDBMS updates records in place, whereas Kafka places updates in a log after the old messages.
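
The contrast in Figure 14 can be sketched in a few lines. This is a toy model under stated assumptions: a Python dict stands in for an RDBMS table and a list stands in for a Kafka log, and the names `db_upsert`, `log_append`, and `replay` are hypothetical helpers, not part of any Kafka or database API.

```python
table = {}  # stand-in for an RDBMS table: one current row per key
log = []    # stand-in for a Kafka log: every event, in arrival order


def db_upsert(key, value):
    """Update in place: the previous value for this key is overwritten."""
    table[key] = value


def log_append(key, value):
    """Append only: earlier events are untouched. Returns the new offset."""
    log.append((key, value))
    return len(log) - 1


def replay(events):
    """Rebuild the latest value per key by reading the log from the start."""
    state = {}
    for key, value in events:
        state[key] = value
    return state


if __name__ == "__main__":
    for status in ("created", "paid", "shipped"):
        db_upsert("order-1", status)
        log_append("order-1", status)
    print(table)        # only the latest status survives
    print(log)          # all three events are still present
    print(replay(log))  # replaying the log reproduces the table
```

The table ends up holding only the final status, while the log retains the full history and can be replayed to reconstruct the same final state.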

 

A topic can be divided into several partitions to improve performance under heavy load. The partitions of a topic are distributed across the brokers in an Apache Kafka cluster, so that work on the topic can proceed in parallel (Figure 15).

 

Figure 15: Providing multiple partitions for a topic can improve throughput.
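
A minimal sketch of how messages and partitions might be spread out, under several assumptions: `choose_partition` and `assign_partitions_to_brokers` are hypothetical helpers, `zlib.crc32` is a stand-in hash (the Kafka Java client's default partitioner hashes the key with murmur2 instead), and the round-robin placement ignores replication and any rack awareness a real cluster would apply.

```python
import zlib


def choose_partition(key: str, num_partitions: int) -> int:
    """Map a message key to a partition. The same key always maps to the
    same partition, so messages for that key stay in order."""
    # crc32 is an illustrative stand-in, not Kafka's actual hash function.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign_partitions_to_brokers(num_partitions: int, brokers: list) -> dict:
    """Spread partitions over brokers round-robin (replication ignored)."""
    return {p: brokers[p % len(brokers)] for p in range(num_partitions)}


if __name__ == "__main__":
    brokers = ["broker-0", "broker-1", "broker-2"]
    placement = assign_partitions_to_brokers(6, brokers)
    for key in ("order-1", "order-2", "order-3"):
        partition = choose_partition(key, 6)
        print(key, "-> partition", partition, "on", placement[partition])
```

Because each broker hosts only some of the partitions, reads and writes for different keys can land on different machines, which is where the throughput gain comes from.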