Kafka Internals: The Kafka Consumer Group Is a Smart Choice

vipul pachauri
4 min read · Jul 27, 2023

The consumer group in Kafka is an intelligent design choice. In this article, we will discuss the concept of a consumer group and highlight its powerful design.

Why does Kafka offer consumer groups?

We know Kafka is not only a queue-based system; it also supports the pub-sub messaging model. Let’s understand this with an example.

Imagine we are uploading a YouTube video that needs to undergo compression, copyright checks, and conversion into different formats. To accomplish this, we can use separate microservices responsible for compression, copyright validation, and formatting. These microservices, referred to as consumers, all need to receive the video message. In a purely queue-based system, if the copyright service retrieves the message (video), the others (compression, formatter) won’t be able to access it. This is not ideal: we want a pub-sub (publish-subscribe) messaging system in which every service receives the message. To facilitate this, Kafka offers consumer groups.
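To make the problem concrete, here is a minimal sketch of the plain-queue case in Python. It uses an in-memory `deque`, not Kafka, and the service names and the `video-42` message are only illustrative assumptions taken from the example above.

```python
from collections import deque

# Hypothetical service names taken from the example above.
services = ["copyright", "compression", "formatter"]

# A classic queue holding a single upload event (illustrative value).
queue = deque(["video-42"])

received = {}
for service in services:
    # popleft() removes the message, so only the first service to ask gets it.
    received[service] = queue.popleft() if queue else None

print(received)
# {'copyright': 'video-42', 'compression': None, 'formatter': None}
```

Only the first service sees the video. The other two find the queue empty, which is exactly the limitation described above.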

What is a Consumer Group?

Within a consumer group, each partition of a topic is assigned to exactly one consumer, although a single consumer may be assigned several partitions. Let’s consider a topic with three partitions and a consumer group that initially has no consumers.

  1. When we add a consumer, it becomes responsible for all three partitions. This arrangement is acceptable because each partition must belong to one and only one consumer in the group. Two consumers in the same group are never assigned the same partition at the same time.
  2. If we add another consumer to the group, a rebalance begins. Afterwards the three partitions are split between the two consumers: one consumer is assigned two partitions and the other is assigned one.
  3. With three consumers and three partitions, each consumer will have its own partition.
  4. If we have more than three consumers, there won’t be enough partitions for each one, so the additional consumer(s) will remain idle.
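The four steps above can be sketched with a small simulation. This is a simplified round-robin model written for this article, not Kafka’s actual assignor code; the function name `assign_partitions` and the consumer names are hypothetical.

```python
from typing import Dict, List


def assign_partitions(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Spread partitions over the consumers of one group, round-robin.

    Simplified model only: real Kafka assignors (range, round-robin,
    sticky, ...) differ in detail but keep the same invariant, namely
    that each partition has exactly one owner within the group.
    """
    assignment: Dict[str, List[int]] = {consumer: [] for consumer in consumers}
    if not consumers:
        return assignment  # empty group: nothing is consumed
    for index, partition in enumerate(sorted(partitions)):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


if __name__ == "__main__":
    partitions = [0, 1, 2]
    for group_size in range(1, 5):
        consumers = [f"consumer-{n}" for n in range(1, group_size + 1)]
        print(group_size, assign_partitions(partitions, consumers))
```

With one consumer, it owns partitions 0, 1 and 2. With two, the split is two partitions and one. With three, each consumer owns one partition. With four, the fourth consumer gets an empty list and sits idle.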

Now let’s understand and differentiate how Kafka behaves as a Queue and Pub-Sub.

Kafka as Queue

The behavior discussed above is similar to a queue. To get queue-like behavior, we add all consumers to the same group, so each partition is consumed by one and only one consumer. Once a consumer reads a message from a partition and commits its offset, the group treats that message as processed and the consumer moves on to the next one. The message is not deleted from the partition, but under normal operation no other consumer in the same group will consume it. This provides the parallelism we want, because consumers read different partitions simultaneously. A single consumer may own several partitions; the crucial point is that while a partition is assigned to one consumer, no other consumer in the same consumer group reads from it. This is how we use Kafka as a queue.
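Here is a minimal sketch of this queue-like behavior, again as an in-memory model rather than real Kafka client code. The topic contents, the worker names, and the `drain_group` helper are all assumptions made for illustration. The loop runs sequentially for simplicity, whereas real consumers would run in parallel.

```python
from typing import Dict, List

# Hypothetical in-memory topic: partition id -> ordered list of messages.
topic: Dict[int, List[str]] = {
    0: ["video-1", "video-4"],
    1: ["video-2", "video-5"],
    2: ["video-3"],
}

# Assumed assignment inside ONE group: each partition has exactly one owner.
owners: Dict[int, str] = {0: "worker-a", 1: "worker-b", 2: "worker-c"}


def drain_group(topic: Dict[int, List[str]], owners: Dict[int, str]) -> Dict[str, List[str]]:
    """Let each owner read its partition from the group's committed offset."""
    committed = {partition: 0 for partition in topic}  # next offset to read
    processed: Dict[str, List[str]] = {worker: [] for worker in owners.values()}
    for partition, log in topic.items():
        worker = owners[partition]
        while committed[partition] < len(log):
            processed[worker].append(log[committed[partition]])
            committed[partition] += 1  # "commit": the group moves past this message
    return processed


processed = drain_group(topic, owners)
print(processed)
```

Each message ends up in exactly one worker’s list, and the order within a partition is preserved. That is the work-sharing behavior we expect from a queue.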

Kafka as Pub-Sub

Imagine we have two consumer groups: CG1 and CG2. Each consumer group consists of three consumers. With this setup, both CG1 and CG2 can subscribe to the same topic and process its data independently and in parallel. Each consumer within CG1 will be assigned a partition to consume from, and the same applies to the consumers within CG2. This means that every message published to the topic is delivered to both groups, and each group consumes it at its own pace.
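The independence of CG1 and CG2 comes from each group tracking its own position in the partition. The sketch below models a single partition as an append-only list with one committed offset per group. The `PartitionLog` class is a toy written for this article, not part of any Kafka client, and it assumes a new group starts reading from the beginning of the partition.

```python
from typing import Dict, List


class PartitionLog:
    """Toy model of one partition: an append-only list plus one
    committed offset per consumer group. Not the Kafka client API."""

    def __init__(self) -> None:
        self.records: List[str] = []
        self.committed: Dict[str, int] = {}  # group id -> next offset to read

    def append(self, record: str) -> int:
        self.records.append(record)
        return len(self.records) - 1  # offset of the new record

    def poll(self, group: str) -> List[str]:
        # Assumption: a group with no committed offset starts at offset 0.
        start = self.committed.get(group, 0)
        batch = self.records[start:]
        self.committed[group] = len(self.records)  # commit after reading
        return batch


log = PartitionLog()
log.append("video-1")
log.append("video-2")

cg1_first = log.poll("CG1")   # CG1 reads both records
cg2_first = log.poll("CG2")   # CG2 also reads both, unaffected by CG1
log.append("video-3")
cg1_second = log.poll("CG1")  # only the record CG1 has not seen yet
```

Reading never removes records from the log. Only the per-group offset moves, so adding a new group does not affect the groups that are already consuming.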

When a message is produced to the topic, Kafka appends it to exactly one partition of that topic (the partition itself may be replicated across brokers for fault tolerance). Each partition is an ordered, immutable sequence of messages. Consumer groups are designed to enable parallelism and scalability: with multiple consumers in a group, partitions are processed concurrently, which improves throughput, and with multiple consumer groups, several independent applications can process the same messages.

If we want to use Kafka in a pub-sub (publish-subscribe) model, we can add multiple consumer groups. Each consumer group can independently read data from the same topic or partition. Kafka automatically handles the partition assignment process and ensures that a partition is consumed by only one consumer within a consumer group. This distribution of partitions among consumers allows for efficient and parallel processing of the data.

Overall, using Kafka as a pub-sub model with multiple consumer groups enables parallel processing and scalability, as different consumer groups can read data from the same topic or partition concurrently. We can add or remove consumer groups as new applications need the data, and add consumers within a group (up to the number of partitions) to increase processing capacity. This approach provides the flexibility to distribute the workload and efficiently handle large volumes of data.

Thank you for taking the time to read this article. I hope you enjoyed it and found valuable insights that contribute to your tech knowledge. Happy Learning!!

Written by vipul pachauri

Senior Software Backend Engineer