Why and How Is Kafka Fast?

vipul pachauri
3 min read · Jul 12, 2023


The Need for Speed: How Kafka Moves Data Lightning-Fast

Welcome, guys, to our journey into the world of system design! Today, we’ll explore the secrets behind Kafka’s impressive speed and efficiency. If you’re new to the subject, fear not! We’ll break complex concepts down into simple terms so that even newcomers can follow along. So, let’s embark on our adventure and uncover the magic behind Kafka’s high-speed data processing!

Section 1: Understanding Kafka’s Speed

To understand why Kafka is considered fast, let’s start with the basics. Kafka is designed to handle large amounts of data quickly. But what does “fast” mean in this context? It can mean low latency (short delays for each message) or high throughput (a large number of messages moved per second). For Kafka, it mostly means moving a large volume of data efficiently, and that speed comes from smart design choices.

Sequential vs Random Access

In some workloads, sequential access on disk can be faster than random access in memory.

Section 2: The Power of Sequential I/O

One key design decision behind Kafka’s speed is its use of sequential input/output (I/O). Think of it like water flowing through a pipe. In this case, Kafka treats data like a liquid and uses a clever trick called an “append-only log.” This log adds new data to the end of a file, allowing Kafka to read and write data in a sequential manner. By avoiding random jumps, like skipping from place to place on a hard drive, Kafka’s sequential approach makes data processing much faster.
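To make the append-only idea concrete, here is a minimal sketch in Python. It is not Kafka's actual storage code: the AppendOnlyLog class, the 4-byte length prefix, and the file name are all assumptions made up for this illustration. It only shows the two properties described above: writes always go to the end of the file, and reads move forward from a starting position without jumping around.

```python
import os
import struct
import tempfile

# Hypothetical record layout for this sketch: 4-byte big-endian length, then payload.
_HEADER = struct.Struct(">I")


class AppendOnlyLog:
    """Toy append-only log: writes go to the end, reads scan forward."""

    def __init__(self, path):
        self.path = path
        # "ab" opens the file for appending, so every write lands at the end.
        self._writer = open(path, "ab")
        self._end = os.path.getsize(path)

    def append(self, payload: bytes) -> int:
        """Append one record and return the byte position where it starts."""
        position = self._end
        record = _HEADER.pack(len(payload)) + payload
        self._writer.write(record)
        self._writer.flush()
        self._end += len(record)
        return position

    def read_from(self, position: int):
        """Yield payloads in write order, starting at a byte position."""
        with open(self.path, "rb") as reader:
            reader.seek(position)  # one seek, then a purely forward scan
            while True:
                header = reader.read(_HEADER.size)
                if len(header) < _HEADER.size:
                    return
                (length,) = _HEADER.unpack(header)
                yield reader.read(length)

    def close(self):
        self._writer.close()


def demo():
    path = os.path.join(tempfile.mkdtemp(), "demo.log")
    log = AppendOnlyLog(path)
    first = log.append(b"order-created")
    second = log.append(b"order-paid")
    log.append(b"order-shipped")
    everything = list(log.read_from(first))
    tail = list(log.read_from(second))
    log.close()
    return everything, tail
```

Calling demo() writes three records and reads them back twice, once from the first record and once from the second. A consumer that remembers its position can resume from there with a single seek followed by a forward scan.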

Zero Copy (ByteByteGo reference)

Section 3: Streamlined Data Transfer with Zero Copy

Efficiency is another vital aspect of Kafka’s speed. When transferring data between the network and disk, Kafka aims to minimize unnecessary copying. This is where “zero copy” comes into play. It’s a technique that allows Kafka to directly transfer data from the operating system’s cache to the network, without making unnecessary copies. By eliminating extra steps, Kafka can move data faster and more efficiently.

Kafka does not wait for data to reach the hard disk on every write. When the broker receives data, it first writes the data to the operating system’s page cache, and the data is flushed to disk asynchronously later on.
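The difference between "written to the page cache" and "forced to disk" can be sketched in a few lines of Python. This is only an illustration of the general operating system behaviour, not Kafka's code; the append_record function and its force_to_disk parameter are hypothetical names invented for this example.

```python
import os
import tempfile


def append_record(path: str, payload: bytes, force_to_disk: bool = False) -> int:
    """Append payload to a file and return the file size afterwards."""
    with open(path, "ab") as f:
        f.write(payload)
        # flush() hands the bytes from Python's buffer to the OS,
        # where they sit in the page cache until the OS writes them out.
        f.flush()
        if force_to_disk:
            # fsync() blocks until the OS reports the data is on the storage device.
            os.fsync(f.fileno())
    return os.path.getsize(path)
```

With force_to_disk left at False, the call returns as soon as the operating system has the bytes, which is the fast path described above. Passing True makes the caller wait for the device, which is safer against a machine crash but slower.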

Kafka does not load the data into an application buffer (since Kafka keeps the data in the same binary format throughout its lifecycle). Instead, the data is copied directly from the disk (via the page cache) to the NIC buffer. This reduces byte copying and context switches, which makes the process faster. The copy uses DMA (direct memory access), so the CPU does not have to move the bytes itself, and that makes the process far more efficient (bringing the time down by ~60%).
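Here is a small Python sketch of the same idea. Kafka itself runs on the JVM and relies on Java's FileChannel.transferTo, so this is an analogy rather than Kafka's code. Python's socket.sendfile() uses the operating system's sendfile call where it is available and quietly falls back to ordinary send() calls where it is not. The function names, the file name, and the local socket pair standing in for a broker-to-consumer connection are assumptions made for this example.

```python
import os
import socket
import tempfile


def send_file_zero_copy(path: str, conn: socket.socket) -> int:
    """Send a whole file over a connected stream socket; return bytes sent."""
    with open(path, "rb") as f:
        # Where os.sendfile() is available, the kernel moves the bytes from the
        # page cache to the socket without copying them into a Python buffer.
        return conn.sendfile(f)


def demo():
    path = os.path.join(tempfile.mkdtemp(), "segment.log")
    payload = b"kafka-message-batch" * 100
    with open(path, "wb") as f:
        f.write(payload)

    # A local socket pair stands in for the broker-to-consumer connection.
    broker_side, consumer_side = socket.socketpair()
    with broker_side, consumer_side:
        sent = send_file_zero_copy(path, broker_side)
        broker_side.shutdown(socket.SHUT_WR)  # signal end of stream
        received = b""
        while chunk := consumer_side.recv(4096):
            received += chunk
    return sent, received == payload
```

Notice that send_file_zero_copy never calls read() on the file. The application only tells the kernel which file to send and where, which is why the data does not need to pass through an application buffer.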

Section 4: Supercharging Kafka’s Performance

While sequential I/O and zero copy are the main ingredients of Kafka’s speed, there are other tricks up its sleeve. Because its disk access is sequential, Kafka can run well on hard disks, which offer more storage space at a lower cost than solid-state drives (SSDs). This allows Kafka to retain messages for longer periods without sacrificing performance. Together, these choices let Kafka deliver lightning-fast performance on modern hardware.

Conclusion:

We’ve learned how sequential I/O and zero copy contribute to Kafka’s impressive performance. Remember, Kafka is a powerful tool for processing data quickly and effectively. Happy learning, and enjoy your adventures with Kafka’s lightning-fast capabilities!
