What is Kafka?
Kafka is an open-source distributed event streaming platform designed for handling real-time data feeds.
Originally developed at LinkedIn and later open-sourced under the Apache Software Foundation, Kafka is now widely used for building high-throughput, fault-tolerant, and scalable data pipelines, real-time analytics, and event-driven architectures.
What Problem Does Kafka Solve?
Before Kafka, traditional message queues like RabbitMQ and ActiveMQ were widely used, but they had limitations in handling massive, high-throughput real-time data streams.
Kafka was designed to address these issues by providing:
- Large-scale data handling – Kafka is optimized for ingesting, storing, and distributing high-volume data streams across distributed systems.
- Fault tolerance – Kafka replicates data across multiple nodes, ensuring that even if a broker fails, data remains available.
- Durability – Messages persist on disk, allowing consumers to replay events when needed.
- Support for event-driven architecture – It enables asynchronous communication between microservices, making it ideal for modern cloud applications.
When to Use Kafka
Kafka is the right choice when you need:
- High-throughput, real-time data processing – Ideal for log processing, financial transactions, and IoT data streams.
- Microservices decoupling – Kafka acts as an intermediary, allowing microservices to communicate asynchronously without direct dependencies.
- Event-driven systems – If your architecture revolves around reacting to changes (e.g., a user event triggering multiple downstream actions), Kafka is a solid choice.
- Reliable message delivery with persistence – Unlike traditional message queues, which typically delete messages once they are consumed, Kafka retains messages for a configurable period, ensuring durability and replayability.
- Scalability and fault tolerance – Kafka’s distributed nature allows it to scale horizontally while maintaining fault tolerance through replication.
How Kafka Works
Kafka consists of several key components:
1. Message
A message is the smallest unit of data in Kafka.
It can be a JSON object, a string, or any binary data.
Messages may have an associated key, which determines which partition the message will be stored in.
2. Topic
A topic is a logical channel where messages are sent by producers and read by consumers. Topics help categorize messages (e.g., logs, transactions, orders).
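If automatic topic creation is disabled on the broker, topics can be created explicitly. Here is a minimal sketch using the kafkajs admin client (the same library used in the examples later in this post); the topic name and partition count are made-up illustration values:

const { Kafka } = require("kafkajs");

const kafka = new Kafka({ clientId: "admin-client", brokers: ["localhost:9092"] });
const admin = kafka.admin();

async function createTopic() {
  await admin.connect();
  // Create a hypothetical "orders" topic with 3 partitions.
  await admin.createTopics({
    topics: [{ topic: "orders", numPartitions: 3, replicationFactor: 1 }],
  });
  await admin.disconnect();
}

createTopic().catch(console.error);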
3. Producer
A producer is a Kafka client that publishes messages to a topic. Messages can be sent in three ways:
- Fire and forget – The producer sends the message without waiting for confirmation, ensuring maximum speed but risking data loss.
- Synchronous send – The producer waits for an acknowledgment from Kafka before proceeding, ensuring reliability but adding latency.
- Asynchronous send – The producer sends the message and handles the broker's acknowledgment later via a callback, offering a balance between speed and reliability.
Kafka also lets producers configure acknowledgments (acks) to balance consistency and performance (see the sketch after this list):
- acks=0 – No acknowledgment required (fastest, but messages can be lost silently).
- acks=1 – The message is acknowledged once the leader broker has written it (a balance of speed and safety).
- acks=all – The message is acknowledged only when all in-sync replicas confirm receipt (slowest but safest).
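In kafkajs, the acknowledgment level is passed per send call; a minimal sketch (acks: -1 is kafkajs's spelling of "all"):

// Inside an async function, with a connected kafkajs producer
// (see the full producer example later in this post).
await producer.send({
  topic: "family-topic",
  acks: -1, // -1 = all in-sync replicas (safest), 1 = leader only, 0 = fire and forget
  messages: [{ value: "important event" }],
});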
Producer Optimizations
- Message Compression & Batching – Kafka producers can batch and compress messages before sending them to brokers. This improves throughput and reduces disk usage at the cost of some CPU overhead (see the sketch after this list).
- Avro Serializer/Deserializer – Using Avro instead of JSON requires defining schemas upfront, but it improves performance and reduces storage consumption.
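As a sketch of what compression and batching look like in kafkajs (compression is set per send call, and all messages in that call are sent as a single compressed batch):

const { CompressionTypes } = require("kafkajs");

// Inside an async function, with a connected producer:
await producer.send({
  topic: "family-topic",
  compression: CompressionTypes.GZIP, // the batch is gzip-compressed before sending
  messages: [
    { value: "event 1" },
    { value: "event 2" }, // batched together with event 1 in one request
  ],
});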
4. Partition
Kafka topics are divided into partitions, which allow for parallel processing and scalability.
Messages in a partition are ordered and immutable.
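Since the key decides the partition, messages that share a key always land in the same partition and are therefore consumed in order relative to each other. A minimal sketch (the keys and payloads are made up):

// Inside an async function, with a connected producer:
await producer.send({
  topic: "family-topic",
  messages: [
    { key: "user-42", value: "logged in" },    // hash of the key picks the partition
    { key: "user-42", value: "placed order" }, // same key => same partition => ordered
    { key: "user-7", value: "logged in" },     // different key, possibly another partition
  ],
});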
5. Consumer
A consumer reads messages from partitions and keeps track of its position using an offset.
Consumers can reset offsets to reprocess older messages.
Kafka consumers work on a polling model, meaning they continuously request data from the broker rather than the broker pushing data to them.
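Offset resets can be done with consumer.seek in kafkajs; a minimal sketch (the partition and offset values are illustrative, and seek only takes effect once run has started):

// Rewind partition 0 of the topic to offset 0 to replay it from the start.
consumer.run({
  eachMessage: async ({ topic, partition, message }) => {
    console.log(`${topic}[${partition}] @ ${message.offset}: ${message.value.toString()}`);
  },
});
consumer.seek({ topic: "family-topic", partition: 0, offset: "0" });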
Consumer Optimizations
- Partition Assignment Strategies – control how partitions are distributed across the consumers in a group:
  - Range – each consumer gets a consecutive block of partitions.
  - Round Robin – partitions are distributed evenly, one at a time, across consumers.
  - Sticky – keeps existing assignments as stable as possible during rebalancing.
  - Cooperative Sticky – like Sticky, but rebalances incrementally so unaffected consumers keep consuming.
- Batch Size Configuration – consumers can define how many records or how much data should be retrieved per poll cycle (see the sketch after this list).
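In kafkajs, these knobs live on the consumer configuration; a minimal sketch with illustrative values:

// Given a `kafka` instance as in the examples below.
const consumer = kafka.consumer({
  groupId: "email-group",
  minBytes: 1024,                    // broker waits until at least 1 KB is available...
  maxWaitTimeInMs: 500,              // ...or 500 ms have passed, whichever comes first
  maxBytes: 5 * 1024 * 1024,         // upper bound on the data returned per fetch
  maxBytesPerPartition: 1024 * 1024, // upper bound per partition
});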
6. Consumer Group
A consumer group is a set of consumers that work together to process messages from a topic.
Kafka ensures that a single partition is consumed by only one consumer within a group, maintaining order.
7. Offset Management
When a consumer reads a message, it updates its offset – the position of the last processed message. Kafka supports two commit strategies:
- Auto-commit – Kafka automatically commits the offset at regular intervals.
- Manual commit – The application explicitly commits the offset, either synchronously or asynchronously.
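With kafkajs, auto-commit is the default; a manual commit looks roughly like this sketch (processMessage is a hypothetical application function, and the committed offset is the next offset to read, hence the + 1):

await consumer.run({
  autoCommit: false, // disable periodic auto-commit
  eachMessage: async ({ topic, partition, message }) => {
    await processMessage(message); // hypothetical business logic
    // Commit only after successful processing, so a crash before this
    // line causes the message to be redelivered rather than lost.
    await consumer.commitOffsets([
      { topic, partition, offset: (Number(message.offset) + 1).toString() },
    ]);
  },
});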
8. Broker
A broker is a Kafka server that stores messages, assigns offsets, and handles client requests.
Multiple brokers form a Kafka cluster for scalability and fault tolerance.
9. ZooKeeper
ZooKeeper manages cluster metadata, tracks brokers, and handles leader elections.
However, newer Kafka versions replace ZooKeeper with the built-in KRaft consensus protocol, and as of Kafka 4.0 the ZooKeeper dependency has been removed entirely.
Example: Kafka in Action
To understand Kafka better, let's look at a simple example where a producer sends messages to a topic, and two different consumers process those messages separately: one simulating an email notification service and the other storing messages in a database.
Set Up Kafka (docker-compose.yml)
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:latest
    container_name: zookeeper
    restart: always
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181

  kafka:
    image: confluentinc/cp-kafka:latest
    container_name: kafka
    restart: always
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"
      - "29092:29092"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092,PLAINTEXT_INTERNAL://kafka:29092
      KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:9092,PLAINTEXT_INTERNAL://0.0.0.0:29092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_INTERNAL:PLAINTEXT
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
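With this file in place, docker compose up -d starts both containers, and the broker becomes reachable from the host at localhost:9092, which is the address the code below connects to.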
Producer Code (producer.js)
const { Kafka } = require("kafkajs");

const kafka = new Kafka({
  clientId: "family-producer",
  brokers: ["localhost:9092"],
});

const producer = kafka.producer();

async function sendMessage() {
  await producer.connect();
  console.log("🟢 Producer connected");

  const message = {
    id: Date.now(),
    content: `Hi Mom! Time is ${new Date().getMinutes()}:${new Date().getSeconds()}`,
  };

  // Publish the message to "family-topic" as a JSON string.
  await producer.send({
    topic: "family-topic",
    messages: [{ value: JSON.stringify(message) }],
  });

  console.log(`📨 Sent: ${JSON.stringify(message)}`);
  await producer.disconnect();
}

sendMessage().catch(console.error);
Consumer for Email Notifications (consumer.js)
const { Kafka } = require("kafkajs");

const kafka = new Kafka({
  clientId: "family-email-consumer",
  brokers: ["localhost:9092"],
});

const consumer = kafka.consumer({ groupId: "email-group" });

async function consumeMessages() {
  await consumer.connect();
  await consumer.subscribe({ topic: "family-topic", fromBeginning: true });
  console.log("🟢 Email Consumer Connected");

  await consumer.run({
    eachMessage: async ({ message }) => {
      const msg = JSON.parse(message.value.toString());
      // Simulate sending an email notification for each message.
      console.log(`📧 Email sent: "${msg.content}"\n`);
    },
  });
}

consumeMessages().catch(console.error);
Consumer for Database Storage (dbconsumer.js)
const { Kafka } = require("kafkajs");

const kafka = new Kafka({
  clientId: "family-db-consumer",
  brokers: ["localhost:9092"],
});

const consumer = kafka.consumer({ groupId: "db-group" });

async function consumeMessages() {
  await consumer.connect();
  await consumer.subscribe({ topic: "family-topic", fromBeginning: true });
  console.log("🟢 DB Consumer Connected");

  await consumer.run({
    eachMessage: async ({ message }) => {
      const msg = JSON.parse(message.value.toString());
      // Simulate persisting the message to a database.
      console.log(`💾 Storing message in DB: "${msg.content}"\n`);
    },
  });
}

consumeMessages().catch(console.error);
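Note that the two consumers use different group IDs (email-group and db-group), so each group receives its own copy of every message: run producer.js and both consumers will process the same event independently. If they shared a group ID, each message would instead be delivered to only one of them.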
Final Thoughts
Kafka is a powerful tool that has transformed real-time data processing.
However, while it offers incredible scalability and durability, it’s crucial to evaluate whether it's the right fit for your architecture.
Stay tuned! I will write a follow-up article comparing Kafka vs. Redis to explore their use cases and when to choose one over the other. 🚀
I’ve been working on a super-convenient tool called LiveAPI.
LiveAPI helps you get all your backend APIs documented in a few minutes.
With LiveAPI, you can quickly generate interactive API documentation that allows users to execute APIs directly from the browser.
If you’re tired of manually creating docs for your APIs, this tool might just make your life easier.