Athreya aka Maneshwar

Why Kafka? A Developer-Friendly Guide to Event-Driven Architecture

What is Kafka?

Kafka is an open-source distributed event streaming platform designed for handling real-time data feeds.

Originally developed at LinkedIn and later open-sourced under the Apache Software Foundation, Kafka is now widely used for building high-throughput, fault-tolerant, and scalable data pipelines, real-time analytics, and event-driven architectures.

What Problem Does Kafka Solve?

Before Kafka, traditional message queues like RabbitMQ and ActiveMQ were widely used, but they had limitations in handling massive, high-throughput real-time data streams.

Kafka was designed to address these issues by providing:

  • Large-scale data handling – Kafka is optimized for ingesting, storing, and distributing high-volume data streams across distributed systems.
  • Fault tolerance – Kafka replicates data across multiple nodes, ensuring that even if a broker fails, data remains available.
  • Durability – Messages persist on disk, allowing consumers to replay events when needed.
  • Support for event-driven architecture – It enables asynchronous communication between microservices, making it ideal for modern cloud applications.

When to Use Kafka

Kafka is the right choice when you need:

  • High-throughput, real-time data processing – Ideal for log processing, financial transactions, and IoT data streams.
  • Microservices decoupling – Kafka acts as an intermediary, allowing microservices to communicate asynchronously without direct dependencies.
  • Event-driven systems – If your architecture revolves around reacting to changes (e.g., a user event triggering multiple downstream actions), Kafka is a solid choice.
  • Reliable message delivery with persistence – Unlike traditional message queues, which typically delete messages once they are consumed, Kafka retains messages for a configurable period, ensuring durability and replayability.
  • Scalability and fault tolerance – Kafka’s distributed nature allows it to scale horizontally while maintaining fault tolerance through replication.

How Kafka Works

Kafka consists of several key components:

1. Message

A message is the smallest unit of data in Kafka.

It can be a JSON object, a string, or any binary data.

Messages may have an associated key, which determines which partition the message will be stored in.
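For example, with a kafkajs producer (like the one connected in the example later in this post), a key can be attached to each message so that all events for the same entity land in the same partition; the topic and key names here are only illustrative:

```js
// Messages sharing a key always hash to the same partition,
// so events for one family member stay in order.
await producer.send({
  topic: "family-topic",
  messages: [
    { key: "mom", value: JSON.stringify({ content: "Hi Mom!" }) },
    { key: "dad", value: JSON.stringify({ content: "Hi Dad!" }) },
  ],
});
```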

2. Topic

A topic is a logical channel where messages are sent by producers and read by consumers. Topics help categorize messages (e.g., logs, transactions, orders).
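Topics are usually created with the Kafka CLI or an admin client. Here is a minimal sketch using the kafkajs admin API; the topic name, partition count, and replication factor are just example values:

```js
const { Kafka } = require("kafkajs");

const kafka = new Kafka({ clientId: "admin", brokers: ["localhost:9092"] });
const admin = kafka.admin();

async function createTopic() {
  await admin.connect();
  // 3 partitions let up to 3 consumers in one group read in parallel
  await admin.createTopics({
    topics: [{ topic: "family-topic", numPartitions: 3, replicationFactor: 1 }],
  });
  await admin.disconnect();
}

createTopic();
```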

3. Producer

A producer is a Kafka client that publishes messages to a topic. Messages can be sent in three ways:

  • Fire and forget – The producer sends the message without waiting for confirmation, ensuring maximum speed but risking data loss.
  • Synchronous send – The producer waits for an acknowledgment from Kafka before proceeding, ensuring reliability but adding latency.
  • Asynchronous send – The producer sends the message and handles the broker’s acknowledgment in a callback without blocking, offering a balance between speed and reliability.

Kafka allows configuring acknowledgments (ACKs) to balance consistency and performance:

  • ACK 0 – No acknowledgment required (fastest but riskier).
  • ACK 1 – The message is acknowledged when the leader broker receives it (faster but less safe).
  • ACK All – The message is acknowledged only when all in-sync replicas confirm receipt (slower but safest).
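In kafkajs, for example, the acknowledgment level is passed per send call through the `acks` option, where -1 means "all" (a sketch, assuming a connected producer):

```js
// acks: 0  -> fire and forget
// acks: 1  -> leader acknowledgment only
// acks: -1 -> wait for all in-sync replicas (safest)
await producer.send({
  topic: "family-topic",
  acks: -1,
  messages: [{ value: JSON.stringify({ content: "important event" }) }],
});
```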

Producer Optimizations

  • Message Compression & Batching – Kafka producers can batch and compress messages before sending them to brokers. This improves throughput and reduces disk usage but increases CPU overhead.
  • Avro Serializer/Deserializer – Using Avro instead of JSON requires defining schemas upfront, but it improves performance and reduces storage consumption.
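As a sketch of the first point, kafkajs enables compression per send call; GZIP ships with the library, while other codecs need plugins (again assuming a connected producer):

```js
const { CompressionTypes } = require("kafkajs");

// The whole batch is compressed once, saving network and disk space
// at the cost of some extra CPU on the producer and broker.
await producer.send({
  topic: "family-topic",
  compression: CompressionTypes.GZIP,
  messages: [
    { value: JSON.stringify({ content: "event 1" }) },
    { value: JSON.stringify({ content: "event 2" }) },
  ],
});
```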

4. Partition

Kafka topics are divided into partitions, which allow for parallel processing and scalability.

Messages in a partition are ordered and immutable.

5. Consumer

A consumer reads messages from partitions and keeps track of its position using an offset.

Consumers can reset offsets to reprocess older messages.

Kafka consumers work on a polling model, meaning they continuously request data from the broker rather than the broker pushing data to them.
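In kafkajs, for instance, resetting to an older offset can be done with `consumer.seek`, which must be called after `run()`; the partition and offset values below are only illustrative:

```js
await consumer.connect();
await consumer.subscribe({ topic: "family-topic" });
await consumer.run({
  eachMessage: async ({ partition, message }) => {
    console.log(partition, message.offset, message.value.toString());
  },
});

// Rewind partition 0 to the beginning to reprocess older messages
consumer.seek({ topic: "family-topic", partition: 0, offset: "0" });
```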

Consumer Optimization

  • Partition Assignment Strategies:

    • Range – Consumers get consecutive partitions.
    • Round Robin – Partitions are evenly distributed across consumers.
    • Sticky – Tries to minimize changes during rebalancing.
    • Cooperative Sticky – Like Sticky but allows cooperative rebalancing.
  • Batch Size Configuration – Consumers can define how many records or how much data should be retrieved per poll cycle.
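A hedged kafkajs sketch of the batch-size point: fetch limits are set on the consumer, and `eachBatch` processes one poll result at a time (the byte limits are example values):

```js
const consumer = kafka.consumer({
  groupId: "db-group",
  minBytes: 1024,            // wait until at least 1 KB is available...
  maxWaitTimeInMs: 500,      // ...or until 500 ms have passed
  maxBytes: 5 * 1024 * 1024, // cap a single fetch at 5 MB
});

await consumer.connect();
await consumer.subscribe({ topic: "family-topic", fromBeginning: true });
await consumer.run({
  eachBatch: async ({ batch }) => {
    console.log(`Got ${batch.messages.length} messages from partition ${batch.partition}`);
  },
});
```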

6. Consumer Group

A consumer group is a set of consumers that work together to process messages from a topic.

Kafka ensures that a single partition is consumed by only one consumer within a group, maintaining order.

7. Offset Management

When a consumer reads a message, it updates its offset—the position of the last processed message.

  • Auto-commit – Kafka automatically commits the offset at regular intervals.
  • Manual commit – The application explicitly commits the offset, either synchronously or asynchronously.
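With kafkajs, for example, auto-commit can be turned off and the offset committed only after the message has been handled; `handleMessage` below is a hypothetical processing step:

```js
await consumer.run({
  autoCommit: false,
  eachMessage: async ({ topic, partition, message }) => {
    await handleMessage(message); // hypothetical: store in DB, send email, etc.
    // Commit only after processing succeeds. By convention the committed
    // offset is the offset of the next message to read, hence the +1.
    await consumer.commitOffsets([
      { topic, partition, offset: (Number(message.offset) + 1).toString() },
    ]);
  },
});
```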

8. Broker

A broker is a Kafka server that stores messages, assigns offsets, and handles client requests.

Multiple brokers form a Kafka cluster for scalability and fault tolerance.

9. Zookeeper

Zookeeper manages cluster metadata, tracks brokers, and handles leader elections.

However, newer Kafka versions replace ZooKeeper with KRaft, Kafka's built-in Raft-based metadata quorum, removing this external dependency.

Example: Kafka in Action

To understand Kafka better, let's look at a simple example where a producer sends messages to a topic, and two different consumers process those messages separately: one simulating an email notification service and the other storing messages in a database.

Setup Kafka (docker-compose.yml)

services:
  zookeeper:
    image: confluentinc/cp-zookeeper:latest
    container_name: zookeeper
    restart: always
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181

  kafka:
    image: confluentinc/cp-kafka:latest
    container_name: kafka
    restart: always
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"
      - "29092:29092"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092,PLAINTEXT_INTERNAL://kafka:29092
      KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:9092,PLAINTEXT_INTERNAL://0.0.0.0:29092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_INTERNAL:PLAINTEXT
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1

Producer Code (producer.js)

const { Kafka } = require("kafkajs");

const kafka = new Kafka({
  clientId: "family-producer",
  brokers: ["localhost:9092"],
});
const producer = kafka.producer();

async function sendMessage() {
  await producer.connect();
  console.log("🟢 Producer connected");

  const message = {
    id: Date.now(),
    content: `Hi Mom! Time is ${new Date().getMinutes()}:${new Date().getSeconds()}`,
  };
  await producer.send({
    topic: "family-topic",
    messages: [{ value: JSON.stringify(message) }],
  });

  console.log(`📨 Sent: ${JSON.stringify(message)}`);
  await producer.disconnect();
}

sendMessage();

Consumer for Email Notifications (consumer.js)

const { Kafka } = require("kafkajs");

const kafka = new Kafka({
  clientId: "family-email-consumer",
  brokers: ["localhost:9092"],
});
const consumer = kafka.consumer({ groupId: "email-group" });

async function consumeMessages() {
  await consumer.connect();
  await consumer.subscribe({ topic: "family-topic", fromBeginning: true });
  console.log("🟢 Email Consumer Connected");

  await consumer.run({
    eachMessage: async ({ message }) => {
      const msg = JSON.parse(message.value.toString());
      console.log(`📩 Notification Sent: "${msg.content}"`);
      console.log(`📧 Email Sent: "${msg.content}" \n`);
    },
  });
}

consumeMessages();

Consumer for Database Storage (dbconsumer.js)

const { Kafka } = require("kafkajs");

const kafka = new Kafka({
  clientId: "family-db-consumer",
  brokers: ["localhost:9092"],
});
const consumer = kafka.consumer({ groupId: "db-group" });

async function consumeMessages() {
  await consumer.connect();
  await consumer.subscribe({ topic: "family-topic", fromBeginning: true });
  console.log("🟢 DB Consumer Connected");

  await consumer.run({
    eachMessage: async ({ message }) => {
      const msg = JSON.parse(message.value.toString());
      console.log(`💾 Storing message in DB: "${msg.content}" \n`);
    },
  });
}

consumeMessages();

Final Thoughts

Kafka is a powerful tool that has transformed real-time data processing.

However, while it offers incredible scalability and durability, it’s crucial to evaluate whether it's the right fit for your architecture.

Stay tuned! I will write a follow-up article comparing Kafka vs. Redis to explore their use cases and when to choose one over the other. 🚀

I’ve been working on a super-convenient tool called LiveAPI.

LiveAPI helps you get all your backend APIs documented in a few minutes.

With LiveAPI, you can quickly generate interactive API documentation that allows users to execute APIs directly from the browser.

If you’re tired of manually creating docs for your APIs, this tool might just make your life easier.

Sources: Some images have been taken from here: 1

Top comments (8)

Alexander

Thanks for the short JS samples. When I first heard about Kafka, I thought it was about Franz Kafka.

Athreya aka Maneshwar

Thanks :)
Haha xD

Jester Lee

Lol same ... my first thought was "cockroach" 🪳

Amir Hosein Haseli

Really liked the way you described it, nice and informative 👍🏻

Athreya aka Maneshwar

Thanks a lot :)

Alex P

Hi! Don’t forget that setting up Kafka properly for your needs requires careful planning

Here are a few things to consider:

  1. Plan the number of partitions in advance
  2. Avoid making a single message handler do too many things; otherwise, processing speed will be limited by the slowest step and your consumer lag will increase
  3. Think ahead about data deletion. If you ever need to instantly remove a user's data, the only practical way is to have encrypted that data with a unique key per user; deleting the key then renders the data useless
  4. If your topic already holds terabytes of data, you need to decide where new consumers should start reading from
  5. Topic configuration (retention period, compaction mode, number of partitions, etc.) should involve both sysadmins and developers
  6. Don’t overlook failover testing, batch reading, commit modes, and various client-side details

Overall, Kafka is great — I’ve worked with it a lot

By the way, why didn’t you mention Kafka Streams?

Jester Lee

I'm glad to read about Kafka here. It's pretty capable and proven, as you can see here:

blog.cloudflare.com/using-apache-k...

Madhurima Rawat

Great article 👏 Love the Kafka memes 😂 The breakdown of each component is really good.
