What Is Kafka?
- Kafka is a distributed messaging system used for real-time data streaming.
- It follows a publish-subscribe model: producers write messages to topics, and consumers read them.
- Kafka is designed for high throughput, scalability, and fault tolerance.
What Is a Partition?
- A partition is a subdivision of a topic. Splitting a topic into partitions lets its data be spread across multiple brokers (servers).
- Messages within a partition are ordered, and each message has an offset that uniquely identifies it within that partition.
- Purpose of partitions:
  - Scalability: With multiple partitions, messages can be processed in parallel.
  - Fault tolerance: Partitions are replicated across brokers so the data stays durable.
  - Load balancing: Spreading messages across partitions balances the load on the brokers.
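The ordering and offset behavior described above can be pictured as an append-only list. This is only a toy model: the `PartitionLog` class and its methods are hypothetical names made up for illustration, not part of any Kafka client.

```python
class PartitionLog:
    """Toy model of one partition: an append-only list of messages."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        """Store the message and return the offset it was assigned."""
        offset = len(self._messages)
        self._messages.append(message)
        return offset

    def read_from(self, offset):
        """Return (offset, message) pairs starting at the given offset."""
        return list(enumerate(self._messages[offset:], start=offset))


log = PartitionLog()
first = log.append("order-created")
second = log.append("order-paid")
third = log.append("order-shipped")

# Offsets grow by one per message, and reads come back in write order.
print(first, second, third)
print(log.read_from(1))
```

Because offsets are local to one partition, two partitions of the same topic can each have a message at offset 0.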
Why Create Partitions?
- Parallel processing: If a topic has more than one partition, multiple consumers can read from it in parallel.
- High throughput: More partitions allow more data to be processed at once, because each partition can be assigned to a different consumer.
- Ordering guarantee: Messages are ordered within a partition, but there is no ordering guarantee across partitions.
- Fault tolerance: If a broker fails, replicas of its partitions on other brokers keep the data available.
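How Are Messages Divided Across Partitions in Kafka?
- Key-based partitioning:
  - If a message has a key (such as a user ID), the producer hashes the key and uses the result, modulo the number of partitions, to choose the partition.
  - The same key therefore always maps to the same partition (as long as the partition count does not change), so all messages for one user stay in order in one partition.
- Round-robin partitioning:
  - If no key is given, messages are spread across the partitions. Older producer clients do this in strict round-robin; since Kafka 2.4 the default partitioner instead sticks to one partition per batch and then moves to another.
  - Either way, messages end up distributed roughly evenly across partitions over time.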
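Key-based partitioning can be sketched in a few lines of Python. This is a minimal illustration, not the producer's real code: `partition_for` is a hypothetical helper, and `zlib.crc32` is a stand-in hash chosen because it ships with Python. Kafka's default partitioner uses murmur2, so real partition numbers will differ.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Pick a partition from a key: hash the key bytes, then take the modulo.

    zlib.crc32 is a stand-in hash for illustration only; Kafka's default
    partitioner uses murmur2, so real partition numbers will differ.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


NUM_PARTITIONS = 4
events = [
    ("user-1", "login"),
    ("user-2", "login"),
    ("user-1", "add-to-cart"),
    ("user-1", "checkout"),
]

# Route every event to the partition chosen by its key.
partitions = {p: [] for p in range(NUM_PARTITIONS)}
for key, value in events:
    partitions[partition_for(key, NUM_PARTITIONS)].append((key, value))

user1_partition = partition_for("user-1", NUM_PARTITIONS)
print(f"user-1 always goes to partition {user1_partition}")
```

All three `user-1` events land in the same partition in the order they were sent, which is what gives per-key ordering.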
Example:
Topic: order-events (4 Partitions)
Partition 0: [Message 0, Message 4, ...]
Partition 1: [Message 1, Message 5, ...]
Partition 2: [Message 2, Message 6, ...]
Partition 3: [Message 3, Message 7, ...]
- Here the messages are distributed across the 4 partitions in round-robin order.
- If a key (such as a user ID) had been used, all of that user's messages would have gone to the same partition.
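The layout in this example can be reproduced with a short simulation. It models strict round-robin only, and `round_robin` is a hypothetical helper name. Newer producer clients batch keyless messages per partition, so their exact layout may differ from this.

```python
from itertools import cycle


def round_robin(messages, num_partitions):
    """Deal messages out to partitions one at a time, wrapping around."""
    partitions = {p: [] for p in range(num_partitions)}
    next_partition = cycle(range(num_partitions))
    for message in messages:
        partitions[next(next_partition)].append(message)
    return partitions


layout = round_robin([f"Message {i}" for i in range(8)], num_partitions=4)
for partition, messages in layout.items():
    print(f"Partition {partition}: {messages}")
```

With 8 messages and 4 partitions, each partition receives exactly 2 messages, for example Message 0 and Message 4 in partition 0.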
What Are Consumers and Consumer Groups?
- Consumer:
  - A consumer reads messages from one or more partitions.
  - If a single consumer reads a topic, it is assigned all of that topic's partitions.
  - If multiple consumers are not in the same group (each uses its own group ID), every one of them receives all the messages (broadcast).
- Consumer group:
  - A consumer group is a set of consumers that cooperate to read a topic.
  - Each message is delivered to only one consumer within the group.
  - The topic's partitions are divided among the group's consumers so they can process messages in parallel.
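The difference between sharing within a group and broadcasting across groups can be sketched as follows. The group names, consumer names, and the `assign` helper are all hypothetical, and the dealing-out logic is a simplified stand-in for Kafka's real assignment strategies.

```python
def assign(partitions, consumers):
    """Give each partition to exactly one consumer, dealing them out in turn."""
    assignment = {c: [] for c in consumers}
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment


topic_partitions = [0, 1, 2, 3]

# Two independent groups read the same topic (names are made up).
groups = {
    "billing": ["billing-a", "billing-b"],
    "analytics": ["analytics-a"],
}

# Each group gets its own full assignment of the topic's partitions.
assignments = {
    name: assign(topic_partitions, members) for name, members in groups.items()
}
for name, assignment in assignments.items():
    print(name, assignment)
```

Within `billing`, every partition has exactly one owner, so each message is processed once per group. The `analytics` group independently receives all four partitions, which is the broadcast behavior across groups.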
What Happens with More or Fewer Consumers?
- More consumers than partitions:
  - If a group has more consumers than the topic has partitions, the extra consumers sit idle, because each partition is assigned to only one consumer in the group.
  - Example: with 4 partitions and 6 consumers, 2 consumers are idle.
- Fewer consumers than partitions:
  - If there are fewer consumers than partitions, some consumers read from multiple partitions.
  - Example: with 6 partitions and 3 consumers, each consumer gets 2 partitions.
- Equal consumers and partitions:
  - If the number of consumers equals the number of partitions, each consumer gets exactly one partition.
  - This gives the maximum parallelism the topic allows and an even spread of partitions.
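The three cases can be checked with a small simulation. The sketch below splits partitions into contiguous ranges, loosely modeled on a range-style assignment. `range_assign` and `idle` are hypothetical helpers, and a real assignor also handles multiple topics and rebalances.

```python
def range_assign(num_partitions, consumers):
    """Split partitions into contiguous ranges, one range per consumer."""
    base, extra = divmod(num_partitions, len(consumers))
    assignment, start = {}, 0
    for i, consumer in enumerate(consumers):
        # The first `extra` consumers take one additional partition.
        count = base + (1 if i < extra else 0)
        assignment[consumer] = list(range(start, start + count))
        start += count
    return assignment


def idle(assignment):
    """Consumers that ended up with no partitions."""
    return [c for c, parts in assignment.items() if not parts]


more_consumers = range_assign(4, [f"c{i}" for i in range(6)])
fewer_consumers = range_assign(6, ["c0", "c1", "c2"])
equal = range_assign(4, ["c0", "c1", "c2", "c3"])

print("4 partitions, 6 consumers -> idle:", idle(more_consumers))
print("6 partitions, 3 consumers ->", fewer_consumers)
print("4 partitions, 4 consumers ->", equal)
```

The first case leaves two consumers with nothing to read, the second gives each consumer two partitions, and the third gives exactly one partition per consumer.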
Possible Cases:
- Partitions > Consumers:
  - Some consumers read multiple partitions.
  - All consumers stay busy, so resources are well utilized and throughput remains high.
- Partitions < Consumers:
  - Some consumers are idle.
  - Resources are wasted.
- Partitions = Consumers:
  - Each consumer gets one partition.
  - Parallelism and load balancing are at their best.
How Do You Configure Kafka?
- Setting the partition count and replication factor when creating a topic:
kafka-topics.sh --create \
--bootstrap-server localhost:9092 \
--topic order-events \
--partitions 4 \
--replication-factor 3
- This command creates the topic with 4 partitions and a replication factor of 3. A replication factor of 3 requires at least 3 brokers in the cluster.
- Increasing partitions (scaling):
kafka-topics.sh --alter \
--bootstrap-server localhost:9092 \
--topic order-events \
--partitions 6
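- This increases the topic's partition count from 4 to 6, but existing data is not redistributed. The partition count can only be increased, and for keyed messages the key-to-partition mapping changes, so new messages for a key may land in a different partition than older ones.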
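To see why adding partitions affects keyed messages, the sketch below compares where keys map with 4 partitions and with 6. It assumes hash-modulo partitioning and uses `zlib.crc32` as a stand-in for Kafka's murmur2 hash. The `partition_for` helper and the `user-N` keys are made up for illustration.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    # crc32 is an illustrative stand-in; Kafka's default partitioner uses murmur2.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


keys = [f"user-{i}" for i in range(1000)]

# A key "moves" if its partition differs between the old and new counts.
moved = [k for k in keys if partition_for(k, 4) != partition_for(k, 6)]
print(f"{len(moved)} of {len(keys)} keys map to a different partition after 4 -> 6")
```

With a uniformly distributed hash, roughly two thirds of keys would change partition when going from 4 to 6. This is why it is usually safer to choose the partition count up front when per-key ordering matters.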
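- Increasing the replication factor:
  - This cannot be done with kafka-topics.sh --alter. It requires a partition reassignment (for example with kafka-reassign-partitions.sh and a JSON plan that lists the new replica set for each partition), after which the new replicas have to copy the existing data.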
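A reassignment plan is a JSON document that names the replica brokers for each partition. The sketch below generates one for the order-events topic. The broker IDs 1, 2, and 3 and the `reassignment_plan` helper are assumptions for illustration, so substitute the broker IDs of your own cluster.

```python
import json


def reassignment_plan(topic, num_partitions, broker_ids, replication_factor):
    """Build a reassignment plan giving every partition `replication_factor` replicas."""
    if replication_factor > len(broker_ids):
        raise ValueError("replication factor cannot exceed the number of brokers")
    partitions = []
    for p in range(num_partitions):
        # Rotate through the brokers so replicas are spread across them.
        replicas = [
            broker_ids[(p + i) % len(broker_ids)] for i in range(replication_factor)
        ]
        partitions.append({"topic": topic, "partition": p, "replicas": replicas})
    return {"version": 1, "partitions": partitions}


plan = reassignment_plan(
    "order-events", num_partitions=4, broker_ids=[1, 2, 3], replication_factor=3
)
print(json.dumps(plan, indent=2))
```

A plan like this is typically saved to a file and passed to kafka-reassign-partitions.sh with --reassignment-json-file and --execute. Check the tool's documentation for your Kafka version before running it against a real cluster.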
Best Practices:
- As a rule of thumb, create about twice as many partitions as consumers (Partitions = Consumers x 2) to leave room to scale out.
- Use a replication factor of 3 for better fault tolerance.
- Use consistent keys (such as a user ID or order ID) so that related messages stay in order.
- Keep the number of partitions per broker in check; a commonly cited guideline is no more than about 2000 per broker, beyond which performance can suffer.