DEV Community

Donald Johnson


A Bite-Sized Journey into Kafka, Arrow, and Go

CandyFlow isn’t an actual product; it’s a playful concept to show what happens when you combine Apache Kafka (scalable streaming), Apache Arrow (fast in-memory columnar data), and Go (efficient microservices with built-in concurrency). Used together, these three technologies make a lean but powerful data pipeline: in our load test it served around 10,000 requests per second with sub-millisecond p(95) latency.

1. Why Kafka + Arrow + Go?

  1. Kafka:

    • A durable, distributed message broker that ingests massive volumes of data and streams it in real time.
  2. Arrow:

    • A columnar in-memory format, well suited to zero-copy reads and fast analytical queries.
  3. Go:

    • Offers lightweight concurrency (goroutines and channels) and a standard library that makes HTTP endpoints and consumers simple to build.

Putting them in Docker Compose means you can spin up a working prototype with minimal overhead, then scale out if you need bigger volumes in production.


2. Under the Hood (Conceptually)

  • Producer (Go) → Publishes JSON “candy price” updates to Kafka.
  • Consumer (Go + Arrow) → Reads from Kafka, appends each message to an Arrow-based table in memory, then exposes HTTP endpoints (/cheapest, etc.) that answer user queries directly from that table.
  • topic-init container → Creates the Kafka topic automatically on startup.
  • ZooKeeper & Kafka → Provide the durable messaging backbone.

CandyFlow is purely an illustrative name; the “candy price” angle is just for fun. In reality, you could track e-commerce prices, sensor data, or any streaming events that need real-time lookups.
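The consumer described in this section can be sketched with the Go standard library alone. This is a minimal, hypothetical sketch rather than CandyFlow's actual code: the `PriceUpdate` fields (`candy`, `store`, `price`) are assumed, `/cheapest` is assumed to return the single lowest-priced row, and plain per-column slices stand in for an Arrow table so the example runs without the Arrow Go module. A real consumer would build Arrow arrays instead, but the shape is the same: decode each Kafka message, append one value per column, and answer queries by scanning a column.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"sync"
)

// PriceUpdate is a hypothetical shape for one "candy price" message on the
// Kafka topic; the field names are assumptions, not the original schema.
type PriceUpdate struct {
	Candy string  `json:"candy"`
	Store string  `json:"store"`
	Price float64 `json:"price"`
}

// PriceTable keeps updates column by column (one slice per field), a
// stdlib stand-in for an Arrow table. The RWMutex lets the Kafka consumer
// goroutine append while HTTP handlers read concurrently.
type PriceTable struct {
	mu    sync.RWMutex
	candy []string
	store []string
	price []float64
}

// AppendJSON decodes one Kafka message value and appends it to every column.
func (t *PriceTable) AppendJSON(msg []byte) error {
	var u PriceUpdate
	if err := json.Unmarshal(msg, &u); err != nil {
		return fmt.Errorf("decode price update: %w", err)
	}
	t.mu.Lock()
	defer t.mu.Unlock()
	t.candy = append(t.candy, u.Candy)
	t.store = append(t.store, u.Store)
	t.price = append(t.price, u.Price)
	return nil
}

// Cheapest finds the minimum by scanning only the price column, then reads
// the other columns for that single row. ok is false while the table is empty.
func (t *PriceTable) Cheapest() (row PriceUpdate, ok bool) {
	t.mu.RLock()
	defer t.mu.RUnlock()
	if len(t.price) == 0 {
		return PriceUpdate{}, false
	}
	best := 0
	for i, p := range t.price {
		if p < t.price[best] {
			best = i
		}
	}
	return PriceUpdate{Candy: t.candy[best], Store: t.store[best], Price: t.price[best]}, true
}

// CheapestHandler serves GET /cheapest as JSON, or 404 until data arrives.
func (t *PriceTable) CheapestHandler(w http.ResponseWriter, r *http.Request) {
	row, ok := t.Cheapest()
	if !ok {
		http.Error(w, "no prices yet", http.StatusNotFound)
		return
	}
	w.Header().Set("Content-Type", "application/json")
	_ = json.NewEncoder(w).Encode(row)
}
```

To wire it up, a consumer loop would pass each Kafka message value to `AppendJSON` (the Kafka client call that fetches messages is left out), and the HTTP side would register `http.HandleFunc("/cheapest", table.CheapestHandler)`, where `table` is a `*PriceTable`. The `RWMutex` lets many concurrent `/cheapest` readers proceed in parallel while the consumer goroutine takes the write lock briefly for each append. Scanning the price column is O(n) per request; in this append-only design, tracking the running minimum on each append would make the query O(1) if the table grows large.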


3. The Performance Numbers

Using k6 load tests, we hammered the consumer endpoint (/cheapest):

  1. Ramping from 1k RPS to 10k RPS.
  2. Achieved a p(95) latency of ~0.4–0.5 ms.
  3. Zero HTTP errors across millions of requests.
  4. Only rare outliers around 200 ms, likely due to minor GC/network blips.

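These numbers come from our own k6 run, so treat them as indicative rather than a general benchmark. They also measure the query path (serving /cheapest from the in-memory Arrow table), not end-to-end Kafka ingestion latency. Within that scope, the throughput and sub-millisecond latency show how Arrow’s columnar structure, Go’s concurrency, and Kafka’s streaming capabilities fit together.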


4. Not a Product, but a Teaching Tool

Remember: CandyFlow is not a real candy-price aggregator. It’s an example designed to:

  • Demonstrate the synergy of Kafka (for ingestion), Arrow (for in-memory performance), and Go (for concurrency and HTTP).
  • Show that near real-time queries (sub-ms) are achievable under heavy load (up to ~10k RPS in our test).
  • Inspire you to apply this same concept to e-commerce price trackers, IoT sensor data streams, or real-time analytics.

5. Closing Thoughts

  • Cost-Effective & Scalable: The Docker Compose approach is quick to launch and test. You can expand partitions/replicas for bigger use cases.
  • Minimal Complexity: A few containers, a small amount of Go code, and a straightforward Arrow schema are all it takes.
  • Impressive Performance: Sub-millisecond p(95) latency at up to 10k RPS, without throwing specialized hardware or monstrous clusters at the problem.
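Expanding partitions is the main scaling lever mentioned above, and it interacts with message keys. Kafka's default partitioner hashes each message key (murmur2 in the Java client) modulo the partition count, so every update for one key lands on the same partition, in order. The minimal sketch below uses the standard library's FNV-1a hash as a stand-in, so its partition numbers will not match what a real Kafka client computes; keying messages by candy name is an assumption, not something the CandyFlow setup is known to do.

```go
package main

import "hash/fnv"

// PartitionFor maps a message key to a partition in [0, numPartitions).
// FNV-1a stands in for Kafka's murmur2-based default partitioner.
// numPartitions must be positive.
func PartitionFor(key string, numPartitions int) int {
	h := fnv.New32a()
	h.Write([]byte(key)) // Write on a hash.Hash never returns an error
	return int(h.Sum32() % uint32(numPartitions))
}
```

Because the result depends on `numPartitions`, growing a topic (say, from 6 to 12 partitions) moves many keys to different partitions, so per-key ordering only holds from that point forward. Choosing a partition count with some headroom up front avoids that reshuffle.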

CandyFlow stands as a sweet demonstration of what’s possible with Kafka, Arrow, and Go, and hopefully it sparks ideas for your own real-world streaming and analytics needs!
