Effective Backpressure Handling in Distributed Systems: Techniques, Implementations, and Workflows

Backpressure handling is essential in distributed systems: it lets a service manage load and stay available under heavy traffic. The need for it arises when a system or service cannot process incoming requests as quickly as they arrive. Left unchecked, the growing backlog leads to overload, failures, or data loss.

In this post, we'll explore four key techniques for handling backpressure, with implementation details and practical Java examples.

1. Rate Limiting (Token Bucket Algorithm)

Rate limiting controls how frequently a system processes requests, preventing overload by rejecting or delaying excess traffic. The Token Bucket Algorithm is a widely used rate-limiting technique.

How Token Bucket Works

The system maintains a "bucket" filled with tokens.
Each incoming request consumes a token.
Tokens are refilled at a fixed rate.
If the bucket is empty, new requests are either rejected or delayed.

Use Cases

API Gateways: Prevent abuse by limiting user requests within a time window.
Microservices: Avoid overwhelming downstream services with excessive traffic.

Key Characteristics

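Token Generation Rate: Sets the sustained request rate.
Bucket Capacity: Determines the burst limit. For example, a capacity of 10 with a refill rate of 5 tokens per second admits a burst of 10 requests, then a sustained 5 requests per second.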
Request Handling: Requests without tokens are delayed or dropped.

Implementation in Java

import java.util.concurrent.TimeUnit;

public class TokenBucketRateLimiter {
    private final long maxTokens;
    private final long refillTokens;
    private final long refillIntervalNanos;
    // Both fields below are guarded by "this": they are only touched inside tryConsume().
    private long currentTokens;
    private long lastRefillNanos;

    public TokenBucketRateLimiter(int maxTokens, int refillTokens, long refillInterval, TimeUnit timeUnit) {
        if (maxTokens <= 0 || refillTokens <= 0 || refillInterval <= 0) {
            throw new IllegalArgumentException("maxTokens, refillTokens and refillInterval must be positive");
        }
        this.maxTokens = maxTokens;
        this.refillTokens = refillTokens;
        this.refillIntervalNanos = timeUnit.toNanos(refillInterval);
        this.currentTokens = maxTokens; // Start with a full bucket
        this.lastRefillNanos = System.nanoTime(); // Monotonic clock, unaffected by wall-clock changes
    }

    // Called only from tryConsume(), so it runs under the same lock.
    private void refill() {
        long now = System.nanoTime();
        long intervals = (now - lastRefillNanos) / refillIntervalNanos;

        if (intervals > 0) {
            // Cap the interval count first so the multiplication cannot overflow.
            long tokensToAdd = Math.min(intervals, maxTokens) * refillTokens;
            currentTokens = Math.min(maxTokens, currentTokens + tokensToAdd);
            // Advance by whole intervals only, so leftover time counts toward the next refill.
            lastRefillNanos += intervals * refillIntervalNanos;
        }
    }

    public synchronized boolean tryConsume() {
        refill();
        if (currentTokens > 0) {
            currentTokens--;
            return true;
        }
        return false; // Request rejected
    }
}
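
The limiter above rejects a request when the bucket is empty. The other option described earlier, delaying the request, could be sketched roughly as follows. This is a minimal illustration under stated assumptions rather than a production limiter: the class name DelayingTokenBucket, the method reserveDelayNanos, and the injected clock are all hypothetical names chosen for this example. It tracks tokens as time credit (one token is worth nanosPerToken nanoseconds), which keeps the arithmetic in whole numbers.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch: a token bucket that tells the caller how long to wait
// instead of rejecting. All names here are illustrative assumptions.
class DelayingTokenBucket {
    private final long nanosPerToken;   // time needed to earn one token
    private final long maxCreditNanos;  // capacity expressed as time credit
    private final LongSupplier clock;   // supplies monotonic nanoseconds, e.g. System::nanoTime
    private long creditNanos;           // may go negative when tokens are borrowed
    private long lastNanos;

    DelayingTokenBucket(int capacity, long nanosPerToken, LongSupplier clock) {
        if (capacity <= 0 || nanosPerToken <= 0) {
            throw new IllegalArgumentException("capacity and nanosPerToken must be positive");
        }
        this.nanosPerToken = nanosPerToken;
        this.maxCreditNanos = Math.multiplyExact((long) capacity, nanosPerToken);
        this.clock = clock;
        this.creditNanos = maxCreditNanos; // start with a full bucket
        this.lastNanos = clock.getAsLong();
    }

    /** Takes one token and returns how long the caller should wait before proceeding (0 if no wait). */
    synchronized long reserveDelayNanos() {
        long now = clock.getAsLong();
        // Earn credit for the elapsed time, never exceeding the bucket capacity.
        creditNanos = Math.min(maxCreditNanos, creditNanos + (now - lastNanos));
        lastNanos = now;
        // Spend one token; a negative balance means the token is borrowed from future refills.
        creditNanos -= nanosPerToken;
        return creditNanos >= 0 ? 0L : -creditNanos;
    }
}
```

A caller would pass System::nanoTime as the clock and sleep for the returned delay, for example with TimeUnit.NANOSECONDS.sleep(delay), before doing the work. Note that this sketch lets the wait grow without bound under sustained overload; a real limiter would likely cap the delay and reject requests beyond it.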

2. Load Shedding

Load shedding drops excess requests when the system is overloaded, ensuring critical services remain functional.

How Load Shedding Works

The system rejects requests when traffic exceeds a pre-defined threshold.
Low-priority requests are discarded first.
This prevents the system from becoming entirely unresponsive.

Use Cases

Web Servers: Drop non-critical requests to maintain responsiveness.
Message Queues: Discard low-priority messages to process important ones.
Microservices: Keep services from crashing under excessive load.

Key Characteristics

Threshold-Based Shedding: Drops requests beyond a limit.
Priority Handling: High-priority requests are processed first.
Graceful Degradation: Keeps the system functional under heavy load.

Implementation in Java

import java.util.concurrent.atomic.AtomicInteger;

public class LoadShedding {
    private final int requestThreshold;
    // Number of requests currently in flight
    private final AtomicInteger currentRequests = new AtomicInteger(0);

    public LoadShedding(int requestThreshold) {
        this.requestThreshold = requestThreshold;
    }

    public boolean tryProcessRequest() {
        // Reserve a slot first; the counter includes this request while it runs.
        if (currentRequests.incrementAndGet() > requestThreshold) {
            currentRequests.decrementAndGet(); // Give the slot back
            System.out.println("Request rejected (load shedding).");
            return false;
        }
        try {
            // Process request (the real work goes here)
            System.out.println("Request processed.");
            return true;
        } finally {
            // Release the slot even if processing throws
            currentRequests.decrementAndGet();
        }
    }
}
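
The class above treats every request alike, whereas the description earlier says low-priority requests are discarded first. One way to sketch that idea is with two limits: a lower one for low-priority traffic and a hard one for everything. The class name PriorityLoadShedder, its Priority enum, and the two limit parameters are assumptions made for this illustration, not part of any particular library.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: low-priority requests are shed at a lower in-flight
// count than high-priority ones. All names are illustrative assumptions.
class PriorityLoadShedder {
    enum Priority { LOW, HIGH }

    private final int lowPriorityLimit; // LOW requests are shed once this many are in flight
    private final int hardLimit;        // all requests are shed once this many are in flight
    private final AtomicInteger inFlight = new AtomicInteger();

    PriorityLoadShedder(int lowPriorityLimit, int hardLimit) {
        if (lowPriorityLimit < 0 || lowPriorityLimit > hardLimit) {
            throw new IllegalArgumentException("need 0 <= lowPriorityLimit <= hardLimit");
        }
        this.lowPriorityLimit = lowPriorityLimit;
        this.hardLimit = hardLimit;
    }

    /** Tries to reserve a slot. On success the caller must call release() when the request finishes. */
    boolean tryAcquire(Priority priority) {
        int limit = (priority == Priority.HIGH) ? hardLimit : lowPriorityLimit;
        while (true) {
            int current = inFlight.get();
            if (current >= limit) {
                return false; // shed this request
            }
            // Compare-and-set so concurrent callers never push the count past the limit.
            if (inFlight.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    /** Frees the slot taken by a successful tryAcquire(). */
    void release() {
        inFlight.decrementAndGet();
    }
}
```

With limits of 2 and 3, for instance, the third concurrent low-priority request is shed while one more high-priority request is still admitted. Callers would typically put release() in a finally block so a slot is returned even when processing throws.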

3. Circuit Breaking

Circuit breakers prevent cascading failures by temporarily blocking requests to failing services.

Circuit Breaker States

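Closed: All requests are allowed, and failures are counted.
Open: Requests are blocked. The circuit enters this state when the failure count or failure rate exceeds a threshold.
Half-Open: After a timeout, a limited number of requests are allowed through to test whether the service has recovered.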

Use Cases

Microservices: Prevents one failing service from affecting others.
API Gateways: Fails fast instead of overloading backend services.

Key Characteristics

Failure Threshold: Opens the circuit if failures exceed the limit.
Recovery Mechanism: Tests whether the service has recovered.
Health Checks: Monitors service availability.

Implementation in Java

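import java.util.concurrent.TimeUnit;

public class CircuitBreaker {
    private enum State { CLOSED, OPEN, HALF_OPEN }

    private State state = State.CLOSED;
    private int failureCount = 0; // Failures since the last success
    private final int failureThreshold;
    private final long timeoutNanos;
    private long openedAtNanos;
    private boolean probeInFlight = false;

    // timeout is in milliseconds: how long the circuit stays open before a trial request
    public CircuitBreaker(int failureThreshold, long timeout) {
        this.failureThreshold = failureThreshold;
        this.timeoutNanos = TimeUnit.MILLISECONDS.toNanos(timeout);
    }

    // Methods are synchronized so state transitions stay consistent across threads.
    // Every allowed request must be followed by recordSuccess() or recordFailure().
    public synchronized boolean allowRequest() {
        switch (state) {
            case OPEN:
                if (System.nanoTime() - openedAtNanos > timeoutNanos) {
                    state = State.HALF_OPEN;
                    probeInFlight = true;
                    return true; // Allow a single trial request
                }
                return false;
            case HALF_OPEN:
                if (probeInFlight) {
                    return false; // Wait for the trial request to finish
                }
                probeInFlight = true;
                return true;
            case CLOSED:
            default:
                return true;
        }
    }

    public synchronized void recordSuccess() {
        if (state == State.HALF_OPEN) {
            state = State.CLOSED;
            probeInFlight = false;
        }
        failureCount = 0;
    }

    public synchronized void recordFailure() {
        failureCount++;
        // A failed trial request reopens the circuit immediately.
        if (state == State.HALF_OPEN || failureCount >= failureThreshold) {
            state = State.OPEN;
            openedAtNanos = System.nanoTime(); // Monotonic clock for measuring the open period
            probeInFlight = false;
        }
    }
}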
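
The breaker above opens on a count of failures, while the state description mentions a failure rate. Tracking a rate needs a record of recent outcomes. The following is a minimal sketch of a count-based sliding window that a breaker could consult; the class name FailureRateWindow and its methods are assumptions for illustration, and a time-based window would be an equally reasonable choice.

```java
// Hypothetical sketch: keeps the outcomes of the last N calls in a ring buffer
// and reports the share of failures. All names are illustrative assumptions.
class FailureRateWindow {
    private final boolean[] outcomes; // ring buffer, true = failure
    private int next = 0;             // index of the slot to overwrite next
    private int recorded = 0;         // number of valid slots, at most outcomes.length
    private int failures = 0;         // failures among the valid slots

    FailureRateWindow(int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        this.outcomes = new boolean[size];
    }

    /** Records the outcome of one call, evicting the oldest outcome once the window is full. */
    synchronized void record(boolean failed) {
        if (recorded == outcomes.length) {
            if (outcomes[next]) {
                failures--; // the evicted outcome was a failure
            }
        } else {
            recorded++;
        }
        outcomes[next] = failed;
        if (failed) {
            failures++;
        }
        next = (next + 1) % outcomes.length;
    }

    /** Share of failures among the recorded outcomes, or 0.0 if nothing has been recorded. */
    synchronized double failureRate() {
        return recorded == 0 ? 0.0 : (double) failures / recorded;
    }

    /** True once the window is full and the failure rate has reached the threshold. */
    synchronized boolean shouldOpen(double threshold) {
        return recorded == outcomes.length && failureRate() >= threshold;
    }
}
```

Requiring a full window before opening avoids tripping the breaker on the first one or two calls after startup. A breaker would call record(...) from its success and failure hooks and move to the open state when shouldOpen(...) returns true.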

4. Buffering

Buffering temporarily stores requests in a queue, smoothing out spikes in traffic. The queue should be bounded: an unbounded buffer only postpones overload until memory runs out.

How Buffering Works

Incoming requests are stored in a buffer (queue).
Requests are processed at a steady rate.
This prevents the system from being overwhelmed during traffic spikes.

Use Cases

Message Queues (Kafka, RabbitMQ): Store messages before processing.
Web Servers: Queue requests instead of rejecting them.
Streaming Systems: Process data at a controlled rate.

Key Characteristics

Temporary Storage: Holds requests until they can be processed.
Smoothing Load Spikes: Prevents sudden bursts from overwhelming the system.
Rate Matching: Balances data production and consumption.

Implementation in Java

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class BufferingSystem {
    private final BlockingQueue<String> buffer;

    public BufferingSystem(int bufferCapacity) {
        // A bounded queue is what creates backpressure: producers must wait once it is full.
        this.buffer = new LinkedBlockingQueue<>(bufferCapacity);
    }

    public void produce(String request) {
        try {
            buffer.put(request); // Blocks while the buffer is full, slowing the producer down
            System.out.println("Request added to buffer: " + request);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // Restore the interrupt flag for the caller
        }
    }

    public void consume() {
        try {
            String request = buffer.take(); // Block if buffer is empty
            System.out.println("Processing request: " + request);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        BufferingSystem system = new BufferingSystem(10);

        // Producer Thread
        new Thread(() -> {
            for (int i = 1; i <= 20; i++) {
                system.produce("Request-" + i);
            }
        }).start();

        // Consumer Thread
        new Thread(() -> {
            for (int i = 1; i <= 20; i++) {
                system.consume();
                try {
                    Thread.sleep(500); // Simulate processing delay
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }).start();
    }
}
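
In the example above, a producer blocks indefinitely when the buffer is full. Where a caller cannot wait forever, such as a web server holding a client connection, buffering can be combined with rejection by waiting only a bounded time for space. The sketch below illustrates this with the timed offer and poll methods of BlockingQueue; the class name TimedBuffer and its maxWaitMillis parameter are assumptions for this example.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a bounded buffer whose producers and consumers give up
// after a fixed wait instead of blocking forever. Names are illustrative assumptions.
class TimedBuffer {
    private final BlockingQueue<String> buffer;
    private final long maxWaitMillis;

    TimedBuffer(int capacity, long maxWaitMillis) {
        this.buffer = new ArrayBlockingQueue<>(capacity);
        this.maxWaitMillis = maxWaitMillis;
    }

    /** Returns false if no space opened up within maxWaitMillis; the caller can then reject the request. */
    boolean tryProduce(String request) {
        try {
            return buffer.offer(request, maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    /** Returns the oldest buffered request, or null if none arrived within maxWaitMillis. */
    String tryConsume() {
        try {
            return buffer.poll(maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return null;
        }
    }

    /** Number of requests currently waiting in the buffer. */
    int size() {
        return buffer.size();
    }
}
```

The wait time is a trade-off: a longer wait absorbs larger spikes but ties up producers for longer, while a wait of zero turns the buffer into pure load shedding once it is full.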

Final Thoughts

Backpressure handling techniques such as rate limiting, load shedding, circuit breaking, and buffering are essential for maintaining system stability. Choosing the right strategy depends on system requirements and constraints. As distributed systems grow, mastering these techniques is key to building scalable and resilient architectures.