Ujjwal Raj

Network Load Balancing in Distributed Systems

Welcome to this Sunday’s blog! Every week, I publish articles on distributed systems, exploring fascinating concepts and their real-world applications.

We know that scaling a server matters, but vertical scaling has hard limits: a single machine can only be given so much CPU and memory. Therefore, we need horizontal scaling: running multiple server instances in parallel and distributing requests among them, so that the load on each individual server decreases.


Stateful vs. Stateless Applications

In this article, I will discuss scaling stateless servers.

A stateful server maintains state across requests. For example, during a login process, a request to confirm an OTP relies on session data created by an earlier request. A stateless application, on the other hand, does not require any persisted state: each request is independent and self-contained. A Google search is a good example.

For stateful servers, more advanced replication and partitioning techniques are used, which I have discussed in previous blogs. A fundamental rule when designing a system is to offload state management to dedicated stores such as databases or S3 buckets, so that the application servers themselves can stay stateless.

Now, let's dive deeper into stateless server load balancing.

Load Balancing and Service Discovery

A load balancer can use several algorithms to distribute incoming requests to target servers, such as round-robin or hashing.
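To make this concrete, here is a minimal sketch in Python of both strategies (the backend addresses are made up for illustration):

```python
import hashlib
from itertools import cycle

# Hypothetical backend pool; in practice these addresses would come
# from a service-discovery registry.
BACKENDS = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]

# Round-robin: cycle through the pool, one backend per request.
_rr = cycle(BACKENDS)

def pick_round_robin():
    return next(_rr)

# Hashing: the same key (e.g., a client IP) always maps to the same
# backend, which helps with cache locality and sticky routing.
def pick_by_hash(key: str):
    digest = hashlib.sha256(key.encode()).digest()
    index = int.from_bytes(digest[:4], "big") % len(BACKENDS)
    return BACKENDS[index]

print(pick_round_robin())           # first call -> 10.0.0.1:8080
print(pick_by_hash("203.0.113.7"))  # stable choice for this client
```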

The load balancer is responsible for routing requests efficiently, so that no single server becomes a hotspot and the load stays evenly distributed.

One approach is to query every server about its availability before directing each request. However, this adds significant overhead. Another approach is to assign each request to a randomly chosen registered server, but this can fail if that server has crashed or become unavailable.

A hybrid approach, often called "the power of two choices," is to randomly select two or three servers, query only those for health and load, and direct the request to the best candidate.
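Here is a minimal sketch of that idea, assuming hypothetical is_healthy() and get_load() probes supplied by the service-discovery layer:

```python
import random

def pick_two_choices(backends, get_load, is_healthy):
    """Sample two backends, probe both, and pick the healthy one
    with the lower reported load (assumes len(backends) >= 2)."""
    a, b = random.sample(backends, 2)
    candidates = [s for s in (a, b) if is_healthy(s)]
    if not candidates:
        raise RuntimeError("neither sampled backend is healthy")
    return min(candidates, key=get_load)
```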

Service discovery features like these are provided by tools such as ZooKeeper, which is itself fault-tolerant.

Health Checks

There are two types of health checks: active and passive.

  • Passive Health Checks: The load balancer marks a server unhealthy when a regular request to it fails or times out.
  • Active Health Checks: Each server exposes a dedicated health-check endpoint that the load balancer queries regularly to assess its health.

Active health checks are particularly useful for managing overloaded or faulty servers. For example, if a server reports exhausted RAM or CPU, the load balancer can take it out of rotation until it recovers, and in an auto-scaling setup the same signals can trigger restarting instances or adding new ones.
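As an illustration, an active health checker can be as simple as a background loop that polls each server; the /health path and the shared healthy map below are assumptions made for this sketch:

```python
import threading
import time
import urllib.request

healthy = {}  # backend -> bool, read by the request-routing path

def health_check_loop(backends, interval=5.0, timeout=2.0):
    """Poll each backend's health endpoint and mark it up or down."""
    while True:
        for server in backends:
            url = f"http://{server}/health"  # assumed health endpoint
            try:
                with urllib.request.urlopen(url, timeout=timeout) as resp:
                    healthy[server] = (resp.status == 200)
            except OSError:
                healthy[server] = False  # unreachable or erroring -> unhealthy
        time.sleep(interval)

# Run in the background so it never blocks request handling:
# threading.Thread(target=health_check_loop, args=(BACKENDS,), daemon=True).start()
```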

DNS Load Balancing

Load balancing can also be performed at the DNS level. The DNS record for a domain holds multiple target IPs and distributes them to users, with each IP corresponding to one of several servers.

[Image: DNS-level load balancing, taken from the book Understanding Distributed Systems]

However, this approach is not fault-tolerant. If a server becomes unavailable, the DNS will keep serving its IP until the record is updated, and because resolvers and clients cache records for the duration of the TTL, users may continue hitting the dead IP even after that.
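You can observe DNS-level distribution yourself by resolving a large service's domain, which often returns several A records. A small sketch (the domain is just a placeholder):

```python
import socket

# Resolve all A records for a domain. Services using DNS load
# balancing typically publish several IPs, and resolvers may
# rotate their order between lookups.
infos = socket.getaddrinfo("example.com", 443, proto=socket.IPPROTO_TCP)
ips = sorted({info[4][0] for info in infos})
print(ips)
```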

Transport Layer Load Balancing

This type of load balancing occurs at the TCP layer (layer 4). The load balancer exposes a single virtual IP to clients and maintains a pool of connections to the servers behind it.

Each network packet carries a source IP and a destination IP, which the load balancer uses to direct the connection to the appropriate server, all at the transport layer, without ever inspecting the application data.

This approach can be optimized with direct server return, where the target server responds to the client directly instead of passing responses back through the load balancer. However, a major drawback is that a transport-layer balancer cannot terminate TLS, since it never looks inside the byte stream.
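To build intuition, here is a heavily simplified user-space sketch of L4 forwarding: accept a TCP connection, pick a backend, and shuttle raw bytes both ways. Real transport-layer balancers work at the packet level for performance, and the backend addresses here are made up:

```python
import socket
import threading

BACKENDS = ["10.0.0.1:8080", "10.0.0.2:8080"]  # hypothetical pool
_next = 0

def pipe(src, dst):
    """Copy raw bytes one way until either side closes."""
    try:
        while (data := src.recv(4096)):
            dst.sendall(data)
    except OSError:
        pass
    finally:
        dst.close()

def serve(listen_port=9000):
    """Accept TCP connections and forward them round-robin."""
    global _next
    lsock = socket.create_server(("0.0.0.0", listen_port))
    while True:
        client, _ = lsock.accept()
        host, port = BACKENDS[_next % len(BACKENDS)].split(":")
        _next += 1
        backend = socket.create_connection((host, int(port)))
        # The proxy never parses the bytes it forwards, which is
        # exactly why it cannot terminate TLS.
        threading.Thread(target=pipe, args=(client, backend), daemon=True).start()
        threading.Thread(target=pipe, args=(backend, client), daemon=True).start()
```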

If you want to learn more about TLS, check my previous blog.

Application Layer Load Balancing

Application layer load balancing is typically handled by reverse proxies. You can refer to my previous blogs for a detailed explanation of reverse proxies.
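In contrast to a transport-layer balancer, an application-layer proxy parses the HTTP request and can route on its contents. A minimal sketch with a made-up path-based routing table:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
import urllib.request

# Hypothetical path-based routing table.
ROUTES = {"/api": "10.0.0.1:8080", "/static": "10.0.0.2:8080"}

class L7Proxy(BaseHTTPRequestHandler):
    def do_GET(self):
        # Because we parse the HTTP request, we can pick a backend
        # per path, something an L4 balancer cannot do.
        prefix = "/" + self.path.split("/")[1]
        backend = ROUTES.get(prefix, ROUTES["/api"])
        with urllib.request.urlopen(f"http://{backend}{self.path}") as resp:
            body = resp.read()
        self.send_response(resp.status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# HTTPServer(("0.0.0.0", 8000), L7Proxy).serve_forever()
```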

A challenge with this approach is that all requests must pass through the load balancer, which can create a performance bottleneck.

To address this, the sidecar pattern, and more broadly a service mesh, is widely used when all services belong to the same organization. In this pattern, a local proxy (e.g., NGINX or Envoy) runs alongside each service instance, handling service discovery, health checks, and request routing. The trade-off is added operational complexity.

The sidecar pattern is particularly popular in microservices architectures, where services communicate internally without requiring a centralized load balancer.

Conclusion

Load balancing is a critical aspect of scaling stateless servers. By leveraging different load balancing techniques—DNS-level, transport-layer, and application-layer load balancing—we can ensure efficient request distribution and improve system resilience.

Here are some links to my previous posts, which I publish every Sunday on distributed systems:

Feel free to check them out and share your thoughts!
