Load Balancing Explained: Algorithms and Strategies

Learn how load balancers distribute traffic across servers. Covers round robin, least connections, consistent hashing, and when to use each.

By Akash Sharma·5 min read
#system design
#load balancing
#scalability
#high availability
#backend
#infrastructure

Your single server handles 100 requests per second fine. Traffic grows. Now it's getting 500. The server starts struggling. You add more servers — but how do you split traffic between them?

That's exactly what a load balancer does. It sits in front of your servers and decides which one handles each incoming request.

What Load Balancers Actually Do

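A load balancer receives all incoming traffic and distributes it across a pool of servers. For HTTP traffic it acts as a reverse proxy: clients talk to the load balancer, and the load balancer talks to your servers.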

Beyond just splitting traffic, load balancers also:

  • Health check servers: remove them from rotation if they stop responding
  • Terminate TLS (SSL): handle HTTPS encryption so your app servers don't have to
  • Handle failover: route around dead servers automatically
  • Stick sessions: send the same user to the same server when needed

The interesting part is how they decide which server gets each request. That's the load balancing algorithm.

Load Balancing Algorithms

Round Robin

The simplest approach. Send request 1 to server 1, request 2 to server 2, request 3 to server 3, then back to server 1.

plaintext
Request 1 → Server A
Request 2 → Server B
Request 3 → Server C
Request 4 → Server A (start over)

Best for: Servers with similar specs handling similar requests.

Problem: It ignores current server load. Server A can be tied up with a 10-second request and still receive its next turn in the rotation, while Server B sits idle.
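The rotation above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular load balancer implements it; the class name `RoundRobinBalancer` is made up for the example.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out servers in a fixed rotating order (illustrative sketch)."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        # cycle() repeats the list forever: A, B, C, A, B, C, ...
        self._rotation = cycle(list(servers))

    def pick_server(self):
        # Each call advances the rotation by one position.
        return next(self._rotation)


balancer = RoundRobinBalancer(["Server A", "Server B", "Server C"])
picks = [balancer.pick_server() for _ in range(4)]
print(picks)  # ['Server A', 'Server B', 'Server C', 'Server A']
```

Note that the balancer never looks at what the servers are doing, which is exactly why it cannot react to a slow request.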

Weighted Round Robin

Same as round robin, but servers with more capacity get more requests. If Server A is twice as powerful, it gets twice the traffic.

plaintext
Server A (weight 2): gets requests 1, 2, 5, 6...
Server B (weight 1): gets requests 3, 7...
Server C (weight 1): gets requests 4, 8...

Best for: Heterogeneous infrastructure where servers have different capacities.
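One simple way to get this behavior is to repeat each server in the rotation as many times as its weight. The sketch below assumes integer weights and uses a made-up helper name, `weighted_rotation`; it reproduces the request order shown above.

```python
from itertools import cycle


def weighted_rotation(weights):
    """Build a repeating schedule in which each server appears `weight` times."""
    schedule = []
    for server, weight in weights.items():
        if weight < 1:
            raise ValueError(f"weight for {server} must be at least 1")
        schedule.extend([server] * weight)
    return cycle(schedule)


rotation = weighted_rotation({"A": 2, "B": 1, "C": 1})
first_eight = [next(rotation) for _ in range(8)]
print(first_eight)  # ['A', 'A', 'B', 'C', 'A', 'A', 'B', 'C']
```

This naive version sends a heavy server its requests back to back. Production balancers often interleave the picks more evenly, but the long-run share per server is the same.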

Least Connections

Route each new request to the server with the fewest active connections right now.

python
def pick_server(servers):
    # Pick the server with the fewest active connections right now
    return min(servers, key=lambda s: s.active_connections)

Best for: Long-lived connections like WebSockets, or when requests vary a lot in processing time. If some requests take 100ms and others take 10 seconds, round robin can stack several slow requests on one server, while least connections adapts to the actual load.
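The selection only works if the balancer keeps the connection counts up to date. Here is a small sketch of that bookkeeping, assuming a hypothetical `Server` dataclass and helper functions `handle_request` and `finish_request` that are not part of any real library.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    active_connections: int = 0


def pick_server(servers):
    # Same selection rule as above: fewest active connections wins.
    return min(servers, key=lambda s: s.active_connections)


def handle_request(servers):
    """Pick a server and count the new connection against it."""
    server = pick_server(servers)
    server.active_connections += 1
    return server


def finish_request(server):
    """Release the connection when the request completes."""
    server.active_connections -= 1


servers = [Server("A"), Server("B")]
slow = handle_request(servers)       # goes to A (ties are broken by list order)
fast = handle_request(servers)       # goes to B, since A now has 1 connection
finish_request(fast)                 # B finishes quickly, A is still busy
next_pick = handle_request(servers)  # B again, because it is idle
print(slow.name, fast.name, next_pick.name)  # A B B
```

Round robin would have sent the third request to A regardless of the slow request it was still serving.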

IP Hash (Sticky Sessions)

Hash the client's IP address to always send them to the same server.

python
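import hashlib

def pick_server(client_ip, servers):
    # Built-in hash() is randomized per process for strings; a digest stays stable across restarts
    digest = hashlib.sha256(client_ip.encode()).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]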

Best for: Stateful applications where session data is stored on a specific server.

Problem: If a server goes down, its clients get routed elsewhere and lose their sessions. With plain modulo hashing the pool size changes too, so many clients on the healthy servers are remapped as well. The better fix is to move session state to a shared cache like Redis.
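To see how disruptive a pool change is with modulo hashing, the sketch below counts how many clients change servers when one of three servers fails. It assumes a stable hash (SHA-256 here) and uses made-up client IPs; the exact counts depend on the hash, so none are shown.

```python
import hashlib


def stable_hash(value):
    # SHA-256 gives the same result in every process, unlike built-in hash() on strings.
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


def pick_server(client_ip, servers):
    return servers[stable_hash(client_ip) % len(servers)]


clients = [f"10.1.{i // 256}.{i % 256}" for i in range(1000)]  # made-up client IPs
before = ["A", "B", "C"]
after = ["A", "B"]  # server C has failed

on_failed = sum(pick_server(ip, before) == "C" for ip in clients)
moved = sum(pick_server(ip, before) != pick_server(ip, after) for ip in clients)
print(f"{on_failed} clients were on C, but {moved} clients changed servers")
```

Only about a third of the clients were on the failed server, yet roughly two thirds end up on a different one, because changing `len(servers)` changes the result of the modulo for most hashes.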

Consistent Hashing

A more sophisticated version of IP hash. It places servers on a hash ring, so adding or removing a server remaps only the keys that server owned (roughly 1/N of clients for N servers) instead of reshuffling most of them. Used in distributed caches and databases.

For more detail: Consistent Hashing Explained.
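As a rough illustration of the hash ring idea, here is a minimal sketch. The `HashRing` class, the `replicas` parameter (virtual nodes per server), and the client IPs are all assumptions made for the example, and the ring is assumed to be non-empty when `pick_server` is called.

```python
import bisect
import hashlib


def stable_hash(value):
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, servers, replicas=100):
        # Each server is placed on the ring at `replicas` points (virtual nodes).
        self._replicas = replicas
        self._points = []  # sorted hash positions on the ring
        self._owners = {}  # hash position -> server name
        for server in servers:
            self.add(server)

    def add(self, server):
        for i in range(self._replicas):
            point = stable_hash(f"{server}#{i}")
            bisect.insort(self._points, point)
            self._owners[point] = server

    def remove(self, server):
        for i in range(self._replicas):
            point = stable_hash(f"{server}#{i}")
            self._points.remove(point)
            del self._owners[point]

    def pick_server(self, key):
        # Walk clockwise to the first point at or after the key's hash,
        # wrapping around to the start of the ring if needed.
        index = bisect.bisect_left(self._points, stable_hash(key))
        return self._owners[self._points[index % len(self._points)]]


ring = HashRing(["A", "B", "C"])
clients = [f"10.1.{i // 256}.{i % 256}" for i in range(1000)]  # made-up client IPs
before = {ip: ring.pick_server(ip) for ip in clients}
ring.remove("C")
after = {ip: ring.pick_server(ip) for ip in clients}
moved = [ip for ip in clients if before[ip] != after[ip]]
print(all(before[ip] == "C" for ip in moved))  # True
```

Removing C only deletes C's points from the ring, so a client whose nearest point belonged to A or B keeps its server. Only the clients that were on C have to move.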

Layer 4 vs Layer 7 Load Balancing

Load balancers operate at different layers of the network stack:

Layer 4 (Transport): Works at the TCP/UDP level. Routes packets based on IP address and port. Very fast, but can't inspect HTTP content.

Layer 7 (Application): Works at the HTTP level. Can route based on URL path, headers, cookies, or even request body content.

plaintext
Layer 7 routing example:
/api/*     → API server pool
/images/*  → Image server pool
/checkout  → Checkout server pool (with more resources)

Layer 7 is more flexible — it lets you route different parts of your app to different server pools, enable A/B testing, and do canary deployments. Most modern load balancers (Nginx, HAProxy, AWS ALB) support Layer 7.
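Path-based routing boils down to matching the request path against a list of rules. The sketch below mirrors the routing example above; the pool names and the `DEFAULT_POOL` fallback are hypothetical, and real load balancers express the same thing in their own config syntax.

```python
# (path prefix, pool name) pairs, checked in order
ROUTES = [
    ("/api/", "api-pool"),
    ("/images/", "image-pool"),
    ("/checkout", "checkout-pool"),
]
DEFAULT_POOL = "web-pool"  # hypothetical fallback for everything else


def pick_pool(path):
    """Return the pool for the first prefix that matches the request path."""
    for prefix, pool in ROUTES:
        if path.startswith(prefix):
            return pool
    return DEFAULT_POOL


print(pick_pool("/api/users/42"))  # api-pool
print(pick_pool("/checkout"))      # checkout-pool
print(pick_pool("/about"))         # web-pool
```

Once a pool is chosen, any of the algorithms from the previous section can pick a server inside it.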

Health Checks: Automatically Removing Dead Servers

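A good load balancer continuously checks that its servers are healthy.

nginx
upstream backend {
    # Passive check: 3 failed attempts within 30s take a server out for 30s
    server 10.0.0.1 max_fails=3 fail_timeout=30s;
    server 10.0.0.2 max_fails=3 fail_timeout=30s;
    server 10.0.0.3 max_fails=3 fail_timeout=30s;
}

A typical active health check requests a /health endpoint on each server every few seconds. No response (or an error response) means the server gets removed from the pool. When it recovers, it gets added back automatically. Open-source Nginx does this passively, as in the config above: it watches real requests and uses max_fails and fail_timeout to sideline a failing server. Active probes via the health_check directive are available in NGINX Plus, and HAProxy supports them natively.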
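The remove-and-restore logic can be sketched as a small state machine. This is an illustration under assumptions, not any product's implementation: the `HealthChecker` class and its `fail_threshold` and `rise_threshold` parameters are made up, and the probe is passed in as a function so the example runs without a network. In practice the probe would be an HTTP request to /health with a short timeout.

```python
class HealthChecker:
    def __init__(self, servers, probe, fail_threshold=3, rise_threshold=2):
        self._probe = probe                    # callable: server -> True if healthy
        self._fail_threshold = fail_threshold  # failures in a row before removal
        self._rise_threshold = rise_threshold  # successes in a row before re-adding
        self._healthy = {s: True for s in servers}
        # Consecutive probe results that contradict the current state.
        self._streak = {s: 0 for s in servers}

    def run_once(self):
        """Probe every server once and update its state if a threshold is hit."""
        for server, healthy in list(self._healthy.items()):
            ok = bool(self._probe(server))
            if ok == healthy:
                self._streak[server] = 0
                continue
            self._streak[server] += 1
            needed = self._rise_threshold if ok else self._fail_threshold
            if self._streak[server] >= needed:
                self._healthy[server] = ok
                self._streak[server] = 0

    def healthy_servers(self):
        return [s for s, ok in self._healthy.items() if ok]


down = {"10.0.0.2"}  # simulate one server that stops responding
checker = HealthChecker(
    ["10.0.0.1", "10.0.0.2", "10.0.0.3"],
    probe=lambda server: server not in down,
)
for _ in range(3):
    checker.run_once()
after_failure = checker.healthy_servers()
print(after_failure)  # ['10.0.0.1', '10.0.0.3']

down.clear()  # the server recovers
for _ in range(2):
    checker.run_once()
after_recovery = checker.healthy_servers()
print(after_recovery)  # ['10.0.0.1', '10.0.0.2', '10.0.0.3']
```

Requiring several consecutive results before changing state keeps one dropped probe from pulling a healthy server out of the pool.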

This is how you get high availability without manual intervention.

Common Load Balancer Tools

| Tool | Type | Best for |
|---|---|---|
| Nginx | Software | Web apps, API gateways |
| HAProxy | Software | High-performance TCP/HTTP LB |
| AWS ALB | Managed Layer 7 | AWS workloads, no infra to manage |
| AWS NLB | Managed Layer 4 | Millions of req/sec, static IP needed |
| Traefik | Software | Kubernetes, auto service discovery |

When Load Balancing Isn't Enough

Load balancers distribute traffic but they're not magic. Watch out for:

Database bottleneck: A load balancer helps your app servers, but they all hit the same database. If the database is the bottleneck, you need database scaling strategies.

Stateful sessions: If your app stores session state in memory, load balancing becomes painful. Move session state to Redis so any server can handle any request.

Hot spots: If certain users or paths generate much heavier traffic, you need smarter routing or caching — not just more servers.

Key Takeaways

  • Load balancers split traffic across servers and handle failover automatically
  • Round robin is simple; least connections handles variable request times better
  • Layer 7 load balancing lets you route by URL path, headers, and content
  • Always implement health checks so dead servers get removed automatically
  • Sticky sessions are a workaround — make your app stateless with Redis instead

Load balancing is usually one of the first things you add when scaling beyond a single server.

Related reading: Vertical vs Horizontal Scaling · Consistent Hashing
