Load Balancing Explained: Algorithms, Layers, and Strategies

Load balancing is the practice of distributing incoming network traffic across multiple servers so no single machine becomes a bottleneck or single point of failure. A load balancer sits in front of your server pool, intercepts every incoming request, and decides which backend server should handle it — based on an algorithm, current server load, or routing rules.

Without load balancing, scaling means vertical scaling: throw more RAM and CPU at your one server until you hit hardware limits. With load balancing, you scale horizontally: add more servers to the pool and let the load balancer spread the work. Horizontal scaling is usually cheaper per unit of capacity, more fault-tolerant, and can grow far beyond the limits of any single machine.

This post covers every major load balancing algorithm with pseudocode, the difference between Layer 4 and Layer 7 load balancing, health checks and failure handling, sticky sessions, tool comparisons, nginx and HAProxy config examples, and global load balancing with GeoDNS.

What Load Balancers Actually Do

Your single server handles 100 requests per second fine. Traffic grows. Now it's getting 500. The server starts struggling — CPU spikes, memory fills, response times climb. You add more servers — but how do you split traffic between them? How do clients know which server to hit? How do servers get removed when they crash?

That's exactly what a load balancer does. It sits in front of your servers and decides which one handles each incoming request. Clients connect to the load balancer's IP address, never directly to individual servers. The load balancer proxies the request to a backend server, gets the response, and returns it to the client.

Beyond just splitting traffic, load balancers also:

Health check servers — remove them from rotation if they stop responding
Terminate SSL — handle HTTPS encryption so your app servers don't have to
Handle failover — route around dead servers automatically
Stick sessions — send the same user to the same server when needed
Rate limit — protect backend servers from traffic spikes
Compress responses — gzip content at the edge before sending to clients
Log and observe — centralize access logs, latency metrics, error rates

The interesting part is how they decide which server gets each request. That's the load balancing algorithm.

Load Balancing Algorithms Deep Dive

Round Robin

The simplest approach. Requests cycle through servers in order: request 1 goes to server A, request 2 to server B, request 3 to server C, then back to server A.

plaintext

Request 1 → Server A
Request 2 → Server B
Request 3 → Server C
Request 4 → Server A  ← cycles back
Request 5 → Server B

Pseudocode:

python

class RoundRobinBalancer:
    def __init__(self, servers):
        self.servers = servers
        self.index = 0
 
    def pick_server(self):
        server = self.servers[self.index]
        self.index = (self.index + 1) % len(self.servers)
        return server

Nginx config (default — no directive needed):

nginx

upstream backend {
    server 10.0.0.1;
    server 10.0.0.2;
    server 10.0.0.3;
}
 
server {
    listen 80;
    location / {
        proxy_pass http://backend;
    }
}

When to use: Servers with identical specs handling similarly-sized requests (e.g., stateless API calls, microservice endpoints).

When it breaks down: If request processing times vary widely. If Server A is stuck on a slow 10-second database query, round robin still hands it every third new request, even while Server B has finished its fast requests and has spare capacity. Round robin doesn't measure actual load; it only distributes by turn.

Pros and cons:

Pros	Cons
Dead simple to implement	Ignores actual server load
No state to track	Performs poorly with variable request durations
Predictable distribution	All servers must have similar capacity
Low overhead	Hot spots if traffic isn't uniform

Weighted Round Robin

Same cycle as round robin, but servers with more capacity receive proportionally more requests. Assign weights based on CPU, RAM, or measured throughput.

plaintext

Server A (weight 3): handles requests 1, 2, 3, 6, 7, 8 ...
Server B (weight 1): handles requests 4, 9 ...
Server C (weight 1): handles requests 5, 10 ...

Pseudocode:

python

class WeightedRoundRobinBalancer:
    def __init__(self, servers):
        # servers = [{"host": "10.0.0.1", "weight": 3}, ...]
        self.pool = []
        for s in servers:
            self.pool.extend([s["host"]] * s["weight"])
        self.index = 0
 
    def pick_server(self):
        server = self.pool[self.index]
        self.index = (self.index + 1) % len(self.pool)
        return server
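The expanded-pool version sends Server A three requests back to back before B or C get one. Nginx's round-robin upstream instead uses a smooth weighted round robin that interleaves picks: each turn, every server adds its weight to a running total, the highest total wins, and the winner subtracts the sum of all weights. A minimal Python sketch of that idea (host names are placeholders, and this is not nginx's actual code):

```python
class SmoothWeightedRoundRobin:
    """Interleaved weighted round robin: picks follow the weights without bursts."""

    def __init__(self, weights: dict[str, int]):
        # weights maps a host to a positive integer weight, e.g. {"A": 3, "B": 1}
        self.weights = dict(weights)
        self.current = {host: 0 for host in self.weights}
        self.total = sum(self.weights.values())

    def pick_server(self) -> str:
        # Every server's running total grows by its weight on each pick.
        for host, weight in self.weights.items():
            self.current[host] += weight
        # The largest running total wins; max() keeps the first host on ties.
        chosen = max(self.current, key=self.current.get)
        # The winner gives back the sum of all weights, letting the others catch up.
        self.current[chosen] -= self.total
        return chosen


lb = SmoothWeightedRoundRobin({"A": 3, "B": 1, "C": 1})
print([lb.pick_server() for _ in range(5)])  # ['A', 'B', 'A', 'C', 'A']
```

Over ten picks A still gets six requests and B and C two each, but A never gets more than two in a row.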

Nginx config:

nginx

upstream backend {
    server 10.0.0.1 weight=3;
    server 10.0.0.2 weight=1;
    server 10.0.0.3 weight=1;
}

When to use: Heterogeneous infrastructure — e.g., a mix of older 2-core servers and newer 8-core servers. Also useful during rolling deployments when you're gradually shifting traffic to new instances (start the new server at weight=1, ramp up after it proves healthy).

Pros	Cons
Handles servers with different capacity	Weights are static — must be tuned manually
Good for gradual traffic shifts	Still ignores real-time load
Simple extension of round robin	Wrong weights cause uneven distribution

Least Connections

Route each new request to whichever server currently has the fewest active connections. This adapts dynamically to varying request durations — servers handling fewer concurrent requests receive the next one.

python

def pick_server(servers):
    # Pick the server handling the fewest concurrent requests right now
    return min(servers, key=lambda s: s.active_connections)
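The one-liner assumes something keeps active_connections up to date. A minimal sketch of that bookkeeping for an in-process balancer (the class and method names are illustrative): the count goes up when a request is assigned and down when it finishes, so a slow request occupies its server's slot for its whole duration. The lock makes choosing and reserving a server one atomic step for concurrent callers.

```python
import threading
from contextlib import contextmanager


class LeastConnectionsBalancer:
    def __init__(self, servers):
        # Active (in-flight) connection count per server
        self.active = {server: 0 for server in servers}
        self.lock = threading.Lock()

    @contextmanager
    def connection(self):
        # Choose and reserve atomically so concurrent callers see the new count
        with self.lock:
            server = min(self.active, key=self.active.get)
            self.active[server] += 1
        try:
            yield server
        finally:
            # Release the slot whether the request succeeded or raised
            with self.lock:
                self.active[server] -= 1


lb = LeastConnectionsBalancer(["10.0.0.1", "10.0.0.2"])
with lb.connection() as first:       # a long request is still running...
    with lb.connection() as second:  # ...so the next one goes to the other server
        print(first, second)         # 10.0.0.1 10.0.0.2
```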

Nginx config:

nginx

upstream backend {
    least_conn;
    server 10.0.0.1;
    server 10.0.0.2;
    server 10.0.0.3;
}

When to use: Long-lived connections (WebSockets, gRPC streaming, file uploads) or when request processing times vary significantly. If one request takes 100ms and another takes 10 seconds, round robin will pile requests onto the slow server — least connections won't.

Pros	Cons
Adapts to variable request duration	Requires tracking active connection count
Prevents server overload dynamically	Connection count ≠ CPU load (a 1000 ms DB query ≠ a 1 ms cache hit)
Works well for long-lived connections	More complex than round robin

Least Response Time

An extension of least connections that also factors in server latency. The load balancer tracks average response time per server and prefers servers that are both fast and lightly loaded.

python

def score(server):
    # Lower score = better candidate; "+ 1" keeps idle servers ranked by latency
    return server.avg_response_ms * (server.active_connections + 1)
 
def pick_server(servers):
    return min(servers, key=score)

This algorithm requires the load balancer to actively measure response times, which adds overhead. Implementations commonly use an exponentially weighted moving average (EWMA) to smooth out spikes and avoid chasing transient outliers.

python

def update_ewma(server, new_sample_ms, alpha=0.2):
    # Alpha closer to 1 = more weight on recent samples
    server.avg_response_ms = (alpha * new_sample_ms) + ((1 - alpha) * server.avg_response_ms)
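Combining the score and the EWMA update into one runnable sketch, with a hypothetical Server dataclass standing in for the balancer's per-backend state. The scoring uses active_connections + 1 so that idle servers are still compared by latency rather than all scoring zero:

```python
from dataclasses import dataclass


@dataclass
class Server:
    host: str
    avg_response_ms: float
    active_connections: int = 0


def update_ewma(server, new_sample_ms, alpha=0.2):
    # Alpha closer to 1 = more weight on recent samples
    server.avg_response_ms = alpha * new_sample_ms + (1 - alpha) * server.avg_response_ms


def score(server):
    # Lower is better; "+ 1" so idle servers are still compared by latency
    return server.avg_response_ms * (server.active_connections + 1)


def pick_server(servers):
    return min(servers, key=score)


fast_busy = Server("10.0.0.1", avg_response_ms=20, active_connections=4)  # score 100
slow_idle = Server("10.0.0.2", avg_response_ms=80, active_connections=0)  # score 80
print(pick_server([fast_busy, slow_idle]).host)  # 10.0.0.2

update_ewma(slow_idle, new_sample_ms=200)        # 0.2 * 200 + 0.8 * 80 = 104
print(round(slow_idle.avg_response_ms, 2))       # 104.0
print(pick_server([fast_busy, slow_idle]).host)  # 10.0.0.1
```

A single slow 200 ms sample is enough to push the idle server's score (104) above the busy fast server's (20 × 5 = 100); a smaller alpha would damp that swing.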

When to use: Heterogeneous request types where some backends are inherently slower due to data locality or computation — e.g., database-heavy vs cache-served responses. Also useful when serving mixed API endpoints where some routes are computationally expensive.

Pros	Cons
Most intelligent standard algorithm	Higher overhead per request
Handles both load and latency differences	Response time can fluctuate — needs smoothing
Adapts to real-world server performance	Not available in Nginx OSS

IP Hash

Hash the client's IP address to determine which server handles their requests. The same client IP always maps to the same server (as long as the server pool size doesn't change).

python

import hashlib
 
def pick_server(client_ip: str, servers: list) -> str:
    # Use a stable hash — Python's built-in hash() is randomized per-process
    digest = int(hashlib.md5(client_ip.encode()).hexdigest(), 16)
    return servers[digest % len(servers)]

Important: Don't use Python's built-in hash() here — it's randomized per-process since Python 3.3 (PYTHONHASHSEED). Use hashlib.md5 or similar for a stable, deterministic hash.

Nginx config:

nginx

upstream backend {
    ip_hash;
    server 10.0.0.1;
    server 10.0.0.2;
    server 10.0.0.3;
}

When to use: When you need the same client to reach the same server for session state stored locally (not in a shared cache). IP hash is one mechanism for sticky sessions — see the Session Persistence section below for alternatives.

Limitation: Adding or removing a server reshuffles most of the mappings, causing a large fraction of clients to hit a different server. Consistent hashing solves this.
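A quick way to see the reshuffling is to hash a batch of synthetic client IPs against three servers, add a fourth, and count how many clients change servers. With hash % N, a client keeps its server only when its hash leaves the same remainder mod 3 and mod 4, which holds for about a quarter of hash values, so roughly 75% should move. The IP list and server names are made up for illustration:

```python
import hashlib


def pick_server(client_ip: str, servers: list) -> str:
    # Same stable hash-mod-N scheme as the IP hash example above
    digest = int(hashlib.md5(client_ip.encode()).hexdigest(), 16)
    return servers[digest % len(servers)]


def fraction_remapped(ips, old_pool, new_pool):
    moved = sum(pick_server(ip, old_pool) != pick_server(ip, new_pool) for ip in ips)
    return moved / len(ips)


# 10,000 synthetic client addresses in 10.0.0.0/16
ips = [f"10.0.{i // 256}.{i % 256}" for i in range(10_000)]
three = ["web1", "web2", "web3"]
four = three + ["web4"]
print(f"{fraction_remapped(ips, three, four):.0%} of clients moved")  # roughly 75%
```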

Pros	Cons
Simple deterministic routing	Adding/removing servers reshuffles most sessions
No cookie overhead	Breaks for mobile users (IP changes)
Works at Layer 4	Uneven if many users share an IP (NAT)

Consistent Hashing

A more sophisticated hashing scheme. Instead of hash(ip) % N, servers are placed on a virtual "hash ring." Each request hashes to a point on the ring and goes to the nearest server clockwise. When a server is added or removed, only the requests that mapped to that server's segment are affected — not the entire pool.

plaintext

Virtual ring (0 → 2^32):
  ServerA at position 100
  ServerB at position 300
  ServerC at position 700
 
Request hash = 200 → nearest clockwise server = ServerB
Request hash = 500 → nearest clockwise server = ServerC
Request hash = 900 → wraps around → ServerA
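A minimal ring sketch in Python, using bisect to find the first position clockwise of a key. It adds one refinement the diagram leaves out: each server is placed at many points ("virtual nodes", 100 here, an arbitrary choice) so the segments even out. The md5-based 32-bit position and the server names are assumptions for illustration:

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    # Stable position on a 0 .. 2^32 - 1 ring (md5 truncated; any stable hash works)
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % 2**32


class ConsistentHashRing:
    def __init__(self, servers, vnodes=100):
        self.vnodes = vnodes
        self.ring = []  # sorted list of (position, server)
        for server in servers:
            self.add(server)

    def add(self, server):
        # Place the server at `vnodes` pseudo-random points on the ring
        for i in range(self.vnodes):
            bisect.insort(self.ring, (ring_position(f"{server}#{i}"), server))

    def remove(self, server):
        self.ring = [entry for entry in self.ring if entry[1] != server]

    def pick_server(self, key: str) -> str:
        # First virtual node at or clockwise of the key's position...
        idx = bisect.bisect_left(self.ring, (ring_position(key), ""))
        # ...wrapping around to the start of the ring past the last node
        return self.ring[idx % len(self.ring)][1]


ring = ConsistentHashRing(["ServerA", "ServerB", "ServerC"])
keys = [f"user-{i}" for i in range(10_000)]
before = {k: ring.pick_server(k) for k in keys}
ring.add("ServerD")
moved = [k for k in keys if ring.pick_server(k) != before[k]]
print(f"{len(moved) / len(keys):.0%} of keys moved")  # roughly a quarter
```

Every key that moves lands on ServerD, and removing ServerD again restores the original mapping exactly; compare that with the roughly 75% churn of hash % N.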

For more detail: Consistent Hashing Explained.

When to use: Distributed caches, databases with sharding, or any scenario where minimizing cache misses during server additions/removals is critical.

Random

Pick a server at random. Simpler to implement than round robin and often performs comparably in practice because randomness distributes load evenly over time.

python

import random
 
def pick_server(servers):
    return random.choice(servers)

A variant called Power of Two Choices picks two servers at random and routes to the one with fewer connections. This gives most of the benefit of least connections with minimal overhead. Nginx exposes it as random two least_conn, and Envoy's least-request balancer uses it by default.

python

import random

def pick_server_power_of_two(servers):
    # Sample two distinct servers (needs at least two) and keep the less-loaded one
    a, b = random.sample(servers, 2)
    return a if a.active_connections <= b.active_connections else b

When to use: Stateless backends at high throughput where simplicity and low overhead matter more than precise load balancing.
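The classic way to see why two choices help is a balls-into-bins simulation: assign many requests to servers that never finish them, and compare the busiest server's count under pure random vs power of two choices. A seeded sketch (server count, request count, and seed are arbitrary):

```python
import random


def max_load(num_servers, num_requests, choices, seed=42):
    rng = random.Random(seed)
    load = [0] * num_servers
    for _ in range(num_requests):
        # Sample `choices` distinct servers and send the request to the least loaded
        candidates = rng.sample(range(num_servers), choices)
        target = min(candidates, key=lambda i: load[i])
        load[target] += 1
    return max(load)


n, m = 100, 10_000  # an average of 100 requests per server
print("random     :", max_load(n, m, choices=1))
print("two choices:", max_load(n, m, choices=2))
```

Pure random typically leaves the busiest server well above the average of 100, while two choices keeps it within a few requests of the average.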

Algorithm Comparison Table

Algorithm	Good for	Bad for	Overhead
Round Robin	Uniform stateless requests	Variable-duration requests	Minimal
Weighted Round Robin	Mixed-capacity servers	Dynamic load changes	Minimal
Least Connections	Long-lived connections	Very short requests	Low
Least Response Time	Mixed latency backends	—	Medium
IP Hash	Sticky sessions (simple)	Mobile users, pool changes	Low
Consistent Hashing	Caching, sharding	—	Medium
Random	Stateless high-throughput	—	Minimal
Power of Two	General purpose	—	Low

Layer 4 vs Layer 7 Load Balancing

Load balancers operate at different layers of the OSI model, and the layer determines what information they can see and use for routing decisions.

Layer 4 — Transport Layer

Layer 4 load balancers work at the TCP/UDP level. They see:

Source IP and port
Destination IP and port
Protocol (TCP vs UDP)

They cannot inspect the HTTP content of the request. Routing decisions are based purely on network-level information.

plaintext

Client → [L4 LB] → Backend server
         ↑
         Sees: IP:port only
         Does NOT see: HTTP headers, URL path, cookies

What L4 can do:

Route TCP connections to backend servers with very low CPU overhead (connection table lookups only)
Handle millions of connections per second
Work with any TCP-based protocol — HTTP, HTTPS, MySQL, Redis, custom binary protocols
Provide static IP addresses (useful when clients or partners allowlist your IPs in their firewall rules)
TLS passthrough (the backend server decrypts — the load balancer never sees plaintext)

AWS equivalent: AWS Network Load Balancer (NLB), which handles millions of requests per second at ultra-low latency and supports static Elastic IPs.

Use cases: High-throughput systems where you don't need HTTP-aware routing, non-HTTP protocols (database connections, SMTP, custom TCP), TLS passthrough scenarios, situations requiring a static IP.

Layer 7 — Application Layer

Layer 7 load balancers understand HTTP/HTTPS. They terminate the TLS connection, decrypt the traffic, inspect it, route it, and optionally re-encrypt to the backend. They can make routing decisions based on:

URL path (/api/* vs /static/*)
HTTP headers (Host:, Authorization:, User-Agent:, custom headers)
Cookies (for sticky sessions)
Request body content
Query parameters
HTTP method (GET vs POST)

plaintext

Layer 7 content-based routing example:
 
GET /api/users     → API server pool (4 servers)
GET /images/       → CDN or image server pool (2 servers)
GET /checkout      → High-memory checkout pool (more RAM for cart processing)
POST /upload       → Upload handler pool (fast disk I/O)
Host: admin.co    → Admin server pool (restricted access)

What L7 can do (that L4 cannot):

Route different URL paths to different backend pools
Route different hostnames to different services (virtual hosting)
Terminate SSL and inspect decrypted content
Rewrite URLs and HTTP headers before forwarding
Enable A/B testing by routing a percentage of traffic to a new version
Canary deployments — gradually shift traffic from old to new
Rate limiting by API key, user ID, or IP address
WAF (web application firewall) rules — block SQL injection, XSS, etc.
Compression, caching, and response modification at the edge

AWS equivalent: AWS Application Load Balancer (ALB) — supports path-based routing, host-based routing, weighted target groups for canary deployments, and integrates with AWS WAF.

Nginx L7 routing config:

nginx

upstream api_servers {
    least_conn;
    server 10.0.1.1;
    server 10.0.1.2;
}
 
upstream image_servers {
    server 10.0.2.1;
    server 10.0.2.2;
}
 
upstream checkout_servers {
    server 10.0.3.1;
    server 10.0.3.2;
}
 
server {
    listen 443 ssl;
    ssl_certificate /etc/nginx/ssl/cert.pem;
    ssl_certificate_key /etc/nginx/ssl/key.pem;
 
    # Route /api/* to API servers
    location /api/ {
        proxy_pass http://api_servers;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
 
    # Route /images/* to image servers
    location /images/ {
        proxy_pass http://image_servers;
        expires 7d;
        add_header Cache-Control "public";
    }
 
    # Route checkout to dedicated pool
    location /checkout {
        proxy_pass http://checkout_servers;
        proxy_read_timeout 60s;
    }
 
    # Default: route everything else to API servers
    location / {
        proxy_pass http://api_servers;
    }
}
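For plain prefix locations like these, nginx picks the location with the longest matching prefix (regex locations, which this config doesn't use, follow different rules). A small sketch of that selection, mirroring the pools above, can help when reasoning about which block a path lands in:

```python
# Prefix -> upstream, mirroring the location blocks in the config above
LOCATIONS = {
    "/api/": "api_servers",
    "/images/": "image_servers",
    "/checkout": "checkout_servers",
    "/": "api_servers",
}


def route(path: str) -> str:
    # Longest matching prefix wins; "/" matches everything as the fallback
    matches = [prefix for prefix in LOCATIONS if path.startswith(prefix)]
    return LOCATIONS[max(matches, key=len)]


for path in ["/api/users", "/images/logo.png", "/checkout/pay", "/checkout-faq", "/about"]:
    print(path, "->", route(path))  # /checkout-faq -> checkout_servers
```

The /checkout-faq case is the one to watch: location /checkout matches any path starting with that string, while location /checkout/ or an exact location = /checkout would be narrower.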

When to Use Which Layer

Requirement	Use L4	Use L7
Maximum throughput (millions RPS)	✓	—
Non-HTTP protocols (MySQL, Redis, SMTP)	✓	—
Static IP address required	✓	—
TLS passthrough (backend decrypts)	✓	—
Route by URL path	—	✓
Route by hostname (virtual hosting)	—	✓
Sticky sessions via cookies	—	✓
A/B testing / canary deployments	—	✓
WAF / rate limiting	—	✓
Header-based routing	—	✓
Inspect request body	—	✓

Many production systems use both: L4 at the edge for raw throughput and static IPs, L7 behind it for intelligent HTTP routing.

Health Checks and Failure Handling

A load balancer is only as good as its ability to detect and route around failures. Three mechanisms work together: health checks, circuit breaking, and connection draining.

Active Health Checks

The load balancer proactively sends requests to each backend server on a schedule, regardless of real traffic. If a server fails to respond (or returns an error) a configured number of times, it gets removed from the rotation.

plaintext

# HAProxy active health check config
backend app_servers
    balance leastconn
    option httpchk GET /health HTTP/1.1\r\nHost:\ example.com
    http-check expect status 200
    server web1 10.0.0.1:8080 check inter 5s fall 3 rise 2
    server web2 10.0.0.2:8080 check inter 5s fall 3 rise 2
    server web3 10.0.0.3:8080 check inter 5s fall 3 rise 2

This config: checks each server every 5 seconds, marks it down after 3 consecutive failures, and reinstates it after 2 consecutive successes. The inter, fall, and rise parameters let you tune the sensitivity.

Your /health endpoint should:

Return HTTP 200 for a healthy server
Check real readiness: database connection alive, cache accessible, critical dependencies reachable
Return 503 if the server is not ready to serve traffic (e.g., still warming up)
Respond quickly (under 1 second) to avoid timeout-induced false positives
Not require authentication — the health checker won't send credentials
Not cause side effects (not a write operation)

python

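# Example health check endpoint (FastAPI).
# db and redis stand in for your app's async database and Redis clients (assumed to exist).
import asyncio

from fastapi import FastAPI
from fastapi.responses import JSONResponse

app = FastAPI()


async def probe(check, timeout_s=0.5):
    # Bound each dependency check so a hung dependency can't stall the endpoint
    try:
        await asyncio.wait_for(check, timeout_s)
        return "ok"
    except Exception:
        return "down"


@app.get("/health")
async def health_check():
    # Run both checks concurrently so the endpoint answers in about timeout_s at worst
    db_status, cache_status = await asyncio.gather(
        probe(db.execute("SELECT 1")), probe(redis.ping())
    )
    checks = {"database": db_status, "cache": cache_status}
    all_ok = all(v == "ok" for v in checks.values())
    return JSONResponse(
        status_code=200 if all_ok else 503,
        content={"status": "healthy" if all_ok else "degraded", "checks": checks},
    )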

Passive Health Checks

Open-source Nginx supports only passive health checks: the load balancer monitors real traffic responses and marks servers as down when they repeatedly fail. No proactive polling happens.

nginx

upstream backend {
    server 10.0.0.1 max_fails=3 fail_timeout=30s;
    server 10.0.0.2 max_fails=3 fail_timeout=30s;
    server 10.0.0.3 max_fails=3 fail_timeout=30s;
}

After 3 failures within 30 seconds, the server is excluded for 30 seconds. Nginx then lets a real client request through as the probe (there is no separate synthetic check); if it succeeds, the server rejoins the pool.
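A simplified sketch of the max_fails / fail_timeout bookkeeping described above, for one server: failures are counted in a sliding window, and reaching the limit excludes the server for fail_timeout seconds. It illustrates the rule as stated here rather than reproducing nginx's internals, and the injectable clock is only there so it can be exercised without waiting:

```python
import time


class PassiveHealth:
    """Simplified max_fails / fail_timeout tracking for one backend server."""

    def __init__(self, max_fails=3, fail_timeout=30.0, clock=time.monotonic):
        self.max_fails = max_fails
        self.fail_timeout = fail_timeout
        self.clock = clock
        self.failures = []     # timestamps of recent failed requests
        self.down_until = 0.0  # server is skipped until this time

    def available(self) -> bool:
        return self.clock() >= self.down_until

    def record(self, success: bool) -> None:
        now = self.clock()
        if success:
            self.failures.clear()
            return
        # Forget failures that fell out of the window, then count this one
        self.failures = [t for t in self.failures if now - t < self.fail_timeout]
        self.failures.append(now)
        if len(self.failures) >= self.max_fails:
            self.down_until = now + self.fail_timeout
            self.failures.clear()


now = [0.0]  # fake clock for the demo
health = PassiveHealth(clock=lambda: now[0])
for _ in range(3):
    health.record(success=False)  # three failed proxied requests
print(health.available())         # False: excluded
now[0] = 30.0
print(health.available())         # True: eligible for real traffic again
```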

Active vs passive comparison:

Aspect	Active	Passive
Detects failures	Before real traffic hits	Only when real traffic fails
False positives	Possible (health check ≠ real workload)	Lower
Availability in Nginx OSS	No (Plus only)	Yes
Overhead	Low (periodic health requests)	None
Recommended	Yes, for production	Use as fallback

Circuit Breaking

Circuit breaking prevents cascading failures. When a backend starts failing, instead of waiting for each request to time out, the circuit breaker "opens" and immediately returns an error (or a cached fallback) without hitting the failing server.

plaintext

CLOSED → normal operation, requests pass through
   ↓ (failure threshold exceeded — e.g., 50% error rate in 10s window)
OPEN → immediately reject requests, return fallback (fast fail)
   ↓ (after timeout period — e.g., 30 seconds)
HALF-OPEN → allow one probe request through
   ↓ success              ↓ failure
CLOSED (recovered)        OPEN (back to fast fail)

This state machine protects the system from wasting resources on a backend that's clearly broken. It also gives the backend time to recover without being hammered by a flood of requests the moment it comes back up.

Circuit breaking is typically implemented at the service mesh or client library level (Hystrix for Java, resilience4j, Envoy proxy, Istio) rather than in the load balancer itself.
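A minimal single-threaded sketch of the state machine above. To keep it short it trips on a count of consecutive failures instead of an error rate over a time window, and it lets exactly the call that arrives after the timeout act as the HALF-OPEN probe; libraries such as resilience4j offer the windowed variant. The class and parameter names are illustrative:

```python
import time


class CircuitBreaker:
    """CLOSED -> OPEN -> HALF_OPEN -> CLOSED (or back to OPEN), single-threaded sketch."""

    def __init__(self, failure_threshold=5, reset_timeout_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures that trip it
        self.reset_timeout_s = reset_timeout_s      # how long to fast-fail before probing
        self.clock = clock
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn, fallback):
        if self.state == "OPEN":
            if self.clock() - self.opened_at < self.reset_timeout_s:
                return fallback()     # fast fail: the backend is not touched
            self.state = "HALF_OPEN"  # timeout elapsed: let this call probe
        try:
            result = fn()
        except Exception:
            self._record_failure()
            return fallback()
        self.state = "CLOSED"         # any success closes the circuit
        self.failures = 0
        return result

    def _record_failure(self):
        self.failures += 1
        if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
            self.state = "OPEN"
            self.opened_at = self.clock()
```

A failed probe in HALF_OPEN reopens the circuit immediately and restarts the timeout, which matches the "failure → OPEN" arrow in the diagram.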

Connection Draining

When you remove a server from the pool (for maintenance, deployment, or controlled shutdown), in-flight requests should complete before the server is disconnected. Abruptly cutting off a server mid-request causes errors for active users.

Connection draining (AWS calls it "deregistration delay") tells the load balancer: "stop sending new requests to this server, but let existing connections finish."

bash

# AWS CLI: set deregistration delay to 60 seconds on a target group
aws elbv2 modify-target-group-attributes \
  --target-group-arn arn:aws:elasticloadbalancing:... \
  --attributes Key=deregistration_delay.timeout_seconds,Value=60

bash

# HAProxy: manually drain a server via runtime API
# Connect to HAProxy stats socket and disable the server
echo "disable server app_servers/web1" | socat stdio /run/haproxy/admin.sock
# web1 stops getting new connections; existing ones complete
# Re-enable after maintenance:
echo "enable server app_servers/web1" | socat stdio /run/haproxy/admin.sock

AWS ALB default deregistration delay is 300 seconds — configure this to match your longest expected request duration. For APIs with fast responses, 30-60 seconds is usually sufficient.
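The same idea inside a balancer, as a Python sketch with hypothetical class and method names: draining a server removes it from the set eligible for new requests, then waits until its in-flight count reaches zero or a timeout expires. The timeout plays the role of the deregistration delay; anything still running when it expires would be cut off.

```python
import threading


class DrainingPool:
    """Least-connections pool that can drain a server before removing it."""

    def __init__(self, servers):
        self.active = {server: 0 for server in servers}  # in-flight requests
        self.draining = set()
        self.cond = threading.Condition()

    def acquire(self):
        # New requests only go to servers that are not draining
        with self.cond:
            eligible = [s for s in self.active if s not in self.draining]
            server = min(eligible, key=self.active.get)
            self.active[server] += 1
            return server

    def release(self, server):
        with self.cond:
            if server in self.active:  # may already be gone if its drain timed out
                self.active[server] -= 1
            self.cond.notify_all()

    def drain(self, server, timeout_s):
        """Stop new traffic to server, wait for in-flight work, then deregister it.

        Returns True if it drained cleanly, False if the timeout cut it short.
        """
        with self.cond:
            self.draining.add(server)
            drained = self.cond.wait_for(lambda: self.active[server] == 0, timeout_s)
            del self.active[server]
            self.draining.discard(server)
            return drained
```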

Session Persistence (Sticky Sessions)

What It Is

By default, a load balancer may route consecutive requests from the same user to different servers. This works fine if your app is stateless, but breaks if session data (shopping cart, login state, game state, WebSocket connection) is stored in a server's local memory.

Sticky sessions (also called session affinity or session persistence) ensure that a client's requests always go to the same backend server for the lifetime of a session.

Why You Need It (And When You Don't)

You need sticky sessions when:

Your app stores user session data in local memory or the local filesystem
You're using in-memory WebSocket state (chat rooms, live dashboards)
You have stateful gRPC streaming sessions
You're in a legacy system that cannot be made stateless quickly

You don't need sticky sessions when:

Session data lives in a shared store (Redis, Memcached, a database)
Your app is fully stateless (REST API with JWT tokens, no server-side session)

The correct long-term solution is stateless design. Store sessions in Redis so any server can serve any request. Sticky sessions are a workaround for stateful apps.

IP Hash vs Cookie-Based Stickiness

IP Hash:

Hashes the client's IP address to select a server
No cookie overhead
Breaks if the client's IP changes (mobile users switching from WiFi to LTE, NAT, VPNs, CDNs)
Breaks when servers are added/removed (most sessions remap to different servers)
Works at Layer 4 (no HTTP inspection needed)
Uneven distribution if many users share an IP (corporate NAT, ISP-level NAT)

Cookie-Based Stickiness:

The load balancer sets a cookie on the first response (e.g., SERVERID=web1)
Subsequent requests include the cookie, and the LB routes to web1
Survives IP changes — mobile users, proxies, VPNs all work correctly
Works even when you add servers (existing cookies still map to the right server)
Requires Layer 7 (the LB must read HTTP cookies)
Small overhead per request for cookie parsing
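The cookie flow above, reduced to a framework-free sketch. Cookies are modeled as plain dicts and the set of healthy servers is passed in; the SERVERID name mirrors the HAProxy example below, and random choice stands in for whatever algorithm picks a client's first server:

```python
import random

SERVERS = ["web1", "web2", "web3"]
COOKIE = "SERVERID"


def sticky_route(request_cookies: dict, healthy: set):
    """Return (server, cookies_to_set) for one request."""
    pinned = request_cookies.get(COOKIE)
    if pinned in healthy:
        # Returning client whose server is still up: honor the cookie
        return pinned, {}
    # New client, or its pinned server is gone: pick normally and (re)pin
    server = random.choice(sorted(healthy))
    return server, {COOKIE: server}


healthy = set(SERVERS)
server, set_cookie = sticky_route({}, healthy)   # first visit: cookie issued
again, _ = sticky_route(set_cookie, healthy)     # later visits: same server
print(server == again)                           # True
```

The fallback branch is where the "server failure recovery" trade-off listed below shows up: a user pinned to a dead server is re-pinned, and any session held only in that server's memory is lost.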

Nginx Plus cookie-based sticky sessions:

nginx

upstream backend {
    sticky cookie srv_id expires=1h domain=.example.com path=/ httponly;
    server 10.0.0.1;
    server 10.0.0.2;
    server 10.0.0.3;
}

HAProxy cookie-based sticky sessions:

plaintext

backend app_servers
    balance roundrobin
    cookie SERVERID insert indirect nocache
    server web1 10.0.0.1:8080 check cookie web1
    server web2 10.0.0.2:8080 check cookie web2
    server web3 10.0.0.3:8080 check cookie web3

AWS ALB sticky sessions:

plaintext

Target Group → Attributes → Stickiness: Enabled
Type: Load balancer generated cookie (AWSALB)
Duration: 1 day

Trade-offs of Sticky Sessions

Consideration	Impact
Uneven load distribution	Heavy users pinned to one server can overload it
Server failure recovery	Users on a failed server lose their session (unless session is replicated)
Scaling out	Newly added servers don't receive existing sticky traffic immediately
Zero-downtime deploys	Harder — must drain servers gracefully before replacing them
Session expiry	Stale cookies can route to wrong server after pool changes

Load Balancer Tools

Nginx

Nginx is a multi-purpose web server, reverse proxy, SSL terminator, and load balancer. OSS version is free; Nginx Plus adds active health checks, cookie-based sticky sessions, and a live dashboard.

Full nginx load balancer config:

nginx

upstream api_backend {
    least_conn;
    
    # Passive health check: mark down after 3 fails in 30s
    server 10.0.0.1:8080 max_fails=3 fail_timeout=30s;
    server 10.0.0.2:8080 max_fails=3 fail_timeout=30s;
    # Backup: only used when all others are down
    server 10.0.0.3:8080 backup;
    
    # Keep connections open to backends (avoid TCP handshake on each request)
    keepalive 32;
}
 
server {
    listen 80;
    server_name api.example.com;
    
    # Redirect HTTP → HTTPS
    return 301 https://$host$request_uri;
}
 
server {
    listen 443 ssl http2;
    server_name api.example.com;
    
    ssl_certificate /etc/nginx/ssl/fullchain.pem;
    ssl_certificate_key /etc/nginx/ssl/privkey.pem;
    ssl_protocols TLSv1.2 TLSv1.3;
    
    location / {
        proxy_pass http://api_backend;
        proxy_http_version 1.1;
        proxy_set_header Connection "";              # Enable keepalive to backend
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        
        proxy_connect_timeout 5s;
        proxy_read_timeout 30s;
        proxy_send_timeout 30s;
    }
}

HAProxy

HAProxy is a dedicated proxy and load balancer — not a web server. Offers more granular TCP/HTTP control, a detailed real-time stats dashboard, and generally higher raw performance at extreme connection counts.

HAProxy config with health checks, sticky sessions, and stats:

plaintext

global
    maxconn 50000
    log stdout format raw local0
 
defaults
    mode http
    timeout connect 5s
    timeout client 30s
    timeout server 30s
    log global
    option httplog
 
frontend http_front
    bind *:80
    bind *:443 ssl crt /etc/haproxy/certs/example.pem
    redirect scheme https if !{ ssl_fc }
    default_backend app_servers
 
backend app_servers
    balance leastconn
    option httpchk GET /health
    http-check expect status 200
    cookie SERVERID insert indirect nocache
    server web1 10.0.0.1:8080 check inter 5s fall 3 rise 2 cookie web1
    server web2 10.0.0.2:8080 check inter 5s fall 3 rise 2 cookie web2
    server web3 10.0.0.3:8080 check inter 5s fall 3 rise 2 cookie web3
 
# Stats dashboard at /haproxy_stats
listen stats
    bind *:8404
    stats enable
    stats uri /haproxy_stats
    stats refresh 10s
    stats auth admin:secretpassword

Tool Comparison

Tool	Type	Layer	Health Checks	Sticky Sessions	Best For
Nginx (OSS)	Software	L4 + L7	Passive only	IP hash, or hash on a cookie value	Web apps, API gateways, reverse proxy
Nginx Plus	Commercial	L4 + L7	Active + passive	Cookie-based	Enterprise Nginx features
HAProxy	Software	L4 + L7	Active + passive	Cookie + source IP	High-performance TCP/HTTP, fine-grained control
AWS ALB	Managed	L7 only	Active	Cookie-based	AWS workloads, WAF, path routing
AWS NLB	Managed	L4 only	Active	Source IP	Millions RPS, static IP, non-HTTP protocols
Cloudflare LB	Managed	L7	Active	Cookie-based	Global GeoDNS, DDoS protection, edge routing
Traefik	Software	L4 + L7	Active	Cookie-based	Kubernetes, Docker, automatic service discovery
Envoy	Software	L4 + L7	Active	Cookie + header	Service mesh (Istio), gRPC, advanced observability

Nginx vs HAProxy in depth: Both are excellent open-source options. Nginx has a gentler learning curve and doubles as a web server and reverse proxy — ideal if you're already using it to serve static files. HAProxy is a dedicated proxy with more fine-grained connection control, a built-in real-time stats dashboard, and often better raw throughput at very high connection counts (100k+ concurrent connections). For pure load balancing at scale, HAProxy is generally preferred; for all-in-one web infrastructure, Nginx is the more common choice.

AWS ALB vs NLB: Use ALB when you need HTTP features (path routing, WAF, authentication offload via Cognito). Use NLB when you need maximum throughput, ultra-low latency, a static Elastic IP address, or are load balancing non-HTTP TCP protocols like database connections.

Global Load Balancing and GeoDNS

Single-region load balancing distributes traffic across servers in one datacenter. Global load balancing distributes traffic across datacenters in multiple geographic regions — reducing latency for international users and providing multi-region redundancy.

GeoDNS

GeoDNS resolves a domain name to different IP addresses based on the geographic location of the DNS requester. A user in Europe gets the IP of your EU datacenter; a user in Japan gets the IP of your AP-Northeast datacenter.

plaintext

User in Germany → DNS resolves example.com → 34.90.x.x (EU datacenter)
User in US      → DNS resolves example.com → 54.80.x.x (US-East datacenter)
User in Japan   → DNS resolves example.com → 13.115.x.x (AP-Northeast datacenter)

Tools: AWS Route 53 (latency-based routing, geolocation routing), Cloudflare Load Balancer, NS1, Google Cloud DNS.

Latency-Based Routing

Instead of routing by geographic region (which can be inaccurate — a user's DNS resolver may be in a different country than the user), latency-based routing measures actual round-trip time to each datacenter and routes to the fastest one.

AWS Route 53 latency-based routing sends requests to the AWS region that provides the lowest latency for the end user, continuously updated based on empirical measurements across AWS's global infrastructure.

Failover Routing

Global load balancers can route all traffic to a primary region and automatically fail over to a secondary region if the primary becomes unreachable.

plaintext

Primary:   us-east-1 (active)
Secondary: eu-west-1 (standby)
 
Health check fails on us-east-1
→ Route 53 marks the primary record as unhealthy
→ DNS TTL expires (30-60 seconds)
→ Traffic shifts to eu-west-1

DNS TTL consideration: GeoDNS failover is bounded by DNS TTL. A 300-second TTL means clients might continue hitting the failed region for up to 5 minutes after the failure. For faster failover, use an anycast network (Cloudflare) or configure a very short TTL (30-60s) on your health-checked records.
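A rough upper estimate of failover time adds how long the health checker needs to declare the primary unhealthy to how long cached DNS answers may live. The 30-second check interval and failure threshold of 3 below are assumed values (Route 53 lets you configure both), and resolvers that ignore TTLs can stretch the result further:

```python
def worst_case_failover_s(check_interval_s, failure_threshold, dns_ttl_s):
    # Time to accumulate enough consecutive failed checks...
    detection = check_interval_s * failure_threshold
    # ...plus the longest a resolver may keep serving the old answer
    return detection + dns_ttl_s


print(worst_case_failover_s(30, 3, 300))  # 390 seconds with a 5-minute TTL
print(worst_case_failover_s(30, 3, 60))   # 150
print(worst_case_failover_s(10, 3, 30))   # 60
```

Shortening the TTL helps, but detection time is a floor that only faster or less tolerant health checks reduce.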

Anycast

Anycast assigns the same IP address to servers in multiple locations. BGP routing directs each client to the nearest location automatically — no DNS-level magic needed, no TTL delays. Cloudflare uses anycast across 300+ Points of Presence to route users to the nearest edge node with zero DNS-level failover latency.

When You Need Global Load Balancing

Your users span multiple continents and latency is a core product concern
You need active-active multi-region redundancy (not just failover standby)
Regulatory requirements mandate data residency in specific geographic regions
One region going down should not take your entire service offline

When Load Balancing Isn't Enough

Load balancers distribute traffic but they're not magic. Watch out for these common traps:

Database bottleneck: A load balancer helps your app servers, but they all hit the same database. If the database is the bottleneck, adding more app servers behind a load balancer doesn't help. You need database scaling strategies — read replicas, sharding, or caching.

Stateful sessions: If your app stores session state in local memory, load balancing becomes painful. Move session state to Redis so any server can handle any request — then you can freely add/remove servers without disrupting users.

Hot spots: If certain users or URL paths generate much heavier traffic than others, uniform load balancing won't help. You need smarter routing (route heavy paths to dedicated pools), caching (cache expensive responses at the LB or CDN layer), or rate limiting (prevent single clients from consuming disproportionate capacity).

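Thundering herd after failover: When a failed server comes back online, it may receive a flood of traffic as it rejoins the pool and fail again. Least connections is especially prone to this, because the returning server's zero active connections make it the preferred target for every new request. Gradual ramp-up (weighted round robin starting at a low weight) or a slow-start setting (such as HAProxy's per-server slowstart 60s parameter) prevents this.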
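Slow start can be pictured as an effective weight that grows over a window after the server comes back. A sketch assuming a linear ramp with a floor of 1, so the server receives a trickle of traffic from the start; HAProxy's exact curve and rounding may differ, and the function name is illustrative:

```python
def ramped_weight(full_weight, seconds_since_up, slowstart_s=60.0):
    """Effective weight during a linear slow-start window (floor of 1)."""
    if seconds_since_up >= slowstart_s:
        return full_weight
    fraction = max(seconds_since_up, 0.0) / slowstart_s
    return max(1, round(full_weight * fraction))


for t in (0, 15, 30, 45, 60):
    print(t, ramped_weight(100, t))  # weights 1, 25, 50, 75, 100
```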

FAQ

What is load balancing and why is it important?

Load balancing is the process of distributing incoming network requests across multiple servers. It's important because no single server can handle unlimited traffic — as demand grows, you need multiple machines working in parallel. A load balancer makes this pool of servers look like a single address to clients, hides individual server failures, and ensures no one server is overwhelmed while others sit idle. It's a foundational building block for horizontal scaling, high availability, and zero-downtime deployments.

What load balancing algorithm should I use?

Start with round robin for stateless APIs where all servers are similar. Use least connections when requests vary significantly in duration — WebSockets, file uploads, database-heavy queries. Use weighted round robin when servers have different capacities or during gradual traffic shifts in deployments. Use IP hash or cookie-based stickiness only when you have a legacy stateful application — the correct long-term solution is making your app stateless with Redis-backed sessions.

What is the difference between Layer 4 and Layer 7 load balancing?

Layer 4 (L4) load balancers operate at the TCP/IP level — they see IP addresses and ports, but not HTTP content. They're extremely fast and work with any TCP-based protocol. Layer 7 (L7) load balancers understand HTTP — they can inspect URLs, headers, cookies, and route traffic based on that content (e.g., /api/* to API servers, /images/* to image servers). Use L4 for raw throughput, non-HTTP protocols, or static IP requirements. Use L7 when you need content-aware routing, sticky sessions via cookies, A/B testing, or WAF rules.
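
The difference shows up clearly in code: routing on the URL path requires the parsed HTTP request, which only an L7 load balancer has. Below is a minimal sketch of prefix-based routing with round robin inside each pool. PathRouter and the backend names are hypothetical:

```python
from itertools import cycle
from typing import Dict, Iterator, List


class PathRouter:
    """Longest-prefix routing to backend pools, round robin within each pool."""

    def __init__(self, routes: Dict[str, List[str]], default_pool: List[str]) -> None:
        # Check longer prefixes first so "/api/v2/" wins over "/api/".
        self.prefixes = sorted(routes, key=len, reverse=True)
        self.pools: Dict[str, Iterator[str]] = {p: cycle(routes[p]) for p in routes}
        self.default: Iterator[str] = cycle(default_pool)

    def route(self, path: str) -> str:
        for prefix in self.prefixes:
            if path.startswith(prefix):
                return next(self.pools[prefix])
        return next(self.default)


router = PathRouter(
    routes={"/api/": ["api-1:8080", "api-2:8080"], "/images/": ["img-1:8080"]},
    default_pool=["web-1:8080"],
)
print(router.route("/api/users"))     # api-1:8080
print(router.route("/api/orders"))    # api-2:8080
print(router.route("/images/a.png"))  # img-1:8080
print(router.route("/about"))         # web-1:8080
```

Real proxies each have their own matching rules (nginx location blocks, HAProxy ACLs), so check their documentation rather than assuming this longest-prefix order.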

What are sticky sessions and when should I use them?

Sticky sessions (session affinity) ensure a user's requests always go to the same server — typically because session state is stored in that server's local memory. Use sticky sessions as a short-term fix for legacy stateful applications. The long-term solution is to move session data to a shared store (Redis or Memcached) so your app becomes stateless and any server can handle any request — which makes horizontal scaling, deployments, and failover dramatically simpler.

What is the difference between nginx and HAProxy for load balancing?

Both are excellent open-source options. Nginx is a multi-purpose web server, reverse proxy, and load balancer. It's a good choice if you already use it to serve static files or as an API gateway, and its configuration style is familiar to many teams. HAProxy is a dedicated proxy focused entirely on load balancing and connection management. It offers more granular TCP control, a built-in real-time stats dashboard, active health checks in the free edition (open-source nginx supports only passive checks; active checks require the commercial NGINX Plus), and often better raw throughput at extreme connection counts (100k+ concurrent). For pure load balancing at scale, HAProxy is generally preferred; for all-in-one web infrastructure, Nginx is the more common choice.

How do load balancers handle server failures?

Through health checks and automatic pool management. Active health checks proactively poll a /health endpoint on each server every few seconds — if a server fails to respond or returns an error a configured number of times (e.g., 3 consecutive failures), it's removed from the rotation automatically. Passive health checks detect failures from real traffic instead of polling. Once a server recovers and passes health checks again (e.g., 2 consecutive successes), it's automatically reinstated. Connection draining ensures in-flight requests complete before a server is removed, preventing errors for active users.
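
The remove-and-reinstate logic is a small state machine. This sketch uses the example thresholds from the answer above (3 consecutive failures to remove, 2 consecutive successes to reinstate), which also happen to be HAProxy's defaults for its fall and rise server options. The BackendHealth class is hypothetical and leaves out timing, check types, and connection draining:

```python
from dataclasses import dataclass


@dataclass
class BackendHealth:
    """Tracks consecutive check results using rise/fall thresholds."""

    fall: int = 3     # consecutive failures before removal from rotation
    rise: int = 2     # consecutive successes before reinstatement
    healthy: bool = True
    _streak: int = 0  # length of the current run of results contradicting `healthy`

    def record(self, check_passed: bool) -> bool:
        """Feed one health-check result; return whether the server is in rotation."""
        if check_passed == self.healthy:
            # The result agrees with the current state, so any opposing streak resets.
            self._streak = 0
            return self.healthy
        self._streak += 1
        threshold = self.fall if self.healthy else self.rise
        if self._streak >= threshold:
            self.healthy = not self.healthy
            self._streak = 0
        return self.healthy


server = BackendHealth()
results = [False, False, True, False, False, False, True, True]
print([server.record(r) for r in results])
# [True, True, True, True, True, False, False, True]
```

The single success in the middle resets the failure count, so two isolated blips never take the server out; only the third consecutive failure does.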

What is the difference between a load balancer and a reverse proxy?

A reverse proxy sits in front of your servers and forwards client requests to them, returning the server's response to the client — the client never connects directly to the backend. Every load balancer is a reverse proxy. But not every reverse proxy is a load balancer — a reverse proxy might forward to just one backend server (for SSL termination, caching, request filtering, or header injection). Load balancing is a specific capability of a reverse proxy where the backend is a pool of servers and the proxy decides which one to use for each request.

Key Takeaways

Load balancers split traffic across servers, hide individual server failures, and make horizontal scaling practical
Round robin is simple; least connections handles variable request times better; consistent hashing minimizes reshuffling when your server pool changes
Layer 4 load balancing is faster and works with any TCP protocol; Layer 7 enables content-aware routing, sticky sessions, WAF rules, and A/B testing
Always implement health checks: active checks (HAProxy, AWS ALB) are preferable to passive ones (Nginx OSS) because they can detect a failed server before real user traffic reaches it
Sticky sessions are a workaround for stateful apps — move sessions to Redis for true stateless design
Global load balancing with GeoDNS or anycast reduces latency for international users and enables multi-region redundancy
Connection draining and circuit breaking protect users during server failures and deployments

Load balancing is usually one of the first infrastructure additions you make when scaling beyond a single server. Get the algorithm right for your workload, implement solid health checks, and make your app stateless — then horizontal scaling becomes straightforward.

Related reading: Vertical vs Horizontal Scaling · Consistent Hashing

