Consistent Hashing Explained: Distributed Systems Guide
Learn how consistent hashing distributes data across servers with minimal reshuffling. Used by Cassandra, DynamoDB, and Redis Cluster.
Imagine you have 4 servers and 1 million users. You need to decide which server handles which user. Simple enough — put user 1 on server 1, user 2 on server 2, and so on.
Now you add a 5th server. Suddenly, almost every user needs to move to a different server. That's a disaster if you're running a cache — you just lost most of your cached data.
Consistent hashing solves this. When you add or remove a server, only a small fraction of data moves. Everything else stays put.
Why Normal Hashing Breaks When Servers Change
The simplest approach is modulo hashing: server = hash(user_id) % number_of_servers.
With 4 servers:
- User 100 →
hash(100) % 4 = 0→ Server 0 - User 101 →
hash(101) % 4 = 1→ Server 1
Add a 5th server and the formula changes to % 5. Almost every user gets a different server number. If those servers were caches, you just lost ~80% of your cached data in one deployment.
That's the core problem: normal hashing is fragile when the number of servers changes.
How Consistent Hashing Works
Consistent hashing uses a circle (called a hash ring) instead of a simple modulo.
Step 1: Place servers on the ring
Imagine a clock face numbered 0 to 360. Each server gets placed at a position based on its hash. For example:
- Server A → position 50
- Server B → position 150
- Server C → position 250
Step 2: Place data on the ring
When you need to store a piece of data, hash it to get a position on the ring. Then find the next server clockwise from that position. That server owns the data.
Example: User 100 hashes to position 80. The next server clockwise from 80 is Server B at 150. So Server B handles User 100.
Step 3: Adding a server
Add Server D at position 100. Now only the data between position 50 and 100 needs to move from Server B to Server D. Everything else stays.
Instead of reshuffling 80% of data, you move roughly 1/N of data (where N is the number of servers). With 10 servers, adding one moves about 10% of data.
import hashlib
import bisect
class ConsistentHashRing:
def __init__(self, nodes=None):
self.ring = {}
self.sorted_keys = []
if nodes:
for node in nodes:
self.add_node(node)
def add_node(self, node):
# Place node on the ring using its hash
key = self._hash(node)
self.ring[key] = node
bisect.insort(self.sorted_keys, key)
def get_node(self, data_key):
if not self.ring:
return None
hash_val = self._hash(data_key)
# Find the next server clockwise
idx = bisect.bisect(self.sorted_keys, hash_val)
if idx == len(self.sorted_keys):
idx = 0 # wrap around the ring
return self.ring[self.sorted_keys[idx]]
def _hash(self, key):
return int(hashlib.md5(key.encode()).hexdigest(), 16)The Hotspot Problem (and Virtual Nodes)
The basic ring has a flaw: servers don't end up evenly spaced. One server might handle 40% of data while another handles 5%.
The fix is virtual nodes. Each physical server gets multiple positions on the ring (say, 100–200 positions each). This spreads the load more evenly, because random positions average out across many replicas.
Physical Server A → Virtual nodes at positions: 50, 120, 310, 450, ...
Physical Server B → Virtual nodes at positions: 80, 200, 380, 520, ...With virtual nodes:
- Load is naturally balanced across servers
- When a server goes down, its load spreads across all remaining servers (not just the next one)
- Servers with more capacity can have more virtual nodes
Where Consistent Hashing Is Used in Practice
Cassandra partitions rows across nodes using consistent hashing. Each row key maps to a token on the ring, and each node owns a range of tokens.
DynamoDB uses a similar partitioning scheme under the hood, invisibly managing rebalancing as you add read/write capacity.
Redis Cluster splits data into 16,384 hash slots and distributes them across nodes using consistent hashing.
Memcached client libraries use consistent hashing to decide which cache server holds a given key.
CDNs like Akamai use it to route requests to the right edge server without a central lookup.
When to Use Consistent Hashing
Use it when:
- You have horizontally scaled infrastructure that changes frequently
- Cache hit rates matter (you can't afford to invalidate everything on rescale)
- You're building a distributed cache, database, or load balancer
Don't bother when:
- You have a fixed, small number of servers that rarely change
- Data locality doesn't matter (simple round-robin works fine)
- You're using a managed service that handles partitioning for you
Key Takeaways
- Normal hashing breaks when server count changes — consistent hashing doesn't
- The hash ring means adding/removing a server only moves ~1/N of data
- Virtual nodes solve the uneven load distribution problem
- Cassandra, DynamoDB, Redis Cluster all rely on this technique
- The Python implementation above captures the core idea in under 30 lines
If you're designing a distributed cache, understanding consistent hashing will save you from nasty cache stampede issues during deployments.
Related reading: Load Balancing Strategies · Vertical vs Horizontal Scaling
Enjoyed this article?
Get weekly insights on backend architecture, system design, and Go programming.
Related Posts
Continue reading with these related posts
Database Sharding Explained: Scale to Millions of Users
Learn how database sharding works, when to use it, and common strategies. Covers horizontal partitioning, shard keys, and challenges with real examples.
Vertical vs Horizontal Scaling: When to Use Each
Learn the difference between vertical and horizontal scaling. Understand trade-offs, real costs, and when each strategy makes sense for your system.
Idempotency in APIs: Preventing Duplicate Operations
Learn what idempotency means in API design and why it matters for payments, retries, and distributed systems. With practical implementation patterns.