system design

Consistent Hashing Explained: Distributed Systems Guide

Learn how consistent hashing distributes data across servers with minimal reshuffling. Used by Cassandra, DynamoDB, and Redis Cluster.

By Akash Sharma·5 min read
#system design
#distributed systems
#hashing
#scalability
#algorithms
#caching

Imagine you have 4 servers and 1 million users. You need to decide which server handles which user. Simple enough — put user 1 on server 1, user 2 on server 2, and so on.

Now you add a 5th server. Suddenly, almost every user needs to move to a different server. That's a disaster if you're running a cache — you just lost most of your cached data.

Consistent hashing solves this. When you add or remove a server, only a small fraction of data moves. Everything else stays put.

Why Normal Hashing Breaks When Servers Change

The simplest approach is modulo hashing: server = hash(user_id) % number_of_servers.

With 4 servers:

  • User 100 → hash(100) % 4 = 0 → Server 0
  • User 101 → hash(101) % 4 = 1 → Server 1

Add a 5th server and the formula changes to % 5. Almost every user gets a different server number. If those servers were caches, you just lost ~80% of your cached data in one deployment.

That's the core problem: normal hashing is fragile when the number of servers changes.

How Consistent Hashing Works

Consistent hashing uses a circle (called a hash ring) instead of a simple modulo.

Step 1: Place servers on the ring

Imagine a clock face numbered 0 to 360. Each server gets placed at a position based on its hash. For example:

  • Server A → position 50
  • Server B → position 150
  • Server C → position 250

Step 2: Place data on the ring

When you need to store a piece of data, hash it to get a position on the ring. Then find the next server clockwise from that position. That server owns the data.

Example: User 100 hashes to position 80. The next server clockwise from 80 is Server B at 150. So Server B handles User 100.

Step 3: Adding a server

Add Server D at position 100. Now only the data between position 50 and 100 needs to move from Server B to Server D. Everything else stays.

Instead of reshuffling 80% of data, you move roughly 1/N of data (where N is the number of servers). With 10 servers, adding one moves about 10% of data.

python
import hashlib
import bisect
 
class ConsistentHashRing:
    def __init__(self, nodes=None):
        self.ring = {}
        self.sorted_keys = []
        if nodes:
            for node in nodes:
                self.add_node(node)
 
    def add_node(self, node):
        # Place node on the ring using its hash
        key = self._hash(node)
        self.ring[key] = node
        bisect.insort(self.sorted_keys, key)
 
    def get_node(self, data_key):
        if not self.ring:
            return None
        hash_val = self._hash(data_key)
        # Find the next server clockwise
        idx = bisect.bisect(self.sorted_keys, hash_val)
        if idx == len(self.sorted_keys):
            idx = 0  # wrap around the ring
        return self.ring[self.sorted_keys[idx]]
 
    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

The Hotspot Problem (and Virtual Nodes)

The basic ring has a flaw: servers don't end up evenly spaced. One server might handle 40% of data while another handles 5%.

The fix is virtual nodes. Each physical server gets multiple positions on the ring (say, 100–200 positions each). This spreads the load more evenly, because random positions average out across many replicas.

plaintext
Physical Server A → Virtual nodes at positions: 50, 120, 310, 450, ...
Physical Server B → Virtual nodes at positions: 80, 200, 380, 520, ...

With virtual nodes:

  • Load is naturally balanced across servers
  • When a server goes down, its load spreads across all remaining servers (not just the next one)
  • Servers with more capacity can have more virtual nodes

Where Consistent Hashing Is Used in Practice

Cassandra partitions rows across nodes using consistent hashing. Each row key maps to a token on the ring, and each node owns a range of tokens.

DynamoDB uses a similar partitioning scheme under the hood, invisibly managing rebalancing as you add read/write capacity.

Redis Cluster splits data into 16,384 hash slots and distributes them across nodes using consistent hashing.

Memcached client libraries use consistent hashing to decide which cache server holds a given key.

CDNs like Akamai use it to route requests to the right edge server without a central lookup.

When to Use Consistent Hashing

Use it when:

  • You have horizontally scaled infrastructure that changes frequently
  • Cache hit rates matter (you can't afford to invalidate everything on rescale)
  • You're building a distributed cache, database, or load balancer

Don't bother when:

  • You have a fixed, small number of servers that rarely change
  • Data locality doesn't matter (simple round-robin works fine)
  • You're using a managed service that handles partitioning for you

Key Takeaways

  • Normal hashing breaks when server count changes — consistent hashing doesn't
  • The hash ring means adding/removing a server only moves ~1/N of data
  • Virtual nodes solve the uneven load distribution problem
  • Cassandra, DynamoDB, Redis Cluster all rely on this technique
  • The Python implementation above captures the core idea in under 30 lines

If you're designing a distributed cache, understanding consistent hashing will save you from nasty cache stampede issues during deployments.

Related reading: Load Balancing Strategies · Vertical vs Horizontal Scaling

Enjoyed this article?

Get weekly insights on backend architecture, system design, and Go programming.