
Database Sharding Explained: Scale to Millions of Users

Learn how database sharding works, when to use it, and common strategies. Covers horizontal partitioning, shard keys, and challenges with real examples.

By Akash Sharma·5 min read
#database
#sharding
#scalability
#system design
#distributed systems
#backend
#postgresql

Your user table has 50 million rows. Queries are slow. You've added indexes, upgraded to a bigger server, added read replicas. But writes are still the bottleneck.

This is the problem sharding solves. Instead of one big database, you split the data across multiple databases — each one handles a subset of the data.

What Is Sharding?

Sharding is horizontal partitioning — splitting a table's rows across multiple database instances (called shards).

Each shard has the same schema but different data. Together, they hold the complete dataset.

plaintext
Without sharding:
Database → Users table (50M rows)
 
With sharding:
Shard 1 → Users 1–12.5M
Shard 2 → Users 12.5M–25M
Shard 3 → Users 25M–37.5M
Shard 4 → Users 37.5M–50M

Each shard can be on a different server. Each server handles 25% of the write load.
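
To make this concrete, here is a minimal sketch of what sharding looks like from the application's side: each shard is just a separate database connection, and a small router decides which one a query goes to. The hostnames and the psycopg2 driver here are illustrative assumptions, not requirements.

python
import psycopg2  # assumed driver; any database client works the same way

# One connection string per shard. Same schema on every server, different rows.
SHARD_DSNS = {
    "db_shard_0": "host=shard0.db.internal dbname=app",
    "db_shard_1": "host=shard1.db.internal dbname=app",
    "db_shard_2": "host=shard2.db.internal dbname=app",
    "db_shard_3": "host=shard3.db.internal dbname=app",
}

def connect_to_shard(shard_name: str):
    # The routing logic (deciding which shard a user lives on) is covered in the next section
    return psycopg2.connect(SHARD_DSNS[shard_name])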

Sharding Strategies

Range-Based Sharding

Split data based on ranges of the shard key. User IDs 1–1M go to Shard 1, 1M–2M to Shard 2, etc.

python
def get_shard(user_id: int, num_shards: int = 4) -> str:
    # Each shard owns a contiguous 1M-ID range: 0–999,999 → shard 0, 1M–1,999,999 → shard 1, ...
    shard_size = 1_000_000
    shard_num = user_id // shard_size
    # Clamp so IDs beyond the last range stay on the final shard instead of wrapping around
    return f"db_shard_{min(shard_num, num_shards - 1)}"

Advantage: Simple. Range queries (find all users created between date X and Y) are efficient — they usually hit one or two shards. Problem: Hot spots. If most active users have high IDs (recently created), Shard 4 handles all the load while Shard 1 sits idle.

Hash-Based Sharding

Hash the shard key and use modulo to pick a shard.

python
import hashlib
 
def get_shard(user_id: int, num_shards: int = 4) -> str:
    # MD5 is used only as a stable, well-distributed hash (Python's built-in hash() is salted per process)
    hash_value = int(hashlib.md5(str(user_id).encode()).hexdigest(), 16)
    shard_num = hash_value % num_shards
    return f"db_shard_{shard_num}"

Advantage: Even distribution. No hot spots. Problem: Range queries become expensive — you have to check all shards. Resharding (adding more shards) is painful — you have to move data around.
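
To see why modulo resharding hurts, here is a quick, illustrative check using the same hash-then-modulo scheme as above: going from 4 to 5 shards reassigns the large majority of keys.

python
import hashlib

def shard_of(user_id: int, num_shards: int) -> int:
    # Same hash-then-modulo scheme as the example above
    return int(hashlib.md5(str(user_id).encode()).hexdigest(), 16) % num_shards

# Count how many of 100k users land on a different shard after adding a fifth one
moved = sum(shard_of(uid, 4) != shard_of(uid, 5) for uid in range(100_000))
print(f"{moved / 100_000:.0%} of keys change shards")  # roughly 80%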

Solution to resharding: use consistent hashing instead of simple modulo, so adding a shard only remaps a small fraction of keys rather than most of them. Cassandra and DynamoDB take this approach.
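
Here is a minimal consistent-hash ring sketch, for intuition only; real systems add many more virtual nodes per shard plus replication:

python
import bisect
import hashlib

class HashRing:
    def __init__(self, shards, vnodes=100):
        # Place several "virtual node" points per shard on a 2^128 ring
        self.ring = sorted(
            (self._hash(f"{shard}#{i}"), shard)
            for shard in shards
            for i in range(vnodes)
        )
        self.points = [h for h, _ in self.ring]

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def get_shard(self, user_id: int) -> str:
        # First ring point clockwise from the key's hash, wrapping at the end
        idx = bisect.bisect(self.points, self._hash(str(user_id))) % len(self.points)
        return self.ring[idx][1]

ring = HashRing(["db_shard_0", "db_shard_1", "db_shard_2", "db_shard_3"])
print(ring.get_shard(42))
# Adding a fifth shard now remaps only about 1/5 of the keys instead of ~80%.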

Directory-Based Sharding

A lookup service (routing table) maps each key to its shard.

python
import redis  # assumes a running Redis instance that holds the routing table

r = redis.Redis()

# Routing entries, e.g. stored as plain Redis keys:
#   routing:user:1 -> shard_1
#   routing:user:2 -> shard_3
#   ...

def get_shard(user_id: int) -> str:
    shard = r.get(f"routing:user:{user_id}")  # one extra network hop per lookup
    return shard.decode() if shard else "shard_default"

Advantage: Most flexible. You can move data between shards easily by updating the routing table. Problem: The routing service becomes a bottleneck and a single point of failure. Extra network hop per query.

Choosing a Shard Key

The shard key determines how data is distributed. A bad shard key creates hot spots.

Good shard keys:

  • High cardinality (many unique values)
  • Evenly distributed access patterns
  • Rarely changes (changing a shard key means moving data)

Common choices:

  • user_id — if most queries are per-user
  • tenant_id — for multi-tenant SaaS apps
  • created_at (with hash) — for time-series data
  • Geographic region — if users are clustered geographically

Avoid:

  • Boolean fields — only 2 shards max
  • Low-cardinality fields — uneven distribution
  • Fields that change frequently
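
One cheap way to sanity-check a candidate shard key is to hash a sample of real values and look at how evenly they spread. This is just an illustrative check, reusing the hash scheme from earlier:

python
import hashlib
from collections import Counter

def shard_distribution(key_values, num_shards: int = 4) -> dict:
    # Hash each sampled key value and count how many land on each shard
    counts = Counter(
        int(hashlib.md5(str(v).encode()).hexdigest(), 16) % num_shards
        for v in key_values
    )
    return {f"shard_{i}": counts.get(i, 0) for i in range(num_shards)}

print(shard_distribution(range(100_000)))           # high cardinality: roughly even
print(shard_distribution([True, False] * 50_000))   # boolean key: at most 2 shards ever used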

The Real Challenges of Sharding

Sharding solves write scale but introduces complexity.

Cross-shard queries: If you need data that spans multiple shards (e.g., "find all users who purchased product X"), you have to query all shards and merge results in your application. This is expensive.
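
A scatter-gather sketch of what that looks like in application code. The run_query helper and the table names are hypothetical stand-ins for whatever data-access layer you use, and the example assumes users and purchases are co-sharded by user_id so the join stays shard-local:

python
from concurrent.futures import ThreadPoolExecutor

SHARDS = ["db_shard_0", "db_shard_1", "db_shard_2", "db_shard_3"]

def users_who_bought(product_id: int) -> list:
    sql = "SELECT user_id, email FROM purchases JOIN users USING (user_id) WHERE product_id = %s"
    # Fan the same query out to every shard in parallel...
    # (run_query(shard, sql, params) is a hypothetical helper returning a list of rows)
    with ThreadPoolExecutor(max_workers=len(SHARDS)) as pool:
        per_shard = pool.map(lambda shard: run_query(shard, sql, (product_id,)), SHARDS)
    # ...then merge (and sort/paginate, if needed) in the application
    return [row for rows in per_shard for row in rows]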

Joins: You can't JOIN across shards in SQL. You have to fetch data from Shard 1 and Shard 2 separately and join them in application code.

Distributed transactions: Updating data across shards in a single atomic transaction requires distributed transaction protocols (2-phase commit), which are complex and slow.
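
For a sense of what that involves, here is a heavily simplified sketch using the DB-API 2.0 two-phase-commit hooks that psycopg2 exposes for PostgreSQL (the shards also need max_prepared_transactions enabled). It shows the shape of 2PC, not production code:

python
def commit_across_shards(conns, do_work):
    """conns: open psycopg2 connections, one per participating shard."""
    begun = []
    try:
        for i, conn in enumerate(conns):
            # Global transaction id: same gtrid on every shard, one branch per shard (illustrative names)
            conn.tpc_begin(conn.xid(0, "order-42", f"shard-{i}"))
            begun.append(conn)
            do_work(conn)                 # the shard-local writes
        for conn in begun:
            conn.tpc_prepare()            # phase 1: each shard durably promises it can commit
        for conn in begun:
            conn.tpc_commit()             # phase 2: only now commit everywhere
    except Exception:
        for conn in begun:
            conn.tpc_rollback()           # abort on every shard that got started
        raise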

Schema changes: Every migration has to be applied to all 4 shards instead of 1 and kept in sync; a partially applied migration leaves shards with different schemas, so rollouts need careful coordination.

Before You Shard

Sharding is complex. Try these first:

  1. Add indexes (often the real fix)
  2. Read replicas (scale reads, not writes)
  3. Caching (Redis for frequently-read data)
  4. Vertical scaling (bigger machines)
  5. Partition by table (move large tables like logs to separate databases)

Sharding should be your last resort for scaling writes. Most applications never need it.

When Managed Databases Handle Sharding for You

You don't always need to implement sharding yourself:

  • Amazon DynamoDB: Automatically shards and rebalances under the hood
  • Google Spanner: Horizontally scalable SQL with automatic sharding
  • CockroachDB: Distributed PostgreSQL-compatible database
  • Cassandra: Native sharding with consistent hashing

If you're at a scale that needs sharding, strongly consider these managed options before implementing it yourself.

Sharding in Practice: A Real Example

Instagram's user media was sharded by user_id. Each shard holds all photos for a range of users.

When you view a profile, all media for that user is on one shard — no cross-shard query. Efficient.

But searching for photos by hashtag across all users is expensive — it has to touch all shards. Instagram built a separate search index for this rather than cross-shard querying.

This is the pattern: shard for your hot path (user profile views), build separate systems for the expensive cross-shard queries.

Key Takeaways

  • Sharding splits data across multiple databases to scale write throughput
  • Hash-based sharding distributes evenly; range-based is better for range queries
  • Choose your shard key carefully — it's hard to change later
  • Cross-shard queries, joins, and distributed transactions are the main pain points
  • Try indexing, caching, and read replicas before sharding
  • Managed databases (DynamoDB, Spanner, CockroachDB) handle sharding for you

Sharding is powerful but complex. Use it when you have to, not because you can.

Related reading: Database Indexing Explained · CAP Theorem Explained · Consistent Hashing
