
CAP Theorem Explained Simply

Learn what the CAP theorem means in practice: why distributed systems must choose between consistency and availability when a network partition occurs.

By Akash Sharma · 5 min read
#system design
#distributed systems
#cap theorem
#consistency
#availability
#databases

You're designing a distributed database. You want it to always return correct data, always be available, and handle network failures gracefully.

Here's the problem: you can only have two of those three at the same time.

That's the CAP theorem.

The Three Properties

Consistency (C): Every read gets the most recent write. All nodes in the cluster return the same data at the same time. If you write "price = $100" to one node, any node you read from next will return $100.

Availability (A): Every request gets a response (not an error). The system keeps responding even if some nodes are down. No timeouts, no "service unavailable."

Partition Tolerance (P): The system keeps working even if network messages between nodes are dropped or delayed. Nodes can't always talk to each other — that's just reality in distributed systems.

Why You Can't Have All Three

Network partitions happen. Cables break. Switches fail. Cloud regions lose connectivity. In any real distributed system, you must tolerate partitions — because the alternative is a system that stops working whenever there's any network issue.

So partition tolerance is mandatory. The real choice is between C and A during a partition.

Scenario: Your database has two nodes in different data centers. A network partition splits them — they can't communicate.

A new write comes in to Node 1: "Set user balance to $500."

Node 1 can't sync this to Node 2. Now what?

Option 1: Prioritize Consistency (CP)

Reject the write (and any reads that can't be verified) until the partition heals. The client gets an error. Data stays correct, but the system is unavailable.

Option 2: Prioritize Availability (AP)

Accept the write on Node 1 and keep serving reads from both nodes. Node 2 still returns the old value. The system stays up, but data is inconsistent between nodes.
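The two options can be sketched in a toy simulation. This is purely illustrative — no real database is this literal — and all the names here are made up:

```python
# Toy sketch: two replica nodes during a partition. A CP system refuses
# to answer from a node that might be stale; an AP system answers anyway.

class Node:
    def __init__(self, value):
        self.value = value
        self.in_sync = True  # False once a partition isolates this node

def read(node, mode):
    if mode == "CP" and not node.in_sync:
        # CP: refuse rather than risk returning stale data
        raise RuntimeError("Cannot guarantee consistency")
    return node.value  # AP: respond even if the value may be stale

node1, node2 = Node(500), Node(450)
node2.in_sync = False  # partition: node2 missed the latest write

print(read(node1, "CP"))  # up-to-date node answers fine: 500
print(read(node2, "AP"))  # stale but available: 450
try:
    read(node2, "CP")
except RuntimeError as e:
    print(e)              # CP node refuses with an error
```

The whole CAP trade-off during a partition is in that one `if`: error out, or answer with what you have.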

CP Systems: Correctness Over Uptime

CP databases sacrifice availability during partitions to ensure data is always correct.

Examples: HBase, ZooKeeper, etcd (Redis Cluster is often listed here too, though its asynchronous replication can lose acknowledged writes during a partition)

They're used when being wrong is worse than being unavailable:

  • Financial transactions — wrong balance is catastrophic
  • Inventory management — overselling because of stale data is a serious problem
  • Distributed locks and leader election
```plaintext
Partition occurs:
- Client reads from Node 1 → returns latest data ✓
- Client reads from Node 2 → ERROR: "Cannot guarantee consistency"
```

AP Systems: Uptime Over Correctness

AP databases sacrifice consistency to stay available. During a partition, they return data that might be slightly stale.

Examples: Cassandra, DynamoDB, CouchDB, Riak

They use eventual consistency — given enough time without new writes, all nodes will converge to the same value.
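One common way to implement eventual consistency is last-write-wins: each write carries a timestamp, and when replicas sync, the newer value wins. A minimal sketch — the replica structure and function names are invented for illustration:

```python
# Last-write-wins (LWW) convergence: each replica stores a value plus the
# timestamp of the write that produced it. An anti-entropy sync keeps the
# newer of the two values on both replicas.

def write(replica, value, ts):
    replica["value"], replica["ts"] = value, ts

def sync(a, b):
    # After the partition heals, both replicas converge on the newest write.
    winner = dict(a if a["ts"] >= b["ts"] else b)
    a.update(winner)
    b.update(winner)

r1 = {"value": 450, "ts": 1}
r2 = {"value": 450, "ts": 1}

write(r1, 500, ts=2)        # partition: only r1 sees the new write
assert r2["value"] == 450   # stale read from r2 — AP behavior

sync(r1, r2)                # partition heals, anti-entropy runs
assert r1["value"] == r2["value"] == 500  # converged
```

Real systems use more careful conflict resolution (vector clocks, CRDTs), but the "given enough time, all nodes converge" property is the same.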

Used when availability matters more than perfect consistency:

  • Social media likes/views (showing 999 instead of 1000 likes is fine)
  • User profiles (stale data for a few seconds is acceptable)
  • Shopping carts (temporary inconsistency is okay)
  • Product catalog
```plaintext
Partition occurs:
- Client writes to Node 1: balance = $500
- Client reads from Node 2 → returns $450 (stale) — but no error
- Partition heals → both nodes sync → both return $500
```

The PACELC Extension

CAP only describes behavior during partitions. But what about normal operation when there's no partition?

PACELC extends CAP: if there's a Partition (P), trade Availability against Consistency (A/C); Else (E), during normal operation, trade Latency against Consistency (L/C).

  • A system that waits for all nodes to confirm a write (strong consistency) is slower
  • A system that writes to one node and returns immediately (eventual consistency) is faster

This is why DynamoDB and Cassandra are fast — they don't wait for all nodes to agree.
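A back-of-envelope way to see the latency side of this trade-off: a write that needs one acknowledgment completes as soon as the fastest replica responds, while a quorum write must wait for the majority. The latency numbers below are invented:

```python
# The "ELC" in PACELC as arithmetic: a write completes once `acks_needed`
# replicas have confirmed it, i.e. after the acks_needed-th fastest replica.

replica_latencies_ms = [2, 8, 25]  # hypothetical round-trips to 3 replicas

def ack_latency(latencies, acks_needed):
    return sorted(latencies)[acks_needed - 1]

print(ack_latency(replica_latencies_ms, 1))  # ONE:    2 ms, may be stale elsewhere
print(ack_latency(replica_latencies_ms, 2))  # QUORUM: 8 ms, majority agrees
print(ack_latency(replica_latencies_ms, 3))  # ALL:    25 ms, every replica agrees
```

The stronger the consistency level, the more you pay for the slowest replica you're forced to wait on.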

Practical Examples

PostgreSQL: a single node isn't a distributed system, so CAP doesn't really apply (this is sometimes called "CA," but there's no partition to tolerate). With synchronous replication it behaves as CP: if the replica is unreachable, writes block.

Cassandra: AP by default. You can tune consistency level per query — ask for consistency from 1 node (fast) or majority of nodes (slower, more consistent).

```python
# Cassandra: choose your consistency per query.
# The Python driver sets the consistency level on the statement,
# not as a keyword argument to execute().
from cassandra import ConsistencyLevel
from cassandra.query import SimpleStatement

# Fast but potentially stale
session.execute(SimpleStatement(query, consistency_level=ConsistencyLevel.ONE))

# Slower, but consistent across a majority of replicas
session.execute(SimpleStatement(query, consistency_level=ConsistencyLevel.QUORUM))
```

DynamoDB: AP by default. Eventual consistency. You can pay extra for strongly consistent reads.

ZooKeeper / etcd: CP. Used for leader election and distributed locks where correctness is critical.

What This Means for Your System Design

For most web applications, you'll use AP systems with eventual consistency — and that's fine. The data doesn't need to be perfectly consistent across all nodes within milliseconds.

For financial data, locks, or anything where incorrect data causes real harm, use CP systems or design your application to handle consistency at the application level (distributed transactions, sagas).

A common pattern is to use different databases for different data:

  • PostgreSQL (CP) for financial transactions
  • DynamoDB (AP) for user sessions and preferences
  • Redis for distributed locks and rate limiting (fast and widely used this way, though not strictly CP — its replication is asynchronous)

Key Takeaways

  • CAP theorem: Consistency, Availability, Partition Tolerance — pick two (partition tolerance is mandatory in distributed systems, so really pick C or A)
  • CP systems return errors during partitions to stay consistent (HBase, ZooKeeper, etcd)
  • AP systems stay available during partitions but may return stale data (Cassandra, DynamoDB)
  • Eventual consistency means nodes will agree — eventually, not necessarily immediately
  • Most web apps work fine with AP + eventual consistency
  • Use CP for money, locks, and anything where wrong data causes serious problems

CAP is a fundamental constraint, not a design flaw. Understanding it helps you choose the right database for each part of your system.

Related reading: Vertical vs Horizontal Scaling · Database Sharding Explained
