Vertical vs Horizontal Scaling: When to Use Each
Learn the difference between vertical and horizontal scaling. Understand trade-offs, real costs, and when each strategy makes sense for your system.
Your system is struggling under load. You need to scale. There are only two ways to do it: make your existing machines bigger, or add more machines.
That's vertical vs horizontal scaling. Both work. The choice depends on your situation.
Vertical Scaling: Make the Machine Bigger
Vertical scaling (scale up) means upgrading the server you already have. More CPU cores, more RAM, faster SSD, more network bandwidth.
Your app runs on one machine. You just make that machine more powerful.
Before: 4 CPU, 16GB RAM, 500GB SSD
After: 32 CPU, 128GB RAM, 2TB NVMe SSD
Why it's appealing: Simple. No code changes needed. One machine means no distributed systems complexity. No need to redesign your app.
Real example: Your PostgreSQL database is slow. You move it from a 4-core machine to a 32-core machine with more RAM. Queries are faster. Done.
The Limits of Vertical Scaling
There's a ceiling. You can't buy a machine with unlimited RAM. One of the largest AWS instances (u-24tb1.metal) has 24TB of RAM and costs around $200/hour. At some point, bigger hardware doesn't exist or doesn't make economic sense.
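To put that hourly price in perspective, here is a rough back-of-the-envelope calculation. It assumes the machine runs around the clock at the approximate $200/hour on-demand figure quoted above, and uses 730 hours as an average month; real bills depend on region, discounts, and reserved pricing.

```python
HOURS_PER_MONTH = 730  # 8,760 hours in a year divided by 12 months


def monthly_cost(hourly_rate: float, hours: int = HOURS_PER_MONTH) -> float:
    """Cost of keeping one instance running for an average month."""
    return hourly_rate * hours


# The ~$200/hour rate is the approximate figure from the text, not a price quote.
big_machine_monthly = monthly_cost(200.0)
big_machine_yearly = big_machine_monthly * 12

print(f"Per month: ${big_machine_monthly:,.0f}")
print(f"Per year:  ${big_machine_yearly:,.0f}")
```

At that rate a single always-on machine costs well over a million dollars a year, which is usually the point where "just buy a bigger box" stops being the cheap option.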
Also: one machine means one point of failure. If it goes down, everything goes down.
Horizontal Scaling: Add More Machines
Horizontal scaling (scale out) means adding more servers and distributing load across them.
Your app runs on 3 servers instead of 1. Traffic is split between them by a load balancer.
Before: 1 server handling 1,000 req/s
After: 5 servers × 1,000 req/s = 5,000 req/s capacity
Why it's powerful: No hard ceiling on capacity. You can keep adding servers as long as shared components like the load balancer and database keep up. If one server dies, the others keep running, so the app tier has no single point of failure.
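As a minimal sketch of how a load balancer spreads traffic and routes around a dead server, the class below hands out servers in round-robin order and skips any that are marked unhealthy. The class name, method names, and server names are hypothetical; a real load balancer also runs health checks, handles timeouts, and tracks connections.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out servers in rotation, skipping any marked unhealthy."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._rotation = cycle(self.servers)

    def mark_down(self, server):
        """Take a server out of rotation, e.g. after a failed health check."""
        self.healthy.discard(server)

    def mark_up(self, server):
        """Put a known server back into rotation."""
        if server in self.servers:
            self.healthy.add(server)

    def next_server(self):
        if not self.healthy:
            raise RuntimeError("no healthy servers available")
        # One full pass over the rotation is enough to reach a healthy server.
        for _ in range(len(self.servers)):
            server = next(self._rotation)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
first_pass = [balancer.next_server() for _ in range(3)]

balancer.mark_down("app-2")  # simulate one server dying
after_failure = [balancer.next_server() for _ in range(4)]

print(first_pass)
print(after_failure)
```

After `app-2` is marked down, requests keep flowing to the remaining two servers, which is the resilience property described above.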
Real example: Your API can't handle traffic spikes. You put it behind a load balancer and run 10 instances. During a spike, you spin up 5 more automatically. When traffic drops, you scale back down.
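The scale-up and scale-down decision in that example can be sketched as a small function. This is an illustration only: it assumes each instance handles about 1,000 req/s (the figure used earlier) and that capacity grows linearly with instance count. The function name and the minimum and maximum bounds are hypothetical, and real autoscalers usually add headroom and cooldown periods.

```python
import math


def desired_instances(current_rps, rps_per_instance, min_instances, max_instances):
    """Return how many instances are needed for the current load, within bounds."""
    if rps_per_instance <= 0:
        raise ValueError("rps_per_instance must be positive")
    needed = math.ceil(current_rps / rps_per_instance)
    # Never drop below the floor or exceed the ceiling.
    return max(min_instances, min(needed, max_instances))


# Quiet period: load fits in fewer servers, but the floor of 10 is kept.
quiet = desired_instances(3_000, 1_000, min_instances=10, max_instances=20)

# Spike: 14,500 req/s needs 15 instances, so 5 more are started.
spike = desired_instances(14_500, 1_000, min_instances=10, max_instances=20)

print(quiet, spike)
```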
The Challenges of Horizontal Scaling
Your app needs to be stateless. If Server A stores session data in memory, and the load balancer sends the next request to Server B, the session is gone.
Solution: Move state out of your servers. Store sessions in Redis. Store files in S3. Use a database for anything that needs to persist.
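The difference is easy to see in a small simulation. In this sketch, `SessionStore` is a hypothetical in-process stand-in for an external store such as Redis, and the two server classes are illustrative names, not a real framework. One server keeps sessions in its own memory, the other reads and writes the shared store.

```python
class SessionStore:
    """In-process stand-in for an external session store such as Redis."""

    def __init__(self):
        self._data = {}

    def get(self, session_id):
        return self._data.get(session_id)

    def set(self, session_id, value):
        self._data[session_id] = value


class StatefulServer:
    """Keeps sessions in local memory, so other servers cannot see them."""

    def __init__(self, name):
        self.name = name
        self._sessions = {}

    def login(self, session_id, user):
        self._sessions[session_id] = user

    def whoami(self, session_id):
        return self._sessions.get(session_id)


class StatelessServer:
    """Keeps no session data itself; everything lives in the shared store."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def login(self, session_id, user):
        self.store.set(session_id, user)

    def whoami(self, session_id):
        return self.store.get(session_id)


# Stateful: log in on Server A, next request lands on Server B.
a, b = StatefulServer("A"), StatefulServer("B")
a.login("sess-1", "alice")
stateful_result = b.whoami("sess-1")  # session is missing on B

# Stateless: both servers share one external store.
store = SessionStore()
a2, b2 = StatelessServer("A", store), StatelessServer("B", store)
a2.login("sess-1", "alice")
stateless_result = b2.whoami("sess-1")  # session is found via the store

print(stateful_result, stateless_result)
```

With the shared store, it no longer matters which server the load balancer picks, which is what makes adding or removing servers safe.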
You also get complexity: load balancers, service discovery, distributed tracing, and network latency between services. More moving parts means more things to monitor and debug.
Comparison: When to Use Which
| Aspect | Vertical | Horizontal |
|---|---|---|
| Complexity | Low | High |
| Upper limit | Hardware limit | Practically unlimited |
| Failure tolerance | Single point of failure | High availability |
| Cost | Expensive big machines | Many smaller machines |
| Downtime to scale | Usually requires restart | Zero-downtime |
| Stateful apps | Works fine | Requires redesign |
| Best for | Databases, legacy apps | Web servers, APIs, microservices |
Real Scaling Decisions
Scale vertically first when:
- Your app isn't designed for horizontal scaling
- The bottleneck is a database (stateful, complex to distribute)
- You need a quick fix during an incident
- Vertical is cheaper at your scale
Scale horizontally when:
- You've hit the vertical limit (or the cost is too high)
- You need high availability with no single point of failure
- Traffic is spiky and you want to scale down when quiet
- You're building a system that needs to grow for years
In practice, most production systems use both. Horizontal scaling for stateless layers (web servers, API servers). Vertical scaling for databases (then add read replicas for reads, then consider sharding if you need to write at massive scale).
Database Scaling: A Special Case
Databases are the hardest part to scale horizontally because they're stateful.
Read replicas: One primary database handles writes. Multiple replicas handle reads. Works well when your app reads far more than it writes (most apps).
Write: → Primary DB
Read: → Replica 1, Replica 2, Replica 3 (round robin)
Sharding: Split the data across multiple databases. User IDs 1 to 1,000,000 go on DB1, 1,000,001 to 2,000,000 on DB2. Complex to implement, hard to rebalance. Worth it only at massive scale.
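Both ideas, read/write splitting and range-based sharding, can be sketched in a few lines. The names here (`ReplicaRouter`, `shard_for`, the database labels) are hypothetical, and the shard ranges are read as 1 to 1,000,000 on DB1 and 1,000,001 to 2,000,000 on DB2. A real setup would also have to deal with replication lag, failover, and rebalancing.

```python
from bisect import bisect_left
from itertools import cycle


class ReplicaRouter:
    """Sends writes to the primary and spreads reads across replicas."""

    def __init__(self, primary, replicas):
        if not replicas:
            raise ValueError("at least one replica is required")
        self.primary = primary
        self._replicas = cycle(replicas)

    def route(self, is_write):
        # Writes must hit the primary; reads rotate round robin over replicas.
        return self.primary if is_write else next(self._replicas)


# Inclusive upper bound of each shard's user-ID range, in ascending order.
SHARD_UPPER_BOUNDS = [1_000_000, 2_000_000]
SHARD_NAMES = ["DB1", "DB2"]


def shard_for(user_id):
    """Return the shard holding this user ID under range-based sharding."""
    if user_id < 1:
        raise ValueError("user_id must be positive")
    index = bisect_left(SHARD_UPPER_BOUNDS, user_id)
    if index == len(SHARD_UPPER_BOUNDS):
        raise LookupError(f"no shard configured for user_id {user_id}")
    return SHARD_NAMES[index]


router = ReplicaRouter("primary", ["replica-1", "replica-2", "replica-3"])
write_target = router.route(is_write=True)
read_targets = [router.route(is_write=False) for _ in range(4)]

print(write_target, read_targets)
print(shard_for(42), shard_for(1_000_000), shard_for(1_000_001))
```

The hard-coded range table hints at why sharding is hard to rebalance: moving the boundaries means moving the data that sits behind them.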
Managed databases: Amazon Aurora supports up to 15 read replicas per cluster and can add or remove them automatically with Aurora Auto Scaling. Much simpler than managing this yourself.
Practical Advice from Production
Start with vertical scaling. It's simpler and it works. When you've maxed out vertical scaling (or its cost is too high), move to horizontal.
Design your application to be stateless from the start — even if you're only running one server. It costs almost nothing upfront and makes future horizontal scaling much easier.
For databases, add read replicas before sharding. Sharding is complex and you probably won't need it until you're at a very large scale.
Key Takeaways
- Vertical scaling = bigger machine. Simple, but has a ceiling and a single point of failure
- Horizontal scaling = more machines. Scalable and resilient, but requires stateless design
- Most systems use both: horizontal for app servers, vertical for databases initially
- Make your app stateless (use Redis for sessions, S3 for files) to enable horizontal scaling
- Start simple. Scale vertically first. Go horizontal when you need to
The right scaling strategy isn't about what's "best" — it's about what works for your team, your traffic, and your budget right now.
Related reading: Load Balancing Strategies · Consistent Hashing