Rate Limiting: Protect Your API From Being Overwhelmed
Learn how to implement rate limiting with Redis. Covers token bucket, fixed window, and sliding window algorithms with Python and Go examples.
One user sends 10,000 requests per minute to your API. Your server slows to a crawl. Legitimate users get timeouts. Your costs spike.
Rate limiting stops this. It caps how many requests a client can make in a given time period.
Why Rate Limiting Matters
Beyond protection from abuse, rate limiting helps with:
- Preventing DDoS: A flood of requests can't bring down your service
- Fair usage: One heavy user can't degrade the experience for everyone
- Cost control: Limits on expensive operations (like AI inference calls) prevent runaway bills
- Business rules: APIs often have usage tiers — free users get 100 calls/day, paid users get 10,000
Rate Limiting Algorithms
Fixed Window Counter
Divide time into fixed windows (e.g., each minute). Count requests in the current window. If count exceeds limit, reject.
```
Window: 12:00:00 - 12:00:59 → 95 requests → OK
Window: 12:01:00 - 12:01:59 → 120 requests → 20 requests rejected
```

Simple to implement, but it has an edge case: a client can double the limit by sending 100 requests at 12:00:59 and 100 more at 12:01:00 — that's 200 requests in 2 seconds, but both windows show 100.
Sliding Window Log
Track the timestamp of every request. On each new request, count requests in the last N seconds.
More accurate, but stores a timestamp for every request — expensive for high-volume APIs.
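A minimal in-memory sketch of the log approach (a distributed version would typically keep the log in a Redis sorted set; the `SlidingWindowLog` class and its parameters are illustrative, not a library API):

```python
import time
from collections import deque

class SlidingWindowLog:
    """Keep one timestamp per request; count those still inside the window."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.timestamps = deque()

    def is_allowed(self, now=None) -> bool:
        now = time.time() if now is None else now
        # Evict timestamps that have slid out of the window
        while self.timestamps and self.timestamps[0] <= now - self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False
```

Note that memory grows with the request rate: one stored timestamp per allowed request, which is exactly why this variant gets expensive at high volume.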
Sliding Window Counter
A practical middle ground. Weight the previous window count based on how far into the current window you are.
```
Previous window: 80 requests
Current window:  40 requests (30 seconds in, out of 60)
Estimated rate = (80 × 0.5) + 40 = 80 requests in this window
```

This is what most production systems use.
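The weighting is a one-line calculation; here is an illustrative helper (the function name is made up for this sketch):

```python
def estimated_rate(prev_count: int, curr_count: int,
                   elapsed: float, window: float) -> float:
    # Weight the previous window by the fraction of it still overlapping
    # the sliding window, then add the current window's full count.
    prev_weight = 1.0 - (elapsed / window)
    return prev_count * prev_weight + curr_count

# The example above: 30s into a 60s window weights the previous window by 0.5
print(estimated_rate(80, 40, 30, 60))  # 80.0
```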
Token Bucket (Most Flexible)
Imagine a bucket that fills with tokens at a steady rate. Each request uses one token. If the bucket is empty, the request is rejected.
- Bucket capacity: 100 tokens (max burst)
- Refill rate: 10 tokens/second
- A client can burst to 100 requests, then is limited to 10/second
```python
import redis
import time

class TokenBucketRateLimiter:
    def __init__(self, redis_client, capacity: int, refill_rate: float):
        self.r = redis_client
        self.capacity = capacity        # Max tokens (burst limit)
        self.refill_rate = refill_rate  # Tokens added per second

    def is_allowed(self, client_id: str) -> bool:
        now = time.time()
        key = f"rate_limit:{client_id}"
        # Atomic Lua script to prevent race conditions
        script = """
        local tokens = tonumber(redis.call('GET', KEYS[1]) or ARGV[1])
        local last_refill = tonumber(redis.call('GET', KEYS[2]) or ARGV[2])
        local capacity = tonumber(ARGV[1])
        local now = tonumber(ARGV[2])
        local refill_rate = tonumber(ARGV[3])

        -- Add tokens based on elapsed time
        local elapsed = now - last_refill
        tokens = math.min(capacity, tokens + (elapsed * refill_rate))

        if tokens >= 1 then
            tokens = tokens - 1
            redis.call('SET', KEYS[1], tokens, 'EX', 3600)
            redis.call('SET', KEYS[2], now, 'EX', 3600)
            return 1 -- allowed
        else
            return 0 -- rejected
        end
        """
        result = self.r.eval(
            script, 2,
            f"{key}:tokens", f"{key}:last_refill",
            self.capacity, now, self.refill_rate
        )
        return bool(result)
```

Simple Redis Implementation (Fixed Window)
For most APIs, a simple fixed window counter in Redis is enough:

```python
import time

import redis

r = redis.Redis(host='localhost', port=6379)

def is_rate_limited(client_id: str, limit: int = 100, window_seconds: int = 60) -> bool:
    key = f"rate_limit:{client_id}:{int(time.time() / window_seconds)}"
    current = r.incr(key)
    if current == 1:
        # First request in this window — set expiry
        r.expire(key, window_seconds)
    return current > limit

# Usage in FastAPI
from fastapi import FastAPI, Request, HTTPException

app = FastAPI()

@app.get("/api/data")
async def get_data(request: Request):
    client_ip = request.client.host
    if is_rate_limited(client_ip, limit=100, window_seconds=60):
        raise HTTPException(status_code=429, detail="Too many requests")
    return {"data": "..."}
```

Returning Useful Rate Limit Headers
Good APIs tell clients their rate limit status:
```python
from fastapi.responses import JSONResponse

@app.get("/api/data")
async def get_data(request: Request):
    client_ip = request.client.host
    key = f"rate_limit:{client_ip}:{int(time.time() / 60)}"
    current = r.incr(key)
    if current == 1:
        r.expire(key, 60)
    remaining = max(0, 100 - current)
    if current > 100:
        raise HTTPException(
            status_code=429,
            detail="Rate limit exceeded",
            headers={
                "X-RateLimit-Limit": "100",
                "X-RateLimit-Remaining": "0",
                "X-RateLimit-Reset": str(int(time.time() / 60 + 1) * 60),
                "Retry-After": "60",
            },
        )
    return JSONResponse(
        {"data": "..."},
        headers={
            "X-RateLimit-Limit": "100",
            "X-RateLimit-Remaining": str(remaining),
        },
    )
```

Where to Apply Rate Limits
- Per IP address: Good for anonymous APIs. Easy to implement.
- Per API key / user: Better for authenticated APIs. Allows different limits per tier.
- Per endpoint: Expensive operations (image generation, AI calls) need tighter limits.
- Globally: Protect against total traffic overload regardless of individual client behavior.
In production, you often combine these:
- Global limit: 10,000 req/sec total
- Per IP: 100 req/min
- Authenticated free tier: 1,000 req/day
- Authenticated paid tier: 100,000 req/day
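A hypothetical sketch of stacking these layers, using in-memory fixed-window counters for brevity (the class and function names are invented for illustration; a production version would back the counters with Redis):

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """In-memory fixed-window counter for a single limit."""

    def __init__(self, limit: int, window_seconds: float):
        self.limit = limit
        self.window = window_seconds
        self.counts = defaultdict(int)

    def allow(self, key: str, now=None) -> bool:
        now = time.time() if now is None else now
        bucket = (key, int(now // self.window))
        self.counts[bucket] += 1
        return self.counts[bucket] <= self.limit

def allow_request(layers, now=None) -> bool:
    # Every layer must allow the request. Naive: layers checked before
    # a rejecting one are still charged for the attempt.
    return all(limiter.allow(key, now) for limiter, key in layers)
```

For example, `[(global_limiter, "global"), (ip_limiter, client_ip)]` enforces both a total cap and a per-IP cap on the same request.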
Rate Limiting at Infrastructure Level
For serious production systems, put rate limiting in the infrastructure layer, not just in application code:
- Nginx: the `limit_req_zone` and `limit_req` directives for per-IP limiting
- API Gateway: AWS API Gateway, Kong, or Traefik all support rate limiting natively
- CDN level: Cloudflare's rate limiting rules before requests even reach your server
Infrastructure-level rate limiting is more efficient (rejects requests before they consume server resources) and works across all your services.
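As an illustration, a minimal Nginx sketch of per-IP limiting (the zone name, rate, burst value, and upstream are placeholders):

```nginx
# In the http block: a 10 MB shared zone keyed by client IP, 100 req/min
limit_req_zone $binary_remote_addr zone=per_ip:10m rate=100r/m;

server {
    location /api/ {
        # Allow short bursts of up to 20 extra requests, reject the rest
        limit_req zone=per_ip burst=20 nodelay;
        limit_req_status 429;
        proxy_pass http://backend;
    }
}
```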
Key Takeaways
- Rate limiting protects your API from abuse, accidental overload, and runaway costs
- Fixed window is the simplest algorithm — good enough for most use cases
- Token bucket allows burst traffic while enforcing an average rate
- Redis is the standard tool for distributed rate limiting
- Always return `X-RateLimit-*` headers so clients know their status
- Apply rate limits per client AND globally for full protection
Rate limiting is one of those things that seems unnecessary until you don't have it.
Related reading: REST API Design Best Practices · Redis Caching Explained