Saga Pattern: Distributed Transactions Without 2PC
Learn how the saga pattern handles distributed transactions in microservices. Covers choreography vs orchestration, compensating transactions, and real examples.
You're building an e-commerce checkout. It needs to reserve inventory, charge the payment, and create a shipment, each in a different service.
What happens if payment succeeds but shipment creation fails? You've charged the user for something you can't ship.
In a single database, a transaction handles this. In microservices across multiple databases, you can't do a single transaction. That's where the saga pattern comes in.
Why Distributed Transactions Are Hard
A traditional database transaction is ACID — all steps succeed or all roll back together.
In microservices, each service has its own database, so no single transaction can lock rows across all of them. The standard solution, two-phase commit (2PC), works, but it is slow and fragile: participants hold locks while they wait for the coordinator, and a coordinator crash can leave them blocked. Most teams avoid it.
The saga pattern is the alternative.
What Is a Saga?
A saga is a sequence of local transactions. Each step publishes an event or calls the next step. If any step fails, the saga runs compensating transactions to undo the previous steps.
Checkout Saga:
1. Reserve inventory → success
2. Charge payment → success
3. Create shipment → FAILS
Compensating (undo in reverse):
3. (failed — nothing to undo)
2. Refund payment ← run this
1. Release inventory ← run this
Instead of one atomic transaction, you have a chain of operations with an undo for each one.
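The checkout walkthrough above can be sketched as a small generic runner. This is a minimal illustration, not a production implementation: `run_saga`, `Step`, `fail_shipment`, and `noop` are hypothetical names introduced here, and the sketch assumes that compensations themselves never fail.

```python
from typing import Callable, List, Tuple

# Each step pairs a local transaction with the compensation that undoes it.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> List[str]:
    """Run steps in order; on failure, compensate completed steps in reverse.

    Returns a log of what happened, in order.
    """
    log: List[str] = []
    completed: List[Step] = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception as exc:
            log.append(f"{name}: failed ({exc})")
            # The failed step did not commit, so only earlier steps are undone.
            for done_name, _, done_compensate in reversed(completed):
                done_compensate()
                log.append(f"{done_name}: compensated")
            return log
        completed.append((name, action, compensate))
        log.append(f"{name}: ok")
    return log


def fail_shipment() -> None:
    raise RuntimeError("no carrier available")


def noop() -> None:
    pass


# Same scenario as the walkthrough: shipment creation fails at step 3.
checkout_log = run_saga([
    ("reserve_inventory", noop, noop),
    ("charge_payment", noop, noop),
    ("create_shipment", fail_shipment, noop),
])
print(checkout_log)
```

In this run the log ends with the payment compensation followed by the inventory compensation, which mirrors the reverse-order undo in the walkthrough.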
Choreography: Event-Driven Sagas
Each service listens for events and reacts. No central coordinator.
Order Service: creates order → publishes "OrderCreated"
↓
Inventory Service: receives "OrderCreated" → reserves stock → publishes "InventoryReserved"
↓
Payment Service: receives "InventoryReserved" → charges card → publishes "PaymentCompleted"
↓
Shipping Service: receives "PaymentCompleted" → creates shipment → publishes "OrderFulfilled"
On failure:
Payment Service: charge fails → publishes "PaymentFailed"
↓
Inventory Service: receives "PaymentFailed" → releases stock → publishes "InventoryReleased"
↓
Order Service: receives "InventoryReleased" → marks order failed

```python
# Inventory service: Kafka consumer
import json

from kafka import KafkaConsumer, KafkaProducer

# reserve_stock() and release_stock() are this service's local transactions,
# assumed to be defined elsewhere.
# "payment-events" is an assumed topic name: PaymentFailed has to arrive on
# some topic this consumer subscribes to.
consumer = KafkaConsumer(
    "order-events",
    "payment-events",
    bootstrap_servers="kafka:9092",
    group_id="inventory-service",
)
producer = KafkaProducer(bootstrap_servers="kafka:9092")

for message in consumer:
    event = json.loads(message.value)

    if event["type"] == "OrderCreated":
        order_id = event["order_id"]
        items = event["items"]

        if reserve_stock(items):
            producer.send("inventory-events", json.dumps({
                "type": "InventoryReserved",
                "order_id": order_id
            }).encode())
        else:
            producer.send("inventory-events", json.dumps({
                "type": "InventoryReservationFailed",
                "order_id": order_id,
                "reason": "Out of stock"
            }).encode())

    elif event["type"] == "PaymentFailed":
        order_id = event["order_id"]
        release_stock(order_id)  # Compensating transaction
        producer.send("inventory-events", json.dumps({
            "type": "InventoryReleased",
            "order_id": order_id
        }).encode())
```

Advantage: Loose coupling. Services don't know about each other, only the events.
Disadvantage: Hard to follow the flow. Business logic is scattered across services, which makes debugging painful.
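To see the whole event chain in one place, the flow can be simulated without a broker. The sketch below is only an illustration under stated assumptions: `InMemoryBus`, `wire_services`, and `simulate` are hypothetical names, the bus is a toy stand-in for Kafka, and each service is reduced to a handler that publishes the next event.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List

Handler = Callable[[dict], None]


class InMemoryBus:
    """Toy stand-in for a message broker: delivers events in publish order."""

    def __init__(self) -> None:
        self.handlers: Dict[str, List[Handler]] = defaultdict(list)
        self.queue: Deque[dict] = deque()
        self.history: List[str] = []

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self.handlers[event_type].append(handler)

    def publish(self, event_type: str, order_id: str) -> None:
        self.queue.append({"type": event_type, "order_id": order_id})

    def run(self) -> None:
        # Deliver queued events until no handler publishes anything new.
        while self.queue:
            event = self.queue.popleft()
            self.history.append(event["type"])
            for handler in self.handlers[event["type"]]:
                handler(event)


def wire_services(bus: InMemoryBus, card_declined: bool) -> None:
    # Inventory service: reserve on OrderCreated, release on PaymentFailed.
    bus.subscribe("OrderCreated",
                  lambda e: bus.publish("InventoryReserved", e["order_id"]))
    bus.subscribe("PaymentFailed",
                  lambda e: bus.publish("InventoryReleased", e["order_id"]))
    # Payment service: charge once stock is reserved.
    bus.subscribe("InventoryReserved",
                  lambda e: bus.publish(
                      "PaymentFailed" if card_declined else "PaymentCompleted",
                      e["order_id"]))
    # Shipping service: create the shipment once payment completes.
    bus.subscribe("PaymentCompleted",
                  lambda e: bus.publish("OrderFulfilled", e["order_id"]))


def simulate(card_declined: bool) -> List[str]:
    bus = InMemoryBus()
    wire_services(bus, card_declined)
    bus.publish("OrderCreated", "order-123")
    bus.run()
    return bus.history


print(simulate(card_declined=False))
print(simulate(card_declined=True))
```

Note that no component in this sketch knows the full sequence. The order of events emerges from the subscriptions, which is exactly why the flow is hard to follow in a real system.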
Orchestration: Centralized Coordinator
A central saga orchestrator tells each service what to do and handles failures.
```python
# Checkout Saga Orchestrator
import asyncio

# inventory_service, payment_service, and shipping_service are async clients
# for the other services, assumed to be defined elsewhere.


class SagaFailedException(Exception):
    """Raised after a step fails and earlier steps have been compensated."""


class CheckoutSaga:
    def __init__(self, order_id: str):
        self.order_id = order_id
        self.state = "STARTED"

    async def execute(self, order_data: dict):
        # Step 1: Reserve inventory
        try:
            self.state = "RESERVING_INVENTORY"
            await inventory_service.reserve(self.order_id, order_data["items"])
        except Exception as e:
            await self.compensate(step="before_inventory")
            raise SagaFailedException(f"Inventory failed: {e}") from e

        # Step 2: Charge payment
        try:
            self.state = "CHARGING_PAYMENT"
            await payment_service.charge(self.order_id, order_data["amount"])
        except Exception as e:
            await self.compensate(step="before_payment")
            raise SagaFailedException(f"Payment failed: {e}") from e

        # Step 3: Create shipment
        try:
            self.state = "CREATING_SHIPMENT"
            await shipping_service.create(self.order_id, order_data["address"])
        except Exception as e:
            await self.compensate(step="before_shipment")
            raise SagaFailedException(f"Shipping failed: {e}") from e

        self.state = "COMPLETED"

    async def compensate(self, step: str):
        # `step` names the point of failure; undo every step that completed
        # before it, most recent first.
        self.state = "COMPENSATING"
        if step == "before_shipment":
            await payment_service.refund(self.order_id)
            await inventory_service.release(self.order_id)
        elif step == "before_payment":
            await inventory_service.release(self.order_id)
        # before_inventory: nothing to undo
        self.state = "FAILED"


# Usage
async def main(order_data: dict):
    saga = CheckoutSaga(order_id="order-123")
    try:
        await saga.execute(order_data)
    except SagaFailedException as e:
        print(f"Order failed and rolled back: {e}")


# order_data is the checkout payload with "items", "amount", and "address" keys.
asyncio.run(main(order_data))
```

Advantage: Business logic in one place. Easy to see the full flow.
Disadvantage: The orchestrator is a single point of failure and can become a bottleneck.
Compensating Transactions: The Key Concept
Compensating transactions are the "undo" operations. They're not rollbacks: the original transaction has already committed, so a compensation is a new, forward-moving operation that reverses its effect.
Original: Charge $100 to card
Compensating: Refund $100 to card
Original: Reserve 5 units of product X
Compensating: Release 5 units of product X
Original: Send welcome email
Compensating: ??? (can't unsend an email)
Some operations can't be compensated. Sending an email is irreversible. The design choice: either accept this (some side effects are acceptable), or move irreversible steps to the end of the saga, after all reversible steps succeed.
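One way to enforce the "irreversible steps last" rule is to check the step order when the saga is defined. The following is a rough sketch under assumptions: `SagaStep` and `check_irreversible_last` are hypothetical names, and a step is treated as irreversible simply because it has no compensation attached.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class SagaStep:
    name: str
    # None marks a step with no meaningful undo (for example, sending an email).
    compensate: Optional[Callable[[], None]] = None


def check_irreversible_last(steps: List[SagaStep]) -> None:
    """Raise ValueError if a reversible step is scheduled after an irreversible one."""
    first_irreversible: Optional[str] = None
    for step in steps:
        if step.compensate is None:
            if first_irreversible is None:
                first_irreversible = step.name
        elif first_irreversible is not None:
            raise ValueError(
                f"'{step.name}' runs after irreversible step '{first_irreversible}'"
            )


def undo() -> None:
    pass


registration = [
    SagaStep("create_account", compensate=undo),
    SagaStep("provision_resources", compensate=undo),
    SagaStep("send_welcome_email"),
]
check_irreversible_last(registration)  # passes: the email is sent last

email_first = [registration[2], registration[0], registration[1]]
try:
    check_irreversible_last(email_first)
except ValueError as err:
    print(err)
```

The check only validates ordering. It does not help when two irreversible steps run back to back, which is the case where you have to accept the side effect.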
Keeping Track of Saga State
The orchestrator needs to persist its state — if it crashes mid-saga, it must be able to resume or compensate.
```python
# Store saga state in database
from datetime import datetime, timezone

from sqlalchemy import JSON, Column, DateTime, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class SagaState(Base):
    __tablename__ = "saga_states"

    saga_id = Column(String, primary_key=True)
    order_id = Column(String, index=True)
    current_step = Column(String)    # "RESERVING_INVENTORY", "CHARGING_PAYMENT", etc.
    status = Column(String)          # "RUNNING", "COMPLETED", "COMPENSATING", "FAILED"
    steps_completed = Column(JSON)   # ["inventory_reserved", "payment_charged"]
    created_at = Column(DateTime(timezone=True))
    updated_at = Column(DateTime(timezone=True))


# Before each step: update state
# (db is an open SQLAlchemy Session and saga_id identifies the running saga;
# both are assumed to exist.)
db.query(SagaState).filter_by(saga_id=saga_id).update({
    "current_step": "CHARGING_PAYMENT",
    "updated_at": datetime.now(timezone.utc),
})
db.commit()
```

If the orchestrator restarts, it reads the database, sees which step it was on, and decides whether to continue or compensate.
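The restart decision can be sketched as a pure function over the persisted row. This is one possible policy, not the only one: `PersistedSaga`, `COMPENSATIONS`, and `recovery_plan` are hypothetical names, the sketch always compensates rather than resuming, and it assumes the orchestrator removes a step from `steps_completed` once its compensation has finished.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical mapping from a completed step to the compensation that undoes it.
COMPENSATIONS: Dict[str, str] = {
    "inventory_reserved": "release_inventory",
    "payment_charged": "refund_payment",
}


@dataclass
class PersistedSaga:
    """The fields of a saga_states row that recovery needs."""
    status: str
    steps_completed: List[str] = field(default_factory=list)


def recovery_plan(saga: PersistedSaga) -> List[str]:
    """Return the compensations to run after a restart, newest step first."""
    if saga.status in ("COMPLETED", "FAILED"):
        return []  # Terminal states: nothing left to do.
    # RUNNING or COMPENSATING: undo every step still recorded as completed.
    return [COMPENSATIONS[step] for step in reversed(saga.steps_completed)]


crashed = PersistedSaga(
    status="RUNNING",
    steps_completed=["inventory_reserved", "payment_charged"],
)
print(recovery_plan(crashed))
```

The sketch does not cover the step that was in flight when the orchestrator crashed. Its outcome is unknown from the row alone, so a real implementation would need to query the service or make the step safe to repeat.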
Saga vs 2-Phase Commit
| | Saga | 2-Phase Commit |
|---|---|---|
| Consistency | Eventual | Strong |
| Complexity | Medium | Very high |
| Performance | Fast | Slow (blocking) |
| Failure tolerance | Good | Coordinator failure = stuck |
| Use case | Microservices | Legacy distributed DBs |
Sagas are eventually consistent — there's a window where one service has committed but another hasn't yet. For most business workflows (order processing, user registration), this is acceptable.
For financial ledgers where every cent must balance perfectly, you may need stronger guarantees.
When to Use Sagas
Good fit:
- Multi-step business processes spanning multiple services
- Order processing (inventory + payment + fulfillment)
- User registration (create account + send email + provision resources)
- Travel booking (flight + hotel + car)
Bad fit:
- Simple two-service coordination (direct call is fine)
- High-frequency operations (saga overhead adds up)
- When you need immediate consistency
Key Takeaways
- Sagas replace distributed transactions by chaining local transactions with compensating operations
- Choreography: services react to events — loose coupling, hard to debug
- Orchestration: a central coordinator calls services, which is easier to trace but makes the coordinator a single point of failure
- Compensating transactions are new forward operations that reverse the effect of earlier steps (they're not rollbacks)
- Persist saga state so it can resume or compensate after crashes
- Some operations (like sending emails) can't be compensated — put them last
- Sagas give you eventual consistency, not immediate consistency
Sagas add complexity. Use them when a multi-service workflow needs reliability, not just convenience.