Saga Pattern: Distributed Transactions Without 2PC

You're building an e-commerce checkout. It needs to: reserve inventory, charge the payment, and create a shipment — in different services.

What happens if payment succeeds but shipment creation fails? You've charged the user for something you can't ship.

In a single database, a transaction handles this. In microservices across multiple databases, you can't do a single transaction. That's where the saga pattern comes in.

Why Distributed Transactions Are Hard

A traditional database transaction is ACID — all steps succeed or all roll back together.

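In microservices, each service has its own database. You can't lock rows across separate databases in a single transaction. The standard solution, two-phase commit (2PC), works but is slow (participants hold locks while they wait for the coordinator's decision), complex, and fragile (if the coordinator fails mid-protocol, participants can be left blocked), so most teams avoid it.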

The saga pattern is the alternative.

What Is a Saga?

A saga is a sequence of local transactions. Each step commits in its own service's database, then publishes an event or calls the next step. If any step fails, the saga runs compensating transactions to undo the steps that already completed, in reverse order.

plaintext

Checkout Saga:
1. Reserve inventory      → success
2. Charge payment         → success  
3. Create shipment        → FAILS
 
Compensating (undo in reverse):
3. (failed — nothing to undo)
2. Refund payment         ← run this
1. Release inventory      ← run this

Instead of one atomic transaction, you have a chain of operations with an undo for each one.
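The chain-with-undo idea can be sketched in a few lines. This is a minimal in-process illustration, not a production saga engine: `Step`, `run_saga`, and the lambdas standing in for service calls are all hypothetical names, and it assumes a failed step leaves nothing behind that needs undoing.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    action: Callable[[], None]      # the local transaction
    compensate: Callable[[], None]  # its undo operation


def run_saga(steps: List[Step]) -> bool:
    """Run steps in order; on failure, undo completed steps in reverse."""
    completed: List[Step] = []
    for step in steps:
        try:
            step.action()
        except Exception:
            # Assumption: the failed step changed nothing, so only the
            # steps that completed before it are compensated.
            for done in reversed(completed):
                done.compensate()
            return False
        completed.append(step)
    return True


log: List[str] = []


def fail_shipment() -> None:
    raise RuntimeError("shipment service unavailable")


# Stand-ins for the three service calls in the checkout example.
checkout = [
    Step("inventory", lambda: log.append("reserve inventory"),
         lambda: log.append("release inventory")),
    Step("payment", lambda: log.append("charge payment"),
         lambda: log.append("refund payment")),
    Step("shipment", fail_shipment, lambda: log.append("cancel shipment")),
]

succeeded = run_saga(checkout)
```

After the run, `succeeded` is `False` and `log` holds the two forward operations followed by their compensations in reverse order, matching the checkout trace above.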

Choreography: Event-Driven Sagas

Each service listens for events and reacts. No central coordinator.

plaintext

Order Service: creates order → publishes "OrderCreated"
     ↓
Inventory Service: receives "OrderCreated" → reserves stock → publishes "InventoryReserved"
     ↓
Payment Service: receives "InventoryReserved" → charges card → publishes "PaymentCompleted"
     ↓
Shipping Service: receives "PaymentCompleted" → creates shipment → publishes "OrderFulfilled"
 
On failure:
Payment Service: charge fails → publishes "PaymentFailed"
     ↓
Inventory Service: receives "PaymentFailed" → releases stock → publishes "InventoryReleased"
     ↓
Order Service: receives "InventoryReleased" → marks order failed

python

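# Inventory service: Kafka consumer (kafka-python)
from kafka import KafkaConsumer, KafkaProducer
import json

# Subscribe to every topic this service reacts to. "payment-events" is an
# assumed topic name for the Payment Service's events.
consumer = KafkaConsumer(
    "order-events",
    "payment-events",
    bootstrap_servers="kafka:9092",
    group_id="inventory-service",
)
producer = KafkaProducer(bootstrap_servers="kafka:9092")

# reserve_stock() and release_stock() are this service's own local
# transactions against its database (not shown).
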
for message in consumer:
    event = json.loads(message.value)
    
    if event["type"] == "OrderCreated":
        order_id = event["order_id"]
        items = event["items"]
        
        if reserve_stock(items):
            producer.send("inventory-events", json.dumps({
                "type": "InventoryReserved",
                "order_id": order_id
            }).encode())
        else:
            producer.send("inventory-events", json.dumps({
                "type": "InventoryReservationFailed",
                "order_id": order_id,
                "reason": "Out of stock"
            }).encode())
    
    elif event["type"] == "PaymentFailed":
        order_id = event["order_id"]
        release_stock(order_id)  # Compensating transaction
        producer.send("inventory-events", json.dumps({
            "type": "InventoryReleased",
            "order_id": order_id
        }).encode())

Advantage: Loose coupling. Services don't know about each other, only about the events.

Disadvantage: Hard to follow the flow. Business logic is scattered across services, so debugging means piecing together one order's path from several services' logs.
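To see the failure path end to end without a broker, the flow can be simulated with an in-memory queue. This is only a sketch: the `on`, `publish`, and `drain` helpers are hypothetical stand-ins for a real message bus, and the payment handler is hard-coded to fail so the compensation chain runs.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List

Event = Dict[str, str]
handlers: Dict[str, List[Callable[[Event], None]]] = defaultdict(list)
queue: Deque[Event] = deque()
trace: List[str] = []           # event types in publication order
order_status: Dict[str, str] = {}


def publish(event_type: str, order_id: str) -> None:
    trace.append(event_type)
    queue.append({"type": event_type, "order_id": order_id})


def on(event_type: str):
    """Register a handler for one event type."""
    def register(fn):
        handlers[event_type].append(fn)
        return fn
    return register


@on("OrderCreated")
def inventory_reserve(event: Event) -> None:
    publish("InventoryReserved", event["order_id"])


@on("InventoryReserved")
def payment_charge(event: Event) -> None:
    # Simulate a declined card to exercise the failure path.
    publish("PaymentFailed", event["order_id"])


@on("PaymentFailed")
def inventory_release(event: Event) -> None:
    # Compensating transaction for the earlier reservation.
    publish("InventoryReleased", event["order_id"])


@on("InventoryReleased")
def order_mark_failed(event: Event) -> None:
    order_status[event["order_id"]] = "FAILED"


def drain() -> None:
    """Deliver queued events until no handler publishes anything new."""
    while queue:
        event = queue.popleft()
        for handler in handlers[event["type"]]:
            handler(event)


publish("OrderCreated", "order-123")
drain()
```

No handler calls another directly; each one only knows the event it consumes and the event it emits, which is the loose coupling described above. The `trace` list is also the only place the whole flow is visible, which hints at why tracing a real choreographed saga takes extra tooling.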

Orchestration: Centralized Coordinator

A central saga orchestrator tells each service what to do and handles failures.

python

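# Checkout Saga Orchestrator
# inventory_service, payment_service, and shipping_service are async clients
# for the three services (not shown).
class SagaFailedException(Exception):
    """Raised after a step fails and its compensation has run."""

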
class CheckoutSaga:
    def __init__(self, order_id: str):
        self.order_id = order_id
        self.state = "STARTED"
    
    async def execute(self, order_data: dict):
        # Step 1: Reserve inventory
        try:
            self.state = "RESERVING_INVENTORY"
            await inventory_service.reserve(self.order_id, order_data["items"])
        except Exception as e:
            await self.compensate(step="before_inventory")
            raise SagaFailedException(f"Inventory failed: {e}")
        
        # Step 2: Charge payment
        try:
            self.state = "CHARGING_PAYMENT"
            await payment_service.charge(self.order_id, order_data["amount"])
        except Exception as e:
            await self.compensate(step="before_payment")
            raise SagaFailedException(f"Payment failed: {e}")
        
        # Step 3: Create shipment
        try:
            self.state = "CREATING_SHIPMENT"
            await shipping_service.create(self.order_id, order_data["address"])
        except Exception as e:
            await self.compensate(step="before_shipment")
            raise SagaFailedException(f"Shipping failed: {e}")
        
        self.state = "COMPLETED"
    
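    async def compensate(self, step: str):
        # `step` names the step that failed; undo every step that completed
        # before it, most recent first.
        if step == "before_shipment":
            # Shipment failed: payment and inventory both succeeded
            await payment_service.refund(self.order_id)
            await inventory_service.release(self.order_id)
        elif step == "before_payment":
            # Payment failed: only inventory was reserved
            await inventory_service.release(self.order_id)
        # before_inventory: nothing to undo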
 
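# Usage: `await` needs a running event loop, so wrap the call in a coroutine
import asyncio

async def main():
    # Placeholder values; the keys match what execute() reads
    order_data = {"items": ["product-x"], "amount": 100, "address": "1 Example St"}
    saga = CheckoutSaga(order_id="order-123")
    try:
        await saga.execute(order_data)
    except SagaFailedException as e:
        print(f"Order failed and rolled back: {e}")

asyncio.run(main())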

Advantage: Business logic in one place. Easy to see the full flow.

Disadvantage: The orchestrator is a single point of failure and can become a bottleneck.

Compensating Transactions: The Key Concept

Compensating transactions are the "undo" operations. They're not rollbacks: the original step has already committed, so the compensation is a new, forward-moving transaction that reverses its effect.

plaintext

Original: Charge $100 to card
Compensating: Refund $100 to card
 
Original: Reserve 5 units of product X
Compensating: Release 5 units of product X
 
Original: Send welcome email
Compensating: ??? (can't unsend an email)

Some operations can't be compensated. Sending an email is irreversible. The design choice: either accept this (some side effects are acceptable), or move irreversible steps to the end of the saga after all reversible steps succeed.
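The "irreversible steps go last" rule can be checked mechanically when a saga is defined. The sketch below assumes a saga is described as a list of (step name, compensation name) pairs, with `None` marking a step that has no compensation; the step names are illustrative only.

```python
from typing import List, Optional, Tuple

# (step name, compensating action name, or None if the step is irreversible)
StepSpec = Tuple[str, Optional[str]]


def irreversible_steps_are_last(steps: List[StepSpec]) -> bool:
    """True if no compensable step comes after an irreversible one."""
    seen_irreversible = False
    for _name, compensation in steps:
        if compensation is None:
            seen_irreversible = True
        elif seen_irreversible:
            # A later step could still fail after the irreversible one ran.
            return False
    return True


risky = [
    ("create_account", "delete_account"),
    ("send_welcome_email", None),
    ("provision_resources", "deprovision_resources"),
]

# Same steps, with the email moved to the end.
safe = [risky[0], risky[2], risky[1]]

risky_ok = irreversible_steps_are_last(risky)
safe_ok = irreversible_steps_are_last(safe)
```

In the `risky` ordering, a failure while provisioning resources would leave a welcome email already sent for an account that is about to be deleted. The check accepts several irreversible steps at the end, so it does not cover the case where one irreversible step fails after another has already run.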

Keeping Track of Saga State

The orchestrator needs to persist its state — if it crashes mid-saga, it must be able to resume or compensate.

python

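# Store saga state in database (SQLAlchemy)
from datetime import datetime, timezone

from sqlalchemy import JSON, Column, DateTime, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class SagaState(Base):
    __tablename__ = "saga_states"

    saga_id = Column(String, primary_key=True)
    order_id = Column(String, index=True)
    current_step = Column(String)  # "RESERVING_INVENTORY", "CHARGING_PAYMENT", etc.
    status = Column(String)  # "RUNNING", "COMPLETED", "COMPENSATING", "FAILED"
    steps_completed = Column(JSON)  # ["inventory_reserved", "payment_charged"]
    created_at = Column(DateTime(timezone=True))
    updated_at = Column(DateTime(timezone=True))

# Before each step: update state and commit, so a crash during the step
# leaves a record of where the saga was.
# `db` is an open SQLAlchemy Session; `saga_id` identifies the running saga.
db.query(SagaState).filter_by(saga_id=saga_id).update({
    "current_step": "CHARGING_PAYMENT",
    "updated_at": datetime.now(timezone.utc)
})
db.commit()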

If the orchestrator restarts, it reads the database, sees which step it was on, and decides whether to continue or compensate.
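One possible recovery policy, sketched as a pure function over the persisted fields: on restart, never resume, and always compensate whatever had completed. The `COMPENSATIONS` mapping and `plan_recovery` are hypothetical names. The sketch assumes compensations are safe to run more than once (a saga that crashed while `COMPENSATING` may repeat some), and it does not deal with a step that was in flight when the crash happened.

```python
from typing import Dict, List

# Hypothetical mapping from a completed-step marker to its compensating action.
COMPENSATIONS: Dict[str, str] = {
    "inventory_reserved": "release_inventory",
    "payment_charged": "refund_payment",
    "shipment_created": "cancel_shipment",
}


def plan_recovery(status: str, steps_completed: List[str]) -> List[str]:
    """Return the compensating actions to run after a restart, in order."""
    if status in ("COMPLETED", "FAILED"):
        return []  # the saga already reached a terminal state
    # RUNNING or COMPENSATING: undo whatever finished, newest first.
    return [COMPENSATIONS[step] for step in reversed(steps_completed)]


plan = plan_recovery("RUNNING", ["inventory_reserved", "payment_charged"])
```

Here `plan` lists the refund before the inventory release, mirroring the reverse order used during normal compensation. A policy that resumes forward instead would need each step to be safely repeatable, which is a separate design decision.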

Saga vs 2-Phase Commit

Aspect             Saga           2-Phase Commit
Consistency        Eventual       Strong
Complexity         Medium         Very high
Performance        Fast           Slow (blocking)
Failure tolerance  Good           Coordinator failure = stuck
Use case           Microservices  Legacy distributed DBs

Sagas are eventually consistent — there's a window where one service has committed but another hasn't yet. For most business workflows (order processing, user registration), this is acceptable.

For financial ledgers where every cent must balance perfectly, you may need stronger guarantees.

When to Use Sagas

Good fit:

Multi-step business processes spanning multiple services
Order processing (inventory + payment + fulfillment)
User registration (create account + send email + provision resources)
Travel booking (flight + hotel + car)

Bad fit:

Simple two-service coordination (direct call is fine)
High-frequency operations (saga overhead adds up)
When you need immediate consistency

Key Takeaways

Sagas replace distributed transactions by chaining local transactions with compensating operations
Choreography: services react to events — loose coupling, hard to debug
Orchestration: central coordinator calls services; easier to trace, but a single point of failure
Compensating transactions reverse earlier steps with new forward operations (they're not rollbacks)
Persist saga state so it can resume or compensate after crashes
Some operations (like sending emails) can't be compensated — put them last
Sagas give you eventual consistency, not immediate consistency

Sagas add complexity. Use them when a multi-service workflow needs reliability, not just convenience.
