SAGA Pattern in Microservices — A Complete Guide with Order Processing Example
Deep-dive into the SAGA pattern: why distributed transactions need it, Choreography vs Orchestration, an order processing walkthrough with code, compensating transactions, real .NET examples, advantages, disadvantages, and production pitfalls.
- Author: Randhir Jassal
- Reading time: 16 min read
The SAGA pattern is how you maintain data consistency across multiple microservices when a traditional database transaction (ACID, two-phase commit) is not an option. It breaks a single business transaction into a sequence of local transactions, each owned by one service. If any step fails, prior steps are reversed via compensating transactions.
This guide is the complete picture: why SAGA exists, the two implementation flavors, an order-processing walkthrough with real .NET code, the operational reality, and where this pattern fails.
Why SAGA exists
In a monolith, "place an order" looks like this:
BEGIN;
INSERT INTO orders (...) VALUES (...);
UPDATE inventory SET stock = stock - 1 WHERE sku = 'X';
INSERT INTO payments (...) VALUES (...);
UPDATE customers SET loyalty_points = loyalty_points + 10 WHERE id = ...;
COMMIT;
One database, one transaction. Atomic. Either all rows commit or none.
In microservices, every service owns its own database. Cross-service queries are forbidden. So "place an order" becomes:
Order Service → INSERT INTO orders ...
Inventory Service → UPDATE stock ...
Payment Service → INSERT INTO payments ...
Customer Service → UPDATE loyalty_points ...
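Four databases. No local transaction can span them. Two-phase commit (2PC) could coordinate them, but every participant holds its locks until the coordinator decides, and a coordinator failure leaves them blocked, which makes it too brittle at scale. SAGA gives up cross-service atomicity and isolation in exchange for eventual consistency plus explicit compensation.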
Architecture in one diagram
SAGA — Order Processing
───────────────────────
┌────────────┐  create  ┌────────────┐  reserve ┌────────────┐  charge  ┌────────────┐
│            │  order   │            │  stock   │            │  card    │            │
│   Order    │─────────▶│ Inventory  │─────────▶│  Payment   │─────────▶│  Shipping  │
│  Service   │          │  Service   │          │  Service   │          │  Service   │
│            │          │            │          │            │          │            │
└────────────┘          └────────────┘          └────────────┘          └────────────┘
      │                       │                       │                       │
      │                       │                       │                       │
      ▼ on failure            ▼ on failure            ▼ on failure            ▼ ✓ done
┌────────────┐          ┌────────────┐          ┌────────────┐
│   cancel   │          │  release   │          │   refund   │   COMPENSATING
│   order    │◀─────────│   stock    │◀─────────│  payment   │   TRANSACTIONS
│            │          │            │          │            │
└────────────┘          └────────────┘          └────────────┘
forward path (left → right): each service does its local work, then triggers the next.
reverse path (right → left): if any step fails, prior steps are undone in reverse order.
Three things to internalize:
- Each step is a local ACID transaction in one database. No distributed lock.
- Each forward step has a defined compensating step. "Reserved 1 unit" has a compensation: "release 1 unit".
- Compensations run in reverse order. If payment fails, you release stock THEN cancel the order — same order the forward path took, reversed.
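The three points above can be sketched as a small runner. This is a minimal illustration rather than a production orchestrator: `SagaStep` and `SagaRunner` are hypothetical names, nothing is persisted, and it assumes a step that throws has made no durable change of its own.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical pairing of a forward action with its compensation.
public sealed record SagaStep(string Name, Func<Task> Execute, Func<Task> Compensate);

public static class SagaRunner
{
    // Runs the steps in order. On the first failure, compensates the steps that
    // already completed, newest first, and returns false.
    public static async Task<bool> RunAsync(IReadOnlyList<SagaStep> steps)
    {
        var completed = new Stack<SagaStep>();
        foreach (var step in steps)
        {
            try
            {
                await step.Execute();
                completed.Push(step);
            }
            catch (Exception)
            {
                // Assumption: the failed step itself left nothing behind,
                // so only the earlier steps need to be undone.
                while (completed.Count > 0)
                    await completed.Pop().Compensate();
                return false;
            }
        }
        return true;
    }
}
```

The stack is what enforces the reverse order: if "create order" and "reserve stock" succeed and "charge card" throws, the runner releases the stock first and cancels the order second.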
The two flavors — Choreography vs Orchestration
Choreography (event-driven, decentralized)
Each service publishes events. Other services listen and react.
Order Service                    Event Bus                    Inventory Service
      │                              │                                │
      │ ──OrderPlaced──────────────▶ │                                │
      │                              │ ──OrderPlaced────────────────▶ │
      │                              │                                │ reserves stock
      │                              │ ◀──StockReserved────────────── │
      │                              │ ──StockReserved──────────────▶ │ (to Payment)
      │                              │
No central coordinator. Every service knows its part. Loose coupling, but the workflow is invisible — to understand "place an order" end-to-end you have to read N service codebases.
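To make the event flow concrete, here is a rough in-process sketch of choreography. `InMemoryEventBus` and `ChoreographyWiring` are hypothetical names, the bus is synchronous, and a real system would use a message broker with durable delivery; the `PaymentCharged` event is an assumed continuation of the flow in the diagram.

```csharp
using System;
using System.Collections.Generic;

public record OrderPlaced(Guid OrderId);
public record StockReserved(Guid OrderId);
public record PaymentCharged(Guid OrderId);

// Synchronous in-memory stand-in for a message broker (an assumption for this sketch).
public sealed class InMemoryEventBus
{
    private readonly Dictionary<Type, List<Action<object>>> _handlers = new();

    // Names of published events, in publication order, for inspection.
    public List<string> Published { get; } = new();

    public void Subscribe<T>(Action<T> handler) where T : class
    {
        if (!_handlers.TryGetValue(typeof(T), out var list))
        {
            list = new List<Action<object>>();
            _handlers[typeof(T)] = list;
        }
        list.Add(evt => handler((T)evt));
    }

    public void Publish<T>(T evt) where T : class
    {
        Published.Add(typeof(T).Name);
        if (_handlers.TryGetValue(typeof(T), out var list))
            foreach (var handler in list)
                handler(evt);
    }
}

public static class ChoreographyWiring
{
    // Each service subscribes to the event it cares about and publishes its own
    // result; no single component holds the whole workflow.
    public static InMemoryEventBus Build()
    {
        var bus = new InMemoryEventBus();
        // Inventory service: reserves stock when an order is placed.
        bus.Subscribe<OrderPlaced>(e => bus.Publish(new StockReserved(e.OrderId)));
        // Payment service: charges the card once stock is reserved.
        bus.Subscribe<StockReserved>(e => bus.Publish(new PaymentCharged(e.OrderId)));
        return bus;
    }
}
```

Publishing a single `OrderPlaced` on the bus returned by `Build()` cascades through both subscribers. Note that the sequence only becomes visible by reading every subscription, which is the visibility cost described above.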
Orchestration (central coordinator)
A dedicated SAGA orchestrator (often a state machine) sends commands and tracks state.
            ┌───────────────────┐
            │    Order Saga     │
            │   Orchestrator    │
            │  (state machine)  │
            └─┬───────┬───────┬─┘
              │       │       │
   reserve───▶│       │       │◀───stockReserved
   stock      │       │       │
              ▼       ▼       ▼
        ┌────────┐ ┌──────┐ ┌─────────┐
        │ Order  │ │ Inv. │ │ Payment │
        └────────┘ └──────┘ └─────────┘
One service is the source of truth for the workflow. Easier to reason about, easier to monitor. But it becomes a single point of complexity.
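One way to picture the "state machine" part is as an explicit transition table. The sketch below is illustrative only: `SagaState`, `SagaEvent`, and `OrderSagaTransitions` are hypothetical names, and the set of events is an assumption about what the participating services would report back.

```csharp
using System;
using System.Collections.Generic;

public enum SagaState { Started, StockReserved, PaymentCharged, Completed, Compensating, Compensated }
public enum SagaEvent { StockReserved, PaymentCharged, ShippingScheduled, StepFailed, CompensationDone }

public static class OrderSagaTransitions
{
    // Every legal (state, event) pair and the state it leads to.
    private static readonly Dictionary<(SagaState, SagaEvent), SagaState> Table = new()
    {
        [(SagaState.Started, SagaEvent.StockReserved)] = SagaState.StockReserved,
        [(SagaState.StockReserved, SagaEvent.PaymentCharged)] = SagaState.PaymentCharged,
        [(SagaState.PaymentCharged, SagaEvent.ShippingScheduled)] = SagaState.Completed,
        [(SagaState.Started, SagaEvent.StepFailed)] = SagaState.Compensating,
        [(SagaState.StockReserved, SagaEvent.StepFailed)] = SagaState.Compensating,
        [(SagaState.PaymentCharged, SagaEvent.StepFailed)] = SagaState.Compensating,
        [(SagaState.Compensating, SagaEvent.CompensationDone)] = SagaState.Compensated,
    };

    // Returns the next state, or throws if the event is not allowed in the current state.
    public static SagaState Next(SagaState current, SagaEvent evt) =>
        Table.TryGetValue((current, evt), out var next)
            ? next
            : throw new InvalidOperationException($"Event {evt} is not valid in state {current}.");
}
```

Keeping the table in one place is what makes the workflow easy to read and monitor; anything not listed (for example, a payment confirmation arriving for a saga that is still in `Started`) is rejected instead of silently applied.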
Which to pick
| Decision factor | Choreography | Orchestration |
|---|---|---|
| Workflow visibility | Low — distributed | High — single state machine |
| Coupling | Loose | Tighter (orchestrator knows everyone) |
| Best when | 2-3 steps, stable workflow | 4+ steps, evolving workflow |
| Failure tracing | Hard (multi-service logs) | Easy (one log timeline) |
| Adding a new step | New event subscription per service | Edit the orchestrator state machine |
Rule of thumb: start with Choreography for simple workflows. As complexity grows past ~3 steps, migrate to Orchestration.
Order processing — full code (Orchestration flavor, .NET)
Step 1 — Define the saga state and events
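// Forward states are declared in the order the saga reaches them.
public enum OrderSagaState
{
    Started,
    StockReserved,
    PaymentCharged,
    ShippingScheduled,
    Completed,
    Failed,
    Compensating,
    Compensated
}

// LineItem is assumed to expose Price and Quantity (both are used to compute Total).
public record OrderSaga(
    Guid SagaId,
    Guid OrderId,
    Guid CustomerId,
    List<LineItem> Items,
    decimal Total,
    OrderSagaState State,
    DateTimeOffset StartedAt
);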
Step 2 — The orchestrator
using Microsoft.Extensions.Logging; // ILogger<T>

public class OrderSagaOrchestrator(
    ISagaRepository repo,
    IInventoryClient inventory,
    IPaymentClient payments,
    IShippingClient shipping,
    IOrderClient orders,
    ILogger<OrderSagaOrchestrator> log)
{
    private readonly ISagaRepository _repo = repo;
    private readonly IInventoryClient _inventory = inventory;
    private readonly IPaymentClient _payments = payments;
    private readonly IShippingClient _shipping = shipping;
    private readonly IOrderClient _orders = orders;
    private readonly ILogger<OrderSagaOrchestrator> _log = log;

    public async Task<OrderSaga> StartAsync(PlaceOrderCommand cmd, CancellationToken ct)
    {
        // Note: the call that creates the pending order in the Order service is not shown.
        var saga = new OrderSaga(
            SagaId: Guid.NewGuid(),
            OrderId: Guid.NewGuid(),
            CustomerId: cmd.CustomerId,
            Items: cmd.Items,
            Total: cmd.Items.Sum(i => i.Price * i.Quantity),
            State: OrderSagaState.Started,
            StartedAt: DateTimeOffset.UtcNow);
        // Persist before the first remote call so a crash leaves a recoverable record.
        await _repo.SaveAsync(saga, ct);
        return await StepReserveStockAsync(saga, ct);
    }

    private async Task<OrderSaga> StepReserveStockAsync(OrderSaga saga, CancellationToken ct)
    {
        // Only the remote call sits inside the try, so a failure in a later step
        // is never misreported (and mishandled) as a stock failure.
        try
        {
            await _inventory.ReserveAsync(saga.OrderId, saga.Items, ct);
        }
        catch (Exception ex)
        {
            _log.LogWarning(ex, "stock reservation failed for saga {SagaId}", saga.SagaId);
            return await FailAsync(saga, "stock_unavailable", ct);
        }
        saga = saga with { State = OrderSagaState.StockReserved };
        await _repo.SaveAsync(saga, ct);
        return await StepChargePaymentAsync(saga, ct);
    }

    private async Task<OrderSaga> StepChargePaymentAsync(OrderSaga saga, CancellationToken ct)
    {
        try
        {
            await _payments.ChargeAsync(saga.OrderId, saga.CustomerId, saga.Total, ct);
        }
        catch (Exception ex)
        {
            _log.LogWarning(ex, "payment failed for saga {SagaId}", saga.SagaId);
            return await CompensateAsync(saga, "payment_declined", ct);
        }
        saga = saga with { State = OrderSagaState.PaymentCharged };
        await _repo.SaveAsync(saga, ct);
        return await StepScheduleShippingAsync(saga, ct);
    }

    private async Task<OrderSaga> StepScheduleShippingAsync(OrderSaga saga, CancellationToken ct)
    {
        try
        {
            await _shipping.ScheduleAsync(saga.OrderId, ct);
        }
        catch (Exception ex)
        {
            _log.LogWarning(ex, "shipping failed for saga {SagaId}", saga.SagaId);
            return await CompensateAsync(saga, "shipping_failed", ct);
        }
        // Shipping is booked: from here on the saga only moves forward.
        saga = saga with { State = OrderSagaState.ShippingScheduled };
        await _repo.SaveAsync(saga, ct);
        await _orders.MarkCompletedAsync(saga.OrderId, ct);
        saga = saga with { State = OrderSagaState.Completed };
        await _repo.SaveAsync(saga, ct);
        return saga;
    }

    private async Task<OrderSaga> CompensateAsync(OrderSaga saga, string reason, CancellationToken ct)
    {
        // Capture how far the forward path got BEFORE overwriting the state.
        // Checking after the switch to Compensating would refund payments that were never charged.
        var reached = saga.State;
        saga = saga with { State = OrderSagaState.Compensating };
        await _repo.SaveAsync(saga, ct);

        // Reverse order: payment refund → stock release → order cancel
        if (reached >= OrderSagaState.PaymentCharged)
            await _payments.RefundAsync(saga.OrderId, saga.Total, ct);
        if (reached >= OrderSagaState.StockReserved)
            await _inventory.ReleaseAsync(saga.OrderId, saga.Items, ct);
        await _orders.CancelAsync(saga.OrderId, reason, ct);

        saga = saga with { State = OrderSagaState.Compensated };
        await _repo.SaveAsync(saga, ct);
        return saga;
    }

    // Used when the very first step fails: nothing to undo except the order itself.
    private async Task<OrderSaga> FailAsync(OrderSaga saga, string reason, CancellationToken ct)
    {
        saga = saga with { State = OrderSagaState.Failed };
        await _orders.CancelAsync(saga.OrderId, reason, ct);
        await _repo.SaveAsync(saga, ct);
        return saga;
    }
}
Step 3 — Each service exposes both forward and compensating endpoints
// Inventory service
[ApiController]
[Route("inventory")]
public class InventoryController(InventoryService svc) : ControllerBase
{
    [HttpPost("reserve")]
    public Task ReserveAsync(ReserveStockRequest req) => svc.ReserveAsync(req.OrderId, req.Items);

    [HttpPost("release")]
    public Task ReleaseAsync(ReleaseStockRequest req) => svc.ReleaseAsync(req.OrderId, req.Items);
}
The key insight: every "do X" endpoint has a paired "undo X" endpoint. That pairing is the SAGA contract.
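What might sit behind such a pair of endpoints? The following is a hedged, in-memory sketch and not the `InventoryService` used above: `InMemoryInventory` is a hypothetical class, items are simplified to a SKU-to-quantity map, and a real service would do this inside a local database transaction.

```csharp
using System;
using System.Collections.Generic;

public sealed class InMemoryInventory
{
    private readonly Dictionary<string, int> _stock;
    // Reservations are keyed by order ID so Release knows exactly what to give back.
    private readonly Dictionary<Guid, Dictionary<string, int>> _reservations = new();

    public InMemoryInventory(IDictionary<string, int> initialStock) =>
        _stock = new Dictionary<string, int>(initialStock);

    public int Available(string sku) => _stock.TryGetValue(sku, out var qty) ? qty : 0;

    // Forward step: reserve every item of the order, or none of them.
    public void Reserve(Guid orderId, IReadOnlyDictionary<string, int> items)
    {
        if (_reservations.ContainsKey(orderId)) return; // duplicate request: already reserved
        foreach (var (sku, qty) in items)
            if (Available(sku) < qty)
                throw new InvalidOperationException($"Insufficient stock for {sku}.");
        foreach (var (sku, qty) in items)
            _stock[sku] -= qty;
        _reservations[orderId] = new Dictionary<string, int>(items);
    }

    // Compensating step: return whatever this order had reserved, if anything.
    public void Release(Guid orderId)
    {
        if (!_reservations.Remove(orderId, out var reserved)) return; // nothing to undo
        foreach (var (sku, qty) in reserved)
            _stock[sku] = Available(sku) + qty;
    }
}
```

Because the reservation is stored under the order ID, the undo does not need to be told what to release, and calling either method twice for the same order leaves the stock unchanged the second time.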
Compensating transactions — the hard part
A compensation is not a database ROLLBACK. The original transaction already committed. Compensation is a new transaction that semantically reverses the prior effect.
| Forward | Compensation |
|---|---|
| Reserve 2 units of stock | Release 2 units of stock |
| Charge ₹1,500 to card | Refund ₹1,500 |
| Send "order confirmed" email | Send "order cancelled" email |
| Decrement loyalty points | Increment loyalty points |
Important properties of good compensating transactions:
- Idempotent — running it twice has the same effect as running it once. (Retries are inevitable.)
- Always succeeds (or has its own retry policy) — a failed compensation leaks "ghost" reservations.
- Can be safely delayed — sometimes the compensation runs minutes after the failure.
- Audit-logged — you must be able to explain "this order was refunded because shipping failed at 14:32".
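As an illustration of the first and last properties, here is a minimal sketch of a refund compensation that is idempotent and leaves an audit entry. `RefundCompensation` and `AuditEntry` are hypothetical names, and the in-memory set stands in for what would be a uniqueness constraint in the payment service's own database.

```csharp
using System;
using System.Collections.Generic;

public sealed record AuditEntry(Guid OrderId, decimal Amount, string Reason, DateTimeOffset At);

public sealed class RefundCompensation
{
    private readonly HashSet<Guid> _refundedOrders = new();
    private readonly List<AuditEntry> _audit = new();

    public IReadOnlyList<AuditEntry> Audit => _audit;
    public decimal TotalRefunded { get; private set; }

    // Returns true if money moved, false if this order had already been refunded.
    public bool Refund(Guid orderId, decimal amount, string reason, DateTimeOffset now)
    {
        // HashSet.Add returns false for a duplicate, which makes a retry a no-op.
        if (!_refundedOrders.Add(orderId)) return false;
        TotalRefunded += amount;
        _audit.Add(new AuditEntry(orderId, amount, reason, now));
        return true;
    }
}
```

A retried call with the same order ID returns `false` and changes nothing, while the single audit entry records the amount, the reason (for example `"shipping_failed"`), and the time.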
When SAGA pays off
- Workflows spanning 3+ services, where you cannot pessimistically lock all of them
- Long-running business processes (multi-second or multi-minute)
- E-commerce checkout, travel booking (flight + hotel + car), insurance claims processing
- When 2PC would lock too many resources during peak load
- When you accept eventual consistency for a short window
When SAGA is the wrong tool
- Workflows fully inside one bounded context — use a database transaction
- Read-heavy operations — there's nothing to compensate
- Strict "everything atomic, no observer ever sees intermediate state" requirements (for example, postings to a core banking ledger): use 2PC, or keep those writes inside one service and one database
- Workflows where compensation is impossible (irreversible side effects like a physical shipment leaving the warehouse): if you still use a saga, order it so the irreversible step runs last, after every step that can fail has already succeeded
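The concern about irreversible side effects can be turned into a simple design-time check. The sketch below is an assumption-laden illustration: `PlannedStep` and `SagaPlan` are hypothetical names, and the `CanFail` and `Compensable` flags are something the team would have to assign by hand.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical description of a step, filled in by the saga's designers.
public sealed record PlannedStep(string Name, bool CanFail, bool Compensable);

public static class SagaPlan
{
    // A plan is safe when no step that can fail runs after a step that cannot be undone.
    public static bool IsSafeOrder(IReadOnlyList<PlannedStep> steps)
    {
        var passedPointOfNoReturn = false;
        foreach (var step in steps)
        {
            if (passedPointOfNoReturn && step.CanFail) return false;
            if (!step.Compensable) passedPointOfNoReturn = true;
        }
        return true;
    }
}
```

With this rule, "reserve stock, charge card, dispatch shipment" passes because the irreversible dispatch comes last, while "dispatch shipment, charge card" is rejected: a declined card would leave a shipment that cannot be recalled.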
Advantages
- Scales horizontally — each service can be deployed, scaled, failed independently
- No distributed locks — avoids the 2PC liveness traps
- Resilient — a saga can resume mid-workflow after a crash (state is durable)
- Audit trail by design — the saga state log IS the audit log
- Polyglot friendly — each service can use its own database tech
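The "resume after a crash" point can be sketched as a lookup from the last persisted state to the next action. This is a simplified illustration with hypothetical names (`PersistedState`, `ResumeAction`, `SagaRecovery`); it assumes every step is idempotent, because the crash may have happened after a remote call succeeded but before the new state was saved.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum PersistedState { Started, StockReserved, PaymentCharged, Completed, Compensating, Compensated, Failed }
public enum ResumeAction { ReserveStock, ChargePayment, ScheduleShipping, ContinueCompensation, Nothing }

public sealed record PersistedSaga(Guid SagaId, PersistedState State);

public static class SagaRecovery
{
    // The persisted state says which step finished last, so the next one is replayed.
    public static ResumeAction NextAction(PersistedState state) => state switch
    {
        PersistedState.Started => ResumeAction.ReserveStock,
        PersistedState.StockReserved => ResumeAction.ChargePayment,
        PersistedState.PaymentCharged => ResumeAction.ScheduleShipping,
        PersistedState.Compensating => ResumeAction.ContinueCompensation,
        _ => ResumeAction.Nothing, // Completed, Compensated, Failed are terminal
    };

    // On startup: list the unfinished sagas together with what to do for each.
    public static IEnumerable<(Guid SagaId, ResumeAction Action)> Plan(IEnumerable<PersistedSaga> sagas) =>
        sagas.Select(s => (SagaId: s.SagaId, Action: NextAction(s.State)))
             .Where(p => p.Action != ResumeAction.Nothing);
}
```

A worker that runs `Plan` over the saga table at startup gets exactly the sagas that were in flight when the process died; terminal sagas are skipped.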
Disadvantages
- Eventual consistency window — readers may see intermediate states (order exists but not yet paid)
- Compensation logic doubles your code — every "do X" needs an "undo X"
- Some operations are not compensable — once a notification email is sent, you can't unsend it
- Debugging is harder — failure trace spans many services and many minutes
- Idempotency is non-negotiable — retries cause double-charges if not handled
Production checklist
- Persist the saga state durably before each step. Crash recovery depends on it.
- Idempotency keys on every external call — Payment.Charge(orderId, key=hash(saga, step)).
- Timeouts on every step — a hung service should not pause the saga indefinitely.
- Dead-letter queue for permanent failures — when even compensation fails, surface to ops.
- Saga timeout — if a saga takes 24h, ops should know. Alert on stuck sagas.
- Distributed tracing: every saga step must carry a saga_id in the trace context.
- Observability: dashboards for saga durations, failure rates, compensation rates.
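For the idempotency-key item, one possible reading of `hash(saga, step)` is a deterministic hash over the saga ID and the step name. The helper below is a sketch under that assumption; `IdempotencyKey` is a hypothetical name, and it relies on `SHA256.HashData` and `Convert.ToHexString`, which are available from .NET 5 onward.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class IdempotencyKey
{
    // Same saga and step always yield the same key, so a retried call can be
    // recognised by the receiving service; a different step yields a different key.
    public static string For(Guid sagaId, string step)
    {
        var bytes = Encoding.UTF8.GetBytes($"{sagaId:N}:{step}");
        return Convert.ToHexString(SHA256.HashData(bytes));
    }
}
```

The key is derived, not random, on purpose: after a crash and restart the orchestrator can recompute it for the step it is replaying without having stored it anywhere.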
Pitfalls to watch for
- Forgotten compensations. Adding a new forward step without its compensating step. Code review must catch this.
- Cascading retries. A retry storm on a single service brings down others. Add circuit breakers.
- Compensating transactions that fail. Have a retry policy + manual review queue.
- Choreography spaghetti. When the workflow grows past 3 steps, refactor to orchestration.
- State machine drift. Orchestrator code and saga state schema fall out of sync. Add tests.
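For the "compensating transactions that fail" pitfall, a retry policy with a manual review fallback could look roughly like this. `CompensationRunner` is a hypothetical class, the in-memory list stands in for a real dead-letter or review queue, and the backoff parameters are placeholders.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public sealed class CompensationRunner
{
    private readonly int _maxAttempts;
    private readonly TimeSpan _baseDelay;

    // Stand-in for a durable queue that operations staff would monitor.
    public List<string> ManualReviewQueue { get; } = new();

    public CompensationRunner(int maxAttempts, TimeSpan baseDelay)
    {
        _maxAttempts = maxAttempts;
        _baseDelay = baseDelay;
    }

    // Returns true once the compensation succeeds; after the last failed attempt
    // it records the failure for manual review and returns false.
    public async Task<bool> RunAsync(string name, Func<Task> compensate)
    {
        for (var attempt = 1; attempt <= _maxAttempts; attempt++)
        {
            try
            {
                await compensate();
                return true;
            }
            catch (Exception) when (attempt < _maxAttempts)
            {
                // Exponential backoff: baseDelay, then 2x, 4x, ...
                await Task.Delay(_baseDelay * Math.Pow(2, attempt - 1));
            }
            catch (Exception ex)
            {
                ManualReviewQueue.Add($"{name}: {ex.Message}");
            }
        }
        return false;
    }
}
```

The compensation passed in must itself be idempotent, since a timeout on attempt one may hide a call that actually succeeded.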
SAGA vs 2PC vs Eventual Consistency — quick decision
- 2PC (Two-Phase Commit): strong consistency, terrible scalability. Use only when business requires it AND you control all participants.
- SAGA: eventual consistency with explicit recovery. Use for most distributed business workflows.
- Naive eventual consistency (publish and pray): no recovery, no audit. Don't use for money or inventory.
Summary
The SAGA pattern is the practical answer to distributed transactions at microservices scale. You accept that the system passes through intermediate states, and you build explicit compensation paths to recover from any failure.
Start with Choreography if the workflow is small and stable. Move to Orchestration as it grows. Treat compensating transactions as first-class code — write them, test them, monitor them. Persist saga state durably and always carry an idempotency key.
When implemented well, sagas give you the all-or-nothing business outcome of a monolith transaction (eventually, and without its isolation) together with the scaling, deployment, and fault-isolation benefits of microservices. When implemented poorly, you get partial orders, lost money, and customer support tickets.