SAGA vs Two-Phase Commit (2PC) — when does each fit?

Randhir Jassal · Accepted Answer

Both solve "distributed transactions" but make opposite trade-offs.

2PC — Two-Phase Commit

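A coordinator asks every participant "can you commit?" (phase 1, prepare). If all vote yes, it sends COMMIT to all; if any participant votes no or times out, it sends ABORT (phase 2). The outcome is atomic: every participant commits or none does. Each participant holds its locks from the moment it votes yes until it receives the decision.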
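To make the two phases concrete, here is a minimal in-memory sketch. The `Participant` class, its `can_commit` flag, and the `two_phase_commit` function are hypothetical names for illustration only; a real implementation would also write the decision to a durable log and handle timeouts.

```python
from enum import Enum


class Vote(Enum):
    YES = "yes"
    NO = "no"


class Participant:
    """Hypothetical in-memory participant; a real one would hold row locks."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"  # idle -> prepared -> committed | aborted

    def prepare(self):
        # Phase 1: vote. After voting YES, locks stay held until the decision arrives.
        if self.can_commit:
            self.state = "prepared"
            return Vote.YES
        self.state = "aborted"
        return Vote.NO

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Return True if the transaction committed everywhere, else False."""
    # Phase 1: collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    decision = all(v is Vote.YES for v in votes)
    # Phase 2: broadcast the single global decision.
    for p in participants:
        if decision:
            p.commit()
        else:
            p.abort()
    return decision
```

A single NO vote is enough to abort everyone, which is what gives 2PC its all-or-nothing behaviour.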

SAGA

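Each service commits its own local transaction, then triggers the next step with a message or event. No locks are held across services. Consistency is eventual: if a later step fails, the steps that already committed are undone by explicit compensating transactions.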
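The compensation idea can be sketched in a few lines. This is an illustrative in-process version, not a production saga engine: `run_saga` and the order-workflow step names are made up for the example, and real steps would be calls to separate services.

```python
def run_saga(steps):
    """Run (action, compensation) pairs in order. Returns True on success."""
    completed = []  # compensations for steps that already committed
    for action, compensation in steps:
        try:
            action()
        except Exception:
            # A step failed: undo the committed steps in reverse order.
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensation)
    return True


# Hypothetical order workflow: payment fails, so the stock reservation is released.
log = []


def reserve_stock():
    log.append("reserve_stock")


def release_stock():
    log.append("release_stock")


def charge_card():
    raise RuntimeError("card declined")


def refund_card():
    log.append("refund_card")


ok = run_saga([(reserve_stock, release_stock), (charge_card, refund_card)])
```

Note that `refund_card` never runs: only steps that actually completed are compensated.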

Side-by-side

Aspect	2PC	SAGA
Consistency model	Atomic commit (all-or-nothing across participants)	Eventual; intermediate states are visible
Locks held during transaction	Yes, on every participant from prepare until the decision arrives	No, only local locks within each step
Failure when coordinator dies	Prepared participants block until it recovers	Saga state is durably persisted, so the saga resumes
Latency	High (two rounds of messages to every participant, plus durable log writes)	Each step is a local transaction plus a message
Scalability	Poor: held locks limit throughput	Good: steps run asynchronously
Compensation needed	No, rollback is automatic	Yes, each step pairs with an undo
Best for	A few participants under your control	Microservices at scale
Modern usage	Banking core, ERP	E-commerce, travel, SaaS workflows

Why 2PC fell out of favor

Coordinator is a single point of failure. If it dies after participants have voted yes but before they learn the decision, they are blocked in the "prepared" state, unable to commit OR roll back on their own. They stay blocked until the coordinator recovers from its log, which in practice often means manual intervention.
Locks block other transactions. Inventory rows locked for the duration of the protocol kill throughput.
Network partitions stall the protocol. The CAP theorem says you can't have both consistency and availability under partition; 2PC picks consistency, so the affected participants are unavailable until the partition heals.
Cross-database support is uneven. XA transactions are supported by many relational databases (Oracle, SQL Server, PostgreSQL, MySQL) but are weak or absent on cloud-native stores (MongoDB, DynamoDB, Cassandra).
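The blocking problem in the first point is easiest to see from the participant's side. The sketch below is a simplified state machine with hypothetical names (`TimeoutAwareParticipant`, `on_coordinator_timeout`); it only illustrates which timeouts a participant can resolve on its own.

```python
class TimeoutAwareParticipant:
    """Hypothetical participant showing when a timeout can be resolved locally."""

    def __init__(self):
        self.state = "working"  # has not voted yet

    def vote_yes(self):
        # After this point the participant has promised it can commit.
        self.state = "prepared"

    def on_coordinator_timeout(self):
        if self.state == "working":
            # No vote was sent, so the coordinator cannot have decided COMMIT.
            # Aborting unilaterally is safe.
            self.state = "aborted"
        elif self.state == "prepared":
            # The coordinator may already have told others to commit or to abort.
            # Neither choice is safe, so the participant keeps its locks and waits.
            self.state = "in_doubt"
        return self.state
```

The "in_doubt" branch is the case described above: the locks stay held until someone learns the coordinator's decision.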

Why SAGA wins for microservices

Each service stays independent: no coordinator holds locks on its behalf (an orchestrator, if used, only tracks saga state)
Locks are local and brief
Network partitions reduce to a "stuck saga", recoverable when the partition heals
Works with any storage tech — relational, document, key-value
Aligns with how distributed systems actually behave (asynchronous, partial failures)
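The "stuck saga" recovery in the list above depends on persisting progress after each step. Here is a rough sketch under stated assumptions: a local JSON file stands in for a durable store, and `run_resumable_saga`, `state_path`, and the step names are hypothetical. Steps are assumed to be idempotent, because a crash between running a step and recording it would cause that step to run again.

```python
import json
import os


def run_resumable_saga(steps, state_path):
    """Run named steps in order, recording progress so a restart can resume.

    steps: list of (name, action) pairs.
    state_path: file standing in for a durable saga-state store (assumption).
    """
    done = []
    if os.path.exists(state_path):
        with open(state_path) as f:
            done = json.load(f)  # steps completed before the interruption
    for name, action in steps:
        if name in done:
            continue  # already completed; do not repeat it
        action()  # may raise, e.g. when the downstream service is unreachable
        done.append(name)
        with open(state_path, "w") as f:
            json.dump(done, f)  # persist progress after every step
    return done
```

Calling the function again with the same `state_path` after a failure skips the steps that already finished and continues from the one that was stuck.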

When 2PC still wins

Banking core ledgers, where regulation forbids any visible intermediate state
All participants are under one team's operational control, with mature ops
The workflow has a generous latency budget (back-office batch)
Transaction rate is modest (tens per second, not 10k per second)

What about distributed transactions in cloud databases?

Modern distributed SQL databases (Spanner, CockroachDB, YugabyteDB) offer distributed transactions internally, built on consensus-replicated storage. These give 2PC-like atomicity with better failure handling, because the loss of a single node does not leave a transaction blocked. BUT they only span data inside the same DB cluster; you still need SAGA when a workflow spans multiple services.

Interview rule of thumb

If the question asks about a workflow across multiple services with different teams, languages, or storage, answer SAGA. If the question is about a single database (even sharded), answer "use the DB's transaction". If the question mentions banking, ATM, ledger, or regulatory requirements, mention 2PC and note that it demands real operational maturity.
