Outbox Pattern — A Complete Guide with Order Processing Example
Deep-dive into the Outbox pattern: the dual-write problem it solves, architecture diagram, real .NET code with EF Core, the relay/dispatcher implementation, CDC vs polling variants, advantages, disadvantages, and production pitfalls.
- Author: Randhir Jassal
- Reading time: 17 min read
The Outbox pattern is how you safely publish events to a message bus from a service that also writes to a database, without losing an event even when the service crashes mid-write. Delivery is at-least-once, so duplicates remain possible and consumers must be idempotent. It is arguably the single most important pattern for reliable event-driven microservices.
This guide is the complete picture: the problem it solves (the "dual-write problem"), the architecture with diagram, an order-processing implementation in .NET + Postgres, the relay/dispatcher options, and the production reality.
The problem — "dual write"
Imagine an order service that places an order AND tells the rest of the system about it:
// ❌ The naive (broken) approach
public async Task PlaceOrderAsync(PlaceOrderCommand cmd)
{
    // `order` is built from cmd; construction omitted for brevity
    await _db.SaveAsync(order);                       // (1) write to DB
    await _bus.PublishAsync(new OrderPlaced(order));  // (2) publish event
}
Looks fine. It is catastrophically broken.
What if step (2) crashes after step (1) succeeds?
- Order is in the DB
- No event was ever published
- Inventory never reserves stock
- Payment never charges
- Customer waits forever for the email that never arrives
What if step (2) succeeds but the service crashes before returning?
- Event was published
- Caller retries → step (1) saves a SECOND order → step (2) publishes a SECOND event
- Customer gets charged twice
What if step (1) commits and the bus is briefly unavailable when step (2) runs?
- Retrying the whole operation may publish duplicate events later
- Giving up means the event is permanently lost
There is no ordering of these two writes that gives you reliability. You need a single atomic action that says "the order is saved AND the event is queued for delivery." That is what the Outbox pattern does.
The solution — write the event into the SAME database transaction
Instead of publishing to the bus inline, you write the event into an "outbox" table in the same DB as the business entity. A separate relay/dispatcher process polls the outbox and publishes events to the bus, then marks them sent.
Now the service performs a single atomic write: the order row and the outbox row commit together, or neither does.
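The all-or-nothing behaviour can be sketched without a database. This is an illustrative in-memory sketch only: RunInTransaction and the two lists are hypothetical stand-ins for a real DB transaction and the orders and outbox tables.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory "tables"; rows only become visible on commit.
var orders = new List<string>();
var outbox = new List<string>();

// Runs the work against staged lists and applies both together, or not at all.
bool RunInTransaction(Action<List<string>, List<string>> work)
{
    var stagedOrders = new List<string>();
    var stagedOutbox = new List<string>();
    try
    {
        work(stagedOrders, stagedOutbox);
    }
    catch (Exception)
    {
        return false; // rollback: nothing staged becomes visible
    }
    orders.AddRange(stagedOrders); // commit both tables together
    outbox.AddRange(stagedOutbox);
    return true;
}

// Success: the order row and the outbox row commit together.
bool first = RunInTransaction((o, e) =>
{
    o.Add("order-1");
    e.Add("order.placed:order-1");
});

// Failure after the order insert: neither row is visible afterwards.
bool second = RunInTransaction((o, e) =>
{
    o.Add("order-2");
    throw new InvalidOperationException("crash before the outbox insert");
});

Console.WriteLine($"first committed: {first}, second committed: {second}");
Console.WriteLine($"{orders.Count} order(s), {outbox.Count} outbox row(s)");
```

The second call leaves no trace: there is never an order without its event row, which is the property the naive dual write cannot provide.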
Architecture in one diagram
Outbox Pattern: Order Processing
─────────────────────────────────

                         ┌─────────────────────┐
 HTTP POST /orders ─────▶│    Order Service    │
                         │      (ASP.NET)      │
                         └──────────┬──────────┘
                                    │
                                    │ ONE atomic transaction
                                    ▼
                         ┌─────────────────────┐
                         │      Postgres       │
                         │                     │
                         │  ┌──────────────┐   │
                         │  │    orders    │   │
                         │  └──────────────┘   │
                         │  ┌──────────────┐   │
                         │  │    outbox    │   │  ← row inserted in the
                         │  │    events    │   │    same transaction
                         │  └──────────────┘   │
                         └──────────┬──────────┘
                                    │
                                    │ poll OR change-data-capture
                                    ▼
                         ┌─────────────────────┐
                         │    Outbox Relay     │  background worker
                         │ (BackgroundService  │  (also durable)
                         │   + Polly retry)    │
                         └──────────┬──────────┘
                                    │
                                    │ publish
                                    ▼
                         ┌─────────────────────┐
                         │  Kafka / RabbitMQ   │
                         │   / SQS / Service   │
                         │         Bus         │
                         └──────────┬──────────┘
                                    │
                       ┌────────────┴────────────┐
                       ▼                         ▼
              ┌──────────────────┐      ┌──────────────────┐
              │    Inventory     │      │     Payment      │
              │     Service      │      │     Service      │
              └──────────────────┘      └──────────────────┘
Three properties that hold:
- Order + event row commit together. Either both happen, or neither does. (Atomicity via DB transaction.)
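- The relay can retry until the publish succeeds, because unsent rows stay in the table. If it crashes after publishing but before marking the row sent, the event is published again, so delivery is at-least-once and consumers must deduplicate.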
- Crash anywhere is recoverable. The outbox row sits in the DB until the relay processes it.
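The retry behaviour has one consequence worth making concrete: a crash between "publish" and "mark sent" leads to a second publish after restart. A minimal in-memory sketch, assuming hypothetical lists in place of the outbox table and the bus:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-ins for the outbox table and the message bus.
var unsent = new List<string> { "order.placed:order-1" };
var published = new List<string>();

// One relay pass: publish each unsent row, then mark it sent (here: remove it).
// crashBeforeMarking simulates the process dying between the two steps.
void RelayPass(bool crashBeforeMarking)
{
    foreach (var ev in unsent.ToArray())
    {
        published.Add(ev);               // the publish succeeded
        if (crashBeforeMarking) return;  // died before recording that fact
        unsent.Remove(ev);               // equivalent of setting sent_at
    }
}

RelayPass(crashBeforeMarking: true);   // first attempt crashes after publishing
RelayPass(crashBeforeMarking: false);  // after restart the row is still unsent

Console.WriteLine($"published {published.Count} time(s), unsent rows: {unsent.Count}");
// prints "published 2 time(s), unsent rows: 0"
```

Nothing is lost, but the bus saw the event twice. That is the at-least-once guarantee in practice.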
Implementation — full code (.NET + EF Core + Postgres + RabbitMQ)
Step 1 — Outbox table schema
CREATE TABLE outbox_events (
    id            UUID        PRIMARY KEY DEFAULT gen_random_uuid(),
    aggregate     TEXT        NOT NULL,              -- e.g. 'order'
    aggregate_id  UUID        NOT NULL,
    event_type    TEXT        NOT NULL,              -- e.g. 'order.placed'
    payload       JSONB       NOT NULL,
    created_at    TIMESTAMPTZ NOT NULL DEFAULT now(),
    sent_at       TIMESTAMPTZ,
    attempts      INT         NOT NULL DEFAULT 0,
    last_error    TEXT
);

CREATE INDEX idx_outbox_unsent ON outbox_events (created_at)
    WHERE sent_at IS NULL;
The partial index is the trick — the relay query SELECT ... WHERE sent_at IS NULL ORDER BY created_at becomes O(unsent rows), not O(all rows ever).
Step 2 — Order service writes both rows in one transaction
public class OrderService(AppDb db)
{
    public async Task<Order> PlaceAsync(PlaceOrderCommand cmd, CancellationToken ct)
    {
        var order = new Order
        {
            Id = Guid.NewGuid(),
            CustomerId = cmd.CustomerId,
            Items = cmd.Items,
            Total = cmd.Items.Sum(i => i.Price * i.Quantity),
            Status = OrderStatus.Placed,
            CreatedAt = DateTimeOffset.UtcNow
        };
        db.Orders.Add(order);

        // SAME TRANSACTION: this is the whole point of the pattern
        db.OutboxEvents.Add(new OutboxEvent
        {
            Aggregate = "order",
            AggregateId = order.Id,
            EventType = "order.placed",
            Payload = JsonSerializer.Serialize(new
            {
                orderId = order.Id,
                customerId = order.CustomerId,
                total = order.Total,
                items = order.Items.Select(i => new { i.Sku, i.Quantity })
            })
        });

        await db.SaveChangesAsync(ct); // both rows commit atomically
        return order;
    }
}
That db.SaveChangesAsync issues ONE DB transaction. Either both INSERTs commit or neither does. No bus call inline. No failure mode that leaves data half-written.
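Step 2 relies on an OutboxEvent entity and an AppDb context that are not shown. The following is a minimal sketch of what they might look like, assuming EF Core with the Npgsql provider. The property names are inferred from the usage above and the column names from the Step 1 schema; the Orders set and its mapping are omitted, and the real classes may differ.

```csharp
using System;
using Microsoft.EntityFrameworkCore;

// Hypothetical entity mirroring the outbox_events table from Step 1.
public class OutboxEvent
{
    public Guid Id { get; set; } = Guid.NewGuid();
    public string Aggregate { get; set; } = "";
    public Guid AggregateId { get; set; }
    public string EventType { get; set; } = "";
    public string Payload { get; set; } = "";   // serialized JSON
    public DateTimeOffset CreatedAt { get; set; } = DateTimeOffset.UtcNow;
    public DateTimeOffset? SentAt { get; set; }
    public int Attempts { get; set; }
    public string? LastError { get; set; }
}

// Only the outbox part of the context is shown here.
public class AppDb(DbContextOptions<AppDb> options) : DbContext(options)
{
    public DbSet<OutboxEvent> OutboxEvents => Set<OutboxEvent>();

    protected override void OnModelCreating(ModelBuilder model)
    {
        model.Entity<OutboxEvent>(e =>
        {
            // Explicit names so that raw SQL against the table maps back cleanly.
            e.ToTable("outbox_events");
            e.HasKey(x => x.Id);
            e.Property(x => x.Id).HasColumnName("id");
            e.Property(x => x.Aggregate).HasColumnName("aggregate");
            e.Property(x => x.AggregateId).HasColumnName("aggregate_id");
            e.Property(x => x.EventType).HasColumnName("event_type");
            e.Property(x => x.Payload).HasColumnName("payload").HasColumnType("jsonb");
            e.Property(x => x.CreatedAt).HasColumnName("created_at");
            e.Property(x => x.SentAt).HasColumnName("sent_at");
            e.Property(x => x.Attempts).HasColumnName("attempts");
            e.Property(x => x.LastError).HasColumnName("last_error");
        });
    }
}
```

Mapping Payload as a string with the jsonb column type keeps the entity simple; the service serializes the event once and the relay forwards the same text unchanged.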
Step 3 — The relay (BackgroundService)
public class OutboxRelay : BackgroundService
{
    private readonly IServiceScopeFactory _scopes;
    private readonly IMessageBus _bus;
    private readonly ILogger<OutboxRelay> _log;

    public OutboxRelay(IServiceScopeFactory scopes, IMessageBus bus, ILogger<OutboxRelay> log)
    { _scopes = scopes; _bus = bus; _log = log; }

    protected override async Task ExecuteAsync(CancellationToken stop)
    {
        while (!stop.IsCancellationRequested)
        {
            try
            {
                await DispatchBatchAsync(stop);
                await Task.Delay(TimeSpan.FromMilliseconds(250), stop);
            }
            catch (OperationCanceledException) { /* shutdown */ }
            catch (Exception ex)
            {
                _log.LogError(ex, "outbox relay batch failed");
                await Task.Delay(TimeSpan.FromSeconds(5), stop);
            }
        }
    }

    private async Task DispatchBatchAsync(CancellationToken stop)
    {
        using var scope = _scopes.CreateScope();
        var db = scope.ServiceProvider.GetRequiredService<AppDb>();

        // Row locks taken by FOR UPDATE are released when the transaction ends,
        // so the claim, the publish, and the update must share one transaction.
        // Without it, the locks vanish as soon as the SELECT completes.
        await using var tx = await db.Database.BeginTransactionAsync(stop);

        // Lock & claim: `FOR UPDATE SKIP LOCKED` lets multiple relay replicas
        // run in parallel without stepping on each other.
        var batch = await db.OutboxEvents
            .FromSqlRaw(@"
                SELECT * FROM outbox_events
                WHERE sent_at IS NULL
                ORDER BY created_at
                LIMIT 100
                FOR UPDATE SKIP LOCKED")
            .ToListAsync(stop);
        if (batch.Count == 0) return; // disposing tx rolls back and releases nothing of value

        foreach (var ev in batch)
        {
            try
            {
                await _bus.PublishAsync(ev.EventType, ev.Payload, stop);
                ev.SentAt = DateTimeOffset.UtcNow;
            }
            catch (Exception ex)
            {
                ev.Attempts += 1;
                ev.LastError = ex.Message;
                _log.LogWarning(ex, "publish failed for outbox {EventId} attempt {N}",
                    ev.Id, ev.Attempts);
            }
        }

        await db.SaveChangesAsync(stop);
        await tx.CommitAsync(stop); // releases the row locks
    }
}
Register in Program.cs:
builder.Services.AddHostedService<OutboxRelay>();
Step 4 — Periodic cleanup of old sent events
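// Register alongside the relay: builder.Services.AddHostedService<OutboxJanitor>();
public class OutboxJanitor : BackgroundService
{
    private readonly IServiceScopeFactory _scopes;

    public OutboxJanitor(IServiceScopeFactory scopes) => _scopes = scopes;

    protected override async Task ExecuteAsync(CancellationToken stop)
    {
        while (!stop.IsCancellationRequested)
        {
            // Scope is disposed before the long delay so the DbContext is not held for hours.
            using (var scope = _scopes.CreateScope())
            {
                var db = scope.ServiceProvider.GetRequiredService<AppDb>();

                // Delete events sent more than 7 days ago.
                await db.Database.ExecuteSqlRawAsync(
                    "DELETE FROM outbox_events WHERE sent_at < now() - interval '7 days'",
                    stop);
            }

            await Task.Delay(TimeSpan.FromHours(6), stop);
        }
    }
}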
Without this, the outbox table grows forever. 7 days gives ops time to replay if something downstream is wrong.
Variants — Polling vs CDC
Polling (the code above)
The relay queries the table every ~250 ms. Simple, portable, works on any database.
| Pros | Cons |
|---|---|
| Trivial to implement | Latency ≈ poll interval (100-500 ms) |
| Works on every database | Constant DB load |
| Easy to test, easy to reason about | Doesn't scale to millions of events/sec |
Change Data Capture (CDC) — Debezium / Postgres logical replication
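A CDC connector such as Debezium reads the database's write-ahead log (WAL) through a logical replication slot and streams every outbox insert to Kafka. No application-level relay is needed; the connector takes over that role.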
| Pros | Cons |
|---|---|
| Lower latency (~10 ms) | Operational complexity (Debezium, schema registry) |
| No app-level polling load | Postgres logical replication slot management |
| Scales to high throughput | Tighter coupling to the DB tech |
Rule of thumb: start with polling. Move to CDC when you have >10k events/sec or need sub-100ms event latency.
Advantages of the Outbox pattern
- At-least-once delivery guaranteed: events never silently disappear
- Atomic with the business transaction: order and event always agree
- Crash-safe at every point: the outbox row is the durable handoff
- Replayable: re-publish events by clearing sent_at and letting the relay pick them up again
- Bus-agnostic: switch from RabbitMQ to Kafka by changing the relay
- Inspectable: SELECT * FROM outbox_events WHERE last_error IS NOT NULL is a real ops tool
- Audit trail: until the janitor deletes them, the outbox rows are a log of the events the service emitted
Disadvantages
- At-LEAST-once, not exactly-once — consumers MUST be idempotent
- Eventual delivery — between commit and relay there is a gap (~100-500 ms with polling)
- Extra table + background process — operational cost
- Outbox table growth — needs janitor + monitoring
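- Ordering is per-aggregate at best, not global: parallel relays and retried publishes can reorder events even within one aggregate, so strict ordering needs additional design (for example a per-aggregate sequence number)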
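One way a consumer can cope with duplicates and late redeliveries is to track a per-aggregate sequence number. The sketch below assumes each event carries such a number, which the schema above does not include; ShouldApply and the dictionary are hypothetical. It suits events where a later one supersedes an earlier one (status updates, for instance). If every event must be applied, out-of-order arrivals need buffering instead of dropping.

```csharp
using System;
using System.Collections.Generic;

// Highest sequence number applied so far, per aggregate (hypothetical in-memory state).
var lastApplied = new Dictionary<Guid, long>();

// Returns true if the event should be applied, false if it is a duplicate or stale.
bool ShouldApply(Guid aggregateId, long sequence)
{
    if (lastApplied.TryGetValue(aggregateId, out var seen) && sequence <= seen)
        return false;
    lastApplied[aggregateId] = sequence;
    return true;
}

var orderId = Guid.NewGuid();
Console.WriteLine(ShouldApply(orderId, 1)); // True
Console.WriteLine(ShouldApply(orderId, 2)); // True
Console.WriteLine(ShouldApply(orderId, 1)); // False: late redelivery of an old event
```

In a real consumer the dictionary would be a table updated in the same transaction as the business change, so the check and the effect cannot drift apart.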
When the Outbox pattern is essential
- Any service that writes to a DB AND publishes events
- Order processing, payment, inventory, user registration — any business workflow that triggers downstream effects
- Anywhere "lost event" is a customer-visible bug
- When implementing the SAGA pattern (sagas publish events; outbox makes that reliable)
When you can skip it
- Pure read services (no writes)
- Workflows where the downstream system has an idempotent pull API and you can let it discover changes (rare)
- Internal-only fire-and-forget telemetry where loss is acceptable
Production checklist
- Idempotency on consumers. (aggregateId, eventType, sequence) should produce the same effect no matter how many times processed.
- Monitoring on the relay. Lag (unsent count, age of oldest unsent), error rate, throughput.
- Alerts on stuck events. SELECT count(*) FROM outbox_events WHERE sent_at IS NULL AND created_at < now() - interval '5 minutes'; fire if > 0.
- Bounded retries with backoff. After 10 failed attempts, move to a dead-letter table for ops review.
- Multi-instance safety. FOR UPDATE SKIP LOCKED (Postgres) or row-level lease tokens (other DBs) so multiple relay replicas don't double-publish.
- Janitor with bounded retention. Keep 7-30 days of sent events for audit + replay.
- Schema for events. Don't store free-form JSON. Define event schemas (Avro / Protobuf / JSON Schema).
- Versioning. Add event_version so schema evolutions can be handled by upcasters downstream.
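The "bounded retries with backoff" item can be sketched as two small functions the relay might consult before retrying a row. The 10-attempt limit comes from the checklist; the doubling schedule and the 5-minute cap are illustrative assumptions, and NextDelay and ShouldDeadLetter are hypothetical names.

```csharp
using System;

const int MaxAttempts = 10; // from the checklist: dead-letter after 10 failures

// Exponential backoff: 1s, 2s, 4s, ... capped at 5 minutes (illustrative values).
TimeSpan NextDelay(int attempts)
{
    var seconds = Math.Min(Math.Pow(2, attempts - 1), 300);
    return TimeSpan.FromSeconds(seconds);
}

// True once the row should move to the dead-letter table instead of being retried.
bool ShouldDeadLetter(int attempts) => attempts >= MaxAttempts;

Console.WriteLine(NextDelay(1));          // 00:00:01
Console.WriteLine(NextDelay(4));          // 00:00:08
Console.WriteLine(NextDelay(20));         // 00:05:00
Console.WriteLine(ShouldDeadLetter(10));  // True
```

To use this with the polling relay, the claim query would also need to skip rows whose next retry time has not yet arrived, which requires storing that time (or the last attempt time) on the row.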
Common pitfalls
- Mixing transaction boundaries. Calling the bus inside the same method as the DB transaction defeats the pattern. Bus publish lives in the RELAY only.
- Forgetting FOR UPDATE SKIP LOCKED. Two relays publish the same event twice, and your consumers must dedup.
- No janitor. Outbox table at 50 GB after a year of production growth.
- No idempotency downstream. Outbox guarantees at-least-once. Without dedup on consumers, you get double-charges.
- Treating events as commands. Outbox events should describe what HAPPENED ("OrderPlaced"), not what to DO ("ReserveStock"). The latter belongs in a command queue.
Outbox vs Inbox — the symmetric pattern
The Outbox pattern is about reliably emitting events. The Inbox pattern is about reliably consuming them.
Inbox table:
- event_id (unique key on the source event's id)
- received_at
- processed
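Consumer logic (all three steps run in one DB transaction together with the consumer's own writes, so a crash mid-way rolls back the inbox row too):
1. INSERT into inbox (event_id). A unique-key violation means the event was already handled, so skip it
2. If the insert succeeded, process the event
3. Mark processed and commit
Together, outbox + inbox = effectively-once processing across services.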
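The consumer logic can be sketched in memory as follows. This is an assumption-laden illustration, not a production implementation: the HashSet stands in for the inbox table's unique key, and removing the id on failure stands in for a transaction rollback. Handle is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory inbox; a real one is a table with a unique key on event_id.
var inbox = new HashSet<Guid>();
var chargesApplied = 0;

// Returns true if the event was processed, false if it was a duplicate.
bool Handle(Guid eventId, Action process)
{
    if (!inbox.Add(eventId)) return false; // "unique-key violation": already seen
    try
    {
        process();
        return true;
    }
    catch
    {
        inbox.Remove(eventId); // stands in for rolling back the inbox insert
        throw;
    }
}

var eventId = Guid.NewGuid();
Handle(eventId, () => chargesApplied++);
Handle(eventId, () => chargesApplied++); // redelivery is ignored

Console.WriteLine(chargesApplied); // 1
```

Because a failed handler leaves no inbox entry behind, the bus can redeliver the event later and it will be processed then.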
Summary
The Outbox pattern is the non-negotiable building block for reliable event-driven microservices. It solves the dual-write problem by replacing two writes (DB + bus) with one DB transaction that includes the event as data.
Implement it as your default for any new service that emits events. Pair it with the Inbox pattern on consumers for effectively-once delivery. Pair it with the SAGA pattern when the workflow spans multiple services.
The pattern is conceptually simple — one extra table, one background worker. The discipline is in the operational details: monitoring, janitor, idempotency, schema versioning, retry policy. Get those right and your event-driven architecture will survive years in production.