Outbox Pattern — A Complete Guide with Order Processing Example

The Outbox pattern is how you safely publish events to a message bus from a service that also writes to a database, without losing an event even when the service crashes mid-write. Delivery is at-least-once, so a duplicate publish is still possible and consumers must tolerate it. It is one of the most important patterns for reliable event-driven microservices.

This guide is the complete picture: the problem it solves (the "dual-write problem"), the architecture with diagram, an order-processing implementation in .NET + Postgres, the relay/dispatcher options, and the production reality.

The problem — "dual write"

Imagine an order service that places an order AND tells the rest of the system about it:

// ❌ The naive (broken) approach
public async Task PlaceOrderAsync(PlaceOrderCommand cmd)
{
    await _db.SaveAsync(order);                       // (1) write to DB
    await _bus.PublishAsync(new OrderPlaced(order));  // (2) publish event
}

Looks fine. It is catastrophically broken.

What if step (2) crashes after step (1) succeeds?

Order is in the DB
No event was ever published
Inventory never reserves stock
Payment never charges
Customer waits forever for the email that never arrives

What if step (2) succeeds but the service crashes before returning?

Event was published
Caller retries → step (1) saves a SECOND order → step (2) publishes a SECOND event
Customer gets charged twice

What if step (1) commits and the bus is briefly down when step (2) runs?

The retry storm might publish duplicate events later
Or the event is permanently lost

There is no ordering of these two writes that gives you reliability. You need a single atomic action that says "the order is saved AND the event is queued for delivery." That is what the Outbox pattern does.

The solution — write the event into the SAME database transaction

Instead of publishing to the bus inline, you write the event into an "outbox" table in the same DB as the business entity. A separate relay/dispatcher process polls the outbox and publishes events to the bus, then marks them sent.

Now there is a single write, and it is atomic: the order row and the outbox row commit together, or neither does.

Architecture in one diagram

                              Outbox Pattern — Order Processing
                              ─────────────────────────────────

                              ┌─────────────────────┐
       HTTP POST /orders ─────▶  Order Service      │
                              │   (ASP.NET)         │
                              └──────────┬──────────┘
                                         │
                                         │ ONE atomic transaction
                                         ▼
                              ┌─────────────────────┐
                              │     Postgres        │
                              │                     │
                              │  ┌──────────────┐   │
                              │  │  orders      │   │
                              │  └──────────────┘   │
                              │  ┌──────────────┐   │
                              │  │  outbox      │   │  ← row inserted in the
                              │  │  events      │   │     same transaction
                              │  └──────────────┘   │
                              └──────────┬──────────┘
                                         │
                                         │ poll OR change-data-capture
                                         ▼
                              ┌─────────────────────┐
                              │  Outbox Relay       │  background worker
                              │  (BackgroundService │  (also durable)
                              │   + Polly retry)    │
                              └──────────┬──────────┘
                                         │
                                         │ publish
                                         ▼
                              ┌─────────────────────┐
                              │  Kafka / RabbitMQ   │
                              │  / SQS / Service    │
                              │  Bus                │
                              └──────────┬──────────┘
                                         │
                            ┌────────────┴────────────┐
                            ▼                         ▼
                  ┌──────────────────┐      ┌──────────────────┐
                  │  Inventory       │      │  Payment         │
                  │  Service         │      │  Service         │
                  └──────────────────┘      └──────────────────┘

Three properties that hold:

Order + event row commit together. Either both happen, or neither does. (Atomicity via DB transaction.)
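The relay can retry forever without losing an event. A row stays unsent until a publish succeeds. If the relay crashes after publishing but before marking the row sent, the event is published again, so delivery is at-least-once and consumers must dedup. Sent rows are deleted later by a cleanup job.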
Crash anywhere is recoverable. The outbox row sits in the DB until the relay processes it.

Implementation — full code (.NET + EF Core + Postgres + RabbitMQ)

Step 1 — Outbox table schema

CREATE TABLE outbox_events (
    id           UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    aggregate    TEXT NOT NULL,         -- e.g. 'order'
    aggregate_id UUID NOT NULL,
    event_type   TEXT NOT NULL,         -- e.g. 'order.placed'
    payload      JSONB NOT NULL,
    created_at   TIMESTAMPTZ NOT NULL DEFAULT now(),
    sent_at      TIMESTAMPTZ,
    attempts     INT NOT NULL DEFAULT 0,
    last_error   TEXT
);

CREATE INDEX idx_outbox_unsent ON outbox_events (created_at)
    WHERE sent_at IS NULL;

The partial index is the trick — the relay query SELECT ... WHERE sent_at IS NULL ORDER BY created_at becomes O(unsent rows), not O(all rows ever).

Step 2 — Order service writes both rows in one transaction

public class OrderService(AppDb db)
{
    public async Task<Order> PlaceAsync(PlaceOrderCommand cmd, CancellationToken ct)
    {
        var order = new Order
        {
            Id           = Guid.NewGuid(),
            CustomerId   = cmd.CustomerId,
            Items        = cmd.Items,
            Total        = cmd.Items.Sum(i => i.Price * i.Quantity),
            Status       = OrderStatus.Placed,
            CreatedAt    = DateTimeOffset.UtcNow
        };

        db.Orders.Add(order);

        // SAME TRANSACTION — this is the whole point of the pattern
        db.OutboxEvents.Add(new OutboxEvent
        {
            Aggregate   = "order",
            AggregateId = order.Id,
            EventType   = "order.placed",
            Payload     = JsonSerializer.Serialize(new
            {
                orderId    = order.Id,
                customerId = order.CustomerId,
                total      = order.Total,
                items      = order.Items.Select(i => new { i.Sku, i.Quantity })
            })
        });

        await db.SaveChangesAsync(ct);   // both rows commit atomically
        return order;
    }
}

That db.SaveChangesAsync issues ONE DB transaction. Either both INSERTs commit or neither does. No bus call inline. No failure mode that leaves data half-written.
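The OutboxEvent entity used above is never shown. A minimal sketch of what it might look like follows, assuming the columns of the Step 1 schema and the standard data-annotation attributes for the snake_case names. Storing the payload as a string mapped to jsonb is an assumption about the provider configuration, not something the code above confirms.

```csharp
using System;
using System.ComponentModel.DataAnnotations.Schema;

// Hypothetical entity shape; column names follow the Step 1 schema.
[Table("outbox_events")]
public class OutboxEvent
{
    [Column("id")]           public Guid Id { get; set; } = Guid.NewGuid();
    [Column("aggregate")]    public string Aggregate { get; set; } = "";
    [Column("aggregate_id")] public Guid AggregateId { get; set; }
    [Column("event_type")]   public string EventType { get; set; } = "";

    // Serialized JSON text; TypeName asks the provider to store it as jsonb (assumption).
    [Column("payload", TypeName = "jsonb")]
    public string Payload { get; set; } = "{}";

    [Column("created_at")]   public DateTimeOffset CreatedAt { get; set; } = DateTimeOffset.UtcNow;
    [Column("sent_at")]      public DateTimeOffset? SentAt { get; set; }   // null means not yet published
    [Column("attempts")]     public int Attempts { get; set; }
    [Column("last_error")]   public string? LastError { get; set; }
}
```

A new instance starts unsent (SentAt is null, Attempts is 0), which is exactly the state the relay query in Step 3 looks for.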

Step 3 — The relay (BackgroundService)

public class OutboxRelay : BackgroundService
{
    private readonly IServiceScopeFactory _scopes;
    private readonly IMessageBus _bus;
    private readonly ILogger<OutboxRelay> _log;

    public OutboxRelay(IServiceScopeFactory scopes, IMessageBus bus, ILogger<OutboxRelay> log)
    { _scopes = scopes; _bus = bus; _log = log; }

    protected override async Task ExecuteAsync(CancellationToken stop)
    {
        while (!stop.IsCancellationRequested)
        {
            try
            {
                await DispatchBatchAsync(stop);
                await Task.Delay(TimeSpan.FromMilliseconds(250), stop);
            }
            catch (OperationCanceledException) { /* shutdown */ }
            catch (Exception ex)
            {
                _log.LogError(ex, "outbox relay batch failed");
                await Task.Delay(TimeSpan.FromSeconds(5), stop);
            }
        }
    }

    private async Task DispatchBatchAsync(CancellationToken stop)
    {
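        using var scope = _scopes.CreateScope();
        var db = scope.ServiceProvider.GetRequiredService<AppDb>();

        // Row locks taken by FOR UPDATE last only as long as the enclosing
        // transaction. Without an explicit transaction they are released as
        // soon as the SELECT finishes, and a second replica can claim the
        // same rows. Claim, publish, and mark sent inside one transaction.
        await using var tx = await db.Database.BeginTransactionAsync(stop);

        // Lock & claim: `FOR UPDATE SKIP LOCKED` lets multiple relay replicas
        // run in parallel without stepping on each other.
        var batch = await db.OutboxEvents
            .FromSqlRaw(@"
                SELECT * FROM outbox_events
                WHERE sent_at IS NULL
                ORDER BY created_at
                LIMIT 100
                FOR UPDATE SKIP LOCKED")
            .ToListAsync(stop);

        if (batch.Count == 0) return;   // disposing tx rolls back and releases nothing of value

        foreach (var ev in batch)
        {
            try
            {
                await _bus.PublishAsync(ev.EventType, ev.Payload, stop);
                ev.SentAt = DateTimeOffset.UtcNow;
            }
            catch (Exception ex)
            {
                // Leave sent_at NULL so the row is picked up again on a later poll.
                ev.Attempts += 1;
                ev.LastError = ex.Message;
                _log.LogWarning(ex, "publish failed for outbox {EventId} attempt {N}",
                    ev.Id, ev.Attempts);
            }
        }

        await db.SaveChangesAsync(stop);
        await tx.CommitAsync(stop);   // persists sent_at / attempts and releases the row locks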
    }
}

builder.Services.AddHostedService<OutboxRelay>();
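The relay above retries a failing row on every poll. The production checklist later in this guide asks for bounded retries with backoff, and one way to express that policy is a small pure helper like the sketch below. The base delay, the cap, and the idea of tracking a last-attempt timestamp are assumptions for illustration; the Step 1 schema has no such column, so it would need to be added before the relay could use IsDue.

```csharp
using System;

// Hypothetical retry policy for outbox rows; all constants are illustrative.
public static class OutboxRetryPolicy
{
    public const int MaxAttempts = 10;                                     // threshold named in the checklist
    private static readonly TimeSpan BaseDelay = TimeSpan.FromSeconds(1);  // assumption
    private static readonly TimeSpan MaxDelay  = TimeSpan.FromMinutes(5);  // assumption

    // True once a row has failed often enough to be parked for ops review.
    public static bool ShouldDeadLetter(int attempts) => attempts >= MaxAttempts;

    // Exponential backoff: 1s, 2s, 4s, ... capped at MaxDelay.
    public static TimeSpan DelayAfter(int attempts)
    {
        if (attempts <= 0) return TimeSpan.Zero;
        var exponent = Math.Min(attempts - 1, 20);   // bound the shift so ticks cannot overflow
        var delay = TimeSpan.FromTicks(BaseDelay.Ticks << exponent);
        return delay < MaxDelay ? delay : MaxDelay;
    }

    // A row is eligible when it is not dead-lettered and its backoff window has passed.
    public static bool IsDue(int attempts, DateTimeOffset lastAttemptAt, DateTimeOffset now)
        => !ShouldDeadLetter(attempts) && now - lastAttemptAt >= DelayAfter(attempts);
}
```

The relay could filter its batch with IsDue before publishing, and move rows for which ShouldDeadLetter is true into a separate table instead of retrying them.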

Step 4 — Periodic cleanup of old sent events

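public class OutboxJanitor(IServiceScopeFactory scopes) : BackgroundService
{
    protected override async Task ExecuteAsync(CancellationToken stop)
    {
        while (!stop.IsCancellationRequested)
        {
            // Dispose the scope (and its DbContext) before sleeping,
            // instead of holding it for the whole 6-hour delay.
            using (var scope = scopes.CreateScope())
            {
                var db = scope.ServiceProvider.GetRequiredService<AppDb>();

                // Delete events sent more than 7 days ago.
                // Unsent rows have sent_at = NULL and are never matched.
                await db.Database.ExecuteSqlRawAsync(
                    "DELETE FROM outbox_events WHERE sent_at < now() - interval '7 days'",
                    stop);
            }

            await Task.Delay(TimeSpan.FromHours(6), stop);
        }
    }
}

builder.Services.AddHostedService<OutboxJanitor>();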

Without this, the outbox table grows forever. 7 days gives ops time to replay if something downstream is wrong.

Variants — Polling vs CDC

Polling (the code above)

The relay queries the table every ~250 ms. Simple, portable, works on any database.

Pros	Cons
Trivial to implement	Latency ≈ poll interval (100-500 ms)
Works on every database	Constant DB load
Easy to test, easy to reason about	Doesn't scale to millions of events/sec

Change Data Capture (CDC) — Debezium / Postgres logical replication

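A CDC connector such as Debezium reads the database's WAL (write-ahead log) through a logical replication slot and streams every outbox row change to Kafka. The connector takes the place of the polling relay, so the application runs no relay code of its own.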

Pros	Cons
Lower latency (~10 ms)	Operational complexity (Debezium, schema registry)
No app-level polling load	Postgres logical replication slot management
Scales to high throughput	Tighter coupling to the DB tech

Rule of thumb: start with polling. Move to CDC when you have >10k events/sec or need sub-100ms event latency.

Advantages of the Outbox pattern

At-least-once delivery guaranteed — events never silently disappear
Atomic with the business transaction — order and event always agree
Crash-safe at every point — the outbox row is the durable handoff
Replayable — re-publish events by clearing sent_at and letting the relay pick them up again
Bus-agnostic — switch from RabbitMQ to Kafka by changing the relay
Inspectable — SELECT * FROM outbox_events WHERE last_error IS NOT NULL is a real ops tool
Audit trail — the outbox IS the event log of business decisions

Disadvantages

At-LEAST-once, not exactly-once — consumers MUST be idempotent
Eventual delivery — between commit and relay there is a gap (~100-500 ms with polling)
Extra table + background process — operational cost
Outbox table growth — needs janitor + monitoring
Order is per-aggregate, not global — global ordering needs additional design
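The first point is worth seeing concretely. The sketch below is an in-memory model, not the real relay: a list stands in for the bus, a flag simulates a process crash between the publish and the write of sent_at, and all type names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class OutboxRow
{
    public Guid Id { get; } = Guid.NewGuid();
    public bool Sent { get; set; }
}

public class InMemoryRelay
{
    public List<Guid> Published { get; } = new();   // stands in for the message bus

    // Publishes every unsent row. When crashBeforeMark is true the relay
    // "dies" after the publish but before recording it, like a crash
    // between PublishAsync and SaveChangesAsync.
    public void Dispatch(IEnumerable<OutboxRow> outbox, bool crashBeforeMark)
    {
        foreach (var row in outbox.Where(r => !r.Sent))
        {
            Published.Add(row.Id);
            if (crashBeforeMark) return;   // sent flag never written
            row.Sent = true;
        }
    }
}

public static class CrashScenario
{
    // Returns how many times the single event reached the bus.
    public static int Run()
    {
        var outbox = new List<OutboxRow> { new OutboxRow() };
        var relay = new InMemoryRelay();
        relay.Dispatch(outbox, crashBeforeMark: true);    // publish, then "crash"
        relay.Dispatch(outbox, crashBeforeMark: false);   // restart: row is still unsent
        return relay.Published.Count(id => id == outbox[0].Id);
    }
}
```

CrashScenario.Run() returns 2: the event is never lost, but it is delivered twice, which is why consumer idempotency is not optional.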

When the Outbox pattern is essential

Any service that writes to a DB AND publishes events
Order processing, payment, inventory, user registration — any business workflow that triggers downstream effects
Anywhere "lost event" is a customer-visible bug
When implementing the SAGA pattern (sagas publish events; outbox makes that reliable)

When you can skip it

Pure read services (no writes)
Workflows where the downstream system has an idempotent pull API and you can let it discover changes (rare)
Internal-only fire-and-forget telemetry where loss is acceptable

Production checklist

Idempotency on consumers. (aggregateId, eventType, sequence) should produce the same effect no matter how many times processed.
Monitoring on the relay. Lag (unsent count, age of oldest unsent), error rate, throughput.
Alerts on stuck events. SELECT count(*) FROM outbox_events WHERE sent_at IS NULL AND created_at < now() - interval '5 minutes' — fire if > 0.
Bounded retries with backoff. After 10 failed attempts, move to a dead-letter table for ops review.
Multi-instance safety. FOR UPDATE SKIP LOCKED (Postgres) or row-level lease tokens (other DBs) so multiple relay replicas don't double-publish.
Janitor with bounded retention. Keep 7-30 days of sent events for audit + replay.
Schema for events. Don't store free-form JSON. Define event schemas (Avro / Protobuf / JSON Schema).
Versioning. Add event_version so schema evolutions can be handled by upcasters downstream.
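To make the versioning item concrete, here is a rough sketch of a consumer-side upcaster built on System.Text.Json.Nodes. The schema change it handles is invented for illustration: version 1 is assumed to carry total as a bare number, version 2 as an object with amount and currency. The class name and the "USD" default are likewise hypothetical.

```csharp
using System;
using System.Text.Json.Nodes;

// Hypothetical upcaster for "order.placed" payloads.
public static class OrderPlacedUpcaster
{
    public const int CurrentVersion = 2;

    // Brings a payload of any known version up to CurrentVersion.
    public static JsonObject Upcast(int version, string payloadJson)
    {
        var payload = JsonNode.Parse(payloadJson)!.AsObject();

        if (version == 1)
        {
            // v1 -> v2: wrap the bare number in { amount, currency }.
            var amount = payload["total"]!.GetValue<decimal>();
            payload["total"] = new JsonObject
            {
                ["amount"] = amount,
                ["currency"] = "USD"   // assumed default for legacy events
            };
            version = 2;
        }

        if (version != CurrentVersion)
            throw new NotSupportedException($"unknown event_version {version}");

        return payload;
    }
}
```

Handlers then only ever see the current shape, and each future version adds one more step to the chain rather than a branch in every consumer.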

Common pitfalls

Mixing transaction boundaries. Calling the bus inside the same method as the DB transaction defeats the pattern. Bus publish lives in the RELAY only.
Forgetting FOR UPDATE SKIP LOCKED. Two relays publish the same event twice — your consumers must dedup.
No janitor. Outbox table at 50 GB after a year of production growth.
No idempotency downstream. Outbox guarantees at-least-once. Without dedup on consumers, you get double-charges.
Treating events as commands. Outbox events should describe what HAPPENED ("OrderPlaced"), not what to DO ("ReserveStock"). The latter belongs in a command queue.

Outbox vs Inbox — the symmetric pattern

The Outbox pattern is about reliably emitting events. The Inbox pattern is about reliably consuming them.

Inbox table:
- event_id (unique key on the source event's id)
- received_at
- processed

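Consumer logic (run all three steps in ONE database transaction, so a crash rolls the inbox row back and the redelivered event is processed again instead of being skipped):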
1. INSERT into inbox (event_id) — fails if already there
2. If insert succeeded → process the event
3. Mark processed

Together, outbox + inbox = effectively-once message delivery across services.
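The consumer logic can be sketched in memory as follows. A HashSet stands in for the inbox table's unique key, and removing the id when the handler throws imitates the rollback a real implementation would get by doing the insert and the handler's writes in one database transaction. The class and its API are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public class InboxConsumer
{
    private readonly HashSet<Guid> _inbox = new();   // stands in for the unique key on event_id
    private readonly Action<Guid> _handler;

    public InboxConsumer(Action<Guid> handler) => _handler = handler;

    // Returns true if the event was processed, false if it was a duplicate.
    public bool Handle(Guid eventId)
    {
        if (!_inbox.Add(eventId)) return false;   // "INSERT fails": already seen
        try
        {
            _handler(eventId);
            return true;
        }
        catch
        {
            _inbox.Remove(eventId);   // imitate a rollback so a redelivery is retried
            throw;
        }
    }
}
```

Delivering the same event id twice runs the handler once; a handler failure leaves the id unrecorded, so the broker's redelivery gets another chance.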

Summary

The Outbox pattern is the non-negotiable building block for reliable event-driven microservices. It solves the dual-write problem by replacing two writes (DB + bus) with one DB transaction that includes the event as data.

Implement it as your default for any new service that emits events. Pair it with the Inbox pattern on consumers for effectively-once delivery. Pair it with the SAGA pattern when the workflow spans multiple services.

The pattern is conceptually simple — one extra table, one background worker. The discipline is in the operational details: monitoring, janitor, idempotency, schema versioning, retry policy. Get those right and your event-driven architecture will survive years in production.

