How do you handle event ordering in the Outbox pattern?

Question

Randhir Jassal · Accepted Answer

Event ordering matters because some downstream effects depend on the sequence of events. ("UserDeactivated then UserReactivated" must not arrive as "Reactivated then Deactivated".)

What the outbox pattern naturally guarantees

Per-aggregate ordering: events for one order, one user, or one cart arrive in the order they were committed, because the relay reads each aggregate's unsent rows in sequence order and publishes them to a single partition.

What it does NOT guarantee

Global ordering across aggregates — events for different orders may interleave in any order.
Cross-service ordering — if Service A publishes E1 and Service B publishes E2, downstream sees them in arbitrary order.

Implementation — preserve per-aggregate order

CREATE TABLE outbox_events (
    id           UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    aggregate    TEXT NOT NULL,         -- 'order', 'user', etc.
    aggregate_id UUID NOT NULL,         -- the order_id, user_id
    event_type   TEXT NOT NULL,
    payload      JSONB NOT NULL,
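    sequence     BIGINT NOT NULL,       -- per-aggregate sequence
    created_at   TIMESTAMPTZ NOT NULL DEFAULT now(),
    sent_at      TIMESTAMPTZ,
    UNIQUE (aggregate_id, sequence)     -- each position in an aggregate's stream is used once
);

-- sequence must increase in commit order for each aggregate: a per-aggregate counter
-- bumped in the same transaction as the business write, or a global Postgres sequence
-- (gaps allowed) when writes to one aggregate are serialized, e.g. by a row lock.
CREATE INDEX idx_outbox_aggregate_seq ON outbox_events (aggregate_id, sequence)
    WHERE sent_at IS NULL;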
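To make the sequence column concrete, here is a minimal in-memory sketch in C# of how an outbox writer might number events per aggregate. `OutboxEvent` and `InMemoryOutbox` are hypothetical names for illustration; in Postgres the counter would be updated in the same transaction as the business write, which a dictionary only approximates.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for one outbox_events row.
public sealed record OutboxEvent(Guid AggregateId, long Sequence, string EventType, string Payload);

// Hypothetical writer that assigns the next per-aggregate sequence number.
// Each aggregate gets its own 1, 2, 3, ... stream, independent of timestamps.
public sealed class InMemoryOutbox
{
    private readonly Dictionary<Guid, long> _lastSequence = new();
    public List<OutboxEvent> Rows { get; } = new();

    public OutboxEvent Append(Guid aggregateId, string eventType, string payload)
    {
        _lastSequence.TryGetValue(aggregateId, out var last); // 0 if the aggregate is new
        var ev = new OutboxEvent(aggregateId, last + 1, eventType, payload);
        _lastSequence[aggregateId] = ev.Sequence;
        Rows.Add(ev);
        return ev;
    }
}
```

The relay can then order one aggregate's rows by `Sequence` without comparing clocks.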

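When publishing to Kafka, use aggregate_id as the partition key. Kafka guarantees ordering within a partition, so all events for one order land on one partition and are consumed in order. Two caveats: the producer must not reorder on retry (enable idempotence, or allow only one in-flight request per connection), and adding partitions to the topic changes which partition a key maps to.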

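// Confluent.Kafka: ProduceAsync takes the destination topic name as its first argument.
await _producer.ProduceAsync(topic, new Message<string, string> {
    Key   = ev.AggregateId.ToString(),   // partition key: same aggregate -> same partition
    Value = ev.Payload                   // serialized JSON payload
});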
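The ordering argument relies on only one property of keyed partitioning: the same key always lands on the same partition, and each partition is an append-only log. The C# sketch below models that with a stand-in FNV-1a hash; the hash is an assumption for illustration, not the one Kafka clients actually use.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified model of keyed partitioning (hypothetical helper, not a Kafka API).
public static class KeyedPartitioning
{
    // Deterministic FNV-1a hash, so a given key always maps to the same partition.
    public static int PartitionFor(string key, int partitionCount)
    {
        uint hash = 2166136261u;
        foreach (char ch in key) { hash ^= ch; hash *= 16777619u; }
        return (int)(hash % (uint)partitionCount);
    }

    // Appends each event to its partition's log in publish order.
    public static List<(string Key, string Value)>[] Publish(
        IEnumerable<(string Key, string Value)> events, int partitionCount)
    {
        var partitions = Enumerable.Range(0, partitionCount)
            .Select(_ => new List<(string Key, string Value)>())
            .ToArray();
        foreach (var ev in events)
            partitions[PartitionFor(ev.Key, partitionCount)].Add(ev);
        return partitions;
    }
}
```

Events for different keys may share or split partitions arbitrarily, which is exactly why only per-key order survives.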

The "single dispatcher per partition" rule

For per-aggregate ordering to actually hold:

The relay must publish events for the SAME aggregate IN SEQUENCE
Multiple relay instances must NOT publish events for the same aggregate concurrently
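The first rule (publish the same aggregate in sequence) can be sketched as a loop that awaits each publish before starting the next and stops at the first failure, so a later event never overtakes an earlier one. `PendingEvent`, `AggregateRelay`, and the `publish` delegate are hypothetical; `publish` stands in for an awaited broker call such as `ProduceAsync`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical shape of one unsent outbox row for this sketch.
public sealed class PendingEvent
{
    public long Sequence { get; init; }
    public string Payload { get; init; } = "";
    public bool Sent { get; set; }
}

public static class AggregateRelay
{
    // Publishes one aggregate's unsent events strictly in sequence order and
    // returns how many were published. On the first failure the loop stops, so
    // that event and every later one stay unsent for the next relay round.
    public static async Task<int> PublishInOrderAsync(
        IEnumerable<PendingEvent> events, Func<PendingEvent, Task> publish)
    {
        int published = 0;
        foreach (var ev in events.Where(e => !e.Sent).OrderBy(e => e.Sequence))
        {
            try
            {
                await publish(ev); // wait for the ack before touching the next event
            }
            catch (Exception)
            {
                break;
            }
            ev.Sent = true; // in Postgres: UPDATE outbox_events SET sent_at = now() ...
            published++;
        }
        return published;
    }
}
```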

How to enforce this with Postgres + multiple replicas:

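-- Each relay claims one aggregate, publishes ALL its unsent events in order,
-- then moves on. Other relays can claim DIFFERENT aggregates concurrently.
-- FOR UPDATE SKIP LOCKED on one row locks only that row, so a second relay could
-- still pick a later row of the same aggregate; a transaction-scoped advisory
-- lock keyed on aggregate_id claims the whole aggregate instead.
BEGIN;

-- 1. Claim the aggregate of the oldest unsent event that no other relay holds.
SELECT aggregate_id
FROM (
    SELECT aggregate_id FROM outbox_events
    WHERE sent_at IS NULL
    ORDER BY created_at
    LIMIT 100            -- candidate window; lock attempts run only on these rows
) AS candidates
WHERE pg_try_advisory_xact_lock(hashtextextended(aggregate_id::text, 0))
LIMIT 1;

-- 2. Read that aggregate's unsent events in order ($1 = aggregate_id from step 1),
--    publish them, UPDATE their sent_at, then COMMIT to release the lock.
SELECT * FROM outbox_events
WHERE aggregate_id = $1
  AND sent_at IS NULL
ORDER BY sequence;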

This is "aggregate locking": one relay owns one aggregate's events at a time. Throughput scales with the number of aggregates that have pending events (spread across relays), not by parallelising within one aggregate.
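An in-memory analogue of aggregate locking, assuming one claim per aggregate at a time: a claim held by one relay makes the same claim fail for every other relay, while different aggregates can be claimed in parallel. `AggregateClaims` is a hypothetical helper; in Postgres the claim is the lock held by the relay's transaction, and releasing it corresponds to COMMIT or ROLLBACK.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical claim registry: at most one relay owns an aggregate at a time.
public sealed class AggregateClaims
{
    private readonly ConcurrentDictionary<Guid, string> _owners = new();

    // Succeeds only if no other relay currently owns this aggregate.
    public bool TryClaim(Guid aggregateId, string relayId) =>
        _owners.TryAdd(aggregateId, relayId);

    // Removes the claim only if this relay is the owner (atomic key+value match).
    public bool Release(Guid aggregateId, string relayId) =>
        _owners.TryRemove(new KeyValuePair<Guid, string>(aggregateId, relayId));
}
```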

Cross-aggregate ordering — when needed

Sometimes you need "user X's UserCreated event MUST arrive before any of their OrderPlaced events." Two strategies:

A. Single partition

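Publish all events to a single Kafka partition (Key = constant, or a topic with one partition), from a single relay that reads the outbox in global order. You get strict global order, but consumption is limited to one consumer per consumer group.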

B. Sequence number + consumer-side reordering

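Include a global monotonic sequence in each event. Consumers buffer out-of-order events for a short window (~5s) and process them in sequence order. A Postgres sequence can leave gaps (a rolled-back transaction still consumes its value), so once the window expires the consumer has to skip the missing number rather than wait for it forever.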

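// _buffer: Dictionary<long, IncomingEvent> keyed by sequence;
// _expectedSeq: next sequence to process. The ~5s gap timeout is omitted here.
public async Task Handle(IncomingEvent ev)
{
    if (ev.Sequence < _expectedSeq) return;        // duplicate or already processed
    _buffer[ev.Sequence] = ev;
    // Process every event that is now contiguous with the last one processed.
    while (_buffer.TryGetValue(_expectedSeq, out var next))
    {
        _buffer.Remove(_expectedSeq);
        await ProcessAsync(next);
        _expectedSeq = next.Sequence + 1;
    }
}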
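The handler above leaves out the window itself. One possible shape for it, assuming the current time is passed in explicitly so the logic stays deterministic: a reorder buffer that releases events in sequence order and, once the oldest buffered event has waited longer than the timeout, treats the missing number as a permanent gap and skips it. `ReorderBuffer<T>` is a hypothetical helper, not a library type.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical reorder buffer keyed by a global sequence number.
public sealed class ReorderBuffer<T>
{
    private readonly SortedDictionary<long, (T Item, DateTime ArrivedAt)> _pending = new();
    private readonly TimeSpan _gapTimeout;
    private long _expected;

    public ReorderBuffer(long firstExpected, TimeSpan gapTimeout)
    {
        _expected = firstExpected;
        _gapTimeout = gapTimeout;
    }

    // Buffers an event and returns every event that is now ready, in order.
    // Sequences below _expected (duplicates, or late after a skipped gap) are dropped.
    public List<T> Add(long sequence, T item, DateTime now)
    {
        if (sequence >= _expected && !_pending.ContainsKey(sequence))
            _pending[sequence] = (item, now);
        return Drain(now);
    }

    // Also call this periodically, so a gap is skipped even when no new events arrive.
    public List<T> Drain(DateTime now)
    {
        var ready = new List<T>();
        while (_pending.Count > 0)
        {
            var first = _pending.First(); // smallest buffered sequence
            bool isNext = first.Key == _expected;
            bool gapExpired = now - first.Value.ArrivedAt >= _gapTimeout;
            if (!isNext && !gapExpired)
                break; // still waiting for the missing sequence
            _pending.Remove(first.Key);
            ready.Add(first.Value.Item);
            _expected = first.Key + 1; // skips over the gap when gapExpired
        }
        return ready;
    }
}
```

Events that arrive after their gap was skipped are simply dropped in this sketch; a real consumer would more likely log them or route them to a dead-letter topic.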

This is complex — only use it when business requires it.

Common interview question — eventual consistency vs strict ordering

"Can the outbox pattern give you exactly-once with strict global order?"

No. The outbox gives you at-least-once delivery with per-aggregate order. Strict global order requires giving up parallelism, and exactly-once processing requires idempotent consumers (the inbox pattern).

If a question demands "strict global order + exactly-once + high throughput", the answer is that you cannot have all three at once: pick two. For e-commerce you almost always give up strict global order and settle for at-least-once delivery, per-aggregate order, and high throughput, with idempotent consumers turning redeliveries into no-ops.
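The idempotent-consumer half of that trade-off can be sketched as a dedupe check on the event ID. `IdempotentConsumer` is a hypothetical in-memory version; a real inbox would store processed IDs in the consumer's own database, in the same transaction as the side effect.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory idempotent consumer (inbox-style dedupe by event ID).
public sealed class IdempotentConsumer
{
    private readonly HashSet<Guid> _processed = new();
    private readonly Action<Guid, string> _apply;

    public IdempotentConsumer(Action<Guid, string> apply) => _apply = apply;

    // Returns true if the event was applied, false if it was a redelivery.
    public bool Handle(Guid eventId, string payload)
    {
        if (_processed.Contains(eventId)) return false; // duplicate from at-least-once delivery
        _apply(eventId, payload);
        _processed.Add(eventId); // recorded only after the effect succeeded
        return true;
    }
}
```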

Production checklist

Partial index on unsent rows (WHERE sent_at IS NULL), e.g. on (aggregate_id, sequence): fast unsent lookup
Partition key on the bus = aggregate ID — preserves order on consumer side
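Sequence column for resilience to clock skew and commit-order races (don't trust created_at alone; now() is the transaction start time, not the commit time)
Per-aggregate locking (an advisory lock, or FOR UPDATE SKIP LOCKED on a row that stands for the whole aggregate) for multi-instance safety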
Monitoring: max(now() - created_at) for unsent rows — alert at >30s
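The lag alert in the last item boils down to one number: the age of the oldest unsent row. A minimal C# sketch, assuming the unsent `created_at` values have already been loaded; `OutboxLag` is a hypothetical helper and the 30s threshold is the one from the checklist.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper mirroring:
//   SELECT max(now() - created_at) FROM outbox_events WHERE sent_at IS NULL;
public static class OutboxLag
{
    // Age of the oldest unsent row; zero when nothing is pending.
    public static TimeSpan MaxUnsentAge(IEnumerable<DateTimeOffset> unsentCreatedAt, DateTimeOffset now) =>
        unsentCreatedAt.Select(c => now - c).DefaultIfEmpty(TimeSpan.Zero).Max();

    public static bool ShouldAlert(TimeSpan maxAge, TimeSpan threshold) => maxAge > threshold;
}
```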

Related questions

What happens if the Outbox relay crashes mid-batch?

Outbox vs Inbox pattern — what is the difference?

What is the Outbox pattern and what problem does it solve?