How do you handle event ordering in the Outbox pattern?
Event ordering matters because some downstream effects depend on the sequence of events. ("UserDeactivated then UserReactivated" must not arrive as "Reactivated then Deactivated".)
What the outbox pattern naturally guarantees
- Per-aggregate ordering: events for one order, one user, or one cart arrive in the order they were committed, because the relay reads the outbox in created_at order and publishes each aggregate's events, in that order, to a single partition.
What it does NOT guarantee
- Global ordering across aggregates — events for different orders may interleave in any order.
- Cross-service ordering — if Service A publishes E1 and Service B publishes E2, downstream sees them in arbitrary order.
Implementation — preserve per-aggregate order
CREATE TABLE outbox_events (
    id           UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    aggregate    TEXT NOT NULL,        -- 'order', 'user', etc.
    aggregate_id UUID NOT NULL,        -- the order_id, user_id
    event_type   TEXT NOT NULL,
    payload      JSONB NOT NULL,
    sequence     BIGINT NOT NULL,      -- per-aggregate sequence
    created_at   TIMESTAMPTZ NOT NULL DEFAULT now(),
    sent_at      TIMESTAMPTZ
);

-- Per-aggregate sequence via a sequence per aggregate type OR a global one
CREATE INDEX idx_outbox_aggregate_seq ON outbox_events (aggregate_id, sequence)
    WHERE sent_at IS NULL;
When publishing to Kafka, use aggregate_id as the partition key. Kafka guarantees in-partition ordering, so all events for one order land on one partition and are consumed in order.
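// "outbox-events" is a placeholder topic name; ProduceAsync takes the topic as its first argument.
await _producer.ProduceAsync("outbox-events", new Message<string, string>
{
    Key = ev.AggregateId.ToString(), // partition key: same aggregate, same partition
    Value = ev.Payload
});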
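Why a stable key keeps one aggregate's events together can be illustrated with a toy partitioner. This is only a sketch: `ToyPartitioner` and its FNV-1a hash are assumptions made for the example, not the hash that Kafka clients actually use. The point is that the partition is a pure function of the key, so every event keyed by the same aggregate ID lands on the same partition.

```csharp
using System;
using System.Text;

// Hypothetical partitioner for illustration only; real Kafka clients use their own hash.
public static class ToyPartitioner
{
    // Maps a key to a partition index in [0, partitionCount) using 32-bit FNV-1a.
    public static int PartitionFor(string key, int partitionCount)
    {
        if (partitionCount <= 0)
            throw new ArgumentOutOfRangeException(nameof(partitionCount));

        uint hash = 2166136261; // FNV offset basis
        unchecked
        {
            foreach (byte b in Encoding.UTF8.GetBytes(key))
            {
                hash ^= b;
                hash *= 16777619; // FNV prime; overflow wraps by design
            }
        }
        return (int)(hash % (uint)partitionCount);
    }
}
```

Calling `ToyPartitioner.PartitionFor(ev.AggregateId.ToString(), 12)` twice with the same aggregate ID returns the same index, which is the property per-aggregate ordering relies on. Changing the partition count changes the mapping, so ordering is only preserved while the count stays fixed.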
The "single dispatcher per partition" rule
For per-aggregate ordering to actually hold:
- The relay must publish events for the SAME aggregate IN SEQUENCE
- Multiple relay instances must NOT publish events for the same aggregate concurrently
How to enforce this with Postgres + multiple replicas:
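-- Each relay claims the oldest unsent row that no other relay holds, then
-- processes ALL unsent events of that row's aggregate in sequence order.
-- Other relays can claim DIFFERENT aggregates concurrently.
-- Caveat: FOR UPDATE SKIP LOCKED locks only the one claimed row, so another
-- relay can skip it and claim a later row of the SAME aggregate. To make
-- ownership exclusive, also take an aggregate-level lock in the same
-- transaction (for example pg_try_advisory_xact_lock on a hash of aggregate_id).
WITH next_aggregate AS (
    SELECT aggregate_id FROM outbox_events
    WHERE sent_at IS NULL
    ORDER BY created_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
SELECT * FROM outbox_events
WHERE aggregate_id = (SELECT aggregate_id FROM next_aggregate)
  AND sent_at IS NULL
ORDER BY sequence;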
This is "aggregate locking" — one relay owns one aggregate's events at a time. Throughput scales by adding aggregates, not by parallelising one aggregate.
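Once a relay owns an aggregate's batch, it still has to publish that batch in sequence and stop at the first failure. Otherwise a later event could reach the bus before an earlier one that is still being retried. The sketch below assumes a hypothetical `OutboxEvent` row shape and a `tryPublish` delegate standing in for the broker call; neither name comes from a real library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row shape mirroring the outbox_events columns used here.
public sealed record OutboxEvent(Guid AggregateId, long Sequence, string Payload);

public static class AggregateRelay
{
    // Publishes one aggregate's unsent events in sequence order.
    // Stops at the first failed publish so a later event never overtakes an earlier one.
    // Returns the events that were published and can now be marked as sent.
    public static IReadOnlyList<OutboxEvent> PublishInOrder(
        IEnumerable<OutboxEvent> unsentForOneAggregate,
        Func<OutboxEvent, bool> tryPublish)
    {
        var published = new List<OutboxEvent>();
        foreach (var ev in unsentForOneAggregate.OrderBy(e => e.Sequence))
        {
            if (!tryPublish(ev))
                break; // leave this and all later events unsent for the next pass
            published.Add(ev);
        }
        return published;
    }
}
```

The relay would then set sent_at only for the returned events. Anything after the failed event stays unsent and is picked up, still in order, on the next pass.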
Cross-aggregate ordering — when needed
Sometimes you need "user X's UserCreated event MUST arrive before any of their OrderPlaced events." Two strategies:
A. Single partition
Publish all events to a single Kafka partition (Key = constant). This gives strict global order, provided a single relay publishes in commit order, but throughput is limited to one consumer per consumer group.
B. Sequence number + consumer-side reordering
Include a global monotonic sequence in each event. Consumers buffer out-of-order events for a short window (~5s) and process in sequence order.
public async Task Handle(IncomingEvent ev)
{
    _buffer.Add(ev);
    while (_buffer.TryGetNext(_expectedSeq, out var next))
    {
        await ProcessAsync(next);
        _expectedSeq = next.Sequence + 1;
    }
}
This is complex; only use it when the business requires it.
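The handler above relies on a `_buffer` whose `Add` and `TryGetNext` are not shown. One possible shape is sketched below. `ReorderBuffer`, `OrderedConsumer`, and the `IncomingEvent` record are hypothetical names for this example. The sketch assumes the global sequence is gap-free; it does not implement the ~5s timeout window, which a real consumer would need in order to skip a sequence number that never arrives.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Diagnostics.CodeAnalysis;

// Hypothetical event shape; only Sequence matters to the buffer.
public sealed record IncomingEvent(long Sequence, string Payload);

// Holds events that arrived ahead of their turn, keyed by sequence number.
public sealed class ReorderBuffer
{
    private readonly Dictionary<long, IncomingEvent> _pending = new();

    public int Count => _pending.Count;

    // A redelivered duplicate keeps the first copy.
    public void Add(IncomingEvent ev)
    {
        if (!_pending.ContainsKey(ev.Sequence))
            _pending.Add(ev.Sequence, ev);
    }

    // Removes and returns the event with the expected sequence, if it has arrived.
    public bool TryGetNext(long expectedSeq, [NotNullWhen(true)] out IncomingEvent? next)
    {
        if (_pending.TryGetValue(expectedSeq, out next))
        {
            _pending.Remove(expectedSeq);
            return true;
        }
        return false;
    }
}

// Synchronous stand-in for the handler: records the order in which events are processed.
public sealed class OrderedConsumer
{
    private readonly ReorderBuffer _buffer = new();
    private long _expectedSeq;

    public List<long> Processed { get; } = new();

    public OrderedConsumer(long firstSequence) => _expectedSeq = firstSequence;

    public void Handle(IncomingEvent ev)
    {
        if (ev.Sequence < _expectedSeq)
            return; // already processed: at-least-once redelivery

        _buffer.Add(ev);
        while (_buffer.TryGetNext(_expectedSeq, out var next))
        {
            Processed.Add(next.Sequence);
            _expectedSeq = next.Sequence + 1;
        }
    }
}
```

Feeding the consumer sequences 2, 3, 1 (starting from 1) buffers the first two and releases all three in order once 1 arrives. If a sequence number is never delivered, this sketch buffers indefinitely, which is why the time window matters in practice.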
Common interview question — eventual consistency vs strict ordering
"Can the outbox pattern give you exactly-once with strict global order?"
No. The outbox gives you at-least-once delivery with per-aggregate order. Strict global order requires giving up parallelism. Exactly-once delivery is not on offer; you get exactly-once effects by making consumers idempotent (inbox pattern).
If a question demands "strict global order + exactly-once + high throughput", the answer is that this is an impossible trinity: pick two. For e-commerce, you almost always pick "at-least-once + per-aggregate order + high throughput" and make consumers idempotent.
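The idempotent consumer mentioned above can be sketched as a handler that remembers which event IDs it has already applied. `IdempotentHandler` is a hypothetical name, and the in-memory `HashSet` stands in for an inbox table with a unique constraint on the event ID. In a real service the inbox insert and the business change would have to commit in the same database transaction.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for an inbox table keyed by event ID.
public sealed class IdempotentHandler
{
    private readonly HashSet<Guid> _processedIds = new();

    // Counts how many times the business side effect actually ran.
    public int SideEffects { get; private set; }

    // Returns true if the event was applied, false if it was a duplicate delivery.
    public bool Handle(Guid eventId)
    {
        if (!_processedIds.Add(eventId))
            return false; // seen before: skip the side effect

        SideEffects++; // placeholder for the real business change
        return true;
    }
}
```

Delivering the same event ID twice applies the side effect once, which is how at-least-once delivery turns into exactly-once effects.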
Production checklist
- Outbox table indexed by (sent_at) WHERE sent_at IS NULL: fast unsent lookup
- Partition key on the bus = aggregate ID: preserves order on consumer side
- Sequence column for resilience to clock skew (don't trust created_at alone)
- FOR UPDATE SKIP LOCKED for multi-instance safety
- Monitoring: max(now() - created_at) for unsent rows; alert at >30s