Outbox: Transactional Outbox vs Change Data Capture (CDC) — when each?
Both reliably deliver events from a database to a message bus. They differ in how events are extracted and published.
Transactional Outbox (the standard pattern)
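Your application explicitly INSERTs a row into an outbox_events table inside the same transaction as the business write. A background relay polls the table, publishes unsent rows to the bus, and marks them as sent. Delivery is at-least-once, so consumers should be idempotent.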
db.Orders.Add(order);
db.OutboxEvents.Add(new OutboxEvent { EventType = "order.placed", Payload = ... });
await db.SaveChangesAsync(); // both rows commit together
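The relay side can be sketched as follows. This is a minimal illustration, not a production relay: the in-memory list stands in for the outbox_events table, the `publish` delegate stands in for a real bus client, and the `PublishedAt` column and `RunOnce` name are assumptions made for the example. It assumes C# 9 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical outbox row; the column names are assumptions for this sketch.
public sealed class OutboxEvent
{
    public long Id { get; init; }
    public string EventType { get; init; } = "";
    public string Payload { get; init; } = "";
    public DateTime? PublishedAt { get; set; } // null = not yet published
}

public static class OutboxRelay
{
    // One polling pass: publish unpublished rows in insertion order, then mark them.
    // `table` stands in for the outbox_events table, `publish` for the bus client.
    public static int RunOnce(List<OutboxEvent> table, Action<OutboxEvent> publish, int batchSize = 100)
    {
        var batch = table
            .Where(e => e.PublishedAt == null)
            .OrderBy(e => e.Id)
            .Take(batchSize)
            .ToList();

        foreach (var evt in batch)
        {
            publish(evt);                      // may throw; the row then stays unpublished
            evt.PublishedAt = DateTime.UtcNow; // mark only after a successful publish
        }
        return batch.Count;
    }
}

public static class Program
{
    public static void Main()
    {
        var table = new List<OutboxEvent>
        {
            new() { Id = 1, EventType = "order.placed", Payload = "{\"orderId\":42}" },
            new() { Id = 2, EventType = "order.placed", Payload = "{\"orderId\":43}" },
        };
        var sent = new List<string>();

        int first = OutboxRelay.RunOnce(table, e => sent.Add(e.EventType + ":" + e.Payload));
        int second = OutboxRelay.RunOnce(table, e => sent.Add(e.EventType + ":" + e.Payload));

        Console.WriteLine($"{first} published, then {second}"); // prints "2 published, then 0"
    }
}
```

Marking a row only after the publish succeeds is what makes delivery at-least-once: a crash between the two steps re-sends the row on the next pass, which is why consumers should deduplicate on the event id.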
CDC (Change Data Capture)
The application writes ONLY to the business tables. A CDC connector tails the database's transaction log (the write-ahead log in Postgres, the binlog in MySQL) and streams every committed row change to Kafka. Typical setup: Debezium reading via Postgres logical replication or the MySQL binlog.
db.Orders.Add(order);
await db.SaveChangesAsync(); // that's it — Debezium picks it up automatically
Side-by-side
| | Transactional Outbox | CDC (Debezium etc.) |
|---|---|---|
| Application code | Writes outbox row explicitly | No outbox code at all |
| Event schema | Full control via outbox row payload | Database row shape (one event per row change) |
| Latency | Roughly 100-500 ms (set by the poll interval) | Typically tens of milliseconds (log streaming) |
| Throughput ceiling | Limited by polling, ~10k events/sec | High — 100k+ events/sec |
| Operational complexity | Low (one table, one worker) | High (Kafka, Debezium, schema registry, replication slots) |
| Vendor lock-in | Portable across any DB | Tied to specific DB log format |
| Schema evolution | App controls the event shape | Tied to DB schema changes |
| Best for | Most apps under ~10k events/sec | High-volume systems, real-time data pipelines |
When to pick Transactional Outbox
- New service or up to ~10k events/sec
- Team is small, ops capacity is limited
- You want explicit control over event shape (e.g. "OrderPlaced" event with curated fields, not raw row dump)
- You want to be able to switch bus providers (RabbitMQ → Kafka) without much rework
- The most common choice and a sensible default for most services
When to pick CDC
- High-volume systems (sustained load beyond roughly 10k events/sec, which is on the order of hundreds of millions of events/day)
- You already operate Kafka + Debezium for analytics
- Real-time data warehousing alongside operational events
- You can tolerate "raw row change" event shape OR have a separate transformer
- Common in companies past 50 engineers with dedicated platform teams
Hybrid pattern
Many large systems use BOTH:
- Transactional outbox for business events (intentful, curated payloads: "OrderPlaced", "PaymentSucceeded")
- CDC for data replication (continuous sync of OLTP data into OLAP / data lake / read replicas)
Common interview follow-up
"What's the downside of CDC for business events specifically?"
Answer:
- The event shape is "table row diff" — not "business event". Consumers need to reconstruct meaning ("if status went from 'pending' to 'paid', that's the OrderPaid event").
- Multi-row business events become multiple CDC events. "Order placed with 3 items" → 1 INSERT in orders + 3 INSERTs in order_items → 4 events, not 1.
- Schema changes ripple through every consumer. With explicit outbox, you control the event API independent of DB schema.
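The first point can be made concrete with a consumer-side translator. This is a sketch under assumptions: it reads a Debezium-style change event with the envelope fields `op`, `before`, and `after`, assumes the JSON arrives without the schema wrapper, and assumes a hypothetical `orders` table with a `status` column. `CdcTranslator` and the event names are made up for the example.

```csharp
using System;
using System.Text.Json;

public static class CdcTranslator
{
    // Maps a row-level change on the (hypothetical) orders table to a business
    // event name, or null when the change carries no business meaning.
    public static string? ToBusinessEvent(string changeEventJson)
    {
        using var doc = JsonDocument.Parse(changeEventJson);
        var root = doc.RootElement;

        string? op = root.TryGetProperty("op", out var opEl) && opEl.ValueKind == JsonValueKind.String
            ? opEl.GetString()
            : null;

        if (op == "c") return "OrderPlaced"; // "c" = row inserted
        if (op != "u") return null;          // only updates are inspected further

        // The consumer has to infer intent from the before/after diff.
        return (Status(root, "before"), Status(root, "after")) switch
        {
            ("pending", "paid") => "OrderPaid",
            _ => null
        };
    }

    // Reads before.status or after.status; tolerates a missing or null row image.
    private static string? Status(JsonElement root, string side) =>
        root.TryGetProperty(side, out var row)
        && row.ValueKind == JsonValueKind.Object
        && row.TryGetProperty("status", out var s)
        && s.ValueKind == JsonValueKind.String
            ? s.GetString()
            : null;
}

public static class Program
{
    public static void Main()
    {
        string update =
            "{\"op\":\"u\"," +
            "\"before\":{\"id\":42,\"status\":\"pending\"}," +
            "\"after\":{\"id\":42,\"status\":\"paid\"}}";

        Console.WriteLine(CdcTranslator.ToBusinessEvent(update) ?? "(none)"); // prints "OrderPaid"
    }
}
```

Every consumer that cares about "OrderPaid" needs a copy of this mapping, and each copy breaks when the status column is renamed or its values change. With an explicit outbox, the producer states the intent once.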
Migration pattern
Many teams that need CDC end up running the outbox table THROUGH Debezium, which gets the best of both:
- Application code keeps writing to a clean outbox table (intentful event shape)
- Debezium streams that table to Kafka (sub-100ms latency, high throughput)
- No polling load on the DB
- Schema stays under app control
This "Outbox + CDC" hybrid is a common production-grade architecture for high-volume event-driven systems, used at companies like Shopify, Confluent, and Zalando.
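On the application side, an outbox row for this setup might look like the sketch below. The column set (id, aggregate type, aggregate id, type, payload) mirrors what Debezium's outbox event router expects by default as far as I know; verify the exact column names against the Debezium version you run. `OutboxRow` and `Outbox.For` are hypothetical names for the example, which assumes C# 9 or later.

```csharp
using System;
using System.Text.Json;

// One curated business event per row. The column set is an assumption modeled
// on the defaults of Debezium's outbox event router; check your version's docs.
public sealed record OutboxRow(
    Guid Id,              // unique event id, usable by consumers for deduplication
    string AggregateType, // e.g. "order"; typically used to route to a topic
    string AggregateId,   // e.g. the order id; typically used as the message key
    string Type,          // business event name, e.g. "OrderPlaced"
    string Payload);      // curated JSON body, independent of the table schema

public static class Outbox
{
    // Builds the row to insert in the same transaction as the business write.
    public static OutboxRow For<T>(string aggregateType, string aggregateId, string type, T payload) =>
        new(Guid.NewGuid(), aggregateType, aggregateId, type, JsonSerializer.Serialize(payload));
}

public static class Program
{
    public static void Main()
    {
        // "Order placed with 3 items" becomes ONE event, not 1 + 3 row changes.
        var row = Outbox.For("order", "42", "OrderPlaced", new { orderId = 42, itemCount = 3 });

        Console.WriteLine($"{row.Type} {row.Payload}"); // prints: OrderPlaced {"orderId":42,"itemCount":3}
    }
}
```

The application code is the same as in the plain outbox pattern. Only the relay changes: instead of a polling worker, the CDC connector picks the inserted row up from the transaction log.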