Distributed tracing in microservices — OpenTelemetry, span context, sampling

Question

Randhir Jassal · Accepted Answer

Distributed tracing reconstructs one logical operation across N services as a trace: a tree of spans that all share the same `trace_id`.

## Core vocabulary

- Trace: the whole request, represented as one tree.
- Span: one unit of work (an HTTP call, a DB query). It has a `span_id`, a `parent_span_id`, start and end times, and attributes.
- Context propagation: the W3C `traceparent` header carries the trace ID and the parent span ID (plus the trace flags) between services.
- Sampler: decides which traces to keep. Head-based samplers decide at trace start; tail-based samplers decide after observing the whole trace.

## OpenTelemetry: .NET setup

Spans are emitted automatically for incoming HTTP, outgoing HTTP, and EF Core queries once the corresponding instrumentation packages are registered. Add your own spans with `ActivitySource.StartActivity` around work the automatic instrumentation does not cover.

## Sampling: the cost knob

Sampling 100% of traces is rarely affordable at scale. Common configurations:

| Strategy | Pros | Cons |
|---|---|---|
| Always-on | Full fidelity | High storage cost |
| Probabilistic 1% | Cheap | Rare bugs are invisible |
| Parent-based | Decision propagates, so traces stay consistent | The trace-start service decides for the whole graph |
| Tail sampling | Keeps all errors plus a percentage of successes | Requires a collector that buffers spans in memory |

## What good looks like

- Errors and slow traces: always keep.
- Successful sub-100 ms traces: sample 1 to 5%.
- Add `enduser.id`, `db.statement`, and `http.status_code` as standard attributes.
- Limit cardinality on tag values. Never put `userId` on a metric label; it is fine as a span attribute.

## Common pitfalls

- Lost trace context across async boundaries: `Activity.Current` flows with the async execution context, but background tasks need the parent context captured explicitly when the work is queued.
- Logs without a trace ID: instrument your logger to enrich every entry with the active trace and span IDs so that logs and traces correlate.
- Trace storage explosion: span attributes are cheap individually and lethal in aggregate. Set a budget per service.
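To make the context propagation item concrete, here is a minimal, language-neutral sketch in Python of reading and re-emitting a `traceparent` header. It is not the OpenTelemetry API: `SpanContext`, `parse_traceparent`, `child_of`, and `format_traceparent` are hypothetical names chosen for illustration, and the parser only accepts version `00` of the header, which is a simplification.

```python
import re
import secrets
from dataclasses import dataclass
from typing import Optional

# W3C Trace Context, version 00: version-traceid-parentid-flags, lowercase hex.
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


@dataclass(frozen=True)
class SpanContext:
    """Hypothetical container for the fields a service needs to continue a trace."""
    trace_id: str   # 32 hex chars, shared by every span in the trace
    span_id: str    # 16 hex chars, identifies the current span
    sampled: bool   # lowest bit of the trace flags


def parse_traceparent(header: str) -> Optional[SpanContext]:
    """Return the caller's context, or None if the header is malformed."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    # All-zero IDs are invalid, so treat them like a missing header.
    if set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    return SpanContext(trace_id, parent_id, bool(int(flags, 16) & 0x01))


def child_of(parent: SpanContext) -> SpanContext:
    """Start a child span: same trace_id, fresh span_id, inherited sampling flag."""
    return SpanContext(parent.trace_id, secrets.token_hex(8), parent.sampled)


def format_traceparent(ctx: SpanContext) -> str:
    """Build the header value to attach to an outgoing request."""
    flags = "01" if ctx.sampled else "00"
    return f"00-{ctx.trace_id}-{ctx.span_id}-{flags}"
```

A service would call `parse_traceparent` on the incoming request, create its own span with `child_of`, and send `format_traceparent(child)` on every outgoing call so the next hop records this span as its parent. In practice the OpenTelemetry SDK does this for you; the sketch only shows what travels on the wire.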
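The sampling strategies in the table can be sketched as three small decision functions. This is an illustrative Python sketch under stated assumptions, not the OpenTelemetry sampler implementation: `ratio_sample`, `head_sample`, `tail_keep`, and `FinishedSpan` are hypothetical names, and the 100 ms threshold and 1% success ratio are defaults taken from the guidance above.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


def ratio_sample(trace_id: str, ratio: float) -> bool:
    """Probabilistic decision derived from the trace ID itself.

    Because the result depends only on trace_id, every service that
    evaluates it with the same ratio reaches the same answer.
    """
    bound = int(ratio * (1 << 64))
    # Interpret the last 16 hex chars (8 bytes) of the trace ID as an integer.
    return int(trace_id[16:], 16) < bound


def head_sample(trace_id: str, parent_sampled: Optional[bool], ratio: float) -> bool:
    """Parent-based head sampling: follow the caller's decision if there is one."""
    if parent_sampled is not None:
        return parent_sampled
    # No parent means this service starts the trace and decides for the whole graph.
    return ratio_sample(trace_id, ratio)


@dataclass
class FinishedSpan:
    """Hypothetical minimal view of a completed span held in a collector buffer."""
    duration_ms: float
    is_error: bool


def tail_keep(
    trace_id: str,
    spans: Sequence[FinishedSpan],
    slow_ms: float = 100.0,
    success_ratio: float = 0.01,
) -> bool:
    """Tail sampling policy: keep errors and slow traces, sample the rest."""
    if any(span.is_error for span in spans):
        return True
    if max((span.duration_ms for span in spans), default=0.0) >= slow_ms:
        return True
    return ratio_sample(trace_id, success_ratio)
```

The trade-off from the table shows up directly in the signatures: `head_sample` needs only the trace ID and the parent's flag, so it is cheap and can run in every service, while `tail_keep` needs the full list of finished spans, which is why it requires a collector that buffers each trace until it completes.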
