What is the difference between liveness and readiness probes, and why does it matter for service discovery?

Question

Randhir Jassal · Accepted Answer

Liveness and readiness probes look similar but answer two completely different questions — and confusing them is one of the most common causes of microservice outages.

Liveness — "is this process still alive?"

If it fails: Kubernetes restarts the pod.
What it checks: can the process respond to a basic HTTP request? Is it deadlocked? Has it crashed without exiting?
Should NOT check: downstream dependencies (DB, message broker, other services). A transient DB blip will kill your pod for no reason.

app.MapHealthChecks("/health/live", new() {
    Predicate = c => c.Tags.Contains("live")  // only the "self" check
});

Readiness — "can this pod serve traffic right now?"

If it fails: Kubernetes removes the pod from the Service''s Endpoints list but doesn''t kill it.
What it checks: all the dependencies you need to handle a real request — DB connection, cache, critical downstreams.
Why this matters for discovery: an unready pod is invisible to other services. When it recovers, it''s added back automatically.

app.MapHealthChecks("/health/ready", new() {
    Predicate = c => c.Tags.Contains("ready")  // DB + downstreams
});

Why the split matters

Imagine your DB has a 10-second hiccup:

Same probe for both: every pod fails liveness → Kubernetes restarts them all → cold-start storm → real outage.
Split probes: every pod fails readiness → drained from Service → no traffic → no errors. When DB recovers, readiness passes → pods are added back. Zero restarts. Self-healing.

Plus the startup probe (Kubernetes 1.18+)

For slow-starting services (EF Core migrations, heavy DI, JIT warmup), add a startupProbe. It runs first; liveness/readiness only kick in after startup passes. This prevents a slow start being mistaken for a crash.

Rule of thumb

Liveness = self-check only. Cheap, fast, never touches the network.
Readiness = checks every dependency that matters for serving traffic.
Startup = grace period for slow boots.

Get this right and your service discovery is self-healing. Get it wrong and you''ll page on-call every time a DB twitches.

What is the difference between liveness and readiness probes, and why does it matter for service discovery?

Liveness — "is this process still alive?"

Readiness — "can this pod serve traffic right now?"

Why the split matters

Plus the startup probe (Kubernetes 1.18+)

Rule of thumb

What is the difference between liveness and readiness probes, and why does it matter for service discovery?

Liveness — "is this process still alive?"

Readiness — "can this pod serve traffic right now?"

Why the split matters

Plus the startup probe (Kubernetes 1.18+)

Rule of thumb

What is the difference between client-side and server-side service discovery?

How does service discovery work in Kubernetes without installing Consul or Eureka?

What is service discovery and why do microservices need it?

What is the difference between client-side and server-side service discovery?

How does service discovery work in Kubernetes without installing Consul or Eureka?

What is service discovery and why do microservices need it?

Related questions

Related questions