Kubernetes Liveness Probes for MCP Servers — detect hangs before users do
A Kubernetes liveness probe restarts a container that has entered a broken state it cannot recover from on its own. For MCP servers — where a hung Node.js event loop can keep the process listening on a port while every incoming JSON-RPC message goes unanswered — choosing the right probe type and threshold is the difference between a self-healing deployment and an outage that lasts until an engineer intervenes.
TL;DR
Use an httpGet liveness probe against a dedicated /live endpoint that exercises the Node.js event loop, not a TCP socket check. A TCP probe confirms the port is open; it cannot detect a hung event loop, an OOM-killed worker thread, or a deadlocked async queue. Set initialDelaySeconds generously enough to survive startup (60–90 s for servers with heavy startup work such as migrations or cache warming, less for lighter ones), and keep failureThreshold at 3 so a transient load spike does not trigger a restart. Pair the probe with AliveMCP for continuous external monitoring, which catches the failure class Kubernetes probes structurally cannot see: the server appearing healthy inside the cluster while the public endpoint or TLS certificate is broken from the outside.
Why liveness probes matter specifically for MCP servers
MCP servers are long-lived process servers. Unlike a stateless REST API where each request creates a fresh execution context, MCP servers maintain open SSE connections, accumulate in-memory session state, and run tool handlers that call external APIs, query databases, and spawn child processes. Any of these can enter a state the process cannot self-recover from:
- Event loop exhaustion. A synchronous operation inside an async handler — a large JSON parse, a blocking crypto operation, a deep recursive traversal of a tool response — can starve the Node.js event loop for several seconds. If a Kubernetes TCP probe fires during that window it still succeeds, because the port is open; the event loop is just not processing incoming connections.
- Deadlocked async queue. A Promise that resolves into another Promise that waits on a shared resource held by the first creates a deadlock. The process is running, memory usage is stable, the port is open, but no MCP request will ever complete.
- OOM-killed worker thread. A worker thread that crashes due to out-of-memory is not the same as the main process crashing. The main process remains alive and accepting connections; tool calls dispatched to the dead worker silently time out.
- Connection pool saturation. All database connections are held by long-running tool calls. New requests queue indefinitely. The process appears healthy to a TCP probe while every user-visible operation fails.
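In all four cases, a TCP probe will report the pod as live. A probe that exercises the event loop catches the first case directly. The other three leave the loop itself running, so the /live handler can only catch them if it also inspects the affected resource (for example, a worker heartbeat or the pool's idle count).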
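The gap between "port open" and "server answering" can be reproduced locally. The sketch below is a simulation, not Kubernetes code: a socket that listens but never calls accept() stands in for a process whose event loop is hung, and the hypothetical helpers tcp_probe and http_probe approximate what a tcpSocket and an httpGet probe would observe.

```python
import http.client
import socket


def tcp_probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Approximate a tcpSocket probe: pass as soon as the TCP handshake completes."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_probe(host: str, port: int, path: str = "/live", timeout: float = 1.0) -> bool:
    """Approximate an httpGet probe: pass only if a 2xx/3xx response arrives in time."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return 200 <= conn.getresponse().status < 400
    except (OSError, http.client.HTTPException):
        return False
    finally:
        conn.close()


def make_hung_listener() -> socket.socket:
    """Stand-in for a hung server: the port is open, but accept() is never called."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    listener.listen(8)
    return listener


if __name__ == "__main__":
    hung = make_hung_listener()
    host, port = hung.getsockname()
    print("tcp probe passes:", tcp_probe(host, port))
    print("http probe passes:", http_probe(host, port, timeout=0.5))
    hung.close()
```

The kernel completes the TCP handshake on behalf of the listening socket, which is why the first check can pass even though no application code ever handles the request.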
Probe type comparison for MCP servers
| Probe type | What it checks | Catches event loop hang? | Recommended for MCP? |
|---|---|---|---|
| tcpSocket | TCP port open and accepting connections | No: port stays open even when event loop is hung | No: insufficient for production |
| httpGet → trivial path | HTTP 200 from a hardcoded response (no async work) | No: a pre-allocated response buffer can return 200 without touching the event loop | Partial: better than TCP but misses most hangs |
| httpGet → /live with async probe | HTTP 200 from a handler that awaits a minimal async operation (DB ping or Promise.race) | Yes: event loop must process the handler | Yes: recommended |
| exec → curl + MCP initialize | Full MCP initialize handshake over localhost | Yes: exercises the full MCP message parser | No for liveness: too slow; use for readiness probe instead |
Implementing a /live endpoint that detects event loop hangs
The key insight is that a liveness endpoint must enqueue work on the event loop and wait for it to complete. The simplest correct implementation is a setImmediate or Promise that resolves in the next iteration of the event loop. If the event loop is hung, the callback never fires, the HTTP response is never sent, and the probe times out — which is exactly what you want Kubernetes to see.
Node.js / Express implementation
import express from 'express';
import { promisify } from 'util';

const app = express();
const setImmediateAsync = promisify(setImmediate);

// /live: liveness probe. The event loop must process this handler to respond.
app.get('/live', async (req, res) => {
  // setImmediate yields to the event loop; if the loop is hung, this never resolves
  await setImmediateAsync();

  // Optional: report pool exhaustion here. Note that a 503 from /live makes Kubernetes
  // restart the container, so pool saturation is usually better signalled through /ready.
  // const poolStatus = db.pool.idleCount;
  // if (poolStatus === 0) return res.status(503).json({ live: false, reason: 'pool_exhausted' });

  res.json({ live: true, ts: Date.now() });
});

// /ready: readiness probe, checked separately (see mcp-server-readiness-probe guide)
app.get('/ready', async (req, res) => {
  // More thorough: verify DB connection, MCP tool registry loaded, etc.
  // runReadinessChecks() is application-specific and not defined in this snippet.
  const checks = await runReadinessChecks();
  const ok = checks.every(c => c.ok);
  res.status(ok ? 200 : 503).json({ ready: ok, checks });
});

// /health: combined human-readable status (not used for k8s probes)
app.get('/health', (req, res) => {
  res.json({ status: 'ok', uptime: process.uptime(), version: process.env.npm_package_version });
});
The setImmediateAsync trick is deliberately minimal. It adds essentially zero latency to a healthy server (well under 1 ms), but when the event loop is hung the scheduled callback never runs, no response is sent, and the probe fails once timeoutSeconds elapses. Do not make the liveness endpoint perform outbound network calls, database queries, or any I/O that could fail due to an external dependency going down. External dependency failures should be handled by circuit breakers and readiness probes, not by restarting the container.
FastAPI / Python implementation
from fastapi import FastAPI
from fastapi.responses import JSONResponse
import asyncio

app = FastAPI()

@app.get("/live")
async def liveness():
    # asyncio.sleep(0) yields to the event loop; same principle as setImmediate
    await asyncio.sleep(0)
    return {"live": True}

@app.get("/ready")
async def readiness():
    # run_readiness_checks() is application-specific and not defined in this snippet
    checks = await run_readiness_checks()
    ok = all(c["ok"] for c in checks)
    return JSONResponse({"ready": ok, "checks": checks}, status_code=200 if ok else 503)
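The yield-based check can be exercised without a web framework. The following is a minimal sketch using only the standard library: an asyncio loop in a background thread stands in for the server, and the hypothetical helper loop_is_responsive plays the role of the kubelet waiting for /live to answer within a timeout. A blocking time.sleep scheduled on the loop simulates a synchronous call that starves it.

```python
import asyncio
import concurrent.futures
import threading
import time


def start_loop_in_thread() -> tuple[asyncio.AbstractEventLoop, threading.Thread]:
    """Run an asyncio event loop in a background thread (stand-in for the server)."""
    loop = asyncio.new_event_loop()
    thread = threading.Thread(target=loop.run_forever, daemon=True)
    thread.start()
    return loop, thread


def loop_is_responsive(loop: asyncio.AbstractEventLoop, timeout: float) -> bool:
    """Schedule a zero-length sleep on the loop and wait for it, as GET /live does."""
    future = asyncio.run_coroutine_threadsafe(asyncio.sleep(0), loop)
    try:
        future.result(timeout=timeout)
        return True
    except concurrent.futures.TimeoutError:
        future.cancel()
        return False


if __name__ == "__main__":
    loop, thread = start_loop_in_thread()
    print("healthy loop responsive:", loop_is_responsive(loop, timeout=0.5))

    # Simulate a synchronous call that blocks the loop for one second.
    loop.call_soon_threadsafe(time.sleep, 1.0)
    print("blocked loop responsive:", loop_is_responsive(loop, timeout=0.2))

    loop.call_soon_threadsafe(loop.stop)
    thread.join()
```

The check itself does no I/O, so the only thing that can make it miss its deadline is the loop not getting around to running it.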
Kubernetes liveness probe configuration
With the /live endpoint implemented, wire it into the Pod spec. The key parameters are initialDelaySeconds, periodSeconds, timeoutSeconds, and failureThreshold. Getting these wrong causes one of two failure modes: too aggressive (restarting healthy pods under load) or too lenient (leaving hung pods alive for minutes).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mcp-server
spec:
  replicas: 2
  selector:
    matchLabels:
      app: mcp-server
  template:
    metadata:
      labels:
        app: mcp-server
    spec:
      containers:
        - name: mcp-server
          image: your-registry/mcp-server:v1.2.3
          ports:
            - containerPort: 3000
          livenessProbe:
            httpGet:
              path: /live
              port: 3000
              # Do NOT set scheme: HTTPS here. The probe hits the container port directly,
              # before TLS termination at the Ingress or Service Mesh layer.
            initialDelaySeconds: 60   # How long to wait before first probe after container start
            periodSeconds: 15         # How often to probe
            timeoutSeconds: 5         # Probe request must complete within this window
            successThreshold: 1       # 1 success clears the failure count
            failureThreshold: 3       # 3 consecutive failures → container restart
          readinessProbe:
            httpGet:
              path: /ready
              port: 3000
            initialDelaySeconds: 30   # Readiness checked sooner: remove from LB before hang
            periodSeconds: 10
            timeoutSeconds: 5
            successThreshold: 1
            failureThreshold: 3
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
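How long a hung container survives under a given configuration follows from three of these parameters. The sketch below is a back-of-the-envelope estimate, not kubelet behaviour taken from source: the class name ProbeTiming is hypothetical, and the arithmetic assumes every probe during the hang times out, that timeoutSeconds does not exceed periodSeconds, and that kubelet scheduling jitter is negligible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeTiming:
    period_seconds: int
    timeout_seconds: int
    failure_threshold: int

    def detection_window(self) -> tuple[int, int]:
        """Return (best, worst) seconds from hang onset to the restart decision."""
        # Best case: the hang starts just before a probe fires, so the first
        # failure is recorded after one timeout and the rest follow one period apart.
        best = (self.failure_threshold - 1) * self.period_seconds + self.timeout_seconds
        # Worst case: the hang starts just after a probe succeeded, so a full
        # period passes before the first failing probe is even sent.
        worst = best + self.period_seconds
        return best, worst


if __name__ == "__main__":
    example = ProbeTiming(period_seconds=15, timeout_seconds=5, failure_threshold=3)
    print(example.detection_window())  # (35, 50)

    relaxed = ProbeTiming(period_seconds=15, timeout_seconds=10, failure_threshold=5)
    print(relaxed.detection_window())  # (70, 85)
```

Under these assumptions the manifest above restarts a hung container after roughly 35 to 50 seconds, which is a useful number to compare against how long clients are willing to wait.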
Tuning initialDelaySeconds
This is the most commonly mis-tuned parameter. Set it too low and Kubernetes restarts the container during normal startup, while Node.js is still compiling TypeScript, running database migrations, warming a connection pool, or loading ML model weights. Set it too high and a container that hangs shortly after starting goes undetected until the delay has elapsed.
| Server startup profile | Typical cold start | Recommended initialDelaySeconds |
|---|---|---|
| Minimal: no migrations, no preloading | 2–5 s | 15 |
| Medium: connection pool, config fetch | 5–15 s | 30 |
| Heavy: migrations, ML model load, cache warm | 30–60 s | 90 |
| Container image pull on first deploy | Happens before the container starts, so it does not count against initialDelaySeconds | No change; for variable startup times use startupProbe instead |
Using a startupProbe for slow-starting MCP servers
Kubernetes has supported a startupProbe since 1.16 (alpha; enabled by default from 1.18 and stable in 1.20). It disables liveness and readiness checks until the startup probe succeeds. This is the correct tool for MCP servers with highly variable startup times, because it removes the initialDelaySeconds guess-and-tune cycle entirely.
startupProbe:
  httpGet:
    path: /live
    port: 3000
  failureThreshold: 30   # Allow up to 30 * 10 s = 5 minutes of startup time
  periodSeconds: 10
# Once /live returns 200, the startupProbe succeeds and liveness/readiness probes take over.
# No initialDelaySeconds needed on livenessProbe when startupProbe is configured.
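The startup budget is simply failureThreshold multiplied by periodSeconds, so the threshold can be derived from the slowest startup you expect to see. A small sketch of that arithmetic follows; both function names are hypothetical helpers, not part of any Kubernetes client.

```python
import math


def startup_budget_seconds(failure_threshold: int, period_seconds: int) -> int:
    """Total time the startupProbe allows before the container is restarted."""
    return failure_threshold * period_seconds


def startup_failure_threshold(max_startup_seconds: float, period_seconds: int) -> int:
    """Smallest failureThreshold whose budget covers the slowest expected startup."""
    if max_startup_seconds <= 0 or period_seconds <= 0:
        raise ValueError("both arguments must be positive")
    return math.ceil(max_startup_seconds / period_seconds)


if __name__ == "__main__":
    print(startup_budget_seconds(failure_threshold=30, period_seconds=10))      # 300
    print(startup_failure_threshold(max_startup_seconds=95, period_seconds=10))  # 10
```

Leaving some headroom above the measured worst case is usually cheap here, since the budget only matters for containers that never become healthy.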
What liveness probes cannot detect: the external monitoring gap
Kubernetes liveness probes run inside the cluster. The kubelet connects to the container's port over the cluster network — bypassing the Ingress controller, the load balancer, and the TLS termination layer that real MCP clients traverse. This creates a structural blind spot: your pod can be fully live from Kubernetes' perspective while the public endpoint is completely unreachable.
| Failure scenario | Liveness probe result | User experience |
|---|---|---|
| Node.js event loop hung | Fails; after failureThreshold consecutive timeouts the container is restarted | Detected and auto-recovered |
| TLS certificate expired at Ingress | Passes — probe bypasses Ingress | All clients see TLS handshake failure |
| Ingress misconfiguration after nginx upgrade | Passes — probe bypasses Ingress | 502 Bad Gateway or connection refused |
| DNS record pointing to wrong IP after migration | Passes — probe uses pod IP, not DNS | MCP clients cannot reach the endpoint |
| MCP protocol version mismatch after SDK upgrade | Passes if /live endpoint doesn't call MCP initialize | MCP clients cannot complete handshake |
| Upstream API rate-limited → tool calls silently fail | Passes — /live does not call upstream APIs | Tool calls return isError:true |
AliveMCP fills these gaps with continuous external monitoring. Every minute, AliveMCP sends a real MCP initialize request to your public endpoint — traversing the full network path including DNS, TLS, Ingress, and Service — and verifies the JSON-RPC response. If anything in that path breaks, AliveMCP pages your team before users notice.
Kubernetes liveness probes catch internal process-level failures; AliveMCP continuous external monitoring catches path-level and protocol-level failures. Together they cover both sides of the MCP server failure surface. Neither approach alone is sufficient.
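To make "a real MCP initialize request" concrete, here is a sketch of the JSON-RPC message an external check would send and the minimum it might verify in the reply. This is not AliveMCP's implementation. The function names, the client name, and the protocolVersion value are assumptions for illustration, and the transport details (HTTP headers, SSE framing) are deliberately left out.

```python
import json
from typing import Any


def build_initialize_request(request_id: int = 1,
                             protocol_version: str = "2025-03-26") -> dict[str, Any]:
    """JSON-RPC 2.0 body for an MCP initialize call (client info is a placeholder)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": protocol_version,
            "capabilities": {},
            "clientInfo": {"name": "external-check", "version": "0.0.1"},
        },
    }


def check_initialize_response(body: str, request_id: int = 1) -> tuple[bool, str]:
    """Return (ok, reason) for the raw response body of an initialize call."""
    try:
        message = json.loads(body)
    except json.JSONDecodeError:
        return False, "response is not valid JSON"
    if not isinstance(message, dict) or message.get("jsonrpc") != "2.0":
        return False, "not a JSON-RPC 2.0 message"
    if message.get("id") != request_id:
        return False, "response id does not match request id"
    if "error" in message:
        return False, "server returned a JSON-RPC error"
    result = message.get("result")
    if not isinstance(result, dict) or "protocolVersion" not in result:
        return False, "result is missing protocolVersion"
    return True, "ok"


if __name__ == "__main__":
    print(json.dumps(build_initialize_request(), indent=2))
    # An HTML error page from a broken Ingress fails the very first check.
    print(check_initialize_response("<html>502 Bad Gateway</html>"))
```

A check like this fails on an Ingress error page or a protocol mismatch even when the pod's own /live endpoint keeps returning 200.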
Debugging failed liveness probes
When a pod is being restarted due to liveness probe failures, the first step is to understand whether the restart is correct (catching a real hang) or incorrect (over-aggressive thresholds).
# Check probe failure events for a pod
kubectl describe pod <pod-name> -n <namespace>
# Look for: "Liveness probe failed: Get ... connection refused" or "context deadline exceeded"

# Check the kubelet events for restart loop
kubectl get events --field-selector reason=BackOff -n <namespace>

# Call the probe endpoint from inside the container (requires curl in the image).
# This is close to, but not identical to, the kubelet's path: the kubelet uses the pod IP,
# so a server bound only to 127.0.0.1 passes here and still fails the real probe.
kubectl exec -it <pod-name> -n <namespace> -- curl -sf http://localhost:3000/live

# If the probe succeeds from inside the pod but Kubernetes keeps restarting,
# check the restart count; repeated restarts put the container into CrashLoopBackOff:
kubectl get pod <pod-name> -n <namespace> -o jsonpath='{.status.containerStatuses[0].restartCount}'

# Stream the liveness probe endpoint logs (only useful if the server logs requests)
kubectl logs <pod-name> -n <namespace> -f | grep "GET /live"
Common misconfiguration patterns
| Symptom | Likely cause | Fix |
|---|---|---|
| Pod restarted immediately after deploy | initialDelaySeconds too low; probe fires before server is ready | Increase initialDelaySeconds or add startupProbe |
| Pod restarted during traffic spikes | timeoutSeconds too low; event loop delayed under load | Increase timeoutSeconds to 10, or increase failureThreshold to 5 |
| Pod never restarted despite hung event loop | Probe endpoint uses hardcoded synchronous response; does not exercise event loop | Rewrite endpoint to use setImmediate or asyncio.sleep(0) |
| Probe succeeds but MCP clients see errors | Probe bypasses Ingress; external path broken (TLS, DNS, Ingress rule) | Add AliveMCP external monitoring |
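Several of these patterns can be spotted mechanically before deploying. The sketch below lints a livenessProbe mapping as it would appear after loading the manifest into a dict. The function name and the cut-off values (15 s and 5 s, borrowed from the tables and example in this guide) are assumptions, so treat the warnings as heuristics rather than rules. The only hard rule in the list is that Kubernetes requires successThreshold to be 1 for liveness probes.

```python
from typing import Any

# Kubernetes defaults for probe fields omitted from the manifest.
DEFAULTS = {
    "initialDelaySeconds": 0,
    "periodSeconds": 10,
    "timeoutSeconds": 1,
    "successThreshold": 1,
    "failureThreshold": 3,
}


def lint_liveness_probe(probe: dict[str, Any], has_startup_probe: bool = False) -> list[str]:
    """Return human-readable warnings for a livenessProbe mapping."""
    cfg = {**DEFAULTS, **probe}
    warnings: list[str] = []
    if "tcpSocket" in probe:
        warnings.append("tcpSocket only checks that the port is open; use httpGet on /live")
    if cfg["initialDelaySeconds"] < 15 and not has_startup_probe:
        warnings.append("initialDelaySeconds is under 15 s and no startupProbe is set")
    if cfg["timeoutSeconds"] < 5:
        warnings.append("timeoutSeconds is under 5 s; a busy event loop may miss it")
    if cfg["successThreshold"] != 1:
        warnings.append("successThreshold must be 1 for liveness probes")
    return warnings


if __name__ == "__main__":
    for warning in lint_liveness_probe({"tcpSocket": {"port": 3000}}):
        print("warning:", warning)
```

Running it against the probe from the Deployment example earlier in this guide produces no warnings, while a bare tcpSocket probe with default timings trips three of the four checks.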
Liveness probes for SSE-transport MCP servers
MCP servers using the SSE transport maintain long-lived open connections — some lasting hours — between the client and the server process. Kubernetes cannot gracefully terminate these connections without causing the client to see a disconnection error. Two settings minimize the disruption when a liveness-probe-triggered restart closes these connections.
spec:
  terminationGracePeriodSeconds: 60   # Give existing SSE connections 60 s to drain
  containers:
    - name: mcp-server
      lifecycle:
        preStop:
          exec:
            # Signal the server to stop accepting new connections and drain existing ones.
            # -TERM is the portable spelling; some /bin/sh builtins reject the SIG prefix.
            command: ["/bin/sh", "-c", "kill -TERM 1 && sleep 55"]
The terminationGracePeriodSeconds must exceed the preStop sleep by a few seconds so Kubernetes does not SIGKILL the container before the pre-stop hook completes. For Streamable HTTP transport run in stateless mode, this grace period is less critical because clients can simply retry their next request against the replacement pod.
Monitoring liveness probe health over time with AliveMCP
Liveness probe failures that trigger restarts appear in kubectl describe pod and in your cluster's event log, but they are ephemeral — the event is garbage-collected after the default 1-hour TTL. If a pod restarts at 3 AM due to a liveness probe failure and you check at 9 AM, the event may already be gone.
AliveMCP's 90-day uptime graph gives you a persistent, external record of when your MCP server was unreachable. Liveness-probe-triggered restarts show up as brief downtime windows in the graph — typically 30–60 seconds while the new pod starts and passes its startup probe. If the graph shows these windows occurring regularly (every few hours or days), that is a signal that the liveness probe is catching a recurring hang that needs to be fixed in application code rather than managed through probe threshold tuning.
To register your MCP server with AliveMCP, navigate to alivemcp.com and enter your public endpoint URL. AliveMCP will run the full MCP initialize handshake every minute and show you a live status page and 90-day history — the external view of your server's health that Kubernetes probes cannot provide.
Frequently asked questions
Should I use the same endpoint for both liveness and readiness probes?
No. Liveness and readiness have different semantics and should use different endpoints. The liveness probe (/live) answers the question "is this process in a state it cannot recover from?" — if yes, Kubernetes restarts it. The readiness probe (/ready) answers "is this pod ready to serve traffic?" — if no, Kubernetes removes it from the Service load balancer but does not restart it. A pod that is live but not ready is one that is temporarily overloaded or warming up: it should stop receiving new requests but should not be killed. Using /live for both probes turns transient overload events into unnecessary restarts, which makes things worse — a pod restarting under load removes capacity right when you need it most. See the readiness probe guide for how to implement /ready correctly for MCP servers.
How do I prevent the liveness probe from restarting pods during normal high-load periods?
First, ensure your /live endpoint is genuinely lightweight: it should not call databases, upstream APIs, or any I/O that can slow down under load. If you are using setImmediate or asyncio.sleep(0), the only way the probe times out under load is if the event loop is genuinely blocked for longer than timeoutSeconds, which is itself a problem worth restarting for. Second, increase timeoutSeconds to 10 seconds and failureThreshold to 5. With periodSeconds: 15, five consecutive probes must then fail before Kubernetes restarts the container, which takes at least (5 - 1) × 15 + 10 = 70 seconds of unresponsiveness. That eliminates false positives from temporary load spikes while still catching genuine hangs. Third, ensure your pod has sufficient CPU and memory limits. If the container is CPU-throttled, the event loop will be slow even without any application-level hang. Check kubectl top pods and compare against the configured limits.
Can I use an exec probe to run an MCP initialize handshake for liveness checking?
Technically yes, but it is not recommended for liveness probes. An exec probe that runs curl to send an MCP initialize request adds significant overhead: the container runtime spawns a child process for every probe, and the server has to parse the JSON-RPC message and build the initialize response. Under high load this adds latency to every probe, and because the request is still served by the same event loop, it detects a hung loop no better than an httpGet probe, just more slowly. The correct use of a full MCP protocol probe is in the readiness check (before the pod enters the Service), in the post-deploy smoke test (before considering the deployment successful), and in AliveMCP's continuous external monitoring. For the 15-second liveness check interval, use the lightweight httpGet /live approach.
My MCP server is a stdio-transport server, not HTTP. How do I configure liveness probes?
Kubernetes liveness probes require an HTTP endpoint, TCP port, or exec command to probe. A stdio-transport MCP server has none of these by default. The solution is to run a minimal HTTP sidecar — a separate lightweight process that exposes /live and can check whether the stdio server process is alive using OS-level signals or a shared Unix socket. A simpler alternative is to run the stdio server as a subprocess of an HTTP wrapper process that also exposes /live: if the stdio subprocess exits, the HTTP wrapper detects it and returns 503. In practice, many MCP server deployments in Kubernetes use the SSE or Streamable HTTP transport specifically because it is more amenable to the Kubernetes health check model. If you control the server, consider adding an HTTP transport alongside stdio for deployment purposes.
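The HTTP-wrapper variant can be sketched with the standard library alone. The names liveness_status, make_handler, and serve are hypothetical, as is port 8080. The sketch only detects that the stdio child has exited, not that it has hung, and it omits the bridging of the child's stdin and stdout to a real transport, which an actual wrapper would need.

```python
import subprocess
import sys
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def liveness_status(child: subprocess.Popen) -> int:
    """200 while the stdio child is running, 503 once it has exited."""
    return 200 if child.poll() is None else 503


def make_handler(child: subprocess.Popen) -> type[BaseHTTPRequestHandler]:
    """Build a request handler class bound to one child process."""

    class LiveHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            status = liveness_status(child) if self.path == "/live" else 404
            self.send_response(status)
            self.send_header("Content-Length", "0")
            self.end_headers()

        def log_message(self, format: str, *args) -> None:
            pass  # keep probe traffic out of stderr

    return LiveHandler


def serve(child_argv: list[str], port: int = 8080) -> None:
    """Start the stdio server as a child process and expose /live on the given port."""
    # stdin is piped so a real wrapper could forward MCP messages to the child;
    # that forwarding is omitted from this sketch.
    child = subprocess.Popen(child_argv, stdin=subprocess.PIPE)
    ThreadingHTTPServer(("0.0.0.0", port), make_handler(child)).serve_forever()
```

Calling serve(["node", "my-stdio-server.js"]) (a placeholder command) would let an httpGet probe on port 8080 and path /live restart the container once the child process dies.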
How does AliveMCP complement Kubernetes liveness probes rather than replace them?
Liveness probes and AliveMCP monitor different layers and serve different purposes. Liveness probes monitor the container process from inside the cluster over the cluster network — they catch event loop hangs, deadlocks, and OOM conditions and automatically restart the affected pod. AliveMCP monitors the full public endpoint from outside the cluster — it catches TLS certificate expiry, Ingress misconfigurations, DNS failures, and MCP protocol version mismatches that appear healthy inside the cluster. Neither can substitute for the other. A typical outage that only liveness probes catch: the Node.js event loop deadlocks at 2 AM, the pod restarts automatically, and users see at most a 30-second gap. A typical outage that only AliveMCP catches: an nginx Ingress upgrade changes the default proxy buffer settings, SSE connections start failing for clients while the pods remain live, and your team is paged before the first user reports it. Both failure classes are real; you need both monitoring mechanisms.
Further reading
- MCP Server Readiness Probes — traffic gating, pool warmup, and dependency checks
- MCP Server Horizontal Scaling — HPA, KEDA, and SSE session state
- MCP Server on Kubernetes — Deployments, Services, and rolling updates
- MCP Server Health Check — implementing /live and /ready endpoints
- MCP Server Zero-Downtime Deployment — rolling updates and blue-green strategies
- MCP Server Capacity Planning — sizing, scaling thresholds, and load testing