Kubernetes Readiness Probes for MCP Servers — gate traffic until truly ready
A Kubernetes readiness probe controls when a pod starts receiving traffic from the Service load balancer. For MCP servers this matters more than it does for stateless REST APIs — an MCP client that connects during a cold database warmup will receive errors on every tool call, but the connection itself succeeds, leaving the client in a broken half-connected state with no clean way to retry.
TL;DR
The readiness probe gates traffic; the liveness probe restarts the container. Do not use the same endpoint for both. Implement /ready to check the actual dependencies your MCP tools need: the database connection pool has idle slots, required environment secrets were loaded, the tool registry was built successfully, and any required warmup caches are populated. Start the readiness probe early with a short initialDelaySeconds, and give the liveness probe an initialDelaySeconds (or a startupProbe) that covers your slowest dependency startup, so a slow start keeps the pod out of the load balancer without triggering a restart. If the readiness probe fails during steady-state operation (not just startup), Kubernetes removes the pod from the load balancer without restarting it, giving the pod time to recover under reduced load. AliveMCP's external monitoring then detects if the entire Service becomes unready (all pods failing readiness simultaneously) and pages your team before users notice.
Liveness vs readiness: the critical distinction for MCP servers
Kubernetes uses two independent probes with fundamentally different responses to failure:
- Liveness failure → the kubelet restarts the container. This is appropriate for an unrecoverable hang: a deadlocked event loop that will never process another message. Restarting is the only fix.
- Readiness failure → the endpoint is removed from the Service's Endpoints list. The container continues running. This is appropriate for a recoverable not-yet-ready state: a database that is still initializing its connection pool, a config secret that took longer than usual to fetch from Vault, a cache that needs one more second to warm. The pod should stop receiving new traffic until it is ready, but it should not be killed.
For MCP servers, the distinction is especially important because of long-lived SSE connections. If a pod becomes temporarily overloaded (for example, all database connections are held by slow tool calls), the correct response is to stop sending it new requests, not to restart it and forcibly disconnect the existing clients. The readiness probe enables this: the overloaded pod fails readiness, stops receiving new SSE connections, clears its backlog, passes readiness again, and rejoins the load balancer. Clients that are already connected typically stay connected throughout, because removing an endpoint stops new connections rather than closing established ones, although the exact behavior depends on the proxy or ingress controller in front of the pod.
| Condition | Correct probe type | Kubernetes response |
|---|---|---|
| Event loop deadlocked — cannot recover | Liveness | Container restarted |
| All DB connections held — temporarily overloaded | Readiness | Removed from load balancer; no restart |
| Server starting up — DB pool warming | Readiness (during startup) | Traffic withheld until pool ready |
| New deploy in progress — old pod still serving | Readiness on new pod | Old pod serves until new pod passes readiness |
Implementing a /ready endpoint for MCP servers
The readiness endpoint must check every dependency that tool calls actually need. A probe that only checks "is the HTTP server listening?" is no better than a TCP probe — it will report the pod as ready while database queries are failing and tool responses contain only error messages.
Node.js / TypeScript — comprehensive readiness check
import express from 'express';
import { pool } from './db';            // Your database connection pool
import { toolRegistry } from './tools'; // Your MCP tool registry

interface ReadinessCheck {
  name: string;
  ok: boolean;
  latencyMs?: number;
  error?: string;
}

async function checkDatabasePool(): Promise<ReadinessCheck> {
  const start = Date.now();
  let conn;
  try {
    // Acquire a connection from the pool; if the pool stays saturated past the timeout this rejects.
    // (acquire/release follow a generic pool API; adapt the calls to your driver.)
    conn = await pool.acquire({ timeout: 1000 });
    await conn.query('SELECT 1'); // Minimal round-trip to verify the connection is valid
    return { name: 'database', ok: true, latencyMs: Date.now() - start };
  } catch (err) {
    return { name: 'database', ok: false, latencyMs: Date.now() - start, error: String(err) };
  } finally {
    // Always return the connection, even when the query fails, so the probe never leaks one
    if (conn) pool.release(conn);
  }
}

async function checkToolRegistry(): Promise<ReadinessCheck> {
  // Verify the tool registry has been built and contains the expected tools
  const toolCount = toolRegistry.size;
  if (toolCount === 0) {
    return { name: 'tool_registry', ok: false, error: 'No tools registered' };
  }
  return { name: 'tool_registry', ok: true };
}

async function checkSecrets(): Promise<ReadinessCheck> {
  // Verify required environment secrets were injected (presence only, never their values)
  const required = ['DATABASE_URL', 'API_KEY'];
  const missing = required.filter(k => !process.env[k]);
  if (missing.length > 0) {
    return { name: 'secrets', ok: false, error: `Missing: ${missing.join(', ')}` };
  }
  return { name: 'secrets', ok: true };
}

const app = express();

app.get('/ready', async (req, res) => {
  const checks = await Promise.all([
    checkDatabasePool(),
    checkToolRegistry(),
    checkSecrets(),
  ]);
  const ok = checks.every(c => c.ok);
  const status = ok ? 200 : 503;
  res.status(status).json({
    ready: ok,
    checks,
    timestamp: new Date().toISOString(),
  });
});

// /live: liveness probe (separate and lighter weight; see the mcp-server-liveness-probe guide)
app.get('/live', async (req, res) => {
  // Yield to the event loop once; a deadlocked loop never reaches the response
  await new Promise(resolve => setImmediate(resolve));
  res.json({ live: true });
});
FastAPI / Python — equivalent readiness check
from fastapi import FastAPI
from fastapi.responses import JSONResponse
import asyncpg
import os

app = FastAPI()
db_pool: asyncpg.Pool | None = None  # Set during startup


# on_event still works but is deprecated in recent FastAPI releases in favor of lifespan handlers
@app.on_event("startup")
async def startup():
    global db_pool
    db_pool = await asyncpg.create_pool(os.environ["DATABASE_URL"], min_size=2, max_size=10)


@app.get("/ready")
async def readiness():
    checks = []

    # Check database pool
    try:
        if db_pool is None:
            raise RuntimeError("database pool not initialized")
        async with db_pool.acquire(timeout=1.0) as conn:
            await conn.fetchval("SELECT 1")
        checks.append({"name": "database", "ok": True})
    except Exception as e:
        checks.append({"name": "database", "ok": False, "error": str(e)})

    # Check required secrets (presence only, never their values)
    required = ["DATABASE_URL", "API_KEY"]
    missing = [k for k in required if not os.environ.get(k)]
    checks.append({"name": "secrets", "ok": len(missing) == 0,
                   "error": f"Missing: {missing}" if missing else None})

    ok = all(c["ok"] for c in checks)
    return JSONResponse({"ready": ok, "checks": checks}, status_code=200 if ok else 503)
Kubernetes readiness probe configuration
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mcp-server
spec:
  replicas: 2
  selector:
    matchLabels:
      app: mcp-server
  template:
    metadata:
      labels:
        app: mcp-server
    spec:
      containers:
        - name: mcp-server
          image: your-registry/mcp-server:v1.2.3
          ports:
            - containerPort: 3000
          livenessProbe:
            httpGet:
              path: /live
              port: 3000
            initialDelaySeconds: 60
            periodSeconds: 15
            timeoutSeconds: 5
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /ready
              port: 3000
            # Readiness probes start sooner: we want to know about dependency failures quickly
            initialDelaySeconds: 10
            periodSeconds: 10
            timeoutSeconds: 5
            # Two consecutive successes required before pod accepts traffic (prevents flapping
            # when a dependency briefly recovers then fails again during warmup)
            successThreshold: 2
            # Three consecutive failures to remove from load balancer
            failureThreshold: 3
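The probe settings above imply rough bounds on how quickly a failing pod leaves the load balancer and how long a recovered pod waits before rejoining. The following is a back-of-the-envelope sketch, not a model of exact kubelet timing: it ignores probe jitter and the delay before endpoint changes reach proxies, and the function names are illustrative.

```python
from typing import Tuple


def removal_window_seconds(period: int, timeout: int, failure_threshold: int) -> Tuple[int, int]:
    """Approximate (best, worst) seconds from a dependency failing to the pod being marked unready."""
    # Best case: the failure lands just before a probe fires and every probe fails fast.
    best = (failure_threshold - 1) * period
    # Worst case: the failure lands just after a passing probe, and the final
    # failing probe runs all the way to its timeout.
    worst = failure_threshold * period + timeout
    return best, worst


def readmit_delay_seconds(period: int, success_threshold: int) -> int:
    """Minimum seconds between the first passing probe and the pod being marked ready again."""
    return (success_threshold - 1) * period


# Values from the readinessProbe above: periodSeconds 10, timeoutSeconds 5,
# failureThreshold 3, successThreshold 2.
print(removal_window_seconds(period=10, timeout=5, failure_threshold=3))  # (20, 35)
print(readmit_delay_seconds(period=10, success_threshold=2))              # 10
```

Under these assumptions a pod that loses its database keeps receiving new connections for roughly 20 to 35 seconds, which is worth keeping in mind when choosing failureThreshold and periodSeconds.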
Tuning successThreshold to prevent SSE connection flapping
The default successThreshold is 1, meaning a single successful readiness probe is enough to add the pod back to the load balancer. For MCP servers using the SSE transport, this can cause connection flapping: a pod that is borderline ready — its database pool has one idle connection — passes readiness, receives a flood of new SSE connections, the pool saturates, and the pod immediately fails readiness again.
Setting successThreshold: 2 requires two consecutive successes before the pod re-enters the load balancer. This provides a brief window (one probe interval = 10 seconds) during which the pod must maintain readiness, which smooths out transient recovery oscillation.
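To see the effect of the threshold, it can help to replay a sequence of probe results through a simplified model of the consecutive-success and consecutive-failure counters. This is an illustrative sketch (the function name and the sample sequence are made up), not the kubelet's actual implementation.

```python
from typing import List, Sequence


def simulate_readiness(results: Sequence[bool], success_threshold: int = 1,
                       failure_threshold: int = 3, initially_ready: bool = False) -> List[bool]:
    """Replay probe results (True = probe passed) and return the ready state after each probe."""
    ready = initially_ready
    successes = failures = 0
    states = []
    for passed in results:
        if passed:
            successes += 1
            failures = 0  # a success resets the failure streak
            if not ready and successes >= success_threshold:
                ready = True
        else:
            failures += 1
            successes = 0  # a failure resets the success streak
            if ready and failures >= failure_threshold:
                ready = False
        states.append(ready)
    return states


# A borderline pod: one isolated pass, three failures, then a sustained recovery.
probes = [True, False, False, False, True, True]
print(simulate_readiness(probes, success_threshold=1))
# [True, True, True, False, True, True]  -> admitted on the isolated pass, then removed again
print(simulate_readiness(probes, success_threshold=2))
# [False, False, False, False, False, True]  -> admitted only after two passes in a row
```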
Readiness probes during rolling deployments
The readiness probe is the mechanism that makes zero-downtime rolling deployments possible. When you apply a new Deployment spec, Kubernetes starts new pods with the updated image. Each new pod is excluded from the Service load balancer until its readiness probe passes. How quickly old pods are terminated is bounded by maxUnavailable: with maxUnavailable: 0, as in the strategy below, Kubernetes terminates an old pod only after a new pod has become ready to replace it.
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # At most 1 extra pod above desired count during update
      maxUnavailable: 0  # Never reduce below desired count during update

# With replicas: 2, maxUnavailable: 0, maxSurge: 1:
# - New pod starts (3 pods total: 2 old + 1 new)
# - New pod passes readiness
# - 1 old pod terminates (back to 2 pods: 1 old + 1 new)
# - Second new pod starts (3 pods total: 1 old + 2 new)
# - Second new pod passes readiness
# - Last old pod terminates (2 pods: 2 new)
# At no point does the Service have fewer than 2 ready endpoints.
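The walk-through above can be reproduced with a small simulation. This is a simplified sketch that assumes every new pod eventually passes readiness and that old pods disappear as soon as they are terminated; it is not how the Deployment controller is implemented, and the function name is illustrative.

```python
from typing import List, Tuple


def simulate_rollout(replicas: int, max_surge: int, max_unavailable: int) -> List[Tuple[int, int]]:
    """Return (old_ready, new_ready) snapshots for a simplified rolling update."""
    if max_surge == 0 and max_unavailable == 0:
        raise ValueError("maxSurge and maxUnavailable cannot both be 0")
    old, new_ready = replicas, 0
    snapshots = [(old, new_ready)]
    while old > 0 or new_ready < replicas:
        # Terminate old pods only while enough ready pods would remain afterwards.
        while old > 0 and (old + new_ready - 1) >= replicas - max_unavailable:
            old -= 1
            snapshots.append((old, new_ready))
        # Start new pods up to the surge ceiling (replicas + max_surge in total).
        pending = 0
        while new_ready + pending < replicas and old + new_ready + pending < replicas + max_surge:
            pending += 1
        # New pods count as ready only once their readiness probe has passed.
        if pending:
            new_ready += pending
            snapshots.append((old, new_ready))
    return snapshots


steps = simulate_rollout(replicas=2, max_surge=1, max_unavailable=0)
print(steps)                             # [(2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
print(min(o + n for o, n in steps))      # 2 -> never fewer than 2 ready pods
```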
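For MCP servers with SSE connections, the old pods can drain their connections during the terminationGracePeriodSeconds window, provided the server handles SIGTERM by finishing in-flight work instead of exiting immediately. The new pods serve new connections as soon as their readiness probe passes. The combination of a well-implemented readiness probe and a termination grace period sized to your typical session length avoids dropped requests during deploys. SSE sessions that outlive the grace period are closed when the container is killed, so clients of those sessions must reconnect.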
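Draining depends on the server tracking its open connections and refusing new ones once shutdown begins. A minimal asyncio sketch of that bookkeeping follows; the class and method names are hypothetical, and wiring it to a real SSE handler and a SIGTERM handler is left to the framework in use.

```python
import asyncio


class ConnectionDrainer:
    """Tracks open SSE connections so shutdown can wait for them to finish."""

    def __init__(self) -> None:
        self.shutting_down = False
        self._active = 0
        self._idle = asyncio.Event()
        self._idle.set()  # no connections yet, so the server starts idle

    def connection_opened(self) -> bool:
        """Register a new connection; returns False once shutdown has begun."""
        if self.shutting_down:
            return False
        self._active += 1
        self._idle.clear()
        return True

    def connection_closed(self) -> None:
        """Unregister a connection and signal idleness when the last one closes."""
        self._active -= 1
        if self._active == 0:
            self._idle.set()

    async def drain(self, grace_seconds: float) -> bool:
        """Stop accepting connections and wait up to grace_seconds.

        Returns True if every connection closed in time, False on timeout.
        """
        self.shutting_down = True
        try:
            await asyncio.wait_for(self._idle.wait(), timeout=grace_seconds)
            return True
        except asyncio.TimeoutError:
            return False
```

A SIGTERM handler would call drain() with a value somewhat below terminationGracePeriodSeconds, so the process exits on its own terms before the kubelet sends SIGKILL.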
Connection pool saturation as a readiness signal
One of the most valuable uses of the readiness probe for MCP servers is detecting database connection pool saturation. When every connection in the pool is held by a running tool call (which can happen during a burst of long-running operations), new tool calls queue waiting for a connection to become available. The MCP server is live (the event loop is processing requests) but not ready to accept new MCP connections, because tool calls on those connections would sit in the queue until a pool connection frees up.
// Expose pool saturation as a readiness signal
async function checkConnectionPool(): Promise<ReadinessCheck> {
  const poolStats = pool.stats();

  // If idle connections = 0 and waiting queue is non-empty, the pool is saturated
  if (poolStats.idle === 0 && poolStats.waiting > 0) {
    return {
      name: 'connection_pool',
      ok: false,
      error: `Pool saturated: ${poolStats.waiting} requests waiting for ${poolStats.total} connections`,
    };
  }

  // Also gate on pool size: if fewer connections are open than the configured minimum,
  // the pool is still warming
  if (poolStats.total < poolStats.min) {
    return {
      name: 'connection_pool',
      ok: false,
      error: `Pool warming: ${poolStats.total}/${poolStats.min} minimum connections established`,
    };
  }

  return { name: 'connection_pool', ok: true };
}
This check causes the pod to leave the load balancer when saturated, reducing the flood of new connections that would otherwise worsen the saturation spiral. It is a self-regulating feedback loop: overload triggers readiness failure, which stops new traffic, which gives the pool time to clear queued work and release connections, which restores readiness.
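For the FastAPI variant, the same rule can be written as a pure function over a snapshot of pool statistics, which keeps the decision unit-testable without a database. The PoolStats shape below is an assumption for illustration; how the numbers are collected depends on the driver, and not every pool exposes a waiting count.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class PoolStats:
    total: int     # connections currently open
    idle: int      # open connections not in use
    waiting: int   # callers queued for a connection
    min_size: int  # configured minimum pool size


def pool_readiness(stats: PoolStats) -> Tuple[bool, Optional[str]]:
    """Return (ok, error) using the saturation and warmup rules described above."""
    # Saturated: nothing idle and at least one caller already queued.
    if stats.idle == 0 and stats.waiting > 0:
        return False, f"Pool saturated: {stats.waiting} requests waiting for {stats.total} connections"
    # Warming: fewer connections open than the configured minimum.
    if stats.total < stats.min_size:
        return False, f"Pool warming: {stats.total}/{stats.min_size} minimum connections established"
    return True, None


print(pool_readiness(PoolStats(total=10, idle=0, waiting=4, min_size=2)))
print(pool_readiness(PoolStats(total=10, idle=0, waiting=0, min_size=2)))  # busy but not queuing
```

Note that a fully busy pool with an empty queue still counts as ready under this rule; only queued callers signal saturation.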
When all pods fail readiness simultaneously
If your readiness checks gate on a shared dependency (a single database, a shared Redis cache, an external API), then all pods will fail readiness simultaneously when that dependency goes down. Kubernetes will remove all pods from the Service load balancer at the same time, leaving the Service with no ready endpoints and making it unreachable for every client, inside or outside the cluster.
This is the boundary at which external monitoring becomes critical. AliveMCP continues probing your public endpoint regardless of Kubernetes readiness state. When all pods fail readiness and your Ingress starts returning 502 or 503, AliveMCP detects this within 60 seconds and alerts your team. The internal cluster view (readiness probe failures on individual pods) and the external view (the endpoint returning 503 to real clients) are both important signals — but only the external view represents what your users are experiencing.
| Scenario | Readiness probe signal | AliveMCP signal | User impact |
|---|---|---|---|
| One pod DB pool saturated | One pod removed from LB | No impact (other pods serve) | None |
| All pods DB pool saturated | All pods removed from LB | Alert: 503 from Ingress | All requests fail |
| Shared Redis down | All pods fail readiness | Alert: 503 from Ingress | All requests fail |
| New deploy — pods warming up | New pods not yet ready | No impact (old pods still serve) | None (if correctly configured) |
Readiness probe for MCP servers with external tool dependencies
MCP servers that proxy tool calls to external APIs (OpenAI, GitHub, Stripe, a proprietary data service) face an additional readiness question: if the upstream API is down, should the pod report itself as not-ready? The answer depends on your traffic handling strategy:
- Do not include upstream API health in the readiness probe if your server can still serve some tools without that API (the failing tool will return isError: true for that specific call, which is the correct MCP error handling path). Removing the pod from the load balancer would make all tools unavailable, not just the ones that depend on the failing API, which is worse than serving partial functionality.
- Do include upstream API health in the readiness probe if every tool in your server requires the upstream API and there is no graceful degradation path. A pod that can only return errors is not ready to serve traffic in any meaningful sense.
// Decide whether to include upstream in readiness based on impact scope
async function checkUpstreamAPI(): Promise<ReadinessCheck> {
  // Only include in readiness if ALL tools require this upstream
  const required = process.env.ALL_TOOLS_REQUIRE_UPSTREAM === 'true';
  if (!required) {
    return { name: 'upstream_api', ok: true }; // Skip: partial degradation is acceptable
  }
  try {
    const resp = await fetch(`${process.env.UPSTREAM_API_URL}/health`, { signal: AbortSignal.timeout(2000) });
    return { name: 'upstream_api', ok: resp.ok };
  } catch (err) {
    return { name: 'upstream_api', ok: false, error: String(err) };
  }
}
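Rather than maintaining a flag by hand, the "every tool requires it" condition can be derived from a map of tools to their dependencies. The sketch below assumes such a map exists; the tool and dependency names are invented for illustration.

```python
from typing import Dict, Set


def dependencies_required_by_all(tool_deps: Dict[str, Set[str]]) -> Set[str]:
    """Return the dependencies that every registered tool needs.

    Only these are candidates for the readiness probe; anything else
    degrades individual tools rather than the whole server.
    """
    if not tool_deps:
        return set()
    return set.intersection(*tool_deps.values())


# Hypothetical registry: tool name -> dependencies it calls.
tools = {
    "search_issues": {"github", "database"},
    "create_invoice": {"stripe", "database"},
    "lookup_user": {"database"},
}
print(dependencies_required_by_all(tools))  # {'database'}
```

Here only the database belongs in /ready: a GitHub or Stripe outage breaks one tool each, which is better reported per call than by removing the pod.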
Frequently asked questions
How long should I wait before checking readiness vs liveness after container start?
Readiness probes can start earlier than liveness probes because a readiness failure has no destructive consequence — the pod just stays out of the load balancer. Set initialDelaySeconds on the readiness probe to the time it takes for your server to bind its port and begin accepting connections (typically 5–15 seconds). Set initialDelaySeconds on the liveness probe to the time it takes for the server to fully initialize, including dependency warmup (typically 30–90 seconds). This way, during startup the pod enters the load balancer as soon as it is ready, without the risk of the liveness probe restarting it during normal initialization. If startup time is highly variable, replace fixed initialDelaySeconds on the liveness probe with a startupProbe — see the liveness probe guide.
Should the readiness probe check the MCP protocol initialize handshake directly?
Not for the 10-second periodic readiness check: it is too heavyweight and is prone to spurious failures under load. Instead, structure your /ready endpoint to check the prerequisites for a successful MCP initialize: are the required environment variables set, is the database reachable with an available connection, is the tool registry built? If all prerequisites pass, an MCP initialize should succeed. If you want to verify the MCP protocol itself, do that once in your pod's startup sequence (before the HTTP server starts accepting requests) and set a startup flag. The readiness probe then checks that startup flag rather than running the full handshake on every probe interval. Continuous external MCP protocol verification is a different concern from pod-level readiness; for that, use AliveMCP.
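A minimal sketch of the startup-flag pattern follows. The class name is hypothetical and the handshake callable stands in for whatever MCP initialize round-trip your server can perform against itself at startup.

```python
from typing import Callable, Optional, Tuple


class StartupGate:
    """Runs an expensive verification once and remembers the outcome for cheap readiness checks."""

    def __init__(self) -> None:
        self.verified = False
        self.error: Optional[str] = None

    def verify_once(self, handshake: Callable[[], None]) -> None:
        """Call during startup; `handshake` should raise if the initialize round-trip fails."""
        try:
            handshake()
            self.verified, self.error = True, None
        except Exception as exc:  # any failure keeps the pod unready
            self.verified, self.error = False, str(exc)

    def readiness(self) -> Tuple[int, dict]:
        """Return (HTTP status, body) for /ready without repeating the handshake."""
        if self.verified:
            return 200, {"ready": True}
        return 503, {"ready": False, "error": self.error or "startup verification not run"}


gate = StartupGate()
print(gate.readiness())         # (503, {'ready': False, 'error': 'startup verification not run'})
gate.verify_once(lambda: None)  # stand-in for a successful handshake
print(gate.readiness())         # (200, {'ready': True})
```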
My MCP server's /ready endpoint returns 503 intermittently under high load. Should I increase failureThreshold?
Only as a last resort. Intermittent readiness failures under high load usually indicate a real problem: your database connection pool is too small for the traffic volume, a timeout inside the readiness check is too low (so the check itself fails when the database responds slowly), or timeoutSeconds on the probe is shorter than the /ready handler takes under load (the kubelet then records a timeout rather than a 503, but the effect on readiness is the same). Diagnose the root cause first: add logging to the /ready handler that records which check is failing and how long it takes. If the DB pool is saturating, increase pool size or reduce the number of concurrent tool calls per pod. If the probe timeout is the issue, increase timeoutSeconds on the probe configuration to 10 seconds and keep the handler's own per-check timeouts below that value. Increasing failureThreshold only delays the load balancer removal signal; it does not fix the underlying saturation.
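One way to get that diagnostic logging is to wrap each dependency check in a helper that enforces a per-check deadline and records latency. This is a sketch using only asyncio and logging; the helper name and result shape are illustrative rather than part of any framework.

```python
import asyncio
import logging
import time
from typing import Awaitable, Callable, Dict

logger = logging.getLogger("readiness")


async def run_check(name: str, check: Callable[[], Awaitable[None]],
                    timeout: float) -> Dict[str, object]:
    """Run one dependency check with its own deadline; log failures with their latency."""
    start = time.monotonic()
    result: Dict[str, object]
    try:
        await asyncio.wait_for(check(), timeout=timeout)
        result = {"name": name, "ok": True}
    except asyncio.TimeoutError:
        result = {"name": name, "ok": False, "error": f"timed out after {timeout}s"}
    except Exception as exc:
        result = {"name": name, "ok": False, "error": str(exc)}
    result["latency_ms"] = round((time.monotonic() - start) * 1000)
    if not result["ok"]:
        # The log line names the failing check, so intermittent 503s can be attributed
        logger.warning("readiness check failed: %s", result)
    return result
```

Keeping each per-check timeout below the probe's timeoutSeconds lets the handler answer with an informative 503 instead of the kubelet recording a bare timeout.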
Can I test the readiness probe without Kubernetes?
Yes — and you should. The /ready endpoint is a regular HTTP endpoint on your server, so you can test it with curl: curl -s http://localhost:3000/ready | jq . Run this while your server is starting up (before dependencies are ready), during steady-state operation, and while deliberately saturating the database pool. Verify that the response is 200 with ready: true when everything is healthy and 503 with informative check details when a dependency fails. Inject dependency failures in your test environment: stop the database, revoke the API key, fill the connection pool with idle transactions. Confirm the readiness endpoint correctly identifies each failure. This testing prevents the most common readiness probe misconfiguration — a /ready endpoint that always returns 200 regardless of dependency state, which turns the probe into a no-op.
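The same checks can be scripted. The sketch below uses only the Python standard library: a stand-in HTTP server plays the role of your MCP server's /ready endpoint, a flag simulates a database outage, and probe() classifies the response as a passing or failing status. Everything here (handler, flag, function names) is hypothetical scaffolding; in practice you would point probe() at your real server and inject real failures.

```python
import json
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

dependency_up = {"database": True}  # toggled below to inject a failure

# Ignore any proxy settings in the environment so localhost is reached directly
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class ReadyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/ready":
            self.send_error(404)
            return
        ok = dependency_up["database"]
        body = json.dumps({"ready": ok}).encode()
        self.send_response(200 if ok else 503)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep test output quiet
        pass


def probe(url: str) -> int:
    """Return the HTTP status code; an httpGet probe treats 200-399 as success."""
    try:
        with _opener.open(url, timeout=2) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def run_scenario() -> list:
    """Probe once while healthy and once with the dependency down."""
    server = HTTPServer(("127.0.0.1", 0), ReadyHandler)  # port 0 picks a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_port}/ready"
    try:
        healthy = probe(url)
        dependency_up["database"] = False  # inject the failure
        degraded = probe(url)
        return [healthy, degraded]
    finally:
        dependency_up["database"] = True
        server.shutdown()
        server.server_close()
```

Calling run_scenario() should return the healthy status followed by the degraded one; a /ready endpoint that returns 200 in both cases is the no-op misconfiguration described above.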
How does AliveMCP work alongside readiness probes for end-to-end visibility?
Readiness probes work from inside the Kubernetes cluster: the kubelet connects directly to the pod's IP over the pod network. They answer the question "is this pod ready to serve internal cluster traffic?" AliveMCP works from outside the cluster: it connects over the public internet to your domain, through DNS, TLS, the Ingress controller, and the Service. It answers the question "is my MCP server reachable and speaking the correct protocol to real clients?" A pod can pass readiness while the Ingress is misconfigured (502 to external clients) or the TLS certificate has expired (handshake failure to external clients). Conversely, when all pods fail readiness simultaneously (shared dependency down), each pod's probe status shows only its own failure, while AliveMCP detects the resulting outage from the user's perspective within 60 seconds and alerts your team. Both signals are necessary for complete incident response: readiness probes tell you which pods are affected; AliveMCP tells you whether users are impacted and for how long.
Further reading
- MCP Server Liveness Probes — detecting hung event loops and deadlocks
- MCP Server Horizontal Scaling — HPA, KEDA, and SSE session state
- MCP Server Zero-Downtime Deployment — rolling updates and blue-green strategies
- MCP Server Health Check — /live and /ready endpoint implementations
- MCP Server on Kubernetes — Deployments, Services, and ingress configuration
- MCP Server Capacity Planning — sizing connection pools and scaling thresholds