Guide · Production Quality Engineering

Smoke Testing MCP Servers — post-deploy validation, deployment gates, and CI/CD integration

A smoke test is the smallest possible test suite that answers "does it work at all?" rather than "does it work correctly in every case?" For an MCP server, smoke testing means verifying that the protocol handshake succeeds, the expected tools are present, and at least one representative tool call returns a non-error response. Run it immediately after each deployment, keep the whole run under 30 seconds, and use it as an automated gate that blocks canary promotion when it fails. It catches the class of deployment failures that unit tests cannot: wrong binary deployed, environment variables missing in production, database migration not run, TLS certificate misconfigured. This guide covers what to include in an MCP smoke test, how to wire it into your CI/CD pipeline, and how AliveMCP's continuous probing acts as a permanent smoke test between deployments.

TL;DR

An MCP smoke test has three checks, each with a strict time budget: (1) connect and complete the initialize handshake (<3s), (2) verify that tools/list returns the expected tool names (<2s), (3) call one representative tool with a known-safe input and receive a non-error response (<10s). The checks sum to 15 seconds, which leaves headroom inside the 30-second total budget. Wire this as a CI/CD step after deployment and before promotion. If any check fails, fail the deployment and roll back. AliveMCP provides continuous monitoring between deployments, so the combination gives you a gate at deploy time and a watchdog between deploys.

What smoke testing is and is not

Smoke testing sits at the fast end of a testing pyramid. It is not exhaustive. A passing smoke test tells you the deployment produced a working server; it does not tell you every tool works correctly, that edge cases are handled, or that performance is within spec. Those claims require broader integration tests and load tests.

Test type	What it verifies	Time budget	Run when
Unit tests	Individual functions, tool logic, parsing	Seconds	Every commit, pre-merge
Integration tests	Database interactions, external API calls, tool pipelines	1–10 minutes	Pre-deploy, main branch
Smoke tests	Does the deployed server start and respond at all?	<30 seconds	Post-deploy, before promotion
Load tests	Performance under concurrency, P95 latency at scale	Minutes to hours	Pre-release, on-demand
Synthetic monitoring	Is the production server reachable right now?	Ongoing, every 60s	Always, in production

The failure classes smoke tests catch that other tests miss:

Wrong binary deployed. Unit tests pass because they test the code you wrote; the smoke test runs against what was actually deployed. A CI pipeline that accidentally packages the wrong build artifact will pass all pre-deploy tests and fail the smoke test.
Missing environment variables. A database URL or API key that exists in CI but not in the production environment makes the server fail at startup even though the code itself is correct. The smoke test catches this within seconds of deployment.
Migration not run. If a deploy depends on a schema migration that was not applied before deployment, the server starts but tools that query new columns fail. The smoke test's representative tool call surfaces this.
Port binding conflict. A previous deployment did not terminate cleanly and the port is still bound. The new process fails to start. The smoke test catches the connection failure.

The three-check MCP smoke test

Check 1: Protocol handshake

Connect to the server and complete the MCP initialize handshake. This check verifies the server is running, the port is bound, TLS terminates correctly, and the process has loaded enough to respond to the protocol. Budget: 3 seconds.

// smoke-test.ts: run this immediately after each deployment
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { SSEClientTransport } from '@modelcontextprotocol/sdk/client/sse.js';

// SmokeTestConfig, SmokeCheckResult, and withTimeout are assumed to be defined
// elsewhere in this script.
async function smokeTest(serverUrl: string, config: SmokeTestConfig) {
  const results: SmokeCheckResult[] = [];
  let client: Client | null = null;

  // Check 1: Protocol handshake (budget: 3s)
  const start = Date.now();
  try {
    const transport = new SSEClientTransport(new URL(serverUrl));
    client = new Client({ name: 'smoke-test', version: '1.0' }, {});

    await withTimeout(client.connect(transport), 3000, 'handshake_timeout');
    results.push({ check: 'protocol_handshake', ok: true, ms: Date.now() - start });
  } catch (err) {
    // In strict TypeScript a caught value is `unknown`, so narrow before reading .message
    const message = err instanceof Error ? err.message : String(err);
    results.push({ check: 'protocol_handshake', ok: false, ms: Date.now() - start, error: message });
    return { ok: false, results };  // No point continuing if handshake failed
  }

  // Checks 2 and 3 run here, before the final return
  return { ok: results.every(r => r.ok), results, client };
}
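The snippets in this section call withTimeout without defining it. One possible shape for that helper is sketched below; the name and signature are inferred from the call sites (promise, milliseconds, error label) and are an assumption, not something the original script confirms.

```typescript
// Hypothetical helper: rejects with Error(label) if `promise` has not settled
// within `ms` milliseconds. The underlying operation is not cancelled; only
// the caller stops waiting for it.
function withTimeout<T>(promise: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(label)), ms);
  });
  // Whichever settles first wins; always clear the timer so the process can exit.
  return Promise.race([promise, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}
```

Using the label as the error message means the failed check's error field reads handshake_timeout, tools_list_timeout, or tool_call_timeout, which makes the CI log self-explanatory.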

Check 2: Tool manifest verification

Send tools/list and compare the returned tool names against the expected manifest. The manifest should be committed in your repository alongside the server code — a source-controlled list of which tools the production server is supposed to expose. Budget: 2 seconds.

// Expected tool manifest — commit this next to your server code
// smoke-test-manifest.json
{
  "required_tools": [
    "search_documents",
    "get_user_profile",
    "create_ticket",
    "list_recent_activity"
  ]
}

// Check 2: Tool manifest verification (budget: 2s)
const toolsStart = Date.now();
const toolsResponse = await withTimeout(client.listTools(), 2000, 'tools_list_timeout');
const presentTools = new Set(toolsResponse.tools.map(t => t.name));

const missingTools = config.requiredTools.filter(name => !presentTools.has(name));
results.push({
  check: 'tool_manifest',
  ok: missingTools.length === 0,
  ms: Date.now() - toolsStart,
  missing: missingTools,
  present: toolsResponse.tools.length,
});
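The manifest file stores its list under required_tools, while the check reads config.requiredTools, so something has to bridge the two. A minimal sketch of that bridge follows, assuming the file on disk is plain JSON (without the comment lines shown above) and has already been read into a string. parseManifest and findMissingTools are hypothetical names.

```typescript
// Hypothetical loader for the text of smoke-test-manifest.json.
// Maps the file's "required_tools" key to the requiredTools array the check uses.
function parseManifest(jsonText: string): { requiredTools: string[] } {
  const raw: unknown = JSON.parse(jsonText);
  const tools = (raw as { required_tools?: unknown } | null)?.required_tools;
  if (!Array.isArray(tools) || !tools.every(t => typeof t === 'string')) {
    throw new Error('manifest must contain a "required_tools" array of strings');
  }
  if (tools.length === 0) {
    // An empty list would make the manifest check pass against any server.
    throw new Error('manifest lists no required tools');
  }
  return { requiredTools: tools as string[] };
}

// Same comparison as Check 2, as a pure function so it can be unit tested.
function findMissingTools(required: string[], present: string[]): string[] {
  const presentSet = new Set(present);
  return required.filter(name => !presentSet.has(name));
}
```

Rejecting an empty manifest is a design choice: a gate that cannot fail is worse than no gate, because it looks like coverage.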

Check 3: Representative tool call

Call one tool with a known-safe, idempotent input and verify the response is non-error. Choose a read-only tool (never a write tool for smoke testing — you do not want to create records in production every time you deploy). Budget: 10 seconds.

// Check 3: Representative tool call (budget: 10s)
// Use a read-only tool with deterministic behavior
const callStart = Date.now();
const toolResult = await withTimeout(
  client.callTool({
    name: config.smokeToolName,     // e.g., 'search_documents'
    arguments: config.smokeToolArgs // e.g., { query: 'smoke test sentinel' }
  }),
  10000,
  'tool_call_timeout'
);

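// Verify response is non-error and has expected structure.
// A tool-level failure is reported via isError on the result, not as a thrown exception.
if (toolResult.isError) {
  results.push({ check: 'tool_call', ok: false, ms: Date.now() - callStart, error: 'tool_returned_is_error' });
  return { ok: false, results };
}
const content = toolResult.content as Array<{ type: string; text?: string }> | undefined;
const responseText = content?.[0]?.text;
let parsed: unknown;
try {
  // A missing text block becomes '', which JSON.parse rejects, so it lands in the catch
  parsed = JSON.parse(responseText ?? '');
} catch {
  results.push({ check: 'tool_call', ok: false, error: 'response_not_json' });
  return { ok: false, results };
}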

// Tool-specific validation — define in your smoke test config
const valid = config.validateResponse(parsed);
results.push({
  check: 'tool_call',
  ok: valid.ok,
  ms: Date.now() - callStart,
  error: valid.ok ? undefined : valid.reason,
});

Wiring smoke tests into CI/CD

Smoke tests belong in the deployment pipeline as a gate between "deploy to environment" and "promote to next stage." The deployment workflow for a canary rollout looks like:

# .github/workflows/deploy.yml
# Assumes a separate canary Deployment (mcp-server-canary) that receives ~10% of traffic;
# adapt the names to however your cluster splits canary and stable traffic.
jobs:
  deploy-canary:
    runs-on: ubuntu-latest
    steps:
      - name: Deploy to canary (10% traffic)
        run: |
          # Deploy new version to the canary Deployment only
          kubectl set image deployment/mcp-server-canary server=$NEW_IMAGE
          kubectl rollout status deployment/mcp-server-canary --timeout=120s

      - name: Wait for canary to stabilize
        run: sleep 30  # Allow health checks and process initialization

      - name: Run smoke tests against canary
        run: npx tsx smoke-test.ts --url $CANARY_MCP_URL --manifest smoke-test-manifest.json
        timeout-minutes: 2

      - name: Promote to full rollout
        if: success()
        run: |
          # Roll the verified image out to the stable Deployment
          kubectl set image deployment/mcp-server server=$NEW_IMAGE
          kubectl rollout status deployment/mcp-server --timeout=300s

      - name: Roll back on smoke test failure
        if: failure()
        run: |
          kubectl rollout undo deployment/mcp-server-canary
          echo "::error::Smoke tests failed; deployment rolled back"

Key decisions in the CI/CD integration:

Wait for stabilization before testing. The 30-second sleep is not arbitrary — it gives the server process time to complete initialization (load tools, establish database connections, warm up caches) before the smoke test probes it. A smoke test that runs 2 seconds after deployment will fail on legitimate slow-startup servers.
Use timeout-minutes: 2. If the smoke test itself hangs (server accepts connection but never responds), the pipeline should not wait forever. Cap the total smoke test duration at 2 minutes.
Auto-rollback on failure. The value of a smoke test gate is the automatic rollback. Without it, the smoke test is just a log entry — a human still has to notice it failed and manually intervene. Wire the rollback into the pipeline so broken deployments undo themselves.
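The gate only works if the smoke test script exits non-zero on failure, because that is what makes the pipeline's failure() condition fire. The sketch below shows one way the script's entry point could parse the --url and --manifest flags used in the workflow and turn check results into an exit code. parseCliArgs and exitCodeFor are hypothetical names, not part of any library.

```typescript
interface CheckResult { check: string; ok: boolean; error?: string }

// Hypothetical argument parser for: smoke-test.ts --url <url> --manifest <path>
function parseCliArgs(argv: string[]): { url: string; manifest: string } {
  const get = (flag: string): string => {
    const i = argv.indexOf(flag);
    const value = i >= 0 ? argv[i + 1] : undefined;
    if (!value || value.startsWith('--')) throw new Error(`missing value for ${flag}`);
    return value;
  };
  return { url: get('--url'), manifest: get('--manifest') };
}

// Exit code the CI step sees: 0 only if at least one check ran and all passed.
function exitCodeFor(results: CheckResult[]): number {
  return results.length > 0 && results.every(r => r.ok) ? 0 : 1;
}
```

In a Node script this would typically be wired up as parseCliArgs(process.argv.slice(2)) at the start and process.exitCode = exitCodeFor(results) at the end. Treating an empty result list as a failure guards against a script that exits early without running any check.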

Smoke test configuration patterns

As your MCP server evolves, maintain the smoke test manifest and tool call configuration as first-class artifacts in your repository. A smoke test that checks for tools that no longer exist will always fail, destroying the gate's utility.

// smoke-test-config.ts — versioned alongside server code
export const smokeTestConfig = {
  // Tools that MUST be present after every deployment
  requiredTools: [
    'search_documents',
    'get_user_profile',
    'create_ticket',
  ],

  // The tool to call during check 3 — must be read-only and idempotent
  smokeToolName: 'search_documents',
  smokeToolArgs: { query: '__smoke_test_sentinel__', top_k: 1 },

  // How to validate the tool call response
  validateResponse(parsed: unknown): { ok: boolean; reason?: string } {
    if (!parsed || typeof parsed !== 'object') {
      return { ok: false, reason: 'non_object_response' };
    }
    // Smoke test doesn't require results — just that the call succeeded
    // without an error and returned parseable JSON with expected structure
    const p = parsed as Record<string, unknown>;
    if ('error' in p) {
      return { ok: false, reason: `tool_returned_error: ${p.error}` };
    }
    return { ok: true };
  },

  // Time limits for each check (milliseconds)
  timeouts: {
    handshake: 3000,
    toolsList: 2000,
    toolCall: 10000,
  },
};

Update the manifest whenever you add or remove tools as part of a deployment. The manifest diff in the pull request is a useful communication artifact: reviewers can see at a glance that this deployment adds create_ticket and removes legacy_search. It makes tool surface area changes visible in code review rather than invisible in production.
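If you want the added and removed tools spelled out, for example in a pull request comment, the comparison is a pair of set differences. A small sketch follows; diffToolManifests is a hypothetical helper, and how you obtain the base-branch and PR-branch lists is left to your tooling.

```typescript
// Compares two versions of the required-tools list, e.g. base branch vs. PR branch.
function diffToolManifests(
  before: string[],
  after: string[],
): { added: string[]; removed: string[] } {
  const beforeSet = new Set(before);
  const afterSet = new Set(after);
  return {
    added: after.filter(name => !beforeSet.has(name)).sort(),
    removed: before.filter(name => !afterSet.has(name)).sort(),
  };
}

// Mirrors the example in the text: the deployment adds create_ticket and
// removes legacy_search.
const change = diffToolManifests(
  ['search_documents', 'legacy_search'],
  ['search_documents', 'create_ticket'],
);
// change.added is ['create_ticket'] and change.removed is ['legacy_search']
```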

Smoke testing in different deployment scenarios

Blue-green deployment

Run smoke tests against the green (new) environment before switching the load balancer. The green environment has no production traffic, so a failing smoke test has zero user impact. Promote to green only after smoke tests pass.

Rolling deployment

Run smoke tests against the first pod/instance after it passes Kubernetes readiness checks. If smoke tests fail, pause the rollout (kubectl rollout pause) and investigate. The remaining pods continue serving traffic on the old version while you diagnose.

Serverless (Vercel, Cloudflare Workers)

Smoke test the preview deployment URL before promoting to production. Serverless platforms generate unique deployment URLs per release; wire the smoke test against that URL as a pre-promotion gate in your deployment configuration.

Container restart (PM2, Docker)

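After a process restart triggered by a deployment, run the smoke test script against the local server before considering the deployment complete. With PM2, the post_update hook covers deployments that go through PM2's pull/upgrade flow; it does not fire on a plain pm2 reload, so in that case run npm run smoke-test as the next command in your deploy script:

// ecosystem.config.js
module.exports = {
  apps: [{
    name: 'mcp-server',
    script: 'dist/server.js',
    post_update: ['npm run smoke-test'],  // Runs after a PM2 pull/upgrade, not after a plain pm2 reload
    wait_ready: true,        // PM2 waits for the app to call process.send('ready')
    listen_timeout: 10000,   // ...for at most 10 seconds
  }]
};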

AliveMCP as continuous smoke testing between deployments

A CI/CD smoke test runs at deploy time. AliveMCP's continuous protocol probe runs every 60 seconds — permanently. The two complement each other: the CI smoke test prevents you from deploying a broken server; AliveMCP catches when a previously healthy server degrades between deployments.

The failure classes each catches:

Failure class	CI smoke test catches it?	AliveMCP catches it?
Wrong binary deployed	Yes — at deploy time	Yes — if tools are missing from list
Missing env var at deploy time	Yes — server won't start	Yes — connection_refused after crash
Memory leak causes OOM 6 hours after deploy	No — deploy-time test passes	Yes — connection_refused after crash
Upstream API outage at 3 AM	No — outage post-deploy	Yes — failure_reason: external_api_failure
TLS certificate expiry	No — cert was valid at deploy	Yes — failure_reason: tls_error
Database connection pool exhaustion under load	No — low load at deploy time	Yes — timeout / P95 spike

Neither can replace the other. Use both.

Frequently asked questions

Should the smoke test call a real tool or use a dedicated health-check tool?

Call a real tool. A dedicated "smoke test" tool that just returns {"ok": true} verifies only that the server can dispatch a trivial call, not that your actual tools are functional. Use your simplest, most stable, read-only tool (typically a search or list operation) with an input designed to produce a minimal, predictable response. The goal is to exercise the real code path (database connection, embedding API, cache layer) in a lightweight way. If your tool has a limit parameter, set it to 1 so the smoke test is as fast as possible while still exercising the full tool execution path.

What should I do if the smoke test is flaky (occasionally fails without a real issue)?

A flaky smoke test is dangerous: if you learn to ignore failures, you will ignore a real failure. Investigate the root cause. The most common causes: (1) The 30-second stabilization wait is too short, so the server is still establishing connections when the smoke test runs; increase it to 60 seconds. (2) The time budget for the tool call is too tight because P99 latency exceeds the 10-second budget; raise the budget or investigate what causes the occasional slowness. (3) The server's startup time varies with how long external dependencies (embedding model, database) take to initialize. Where that variance cannot be removed, add retry logic with a 3-attempt limit and a 5-second backoff before treating a smoke test failure as a deployment failure.
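The retry policy described above (3 attempts, 5-second backoff) can be sketched as a thin wrapper around the whole smoke test. retrySmokeTest and its run parameter are hypothetical; the sleep function is injectable only so the wrapper can be tested without real delays.

```typescript
// Sketch of a 3-attempt retry with a fixed 5-second backoff.
// `run` is a hypothetical function that executes the whole smoke test once.
async function retrySmokeTest(
  run: () => Promise<{ ok: boolean }>,
  attempts = 3,
  backoffMs = 5000,
  sleep: (ms: number) => Promise<void> = ms => new Promise<void>(r => setTimeout(r, ms)),
): Promise<{ ok: boolean; attemptsUsed: number }> {
  for (let attempt = 1; attempt <= attempts; attempt++) {
    let ok = false;
    try {
      ok = (await run()).ok;
    } catch {
      ok = false; // a thrown error (for example a timeout) counts as a failed attempt
    }
    if (ok) return { ok: true, attemptsUsed: attempt };
    if (attempt < attempts) await sleep(backoffMs); // no sleep after the final attempt
  }
  return { ok: false, attemptsUsed: attempts };
}
```

With the per-check budgets from this guide (3s + 2s + 10s = 15s per attempt), the worst case is 3 × 15s plus 2 × 5s of backoff, or 55 seconds, which still fits inside the 2-minute pipeline timeout. Log attemptsUsed so that a test which regularly needs retries remains visible.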

How is smoke testing different from readiness probes in Kubernetes?

A Kubernetes readiness probe runs repeatedly during the container's lifetime and gates whether the pod receives traffic. It is designed to be lightweight and fast (typically an HTTP 200 from a /ready endpoint). A smoke test runs once after deployment and is allowed to be richer — it calls an actual MCP tool via the full protocol stack, not just a health endpoint. Use the readiness probe to control traffic routing; use the smoke test to validate the deployment before promoting to a wider rollout. The readiness probe verifies "should this pod receive traffic right now?"; the smoke test verifies "was this deployment correct?"

How do I smoke test an MCP server that requires authentication?

Store smoke test credentials as CI/CD secrets, not in the test manifest. Create a dedicated smoke test identity (a service account or API key) with read-only permissions to only the tools needed for the smoke test. This means the smoke test never generates writes that affect real data, and the credentials can be rotated or revoked independently of production credentials. Pass the smoke test credentials via environment variables (SMOKE_TEST_API_KEY) and inject them into the MCP client's request headers in your smoke test script. Never embed smoke test credentials in the repository.
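A minimal sketch of the credential handling described above: read SMOKE_TEST_API_KEY from the environment, fail fast if it is missing, and build the header object. buildAuthHeaders is a hypothetical helper, and the Bearer scheme is an assumption; use whatever scheme your server expects.

```typescript
// Builds request headers from the SMOKE_TEST_API_KEY environment variable.
// The environment is passed in as a plain object so the function stays testable.
function buildAuthHeaders(env: Record<string, string | undefined>): Record<string, string> {
  const key = env.SMOKE_TEST_API_KEY?.trim();
  if (!key) {
    // Fail before connecting: an unauthenticated run would report a misleading
    // handshake or tool-call failure instead of the real cause.
    throw new Error('SMOKE_TEST_API_KEY is not set');
  }
  return { Authorization: `Bearer ${key}` }; // Bearer scheme is an assumption
}
```

In the script you would call buildAuthHeaders(process.env) and hand the result to the transport. How headers reach the transport depends on your SDK version; recent versions of the TypeScript SDK's SSEClientTransport accept a requestInit option in their constructor options, but verify this against the version you have installed.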

What is the minimum smoke test for a brand-new MCP server?

Two checks: (1) connect and complete initialize (verifies the server is running and protocol-compliant), (2) tools/list returns at least one tool (verifies the server loaded its tool definitions). You don't need a tool call for a brand-new server if you don't yet know what stable input to use. Add the tool call check once the server is in production and you can identify a reliable canary input. The two-check minimum takes under 5 seconds and catches startup-level failures such as a wrong binary, a missing environment variable, or a port conflict. It does not catch failures that surface only when a tool executes, such as an unapplied migration or an application-layer regression, which you can't test without established baseline behavior anyway.
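The two-check minimum can be sketched against a small structural interface, so the logic is visible without the SDK installed. McpLike is a hypothetical interface: with the real SDK you would wrap a Client so that connect() is already bound to a transport. Timeouts are omitted here for brevity.

```typescript
// Hypothetical minimal interface covering only what the two checks need.
interface McpLike {
  connect(): Promise<void>;
  listTools(): Promise<{ tools: { name: string }[] }>;
}

interface MinimalResult { check: string; ok: boolean; error?: string }

async function minimalSmokeTest(
  client: McpLike,
): Promise<{ ok: boolean; results: MinimalResult[] }> {
  const results: MinimalResult[] = [];

  // Check 1: the server is running and completes the initialize handshake.
  try {
    await client.connect();
    results.push({ check: 'protocol_handshake', ok: true });
  } catch (err) {
    results.push({ check: 'protocol_handshake', ok: false, error: String(err) });
    return { ok: false, results }; // nothing else can succeed without a connection
  }

  // Check 2: the server loaded at least one tool definition.
  try {
    const { tools } = await client.listTools();
    const ok = tools.length > 0;
    results.push({ check: 'tool_manifest', ok, error: ok ? undefined : 'no_tools_listed' });
    return { ok, results };
  } catch (err) {
    results.push({ check: 'tool_manifest', ok: false, error: String(err) });
    return { ok: false, results };
  }
}
```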