Developer Experience · 2026-06-13 · MCP server DevEx stack

The MCP Server Developer Experience Stack: From OpenAPI to Token Budgets

Most MCP server guides cover one thing in depth: here's how to write a tool, here's how to test it, here's how to deploy it. Fewer step back and map out the full developer workflow — the five distinct phases every MCP server author navigates, and the specific practice that makes each one fast. This post synthesizes five deep-dives — OpenAPI bridging, hot reload, local stack setup, CLI tools, and token budgets — into a single decision framework: which practice eliminates friction at which phase, how each one connects to the others, and why all five together still leave one gap that only external monitoring can close.

The five-phase DevEx map

MCP server development has a natural phase structure. Each phase has a dominant pain point and a corresponding practice that eliminates it:

| Phase | Pain point | Practice | Payoff |
|---|---|---|---|
| 1. API bridging | Hand-writing 50 tool definitions from an existing REST API | OpenAPI-to-MCP mapping | Zero hand-written tool definitions; the spec is the source of truth |
| 2. Iteration loop | 15–30 seconds from code change to tested tool call | tsx --watch hot reload | Under 2 seconds per iteration; at 20 iterations/hour, recovers most of the 5–10 minutes otherwise spent waiting |
| 3. Environment setup | Each new contributor loses half a day to project scaffolding | Full local stack setup | 10 minutes from npm init to a running server you can call from Claude Desktop |
| 4. Operations | Manual health checks, schema diffs, deploy verification | CLI scripts | Every operational task becomes a single npm run command |
| 5. Scale | One runaway LLM session exhausts a month's cloud budget | Token budget enforcement | Per-tenant quotas enforced server-side, independent of the client |

The five phases are ordered by when each pain point first bites rather than strictly sequential: hot reload (phase 2) relies on the SIGTERM handler and factory-function patterns that belong to the local stack (phase 3), and you don't need token budgets until you're running a multi-tenant server at scale. The practices also compound: the CLI tools from phase 4 give the hot-reload loop from phase 2 an outer loop that runs in CI, and the local stack setup from phase 3 makes the OpenAPI bridge from phase 1 easier to test.

Phase 1: Bridging existing APIs with OpenAPI-to-MCP

Most real MCP servers don't expose novel functionality — they bridge an existing REST API into the MCP ecosystem. If that API has an OpenAPI 3.x spec, you have a choice: hand-write tool definitions by reading the spec as documentation, or generate them from the spec programmatically.

The hand-writing path has a hidden cost: every time the API adds an endpoint, changes a parameter name, or deprecates a field, the MCP server must be manually updated or it drifts. For small APIs (under 10 endpoints), this is manageable. For medium APIs (10–50 endpoints), you want build-time code generation:

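// scripts/generate-tools.ts — run with npm run generate (ESM: top-level await needs "type": "module")
import SwaggerParser from '@apidevtools/swagger-parser';
import { writeFileSync } from 'fs';
import type { OpenAPIV3 } from 'openapi-types';

// validate() also dereferences $refs, so the casts to inline objects below are safe
const spec = (await SwaggerParser.validate('openapi.yaml')) as OpenAPIV3.Document;
const tools: string[] = [];

for (const [path, pathItem] of Object.entries(spec.paths ?? {})) {
  for (const method of ['get','post','put','patch','delete'] as const) {
    const op = pathItem?.[method];
    if (!op?.operationId) continue;
    // Flatten path/query params and the JSON request body into one inputSchema
    const properties: Record<string, unknown> = {};
    const required: string[] = [];
    for (const p of (op.parameters ?? []) as OpenAPIV3.ParameterObject[]) {
      if (p.in !== 'path' && p.in !== 'query') continue;
      properties[p.name] = p.schema ?? { type: 'string' };
      if (p.required) required.push(p.name);
    }
    const body = (op.requestBody as OpenAPIV3.RequestBodyObject | undefined)
      ?.content['application/json']?.schema as OpenAPIV3.SchemaObject | undefined;
    Object.assign(properties, body?.properties ?? {});
    required.push(...(body?.required ?? []));
    tools.push(JSON.stringify({
      name: op.operationId,
      description: op.summary ?? `${method.toUpperCase()} ${path}`,
      inputSchema: { type: 'object', properties, required },
    }));
  }
}

writeFileSync('src/generated-tools.ts', `// AUTO-GENERATED\nexport const GENERATED_TOOLS = [${tools.join(',')}] as const;\n`);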

The generator reads the spec at build time and emits a TypeScript file that is committed to the repo and reviewed in PRs. When the API spec changes, re-running npm run generate propagates the change; a CI check that fails if the committed file differs from a freshly generated one means spec drift is caught before deployment.

The key mapping to understand is how OpenAPI concepts translate to MCP equivalents. An operationId becomes the MCP tool name. Path parameters, query parameters, and the request body schema all flatten into a single inputSchema object — MCP tools have no concept of URL parameters separate from body fields. The operation summary becomes the tool description, but OpenAPI summaries are written for developers reading reference docs; MCP descriptions are instructions for an LLM deciding which tool to call. The single most valuable manual step in any OpenAPI-to-MCP bridge is rewriting summaries as LLM instructions ("Retrieve a single user record by their UUID — use when you have a specific userId and need the full user object") rather than leaving them as developer shorthand ("Get user").
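One way to keep those rewritten descriptions from being overwritten by the next npm run generate is to store them outside the generated file, keyed by operationId, and merge them over the generated definitions at startup. A minimal sketch under that assumption; `GeneratedTool`, `DESCRIPTION_OVERRIDES`, `applyOverrides`, `staleOverrides`, and the `getUser` operationId are all hypothetical names, not MCP SDK APIs:

```typescript
// Hypothetical shape of one generated tool (mirrors MCP's name/description/inputSchema fields)
interface GeneratedTool {
  name: string;
  description: string;
  inputSchema: { type: 'object'; properties: Record<string, unknown>; required?: string[] };
}

// Hand-authored, LLM-facing descriptions keyed by operationId (the generated tool name).
// 'getUser' is an illustrative operationId, not one from a real spec.
const DESCRIPTION_OVERRIDES: Record<string, string> = {
  getUser:
    'Retrieve a single user record by their UUID. Use when you have a specific userId and need the full user object.',
};

const hasOwn = (obj: object, key: string): boolean => Object.prototype.hasOwnProperty.call(obj, key);

// Returns new tool objects; tools without an override keep the summary from the spec
function applyOverrides(
  tools: readonly GeneratedTool[],
  overrides: Record<string, string> = DESCRIPTION_OVERRIDES,
): GeneratedTool[] {
  return tools.map((tool) =>
    hasOwn(overrides, tool.name) ? { ...tool, description: overrides[tool.name] } : tool,
  );
}

// Overrides whose operationId vanished from the spec; failing CI on these keeps stale text from lingering
function staleOverrides(
  tools: readonly GeneratedTool[],
  overrides: Record<string, string> = DESCRIPTION_OVERRIDES,
): string[] {
  const names = new Set(tools.map((t) => t.name));
  return Object.keys(overrides).filter((name) => !names.has(name));
}
```

Keeping overrides in a plain map means a spec change regenerates only the mechanical parts (names and schemas) while the LLM-facing prose stays under review in its own file.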

Phase 2: Making the iteration loop fast with hot reload

The default MCP development cycle — edit TypeScript, run tsc, restart the process, reconnect the MCP Inspector, clear conversation context, re-run the tool — takes 15–30 seconds per iteration. At 20 iterations per hour, that's 5–10 minutes of waiting in every working hour. Compounded over weeks of development, the slow loop is one of the biggest contributors to MCP server developer frustration.
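The arithmetic behind those figures, taking the under-2-second hot-reload restart from the next paragraph as an upper bound:

```latex
\begin{aligned}
\text{slow loop:}\quad & 20 \times 15\,\mathrm{s} = 300\,\mathrm{s} = 5\,\mathrm{min}
  \quad\text{to}\quad 20 \times 30\,\mathrm{s} = 600\,\mathrm{s} = 10\,\mathrm{min} \\
\text{hot reload:}\quad & 20 \times 2\,\mathrm{s} = 40\,\mathrm{s} \approx 0.7\,\mathrm{min} \\
\text{recovered per hour:}\quad & 5 - 0.7 = 4.3\,\mathrm{min}
  \quad\text{to}\quad 10 - 0.7 = 9.3\,\mathrm{min}
\end{aligned}
```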

tsx --watch cuts this to under 2 seconds. It is a TypeScript executor built on esbuild that strips types (rather than type-checking) and restarts the process in 0.5–1.5 seconds after any file change. The MCP Inspector reconnects automatically:

# package.json
{
  "scripts": {
    "dev":       "tsx --watch src/index.ts",
    "typecheck": "tsc --noEmit",
    "build":     "tsc",
    "start":     "node dist/index.js"
  }
}
# Run Inspector connected to the dev server — reconnects automatically on each restart
npx @modelcontextprotocol/inspector npm run dev

Two practices that simpler scripts can skip make hot reload reliable in MCP servers. First, initialize stateful resources (database connections, HTTP clients) inside a factory function that runs after the Server is constructed, not at module load time, so their lifecycle is explicit instead of being a side effect of importing a file. Second, tsx --watch sends SIGTERM to the old process before starting the new one; if your server handles SIGTERM and calls db.close(), the restart is clean. Without that handler SQLite still keeps the file intact, because a transaction interrupted by an abrupt kill is rolled back the next time the database is opened, but the in-flight write is lost:

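import { Server } from '@modelcontextprotocol/sdk/server/index.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
// createDeps() and registerTools() are project-specific helpers, not SDK APIs

async function main() {
  const server = new Server({ name: 'my-server', version: '1.0.0' }, { capabilities: { tools: {} } });
  const deps = await createDeps(); // opens the DB connection after Server construction
  registerTools(server, deps);
  await server.connect(new StdioServerTransport());
  const shutdown = async () => { await deps.db.close(); process.exit(0); };
  process.on('SIGTERM', shutdown);
  process.on('SIGINT',  shutdown);
}
// Log to stderr (stdout carries the stdio JSON-RPC stream) and exit non-zero on startup failure
main().catch((err) => { console.error(err); process.exit(1); });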

The tradeoff with tsx is that type errors are invisible during development — it strips types without checking them. Run tsc --noEmit in a separate terminal or as a pre-commit hook. The hot-reload loop is for iteration speed; the type checker is for correctness. Combining them in the same terminal step defeats the purpose of hot reload.

Phase 3: Setting up the full local stack

Most MCP server setup friction is environmental: which TypeScript settings work with the SDK's .js import extensions, whether to use better-sqlite3 or a full PostgreSQL server for local data, how to manage .env files across dev and prod. Getting these right once and codifying them in a project template eliminates days of onboarding friction for each new contributor.

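The two settings that trip up most MCP TypeScript projects are "module": "node16" and "moduleResolution": "node16" in tsconfig.json. The SDK is imported through subpaths that end in .js (import { Server } from '@modelcontextprotocol/sdk/server/index.js'), which resolve through the package's exports map. Node16 resolution reads that map, and it also makes you write .js extensions on your own relative imports, which Node's ES module loader requires at runtime. With the legacy "moduleResolution": "node", tsc ignores the exports map and typically can't find the SDK subpaths at all; with "bundler", it accepts extensionless relative imports that compile cleanly and then fail at runtime with cryptic module-not-found errors, even though every package is installed. Pair these settings with "type": "module" in package.json so the compiled output is ES modules.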

// tsconfig.json
{
  "compilerOptions": {
    "target": "ES2022",
    "module": "node16",
    "moduleResolution": "node16",
    "outDir": "dist",
    "rootDir": "src",
    "strict": true,
    "esModuleInterop": true
  }
}

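For the data layer, prefer better-sqlite3 for local development and small single-tenant deployments. It requires no server to spin up and is fast enough for any workload where tool-call latency is measured in milliseconds rather than microseconds. SQLite commits are atomic in every journal mode, so the file survives the abrupt process terminations that hot reload causes; an interrupted transaction is simply rolled back. The key setup detail is enabling WAL mode on connection, which lets readers proceed while a write is in progress, so another process such as a CLI script can query the database while the dev server writes to it. Foreign-key enforcement is off by default in SQLite, so turn it on as well: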

import Database from 'better-sqlite3';
const db = new Database('./data/dev.db');
db.pragma('journal_mode = WAL');
db.pragma('foreign_keys = ON');
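The createDeps() factory called in the main() example earlier isn't shown. Here is a minimal sketch of one, assuming the deps object only needs the database. To keep it testable without the native module, it takes the opener as a parameter; that design choice, the `createDepsWith` name, and the `SqliteHandle` interface are hypothetical. `pragma(source, { simple: true })` is better-sqlite3's way of returning a single value instead of rows:

```typescript
// Minimal structural view of a better-sqlite3 Database: only the calls this factory uses
interface SqliteHandle {
  pragma(source: string, options?: { simple: boolean }): unknown;
  close(): unknown;
}

interface Deps {
  db: SqliteHandle;
}

// Opens the database (call it after the Server exists) and applies the pragmas above.
// Async so the call site matches `await createDeps()` if later deps need awaiting.
async function createDepsWith(open: () => SqliteHandle): Promise<Deps> {
  const db = open();
  db.pragma('journal_mode = WAL');
  db.pragma('foreign_keys = ON');
  // Verify the mode actually switched; with { simple: true } the pragma returns a single value
  const mode = db.pragma('journal_mode', { simple: true });
  if (String(mode).toLowerCase() !== 'wal') {
    db.close();
    throw new Error(`Expected WAL journal mode, got ${String(mode)}`);
  }
  return { db };
}

// In src/index.ts (assumes: import Database from 'better-sqlite3'):
// const createDeps = () => createDepsWith(() => new Database('./data/dev.db'));
```

The verification step is deliberately strict: an in-memory database reports its journal mode as memory and can't use WAL, so pointing this factory at ':memory:' fails fast instead of silently running with different concurrency behavior.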

For environment variables, Node 20.6+ supports --env-file .env natively — no dotenv package needed. Add it to the dev script: "dev": "tsx --watch --env-file .env src/index.ts". Commit a .env.example with placeholder values so contributors know what variables to set; add .env to .gitignore so secrets are never committed.
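--env-file only loads variables; it doesn't check that the ones you need are present, so a missing entry otherwise surfaces later as an undefined value deep inside a tool handler. A small startup check turns that into one clear error at boot. A sketch, where the variable names (DATABASE_PATH, UPSTREAM_API_KEY, PORT), the default port of 3000, and `loadConfig` itself are placeholders for whatever your .env.example lists:

```typescript
// Variables every environment must define (hypothetical names; mirror your .env.example)
const REQUIRED_ENV = ['DATABASE_PATH', 'UPSTREAM_API_KEY'] as const;

interface Config {
  databasePath: string;
  upstreamApiKey: string;
  port: number;
}

// Reads config from an env object (process.env in the server) and fails fast,
// listing every missing variable at once instead of the first one hit at runtime
function loadConfig(env: Record<string, string | undefined>): Config {
  const missing = REQUIRED_ENV.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(', ')} (see .env.example)`);
  }
  // PORT is optional; 3000 is an assumed default for HTTP deployments
  const port = Number(env.PORT ?? '3000');
  if (!Number.isInteger(port) || port <= 0) {
    throw new Error(`PORT must be a positive integer, got "${env.PORT}"`);
  }
  return {
    databasePath: env.DATABASE_PATH as string,
    upstreamApiKey: env.UPSTREAM_API_KEY as string,
    port,
  };
}

// In src/index.ts: const config = loadConfig(process.env);
```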

Phase 4: CLI tools for development and operations

The MCP Inspector is designed for interactive development: you pick a tool, fill in arguments, run it, see the result. It is not designed for three common operational needs: verifying that a deployment succeeded, diffing tool schemas between versions, or running a smoke test against a live endpoint in CI without a human in the loop.

These needs are met by a minimal scripts/ directory. The MCP protocol over HTTP is plain JSON-RPC — you can probe it with curl without any SDK:

#!/bin/bash
# scripts/health-check.sh
URL="${1:-$MCP_SERVER_URL}"
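# Streamable HTTP clients must accept both JSON and SSE responses; servers may reject requests that don't
RESPONSE=$(curl -sf --max-time 10 \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"health-check","version":"1.0.0"}}}' \
  "$URL/mcp" 2>&1)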

if echo "$RESPONSE" | grep -q '"protocolVersion"'; then
  echo "OK — MCP server healthy at $URL"
  exit 0
else
  echo "FAIL — unexpected response: $RESPONSE" >&2
  exit 1
fi

Four scripts cover most operational needs. A health-check script sends an initialize request and exits non-zero on any non-200 response, timeout, or missing protocolVersion in the result — this is the signal used in CI deploy gates and uptime monitors. A schema dump script calls ListTools and writes the JSON to a file: git diff between two dump files shows exactly which tool names, descriptions, or schema fields changed, making tool regressions visible in PRs. A smoke test script calls each tool with known-good arguments and asserts the responses look correct — it runs in CI after deployment to catch the class of failures where the server starts but immediately returns isError: true on every tool call due to a misconfigured database or expired API credential. A deployment verification script combines the health check, schema dump, and smoke test with a rollback command, ensuring a bad deployment is caught and reversed before any LLM client encounters it.
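A schema dump only produces readable diffs if its output is deterministic: tools in a stable order and object keys sorted, so a server that merely registers tools in a different order doesn't show up as a change. A minimal sketch of that serialization step, assuming the tools array has already been fetched with a tools/list call (for example through the SDK client's listTools()); `toStableDump` and `sortKeys` are hypothetical helper names:

```typescript
// Recursively sorts object keys so the JSON doesn't depend on property insertion order.
// Arrays keep their order, since reordering them could hide a real change.
function sortKeys(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(sortKeys);
  if (value !== null && typeof value === 'object') {
    const entries = Object.entries(value as Record<string, unknown>)
      .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
    return Object.fromEntries(entries.map(([k, v]) => [k, sortKeys(v)]));
  }
  return value;
}

interface ToolInfo {
  name: string;
  description?: string;
  inputSchema: unknown;
}

// The text written to the dump file: tools sorted by name, keys sorted, trailing newline
function toStableDump(tools: readonly ToolInfo[]): string {
  const sorted = [...tools].sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
  return JSON.stringify(sortKeys(sorted), null, 2) + '\n';
}
```

With this in place, git diff between two dump files lists only the tool names, descriptions, and schema fields that actually changed.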

Wire all four as npm run scripts so they're discoverable and version-controlled alongside the server code:

{
  "scripts": {
    "check:health":  "bash scripts/health-check.sh",
    "check:schema":  "tsx scripts/dump-schema.ts",
    "check:smoke":   "tsx scripts/smoke-test.ts",
    "check:deploy":  "bash scripts/verify-deploy.sh",
    "check":         "npm run check:health && npm run check:schema && npm run check:smoke"
  }
}
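The check:smoke script isn't shown, but its core is small: call each tool with known-good arguments and fail on a thrown error, an isError: true result, or an empty content array. A sketch with the transport abstracted behind a `callTool` function, a hypothetical parameter that in a real script could wrap the SDK client's callTool(), so the logic can be exercised without a live server. `runSmokeTest`, `SmokeCase`, and `ToolResult` are illustrative names:

```typescript
interface ToolResult {
  isError?: boolean;
  content: Array<{ type: string; text?: string }>;
}

type CallTool = (name: string, args: Record<string, unknown>) => Promise<ToolResult>;

interface SmokeCase {
  tool: string;
  args: Record<string, unknown>;
}

// Runs every case sequentially and returns one failure message per broken tool
// (an empty array means the smoke test passed)
async function runSmokeTest(callTool: CallTool, cases: readonly SmokeCase[]): Promise<string[]> {
  const failures: string[] = [];
  for (const { tool, args } of cases) {
    try {
      const result = await callTool(tool, args);
      if (result.isError) {
        const text = result.content.find((c) => c.type === 'text')?.text ?? '(no text)';
        failures.push(`${tool}: isError: true (${text})`);
      } else if (result.content.length === 0) {
        failures.push(`${tool}: empty content`);
      }
    } catch (err) {
      failures.push(`${tool}: threw ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  return failures;
}

// In scripts/smoke-test.ts: print failures and exit non-zero so CI blocks the deploy
// const failures = await runSmokeTest(callTool, CASES);
// if (failures.length > 0) { console.error(failures.join('\n')); process.exit(1); }
```

Collecting every failure instead of stopping at the first one matters for the misconfigured-database case: when every tool fails, the report makes that pattern obvious at a glance.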

The CLI scripts are complementary to hot reload and the local stack, not alternatives. Hot reload is the inner loop for interactive development; the CLI scripts are the outer loop for deployment validation and ongoing operations. A fast inner loop without a reliable outer loop leads to servers that work locally but fail silently in production.

Phase 5: Token budget enforcement for multi-tenant servers

Single-developer MCP servers running locally have no token budget concern — you control the client and the cost is yours. The moment you deploy a server that calls upstream LLM APIs (Anthropic, OpenAI, Gemini) on behalf of multiple tenants, token cost becomes a product-level problem. A single poorly-prompted LLM session — one that calls a tool in a loop, generates an enormous context, or triggers recursive tool chains — can exhaust a month's budget in minutes.

Token budget enforcement at the MCP server layer is the correct architectural choice for three reasons. It's the only layer you control when serving multiple clients. It's prompt-injection resistant: the check runs in server code, outside the LLM's context, so no prompt a user writes or injects can talk the model past it. And it's the only layer that has accurate cost data, because the server that calls the upstream LLM receives the actual token counts in the API response.
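Concretely, "actual token counts" means the usage object each provider returns alongside the completion. A sketch that normalizes two common shapes into one number for usage_events: Anthropic's Messages API reports usage.input_tokens and usage.output_tokens, and OpenAI's Chat Completions API reports usage.prompt_tokens and usage.completion_tokens. Whether cache-related token fields should count toward a tenant's quota is a billing decision this sketch leaves out; `tokensFromUsage` is a hypothetical helper:

```typescript
// Anthropic Messages API usage block (cache-related fields omitted in this sketch)
interface AnthropicUsage {
  input_tokens: number;
  output_tokens: number;
}

// OpenAI Chat Completions usage block
interface OpenAIUsage {
  prompt_tokens: number;
  completion_tokens: number;
}

// Returns the total tokens to record for one upstream call. Throws on an unrecognized
// shape so a provider-side change can't silently record zero usage.
function tokensFromUsage(usage: AnthropicUsage | OpenAIUsage): number {
  if ('input_tokens' in usage) return usage.input_tokens + usage.output_tokens;
  if ('prompt_tokens' in usage) return usage.prompt_tokens + usage.completion_tokens;
  throw new Error(`Unrecognized usage object: ${JSON.stringify(usage)}`);
}
```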

The implementation has two tables — tenants (quota configuration) and usage_events (individual call records) — and three enforcement points. Before any tool call that touches an upstream LLM, check whether the tenant is over quota. After a successful call, record the tokens consumed. On a monthly billing cycle, reset quotas. The critical detail is using soft limits (warn at 80%) and hard limits (block at 100%) rather than blocking at the first sign of high usage:

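import Database from 'better-sqlite3';
import { McpError, ErrorCode } from '@modelcontextprotocol/sdk/types.js';

type BudgetRow = { monthly_quota: number; soft_limit_pct: number; used: number };

// Throws at the hard limit (100%); returns warning: true past the soft limit.
// Assumes soft_limit_pct is stored as a percentage (e.g. 80) and monthly_quota > 0.
function checkBudget(tenantId: string, db: Database.Database): BudgetRow & { warning: boolean } {
  const quota = db.prepare(
    `SELECT t.monthly_quota, t.soft_limit_pct,
     COALESCE(SUM(u.tokens_used), 0) AS used
     FROM tenants t
     LEFT JOIN usage_events u ON u.tenant_id = t.id
       AND u.created_at >= date('now','start of month')
     WHERE t.id = ? GROUP BY t.id`
  ).get(tenantId) as BudgetRow | undefined;

  if (!quota) throw new McpError(ErrorCode.InvalidRequest, 'Unknown tenant');
  const ratio = quota.used / quota.monthly_quota;
  if (ratio >= 1.0) throw new McpError(ErrorCode.InvalidRequest,
    `Monthly token quota exceeded (${quota.used.toLocaleString()} / ${quota.monthly_quota.toLocaleString()}). Resets on the 1st.`);
  // soft limit: warn but allow; the caller decides how to surface the warning
  return { ...quota, warning: ratio * 100 >= quota.soft_limit_pct };
}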
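The query above assumes a specific shape for the two tables. A minimal sketch of that schema and of the second enforcement point, recording usage after a successful call. Column names match the query; the column types, defaults, CHECK constraints, index, and the `initBudgetSchema`/`recordUsage` helpers are assumptions. The db parameter is typed structurally so that a better-sqlite3 Database should satisfy it:

```typescript
// DDL matching the columns checkBudget reads; run once at startup
const SCHEMA = `
CREATE TABLE IF NOT EXISTS tenants (
  id             TEXT PRIMARY KEY,
  monthly_quota  INTEGER NOT NULL CHECK (monthly_quota > 0),
  soft_limit_pct INTEGER NOT NULL DEFAULT 80
);
CREATE TABLE IF NOT EXISTS usage_events (
  id          INTEGER PRIMARY KEY,
  tenant_id   TEXT NOT NULL REFERENCES tenants(id),
  tokens_used INTEGER NOT NULL CHECK (tokens_used >= 0),
  -- datetime('now') is UTC 'YYYY-MM-DD HH:MM:SS', which compares correctly with date('now','start of month')
  created_at  TEXT NOT NULL DEFAULT (datetime('now'))
);
CREATE INDEX IF NOT EXISTS usage_events_tenant_month ON usage_events (tenant_id, created_at);
`;

// The subset of better-sqlite3's API these helpers need
interface SqliteLike {
  exec(sql: string): unknown;
  prepare(sql: string): { run(...params: unknown[]): unknown };
}

function initBudgetSchema(db: SqliteLike): void {
  db.exec(SCHEMA);
}

// Second enforcement point: record tokens only after the upstream call succeeded
function recordUsage(db: SqliteLike, tenantId: string, tokensUsed: number): void {
  if (!Number.isInteger(tokensUsed) || tokensUsed < 0) {
    throw new Error(`tokensUsed must be a non-negative integer, got ${tokensUsed}`);
  }
  db.prepare('INSERT INTO usage_events (tenant_id, tokens_used) VALUES (?, ?)').run(tenantId, tokensUsed);
}
```

The CHECK on monthly_quota also guards the division in checkBudget, and the REFERENCES clause is only enforced because foreign_keys is switched on at connection time. Because the query windows usage by date('now','start of month'), the monthly reset needs no scheduled job: last month's rows simply fall out of the sum.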

The budget-exceeded error message is written for the LLM, not for a developer. The LLM will read it and relay it to the user — phrasing it as actionable information ("Resets on the 1st") means the user gets a useful message rather than a cryptic error code. Expose a check_budget tool so that an LLM planning an expensive multi-step operation can check remaining quota before starting:

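// check_budget tool — called by the LLM before expensive operations
import { z } from 'zod';
import { CallToolRequestSchema, ErrorCode, McpError } from '@modelcontextprotocol/sdk/types.js';

const CheckBudgetSchema = z.object({ tenantId: z.string() });

// A server has a single CallTool handler, so check_budget shares it with every other tool
server.setRequestHandler(CallToolRequestSchema, async (request) => {
  if (request.params.name === 'check_budget') {
    // In production, prefer a tenant ID bound to the authenticated connection over an LLM-supplied argument
    const { tenantId } = CheckBudgetSchema.parse(request.params.arguments);
    const status = getBudgetStatus(tenantId, db); // hypothetical: same query as checkBudget, without throwing
    const ratio = status.used / status.monthly_quota;
    return { content: [{ type: 'text', text: JSON.stringify({
      monthly_quota:  status.monthly_quota,
      used:           status.used,
      remaining:      Math.max(0, status.monthly_quota - status.used),
      percent_used:   Math.round(ratio * 100),
      resets_at:      getNextResetDate(),
      warning:        ratio >= 0.8
        ? 'Approaching monthly limit — consider checking again after completing current task' : null,
    })}] };
  }
  // ...dispatch the server's other tools here...
  throw new McpError(ErrorCode.InvalidParams, `Unknown tool: ${request.params.name}`);
});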
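getNextResetDate() is called but never defined above. A minimal sketch, assuming quotas reset at midnight UTC on the first of the month, which is what the date('now','start of month') window in checkBudget implies, since SQLite's 'now' is UTC. The optional `now` parameter is an addition for testability:

```typescript
// Returns the next quota reset as an ISO 8601 string: 00:00 UTC on the 1st of next month.
// Date.UTC rolls month index 12 over into January of the following year.
function getNextResetDate(now: Date = new Date()): string {
  return new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth() + 1, 1)).toISOString();
}
```

Called on 2026-06-13, it returns 2026-07-01T00:00:00.000Z, which gives the LLM a concrete date to relay instead of "the 1st".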

Token budget enforcement is the final piece of the DevEx stack because it only becomes necessary at scale. But when it does become necessary, the consequences of not having it are severe — a single tenant can take down the service for everyone by exhausting shared upstream API credentials. Adding it before you need it is much cheaper than scrambling to add it after your first incident.

How the five phases connect

Each phase in the DevEx stack builds on the previous ones in a way that's not always obvious from reading about them individually.

The OpenAPI bridge (phase 1) gets easier with a well-structured local stack (phase 3): when your project separates the generated src/generated-tools.ts from hand-authored src/tools/ overrides, updating the generator doesn't clobber tool-specific customizations. Hot reload (phase 2) makes the OpenAPI bridge easier to develop: you can run the generator script and watch the server pick up the new tool list in under 2 seconds, without restarting the Inspector or losing conversation context.

The CLI smoke test (phase 4) connects back to the OpenAPI bridge (phase 1): smoke-testing each generated tool with a known-good call confirms that the mapping from spec to MCP handler is correct, which is harder to catch in a type check alone. The schema dump script (also phase 4) connects to CI: failing the build when npm run generate produces a different file than what's committed is the same pattern as failing the build when the schema dump changes unexpectedly.

The token budget (phase 5) connects to the local stack (phase 3) through SQLite: the same better-sqlite3 database that stores your application data stores tenant quotas and usage events. Using a single database avoids a separate deployment dependency (no Redis, no separate Postgres instance) while still being fast enough for the per-call budget checks that happen in each tool handler.

| When you reach this phase | The practice that unblocks it |
|---|---|
| OpenAPI bridge (phase 1) | Local stack (phase 3): project structure tells you where generated files live |
| Hot reload (phase 2) | Local stack (phase 3): SIGTERM handler + factory function pattern |
| CLI tools (phase 4) | Hot reload (phase 2): inner loop for dev, CLI scripts for CI and ops |
| Token budgets (phase 5) | Local stack (phase 3): SQLite already in the project, schema just needs two new tables |

The gap all five phases share

The five DevEx practices cover the full development lifecycle, from writing the first tool definition to operating a multi-tenant production server. But they all share one blind spot: they run on a developer's machine or at deploy time in CI, and none of them keeps watching the server once the deploy has succeeded.

A server that passes the health check script at deploy time can still break silently in production later, for example when an upstream API credential expires or an external dependency starts failing after the deploy.

Failures like these are observable only from outside the process, at the protocol level, with a real connection. A monitor that calls initialize, then calls a representative tool and checks for a non-error response, will catch them. A monitor that only checks whether the process is running, or whether the HTTP port is open, will miss them and show green while users hit errors.
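A sketch of the verdict step such a probe needs, assuming it talks to a Streamable HTTP endpoint. Under that transport the server may answer a POST either with a single JSON body (Content-Type: application/json) or with an SSE stream (text/event-stream) whose data: lines carry the JSON-RPC message, so the probe has to accept both. `parseRpcResponse`, `probeVerdict`, and the verdict labels are hypothetical names, and the parser handles only the simple case of one data line per event:

```typescript
interface RpcResponse {
  jsonrpc: '2.0';
  id: number | string | null;
  result?: { isError?: boolean; [key: string]: unknown };
  error?: { code: number; message: string };
}

// Extracts the JSON-RPC response with the expected id from a JSON or SSE response body
function parseRpcResponse(contentType: string, body: string, id: number): RpcResponse | undefined {
  const candidates: string[] = contentType.includes('text/event-stream')
    ? body.split('\n').filter((line) => line.startsWith('data:')).map((line) => line.slice(5).trim())
    : [body];
  for (const text of candidates) {
    try {
      const msg = JSON.parse(text) as RpcResponse;
      if (msg.id === id) return msg;
    } catch {
      // not JSON (e.g. an HTML error page from a proxy): keep looking
    }
  }
  return undefined;
}

type Verdict = 'ok' | 'no-response' | 'protocol-error' | 'tool-error';

// "ok" requires a result not flagged isError: a process that is up but failing
// every tool call (expired credential, broken database) reports "tool-error"
function probeVerdict(response: RpcResponse | undefined): Verdict {
  if (!response) return 'no-response';
  if (response.error || !response.result) return 'protocol-error';
  if (response.result.isError === true) return 'tool-error';
  return 'ok';
}
```

Separating tool-error from protocol-error is the point: a port check or process check sees both as healthy, while only the protocol-level verdict exposes a server that answers every call with isError: true.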

AliveMCP pings your deployed MCP endpoint every 60 seconds: it performs the full initialize handshake, calls your configured probe tools, and validates that the response is structurally correct and non-error. When a deployment, configuration change, or external dependency failure silently breaks your server in production — the way no local or CI check can catch — AliveMCP alerts you before any LLM client encounters the failure. The DevEx stack covers development and CI; AliveMCP closes the production monitoring gap.

Further reading