Runtime guide · 2026-06-18 · Edge & Serverless Runtimes

MCP Servers on the Edge: Cloudflare Workers, Bun, Deno, Netlify Functions, and Azure Functions

The MCP wire protocol is runtime-agnostic. The same three-message JSON-RPC sequence — initialize, tools/list, tools/call — runs identically on Cloudflare Workers V8 isolates, a Bun process, a Deno server, a Netlify Function, and an Azure Function. What differs is everything underneath the protocol: how each runtime boots, how it stores state, how long it can run, how it handles TypeScript, and — critically — what breaks in production in ways that an MCP client sees but the server's own infrastructure never reports. This guide synthesizes the implementation patterns for all five runtimes into a single reference and maps the distinct monitoring failure each creates.
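To make the three-message sequence concrete, here is a minimal sketch of the JSON-RPC requests a client or probe would send. The protocol version string, client name, and tool name are illustrative assumptions. The sketch also omits the notifications/initialized notification that a full client sends after the initialize response.

```typescript
// JSON-RPC 2.0 request envelope used by every MCP call.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

// Assumed value: substitute the protocol version your server actually negotiates.
const PROTOCOL_VERSION = "2025-06-18";

// Build the initialize -> tools/list -> tools/call sequence for one tool.
function buildProbeSequence(toolName: string, args: Record<string, unknown>): JsonRpcRequest[] {
  return [
    {
      jsonrpc: "2.0",
      id: 1,
      method: "initialize",
      params: {
        protocolVersion: PROTOCOL_VERSION,
        capabilities: {},
        clientInfo: { name: "example-client", version: "0.1.0" }, // hypothetical client
      },
    },
    { jsonrpc: "2.0", id: 2, method: "tools/list" },
    { jsonrpc: "2.0", id: 3, method: "tools/call", params: { name: toolName, arguments: args } },
  ];
}

const sequence = buildProbeSequence("get_report", { jobId: "job-1" });
console.log(sequence.map((m) => m.method).join(" -> ")); // initialize -> tools/list -> tools/call
```

The payloads are identical on all five runtimes; only the HTTP handler that receives them changes.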

Five runtimes, one protocol, five constraint profiles

Before the runtime-by-runtime breakdown, the comparison table. Every cell that differs from the Node.js baseline is a place where your MCP server implementation will need to change — or where your assumptions from Node.js development will break silently.

Runtime | TypeScript | Cold start | Session state | Storage API | Max execution | Monitoring challenge
Cloudflare Workers | esbuild bundled | <5ms (V8 reuse) | Durable Objects | KV, R2, DO Storage | 30s CPU / request | 300+ edge nodes; a single-IP probe covers only one
Bun | Native (no tsc) | 100–300ms faster than Node | In-process (long-lived) | bun:sqlite (SQLite) | Unlimited (persistent process) | Restart loops show as repeated short outages
Deno | Native (built-in parser) | Comparable to Node | In-process or Deno KV | Deno KV (globally replicated) | Unlimited (persistent) / per-request (Deploy) | Permission-scope errors fail tool calls, not initialize
Netlify Functions | esbuild bundled | 200–800ms after idle | External store required | External DB / background function | 10s default / 26s Pro / 15min background | Env var misconfiguration: initialize passes, tools fail
Azure Functions | tsc (Node.js v4 model) | 500ms–5s (Consumption) / <100ms (Premium) | External store or Durable Functions | Blob Storage, Cosmos DB, Table Storage | 10min (Consumption) / unlimited (Premium) | Key Vault rotation failure surfaces as sustained 500 spike

Cloudflare Workers: V8 isolates and the distributed monitoring problem

Cloudflare Workers does not run Node.js. Your Workers MCP server runs in a V8 isolate — a lightweight JavaScript context with no file system, no native Node.js modules, and no access to the operating system. The first decision this forces is transport: use StreamableHTTPServerTransport instead of SSEServerTransport. The SSE transport assumes a long-lived process that can hold an open HTTP connection for the lifetime of a session; a Workers request completes in at most 30 seconds of CPU time, making persistent SSE connections impossible.

The second constraint is state. Each Worker invocation is stateless: there is no in-process sessions Map that survives between requests. If your MCP server needs to preserve context across a multi-turn tool sequence (a common pattern when an LLM calls start_job and then get_job_result with the returned ID), you need Durable Objects. An MCPSession Durable Object receives all requests that share the same mcp-session-id header, giving you isolated per-session SQLite-backed storage. Durable Objects idle-evict after 30 days; a Durable Object re-activated after a long idle period counts as a fresh instantiation and incurs a first-call latency spike that looks to AliveMCP like a slow response, not an error.
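The routing step described above can be sketched as follows. The interfaces are structural stand-ins for the Workers runtime types, and the in-memory namespace exists only to exercise the logic; in a real Worker the namespace would be a Durable Object binding on env (the binding name is up to you) and the returned stub would forward the request with stub.fetch(request).

```typescript
// Structural stand-ins for the Workers runtime types (assumptions for this sketch).
interface ObjectId {
  toString(): string;
}
interface SessionNamespace<Stub> {
  idFromName(name: string): ObjectId;
  get(id: ObjectId): Stub;
}

// Resolve the per-session object, minting a session ID when the header is absent.
function resolveSession<Stub>(
  ns: SessionNamespace<Stub>,
  sessionHeader: string | null,
  mintId: () => string,
): { sessionId: string; stub: Stub } {
  const sessionId = sessionHeader && sessionHeader.length > 0 ? sessionHeader : mintId();
  // idFromName is deterministic: the same name always maps to the same object.
  return { sessionId, stub: ns.get(ns.idFromName(sessionId)) };
}

// In-memory fake namespace, used here only to exercise the routing logic.
function fakeNamespace(): SessionNamespace<{ name: string }> {
  const stubs = new Map<string, { name: string }>();
  return {
    idFromName: (name) => ({ toString: () => name }),
    get: (id) => {
      const key = id.toString();
      if (!stubs.has(key)) stubs.set(key, { name: key });
      return stubs.get(key)!;
    },
  };
}

const ns = fakeNamespace();
const first = resolveSession(ns, null, () => "session-a");
const second = resolveSession(ns, first.sessionId, () => "unused");
console.log(first.stub === second.stub); // true: both requests reach the same object
```

Returning the session ID alongside the stub lets the handler echo it back in the mcp-session-id response header on the first request.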

The third constraint — and the most surprising from a monitoring perspective — is distribution. When you deploy to Cloudflare Workers, your server runs at 300+ edge locations simultaneously. A traditional uptime probe from a single IP address tests only the nearest edge node. If that node is healthy but 40% of your users are served by a different edge colocation that is running a stale deploy or experiencing a regional issue, your single-IP probe shows green while those users see failures. External protocol monitoring from AliveMCP sends the full initialize handshake and verifies the protocolVersion in the response — catching cases where a new deploy is serving the correct protocol on one edge and a broken bundle on another.

Environment variables on Workers are environment bindings: secrets set with wrangler secret put and non-secret configuration in wrangler.toml [vars]. Both are accessed as env.BINDING_NAME in the Worker handler, not process.env. An MCP server that references process.env.API_KEY directly will compile without error and fail at runtime with undefined — but only when a tool that uses that key is called. The initialize and tools/list paths will appear healthy. Only an external probe that exercises an actual tool call catches this failure class.
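One way to move this failure out of the tool path is to validate bindings at the top of the request handler, before any MCP message is processed. The sketch below assumes two hypothetical binding names; the check itself is plain TypeScript and does not depend on any Workers API.

```typescript
// Bindings this server expects; the names are illustrative assumptions.
const REQUIRED_BINDINGS = ["API_KEY", "UPSTREAM_URL"] as const;

type Env = Partial<Record<(typeof REQUIRED_BINDINGS)[number], string>>;

// Return the names of required bindings that are absent or empty on the env object.
function missingBindings(env: Env): string[] {
  return REQUIRED_BINDINGS.filter((name) => !env[name]);
}

// Fail before any MCP message is handled, so a missing secret breaks initialize too.
function assertBindings(env: Env): void {
  const missing = missingBindings(env);
  if (missing.length > 0) {
    throw new Error(`Missing Worker bindings: ${missing.join(", ")}`);
  }
}

console.log(missingBindings({ API_KEY: "secret" })); // lists the one missing name, UPSTREAM_URL
```

With this guard in place, a misconfigured deploy fails the handshake itself, so even a probe that stops at initialize sees it.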

Bun: native TypeScript and restart-loop visibility

Bun is the simplest runtime transition from Node.js for MCP server development. The Bun MCP server setup eliminates the TypeScript compilation step — you run .ts files directly with bun run server.ts, no tsconfig.json required for basic usage, no ts-node or tsx wrapper needed. The @modelcontextprotocol/sdk package installs and works without modification because Bun's Node.js compatibility layer covers the standard module surface the SDK uses.

The main practical differences from Node.js are tooling, not protocol: bun install instead of npm install (roughly 3× faster), bun:test instead of Vitest or Jest (Jest-compatible API, native TS parsing makes it faster), bun --watch instead of nodemon, and the built-in bun:sqlite module (its Database class) as a SQLite binding that replaces better-sqlite3 without a native module compilation step. The API surface is similar: prepare().all(), prepare().run(), WAL mode configuration.

Bun's startup time advantage — 100–300ms faster than the equivalent Node.js process — has a monitoring consequence. If you configure AliveMCP with a cold-start alert threshold calibrated for Node.js (say, 2 seconds), a restarted Bun server will clear that threshold easily and look healthy even under rapid restart loops. The signal to watch is not a single slow probe response but the pattern: in AliveMCP's 90-day uptime graph, a process restart loop shows as a sequence of brief downtimes separated by short recovery windows. A single cold start looks like a momentary blip; a restart loop looks like a sawtooth. Calibrate your Bun server's alert threshold lower than you would for Node.js, so each restart is visible as a distinct event rather than absorbed into the noise floor.
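The sawtooth pattern can be detected mechanically from probe history. This is a minimal sketch, not AliveMCP's implementation: the ProbeResult shape and the threshold of three recoveries per window are assumptions chosen for illustration.

```typescript
// One probe outcome: timestamp in milliseconds and whether the handshake succeeded.
interface ProbeResult {
  at: number;
  up: boolean;
}

// Count down->up recoveries inside the trailing window; each one is a completed outage.
function countRecoveries(results: ProbeResult[], windowMs: number, now: number): number {
  const recent = results
    .filter((r) => now - r.at <= windowMs)
    .sort((a, b) => a.at - b.at);
  let recoveries = 0;
  for (let i = 1; i < recent.length; i++) {
    if (!recent[i - 1].up && recent[i].up) recoveries++;
  }
  return recoveries;
}

// Assumed threshold: three or more recoveries in the window reads as a restart loop.
function isRestartLoop(results: ProbeResult[], windowMs: number, now: number, threshold = 3): boolean {
  return countRecoveries(results, windowMs, now) >= threshold;
}

const minute = 60000;
const history: ProbeResult[] = [true, false, true, false, true, false, true].map((up, i) => ({
  at: i * minute,
  up,
}));
console.log(isRestartLoop(history, 10 * minute, 6 * minute)); // true
```

A single cold start produces one recovery and stays under the threshold; a crash loop produces several in the same window.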

One compatibility note for Bun production deployments: native addons are not guaranteed to load. Bun implements most of Node-API, but addons that depend on V8 or Node.js internals can fail. The most common MCP server packages affected are bcrypt (replace with bcryptjs or Bun's built-in Bun.password) and better-sqlite3 (replace with bun:sqlite). Both replacements are close to drop-in at the API level. If your existing Node.js MCP server uses any package with a .node binding, audit that before migrating.

Deno: the permission model as a monitoring surface

Deno's MCP server implementation uses the same @modelcontextprotocol/sdk package via npm: imports, the same StreamableHTTPServerTransport, and Deno's built-in Deno.serve() instead of Express. What distinguishes Deno from every other runtime in this list is its permission model: Deno requires explicit flags for every category of resource access.

Flag | What it allows | MCP server use
--allow-net=0.0.0.0:3000,api.example.com | Network, scoped to host:port | Required: HTTP server + outbound API calls
--allow-env=DATABASE_URL,API_KEY | Environment variables, by key | Required: configuration
--allow-read=/data | File system reads, scoped to path | If tools read files
--allow-write=/data/output | File system writes, scoped to path | If tools write files
--allow-run=ffmpeg | Subprocess execution, by command | Only if tools spawn subprocesses

The critical monitoring implication: --allow-net must include the address the server listens on. If you start Deno with --allow-net=api.example.com for outbound API access but forget to include 0.0.0.0:3000, the server cannot bind its listening address and fails at startup, before MCP initialization can begin. That failure is easy to catch in development. The subtler variant: if your Deno MCP server's tool handlers call an external API whose hostname is not in the --allow-net scope, the initialize handshake succeeds, tools/list succeeds, but every tool call that reaches the blocked API throws a PermissionDenied error. AliveMCP's protocol probe exercises the full initialize path, which confirms the server bound its listening address; a periodic probe that calls an actual tool is what catches the permission-scope gap that the transport-level check misses.
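A pre-deploy check can compare the --allow-net value against the endpoints the server actually needs. The sketch below models only exact host and host:port entries (a bare host allows any port); it does not model IP ranges or any other matching rules, and the billing.example.com upstream is a hypothetical example.

```typescript
// An --allow-net entry is "host" (any port) or "host:port" (that port only).
function isNetAllowed(allowNet: string[], host: string, port: number): boolean {
  return allowNet.some((entry) => entry === host || entry === `${host}:${port}`);
}

// Return the endpoints that the server needs but the flag does not cover.
function uncoveredEndpoints(
  allowNet: string[],
  needed: Array<{ host: string; port: number }>,
): string[] {
  return needed
    .filter(({ host, port }) => !isNetAllowed(allowNet, host, port))
    .map(({ host, port }) => `${host}:${port}`);
}

const allowNet = "0.0.0.0:3000,api.example.com".split(",");
const needed = [
  { host: "0.0.0.0", port: 3000 }, // listening address
  { host: "api.example.com", port: 443 }, // covered: entry has no port restriction
  { host: "billing.example.com", port: 443 }, // hypothetical second upstream, not in the flag
];
console.log(uncoveredEndpoints(allowNet, needed)); // reports billing.example.com:443
```

Keeping the needed list next to the tool definitions makes it harder to add a new upstream call without updating the start command.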

For production deployments requiring global distribution, Deno Deploy runs your Deno server at 35+ edge regions with Deno KV for persistent state. Deno KV is globally replicated and eventually consistent; for per-session MCP state under concurrent writes, use kv.atomic() compare-and-swap to avoid stale-read overwrites. The Deno Deploy monitoring gap mirrors the Cloudflare Workers challenge: a server running at 35 edge regions is not fully tested by a single-IP probe. AliveMCP tests from an external vantage point, verifying that the MCP protocol handshake completes correctly, not just that the edge endpoint returns an HTTP 200.

Netlify Functions: the 10-second wall and the cold-start warming side effect

Netlify Functions imposes the most rigid execution constraint of any runtime in this comparison: a hard timeout wall that kills the function invocation at 10 seconds by default (26 seconds on the Pro plan). There is no graceful termination, no partial response — the connection closes with a 502. For an MCP server on Netlify Functions, this means that any tool handler that might take more than 8 seconds (leaving a 2-second buffer for function overhead and MCP framing) requires the async dispatch pattern.

Plan | Default timeout | Max timeout | Background functions
Free | 10s | 10s | 15 min
Pro | 10s | 26s | 15 min

The async dispatch pattern splits slow work across two tool calls: start_report stores a job record in an external datastore and triggers a background function (suffixed -background or typed as BackgroundHandler), then returns a jobId. get_report polls that datastore for the result. The LLM loop calls start_report, waits a turn, calls get_report. Background functions run for up to 15 minutes, placing virtually all work within reach of this pattern.
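The two-tool split can be sketched as below. The in-memory Map stands in for the external datastore purely for illustration (on Netlify it would not survive between invocations), and the trigger callback stands in for invoking the background function; both are assumptions of the sketch.

```typescript
type JobStatus = "pending" | "done" | "failed";
interface JobRecord {
  status: JobStatus;
  result?: string;
}

// Stand-in for the external datastore; a real deployment needs a store shared across invocations.
const jobs = new Map<string, JobRecord>();
let nextId = 1;

// start_report: record the job, hand off the slow work, return immediately with a jobId.
function startReport(trigger: (jobId: string) => void): { jobId: string } {
  const jobId = `job-${nextId++}`;
  jobs.set(jobId, { status: "pending" });
  trigger(jobId); // in production: invoke the background function without waiting for it
  return { jobId };
}

// Called by the background function when the slow work finishes.
function completeReport(jobId: string, result: string): void {
  jobs.set(jobId, { status: "done", result });
}

// get_report: a cheap read that always fits inside the synchronous timeout.
function getReport(jobId: string): JobRecord {
  return jobs.get(jobId) ?? { status: "failed", result: "unknown jobId" };
}

const queued: string[] = [];
const { jobId } = startReport((id) => queued.push(id));
console.log(getReport(jobId).status); // pending
completeReport(queued[0], "report body");
console.log(getReport(jobId).status); // done
```

Returning a distinct status for an unknown jobId keeps a typo in the ID from looking like a job that is still running.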

Netlify Functions are stateless per invocation — each request to your MCP server creates a new function instance with no in-process state from previous requests. Session state must live in an external store. This makes the initialize handshake stateless too: your handler creates a fresh McpServer and transport each time, then tears them down when the invocation completes. This is different from the Node.js model where the server process is shared across sessions.

Cold starts on Netlify Functions range from 200ms to 800ms after an idle period. For an MCP server being polled by an LLM agent in production, this latency appears as the first tool call in a session being noticeably slower than subsequent calls (which hit a warm instance). AliveMCP's 60-second polling interval has a side effect that is worth knowing: it keeps your function warm during the hours when AliveMCP is probing it. If AliveMCP is configured on your Netlify MCP endpoint and probing at 60-second intervals, your function instances will not idle-evict during business hours when monitoring is active. This is not a substitute for a proper warming strategy, but it is a practical benefit of external monitoring on serverless endpoints.

The most dangerous silent failure on Netlify Functions: environment variable misconfiguration. Functions read environment variables set in the Netlify UI, CLI, or API (variables declared in netlify.toml are not available to functions at runtime), and those variables can be scoped per deploy context. A function can therefore deploy successfully (initialize succeeds, tools/list succeeds) while every tool that accesses process.env.DATABASE_URL fails with an undefined connection string. The only way to catch this before a user does is a post-deploy verification step that calls an actual tool, not just checks the MCP transport. AliveMCP catches this by periodically exercising the full protocol path, not just the initialize handshake.

Azure Functions: plan selection, Durable Functions, and Key Vault rotation

Azure Functions MCP servers use the Node.js v4 programming model, where HTTP triggers are registered with app.http(). Each invocation is stateless — the handler creates a new McpServer instance and tears it down per request, the same pattern as Netlify Functions. The most consequential decision is plan selection:

Feature | Consumption Plan | Premium Plan (EP1)
Cold start | 500ms–5s | <100ms (pre-warmed instances)
Scale-to-zero | Yes | No
VNet integration | No | Yes
Max execution time | 10 minutes | Unlimited
Monthly cost (baseline) | Near-zero (pay-per-execution) | ~$150+

For MCP servers with interactive tool calls (sub-second expected latency), a 5-second cold start on Consumption Plan is unusable. AliveMCP's 60-second probe partially mitigates this — active monitoring keeps an instance warm — but it is not a substitute for Premium Plan if your SLA requires consistent sub-100ms response times. Defer expensive initialization (database connection pools, SDK clients) to the first tool call, not the initialize handler, so the probe's initialize call returns quickly even on a cold start.
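Deferring initialization is a small memoization wrapper. In this sketch the pool object is a hypothetical stand-in for a database pool or SDK client, and the protocol version string is an assumed value; the point is only that the initialize path never touches the expensive factory.

```typescript
// Wrap an expensive factory so it runs once, on first use, and is cached afterwards.
function lazy<T>(factory: () => T): () => T {
  let cached: { value: T } | undefined;
  return () => {
    if (cached === undefined) cached = { value: factory() };
    return cached.value;
  };
}

let poolsCreated = 0;
// Hypothetical stand-in for a database pool or SDK client.
const getPool = lazy(() => {
  poolsCreated++;
  return { query: (sql: string) => `ran: ${sql}` };
});

// initialize path: answers without touching the pool.
function handleInitialize(): { protocolVersion: string } {
  return { protocolVersion: "2025-06-18" };
}

// Tool path: the first call pays the initialization cost, later calls reuse the pool.
function handleToolCall(sql: string): string {
  return getPool().query(sql);
}

handleInitialize();
console.log(poolsCreated); // 0
handleToolCall("select 1");
handleToolCall("select 2");
console.log(poolsCreated); // 1
```

Declaring getPool at module scope means a warm instance reuses the pool across invocations, while a cold instance pays for it only when a tool is actually called.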

For MCP tools that need to run longer than 10 minutes on Consumption Plan, Durable Functions is the correct pattern. Orchestration is expressed as generator functions using yield context.df.callActivity(), where each yield is a checkpoint. If the function host restarts mid-workflow, Durable Functions replays the orchestrator from the last checkpoint rather than restarting from scratch. This maps naturally to the async dispatch pattern: a start_report tool calls client.startNew() on the Durable Functions client to launch an orchestration and returns the instance ID; a check_report tool calls client.getStatus(instanceId) to poll for completion. The LLM loop handles the polling.

The Azure-specific failure mode that external monitoring uniquely catches: Key Vault reference resolution failure. Azure Functions can reference secrets as App Settings with the syntax @Microsoft.KeyVault(SecretUri=...), which the Function runtime resolves to the actual secret value at startup using Managed Identity. If the Key Vault access policy is revoked, the Managed Identity is removed, or the secret is deleted, the Function App fails to start — and all tool calls return 500 errors. This failure is invisible to Azure's own function health endpoints because the function never becomes healthy enough to respond. AliveMCP detects it as a sustained 500 spike on a previously healthy endpoint — the transition from green to repeated 500s is the alert signal, even if each individual 500 looks like a tool-call failure.

Azure Application Insights and AliveMCP monitor different things and are genuinely complementary, not redundant:

What to monitor | Tool
Function execution time, dependency latency, exception rate by function name, error distribution across invocations | Application Insights
MCP protocol health (initialize handshake, protocolVersion verification), cold-start latency as seen by external clients, VNet firewall changes, TLS certificate expiry on the public endpoint | AliveMCP
Key Vault rotation failure (early detection via 500 spike) | AliveMCP (external probe)

The shared monitoring gap

Five runtimes, five constraint profiles, one shared blind spot: every runtime's internal health-check mechanism (Cloudflare Workers health endpoints, Bun's process manager, Deno's supervisor, Netlify's function invocation logs, Azure's function health probe) operates from inside the runtime's own infrastructure. All five miss two kinds of failure: those that occur before a request reaches the server, and those that occur inside a tool handler while the transport handler stays healthy.

The failure classes that fall into the blind spot, mapped by runtime:

Runtime | Failure class | Internal check result | External probe result
Cloudflare Workers | Stale deploy on regional edge node | Healthy (nearest node tested) | Catches wrong protocolVersion on affected edge
Cloudflare Workers | Missing env.API_KEY binding (process.env typo) | Initialize: healthy | Tool call: isError: true (caught)
Bun | pm2 restart loop (crash every 30s) | pm2 shows "online" between restarts | Sawtooth outage pattern in 90-day graph
Deno | --allow-net missing outbound API host | Initialize: healthy; tools/list: healthy | Tool call: PermissionDenied (caught)
Netlify Functions | Missing DATABASE_URL env var (wrong deploy context) | Initialize: healthy | Tool call: connection error (caught)
Azure Functions | Key Vault Managed Identity revoked | Function App: "Running" | Sustained 500 spike on previously healthy endpoint

The common structure of the rows above: the internal check reports healthy because it verifies only the path it can reach from inside the runtime. In most rows the transport path (initialize, tools/list) also appears healthy, and the failure lives in the path between the runtime's infrastructure boundary and the actual tool execution. Internal checks that probe the transport layer are blind to this failure class by definition.

An external protocol probe from AliveMCP sends the full MCP JSON-RPC sequence from outside each runtime's own infrastructure — the same path an LLM client takes. It is the only monitoring approach that catches all six failure rows above, because it observes the server from the same position as the client that will be affected.
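The distinctions in the table can be expressed as a small classifier over what an external probe observes. This is an illustrative sketch, not AliveMCP's implementation: the ProbeObservation shape and the verdict names are assumptions, while protocolVersion and isError are the fields the table refers to.

```typescript
type ProbeVerdict = "healthy" | "protocol_mismatch" | "tool_error" | "transport_error";

interface ProbeObservation {
  httpStatus: number;
  // Result of initialize, present only if the response parsed as JSON-RPC.
  initializeResult?: { protocolVersion?: string };
  // Result of tools/call, present only if a tool call was attempted.
  toolResult?: { isError?: boolean };
}

// Check the layers in order: transport first, then protocol, then tool execution.
function classifyProbe(obs: ProbeObservation, expectedVersion: string): ProbeVerdict {
  if (obs.httpStatus >= 500 || obs.initializeResult === undefined) return "transport_error";
  if (obs.initializeResult.protocolVersion !== expectedVersion) return "protocol_mismatch";
  if (obs.toolResult?.isError === true) return "tool_error";
  return "healthy";
}

const expected = "2025-06-18"; // assumed version for the example
console.log(classifyProbe({ httpStatus: 500 }, expected)); // transport_error
console.log(
  classifyProbe(
    { httpStatus: 200, initializeResult: { protocolVersion: expected }, toolResult: { isError: true } },
    expected,
  ),
); // tool_error
```

The second case is the one internal checks miss: the handshake is healthy and only the tool call reveals the failure.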

Choosing a runtime

The choice between these five runtimes is not primarily a monitoring decision; it is a hosting, tooling, and constraint decision. The monitoring strategy follows from that choice: probe the failure class that the chosen runtime's internal checks cannot see.

FAQ

Can I use the same MCP server code on all five runtimes?

Mostly yes, with targeted changes per runtime. The MCP SDK and your tool handler logic are portable. What changes: (1) Transport selection: use StreamableHTTPServerTransport on Workers, Netlify, and Azure (stateless per request); use SSEServerTransport or Streamable HTTP on Bun and Deno (persistent process). (2) Environment access: Workers uses env.KEY, Deno uses Deno.env.get('KEY'), others use process.env.KEY. (3) Storage API: Workers uses KV/R2/DO, Deno uses Deno.readTextFile() or Deno KV, Bun uses bun:sqlite, Netlify and Azure use external datastores. A thin abstraction layer over these three points makes the core tool logic portable.
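For the environment-access point, the abstraction can be as small as one function type. The sketch below is one possible shape, with all names (EnvSource, fromBindings, fromRuntimeGlobals, requireEnv) invented for illustration; it reads Deno.env.get and process.env only if the running runtime provides them.

```typescript
type EnvSource = (key: string) => string | undefined;

// Workers: bindings arrive as the env argument of the request handler.
function fromBindings(env: Record<string, unknown>): EnvSource {
  return (key) => (typeof env[key] === "string" ? (env[key] as string) : undefined);
}

// Deno and Node-compatible runtimes: read from whichever global the runtime provides.
function fromRuntimeGlobals(): EnvSource {
  const g = globalThis as unknown as {
    Deno?: { env: { get(key: string): string | undefined } };
    process?: { env: Record<string, string | undefined> };
  };
  return (key) => g.Deno?.env.get(key) ?? g.process?.env[key];
}

// Tool logic depends only on EnvSource, never on a runtime-specific global.
function requireEnv(source: EnvSource, key: string): string {
  const value = source(key);
  if (!value) throw new Error(`Missing configuration: ${key}`);
  return value;
}

const workerEnv = fromBindings({ API_KEY: "secret-from-wrangler" });
console.log(requireEnv(workerEnv, "API_KEY")); // secret-from-wrangler
```

Tool handlers then take an EnvSource parameter, and each runtime's entry point decides which constructor to pass in.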

Do I need Durable Objects on Cloudflare Workers, or can I use stateless MCP?

It depends on whether your MCP tools need to share state across calls within a session. Stateless MCP (using sessionIdGenerator: undefined on the StreamableHTTPServerTransport) works fine for tools where every call is self-contained — a database lookup, an API call, a computation. You need Durable Objects when you have tool sequences where the output of one call (a job ID, a workflow state, a cursor) needs to be available in the next call from the same session, and you cannot derive that state from the arguments alone. Many MCP servers start stateless and add Durable Objects only when a specific tool pattern requires session context.

What is the right execution timeout budget for an MCP tool on Netlify Functions?

Target under 8 seconds for synchronous tools on the default 10-second limit, leaving buffer for function startup and MCP framing overhead; the Pro plan raises the ceiling to 26 seconds. For tools that might take longer, use the async dispatch pattern: start_job triggers a background function (up to 15 minutes) and returns a job ID; get_job_result polls status. The LLM's tool-call loop handles the waiting. Design the timeout budget before writing the tool: a tool that works in 9 seconds locally leaves no margin under the default 10-second timeout and fails in production with no partial output.
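The budget arithmetic is simple enough to encode as a design-time check. This sketch assumes the same 2-second buffer applies at every limit, which is a simplification; adjust bufferMs to what you measure.

```typescript
type DispatchMode = "sync" | "async";

// Pick a dispatch mode from the worst-case duration, the platform limit, and a safety buffer.
function chooseDispatch(worstCaseMs: number, limitMs: number, bufferMs = 2000): DispatchMode {
  return worstCaseMs <= limitMs - bufferMs ? "sync" : "async";
}

console.log(chooseDispatch(3000, 10000)); // sync
console.log(chooseDispatch(9000, 10000)); // async: 9s exceeds the 8s budget
console.log(chooseDispatch(9000, 26000)); // sync
```

Feeding it the worst-case duration rather than the typical one is what keeps a tool from passing locally and timing out in production.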

Why does AliveMCP show my Azure Function as failing when the Azure portal shows it as running?

The most common cause is a Key Vault reference resolution failure. When a Managed Identity's access to Key Vault is revoked or a secret is deleted, the Function App cannot resolve its App Settings at startup and all invocations return 500. The Azure portal shows the Function App status as "Running" because the App Service plan and host process are alive — they simply cannot serve requests. AliveMCP shows the failure because it sends an actual initialize JSON-RPC call, which receives a 500 response. Other causes in the same category: VNet firewall rule changes that block the AliveMCP probe IP, TLS certificate expiry on a custom domain, or a deployment that left a syntax error in host.json. In all cases, the Azure portal's "running" indicator and AliveMCP's "failing" probe are both correct — they are measuring different things.

How does Deno's permission model affect MCP server security versus Node.js?

Deno's permission model provides defense-in-depth that Node.js does not. A Node.js MCP server running with node server.js has implicit access to the file system, network, subprocesses, and environment — a compromised tool handler can exfiltrate credentials, read arbitrary files, or open outbound connections without restriction. A Deno MCP server started with deno run --allow-net=api.example.com:443 --allow-env=API_KEY is confined: the tool handler cannot read /etc/passwd, cannot open a connection to an arbitrary IP, and cannot spawn subprocesses. The trade-off is that permission-scope errors fail silently from the MCP client's perspective — the tool returns an error, not a protocol failure — and require post-deploy tool-call verification to catch before users do. The security benefit is real and worth the operational overhead.

Monitor your MCP server regardless of runtime

Every runtime in this guide creates a class of failure that internal checks cannot detect. AliveMCP monitors the full MCP protocol path — initialize, tools/list, protocol version verification — from outside your runtime's own infrastructure, at 60-second intervals. If your MCP server is in a public registry (MCP.so, Glama, PulseMCP, Smithery, the Official Registry), it is already listed on AliveMCP's public dashboard. Claim your listing to add custom alert webhooks, get notified within 60 seconds of a failure, and access 90 days of uptime and response-time history.
