MCP server on Azure Functions

Azure Functions is Microsoft's serverless compute platform, widely used in enterprise environments that already run Azure infrastructure. Deploying an MCP server on Azure Functions gives you native integration with Azure AD for authentication, Azure Key Vault for secrets management, Application Insights for telemetry, and the entire Azure service ecosystem for tool backends. The most important architectural decision is which hosting plan to use: Consumption Plan (pay-per-execution, scale-to-zero, cold starts) vs Premium Plan (pre-warmed instances, no cold starts, higher baseline cost). For MCP servers that must always respond within a tight timeout, Consumption Plan cold starts are the critical constraint. This guide covers both paths and the patterns that work in each.

TL;DR

Use an HTTP-triggered Azure Function with the MCP SDK's StreamableHTTPServerTransport. For enterprise deployments where cold starts are unacceptable, use Premium Plan and keep at least one always-ready instance (set the plan's minimum instance count; functionAppScaleLimit only caps the maximum scale-out and does not keep anything warm). For long-running tool operations (ETL, document processing, multi-step orchestration), use Durable Functions: the fan-out/fan-in pattern handles concurrent tool sub-tasks, and the async HTTP API handles the start/poll lifecycle. Monitor with AliveMCP for external protocol-level probing; Application Insights shows you internal metrics, but AliveMCP shows you what your agent clients actually experience.

HTTP trigger for MCP: basic implementation

Create your Function App in the Azure portal or with the Azure CLI, then add an HTTP trigger for the MCP endpoint. The Node.js v4 programming model (recommended for new projects) uses a simpler function registration API than v3:

// src/functions/mcp.ts — MCP server as an Azure HTTP trigger (Node.js v4 model)
import { app, HttpRequest, HttpResponseInit, InvocationContext } from "@azure/functions";
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";
import { z } from "zod";

const createMcpServer = (context: InvocationContext) => {
  const server = new McpServer({ name: "azure-mcp", version: "1.0.0" });

  server.tool(
    "list_storage_blobs",
    "List blobs in an Azure Storage container",
    { container: z.string(), prefix: z.string().optional() },
    async ({ container, prefix }) => {
      // Azure SDK — uses Managed Identity or connection string from Key Vault reference
      const { BlobServiceClient } = await import("@azure/storage-blob");
      const client = BlobServiceClient.fromConnectionString(
        process.env.AZURE_STORAGE_CONNECTION_STRING!
      );
      const containerClient = client.getContainerClient(container);
      const blobs: string[] = [];

      for await (const blob of containerClient.listBlobsFlat({ prefix })) {
        blobs.push(blob.name);
        if (blobs.length >= 100) break; // Prevent unbounded listing
      }

      return { content: [{ type: "text", text: JSON.stringify(blobs) }] };
    }
  );

  server.tool(
    "query_cosmos",
    "Query Azure Cosmos DB",
    { container: z.string(), query: z.string() },
    async ({ container, query }) => {
      const { CosmosClient } = await import("@azure/cosmos");
      const client = new CosmosClient({ endpoint: process.env.COSMOS_ENDPOINT!, key: process.env.COSMOS_KEY! });
      const database = client.database(process.env.COSMOS_DATABASE!);
      const { resources } = await database.container(container).items.query(query).fetchAll();
      return { content: [{ type: "text", text: JSON.stringify(resources, null, 2) }] };
    }
  );

  return server;
};

app.http("mcp", {
  methods: ["POST"],
  authLevel: "anonymous",  // Use Azure AD authentication at the API Management layer instead
  route: "mcp",
  handler: async (request: HttpRequest, context: InvocationContext): Promise<HttpResponseInit> => {
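    const server = createMcpServer(context);
    // Stateless mode: every invocation builds a fresh server and transport, so no
    // session survives between requests. With a session ID generator, follow-up calls
    // would reach a transport that never saw the initialize request and be rejected.
    const transport = new StreamableHTTPServerTransport({ sessionIdGenerator: undefined });
    await server.connect(transport);

    // Convert Azure HttpRequest to standard Fetch API Request
    const req = new Request(request.url, {
      method: request.method,
      headers: Object.fromEntries(request.headers.entries()),
      body: request.method !== "GET" ? await request.text() : undefined,
    });

    // Assumption: this call shape (Fetch Request in, Fetch Response out) depends on the
    // SDK version. The Node transport's handleRequest takes
    // (IncomingMessage, ServerResponse, parsedBody), so check your SDK before using this as-is.
    const response = await transport.handleRequest(req);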

    return {
      status: response.status,
      headers: Object.fromEntries(response.headers.entries()),
      body: await response.text(),
    };
  },
});

{
  // host.json — Azure Functions host configuration
  "version": "2.0",
  "logging": {
    "applicationInsights": {
      "samplingSettings": { "isEnabled": true, "excludedTypes": "Request" }
    }
  },
  "extensions": {
    "http": {
      "routePrefix": "api",
      "maxConcurrentRequests": 100,
      "maxOutstandingRequests": 200,
      "dynamicThrottlesEnabled": true
    }
  },
  "functionTimeout": "00:05:00"  // Consumption Plan default; the maximum on that plan is 00:10:00
}

Consumption Plan vs Premium Plan for MCP

The hosting plan decision is the most impactful architectural choice for Azure Functions MCP servers:

Attribute	Consumption Plan	Premium Plan (EP1)
Cold start latency	500ms–5s (first request after idle)	<100ms (pre-warmed instances)
Scale-to-zero	Yes — idles after ~20min inactivity	No — minimum 1 warm instance always running
Max execution time	5 minutes default, 10 minutes maximum	30 minutes default, configurable to unlimited
VNet integration	No (outbound only via service endpoints)	Yes — full VNet integration
Monthly cost (low traffic)	Near-zero (pay per call)	~$150–200/month minimum
Best for	Dev, staging, low-traffic MCP endpoints	Production MCP servers with latency SLAs

For MCP servers in enterprise agent pipelines, cold starts are a significant problem. A 3-second cold start on the first tool call of an agent run can exceed the client's tool-call timeout, and in multi-step workflows the same penalty recurs whenever a request lands on a newly started instance. The Premium Plan's pre-warmed instance eliminates this at a baseline cost of ~$150–200/month, which is often justified for production agent infrastructure.

If you must use Consumption Plan in production, mitigate cold starts with keep-warm pings from your monitoring tool. AliveMCP pings every 60 seconds, which is frequent enough to keep a Consumption Plan function warm during active business hours, since Function instances are typically deallocated only after roughly 20 minutes of inactivity. Note that a keep-warm ping keeps one instance warm; it does not prevent cold starts on additional instances added during scale-out.
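If you run your own keep-warm probe instead of relying on a monitoring service, a minimal sketch could look like the following. It sends an MCP initialize request, so the probe exercises the protocol path and not just the HTTP listener. The client name and the protocol version string are illustrative assumptions; use the values your server actually supports.

```typescript
// Sketch of a keep-warm probe. Names and the protocol version are assumptions.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params: Record<string, unknown>;
}

// Build the JSON-RPC body for an MCP initialize handshake.
function buildInitializeRequest(id: number): JsonRpcRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "initialize",
    params: {
      protocolVersion: "2025-03-26", // assumed; match your server
      capabilities: {},
      clientInfo: { name: "keep-warm-probe", version: "0.0.1" }, // hypothetical client name
    },
  };
}

// POST the probe and report HTTP status plus round-trip time in milliseconds.
async function pingMcp(endpoint: string): Promise<{ ok: boolean; status: number; ms: number }> {
  const started = Date.now();
  const res = await fetch(endpoint, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Accept: "application/json, text/event-stream",
    },
    body: JSON.stringify(buildInitializeRequest(1)),
  });
  return { ok: res.ok, status: res.status, ms: Date.now() - started };
}
```

Calling pingMcp on a one-minute schedule and logging the ms value also gives a rough record of when cold starts happened, since those pings stand out as latency spikes.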

Durable Functions for long-running tool orchestration

Durable Functions extend Azure Functions with stateful orchestration: a checkpoint-based execution model that survives crashes, timeouts, and VM restarts. For MCP servers with tools that orchestrate multi-step operations (data pipelines, document processing, multi-service aggregation), Durable Functions give the start/poll async dispatch pattern a durable backend. The tool starts an orchestration and returns an instance ID, the runtime checkpoints each step, and a second tool polls for the result:

// src/functions/report-orchestrator.ts: Durable Functions orchestration (Node.js v4 model)
// ActivityHandler and OrchestrationContext are exported by durable-functions, not @azure/functions
import * as df from "durable-functions";
import { ActivityHandler, OrchestrationContext } from "durable-functions";

// Placeholder helpers: transformData, renderPdf, and uploadToBlob are not defined in this guide.
// Replace these declarations with imports of your own implementations.
declare function transformData(raw: unknown): unknown;
declare function renderPdf(data: unknown): Promise<Uint8Array>;
declare function uploadToBlob(path: string, contents: Uint8Array): Promise<void>;

// Activity functions: individual steps that can be retried independently
const fetchDataActivity: ActivityHandler = async (input: { datasetId: string }) => {
  const res = await fetch(`${process.env.DATA_API}/datasets/${input.datasetId}`);
  return await res.json();
};

const processDataActivity: ActivityHandler = async (input: { raw: unknown }) => {
  // CPU-intensive processing, run in a separate activity invocation
  return transformData(input.raw);
};

const generatePdfActivity: ActivityHandler = async (input: { data: unknown, reportId: string }) => {
  const pdf = await renderPdf(input.data);
  // Store in Azure Blob Storage
  await uploadToBlob(`reports/${input.reportId}.pdf`, pdf);
  return `reports/${input.reportId}.pdf`;
};

// Register activity functions through the durable-functions app namespace
df.app.activity("fetchData", { handler: fetchDataActivity });
df.app.activity("processData", { handler: processDataActivity });
df.app.activity("generatePdf", { handler: generatePdfActivity });

// Orchestrator function: coordinates the workflow
df.app.orchestration("reportOrchestrator", function*(context: OrchestrationContext) {
  const { datasetId, reportId } = context.df.getInput<{ datasetId: string; reportId: string }>();

  const raw = yield context.df.callActivity("fetchData", { datasetId });
  const processed = yield context.df.callActivity("processData", { raw });
  const pdfPath = yield context.df.callActivity("generatePdf", { data: processed, reportId });

  return { status: "complete", pdfPath };
});

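// MCP tools that start the orchestration and poll it by instance ID.
// Register these inside createMcpServer in mcp.ts so `context` is in scope.
// df.getClient(context) only works if the HTTP trigger declares
// extraInputs: [df.input.durableClient()] in its app.http options.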
server.tool(
  "start_report",
  "Start a long-running report generation",
  { datasetId: z.string() },
  async ({ datasetId }) => {
    const client = df.getClient(context);  // context from Azure Function invocation
    const reportId = crypto.randomUUID();
    const instanceId = await client.startNew("reportOrchestrator", { input: { datasetId, reportId } });
    return { content: [{ type: "text", text: `Report started. Instance: ${instanceId}. Call check_report to poll status.` }] };
  }
);

server.tool(
  "check_report",
  "Check the status of a running report",
  { instanceId: z.string() },
  async ({ instanceId }) => {
    const client = df.getClient(context);
    const status = await client.getStatus(instanceId);
    return {
      content: [{
        type: "text",
        text: JSON.stringify({ runtimeStatus: status.runtimeStatus, output: status.output }),
      }]
    };
  }
);

The key advantage of Durable Functions over a simple async dispatch: if the Azure Function host crashes mid-orchestration (VM restart, deployment), the orchestration resumes from the last checkpoint when the host comes back up. Your MCP client can poll check_report and eventually get a result even if the underlying compute restarted mid-processing.
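The orchestration above runs its steps sequentially. The fan-out/fan-in pattern mentioned in the TL;DR can be sketched as follows. To keep the sketch runnable without the durable-functions package, it is typed against a small structural stand-in (OrchestrationContextLike, a hypothetical name) that mirrors only the calls used here: getInput, callActivity, and Task.all. The datasetIds input shape is also an assumption.

```typescript
// Minimal structural stand-ins for the parts of the Durable Functions context used below.
interface TaskLike {
  kind: "activity" | "all";
  name?: string;
  input?: unknown;
  tasks?: TaskLike[];
}
interface OrchestrationContextLike {
  df: {
    getInput<T>(): T;
    callActivity(name: string, input?: unknown): TaskLike;
    Task: { all(tasks: TaskLike[]): TaskLike };
  };
}

// Fan-out/fan-in: fetch every dataset in parallel, then aggregate once.
function* batchReportOrchestrator(
  context: OrchestrationContextLike
): Generator<TaskLike, { status: string; count: number; merged: unknown }, any> {
  const { datasetIds } = context.df.getInput<{ datasetIds: string[] }>();

  // Fan out: schedule one fetchData activity per dataset without yielding yet.
  const tasks = datasetIds.map((datasetId) =>
    context.df.callActivity("fetchData", { datasetId })
  );

  // Fan in: a single yield resumes only after every scheduled activity completes.
  const rawResults: unknown[] = yield context.df.Task.all(tasks);

  // Aggregate the combined results in one follow-up activity.
  const merged: unknown = yield context.df.callActivity("processData", { raw: rawResults });

  return { status: "complete", count: rawResults.length, merged };
}
```

In a real Function App the same generator body would be registered with df.app.orchestration and receive the library's own context type. Because orchestrators are plain generators, they can also be unit tested by stepping them with a fake context.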

Azure Key Vault for MCP server secrets

Azure Key Vault references in Function App settings let you store secrets in Key Vault while accessing them as standard environment variables in your function — no Key Vault SDK calls in your tool code:

# Azure CLI: configure Key Vault reference in Function App settings
# 1. Create a Key Vault secret
az keyvault secret set \
  --vault-name my-keyvault \
  --name "database-connection-string" \
  --value "postgresql://user:password@host:5432/dbname"

# 2. Enable System Assigned Managed Identity on the Function App
az functionapp identity assign \
  --name my-mcp-function-app \
  --resource-group my-rg

# 3. Grant the Function App read access to Key Vault
az keyvault set-policy \
  --name my-keyvault \
  --object-id $(az functionapp identity show --name my-mcp-function-app --resource-group my-rg --query principalId -o tsv) \
  --secret-permissions get

# 4. Set the app setting to a Key Vault reference
az functionapp config appsettings set \
  --name my-mcp-function-app \
  --resource-group my-rg \
  --settings "DATABASE_URL=@Microsoft.KeyVault(VaultName=my-keyvault;SecretName=database-connection-string)"

// In your tool handler: read process.env as usual; the Key Vault reference is transparent
import { Pool } from "pg";

// Create the pool once per instance so warm invocations reuse connections.
// process.env.DATABASE_URL is resolved from Key Vault by the platform before your code runs.
const pool = new Pool({ connectionString: process.env.DATABASE_URL });

server.tool("query_db", "Query the database", { sql: z.string() }, async ({ sql }) => {
  // Caution: this executes agent-supplied SQL verbatim. Use a read-only role or an allow-list in practice.
  const result = await pool.query(sql);
  return { content: [{ type: "text", text: JSON.stringify(result.rows) }] };
});

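Key Vault references are resolved by the platform when app settings are loaded, not per invocation. If the reference is malformed or the Managed Identity doesn't have access, the setting is not resolved, and your code typically receives the literal @Microsoft.KeyVault(...) string instead of the secret. Depending on how the app uses the setting, this surfaces either as a startup failure or as tool calls failing with connection errors. AliveMCP's external probe catches the startup case immediately: a failed initialize response after a deployment that changed Key Vault access is a fast signal that permissions need review.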
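One way to surface a broken reference early is to check required settings when the module loads. The sketch below assumes that an unresolved reference reaches your code as the literal reference string, which is worth verifying against your own Function App. The function name and the list of required settings are illustrative.

```typescript
// Prefix of an App Service / Functions Key Vault reference.
const KEY_VAULT_REF_PREFIX = "@Microsoft.KeyVault(";

// Return the names of required settings that are missing, empty,
// or still hold an unresolved Key Vault reference string.
function findUnresolvedSettings(
  env: Record<string, string | undefined>,
  requiredNames: string[]
): string[] {
  return requiredNames.filter((name) => {
    const value = env[name];
    return value === undefined || value === "" || value.startsWith(KEY_VAULT_REF_PREFIX);
  });
}

// Usage at module load (hypothetical setting names):
//   const broken = findUnresolvedSettings(process.env, ["DATABASE_URL", "COSMOS_KEY"]);
//   if (broken.length > 0) throw new Error(`Unresolved settings: ${broken.join(", ")}`);
```

Throwing at load time turns a silent misconfiguration into a failed initialize response, which an external probe can detect.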

Application Insights and AliveMCP: complementary monitoring

Application Insights (Azure Monitor) provides deep internal telemetry — function execution duration, dependency calls, exception traces, and custom metrics. AliveMCP provides external protocol-level monitoring — what a real agent client experiences when calling your MCP endpoint. The two are complementary:

What to monitor	Tool	Why
Function execution time, dependency latency	Application Insights	Internal timing — invisible to external probes
Exception rate, error distribution by tool	Application Insights	Stack traces, root cause investigation
MCP protocol health (initialize + tools/list)	AliveMCP	End-to-end protocol correctness from outside Azure
Cold start frequency and latency	AliveMCP (p99 latency) + App Insights (cold-start tag)	App Insights labels cold starts; AliveMCP shows user-visible latency
VNet or firewall failures	AliveMCP	Internal probes can't detect failures external clients hit
TLS certificate expiry	AliveMCP	App Insights doesn't monitor the TLS layer independently

// Add custom Application Insights telemetry from tool handlers.
// Assumes appInsights.setup(...).start() ran at startup; defaultClient is undefined before that.
// searchIndex is a placeholder for your own search helper.
import * as appInsights from "applicationinsights";

server.tool("search_index", "Search the Azure Cognitive Search index", { query: z.string() }, async ({ query }) => {
  const start = Date.now();
  const results = await searchIndex(query, process.env.SEARCH_KEY!);
  const duration = Date.now() - start;

  // Track custom metric, visible in Application Insights Metrics
  appInsights.defaultClient?.trackMetric({ name: "mcp_tool_search_duration_ms", value: duration });
  // Logging the raw query can capture sensitive input; drop it if that is a concern
  appInsights.defaultClient?.trackEvent({ name: "mcp_tool_call", properties: { tool: "search_index", query } });

  return { content: [{ type: "text", text: JSON.stringify(results) }] };
});
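Repeating the timing code in every tool gets noisy. A small wrapper can record duration for any handler, as in this sketch. MetricTracker is a hypothetical structural type that matches the trackMetric call used above, so the Application Insights client can be passed in without this snippet importing the package. The metric naming scheme is also an assumption.

```typescript
// Structural stand-in for a telemetry client that exposes trackMetric.
interface MetricTracker {
  trackMetric(metric: { name: string; value: number }): void;
}

// Wrap a tool handler so its duration is recorded whether it succeeds or throws.
function withDuration<A, R>(
  toolName: string,
  tracker: MetricTracker | undefined,
  handler: (args: A) => Promise<R>
): (args: A) => Promise<R> {
  return async (args: A) => {
    const start = Date.now();
    try {
      return await handler(args);
    } finally {
      // Optional chaining keeps the tool working when telemetry is not configured.
      tracker?.trackMetric({
        name: `mcp_tool_${toolName}_duration_ms`,
        value: Date.now() - start,
      });
    }
  };
}
```

A handler would then be registered as withDuration("search_index", client, async ({ query }) => ...), where client is whatever telemetry client you have initialized.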

Frequently asked questions

Which Azure Functions hosting plan should I use for a production MCP server?

For production MCP servers used in automated agent pipelines with latency expectations, use the Premium Plan (EP1). The always-warm instance eliminates cold starts that cause tool call timeouts in agent frameworks. For development, staging, or low-traffic endpoints where cost matters more than cold-start latency, Consumption Plan is appropriate. The cost difference: Consumption Plan is near-zero at low traffic, while Premium Plan EP1 runs ~$150/month minimum. That's a cost-justified trade-off if cold start failures cause agent run failures that require developer investigation — the developer-hours cost of debugging cold-start-caused failures typically exceeds the EP1 monthly fee quickly.

How do I handle Azure AD authentication for MCP endpoints?

Azure API Management (APIM) is the recommended approach: deploy your Azure Function as an HTTP trigger with authLevel: "anonymous" on the network-internal endpoint, then put APIM in front with an Azure AD JWT policy. APIM validates the Azure AD token from the MCP client before forwarding the request to the Function. This separates authentication (APIM's concern) from your MCP server code. The alternative — using Azure Function's built-in Easy Auth — works but is harder to configure for machine-to-machine (client credentials) flows needed by automated agents. For enterprise deployments, APIM also adds rate limiting, request logging, and API versioning without Function code changes.
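On the client side, an automated agent would obtain a token with the OAuth 2.0 client credentials grant before calling the APIM-fronted endpoint. The following is a minimal sketch of that request against the Microsoft identity platform v2.0 token endpoint. The config field names are illustrative, and the scope value (usually of the form api://<app-id>/.default) depends on how your API is registered.

```typescript
// Illustrative config shape; field names are assumptions.
interface ClientCredentialsConfig {
  tenantId: string;
  clientId: string;
  clientSecret: string;
  scope: string;
}

// Build the token request for the client credentials grant.
function buildTokenRequest(cfg: ClientCredentialsConfig): { url: string; body: URLSearchParams } {
  const url = `https://login.microsoftonline.com/${encodeURIComponent(cfg.tenantId)}/oauth2/v2.0/token`;
  const body = new URLSearchParams({
    grant_type: "client_credentials",
    client_id: cfg.clientId,
    client_secret: cfg.clientSecret,
    scope: cfg.scope,
  });
  return { url, body };
}

// Request a token and return the bearer string for the Authorization header.
async function fetchAccessToken(cfg: ClientCredentialsConfig): Promise<string> {
  const { url, body } = buildTokenRequest(cfg);
  // A URLSearchParams body is sent as application/x-www-form-urlencoded.
  const res = await fetch(url, { method: "POST", body });
  if (!res.ok) throw new Error(`Token request failed with status ${res.status}`);
  const json = (await res.json()) as { access_token: string };
  return json.access_token;
}
```

The MCP client then sends Authorization: Bearer <token> on each request, and the APIM JWT policy validates it before the Function is invoked.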

Can Azure Functions handle SSE connections for the MCP SSE transport?

Not reliably. Azure Functions HTTP triggers are designed for invocations that run to completion and return a response, and an HTTP-triggered function has to respond within roughly 230 seconds because of the platform load balancer's idle timeout, so a long-lived SSE connection will be cut off even where response streaming is available. Use StreamableHTTPServerTransport (request/response semantics) instead of SSEServerTransport. If you need SSE semantics for MCP, deploy on Azure App Service (a full Node.js server) or Azure Container Apps instead, since these platforms support long-lived connections. Azure Container Apps also integrates with the Azure ecosystem (Key Vault, Managed Identity, VNet) and supports the SSE transport natively.

How do I structure tool code to avoid Azure Functions timeout limits?

Consumption Plan defaults to a 5-minute timeout and can be raised to at most 10 minutes with functionTimeout in host.json, which is usually sufficient for API-calling MCP tools. Independently of functionTimeout, an HTTP-triggered function has to return its response within roughly 230 seconds, so that is the practical ceiling for a synchronous tool call. For tools that may exceed this: (1) use Durable Functions for multi-step orchestration that can checkpoint and resume; (2) for data processing tools, use Azure Service Bus to dispatch work to a Queue Trigger function running on a dedicated Premium Plan with unlimited execution time; (3) for tools that call Azure ML or long-running Azure services, use the start/poll pattern: return a job ID immediately and let the agent call a polling tool to check status. Never block a tool handler waiting for work that could take more than a few minutes.
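As a safety net for handlers that occasionally run long, a deadline wrapper can return a structured tool error before the platform cuts the request off. This is a sketch under assumptions: ToolResult is a simplified stand-in for the SDK's tool result shape, and the wrapper does not cancel the underlying work, it only stops waiting for it.

```typescript
// Simplified stand-in for an MCP tool result.
type ToolResult = {
  content: { type: "text"; text: string }[];
  isError?: boolean;
};

// Race the handler against a deadline and return an error result if the deadline wins.
function withDeadline<A>(
  handler: (args: A) => Promise<ToolResult>,
  deadlineMs: number
): (args: A) => Promise<ToolResult> {
  return (args: A) => {
    let timer: ReturnType<typeof setTimeout> | undefined;
    const timeout = new Promise<ToolResult>((resolve) => {
      timer = setTimeout(
        () =>
          resolve({
            content: [{ type: "text", text: `Tool exceeded ${deadlineMs} ms; retry or use the start/poll tools.` }],
            isError: true,
          }),
        deadlineMs
      );
    });
    // Clear the timer either way so a finished handler does not leave it pending.
    return Promise.race([handler(args), timeout]).finally(() => clearTimeout(timer));
  };
}
```

A deadline comfortably below the HTTP response limit leaves the agent with a readable error it can act on, instead of a dropped connection.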

What's the best way to test Azure Functions MCP servers locally?

Use the Azure Functions Core Tools (func start) for local development — it simulates the Azure Functions runtime including trigger bindings, host.json settings, and environment variable loading from local.settings.json. The local.settings.json file stores local secrets (never commit this file). For integration tests: start the function locally with func start, then connect a real MCP client in your test suite to http://localhost:7071/api/mcp. This tests the full protocol path, not just individual tool handlers. Use azurite (Azure Storage emulator) for tools that depend on Azure Storage without connecting to real Azure resources during testing.
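If an integration test posts raw JSON-RPC to http://localhost:7071/api/mcp instead of going through an MCP client library, it has to handle both response encodings the Streamable HTTP transport may use: plain JSON or an SSE-framed body. The helper below is a minimal sketch for that case. The function name is illustrative, and it ignores SSE fields other than data.

```typescript
// Parse an MCP HTTP response body into a list of JSON-RPC messages.
// Handles application/json bodies and text/event-stream bodies with data: lines.
function parseMcpResponseBody(contentType: string, body: string): unknown[] {
  if (contentType.includes("text/event-stream")) {
    return body
      .split(/\r?\n\r?\n/) // SSE events are separated by a blank line
      .map((event) =>
        event
          .split(/\r?\n/)
          .filter((line) => line.startsWith("data:"))
          .map((line) => line.slice("data:".length).trimStart())
          .join("\n")
      )
      .filter((data) => data.length > 0)
      .map((data) => JSON.parse(data));
  }
  // Plain JSON may be a single message or a batch array.
  const parsed: unknown = JSON.parse(body);
  return Array.isArray(parsed) ? parsed : [parsed];
}
```

In a test, pass the response's Content-Type header and the text of the body, then assert on the returned messages, for example that a tools/list result contains the tool names you registered.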