Framework guide · 2026-06-15 · MCP + Agentic Frameworks

MCP Servers in Python Agentic Frameworks

MCP is intentionally framework-agnostic. The same three-step sequence (initialize, tools/list, tools/call) runs over the same JSON-RPC protocol regardless of which Python framework sits above it. A tool registered on your MCP server has no idea whether it is being called by a LangChain ReAct agent, a LangGraph StateGraph, a CrewAI crew, an AutoGen conversation, or a Pydantic AI agent. The protocol is the protocol. What differs across frameworks is everything above it: how the framework discovers and wraps tools, how it manages the connection lifecycle, how it propagates MCP errors into its own error model, and, critically, what happens when the MCP server goes down mid-workflow. This post synthesizes five deep-dives (LangChain, LangGraph, CrewAI, AutoGen, and Pydantic AI) into a unified picture of what differs and why all five share the same monitoring blind spot.

Five frameworks at a glance

Each framework approaches MCP at a different level of abstraction. The table below captures the integration entry point, the key lifecycle decision each framework forces, how MCP errors propagate into framework-native error handling, and the silent failure mode that makes external monitoring necessary in each case.

Framework	Integration entry point	Lifecycle decision	Error propagation	Silent failure mode
LangChain	`langchain-mcp-adapters` / `MultiServerMCPClient`	Open once at FastAPI lifespan; reuse via `app.state`	`ToolException` injected as observation into ReAct loop	Reconnect-per-request adds 100–500ms and hides server instability behind retry noise
LangGraph	`MultiServerMCPClient` → `ToolNode`	Checkpoints persist state, not MCP connections — reconnect on process resume	Conditional edge routes from tool node to dedicated `error_handler` node	Dead MCP server wastes all prior graph compute — expensive multi-step workflows fail at the last tool call
CrewAI	`MCPServerAdapter` (v0.105+)	Role-based assignment — give each agent only its domain tools	`ToolException` + LLM self-correction; `max_retry_limit=2` caps loops	Nightly cron crew fails silently — no human watching, LLM budget wasted
AutoGen	`register_function` with `caller`/`executor`	Module-level `httpx.AsyncClient`; lazy-initialize, reuse for conversation lifetime	Return error strings — do not raise; uncaught exceptions abort the turn	30-turn conversation reaches turn 28 before a dead server surfaces; all prior turns are wasted
Pydantic AI	Native `MCPServerSSE` / `MCPServerStdio` in `Agent()`	`agent.run_mcp_servers()` context manager for persistent service connections	`ValidationError` on malformed arguments; auto-retry with the error text fed back to the LLM	Schema errors surface immediately; network timeouts on dead servers surface as opaque 30-second hangs

The shared protocol layer

Before diving into the per-framework differences, it helps to be precise about what they share. Every framework listed above uses the same MCP protocol sequence to call a tool. No matter what abstraction sits above it, the wire protocol looks the same:

// 1. Client connects and sends initialize
{ "jsonrpc": "2.0", "id": 1, "method": "initialize",
  "params": { "protocolVersion": "2024-11-05", "capabilities": {}, "clientInfo": {...} } }

// 2. Server responds with its capabilities
{ "jsonrpc": "2.0", "id": 1, "result": { "serverInfo": {...}, "capabilities": {...} } }

// 3. Client lists available tools
{ "jsonrpc": "2.0", "id": 2, "method": "tools/list" }

// 4. Client calls a tool
{ "jsonrpc": "2.0", "id": 3, "method": "tools/call",
  "params": { "name": "search_documents", "arguments": { "query": "MCP monitoring" } } }

// 5. Server returns the result
{ "jsonrpc": "2.0", "id": 3, "result": { "content": [{ "type": "text", "text": "..." }], "isError": false } }

The adapter libraries and framework integrations hide this entirely. A LangChain developer writes await client.get_tools() and gets a list of BaseTool objects. A Pydantic AI developer adds MCPServerSSE(url='...') to the Agent constructor. Neither writes a line of JSON-RPC. But both are making the same protocol calls under the hood — and both are subject to the same failure modes when the server becomes unavailable.
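
For readers who want to see the sequence without an adapter in the way, the client-side messages can be built by hand. This is a minimal sketch using only the standard library; the helper names and the `sketch-client` identity are illustrative assumptions, not part of any SDK. It also includes the `notifications/initialized` notification that the MCP specification has the client send once the initialize response arrives, a step the listing above leaves out.

```python
import json
from typing import Optional


def request(req_id: int, method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request (it carries an id, so a response is expected)."""
    msg = {"jsonrpc": "2.0", "id": req_id, "method": method}
    if params is not None:
        msg["params"] = params
    return msg


def notification(method: str) -> dict:
    """Build a JSON-RPC 2.0 notification (no id, so no response is expected)."""
    return {"jsonrpc": "2.0", "method": method}


def handshake_and_call(tool: str, arguments: dict) -> list:
    """Messages a client sends, in order, to call one tool on a fresh connection."""
    return [
        request(1, "initialize", {
            "protocolVersion": "2024-11-05",
            "capabilities": {},
            # Hypothetical client identity, for illustration only
            "clientInfo": {"name": "sketch-client", "version": "0.0.0"},
        }),
        notification("notifications/initialized"),
        request(2, "tools/list"),
        request(3, "tools/call", {"name": tool, "arguments": arguments}),
    ]


if __name__ == "__main__":
    for message in handshake_and_call("search_documents", {"query": "MCP monitoring"}):
        print(json.dumps(message))
```

Every framework below ends up emitting some equivalent of these four messages; the differences are in who builds them and when.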

This shared foundation means that a skill learned in one framework transfers directly to another: tool description quality matters the same way (the LLM uses the description to decide when and how to call the tool, regardless of framework); flat inputSchema designs are easier for any LLM to fill correctly; connection pooling is important in all five (the initialize handshake adds 100–500ms on every reconnect); and MCP server uptime is relevant to all five regardless of how their orchestration layers differ.

LangChain — the adapter layer and persistent connections

LangChain's MCP integration goes through langchain-mcp-adapters, a library maintained by the LangChain team. Its main abstraction is MultiServerMCPClient, which handles transport negotiation (stdio or HTTP/SSE), the MCP handshake sequence, and tool translation. Calling await client.get_tools() returns a list of LangChain BaseTool objects where each tool's schema comes from the MCP server's inputSchema and each tool's description comes from the MCP server's tool description field.

The most consequential production decision in a LangChain MCP integration is where in the application lifecycle you open the MultiServerMCPClient connection. The wrong answer — opening it per request inside a FastAPI endpoint handler — adds 100–500ms to every agent invocation (the initialize handshake roundtrip) and hides connection-level instability behind retry noise: if the MCP server is degraded but responding slowly, per-request reconnects cause cascading latency without ever triggering a clear error. The right answer is to open the client once at application lifespan startup:

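from contextlib import asynccontextmanager
from fastapi import FastAPI
from langchain_mcp_adapters.client import MultiServerMCPClient
from langchain_mcp_adapters.tools import load_mcp_tools

@asynccontextmanager
async def lifespan(app: FastAPI):
    client = MultiServerMCPClient({
        "search": { "url": "https://search.internal/mcp", "transport": "streamable_http" },
        "db": { "command": "python", "args": ["-m", "db_mcp"], "transport": "stdio" },
    })
    # langchain-mcp-adapters >= 0.1.0: the client itself is not a context manager.
    # client.session(name) keeps one initialized session open per server. On older
    # releases, `async with MultiServerMCPClient(...) as client` followed by
    # `await client.get_tools()` plays the same role.
    async with client.session("search") as search, client.session("db") as db:
        app.state.mcp_tools = (await load_mcp_tools(search)) + (await load_mcp_tools(db))
        yield  # server runs here; both sessions stay open

app = FastAPI(lifespan=lifespan)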

Error propagation in LangChain works through ToolException. When an MCP tool call returns isError: true or the MCP connection itself fails, the adapter raises a ToolException, which the AgentExecutor or create_react_agent injects back into the agent's observation step. This means MCP errors become observations in the ReAct loop — the LLM sees the error text and can decide to retry with different parameters, try an alternative tool, or report failure. The practical implication is that a flapping MCP server (intermittently returning errors) produces a lot of expensive LLM retry traffic before any human notices the failure.

Tool description quality has an outsized effect on LangChain agents because the LLM is responsible for all routing. A poorly worded description causes the LLM to call the wrong tool repeatedly. The description should specify what the tool does, what its parameters mean, and — critically — what it does not do, so the LLM doesn't try to use a search tool as a write tool. This quality pressure applies to the MCP server's tool definitions, not to LangChain config.

LangGraph — StateGraph, ToolNode, and the checkpoint gap

LangGraph's MCP integration uses the same MultiServerMCPClient and get_tools() call as LangChain, but LangGraph's model of execution introduces a production consideration that LangChain doesn't have: checkpoints.

LangGraph's StateGraph persists conversation state to a checkpointer (SQLite, Redis, or Postgres) at each node boundary. This means a graph that pauses — waiting for human review, waiting for an async event, or because the process restarted — can resume from exactly the point it left off. The state that is persisted is the message history and any state fields you define. What is not persisted is the MCP client connection: connections are open file descriptors and sockets, not serializable data. When a graph resumes from a checkpoint, the MCP connections must be re-established:

from langchain_mcp_adapters.client import MultiServerMCPClient
from langchain_mcp_adapters.tools import load_mcp_tools
from langgraph.prebuilt import create_react_agent
from langgraph.checkpoint.sqlite.aio import AsyncSqliteSaver

async def run_or_resume(thread_id: str, user_message: str):
    # MCP connections are always re-established; they don't survive checkpoints
    client = MultiServerMCPClient({
        "tools": { "url": "https://tools.internal/mcp", "transport": "streamable_http" },
    })
    # client.session() is the langchain-mcp-adapters >= 0.1.0 form; older releases
    # entered `async with MultiServerMCPClient(...)` directly.
    async with client.session("tools") as session:
        tools = await load_mcp_tools(session)
        # from_conn_string returns an async context manager that owns the DB connection
        async with AsyncSqliteSaver.from_conn_string("checkpoints.db") as checkpointer:
            # `model` is assumed to be a chat model instance defined elsewhere
            agent = create_react_agent(model, tools, checkpointer=checkpointer)

            # If thread_id has a checkpoint, LangGraph resumes from it;
            # MCP tools are freshly connected for this execution window
            return await agent.ainvoke(
                { "messages": [{ "role": "user", "content": user_message }] },
                config={ "configurable": { "thread_id": thread_id } },
            )
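
Why connections cannot ride along in a checkpoint can be shown with the standard library alone. In this sketch, `pickle` stands in for whatever serializer a checkpointer actually uses (an assumption made for illustration), and the `state` dictionary is a hypothetical example of graph state.

```python
import pickle
import socket

# Checkpointable state: plain data, analogous to message history and custom fields.
state = {"messages": [{"role": "user", "content": "hello"}], "tool_error_count": 0}
restored = pickle.loads(pickle.dumps(state))


def try_to_persist(obj) -> str:
    """Return 'ok' if obj survives serialization, else the reason it does not."""
    try:
        pickle.dumps(obj)
        return "ok"
    except TypeError as exc:
        return f"not serializable: {exc}"


# A live connection wraps an OS-level socket, which has no serialized form.
conn = socket.socket()
try:
    connection_outcome = try_to_persist(conn)
finally:
    conn.close()

if __name__ == "__main__":
    print("state:", try_to_persist(state))
    print("connection:", connection_outcome)
```

The data round-trips unchanged, while the socket is rejected outright, which is why the resume path has to open a new MCP session.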

LangGraph's ToolNode supports parallel tool calls natively: when the LLM requests multiple tools in a single response, ToolNode dispatches them concurrently via asyncio.gather. The latency of a parallel batch is the maximum of the individual latencies, not the sum. For MCP tools that each call different external services, this can cut multi-tool steps from sequential seconds to the latency of the slowest single call.
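
The max-versus-sum claim is easy to check in isolation. The sketch below does not use LangGraph at all: `fake_tool_call` is a hypothetical stand-in that sleeps instead of making an MCP round trip, and the latencies are made-up numbers.

```python
import asyncio
import time


async def fake_tool_call(name: str, latency_s: float) -> str:
    """Stand-in for one MCP tools/call round trip (no network involved)."""
    await asyncio.sleep(latency_s)
    return f"{name}: done"


async def run_batch(latencies: dict) -> tuple:
    """Dispatch all calls concurrently; return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = await asyncio.gather(
        *(fake_tool_call(name, seconds) for name, seconds in latencies.items())
    )
    return results, time.perf_counter() - start


if __name__ == "__main__":
    latencies = {"search": 0.05, "db": 0.10, "docs": 0.15}
    results, elapsed = asyncio.run(run_batch(latencies))
    print(results)
    print(f"elapsed={elapsed:.2f}s, sequential would be {sum(latencies.values()):.2f}s")
```

The elapsed time tracks the slowest call (about 0.15 s here) rather than the 0.30 s total. The flip side is that one hung MCP server sets the latency of the whole batch.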

Error handling in LangGraph is expressed as graph topology: instead of catching exceptions in tool handlers, you define a conditional edge from the tool node that routes to a dedicated error_handler node when tool_error_count in the graph state exceeds a threshold. This makes error handling explicit, testable, and inspectable — you can see in the graph visualization exactly what the error recovery path is. The monitoring implication is that an MCP server going down mid-graph causes expensive retries (each retry invokes the LLM to generate a new tool call) before hitting the error threshold and routing to the handler node.
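
The routing decision itself is a plain function over graph state, so it can be unit-tested without building a graph. This is a sketch under assumptions: the threshold value, the `agent` node name, and the `record_tool_result` helper are hypothetical, and state is modeled as a dict. In a real graph the router would be attached with something like `graph.add_conditional_edges("tools", route_after_tools)`.

```python
MAX_TOOL_ERRORS = 2  # hypothetical threshold: the third consecutive failure trips it


def record_tool_result(state: dict, is_error: bool) -> dict:
    """Return a new state with the consecutive-error counter updated."""
    previous = state.get("tool_error_count", 0)
    count = (previous + 1) if is_error else 0  # any success resets the streak
    return {**state, "tool_error_count": count}


def route_after_tools(state: dict) -> str:
    """Routing function for the conditional edge leaving the tool node."""
    if state.get("tool_error_count", 0) > MAX_TOOL_ERRORS:
        return "error_handler"
    return "agent"  # normal path: hand the tool output back to the LLM node


if __name__ == "__main__":
    state = {"tool_error_count": 0}
    for attempt in range(1, 4):
        state = record_tool_result(state, is_error=True)
        print(attempt, route_after_tools(state))
```

Note what the loop implies: every pass through `agent` before the threshold trips is one more LLM invocation spent against a server that is already down.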

CrewAI — role-based tools and the batch scheduling gap

CrewAI's MCP integration uses MCPServerAdapter from crewai-tools (available from v0.105+). In the crewai-tools releases this example assumes, the adapter is a regular synchronous context manager: the with block yields that server's tools as a list of CrewAI tool objects, which are passed directly to an Agent constructor, and the connection is stopped when the block exits. (When the adapter is managed by hand instead, the same list is available through its .tools property.)

from crewai import Agent, Task, Crew, Process
from crewai_tools import MCPServerAdapter

def run_research_crew():
    # Each `with` yields the tool list for one server and stops the connection on exit
    with MCPServerAdapter({ "url": "https://search.internal/mcp", "transport": "sse" }) as search_tools:
        with MCPServerAdapter({ "url": "https://db.internal/mcp", "transport": "sse" }) as db_tools:

            researcher = Agent(
                role="Research Specialist",
                goal="Find authoritative information on the topic",
                backstory="Expert at web research and synthesis",
                tools=search_tools,  # Only search tools, not database tools
                max_iter=5,
                max_retry_limit=2,
            )
            analyst = Agent(
                role="Data Analyst",
                goal="Query structured data and surface trends",
                backstory="Expert at SQL queries and data interpretation",
                tools=db_tools,  # Only database tools, not search tools
                max_iter=5,
                max_retry_limit=2,
            )

Role-based tool assignment — each agent receives only the tools relevant to its function — is the most important structural decision in a CrewAI MCP integration. Giving every agent every tool causes the LLM to spend tokens reasoning about irrelevant tools on every step. More importantly, it makes task assignment ambiguous: if a researcher agent can call database tools, a manager agent in a hierarchical process may delegate a database task to the researcher. Matching tools to roles via MCPServerAdapter keeps delegation unambiguous.

The max_iter=5 and max_retry_limit=2 parameters are safety valves, not defaults to ignore. Without max_retry_limit, a CrewAI agent hitting a consistently failing MCP tool will loop until the LLM budget runs out. With it, the third failure raises a TaskError that propagates out of the crew kickoff — at least surfacing as an exception rather than a silent budget drain.

CrewAI is particularly common in batch pipeline use cases: scheduled nightly jobs that compile reports, weekly summaries, or data enrichment passes. This is where the monitoring gap matters most. A crew that runs interactively gives the developer immediate feedback when something goes wrong. A nightly batch job that fails because an MCP server went down at 2 AM produces a missing report that somebody notices at 9 AM, after the window for the report has already passed. A pre-flight health check before kickoff guards against the obvious cases:

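import httpx

async def preflight_check(mcp_url: str) -> bool:
    """Return True if the server answers an MCP initialize request with HTTP 200."""
    try:
        async with httpx.AsyncClient(timeout=5.0) as client:
            # MCP initialize handshake via HTTP. This assumes a Streamable HTTP
            # endpoint, which expects the client to accept both JSON and SSE replies;
            # a legacy SSE-only endpoint would need a different (GET-based) check.
            resp = await client.post(
                mcp_url,
                headers={ "Accept": "application/json, text/event-stream" },
                json={
                    "jsonrpc": "2.0", "id": 1, "method": "initialize",
                    "params": { "protocolVersion": "2024-11-05", "capabilities": {}, "clientInfo": { "name": "preflight", "version": "1.0" } }
                },
            )
            return resp.status_code == 200
    except Exception:
        # Any failure (DNS, refused connection, timeout) counts as "not ready"
        return False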

async def maybe_kickoff_crew():
    if not await preflight_check("https://search.internal/mcp"):
        raise RuntimeError("MCP server unavailable — aborting crew to avoid LLM budget waste")
    # run crew...

The pre-flight check catches obvious infrastructure failures. AliveMCP monitoring catches the subtler cases — a server that initializes successfully but times out on actual tool calls, or one that returns correct responses for the first five calls and starts erroring on the sixth.

AutoGen — function registration and the error-string rule

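AutoGen's MCP integration, as shown here, is the most manual of the five. The example uses the classic register_function API (AutoGen 0.2 style, also retained in AG2), whose tool system is deliberately framework-agnostic: any Python function with type annotations can become a tool, and MCP needs no dedicated adapter. Instead, you write an async wrapper function that calls your MCP server (via the MCP SDK client or a direct httpx call), and register it using register_function with explicit caller and executor assignments. AutoGen v0.4's AgentChat API takes a different route and ships MCP helpers in autogen_ext.tools.mcp; the wrapper pattern below applies to the classic API: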

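import autogen
from contextlib import AsyncExitStack
from mcp import ClientSession
from mcp.client.sse import sse_client

# Module-level session: one connection for the entire AutoGen conversation
_mcp_session: ClientSession | None = None
_mcp_stack = AsyncExitStack()  # owns the transport and session so both can be closed

async def get_mcp_session() -> ClientSession:
    global _mcp_session
    if _mcp_session is None:
        # enter_async_context registers each context manager for later cleanup,
        # rather than calling __aenter__ by hand and never exiting
        read, write = await _mcp_stack.enter_async_context(sse_client("https://search.internal/mcp"))
        session = await _mcp_stack.enter_async_context(ClientSession(read, write))
        await session.initialize()
        _mcp_session = session  # publish only after initialize succeeds
    return _mcp_session

async def close_mcp_session() -> None:
    """Call once when the conversation ends to release the connection."""
    global _mcp_session
    await _mcp_stack.aclose()
    _mcp_session = None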

async def search_documents(query: str, max_results: int = 10) -> str:
    """Search for documents matching the query. Returns a JSON list of results.
    Use for: finding relevant articles, documentation, or past research.
    Do not use for: real-time data, database queries, or file operations."""
    try:
        session = await get_mcp_session()
        result = await session.call_tool("search_documents", { "query": query, "max_results": max_results })
        if result.isError:
            return f"Search failed: {result.content[0].text}"
        return result.content[0].text
    except Exception as e:
        return f"Search unavailable: {e}"  # return string, not raise — see below

assistant = autogen.AssistantAgent("assistant", llm_config={ "model": "claude-sonnet-4-6", ... })
proxy = autogen.UserProxyAgent("proxy", human_input_mode="NEVER", max_consecutive_auto_reply=10)

autogen.register_function(
    search_documents,
    caller=assistant,
    executor=proxy,
    name="search_documents",
    description="Search for documents matching the query. Returns a JSON list of results.",
)

The most important AutoGen-specific rule for MCP integration is the error-string rule: return error information as a string, never raise an exception from a registered tool function. In AutoGen's conversation model, an uncaught exception from a tool call aborts the current conversation turn — the exception propagates up through the UserProxyAgent's executor and can break the conversation state. A returned error string, by contrast, gets injected back into the conversation as the tool's output: the AssistantAgent sees the error text and can reason about it, retry with different parameters, or report the problem to the user. This gives the LLM the opportunity to recover; an exception gives it nothing.
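
When several tools need the same treatment, the rule can be applied once as a decorator instead of repeating the try/except in every wrapper. This is a sketch, not an AutoGen feature: `errors_as_strings` is a hypothetical helper, and the tool body below simulates a dead server rather than calling one.

```python
import asyncio
import functools


def errors_as_strings(prefix: str):
    """Decorator: turn any exception from an async tool into a returned string."""
    def decorate(fn):
        @functools.wraps(fn)  # keeps the name, docstring and annotations of fn
        async def wrapper(*args, **kwargs) -> str:
            try:
                return await fn(*args, **kwargs)
            except Exception as exc:  # deliberately broad: nothing may escape
                return f"{prefix}: {type(exc).__name__}: {exc}"
        return wrapper
    return decorate


@errors_as_strings("Search unavailable")
async def search_documents(query: str) -> str:
    """Hypothetical tool body that simulates an unreachable MCP server."""
    raise ConnectionError("connection refused")


if __name__ == "__main__":
    print(asyncio.run(search_documents("MCP monitoring")))
```

Including the exception type in the returned text gives the LLM slightly more to reason about than a bare message, at the cost of exposing internal error names in the conversation.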

The module-level connection pool decision has a dramatic effect on efficiency. An AutoGen conversation with 30 turns where a tool is called every other turn involves roughly 15 MCP tool calls. If the session is re-established per call — 15 initialize handshakes — each adds 100–500ms, totalling 1.5–7.5 seconds of pure overhead per conversation. A module-level session that initializes once and reuses across all 15 calls brings that overhead to one initialize handshake per conversation. For a production service handling hundreds of concurrent AutoGen conversations, the difference is significant.
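
The arithmetic above, written out so the inputs can be changed. The function name is hypothetical; the 100–500 ms handshake range and the 15 calls are the figures from the paragraph.

```python
def handshake_overhead_s(tool_calls: int, handshake_ms: tuple, reuse_session: bool) -> tuple:
    """Return (low, high) seconds spent on initialize handshakes in one conversation."""
    handshakes = 1 if reuse_session else tool_calls
    low_ms, high_ms = handshake_ms
    return (handshakes * low_ms / 1000, handshakes * high_ms / 1000)


if __name__ == "__main__":
    calls = 30 // 2  # a tool call every other turn of a 30-turn conversation
    print("per-call sessions:", handshake_overhead_s(calls, (100, 500), reuse_session=False))
    print("shared session:   ", handshake_overhead_s(calls, (100, 500), reuse_session=True))
```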

AutoGen's GroupChat pattern allows multiple agents, each with different MCP tools registered, to participate in the same conversation under a speaker-selection coordinator. This maps naturally to multi-domain MCP deployments: a researcher agent with a web search MCP, an analyst agent with a database query MCP, and a writer agent with a document generation MCP. The coordinator routes each topic to the agent whose tool set covers it. Because each agent has its own module-level MCP connection, each connection is only held open for the agents that actually need it.

Pydantic AI — native MCP and the flat-schema requirement

Pydantic AI's MCP integration is architecturally different from the other four: MCP support is built into the framework's core rather than provided via an adapter library. MCPServerSSE and MCPServerStdio are first-class Agent constructor parameters. The framework discovers tools via tools/list, validates their arguments using the same Pydantic v2 machinery it applies to native Python tools, auto-retries when the LLM generates malformed arguments (injecting the ValidationError as feedback for the next attempt), and enforces a typed return value via result_type:

import asyncio
import os
from pydantic import BaseModel
from pydantic_ai import Agent
from pydantic_ai.mcp import MCPServerSSE

class ResearchResult(BaseModel):
    summary: str
    key_findings: list[str]
    confidence: float  # 0.0–1.0

research_agent = Agent(
    "claude-sonnet-4-6",
    mcp_servers=[
        MCPServerSSE(
            url="https://search.internal/mcp",
            headers={"Authorization": f"Bearer {os.environ['SEARCH_TOKEN']}"},
        )
    ],
    result_type=ResearchResult,
    retries=3,
    system_prompt="Research the topic thoroughly and return a structured summary."
)

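# For a long-running FastAPI service: persistent connection across multiple runs.
# FastAPI expects lifespan to be an async context manager that receives the app.
from contextlib import asynccontextmanager

@asynccontextmanager
async def lifespan(app):
    async with research_agent.run_mcp_servers():
        yield  # FastAPI runs here; MCP connection stays open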

The result_type=ResearchResult pattern is Pydantic AI's most distinctive production feature. Instead of receiving a free-form string from the agent and hoping it conforms to your expected structure, you declare the output shape as a Pydantic model. The framework generates a tool that the LLM must call to produce its final answer — the arguments to that tool become the validated, typed return value of agent.run(). If the LLM's output fails validation, Pydantic AI automatically retries (up to retries=3) with the ValidationError injected as feedback. The practical effect is that agent output is as reliable as form validation — either it matches your schema or it explicitly fails after a bounded number of retries.

The flat-schema requirement is the most frequently violated MCP tool design rule in Pydantic AI pipelines. Pydantic AI validates MCP tool arguments against the server's inputSchema. A flat schema — all parameters at the top level as primitives — is easy for the LLM to fill correctly and produces a clear ValidationError when it fails. A nested schema — objects containing objects containing arrays — is harder for the LLM to fill and produces deeply nested validation errors that are harder to inject as useful feedback. The Pydantic AI auto-retry mechanism works best when the validation feedback is a clear, actionable message: "the query field must be a string" beats "the filters.temporal.from field failed type validation" when the goal is LLM self-correction.
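
A quick way to audit existing tools against this rule is to scan each inputSchema for top-level parameters that are not primitives. The sketch below uses only the standard library. The two schemas are hypothetical examples modeled on the `query` and `filters.temporal.from` cases above, and "flat" is taken in the strict sense used here (arrays count as non-flat).

```python
PRIMITIVES = {"string", "integer", "number", "boolean"}


def non_flat_parameters(input_schema: dict) -> list:
    """Names of top-level parameters whose declared type is not a primitive."""
    properties = input_schema.get("properties", {})
    return sorted(
        name for name, spec in properties.items()
        if spec.get("type") not in PRIMITIVES
    )


# Flat: every parameter is a top-level primitive.
flat_schema = {
    "type": "object",
    "properties": {
        "query": {"type": "string"},
        "from_date": {"type": "string"},
        "max_results": {"type": "integer"},
    },
}

# Nested: the date filter is buried two objects deep.
nested_schema = {
    "type": "object",
    "properties": {
        "query": {"type": "string"},
        "filters": {
            "type": "object",
            "properties": {
                "temporal": {"type": "object", "properties": {"from": {"type": "string"}}},
            },
        },
    },
}

if __name__ == "__main__":
    print("flat:", non_flat_parameters(flat_schema))
    print("nested:", non_flat_parameters(nested_schema))
```

Flattening `filters.temporal.from` into a top-level `from_date` string is the kind of change this check is meant to prompt.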

Pydantic AI's monitoring gap is the inverse of its strength. Schema errors surface immediately with clear ValidationError text — Pydantic AI fails fast and loudly on bad tool definitions. But infrastructure failures — a network timeout, a dead MCP server, a TCP connection that never completes the initialize handshake — surface as opaque 30-second hangs. The framework's strict typing catches the developer errors; external monitoring is what catches the operational failures.

The shared failure mode: all five fail silently on dead servers

Despite their different architectures, all five frameworks share the same blind spot: when an MCP server becomes unavailable mid-workflow, none of them tells you immediately. They each fail in a different way, but all five fail in a way that costs time and LLM tokens before the problem surfaces.

LangChain: A dead server causes ToolException to inject an error observation into the ReAct loop. The LLM retries the tool call against the same broken server. With AgentExecutor, the loop continues until max_iterations (15 by default) is reached, at which point the executor stops and returns its generic "Agent stopped due to iteration limit or time limit." message rather than a clear infrastructure error. Each retry invokes the LLM, so a 15-iteration cap on a dead server means up to 15 wasted LLM calls before the failure surfaces.
LangGraph: A dead server causes the ToolNode to return an error result into the graph state. The conditional edge routes this to the error handler node, which may retry. In a graph designed for reliability with multiple retry paths, a dead MCP server can cause the graph to exhaust its retry budget across many nodes before propagating the error to the caller — each node retry is an LLM invocation.
CrewAI: A dead server causes a tool call failure that triggers the LLM self-correction loop (up to max_retry_limit times per task). In a crew with multiple agents and tasks, each agent may hit the failure independently before the crew raises a TaskError. Nightly batch jobs typically have nobody watching when this happens — the failure appears the next morning as a missing report.
AutoGen: Per the error-string rule, the tool function returns an error string rather than raising. The AssistantAgent sees the error, reasons about it, and may try alternative approaches for many turns before the conversation terminates. A 30-turn conversation that hits a dead server at turn 5 may spend the remaining 25 turns attempting workarounds — all charged to the LLM budget.
Pydantic AI: A network timeout on the MCP server connection causes the Agent.run() call to hang for the duration of the connection timeout (default 30 seconds). If retries are configured, each retry waits the full timeout. The caller waits, receives no feedback, and only learns of the failure when the timeout eventually fires.

The pattern is consistent: MCP server downtime does not produce an immediate, unambiguous failure. It produces a series of expensive, partial failures — each of which the framework's error handling layer attempts to recover from — before eventually surfacing as a high-level error. The downstream consequence depends on the use case:

Interactive agent: the user waits for a response that never comes, then sees an error after 30–90 seconds
Batch pipeline: the job fails silently; the output is missing when a human checks the next morning
Multi-step workflow: correct outputs from steps 1–9 are discarded when step 10 hits the dead server

External monitoring closes this gap. AliveMCP probes run the full MCP protocol sequence — initialize, tools/list, actual tool calls — at configurable intervals (typically 60 seconds). When a probe fails, an alert fires within one check interval. Alerting at 60 seconds instead of 30–90 minutes (the time it takes a framework's retry budget to exhaust and a human to notice) changes the recovery timeline entirely.

The monitoring setup for multi-framework deployments is straightforward: configure one AliveMCP monitor per MCP server endpoint, not per framework. The MCP server is the infrastructure component; the framework is the application layer. A single MCP server going down affects all frameworks that depend on it simultaneously. Monitoring the server catches the failure before any framework has time to start its retry cycle.
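
Deriving the monitor list from deployment configuration can be sketched as an inversion from "deployment to endpoints" to "endpoint to dependents". The deployment names below are hypothetical; the URLs are the example endpoints used earlier in this post.

```python
def monitors_needed(deployments: dict) -> dict:
    """Map each MCP endpoint to the deployments that depend on it."""
    by_endpoint = {}
    for deployment, urls in deployments.items():
        for url in urls:
            by_endpoint.setdefault(url, []).append(deployment)
    return by_endpoint


# Hypothetical deployments, each built on a different framework
deployments = {
    "langchain_api": ["https://search.internal/mcp", "https://db.internal/mcp"],
    "crewai_nightly": ["https://search.internal/mcp", "https://db.internal/mcp"],
    "pydantic_ai_service": ["https://search.internal/mcp"],
}

if __name__ == "__main__":
    for endpoint, dependents in monitors_needed(deployments).items():
        print(f"{endpoint}: 1 monitor, covers {', '.join(dependents)}")
```

Three deployments collapse to two monitors here, and the dependents list doubles as the blast radius to include in an alert.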

Choosing a framework for a new MCP-backed project

The five frameworks cover different points on the complexity-control tradeoff. A few heuristics that emerge from their MCP-specific properties:

Start with LangChain if you need a single conversational agent with tool use and you want the fastest integration path. MultiServerMCPClient + create_react_agent is three lines of meaningful code. The persistent-connection lifespan pattern is the one optimization worth adding before launch.
Move to LangGraph if you need checkpointed state, parallel tool branches, or human-in-the-loop interrupts. The checkpoint gap (MCP connections don't persist) is the one architectural invariant to internalize — it affects how you structure the graph entry point for resumed workflows.
Use CrewAI for multi-agent task pipelines where role-based tool ownership is a natural fit for your domain — research, analysis, generation, and publication as four distinct agent roles each with their own MCP toolset. Set max_retry_limit=2 and add a pre-flight health check for any batch scheduling use case.
Use AutoGen for long multi-turn conversations where different agents in the same conversation need different tool sets, and where you want explicit control over tool registration rather than an adapter layer. Internalize the error-string rule and the module-level connection pool before writing your first registered function.
Use Pydantic AI if your agent must return structured, validated output that downstream code depends on, or if you are already using Pydantic v2 heavily and want the framework that treats type safety as a first-class concern. Flatten your MCP tool input schemas and expect schema errors to surface loudly while infrastructure errors surface quietly.

All five choices have the same operational requirement: MCP server monitoring. The framework determines how your agent code is structured; the MCP server monitoring determines how quickly you know when that structure's foundation becomes unavailable.