MCP servers with Google Gemini

Google Gemini supports function calling through the google-genai Python SDK. MCP servers connect to Gemini through an adapter loop: convert MCP tool definitions from tools/list into Gemini FunctionDeclaration objects, call the model, dispatch any FunctionCall responses to the MCP server via tools/call, feed the results back, and repeat. Alternatively, the Google Agent Development Kit (ADK), Google's orchestration framework for multi-agent systems, has a native MCPToolset class that handles the adapter pattern for you. The two approaches trade off differently: the google-genai loop gives you full control; the ADK gives you managed orchestration, session memory, and multi-agent routing. In both cases, the MCP server runs independently and needs its own monitoring.

TL;DR

Convert MCP tools to types.FunctionDeclaration objects and pass them in types.Tool(function_declarations=[...]). Loop on response.function_calls: dispatch each to the MCP server via session.call_tool(), inject results as types.FunctionResponse parts, call the model again. For managed orchestration, use Google ADK's MCPToolset(connection_params=...) instead of building the loop yourself. Monitor MCP servers with AliveMCP — Gemini's function calling loop surfaces MCP server failures as empty tool results or exceptions, neither of which identifies the failure source clearly.

MCP to Gemini FunctionDeclaration conversion

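Gemini's function calling uses FunctionDeclaration objects whose parameters field takes a JSON Schema object definition. The conversion from MCP's inputSchema is mostly direct: MCP tool schemas are already JSON Schema objects with type: "object" and a properties map, which is the shape Gemini expects. Gemini accepts only a subset of JSON Schema, though, and the API has been known to reject an object schema with an empty properties map, so zero-argument tools are declared without parameters: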

from google.genai import types

def mcp_tools_to_gemini(tools) -> list[types.FunctionDeclaration]:
    """Convert MCP tool definitions to Gemini FunctionDeclaration objects."""
    declarations = []
    for tool in tools:
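        # MCP inputSchema is already a JSON Schema object; pass it through.
        # Declare zero-argument tools without parameters instead of sending
        # an object schema with an empty properties map.
        schema = tool.inputSchema or {}
        declarations.append(
            types.FunctionDeclaration(
                name=tool.name,
                description=tool.description or f"Call {tool.name}",
                parameters=schema if schema.get("properties") else None,
            )
        )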
        )
    return declarations

Gemini accepts only a subset of JSON Schema in FunctionDeclaration. Depending on the model and SDK version, keywords that MCP servers commonly emit, such as $ref/$defs, oneOf, anyOf, additionalProperties, or $schema, can make the request fail with a schema validation error (raised either by the SDK's own validation or as a 400 from the API). Flatten complex schemas before converting: inline $ref definitions and collapse anyOf/oneOf into a single type with the alternatives described in text. The MCP server tool design guide covers schema simplification patterns that work well across all LLM platforms.
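A minimal sketch of that flattening step, written as a hypothetical helper named flatten_schema (not part of either SDK). It assumes references are local (#/$defs/... or #/definitions/...), keeps the first non-null alternative of an anyOf/oneOf and lists every alternative in the description, and treats DROP_KEYS as an assumption about which keywords your SDK and model version reject.

```python
from typing import Any

# Assumption: keywords your Gemini SDK/model version rejects; adjust as needed.
DROP_KEYS = {"$schema", "$id", "$comment", "additionalProperties"}


def flatten_schema(schema: dict[str, Any]) -> dict[str, Any]:
    """Inline local $ref pointers and collapse anyOf/oneOf for a Gemini declaration.

    Returns a new dict; the input schema is not modified.
    """
    defs = {**schema.get("definitions", {}), **schema.get("$defs", {})}

    def resolve(node: Any, seen: frozenset) -> Any:
        if isinstance(node, list):
            return [resolve(item, seen) for item in node]
        if not isinstance(node, dict):
            return node
        ref = node.get("$ref")
        if isinstance(ref, str):
            name = ref.rsplit("/", 1)[-1]
            if name in seen or name not in defs:
                # Recursive or non-local reference: degrade to a loose object
                return {"type": "object", "description": f"Unresolved reference {ref}"}
            return resolve(defs[name], seen | {name})
        out: dict[str, Any] = {}
        for key, value in node.items():
            if key in DROP_KEYS or key in ("$defs", "definitions", "anyOf", "oneOf"):
                continue
            if key == "properties" and isinstance(value, dict):
                # Property names are user data, never keywords to drop
                out[key] = {prop: resolve(sub, seen) for prop, sub in value.items()}
            else:
                out[key] = resolve(value, seen)
        variants = node.get("anyOf") or node.get("oneOf")
        if variants:
            options = [o for o in (resolve(v, seen) for v in variants) if isinstance(o, dict)]
            concrete = [o for o in options if o.get("type") != "null"]
            # Keep the first non-null alternative's shape...
            for key, value in (concrete[0] if concrete else {"type": "string"}).items():
                out.setdefault(key, value)
            # ...and describe all alternatives in text for the model
            labels = ", ".join(str(o.get("type", "object")) for o in options)
            note = f"Accepts one of: {labels}."
            out["description"] = f"{out['description']} {note}" if "description" in out else note
        return out

    return resolve(schema, frozenset())
```

In mcp_tools_to_gemini, you would pass flatten_schema(tool.inputSchema) instead of the raw schema. Collapsing a union loses some validation precision; that is the price of a declaration Gemini will accept.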

Function calling loop with MCP

import asyncio
from google import genai
from google.genai import types
from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

MCP_URL = "https://search.internal/mcp"
client = genai.Client(api_key="YOUR_GEMINI_API_KEY")

async def run_gemini_agent(question: str) -> str:
    # streamablehttp_client yields (read, write, get_session_id)
    async with streamablehttp_client(MCP_URL) as (read, write, _get_session_id):
        async with ClientSession(read, write) as session:
            await session.initialize()

            tools_result = await session.list_tools()
            gemini_tools = [
                types.Tool(
                    function_declarations=mcp_tools_to_gemini(tools_result.tools)
                )
            ]

            contents = [types.Content(
                role="user",
                parts=[types.Part(text=question)]
            )]

            # Agentic loop
            while True:
                response = await client.aio.models.generate_content(
                    model="gemini-2.0-flash",
                    contents=contents,
                    config=types.GenerateContentConfig(
                        tools=gemini_tools,
                        temperature=0,
                    ),
                )

                # Append model response to conversation
                contents.append(response.candidates[0].content)

                function_calls = response.function_calls
                if not function_calls:
                    # No tool calls — final response
                    return response.text

                # Dispatch all function calls to MCP server in parallel
                tool_results = await asyncio.gather(*[
                    dispatch_to_mcp(session, fc)
                    for fc in function_calls
                ], return_exceptions=True)

                # Inject results back into conversation
                result_parts = []
                for fc, result in zip(function_calls, tool_results):
                    if isinstance(result, Exception):
                        response_data = {"error": str(result)}
                    else:
                        response_data = {"result": result}
                    result_parts.append(
                        types.Part(function_response=types.FunctionResponse(
                            name=fc.name,
                            response=response_data,
                        ))
                    )
                contents.append(types.Content(role="user", parts=result_parts))

async def dispatch_to_mcp(session: ClientSession, fc) -> str:
    """Dispatch a Gemini FunctionCall to the MCP server."""
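    result = await session.call_tool(
        fc.name,
        arguments=dict(fc.args or {}),  # args is None for zero-argument calls
    )
    text = "\n".join(c.text for c in result.content if hasattr(c, "text"))
    if result.isError:
        # Keep the server's error text so the model and your logs can see it
        raise RuntimeError(f"MCP tool error from {fc.name}: {text}")
    return text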

print(asyncio.run(run_gemini_agent("Which MCP servers have the best uptime this month?")))

Gemini can return several function calls in a single response turn. Dispatch them in parallel with asyncio.gather() and collect every result before the next generate_content() call: if the model decides to call 3 tools, all 3 MCP round-trips happen concurrently, so the turn waits only about as long as the slowest call.

Parallel tool calls — Gemini's key advantage

Gemini 2.0 Flash and Gemini 1.5 Pro can generate multiple function calls in a single response when the requested tools are independent of each other. Compared with a model that emits one tool call per turn, this saves both MCP round-trip time and extra inference calls:

Scenario (200 ms per MCP call; final answer turn not counted)	One tool call per turn	Gemini (parallel calls)
3 independent tools	600 ms + 3 inference calls	~220 ms + 1 inference call
2 dependent tools (second needs the first's result)	400 ms + 2 inference calls	400 ms + 2 inference calls (no speedup)
5 tools in 2 dependent waves	1000 ms + 5 inference calls	400 ms + 2 inference calls

Parallel tool calls only help when MCP server calls are independent. For tools that must be called in order (e.g., search, then fetch the result URL), the model needs the first result before it can issue the second call, so the calls run across separate turns. Design MCP tools to be as independent as possible: one tool per action, not multi-step tools that combine search + fetch internally. Independent tools also make the agent's reasoning more auditable: you can see exactly what the model chose to call and why.
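The table's latency arithmetic can be reproduced with a small simulation. This sketch replaces session.call_tool() with a hypothetical fake_mcp_call that only sleeps for an assumed 200 ms, so it models MCP round-trip latency and ignores inference time; sequential stands in for a model that emits one call per turn, parallel for a single asyncio.gather() batch.

```python
import asyncio
import time

CALL_LATENCY = 0.2  # assumed 200 ms per MCP round-trip, matching the table


async def fake_mcp_call(name: str) -> str:
    """Stand-in for session.call_tool(); only simulates network latency."""
    await asyncio.sleep(CALL_LATENCY)
    return f"{name}: ok"


async def sequential(names: list[str]) -> float:
    start = time.perf_counter()
    for name in names:
        await fake_mcp_call(name)  # one call per model turn
    return time.perf_counter() - start


async def parallel(names: list[str]) -> float:
    start = time.perf_counter()
    await asyncio.gather(*(fake_mcp_call(n) for n in names))  # one batch
    return time.perf_counter() - start


async def main() -> tuple[float, float]:
    tools = ["search", "uptime", "pricing"]
    return await sequential(tools), await parallel(tools)


if __name__ == "__main__":
    seq, par = asyncio.run(main())
    print(f"sequential: {seq * 1000:.0f} ms, parallel: {par * 1000:.0f} ms")
```

Expect roughly 600 ms versus a little over 200 ms. Dependent calls can't share one gather() batch, because the second call's arguments come from the first call's result.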

Google ADK with MCPToolset

The Google Agent Development Kit (ADK) is Google's orchestration framework, similar to the OpenAI Agents SDK. It has a native MCPToolset that manages the MCP connection and tool discovery automatically:

import asyncio

from google.adk.agents import Agent
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.adk.tools.mcp_tool.mcp_toolset import MCPToolset, SseServerParams
from google.genai import types

# ADK agent with MCP tools: MCPToolset handles the connection lifecycle
research_agent = Agent(
    name="ResearchAgent",
    model="gemini-2.0-flash",
    instruction="You are a research assistant. Use the available tools to answer questions.",
    tools=[
        MCPToolset(
            connection_params=SseServerParams(
                url="https://search.internal/mcp",
                headers={"Authorization": "Bearer sk-..."},
            )
        )
    ],
)

session_service = InMemorySessionService()
runner = Runner(
    agent=research_agent,
    app_name="research-app",
    session_service=session_service,
)

async def run():
    session = await session_service.create_session(
        app_name="research-app",
        user_id="user-1",
    )
    async for event in runner.run_async(
        user_id="user-1",
        session_id=session.id,
        new_message=types.Content(
            role="user",
            parts=[types.Part(text="What is the current uptime of the top MCP servers?")]
        ),
    ):
        if event.is_final_response() and event.content and event.content.parts:
            print(event.content.parts[0].text)

asyncio.run(run())

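MCPToolset also supports stdio-based MCP servers. Connection-parameter class names have changed across ADK releases: newer versions provide SseConnectionParams, StreamableHTTPConnectionParams, and StdioConnectionParams (which wraps the MCP SDK's StdioServerParameters), so match the imports to your installed version. SSE parameters target a server's SSE endpoint; an endpoint that speaks Streamable HTTP needs the Streamable HTTP variant. The ADK's session service stores conversation history, enabling multi-turn agent interactions without manually managing the contents list. ADK also supports multi-agent architectures via sub-agents, which is useful for routing between different MCP-backed specialist agents.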

Streaming with Gemini function calls

Gemini's streaming API (generate_content_stream()) streams text tokens but buffers function calls internally until they are complete. This means the user sees partial text responses quickly, but tool dispatches don't start until the full function call block is received:

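async def stream_with_mcp(session, contents, tools):
    """Stream one model turn; dispatch any function calls it produced."""
    function_calls = []
    async for chunk in await client.aio.models.generate_content_stream(
        model="gemini-2.0-flash",
        contents=contents,
        config=types.GenerateContentConfig(tools=tools),
    ):
        if chunk.text:
            # Stream text to user immediately
            print(chunk.text, end="", flush=True)
        # Function calls arrive complete (not as partial arguments), but not
        # necessarily in the final chunk, so collect them from every chunk
        if chunk.function_calls:
            function_calls.extend(chunk.function_calls)

    if not function_calls:
        return None  # final text turn, already printed

    # Record the model's function-call turn, then dispatch to MCP in parallel
    contents.append(types.Content(
        role="model",
        parts=[types.Part(function_call=fc) for fc in function_calls],
    ))
    results = await asyncio.gather(*[
        dispatch_to_mcp(session, fc)
        for fc in function_calls
    ], return_exceptions=True)
    # Build FunctionResponse parts from results as in the non-streaming loop,
    # append them to contents, and call the model again...
    return results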

For agentic loops where the intermediate tool-call turns don't need to be shown to the user, non-streaming is simpler and no slower overall, since dispatch waits for complete function calls either way. You can't know in advance which turn will be the final one, so either stream every turn with the handling above or use ADK's event stream, which abstracts the distinction.

Monitoring MCP servers in Gemini pipelines

When an MCP server returns an error or is unreachable, the Gemini function calling loop gets either an exception from session.call_tool() (connection failure, timeout) or a CallToolResult that has isError set or carries no content. Neither maps to a clear Gemini-level error: from the model's perspective, it just sees a FunctionResponse with an error payload or an empty result. The model may then try a different approach, ask the user for clarification, or loop on the error, burning inference tokens without making progress.
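One way to make those failures identifiable is to tag each FunctionResponse payload with where the failure came from. This is a sketch: classify_mcp_outcome and its source labels are hypothetical, and it assumes your dispatcher returns the raw CallToolResult (instead of raising on isError the way dispatch_to_mcp does) and that exceptions are collected by asyncio.gather(..., return_exceptions=True).

```python
import asyncio
from typing import Any


def classify_mcp_outcome(tool: str, outcome: Any) -> dict[str, Any]:
    """Build a FunctionResponse payload that names where a failure came from.

    `outcome` is either an exception or an object shaped like the MCP SDK's
    CallToolResult, with `isError` and `content` attributes.
    """
    if isinstance(outcome, (TimeoutError, asyncio.TimeoutError)):
        return {"error": f"MCP server for '{tool}' timed out", "source": "timeout"}
    if isinstance(outcome, Exception):
        # Connection refused, DNS failure, protocol error, ...
        return {"error": f"{type(outcome).__name__}: {outcome}", "source": "exception"}
    content = getattr(outcome, "content", None) or []
    text = "\n".join(c.text for c in content if hasattr(c, "text"))
    if getattr(outcome, "isError", False):
        return {"error": text or f"'{tool}' reported an error", "source": "tool"}
    if not text.strip():
        # A "successful" call with nothing in it is easy for the model to misread
        return {"error": f"'{tool}' returned no content", "source": "empty_result"}
    return {"result": text}
```

The returned dict can go straight into types.FunctionResponse(name=..., response=...). Logging the same dict gives you a per-server failure breakdown, and the model gets a clearer signal than a bare exception string.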

The most expensive failure mode is a temporarily slow MCP server inside a parallel tool-call batch. asyncio.gather() waits for every call, so the whole batch takes as long as the slowest tool: if 5 tools are dispatched in parallel and one MCP server takes 12 seconds, the batch takes 12 seconds even though the other 4 returned in 200 ms. AliveMCP surfaces latency degradation before it becomes a timeout: the response-time history shows when a server's P95 latency starts climbing, giving you time to investigate before it reaches whatever timeout your client enforces (for example, 10 seconds). For Gemini pipelines, where parallel tool calls make latency outliers disproportionately expensive, this advance warning is the difference between a slow agent and a hanging one.
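A per-call deadline keeps one slow server from setting the pace for the whole batch. A minimal sketch with asyncio.wait_for(); PER_CALL_TIMEOUT_S is an assumed value, and call_with_timeout and dispatch_batch are hypothetical helpers that accept any session object with the MCP SDK's call_tool(name, arguments=...) method.

```python
import asyncio
from typing import Any

PER_CALL_TIMEOUT_S = 5.0  # assumed budget; tune to your servers' normal P95


async def call_with_timeout(session: Any, name: str, args: dict[str, Any],
                            timeout: float = PER_CALL_TIMEOUT_S) -> Any:
    """call_tool() with a deadline; raises asyncio.TimeoutError when exceeded."""
    return await asyncio.wait_for(session.call_tool(name, arguments=args), timeout)


async def dispatch_batch(session: Any, calls: list[tuple[str, dict[str, Any]]],
                         timeout: float = PER_CALL_TIMEOUT_S) -> list[Any]:
    """Run every call concurrently; a timed-out call yields its exception
    in the results list instead of delaying or cancelling the others."""
    return await asyncio.gather(
        *(call_with_timeout(session, n, a, timeout) for n, a in calls),
        return_exceptions=True,
    )
```

A timed-out call comes back as an asyncio.TimeoutError in the results list, so it flows into the same error-payload branch as any other exception, and the model can retry, skip the tool, or answer from the results it already has.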

Frequently asked questions

Which Gemini models support function calling for MCP integration?

Gemini 2.0 Flash, Gemini 2.0 Flash-Lite, Gemini 1.5 Pro, and Gemini 1.5 Flash all support function calling. Gemini 2.0 Flash is the recommended model for MCP-backed agents: it's fast (low inference latency), cost-effective, and handles parallel tool calls well. Gemini 1.5 Pro has a larger context window (2M tokens), which is useful for agents that accumulate long tool-call conversation histories. For Gemini on Vertex AI, the google-genai SDK works once the client is created with genai.Client(vertexai=True, project=..., location=...), so the same loop applies; the older google-cloud-aiplatform SDK exposes the same function calling model.

What's the difference between using google-genai SDK directly vs Google ADK for MCP?

The google-genai SDK gives you the raw API — you write the agent loop, manage conversation history, and handle retries yourself. It has zero framework opinions. Google ADK adds: session management (conversation persistence), multi-agent routing (sub-agents, handoffs), streaming event model, and built-in MCP support via MCPToolset. Use google-genai directly if you have a simple use case or need tight control over the agent loop. Use ADK if you want managed sessions, are building multi-agent systems, or want MCP to "just work" without adapter code.

Can I mix Gemini native tools and MCP tools in the same agent?

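Yes. In the google-genai SDK, put Python callables and a types.Tool holding the MCP-derived FunctionDeclaration objects in the same config.tools list. Automatic function calling can invoke the Python callables for you, while calls to MCP declarations come back in response.function_calls for you to dispatch through the tool loop. Because handling of a response that mixes both kinds of call can vary by SDK version, many loops disable automatic calling (types.AutomaticFunctionCallingConfig(disable=True)) and dispatch everything manually. In Google ADK, mix native FunctionTool instances (or plain functions) and MCPToolset in the agent's tools list; ADK routes each tool call to the appropriate handler.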
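A sketch of the manual route: with automatic function calling disabled, a single dispatcher sends each call to a local Python function or to the MCP server by name. LOCAL_TOOLS and word_count are hypothetical; the local tools still need their own FunctionDeclarations in the request, and their names must not collide with MCP tool names, since the name is the only routing key.

```python
import asyncio
from typing import Any, Callable, Optional


def word_count(text: str) -> int:
    """Hypothetical local tool that never touches an MCP server."""
    return len(text.split())


LOCAL_TOOLS: dict[str, Callable[..., Any]] = {"word_count": word_count}


async def route_call(session: Any, name: str, args: Optional[dict[str, Any]]) -> Any:
    """Send a model function call to a local function or to the MCP server."""
    args = args or {}
    if name in LOCAL_TOOLS:
        # Run sync functions in a worker thread so they don't block the event loop
        return await asyncio.to_thread(LOCAL_TOOLS[name], **args)
    result = await session.call_tool(name, arguments=args)
    if result.isError:
        raise RuntimeError(f"MCP tool {name} failed")
    return "\n".join(c.text for c in result.content if hasattr(c, "text"))
```

In the loop, route_call(session, fc.name, fc.args) replaces dispatch_to_mcp, and the results feed the same FunctionResponse-building code.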

How do I prevent the Gemini function calling loop from running indefinitely?

Set a maximum iteration count: keep a counter in the loop and stop with an error message after N iterations (10 to 15 is typical for most use cases). Gemini's function calling can loop if the model repeatedly calls tools that return unhelpful results, which is common when MCP servers return empty results for valid queries. Also watch token usage: each turn's prompt includes the whole conversation so far, so stop before a single turn's prompt approaches the model's context limit, and cap cumulative input tokens across turns if cost matters. In ADK, passing RunConfig(max_llm_calls=...) to the runner caps model calls per run; LoopAgent has its own max_iterations setting.
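A minimal sketch of those limits as a small guard object. LoopBudget, its default limits, and LoopBudgetExceeded are hypothetical. The assumed wiring is to call budget.charge(getattr(response.usage_metadata, "prompt_token_count", None)) once per tool-calling turn in run_gemini_agent, right after the no-function-calls check, and let the exception end the run; charge() accepts None for responses without usage metadata.

```python
from dataclasses import dataclass
from typing import Optional


class LoopBudgetExceeded(RuntimeError):
    """Raised when an agent loop runs past its turn or token budget."""


@dataclass
class LoopBudget:
    max_turns: int = 12                # assumed cap, within the 10 to 15 range
    max_context_tokens: int = 800_000  # assumed headroom below a 1M-token window
    max_total_tokens: int = 2_000_000  # assumed cost budget across all turns
    turns: int = 0
    total_tokens: int = 0

    def charge(self, prompt_token_count: Optional[int]) -> None:
        """Record one tool-calling turn; raise once any limit is crossed."""
        tokens = prompt_token_count or 0
        self.turns += 1
        self.total_tokens += tokens
        if self.turns > self.max_turns:
            raise LoopBudgetExceeded(f"more than {self.max_turns} tool-calling turns")
        if tokens > self.max_context_tokens:
            raise LoopBudgetExceeded(f"prompt grew to {tokens} tokens")
        if self.total_tokens > self.max_total_tokens:
            raise LoopBudgetExceeded(f"{self.total_tokens} prompt tokens used in total")
```

Raising rather than returning a flag keeps the limit enforcement in one place; the caller catches LoopBudgetExceeded and returns a "could not finish" message to the user.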

Does Gemini support MCP servers over stdio transport?

Gemini itself doesn't care about transport — it receives function declarations and calls functions, regardless of how those functions are backed. You can connect a stdio MCP server using the MCP Python SDK's stdio_client(), which manages the subprocess, and then convert its tools to Gemini FunctionDeclaration objects the same way as HTTP/SSE servers. The transport management is entirely on your side; Gemini only sees the function calling interface.