Implementation guide · 2026-06-12 · Real-world MCP tools

Building Real-World MCP Tools: Filesystem, Web, Databases, Code Execution, and APIs

Most MCP tutorial examples are self-contained: a get_weather tool that calls an API, a calculator tool that does arithmetic. These are clean to reason about. Real MCP tools are different: they reach outside the process boundary to touch the filesystem, the network, a database, a container runtime, or a third-party API. When tool inputs arrive as LLM-generated strings, each of those external interactions becomes a potential attack vector: path traversal for filesystem tools, SSRF for web fetchers, SQL injection for database queries, sandbox escape for code execution, credential leakage for API wrappers. Every tool category has a different attack surface, but they all reduce to the same root cause: unsanitized LLM-provided input reaching an external system. This guide synthesizes the filesystem, web search, code execution, database, and API wrapper tool patterns into a unified framework, and explains a second cross-cutting problem that security alone cannot solve: invisible failure modes that break tool execution while leaving the MCP transport layer healthy.

TL;DR

Pattern 1: Unsanitized LLM input reaching an external system

The common thread across all real-world MCP tool categories is not the technology stack — it is the trust boundary. In each case, input that the LLM generates (a file path, a URL, a code string, a SQL query fragment, a key name) crosses a boundary into an external system that enforces its own rules. If that input is not validated before crossing the boundary, the external system does what it does with any input — which may not be what the developer intended.

The five categories map to five distinct external system types, each with a characteristic attack surface:

| Tool category | External system | Primary attack vector | Root cause | Defense |
| --- | --- | --- | --- | --- |
| Filesystem | OS file API | Path traversal (../../etc/passwd) | Relative path segments bypass intended root | path.resolve() against the root + allowed-root prefix check with path.sep suffix |
| Web / fetch | HTTP client + DNS | SSRF to internal network or metadata APIs | DNS resolution happens after URL validation, enabling rebinding | Resolve hostname to IP first; block RFC 1918 + link-local + loopback after resolution; connect to the validated IP |
| Database | SQL engine | SQL injection via string interpolation | Query built by concatenation instead of parameterized binding | Parameterized queries at driver level; read-only DB user; LIMIT injection wrapper |
| Code execution | Runtime / OS | Sandbox escape to host filesystem or network | eval()/vm.Script share process + prototype chain with host | Docker with --network none, --read-only, --security-opt no-new-privileges, memory + PID limits |
| API wrapper | Third-party API | Credential leakage via tool argument logging | API keys passed as arguments appear in context window + call history | Server-side auth injection; never accept keys as tool parameters |

The unifying frame: LLMs are good at generating structured output, but they are not security-aware. A model asked to "read the user's SSH keys" may generate ../../../home/user/.ssh/id_rsa as a file path argument — not because it is malicious, but because that is the answer to the question. The MCP server is responsible for rejecting that input before it reaches the OS. The attack surface is the gap between what the LLM can generate and what the external system will accept.

Filesystem tools: path traversal defense in depth

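A filesystem MCP server gives an LLM read and write access to files, which is exactly why it needs the most conservative input validation of any tool category. The core invariant is simple: every file operation must stay within an explicitly declared root directory. The implementation is subtle: resolved.startsWith(WORKSPACE) is wrong because /workspace-evil starts with /workspace. The correct check resolves the input against the workspace root, then appends path.sep to the root before comparing:

const path = require('node:path');

// WORKSPACE must be an absolute path with no trailing separator
const WORKSPACE = path.resolve(process.env.WORKSPACE_DIR);

function assertSafePath(userInput) {
  // Resolve relative to the workspace root, not the process working directory
  const resolved = path.resolve(WORKSPACE, userInput);
  if (!resolved.startsWith(WORKSPACE + path.sep) && resolved !== WORKSPACE) {
    throw new Error('Access denied: path outside workspace');
  }
  return resolved;
}

path.resolve() collapses all ../ segments before the prefix check runs, so ../ sequences cannot climb out of the workspace no matter how many levels the input contains. The path.sep suffix prevents the /workspace-evil false match. path.resolve() is purely lexical, however: it does not follow symlinks, so a symlink inside the workspace that points elsewhere still passes this check unless the real path is verified as well.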
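The prefix check above works on path strings only. The following is a minimal sketch of a symlink-aware variant, assuming a synchronous API is acceptable; resolveInsideWorkspace is a hypothetical helper name, not part of any library. It compares real paths as reported by fs.realpathSync, and for files that do not exist yet it checks the parent directory instead. A race between the check and the later file operation is still possible, so treat this as one layer rather than a complete defense.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper: resolve userInput inside workspaceRoot and reject
// anything whose real (symlink-resolved) location is outside the root.
function resolveInsideWorkspace(workspaceRoot, userInput) {
  // Resolve the root itself, since it may sit behind a symlink
  const root = fs.realpathSync(workspaceRoot);
  const candidate = path.resolve(root, userInput);

  // realpathSync needs an existing path, so for files that do not exist
  // yet (writes) fall back to checking the parent directory.
  const existing = fs.existsSync(candidate) ? candidate : path.dirname(candidate);
  const real = fs.realpathSync(existing);

  if (real !== root && !real.startsWith(root + path.sep)) {
    throw new Error('Access denied: path outside workspace');
  }
  return candidate;
}
```

A symlink such as workspace/link pointing at a sibling directory passes the lexical check but is rejected here, because its real path no longer starts with the workspace root.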

Beyond path validation, production filesystem tools need three additional patterns: stat-before-read guards that check file size before reading (prevents OOM from multi-GB files), atomic write patterns using .tmp-{pid}-{timestamp} rename (prevents partial-file reads during writes), and depth-limited directory listings that return relative paths (absolute paths leak server directory structure to the LLM context). The MCP resources API is the right transport for file content injection into context — use file:// URIs as resources rather than read_file tool calls for large reference documents that the LLM needs throughout a session.
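Two of the patterns above, the stat-before-read guard and the atomic write, can be sketched as follows. This is an illustration under stated assumptions: the 1 MiB cap and the function names readFileGuarded and writeFileAtomic are hypothetical, and the paths passed in are assumed to have already been validated against the workspace root.

```javascript
const fs = require('node:fs');
const path = require('node:path');

const MAX_READ_BYTES = 1024 * 1024; // assumed cap (1 MiB); tune per deployment

// Check the size on disk before reading, so a multi-GB file is rejected
// instead of being loaded into memory.
function readFileGuarded(filePath, maxBytes = MAX_READ_BYTES) {
  const stats = fs.statSync(filePath);
  if (!stats.isFile()) {
    throw new Error('Not a regular file');
  }
  if (stats.size > maxBytes) {
    throw new Error(`File too large: ${stats.size} bytes (limit ${maxBytes})`);
  }
  return fs.readFileSync(filePath, 'utf8');
}

// Write to a sibling temp file, then rename over the target. A rename within
// one filesystem replaces the target in a single step, so readers see either
// the old contents or the new contents, never a partial file.
function writeFileAtomic(filePath, contents) {
  const tmpPath = `${filePath}.tmp-${process.pid}-${Date.now()}`;
  try {
    fs.writeFileSync(tmpPath, contents);
    fs.renameSync(tmpPath, filePath);
  } catch (err) {
    fs.rmSync(tmpPath, { force: true }); // do not leave the temp file behind
    throw err;
  }
}
```

The temp file lives next to the target on purpose: a rename across filesystems is not atomic and may fail outright.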

Web fetch tools: SSRF and DNS rebinding

SSRF (Server-Side Request Forgery) is the web-fetch equivalent of path traversal: an attacker supplies a URL that causes the server to make requests to internal infrastructure the attacker cannot reach directly. The classic defense — checking whether the URL's hostname is a private IP address — fails against DNS rebinding attacks, where a public hostname is resolved to a private IP after the hostname check passes.

The correct defense resolves the hostname to an IP address first, then checks the IP against blocked ranges before making the connection:

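const dns = require('node:dns');

async function assertSafeUrl(rawUrl) {
  const url = new URL(rawUrl); // throws on malformed
  if (!['http:', 'https:'].includes(url.protocol)) {
    throw new Error('Only http/https allowed');
  }
  // Check every A record, not just the first: any of them may be used to connect
  const ips = await dns.promises.resolve4(url.hostname);
  const blocked = ips.find(isBlockedIp);
  if (blocked) {
    throw new Error(`Blocked: ${url.hostname} resolves to private IP ${blocked}`);
  }
  return { url, ips }; // callers should connect to one of these validated IPs
}

function isBlockedIp(ip) {
  return /^(0\.|10\.|172\.(1[6-9]|2\d|3[01])\.|192\.168\.|127\.|169\.254\.|24[0-9]\.|25[0-5]\.)/.test(ip);
}

The regex covers the "this host" range (0.0.0.0/8), RFC 1918 private ranges, loopback, link-local (169.254.0.0/16, which includes the cloud instance metadata address 169.254.169.254), and the reserved 240.0.0.0/4 block. Validation alone does not defeat DNS rebinding: if the HTTP client resolves the hostname again when it connects, the second answer can differ from the one that was checked. The request has to be made to one of the validated IPs, keeping the original hostname for the Host header and TLS server name, so that there is no window between validation and connection. Redirects need the same check on every hop. See the SSRF prevention guide for the full blocked CIDR list and the IPv6 equivalent patterns.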
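As an alternative to a hand-written regex, the range check can be expressed as CIDR blocks with Node's built-in net.BlockList. The sketch below is illustrative: the list of ranges is an assumption (it is not the full list from the SSRF prevention guide), and isBlockedAddress is a hypothetical name. It fails closed on anything that does not parse as an IP address.

```javascript
const net = require('node:net');

// Assumed set of ranges to refuse; extend to match your own policy.
const blockedRanges = new net.BlockList();
for (const [prefix, bits] of [
  ['0.0.0.0', 8],       // "this host"
  ['10.0.0.0', 8],      // RFC 1918
  ['100.64.0.0', 10],   // carrier-grade NAT
  ['127.0.0.0', 8],     // loopback
  ['169.254.0.0', 16],  // link-local, includes 169.254.169.254
  ['172.16.0.0', 12],   // RFC 1918
  ['192.168.0.0', 16],  // RFC 1918
  ['224.0.0.0', 4],     // multicast
  ['240.0.0.0', 4],     // reserved
]) {
  blockedRanges.addSubnet(prefix, bits, 'ipv4');
}
for (const [prefix, bits] of [
  ['::1', 128],   // loopback
  ['fc00::', 7],  // unique local
  ['fe80::', 10], // link-local
]) {
  blockedRanges.addSubnet(prefix, bits, 'ipv6');
}

function isBlockedAddress(ip) {
  const family = net.isIP(ip); // 4, 6, or 0 when the string is not an IP
  if (family === 0) return true; // fail closed on unparseable input
  return blockedRanges.check(ip, family === 6 ? 'ipv6' : 'ipv4');
}
```

Expressing the policy as CIDR blocks keeps boundaries such as 172.16.0.0/12 exact and makes the list easier to audit than an alternation of octet patterns.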

Beyond SSRF defense, production web-fetch tools need response size limits (500 KB default, checked against Content-Length before reading and enforced again while streaming the body, because the header can be absent or wrong), AbortController timeouts (10s default), content-type validation, HTML-to-text stripping that removes <script> and <style> blocks before tag stripping, and a URL-keyed LRU response cache (5-minute TTL, 500-entry limit, never caching errors). Per-domain rate limiting at 1 request per second prevents the tool from being used to DoS third-party sites.
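The response cache described above can be sketched with a plain Map, whose insertion order doubles as recency order. This is a minimal illustration, not a production cache: ResponseCache and its option names are hypothetical, and the injectable now function exists only to make expiry testable. Errors are kept out of the cache simply by having the caller invoke set() only after a successful fetch.

```javascript
// Minimal URL-keyed LRU cache with a TTL. Defaults follow the figures in
// the text: 500 entries, 5-minute TTL.
class ResponseCache {
  constructor({ maxEntries = 500, ttlMs = 5 * 60 * 1000, now = Date.now } = {}) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
    this.now = now;
    this.entries = new Map(); // oldest entry first, most recent last
  }

  get(url) {
    const entry = this.entries.get(url);
    if (!entry) return undefined;
    if (this.now() - entry.storedAt > this.ttlMs) {
      this.entries.delete(url); // expired
      return undefined;
    }
    // Re-insert to mark this entry as most recently used
    this.entries.delete(url);
    this.entries.set(url, entry);
    return entry.body;
  }

  set(url, body) {
    this.entries.delete(url);
    this.entries.set(url, { body, storedAt: this.now() });
    if (this.entries.size > this.maxEntries) {
      // The first key in iteration order is the least recently used
      const oldest = this.entries.keys().next().value;
      this.entries.delete(oldest);
    }
  }
}
```

A read refreshes recency but not the stored timestamp, so a frequently read entry still expires five minutes after it was fetched.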

Code execution tools: why eval() and vm.Script are not sandboxes

The temptation with code execution tools is to reach for eval() or Node.js's built-in vm.Script because they are readily available and easy to implement. Both provide weak or no isolation:

| Approach | Isolation level | Known escape | Filesystem access | Network access |
| --- | --- | --- | --- | --- |
| eval() | None | Full process access by definition | Full | Full |
| vm.Script with runInThisContext() | Scope only | Runs against the host global object, so process and other globals are directly reachable | Full | Full |
| vm.Script with runInNewContext() | Separate global object | Prototype chain escape via constructor.constructor on any host object passed into the context | Full once escaped | Full once escaped |
| Docker with flags | Container isolation (shared host kernel) | None known at the configuration level with the flags below | Read-only volume only | None (--network none) |

The six Docker settings that make container isolation real: --network none (no egress, so all network calls from executed code are blocked), --memory 256m --memory-swap 256m (no swap, so the process is OOM-killed instead of swapping to disk), --cpus 0.5 (CPU quota prevents host starvation), --read-only (filesystem immutable except explicit volume mounts), --security-opt no-new-privileges (no setuid/setgid escalation; this is the form docker run accepts), --pids-limit 64 (fork bomb prevention). All six are necessary: omitting any one reopens a specific escape or resource-exhaustion vector.
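The settings above translate into an argument array for execFile, which avoids shell expansion because no shell is involved. The sketch below is an illustration under stated assumptions: buildDockerArgs and runInSandbox are hypothetical names, the 10-second timeout and 1 MiB output cap are arbitrary, and no-new-privileges is passed through --security-opt, which is how docker run accepts it.

```javascript
const { execFile } = require('node:child_process');

// Build the docker run argument list. hostCodeDir is mounted read-only
// at /sandbox; image and command are supplied by the server, not the LLM.
function buildDockerArgs({ image, hostCodeDir, command }) {
  return [
    'run', '--rm',
    '--network', 'none',                       // no egress
    '--memory', '256m', '--memory-swap', '256m', // hard memory cap, no swap
    '--cpus', '0.5',                           // CPU quota
    '--read-only',                             // immutable root filesystem
    '--security-opt', 'no-new-privileges',     // no setuid/setgid escalation
    '--pids-limit', '64',                      // fork bomb prevention
    '--volume', `${hostCodeDir}:/sandbox:ro`,
    image,
    ...command,
  ];
}

// Run the container with a wall-clock timeout and a cap on captured output.
function runInSandbox(options, timeoutMs = 10_000) {
  return new Promise((resolve, reject) => {
    execFile(
      'docker',
      buildDockerArgs(options),
      { timeout: timeoutMs, maxBuffer: 1024 * 1024 },
      (err, stdout, stderr) => (err ? reject(err) : resolve({ stdout, stderr })),
    );
  });
}
```

A call might look like runInSandbox({ image: 'python:3.12-slim', hostCodeDir: tmpDir, command: ['python', '/sandbox/main.py'] }), where the image tag and file name are placeholders.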

The full code execution guide covers the per-language image setup (Python, Node, bash), the execFile-with-timeout pattern (no shell expansion), volume mounting code read-only at /sandbox, the finally block for tmpDir cleanup, and pre-pulling images at server startup to avoid cold-start latency on the first tool call. For long-running computations, the partial-result pattern emits intermediate output as MCP notifications rather than blocking the tool call until completion.

Database tools: parameterized queries and the read-only user

SQL injection is one of the oldest vulnerabilities in software, and it appears in MCP database tools through the same mechanism it always has: a query built by string interpolation instead of parameterized binding. The correct defense is driver-level parameterized queries — no string concatenation, no template literals embedding user input into SQL:

// Wrong: SQL injection vector (both the table name and the id are interpolated)
const rows = await db.query(`SELECT * FROM ${table} WHERE id = ${id}`);

// Correct: parameterized binding (pg driver, numbered $1/$2 placeholders)
// pg resolves to a result object; the rows are on its .rows property
const { rows } = await client.query(
  'SELECT * FROM items WHERE id = $1 AND owner = $2',
  [id, ownerId]
);

// Correct: better-sqlite3 (synchronous, positional ? bindings)
const rows = db.prepare(
  'SELECT * FROM items WHERE id = ? AND owner = ?'
).all(id, ownerId);

The parameterized form is not optional based on risk assessment — it applies to every query including queries that look safe, because LLM-generated inputs are not predictable and injection can be composed across multiple tool calls within a session. See the full database guide for the parameterized query syntax table across postgres.js, node-postgres, better-sqlite3, Prisma, and Drizzle.
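Placeholders bind values, not identifiers, so a table name like the one in the wrong example above cannot be passed as $1. One common approach, sketched here with hypothetical table names and helper names, is to check the requested identifier against a fixed allowlist and only then place it in the SQL text, while the values still go through parameter binding.

```javascript
// Hypothetical allowlist; in a real server this would list the tables the
// tool is meant to expose.
const ALLOWED_TABLES = new Set(['items', 'orders']);

// Return a quoted identifier only if the name is on the allowlist.
function safeTableName(requested) {
  if (!ALLOWED_TABLES.has(requested)) {
    throw new Error(`Unknown table: ${JSON.stringify(requested)}`);
  }
  return `"${requested}"`;
}

// The identifier comes from the allowlist; the id stays a bound parameter.
function buildSelectById(table) {
  return `SELECT * FROM ${safeTableName(table)} WHERE id = $1`;
}
```

Because only exact allowlisted strings are ever interpolated, nothing the LLM supplies reaches the SQL text directly.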

Beyond parameterized queries, three additional patterns complete the production database tool. First, a read-only database user: if the MCP server's credentials are read-only at the database level, injection that attempts to write fails at the permission check even if it slips past query construction. Second, a row-limit wrapper: any user-supplied query is wrapped as SELECT * FROM (user_sql) _q LIMIT max_rows, because LLMs generating unbounded scans is a real operational problem. Third, the schema-as-resource pattern: table definitions are exposed via db://schema/overview and db://schema/tables/{tableName} URIs so the LLM can explore the schema without making tool calls that count against rate limits.
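The row-limit wrapper can be sketched as a small string transform. This is an illustration, not a SQL parser: wrapWithLimit is a hypothetical name, the default of 1000 rows is an assumption, and the checks are deliberately conservative (a semicolon inside a string literal is rejected too). It is an operational guard against unbounded scans; the read-only database user remains the actual security boundary.

```javascript
// Wrap a single user-supplied SELECT in an outer query that caps the row count.
function wrapWithLimit(userSql, maxRows = 1000) {
  if (!Number.isInteger(maxRows) || maxRows <= 0) {
    throw new RangeError('maxRows must be a positive integer');
  }
  // Drop trailing semicolons, which would break the wrapping subquery
  const trimmed = userSql.trim().replace(/;+\s*$/, '');
  if (trimmed.includes(';')) {
    throw new Error('Multiple statements are not allowed');
  }
  if (!/^(select|with)\b/i.test(trimmed)) {
    throw new Error('Only SELECT queries are allowed');
  }
  // The newline before ")" keeps a trailing "--" comment in the user SQL
  // from swallowing the closing parenthesis and the LIMIT clause.
  return `SELECT * FROM (${trimmed}\n) _q LIMIT ${maxRows}`;
}
```

For example, wrapWithLimit('SELECT id FROM items;', 50) yields the original query wrapped in an outer SELECT with LIMIT 50.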

API wrapper tools: server-side auth injection

API wrapper tools are the most tempting to implement insecurely because the natural API design, accepting api_key as a tool parameter, matches how users think about authentication. The problem is that tool arguments are recorded verbatim: API keys passed as tool arguments appear in the context window and in the tool call history.

The correct pattern injects credentials server-side in a shared fetch wrapper, never surfacing them to the tool parameter schema:

// Credentials loaded once at server startup — never in tool parameters
const GITHUB_TOKEN = process.env.GITHUB_TOKEN;

async function githubFetch(path, options = {}) {
  const response = await fetch(`https://api.github.com${path}`, {
    ...options,
    headers: {
      'Accept': 'application/vnd.github.v3+json',
      ...options.headers,
      // Set last so caller-supplied headers cannot override the credential
      'Authorization': `Bearer ${GITHUB_TOKEN}`,
    },
  });
  return mapGithubError(response);
}

// Tool parameter schema has no api_key field
server.tool('list_github_issues', {
  // Constrain repo to owner/name so it cannot add path segments or query strings
  // (the lookahead rejects a name made only of dots, such as "..")
  repo: z.string().regex(/^[\w-]+\/(?!\.+$)[\w.-]+$/).describe('owner/repo format'),
  state: z.enum(['open', 'closed', 'all']).default('open'),
}, async ({ repo, state }) => {
  const issues = await githubFetch(`/repos/${repo}/issues?state=${state}`);
  return { content: [{ type: 'text', text: JSON.stringify(issues) }] };
});

See the authentication guide and the full API wrapper guide for the complete error mapping pattern, rate limiter implementation (token bucket at 60 req/min), and the circuit breaker pattern (closed/open/half-open states, failure threshold of 5, 30-second recovery window) that prevents a degraded upstream API from causing every tool call in a session to hang until timeout.
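The circuit breaker mentioned above can be sketched as a small state machine. This is a minimal illustration using the figures from the text (failure threshold of 5, 30-second recovery window); the class and method names are hypothetical, the now option exists only to make the timing testable, and the half-open state here admits trial requests until one outcome is recorded rather than strictly one request.

```javascript
// closed: requests flow. open: requests are refused immediately.
// half-open: trial requests are allowed; the next recorded outcome decides.
class CircuitBreaker {
  constructor({ failureThreshold = 5, recoveryMs = 30_000, now = Date.now } = {}) {
    this.failureThreshold = failureThreshold;
    this.recoveryMs = recoveryMs;
    this.now = now;
    this.state = 'closed';
    this.failures = 0;
    this.openedAt = 0;
  }

  // Call before each upstream request; false means fail fast without calling.
  canRequest() {
    if (this.state === 'open' && this.now() - this.openedAt >= this.recoveryMs) {
      this.state = 'half-open';
    }
    return this.state !== 'open';
  }

  recordSuccess() {
    this.state = 'closed';
    this.failures = 0;
  }

  recordFailure() {
    this.failures += 1;
    // A failed trial reopens the circuit at once; otherwise wait for the threshold
    if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
      this.state = 'open';
      this.openedAt = this.now();
    }
  }
}
```

In a wrapper such as githubFetch, the tool would check canRequest() first and return an isError response straight away when it is false, instead of waiting for the upstream timeout.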

Pattern 2: Invisible failure modes

Security hardening addresses what happens when inputs are malicious. The second cross-cutting concern addresses what happens when external dependencies break — not due to attack, but due to normal operational failures: a disk fills up, a network policy changes, a database password rotates, a Docker daemon crashes, a third-party API subscription lapses. Each of these breaks tool execution in a way that is invisible to standard health checks.

The MCP protocol has a natural internal health surface: the initialize handshake and the tools/list response. Both are serviced by the MCP server process itself, with no dependency on any external system. A server with a full disk, a blocked outbound network, a broken database connection, a crashed Docker daemon, and an expired API key will still complete the initialize handshake and return its full tool list from tools/list.

Only when a tool is actually called does the external dependency failure surface — as an isError: true response. The failure-to-health-check gap for each category:

| Tool category | External dependency | Failure scenario | Tool response | Transport response |
| --- | --- | --- | --- | --- |
| Filesystem | OS filesystem | Disk full (ENOSPC on write) | isError: true | initialize: healthy |
| Filesystem | Workspace mount | WORKSPACE_DIR env var misconfigured | All calls: isError: true | tools/list: healthy |
| Web / fetch | Outbound network | Egress firewall rule blocks outbound HTTP | isError: true (timeout or connection refused) | initialize: healthy |
| Database | Database server | Password rotated; connection pool exhausted | All queries: isError: true | tools/list: healthy |
| Code execution | Docker daemon | Docker daemon crashed or socket permissions changed | isError: true (ENOENT on /var/run/docker.sock) | initialize: healthy |
| API wrapper | Third-party API | API key expired or subscription lapsed (HTTP 401/403) | All calls: isError: true | tools/list: healthy |

The pattern is consistent across all five categories: the MCP transport layer is decoupled from the external dependencies that tools rely on. This is correct MCP protocol behavior — a server that can still process the protocol should still respond to protocol-level requests. But it means that the de facto "is this server up?" check — pinging initialize — is an incomplete health signal for real-world tool servers.

What internal health checks cannot see

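The standard recommendation for MCP server health checks is to expose an HTTP /health endpoint that verifies transport liveness and, optionally, database connectivity. This catches a useful class of failures: the server process has crashed, or the database is unreachable at the network level. It does not catch the tool-level failures tabulated above, such as a full disk, a misconfigured workspace path, blocked egress, rotated database credentials, a crashed Docker daemon, or an expired API key.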

The only way to detect these failures is by calling the actual tools with representative inputs and observing whether they succeed or return isError: true. This is fundamentally an external probe — it cannot be done from inside the server process, because the server process cannot observe whether its own tool calls are working without making them. It requires a monitor that speaks the MCP protocol, connects as a client, calls tools with safe test inputs, and alerts when the tool response is an error or when the tool takes longer than expected.
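The decision logic of such a probe can be sketched independently of the transport. In the illustration below, callTool is a hypothetical function supplied by the caller (for example a thin wrapper around an MCP client) that resolves to a tool result of the usual shape, content plus an optional isError flag. The names classifyProbe and probeTool and the 5-second latency threshold are assumptions for the example.

```javascript
// Turn one probe outcome into a health verdict.
function classifyProbe({ result, thrown, latencyMs }, { maxLatencyMs = 5000 } = {}) {
  if (thrown) {
    return { healthy: false, reason: `transport or protocol error: ${thrown.message}` };
  }
  if (result?.isError) {
    const text = result.content?.find((part) => part.type === 'text')?.text ?? 'no detail';
    return { healthy: false, reason: `tool returned isError: ${text}` };
  }
  if (latencyMs > maxLatencyMs) {
    return { healthy: false, reason: `slow response: ${latencyMs} ms (limit ${maxLatencyMs} ms)` };
  }
  return { healthy: true, reason: 'ok' };
}

// Call one tool with safe test inputs and classify what came back.
async function probeTool(callTool, name, args, thresholds) {
  const started = Date.now();
  try {
    const result = await callTool(name, args);
    return classifyProbe({ result, latencyMs: Date.now() - started }, thresholds);
  } catch (thrown) {
    return classifyProbe({ thrown, latencyMs: Date.now() - started }, thresholds);
  }
}
```

The three unhealthy branches correspond to the three things the text asks a monitor to notice: the call could not be made at all, the tool answered with isError: true, or the tool answered too slowly.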

This is the architecture behind AliveMCP's monitoring: rather than pinging initialize and declaring the server healthy, AliveMCP's probes speak the full MCP protocol, call registered tools with safe test inputs, measure response latency per tool, and alert when any tool's error rate or latency exceeds its configured threshold. The probe results appear on the server's public status page as a per-tool health matrix, not just a single green/red indicator for the server.

Building the two-layer validation strategy

Given the two cross-cutting concerns — input security and invisible failure modes — a production real-world MCP server needs a two-layer validation strategy at development time and a two-layer monitoring strategy at production time.

Development: security validation + behavior validation

Security validation tests that bad inputs are correctly rejected. For each tool category, the critical tests are:

Behavior validation tests that valid inputs succeed and that error responses are LLM-readable. An isError: true response that says "Internal error: ENOENT" is correct but not useful. An isError: true response that says "File not found: reports/q2.csv. Check that the filename is correct and the file has been uploaded" is what an LLM can act on. The path in the message is relative to the workspace, for the same reason directory listings return relative paths.
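One way to get LLM-readable errors consistently is to map low-level error codes to actionable messages in a single place. The sketch below assumes Node-style errors with a code property; toToolError, the specific codes covered, and the message wording are illustrative choices rather than a fixed taxonomy.

```javascript
// Map of errno-style codes to messages an LLM can act on.
const ERROR_HINTS = new Map([
  ['ENOENT', (p) => `File not found: ${p}. Check that the filename is correct and the file has been uploaded.`],
  ['EACCES', (p) => `Permission denied: ${p}. The server is not allowed to access this file.`],
  ['ENOSPC', () => 'The server disk is full, so the write was not saved. Retrying will not help until space is freed.'],
]);

// Build an MCP tool error result. relativePath should be workspace-relative
// so the message does not reveal the server's directory layout.
function toToolError(err, relativePath) {
  const hint = ERROR_HINTS.get(err.code);
  const text = hint
    ? hint(relativePath)
    : 'Unexpected internal error. The operation did not complete.';
  return { isError: true, content: [{ type: 'text', text }] };
}
```

Unknown codes fall through to a generic message on purpose, so raw error strings and stack traces never reach the model; the original error can still be logged server-side.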

Production: transport liveness + tool execution monitoring

Transport liveness monitoring — the /health endpoint check — catches process crashes and database network failures. Tool execution monitoring catches everything else. The practical implementation for self-monitoring is a startup probe that calls each tool with safe test inputs and logs the results at INFO level. This catches misconfigured environment variables and permission regressions at deploy time rather than during the first real user tool call.
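The startup probe described above can be sketched as a loop over the server's own tool handlers. The registry shape used here (an array of name, handler, and probeArgs) is an assumption for illustration, as are the function name runStartupProbe and the tool names in the usage note; a real server would build the list from however it registers its tools.

```javascript
// Call each tool once with safe test inputs and log the outcome at INFO level.
// A handler that throws is recorded as a failure rather than aborting the probe.
async function runStartupProbe(tools, log = console) {
  const results = [];
  for (const { name, handler, probeArgs } of tools) {
    const started = Date.now();
    let ok = false;
    let detail = 'ok';
    try {
      const result = await handler(probeArgs);
      ok = !result.isError;
      if (!ok) detail = result.content?.[0]?.text ?? 'isError with no text';
    } catch (err) {
      detail = err.message;
    }
    const ms = Date.now() - started;
    log.info(`startup probe ${name}: ${ok ? 'ok' : 'FAILED'} (${ms} ms)${ok ? '' : ' ' + detail}`);
    results.push({ name, ok, detail, ms });
  }
  return results;
}
```

A server might call runStartupProbe with entries such as a read_file handler pointed at a known fixture file, then decide whether a failed probe should block startup or only be logged.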

For ongoing production monitoring, the startup probe pattern is not enough: it only runs at startup, not continuously, and it does not notify anyone when a tool starts failing mid-deployment. External monitoring is necessary for tools that depend on external systems that can fail independently of the server process. See the error handling guide for the structured error taxonomy and the mapping from tool-level errors to alert severity levels.

Implementation checklist by tool category

Use this checklist when building any real-world MCP tool. Each item addresses either the security pattern or the invisible failure pattern for that category.

Filesystem tools

Web / fetch tools

Database tools

Code execution tools

API wrapper tools

Related guides