Implementation guide · 2026-06-12 · Real-world MCP tools

Building Real-World MCP Tools: Filesystem, Web, Databases, Code Execution, and APIs

Most MCP tutorial examples are self-contained: a get_weather tool that calls an API, a calculator tool that does arithmetic. These are clean to reason about. Real MCP tools are different: they reach outside the process boundary to touch the filesystem, the network, a database, a container runtime, or a third-party API. When tool inputs arrive as LLM-generated strings, each of those external interactions becomes a potential attack vector: path traversal for filesystem tools, SSRF for web fetchers, SQL injection for database queries, sandbox escape for code execution, credential leakage for API wrappers. Every tool category has a different attack surface, but they all reduce to the same root cause: unsanitized LLM-provided input reaching an external system. This guide synthesizes the filesystem, web search, code execution, database, and API wrapper tool patterns into a unified framework, and explains a second cross-cutting problem that security alone cannot solve: invisible failure modes that break tool execution while leaving the MCP transport layer healthy.

TL;DR

Filesystem tools: path traversal is the primary risk. Every path argument must pass through assertSafePath(): path.resolve(ALLOWED_ROOT, arg) plus a check that the result starts with ALLOWED_ROOT + path.sep. Atomic writes via .tmp-{pid}-{timestamp} rename prevent partial-file reads. See the full filesystem guide for the read guard, depth-limited listing, and file resources pattern.
Web / fetch tools: SSRF is the primary risk. Never trust the URL argument: resolve the hostname to an IP address first, block RFC 1918, link-local, and loopback ranges, and connect only to an address that passed the check. Content-type validation and a byte limit keep unexpected payloads out of the tool's output and prevent OOM. See the full web-fetch guide for DNS rebinding defense and the 5-minute LRU response cache.
Code execution tools: sandbox escape is the primary risk. eval() provides no isolation; vm.Script can be escaped via prototype chains. Real isolation requires Docker with six explicit settings: --network none, --memory 256m --memory-swap 256m, --cpus 0.5, --read-only, --security-opt no-new-privileges, --pids-limit 64. See the full code execution guide for the per-language container setup and partial-result streaming pattern.
Database tools: SQL injection is the primary risk. Parameterized queries at the driver level eliminate injection through bound values; never build SQL strings by concatenation. A tool that accepts LLM-written SQL cannot parameterize the query text itself, so pair it with a read-only database user and LIMIT injection (SELECT * FROM (user_sql) _q LIMIT max_rows). See the full database guide for the schema-as-resource pattern and EXPLAIN cost guard.
API wrapper tools: credential leakage is the primary risk. Never accept API keys as tool arguments, because tool arguments are recorded in context windows and call histories. Inject credentials server-side in the shared apiFetch() wrapper. Pair with a circuit breaker to avoid cascading failures when upstream APIs degrade. See the full API wrapper guide for the rate limiter and mapApiError() pattern.
Universal failure modes: when an external dependency breaks in any of the five categories, tool calls return isError: true but the MCP transport (initialize, tools/list) remains healthy. Any monitor that only checks transport liveness will show green while every tool is broken. External protocol monitoring that calls the tools is the only way to catch this.

Pattern 1: Unsanitized LLM input reaching an external system

The common thread across all real-world MCP tool categories is not the technology stack — it is the trust boundary. In each case, input that the LLM generates (a file path, a URL, a code string, a SQL query fragment, a key name) crosses a boundary into an external system that enforces its own rules. If that input is not validated before crossing the boundary, the external system does what it does with any input — which may not be what the developer intended.

The five categories map to five distinct external system types, each with a characteristic attack surface:

Tool category	External system	Primary attack vector	Root cause	Defense
Filesystem	OS file API	Path traversal (`../../etc/passwd`)	Relative path segments bypass intended root	`path.resolve()` + allowed-root prefix check with `path.sep` suffix
Web / fetch	HTTP client + DNS	SSRF to internal network or metadata APIs	DNS resolution happens after URL validation, enabling rebinding	Resolve hostname to IP first; block RFC 1918 + link-local + loopback after resolution
Database	SQL engine	SQL injection via string interpolation	Query built by concatenation instead of parameterized binding	Parameterized queries at driver level; read-only DB user; LIMIT injection wrapper
Code execution	Runtime / OS	Sandbox escape to host filesystem or network	`eval()`/`vm.Script` share process + prototype chain with host	Docker with `--network none`, `--read-only`, `--security-opt no-new-privileges`, memory + PID limits
API wrapper	Third-party API	Credential leakage via tool argument logging	API keys passed as arguments appear in context window + call history	Server-side auth injection; never accept keys as tool parameters

The unifying frame: LLMs are good at generating structured output, but they are not security-aware. A model asked to "read the user's SSH keys" may generate ../../../home/user/.ssh/id_rsa as a file path argument — not because it is malicious, but because that is the answer to the question. The MCP server is responsible for rejecting that input before it reaches the OS. The attack surface is the gap between what the LLM can generate and what the external system will accept.

Filesystem tools: path traversal defense in depth

A filesystem MCP server gives an LLM read and write access to files, which is exactly why it needs the most conservative input validation of any tool category. The core invariant is simple: every file operation must stay within an explicitly declared root directory. The implementation is subtle: a bare resolved.startsWith(WORKSPACE) check is wrong because /workspace-evil starts with /workspace. The correct check appends path.sep to the root before comparing:

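const path = require('node:path');

// Root directory the tool may touch, resolved once at startup.
const WORKSPACE = path.resolve(process.env.WORKSPACE_DIR || '/workspace');

function assertSafePath(userInput) {
  // Resolve against WORKSPACE (not process.cwd()); this collapses every ../ segment.
  const resolved = path.resolve(WORKSPACE, userInput);
  // The path.sep suffix rejects siblings such as /workspace-evil.
  if (resolved !== WORKSPACE && !resolved.startsWith(WORKSPACE + path.sep)) {
    throw new Error('Access denied: path outside workspace');
  }
  return resolved;
}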

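path.resolve() collapses all ../ segments before the prefix check runs, so no number of ../ levels can carry the result outside the root. path.resolve() is purely lexical, however: it does not follow symlinks, so a symlink inside the workspace that points elsewhere still passes the check. If the workspace can contain symlinks, run the resolved path (or, for a file that does not exist yet, its parent directory) through fs.realpath() and repeat the prefix check. The path.sep suffix prevents the /workspace-evil false positive.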

Beyond path validation, production filesystem tools need three additional patterns: stat-before-read guards that check file size before reading (prevents OOM from multi-GB files), atomic write patterns using .tmp-{pid}-{timestamp} rename (prevents partial-file reads during writes), and depth-limited directory listings that return relative paths (absolute paths leak server directory structure to the LLM context). The MCP resources API is the right transport for file content injection into context — use file:// URIs as resources rather than read_file tool calls for large reference documents that the LLM needs throughout a session.
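
A minimal sketch of the stat-before-read guard and the atomic write, assuming the path has already passed assertSafePath(). MAX_READ_BYTES and its 1 MiB value are placeholders rather than figures from this guide, and the helper names readFileGuarded and writeFileAtomic are hypothetical.

```javascript
const fs = require('node:fs/promises');
const path = require('node:path');

// Placeholder cap; pick a value that fits your context budget.
const MAX_READ_BYTES = 1024 * 1024; // 1 MiB

// Stat-before-read: refuse oversized or non-regular files before loading them.
async function readFileGuarded(safePath, maxBytes = MAX_READ_BYTES) {
  const stats = await fs.stat(safePath);
  if (!stats.isFile()) {
    throw new Error(`Not a regular file: ${path.basename(safePath)}`);
  }
  if (stats.size > maxBytes) {
    throw new Error(`File is ${stats.size} bytes; the limit is ${maxBytes} bytes`);
  }
  return fs.readFile(safePath, 'utf8');
}

// Atomic write: write a temp file in the same directory, then rename it over
// the target, so readers see either the old content or the new, never a partial file.
async function writeFileAtomic(safePath, data) {
  const tmpPath = path.join(
    path.dirname(safePath),
    `.tmp-${process.pid}-${Date.now()}`
  );
  try {
    await fs.writeFile(tmpPath, data, 'utf8');
    await fs.rename(tmpPath, safePath);
  } catch (err) {
    await fs.rm(tmpPath, { force: true }); // do not leave temp files behind
    throw err;
  }
}
```

The temp file lives next to the target rather than in os.tmpdir() because rename() only replaces a file atomically within a single filesystem.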

Web fetch tools: SSRF and DNS rebinding

SSRF (Server-Side Request Forgery) is the web-fetch equivalent of path traversal: an attacker supplies a URL that causes the server to make requests to internal infrastructure the attacker cannot reach directly. The classic defense — checking whether the URL's hostname is a private IP address — fails against DNS rebinding attacks, where a public hostname is resolved to a private IP after the hostname check passes.

The correct defense resolves the hostname to an IP address first, then checks the IP against blocked ranges before making the connection:

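const dns = require('node:dns');

async function assertSafeUrl(rawUrl) {
  const url = new URL(rawUrl); // throws TypeError on malformed input
  if (!['http:', 'https:'].includes(url.protocol)) {
    throw new Error('Only http/https allowed');
  }
  // lookup() uses the same resolver as the HTTP client (IP literals, /etc/hosts);
  // check every returned address, not only the first.
  const addresses = await dns.promises.lookup(url.hostname, { all: true, family: 4 });
  for (const { address } of addresses) {
    if (isBlockedIp(address)) {
      throw new Error(`Blocked: ${url.hostname} resolves to private IP ${address}`);
    }
  }
  return url;
}

function isBlockedIp(ip) {
  // 0/8, 10/8, 172.16/12, 192.168/16, 127/8 loopback, 169.254/16 link-local, 240/4 reserved
  return /^(0\.|10\.|172\.(1[6-9]|2\d|3[01])\.|192\.168\.|127\.|169\.254\.|24\d\.|25[0-5]\.)/.test(ip);
}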

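The regex covers the RFC 1918 private ranges, loopback, the link-local range (which includes the cloud instance metadata endpoint at 169.254.169.254), and the reserved 240.0.0.0/4 block. Resolving inside the check does not by itself defeat DNS rebinding: fetch(url) resolves the hostname again when it connects, so an attacker's DNS server with a short TTL can return a public IP to the check and a private IP to the connection. Closing that window requires pinning the connection to an address that passed the check, either by connecting to the validated IP directly or by running the check inside the HTTP client's DNS lookup hook. See the SSRF prevention guide for the full blocked CIDR list and the IPv6 equivalent patterns.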
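
One way to pin the connection, sketched for node:http and node:https because http.request() and https.request() document a `lookup` option that replaces the default dns.lookup(). The check then runs on the exact addresses the socket will use. safeLookup is a hypothetical name, the sketch is IPv4-only because the regex is, and the global fetch() would need an equivalent hook on its own dispatcher instead.

```javascript
const dns = require('node:dns');

// Same IPv4 blocklist as above (0/8, RFC 1918, loopback, link-local, 240/4).
function isBlockedIp(ip) {
  return /^(0\.|10\.|172\.(1[6-9]|2\d|3[01])\.|192\.168\.|127\.|169\.254\.|24\d\.|25[0-5]\.)/.test(ip);
}

// Custom lookup for http/https requests: the socket can only connect to
// addresses that passed isBlockedIp(), so there is no second, unchecked resolution.
function safeLookup(hostname, options, callback) {
  dns.lookup(hostname, { ...options, family: 4, all: true }, (err, addresses) => {
    if (err) return callback(err);
    const blocked = addresses.find((a) => isBlockedIp(a.address));
    if (blocked) {
      return callback(new Error(`Blocked: ${hostname} resolves to private IP ${blocked.address}`));
    }
    // Newer Node versions may ask for all addresses (e.g. for autoSelectFamily).
    if (options && options.all) return callback(null, addresses);
    return callback(null, addresses[0].address, addresses[0].family);
  });
}
```

For example, `https.get(url, { lookup: safeLookup }, onResponse)` emits the Blocked error on the request before any socket is opened to a private address.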

Beyond SSRF defense, production web-fetch tools need response size limits (500 KB default, checked against Content-Length before reading and enforced again while streaming, since the header can be missing or wrong), AbortController timeouts (10s default), content-type validation, HTML-to-text stripping that removes <script> and <style> blocks before tag stripping, and a URL-keyed LRU response cache (5-minute TTL, 500-entry limit, never caching errors). Per-domain rate limiting at 1 request per second prevents the tool from being used to DoS third-party sites.
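
A sketch of the size limit and timeout for the global fetch() in Node 18+, assuming the 500 KB and 10 s defaults above. It uses AbortSignal.timeout() in place of a hand-managed AbortController; readBodyLimited and fetchLimited are hypothetical names, and the content-type allowlist (text/* and application/json) is an illustrative choice.

```javascript
const MAX_BODY_BYTES = 500 * 1024;
const FETCH_TIMEOUT_MS = 10_000;

// Enforces the byte cap twice: against the declared Content-Length, then while
// streaming. Returns the (possibly truncated) text plus a truncation flag.
async function readBodyLimited(response, maxBytes = MAX_BODY_BYTES) {
  const declared = Number(response.headers.get('content-length')); // 0 if absent
  if (declared > maxBytes) {
    await response.body?.cancel();
    throw new Error(`Response is ${declared} bytes; the limit is ${maxBytes} bytes`);
  }
  if (!response.body) return { text: '', truncated: false };
  const reader = response.body.getReader();
  const chunks = [];
  let received = 0;
  let truncated = false;
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    const room = maxBytes - received;
    if (value.byteLength > room) {
      chunks.push(value.subarray(0, room)); // may split a multi-byte character
      truncated = true;
      await reader.cancel();
      break;
    }
    chunks.push(value);
    received += value.byteLength;
  }
  return { text: Buffer.concat(chunks).toString('utf8'), truncated };
}

async function fetchLimited(url, { maxBytes = MAX_BODY_BYTES, timeoutMs = FETCH_TIMEOUT_MS } = {}) {
  const response = await fetch(url, { signal: AbortSignal.timeout(timeoutMs) });
  const type = response.headers.get('content-type') || '';
  if (!/^(text\/|application\/json)/.test(type)) {
    await response.body?.cancel();
    throw new Error(`Unsupported content type: ${type || 'unknown'}`);
  }
  return readBodyLimited(response, maxBytes);
}
```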

Code execution tools: why eval() and vm.Script are not sandboxes

The temptation with code execution tools is to reach for eval() or Node.js's built-in vm.Script because they are readily available and easy to implement. Both provide weak or no isolation:

Approach	Isolation level	Known escape	Filesystem access	Network access
`eval()`	None	Full process access by definition	Full	Full
`vm.Script` + `runInThisContext()`	Local scope only (shares the host's global object)	None needed: `process` is a reachable global	Full	Full
`vm.Script` + `runInNewContext()`	Separate global object, same process	Prototype chain escape via `this.constructor.constructor('return process')()`	Full (once `process` is reached)	Full (once `process` is reached)
Docker with flags	Container isolation (shares the host kernel)	None known with the flags below; kernel exploits remain possible	Read-only volume only	None (`--network none`)

The six Docker settings that make container isolation real: --network none (no egress: blocks all network calls from executed code), --memory 256m --memory-swap 256m (no swap: the container is OOM-killed instead of swapping to disk), --cpus 0.5 (CPU quota prevents host starvation), --read-only (filesystem immutable except explicit volume mounts), --security-opt no-new-privileges (no setuid/setgid escalation), --pids-limit 64 (fork bomb prevention). All six are necessary: omitting any one reopens a specific escape or resource-exhaustion vector.

The full code execution guide covers the per-language image setup (Python, Node, bash), the execFile-with-timeout pattern (no shell expansion), volume mounting code read-only at /sandbox, the finally block for tmpDir cleanup, and pre-pulling images at server startup to avoid cold-start latency on the first tool call. For long-running computations, the partial-result pattern emits intermediate output as MCP notifications rather than blocking the tool call until completion.
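
A sketch of the execFile pattern with the six isolation settings, assuming the Docker CLI is on PATH and the code has already been written to a host temp directory. buildDockerArgs, runInSandbox, and the example image are hypothetical; note that docker run takes no-new-privileges through --security-opt.

```javascript
const { execFile } = require('node:child_process');

// Builds `docker run` arguments with the isolation settings described above.
function buildDockerArgs({ image, hostDir, command }) {
  return [
    'run', '--rm',
    '--network', 'none',                         // no egress
    '--memory', '256m', '--memory-swap', '256m', // no swap: OOM-kill instead
    '--cpus', '0.5',                             // CPU quota
    '--read-only',                               // immutable root filesystem
    '--security-opt', 'no-new-privileges',       // block setuid/setgid escalation
    '--pids-limit', '64',                        // fork-bomb ceiling
    '-v', `${hostDir}:/sandbox:ro`,              // code mounted read-only
    '-w', '/sandbox',
    image,
    ...command,
  ];
}

// execFile passes argv straight to docker, so nothing is shell-expanded.
function runInSandbox(spec, { timeoutMs = 10_000, maxBuffer = 1024 * 1024 } = {}) {
  return new Promise((resolve) => {
    execFile('docker', buildDockerArgs(spec), { timeout: timeoutMs, maxBuffer }, (err, stdout, stderr) => {
      // Exit code, spawn/maxBuffer error code, or 'timeout' when the timer fired.
      const error = err ? String(err.code ?? (err.killed ? 'timeout' : err.signal)) : null;
      resolve({ ok: !err, error, stdout, stderr });
    });
  });
}

// Example (hypothetical paths):
// runInSandbox({ image: 'python:3.12-slim', hostDir: '/tmp/job-123', command: ['python', 'main.py'] });
```

One caveat: execFile's timeout kills the docker CLI process, which may not stop the container itself; a fuller version would name each container with --name and run docker kill when the timeout fires.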

Database tools: parameterized queries and the read-only user

SQL injection is one of the oldest vulnerabilities in software, and it appears in MCP database tools through the same mechanism it always has: a query built by string interpolation instead of parameterized binding. The correct defense is driver-level parameterized queries — no string concatenation, no template literals embedding user input into SQL:

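// Wrong: SQL injection vector (identifier and value both interpolated)
const rows = await client.query(`SELECT * FROM ${table} WHERE id = ${id}`);

// Correct: parameterized binding (node-postgres); rows live on result.rows
const { rows: pgRows } = await client.query(
  'SELECT * FROM items WHERE id = $1 AND owner = $2',
  [id, ownerId]
);

// Correct: better-sqlite3 (synchronous, positional ? bindings)
const sqliteRows = db.prepare(
  'SELECT * FROM items WHERE id = ? AND owner = ?'
).all(id, ownerId);

// Identifiers such as table names cannot be bound as parameters;
// pick them from a fixed allowlist instead of interpolating LLM input.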

The parameterized form is not optional based on risk assessment — it applies to every query including queries that look safe, because LLM-generated inputs are not predictable and injection can be composed across multiple tool calls within a session. See the full database guide for the parameterized query syntax table across postgres.js, node-postgres, better-sqlite3, Prisma, and Drizzle.

Parameterized binding protects values the server places into a query it wrote; it cannot protect a tool that accepts free-form SQL from the LLM, because there the query text itself is the input. Three additional patterns complete the production database tool: a read-only database user (the MCP server's credentials cannot write if they are read-only at the database level, so an injected or LLM-written write fails at the permission level even when it gets past every application-level check), LIMIT injection wrapping any user-supplied query in SELECT * FROM (user_sql) _q LIMIT max_rows (LLMs generating unbounded scans is a real operational problem), and the schema-as-resource pattern, which exposes table definitions via db://schema/overview and db://schema/tables/{tableName} URIs so the LLM can explore the schema without making tool calls that count against rate limits.
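
A sketch of the isSelectStatement() guard named in the database checklist combined with the LIMIT wrapper, assuming the tool receives a complete SQL string from the LLM. Both helpers are hypothetical. The keyword check is deliberately conservative (it rejects any inner semicolon, even inside a string literal) and is a heuristic, not a SQL parser, so the read-only user remains the real enforcement.

```javascript
// Accepts a single SELECT (or WITH ... SELECT) statement; one trailing ';' is allowed.
function isSelectStatement(sql) {
  const trimmed = sql.trim().replace(/;\s*$/, '');
  if (trimmed.includes(';')) return false;   // reject stacked statements
  return /^(select|with)\b/i.test(trimmed);  // SELECT or CTE only
}

// Wraps LLM-written SQL so the engine never returns more than maxRows rows.
// maxRows is validated because it is formatted into the query text.
function wrapWithLimit(userSql, maxRows = 100) {
  if (!isSelectStatement(userSql)) {
    throw new Error('Only single SELECT statements are allowed');
  }
  if (!Number.isInteger(maxRows) || maxRows < 1) {
    throw new Error('maxRows must be a positive integer');
  }
  const inner = userSql.trim().replace(/;\s*$/, '');
  // Newline before ')' so a trailing "--" comment cannot swallow the LIMIT.
  return `SELECT * FROM (${inner}\n) _q LIMIT ${maxRows}`;
}
```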

API wrapper tools: server-side auth injection

API wrapper tools are the most tempting to implement insecurely because the natural API design, accepting api_key as a tool parameter, matches how users think about authentication. The problem is that tool arguments do not stay private. API keys passed as tool arguments appear in:

The LLM context window (visible to subsequent reasoning steps and potentially included in summarized outputs)
Request and call logs kept by the LLM provider or the client application
MCP server logs if the server logs tool call arguments for debugging
Any observability tooling that captures tool inputs (tracing, metrics, replay tools)

The correct pattern injects credentials server-side in a shared fetch wrapper, never surfacing them to the tool parameter schema:

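const { z } = require('zod');
// `server` is an McpServer from @modelcontextprotocol/sdk, created elsewhere.

// Credentials loaded once at server startup, never in tool parameters
const GITHUB_TOKEN = process.env.GITHUB_TOKEN;

// Minimal error mapping: parsed JSON on success, LLM-readable errors otherwise.
async function mapGithubError(response) {
  if (response.ok) return response.json();
  if (response.status === 401) throw new Error('GitHub rejected the server token (HTTP 401)');
  if (response.status === 403) throw new Error('GitHub refused access or rate-limited the request (HTTP 403)');
  if (response.status === 404) throw new Error('Not found: check that the owner/repo name is correct');
  throw new Error(`GitHub API error: HTTP ${response.status}`);
}

async function githubFetch(path, options = {}) {
  const response = await fetch(`https://api.github.com${path}`, {
    ...options,
    headers: {
      'Authorization': `Bearer ${GITHUB_TOKEN}`,
      'Accept': 'application/vnd.github.v3+json',
      ...options.headers,
    },
  });
  return mapGithubError(response);
}

// Tool parameter schema has no api_key field
server.tool('list_github_issues', {
  // Constrain repo so LLM input cannot add path segments (e.g. "../user") or a query string
  repo: z.string()
    .regex(/^[\w.-]+\/[\w.-]+$/, 'expected owner/repo')
    .refine((s) => s.split('/').every((part) => part !== '.' && part !== '..'), 'invalid owner/repo')
    .describe('owner/repo format'),
  state: z.enum(['open', 'closed', 'all']).default('open'),
}, async ({ repo, state }) => {
  const issues = await githubFetch(`/repos/${repo}/issues?state=${state}`);
  return { content: [{ type: 'text', text: JSON.stringify(issues) }] };
});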

See the authentication guide and the full API wrapper guide for the complete error mapping pattern, rate limiter implementation (token bucket at 60 req/min), and the circuit breaker pattern (closed/open/half-open states, failure threshold of 5, 30-second recovery window) that prevents a degraded upstream API from causing every tool call in a session to hang until timeout.
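
A minimal circuit breaker with the closed/open/half-open states and the defaults cited above (5 consecutive failures, 30 s recovery), assuming one in-memory breaker per upstream API. The CircuitBreaker class and its injectable clock are illustrative, not the API wrapper guide's implementation.

```javascript
class CircuitBreaker {
  constructor({ failureThreshold = 5, recoveryMs = 30_000, now = Date.now } = {}) {
    this.failureThreshold = failureThreshold;
    this.recoveryMs = recoveryMs;
    this.now = now;              // injectable clock, handy in tests
    this.state = 'closed';
    this.failures = 0;           // consecutive failures
    this.openedAt = 0;
    this.trialInFlight = false;  // at most one trial call while half-open
  }

  async call(fn) {
    if (this.state === 'open' && this.now() - this.openedAt >= this.recoveryMs) {
      this.state = 'half-open';  // recovery window elapsed: allow one trial
    }
    if (this.state === 'open' || (this.state === 'half-open' && this.trialInFlight)) {
      throw new Error('Upstream API is unavailable (circuit open); try again later');
    }
    const isTrial = this.state === 'half-open';
    if (isTrial) this.trialInFlight = true;
    try {
      const result = await fn();
      this.state = 'closed';
      this.failures = 0;
      return result;
    } catch (err) {
      this.failures += 1;
      if (isTrial || this.failures >= this.failureThreshold) {
        this.state = 'open';
        this.openedAt = this.now();
      }
      throw err;
    } finally {
      if (isTrial) this.trialInFlight = false;
    }
  }
}
```

Routing upstream calls through it, for example `breaker.call(() => githubFetch(path))`, makes tool calls fail immediately with a readable message during a sustained outage instead of each one waiting for its own timeout.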

Pattern 2: Invisible failure modes

Security hardening addresses what happens when inputs are malicious. The second cross-cutting concern addresses what happens when external dependencies break — not due to attack, but due to normal operational failures: a disk fills up, a network policy changes, a database password rotates, a Docker daemon crashes, a third-party API subscription lapses. Each of these breaks tool execution in a way that is invisible to standard health checks.

The MCP protocol has a natural internal health surface: the initialize handshake and the tools/list response. Both are serviced by the MCP server process itself, with no dependency on any external system. A server with a full disk, a blocked outbound network, a broken database connection, a crashed Docker daemon, and an expired API key will still:

Accept initialize and return its server info
Respond to tools/list with a full list of registered tools
Return HTTP 200 from any health check endpoint that only verifies process liveness

Only when a tool is actually called does the external dependency failure surface — as an isError: true response. The failure-to-health-check gap for each category:

Tool category	External dependency	Failure scenario	Tool response	Transport response
Filesystem	OS filesystem	Disk full (ENOSPC on write)	`isError: true`	`initialize`: healthy
Filesystem	Workspace mount	`WORKSPACE_DIR` env var misconfigured	All calls: `isError: true`	`tools/list`: healthy
Web / fetch	Outbound network	Egress firewall rule blocks outbound HTTP	`isError: true` (timeout or connection refused)	`initialize`: healthy
Database	Database server	Password rotated; connection pool exhausted	All queries: `isError: true`	`tools/list`: healthy
Code execution	Docker daemon	Docker daemon crashed or socket permissions changed	`isError: true` (cannot connect to `/var/run/docker.sock`: ENOENT, ECONNREFUSED, or EACCES)	`initialize`: healthy
API wrapper	Third-party API	API key expired or subscription lapsed (HTTP 401/403)	All calls: `isError: true`	`tools/list`: healthy

The pattern is consistent across all five categories: the MCP transport layer is decoupled from the external dependencies that tools rely on. This is correct MCP protocol behavior — a server that can still process the protocol should still respond to protocol-level requests. But it means that the de facto "is this server up?" check — pinging initialize — is an incomplete health signal for real-world tool servers.

What internal health checks cannot see

The standard recommendation for MCP server health checks is to expose an HTTP /health endpoint that verifies transport liveness and, optionally, database connectivity. This catches a useful class of failures — the server process has crashed, the database is unreachable at the network level. It does not catch:

Permission-level database failures — if the database is reachable but the user's password has been rotated, a SELECT 1 health check using a separate admin connection will pass while all tool queries using the application user fail with authentication errors.
Misconfigured environment variables — if WORKSPACE_DIR points to a path that does not exist, the server starts cleanly, passes its health check, and fails every filesystem tool call.
Container daemon failures — if the Docker daemon crashes after the server starts, /health returns 200 while every execute_code call fails.
Upstream API outages — if a third-party API returns 503, the MCP server health check is unaffected. Every tool call that reaches that API fails until the upstream recovers.
Network policy regressions — if a firewall rule change blocks outbound HTTP on a new deployment but not the existing deployment that passed health checks at startup, web fetch tools fail silently on the new instance.

The only way to detect these failures is by calling the actual tools with representative inputs and observing whether they succeed or return isError: true. For continuous coverage this is best done as an external probe: a check running inside the server process stops reporting exactly when the process hangs or becomes unreachable, and it does not exercise the transport path that real clients use. It requires a monitor that speaks the MCP protocol, connects as a client, calls tools with safe test inputs, and alerts when the tool response is an error or when the tool takes longer than expected.
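
A sketch of a single probe step, assuming `client` is an already-connected MCP client exposing callTool({ name, arguments }), as the TypeScript SDK's Client does. probeTool and the latency budget are hypothetical; scheduling, transport setup, and alerting are omitted.

```javascript
// Calls one tool and classifies the outcome: ok, tool-level error, slow, or call failure.
async function probeTool(client, { name, args, maxLatencyMs }) {
  const started = performance.now();
  try {
    const result = await client.callTool({ name, arguments: args });
    const latencyMs = Math.round(performance.now() - started);
    if (result.isError) {
      // Transport is fine, but the tool's external dependency failed.
      const detail = (result.content || []).map((c) => c.text).filter(Boolean).join(' ');
      return { name, ok: false, latencyMs, reason: `tool error: ${detail || 'no message'}` };
    }
    if (latencyMs > maxLatencyMs) {
      return { name, ok: false, latencyMs, reason: `slow: ${latencyMs} ms > ${maxLatencyMs} ms` };
    }
    return { name, ok: true, latencyMs };
  } catch (err) {
    // Protocol-level failure: the request itself did not complete.
    const latencyMs = Math.round(performance.now() - started);
    return { name, ok: false, latencyMs, reason: `call failed: ${err.message}` };
  }
}
```

A probe loop would run this for each tool on a schedule and alert on consecutive failures rather than on a single slow response.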

This is the architecture behind AliveMCP's monitoring: rather than pinging initialize and declaring the server healthy, AliveMCP's probes speak the full MCP protocol, call registered tools with safe test inputs, measure response latency per tool, and alert when any tool's error rate or latency exceeds its configured threshold. The probe results appear on the server's public status page as a per-tool health matrix, not just a single green/red indicator for the server.

Building the two-layer validation strategy

Given the two cross-cutting concerns — input security and invisible failure modes — a production real-world MCP server needs a two-layer validation strategy at development time and a two-layer monitoring strategy at production time.

Development: security validation + behavior validation

Security validation tests that bad inputs are correctly rejected. For each tool category, the critical tests are:

Filesystem: ../../../etc/passwd → throws path-outside-workspace error
Web fetch: http://169.254.169.254/latest/meta-data/ → throws SSRF blocked error
Database: '; DROP TABLE items; -- as query fragment → parameterized binding makes this a literal string value, not SQL
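Code execution: code that writes outside a mounted volume or opens a network connection → fails inside the container (read-only filesystem, --network none); code that reads /etc/passwd sees only the image's own file, not the host's
API wrapper: tools/list shows no api_key (or any other credential) field in any tool's input schema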

Behavior validation tests that valid inputs succeed and that error responses are LLM-readable. An isError: true response that says "Internal error: ENOENT" is correct but not useful. An isError: true response that says "File not found: /workspace/reports/q2.csv — check that the filename is correct and the file has been uploaded" is what an LLM can act on.
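
A sketch of turning raw Node.js error codes into messages like the second example above, assuming a filesystem tool that catches errors and reports paths relative to the workspace. The MESSAGES table and toToolError are hypothetical; the isError/content shape is the standard MCP tool result.

```javascript
// Maps common Node.js filesystem error codes to messages an LLM can act on.
const MESSAGES = {
  ENOENT: (p) => `File not found: ${p}. Check that the filename is correct and the file has been uploaded.`,
  EACCES: (p) => `Permission denied: ${p}. The server is not allowed to access this path.`,
  EISDIR: (p) => `${p} is a directory. List its contents instead of reading it.`,
  ENOSPC: (p) => `Could not write ${p}: the server's disk is full.`,
};

function toToolError(err, displayPath) {
  const format = MESSAGES[err.code];
  const text = format
    ? format(displayPath)
    : `Unexpected error on ${displayPath} (${err.code || 'no code'}): ${err.message}`;
  return { isError: true, content: [{ type: 'text', text }] };
}
```

Passing the workspace-relative path rather than the absolute one keeps the message actionable without leaking the server's directory layout.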

Production: transport liveness + tool execution monitoring

Transport liveness monitoring — the /health endpoint check — catches process crashes and database network failures. Tool execution monitoring catches everything else. The practical implementation for self-monitoring is a startup probe that calls each tool with safe test inputs and logs the results at INFO level. This catches misconfigured environment variables and permission regressions at deploy time rather than during the first real user tool call.
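
A sketch of a startup probe runner, assuming each tool category contributes an async check that throws on failure. runStartupProbes and probeWorkspaceWritable are hypothetical names; the workspace check mirrors the filesystem checklist item of writing and reading back a test file.

```javascript
const fs = require('node:fs/promises');
const path = require('node:path');

// Runs every probe once, logs each outcome, and returns a summary.
async function runStartupProbes(probes, log = console) {
  const names = Object.keys(probes);
  const results = await Promise.allSettled(names.map((name) => probes[name]()));
  const failed = [];
  results.forEach((result, i) => {
    if (result.status === 'fulfilled') {
      log.info(`[startup-probe] ${names[i]}: ok`);
    } else {
      failed.push(names[i]);
      log.error(`[startup-probe] ${names[i]}: FAILED - ${result.reason?.message ?? result.reason}`);
    }
  });
  return { ok: failed.length === 0, failed };
}

// Example probe: verifies the workspace is mounted and writable.
async function probeWorkspaceWritable(workspace) {
  const file = path.join(workspace, `.probe-${process.pid}`);
  await fs.writeFile(file, 'ok');
  const back = await fs.readFile(file, 'utf8');
  await fs.rm(file, { force: true });
  if (back !== 'ok') throw new Error('read-back mismatch');
}
```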

For ongoing production monitoring, the startup probe pattern is not enough: it only runs at startup, not continuously, and it does not notify anyone when a tool starts failing mid-deployment. External monitoring is necessary for tools that depend on external systems that can fail independently of the server process. See the error handling guide for the structured error taxonomy and the mapping from tool-level errors to alert severity levels.

Implementation checklist by tool category

Use this checklist when building any real-world MCP tool. Each item addresses either the security pattern or the invisible failure pattern for that category.

Filesystem tools

☐ assertSafePath() with path.resolve() + ALLOWED_ROOT + path.sep prefix check on every path argument
☐ Stat-before-read guard checking file size against MAX_READ_BYTES
☐ Atomic writes using .tmp-{pid}-{timestamp} → fs.rename()
☐ Directory listing returns relative paths with depth limit (1–5)
☐ delete_file requires confirm: true literal parameter
☐ Startup probe writes and reads a test file to verify workspace is writable and correctly mounted

Web / fetch tools

☐ assertSafeUrl() resolves the hostname and checks every returned IP before connection; the connection is pinned to a checked IP so a second DNS answer cannot redirect it
☐ isBlockedIp() rejects RFC 1918, loopback, 169.254.x.x, reserved ranges
☐ Protocol whitelist: only http: and https:
☐ AbortController with 10s timeout on every fetch
☐ Response size limit checked against Content-Length + streaming truncation
☐ URL-keyed LRU response cache (5-min TTL, 500 entries, errors not cached)
☐ Per-domain rate limiter at 1 req/sec
☐ Startup probe fetches a known-good URL to verify egress is allowed

Database tools

☐ All queries use parameterized bindings — zero string interpolation
☐ Database user is read-only (SELECT only, no INSERT/UPDATE/DELETE/DDL)
☐ User-supplied query wrapped in SELECT * FROM (…) _q LIMIT max_rows
☐ isSelectStatement() guard rejects non-SELECT queries
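☐ SET LOCAL statement_timeout = '5s' inside each query's transaction (SET LOCAL expires when the transaction ends; use SET for a per-connection default)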
☐ Schema exposed as MCP resources (db://schema/overview, db://schema/tables/{name})
☐ Startup probe executes SELECT 1 with application credentials (not admin credentials)

Code execution tools

☐ Docker with all six flags: --network none, --memory 256m --memory-swap 256m, --cpus 0.5, --read-only, --security-opt no-new-privileges, --pids-limit 64
☐ Code mounted read-only at /sandbox, not copied into image
☐ execFile with explicit timeout and maxBuffer limit (no shell expansion)
☐ Tmpdir cleaned in finally block regardless of success/failure
☐ Container images pre-pulled at startup to avoid cold-start latency
☐ Startup probe runs a trivial program (print 42) to verify Docker daemon is accessible and functional

API wrapper tools

☐ Credentials loaded from environment variables at startup — never accepted as tool parameters
☐ Shared apiFetch() wrapper injects auth headers server-side
☐ One tool per API operation (not a generic call_api tool)
☐ mapApiError() converts HTTP status codes to LLM-readable error messages
☐ Rate limiter (token bucket) per API to stay within upstream rate limits
☐ Circuit breaker (failure threshold 5, 30s recovery) to fail fast during sustained outages
☐ Startup probe makes a low-cost API call (e.g., GET /user, GET /ping) to verify credential validity

Related guides

Filesystem MCP server guide — path traversal defense, atomic writes, file resources
Web search and HTTP fetch MCP tools — SSRF defense, DNS rebinding, LRU caching
Code execution MCP server — Docker isolation, sandbox flags, partial results
Database tools MCP server — parameterized queries, read-only user, schema resources
API wrapper MCP server — server-side auth, circuit breaker, rate limiting
SSRF prevention for MCP servers — full CIDR blocklist, IPv6 defense
MCP server authentication — Bearer tokens, API key validation, mTLS
Circuit breaker pattern for MCP servers — state machine, failure threshold, recovery
MCP server error handling — isError taxonomy, LLM-readable messages, structured errors
MCP server health checks — /health endpoint, liveness vs readiness, monitoring gap
MCP resources API — file:// URIs, db:// URIs, resource subscriptions