MCP server request batching
LLM agents frequently invoke the same tool many times, back to back or in parallel: fetching 10 user records one at a time, looking up 20 product prices one by one, or resolving a list of IDs before the next reasoning step. Each call triggers a separate database query, turning what should be a single round-trip into N round-trips: the N+1 problem at the MCP layer. Request batching coalesces concurrent single-item queries into one multi-item query, which can cut database round-trips by 10–100× without changing the agent or the tool's public interface.
TL;DR
Use a DataLoader (from the dataloader package, originally developed at Facebook) per resource type. Each DataLoader collects the individual key requests that arrive within the same Node.js event loop tick, fires a single batch query for all of them, and fans the results back out to each caller. The tool handler calls loader.load(id) and awaits it, so the batching is transparent. Create loaders per incoming MCP request or session, never once at server start, so batches and cached rows stay scoped to a single agent.
The N+1 problem in MCP servers
This pattern is familiar from GraphQL, but it appears in MCP servers for the same reason: a higher-level consumer (agent or resolver) maps over a list and calls a per-item fetcher for each element.
Consider an agent that needs to process a list of orders:
- Agent calls list_orders → server returns [{orderId: 'A'}, {orderId: 'B'}, ..., {orderId: 'J'}]
- Agent calls get_order_details with orderId: 'A'
- Agent calls get_order_details with orderId: 'B'
- ... 8 more calls ...
Without batching: 10 separate SELECT * FROM orders WHERE id = $1 queries. With batching: 1 query, SELECT * FROM orders WHERE id = ANY($1), carrying all 10 IDs. Against a remote database, even with a connection pool, every extra round-trip adds network latency, so 10 round-trips instead of 1 can cost hundreds of milliseconds at the median and considerably more at the tail.
Agents make this worse than classic web apps because:
- Parallel tool calls: many agent frameworks invoke independent tools in parallel. The MCP spec allows concurrent requests on the same session, so multiple get_order_details calls can arrive simultaneously.
- Long task chains: a single agent task may involve 50–200 tool calls. An N+1 pattern multiplied across a 200-call session creates 200 database round-trips for data that could be fetched in 20 batched queries.
- No request boundary: traditional N+1 solutions scope a DataLoader to an HTTP request. MCP sessions can last minutes with ongoing tool calls, so you need a batching strategy that works across a streaming session, not just a single HTTP transaction.
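The round-trip arithmetic above can be checked with a small simulation. This is a minimal sketch: FakeOrderDb, OrderRow, and compareRoundTrips are hypothetical names introduced here, and the in-memory class only counts calls rather than modelling real network latency.

```typescript
// FakeOrderDb is a hypothetical in-memory stand-in for a remote database.
// It only counts round-trips; it does not model latency.
type OrderRow = { id: string; total: number };

class FakeOrderDb {
  roundTrips = 0;
  private readonly rows = new Map<string, OrderRow>();

  constructor(orders: readonly OrderRow[]) {
    for (const order of orders) this.rows.set(order.id, order);
  }

  // Stands in for: SELECT * FROM orders WHERE id = $1
  async findOne(id: string): Promise<OrderRow | null> {
    this.roundTrips += 1;
    return this.rows.get(id) ?? null;
  }

  // Stands in for: SELECT * FROM orders WHERE id = ANY($1)
  async findMany(ids: readonly string[]): Promise<OrderRow[]> {
    this.roundTrips += 1;
    const found: OrderRow[] = [];
    for (const id of ids) {
      const row = this.rows.get(id);
      if (row !== undefined) found.push(row);
    }
    return found;
  }
}

async function compareRoundTrips(): Promise<{ naive: number; batched: number }> {
  const orderIds = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'];
  const orders = orderIds.map((id, i) => ({ id, total: (i + 1) * 10 }));

  // Naive: one query per get_order_details call.
  const naiveDb = new FakeOrderDb(orders);
  await Promise.all(orderIds.map((id) => naiveDb.findOne(id)));

  // Batched: all ten IDs in a single query.
  const batchedDb = new FakeOrderDb(orders);
  await batchedDb.findMany(orderIds);

  return { naive: naiveDb.roundTrips, batched: batchedDb.roundTrips };
}
```

For the ten order IDs in the example, compareRoundTrips() resolves to { naive: 10, batched: 1 }.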
DataLoader pattern
The dataloader package implements a micro-batching strategy: it collects keys across all load(key) calls that happen synchronously (within one event loop tick), then executes a single batch function with all collected keys before the next tick.
import DataLoader from 'dataloader';
import { db } from './database.js';

// Create one loader per resource type
// batchFn receives an array of keys; must return an array of results in the same order
const createOrderLoader = () => new DataLoader<string, Order | null>(
  async (orderIds) => {
    const rows = await db.query(
      'SELECT * FROM orders WHERE id = ANY($1)',
      [orderIds]
    );
    // Build a map for O(1) lookup; results must come back in the same order as the input keys
    const map = new Map(rows.map(r => [r.id, r]));
    return orderIds.map(id => map.get(id) ?? null);
  },
  {
    // Cache within the loader's lifetime; prevents duplicate loads for the same key
    cache: true,
    // Optional: maximum batch size
    maxBatchSize: 100,
  }
);
The critical rule: the batch function must return an array where result[i] corresponds to keys[i]. Use a Map to reorder database results back to key order. If a key has no result, return null (not omit the entry) — DataLoader uses array position to match results to callers.
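To make the micro-batching and the positional matching concrete, here is a deliberately simplified sketch of the mechanism. TinyLoader is a hypothetical class written for this guide, not the dataloader package's implementation: it has no cache, no maxBatchSize, and schedules its flush with a plain promise microtask, whereas the real library uses its own scheduling.

```typescript
// Simplified stand-in for DataLoader's batching, for illustration only.
// TinyLoader and its members are hypothetical names, not the dataloader API.
type BatchFn<K, V> = (keys: readonly K[]) => Promise<readonly V[]>;

type Pending<K, V> = {
  key: K;
  resolve: (value: V) => void;
  reject: (reason: unknown) => void;
};

class TinyLoader<K, V> {
  batchCalls = 0;
  private queue: Pending<K, V>[] = [];
  private readonly batchFn: BatchFn<K, V>;

  constructor(batchFn: BatchFn<K, V>) {
    this.batchFn = batchFn;
  }

  load(key: K): Promise<V> {
    return new Promise<V>((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      // The first key queued schedules one flush as a microtask, so every
      // load() issued synchronously afterwards joins the same batch.
      if (this.queue.length === 1) {
        Promise.resolve().then(() => this.flush());
      }
    });
  }

  private async flush(): Promise<void> {
    const batch = this.queue;
    this.queue = [];
    this.batchCalls += 1;
    try {
      const values = await this.batchFn(batch.map((item) => item.key));
      if (values.length !== batch.length) {
        throw new Error('batchFn must return one result per key, in key order');
      }
      // Results are matched to callers purely by array position.
      batch.forEach((item, i) => item.resolve(values[i]));
    } catch (err) {
      batch.forEach((item) => item.reject(err));
    }
  }
}

async function tinyLoaderDemo(): Promise<{ results: string[]; batchCalls: number }> {
  const loader = new TinyLoader<string, string>(
    async (keys) => keys.map((k) => `order:${k}`),
  );
  // Three loads issued in the same tick end up in a single batch call.
  const results = await Promise.all([
    loader.load('A'),
    loader.load('B'),
    loader.load('C'),
  ]);
  return { results, batchCalls: loader.batchCalls };
}
```

Because callers are resolved by index, a batch function that drops a missing row or returns rows in database order would hand the wrong record to the wrong caller, which is why the Map-based reordering matters.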
Scoping loaders to an MCP request context
DataLoader's in-memory cache is scoped to the loader instance. For MCP servers:
- Per-tool-call scope: create a new DataLoader for each incoming tool call. The cache lives only for the duration of that call. Simplest to reason about, but it cannot batch across concurrent tool calls.
- Per-MCP-session scope: create DataLoaders when a session opens and share them across all tool calls in that session. Enables batching across concurrent tool calls from the same agent. Requires passing the loaders through the tool call context.
- Per-request scope (recommended): attach loaders to the Express request object or MCP request context. Loaders are created once per HTTP request and become eligible for garbage collection once the response is sent. This batches across tool calls only when several of them are handled within one HTTP request; if your transport delivers each tool call as its own request, it behaves like per-tool-call scope.
// Middleware: attach loaders to the request context
app.use((req, res, next) => {
  // These loaders are shared by everything handled within the same HTTP request
  req.loaders = {
    orders: createOrderLoader(),
    users: createUserLoader(),
    products: createProductLoader(),
  };
  next();
});

// In your MCP server setup, pass the loaders through tool context
server.tool(
  'get_order_details',
  'Get detailed order information including user and product details',
  { orderId: z.string() },
  async ({ orderId }, context) => {
    // Assumes your transport wiring copies req.loaders onto the tool context;
    // the middleware above does not do that by itself
    const loaders = (context as any).loaders as AppLoaders;

    // Load the order first: the user and product IDs come from it.
    // Concurrent tool calls that share these loaders are batched into one query
    const order = await loaders.orders.load(orderId);
    if (!order) throw new Error(`Order ${orderId} not found`);

    // Parallel loads within this handler; each loader coalesces its keys
    // with those requested by other concurrent calls
    const [user, product] = await Promise.all([
      loaders.users.load(order.userId),
      loaders.products.load(order.productId),
    ]);

    return {
      content: [{
        type: 'text',
        text: JSON.stringify({ order, user, product }),
      }],
    };
  }
);
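For the per-session option, one way to hold loaders is a small registry keyed by session ID. The sketch below is an assumption-laden illustration: SessionLoaderRegistry is a hypothetical helper, and how you obtain the session ID and detect session close depends on your transport.

```typescript
// Hypothetical registry that keeps one set of loaders per MCP session.
// L is whatever your loader bundle type is (for example, AppLoaders).
class SessionLoaderRegistry<L> {
  private readonly bySession = new Map<string, L>();
  private readonly createLoaders: () => L;

  constructor(createLoaders: () => L) {
    this.createLoaders = createLoaders;
  }

  // Return the loaders for a session, creating them on first use.
  get(sessionId: string): L {
    let loaders = this.bySession.get(sessionId);
    if (loaders === undefined) {
      loaders = this.createLoaders();
      this.bySession.set(sessionId, loaders);
    }
    return loaders;
  }

  // Call when the session closes so cached rows can be garbage-collected.
  dispose(sessionId: string): boolean {
    return this.bySession.delete(sessionId);
  }

  get size(): number {
    return this.bySession.size;
  }
}
```

A tool handler would then call something like registry.get(sessionId) instead of reading loaders from the HTTP request, and the transport's close hook would call registry.dispose(sessionId). Forgetting the dispose step leaks every session's cache, so it is worth pairing the two calls in one place.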
Deduplication and the cache
DataLoader's cache means that loading the same key twice returns the same Promise — the second load never issues a second database query. This is valuable when an agent calls two different tools that both need the same user record:
// If get_order_details and get_user_notes both need user 'u_abc'
// within the same request, the user is fetched exactly once
const [userForOrder, userForNotes] = await Promise.all([
  loaders.users.load('u_abc'), // requested by get_order_details
  loaders.users.load('u_abc'), // requested by get_user_notes: returns the same cached Promise
]);
The cache is safe here because it is scoped to a single HTTP request. If you scope loaders to an MCP session (minutes long), you may want to disable the cache ({ cache: false }) or call loader.clear(key) after writes to prevent stale reads across session-level mutations.
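With long-lived loaders, the clear-after-write step is easy to forget, so it can help to wrap it. The following is a minimal sketch: writeThenInvalidate and ClearableLoader are hypothetical names, and the interface relies only on the clear(key) method that DataLoader instances expose.

```typescript
// Structural type: anything with clear(key), which DataLoader instances provide.
interface ClearableLoader<K> {
  clear(key: K): unknown;
}

// Run a write, then evict the key so the next load() re-fetches fresh data.
async function writeThenInvalidate<K, R>(
  loader: ClearableLoader<K>,
  key: K,
  write: () => Promise<R>,
): Promise<R> {
  try {
    return await write();
  } finally {
    // Evict even when the write throws, since a failed write may still
    // have changed state on the database side.
    loader.clear(key);
  }
}
```

A mutation tool could call it as writeThenInvalidate(loaders.users, userId, () => saveUser(userId, patch)), where saveUser is whatever write function your server already has. Evicting in the finally block trades one extra read for never serving a row the session has just tried to change.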
Batch size limits and chunking
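Some databases and APIs limit the number of values in an IN clause or batch request. The maxBatchSize option caps how many keys DataLoader passes to one batch function call and splits larger sets into several calls. Chunking inside the batch function is a second guard for when that cap is raised or removed:
const createTagLoader = () => new DataLoader<string, Tag[]>(
  async (entityIds) => {
    // Keys are entity IDs; each key resolves to all tags attached to that entity.
    // Assumes the entity_id column maps to tag.entityId.
    // Chunk very large key sets into groups of 1000 to keep each query bounded
    const CHUNK_SIZE = 1000;
    const allRows: Tag[] = [];
    for (let i = 0; i < entityIds.length; i += CHUNK_SIZE) {
      const chunk = entityIds.slice(i, i + CHUNK_SIZE);
      const rows = await db.query(
        'SELECT * FROM tags WHERE entity_id = ANY($1)',
        [chunk]
      );
      allRows.push(...rows);
    }
    // Group rows by the key they belong to
    const map = new Map<string, Tag[]>();
    for (const tag of allRows) {
      const existing = map.get(tag.entityId) ?? [];
      existing.push(tag);
      map.set(tag.entityId, existing);
    }
    // One array per key, in key order; entities without tags get []
    return entityIds.map(id => map.get(id) ?? []);
  },
  // With maxBatchSize equal to CHUNK_SIZE the loop above runs once per batch;
  // it only matters if you raise or remove this cap
  { maxBatchSize: 1000 }
);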
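The chunking loop can be factored into a reusable helper. This sketch introduces two hypothetical functions, chunk and fetchInChunks; unlike the sequential loop above, it issues the chunk queries concurrently, which is an assumption about what your connection pool can absorb.

```typescript
// Split a key list into consecutive groups of at most `size` items.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const groups: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    groups.push(items.slice(i, i + size));
  }
  return groups;
}

// Fetch each chunk with the supplied function and concatenate the rows.
// Chunks run concurrently, so each one occupies a pool connection at once.
async function fetchInChunks<K, R>(
  keys: readonly K[],
  size: number,
  fetchChunk: (keysInChunk: readonly K[]) => Promise<R[]>,
): Promise<R[]> {
  const perChunk = await Promise.all(chunk(keys, size).map((c) => fetchChunk(c)));
  const rows: R[] = [];
  for (const part of perChunk) rows.push(...part);
  return rows;
}
```

Inside a batch function this would replace the for loop with a single await fetchInChunks(entityIds, 1000, ...) call. If the pool is small, keeping the sequential loop may be the safer choice, since concurrent chunks compete with other tool calls for connections.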
Measuring the impact
Instrument your DataLoaders to measure batch efficiency:
import { Histogram } from 'prom-client';

const batchSize = new Histogram({
  name: 'mcp_dataloader_batch_size',
  help: 'Number of keys per DataLoader batch',
  labelNames: ['loader'],
  buckets: [1, 2, 5, 10, 25, 50, 100],
});

const batchDuration = new Histogram({
  name: 'mcp_dataloader_batch_duration_seconds',
  help: 'DataLoader batch query duration',
  labelNames: ['loader'],
  buckets: [0.001, 0.005, 0.01, 0.05, 0.1, 0.5],
});

const createOrderLoader = () => new DataLoader<string, Order | null>(
  async (orderIds) => {
    // Record how many keys this batch carries
    batchSize.observe({ loader: 'orders' }, orderIds.length);
    const start = Date.now();
    const rows = await db.query('SELECT * FROM orders WHERE id = ANY($1)', [orderIds]);
    // Record query duration in seconds
    batchDuration.observe({ loader: 'orders' }, (Date.now() - start) / 1000);
    const map = new Map(rows.map(r => [r.id, r]));
    return orderIds.map(id => map.get(id) ?? null);
  }
);
A healthy batch size histogram is bimodal: many batches of size 1 (single lookups in non-parallel calls) and a peak at your typical agent parallelism (5–20 for most frameworks). If you see exclusively size-1 batches even under parallel agent load, check that your loaders are scoped correctly — they may be re-created per tool call rather than shared across concurrent calls.
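The same check can be run on raw batch sizes, for example in a test or a debug log, before a dashboard exists. This is a small sketch with hypothetical names (BatchSummary, summarizeBatches) that is independent of prom-client.

```typescript
// Summary of observed DataLoader batch sizes (hypothetical helper).
interface BatchSummary {
  batches: number;        // number of batch function calls
  keys: number;           // total keys across all batches
  meanSize: number;       // average keys per batch
  singletonShare: number; // fraction of batches that carried exactly one key
}

function summarizeBatches(sizes: readonly number[]): BatchSummary {
  const batches = sizes.length;
  let keys = 0;
  let singletons = 0;
  for (const size of sizes) {
    keys += size;
    if (size === 1) singletons += 1;
  }
  return {
    batches,
    keys,
    meanSize: batches === 0 ? 0 : keys / batches,
    singletonShare: batches === 0 ? 0 : singletons / batches,
  };
}
```

A singletonShare close to 1 under parallel agent load is the numeric form of the "exclusively size-1 batches" symptom described above and points at loader scoping.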
AliveMCP external probes measure end-to-end tool call response time. If your probe latency drops after implementing DataLoader (typical improvement: 50–200ms on database-heavy tools), that is the batch optimization working.