
MCP server request batching

LLM agents frequently invoke the same tool many times in parallel: fetching 10 user records with one call per record, looking up 20 product prices with one call per product, or resolving a list of IDs before the next reasoning step. Each call triggers a separate database query, so what should be a single round-trip becomes N round-trips: the N+1 problem at the MCP layer. Request batching coalesces these concurrent single-item queries into one multi-item query. This cuts database round-trips by roughly the number of calls coalesced (10 to 100× for typical fan-outs) without changing the agent or the tool's public interface.

TL;DR

Use one DataLoader (from the dataloader npm package, originally published by Facebook) per resource type. Each DataLoader collects the individual keys requested within the same tick of the Node.js event loop, fires a single batch query for all of them, and fans the results back out to each caller. The tool handler calls loader.load(id) and awaits it, so the batching is transparent. Create loaders per incoming MCP request or per agent session, never once at server start, so that batches and cached results are never shared between agents.

The N+1 problem in MCP servers

This pattern is familiar from GraphQL, and it appears in MCP servers for the same reason: a higher-level consumer (an agent, or a GraphQL resolver) maps over a list and calls a per-item fetcher for each element.

Consider an agent that needs to process a list of orders:

  1. Agent calls list_orders → server returns [{orderId: 'A'}, {orderId: 'B'}, ..., {orderId: 'J'}]
  2. Agent calls get_order_details with orderId: 'A'
  3. Agent calls get_order_details with orderId: 'B'
  4. ... 8 more calls ...

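Without batching, the server runs 10 separate SELECT * FROM orders WHERE id = $1 queries. With batching, and provided the client has all 10 calls in flight at once, it runs a single SELECT * FROM orders WHERE id = ANY($1) query with all 10 IDs. Calls that the agent awaits one after another cannot be coalesced. On a remote database the gap adds up: at, say, 20 ms per round-trip, ten back-to-back queries cost about 200 ms against about 20 ms for one batched query. Even when a connection pool runs the ten queries in parallel, they still compete for connections and database time, which shows up most at p99.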

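Agents make this worse than classic web apps because they routinely fan out over the list returned by one tool call, issuing a per-item tool call for every element, often several at a time.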

DataLoader pattern

The dataloader package implements a micro-batching strategy. It collects the keys from every load(key) call made during the current tick of the event loop. It then calls your batch function once with all of them and resolves each caller's Promise with the matching result.
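To make the scheduling concrete, here is a deliberately simplified micro-batcher. It is not the dataloader package's source: it has no cache, no maxBatchSize, and a simpler dispatch schedule (one resolved-Promise callback, whereas DataLoader waits slightly longer before dispatching). The names MicroBatcher and BatchFn are hypothetical.

```typescript
// Simplified illustration of micro-batching; NOT the dataloader package's source.
// No cache, no maxBatchSize, and a simpler dispatch schedule than DataLoader's.
type BatchFn<K, V> = (keys: readonly K[]) => Promise<V[]>;

interface Pending<K, V> {
  key: K;
  resolve: (value: V) => void;
  reject: (reason: unknown) => void;
}

class MicroBatcher<K, V> {
  private queue: Pending<K, V>[] = [];
  private readonly batchFn: BatchFn<K, V>;

  constructor(batchFn: BatchFn<K, V>) {
    this.batchFn = batchFn;
  }

  load(key: K): Promise<V> {
    return new Promise<V>((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      // The first load since the last dispatch schedules one; later loads just join the queue
      if (this.queue.length === 1) {
        void Promise.resolve().then(() => this.dispatch());
      }
    });
  }

  private async dispatch(): Promise<void> {
    const batch = this.queue;
    this.queue = [];
    try {
      const values = await this.batchFn(batch.map((entry) => entry.key));
      if (values.length !== batch.length) {
        throw new Error('batch function must return exactly one value per key');
      }
      // Fan results back out by position: values[i] belongs to batch[i]
      batch.forEach((entry, i) => entry.resolve(values[i]));
    } catch (err) {
      // A failed batch rejects every caller waiting on it
      batch.forEach((entry) => entry.reject(err));
    }
  }
}

// Usage: ten loads issued in the same tick reach the batch function as one call
const batchCalls: number[][] = [];
const orderLoader = new MicroBatcher<number, string>(async (ids) => {
  batchCalls.push([...ids]);
  return ids.map((id) => `order-${id}`);
});
```

Loads that are awaited one at a time each get their own batch, which is why batching only pays off when the calls are actually concurrent.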

import DataLoader from 'dataloader';
import { db } from './database.js'; // assumed: db.query(sql, params) resolves to an array of rows

// Minimal row shape for this example; match it to your orders table
interface Order {
  id: string;
  userId: string;
  productId: string;
}

// Create one loader per resource type.
// The batch function receives an array of keys and must return an array
// of results with the same length and in the same order.
const createOrderLoader = () => new DataLoader<string, Order | null>(
  async (orderIds) => {
    const rows: Order[] = await db.query(
      'SELECT * FROM orders WHERE id = ANY($1)',
      [orderIds]
    );

    // Index rows by id for O(1) lookup, then emit one entry per key, in key order
    const map = new Map(rows.map((r) => [r.id, r] as const));
    return orderIds.map((id) => map.get(id) ?? null);
  },
  {
    // Cache for the loader's lifetime (the default): repeated loads of a key share one Promise
    cache: true,
    // Optional: split batches larger than 100 keys into several batch-function calls
    maxBatchSize: 100,
  }
);

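The critical rule: the batch function must return an array of the same length as keys, where result[i] corresponds to keys[i]. DataLoader rejects the whole batch if the lengths differ. Use a Map to put the database rows back into key order. If a key has no result, return null rather than omitting the entry, because DataLoader matches results to callers by array position. To fail just that one key, you can put an Error instance in its slot instead.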
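Because the ordering rule is easy to get wrong, it can help to pull it into a small reusable helper. The following is a minimal sketch, assuming each row carries its own key; the name alignToKeys is hypothetical. It returns exactly one entry per key, in key order, with null for keys that matched no row.

```typescript
// Hypothetical helper: reorder database rows to match the keys array that
// DataLoader passed to the batch function (one slot per key, same order).
function alignToKeys<K, R>(
  keys: readonly K[],
  rows: readonly R[],
  keyOf: (row: R) => K,
): (R | null)[] {
  const byKey = new Map<K, R>();
  for (const row of rows) {
    byKey.set(keyOf(row), row);
  }
  // Missing keys become null instead of being dropped, so positions stay aligned
  return keys.map((key) => byKey.get(key) ?? null);
}

// Rows come back from the database in arbitrary order, and key 'B' has no row
const sampleRows = [{ id: 'C', total: 30 }, { id: 'A', total: 10 }];
const aligned = alignToKeys(['A', 'B', 'C'], sampleRows, (r) => r.id);
// aligned is [{ id: 'A', total: 10 }, null, { id: 'C', total: 30 }]
```

Inside a batch function this becomes `return alignToKeys(orderIds, rows, (r) => r.id);`, which keeps the length and order contract in one tested place.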

Scoping loaders to an MCP request context

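DataLoader's in-memory cache and its batching queue both belong to the loader instance, so the instance's lifetime decides which calls can share a batch and how long cached results live. A common setup for MCP servers served over HTTP creates a fresh set of loaders for every incoming request: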

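import DataLoader from 'dataloader';
import { z } from 'zod';

// Assumed: createUserLoader and createProductLoader are built like
// createOrderLoader, and User and Product types exist alongside Order
type AppLoaders = {
  orders: DataLoader<string, Order | null>;
  users: DataLoader<string, User | null>;
  products: DataLoader<string, Product | null>;
};

// Middleware: attach a fresh set of loaders to each HTTP request
app.use((req, res, next) => {
  // Shared by every tool call handled within this HTTP request. Over
  // Streamable HTTP that is usually a single tools/call, so batching across
  // concurrent calls needs loaders scoped to the session instead
  req.loaders = {
    orders: createOrderLoader(),
    users: createUserLoader(),
    products: createProductLoader(),
  };
  next();
});

// In your MCP server setup, pass the loaders through to the tool handler
server.tool(
  'get_order_details',
  'Get detailed order information including user and product details',
  { orderId: z.string() },
  async ({ orderId }, context) => {
    // Assumption: your transport wiring copies req.loaders onto this context;
    // the MCP SDK does not populate a `loaders` field on its own
    const loaders = (context as any).loaders as AppLoaders;

    // Concurrent calls that share these loaders have their order IDs
    // coalesced into a single orders query
    const order = await loaders.orders.load(orderId);
    if (!order) throw new Error(`Order ${orderId} not found`);

    // Two different loaders, so this issues two concurrent queries; each can
    // also absorb keys requested by other in-flight calls in the same tick
    const [user, product] = await Promise.all([
      loaders.users.load(order.userId),
      loaders.products.load(order.productId),
    ]);

    return {
      content: [{
        type: 'text',
        text: JSON.stringify({ order, user, product }),
      }],
    };
  }
);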
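If your transport delivers each tools/call as its own HTTP request, per-request loaders rarely see keys from other calls. One way to share loaders across concurrent calls in the same session, while keeping sessions isolated from each other, is a small registry keyed by session ID. This is a sketch under assumptions. SessionLoaderRegistry is a hypothetical name, and the factory stands in for building your real loaders. The session ID must come from whatever your transport exposes, and you must call dispose when the session closes, or the registry will grow without bound.

```typescript
// Hypothetical registry: one set of loaders per MCP session, created lazily
// and dropped when the session ends so its cache cannot outlive the session.
class SessionLoaderRegistry<L> {
  private readonly bySession = new Map<string, L>();
  private readonly create: () => L;

  constructor(create: () => L) {
    this.create = create;
  }

  // Return this session's loaders, creating them on first use
  get(sessionId: string): L {
    let loaders = this.bySession.get(sessionId);
    if (loaders === undefined) {
      loaders = this.create();
      this.bySession.set(sessionId, loaders);
    }
    return loaders;
  }

  // Forget a session's loaders (and everything they cached); true if it existed
  dispose(sessionId: string): boolean {
    return this.bySession.delete(sessionId);
  }
}

// Usage with a stand-in factory; in a server this would build real loaders, e.g.
// () => ({ orders: createOrderLoader(), users: createUserLoader(), products: createProductLoader() })
let created = 0;
const registry = new SessionLoaderRegistry(() => ({ instance: ++created }));
```

Session scope lets concurrent calls from one agent share batches, but the cache now lives as long as the session, which is exactly the case the staleness advice below addresses.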

Deduplication and the cache

DataLoader's cache means that loading the same key twice returns the same Promise — the second load never issues a second database query. This is valuable when an agent calls two different tools that both need the same user record:

// If get_order_details and get_user_notes both need user 'u_abc'
// while handling the same request, the user is fetched exactly once
const [userForOrder, userForNotes] = await Promise.all([
  loaders.users.load('u_abc'),  // first load: queues 'u_abc' for the next batch
  loaders.users.load('u_abc'),  // same key: returns the cached Promise, no new query
]);
// userForOrder === userForNotes: both callers receive the same object

The cache is safe here because it is scoped to a single HTTP request. If you scope loaders to an MCP session (minutes long), you may want to disable the cache ({ cache: false }) or call loader.clear(key) after writes to prevent stale reads across session-level mutations.
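To see why a long-lived cache needs invalidation, consider a simplified per-key Promise cache standing in for DataLoader's. The first load of a key stores its Promise, later loads reuse it, and clear(key) drops the entry so the next load refetches. KeyedPromiseCache, updateEmail, and the in-memory table are hypothetical. Unlike DataLoader, this sketch would also cache a rejected fetch, so treat it as an illustration only. With DataLoader itself you would call loader.clear(key), or loader.clear(key).prime(key, value), after the write.

```typescript
// Simplified stand-in for DataLoader's per-key cache (no batching):
// the first load of a key stores its Promise, later loads reuse it.
class KeyedPromiseCache<K, V> {
  private readonly entries = new Map<K, Promise<V>>();
  private readonly fetchOne: (key: K) => Promise<V>;

  constructor(fetchOne: (key: K) => Promise<V>) {
    this.fetchOne = fetchOne;
  }

  load(key: K): Promise<V> {
    let pending = this.entries.get(key);
    if (pending === undefined) {
      pending = this.fetchOne(key);
      this.entries.set(key, pending);
    }
    return pending;
  }

  // Drop one key so the next load refetches it (mirrors DataLoader's clear(key))
  clear(key: K): this {
    this.entries.delete(key);
    return this;
  }
}

// Hypothetical in-memory "users" table with a session-lifetime cache in front of it
const emails = new Map<string, string>([['u_abc', 'old@example.com']]);
let fetchCount = 0;
const userEmails = new KeyedPromiseCache<string, string>(async (id) => {
  fetchCount += 1;
  return emails.get(id) ?? '';
});

// Write path: update the source of truth, then invalidate the cached entry.
// Without clear(), later loads in the same session would still see the old email.
function updateEmail(id: string, email: string): void {
  emails.set(id, email);
  userEmails.clear(id);
}
```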

Batch size limits and chunking

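Some databases and APIs cap the number of values in an IN clause or in a single batch request. Setting maxBatchSize makes DataLoader split larger batches into several calls to your batch function. Chunking inside the batch function is a further safeguard for when a single call may still carry more keys than one query should: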

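// Keys are entity IDs; each resolves to the (possibly empty) list of tags on that entity.
// Assumes Tag rows expose entityId and the tags table has an entity_id column.
const createTagLoader = () => new DataLoader<string, Tag[]>(
  async (entityIds) => {
    // maxBatchSize below already caps each batch at 1000 keys, so this loop only
    // splits further if that option is raised or removed. Check your database's
    // limits: PostgreSQL's = ANY($1) sends the whole array as one parameter, while
    // an expanded IN ($1, ..., $n) list is capped at 65,535 bind parameters.
    const CHUNK_SIZE = 1000;
    const allRows: Tag[] = [];

    for (let i = 0; i < entityIds.length; i += CHUNK_SIZE) {
      const chunk = entityIds.slice(i, i + CHUNK_SIZE);
      const rows: Tag[] = await db.query(
        'SELECT * FROM tags WHERE entity_id = ANY($1)',
        [chunk]
      );
      allRows.push(...rows);
    }

    // One-to-many: group tags by owning entity, then emit one array per key
    const map = new Map<string, Tag[]>();
    for (const tag of allRows) {
      const existing = map.get(tag.entityId) ?? [];
      existing.push(tag);
      map.set(tag.entityId, existing);
    }

    return entityIds.map(id => map.get(id) ?? []);
  },
  { maxBatchSize: 1000 }
);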
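The chunking loop generalizes to any batch function. Below is a minimal sketch, assuming a query callback that takes one chunk of keys and resolves to the rows for that chunk; queryInChunks is a hypothetical name. Chunks run one after another, which keeps a single oversized batch from occupying several pool connections at once. Running them with Promise.all would trade that for lower latency.

```typescript
// Hypothetical helper: run `query` over consecutive slices of `keys`, at most
// `chunkSize` keys each, and concatenate the rows in chunk order.
async function queryInChunks<K, R>(
  keys: readonly K[],
  chunkSize: number,
  query: (chunk: readonly K[]) => Promise<R[]>,
): Promise<R[]> {
  if (!Number.isInteger(chunkSize) || chunkSize < 1) {
    throw new RangeError('chunkSize must be a positive integer');
  }
  const rows: R[] = [];
  for (let i = 0; i < keys.length; i += chunkSize) {
    // Sequential on purpose: at most one query in flight per batch
    const chunkRows = await query(keys.slice(i, i + chunkSize));
    for (const row of chunkRows) rows.push(row);
  }
  return rows;
}
```

Inside a batch function, the loop in createTagLoader would reduce to `const allRows = await queryInChunks(entityIds, CHUNK_SIZE, (chunk) => db.query(sql, [chunk]));`, with the grouping step unchanged.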

Measuring the impact

Instrument your DataLoaders to measure batch efficiency:

import DataLoader from 'dataloader';
import { Histogram } from 'prom-client';
import { db } from './database.js';

const batchSize = new Histogram({
  name: 'mcp_dataloader_batch_size',
  help: 'Number of keys per DataLoader batch',
  labelNames: ['loader'],
  buckets: [1, 2, 5, 10, 25, 50, 100],
});

const batchDuration = new Histogram({
  name: 'mcp_dataloader_batch_duration_seconds',
  help: 'DataLoader batch query duration',
  labelNames: ['loader'],
  buckets: [0.001, 0.005, 0.01, 0.05, 0.1, 0.5],
});

// Instrumented version of createOrderLoader (Order as defined earlier)
const createOrderLoader = () => new DataLoader<string, Order | null>(
  async (orderIds) => {
    batchSize.observe({ loader: 'orders' }, orderIds.length);
    // startTimer() returns a function that records the elapsed seconds when called
    const endTimer = batchDuration.startTimer({ loader: 'orders' });
    try {
      const rows: Order[] = await db.query('SELECT * FROM orders WHERE id = ANY($1)', [orderIds]);
      const map = new Map(rows.map((r) => [r.id, r] as const));
      return orderIds.map((id) => map.get(id) ?? null);
    } finally {
      endTimer(); // runs on success and on failure, so slow failing queries are counted too
    }
  }
);

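A healthy batch-size histogram is usually bimodal. It shows many batches of size 1 (lookups from calls that ran alone) and a second peak near your agents' typical parallelism. If you see only size-1 batches even while agents issue parallel calls, check how your loaders are scoped. They may be created per tool call (for example, per HTTP request when each tools/call arrives as its own request) instead of being shared across concurrent calls. Also confirm that the client really sends the calls concurrently, because calls awaited one after another can never share a batch.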
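Two numbers summarize that histogram well: the average number of keys per batch (the coalescing factor) and the share of batches that held a single key. The following is a minimal sketch that computes both from raw batch sizes, such as sizes recorded during a load test. summarizeBatches and BatchSummary are hypothetical names, and the thresholds you act on are up to you.

```typescript
interface BatchSummary {
  batches: number;        // batch-function calls
  keys: number;           // keys passed to the batch function in total
  keysPerBatch: number;   // average coalescing factor
  singletonShare: number; // fraction of batches with exactly one key
}

// Hypothetical helper: summarize a list of observed batch sizes
function summarizeBatches(sizes: readonly number[]): BatchSummary {
  const batches = sizes.length;
  const keys = sizes.reduce((sum, n) => sum + n, 0);
  const singletons = sizes.filter((n) => n === 1).length;
  return {
    batches,
    keys,
    keysPerBatch: batches === 0 ? 0 : keys / batches,
    singletonShare: batches === 0 ? 0 : singletons / batches,
  };
}

// summarizeBatches([1, 1, 10, 8]) → { batches: 4, keys: 20, keysPerBatch: 5, singletonShare: 0.5 }
```

A singletonShare near 1 under parallel agent load is the numeric form of the "only size-1 batches" symptom described above.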

AliveMCP external probes measure end-to-end tool call response time. If probe latency on database-heavy tools drops after you adopt DataLoader (typically by 50–200 ms), batching is the likely cause, and the batch-size histogram above will confirm it.

Further reading