Guide · Tool Design

MCP Server Tool Discovery — naming, descriptions, and schema design for LLM selection accuracy

An LLM chooses which MCP tool to call based entirely on the tool's name, description, and JSON schema. If the name is ambiguous, the model guesses. If the description is missing "when NOT to use this", the model picks the wrong tool confidently. If the schema accepts unconstrained strings where it should accept an enum, the model invents values. Tool discovery failures look like model errors but are usually tool authoring errors — fixable by writing better names, descriptions, and schemas rather than swapping models. This guide covers the specific techniques that produce reliably high tool selection accuracy.

TL;DR

Give every tool name an explicit verb and noun, and give related tools a shared prefix (customer_get, customer_search). Write descriptions as: one-line summary, "Use this when:", "Do NOT use when:" (disambiguation), and key parameter notes. Use enums over unconstrained strings. Add a description to every parameter. Keep tool count under 20 per context window; group related tools by prefix. Set the readOnlyHint: true and destructiveHint: true annotations; they signal risk to the client and, when the client passes them through, to the LLM. Monitor selection accuracy with AliveMCP's tool-level error tracking.

Naming conventions

Tool names are the first signal an LLM reads. A name like process tells the model nothing. A name like create_customer_invoice tells it the action (create), domain (customer), and object (invoice). Three principles:

Principle	Bad	Good	Why
Verb plus noun	`customer`, `data`	`customer_get`, `customer_search`	Verb signals intent; noun signals object
Specific verbs	`customer_handle`	`customer_update_email`	Generic verbs don't constrain model reasoning
Consistent prefixes	`search_customers`, `fetch_order`, `list_invoices`	`customer_search`, `order_get`, `invoice_list`	A noun-first prefix keeps related tools adjacent in sorted tool lists; models group by prefix

Distinguish similar verbs explicitly. LLMs conflate get, fetch, retrieve, and read. Use one consistently (prefer get for single-item lookups). Use search or list for multi-result queries — the distinction between "search by criteria" (search) and "list all with filter" (list) is meaningful to the model if the names reflect it.

// Naming examples for a CRM server
customer_get           // get single customer by ID
customer_search        // search customers by query string
customer_list          // list customers with filter/pagination
customer_create        // create new customer
customer_update        // update customer fields
customer_delete        // delete (soft) customer record

invoice_get
invoice_create
invoice_send           // send to customer (side effect — different from create)
invoice_void           // cancel invoice (destructive — different from delete)

// NOT: customer, getCustomer, doSearch, processInvoice
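
These conventions are easy to enforce mechanically. The following is a minimal sketch of a name lint, assuming the noun-first snake_case style used in the examples above; the function lintToolName and both verb lists are hypothetical and should be adapted to your own domain.

```typescript
// Hypothetical lint helper. The verb list is an illustrative assumption.
const ALLOWED_VERBS = new Set([
  "get", "search", "list", "create", "update", "delete", "send", "void",
]);

// Near-synonyms of "get" that should be collapsed into a single verb.
const GET_SYNONYMS = new Set(["fetch", "retrieve", "read"]);

function lintToolName(name: string): string[] {
  // Require lowercase snake_case with at least two segments (noun + verb).
  if (!/^[a-z][a-z0-9]*(_[a-z0-9]+)+$/.test(name)) {
    return ["expected snake_case with a noun prefix and a verb, e.g. customer_get"];
  }
  // In the noun-first style, the verb is the last segment.
  const verb = name.slice(name.lastIndexOf("_") + 1);
  if (GET_SYNONYMS.has(verb)) return [`use "get" instead of "${verb}"`];
  if (!ALLOWED_VERBS.has(verb)) return [`verb "${verb}" is not on the allowed list`];
  return [];
}

for (const name of ["customer_get", "getCustomer", "order_fetch", "invoice_process"]) {
  console.log(name, lintToolName(name));
}
```

Running a check like this in CI catches drift (a new fetch_ or process_ tool) before it reaches the tool list the model sees.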

Description structure

Tool descriptions are read by the LLM in full at tool selection time. Structure them for skimmability and specificity — not for documentation.

import { z } from 'zod';

// `server` is an McpServer instance from @modelcontextprotocol/sdk
server.registerTool('customer_search', {
  description:
    'Search customers by name, email, or company. ' +
    'Use this when: you have a partial name/email and need a list to choose from. ' +
    'Do NOT use when: you have an exact customer ID (use customer_get). ' +
    'Do NOT use when: you need ALL customers (use customer_list instead). ' +
    'Returns up to `limit` results (max 100) sorted by relevance.',

  inputSchema: {
    query: z.string().describe(
      'Text to search across customer name, email, and company. ' +
      'Use partial strings: "acme" matches "Acme Corp". ' +
      'NOT for lookups by exact ID; use customer_get for that.'
    ),
    limit: z.number().int().min(1).max(100).default(10)
      .describe('Max results to return. Default 10. Increase for bulk operations.'),
    include_inactive: z.boolean().default(false)
      .describe('Include deactivated customer records. Default false.')
  },

  annotations: {
    readOnlyHint: true, // search has no side effects
    title: 'Search Customers'
  }
}, async ({ query, limit, include_inactive }) => {
  // implementation
});

The "Do NOT use when" clause is the most underused part of tool descriptions. It explicitly tells the model when to prefer a different tool — preventing the most common source of wrong-tool selection, where two tools overlap in apparent purpose.
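
One way to make the disambiguation clause hard to forget is to assemble descriptions from structured parts. This is a minimal sketch, not part of any SDK: buildDescription and the DescriptionParts shape are hypothetical names introduced for illustration, and the rule that at least one "Do NOT use when" clause is required is an assumption you may want to relax for servers with a single tool.

```typescript
interface DescriptionParts {
  summary: string;        // one sentence: what the tool does
  useWhen: string[];      // situations where this tool is the right choice
  doNotUseWhen: string[]; // disambiguation against neighbouring tools
  notes?: string;         // optional key parameter notes
}

function buildDescription(parts: DescriptionParts): string {
  // Fail at build time rather than letting an undisambiguated tool ship.
  if (parts.doNotUseWhen.length === 0) {
    throw new Error("Add at least one 'Do NOT use when' clause");
  }
  const sentences = [
    parts.summary,
    ...parts.useWhen.map((w) => `Use this when: ${w}`),
    ...parts.doNotUseWhen.map((w) => `Do NOT use when: ${w}`),
  ];
  if (parts.notes) sentences.push(parts.notes);
  return sentences.join(" ");
}

console.log(buildDescription({
  summary: "Search customers by name, email, or company.",
  useWhen: ["you have a partial name/email and need a list to choose from."],
  doNotUseWhen: [
    "you have an exact customer ID (use customer_get).",
    "you need ALL customers (use customer_list instead).",
  ],
}));
```

The resulting string can be passed as the description of the tool, and the same structured parts can be reused to generate human-facing documentation.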

Schema design for LLM-friendliness

The JSON schema for tool parameters is not just a validation spec — it is a semantic prompt to the LLM about what values are acceptable. Three patterns make schemas more LLM-friendly:

1. Enums over unconstrained strings. If a parameter accepts a fixed set of values, declare them as an enum. An LLM faced with status: z.string() may invent "in-progress" when you expect "active". With status: z.enum(['active', 'inactive', 'suspended']), the model selects from the declared values.

// Instead of:
status: z.string().describe('The customer status'),

// Use:
status: z.enum(['active', 'inactive', 'suspended'])
  .describe('Current customer status. Use "suspended" for billing-hold accounts, not "inactive".')

2. Descriptions on every parameter. LLMs use parameter descriptions to understand what value to provide and how to derive it from context. A description like 'ISO 8601 date string, e.g. "2026-01-15". Required when filtering by date range.' eliminates the guessing that produces "January 15 2026" as a value.

3. Avoid deeply nested schemas. A schema with three levels of nested objects is hard for models to fill correctly — especially optional nested objects where the LLM must decide whether to include the nesting at all. Flatten where possible. If you need nested data, use a simple JSON string parameter with a schema example in the description rather than a full nested Zod schema.

// Problematic nested schema:
filters: z.object({
  date_range: z.object({
    from: z.string().optional(),
    to: z.string().optional()
  }).optional(),
  customer: z.object({
    status: z.string().optional(),
    tier: z.string().optional()
  }).optional()
})

// Better — flat:
date_from: z.string().optional()
  .describe('Filter by start date. ISO 8601, e.g. "2026-01-01". Omit for no start bound.'),
date_to: z.string().optional()
  .describe('Filter by end date. ISO 8601, e.g. "2026-12-31". Omit for no end bound.'),
customer_status: z.enum(['active', 'inactive', 'suspended']).optional()
  .describe('Filter by customer status. Omit to include all.'),
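
The three patterns above can be checked automatically against the JSON schema a server publishes. The sketch below is a heuristic audit under stated assumptions: auditSchema and SchemaNode are hypothetical names, the SchemaNode type covers only the subset of JSON Schema needed here, and the "unconstrained string" warning is only a prompt to consider an enum, format, or pattern, since free-text parameters such as a search query are legitimate.

```typescript
// Minimal subset of JSON Schema used by this sketch.
interface SchemaNode {
  type?: string;
  description?: string;
  enum?: unknown[];
  format?: string;
  pattern?: string;
  properties?: Record<string, SchemaNode>;
}

function auditSchema(schema: SchemaNode, path = "", depth = 0): string[] {
  const warnings: string[] = [];
  for (const [name, prop] of Object.entries(schema.properties ?? {})) {
    const here = path ? `${path}.${name}` : name;
    // Pattern 2: every parameter should carry a description.
    if (!prop.description) warnings.push(`${here}: missing description`);
    // Pattern 1: strings with no enum, format, or pattern invite invented values.
    if (prop.type === "string" && !prop.enum && !prop.format && !prop.pattern) {
      warnings.push(`${here}: unconstrained string`);
    }
    // Pattern 3: flag objects nested inside other objects, then recurse.
    if (prop.type === "object") {
      if (depth >= 1) warnings.push(`${here}: nested more than one level deep`);
      warnings.push(...auditSchema(prop, here, depth + 1));
    }
  }
  return warnings;
}

const nestedSchema: SchemaNode = {
  type: "object",
  properties: {
    filters: {
      type: "object",
      description: "Search filters",
      properties: {
        customer: { type: "object", properties: { status: { type: "string" } } },
      },
    },
  },
};
console.log(auditSchema(nestedSchema));
```

For the nested example this reports four warnings, all under filters.customer; the flat version with an enum and descriptions reports none.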

Tool count management

LLMs that receive a tool list of 40+ items show degraded selection accuracy: the context window is partially consumed by tool definitions, and the model struggles to maintain clear distinctions between similar tools. Benchmark data from production MCP deployments consistently shows that tool lists above ~20 items correlate with increased wrong-tool selection rates.

Three strategies for managing tool count:

Tool composition. Combine fine-grained tools into a single tool with a discriminated mode parameter when the use cases are nearly identical. customer_search(mode: 'by_name' | 'by_email' | 'by_company') is one tool, not three. See tool composition patterns for the full trade-off analysis.
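
A composed tool of this kind can be sketched as a single handler that dispatches on the mode parameter. The Customer shape, the in-memory data, and the customerSearch function below are illustrative assumptions; a real handler would query your data store instead.

```typescript
interface Customer { id: string; name: string; email: string; company: string }

// The mode values mirror the enum the tool schema would declare.
type SearchMode = "by_name" | "by_email" | "by_company";

// Map each mode to the field it searches, so adding a mode is a one-line change.
const FIELD_FOR_MODE: Record<SearchMode, "name" | "email" | "company"> = {
  by_name: "name",
  by_email: "email",
  by_company: "company",
};

function customerSearch(customers: Customer[], mode: SearchMode, query: string): Customer[] {
  const field = FIELD_FOR_MODE[mode];
  const needle = query.toLowerCase();
  // Case-insensitive partial match on the field selected by the mode.
  return customers.filter((c) => c[field].toLowerCase().includes(needle));
}

const sampleCustomers: Customer[] = [
  { id: "c1", name: "Ada Park", email: "ada@acme.example", company: "Acme Corp" },
  { id: "c2", name: "Ben Acme", email: "ben@globex.example", company: "Globex" },
];

console.log(customerSearch(sampleCustomers, "by_company", "acme").map((c) => c.id)); // ["c1"]
console.log(customerSearch(sampleCustomers, "by_name", "acme").map((c) => c.id));    // ["c2"]
```

The trade-off is that the mode enum now carries the disambiguation burden, so its parameter description needs the same care as a tool description.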

Dynamic tool exposure. For specialized subsets of tools, use session context to expose only the relevant tool subset. A billing agent doesn't need the inventory tools. Dynamically filter the tool list at tools/list response time based on the session's declared capabilities:

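import { ListToolsRequestSchema } from '@modelcontextprotocol/sdk/types.js';

// `server` is a low-level Server instance from @modelcontextprotocol/sdk.
// getSessionCapabilities() and getAllTools() are application helpers, not SDK calls,
// and `tags` is a server-side field on your own tool records.
server.setRequestHandler(ListToolsRequestSchema, async (request) => {
  const sessionCapabilities = getSessionCapabilities(request);
  const allTools = getAllTools();

  // Billing sessions see billing and customer tools; other sessions see everything except billing
  const relevantTools = allTools.filter(tool =>
    sessionCapabilities.includes('billing')
      ? tool.tags?.includes('billing') || tool.tags?.includes('customer')
      : !tool.tags?.includes('billing')
  );

  // Strip the internal `tags` field before returning definitions to the client
  return { tools: relevantTools.map(({ tags, ...tool }) => tool) };
});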

Hierarchical tools. For very large tool sets, implement a two-level structure: a list_capabilities tool that returns a summary of available tool groups, and then load the specific tools for a group on demand. This is more complex but scales to 100+ tools without degrading selection accuracy.
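
The two-level structure can be sketched as a small catalog that tracks which groups a session has loaded. Everything here is a hypothetical illustration: ToolCatalog, the load_tool_group tool name, and the data shapes are assumptions, and a real server would also need to tell the client that the tool list changed (the MCP specification defines a notifications/tools/list_changed notification for that purpose).

```typescript
interface ToolDef { name: string; description: string }
interface ToolGroup { summary: string; tools: ToolDef[] }

// Entry-point tools that are always visible. Both names are hypothetical.
const ENTRY_TOOLS: ToolDef[] = [
  { name: "list_capabilities", description: "List available tool groups with a one-line summary each." },
  { name: "load_tool_group", description: "Make the tools of one group available in this session." },
];

class ToolCatalog {
  private groups: Record<string, ToolGroup>;
  private loaded = new Set<string>();

  constructor(groups: Record<string, ToolGroup>) {
    this.groups = groups;
  }

  // Payload for list_capabilities: summaries only, no full tool definitions.
  listCapabilities(): { group: string; summary: string; toolCount: number }[] {
    return Object.entries(this.groups).map(([group, g]) => ({
      group,
      summary: g.summary,
      toolCount: g.tools.length,
    }));
  }

  // Handler body for load_tool_group: mark one group as visible for this session.
  loadGroup(group: string): void {
    if (!Object.prototype.hasOwnProperty.call(this.groups, group)) {
      throw new Error(`Unknown tool group: ${group}`);
    }
    this.loaded.add(group);
  }

  // What tools/list should return right now: entry points plus loaded groups.
  visibleTools(): ToolDef[] {
    const visible = ENTRY_TOOLS.slice();
    Array.from(this.loaded).forEach((g) => visible.push(...this.groups[g].tools));
    return visible;
  }
}
```

With this shape the model initially sees two tools regardless of catalog size, and each loaded group adds only its own tools to the context.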

Annotations and their effect on LLM reasoning

MCP's tool annotations (readOnlyHint, destructiveHint, idempotentHint, openWorldHint) travel with the tool definition in the tools/list response. They are hints, not guarantees, and the specification tells clients to treat them as untrusted unless the server itself is trusted. Well-behaved clients like Claude Desktop use them to gate user confirmation dialogs. When a client passes them through to the model, they can also inform tool selection reasoning:

readOnlyHint: true signals that calling this tool has no side effects. LLMs are more willing to call read-only tools exploratorily. Set this on all query/search/get tools.
destructiveHint: true signals irreversibility. LLMs give extra pause before calling destructive tools and are more likely to ask for confirmation. Set on delete/void/cancel operations. The spec default is true for tools that are not read-only, so set destructiveHint: false explicitly on additive writes such as create.
idempotentHint: true signals that calling this tool multiple times with the same arguments is safe. LLMs are more confident re-calling idempotent tools on retry after ambiguous errors.

server.registerTool('invoice_void', {
  description: 'Permanently void an invoice. It cannot be recovered after voiding. ' +
    'Use only when the invoice was created in error. For paid invoices, use invoice_refund instead.',
  inputSchema: {
    invoice_id: z.string().describe('UUID of the invoice to void. Cannot be undone.')
  },
  annotations: {
    destructiveHint: true,
    idempotentHint: false, // calling twice on a voided invoice returns an error, not the same result
    title: 'Void Invoice'
  }
}, async ({ invoice_id }) => {
  // implementation
});
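
On the client side, a confirmation gate driven by these hints might look like the sketch below. The field names and defaults follow the MCP specification (readOnlyHint defaults to false, destructiveHint to true, idempotentHint to false); the ToolAnnotations interface is a local copy for self-containment, and the gating policy itself (gateFor, safeToRetry) is an assumption, not how any particular client behaves.

```typescript
// Local copy of the annotation fields defined by the MCP specification.
interface ToolAnnotations {
  title?: string;
  readOnlyHint?: boolean;
  destructiveHint?: boolean;
  idempotentHint?: boolean;
  openWorldHint?: boolean;
}

type Gate = "auto" | "confirm";

// Hypothetical policy: read-only tools run automatically; anything that may be
// destructive (including tools with no annotations at all) needs confirmation.
function gateFor(annotations: ToolAnnotations = {}): Gate {
  if (annotations.readOnlyHint ?? false) return "auto";
  return (annotations.destructiveHint ?? true) ? "confirm" : "auto";
}

// Hypothetical retry policy: only repeat calls that are read-only or idempotent.
function safeToRetry(annotations: ToolAnnotations = {}): boolean {
  return (annotations.readOnlyHint ?? false) || (annotations.idempotentHint ?? false);
}

console.log(gateFor({ readOnlyHint: true }));                              // "auto"
console.log(gateFor({ destructiveHint: true, idempotentHint: false }));    // "confirm"
console.log(gateFor({}));                                                  // "confirm"
console.log(safeToRetry({ destructiveHint: true, idempotentHint: false })); // false
```

Because an unannotated tool falls through to "confirm" under the spec defaults, leaving annotations off a harmless write tool costs users an unnecessary prompt.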

Testing tool selection accuracy

Tool selection accuracy should be tested before deploying a new tool set. The test: write 20–50 natural language requests that should trigger specific tools, run them through a test LLM session, and check which tool was actually called against which should have been called.

// tool-selection-test.ts
const testCases = [
  { prompt: "Find customers named Acme", expected: "customer_search" },
  { prompt: "Get me customer ID abc-123", expected: "customer_get" },
  { prompt: "List all active customers", expected: "customer_list" },
  { prompt: "Create a new invoice for customer abc-123 for $500", expected: "invoice_create" },
  { prompt: "Cancel invoice inv-456", expected: "invoice_void" },
];

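// runAgentSession() is your own test harness, not an SDK function: it sends the prompt
// to the model with the server's tools attached and returns the tool calls it made.
let correct = 0;
for (const tc of testCases) {
  const result = await runAgentSession(tc.prompt);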
  const firstToolCall = result.toolCalls[0]?.name;
  if (firstToolCall === tc.expected) correct++;
  else console.error(`FAIL: "${tc.prompt}" → ${firstToolCall} (expected ${tc.expected})`);
}
console.log(`Accuracy: ${(correct / testCases.length * 100).toFixed(0)}%`);

Target selection accuracy above 90% before deploying. If you fall short of that, and especially if you're below 80%, audit your descriptions for the failing cases. The cause is usually one of: a missing disambiguation clause, an ambiguous name, or a description too similar to another tool's.
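
A single accuracy number hides which tools are being confused with each other. The following sketch groups failures into expected/actual pairs so the audit can start with the most frequent confusion; summarizeSelections and the SelectionResult shape are hypothetical and assume you have already collected results from a harness like the one above.

```typescript
interface SelectionResult {
  prompt: string;
  expected: string;
  actual: string | undefined; // undefined when the model made no tool call
}

function summarizeSelections(results: SelectionResult[]) {
  const confusions = new Map<string, number>();
  let correct = 0;
  for (const r of results) {
    if (r.actual === r.expected) {
      correct++;
      continue;
    }
    // Count each (expected -> actual) pair so repeated confusions stand out.
    const key = `${r.expected} -> ${r.actual ?? "(no tool call)"}`;
    confusions.set(key, (confusions.get(key) ?? 0) + 1);
  }
  const accuracy = results.length === 0 ? 0 : correct / results.length;
  // Most frequent confusion first.
  const ranked = Array.from(confusions.entries()).sort((a, b) => b[1] - a[1]);
  return { accuracy, ranked };
}

const report = summarizeSelections([
  { prompt: "Find customers named Acme", expected: "customer_search", actual: "customer_search" },
  { prompt: "List all active customers", expected: "customer_list", actual: "customer_search" },
  { prompt: "Show every customer", expected: "customer_list", actual: "customer_search" },
  { prompt: "Get me customer ID abc-123", expected: "customer_get", actual: "customer_get" },
  { prompt: "Cancel invoice inv-456", expected: "invoice_void", actual: undefined },
]);
console.log(report.accuracy); // 0.4
console.log(report.ranked);   // [["customer_list -> customer_search", 2], ["invoice_void -> (no tool call)", 1]]
```

A pair that recurs, such as customer_list being answered with customer_search, usually points at one missing "Do NOT use when" clause rather than a general model weakness.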

Frequently asked questions

Should tool names use snake_case or camelCase?

Snake_case is the community standard for MCP tools (consistent with the SDK examples and most published MCP servers). LLMs have no strong preference between the two — they can call tools by either naming convention. What matters is consistency within your server: mixing customer_search with getInvoice and list-orders in the same tool list creates unnecessary cognitive load. Pick one convention and apply it everywhere. If you're building on top of an existing API with a different naming convention, normalize to snake_case at the MCP tool layer even if the underlying API uses camelCase.
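
Normalizing names at the MCP layer can be done with a small conversion function. This is a minimal sketch; toSnakeCase is a hypothetical helper, and it deliberately does not try to split runs of capitals (acronyms such as HTTP), which you would need to handle separately if your API uses them.

```typescript
function toSnakeCase(name: string): string {
  return name
    .replace(/([a-z0-9])([A-Z])/g, "$1_$2") // split camelCase boundaries: getInvoice -> get_Invoice
    .replace(/[-\s.]+/g, "_")               // unify kebab-case, spaces, and dots
    .toLowerCase();
}

console.log(toSnakeCase("getInvoice"));      // "get_invoice"
console.log(toSnakeCase("list-orders"));     // "list_orders"
console.log(toSnakeCase("customer_search")); // "customer_search"
```

Applying the conversion once, where tools are registered, keeps the underlying API names untouched while the model sees a single convention.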

How long should tool descriptions be?

Long enough to disambiguate and short enough to not waste context. A good heuristic: 2–5 sentences. One sentence for what the tool does. One for when to use it. One to two for when NOT to use it (the disambiguation clause). One for key parameter notes if they're non-obvious. Avoid documenting parameter behavior in the tool description — that belongs in the parameter's own description field. Avoid implementation details — the LLM doesn't care how the query is executed, only what the tool does from the caller's perspective.

What happens if two different MCP servers expose tools with the same name?

The MCP protocol itself does not namespace tools, so handling a collision is up to the client. Many clients prefix each tool name with the server name when multiple servers are connected (for example server_name__tool_name). This prevents collisions but creates longer, less natural tool names that models sometimes reproduce incorrectly. If you control both servers, coordinate naming to avoid overlap: don't name a customer management tool search if another server already has a search tool with different semantics. The multi-server MCP patterns guide covers namespace management across server combinations.
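
If you operate several servers, collisions can be detected before a client ever connects. The sketch below assumes you can obtain each server's tool names as plain strings; findCollisions and qualify are hypothetical helpers, and the double-underscore separator is just one convention some clients use, not something the protocol mandates.

```typescript
// Return every tool name that is exposed by more than one server.
function findCollisions(servers: Record<string, string[]>): string[] {
  const owners = new Map<string, string[]>();
  for (const [server, tools] of Object.entries(servers)) {
    for (const tool of tools) {
      owners.set(tool, [...(owners.get(tool) ?? []), server]);
    }
  }
  return Array.from(owners.entries())
    .filter(([, serverNames]) => serverNames.length > 1)
    .map(([tool]) => tool)
    .sort();
}

// Build a namespaced name the way a prefixing client might.
function qualify(server: string, tool: string, separator = "__"): string {
  return `${server}${separator}${tool}`;
}

const connected = {
  crm: ["search", "customer_get"],
  docs: ["search", "document_get"],
};
console.log(findCollisions(connected)); // ["search"]
console.log(qualify("crm", "search"));  // "crm__search"
```

Renaming the colliding tools to crm-specific and docs-specific names at the source removes the need for prefixing in the first place.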

Can I change tool names after deployment without breaking agents?

Changing a tool name is a breaking change for any agent that has the old tool name hardcoded in a system prompt or in persisted tool call history. For gradual migration: keep the old tool name as an alias (registered separately with the same handler), update your descriptions to deprecate it ("deprecated — use new_name instead. Will be removed 2026-09-01"), and update your system prompt to reference the new name. Remove the alias after agents have been updated. See MCP server versioning for broader API evolution patterns.
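
The alias approach can be sketched with a plain registry, independent of any SDK. The registry map, registerWithAlias, and the example names and removal date are all hypothetical; in a real server the two set calls would be two tool registrations that share one handler function.

```typescript
type Handler = (args: Record<string, unknown>) => Promise<string>;
interface RegisteredTool { description: string; handler: Handler }

// Stand-in for the server's tool table.
const registry = new Map<string, RegisteredTool>();

function registerWithAlias(
  newName: string,
  oldName: string,
  description: string,
  removalDate: string,
  handler: Handler,
): void {
  // The new name carries the normal description.
  registry.set(newName, { description, handler });
  // The old name stays callable but steers the model toward the new one.
  registry.set(oldName, {
    description: `Deprecated: use ${newName} instead. Will be removed ${removalDate}. ${description}`,
    handler, // same handler, so behavior cannot drift between the two names
  });
}

registerWithAlias(
  "customer_search",
  "search_customers",
  "Search customers by name, email, or company.",
  "2026-09-01",
  async (args) => `searched for ${String(args.query)}`,
);
console.log(registry.get("search_customers")?.description);
```

Logging calls to the old name during the migration window shows when it is safe to remove the alias.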