
MCP server property-based testing

Example-based tests verify specific inputs you thought to write down. Property-based tests verify claims that must hold for every input in a defined domain — and they generate hundreds of random inputs automatically to check them. For MCP tool handlers, property testing is particularly valuable because LLM clients send unpredictable arguments: empty strings, very long text, Unicode emoji, nulls inside objects, integers at the boundaries of 32-bit ranges. A handler that works on your three example inputs can still crash on input 47. fast-check is the standard property-testing library for TypeScript/JavaScript; this guide covers how to integrate it with Vitest, derive arbitraries from Zod schemas, and verify the four invariants every MCP handler must satisfy.

TL;DR

Install fast-check. Write fc.assert(fc.property(arbitrary, (input) => { /* assert invariant */ })) rather than one it() per example, and use fc.asyncProperty() when the body awaits a tool call. The four invariants worth testing: (1) the handler never throws and always returns a valid CallToolResult, (2) when content[0].type === 'text' and the content is supposed to be structured JSON, JSON.parse() succeeds, (3) read-only tools are idempotent: the same args produce the same output, (4) error results always have isError: true, content[0].type === 'text', and a non-empty message. When fast-check finds a failing input it automatically shrinks it to the simplest still-failing example. Read the shrunk counterexample first, not the original random input.

What property tests catch that example tests miss

Consider a search_documents tool that accepts a query string and a limit integer. An example-based test suite might look like this:

// Example-based — only tests what you thought to write
it('returns results for a normal query', async () => {
  const result = await client.callTool({
    name: 'search_documents',
    arguments: { query: 'typescript', limit: 10 },
  });
  expect(result.isError).toBeFalsy();
});

it('returns isError when limit is zero', async () => {
  const result = await client.callTool({
    name: 'search_documents',
    arguments: { query: 'typescript', limit: 0 },
  });
  expect(result.isError).toBe(true);
});

These pass. But the handler contains this bug buried inside a helper:

// Inside the handler: a subtle crash waiting to happen
function buildSqlLike(query: string): string {
  // Interpolates raw input into SQL text. A quote in the query (for example a
  // backslash followed by a quote) breaks the statement, and the database call throws.
  return `'%${query}%'`;
}

A property test is likely to catch it within the first few runs:

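import { it, expect } from 'vitest';
import fc from 'fast-check';

it('never throws for any valid string query and positive limit', async () => {
  await fc.assert(
    fc.asyncProperty(
      // Printable ASCII by default: covers '', quotes and backslashes. For '\0' and
      // emoji, widen the generator (e.g. fc.string({ unit: 'binary' }) on recent versions).
      fc.string(),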
      fc.integer({ min: 1, max: 1000 }),
      async (query, limit) => {
        const result = await client.callTool({
          name: 'search_documents',
          arguments: { query, limit },
        });
        // Invariant: always returns a result object — never throws
        expect(result).toHaveProperty('content');
        expect(Array.isArray(result.content)).toBe(true);
      }
    ),
    { numRuns: 200 }
  );
});

fast-check generates 200 random (query, limit) pairs. Because quotes and backslashes are ordinary printable characters, a string containing a backslash-quote sequence is likely to appear early; the handler then throws instead of returning a result, and the test fails. fast-check then shrinks the failing input: the final reported counterexample might be query: "\\'", the minimal string that reproduces the crash, not the longer random string that first triggered it.
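
Once the shrunk counterexample points at buildSqlLike, one possible fix is to stop interpolating the query into SQL text at all. The sketch below is an assumption about how the helper might be rewritten, not the original handler's code: escapeLikePattern and buildLikeQuery are hypothetical names, the documents table and title column are invented for illustration, and the ? placeholder assumes a SQLite-style driver that supports bound parameters.

```typescript
// Hypothetical replacement for buildSqlLike: user input never becomes SQL text.
function escapeLikePattern(query: string): string {
  // Prefix backslash, % and _ with a backslash in a single pass,
  // so the input matches literally inside a LIKE pattern.
  return query.replace(/[\\%_]/g, (ch) => `\\${ch}`);
}

interface BoundQuery {
  sql: string;
  params: string[];
}

function buildLikeQuery(query: string): BoundQuery {
  return {
    // Table and column names are assumptions for illustration only.
    sql: "SELECT id FROM documents WHERE title LIKE ? ESCAPE '\\'",
    // The pattern travels as a bound parameter, so quotes need no escaping.
    params: [`%${escapeLikePattern(query)}%`],
  };
}
```

With this shape, the property test above should pass for the backslash-quote input because the quote is never parsed as SQL.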

The inputs that most often expose bugs in MCP handlers: empty string (""), string with only whitespace (" "), string containing SQL metacharacters (%_\), very long string (50,000 characters), Unicode combining characters and right-to-left marks, null bytes ("\0"), integers at Number.MAX_SAFE_INTEGER, negative integers where only positive is documented, and arrays with zero elements.
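
The list above can be kept as a small reusable corpus so these cases are always exercised, not just when the random generator happens to produce them. This is a sketch: the names EDGE_CASE_STRINGS and EDGE_CASE_INTEGERS and the exact values are illustrative assumptions.

```typescript
// Illustrative corpus of the inputs listed above; names and exact values
// are assumptions, so adjust them to your own tools.
const EDGE_CASE_STRINGS: readonly string[] = [
  "",                  // empty string
  "   ",               // whitespace only
  "%_\\",              // SQL LIKE metacharacters: %, _ and backslash
  "a".repeat(50_000),  // very long string
  "e\u0301",           // 'e' followed by a combining acute accent
  "\u200Fabc\u202E",   // right-to-left mark and right-to-left override
  "\0",                // null byte
];

const EDGE_CASE_INTEGERS: readonly number[] = [
  0,
  -1,                       // negative where only positive is documented
  2 ** 31 - 1,              // 32-bit signed maximum
  Number.MAX_SAFE_INTEGER,
];
```

With fast-check, a corpus like this can be mixed into a random generator, for example by combining fc.constantFrom(...EDGE_CASE_STRINGS) with fc.string() through fc.oneof().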

Setting up fast-check with Vitest

npm install --save-dev fast-check

fast-check works with both Jest and Vitest without any additional configuration — it is a plain TypeScript library that exports fc.assert() and fc.property(). For async tool calls, use fc.asyncProperty():

// src/search.property.test.ts
import { describe, it, beforeEach, afterEach } from 'vitest';
import fc from 'fast-check';
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { InMemoryTransport } from '@modelcontextprotocol/sdk/inMemory.js';
import { createServer } from './server.js';
import { createFakeDeps } from './test-helpers.js';

describe('search_documents — property invariants', () => {
  let client: Client;

  beforeEach(async () => {
    const [serverTransport, clientTransport] = InMemoryTransport.createLinkedPair();
    const server = createServer(createFakeDeps());
    await server.connect(serverTransport);
    client = new Client({ name: 'prop-test-client', version: '1.0.0' }, { capabilities: {} });
    await client.connect(clientTransport);
  });

  afterEach(async () => {
    await client.close();
  });

  it('never crashes for any string query, including the empty string', async () => {
    await fc.assert(
      fc.asyncProperty(
        fc.string({ minLength: 0, maxLength: 10_000 }),
        fc.integer({ min: 1, max: 500 }),
        async (query, limit) => {
          // Must not throw — always return a result
          const result = await client.callTool({
            name: 'search_documents',
            arguments: { query, limit },
          });
          return Array.isArray(result.content) && result.content.length > 0;
        }
      ),
      { numRuns: 300, verbose: true }
    );
  });
});

The verbose: true option adds every failing value encountered during shrinking to the report; the seed and the shrunk counterexample appear in the failure message either way. During CI you can suppress it; during local debugging it is invaluable.

Generating valid MCP tool arguments from a Zod schema

If your tool validates arguments with Zod, you want to generate inputs that satisfy the schema — not random garbage. fast-check does not read Zod schemas automatically, but you can build a matching arbitrary with a small helper:

import fc, { Arbitrary } from 'fast-check';
import { z } from 'zod';

// Maps a flat Zod object schema to a fast-check arbitrary
function arbitraryFromZodObject<T extends z.ZodRawShape>(
  shape: T
): Arbitrary<z.infer<z.ZodObject<T>>> {
  const fields: Record<string, Arbitrary<unknown>> = {};

  for (const [key, schema] of Object.entries(shape)) {
    fields[key] = arbitraryFromZodType(schema);
  }

  return fc.record(fields) as Arbitrary<z.infer<z.ZodObject<T>>>;
}

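function arbitraryFromZodType(schema: z.ZodTypeAny): Arbitrary<unknown> {
  if (schema instanceof z.ZodString) {
    // Honour .min()/.max() so generated strings actually pass validation
    const minLength = schema.minLength ?? 0;
    return fc.string({ minLength, maxLength: schema.maxLength ?? minLength + 200 });
  }
  if (schema instanceof z.ZodNumber) {
    // Honour .int()/.min()/.max(); fc.double avoids fc.float's 32-bit bound restrictions
    const min = schema.minValue ?? undefined;
    const max = schema.maxValue ?? undefined;
    return schema.isInt
      ? fc.integer({ min, max })
      : fc.double({ min, max, noNaN: true, noDefaultInfinity: true });
  }
  if (schema instanceof z.ZodBoolean)  return fc.boolean();
  if (schema instanceof z.ZodEnum)     return fc.constantFrom(...(schema.options as string[]));
  if (schema instanceof z.ZodArray)    return fc.array(arbitraryFromZodType(schema.element), { maxLength: 20 });
  if (schema instanceof z.ZodOptional) return fc.option(arbitraryFromZodType(schema.unwrap()), { nil: undefined });
  if (schema instanceof z.ZodObject)   return arbitraryFromZodObject(schema.shape);
  // Fail loudly on unsupported types instead of silently generating invalid data
  throw new Error(`Unsupported Zod type: ${schema.constructor.name}`);
}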

Use it to generate valid arguments directly from your existing schema:

const SearchArgsSchema = z.object({
  query:    z.string().min(1).max(500),
  limit:    z.number().int().min(1).max(100),
  category: z.enum(['docs', 'issues', 'prs']).optional(),
});

const validArgsArbitrary = arbitraryFromZodObject(SearchArgsSchema.shape);

Four key invariants to test in every MCP handler

1. Never throws — always returns a valid CallToolResult

When a handler throws and nothing converts the exception into a tool result, the client receives a JSON-RPC error response, which is a different shape from a tool result. LLM clients handle these differently, and often not gracefully. The invariant is that for any well-formed input, the handler returns an object with a content array and never throws.

it('invariant: never throws for any valid args', async () => {
  await fc.assert(
    fc.asyncProperty(validArgsArbitrary, async (args) => {
      let result: Awaited<ReturnType<typeof client.callTool>>;
      try {
        result = await client.callTool({ name: 'search_documents', arguments: args });
      } catch (e) {
        // A thrown error means the handler threw — this is the failure
        throw new Error(`Handler threw instead of returning isError:true — ${String(e)}`);
      }
      // Must have content array
      if (!Array.isArray(result.content) || result.content.length === 0) {
        throw new Error('Result has no content array');
      }
    }),
    { numRuns: 300 }
  );
});
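
One way to make this invariant hold by construction is to wrap every handler in an error boundary. The following is a minimal sketch under stated assumptions: withErrorBoundary is a hypothetical helper, and the TextBlock and ToolResult types are simplified stand-ins for the SDK's richer result types.

```typescript
// Simplified structural types; the real SDK types are richer.
interface TextBlock { type: "text"; text: string }
interface ToolResult { content: TextBlock[]; isError?: boolean }
type ToolHandler<A> = (args: A) => Promise<ToolResult>;

// Hypothetical wrapper: converts any thrown exception into an isError result,
// so the caller always receives a result object with a content array.
function withErrorBoundary<A>(handler: ToolHandler<A>): ToolHandler<A> {
  return async (args: A): Promise<ToolResult> => {
    try {
      return await handler(args);
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      return {
        isError: true,
        // Never emit an empty message, even if the thrown error had none
        content: [{ type: "text", text: message.trim().length > 0 ? message : "Unknown error" }],
      };
    }
  };
}
```

The property test still earns its keep after adding a wrapper like this, because it checks that every registered handler is actually wrapped.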

2. Structured text content is always valid JSON

Many MCP tools return structured data serialised as a JSON string inside a text content block. If your handler sometimes produces truncated or malformed JSON, the LLM cannot parse it. The property: whenever content[0].type === 'text' and the result is not an error, JSON.parse(content[0].text) must succeed.

it('invariant: non-error text content is always valid JSON', async () => {
  await fc.assert(
    fc.asyncProperty(validArgsArbitrary, async (args) => {
      const result = await client.callTool({ name: 'search_documents', arguments: args });
      if (result.isError) return; // error path is tested separately

      const block = result.content[0];
      if (block.type !== 'text') return; // image/resource blocks are exempt

      try {
        JSON.parse((block as { type: string; text: string }).text);
      } catch {
        throw new Error(
          `Non-error text content is not valid JSON.\nArgs: ${JSON.stringify(args)}\nContent: ${(block as { type: string; text: string }).text.slice(0, 200)}`
        );
      }
    }),
    { numRuns: 300 }
  );
});
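
A common source of malformed JSON is cutting the serialised string to fit a size limit. A sketch of an alternative, assuming the result is a list of items and the handler has a character budget (serializeWithinLimit, the items/truncated envelope, and the budget are all hypothetical): drop whole items until the serialised form fits, so the output always parses.

```typescript
// Hypothetical helper: keeps output parseable by dropping whole items,
// never by slicing the serialised string.
function serializeWithinLimit<T>(items: readonly T[], maxChars: number): string {
  let kept = items.length;
  while (kept > 0) {
    const text = JSON.stringify({
      items: items.slice(0, kept),
      truncated: kept < items.length,
    });
    if (text.length <= maxChars) return text;
    kept -= 1; // re-serialises each time: simple, but quadratic for large lists
  }
  // Nothing fits: still return valid JSON rather than a cut-off string.
  // This envelope may exceed maxChars when the budget is extremely small.
  return JSON.stringify({ items: [], truncated: items.length > 0 });
}
```

The truncated flag gives the LLM a signal that more results exist, which a silently shortened string cannot do.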

3. Read tools are idempotent — same input, same output

A read-only tool (one that does not mutate state) should return identical content for identical arguments on consecutive calls. If it does not — if the order of results changes, or a timestamp leaks in — the LLM gets inconsistent answers when it retries. The property: calling the tool twice with the same args returns the same content[0].text.

it('invariant: read tool returns same result on repeated calls', async () => {
  await fc.assert(
    fc.asyncProperty(validArgsArbitrary, async (args) => {
      const first  = await client.callTool({ name: 'search_documents', arguments: args });
      const second = await client.callTool({ name: 'search_documents', arguments: args });

      if (first.isError !== second.isError) {
        throw new Error('isError differs between calls with same args');
      }

      const text1 = (first.content[0]  as { type: string; text: string }).text;
      const text2 = (second.content[0] as { type: string; text: string }).text;

      if (text1 !== text2) {
        throw new Error(
          `Non-deterministic output for args: ${JSON.stringify(args)}\nFirst:  ${text1.slice(0, 100)}\nSecond: ${text2.slice(0, 100)}`
        );
      }
    }),
    { numRuns: 150 } // fewer runs — two calls per iteration
  );
});

If your tool has inherent non-determinism (random IDs, current time), inject a controllable clock and a seeded random source via your dependency injection layer so the invariant holds in tests.
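
A sketch of what such injected dependencies might look like. The DeterministicDeps shape and the function names are hypothetical, and mulberry32 is used here only as a convenient small PRNG for tests, not for anything security-sensitive.

```typescript
// Hypothetical dependency shape; your own DI layer will differ.
interface DeterministicDeps {
  now: () => Date;
  // Returns a fresh generator each time, always starting from the same seed
  createRandom: () => () => number;
}

// mulberry32: a small seeded PRNG producing values in [0, 1)
function seededRandom(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

function createDeterministicDeps(
  seed = 1,
  fixedIso = "2024-01-01T00:00:00.000Z",
): DeterministicDeps {
  return {
    now: () => new Date(fixedIso),          // frozen clock
    createRandom: () => seededRandom(seed), // same sequence for every call
  };
}
```

The factory matters for the idempotence property: if two consecutive tool calls shared one generator, the second call would see later values in the sequence and produce different output. Calling createRandom() at the start of each handler invocation restarts the sequence.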

4. Error results are always well-formed

When a handler encounters an error it cannot recover from (invalid arguments, upstream failure), it should return a result with isError: true, a content array with at least one text block, and a message with enough detail for the LLM to understand what went wrong. A malformed error result — empty content, missing isError, numeric text — is almost as bad as a thrown error.

// Arbitraries that SHOULD trigger the error path — violate schema constraints
const invalidArgsArbitrary = fc.oneof(
  fc.record({ query: fc.constant(''), limit: fc.integer({ min: 1, max: 10 }) }),
  fc.record({ query: fc.string(), limit: fc.integer({ min: -1000, max: 0 }) }),
  fc.record({ query: fc.string({ maxLength: 10_000 }), limit: fc.constant(9999) }),
);

it('invariant: error results are always well-formed', async () => {
  await fc.assert(
    fc.asyncProperty(invalidArgsArbitrary, async (args) => {
      const result = await client.callTool({ name: 'search_documents', arguments: args });

      if (!result.isError) return; // handler accepted the input — fine, not an error path

      // isError: true must come with a text content block
      const block = result.content[0] as { type: string; text: string } | undefined;
      if (!block) throw new Error('isError result has empty content array');
      if (block.type !== 'text') throw new Error(`isError result has non-text content[0]: ${block.type}`);
      if (!block.text || block.text.trim().length === 0) throw new Error('isError result has empty message');
    }),
    { numRuns: 200 }
  );
});

Shrinking and MCP debugging

Shrinking is fast-check's most valuable feature for MCP debugging. When a property fails, fast-check does not just report the first random input that caused it — it iteratively reduces that input, removing parts and retrying, until it finds the simplest input that still fails. For string arbitraries, shrinking means the string gets shorter. For objects, fields get removed or their values collapse toward zero/empty. For arrays, elements are removed.
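
To make the remove-and-retry loop concrete, here is a toy greedy shrinker for strings. It only illustrates the idea and is not fast-check's actual algorithm; shrinkString and crashesOnNullByte are hypothetical names.

```typescript
// Toy shrinker: repeatedly delete one character while the input still fails.
// Illustration only; fast-check's real shrinking is more sophisticated.
function shrinkString(input: string, stillFails: (s: string) => boolean): string {
  let current = input;
  let progressed = true;
  while (progressed) {
    progressed = false;
    for (let i = 0; i < current.length; i++) {
      const candidate = current.slice(0, i) + current.slice(i + 1);
      if (stillFails(candidate)) {
        current = candidate; // the smaller input still reproduces the failure
        progressed = true;
        break;
      }
    }
  }
  return current;
}

// Stand-in for "the handler crashes": fails whenever the query has a null byte
const crashesOnNullByte = (query: string): boolean => query.indexOf("\0") !== -1;

const minimalQuery = shrinkString("hello\0world\u202Etest", crashesOnNullByte);
// minimalQuery is now the single character "\0"
```

Every character that is not needed to reproduce the failure is removed, which is why the shrunk counterexample is so much easier to read than the original input.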

A real example. fast-check finds that the search_documents handler crashes. The original failing input is:

// Original failing input: random, hard to read
{
  query: "hello\uFEFFworld\u202Etest\0",
  limit: 47
}

After shrinking, fast-check reports:

// Shrunk counterexample: minimal, diagnostic
{
  query: "\0",
  limit: 1
}

// Counterexample seed — paste into your test to reproduce exactly
fc.assert(fc.property(...), { seed: 1718272841, path: "3:1:0", endOnFailure: true })

The shrunk counterexample tells you immediately: the bug involves a null byte in the query string, not Unicode RTL marks, not long strings. You open the handler and find that query.trim() does not remove \0, which then reaches an unescaped SQLite parameter. Without shrinking you would have spent time investigating whether the bug was the RTL mark or the BOM or the trailing null. Shrinking points you directly at \0.

To reproduce the exact failure later:

it('reproduces the null-byte crash (seed from CI)', async () => {
  await fc.assert(
    fc.asyncProperty(validArgsArbitrary, async (args) => {
      const result = await client.callTool({ name: 'search_documents', arguments: args });
      return Array.isArray(result.content) && result.content.length > 0;
    }),
    { seed: 1718272841, path: '3:1:0', endOnFailure: true }
  );
});

Run this test after your fix to confirm the specific case is resolved, then delete it — the main property test will continue checking thousands of other random inputs.

Generating MCP-aware arbitraries for complex schemas

MCP tool arguments often have cross-field constraints: startDate must be before endDate, or pageSize and pageToken must be provided together. Use the .map(), .chain(), and .filter() methods on arbitraries to express these constraints in the generator rather than skipping invalid inputs inside the property body.

// Generate a date range where start <= end
const dateRangeArbitrary = fc.tuple(
  fc.integer({ min: 0, max: 1_700_000_000 }),
  fc.integer({ min: 0, max: 1_700_000_000 })
).map(([a, b]) => ({
  startDate: new Date(Math.min(a, b) * 1000).toISOString(),
  endDate:   new Date(Math.max(a, b) * 1000).toISOString(),
}));

// Pagination args where cursor is only provided when page > 1
const paginationArbitrary = fc.integer({ min: 1, max: 50 }).chain((page) =>
  fc.record({
    page:   fc.constant(page),
    cursor: page === 1 ? fc.constant(undefined) : fc.string({ minLength: 8, maxLength: 32 }),
  })
);

For pagination specifically, there is an additional invariant worth testing: the same items appear in the same order regardless of page size. Fetch the same dataset twice, once with a small page size and once with a larger one, then assert that the concatenated list of IDs is identical in both cases. This catches off-by-one bugs in cursor generation that example tests almost never find.

it('invariant: pagination is consistent across page sizes', async () => {
  await fc.assert(
    fc.asyncProperty(
      fc.integer({ min: 1, max: 50 }),  // total items to create (via fake deps)
      fc.integer({ min: 1, max: 5 }),   // small page size
      fc.integer({ min: 6, max: 20 }),  // larger page size
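      async (totalItems, smallPage, largePage) => {
        // Seed the fake store so totalItems is actually used.
        // seedFakeItems is a hypothetical helper; adapt it to your own fake deps.
        await seedFakeItems(totalItems);
        const idsSmall = await fetchAllPages(client, 'list_items', { pageSize: smallPage });
        const idsLarge = await fetchAllPages(client, 'list_items', { pageSize: largePage });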

        if (JSON.stringify(idsSmall) !== JSON.stringify(idsLarge)) {
          throw new Error(
            `Pagination inconsistency: small=${JSON.stringify(idsSmall.slice(0, 5))} ` +
            `large=${JSON.stringify(idsLarge.slice(0, 5))}`
          );
        }
      }
    ),
    { numRuns: 100 }
  );
});
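
The test above relies on a fetchAllPages helper that is not shown. A minimal sketch of what it might look like, under several assumptions: the tool returns JSON text of the form {"items":[{"id":"..."}],"nextCursor":"..."}, the cursor is passed back as a cursor argument, and the ToolCaller interface is a simplified stand-in for the SDK client (a cast may be needed when passing the real Client).

```typescript
// Simplified structural types so the sketch does not depend on SDK typings
interface PageToolResult {
  isError?: boolean;
  content: Array<{ type: string; text?: string }>;
}
interface ToolCaller {
  callTool(req: { name: string; arguments: Record<string, unknown> }): Promise<PageToolResult>;
}

// Assumed response shape: {"items":[{"id":"a"}],"nextCursor":"..."}
async function fetchAllPages(
  caller: ToolCaller,
  toolName: string,
  baseArgs: { pageSize: number },
  maxPages = 1000, // guard against a cursor that never terminates
): Promise<string[]> {
  const ids: string[] = [];
  let cursor: string | undefined;
  for (let page = 0; page < maxPages; page++) {
    const args = cursor === undefined ? { ...baseArgs } : { ...baseArgs, cursor };
    const pageResult = await caller.callTool({ name: toolName, arguments: args });
    const block = pageResult.content[0];
    if (pageResult.isError || !block || block.type !== "text" || block.text === undefined) {
      throw new Error(`Page ${page} of ${toolName} returned an error or non-text content`);
    }
    const parsed = JSON.parse(block.text) as {
      items: Array<{ id: string }>;
      nextCursor?: string;
    };
    ids.push(...parsed.items.map((item) => item.id));
    if (!parsed.nextCursor) return ids; // last page reached
    cursor = parsed.nextCursor;
  }
  throw new Error(`Pagination did not terminate within ${maxPages} pages`);
}
```

The maxPages guard turns a cursor bug that loops forever into a fast, readable test failure instead of a CI timeout.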

CI integration

Property tests are more expensive than unit tests because they run the handler many times per test. Keep the configuration intentional:

// vitest.setup.ts, loaded through `setupFiles: ['./vitest.setup.ts']` in vitest.config.ts
import fc from 'fast-check';

// Global defaults for every property. A numRuns passed directly to
// fc.assert() overrides this value for that property.
fc.configureGlobal({
  numRuns: process.env.CI ? 500 : 100,
  verbose: !process.env.CI,
});

For seed control, which makes CI failures reproducible, log the seed whenever a property fails. A custom reporter replaces fast-check's default one, which is what normally throws on failure, so the reporter must throw itself or the test will pass silently:

it('never crashes for any valid input', async () => {
  await fc.assert(
    fc.asyncProperty(validArgsArbitrary, async (args) => {
      const result = await client.callTool({ name: 'search_documents', arguments: args });
      return Array.isArray(result.content) && result.content.length > 0;
    }),
    {
      numRuns: 500,
      reporter: (output) => {
        if (!output.failed) return;
        // Logs the seed and path so the CI failure is fully reproducible
        console.error(
          `PROPERTY FAILED\n` +
          `seed: ${output.seed}\n` +
          `path: ${output.counterexamplePath}\n` +
          `counterexample: ${JSON.stringify(output.counterexample, null, 2)}`
        );
        // Rethrow with the default message so the test still fails
        throw new Error(fc.defaultReportMessage(output));
      },
    }
  );
});

In GitHub Actions, the logged seed appears in the failed test output. Any developer can copy the seed and path values into the { seed, path, endOnFailure: true } options to reproduce the exact failure locally — no trial-and-error, no "it only fails in CI" investigations. Include the seed in your PR description when fixing a property failure so reviewers know which edge case you addressed.

Run property tests in a separate CI step from unit tests. Property tests are slower, and it is useful to see unit test results while a property test is still running. An illustrative split: unit tests in about 15 seconds, property tests in about 90 seconds for 2,000 total handler invocations across four properties.

Property tests and AliveMCP

Property tests verify static correctness: given any well-formed input, your handler code returns a valid result. They run during development and CI against fake dependencies with controlled behaviour. What they cannot verify is whether the deployed server responds to real MCP protocol messages over the network: the initialize handshake, capability negotiation, the actual TCP or HTTP connection. A server can pass every property test locally and still be unreachable after a deployment with a missing environment variable or a broken TLS certificate.

AliveMCP monitors the deployed server by running real MCP probes (initialize, then tools/list) every 60 seconds from external infrastructure. It alerts you when the server stops responding, regardless of why. Together, property testing and AliveMCP give you two complementary checks: property tests give strong evidence that the handler logic is correct across the input domain, by sampling it rather than proving it; AliveMCP checks that the server is alive and speaking valid MCP on the network. Neither replaces the other.

A practical workflow: add a property test when you find or fix a bug — the property test documents the invariant that was violated and prevents regression. Register the server on AliveMCP so that any infrastructure-level failure (crash, OOM, network partition) triggers an alert within 60 seconds. Both checks together mean a handler correctness bug and a production outage each take under five minutes to detect.