MCP server binary content

MCP tool results are not limited to text. A tool handler can return images, charts, screenshots, and other binary data using the image content type — a base64-encoded payload with an explicit MIME type. The LLM receives the image as a visual input alongside any text description you include. For files too large to inline (PDFs, datasets, large images), the resource content type provides a URI reference that the client can retrieve separately. Getting binary content right requires understanding the base64 size overhead, which MIME types clients actually render, how to combine text and image items in a single response, and how to handle the case where a client does not support image content.

TL;DR

Return images as { type: 'image', data: buffer.toString('base64'), mimeType: 'image/png' } in the content array. Keep inline images under 1 MB (base64-encoded), since most clients impose message-size or context limits that are often undocumented. Combine a text description with the image item for richer LLM context. Use { type: 'resource', resource: { uri, mimeType, blob } } for non-image files, and for anything large point to a URI the client can fetch rather than embedding the full payload. Always generate a thumbnail for large images before inlining them.

MCP content types

A tools/call response carries a content array where each item has a type field that determines how the client handles it:

Content type	Fields	Best for
`text`	`text` (string)	Plain text, Markdown, JSON output, error messages
`image`	`data` (base64 string), `mimeType` (string)	Screenshots, charts, diagrams, small images (< ~1 MB encoded)
`resource`	`resource` object with `uri`, optional `mimeType`, and either `text` or `blob`	Large files, PDFs, binary content better served by URI reference

Multiple items can coexist in the content array. A tool that generates a chart can return both a text summary and the chart image in the same response. The LLM receives all items and can reference both in its reply to the user.
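
As a rough sketch of the three shapes in the table above, the content items can be modelled as a discriminated union on the type field. The type names and the describeContent helper are hypothetical and local to this example; they are not the SDK's exported definitions.

```typescript
// Simplified, illustrative shapes of the three content items.
// These are local types for this sketch, not the SDK's own definitions.
type TextItem = { type: 'text'; text: string };
type ImageItem = { type: 'image'; data: string; mimeType: string };
type ResourceItem = {
  type: 'resource';
  resource: { uri: string; mimeType?: string; text?: string; blob?: string };
};
type ContentItem = TextItem | ImageItem | ResourceItem;

// Hypothetical helper: summarize each item in one line, which is handy for
// logging a tool result without dumping base64 into the log.
function describeContent(items: ContentItem[]): string[] {
  return items.map((item) => {
    switch (item.type) {
      case 'text':
        return `text (${item.text.length} chars)`;
      case 'image':
        return `image ${item.mimeType} (${item.data.length} base64 chars)`;
      case 'resource':
        return `resource ${item.resource.uri}`;
    }
  });
}
```

Switching on item.type lets the compiler narrow each branch, so a handler cannot accidentally read data from a text item.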

Returning an image

The most common binary content pattern is a screenshot or chart tool. Read the binary data into a Buffer, encode it as base64, and return it as an image content item.

import * as fs from 'node:fs/promises';
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { z } from 'zod';

const server = new McpServer({ name: 'visual-tools', version: '1.0.0' });

server.tool(
  'capture_chart',
  {
    metric: z.enum(['requests', 'latency', 'errors']),
    period_hours: z.number().int().positive().max(168).default(24),
  },
  async (args) => {
    // Render the chart to a PNG Buffer (generateChart is a placeholder for your charting code)
    const chartBuffer = await generateChart(args.metric, args.period_hours);

    return {
      content: [
        {
          type: 'text',
          text: `${args.metric} chart for the last ${args.period_hours} hours (PNG, ${chartBuffer.length} bytes):`,
        },
        {
          type: 'image',
          data: chartBuffer.toString('base64'),
          mimeType: 'image/png',
        },
      ],
    };
  }
);

Including a text description before the image item helps the LLM understand what it is looking at and gives it context to reference when replying to the user. Without a description, the LLM must infer the image's content from the pixels alone.

Supported MIME types

The MCP spec does not enumerate which MIME types clients must support. In practice, support varies by host application:

MIME type	Claude Desktop	Cursor	Notes
`image/png`	✓ Rendered inline	✓ Rendered inline	Best compatibility; use for screenshots and charts
`image/jpeg`	✓ Rendered inline	✓ Rendered inline	Lossy compression — avoid for text-heavy images
`image/gif`	✓ Rendered inline	Varies	Only first frame shown by some clients
`image/webp`	✓ Rendered inline	✓ Rendered inline	Good compression; not supported in all environments
`image/svg+xml`	Varies	Varies	Treat as text content, not image, for broader support
`application/pdf`	Not rendered	Not rendered	Extract text and return as text content instead
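
One way to apply the table is a small routing function that decides how to return a file based on its MIME type. This is a minimal sketch: the contentStrategy name and the strategy labels are hypothetical, and the mapping simply mirrors the rows above rather than any client's documented behaviour.

```typescript
// Hypothetical routing helper based on the compatibility table above.
type Strategy = 'image' | 'text' | 'extract-text' | 'resource';

// Raster formats that the table lists as rendered inline by common clients.
const INLINE_IMAGE_TYPES = new Set(['image/png', 'image/jpeg', 'image/gif', 'image/webp']);

function contentStrategy(mimeType: string): Strategy {
  // Drop parameters such as "; charset=utf-8" and normalize case.
  const base = (mimeType.split(';')[0] ?? '').trim().toLowerCase();
  if (INLINE_IMAGE_TYPES.has(base)) return 'image';
  if (base === 'image/svg+xml') return 'text'; // return the SVG markup as text
  if (base === 'application/pdf') return 'extract-text'; // extract text first
  return 'resource'; // everything else: resource content
}
```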

For maximum compatibility, use image/png. If you receive a file in another format and need to return it as viewable image content, convert it with a library like sharp before encoding:

import sharp from 'sharp';

// Convert any image to PNG before inlining
const pngBuffer = await sharp(inputBuffer)
  .resize({ width: 1200, withoutEnlargement: true })
  .png({ compressionLevel: 8 })
  .toBuffer();

return {
  content: [{
    type: 'image',
    data: pngBuffer.toString('base64'),
    mimeType: 'image/png',
  }],
};

Size limits and thumbnails

Base64 encoding inflates binary data by approximately 33% (every 3 bytes become 4 characters). A 750 KB PNG becomes roughly 1 MB of base64 in the content array. That payload travels in the tool result, and the image the client passes to the model shares the context window with the conversation and any other tool results, so very large images consume significant context budget.
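
The 3-to-4 ratio can be written down directly. The two helpers below are illustrative (the names are not from any library): one computes the exact length of padded base64 output, the other the largest raw size that fits a given encoded budget.

```typescript
// Exact length of padded base64 output for a given number of raw bytes:
// every group of up to 3 bytes becomes 4 characters.
function base64Length(rawBytes: number): number {
  return 4 * Math.ceil(rawBytes / 3);
}

// Largest raw size whose base64 encoding fits within an encoded budget.
function maxRawBytes(encodedBudget: number): number {
  return Math.floor(encodedBudget / 4) * 3;
}

console.log(base64Length(750_000)); // 1000000
console.log(maxRawBytes(1_000_000)); // 750000
```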

Image size (raw binary)	Base64 size (approx.)	Recommendation
< 500 KB	< 670 KB	Safe to inline — use directly
500 KB – 1.5 MB	670 KB – 2 MB	Generate a thumbnail first, inline the thumbnail
> 1.5 MB	> 2 MB	Use resource reference instead of inline; always thumbnail

const MAX_INLINE_BYTES = 500_000; // 500 KB raw

async function inlineOrThumbnail(buffer: Buffer, mimeType: string) {
  if (buffer.length <= MAX_INLINE_BYTES) {
    return { data: buffer.toString('base64'), mimeType };
  }

  // Downsample to 800px wide thumbnail for inlining
  const thumbnail = await sharp(buffer)
    .resize({ width: 800, withoutEnlargement: true })
    .png({ compressionLevel: 9 })
    .toBuffer();

  return { data: thumbnail.toString('base64'), mimeType: 'image/png' };
}

Using resource content for large files

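When a file is too large to inline or is not an image (PDF, CSV, archive), use the resource content type. A resource item identifies the file by URI, which the client can retrieve via the MCP Resources API or display as a download link. Note that in the MCP schema a resource item embedded in a tool result still carries its contents as text or blob, so for large files keep that payload small (a preview or excerpt) and let the client fetch the full file by URI. Newer revisions of the spec also define a separate resource_link content type for pure references; check whether your SDK version and target clients support it before relying on it.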

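import * as path from 'node:path';

// Assumed export location, chosen to match the file:///exports/ URIs below;
// adjust it to wherever your resource handler reads from
const exportsDir = '/exports';

// Store the file and return a resource reference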
server.tool('export_csv', { table: z.string() }, async (args) => {
  const data = await exportTable(args.table);
  const filename = `${args.table}-${Date.now()}.csv`;

  // Save to a location the server can serve via the Resources API
  await fs.writeFile(path.join(exportsDir, filename), data, 'utf-8');

  return {
    content: [
      {
        type: 'text',
        text: `Exported ${data.split('\n').length - 1} rows from ${args.table}.`,
      },
      {
        type: 'resource',
        resource: {
          uri: `file:///exports/${filename}`,
          mimeType: 'text/csv',
          text: data.slice(0, 500) + (data.length > 500 ? '\n… (truncated)' : ''),
        },
      },
    ],
  };
});

// For binary files, use blob (base64) instead of text.
// Note: blob embeds the whole file in the tool result, so this suits small PDFs only.
server.tool('export_pdf', { report_id: z.string() }, async (args) => {
  const pdfBuffer = await generateReport(args.report_id);
  const filename = `report-${args.report_id}.pdf`;
  await fs.writeFile(path.join(exportsDir, filename), pdfBuffer);

  return {
    content: [
      { type: 'text', text: `Generated PDF report for ${args.report_id}.` },
      {
        type: 'resource',
        resource: {
          uri: `file:///exports/${filename}`,
          mimeType: 'application/pdf',
          blob: pdfBuffer.toString('base64'),
        },
      },
    ],
  };
});

Screenshot and vision tool patterns

Screenshot tools are one of the most common uses of binary content. The pattern is: launch a headless browser (or use an OS-level screenshot API), capture the image to a Buffer, optionally resize it, and return as an image content item.

import puppeteer from 'puppeteer';

server.tool(
  'screenshot_url',
  {
    url: z.string().url(),
    width: z.number().int().positive().max(1920).default(1280),
    height: z.number().int().positive().max(1080).default(800),
  },
  async (args, extra) => {
    const browser = await puppeteer.launch({ headless: true });
    try {
      const page = await browser.newPage();
      await page.setViewport({ width: args.width, height: args.height });

      // Respect cancellation — check before long operations
      if (extra.signal.aborted) {
        return { content: [{ type: 'text', text: 'Cancelled.' }] };
      }

      await page.goto(args.url, { waitUntil: 'networkidle0', timeout: 15000 });
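      // Recent Puppeteer versions return a Uint8Array; wrap it so Buffer methods
      // such as toString('base64') behave as expected
      const screenshot = Buffer.from(await page.screenshot({ type: 'png' }));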

      // Resize if over limit
      const MAX_BYTES = 500_000;
      const finalBuffer = screenshot.length > MAX_BYTES
        ? await sharp(screenshot).resize({ width: 800 }).png({ compressionLevel: 9 }).toBuffer()
        : screenshot;

      return {
        content: [
          { type: 'text', text: `Screenshot of ${args.url} (${finalBuffer.length} bytes):` },
          { type: 'image', data: finalBuffer.toString('base64'), mimeType: 'image/png' },
        ],
      };
    } finally {
      await browser.close();
    }
  }
);

Testing binary content

Integration tests for binary content should assert structure, not pixel values. Check that the content item is present, has the correct type and MIME type, and that the data is valid base64 that decodes to a plausible size.

import { InMemoryTransport } from '@modelcontextprotocol/sdk/inMemory.js';
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { describe, it, expect } from 'vitest';

describe('capture_chart', () => {
  it('returns a PNG image content item', async () => {
    const [clientTransport, serverTransport] = InMemoryTransport.createLinkedPair();
    const server = createServer(); // your server factory
    await server.connect(serverTransport);

    const client = new Client({ name: 'test', version: '1.0.0' }, { capabilities: {} });
    await client.connect(clientTransport);

    const result = await client.callTool({ name: 'capture_chart', arguments: { metric: 'requests' } });

    expect(result.isError).toBeFalsy();
    expect(result.content).toHaveLength(2);

    const textItem = result.content[0];
    expect(textItem.type).toBe('text');
    expect(textItem.text).toContain('chart');

    const imageItem = result.content[1];
    expect(imageItem.type).toBe('image');
    expect(imageItem.mimeType).toBe('image/png');

    // Valid base64 that decodes to a plausible size (> 100 bytes)
    const decoded = Buffer.from(imageItem.data as string, 'base64');
    expect(decoded.length).toBeGreaterThan(100);

    // PNG magic bytes: 89 50 4E 47
    expect(decoded[0]).toBe(0x89);
    expect(decoded[1]).toBe(0x50);
    expect(decoded[2]).toBe(0x4e);
    expect(decoded[3]).toBe(0x47);

    await client.close();
    await server.close();
  });
});
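
The structural checks above can be pulled into a reusable helper. The sketch below is one possible version: isPngBase64 is a hypothetical name, it checks the full 8-byte PNG signature, and it assumes the data is canonical padded base64 (which is what Buffer's toString('base64') produces).

```typescript
// The 8-byte signature that starts every PNG file.
const PNG_SIGNATURE = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

// Hypothetical test helper: true when `data` is canonical base64 that
// decodes to bytes starting with the PNG signature.
function isPngBase64(data: string): boolean {
  const decoded = Buffer.from(data, 'base64');
  // Buffer.from is lenient about invalid characters, so re-encode and
  // compare to reject strings that are not canonical padded base64.
  if (decoded.toString('base64') !== data) return false;
  return (
    decoded.length > PNG_SIGNATURE.length &&
    decoded.subarray(0, PNG_SIGNATURE.length).equals(PNG_SIGNATURE)
  );
}
```

In a test, expect(isPngBase64(imageItem.data)).toBe(true) would then replace the byte-by-byte assertions.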

Frequently asked questions

Can I return multiple images in a single tool response?

Yes. The content array accepts any number of items in any order. A comparison tool might return a before-and-after pair as two consecutive image items, each preceded by a text label. There is no protocol-enforced limit on the number of image items; the practical limit is the client's context window and rendering performance. Three to five images per response is a reasonable ceiling before the response becomes unwieldy.
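
A before-and-after response of this kind could be assembled with a small helper. This is a sketch only: labelledImages and the local ContentItem type are hypothetical, and the images are assumed to be PNG buffers already.

```typescript
// Local, simplified content item shapes for this sketch.
type ContentItem =
  | { type: 'text'; text: string }
  | { type: 'image'; data: string; mimeType: string };

// Hypothetical helper: emit a text label followed by its image, in order.
function labelledImages(images: { label: string; png: Buffer }[]): ContentItem[] {
  const content: ContentItem[] = [];
  for (const { label, png } of images) {
    content.push({ type: 'text', text: `${label}:` });
    content.push({ type: 'image', data: png.toString('base64'), mimeType: 'image/png' });
  }
  return content;
}
```

A comparison tool would return { content: labelledImages([{ label: 'Before', png: beforePng }, { label: 'After', png: afterPng }]) }, where beforePng and afterPng stand in for buffers your tool produced.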

What happens if the client does not support image content?

Clients that do not support image rendering will either display a placeholder, skip the item, or show the base64 data as raw text (which is not useful). The safest approach for tools targeting a broad range of clients is to always include a text item with a description alongside any image, so the LLM has readable context even if the image is not rendered. MCP's standard client capabilities do not advertise image support, so a text-only fallback has to key off something else, such as the client name reported during initialization or a server-side configuration flag.
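
A minimal sketch of that fallback: always emit the description, and attach the image only when the server believes the client can render it. The imageWithFallback name and the clientRendersImages flag are assumptions for illustration; how you derive the flag (configuration, client name, or something else) is up to your server.

```typescript
// Local, simplified content item shapes for this sketch.
type ContentItem =
  | { type: 'text'; text: string }
  | { type: 'image'; data: string; mimeType: string };

// Hypothetical helper: the text description is always present, the image
// is included only when the (assumed) clientRendersImages flag is true.
function imageWithFallback(
  description: string,
  png: Buffer,
  clientRendersImages: boolean,
): ContentItem[] {
  const content: ContentItem[] = [{ type: 'text', text: description }];
  if (clientRendersImages) {
    content.push({ type: 'image', data: png.toString('base64'), mimeType: 'image/png' });
  }
  return content;
}
```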

How do I serve large binary files without inlining them?

Register a resource on your MCP server using server.resource() on McpServer, or a resources/read request handler via setRequestHandler() on the low-level Server class. Return a resource content item from your tool with the URI pointing to that registered resource. The client can then call resources/read to retrieve the full binary content. This is the correct pattern for PDFs, archives, and any file larger than ~1 MB, because it avoids bloating the tool response and the LLM context with base64 data. See MCP server Resources API for the resource handler pattern.
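
The body of such a read handler might look like the sketch below, kept independent of the SDK so it can be wired into whichever registration call you use. The function names are hypothetical, and the file:///exports/<name> URI convention is an assumption that mirrors the export examples earlier in this guide. The returned object follows the shape of a resources/read result (a contents array with uri, mimeType, and blob).

```typescript
import * as fs from 'node:fs/promises';
import * as path from 'node:path';

// Map a file:///exports/<name> URI to a path inside exportsDir.
// Only the final path segment is used, so "../" tricks cannot escape the directory.
function exportPathFromUri(exportsDir: string, uri: string): string {
  const url = new URL(uri);
  if (url.protocol !== 'file:') {
    throw new Error(`Unsupported URI scheme: ${url.protocol}`);
  }
  const name = path.posix.basename(decodeURIComponent(url.pathname));
  return path.join(exportsDir, name);
}

// Read the file and wrap it in the shape of a resources/read result.
async function readExport(exportsDir: string, uri: string, mimeType: string) {
  const data = await fs.readFile(exportPathFromUri(exportsDir, uri));
  return { contents: [{ uri, mimeType, blob: data.toString('base64') }] };
}
```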

Does AliveMCP verify image content in tool responses?

AliveMCP's standard probe sends initialize and tools/list only — it does not call individual tools and therefore does not inspect tool response content types. If you configure a custom synthetic probe that calls a specific image-returning tool, AliveMCP verifies that the tool responds successfully (no isError: true and no JSON-RPC error) but does not validate the visual content of the image. Visual regression testing requires a separate test suite.

Should I convert all images to PNG before returning them?

Not necessarily — but PNG is the safest choice for maximum client compatibility. If you are returning photos or camera images where JPEG compression is acceptable and you need smaller sizes, JPEG works fine. If the image contains text or line art (diagrams, charts, screenshots of UIs), PNG's lossless compression preserves readability better than JPEG, which introduces artifacts around sharp edges and text. When in doubt, use PNG. If size is a concern, use sharp to resize before encoding rather than switching to JPEG.