MCP Server Image Processing — Sharp, ImageContent, and Vision Tools
MCP servers can process, transform, and return images as first-class content — giving AI agents vision capabilities over generated or user-supplied images. This guide covers integrating Sharp into a TypeScript MCP server, building resize, crop, and format-conversion tools, returning images as MCP ImageContent blocks, exposing image assets as MCP resources, and wiring a /health/image endpoint so AliveMCP detects when your image pipeline degrades.
TL;DR
Use Sharp for all image processing — it wraps libvips and is 4–5× faster than Jimp or Canvas for most operations. Load Sharp lazily (not at module top level) to avoid penalizing cold starts when image tools aren't called. Return image results as MCP ImageContent blocks with base64-encoded data and the correct mimeType. Cap input image sizes (20 MB max, 8000×8000 px max) to prevent memory bombs. For image resources, use URIs like image://{id} with a ReadResource handler that reads from disk or a content-addressed store. Wire /health/image to run a Sharp probe operation — a process that's alive but has Sharp initialization failures will silently fail every tool call.
Sharp setup and initialization
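Sharp is a native Node.js module that wraps libvips, so it needs a binary that matches your target platform. On macOS and most Linux distros, npm install sharp downloads a prebuilt binary. On Alpine Linux (common in Docker), make sure the install step runs for the same platform and libc as the runtime image (for example, by building with docker build --platform linux/amd64), or install libvips yourself with apk add vips-dev so Sharp can build against it.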
// image.ts: lazy-load Sharp to avoid penalizing cold starts
import { McpError, ErrorCode } from '@modelcontextprotocol/sdk/types.js';
let _sharp: typeof import('sharp') | null = null;
export async function getSharp() {
  if (!_sharp) {
    // Dynamic import: only executed when an image tool is first called
    const mod = await import('sharp');
    // Validate that Sharp and libvips initialized correctly before caching,
    // so a failed load is retried on the next call instead of being reused
    if (!mod.default.versions?.vips) {
      throw new Error('Sharp loaded but libvips not available');
    }
    _sharp = mod.default;
  }
  return _sharp;
}
// Input size guards: prevent memory bombs from huge images
export const MAX_INPUT_BYTES = 20 * 1024 * 1024; // 20 MB
export const MAX_DIMENSION = 8000; // 8000 px on either side
// Checks the encoded byte size only; pixel dimensions need a separate metadata check
export function validateImageBuffer(buf: Buffer): void {
  if (buf.length > MAX_INPUT_BYTES) {
    throw new McpError(
      ErrorCode.InvalidParams,
      `Image too large: ${(buf.length / 1e6).toFixed(1)} MB (max 20 MB)`
    );
  }
}
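The byte-size guard above does not look at pixel dimensions, so a small, highly compressed file can still decode to a very large bitmap. A minimal sketch of a dimension guard follows. The function name and the ImageDimensions shape are hypothetical; the sketch assumes the caller obtains width and height from Sharp's metadata() before running any resize or crop.

```typescript
const MAX_DIMENSION = 8000; // mirrors the constant defined in the listing above

// Hypothetical minimal shape: only the fields this check reads from metadata().
interface ImageDimensions {
  width?: number;
  height?: number;
}

// Throws when either side is unknown or larger than the limit.
function assertDimensionsWithinLimit(
  meta: ImageDimensions,
  max: number = MAX_DIMENSION
): void {
  const { width, height } = meta;
  if (width === undefined || height === undefined) {
    throw new Error('Image dimensions could not be determined');
  }
  if (width > max || height > max) {
    throw new Error(`Image dimensions ${width}x${height} exceed the ${max} px limit`);
  }
}
```

In a tool handler this would be called as assertDimensionsWithinLimit(await sharp(inputBuffer).metadata()), with the plain Error swapped for an McpError carrying ErrorCode.InvalidParams.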
Never accept an image URL from the tool caller and fetch it inside the handler without URL validation. An attacker can pass http://169.254.169.254/latest/meta-data/ and use the image tool as an SSRF vector. Either restrict image input to base64-encoded data in the tool parameters, or validate the URL against an allow-list of trusted domains before fetching.
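If you do accept URLs, the allow-list check described above might look like the following sketch. The host names and the function name are hypothetical placeholders, and the check only covers the URL itself: redirects and DNS rebinding need separate handling in whatever code performs the fetch.

```typescript
import { isIP } from 'node:net';

// Hypothetical allow-list; replace with the domains your deployment trusts.
const ALLOWED_IMAGE_HOSTS = new Set(['images.example.com', 'cdn.example.com']);

function isAllowedImageUrl(raw: string): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false; // not a parseable absolute URL
  }
  // Only HTTPS, and no credentials embedded in the URL
  if (url.protocol !== 'https:') return false;
  if (url.username !== '' || url.password !== '') return false;
  // Reject literal IP addresses (IPv6 hostnames arrive wrapped in brackets)
  const host = url.hostname.replace(/^\[|\]$/g, '');
  if (isIP(host) !== 0) return false;
  // Exact host match only; no suffix or wildcard matching
  return ALLOWED_IMAGE_HOSTS.has(url.hostname.toLowerCase());
}
```

Exact host matching is deliberately strict: a suffix check such as endsWith('example.com') would also accept evil-example.com.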
Resize and format conversion tools
The most common image tool pattern is resize plus format conversion. Callers pass base64-encoded image data and get back a processed version. Use z.string() for the base64 input rather than z.string().base64(): Zod's built-in base64 validator is strict about padding and can reject unpadded strings that Buffer.from(data, 'base64') decodes without trouble.
import { z } from 'zod';
import { McpError, ErrorCode } from '@modelcontextprotocol/sdk/types.js';
import { getSharp, validateImageBuffer, MAX_DIMENSION } from './image.js';
// `server` is the McpServer instance created elsewhere in the server setup
server.tool(
  'resize_image',
  {
    image_base64: z.string().min(1).describe('Base64-encoded image data (no data: URI prefix)'),
    input_mime_type: z.enum(['image/jpeg', 'image/png', 'image/webp', 'image/gif', 'image/tiff']),
    width: z.number().int().min(1).max(MAX_DIMENSION).optional(),
    height: z.number().int().min(1).max(MAX_DIMENSION).optional(),
    fit: z.enum(['cover', 'contain', 'fill', 'inside', 'outside']).default('inside'),
    output_format: z.enum(['jpeg', 'png', 'webp']).default('webp'),
    quality: z.number().int().min(10).max(100).default(85)
  },
  async ({ image_base64, width, height, fit, output_format, quality }) => {
    if (!width && !height) {
      throw new McpError(ErrorCode.InvalidParams, 'At least one of width or height must be provided');
    }
    const inputBuffer = Buffer.from(image_base64, 'base64');
    validateImageBuffer(inputBuffer);
    const sharp = await getSharp();
    let pipeline = sharp(inputBuffer).resize({ width, height, fit, withoutEnlargement: true });
    switch (output_format) {
      case 'jpeg': pipeline = pipeline.jpeg({ quality, mozjpeg: true }); break;
      case 'png': pipeline = pipeline.png({ compressionLevel: 6 }); break;
      case 'webp': pipeline = pipeline.webp({ quality }); break;
    }
    // resolveWithObject returns the output info alongside the buffer,
    // which avoids decoding the result a second time just to read its size
    const { data: outputBuffer, info } = await pipeline.toBuffer({ resolveWithObject: true });
    return {
      content: [{
        type: 'image',
        data: outputBuffer.toString('base64'),
        mimeType: `image/${output_format}` as 'image/jpeg' | 'image/png' | 'image/webp'
      }, {
        type: 'text',
        text: JSON.stringify({
          width: info.width,
          height: info.height,
          format: info.format,
          bytes: outputBuffer.length,
          compression_ratio: (inputBuffer.length / outputBuffer.length).toFixed(2)
        })
      }]
    };
  }
);
Return both the image content block and a text metadata block. LLM clients that support vision will render the image inline; the metadata block gives the agent programmatic access to the output dimensions and file size without parsing the image.
| fit mode | Behavior | Use case |
|---|---|---|
| inside | Scale to fit within width×height, preserving aspect ratio; never upscales when withoutEnlargement is true | Default; safe for all inputs |
| cover | Crop to fill exact width×height | Thumbnail generation |
| contain | Letterbox to fit, add background | Fixed-size output with padding |
| fill | Stretch to exact dimensions (distorts) | Rarely appropriate |
| outside | Preserve aspect ratio and scale so both dimensions meet or exceed the target | Minimum-size guarantee |
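To make the default mode concrete, here is a small sketch of the arithmetic behind fit: 'inside' combined with withoutEnlargement: true. It is an approximation for reasoning about expected output sizes, not Sharp's implementation, and Sharp's own rounding may differ by a pixel. The Size type and the function name are hypothetical.

```typescript
interface Size {
  width: number;
  height: number;
}

// Approximates fit: 'inside' with withoutEnlargement: true.
function fitInside(src: Size, target: Partial<Size>): Size {
  // A missing target side places no constraint on the scale factor
  const scaleW = target.width !== undefined ? target.width / src.width : Infinity;
  const scaleH = target.height !== undefined ? target.height / src.height : Infinity;
  // Use the smaller factor so both sides fit; cap at 1 so the image never grows
  const scale = Math.min(scaleW, scaleH, 1);
  return {
    width: Math.max(1, Math.round(src.width * scale)),
    height: Math.max(1, Math.round(src.height * scale)),
  };
}
```

For example, a 4000×3000 source with an 800×800 target yields 800×600, while a 400×300 source with the same target stays at 400×300 because enlargement is disabled.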
Crop tool
server.tool(
  'crop_image',
  {
    image_base64: z.string().min(1),
    left: z.number().int().min(0),
    top: z.number().int().min(0),
    width: z.number().int().min(1),
    height: z.number().int().min(1),
    output_format: z.enum(['jpeg', 'png', 'webp']).default('png')
  },
  async ({ image_base64, left, top, width, height, output_format }) => {
    const inputBuffer = Buffer.from(image_base64, 'base64');
    validateImageBuffer(inputBuffer);
    const sharp = await getSharp();
    const meta = await sharp(inputBuffer).metadata();
    // Validate crop region is within bounds (unknown dimensions count as 0, so they fail too)
    if (left + width > (meta.width ?? 0) || top + height > (meta.height ?? 0)) {
      throw new McpError(
        ErrorCode.InvalidParams,
        `Crop region [${left},${top},${width},${height}] exceeds image dimensions ${meta.width}×${meta.height}`
      );
    }
    const outputBuffer = await sharp(inputBuffer)
      .extract({ left, top, width, height })
      .toFormat(output_format) // same effect as calling .jpeg(), .png(), or .webp() with defaults
      .toBuffer();
    return {
      content: [{
        type: 'image',
        data: outputBuffer.toString('base64'),
        mimeType: `image/${output_format}` as 'image/jpeg' | 'image/png' | 'image/webp'
      }]
    };
  }
);
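The crop tool above rejects any region that crosses the image edge. If that proves too strict for agent callers that estimate coordinates, one alternative is to clamp the region to the image bounds before extracting. The following is a sketch of that alternative under stated assumptions, not part of the tool as written; the Region type and the function name are hypothetical.

```typescript
interface Region {
  left: number;
  top: number;
  width: number;
  height: number;
}

// Shrinks a crop region so it stays inside the image.
// Returns null when the region starts outside the image entirely.
function clampCropRegion(
  region: Region,
  imageWidth: number,
  imageHeight: number
): Region | null {
  if (region.left >= imageWidth || region.top >= imageHeight) return null;
  return {
    left: region.left,
    top: region.top,
    width: Math.min(region.width, imageWidth - region.left),
    height: Math.min(region.height, imageHeight - region.top),
  };
}
```

Clamping trades strictness for convenience: the caller gets an image back, but it may be smaller than requested, so it is worth reporting the final region in a text metadata block.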
Exposing images as MCP resources
MCP resources let clients browse image libraries without calling a tool. A common pattern is a content-addressed image store, with files named by a prefix of their SHA-256 hash and URIs like image://{hash}.{ext}. Agents can then reference an image by a stable URI and fetch it via resources/read.
import crypto from 'node:crypto';
import fs from 'node:fs/promises';
import path from 'node:path';
import {
  ListResourcesRequestSchema,
  ReadResourceRequestSchema,
  McpError,
  ErrorCode
} from '@modelcontextprotocol/sdk/types.js';
const IMAGE_STORE_DIR = process.env.IMAGE_STORE_DIR ?? '/tmp/mcp-images';
// Extension-to-MIME map shared by the list and read handlers
const MIME_BY_EXT: Record<string, string> = {
  png: 'image/png',
  jpg: 'image/jpeg',
  jpeg: 'image/jpeg',
  webp: 'image/webp',
  gif: 'image/gif'
};
// Store an image and return its URI
export async function storeImage(buffer: Buffer, ext: string): Promise<string> {
  await fs.mkdir(IMAGE_STORE_DIR, { recursive: true });
  // First 16 hex chars (64 bits) of the SHA-256 digest keep names short
  const hash = crypto.createHash('sha256').update(buffer).digest('hex').slice(0, 16);
  const filename = `${hash}.${ext}`;
  await fs.writeFile(path.join(IMAGE_STORE_DIR, filename), buffer);
  return `image://${filename}`;
}
// List all stored images
// Assumes `server` exposes setRequestHandler (the low-level Server class)
server.setRequestHandler(ListResourcesRequestSchema, async () => {
  let files: string[] = [];
  try {
    files = await fs.readdir(IMAGE_STORE_DIR);
  } catch {
    files = []; // store directory does not exist yet
  }
  return {
    resources: files
      .filter(f => /\.(png|jpg|jpeg|webp|gif)$/.test(f))
      .map(f => ({
        uri: `image://${f}`,
        name: f,
        mimeType: MIME_BY_EXT[path.extname(f).slice(1)] ?? 'application/octet-stream'
      }))
  };
});
// Read a specific image
server.setRequestHandler(ReadResourceRequestSchema, async (request) => {
  const { uri } = request.params;
  if (!uri.startsWith('image://')) {
    throw new McpError(ErrorCode.InvalidParams, `Unknown resource URI: ${uri}`);
  }
  const filename = uri.slice('image://'.length);
  // Prevent path traversal
  if (filename.includes('/') || filename.includes('\\') || filename.includes('..')) {
    throw new McpError(ErrorCode.InvalidParams, 'Invalid resource URI');
  }
  const ext = path.extname(filename).slice(1).toLowerCase();
  const mimeType = MIME_BY_EXT[ext];
  if (!mimeType) {
    throw new McpError(ErrorCode.InvalidParams, `Unsupported image type: ${uri}`);
  }
  const filePath = path.join(IMAGE_STORE_DIR, filename);
  const buffer = await fs.readFile(filePath).catch(() => {
    throw new McpError(ErrorCode.InvalidParams, `Resource not found: ${uri}`);
  });
  return {
    contents: [{
      uri,
      mimeType,
      blob: buffer.toString('base64')
    }]
  };
});
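Because storeImage() produces names with a fixed shape (16 hex characters plus an extension), the read handler can validate URIs against that shape instead of checking for individual bad characters. The sketch below shows one way to do that; parseImageUri and the pattern are hypothetical helpers, and the extension list assumes the same formats as the list handler above.

```typescript
// Matches only names that storeImage() can produce: 16 hex chars plus a known extension.
const IMAGE_URI_PATTERN = /^image:\/\/([0-9a-f]{16}\.(png|jpe?g|webp|gif))$/;

const MIME_BY_EXTENSION: Record<string, string> = {
  png: 'image/png',
  jpg: 'image/jpeg',
  jpeg: 'image/jpeg',
  webp: 'image/webp',
  gif: 'image/gif',
};

// Returns the filename and MIME type for a valid URI, or null for anything else.
function parseImageUri(uri: string): { filename: string; mimeType: string } | null {
  const match = IMAGE_URI_PATTERN.exec(uri);
  if (!match) return null;
  const filename = match[1];
  const ext = match[2];
  const mimeType = ext ? MIME_BY_EXTENSION[ext] : undefined;
  if (!filename || !mimeType) return null;
  return { filename, mimeType };
}
```

An allow-pattern like this rejects traversal attempts as a side effect, since slashes and dots outside the extension can never match.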
Health endpoint for image processing monitoring
A naive health check that only pings the process will miss Sharp initialization failures — for example, a missing libvips binary after a botched deployment, or a memory-constrained container where Sharp fails to allocate. Wire a health endpoint that actually runs a Sharp operation.
// `http` is assumed to be a Fastify instance (reply.code().send() is Fastify's API)
http.get('/health/image', async (req, reply) => {
  const start = Date.now();
  try {
    const sharp = await getSharp();
    // Probe: create a 10×10 solid color PNG, which is cheap and tests the full pipeline
    const probeBuffer = await sharp({
      create: {
        width: 10,
        height: 10,
        channels: 3,
        background: { r: 0, g: 128, b: 0 }
      }
    }).png().toBuffer();
    const meta = await sharp(probeBuffer).metadata();
    return reply.send({
      status: 'ok',
      latency_ms: Date.now() - start,
      sharp_version: sharp.versions?.sharp,
      vips_version: sharp.versions?.vips,
      probe_width: meta.width,
      probe_height: meta.height
    });
  } catch (err) {
    return reply.code(503).send({
      status: 'error',
      detail: err instanceof Error ? err.message : String(err),
      latency_ms: Date.now() - start
    });
  }
});
The probe creates a 10×10 green PNG — fast enough to run on every health check cycle without meaningful overhead. If libvips is missing or Sharp fails to allocate, this probe returns 503 and triggers an AliveMCP alert before any real image processing tools are called.
Silent failure modes
| Failure | Symptom | Caught by process ping? | Detection |
|---|---|---|---|
| Sharp/libvips not installed | All image tool calls throw on first use | No | /health/image with probe operation |
| Out-of-memory on large image | Process crash (or OOM kill in container) | After crash, yes | Cap input size; monitor container memory |
| Corrupt input buffer | Sharp throws "Input buffer contains unsupported image format" | No (tool call fails with 200) | Log Sharp errors with input metadata; return structured McpError |
| Image store disk full | fs.writeFile throws ENOSPC | No | Monitor disk usage; add disk-space check to /health/image |
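The disk-space check suggested in the last row could be sketched as follows. It assumes Node.js 18.15 or later, where fs/promises exposes statfs; the 500 MB threshold and the function name are hypothetical and should be tuned to your image store.

```typescript
import { statfs } from 'node:fs/promises';

// Hypothetical threshold: report unhealthy below 500 MB of free space.
const MIN_FREE_BYTES = 500 * 1024 * 1024;

// Reports the space available to unprivileged processes on the
// filesystem that contains `dir`.
async function checkDiskSpace(
  dir: string,
  minFreeBytes: number = MIN_FREE_BYTES
): Promise<{ ok: boolean; freeBytes: number }> {
  const stats = await statfs(dir);
  const freeBytes = stats.bavail * stats.bsize;
  return { ok: freeBytes >= minFreeBytes, freeBytes };
}
```

Inside the /health/image handler, the result of checkDiskSpace(IMAGE_STORE_DIR) could be added to the response body, returning 503 when ok is false.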
Frequently asked questions
How do I accept an image from the user in an MCP tool?
Define the parameter as image_base64: z.string().min(1) and document that callers should pass the raw base64-encoded bytes without a data:image/png;base64, prefix — the MCP transport layer is JSON and the tool receives a plain string. If you need to support both formats (with and without the data URI prefix), strip the prefix in the handler: const data = image_base64.replace(/^data:[^;]+;base64,/, ''). Then Buffer.from(data, 'base64') works correctly in both cases.
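A minimal sketch of that prefix handling, wrapped in a helper. The name decodeImageParam is hypothetical; the empty-result check is an added safeguard, since Buffer.from skips invalid characters instead of throwing.

```typescript
// Accepts raw base64 or a full data URI and returns the decoded bytes.
function decodeImageParam(imageBase64: string): Buffer {
  // Strip an optional data URI prefix such as "data:image/png;base64,"
  const data = imageBase64.replace(/^data:[^;]+;base64,/, '').trim();
  const buf = Buffer.from(data, 'base64');
  // Buffer.from ignores invalid characters, so garbage input decodes to nothing
  if (buf.length === 0) {
    throw new Error('image_base64 did not decode to any bytes');
  }
  return buf;
}
```

In a tool handler, the returned buffer would then go through the same size validation as any other input.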
Can I return multiple images in one tool call?
Yes — return an array of ImageContent blocks in the content array. LLM clients that support vision will render each one in sequence. Limit the number of images per response to 4–6; very large responses (many large images) can exceed client buffer limits and cause the tool call to fail silently. If a tool produces many images (e.g., a batch resize), consider returning them as stored resources with URIs rather than inline base64 data, and let the client read individual resources on demand.
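One way to apply that guidance is a small decision helper that picks inline delivery or stored resources from the image count and total payload size. This is a sketch: the count limit follows the lower end of the 4–6 range above, while the 4 MB encoded-size budget and the function name are hypothetical.

```typescript
const MAX_INLINE_IMAGES = 4; // lower end of the 4–6 guidance
const MAX_INLINE_ENCODED_BYTES = 4 * 1024 * 1024; // hypothetical budget for the whole response

// Decides whether a batch of images should be returned inline or as resource URIs.
function chooseDelivery(imageSizesInBytes: number[]): 'inline' | 'resources' {
  if (imageSizesInBytes.length > MAX_INLINE_IMAGES) return 'resources';
  const rawBytes = imageSizesInBytes.reduce((sum, n) => sum + n, 0);
  // base64 encodes every 3 bytes as 4 characters, so payloads grow by about a third
  const encodedBytes = Math.ceil(rawBytes / 3) * 4;
  return encodedBytes > MAX_INLINE_ENCODED_BYTES ? 'resources' : 'inline';
}
```

When the helper returns 'resources', the tool can store each image and return the resulting URIs in a text block.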
How do I handle animated GIFs with Sharp?
Sharp supports animated GIFs and WebPs when built with libvips' animation support. Pass { animated: true } to the Sharp constructor for animated inputs: sharp(buffer, { animated: true }). Converting an animated GIF to WebP (which compresses 40–60% better) works transparently. However, resizing animated images is slow — Sharp processes each frame. For MCP tool use, consider converting animated GIFs to static frames or limiting animated processing to files under 2 MB.
What MIME types does MCP's ImageContent support?
The MCP spec defines ImageContent with a mimeType field. In practice, support depends on the LLM client. Claude clients support image/jpeg, image/png, image/gif, and image/webp. Stick to these four formats. Do not return image/tiff, image/bmp, or image/svg+xml as ImageContent — they may be ignored or cause errors in some clients. If you need to deliver SVG, return it as a TextContent block with the SVG markup.
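The rule above can be enforced at the point where a tool builds its response. The sketch below uses a simplified, hypothetical ContentBlock type rather than the SDK's own types, and treats anything outside the four listed formats (other than SVG) as an error the caller should fix by converting first.

```typescript
// Simplified stand-in for the MCP content block shapes used in this guide.
type ContentBlock =
  | { type: 'image'; data: string; mimeType: string }
  | { type: 'text'; text: string };

const IMAGE_CONTENT_MIME_TYPES = new Set([
  'image/jpeg',
  'image/png',
  'image/gif',
  'image/webp',
]);

// Builds an image block for supported formats and a text block for SVG markup.
function toContentBlock(bytes: Buffer, mimeType: string): ContentBlock {
  if (IMAGE_CONTENT_MIME_TYPES.has(mimeType)) {
    return { type: 'image', data: bytes.toString('base64'), mimeType };
  }
  if (mimeType === 'image/svg+xml') {
    return { type: 'text', text: bytes.toString('utf8') };
  }
  throw new Error(`Unsupported MIME type ${mimeType}: convert to PNG, JPEG, GIF, or WebP first`);
}
```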
Further reading
- MCP Server Playwright — screenshot tools and browser automation
- MCP Server PDF Tools — text extraction and page chunking
- MCP Server Binary Content — handling non-text responses in MCP tools
- MCP Server Health Check — designing endpoints for uptime monitors
- MCP Server Error Handling — mapping Sharp errors to McpError codes