Multi-modal guide · 2026-06-22 · Multi-modal & Media Integration

Multi-modal MCP Servers: Playwright Screenshots, Sharp Images, PDF Extraction, S3 Storage, and FFmpeg Transcription

Text tools are the entry point for MCP servers, but the most useful agents work with the full range of media: screenshots, images, documents, files, and audio. Each media integration — Playwright browser automation, Sharp image processing, PDF extraction, S3 file storage, and FFmpeg transcription — introduces an external native dependency that can fail independently of the Node.js process. When it does, your HTTP health check stays green while every media tool silently breaks. This guide synthesizes the five integrations into a unified reliability pattern: what each dependency brings, the exact silent failure mode it creates, and the /health endpoint that surfaces that failure to AliveMCP before callers encounter it.

The five integrations at a glance

Integration	What it enables	Native dependency	Silent failure mode	Health signal
Playwright	Screenshots, page scraping, browser automation	Chromium subprocess	Chromium crashes — process alive, all browser tools timeout	`/health/browser`: blank-page navigation probe
Sharp	Resize, crop, format conversion, metadata extraction	libvips native binary	libvips init failure — Sharp import succeeds, first resize throws	`/health/image`: create 10×10 solid-color PNG
PDF tools	Text extraction, page chunking, RAG ingestion	pdf-parse / pdfjs-dist	Scanned PDF (no text layer) — extraction returns empty string, no error	`/health/pdf`: extract known-good minimal PDF
S3 tools	Upload, download, presigned URLs, file listing	AWS IAM credentials	IAM policy change removes `PutObject` — reads work, writes fail silently	`/health/s3`: write-read-delete canary object
FFmpeg	Audio transcription, video metadata, format probing	ffmpeg + ffprobe binaries	Missing binary after container rebuild — process starts, all media tools throw ENOENT	`/health/ffmpeg`: `ffmpeg -version` + `ffprobe -version`

The unifying pattern: each integration adds a dependency outside the Node.js process boundary. AliveMCP's protocol probe and your standard HTTP check both test the process boundary — they cannot see what's happening inside Chromium, libvips, a PDF parser, S3 IAM, or a system binary. Closing this gap requires a custom /health endpoint that actually exercises each dependency path.

Playwright: browser automation that returns ImageContent

The standard mistake when adding Playwright to an MCP server is launching a browser inside each tool call. Chromium startup takes 300–800 ms and allocates 80–200 MB. Launching per-call under concurrent load quickly exhausts memory. The correct pattern is a singleton browser launched at server startup, with each tool call creating and immediately closing its own BrowserContext.

The BrowserContext isolation is not optional — it is a security boundary. Without it, cookies, localStorage, and session state leak between callers. An agent handling multiple user sessions can inadvertently read another session's authenticated cookies from a previous call.

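import { chromium, type Browser, type BrowserContext } from 'playwright';

let _browser: Browser | null = null;
let _launching: Promise<Browser> | null = null; // shared in-flight launch

export async function getBrowser(): Promise<Browser> {
  if (_browser?.isConnected()) return _browser;
  // Reuse one in-flight launch so concurrent callers do not each start Chromium
  _launching ??= chromium.launch({ headless: true, args: ['--no-sandbox', '--disable-dev-shm-usage'] })
    .then((b) => {
      b.on('disconnected', () => { if (_browser === b) _browser = null; }); // force re-launch on crash
      return (_browser = b);
    })
    .finally(() => { _launching = null; });
  return _launching;
}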

// Each tool call creates an isolated context
async function withContext<T>(fn: (ctx: BrowserContext) => Promise<T>): Promise<T> {
  const browser = await getBrowser();
  const ctx = await browser.newContext();
  try { return await fn(ctx); }
  finally { await ctx.close(); }
}

Screenshot tools return base64-encoded data as MCP ImageContent blocks, the protocol's native way to return image data inline in a tool result. Always set a hard timeout on page.goto() and use waitForLoadState('domcontentloaded'), not 'networkidle': network-idle waits for in-flight requests to settle, which can stall indefinitely on pages that fire continuous AJAX polling. SSRF prevention must block private IP ranges and non-HTTP schemes before passing the URL to Playwright.
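The SSRF check described above can be sketched as a small pre-flight filter. This is a minimal illustration, not a complete defense: isUrlAllowed and its two helpers are hypothetical names, the range list covers only the most common private and loopback blocks, and a hostname that passes still has to be resolved and re-checked, because DNS can point a public name at a private address.

```typescript
import { isIP } from 'node:net';

// Hypothetical helper: true only for http(s) URLs whose host is not an
// obviously private, loopback, or link-local address.
function isUrlAllowed(raw: string): boolean {
  let url: URL;
  try { url = new URL(raw); } catch { return false; }
  if (url.protocol !== 'http:' && url.protocol !== 'https:') return false;

  // WHATWG URL keeps the brackets around IPv6 literals; strip them
  const host = url.hostname.replace(/^\[|\]$/g, '').toLowerCase();
  if (host === 'localhost' || host.endsWith('.localhost')) return false;

  const family = isIP(host);
  if (family === 4) return !isPrivateIPv4(host);
  if (family === 6) return !isPrivateIPv6(host);
  return true; // a hostname: the resolved address still needs the same check
}

function isPrivateIPv4(ip: string): boolean {
  const [a, b] = ip.split('.').map(Number);
  return a === 0 || a === 10 || a === 127
    || (a === 169 && b === 254)            // link-local, cloud metadata
    || (a === 172 && b >= 16 && b <= 31)
    || (a === 192 && b === 168);
}

function isPrivateIPv6(ip: string): boolean {
  return ip === '::1' || ip === '::'
    || /^f[cd]/.test(ip)          // fc00::/7 unique local
    || /^fe[89ab]/.test(ip)       // fe80::/10 link-local
    || ip.startsWith('::ffff:');  // IPv4-mapped addresses
}
```

Call it at the top of every tool handler that accepts a URL and return a tool error when it yields false, before any Playwright call is made.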

Silent failure mode: Chromium subprocess crash. When Chromium crashes, the browser.isConnected() check returns false, but this only triggers on the next tool call that tries to reuse the browser. Between the crash and the next call, the Node.js MCP process is fully alive and responding to initialize and tools/list requests. Your HTTP health check sees HTTP 200. AliveMCP's protocol probe completes successfully. Every screenshot tool call fails with a timeout after the crash.

The /health/browser endpoint closes this gap by performing an actual blank-page navigation on each probe cycle:

app.get('/health/browser', async (req, res) => {
  const start = Date.now();
  try {
    await withContext(async (ctx) => {
      const page = await ctx.newPage();
      await page.goto('about:blank', { timeout: 5000 });
      await page.close();
    });
    res.json({ status: 'ok', latency_ms: Date.now() - start });
  } catch (err) {
    res.status(503).json({ status: 'error', error: String(err) });
  }
});

Register /health/browser as a custom health URL in AliveMCP at a 2-minute probe interval. A Chromium crash surfaces within one probe cycle rather than when the next caller triggers the failure.

Concurrency also needs explicit management. Without a semaphore, an agent loop calling take_screenshot ten times in parallel launches ten simultaneous BrowserContexts. Each context holds 30–80 MB of Chromium renderer memory. Add a semaphore capped at 3–5 parallel contexts:

import pLimit from 'p-limit';
const browserSemaphore = pLimit(4); // max 4 parallel browser contexts

server.tool('take_screenshot', schema, (args) =>
  browserSemaphore(() => withContext(async (ctx) => {
    // ... screenshot logic
  }))
);

Sharp: image processing with libvips under the hood

Sharp is a native Node.js module that wraps libvips for high-performance image operations. The module ships pre-built binaries for common platforms, but native binary mismatches are common in containerized deployments — especially when the image is built on a different architecture than it runs on, or when Sharp is installed without the correct libc version.

Initialize Sharp lazily at first use, not at module import, so the MCP server starts up even when libvips is misconfigured — making the failure visible through the health endpoint rather than as a startup crash:

let _sharpReady = false;

async function ensureSharp(): Promise<void> {
  if (_sharpReady) return;
  // Force an actual operation to catch native binding failures at startup probe time
  await sharp({ create: { width: 1, height: 1, channels: 3, background: { r: 0, g: 0, b: 0 } } })
    .jpeg().toBuffer();
  _sharpReady = true;
}

Always enforce input size guards before processing. A 500 MB RAW image file passed to resize without guards will exhaust process memory. Sharp reads the full image into a libvips memory buffer by default:

const MAX_INPUT_BYTES = 20 * 1024 * 1024;   // 20 MB
const MAX_DIMENSION   = 8000;               // px per side

async function safeResize(inputBuffer: Buffer, width: number, height: number): Promise<Buffer> {
  if (inputBuffer.length > MAX_INPUT_BYTES) throw new Error('Image too large');
  const meta = await sharp(inputBuffer).metadata();
  if ((meta.width ?? 0) > MAX_DIMENSION || (meta.height ?? 0) > MAX_DIMENSION)
    throw new Error('Image dimensions exceed limit');
  return sharp(inputBuffer).resize(width, height, { fit: 'inside', withoutEnlargement: true })
    .jpeg({ quality: 85 }).toBuffer();
}

Image tools should return both the processed image as an ImageContent block and a TextContent block with the metadata (dimensions, format, file size) — agents need the metadata to make decisions without re-reading the binary data.
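A minimal sketch of that two-block response shape follows. buildImageResult and the ImageMeta type are hypothetical names, and the mimeType is derived naively from the format string, which is an assumption that holds for jpeg, png, gif, and webp but not for every format Sharp can report.

```typescript
interface ImageMeta { width: number; height: number; format: string }
type ImageContent = { type: 'image'; data: string; mimeType: string };
type TextContent = { type: 'text'; text: string };

// Hypothetical helper: pairs the processed image with a text block that
// describes it, so the agent can reason about the result without decoding it.
function buildImageResult(image: Buffer, meta: ImageMeta): { content: [ImageContent, TextContent] } {
  const summary = { ...meta, bytes: image.length };
  return {
    content: [
      { type: 'image', data: image.toString('base64'), mimeType: `image/${meta.format}` },
      { type: 'text', text: JSON.stringify(summary) },
    ],
  };
}
```

Keeping the metadata in its own text block means a follow-up decision such as "is this already small enough?" does not require the agent to request the image again.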

Silent failure mode: libvips native binary not found after container image rebuild. Depending on the Sharp version and on how the module is loaded, the failure surfaces either when the module is first imported or when libvips is first invoked, typically as Error: Could not load the "sharp" module using the linux-x64 runtime. When the import is deferred so that the server can still start, the failure is invisible until a caller triggers an image tool, unless a probe exercises the binding first.

The /health/image probe exercises the full libvips path by generating a minimal 10×10 PNG programmatically — no file dependency, works in any environment:

app.get('/health/image', async (req, res) => {
  try {
    await ensureSharp();
    const buf = await sharp({
      create: { width: 10, height: 10, channels: 3, background: { r: 64, g: 128, b: 192 } }
    }).png().toBuffer();
    res.json({ status: 'ok', probe_bytes: buf.length });
  } catch (err) {
    res.status(503).json({ status: 'error', error: String(err) });
  }
});

PDF tools: extraction, chunking, and the text-layer trap

PDF tools for MCP servers serve two primary use cases: displaying PDF content to an agent for analysis, and chunking PDF pages into RAG-ready fragments with stable IDs and page attribution. The library choice between pdf-parse and pdfjs-dist is the first architectural decision and it matters for your health endpoint design.

Dimension	pdf-parse	pdfjs-dist
API	Single async function returning full text	Page-by-page async iterator
Encrypted PDFs	Fails with an error	Supports password decryption
Page metadata	Limited (total pages, info)	Full viewport, rotation, annotations
Bundle size	~2 MB	~8 MB
Worker threads	No	Optional, improves large PDF performance
Best fit	Simple extraction, small files	Per-page streaming, large files, passwords

Always validate the PDF magic bytes before passing to any parser — reject non-PDF data early with a clear error rather than letting the parser throw an opaque internal error:

function assertPdfMagic(buf: Buffer): void {
  const MAGIC = Buffer.from('%PDF-');
  // subarray() replaces the deprecated Buffer.slice(); both return a view, not a copy
  if (!buf.subarray(0, 5).equals(MAGIC))
    throw new Error('Not a PDF: missing %PDF- magic bytes');
}

// Hard caps prevent OOM on large files
const MAX_PDF_BYTES = 50 * 1024 * 1024; // 50 MB
const MAX_PDF_PAGES = 500;

For RAG chunking, produce stable chunk IDs that survive re-ingestion. A chunk ID built from the document hash and page number lets downstream vector stores identify and replace stale embeddings without duplication:

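import { createHash } from 'crypto';

// Content hash of the raw PDF bytes: identical files always map to the same chunk IDs
function hashDocument(pdf: Buffer): string {
  return createHash('sha256').update(pdf).digest('hex').slice(0, 16);
}

function chunkId(docHash: string, pageNum: number, chunkIndex: number): string {
  return `${docHash}-p${pageNum}-c${chunkIndex}`;
}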

// Estimate tokens for LLM context budgeting (rough approximation)
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}
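To show how the ID scheme combines with page chunking, here is a minimal sketch. chunkPage, the Chunk type, and the 1,000-character default are assumptions made for illustration; a production chunker would more likely split on sentence or token boundaries.

```typescript
interface Chunk { id: string; page: number; text: string }

// Hypothetical helper: splits one page into chunks of at most maxChars,
// preferring to break at a space so words stay intact.
function chunkPage(docHash: string, pageNum: number, text: string, maxChars = 1000): Chunk[] {
  const chunks: Chunk[] = [];
  let rest = text.trim();
  while (rest.length > 0) {
    let cut = rest.length <= maxChars ? rest.length : rest.lastIndexOf(' ', maxChars);
    if (cut <= 0) cut = maxChars; // no space found: fall back to a hard cut
    chunks.push({
      id: `${docHash}-p${pageNum}-c${chunks.length}`, // same format as chunkId above
      page: pageNum,
      text: rest.slice(0, cut).trim(),
    });
    rest = rest.slice(cut).trim();
  }
  return chunks;
}
```

Because the ID depends only on the document hash, the page number, and the chunk position, re-ingesting an unchanged file with the same chunk size yields the same IDs, which is what lets a vector store overwrite entries instead of duplicating them.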

Silent failure mode: scanned PDF with no text layer. When a PDF is produced by scanning a physical document, each page is a rasterized image embedded in a PDF container. pdf-parse and pdfjs-dist both return empty or whitespace-only text for such pages: no error, no exception, nothing extracted. An agent receives a successful tool response with no content and may hallucinate content, report "no text found," or, in the worst case, proceed as if the document were blank.

Detect this at tool handler time and surface it explicitly:

const extracted = await extractPdfText(buffer, { maxPages: MAX_PDF_PAGES });
// Count non-whitespace characters: parsers may emit bare newlines for image-only pages
const totalChars = extracted.pages.reduce((sum, p) => sum + p.text.trim().length, 0);

if (totalChars === 0) {
  return {
    content: [{ type: 'text', text: '[PDF has no extractable text — likely a scanned document. Use an OCR tool to process this file.]' }],
    isError: false // not a server error; the document is the issue
  };
}
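The same check can be applied per page for documents that mix scanned and digital pages. The sketch below is illustrative only: classifyPages and the PageText shape are hypothetical, and it assumes the extractor already returns one text string per page.

```typescript
interface PageText { page: number; text: string }

// Hypothetical helper: separates pages that have a text layer from pages
// that are probably scans (fewer than minChars non-whitespace characters).
function classifyPages(pages: PageText[], minChars = 1) {
  const scannedPages = pages
    .filter((p) => p.text.trim().length < minChars)
    .map((p) => p.page);
  const text = pages
    .filter((p) => !scannedPages.includes(p.page))
    .map((p) => `[page ${p.page}]\n${p.text.trim()}`)
    .join('\n\n');
  return {
    scannedPages,
    allScanned: pages.length > 0 && scannedPages.length === pages.length,
    text,
  };
}
```

A tool handler can return the extracted text together with a note listing scannedPages, so the agent knows which pages need OCR instead of assuming they were blank.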

The /health/pdf endpoint uses a minimal but valid PDF embedded as a base64 literal — no file system dependency, no test fixture to lose:

// A minimal 1-page PDF: one empty 3x3 pt page with no text object, so the probe checks the page count rather than extracted text
const CANARY_PDF_B64 = 'JVBERi0xLjAKMSAwIG9iajw8L1R5cGUvQ2F0YWxvZy9QYWdlcyAyIDAgUj4+ZW5kb2JqCjIgMCBvYmo8PC9UeXBlL1BhZ2VzL0tpZHNbMyAwIFJdL0NvdW50IDE+PmVuZG9iagozIDAgb2JqPDwvVHlwZS9QYWdlL01lZGlhQm94WzAgMCAzIDNdPj5lbmRvYmoKeHJlZgowIDQKMDAwMDAwMDAwMCA2NTUzNSBmIAowMDAwMDAwMDA5IDAwMDAwIG4gCjAwMDAwMDAwNTggMDAwMDAgbiAKMDAwMDAwMDExNSAwMDAwMCBuIAp0cmFpbGVyPDwvU2l6ZSA0L1Jvb3QgMSAwIFI+PgpzdGFydHhyZWYKMTkwCiUlRU9G';

app.get('/health/pdf', async (req, res) => {
  try {
    const buf = Buffer.from(CANARY_PDF_B64, 'base64');
    assertPdfMagic(buf);
    const result = await pdfParse(buf);
    res.json({ status: 'ok', pages: result.numpages });
  } catch (err) {
    res.status(503).json({ status: 'error', error: String(err) });
  }
});

S3 tools: file storage and the IAM permission gap

S3 tools make an MCP server a first-class participant in a storage workflow — agents can upload documents, retrieve processed results, generate presigned download URLs, and list bucket contents. The AWS SDK v3 credential chain handles IAM roles, environment variables, and shared credential files transparently, making S3 tools straightforward to wire up. The dangerous part is credential management — and specifically what happens when credentials change after deployment.

Always use the credential chain rather than hard-coded keys, and always use IAM roles when running on EC2, ECS, or Lambda:

import { S3Client, PutObjectCommand, GetObjectCommand, DeleteObjectCommand } from '@aws-sdk/client-s3';
import { getSignedUrl } from '@aws-sdk/s3-request-presigner';

const s3 = new S3Client({ region: process.env.AWS_REGION ?? 'us-east-1' });
// Credentials resolved in order: env vars → SSO and shared ~/.aws files → web identity token → ECS container / EC2 instance role

For user-provided file uploads, enforce a content-type allowlist and a maximum size before the data reaches S3. Storing agent-provided files without validation creates an open storage vector for arbitrary content:

const ALLOWED_CONTENT_TYPES = new Set(['application/pdf', 'image/jpeg', 'image/png', 'text/plain', 'text/csv']);
const MAX_UPLOAD_BYTES = 50 * 1024 * 1024; // 50 MB

function assertUploadSafe(contentType: string, sizeBytes: number): void {
  if (!ALLOWED_CONTENT_TYPES.has(contentType))
    throw new Error(`Content-type not allowed: ${contentType}`);
  if (sizeBytes > MAX_UPLOAD_BYTES)
    throw new Error('File exceeds 50 MB upload limit');
}

Presigned URL generation is a purely local signing operation: no S3 API call is made until the presigned URL is actually used. Generation therefore succeeds even when the referenced object does not exist, and even when the signing role lacks GetObject permission. The failure only appears, as a 403 or 404, when someone requests the URL.

Silent failure mode: IAM policy change removes PutObject permission while leaving GetObject intact. This is the most common S3 silent failure in shared environments where IAM policies are managed separately from the application. After the policy change, read tools (download_file, list_files, generate_presigned_url) continue to work normally. Write tools (upload_file) return AccessDenied — but only when actually called. The MCP process is healthy. The S3 bucket is accessible. A standard uptime check that hits HEAD /health sees HTTP 200.

The only way to detect this class of failure before callers hit it is a write-read-delete canary that exercises the full PutObject → GetObject → DeleteObject path:

app.get('/health/s3', async (req, res) => {
  const key = `_health/canary-${Date.now()}.txt`;
  try {
    // Write
    await s3.send(new PutObjectCommand({
      Bucket: process.env.S3_BUCKET!, Key: key,
      Body: 'alivemcp-canary', ContentType: 'text/plain'
    }));
    // Read back
    const obj = await s3.send(new GetObjectCommand({ Bucket: process.env.S3_BUCKET!, Key: key }));
    await obj.Body?.transformToString();
    // Clean up
    await s3.send(new DeleteObjectCommand({ Bucket: process.env.S3_BUCKET!, Key: key }));
    res.json({ status: 'ok' });
  } catch (err: any) {
    res.status(503).json({ status: 'error', code: err.Code ?? err.name, message: err.message });
  }
});

Set the /health/s3 probe interval to 5 minutes in AliveMCP. Each canary run issues one PUT, one GET, and one DELETE, so a 5-minute interval comes to roughly 26,000 requests per month (8,640 runs × 3). At standard S3 request pricing that is on the order of a few cents per month, dominated by the PUT requests.
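The request count and cost can be checked with a few lines of arithmetic. The per-request prices below are assumptions based on commonly published S3 Standard list prices; substitute the figures for your region and storage class before relying on the result.

```typescript
// Assumed list prices in USD per 1,000 requests; replace with your region's values.
const PRICE_PER_1K = { put: 0.005, get: 0.0004, delete: 0 };

// Hypothetical helper: estimates monthly canary traffic for a given probe interval.
function monthlyCanaryCost(intervalMinutes: number, days = 30) {
  const runs = Math.floor((days * 24 * 60) / intervalMinutes);
  const requests = runs * 3; // one PUT, one GET, one DELETE per run
  const usd = (runs / 1000) * (PRICE_PER_1K.put + PRICE_PER_1K.get + PRICE_PER_1K.delete);
  return { runs, requests, usd };
}

const estimate = monthlyCanaryCost(5);
console.log(`${estimate.runs} runs, ${estimate.requests} requests, about $${estimate.usd.toFixed(3)} per month`);
```

Halving the interval doubles both numbers, which is why the cheap version probe can run every minute while the S3 canary stays at five.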

For Cloudflare R2 compatibility, add forcePathStyle: true and set the endpoint to your R2 account endpoint. R2 implements the S3 API but uses path-style addressing rather than virtual-hosted-style. The canary pattern works identically against R2.

FFmpeg: transcription, video metadata, and the missing-binary trap

FFmpeg tools unlock audio transcription (via Whisper integration), video metadata extraction, format conversion, and frame extraction. Unlike the other integrations in this guide, FFmpeg is not a Node.js module — it is a system binary that must be present on the host and reachable from the server's PATH. This makes it the most fragile integration from a deployment perspective.

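Spawn FFmpeg processes with child_process.spawn rather than exec: exec buffers the entire output in memory and terminates the child once that output exceeds maxBuffer, which makes it a poor fit for large media files. Pass media data through stdin and read output from stdout to avoid temporary file management. Some containers cannot be read from a non-seekable pipe (for example MP4 files whose moov atom sits at the end of the file), so fall back to a temporary file when probing from stdin fails: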

import { spawn } from 'child_process';

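// Shape of the JSON that ffprobe prints with -show_format and -show_streams
interface FfprobeResult { format?: Record<string, unknown>; streams?: Array<Record<string, unknown>> }

async function ffprobeMetadata(inputBuffer: Buffer): Promise<FfprobeResult> {
  return new Promise((resolve, reject) => {
    const proc = spawn('ffprobe', [
      '-v', 'quiet', '-print_format', 'json', '-show_format', '-show_streams',
      '-i', 'pipe:0'  // read from stdin
    ]);
    const chunks: Buffer[] = [];
    proc.stdout.on('data', (chunk) => chunks.push(chunk));
    proc.stderr.on('data', () => {}); // discard stderr (-v quiet suppresses most)
    proc.on('close', (code) => {
      if (code !== 0) return reject(new Error(`ffprobe exited with code ${code}`));
      // Parse inside try/catch: a throw in an event handler would otherwise be uncaught
      try { resolve(JSON.parse(Buffer.concat(chunks).toString())); }
      catch (err) { reject(err); }
    });
    proc.on('error', reject); // catches ENOENT when binary is missing
    proc.stdin.on('error', () => {}); // ignore EPIPE if ffprobe exits before reading all input
    proc.stdin.end(inputBuffer);
  });
}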

For audio transcription, the typical pattern is FFmpeg as the preprocessing step (normalize sample rate, convert to 16 kHz mono WAV) piped into a Whisper model (either a local binary like whisper.cpp or an API call to OpenAI Whisper). The two-stage pipeline means two binary dependencies: FFmpeg for preprocessing and Whisper for transcription. Both need independent health checks.
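The preprocessing step can be sketched as an argument list for ffmpeg. whisperPreprocessArgs is a hypothetical helper; the flags are standard ffmpeg options, but whether 16-bit PCM at 16 kHz is the right target depends on the Whisper runtime you pair it with, so treat the defaults as assumptions.

```typescript
// Hypothetical helper: ffmpeg arguments that read audio from stdin and write
// 16 kHz mono 16-bit PCM WAV to stdout.
function whisperPreprocessArgs(sampleRate = 16000): string[] {
  return [
    '-hide_banner', '-loglevel', 'error',
    '-i', 'pipe:0',             // input from stdin
    '-vn',                      // drop any video stream
    '-ac', '1',                 // downmix to mono
    '-ar', String(sampleRate),  // resample
    '-c:a', 'pcm_s16le',        // 16-bit little-endian PCM
    '-f', 'wav', 'pipe:1',      // WAV container on stdout
  ];
}
```

Pass the result to spawn('ffmpeg', whisperPreprocessArgs()) and stream stdout into the transcription step. Keeping the arguments in one function makes it easy to log exactly what was run when a conversion fails.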

Always set a hard timeout on FFmpeg child processes. A malformed media file can cause FFmpeg to loop indefinitely when probing the container format:

const FFMPEG_TIMEOUT_MS = 30_000; // 30 seconds max per operation

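// Note: racing a promise only stops waiting; it does not terminate the child process.
// Kill the process on timeout as well, or it keeps running in the background.
async function withTimeout<T>(promise: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  try { return await Promise.race([promise, timeout]); }
  finally { clearTimeout(timer); } // do not leave the timer pending once the race settles
}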

// Usage:
const meta = await withTimeout(ffprobeMetadata(buffer), FFMPEG_TIMEOUT_MS, 'ffprobe');
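A timeout that only rejects a promise leaves the child process running. One way to make the timeout hard is to kill the child when the deadline passes, as in the sketch below. runWithDeadline is a hypothetical helper written for this guide, not part of any library, and SIGKILL is an assumed choice that trades graceful shutdown for certainty.

```typescript
import { spawn } from 'node:child_process';

// Hypothetical helper: runs a command, collects stdout, and kills the child
// with SIGKILL if it has not exited within timeoutMs.
function runWithDeadline(bin: string, args: string[], timeoutMs: number): Promise<string> {
  return new Promise((resolve, reject) => {
    const proc = spawn(bin, args, { stdio: ['ignore', 'pipe', 'ignore'] });
    const chunks: Buffer[] = [];
    let timedOut = false;
    const timer = setTimeout(() => { timedOut = true; proc.kill('SIGKILL'); }, timeoutMs);

    proc.stdout.on('data', (c: Buffer) => chunks.push(c));
    proc.on('error', (err) => { clearTimeout(timer); reject(err); }); // e.g. ENOENT
    proc.on('close', (code) => {
      clearTimeout(timer);
      if (timedOut) return reject(new Error(`${bin} timed out after ${timeoutMs}ms`));
      if (code !== 0) return reject(new Error(`${bin} exited with code ${code}`));
      resolve(Buffer.concat(chunks).toString());
    });
  });
}
```

The promise settles only after the 'close' event, so by the time the caller sees the timeout error the process has actually been reaped rather than left consuming CPU.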

Silent failure mode: FFmpeg binary missing after container image rebuild. This is the most common FFmpeg deployment failure. A developer rebuilds the Docker image and forgets to include the RUN apt-get install -y ffmpeg layer, or a base image change (for example from ubuntu:22.04 to an Alpine-based image, where packages are installed with apk add rather than apt-get) drops the step that installed it. The Node.js process starts successfully. All non-media tools work normally. The first call to any FFmpeg tool throws Error: spawn ffmpeg ENOENT because the binary is simply not there.

The /health/ffmpeg probe runs version checks on both binaries on every probe cycle; calling the same check once at startup also catches a missing binary before the first request arrives:

app.get('/health/ffmpeg', async (req, res) => {
  try {
    const [ffmpegVer, ffprobeVer] = await Promise.all([
      runCommand('ffmpeg', ['-version']).then(out => out.split('\n')[0]),
      runCommand('ffprobe', ['-version']).then(out => out.split('\n')[0]),
    ]);
    res.json({ status: 'ok', ffmpeg: ffmpegVer, ffprobe: ffprobeVer });
  } catch (err) {
    res.status(503).json({ status: 'error', error: String(err) });
  }
});

function runCommand(bin: string, args: string[]): Promise<string> {
  return new Promise((resolve, reject) => {
    const proc = spawn(bin, args);
    const chunks: Buffer[] = [];
    proc.stdout.on('data', (c) => chunks.push(c));
    proc.on('close', (code) => code === 0
      ? resolve(Buffer.concat(chunks).toString())
      : reject(new Error(`${bin} exited ${code}`))
    );
    proc.on('error', reject);
  });
}

Unlike the other health probes in this guide, /health/ffmpeg is nearly free — it just execs the binary with the version flag, no media processing. Run it at 1-minute intervals.

The unified monitoring architecture for multi-modal servers

A multi-modal MCP server that includes all five integrations has six observable health states: the MCP protocol layer plus one state per native dependency. Each layer can fail independently, and each failure produces a different signal — or no signal at all if you're only checking the protocol layer.

Health check	Catches	Does NOT catch	Recommended interval
AliveMCP protocol probe (MCP `initialize`)	Process death, TLS expiry, network partition, port change	All five native dependency failures above	60 seconds
`/health/browser`	Chromium crash, browser startup failure, headless detection causing hangs	Page-specific rendering failures	2 minutes
`/health/image`	libvips native binding failure, Sharp platform mismatch	Input-specific processing errors	5 minutes
`/health/pdf`	Parser library crash, Node.js version incompatibility	Scanned-PDF content gaps (detected at tool call time)	5 minutes
`/health/s3`	IAM permission change, bucket policy change, credential expiry, network ACL changes	Per-object ACL changes	5 minutes
`/health/ffmpeg`	Missing binary, wrong PATH, corrupted installation	Codec-specific failures	1 minute

Register all five custom health URLs in AliveMCP. The protocol probe runs every 60 seconds by default — custom health URL probes run at the intervals you configure, typically 1–5 minutes. When any probe returns a non-2xx status, AliveMCP fires the alert before the next caller discovers the failure.

Aggregate the five health checks into a single /health endpoint that returns the overall status plus per-integration states:

app.get('/health', async (req, res) => {
  const names = ['browser', 'image', 'pdf', 's3', 'ffmpeg'] as const;
  const checks = await Promise.allSettled([
    checkBrowser(),
    checkSharp(),
    checkPdf(),
    checkS3(),
    checkFfmpeg(),
  ]);
  // Map each settled result to its integration name; rejected checks become error entries
  const results = Object.fromEntries(checks.map((check, i) => [
    names[i],
    check.status === 'fulfilled' ? check.value : { status: 'error', error: String(check.reason) },
  ]));
  const overallOk = Object.values(results).every((r) => r.status === 'ok');
  res.status(overallOk ? 200 : 503).json({ status: overallOk ? 'ok' : 'degraded', checks: results });
});

AliveMCP can be configured to alert on any 503 from a custom health URL. When your multi-modal server becomes partially degraded (say, S3 writes fail while image tools and browser tools remain healthy), the alert fires on the next probe, the per-check report tells you exactly which dependency failed, and agents calling non-S3 tools continue operating without interruption.

What the protocol probe misses

The pattern that emerges across all five integrations is the same: the MCP protocol probe and a standard HTTP uptime check both operate at the process boundary. They verify that the Node.js process is running, the TCP port is accepting connections, and the initialize exchange completes. None of that exercises Chromium, libvips, pdf-parse, IAM, or system binaries.

This is not a limitation of AliveMCP — it is a fundamental property of protocol-level monitoring. The protocol probe is the correct tool for detecting process death, TLS certificate expiry, network partitions, and port changes. It is the wrong tool for detecting what happens inside a running process.

The custom health URL is the bridge. It runs inside the process, exercises the dependency path, and returns a signal to the external monitor. AliveMCP polls that URL on your configured interval and alerts when the response is no longer 2xx. The combination of protocol probe plus custom health URLs gives you visibility at both boundaries, the process boundary and the dependency boundary, which together cover the full failure surface of a multi-modal MCP server.

Start monitoring your MCP server's media integrations with AliveMCP — the free tier includes protocol monitoring for public endpoints, and the Author tier at $9/mo adds custom health URL monitoring, 90-day history, and alert webhooks for private servers.

FAQ

Can I return binary file data directly from an MCP tool?: For images: yes, as an ImageContent block with base64-encoded data and a mimeType. Commonly accepted MIME types are image/jpeg, image/png, image/gif, and image/webp; SVGs are best returned as text. Recent revisions of the MCP specification also define an audio content block and embedded resources that can carry base64 blobs, but client support varies. For large or arbitrary binary formats (PDF, video), return a presigned S3 URL or a resource URI pointing to your MCP server's resource handler rather than inlining the bytes.
Should Playwright's Chromium run in headless or headed mode in a container?: Always headless in containers — headed mode requires a display server (Xvfb or equivalent). Pass headless: true to chromium.launch() and include --no-sandbox and --disable-dev-shm-usage in the args array. The /dev/shm shared memory partition is often only 64 MB in Docker — without --disable-dev-shm-usage, Chromium will crash silently on pages with heavy rendering.
How do I handle PDFs that are a mix of scanned and digital pages?: With pdfjs-dist, process page by page and check the text length of each page independently. Flag pages with zero characters as likely scanned and include that information in the tool's response alongside the extracted text from digital pages. Do not fail the entire extraction because one page lacks a text layer.
What's the safest way to store agent-uploaded files in S3?: Use a dedicated uploads prefix (e.g., uploads/{tenantId}/{uuid}.{ext}), enforce a content-type allowlist at the tool boundary, set a maximum file size before accepting the buffer, and scan uploads with an antivirus Lambda trigger if the content will be served to other users. Never generate presigned PUT URLs that allow the caller to override content-type — specify ContentType in your PutObjectCommand directly and set Content-Type in the presigned URL conditions.
FFmpeg and Sharp are large dependencies. Is there a lighter alternative for basic operations?: For Sharp: the jimp package is pure JavaScript (no native binary) and handles basic resize/crop/format conversion. It is 3–5× slower than Sharp for large images but eliminates the native binary deployment risk. For FFmpeg: for audio transcription only, you can skip FFmpeg preprocessing entirely and pass WAV files directly to Whisper if your callers can guarantee the input format. For production use cases where media quality and performance matter, the native binary approach with robust health monitoring is the correct trade-off.
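
The upload key layout recommended in the storage answer above (uploads/{tenantId}/{uuid}.{ext}) can be sketched as follows. buildUploadKey, the extension map, and the tenant ID pattern are all assumptions made for illustration; the point is that no caller-supplied file name ever reaches the object key.

```typescript
import { randomUUID } from 'node:crypto';

// Assumed mapping from allowed content types to file extensions.
const EXTENSIONS: Record<string, string> = {
  'application/pdf': 'pdf', 'image/jpeg': 'jpg', 'image/png': 'png',
  'text/plain': 'txt', 'text/csv': 'csv',
};

// Hypothetical helper: builds uploads/{tenantId}/{uuid}.{ext} from validated
// inputs only, so path traversal and name collisions are not possible.
function buildUploadKey(tenantId: string, contentType: string): string {
  if (!/^[A-Za-z0-9_-]{1,64}$/.test(tenantId)) throw new Error('Invalid tenant id');
  const ext = EXTENSIONS[contentType];
  if (!ext) throw new Error(`Content-type not allowed: ${contentType}`);
  return `uploads/${tenantId}/${randomUUID()}.${ext}`;
}
```

Deriving the extension from the allowlisted content type keeps the key and the stored ContentType consistent, and the per-tenant prefix makes it straightforward to scope IAM policies or lifecycle rules to one tenant.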