MCP Server Playwright — Browser Automation Tools with Playwright

Playwright turns a TypeScript MCP server into a browser automation engine — letting AI agents take screenshots, scrape rendered pages, fill forms, and interact with web apps. This guide covers integrating Playwright into an MCP server: browser lifecycle management, building a page-context pool, implementing screenshot and scrape tools safely, handling parallel tool calls, and wiring a /health/browser endpoint so AliveMCP can detect when your Chromium instance goes down.

TL;DR

Install playwright and launch a single Browser instance at server startup rather than per tool call: browser startup costs 300–800 ms, and per-call launches exhaust memory quickly. Give each tool call an isolated BrowserContext (incognito) and close it when the call finishes, which prevents cookies and localStorage from leaking between callers. Return screenshot data as base64 image content blocks in the MCP response. Set a hard timeout on every page.goto() and pass waitUntil: 'domcontentloaded' rather than 'networkidle' in server contexts. Wire /health/browser to probe a blank-page navigation so your monitor catches browser crashes, not just process uptime.

Browser lifecycle: startup and pool management

The biggest mistake when integrating Playwright into an MCP server is launching a browser inside each tool call. A Chromium startup takes 300–800 ms and allocates 80–200 MB of memory. If three tool calls arrive in parallel, you're launching three browsers simultaneously. Instead, launch one browser at server start and give each tool call its own BrowserContext.

import { chromium, Browser, BrowserContext } from 'playwright';

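let _browser: Browser | null = null;
let _launching: Promise<Browser> | null = null; // shared by concurrent callers

export async function getBrowser(): Promise<Browser> {
  if (_browser?.isConnected()) return _browser;
  // Reuse an in-flight launch so parallel tool calls do not start several browsers
  if (_launching) return _launching;

  _launching = chromium.launch({
    headless: true,
    args: [
      '--no-sandbox',              // needed when Chromium runs as root in a container; weakens isolation
      '--disable-setuid-sandbox',
      '--disable-dev-shm-usage',   // /dev/shm is often too small in containers
      '--disable-gpu',
      '--single-process',          // lowers memory use but can be unstable; drop it if you see crashes
    ]
  }).then(browser => {
    browser.on('disconnected', () => {
      if (_browser === browser) _browser = null; // force re-launch on next call
    });
    _browser = browser;
    return browser;
  }).finally(() => {
    _launching = null; // allow a fresh attempt after success or failure
  });

  return _launching;
}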

// Create an isolated context per tool call
export async function newContext(): Promise<BrowserContext> {
  const browser = await getBrowser();
  return browser.newContext({
    viewport: { width: 1280, height: 720 },
    userAgent: 'MCP-Browser-Tool/1.0 (AliveMCP)',
    ignoreHTTPSErrors: false,     // set to true only for targets you trust
    javaScriptEnabled: true,
    acceptDownloads: false,       // never let tools trigger file downloads
    bypassCSP: false,
    extraHTTPHeaders: {
      'Accept-Language': 'en-US,en;q=0.9'
    }
  });
}

Call newContext() at the top of each tool handler and await context.close() in a finally block. This pattern isolates callers from each other and releases memory after each call — critical in long-running server processes.
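
The open-then-always-close pattern can be factored into a small wrapper so that individual handlers cannot forget the finally block. The following is a minimal sketch rather than part of the server code above: withContext and ClosableContext are hypothetical names, and the create argument stands in for the newContext() function defined earlier.

```typescript
// Hypothetical helper type: anything with an async close() qualifies,
// which includes Playwright's BrowserContext.
interface ClosableContext {
  close(): Promise<void>;
}

// Creates a context, runs `fn` with it, and closes the context
// whether `fn` resolves or throws.
async function withContext<C extends ClosableContext, T>(
  create: () => Promise<C>,
  fn: (context: C) => Promise<T>,
): Promise<T> {
  const context = await create();
  try {
    return await fn(context);
  } finally {
    // Swallow close() failures (for example, the browser already crashed)
    // so they do not mask the original error thrown by `fn`.
    await context.close().catch(() => undefined);
  }
}
```

In a tool handler this would read roughly as withContext(newContext, async (context) => { const page = await context.newPage(); ... }), with the handler's return value passed through unchanged.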

Approach	Startup cost	Isolation	Memory growth
One browser per tool call	300–800 ms	Perfect	O(concurrent calls) — leaks fast
One browser, one shared context	~0 ms	None — cookies leak	Low, but cookies/storage accumulate
One browser, context per call (recommended)	~30 ms per context	Full (incognito)	Constant if context is closed after use
Browser pool (2–4 browsers)	~0 ms	Full	Fixed pool size × browser memory

Screenshot tool

Screenshots return image content blocks — the MCP content type that carries base64-encoded image data. Most LLM clients that support vision will render these inline. The key decisions are: what format to use (PNG for accuracy, JPEG for smaller payloads), what to capture (full page vs viewport), and how to time the capture.

import { z } from 'zod';
import { McpError, ErrorCode } from '@modelcontextprotocol/sdk/types.js';
import { newContext } from './browser.js';

server.tool(
  'take_screenshot',
  {
    url: z.string().url(),
    full_page: z.boolean().default(false),
    format: z.enum(['png', 'jpeg']).default('png'),
    quality: z.number().int().min(10).max(100).default(80), // only applies to jpeg
    wait_for: z.enum(['domcontentloaded', 'load', 'networkidle']).default('domcontentloaded'),
    timeout_ms: z.number().int().min(1000).max(30000).default(10000)
  },
  async ({ url, full_page, format, quality, wait_for, timeout_ms }) => {
    // Block dangerous URL schemes
    const parsed = new URL(url);
    if (!['https:', 'http:'].includes(parsed.protocol)) {
      throw new McpError(ErrorCode.InvalidParams, `Scheme ${parsed.protocol} is not allowed`);
    }

    const context = await newContext();
    try {
      const page = await context.newPage();

      await page.goto(url, {
        waitUntil: wait_for,
        timeout: timeout_ms
      });

      const screenshotBuffer = await page.screenshot({
        fullPage: full_page,
        type: format,
        quality: format === 'jpeg' ? quality : undefined,
        timeout: 5000
      });

      const base64 = screenshotBuffer.toString('base64');
      const mimeType = format === 'jpeg' ? 'image/jpeg' : 'image/png';

      return {
        content: [{
          type: 'image',
          data: base64,
          mimeType
        }]
      };
    } catch (err) {
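      // Playwright's TimeoutError sets name to 'TimeoutError'; this also covers a screenshot timeout
      if (err instanceof Error && err.name === 'TimeoutError') {
        throw new McpError(ErrorCode.InternalError, `Timed out loading or capturing ${url} (navigation limit ${timeout_ms}ms)`);
      }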
      throw err;
    } finally {
      await context.close();
    }
  }
);

Use waitUntil: 'domcontentloaded', not 'networkidle', as the default. networkidle waits until there have been no network connections for at least 500 ms; many modern SPAs with background polling never reach this state, which makes the tool time out on pages that are visually complete. If a caller needs to wait for a specific element before the screenshot, add a wait_for_selector parameter and call await page.waitForSelector(selector, { timeout }) before capturing.
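
A sketch of the optional selector wait, assuming a hypothetical wait_for_selector parameter has been added to the tool schema. PageLike is an illustrative structural type covering only the two Playwright Page methods used here (goto and waitForSelector), and gotoAndWait is an assumed helper name. The helper spends one time budget across both steps, so the selector wait cannot push the call past timeout_ms.

```typescript
// Structural subset of Playwright's Page used by this sketch.
interface PageLike {
  goto(url: string, options: { waitUntil: 'domcontentloaded'; timeout: number }): Promise<unknown>;
  waitForSelector(selector: string, options: { timeout: number }): Promise<unknown>;
}

// Navigates, then optionally waits for a selector, sharing one time budget.
async function gotoAndWait(
  page: PageLike,
  url: string,
  timeoutMs: number,
  waitForSelector?: string,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  await page.goto(url, { waitUntil: 'domcontentloaded', timeout: timeoutMs });

  if (waitForSelector) {
    // Give the selector wait only what is left of the budget. Keep it at
    // least 1 ms, because Playwright treats a timeout of 0 as "no timeout".
    const remaining = Math.max(1, deadline - Date.now());
    await page.waitForSelector(waitForSelector, { timeout: remaining });
  }
}
```

Because selectors come from the caller, a selector that never matches simply ends in a timeout error, which the handler can report the same way as a navigation timeout.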

Page scraping tool

A scrape tool returns page text content, the page title, and optionally structured data extracted from specific CSS selectors. This is more useful to LLMs than a raw HTML dump because it removes markup noise.

server.tool(
  'scrape_page',
  {
    url: z.string().url(),
    selectors: z.array(z.object({
      name: z.string().min(1),
      css: z.string().min(1),
      attribute: z.string().optional() // if omitted, returns text content
    })).max(20).default([]),
    include_links: z.boolean().default(false),
    timeout_ms: z.number().int().min(1000).max(30000).default(10000)
  },
  async ({ url, selectors, include_links, timeout_ms }) => {
    const parsed = new URL(url);
    if (!['https:', 'http:'].includes(parsed.protocol)) {
      throw new McpError(ErrorCode.InvalidParams, `Scheme ${parsed.protocol} is not allowed`);
    }

    const context = await newContext();
    try {
      const page = await context.newPage();
      await page.goto(url, { waitUntil: 'domcontentloaded', timeout: timeout_ms });

      const title = await page.title();

      // Extract named selectors first: the body-text step below removes elements from the DOM
      const extracted: Record<string, string[]> = {};
      for (const { name, css, attribute } of selectors) {
        extracted[name] = await page.evaluate(
          ({ css, attribute }) => {
            const els = Array.from(document.querySelectorAll(css)).slice(0, 50);
            return els.map(el =>
              attribute ? (el as HTMLElement).getAttribute(attribute) ?? '' : (el as HTMLElement).innerText
            );
          },
          { css, attribute }
        );
      }

      // Optionally extract all links (also before nav/footer are removed)
      const links = include_links
        ? await page.evaluate(() =>
            Array.from(document.querySelectorAll('a[href]'))
              .map(a => ({ text: (a as HTMLAnchorElement).innerText.trim(), href: (a as HTMLAnchorElement).href }))
              .filter(l => l.href.startsWith('http'))
              .slice(0, 100)
          )
        : [];

      // Body text last, because this step mutates the page
      const bodyText = await page.evaluate(() => {
        // Remove script, style, nav, footer noise
        const remove = ['script', 'style', 'nav', 'footer', 'aside'];
        remove.forEach(tag => {
          document.querySelectorAll(tag).forEach(el => el.remove());
        });
        return document.body?.innerText ?? '';
      });

      return {
        content: [{
          type: 'text',
          text: JSON.stringify({ title, url, bodyText: bodyText.slice(0, 20000), extracted, links }, null, 2)
        }]
      };
    } finally {
      await context.close();
    }
  }
);

Truncate bodyText to 20,000 characters before returning it. A long article page can easily exceed 50,000 characters, which consumes a large share of the LLM context window and slows tool execution. If the caller needs the full text, add a pagination parameter (page_num) that returns the text in fixed-size chunks.
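
One way the page_num idea could look is sketched below. paginateText and the TextPage fields are hypothetical names chosen for illustration, and the 1-based numbering is an assumption; the 20,000-character default mirrors the truncation limit used above.

```typescript
interface TextPage {
  text: string;        // the slice for the requested page
  page_num: number;    // 1-based index of the page that was returned
  total_pages: number; // how many pages the full text spans
  has_more: boolean;   // true if a later page exists
}

// Returns the 1-based `pageNum`-th chunk of `text`, `pageSize` characters long.
// Slicing is by UTF-16 code unit, so a chunk boundary can split an emoji or
// other surrogate pair; acceptable for a sketch, worth handling in production.
function paginateText(text: string, pageNum = 1, pageSize = 20000): TextPage {
  if (!Number.isInteger(pageNum) || pageNum < 1) {
    throw new RangeError('page_num must be an integer >= 1');
  }
  if (!Number.isInteger(pageSize) || pageSize < 1) {
    throw new RangeError('pageSize must be an integer >= 1');
  }
  const totalPages = Math.max(1, Math.ceil(text.length / pageSize));
  const start = (pageNum - 1) * pageSize;
  return {
    text: text.slice(start, start + pageSize), // empty string past the end
    page_num: pageNum,
    total_pages: totalPages,
    has_more: pageNum < totalPages,
  };
}
```

Returning total_pages and has_more alongside the slice lets the model decide whether another call is worthwhile without guessing at the document length. Note that each page request re-scrapes the URL unless the text is cached, so content may shift between calls.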

Concurrency limiting

A single Chromium browser can run many pages in parallel, but each page consumes memory (30–100 MB) and CPU. Without a concurrency limit, a burst of tool calls can crash the server. Use a semaphore to cap concurrent page operations.

// Simple semaphore for controlling parallelism
class Semaphore {
  private _queue: Array<() => void> = [];
  private _running = 0;

  constructor(private readonly _max: number) {}

  async acquire(): Promise<() => void> {
    if (this._running < this._max) {
      this._running++;
      return () => this._release();
    }
    return new Promise(resolve => {
      this._queue.push(() => {
        this._running++;
        resolve(() => this._release());
      });
    });
  }

  private _release() {
    this._running--;
    const next = this._queue.shift();
    if (next) next();
  }
}

// Allow at most 3 concurrent browser operations
const browserSemaphore = new Semaphore(3);

// Usage in a tool handler (close the context first so the slot is freed only after its memory is):
// const release = await browserSemaphore.acquire();
// try { ... } finally { try { await context.close(); } finally { release(); } }

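Set the semaphore limit based on your server's available memory: each concurrent browser context uses approximately 80–150 MB, on top of the browser's own baseline and the Node.js process. On a 512 MB container, keep the limit at 2–3. On a 2 GB instance, 8–10 concurrent contexts is a reasonable starting point. Treat these as initial values and adjust them against observed memory under load.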
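
To keep the acquire and release calls out of every handler, the semaphore can be wrapped in a small helper. This is a sketch under assumptions: Limiter and runWithSlot are hypothetical names, and Limiter describes only the acquire() contract of the Semaphore class above, so any object with that method would work.

```typescript
// Anything with the same acquire() contract as the Semaphore class above:
// acquire() resolves to a function that frees the slot.
interface Limiter {
  acquire(): Promise<() => void>;
}

// Waits for a free slot, runs `task`, and frees the slot when the task
// settles, whether it resolved or threw.
async function runWithSlot<T>(limiter: Limiter, task: () => Promise<T>): Promise<T> {
  const release = await limiter.acquire();
  try {
    return await task();
  } finally {
    release();
  }
}
```

If the task closes its own context in its own finally block, the slot is released only after that cleanup has finished, which keeps the number of live contexts at or below the limit.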

Health endpoint for browser monitoring

Your process health check only tells you whether the Node.js process is alive. If Playwright's Chromium subprocess has crashed, the browser object will have isConnected() === false and all subsequent tool calls will fail — but the process itself is still running and returning 200 to a naive health check. Wire a dedicated browser health check that actually navigates a page.

import Fastify from 'fastify';
import { getBrowser, newContext } from './browser.js';

const http = Fastify();

http.get('/health/browser', async (req, reply) => {
  const start = Date.now();
  const browser = await getBrowser().catch(() => null);
  if (!browser || !browser.isConnected()) {
    return reply.code(503).send({
      status: 'error',
      detail: 'browser_disconnected',
      latency_ms: Date.now() - start
    });
  }

  // Probe: navigate a blank page — confirms Chromium is actually responsive
  const context = await newContext().catch(() => null);
  if (!context) {
    return reply.code(503).send({ status: 'error', detail: 'context_creation_failed' });
  }

  try {
    const page = await context.newPage();
    await page.goto('about:blank', { timeout: 3000 });
    await page.close();
    return reply.send({
      status: 'ok',
      latency_ms: Date.now() - start,
      connected: true,
      version: browser.version()
    });
  } catch (err) {
    return reply.code(503).send({
      status: 'error',
      detail: 'probe_failed',
      latency_ms: Date.now() - start
    });
  } finally {
    // Ignore close errors: the context may already be gone if the browser crashed
    await context.close().catch(() => {});
  }
});

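The about:blank navigation probe is cheap on a healthy browser (under 50 ms) and detects a Chromium instance that is down or unresponsive. Note that getBrowser() relaunches a disconnected browser, so after a crash the endpoint returns 503 only if the relaunch or the probe fails; a one-off crash is repaired by the probe itself. Wire this endpoint to AliveMCP with a 30-second check interval so that a browser that cannot recover shows up as a 503 instead of surfacing only as confusing timeout errors in tool calls.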
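
The handler above bounds page.goto() at 3 seconds, but getBrowser() and newContext() have no deadline of their own, so a hung Chromium could stall the health request. A generic deadline wrapper is one way to bound each step. This is a sketch: withTimeout is a hypothetical helper, not a Playwright or Fastify API.

```typescript
// Rejects with an Error if `promise` has not settled within `ms` milliseconds.
// The underlying operation is not cancelled; only the wait is abandoned.
function withTimeout<T>(promise: Promise<T>, ms: number, label = 'operation'): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  return Promise.race([promise, deadline]).finally(() => {
    // Clear the timer so a finished probe does not keep the event loop alive.
    if (timer !== undefined) clearTimeout(timer);
  });
}
```

In the health handler this could wrap the context creation, for example const context = await withTimeout(newContext(), 3000, 'newContext').catch(() => null), so the existing 503 branch also covers a browser that accepts the call but never answers.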

Silent failure modes

Failure	Observable symptom	Caught by process ping?	Detection strategy
Chromium subprocess crash	All tool calls throw `Target page, context or browser has been closed`	No — process is alive	`/health/browser` with blank-page navigation probe
OOM kill of Chromium	Browser disconnected event; new tool calls fail on context creation	No	Same as above; also watch container memory metrics
Target site blocks headless	Tool returns empty content or CAPTCHA page	No — tool call succeeds with 200	Validate expected page structure in tool output; log blocked responses
Page load timeout	Tool throws `McpError(InternalError)`	No	Track tool-level error rate; alert on sustained timeout spikes
Context leak (context.close() skipped)	Memory grows until OOM	No — until crash	Always use `finally { await context.close() }`; monitor process RSS

AliveMCP monitors your /health/browser endpoint on a 30-second cycle and alerts you when the probe fails, ideally before a user discovers that the screenshot tool has stopped responding.

Frequently asked questions

Should I use Playwright or Puppeteer for MCP browser tools?

Playwright is a reasonable default for new MCP server projects. Both libraries now ship their own TypeScript types (the separate @types/puppeteer package is no longer needed), so typing is not a differentiator. Playwright supports Chromium, Firefox, and WebKit from a single API, treats BrowserContext as a first-class abstraction for per-call isolation, and is actively maintained by Microsoft. Puppeteer works fine if you already use it, but Playwright's browser.newContext() plus context.newPage() pattern maps naturally onto the MCP tool call isolation model.

How do I handle authentication in browser tools?

The safest pattern is credential injection per context rather than maintaining a persistent logged-in browser. Accept an optional cookies parameter (an array of {name, value, domain, path}; context.addCookies() requires either a url or a domain and path pair for each cookie) or an auth_header string, and set it on the context before navigating. Never store session cookies in the server process between tool calls, because one caller's authentication state could then affect another's. For sites that require OAuth flows, pre-authenticate in a setup script and export the storage state with await context.storageState({ path: 'auth-state.json' }), then pass storageState: 'auth-state.json' when creating contexts for authenticated tools.
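
A minimal sketch of per-context credential injection follows. The CallerAuth shape, the AuthContext type, and the applyCallerAuth name are assumptions made for illustration; AuthContext is a structural subset of Playwright's BrowserContext covering addCookies() and setExtraHTTPHeaders(), the two methods the sketch relies on.

```typescript
// Caller-supplied credentials; field names are illustrative, not an MCP standard.
interface CallerAuth {
  cookies?: Array<{ name: string; value: string; domain: string; path?: string }>;
  auth_header?: string; // for example "Bearer <token>"
}

// Structural subset of Playwright's BrowserContext used by this sketch.
interface AuthContext {
  addCookies(cookies: Array<{ name: string; value: string; domain: string; path: string }>): Promise<void>;
  setExtraHTTPHeaders(headers: Record<string, string>): Promise<void>;
}

// Applies the caller's credentials to a fresh context before navigation.
async function applyCallerAuth(context: AuthContext, auth: CallerAuth): Promise<void> {
  if (auth.cookies && auth.cookies.length > 0) {
    await context.addCookies(
      // Default the path to "/" so every cookie has a domain and path pair.
      auth.cookies.map(c => ({ name: c.name, value: c.value, domain: c.domain, path: c.path ?? '/' })),
    );
  }
  if (auth.auth_header) {
    await context.setExtraHTTPHeaders({ Authorization: auth.auth_header });
  }
}
```

Extra HTTP headers set on a context are sent with every request its pages make, including requests to third-party origins, so cookies scoped to a domain are the safer choice when the target page loads external resources. Because the context is closed after the call, nothing set here survives into the next caller's context.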

Can browser tools run in Docker or Kubernetes?

Yes, but you need the right Chromium dependencies. Use the official Playwright Docker image (mcr.microsoft.com/playwright:v1.x.x-noble) as your base, or run npx playwright install-deps chromium in your Dockerfile. In containers, pass --disable-dev-shm-usage (Docker limits the /dev/shm tmpfs to 64 MB by default) and --no-sandbox (sandboxing requires kernel capabilities not available in most cluster security policies). If you need to run without --disable-dev-shm-usage, enlarge /dev/shm to at least 256 MB: use Docker's --shm-size option, or in Kubernetes mount a memory-backed emptyDir volume at /dev/shm.

How do I block callers from navigating to internal network addresses?

Parse the URL and block private IP ranges and localhost before calling page.goto(): reject localhost, 127.x.x.x, 10.x.x.x, 172.16–31.x.x, 192.168.x.x, 169.254.x.x (link-local), and [::1]. Also block the file:, data:, and javascript: schemes; allowing only http: and https: is simpler than listing every scheme to reject. This prevents SSRF, where a caller uses the screenshot tool to reach your internal Kubernetes services or the cloud metadata endpoint (169.254.169.254). A hostname can still resolve to a private address, so validate server-side after DNS resolution if possible, or use a DNS-based block list.
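
The literal-address part of that check can be sketched as follows. isBlockedHost and assertPublicHttpUrl are hypothetical names. Beyond the ranges listed above, the sketch also rejects 0.x.x.x, the IPv6 unique-local (fc00::/7) and link-local (fe80::/10) ranges, and IPv4-mapped IPv6 addresses; those extras are additions made here, not part of the list in the answer.

```typescript
// Returns true for hostnames that are loopback, private, or link-local literals.
function isBlockedHost(hostname: string): boolean {
  const host = hostname.toLowerCase().replace(/^\[|\]$/g, ''); // strip IPv6 brackets
  if (host === 'localhost' || host.endsWith('.localhost')) return true;

  const v4 = host.match(/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/);
  if (v4) {
    const a = Number(v4[1]);
    const b = Number(v4[2]);
    return (
      a === 0 ||                           // 0.0.0.0/8 (addition, see lead-in)
      a === 10 ||                          // 10.0.0.0/8
      a === 127 ||                         // loopback
      (a === 169 && b === 254) ||          // link-local, includes cloud metadata
      (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
      (a === 192 && b === 168)             // 192.168.0.0/16
    );
  }

  if (host.includes(':')) {
    // Only IPv6 literals contain a colon once the port has been split off.
    return (
      host === '::1' ||
      host === '::' ||
      /^f[cd]/.test(host) ||     // fc00::/7 unique local (addition)
      /^fe[89ab]/.test(host) ||  // fe80::/10 link-local (addition)
      host.startsWith('::ffff:') // IPv4-mapped: block rather than decode
    );
  }
  return false;
}

// Parses `raw` and throws unless it is an http(s) URL with a non-blocked host.
function assertPublicHttpUrl(raw: string): URL {
  const url = new URL(raw); // throws TypeError on malformed input
  if (url.protocol !== 'http:' && url.protocol !== 'https:') {
    throw new Error(`Scheme ${url.protocol} is not allowed`);
  }
  if (isBlockedHost(url.hostname)) {
    throw new Error(`Host ${url.hostname} is not allowed`);
  }
  return url;
}
```

This check only inspects the URL as written. It does not catch a public hostname whose DNS record points at a private address, or a redirect from a public page to an internal one, so it complements the DNS-level validation mentioned above and does not replace it.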

What's the right timeout for page.goto() in production?

Use 10–15 seconds as the default timeout for page.goto() with waitUntil: 'domcontentloaded'. This is generous for most pages while still bounding tool execution time. Let callers override it up to 30 seconds via a timeout_ms parameter. Do not use Playwright's default of 30 seconds for all operations — it makes tool calls feel hung and makes it hard to distinguish a slow page from a crashed browser. Always set a page.setDefaultNavigationTimeout() that matches your tool's maximum.