
Long-Running Tasks in MCP Servers — async tool dispatch, polling, and progress notifications

MCP tool calls follow a request/response model: the client sends a tools/call request and keeps it open until the server returns the result. In practice, clients and proxies enforce timeouts, typically somewhere between 30 and 120 seconds (the official TypeScript SDK's default request timeout is 60 seconds). Report generation, data exports, model inference pipelines, and code compilation routinely exceed those limits. Holding the request open for 10 minutes is not a viable answer: it exhausts connection slots, breaks behind proxies with lower timeouts, and prevents the agent from doing other work while waiting. This guide covers three viable patterns for long-running operations in MCP servers: async dispatch with polling, MCP progress notifications for real-time feedback, and webhook callbacks for results that arrive after the session ends.

TL;DR

Return immediately with a job ID. Give the agent a get_job_status tool to poll. Use MCP's progress notifications (notifications/progress) for real-time progress updates if the client supports them. Store job state in a persistent queue (BullMQ + Redis or pg-boss + Postgres) so jobs survive server restarts. Monitor the job processor with AliveMCP — a dead job processor looks identical to "tasks are running fine" from the agent's perspective.

The three async patterns compared

Pattern	Mechanism	Best for	Requires
Dispatch + poll	Tool returns job ID; agent calls `get_job_status` tool	All long-running tasks; simplest implementation	Job store + status endpoint
Progress notifications	Server pushes `notifications/progress` during execution	Tasks with meaningful intermediate milestones	MCP client that supports notifications
Webhook callback	Job sends HTTP POST to caller URL on completion	Tasks that outlast the agent session entirely	Caller provides a reachable URL

In practice, combine dispatch + poll (always available) with progress notifications (use when supported). Webhook callbacks are a last resort for multi-hour jobs — most agentic sessions don't survive that long.
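The "combine both" rule can be made explicit at dispatch time. A minimal sketch, assuming the server knows (or conservatively guesses) the client's request timeout, which MCP does not report to the server; chooseExecutionMode, CallContext and the 2x safety factor are hypothetical names and values chosen for illustration:

```typescript
// Hypothetical helper: decide per call whether to run inline with progress
// notifications or to dispatch a job and let the agent poll.
type ExecutionMode = 'inline_with_progress' | 'dispatch_and_poll';

interface CallContext {
  // Set only when the client sent _meta.progressToken with tools/call.
  progressToken?: string | number;
  // Assumed client request timeout; the protocol does not tell the server this.
  clientTimeoutSeconds: number;
}

function chooseExecutionMode(
  ctx: CallContext,
  expectedDurationSeconds: number,
  safetyFactor = 2 // headroom for runs slower than the estimate (assumption)
): ExecutionMode {
  const fitsInTimeout =
    expectedDurationSeconds * safetyFactor < ctx.clientTimeoutSeconds;
  // Inline only when progress can be reported AND the work should finish
  // comfortably inside the timeout; everything else becomes a job.
  return ctx.progressToken !== undefined && fitsInTimeout
    ? 'inline_with_progress'
    : 'dispatch_and_poll';
}

// A 20-second task from a client that sent a token and allows 60 seconds:
console.log(chooseExecutionMode({ progressToken: 'tok-1', clientTimeoutSeconds: 60 }, 20));
```

Dispatch + poll stays the default branch, so a missing token or an underestimated timeout degrades to the always-available pattern rather than to a dropped request.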

Pattern 1: Dispatch and poll with BullMQ

The job store is the single source of truth. BullMQ with Redis is a common choice for Node.js; pg-boss with Postgres is a better fit for deployments that already run Postgres and want to avoid a separate Redis dependency.

// job-queue.ts
import { Queue, Worker, Job } from 'bullmq';
import { Redis } from 'ioredis';

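// BullMQ requires maxRetriesPerRequest: null on connections used by workers
const connection = new Redis(process.env.REDIS_URL!, { maxRetriesPerRequest: null });
export const reportQueue = new Queue('reports', { connection });

// The worker is shown here for brevity; in production run it in its own process.
// fetchReportData, formatReport and uploadToStorage are application-specific helpers.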
new Worker('reports', async (job: Job) => {
  const { reportType, filters, outputFormat } = job.data;

  await job.updateProgress(10);
  const rawData = await fetchReportData(filters);

  await job.updateProgress(50);
  const formatted = await formatReport(rawData, outputFormat);

  await job.updateProgress(90);
  const url = await uploadToStorage(formatted);

  return { url, row_count: rawData.length };
}, { connection });
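Whichever backend you pick, the two MCP tools only rely on a handful of operations from it. A minimal sketch of that contract under stated assumptions: JobStore, JobRecord and InMemoryJobStore are hypothetical names, and the in-memory implementation is meant for unit-testing tool handlers, not for production:

```typescript
// Hypothetical contract between the MCP tools and whatever queue backs them.
type JobState = 'waiting' | 'active' | 'completed' | 'failed';

interface JobRecord<T> {
  id: string;
  state: JobState;
  progress: number; // 0..100
  result?: T;
  error?: string;
  createdAt: number; // epoch milliseconds
}

type JobPatch<T> = Partial<Omit<JobRecord<T>, 'id' | 'createdAt'>>;

interface JobStore<T> {
  enqueue(id: string): JobRecord<T>; // must be idempotent on id
  get(id: string): JobRecord<T> | undefined;
  update(id: string, patch: JobPatch<T>): void;
}

// Test double only: state lives in process memory and is lost on restart,
// which is exactly what BullMQ/Redis or pg-boss/Postgres avoid.
class InMemoryJobStore<T> implements JobStore<T> {
  private jobs = new Map<string, JobRecord<T>>();

  enqueue(id: string): JobRecord<T> {
    const existing = this.jobs.get(id);
    if (existing) return { ...existing }; // same ID: no duplicate job
    const job: JobRecord<T> = { id, state: 'waiting', progress: 0, createdAt: Date.now() };
    this.jobs.set(id, job);
    return { ...job };
  }

  get(id: string): JobRecord<T> | undefined {
    const job = this.jobs.get(id);
    return job ? { ...job } : undefined; // copies, so callers can't mutate stored state
  }

  update(id: string, patch: JobPatch<T>): void {
    const job = this.jobs.get(id);
    if (!job) throw new Error(`Unknown job: ${id}`);
    Object.assign(job, patch);
  }
}
```

Returning copies keeps the double honest: a real queue also hands back snapshots, so a handler that mutates a returned record and expects the change to stick fails in tests instead of in production.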

The two MCP tools — dispatch and poll:

// generate_report tool: returns immediately with job ID
server.tool('generate_report', {
  report_type: z.enum(['sales', 'inventory', 'customer_activity']),
  date_range: z.object({ from: z.string(), to: z.string() }),
  output_format: z.enum(['csv', 'pdf', 'json']).default('csv'),
  idempotency_key: z.string().optional()
}, async ({ report_type, date_range, output_format, idempotency_key }) => {
  // Dedup: if a job with this idempotency key already exists (in any state),
  // return its ID and current state instead of enqueueing a duplicate
  if (idempotency_key) {
    const existing = await reportQueue.getJob(idempotency_key);
    if (existing) {
      return {
        content: [{
          type: 'text',
          text: JSON.stringify({ job_id: existing.id, status: await existing.getState(), deduplicated: true })
        }]
      };
    }
  }

  const job = await reportQueue.add(
    'generate',
    { reportType: report_type, filters: date_range, outputFormat: output_format },
    { jobId: idempotency_key } // BullMQ uses jobId for dedup
  );

  return {
    content: [{
      type: 'text',
      text: JSON.stringify({
        job_id: job.id,
        status: 'queued',
        message: 'Report generation started. Poll get_report_status for results.',
        estimated_duration_seconds: 30,
        poll_after_seconds: 10
      })
    }]
  };
});

// get_report_status tool: poll this
server.tool('get_report_status', {
  job_id: z.string()
}, async ({ job_id }) => {
  const job = await reportQueue.getJob(job_id);

  if (!job) {
    return {
      content: [{ type: 'text', text: JSON.stringify({ error: 'Job not found' }) }],
      isError: true
    };
  }

  const state = await job.getState();
  const progress = job.progress;
  const result = job.returnvalue;
  const error = job.failedReason;

  return {
    content: [{
      type: 'text',
      text: JSON.stringify({
        job_id,
        status: state, // e.g. 'waiting', 'prioritized', 'delayed', 'active', 'completed', 'failed'
        progress_pct: typeof progress === 'number' ? progress : null,
        result: state === 'completed' ? result : null,
        error: state === 'failed' ? error : null,
        // Anything not terminal (waiting, prioritized, delayed retry backoff, active) means keep polling
        next_action: state === 'completed'
          ? `Download result at ${result?.url}`
          : state === 'failed'
          ? 'Job failed. Retry with generate_report'
          : 'Poll again in 10 seconds'
      })
    }]
  };
});
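BullMQ can report more states than the four listed in most examples, and a job sitting in delayed while it waits out a retry backoff is not a failure. A dependency-free sketch of a state-to-guidance mapping, assuming the state names below as reported by Job.getState() in recent BullMQ versions (check them against your installed version); pollGuidance is a hypothetical helper:

```typescript
// State names declared locally so the sketch has no dependencies.
type QueueJobState =
  | 'waiting' | 'prioritized' | 'delayed' | 'waiting-children'
  | 'active' | 'completed' | 'failed' | 'unknown';

interface PollGuidance {
  terminal: boolean;   // true when further polling cannot change the outcome
  next_action: string; // text the agent sees in the tool result
}

function pollGuidance(state: QueueJobState, pollAfterSeconds = 10): PollGuidance {
  switch (state) {
    case 'completed':
      return { terminal: true, next_action: 'Fetch the result field' };
    case 'failed':
      return { terminal: true, next_action: 'Job failed. Inspect error, then retry with generate_report' };
    case 'unknown':
      // The job record no longer exists (e.g. removed after completion).
      return { terminal: true, next_action: 'Job not found. Dispatch a new job' };
    case 'delayed':
      // Includes jobs waiting out a retry backoff: still in flight.
      return { terminal: false, next_action: `Retry scheduled. Poll again in ${pollAfterSeconds} seconds` };
    default:
      // waiting, prioritized, waiting-children, active
      return { terminal: false, next_action: `Poll again in ${pollAfterSeconds} seconds` };
  }
}
```

Keeping the terminal flag separate from the prose lets the status tool return a machine-checkable "stop polling" signal even if the wording of next_action changes.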

Pattern 2: MCP progress notifications

MCP defines a notifications/progress message that lets servers send incremental updates during a long tool call, before returning the final result. The client receives progress events and can display them to the user or feed them back to the LLM context.

Progress notifications are opt-in per request: the client must include a progressToken in the request's _meta field (params._meta.progressToken). If the token is present, the server may send progress notifications referencing that token until the tool returns; the spec permits them but does not require them. If no token is provided (many clients don't send one), the server must not send progress for that call, so fall back to the dispatch+poll pattern.
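On the wire, the opt-in looks like this. A simplified sketch with the JSON-RPC message shapes declared locally (the message field on progress notifications was added in the 2025-03-26 revision of the spec, so older clients may ignore it); buildProgress is a hypothetical helper, not an SDK function:

```typescript
type ProgressToken = string | number;

// Simplified JSON-RPC shapes for tools/call and notifications/progress.
interface ToolsCallRequest {
  jsonrpc: '2.0';
  id: string | number;
  method: 'tools/call';
  params: {
    name: string;
    arguments?: Record<string, unknown>;
    _meta?: { progressToken?: ProgressToken };
  };
}

interface ProgressNotification {
  jsonrpc: '2.0';
  method: 'notifications/progress'; // notifications carry no id
  params: { progressToken: ProgressToken; progress: number; total?: number; message?: string };
}

// Returns null when the client did not opt in: without a token there is
// nothing the server may reference, so it sends no progress at all.
function buildProgress(
  req: ToolsCallRequest,
  progress: number,
  total?: number,
  message?: string
): ProgressNotification | null {
  const progressToken = req.params._meta?.progressToken;
  if (progressToken === undefined) return null;
  return {
    jsonrpc: '2.0',
    method: 'notifications/progress',
    params: {
      progressToken,
      progress,
      ...(total !== undefined ? { total } : {}),
      ...(message !== undefined ? { message } : {}),
    },
  };
}

const request: ToolsCallRequest = {
  jsonrpc: '2.0',
  id: 7,
  method: 'tools/call',
  params: {
    name: 'analyze_codebase',
    arguments: { repo_url: 'https://example.com/repo.git' },
    _meta: { progressToken: 'tok-7' },
  },
};
console.log(JSON.stringify(buildProgress(request, 20, 100, 'Indexing files...')));
```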

// Tool handler with progress notification support
server.tool('analyze_codebase', {
  repo_url: z.string().url(),
  analysis_depth: z.enum(['shallow', 'deep']).default('deep')
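}, async ({ repo_url, analysis_depth }, extra) => {
  // The SDK passes the request's _meta (including any progressToken) on the handler's extra argument
  const progressToken = extra._meta?.progressToken;

  const sendProgress = async (progress: number, message: string) => {
    if (progressToken === undefined) return; // No-op if the client didn't send a token
    // sendNotification routes the notification back over the transport carrying this request
    await extra.sendNotification({
      method: 'notifications/progress',
      params: { progressToken, progress, total: 100, message }
    });
  };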

  await sendProgress(5, 'Cloning repository...');
  await cloneRepo(repo_url);

  await sendProgress(20, 'Indexing files...');
  const files = await indexFiles(repo_url);

  await sendProgress(60, `Analyzing ${files.length} files...`);
  const analysis = await analyzeFiles(files, analysis_depth);

  await sendProgress(95, 'Generating report...');
  const report = await generateReport(analysis);

  return {
    content: [{ type: 'text', text: JSON.stringify(report) }]
  };
});

The key invariant: progress must increase with every notification. The MCP spec requires this even when the total is unknown, and clients that render progress bars would otherwise jump backward visibly (don't send 80% followed by 40%). If you don't know the exact percentage, use a graduated estimate: 5% → 15% → 35% → 65% → 90% → return.
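A small wrapper can enforce that invariant once instead of trusting every call site. A sketch under stated assumptions: createMonotonicProgress and GRADUATED_STEPS are hypothetical names, `send` stands in for whatever actually emits the notification (for example a closure around the tool handler's extra.sendNotification), and the wrapper silently drops out-of-order or repeated values rather than throwing:

```typescript
type SendProgress = (progress: number, message?: string) => void | Promise<void>;

// Graduated milestones for work whose true percentage is unknown.
const GRADUATED_STEPS = [5, 15, 35, 65, 90] as const;

function createMonotonicProgress(send: SendProgress, total = 100) {
  let last = -Infinity;
  return async (progress: number, message?: string): Promise<void> => {
    const value = Math.min(progress, total); // never report past the total
    if (value <= last) return;               // drop updates that would not move forward
    last = value;                            // updated before awaiting, so overlapping calls can't regress
    await send(value, message);
  };
}
```

Clamping at total also means a final 100 is sent only once, even if several code paths report completion.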

Idempotency for retried tool calls

Long-running jobs create a timing problem: the agent's session times out while the job is still running, the user re-runs the agent, and now the same expensive job runs twice. Idempotency keys prevent duplicate job creation:

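import { createHash } from 'node:crypto';

// Agent side: derive a stable idempotency key from a scope ID + tool parameters.
// scopeId must survive the retry (e.g. a conversation or task ID); a per-connection
// session ID changes on re-run and defeats the dedup. The server dedups on the key.

// Sorted keys at every depth; JSON.stringify's array replacer would drop nested keys
function stableStringify(v: unknown): string {
  if (Array.isArray(v)) return `[${v.map(stableStringify).join(',')}]`;
  if (v === null || typeof v !== 'object') return JSON.stringify(v);
  const obj = v as Record<string, unknown>;
  return `{${Object.keys(obj).sort().map(k => `${JSON.stringify(k)}:${stableStringify(obj[k])}`).join(',')}}`;
}

function deriveIdempotencyKey(scopeId: string, toolName: string, params: Record<string, unknown>): string {
  // Hash the whole canonical form: a truncated base64 prefix collides whenever params differ late
  const digest = createHash('sha256').update(stableStringify(params)).digest('hex').slice(0, 24);
  return `${scopeId}-${toolName}-${digest}`; // '-' rather than ':', which BullMQ advises against in custom job IDs
}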

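// pg-boss deduplication example (uses singletonKey)
import PgBoss from 'pg-boss';

// Resolves to the new job ID, or null when a job with the same key was already
// sent in the current throttle slot (pg-boss does not return the existing job's ID)
async function enqueueWithDedup(
  boss: PgBoss,
  jobName: string,
  data: object,
  idempotencyKey: string
): Promise<string | null> {
  return boss.send(jobName, data, {
    singletonKey: idempotencyKey,
    singletonSeconds: 3600, // throttle: one job per key per one-hour time slot
    retryLimit: 3,
    retryDelay: 30 // seconds between retries
  });
}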
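Because a deduplicated send can come back with no job ID, the dispatch tool still needs a way to hand the agent the original one. One option is a small key-to-job-ID index with its own expiry. A sketch, assuming an in-memory map for clarity (several server instances would need it in Redis or a Postgres table); IdempotencyIndex is a hypothetical class, and its window starts at the first dispatch for each key:

```typescript
// Hypothetical idempotency index: key -> job ID, valid for ttlMs after the first dispatch.
class IdempotencyIndex {
  private readonly entries = new Map<string, { jobId: string; expiresAt: number }>();
  private readonly ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Returns the live job ID for `key`, or calls `create` and remembers its result.
  // `create` is synchronous so check-and-set happens in one event-loop turn.
  getOrCreate(key: string, create: () => string, now = Date.now()): { jobId: string; created: boolean } {
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > now) {
      return { jobId: hit.jobId, created: false };
    }
    const jobId = create(); // unseen or expired key: dispatch a new job
    this.entries.set(key, { jobId, expiresAt: now + this.ttlMs });
    return { jobId, created: true };
  }
}
```

With an asynchronous dispatch, two concurrent calls could both miss the index before either writes it; storing a pending promise per key (not shown) closes that gap.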

Job processor health monitoring

The silent failure mode for async MCP tools is a dead job processor. The dispatch tool works — it accepts the job and returns a job ID immediately. The agent polls get_job_status and sees waiting. Minutes pass. Still waiting. The job processor went down 20 minutes ago and no one noticed.

A health check endpoint for the job processor:

app.get('/health/jobs', async (req, res) => {
  // Jobs 'active' longer than the max expected duration, measured from when a
  // worker picked them up (processedOn), not from creation (timestamp)
  const activeJobs = await reportQueue.getJobs(['active']);
  const stuckCount = activeJobs.filter(
    j => Date.now() - (j.processedOn ?? j.timestamp) > 5 * 60 * 1000 // active > 5 minutes
  ).length;

  // Count jobs waiting longer than expected queue time
  const waitingJobs = await reportQueue.getJobs(['waiting']);
  const stalledWaiting = waitingJobs.filter(
    j => Date.now() - j.timestamp > 10 * 60 * 1000 // waiting > 10 minutes
  ).length;

  // getWorkers() returns the workers currently connected to this queue
  const workers = await reportQueue.getWorkers();

  const status = workers.length === 0 || stuckCount > 0 || stalledWaiting > 5
    ? 'degraded'
    : 'ok';

  res.status(status === 'ok' ? 200 : 503).json({
    status,
    worker_count: workers.length,
    stuck_active_jobs: stuckCount,
    stalled_waiting_jobs: stalledWaiting,
    queue_depth: waitingJobs.length
  });
});
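The threshold logic is easier to unit-test when it is separated from Express and Redis. A dependency-free sketch of the same decision, assuming job snapshots that carry BullMQ's timestamp (creation time) and processedOn (when a worker started the job) fields, and measuring active time from processedOn so that time spent queued is not counted as being stuck; evaluateQueueHealth and DEFAULT_THRESHOLDS are hypothetical names:

```typescript
interface JobSnapshot {
  timestamp: number;    // when the job was created (epoch ms)
  processedOn?: number; // when a worker started it, if it has started
}

interface HealthThresholds {
  maxActiveMs: number;
  maxWaitingMs: number;
  maxStalledWaiting: number;
}

interface QueueHealth {
  status: 'ok' | 'degraded';
  worker_count: number;
  stuck_active_jobs: number;
  stalled_waiting_jobs: number;
  queue_depth: number;
}

const DEFAULT_THRESHOLDS: HealthThresholds = {
  maxActiveMs: 5 * 60 * 1000,   // active > 5 minutes
  maxWaitingMs: 10 * 60 * 1000, // waiting > 10 minutes
  maxStalledWaiting: 5,
};

function evaluateQueueHealth(
  active: JobSnapshot[],
  waiting: JobSnapshot[],
  workerCount: number,
  now: number,
  t: HealthThresholds = DEFAULT_THRESHOLDS
): QueueHealth {
  // Active time counts from when work started; fall back to creation time.
  const stuckActive = active.filter(j => now - (j.processedOn ?? j.timestamp) > t.maxActiveMs).length;
  const stalledWaiting = waiting.filter(j => now - j.timestamp > t.maxWaitingMs).length;
  const degraded = workerCount === 0 || stuckActive > 0 || stalledWaiting > t.maxStalledWaiting;
  return {
    status: degraded ? 'degraded' : 'ok',
    worker_count: workerCount,
    stuck_active_jobs: stuckActive,
    stalled_waiting_jobs: stalledWaiting,
    queue_depth: waiting.length,
  };
}
```

The route handler then shrinks to fetching the job lists and worker count and passing them, with Date.now(), to this function.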

Register https://your-mcp-server.com/health/jobs in AliveMCP with a 1-minute check interval. When the job processor dies, the next check returns 503 and AliveMCP raises an alert, usually before the first affected user gives up on a polling loop that never gets past waiting.

Frequently asked questions

What polling interval should I tell the agent to use?

Include a poll_after_seconds field in the job dispatch response. Set it based on the expected job duration: for 30-second jobs, suggest 10 seconds; for 5-minute jobs, suggest 30 seconds; for hour-long jobs, suggest 120 seconds. Shorter polling intervals don't complete jobs faster; they only increase load on your job store. Models generally follow an explicit suggestion in the tool result, but not reliably. If the LLM ignores it and polls every second, add a server-side rate limit on get_job_status: 6 requests per minute per job ID accommodates any suggested interval of 10 seconds or more.
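A sketch of both suggestions, with the tier boundaries (60 seconds and 15 minutes) chosen as assumptions to separate the three example durations; suggestPollAfterSeconds and PollRateLimiter are hypothetical names, and the limiter is a simple fixed window kept in process memory:

```typescript
// Map expected job duration to a poll_after_seconds hint (tier cut-overs are assumptions).
function suggestPollAfterSeconds(expectedDurationSeconds: number): number {
  if (expectedDurationSeconds <= 60) return 10;      // ~30-second jobs
  if (expectedDurationSeconds <= 15 * 60) return 30; // ~5-minute jobs
  return 120;                                        // hour-long jobs
}

// Fixed-window limiter: at most `limit` get_job_status calls per job ID per window.
// In-memory only; a multi-instance deployment would keep the counters in Redis.
class PollRateLimiter {
  private readonly windows = new Map<string, { start: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit = 6, windowMs = 60_000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  allow(jobId: string, now = Date.now()): boolean {
    const w = this.windows.get(jobId);
    if (!w || now - w.start >= this.windowMs) {
      this.windows.set(jobId, { start: now, count: 1 }); // open a new window
      return true;
    }
    if (w.count >= this.limit) return false;
    w.count += 1;
    return true;
  }
}
```

When allow returns false, a normal tool result that repeats poll_after_seconds is usually more useful to the agent than an error, since the job itself is fine.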

What happens if the job processor restarts while a job is running?

BullMQ detects crashed workers through job locks: a worker keeps renewing the lock on each active job, and if the lock lapses (lockDuration, default 30 seconds), the next stalled-job check (run every stalledInterval, default 30 seconds) moves the job back to waiting for retry. A job that stalls more than maxStalledCount times (default: 1) is moved to failed instead. For jobs with side effects (e.g., partially written files), implement idempotent workers that check whether prior progress exists before re-executing each step. pg-boss has comparable semantics via job expiration: an active job that outlives its expiration (expireInSeconds, 15 minutes by default) is failed and retried if retryLimit allows. Always design job workers as idempotent: running the same job twice with the same inputs must leave the system in the same state as running it once.
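The "check whether prior progress exists" step can be factored into one helper. A synchronous sketch for brevity, assuming a real worker would await each step and persist checkpoints in Redis or Postgres rather than a Map; runStep and the `jobId/step` key format are hypothetical:

```typescript
// Durable in practice; a Map stands in for a checkpoint table here.
type Checkpoints = Map<string, unknown>;

// Runs `fn` unless this job already completed `step` in an earlier attempt,
// in which case the recorded result is returned instead.
function runStep<T>(store: Checkpoints, jobId: string, step: string, fn: () => T): T {
  const key = `${jobId}/${step}`;
  if (store.has(key)) return store.get(key) as T;
  const value = fn();
  store.set(key, value); // recorded only after the step succeeds
  return value;
}
```

A crash between fn() returning and the checkpoint being written still re-runs that one step, so each step should itself be safe to repeat (for example, uploading to a deterministic object key).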

How do I handle jobs that need human approval mid-execution?

Suspend the job at the approval checkpoint and emit an approval request. Use the human-in-the-loop pattern: the job writes a pending_approval row, then suspends itself by awaiting a Promise that resolves only when the approval row transitions to approved. The job processor listens on a Postgres NOTIFY channel or Redis pub/sub for approval events. When the approval arrives, the Promise resolves and the job continues from where it paused. Because the wait is decoupled from any MCP request, it can last hours without the agent hitting a client timeout. Two things to budget for: the waiting job occupies a worker concurrency slot, and queues with job expiration still apply it (pg-boss expires active jobs after 15 minutes by default), so raise the expiration above the longest expected approval wait.
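The Promise part of that design is small. A sketch of just the in-process mechanics, assuming the pending_approval row and the NOTIFY or pub/sub listener live elsewhere and call settle() when a decision arrives; ApprovalGate and its methods are hypothetical, and it supports one waiter per approval ID:

```typescript
type Decision = 'approved' | 'rejected';

class ApprovalGate {
  private readonly pending = new Map<string, (d: Decision) => void>();

  // Called by the job at its checkpoint; the executor runs synchronously,
  // so the resolver is registered before wait() returns.
  wait(approvalId: string): Promise<Decision> {
    return new Promise<Decision>((resolve) => {
      this.pending.set(approvalId, resolve);
    });
  }

  // Called by the pub/sub listener; false for unknown or already-settled IDs.
  settle(approvalId: string, decision: Decision): boolean {
    const resolve = this.pending.get(approvalId);
    if (!resolve) return false;
    this.pending.delete(approvalId);
    resolve(decision);
    return true;
  }

  get pendingCount(): number {
    return this.pending.size;
  }
}
```

The map does not survive a restart, so the durable pending_approval row remains the source of truth: a restarted worker should read the row first and only call wait() if it is still pending.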

Should I use progress notifications or polling for streaming output?

Progress notifications are the right tool when the client supports them: they reduce latency to the first visible update and don't require the agent to spend tool calls on polling. The catch: not all MCP clients send a progressToken. Always implement the dispatch+poll fallback as the baseline; add progress notifications as an enhancement. For streaming output (actual partial results rather than a progress percentage, like a streaming LLM response), progress notifications are a poor fit: they carry a number and an optional short message, not result data, and the core MCP spec has no dedicated message for partial tool results. A practical workaround is to have the status tool return the output accumulated so far alongside the job state.