Agentic Patterns guide · 2026-06-21 · Agentic Patterns & Long-Running Operations

Five Agentic Patterns Every Production MCP Server Needs

An MCP server built for a single-step API call and an MCP server built for autonomous agentic workflows are not the same thing. Agents differ from conventional API callers in four fundamental ways: they pick tools based on natural language understanding rather than explicit code (tool selection can fail); they execute long-running operations without holding a connection open (async jobs can stall invisibly); they run multi-step workflows that can be interrupted and resumed (state must survive restarts); and they take irreversible actions that sometimes need a human checkpoint before execution (guardrails and approval gates are not optional). Each difference produces a distinct category of silent failure — the kind where the server returns HTTP 200, the protocol probe stays green, but the agent is not actually doing useful work. This guide synthesizes five patterns that together close those gaps: tool discovery (accurate tool selection by design), long-running tasks (async dispatch without silent stalls), state machines (workflow persistence that survives interruption), human-in-the-loop approval gates (irreversible action safety), and guardrails (defense at both trust boundaries). Each pattern also has a characteristic health signal that AliveMCP can monitor — because in agentic systems, the most dangerous failures are the ones that look fine from the outside.

The five patterns and what they each prevent

Pattern	Silent failure it prevents	Monitored via
Tool discovery	Agent calls the wrong tool; fails loudly or silently misuses a write tool when a read tool was appropriate	Tool selection accuracy tests; tool-level error rates
Long-running tasks	Job processor dead; agent polls forever while queue never drains	`/health/jobs` — worker count, stuck active jobs, stalled waiting jobs
State machines	Workflow stuck in intermediate state after session timeout or partial failure; duplicate side effects on retry	`/health/workflows` — count of non-terminal states idle for >1 hour
Human-in-the-loop	Approval service down; every destructive agentic action silently stalls	`/health/approvals` — Slack connectivity, stale pending approvals
Guardrails	Prompt injection via tool arguments; SSRF via URL parameters; PII leaking through tool results into the LLM context	`/health/security` — guardrail rejection rate vs baseline

Four of the five patterns add a dedicated health endpoint; tool discovery is checked with pre-deployment accuracy tests and tool-level error rates instead. Together those endpoints give AliveMCP enough signal to distinguish "server is reachable" from "server is working correctly for agentic callers", which is the gap that protocol probes alone cannot close.

Pattern 1: Tool discovery — make the right tool obvious

Agentic tool selection failures look like model errors but are almost always tool authoring errors. The LLM chooses which tool to call based entirely on the tool's name, description, and JSON schema. Ambiguous names and missing disambiguation clauses are the most common causes of wrong-tool selection; unconstrained string parameters that should be enums are the most common cause of malformed arguments once the right tool has been chosen.

The fix has three parts, each addressing a different selection signal:

Naming: resource-plus-verb names with consistent prefixes. customer_search, customer_get, customer_list group alphabetically and encode both the object and the action. Generic names like process or handle_data force the model to infer intent from context it may not have. Use specific verbs: get for single-item lookups, search for criteria-based queries, list for paginated enumeration, create/update/delete for writes.
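A naming convention is easier to keep when CI checks it. The sketch below is a hypothetical lint helper, not part of any MCP SDK: it splits each name on its last underscore and accepts only the verbs listed above. The set of banned generic names is an assumption.

```typescript
// Hypothetical lint helper for the <resource>_<verb> convention used by
// customer_search, customer_get and customer_list.
const ALLOWED_VERBS: readonly string[] = ['get', 'search', 'list', 'create', 'update', 'delete'];
// Illustrative list of names that say nothing about resource or action.
const GENERIC_NAMES = new Set(['process', 'handle_data', 'run', 'execute']);

interface NamingIssue {
  name: string;
  reason: string;
}

function lintToolNames(names: string[]): NamingIssue[] {
  const issues: NamingIssue[] = [];
  for (const name of names) {
    if (GENERIC_NAMES.has(name)) {
      issues.push({ name, reason: 'generic name: encode the resource and the action' });
      continue;
    }
    // Split on the last underscore: "customer_search" -> resource "customer", verb "search"
    const idx = name.lastIndexOf('_');
    if (idx <= 0 || idx === name.length - 1) {
      issues.push({ name, reason: 'expected <resource>_<verb>' });
      continue;
    }
    const verb = name.slice(idx + 1);
    if (!ALLOWED_VERBS.includes(verb)) {
      issues.push({ name, reason: `unrecognized verb "${verb}"; use one of ${ALLOWED_VERBS.join(', ')}` });
    }
  }
  return issues;
}

// Usage: flags the generic name "process" and the non-standard verb in "customer_fetch"
const flagged = lintToolNames(['customer_search', 'customer_get', 'process', 'customer_fetch']);
console.log(flagged.map((i) => `${i.name}: ${i.reason}`).join('\n'));
```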

Descriptions: include when NOT to use. The "Do NOT use when" clause is the most underused and most effective part of a tool description. It explicitly tells the model when to prefer a different tool — directly preventing the most common wrong-tool pattern where two tools overlap in apparent purpose:

{
  description:
    'Search customers by name, email, or company. ' +
    'Use this when: you have a partial name/email and need a list to choose from. ' +
    'Do NOT use when: you have an exact customer ID — use customer_get instead. ' +
    'Do NOT use when: you need ALL customers — use customer_list instead.'
}

Schema: enums over unconstrained strings. An LLM faced with status: z.string() may generate "in-progress" when you expect "active". With status: z.enum(['active', 'inactive', 'suspended']), the allowed values are published in the tool's JSON schema and the model picks from them; an out-of-set value now fails schema validation instead of silently reaching your handler. This single change eliminates an entire class of tool call failures in any field that accepts only a fixed set of values.

Measure accuracy before deploying: write 20–50 natural-language test prompts that should trigger specific tools, run them through a test agent session, and check which tool was actually called. Target above 90% accuracy; below 80% means your descriptions need a disambiguation audit. See tool discovery patterns for the complete naming system, tool annotation hints (readOnlyHint, destructiveHint, idempotentHint), and tool count management when your server exceeds 20 tools.
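The measurement loop can be sketched as follows, assuming a hypothetical `selectTool` adapter that sends one prompt to a test agent session and reports which tool the model called first. The "review" band between 80% and 90% is an assumption; the text only defines the two ends.

```typescript
// Hypothetical adapter: sends one prompt to a test agent session and returns
// the name of the first tool the model called, or null if it called none.
type ToolSelector = (prompt: string) => Promise<string | null>;

interface SelectionCase {
  prompt: string;
  expectedTool: string;
}

interface AccuracyReport {
  accuracy: number; // fraction of cases where the expected tool was called
  verdict: 'pass' | 'review' | 'audit'; // >= 0.9 pass, 0.8 to 0.9 review, < 0.8 audit
  misses: { prompt: string; expected: string; actual: string | null }[];
}

async function measureToolSelection(
  cases: SelectionCase[],
  selectTool: ToolSelector,
): Promise<AccuracyReport> {
  const misses: AccuracyReport['misses'] = [];
  for (const c of cases) {
    const actual = await selectTool(c.prompt);
    if (actual !== c.expectedTool) {
      misses.push({ prompt: c.prompt, expected: c.expectedTool, actual });
    }
  }
  const accuracy = cases.length === 0 ? 0 : (cases.length - misses.length) / cases.length;
  const verdict = accuracy >= 0.9 ? 'pass' : accuracy >= 0.8 ? 'review' : 'audit';
  return { accuracy, verdict, misses };
}
```

Running this in CI against a fixed prompt set turns a description edit that silently lowers accuracy into a failed build; the `misses` list shows which tool pairs need a "Do NOT use when" clause.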

Pattern 2: Long-running tasks — return immediately, track asynchronously

MCP tool calls are synchronous in the protocol sense, but clients and proxies enforce timeouts — typically 30 to 120 seconds. Report generation, data exports, model inference pipelines, and code compilation routinely exceed those limits. The pattern: return a job ID immediately, give the agent a polling tool, and process the work in a persistent queue.

BullMQ with Redis (or pg-boss with Postgres) provides the job store:

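// Assumes the MCP TypeScript SDK, zod, and BullMQ; connection details are placeholders
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { z } from 'zod';
import { Queue } from 'bullmq';

const server = new McpServer({ name: 'reports', version: '1.0.0' });
const reportQueue = new Queue('reports', { connection: { host: 'localhost', port: 6379 } });

// Dispatch tool: enqueues the work and returns immediately
server.tool('generate_report', {
  report_type: z.enum(['sales', 'inventory', 'customer_activity']),
  date_range: z.object({ from: z.string(), to: z.string() }),
  idempotency_key: z.string().optional()
}, async ({ report_type, date_range, idempotency_key }) => {
  // With a jobId set, BullMQ will not add a second job while one with that ID still exists
  const job = await reportQueue.add(
    'generate',
    { reportType: report_type, filters: date_range },
    { jobId: idempotency_key }
  );
  return {
    content: [{ type: 'text', text: JSON.stringify({
      job_id: job.id,
      status: 'queued',
      message: 'Poll get_report_status for results.',
      poll_after_seconds: 10
    })}]
  };
});

// Poll tool: agent calls this until status is 'completed', 'failed', or 'not_found'
server.tool('get_report_status', {
  job_id: z.string()
}, async ({ job_id }) => {
  const job = await reportQueue.getJob(job_id);
  if (!job) {
    // Unknown or already-removed job: tell the agent to stop polling
    return {
      content: [{ type: 'text', text: JSON.stringify({ job_id, status: 'not_found' }) }],
      isError: true
    };
  }
  const state = await job.getState();
  return {
    content: [{ type: 'text', text: JSON.stringify({
      job_id,
      status: state,
      progress_pct: typeof job.progress === 'number' ? job.progress : null,
      result: state === 'completed' ? job.returnvalue : null,
      error: state === 'failed' ? job.failedReason : null
    })}]
  };
});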
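On the caller's side (an agent harness or an integration test), polling should honor poll_after_seconds, back off, and give up eventually, so a dead processor produces an error instead of an endless wait. A minimal sketch, assuming a hypothetical `callTool` adapter around an MCP client that returns the parsed JSON from the tool result; the 15-minute ceiling and the 60-second backoff cap are illustrative values.

```typescript
// Hypothetical adapter: sends tools/call through an MCP client and returns
// the JSON parsed from the first text content item of the result.
type CallTool = (name: string, args: Record<string, unknown>) => Promise<unknown>;

interface ReportStatus {
  job_id: string;
  status: string;
  progress_pct: number | null;
  result: unknown;
}

const TERMINAL_STATUSES = new Set(['completed', 'failed']);
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function waitForReport(
  callTool: CallTool,
  jobId: string,
  pollAfterSeconds: number,
  maxWaitMs = 15 * 60 * 1000, // stop eventually instead of polling a dead queue forever
): Promise<ReportStatus> {
  const deadline = Date.now() + maxWaitMs;
  let delayMs = pollAfterSeconds * 1000; // honor the server's poll_after_seconds hint
  while (Date.now() < deadline) {
    const status = (await callTool('get_report_status', { job_id: jobId })) as ReportStatus;
    if (TERMINAL_STATUSES.has(status.status)) return status;
    await sleep(delayMs);
    delayMs = Math.min(delayMs * 2, 60000); // exponential backoff, capped at one minute
  }
  throw new Error(`Job ${jobId} still not finished after ${maxWaitMs} ms`);
}
```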

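Include an idempotency_key parameter on every dispatch tool. When an agent session times out mid-run and the user resumes, re-dispatching with the same key returns the existing job ID instead of creating a duplicate job. BullMQ deduplicates on jobId natively, but only while the original job is still stored in Redis: if completed jobs are removed (for example with the removeOnComplete option), the same key creates a new job.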
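Agents do not always resend the same key after a restart, so one option is to derive the key from the logical request itself. A minimal sketch using Node's built-in `crypto` module; `reportIdempotencyKey` and its `scope` argument (any identifier that survives a session restart, such as a user or task ID) are hypothetical.

```typescript
import { createHash } from 'node:crypto';

// Hypothetical helper: the same logical request always maps to the same key,
// so a re-dispatch after a session restart hits BullMQ's jobId deduplication.
function reportIdempotencyKey(
  scope: string, // assumption: a user or task ID, not a per-session ID
  reportType: string,
  range: { from: string; to: string },
): string {
  // Fixed field order, so property order in the input cannot change the hash
  const canonical = JSON.stringify([scope, reportType, range.from, range.to]);
  // Hex digest: no ':' or other separator characters in the resulting jobId
  return 'report-' + createHash('sha256').update(canonical).digest('hex').slice(0, 32);
}
```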

The critical monitoring point: a dead job processor is invisible from the outside. The dispatch tool works — it accepts jobs and returns job IDs. The agent polls and sees waiting. Minutes pass, still waiting. The processor went down 20 minutes ago and nothing alerted. Wire AliveMCP to a dedicated health endpoint:

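// Assumes an Express `app` and the same BullMQ `reportQueue` used by the tools
app.get('/health/jobs', async (req, res) => {
  try {
    const workers = await reportQueue.getWorkers();
    const waiting = await reportQueue.getWaitingCount();
    // processedOn is when a worker picked the job up; timestamp is when it was created
    const stuckActive = (await reportQueue.getJobs(['active']))
      .filter(j => Date.now() - (j.processedOn ?? j.timestamp) > 5 * 60 * 1000).length;

    const status = workers.length === 0 || stuckActive > 0 ? 'degraded' : 'ok';
    res.status(status === 'ok' ? 200 : 503).json({
      status, worker_count: workers.length, stuck_active_jobs: stuckActive, waiting_jobs: waiting
    });
  } catch (err) {
    // Redis unreachable: report degraded instead of leaving the request hanging
    res.status(503).json({ status: 'degraded', error: String(err) });
  }
});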

With a one-minute AliveMCP check interval on /health/jobs, a dead processor (zero workers) shows up within a single check cycle, and stuck active jobs surface as soon as they cross the five-minute threshold. See long-running tasks patterns for progress notifications (the MCP notifications/progress mechanism for real-time updates), webhook callbacks for jobs that outlast the agent session, and pg-boss restart semantics for jobs that were active when the processor crashed.

Pattern 3: State machines — persist workflow state across session boundaries

Multi-step agentic workflows (validate input → reserve inventory → charge payment → trigger fulfillment) cannot live in in-memory server state. When the agent session times out, the process restarts, or the user returns the next day, the workflow must resume from the last committed step rather than restart from scratch. In-memory Maps lose all state on restart. Redis is fast but has no built-in audit trail, and key expiry or memory eviction can drop active workflow state. Postgres is the right choice for workflow state: it survives restarts, supports concurrent access with row-level locking, and makes an append-only audit log of every transition a simple second table.

The schema stores current state and context in one table, and transitions in an append-only events table:

CREATE TABLE workflows (
  id           UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  workflow_type TEXT NOT NULL,
  state        TEXT NOT NULL,
  context      JSONB NOT NULL DEFAULT '{}',
  version      INTEGER NOT NULL DEFAULT 1,
  updated_at   TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

CREATE TABLE workflow_events (
  id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  workflow_id UUID NOT NULL REFERENCES workflows(id),
  from_state  TEXT NOT NULL,
  to_state    TEXT NOT NULL,
  event       TEXT NOT NULL,
  payload     JSONB,
  actor       TEXT,
  occurred_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

Enforce valid transitions server-side with a typed transition constant, and use SELECT ... FOR UPDATE row locking so concurrent agents cannot race on the same workflow. For example, two agents cannot both advance through authorize_payment: the second agent's FOR UPDATE blocks until the first transaction commits, its state check then sees the already-advanced state, and the duplicate transition is rejected.
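A typed transition constant can be as small as a lookup table. The sketch below uses hypothetical state and action names for the order workflow; `applyAction` is a pure guard meant to run inside the same transaction that holds the FOR UPDATE lock (outlined in the comment), and `nextAllowedActions` produces the list a read tool can return.

```typescript
// Hypothetical state and action names for the order workflow in the text.
type OrderState =
  | 'created' | 'validated' | 'inventory_reserved'
  | 'payment_authorized' | 'payment_captured' | 'fulfilled' | 'cancelled';

// For each state: the actions (MCP tool names) allowed from it and their target state.
const ORDER_TRANSITIONS: Record<OrderState, Partial<Record<string, OrderState>>> = {
  created: { validate_order_input: 'validated', cancel_order: 'cancelled' },
  validated: { reserve_order_inventory: 'inventory_reserved', cancel_order: 'cancelled' },
  inventory_reserved: { authorize_order_payment: 'payment_authorized', cancel_order: 'cancelled' },
  payment_authorized: { capture_order_payment: 'payment_captured', cancel_order: 'cancelled' },
  payment_captured: { trigger_order_fulfillment: 'fulfilled' },
  fulfilled: {},
  cancelled: {},
};

function nextAllowedActions(state: OrderState): string[] {
  return Object.keys(ORDER_TRANSITIONS[state]);
}

// Pure guard, intended to run inside the locking transaction:
//   BEGIN;
//   SELECT state FROM workflows WHERE id = $1 FOR UPDATE;  -- concurrent agents wait here
//   -- applyAction(currentState, action), then UPDATE workflows and INSERT workflow_events
//   COMMIT;
function applyAction(state: OrderState, action: string): OrderState {
  const transitions = ORDER_TRANSITIONS[state];
  // hasOwnProperty avoids matching inherited keys such as "constructor"
  const next = Object.prototype.hasOwnProperty.call(transitions, action)
    ? transitions[action]
    : undefined;
  if (next === undefined) {
    throw new Error(`Action "${action}" is not allowed in state "${state}"`);
  }
  return next;
}
```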

Expose the workflow as MCP tools — one to read current state, one per allowed action. Include next_allowed_actions in the read response so the agent knows exactly which tools to call next without reasoning about the state machine definition itself:

// get_order_workflow response shape
{
  workflow_id: "uuid",
  current_state: "payment_authorized",
  next_allowed_actions: ["capture_order_payment", "cancel_order"],
  summary: "Payment authorized for $149.99. Ready to capture."
}

Detect stuck workflows with a 5-minute AliveMCP check on /health/workflows that counts workflows sitting in a non-terminal state with no transition for more than one hour. A workflow stuck that long usually indicates a stalled agent, a swallowed error, or a missing human-approval step. See state machine patterns for optimistic locking implementation, XState integration for complex branching workflows, and the idempotency pattern for side effects (Stripe charges, email sends) that may be retried when a step is re-entered after partial failure.
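The stuck-workflow rule can be kept as a pure function so it is testable apart from HTTP and SQL. A sketch, assuming `fulfilled` and `cancelled` are the only terminal states and that `updated_at` is bumped on every transition; in production the same predicate would run as the count query shown in the comment.

```typescript
// Terminal states are an assumption for the order workflow; adjust per workflow_type.
const TERMINAL_STATES = new Set(['fulfilled', 'cancelled']);
const IDLE_LIMIT_MS = 60 * 60 * 1000; // one hour, as in the text

// Equivalent count query against the workflows table:
//   SELECT count(*)::int AS stuck FROM workflows
//   WHERE state <> ALL($1::text[]) AND updated_at < NOW() - INTERVAL '1 hour';

interface WorkflowRow {
  id: string;
  state: string;
  updated_at: Date; // must be bumped on every transition for this check to work
}

interface WorkflowHealth {
  status: 'ok' | 'degraded';
  stuck_workflows: number;
  sample_ids: string[]; // a few IDs to start an investigation from
}

function workflowHealth(rows: WorkflowRow[], now: Date): WorkflowHealth {
  const stuck = rows.filter(
    (r) => !TERMINAL_STATES.has(r.state) && now.getTime() - r.updated_at.getTime() > IDLE_LIMIT_MS,
  );
  return {
    status: stuck.length === 0 ? 'ok' : 'degraded',
    stuck_workflows: stuck.length,
    sample_ids: stuck.slice(0, 5).map((r) => r.id),
  };
}
```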

Pattern 4: Human-in-the-loop — enforce approval gates server-side

MCP tool annotations (destructiveHint: true) signal intent to the client; a client such as Claude Desktop can use them to show a confirmation dialog before calling your tool. This is the right first layer. The gap: annotations are hints, and annotation-based approval lives entirely in the client. An agent that calls the tool programmatically, a client that doesn't implement confirmation dialogs, or a prompt that instructs the agent to skip confirmation cannot be stopped by an annotation. Server-side approval gates enforce at the tool handler boundary regardless of client behavior.

The approval middleware intercepts calls before execution and either proceeds (low-risk, or prior approval confirmed), returns a pending state (medium-risk, no approval yet), or denies outright (high-risk, no policy exception):

// Destructive tool handler with server-side approval.
// requireApproval and customerDb are application helpers defined elsewhere.
server.tool('delete_customer_records', {
  customer_ids: z.array(z.string()).min(1).max(1000),
  __approval_id: z.string().optional()
}, async ({ customer_ids, __approval_id }, extra) => {
  // requireApproval must confirm the approval covers these exact customer_ids,
  // so an approval ID cannot be replayed against a different set of records
  const approval = await requireApproval(
    'delete_customer_records',
    { customer_ids, __approval_id },
    String(extra.requestId ?? 'unknown')
  );

  if (!approval.proceed) {
    return { content: [{ type: 'text', text: JSON.stringify({
      status: 'pending_approval',
      approval_id: approval.approvalId,
      message: `Approval requested. Call check_approval_status with this ID, ` +
               `then re-call this tool with __approval_id when approved.`
    })}] };
  }

  await customerDb.deleteMany(customer_ids);
  return { content: [{ type: 'text', text: `Deleted ${customer_ids.length} records.` }] };
});

Approval state lives in a Postgres table with an expires_at column. The companion check_approval_status tool lets the agent poll without re-triggering the destructive action. Human approvers are notified via Slack interactive messages with Approve/Deny buttons that write to the same table.
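The check behind requireApproval can be sketched as a pure function over the approval row. The row shape, the `fingerprintIds` helper, and the reason codes are hypothetical; binding the approval to a fingerprint of the exact arguments keeps an approval for three records from authorizing three hundred.

```typescript
// Hypothetical row shape for the approvals table.
interface ApprovalRecord {
  id: string;
  tool_name: string;
  params_fingerprint: string; // canonical form of the arguments the human approved
  status: 'pending' | 'approved' | 'denied';
  expires_at: Date;
}

type ApprovalCheck =
  | { proceed: true }
  | { proceed: false; reason: 'unknown' | 'mismatch' | 'denied' | 'expired' | 'pending' };

// Sort IDs so argument order does not change the fingerprint.
function fingerprintIds(ids: string[]): string {
  return JSON.stringify([...ids].sort());
}

function checkApproval(
  record: ApprovalRecord | undefined,
  toolName: string,
  fingerprint: string,
  now: Date,
): ApprovalCheck {
  if (!record) return { proceed: false, reason: 'unknown' };
  // Bind the approval to the exact call it was granted for
  if (record.tool_name !== toolName || record.params_fingerprint !== fingerprint) {
    return { proceed: false, reason: 'mismatch' };
  }
  if (record.status === 'denied') return { proceed: false, reason: 'denied' };
  if (now.getTime() >= record.expires_at.getTime()) return { proceed: false, reason: 'expired' };
  if (record.status === 'pending') return { proceed: false, reason: 'pending' };
  return { proceed: true };
}
```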

The approval service itself introduces a new silent failure mode: when it goes down, every gated action stalls without alerting. The agent calls the destructive tool, gets pending_approval, polls check_approval_status, and waits, while the Slack notification was never delivered because the Slack client failed, so no one knows an action is waiting. Wire AliveMCP to a /health/approvals endpoint that checks approvals-table reachability, Slack client connectivity, and whether stale pending approvals are piling up:

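// db is a node-postgres Pool; testSlackConnection is an application helper
// (for example, a wrapper around Slack's auth.test Web API method)
app.get('/health/approvals', async (req, res) => {
  try {
    // Cast count(*) to int: node-postgres returns bigint columns as strings
    const { rows } = await db.query(
      `SELECT count(*)::int AS stale FROM pending_approvals
       WHERE status = 'pending' AND created_at < NOW() - INTERVAL '1 hour'`
    );
    const staleCount = rows[0].stale;
    const slackHealthy = await testSlackConnection();
    const status = slackHealthy && staleCount < 10 ? 'ok' : 'degraded';
    res.status(status === 'ok' ? 200 : 503).json({
      status,
      pending_stale: staleCount,
      slack_reachable: slackHealthy
    });
  } catch (err) {
    // A failed query means the approvals table is unreachable
    res.status(503).json({ status: 'degraded', error: String(err) });
  }
});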

Design the risk classifier explicitly — a lookup table mapping tool names and parameter shapes to Low/Medium/High risk tiers, with escalation rules for bulk operations (e.g., delete of more than 100 records escalates to High regardless of base tier). See human-in-the-loop patterns for the complete approval database schema, rollback obligations for each action type (soft-delete for database deletes, draft-before-send for emails, Stripe idempotency keys for billing), and how to handle approvals for long-running jobs that may span multiple agent sessions.
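A lookup-table classifier with the bulk-delete escalation rule might look like this. Tool names, base tiers, and the fail-closed default for unknown tools are assumptions; `decide` maps a tier to the three outcomes described for the approval middleware, and routing a high-risk call with a policy exception through approval (rather than straight to execution) is also an assumption.

```typescript
type RiskTier = 'low' | 'medium' | 'high';
type Decision = 'proceed' | 'pending_approval' | 'deny';

// Hypothetical base tiers; tools missing from the table default to high (fail closed).
const BASE_TIER: Record<string, RiskTier> = {
  customer_search: 'low',
  customer_get: 'low',
  customer_update: 'medium',
  delete_customer_records: 'medium',
};

// Escalation rule from the text: deletes of more than 100 records become high risk.
const BULK_ESCALATION: Record<string, { param: string; maxItems: number }> = {
  delete_customer_records: { param: 'customer_ids', maxItems: 100 },
};

const hasOwn = (obj: object, key: string): boolean => Object.prototype.hasOwnProperty.call(obj, key);

function classifyRisk(toolName: string, params: Record<string, unknown>): RiskTier {
  const base: RiskTier = hasOwn(BASE_TIER, toolName) ? BASE_TIER[toolName] : 'high';
  if (hasOwn(BULK_ESCALATION, toolName)) {
    const { param, maxItems } = BULK_ESCALATION[toolName];
    const value = params[param];
    if (Array.isArray(value) && value.length > maxItems) return 'high';
  }
  return base;
}

function decide(tier: RiskTier, hasValidApproval: boolean, hasPolicyException: boolean): Decision {
  if (tier === 'low') return 'proceed';
  if (tier === 'high' && !hasPolicyException) return 'deny';
  return hasValidApproval ? 'proceed' : 'pending_approval';
}
```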

Pattern 5: Guardrails — defend both trust boundaries

MCP servers sit at the intersection of two trust boundaries that most API servers don't face simultaneously. The LLM calling your tools can be prompted (by adversarial user input in the conversation) to supply injected instructions in tool arguments. The external data your tools fetch (web pages, customer notes, documents) may contain embedded LLM instructions placed by an adversary specifically to hijack agent behavior when the data is returned as a tool result. Guardrails enforce safety constraints at both boundaries.

The middleware pattern is the critical implementation choice. Applying guardrails in each handler produces inconsistent coverage — the handler added at 2am won't have the same injection checks as handlers written carefully. A wrapper ensures every tool goes through the same pipeline:

export function withGuardrails(toolName, schema, options, handler) {
  return async (params, context) => {
    // Input: check all string parameters for injection patterns
    if (options.checkInjection !== false) {
      const check = checkAllStringParams(params, toolName);
      if (check.detected) {
        await logGuardrailRejection(toolName, 'prompt_injection', params, context);
        return errorResult('Input rejected: potentially adversarial content detected.');
      }
    }

    const result = await handler(params, context);
    let output = getOutputText(result);

    // Output: truncate, scrub PII, remove injected instructions
    if (options.maxOutputBytes) output = truncateToBytes(output, options.maxOutputBytes);
    if (options.scrubPIIFromOutput) output = scrubPII(output);
    output = removeInjectedInstructions(output); // always

    return replaceOutputText(result, output);
  };
}
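One possible shape for the `checkAllStringParams` helper the wrapper calls: a weighted pattern score compared against a per-tool threshold, in line with the scored approach discussed in the FAQ. The patterns, weights, and thresholds are illustrative assumptions, not a vetted ruleset.

```typescript
// Illustrative patterns and weights; a single weak token never rejects on its own.
const INJECTION_PATTERNS: { pattern: RegExp; weight: number }[] = [
  { pattern: /ignore (all |any )?(previous|prior|above) instructions/i, weight: 3 },
  { pattern: /you are now\b/i, weight: 2 },
  { pattern: /system prompt/i, weight: 2 },
  { pattern: /<\/?(system|assistant)>/i, weight: 2 },
  { pattern: /\b(disregard|override)\b/i, weight: 1 },
  { pattern: /\b(always|never)\b.*\b(call|use|run)\b/i, weight: 1 },
];

const DEFAULT_THRESHOLD = 3;
// Per-tool overrides: name-like fields demand more signals before rejecting.
const TOOL_THRESHOLDS: Record<string, number> = { customer_search: 4 };

function scoreText(text: string): number {
  return INJECTION_PATTERNS.reduce((sum, p) => sum + (p.pattern.test(text) ? p.weight : 0), 0);
}

// Walk nested objects and arrays so injected text cannot hide in a sub-field.
function collectStrings(value: unknown, out: string[] = []): string[] {
  if (typeof value === 'string') {
    out.push(value);
  } else if (Array.isArray(value)) {
    for (const v of value) collectStrings(v, out);
  } else if (value !== null && typeof value === 'object') {
    for (const v of Object.values(value)) collectStrings(v, out);
  }
  return out;
}

function checkAllStringParams(params: Record<string, unknown>, toolName: string) {
  const threshold = Object.prototype.hasOwnProperty.call(TOOL_THRESHOLDS, toolName)
    ? TOOL_THRESHOLDS[toolName]
    : DEFAULT_THRESHOLD;
  // Score each string separately and act on the most suspicious one
  const score = Math.max(0, ...collectStrings(params).map(scoreText));
  return { detected: score >= threshold, score, threshold };
}
```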

Four guardrail types address different threat classes:

Type	Applied when	Implementation
Schema validation	Before handler — catches wrong types, out-of-range values	Zod (MCP SDK built-in, zero additional cost)
Semantic guardrails	Before handler — catches prompt injection in string arguments	Pattern matching on INJECTION_PATTERNS list + token score
Structural guardrails	Before handler — catches SSRF in URL params, path traversal in file paths	DNS resolution + private-range blocklist; `path.resolve` prefix check
Output guardrails	After handler — catches PII in results, injected instructions in fetched content	PII regex scrubbing; instruction-pattern removal from third-party content
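The structural checks in the table can be sketched with Node's built-in `path` and `dns` modules. The helper names are hypothetical. The URL check validates every address the hostname resolves to; a complete defense also pins that validated address for the actual request, because a second DNS lookup at fetch time can return a different answer (DNS rebinding).

```typescript
import * as path from 'node:path';
import * as dns from 'node:dns';

// Path traversal guard: resolve against a fixed root and require the result to stay inside it.
function resolveInsideRoot(root: string, userPath: string): string {
  const base = path.resolve(root);
  const target = path.resolve(base, userPath);
  // Compare with base + separator so "/srv/data-evil" does not pass for root "/srv/data"
  if (target !== base && !target.startsWith(base + path.sep)) {
    throw new Error(`Path escapes root: ${userPath}`);
  }
  return target;
}

// True for "this network", private, loopback, link-local, CGNAT and malformed IPv4 addresses.
function isBlockedIPv4(ip: string): boolean {
  const parts = ip.split('.').map(Number);
  if (parts.length !== 4 || parts.some((n) => !Number.isInteger(n) || n < 0 || n > 255)) {
    return true; // fail closed on anything that does not parse
  }
  const [a, b] = parts;
  return (
    a === 0 || a === 10 || a === 127 ||
    (a === 169 && b === 254) || // link-local, includes cloud metadata at 169.254.169.254
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168) ||
    (a === 100 && b >= 64 && b <= 127) // carrier-grade NAT
  );
}

function isBlockedAddress(address: string, family: number): boolean {
  if (family === 4) return isBlockedIPv4(address);
  const ip = address.toLowerCase();
  if (ip.startsWith('::ffff:')) return isBlockedIPv4(ip.slice(7)); // IPv4-mapped IPv6
  // Loopback, unspecified, unique-local fc00::/7, link-local fe80::/10
  return ip === '::1' || ip === '::' || /^f[cd]/.test(ip) || /^fe[89ab]/.test(ip);
}

// SSRF guard: only http(s), and every resolved address must be public.
async function assertPublicUrl(raw: string): Promise<URL> {
  const url = new URL(raw);
  if (url.protocol !== 'http:' && url.protocol !== 'https:') {
    throw new Error(`Blocked scheme: ${url.protocol}`);
  }
  const host = url.hostname.replace(/^\[|\]$/g, ''); // strip IPv6 literal brackets
  const addresses = await dns.promises.lookup(host, { all: true });
  if (addresses.length === 0 || addresses.some((a) => isBlockedAddress(a.address, a.family))) {
    throw new Error(`Blocked non-public address for host: ${host}`);
  }
  return url;
}
```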

A key design point: return guardrail rejections as valid MCP tool results with isError: true, not as HTTP or protocol errors. This keeps security events out of the availability signal: AliveMCP sees a healthy server that responded correctly, not an error-rate spike. Log security events separately to an audit table. The alert you actually want is a rejection rate above baseline, such as a spike from a single session:

// Distinguish security events from operational errors in monitoring;
// requestId is the current request's ID, taken from the handler context
return {
  content: [{ type: 'text', text: JSON.stringify({
    error: 'guardrail_rejection',
    code: 'PROMPT_INJECTION_DETECTED',
    message: 'Input rejected.',
    support_ref: requestId
  })}],
  isError: true
};

Apply guardrails selectively. Schema validation: always. Semantic injection detection: on tools that accept free-text strings derived from user input. SSRF prevention: on any tool that accepts a URL parameter. PII scrubbing: on tools that fetch external or user-generated content. Skip output scrubbing on tools that return your own structured data where you need the actual values for downstream tool calls. See guardrails patterns for the full SSRF DNS-resolution guard, the PII scrubbing regex library, the instruction-injection remover for fetched content, and the test harness for validating guardrail coverage in CI.
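For tools that do need output scrubbing, a minimal `scrubPII` might look like the following. The patterns cover only email addresses, US Social Security numbers, and payment-card numbers and are illustrative; the Luhn checksum keeps most long numeric IDs that are not card numbers from being redacted.

```typescript
// Illustrative patterns only; production rules need locale-specific formats and testing.
const EMAIL = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
const US_SSN = /\b\d{3}-\d{2}-\d{4}\b/g;
// 13 to 16 digits, optionally separated by spaces or dashes, ending on a digit
const CARD_CANDIDATE = /\b(?:\d[ -]?){12,15}\d\b/g;

// Luhn checksum: filters out most digit runs that cannot be card numbers.
function luhnValid(digits: string): boolean {
  let sum = 0;
  let double = false;
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48;
    if (double) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    double = !double;
  }
  return sum % 10 === 0;
}

function scrubPII(text: string): string {
  return text
    .replace(EMAIL, '[REDACTED_EMAIL]')
    .replace(US_SSN, '[REDACTED_SSN]')
    .replace(CARD_CANDIDATE, (match) =>
      luhnValid(match.replace(/[ -]/g, '')) ? '[REDACTED_CARD]' : match,
    );
}
```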

The unified monitoring architecture

Each pattern produces a distinct health signal. Together, the base /health probe and the four pattern-specific endpoints give AliveMCP a complete picture of agentic server health, one that covers failure classes the protocol probe cannot reach:

Endpoint	Check interval	Signals
`/health`	60s	Process alive, protocol responding, DB reachable
`/health/jobs`	60s	Worker count > 0; no stuck active jobs; waiting queue not growing
`/health/workflows`	5m	No non-terminal workflows idle >1 hour
`/health/approvals`	5m	Slack reachable; no stale pending approvals >1 hour
`/health/security`	5m	Guardrail rejection rate within baseline ± 2 standard deviations
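The baseline rule for /health/security reduces to a mean and a sample standard deviation over past rejection rates. A sketch, assuming `history` holds rejection rates (rejections divided by tool calls) for comparable past windows pulled from the audit table; the function and field names are hypothetical.

```typescript
interface SecurityHealth {
  status: 'ok' | 'degraded';
  current: number;
  lower: number;
  upper: number;
}

// "Baseline ± k standard deviations", with k = 2 as in the table.
function securityHealth(history: number[], current: number, k = 2): SecurityHealth {
  if (history.length < 2) {
    // Not enough data for a baseline yet: report ok rather than alert on noise
    return { status: 'ok', current, lower: 0, upper: Infinity };
  }
  const mean = history.reduce((s, x) => s + x, 0) / history.length;
  const variance = history.reduce((s, x) => s + (x - mean) ** 2, 0) / (history.length - 1);
  const sd = Math.sqrt(variance);
  const upper = mean + k * sd;
  const lower = Math.max(0, mean - k * sd);
  // Above the band suggests an injection campaign; below it can mean the
  // guardrail middleware stopped running on some tools.
  const status = current > upper || current < lower ? 'degraded' : 'ok';
  return { status, current, lower, upper };
}
```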

The protocol probe at /health catches process death, TLS expiry, and network failures. The four specialized endpoints catch the failure classes that look fine from the outside: a dead job processor, stuck orders, a silenced approval service, and an active injection campaign. None of these produce 5xx errors at the protocol layer. All of them produce wrong outcomes — jobs that never complete, orders that never fulfill, actions that never execute, agents that follow adversarial instructions.

The implementation order that makes sense in practice: tool discovery first (it fixes the highest-frequency agent-visible failure — wrong tool selection — before adding infrastructure complexity), then long-running tasks (if your tools need it — not all do), then state machines (for multi-step workflows), then human-in-the-loop gates (for any tool that's irreversible), then guardrails last (the middleware wrapper goes around everything, so it's easiest to add after the tool surface is stable). The monitoring endpoints can be added in any order — they're cheap to implement and valuable from day one.

Frequently asked questions

Which of the five patterns is most important for a simple MCP server that doesn't do long-running work or multi-step workflows?

Tool discovery and guardrails apply to every MCP server regardless of complexity. If you have even one tool that accepts a free-text string from user input, you have a prompt injection surface. If you have two or more tools with overlapping apparent purposes (e.g., customer_search and customer_get), you have a tool selection failure surface. Long-running tasks, state machines, and human-in-the-loop patterns are additive: they solve problems that appear only once your server runs slow jobs, multi-step workflows, or irreversible actions. Start with tool discovery (audit your names and descriptions against the patterns in this guide) and the guardrail middleware wrapper (apply it at registration for every tool) before building any async or workflow infrastructure.

Can state machines handle workflows that need human approval mid-way?

Yes. Model the approval checkpoint as a workflow state. Instead of payment_authorized → payment_captured, the transition becomes payment_authorized → awaiting_manager_approval → payment_captured. The workflow row persists in the awaiting_manager_approval state until the approver calls an approval tool or the approval expires. The agent polls get_order_workflow and sees the state unchanged; no job queue is needed for the wait. The human-in-the-loop pattern provides the notification layer (Slack interactive message) and the approval service health monitoring; the state machine pattern provides the persistence and transition enforcement. The stuck-workflow health check also catches approval checkpoints left unanswered for more than an hour, including ones whose approval expired without a follow-up transition, so the same monitoring endpoint covers both operational stalls and neglected approvals.

How do long-running tasks and state machines relate? Can I use one without the other?

They solve adjacent but distinct problems. Long-running tasks address operations that take too long for a synchronous tool call — the job queue holds the work and the agent polls for completion. State machines address multi-step workflows where each step has a side effect and the system must know which steps have already completed to avoid double-execution on retry. You can have long-running tasks without state machines (a single report generation job that just runs to completion without complex branching) and state machines without long-running tasks (a fast order workflow where each step completes in milliseconds). The combination is powerful for workflows where individual steps are themselves long-running: the state machine tracks which steps are complete, and each step uses the job queue pattern for the work that takes time. The state machine's context JSONB column is the right place to store job IDs for in-progress steps.

If I use the guardrail middleware to catch prompt injection, will it generate false positives on legitimate queries?

Yes, particularly for free-text fields where user content naturally contains words like "ignore," "always," or "only." The pattern-matching approach in the guardrails guide uses a scored threshold: a single injection token doesn't trigger rejection, but a combination of multiple instruction-like tokens plus a pattern match does. Tune thresholds per tool: a customer_name field warrants higher precision (require multiple signals) than a query field where injection risk is higher and false positives are more acceptable. Log every rejection to a review queue and check it weekly for false-positive patterns — the goal is to reduce false positives over time without opening the gap that genuine injection attempts exploit. For high-stakes tools where both false positives and false negatives are costly, route borderline inputs to a dedicated lightweight classifier rather than relying solely on regex patterns.