OpsGenie for MCP Servers — team alert routing, schedules, and escalation
Atlassian OpsGenie is the on-call management tool favored by teams already in the Jira and Confluence ecosystem. Its alert routing model differs from PagerDuty in one important way: OpsGenie routes alerts to teams, not individual services, which makes it better suited for organizations where different squads own different MCP servers. This guide covers the full integration: creating an OpsGenie API integration, forwarding AliveMCP webhooks with the correct payload shape, configuring team routing rules, setting on-call schedules, and using OpsGenie's alert alias for deduplication and auto-close.
TL;DR
In OpsGenie, create an API integration under your team, copy the integration API key, and deploy a bridge that POSTs to https://api.opsgenie.com/v2/alerts when AliveMCP sends alert.triggered, and POSTs to https://api.opsgenie.com/v2/alerts/{alias}/close?identifierType=alias on alert.resolved. Use alias: "alivemcp-{serverSlug}" for deduplication. Set priority: P1 for production MCP servers and P3 for third-party dependency monitoring. OpsGenie's Heartbeat feature can also serve as a dead-man switch: AliveMCP pings the heartbeat URL every 5 minutes, and OpsGenie fires an alert if the pings stop. This catches cases where AliveMCP itself has a connectivity issue reaching your private endpoint.
OpsGenie vs PagerDuty for MCP server teams
Both OpsGenie and PagerDuty handle on-call alerting well. The choice usually comes down to your existing tooling stack. Teams using Atlassian products (Jira, Confluence, Statuspage) choose OpsGenie because the integration with Jira Service Management is tighter and alert-to-ticket workflows are built in. Teams using a standalone ops stack choose PagerDuty for its deeper AIOps features. For MCP server monitoring specifically, both tools are equally capable.
| Feature | OpsGenie | PagerDuty |
|---|---|---|
| Routing model | Team-centric (alert → team → on-call) | Service-centric (alert → service → policy) |
| Jira integration | Native, bi-directional | Plugin-based |
| Free tier | Up to 5 users, basic features | Up to 5 users, basic features |
| Alert deduplication | Via alias field | Via dedup_key field |
| Auto-close on resolve | Yes (close action via API) | Yes (resolve action via Events API) |
| Heartbeat monitoring | Built-in (dead-man switch) | Via separate service |
Creating an OpsGenie API integration
Navigate to Settings → Integrations → Add integration in OpsGenie. Select API from the integration list. On the integration configuration screen:
- Set a name: "AliveMCP Bridge".
- Assign it to the team responsible for MCP server on-call (create the team first if it doesn't exist).
- Copy the API key shown — this is the credential your bridge uses to authenticate requests.
- Leave "Responders" and "Override priority" fields empty — your bridge will set these dynamically based on the server type.
If you are monitoring multiple MCP servers owned by different teams, create a separate API integration per team, each with its own key. Your bridge selects the key based on the server_owner_team field in the AliveMCP webhook payload.
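The per-team key selection described above could be sketched as follows. The team names and environment variable names are hypothetical, and the fallback to a single OPSGENIE_API_KEY is an assumption rather than anything AliveMCP or OpsGenie prescribes.

```javascript
// Hypothetical mapping from the webhook's server_owner_team value to the
// environment variable that holds that team's OpsGenie integration key.
const TEAM_KEY_ENV = new Map([
  ['mcp-platform', 'OPSGENIE_KEY_MCP_PLATFORM'],
  ['mcp-author', 'OPSGENIE_KEY_MCP_AUTHOR'],
]);

// Returns the integration key for the team that owns the server, falling
// back to a default key when the team is missing or unmapped.
function selectApiKey(event, env = process.env) {
  const envName = TEAM_KEY_ENV.get(event.server_owner_team);
  const key = (envName && env[envName]) || env.OPSGENIE_API_KEY;
  if (!key) {
    throw new Error(`No OpsGenie key configured for team "${event.server_owner_team}"`);
  }
  return key;
}
```

Keeping the mapping in one table means adding a team is a one-line change plus a new environment variable, with no key material in the source.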
Bridge code: AliveMCP webhooks to OpsGenie alerts
The bridge is a small HTTP handler that transforms AliveMCP's webhook shape into OpsGenie's /v2/alerts API format.
// alivemcp-opsgenie-bridge.js
// Requires Node 18+ for the global fetch API. Assumes an Express-style
// handler where req.body is the already-parsed JSON webhook payload.
const OPSGENIE_API_KEY = process.env.OPSGENIE_API_KEY;
const OPSGENIE_BASE = 'https://api.opsgenie.com/v2/alerts';

async function handleAliveMCPWebhook(req, res) {
  const event = req.body;
  const { type, server_slug, server_name, failure_reason, check_url, server_tier } = event;
  const alias = `alivemcp-${server_slug}`; // dedup key
  const headers = {
    'Content-Type': 'application/json',
    'Authorization': `GenieKey ${OPSGENIE_API_KEY}`,
  };
  let response; // stays undefined for event types the bridge ignores

  if (type === 'alert.triggered') {
    // Map the AliveMCP server tier to an OpsGenie priority.
    const priority = server_tier === 'internal' ? 'P1' :
                     server_tier === 'author' ? 'P2' : 'P3';
    response = await fetch(OPSGENIE_BASE, {
      method: 'POST',
      headers,
      body: JSON.stringify({
        message: `MCP server down: ${server_name}`,
        alias, // deduplication: repeat alerts update the open alert instead of creating a new one
        description: [
          `Failure: ${failure_reason}`,
          `Endpoint: ${check_url}`,
          `Dashboard: https://alivemcp.com/status/${server_slug}`,
        ].join('\n'),
        responders: [{ type: 'team', name: 'mcp-oncall' }],
        tags: ['mcp-server', 'alivemcp', server_tier ?? 'unknown'],
        priority,
        details: { server_slug, failure_reason, check_url },
        source: 'AliveMCP',
      }),
    });
  }

  if (type === 'alert.resolved') {
    // OpsGenie close: POST to /v2/alerts/{identifier}/close
    response = await fetch(`${OPSGENIE_BASE}/${encodeURIComponent(alias)}/close?identifierType=alias`, {
      method: 'POST',
      headers,
      body: JSON.stringify({
        note: `MCP server ${server_name} recovered. AliveMCP confirmed endpoint is healthy.`,
        source: 'AliveMCP',
      }),
    });
  }

  if (response && !response.ok) {
    // Report OpsGenie failures to the caller instead of claiming success.
    return res.status(502).json({ ok: false, status: response.status });
  }
  res.status(200).json({ ok: true });
}
Note the identifierType=alias query parameter on the close call. OpsGenie identifies alerts by either their internal ID or the alias you set when creating the alert. Always use the alias as the identifier: OpsGenie processes alert creation asynchronously and the create response carries a request ID rather than the alert ID, so the bridge does not have the internal ID on hand when AliveMCP fires the resolved event.
On-call schedule configuration for MCP server teams
OpsGenie's schedule editor is more visual than PagerDuty's. For a small MCP server team:
- Create a schedule named "MCP Server On-Call" under your team.
- Add a weekly rotation with all on-call eligible team members. OpsGenie rotates weekly by default starting Monday 09:00 in your team's timezone.
- Add restrictions if you want daytime-only coverage for non-critical servers: "Only during business hours, Monday–Friday 09:00–18:00". Outside these hours, alerts are held until the next coverage window unless they are P1/P2 (which bypass restrictions).
- Add a "follow-the-sun" schedule if your team spans time zones: create three rotation sub-schedules (Americas, EMEA, APAC), each covering an 8-hour block, with team members in each region.
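To make the follow-the-sun layout in the last bullet concrete, here is a minimal sketch of how three 8-hour blocks partition the day. The UTC boundaries below are hypothetical placeholders, and in practice OpsGenie's schedule editor does this work for you; the function only illustrates the coverage logic.

```javascript
// Hypothetical 8-hour coverage blocks in UTC; adjust to your regions' working hours.
const COVERAGE_BLOCKS = [
  { region: 'APAC', startHour: 0, endHour: 8 },
  { region: 'EMEA', startHour: 8, endHour: 16 },
  { region: 'Americas', startHour: 16, endHour: 24 },
];

// Returns the region whose block contains the given instant.
// The blocks cover 0-24 without gaps, so a match always exists.
function regionOnCall(date = new Date()) {
  const hour = date.getUTCHours();
  const block = COVERAGE_BLOCKS.find((b) => hour >= b.startHour && hour < b.endHour);
  return block.region;
}
```

Because each block's end hour equals the next block's start hour, every instant belongs to exactly one region and handoffs never leave a gap.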
For solo indie MCP authors on OpsGenie's free tier: configure a single schedule with yourself on rotation. Set your notification rules to send push notification immediately for P1, wait 3 minutes then call for P2, and email only for P3. This ensures production MCP servers wake you up but dependency monitoring doesn't.
OpsGenie Heartbeat as a dead-man switch
OpsGenie Heartbeats are an underused feature that complements AliveMCP monitoring. A heartbeat expects a ping at a regular interval; if the ping stops arriving, OpsGenie fires an alert. For MCP server monitoring, you can use heartbeats in two ways:
AliveMCP heartbeat: Configure AliveMCP to send a periodic ping to your OpsGenie Heartbeat URL every 5 minutes. If AliveMCP has a connectivity issue that prevents it from checking your private endpoint, the heartbeat ping also stops, and OpsGenie pages you that AliveMCP has lost contact with your server. This is particularly useful for Team tier users monitoring private MCP endpoints that are not publicly accessible — the heartbeat tells you when AliveMCP's network path to your VPN or private network is broken, not just when the server itself is down.
# OpsGenie heartbeat ping: run every 5 minutes from AliveMCP's check runner
curl -X GET "https://api.opsgenie.com/v2/heartbeats/alivemcp-bridge/ping" \
  -H "Authorization: GenieKey ${OPSGENIE_API_KEY}"
Your MCP server's own heartbeat: For MCP servers you control, have the server itself ping an OpsGenie heartbeat URL every minute from its health check loop. If the process hangs or is killed without triggering a clean shutdown, the heartbeat stops and OpsGenie fires an alert independently of AliveMCP — giving you two independent detection paths for the same server.
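A server-side heartbeat loop along these lines might look like the sketch below. The function name, the option names, and the injectable fetchImpl parameter are illustrative choices, not part of any OpsGenie SDK; the ping URL follows the same /v2/heartbeats/{name}/ping shape as the curl example above.

```javascript
// Pings an OpsGenie heartbeat on a fixed interval and returns a stop function.
// fetchImpl is injectable (an assumption for testability); it defaults to the
// global fetch available in Node 18+.
function startHeartbeat({ heartbeatName, apiKey, intervalMs = 60_000, fetchImpl = fetch }) {
  const url = `https://api.opsgenie.com/v2/heartbeats/${encodeURIComponent(heartbeatName)}/ping`;

  async function ping() {
    try {
      const res = await fetchImpl(url, {
        method: 'GET',
        headers: { Authorization: `GenieKey ${apiKey}` },
      });
      if (!res.ok) console.error(`Heartbeat ping failed: HTTP ${res.status}`);
    } catch (err) {
      // A failed ping must never crash the server; OpsGenie alerts on the gap.
      console.error(`Heartbeat ping error: ${err.message}`);
    }
  }

  ping(); // send one ping immediately rather than waiting a full interval
  const timer = setInterval(ping, intervalMs);
  timer.unref(); // do not keep the process alive just for the heartbeat
  return () => clearInterval(timer);
}
```

Call startHeartbeat once from the server's startup path and invoke the returned function during a clean shutdown. Set the heartbeat's expected interval in OpsGenie somewhat longer than intervalMs so a single slow ping does not page anyone.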
Alert routing rules for mixed server portfolios
Teams monitoring a mix of internal MCP servers and third-party dependencies need different alert handling for each category. OpsGenie's alert policies let you route and transform alerts based on payload content before they reach the on-call team.
Create an alert policy for each category:
- Internal servers (P1): Route to the "mcp-oncall" team, send push + call immediately, escalate to the engineering manager after 10 minutes without acknowledgment.
- Author-claimed servers (P2): Route to the "mcp-author" team (the server's author), push notification immediately, call after 5 minutes. No manager escalation.
- Third-party dependencies (P3): Route to a Jira service ticket, no push notification. Create a daily digest alert report in Jira for morning review. Only upgrade to P1 if the same third-party server has been down for more than 30 minutes — at that point it may be affecting production.
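The three categories above, including the 30-minute upgrade for third-party dependencies, can be sketched as one function. The downSinceMs argument is an assumption: the bridge would need to record when it first saw alert.triggered for a server, since nothing in this guide says the webhook carries that timestamp. How the higher priority is applied to an already-open alert is left out of the sketch.

```javascript
const THIRTY_MINUTES_MS = 30 * 60 * 1000;

// Maps a server tier to an OpsGenie priority. Third-party servers start at P3
// and are upgraded to P1 once they have been down for more than 30 minutes.
// downSinceMs is a hypothetical timestamp the bridge tracks itself.
function priorityFor(serverTier, downSinceMs, nowMs = Date.now()) {
  if (serverTier === 'internal') return 'P1';
  if (serverTier === 'author') return 'P2';
  return nowMs - downSinceMs > THIRTY_MINUTES_MS ? 'P1' : 'P3';
}
```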
The tag-based routing in OpsGenie policies makes this clean: your bridge sets the tags array based on the AliveMCP event fields, and the policies match on tags rather than requiring the bridge to know which team to route to.
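A tag builder for the bridge might look like the following sketch. The first three tags match the bridge code earlier in this guide; the team: prefix on the fourth is a hypothetical convention, useful only if your policies match on it.

```javascript
// Derives OpsGenie tags from AliveMCP event fields so alert policies can
// match on tags instead of the bridge hardcoding routing decisions.
function buildTags(event) {
  const tags = ['mcp-server', 'alivemcp', event.server_tier ?? 'unknown'];
  if (event.server_owner_team) {
    tags.push(`team:${event.server_owner_team}`); // hypothetical tag convention
  }
  return tags;
}
```

With tags carrying the tier and owning team, changing who handles a category becomes a policy edit in OpsGenie rather than a bridge redeploy.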
Frequently asked questions
What is the alias field and why does it matter for MCP server monitoring?
The alias field is OpsGenie's deduplication key — equivalent to PagerDuty's dedup_key. When you set alias: "alivemcp-my-server" on the first alert, and AliveMCP sends another alert.triggered event 1 minute later (because the server is still down), OpsGenie does not open a second alert. Instead it updates the existing alert's count and re-notifies if the previous notification was not acknowledged. This is critical for MCP server monitoring: AliveMCP checks every minute, and a server that has been down for 30 minutes will have sent 30 alert.triggered events. Without deduplication via alias, you would receive 30 separate alerts. The close action also uses the alias to find the right alert to close — without it, the auto-resolution mechanic does not work.
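The effect of alias-based deduplication can be illustrated with a small in-memory model. This is not OpsGenie's implementation, only a sketch of the behavior described above: repeated triggers with the same alias bump a counter on one open alert, and closing by alias clears it.

```javascript
// Illustrative in-memory model of alias deduplication (not OpsGenie's code).
class AlertStore {
  constructor() {
    this.open = new Map(); // alias -> open alert
  }

  // A trigger with a known alias increments the count; otherwise it opens an alert.
  trigger(alias, message) {
    const existing = this.open.get(alias);
    if (existing) {
      existing.count += 1;
      return existing;
    }
    const alert = { alias, message, count: 1 };
    this.open.set(alias, alert);
    return alert;
  }

  // Closing by alias removes the open alert; returns false if none was open.
  close(alias) {
    return this.open.delete(alias);
  }
}
```

Thirty consecutive trigger calls with alias "alivemcp-my-server" leave exactly one open alert with a count of 30, which is the difference between one page and thirty.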
How do I handle MCP servers monitored by multiple team members?
Set responders in the alert payload to a team (recommended) rather than a list of individuals: "responders": [{"type": "team", "name": "mcp-platform"}]. OpsGenie delivers the notification to whoever is currently on-call in that team's schedule. If you name individual users as responders instead of a team, every named responder is notified simultaneously, which creates alert noise even if one of them acknowledges quickly. The team approach is cleaner: the on-call schedule determines who receives the notification, and only the current on-call person is paged. Use responders with individual user names only for the escalation level (the person who receives the escalation after the first responder fails to acknowledge in time).
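A small helper can enforce the team-first preference when building the payload. The function name and its argument shape are hypothetical; the output uses the responder object forms from OpsGenie's alert payload (type "team" with a name, type "user" with a username).

```javascript
// Builds the responders array for an alert payload. A team always wins over
// individual users, so only the current on-call person is paged.
function buildResponders({ team, users = [] }) {
  if (team) return [{ type: 'team', name: team }];
  return users.map((username) => ({ type: 'user', username }));
}
```

For example, buildResponders({ team: 'mcp-platform', users: ['a@example.com'] }) yields only the team entry, which keeps accidental fan-out to individuals out of the payload.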
Can OpsGenie create Jira tickets automatically from MCP server alerts?
Yes. OpsGenie's Jira integration (available on Standard plan and above) can automatically create a Jira issue when an alert reaches a certain state. Configure it to create an issue when a P1 or P2 MCP server alert is acknowledged but not resolved within 15 minutes, which creates the paper trail for the post-incident review. Set the Jira issue type to "Incident" and populate fields from the OpsGenie alert details: server_slug becomes the ticket summary suffix, failure_reason becomes the description, and the AliveMCP dashboard link is added as a remote link. This way, every MCP server incident automatically has a Jira ticket to attach the postmortem notes to.
What priority should I use for third-party MCP dependency alerts?
P3 is appropriate for third-party MCP dependencies you depend on but don't control. What a P3 alert actually does depends on your notification rules; with the rules recommended in this guide it produces an email or a Jira ticket but no phone call, SMS, or push notification, so it is an awareness signal, not an emergency. Upgrade to P2 if the third-party server being down causes your own application to degrade (not just lose a feature). Upgrade to P1 if it is causing your production service to be completely unavailable. A useful heuristic: if an on-call engineer would take manual action at 2 AM, the alert is P1. If they would handle it during business hours, it is P2 or P3. Most third-party MCP dependencies fall into P3: you document them in your runbook, monitor the third party's status page, and wait for them to recover rather than intervening yourself.
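The upgrade rules above reduce to a lookup on the impact an outage has on your own service. The impact labels here are hypothetical names for the three cases in the paragraph; nothing in AliveMCP's payload is assumed to supply them, so the bridge or a per-server config would have to.

```javascript
// Hypothetical impact labels for a third-party dependency outage, mapped to
// the priorities described above.
const IMPACT_PRIORITY = new Map([
  ['feature-loss', 'P3'], // a feature is missing, the application still works
  ['degraded', 'P2'],     // your own application is degraded
  ['outage', 'P1'],       // your production service is completely unavailable
]);

// Unknown or missing impact defaults to P3, the usual case for dependencies.
function dependencyPriority(impact) {
  return IMPACT_PRIORITY.get(impact) ?? 'P3';
}
```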
How does OpsGenie compare to Alertmanager for MCP server alert routing?
Prometheus Alertmanager handles alert routing for infrastructure that already runs Prometheus metrics collection — if you are already collecting MCP server metrics with Prometheus (error rate, response time, active connections), Alertmanager is a natural choice because it receives firing alerts directly from Prometheus without an additional bridge. OpsGenie is better when the alert source is external monitoring (AliveMCP) rather than internal metrics, and when you need on-call rotation management, mobile push notifications, and schedule management — features Alertmanager does not have. Many teams use both: Alertmanager routes infrastructure metrics alerts (CPU, memory, disk) to OpsGenie, and AliveMCP webhooks route protocol-level MCP server health alerts to OpsGenie. OpsGenie deduplicates across both sources using the same alias scheme. See the Prometheus MCP monitoring guide for how to combine both approaches.
Further reading
- PagerDuty for MCP Servers — routing alerts to on-call
- MCP Server Alert Routing Architecture — multi-channel and escalation design
- Discord Alerts for MCP Servers — webhook routing for dev teams
- MCP Server Slack Alerts — channel routing and message formatting
- Prometheus MCP Monitoring — scraping metrics from MCP servers
- MCP Server SLAs — defining and tracking uptime commitments