MCP Server Helm Charts — packaging, versioning, and verified Kubernetes deployments

A Helm chart turns your MCP server's Kubernetes manifests into a versioned, parameterized release you can deploy, upgrade, roll back, and test with a single command. This guide walks through the full chart structure — from Chart.yaml to the post-deploy MCP protocol test hook — along with the SSE-specific ingress annotations, HPA tuning, and multi-environment values patterns your MCP server needs to run reliably in production.

TL;DR

A Helm chart packages your MCP server's Kubernetes manifests (Deployment, Service, Ingress, HPA, ConfigMap, Secret) into a versioned, configurable release. helm upgrade --install deploys your MCP server idempotently: it installs on the first run and upgrades on every subsequent run. Include a Helm test hook that sends a real initialize JSON-RPC request, and run helm test after every deployment to verify the MCP protocol handshake end-to-end. For SSE transport, annotate your Ingress with nginx.ingress.kubernetes.io/proxy-buffering: "off" and a long proxy-read-timeout; without these, the proxy buffers or kills long-lived SSE connections. Set terminationGracePeriodSeconds: 60 on the Deployment so running SSE sessions can drain before the pod is killed, and make sure the server actually drains on SIGTERM. Finally, add the MCP server URL to AliveMCP for continuous external monitoring: the Helm test hook verifies the protocol at deploy time, but AliveMCP catches the failures that happen between deployments.

Helm chart structure for an MCP server

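Helm organizes everything into a chart directory. For an MCP server, the complete structure looks like this (templates/_helpers.tpl, which helm create mcp-server generates, defines the mcp-server.fullname, mcp-server.labels, and mcp-server.selectorLabels helpers that every template below includes):

mcp-server/
├── Chart.yaml
├── values.yaml
├── templates/
│   ├── _helpers.tpl
│   ├── deployment.yaml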
│   ├── service.yaml
│   ├── ingress.yaml
│   ├── hpa.yaml
│   ├── configmap.yaml
│   ├── secret.yaml
│   ├── pdb.yaml
│   └── tests/
│       └── mcp-protocol-test.yaml
└── .helmignore

The Chart.yaml file declares the chart's identity. It distinguishes between the chart version (the version field, a SemVer 2 string you bump whenever the chart's templates or defaults change) and the app version (the appVersion field, which tracks the MCP server image tag). Keeping these separate means you can fix a chart template bug without pushing a new server image, and vice versa.

apiVersion: v2
name: mcp-server
description: A Helm chart for deploying an MCP server with SSE transport
type: application
version: 1.3.0
appVersion: "2.1.4"
keywords:
  - mcp
  - model-context-protocol
  - ai
  - llm
maintainers:
  - name: platform-team
    email: platform@example.com

The apiVersion: v2 declaration marks this as a chart that requires Helm 3; Helm 2 cannot install it. Helm 3 also dropped Tiller, the server-side component from Helm 2, so everything runs client-side with your kubeconfig credentials. If you are still on Helm 2, migrate: it has been end-of-life since November 2020 and no longer receives security fixes.

The .helmignore file works much like .gitignore: it lists patterns for files that helm package should leave out of the chart archive. At minimum, exclude .git/, CI configuration, editor and OS files, and any local override files that contain real secrets. A README.md is usually worth keeping, since it is what helm show readme prints for your chart.

values.yaml and the Deployment template

The values.yaml file is the public interface of your chart. Every value a deployer might need to customize should live here. Templates reference values with {{ .Values.* }} syntax. Here is a complete values.yaml for an MCP server:

# Image configuration
image:
  repository: ghcr.io/your-org/mcp-server
  tag: ""          # Defaults to Chart.yaml appVersion if empty
  pullPolicy: IfNotPresent

# Pod count
replicaCount: 2

# Service configuration
service:
  type: ClusterIP
  port: 8080

# Ingress configuration
ingress:
  enabled: true
  className: nginx
  host: mcp.example.com
  tls: true
  tlsSecret: mcp-tls-cert

# Resource requests and limits
resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 512Mi

# Horizontal pod autoscaling
autoscaling:
  enabled: false
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70
  targetMemoryUtilizationPercentage: 80

# Environment variables (non-secret)
env:
  LOG_LEVEL: info
  MCP_TRANSPORT: http
  MCP_PORT: "8080"

# Secret references — keys become env vars sourced from a K8s Secret
secrets:
  secretName: mcp-server-secrets
  keys:
    - API_KEY
    - DATABASE_URL
    - OPENAI_API_KEY

A few defaults deserve attention. Leaving image.tag empty and resolving it in the template to Chart.yaml's appVersion means that helm upgrade --install without an explicit --set image.tag=... always deploys the image version the chart was released with, which prevents accidental drift between chart and image. In production, still pin the tag explicitly with --set image.tag=2.1.4 or a values override file. latest is not an acceptable production tag, because it can point to a different image on every pull.
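The fallback is easy to reason about offline. The Python sketch below mirrors what the template expression {{ .Values.image.tag | default .Chart.AppVersion }} does for a string tag, plus a hypothetical CI guard (check_production_tag is an illustration, not a Helm feature) that refuses empty or latest tags before a production deploy.

```python
def resolve_image(repository: str, tag: str, app_version: str) -> str:
    """Mirror "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}".

    Sprig's default treats an empty string as unset, so "" falls back to appVersion.
    """
    return f"{repository}:{tag or app_version}"


def check_production_tag(tag: str) -> None:
    """Hypothetical pre-deploy guard: reject empty or floating tags in production."""
    if not tag:
        raise ValueError("production deploys must set image.tag explicitly")
    if tag == "latest":
        raise ValueError("'latest' is a floating tag; pin an immutable version")


if __name__ == "__main__":
    print(resolve_image("ghcr.io/your-org/mcp-server", "", "2.1.4"))
    check_production_tag("2.1.4")
```

To confirm what the chart really renders, helm template ./mcp-server --show-only templates/deployment.yaml prints the Deployment with the resolved image reference.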

The pullPolicy: IfNotPresent default is correct for production: if the image is already cached on the node, Kubernetes won't re-pull it, which saves time and avoids registry rate limits. During local development, you might temporarily set this to Always to force fresh pulls of a latest-tagged image, but switch back before merging.

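The Deployment template ties all of these values together. An MCP server has three requirements that generic templates often miss: terminationGracePeriodSeconds must be long enough for SSE sessions to drain; the readiness probe should hit an endpoint that reports whether the server can actually serve MCP requests (the /health path below is assumed to do that), not merely whether the HTTP listener is up; and secrets should be exposed as environment variables rather than files, because most MCP SDK configuration is environment-variable-driven. A plain HTTP probe still cannot perform the MCP handshake itself, which is why the chart also includes a Helm test hook.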

apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "mcp-server.fullname" . }}
  labels:
    {{- include "mcp-server.labels" . | nindent 4 }}
spec:
  {{- if not .Values.autoscaling.enabled }}
  replicas: {{ .Values.replicaCount }}
  {{- end }}
  selector:
    matchLabels:
      {{- include "mcp-server.selectorLabels" . | nindent 6 }}
  template:
    metadata:
      labels:
        {{- include "mcp-server.selectorLabels" . | nindent 8 }}
    spec:
      terminationGracePeriodSeconds: 60
      containers:
        - name: mcp-server
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          ports:
            - name: http
              containerPort: {{ .Values.service.port }}
              protocol: TCP
          readinessProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 5
            periodSeconds: 10
            failureThreshold: 3
          livenessProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 15
            periodSeconds: 20
            failureThreshold: 5
          resources:
            {{- toYaml .Values.resources | nindent 12 }}
          envFrom:
            - secretRef:
                name: {{ .Values.secrets.secretName }}
                optional: true
          env:
            {{- range $key, $val := .Values.env }}
            - name: {{ $key }}
              value: {{ $val | quote }}
            {{- end }}

The terminationGracePeriodSeconds: 60 setting is the MCP-specific detail most teams initially get wrong. When Kubernetes terminates a pod (during a rolling update, a node drain, or an HPA scale-in), it sends SIGTERM to the container and then waits for the grace period before sending SIGKILL. An MCP server using SSE transport may have active streaming sessions at that moment. Without a meaningful grace period, those sessions end abruptly and the tool call in progress is dropped. The grace period only helps if the server reacts to SIGTERM by refusing new sessions and letting in-flight ones finish; a process that exits immediately on SIGTERM drops them anyway. Sixty seconds gives most sessions time to reach a natural completion point. If your MCP server handles long-running tool calls (code execution, file processing), consider raising this to 120 or 180 seconds. See zero-downtime deployment for MCP servers for the full lifecycle management strategy.
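A minimal sketch of SIGTERM-aware draining in Python, assuming a server that streams each SSE session on its own thread. SessionTracker, main, and the two timing constants are hypothetical names, not part of any MCP SDK; the point is the shape: refuse new sessions once SIGTERM arrives, wait for in-flight ones, and exit a few seconds before the grace period ends so the kubelet never needs to send SIGKILL.

```python
import signal
import sys
import threading
import time

GRACE_PERIOD_SECONDS = 60   # keep in sync with terminationGracePeriodSeconds
SAFETY_MARGIN_SECONDS = 5   # finish before the kubelet escalates to SIGKILL


class SessionTracker:
    """Counts in-flight SSE sessions so shutdown can wait for them."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._active = 0
        self._draining = False

    def try_open(self) -> bool:
        """Register a new session, or refuse it once draining has begun."""
        with self._cond:
            if self._draining:
                return False
            self._active += 1
            return True

    def close(self) -> None:
        """Mark one session finished; wake drain() if it was the last one."""
        with self._cond:
            self._active -= 1
            if self._active == 0:
                self._cond.notify_all()

    @property
    def active(self) -> int:
        with self._cond:
            return self._active

    def drain(self, timeout: float) -> bool:
        """Refuse new sessions, then wait up to `timeout` seconds for active ones.

        Returns True if every session finished in time.
        """
        deadline = time.monotonic() + timeout
        with self._cond:
            self._draining = True
            while self._active > 0:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    return False
                self._cond.wait(remaining)
            return True


def main() -> None:
    tracker = SessionTracker()
    shutdown = threading.Event()
    # The handler only records the request; the blocking drain runs below.
    signal.signal(signal.SIGTERM, lambda signum, frame: shutdown.set())

    # Start the HTTP/SSE server on a background thread here (omitted). Each SSE
    # handler calls tracker.try_open() before streaming (answering 503 when it
    # returns False) and tracker.close() when its stream ends.

    while not shutdown.is_set():
        time.sleep(0.5)
    drained = tracker.drain(GRACE_PERIOD_SECONDS - SAFETY_MARGIN_SECONDS)
    print(f"drained={drained} still_active={tracker.active}", flush=True)
    sys.exit(0 if drained else 1)


if __name__ == "__main__":
    main()
```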

Service and Ingress templates

The Service template is straightforward for a ClusterIP service, but the Ingress template needs MCP-specific annotations. For SSE transport, the nginx ingress controller must be told to disable response buffering and allow long-lived connections — both behaviors that conflict with its defaults.

# templates/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: {{ include "mcp-server.fullname" . }}
  labels:
    {{- include "mcp-server.labels" . | nindent 4 }}
spec:
  type: {{ .Values.service.type }}
  ports:
    - port: {{ .Values.service.port }}
      targetPort: http
      protocol: TCP
      name: http
  selector:
    {{- include "mcp-server.selectorLabels" . | nindent 4 }}

# templates/ingress.yaml
{{- if .Values.ingress.enabled -}}
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: {{ include "mcp-server.fullname" . }}
  labels:
    {{- include "mcp-server.labels" . | nindent 4 }}
  annotations:
    nginx.ingress.kubernetes.io/proxy-buffering: "off"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "60"
spec:
  ingressClassName: {{ .Values.ingress.className }}
  {{- if .Values.ingress.tls }}
  tls:
    - hosts:
        - {{ .Values.ingress.host }}
      secretName: {{ .Values.ingress.tlsSecret }}
  {{- end }}
  rules:
    - host: {{ .Values.ingress.host }}
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: {{ include "mcp-server.fullname" . }}
                port:
                  number: {{ .Values.service.port }}
{{- end }}

The two critical annotations are nginx.ingress.kubernetes.io/proxy-buffering: "off" and nginx.ingress.kubernetes.io/proxy-read-timeout: "3600". Without disabling proxy buffering, nginx accumulates SSE events in its buffer before sending them to the client, which breaks the real-time streaming behavior that MCP tool calls depend on. Without a long read timeout (the default is 60 seconds), nginx closes the connection after one minute of inactivity on an SSE connection — even if the connection is healthy and waiting for the next tool response. The value of 3600 (one hour) is a reasonable upper bound for interactive sessions; adjust lower if your usage patterns allow it. For a deeper discussion of nginx-specific MCP configuration, see nginx ingress configuration for MCP servers.
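The annotation fixes buffering at the ingress, and the server can reinforce it per response. nginx honors an X-Accel-Buffering: no response header by default, and because proxy-read-timeout measures the gap between two successive reads from the upstream, periodic SSE comment lines keep a quiet but healthy stream from hitting it. The standard-library Python sketch below is illustrative only: the handler answers every GET, and the single hard-coded event stands in for whatever your MCP server actually streams.

```python
import http.server
import json
from typing import Any, Dict, Optional

# An SSE comment line: clients ignore it, but it is traffic as far as proxy timeouts go.
KEEPALIVE = b": keepalive\n\n"


def format_sse_event(payload: Dict[str, Any], event: Optional[str] = None) -> bytes:
    """Serialize one SSE event: optional event: line, one data: line, blank line."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    lines.append("data: " + json.dumps(payload, separators=(",", ":")))
    return ("\n".join(lines) + "\n\n").encode()


class SSEHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        self.send_response(200)
        self.send_header("Content-Type", "text/event-stream")
        self.send_header("Cache-Control", "no-cache")
        # Tells nginx not to buffer this response, complementing proxy-buffering: "off".
        self.send_header("X-Accel-Buffering", "no")
        self.end_headers()
        # A real server would loop here, writing KEEPALIVE at a fixed interval
        # (for example every 15-30 s) whenever no event is ready to send.
        self.wfile.write(KEEPALIVE)
        self.wfile.write(format_sse_event({"jsonrpc": "2.0", "id": 1, "result": {}}, event="message"))
        self.wfile.flush()

    def log_message(self, format: str, *args: Any) -> None:
        pass  # silence per-request logging


if __name__ == "__main__":
    server = http.server.ThreadingHTTPServer(("0.0.0.0", 8080), SSEHandler)
    server.serve_forever()
```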

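If your deployment uses a different ingress controller, such as Traefik, Istio, or the AWS Load Balancer Controller, the configuration differs but the underlying requirement is the same: disable buffering and extend idle timeouts. Traefik does not buffer responses unless you attach its Buffering middleware, so the settings to check there are the entrypoint's respondingTimeouts and, for backend connections, a ServersTransport's forwardingTimeouts. Istio sets request timeouts through the timeout field on a VirtualService HTTP route; make sure no route applies a short one to the MCP path. An AWS ALB closes idle connections after 60 seconds by default; raise that with the alb.ingress.kubernetes.io/load-balancer-attributes: idle_timeout.timeout_seconds=3600 annotation.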

Helm test hook for MCP protocol verification

Kubernetes readiness probes check that a pod is alive and accepting connections, but they do not verify that the MCP protocol handshake succeeds. An HTTP 200 on /health only means the HTTP server is up — it says nothing about whether the MCP server correctly responds to an initialize request, whether its tool registry loaded without errors, or whether a downstream dependency (a database, an embeddings service) is reachable.

A Helm test hook fills this gap. It defines a pod that runs whenever you invoke helm test, which a pipeline typically does right after every helm install or helm upgrade. The pod executes a real MCP protocol probe, and helm test exits non-zero if the probe fails. Act on that exit code in your pipeline and protocol correctness becomes part of your deployment gate, not an afterthought.

# templates/tests/mcp-protocol-test.yaml
apiVersion: v1
kind: Pod
metadata:
  name: "{{ include "mcp-server.fullname" . }}-mcp-test"
  labels:
    {{- include "mcp-server.labels" . | nindent 4 }}
  annotations:
    "helm.sh/hook": test
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  containers:
    - name: mcp-probe
      image: curlimages/curl:8.6.0
      command: ['sh', '-c']
      args:
        - |
          curl -sf --max-time 10 -X POST \
            http://{{ include "mcp-server.fullname" . }}:{{ .Values.service.port }}/ \
            -H 'Content-Type: application/json' \
            -H 'Accept: application/json, text/event-stream' \
            -d '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","clientInfo":{"name":"helm-test","version":"1.0"},"capabilities":{}}}' \
            | grep -q '"protocolVersion"' \
            && echo "MCP protocol probe passed" \
            || { echo "MCP protocol probe FAILED"; exit 1; }
  restartPolicy: Never

Run it with:

helm test mcp-server-production

The helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded annotation keeps the cluster tidy: the test pod is deleted before a new test run starts (preventing name collisions) and deleted after a successful run. Failed pods are deliberately left around so you can inspect their logs with kubectl logs <pod-name>.

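The probe itself sends a minimal but valid MCP initialize request and checks that the response body contains "protocolVersion". It assumes the server accepts JSON-RPC POSTs at / and returns the result in the response body (as plain JSON or as a text/event-stream body), which is how a Streamable HTTP endpoint behaves; change the path if your server mounts its MCP endpoint elsewhere, such as /mcp. A legacy HTTP+SSE endpoint (protocol version 2024-11-05) instead answers POSTs with 202 Accepted and delivers the result on its separate SSE stream, so this probe will not work against it as written. A correct MCP server must include protocolVersion in its initialize result: the requested version if it supports it, otherwise another version it does support. If the server responds with an HTTP error, a malformed JSON-RPC response, or a body that lacks protocolVersion, the test fails and helm test exits non-zero. The release itself stays marked as deployed, so a failing test blocks a rollout only if your pipeline acts on that exit code.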
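If you want a stricter check than grep, the same probe can be written with Python's standard library. This sketch assumes the Streamable HTTP behavior described above (a POST to the MCP endpoint that returns either application/json or a text/event-stream body) and verifies the JSON-RPC envelope, the response id, and the presence of protocolVersion. It treats each SSE event as a single data: line, which is a simplification, and the function names are illustrative.

```python
import json
import sys
import urllib.request
from typing import Any, Dict

REQUEST_ID = 1
INITIALIZE = {
    "jsonrpc": "2.0",
    "id": REQUEST_ID,
    "method": "initialize",
    "params": {
        "protocolVersion": "2024-11-05",
        "capabilities": {},
        "clientInfo": {"name": "helm-test", "version": "1.0"},
    },
}


def extract_response(body: str, content_type: str) -> Dict[str, Any]:
    """Return the JSON-RPC response from a JSON or text/event-stream body."""
    if content_type.startswith("text/event-stream"):
        for line in body.splitlines():
            if line.startswith("data:"):
                message = json.loads(line[len("data:"):].strip())
                if message.get("id") == REQUEST_ID:
                    return message
        raise ValueError("no JSON-RPC response with our id in the SSE body")
    return json.loads(body)


def check_initialize(message: Dict[str, Any]) -> str:
    """Validate an initialize response; return the negotiated protocol version."""
    if message.get("jsonrpc") != "2.0" or message.get("id") != REQUEST_ID:
        raise ValueError("not a JSON-RPC 2.0 response to our request")
    if "error" in message:
        raise ValueError(f"server returned an error: {message['error']}")
    version = (message.get("result") or {}).get("protocolVersion")
    if not isinstance(version, str) or not version:
        raise ValueError("initialize result has no protocolVersion")
    return version


def probe(url: str, timeout: float = 10.0) -> str:
    request = urllib.request.Request(
        url,
        data=json.dumps(INITIALIZE).encode(),
        headers={
            "Content-Type": "application/json",
            # Streamable HTTP clients must accept both response formats.
            "Accept": "application/json, text/event-stream",
        },
        method="POST",
    )
    # urlopen raises HTTPError for 4xx/5xx responses, which also counts as a failure.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read().decode()
        content_type = response.headers.get("Content-Type", "")
    return check_initialize(extract_response(body, content_type))


if __name__ == "__main__":
    try:
        print("MCP protocol probe passed, protocolVersion", probe(sys.argv[1]))
    except Exception as exc:  # any failure should fail the test pod
        print("MCP protocol probe FAILED:", exc)
        sys.exit(1)
```

Run as python probe.py http://<service-name>:8080/ from any image that ships Python, in place of the curl container in the test pod.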

In CI/CD pipelines, integrate the test into the deploy step:

helm upgrade --install mcp-server-production ./mcp-server \
  -f values-production.yaml \
  --set image.tag=$IMAGE_TAG \
  --wait \
  --timeout 5m

helm test mcp-server-production --timeout 2m

The --wait flag on helm upgrade blocks until the release's resources are ready (for a Deployment, until the minimum number of pods its rollout strategy requires are Ready) before returning. Running helm test immediately afterwards means you test pods that have passed their readiness probes, not pods that are still starting up. For comprehensive coverage of what a correct MCP health probe should check, see MCP server health check implementation.
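Because a failed helm test leaves the release marked as deployed, the rollback decision belongs in the pipeline. Below is a minimal Python sketch of that step, shelling out to the same helm commands shown above. deploy_and_verify and its injectable run parameter are illustrative names, and the release name and timeouts simply mirror the example rather than being recommendations.

```python
import subprocess
import sys
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def run_command(cmd: Sequence[str]) -> int:
    """Run a command, inheriting stdout/stderr, and return its exit code."""
    print("+", " ".join(cmd), flush=True)
    return subprocess.run(list(cmd), check=False).returncode


def deploy_and_verify(release: str, chart: str, values_file: str, image_tag: str,
                      run: Runner = run_command) -> bool:
    upgrade = [
        "helm", "upgrade", "--install", release, chart,
        "-f", values_file,
        "--set", f"image.tag={image_tag}",
        "--wait", "--timeout", "5m",
    ]
    if run(upgrade) != 0:
        return False  # render error, or pods never became ready
    if run(["helm", "test", release, "--timeout", "2m"]) == 0:
        return True
    # helm test failed but Helm still lists the release as deployed, so undo it
    # explicitly. On a first install there is no earlier revision to return to.
    run(["helm", "rollback", release, "--wait"])
    return False


if __name__ == "__main__":
    ok = deploy_and_verify("mcp-server-production", "./mcp-server",
                           "values-production.yaml", sys.argv[1])
    sys.exit(0 if ok else 1)
```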

HPA and resource configuration for MCP servers

Horizontal pod autoscaling for MCP servers has a subtle complication that stateless HTTP services don't share: SSE connections are long-lived. When the HPA adds a pod, new sessions route to it, but existing SSE sessions, possibly mid-tool-call, stay on their original pods because they live on connections that are already open. Scale-out therefore works cleanly (new load goes to new pods), but scale-in needs care: the HPA has no notion of open connections, so any pod it removes may still be carrying live sessions.

The HPA template:

# templates/hpa.yaml
{{- if .Values.autoscaling.enabled -}}
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: {{ include "mcp-server.fullname" . }}
  labels:
    {{- include "mcp-server.labels" . | nindent 4 }}
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ include "mcp-server.fullname" . }}
  minReplicas: {{ .Values.autoscaling.minReplicas }}
  maxReplicas: {{ .Values.autoscaling.maxReplicas }}
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: {{ .Values.autoscaling.targetCPUUtilizationPercentage }}
    {{- if .Values.autoscaling.targetMemoryUtilizationPercentage }}
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: {{ .Values.autoscaling.targetMemoryUtilizationPercentage }}
    {{- end }}
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Pods
          value: 1
          periodSeconds: 60
{{- end }}

The behavior.scaleDown.stabilizationWindowSeconds: 300 setting prevents flapping: the HPA acts on the highest replica recommendation from the last five minutes, so it removes pods only after utilization has stayed low for that long (300 seconds is also the Kubernetes default; setting it explicitly documents the intent). The policy of removing at most one pod every 60 seconds limits how many pods, and therefore how many SSE sessions, are disrupted at once. Combine this with terminationGracePeriodSeconds: 60 on the Deployment so each removed pod has a full minute to drain after it receives SIGTERM.
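These settings put a floor on how quickly the HPA can shrink the Deployment. A back-of-the-envelope Python model, which ignores the controller's sync interval and assumes load drops once and stays down, makes the effect concrete; min_scale_in_seconds is a hypothetical helper, not a Kubernetes API.

```python
import math


def min_scale_in_seconds(from_replicas: int, to_replicas: int,
                         stabilization_window: int = 300,
                         pods_per_period: int = 1, period: int = 60) -> int:
    """Rough lower bound on HPA scale-in time under the behavior block above."""
    if to_replicas >= from_replicas:
        return 0
    steps = math.ceil((from_replicas - to_replicas) / pods_per_period)
    # The first removal waits out the stabilization window; each later
    # removal waits one policy period after the previous one.
    return stabilization_window + (steps - 1) * period


if __name__ == "__main__":
    print(min_scale_in_seconds(20, 3))
```

With the production range of 20 down to 3 replicas this comes to 300 + 16 × 60 = 1260 seconds, about 21 minutes, during which at most one pod at a time is draining its sessions.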

On the CPU versus memory scaling decision: MCP servers that keep session state in memory (tool call history, cached embeddings, per-session database connections) should include a memory utilization target, because their memory footprint grows with the number of active sessions, which tracks load better than CPU does. Stateless Streamable HTTP MCP servers that reconstruct context from each request scale cleanly on CPU utilization alone. Memory-based scale-in works only if memory actually falls when sessions close; a runtime that holds on to freed heap can keep the HPA from ever scaling down.
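When both targets are set, the HPA computes one proposal per metric with desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization), leaves the count alone when the ratio is within the default 10% tolerance, and uses the largest proposal. A simplified Python version of that calculation follows; it ignores pod readiness handling, missing metrics, and the behavior block.

```python
import math
from typing import Dict


def desired_replicas(current: int,
                     utilization: Dict[str, float],
                     targets: Dict[str, float],
                     min_replicas: int,
                     max_replicas: int,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA calculation for Resource metrics with Utilization targets."""
    proposals = []
    for metric, target in targets.items():
        ratio = utilization[metric] / target
        if abs(ratio - 1.0) <= tolerance:
            proposals.append(current)  # close enough to target: no change
        else:
            proposals.append(math.ceil(current * ratio))
    # With several metrics the HPA takes the largest proposal, then clamps it.
    return max(min_replicas, min(max_replicas, max(proposals)))


if __name__ == "__main__":
    print(desired_replicas(3, {"cpu": 90, "memory": 70}, {"cpu": 65, "memory": 80}, 3, 20))
```

For example, three pods averaging 90% CPU against a 65% target propose 5 replicas, which wins over the memory proposal of 3 (70% against 80%).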

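Add a PodDisruptionBudget to stop voluntary evictions, such as node drains during maintenance or cluster upgrades, from taking down all pods at once. A PDB applies only to evictions; it does not slow down HPA scale-in or a rolling update, both of which delete pods directly: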

# templates/pdb.yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: {{ include "mcp-server.fullname" . }}
  labels:
    {{- include "mcp-server.labels" . | nindent 4 }}
spec:
  minAvailable: 1
  selector:
    matchLabels:
      {{- include "mcp-server.selectorLabels" . | nindent 6 }}

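With minAvailable: 1, Kubernetes refuses any voluntary eviction that would leave zero healthy pods. Node drains (during cluster upgrades or instance replacements) go through the eviction API and therefore respect the budget: if both replicas of your MCP server sit on the node being drained, the drain evicts one, then waits until its replacement is Ready elsewhere before evicting the second. With replicaCount: 1, as in the staging values later in this guide, the same budget blocks every voluntary eviction and node drains stall, so run at least two replicas wherever the PDB is enabled or make the PDB template conditional on the replica count. For full coverage of the load balancing strategies that work alongside HPA, see MCP server load balancing patterns.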
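The budget arithmetic is simple enough to sketch. With an integer minAvailable, the number of voluntary evictions a PDB allows at any moment is the number of healthy pods minus minAvailable, never below zero. The hypothetical helpers below show why a single-replica staging deployment stalls drains while two or more replicas do not.

```python
def allowed_disruptions(healthy_pods: int, min_available: int) -> int:
    """Voluntary evictions a PDB with an integer minAvailable permits right now."""
    return max(0, healthy_pods - min_available)


def drain_blocked(replicas: int, min_available: int = 1) -> bool:
    """True if a drain cannot evict even one pod while all replicas are healthy."""
    return allowed_disruptions(replicas, min_available) == 0


if __name__ == "__main__":
    for replicas in (1, 2, 3):
        print(replicas, allowed_disruptions(replicas, 1), drain_blocked(replicas))
```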

Deploying across environments with Helm values overrides

The multi-environment pattern in Helm is to keep values.yaml as the base set of safe defaults, then layer environment-specific overrides with separate values files. The key discipline is that nothing environment-specific should live in values.yaml — only generic defaults that work everywhere.

mcp-server/
├── values.yaml            # Safe base defaults, no env-specific values
├── values-staging.yaml    # Staging overrides
└── values-production.yaml # Production overrides

A values-staging.yaml typically reduces replica count and resource limits to save cost, and points to a staging-specific hostname and certificate:

# values-staging.yaml
replicaCount: 1

image:
  pullPolicy: Always   # Always re-pull in staging to test latest builds

ingress:
  host: mcp-staging.example.com
  tlsSecret: mcp-staging-tls-cert

resources:
  requests:
    cpu: 50m
    memory: 64Mi
  limits:
    cpu: 200m
    memory: 256Mi

autoscaling:
  enabled: false

env:
  LOG_LEVEL: debug
  MCP_TRANSPORT: http

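A values-production.yaml enables autoscaling, raises resource limits, and uses production hostnames. Its replicaCount has no effect while autoscaling is enabled, because the Deployment template omits replicas in that case and the HPA scales the Deployment to at least minReplicas: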

# values-production.yaml
replicaCount: 3

ingress:
  host: mcp.example.com
  tlsSecret: mcp-production-tls-cert

resources:
  requests:
    cpu: 200m
    memory: 256Mi
  limits:
    cpu: 1000m
    memory: 1Gi

autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 20
  targetCPUUtilizationPercentage: 65

env:
  LOG_LEVEL: warn
  MCP_TRANSPORT: http

Deploy to staging with:

helm upgrade --install mcp-server-staging ./mcp-server \
  -f values-staging.yaml \
  --set image.tag=$STAGING_IMAGE_TAG \
  --namespace staging \
  --create-namespace

Deploy to production with:

helm upgrade --install mcp-server-production ./mcp-server \
  -f values-production.yaml \
  --set image.tag=$PRODUCTION_IMAGE_TAG \
  --namespace production \
  --atomic \
  --timeout 10m

The --atomic flag on the production deploy implies --wait and adds automatic rollback: if the upgrade does not reach a ready state within the timeout, Helm rolls the release back to the previous revision (on a first install, where no previous revision exists, it uninstalls the failed release instead). This is the safest option for production, where a bad release should never be left partially upgraded. Note that --atomic reacts only to resources failing to become ready; it does not run helm test.

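Never hardcode secrets in any values file; values files are typically committed to git alongside the chart. If you must pass a secret at deploy time, use --set-string with the value sourced from a CI/CD secret store (for example secrets.apiKey, assuming your templates/secret.yaml renders that key into the Secret). Keep in mind that Helm stores every supplied value in the release record inside the cluster, where anyone who can read Secrets in the release namespace can recover it, for example with helm get values. The better option is an ExternalSecrets operator, with only the resulting Secret's name referenced in values and never the secret value itself. For the full secrets management strategy, see MCP server secrets management.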

Monitoring Helm-deployed MCP servers with AliveMCP

The Helm test hook described above is a one-time post-deploy check that verifies the MCP protocol works immediately after a deployment completes. It answers the question: "Did this deployment break the protocol handshake?" But it says nothing about what happens an hour later, a day later, or after a certificate renews, a node is replaced, or a backing service becomes unavailable at 2am.

AliveMCP runs protocol-level probes against your MCP server's public endpoint on a continuous schedule — every minute, from multiple locations. Each probe sends a real initialize JSON-RPC request and validates the full MCP handshake, exactly like the Helm test hook, but continuously. When a probe fails — whether due to a connection timeout, a TLS error, an HTTP 5xx, or a malformed protocol response — AliveMCP sends an alert immediately.

After your first helm upgrade --install succeeds and helm test passes, register the server with AliveMCP:

# Using the AliveMCP API to register a new monitor
curl -X POST https://alivemcp.com/api/monitors \
  -H "Authorization: Bearer $ALIVEMCP_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "mcp-server-production",
    "url": "https://mcp.example.com",
    "interval": 60,
    "alertChannels": ["slack-platform-oncall", "pagerduty-mcp"]
  }'

You can also automate monitor registration as a post-deploy step in your CI/CD pipeline, so every new MCP server deployed via Helm is automatically monitored from the moment it goes live. The monitor configuration — URL, interval, alert channels — can be stored alongside your values-production.yaml so the monitoring setup is version-controlled with the deployment configuration.

The division of responsibility between Helm test and AliveMCP:

Aspect | Helm test hook | AliveMCP
When it runs | On each helm test run, typically once after each deploy | Every minute, continuously
Where it runs from | Inside the cluster | External, from multiple regions
What it checks | MCP protocol handshake | MCP protocol handshake + TLS + DNS + network path
Catches post-deploy failures | No | Yes: certificate expiry, node failure, crash loops
Network path tested | In-cluster ClusterIP Service only | The same public path end users take
On failure | helm test exits non-zero; your pipeline can fail the job and roll back | Pages on-call, shows incident timeline

A failure that AliveMCP catches but a Helm test hook would not: a TLS certificate managed by cert-manager expires on a Sunday because cert-manager failed to renew it (ACME DNS-01 challenge timeout). The cluster, the pods, and the MCP protocol all appear healthy from inside the cluster. The Helm test hook — running inside the cluster against the ClusterIP Service — bypasses TLS entirely. AliveMCP, probing from outside the cluster through the Ingress with the expired certificate, catches the TLS handshake failure within one minute and pages the on-call engineer. See MCP server health check implementation for the full list of failure modes that only external monitoring catches.
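The expired-certificate case is easy to reproduce with standard-library Python run from outside the cluster: a verified TLS connection to the public hostname either fails outright (an expired, untrusted, or mismatched certificate raises ssl.SSLCertVerificationError during the handshake) or returns the certificate's notAfter date. This is a sketch of a single check, not a replacement for continuous monitoring, and mcp.example.com stands in for your Ingress host.

```python
import socket
import ssl
import time


def days_left(not_after: str, now: float) -> float:
    """Days between `now` (epoch seconds) and a certificate notAfter string."""
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def cert_days_left(host: str, port: int = 443, timeout: float = 10.0) -> float:
    """Connect through the public endpoint and report days until the cert expires.

    Raises ssl.SSLCertVerificationError if the certificate is already expired,
    untrusted, or issued for a different hostname.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    return days_left(cert["notAfter"], time.time())


if __name__ == "__main__":
    remaining = cert_days_left("mcp.example.com")
    print(f"certificate expires in {remaining:.1f} days")
```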

For teams rolling out Helm-based deployments across multiple environments, AliveMCP supports multiple monitors — one per environment — with separate alert channels. The staging monitor can post to a Slack channel without paging anyone; the production monitor wakes up the on-call rotation. This mirrors the values-based environment separation in the Helm chart itself.

Frequently asked questions

How do I update the MCP server image without changing the Helm chart?

Pass the new image tag at upgrade time with --set image.tag=2.1.5:

helm upgrade mcp-server-production ./mcp-server \
  -f values-production.yaml \
  --set image.tag=2.1.5

This overrides the default from Chart.yaml's appVersion without touching the chart itself: the chart version stays the same and only the running image changes. Alternatively, update image.tag directly in values-production.yaml before running helm upgrade --install. During active development, setting image.pullPolicy: Always makes the kubelet re-pull the tag every time a pod starts, so a rebuilt image pushed under the same tag is picked up the next time pods are recreated (helm upgrade alone does not restart pods when the rendered manifests are unchanged). That is useful for testing latest-tagged builds in a dev cluster, but never appropriate in production, where a tag should always refer to exactly the same image.

How do I store MCP server secrets (API keys) in Helm without committing them?

Never put secret values in values.yaml or any values override file committed to git. There are two safe patterns. The first is deploy-time injection: pass the secret value via --set-string in your CI/CD pipeline, sourcing it from the pipeline's secret store (GitHub Actions Secrets, GitLab CI Variables, Vault, AWS Secrets Manager). The example below assumes your templates/secret.yaml renders secrets.apiKey into the Secret that the Deployment loads through envFrom:

helm upgrade --install mcp-server-production ./mcp-server \
  -f values-production.yaml \
  --set-string secrets.apiKey="$MCP_API_KEY"

The second, and preferred, pattern for production is an external secrets operator such as External Secrets Operator or the Vault Secrets Operator. Your values.yaml references only the name of the Kubernetes Secret object (secrets.secretName: mcp-server-secrets), and the operator creates that Secret and keeps it in sync with your external store. The chart never touches the actual secret values. See MCP server secrets management for the full setup.

Can I use Helm to manage both the MCP server and its backing database?

Yes, but the recommended approach depends on your team structure and upgrade cadence. The simplest option is to deploy them as separate Helm releases — helm upgrade --install mcp-server and helm upgrade --install mcp-postgres — and pass the database connection string to the MCP server at deploy time via --set or an ExternalSecret. Separate releases mean you can upgrade the MCP server without touching the database, which is the correct deployment model for stateful backing services. The alternative is a parent chart with the MCP server and a database chart (such as Bitnami's PostgreSQL chart) as a subchart dependency. This works well for development environments where simplicity matters more than independent lifecycle management, but in production, the coupling between chart upgrades and database upgrades adds risk. Keep the MCP server chart stateless from a Helm perspective: it should deploy and upgrade without data migration steps. Deploy the database separately and manage schema migrations as a separate job.

What is the difference between helm install and helm upgrade --install?

helm install fails if a release with the given name already exists. This makes it unsuitable for CI/CD pipelines where the same pipeline handles both the first deployment and all subsequent updates. helm upgrade --install is idempotent: it calls install if no release exists and upgrade if one does. Always use helm upgrade --install in automated pipelines to avoid failures on first deploy versus re-deploy. The only time to use plain helm install is when you intentionally want the command to fail if the release already exists — a safeguard for manual one-time setup tasks where accidentally overwriting an existing release would be harmful.

How do I roll back a bad MCP server deployment with Helm?

Helm stores the full history of every release revision. To see all revisions and their status:

helm history mcp-server-production

To roll back to the previous revision:

helm rollback mcp-server-production

To roll back to a specific revision (for example, revision 14):

helm rollback mcp-server-production 14

A rollback is itself recorded as a new revision, so helm history shows the full chain of events including the rollback. If you deployed with --atomic, a failed deployment rolls back automatically without manual intervention — the deploy step fails, the previous working revision is restored, and your CI/CD pipeline receives a non-zero exit code. Combine Helm rollback capability with AliveMCP alerting: when AliveMCP detects a spike in probe failures immediately after a deployment, that is the signal to inspect the Helm history and roll back if the new revision is the cause. See zero-downtime deployment for MCP servers for the full rollback and canary strategy.
