Skip to content

Monitoring Setup — Restoration Decision

Monitoring Setup — Restoration Decision

Historical decision; superseded by the worked example at docs/helix/05-deploy/monitoring-setup.md.

monitoring-setup is restored as the canonical deploy-activity artifact for service-specific observability configuration and alerting readiness.

Decision

This artifact is restored rather than retired.

Current HELIX still requires docs/helix/05-deploy/monitoring-setup.md in the deploy and iterate gates, and the deploy guidance still treats monitoring as a first-class readiness concern. That responsibility is not replaced cleanly by a deployment checklist, runbook, or metrics dashboard.

Why It Exists

  • The deployment checklist confirms release readiness tasks were completed.
  • The runbook explains how operators respond when something goes wrong.
  • The metrics dashboard summarizes what the system is doing over time.
  • Monitoring setup defines the concrete metrics, alerts, dashboards, logging, tracing, health checks, and SLO signals that must exist before release.

Canonical Inputs

  • deployment strategy and rollout plan
  • service architecture and dependency boundaries
  • health-check surfaces and operational invariants
  • on-call ownership, routing, and escalation paths
  • security monitoring requirements when alerting or audit signals overlap

Minimum Prompt Bar

  • Keep the setup service-specific rather than producing a generic observability checklist.
  • Define the metrics, dashboards, logs, traces, and alerts that operators will actually use.
  • Include measurable thresholds, routing, escalation ownership, and immediate containment actions where they matter.
  • Tie health checks and SLI/SLO signals to deploy readiness and rollback decisions.
  • Link incident response expectations to the operator flow, and add runbook references once they exist.

Minimum Template Bar

  • service summary
  • metrics collection
  • alerting rules
  • dashboards
  • logs and tracing
  • health checks
  • SLI/SLO tracking
  • incident-response routing

Canonical Replacement Status

monitoring-setup is not replaced by deployment-checklist, runbook, or metrics-dashboard. Those artifacts consume or reference observability setup, but this document remains the canonical place where deploy-time monitoring requirements are made explicit.

The deleted security-monitoring artifact stays superseded rather than restored. Its surviving intent, security-focused alerts, audit signals, escalation paths, and compliance-relevant monitoring, is now part of monitoring-setup. Restoring a second deploy artifact for the same operational surface would reintroduce overlap without adding a distinct prompt or template responsibility.