Runbook — Restoration Decision
Historical decision; superseded by the worked example at
docs/helix/05-deploy/runbook.md.
runbook is restored as the canonical deploy-activity artifact for
service-specific on-call response, rollback, recovery, and routine operating
procedures.
Decision
This artifact is restored rather than retired.
Current HELIX still requires docs/helix/05-deploy/runbook.md in the deploy
exit gate, checks it for rollback guidance, and references it from the live
deploy artifact surface in workflows/DDX.md, workflows/conventions.md,
workflows/diagrams/artifact-flow.md, deployment-checklist, and
monitoring-setup. The intent therefore still exists in the current contract.
Why It Exists
- deployment-checklist is the short go/no-go surface for a release.
- monitoring-setup defines the signals, dashboards, and alerts operators use.
- runbook explains what operators do when those signals fire or when a routine operational task must be executed safely.
- Tracker issues record rollout work, but they do not replace durable, service-specific procedures that on-call responders need during incidents, rollback, or recovery.
Canonical Inputs
- deployment checklist and rollback entrypoints
- monitoring setup, dashboards, and alert routing
- architecture and dependency boundaries
- on-call ownership and escalation paths
- service-specific recovery and maintenance procedures
Minimum Prompt Bar
- Keep the runbook service-specific, concise, and executable during real incidents or maintenance windows.
- Map concrete alerts or symptoms to first checks, commands, dashboards, and owners rather than writing generic advice.
- Include rollback and recovery procedures with prerequisites, stop conditions, and validation steps.
- Include only recurring operational procedures that somebody is actually expected to perform.
- Distinguish the runbook from deployment-checklist and monitoring-setup instead of duplicating those artifacts.
- Include security or data-safety response only when the service has service-specific handling beyond a shared organization playbook.
- Omit generic SRE handbook filler, sample vendor commands, and broad launch coordination tasks.
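The alert-mapping requirement in this list can be pictured as a record shape plus a completeness check. This is a minimal sketch under assumed names: `TriageEntry`, its fields, and `gaps` are hypothetical and do not come from HELIX. Treating commands as optional is also an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class TriageEntry:
    """One alert-to-action mapping (hypothetical shape, not a HELIX type)."""

    alert: str                # concrete alert name or symptom
    first_check: str = ""     # first thing the operator inspects
    dashboard: str = ""       # where the operator looks
    owner: str = ""           # who owns the response
    commands: list[str] = field(default_factory=list)  # optional by assumption


def gaps(entry: TriageEntry) -> list[str]:
    """Return the names of required fields that are empty or whitespace."""
    required = {
        "first_check": entry.first_check,
        "dashboard": entry.dashboard,
        "owner": entry.owner,
    }
    return [name for name, value in required.items() if not value.strip()]
```

An entry that names only the alert reports all three gaps, which is the signature of generic advice; an entry with a first check, a dashboard, and an owner reports none.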
Minimum Template Bar
- service summary and ownership
- operator entry points
- dependencies and failure boundaries
- alert triage table
- common incident procedures
- rollback and recovery steps
- routine operations or maintenance
- escalation and communications
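One way to hold a draft runbook to this bar is a presence check over its section names. The sketch below is illustrative only and is not the HELIX deploy exit-gate check: the function name `missing_sections` and the matching rule (case-insensitive substring match per line) are assumptions.

```python
# Hypothetical helper; not part of HELIX. Section names are copied from the
# Minimum Template Bar list above.
REQUIRED_SECTIONS = [
    "service summary and ownership",
    "operator entry points",
    "dependencies and failure boundaries",
    "alert triage table",
    "common incident procedures",
    "rollback and recovery steps",
    "routine operations or maintenance",
    "escalation and communications",
]


def missing_sections(runbook_text: str) -> list[str]:
    """Return required section names that do not appear in the runbook text.

    Assumption: a section counts as present when its name appears anywhere
    on a single line, compared case-insensitively.
    """
    lines = [line.strip().lower() for line in runbook_text.splitlines()]
    return [
        name
        for name in REQUIRED_SECTIONS
        if not any(name in line for line in lines)
    ]
```

Calling `missing_sections` on a draft that has only some of the sections returns the rest in template order, which gives a reviewer a short list of what still needs to be written. A name match says nothing about whether the section content is service-specific, so this complements review rather than replacing it.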
Canonical Replacement Status
runbook is not replaced by deployment-checklist or monitoring-setup.
Those artifacts define release decision points and observability configuration;
the runbook is the durable response surface that ties signals to operator
action. activity:deploy beads track rollout work, but they do not replace the
persistent per-service procedures operators need during incidents and
recovery.
The deleted prompt and template were too broad and generic to justify keeping them. Restoration is warranted only with a tighter prompt and template bar that forces service-specific operational guidance.