
ADR-012: Runbook owns incident response procedures; monitoring-setup owns detection only

Source identity (from 02-design/adr/ADR-012-runbook-owns-incident-response.md):

ddx:
  id: ADR-012
  depends_on:
    - helix.prd


| Date | Status | Deciders | Related | Confidence |
|---|---|---|---|---|
| 2026-05-30 | Accepted | HELIX maintainers | plan-2026-05-30-artifact-types-and-concerns-audit, monitoring-setup, runbook | High |

Context

| Aspect | Description |
|---|---|
| Problem | The 2026-05-30 artifact-types-and-concerns audit flagged an ownership collision between monitoring-setup and runbook: both currently define incident-response routing. monitoring-setup’s template carries an Incident Response section (escalation paths, response procedures) that duplicates and competes with runbook’s Common Incident Procedures and escalation content. With two artifact types claiming the same surface, operators must reconcile two sources of truth at the moment they can least afford to. |
| Current State | monitoring-setup/template.md includes an ## Incident Response section. runbook/template.md includes ## Common Incident Procedures (with per-incident sections and a security/data-safety incident path) and explicit escalation routing. The two artifact types overlap on response procedures and escalation rather than partitioning detection from response. |
| Requirements | The catalog must assign incident-response ownership to exactly one artifact type. The other type must stop defining that surface so operators have a single canonical procedure document during an incident. |

Decision

monitoring-setup owns detection only: SLI/SLO definitions, alert routing inputs, threshold tuning, and the observability surface that produces signals.

runbook owns incident response: the procedures operators run when those signals fire — recovery steps, common incident procedures, escalation paths, and incident commander routing.

In Phase 3, the Incident Response section is removed from the monitoring-setup template. Any content from that section not already covered by runbook moves to runbook. After Phase 3, the catalog validator can treat the presence of an Incident Response (or equivalent response-procedure) H2 in monitoring-setup as a drift signal.

Key Points: monitoring-setup = detection surface | runbook = operator procedure | Phase 3 removes Incident Response from monitoring-setup template | content not already in runbook moves there
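The drift signal described above can be detected by scanning a monitoring-setup artifact's H2 headings. The following is a minimal Python sketch, assuming the validator receives each artifact as Markdown text. The function names, the message wording, and every "equivalent" heading pattern other than Incident Response are hypothetical illustrations, not the HELIX catalog validator's actual rules.

```python
import re

# Hypothetical set of H2 titles treated as "equivalent response-procedure"
# sections. The ADR names only "Incident Response"; the others are assumptions.
RESPONSE_H2_PATTERNS = [
    r"incident response",
    r"(?:common )?incident procedures?",
    r"escalation(?: paths?)?",
    r"response procedures?",
]

# ATX H2 heading: up to three leading spaces, "##", then at least one space.
H2_RE = re.compile(r"^ {0,3}## +(.+?)\s*$")
FENCE_MARKERS = ("`" * 3, "~" * 3)


def find_response_h2s(markdown: str) -> list[str]:
    """Return H2 titles in a monitoring-setup document that look like response procedures."""
    hits = []
    fence = None  # marker of the currently open fenced code block, if any
    for line in markdown.splitlines():
        stripped = line.lstrip()
        if fence is None and stripped.startswith(FENCE_MARKERS):
            fence = stripped[:3]
            continue
        if fence is not None:
            # Headings inside code samples are not document structure; skip them.
            if stripped.startswith(fence):
                fence = None
            continue
        m = H2_RE.match(line)
        if m is None:
            continue
        title = m.group(1)
        if any(re.fullmatch(p, title.lower()) for p in RESPONSE_H2_PATTERNS):
            hits.append(title)
    return hits


def drift_messages(path: str, markdown: str) -> list[str]:
    """Format one drift finding per offending H2 (message wording is illustrative)."""
    return [
        f"{path}: response-procedure H2 '{t}' belongs in runbook (ADR-012)"
        for t in find_response_h2s(markdown)
    ]
```

For example, `find_response_h2s("## Alert Routing\n## Incident Response\n")` returns `['Incident Response']`: alert routing stays on the detection side of the partition, while the response section is flagged. Matching on whole heading titles keeps false positives low, at the cost of missing response content placed under an unexpected heading.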

Alternatives

| Option | Pros | Cons | Evaluation |
|---|---|---|---|
| Keep both artifact types defining incident response; rely on authors to keep them in sync | No catalog change | The audit found they are already out of sync; two sources of truth create ambiguity at the worst possible time, during an incident | Rejected: this is the status quo the audit flagged |
| Move detection ownership into runbook and remove monitoring-setup | Single artifact type for the whole operational surface | Detection (SLI/SLO definitions, alert thresholds) and response (procedures, escalation) are authored by different people at different cadences; collapsing them loses that separation | Rejected: conflates two genuinely distinct authoring surfaces |
| monitoring-setup owns detection; runbook owns response; Phase 3 removes the Incident Response section from monitoring-setup | Clean partition aligned to the actual purpose of each artifact type; single source of truth for procedures; enforceable by validator | Requires a content migration for any monitoring-setup content not already in runbook | Selected: smallest sufficient ownership fix |

Consequences

| Type | Impact |
|---|---|
| Positive | Operators have a single canonical document (runbook) for incident procedures and escalation. |
| Positive | monitoring-setup becomes focused on its actual purpose, the observability surface, without bleeding into operator procedure. |
| Positive | The catalog validator can flag any future reintroduction of response procedures into monitoring-setup as drift. |
| Positive | Authoring cadences are no longer entangled: detection thresholds can evolve without re-touching response procedures, and vice versa. |
| Negative | Phase 3 must perform a content migration from monitoring-setup’s Incident Response section to runbook for anything not already covered. |
| Negative | Existing monitoring-setup artifacts authored under the previous contract need a one-time edit to remove the Incident Response section. |
| Neutral | The two artifact types continue to live side-by-side in the 05-deploy activity; only their boundaries change. |

Risks

| Risk | Probability | Impact | Mitigation |
|---|---|---|---|
| Content in monitoring-setup’s Incident Response section is lost rather than migrated | L | H | Phase 3 migration explicitly diffs the removed content against the runbook template before deleting; anything not already covered is added to runbook |
| Authors continue to add response procedures to monitoring-setup after Phase 3 | M | M | Catalog validator flags an Incident Response (or equivalent response-procedure) H2 in monitoring-setup as drift |
| The detection/response partition is unclear at the boundary (e.g. alert routing) | M | M | Detection ends at “signal produced and routed to a destination”; response begins at “operator receives signal and acts.” Alert routing destinations (PagerDuty, channel) are detection; what the recipient does is response |
| Existing references from outside the catalog point to the removed section | L | L | Phase 3 ships a redirect note in the commit message and updates in-tree references in the same change |
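The first mitigation calls for diffing the removed content against runbook before deleting it. One conservative way to sketch that check in Python, assuming both documents are available as Markdown text: treat every non-blank line under the monitoring-setup Incident Response H2 as content to preserve, and report each line whose normalized form does not appear anywhere in runbook. The helper names and normalization rules are hypothetical; exact-line matching over-reports reworded content, which errs toward manual review rather than silent loss.

```python
import re

H2_RE = re.compile(r"^ {0,3}## +(.+?)\s*$")
H1_OR_H2_RE = re.compile(r"^ {0,3}#{1,2} ")  # matches "# x" and "## x", not "### x"
LIST_MARKER_RE = re.compile(r"^\s*(?:[-*+]|\d+[.)])\s+")


def normalize(line: str) -> str:
    # Ignore list-marker style, whitespace runs, and case so trivial
    # reformatting during migration is not reported as lost content.
    line = LIST_MARKER_RE.sub("", line)
    return re.sub(r"\s+", " ", line).strip().lower()


def section_lines(markdown: str, h2_title: str) -> list[str]:
    """Non-blank lines under the named H2, stopping at the next H1 or H2."""
    target = normalize(h2_title)
    inside = False
    out = []
    for line in markdown.splitlines():
        if H1_OR_H2_RE.match(line):
            m = H2_RE.match(line)
            inside = m is not None and normalize(m.group(1)) == target
            continue
        if inside and line.strip():
            out.append(line)
    return out


def uncovered_lines(monitoring_md: str, runbook_md: str,
                    section: str = "Incident Response") -> list[str]:
    """Lines of the removed section with no normalized match anywhere in runbook."""
    runbook = {normalize(l) for l in runbook_md.splitlines() if l.strip()}
    return [l for l in section_lines(monitoring_md, section)
            if normalize(l) not in runbook]
```

A non-empty result means the section is not yet safe to delete: each returned line either needs to be added to runbook or consciously dropped by a reviewer.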

Validation

| Success Metric | Review Trigger |
|---|---|
| monitoring-setup/template.md contains no Incident Response section after Phase 3 | A PR reintroduces an Incident Response H2 (or equivalent response-procedure section) into monitoring-setup |
| All response-procedure content from the removed section is present in runbook (either pre-existing or migrated) | Phase 3 lands without a diff confirming migration coverage |
| The catalog validator flags monitoring-setup artifacts that carry response-procedure sections | A monitoring-setup artifact ships with response-procedure content and validation passes |
| Operators consulting runbook during an incident find escalation and procedure content without needing to cross-reference monitoring-setup | An incident retrospective surfaces split-source-of-truth as a contributing factor |

References