Product Requirements Document

Purpose

The PRD is the product-scope authority for what to build and why. Its unique job is to translate the Product Vision into prioritized, measurable requirements and boundaries. It sits between the product vision (which defines direction) and feature specs (which define feature-level detail). Every design decision and implementation choice should trace back to a PRD requirement.

(kind: data) When kind: data, the PRD is the data-product-scope authority for what data to build and why. Its job is to translate business intent into data-centric requirements: data sources, consumer personas, quality contracts, technical constraints (catalog, schema, medallion layer), and measurable success metrics. It sits between the Product Vision and the data-architecture artifact. Every data pipeline design choice and quality expectation should trace back to a Data PRD requirement.

Authoring guidance

  • Problem first — the problem section should make someone feel the pain before they see the solution.
  • Decision-oriented — every section should help someone make a build/skip decision. If a section doesn’t inform a decision, it’s filler.
  • Testable requirements — every P0 requirement should be verifiable. If you can’t describe how to test it, it’s too vague.
  • Traceable boundaries — requirements should connect upward to the Product Vision and downward to feature specs, designs, tests, and build work.
  • Honest non-goals — non-goals should exclude things someone might reasonably expect to be in scope. “Not a replacement for X” only matters if someone might assume it is.

Boundaries: what belongs elsewhere

Product requirements are for product scope. If you find yourself writing about:

| This content | Belongs in |
|---|---|
| Market sizing, ROI, investment case | `00-discover/business-case.md` |
| Positioning, target market, long-horizon strategic success | `00-discover/product-vision.md` |
| Detailed feature behavior and edge cases | `01-frame/features/FEAT-*.md` |
| User journey phrasing independent of product-level requirements | `01-frame/user-stories.md` |
| Exact command invocations, CLI flags, endpoints, schemas, payloads, error codes, config keys, or adapter signatures | `02-design/contracts/` |
| Architecture choices or implementation approach | `02-design/` |
| Detailed test cases and fixtures | `03-test/` |
| Build sequencing and execution slices | `04-build/implementation-plan.md` |

Quality checklist from the prompt

After drafting, verify every item. If any blocking check fails, revise before committing.

Blocking

  • Problem section quantifies the pain or names a specific failure mode
  • Every P0 requirement is testable (someone could write an acceptance test)
  • Every P0 has an acceptance test sketch with inputs and expected outputs
  • Success metrics have numeric targets and named measurement methods
  • Requirements trace upward to the Product Vision and downward to downstream artifacts
  • No [TBD], [TODO], or [NEEDS CLARIFICATION] markers in any section except Open Questions
  • Non-goals exclude something a reasonable person might assume is in scope
  • Personas are specific enough to validate with a real user

Warning

  • Summary works as a standalone 1-pager (problem, solution, metrics)
  • Goals describe state changes, not activities
  • Risk mitigations are concrete actions, not “monitor”
  • P0 requirements number 7 or fewer
  • Assumptions are falsifiable

Additional guidance continues in the full prompt below.

Example

---
ddx:
  id: example.prd.depositmatch
  depends_on:
    - example.product-vision.depositmatch
  review:
    self_hash: c9c24e1694af4548a6deaad8d92059e365da110148bd9adc44d8640dff9770a4
    deps:
      example.product-vision.depositmatch: 8abbb2fcb552b536f07829f57d91ef3ae8dbf52a6066955222e83d196b59b5ae
    reviewed_at: "2026-05-15T04:11:24Z"
---
# Product Requirements Document

## Summary

DepositMatch is a reconciliation workspace for small bookkeeping firms. It
imports bank deposits and invoice exports, suggests likely matches, and gives
reviewers an exception queue for deposits that need human judgment. The first
release targets weekly reconciliation for firms serving recurring
small-business clients. Success means reviewers can close most clients in
minutes, trust the evidence behind accepted matches, and keep unresolved
deposits from disappearing into spreadsheets or email.

## Problem and Goals

### Problem

Bookkeeping firms with growing client rosters spend 4-8 hours each week
matching bank deposits to invoices across accounting exports, bank portals,
spreadsheets, and email threads. The work is repetitive, but mistakes are
expensive: a missed split payment or duplicate invoice can delay monthly close
and trigger client follow-up days later.

### Goals

1. Reviewers can reconcile routine deposits from one workspace.
2. Every accepted match has visible evidence and reviewer attribution.
3. Unclear deposits become owned exceptions with a next action.

### Success Metrics

| Metric | Target | Measurement Method |
|--------|--------|--------------------|
| Median reconciliation time | Under 3 minutes per client per week | In-product workflow timing |
| Suggestion acceptance accuracy | 95% of accepted suggestions remain accepted in weekly audit sample | Reviewer audit |
| Exception ownership | 90% of unresolved deposits have owner and next action within one business day | Exception queue report |

### Non-Goals

- Replacing QuickBooks, Xero, or the firm's general ledger.
- Automatically posting journal entries.
- Supporting payroll, inventory, or tax workflows.
- Making irreversible match decisions without reviewer approval.

Deferred items are tracked in `docs/helix/parking-lot.md`.

## Users and Scope

### Primary Persona: Maya, Reconciliation Lead

**Role**: Senior bookkeeper responsible for weekly reconciliation across 10-20
small-business clients.
**Goals**: Finish routine matching quickly, catch exceptions early, and leave a
clear audit trail for month-end review.
**Pain Points**: Rebuilding context across exports, losing client follow-up in
email, and repeating the same manual comparisons every week.

### Secondary Persona: Andre, Firm Owner

**Role**: Owner of a 12-person bookkeeping firm.
**Goals**: Increase client capacity without hiring another reviewer and reduce
month-end surprises.
**Pain Points**: Spreadsheet-based processes do not scale, and quality depends
too heavily on individual reviewer habits.

## Requirements

Each requirement traces to the Product Vision goal of reducing routine weekly
reconciliation time while preserving reviewer trust and exception ownership.

### Must Have (P0)

1. Import bank deposit CSV files and invoice export CSV files for a client.
2. Generate match suggestions using amount, date, payer, and invoice metadata.
3. Require reviewer approval before a suggested match becomes accepted.
4. Preserve match evidence, reviewer, timestamp, and source rows.
5. Create an exception for every unmatched or low-confidence deposit.

### Should Have (P1)

1. Support split deposits that pay multiple invoices.
2. Export a client-level reconciliation report.
3. Assign exception owners and due dates.

### Nice to Have (P2)

1. Bank feed integration.
2. Accounting platform API sync.
3. Client-facing question portal.

## Functional Requirements

### Import

- The system accepts CSV uploads for bank deposits and invoice exports.
- The user maps required columns on first import for each client.
- The system rejects any file that is missing an amount, date, or identifier column.

### Match Review

- The system suggests matches with a confidence label and evidence summary.
- The reviewer can accept, reject, split, or flag each suggestion.
- Accepted matches are immutable except through a recorded correction.

### Exceptions

- The system creates an exception for every deposit without an accepted match.
- Each exception has status, owner, next action, and due date.
- Reviewers can export exceptions by client.

## Acceptance Test Sketches

| Requirement | Scenario | Input | Expected Output |
|-------------|----------|-------|-----------------|
| Import CSV files | Reviewer uploads bank and invoice exports | Two valid CSV files for one client | Imported deposits and invoices appear in review queue |
| Generate suggestions | Deposit amount and payer match open invoice | Deposit for 1200.00 from Acme Dental; invoice INV-104 for 1200.00 | High-confidence suggestion links deposit to invoice |
| Require approval | Reviewer views suggested match | Suggested match with evidence | Match remains pending until reviewer accepts |
| Preserve evidence | Reviewer accepts suggestion | Accepted match | Audit log records source rows, reviewer, timestamp, and evidence |
| Create exceptions | Deposit has no likely invoice | Deposit for 847.13 with no matching invoice | Exception is created with status `needs-review` |

## Technical Context

- **Language/Runtime**: TypeScript 5.x on Node 20+
- **Key Libraries**: React 18 for UI, Fastify 5 for API, Papa Parse for CSV
- **Data/Storage**: PostgreSQL 16
- **APIs**: Internal REST API; no external accounting API in v1
- **Platform Targets**: Modern desktop browsers; Chrome, Edge, Firefox, Safari

## Constraints, Assumptions, Dependencies

### Constraints

- **Technical**: CSV import is the only v1 data ingestion path.
- **Business**: First release must support a firm with up to 25 active clients.
- **Legal/Compliance**: Customer financial data must be encrypted at rest and
  excluded from analytics events.

### Assumptions

- Firms can export invoice data from their current accounting system.
- Weekly reconciliation is the first workflow worth optimizing.
- Reviewers will trust suggestions only when evidence is visible.

### Dependencies

- Sample CSV exports from at least three accounting systems.
- Security review for financial-data handling.
- Firm owner approval of audit-log retention policy.

## Risks

| Risk | Probability | Impact | Mitigation |
|------|-------------|--------|------------|
| CSV formats vary too much across clients | High | Medium | Add per-client column mapping and save mappings after first import |
| Suggestions look opaque and reviewers reject them | Medium | High | Show amount, date, payer, and invoice evidence beside every suggestion |
| Split payments are common enough to block adoption | Medium | Medium | Include split deposit support as P1 before paid launch |

## Open Questions

- [ ] Which three accounting exports define the v1 CSV compatibility set? - ask pilot firms.
- [ ] What audit-log retention period do firms require? - ask firm owners and legal reviewer.
- [ ] Should low-confidence suggestions appear in review or go straight to exceptions? - ask pilot reviewers.

## Success Criteria

DepositMatch is successful when pilot firms reconcile routine weekly deposits
from one workspace, at least 95% of accepted suggestions hold up in the weekly
audit sample, and at least 90% of unresolved deposits have a named owner and
next action within one business day.

Reference

| Field | Value |
|---|---|
| Activity | Frame: define what the system should do, for whom, and how success will be measured. |
| Default location | `docs/helix/01-frame/prd.md` |
| Requires | None |
| Enables | None |
| Informs | Principles, Feature Specification, User Stories, Feature Registry |
| Referenced by | Solution Design, Test Plan |
| HELIX documents | `docs/helix/01-frame/prd.md` |

Generation prompt

# PRD Generation Prompt

Create a PRD that frames the problem, product scope, priorities, and success
criteria clearly enough that downstream feature specs, designs, tests, and
implementation work can trace back to it.

## Variant Selection (`kind`)

The PRD has two framing variants selected by the `kind` field in `meta.yml`:

- **`kind: product`** (default) — frames general product requirements.
- **`kind: data`** — frames a data product (pipeline, warehouse, data
  platform, or data service) with data sources, consumers, quality
  contracts, and platform context.

The shape of the artifact is unified; sections marked **(kind: data)** below
swap in the data-product framing when `kind: data`. Per ADR-008, both
framings share one role and one template — pick the variant that matches the
authored artifact and follow its conditional guidance.

## Storage Location

Store at: `docs/helix/01-frame/prd.md` (for either variant).

## Purpose

The PRD is the **product-scope authority for what to build and why**. Its
unique job is to translate the Product Vision into prioritized, measurable
requirements and boundaries. It sits between the product vision (which defines
direction) and feature specs (which define feature-level detail). Every design
decision and implementation choice should trace back to a PRD requirement.

**(kind: data)** When `kind: data`, the PRD is the **data-product-scope
authority for what data to build and why**. Its job is to translate business
intent into data-centric requirements: data sources, consumer personas,
quality contracts, technical constraints (catalog, schema, medallion layer),
and measurable success metrics. It sits between the Product Vision and the
data-architecture artifact. Every data pipeline design choice and quality
expectation should trace back to a Data PRD requirement.

## Reference Anchors

Use these local resource summaries as grounding:

- `docs/resources/atlassian-prd.md` frames a PRD as shared understanding of purpose, behavior, user needs, assumptions, out-of-scope items, and success criteria.
- `docs/resources/aha-prd-template.md` supports concise cross-functional scope: what is being built, who it is for, and how it delivers value.
- `docs/resources/ibm-requirements-management.md` grounds measurable, prioritized, traceable requirements and validation/verification discipline.

**(kind: data)** When `kind: data`, also use:

- `docs/resources/databricks-unity-catalog.md` grounds data governance through unified catalog hierarchies (metastore → catalog → schema → volume/table).
- `docs/resources/databricks-lakehouse-medallion-architecture.md` grounds medallion topology (Bronze/Silver/Gold) and layer responsibilities in a Lakehouse.
- `docs/resources/databricks-sdp.md` grounds Databricks Spark Declarative Pipelines (SDP) governance, lineage, and quality contracts through `EXPECT ... ON VIOLATION ...` clauses and SDP-aware pipeline patterns.

If you adopt this on another data platform, substitute Databricks concepts
with the platform equivalent (Snowflake DB/Schema/Table; BigQuery
Project/Dataset/Table; Snowpipe/Streaming Inserts for Auto Loader;
dbt/Great Expectations for SDP `EXPECT` clauses). The medallion pattern
applies universally.

## Key Principles

- **Problem first** — the problem section should make someone feel the pain
  before they see the solution.
- **Decision-oriented** — every section should help someone make a build/skip
  decision. If a section doesn't inform a decision, it's filler.
- **Testable requirements** — every P0 requirement should be verifiable. If
  you can't describe how to test it, it's too vague.
- **Traceable boundaries** — requirements should connect upward to the Product
  Vision and downward to feature specs, designs, tests, and build work.
- **Honest non-goals** — non-goals should exclude things someone might
  reasonably expect to be in scope. "Not a replacement for X" only matters if
  someone might assume it is.

## Stay in Your Lane

Product requirements are for product scope. If you find yourself writing about:

| This content | Belongs in |
|---|---|
| Market sizing, ROI, investment case | `00-discover/business-case.md` |
| Positioning, target market, long-horizon strategic success | `00-discover/product-vision.md` |
| Detailed feature behavior and edge cases | `01-frame/features/FEAT-*.md` |
| User journey phrasing independent of product-level requirements | `01-frame/user-stories.md` |
| Exact command invocations, CLI flags, endpoints, schemas, payloads, error codes, config keys, or adapter signatures | `02-design/contracts/` |
| Architecture choices or implementation approach | `02-design/` |
| Detailed test cases and fixtures | `03-test/` |
| Build sequencing and execution slices | `04-build/implementation-plan.md` |

## Section-by-Section Guidance

### Summary
Write this last. This section must work as a **standalone 1-pager**: what
we're building, who uses it, the problem, the solution approach, and the top
2-3 success metrics. Someone who reads only this section should understand the
product well enough to decide whether to invest time in the full PRD. Test:
could a new team member read this alone and explain the product to someone
else?

### Problem
Describe the failure mode, not the absence of your solution. "Users don't have
an X" is weak. "Users spend N hours/week doing Y manually because Z doesn't
exist, leading to W failures" is strong. Quantify where possible.

### Goals
Each goal should describe a state change, not an activity. "Build a dashboard"
is an activity. "Operators can see system health without SSH" is a state
change.

### Success Metrics
Every metric needs three things: what you're measuring, a numeric target, and
how you'll measure it. "User satisfaction" is not a metric. "NPS > 40 from
monthly survey of active users" is. Drop the Timeline column — success metrics
should define what success looks like, not when it happens.
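
The three-part rule above lends itself to a mechanical check. The following is a minimal sketch, assuming the metrics sit in a Markdown table with the `Metric | Target | Measurement Method` columns used by the template. The helper names `parse_table` and `check_metrics` and the "target contains a digit" heuristic are illustrative assumptions, not part of any HELIX tooling.

```python
import re


def parse_table(markdown: str) -> list[dict[str, str]]:
    """Parse a simple pipe-delimited Markdown table into a list of row dicts."""
    lines = [ln.strip() for ln in markdown.strip().splitlines() if ln.strip().startswith("|")]
    if len(lines) < 2:
        return []
    header = [cell.strip() for cell in lines[0].strip("|").split("|")]
    rows = []
    for ln in lines[2:]:  # skip the header row and the separator row
        cells = [cell.strip() for cell in ln.strip("|").split("|")]
        rows.append(dict(zip(header, cells)))
    return rows


def check_metrics(markdown: str) -> list[str]:
    """Return one problem string per missing numeric target or measurement method."""
    problems = []
    for row in parse_table(markdown):
        name = row.get("Metric", "")
        if not re.search(r"\d", row.get("Target", "")):
            problems.append(f"{name}: target has no number")
        if not row.get("Measurement Method", ""):
            problems.append(f"{name}: no measurement method")
    return problems


table = """
| Metric | Target | Measurement Method |
|--------|--------|--------------------|
| Median reconciliation time | Under 3 minutes per client per week | In-product workflow timing |
| User satisfaction | High | |
"""
print(check_metrics(table))
```

A digit in the target is only a proxy for "numeric target"; a reviewer still has to judge whether the number is meaningful.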

**(kind: data)** When `kind: data`, frame metrics for the data product
itself: throughput (rows/day), latency (max age end-to-end), quality score
(percentage of expectations passing), cost per GB or per DBU, freshness-SLA
compliance, and consumer satisfaction. Each metric still needs a numeric
target and a named measurement method — e.g., "SLA compliance > 95% measured
by on-time delivery vs. promised refresh cadence."
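
As an illustration of a named measurement method, the SLA-compliance metric quoted above could be computed as the share of refreshes delivered on or before their promised time. This is a sketch under that assumption; the function name `sla_compliance` and the sample schedule are hypothetical.

```python
from datetime import datetime, timedelta


def sla_compliance(promised: list[datetime], delivered: list[datetime]) -> float:
    """Fraction of refreshes delivered at or before their promised time."""
    if not promised or len(promised) != len(delivered):
        raise ValueError("need one delivery time per promised refresh")
    on_time = sum(1 for due, done in zip(promised, delivered) if done <= due)
    return on_time / len(promised)


# Hypothetical schedule: 20 hourly refreshes, one of them 30 minutes late.
start = datetime(2026, 1, 5, 0, 0)
promised = [start + timedelta(hours=h) for h in range(1, 21)]
delivered = [due - timedelta(minutes=5) for due in promised]
delivered[7] = promised[7] + timedelta(minutes=30)

rate = sla_compliance(promised, delivered)
print(f"{rate:.0%}", "meets target" if rate > 0.95 else "misses target")
```

With one late refresh out of twenty the rate is exactly 95%, which misses a "> 95%" target. Writing the comparison down this way forces the PRD author to decide whether the boundary value counts.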

### Non-Goals
Each non-goal should prevent scope creep on something plausible. "Not a
general-purpose AI" is only useful if someone might think it is. Test: would
someone on the team argue for including this? If not, it's not a useful
non-goal.

### Personas
Name them. Give them a role, goals, and pain points specific enough to
validate with a real person. "Alex the Developer" with generic goals is a
template, not a persona.

**(kind: data)** When `kind: data`, frame this section as **Data Consumers**
instead of personas. Name actual teams, systems, or roles that consume the
data, their use case (what decisions they make), their freshness/latency SLA,
the key dimensions they query, and their access level (row, column, full).
Add an inventory of upstream **Data Sources** in a parallel section: source
system, schema/table, owner, update frequency, quality baseline, and notes
on API limits or retry policy. Generic personas are insufficient for a data
product — the consumer and source tables drive every downstream design and
quality decision.

### Requirements (P0/P1/P2)
P0 = the product is broken without this. P1 = the product is weak without
this. P2 = the product is better with this. If you have more than 7 P0s,
you're not prioritizing.
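
The seven-item ceiling is easy to check automatically. A minimal sketch, assuming P0 items are written as a numbered list under a `### Must Have (P0)` heading as in the template; `count_p0` is a hypothetical helper name.

```python
import re


def count_p0(prd: str) -> int:
    """Count numbered list items under the '### Must Have (P0)' heading."""
    count, in_p0 = 0, False
    for line in prd.splitlines():
        if line.startswith("#"):
            # Any heading either opens the P0 section or closes it.
            in_p0 = line.strip() == "### Must Have (P0)"
        elif in_p0 and re.match(r"\d+\.\s+\S", line):
            count += 1
    return count


sample = """\
### Must Have (P0)

1. Import bank deposit CSV files and invoice export CSV files for a client.
2. Generate match suggestions using amount, date, payer, and invoice metadata.
3. Require reviewer approval before a suggested match becomes accepted.

### Should Have (P1)

1. Support split deposits that pay multiple invoices.
"""
print(count_p0(sample), count_p0(sample) <= 7)
```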

Each requirement should be stable enough to trace into feature specs and tests.
If a requirement describes a screen, algorithm, API field, command flag,
payload, error code, or implementation sequence in detail, move that detail to
the owning downstream artifact and keep the PRD at product scope. Exact
interface surface belongs in a Contract.

### Functional Requirements
Functional requirements are product-level capability and outcome requirements
grouped by subsystem. Each one should be testable without defining exact
interface, API, CLI, event, schema, config, telemetry, or adapter surface.

**(kind: data)** When `kind: data`, the functional requirements describe
**data behavior**: ingestion cadence, deduplication rules, transformation
contracts, freshness windows, schema-evolution policy, and consumer-facing
table or feed definitions. Add a **Data Quality Requirements** subsection
with quality dimensions (completeness, timeliness, accuracy, uniqueness)
each carrying a P0 threshold, a P1 threshold, a measurement method, and an
enforcement strategy (alert, reject, quarantine). Reference the
`data-quality-expectations` artifact for executable `EXPECT` clauses per
medallion layer; the PRD owns the requirement, the expectations artifact
owns the contract.
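
To make the quality dimensions concrete, here is a sketch of how completeness and uniqueness might be measured on an in-memory batch and mapped to an enforcement action. The function names, the sample rows, and the 0.99 threshold are illustrative assumptions; executable expectations belong in the `data-quality-expectations` artifact, not in the PRD.

```python
def completeness(rows: list[dict], column: str) -> float:
    """Share of rows with a non-null value in `column` (1 - nulls / total)."""
    if not rows:
        raise ValueError("cannot measure completeness of an empty batch")
    non_null = sum(1 for row in rows if row.get(column) is not None)
    return non_null / len(rows)


def is_unique(rows: list[dict], key: str) -> bool:
    """Mirror COUNT(*) = COUNT(DISTINCT key), assuming the key is never null."""
    keys = [row[key] for row in rows]
    return len(keys) == len(set(keys))


def enforce(value: float, threshold: float, action: str) -> str:
    """Return 'pass', or the given enforcement action when the threshold is missed."""
    return "pass" if value >= threshold else action


# Hypothetical batch: one missing email and one duplicated id.
batch = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 2, "email": "c@example.com"},
    {"id": 3, "email": "d@example.com"},
]
print(completeness(batch, "email"))                              # 0.75
print(is_unique(batch, "id"))                                    # False
print(enforce(completeness(batch, "email"), 0.99, "quarantine"))  # quarantine
```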

**Group requirements under named subsystems.** Organize FRs by subsystem (not by
priority) under canonical, parseable headings: `### Subsystem: <name>`. Each
`FR-n` belongs to **exactly one** subsystem. A subsystem is a cohesive capability
of the product — the unit that becomes a feature: **one subsystem maps to ~one
feature spec** (`FEAT-NNN`). This is the PRD↔FEAT boundary — the PRD owns
*breadth* (it names every subsystem and enumerates all `FR-n` + priorities); the
feature spec owns *depth* (one subsystem's behavior, ACs, edge cases). A
multi-subsystem product must not collapse into a single mega-feature, nor should
one subsystem fragment into many tiny features. (The feature spec's Decomposition
test resolves borderline cases; reconcile-alignment checks that every subsystem
maps to a feature.)

Give each functional requirement a **stable `FR-n` ID** (`FR-1`, `FR-2`, …). The
ID is the trace anchor: every `FR-n` must be covered by ≥1 user story, and
reconcile-alignment flags any `FR-n` with no story as a coverage gap. Number
sequentially and never renumber on edit — downstream stories reference the ID by
name.
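
The "exactly one subsystem" rule can be checked with a small parser. This sketch assumes the canonical `### Subsystem: <name>` headings and bold `FR-n` list items from the template; it is not the reconcile-alignment check itself, and `map_frs` and `violations` are hypothetical names. The item pattern anchors only on the bold ID, so the separator after it does not matter.

```python
import re

SUBSYSTEM = re.compile(r"^### Subsystem: (.+)$")
FR_ITEM = re.compile(r"^- \*\*(FR-\d+)\*\*")


def map_frs(prd: str) -> dict[str, list[str]]:
    """Map each FR-n ID to the subsystem headings it is listed under."""
    owners: dict[str, list[str]] = {}
    current = None
    for line in prd.splitlines():
        heading = SUBSYSTEM.match(line)
        if heading:
            current = heading.group(1).strip()
        elif line.startswith("#"):
            current = None  # any other heading closes the subsystem
        else:
            item = FR_ITEM.match(line)
            if item:
                owners.setdefault(item.group(1), []).append(current or "(none)")
    return owners


def violations(prd: str) -> list[str]:
    """FR IDs listed under zero subsystems or under more than one."""
    return sorted(
        fr for fr, subs in map_frs(prd).items()
        if len(subs) != 1 or subs[0] == "(none)"
    )


sample = """\
### Subsystem: Import
- **FR-1**: Accept CSV uploads for bank deposits and invoice exports.
### Subsystem: Match Review
- **FR-2**: Suggest matches with a confidence label and evidence summary.
- **FR-1**: Duplicate listing of the import requirement.
## Acceptance Test Sketches
- **FR-3**: Stray requirement outside any subsystem.
"""
print(violations(sample))  # ['FR-1', 'FR-3']
```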

**Surface boundary example.** Do not write a PRD requirement as
``FR-1: `tool autotune --gpu-tier high --write-config` writes config.toml``.
Write the product requirement as `FR-1: Operators can generate a validated
runtime profile for the selected hardware tier and persist it for future runs`,
then add a handoff or link to a Contract for the exact command, flags, config
keys, exit codes, and compatibility rules.
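
A rough lint can catch the most obvious surface leaks before review. The patterns below are heuristic assumptions chosen for illustration (CLI flags, common config file names, HTTP verb plus path); they will miss cases and may flag legitimate text, so treat a hit as a prompt to look, not a verdict. `surface_leaks` is a hypothetical helper.

```python
import re

# Heuristic patterns for interface surface that belongs in a Contract.
SURFACE_PATTERNS = {
    "CLI flag": re.compile(r"(?<!\S)--[a-z][\w-]*"),
    "config or file name": re.compile(r"\b[\w-]+\.(?:toml|ya?ml|json)\b"),
    "HTTP endpoint": re.compile(r"\b(?:GET|POST|PUT|PATCH|DELETE)\s+/\S*"),
}


def surface_leaks(requirement: str) -> list[str]:
    """Return the labels of every surface pattern found in the requirement text."""
    return [label for label, pattern in SURFACE_PATTERNS.items() if pattern.search(requirement)]


leaky = "FR-1: `tool autotune --gpu-tier high --write-config` writes config.toml"
clean = (
    "FR-1: Operators can generate a validated runtime profile for the "
    "selected hardware tier and persist it for future runs"
)
print(surface_leaks(leaky))  # ['CLI flag', 'config or file name']
print(surface_leaks(clean))  # []
```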

### Acceptance Test Sketches
For each P0 requirement, write a concrete scenario: what the user does, what
input they provide, and what observable result they see. These aren't full test
cases — they're the minimum an implementer (human or agent) needs to verify
the requirement is met. An AI agent should be able to read a sketch and write
a passing test without asking clarifying questions.
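
To show what "read a sketch and write a passing test" can look like, the following turns two rows of the DepositMatch example (the 1200.00 Acme Dental deposit and the 847.13 deposit with no invoice) into tests. The `Deposit` and `Invoice` types and the `suggest_match` function are hypothetical stand-ins invented for this illustration; the real matching logic and its interface would be defined in feature specs and Contracts.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Deposit:
    payer: str
    amount: Decimal


@dataclass(frozen=True)
class Invoice:
    invoice_id: str
    customer: str
    amount: Decimal


def suggest_match(deposit: Deposit, invoices: list[Invoice]):
    """Hypothetical matcher: an exact amount and payer match is 'high' confidence."""
    for invoice in invoices:
        if invoice.amount == deposit.amount and invoice.customer == deposit.payer:
            return (invoice.invoice_id, "high")
    return None  # in the product, this deposit would become an exception


OPEN_INVOICES = [Invoice("INV-104", "Acme Dental", Decimal("1200.00"))]


def test_matching_amount_and_payer_gives_high_confidence():
    deposit = Deposit("Acme Dental", Decimal("1200.00"))
    assert suggest_match(deposit, OPEN_INVOICES) == ("INV-104", "high")


def test_deposit_without_matching_invoice_gets_no_suggestion():
    deposit = Deposit("Unknown Payer", Decimal("847.13"))
    assert suggest_match(deposit, OPEN_INVOICES) is None


test_matching_amount_and_payer_gives_high_confidence()
test_deposit_without_matching_invoice_gets_no_suggestion()
```

The sketch supplies the inputs and the observable result; everything else in the test is implementation detail the PRD deliberately leaves open.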

### Technical Context
Name the stack, key dependencies with versions, API schemas, and platform
targets. Be specific enough that an implementer knows what to install and what
interfaces to code against. "React" is not enough — "React 18 with TypeScript
5.x and Vite 6" is. If there's an API schema (OpenAPI, GraphQL SDL), point to
it. This section exists because AI agents need concrete dependency and
interface information to produce correct implementations.

**Important**: This section records stack decisions — it does not make them.
Stack selection rationale belongs in ADRs (Architecture Decision Records). If
you're documenting a choice that doesn't have an ADR yet, note it in Open
Questions. If an existing ADR contradicts what you'd write here, the ADR
governs until it's superseded.

**(kind: data)** When `kind: data`, frame the technical context as the
**data-platform context**: target catalog and schema (e.g., `prod.customer_360`),
medallion layer strategy (Bronze/Silver/Gold scopes and responsibilities),
ingestion pattern (Auto Loader, Streaming Tables, batch), processing model
(streaming vs batch vs incremental), compute tier (all-purpose, jobs,
serverless), storage format (Delta, Parquet), DBU budget assumption, and
governance posture (data classification, retention policy, audit trail,
lineage). Pin the access-control model: row-level security, column masking,
and the catalog policies that enforce it. Same ADR discipline applies — the
PRD records platform decisions, ADRs justify them.

### Constraints
Name real constraints, not aspirational ones. "Must work on mobile" is a
constraint only if you'd otherwise skip it. Budget, compliance, and platform
constraints matter most.

### Assumptions
These are bets. When an assumption is wrong, the plan breaks. Name each one
so the team knows what to watch.

### Risks
Each risk needs a concrete mitigation, not "monitor closely." If the
mitigation is monitoring, say what you'll monitor and what triggers action.

### Open Questions
List unresolved items explicitly rather than leaving `[TBD]` markers
scattered through the document. Each question should name who can answer it
and what's blocked by it. This section is honest about what you don't know
yet — it's better to have a clear list of unknowns than a document that
pretends to be complete.
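
The matching blocking check (no placeholder markers outside Open Questions) can be scripted. A minimal sketch, assuming the PRD uses a level-two `## Open Questions` heading as in the template and example; `stray_markers` is a hypothetical helper.

```python
import re

MARKER = re.compile(r"\[(?:TBD|TODO|NEEDS CLARIFICATION)\]")


def stray_markers(prd: str) -> list[tuple[int, str]]:
    """Return (line number, marker) pairs found outside the Open Questions section."""
    found, in_open_questions = [], False
    for number, line in enumerate(prd.splitlines(), start=1):
        if line.startswith("## "):
            # Each level-two heading opens or closes the Open Questions section.
            in_open_questions = line.strip() == "## Open Questions"
        if not in_open_questions:
            found.extend((number, match.group(0)) for match in MARKER.finditer(line))
    return found


sample = """\
## Summary
Launch scope is [TBD].
## Open Questions
- [ ] [NEEDS CLARIFICATION] Which exports define the v1 compatibility set?
## Risks
Mitigation: [TODO]
"""
print(stray_markers(sample))  # [(2, '[TBD]'), (6, '[TODO]')]
```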

### Success Criteria
These are the acceptance criteria for the entire initiative. They should be
observable outcomes ("operators can do X without Y") not activities ("we
shipped Z").

## Quality Checklist

After drafting, verify every item. If any blocking check fails, revise before
committing.

### Blocking

- [ ] Problem section quantifies the pain or names a specific failure mode
- [ ] Every P0 requirement is testable (someone could write an acceptance test)
- [ ] Every P0 has an acceptance test sketch with inputs and expected outputs
- [ ] Success metrics have numeric targets and named measurement methods
- [ ] Requirements trace upward to the Product Vision and downward to downstream artifacts
- [ ] No `[TBD]`, `[TODO]`, or `[NEEDS CLARIFICATION]` markers in any section except Open Questions
- [ ] Non-goals exclude something a reasonable person might assume is in scope
- [ ] Personas are specific enough to validate with a real user

### Warning

- [ ] Summary works as a standalone 1-pager (problem, solution, metrics)
- [ ] Goals describe state changes, not activities
- [ ] Risk mitigations are concrete actions, not "monitor"
- [ ] P0 requirements number 7 or fewer
- [ ] Assumptions are falsifiable
- [ ] Functional requirements are grouped under canonical `### Subsystem: <name>` headings (each `FR-n` under exactly one subsystem); each subsystem is a capability that maps to ~one feature spec
- [ ] Each functional requirement carries a stable `FR-n` ID for downstream story traceability
- [ ] Technical Context names specific versions, not just library names
- [ ] Open Questions name who can answer and what's blocked

Template

---
ddx:
  id: prd
kind: product  # `product` (default) frames general product requirements; `data` frames a data product (pipeline, warehouse, data platform, or service). See ADR-008.
---

# Product Requirements Document

> **Variant guidance.** This template carries both the default `product`
> framing and a `data` framing. Sections marked **(kind: data)** apply when
> `kind: data` and replace the corresponding `product` framing above them.
> When `kind: product`, ignore the **(kind: data)** blocks. The shape of the
> document is the same; the framing is parameterized.

## Summary

[This section should work as a standalone 1-pager. Include: what we're
building, who uses it, what problem it solves, the solution approach, and the
top 2-3 success metrics. Write this last — it should be a distillation of the
full PRD, not an introduction. Someone who reads only this section should
understand the product well enough to decide whether to read the rest.

**(kind: data)** Frame this as a standalone 1-pager for the **data product**:
what data we are building, who consumes it, the business problem it solves,
the data solution approach (sources, medallion layer strategy, consumption
shape), and the top 2-3 success metrics (freshness, quality, cost).]

## Problem and Goals

### Problem

[What is broken or missing? Who is affected? Be specific about the failure
mode — not "users struggle with X" but "users spend N hours per week doing X
because Y doesn't exist."

**(kind: data)** Be specific about the data failure mode — not "users
struggle with reporting" but "sales analysts spend 4 hours per week
reconciling pipeline outputs with source systems because current freshness
is 24 hours and source data changes hourly."]

### Goals

1. [Primary goal — what changes for users]
2. [Secondary goal]

### Success Metrics

| Metric | Target | Measurement Method |
|--------|--------|--------------------|
| [Metric] | [Numeric target] | [Named tool or process] |

**(kind: data)** When `kind: data`, frame metrics for the data product
itself (throughput, latency, quality score, cost per GB). Include a baseline
and cadence column:

| Metric | Target | Baseline | Measurement Method | Cadence |
|--------|--------|----------|--------------------|---------|
| [Throughput] | [e.g., 1M rows/day] | [Current: 100K rows/day] | [COUNT(*) from production table] | Daily |
| [Latency] | [e.g., ≤1 hour end-to-end] | [Current: 4 hours] | [MAX(ingestion_timestamp) - MAX(source_timestamp)] | Hourly |
| [Quality Score] | [e.g., ≥98%] | [Current: 85%] | [Automated quality checks pass rate] | Daily |
| [Cost per GB] | [e.g., $0.05/GB/month] | [Current: $0.12/GB/month] | [DBU spend / data volume] | Monthly |

### Non-Goals

[What we are explicitly not trying to achieve. Each non-goal should exclude
something a reasonable person might assume is in scope.]

Deferred items are tracked in `docs/helix/parking-lot.md`.

## Users and Scope

### Primary Persona: [Name]

**Role**: [Job title/function]
**Goals**: [What they want to achieve]
**Pain Points**: [Current frustrations — specific enough to validate]

### Secondary Persona: [Name]

[Same structure]

### (kind: data) Data Consumers

[When `kind: data`, replace the persona blocks with concrete data consumers.]

#### Primary Consumer: [Name/Role]

**Team**: [Data Engineering, Analytics, Product, Finance, etc.]
**Use Case**: [What they do with the data; what decision it informs]
**Frequency**: [Real-time, daily, weekly, ad-hoc]
**Key Tables/Feeds**: [Which outputs matter most]

#### Data Consumer Requirements Table

| Consumer | Use Case | Freshness SLA | Latency Tolerance | Key Dimensions | Access Level |
|----------|----------|---------------|-------------------|----------------|--------------|
| [Team] | [What they do] | [e.g., hourly] | [max delay] | [customer_id, product_id, ...] | [Row-level, Column-level, or Full] |

### (kind: data) Data Sources

[Inventory of upstream systems supplying this data product.]

| Source System | Schema / Table | Owner | Update Frequency | Quality Baseline | Notes |
|---------------|----------------|-------|------------------|------------------|-------|
| [e.g., Salesforce] | [e.g., Accounts, Opportunities] | [Team] | [hourly, daily, on-demand] | [% completeness, freshness] | [Data model version, API limits, retry policy] |

## Requirements

Each requirement should trace to the Product Vision and be specific enough to
drive feature specs, designs, tests, and implementation work without embedding
the detailed design here. Exact command invocations, CLI flags, endpoints,
schemas, payloads, error codes, config keys, telemetry fields, event shapes,
and adapter signatures belong in `02-design/contracts/`, not in PRD
requirements.

### Must Have (P0)

1. [Core capability — what must be true for the product to be usable]

### Should Have (P1)

1. [Important feature — valuable but not blocking launch]

### Nice to Have (P2)

1. [Enhancement — improves experience but can be deferred]

## Functional Requirements

[Product-level capability and outcome requirements grouped under canonical
`### Subsystem: <name>` headings. Each requirement is testable without defining
exact API, CLI, event, schema, config, telemetry, or adapter surface. Each
`FR-n` belongs to exactly one subsystem. A subsystem is a cohesive product
capability — the unit that maps to ~one feature spec (`FEAT-NNN`). The PRD owns
breadth (all subsystems + `FR-n` + priority); feature specs own each subsystem's
depth, and Contracts own exact shared interface surfaces.

Each functional requirement carries a **stable `FR-n` ID** (e.g. `FR-1`). The ID
survives edits so downstream artifacts trace to a specific requirement by name:
every `FR-n` must map to ≥1 user story `US-n`, and reconcile-alignment checks that
mapping (and that each subsystem maps to a feature) as a coverage floor. Number
them sequentially; do not renumber on edit.]

### Subsystem: [Name — a cohesive capability that becomes ~one FEAT]

- **FR-1** — [product capability or outcome requirement, testable]
- **FR-2** — [product capability or outcome requirement, testable]

### Subsystem: [Name]

- **FR-3** — [product capability or outcome requirement, testable]
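
The coverage floor described above (each `FR-n` under exactly one subsystem, each `FR-n` mapped to at least one `US-n`) can be checked mechanically. The following is a minimal sketch, not the actual reconcile-alignment implementation: the function names `parse_subsystems` and `coverage_gaps`, and the shape of the story mapping (a dict from FR ID to a list of US IDs), are assumptions made for illustration.

```python
import re
from collections import defaultdict

# Patterns follow the heading and bullet conventions used in this template.
SUBSYSTEM_RE = re.compile(r"^### Subsystem:\s*(.+?)\s*$")
FR_RE = re.compile(r"^- \*\*(FR-\d+)\*\*")


def parse_subsystems(prd_text):
    """Map each FR-n ID to the list of subsystem names it appears under."""
    owners = defaultdict(list)
    current = None
    for line in prd_text.splitlines():
        heading = SUBSYSTEM_RE.match(line)
        if heading:
            current = heading.group(1)
            continue
        if line.startswith("#"):
            current = None  # any other heading ends the current subsystem
            continue
        fr = FR_RE.match(line)
        if fr and current is not None:
            owners[fr.group(1)].append(current)
    return dict(owners)


def coverage_gaps(owners, story_map):
    """Report FR IDs listed under more than one subsystem or lacking a user story.

    `story_map` is a hypothetical structure: {"FR-1": ["US-1", ...], ...}.
    """
    def by_number(fr_id):
        return int(fr_id.split("-")[1])

    multi = sorted((fr for fr, subs in owners.items() if len(subs) != 1), key=by_number)
    unmapped = sorted((fr for fr in owners if not story_map.get(fr)), key=by_number)
    return {"multiple_subsystems": multi, "no_user_story": unmapped}


# Illustrative input; only the ID prefix of each bullet is parsed.
SAMPLE_PRD = """\
### Subsystem: Ingestion

- **FR-1** loads files
- **FR-2** rejects bad rows

### Subsystem: Reporting

- **FR-3** shows totals
"""
SAMPLE_STORIES = {"FR-1": ["US-1"], "FR-2": ["US-2", "US-3"]}
SAMPLE_GAPS = coverage_gaps(parse_subsystems(SAMPLE_PRD), SAMPLE_STORIES)
```

In this sample, `FR-3` has no user story, so it is the only entry reported under `no_user_story`. The check deliberately ignores the requirement text, since the stable ID is what downstream artifacts trace to.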

### (kind: data) Data Quality Requirements

[When `kind: data`, add this subsection. List each quality dimension with
numeric P0 and P1 thresholds, how it is measured, and how it is enforced.
Reference `data-quality-expectations` for executable `EXPECT` clauses per
medallion layer.]

| Dimension | P0 Threshold | P1 Threshold | Measurement Method | Enforcement |
|-----------|--------------|--------------|--------------------|-------------|
| Completeness | [e.g., ≥99%] | [e.g., ≥95%] | [Non-NULL values / total rows] | [Alert if below P0] |
| Timeliness | [e.g., ≤1 hour lag] | [e.g., ≤4 hour lag] | [MAX(ingestion_time) - MAX(source_time)] | [Reject data if exceeds P0] |
| Accuracy | [e.g., ≥98% match to source] | [e.g., ≥95% match] | [Row-count reconciliation + sample audit] | [Manual review + auto-reject if P0 fails] |
| Uniqueness | [e.g., PK has no duplicates] | [as P0] | [COUNT(*) = COUNT(DISTINCT PK)] | [Fail ingestion] |
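
To make the measurement methods in the table concrete, here is a minimal sketch of three of the dimensions evaluated over an in-memory batch of rows. It is an illustration only, not the `EXPECT` clause syntax referenced above. Completeness is taken here as the share of non-NULL values, the thresholds are the placeholder "e.g." values from the table, and the column names (`id`, `email`, `source_time`, `ingestion_time`) and function names are assumptions.

```python
from datetime import datetime, timedelta


def completeness(rows, column):
    """Share of rows whose `column` is not None (1.0 for an empty batch)."""
    if not rows:
        return 1.0
    non_null = sum(1 for r in rows if r.get(column) is not None)
    return non_null / len(rows)


def is_unique(rows, key):
    """True when no key value repeats, i.e. COUNT(*) = COUNT(DISTINCT key)."""
    values = [r[key] for r in rows]
    return len(values) == len(set(values))


def lag(rows):
    """MAX(ingestion_time) - MAX(source_time), as in the Timeliness row."""
    return max(r["ingestion_time"] for r in rows) - max(r["source_time"] for r in rows)


def evaluate(rows, *, column, key, min_completeness=0.99, max_lag=timedelta(hours=1)):
    """Return pass/fail per dimension against the given (placeholder) thresholds."""
    return {
        "completeness": completeness(rows, column) >= min_completeness,
        "uniqueness": is_unique(rows, key),
        "timeliness": lag(rows) <= max_lag,
    }


# Illustrative batch: one of two email values is missing.
T0 = datetime(2024, 1, 1, 12, 0)
ROWS = [
    {"id": 1, "email": "a@example.com",
     "source_time": T0, "ingestion_time": T0 + timedelta(minutes=30)},
    {"id": 2, "email": None,
     "source_time": T0 + timedelta(minutes=10), "ingestion_time": T0 + timedelta(minutes=40)},
]
RESULT = evaluate(ROWS, column="email", key="id")
```

With this batch, completeness is 0.5 and fails the 99% placeholder threshold, while uniqueness and timeliness (a 30-minute lag) pass. What happens on failure (alert, reject, fail ingestion) is the Enforcement column's concern and is left out of the sketch.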

## Acceptance Test Sketches

[For each P0 requirement, describe a concrete scenario with inputs and
expected outputs. These aren't full test cases — they're the minimum needed
for an implementer (human or agent) to verify the requirement is met.]

| Requirement | Scenario | Input | Expected Output |
|-------------|----------|-------|-----------------|
| [P0 requirement] | [What the user does] | [Specific input or state] | [Observable result] |
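
One way to keep these sketches verifiable is to treat each table row as data and run it against the system under test. The example below is a hedged illustration: the `Sketch` type, the `run_sketches` helper, the requirement label `P0-1`, and the `normalize_email` function are all hypothetical and stand in for whatever the product actually exposes.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Sketch:
    """One row of the acceptance table: requirement, scenario, input, expected output."""
    requirement: str
    scenario: str
    given: Any
    expected: Any


def run_sketches(sketches, system):
    """Run each sketch through `system`; return (requirement, scenario, actual) for mismatches."""
    failures = []
    for s in sketches:
        actual = system(s.given)
        if actual != s.expected:
            failures.append((s.requirement, s.scenario, actual))
    return failures


# Hypothetical system under test, used only to make the sketch runnable.
def normalize_email(raw):
    return raw.strip().lower()


SKETCHES = [
    Sketch("P0-1", "User submits an email with stray whitespace and capitals",
           "  Ada@Example.com ", "ada@example.com"),
    Sketch("P0-1", "User submits an already normalized email",
           "ada@example.com", "ada@example.com"),
]
FAILURES = run_sketches(SKETCHES, normalize_email)
```

An empty `FAILURES` list means every sketch's observable result matched. Keeping the rows as plain data mirrors the table, so a reviewer can compare the two side by side.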

## Technical Context

[Stack, key dependencies with versions, API schemas, and platform targets.
Be specific enough that an implementer knows what to install and what
interfaces to code against. This section records current stack decisions — it
does not make them. Stack selection rationale belongs in ADRs. If a choice
here isn't backed by an ADR yet, note it in Open Questions.]

- **Language/Runtime**: [e.g., TypeScript 5.x, Node 20+]
- **Key Libraries**: [e.g., React 18, Tailwind CSS 4]
- **Data/Storage**: [e.g., PostgreSQL 16, Redis 7]
- **APIs**: [e.g., OpenAPI spec at docs/api/v2.yaml]
- **Platform Targets**: [e.g., Linux, macOS; browser: Chrome/Firefox/Safari latest]

### (kind: data) Data Platform Context

[When `kind: data`, replace the stack list above with platform context.]

- **Target Catalog**: [e.g., `prod`, `analytics`, or domain-specific catalog]
- **Target Schema**: [e.g., `customer_360`, `payment_events`]
- **Medallion Layers**: Bronze (raw), Silver (validated), Gold (business)
- **Access Control Model**: [UC policies, row-level security, column masking]

| Feature | Decision | Rationale |
|---------|----------|-----------|
| Ingestion Pattern | [Auto Loader, Streaming Tables, batch] | [Why this choice?] |
| Processing Model | [Streaming, Batch, Incremental] | [Freshness SLA and cost tradeoff] |
| Compute Tier | [All-purpose, Jobs, Serverless] | [Workload characteristics, cost model] |
| Storage Format | [Delta, Parquet, CSV] | [Durability, query performance needs] |
| DBU Budget (Monthly) | [Estimated spend] | [Based on row volume, freshness, complexity] |

- **Data Classification**: [Public, Internal, Sensitive, PII]
- **Retention Policy**: [e.g., Bronze: 7 days, Silver: 90 days, Gold: 2 years]
- **Audit Trail**: [Who accessed what, when, why]
- **Lineage Tracking**: [Table-to-table dependencies for impact analysis]

## Constraints, Assumptions, Dependencies

### Constraints

- **Technical**: [Platform or technology limits]
- **Business**: [Budget, timeline, resource limits]
- **Legal/Compliance**: [Regulatory requirements]

### Assumptions

- [Key assumptions — what must be true for this plan to work]

### Dependencies

- [External systems, teams, or artifacts this work depends on]

## Risks

| Risk | Probability | Impact | Mitigation |
|------|-------------|--------|------------|
| [Risk] | High/Med/Low | High/Med/Low | [Concrete strategy, not "monitor"] |

## Open Questions

[Unresolved items that need answers before or during implementation. Each
should name who can answer it and what's blocked by it. Prefer explicit
questions here over `[TBD]` markers scattered through the document.]

- [ ] [Question] — blocks [what], ask [who]

## Success Criteria

[What must be true to call the initiative successful. These should be
observable outcomes, not activities.]

## Review Checklist

Use this checklist when reviewing a PRD artifact:

- [ ] Summary works as a standalone 1-pager — someone can decide whether to read the rest
- [ ] Problem statement describes a specific failure mode with concrete cost
- [ ] Goals are outcomes, not activities ("users can X" not "we build Y")
- [ ] Success metrics have numeric targets and named measurement methods
- [ ] Non-goals exclude things a reasonable person might assume are in scope
- [ ] Personas have specific pain points, not generic descriptions
- [ ] P0 requirements are necessary for launch — removing any one makes the product unusable
- [ ] P1/P2 requirements are correctly prioritized relative to each other
- [ ] Every P0 requirement has an acceptance test sketch
- [ ] Requirements trace upward to the Product Vision and downward to feature specs, designs, tests, and build work
- [ ] Functional requirements are testable — each can be verified with specific inputs and expected outputs
- [ ] Each functional requirement carries a stable `FR-n` ID so user stories can trace to it by name
- [ ] Functional requirements are grouped under canonical `### Subsystem: <name>` headings, each `FR-n` under exactly one subsystem; each subsystem is a capability that maps to ~one feature spec
- [ ] Technical context names specific versions and interfaces, not vague technology areas
- [ ] Risks have concrete mitigations ("we do X"), not vague strategies ("we monitor")
- [ ] Open questions name who can answer and what is blocked
- [ ] No contradictions between requirements sections
- [ ] PRD is consistent with the governing product vision