Skip to content

Data Design

Purpose

Design-level data architecture covering entities, stores, access patterns, constraints, and migration strategy.

Example

Show a worked example of this artifact
---
ddx:
  id: example.data-design.depositmatch
  depends_on:
    - example.solution-design.depositmatch
    - example.security-requirements.depositmatch
    - example.threat-model.depositmatch
---

# Data Design

## Data Summary

- Scope: DepositMatch CSV import, match review, exception ownership, and review
  log export for the CSV-first pilot.
- Storage systems: PostgreSQL for normalized records and audit data; object
  storage for temporary source CSV files.
- Main concerns: firm/client isolation, import validation, audit integrity,
  retention, restricted telemetry, and reversible schema migration.

## Entities and Stores

| Entity or Store | Purpose | Key Fields | Volume / Growth | Notes |
|-----------------|---------|------------|-----------------|-------|
| Firm | Tenant boundary for staff, clients, imports, and records | id, name, status | Low; one per pilot firm | All customer data scopes through firm_id |
| Client | Customer business managed by a firm | id, firm_id, name, status | 5-25 per pilot firm | Required for exception grouping |
| Import Batch | One uploaded CSV set for a client and period | id, firm_id, client_id, type, status, uploaded_by, source_file_ref | Weekly per client | Stores processing state, not raw CSV content |
| Deposit Record | Normalized bank deposit row | id, firm_id, client_id, batch_id, amount, date, payer_ref, normalized_hash | Hundreds per client/week | Restricted financial data |
| Invoice Record | Normalized invoice export row | id, firm_id, client_id, batch_id, invoice_ref, amount_due, due_date, payer_ref | Hundreds per client/week | Restricted business/customer data |
| Match Suggestion | Candidate link between deposit and invoice records | id, firm_id, client_id, deposit_id, invoice_id, confidence, evidence_json, status | Derived per import | Must be reviewable before acceptance |
| Review Decision | Reviewer action on a suggestion or exception | id, firm_id, client_id, actor_id, action, source_refs, created_at | Grows with review activity | Audit record; immutable after write |
| Exception Item | Unresolved deposit needing ownership and next action | id, firm_id, client_id, deposit_id, owner_id, next_action, status, due_at | Small subset of deposits | Drives follow-up queue |
| Source File Object | Temporary original CSV file | object_key, firm_id, client_id, batch_id, retention_until | Short-lived | Delete according to retention policy |

## Relationships

| From | To | Type | Cardinality | On Delete |
|------|----|------|-------------|-----------|
| Firm | Client | 1:N | Required | RESTRICT |
| Client | Import Batch | 1:N | Required | RESTRICT |
| Import Batch | Deposit Record | 1:N | Optional by batch type | RESTRICT |
| Import Batch | Invoice Record | 1:N | Optional by batch type | RESTRICT |
| Deposit Record | Match Suggestion | 1:N | Optional | RESTRICT |
| Invoice Record | Match Suggestion | 1:N | Optional | RESTRICT |
| Match Suggestion | Review Decision | 1:N | Optional | RESTRICT |
| Deposit Record | Exception Item | 1:0..1 | Optional | RESTRICT |

## Access Patterns and Constraints

| Access Pattern | Frequency | Performance Need | Supporting Index or Cache |
|----------------|-----------|------------------|---------------------------|
| List imports by firm/client/status | Daily per reviewer | <500 ms for pilot scale | `(firm_id, client_id, status, created_at)` |
| Load review queue for a client | Weekly per client | <1 s for hundreds of suggestions | `(firm_id, client_id, status, confidence)` |
| Accept or reject a match | Many per review session | Transactional write with audit record | Transaction on suggestion + review_decision |
| List open exceptions by owner/client | Daily | <500 ms | `(firm_id, owner_id, status, due_at)` |
| Export review log by client/date range | Weekly/monthly | Async if large | `(firm_id, client_id, created_at)` on review decisions |
| Delete expired source files | Daily job | Complete within maintenance window | object metadata: retention_until |

## Validation and Security

| Field or Data Type | Rules / Classification | Protection or Error Handling |
|--------------------|------------------------|------------------------------|
| firm_id/client_id | Tenant and client boundary | Required in every restricted table and authorization query |
| source CSV object | Restricted financial data | Encrypt, short retention, delete after retention_until |
| amount/date/payer_ref | Restricted financial/customer data | Validate type and bounds; avoid telemetry/logging raw values |
| evidence_json | Audit-supporting evidence | Store source row references and normalized evidence, not unnecessary raw file content |
| Review Decision | Audit record | Append-only; corrections create new decisions |
| normalized_hash | Deduplication support | Use for duplicate detection; do not expose to users |

## Migration Strategy

- Tooling: Versioned database migrations committed with application code.
- Approach: Use additive migrations for pilot tables before enabling writes.
  Avoid destructive schema changes during pilot. For breaking changes, expand
  schema first, migrate/backfill data, update code, then contract only after
  old readers/writers are gone.
- Backfill or cleanup: Import batches may require cleanup jobs for expired
  source files and failed partial imports. Backfills must be idempotent and
  firm-scoped.
- Rollback: Application rollback must remain compatible with additive schema
  changes. Destructive changes require a separate rollback plan and explicit
  approval.

Reference

ActivityDesign — Decide how to build it. Capture trade-offs, contracts, and architecture decisions.
Default locationdocs/helix/02-design/data-design.md
RequiresNone
EnablesNone
InformsTechnical Design
Test Plan
HELIX documentsdocs/helix/02-design/data-design.md
Generation prompt
Show the full generation prompt
# Data Design Generation Prompt
Document the data model and access patterns needed to support the design.

Reference Anchors

Use this local resource summary as grounding:

  • docs/resources/fowler-evolutionary-database-design.md grounds schema evolution, versioned migrations, data movement, and rollback expectations.

Focus

  • Name the main entities, stores, and key fields.
  • Make relationships, lifecycle, and integrity constraints explicit.
  • Capture the main access patterns and their performance or consistency needs.
  • Note privacy, classification, retention, and protection consequences where they materially shape the design.
  • Define migration and rollback expectations for schema or storage changes.
  • Avoid drifting into implementation-specific query or ORM code.

Role Boundary

Data Design is not the full architecture or implementation plan. It explains the data model, storage responsibilities, access patterns, integrity/security constraints, and migration consequences that technical designs must honor.

Completion Criteria

  • The model is understandable to another engineer without reading code.
  • Key data decisions and constraints are explicit.
  • Access patterns and migration strategy are concrete enough to guide implementation and tests.
Template
Show the template structure
---
ddx:
  id: data-design
---

Data Design

Data Summary

  • Scope: [What feature, subsystem, or workflow this data design supports]
  • Storage systems: [Database, queue, cache, object store]
  • Main concerns: [Consistency, scale, retention, privacy, migration]

Entities and Stores

Entity or Store Purpose Key Fields Volume / Growth Notes
[Name] [What it represents] [Important fields] [Expected scale] [Business rules or constraints]

Relationships

From To Type Cardinality On Delete
[Entity1] [Entity2] [1:N, N:M] [Required/Optional] [CASCADE/RESTRICT/SET NULL]

Access Patterns and Constraints

Access Pattern Frequency Performance Need Supporting Index or Cache
[Read or write path] [Rate] [Latency or throughput target] [Index, partition, cache]

Validation and Security

Field or Data Type Rules / Classification Protection or Error Handling
[Field] [Constraints or classification] [Masking, encryption, validation, retention]

Migration Strategy

  • Tooling: [Migration framework]
  • Approach: [Schema rollout and rollback strategy]
  • Backfill or cleanup: [If needed]