Test Plan: TP-002-helix-cli

Source identity (from 03-test/test-plans/TP-002-helix-cli.md):

ddx:
  id: TP-002
  status: partially-superseded
  superseded_by: helix.prd

PARTIALLY SUPERSEDED — This test plan validates wrapper CLI behavior, tracker operations, and run-loop mechanics. The current PRD (helix.prd) removes the CLI and execution loop from HELIX’s scope. This test plan survives only as DDx adapter / transition compatibility test coverage for the duration that the wrapper CLI exists as a reference-runtime tool. Core HELIX verification must instead test catalog completeness, artifact schema conformance, and portable alignment skill behavior — not wrapper command behavior.

Test Plan: TP-002-helix-cli

Status: backfilled Backfill Date: 2026-03-25

Test Objective

Protect the wrapper CLI contract with deterministic shell tests that exercise queue control, tracker semantics, prompt construction, installer behavior, and command-specific safety rules.

Primary Verification Command

bash tests/helix-cli.sh

Covered Behaviors

Tracker (`ddx bead` over `.ddx/beads.jsonl`)

issue creation and display
dependency-aware ready and blocked queries
claim flow setting in_progress and assignee
--claim records claimed-at (ISO-8601 UTC) and claimed-pid metadata
--unclaim restores open status and clears claim metadata
claimed work remains owned until it is explicitly released or closed
tracker update coverage for execution metadata fields including execution-eligible, superseded-by, and replaces
tracker status summary
lock timeout reports the recorded owner and fails closed
execution-safe ready queries exclude refinement and superseded work

Wrapper Help and Dry-Run Output

help lists supported commands and key options
check --dry-run prints the expected agent invocation and action reference
backfill --dry-run includes writable-session and trailer requirements
design, polish, review, and experiment dry-runs include their scoped prompt details
build --dry-run and triage surfaces reflect the converged command and tracker-validation contract
align --dry-run and align behavior reflect the bead-governed alignment contract rather than an ad hoc standalone review path

Loop, Queue, and Cycle Control

run stops after the queue drains
check can emit DESIGN and POLISH when design or issue refinement must happen before execution resumes
run dispatches bounded design and polish passes from queue-drain NEXT_ACTION results, then re-checks before build resumes
run --review-every N triggers periodic alignment
run auto-aligns once after NEXT_ACTION: ALIGN
run surfaces alignment failures
align acquires or creates the governing kind:planning,action:align bead before it writes reports or follow-on issues
run treats NEXT_ACTION: WAIT as terminal and does not attempt an unblock build pass
run surfaces NEXT_ACTION: BACKFILL as a distinct terminal branch rather than collapsing it into WAIT or STOP
run --max-cycles N counts successful build completions, not failed attempts
failed implementation attempts do not advance completed-cycle counters or periodic alignment timing
Codex token accounting captures the tokens used footer when Codex emits it on stderr
.helix/context.md is regenerated at run start, on epic switch, and every 5 completed build cycles with Quick Reference build/test commands and current issue counts
run revalidates selected work before claim and before close using tracker fingerprints (spec-id, parent, superseded-by, replaces)
interactive refinement during a live run is surfaced as queue drift rather than stale claim/close behavior
parent field changes during execution are detected as queue drift
spec-id changes during execution are detected as queue drift
supersession during execution is detected as queue drift and blocks a stale close
run stays focused on an active epic until its child work finishes or a blocker releases focus
run retries difficult issues with bounded exponential backoff (min(5 * 2^(attempt-1), 40) seconds, 4 attempts max) before blocking
backoff delay formula produces correct values (5, 10, 20, 40s cap)
intractable child blocks the parent epic during epic focus
run expands batch selection to shared area:* labels when parent and spec-id metadata do not produce siblings
run emits blocker reports, cycle timing, and token-usage observability data for helix status

Backfill Contract

backfill fails when BACKFILL_REPORT is missing
backfill succeeds only when the declared report file exists

Recovery and Review

orphan recovery reclaims stale issues when PID is dead and claim age exceeds HELIX_ORPHAN_THRESHOLD (default 7200s)
orphan recovery skips issues with fresh claimed-at timestamps
orphan recovery does not destroy unrelated worktree changes
orphan recovery does not unclaim legitimately active work without sufficient evidence
recovery is issue-scoped and non-destructive by default
failed or timed-out implementation attempts leave the worktree clean for the next retry or stop with an explicit blocker instead of retrying atop stale local state
failed or timed-out implementation attempts release stale claims via --unclaim before a fresh retry path resumes
run invokes post-implementation review when enabled
run --review-agent <other-agent> switches review to a second model for cross-model verification (tested in live run, not just dry-run)
REVIEW_STATUS: CLEAN allows the loop to continue
REVIEW_STATUS: ISSUES_FOUND with ISSUES_COUNT and FINDINGS_FILED trailers is parsed and the loop continues
review findings are surfaced and redirect or stop the loop rather than being ignored
epic closure triggers a scoped post-epic review

Summary Mode

--summary flag is accepted and implies --quiet
summary output contains concise cycle lines with issue IDs and completion status
verbose detail (tool calls, prompt echo, gate results) goes to log file only
summary output includes log-file line-range pointers for diagnostics
--summary is listed in help output

BUILD Loop Breaker

consecutive empty BUILD cycles (check returns BUILD, no issue selectable) stop after 2 iterations
orphan recovery is attempted before stopping
if recovery frees issues, the loop continues

Commit

commit fails with nothing to commit
commit <issue-id> stages, runs build gate, commits with issue title, and closes the tracker issue
commit without issue ID generates a summary from changed filenames
commit auto-stages unstaged modifications when nothing is staged

Issue Selection Priority

run prefers non-epic tasks with execution metadata (spec-id, acceptance, or design) over epics when selecting from the ready queue

Utility Commands

next returns the first ready issue or no ready issues
experiment requires a clean worktree
experiment --close includes close-session guidance
status reports persisted run-controller state and blocker summaries
triage enforces required create fields instead of allowing partially specified HELIX issues
installer creates the local helix launcher

Test Method

Create isolated temporary git workspaces
Inject mock codex and claude binaries
Seed .ddx/beads.jsonl (the ddx bead tracker store) with known issue graphs
Assert exact stdout or stderr fragments and filesystem side effects

Test Count

133 deterministic tests verified by bash tests/helix-cli.sh.

Port Safety

The test harness is implementation-language-agnostic. To verify a port to another language, change the run_helix() helper to invoke the new binary instead of bash scripts/helix. All mock agents, tracker JSONL assertions, and stderr output checks work unchanged.

Known Gaps

The current harness validates prompt shape and loop behavior, not live remote agent correctness.
The harness should be extended if check grows additional machine-readable trailers beyond NEXT_ACTION that affect loop control.
The harness should be extended when helix status begins exposing richer lifecycle history than the initial run-controller snapshot contract.

Evidence

docs/helix/01-frame/features/FEAT-002-helix-cli.md
docs/helix/02-design/technical-designs/TD-002-helix-cli.md
workflows/EXECUTION.md
workflows/TRACKER.md
tests/helix-cli.sh

Test Plan: TP-002-helix-cli

Test Plan: TP-002-helix-cli

Test Objective

Primary Verification Command

Covered Behaviors

Tracker (ddx bead over .ddx/beads.jsonl)

Wrapper Help and Dry-Run Output

Loop, Queue, and Cycle Control

Backfill Contract

Recovery and Review

Summary Mode

BUILD Loop Breaker

Commit

Issue Selection Priority

Utility Commands

Test Method

Test Count

Port Safety

Known Gaps

Evidence

Tracker (`ddx bead` over `.ddx/beads.jsonl`)