Test Plan: TP-002-helix-cli
Source identity (from 03-test/test-plans/TP-002-helix-cli.md):
ddx:
  id: TP-002
  status: partially-superseded
  superseded_by: helix.prd
PARTIALLY SUPERSEDED: This test plan validates wrapper CLI behavior, tracker operations, and run-loop mechanics. The current PRD (helix.prd) removes the CLI and execution loop from HELIX’s scope. This test plan survives only as DDx adapter / transition compatibility test coverage for as long as the wrapper CLI exists as a reference-runtime tool. Core HELIX verification must instead test catalog completeness, artifact schema conformance, and portable alignment skill behavior, not wrapper command behavior.
Status: backfilled
Backfill Date: 2026-03-25
Test Objective
Protect the wrapper CLI contract with deterministic shell tests that exercise queue control, tracker semantics, prompt construction, installer behavior, and command-specific safety rules.
Primary Verification Command
bash tests/helix-cli.sh
Covered Behaviors
Tracker (ddx bead over .ddx/beads.jsonl)
- issue creation and display
- dependency-aware ready and blocked queries
- claim flow setting in_progress and assignee
- --claim records claimed-at (ISO-8601 UTC) and claimed-pid metadata
- --unclaim restores open status and clears claim metadata
- claimed work remains owned until it is explicitly released or closed
- tracker update coverage for execution metadata fields including execution-eligible, superseded-by, and replaces
- tracker status summary
- lock timeout reports the recorded owner and fails closed
- execution-safe ready queries exclude refinement and superseded work
Wrapper Help and Dry-Run Output
- help lists supported commands and key options
- check --dry-run prints the expected agent invocation and action reference
- backfill --dry-run includes writable-session and trailer requirements
- design, polish, review, and experiment dry-runs include their scoped prompt details
- build --dry-run and triage surfaces reflect the converged command and tracker-validation contract
- align --dry-run and align behavior reflect the bead-governed alignment contract rather than an ad hoc standalone review path
Loop, Queue, and Cycle Control
- run stops after the queue drains
- check can emit DESIGN and POLISH when design or issue refinement must happen before execution resumes
- run dispatches bounded design and polish passes from queue-drain NEXT_ACTION results, then re-checks before build resumes
- run --review-every N triggers periodic alignment
- run auto-aligns once after NEXT_ACTION: ALIGN
- run surfaces alignment failures
- align acquires or creates the governing kind:planning,action:align bead before it writes reports or follow-on issues
- run treats NEXT_ACTION: WAIT as terminal and does not attempt an unblock build pass
- run surfaces NEXT_ACTION: BACKFILL as a distinct terminal branch rather than collapsing it into WAIT or STOP
- run --max-cycles N counts successful build completions, not failed attempts
- failed implementation attempts do not advance completed-cycle counters or periodic alignment timing
- Codex token accounting captures the tokens used footer when Codex emits it on stderr
- .helix/context.md is regenerated at run start, on epic switch, and every 5 completed build cycles with Quick Reference build/test commands and current issue counts
- run revalidates selected work before claim and before close using tracker fingerprints (spec-id, parent, superseded-by, replaces)
- interactive refinement during a live run is surfaced as queue drift rather than stale claim/close behavior
- parent field changes during execution are detected as queue drift
- spec-id changes during execution are detected as queue drift
- supersession during execution is detected as queue drift and blocks a stale close
- run stays focused on an active epic until its child work finishes or a blocker releases focus
- run retries difficult issues with bounded exponential backoff (min(5 * 2^(attempt-1), 40) seconds, 4 attempts max) before blocking
- backoff delay formula produces correct values (5, 10, 20, 40s cap)
- intractable child blocks the parent epic during epic focus
- run expands batch selection to shared area:* labels when parent and spec-id metadata do not produce siblings
- run emits blocker reports, cycle timing, and token-usage observability data for helix status
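The retry delay rule listed above, min(5 * 2^(attempt-1), 40) seconds with at most 4 attempts, can be sketched in shell as follows. The function name backoff_delay is hypothetical and is not taken from scripts/helix; only the formula and the attempt limit come from this plan.

```shell
# Hypothetical helper: prints min(5 * 2^(attempt-1), 40) for a 1-based attempt.
backoff_delay() {
  attempt=$1
  # 1 << (attempt - 1) is 2^(attempt-1) in shell arithmetic.
  delay=$(( 5 * (1 << (attempt - 1)) ))
  if [ "$delay" -gt 40 ]; then
    delay=40
  fi
  echo "$delay"
}

# Print the delay for each of the 4 allowed attempts.
for attempt in 1 2 3 4; do
  echo "attempt $attempt: $(backoff_delay "$attempt")s"
done
```

Within the 4-attempt limit the uncapped formula already yields 5, 10, 20, and 40 seconds, so the cap would only change the result if more attempts were allowed.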
Backfill Contract
- backfill fails when BACKFILL_REPORT is missing
- backfill succeeds only when the declared report file exists
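A minimal sketch of the two backfill rules above, assuming the agent declares its report on a line of the form "BACKFILL_REPORT: <path>". That line format and the function name check_backfill_report are assumptions for illustration; the plan only states that the declaration must be present and the file must exist.

```shell
# Hypothetical check: succeed only when the output declares a report and
# the declared file exists. The trailer line format is assumed.
check_backfill_report() {
  output=$1
  # Keep the last declared path if the trailer appears more than once.
  report=$(printf '%s\n' "$output" \
    | sed -n 's/^BACKFILL_REPORT:[[:space:]]*//p' \
    | tail -n 1)
  if [ -z "$report" ]; then
    echo "backfill: BACKFILL_REPORT is missing" >&2
    return 1
  fi
  if [ ! -f "$report" ]; then
    echo "backfill: declared report not found: $report" >&2
    return 1
  fi
  echo "$report"
}
```

Called with the captured agent output, the function prints the report path and returns 0 on success, and returns 1 with a message on stderr in both failure cases.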
Recovery and Review
- orphan recovery reclaims stale issues when PID is dead and claim age exceeds HELIX_ORPHAN_THRESHOLD (default 7200s)
- orphan recovery skips issues with fresh claimed-at timestamps
- orphan recovery does not destroy unrelated worktree changes
- orphan recovery does not unclaim legitimately active work without sufficient evidence
- recovery is issue-scoped and non-destructive by default
- failed or timed-out implementation attempts leave the worktree clean for the next retry or stop with an explicit blocker instead of retrying atop stale local state
- failed or timed-out implementation attempts release stale claims via --unclaim before a fresh retry path resumes
- run invokes post-implementation review when enabled
- run --review-agent <other-agent> switches review to a second model for cross-model verification (tested in live run, not just dry-run)
- REVIEW_STATUS: CLEAN allows the loop to continue
- REVIEW_STATUS: ISSUES_FOUND with ISSUES_COUNT and FINDINGS_FILED trailers is parsed and the loop continues
- review findings are surfaced and redirect or stop the loop rather than being ignored
- epic closure triggers a scoped post-epic review
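The orphan-recovery conditions above (owner PID is dead and claim age exceeds HELIX_ORPHAN_THRESHOLD) could be expressed as a single predicate. This is a sketch under stated assumptions: is_orphaned_claim is a hypothetical name, and the caller is assumed to have already converted the claimed-at timestamp to epoch seconds.

```shell
# Default threshold from this plan: 7200 seconds.
HELIX_ORPHAN_THRESHOLD=${HELIX_ORPHAN_THRESHOLD:-7200}

# Hypothetical predicate. Arguments: claimed-pid, claim time in epoch
# seconds (assumed pre-converted from claimed-at), current epoch seconds.
# Returns 0 only when the claim may be reclaimed.
is_orphaned_claim() {
  pid=$1
  claimed_epoch=$2
  now_epoch=$3
  # kill -0 sends no signal; it only tests whether the PID can be signalled.
  # It also fails for processes owned by another user, which this sketch
  # treats the same as a dead PID.
  if kill -0 "$pid" 2>/dev/null; then
    return 1
  fi
  age=$(( now_epoch - claimed_epoch ))
  # A fresh claim is kept even when its PID is gone.
  [ "$age" -gt "$HELIX_ORPHAN_THRESHOLD" ]
}
```

Requiring both conditions keeps the check conservative: a live PID or a recent claim is enough to leave the issue claimed.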
Summary Mode
- --summary flag is accepted and implies --quiet
- summary output contains concise cycle lines with issue IDs and completion status
- verbose detail (tool calls, prompt echo, gate results) goes to log file only
- summary output includes log-file line-range pointers for diagnostics
- --summary is listed in help output
BUILD Loop Breaker
- consecutive empty BUILD cycles (check returns BUILD, no issue selectable) stop after 2 iterations
- orphan recovery is attempted before stopping
- if recovery frees issues, the loop continues
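One way to read the loop-breaker rules above is as a counter of consecutive empty BUILD cycles that triggers orphan recovery before giving up. The sketch below assumes check has already returned BUILD on every iteration; select_issue, build_issue, and recover_orphans are hypothetical stubs, not functions from scripts/helix.

```shell
MAX_EMPTY_BUILD_CYCLES=2

# Hypothetical stubs. select_issue prints an issue ID or nothing;
# recover_orphans returns 0 when it freed at least one issue.
select_issue() { :; }
build_issue() { echo "build $1"; }
recover_orphans() { return 1; }

run_build_loop() {
  empty=0
  while :; do
    issue=$(select_issue)
    if [ -n "$issue" ]; then
      # A selectable issue resets the consecutive-empty counter.
      empty=0
      build_issue "$issue"
      continue
    fi
    empty=$(( empty + 1 ))
    if [ "$empty" -ge "$MAX_EMPTY_BUILD_CYCLES" ]; then
      # Attempt recovery before stopping; continue only if it freed work.
      if recover_orphans; then
        echo "recovered orphans; continuing"
        empty=0
        continue
      fi
      echo "stop: $empty consecutive empty BUILD cycles"
      return 1
    fi
  done
}
```

With the default stubs the loop sees two empty cycles, fails recovery, and stops with a non-zero status.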
Commit
- commit fails with nothing to commit
- commit <issue-id> stages, runs build gate, commits with issue title, and closes the tracker issue
- commit without issue ID generates a summary from changed filenames
- commit auto-stages unstaged modifications when nothing is staged
Issue Selection Priority
- run prefers non-epic tasks with execution metadata (spec-id, acceptance, or design) over epics when selecting from the ready queue
Utility Commands
- next returns the first ready issue or "no ready issues"
- experiment requires a clean worktree
- experiment --close includes close-session guidance
- status reports persisted run-controller state and blocker summaries
- triage enforces required create fields instead of allowing partially specified HELIX issues
- installer creates the local helix launcher
Test Method
- Create isolated temporary git workspaces
- Inject mock codex and claude binaries
- Seed .ddx/beads.jsonl (the ddx bead tracker store) with known issue graphs
- Assert exact stdout or stderr fragments and filesystem side effects
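The four steps above can be illustrated with a small self-contained setup. This is a sketch of the pattern, not an excerpt from tests/helix-cli.sh: the JSONL record shape, the MOCK_LOG variable, and the mock's behavior are all assumptions chosen for the example.

```shell
# 1. Isolated temporary workspace (git init is skipped if git is absent).
ws=$(mktemp -d)
mkdir -p "$ws/bin" "$ws/repo/.ddx"
if command -v git >/dev/null 2>&1; then
  git init -q "$ws/repo"
fi

# 2. Mock agent binary: logs its arguments and prints a canned trailer.
cat > "$ws/bin/codex" <<'EOF'
#!/bin/sh
printf '%s\n' "$*" >> "$MOCK_LOG"
echo "NEXT_ACTION: WAIT"
EOF
chmod +x "$ws/bin/codex"

# 3. Seed the tracker store with a known record (hypothetical shape).
printf '%s\n' '{"id":"hx-1","status":"open"}' > "$ws/repo/.ddx/beads.jsonl"

# 4. Run with the mock first on PATH, then assert on stdout and on the
#    filesystem side effect left by the mock.
MOCK_LOG="$ws/codex.log"
export MOCK_LOG
out=$(cd "$ws/repo" && PATH="$ws/bin:$PATH" && codex "demo prompt")
logged=$(cat "$MOCK_LOG")

case "$out" in
  *"NEXT_ACTION: WAIT"*) echo "ok: stdout fragment found" ;;
  *) echo "FAIL: stdout fragment missing" >&2 ;;
esac
if [ "$logged" = "demo prompt" ]; then
  echo "ok: mock invocation recorded"
fi

rm -rf "$ws"
```

Because the mock is resolved through PATH, the code under test needs no special hooks, and the temporary directory keeps each test independent of the real repository state.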
Test Count
133 deterministic tests verified by bash tests/helix-cli.sh.
Port Safety
The test harness is implementation-language-agnostic. To verify a port to
another language, change the run_helix() helper to invoke the new binary
instead of bash scripts/helix. All mock agents, tracker JSONL assertions,
and stderr output checks work unchanged.
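As an illustration of the single switch point described above, a run_helix() helper might look like the sketch below. The HELIX_BIN override is a hypothetical variable introduced for this example; the plan only says that the helper is edited to invoke the new binary.

```shell
# HELIX_BIN is a hypothetical override, not a documented variable. It
# defaults to the bash implementation named in this plan.
HELIX_BIN=${HELIX_BIN:-"bash scripts/helix"}

run_helix() {
  # Left unquoted on purpose so "bash scripts/helix" splits into an
  # interpreter and a script path. Paths containing spaces would break.
  $HELIX_BIN "$@"
}
```

With this shape, a port could be exercised by setting HELIX_BIN to the new binary's path before running the suite, leaving every test body untouched.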
Known Gaps
- The current harness validates prompt shape and loop behavior, not live remote agent correctness.
- The harness should be extended if check grows additional machine-readable trailers beyond NEXT_ACTION that affect loop control.
- The harness should be extended when helix status begins exposing richer lifecycle history than the initial run-controller snapshot contract.
Evidence
- docs/helix/01-frame/features/FEAT-002-helix-cli.md
- docs/helix/02-design/technical-designs/TD-002-helix-cli.md
- workflows/EXECUTION.md
- workflows/TRACKER.md
- tests/helix-cli.sh