Skip to content

feat: [ENG-2323] brv harness reset + refine commands#534

Merged
RyanNg1403 merged 4 commits intoproj/autoharness-v2from
feat/ENG-2323
Apr 23, 2026
Merged

feat: [ENG-2323] brv harness reset + refine commands#534
RyanNg1403 merged 4 commits intoproj/autoharness-v2from
feat/ENG-2323

Conversation

@RyanNg1403
Copy link
Copy Markdown
Collaborator

Summary

  • Problem: No CLI commands exist to reset harness state or manually trigger refinement. Users cannot wipe stale data or force a synthesis attempt.
  • Why it matters: Admin control over AutoHarness is required for debugging, testing, and CI (Phase 8 smoke test uses reset --force). Without refine, users must wait for session-end auto-refinement.
  • What changed: Two new oclif commands (brv harness reset, brv harness refine), deleteScenarios and deleteVersion added to IHarnessStore, harness-refine task type in daemon dispatch, and four pre-existing bug fixes that prevented AutoHarness from working on real projects.
  • What did NOT change (scope boundary): No selective reset (all-or-nothing only), no --all-pairs flag, no changes to the refinement pipeline itself (Critic/Refiner/Evaluator unchanged).

Type of change

  • Bug fix
  • New feature

Scope (select all touched areas)

  • Agent / Tools
  • Server / Daemon
  • CLI Commands (oclif)

Linked issues

  • Closes ENG-2323

Root cause (bug fixes only, otherwise write N/A)

Four pre-existing bugs fixed alongside the new commands:

  1. Daemon config plumbing: agent-process.ts never read .brv/config.json harness field — AgentConfigSchema.default({}) always set enabled: false.
  2. projectId sanitization: Raw paths (e.g., /Users/foo/project) were passed as projectId to FileKeyStorage, which rejects / in key segments. All store writes silently failed.
  3. Refiner httpConfig: createGeneratorForProvider('byterover', ...) didn't receive httpConfig, causing Cannot read properties of undefined (reading 'apiBaseUrl') crash.
  4. Refine projectId: agent-process.ts passed raw projectPath to synthesizer.refineIfNeeded — store lookups returned empty because data was keyed by sanitized paths.

These were not caught earlier because the specs didn't cover daemon→agent config boundary, projectId sanitization for FileKeyStorage, or byterover provider httpConfig requirements.

Test plan

  • Coverage added:
    • Unit test
    • Manual verification only
  • Test file(s): test/unit/oclif/commands/harness/reset.test.ts (6 tests), test/unit/oclif/commands/harness/refine.test.ts (6 tests)
  • Key scenario(s) covered:
    • reset: prompt cancel, prompt confirm, force skip, empty state (text+JSON), deletion order
    • refine: accepted text, rejected text, JSON format, synthesizer unavailable (exit 2)
    • 14 interactive CLI test cases with real curate-generated data, disk artifact verification

User-visible changes

  • brv harness reset [--force] [--commandType curate|chat|query] [--format text|json] — new command
  • brv harness refine [--commandType curate|chat|query] [--format text|json] — new command
  • brv harness refine exits 2 with clear error when harness is not enabled

Evidence

Unit tests: 7221 passing, 0 failing
Interactive CLI: 14 test cases all passing (reset cancel/confirm/force/empty, refine text/JSON/disabled, invalid flags, disk verification)

brv harness status --format json
→ {"currentVersion":1,"outcomeCount":3,"mode":"assisted"}

echo "n" | brv harness reset
→ "1 version, 3 outcomes, 0 scenarios" + "Reset cancelled."

brv harness reset --force
→ "Deleted 1 version, 3 outcomes."

brv harness reset --force --format json (empty)
→ {"versions":0,"outcomes":0,"scenarios":0}

brv harness refine --format json
→ {"accepted":false,"fromVersion":1,"reason":"delta H was -0.48, below acceptance threshold"}

brv harness refine (harness disabled)
→ Error: Harness is not enabled, EXIT: 2

Checklist

  • Tests added or updated and passing (npm test)
  • Lint passes (npm run lint)
  • Type check passes (npm run typecheck)
  • Build succeeds (npm run build)
  • Commits follow Conventional Commits format
  • No breaking changes
  • Branch is up to date with proj/autoharness-v2

Risks and mitigations

  • Risk: deleteVersion iterates versions with no-await-in-loop (no batch API).
    • Mitigation: Scoped eslint-disable on single line. Version count is capped at maxVersions (default 20), so loop is bounded.
  • Risk: as SynthesisResult assertion on JSON.parse output in refine.ts.
    • Mitigation: Wrapped in try/catch that falls back to undefined. Pre-validated by agent-process serialization.

Add two admin CLI commands for AutoHarness V2:

- `brv harness reset` — wipes all harness state (versions, outcomes,
  scenarios) for a (projectId, commandType) pair. Interactive
  confirmation prompt by default, `--force` to skip. Supports
  `--format json` for scripts.

- `brv harness refine` — force-triggers the Critic→Refiner→Evaluator
  synthesis pipeline via the daemon task system. Waits up to 60s for
  the result. Exits 2 when harness is not enabled.

Also fixes four pre-existing bugs that prevented AutoHarness from
working on real projects:

1. Daemon config plumbing: agent-process.ts now reads harness field
   from .brv/config.json into agentConfig.
2. projectId sanitization: agent-llm-service.ts, sandbox-service.ts,
   and harness-cli.ts now apply sanitizeProjectPath() before store
   operations (FileKeyStorage rejects path separators in keys).
3. Refiner httpConfig: service-initializer.ts receives httpConfigProvider
   from cipher-agent.ts so the byterover LLM provider gets apiBaseUrl.
4. Refine projectId: agent-process.ts sanitizes projectPath before
   passing to synthesizer.refineIfNeeded.

Adds deleteScenarios and deleteVersion to IHarnessStore interface.
Adds getPin/setPin stubs to upstream test files for cross-merge compat.
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 23, 2026

Claude finished @RyanNg1403's task in 4m 10s —— View job


PR Review: feat: [ENG-2323] brv harness reset + refine commands

Tasks

  • Gather context and understand the request
  • Read changed files and analyze code
  • Review for correctness, security, and code quality
  • Post inline comments for specific issues
  • Post final summary with overall score

Summary

Overall score: 3.5 / 5

The PR delivers clear value: two well-structured commands, four real bug fixes with solid root-cause write-ups, and a clean outside-in design (commands drive the two new IHarnessStore methods). The pure-function extraction pattern in both commands makes the unit tests straightforward and the logic easy to follow. That said, there are a handful of correctness and UX issues worth addressing before merge.


What works well

  • Outside-in design: IHarnessStore gains exactly the two methods the commands need — deleteScenarios and deleteVersion. Nothing added speculatively.
  • Bug fixes are correct: The four pre-existing bugs (harness config not read from disk, raw path as projectId, missing httpConfig for byterover refiner, refine using unsanitized projectPath) are all properly fixed and easy to verify in the diffs.
  • deleteOutcomes/deleteScenarios use batch operations: Both implementations collect all keys first then call keyStorage.batch(…), avoiding O(n) individual writes.
  • --force + interactive prompt pattern: Clean UX for both interactive and CI use; the @inquirer/prompts confirm call with {default: false} is safe by default.
  • Pure function extraction: countArtifacts, executeReset, renderResetText, renderRefineText, formatRefineResult are all directly unit-testable without oclif.
  • Ordered deletion: outcomes → scenarios → versions ordering in executeReset is correct and matches the comment. A partial reset (e.g. crash after outcomes but before versions) leaves the store in a consistent, degraded-but-not-corrupt state.

Issues found (with inline comments)

Severity Location Summary
Bug reset.ts:61 executeReset does not delete the pin — stale pin survives reset
Bug refine.ts:159 JSON error path exits code 0; text path exits code 2 — inconsistent for scripts
Improvement refine.ts:176 'harness not enabled' sentinel is a stringly-typed cross-process protocol — fragile if string changes
Improvement refine.ts:187 toVersion = fromVersion + 1 is an inferred display value, not read from the store
Improvement refine.ts:171 as SynthesisResult cast — at minimum a shape-presence guard on 'accepted' key
Improvement reset.ts:46 Number.MAX_SAFE_INTEGER limit deserializes every outcome into memory to count them
Nitpick reset.ts:101, refine.ts:92 static override flags is at column 0 (missing 2-space indent inside class body)
Test reset.test.ts:147 Deletion order not asserted — calledBefore would catch a future order regression
Test reset.test.ts:165 Number-only assertions in renderResetText test — units could be stripped with no test failure

Notable: the stale-pin issue after reset

IHarnessStore exposes setPin/getPin but no way to delete a pin. After brv harness reset, the pin record remains on disk pointing to the deleted version UUID. SandboxService.loadHarness falls back gracefully (the interface doc says "pin is a preference, not a requirement"), so this won't crash. But it means:

  1. brv harness status after reset may still surface a pin reference.
  2. A user who resets and re-bootstraps will have an orphaned pin silently ignored on every session load until they explicitly run brv harness use.

The simplest fix is adding deletePin(projectId: string, commandType: string): Promise<void> to IHarnessStore (purely additive) and calling it at the end of executeReset. Alternatively, explicitly document the behaviour in the command's help text.

Comment thread src/oclif/commands/harness/reset.ts Outdated
Comment thread src/oclif/commands/harness/refine.ts Outdated
Comment thread src/oclif/commands/harness/reset.ts
Comment thread src/oclif/commands/harness/reset.ts Outdated
Comment thread src/oclif/commands/harness/refine.ts
Comment thread src/oclif/commands/harness/refine.ts Outdated
Comment thread src/oclif/commands/harness/refine.ts
Comment thread src/oclif/commands/harness/refine.ts
Comment thread test/unit/oclif/commands/harness/reset.test.ts
Comment thread test/unit/oclif/commands/harness/reset.test.ts Outdated
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: da92297f44

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread src/oclif/commands/harness/refine.ts
Comment thread src/oclif/commands/harness/refine.ts
- Fix indentation on static override flags in reset.ts and refine.ts
- Add deletePin to IHarnessStore + HarnessStore + InMemoryHarnessStore
- Wire deletePin into executeReset (spec: "Clear pin if present")
- Add deletion order assertion (deleteOutcomes before deleteVersion)
- Tighten renderResetText assertions to check labels not just numbers
- Exit 2 in JSON mode when harness not enabled (consistent exit codes)
- Add deletePin stubs to all test files with satisfies IHarnessStore
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 23, 2026

Claude finished @RyanNg1403's task in 3m 29s —— View job


PR Review: feat: [ENG-2323] brv harness reset + refine commands

Tasks

  • Gather context and understand the request
  • Read changed files and analyze code
  • Review for correctness, security, and code quality
  • Post inline comments for specific issues
  • Post final summary with overall score

Summary

Overall score: 4.0 / 5 (up from 3.5 in the prior review)

The refactor commit addressed the most critical feedback from the prior review: deletePin is now called inside executeReset, the this.exit(2) is wired in for the JSON "harness not enabled" branch, indentation on static override flags is fixed, and the test suite now asserts deletion order with calledBefore and checks label text (not just numbers) in renderResetText. Good progress.


What was fixed since the prior review

Prior issue Status
executeReset left stale pin on disk ✅ Fixed — deletePin called at end of reset
JSON "harness not enabled" path exited 0 ✅ Fixed — this.exit(2) added
static override flags indentation ✅ Fixed in both reset.ts and refine.ts
Deletion order not asserted in test ✅ Fixed — calledBefore added
renderResetText test only checked numbers, not units ✅ Fixed — include('3 version') etc.

Remaining issues (with inline comments)

Severity Location Summary
Bug refine.ts:133 waitForTaskCompletion returned promise not propagated — timeout/disconnect hangs completionPromise forever
Bug refine.ts:163 Catch block JSON path returns without this.exit(2) — exits code 0 on failure
Suggestion refine.ts:177 'harness not enabled' sentinel still stringly-typed between agent-process.ts and refine.ts
Suggestion refine.ts:171 as SynthesisResult cast still has no runtime shape guard
Suggestion reset.ts:46 Number.MAX_SAFE_INTEGER deserializes all outcomes to count them

The hanging promise (new finding)

waitForTaskCompletion returns a Promise<void> that rejects on timeout (text mode) or daemon disconnect. The current code wraps the call inside a new Promise<string> executor and relies on the onCompleted/onError callbacks to settle the outer promise — but those callbacks are only called on task:completed / task:error events. If waitForTaskCompletion itself rejects (e.g. the 60 s timeout fires, or the socket drops), neither callback fires, completionPromise never settles, and brv harness refine hangs until the OS kills it.

Fix — add .catch(reject) after the call:

waitForTaskCompletion(
  { ..., onCompleted: (r) => resolve(r.result ?? ''), onError: (r) => reject(...) },
  () => {},
).catch(reject)

Exit-code inconsistency in the catch block

The "harness not enabled" JSON branch now correctly calls this.exit(2) (fixed ✅). But the general error catch block (lines 157–165) still returns after logging JSON, exiting 0. Any automation that drives brv harness refine --format json and checks $? will see success even when the daemon throws.

Comment thread src/oclif/commands/harness/refine.ts
Comment thread src/oclif/commands/harness/refine.ts
Comment thread src/oclif/commands/harness/refine.ts Outdated
Comment thread src/oclif/commands/harness/refine.ts Outdated
Comment thread src/oclif/commands/harness/reset.ts Outdated
- Propagate waitForTaskCompletion rejection via .catch(reject) to
  prevent completionPromise from hanging on timeout/disconnect
- Exit code 2 in JSON error catch block (was exiting 0)
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 23, 2026

Claude finished @RyanNg1403's task in 2m 4s —— View job


PR Review: feat: [ENG-2323] brv harness reset + refine commands

Tasks

  • Gather context and understand the request
  • Read changed files and analyze code
  • Review for correctness, security, and code quality
  • Post inline comments for specific issues
  • Post final summary with overall score

Summary

Overall score: 4.5 / 5 (up from 4.0 in the prior review)

The third commit (fd82ceb) closes the two remaining bugs from the second review round: the waitForTaskCompletion hanging promise is now correctly propagated via .catch(reject), and the general-error JSON catch block now exits with code 2 instead of 0. All critical correctness issues across three review rounds are resolved.

The remaining open items are all suggestions — none are blockers for merge.


What was fixed in this round

Prior issue Status
waitForTaskCompletion promise not propagated — hangs on timeout/disconnect ✅ Fixed — .catch(reject) added at refine.ts:144
General error catch block JSON path returned and exited 0 ✅ Fixed — this.exit(2) added at refine.ts:161

Complete fix history (all three rounds)

Issue Round fixed
executeReset left stale pin on disk Round 2
JSON "harness not enabled" branch exited 0 Round 2
static override flags indentation in both files Round 2
Deletion order not asserted in test (calledBefore) Round 2
renderResetText test only checked numbers, not labels Round 2
waitForTaskCompletion returned promise discarded Round 3
General error catch block JSON path exited 0 Round 3

Remaining suggestions (non-blocking)

'harness not enabled' is a stringly-typed cross-process protocol (refine.ts:178agent-process.ts:609)

The string 'harness not enabled' is checked in refine.ts but produced in agent-process.ts. A typo or capitalisation fix in either file silently breaks the check. Extracting a shared constant to src/shared/ is purely additive and takes about 5 lines:

// src/shared/harness-constants.ts
export const HARNESS_NOT_ENABLED_REASON = 'harness not enabled' as const

Fix this →


as SynthesisResult cast has no runtime shape guard (refine.ts:172)

The try/catch handles malformed JSON, but a structurally wrong but valid JSON object (e.g. an older daemon returning {"ok":true}) would propagate with undefined properties silently. A minimal two-line guard:

const parsed: unknown = taskResult ? JSON.parse(taskResult) : undefined
synthesisResult =
  parsed !== null && typeof parsed === 'object' && 'accepted' in parsed
    ? (parsed as SynthesisResult)
    : undefined

This is low-risk given tight coupling with agent-process.ts, but worth hardening before the protocol stabilizes. Fix this →


Number.MAX_SAFE_INTEGER in countArtifacts (reset.ts:46)

Passing Number.MAX_SAFE_INTEGER to listOutcomes deserializes the entire outcome set into memory just for counting. In practice the recorder caps outcomes at ~200 per pair, so this is harmless. A comment explaining the bound assumption would make the intent clear without requiring a countOutcomes method:

// listOutcomes limit — recorder caps at maxOutcomes (default 200), so
// this is effectively "all" while still being a finite upper bound.
store.listOutcomes(projectId, commandType, 10_000),

toVersion = fromVersion + 1 is an inferred display value (refine.ts:189)

The displayed toVersion is computed as fromVersion + 1, not read from the store or the SynthesisResult. This holds for strictly sequential refinement but could display the wrong number if a concurrent write or manual saveVersion inserted a gap. Since it is display-only, the simplest hardening is a comment:

// toVersion is inferred — SynthesisResult carries toVersionId (UUID) but
// not the numeric version field. Re-read the store for an exact number if
// that ever matters.

Verdict

All blocking correctness issues are resolved. The four remaining items above are suggestions the author can address in this PR or defer to a follow-up. The code is clean, the outside-in design is sound, the bug fixes are well-motivated, and the test coverage is solid.

- Extract HARNESS_NOT_ENABLED_REASON shared constant to eliminate
  stringly-typed cross-process protocol between agent-process and
  refine command
- Add runtime shape guard on JSON.parse result (check 'accepted' key)
  before casting to SynthesisResult
- Replace Number.MAX_SAFE_INTEGER with bounded 10_000 limit in
  countArtifacts (recorder caps at ~200 per pair)
- Add toVersion inference comment documenting the assumption
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 23, 2026

Claude finished @RyanNg1403's task in 2m 12s —— View job


PR Review: feat: [ENG-2323] brv harness reset + refine commands

Tasks

  • Gather context and understand the request
  • Read changed files and analyze latest commit
  • Review for correctness, security, and code quality
  • Post inline comments for specific issues
  • Post final summary with overall score

Summary

Overall score: 5.0 / 5 (up from 4.5 in the prior review)

The fourth commit (d4a148d) closes every remaining suggestion from round 3. All seven issues across the three prior review rounds are now resolved, and the four round-3 suggestions have been fully addressed. The PR is ready to merge.


What was fixed in this round

Prior suggestion Status
'harness not enabled' stringly-typed cross-process protocol ✅ Fixed — HARNESS_NOT_ENABLED_REASON const in src/shared/constants/harness.ts, imported in both agent-process.ts and refine.ts
as SynthesisResult unchecked cast ✅ Fixed — shape guard checks parsed !== null && typeof parsed === 'object' && 'accepted' in parsed before cast
Number.MAX_SAFE_INTEGER deserializes all outcomes ✅ Fixed — replaced with 10_000 with explanatory comment
toVersion = fromVersion + 1 undocumented inference ✅ Fixed — comment added documenting the assumption and its limitation

Complete fix history (all four rounds)

Issue Round fixed
executeReset left stale pin on disk Round 2
JSON "harness not enabled" branch exited 0 Round 2
static override flags indentation in both files Round 2
Deletion order not asserted in test (calledBefore) Round 2
renderResetText test only checked numbers, not labels Round 2
waitForTaskCompletion returned promise discarded Round 3
General error catch block JSON path exited 0 Round 3
'harness not enabled' stringly-typed sentinel Round 4
as SynthesisResult unchecked JSON cast Round 4
Number.MAX_SAFE_INTEGER outcome deserialisation Round 4
toVersion inference undocumented Round 4

Non-blocking nitpicks (no action needed)

deletePin and deleteScenarios are in swapped section headers in harness-store.ts

At line 120, the section comment reads // ── scenarios ─────────────────────────────────────────────────────────────────────, but the first method under it is deletePin (line 122) — a pin operation, not a scenario operation. Conversely, deleteScenarios (line 160) sits under // ── versions ───────────────────────────────────────────────────────────────────────. The two batch-delete methods landed in each other's sections. Purely cosmetic, but worth fixing in a follow-up or here if you're already in the file.

deletePin uses get instead of exists for existence check (harness-store.ts:124)

const exists = await this.keyStorage.get(key)  // fetches and discards the value

deleteScenario (same file, line 138) uses this.keyStorage.exists(key) for the same pattern, which skips deserialisation. Since deletePin immediately discards the value, exists() would be more consistent and marginally more efficient. Not a correctness issue.

'refinement skipped' inline string in agent-process.ts:616

result = JSON.stringify(synthResult ?? {accepted: false, reason: 'refinement skipped'})

Now that HARNESS_NOT_ENABLED_REASON has been extracted as a shared constant, the 'refinement skipped' fallback string is the only other cross-boundary literal remaining. Unlike HARNESS_NOT_ENABLED_REASON, this string is not used as a sentinel for control flow in refine.ts — it flows through as display text — so the risk is lower. Still, extracting it as HARNESS_REFINEMENT_SKIPPED_REASON alongside the existing constant would be fully consistent.


Verdict

All blocking correctness issues are resolved. The three items above are cosmetic/consistency nits that can be addressed in a follow-up. The design is clean, the outside-in structure is sound, all bug fixes are well-motivated and correctly implemented, and the test suite is solid. Approve.

@RyanNg1403 RyanNg1403 merged commit 12d163c into proj/autoharness-v2 Apr 23, 2026
6 checks passed
@danhdoan danhdoan deleted the feat/ENG-2323 branch April 23, 2026 04:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants