feat: [ENG-2323] brv harness reset + refine commands by RyanNg1403 · Pull Request #534 · campfirein/byterover-cli

RyanNg1403 · 2026-04-23T03:35:49Z

Summary

Problem: No CLI commands exist to reset harness state or manually trigger refinement. Users cannot wipe stale data or force a synthesis attempt.
Why it matters: Admin control over AutoHarness is required for debugging, testing, and CI (Phase 8 smoke test uses reset --force). Without refine, users must wait for session-end auto-refinement.
What changed: Two new oclif commands (brv harness reset, brv harness refine), deleteScenarios and deleteVersion added to IHarnessStore, harness-refine task type in daemon dispatch, and four pre-existing bug fixes that prevented AutoHarness from working on real projects.
What did NOT change (scope boundary): No selective reset (all-or-nothing only), no --all-pairs flag, no changes to the refinement pipeline itself (Critic/Refiner/Evaluator unchanged).

Type of change

Bug fix
New feature

Scope (select all touched areas)

Agent / Tools
Server / Daemon
CLI Commands (oclif)

Linked issues

Closes ENG-2323

Root cause (bug fixes only, otherwise write `N/A`)

Four pre-existing bugs fixed alongside the new commands:

Daemon config plumbing: agent-process.ts never read .brv/config.json harness field — AgentConfigSchema.default({}) always set enabled: false.
projectId sanitization: Raw paths (e.g., /Users/foo/project) were passed as projectId to FileKeyStorage, which rejects / in key segments. All store writes silently failed.
Refiner httpConfig: createGeneratorForProvider('byterover', ...) didn't receive httpConfig, causing Cannot read properties of undefined (reading 'apiBaseUrl') crash.
Refine projectId: agent-process.ts passed raw projectPath to synthesizer.refineIfNeeded — store lookups returned empty because data was keyed by sanitized paths.

These were not caught earlier because the specs didn't cover daemon→agent config boundary, projectId sanitization for FileKeyStorage, or byterover provider httpConfig requirements.

Test plan

Coverage added:
- Unit test
- Manual verification only
Test file(s): test/unit/oclif/commands/harness/reset.test.ts (6 tests), test/unit/oclif/commands/harness/refine.test.ts (6 tests)
Key scenario(s) covered:
- reset: prompt cancel, prompt confirm, force skip, empty state (text+JSON), deletion order
- refine: accepted text, rejected text, JSON format, synthesizer unavailable (exit 2)
- 14 interactive CLI test cases with real curate-generated data, disk artifact verification

User-visible changes

brv harness reset [--force] [--commandType curate|chat|query] [--format text|json] — new command
brv harness refine [--commandType curate|chat|query] [--format text|json] — new command
brv harness refine exits 2 with clear error when harness is not enabled

Evidence

Unit tests: 7221 passing, 0 failing
Interactive CLI: 14 test cases all passing (reset cancel/confirm/force/empty, refine text/JSON/disabled, invalid flags, disk verification)

brv harness status --format json
→ {"currentVersion":1,"outcomeCount":3,"mode":"assisted"}

echo "n" | brv harness reset
→ "1 version, 3 outcomes, 0 scenarios" + "Reset cancelled."

brv harness reset --force
→ "Deleted 1 version, 3 outcomes."

brv harness reset --force --format json (empty)
→ {"versions":0,"outcomes":0,"scenarios":0}

brv harness refine --format json
→ {"accepted":false,"fromVersion":1,"reason":"delta H was -0.48, below acceptance threshold"}

brv harness refine (harness disabled)
→ Error: Harness is not enabled, EXIT: 2

Checklist

Tests added or updated and passing (npm test)
Lint passes (npm run lint)
Type check passes (npm run typecheck)
Build succeeds (npm run build)
Commits follow Conventional Commits format
No breaking changes
Branch is up to date with proj/autoharness-v2

Risks and mitigations

Risk: deleteVersion iterates versions with no-await-in-loop (no batch API).
- Mitigation: Scoped eslint-disable on single line. Version count is capped at maxVersions (default 20), so loop is bounded.
Risk: as SynthesisResult assertion on JSON.parse output in refine.ts.
- Mitigation: Wrapped in try/catch that falls back to undefined. Pre-validated by agent-process serialization.

Add two admin CLI commands for AutoHarness V2: - `brv harness reset` — wipes all harness state (versions, outcomes, scenarios) for a (projectId, commandType) pair. Interactive confirmation prompt by default, `--force` to skip. Supports `--format json` for scripts. - `brv harness refine` — force-triggers the Critic→Refiner→Evaluator synthesis pipeline via the daemon task system. Waits up to 60s for the result. Exits 2 when harness is not enabled. Also fixes four pre-existing bugs that prevented AutoHarness from working on real projects: 1. Daemon config plumbing: agent-process.ts now reads harness field from .brv/config.json into agentConfig. 2. projectId sanitization: agent-llm-service.ts, sandbox-service.ts, and harness-cli.ts now apply sanitizeProjectPath() before store operations (FileKeyStorage rejects path separators in keys). 3. Refiner httpConfig: service-initializer.ts receives httpConfigProvider from cipher-agent.ts so the byterover LLM provider gets apiBaseUrl. 4. Refine projectId: agent-process.ts sanitizes projectPath before passing to synthesizer.refineIfNeeded. Adds deleteScenarios and deleteVersion to IHarnessStore interface. Adds getPin/setPin stubs to upstream test files for cross-merge compat.

github-actions · 2026-04-23T03:36:11Z

Claude finished @RyanNg1403's task in 4m 10s —— View job

PR Review: feat: [ENG-2323] brv harness reset + refine commands

Tasks

Gather context and understand the request
Read changed files and analyze code
Review for correctness, security, and code quality
Post inline comments for specific issues
Post final summary with overall score

Summary

Overall score: 3.5 / 5

The PR delivers clear value: two well-structured commands, four real bug fixes with solid root-cause write-ups, and a clean outside-in design (commands drive the two new IHarnessStore methods). The pure-function extraction pattern in both commands makes the unit tests straightforward and the logic easy to follow. That said, there are a handful of correctness and UX issues worth addressing before merge.

What works well

Outside-in design: IHarnessStore gains exactly the two methods the commands need — deleteScenarios and deleteVersion. Nothing added speculatively.
Bug fixes are correct: The four pre-existing bugs (harness config not read from disk, raw path as projectId, missing httpConfig for byterover refiner, refine using unsanitized projectPath) are all properly fixed and easy to verify in the diffs.
deleteOutcomes/deleteScenarios use batch operations: Both implementations collect all keys first then call keyStorage.batch(…), avoiding O(n) individual writes.
--force + interactive prompt pattern: Clean UX for both interactive and CI use; the @inquirer/prompts confirm call with {default: false} is safe by default.
Pure function extraction: countArtifacts, executeReset, renderResetText, renderRefineText, formatRefineResult are all directly unit-testable without oclif.
Ordered deletion: outcomes → scenarios → versions ordering in executeReset is correct and matches the comment. A partial reset (e.g. crash after outcomes but before versions) leaves the store in a consistent, degraded-but-not-corrupt state.

Issues found (with inline comments)

Severity	Location	Summary
Bug	`reset.ts:61`	`executeReset` does not delete the pin — stale pin survives reset
Bug	`refine.ts:159`	JSON error path exits code 0; text path exits code 2 — inconsistent for scripts
Improvement	`refine.ts:176`	`'harness not enabled'` sentinel is a stringly-typed cross-process protocol — fragile if string changes
Improvement	`refine.ts:187`	`toVersion = fromVersion + 1` is an inferred display value, not read from the store
Improvement	`refine.ts:171`	`as SynthesisResult` cast — at minimum a shape-presence guard on `'accepted'` key
Improvement	`reset.ts:46`	`Number.MAX_SAFE_INTEGER` limit deserializes every outcome into memory to count them
Nitpick	`reset.ts:101`, `refine.ts:92`	`static override flags` is at column 0 (missing 2-space indent inside class body)
Test	`reset.test.ts:147`	Deletion order not asserted — `calledBefore` would catch a future order regression
Test	`reset.test.ts:165`	Number-only assertions in `renderResetText` test — units could be stripped with no test failure

Notable: the stale-pin issue after reset

IHarnessStore exposes setPin/getPin but no way to delete a pin. After brv harness reset, the pin record remains on disk pointing to the deleted version UUID. SandboxService.loadHarness falls back gracefully (the interface doc says "pin is a preference, not a requirement"), so this won't crash. But it means:

brv harness status after reset may still surface a pin reference.
A user who resets and re-bootstraps will have an orphaned pin silently ignored on every session load until they explicitly run brv harness use.

The simplest fix is adding deletePin(projectId: string, commandType: string): Promise<void> to IHarnessStore (purely additive) and calling it at the end of executeReset. Alternatively, explicitly document the behaviour in the command's help text.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: da92297f44

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

- Fix indentation on static override flags in reset.ts and refine.ts - Add deletePin to IHarnessStore + HarnessStore + InMemoryHarnessStore - Wire deletePin into executeReset (spec: "Clear pin if present") - Add deletion order assertion (deleteOutcomes before deleteVersion) - Tighten renderResetText assertions to check labels not just numbers - Exit 2 in JSON mode when harness not enabled (consistent exit codes) - Add deletePin stubs to all test files with satisfies IHarnessStore

github-actions · 2026-04-23T03:58:16Z

Claude finished @RyanNg1403's task in 3m 29s —— View job

PR Review: feat: [ENG-2323] brv harness reset + refine commands

Tasks

Gather context and understand the request
Read changed files and analyze code
Review for correctness, security, and code quality
Post inline comments for specific issues
Post final summary with overall score

Summary

Overall score: 4.0 / 5 (up from 3.5 in the prior review)

The refactor commit addressed the most critical feedback from the prior review: deletePin is now called inside executeReset, the this.exit(2) is wired in for the JSON "harness not enabled" branch, indentation on static override flags is fixed, and the test suite now asserts deletion order with calledBefore and checks label text (not just numbers) in renderResetText. Good progress.

What was fixed since the prior review

Prior issue	Status
`executeReset` left stale pin on disk	✅ Fixed — `deletePin` called at end of reset
JSON "harness not enabled" path exited 0	✅ Fixed — `this.exit(2)` added
`static override flags` indentation	✅ Fixed in both `reset.ts` and `refine.ts`
Deletion order not asserted in test	✅ Fixed — `calledBefore` added
`renderResetText` test only checked numbers, not units	✅ Fixed — `include('3 version')` etc.

Remaining issues (with inline comments)

Severity	Location	Summary
Bug	`refine.ts:133`	`waitForTaskCompletion` returned promise not propagated — timeout/disconnect hangs `completionPromise` forever
Bug	`refine.ts:163`	Catch block JSON path returns without `this.exit(2)` — exits code 0 on failure
Suggestion	`refine.ts:177`	`'harness not enabled'` sentinel still stringly-typed between `agent-process.ts` and `refine.ts`
Suggestion	`refine.ts:171`	`as SynthesisResult` cast still has no runtime shape guard
Suggestion	`reset.ts:46`	`Number.MAX_SAFE_INTEGER` deserializes all outcomes to count them

The hanging promise (new finding)

waitForTaskCompletion returns a Promise<void> that rejects on timeout (text mode) or daemon disconnect. The current code wraps the call inside a new Promise<string> executor and relies on the onCompleted/onError callbacks to settle the outer promise — but those callbacks are only called on task:completed / task:error events. If waitForTaskCompletion itself rejects (e.g. the 60 s timeout fires, or the socket drops), neither callback fires, completionPromise never settles, and brv harness refine hangs until the OS kills it.

Fix — add .catch(reject) after the call:

waitForTaskCompletion(
  { ..., onCompleted: (r) => resolve(r.result ?? ''), onError: (r) => reject(...) },
  () => {},
).catch(reject)

Exit-code inconsistency in the catch block

The "harness not enabled" JSON branch now correctly calls this.exit(2) (fixed ✅). But the general error catch block (lines 157–165) still returns after logging JSON, exiting 0. Any automation that drives brv harness refine --format json and checks $? will see success even when the daemon throws.

- Propagate waitForTaskCompletion rejection via .catch(reject) to prevent completionPromise from hanging on timeout/disconnect - Exit code 2 in JSON error catch block (was exiting 0)

github-actions · 2026-04-23T04:06:23Z

Claude finished @RyanNg1403's task in 2m 4s —— View job

PR Review: feat: [ENG-2323] brv harness reset + refine commands

Tasks

Gather context and understand the request
Read changed files and analyze code
Review for correctness, security, and code quality
Post inline comments for specific issues
Post final summary with overall score

Summary

Overall score: 4.5 / 5 (up from 4.0 in the prior review)

The third commit (fd82ceb) closes the two remaining bugs from the second review round: the waitForTaskCompletion hanging promise is now correctly propagated via .catch(reject), and the general-error JSON catch block now exits with code 2 instead of 0. All critical correctness issues across three review rounds are resolved.

The remaining open items are all suggestions — none are blockers for merge.

What was fixed in this round

Prior issue	Status
`waitForTaskCompletion` promise not propagated — hangs on timeout/disconnect	✅ Fixed — `.catch(reject)` added at `refine.ts:144`
General error catch block JSON path returned and exited 0	✅ Fixed — `this.exit(2)` added at `refine.ts:161`

Complete fix history (all three rounds)

Issue	Round fixed
`executeReset` left stale pin on disk	Round 2
JSON "harness not enabled" branch exited 0	Round 2
`static override flags` indentation in both files	Round 2
Deletion order not asserted in test (`calledBefore`)	Round 2
`renderResetText` test only checked numbers, not labels	Round 2
`waitForTaskCompletion` returned promise discarded	Round 3
General error catch block JSON path exited 0	Round 3

Remaining suggestions (non-blocking)

'harness not enabled' is a stringly-typed cross-process protocol (refine.ts:178 ↔ agent-process.ts:609)

The string 'harness not enabled' is checked in refine.ts but produced in agent-process.ts. A typo or capitalisation fix in either file silently breaks the check. Extracting a shared constant to src/shared/ is purely additive and takes about 5 lines:

// src/shared/harness-constants.ts
export const HARNESS_NOT_ENABLED_REASON = 'harness not enabled' as const

Fix this →

as SynthesisResult cast has no runtime shape guard (refine.ts:172)

The try/catch handles malformed JSON, but a structurally wrong but valid JSON object (e.g. an older daemon returning {"ok":true}) would propagate with undefined properties silently. A minimal two-line guard:

const parsed: unknown = taskResult ? JSON.parse(taskResult) : undefined
synthesisResult =
  parsed !== null && typeof parsed === 'object' && 'accepted' in parsed
    ? (parsed as SynthesisResult)
    : undefined

This is low-risk given tight coupling with agent-process.ts, but worth hardening before the protocol stabilizes. Fix this →

Number.MAX_SAFE_INTEGER in countArtifacts (reset.ts:46)

Passing Number.MAX_SAFE_INTEGER to listOutcomes deserializes the entire outcome set into memory just for counting. In practice the recorder caps outcomes at ~200 per pair, so this is harmless. A comment explaining the bound assumption would make the intent clear without requiring a countOutcomes method:

// listOutcomes limit — recorder caps at maxOutcomes (default 200), so
// this is effectively "all" while still being a finite upper bound.
store.listOutcomes(projectId, commandType, 10_000),

toVersion = fromVersion + 1 is an inferred display value (refine.ts:189)

The displayed toVersion is computed as fromVersion + 1, not read from the store or the SynthesisResult. This holds for strictly sequential refinement but could display the wrong number if a concurrent write or manual saveVersion inserted a gap. Since it is display-only, the simplest hardening is a comment:

// toVersion is inferred — SynthesisResult carries toVersionId (UUID) but
// not the numeric version field. Re-read the store for an exact number if
// that ever matters.

Verdict

All blocking correctness issues are resolved. The four remaining items above are suggestions the author can address in this PR or defer to a follow-up. The code is clean, the outside-in design is sound, the bug fixes are well-motivated, and the test coverage is solid.

- Extract HARNESS_NOT_ENABLED_REASON shared constant to eliminate stringly-typed cross-process protocol between agent-process and refine command - Add runtime shape guard on JSON.parse result (check 'accepted' key) before casting to SynthesisResult - Replace Number.MAX_SAFE_INTEGER with bounded 10_000 limit in countArtifacts (recorder caps at ~200 per pair) - Add toVersion inference comment documenting the assumption

github-actions · 2026-04-23T04:15:16Z

Claude finished @RyanNg1403's task in 2m 12s —— View job

PR Review: feat: [ENG-2323] brv harness reset + refine commands

Tasks

Gather context and understand the request
Read changed files and analyze latest commit
Review for correctness, security, and code quality
Post inline comments for specific issues
Post final summary with overall score

Summary

Overall score: 5.0 / 5 (up from 4.5 in the prior review)

The fourth commit (d4a148d) closes every remaining suggestion from round 3. All seven issues across the three prior review rounds are now resolved, and the four round-3 suggestions have been fully addressed. The PR is ready to merge.

What was fixed in this round

Prior suggestion	Status
`'harness not enabled'` stringly-typed cross-process protocol	✅ Fixed — `HARNESS_NOT_ENABLED_REASON` const in `src/shared/constants/harness.ts`, imported in both `agent-process.ts` and `refine.ts`
`as SynthesisResult` unchecked cast	✅ Fixed — shape guard checks `parsed !== null && typeof parsed === 'object' && 'accepted' in parsed` before cast
`Number.MAX_SAFE_INTEGER` deserializes all outcomes	✅ Fixed — replaced with `10_000` with explanatory comment
`toVersion = fromVersion + 1` undocumented inference	✅ Fixed — comment added documenting the assumption and its limitation

Complete fix history (all four rounds)

Issue	Round fixed
`executeReset` left stale pin on disk	Round 2
JSON "harness not enabled" branch exited 0	Round 2
`static override flags` indentation in both files	Round 2
Deletion order not asserted in test (`calledBefore`)	Round 2
`renderResetText` test only checked numbers, not labels	Round 2
`waitForTaskCompletion` returned promise discarded	Round 3
General error catch block JSON path exited 0	Round 3
`'harness not enabled'` stringly-typed sentinel	Round 4
`as SynthesisResult` unchecked JSON cast	Round 4
`Number.MAX_SAFE_INTEGER` outcome deserialisation	Round 4
`toVersion` inference undocumented	Round 4

Non-blocking nitpicks (no action needed)

deletePin and deleteScenarios are in swapped section headers in harness-store.ts

At line 120, the section comment reads // ── scenarios ─────────────────────────────────────────────────────────────────────, but the first method under it is deletePin (line 122) — a pin operation, not a scenario operation. Conversely, deleteScenarios (line 160) sits under // ── versions ───────────────────────────────────────────────────────────────────────. The two batch-delete methods landed in each other's sections. Purely cosmetic, but worth fixing in a follow-up or here if you're already in the file.

deletePin uses get instead of exists for existence check (harness-store.ts:124)

const exists = await this.keyStorage.get(key)  // fetches and discards the value

deleteScenario (same file, line 138) uses this.keyStorage.exists(key) for the same pattern, which skips deserialisation. Since deletePin immediately discards the value, exists() would be more consistent and marginally more efficient. Not a correctness issue.

'refinement skipped' inline string in agent-process.ts:616

result = JSON.stringify(synthResult ?? {accepted: false, reason: 'refinement skipped'})

Now that HARNESS_NOT_ENABLED_REASON has been extracted as a shared constant, the 'refinement skipped' fallback string is the only other cross-boundary literal remaining. Unlike HARNESS_NOT_ENABLED_REASON, this string is not used as a sentinel for control flow in refine.ts — it flows through as display text — so the risk is lower. Still, extracting it as HARNESS_REFINEMENT_SKIPPED_REASON alongside the existing constant would be fully consistent.

Verdict

All blocking correctness issues are resolved. The three items above are cosmetic/consistency nits that can be addressed in a follow-up. The design is clean, the outside-in structure is sound, all bug fixes are well-motivated and correctly implemented, and the test suite is solid. Approve.

RyanNg1403 requested review from bao-byterover, cuongdo-byterover, danhdoan, leehpham and ngduyanhece as code owners April 23, 2026 03:35