feat: [ENG-2323] brv harness reset + refine commands#534
feat: [ENG-2323] brv harness reset + refine commands#534RyanNg1403 merged 4 commits intoproj/autoharness-v2from
Conversation
Add two admin CLI commands for AutoHarness V2: - `brv harness reset` — wipes all harness state (versions, outcomes, scenarios) for a (projectId, commandType) pair. Interactive confirmation prompt by default, `--force` to skip. Supports `--format json` for scripts. - `brv harness refine` — force-triggers the Critic→Refiner→Evaluator synthesis pipeline via the daemon task system. Waits up to 60s for the result. Exits 2 when harness is not enabled. Also fixes four pre-existing bugs that prevented AutoHarness from working on real projects: 1. Daemon config plumbing: agent-process.ts now reads harness field from .brv/config.json into agentConfig. 2. projectId sanitization: agent-llm-service.ts, sandbox-service.ts, and harness-cli.ts now apply sanitizeProjectPath() before store operations (FileKeyStorage rejects path separators in keys). 3. Refiner httpConfig: service-initializer.ts receives httpConfigProvider from cipher-agent.ts so the byterover LLM provider gets apiBaseUrl. 4. Refine projectId: agent-process.ts sanitizes projectPath before passing to synthesizer.refineIfNeeded. Adds deleteScenarios and deleteVersion to IHarnessStore interface. Adds getPin/setPin stubs to upstream test files for cross-merge compat.
|
Claude finished @RyanNg1403's task in 4m 10s —— View job PR Review: feat: [ENG-2323] brv harness reset + refine commandsTasks
SummaryOverall score: 3.5 / 5 The PR delivers clear value: two well-structured commands, four real bug fixes with solid root-cause write-ups, and a clean outside-in design (commands drive the two new What works well
Issues found (with inline comments)
Notable: the stale-pin issue after reset
The simplest fix is adding |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: da92297f44
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
- Fix indentation on static override flags in reset.ts and refine.ts - Add deletePin to IHarnessStore + HarnessStore + InMemoryHarnessStore - Wire deletePin into executeReset (spec: "Clear pin if present") - Add deletion order assertion (deleteOutcomes before deleteVersion) - Tighten renderResetText assertions to check labels not just numbers - Exit 2 in JSON mode when harness not enabled (consistent exit codes) - Add deletePin stubs to all test files with satisfies IHarnessStore
|
Claude finished @RyanNg1403's task in 3m 29s —— View job PR Review: feat: [ENG-2323] brv harness reset + refine commandsTasks
SummaryOverall score: 4.0 / 5 (up from 3.5 in the prior review) The refactor commit addressed the most critical feedback from the prior review: What was fixed since the prior review
Remaining issues (with inline comments)
The hanging promise (new finding)
Fix — add waitForTaskCompletion(
{ ..., onCompleted: (r) => resolve(r.result ?? ''), onError: (r) => reject(...) },
() => {},
).catch(reject)Exit-code inconsistency in the catch blockThe "harness not enabled" JSON branch now correctly calls |
- Propagate waitForTaskCompletion rejection via .catch(reject) to prevent completionPromise from hanging on timeout/disconnect - Exit code 2 in JSON error catch block (was exiting 0)
|
Claude finished @RyanNg1403's task in 2m 4s —— View job PR Review: feat: [ENG-2323] brv harness reset + refine commandsTasks
SummaryOverall score: 4.5 / 5 (up from 4.0 in the prior review) The third commit ( The remaining open items are all suggestions — none are blockers for merge. What was fixed in this round
Complete fix history (all three rounds)
Remaining suggestions (non-blocking)
The string // src/shared/harness-constants.ts
export const HARNESS_NOT_ENABLED_REASON = 'harness not enabled' as const
The try/catch handles malformed JSON, but a structurally wrong but valid JSON object (e.g. an older daemon returning const parsed: unknown = taskResult ? JSON.parse(taskResult) : undefined
synthesisResult =
parsed !== null && typeof parsed === 'object' && 'accepted' in parsed
? (parsed as SynthesisResult)
: undefinedThis is low-risk given tight coupling with
Passing // listOutcomes limit — recorder caps at maxOutcomes (default 200), so
// this is effectively "all" while still being a finite upper bound.
store.listOutcomes(projectId, commandType, 10_000),
The displayed // toVersion is inferred — SynthesisResult carries toVersionId (UUID) but
// not the numeric version field. Re-read the store for an exact number if
// that ever matters.VerdictAll blocking correctness issues are resolved. The four remaining items above are suggestions the author can address in this PR or defer to a follow-up. The code is clean, the outside-in design is sound, the bug fixes are well-motivated, and the test coverage is solid. |
- Extract HARNESS_NOT_ENABLED_REASON shared constant to eliminate stringly-typed cross-process protocol between agent-process and refine command - Add runtime shape guard on JSON.parse result (check 'accepted' key) before casting to SynthesisResult - Replace Number.MAX_SAFE_INTEGER with bounded 10_000 limit in countArtifacts (recorder caps at ~200 per pair) - Add toVersion inference comment documenting the assumption
|
Claude finished @RyanNg1403's task in 2m 12s —— View job PR Review: feat: [ENG-2323] brv harness reset + refine commandsTasks
SummaryOverall score: 5.0 / 5 (up from 4.5 in the prior review) The fourth commit ( What was fixed in this round
Complete fix history (all four rounds)
Non-blocking nitpicks (no action needed)
At line 120, the section comment reads
const exists = await this.keyStorage.get(key) // fetches and discards the value
result = JSON.stringify(synthResult ?? {accepted: false, reason: 'refinement skipped'})Now that VerdictAll blocking correctness issues are resolved. The three items above are cosmetic/consistency nits that can be addressed in a follow-up. The design is clean, the outside-in structure is sound, all bug fixes are well-motivated and correctly implemented, and the test suite is solid. Approve. |
Summary
reset --force). Withoutrefine, users must wait for session-end auto-refinement.brv harness reset,brv harness refine),deleteScenariosanddeleteVersionadded toIHarnessStore,harness-refinetask type in daemon dispatch, and four pre-existing bug fixes that prevented AutoHarness from working on real projects.--all-pairsflag, no changes to the refinement pipeline itself (Critic/Refiner/Evaluator unchanged).Type of change
Scope (select all touched areas)
Linked issues
Root cause (bug fixes only, otherwise write
N/A)Four pre-existing bugs fixed alongside the new commands:
agent-process.tsnever read.brv/config.jsonharness field —AgentConfigSchema.default({})always setenabled: false./Users/foo/project) were passed asprojectIdtoFileKeyStorage, which rejects/in key segments. All store writes silently failed.createGeneratorForProvider('byterover', ...)didn't receivehttpConfig, causingCannot read properties of undefined (reading 'apiBaseUrl')crash.agent-process.tspassed rawprojectPathtosynthesizer.refineIfNeeded— store lookups returned empty because data was keyed by sanitized paths.These were not caught earlier because the specs didn't cover daemon→agent config boundary, projectId sanitization for FileKeyStorage, or byterover provider httpConfig requirements.
Test plan
test/unit/oclif/commands/harness/reset.test.ts(6 tests),test/unit/oclif/commands/harness/refine.test.ts(6 tests)User-visible changes
brv harness reset [--force] [--commandType curate|chat|query] [--format text|json]— new commandbrv harness refine [--commandType curate|chat|query] [--format text|json]— new commandbrv harness refineexits 2 with clear error when harness is not enabledEvidence
Unit tests: 7221 passing, 0 failing
Interactive CLI: 14 test cases all passing (reset cancel/confirm/force/empty, refine text/JSON/disabled, invalid flags, disk verification)
Checklist
npm test)npm run lint)npm run typecheck)npm run build)proj/autoharness-v2Risks and mitigations
deleteVersioniterates versions withno-await-in-loop(no batch API).eslint-disableon single line. Version count is capped atmaxVersions(default 20), so loop is bounded.as SynthesisResultassertion onJSON.parseoutput inrefine.ts.undefined. Pre-validated by agent-process serialization.