feat: [ENG-2324] --feedback flag on curate + query (Phase 7 Task 7.4)#532
feat: [ENG-2324] --feedback flag on curate + query (Phase 7 Task 7.4)#532danhdoan merged 3 commits intoproj/autoharness-v2from
Conversation
Closes brutal-review Tier 2 D4. Adds `--feedback <good|bad>` to both `brv curate` and `brv query` so users can flag the outcome the daemon just wrote, feeding Phase 6 Task 6.5's weighting policy (3x synthetic failures for `bad`, 1x synthetic success for `good`). Primary action runs first — a failed curate/query reports the primary failure, not a silent feedback drop. CLI surface: - `src/oclif/commands/curate/index.ts` — new `feedback` flag + `handleFeedback` after `submitTask`. Detach mode skips with a hint (no completed outcome yet). Disabled harness warns + exit 0. Missing outcome → exit 1. - `src/oclif/commands/query.ts` — same shape, adapted to the query command's flow. Shared lib: - `src/oclif/lib/harness-feedback.ts` — entry point `attachFeedbackFromCli(projectRoot, commandType, verdict)` + pure store variant `attachFeedbackToStore` for tests. Minimal `HarnessOutcomeRecorder` construction (NoOp logger + unused SessionEventBus) keeps the CLI delegating to Phase 6's canonical `attachFeedback` instead of re-implementing the 3x/1x policy. Real bug caught in test authoring: The first draft of the helper scanned `listOutcomes(..., limit=1)` for the most recent outcome, but synthetic rows (inserted with `Date.now()` timestamps) shadow the original real outcome. A re-label flow (`--feedback good` then `--feedback bad`) would target the earlier synthetic, not the original. Fix filters `__synthetic_` ids out of the "most recent" scan. Test 3 pins the correct behavior. Tests (6 new): - 3 synthetic rows for `bad`, 1 for `good` - Re-label clears previous synthetics, inserts new - Empty store → NO_RECENT_OUTCOME - Most-recent-by-timestamp selection - commandType partitioning (curate feedback leaves query outcomes untouched) Scope narrowing: - HARNESS_DISABLED / NO_STORAGE paths exercised indirectly via `readHarnessFeatureConfig` + `openHarnessStoreForProject` tests that already cover the filesystem/config side; the feedback helper composes them and doesn't duplicate coverage. - No per-outcome targeting (`--feedback-outcome <id>`) or explicit `--feedback none` — Phase 7.4 keeps the CLI minimal per the task-doc "out of scope" section. Acceptance criteria from task 7.4: - --feedback <good|bad> on curate + query - Most-recent lookup scoped to (projectId, commandType) - Phase 6 recordFeedback invoked with the right verdict - Repeat with different verdict replaces previous (idempotent) - No-outcome exits 1 - Harness-disabled primary still runs, warn, exit 0 - typecheck / lint / test / build clean
|
Claude finished @danhdoan's task in 5m 11s —— View job PR Review: feat: [ENG-2324] --feedback flag on curate + query (Phase 7 Task 7.4)Tasks
Overall Score: 3 / 5The feature is well-scoped and the core logic is sound. The synthetic-row shadowing fix is the highlight — the problem is subtle (synthetics carry That said there are two issues that need attention before merging: Issues🔴 Bug — incorrect exit code for
🟡 Missing JSON output for feedback
🟡 Duplicated constants / inaccurate
🟡 Lines 80–81 use 🔵 Hardcoded
🔵 Fragile message assertion in Test 4
What's good
|
Correctness:
- `handleFeedback` is now invoked AFTER `withDaemonRetry` resolves
in both curate and query. Previously it ran inside the daemon
callback, which meant `this.error({exit: 1})` from a
NO_RECENT_OUTCOME / NO_STORAGE path was caught by the outer
try/catch and routed to `reportError` — `reportError` only
logs, so the exit code was silently dropped. The fix captures
`projectRoot` into an outer variable and runs feedback on the
post-daemon path, so CLIError propagates cleanly.
- Feedback attaches only when the primary daemon call actually
succeeded (new `daemonSucceeded` guard). A failed primary
already skipped the feedback branch transitively, but pulling
it out of the try/catch required making that intent explicit.
JSON output:
- `--format json` now emits structured feedback events for both
the success path (`{outcomeId, syntheticCount, verdict}`) and
the HARNESS_DISABLED skip path (`{reason, skipped: true}`) in
both commands. Previously JSON consumers had no way to know
whether feedback was attached.
Constants dedup:
- Exported `BAD_SYNTHETIC_COUNT`, `GOOD_SYNTHETIC_COUNT`, and
`SYNTHETIC_DELIMITER` from `harness-outcome-recorder.ts` — the
canonical owner of the §C2 weighting policy. Feedback helper
imports them instead of re-declaring, so a future policy change
can't silently drift.
Not addressed:
- Pre-existing `as` casts on `query.ts:80-81` are not introduced
by this PR; fixing unrelated legacy casts would expand the
review surface.
- Hardcoded `language`/`maxVersions` in the CLI recorder are
documented as ignored by `attachFeedback`; adding a runtime
assertion has no behavioural benefit and would noise the log.
- Test-message substring matches are typical mocha-chai shape;
relaxing to code-only checks loses coverage of the
user-visible text without meaningfully reducing brittleness.
|
Claude finished @danhdoan's task in 1m 48s —— View job PR Review: feat: [ENG-2324] --feedback flag on curate + query (Phase 7 Task 7.4)Tasks
Overall Score: 4 / 5Strong improvement over the first draft. The three substantive issues from the prior review are all fixed. Two minor items remain. Previous Issues — Status✅ Fixed: Exit-code bug (was 🔴)
✅ Fixed: Missing JSON output for feedback (was 🟡)
✅ Fixed: Duplicated constants (was 🟡)
Remaining Issues🟡 // query.ts:80–81
const flags = rawFlags as QueryFlags
const format = (flags.format ?? 'text') as 'json' | 'text'These were flagged in the prior review and are unchanged. const feedbackVerdict: FeedbackVerdict | undefined =
rawFlags.feedback === 'good' || rawFlags.feedback === 'bad' ? rawFlags.feedback : undefined
const flags: QueryFlags = {
feedback: feedbackVerdict,
format: rawFlags.format === 'json' ? 'json' : rawFlags.format === 'text' ? 'text' : undefined,
timeout: rawFlags.timeout,
}
const format: 'json' | 'text' = flags.format ?? 'text'🔵 Fragile message assertion in Test 4 — still present // harness-feedback.test.ts:106
expect((caught as FeedbackError).message).to.include('run curate first')The Acknowledged (no change needed)
Hardcoded What's good
|
- Replace the two `as` casts in query.ts:80-81 with explicit typed
construction — mirrors the curate command's pattern and keeps
the file on the right side of CLAUDE.md's "avoid `as Type`"
rule. The new code paths I added in Task 7.4 touched this block,
so fixing the legacy casts now keeps the reviewed surface
consistent.
- Swap the fragile `.message.to.include('run curate first')`
substring assertion in harness-feedback.test.ts for a
structural check on `details.commandType`. The error-code check
on the previous line was already the strong assertion; the
substring added brittleness without coverage.
|
Claude finished @danhdoan's task in 1m 26s —— View job PR Review: feat: [ENG-2324] --feedback flag on curate + query (Phase 7 Task 7.4)Tasks
Overall Score: 5 / 5Both remaining issues from the previous review are fully resolved. No new issues introduced. This PR is ready to merge. Previous Issues — Status✅ Fixed: Lines 80–81 now use explicit type narrowing matching the curate command's pattern: // query.ts:83–89
const feedbackVerdict: FeedbackVerdict | undefined =
rawFlags.feedback === 'good' || rawFlags.feedback === 'bad' ? rawFlags.feedback : undefined
const flags: QueryFlags = {
feedback: feedbackVerdict,
format: rawFlags.format === 'json' ? 'json' : rawFlags.format === 'text' ? 'text' : undefined,
timeout: rawFlags.timeout,
}No ✅ Fixed: Fragile message assertion in Test 4 (was 🔵) The // harness-feedback.test.ts:104–106
expect(caught).to.be.instanceOf(FeedbackError)
expect((caught as FeedbackError).code).to.equal('NO_RECENT_OUTCOME')
expect((caught as FeedbackError).details.commandType).to.equal('curate')This is strictly better than the prior fix — it not only removes the brittle text check but adds a meaningful behavioural assertion ( Confirmed Good (unchanged from prior review)
|
Summary
Phase 7 Task 7.4 — closes brutal-review Tier 2 D4. Adds
`--feedback <good|bad>` to both `brv curate` and `brv query` so
users can flag the outcome the daemon just wrote, feeding Phase 6
Task 6.5's weighting policy (3× synthetic failures for `bad`,
1× synthetic success for `good`).
CLI surface
`handleFeedback` run after `submitTask`. Detach mode skips with
a hint (no completed outcome yet). Disabled harness → warn +
exit 0. Missing outcome → exit 1.
query command's flow.
Shared lib
`attachFeedbackFromCli(projectRoot, commandType, verdict)` +
pure store variant `attachFeedbackToStore` for tests. Minimal
`HarnessOutcomeRecorder` construction delegates to Phase 6's
canonical `attachFeedback` instead of re-implementing the 3×/1×
policy.
Bug caught in test authoring
First draft scanned `listOutcomes(..., limit=1)` for the most
recent outcome, but synthetic rows (inserted with `Date.now()`
timestamps) shadow the original real outcome. A re-label flow
(`--feedback good` then `--feedback bad`) would target the
earlier synthetic, not the original. Fix filters `_synthetic`
ids out of the scan. Test 3 pins the correct behaviour.
Test plan
outcomes untouched)
Scope narrowing
via existing `readHarnessFeatureConfig` + `openHarnessStoreForProject`
tests that already cover the filesystem/config side; the
feedback helper composes them and doesn't duplicate coverage.
`--feedback none` — kept minimal per the task-doc "out of
scope" section.
Acceptance criteria