Devin Review found 1 potential issue.
⚠️ 1 issue in files not directly in the diff
⚠️ Missing root CHANGELOG entry for cross-package change (AGENTS.md violation) (CHANGELOG.md:7-11)
This PR touches both packages/analyze (new PlanUsageFidelity type + deriveFidelity logic) and packages/cli (fidelity column + footer in burn plans list view + JSON). AGENTS.md states: "Update [Unreleased] only when the work spans packages or warrants a top-level summary; single-package work belongs only in that package's CHANGELOG." Since the work spans two packages, the root CHANGELOG.md's [Unreleased] section should have an entry for #108, but it does not (CHANGELOG.md:7-11).
Branch updated from 1fd106a to 3020a2a.
### Added

- **`compareFromArchive(query, opts)`** ([#88](https://github.com/AgentWorkforce/burn/issues/88)). New helper that builds a `CompareTable` directly from `archive.sqlite` via a single grouped `SELECT … GROUP BY model, activity, source` plus a tiny per-(model, activity) follow-up for median retries, instead of streaming every `EnrichedTurn` through `buildCompareTable` in memory. Returns `{ table, analyzedTurns }` so the caller can populate the same "turns analyzed" header the legacy path uses. Output is byte-identical to `buildCompareTable(await queryAll(q), opts)` for the parity fixture; per-source reasoning-mode handling (Codex's `included_in_output`) is preserved by grouping on `source` alongside `(model, activity)`. Powers the migration of `burn compare` to the archive read model.
- **`PlanUsage.fidelity` annotates per-cycle token-coverage confidence** ([#108](https://github.com/AgentWorkforce/burn/issues/108)). `computePlanUsage` now walks every contributing turn through `summarizeFidelity` and emits a `{ confidence: 'high' | 'low', summary }` block alongside the existing spend/projection fields. `confidence === 'high'` only when every turn in the cycle is `full` or `usage-only` with both per-turn input and output token coverage; otherwise `low`. Records with no `fidelity` field at all (older ledger writers) are treated as best-effort high, matching the codebase's existing backward-compat policy. Spend totals continue to include `partial` / `aggregate-only` / `cost-only` contributions — under-counting is worse than annotating low-confidence — so the cycle's `spentUsd` is the lower bound the consumer renders against the new flag. The `PlanUsageFidelity` type is exported for downstream consumers.
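The high/low rule in this entry reduces to a per-turn predicate plus a fold over the cycle. The TypeScript below is a minimal sketch under stated assumptions, not the `computePlanUsage` code: the `Turn` and `TurnFidelity` shapes, the `coverage`, `hasInputTokens`, and `hasOutputTokens` fields, and the `isHighFidelity` / `annotateCycle` names are hypothetical stand-ins for whatever the ledger records and `summarizeFidelity` actually use.

```typescript
// Hypothetical coverage classes, named after the values in the changelog entry.
type Coverage = 'full' | 'usage-only' | 'partial' | 'aggregate-only' | 'cost-only';

// Hypothetical per-turn fidelity record; older ledger writers omit it entirely.
interface TurnFidelity {
  coverage: Coverage;
  hasInputTokens: boolean;
  hasOutputTokens: boolean;
}

// Hypothetical turn shape: only the fields this sketch needs.
interface Turn {
  costUsd: number;
  fidelity?: TurnFidelity;
}

interface CycleAnnotation {
  spentUsd: number; // includes every turn, so it is a lower bound when confidence is 'low'
  confidence: 'high' | 'low';
  lowTurns: number; // turns that failed the high-confidence predicate
  totalTurns: number;
}

// A turn is high confidence when it has no fidelity record (best-effort
// backward compat) or is full/usage-only with both input and output token coverage.
function isHighFidelity(turn: Turn): boolean {
  const f = turn.fidelity;
  if (f === undefined) return true;
  return (
    (f.coverage === 'full' || f.coverage === 'usage-only') &&
    f.hasInputTokens &&
    f.hasOutputTokens
  );
}

function annotateCycle(turns: readonly Turn[]): CycleAnnotation {
  let spentUsd = 0;
  let lowTurns = 0;
  for (const turn of turns) {
    // Spend is never dropped, even for partial / aggregate-only / cost-only turns.
    spentUsd += turn.costUsd;
    if (!isHighFidelity(turn)) lowTurns += 1;
  }
  return {
    spentUsd,
    confidence: lowTurns === 0 ? 'high' : 'low',
    lowTurns,
    totalTurns: turns.length,
  };
}
```

In this sketch a single `cost-only` turn flips an otherwise clean cycle to `low` while its cost still lands in `spentUsd`, and an empty cycle comes out `high` because no turn fails the predicate.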
🟡 Analyze CHANGELOG entry placed under already-released [0.31.0] instead of [Unreleased]
The new PlanUsage.fidelity entry is appended under the already-stamped [0.31.0] - 2026-04-27 section (line 15) instead of under [Unreleased] (line 8). The release commit 3164aaf already promoted the previous [Unreleased] block to [0.31.0], leaving [Unreleased] empty. AGENTS.md explicitly states: "Curate [Unreleased] in the relevant per-package packages/*/CHANGELOG.md as you land PRs." The root CHANGELOG.md and packages/cli/CHANGELOG.md correctly place their entries under [Unreleased], making this inconsistent. As written, the entry falsely claims the feature shipped in 0.31.0, and the publish workflow won't pick it up for the next release since it only promotes the [Unreleased] block.
Prompt for agents
The new changelog entry for PlanUsage.fidelity (issue #108) was added under the already-released [0.31.0] section at line 15 of packages/analyze/CHANGELOG.md. Per the AGENTS.md convention, new work should be placed under the [Unreleased] section (currently empty at line 8). Move the bullet from under [0.31.0] to under [Unreleased] with an ### Added subsection, matching how the root CHANGELOG.md and packages/cli/CHANGELOG.md handle the same feature's entry.
`computePlanUsage` now annotates each cycle with a `fidelity:
{ confidence, summary }` block computed over its contributing turns.
`confidence === 'high'` only when every turn is `full` or `usage-only`
with both per-turn input and output token coverage; otherwise `low`.
Records without a `fidelity` field stay best-effort high (matches the
codebase's existing backward-compat policy). Spend totals continue to
include `partial` / `aggregate-only` / `cost-only` contributions
(silently under-counting is worse than flagging low confidence), so
the cycle's `spentUsd` is a lower bound that consumers render alongside
the new flag.
`burn plans` (list view) renders a `confidence` column and a footer
note (e.g. `note: claude-pro: 3 of 412 turns this cycle lack per-turn
token data — totals are a lower bound.`) when at least one plan has
any low-confidence cycle. Full-fidelity cycles render exactly as
before. `--json` gains a per-plan `usage.fidelity` block.
`PlanUsageFidelity` is exported from `@relayburn/analyze`. The
`limits.test.ts` mocks now include `fidelity` because `PlanUsage`
gained a required field.
Tests cover the high/low/cost-only/partial cycle paths in analyze,
and the rendered-note + JSON shape in cli.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Will Washburn <will.washburn@gmail.com>
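The list-view behavior described in this commit (an extra column only when something is low, one footer line per affected plan) can be sketched as two small helpers. This is an illustrative TypeScript sketch, not the `burn plans` renderer: `PlanRow`, `showConfidenceColumn`, and `footerNotes` are hypothetical, and the note template simply reproduces the example wording quoted in the commit message.

```typescript
// Hypothetical per-plan view model for the list view.
interface PlanRow {
  name: string;
  confidence: 'high' | 'low';
  lowTurns: number; // turns in the current cycle lacking per-turn token data
  totalTurns: number; // all turns in the current cycle
}

// Render the extra column only when at least one plan is low-confidence,
// so all-high output keeps the original layout.
function showConfidenceColumn(plans: readonly PlanRow[]): boolean {
  return plans.some((p) => p.confidence === 'low');
}

// One caveat line per low-confidence plan; high-confidence plans add nothing.
function footerNotes(plans: readonly PlanRow[]): string[] {
  return plans
    .filter((p) => p.confidence === 'low')
    .map(
      (p) =>
        `note: ${p.name}: ${p.lowTurns} of ${p.totalTurns} turns this cycle ` +
        'lack per-turn token data — totals are a lower bound.',
    );
}
```

With every plan at `high`, neither helper adds anything, which corresponds to the "render exactly as before" case.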
…confidence

Co-Authored-By: Will Washburn <will.washburn@gmail.com>
Branch updated from 3020a2a to edef2ed.
Co-Authored-By: Will Washburn <will.washburn@gmail.com>
Summary
- `computePlanUsage` now annotates each cycle with a `fidelity: { confidence, summary }` block. `confidence === 'high'` only when every contributing turn is `full` or `usage-only` with both per-turn input and output token coverage; otherwise `low`. Records without a `fidelity` field stay best-effort high (matches the codebase's existing pre-#41 backward-compat policy). Spend totals continue to include `partial` / `aggregate-only` / `cost-only` contributions (silently under-counting is worse than flagging low confidence), so `spentUsd` becomes a lower bound that consumers render alongside the new flag.
- `burn plans` (list view) gains a `confidence` column when at least one plan has any low-confidence cycle, and a footer note naming the affected plan plus a lower-bound caveat (`note: claude-pro: 3 of 412 turns this cycle lack per-turn token data — totals are a lower bound.`). Full-fidelity cycles render exactly as before: no extra column, no footer.
- `--json` emits a per-plan `usage.fidelity: { confidence, summary }` block carrying the same `FidelitySummary` shape that `summarizeFidelity` produces elsewhere.
- `PlanUsageFidelity` is exported from `@relayburn/analyze`.

Devin Review fixes + rebase integration
- Root `CHANGELOG.md` entry for the cross-package change (#108): per AGENTS.md, work spanning `packages/analyze` and `packages/cli` requires a root-level `[Unreleased]` entry.
- Rebased onto `main` after PR #131 (plans-from-archive) landed, resolving CHANGELOG, import, and test fixture conflicts.
- `planUsageFromArchive` now emits the same `fidelity: { confidence, summary }` block as the in-memory path. It queries the `attribution_fidelity` / `tokens_present` / `cost_present` columns from the archive's `turns` table and synthesizes `Fidelity` objects matching the ledger's `synthesizeFidelity` logic. The three fidelity columns were added to the test fixture DDL.

Rebase consideration for PR #131
PR #131 (issue #91) was migrating `burn plans` to read from `archive.sqlite`.
Resolved: PR #131 has landed and this branch is rebased on top of it. The fidelity annotation flows through both the in-memory and archive-backed paths.

Review & Testing Checklist for Human
- `CHANGELOG.md` entry reads well as a release note
- … `synthesizeFidelity` semantics

Notes
- `pnpm run build`: clean
- Rebased on `main` (includes #131)

Test plan
- `pnpm run build` clean.
- … `cost-only` contributions counted toward spend and flagged low-confidence, empty cycle, and out-of-cycle turns ignored.
- … `confidence` column + footer note when any cycle is low-confidence, and `--json` emits the `usage.fidelity` block with the right `confidence`/`summary` shape.

Refs
Closes #108; refs #41, #76 (which shipped `summarizeFidelity`/`hasMinimumFidelity`).

Link to Devin session: https://app.devin.ai/sessions/64f0c6f8e1cf4e7aad523f45a21c5aca
Requested by: @willwashburn