Honor fidelity in burn plans (#108) by willwashburn · Pull Request #134 · AgentWorkforce/burn

willwashburn · 2026-04-26T19:33:59Z

Summary

computePlanUsage now annotates each cycle with a fidelity: { confidence, summary } block. confidence === 'high' only when every contributing turn is full, or usage-only with both per-turn input and output token coverage; otherwise it is low. Records without a fidelity field stay best-effort high, matching the codebase's existing pre-#41 backward-compat policy (#41: "Coverage and fidelity metadata: distinguish missing, zero, aggregate-only, and partial usage"). Spend totals continue to include partial / aggregate-only / cost-only contributions, because silently under-counting is worse than annotating low confidence, so spentUsd becomes a lower bound that the consumer renders against the new flag.
burn plans (list view) gains a confidence column when at least one plan has any low-confidence cycle, plus a footer note that names the affected plan and states the lower-bound caveat (e.g. note: claude-pro: 3 of 412 turns this cycle lack per-turn token data — totals are a lower bound.). Full-fidelity cycles render exactly as before: no extra column, no footer.
--json emits a per-plan usage.fidelity: { confidence, summary } block carrying the same FidelitySummary shape summarizeFidelity produces elsewhere.
PlanUsageFidelity is exported from @relayburn/analyze.
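
The confidence rule described above can be sketched as follows. This is a minimal illustration, not the code in this PR: the `Turn`, `TurnFidelity`, and `CycleUsage` shapes, the `hasInputTokens` / `hasOutputTokens` field names, and the `summarizeCycle` helper are all hypothetical. It also assumes that `full` turns count as high-confidence outright and that only `usage-only` turns need the per-axis check.

```typescript
// All names below are illustrative assumptions, not the real @relayburn/analyze types.
type FidelityClass = 'full' | 'usage-only' | 'partial' | 'aggregate-only' | 'cost-only';

interface TurnFidelity {
  class: FidelityClass;
  hasInputTokens: boolean;  // assumed per-axis coverage flag
  hasOutputTokens: boolean; // assumed per-axis coverage flag
}

interface Turn {
  costUsd: number;
  fidelity?: TurnFidelity; // absent on records from older ledger writers
}

interface CycleUsage {
  spentUsd: number; // a lower bound whenever confidence is 'low'
  confidence: 'high' | 'low';
  lowConfidenceTurns: number;
}

function isHighConfidenceTurn(turn: Turn): boolean {
  const f = turn.fidelity;
  if (f === undefined) return true; // backward-compat: unknown fidelity is best-effort high
  if (f.class === 'full') return true;
  if (f.class === 'usage-only') return f.hasInputTokens && f.hasOutputTokens;
  return false; // partial, aggregate-only, cost-only
}

function summarizeCycle(turns: Turn[]): CycleUsage {
  let spentUsd = 0;
  let lowConfidenceTurns = 0;
  for (const turn of turns) {
    spentUsd += turn.costUsd; // every contribution counts toward spend, whatever its fidelity
    if (!isHighConfidenceTurn(turn)) lowConfidenceTurns += 1;
  }
  // Note: an empty cycle comes out 'high' here; the PR text does not say what the real code returns.
  return { spentUsd, confidence: lowConfidenceTurns === 0 ? 'high' : 'low', lowConfidenceTurns };
}
```

A single low-fidelity turn is enough to flag the whole cycle, while its cost still lands in `spentUsd`; that is the "annotate rather than under-count" trade-off the summary describes.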

Devin Review fixes + rebase integration

Added missing root CHANGELOG.md entry for cross-package change (#108, "burn plans: honor fidelity (mark monthly spend totals low-confidence on partial usage)"). Per AGENTS.md, work spanning packages/analyze and packages/cli requires a root-level [Unreleased] entry.
Rebased on main after PR #131 ("Migrate burn plans rolling-window usage to archive (#91)", plans-from-archive) landed, resolving CHANGELOG, import, and test fixture conflicts.
Wired fidelity into planUsageFromArchive so the archive-backed path emits the same fidelity: { confidence, summary } block as the in-memory path. Queries attribution_fidelity / tokens_present / cost_present columns from the archive's turns table and synthesizes Fidelity objects matching the ledger's synthesizeFidelity logic. Added the three fidelity columns to the test fixture DDL.
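
As a rough sketch of what synthesizing a fidelity object from the three archive columns might look like: the column names come from the description above, but everything else is an assumption. The column types, the `ArchiveTurnRow` and `SynthesizedFidelity` shapes, and both helper names are hypothetical, and the real `synthesizeFidelity` semantics in the ledger may differ.

```typescript
// Hypothetical row shape: column names are from the PR text, types are assumed
// (SQLite has no boolean type, so flags are modeled as 0/1 integers or NULL).
interface ArchiveTurnRow {
  attribution_fidelity: string | null;
  tokens_present: number | null;
  cost_present: number | null;
}

// Hypothetical in-memory shape; not the real Fidelity type.
interface SynthesizedFidelity {
  class: string;
  tokensPresent: boolean;
  costPresent: boolean;
}

function fidelityFromArchiveRow(row: ArchiveTurnRow): SynthesizedFidelity | undefined {
  // Assumption: a NULL class means the row predates fidelity metadata,
  // so it is surfaced as "unknown" (undefined) rather than guessed.
  if (row.attribution_fidelity === null) return undefined;
  return {
    class: row.attribution_fidelity,
    tokensPresent: row.tokens_present === 1,
    costPresent: row.cost_present === 1,
  };
}

function rowIsHighConfidence(row: ArchiveTurnRow): boolean {
  const f = fidelityFromArchiveRow(row);
  if (f === undefined) return true; // unknown fidelity stays best-effort high
  if (f.class === 'full') return true;
  return f.class === 'usage-only' && f.tokensPresent;
}
```

In this sketch the archive carries a single `tokens_present` flag rather than separate input and output axes, so the archive-backed check is necessarily coarser than a per-axis one.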

Rebase consideration for PR #131

~~PR #131 (issue #91) is migrating burn plans to read from archive.sqlite.~~ Resolved — PR #131 has landed and this branch is rebased on top of it. The fidelity annotation flows through both the in-memory and archive-backed paths.

Review & Testing Checklist for Human

Verify root CHANGELOG.md entry reads well as a release note
Verify the archive fidelity integration matches the ledger's synthesizeFidelity semantics
Confirm the 3 additional Devin Review findings visible in the Devin Review UI are addressed or acceptable

Notes

pnpm run build — clean
All 32 analyze plan-usage tests pass (23 in-memory + 9 archive parity)
The branch is cleanly rebased on current main (includes PR #131, "Migrate burn plans rolling-window usage to archive (#91)")

Test plan

pnpm run build clean.
All analyze plan-usage tests pass (32 total: 23 in-memory + 9 archive parity).
New analyze tests cover: high-confidence cycle (all full), high-confidence cycle (usage-only with both axes), high-confidence cycle for unknown-fidelity (older ledger writers), low-confidence cycle (partial turn), cost-only contributions counted toward spend + flagged low-confidence, empty cycle, and out-of-cycle turns ignored.
New CLI tests cover: text table omits the column + footer when every cycle is full-fidelity, text table renders the confidence column + footer note when any cycle is low-confidence, and --json emits the usage.fidelity block with the right confidence / summary shape.
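
The text-table behaviour those CLI tests describe (no column and no footer when everything is full-fidelity, both when any cycle is low-confidence) could be modeled roughly like this. The `PlanRow` shape and the `confidenceExtras` helper are hypothetical; only the note wording is copied from the example in the summary.

```typescript
// Hypothetical per-plan view model; not the CLI's real types.
interface PlanRow {
  name: string;
  confidence: 'high' | 'low';
  lowTurns: number;   // turns lacking per-turn token data this cycle
  totalTurns: number; // all contributing turns this cycle
}

interface ConfidenceExtras {
  showColumn: boolean; // whether the table gains a confidence column
  notes: string[];     // one footer note per affected plan
}

function confidenceExtras(plans: PlanRow[]): ConfidenceExtras {
  const low = plans.filter((p) => p.confidence === 'low');
  return {
    // Full-fidelity output stays unchanged: no column unless something is low.
    showColumn: low.length > 0,
    notes: low.map(
      (p) =>
        `note: ${p.name}: ${p.lowTurns} of ${p.totalTurns} turns this cycle lack per-turn token data — totals are a lower bound.`,
    ),
  };
}
```

Deriving both the column and the footer from the same filtered list keeps the two in step, so a table never shows one without the other.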

Refs

Closes #108 — refs #41, #76 (which shipped summarizeFidelity / hasMinimumFidelity).

Link to Devin session: https://app.devin.ai/sessions/64f0c6f8e1cf4e7aad523f45a21c5aca
Requested by: @willwashburn

devin-ai-integration

Devin Review found 1 potential issue.

⚠️ 1 issue in files not directly in the diff

⚠️ Missing root CHANGELOG entry for cross-package change (AGENTS.md violation) (`CHANGELOG.md:7-11`)

This PR touches both packages/analyze (new PlanUsageFidelity type + deriveFidelity logic) and packages/cli (fidelity column + footer in burn plans list view + JSON). AGENTS.md states: "Update [Unreleased] only when the work spans packages or warrants a top-level summary; single-package work belongs only in that package's CHANGELOG." Since the work spans two packages, the root CHANGELOG.md's [Unreleased] section should have an entry for #108, but it does not (CHANGELOG.md:7-11).

View 3 additional findings in Devin Review.

devin-ai-integration

Devin Review found 1 new potential issue.

View 5 additional findings in Devin Review.

devin-ai-integration · 2026-04-27T13:48:32Z

 ### Added

 - **`compareFromArchive(query, opts)`** ([#88](https://github.com/AgentWorkforce/burn/issues/88)). New helper that builds a `CompareTable` directly from `archive.sqlite` via a single grouped `SELECT … GROUP BY model, activity, source` plus a tiny per-(model, activity) follow-up for median retries, instead of streaming every `EnrichedTurn` through `buildCompareTable` in memory. Returns `{ table, analyzedTurns }` so the caller can populate the same "turns analyzed" header the legacy path uses. Output is byte-identical to `buildCompareTable(await queryAll(q), opts)` for the parity fixture; per-source reasoning-mode handling (Codex's `included_in_output`) is preserved by grouping on `source` alongside `(model, activity)`. Powers the migration of `burn compare` to the archive read model.
+- **`PlanUsage.fidelity` annotates per-cycle token-coverage confidence** ([#108](https://github.com/AgentWorkforce/burn/issues/108)). `computePlanUsage` now walks every contributing turn through `summarizeFidelity` and emits a `{ confidence: 'high' | 'low', summary }` block alongside the existing spend/projection fields. `confidence === 'high'` only when every turn in the cycle is `full` or `usage-only` with both per-turn input and output token coverage; otherwise `low`. Records with no `fidelity` field at all (older ledger writers) are treated as best-effort high, matching the codebase's existing backward-compat policy. Spend totals continue to include `partial` / `aggregate-only` / `cost-only` contributions — under-counting is worse than annotating low-confidence — so the cycle's `spentUsd` is the lower bound the consumer renders against the new flag. The `PlanUsageFidelity` type is exported for downstream consumers.


🟡 Analyze CHANGELOG entry placed under already-released [0.31.0] instead of [Unreleased]

The new PlanUsage.fidelity entry is appended under the already-stamped [0.31.0] - 2026-04-27 section (line 15) instead of under [Unreleased] (line 8). The release commit 3164aaf already promoted the previous [Unreleased] block to [0.31.0], leaving [Unreleased] empty. AGENTS.md explicitly states: "Curate [Unreleased] in the relevant per-package packages/*/CHANGELOG.md as you land PRs." The root CHANGELOG.md and packages/cli/CHANGELOG.md correctly place their entries under [Unreleased], making this inconsistent. As written, the entry falsely claims the feature shipped in 0.31.0, and the publish workflow won't pick it up for the next release since it only promotes the [Unreleased] block.

Prompt for agents

The new changelog entry for PlanUsage.fidelity (issue #108) was added under the already-released [0.31.0] section at line 15 of packages/analyze/CHANGELOG.md. Per the AGENTS.md convention, new work should be placed under the [Unreleased] section (currently empty at line 8). Move the bullet from under [0.31.0] to under [Unreleased] with an ### Added subsection, matching how the root CHANGELOG.md and packages/cli/CHANGELOG.md handle the same feature's entry.


`computePlanUsage` now annotates each cycle with a `fidelity: { confidence, summary }` block computed over its contributing turns. `confidence === 'high'` only when every turn is `full` or `usage-only` with both per-turn input and output token coverage; otherwise `low`. Records without a `fidelity` field stay best-effort high (matches the codebase's existing backward-compat policy). Spend totals continue to include `partial` / `aggregate-only` / `cost-only` contributions (under-counting silently is worse than annotating low-confidence), so the cycle's `spentUsd` is the lower bound the consumer renders against the new flag.

`burn plans` (list view) renders a `confidence` column and a footer note (e.g. `note: claude-pro: 3 of 412 turns this cycle lack per-turn token data — totals are a lower bound.`) when at least one plan has any low-confidence cycle. Full-fidelity cycles render exactly as before. `--json` gains a per-plan `usage.fidelity` block. `PlanUsageFidelity` is exported from `@relayburn/analyze`.

The `limits.test.ts` mocks now include `fidelity` because `PlanUsage` gained a required field. Tests cover the high/low/cost-only/partial cycle paths in analyze, and the rendered-note + JSON shape in cli.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Co-Authored-By: Will Washburn <will.washburn@gmail.com>

…confidence Co-Authored-By: Will Washburn <will.washburn@gmail.com>

Co-Authored-By: Will Washburn <will.washburn@gmail.com>

devin-ai-integration Bot reviewed Apr 26, 2026


This was referenced Apr 26, 2026

burn summary: surface fidelity per-cell (partial-coverage notation, footer note, JSON block) #136 (Open)

Complete fidelity-aware command contracts #127 (Closed)

devin-ai-integration Bot force-pushed the feat/plans-honor-fidelity-108 branch 3 times, most recently from 1fd106a to 3020a2a · April 27, 2026 13:40

devin-ai-integration Bot reviewed Apr 27, 2026


willwashburn and others added 3 commits April 27, 2026 14:14

Add root CHANGELOG entry for cross-package fidelity work (#108)

f0126d3

Co-Authored-By: Will Washburn <will.washburn@gmail.com>

Wire fidelity into planUsageFromArchive so archive-backed plans emit …

edef2ed

…confidence Co-Authored-By: Will Washburn <will.washburn@gmail.com>

devin-ai-integration Bot force-pushed the feat/plans-honor-fidelity-108 branch from 3020a2a to edef2ed · April 27, 2026 14:18

Fix fidelity CLI tests: use --no-archive to test exact per-axis coverage

5212bb2

Co-Authored-By: Will Washburn <will.washburn@gmail.com>

willwashburn wants to merge 4 commits into main from feat/plans-honor-fidelity-108
