Honor fidelity in burn plans (#108) #134

Open
willwashburn wants to merge 4 commits into main from feat/plans-honor-fidelity-108

@willwashburn willwashburn commented Apr 26, 2026

Summary

  • computePlanUsage now annotates each cycle with a fidelity: { confidence, summary } block. confidence === 'high' only when every contributing turn is full or usage-only with both per-turn input and output token coverage; otherwise low. Records without a fidelity field stay best-effort high, matching the codebase's existing pre-#41 backward-compat policy (#41: "Coverage and fidelity metadata: distinguish missing, zero, aggregate-only, and partial usage"). Spend totals continue to include partial / aggregate-only / cost-only contributions, because under-counting silently is worse than annotating low-confidence, so spentUsd becomes a lower bound that the consumer renders alongside the new flag.
  • burn plans (list view) gains a confidence column when at least one plan has any low-confidence cycle, and a footer note naming the affected plan + lower-bound caveat (note: claude-pro: 3 of 412 turns this cycle lack per-turn token data — totals are a lower bound.). Full-fidelity cycles render exactly as before — no extra column, no footer.
  • --json emits a per-plan usage.fidelity: { confidence, summary } block carrying the same FidelitySummary shape summarizeFidelity produces elsewhere.
  • PlanUsageFidelity is exported from @relayburn/analyze.
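
The confidence rule in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `Turn` and `TurnFidelity` shapes, the `hasInputTokens` / `hasOutputTokens` flags, and the `summarizeCycle` helper are hypothetical names, and the real `computePlanUsage` routes turns through `summarizeFidelity`. The sketch also assumes that `full` turns always qualify as high, that `usage-only` turns qualify only with both token axes present, and that an empty cycle reports `high`.

```typescript
// Hypothetical shapes for illustration only; the real types live in @relayburn/analyze.
type FidelityClass = 'full' | 'usage-only' | 'partial' | 'aggregate-only' | 'cost-only';

interface TurnFidelity {
  class: FidelityClass;
  hasInputTokens: boolean;  // assumed flag: per-turn input token count recorded
  hasOutputTokens: boolean; // assumed flag: per-turn output token count recorded
}

interface Turn {
  costUsd: number;
  fidelity?: TurnFidelity; // absent on records from older ledger writers
}

type Confidence = 'high' | 'low';

function isHighConfidenceTurn(turn: Turn): boolean {
  const f = turn.fidelity;
  if (f === undefined) return true; // no fidelity field: best-effort high
  if (f.class === 'full') return true;
  if (f.class === 'usage-only') return f.hasInputTokens && f.hasOutputTokens;
  return false; // partial, aggregate-only, cost-only
}

function summarizeCycle(turns: Turn[]): {
  spentUsd: number;
  confidence: Confidence;
  lowConfidenceTurns: number;
} {
  let spentUsd = 0;
  let lowConfidenceTurns = 0;
  for (const turn of turns) {
    spentUsd += turn.costUsd; // every contribution counts toward spend
    if (!isHighConfidenceTurn(turn)) lowConfidenceTurns += 1;
  }
  const confidence: Confidence = lowConfidenceTurns === 0 ? 'high' : 'low';
  return { spentUsd, confidence, lowConfidenceTurns };
}
```

Under these assumptions a single `cost-only` turn is enough to flag the whole cycle as `low`, while its cost still lands in `spentUsd`.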

Devin Review fixes + rebase integration

  • Added missing root CHANGELOG.md entry for the cross-package change (#108, "burn plans: honor fidelity (mark monthly spend totals low-confidence on partial usage)"). Per AGENTS.md, work spanning packages/analyze and packages/cli requires a root-level [Unreleased] entry.
  • Rebased on main after PR #131 (plans-from-archive, "Migrate burn plans rolling-window usage to archive (#91)") landed, resolving CHANGELOG, import, and test fixture conflicts.
  • Wired fidelity into planUsageFromArchive so the archive-backed path emits the same fidelity: { confidence, summary } block as the in-memory path. Queries attribution_fidelity / tokens_present / cost_present columns from the archive's turns table and synthesizes Fidelity objects matching the ledger's synthesizeFidelity logic. Added the three fidelity columns to the test fixture DDL.
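
As a rough picture of the archive-side wiring, the sketch below turns one `turns` row into a fidelity-like object. Only the three column names come from this PR. The column types, the `ArchiveTurnRow` and `SynthesizedFidelity` shapes, the `rowToFidelity` helper, and the NULL handling are all assumptions for illustration; the ledger's actual `synthesizeFidelity` logic is not reproduced here.

```typescript
// Assumed row shape: only the three column names are taken from the PR description.
interface ArchiveTurnRow {
  attribution_fidelity: string | null; // assumed TEXT, NULL for rows written before fidelity existed
  tokens_present: number | null;       // assumed 0/1 flag
  cost_present: number | null;         // assumed 0/1 flag
}

// Hypothetical stand-in for the Fidelity object the in-memory path sees.
interface SynthesizedFidelity {
  class: string;
  tokensPresent: boolean;
  costPresent: boolean;
}

function rowToFidelity(row: ArchiveTurnRow): SynthesizedFidelity | undefined {
  // Assumption: a NULL class means "unknown", which downstream code treats as best-effort high.
  if (row.attribution_fidelity === null) return undefined;
  return {
    class: row.attribution_fidelity,
    tokensPresent: row.tokens_present === 1,
    costPresent: row.cost_present === 1,
  };
}
```

Returning `undefined` for legacy rows is meant to mirror the "records without a fidelity field" case, so both paths could share one confidence rule.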

Rebase consideration for PR #131

PR #131 (issue #91) migrated burn plans to read from archive.sqlite. Resolved: PR #131 has landed and this branch is rebased on top of it. The fidelity annotation flows through both the in-memory and archive-backed paths.

Review & Testing Checklist for Human

  • Verify root CHANGELOG.md entry reads well as a release note
  • Verify the archive fidelity integration matches the ledger's synthesizeFidelity semantics
  • Confirm the 3 additional Devin Review findings visible in the Devin Review UI are addressed or acceptable

Notes

Test plan

  • pnpm run build clean.
  • All analyze plan-usage tests pass (32 total: 23 in-memory + 9 archive parity).
  • New analyze tests cover: high-confidence cycle (all full), high-confidence cycle (usage-only with both axes), high-confidence cycle for unknown-fidelity (older ledger writers), low-confidence cycle (partial turn), cost-only contributions counted toward spend + flagged low-confidence, empty cycle, and out-of-cycle turns ignored.
  • New CLI tests cover: text table omits the column + footer when every cycle is full-fidelity, text table renders the confidence column + footer note when any cycle is low-confidence, and --json emits the usage.fidelity block with the right confidence / summary shape.
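
The list-view behavior the CLI tests exercise can be pictured with the sketch below. It is an approximation under stated assumptions: `PlanRow`, `renderPlans`, and the tab-separated layout are hypothetical, and only the footer note wording is taken from the summary above.

```typescript
// Hypothetical row shape; the real CLI works from PlanUsage values.
interface PlanRow {
  name: string;
  spentUsd: number;
  confidence: 'high' | 'low';
  lowTurns: number;   // turns lacking per-turn token data
  totalTurns: number; // all turns in the cycle
}

function renderPlans(rows: PlanRow[]): string {
  // The extra column appears only when at least one plan is low-confidence.
  const showConfidence = rows.some((r) => r.confidence === 'low');
  const lines: string[] = [];
  lines.push(showConfidence ? 'plan\tspent\tconfidence' : 'plan\tspent');
  for (const r of rows) {
    const base = `${r.name}\t$${r.spentUsd.toFixed(2)}`;
    lines.push(showConfidence ? `${base}\t${r.confidence}` : base);
  }
  // One footer note per affected plan, carrying the lower-bound caveat.
  for (const r of rows) {
    if (r.confidence === 'low') {
      lines.push(
        `note: ${r.name}: ${r.lowTurns} of ${r.totalTurns} turns this cycle lack per-turn token data — totals are a lower bound.`,
      );
    }
  }
  return lines.join('\n');
}
```

With every row at `high`, the output has no `confidence` column and no `note:` line, which is the "renders exactly as before" case.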

Refs

Closes #108 — refs #41, #76 (which shipped summarizeFidelity / hasMinimumFidelity).

Link to Devin session: https://app.devin.ai/sessions/64f0c6f8e1cf4e7aad523f45a21c5aca
Requested by: @willwashburn


@devin-ai-integration devin-ai-integration Bot left a comment

Devin Review found 1 potential issue.

⚠️ 1 issue in files not directly in the diff

⚠️ Missing root CHANGELOG entry for cross-package change (AGENTS.md violation) (CHANGELOG.md:7-11)

This PR touches both packages/analyze (new PlanUsageFidelity type + deriveFidelity logic) and packages/cli (fidelity column + footer in burn plans list view + JSON). AGENTS.md states: "Update [Unreleased] only when the work spans packages or warrants a top-level summary; single-package work belongs only in that package's CHANGELOG." Since the work spans two packages, the root CHANGELOG.md's [Unreleased] section should have an entry for #108, but it does not (CHANGELOG.md:7-11).

View 3 additional findings in Devin Review.

@devin-ai-integration devin-ai-integration Bot left a comment

Devin Review found 1 new potential issue.

View 5 additional findings in Devin Review.

### Added

- **`compareFromArchive(query, opts)`** ([#88](https://github.com/AgentWorkforce/burn/issues/88)). New helper that builds a `CompareTable` directly from `archive.sqlite` via a single grouped `SELECT … GROUP BY model, activity, source` plus a tiny per-(model, activity) follow-up for median retries, instead of streaming every `EnrichedTurn` through `buildCompareTable` in memory. Returns `{ table, analyzedTurns }` so the caller can populate the same "turns analyzed" header the legacy path uses. Output is byte-identical to `buildCompareTable(await queryAll(q), opts)` for the parity fixture; per-source reasoning-mode handling (Codex's `included_in_output`) is preserved by grouping on `source` alongside `(model, activity)`. Powers the migration of `burn compare` to the archive read model.
- **`PlanUsage.fidelity` annotates per-cycle token-coverage confidence** ([#108](https://github.com/AgentWorkforce/burn/issues/108)). `computePlanUsage` now walks every contributing turn through `summarizeFidelity` and emits a `{ confidence: 'high' | 'low', summary }` block alongside the existing spend/projection fields. `confidence === 'high'` only when every turn in the cycle is `full` or `usage-only` with both per-turn input and output token coverage; otherwise `low`. Records with no `fidelity` field at all (older ledger writers) are treated as best-effort high, matching the codebase's existing backward-compat policy. Spend totals continue to include `partial` / `aggregate-only` / `cost-only` contributions — under-counting is worse than annotating low-confidence — so the cycle's `spentUsd` is the lower bound the consumer renders against the new flag. The `PlanUsageFidelity` type is exported for downstream consumers.

🟡 Analyze CHANGELOG entry placed under already-released [0.31.0] instead of [Unreleased]

The new PlanUsage.fidelity entry is appended under the already-stamped [0.31.0] - 2026-04-27 section (line 15) instead of under [Unreleased] (line 8). The release commit 3164aaf already promoted the previous [Unreleased] block to [0.31.0], leaving [Unreleased] empty. AGENTS.md explicitly states: "Curate [Unreleased] in the relevant per-package packages/*/CHANGELOG.md as you land PRs." The root CHANGELOG.md and packages/cli/CHANGELOG.md correctly place their entries under [Unreleased], making this inconsistent. As written, the entry falsely claims the feature shipped in 0.31.0, and the publish workflow won't pick it up for the next release since it only promotes the [Unreleased] block.

Prompt for agents
The new changelog entry for PlanUsage.fidelity (issue #108) was added under the already-released [0.31.0] section at line 15 of packages/analyze/CHANGELOG.md. Per the AGENTS.md convention, new work should be placed under the [Unreleased] section (currently empty at line 8). Move the bullet from under [0.31.0] to under [Unreleased] with an ### Added subsection, matching how the root CHANGELOG.md and packages/cli/CHANGELOG.md handle the same feature's entry.

willwashburn and others added 3 commits April 27, 2026 14:14
`computePlanUsage` now annotates each cycle with a `fidelity:
{ confidence, summary }` block computed over its contributing turns.
`confidence === 'high'` only when every turn is `full` or `usage-only`
with both per-turn input and output token coverage; otherwise `low`.
Records without a `fidelity` field stay best-effort high (matches the
codebase's existing backward-compat policy). Spend totals continue to
include `partial` / `aggregate-only` / `cost-only` contributions —
under-counting silently is worse than annotating low-confidence — so
the cycle's `spentUsd` is the lower bound the consumer renders against
the new flag.

`burn plans` (list view) renders a `confidence` column and a footer
note (e.g. `note: claude-pro: 3 of 412 turns this cycle lack per-turn
token data — totals are a lower bound.`) when at least one plan has
any low-confidence cycle. Full-fidelity cycles render exactly as
before. `--json` gains a per-plan `usage.fidelity` block.

`PlanUsageFidelity` is exported from `@relayburn/analyze`. The
`limits.test.ts` mocks now include `fidelity` because `PlanUsage`
gained a required field.

Tests cover the high/low/cost-only/partial cycle paths in analyze,
and the rendered-note + JSON shape in cli.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Will Washburn <will.washburn@gmail.com>
…confidence

Co-Authored-By: Will Washburn <will.washburn@gmail.com>
@devin-ai-integration devin-ai-integration Bot force-pushed the feat/plans-honor-fidelity-108 branch from 3020a2a to edef2ed Compare April 27, 2026 14:18
Co-Authored-By: Will Washburn <will.washburn@gmail.com>