
Harden token savings advisory reports #200

Merged
ictechgy merged 2 commits into main from ultragoal/token-savings-batch1-followup on Jun 14, 2026


@ictechgy

Owner

Summary

  • Add cache-score amortization advisory fields that use only explicit, user-supplied cache write/read multipliers.
  • Extend the tool-prune defer-report with gross/net accounting of deferred tool schemas, using a char/4 token proxy (character count / 4), plus retrieval-required boundaries.
  • Add a measurement-baseline contract to the benchmark report that documents captured fields, claim-eligible fields, proxy-only fields, and future run-identity gaps, without changing the CSV schema.
  • Refresh the README, plugin, and kit docs and the changelog while preserving the no-provider-call and no-hosted-savings-claim boundaries.
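The char/4 proxy estimates tokens from character counts instead of calling a tokenizer. Below is a minimal sketch of gross/net accounting under that proxy. It assumes that "net" means the gross proxy savings minus the cost of the short stubs left in the prompt for deferred tools. The names `char4_proxy`, `defer_report`, `stub_text`, and the output keys are hypothetical and do not come from tool_schema_pruner.py.

```python
import json
import math
from typing import Dict, List

CHARS_PER_TOKEN = 4  # "char/4" proxy: assume roughly four characters per token


def char4_proxy(text: str) -> int:
    """Estimate tokens as ceil(len(text) / 4). A proxy, not a tokenizer count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def defer_report(schemas: Dict[str, dict], deferred: List[str], stub_text: str) -> dict:
    """Hypothetical gross/net proxy accounting for deferring tool schemas.

    gross: proxy tokens of the full schemas removed from the prompt.
    net:   gross minus the proxy tokens of the stubs that stay in the prompt
           so the model still knows each deferred tool exists.
    """
    # Serialize deterministically so the proxy is stable across runs.
    gross = sum(char4_proxy(json.dumps(schemas[name], sort_keys=True)) for name in deferred)
    stubs = sum(char4_proxy(stub_text.format(name=name)) for name in deferred)
    return {
        "gross_deferred_proxy_tokens": gross,
        "net_deferred_proxy_tokens": gross - stubs,
        # Deferred tools must be retrieved before the model can call them.
        "retrieval_required": sorted(deferred),
        # Proxy numbers, not measured provider tokens: never a savings claim.
        "claim_eligible": False,
    }
```

For example, deferring a schema whose JSON serializes to 10 characters behind a 6-character stub gives 3 gross and 1 net proxy tokens.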

Validation

  • PYTHONDONTWRITEBYTECODE=1 python3 -m py_compile context-guard-kit/cache_score.py context-guard-kit/tool_schema_pruner.py context-guard-kit/benchmark_runner.py tests/test_context_guard_kit.py
  • PYTHONDONTWRITEBYTECODE=1 python3 -m unittest tests.test_context_guard_kit.ClaudeTokenKitTests -k cache_score
  • PYTHONDONTWRITEBYTECODE=1 python3 -m unittest tests.test_context_guard_kit.ClaudeTokenKitTests -k tool_prune
  • PYTHONDONTWRITEBYTECODE=1 python3 -m unittest tests.test_context_guard_kit.BenchmarkRunnerTests -k benchmark_report
  • python3 scripts/sync_plugin_copies.py --check
  • git diff --check
  • PYTHONDONTWRITEBYTECODE=1 python3 scripts/prepublish_check.py --skip-tests
  • PYTHONDONTWRITEBYTECODE=1 python3 scripts/release_smoke.py --timeout 20
  • PYTHONDONTWRITEBYTECODE=1 python3 scripts/prepublish_check.py — 691 tests OK
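The validation steps above can be scripted. Below is a minimal sketch that runs them in the listed order. It assumes it is launched from the repository root with python3 on PATH. `COMMANDS` and `run_all` are hypothetical helpers, not part of the repo. The sketch sets PYTHONDONTWRITEBYTECODE=1 for every step, whereas the PR sets it only on the Python steps; git ignores the variable.

```python
import os
import shlex
import subprocess
from typing import List

# Validation steps from the PR description, run from the repository root.
COMMANDS: List[str] = [
    "python3 -m py_compile context-guard-kit/cache_score.py "
    "context-guard-kit/tool_schema_pruner.py context-guard-kit/benchmark_runner.py "
    "tests/test_context_guard_kit.py",
    "python3 -m unittest tests.test_context_guard_kit.ClaudeTokenKitTests -k cache_score",
    "python3 -m unittest tests.test_context_guard_kit.ClaudeTokenKitTests -k tool_prune",
    "python3 -m unittest tests.test_context_guard_kit.BenchmarkRunnerTests -k benchmark_report",
    "python3 scripts/sync_plugin_copies.py --check",
    "git diff --check",
    "python3 scripts/prepublish_check.py --skip-tests",
    "python3 scripts/release_smoke.py --timeout 20",
    "python3 scripts/prepublish_check.py",
]


def run_all(commands: List[str], dry_run: bool = False) -> List[List[str]]:
    """Run each step in order with PYTHONDONTWRITEBYTECODE=1; stop at the first failure."""
    env = dict(os.environ, PYTHONDONTWRITEBYTECODE="1")
    argvs = [shlex.split(cmd) for cmd in commands]
    if not dry_run:
        for argv in argvs:
            # check=True raises CalledProcessError on a non-zero exit status.
            subprocess.run(argv, check=True, env=env)
    return argvs
```

Calling `run_all(COMMANDS, dry_run=True)` returns the argument lists without executing anything.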

Claim boundary

This PR is advisory/local-only. It does not add provider calls, bundled pricing defaults, native provider tool-search configuration, lossy compression, or hosted API token/cost savings claims.

@ictechgy

Owner Author

Quad review + validation evidence

Local validation (before the PR and after the R2 fix):

  • python3 scripts/sync_plugin_copies.py --check — OK
  • git diff --check — OK
  • PYTHONDONTWRITEBYTECODE=1 python3 -m py_compile context-guard-kit/cache_score.py context-guard-kit/tool_schema_pruner.py context-guard-kit/benchmark_runner.py tests/test_context_guard_kit.py — OK
  • PYTHONDONTWRITEBYTECODE=1 python3 -m unittest tests.test_context_guard_kit.ClaudeTokenKitTests -k cache_score — 3 tests OK
  • PYTHONDONTWRITEBYTECODE=1 python3 -m unittest tests.test_context_guard_kit.ClaudeTokenKitTests -k tool_prune — 14 tests OK
  • PYTHONDONTWRITEBYTECODE=1 python3 -m unittest tests.test_context_guard_kit.BenchmarkRunnerTests -k benchmark_report — 13 tests OK
  • PYTHONDONTWRITEBYTECODE=1 python3 scripts/prepublish_check.py --skip-tests — OK
  • PYTHONDONTWRITEBYTECODE=1 python3 scripts/release_smoke.py --timeout 20 — OK
  • PYTHONDONTWRITEBYTECODE=1 python3 scripts/prepublish_check.py — 691 tests OK

Quad review loop:

  • Codex R1: REQUEST_CHANGES for a MEDIUM issue in cache amortization risk accounting.
  • Forge R1: APPROVE, with a LOW note on the same cache read-premium risk; accepted and fixed together with the Codex MEDIUM.
  • Agy R1: APPROVE.
  • Claude R1: the full-diff run produced no usable output, so Claude was re-run on the R2 fix diff.
  • R2 fix: cache-score now compares expected cached vs. uncached relative cost and returns no_read_discount/high for negative-savings cases where writes are cheaper but reads are more expensive; the measurement baseline now lists primary_tokens_measured.
  • Codex R2: APPROVE.
  • Claude R2: APPROVE.
  • Forge R2: APPROVE.
  • Agy R2: APPROVE.
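The R2 fix compares expected cached and uncached relative cost. Below is a minimal sketch of that comparison under a simple assumed model: the first use of a cached prefix pays the write multiplier, and every later reuse pays the read multiplier, both relative to an uncached price of 1.0 per use. The names `cache_relative_cost` and its output keys are hypothetical, as is every status/risk label other than `no_read_discount`/`high`. None of these come from cache_score.py.

```python
from typing import Dict, Union


def cache_relative_cost(reuses: int, write_mult: float, read_mult: float) -> Dict[str, Union[float, str]]:
    """Compare expected cached vs. uncached cost for one cacheable prefix.

    Costs are relative to the uncached input price (1.0 per use). The first use
    pays write_mult; each of the `reuses` later uses pays read_mult. Both
    multipliers must be supplied by the caller: there are no pricing defaults.
    """
    if reuses < 0 or write_mult < 0 or read_mult < 0:
        raise ValueError("reuses and multipliers must be non-negative")
    uncached = 1.0 + reuses
    cached = write_mult + reuses * read_mult
    savings = uncached - cached
    if savings > 0:
        status, risk = "amortized", "low"  # hypothetical labels
    elif read_mult >= 1:
        # Reads cost at least as much as uncached input, so more reuse cannot help.
        status, risk = "no_read_discount", "high"
    else:
        status, risk = "not_amortized", "medium"  # hypothetical labels
    return {
        "uncached_relative_cost": uncached,
        "cached_relative_cost": cached,
        "relative_savings": savings,
        "status": status,
        "risk": risk,
    }
```

With write 0.5 and read 2.0, a single use is cheaper cached (0.5 vs. 1.0). After two reuses, however, the cached cost is 4.5 against 3.0 uncached. This is the write-cheaper/read-more-expensive case that gets flagged as `no_read_discount`/`high`.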

PR CI:

  • test-and-prepublish (3.11) — pass
  • test-and-prepublish (3.12) — pass
  • test-and-prepublish (macos-latest, 3.12) — pass

No unresolved CRITICAL/HIGH blockers and no accepted unresolved MEDIUM blockers remain.

ictechgy merged commit adf79b7 into main on Jun 14, 2026
3 checks passed
ictechgy deleted the ultragoal/token-savings-batch1-followup branch on June 14, 2026 13:44
ictechgy added a commit that referenced this pull request on Jun 14, 2026
Follow-up to PR #200 final review: clarifies read-premium cache-score amortization by removing monotonic break-even semantics, adding positive-only max_profitable_reuses, and covering exact/decimal break-even plus zero-read cases.

Validation:
- sync_plugin_copies.py --check
- py_compile changed files
- cache_score unittest subset (3 tests)
- prepublish_check.py --skip-tests
- release_smoke.py --timeout 20
- full prepublish_check.py (691 tests)
- PR CI green
- final code review/architect review clear
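The follow-up commit drops monotonic break-even semantics for read-premium multipliers and adds a positive-only max_profitable_reuses. Below is a minimal sketch of that arithmetic under an assumed model: the first use pays the write multiplier w, each reuse pays the read multiplier r, and an uncached use costs 1. Under this model, savings after k reuses are (1 - w) + k(1 - r). When r < 1, savings grow with k, so a break-even reuse count exists. When r > 1, savings shrink with k, so the useful quantity is the largest k that still saves. All function names here are hypothetical. `fractions.Fraction` keeps exact and decimal break-even points exact.

```python
import math
from fractions import Fraction
from typing import Optional, Union

Number = Union[int, float, str, Fraction]


def _exact(x: Number) -> Fraction:
    # Parse floats via their decimal repr so 0.1 becomes exactly 1/10.
    return Fraction(str(x)) if isinstance(x, float) else Fraction(x)


def savings(reuses: int, write_mult: Number, read_mult: Number) -> Fraction:
    """Relative savings of caching vs. not caching: (1 - w) + reuses * (1 - r)."""
    w, r = _exact(write_mult), _exact(read_mult)
    return (1 - w) + reuses * (1 - r)


def min_break_even_reuses(write_mult: Number, read_mult: Number) -> Optional[int]:
    """Fewest reuses with savings >= 0, only when savings never shrink with reuse (r <= 1)."""
    w, r = _exact(write_mult), _exact(read_mult)
    if r > 1:
        return None  # read premium: savings fall with reuse, so no monotonic break-even
    if r == 1:
        return 0 if w <= 1 else None  # savings are constant at 1 - w
    # Solve (1 - w) + k * (1 - r) >= 0 exactly; ceil handles decimal break-even points.
    return max(0, math.ceil((w - 1) / (1 - r)))


def max_profitable_reuses(write_mult: Number, read_mult: Number) -> Optional[int]:
    """Most reuses that still save (savings > 0) under a read premium (r > 1).

    Returns None when no reuse count is profitable.
    """
    w, r = _exact(write_mult), _exact(read_mult)
    if r <= 1:
        raise ValueError("defined only for read-premium multipliers (read_mult > 1)")
    if w >= 1:
        return None  # savings(0) = 1 - w <= 0 and only decrease from there
    # savings(k) > 0  <=>  k < (1 - w) / (r - 1); take the largest integer strictly below.
    return math.ceil((1 - w) / (r - 1)) - 1
```

For example, with w = 1.25 and r = 0.1, one reuse already breaks even. With w = 0.5 and r = 1.25, caching stays profitable for at most one reuse, and two reuses land exactly at zero savings.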