Add benchmark evidence replay dashboard #202

Merged
ictechgy merged 1 commit into main from g001-evidence-replay-dashboard on Jun 14, 2026

@ictechgy

Owner

Summary

  • add context-guard-bench --evidence-jsonl replay mode and --dashboard-md rendering
  • keep CSV schema unchanged while annotating replay/public-claim provenance in reports and ledgers
  • add synthetic 12-task replay evidence fixture plus docs/package/test coverage
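
The replay-and-render flow in the summary can be pictured with a small sketch. This is not the code in benchmark_runner.py: the record fields (task_id, arm, tokens) and both function names are hypothetical, chosen only to show how a JSONL evidence file might be read back and turned into a Markdown table.

```python
import json
from collections import defaultdict


def load_evidence(lines):
    """Parse JSONL evidence: one JSON object per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


def render_dashboard(records):
    """Render a Markdown table of per-task token counts by arm.

    The field names used here are assumptions for illustration only.
    """
    totals = defaultdict(dict)
    for rec in records:
        totals[rec["task_id"]][rec["arm"]] = rec["tokens"]

    rows = [
        "| task | baseline tokens | guarded tokens |",
        "| --- | --- | --- |",
    ]
    for task_id in sorted(totals):
        arms = totals[task_id]
        rows.append(
            f"| {task_id} | {arms.get('baseline', 'n/a')} "
            f"| {arms.get('guarded', 'n/a')} |"
        )
    return "\n".join(rows)


# Hypothetical two-record fixture; the real fixture covers 12 tasks.
sample = [
    '{"task_id": "t01", "arm": "baseline", "tokens": 1200}',
    '{"task_id": "t01", "arm": "guarded", "tokens": 900}',
]
print(render_dashboard(load_evidence(sample)))
```

Replaying from stored evidence keeps the dashboard reproducible without re-running the tasks, which is presumably why the CSV schema can stay unchanged.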

Validation

  • python3 scripts/sync_plugin_copies.py --check
  • python3 -m py_compile context-guard-kit/benchmark_runner.py plugins/context-guard/bin/context-guard-bench tests/test_context_guard_kit.py
  • PYTHONDONTWRITEBYTECODE=1 python3 -m unittest tests.test_context_guard_kit.BenchmarkRunnerTests
  • PYTHONDONTWRITEBYTECODE=1 python3 scripts/prepublish_check.py --skip-tests
  • python3 scripts/release_smoke.py --timeout 20
  • PYTHONDONTWRITEBYTECODE=1 python3 scripts/prepublish_check.py

Claim boundary: synthetic/manual replay evidence is not public hosted API token/cost savings evidence; provider-export provenance plus matched-task quality/token/cost gates remain required.
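
As a rough illustration of that boundary, the gate could be expressed as a single predicate. The dataclass, its field names, and the function below are hypothetical and are not taken from the PR; only the provenance labels (synthetic, manual, provider_export) come from this conversation.

```python
from dataclasses import dataclass


@dataclass
class ReportEvidence:
    # All fields are illustrative assumptions, not the PR's actual schema.
    provenance: str            # "synthetic", "manual", or "provider_export"
    same_run_complete: bool    # every task has results from the same run
    matched_pairs: bool        # baseline and guarded arms cover the same tasks
    quality_gate_passed: bool  # quality/token/cost gates all passed


def public_claim_eligible(ev: ReportEvidence) -> bool:
    """Return True only when every gate holds.

    Synthetic or manual replay evidence is never eligible,
    regardless of how the other gates evaluate.
    """
    return (
        ev.provenance == "provider_export"
        and ev.same_run_complete
        and ev.matched_pairs
        and ev.quality_gate_passed
    )


synthetic = ReportEvidence("synthetic", True, True, True)
exported = ReportEvidence("provider_export", True, True, True)
print(public_claim_eligible(synthetic), public_claim_eligible(exported))
```

Requiring all conditions together means a single missing gate blocks a public claim, which matches the conservative wording of the claim boundary.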

@ictechgy force-pushed the g001-evidence-replay-dashboard branch from 23e485a to d003209 on June 14, 2026 16:59
@ictechgy

Owner Author

G001 quad review loop completed for R2.

Review verdicts:

  • Codex: APPROVE — prior provider_export public-claim blocker resolved; no CRITICAL/HIGH/MEDIUM blockers.
  • Claude: APPROVE — report-level eligibility now requires same-run completeness, provider-export provenance, raw savings status, and matched-pair claim boundaries.
  • Forge: APPROVE — no CRITICAL/HIGH/MEDIUM blockers.
  • Agy: APPROVE — no eligibility-gate blocker found after short re-run.

Validation evidence:

  • python3 scripts/sync_plugin_copies.py --check
  • python3 -m py_compile context-guard-kit/benchmark_runner.py plugins/context-guard/bin/context-guard-bench tests/test_context_guard_kit.py
  • PYTHONDONTWRITEBYTECODE=1 python3 -m unittest tests.test_context_guard_kit.BenchmarkRunnerTests
  • PYTHONDONTWRITEBYTECODE=1 python3 scripts/prepublish_check.py --skip-tests
  • python3 scripts/release_smoke.py --timeout 20
  • PYTHONDONTWRITEBYTECODE=1 python3 scripts/prepublish_check.py
  • GitHub CI: test-and-prepublish succeeded on 3.11, 3.12, and macos-latest/3.12.

Proceeding to merge.

@ictechgy merged commit 291701c into main on Jun 14, 2026
3 checks passed
@ictechgy deleted the g001-evidence-replay-dashboard branch on June 14, 2026 17:06