ictechgy · ictechgy · Jun 14, 2026 · Jun 14, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -4,6 +4,7 @@ All notable changes for the ContextGuard plugin are documented here.
 
 ## [Unreleased]
 
+- Added `context-guard-bench --evidence-jsonl` replay and `--dashboard-md` rendering so synthetic/local benchmark evidence can regenerate CSV/report/dashboard artifacts while remaining non-public-claim-eligible unless provider-export provenance is complete.
 - Extended Batch 1 token-savings advisory reports with cache-score amortization risk fields, tool-prune deferred-schema proxy accounting, and a benchmark measurement-baseline contract while preserving local-only/no-savings-claim boundaries.
 - Clarified cache-score amortization output for cache-read multipliers above uncached cost by reporting a bounded `max_profitable_reuses` instead of a monotonic break-even reuse count.
 

diff --git a/README.ko.md b/README.ko.md
@@ -375,7 +375,7 @@ JSON 출력에는 여러 증거 surface가 포함될 수 있습니다.
 - 비용 필드가 0이거나 없으면 토큰 절감만 표시하고 실제 비용 절감은 주장하지 않습니다.
 - CSV 스키마는 엄격하게 검사합니다. 벤치마크 헬퍼를 업그레이드한 뒤에는 새 `--csv` 파일을 시작하거나 mismatch 오류가 알려주는 헤더로 마이그레이션하세요.
 
-최소 보고서 형태 예시는 [`docs/benchmark-report.example.json`](docs/benchmark-report.example.json)을, 작업 유형별 합성 예시와 안전한 해석 경계는 [`docs/benchmark-workflow-examples.md`](docs/benchmark-workflow-examples.md)을, fixture-only 실험 시작 예시는 [`docs/experimental-benchmark-fixtures.md`](docs/experimental-benchmark-fixtures.md)을 참고하세요.
+최소 보고서 형태 예시는 [`docs/benchmark-report.example.json`](docs/benchmark-report.example.json)을, 작업 유형별 합성 예시와 안전한 해석 경계는 [`docs/benchmark-workflow-examples.md`](docs/benchmark-workflow-examples.md)을, fixture-only 실험 시작 예시는 [`docs/experimental-benchmark-fixtures.md`](docs/experimental-benchmark-fixtures.md)을 참고하세요. live provider 실행 전 deterministic local replay가 필요하면 `--evidence-jsonl docs/benchmark-fixtures/token-savings-12task.evidence.example.jsonl --dashboard-md ... --baseline-variant baseline_full_context_fixture`를 사용하세요. Replay mode는 provider와 `success_command`를 실행하지 않고 CSV/report/dashboard를 만들지만 synthetic/manual evidence는 public hosted-savings claim 불가로 표시합니다.
 
 ### 실험 기능 opt-in 관리
 

diff --git a/README.md b/README.md
@@ -406,9 +406,12 @@ These fields can flag likely volatile content near the prompt prefix, stable-pre
 ```bash
 ./plugins/context-guard/bin/context-guard-bench \
   --tasks bench/tasks.json --variants bench/variants.json --csv bench/results.csv \
-  --ledger-jsonl bench/cost-shift.jsonl --report-json bench/report.json
+  --ledger-jsonl bench/cost-shift.jsonl --report-json bench/report.json \
+  --dashboard-md bench/dashboard.md
 ```
 
+For deterministic local replay before a live provider run, add `--evidence-jsonl docs/benchmark-fixtures/token-savings-12task.evidence.example.jsonl` and, for the 12-task fixture, `--baseline-variant baseline_full_context_fixture`. Replay mode skips provider and `success_command` execution, writes the same CSV/report/dashboard surfaces, and marks synthetic/manual evidence as non-public-claim-eligible.
+
 Read the report through its claim boundaries before writing any savings statement:
 
 - Successful baseline/variant runs are compared by real tokens and `cost_usd + external_cost_usd`; byte reductions stay proxy evidence.
@@ -419,7 +422,7 @@ Read the report through its claim boundaries before writing any savings statemen
 - If cost fields are zero or unavailable, the report can still mark token savings but will not claim shifted-cost savings.
 - CSV schemas are strict; after upgrading the benchmark helper, start a new `--csv` file or migrate the header named in the mismatch error.
 
-See [`docs/benchmark-report.example.json`](docs/benchmark-report.example.json) for a minimal report-shape example, [`docs/benchmark-workflow-examples.md`](docs/benchmark-workflow-examples.md) for workflow-specific synthetic examples, and [`docs/experimental-benchmark-fixtures.md`](docs/experimental-benchmark-fixtures.md) for fixture-only experimental task/variant starters.
+See [`docs/benchmark-report.example.json`](docs/benchmark-report.example.json) for a minimal report-shape example, [`docs/benchmark-workflow-examples.md`](docs/benchmark-workflow-examples.md) for workflow-specific synthetic examples, and [`docs/experimental-benchmark-fixtures.md`](docs/experimental-benchmark-fixtures.md) for fixture-only experimental task/variant starters plus synthetic evidence replay.
 
 ### Manage experimental opt-ins
 

diff --git a/context-guard-kit/README.md b/context-guard-kit/README.md
@@ -79,7 +79,7 @@ python3 context-guard-kit/sanitize_output.py -- git diff
 
 `experimental_registry.py plan local-proxy`는 localhost-only dry-run 안내 plan입니다. `experimental_registry.py plan local-proxy-external-forwarding`은 future external forwarding을 위한 design-only dry-run gate이며 explicit intent, HTTPS allowlist, threat model note, credential redaction policy, provider-evidence boundary를 요구하고 DNS lookup, external service call, traffic forwarding은 하지 않습니다. `experimental_registry.py record local-proxy-runtime-gate --ledger-jsonl ...`은 listener 시작, traffic forwarding, DNS lookup 없이 local gate row 하나만 기록하는 명시적 runtime입니다. `experimental_registry.py serve local-proxy`는 명시적 one-shot loopback forwarding MVP이며 `--runtime-gate-ack --forwarding-gate-ack --once`, private `--ready-file` nonce handoff, literal loopback bind/target IP, hostname DNS target 금지, nonzero port, byte/time limit, credential-free request가 필요합니다. API key를 저장하지 않고, external forwarding이나 CONNECT/TLS proxying을 지원하지 않으며, hosted savings claim도 만들지 않습니다. 선택적 `--diagnostic-ledger-jsonl`은 successful forwarded request 뒤에 raw header/body나 hosted-savings evidence 없이 shifted-cost diagnostic row 하나만 추가합니다. External proxy forwarding runtime은 shipped가 아니며, 나머지 roadmap lane은 별도 runtime gate가 생기기 전까지 안내 상태로 남습니다.
 
-`benchmark_runner.py`는 `research/benchmark-plan.md`의 고정 task/variant 실험을 실행합니다. `variant_prompt_files`는 선택된 task/variant를 필터링한 뒤 필요한 file-backed prompt만 읽으므로 선택하지 않은 fixture의 누락 파일이 선택된 실행을 깨지 않습니다. `--ledger-jsonl`은 subagent·artifact 등 외부 실행 표면으로 옮겨간 token/cost와 run별 측정 가능 여부를 남기고, 선택적 `self_hosted_metrics` provider payload는 run별 sidecar로만 기록합니다. `--report-json`은 baseline 대비 실제 token/cost 절감과 proxy byte 감소를 분리한 A/B report를 생성하며, `self_hosted_metrics`는 CSV/report 요약에 접지 않습니다. Report의 `matched_pair_evidence`는 성공한 baseline/variant task bucket을 transform, quality gate, 측정 가능 여부, claim boundary와 연결하므로 절감 주장을 쓰기 전에 이 항목을 확인하세요.
+`benchmark_runner.py`는 `research/benchmark-plan.md`의 고정 task/variant 실험을 실행합니다. `variant_prompt_files`는 선택된 task/variant를 필터링한 뒤 필요한 file-backed prompt만 읽으므로 선택하지 않은 fixture의 누락 파일이 선택된 실행을 깨지 않습니다. `--ledger-jsonl`은 subagent·artifact 등 외부 실행 표면으로 옮겨간 token/cost와 run별 측정 가능 여부를 남기고, 선택적 `self_hosted_metrics` provider payload는 run별 sidecar로만 기록합니다. `--report-json`은 baseline 대비 실제 token/cost 절감과 proxy byte 감소를 분리한 A/B report를 생성하며, `--dashboard-md`는 같은 report에서 Markdown dashboard를 렌더링합니다. `--evidence-jsonl` replay는 provider와 `success_command`를 실행하지 않는 deterministic import mode이고, synthetic/manual evidence는 public hosted-savings claim 불가로 강제됩니다. `self_hosted_metrics`는 CSV/report 요약에 접지 않습니다. Report의 `matched_pair_evidence`는 성공한 baseline/variant task bucket을 transform, quality gate, 측정 가능 여부, claim boundary와 연결하므로 절감 주장을 쓰기 전에 이 항목을 확인하세요.
 
 `../research/experimental-token-reduction-radar.md`는 learned compression, generated crop/OCR/visual-token pruning, self-hosted KV/latent inference optimization 같은 선택적 미래 실험을 문서화한 gate입니다. `../docs/experimental-benchmark-fixtures.md`에는 fixture-only task/variant 시작 예시가 있습니다. 이 radar와 fixture는 hosted API token/cost 절감을 보장하지 않습니다. 현재 제공되는 helper surface는 명시적 local context-diff emit, visual evidence-pack emit, learned candidate emit, self-hosted metrics record, local proxy gate record, one-shot literal-loopback local proxy serve, design-only external-forwarding plan 같은 좁은 local surface뿐이며, hosted API token/cost 절감 주장은 provider가 측정한 matched-task 근거가 있을 때만 허용합니다. Radar의 later-roadmap gate는 neural/semantic compression, trust-tiered injection-aware compression, generated visual-token reduction, broader external/daemon/hostname-DNS/credential-bearing local proxy forwarding constraints를 별도 미래 PR이 gate를 통과하기 전까지 experimental/non-shipped로 유지합니다.