ictechgy · Jun 14, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -4,6 +4,8 @@ All notable changes for the ContextGuard plugin are documented here.
 
 ## [Unreleased]
 
+- Extended Batch 1 token-savings advisory reports with cache-score amortization risk fields, tool-prune deferred-schema proxy accounting, and a benchmark measurement-baseline contract while preserving local-only/no-savings-claim boundaries.
+
 ## [0.4.10] - 2026-06-14
 
 - Added `context-guard-artifact search`, a local sanitized artifact sandbox search that returns capped literal matches with exact `get --lines` rehydration commands and no hosted savings claims.

diff --git a/README.ko.md b/README.ko.md
@@ -102,6 +102,7 @@ brief 모드는 코딩 에이전트가 군더더기를 줄이도록 요청하되
 - `context-guard-audit`가 보고한 대화 기록 사용량 집중 지점, `cache_friendliness` 프롬프트 배치 신호, `cache_layout_advice` 실험 우선순위
 - 상태표시줄의 `cache` / `reuse` 값: ContextGuard가 직접 만든 절감 효과가 아니라 관찰된 대화 기록·provider cache 신호입니다.
 - `context-guard cost preflight`로 Anthropic 요청 JSON의 추정 비용을 보고, 호출 뒤 `context-guard cost observe`로 provider usage 필드(`cache_creation_input_tokens`, `cache_read_input_tokens`)를 대조합니다.
+- `context-guard-cache-score`로 정적 cache layout과, 사용자가 직접 넣은 cache write/read multiplier 기반 amortization 위험을 안내받습니다. char/4 토큰 값은 provider 측정 절감이 아니라 추정 proxy입니다.
 - `context-guard-bench`로 성공한 기준/변형 실행을 쌍으로 맞춰 비교한 결과
 - 큰 tool/MCP catalog와 `context-guard-tool-prune` top-k 리포트 및 요약 기록 재조회 방식의 차이
 - [`research/experimental-token-reduction-radar.md`](research/experimental-token-reduction-radar.md)의 선택적 실험 lane과 마찬가지로, [`docs/experimental-benchmark-fixtures.md`](docs/experimental-benchmark-fixtures.md)의 fixture-only 시작 예시도 절감 주장을 하려면 같은 matched-task benchmark gate를 먼저 통과해야 합니다.
@@ -282,10 +283,14 @@ long-command 2>&1 | ./plugins/context-guard/bin/context-guard-artifact store --c
   --catalog tools.json \
   --query "review failing tests" \
   --top 5 --budget-bytes 12000 --json
+./plugins/context-guard/bin/context-guard-tool-prune defer-report \
+  --catalog tools.json \
+  --query "review failing tests" \
+  --core-top 3 --deferred-top 20 --json
 ./plugins/context-guard/bin/context-guard-tool-prune get <receipt_id> --tool read_file --json
 ```
 
-`context-guard-tool-prune`은 로컬 tool 또는 MCP catalog를 결정적 lexical heuristic(어휘 기반 휴리스틱)으로 순위화해 제한된 top-k 자문 리포트를 만듭니다. inline schema는 관측된 UTF-8 바이트 예산을 지키고, 누락되거나 예산 때문에 생략된 schema는 `.context-guard/tool-prune`의 compact 요약 기록과 별도 가림 처리 payload로 다시 조회할 수 있습니다. 이 기능은 안내용이며 MCP 설정을 변경하지 않습니다. 토큰 값은 provider가 측정한 절감 수치가 아니라 추정 proxy입니다.
+`context-guard-tool-prune`은 로컬 tool 또는 MCP catalog를 결정적 lexical heuristic(어휘 기반 휴리스틱)으로 순위화해 제한된 top-k 자문 리포트를 만듭니다. inline schema는 관측된 UTF-8 바이트 예산을 지키고, 누락되거나 예산 때문에 생략된 schema는 `.context-guard/tool-prune`의 compact 요약 기록과 별도 가림 처리 payload로 다시 조회할 수 있습니다. `defer-report`는 core inline tool과 deferred tool stub/namespace 요약을 나누고, 첫 프롬프트에서 빠진 schema의 gross/net char/4 proxy 회계를 함께 보여줍니다. 이 기능은 안내용이며 MCP 설정이나 native provider tool search를 변경하지 않습니다. 토큰 값은 provider가 측정한 절감 수치가 아니라 추정 proxy입니다.
 
 ### 총비용, batchability, routing 후보 자문
 

diff --git a/README.md b/README.md
@@ -104,7 +104,7 @@ When you need a savings claim, measure it on your own tasks:
 - transcript hotspots reported by `context-guard-audit`, including `cache_friendliness` prompt-layout signals and `cache_layout_advice` experiment priorities
 - statusline `cache` / `reuse` as observed transcript/provider-cache signals, not savings caused by ContextGuard
 - `context-guard cost preflight` estimates for Anthropic request JSON, followed by `context-guard cost observe` using provider usage fields (`cache_creation_input_tokens`, `cache_read_input_tokens`) after the call
-- static prompt/request cache layout checks from `context-guard-cache-score`; its char/4 token estimates and warnings are advisory only until provider usage fields confirm real cache hits
+- static prompt/request cache layout checks from `context-guard-cache-score`, including optional user-supplied cache write/read multiplier amortization risk; its char/4 token estimates and warnings are advisory only until provider usage fields confirm real cache hits
 - matched successful baseline/variant runs from `context-guard-bench`
 - large tool/MCP catalogs versus `context-guard-tool-prune` top-k reports plus receipt retrieval
 - optional experimental lanes in [`research/experimental-token-reduction-radar.md`](research/experimental-token-reduction-radar.md); fixture-only starters in [`docs/experimental-benchmark-fixtures.md`](docs/experimental-benchmark-fixtures.md) use the same matched-task benchmark gates before any savings claim
@@ -303,7 +303,7 @@ The packer uses deterministic standard-library heuristics only: no network, mode
 ./plugins/context-guard/bin/context-guard-tool-prune get <receipt_id> --tool read_file --json
 ```
 
-`context-guard-tool-prune` ranks a local tool or MCP catalog with deterministic lexical heuristics and emits a bounded top-k advisory report. Inline selected schemas respect an observed UTF-8 byte budget, and omitted or budget-skipped schemas remain recoverable from a compact local receipt plus a separate sanitized payload under `.context-guard/tool-prune`. `defer-report` uses the same receipt path to split a catalog into core inline tools plus deferred tool stubs and namespace summaries. This is advisory only: it does not mutate MCP configuration, does not configure native provider tool search, and token counts remain estimated proxies rather than measured provider savings.
+`context-guard-tool-prune` ranks a local tool or MCP catalog with deterministic lexical heuristics and emits a bounded top-k advisory report. Inline selected schemas respect an observed UTF-8 byte budget, and omitted or budget-skipped schemas remain recoverable from a compact local receipt plus a separate sanitized payload under `.context-guard/tool-prune`. `defer-report` uses the same receipt path to split a catalog into core inline tools plus deferred tool stubs and namespace summaries, and reports gross deferred-schema plus net initial-report char/4 proxy accounting so you can see what moved out of the first prompt. This is advisory only: it does not mutate MCP configuration, does not configure native provider tool search, and token counts remain estimated proxies rather than measured provider savings.
 
 ### Score static prompt cacheability
 
@@ -312,7 +312,7 @@ The packer uses deterministic standard-library heuristics only: no network, mode
 ./plugins/context-guard/bin/context-guard cache-score --input prompt.txt --provider anthropic --json
 ```
 
-`context-guard-cache-score` is a local static lint for prompt/request layout. It estimates total and cacheable-prefix size with a tokenizer-free char/4 proxy, warns about dynamic-looking values near the prefix, and records provider caveats for OpenAI, Anthropic, Gemini, or a generic threshold. It does not call providers, store raw prompts, estimate prices, observe cache hits, or prove token/cost savings; verify real cache behavior with provider usage telemetry.
+`context-guard-cache-score` is a local static lint for prompt/request layout. It estimates total and cacheable-prefix size with a tokenizer-free char/4 proxy, warns about dynamic-looking values near the prefix, and records provider caveats for OpenAI, Anthropic, Gemini, or a generic threshold. Optional `--expected-reuses`, `--cache-write-multiplier`, and `--cache-read-multiplier` inputs add an advisory amortization-risk section using user-supplied economics only. It does not call providers, store raw prompts, estimate prices from bundled defaults, observe cache hits, or prove token/cost savings; verify real cache behavior with provider usage telemetry.
 
 ### Advise on total cost, batchability, and routing
 

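The amortization-risk section described in the `context-guard-cache-score` paragraph above can be illustrated with a small sketch. This is not the `cache_score.py` implementation: the function name `amortization_risk`, the result keys, and the break-even formula are assumptions made for illustration. Only the three user-supplied inputs (expected reuses, cache write multiplier, cache read multiplier) and the char/4 proxy come from the diff.

```python
import math


def amortization_risk(prefix_chars, expected_reuses, write_multiplier, read_multiplier):
    """Compare cached vs. uncached prefix cost in char/4 proxy units (hypothetical helper)."""
    proxy_tokens = prefix_chars // 4  # char/4 proxy, not a tokenizer count

    # Uncached: the prefix is billed at 1x on the first call and on every reuse.
    uncached = proxy_tokens * (1 + expected_reuses)
    # Cached: one write at write_multiplier, then each reuse at read_multiplier.
    cached = proxy_tokens * (write_multiplier + expected_reuses * read_multiplier)

    # Smallest reuse count n with write + n*read <= 1 + n.
    # None when cached reads are not cheaper than uncached input.
    if read_multiplier >= 1:
        break_even = None
    else:
        break_even = max(0, math.ceil((write_multiplier - 1) / (1 - read_multiplier)))

    return {
        "proxy_tokens": proxy_tokens,
        "uncached_proxy_cost": uncached,
        "cached_proxy_cost": cached,
        "break_even_reuses": break_even,
        "at_risk": cached > uncached,  # write premium not recovered at this reuse count
    }


if __name__ == "__main__":
    # Placeholder multipliers supplied by the user, not provider prices.
    print(amortization_risk(4000, expected_reuses=0, write_multiplier=1.25, read_multiplier=0.25))
```

The result stays in proxy units on purpose: consistent with the claim boundary in the README text, it flags a layout whose expected reuse count may not repay the cache write, and says nothing about measured provider savings.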
diff --git a/context-guard-kit/README.md b/context-guard-kit/README.md
@@ -57,9 +57,9 @@ python3 context-guard-kit/sanitize_output.py -- git diff
 
 `context_filter.py`는 opt-in declarative output filter helper입니다. filter JSON은 사용자가 package code 밖(예: `.context-guard/filter-dsl.json`)에 두고 `validate`로 검증한 뒤 `run --config ... -- <command>`로 적용합니다. invalid config, no-match, filter error, empty output, protected `git`/test/lint/`gh` failure는 원래 command stdout/stderr와 exit code를 passthrough합니다. filtered mode는 stdout+stderr를 합친 line에 filter를 적용해 stdout으로 쓰고, passthrough mode는 stdout/stderr stream을 그대로 보존합니다. `--json-report`는 stdout을 command/filter output 전용으로 두기 위해 stderr에만 diagnostic JSON을 쓰지만, protected nonzero passthrough에서는 stderr 원문 보존을 위해 report를 생략합니다. token/cost 절감 수치는 측정 claim이 아니라 local presentation 변화로만 다루세요.
 
-`cache_score.py`는 provider 호출 없이 prompt/request 파일 또는 stdin을 정적으로 검사하는 cacheability lint입니다. OpenAI/Anthropic/Gemini/generic threshold를 기준으로 stable prefix, 첫 dynamic marker, JSON/tool ordering hint, char/4 token proxy, provider caveat, claim boundary를 출력합니다. raw prompt를 저장하지 않으며, 가격/ledger/cache hit 관측은 `cost_guard.py`와 provider usage field의 영역입니다.
+`cache_score.py`는 provider 호출 없이 prompt/request 파일 또는 stdin을 정적으로 검사하는 cacheability lint입니다. OpenAI/Anthropic/Gemini/generic threshold를 기준으로 stable prefix, 첫 dynamic marker, JSON/tool ordering hint, char/4 token proxy, provider caveat, claim boundary를 출력합니다. 선택적으로 `--expected-reuses`, `--cache-write-multiplier`, `--cache-read-multiplier`를 받아 사용자가 제공한 경제성 가정으로만 amortization risk를 표시합니다. raw prompt를 저장하지 않으며, 번들 가격 추정/ledger/cache hit 관측은 `cost_guard.py`와 provider usage field의 영역입니다.
 
-`tool_schema_pruner.py`는 provider-neutral tool/MCP catalog helper입니다. `select`는 task query와 lexical overlap으로 top-k tool을 고르고, inline schema는 `--budget-bytes` 안에만 넣으며, compact receipt와 별도 sanitized payload를 `.context-guard/tool-prune`에 기록합니다. `defer-report`는 같은 receipt path를 사용해 core inline tools와 deferred tool stubs/namespace summaries를 분리합니다. `get`은 payload size/SHA-256을 검증한 뒤 전체 정제 schema를 반환합니다. 이 helper는 MCP 설정이나 native provider tool search를 바꾸지 않으며, token 절감은 측정값이 아니라 추정 proxy로만 표현합니다.
+`tool_schema_pruner.py`는 provider-neutral tool/MCP catalog helper입니다. `select`는 task query와 lexical overlap으로 top-k tool을 고르고, inline schema는 `--budget-bytes` 안에만 넣으며, compact receipt와 별도 sanitized payload를 `.context-guard/tool-prune`에 기록합니다. `defer-report`는 같은 receipt path를 사용해 core inline tools와 deferred tool stubs/namespace summaries를 분리하고, gross deferred-schema 및 net initial-report `chars_div_4` proxy 회계를 표시합니다. `get`은 payload size/SHA-256을 검증한 뒤 전체 정제 schema를 반환합니다. 이 helper는 MCP 설정이나 native provider tool search를 바꾸지 않으며, token 절감은 측정값이 아니라 추정 proxy로만 표현합니다.
 
 `context_compress.py --protected-policy`는 기본 압축 동작을 바꾸지 않고 code fence, diff, identifier, numeric constant, hash, path, stack frame, quoted string, JSON key 같은 보호-zone class/count 정책 메타데이터를 추가합니다. 보호-zone 정책은 semantic/paraphrase rewrite를 금지하고 structural dedupe/window/truncate 및 artifact retrieval만 허용합니다. raw span은 receipt에 저장하지 않으며, lossy structural transform에는 정확 재조회가 필요하다는 hint를 남깁니다. `context_compress.py --mode readable`은 가림 처리된 prose에만 deterministic sentence-window preview를 시도하고, prompt-like/high-risk protected signal이 있으면 보수 모드로 차단합니다. learned compressor, model, embedding, reranker, hosted savings claim은 포함하지 않습니다.
 

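The gross/net `chars_div_4` accounting that `defer-report` adds could look roughly like the sketch below. Everything here is an assumption for illustration: the catalog shape (a list of dicts with `name` and `schema`), the stub format, the result keys, and the reading of "net" as gross deferred-schema size minus the stub text that stays in the initial report. The actual `tool_schema_pruner.py` logic is not shown in this diff.

```python
import json


def chars_div_4(text: str) -> int:
    """Tokenizer-free proxy: character count divided by four."""
    return len(text) // 4


def defer_accounting(catalog, core_names):
    """Hypothetical gross/net proxy accounting for deferred tool schemas."""
    deferred = [tool for tool in catalog if tool["name"] not in core_names]

    # Gross: proxy size of every schema that leaves the first prompt.
    gross = sum(chars_div_4(json.dumps(tool["schema"], sort_keys=True)) for tool in deferred)
    # Stub overhead: proxy size of the short placeholders that remain inline (assumed format).
    stubs = sum(chars_div_4(json.dumps({"name": tool["name"]})) for tool in deferred)

    return {
        "deferred_tools": [tool["name"] for tool in deferred],
        "gross_deferred_schema_proxy": gross,
        "stub_overhead_proxy": stubs,
        "net_initial_report_proxy": gross - stubs,
    }


if __name__ == "__main__":
    catalog = [
        {"name": "read_file", "schema": {"type": "object"}},
        {"name": "write_file", "schema": {"type": "object", "properties": {"path": {"type": "string"}}}},
    ]
    print(defer_accounting(catalog, core_names={"read_file"}))
```

Reporting gross and net separately keeps the stub cost visible, so a small catalog whose stubs cost nearly as much as its schemas does not look like a win. Both numbers remain estimated proxies rather than provider-measured savings.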
diff --git a/context-guard-kit/benchmark_runner.py b/context-guard-kit/benchmark_runner.py
@@ -184,6 +184,7 @@
 TOKEN_PROXY_BYTES_PER_TOKEN = 4
 BENCH_RUN_EVIDENCE_SCHEMA_VERSION = "contextguard.bench.run-evidence.v1"
 MATCHED_PAIR_EVIDENCE_SCHEMA_VERSION = "contextguard.bench.matched-pair.v1"
+MEASUREMENT_BASELINE_SCHEMA_VERSION = "contextguard.bench.measurement-baseline.v1"
 SELF_HOSTED_METRICS_SCHEMA_VERSION = "contextguard.bench.self-hosted-metrics.v1"
 SELF_HOSTED_METRICS_KEY = "self_hosted_metrics"
 SELF_HOSTED_METRICS_CLAIM_BOUNDARY = "self_hosted_metrics_only_not_hosted_api_token_or_cost_savings"
@@ -1546,6 +1547,77 @@ def row_cost_shift_measured(row: dict[str, str]) -> bool:
     )
 
 
+def measurement_baseline_contract() -> dict[str, Any]:
+    """Describe the benchmark report's current measurement baseline contract.
+
+    This block is descriptive. It does not change the CSV schema and does not
+    grant token/cost savings claims by itself; those remain gated by matched
+    successful tasks, measured primary tokens/costs, shifted-cost accounting,
+    and quality gates.
+    """
+    return {
+        "schema_version": MEASUREMENT_BASELINE_SCHEMA_VERSION,
+        "csv_schema_unchanged": True,
+        "csv_columns": list(CSV_COLUMNS),
+        "captured_fields": {
+            "task_identity": ["task_id", "variant"],
+            "run_configuration": ["model", "effort", "claude_version"],
+            "primary_token_buckets": [
+                "input_tokens",
+                "output_tokens",
+                "cache_read",
+                "cache_creation",
+                "total_tokens",
+                "primary_tokens_measured",
+            ],
+            "primary_cost": ["cost_usd", "cost_measured"],
+            "provider_cache_telemetry": ["provider_cached_tokens", "provider_cached_tokens_measured"],
+            "latency": ["wall_time_seconds"],
+            "quality_and_result": ["success", "corrections", "notes"],
+            "tooling_and_proxy_metrics": ["turns", "hook_triggers", "bytes_before", "bytes_after", "artifacts_used"],
+            "shifted_cost_accounting": [
+                "external_tokens",
+                "external_tokens_measured",
+                "external_cost_usd",
+                "external_cost_measured",
+                "total_cost_with_shift_usd",
+            ],
+        },
+        "claim_eligible_fields": {
+            "token_savings": [
+                "matched successful baseline and variant tasks",
+                "primary_tokens_measured=true on both sides",
+                "quality_gate=pass",
+            ],
+            "shifted_cost_savings": [
+                "matched successful baseline and variant tasks",
+                "cost_measured=true on both sides",
+                "external_cost_measured=true when external_tokens are present",
+                "quality_gate=pass",
+            ],
+        },
+        "proxy_only_fields": {
+            "byte_metrics": ["bytes_before", "bytes_after"],
+            "token_proxy": "chars_div_4_proxy_only",
+            "provider_cache": "diagnostic_telemetry_not_contextguard_token_reduction",
+        },
+        "missing_future_run_identity_fields": [
+            "repo_revision",
+            "agent_harness",
+            "feature_flags",
+            "provider_name",
+            "success_command_identity",
+        ],
+        "claim_boundary": {
+            "descriptive_contract_only": True,
+            "enables_savings_claims_by_itself": False,
+            "requires_matched_successful_tasks": True,
+            "requires_shifted_cost_accounting_for_cost_claims": True,
+            "raw_proxy_estimates_are_not_hosted_api_token_savings": True,
+        },
+    }
+
+
 def summarize_benchmark_rows(rows: list[dict[str, str]], baseline_variant: str) -> dict[str, Any]:
     by_variant: dict[str, dict[str, Any]] = {}
     successful_rows_by_variant_task: dict[str, dict[str, list[dict[str, str]]]] = {}
@@ -2191,6 +2263,7 @@ def matched_pair_evidence_entry(
         "schema": "context-guard-bench-report-v1",
         "baseline_variant": baseline_variant,
         "row_count": len(rows),
+        "measurement_baseline": measurement_baseline_contract(),
         "summary_by_variant": by_variant,
         "comparisons": comparisons,
         "matched_pair_evidence": matched_pair_evidence,
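A report consumer might apply the `token_savings` conditions listed under `claim_eligible_fields` along the lines of the sketch below. It is illustrative only and is not part of `benchmark_runner.py`: the helper name `token_claim_eligible`, the assumption that CSV booleans are stored as the string `true`, and passing `quality_gate` as a separate argument (it does not appear among the captured fields shown in the diff) are all hypothetical.

```python
def token_claim_eligible(baseline: dict[str, str], variant: dict[str, str], quality_gate: str):
    """Hypothetical check mirroring the token_savings conditions in the contract."""

    def is_true(row: dict[str, str], key: str) -> bool:
        # Assumes CSV booleans are stored as the text "true" (case-insensitive).
        return row.get(key, "").strip().lower() == "true"

    task_id = baseline.get("task_id", "")
    checks = {
        # Matched successful baseline and variant tasks.
        "matched_task": bool(task_id) and task_id == variant.get("task_id"),
        "both_successful": is_true(baseline, "success") and is_true(variant, "success"),
        # primary_tokens_measured=true on both sides.
        "primary_tokens_measured": is_true(baseline, "primary_tokens_measured")
        and is_true(variant, "primary_tokens_measured"),
        # quality_gate=pass.
        "quality_gate_pass": quality_gate == "pass",
    }
    return all(checks.values()), checks


if __name__ == "__main__":
    base = {"task_id": "t1", "variant": "baseline", "success": "true", "primary_tokens_measured": "true"}
    var = {"task_id": "t1", "variant": "guarded", "success": "true", "primary_tokens_measured": "false"}
    print(token_claim_eligible(base, var, quality_gate="pass"))
```

Returning the per-condition map alongside the overall verdict shows which gate blocked a claim, in keeping with the contract's statement that the descriptive block does not enable savings claims by itself.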