diff --git a/CHANGELOG.md b/CHANGELOG.md
index 00447f6..aa48422 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -4,6 +4,8 @@ All notable changes for the ContextGuard plugin are documented here.
 
 ## [Unreleased]
 
+- Extended the Batch 1 token-savings advisory reports: `context-guard-cache-score` adds cache amortization-risk fields, `context-guard-tool-prune defer-report` adds deferred-schema char/4 proxy accounting, and benchmark reports include a measurement-baseline contract. All additions stay local-only and make no savings claims.
+
 ## [0.4.10] - 2026-06-14
 
 - Added `context-guard-artifact search`, a local sanitized artifact sandbox search that returns capped literal matches with exact `get --lines` rehydration commands and no hosted savings claims.
diff --git a/README.ko.md b/README.ko.md
index 8ab8741..542bc8b 100644
--- a/README.ko.md
+++ b/README.ko.md
@@ -102,6 +102,7 @@ brief 모드는 코딩 에이전트가 군더더기를 줄이도록 요청하되
 - `context-guard-audit`가 보고한 대화 기록 사용량 집중 지점, `cache_friendliness` 프롬프트 배치 신호, `cache_layout_advice` 실험 우선순위
 - 상태표시줄의 `cache` / `reuse` 값: ContextGuard가 직접 만든 절감 효과가 아니라 관찰된 대화 기록·provider cache 신호입니다.
 - `context-guard cost preflight`로 Anthropic 요청 JSON의 추정 비용을 보고, 호출 뒤 `context-guard cost observe`로 provider usage 필드(`cache_creation_input_tokens`, `cache_read_input_tokens`)를 대조합니다.
+- `context-guard-cache-score`로 정적 cache layout과, 사용자가 직접 넣은 cache write/read multiplier 기반 amortization 위험을 안내받습니다. char/4 토큰 값은 provider 측정 절감이 아니라 추정 proxy입니다.
 - `context-guard-bench`로 성공한 기준/변형 실행을 쌍으로 맞춰 비교한 결과
 - 큰 tool/MCP catalog와 `context-guard-tool-prune` top-k 리포트 및 요약 기록 재조회 방식의 차이
 - [`research/experimental-token-reduction-radar.md`](research/experimental-token-reduction-radar.md)의 선택적 실험 lane과 마찬가지로, [`docs/experimental-benchmark-fixtures.md`](docs/experimental-benchmark-fixtures.md)의 fixture-only 시작 예시도 절감 주장을 하려면 같은 matched-task benchmark gate를 먼저 통과해야 합니다.
@@ -282,10 +283,14 @@ long-command 2>&1 | ./plugins/context-guard/bin/context-guard-artifact store --c
   --catalog tools.json \
   --query "review failing tests" \
   --top 5 --budget-bytes 12000 --json
+./plugins/context-guard/bin/context-guard-tool-prune defer-report \
+  --catalog tools.json \
+  --query "review failing tests" \
+  --core-top 3 --deferred-top 20 --json
 ./plugins/context-guard/bin/context-guard-tool-prune get <receipt_id> --tool read_file --json
 ```
 
-`context-guard-tool-prune`은 로컬 tool 또는 MCP catalog를 결정적 lexical heuristic(어휘 기반 휴리스틱)으로 순위화해 제한된 top-k 자문 리포트를 만듭니다. inline schema는 관측된 UTF-8 바이트 예산을 지키고, 누락되거나 예산 때문에 생략된 schema는 `.context-guard/tool-prune`의 compact 요약 기록과 별도 가림 처리 payload로 다시 조회할 수 있습니다. 이 기능은 안내용이며 MCP 설정을 변경하지 않습니다. 토큰 값은 provider가 측정한 절감 수치가 아니라 추정 proxy입니다.
+`context-guard-tool-prune`은 로컬 tool 또는 MCP catalog를 결정적 lexical heuristic(어휘 기반 휴리스틱)으로 순위화해 제한된 top-k 자문 리포트를 만듭니다. inline schema는 관측된 UTF-8 바이트 예산을 지키고, 누락되거나 예산 때문에 생략된 schema는 `.context-guard/tool-prune`의 compact 요약 기록과 별도 가림 처리 payload로 다시 조회할 수 있습니다. `defer-report`는 core inline tool과 deferred tool stub/namespace 요약을 나누고, 첫 프롬프트에서 빠진 schema의 gross/net char/4 proxy 회계를 함께 보여줍니다. 이 기능은 안내용이며 MCP 설정이나 native provider tool search를 변경하지 않습니다. 토큰 값은 provider가 측정한 절감 수치가 아니라 추정 proxy입니다.
 
 ### 총비용, batchability, routing 후보 자문
 
diff --git a/README.md b/README.md
index d78f25a..dd467e1 100644
--- a/README.md
+++ b/README.md
@@ -104,7 +104,7 @@ When you need a savings claim, measure it on your own tasks:
 - transcript hotspots reported by `context-guard-audit`, including `cache_friendliness` prompt-layout signals and `cache_layout_advice` experiment priorities
 - statusline `cache` / `reuse` as observed transcript/provider-cache signals, not savings caused by ContextGuard
 - `context-guard cost preflight` estimates for Anthropic request JSON, followed by `context-guard cost observe` using provider usage fields (`cache_creation_input_tokens`, `cache_read_input_tokens`) after the call
-- static prompt/request cache layout checks from `context-guard-cache-score`; its char/4 token estimates and warnings are advisory only until provider usage fields confirm real cache hits
+- static prompt/request cache layout checks from `context-guard-cache-score`, including an optional amortization-risk check based on user-supplied cache write/read multipliers; its char/4 token estimates and warnings are advisory only until provider usage fields confirm real cache hits
 - matched successful baseline/variant runs from `context-guard-bench`
 - large tool/MCP catalogs versus `context-guard-tool-prune` top-k reports plus receipt retrieval
 - optional experimental lanes in [`research/experimental-token-reduction-radar.md`](research/experimental-token-reduction-radar.md); fixture-only starters in [`docs/experimental-benchmark-fixtures.md`](docs/experimental-benchmark-fixtures.md) use the same matched-task benchmark gates before any savings claim
@@ -303,7 +303,7 @@ The packer uses deterministic standard-library heuristics only: no network, mode
 ./plugins/context-guard/bin/context-guard-tool-prune get <receipt_id> --tool read_file --json
 ```
 
-`context-guard-tool-prune` ranks a local tool or MCP catalog with deterministic lexical heuristics and emits a bounded top-k advisory report. Inline selected schemas respect an observed UTF-8 byte budget, and omitted or budget-skipped schemas remain recoverable from a compact local receipt plus a separate sanitized payload under `.context-guard/tool-prune`. `defer-report` uses the same receipt path to split a catalog into core inline tools plus deferred tool stubs and namespace summaries. This is advisory only: it does not mutate MCP configuration, does not configure native provider tool search, and token counts remain estimated proxies rather than measured provider savings.
+`context-guard-tool-prune` ranks a local tool or MCP catalog with deterministic lexical heuristics and emits a bounded top-k advisory report. Inline selected schemas respect an observed UTF-8 byte budget, and omitted or budget-skipped schemas remain recoverable from a compact local receipt plus a separate sanitized payload under `.context-guard/tool-prune`. `defer-report` uses the same receipt path to split a catalog into core inline tools plus deferred tool stubs and namespace summaries, and reports char/4 proxy accounting for both the gross deferred-schema size and the net change in initial-report size, so you can see what moved out of the first prompt; deferred schemas must still be retrieved before a deferred tool is used. This is advisory only: it does not mutate MCP configuration, does not configure native provider tool search, and token counts remain estimated proxies rather than measured provider savings.
 
 ### Score static prompt cacheability
 
@@ -312,7 +312,7 @@ The packer uses deterministic standard-library heuristics only: no network, mode
 ./plugins/context-guard/bin/context-guard cache-score --input prompt.txt --provider anthropic --json
 ```
 
-`context-guard-cache-score` is a local static lint for prompt/request layout. It estimates total and cacheable-prefix size with a tokenizer-free char/4 proxy, warns about dynamic-looking values near the prefix, and records provider caveats for OpenAI, Anthropic, Gemini, or a generic threshold. It does not call providers, store raw prompts, estimate prices, observe cache hits, or prove token/cost savings; verify real cache behavior with provider usage telemetry.
+`context-guard-cache-score` is a local static lint for prompt/request layout. It estimates total and cacheable-prefix size with a tokenizer-free char/4 proxy, warns about dynamic-looking values near the prefix, and records provider caveats for OpenAI, Anthropic, Gemini, or a generic threshold. The report also includes an advisory `amortization` section: when you pass both `--cache-write-multiplier` and `--cache-read-multiplier` (relative to an uncached prefix input cost of 1.0), it computes break-even reuses and a risk level from those user-supplied values only, with `--expected-reuses` (default 1) counting cache reads after the initial write. It does not call providers, store raw prompts, ship provider pricing defaults, observe cache hits, or prove token/cost savings; verify real cache behavior with provider usage telemetry.
 
 ### Advise on total cost, batchability, and routing
 
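The break-even rule behind the cache-score amortization fields can be written out as a short derivation. The symbols are introduced here for illustration only: $w$ stands for `--cache-write-multiplier`, $\rho$ for `--cache-read-multiplier`, and $r$ for `--expected-reuses`, with the uncached prefix input cost fixed at 1.0.

$$
\begin{aligned}
C_{\text{uncached}}(r) &= 1 + r, \qquad C_{\text{cached}}(r) = w + r\rho,\\
C_{\text{cached}}(r) \le C_{\text{uncached}}(r) &\iff r(1 - \rho) \ge w - 1,\\
r^{*} &= \max\!\left(0,\ \left\lceil \frac{w - 1}{1 - \rho} \right\rceil\right) \quad \text{for } \rho < 1.
\end{aligned}
$$

For example, with hypothetical multipliers $w = 1.25$ and $\rho = 0.1$, $r^{*} = \lceil 0.25 / 0.9 \rceil = 1$, so one expected reuse recovers the write premium, while $r = 0$ leaves the cached path more expensive ($1.25 > 1$). These values are placeholders, not provider pricing.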
diff --git a/context-guard-kit/README.md b/context-guard-kit/README.md
index 11222a6..bdb71c9 100644
--- a/context-guard-kit/README.md
+++ b/context-guard-kit/README.md
@@ -57,9 +57,9 @@ python3 context-guard-kit/sanitize_output.py -- git diff
 
 `context_filter.py`는 opt-in declarative output filter helper입니다. filter JSON은 사용자가 package code 밖(예: `.context-guard/filter-dsl.json`)에 두고 `validate`로 검증한 뒤 `run --config ... -- <command>`로 적용합니다. invalid config, no-match, filter error, empty output, protected `git`/test/lint/`gh` failure는 원래 command stdout/stderr와 exit code를 passthrough합니다. filtered mode는 stdout+stderr를 합친 line에 filter를 적용해 stdout으로 쓰고, passthrough mode는 stdout/stderr stream을 그대로 보존합니다. `--json-report`는 stdout을 command/filter output 전용으로 두기 위해 stderr에만 diagnostic JSON을 쓰지만, protected nonzero passthrough에서는 stderr 원문 보존을 위해 report를 생략합니다. token/cost 절감 수치는 측정 claim이 아니라 local presentation 변화로만 다루세요.
 
-`cache_score.py`는 provider 호출 없이 prompt/request 파일 또는 stdin을 정적으로 검사하는 cacheability lint입니다. OpenAI/Anthropic/Gemini/generic threshold를 기준으로 stable prefix, 첫 dynamic marker, JSON/tool ordering hint, char/4 token proxy, provider caveat, claim boundary를 출력합니다. raw prompt를 저장하지 않으며, 가격/ledger/cache hit 관측은 `cost_guard.py`와 provider usage field의 영역입니다.
+`cache_score.py`는 provider 호출 없이 prompt/request 파일 또는 stdin을 정적으로 검사하는 cacheability lint입니다. OpenAI/Anthropic/Gemini/generic threshold를 기준으로 stable prefix, 첫 dynamic marker, JSON/tool ordering hint, char/4 token proxy, provider caveat, claim boundary를 출력합니다. 선택적으로 `--expected-reuses`, `--cache-write-multiplier`, `--cache-read-multiplier`를 받아 사용자가 제공한 경제성 가정으로만 amortization risk를 표시합니다. raw prompt를 저장하지 않으며, 번들 가격 추정/ledger/cache hit 관측은 `cost_guard.py`와 provider usage field의 영역입니다.
 
-`tool_schema_pruner.py`는 provider-neutral tool/MCP catalog helper입니다. `select`는 task query와 lexical overlap으로 top-k tool을 고르고, inline schema는 `--budget-bytes` 안에만 넣으며, compact receipt와 별도 sanitized payload를 `.context-guard/tool-prune`에 기록합니다. `defer-report`는 같은 receipt path를 사용해 core inline tools와 deferred tool stubs/namespace summaries를 분리합니다. `get`은 payload size/SHA-256을 검증한 뒤 전체 정제 schema를 반환합니다. 이 helper는 MCP 설정이나 native provider tool search를 바꾸지 않으며, token 절감은 측정값이 아니라 추정 proxy로만 표현합니다.
+`tool_schema_pruner.py`는 provider-neutral tool/MCP catalog helper입니다. `select`는 task query와 lexical overlap으로 top-k tool을 고르고, inline schema는 `--budget-bytes` 안에만 넣으며, compact receipt와 별도 sanitized payload를 `.context-guard/tool-prune`에 기록합니다. `defer-report`는 같은 receipt path를 사용해 core inline tools와 deferred tool stubs/namespace summaries를 분리하고, gross deferred-schema 및 net initial-report `chars_div_4` proxy 회계를 표시합니다. `get`은 payload size/SHA-256을 검증한 뒤 전체 정제 schema를 반환합니다. 이 helper는 MCP 설정이나 native provider tool search를 바꾸지 않으며, token 절감은 측정값이 아니라 추정 proxy로만 표현합니다.
 
 `context_compress.py --protected-policy`는 기본 압축 동작을 바꾸지 않고 code fence, diff, identifier, numeric constant, hash, path, stack frame, quoted string, JSON key 같은 보호-zone class/count 정책 메타데이터를 추가합니다. 보호-zone 정책은 semantic/paraphrase rewrite를 금지하고 structural dedupe/window/truncate 및 artifact retrieval만 허용합니다. raw span은 receipt에 저장하지 않으며, lossy structural transform에는 정확 재조회가 필요하다는 hint를 남깁니다. `context_compress.py --mode readable`은 가림 처리된 prose에만 deterministic sentence-window preview를 시도하고, prompt-like/high-risk protected signal이 있으면 보수 모드로 차단합니다. learned compressor, model, embedding, reranker, hosted savings claim은 포함하지 않습니다.
 
diff --git a/context-guard-kit/benchmark_runner.py b/context-guard-kit/benchmark_runner.py
index 70afd68..e338b88 100755
--- a/context-guard-kit/benchmark_runner.py
+++ b/context-guard-kit/benchmark_runner.py
@@ -184,6 +184,7 @@
 TOKEN_PROXY_BYTES_PER_TOKEN = 4
 BENCH_RUN_EVIDENCE_SCHEMA_VERSION = "contextguard.bench.run-evidence.v1"
 MATCHED_PAIR_EVIDENCE_SCHEMA_VERSION = "contextguard.bench.matched-pair.v1"
+MEASUREMENT_BASELINE_SCHEMA_VERSION = "contextguard.bench.measurement-baseline.v1"
 SELF_HOSTED_METRICS_SCHEMA_VERSION = "contextguard.bench.self-hosted-metrics.v1"
 SELF_HOSTED_METRICS_KEY = "self_hosted_metrics"
 SELF_HOSTED_METRICS_CLAIM_BOUNDARY = "self_hosted_metrics_only_not_hosted_api_token_or_cost_savings"
@@ -1546,6 +1547,77 @@ def row_cost_shift_measured(row: dict[str, str]) -> bool:
     )
 
 
+def measurement_baseline_contract() -> dict[str, Any]:
+    """Describe the benchmark report's current measurement baseline contract.
+
+    This block is descriptive. It does not change the CSV schema and does not
+    grant token/cost savings claims by itself; those remain gated by matched
+    successful tasks, measured primary tokens/costs, shifted-cost accounting,
+    and quality gates.
+    """
+    return {
+        "schema_version": MEASUREMENT_BASELINE_SCHEMA_VERSION,
+        "csv_schema_unchanged": True,
+        "csv_columns": list(CSV_COLUMNS),
+        "captured_fields": {
+            "task_identity": ["task_id", "variant"],
+            "run_configuration": ["model", "effort", "claude_version"],
+            "primary_token_buckets": [
+                "input_tokens",
+                "output_tokens",
+                "cache_read",
+                "cache_creation",
+                "total_tokens",
+                "primary_tokens_measured",
+            ],
+            "primary_cost": ["cost_usd", "cost_measured"],
+            "provider_cache_telemetry": ["provider_cached_tokens", "provider_cached_tokens_measured"],
+            "latency": ["wall_time_seconds"],
+            "quality_and_result": ["success", "corrections", "notes"],
+            "tooling_and_proxy_metrics": ["turns", "hook_triggers", "bytes_before", "bytes_after", "artifacts_used"],
+            "shifted_cost_accounting": [
+                "external_tokens",
+                "external_tokens_measured",
+                "external_cost_usd",
+                "external_cost_measured",
+                "total_cost_with_shift_usd",
+            ],
+        },
+        "claim_eligible_fields": {
+            "token_savings": [
+                "matched successful baseline and variant tasks",
+                "primary_tokens_measured=true on both sides",
+                "quality_gate=pass",
+            ],
+            "shifted_cost_savings": [
+                "matched successful baseline and variant tasks",
+                "cost_measured=true on both sides",
+                "external_cost_measured=true when external_tokens are present",
+                "quality_gate=pass",
+            ],
+        },
+        "proxy_only_fields": {
+            "byte_metrics": ["bytes_before", "bytes_after"],
+            "token_proxy": "chars_div_4_proxy_only",
+            "provider_cache": "diagnostic_telemetry_not_contextguard_token_reduction",
+        },
+        "missing_future_run_identity_fields": [
+            "repo_revision",
+            "agent_harness",
+            "feature_flags",
+            "provider_name",
+            "success_command_identity",
+        ],
+        "claim_boundary": {
+            "descriptive_contract_only": True,
+            "enables_savings_claims_by_itself": False,
+            "requires_matched_successful_tasks": True,
+            "requires_shifted_cost_accounting_for_cost_claims": True,
+            "raw_proxy_estimates_are_not_hosted_api_token_savings": True,
+        },
+    }
+
+
 def summarize_benchmark_rows(rows: list[dict[str, str]], baseline_variant: str) -> dict[str, Any]:
     by_variant: dict[str, dict[str, Any]] = {}
     successful_rows_by_variant_task: dict[str, dict[str, list[dict[str, str]]]] = {}
@@ -2191,6 +2263,7 @@ def matched_pair_evidence_entry(
         "schema": "context-guard-bench-report-v1",
         "baseline_variant": baseline_variant,
         "row_count": len(rows),
+        "measurement_baseline": measurement_baseline_contract(),
         "summary_by_variant": by_variant,
         "comparisons": comparisons,
         "matched_pair_evidence": matched_pair_evidence,
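`measurement_baseline_contract()` lists its claim requirements as plain strings. As a minimal sketch of how a consumer might apply the `token_savings` requirements to one baseline/variant CSV row pair, the helper below assumes boolean columns are spelled `true`/`false` (or `1`/`yes`) and that the quality-gate verdict is computed elsewhere. `token_savings_claim_eligible` and `is_true` are hypothetical helpers, not part of `benchmark_runner.py`.

```python
from __future__ import annotations

from typing import Mapping


def is_true(value: str | None) -> bool:
    """Treat common CSV truthy spellings as True (assumed encoding)."""
    return (value or "").strip().lower() in {"1", "true", "yes"}


def token_savings_claim_eligible(
    baseline: Mapping[str, str],
    variant: Mapping[str, str],
    *,
    quality_gate: str,
) -> tuple[bool, list[str]]:
    """Return (eligible, unmet_requirements) for one baseline/variant row pair.

    Hypothetical sketch mirroring the token_savings requirement strings in
    measurement_baseline_contract(); it grants no savings claim by itself.
    """
    unmet: list[str] = []
    # Both rows must describe the same task.
    if baseline.get("task_id") != variant.get("task_id"):
        unmet.append("matched task_id")
    # Both runs must have succeeded.
    if not (is_true(baseline.get("success")) and is_true(variant.get("success"))):
        unmet.append("successful baseline and variant")
    # Primary token buckets must be measured, not proxied, on both sides.
    if not (
        is_true(baseline.get("primary_tokens_measured"))
        and is_true(variant.get("primary_tokens_measured"))
    ):
        unmet.append("primary_tokens_measured=true on both sides")
    # Quality gate verdict comes from outside this helper.
    if quality_gate != "pass":
        unmet.append("quality_gate=pass")
    return (not unmet, unmet)


base = {"task_id": "t1", "variant": "baseline", "success": "true", "primary_tokens_measured": "true"}
var = {"task_id": "t1", "variant": "contextguard", "success": "true", "primary_tokens_measured": "false"}
print(token_savings_claim_eligible(base, var, quality_gate="pass"))
# (False, ['primary_tokens_measured=true on both sides'])
```

Returning the unmet requirements, rather than a bare boolean, keeps the reason a pair was excluded visible in any downstream report.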
diff --git a/context-guard-kit/cache_score.py b/context-guard-kit/cache_score.py
index db642cd..c330c9d 100755
--- a/context-guard-kit/cache_score.py
+++ b/context-guard-kit/cache_score.py
@@ -23,6 +23,9 @@
 SCHEMA_VERSION = "contextguard.cache-score.v1"
 DEFAULT_MAX_INPUT_BYTES = 1_000_000
 TOKEN_PROXY_CHARS_PER_TOKEN = 4
+DEFAULT_EXPECTED_REUSES = 1
+MAX_EXPECTED_REUSES = 1_000_000
+MAX_CACHE_MULTIPLIER = 1_000_000.0
 PROVIDER_MINIMUM_CACHEABLE_TOKENS = {
     # Provider and model minimums move over time.  These defaults are advisory
     # and can be overridden with --minimum-cacheable-tokens.
@@ -110,6 +113,30 @@ def bounded_int(value: object, *, default: int, minimum: int, maximum: int, name
     return number
 
 
+def bounded_float(
+    value: object,
+    *,
+    minimum: float,
+    maximum: float,
+    name: str,
+) -> float | None:
+    if value is None:
+        return None
+    if isinstance(value, bool):
+        fail(f"{name} must be a finite number")
+    try:
+        number = float(value)
+    except (TypeError, ValueError, OverflowError):
+        fail(f"{name} must be a finite number")
+    if not math.isfinite(number):
+        fail(f"{name} must be finite")
+    if number < minimum:
+        fail(f"{name} must be >= {minimum:g}")
+    if number > maximum:
+        fail(f"{name} must be <= {maximum:g}")
+    return number
+
+
 def normalized_link_target(parent: Path, raw_target: str) -> Path:
     target = Path(raw_target)
     if not target.is_absolute():
@@ -252,7 +279,103 @@ def json_shape_warnings(text: str) -> tuple[str, list[dict[str, Any]]]:
     return "json", warnings
 
 
-def score_prompt(text: str, *, provider: str, minimum_cacheable_tokens: int) -> dict[str, Any]:
+def build_amortization_report(
+    *,
+    eligible: bool,
+    prefix_tokens: int,
+    expected_reuses: int,
+    cache_write_multiplier: float | None,
+    cache_read_multiplier: float | None,
+) -> dict[str, Any]:
+    """Return advisory cache amortization math using user-supplied multipliers.
+
+    ``expected_reuses`` means future cache reads after the initial cache write.
+    Multipliers are relative to uncached prefix input cost = 1.0.  Provider
+    pricing/cache policies change, so ContextGuard intentionally does not ship
+    provider-specific multiplier defaults.
+    """
+    supplied = cache_write_multiplier is not None and cache_read_multiplier is not None
+    break_even_reuses: int | None = None
+    expected_uncached_relative_cost: float | None = None
+    expected_cached_relative_cost: float | None = None
+    expected_relative_savings: float | None = None
+    status = "multipliers_not_supplied"
+    risk = "unknown"
+
+    if not eligible:
+        status = "not_cacheable"
+        risk = "high"
+    elif not supplied:
+        status = "multipliers_not_supplied"
+        risk = "unknown"
+    else:
+        expected_uncached_relative_cost = 1.0 + expected_reuses
+        expected_cached_relative_cost = cache_write_multiplier + (expected_reuses * cache_read_multiplier)
+        expected_relative_savings = expected_uncached_relative_cost - expected_cached_relative_cost
+        if cache_read_multiplier < 1.0:
+            if cache_write_multiplier <= 1.0:
+                break_even_reuses = 0
+            else:
+                break_even_reuses = int(math.ceil((cache_write_multiplier - 1.0) / (1.0 - cache_read_multiplier)))
+            if expected_reuses >= break_even_reuses:
+                status = "already_break_even_on_write" if break_even_reuses == 0 else "amortizes_with_expected_reuses"
+                risk = "low"
+            elif expected_reuses > 0:
+                status = "not_enough_expected_reuses"
+                risk = "medium"
+            else:
+                status = "not_enough_expected_reuses"
+                risk = "high"
+        elif cache_read_multiplier == 1.0 and cache_write_multiplier <= 1.0:
+            break_even_reuses = 0
+            status = "already_break_even_on_write"
+            risk = "low"
+        elif cache_read_multiplier > 1.0 and cache_write_multiplier <= 1.0 and expected_reuses == 0:
+            break_even_reuses = 0
+            status = "already_break_even_on_write"
+            risk = "low"
+        elif cache_read_multiplier > 1.0 and expected_relative_savings >= 0:
+            break_even_reuses = 0 if cache_write_multiplier <= 1.0 else None
+            status = "amortizes_with_expected_reuses"
+            risk = "medium"
+        else:
+            status = "no_read_discount"
+            risk = "high"
+
+    return {
+        "expected_reuses": expected_reuses,
+        "expected_reuses_semantics": "future_cache_reads_after_initial_write",
+        "cacheable_prefix_tokens": prefix_tokens,
+        "break_even_reuses": break_even_reuses,
+        "status": status,
+        "risk": risk,
+        "cache_write_multiplier": cache_write_multiplier,
+        "cache_read_multiplier": cache_read_multiplier,
+        "expected_uncached_relative_cost": expected_uncached_relative_cost,
+        "expected_cached_relative_cost": expected_cached_relative_cost,
+        "expected_relative_savings": expected_relative_savings,
+        "multiplier_baseline": "uncached_prefix_input_cost_equals_1.0",
+        "user_supplied_multipliers": supplied,
+        "formula": "expected_cached=write_multiplier + expected_reuses*read_multiplier; expected_uncached=1 + expected_reuses; break_even=ceil((write_multiplier - 1.0)/(1.0-read_multiplier)) only when read_multiplier<1",
+        "claim_boundary": {
+            "advisory_only": True,
+            "provider_pricing_defaults_included": False,
+            "provider_measured_cache_hit": False,
+            "hosted_api_token_or_cost_savings_claim_allowed": False,
+            "requires_user_supplied_or_provider_documented_multipliers": True,
+        },
+    }
+
+
+def score_prompt(
+    text: str,
+    *,
+    provider: str,
+    minimum_cacheable_tokens: int,
+    expected_reuses: int = DEFAULT_EXPECTED_REUSES,
+    cache_write_multiplier: float | None = None,
+    cache_read_multiplier: float | None = None,
+) -> dict[str, Any]:
     prompt_kind, shape_warnings = json_shape_warnings(text)
     dynamic_offset, dynamic_marker = first_dynamic_marker(text)
     prefix_text = text if dynamic_offset is None else text[:dynamic_offset]
@@ -282,13 +405,14 @@ def score_prompt(text: str, *, provider: str, minimum_cacheable_tokens: int) ->
             "message": "Anthropic caching usually requires cache_control around the reusable prefix.",
         })
 
+    eligible = prefix_estimated >= minimum_cacheable_tokens
     return {
         "tool": TOOL_NAME,
         "schema_version": SCHEMA_VERSION,
         "provider": provider,
         "prompt_kind": prompt_kind,
         "minimum_cacheable_tokens": minimum_cacheable_tokens,
-        "eligible": prefix_estimated >= minimum_cacheable_tokens,
+        "eligible": eligible,
         "estimated_tokens": estimated,
         "cacheable_prefix_tokens": prefix_estimated,
         "token_estimate": {
@@ -305,6 +429,13 @@ def score_prompt(text: str, *, provider: str, minimum_cacheable_tokens: int) ->
         "static_prefix_ratio": round(static_ratio, 6),
         "warnings": warnings,
         "provider_caveat": PROVIDER_CAVEATS[provider],
+        "amortization": build_amortization_report(
+            eligible=eligible,
+            prefix_tokens=prefix_estimated,
+            expected_reuses=expected_reuses,
+            cache_write_multiplier=cache_write_multiplier,
+            cache_read_multiplier=cache_read_multiplier,
+        ),
         "raw_prompt_stored": False,
         "claim_boundary": {
             "advisory_only": True,
@@ -320,11 +451,15 @@ def render_text(report: dict[str, Any]) -> str:
     status = "eligible" if report.get("eligible") else "not eligible"
     warnings = report.get("warnings") if isinstance(report.get("warnings"), list) else []
     warning_codes = ", ".join(str(item.get("code")) for item in warnings if isinstance(item, dict)) or "none"
+    amortization = report.get("amortization") if isinstance(report.get("amortization"), dict) else {}
     return (
         f"{TOOL_NAME}: {status} for {report['provider']} "
         f"(static_prefix≈{report['cacheable_prefix_tokens']} char/4 tokens, "
         f"minimum={report['minimum_cacheable_tokens']})\n"
         f"warnings: {warning_codes}\n"
+        f"amortization: {amortization.get('status', 'unknown')} "
+        f"(risk={amortization.get('risk', 'unknown')}, "
+        f"break_even_reuses={amortization.get('break_even_reuses')})\n"
         "claim boundary: advisory static lint only; not a measured provider cache hit or cost saving.\n"
     )
 
@@ -344,6 +479,24 @@ def build_parser() -> argparse.ArgumentParser:
         help="override provider threshold for model/platform-specific cache minimums",
     )
     parser.add_argument("--max-input-bytes", default=DEFAULT_MAX_INPUT_BYTES, help=f"maximum input bytes (default: {DEFAULT_MAX_INPUT_BYTES})")
+    parser.add_argument(
+        "--expected-reuses",
+        default=DEFAULT_EXPECTED_REUSES,
+        help=(
+            "future cache reads expected after the initial write; advisory only "
+            f"(default: {DEFAULT_EXPECTED_REUSES})"
+        ),
+    )
+    parser.add_argument(
+        "--cache-write-multiplier",
+        default=None,
+        help="optional user-supplied cache write multiplier relative to uncached prefix input cost=1.0",
+    )
+    parser.add_argument(
+        "--cache-read-multiplier",
+        default=None,
+        help="optional user-supplied cache read multiplier relative to uncached prefix input cost=1.0",
+    )
     parser.add_argument("--json", action="store_true", help="emit stable JSON")
     return parser
 
@@ -362,8 +515,34 @@ def main(argv: list[str] | None = None) -> int:
             maximum=10_000_000,
             name="--minimum-cacheable-tokens",
         )
+        expected_reuses = bounded_int(
+            args.expected_reuses,
+            default=DEFAULT_EXPECTED_REUSES,
+            minimum=0,
+            maximum=MAX_EXPECTED_REUSES,
+            name="--expected-reuses",
+        )
+        cache_write_multiplier = bounded_float(
+            args.cache_write_multiplier,
+            minimum=0.0,
+            maximum=MAX_CACHE_MULTIPLIER,
+            name="--cache-write-multiplier",
+        )
+        cache_read_multiplier = bounded_float(
+            args.cache_read_multiplier,
+            minimum=0.0,
+            maximum=MAX_CACHE_MULTIPLIER,
+            name="--cache-read-multiplier",
+        )
         text = read_limited_path(Path(args.input), max_input_bytes) if args.input else read_limited_stdin(max_input_bytes)
-        report = score_prompt(text, provider=provider, minimum_cacheable_tokens=minimum)
+        report = score_prompt(
+            text,
+            provider=provider,
+            minimum_cacheable_tokens=minimum,
+            expected_reuses=expected_reuses,
+            cache_write_multiplier=cache_write_multiplier,
+            cache_read_multiplier=cache_read_multiplier,
+        )
         if args.json:
             sys.stdout.write(json_bytes(report, indent=2) + "\n")
         else:
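To make the `read_multiplier < 1` branch of `build_amortization_report` concrete, here is a standalone sketch that reproduces its break-even and status/risk classification. It deliberately skips the eligibility check and the `read_multiplier >= 1` branches; `amortize_with_read_discount` and the `Amortization` dataclass are hypothetical names, and the multipliers in the usage lines are illustrative, not provider pricing.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Amortization:
    break_even_reuses: int
    status: str
    risk: str
    expected_relative_savings: float


def amortize_with_read_discount(expected_reuses: int, write_mult: float, read_mult: float) -> Amortization:
    """Sketch of the read_multiplier < 1 branch; costs are relative to uncached prefix = 1.0."""
    if not 0.0 <= read_mult < 1.0:
        raise ValueError("this sketch only covers read_mult in [0, 1)")
    uncached = 1.0 + expected_reuses                   # one uncached send per use
    cached = write_mult + expected_reuses * read_mult  # one write, then discounted reads
    if write_mult <= 1.0:
        break_even = 0  # the write alone is no more expensive than an uncached send
    else:
        break_even = math.ceil((write_mult - 1.0) / (1.0 - read_mult))
    if expected_reuses >= break_even:
        status = "already_break_even_on_write" if break_even == 0 else "amortizes_with_expected_reuses"
        risk = "low"
    elif expected_reuses > 0:
        status, risk = "not_enough_expected_reuses", "medium"
    else:
        status, risk = "not_enough_expected_reuses", "high"
    return Amortization(break_even, status, risk, uncached - cached)


result = amortize_with_read_discount(expected_reuses=1, write_mult=1.25, read_mult=0.1)
print(result.break_even_reuses, result.status, result.risk)
# 1 amortizes_with_expected_reuses low
```

Savings are compared with `math.isclose` rather than `==` in any checks, because values such as `2.0 - 1.35` are not exactly representable in binary floating point.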
diff --git a/context-guard-kit/tool_schema_pruner.py b/context-guard-kit/tool_schema_pruner.py
index c070c42..d2ae4a1 100755
--- a/context-guard-kit/tool_schema_pruner.py
+++ b/context-guard-kit/tool_schema_pruner.py
@@ -844,7 +844,14 @@ def defer_report(args: argparse.Namespace) -> str:
         namespace_top=namespace_top,
     )
     all_schema_bytes = sum(byte_len_json(cand.schema) for cand in ranked)
+    listed_deferred_schema_bytes = sum(byte_len_json(cand.schema) for cand in deferred_candidates)
+    total_deferred_schema_bytes = sum(byte_len_json(cand.schema) for cand in ranked[core_top:])
     tool_stub_report_bytes = byte_len_json(core_tools) + byte_len_json(deferred_tools)
+    all_schema_tokens = proxy_tokens(all_schema_bytes)
+    inline_core_schema_tokens = proxy_tokens(core_schema_bytes)
+    listed_deferred_schema_tokens = proxy_tokens(listed_deferred_schema_bytes)
+    total_deferred_schema_tokens = proxy_tokens(total_deferred_schema_bytes)
+    tool_stub_report_tokens = proxy_tokens(tool_stub_report_bytes)
     result = {
         "tool": TOOL_NAME,
         "schema_version": DEFER_SCHEMA_VERSION,
@@ -862,6 +869,7 @@ def defer_report(args: argparse.Namespace) -> str:
         "deferred_tools_truncated_count": max(0, len(ranked) - core_top - len(deferred_tools)),
         "deferred_namespaces": deferred_namespaces,
         "deferred_namespaces_truncated_count": deferred_namespaces_truncated_count,
+        "deferred_schema_retrieval_required_before_use": True,
         "receipt": {
             **receipt,
             "bytes": receipt_size,
@@ -871,9 +879,21 @@ def defer_report(args: argparse.Namespace) -> str:
             "method": "char4_proxy",
             "chars_per_token": TOKEN_PROXY_CHARS_PER_TOKEN,
             "all_schema_bytes": all_schema_bytes,
+            "inline_core_schema_bytes": core_schema_bytes,
+            "listed_deferred_schema_bytes": listed_deferred_schema_bytes,
+            "total_deferred_schema_bytes": total_deferred_schema_bytes,
             "tool_stub_report_bytes": tool_stub_report_bytes,
-            "all_schema_tokens_estimated": proxy_tokens(all_schema_bytes),
-            "tool_stub_report_tokens_estimated": proxy_tokens(tool_stub_report_bytes),
+            "all_schema_tokens_estimated": all_schema_tokens,
+            "inline_core_schema_tokens_estimated": inline_core_schema_tokens,
+            "listed_deferred_schema_tokens_estimated": listed_deferred_schema_tokens,
+            "total_deferred_schema_tokens_estimated": total_deferred_schema_tokens,
+            "tool_stub_report_tokens_estimated": tool_stub_report_tokens,
+            "gross_listed_deferred_schema_tokens_avoided": listed_deferred_schema_tokens,
+            "gross_total_deferred_schema_tokens_avoided": total_deferred_schema_tokens,
+            "net_initial_report_tokens_delta": tool_stub_report_tokens - all_schema_tokens,
+            "net_initial_report_tokens_delta_semantics": "tool_stub_report_tokens_estimated_minus_all_schema_tokens_estimated",
+            "estimated_initial_schema_tokens_avoided": max(0, all_schema_tokens - tool_stub_report_tokens),
+            "estimated_initial_schema_tokens_avoided_semantics": "max(0, all_schema_tokens_estimated - tool_stub_report_tokens_estimated)",
             "claim_boundary": "proxy_only_not_provider_billed_tokens",
         },
         "provider_patterns": [
@@ -899,11 +919,13 @@ def defer_report(args: argparse.Namespace) -> str:
             "provider_tool_search_configured": False,
             "hosted_api_token_or_cost_savings_claim_allowed": False,
             "requires_provider_measured_matched_tasks_for_savings_claims": True,
+            "deferred_schema_retrieval_required_before_use": True,
         },
         "redaction": {"redacted_values": total_redactions},
         "caveats": [
             "Deferred loading is an application strategy report, not a native provider integration.",
             "Token proxy values are char/4 estimates over sanitized local JSON, not billed provider tokens.",
+            "Deferred schema token fields are initial-prompt proxy accounting only, not end-to-end token savings.",
             "Use receipt get commands to retrieve full sanitized schemas before using deferred tools.",
         ],
     }
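The new `defer-report` accounting fields can be reproduced on toy data with the sketch below. It assumes the token proxy rounds `bytes / 4` up and that schema sizes are measured as compact, key-sorted UTF-8 JSON; neither helper body appears in this diff, so both are assumptions. It also simplifies inline core size to the first `core_top` ranked schemas and takes a single stub-report object, whereas the real report uses its own `core_schema_bytes` and sums `core_tools` and `deferred_tools`. All names below are hypothetical.

```python
import json
import math
from typing import Any

CHARS_PER_TOKEN = 4  # char/4 proxy, matching TOKEN_PROXY_CHARS_PER_TOKEN


def json_utf8_bytes(value: Any) -> int:
    """UTF-8 size of compact, key-sorted JSON (serialization settings are assumed)."""
    text = json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return len(text.encode("utf-8"))


def char4_proxy_tokens(num_bytes: int) -> int:
    """Assumed rounding: ceil(bytes / 4); the real proxy helper may round differently."""
    return math.ceil(num_bytes / CHARS_PER_TOKEN)


def defer_accounting(ranked_schemas: list[dict[str, Any]], core_top: int, stub_report: Any) -> dict[str, int]:
    """Gross/net initial-prompt proxy accounting for a ranked catalog (sketch)."""
    all_tokens = char4_proxy_tokens(sum(json_utf8_bytes(s) for s in ranked_schemas))
    core_tokens = char4_proxy_tokens(sum(json_utf8_bytes(s) for s in ranked_schemas[:core_top]))
    deferred_tokens = char4_proxy_tokens(sum(json_utf8_bytes(s) for s in ranked_schemas[core_top:]))
    stub_tokens = char4_proxy_tokens(json_utf8_bytes(stub_report))
    return {
        "all_schema_tokens_estimated": all_tokens,
        "inline_core_schema_tokens_estimated": core_tokens,
        # Gross: every schema past core_top leaves the first prompt.
        "gross_total_deferred_schema_tokens_avoided": deferred_tokens,
        "tool_stub_report_tokens_estimated": stub_tokens,
        # Net: negative means the stub report is smaller than inlining every schema.
        "net_initial_report_tokens_delta": stub_tokens - all_tokens,
        "estimated_initial_schema_tokens_avoided": max(0, all_tokens - stub_tokens),
    }


catalog = [{"name": "read_file"}, {"name": "grep"}, {"name": "deploy"}]
print(defer_accounting(catalog, core_top=1, stub_report=[{"name": "read_file"}]))
```

Because each bucket is rounded up separately, the core and deferred proxies need not add up to the all-schema proxy exactly; the gross and net fields only describe the first prompt, and any deferred schema fetched later with `get` adds its size back.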
diff --git a/plugins/context-guard/README.ko.md b/plugins/context-guard/README.ko.md
index a86ec77..9340a80 100644
--- a/plugins/context-guard/README.ko.md
+++ b/plugins/context-guard/README.ko.md
@@ -79,7 +79,9 @@ context-guard-sanitize-output -- git diff
 context-guard-pack auto --root . --query "failing tests review" --diff HEAD --manifest-out suggested-pack.json --pack-out context-pack.md --budget-bytes 12000 --json --explain
 context-guard-pack build --root . --manifest suggested-pack.json --budget-bytes 12000 --json
 context-guard-pack slice --root . --path README.md --lines 1:40 --json
+context-guard-cache-score --input prompt.json --provider openai --json
 context-guard-tool-prune select --catalog tools.json --query "review failing tests" --top 5 --budget-bytes 12000 --json
+context-guard-tool-prune defer-report --catalog tools.json --query "review failing tests" --core-top 3 --deferred-top 20 --json
 context-guard-tool-prune get <receipt_id> --tool read_file --json
 context-guard-statusline
 context-guard-statusline-merged
@@ -92,15 +94,15 @@ context-guard-statusline-merged
 - **Large-read guard and symbol reader** steer the agent toward search, symbol spans, and small line-range reads before it reads a whole file. Span reads are supported for Python, JavaScript/TypeScript, Go, and Rust sources.
 - **Local log store** saves large command output, redacted, under `.context-guard/artifacts` by default. It returns either a receipt containing line-numbered top errors, duplicate-line groups, and sanitized bounded suggested queries, or only the exact line range requested. `get` and `list` can also read pre-rebrand `.claude-token-optimizer/artifacts` receipts.
 - **Budgeted context packer** assembles prioritized local file evidence into a Markdown pack that fits a rendered byte budget. It records included/partial/omitted source metadata, bounded `.context-guard/packs` receipts, exact sanitized `slice` commands only when safe, and `retrieval_omitted_reason` when not. The additive `auto` subcommand runs recommendation and pack build in one step, and `auto --explain` adds short deterministic local selection/build reasons without changing the manifest, pack body, receipt, or byte budget. The bounded repo-map in JSON explain provides a sampled byte/token-proxy tree, category-only secret-risk counts, signature-first hints, explain-only graph ranks, and existing `slice`/symbol retrieval hints, but it does not drive pack selection and is not a provider savings claim. `suggest` ranks local query, diff, explicit file, and sanitized output/test-output signals into a `build`-compatible manifest without network calls, model calls, embeddings, or provider cost estimates. Token counts are estimated `chars_div_4` proxies, not measured provider-token savings.
-- **Tool/MCP schema pruner** ranks local tool catalogs into bounded top-k advisory reports and preserves retrieval of full sanitized schemas through compact receipts and payload integrity checks.
+- **Tool/MCP schema pruner** ranks local tool catalogs into bounded top-k advisory reports and preserves retrieval of full sanitized schemas through compact receipts and payload integrity checks. `defer-report` separates core inline tools from deferred stub/namespace summaries and shows gross deferred-schema and net initial-report `chars_div_4` proxy accounting, but full schemas must be retrieved again before a deferred tool is used.
 - **Conservative compressor** classifies sanitized stdin as JSON, diff, log, search output, code, or prose, and exposes observed byte evidence alongside estimated token proxies.
-- **Anthropic cost guard and route advisor** provides pre-call cost estimates, provider-usage reconciliation, keyed-HMAC cache-risk history, and stable-prefix layout advice through `context-guard cost preflight/observe/ledger/compile`. `context-guard route-advisor` is a local-only passive advisor that reads caller-supplied workload JSON, provider feature declarations, usage telemetry, and shifted external/local costs; it outputs total-cost accounting, batchability blockers, and route candidates without starting a queue, calling providers, refreshing pricing docs, or treating provider feature knowledge as authoritative. It stores no raw prompt text and does not replace Anthropic/provider prompt caching, and its recommendations are not hosted token/cost savings claims without matched successful tasks, non-inferior quality evidence, and shifted-cost accounting.
+- **Static cache-score lint plus Anthropic cost guard and route advisor** uses `context-guard-cache-score` to advise on local prompt/request cache layout and on amortization risk computed from user-supplied cache write/read multipliers, and provides pre-call cost estimates, provider-usage reconciliation, keyed-HMAC cache-risk history, and stable-prefix layout advice through `context-guard cost preflight/observe/ledger/compile`. `context-guard route-advisor` is a local-only passive advisor that reads caller-supplied workload JSON, provider feature declarations, usage telemetry, and shifted external/local costs; it outputs total-cost accounting, batchability blockers, and route candidates without starting a queue, calling providers, refreshing pricing docs, or treating provider feature knowledge as authoritative. It stores no raw prompt text and does not replace Anthropic/provider prompt caching, and its recommendations are not hosted token/cost savings claims without matched successful tasks, non-inferior quality evidence, and shifted-cost accounting.
 - **Output trimmer** trims long logs while preserving the wrapped command's exit code, and with `--digest markdown` or `--digest json` can produce a summary containing runner failure facts, sanitized failure signatures, duplicate-line groups, and suggested next queries.
 - **Sanitizer** redacts credential patterns, private key blocks, auth headers, URLs containing credentials, and sensitive-looking paths from search, diff, and log output.
 - **Statusline** shows compact model, context, and cost signals and, when transcript data is available, cache-read and cache-reuse signals as well.
 - **Transcript audit** aggregates usage/cost/cache buckets and reports token hotspots, `cache_friendliness` prompt-layout signals, and `cache_layout_advice` verification/experiment priorities using bounded, redacted segment hashes. Raw prompt text is never printed.
 - **Repeated-failure nudge** prompts the agent to change strategy, instead of retrying the same path, when Bash failures repeat.
-- **Benchmark helper** matches baseline/variant runs and records real token and cost fields, separate byte-reduction proxy evidence, diagnostic `wall_time_seconds`, `provider_cached_tokens`, provider-cache availability telemetry, file-backed `variant_prompt_files`, and optional per-run `self_hosted_metrics` JSONL ledger sidecars. These sidecars are not folded into hosted API savings claims.
+- **Benchmark helper** matches baseline/variant runs and records real token and cost fields, separate byte-reduction proxy evidence, diagnostic `wall_time_seconds`, `provider_cached_tokens`, provider-cache availability telemetry, a report-level measurement-baseline contract, file-backed `variant_prompt_files`, and optional per-run `self_hosted_metrics` JSONL ledger sidecars. These sidecars are not folded into hosted API savings claims.
 
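 The cost guard's local HMAC key is generated automatically at `.context-guard/cost-ledger/hmac.key` by default. If an administrator provisions it directly, the file must contain exactly one canonical URL-safe base64 32-byte key with the required padding; trailing newlines and whitespace are not allowed. Reports never print the key or raw prompt text, and the local ledger does not replace Anthropic/provider prompt caching.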
 
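The key-file rules above (one canonical URL-safe base64 encoding of 32 bytes, padding required, no trailing newline or whitespace) can be checked with a round trip. The sketch below is an assumption built from those stated rules, not ContextGuard's own validator; `new_key_text`, `is_canonical_key_text`, and `write_key` are hypothetical names.

```python
import base64
import secrets
from pathlib import Path

KEY_BYTES = 32


def new_key_text() -> str:
    # 32 random bytes encode to 44 URL-safe base64 chars ending in one "=".
    return base64.urlsafe_b64encode(secrets.token_bytes(KEY_BYTES)).decode("ascii")


def is_canonical_key_text(text: str) -> bool:
    try:
        raw = base64.urlsafe_b64decode(text.encode("ascii"))
    except ValueError:  # covers binascii.Error and UnicodeEncodeError
        return False
    # Re-encoding must reproduce the input exactly; this rejects trailing
    # newlines, whitespace, missing padding, and "+"/"/" alphabet characters.
    return len(raw) == KEY_BYTES and base64.urlsafe_b64encode(raw).decode("ascii") == text


def write_key(path: Path) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    path.touch(mode=0o600, exist_ok=False)  # refuse to overwrite an existing key
    path.write_text(new_key_text(), encoding="ascii")  # write_text adds no newline
```

Comparing the re-encoded bytes with the input is stricter than decoding alone, because the standard decoder silently discards characters outside the alphabet.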
diff --git a/plugins/context-guard/README.md b/plugins/context-guard/README.md
index b628282..d3c10c1 100644
--- a/plugins/context-guard/README.md
+++ b/plugins/context-guard/README.md
@@ -103,15 +103,15 @@ context-guard-statusline-merged
 - **Declarative output filter** validates user-owned JSON filter files outside package code and applies the first matching line filter only as an explicit `run --config ... -- <command>` wrapper. Invalid configs, no-match commands, filter errors, empty filtered output, and protected `git`/test/lint/`gh` command failures preserve original stdout/stderr and exit code. Filtered mode applies line rules to combined stdout+stderr and writes the filtered result to stdout; `--json-report` diagnostics go to stderr, except protected nonzero passthrough suppresses reports to keep stderr raw. It is local and opt-in, with no savings guarantee.
 - **Artifact store** saves large sanitized command output under `.context-guard/artifacts` by default and returns compact receipts, local sandbox search results, or exact requested slices. JSON receipts include line-numbered top errors, duplicate-line groups, and sanitized bounded suggested queries. `search` scans sanitized local artifacts by literal substring, emits capped match/context records, and includes `get --lines START:END` rehydration commands without hosted token/cost savings claims. Custom `--dir` raw paths stay redacted by default; reuse the same `--dir` or opt into `search --show-paths` for a directly executable local command. In suggested `--lines START:END` queries, `--max-lines` is only the returned-line cap for that selected range, not a wider selector. `get`, `list`, and `search` can also read legacy `.claude-token-optimizer/artifacts` receipts.
 - **Budgeted context packer** assembles prioritized local file evidence into a rendered byte-budgeted Markdown pack with included/partial/omitted source metadata, bounded `.context-guard/packs` receipts, exact sanitized `slice` commands when safe, and `retrieval_omitted_reason` when a path/root should not be echoed. The additive `auto` subcommand runs that recommendation and pack build in one step, and `auto --explain` adds compact deterministic local selection/build reasons without changing the manifest, pack body, receipt, or byte budget. JSON explain also includes bounded repo-map metadata: sampled byte/token-proxy tree entries, category-only secret-risk counts, signature-first hints, explain-only graph ranks, and exact `slice`/symbol retrieval hints. `suggest` remains available to rank local query, diff, explicit file, and sanitized output/test-output signals into a build-compatible manifest without network, model, embedding, or provider-cost calls. `suggest/auto --adaptive-k` adds advisory-only shrink/expand top-k metadata from local score distribution, byte-budget fit, and score-mass recall/precision proxies; it never applies the recommendation automatically or changes the manifest, pack body, receipt, or byte budget. `auto --symbol-memory` adds repo-map-derived symbol/graph advisory metadata with exact `slice`/`read-symbol` verification hints and still does not change selection or pack output. Token counts are estimated `chars_div_4` proxies, not measured provider-token savings.
-- **Tool/MCP schema pruner** ranks local tool catalogs into bounded top-k advisory reports while preserving full sanitized schema fallback through compact receipts and payload integrity checks.
+- **Tool/MCP schema pruner** ranks local tool catalogs into bounded top-k advisory reports while preserving full sanitized schema fallback through compact receipts and payload integrity checks. `defer-report` additionally separates core inline tools from deferred stubs/namespaces and reports gross deferred-schema plus net initial-report char/4 proxy accounting; full schemas must still be retrieved before a deferred tool is used.
 - **Conservative compressor** classifies sanitized stdin as JSON, diff, log, search output, code, or prose and shrinks it with observed byte evidence plus estimated token proxies. Add `--protected-policy` for opt-in protected-zone class/count metadata that denies semantic rewrites for code fences, diffs, identifiers, numeric constants, hashes, paths, stack frames, quoted strings, and JSON keys while preserving exact-retrieval guidance. Add `--mode readable` only for sanitized prose previews: it uses deterministic sentence windows, blocks prompt-like/high-risk protected signals, stores no raw protected spans, and does not run learned compressors, models, embeddings, or rerankers.
-- **Anthropic cost guard and route advisor** provides `context-guard cost preflight/observe/ledger/compile` for passive pre-call estimates, provider-usage reconciliation, keyed-HMAC cache-risk history, and stable-prefix layout advice. `context-guard route-advisor` is a local-only passive advisor for caller-supplied workload JSON, provider feature declarations, usage telemetry, and shifted external/local costs; it emits total-cost accounting, batchability blockers, and route candidates without starting a queue, calling providers, refreshing pricing docs, or treating provider feature knowledge as authoritative. It stores no raw prompt text, does not replace Anthropic/provider prompt caching, and its recommendations are not hosted token/cost savings claims without matched successful tasks, non-inferior quality evidence, and shifted-cost accounting.
+- **Static cache-score lint plus Anthropic cost guard and route advisor** provides `context-guard-cache-score` for local prompt/request cache layout checks, plus optional amortization-risk estimates from user-supplied cache write/read multipliers, and `context-guard cost preflight/observe/ledger/compile` for passive pre-call estimates, provider-usage reconciliation, keyed-HMAC cache-risk history, and stable-prefix layout advice. `context-guard route-advisor` is a local-only passive advisor for caller-supplied workload JSON, provider feature declarations, usage telemetry, and shifted external/local costs; it emits total-cost accounting, batchability blockers, and route candidates without starting a queue, calling providers, refreshing pricing docs, or treating provider feature knowledge as authoritative. It stores no raw prompt text, does not replace Anthropic/provider prompt caching, and its recommendations are not hosted token/cost savings claims without matched successful tasks, non-inferior quality evidence, and shifted-cost accounting.
 - **Output trimmer** preserves the wrapped command exit code, trims long logs, and can emit `--digest markdown` or `--digest json` summaries with runner failure facts, sanitized failure signatures, duplicate-line groups, and suggested next queries. Add `--artifact-receipt` with digest mode to store the exact sanitized full output as a local artifact receipt and re-expand omitted slices with the emitted `context-guard-artifact get ...` command.
 - **Sanitizer** redacts common credential patterns, private key blocks, auth headers, credential URLs, and sensitive-looking paths from search, diff, and log output.
 - **Statusline** displays compact model/context/cost signals and, when transcript data is available, cache-read and cache-reuse signals.
 - **Transcript audit** aggregates usage/cost/cache buckets, flags likely token hotspots, and exposes `cache_friendliness`, additive [`cache_diagnostics`](https://github.com/ictechgy/context-guard/blob/main/docs/cache-diagnostics-schema.md), and `cache_layout_advice` experiment priorities from bounded usage fields, timestamped cache telemetry records, and redacted segment hashes without printing raw prompt text or claiming provider-cache savings.
 - **Repeated-failure nudge** warns after repeated Bash failures so the agent switches strategy instead of retrying the same context-heavy path.
-- **Benchmark helper** records matched baseline/variant runs with real token and cost fields, separate byte-reduction proxy evidence, diagnostic `wall_time_seconds`, `provider_cached_tokens`, provider-cache availability telemetry, file-backed `variant_prompt_files`, and optional per-run `self_hosted_metrics` JSONL ledger sidecars that stay out of hosted API savings claims.
+- **Benchmark helper** records matched baseline/variant runs with real token and cost fields, separate byte-reduction proxy evidence, diagnostic `wall_time_seconds`, `provider_cached_tokens`, provider-cache availability telemetry, a report-level measurement-baseline contract, file-backed `variant_prompt_files`, and optional per-run `self_hosted_metrics` JSONL ledger sidecars that stay out of hosted API savings claims.
 
 Cost guard creates its local HMAC key automatically at `.context-guard/cost-ledger/hmac.key`. If you provision that file yourself, it must contain exactly one canonical URL-safe base64 32-byte key with required padding and no trailing newline or whitespace. Reports never emit the key or raw prompt text, and the local ledger does not replace Anthropic/provider prompt caching.
 
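The artifact `search` behavior described above (literal substring matches, a cap on returned matches, context lines, and a `get --lines START:END` rehydration command per match) can be sketched as follows. This is a hypothetical illustration: `search_artifact_lines` is not the tool's function, the `context-guard-artifact get <artifact_id> --lines START:END` argument order is assumed, and the real command also sanitizes input and enforces byte caps.

```python
def search_artifact_lines(lines, needle, *, artifact_id, max_matches=20, context=2):
    """Capped literal search returning 1-based line ranges for rehydration."""
    matches = []
    truncated = False
    for index, text in enumerate(lines, start=1):
        if needle not in text:  # literal substring, no regex
            continue
        if len(matches) >= max_matches:
            truncated = True  # more matches exist beyond the cap
            break
        start = max(1, index - context)
        end = min(len(lines), index + context)
        matches.append({
            "line": index,
            "context": lines[start - 1:end],
            # Assumed command shape; the exact CLI argument order may differ.
            "get_command": f"context-guard-artifact get {artifact_id} --lines {start}:{end}",
        })
    return {"matches": matches, "truncated": truncated}
```

Emitting an exact line range per match lets the agent fetch only the surrounding slice instead of re-reading the whole artifact.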
diff --git a/plugins/context-guard/bin/context-guard-bench b/plugins/context-guard/bin/context-guard-bench
index 70afd68..e338b88 100755
--- a/plugins/context-guard/bin/context-guard-bench
+++ b/plugins/context-guard/bin/context-guard-bench
@@ -184,6 +184,7 @@ MAX_USAGE_COST_USD = 10**9
 TOKEN_PROXY_BYTES_PER_TOKEN = 4
 BENCH_RUN_EVIDENCE_SCHEMA_VERSION = "contextguard.bench.run-evidence.v1"
 MATCHED_PAIR_EVIDENCE_SCHEMA_VERSION = "contextguard.bench.matched-pair.v1"
+MEASUREMENT_BASELINE_SCHEMA_VERSION = "contextguard.bench.measurement-baseline.v1"
 SELF_HOSTED_METRICS_SCHEMA_VERSION = "contextguard.bench.self-hosted-metrics.v1"
 SELF_HOSTED_METRICS_KEY = "self_hosted_metrics"
 SELF_HOSTED_METRICS_CLAIM_BOUNDARY = "self_hosted_metrics_only_not_hosted_api_token_or_cost_savings"
@@ -1546,6 +1547,77 @@ def row_cost_shift_measured(row: dict[str, str]) -> bool:
     )
 
 
+def measurement_baseline_contract() -> dict[str, Any]:
+    """Describe the benchmark report's current measurement baseline contract.
+
+    This block is descriptive. It does not change the CSV schema and does not
+    grant token/cost savings claims by itself; those remain gated by matched
+    successful tasks, measured primary tokens/costs, shifted-cost accounting,
+    and quality gates.
+    """
+    return {
+        "schema_version": MEASUREMENT_BASELINE_SCHEMA_VERSION,
+        "csv_schema_unchanged": True,
+        "csv_columns": list(CSV_COLUMNS),
+        "captured_fields": {
+            "task_identity": ["task_id", "variant"],
+            "run_configuration": ["model", "effort", "claude_version"],
+            "primary_token_buckets": [
+                "input_tokens",
+                "output_tokens",
+                "cache_read",
+                "cache_creation",
+                "total_tokens",
+                "primary_tokens_measured",
+            ],
+            "primary_cost": ["cost_usd", "cost_measured"],
+            "provider_cache_telemetry": ["provider_cached_tokens", "provider_cached_tokens_measured"],
+            "latency": ["wall_time_seconds"],
+            "quality_and_result": ["success", "corrections", "notes"],
+            "tooling_and_proxy_metrics": ["turns", "hook_triggers", "bytes_before", "bytes_after", "artifacts_used"],
+            "shifted_cost_accounting": [
+                "external_tokens",
+                "external_tokens_measured",
+                "external_cost_usd",
+                "external_cost_measured",
+                "total_cost_with_shift_usd",
+            ],
+        },
+        "claim_eligible_fields": {
+            "token_savings": [
+                "matched successful baseline and variant tasks",
+                "primary_tokens_measured=true on both sides",
+                "quality_gate=pass",
+            ],
+            "shifted_cost_savings": [
+                "matched successful baseline and variant tasks",
+                "cost_measured=true on both sides",
+                "external_cost_measured=true when external_tokens are present",
+                "quality_gate=pass",
+            ],
+        },
+        "proxy_only_fields": {
+            "byte_metrics": ["bytes_before", "bytes_after"],
+            "token_proxy": "chars_div_4_proxy_only",
+            "provider_cache": "diagnostic_telemetry_not_contextguard_token_reduction",
+        },
+        "missing_future_run_identity_fields": [
+            "repo_revision",
+            "agent_harness",
+            "feature_flags",
+            "provider_name",
+            "success_command_identity",
+        ],
+        "claim_boundary": {
+            "descriptive_contract_only": True,
+            "enables_savings_claims_by_itself": False,
+            "requires_matched_successful_tasks": True,
+            "requires_shifted_cost_accounting_for_cost_claims": True,
+            "raw_proxy_estimates_are_not_hosted_api_token_savings": True,
+        },
+    }
+
+
 def summarize_benchmark_rows(rows: list[dict[str, str]], baseline_variant: str) -> dict[str, Any]:
     by_variant: dict[str, dict[str, Any]] = {}
     successful_rows_by_variant_task: dict[str, dict[str, list[dict[str, str]]]] = {}
@@ -2191,6 +2263,7 @@ def summarize_benchmark_rows(rows: list[dict[str, str]], baseline_variant: str)
         "schema": "context-guard-bench-report-v1",
         "baseline_variant": baseline_variant,
         "row_count": len(rows),
+        "measurement_baseline": measurement_baseline_contract(),
         "summary_by_variant": by_variant,
         "comparisons": comparisons,
         "matched_pair_evidence": matched_pair_evidence,
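`measurement_baseline_contract()` lists the claim conditions as descriptive strings and explicitly does not enable claims by itself. A hypothetical gate that turns those strings into checks over benchmark CSV rows (`dict[str, str]`, as in `summarize_benchmark_rows`) might look like the sketch below. The boolean encoding of CSV cells, the meaning of "external_tokens are present", and where `quality_gate` comes from are all assumptions.

```python
TRUE_VALUES = {"1", "true", "yes"}  # assumed CSV boolean encoding


def flag(row: dict, column: str) -> bool:
    return str(row.get(column, "")).strip().lower() in TRUE_VALUES


def matched_successful(baseline: dict, variant: dict) -> bool:
    # Same task, different variants, both runs successful.
    return (
        baseline.get("task_id") == variant.get("task_id")
        and baseline.get("variant") != variant.get("variant")
        and flag(baseline, "success")
        and flag(variant, "success")
    )


def token_savings_claim_eligible(baseline: dict, variant: dict, quality_gate: str) -> bool:
    return (
        matched_successful(baseline, variant)
        and flag(baseline, "primary_tokens_measured")
        and flag(variant, "primary_tokens_measured")
        and quality_gate == "pass"
    )


def has_external_tokens(row: dict) -> bool:
    # Assumption: "present" means a non-empty, non-zero external_tokens cell.
    return str(row.get("external_tokens", "")).strip() not in {"", "0"}


def shifted_cost_claim_eligible(baseline: dict, variant: dict, quality_gate: str) -> bool:
    rows = (baseline, variant)
    return (
        matched_successful(baseline, variant)
        and all(flag(row, "cost_measured") for row in rows)
        and all(flag(row, "external_cost_measured") for row in rows if has_external_tokens(row))
        and quality_gate == "pass"
    )
```

Passing this gate only makes a pair eligible for comparison; proxy fields such as `bytes_before`/`bytes_after` still never count as hosted API savings.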
diff --git a/plugins/context-guard/bin/context-guard-cache-score b/plugins/context-guard/bin/context-guard-cache-score
index db642cd..c330c9d 100755
--- a/plugins/context-guard/bin/context-guard-cache-score
+++ b/plugins/context-guard/bin/context-guard-cache-score
@@ -23,6 +23,9 @@ TOOL_NAME = "context-guard-cache-score"
 SCHEMA_VERSION = "contextguard.cache-score.v1"
 DEFAULT_MAX_INPUT_BYTES = 1_000_000
 TOKEN_PROXY_CHARS_PER_TOKEN = 4
+DEFAULT_EXPECTED_REUSES = 1
+MAX_EXPECTED_REUSES = 1_000_000
+MAX_CACHE_MULTIPLIER = 1_000_000.0
 PROVIDER_MINIMUM_CACHEABLE_TOKENS = {
     # Provider and model minimums move over time.  These defaults are advisory
     # and can be overridden with --minimum-cacheable-tokens.
@@ -110,6 +113,30 @@ def bounded_int(value: object, *, default: int, minimum: int, maximum: int, name
     return number
 
 
+def bounded_float(
+    value: object,
+    *,
+    minimum: float,
+    maximum: float,
+    name: str,
+) -> float | None:
+    if value is None:
+        return None
+    if isinstance(value, bool):
+        fail(f"{name} must be a finite number")
+    try:
+        number = float(value)
+    except (TypeError, ValueError, OverflowError):
+        fail(f"{name} must be a finite number")
+    if not math.isfinite(number):
+        fail(f"{name} must be finite")
+    if number < minimum:
+        fail(f"{name} must be >= {minimum:g}")
+    if number > maximum:
+        fail(f"{name} must be <= {maximum:g}")
+    return number
+
+
 def normalized_link_target(parent: Path, raw_target: str) -> Path:
     target = Path(raw_target)
     if not target.is_absolute():
@@ -252,7 +279,103 @@ def json_shape_warnings(text: str) -> tuple[str, list[dict[str, Any]]]:
     return "json", warnings
 
 
-def score_prompt(text: str, *, provider: str, minimum_cacheable_tokens: int) -> dict[str, Any]:
+def build_amortization_report(
+    *,
+    eligible: bool,
+    prefix_tokens: int,
+    expected_reuses: int,
+    cache_write_multiplier: float | None,
+    cache_read_multiplier: float | None,
+) -> dict[str, Any]:
+    """Return advisory cache amortization math using user-supplied multipliers.
+
+    ``expected_reuses`` means future cache reads after the initial cache write.
+    Multipliers are relative to uncached prefix input cost = 1.0.  Provider
+    pricing/cache policies change, so ContextGuard intentionally does not ship
+    provider-specific multiplier defaults.
+    """
+    supplied = cache_write_multiplier is not None and cache_read_multiplier is not None
+    break_even_reuses: int | None = None
+    expected_uncached_relative_cost: float | None = None
+    expected_cached_relative_cost: float | None = None
+    expected_relative_savings: float | None = None
+    status = "multipliers_not_supplied"
+    risk = "unknown"
+
+    if not eligible:
+        status = "not_cacheable"
+        risk = "high"
+    elif not supplied:
+        status = "multipliers_not_supplied"
+        risk = "unknown"
+    else:
+        expected_uncached_relative_cost = 1.0 + expected_reuses
+        expected_cached_relative_cost = cache_write_multiplier + (expected_reuses * cache_read_multiplier)
+        expected_relative_savings = expected_uncached_relative_cost - expected_cached_relative_cost
+        if cache_read_multiplier < 1.0:
+            if cache_write_multiplier <= 1.0:
+                break_even_reuses = 0
+            else:
+                break_even_reuses = int(math.ceil((cache_write_multiplier - 1.0) / (1.0 - cache_read_multiplier)))
+            if expected_reuses >= break_even_reuses:
+                status = "already_break_even_on_write" if break_even_reuses == 0 else "amortizes_with_expected_reuses"
+                risk = "low"
+            elif expected_reuses > 0:
+                status = "not_enough_expected_reuses"
+                risk = "medium"
+            else:
+                status = "not_enough_expected_reuses"
+                risk = "high"
+        elif cache_read_multiplier == 1.0 and cache_write_multiplier <= 1.0:
+            break_even_reuses = 0
+            status = "already_break_even_on_write"
+            risk = "low"
+        elif cache_read_multiplier > 1.0 and cache_write_multiplier <= 1.0 and expected_reuses == 0:
+            break_even_reuses = 0
+            status = "already_break_even_on_write"
+            risk = "low"
+        elif cache_read_multiplier > 1.0 and expected_relative_savings >= 0:
+            break_even_reuses = 0 if cache_write_multiplier <= 1.0 else None
+            status = "amortizes_with_expected_reuses"
+            risk = "medium"
+        else:
+            status = "no_read_discount"
+            risk = "high"
+
+    return {
+        "expected_reuses": expected_reuses,
+        "expected_reuses_semantics": "future_cache_reads_after_initial_write",
+        "cacheable_prefix_tokens": prefix_tokens,
+        "break_even_reuses": break_even_reuses,
+        "status": status,
+        "risk": risk,
+        "cache_write_multiplier": cache_write_multiplier,
+        "cache_read_multiplier": cache_read_multiplier,
+        "expected_uncached_relative_cost": expected_uncached_relative_cost,
+        "expected_cached_relative_cost": expected_cached_relative_cost,
+        "expected_relative_savings": expected_relative_savings,
+        "multiplier_baseline": "uncached_prefix_input_cost_equals_1.0",
+        "user_supplied_multipliers": supplied,
+        "formula": "expected_cached=write_multiplier + expected_reuses*read_multiplier; expected_uncached=1 + expected_reuses; break_even=ceil((write_multiplier - 1.0)/(1.0-read_multiplier)) only when read_multiplier<1",
+        "claim_boundary": {
+            "advisory_only": True,
+            "provider_pricing_defaults_included": False,
+            "provider_measured_cache_hit": False,
+            "hosted_api_token_or_cost_savings_claim_allowed": False,
+            "requires_user_supplied_or_provider_documented_multipliers": True,
+        },
+    }
+
+
+def score_prompt(
+    text: str,
+    *,
+    provider: str,
+    minimum_cacheable_tokens: int,
+    expected_reuses: int = DEFAULT_EXPECTED_REUSES,
+    cache_write_multiplier: float | None = None,
+    cache_read_multiplier: float | None = None,
+) -> dict[str, Any]:
     prompt_kind, shape_warnings = json_shape_warnings(text)
     dynamic_offset, dynamic_marker = first_dynamic_marker(text)
     prefix_text = text if dynamic_offset is None else text[:dynamic_offset]
@@ -282,13 +405,14 @@ def score_prompt(text: str, *, provider: str, minimum_cacheable_tokens: int) ->
             "message": "Anthropic caching usually requires cache_control around the reusable prefix.",
         })
 
+    eligible = prefix_estimated >= minimum_cacheable_tokens
     return {
         "tool": TOOL_NAME,
         "schema_version": SCHEMA_VERSION,
         "provider": provider,
         "prompt_kind": prompt_kind,
         "minimum_cacheable_tokens": minimum_cacheable_tokens,
-        "eligible": prefix_estimated >= minimum_cacheable_tokens,
+        "eligible": eligible,
         "estimated_tokens": estimated,
         "cacheable_prefix_tokens": prefix_estimated,
         "token_estimate": {
@@ -305,6 +429,13 @@ def score_prompt(text: str, *, provider: str, minimum_cacheable_tokens: int) ->
         "static_prefix_ratio": round(static_ratio, 6),
         "warnings": warnings,
         "provider_caveat": PROVIDER_CAVEATS[provider],
+        "amortization": build_amortization_report(
+            eligible=eligible,
+            prefix_tokens=prefix_estimated,
+            expected_reuses=expected_reuses,
+            cache_write_multiplier=cache_write_multiplier,
+            cache_read_multiplier=cache_read_multiplier,
+        ),
         "raw_prompt_stored": False,
         "claim_boundary": {
             "advisory_only": True,
@@ -320,11 +451,15 @@ def render_text(report: dict[str, Any]) -> str:
     status = "eligible" if report.get("eligible") else "not eligible"
     warnings = report.get("warnings") if isinstance(report.get("warnings"), list) else []
     warning_codes = ", ".join(str(item.get("code")) for item in warnings if isinstance(item, dict)) or "none"
+    amortization = report.get("amortization") if isinstance(report.get("amortization"), dict) else {}
     return (
         f"{TOOL_NAME}: {status} for {report['provider']} "
         f"(static_prefix≈{report['cacheable_prefix_tokens']} char/4 tokens, "
         f"minimum={report['minimum_cacheable_tokens']})\n"
         f"warnings: {warning_codes}\n"
+        f"amortization: {amortization.get('status', 'unknown')} "
+        f"(risk={amortization.get('risk', 'unknown')}, "
+        f"break_even_reuses={amortization.get('break_even_reuses')})\n"
         "claim boundary: advisory static lint only; not a measured provider cache hit or cost saving.\n"
     )
 
@@ -344,6 +479,24 @@ def build_parser() -> argparse.ArgumentParser:
         help="override provider threshold for model/platform-specific cache minimums",
     )
     parser.add_argument("--max-input-bytes", default=DEFAULT_MAX_INPUT_BYTES, help=f"maximum input bytes (default: {DEFAULT_MAX_INPUT_BYTES})")
+    parser.add_argument(
+        "--expected-reuses",
+        default=DEFAULT_EXPECTED_REUSES,
+        help=(
+            "future cache reads expected after the initial write; advisory only "
+            f"(default: {DEFAULT_EXPECTED_REUSES})"
+        ),
+    )
+    parser.add_argument(
+        "--cache-write-multiplier",
+        default=None,
+        help="optional user-supplied cache write multiplier relative to uncached prefix input cost=1.0",
+    )
+    parser.add_argument(
+        "--cache-read-multiplier",
+        default=None,
+        help="optional user-supplied cache read multiplier relative to uncached prefix input cost=1.0",
+    )
     parser.add_argument("--json", action="store_true", help="emit stable JSON")
     return parser
 
@@ -362,8 +515,34 @@ def main(argv: list[str] | None = None) -> int:
             maximum=10_000_000,
             name="--minimum-cacheable-tokens",
         )
+        expected_reuses = bounded_int(
+            args.expected_reuses,
+            default=DEFAULT_EXPECTED_REUSES,
+            minimum=0,
+            maximum=MAX_EXPECTED_REUSES,
+            name="--expected-reuses",
+        )
+        cache_write_multiplier = bounded_float(
+            args.cache_write_multiplier,
+            minimum=0.0,
+            maximum=MAX_CACHE_MULTIPLIER,
+            name="--cache-write-multiplier",
+        )
+        cache_read_multiplier = bounded_float(
+            args.cache_read_multiplier,
+            minimum=0.0,
+            maximum=MAX_CACHE_MULTIPLIER,
+            name="--cache-read-multiplier",
+        )
         text = read_limited_path(Path(args.input), max_input_bytes) if args.input else read_limited_stdin(max_input_bytes)
-        report = score_prompt(text, provider=provider, minimum_cacheable_tokens=minimum)
+        report = score_prompt(
+            text,
+            provider=provider,
+            minimum_cacheable_tokens=minimum,
+            expected_reuses=expected_reuses,
+            cache_write_multiplier=cache_write_multiplier,
+            cache_read_multiplier=cache_read_multiplier,
+        )
         if args.json:
             sys.stdout.write(json_bytes(report, indent=2) + "\n")
         else:
diff --git a/plugins/context-guard/bin/context-guard-tool-prune b/plugins/context-guard/bin/context-guard-tool-prune
index c070c42..d2ae4a1 100755
--- a/plugins/context-guard/bin/context-guard-tool-prune
+++ b/plugins/context-guard/bin/context-guard-tool-prune
@@ -844,7 +844,14 @@ def defer_report(args: argparse.Namespace) -> str:
         namespace_top=namespace_top,
     )
     all_schema_bytes = sum(byte_len_json(cand.schema) for cand in ranked)
+    listed_deferred_schema_bytes = sum(byte_len_json(cand.schema) for cand in deferred_candidates)
+    total_deferred_schema_bytes = sum(byte_len_json(cand.schema) for cand in ranked[core_top:])
     tool_stub_report_bytes = byte_len_json(core_tools) + byte_len_json(deferred_tools)
+    all_schema_tokens = proxy_tokens(all_schema_bytes)
+    inline_core_schema_tokens = proxy_tokens(core_schema_bytes)
+    listed_deferred_schema_tokens = proxy_tokens(listed_deferred_schema_bytes)
+    total_deferred_schema_tokens = proxy_tokens(total_deferred_schema_bytes)
+    tool_stub_report_tokens = proxy_tokens(tool_stub_report_bytes)
     result = {
         "tool": TOOL_NAME,
         "schema_version": DEFER_SCHEMA_VERSION,
@@ -862,6 +869,7 @@ def defer_report(args: argparse.Namespace) -> str:
         "deferred_tools_truncated_count": max(0, len(ranked) - core_top - len(deferred_tools)),
         "deferred_namespaces": deferred_namespaces,
         "deferred_namespaces_truncated_count": deferred_namespaces_truncated_count,
+        "deferred_schema_retrieval_required_before_use": True,
         "receipt": {
             **receipt,
             "bytes": receipt_size,
@@ -871,9 +879,21 @@ def defer_report(args: argparse.Namespace) -> str:
             "method": "char4_proxy",
             "chars_per_token": TOKEN_PROXY_CHARS_PER_TOKEN,
             "all_schema_bytes": all_schema_bytes,
+            "inline_core_schema_bytes": core_schema_bytes,
+            "listed_deferred_schema_bytes": listed_deferred_schema_bytes,
+            "total_deferred_schema_bytes": total_deferred_schema_bytes,
             "tool_stub_report_bytes": tool_stub_report_bytes,
-            "all_schema_tokens_estimated": proxy_tokens(all_schema_bytes),
-            "tool_stub_report_tokens_estimated": proxy_tokens(tool_stub_report_bytes),
+            "all_schema_tokens_estimated": all_schema_tokens,
+            "inline_core_schema_tokens_estimated": inline_core_schema_tokens,
+            "listed_deferred_schema_tokens_estimated": listed_deferred_schema_tokens,
+            "total_deferred_schema_tokens_estimated": total_deferred_schema_tokens,
+            "tool_stub_report_tokens_estimated": tool_stub_report_tokens,
+            "gross_listed_deferred_schema_tokens_avoided": listed_deferred_schema_tokens,
+            "gross_total_deferred_schema_tokens_avoided": total_deferred_schema_tokens,
+            "net_initial_report_tokens_delta": tool_stub_report_tokens - all_schema_tokens,
+            "net_initial_report_tokens_delta_semantics": "tool_stub_report_tokens_estimated_minus_all_schema_tokens_estimated",
+            "estimated_initial_schema_tokens_avoided": max(0, all_schema_tokens - tool_stub_report_tokens),
+            "estimated_initial_schema_tokens_avoided_semantics": "max(0, all_schema_tokens_estimated - tool_stub_report_tokens_estimated)",
             "claim_boundary": "proxy_only_not_provider_billed_tokens",
         },
         "provider_patterns": [
@@ -899,11 +919,13 @@ def defer_report(args: argparse.Namespace) -> str:
             "provider_tool_search_configured": False,
             "hosted_api_token_or_cost_savings_claim_allowed": False,
             "requires_provider_measured_matched_tasks_for_savings_claims": True,
+            "deferred_schema_retrieval_required_before_use": True,
         },
         "redaction": {"redacted_values": total_redactions},
         "caveats": [
             "Deferred loading is an application strategy report, not a native provider integration.",
             "Token proxy values are char/4 estimates over sanitized local JSON, not billed provider tokens.",
+            "Deferred schema token fields are initial-prompt proxy accounting; full schemas must be retrieved before deferred tool use.",
             "Use receipt get commands to retrieve full sanitized schemas before using deferred tools.",
         ],
     }
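The new `token_proxy` fields in `defer_report` split the char/4 estimate into three schema token counts: inline core, listed deferred, and total deferred. From these it derives a signed `net_initial_report_tokens_delta` and a clamped `estimated_initial_schema_tokens_avoided`. The following is a minimal Python sketch of that accounting, under these assumptions:

- Schemas arrive ranked best-first.
- `byte_len_json` serializes compact, key-sorted UTF-8 JSON.
- `proxy_tokens` rounds up.

The real helpers may differ. `defer_token_proxy` and its `listed_top` parameter are hypothetical. All values remain initial-prompt proxies. Deferred schemas still have to be retrieved before a deferred tool is used, so none of these numbers are billed provider tokens.

```python
import json
import math
from typing import Any, Dict, List

CHARS_PER_TOKEN = 4  # same value the report exposes as chars_per_token


def byte_len_json(value: Any) -> int:
    # Assumption: compact, key-sorted UTF-8 JSON; the real helper may serialize differently.
    return len(json.dumps(value, sort_keys=True, separators=(",", ":")).encode("utf-8"))


def proxy_tokens(num_bytes: int) -> int:
    # Assumption: round up so any non-empty payload counts as at least one token.
    return math.ceil(num_bytes / CHARS_PER_TOKEN)


def defer_token_proxy(
    ranked_schemas: List[Dict[str, Any]],
    core_top: int,
    listed_top: int,
    core_stubs: List[Dict[str, Any]],
    deferred_stubs: List[Dict[str, Any]],
) -> Dict[str, int]:
    """Hypothetical re-derivation of the defer-report token_proxy fields."""

    def schema_bytes(schemas: List[Dict[str, Any]]) -> int:
        return sum(byte_len_json(s) for s in schemas)

    core = ranked_schemas[:core_top]      # top-ranked core tools
    deferred = ranked_schemas[core_top:]  # every tool outside the core set
    listed = deferred[:listed_top]        # deferred tools actually named in the report

    all_tokens = proxy_tokens(schema_bytes(ranked_schemas))
    stub_tokens = proxy_tokens(byte_len_json(core_stubs) + byte_len_json(deferred_stubs))
    return {
        "all_schema_tokens_estimated": all_tokens,
        "inline_core_schema_tokens_estimated": proxy_tokens(schema_bytes(core)),
        "listed_deferred_schema_tokens_estimated": proxy_tokens(schema_bytes(listed)),
        "total_deferred_schema_tokens_estimated": proxy_tokens(schema_bytes(deferred)),
        "tool_stub_report_tokens_estimated": stub_tokens,
        # Signed: negative when the stub report is smaller than inlining every schema.
        "net_initial_report_tokens_delta": stub_tokens - all_tokens,
        # Clamped at zero so an oversized stub report never shows a negative "avoided" value.
        "estimated_initial_schema_tokens_avoided": max(0, all_tokens - stub_tokens),
    }
```

This sketch satisfies the patch's test-side invariants by construction:

- The delta equals stub tokens minus all-schema tokens.
- The avoided figure is the clamped difference.
- Total deferred tokens are at least listed deferred tokens.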
diff --git a/tests/test_context_guard_kit.py b/tests/test_context_guard_kit.py
index 0a4141c..c5c8335 100644
--- a/tests/test_context_guard_kit.py
+++ b/tests/test_context_guard_kit.py
@@ -10321,7 +10321,19 @@ def test_cache_score_reports_static_prefix_and_claim_boundary(self):
         prompt = stable + "\nrequest_id: 123e4567-e89b-12d3-a456-426614174000\nuser: fix CI"
         for script in CACHE_SCORE_SCRIPTS:
             with self.subTest(script=script):
-                proc = self._run_cache_score(script, "--provider", "openai", "--json", input_data=prompt)
+                proc = self._run_cache_score(
+                    script,
+                    "--provider",
+                    "openai",
+                    "--expected-reuses",
+                    "3",
+                    "--cache-write-multiplier",
+                    "1.25",
+                    "--cache-read-multiplier",
+                    "0.1",
+                    "--json",
+                    input_data=prompt,
+                )
                 data = json.loads(proc.stdout)
                 self.assertEqual(data["tool"], "context-guard-cache-score")
                 self.assertEqual(data["schema_version"], "contextguard.cache-score.v1")
@@ -10334,10 +10346,44 @@ def test_cache_score_reports_static_prefix_and_claim_boundary(self):
                 self.assertFalse(data["raw_prompt_stored"])
                 self.assertFalse(data["claim_boundary"]["hosted_api_token_or_cost_savings_claim_allowed"])
                 self.assertTrue(data["claim_boundary"]["requires_provider_usage_fields_for_claims"])
+                amortization = data["amortization"]
+                self.assertEqual(amortization["expected_reuses"], 3)
+                self.assertEqual(amortization["expected_reuses_semantics"], "future_cache_reads_after_initial_write")
+                self.assertEqual(amortization["cache_write_multiplier"], 1.25)
+                self.assertEqual(amortization["cache_read_multiplier"], 0.1)
+                self.assertEqual(amortization["break_even_reuses"], 1)
+                self.assertEqual(amortization["status"], "amortizes_with_expected_reuses")
+                self.assertEqual(amortization["risk"], "low")
+                self.assertAlmostEqual(amortization["expected_uncached_relative_cost"], 4.0)
+                self.assertAlmostEqual(amortization["expected_cached_relative_cost"], 1.55)
+                self.assertAlmostEqual(amortization["expected_relative_savings"], 2.45)
+                self.assertTrue(amortization["user_supplied_multipliers"])
+                self.assertFalse(amortization["claim_boundary"]["hosted_api_token_or_cost_savings_claim_allowed"])
                 warning_codes = {item["code"] for item in data["warnings"]}
                 self.assertIn("dynamic_marker_in_prompt", warning_codes)
                 self.assertNotIn(stable[:80], proc.stdout)
 
+                premium_proc = self._run_cache_score(
+                    script,
+                    "--provider",
+                    "openai",
+                    "--expected-reuses",
+                    "1",
+                    "--cache-write-multiplier",
+                    "0.5",
+                    "--cache-read-multiplier",
+                    "2",
+                    "--json",
+                    input_data=prompt,
+                )
+                premium = json.loads(premium_proc.stdout)["amortization"]
+                self.assertEqual(premium["status"], "no_read_discount")
+                self.assertEqual(premium["risk"], "high")
+                self.assertIsNone(premium["break_even_reuses"])
+                self.assertAlmostEqual(premium["expected_uncached_relative_cost"], 2.0)
+                self.assertAlmostEqual(premium["expected_cached_relative_cost"], 2.5)
+                self.assertLess(premium["expected_relative_savings"], 0)
+
     def test_cache_score_json_order_provider_thresholds_and_help(self):
         request = {
             "tools": [
@@ -10367,6 +10413,8 @@ def test_cache_score_json_order_provider_thresholds_and_help(self):
                 self.assertIn("json_object_key_order_not_sorted", codes)
                 self.assertIn("tool_order_not_sorted", codes)
                 self.assertIn("anthropic_cache_control_not_detected", codes)
+                self.assertEqual(data["amortization"]["status"], "not_cacheable")
+                self.assertFalse(data["amortization"]["user_supplied_multipliers"])
                 warning_paths = {item.get("path") for item in data["warnings"]}
                 self.assertIn("$.[redacted-key]", warning_paths)
                 self.assertNotIn("$.timestamp", warning_paths)
@@ -10399,6 +10447,12 @@ def test_cache_score_rejects_symlink_and_oversized_input(self):
                     oversized = self._run_cache_score(script, "--max-input-bytes", "5", input_data="0123456789", check=False)
                     self.assertNotEqual(oversized.returncode, 0)
                     self.assertIn("max-input-bytes", oversized.stderr)
+                    bad_reuses = self._run_cache_score(script, "--expected-reuses", "-1", input_data="stable", check=False)
+                    self.assertNotEqual(bad_reuses.returncode, 0)
+                    self.assertIn("expected-reuses", bad_reuses.stderr)
+                    bad_multiplier = self._run_cache_score(script, "--cache-read-multiplier", "NaN", input_data="stable", check=False)
+                    self.assertNotEqual(bad_multiplier.returncode, 0)
+                    self.assertIn("cache-read-multiplier", bad_multiplier.stderr)
 
 
     def _run_tool_prune(self, script: Path, cwd: Path, *args: str, input_data: str | None = None, check: bool = True) -> subprocess.CompletedProcess[str]:
@@ -10514,6 +10568,8 @@ def test_tool_prune_defer_report_splits_core_deferred_and_preserves_receipt(self
                     self.assertFalse(data["native_provider_integration"])
                     self.assertFalse(data["claim_boundary"]["native_provider_integration"])
                     self.assertFalse(data["claim_boundary"]["hosted_api_token_or_cost_savings_claim_allowed"])
+                    self.assertTrue(data["claim_boundary"]["deferred_schema_retrieval_required_before_use"])
+                    self.assertTrue(data["deferred_schema_retrieval_required_before_use"])
                     self.assertEqual(len(data["core_tools"]), 1)
                     self.assertEqual(len(data["deferred_tools"]), 2)
                     self.assertFalse(data["core_tools"][0]["schema_included"])
@@ -10523,6 +10579,31 @@ def test_tool_prune_defer_report_splits_core_deferred_and_preserves_receipt(self
                     self.assertEqual(data["token_proxy"]["chars_per_token"], 4)
                     self.assertIn("tool_stub_report_bytes", data["token_proxy"])
                     self.assertNotIn("inline_report_bytes", data["token_proxy"])
+                    self.assertIn("inline_core_schema_bytes", data["token_proxy"])
+                    self.assertIn("listed_deferred_schema_bytes", data["token_proxy"])
+                    self.assertIn("total_deferred_schema_bytes", data["token_proxy"])
+                    self.assertIn("gross_listed_deferred_schema_tokens_avoided", data["token_proxy"])
+                    self.assertIn("gross_total_deferred_schema_tokens_avoided", data["token_proxy"])
+                    self.assertIn("net_initial_report_tokens_delta", data["token_proxy"])
+                    self.assertIn("estimated_initial_schema_tokens_avoided", data["token_proxy"])
+                    self.assertEqual(
+                        data["token_proxy"]["net_initial_report_tokens_delta"],
+                        data["token_proxy"]["tool_stub_report_tokens_estimated"]
+                        - data["token_proxy"]["all_schema_tokens_estimated"],
+                    )
+                    self.assertEqual(
+                        data["token_proxy"]["estimated_initial_schema_tokens_avoided"],
+                        max(
+                            0,
+                            data["token_proxy"]["all_schema_tokens_estimated"]
+                            - data["token_proxy"]["tool_stub_report_tokens_estimated"],
+                        ),
+                    )
+                    self.assertGreater(data["token_proxy"]["listed_deferred_schema_tokens_estimated"], 0)
+                    self.assertGreaterEqual(
+                        data["token_proxy"]["total_deferred_schema_tokens_estimated"],
+                        data["token_proxy"]["listed_deferred_schema_tokens_estimated"],
+                    )
                     self.assertIn("proxy_only_not_provider_billed_tokens", data["token_proxy"]["claim_boundary"])
                     self.assertEqual(data["listed_deferred_count"], 2)
                     self.assertEqual(data["total_deferred_count"], 2)
@@ -23757,6 +23838,15 @@ def test_benchmark_report_does_not_claim_shifted_cost_when_cost_unmeasured(self)
                 )
                 self.assertEqual(report["claim_status"], "token_savings_observed_cost_unmeasured")
                 self.assertIsNone(report["comparisons"][0]["cost_savings_pct_with_shift"])
+                baseline = report["measurement_baseline"]
+                self.assertEqual(baseline["schema_version"], "contextguard.bench.measurement-baseline.v1")
+                self.assertTrue(baseline["csv_schema_unchanged"])
+                self.assertIn("total_cost_with_shift_usd", baseline["csv_columns"])
+                self.assertIn("primary_token_buckets", baseline["captured_fields"])
+                self.assertIn("primary_tokens_measured", baseline["captured_fields"]["primary_token_buckets"])
+                self.assertIn("repo_revision", baseline["missing_future_run_identity_fields"])
+                self.assertFalse(baseline["claim_boundary"]["enables_savings_claims_by_itself"])
+                self.assertTrue(baseline["claim_boundary"]["requires_matched_successful_tasks"])
 
     def test_benchmark_report_treats_missing_external_cost_as_unmeasured(self):
         for index, script in enumerate(BENCH_SCRIPTS):