TheGreenCedar · Jun 11, 2026
diff --git a/.agents/skills/codestory-grounding/SKILL.md b/.agents/skills/codestory-grounding/SKILL.md
@@ -61,6 +61,13 @@ checkout is only the tool artifact unless the user is editing CodeStory itself.
   failed, treat product retrieval as unavailable until `retrieval_mode=full` is
   restored. Repo-text output is diagnostic only; do not use it as a substitute
   for mandatory sidecar evidence.
+- Under `graph_first_v1`, `retrieval_mode=full` means graph and lexical sidecars
+  are complete, generated `symbol_search_doc` and component-report virtual docs
+  are current, and Qdrant must be complete only for the selected dense anchors.
+  A zero dense-anchor manifest is valid only when reported explicitly; otherwise
+  a Qdrant mismatch or unavailability fails closed. Search evidence should name
+  provenance such as `exact`, `lexical_source`, `symbol_doc`, `graph_neighbor`,
+  `component_report`, or `dense_anchor`.
 
 ## Command Routing
 

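The `graph_first_v1` readiness rule added to SKILL.md above can be read as a small decision function. The following is a minimal sketch of that reading, not CodeStory's implementation: the type and field names are hypothetical, the fail-closed states are collapsed into one variant, and treating "complete" as an exact match between Qdrant points and manifest anchors is an assumption.

```rust
/// Hypothetical snapshot of sidecar health. Field names are illustrative
/// and do not mirror CodeStory's real status schema.
#[derive(Debug, Clone, Copy)]
struct SidecarHealth {
    graph_complete: bool,
    lexical_complete: bool,
    /// `symbol_search_doc` and component-report virtual docs are current.
    virtual_docs_current: bool,
    /// Dense-anchor count from the manifest; `None` means it was not reported.
    manifest_dense_anchors: Option<u64>,
    /// Qdrant point count; `None` means Qdrant is unavailable.
    qdrant_points: Option<u64>,
}

/// Simplified outcome: every non-full state is collapsed into `FailClosed`.
#[derive(Debug, PartialEq, Eq)]
enum RetrievalMode {
    Full,
    FailClosed,
}

fn retrieval_mode(h: SidecarHealth) -> RetrievalMode {
    // Graph, lexical, and virtual-doc artifacts are always required.
    if !(h.graph_complete && h.lexical_complete && h.virtual_docs_current) {
        return RetrievalMode::FailClosed;
    }
    match (h.manifest_dense_anchors, h.qdrant_points) {
        // An explicitly reported zero-anchor manifest does not need Qdrant.
        (Some(0), _) => RetrievalMode::Full,
        // Assumption: "complete" means the counts match exactly.
        (Some(expected), Some(actual)) if expected == actual => RetrievalMode::Full,
        // Unreported anchors, a count mismatch, or missing Qdrant fail closed.
        _ => RetrievalMode::FailClosed,
    }
}
```

Note that an unreported anchor count (`None`) never yields `Full`, even when Qdrant is healthy, which matches the "valid only when reported explicitly" wording.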
diff --git a/.agents/skills/codestory-grounding/references/doctor.md b/.agents/skills/codestory-grounding/references/doctor.md
@@ -22,7 +22,7 @@ Reads project/cache/index/retrieval health without mutating the index. Use it at
 | Path | Command | Expected result |
 |------|---------|-----------------|
 | Normal path | `<codestory-cli> doctor --project <target-workspace>` | Reports project root, cache path, indexed stats, retrieval state, sidecar embedding setup, environment hints, and next commands. |
-| Failure path | If cache or index checks warn, run `index --project <target-workspace> --refresh full`; if mandatory sidecars are missing or stale, run the setup/index commands surfaced by `doctor`; if semantic reports `semantic partial`, `semantic stale`, or `semantic failed`, rebuild before trusting broad packet/search evidence. | Separates missing index, stale semantic docs, partial semantic docs, and mandatory retrieval setup failures. |
+| Failure path | If cache or index checks warn, run `index --project <target-workspace> --refresh full`; if mandatory sidecars are missing or stale, run the setup/index commands surfaced by `doctor`; if symbol docs, dense anchors, policy version, Qdrant counts, or semantic health report partial/stale/failed state, rebuild before trusting broad packet/search evidence. | Separates missing index, stale symbol docs, partial dense anchors, and mandatory retrieval setup failures. |
 | Integration edge | Use doctor before `ground`, `search --why`, `explore`, `context`, or `serve`; its next commands are the safe follow-up loop. | Prevents read commands from silently querying the wrong or empty cache. |
 
 ## Notes
@@ -31,5 +31,5 @@ Reads project/cache/index/retrieval health without mutating the index. Use it at
 - The `attention:` block repeats warnings first so agents do not miss semantic partial/stale/failure messages buried in the full check list.
 - Environment rows report retrieval-related variables such as `CODESTORY_EMBED_BACKEND`, `CODESTORY_EMBED_LLAMACPP_URL`, and sidecar enablement flags.
 - The embedding checks distinguish product llama.cpp sidecar state from hash, ONNX, disabled, or stale diagnostic states.
-- Treat `semantic ok` plus `retrieval_mode=full` as the health state suitable for broad repository explanation prompts. Treat `semantic partial`, `semantic stale`, `semantic failed`, and non-`full` retrieval modes as instructions to repair setup or rebuild before trusting agent-facing evidence.
+- Treat `semantic ok` plus `retrieval_mode=full` as the health state suitable for broad repository explanation prompts. Under `graph_first_v1`, `full` may explicitly skip Qdrant only when dense-anchor count is zero and graph/lexical artifacts are current. Treat `semantic partial`, `semantic stale`, `semantic failed`, Qdrant count mismatch, and non-`full` retrieval modes as instructions to repair setup or rebuild before trusting agent-facing evidence.
 - Prefer JSON for CI or doc-contract checks.
diff --git a/.agents/skills/codestory-grounding/references/index.md b/.agents/skills/codestory-grounding/references/index.md
@@ -1,7 +1,8 @@
 # `index` - Build or Refresh the Symbol Index
 
 Discovers project files, extracts symbols and edges, persists graph/search state
-to SQLite, and synchronizes semantic docs when embedding assets are available.
+to SQLite, writes graph-native symbol docs and component reports, and
+synchronizes selected dense anchors when embedding assets are available.
 
 ## Usage
 
@@ -15,7 +16,7 @@ to SQLite, and synchronizes semantic docs when embedding assets are available.
 |--------|---------|-----|
 | `--project <path>` / `--path <path>` | `.` | Target repository root. Always pass this explicitly. |
 | `--cache-dir <path>` | auto | Override the per-project cache root. |
-| `--refresh <auto|full|incremental|none>` | `auto` | Choose the graph/snapshot/semantic refresh mode. |
+| `--refresh <auto|full|incremental|none>` | `auto` | Choose the graph/snapshot/symbol-doc/dense-anchor refresh mode. |
 | `--format <markdown|json>` | `markdown` | Use JSON for automation and timing analysis. |
 | `--output-file <path>` | stdout | Write output to a file with an existing parent directory. |
 | `--dry-run` | off | Show workspace discovery and planned adds/removals without writing storage. |
@@ -28,19 +29,21 @@ to SQLite, and synchronizes semantic docs when embedding assets are available.
 | Mode | Behavior |
 |------|----------|
 | `auto` | Use `full` for an empty cache and `incremental` otherwise. |
-| `full` | Rebuild the project graph and semantic docs from the discovered workspace. |
-| `incremental` | Reindex changed/new/unindexed files, remove disappeared files, and prune touched semantic docs. |
+| `full` | Rebuild the project graph, symbol docs, component reports, and dense anchors from the discovered workspace. |
+| `incremental` | Reindex changed/new/unindexed files, remove disappeared files, and prune touched symbol docs or dense anchors. |
 | `none` | Inspect the existing cache without refreshing it. Use only after a known-good same-session index. |
 
 Use `--refresh full` for first-time indexes, cache/schema uncertainty, and fixes
 for historical indexing failures. Incremental runs can leave stale error rows
 when previously failing files are not touched.
 
-## Semantic Retrieval
+## Symbol Docs And Dense Anchors
 
-There is no `index --semantic off` flag. Semantic docs are part of the default
-index contract when embedding assets are ready. On a fresh machine, check the
-setup plan first:
+There is no `index --semantic off` flag. Graph-native `symbol_search_doc` rows
+are part of the default index contract. Under `graph_first_v1`, dense vectors
+are only written for selected anchors such as entrypoints, public APIs,
+documented nontrivial symbols, central graph nodes, component reports, and
+unstructured docs. On a fresh machine, check the setup plan first:
 
 ```text
 <codestory-cli> setup embeddings --project <target-workspace> --dry-run --format json
@@ -53,25 +56,27 @@ High-signal environment toggles:
 
 | Variable | Use |
 |----------|-----|
-| `CODESTORY_SEMANTIC_DOC_SCOPE=all` | Include all-symbol semantic docs. Accepted all-symbol aliases are `all`, `full`, `all-symbols`, and `all_symbols`; omitted or other values default to durable symbols. |
+| `CODESTORY_SEMANTIC_DOC_SCOPE=all` | Include the broader all-symbol symbol-doc scope for diagnostics. Accepted aliases are `all`, `full`, `all-symbols`, and `all_symbols`; omitted or other values default to durable symbols. |
 | `CODESTORY_EMBED_BACKEND=llamacpp` | Use the mandatory local llama.cpp embedding sidecar. |
 | `CODESTORY_EMBED_LLAMACPP_URL=http://127.0.0.1:8080/v1/embeddings` | Product embedding endpoint for bge-base sidecar vectors. |
 | `CODESTORY_SUMMARY_ENDPOINT=local` | Enable deterministic local summaries with `--summarize`. |
 
 Use other embedding, alias, batch-size, tokenizer, provider, hash, ONNX, and
 summary tuning variables only for focused diagnostics or historical comparisons.
 Agent packet/search readiness requires retrieval status to report
-`retrieval_mode=full`.
+`retrieval_mode=full`. A zero dense-anchor corpus is valid only when the
+manifest reports it explicitly; otherwise stale or unavailable Qdrant state
+fails closed.
 
 ## Output
 
 Markdown returns a compact index summary. JSON exposes the same data for tools:
 
 - project and storage path
 - refresh mode and discovered file/error counts
-- local navigation readiness notes and semantic doc counts
+- local navigation readiness notes, symbol-doc counts, dense-anchor counts, and policy reason counts
 - parse, flush, resolve, cleanup, cache, and semantic timing buckets
-- resolution counters and semantic reuse/embed/prune counts
+- resolution counters plus symbol-doc write and dense-anchor reuse/embed/skip/prune counts
 
 Important timing fields are `timings_ms.parse`, `timings_ms.flush`,
 `timings_ms.resolve`, `timings_ms.cleanup`, `cache_ms.search_index`,

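Two rules in the index.md changes above are compact enough to sketch: how `--refresh auto` resolves, and how `CODESTORY_SEMANTIC_DOC_SCOPE` values map to a scope. The enum and function names below are hypothetical, and exact case-sensitive matching of the aliases is an assumption; only the alias strings and the default-to-durable behavior come from the doc text.

```rust
/// Hypothetical mirror of the `--refresh` modes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Refresh {
    Auto,
    Full,
    Incremental,
    NoRefresh, // stands in for `--refresh none`
}

/// `auto` becomes `full` on an empty cache and `incremental` otherwise;
/// explicit modes pass through unchanged.
fn resolve_refresh(requested: Refresh, cache_is_empty: bool) -> Refresh {
    match requested {
        Refresh::Auto if cache_is_empty => Refresh::Full,
        Refresh::Auto => Refresh::Incremental,
        explicit => explicit,
    }
}

/// Hypothetical symbol-doc scope selector.
#[derive(Debug, PartialEq, Eq)]
enum DocScope {
    Durable,
    AllSymbols,
}

/// Maps the raw variable value to a scope. Omitted or unrecognized
/// values fall back to durable symbols.
fn doc_scope(raw: Option<&str>) -> DocScope {
    match raw {
        Some("all" | "full" | "all-symbols" | "all_symbols") => DocScope::AllSymbols,
        _ => DocScope::Durable,
    }
}
```

A caller could feed the environment in with `doc_scope(std::env::var("CODESTORY_SEMANTIC_DOC_SCOPE").ok().as_deref())`.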
diff --git a/.agents/skills/codestory-grounding/references/retrieval-rollout.md b/.agents/skills/codestory-grounding/references/retrieval-rollout.md
@@ -10,12 +10,30 @@ trustworthy; running retrieval alone is not enough.
 | Rollout layer | Trustworthy proof | Run when | Does not prove |
 | --- | --- | --- | --- |
 | Indexer coverage | `cargo test -p codestory-indexer --test fidelity_regression`; `cargo test -p codestory-indexer --test tictactoe_language_coverage`; targeted `files` or `affected` checks for changed paths | Parser, tree-sitter, semantic-resolution, symbol, edge, file-role, or coverage changes | Sidecar readiness, runtime packet behavior, or CLI search contract |
-| Retrieval sidecar crate | `cargo test -p codestory-retrieval`; then live `retrieval bootstrap`, `retrieval index --project <repo> --refresh full`, and `retrieval status --project <repo> --format json` reporting `retrieval_mode="full"` | Zoekt, Qdrant, SCIP, manifest generation, sidecar status, embedding backend/dim, or Qdrant client changes | Runtime admission, stdio cache invalidation, or full CLI output shape |
+| Retrieval sidecar crate | `cargo test -p codestory-retrieval`; then live `retrieval bootstrap`, `retrieval index --project <repo> --refresh full`, and `retrieval status --project <repo> --format json` reporting `retrieval_mode="full"` plus current `symbol_doc_count`, `dense_projection_count`, `semantic_policy_version`, `graph_artifact_hash`, and dense reason counts | Zoekt, Qdrant, SCIP, manifest generation, sidecar status, symbol-doc virtual docs, dense-anchor policy, embedding backend/dim, or Qdrant client changes | Runtime admission, stdio cache invalidation, or full CLI output shape |
 | Runtime integration | `cargo test -p codestory-runtime --lib`; `cargo test -p codestory-runtime --test retrieval_generalization_guard`; `cargo test -p codestory-runtime --test retrieval_eval`; set `CODESTORY_RETRIEVAL_EVAL_FULL_TESTS=1` only after real sidecars are prepared | Packet/search orchestration, fail-closed modes, retrieval shadow traces, rollback-warning logic, or runtime use of sidecar results | CLI argument/output behavior or GitHub smoke workflow behavior |
 | CLI surface | `cargo test -p codestory-cli --test retrieval_bootstrap_contracts`; `cargo test -p codestory-cli --test stdio_protocol_contracts`; `cargo test -p codestory-cli --test search_json_output`; with real sidecars, run the ignored full-mode search JSON test explicitly | `retrieval bootstrap/status/index` contracts, stdio protocol/cache fingerprints, fail-closed search JSON, or user-facing command shape | Full product readiness unless `retrieval status` is `full` after live sidecar indexing |
-| Benchmark harness | `cargo check -p codestory-bench --benches`; the relevant Criterion bench only when it isolates the hot path; release e2e stats for real-repo timing | New benchmark code, latency/timing claims, rollback baseline updates, or performance-sensitive retrieval/index changes | Promotion by itself; synthetic or narrow benches are scouts until real-repo evidence exists |
+| Benchmark harness | `cargo check -p codestory-bench --benches`; the relevant Criterion bench only when it isolates the hot path; release e2e stats for real-repo timing; for AST-first retrieval, include same-run baseline/candidate rows for cold total index time, `semantic_embedding_ms`, dense doc count reduction, repeat refresh embedded-doc count, holdout MRR@10/Hit@10/exact-symbol Hit@1, packet lazy-search source reads, and peak descendant working set | New benchmark code, latency/timing claims, rollback baseline updates, dense-policy changes, or performance-sensitive retrieval/index changes | Promotion by itself; synthetic or narrow benches are scouts until real-repo evidence exists |
 | Smoke CI | `.github/workflows/retrieval-sidecar-smoke.yml` plus `docs/contributors/retrieval-sidecar-smoke-ci.md` pass criteria | PRs touching retrieval crate, runtime/stdio/search wiring, indexer retrieval hooks, retrieval docs, scripts, Docker sidecar config, or the workflow | Full sidecar readiness. CI smoke uses `--skip-compose --wait-secs 0` and proves manifest-missing fail-closed shape only |
 
+## Agent-Grounding Release Gates
+
+Use the highest completed tier as the only claim level in docs, PRs, or final
+handoffs:
+
+| Tier | Required evidence | Claim boundary |
+| --- | --- | --- |
+| CodeStory self-e2e | Generalization lint, targeted runtime/indexer tests, release CLI build, `doctor`, and repo-scale e2e stats | This branch still works on CodeStory and product Rust has no banned holdout literals |
+| Local-real drill suite | Self-e2e plus local-real packet/drill rows without skip allowances | Product tuning survived realistic local repos |
+| Holdout-retrieval drill suite | Local-real plus materialized holdout-retrieval rows, required recall/quality thresholds, and forbidden-claim checks with no skip allowances | Retrieval behavior is generalized for the public holdout suite |
+| Promotion-grade paired benchmark | Holdout plus repeated CodeStory/no-CodeStory rows, timing/cost accounting, answer-quality ledger classifications, and packet-first source-read avoidance checks | Useful-for-agents, speed, or savings claims |
+
+Packet statuses (`sufficient`, `partial`, `blocked`) describe evidence coverage
+only. Final answer quality is promoted only by `drill`/`drill-suite` ledger
+classifications. Holdout literals belong in manifests, tests, benchmark
+harnesses, or the `CODESTORY_EVAL_PROBES` eval module, not production
+planner/ranker/runtime code.
+
 ## CI Smoke Triage
 
 The Windows `retrieval-sidecar-smoke` workflow is intentionally reduced. It
@@ -38,9 +56,10 @@ evidence is trustworthy only after live sidecars are indexed and status is full.
 | Symptom | Likely layer | Action |
 | --- | --- | --- |
 | `retrieval_manifest_missing` | Bootstrap/state exists but no project manifest was finalized | In CI smoke this is expected. For product proof, run live `retrieval index --refresh full` and recheck status |
-| `sidecar_manifest_stale`, input-hash drift, or embedding-backend drift | Source, SQLite projection, semantic docs, backend, dimension, or schema changed after the manifest | Rerun `retrieval index --refresh full`; `--refresh auto` may repair stale stored semantic-doc contracts once, but explicit failures still fail closed |
-| `no_semantic`, `lexical_only`, or `unavailable` with Qdrant errors | Qdrant, embedding endpoint, or semantic smoke failed | Run bootstrap, confirm ports `6333`/`6334` and the embedding endpoint, then rebuild sidecar indexes |
-| Qdrant collection exists but point count is below the semantic-doc projection count, is one-point, or has a stub marker | Partial or obsolete collection | Rerun `retrieval index`; do not bless semantic smoke alone as full readiness |
+| `sidecar_manifest_stale`, input-hash drift, policy-version drift, graph-artifact-hash drift, dense-reason drift, or embedding-backend drift | Source, SQLite projection, `symbol_search_doc`, dense anchors, backend, dimension, policy, or schema changed after the manifest | Rerun `retrieval index --refresh full`; `--refresh auto` may repair stale stored symbol-doc or dense-anchor contracts once, but explicit failures still fail closed |
+| `no_semantic`, `lexical_only`, or `unavailable` with Qdrant errors while dense anchors are expected | Qdrant, embedding endpoint, or semantic smoke failed | Run bootstrap, confirm ports `6333`/`6334` and the embedding endpoint, then rebuild sidecar indexes |
+| Qdrant skipped while manifest dense-anchor count is `0` | Expected `graph_first_v1` graph/lexical full mode | Verify Zoekt and SCIP are healthy and manifest symbol-doc count, policy version, graph hash, and dense reason counts match |
+| Qdrant collection exists but point count is below the dense-anchor projection count, is one-point, or has a stub marker | Partial or obsolete collection | Rerun `retrieval index`; do not bless semantic smoke alone as full readiness |
 | Qdrant response lacks `result.points[]` | Qdrant client/API contract drift or wrong image | Verify the pinned Qdrant image and update the client/test contract deliberately |
 | `storage_repair.scan_errors` appears during bootstrap | Cache protection scan was incomplete | Resolve unreadable cache roots or DBs before relying on retention pruning; do not treat suppressed pruning as readiness proof |
 
@@ -55,8 +74,8 @@ cargo test -p codestory-cli --test codestory_repo_e2e_stats -- --ignored --nocap
 ```
 
 This log is especially mandatory for retrieval rollout changes that affect
-default indexing, semantic-doc persistence or reuse, sidecar indexing/status,
-packet/search behavior, runtime grounding surfaces, CLI command shape, or any
-performance/timing claim. A stats-only row with
+default indexing, symbol-doc persistence, dense-anchor persistence or reuse,
+sidecar indexing/status, packet/search behavior, runtime grounding surfaces, CLI
+command shape, or any performance/timing claim. A stats-only row with
 `CODESTORY_ALLOW_SKIP_REAL_REPO_DRILL_CASES=1` can record local timing, but it
 is not real-drill release evidence.
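
The release-gate table added in the retrieval-rollout.md hunk above defines cumulative tiers, each requiring the one before it, and asks that only the highest completed tier be claimed. A minimal sketch of that rule follows; the `Tier` enum and `claim_level` function are hypothetical names for illustration, not part of CodeStory.

```rust
/// Hypothetical ordering of the four release-gate tiers, lowest first.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Tier {
    SelfE2e,
    LocalReal,
    HoldoutRetrieval,
    PromotionPaired,
}

const TIERS: [Tier; 4] = [
    Tier::SelfE2e,
    Tier::LocalReal,
    Tier::HoldoutRetrieval,
    Tier::PromotionPaired,
];

/// `completed[i]` records whether the evidence specific to tier `i` exists.
/// Because tiers are cumulative, the walk stops at the first gap: evidence
/// for a higher tier does not count when a lower tier is missing.
fn claim_level(completed: &[bool; 4]) -> Option<Tier> {
    let mut highest = None;
    for (tier, done) in TIERS.iter().zip(completed.iter()) {
        if !*done {
            break;
        }
        highest = Some(*tier);
    }
    highest
}
```

With holdout evidence missing, paired-benchmark rows alone do not raise the claim: `claim_level(&[true, true, false, true])` yields `Some(Tier::LocalReal)`.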
diff --git a/Cargo.lock b/Cargo.lock
diff --git a/Cargo.toml b/Cargo.toml
@@ -34,6 +34,10 @@ tree-sitter-go = "0.23.4"
 tree-sitter-ruby = "0.23.1"
 tree-sitter-php = "0.23.11"
 tree-sitter-c-sharp = "=0.23.0"
+tree-sitter-kotlin-ng = "1.1.0"
+tree-sitter-swift = "0.7.0"
+tree-sitter-dart-orchard = "0.3.2"
+tree-sitter-bash = "0.23.3"
 
 # Semantic Analysis
 tree-sitter-graph = "0.12"