Closed
7 changes: 7 additions & 0 deletions .agents/skills/codestory-grounding/SKILL.md
@@ -61,6 +61,13 @@ checkout is only the tool artifact unless the user is editing CodeStory itself.
failed, treat product retrieval as unavailable until `retrieval_mode=full` is
restored. Repo-text output is diagnostic only; do not use it as a substitute
for mandatory sidecar evidence.
- Under `graph_first_v1`, `retrieval_mode=full` means graph and lexical sidecars
are complete, generated `symbol_search_doc` and component-report virtual docs
are current, and Qdrant is complete only for selected dense anchors. A zero
dense-anchor manifest is valid only when reported explicitly; otherwise
Qdrant mismatch or unavailability is fail-closed. Search evidence should name
provenance such as `exact`, `lexical_source`, `symbol_doc`, `graph_neighbor`,
`component_report`, or `dense_anchor`.
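The `graph_first_v1` readiness rule above can be read as a fail-closed predicate. The following is a minimal Rust sketch, assuming a hypothetical `SidecarState` snapshot. The struct, its fields, and the reason strings are illustrative and are not CodeStory's manifest schema or API.

```rust
/// Hypothetical snapshot of sidecar state; field names are illustrative,
/// not CodeStory's actual manifest schema.
#[derive(Debug, Clone, Copy)]
struct SidecarState {
    graph_complete: bool,
    lexical_complete: bool,
    symbol_docs_current: bool,
    component_reports_current: bool,
    /// Dense-anchor count declared by the manifest; `None` if it is silent.
    manifest_dense_anchors: Option<u64>,
    /// Points actually present in Qdrant; `None` if Qdrant is unreachable.
    qdrant_points: Option<u64>,
}

#[derive(Debug, PartialEq, Eq)]
enum Readiness {
    Full,
    FailClosed(&'static str),
}

fn graph_first_readiness(s: &SidecarState) -> Readiness {
    if !(s.graph_complete && s.lexical_complete) {
        return Readiness::FailClosed("graph or lexical sidecar incomplete");
    }
    if !(s.symbol_docs_current && s.component_reports_current) {
        return Readiness::FailClosed("symbol docs or component reports stale");
    }
    match (s.manifest_dense_anchors, s.qdrant_points) {
        // An explicitly reported zero-anchor manifest lets Qdrant be skipped.
        (Some(0), _) => Readiness::Full,
        // Otherwise Qdrant must hold exactly the selected dense anchors.
        (Some(expected), Some(actual)) if actual == expected => Readiness::Full,
        (Some(_), Some(_)) => Readiness::FailClosed("qdrant count mismatch"),
        (Some(_), None) => Readiness::FailClosed("qdrant unavailable"),
        // A missing dense-anchor count is not an explicit zero: fail closed.
        (None, _) => Readiness::FailClosed("dense-anchor count not reported"),
    }
}
```

The design choice worth noting is that only `Some(0)` unlocks the Qdrant skip; an absent count is treated as unknown rather than empty.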

## Command Routing

4 changes: 2 additions & 2 deletions .agents/skills/codestory-grounding/references/doctor.md
@@ -22,7 +22,7 @@ Reads project/cache/index/retrieval health without mutating the index. Use it at
| Path | Command | Expected result |
|------|---------|-----------------|
| Normal path | `<codestory-cli> doctor --project <target-workspace>` | Reports project root, cache path, indexed stats, retrieval state, sidecar embedding setup, environment hints, and next commands. |
| Failure path | If cache or index checks warn, run `index --project <target-workspace> --refresh full`; if mandatory sidecars are missing or stale, run the setup/index commands surfaced by `doctor`; if semantic reports `semantic partial`, `semantic stale`, or `semantic failed`, rebuild before trusting broad packet/search evidence. | Separates missing index, stale semantic docs, partial semantic docs, and mandatory retrieval setup failures. |
| Failure path | If cache or index checks warn, run `index --project <target-workspace> --refresh full`; if mandatory sidecars are missing or stale, run the setup/index commands surfaced by `doctor`; if symbol docs, dense anchors, policy version, Qdrant counts, or semantic health report partial/stale/failed state, rebuild before trusting broad packet/search evidence. | Separates missing index, stale symbol docs, partial dense anchors, and mandatory retrieval setup failures. |
| Integration edge | Use doctor before `ground`, `search --why`, `explore`, `context`, or `serve`; its next commands are the safe follow-up loop. | Prevents read commands from silently querying the wrong or empty cache. |

## Notes
@@ -31,5 +31,5 @@ Reads project/cache/index/retrieval health without mutating the index. Use it at
- The `attention:` block repeats warnings first so agents do not miss semantic partial/stale/failure messages buried in the full check list.
- Environment rows report retrieval-related variables such as `CODESTORY_EMBED_BACKEND`, `CODESTORY_EMBED_LLAMACPP_URL`, and sidecar enablement flags.
- The embedding checks distinguish product llama.cpp sidecar state from hash, ONNX, disabled, or stale diagnostic states.
- Treat `semantic ok` plus `retrieval_mode=full` as the health state suitable for broad repository explanation prompts. Treat `semantic partial`, `semantic stale`, `semantic failed`, and non-`full` retrieval modes as instructions to repair setup or rebuild before trusting agent-facing evidence.
- Treat `semantic ok` plus `retrieval_mode=full` as the health state suitable for broad repository explanation prompts. Under `graph_first_v1`, `full` may explicitly skip Qdrant only when dense-anchor count is zero and graph/lexical artifacts are current. Treat `semantic partial`, `semantic stale`, `semantic failed`, Qdrant count mismatch, and non-`full` retrieval modes as instructions to repair setup or rebuild before trusting agent-facing evidence.
- Prefer JSON for CI or doc-contract checks.
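The doctor notes reduce to a small decision: trust agent-facing evidence only when every health signal is clean, otherwise repair or rebuild first. A minimal Rust sketch, assuming a hypothetical `DoctorReport` already parsed from `doctor` output; the type, its fields, and `NextStep` are illustrative and not the CLI's JSON schema.

```rust
/// Mirrors the `semantic ...` labels quoted in the doctor notes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Semantic {
    Ok,
    Partial,
    Stale,
    Failed,
}

/// Hypothetical parsed view of `doctor` output.
#[derive(Debug, Clone, Copy)]
struct DoctorReport<'a> {
    cache_or_index_warn: bool,
    semantic: Semantic,
    retrieval_mode: &'a str,
    qdrant_count_mismatch: bool,
}

#[derive(Debug, PartialEq, Eq)]
enum NextStep {
    TrustEvidence,
    RefreshFullIndex,
    RepairOrRebuild,
}

fn next_step(r: &DoctorReport<'_>) -> NextStep {
    if r.cache_or_index_warn {
        // Cache/index warnings: `index --project <target-workspace> --refresh full`.
        return NextStep::RefreshFullIndex;
    }
    // Only `semantic ok` + `retrieval_mode=full` + matching Qdrant counts is healthy.
    let healthy = r.semantic == Semantic::Ok
        && r.retrieval_mode == "full"
        && !r.qdrant_count_mismatch;
    if healthy {
        NextStep::TrustEvidence
    } else {
        NextStep::RepairOrRebuild
    }
}
```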
29 changes: 17 additions & 12 deletions .agents/skills/codestory-grounding/references/index.md
@@ -1,7 +1,8 @@
# `index` - Build or Refresh the Symbol Index

Discovers project files, extracts symbols and edges, persists graph/search state
to SQLite, and synchronizes semantic docs when embedding assets are available.
to SQLite, writes graph-native symbol docs and component reports, and
synchronizes selected dense anchors when embedding assets are available.

## Usage

@@ -15,7 +16,7 @@ to SQLite, and synchronizes semantic docs when embedding assets are available.
|--------|---------|-----|
| `--project <path>` / `--path <path>` | `.` | Target repository root. Always pass this explicitly. |
| `--cache-dir <path>` | auto | Override the per-project cache root. |
| `--refresh <auto|full|incremental|none>` | `auto` | Choose the graph/snapshot/semantic refresh mode. |
| `--refresh <auto|full|incremental|none>` | `auto` | Choose the graph/snapshot/symbol-doc/dense-anchor refresh mode. |
| `--format <markdown|json>` | `markdown` | Use JSON for automation and timing analysis. |
| `--output-file <path>` | stdout | Write output to a file with an existing parent directory. |
| `--dry-run` | off | Show workspace discovery and planned adds/removals without writing storage. |
@@ -28,19 +29,21 @@ to SQLite, and synchronizes semantic docs when embedding assets are available.
| Mode | Behavior |
|------|----------|
| `auto` | Use `full` for an empty cache and `incremental` otherwise. |
| `full` | Rebuild the project graph and semantic docs from the discovered workspace. |
| `incremental` | Reindex changed/new/unindexed files, remove disappeared files, and prune touched semantic docs. |
| `full` | Rebuild the project graph, symbol docs, component reports, and dense anchors from the discovered workspace. |
| `incremental` | Reindex changed/new/unindexed files, remove disappeared files, and prune touched symbol docs or dense anchors. |
| `none` | Inspect the existing cache without refreshing it. Use only after a known-good same-session index. |

Use `--refresh full` for first-time indexes, cache/schema uncertainty, and fixes
for historical indexing failures. Incremental runs can leave stale error rows
when previously failing files are not touched.
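A minimal Rust sketch of how `--refresh` values could resolve, assuming a hypothetical `Refresh` enum and a caller that already knows whether the cache is empty. Only the mode names and the `auto` rule come from the table above; the parsing code is illustrative, not CodeStory's argument handling.

```rust
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Refresh {
    Auto,
    Full,
    Incremental,
    None, // `--refresh none`: inspect the existing cache only
}

impl FromStr for Refresh {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "auto" => Ok(Refresh::Auto),
            "full" => Ok(Refresh::Full),
            "incremental" => Ok(Refresh::Incremental),
            "none" => Ok(Refresh::None),
            other => Err(format!("unknown refresh mode: {other}")),
        }
    }
}

/// `auto` becomes `full` for an empty cache and `incremental` otherwise;
/// explicit modes pass through unchanged.
fn resolve(mode: Refresh, cache_is_empty: bool) -> Refresh {
    match mode {
        Refresh::Auto if cache_is_empty => Refresh::Full,
        Refresh::Auto => Refresh::Incremental,
        explicit => explicit,
    }
}
```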

## Semantic Retrieval
## Symbol Docs And Dense Anchors

There is no `index --semantic off` flag. Semantic docs are part of the default
index contract when embedding assets are ready. On a fresh machine, check the
setup plan first:
There is no `index --semantic off` flag. Graph-native `symbol_search_doc` rows
are part of the default index contract. Under `graph_first_v1`, dense vectors
are only written for selected anchors such as entrypoints, public APIs,
documented nontrivial symbols, central graph nodes, component reports, and
unstructured docs. On a fresh machine, check the setup plan first:

```text
<codestory-cli> setup embeddings --project <target-workspace> --dry-run --format json
@@ -53,25 +56,27 @@ High-signal environment toggles:

| Variable | Use |
|----------|-----|
| `CODESTORY_SEMANTIC_DOC_SCOPE=all` | Include all-symbol semantic docs. Accepted all-symbol aliases are `all`, `full`, `all-symbols`, and `all_symbols`; omitted or other values default to durable symbols. |
| `CODESTORY_SEMANTIC_DOC_SCOPE=all` | Include the broader all-symbol symbol-doc scope for diagnostics. Accepted aliases are `all`, `full`, `all-symbols`, and `all_symbols`; omitted or other values default to durable symbols. |
| `CODESTORY_EMBED_BACKEND=llamacpp` | Use the mandatory local llama.cpp embedding sidecar. |
| `CODESTORY_EMBED_LLAMACPP_URL=http://127.0.0.1:8080/v1/embeddings` | Product embedding endpoint for bge-base sidecar vectors. |
| `CODESTORY_SUMMARY_ENDPOINT=local` | Enable deterministic local summaries with `--summarize`. |

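The alias handling for `CODESTORY_SEMANTIC_DOC_SCOPE` can be sketched as below. This is a hypothetical Rust illustration, not CodeStory's parser; it assumes exact, case-sensitive matching, which the table does not specify.

```rust
use std::env;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DocScope {
    Durable,
    AllSymbols,
}

/// The four documented aliases select the all-symbol scope; anything else,
/// including an unset variable, falls back to durable symbols.
fn parse_scope(raw: Option<&str>) -> DocScope {
    match raw {
        Some("all" | "full" | "all-symbols" | "all_symbols") => DocScope::AllSymbols,
        _ => DocScope::Durable,
    }
}

fn scope_from_env() -> DocScope {
    parse_scope(env::var("CODESTORY_SEMANTIC_DOC_SCOPE").ok().as_deref())
}
```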
Use other embedding, alias, batch-size, tokenizer, provider, hash, ONNX, and
summary tuning variables only for focused diagnostics or historical comparisons.
Agent packet/search readiness requires retrieval status to report
`retrieval_mode=full`.
`retrieval_mode=full`. A zero dense-anchor corpus is valid only when the
manifest reports it explicitly; otherwise stale or unavailable Qdrant state
fails closed.
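A hypothetical Rust sketch of an anchor-selection policy that records a reason per anchor. The `Candidate` fields and both thresholds (`NONTRIVIAL_BODY_LINES`, `CENTRAL_MIN_DEGREE`) are made-up assumptions; only the reason categories follow the list of anchor kinds in this file. An empty result stands for the zero dense-anchor corpus that must be reported explicitly.

```rust
use std::collections::BTreeMap;

// Hypothetical per-item facts; names and thresholds are illustrative
// assumptions, not CodeStory's real dense-anchor policy.
#[derive(Debug, Clone, Copy)]
struct Candidate {
    is_entrypoint: bool,
    is_public_api: bool,
    has_docs: bool,
    body_lines: u32,
    graph_degree: u32,
    is_component_report: bool,
    is_unstructured_doc: bool,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum DenseReason {
    Entrypoint,
    PublicApi,
    DocumentedNontrivial,
    CentralNode,
    ComponentReport,
    UnstructuredDoc,
}

const NONTRIVIAL_BODY_LINES: u32 = 5; // assumed threshold
const CENTRAL_MIN_DEGREE: u32 = 20; // assumed threshold

/// First matching reason wins; `None` means "symbol doc only, no vector".
fn dense_reason(c: &Candidate) -> Option<DenseReason> {
    if c.is_entrypoint {
        Some(DenseReason::Entrypoint)
    } else if c.is_public_api {
        Some(DenseReason::PublicApi)
    } else if c.has_docs && c.body_lines >= NONTRIVIAL_BODY_LINES {
        Some(DenseReason::DocumentedNontrivial)
    } else if c.graph_degree >= CENTRAL_MIN_DEGREE {
        Some(DenseReason::CentralNode)
    } else if c.is_component_report {
        Some(DenseReason::ComponentReport)
    } else if c.is_unstructured_doc {
        Some(DenseReason::UnstructuredDoc)
    } else {
        None
    }
}

/// Per-reason counts, the kind of breakdown a manifest could record.
fn reason_counts(cands: &[Candidate]) -> BTreeMap<DenseReason, usize> {
    let mut counts = BTreeMap::new();
    for reason in cands.iter().filter_map(dense_reason) {
        *counts.entry(reason).or_insert(0) += 1;
    }
    counts
}
```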

## Output

Markdown returns a compact index summary. JSON exposes the same data for tools:

- project and storage path
- refresh mode and discovered file/error counts
- local navigation readiness notes and semantic doc counts
- local navigation readiness notes, symbol-doc counts, dense-anchor counts, and policy reason counts
- parse, flush, resolve, cleanup, cache, and semantic timing buckets
- resolution counters and semantic reuse/embed/prune counts
- resolution counters plus symbol-doc write and dense-anchor reuse/embed/skip/prune counts

Important timing fields are `timings_ms.parse`, `timings_ms.flush`,
`timings_ms.resolve`, `timings_ms.cleanup`, `cache_ms.search_index`,
35 changes: 27 additions & 8 deletions .agents/skills/codestory-grounding/references/retrieval-rollout.md
@@ -10,12 +10,30 @@ trustworthy; running retrieval alone is not enough.
| Rollout layer | Trustworthy proof | Run when | Does not prove |
| --- | --- | --- | --- |
| Indexer coverage | `cargo test -p codestory-indexer --test fidelity_regression`; `cargo test -p codestory-indexer --test tictactoe_language_coverage`; targeted `files` or `affected` checks for changed paths | Parser, tree-sitter, semantic-resolution, symbol, edge, file-role, or coverage changes | Sidecar readiness, runtime packet behavior, or CLI search contract |
| Retrieval sidecar crate | `cargo test -p codestory-retrieval`; then live `retrieval bootstrap`, `retrieval index --project <repo> --refresh full`, and `retrieval status --project <repo> --format json` reporting `retrieval_mode="full"` | Zoekt, Qdrant, SCIP, manifest generation, sidecar status, embedding backend/dim, or Qdrant client changes | Runtime admission, stdio cache invalidation, or full CLI output shape |
| Retrieval sidecar crate | `cargo test -p codestory-retrieval`; then live `retrieval bootstrap`, `retrieval index --project <repo> --refresh full`, and `retrieval status --project <repo> --format json` reporting `retrieval_mode="full"` plus current `symbol_doc_count`, `dense_projection_count`, `semantic_policy_version`, `graph_artifact_hash`, and dense reason counts | Zoekt, Qdrant, SCIP, manifest generation, sidecar status, symbol-doc virtual docs, dense-anchor policy, embedding backend/dim, or Qdrant client changes | Runtime admission, stdio cache invalidation, or full CLI output shape |
| Runtime integration | `cargo test -p codestory-runtime --lib`; `cargo test -p codestory-runtime --test retrieval_generalization_guard`; `cargo test -p codestory-runtime --test retrieval_eval`; set `CODESTORY_RETRIEVAL_EVAL_FULL_TESTS=1` only after real sidecars are prepared | Packet/search orchestration, fail-closed modes, retrieval shadow traces, rollback-warning logic, or runtime use of sidecar results | CLI argument/output behavior or GitHub smoke workflow behavior |
| CLI surface | `cargo test -p codestory-cli --test retrieval_bootstrap_contracts`; `cargo test -p codestory-cli --test stdio_protocol_contracts`; `cargo test -p codestory-cli --test search_json_output`; with real sidecars, run the ignored full-mode search JSON test explicitly | `retrieval bootstrap/status/index` contracts, stdio protocol/cache fingerprints, fail-closed search JSON, or user-facing command shape | Full product readiness unless `retrieval status` is `full` after live sidecar indexing |
| Benchmark harness | `cargo check -p codestory-bench --benches`; the relevant Criterion bench only when it isolates the hot path; release e2e stats for real-repo timing | New benchmark code, latency/timing claims, rollback baseline updates, or performance-sensitive retrieval/index changes | Promotion by itself; synthetic or narrow benches are scouts until real-repo evidence exists |
| Benchmark harness | `cargo check -p codestory-bench --benches`; the relevant Criterion bench only when it isolates the hot path; release e2e stats for real-repo timing; for AST-first retrieval, include same-run baseline/candidate rows for cold total index time, `semantic_embedding_ms`, dense doc count reduction, repeat refresh embedded-doc count, holdout MRR@10/Hit@10/exact-symbol Hit@1, packet lazy-search source reads, and peak descendant working set | New benchmark code, latency/timing claims, rollback baseline updates, dense-policy changes, or performance-sensitive retrieval/index changes | Promotion by itself; synthetic or narrow benches are scouts until real-repo evidence exists |
| Smoke CI | `.github/workflows/retrieval-sidecar-smoke.yml` plus `docs/contributors/retrieval-sidecar-smoke-ci.md` pass criteria | PRs touching retrieval crate, runtime/stdio/search wiring, indexer retrieval hooks, retrieval docs, scripts, Docker sidecar config, or the workflow | Full sidecar readiness. CI smoke uses `--skip-compose --wait-secs 0` and proves manifest-missing fail-closed shape only |

## Agent-Grounding Release Gates

Use the highest completed tier as the only claim level in docs, PRs, or final
handoffs:

| Tier | Required evidence | Claim boundary |
| --- | --- | --- |
| CodeStory self-e2e | Generalization lint, targeted runtime/indexer tests, release CLI build, `doctor`, and repo-scale e2e stats | This branch still works on CodeStory and product Rust has no banned holdout literals |
| Local-real drill suite | Self-e2e plus local-real packet/drill rows without skip allowances | Product tuning survived realistic local repos |
| Holdout-retrieval drill suite | Local-real plus materialized holdout-retrieval rows, required recall/quality thresholds, and forbidden-claim checks with no skip allowances | Retrieval behavior is generalized for the public holdout suite |
| Promotion-grade paired benchmark | Holdout plus repeated CodeStory/no-CodeStory rows, timing/cost accounting, answer-quality ledger classifications, and packet-first source-read avoidance checks | Useful-for-agents, speed, or savings claims |

Packet statuses (`sufficient`, `partial`, `blocked`) describe evidence coverage
only. Final answer quality is promoted only by `drill`/`drill-suite` ledger
classifications. Holdout literals belong in manifests, tests, benchmark
harnesses, or the `CODESTORY_EVAL_PROBES` eval module, not production
planner/ranker/runtime code.
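Because each tier's required evidence includes the tier below it, the claimable level is the highest tier reached by an unbroken run of completed tiers. A minimal Rust sketch, assuming a hypothetical `Tier` enum and one completion flag per tier, ordered as in the table:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Tier {
    SelfE2e,
    LocalReal,
    HoldoutRetrieval,
    PromotionBenchmark,
}

/// Tiers are cumulative, so stop at the first incomplete tier; a completed
/// higher tier without its prerequisites does not raise the claim.
fn claim_level(completed: &[bool; 4]) -> Option<Tier> {
    const ORDER: [Tier; 4] = [
        Tier::SelfE2e,
        Tier::LocalReal,
        Tier::HoldoutRetrieval,
        Tier::PromotionBenchmark,
    ];
    let mut best = None;
    for (tier, done) in ORDER.iter().zip(completed) {
        if !*done {
            break;
        }
        best = Some(*tier);
    }
    best
}
```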

## CI Smoke Triage

The Windows `retrieval-sidecar-smoke` workflow is intentionally reduced. It
@@ -38,9 +56,10 @@ evidence is trustworthy only after live sidecars are indexed and status is full.
| Symptom | Likely layer | Action |
| --- | --- | --- |
| `retrieval_manifest_missing` | Bootstrap/state exists but no project manifest was finalized | In CI smoke this is expected. For product proof, run live `retrieval index --refresh full` and recheck status |
| `sidecar_manifest_stale`, input-hash drift, or embedding-backend drift | Source, SQLite projection, semantic docs, backend, dimension, or schema changed after the manifest | Rerun `retrieval index --refresh full`; `--refresh auto` may repair stale stored semantic-doc contracts once, but explicit failures still fail closed |
| `no_semantic`, `lexical_only`, or `unavailable` with Qdrant errors | Qdrant, embedding endpoint, or semantic smoke failed | Run bootstrap, confirm ports `6333`/`6334` and the embedding endpoint, then rebuild sidecar indexes |
| Qdrant collection exists but point count is below the semantic-doc projection count, is one-point, or has a stub marker | Partial or obsolete collection | Rerun `retrieval index`; do not bless semantic smoke alone as full readiness |
| `sidecar_manifest_stale`, input-hash drift, policy-version drift, graph-artifact-hash drift, dense-reason drift, or embedding-backend drift | Source, SQLite projection, `symbol_search_doc`, dense anchors, backend, dimension, policy, or schema changed after the manifest | Rerun `retrieval index --refresh full`; `--refresh auto` may repair stale stored symbol-doc or dense-anchor contracts once, but explicit failures still fail closed |
| `no_semantic`, `lexical_only`, or `unavailable` with Qdrant errors while dense anchors are expected | Qdrant, embedding endpoint, or semantic smoke failed | Run bootstrap, confirm ports `6333`/`6334` and the embedding endpoint, then rebuild sidecar indexes |
| Qdrant skipped while manifest dense-anchor count is `0` | Expected `graph_first_v1` graph/lexical full mode | Verify Zoekt and SCIP are healthy and manifest symbol-doc count, policy version, graph hash, and dense reason counts match |
| Qdrant collection exists but point count is below the dense-anchor projection count, is one-point, or has a stub marker | Partial or obsolete collection | Rerun `retrieval index`; do not bless semantic smoke alone as full readiness |
| Qdrant response lacks `result.points[]` | Qdrant client/API contract drift or wrong image | Verify the pinned Qdrant image and update the client/test contract deliberately |
| `storage_repair.scan_errors` appears during bootstrap | Cache protection scan was incomplete | Resolve unreadable cache roots or DBs before relying on retention pruning; do not treat suppressed pruning as readiness proof |

@@ -55,8 +74,8 @@ cargo test -p codestory-cli --test codestory_repo_e2e_stats -- --ignored --nocap
```

This log is especially mandatory for retrieval rollout changes that affect
default indexing, semantic-doc persistence or reuse, sidecar indexing/status,
packet/search behavior, runtime grounding surfaces, CLI command shape, or any
performance/timing claim. A stats-only row with
default indexing, symbol-doc persistence, dense-anchor persistence or reuse,
sidecar indexing/status, packet/search behavior, runtime grounding surfaces, CLI
command shape, or any performance/timing claim. A stats-only row with
`CODESTORY_ALLOW_SKIP_REAL_REPO_DRILL_CASES=1` can record local timing, but it
is not real-drill release evidence.
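The CI smoke triage table can be sketched as a lookup. This hypothetical Rust function is illustrative only: it reuses symptom strings quoted in the table, while `qdrant_skipped` and the `manifest_dense_anchors` guard parameter are assumptions, not statuses or fields emitted by CodeStory.

```rust
/// Maps a reported symptom to a triage action. "qdrant_skipped" is a
/// hypothetical label for the zero-anchor graph/lexical full mode.
fn triage(symptom: &str, manifest_dense_anchors: u64) -> &'static str {
    match symptom {
        "retrieval_manifest_missing" => {
            "expected in CI smoke; for product proof run `retrieval index --refresh full` and recheck status"
        }
        "sidecar_manifest_stale" => "rerun `retrieval index --refresh full`",
        // Qdrant errors only matter when the manifest expects dense anchors.
        "no_semantic" | "lexical_only" | "unavailable" if manifest_dense_anchors > 0 => {
            "run bootstrap, confirm ports 6333/6334 and the embedding endpoint, then rebuild sidecar indexes"
        }
        "qdrant_skipped" if manifest_dense_anchors == 0 => {
            "expected graph_first_v1 mode; verify Zoekt, SCIP, and manifest counts/hashes match"
        }
        _ => "not covered by this sketch; consult the triage table",
    }
}
```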
44 changes: 44 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default.

4 changes: 4 additions & 0 deletions Cargo.toml
@@ -34,6 +34,10 @@ tree-sitter-go = "0.23.4"
tree-sitter-ruby = "0.23.1"
tree-sitter-php = "0.23.11"
tree-sitter-c-sharp = "=0.23.0"
tree-sitter-kotlin-ng = "1.1.0"
tree-sitter-swift = "0.7.0"
tree-sitter-dart-orchard = "0.3.2"
tree-sitter-bash = "0.23.3"

# Semantic Analysis
tree-sitter-graph = "0.12"