diff --git a/README.md b/README.md
index 4dc9f71..16a02c3 100644
--- a/README.md
+++ b/README.md
@@ -10,32 +10,42 @@ Local codebase grounding for coding agents.
-CodeStory builds a local evidence layer for a repository. It indexes files,
-symbols, relationships, snippets, search state, and freshness notes into a
-per-project SQLite cache, then exposes that evidence through a CLI and
-`serve --stdio`.
-
-Use it when a coding agent needs repository context before explaining behavior,
-planning a change, or choosing files to inspect. The workflow is explicit: check
-cache health, build or refresh the index, find candidate symbols, inspect
-relationships, pull snippets, and return an answer tied to source evidence.
-
-Repository contents and inference stay local after the required tool or model
-assets are installed. Setup can fetch the CodeStory source artifact or managed
-embedding assets; the indexed project data stays in the user cache and commands
-stay explicit about which workspace they read.
-
-## Public Promise
-
-CodeStory is a local evidence layer for repositories, not an automatic
-correctness guarantee. It gives operators and coding agents explicit commands
-for cache health, indexing, search, trails, snippets, and source-backed answers
-that name the files they used. The per-project SQLite cache is separate from
-the optional local retrieval sidecars used by packet/search workflows; a healthy
-local navigation readiness report does not by itself prove agent packet/search
-readiness and does not by itself prove sidecar readiness. Benchmark notes are
-environment- and repository-specific evidence, so public claims should cite the
-checked setup instead of promising universal speedups or savings.
+**Situation.** You are in a repo with more files than anyone holds in memory.
+The agent needs to change behavior that spans packages - routing, indexing,
+auth, whatever - not rename a variable in the one file already open.
+
+**Task.** Find the symbol that owns the behavior, see who calls it, read the
+source that actually runs, and know what to touch next, without treating
+`grep -R` as architecture.
+
+**Action.** CodeStory indexes the workspace into a local SQLite graph:
+files, symbols, calls, imports, snippets, search projections, and freshness
+notes. Use `doctor`, `index`, `ground`, and `report` for local navigation. Use
+`packet` and `search` after sidecars report `retrieval_mode: "full"`.
+
+**Result.** Work starts at a file and line you can open, not whichever match
+ranked first in ripgrep. Answers name what they used and say when the index or
+sidecars are stale.
+
+```mermaid
+flowchart LR
+ Repo["workspace files"] --> Index["index"]
+ Index --> Store["local SQLite graph"]
+ Store --> Local["local: ground, report, files, trail, snippet"]
+ Store --> RetrievalIndex["retrieval index"]
+ RetrievalIndex --> Sidecars["Zoekt + Qdrant + SCIP + llama.cpp"]
+ Sidecars --> AgentSearch["agent: packet, search"]
+```
+
+## What You Get
+
+| Need | Use |
+| --- | --- |
+| "Where do I start?" | `doctor`, `index`, `ground`, `report` |
+| "What does this symbol touch?" | `symbol`, `trail`, `snippet` |
+| "What changed and what might break?" | `affected` |
+| "Answer a broad repo question with citations." | `packet` with full sidecars |
+| "Find candidates by behavior term, path, symbol, or literal." | `search` with full sidecars |
## Try It On A Repo
@@ -51,158 +61,73 @@ TARGET_WORKSPACE="/path/to/repo"
"$CODESTORY_CLI" index --project "$TARGET_WORKSPACE" --refresh full
"$CODESTORY_CLI" ground --project "$TARGET_WORKSPACE" --why
"$CODESTORY_CLI" report --project "$TARGET_WORKSPACE" --output-file codestory-report.md
-"$CODESTORY_CLI" report --project "$TARGET_WORKSPACE" --format json --output-file codestory-graph.json
```
On Windows PowerShell, use `.\target\release\codestory-cli.exe`, environment
-assignments such as `$env:NAME = "value"`, and normal Windows paths such as
-`C:\path\to\repo`.
-
-That basic path establishes local navigation readiness: the local cache, graph,
-lexical index, and DB-backed navigation commands are usable for health, file,
-symbol, trail, snippet, context, orientation checks, and derived report/export
-artifacts.
-`report` reads the current SQLite store and writes generated artifacts; the
-Markdown report and full JSON graph export are not source-of-truth state. The managed
-embedding dry-run is a local semantic setup check; it does not prove agent
-packet/search readiness.
-
-Agent packet/search readiness requires `retrieval_mode=full` from local Zoekt,
-Qdrant, SCIP, and llama.cpp sidecars. See [docs/usage.md](docs/usage.md) for the
-full local-navigation versus sidecar-readiness split and
-[docs/ops/retrieval-sidecars.md](docs/ops/retrieval-sidecars.md) for sidecar
-setup.
+assignments such as `$env:TARGET_WORKSPACE = "C:\path\to\repo"`, and normal
+Windows paths.
+
+That path proves local navigation readiness. It does not prove sidecar readiness
+for `packet` or `search`.
-After that first index, use narrower commands instead of asking the agent to
-start over:
+For agent-facing packet/search evidence, build and verify sidecars first:
```sh
+"$CODESTORY_CLI" retrieval bootstrap --project "$TARGET_WORKSPACE" --format json
+"$CODESTORY_CLI" retrieval index --project "$TARGET_WORKSPACE" --refresh full
+"$CODESTORY_CLI" retrieval status --project "$TARGET_WORKSPACE" --format json
+"$CODESTORY_CLI" packet --project "$TARGET_WORKSPACE" --question "what owns request routing?"
"$CODESTORY_CLI" search --project "$TARGET_WORKSPACE" --query "request routing" --why
-"$CODESTORY_CLI" trail --project "$TARGET_WORKSPACE" --id --story --hide-speculative
-"$CODESTORY_CLI" snippet --project "$TARGET_WORKSPACE" --id --context 40
```
-A good CodeStory-backed answer should name the source files it used, say when
-evidence is stale or partial, and give the next concrete command when more proof
-is needed.
-
-For task-shaped flows, use [docs/usage.md](docs/usage.md).
-
-## Retrieval sidecars
-
-For Zoekt/Qdrant/SCIP packet retrieval, run `cargo retrieval-setup` once from
-this repository root, then follow
-[docs/ops/retrieval-sidecars.md](docs/ops/retrieval-sidecars.md) for bootstrap
-flags, version pins, and troubleshooting.
+`retrieval status` must report `retrieval_mode: "full"` before you trust
+`packet` or `search` output. See
+[docs/usage.md](docs/usage.md) for task-shaped flows and
+[docs/ops/retrieval-sidecars.md](docs/ops/retrieval-sidecars.md) for sidecar
+setup and repair.
## Install As An Agent Skill
-Use this path when CodeStory should be installed once as a grounding skill and
-then pointed at whatever repository an agent is working on.
-
-```sh
-SkillHome=""
-mkdir -p "$SkillHome"
-cp -R ./.agents/skills/codestory-grounding "$SkillHome/codestory-grounding"
-bash "$SkillHome/codestory-grounding/scripts/setup.sh"
-```
-
-On Windows PowerShell:
-
-```powershell
-$SkillHome = ""
-New-Item -ItemType Directory -Force -Path $SkillHome | Out-Null
-Copy-Item -Recurse -Force .\.agents\skills\codestory-grounding "$SkillHome\codestory-grounding"
-& "$SkillHome\codestory-grounding\scripts\setup.ps1"
-```
-
-The setup script prints `CODESTORY_CLI=`. Persist that path if your agent
-environment does not preserve variables between sessions.
+Copy [`.agents/skills/codestory-grounding`](.agents/skills/codestory-grounding) to
+your skill directory. Run `scripts/setup.sh` or `scripts/setup.ps1`. See
+[`.agents/skills/codestory-grounding/SKILL.md`](.agents/skills/codestory-grounding/SKILL.md).
-The skill package lives at
-[.agents/skills/codestory-grounding/SKILL.md](.agents/skills/codestory-grounding/SKILL.md).
+## Commands
-## Core Flow
-
-| Need | Command |
+| Task | Command |
| --- | --- |
-| Local navigation readiness | `codestory-cli doctor --project ` |
-| Build or refresh an index | `codestory-cli index --project --refresh full` |
-| Broad orientation | `codestory-cli ground --project --why` |
-| Repo report / graph export | `codestory-cli report --project --format markdown` |
-| Broad task evidence (requires full sidecar retrieval) | `codestory-cli packet --project --question "" --budget compact` |
-| Candidate discovery (requires full sidecar retrieval) | `codestory-cli search --project --query "" --why` |
-| Exact symbol evidence | `codestory-cli symbol --project --id ` |
-| Flow evidence | `codestory-cli trail --project --id --story --hide-speculative` |
-| Source excerpt | `codestory-cli snippet --project --id ` |
-| Bundled navigation packet | `codestory-cli explore --project --id --no-tui` |
-| Deep context bundle | `codestory-cli context --project --id ` |
-| Changed-file impact | `codestory-cli affected --project --format markdown` |
-| Persistent read surface | `codestory-cli serve --project --stdio` |
-
-Use `packet` for broad task questions once `ready --goal agent` reports full
-sidecar retrieval. For local cache-only inspection, start with `ground`,
-`report`, or `doctor`, then use `symbol`, `trail`, `snippet`, or `context` after
-you have a concrete target. Use `doctor` when output looks stale, incomplete, or
-inconsistent.
-
-## What It Builds
-
-```mermaid
-flowchart LR
- Repo["repository"] --> Workspace["workspace discovery"]
- Workspace --> Indexer["symbol and edge extraction"]
- Indexer --> Store["SQLite store"]
- Store --> Runtime["retrieval and context assembly"]
- Runtime --> CLI["CLI and stdio reads"]
- CLI --> Agent["coding agent"]
-```
-
-CodeStory builds a local evidence layer so agents can request grounded context
-instead of relying on ad hoc file reads.
-
-## Language Support Claims
-
-CodeStory separates parser-backed graph indexing, regression-tested accuracy,
-structural extraction, framework route coverage, and agent packet/search
-readiness. The current contract is documented in
+| Cache health | `doctor --project ` |
+| Index | `index --project --refresh full` |
+| Orientation | `ground --project --why` |
+| Lookup with sidecars | `search --project --query "..." --why` |
+| Call graph | `trail --project --id --story` |
+| Source | `snippet --project --id ` |
+| Target bundle | `context --project --id ` |
+| Task packet with sidecars | `packet --project --question "..."` |
+| Persistent reads | `serve --project --stdio` |
+
+## Language Support
+
+CodeStory separates parser-backed graph coverage, structural collectors,
+regression-tested fidelity, and agent packet/search readiness. The current
+contract is documented in
[docs/architecture/language-support.md](docs/architecture/language-support.md).
-In short: Python, Java, Rust, JavaScript, TypeScript/TSX, C++, C, Go, Ruby,
-PHP, C#, Kotlin, Swift, Dart, and Bash are fidelity-gated parser-backed graph
-languages; HTML, CSS, and SQL use structural collectors.
-
-The opt-in OSS language corpus pairs each public language-support profile with a
-pinned medium-sized open source project and compares raw filesystem counts
-against CodeStory indexing of the same files:
-[docs/testing/oss-language-corpus.md](docs/testing/oss-language-corpus.md).
-The separate `language-expansion-holdout` benchmark suite runs strict
-`without_codestory` versus `with_codestory` agent tasks on those pinned
-projects and records elapsed time, token usage, estimated cost, tool calls,
-command counts, source reads, post-packet source reads, and quality gates.
-
-For the system model, start with
-[docs/concepts/how-codestory-works.md](docs/concepts/how-codestory-works.md),
-then [docs/architecture/overview.md](docs/architecture/overview.md).
+Python, Java, Rust, JavaScript, TypeScript/TSX, C++, C, Go, Ruby, PHP, C#,
+Kotlin, Swift, Dart, and Bash are fidelity-gated parser-backed graph languages.
+HTML, CSS, and SQL use structural collectors.
## Evidence
-The benchmark docs are deliberately cautious. They separate current checked-in
-benchmark history from the state of your local cache, which can drift and should
-be checked with `doctor`.
-
-- Public evidence summary and caveats:
- [docs/testing/benchmark-ledger.md](docs/testing/benchmark-ledger.md)
-- Repo-scale timing history:
- [docs/testing/codestory-e2e-stats-log.md](docs/testing/codestory-e2e-stats-log.md)
-- Warm stdio loop evidence:
- [docs/testing/codestory-stdio-warm-loop-stats.md](docs/testing/codestory-stdio-warm-loop-stats.md)
-- Repeatable with/without harness:
- [`scripts/codestory-agent-ab-benchmark.mjs`](scripts/codestory-agent-ab-benchmark.mjs)
+Benchmark notes are environment- and repository-specific evidence. Do not turn
+one row into a universal savings claim.
-Do not promote a single benchmark row into a universal savings claim.
+- Scorecard and caveats: [docs/testing/benchmark-ledger.md](docs/testing/benchmark-ledger.md)
+- Repo-scale timing history: [docs/testing/codestory-e2e-stats-log.md](docs/testing/codestory-e2e-stats-log.md)
+- Warm stdio loop history: [docs/testing/codestory-stdio-warm-loop-stats.md](docs/testing/codestory-stdio-warm-loop-stats.md)
+- Repeatable with/without harness: [`scripts/codestory-agent-ab-benchmark.mjs`](scripts/codestory-agent-ab-benchmark.mjs)
-## Hack On CodeStory
+## Contributing
Start with the contributor docs, then run Cargo checks serially because this
workspace shares build locks.
@@ -210,14 +135,17 @@ workspace shares build locks.
- [docs/contributors/getting-started.md](docs/contributors/getting-started.md)
- [docs/contributors/debugging.md](docs/contributors/debugging.md)
- [docs/contributors/testing-matrix.md](docs/contributors/testing-matrix.md)
+- [docs/architecture/overview.md](docs/architecture/overview.md)
- [docs/architecture/runtime-execution-path.md](docs/architecture/runtime-execution-path.md)
-- [docs/architecture/language-support.md](docs/architecture/language-support.md)
-- [docs/architecture/subsystems/contracts.md](docs/architecture/subsystems/contracts.md)
-- [docs/architecture/subsystems/workspace.md](docs/architecture/subsystems/workspace.md)
-- [docs/architecture/subsystems/indexer.md](docs/architecture/subsystems/indexer.md)
-- [docs/architecture/subsystems/store.md](docs/architecture/subsystems/store.md)
-- [docs/architecture/subsystems/runtime.md](docs/architecture/subsystems/runtime.md)
-- [docs/architecture/subsystems/cli.md](docs/architecture/subsystems/cli.md)
+
+## Docs Map
+
+- Usage: [docs/usage.md](docs/usage.md)
+- Concepts: [docs/concepts/how-codestory-works.md](docs/concepts/how-codestory-works.md)
+- Architecture: [docs/architecture/overview.md](docs/architecture/overview.md)
+- Languages: [docs/architecture/language-support.md](docs/architecture/language-support.md)
+- Benchmarks: [docs/testing/benchmark-ledger.md](docs/testing/benchmark-ledger.md)
+- Contributing: [docs/contributors/getting-started.md](docs/contributors/getting-started.md)
## License
diff --git a/crates/codestory-cli/tests/onboarding_contracts.rs b/crates/codestory-cli/tests/onboarding_contracts.rs
deleted file mode 100644
index 436efc9..0000000
--- a/crates/codestory-cli/tests/onboarding_contracts.rs
+++ /dev/null
@@ -1,492 +0,0 @@
-use std::fs;
-use std::path::{Path, PathBuf};
-use std::process::Command;
-
-fn repo_root() -> PathBuf {
- PathBuf::from(env!("CARGO_MANIFEST_DIR"))
- .parent()
- .expect("cli crate has workspace parent")
- .parent()
- .expect("workspace root exists")
- .to_path_buf()
-}
-
-fn collect_markdown_files(dir: &Path, files: &mut Vec<PathBuf>) {
- for entry in fs::read_dir(dir).expect("read markdown dir") {
- let entry = entry.expect("markdown entry");
- let path = entry.path();
- if path.is_dir() {
- collect_markdown_files(&path, files);
- continue;
- }
- if path.extension().and_then(|ext| ext.to_str()) == Some("md") {
- files.push(path);
- }
- }
-}
-
-fn extract_markdown_links(contents: &str) -> Vec<String> {
- let mut links = Vec::new();
- let bytes = contents.as_bytes();
- let mut index = 0;
- while index + 1 < bytes.len() {
- if bytes[index] == b']' && bytes[index + 1] == b'(' {
- let mut end = index + 2;
- while end < bytes.len() && bytes[end] != b')' {
- end += 1;
- }
- if end < bytes.len() {
- links.push(contents[index + 2..end].trim().to_string());
- index = end;
- }
- }
- index += 1;
- }
- links
-}
-
-fn normalize_local_link_target(raw: &str) -> Option<String> {
- let target = raw.trim().trim_matches(|ch| ch == '<' || ch == '>');
- if target.is_empty()
- || target.starts_with('#')
- || target.starts_with("http://")
- || target.starts_with("https://")
- || target.starts_with("mailto:")
- || target.starts_with("app://")
- || target.starts_with("plugin://")
- {
- return None;
- }
-
- Some(
- target
- .split_once('#')
- .map(|(path, _)| path)
- .unwrap_or(target)
- .to_string(),
- )
-}
-
-fn assert_public_doc_avoids_agent_specific_framing(file: &Path, contents: &str) {
- let lowered = contents.to_lowercase();
- for blocked in [
- "codegraph",
- "codex-first",
- "codex first",
- "global codex",
- "for codex users",
- ".codex/skills",
- ".codex\\skills",
- ] {
- assert!(
- !lowered.contains(blocked),
- "public doc should not contain `{blocked}`: {}",
- file.display()
- );
- }
-}
-
-fn extract_inline_toml_string_array(manifest: &str, key: &str) -> Vec<String> {
- let prefix = format!("{key} = [");
- let line = manifest
- .lines()
- .find(|line| line.trim_start().starts_with(&prefix))
- .unwrap_or_else(|| panic!("manifest should contain inline array `{key}`"));
- let values = line
- .trim()
- .strip_prefix(&prefix)
- .and_then(|value| value.strip_suffix(']'))
- .unwrap_or_else(|| panic!("manifest should use inline string array for `{key}`"));
-
- values
- .split(',')
- .map(|value| value.trim().trim_matches('"').to_string())
- .filter(|value| !value.is_empty())
- .collect()
-}
-
-#[test]
-fn cli_package_metadata_is_adoption_ready() {
- let root = repo_root();
- let manifest_path = root.join("crates/codestory-cli/Cargo.toml");
- let manifest = fs::read_to_string(&manifest_path).expect("CLI manifest should exist");
-
- for required in [
- "description = \"Local repository evidence and grounding CLI for source-backed coding workflows.\"",
- "license = \"Apache-2.0\"",
- "repository = \"https://github.com/TheGreenCedar/CodeStory.git\"",
- "readme = \"../../README.md\"",
- ] {
- assert!(
- manifest.contains(required),
- "CLI package metadata should include `{required}`"
- );
- }
-
- let readme_from_manifest = manifest_path
- .parent()
- .expect("CLI manifest should have parent")
- .join("../../README.md");
- assert_eq!(
- fs::canonicalize(readme_from_manifest).expect("manifest readme path should resolve"),
- fs::canonicalize(root.join("README.md")).expect("repo README should resolve"),
- "CLI package readme should point at the repository README"
- );
-
- let keywords = extract_inline_toml_string_array(&manifest, "keywords");
- assert_eq!(
- keywords,
- vec!["code-search", "grounding", "cli", "agents"],
- "keywords should stay conservative and adoption-oriented"
- );
- assert!(
- keywords.len() <= 5,
- "crates.io accepts at most five package keywords"
- );
- for keyword in keywords {
- assert!(
- keyword.len() <= 20
- && keyword
- .chars()
- .all(|ch| ch.is_ascii_alphanumeric() || ch == '-'),
- "keyword should stay crates.io-compatible: {keyword}"
- );
- }
-
- let categories = extract_inline_toml_string_array(&manifest, "categories");
- assert_eq!(
- categories,
- vec!["command-line-utilities", "development-tools"],
- "categories should stay accurate and crates.io-compatible"
- );
-}
-
-#[test]
-fn readme_keeps_customer_first_onboarding() {
- let root = repo_root();
- let readme = fs::read_to_string(root.join("README.md")).expect("README should exist");
- assert!(readme.contains("Public Promise"));
- assert!(readme.contains("Try It On A Repo"));
- assert!(readme.contains("What It Builds"));
- assert!(readme.contains("Local codebase grounding for coding agents"));
- assert!(readme.contains("Install As An Agent Skill"));
- assert!(readme.contains("Core Flow"));
- assert!(readme.contains("Hack On CodeStory"));
- assert!(readme.contains("A good CodeStory-backed answer should name"));
- assert!(readme.contains("local evidence layer for repositories"));
- assert!(readme.contains("explicit commands"));
- assert!(readme.contains("source-backed answers"));
- assert!(readme.contains("per-project SQLite cache is separate"));
- assert!(readme.contains("local retrieval sidecars"));
- assert!(readme.contains("does not by itself prove sidecar readiness"));
- assert!(readme.contains("environment- and repository-specific evidence"));
- assert!(readme.contains("instead of promising universal speedups or savings"));
- assert!(readme.contains("benchmark history"));
- assert!(readme.contains("checked with `doctor`"));
- assert!(readme.contains(".agents/skills/codestory-grounding/SKILL.md"));
- assert!(readme.contains("docs/usage.md"));
- assert!(readme.contains("docs/concepts/how-codestory-works.md"));
- assert!(readme.contains("docs/architecture/language-support.md"));
- assert!(readme.contains("docs/testing/benchmark-ledger.md"));
- assert!(readme.contains(
- r#""$CODESTORY_CLI" setup embeddings --project "$TARGET_WORKSPACE" --dry-run --format json"#
- ));
- assert!(readme.contains("serve --stdio"));
- assert!(readme.contains("docs/architecture/overview.md"));
- assert!(readme.contains("docs/contributors/debugging.md"));
- assert!(readme.contains("docs/contributors/testing-matrix.md"));
- assert!(
- readme.find("Try It On A Repo").expect("quickstart section")
- < readme.find("Evidence").expect("evidence section"),
- "README should show the usable path before benchmark evidence"
- );
-
- for path in [
- "docs/usage.md",
- "docs/concepts/how-codestory-works.md",
- "docs/architecture/overview.md",
- "docs/architecture/runtime-execution-path.md",
- "docs/architecture/language-support.md",
- "docs/architecture/subsystems/contracts.md",
- "docs/architecture/subsystems/workspace.md",
- "docs/architecture/subsystems/indexer.md",
- "docs/architecture/subsystems/runtime.md",
- "docs/architecture/subsystems/store.md",
- "docs/architecture/subsystems/cli.md",
- "docs/contributors/getting-started.md",
- "docs/contributors/debugging.md",
- "docs/contributors/testing-matrix.md",
- ".agents/skills/codestory-grounding/scripts/setup.ps1",
- ".agents/skills/codestory-grounding/scripts/setup.sh",
- "scripts/codestory-agent-ab-benchmark.mjs",
- ] {
- assert!(
- root.join(path).exists(),
- "expected onboarding doc to exist: {path}"
- );
- }
-
- for path in [
- ".agents/skills/codestory-grounding/scripts/setup.ps1",
- ".agents/skills/codestory-grounding/scripts/setup.sh",
- ] {
- let setup = fs::read_to_string(root.join(path)).expect("read setup script");
- assert!(
- !setup.contains("DEFAULT_CODESTORY_REPO_REF"),
- "setup script should not pin a stale default CLI source ref: {path}"
- );
- assert!(
- setup.contains("CODESTORY_REPO_REF"),
- "setup script should keep explicit source-ref override support: {path}"
- );
- assert!(
- setup.contains("origin/HEAD"),
- "setup script should build the remote default branch when no ref is explicit: {path}"
- );
- }
-}
-
-#[test]
-fn docs_drift_contracts_keep_living_sources_explicit() {
- let root = repo_root();
- let readme = fs::read_to_string(root.join("README.md")).expect("README should exist");
- let usage = fs::read_to_string(root.join("docs/usage.md")).expect("usage doc should exist");
- let testing_matrix = fs::read_to_string(root.join("docs/contributors/testing-matrix.md"))
- .expect("testing matrix should exist");
- let language_support = fs::read_to_string(root.join("docs/architecture/language-support.md"))
- .expect("language support doc should exist");
- let benchmark_scorecard = fs::read_to_string(root.join("docs/testing/benchmark-ledger.md"))
- .expect("benchmark ledger should exist");
-
- assert!(
- readme.contains(
- r#""$CODESTORY_CLI" setup embeddings --project "$TARGET_WORKSPACE" --dry-run --format json"#
- ),
- "README quickstart should show first-run semantic setup dry-run"
- );
- assert!(
- !usage.contains("semantic_doc_scope = \"durable\""),
- "usage config example should omit the default durable semantic scope"
- );
- for accepted_scope in ["`all`", "`full`", "`all-symbols`", "`all_symbols`"] {
- assert!(
- usage.contains(accepted_scope),
- "usage docs should name accepted all-symbol semantic_doc_scope value {accepted_scope}"
- );
- }
- assert!(
- testing_matrix.contains("latest row in")
- && testing_matrix.contains("codestory-e2e-stats-log.md")
- && testing_matrix.contains("historical")
- && testing_matrix.contains("examples only"),
- "testing matrix should point current timing claims at the living stats log"
- );
- assert!(
- !testing_matrix.contains("The 2026-04-18 repo-scale baseline"),
- "testing matrix should not present an old hard-coded baseline as current"
- );
- assert!(
- benchmark_scorecard.contains("## Current Scorecard")
- && benchmark_scorecard.contains("codestory-e2e-stats-log.md"),
- "benchmark ledger should keep the scorecard and living timing log references"
- );
- for required in [
- "parser-backed graph",
- "fidelity-gated",
- "structural collector",
- "candidate parser compatibility record",
- "Go, Ruby, PHP, C#, Kotlin, Swift, Dart, Bash",
- "Kotlin, Swift, Dart, Bash",
- ] {
- assert!(
- language_support.contains(required),
- "language support doc should preserve support-claim term `{required}`"
- );
- }
- for required in [
- "crates/codestory-contracts/src/language_support.rs",
- "language_support_profile_for_ext",
- "language_support_profile_for_language_name",
- "get_language_for_ext",
- ] {
- assert!(
- language_support.contains(required),
- "language support docs should mention `{required}`"
- );
- }
- assert!(
- testing_matrix.contains("../architecture/language-support.md"),
- "testing matrix should link the language support claim contract"
- );
- assert!(
- root.join("docs/testing/benchmark-ledger.md").exists(),
- "benchmark ledger should preserve detailed historical rows"
- );
-}
-
-#[test]
-fn public_docs_avoid_competitor_and_agent_specific_framing() {
- let root = repo_root();
- let mut files = vec![root.join("README.md")];
- collect_markdown_files(&root.join("docs"), &mut files);
- collect_markdown_files(&root.join(".agents/skills/codestory-grounding"), &mut files);
-
- for file in files {
- let contents = fs::read_to_string(&file).expect("read public doc");
- assert_public_doc_avoids_agent_specific_framing(&file, &contents);
- }
-}
-
-#[test]
-fn usage_doc_keeps_agent_contract_terms_out_of_operator_flow() {
- let root = repo_root();
- let usage = fs::read_to_string(root.join("docs/usage.md")).expect("usage doc should exist");
- assert!(usage.contains("Common Workflows"));
- assert!(usage.contains("I need a repo overview"));
- assert!(usage.contains("I need evidence for a broad question"));
- assert!(usage.contains("The cache or local navigation looks stale"));
- assert!(usage.contains("For agent-facing packet/search recovery"));
- assert!(usage.contains(
- "codestory-cli retrieval index --project --refresh full --format json"
- ));
- for blocked in [
- "sufficiency.avoid_opening",
- "supported-claim wording",
- "claim-ledger",
- "Support files",
- ] {
- assert!(
- !usage.contains(blocked),
- "operator usage doc should not expose agent-internal contract term {blocked}"
- );
- }
-}
-
-#[test]
-fn usage_doc_names_two_readiness_tracks_and_predictable_output_modes() {
- let root = repo_root();
- let usage = fs::read_to_string(root.join("docs/usage.md")).expect("usage doc should exist");
-
- assert!(usage.contains("## Readiness Tracks"));
- assert!(usage.contains("### Local navigation/cache readiness"));
- assert!(usage.contains("### Agent packet/search sidecar readiness"));
- assert!(usage.contains("`local_navigation`"));
- assert!(usage.contains("`agent_packet_search`"));
- assert!(usage.contains("`retrieval_mode: \"full\"`"));
- assert!(usage.contains("## Predictable Output Modes"));
- assert!(usage.contains("Most commands default to Markdown"));
- assert!(
- usage.contains("Use `--format json` when automation needs the complete structured result")
- );
- assert!(usage.contains("Use `--output-file `"));
- assert!(usage.contains("The parent directory must already exist"));
- assert!(usage.contains("`explore` opens the terminal UI by default"));
- assert!(usage.contains("Use `--no-tui`"));
- assert!(
- usage
- .find("## Readiness Tracks")
- .expect("readiness heading")
- < usage
- .find("## Retrieval Defaults")
- .expect("retrieval defaults heading"),
- "usage should introduce readiness tracks before retrieval defaults"
- );
-}
-
-#[test]
-fn benchmark_docs_show_proof_tier_ladder() {
- let root = repo_root();
- let benchmark_scorecard = fs::read_to_string(root.join("docs/testing/benchmark-ledger.md"))
- .expect("benchmark ledger should exist");
-
- assert!(benchmark_scorecard.contains("## Proof Tier Ladder"));
- for tier in [
- "Stats-only local regression signal",
- "Full sidecar readiness proof",
- "Real-repo drill proof",
- "Promotion-grade benchmark proof",
- ] {
- assert!(
- benchmark_scorecard.contains(tier),
- "benchmark ledger should explain proof tier {tier}"
- );
- }
- assert!(benchmark_scorecard.contains("Full sidecar readiness, agent packet/search readiness"));
- assert!(benchmark_scorecard.contains("`retrieval_mode: \"full\"`"));
- assert!(benchmark_scorecard.contains("Generalized agent savings"));
- assert!(
- benchmark_scorecard
- .find("## Proof Tier Ladder")
- .expect("proof tier ladder")
- < benchmark_scorecard
- .find("## Promotion Rules")
- .expect("promotion rules"),
- "proof tier ladder should frame promotion rules"
- );
-}
-
-#[test]
-fn markdown_links_resolve_to_existing_local_files() {
- let root = repo_root();
- let mut markdown_files = vec![root.join("README.md")];
- collect_markdown_files(&root.join("docs"), &mut markdown_files);
-
- for file in markdown_files {
- let contents = fs::read_to_string(&file).expect("read markdown file");
- for link in extract_markdown_links(&contents) {
- let Some(target) = normalize_local_link_target(&link) else {
- continue;
- };
- let resolved = file.parent().expect("markdown file parent").join(target);
- assert!(
- resolved.exists(),
- "broken markdown link in {} -> {}",
- file.display(),
- resolved.display()
- );
- }
- }
-}
-
-#[test]
-fn codestory_grounding_skill_command_refs_track_cli_commands() {
- let root = repo_root();
- let skill_root = root.join(".agents/skills/codestory-grounding");
- let commands = [
- "index", "ground", "doctor", "search", "symbol", "trail", "snippet", "query", "explore",
- "bookmark", "context", "packet", "drill", "setup", "serve",
- ];
-
- for command in commands {
- let reference = skill_root.join("references").join(format!("{command}.md"));
- assert!(
- reference.exists(),
- "codestory-grounding should document `{command}` at {}",
- reference.display()
- );
-
- let help = Command::new(env!("CARGO_BIN_EXE_codestory-cli"))
- .arg(command)
- .arg("--help")
- .output()
- .unwrap_or_else(|error| panic!("run `{command} --help`: {error}"));
- assert!(
- help.status.success(),
- "`{command}` should remain a valid CLI subcommand\nstdout:\n{}\nstderr:\n{}",
- String::from_utf8_lossy(&help.stdout),
- String::from_utf8_lossy(&help.stderr)
- );
- }
-
- for command in ["context", "bookmark", "doctor", "explore", "serve"] {
- let reference =
- fs::read_to_string(skill_root.join("references").join(format!("{command}.md")))
- .expect("read command reference");
- for required in ["Normal path", "Failure path", "Integration edge"] {
- assert!(
- reference.contains(required),
- "`{command}` reference should include a {required} row"
- );
- }
- }
-}
diff --git a/crates/codestory-indexer/src/resolution/mod.rs b/crates/codestory-indexer/src/resolution/mod.rs
index fba97ba..5669df1 100644
--- a/crates/codestory-indexer/src/resolution/mod.rs
+++ b/crates/codestory-indexer/src/resolution/mod.rs
@@ -2769,6 +2769,7 @@ mod tests {
kind INTEGER NOT NULL,
source_node_id INTEGER NOT NULL,
target_node_id INTEGER NOT NULL,
+ file_node_id INTEGER,
resolved_target_node_id INTEGER,
confidence REAL,
certainty TEXT,
@@ -2869,6 +2870,7 @@ mod tests {
kind INTEGER NOT NULL,
source_node_id INTEGER NOT NULL,
target_node_id INTEGER NOT NULL,
+ file_node_id INTEGER,
resolved_target_node_id INTEGER,
confidence REAL,
certainty TEXT,
diff --git a/docs/architecture/overview.md b/docs/architecture/overview.md
index 2c6640a..d7138aa 100644
--- a/docs/architecture/overview.md
+++ b/docs/architecture/overview.md
@@ -1,7 +1,8 @@
# Architecture Overview
-CodeStory has one job: turn a repository into local evidence that a coding agent
-can query before relying on a small set of manually opened files.
+CodeStory turns a repository into local evidence a coding agent can query: files
+and symbols in SQLite, optional sidecar indexes for packet/search, and a thin CLI
+on top of `codestory-runtime`.
The runtime path is:
diff --git a/docs/concepts/how-codestory-works.md b/docs/concepts/how-codestory-works.md
index b4d86b7..dafa508 100644
--- a/docs/concepts/how-codestory-works.md
+++ b/docs/concepts/how-codestory-works.md
@@ -1,84 +1,75 @@
# How CodeStory Works
-CodeStory is a local evidence layer for codebases. It does not replace judgment,
-tests, or source reading. It makes the first pass more structured.
+CodeStory indexes a workspace into a local graph, then serves read commands
+against that graph. It does not replace tests or judgment; it structures the
+first pass.
+
+Command loop: [README - What You Get](../../README.md#what-you-get).
+Readiness lanes: [usage.md](../usage.md#readiness-tracks).
+
+```mermaid
+flowchart TD
+ Files["workspace files"] --> Plan["workspace discovery and refresh plan"]
+ Plan --> Parse["tree-sitter parsing and semantic resolution"]
+ Parse --> Graph["SQLite graph, occurrences, snippets, snapshots"]
+ Graph --> Local["local navigation commands"]
+ Graph --> SidecarBuild["retrieval index"]
+ SidecarBuild --> Sidecars["Zoekt, Qdrant, SCIP, llama.cpp"]
+ Sidecars --> Agent["agent packet/search commands"]
+```
+
+## What gets stored
+
+Per-project SQLite under your user cache, keyed by workspace path:
+
+| Stored | Purpose |
+| --- | --- |
+| File inventory and refresh metadata | Incremental re-index |
+| Graph nodes and edges | Calls, imports, overrides, references |
+| Snippets and occurrences | Source-backed reads |
+| Search projections and symbol docs | Lookup without opening every file |
+| Snapshots | Cached read models rebuilt from the graph |
+| Dense anchors (when policy selects them) | Sidecar vector search only |
-An agent usually fails on a large repo by over-weighting the first few files it
-opens. CodeStory gives that agent an indexed map before it explains behavior or
-plans a change.
+Repo content stays local. Managed setup may fetch tool assets; indexed evidence
+does not leave the cache unless you copy it.
-## The Loop
+## The loop
```text
-doctor -> index -> ground -> search -> symbol/trail/snippet/explore -> context
+doctor -> index -> ground/report/files -> exact target -> trail/snippet/context
```
-- `doctor` checks whether the cache, index, retrieval mode, and local embedding
- setup are usable.
-- `index` builds or refreshes local graph, search, snapshot, graph-native
- symbol-doc, component-report, and selected dense-anchor state for one target
- repository.
-- `ground` gives broad orientation and reports limited coverage or gaps.
-- `search` finds candidate files, symbols, routes, literals, modules, or behavior
- terms.
-- `symbol`, `trail`, `snippet`, and `explore` inspect one selected target.
-- `context` bundles deeper evidence around that concrete target.
-- `packet` handles broad task questions and reports citations, gaps, and next
- commands.
-
-The workflow is a repeatable evidence loop.
-
-## What Gets Stored
-
-CodeStory writes per-project state under the user cache, keyed by the target
-workspace path. The cache can include:
-
-- discovered files and refresh metadata
-- graph nodes for files, symbols, and related code elements
-- graph edges such as calls, imports, overrides, and references
-- source snippets and occurrence locations
-- search projection rows and local search indexes
-- grounding snapshots rebuilt from the graph
-- graph-native symbol docs, which are deterministic searchable summaries for
- durable AST symbols
-- selected dense anchors, which are the only generated docs embedded as vectors
- under the active semantic policy
-
-Repository data stays local. Managed setup may fetch tool or model assets, but
-the indexed project evidence lives in the local cache.
-
-## Key Terms
-
-- Grounding is source-backed context: the files, symbols, and summaries a command
- returns so an answer can be tied back to repository evidence.
-- A symbol doc is deterministic generated text for a symbol, stored so lexical
- and graph retrieval can find relevant code even when the query words are not
- exact.
-- A dense anchor is a policy-selected symbol, component report, or unstructured
- doc that receives a vector embedding. Code symbols do not need dense vectors
- to be product-searchable.
-- A snapshot is a cached read model rebuilt from the local graph. If a snapshot
- is stale, the tool should say so.
-- A trail is a focused graph walk around one symbol: callers, callees,
- references, or neighborhood context.
-- A packet is a bounded evidence bundle for a broad task. It should include
- citations, gaps, and follow-up commands.
-
-## What Good Looks Like
+Use `packet` and `search` after the sidecar lane reports
+`retrieval_mode: "full"`. Until then, keep local browsing on exact targets from
+`ground`, `report`, `files`, or existing node ids.
+
+## Terms
+
+| Term | Meaning |
+| --- | --- |
+| Grounding | Context tied back to indexed files and symbols |
+| Symbol doc | Generated searchable text for a symbol (lexical, not embedded by default) |
+| Dense anchor | Policy-selected symbol or report that gets a vector |
+| Snapshot | Derived read model; may be stale, and commands should say so |
+| Trail | Graph walk from one symbol: callers, callees, neighbors |
+| Packet | Bounded task evidence with citations, gaps, next commands |
+
+More: [glossary.md](../glossary.md).
+
+## What good output looks like
A good CodeStory-backed answer does three things:
-1. It names the files, symbols, or snippets it used.
-2. It says when evidence is stale, partial, ambiguous, or missing.
-3. It gives the next concrete command when the current evidence is not enough.
+1. Names the files, symbols, snippets, or sidecar evidence it used.
+2. Says when evidence is stale, partial, ambiguous, or missing.
+3. Gives the next concrete command when the current evidence is not enough.
The goal is not a more confident answer. The goal is confidence constrained by
source evidence.
-## Where To Go Next
+## Related
-- Use [../usage.md](../usage.md) for command flows.
-- Use [../architecture/overview.md](../architecture/overview.md) for the system
- boundary and crate model.
-- Use [../contributors/debugging.md](../contributors/debugging.md) when output
- looks wrong.
+- [usage.md](../usage.md)
+- [architecture/overview.md](../architecture/overview.md)
+- [contributors/debugging.md](../contributors/debugging.md)
diff --git a/docs/contributors/testing-matrix.md b/docs/contributors/testing-matrix.md
index 01c0487..dbb71d2 100644
--- a/docs/contributors/testing-matrix.md
+++ b/docs/contributors/testing-matrix.md
@@ -14,7 +14,7 @@ flowchart TD
change --> cli["CLI args or output boundary work"]
change --> bench["Bench or perf-surface work"]
change --> e2e["Repo-scale semantic or cold-start behavior"]
- docs --> docs_checks["markdown/link checks + any touched doc contracts"]
+ docs --> docs_checks["readback + git diff --check"]
always --> workspace["fmt, check, targeted tests, clippy"]
indexer --> fidelity["fidelity_regression, tictactoe_language_coverage, integration"]
store --> store_tests["cargo test -p codestory-store"]
@@ -40,11 +40,11 @@ These are the default checks for any contributor change.
If you only changed `README.md` or `docs/**`, use the smallest credible lane:
```sh
-cargo fmt --check
-cargo test -p codestory-cli --test onboarding_contracts
+git diff --check
```
-Only escalate to broader cargo checks if the doc change depends on new code behavior or command output.
+Read the changed pages back before finishing. Only escalate to broader Cargo
+checks if the doc change depends on new code behavior or command output.
## Indexer And Graph Fidelity
diff --git a/docs/glossary.md b/docs/glossary.md
index 461860e..60fb6e2 100644
--- a/docs/glossary.md
+++ b/docs/glossary.md
@@ -1,20 +1,28 @@
# Glossary
-- grounding: the process of turning indexed code state into concise, relevant context for a question or tool action
-- snapshot: a derived SQLite-backed grounding view that can be rebuilt from the primary graph tables
-- projection: derived persisted data such as callable projection state or ranked grounding summaries
-- staged snapshot: the temporary SQLite database built during full refresh before publish replaces the live cache
-- refresh baseline: the persisted file inventory and metadata used to decide what an incremental refresh must index or remove
-- trail: a focused graph walk rooted at one symbol, usually caller/callee or neighborhood oriented
-- runtime: the orchestration surface that coordinates project opening, indexing, search, grounding, trail generation, and system actions
-- workspace: the manifest plus filesystem discovery layer that decides which files belong to the project
-- contracts: shared graph, DTO, and event types that are safe to depend on across boundaries
-- repo-text hit: a direct file-content match surfaced alongside indexed-symbol search results
-- retrieval mode: retrieval status contract for sidecar evidence; `retrieval_mode=full` is required for agent packet/search readiness
-- symbol doc: deterministic generated per-symbol text stored in SQLite for graph-native lexical retrieval; it is not embedded by default
-- dense anchor: a policy-selected symbol, component report, or unstructured doc that receives a vector embedding
-- local navigation readiness: the local cache, graph, lexical index, and DB-backed navigation commands are usable
-- agent packet/search readiness: sidecar packet/search evidence is trustworthy only when retrieval status reports `retrieval_mode=full`
-- target context: DB-first evidence for one concrete target; not a replacement for broad packet, search, or drill questions
-- semantic ready: local diagnostic state where dense-anchor retrieval is enabled, an embedding runtime is available when dense anchors exist, and persisted dense anchors match the active policy; not agent packet/search readiness
-- cache root: the directory that owns one project cache; by default this is under the user cache directory, but `--cache-dir` can override it
+## Readiness
+
+- **local navigation readiness**: SQLite cache, graph, and DB-backed browse commands (`ground`, `report`, `files`, `trail`, `snippet`, `context --id`, etc.) are usable
+- **agent packet/search readiness**: sidecars are healthy and `retrieval_mode=full`; required for trustworthy `packet`, `search`, and query-based candidate discovery
+- **retrieval mode**: sidecar status contract; only `full` serves agent packet/search
+- **semantic ready**: dense-anchor embedding state matches policy; not the same as agent packet/search readiness
+
+## Index and graph
+
+- **grounding**: indexed context returned for a question or command, with source ties
+- **snapshot**: derived grounding view rebuilt from graph tables
+- **projection**: persisted derived state such as callable projection state or ranked summaries
+- **staged snapshot**: temporary DB during full refresh before publish
+- **refresh baseline**: file inventory used to plan incremental refresh
+- **trail**: focused graph walk from one symbol
+- **symbol doc**: deterministic per-symbol search text in SQLite; not embedded by default
+- **dense anchor**: symbol, component report, or doc selected for vector embedding
+- **repo-text hit**: raw file-content match; diagnostic, not a substitute for graph evidence
+
+## System
+
+- **runtime**: orchestrates indexing, grounding, trails, packet/search flows, and system actions
+- **workspace**: manifest and discovery layer for which files belong to the project
+- **contracts**: shared graph types, DTOs, and events across crates
+- **target context**: DB-first bundle for one concrete target (`context --id` or bookmark), not broad `packet`
+- **cache root**: directory for one project cache; override with `--cache-dir`
diff --git a/docs/ops/retrieval-sidecars.md b/docs/ops/retrieval-sidecars.md
index a3651ea..1c88049 100644
--- a/docs/ops/retrieval-sidecars.md
+++ b/docs/ops/retrieval-sidecars.md
@@ -1,21 +1,22 @@
# Retrieval sidecars — Operations runbook
-Local Zoekt, Qdrant, and SCIP indexer processes for sidecar packet retrieval. Data directories
-live under the CodeStory user cache; ports are fixed for local dev and CI smoke.
-
-This runbook covers the `agent_packet_search` readiness lane. Sidecar readiness
-is required before agent-facing `packet` and `search` output can be trusted.
-Local SQLite navigation is a separate `local_navigation` lane: `codestory-cli
-index`, `ground`, `symbol`, `trail`, `snippet`, `explore`, `context`, `files`,
-and `affected` can be useful with a healthy local cache, but that cache alone
-does not prove packet/search sidecar readiness.
-
-**Design reference:** [`retrieval-design.md`](../architecture/retrieval-design.md)
-(mode definitions, cost envelopes, promotion guards).
-
-**Operations reference:** this runbook owns setup commands, version pins, env
-vars, troubleshooting, and CI smoke sequences. Proof tiers and promotion
-checklists live in [`retrieval-architecture.md`](../testing/retrieval-architecture.md).
+Local Zoekt, Qdrant, SCIP, and llama.cpp processes for agent `packet` and
+`search`. Data dirs live under the user cache; default ports are 6070 (Zoekt)
+and 6333 (Qdrant).
+
+Required for `agent_packet_search` readiness (`retrieval_mode=full`). A healthy
+SQLite cache alone does not satisfy that lane.
+
+Design: [`retrieval-design.md`](../architecture/retrieval-design.md).
+Promotion checks: [`retrieval-architecture.md`](../testing/retrieval-architecture.md).
+
+```mermaid
+flowchart LR
+ cli[codestory-cli] --> zoekt["Zoekt localhost:6070"]
+ cli --> qdrant["Qdrant localhost:6333"]
+ cli --> scip[SCIP artifacts in user cache]
+ cli --> embed[llama.cpp embedding endpoint]
+```
---
@@ -34,11 +35,13 @@ checklists live in [`retrieval-architecture.md`](../testing/retrieval-architectu
From the CodeStory repository root (Windows, macOS, Linux):
```sh
-cargo retrieval-setup
+cargo run -p codestory-cli -- retrieval bootstrap --project . --format json
```
This starts or checks the local sidecar services for the CodeStory checkout; it
does not by itself finalize the retrieval manifest for every target workspace.
+The `--project .` is intentional here. For another repo, pass that repo path to
+`--project`.
Plain `codestory-cli index` builds the core SQLite code index only. It can make
the local navigation lane usable, but it does not generate sidecar artifacts or
@@ -55,7 +58,7 @@ node scripts/setup-retrieval-env.mjs --fetch-embed-model
export CODESTORY_EMBED_MODEL_DIR="$(pwd)/target/retrieval-models"
export CODESTORY_EMBED_BACKEND="llamacpp"
export CODESTORY_EMBED_LLAMACPP_URL="http://127.0.0.1:8080/v1/embeddings"
-cargo retrieval-setup
+./target/release/codestory-cli retrieval bootstrap --project --format json
./target/release/codestory-cli index --project --refresh full
./target/release/codestory-cli retrieval index --project --refresh full
./target/release/codestory-cli retrieval status --project --format json
@@ -74,12 +77,11 @@ Qdrant component as policy-skipped rather than querying a missing collection.
Status after bootstrap:
```sh
-cargo retrieval-status
+cargo run -p codestory-cli -- retrieval status --project . --format json
```
-Aliases are defined in [`.cargo/config.toml`](../../.cargo/config.toml). They run
-`codestory retrieval bootstrap --project .` and `retrieval status --project .`, building the CLI
-when needed.
+Optional aliases are defined in [`.cargo/config.toml`](../../.cargo/config.toml).
+They wrap the same `--project .` bootstrap and status commands.
**Bootstrap flags** (via `cargo run -p codestory-cli -- retrieval bootstrap ...`):
@@ -102,7 +104,7 @@ node scripts/setup-retrieval-env.mjs --with-holdout-clone
|------|---------|
| `--check-only` | Prerequisites report only; exit 1 if required tools missing |
| `--skip-compose` | Passed to bootstrap |
-| `--skip-build` | Skip `cargo build` (alias still builds on first `cargo retrieval-setup`) |
+| `--skip-build` | Skip `cargo build` when the wrapper invokes bootstrap directly |
| `--with-holdout-clone` | Also run `scripts/fetch-holdout-repos.mjs` (large git clones under `target/`) |
When `--fetch-embed-model` is present, the wrapper downloads
@@ -128,7 +130,7 @@ Compose file: [`docker/retrieval-compose.yml`](../../docker/retrieval-compose.ym
| Dependency | Pin policy | Pinned version | Notes |
|------------|------------|----------------|-------|
-| Zoekt real (Phase 2) | `COMPOSE_PROFILES=real` | `zoekt-20250506123554` | `sourcegraph/zoekt-webserver:0.0.0-20250506123554-490422d1adb4` + lexical shards |
+| Zoekt real | `COMPOSE_PROFILES=real` | `zoekt-20250506123554` | `sourcegraph/zoekt-webserver:0.0.0-20250506123554-490422d1adb4` + lexical shards |
| Qdrant | Fixed container image tag | `qdrant/qdrant:v1.12.5` | HTTP `6333`, gRPC `6334` |
| SCIP | CodeStory graph artifact emitter | `graph-` | Generated local graph artifacts under the sidecar generation |
@@ -257,7 +259,7 @@ Managed `setup embeddings` output is not a substitute for this lane: it may
install local semantic assets, but it does not start llama.cpp, build the
retrieval manifest, or make `retrieval status` report `full`.
-**Phase 2 (shipped in crate):**
+**Shipped component status:**
| Component | Status |
|-----------|--------|
@@ -280,7 +282,8 @@ diagnostic only and never produce `retrieval_mode=full`.
- `CODESTORY_EMBED_MODEL_DIR=/target/retrieval-models`
- `CODESTORY_EMBED_BACKEND=llamacpp` (recommended explicit product mode; unset is also product mode for retrieval commands)
- `CODESTORY_EMBED_LLAMACPP_URL=http://127.0.0.1:8080/v1/embeddings`
-3. `cargo retrieval-setup` (starts Qdrant, Zoekt webserver, `codestory-embed` on `:8080`)
+3. `./target/release/codestory-cli retrieval bootstrap --project --format json`
+ starts Qdrant, Zoekt webserver, and `codestory-embed` on `:8080`.
4. Dim smoke: `curl -s http://127.0.0.1:8080/v1/embeddings -H "Content-Type: application/json" -d "{\"input\":[\"function\"]}"` → embedding length **768**
5. `retrieval index --project --refresh full` (manifest records `embedding_backend`, `embedding_dim`, `sidecar_input_hash`, `sidecar_generation`, the generated Qdrant collection, `symbol_doc_count`, `dense_projection_count`, `semantic_policy_version`, `graph_artifact_hash`, and dense reason counts; the input hash includes symbol-doc and dense-anchor metadata plus the embedding contract)
6. `retrieval status` → `retrieval_mode: full` and `capabilities.semantic=true`
@@ -322,7 +325,7 @@ count is zero, Qdrant reuse is skipped explicitly and cannot mask stale graph/le
./target/release/codestory-cli retrieval down
```
-### Standalone query (Phase 2+)
+### Standalone Query
```sh
./target/release/codestory-cli retrieval query "ExtensionService" --project .
@@ -342,7 +345,7 @@ GGUF embedding model.
1. `retrieval up` - exit 0
2. `retrieval status` - JSON with expected shape; non-`full` status is a failure for agent use
3. `retrieval index --project ` - manifest row in SQLite only when all sidecars are real
-4. `retrieval query ""` - Phase 2+
+4. `retrieval query ""` - standalone sidecar query
5. `retrieval down` - clean shutdown
**CI reduced sequence:**
diff --git a/docs/testing/benchmark-ledger.md b/docs/testing/benchmark-ledger.md
index 957109a..c9792b1 100644
--- a/docs/testing/benchmark-ledger.md
+++ b/docs/testing/benchmark-ledger.md
@@ -1,9 +1,8 @@
# CodeStory Benchmark Ledger
-This ledger keeps the decision-grade scorecard and detailed benchmark history
-that is too dense for the README. Treat every row as machine-, cache-, runner-,
-and date-specific. Promote only rows that pass the current harness gates
-documented below.
+Decision-grade scorecard and benchmark history - too dense for the README.
+Treat every row as machine-, cache-, runner-, and date-specific. Do not quote a
+row as a universal savings claim without checking harness tier and setup.
Runs recorded before the 2026-05-24 harness tightening are historical unless
they are reanalyzed or rerun with answer-level expected-file/symbol recall,
@@ -173,7 +172,7 @@ mismatches. Warm stdio task medians ranged from `2.69s` to `3.60s`, with an
aggregate task median of `3.13s`; cold CLI task medians ranged from `4.22s` to
`5.76s`, with an aggregate task median of `4.86s`.
-## Methodology
+## Harness Contract
The agent A/B harness runs the same repository prompt in two arms:
diff --git a/docs/testing/retrieval-architecture.md b/docs/testing/retrieval-architecture.md
index 5237e6e..102d1ae 100644
--- a/docs/testing/retrieval-architecture.md
+++ b/docs/testing/retrieval-architecture.md
@@ -12,7 +12,7 @@ env vars, CI smoke), [`../architecture/retrieval-design.md`](../architecture/ret
---
-## Implemented stack (Phases 0–5)
+## Implemented Stack
| Layer | Location | Role |
|-------|----------|------|
@@ -107,12 +107,11 @@ anchors such as repository names, specific source paths, and manifest-specific
symbols. Keep those strings in manifests, tests, benchmark harnesses, or the
test-only eval probe module.
-## Fast CI-style checks (automated in Phase 6)
+## Required Checks
```sh
cargo test -p codestory-runtime --test retrieval_generalization_guard
node --test scripts/tests/codestory-agent-ab-analyzer.test.mjs
-cargo test -p codestory-cli --test onboarding_contracts
```
Optional broader lane:
@@ -125,10 +124,10 @@ node --test scripts/tests/codestory-agent-ab-analyzer.test.mjs
---
-## Promotion checklist
+## Promotion Checklist
-Status as of Phase 6 documentation pass. **Benchmark pass columns require a human run** with
-repos, sidecars, and release CLI — not claimed here.
+**Benchmark pass columns require a human run** with repos, sidecars, and release
+CLI. This page records the gates; it does not claim those rows have passed.
### Language support audit alignment
@@ -137,7 +136,7 @@ tests in the branch. Do not infer support for languages without direct benchmark
| Item | Status | Notes |
|------|--------|-------|
-| Phases 0–5 code landed | done | See implemented stack above |
+| Core sidecar stack | done | See implemented stack above |
| Architecture / design docs | done | `docs/architecture/retrieval-design.md` |
| Sidecar runbook | done | `docs/ops/retrieval-sidecars.md` |
| Local-real manifests | done | `benchmarks/tasks/local-real/` |
@@ -145,14 +144,13 @@ tests in the branch. Do not infer support for languages without direct benchmark
| `freelancer` / `traderotate` removed from default holdouts | done | OSS holdouts only |
| Generalization lint + guard test | done | `lint-retrieval-generalization.mjs`, `retrieval_generalization_guard` |
| Warning config | done | `docs/architecture/retrieval-rollback.json` |
-| Markdown link contract (`onboarding_contracts`) | verify | `cargo test -p codestory-cli --test onboarding_contracts` |
| local-real cold packet + north-star SLOs | **human** | p99 retrieval, quality 3/4, wall targets |
| holdout-retrieval pass without skip allowances | **human** | Requires materialized OSS repos + index; no generalized claim without required recall/quality/forbidden-claim thresholds |
| `agent_value_gap` < 0.20 | **human** | Measure from a fresh coherent bundle |
| Windows `retrieval-sidecar-smoke` CI job | fail-closed sidecar smoke | [`retrieval-sidecars.md`](../ops/retrieval-sidecars.md#preflight-smoke-contract) |
| Ragas/Phoenix nightly eval | optional | Not configured |
-### North-star SLOs (targets — measure before claiming pass)
+### North-Star SLOs
| Metric | Target |
|--------|--------|
@@ -173,13 +171,13 @@ tests in the branch. Do not infer support for languages without direct benchmark
---
-## Rollback drill (REQ-RES-005)
+## Rollback Warning Drill
After promotion runs, verify rollback warnings:
1. Point `retrieval_rollback` at a baseline `packet-runtime-summary.json` with thresholds that will trip on the current summary (or use unit test `rollback_drill_warns_without_setting_legacy_env` in `retrieval_rollback.rs`).
2. Confirm `check_and_log_rollback_warnings` logs trigger ids without setting `CODESTORY_RETRIEVAL=0`.
-3. File a one-line incident note in this doc with date and trigger id if rollback fires in production promotion.
+3. Record the trigger id with the promotion evidence if rollback fires during production promotion.
**One-shot operator drill (after each promotion run):**
@@ -189,7 +187,9 @@ cargo test -p codestory-runtime retrieval_rollback::tests::rollback_drill_warns_
Expect rollback warnings only when configured thresholds fire (see `docs/architecture/retrieval-rollback.json`). Sidecar retrieval remains mandatory.
-**Closure status (2026-05-27, semantic promotion pass):** Phase A shipped (bge-base 768-d, llama.cpp `embed` compose service, manifest `embedding_backend`/`embedding_dim`, Qdrant collection migration, llamacpp dim hard-fail). Local `retrieval status` reaches `full` with default 768-d vectors after Qdrant re-index. Sidecar-primary is the intended product path, but product promotion remains gated until fresh benchmark evidence passes.
+**Promotion note:** Local `retrieval status` can report `full` after Qdrant
+re-index. Sidecar-primary is the intended product path, but product promotion
+still requires fresh benchmark evidence.
---
diff --git a/docs/usage.md b/docs/usage.md
index 3bd8702..da3dad5 100644
--- a/docs/usage.md
+++ b/docs/usage.md
@@ -1,17 +1,12 @@
# CodeStory Usage
-This is the operator guide. It keeps setup, common workflows, retrieval defaults,
-and recovery notes in one place.
-
-Examples use POSIX shell syntax unless a block is labeled PowerShell. On
-Windows, use `.\target\release\codestory-cli.exe` for the release binary,
-`$env:NAME = "value"` for environment variables, and Windows paths when that is
-the workspace you are indexing.
+Setup, workflows, sidecars, recovery. Shell examples are POSIX unless noted.
+Windows: `.\target\release\codestory-cli.exe`, `$env:NAME = "value"`.
## Install The Skill
Install the grounding skill once, then point it at explicit target workspaces.
-See [README — Install As An Agent Skill](../README.md#install-as-an-agent-skill)
+See [README - Install as an agent skill](../README.md#install-as-an-agent-skill)
for the full copy/setup commands and Windows PowerShell variant.
The source skill package lives at
@@ -42,31 +37,19 @@ TARGET_WORKSPACE="/path/to/repo"
## Readiness Tracks
-CodeStory has two readiness tracks. Keep them separate when deciding whether an
-agent can rely on packet/search output.
-
-### Local navigation/cache readiness
-
-This lane is for local browsing and source navigation. It uses the project
-SQLite cache built by `index` and read by commands such as `ground`, `symbol`,
-`trail`, `snippet`, `explore`, `context`, `files`, and `affected`.
-
-`doctor` may report this lane as `local_navigation`. Local navigation readiness
-means the local cache, graph, lexical index, and DB-backed navigation commands
-are usable. It does not prove agent packet/search readiness.
-
-### Agent packet/search sidecar readiness
+Two lanes - do not mix them when judging `packet` or `search` output.
-This lane is for agent-facing `packet` and `search` evidence. It requires the
-sidecar retrieval stack to be built and healthy: Zoekt lexical shards, Qdrant
-semantic vectors, SCIP graph artifacts, the llama.cpp query embedding endpoint,
-and a current retrieval manifest.
+| | Local navigation | Agent packet/search |
+| --- | --- | --- |
+| Lane id | `local_navigation` | `agent_packet_search` |
+| Built by | `index` | `index` then `retrieval index` |
+| Requires | Healthy SQLite cache and graph | Sidecars healthy and `retrieval_mode=full` |
+| Commands | `ground`, `report`, `files`, `symbol`, `trail`, `snippet`, `explore`, `context --id`, `affected` | `packet`, `search`, query-based candidate discovery |
+| Does not prove | Sidecar readiness | That cache-only browse is enough for agent search |
-`doctor` may report this lane as `agent_packet_search`. Agent packet/search
-readiness means sidecar packet/search evidence is trustworthy only when
-retrieval status reports `retrieval_mode: "full"`. Missing, stale, stubbed,
-hash-vector, or non-product sidecar state is diagnostic only and must not be
-described as agent packet/search readiness.
+`doctor` reports lane status. Sidecar topology:
+[architecture/overview.md](architecture/overview.md),
+[ops/retrieval-sidecars.md](ops/retrieval-sidecars.md).
## Common Workflows
@@ -76,19 +59,12 @@ described as agent packet/search readiness.
codestory-cli doctor --project
codestory-cli index --project --refresh full
codestory-cli ground --project --why
-codestory-cli report --project --output-file out/codestory-report.md
-codestory-cli report --project --format json --output-file out/codestory-graph.json
+codestory-cli report --project --output-file codestory-report.md
+codestory-cli report --project --format json --output-file codestory-graph.json
```
-Use this when the repository is new to the agent. `doctor` tells you whether the
-cache and retrieval state are usable. `ground --why` gives broad orientation and
-reports limited coverage or gaps. `report` reads the current SQLite store
-without refreshing it and emits generated artifacts: Markdown for repo summary,
-hotspots, entry points, bridge/high-connectivity nodes, and next queries; JSON
-for automation that needs the full current graph, including nodes, edges,
-confidence/certainty, source locations, and generation metadata. `--limit`
-bounds the Markdown report sections, not the full JSON graph export. Treat both
-files as outputs to regenerate, not source-of-truth state.
+Health check, orientation, optional report and graph export. Regenerate reports
+after index changes; they are artifacts, not source-of-truth state.
### I need evidence for a broad question
@@ -96,32 +72,31 @@ files as outputs to regenerate, not source-of-truth state.
codestory-cli packet --project --question "" --budget compact
```
-Use `packet` for questions like "how does routing work?" or "what owns indexing
-state?" It returns a `sufficient`, `partial`, or `blocked` status with
-citations, trust limits, gaps, and follow-up commands. If the packet is
-`partial` or `blocked`, follow the named source-truth commands instead of
-opening unstructured source files directly. Treat `sufficient` as evidence
-coverage, not final answer-quality proof.
+Returns `sufficient`, `partial`, or `blocked` with citations and follow-ups.
+Requires `retrieval_mode=full`.
### I need to understand one symbol or file
+With full sidecar readiness, use `search` for candidate discovery:
+
```sh
-codestory-cli search --project --query "" --why
-codestory-cli explore --project --id --no-tui
+codestory-cli search --project --query "" --why
codestory-cli trail --project --id --story --hide-speculative
codestory-cli snippet --project --id --context 40
```
-Start with `search`, pick a concrete `node-id`, then inspect the relationships
-and source. Use `context` when you want a bundled handoff around that target:
+Without sidecars, stay on the local navigation lane until you have a concrete
+target:
```sh
-codestory-cli context --project --id --bundle out/context-name
+codestory-cli ground --project --why
+codestory-cli report --project --output-file codestory-report.md
+codestory-cli files --project --path src --limit 80
```
-Target context is DB-first evidence for one concrete target. `context` is
-target-first; it is not an open chat endpoint and is not a replacement for broad
-`packet`, `search`, or `drill` questions.
+Then use `symbol`, `trail`, `snippet`, or `context --id` with an exact node id
+from local output. Do not treat `search` or `context --query` as cache-only
+fallbacks; query-based discovery is part of the agent packet/search lane.
### I changed files and need likely impact
@@ -129,12 +104,9 @@ target-first; it is not an open chat endpoint and is not a replacement for broad
codestory-cli index --project --refresh incremental
codestory-cli affected --project --format markdown
git diff --name-only HEAD | codestory-cli affected --project --stdin --format json
-git diff --name-status HEAD | codestory-cli affected --project --stdin --stdin-format name-status --format json
```
-Treat `affected` as test-selection evidence, not a replacement for tests. The
-default command preserves git name-status records; path-only stdin remains
-available when another tool already chose the file list.
+`affected` lists impacted symbols and test hints; it does not replace tests.
### The cache or local navigation looks stale
@@ -144,13 +116,11 @@ codestory-cli index --project --refresh full
codestory-cli doctor --project
```
-If `doctor` reports stale inventory, dense-anchor contract mismatch, missing
-managed assets, or a non-`full` retrieval mode, fix that layer before
-investigating answer quality. Treat the health report as the first source of
-truth for cache and retrieval state.
+Fix inventory or indexing errors before trusting local navigation output. If
+`packet`, `search`, or `context --query` reports `retrieval_unavailable`, repair
+the sidecar lane instead of repeating the same command.
-For agent-facing packet/search recovery, use the full sidecar repair sequence
-that `ready --goal agent` reports:
+### For agent-facing packet/search recovery
```sh
codestory-cli retrieval bootstrap --project --format json
@@ -159,12 +129,8 @@ codestory-cli retrieval status --project --format json
codestory-cli doctor --project --format markdown
```
-When the core index is missing, stale, unchecked, or has recorded fatal indexing
-errors, `ready` reports the necessary `codestory-cli index` repair first.
-Otherwise, sidecar recovery does not need to repeat a full core reindex.
-`retrieval bootstrap` prepares or checks the local sidecar services. The target
-workspace is not packet/search-ready until `retrieval index` writes a current
-target manifest and `doctor` or `retrieval status` reports `retrieval_mode=full`.
+The target state is `retrieval_mode=full`. Core index problems may require
+`index` first; see `ready --goal agent`.
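
Automation can apply the `retrieval_mode=full` gate mechanically to the output of `retrieval status --format json`. The following is a minimal sketch, not part of CodeStory: the helper name `isPacketSearchReady` is hypothetical, and it assumes `retrieval_mode` is a top-level string field in the status JSON, which the README implies but this section does not spell out.

```javascript
// Hypothetical helper, not shipped with codestory-cli.
// Assumption: the status JSON exposes `retrieval_mode` as a top-level string.
function isPacketSearchReady(statusJsonText) {
  let status;
  try {
    status = JSON.parse(statusJsonText);
  } catch {
    // Unparseable output is treated as "not ready" rather than thrown.
    return false;
  }
  // Only the exact value "full" counts as packet/search-ready.
  return (
    status !== null &&
    typeof status === "object" &&
    status.retrieval_mode === "full"
  );
}
```

A wrapper script could pass the captured stdout of the status command to this function and skip `packet` or `search` calls when it returns `false`, instead of retrying the same command.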
## Core Commands
@@ -178,15 +144,16 @@ target manifest and `doctor` or `retrieval status` reports `retrieval_mode=full`
SQLite store; use `--output-file` to keep artifacts separate from terminal
logs.
- `packet`: bounded broad-task evidence packet with citations, budget usage,
- gaps, and follow-up commands.
-- `search`: candidate discovery for symbols, files, literals, API paths,
- modules, and behavior terms.
+ gaps, and follow-up commands; requires agent packet/search readiness.
+- `search`: sidecar-backed candidate discovery for symbols, files, literals,
+ API paths, modules, and behavior terms.
- `symbol`: inspect one exact symbol and relationships.
- `trail`: follow caller, callee, and reference relationships around a symbol.
- `snippet`: fetch source context around a symbol.
- `explore`: bundled navigation packet or terminal explorer around a target.
-- `context`: deep evidence bundle for one concrete target selected by `--id`,
- `--query`, or `--bookmark`.
+- `context`: deep evidence bundle for one concrete target. `--id` and
+ `--bookmark` are exact-target paths; `--query` must be treated like
+ sidecar-backed discovery.
- `affected`: map changed files to impacted symbols and likely tests.
- `files`: inspect indexed file inventory, language counts, roles, and coverage
notes.
@@ -230,9 +197,15 @@ reset, schema change, or suspected stale-state incident.
## Predictable Output Modes
-Most commands default to Markdown for human review. Use `--format json` when automation needs the complete structured result, including exact field comparisons such as `retrieval_mode` or cache paths. Use `--output-file ` when the artifact should live outside terminal logs. The parent directory must already exist.
+Most commands default to Markdown for human review. Use `--format json` when
+automation needs the complete structured result, including exact field
+comparisons such as `retrieval_mode` or cache paths. Use `--output-file `
+when the artifact should live outside terminal logs. The parent directory must
+already exist.
-`explore` opens the terminal UI by default when a TUI is available. Use `--no-tui`, `--plain`, or `CODESTORY_NO_TUI=1` for predictable command output in agent runs, tests, non-interactive terminals, and CI logs.
+`explore` opens the terminal UI by default when a TUI is available. Use
+`--no-tui`, `--plain`, or `CODESTORY_NO_TUI=1` for predictable command output in
+agent runs, tests, non-interactive terminals, and CI logs.
Agent-facing Markdown may start with `Status`, `Trust`, `Next Action`, and
`Proof Tier` before dense citations. Use `search --why --plan-details` only when
@@ -264,7 +237,7 @@ node scripts/setup-retrieval-env.mjs --fetch-embed-model
export CODESTORY_EMBED_MODEL_DIR="$(pwd)/target/retrieval-models"
export CODESTORY_EMBED_BACKEND="llamacpp"
export CODESTORY_EMBED_LLAMACPP_URL="http://127.0.0.1:8080/v1/embeddings"
-cargo retrieval-setup
+codestory-cli retrieval bootstrap --project --format json
codestory-cli index --project --refresh full
codestory-cli retrieval index --project --refresh full
@@ -279,9 +252,10 @@ with SHA-256
`ad1afe72cd6654a558667a3db10878b049a75bfd72912e1dabb91310d671173c`; all
configured mirrors must pass the same check.
-Run `codestory-cli retrieval index` only after the local sidecar services,
-llama.cpp embedding endpoint, and `bge-base-en-v1.5` model configuration are
-ready, then require `retrieval status --format json` to report
+Run `codestory-cli retrieval bootstrap` for the same target workspace you will
+query. Then run `codestory-cli retrieval index` only after the local sidecar
+services, llama.cpp embedding endpoint, and `bge-base-en-v1.5` model
+configuration are ready. Require `retrieval status --format json` to report
`retrieval_mode: "full"` before trusting agent-facing packet/search evidence.
The status JSON also reports `query_embedding_backend`,
`manifest_vector_embedding_backend`, and `stored_doc_vector_producer_backend`
@@ -381,7 +355,7 @@ Typical recovery flow:
```sh
codestory-cli doctor --project
codestory-cli index --project --refresh full
-codestory-cli search --project --query WorkspaceIndexer
+codestory-cli ground --project --why
```
If the cache directory itself is suspect, get the exact project cache path from
@@ -424,10 +398,10 @@ cargo test
cargo clippy --all-targets -- -D warnings
```
-Focused docs/onboarding lane:
+Docs-only lane:
```sh
-cargo test -p codestory-cli --test onboarding_contracts
+git diff --check
```
Release-blocking fidelity lanes:
diff --git a/scripts/tests/codestory-agent-ab-analyzer.test.mjs b/scripts/tests/codestory-agent-ab-analyzer.test.mjs
index e6162fb..46a2c9f 100644
--- a/scripts/tests/codestory-agent-ab-analyzer.test.mjs
+++ b/scripts/tests/codestory-agent-ab-analyzer.test.mjs
@@ -262,7 +262,7 @@ test("categorizes commands without treating source paths as cli invocations", ()
);
assert.equal(commandCategory("Get-Content crates/codestory-cli/src/main.rs"), "direct_file_read");
assert.equal(commandCategory("Get-Content C:\\tools\\codestory-cli.exe"), "direct_file_read");
- assert.equal(commandCategory("cargo test -p codestory-cli --test onboarding_contracts"), "build_test");
+ assert.equal(commandCategory("cargo test -p codestory-cli --test runtime_backed_flows"), "build_test");
});
test("packet gate retries only transient sidecar packet failures", async () => {
@@ -697,8 +697,8 @@ test("analyzes transcript command friction and scores manifest anchors", () => {
commandEvent("cmd_7", "item.completed", `$p='"'crates/codestory-runtime/src/lib.rs'; Get-Content $p`, "pub struct RuntimeContext;"),
commandEvent("cmd_5", "item.started", "git status --short"),
commandEvent("cmd_5", "item.completed", "git status --short", ""),
- commandEvent("cmd_6", "item.started", "cargo test -p codestory-cli --test onboarding_contracts"),
- commandEvent("cmd_6", "item.completed", "cargo test -p codestory-cli --test onboarding_contracts", "ok"),
+ commandEvent("cmd_6", "item.started", "cargo test -p codestory-cli --test runtime_backed_flows"),
+ commandEvent("cmd_6", "item.completed", "cargo test -p codestory-cli --test runtime_backed_flows", "ok"),
{
type: "item.completed",
item: {
@@ -724,7 +724,7 @@ test("analyzes transcript command friction and scores manifest anchors", () => {
id: "fixture",
task_class: "architecture_explanation",
expected_files: ["crates/codestory-cli/src/main.rs"],
- expected_verification_files: ["crates/codestory-cli/tests/onboarding_contracts.rs"],
+ expected_verification_files: ["crates/codestory-cli/tests/runtime_backed_flows.rs"],
expected_symbols: ["RuntimeContext::ensure_open", "MissingSymbol"],
expected_claims: ["Full indexing starts"],
forbidden_claims: ["remote service is required"],
@@ -744,7 +744,7 @@ test("analyzes transcript command friction and scores manifest anchors", () => {
assert.deepEqual(quality.missed_anchors.symbols, ["MissingSymbol"]);
assert.equal(quality.expected_verification_files.recall, 0);
assert.deepEqual(quality.missed_anchors.verification_files, [
- "crates/codestory-cli/tests/onboarding_contracts.rs",
+ "crates/codestory-cli/tests/runtime_backed_flows.rs",
]);
assert.equal(quality.citation_coverage.recall, 1);
});