diff --git a/README.md b/README.md
index 4dc9f71..16a02c3 100644
--- a/README.md
+++ b/README.md
@@ -10,32 +10,42 @@ Local codebase grounding for coding agents. Benchmarks

-CodeStory builds a local evidence layer for a repository. It indexes files,
-symbols, relationships, snippets, search state, and freshness notes into a
-per-project SQLite cache, then exposes that evidence through a CLI and
-`serve --stdio`.
-
-Use it when a coding agent needs repository context before explaining behavior,
-planning a change, or choosing files to inspect. The workflow is explicit: check
-cache health, build or refresh the index, find candidate symbols, inspect
-relationships, pull snippets, and return an answer tied to source evidence.
-
-Repository contents and inference stay local after the required tool or model
-assets are installed. Setup can fetch the CodeStory source artifact or managed
-embedding assets; the indexed project data stays in the user cache and commands
-stay explicit about which workspace they read.
-
-## Public Promise
-
-CodeStory is a local evidence layer for repositories, not an automatic
-correctness guarantee. It gives operators and coding agents explicit commands
-for cache health, indexing, search, trails, snippets, and source-backed answers
-that name the files they used. The per-project SQLite cache is separate from
-the optional local retrieval sidecars used by packet/search workflows; a healthy
-local navigation readiness report does not by itself prove agent packet/search
-readiness and does not by itself prove sidecar readiness. Benchmark notes are
-environment- and repository-specific evidence, so public claims should cite the
-checked setup instead of promising universal speedups or savings.
+**Situation.** You are in a repo with more files than anyone holds in memory.
+The agent needs to change behavior that spans packages - routing, indexing,
+auth, whatever - not rename a variable in the one file already open.
+
+**Task.** Find the symbol that owns the behavior, see who calls it, read the
+source that actually runs, and know what to touch next. Without treating
+`grep -R` as architecture.
+
+**Action.** CodeStory indexes the workspace into a local SQLite graph:
+files, symbols, calls, imports, snippets, search projections, and freshness
+notes. Use `doctor`, `index`, `ground`, and `report` for local navigation. Use
+`packet` and `search` after sidecars report `retrieval_mode: "full"`.
+
+**Result.** Work starts at a file and line you can open, not whichever match
+ranked first in ripgrep. Answers say what they used; gaps say when the index or
+sidecars are stale.
+
+```mermaid
+flowchart LR
+  Repo["workspace files"] --> Index["index"]
+  Index --> Store["local SQLite graph"]
+  Store --> Local["local: ground, report, files, trail, snippet"]
+  Store --> RetrievalIndex["retrieval index"]
+  RetrievalIndex --> Sidecars["Zoekt + Qdrant + SCIP + llama.cpp"]
+  Sidecars --> AgentSearch["agent: packet, search"]
+```
+
+## What You Get
+
+| Need | Use |
+| --- | --- |
+| "Where do I start?" | `doctor`, `index`, `ground`, `report` |
+| "What does this symbol touch?" | `symbol`, `trail`, `snippet` |
+| "What changed and what might break?" | `affected` |
+| "Answer a broad repo question with citations." | `packet` with full sidecars |
+| "Find candidates by behavior term, path, symbol, or literal." | `search` with full sidecars |

 ## Try It On A Repo

@@ -51,158 +61,73 @@ TARGET_WORKSPACE="/path/to/repo"
 "$CODESTORY_CLI" index --project "$TARGET_WORKSPACE" --refresh full
 "$CODESTORY_CLI" ground --project "$TARGET_WORKSPACE" --why
 "$CODESTORY_CLI" report --project "$TARGET_WORKSPACE" --output-file codestory-report.md
-"$CODESTORY_CLI" report --project "$TARGET_WORKSPACE" --format json --output-file codestory-graph.json
 ```

 On Windows PowerShell, use `.\target\release\codestory-cli.exe`, environment
-assignments such as `$env:NAME = "value"`, and normal Windows paths such as
-`C:\path\to\repo`.
-
-That basic path establishes local navigation readiness: the local cache, graph,
-lexical index, and DB-backed navigation commands are usable for health, file,
-symbol, trail, snippet, context, orientation checks, and derived report/export
-artifacts.
-`report` reads the current SQLite store and writes generated artifacts; the
-Markdown report and full JSON graph export are not source-of-truth state. The managed
-embedding dry-run is a local semantic setup check; it does not prove agent
-packet/search readiness.
-
-Agent packet/search readiness requires `retrieval_mode=full` from local Zoekt,
-Qdrant, SCIP, and llama.cpp sidecars. See [docs/usage.md](docs/usage.md) for the
-full local-navigation versus sidecar-readiness split and
-[docs/ops/retrieval-sidecars.md](docs/ops/retrieval-sidecars.md) for sidecar
-setup.
+assignments such as `$env:TARGET_WORKSPACE = "C:\path\to\repo"`, and normal
+Windows paths.
+
+That path proves local navigation readiness. It does not prove sidecar readiness
+for `packet` or `search`.

-After that first index, use narrower commands instead of asking the agent to
-start over:
+For agent-facing packet/search evidence, build and verify sidecars first:

 ```sh
+"$CODESTORY_CLI" retrieval bootstrap --project "$TARGET_WORKSPACE" --format json
+"$CODESTORY_CLI" retrieval index --project "$TARGET_WORKSPACE" --refresh full
+"$CODESTORY_CLI" retrieval status --project "$TARGET_WORKSPACE" --format json
+"$CODESTORY_CLI" packet --project "$TARGET_WORKSPACE" --question "what owns request routing?"
 "$CODESTORY_CLI" search --project "$TARGET_WORKSPACE" --query "request routing" --why
-"$CODESTORY_CLI" trail --project "$TARGET_WORKSPACE" --id --story --hide-speculative
-"$CODESTORY_CLI" snippet --project "$TARGET_WORKSPACE" --id --context 40
 ```

-A good CodeStory-backed answer should name the source files it used, say when
-evidence is stale or partial, and give the next concrete command when more proof
-is needed.
-
-For task-shaped flows, use [docs/usage.md](docs/usage.md).
-
-## Retrieval sidecars
-
-For Zoekt/Qdrant/SCIP packet retrieval, run `cargo retrieval-setup` once from
-this repository root, then follow
-[docs/ops/retrieval-sidecars.md](docs/ops/retrieval-sidecars.md) for bootstrap
-flags, version pins, and troubleshooting.
+`retrieval status` must report `retrieval_mode: "full"` before trusting
+`packet` or `search`. See
+[docs/usage.md](docs/usage.md) for task-shaped flows and
+[docs/ops/retrieval-sidecars.md](docs/ops/retrieval-sidecars.md) for sidecar
+setup and repair.

 ## Install As An Agent Skill

-Use this path when CodeStory should be installed once as a grounding skill and
-then pointed at whatever repository an agent is working on.
-
-```sh
-SkillHome=""
-mkdir -p "$SkillHome"
-cp -R ./.agents/skills/codestory-grounding "$SkillHome/codestory-grounding"
-bash "$SkillHome/codestory-grounding/scripts/setup.sh"
-```
-
-On Windows PowerShell:
-
-```powershell
-$SkillHome = ""
-New-Item -ItemType Directory -Force -Path $SkillHome | Out-Null
-Copy-Item -Recurse -Force .\.agents\skills\codestory-grounding "$SkillHome\codestory-grounding"
-& "$SkillHome\codestory-grounding\scripts\setup.ps1"
-```
-
-The setup script prints `CODESTORY_CLI=`. Persist that path if your agent
-environment does not preserve variables between sessions.
+Copy [`.agents/skills/codestory-grounding`](.agents/skills/codestory-grounding) to
+your skill directory. Run `scripts/setup.sh` or `scripts/setup.ps1`. See
+[`.agents/skills/codestory-grounding/SKILL.md`](.agents/skills/codestory-grounding/SKILL.md).

-The skill package lives at
-[.agents/skills/codestory-grounding/SKILL.md](.agents/skills/codestory-grounding/SKILL.md).
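The README text in this diff makes `retrieval status` reporting `retrieval_mode: "full"` the gate for trusting `packet` or `search`. That check can be sketched as a small Rust helper. This is a minimal sketch, not CodeStory's implementation. It assumes the JSON printed by `retrieval status --format json` carries a top-level string field named `retrieval_mode`, and `retrieval_mode` / `sidecars_ready` are hypothetical helper names. It scans the text instead of parsing JSON so the example has no dependencies. A real integration would use a proper JSON parser.

```rust
// Hypothetical helpers; CodeStory's real status schema may differ.
// Assumes `retrieval status --format json` prints a top-level string field
// such as `"retrieval_mode": "full"`.

/// Returns the value of the first `"retrieval_mode"` string field, if any.
/// This is a naive text scan (no JSON parser) that ignores nesting and escapes.
fn retrieval_mode(status_json: &str) -> Option<&str> {
    let key = "\"retrieval_mode\"";
    let rest = &status_json[status_json.find(key)? + key.len()..];
    // Expect optional whitespace, a colon, more whitespace, then a quoted string.
    let value = rest
        .trim_start()
        .strip_prefix(':')?
        .trim_start()
        .strip_prefix('"')?;
    let end = value.find('"')?;
    Some(&value[..end])
}

/// Only trust `packet` / `search` output when sidecars report full mode.
fn sidecars_ready(status_json: &str) -> bool {
    retrieval_mode(status_json) == Some("full")
}

fn main() {
    let samples = [
        r#"{"retrieval_mode": "full"}"#,
        r#"{"retrieval_mode":"partial"}"#, // hypothetical non-full value
        r#"{}"#,
    ];
    for status in samples {
        println!("{status} -> ready: {}", sidecars_ready(status));
    }
}
```

A missing field and any value other than `"full"` both count as "not ready", which matches the README's conservative rule.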
+## Commands
-
-## Core Flow
-
-| Need | Command |
+| Task | Command |
 | --- | --- |
-| Local navigation readiness | `codestory-cli doctor --project ` |
-| Build or refresh an index | `codestory-cli index --project --refresh full` |
-| Broad orientation | `codestory-cli ground --project --why` |
-| Repo report / graph export | `codestory-cli report --project --format markdown` |
-| Broad task evidence (requires full sidecar retrieval) | `codestory-cli packet --project --question "" --budget compact` |
-| Candidate discovery (requires full sidecar retrieval) | `codestory-cli search --project --query "" --why` |
-| Exact symbol evidence | `codestory-cli symbol --project --id ` |
-| Flow evidence | `codestory-cli trail --project --id --story --hide-speculative` |
-| Source excerpt | `codestory-cli snippet --project --id ` |
-| Bundled navigation packet | `codestory-cli explore --project --id --no-tui` |
-| Deep context bundle | `codestory-cli context --project --id ` |
-| Changed-file impact | `codestory-cli affected --project --format markdown` |
-| Persistent read surface | `codestory-cli serve --project --stdio` |
-
-Use `packet` for broad task questions once `ready --goal agent` reports full
-sidecar retrieval. For local cache-only inspection, start with `ground`,
-`report`, or `doctor`, then use `symbol`, `trail`, `snippet`, or `context` after
-you have a concrete target. Use `doctor` when output looks stale, incomplete, or
-inconsistent.
-
-## What It Builds
-
-```mermaid
-flowchart LR
-  Repo["repository"] --> Workspace["workspace discovery"]
-  Workspace --> Indexer["symbol and edge extraction"]
-  Indexer --> Store["SQLite store"]
-  Store --> Runtime["retrieval and context assembly"]
-  Runtime --> CLI["CLI and stdio reads"]
-  CLI --> Agent["coding agent"]
-```
-
-CodeStory builds a local evidence layer so agents can request grounded context
-instead of relying on ad hoc file reads.
-
-## Language Support Claims
-
-CodeStory separates parser-backed graph indexing, regression-tested accuracy,
-structural extraction, framework route coverage, and agent packet/search
-readiness. The current contract is documented in
+| Cache health | `doctor --project ` |
+| Index | `index --project --refresh full` |
+| Orientation | `ground --project --why` |
+| Lookup with sidecars | `search --project --query "..." --why` |
+| Call graph | `trail --project --id --story` |
+| Source | `snippet --project --id ` |
+| Target bundle | `context --project --id ` |
+| Task packet with sidecars | `packet --project --question "..."` |
+| Persistent reads | `serve --project --stdio` |
+
+## Language Support
+
+CodeStory separates parser-backed graph coverage, structural collectors,
+regression-tested fidelity, and agent packet/search readiness. The current
+contract is documented in
 [docs/architecture/language-support.md](docs/architecture/language-support.md).

-In short: Python, Java, Rust, JavaScript, TypeScript/TSX, C++, C, Go, Ruby,
-PHP, C#, Kotlin, Swift, Dart, and Bash are fidelity-gated parser-backed graph
-languages; HTML, CSS, and SQL use structural collectors.
-
-The opt-in OSS language corpus pairs each public language-support profile with a
-pinned medium-sized open source project and compares raw filesystem counts
-against CodeStory indexing of the same files:
-[docs/testing/oss-language-corpus.md](docs/testing/oss-language-corpus.md).
-The separate `language-expansion-holdout` benchmark suite runs strict
-`without_codestory` versus `with_codestory` agent tasks on those pinned
-projects and records elapsed time, token usage, estimated cost, tool calls,
-command counts, source reads, post-packet source reads, and quality gates.
-
-For the system model, start with
-[docs/concepts/how-codestory-works.md](docs/concepts/how-codestory-works.md),
-then [docs/architecture/overview.md](docs/architecture/overview.md).
+Python, Java, Rust, JavaScript, TypeScript/TSX, C++, C, Go, Ruby, PHP, C#,
+Kotlin, Swift, Dart, and Bash are fidelity-gated parser-backed graph languages.
+HTML, CSS, and SQL use structural collectors.

 ## Evidence

-The benchmark docs are deliberately cautious. They separate current checked-in
-benchmark history from the state of your local cache, which can drift and should
-be checked with `doctor`.
-
-- Public evidence summary and caveats:
-  [docs/testing/benchmark-ledger.md](docs/testing/benchmark-ledger.md)
-- Repo-scale timing history:
-  [docs/testing/codestory-e2e-stats-log.md](docs/testing/codestory-e2e-stats-log.md)
-- Warm stdio loop evidence:
-  [docs/testing/codestory-stdio-warm-loop-stats.md](docs/testing/codestory-stdio-warm-loop-stats.md)
-- Repeatable with/without harness:
-  [`scripts/codestory-agent-ab-benchmark.mjs`](scripts/codestory-agent-ab-benchmark.mjs)
+Benchmark notes are environment- and repository-specific evidence. Do not turn
+one row into a universal savings claim.

-Do not promote a single benchmark row into a universal savings claim.
+- Scorecard and caveats: [docs/testing/benchmark-ledger.md](docs/testing/benchmark-ledger.md)
+- Repo-scale timing history: [docs/testing/codestory-e2e-stats-log.md](docs/testing/codestory-e2e-stats-log.md)
+- Warm stdio loop history: [docs/testing/codestory-stdio-warm-loop-stats.md](docs/testing/codestory-stdio-warm-loop-stats.md)
+- Repeatable with/without harness: [`scripts/codestory-agent-ab-benchmark.mjs`](scripts/codestory-agent-ab-benchmark.mjs)

-## Hack On CodeStory
+## Contributing

 Start with the contributor docs, then run Cargo checks serially because this
 workspace shares build locks.
@@ -210,14 +135,17 @@ workspace shares build locks.
 - [docs/contributors/getting-started.md](docs/contributors/getting-started.md)
 - [docs/contributors/debugging.md](docs/contributors/debugging.md)
 - [docs/contributors/testing-matrix.md](docs/contributors/testing-matrix.md)
+- [docs/architecture/overview.md](docs/architecture/overview.md)
 - [docs/architecture/runtime-execution-path.md](docs/architecture/runtime-execution-path.md)
-- [docs/architecture/language-support.md](docs/architecture/language-support.md)
-- [docs/architecture/subsystems/contracts.md](docs/architecture/subsystems/contracts.md)
-- [docs/architecture/subsystems/workspace.md](docs/architecture/subsystems/workspace.md)
-- [docs/architecture/subsystems/indexer.md](docs/architecture/subsystems/indexer.md)
-- [docs/architecture/subsystems/store.md](docs/architecture/subsystems/store.md)
-- [docs/architecture/subsystems/runtime.md](docs/architecture/subsystems/runtime.md)
-- [docs/architecture/subsystems/cli.md](docs/architecture/subsystems/cli.md)
+
+## Docs Map
+
+- Usage: [docs/usage.md](docs/usage.md)
+- Concepts: [docs/concepts/how-codestory-works.md](docs/concepts/how-codestory-works.md)
+- Architecture: [docs/architecture/overview.md](docs/architecture/overview.md)
+- Languages: [docs/architecture/language-support.md](docs/architecture/language-support.md)
+- Benchmarks: [docs/testing/benchmark-ledger.md](docs/testing/benchmark-ledger.md)
+- Contributing: [docs/contributors/getting-started.md](docs/contributors/getting-started.md)

 ## License
diff --git a/crates/codestory-cli/tests/onboarding_contracts.rs b/crates/codestory-cli/tests/onboarding_contracts.rs
deleted file mode 100644
index 436efc9..0000000
--- a/crates/codestory-cli/tests/onboarding_contracts.rs
+++ /dev/null
@@ -1,492 +0,0 @@
-use std::fs;
-use std::path::{Path, PathBuf};
-use std::process::Command;
-
-fn repo_root() -> PathBuf {
-    PathBuf::from(env!("CARGO_MANIFEST_DIR"))
-        .parent()
-        .expect("cli crate has workspace parent")
-        .parent()
-        .expect("workspace root exists")
-        .to_path_buf()
-}
-
-fn collect_markdown_files(dir: &Path, files: &mut Vec) {
-    for entry in fs::read_dir(dir).expect("read markdown dir") {
-        let entry = entry.expect("markdown entry");
-        let path = entry.path();
-        if path.is_dir() {
-            collect_markdown_files(&path, files);
-            continue;
-        }
-        if path.extension().and_then(|ext| ext.to_str()) == Some("md") {
-            files.push(path);
-        }
-    }
-}
-
-fn extract_markdown_links(contents: &str) -> Vec {
-    let mut links = Vec::new();
-    let bytes = contents.as_bytes();
-    let mut index = 0;
-    while index + 1 < bytes.len() {
-        if bytes[index] == b']' && bytes[index + 1] == b'(' {
-            let mut end = index + 2;
-            while end < bytes.len() && bytes[end] != b')' {
-                end += 1;
-            }
-            if end < bytes.len() {
-                links.push(contents[index + 2..end].trim().to_string());
-                index = end;
-            }
-        }
-        index += 1;
-    }
-    links
-}
-
-fn normalize_local_link_target(raw: &str) -> Option {
-    let target = raw.trim().trim_matches(|ch| ch == '<' || ch == '>');
-    if target.is_empty()
-        || target.starts_with('#')
-        || target.starts_with("http://")
-        || target.starts_with("https://")
-        || target.starts_with("mailto:")
-        || target.starts_with("app://")
-        || target.starts_with("plugin://")
-    {
-        return None;
-    }
-
-    Some(
-        target
-            .split_once('#')
-            .map(|(path, _)| path)
-            .unwrap_or(target)
-            .to_string(),
-    )
-}
-
-fn assert_public_doc_avoids_agent_specific_framing(file: &Path, contents: &str) {
-    let lowered = contents.to_lowercase();
-    for blocked in [
-        "codegraph",
-        "codex-first",
-        "codex first",
-        "global codex",
-        "for codex users",
-        ".codex/skills",
-        ".codex\\skills",
-    ] {
-        assert!(
-            !lowered.contains(blocked),
-            "public doc should not contain `{blocked}`: {}",
-            file.display()
-        );
-    }
-}
-
-fn extract_inline_toml_string_array(manifest: &str, key: &str) -> Vec {
-    let prefix = format!("{key} = [");
-    let line = manifest
-        .lines()
-        .find(|line| line.trim_start().starts_with(&prefix))
-        .unwrap_or_else(|| panic!("manifest should contain inline array `{key}`"));
-    let values = line
-        .trim()
-        .strip_prefix(&prefix)
-        .and_then(|value| value.strip_suffix(']'))
-        .unwrap_or_else(|| panic!("manifest should use inline string array for `{key}`"));
-
-    values
-        .split(',')
-        .map(|value| value.trim().trim_matches('"').to_string())
-        .filter(|value| !value.is_empty())
-        .collect()
-}
-
-#[test]
-fn cli_package_metadata_is_adoption_ready() {
-    let root = repo_root();
-    let manifest_path = root.join("crates/codestory-cli/Cargo.toml");
-    let manifest = fs::read_to_string(&manifest_path).expect("CLI manifest should exist");
-
-    for required in [
-        "description = \"Local repository evidence and grounding CLI for source-backed coding workflows.\"",
-        "license = \"Apache-2.0\"",
-        "repository = \"https://github.com/TheGreenCedar/CodeStory.git\"",
-        "readme = \"../../README.md\"",
-    ] {
-        assert!(
-            manifest.contains(required),
-            "CLI package metadata should include `{required}`"
-        );
-    }
-
-    let readme_from_manifest = manifest_path
-        .parent()
-        .expect("CLI manifest should have parent")
-        .join("../../README.md");
-    assert_eq!(
-        fs::canonicalize(readme_from_manifest).expect("manifest readme path should resolve"),
-        fs::canonicalize(root.join("README.md")).expect("repo README should resolve"),
-        "CLI package readme should point at the repository README"
-    );
-
-    let keywords = extract_inline_toml_string_array(&manifest, "keywords");
-    assert_eq!(
-        keywords,
-        vec!["code-search", "grounding", "cli", "agents"],
-        "keywords should stay conservative and adoption-oriented"
-    );
-    assert!(
-        keywords.len() <= 5,
-        "crates.io accepts at most five package keywords"
-    );
-    for keyword in keywords {
-        assert!(
-            keyword.len() <= 20
-                && keyword
-                    .chars()
-                    .all(|ch| ch.is_ascii_alphanumeric() || ch == '-'),
-            "keyword should stay crates.io-compatible: {keyword}"
-        );
-    }
-
-    let categories = extract_inline_toml_string_array(&manifest, "categories");
-    assert_eq!(
-        categories,
-        vec!["command-line-utilities", "development-tools"],
-        "categories should stay accurate and crates.io-compatible"
-    );
-}
-
-#[test]
-fn readme_keeps_customer_first_onboarding() {
-    let root = repo_root();
-    let readme = fs::read_to_string(root.join("README.md")).expect("README should exist");
-    assert!(readme.contains("Public Promise"));
-    assert!(readme.contains("Try It On A Repo"));
-    assert!(readme.contains("What It Builds"));
-    assert!(readme.contains("Local codebase grounding for coding agents"));
-    assert!(readme.contains("Install As An Agent Skill"));
-    assert!(readme.contains("Core Flow"));
-    assert!(readme.contains("Hack On CodeStory"));
-    assert!(readme.contains("A good CodeStory-backed answer should name"));
-    assert!(readme.contains("local evidence layer for repositories"));
-    assert!(readme.contains("explicit commands"));
-    assert!(readme.contains("source-backed answers"));
-    assert!(readme.contains("per-project SQLite cache is separate"));
-    assert!(readme.contains("local retrieval sidecars"));
-    assert!(readme.contains("does not by itself prove sidecar readiness"));
-    assert!(readme.contains("environment- and repository-specific evidence"));
-    assert!(readme.contains("instead of promising universal speedups or savings"));
-    assert!(readme.contains("benchmark history"));
-    assert!(readme.contains("checked with `doctor`"));
-    assert!(readme.contains(".agents/skills/codestory-grounding/SKILL.md"));
-    assert!(readme.contains("docs/usage.md"));
-    assert!(readme.contains("docs/concepts/how-codestory-works.md"));
-    assert!(readme.contains("docs/architecture/language-support.md"));
-    assert!(readme.contains("docs/testing/benchmark-ledger.md"));
-    assert!(readme.contains(
-        r#""$CODESTORY_CLI" setup embeddings --project "$TARGET_WORKSPACE" --dry-run --format json"#
-    ));
-    assert!(readme.contains("serve --stdio"));
-    assert!(readme.contains("docs/architecture/overview.md"));
-    assert!(readme.contains("docs/contributors/debugging.md"));
-    assert!(readme.contains("docs/contributors/testing-matrix.md"));
-    assert!(
-        readme.find("Try It On A Repo").expect("quickstart section")
-            < readme.find("Evidence").expect("evidence section"),
-        "README should show the usable path before benchmark evidence"
-    );
-
-    for path in [
-        "docs/usage.md",
-        "docs/concepts/how-codestory-works.md",
-        "docs/architecture/overview.md",
-        "docs/architecture/runtime-execution-path.md",
-        "docs/architecture/language-support.md",
-        "docs/architecture/subsystems/contracts.md",
-        "docs/architecture/subsystems/workspace.md",
-        "docs/architecture/subsystems/indexer.md",
-        "docs/architecture/subsystems/runtime.md",
-        "docs/architecture/subsystems/store.md",
-        "docs/architecture/subsystems/cli.md",
-        "docs/contributors/getting-started.md",
-        "docs/contributors/debugging.md",
-        "docs/contributors/testing-matrix.md",
-        ".agents/skills/codestory-grounding/scripts/setup.ps1",
-        ".agents/skills/codestory-grounding/scripts/setup.sh",
-        "scripts/codestory-agent-ab-benchmark.mjs",
-    ] {
-        assert!(
-            root.join(path).exists(),
-            "expected onboarding doc to exist: {path}"
-        );
-    }
-
-    for path in [
-        ".agents/skills/codestory-grounding/scripts/setup.ps1",
-        ".agents/skills/codestory-grounding/scripts/setup.sh",
-    ] {
-        let setup = fs::read_to_string(root.join(path)).expect("read setup script");
-        assert!(
-            !setup.contains("DEFAULT_CODESTORY_REPO_REF"),
-            "setup script should not pin a stale default CLI source ref: {path}"
-        );
-        assert!(
-            setup.contains("CODESTORY_REPO_REF"),
-            "setup script should keep explicit source-ref override support: {path}"
-        );
-        assert!(
-            setup.contains("origin/HEAD"),
-            "setup script should build the remote default branch when no ref is explicit: {path}"
-        );
-    }
-}
-
-#[test]
-fn docs_drift_contracts_keep_living_sources_explicit() {
-    let root = repo_root();
-    let readme = fs::read_to_string(root.join("README.md")).expect("README should exist");
-    let usage = fs::read_to_string(root.join("docs/usage.md")).expect("usage doc should exist");
-    let testing_matrix =
-        fs::read_to_string(root.join("docs/contributors/testing-matrix.md"))
-            .expect("testing matrix should exist");
-    let language_support = fs::read_to_string(root.join("docs/architecture/language-support.md"))
-        .expect("language support doc should exist");
-    let benchmark_scorecard = fs::read_to_string(root.join("docs/testing/benchmark-ledger.md"))
-        .expect("benchmark ledger should exist");
-
-    assert!(
-        readme.contains(
-            r#""$CODESTORY_CLI" setup embeddings --project "$TARGET_WORKSPACE" --dry-run --format json"#
-        ),
-        "README quickstart should show first-run semantic setup dry-run"
-    );
-    assert!(
-        !usage.contains("semantic_doc_scope = \"durable\""),
-        "usage config example should omit the default durable semantic scope"
-    );
-    for accepted_scope in ["`all`", "`full`", "`all-symbols`", "`all_symbols`"] {
-        assert!(
-            usage.contains(accepted_scope),
-            "usage docs should name accepted all-symbol semantic_doc_scope value {accepted_scope}"
-        );
-    }
-    assert!(
-        testing_matrix.contains("latest row in")
-            && testing_matrix.contains("codestory-e2e-stats-log.md")
-            && testing_matrix.contains("historical")
-            && testing_matrix.contains("examples only"),
-        "testing matrix should point current timing claims at the living stats log"
-    );
-    assert!(
-        !testing_matrix.contains("The 2026-04-18 repo-scale baseline"),
-        "testing matrix should not present an old hard-coded baseline as current"
-    );
-    assert!(
-        benchmark_scorecard.contains("## Current Scorecard")
-            && benchmark_scorecard.contains("codestory-e2e-stats-log.md"),
-        "benchmark ledger should keep the scorecard and living timing log references"
-    );
-    for required in [
-        "parser-backed graph",
-        "fidelity-gated",
-        "structural collector",
-        "candidate parser compatibility record",
-        "Go, Ruby, PHP, C#, Kotlin, Swift, Dart, Bash",
-        "Kotlin, Swift, Dart, Bash",
-    ] {
-        assert!(
-            language_support.contains(required),
-            "language support doc should preserve support-claim term `{required}`"
-        );
-    }
-    for required in [
-        "crates/codestory-contracts/src/language_support.rs",
-        "language_support_profile_for_ext",
-        "language_support_profile_for_language_name",
-        "get_language_for_ext",
-    ] {
-        assert!(
-            language_support.contains(required),
-            "language support docs should mention `{required}`"
-        );
-    }
-    assert!(
-        testing_matrix.contains("../architecture/language-support.md"),
-        "testing matrix should link the language support claim contract"
-    );
-    assert!(
-        root.join("docs/testing/benchmark-ledger.md").exists(),
-        "benchmark ledger should preserve detailed historical rows"
-    );
-}
-
-#[test]
-fn public_docs_avoid_competitor_and_agent_specific_framing() {
-    let root = repo_root();
-    let mut files = vec![root.join("README.md")];
-    collect_markdown_files(&root.join("docs"), &mut files);
-    collect_markdown_files(&root.join(".agents/skills/codestory-grounding"), &mut files);
-
-    for file in files {
-        let contents = fs::read_to_string(&file).expect("read public doc");
-        assert_public_doc_avoids_agent_specific_framing(&file, &contents);
-    }
-}
-
-#[test]
-fn usage_doc_keeps_agent_contract_terms_out_of_operator_flow() {
-    let root = repo_root();
-    let usage = fs::read_to_string(root.join("docs/usage.md")).expect("usage doc should exist");
-    assert!(usage.contains("Common Workflows"));
-    assert!(usage.contains("I need a repo overview"));
-    assert!(usage.contains("I need evidence for a broad question"));
-    assert!(usage.contains("The cache or local navigation looks stale"));
-    assert!(usage.contains("For agent-facing packet/search recovery"));
-    assert!(usage.contains(
-        "codestory-cli retrieval index --project --refresh full --format json"
-    ));
-    for blocked in [
-        "sufficiency.avoid_opening",
-        "supported-claim wording",
-        "claim-ledger",
-        "Support files",
-    ] {
-        assert!(
-            !usage.contains(blocked),
-            "operator usage doc should not expose agent-internal contract term {blocked}"
-        );
-    }
-}
-
-#[test]
-fn usage_doc_names_two_readiness_tracks_and_predictable_output_modes() {
-    let root = repo_root();
-    let usage = fs::read_to_string(root.join("docs/usage.md")).expect("usage doc should exist");
-
-    assert!(usage.contains("## Readiness Tracks"));
-    assert!(usage.contains("### Local navigation/cache readiness"));
-    assert!(usage.contains("### Agent packet/search sidecar readiness"));
-    assert!(usage.contains("`local_navigation`"));
-    assert!(usage.contains("`agent_packet_search`"));
-    assert!(usage.contains("`retrieval_mode: \"full\"`"));
-    assert!(usage.contains("## Predictable Output Modes"));
-    assert!(usage.contains("Most commands default to Markdown"));
-    assert!(
-        usage.contains("Use `--format json` when automation needs the complete structured result")
-    );
-    assert!(usage.contains("Use `--output-file `"));
-    assert!(usage.contains("The parent directory must already exist"));
-    assert!(usage.contains("`explore` opens the terminal UI by default"));
-    assert!(usage.contains("Use `--no-tui`"));
-    assert!(
-        usage
-            .find("## Readiness Tracks")
-            .expect("readiness heading")
-            < usage
-                .find("## Retrieval Defaults")
-                .expect("retrieval defaults heading"),
-        "usage should introduce readiness tracks before retrieval defaults"
-    );
-}
-
-#[test]
-fn benchmark_docs_show_proof_tier_ladder() {
-    let root = repo_root();
-    let benchmark_scorecard = fs::read_to_string(root.join("docs/testing/benchmark-ledger.md"))
-        .expect("benchmark ledger should exist");
-
-    assert!(benchmark_scorecard.contains("## Proof Tier Ladder"));
-    for tier in [
-        "Stats-only local regression signal",
-        "Full sidecar readiness proof",
-        "Real-repo drill proof",
-        "Promotion-grade benchmark proof",
-    ] {
-        assert!(
-            benchmark_scorecard.contains(tier),
-            "benchmark ledger should explain proof tier {tier}"
-        );
-    }
-    assert!(benchmark_scorecard.contains("Full sidecar readiness, agent packet/search readiness"));
-    assert!(benchmark_scorecard.contains("`retrieval_mode: \"full\"`"));
-    assert!(benchmark_scorecard.contains("Generalized agent savings"));
-    assert!(
-        benchmark_scorecard
-            .find("## Proof Tier Ladder")
-            .expect("proof tier ladder")
-            < benchmark_scorecard
-                .find("## Promotion Rules")
-                .expect("promotion rules"),
-        "proof tier ladder should frame promotion rules"
-    );
-}
-
-#[test]
-fn markdown_links_resolve_to_existing_local_files() {
-    let root = repo_root();
-    let mut markdown_files = vec![root.join("README.md")];
-    collect_markdown_files(&root.join("docs"), &mut markdown_files);
-
-    for file in markdown_files {
-        let contents = fs::read_to_string(&file).expect("read markdown file");
-        for link in extract_markdown_links(&contents) {
-            let Some(target) = normalize_local_link_target(&link) else {
-                continue;
-            };
-            let resolved = file.parent().expect("markdown file parent").join(target);
-            assert!(
-                resolved.exists(),
-                "broken markdown link in {} -> {}",
-                file.display(),
-                resolved.display()
-            );
-        }
-    }
-}
-
-#[test]
-fn codestory_grounding_skill_command_refs_track_cli_commands() {
-    let root = repo_root();
-    let skill_root = root.join(".agents/skills/codestory-grounding");
-    let commands = [
-        "index", "ground", "doctor", "search", "symbol", "trail", "snippet", "query", "explore",
-        "bookmark", "context", "packet", "drill", "setup", "serve",
-    ];
-
-    for command in commands {
-        let reference = skill_root.join("references").join(format!("{command}.md"));
-        assert!(
-            reference.exists(),
-            "codestory-grounding should document `{command}` at {}",
-            reference.display()
-        );
-
-        let help = Command::new(env!("CARGO_BIN_EXE_codestory-cli"))
-            .arg(command)
-            .arg("--help")
-            .output()
-            .unwrap_or_else(|error| panic!("run `{command} --help`: {error}"));
-        assert!(
-            help.status.success(),
-            "`{command}` should remain a valid CLI subcommand\nstdout:\n{}\nstderr:\n{}",
-            String::from_utf8_lossy(&help.stdout),
-            String::from_utf8_lossy(&help.stderr)
-        );
-    }
-
-    for command in ["context", "bookmark", "doctor", "explore", "serve"] {
-        let reference =
-            fs::read_to_string(skill_root.join("references").join(format!("{command}.md")))
-                .expect("read command reference");
-        for required in ["Normal path", "Failure path", "Integration edge"] {
-            assert!(
-                reference.contains(required),
-                "`{command}` reference should include a {required} row"
-            );
-        }
-    }
-}
diff --git a/crates/codestory-indexer/src/resolution/mod.rs b/crates/codestory-indexer/src/resolution/mod.rs
index fba97ba..5669df1 100644
--- a/crates/codestory-indexer/src/resolution/mod.rs
+++ b/crates/codestory-indexer/src/resolution/mod.rs
@@ -2769,6 +2769,7 @@ mod tests {
                 kind INTEGER NOT NULL,
                 source_node_id INTEGER NOT NULL,
                 target_node_id INTEGER NOT NULL,
+                file_node_id INTEGER,
                 resolved_target_node_id INTEGER,
                 confidence REAL,
                 certainty TEXT,
@@ -2869,6 +2870,7 @@ mod tests {
                 kind INTEGER NOT NULL,
                 source_node_id INTEGER NOT NULL,
                 target_node_id INTEGER NOT NULL,
+                file_node_id INTEGER,
                 resolved_target_node_id INTEGER,
                 confidence REAL,
                 certainty TEXT,
diff --git a/docs/architecture/overview.md b/docs/architecture/overview.md
index 2c6640a..d7138aa 100644
--- a/docs/architecture/overview.md
+++ b/docs/architecture/overview.md
@@ -1,7 +1,8 @@
 # Architecture Overview

-CodeStory has one job: turn a repository into local evidence that a coding agent
-can query before relying on a small set of manually opened files.
+CodeStory turns a repository into local evidence a coding agent can query: files
+and symbols in SQLite, optional sidecar indexes for packet/search, thin CLI on
+top of `codestory-runtime`.

 The runtime path is:

diff --git a/docs/concepts/how-codestory-works.md b/docs/concepts/how-codestory-works.md
index b4d86b7..dafa508 100644
--- a/docs/concepts/how-codestory-works.md
+++ b/docs/concepts/how-codestory-works.md
@@ -1,84 +1,75 @@
 # How CodeStory Works

-CodeStory is a local evidence layer for codebases. It does not replace judgment,
-tests, or source reading. It makes the first pass more structured.
+CodeStory indexes a workspace into a local graph, then serves read commands
+against that graph. It does not replace tests or judgment; it structures the
+first pass.
+
+Command loop: [README - What You Get](../../README.md#what-you-get).
+Readiness lanes: [usage.md](../usage.md#readiness-tracks).
+
+```mermaid
+flowchart TD
+  Files["workspace files"] --> Plan["workspace discovery and refresh plan"]
+  Plan --> Parse["tree-sitter parsing and semantic resolution"]
+  Parse --> Graph["SQLite graph, occurrences, snippets, snapshots"]
+  Graph --> Local["local navigation commands"]
+  Graph --> SidecarBuild["retrieval index"]
+  SidecarBuild --> Sidecars["Zoekt, Qdrant, SCIP, llama.cpp"]
+  Sidecars --> Agent["agent packet/search commands"]
+```
+
+## What gets stored
+
+Per-project SQLite under your user cache, keyed by workspace path:
+
+| Stored | Purpose |
+| --- | --- |
+| File inventory and refresh metadata | Incremental re-index |
+| Graph nodes and edges | Calls, imports, overrides, references |
+| Snippets and occurrences | Source-backed reads |
+| Search projections and symbol docs | Lookup without opening every file |
+| Snapshots | Cached read models rebuilt from the graph |
+| Dense anchors (when policy selects them) | Sidecar vector search only |

-An agent usually fails on a large repo by over-weighting the first few files it
-opens. CodeStory gives that agent an indexed map before it explains behavior or
-plans a change.
+Repo content stays local. Managed setup may fetch tool assets; indexed evidence
+does not leave the cache unless you copy it.

-## The Loop
+## The loop

 ```text
-doctor -> index -> ground -> search -> symbol/trail/snippet/explore -> context
+doctor -> index -> ground/report/files -> exact target -> trail/snippet/context
 ```

-- `doctor` checks whether the cache, index, retrieval mode, and local embedding
-  setup are usable.
-- `index` builds or refreshes local graph, search, snapshot, graph-native
-  symbol-doc, component-report, and selected dense-anchor state for one target
-  repository.
-- `ground` gives broad orientation and reports limited coverage or gaps.
-- `search` finds candidate files, symbols, routes, literals, modules, or behavior
-  terms.
-- `symbol`, `trail`, `snippet`, and `explore` inspect one selected target.
-- `context` bundles deeper evidence around that concrete target.
-- `packet` handles broad task questions and reports citations, gaps, and next
-  commands.
-
-The workflow is a repeatable evidence loop.
-
-## What Gets Stored
-
-CodeStory writes per-project state under the user cache, keyed by the target
-workspace path. The cache can include:
-
-- discovered files and refresh metadata
-- graph nodes for files, symbols, and related code elements
-- graph edges such as calls, imports, overrides, and references
-- source snippets and occurrence locations
-- search projection rows and local search indexes
-- grounding snapshots rebuilt from the graph
-- graph-native symbol docs, which are deterministic searchable summaries for
-  durable AST symbols
-- selected dense anchors, which are the only generated docs embedded as vectors
-  under the active semantic policy
-
-Repository data stays local. Managed setup may fetch tool or model assets, but
-the indexed project evidence lives in the local cache.
-
-## Key Terms
-
-- Grounding is source-backed context: the files, symbols, and summaries a command
-  returns so an answer can be tied back to repository evidence.
-- A symbol doc is deterministic generated text for a symbol, stored so lexical
-  and graph retrieval can find relevant code even when the query words are not
-  exact.
-- A dense anchor is a policy-selected symbol, component report, or unstructured
-  doc that receives a vector embedding. Code symbols do not need dense vectors
-  to be product-searchable.
-- A snapshot is a cached read model rebuilt from the local graph. If a snapshot
-  is stale, the tool should say so.
-- A trail is a focused graph walk around one symbol: callers, callees,
-  references, or neighborhood context.
-- A packet is a bounded evidence bundle for a broad task. It should include
-  citations, gaps, and follow-up commands.
-
-## What Good Looks Like
+Use `packet` and `search` after the sidecar lane reports
+`retrieval_mode: "full"`. Until then, keep local browsing on exact targets from
+`ground`, `report`, `files`, or existing node ids.
+
+## Terms
+
+| Term | Meaning |
+| --- | --- |
+| Grounding | Context tied back to indexed files and symbols |
+| Symbol doc | Generated searchable text for a symbol (lexical, not embedded by default) |
+| Dense anchor | Policy-selected symbol or report that gets a vector |
+| Snapshot | Derived read model; may be stale, and commands should say so |
+| Trail | Graph walk from one symbol: callers, callees, neighbors |
+| Packet | Bounded task evidence with citations, gaps, next commands |
+
+More: [glossary.md](../glossary.md).
+
+## What good output looks like

 A good CodeStory-backed answer does three things:

-1. It names the files, symbols, or snippets it used.
-2. It says when evidence is stale, partial, ambiguous, or missing.
-3. It gives the next concrete command when the current evidence is not enough.
+1. Names the files, symbols, snippets, or sidecar evidence it used.
+2. Says when evidence is stale, partial, ambiguous, or missing.
+3. Gives the next concrete command when the current evidence is not enough.

 The goal is not a more confident answer. The goal is confidence constrained by
 source evidence.

-## Where To Go Next
+## Related

-- Use [../usage.md](../usage.md) for command flows.
-- Use [../architecture/overview.md](../architecture/overview.md) for the system
-  boundary and crate model.
-- Use [../contributors/debugging.md](../contributors/debugging.md) when output
-  looks wrong.
+- [usage.md](../usage.md)
+- [architecture/overview.md](../architecture/overview.md)
+- [contributors/debugging.md](../contributors/debugging.md)
diff --git a/docs/contributors/testing-matrix.md b/docs/contributors/testing-matrix.md
index 01c0487..dbb71d2 100644
--- a/docs/contributors/testing-matrix.md
+++ b/docs/contributors/testing-matrix.md
@@ -14,7 +14,7 @@ flowchart TD
     change --> cli["CLI args or output boundary work"]
     change --> bench["Bench or perf-surface work"]
     change --> e2e["Repo-scale semantic or cold-start behavior"]
-    docs --> docs_checks["markdown/link checks + any touched doc contracts"]
+    docs --> docs_checks["readback + git diff --check"]
     always --> workspace["fmt, check, targeted tests, clippy"]
     indexer --> fidelity["fidelity_regression, tictactoe_language_coverage, integration"]
     store --> store_tests["cargo test -p codestory-store"]
@@ -40,11 +40,11 @@ These are the default checks for any contributor change.
 If you only changed `README.md` or `docs/**`, use the smallest credible lane:

 ```sh
-cargo fmt --check
-cargo test -p codestory-cli --test onboarding_contracts
+git diff --check
 ```

-Only escalate to broader cargo checks if the doc change depends on new code behavior or command output.
+Read the changed pages back before finishing. Only escalate to broader Cargo
+checks if the doc change depends on new code behavior or command output.

 ## Indexer And Graph Fidelity

diff --git a/docs/glossary.md b/docs/glossary.md
index 461860e..60fb6e2 100644
--- a/docs/glossary.md
+++ b/docs/glossary.md
@@ -1,20 +1,28 @@
 # Glossary

-- grounding: the process of turning indexed code state into concise, relevant context for a question or tool action
-- snapshot: a derived SQLite-backed grounding view that can be rebuilt from the primary graph tables
-- projection: derived persisted data such as callable projection state or ranked grounding summaries
-- staged snapshot: the temporary SQLite database built during full refresh before publish replaces the live cache
-- refresh baseline: the persisted file inventory and metadata used to decide what an incremental refresh must index or remove
-- trail: a focused graph walk rooted at one symbol, usually caller/callee or neighborhood oriented
-- runtime: the orchestration surface that coordinates project opening, indexing, search, grounding, trail generation, and system actions
-- workspace: the manifest plus filesystem discovery layer that decides which files belong to the project
-- contracts: shared graph, DTO, and event types that are safe to depend on across boundaries
-- repo-text hit: a direct file-content match surfaced alongside indexed-symbol search results
-- retrieval mode: retrieval status contract for sidecar evidence; `retrieval_mode=full` is required for agent packet/search readiness
-- symbol doc: deterministic generated per-symbol text stored in SQLite for graph-native lexical retrieval; it is not embedded by default
-- dense anchor: a policy-selected symbol, component report, or unstructured doc that receives a vector embedding
-- local navigation readiness: the local cache, graph, lexical index, and DB-backed navigation commands are usable
-- agent packet/search readiness: sidecar packet/search evidence is trustworthy only when retrieval status reports `retrieval_mode=full`
-- target context: DB-first evidence for one concrete target; not a replacement for broad packet, search, or drill questions
-- semantic ready: local diagnostic state where dense-anchor retrieval is enabled, an embedding runtime is available when dense anchors exist, and persisted dense anchors match the active policy; not agent packet/search readiness
-- cache root: the directory that owns one project cache; by default this is under the user cache directory, but `--cache-dir` can override it
+## Readiness
+
+- **local navigation readiness**: SQLite cache, graph, and DB-backed browse commands (`ground`, `report`, `files`, `trail`, `snippet`, `context --id`, etc.) are usable
+- **agent packet/search readiness**: sidecars are healthy and `retrieval_mode=full`; required for trustworthy `packet`, `search`, and query-based candidate discovery
+- **retrieval mode**: sidecar status contract; only `full` serves agent packet/search
+- **semantic ready**: dense-anchor embedding state matches policy; not the same as agent packet/search readiness
+
+## Index and graph
+
+- **grounding**: indexed context returned for a question or command, with source ties
+- **snapshot**: derived grounding view rebuilt from graph tables
+- **projection**: persisted derived state such as callable projection state or ranked summaries
+- **staged snapshot**: temporary DB during full refresh before publish
+- **refresh baseline**: file inventory used to plan incremental refresh
+- **trail**: focused graph walk from one symbol
+- **symbol doc**: deterministic per-symbol search text in SQLite; not embedded by default
+- **dense anchor**: symbol, component report, or doc selected for vector embedding
+- **repo-text hit**: raw file-content match; diagnostic, not a substitute for graph evidence
+
+## System
+
+- **runtime**: orchestrates indexing, grounding, trails, packet/search flows, and system actions
+- **workspace**: manifest and discovery layer for which files belong to the project
+- **contracts**: shared graph types, DTOs, and events across crates
+- **target context**: DB-first bundle for one concrete target (`context --id` or bookmark), not broad `packet`
+- **cache root**: directory for one project cache; override with `--cache-dir`
diff --git a/docs/ops/retrieval-sidecars.md b/docs/ops/retrieval-sidecars.md
index a3651ea..1c88049 100644
--- a/docs/ops/retrieval-sidecars.md
+++ b/docs/ops/retrieval-sidecars.md
@@ -1,21 +1,22 @@
 # Retrieval sidecars — Operations runbook

-Local Zoekt, Qdrant, and SCIP indexer processes for sidecar packet retrieval. Data directories
-live under the CodeStory user cache; ports are fixed for local dev and CI smoke.
-
-This runbook covers the `agent_packet_search` readiness lane. Sidecar readiness
-is required before agent-facing `packet` and `search` output can be trusted.
-Local SQLite navigation is a separate `local_navigation` lane: `codestory-cli
-index`, `ground`, `symbol`, `trail`, `snippet`, `explore`, `context`, `files`,
-and `affected` can be useful with a healthy local cache, but that cache alone
-does not prove packet/search sidecar readiness.
-
-**Design reference:** [`retrieval-design.md`](../architecture/retrieval-design.md)
-(mode definitions, cost envelopes, promotion guards).
-
-**Operations reference:** this runbook owns setup commands, version pins, env
-vars, troubleshooting, and CI smoke sequences. Proof tiers and promotion
-checklists live in [`retrieval-architecture.md`](../testing/retrieval-architecture.md).
+Local Zoekt, Qdrant, SCIP, and llama.cpp processes for agent `packet` and
+`search`. Data dirs live under the user cache; default ports are 6070 (Zoekt)
+and 6333 (Qdrant).
+
+Required for `agent_packet_search` readiness (`retrieval_mode=full`). A healthy
+SQLite cache alone does not satisfy that lane.
+
+Design: [`retrieval-design.md`](../architecture/retrieval-design.md).
+Promotion checks: [`retrieval-architecture.md`](../testing/retrieval-architecture.md).
+
+```mermaid
+flowchart LR
+  cli[codestory-cli] --> zoekt["Zoekt localhost:6070"]
+  cli --> qdrant["Qdrant localhost:6333"]
+  cli --> scip[SCIP artifacts in user cache]
+  cli --> embed[llama.cpp embedding endpoint]
+```

 ---

@@ -34,11 +35,13 @@ checklists live in [`retrieval-architecture.md`](../testing/retrieval-architectu
 From the CodeStory repository root (Windows, macOS, Linux):

 ```sh
-cargo retrieval-setup
+cargo run -p codestory-cli -- retrieval bootstrap --project . --format json
 ```

 This starts or checks the local sidecar services for the CodeStory checkout; it
 does not by itself finalize the retrieval manifest for every target workspace.
+The `--project .` is intentional here. For another repo, pass that repo path to
+`--project`.

 Plain `codestory-cli index` builds the core SQLite code index only. It can make
 the local navigation lane usable, but it does not generate sidecar artifacts or
@@ -55,7 +58,7 @@ node scripts/setup-retrieval-env.mjs --fetch-embed-model
 export CODESTORY_EMBED_MODEL_DIR="$(pwd)/target/retrieval-models"
 export CODESTORY_EMBED_BACKEND="llamacpp"
 export CODESTORY_EMBED_LLAMACPP_URL="http://127.0.0.1:8080/v1/embeddings"
-cargo retrieval-setup
+./target/release/codestory-cli retrieval bootstrap --project --format json
 ./target/release/codestory-cli index --project --refresh full
 ./target/release/codestory-cli retrieval index --project --refresh full
 ./target/release/codestory-cli retrieval status --project --format json
@@ -74,12 +77,11 @@ Qdrant component as policy-skipped rather than querying a missing collection.
 Status after bootstrap:

 ```sh
-cargo retrieval-status
+cargo run -p codestory-cli -- retrieval status --project . --format json
 ```

-Aliases are defined in [`.cargo/config.toml`](../../.cargo/config.toml). They run
-`codestory retrieval bootstrap --project .` and `retrieval status --project .`, building the CLI
-when needed.
+Optional aliases are defined in [`.cargo/config.toml`](../../.cargo/config.toml).
+They wrap the same project-dot bootstrap and status commands.

 **Bootstrap flags** (via `cargo run -p codestory-cli -- retrieval bootstrap ...`):

@@ -102,7 +104,7 @@ node scripts/setup-retrieval-env.mjs --with-holdout-clone
 |------|---------|
 | `--check-only` | Prerequisites report only; exit 1 if required tools missing |
 | `--skip-compose` | Passed to bootstrap |
-| `--skip-build` | Skip `cargo build` (alias still builds on first `cargo retrieval-setup`) |
+| `--skip-build` | Skip `cargo build` when the wrapper invokes bootstrap directly |
 | `--with-holdout-clone` | Also run `scripts/fetch-holdout-repos.mjs` (large git clones under `target/`) |

 When `--fetch-embed-model` is present, the wrapper downloads
@@ -128,7 +130,7 @@ Compose file: [`docker/retrieval-compose.yml`](../../docker/retrieval-compose.ym

 | Dependency | Pin policy | Pinned version | Notes |
 |------------|------------|----------------|-------|
-| Zoekt real (Phase 2) | `COMPOSE_PROFILES=real` | `zoekt-20250506123554` | `sourcegraph/zoekt-webserver:0.0.0-20250506123554-490422d1adb4` + lexical shards |
+| Zoekt real | `COMPOSE_PROFILES=real` | `zoekt-20250506123554` | `sourcegraph/zoekt-webserver:0.0.0-20250506123554-490422d1adb4` + lexical shards |
 | Qdrant | Fixed container image tag | `qdrant/qdrant:v1.12.5` | HTTP `6333`, gRPC `6334` |
 | SCIP | CodeStory graph artifact emitter | `graph-` | Generated local graph artifacts under the sidecar generation |

@@ -257,7 +259,7 @@ Managed `setup embeddings` output is not a substitute for this lane: it may
 install local semantic assets, but it does not start llama.cpp, build the
 retrieval manifest, or make `retrieval status` report `full`.

-**Phase 2 (shipped in crate):**
+**Shipped component status:**

 | Component | Status |
 |-----------|--------|
@@ -280,7 +282,8 @@ diagnostic only and never produce `retrieval_mode=full`.
    - `CODESTORY_EMBED_MODEL_DIR=/target/retrieval-models`
    - `CODESTORY_EMBED_BACKEND=llamacpp` (recommended explicit product mode; unset is also product mode for retrieval commands)
    - `CODESTORY_EMBED_LLAMACPP_URL=http://127.0.0.1:8080/v1/embeddings`
-3. `cargo retrieval-setup` (starts Qdrant, Zoekt webserver, `codestory-embed` on `:8080`)
+3. `./target/release/codestory-cli retrieval bootstrap --project --format json`
+   starts Qdrant, Zoekt webserver, and `codestory-embed` on `:8080`.
 4. Dim smoke: `curl -s http://127.0.0.1:8080/v1/embeddings -H "Content-Type: application/json" -d "{\"input\":[\"function\"]}"` → embedding length **768**
 5. `retrieval index --project --refresh full` (manifest records `embedding_backend`, `embedding_dim`, `sidecar_input_hash`, `sidecar_generation`, the generated Qdrant collection, `symbol_doc_count`, `dense_projection_count`, `semantic_policy_version`, `graph_artifact_hash`, and dense reason counts; the input hash includes symbol-doc and dense-anchor metadata plus the embedding contract)
 6. `retrieval status` → `retrieval_mode: full` and `capabilities.semantic=true`
@@ -322,7 +325,7 @@ count is zero, Qdrant reuse is skipped explicitly and cannot mask stale graph/le
 ./target/release/codestory-cli retrieval down
 ```

-### Standalone query (Phase 2+)
+### Standalone Query

 ```sh
 ./target/release/codestory-cli retrieval query "ExtensionService" --project .
@@ -342,7 +345,7 @@ GGUF embedding model.
 1. `retrieval up` - exit 0
 2. `retrieval status` - JSON with expected shape; non-`full` status is a failure for agent use
 3. `retrieval index --project ` - manifest row in SQLite only when all sidecars are real
-4. `retrieval query ""` - Phase 2+
+4. `retrieval query ""` - standalone sidecar query
 5. `retrieval down` - clean shutdown

 **CI reduced sequence:**
diff --git a/docs/testing/benchmark-ledger.md b/docs/testing/benchmark-ledger.md
index 957109a..c9792b1 100644
--- a/docs/testing/benchmark-ledger.md
+++ b/docs/testing/benchmark-ledger.md
@@ -1,9 +1,8 @@
 # CodeStory Benchmark Ledger

-This ledger keeps the decision-grade scorecard and detailed benchmark history
-that is too dense for the README. Treat every row as machine-, cache-, runner-,
-and date-specific. Promote only rows that pass the current harness gates
-documented below.
+Decision-grade scorecard and benchmark history - too dense for the README.
+Treat every row as machine-, cache-, runner-, and date-specific. Do not quote a
+row as a universal savings claim without checking harness tier and setup.

 Runs recorded before the 2026-05-24 harness tightening are historical unless
 they are reanalyzed or rerun with answer-level expected-file/symbol recall,
@@ -173,7 +172,7 @@ mismatches. Warm stdio task medians ranged from `2.69s` to `3.60s`, with an
 aggregate task median of `3.13s`; cold CLI task medians ranged from `4.22s` to
 `5.76s`, with an aggregate task median of `4.86s`.

-## Methodology
+## Harness Contract

 The agent A/B harness runs the same repository prompt in two arms:

diff --git a/docs/testing/retrieval-architecture.md b/docs/testing/retrieval-architecture.md
index 5237e6e..102d1ae 100644
--- a/docs/testing/retrieval-architecture.md
+++ b/docs/testing/retrieval-architecture.md
@@ -12,7 +12,7 @@ env vars, CI smoke), [`../architecture/retrieval-design.md`](../architecture/ret

 ---

-## Implemented stack (Phases 0–5)
+## Implemented Stack

 | Layer | Location | Role |
 |-------|----------|------|
@@ -107,12 +107,11 @@ anchors such as repository names, specific source paths, and manifest-specific
 symbols. Keep those strings in manifests, tests, benchmark harnesses, or the
 test-only eval probe module.

-## Fast CI-style checks (automated in Phase 6)
+## Required Checks

 ```sh
 cargo test -p codestory-runtime --test retrieval_generalization_guard
 node --test scripts/tests/codestory-agent-ab-analyzer.test.mjs
-cargo test -p codestory-cli --test onboarding_contracts
 ```

 Optional broader lane:
@@ -125,10 +124,10 @@ node --test scripts/tests/codestory-agent-ab-analyzer.test.mjs

 ---

-## Promotion checklist
+## Promotion Checklist

-Status as of Phase 6 documentation pass. **Benchmark pass columns require a human run** with
-repos, sidecars, and release CLI — not claimed here.
+**Benchmark pass columns require a human run** with repos, sidecars, and release
+CLI. This page records the gates; it does not claim those rows have passed.

 ### Language support audit alignment

@@ -137,7 +136,7 @@ tests in the branch. Do not infer support for languages without direct benchmark

 | Item | Status | Notes |
 |------|--------|-------|
-| Phases 0–5 code landed | done | See implemented stack above |
+| Core sidecar stack | done | See implemented stack above |
 | Architecture / design docs | done | `docs/architecture/retrieval-design.md` |
 | Sidecar runbook | done | `docs/ops/retrieval-sidecars.md` |
 | Local-real manifests | done | `benchmarks/tasks/local-real/` |
@@ -145,14 +144,13 @@ tests in the branch. Do not infer support for languages without direct benchmark
 | `freelancer` / `traderotate` removed from default holdouts | done | OSS holdouts only |
 | Generalization lint + guard test | done | `lint-retrieval-generalization.mjs`, `retrieval_generalization_guard` |
 | Warning config | done | `docs/architecture/retrieval-rollback.json` |
-| Markdown link contract (`onboarding_contracts`) | verify | `cargo test -p codestory-cli --test onboarding_contracts` |
 | local-real cold packet + north-star SLOs | **human** | p99 retrieval, quality 3/4, wall targets |
 | holdout-retrieval pass without skip allowances | **human** | Requires materialized OSS repos + index; no generalized claim without required recall/quality/forbidden-claim thresholds |
 | `agent_value_gap` < 0.20 | **human** | Measure from a fresh coherent bundle |
 | Windows `retrieval-sidecar-smoke` CI job | fail-closed sidecar smoke | [`retrieval-sidecars.md`](../ops/retrieval-sidecars.md#preflight-smoke-contract) |
 | Ragas/Phoenix nightly eval | optional | Not configured |

-### North-star SLOs (targets — measure before claiming pass)
+### North-Star SLOs

 | Metric | Target |
 |--------|--------|
@@ -173,13 +171,13 @@ tests in the branch. Do not infer support for languages without direct benchmark

 ---

-## Rollback drill (REQ-RES-005)
+## Rollback Warning Drill

 After promotion runs, verify rollback warnings:

 1. Point `retrieval_rollback` at a baseline `packet-runtime-summary.json` with thresholds that will trip on the current summary (or use unit test `rollback_drill_warns_without_setting_legacy_env` in `retrieval_rollback.rs`).
 2. Confirm `check_and_log_rollback_warnings` logs trigger ids without setting `CODESTORY_RETRIEVAL=0`.
-3. File a one-line incident note in this doc with date and trigger id if rollback fires in production promotion.
+3. Record the trigger id with the promotion evidence if rollback fires during production promotion.

 **One-shot operator drill (after each promotion run):**

@@ -189,7 +187,9 @@ cargo test -p codestory-runtime retrieval_rollback::tests::rollback_drill_warns_
 Expect rollback warnings only when configured thresholds fire (see
 `docs/architecture/retrieval-rollback.json`). Sidecar retrieval remains mandatory.

-**Closure status (2026-05-27, semantic promotion pass):** Phase A shipped (bge-base 768-d, llama.cpp `embed` compose service, manifest `embedding_backend`/`embedding_dim`, Qdrant collection migration, llamacpp dim hard-fail). Local `retrieval status` reaches `full` with default 768-d vectors after Qdrant re-index. Sidecar-primary is the intended product path, but product promotion remains gated until fresh benchmark evidence passes.
+**Promotion note:** Local `retrieval status` can report `full` after Qdrant
+re-index. Sidecar-primary is the intended product path, but product promotion
+still requires fresh benchmark evidence.

 ---

diff --git a/docs/usage.md b/docs/usage.md
index 3bd8702..da3dad5 100644
--- a/docs/usage.md
+++ b/docs/usage.md
@@ -1,17 +1,12 @@
 # CodeStory Usage

-This is the operator guide. It keeps setup, common workflows, retrieval defaults,
-and recovery notes in one place.
-
-Examples use POSIX shell syntax unless a block is labeled PowerShell. On
-Windows, use `.\target\release\codestory-cli.exe` for the release binary,
-`$env:NAME = "value"` for environment variables, and Windows paths when that is
-the workspace you are indexing.
+Setup, workflows, sidecars, recovery. Shell examples are POSIX unless noted.
+Windows: `.\target\release\codestory-cli.exe`, `$env:NAME = "value"`.

 ## Install The Skill

 Install the grounding skill once, then point it at explicit target workspaces.
-See [README — Install As An Agent Skill](../README.md#install-as-an-agent-skill)
+See [README - Install as an agent skill](../README.md#install-as-an-agent-skill)
 for the full copy/setup commands and Windows PowerShell variant.

 The source skill package lives at
@@ -42,31 +37,19 @@ TARGET_WORKSPACE="/path/to/repo"

 ## Readiness Tracks

-CodeStory has two readiness tracks. Keep them separate when deciding whether an
-agent can rely on packet/search output.
-
-### Local navigation/cache readiness
-
-This lane is for local browsing and source navigation. It uses the project
-SQLite cache built by `index` and read by commands such as `ground`, `symbol`,
-`trail`, `snippet`, `explore`, `context`, `files`, and `affected`.
-
-`doctor` may report this lane as `local_navigation`. Local navigation readiness
-means the local cache, graph, lexical index, and DB-backed navigation commands
-are usable. It does not prove agent packet/search readiness.
-
-### Agent packet/search sidecar readiness
+Two lanes - do not mix them when judging `packet` or `search` output.

-This lane is for agent-facing `packet` and `search` evidence. It requires the
-sidecar retrieval stack to be built and healthy: Zoekt lexical shards, Qdrant
-semantic vectors, SCIP graph artifacts, the llama.cpp query embedding endpoint,
-and a current retrieval manifest.
+| | Local navigation | Agent packet/search |
+| --- | --- | --- |
+| Lane id | `local_navigation` | `agent_packet_search` |
+| Built by | `index` | `index` then `retrieval index` |
+| Requires | Healthy SQLite cache and graph | Sidecars healthy and `retrieval_mode=full` |
+| Commands | `ground`, `report`, `files`, `symbol`, `trail`, `snippet`, `explore`, `context --id`, `affected` | `packet`, `search`, query-based candidate discovery |
+| Does not prove | Sidecar readiness | That cache-only browse is enough for agent search |

-`doctor` may report this lane as `agent_packet_search`. Agent packet/search
-readiness means sidecar packet/search evidence is trustworthy only when
-retrieval status reports `retrieval_mode: "full"`. Missing, stale, stubbed,
-hash-vector, or non-product sidecar state is diagnostic only and must not be
-described as agent packet/search readiness.
+`doctor` reports lane status. Sidecar topology:
+[architecture/overview.md](architecture/overview.md),
+[ops/retrieval-sidecars.md](ops/retrieval-sidecars.md).

 ## Common Workflows

@@ -76,19 +59,12 @@ described as agent packet/search readiness.
 codestory-cli doctor --project
 codestory-cli index --project --refresh full
 codestory-cli ground --project --why
-codestory-cli report --project --output-file out/codestory-report.md
-codestory-cli report --project --format json --output-file out/codestory-graph.json
+codestory-cli report --project --output-file codestory-report.md
+codestory-cli report --project --format json --output-file codestory-graph.json
 ```

-Use this when the repository is new to the agent. `doctor` tells you whether the
-cache and retrieval state are usable. `ground --why` gives broad orientation and
-reports limited coverage or gaps. `report` reads the current SQLite store
-without refreshing it and emits generated artifacts: Markdown for repo summary,
-hotspots, entry points, bridge/high-connectivity nodes, and next queries; JSON
-for automation that needs the full current graph, including nodes, edges,
-confidence/certainty, source locations, and generation metadata. `--limit`
-bounds the Markdown report sections, not the full JSON graph export. Treat both
-files as outputs to regenerate, not source-of-truth state.
+Health check, orientation, optional report and graph export. Regenerate reports
+after index changes; they are artifacts, not source-of-truth state.

 ### I need evidence for a broad question

@@ -96,32 +72,31 @@ files as outputs to regenerate, not source-of-truth state.
 codestory-cli packet --project --question "" --budget compact
 ```

-Use `packet` for questions like "how does routing work?" or "what owns indexing
-state?" It returns a `sufficient`, `partial`, or `blocked` status with
-citations, trust limits, gaps, and follow-up commands. If the packet is
If the packet is -`partial` or `blocked`, follow the named source-truth commands instead of -opening unstructured source files directly. Treat `sufficient` as evidence -coverage, not final answer-quality proof. +Returns `sufficient`, `partial`, or `blocked` with citations and follow-ups. +Requires `retrieval_mode=full`. ### I need to understand one symbol or file +With full sidecar readiness, use `search` for candidate discovery: + ```sh -codestory-cli search --project --query "" --why -codestory-cli explore --project --id --no-tui +codestory-cli search --project --query "" --why codestory-cli trail --project --id --story --hide-speculative codestory-cli snippet --project --id --context 40 ``` -Start with `search`, pick a concrete `node-id`, then inspect the relationships -and source. Use `context` when you want a bundled handoff around that target: +Without sidecars, stay on the local navigation lane until you have a concrete +target: ```sh -codestory-cli context --project --id --bundle out/context-name +codestory-cli ground --project --why +codestory-cli report --project --output-file codestory-report.md +codestory-cli files --project --path src --limit 80 ``` -Target context is DB-first evidence for one concrete target. `context` is -target-first; it is not an open chat endpoint and is not a replacement for broad -`packet`, `search`, or `drill` questions. +Then use `symbol`, `trail`, `snippet`, or `context --id` with an exact node id +from local output. Do not treat `search` or `context --query` as cache-only +fallbacks; query-based discovery is part of the agent packet/search lane. 

 ### I changed files and need likely impact

@@ -129,12 +104,9 @@ target-first; it is not an open chat endpoint and is not a replacement for broad
 codestory-cli index --project --refresh incremental
 codestory-cli affected --project --format markdown
 git diff --name-only HEAD | codestory-cli affected --project --stdin --format json
-git diff --name-status HEAD | codestory-cli affected --project --stdin --stdin-format name-status --format json
 ```

-Treat `affected` as test-selection evidence, not a replacement for tests. The
-default command preserves git name-status records; path-only stdin remains
-available when another tool already chose the file list.
+Impacted symbols and test hints - not a substitute for running tests.

 ### The cache or local navigation looks stale

@@ -144,13 +116,11 @@ codestory-cli index --project --refresh full
 codestory-cli doctor --project
 ```

-If `doctor` reports stale inventory, dense-anchor contract mismatch, missing
-managed assets, or a non-`full` retrieval mode, fix that layer before
-investigating answer quality. Treat the health report as the first source of
-truth for cache and retrieval state.
+Fix inventory or indexing errors before trusting local navigation output. If
+`packet`, `search`, or `context --query` reports `retrieval_unavailable`, repair
+the sidecar lane instead of repeating the same command.

-For agent-facing packet/search recovery, use the full sidecar repair sequence
-that `ready --goal agent` reports:
+### For agent-facing packet/search recovery

 ```sh
 codestory-cli retrieval bootstrap --project --format json
@@ -159,12 +129,8 @@ codestory-cli retrieval status --project --format json
 codestory-cli doctor --project --format markdown
 ```

-When the core index is missing, stale, unchecked, or has recorded fatal indexing
-errors, `ready` reports the necessary `codestory-cli index` repair first.
-Otherwise, sidecar recovery does not need to repeat a full core reindex.
-`retrieval bootstrap` prepares or checks the local sidecar services. The target
-workspace is not packet/search-ready until `retrieval index` writes a current
-target manifest and `doctor` or `retrieval status` reports `retrieval_mode=full`.
+Target `retrieval_mode=full`. Core index problems may require `index` first -
+see `ready --goal agent`.

 ## Core Commands

@@ -178,15 +144,16 @@ target manifest and `doctor` or `retrieval status` reports `retrieval_mode=full`
   SQLite store; use `--output-file` to keep artifacts separate from terminal
   logs.
 - `packet`: bounded broad-task evidence packet with citations, budget usage,
-  gaps, and follow-up commands.
-- `search`: candidate discovery for symbols, files, literals, API paths,
-  modules, and behavior terms.
+  gaps, and follow-up commands; requires agent packet/search readiness.
+- `search`: sidecar-backed candidate discovery for symbols, files, literals,
+  API paths, modules, and behavior terms.
 - `symbol`: inspect one exact symbol and relationships.
 - `trail`: follow caller, callee, and reference relationships around a symbol.
 - `snippet`: fetch source context around a symbol.
 - `explore`: bundled navigation packet or terminal explorer around a target.
-- `context`: deep evidence bundle for one concrete target selected by `--id`,
-  `--query`, or `--bookmark`.
+- `context`: deep evidence bundle for one concrete target. `--id` and
+  `--bookmark` are exact-target paths; `--query` must be treated like
+  sidecar-backed discovery.
 - `affected`: map changed files to impacted symbols and likely tests.
 - `files`: inspect indexed file inventory, language counts, roles, and coverage
   notes.
@@ -230,9 +197,15 @@ reset, schema change, or suspected stale-state incident.

 ## Predictable Output Modes

-Most commands default to Markdown for human review. Use `--format json` when automation needs the complete structured result, including exact field comparisons such as `retrieval_mode` or cache paths. Use `--output-file ` when the artifact should live outside terminal logs. The parent directory must already exist.
+Most commands default to Markdown for human review. Use `--format json` when
+automation needs the complete structured result, including exact field
+comparisons such as `retrieval_mode` or cache paths. Use `--output-file `
+when the artifact should live outside terminal logs. The parent directory must
+already exist.

-`explore` opens the terminal UI by default when a TUI is available. Use `--no-tui`, `--plain`, or `CODESTORY_NO_TUI=1` for predictable command output in agent runs, tests, non-interactive terminals, and CI logs.
+`explore` opens the terminal UI by default when a TUI is available. Use
+`--no-tui`, `--plain`, or `CODESTORY_NO_TUI=1` for predictable command output in
+agent runs, tests, non-interactive terminals, and CI logs.

 Agent-facing Markdown may start with `Status`, `Trust`, `Next Action`, and
 `Proof Tier` before dense citations. Use `search --why --plan-details` only when
@@ -264,7 +237,7 @@ node scripts/setup-retrieval-env.mjs --fetch-embed-model
 export CODESTORY_EMBED_MODEL_DIR="$(pwd)/target/retrieval-models"
 export CODESTORY_EMBED_BACKEND="llamacpp"
 export CODESTORY_EMBED_LLAMACPP_URL="http://127.0.0.1:8080/v1/embeddings"
-cargo retrieval-setup
+codestory-cli retrieval bootstrap --project --format json
 codestory-cli index --project --refresh full
 codestory-cli retrieval index --project --refresh full

@@ -279,9 +252,10 @@ with SHA-256 `ad1afe72cd6654a558667a3db10878b049a75bfd72912e1dabb91310d671173c`;
 all configured mirrors must pass the same check.

-Run `codestory-cli retrieval index` only after the local sidecar services,
-llama.cpp embedding endpoint, and `bge-base-en-v1.5` model configuration are
-ready, then require `retrieval status --format json` to report
+Run `codestory-cli retrieval bootstrap` for the same target workspace you will
+query. Then run `codestory-cli retrieval index` only after the local sidecar
+services, llama.cpp embedding endpoint, and `bge-base-en-v1.5` model
+configuration are ready. Require `retrieval status --format json` to report
 `retrieval_mode: "full"` before trusting agent-facing packet/search evidence.
 The status JSON also reports `query_embedding_backend`,
 `manifest_vector_embedding_backend`, and `stored_doc_vector_producer_backend`
@@ -381,7 +355,7 @@ Typical recovery flow:
 ```sh
 codestory-cli doctor --project
 codestory-cli index --project --refresh full
-codestory-cli search --project --query WorkspaceIndexer
+codestory-cli ground --project --why
 ```

 If the cache directory itself is suspect, get the exact project cache path from
@@ -424,10 +398,10 @@ cargo test
 cargo clippy --all-targets -- -D warnings
 ```

-Focused docs/onboarding lane:
+Docs-only lane:

 ```sh
-cargo test -p codestory-cli --test onboarding_contracts
+git diff --check
 ```

 Release-blocking fidelity lanes:
diff --git a/scripts/tests/codestory-agent-ab-analyzer.test.mjs b/scripts/tests/codestory-agent-ab-analyzer.test.mjs
index e6162fb..46a2c9f 100644
--- a/scripts/tests/codestory-agent-ab-analyzer.test.mjs
+++ b/scripts/tests/codestory-agent-ab-analyzer.test.mjs
@@ -262,7 +262,7 @@ test("categorizes commands without treating source paths as cli invocations", () => {
   );
   assert.equal(commandCategory("Get-Content crates/codestory-cli/src/main.rs"), "direct_file_read");
   assert.equal(commandCategory("Get-Content C:\\tools\\codestory-cli.exe"), "direct_file_read");
-  assert.equal(commandCategory("cargo test -p codestory-cli --test onboarding_contracts"), "build_test");
+  assert.equal(commandCategory("cargo test -p codestory-cli --test runtime_backed_flows"), "build_test");
 });

 test("packet gate retries only transient sidecar packet failures", async () => {
@@ -697,8 +697,8 @@ test("analyzes transcript command friction and scores manifest anchors", () => {
     commandEvent("cmd_7", "item.completed", `$p='"'crates/codestory-runtime/src/lib.rs'; Get-Content $p`, "pub struct RuntimeContext;"),
     commandEvent("cmd_5", "item.started", "git status --short"),
     commandEvent("cmd_5", "item.completed", "git status --short", ""),
-    commandEvent("cmd_6", "item.started", "cargo test -p codestory-cli --test onboarding_contracts"),
-    commandEvent("cmd_6", "item.completed", "cargo test -p codestory-cli --test onboarding_contracts", "ok"),
+    commandEvent("cmd_6", "item.started", "cargo test -p codestory-cli --test runtime_backed_flows"),
+    commandEvent("cmd_6", "item.completed", "cargo test -p codestory-cli --test runtime_backed_flows", "ok"),
     {
       type: "item.completed",
       item: {
@@ -724,7 +724,7 @@ test("analyzes transcript command friction and scores manifest anchors", () => {
     id: "fixture",
     task_class: "architecture_explanation",
     expected_files: ["crates/codestory-cli/src/main.rs"],
-    expected_verification_files: ["crates/codestory-cli/tests/onboarding_contracts.rs"],
+    expected_verification_files: ["crates/codestory-cli/tests/runtime_backed_flows.rs"],
     expected_symbols: ["RuntimeContext::ensure_open", "MissingSymbol"],
     expected_claims: ["Full indexing starts"],
     forbidden_claims: ["remote service is required"],
@@ -744,7 +744,7 @@ test("analyzes transcript command friction and scores manifest anchors", () => {
   assert.deepEqual(quality.missed_anchors.symbols, ["MissingSymbol"]);
   assert.equal(quality.expected_verification_files.recall, 0);
   assert.deepEqual(quality.missed_anchors.verification_files, [
-    "crates/codestory-cli/tests/onboarding_contracts.rs",
+    "crates/codestory-cli/tests/runtime_backed_flows.rs",
   ]);
   assert.equal(quality.citation_coverage.recall, 1);
 });