diff --git a/.agents/skills/codestory-grounding/SKILL.md b/.agents/skills/codestory-grounding/SKILL.md index a72f0368..0f35ebb3 100644 --- a/.agents/skills/codestory-grounding/SKILL.md +++ b/.agents/skills/codestory-grounding/SKILL.md @@ -41,9 +41,11 @@ checkout is only the tool artifact unless the user is editing CodeStory itself. - When `packet` reports `sufficient` and `follow_up_commands` is empty, answer from the packet; budget truncation alone is not a gap. Preserve supported-claim wording and include a compact "Support files" list from `answer.citations` and - `sufficiency.avoid_opening`. Do not run ordinary source reads, `rg`, `grep`, or - `git show` only to verify packet citations; run more commands only for a named - unresolved gap, an edit target, or a user-requested worktree proof. + `sufficiency.avoid_opening_paths`. The older `sufficiency.avoid_opening` field + is human-readable compatibility prose, not the raw path contract. Do not run + ordinary source reads, `rg`, `grep`, or `git show` only to verify packet + citations; run more commands only for a named unresolved gap, an edit target, + or a user-requested worktree proof. - When `packet` reports `partial`, read `sufficiency.follow_up_commands` and run those commands in order. Prefer listed targeted `search --why` commands before escalating to a larger packet budget. As soon as a follow-up packet becomes @@ -61,6 +63,13 @@ checkout is only the tool artifact unless the user is editing CodeStory itself. failed, treat product retrieval as unavailable until `retrieval_mode=full` is restored. Repo-text output is diagnostic only; do not use it as a substitute for mandatory sidecar evidence. +- Under `graph_first_v1`, `retrieval_mode=full` means graph and lexical sidecars + are complete, generated `symbol_search_doc` and component-report virtual docs + are current, and Qdrant is complete only for selected dense anchors. A zero + dense-anchor manifest is valid only when reported explicitly; otherwise + Qdrant mismatch or unavailability is fail-closed. Search evidence should name + provenance such as `exact`, `lexical_source`, `symbol_doc`, `graph_neighbor`, + `component_report`, or `dense_anchor`. ## Command Routing diff --git a/.agents/skills/codestory-grounding/references/doctor.md b/.agents/skills/codestory-grounding/references/doctor.md index 29f5b218..8c62e928 100644 --- a/.agents/skills/codestory-grounding/references/doctor.md +++ b/.agents/skills/codestory-grounding/references/doctor.md @@ -22,7 +22,7 @@ Reads project/cache/index/retrieval health without mutating the index. Use it at | Path | Command | Expected result | |------|---------|-----------------| | Normal path | ` doctor --project ` | Reports project root, cache path, indexed stats, retrieval state, sidecar embedding setup, environment hints, and next commands. | -| Failure path | If cache or index checks warn, run `index --project --refresh full`; if mandatory sidecars are missing or stale, run the setup/index commands surfaced by `doctor`; if semantic reports `semantic partial`, `semantic stale`, or `semantic failed`, rebuild before trusting broad packet/search evidence. | Separates missing index, stale semantic docs, partial semantic docs, and mandatory retrieval setup failures. | +| Failure path | If cache or index checks warn, run `index --project --refresh full`; if mandatory sidecars are missing or stale, run the setup/index commands surfaced by `doctor`; if symbol docs, dense anchors, policy version, Qdrant counts, or semantic health report partial/stale/failed state, rebuild before trusting broad packet/search evidence. | Separates missing index, stale symbol docs, partial dense anchors, and mandatory retrieval setup failures. | | Integration edge | Use doctor before `ground`, `search --why`, `explore`, `context`, or `serve`; its next commands are the safe follow-up loop. | Prevents read commands from silently querying the wrong or empty cache. | ## Notes @@ -31,5 +31,5 @@ Reads project/cache/index/retrieval health without mutating the index. Use it at - The `attention:` block repeats warnings first so agents do not miss semantic partial/stale/failure messages buried in the full check list. - Environment rows report retrieval-related variables such as `CODESTORY_EMBED_BACKEND`, `CODESTORY_EMBED_LLAMACPP_URL`, and sidecar enablement flags. - The embedding checks distinguish product llama.cpp sidecar state from hash, ONNX, disabled, or stale diagnostic states. -- Treat `semantic ok` plus `retrieval_mode=full` as the health state suitable for broad repository explanation prompts. Treat `semantic partial`, `semantic stale`, `semantic failed`, and non-`full` retrieval modes as instructions to repair setup or rebuild before trusting agent-facing evidence. +- Treat `semantic ok` plus `retrieval_mode=full` as the health state suitable for broad repository explanation prompts. Under `graph_first_v1`, `full` may explicitly skip Qdrant only when dense-anchor count is zero and graph/lexical artifacts are current. Treat `semantic partial`, `semantic stale`, `semantic failed`, Qdrant count mismatch, and non-`full` retrieval modes as instructions to repair setup or rebuild before trusting agent-facing evidence. - Prefer JSON for CI or doc-contract checks. diff --git a/.agents/skills/codestory-grounding/references/files.md b/.agents/skills/codestory-grounding/references/files.md index 37586f00..2e3fd7d2 100644 --- a/.agents/skills/codestory-grounding/references/files.md +++ b/.agents/skills/codestory-grounding/references/files.md @@ -35,7 +35,7 @@ claims about what the graph can see. - `files` reads persisted `FileInfo`; it does not scan the repo live unless `--refresh` asks for an index refresh. - Treat `index usable` with incomplete or error counts as a partial-coverage signal, not a failure. -- `summary.framework_route_coverage` is the support matrix for framework route extraction. It includes `status`, `fixture_status`, `confidence_floor`, `handler_link_support`, `unsupported_patterns`, `known_gaps`, and `promotable`. Treat `partial`, `heuristic`, text-only handler support, and `promotable=false` as review prompts, not proof of full framework parity. +- `summary.framework_route_coverage` is the support matrix for framework route extraction. It includes `status`, `coverage_evidence`, `confidence_floor`, `handler_link_support`, `unsupported_patterns`, `known_gaps`, and `promotable`. Treat `partial`, `heuristic`, text-only handler support, and `promotable=false` as review prompts, not proof of full framework parity. - Route coverage statuses: - `supported`: fixture-backed behavior is passing and documented coverage is met. - `heuristic`: pattern-backed evidence that needs source review. diff --git a/.agents/skills/codestory-grounding/references/index.md b/.agents/skills/codestory-grounding/references/index.md index e66faea5..b4ffe4b4 100644 --- a/.agents/skills/codestory-grounding/references/index.md +++ b/.agents/skills/codestory-grounding/references/index.md @@ -1,7 +1,8 @@ # `index` - Build or Refresh the Symbol Index Discovers project files, extracts symbols and edges, persists graph/search state -to SQLite, and synchronizes semantic docs when embedding assets are available. +to SQLite, writes graph-native symbol docs and component reports, and +synchronizes selected dense anchors when embedding assets are available. ## Usage @@ -15,7 +16,7 @@ to SQLite, and synchronizes semantic docs when embedding assets are available. |--------|---------|-----| | `--project ` / `--path ` | `.` | Target repository root. Always pass this explicitly. | | `--cache-dir ` | auto | Override the per-project cache root. | -| `--refresh ` | `auto` | Choose the graph/snapshot/semantic refresh mode. | +| `--refresh ` | `auto` | Choose the graph/snapshot/symbol-doc/dense-anchor refresh mode. | | `--format ` | `markdown` | Use JSON for automation and timing analysis. | | `--output-file ` | stdout | Write output to a file with an existing parent directory. | | `--dry-run` | off | Show workspace discovery and planned adds/removals without writing storage. | @@ -28,19 +29,21 @@ to SQLite, and synchronizes semantic docs when embedding assets are available. | Mode | Behavior | |------|----------| | `auto` | Use `full` for an empty cache and `incremental` otherwise. | -| `full` | Rebuild the project graph and semantic docs from the discovered workspace. | -| `incremental` | Reindex changed/new/unindexed files, remove disappeared files, and prune touched semantic docs. | +| `full` | Rebuild the project graph, symbol docs, component reports, and dense anchors from the discovered workspace. | +| `incremental` | Reindex changed/new/unindexed files, remove disappeared files, and prune touched symbol docs or dense anchors. | | `none` | Inspect the existing cache without refreshing it. Use only after a known-good same-session index. | Use `--refresh full` for first-time indexes, cache/schema uncertainty, and fixes for historical indexing failures. Incremental runs can leave stale error rows when previously failing files are not touched. -## Semantic Retrieval +## Symbol Docs And Dense Anchors -There is no `index --semantic off` flag. Semantic docs are part of the default -index contract when embedding assets are ready. On a fresh machine, check the -setup plan first: +There is no `index --semantic off` flag. Graph-native `symbol_search_doc` rows +are part of the default index contract. Under `graph_first_v1`, dense vectors +are only written for selected anchors such as entrypoints, public APIs, +documented nontrivial symbols, central graph nodes, component reports, and +unstructured docs. On a fresh machine, check the setup plan first: ```text setup embeddings --project --dry-run --format json @@ -53,7 +56,7 @@ High-signal environment toggles: | Variable | Use | |----------|-----| -| `CODESTORY_SEMANTIC_DOC_SCOPE=all` | Include all-symbol semantic docs. Accepted all-symbol aliases are `all`, `full`, `all-symbols`, and `all_symbols`; omitted or other values default to durable symbols. | +| `CODESTORY_SEMANTIC_DOC_SCOPE=all` | Include the broader all-symbol symbol-doc scope for diagnostics. Accepted aliases are `all`, `full`, `all-symbols`, and `all_symbols`; omitted or other values default to durable symbols. | | `CODESTORY_EMBED_BACKEND=llamacpp` | Use the mandatory local llama.cpp embedding sidecar. | | `CODESTORY_EMBED_LLAMACPP_URL=http://127.0.0.1:8080/v1/embeddings` | Product embedding endpoint for bge-base sidecar vectors. | | `CODESTORY_SUMMARY_ENDPOINT=local` | Enable deterministic local summaries with `--summarize`. | @@ -61,7 +64,9 @@ High-signal environment toggles: Use other embedding, alias, batch-size, tokenizer, provider, hash, ONNX, and summary tuning variables only for focused diagnostics or historical comparisons. Agent packet/search readiness requires retrieval status to report -`retrieval_mode=full`. +`retrieval_mode=full`. A zero dense-anchor corpus is valid only when the +manifest reports it explicitly; otherwise stale or unavailable Qdrant state +fails closed. ## Output @@ -69,9 +74,9 @@ Markdown returns a compact index summary. JSON exposes the same data for tools: - project and storage path - refresh mode and discovered file/error counts -- local navigation readiness notes and semantic doc counts +- local navigation readiness notes, symbol-doc counts, dense-anchor counts, and policy reason counts - parse, flush, resolve, cleanup, cache, and semantic timing buckets -- resolution counters and semantic reuse/embed/prune counts +- resolution counters plus symbol-doc write and dense-anchor reuse/embed/skip/prune counts Important timing fields are `timings_ms.parse`, `timings_ms.flush`, `timings_ms.resolve`, `timings_ms.cleanup`, `cache_ms.search_index`, diff --git a/.agents/skills/codestory-grounding/references/packet.md b/.agents/skills/codestory-grounding/references/packet.md index 37714a23..24d940ca 100644 --- a/.agents/skills/codestory-grounding/references/packet.md +++ b/.agents/skills/codestory-grounding/references/packet.md @@ -31,7 +31,7 @@ tracing, ownership discovery, or change-impact analysis. |------|---------|-----------------| | Normal path | ` packet --project --question "How does indexing flow from CLI to storage?" --budget compact` | Markdown packet with cited claims, budget usage, gaps, and follow-up commands. | | Failure path | If the packet reports `partial` or `insufficient`, follow its `follow_up_commands`, usually deeper packet budget or concrete `search`, `context`, `trail`, or `snippet` calls. | Broad exploration is bounded by reported gaps instead of drifting into repeated file reads. | -| Integration edge | Use JSON output for harnesses and stdio clients. If `sufficiency.status` is `sufficient` and `follow_up_commands` is empty, answer from packet supported claims and include a compact support-file list from `answer.citations` and `sufficiency.avoid_opening`; budget truncation alone is not a gap. | Makes benchmark traces and agent loops comparable across runs. | +| Integration edge | Use JSON output for harnesses and stdio clients. If `sufficiency.status` is `sufficient` and `follow_up_commands` is empty, answer from packet supported claims and include a compact support-file list from `answer.citations` and `sufficiency.avoid_opening_paths`; budget truncation alone is not a gap. Treat `sufficiency.avoid_opening` as compatibility prose only. | Makes benchmark traces and agent loops comparable across runs. | ## Notes diff --git a/.agents/skills/codestory-grounding/references/retrieval-rollout.md b/.agents/skills/codestory-grounding/references/retrieval-rollout.md index 0c7ca72e..e5327546 100644 --- a/.agents/skills/codestory-grounding/references/retrieval-rollout.md +++ b/.agents/skills/codestory-grounding/references/retrieval-rollout.md @@ -10,11 +10,29 @@ trustworthy; running retrieval alone is not enough. | Rollout layer | Trustworthy proof | Run when | Does not prove | | --- | --- | --- | --- | | Indexer coverage | `cargo test -p codestory-indexer --test fidelity_regression`; `cargo test -p codestory-indexer --test tictactoe_language_coverage`; targeted `files` or `affected` checks for changed paths | Parser, tree-sitter, semantic-resolution, symbol, edge, file-role, or coverage changes | Sidecar readiness, runtime packet behavior, or CLI search contract | -| Retrieval sidecar crate | `cargo test -p codestory-retrieval`; then live `retrieval bootstrap`, `retrieval index --project --refresh full`, and `retrieval status --project --format json` reporting `retrieval_mode="full"` | Zoekt, Qdrant, SCIP, manifest generation, sidecar status, embedding backend/dim, or Qdrant client changes | Runtime admission, stdio cache invalidation, or full CLI output shape | +| Retrieval sidecar crate | `cargo test -p codestory-retrieval`; then live `retrieval bootstrap`, `retrieval index --project --refresh full`, and `retrieval status --project --format json` reporting `retrieval_mode="full"` plus current `symbol_doc_count`, `dense_projection_count`, `semantic_policy_version`, `graph_artifact_hash`, and dense reason counts | Zoekt, Qdrant, SCIP, manifest generation, sidecar status, symbol-doc virtual docs, dense-anchor policy, embedding backend/dim, or Qdrant client changes | Runtime admission, stdio cache invalidation, or full CLI output shape | | Runtime integration | `cargo test -p codestory-runtime --lib`; `cargo test -p codestory-runtime --test retrieval_generalization_guard`; `cargo test -p codestory-runtime --test retrieval_eval`; set `CODESTORY_RETRIEVAL_EVAL_FULL_TESTS=1` only after real sidecars are prepared | Packet/search orchestration, fail-closed modes, retrieval shadow traces, rollback-warning logic, or runtime use of sidecar results | CLI argument/output behavior or GitHub smoke workflow behavior | | CLI surface | `cargo test -p codestory-cli --test retrieval_bootstrap_contracts`; `cargo test -p codestory-cli --test stdio_protocol_contracts`; `cargo test -p codestory-cli --test search_json_output`; with real sidecars, run the ignored full-mode search JSON test explicitly | `retrieval bootstrap/status/index` contracts, stdio protocol/cache fingerprints, fail-closed search JSON, or user-facing command shape | Full product readiness unless `retrieval status` is `full` after live sidecar indexing | -| Benchmark harness | `cargo check -p codestory-bench --benches`; the relevant Criterion bench only when it isolates the hot path; release e2e stats for real-repo timing | New benchmark code, latency/timing claims, rollback baseline updates, or performance-sensitive retrieval/index changes | Promotion by itself; synthetic or narrow benches are scouts until real-repo evidence exists | -| Smoke CI | `.github/workflows/retrieval-sidecar-smoke.yml` plus `docs/contributors/retrieval-sidecar-smoke-ci.md` pass criteria | PRs touching retrieval crate, runtime/stdio/search wiring, indexer retrieval hooks, retrieval docs, scripts, Docker sidecar config, or the workflow | Full sidecar readiness. CI smoke uses `--skip-compose --wait-secs 0` and proves manifest-missing fail-closed shape only | +| Benchmark harness | `cargo check -p codestory-bench --benches`; the relevant Criterion bench only when it isolates the hot path; release e2e stats for real-repo timing; for AST-first retrieval, include same-run baseline/candidate rows for cold total index time, `semantic_embedding_ms`, dense doc count reduction, repeat refresh embedded-doc count, holdout MRR@10/Hit@10/exact-symbol Hit@1, packet lazy-search source reads, and peak descendant working set | New benchmark code, latency/timing claims, rollback baseline updates, dense-policy changes, or performance-sensitive retrieval/index changes | Promotion by itself; synthetic or narrow benches are scouts until real-repo evidence exists | +| Smoke CI | `.github/workflows/retrieval-sidecar-smoke.yml` plus `docs/ops/retrieval-sidecars.md#preflight-smoke-contract` pass criteria | PRs touching retrieval crate, runtime/stdio/search wiring, indexer retrieval hooks, retrieval docs, scripts, Docker sidecar config, or the workflow | Full sidecar readiness. CI smoke uses `--skip-compose --wait-secs 0` and proves manifest-missing fail-closed shape only | + +## Agent-Grounding Release Gates + +Use the highest completed tier as the only claim level in docs, PRs, or final +handoffs: + +| Tier | Required evidence | Claim boundary | +| --- | --- | --- | +| CodeStory self-e2e | Generalization lint, targeted runtime/indexer tests, release CLI build, `doctor`, and repo-scale e2e stats | This branch still works on CodeStory and product Rust has no banned holdout literals | +| Local-real drill suite | Self-e2e plus local-real packet/drill rows without skip allowances | Product tuning survived realistic local repos | +| Holdout-retrieval drill suite | Local-real plus materialized holdout-retrieval rows, required recall/quality thresholds, and forbidden-claim checks with no skip allowances | Retrieval behavior is generalized for the public holdout suite | +| Promotion-grade paired benchmark | Holdout plus repeated CodeStory/no-CodeStory rows, timing/cost accounting, answer-quality ledger classifications, and packet-first source-read avoidance checks | Useful-for-agents, speed, or savings claims | + +Packet statuses (`sufficient`, `partial`, `blocked`) describe evidence coverage +only. Final answer quality is promoted only by `drill`/`drill-suite` ledger +classifications. Holdout literals belong in manifests, tests, benchmark +harnesses, or the `CODESTORY_EVAL_PROBES` eval module, not production +planner/ranker/runtime code. ## CI Smoke Triage @@ -38,9 +56,10 @@ evidence is trustworthy only after live sidecars are indexed and status is full. | Symptom | Likely layer | Action | | --- | --- | --- | | `retrieval_manifest_missing` | Bootstrap/state exists but no project manifest was finalized | In CI smoke this is expected. For product proof, run live `retrieval index --refresh full` and recheck status | -| `sidecar_manifest_stale`, input-hash drift, or embedding-backend drift | Source, SQLite projection, semantic docs, backend, dimension, or schema changed after the manifest | Rerun `retrieval index --refresh full`; `--refresh auto` may repair stale stored semantic-doc contracts once, but explicit failures still fail closed | -| `no_semantic`, `lexical_only`, or `unavailable` with Qdrant errors | Qdrant, embedding endpoint, or semantic smoke failed | Run bootstrap, confirm ports `6333`/`6334` and the embedding endpoint, then rebuild sidecar indexes | -| Qdrant collection exists but point count is below the semantic-doc projection count, is one-point, or has a stub marker | Partial or obsolete collection | Rerun `retrieval index`; do not bless semantic smoke alone as full readiness | +| `sidecar_manifest_stale`, input-hash drift, policy-version drift, graph-artifact-hash drift, dense-reason drift, or embedding-backend drift | Source, SQLite projection, `symbol_search_doc`, dense anchors, backend, dimension, policy, or schema changed after the manifest | Rerun `retrieval index --refresh full`; `--refresh auto` may repair stale stored symbol-doc or dense-anchor contracts once, but explicit failures still fail closed | +| `no_semantic`, `lexical_only`, or `unavailable` with Qdrant errors while dense anchors are expected | Qdrant, embedding endpoint, or semantic smoke failed | Run bootstrap, confirm ports `6333`/`6334` and the embedding endpoint, then rebuild sidecar indexes | +| Qdrant skipped while manifest dense-anchor count is `0` | Expected `graph_first_v1` graph/lexical full mode | Verify Zoekt and SCIP are healthy and manifest symbol-doc count, policy version, graph hash, and dense reason counts match | +| Qdrant collection exists but point count is below the dense-anchor projection count, is one-point, or has a stub marker | Partial or obsolete collection | Rerun `retrieval index`; do not bless semantic smoke alone as full readiness | | Qdrant response lacks `result.points[]` | Qdrant client/API contract drift or wrong image | Verify the pinned Qdrant image and update the client/test contract deliberately | | `storage_repair.scan_errors` appears during bootstrap | Cache protection scan was incomplete | Resolve unreadable cache roots or DBs before relying on retention pruning; do not treat suppressed pruning as readiness proof | @@ -55,8 +74,8 @@ cargo test -p codestory-cli --test codestory_repo_e2e_stats -- --ignored --nocap ``` This log is especially mandatory for retrieval rollout changes that affect -default indexing, semantic-doc persistence or reuse, sidecar indexing/status, -packet/search behavior, runtime grounding surfaces, CLI command shape, or any -performance/timing claim. A stats-only row with +default indexing, symbol-doc persistence, dense-anchor persistence or reuse, +sidecar indexing/status, packet/search behavior, runtime grounding surfaces, CLI +command shape, or any performance/timing claim. A stats-only row with `CODESTORY_ALLOW_SKIP_REAL_REPO_DRILL_CASES=1` can record local timing, but it is not real-drill release evidence. diff --git a/.agents/skills/codestory-grounding/references/setup.md b/.agents/skills/codestory-grounding/references/setup.md index 6916cd1d..680ebfff 100644 --- a/.agents/skills/codestory-grounding/references/setup.md +++ b/.agents/skills/codestory-grounding/references/setup.md @@ -27,7 +27,7 @@ surprise-download. Agent-facing packet/search evidence still requires | Path | Command | Expected result | |------|---------|-----------------| -| Normal path | `node scripts/setup-retrieval-env.mjs --fetch-embed-model`, then ` retrieval bootstrap --project ` | Downloads the pinned bge-base GGUF for the local llama.cpp sidecar, starts local sidecars, and prepares the product retrieval environment. | +| Normal path | `node scripts/setup-retrieval-env.mjs --fetch-embed-model`, then ` retrieval bootstrap --project ` | Downloads the pinned bge-base GGUF for the local llama.cpp sidecar, verifies the artifact size/SHA-256 before accepting it, starts local sidecars, and prepares the product retrieval environment. | | Failure path | If setup fails, run `setup embeddings --project --dry-run --format json` and inspect the selected asset URLs, cache root, output paths, and checksums. | Separates platform support, download, checksum, extraction, and sidecar-readiness failures. | | Integration edge | Run `retrieval index --project --refresh full`, then `retrieval status --project --format json`. | Product search/packet paths are usable only when status reports `retrieval_mode=full`. | @@ -41,5 +41,9 @@ surprise-download. Agent-facing packet/search evidence still requires - Product sidecar evidence requires `CODESTORY_EMBED_BACKEND=llamacpp`, the local llama.cpp endpoint, and a manifest embedding backend of `llamacpp:bge-base-en-v1.5`. +- The retrieval setup wrapper accepts only `bge-base-en-v1.5.Q8_0.gguf` files + matching size `117974304` and SHA-256 + `ad1afe72cd6654a558667a3db10878b049a75bfd72912e1dabb91310d671173c`; fallback + URLs are mirror candidates gated by that same checksum. - Hash embeddings, ONNX-only flows, and non-sidecar embedding paths are diagnostic or historical comparison modes only. diff --git a/.github/workflows/retrieval-sidecar-smoke.yml b/.github/workflows/retrieval-sidecar-smoke.yml index a77bca51..44e40c08 100644 --- a/.github/workflows/retrieval-sidecar-smoke.yml +++ b/.github/workflows/retrieval-sidecar-smoke.yml @@ -1,5 +1,5 @@ # Windows retrieval manifest-missing shape smoke. -# Contract: docs/contributors/retrieval-sidecar-smoke-ci.md +# Contract: docs/ops/retrieval-sidecars.md#preflight-smoke-contract name: retrieval-sidecar-smoke @@ -25,7 +25,6 @@ on: - scripts/lint-retrieval-generalization.mjs - scripts/**retrieval** - docs/ops/retrieval-sidecars.md - - docs/contributors/retrieval-sidecar-smoke-ci.md - docs/architecture/retrieval-*.md - docs/testing/retrieval-architecture.md - docker/retrieval-compose.yml diff --git a/AGENTS.md b/AGENTS.md index 40c70c4c..4765c36d 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -40,7 +40,17 @@ - PRs should include a summary, tests run, linked issues, and relevant artifacts for behavior changes. ## Retrieval documentation -- Canonical sidecar retrieval docs are `docs/architecture/retrieval-design.md`, `docs/architecture/retrieval-parser-compat-matrix.md`, `docs/testing/retrieval-architecture.md`, and `docs/ops/retrieval-sidecars.md`. +- Canonical sidecar retrieval docs are `docs/architecture/retrieval-design.md`, `docs/testing/retrieval-architecture.md`, and `docs/ops/retrieval-sidecars.md`. Parser compatibility records live in `docs/architecture/language-support.md`. + +## Coding & Design Constraints + +Current merge bar for production changes: + +1. No holdout literals in production paths — packet/search code must not depend on benchmark holdout repo names, fixture paths, or expected-answer shapes. +2. Eval probes stay test-only — benchmark-shaped probe catalogs remain behind the test-only eval-probe boundary. +3. Language support claims match claim tier definitions — distinguish parser-backed graph coverage, structural collectors, and agent-facing packet quality. +4. Benchmark assertions reference living stats, not hard-coded baselines — repo-scale timing belongs in `docs/testing/codestory-e2e-stats-log.md`. +5. Retrieval mode changes require sidecar evidence — agent packet/search readiness must report full sidecar retrieval, not semantic-only fallback. ## Security & Configuration Tips - Keep secrets out of the repo; pass credentials via environment variables. diff --git a/Cargo.lock b/Cargo.lock index 68eb7f46..80d00dd8 100644 --- a/Cargo.lock +++ b/Cargo.lock @@ -387,7 +387,7 @@ checksum = "3a822ea5bc7590f9d40f1ba12c0dc3c2760f3482c6984db1573ad11031420831" [[package]] name = "codestory-bench" -version = "0.6.2" +version = "0.7.0" dependencies = [ "anyhow", "codestory-contracts", @@ -403,7 +403,7 @@ dependencies = [ [[package]] name = "codestory-cli" -version = "0.6.2" +version = "0.7.0" dependencies = [ "anyhow", "clap", @@ -425,7 +425,7 @@ dependencies = [ [[package]] name = "codestory-contracts" -version = "0.6.2" +version = "0.7.0" dependencies = [ "anyhow", "crossbeam-channel", @@ -440,7 +440,7 @@ dependencies = [ [[package]] name = "codestory-indexer" -version = "0.6.2" +version = "0.7.0" dependencies = [ "anyhow", "codestory-contracts", @@ -459,23 +459,27 @@ dependencies = [ "tracing", "tracing-subscriber", "tree-sitter", + "tree-sitter-bash", "tree-sitter-c", "tree-sitter-c-sharp", "tree-sitter-cpp", + "tree-sitter-dart-orchard", "tree-sitter-go", "tree-sitter-graph", "tree-sitter-java", "tree-sitter-javascript", + "tree-sitter-kotlin-ng", "tree-sitter-php", "tree-sitter-python", "tree-sitter-ruby", "tree-sitter-rust", + "tree-sitter-swift", "tree-sitter-typescript", ] [[package]] name = "codestory-retrieval" -version = "0.6.2" +version = "0.7.0" dependencies = [ "anyhow", "chrono", @@ -494,7 +498,7 @@ dependencies = [ [[package]] name = "codestory-runtime" -version = "0.6.2" +version = "0.7.0" dependencies = [ "anyhow", "codestory-contracts", @@ -520,7 +524,7 @@ dependencies = [ [[package]] name = "codestory-store" -version = "0.6.2" +version = "0.7.0" dependencies = [ "anyhow", "codestory-contracts", @@ -535,7 +539,7 @@ dependencies = [ [[package]] name = "codestory-workspace" -version = "0.6.2" +version = "0.7.0" dependencies = [ "anyhow", "codestory-contracts", @@ -3301,6 +3305,16 @@ dependencies = [ "tree-sitter-language", ] +[[package]] +name = "tree-sitter-bash" +version = "0.23.3" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "329a4d48623ac337d42b1df84e81a1c9dbb2946907c102ca72db158c1964a52e" +dependencies = [ + "cc", + "tree-sitter-language", +] + [[package]] name = "tree-sitter-c" version = "0.23.4" @@ -3331,6 +3345,16 @@ dependencies = [ "tree-sitter-language", ] +[[package]] +name = "tree-sitter-dart-orchard" +version = "0.3.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "3fcc68a1fd54afbeabc13b64a07b1ef611805690800fa2a57c1d1b90970a902e" +dependencies = [ + "cc", + "tree-sitter-language", +] + [[package]] name = "tree-sitter-go" version = "0.23.4" @@ -3377,6 +3401,16 @@ dependencies = [ "tree-sitter-language", ] +[[package]] +name = "tree-sitter-kotlin-ng" +version = "1.1.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "e800ebbda938acfbf224f4d2c34947a31994b1295ee6e819b65226c7b51b4450" +dependencies = [ + "cc", + "tree-sitter-language", +] + [[package]] name = "tree-sitter-language" version = "0.1.7" @@ -3423,6 +3457,16 @@ dependencies = [ "tree-sitter-language", ] +[[package]] +name = "tree-sitter-swift" +version = "0.7.0" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "bdc72ea9c62a6d188c9f7d64109a9b14b09231852b87229c68c44e8738b9e6b9" +dependencies = [ + "cc", + "tree-sitter-language", +] + [[package]] name = "tree-sitter-typescript" version = "0.23.2" diff --git a/Cargo.toml b/Cargo.toml index 26643969..a0bb26b8 100644 --- a/Cargo.toml +++ b/Cargo.toml @@ -34,6 +34,10 @@ tree-sitter-go = "0.23.4" tree-sitter-ruby = "0.23.1" tree-sitter-php = "0.23.11" tree-sitter-c-sharp = "=0.23.0" +tree-sitter-kotlin-ng = "1.1.0" +tree-sitter-swift = "0.7.0" +tree-sitter-dart-orchard = "0.3.2" +tree-sitter-bash = "0.23.3" # Semantic Analysis tree-sitter-graph = "0.12" diff --git a/README.md b/README.md index d90aa672..4dc9f716 100644 --- a/README.md +++ b/README.md @@ -7,7 +7,7 @@ Local codebase grounding for coding agents.

License: Apache-2.0 Rust 2024 -Benchmarks +Benchmarks

CodeStory builds a local evidence layer for a repository. It indexes files, @@ -33,27 +33,31 @@ for cache health, indexing, search, trails, snippets, and source-backed answers that name the files they used. The per-project SQLite cache is separate from the optional local retrieval sidecars used by packet/search workflows; a healthy local navigation readiness report does not by itself prove agent packet/search -readiness. Benchmark notes are environment- and repository-specific evidence, -so public claims should cite the checked setup instead of promising universal -speedups or savings. +readiness and does not by itself prove sidecar readiness. Benchmark notes are +environment- and repository-specific evidence, so public claims should cite the +checked setup instead of promising universal speedups or savings. ## Try It On A Repo From this checkout, build the CLI and point it at any repository: -```powershell +```sh cargo build --release -p codestory-cli -$CodeStoryCli = ".\target\release\codestory-cli.exe" -$TargetWorkspace = "C:\path\to\repo" - -& $CodeStoryCli doctor --project $TargetWorkspace -& $CodeStoryCli setup embeddings --project $TargetWorkspace --dry-run --format json -& $CodeStoryCli index --project $TargetWorkspace --refresh full -& $CodeStoryCli ground --project $TargetWorkspace --why -& $CodeStoryCli report --project $TargetWorkspace --output-file .\codestory-report.md -& $CodeStoryCli report --project $TargetWorkspace --format json --output-file .\codestory-graph.json +CODESTORY_CLI="./target/release/codestory-cli" +TARGET_WORKSPACE="/path/to/repo" + +"$CODESTORY_CLI" doctor --project "$TARGET_WORKSPACE" +"$CODESTORY_CLI" setup embeddings --project "$TARGET_WORKSPACE" --dry-run --format json +"$CODESTORY_CLI" index --project "$TARGET_WORKSPACE" --refresh full +"$CODESTORY_CLI" ground --project "$TARGET_WORKSPACE" --why +"$CODESTORY_CLI" report --project "$TARGET_WORKSPACE" --output-file codestory-report.md +"$CODESTORY_CLI" report --project "$TARGET_WORKSPACE" --format json --output-file codestory-graph.json ``` +On Windows PowerShell, use `.\target\release\codestory-cli.exe`, environment +assignments such as `$env:NAME = "value"`, and normal Windows paths such as +`C:\path\to\repo`. + That basic path establishes local navigation readiness: the local cache, graph, lexical index, and DB-backed navigation commands are usable for health, file, symbol, trail, snippet, context, orientation checks, and derived report/export @@ -63,35 +67,19 @@ Markdown report and full JSON graph export are not source-of-truth state. The ma embedding dry-run is a local semantic setup check; it does not prove agent packet/search readiness. -Agent packet/search readiness has one extra contract: sidecar packet/search -evidence is trustworthy only when retrieval status reports `retrieval_mode=full`. -That full mode depends on local Zoekt, Qdrant, SCIP, and llama.cpp embedding -sidecars. - -```powershell -node scripts/setup-retrieval-env.mjs --fetch-embed-model -$env:CODESTORY_EMBED_MODEL_DIR = (Resolve-Path .\target\retrieval-models).Path -$env:CODESTORY_EMBED_BACKEND = "llamacpp" -$env:CODESTORY_EMBED_LLAMACPP_URL = "http://127.0.0.1:8080/v1/embeddings" - -cargo retrieval-setup -& $CodeStoryCli index --project $TargetWorkspace --refresh full -& $CodeStoryCli retrieval index --project $TargetWorkspace --refresh full -& $CodeStoryCli retrieval status --project $TargetWorkspace --format json -& $CodeStoryCli doctor --project $TargetWorkspace -``` - -Missing sidecars, stale manifests, disabled sidecars, mixed stored-doc vector -contracts, or diagnostic embedding modes are setup failures to fix before -trusting agent-facing packet/search evidence. +Agent packet/search readiness requires `retrieval_mode=full` from local Zoekt, +Qdrant, SCIP, and llama.cpp sidecars. See [docs/usage.md](docs/usage.md) for the +full local-navigation versus sidecar-readiness split and +[docs/ops/retrieval-sidecars.md](docs/ops/retrieval-sidecars.md) for sidecar +setup. After that first index, use narrower commands instead of asking the agent to start over: -```powershell -& $CodeStoryCli search --project $TargetWorkspace --query "request routing" --why -& $CodeStoryCli trail --project $TargetWorkspace --id --story --hide-speculative -& $CodeStoryCli snippet --project $TargetWorkspace --id --context 40 +```sh +"$CODESTORY_CLI" search --project "$TARGET_WORKSPACE" --query "request routing" --why +"$CODESTORY_CLI" trail --project "$TARGET_WORKSPACE" --id --story --hide-speculative +"$CODESTORY_CLI" snippet --project "$TARGET_WORKSPACE" --id --context 40 ``` A good CodeStory-backed answer should name the source files it used, say when @@ -102,31 +90,24 @@ For task-shaped flows, use [docs/usage.md](docs/usage.md). ## Retrieval sidecars -For Zoekt/Qdrant/SCIP packet retrieval, run once from this repository root -(Windows, macOS, or Linux): +For Zoekt/Qdrant/SCIP packet retrieval, run `cargo retrieval-setup` once from +this repository root, then follow +[docs/ops/retrieval-sidecars.md](docs/ops/retrieval-sidecars.md) for bootstrap +flags, version pins, and troubleshooting. -```sh -cargo retrieval-setup -``` - -`cargo retrieval-setup` builds `codestory-cli` if needed, starts Docker Compose sidecars when -Docker is available, writes local sidecar state, and waits for health probes. Check status with -`cargo retrieval-status`. +## Install As An Agent Skill -Bootstrap modifiers (pass through `cargo run`): +Use this path when CodeStory should be installed once as a grounding skill and +then pointed at whatever repository an agent is working on. ```sh -cargo run -p codestory-cli -- retrieval bootstrap --project . --skip-compose -cargo run -p codestory-cli -- retrieval bootstrap --project . --wait-secs 120 +SkillHome="" +mkdir -p "$SkillHome" +cp -R ./.agents/skills/codestory-grounding "$SkillHome/codestory-grounding" +bash "$SkillHome/codestory-grounding/scripts/setup.sh" ``` -Thin wrapper (same bootstrap, optional holdout clone): `node scripts/setup-retrieval-env.mjs`. -Details: [docs/ops/retrieval-sidecars.md](docs/ops/retrieval-sidecars.md). - -## Install As An Agent Skill - -Use this path when CodeStory should be installed once as a grounding skill and -then pointed at whatever repository an agent is working on. +On Windows PowerShell: ```powershell $SkillHome = "" @@ -135,12 +116,6 @@ Copy-Item -Recurse -Force .\.agents\skills\codestory-grounding "$SkillHome\codes & "$SkillHome\codestory-grounding\scripts\setup.ps1" ``` -On Unix-like systems: - -```sh -bash "/codestory-grounding/scripts/setup.sh" -``` - The setup script prints `CODESTORY_CLI=`. Persist that path if your agent environment does not preserve variables between sessions. @@ -155,8 +130,8 @@ The skill package lives at | Build or refresh an index | `codestory-cli index --project --refresh full` | | Broad orientation | `codestory-cli ground --project --why` | | Repo report / graph export | `codestory-cli report --project --format markdown` | -| Broad task evidence | `codestory-cli packet --project --question "" --budget compact` | -| Candidate discovery | `codestory-cli search --project --query "" --why` | +| Broad task evidence (requires full sidecar retrieval) | `codestory-cli packet --project --question "" --budget compact` | +| Candidate discovery (requires full sidecar retrieval) | `codestory-cli search --project --query "" --why` | | Exact symbol evidence | `codestory-cli symbol --project --id ` | | Flow evidence | `codestory-cli trail --project --id --story --hide-speculative` | | Source excerpt | `codestory-cli snippet --project --id ` | @@ -165,9 +140,10 @@ The skill package lives at | Changed-file impact | `codestory-cli affected --project --format markdown` | | Persistent read surface | `codestory-cli serve --project --stdio` | -Use `packet` for broad task questions. Target context is DB-first evidence for -one concrete target; use `context` after search, trail, explore, or a bookmark -has selected that target. Use `doctor` when output looks stale, incomplete, or +Use `packet` for broad task questions once `ready --goal agent` reports full +sidecar retrieval. For local cache-only inspection, start with `ground`, +`report`, or `doctor`, then use `symbol`, `trail`, `snippet`, or `context` after +you have a concrete target. Use `doctor` when output looks stale, incomplete, or inconsistent. ## What It Builds @@ -185,6 +161,26 @@ flowchart LR CodeStory builds a local evidence layer so agents can request grounded context instead of relying on ad hoc file reads. +## Language Support Claims + +CodeStory separates parser-backed graph indexing, regression-tested accuracy, +structural extraction, framework route coverage, and agent packet/search +readiness. The current contract is documented in +[docs/architecture/language-support.md](docs/architecture/language-support.md). + +In short: Python, Java, Rust, JavaScript, TypeScript/TSX, C++, C, Go, Ruby, +PHP, C#, Kotlin, Swift, Dart, and Bash are fidelity-gated parser-backed graph +languages; HTML, CSS, and SQL use structural collectors. + +The opt-in OSS language corpus pairs each public language-support profile with a +pinned medium-sized open source project and compares raw filesystem counts +against CodeStory indexing of the same files: +[docs/testing/oss-language-corpus.md](docs/testing/oss-language-corpus.md). +The separate `language-expansion-holdout` benchmark suite runs strict +`without_codestory` versus `with_codestory` agent tasks on those pinned +projects and records elapsed time, token usage, estimated cost, tool calls, +command counts, source reads, post-packet source reads, and quality gates. + For the system model, start with [docs/concepts/how-codestory-works.md](docs/concepts/how-codestory-works.md), then [docs/architecture/overview.md](docs/architecture/overview.md). @@ -196,7 +192,7 @@ benchmark history from the state of your local cache, which can drift and should be checked with `doctor`. - Public evidence summary and caveats: - [docs/testing/benchmark-results.md](docs/testing/benchmark-results.md) + [docs/testing/benchmark-ledger.md](docs/testing/benchmark-ledger.md) - Repo-scale timing history: [docs/testing/codestory-e2e-stats-log.md](docs/testing/codestory-e2e-stats-log.md) - Warm stdio loop evidence: @@ -215,13 +211,13 @@ workspace shares build locks. - [docs/contributors/debugging.md](docs/contributors/debugging.md) - [docs/contributors/testing-matrix.md](docs/contributors/testing-matrix.md) - [docs/architecture/runtime-execution-path.md](docs/architecture/runtime-execution-path.md) +- [docs/architecture/language-support.md](docs/architecture/language-support.md) - [docs/architecture/subsystems/contracts.md](docs/architecture/subsystems/contracts.md) - [docs/architecture/subsystems/workspace.md](docs/architecture/subsystems/workspace.md) - [docs/architecture/subsystems/indexer.md](docs/architecture/subsystems/indexer.md) - [docs/architecture/subsystems/store.md](docs/architecture/subsystems/store.md) - [docs/architecture/subsystems/runtime.md](docs/architecture/subsystems/runtime.md) - [docs/architecture/subsystems/cli.md](docs/architecture/subsystems/cli.md) -- [docs/decision-log.md](docs/decision-log.md) ## License diff --git a/benchmarks/tasks/README.md b/benchmarks/tasks/README.md index e6fdb7ad..90c3732a 100644 --- a/benchmarks/tasks/README.md +++ b/benchmarks/tasks/README.md @@ -36,16 +36,16 @@ files, symbols, claims, citations, and forbidden-claim checks for that manifest. It does not by itself establish speed, cost, or product headline claims. Repository metadata in each manifest records the intended public clone target, -immutable commit or tag ref, optional workspace root, languages, and lightweight setup notes. The -benchmark harness must still know how to map each `repo.name` to a local clone -before it can execute the task. Until that mapping exists, the manifest remains -valid corpus data but is not runnable through the harness. +a full 40-character immutable Git commit SHA, optional workspace root, +languages, and lightweight setup notes. The benchmark harness must still know +how to map each `repo.name` to a local clone before it can execute the task. +Until that mapping exists, the manifest remains valid corpus data but is not +runnable through the harness. Expected setup is intentionally simple: -- Clone the public repository URL at the manifest `repo.ref`; branch-like refs - such as `main` are allowed for local diagnostics only and fail publishable - provenance gates. +- Clone the public repository URL at the manifest `repo.ref`. Branches, tags, + and short SHAs are intentionally excluded so benchmark provenance is stable. - Run the listed setup commands only when the benchmark runner needs local dependency metadata or tests. - Treat `repo.workspace_root` as the benchmark working directory when it is @@ -63,6 +63,90 @@ Direct packet runtime rows are available with `--packet-runtime`. They compare cold CLI packet calls with warm `serve --stdio` packet calls while reusing the same expected-anchor quality gates. +## Language Expansion Holdout + +The `language-expansion-holdout` suite is the triggerable agent A/B suite for +public language-support profiles. It is separate from the OSS language corpus: + +- The OSS corpus checks whether CodeStory can index pinned real projects. +- This suite runs paired `without_codestory` and `with_codestory` agent arms + against those pinned projects and records elapsed time, token usage, estimated + cost, observed tool calls, command counts, command categories, source reads, + source reads after the first CodeStory packet, and manifest quality gates. +- The `without_codestory` arm mechanically runs a harness-owned local `rg` plus + bounded source-read prelude. The `with_codestory` arm mechanically runs a + harness-owned `codestory-cli packet` prelude. Both preludes count their wall + time and command/tool accounting. The CodeStory arm is packet-first, not + packet-only by default: if the packet and CodeStory follow-ups are partial, + ordinary local source reads are allowed after CodeStory and counted as + post-packet overhead. Pass `--max-source-reads-after-packet 0` only when you + want stricter packet-only promotion evidence. The `without_codestory` arm is + invalid for publishable evidence if it calls CodeStory or never inspects the + local repository. + +The suite currently has one medium-sized open source project per public +language-support profile: parser-backed graph languages (Python, Java, Rust, +JavaScript, TypeScript, C++, C, Go, Ruby, PHP, C#, Kotlin, Swift, Dart, Bash) +plus structural collectors (HTML, CSS, SQL). + +Materialize the pinned repos: + +```powershell +node scripts/codestory-agent-ab-benchmark.mjs ` + --list --task-suite language-expansion-holdout --materialize-repos +``` + +Run a strict paired comparison: + +```powershell +node scripts/codestory-agent-ab-benchmark.mjs ` + --task-suite language-expansion-holdout ` + --arms without_codestory,with_codestory ` + --repeats 3 --materialize-repos --prepare-codestory-cache ` + --out-dir target/agent-benchmark/language-expansion-holdout ` + --timeout-ms 600000 +``` + +Use `--task-ids ` for a cheaper targeted run. The Markdown summary table +includes the human-readable A/B columns; `runs.jsonl` remains the source of +truth for per-run metrics. + +For runtime packet fixes, prefer a packet-first gated loop before launching +nested agents: + +```powershell +node scripts/codestory-agent-ab-score.mjs ` + --packet-gate --packet-probe-jobs 4 ` + --packet-gate-improved-from target/agent-benchmark/ ` + --reuse-baseline-from target/agent-benchmark/ ` + --prepare-codestory-jobs 2 ` + --task-ids ` + --out-dir target/agent-benchmark/ +``` + +`--packet-probe-jobs` controls cheap packet probes, `--jobs` controls +independent nested A/B repo groups, and `--prepare-codestory-jobs` caps cache +prep across repos. If a packet probe fails from transient sidecar +unavailability, the score wrapper reruns just those task ids serially in a +`packet-probes-retry` artifact before deciding which rows enter the A/B phase. +Baseline reuse is valid only when the task manifest and scorer boundary are +unchanged. + +For anti-overfit language checks, run promotion-oriented packet gates with +production defaults. Exact benchmark probes belong in benchmark manifests, +explicit `--extra-probe` inputs, or eval-only diagnostics; they are benchmark +fixture behavior, not production steering. Framework/domain semantics belong in +product code when they generalize to real projects. + +Write fresh outputs under `target/agent-benchmark/` and summarize the +durable result in [language-expansion-ab-report.md](../../docs/testing/language-expansion-ab-report.md) +instead of preserving local run directory catalogs here. The current packet +runtime artifact passes manifest quality for `12/18` rows and is +packet-sufficient for `9/18`; the packet-eligible A/B slice is a quality +and efficiency win for its selected `9/9` CodeStory rows only. Treat that as +packet-eligible slice evidence, not broad promotion proof for all public +language-support profiles. + ## Local Real-Repo Corpus The `local-real` suite targets sibling checkouts under the parent directory of @@ -140,6 +224,10 @@ may exceed the default timeout on cold index; increase `--timeout-ms` when neede - Do **not** add repo-name, path, or display-name literals for `ripgrep`, `axios`, or `redis` in v2 planner or ranker code. +- Keep holdout-specific probes and claim templates in manifests, benchmark + harnesses, tests, or `crates/codestory-runtime/src/agent/eval_probes.rs` + behind `CODESTORY_EVAL_PROBES`; do not put them in product packet/search + planning or ranking paths. - Do **not** iterate KPI fixes against holdout manifests; use `local-real` for in-scope tuning and treat holdout rows as promotion-only evidence. - Legacy sibling apps (`freelancer`, `traderotate`) are removed from default diff --git a/benchmarks/tasks/eval-probes.json b/benchmarks/tasks/eval-probes.json index b97055ae..a8350a2c 100644 --- a/benchmarks/tasks/eval-probes.json +++ b/benchmarks/tasks/eval-probes.json @@ -171,6 +171,32 @@ "executeCommand" ] }, + { + "all_terms": ["requests", "session"], + "any_terms": ["prepared", "preparedrequest", "adapter", "send"], + "queries": [ + "Session.request", + "Session.prepare_request", + "PreparedRequest.prepare", + "Session.send", + "HTTPAdapter.send" + ] + }, + { + "all_terms": ["express"], + "any_terms": ["application", "app", "route", "routes", "router", "middleware"], + "queries": [ + "createApplication", + "app.init", + "app.handle", + "app.use", + "app.route", + "res.send", + "application.js app.use", + "application handle use route", + "response send body" + ] + }, { "all_terms": [], "any_terms": [ @@ -272,6 +298,29 @@ "IndexerCommandCxx", "IndexerJava::doIndex" ] + }, + { + "all_terms": ["requests", "session"], + "any_terms": ["prepared", "preparedrequest", "adapter", "send"], + "queries": [ + "Session.request", + "Session.prepare_request", + "PreparedRequest.prepare", + "Session.send", + "HTTPAdapter.send" + ] + }, + { + "all_terms": ["express"], + "any_terms": ["application", "app", "route", "routes", "router", "middleware"], + "queries": [ + "createApplication", + "app.init", + "app.handle", + "app.use", + "app.route", + "res.send" + ] } ], "citation_rank_adjustments": [ diff --git a/benchmarks/tasks/express-application-routing-flow.task.json b/benchmarks/tasks/express-application-routing-flow.task.json index a680880c..9994a043 100644 --- a/benchmarks/tasks/express-application-routing-flow.task.json +++ b/benchmarks/tasks/express-application-routing-flow.task.json @@ -7,7 +7,7 @@ "repo": { "name": "express", "url": "https://github.com/expressjs/express.git", - "ref": "4.18.2", + "ref": "8368dc178af16b91b576c4c1d135f701a0007e5d", "workspace_root": ".", "languages": [ "JavaScript" diff --git a/benchmarks/tasks/express-response-send-bug-localization.task.json b/benchmarks/tasks/express-response-send-bug-localization.task.json index 2691ab62..ff2f8da1 100644 --- a/benchmarks/tasks/express-response-send-bug-localization.task.json +++ b/benchmarks/tasks/express-response-send-bug-localization.task.json @@ -7,7 +7,7 @@ "repo": { "name": "express", "url": "https://github.com/expressjs/express.git", - "ref": "4.18.2", + "ref": "8368dc178af16b91b576c4c1d135f701a0007e5d", "workspace_root": ".", "languages": [ "JavaScript" diff --git a/benchmarks/tasks/express-response-symbol-ownership.task.json b/benchmarks/tasks/express-response-symbol-ownership.task.json index 6b691ae6..8a417f5d 100644 --- a/benchmarks/tasks/express-response-symbol-ownership.task.json +++ b/benchmarks/tasks/express-response-symbol-ownership.task.json @@ -7,7 +7,7 @@ "repo": { "name": "express", "url": "https://github.com/expressjs/express.git", - "ref": "4.18.2", + "ref": "8368dc178af16b91b576c4c1d135f701a0007e5d", "workspace_root": ".", "languages": [ "JavaScript" diff --git a/benchmarks/tasks/express-router-param-bug-localization.task.json b/benchmarks/tasks/express-router-param-bug-localization.task.json index eec76bad..d1a1c691 100644 --- a/benchmarks/tasks/express-router-param-bug-localization.task.json +++ b/benchmarks/tasks/express-router-param-bug-localization.task.json @@ -7,7 +7,7 @@ "repo": { "name": "express", "url": "https://github.com/expressjs/express.git", - "ref": "4.18.2", + "ref": "8368dc178af16b91b576c4c1d135f701a0007e5d", "workspace_root": ".", "languages": [ "JavaScript" diff --git a/benchmarks/tasks/flask-blueprint-registration-impact.task.json b/benchmarks/tasks/flask-blueprint-registration-impact.task.json index 79afcb29..3f0004a8 100644 --- a/benchmarks/tasks/flask-blueprint-registration-impact.task.json +++ b/benchmarks/tasks/flask-blueprint-registration-impact.task.json @@ -7,7 +7,7 @@ "repo": { "name": "flask", "url": "https://github.com/pallets/flask.git", - "ref": "3.1.1", + "ref": "7fff56f5172c48b6f3aedf17ee14ef5c2533dfd1", "workspace_root": ".", "languages": [ "Python" diff --git a/benchmarks/tasks/flask-request-dispatch-flow.task.json b/benchmarks/tasks/flask-request-dispatch-flow.task.json index 784028fd..0609c0b5 100644 --- a/benchmarks/tasks/flask-request-dispatch-flow.task.json +++ b/benchmarks/tasks/flask-request-dispatch-flow.task.json @@ -7,7 +7,7 @@ "repo": { "name": "flask", "url": "https://github.com/pallets/flask.git", - "ref": "3.1.1", + "ref": "7fff56f5172c48b6f3aedf17ee14ef5c2533dfd1", "workspace_root": ".", "languages": [ "Python" diff --git a/benchmarks/tasks/flask-routing-symbol-ownership.task.json b/benchmarks/tasks/flask-routing-symbol-ownership.task.json index 4204bba6..a328edf6 100644 --- a/benchmarks/tasks/flask-routing-symbol-ownership.task.json +++ b/benchmarks/tasks/flask-routing-symbol-ownership.task.json @@ -7,7 +7,7 @@ "repo": { "name": "flask", "url": "https://github.com/pallets/flask.git", - "ref": "3.1.1", + "ref": "7fff56f5172c48b6f3aedf17ee14ef5c2533dfd1", "workspace_root": ".", "languages": [ "Python" diff --git a/benchmarks/tasks/flask-session-cookie-edit-plan.task.json b/benchmarks/tasks/flask-session-cookie-edit-plan.task.json index cb46bb71..72649457 100644 --- a/benchmarks/tasks/flask-session-cookie-edit-plan.task.json +++ b/benchmarks/tasks/flask-session-cookie-edit-plan.task.json @@ -7,7 +7,7 @@ "repo": { "name": "flask", "url": "https://github.com/pallets/flask.git", - "ref": "3.1.1", + "ref": "7fff56f5172c48b6f3aedf17ee14ef5c2533dfd1", "workspace_root": ".", "languages": [ "Python" diff --git a/benchmarks/tasks/holdout-retrieval/axios-request-dispatch.task.json b/benchmarks/tasks/holdout-retrieval/axios-request-dispatch.task.json index 98d2859b..9e18d520 100644 --- a/benchmarks/tasks/holdout-retrieval/axios-request-dispatch.task.json +++ b/benchmarks/tasks/holdout-retrieval/axios-request-dispatch.task.json @@ -7,7 +7,7 @@ "repo": { "name": "axios", "url": "https://github.com/axios/axios.git", - "ref": "v1.6.8", + "ref": "ab3f0f9a94853c821cb00f1112788ecdd3ae7ed1", "workspace_root": ".", "languages": [ "JavaScript", diff --git a/benchmarks/tasks/holdout-retrieval/redis-server-event-loop.task.json b/benchmarks/tasks/holdout-retrieval/redis-server-event-loop.task.json index e0dadebd..d195d906 100644 --- a/benchmarks/tasks/holdout-retrieval/redis-server-event-loop.task.json +++ b/benchmarks/tasks/holdout-retrieval/redis-server-event-loop.task.json @@ -7,7 +7,7 @@ "repo": { "name": "redis", "url": "https://github.com/redis/redis.git", - "ref": "7.2.4", + "ref": "d2c8a4b91e8c0e6aefd1f5bc0bf582cddbe046b7", "workspace_root": ".", "languages": [ "C" diff --git a/benchmarks/tasks/holdout-retrieval/ripgrep-search-pipeline.task.json b/benchmarks/tasks/holdout-retrieval/ripgrep-search-pipeline.task.json index a1bd5a2d..2bf49837 100644 --- a/benchmarks/tasks/holdout-retrieval/ripgrep-search-pipeline.task.json +++ b/benchmarks/tasks/holdout-retrieval/ripgrep-search-pipeline.task.json @@ -7,7 +7,7 @@ "repo": { "name": "ripgrep", "url": "https://github.com/BurntSushi/ripgrep.git", - "ref": "14.1.0", + "ref": "e50df40a1967708b9781486b1c017e48040bceb0", "workspace_root": ".", "languages": [ "Rust" diff --git a/benchmarks/tasks/language-expansion-holdout/language-support-ab.task.json b/benchmarks/tasks/language-expansion-holdout/language-support-ab.task.json new file mode 100644 index 00000000..2d07a836 --- /dev/null +++ b/benchmarks/tasks/language-expansion-holdout/language-support-ab.task.json @@ -0,0 +1,1706 @@ +{ + "tasks": [ + { + "id": "python-requests-session-flow", + "version": 1, + "suite": "language-expansion-holdout", + "task_class": "architecture_explanation", + "repo": { + "name": "psf-requests", + "url": "https://github.com/psf/requests.git", + "ref": "6f66281a1d6326b1b9c4ac09ca30de0fc4e6ef43", + "workspace_root": ".", + "languages": [ + "Python" + ], + "setup": [ + "No dependency setup is required for read-only benchmark inspection." + ] + }, + "prompt": "Explain how Requests turns a top-level request call into a prepared request and sends it through a session adapter. Cite the source files and name the supporting symbols.", + "expected_files": [ + "src/requests/api.py", + "src/requests/sessions.py", + "src/requests/models.py", + "src/requests/adapters.py" + ], + "expected_symbols": [ + { + "name": "request", + "path": "src/requests/api.py", + "kind": "function", + "why": "Top-level helper that creates a Session and delegates request execution." + }, + { + "name": "Session", + "path": "src/requests/sessions.py", + "kind": "class", + "why": "Owns request preparation, adapter lookup, and sending." + }, + { + "name": "Session.request", + "path": "src/requests/sessions.py", + "kind": "method", + "why": "Builds and prepares the Request before sending." + }, + { + "name": "Session.send", + "path": "src/requests/sessions.py", + "kind": "method", + "why": "Dispatches a PreparedRequest through the selected adapter." + }, + { + "name": "PreparedRequest.prepare", + "path": "src/requests/models.py", + "kind": "method", + "why": "Materializes method, URL, headers, body, auth, cookies, and hooks." + }, + { + "name": "HTTPAdapter.send", + "path": "src/requests/adapters.py", + "kind": "method", + "why": "Performs the transport-level send." + } + ], + "expected_claims": [ + { + "text": "The top-level request helper opens a Session and delegates to Session.request." + }, + { + "text": "Session.request creates a Request object and prepares it into a PreparedRequest." + }, + { + "text": "Session.send chooses an adapter and calls the adapter send method." + }, + { + "text": "HTTPAdapter.send is the transport boundary that returns the response." + } + ], + "forbidden_claims": [ + { + "text": "The top-level request helper sends directly over the network without a Session.", + "severity": "critical" + } + ], + "quality_thresholds": { + "min_expected_anchor_recall": 0.62, + "min_expected_file_recall": 0.6, + "min_expected_symbol_recall": 0.55, + "min_expected_claim_recall": 0.65, + "min_citation_coverage": 0.6, + "max_forbidden_claims": 0 + }, + "tags": [ + "language-expansion", + "python", + "http" + ] + }, + { + "id": "java-commons-lang-string-utils", + "version": 1, + "suite": "language-expansion-holdout", + "task_class": "architecture_explanation", + "repo": { + "name": "apache-commons-lang", + "url": "https://github.com/apache/commons-lang.git", + "ref": "57f39420fef8413ea42f045f1bdba4864ff75a0c", + "workspace_root": ".", + "languages": [ + "Java" + ], + "setup": [ + "No dependency setup is required for read-only benchmark inspection." + ] + }, + "prompt": "Explain how Commons Lang implements blank, empty, and case-sensitive string checks across StringUtils, Strings, and CharSequenceUtils. Cite the source files and name the supporting symbols.", + "expected_files": [ + "src/main/java/org/apache/commons/lang3/StringUtils.java", + "src/main/java/org/apache/commons/lang3/Strings.java", + "src/main/java/org/apache/commons/lang3/CharSequenceUtils.java" + ], + "expected_symbols": [ + { + "name": "StringUtils", + "path": "src/main/java/org/apache/commons/lang3/StringUtils.java", + "kind": "class", + "why": "Public string helper surface." + }, + { + "name": "StringUtils.isBlank", + "path": "src/main/java/org/apache/commons/lang3/StringUtils.java", + "kind": "method", + "why": "Blank predicate under test." + }, + { + "name": "StringUtils.isEmpty", + "path": "src/main/java/org/apache/commons/lang3/StringUtils.java", + "kind": "method", + "why": "Empty predicate under test." + }, + { + "name": "Strings", + "path": "src/main/java/org/apache/commons/lang3/Strings.java", + "kind": "class", + "why": "Case-sensitive and case-insensitive string operations." + }, + { + "name": "CharSequenceUtils.regionMatches", + "path": "src/main/java/org/apache/commons/lang3/CharSequenceUtils.java", + "kind": "method", + "why": "Shared region comparison helper." + } + ], + "expected_claims": [ + { + "text": "StringUtils.isBlank treats null, empty, and whitespace-only inputs as blank." + }, + { + "text": "StringUtils.isEmpty does not trim whitespace before deciding emptiness." + }, + { + "text": "Strings delegates region matching work to CharSequenceUtils.regionMatches." + } + ], + "forbidden_claims": [ + { + "text": "StringUtils.isEmpty treats whitespace-only strings as empty.", + "severity": "major" + } + ], + "quality_thresholds": { + "min_expected_anchor_recall": 0.62, + "min_expected_file_recall": 0.6, + "min_expected_symbol_recall": 0.55, + "min_expected_claim_recall": 0.65, + "min_citation_coverage": 0.6, + "max_forbidden_claims": 0 + }, + "tags": [ + "language-expansion", + "java", + "strings" + ] + }, + { + "id": "rust-ripgrep-search-pipeline", + "version": 1, + "suite": "language-expansion-holdout", + "task_class": "architecture_explanation", + "repo": { + "name": "BurntSushi-ripgrep", + "url": "https://github.com/BurntSushi/ripgrep.git", + "ref": "82313cf95849bfe425109ad9506a52154879b1b1", + "workspace_root": ".", + "languages": [ + "Rust" + ], + "setup": [ + "No dependency setup is required for read-only benchmark inspection." + ] + }, + "prompt": "Explain how ripgrep parses CLI flags, walks candidate files, and executes a search over each haystack through matcher, searcher, and printer components. Cite the source files and name the supporting symbols.", + "expected_files": [ + "crates/core/main.rs", + "crates/core/flags/mod.rs", + "crates/core/flags/hiargs.rs", + "crates/core/haystack.rs", + "crates/core/search.rs" + ], + "expected_symbols": [ + { + "name": "main", + "path": "crates/core/main.rs", + "kind": "function", + "why": "CLI entry point." + }, + { + "name": "run", + "path": "crates/core/main.rs", + "kind": "function", + "why": "Routes parsed args to the selected operating mode." + }, + { + "name": "search", + "path": "crates/core/main.rs", + "kind": "function", + "why": "Single-threaded walk and per-file search loop." + }, + { + "name": "SearchWorker", + "path": "crates/core/search.rs", + "kind": "struct", + "why": "Coordinates matcher, searcher, and printer." + }, + { + "name": "SearchWorker::search", + "path": "crates/core/search.rs", + "kind": "method", + "why": "Executes one haystack search." + } + ], + "expected_claims": [ + { + "text": "main calls run after flags are parsed." + }, + { + "text": "HiArgs builds walkers, matchers, searchers, and printers for the search driver." + }, + { + "text": "SearchWorker connects a matcher, grep searcher, and printer for each haystack." + }, + { + "text": "Parallel search uses the walker parallel builder to distribute file work." + } + ], + "forbidden_claims": [ + { + "text": "Flag parsing happens inside SearchWorker instead of before run selects a mode.", + "severity": "major" + } + ], + "quality_thresholds": { + "min_expected_anchor_recall": 0.62, + "min_expected_file_recall": 0.6, + "min_expected_symbol_recall": 0.55, + "min_expected_claim_recall": 0.65, + "min_citation_coverage": 0.6, + "max_forbidden_claims": 0 + }, + "tags": [ + "language-expansion", + "rust", + "search" + ] + }, + { + "id": "javascript-express-routing-flow", + "version": 1, + "suite": "language-expansion-holdout", + "task_class": "route_tracing", + "repo": { + "name": "expressjs-express", + "url": "https://github.com/expressjs/express.git", + "ref": "dae209ae6559c29cfca2a1f4414c51d89ea643d5", + "workspace_root": ".", + "languages": [ + "JavaScript" + ], + "setup": [ + "No dependency setup is required for read-only benchmark inspection." + ] + }, + "prompt": "Trace how Express creates an application, registers middleware/routes, and handles an incoming request through the router and response helpers. Cite the source files and name the supporting symbols.", + "expected_files": [ + "lib/express.js", + "lib/application.js", + "lib/request.js", + "lib/response.js" + ], + "expected_symbols": [ + { + "name": "createApplication", + "path": "lib/express.js", + "kind": "function", + "why": "Creates the callable app object." + }, + { + "name": "app.init", + "path": "lib/application.js", + "kind": "method", + "why": "Initializes app state." + }, + { + "name": "app.handle", + "path": "lib/application.js", + "kind": "method", + "why": "Handles incoming requests." + }, + { + "name": "app.use", + "path": "lib/application.js", + "kind": "method", + "why": "Registers middleware." + }, + { + "name": "app.route", + "path": "lib/application.js", + "kind": "method", + "why": "Creates route entries." + }, + { + "name": "res.send", + "path": "lib/response.js", + "kind": "method", + "why": "Sends response bodies." + } + ], + "expected_claims": [ + { + "text": "createApplication builds a callable app object and mixes in request and response prototypes." + }, + { + "text": "app.use registers middleware on the router." + }, + { + "text": "app.handle delegates request handling to the router." + }, + { + "text": "res.send prepares and sends the response body." + } + ], + "forbidden_claims": [ + { + "text": "Express handles requests without delegating to a router.", + "severity": "critical" + } + ], + "quality_thresholds": { + "min_expected_anchor_recall": 0.62, + "min_expected_file_recall": 0.6, + "min_expected_symbol_recall": 0.55, + "min_expected_claim_recall": 0.65, + "min_citation_coverage": 0.6, + "max_forbidden_claims": 0 + }, + "tags": [ + "language-expansion", + "javascript", + "routing" + ] + }, + { + "id": "typescript-swr-hook-flow", + "version": 1, + "suite": "language-expansion-holdout", + "task_class": "architecture_explanation", + "repo": { + "name": "vercel-swr", + "url": "https://github.com/vercel/swr.git", + "ref": "f8d4995ac555f02a2784c8fc40bc819782c60568", + "workspace_root": ".", + "languages": [ + "TypeScript" + ], + "setup": [ + "No dependency setup is required for read-only benchmark inspection." + ] + }, + "prompt": "Explain how SWR exposes useSWR, serializes keys, connects cache helpers, and routes mutate behavior through the internal mutation helper. Cite the source files and name the supporting symbols.", + "expected_files": [ + "src/index/use-swr.ts", + "src/index/index.ts", + "src/_internal/utils/serialize.ts", + "src/_internal/utils/helper.ts", + "src/_internal/utils/mutate.ts", + "src/_internal/utils/with-middleware.ts" + ], + "expected_symbols": [ + { + "name": "useSWRHandler", + "path": "src/index/use-swr.ts", + "kind": "function", + "why": "Core hook implementation." + }, + { + "name": "useSWR", + "path": "src/index/index.ts", + "kind": "function", + "why": "Public default hook export." + }, + { + "name": "serialize", + "path": "src/_internal/utils/serialize.ts", + "kind": "function", + "why": "Normalizes cache keys." + }, + { + "name": "createCacheHelper", + "path": "src/_internal/utils/helper.ts", + "kind": "function", + "why": "Builds cache accessors for SWR state." + }, + { + "name": "internalMutate", + "path": "src/_internal/utils/mutate.ts", + "kind": "function", + "why": "Shared mutation path." + }, + { + "name": "withMiddleware", + "path": "src/_internal/utils/with-middleware.ts", + "kind": "function", + "why": "Wraps hooks with middleware." + } + ], + "expected_claims": [ + { + "text": "The public useSWR export wraps useSWRHandler with argument normalization." + }, + { + "text": "useSWRHandler serializes the key before reading cache state." + }, + { + "text": "createCacheHelper provides cache get, set, subscribe, and snapshot helpers." + }, + { + "text": "mutate behavior flows through internalMutate." + } + ], + "forbidden_claims": [ + { + "text": "SWR stores hook state without serializing the cache key.", + "severity": "major" + } + ], + "quality_thresholds": { + "min_expected_anchor_recall": 0.62, + "min_expected_file_recall": 0.6, + "min_expected_symbol_recall": 0.55, + "min_expected_claim_recall": 0.65, + "min_citation_coverage": 0.6, + "max_forbidden_claims": 0 + }, + "tags": [ + "language-expansion", + "typescript", + "react" + ] + }, + { + "id": "cpp-fmt-formatting-flow", + "version": 1, + "suite": "language-expansion-holdout", + "task_class": "architecture_explanation", + "repo": { + "name": "fmtlib-fmt", + "url": "https://github.com/fmtlib/fmt.git", + "ref": "e8deaf2ec3b53ced589fce6f640061e5b32eeeaa", + "workspace_root": ".", + "languages": [ + "C++" + ], + "setup": [ + "No dependency setup is required for read-only benchmark inspection." + ] + }, + "prompt": "Explain how fmt turns formatting arguments into type-erased format args and reaches vformat or format_to output paths. Cite the source files and name the supporting symbols.", + "expected_files": [ + "include/fmt/base.h", + "include/fmt/format.h", + "include/fmt/args.h", + "src/format.cc", + "src/os.cc" + ], + "expected_symbols": [ + { + "name": "vformat", + "path": "include/fmt/format.h", + "kind": "function", + "why": "Central type-erased formatting entry point." + }, + { + "name": "format_to", + "path": "include/fmt/base.h", + "kind": "function", + "why": "Writes formatted output to an iterator." + }, + { + "name": "format_arg_store", + "path": "include/fmt/base.h", + "kind": "template", + "why": "Stores type-erased formatting arguments." + }, + { + "name": "dynamic_format_arg_store", + "path": "include/fmt/args.h", + "kind": "class", + "why": "Runtime-owned argument store." + }, + { + "name": "format_error", + "path": "include/fmt/format.h", + "kind": "class", + "why": "Reports formatting failures." + } + ], + "expected_claims": [ + { + "text": "fmt builds type-erased format argument stores before dispatching formatting." + }, + { + "text": "vformat is the central formatting path for runtime format arguments." + }, + { + "text": "format_to writes formatted output through an output iterator." + }, + { + "text": "format_error represents formatting failures." + } + ], + "forbidden_claims": [ + { + "text": "fmt only supports compile-time formatting and has no runtime vformat path.", + "severity": "critical" + } + ], + "quality_thresholds": { + "min_expected_anchor_recall": 0.62, + "min_expected_file_recall": 0.6, + "min_expected_symbol_recall": 0.55, + "min_expected_claim_recall": 0.65, + "min_citation_coverage": 0.6, + "max_forbidden_claims": 0 + }, + "tags": [ + "language-expansion", + "cpp", + "formatting" + ] + }, + { + "id": "c-redis-command-loop", + "version": 1, + "suite": "language-expansion-holdout", + "task_class": "route_tracing", + "repo": { + "name": "redis-redis", + "url": "https://github.com/redis/redis.git", + "ref": "df63a65d4d4ee33ae67e9f101885074febe0bccb", + "workspace_root": ".", + "languages": [ + "C" + ], + "setup": [ + "No dependency setup is required for read-only benchmark inspection." + ] + }, + "prompt": "Trace how Redis initializes the server, enters the event loop, reads client input, and routes a command for execution. Cite the source files and name the supporting symbols.", + "expected_files": [ + "src/server.c", + "src/server.h", + "src/ae.c", + "src/networking.c" + ], + "expected_symbols": [ + { + "name": "main", + "path": "src/server.c", + "kind": "function", + "why": "Server process entry point." + }, + { + "name": "initServer", + "path": "src/server.c", + "kind": "function", + "why": "Initializes server state and event handlers." + }, + { + "name": "aeMain", + "path": "src/ae.c", + "kind": "function", + "why": "Event loop runner." + }, + { + "name": "readQueryFromClient", + "path": "src/networking.c", + "kind": "function", + "why": "Reads client query buffers." + }, + { + "name": "processCommand", + "path": "src/server.c", + "kind": "function", + "why": "Routes parsed commands to execution." + } + ], + "expected_claims": [ + { + "text": "Redis main initializes server state before entering the event loop." + }, + { + "text": "aeMain drives registered file and time events." + }, + { + "text": "readQueryFromClient reads client input into query buffers." + }, + { + "text": "processCommand performs command routing and execution checks." + } + ], + "forbidden_claims": [ + { + "text": "Redis executes commands directly from the TCP accept path without the event loop.", + "severity": "critical" + } + ], + "quality_thresholds": { + "min_expected_anchor_recall": 0.62, + "min_expected_file_recall": 0.6, + "min_expected_symbol_recall": 0.55, + "min_expected_claim_recall": 0.65, + "min_citation_coverage": 0.6, + "max_forbidden_claims": 0 + }, + "tags": [ + "language-expansion", + "c", + "server" + ] + }, + { + "id": "go-gin-route-dispatch", + "version": 1, + "suite": "language-expansion-holdout", + "task_class": "route_tracing", + "repo": { + "name": "gin-gonic-gin", + "url": "https://github.com/gin-gonic/gin.git", + "ref": "d75fcd4c9ab260e5225de590f1f0f8c0e0e12d11", + "workspace_root": ".", + "languages": [ + "Go" + ], + "setup": [ + "No dependency setup is required for read-only benchmark inspection." + ] + }, + "prompt": "Trace how Gin creates an engine, registers routes through router groups, stores them in method trees, and dispatches handlers for a request. Cite the source files and name the supporting symbols.", + "expected_files": [ + "gin.go", + "routergroup.go", + "tree.go", + "context.go" + ], + "expected_symbols": [ + { + "name": "New", + "path": "gin.go", + "kind": "function", + "why": "Creates an Engine." + }, + { + "name": "Default", + "path": "gin.go", + "kind": "function", + "why": "Creates an Engine with default middleware." + }, + { + "name": "RouterGroup.Handle", + "path": "routergroup.go", + "kind": "method", + "why": "Public route registration API." + }, + { + "name": "Engine.addRoute", + "path": "gin.go", + "kind": "method", + "why": "Adds route handlers to method trees." + }, + { + "name": "node.addRoute", + "path": "tree.go", + "kind": "method", + "why": "Inserts a route into the radix tree." + }, + { + "name": "Engine.handleHTTPRequest", + "path": "gin.go", + "kind": "method", + "why": "Matches request paths and dispatches handlers." + }, + { + "name": "Context.Next", + "path": "context.go", + "kind": "method", + "why": "Runs the handler chain." + } + ], + "expected_claims": [ + { + "text": "RouterGroup.Handle registers routes by delegating to the group handle path." + }, + { + "text": "Engine.addRoute inserts handlers into the per-method route tree." + }, + { + "text": "Engine.handleHTTPRequest finds a route and installs handlers on the context." + }, + { + "text": "Context.Next advances through the handler chain." + } + ], + "forbidden_claims": [ + { + "text": "Gin dispatches handlers without storing routes in method trees.", + "severity": "critical" + } + ], + "quality_thresholds": { + "min_expected_anchor_recall": 0.62, + "min_expected_file_recall": 0.6, + "min_expected_symbol_recall": 0.55, + "min_expected_claim_recall": 0.65, + "min_citation_coverage": 0.6, + "max_forbidden_claims": 0 + }, + "tags": [ + "language-expansion", + "go", + "routing" + ] + }, + { + "id": "ruby-jekyll-site-build", + "version": 1, + "suite": "language-expansion-holdout", + "task_class": "route_tracing", + "repo": { + "name": "jekyll-jekyll", + "url": "https://github.com/jekyll/jekyll.git", + "ref": "202df571314ba1d18e9fccd81d12aaad4a703c38", + "workspace_root": ".", + "languages": [ + "Ruby" + ], + "setup": [ + "No dependency setup is required for read-only benchmark inspection." + ] + }, + "prompt": "Trace how Jekyll's build command creates a site and runs the read, generate, render, and write phases. Cite the source files and name the supporting symbols.", + "expected_files": [ + "lib/jekyll/commands/build.rb", + "lib/jekyll/site.rb", + "lib/jekyll/reader.rb", + "lib/jekyll/renderer.rb" + ], + "expected_symbols": [ + { + "name": "Jekyll::Commands::Build.process", + "path": "lib/jekyll/commands/build.rb", + "kind": "method", + "why": "Build command entry point." + }, + { + "name": "Jekyll::Site", + "path": "lib/jekyll/site.rb", + "kind": "class", + "why": "Coordinates the site build." + }, + { + "name": "Site#process", + "path": "lib/jekyll/site.rb", + "kind": "method", + "why": "Runs the full build pipeline." + }, + { + "name": "Site#read", + "path": "lib/jekyll/site.rb", + "kind": "method", + "why": "Starts content reading." + }, + { + "name": "Site#render", + "path": "lib/jekyll/site.rb", + "kind": "method", + "why": "Starts rendering." + }, + { + "name": "Site#write", + "path": "lib/jekyll/site.rb", + "kind": "method", + "why": "Writes generated output." + } + ], + "expected_claims": [ + { + "text": "Build.process constructs or processes a Jekyll site." + }, + { + "text": "Site.process runs reset, read, generate, render, cleanup, and write phases." + }, + { + "text": "Reader is responsible for reading site content." + }, + { + "text": "Renderer renders pages and documents." + } + ], + "forbidden_claims": [ + { + "text": "Jekyll writes output before reading and rendering the site.", + "severity": "major" + } + ], + "quality_thresholds": { + "min_expected_anchor_recall": 0.62, + "min_expected_file_recall": 0.6, + "min_expected_symbol_recall": 0.55, + "min_expected_claim_recall": 0.65, + "min_citation_coverage": 0.6, + "max_forbidden_claims": 0 + }, + "tags": [ + "language-expansion", + "ruby", + "static-site" + ] + }, + { + "id": "php-monolog-record-flow", + "version": 1, + "suite": "language-expansion-holdout", + "task_class": "data_flow", + "repo": { + "name": "Seldaek-monolog", + "url": "https://github.com/Seldaek/monolog.git", + "ref": "04c3499db98d7471abd9261dc83232f8fe1a252d", + "workspace_root": ".", + "languages": [ + "PHP" + ], + "setup": [ + "No dependency setup is required for read-only benchmark inspection." + ] + }, + "prompt": "Explain how Monolog turns a log call into a LogRecord and passes it through handlers. Cite the source files and name the supporting symbols.", + "expected_files": [ + "src/Monolog/Logger.php", + "src/Monolog/LogRecord.php", + "src/Monolog/Handler/HandlerInterface.php", + "src/Monolog/Handler/AbstractProcessingHandler.php" + ], + "expected_symbols": [ + { + "name": "Logger", + "path": "src/Monolog/Logger.php", + "kind": "class", + "why": "Public logging API and handler owner." + }, + { + "name": "Logger::pushHandler", + "path": "src/Monolog/Logger.php", + "kind": "method", + "why": "Registers handlers." + }, + { + "name": "Logger::addRecord", + "path": "src/Monolog/Logger.php", + "kind": "method", + "why": "Creates records and invokes handlers." + }, + { + "name": "Logger::log", + "path": "src/Monolog/Logger.php", + "kind": "method", + "why": "Generic log entry point." + }, + { + "name": "LogRecord", + "path": "src/Monolog/LogRecord.php", + "kind": "class", + "why": "Carries structured log data." + }, + { + "name": "AbstractProcessingHandler::handle", + "path": "src/Monolog/Handler/AbstractProcessingHandler.php", + "kind": "method", + "why": "Processes records before writing." + } + ], + "expected_claims": [ + { + "text": "Logger owns a stack of handlers registered by pushHandler." + }, + { + "text": "Logger::log delegates into addRecord." + }, + { + "text": "addRecord creates a LogRecord before passing it to handlers." + }, + { + "text": "AbstractProcessingHandler handles records by processing and writing them." + } + ], + "forbidden_claims": [ + { + "text": "Monolog sends log messages directly to handlers without creating a LogRecord.", + "severity": "critical" + } + ], + "quality_thresholds": { + "min_expected_anchor_recall": 0.62, + "min_expected_file_recall": 0.6, + "min_expected_symbol_recall": 0.55, + "min_expected_claim_recall": 0.65, + "min_citation_coverage": 0.6, + "max_forbidden_claims": 0 + }, + "tags": [ + "language-expansion", + "php", + "logging" + ] + }, + { + "id": "csharp-automapper-map-flow", + "version": 1, + "suite": "language-expansion-holdout", + "task_class": "data_flow", + "repo": { + "name": "AutoMapper-AutoMapper", + "url": "https://github.com/AutoMapper/AutoMapper.git", + "ref": "b57c206dc7291821e42bdf816a5637a5c1d8cb54", + "workspace_root": ".", + "languages": [ + "C#" + ], + "setup": [ + "No dependency setup is required for read-only benchmark inspection." + ] + }, + "prompt": "Explain how AutoMapper configuration and runtime mapper APIs cooperate to map source objects to destination objects. Cite the source files and name the supporting symbols.", + "expected_files": [ + "src/AutoMapper/Mapper.cs", + "src/AutoMapper/Configuration/MapperConfiguration.cs", + "src/AutoMapper/TypeMap.cs", + "src/AutoMapper/Execution/TypeMapPlanBuilder.cs" + ], + "expected_symbols": [ + { + "name": "IMapperBase", + "path": "src/AutoMapper/Mapper.cs", + "kind": "interface", + "why": "Base mapping API." + }, + { + "name": "IMapper", + "path": "src/AutoMapper/Mapper.cs", + "kind": "interface", + "why": "Runtime mapper API." + }, + { + "name": "Mapper", + "path": "src/AutoMapper/Mapper.cs", + "kind": "class", + "why": "Runtime mapper implementation." + }, + { + "name": "Mapper.Map", + "path": "src/AutoMapper/Mapper.cs", + "kind": "method", + "why": "Public object mapping entry point." + }, + { + "name": "MapperConfiguration", + "path": "src/AutoMapper/Configuration/MapperConfiguration.cs", + "kind": "class", + "why": "Builds and owns mapping configuration." + }, + { + "name": "TypeMap.CreateMapperLambda", + "path": "src/AutoMapper/TypeMap.cs", + "kind": "method", + "why": "Creates mapping execution plan." + } + ], + "expected_claims": [ + { + "text": "MapperConfiguration builds and owns the mapping configuration used at runtime." + }, + { + "text": "Mapper.Map is the public runtime entry point for object mapping." + }, + { + "text": "TypeMap contributes mapper lambda plans used by the execution pipeline." + }, + { + "text": "TypeMapPlanBuilder participates in building expression plans for mappings." + } + ], + "forbidden_claims": [ + { + "text": "AutoMapper maps objects without using configuration or type maps.", + "severity": "critical" + } + ], + "quality_thresholds": { + "min_expected_anchor_recall": 0.62, + "min_expected_file_recall": 0.6, + "min_expected_symbol_recall": 0.55, + "min_expected_claim_recall": 0.65, + "min_citation_coverage": 0.6, + "max_forbidden_claims": 0 + }, + "tags": [ + "language-expansion", + "csharp", + "mapping" + ] + }, + { + "id": "kotlin-okio-buffer-flow", + "version": 1, + "suite": "language-expansion-holdout", + "task_class": "data_flow", + "repo": { + "name": "square-okio", + "url": "https://github.com/square/okio.git", + "ref": "722c8be0043d99b7b08d169b0ae90a24c15267ff", + "workspace_root": ".", + "languages": [ + "Kotlin" + ], + "setup": [ + "No dependency setup is required for read-only benchmark inspection." + ] + }, + "prompt": "Explain how Okio's Buffer, Source, Sink, and buffered wrappers cooperate to move bytes through reads and writes. Cite the source files and name the supporting symbols.", + "expected_files": [ + "okio/src/commonMain/kotlin/okio/Buffer.kt", + "okio/src/commonMain/kotlin/okio/BufferedSource.kt", + "okio/src/commonMain/kotlin/okio/BufferedSink.kt", + "okio/src/commonMain/kotlin/okio/RealBufferedSource.kt", + "okio/src/commonMain/kotlin/okio/RealBufferedSink.kt", + "okio/src/commonMain/kotlin/okio/Okio.kt" + ], + "expected_symbols": [ + { + "name": "Buffer", + "path": "okio/src/commonMain/kotlin/okio/Buffer.kt", + "kind": "class", + "why": "In-memory byte queue." + }, + { + "name": "Buffer.read", + "path": "okio/src/commonMain/kotlin/okio/Buffer.kt", + "kind": "function", + "why": "Reads bytes from the buffer." + }, + { + "name": "Buffer.write", + "path": "okio/src/commonMain/kotlin/okio/Buffer.kt", + "kind": "function", + "why": "Writes bytes into the buffer." + }, + { + "name": "RealBufferedSource", + "path": "okio/src/commonMain/kotlin/okio/RealBufferedSource.kt", + "kind": "class", + "why": "Buffered source implementation." + }, + { + "name": "RealBufferedSink", + "path": "okio/src/commonMain/kotlin/okio/RealBufferedSink.kt", + "kind": "class", + "why": "Buffered sink implementation." + }, + { + "name": "buffer", + "path": "okio/src/commonMain/kotlin/okio/Okio.kt", + "kind": "function", + "why": "Wraps raw sources or sinks in buffered implementations." + } + ], + "expected_claims": [ + { + "text": "Buffer is the in-memory byte store used by Okio reads and writes." + }, + { + "text": "RealBufferedSource reads from an upstream Source into a Buffer." + }, + { + "text": "RealBufferedSink writes buffered bytes to an upstream Sink." + }, + { + "text": "Okio buffer helpers wrap Source and Sink instances with buffered implementations." + } + ], + "forbidden_claims": [ + { + "text": "Okio buffered wrappers bypass Buffer and write every byte directly.", + "severity": "critical" + } + ], + "quality_thresholds": { + "min_expected_anchor_recall": 0.62, + "min_expected_file_recall": 0.6, + "min_expected_symbol_recall": 0.55, + "min_expected_claim_recall": 0.65, + "min_citation_coverage": 0.6, + "max_forbidden_claims": 0 + }, + "tags": [ + "language-expansion", + "kotlin", + "io" + ] + }, + { + "id": "swift-alamofire-request-flow", + "version": 1, + "suite": "language-expansion-holdout", + "task_class": "route_tracing", + "repo": { + "name": "Alamofire-Alamofire", + "url": "https://github.com/Alamofire/Alamofire.git", + "ref": "7595cbcf59809f9977c5f6378500de2ad73b7ddb", + "workspace_root": ".", + "languages": [ + "Swift" + ], + "setup": [ + "No dependency setup is required for read-only benchmark inspection." + ] + }, + "prompt": "Trace how Alamofire's Session creates requests, resumes tasks, validates data requests, and receives URLSession callbacks. Cite the source files and name the supporting symbols.", + "expected_files": [ + "Source/Core/Session.swift", + "Source/Core/Request.swift", + "Source/Core/DataRequest.swift", + "Source/Core/SessionDelegate.swift" + ], + "expected_symbols": [ + { + "name": "Session", + "path": "Source/Core/Session.swift", + "kind": "class", + "why": "Owns request creation and URLSession integration." + }, + { + "name": "Session.request", + "path": "Source/Core/Session.swift", + "kind": "method", + "why": "Creates DataRequest instances." + }, + { + "name": "Request.resume", + "path": "Source/Core/Request.swift", + "kind": "method", + "why": "Resumes the underlying task." + }, + { + "name": "DataRequest", + "path": "Source/Core/DataRequest.swift", + "kind": "class", + "why": "Represents data request behavior." + }, + { + "name": "DataRequest.validate", + "path": "Source/Core/DataRequest.swift", + "kind": "method", + "why": "Adds validation." + }, + { + "name": "SessionDelegate", + "path": "Source/Core/SessionDelegate.swift", + "kind": "class", + "why": "Handles URLSession delegate callbacks." + } + ], + "expected_claims": [ + { + "text": "Session creates request objects such as DataRequest." + }, + { + "text": "Request.resume resumes the underlying URLSession task." + }, + { + "text": "DataRequest.validate attaches validation behavior." + }, + { + "text": "SessionDelegate receives URLSession callback events." + } + ], + "forbidden_claims": [ + { + "text": "Alamofire sends requests without a Session or URLSession delegate path.", + "severity": "critical" + } + ], + "quality_thresholds": { + "min_expected_anchor_recall": 0.62, + "min_expected_file_recall": 0.6, + "min_expected_symbol_recall": 0.55, + "min_expected_claim_recall": 0.65, + "min_citation_coverage": 0.6, + "max_forbidden_claims": 0 + }, + "tags": [ + "language-expansion", + "swift", + "http" + ] + }, + { + "id": "dart-http-client-flow", + "version": 1, + "suite": "language-expansion-holdout", + "task_class": "data_flow", + "repo": { + "name": "dart-lang-http", + "url": "https://github.com/dart-lang/http.git", + "ref": "89cec60a4249ae0a0316f7a50d37ac56597f52c3", + "workspace_root": ".", + "languages": [ + "Dart" + ], + "setup": [ + "No dependency setup is required for read-only benchmark inspection." + ] + }, + "prompt": "Explain how package:http exposes top-level helpers, BaseClient convenience methods, BaseRequest finalization, and IOClient send behavior. Cite the source files and name the supporting symbols.", + "expected_files": [ + "pkgs/http/lib/http.dart", + "pkgs/http/lib/src/client.dart", + "pkgs/http/lib/src/base_client.dart", + "pkgs/http/lib/src/base_request.dart", + "pkgs/http/lib/src/io_client.dart", + "pkgs/http/lib/src/request.dart", + "pkgs/http/lib/src/response.dart" + ], + "expected_symbols": [ + { + "name": "Client", + "path": "pkgs/http/lib/src/client.dart", + "kind": "class", + "why": "Public HTTP client API." + }, + { + "name": "Client.get", + "path": "pkgs/http/lib/src/client.dart", + "kind": "method", + "why": "Convenience request API." + }, + { + "name": "BaseClient", + "path": "pkgs/http/lib/src/base_client.dart", + "kind": "class", + "why": "Implements convenience methods over send." + }, + { + "name": "BaseRequest", + "path": "pkgs/http/lib/src/base_request.dart", + "kind": "class", + "why": "Base request object with finalization." + }, + { + "name": "BaseRequest.finalize", + "path": "pkgs/http/lib/src/base_request.dart", + "kind": "method", + "why": "Freezes request body before sending." + }, + { + "name": "IOClient.send", + "path": "pkgs/http/lib/src/io_client.dart", + "kind": "method", + "why": "dart:io transport implementation." + } + ], + "expected_claims": [ + { + "text": "Top-level package:http helpers delegate to a Client." + }, + { + "text": "BaseClient implements convenience methods in terms of send." + }, + { + "text": "BaseRequest.finalize prepares the request body for sending." + }, + { + "text": "IOClient.send is the dart:io transport implementation." + } + ], + "forbidden_claims": [ + { + "text": "package:http top-level helpers bypass Client and IOClient entirely.", + "severity": "critical" + } + ], + "quality_thresholds": { + "min_expected_anchor_recall": 0.62, + "min_expected_file_recall": 0.6, + "min_expected_symbol_recall": 0.55, + "min_expected_claim_recall": 0.65, + "min_citation_coverage": 0.6, + "max_forbidden_claims": 0 + }, + "tags": [ + "language-expansion", + "dart", + "http" + ] + }, + { + "id": "bash-nvm-install-dispatch", + "version": 1, + "suite": "language-expansion-holdout", + "task_class": "route_tracing", + "repo": { + "name": "nvm-sh-nvm", + "url": "https://github.com/nvm-sh/nvm.git", + "ref": "7079a5d61c2b49c7d35a72006860ce5edb0fac51", + "workspace_root": ".", + "languages": [ + "Bash" + ], + "setup": [ + "No dependency setup is required for read-only benchmark inspection." + ] + }, + "prompt": "Trace how nvm's install script bootstraps the shell function and how nvm.sh dispatches install, download, and use commands. Cite the source files and name the supporting symbols.", + "expected_files": [ + "install.sh", + "nvm.sh", + "bash_completion" + ], + "expected_symbols": [ + { + "name": "nvm_do_install", + "path": "install.sh", + "kind": "function", + "why": "Installer orchestration function." + }, + { + "name": "nvm", + "path": "nvm.sh", + "kind": "function", + "why": "Main command dispatcher." + }, + { + "name": "nvm_install_node", + "path": "install.sh", + "kind": "function", + "why": "Installs Node versions." + }, + { + "name": "nvm_download", + "path": "nvm.sh", + "kind": "function", + "why": "Downloads remote assets." + }, + { + "name": "nvm_use_if_needed", + "path": "nvm.sh", + "kind": "function", + "why": "Switches active version when needed." + } + ], + "expected_claims": [ + { + "text": "install.sh bootstraps installation and arranges for nvm.sh to be sourced." + }, + { + "text": "nvm.sh defines the main nvm dispatcher function." + }, + { + "text": "nvm_install_node and nvm_download participate in installing Node versions." + }, + { + "text": "nvm_use_if_needed switches versions only when the requested version is not already active." + } + ], + "forbidden_claims": [ + { + "text": "nvm is a compiled binary and does not dispatch through shell functions.", + "severity": "critical" + } + ], + "quality_thresholds": { + "min_expected_anchor_recall": 0.62, + "min_expected_file_recall": 0.6, + "min_expected_symbol_recall": 0.55, + "min_expected_claim_recall": 0.65, + "min_citation_coverage": 0.6, + "max_forbidden_claims": 0 + }, + "tags": [ + "language-expansion", + "bash", + "shell" + ] + }, + { + "id": "html-mdn-form-validation", + "version": 1, + "suite": "language-expansion-holdout", + "task_class": "architecture_explanation", + "repo": { + "name": "mdn-learning-area", + "url": "https://github.com/mdn/learning-area.git", + "ref": "ca1ff0bd06e12b96a6742ffdf040bb22966e5a5e", + "workspace_root": ".", + "languages": [ + "HTML" + ], + "setup": [ + "No dependency setup is required for read-only benchmark inspection." + ] + }, + "prompt": "Explain how the MDN form validation examples combine native HTML constraints with custom JavaScript validation. Cite the source files and name the supporting elements or functions.", + "expected_files": [ + "html/forms/form-validation/full-example.html", + "html/forms/form-validation/detailed-custom-validation.html", + "html/forms/form-validation/fruit-pattern.html", + "html/forms/form-validation/min-max.html" + ], + "expected_symbols": [ + { + "name": "form", + "path": "html/forms/form-validation/detailed-custom-validation.html", + "kind": "element", + "why": "Validation form root." + }, + { + "name": "input#mail", + "path": "html/forms/form-validation/detailed-custom-validation.html", + "kind": "element", + "why": "Email input validated by JavaScript." + }, + { + "name": "novalidate", + "path": "html/forms/form-validation/detailed-custom-validation.html", + "kind": "attribute", + "why": "Disables browser UI for custom validation." + }, + { + "name": "showError", + "path": "html/forms/form-validation/detailed-custom-validation.html", + "kind": "function", + "why": "Reports custom validation messages." + }, + { + "name": "pattern", + "path": "html/forms/form-validation/fruit-pattern.html", + "kind": "attribute", + "why": "Native pattern constraint." + }, + { + "name": "min", + "path": "html/forms/form-validation/min-max.html", + "kind": "attribute", + "why": "Native numeric constraint." + } + ], + "expected_claims": [ + { + "text": "The examples use native required, pattern, min, and max constraints." + }, + { + "text": "The detailed custom validation example uses novalidate to suppress the browser default UI." + }, + { + "text": "The showError function branches on ValidityState fields to choose messages." + }, + { + "text": "Submit handlers prevent submission when the form is invalid." + } + ], + "forbidden_claims": [ + { + "text": "The examples rely only on JavaScript and do not use native HTML constraints.", + "severity": "major" + } + ], + "quality_thresholds": { + "min_expected_anchor_recall": 0.62, + "min_expected_file_recall": 0.6, + "min_expected_symbol_recall": 0.55, + "min_expected_claim_recall": 0.65, + "min_citation_coverage": 0.6, + "max_forbidden_claims": 0 + }, + "tags": [ + "language-expansion", + "html", + "forms" + ] + }, + { + "id": "css-animate-base-and-keyframes", + "version": 1, + "suite": "language-expansion-holdout", + "task_class": "architecture_explanation", + "repo": { + "name": "animate-css-animate-css", + "url": "https://github.com/animate-css/animate.css.git", + "ref": "3f8ab233dbbd9d2fe577528d2296382954be3d1a", + "workspace_root": ".", + "languages": [ + "CSS" + ], + "setup": [ + "No dependency setup is required for read-only benchmark inspection." + ] + }, + "prompt": "Explain how animate.css defines shared animation variables/base classes and connects named animation classes to keyframes. Cite the source files and name the supporting selectors or keyframes.", + "expected_files": [ + "source/_vars.css", + "source/_base.css", + "source/animate.css", + "source/attention_seekers/bounce.css", + "source/attention_seekers/flash.css" + ], + "expected_symbols": [ + { + "name": "--animate-duration", + "path": "source/_vars.css", + "kind": "custom-property", + "why": "Shared animation duration variable." + }, + { + "name": "--animate-delay", + "path": "source/_vars.css", + "kind": "custom-property", + "why": "Shared animation delay variable." + }, + { + "name": ".animated", + "path": "source/_base.css", + "kind": "selector", + "why": "Base animation class." + }, + { + "name": "@keyframes bounce", + "path": "source/attention_seekers/bounce.css", + "kind": "keyframes", + "why": "Bounce animation keyframes." + }, + { + "name": ".bounce", + "path": "source/attention_seekers/bounce.css", + "kind": "selector", + "why": "Class that selects bounce keyframes." + }, + { + "name": "@keyframes flash", + "path": "source/attention_seekers/flash.css", + "kind": "keyframes", + "why": "Flash animation keyframes." + } + ], + "expected_claims": [ + { + "text": "Shared CSS custom properties define animation duration, delay, and repeat defaults." + }, + { + "text": ".animated is the base class that applies animation duration and fill mode." + }, + { + "text": "Named classes such as .bounce set animation-name to matching keyframes." + }, + { + "text": "The source/animate.css file imports the variable, base, and individual animation files." + } + ], + "forbidden_claims": [ + { + "text": "animate.css defines animations only in JavaScript and not in CSS keyframes.", + "severity": "critical" + } + ], + "quality_thresholds": { + "min_expected_anchor_recall": 0.62, + "min_expected_file_recall": 0.6, + "min_expected_symbol_recall": 0.55, + "min_expected_claim_recall": 0.65, + "min_citation_coverage": 0.6, + "max_forbidden_claims": 0 + }, + "tags": [ + "language-expansion", + "css", + "animation" + ] + }, + { + "id": "sql-chinook-schema-relations", + "version": 1, + "suite": "language-expansion-holdout", + "task_class": "data_flow", + "repo": { + "name": "lerocha-chinook-database", + "url": "https://github.com/lerocha/chinook-database.git", + "ref": "7f67772503d71ba90f19283c38e93923addb43fa", + "workspace_root": ".", + "languages": [ + "SQL" + ], + "setup": [ + "No dependency setup is required for read-only benchmark inspection." + ] + }, + "prompt": "Explain the core Chinook schema relationships between artists, albums, tracks, invoices, and invoice lines across the SQL seed scripts. Cite the source files and name the supporting tables or constraints.", + "expected_files": [ + "ChinookDatabase/DataSources/Chinook_Sqlite.sql", + "ChinookDatabase/DataSources/Chinook_MySql.sql", + "ChinookDatabase/DataSources/Chinook_PostgreSql.sql" + ], + "expected_symbols": [ + { + "name": "CREATE TABLE Artist", + "path": "ChinookDatabase/DataSources/Chinook_Sqlite.sql", + "kind": "table", + "why": "Artist table definition." + }, + { + "name": "CREATE TABLE Album", + "path": "ChinookDatabase/DataSources/Chinook_Sqlite.sql", + "kind": "table", + "why": "Album table definition." + }, + { + "name": "CREATE TABLE Track", + "path": "ChinookDatabase/DataSources/Chinook_Sqlite.sql", + "kind": "table", + "why": "Track table definition." + }, + { + "name": "CREATE TABLE InvoiceLine", + "path": "ChinookDatabase/DataSources/Chinook_Sqlite.sql", + "kind": "table", + "why": "Invoice line table definition." + }, + { + "name": "FOREIGN KEY", + "path": "ChinookDatabase/DataSources/Chinook_Sqlite.sql", + "kind": "constraint", + "why": "Relationship constraints between schema tables." + } + ], + "expected_claims": [ + { + "text": "Album rows reference Artist rows through ArtistId." + }, + { + "text": "Track rows reference Album, MediaType, and Genre rows." + }, + { + "text": "InvoiceLine rows reference Invoice and Track rows." + }, + { + "text": "The repository carries multiple SQL dialect scripts for the same Chinook schema." + } + ], + "forbidden_claims": [ + { + "text": "The Chinook SQL scripts define tables without foreign key relationships.", + "severity": "critical" + } + ], + "quality_thresholds": { + "min_expected_anchor_recall": 0.62, + "min_expected_file_recall": 0.6, + "min_expected_symbol_recall": 0.55, + "min_expected_claim_recall": 0.65, + "min_citation_coverage": 0.6, + "max_forbidden_claims": 0 + }, + "tags": [ + "language-expansion", + "sql", + "schema" + ] + } + ] +} diff --git a/benchmarks/tasks/manifest.schema.json b/benchmarks/tasks/manifest.schema.json index e449bc0c..d2fd3c71 100644 --- a/benchmarks/tasks/manifest.schema.json +++ b/benchmarks/tasks/manifest.schema.json @@ -65,7 +65,8 @@ }, "ref": { "type": "string", - "minLength": 1 + "pattern": "^[0-9a-fA-F]{40}$", + "description": "Full immutable Git commit SHA. Tags, branches, and short SHAs are not publishable benchmark provenance." }, "workspace_root": { "type": "string", diff --git a/benchmarks/tasks/mux-cors-middleware-edit-plan.task.json b/benchmarks/tasks/mux-cors-middleware-edit-plan.task.json index 5e4b1d13..d64fcf83 100644 --- a/benchmarks/tasks/mux-cors-middleware-edit-plan.task.json +++ b/benchmarks/tasks/mux-cors-middleware-edit-plan.task.json @@ -7,7 +7,7 @@ "repo": { "name": "mux", "url": "https://github.com/gorilla/mux.git", - "ref": "v1.8.1", + "ref": "b4617d0b9670ad14039b2739167fd35a60f557c5", "workspace_root": ".", "languages": [ "Go" diff --git a/benchmarks/tasks/mux-route-regexp-bug-localization.task.json b/benchmarks/tasks/mux-route-regexp-bug-localization.task.json index ae9cc13a..66d03820 100644 --- a/benchmarks/tasks/mux-route-regexp-bug-localization.task.json +++ b/benchmarks/tasks/mux-route-regexp-bug-localization.task.json @@ -7,7 +7,7 @@ "repo": { "name": "mux", "url": "https://github.com/gorilla/mux.git", - "ref": "v1.8.1", + "ref": "b4617d0b9670ad14039b2739167fd35a60f557c5", "workspace_root": ".", "languages": [ "Go" diff --git a/benchmarks/tasks/mux-router-matching-flow.task.json b/benchmarks/tasks/mux-router-matching-flow.task.json index 0491c2a4..a9733519 100644 --- a/benchmarks/tasks/mux-router-matching-flow.task.json +++ b/benchmarks/tasks/mux-router-matching-flow.task.json @@ -7,7 +7,7 @@ "repo": { "name": "mux", "url": "https://github.com/gorilla/mux.git", - "ref": "v1.8.1", + "ref": "b4617d0b9670ad14039b2739167fd35a60f557c5", "workspace_root": ".", "languages": [ "Go" diff --git a/benchmarks/tasks/mux-strict-slash-change-impact.task.json b/benchmarks/tasks/mux-strict-slash-change-impact.task.json index 3f54620e..2458879a 100644 --- a/benchmarks/tasks/mux-strict-slash-change-impact.task.json +++ b/benchmarks/tasks/mux-strict-slash-change-impact.task.json @@ -7,7 +7,7 @@ "repo": { "name": "mux", "url": "https://github.com/gorilla/mux.git", - "ref": "v1.8.1", + "ref": "b4617d0b9670ad14039b2739167fd35a60f557c5", "workspace_root": ".", "languages": [ "Go" diff --git a/benchmarks/tasks/vite-config-change-impact.task.json b/benchmarks/tasks/vite-config-change-impact.task.json index 1f29c2f4..4c6fc2b6 100644 --- a/benchmarks/tasks/vite-config-change-impact.task.json +++ b/benchmarks/tasks/vite-config-change-impact.task.json @@ -7,7 +7,7 @@ "repo": { "name": "vite", "url": "https://github.com/vitejs/vite.git", - "ref": "v5.4.19", + "ref": "80a333a23103ced0442d4463d1191433d90f5e19", "workspace_root": "packages/vite", "languages": [ "TypeScript", diff --git a/benchmarks/tasks/vite-dev-server-architecture.task.json b/benchmarks/tasks/vite-dev-server-architecture.task.json index 3106cd76..9bb93b38 100644 --- a/benchmarks/tasks/vite-dev-server-architecture.task.json +++ b/benchmarks/tasks/vite-dev-server-architecture.task.json @@ -7,7 +7,7 @@ "repo": { "name": "vite", "url": "https://github.com/vitejs/vite.git", - "ref": "v5.4.19", + "ref": "80a333a23103ced0442d4463d1191433d90f5e19", "workspace_root": "packages/vite", "languages": [ "TypeScript", diff --git a/benchmarks/tasks/vite-dev-server-module-request-flow.task.json b/benchmarks/tasks/vite-dev-server-module-request-flow.task.json index ac750208..8638bbe5 100644 --- a/benchmarks/tasks/vite-dev-server-module-request-flow.task.json +++ b/benchmarks/tasks/vite-dev-server-module-request-flow.task.json @@ -7,7 +7,7 @@ "repo": { "name": "vite", "url": "https://github.com/vitejs/vite.git", - "ref": "v5.4.19", + "ref": "80a333a23103ced0442d4463d1191433d90f5e19", "workspace_root": "packages/vite", "languages": [ "TypeScript", diff --git a/benchmarks/tasks/vite-dev-server-symbol-ownership.task.json b/benchmarks/tasks/vite-dev-server-symbol-ownership.task.json index 5e26ff8d..9ed2fe60 100644 --- a/benchmarks/tasks/vite-dev-server-symbol-ownership.task.json +++ b/benchmarks/tasks/vite-dev-server-symbol-ownership.task.json @@ -7,7 +7,7 @@ "repo": { "name": "vite", "url": "https://github.com/vitejs/vite.git", - "ref": "v5.4.19", + "ref": "80a333a23103ced0442d4463d1191433d90f5e19", "workspace_root": "packages/vite", "languages": [ "TypeScript", diff --git a/benchmarks/tasks/vite-transform-edit-plan.task.json b/benchmarks/tasks/vite-transform-edit-plan.task.json index 5e8677e7..23306950 100644 --- a/benchmarks/tasks/vite-transform-edit-plan.task.json +++ b/benchmarks/tasks/vite-transform-edit-plan.task.json @@ -7,7 +7,7 @@ "repo": { "name": "vite", "url": "https://github.com/vitejs/vite.git", - "ref": "v5.4.19", + "ref": "80a333a23103ced0442d4463d1191433d90f5e19", "workspace_root": "packages/vite", "languages": [ "TypeScript", diff --git a/crates/codestory-bench/Cargo.toml b/crates/codestory-bench/Cargo.toml index 1b3536ed..a04eb6f0 100644 --- a/crates/codestory-bench/Cargo.toml +++ b/crates/codestory-bench/Cargo.toml @@ -1,6 +1,6 @@ [package] name = "codestory-bench" -version = "0.6.2" +version = "0.7.0" edition = "2024" publish = false diff --git a/crates/codestory-cli/Cargo.toml b/crates/codestory-cli/Cargo.toml index d896c21d..d211ab84 100644 --- a/crates/codestory-cli/Cargo.toml +++ b/crates/codestory-cli/Cargo.toml @@ -1,6 +1,6 @@ [package] name = "codestory-cli" -version = "0.6.2" +version = "0.7.0" edition = "2024" description = "Local repository evidence and grounding CLI for source-backed coding workflows." license = "Apache-2.0" diff --git a/crates/codestory-cli/src/args.rs b/crates/codestory-cli/src/args.rs index ee76b796..1e997b61 100644 --- a/crates/codestory-cli/src/args.rs +++ b/crates/codestory-cli/src/args.rs @@ -2,11 +2,11 @@ use clap::{ArgGroup, Args, Parser, Subcommand, ValueEnum}; use codestory_contracts::api::{ BookmarkCategoryDto, BookmarkDto, ClaimReadinessDto, EvidencePacketDto, GroundingBudgetDto, IndexDryRunDto, IndexFreshnessDto, IndexedFileRoleDto, IndexingPhaseTimings, LayoutDirection, - NodeId, NodeKind, PacketBudgetModeDto, PacketTaskClassDto, ProjectSummary, - RepoTextScanStatsDto, RetrievalScoreBreakdownDto, RetrievalShadowDto, RetrievalStateDto, - SearchHitOrigin, SearchMatchQualityDto, SearchPlanDto, SearchQueryAssessmentDto, - SnippetContextDto, SummaryGenerationDto, SymbolContextDto, TrailCallerScope, TrailContextDto, - TrailDirection, TrailMode, + NodeId, NodeKind, PacketBudgetModeDto, PacketTaskClassDto, ProjectSummary, ReadinessGoalDto, + ReadinessVerdictDto, RepoTextScanStatsDto, RetrievalScoreBreakdownDto, RetrievalShadowDto, + RetrievalStateDto, SearchHitOrigin, SearchMatchQualityDto, SearchPlanDto, + SearchQueryAssessmentDto, SnippetContextDto, SummaryGenerationDto, SymbolContextDto, + TrailCallerScope, TrailContextDto, TrailDirection, TrailMode, }; use serde::{Deserialize, Serialize}; use std::path::PathBuf; @@ -49,6 +49,8 @@ pub(crate) enum Command { Packet(PacketCommand), #[command(about = "Check cache, index, and retrieval health.")] Doctor(DoctorCommand), + #[command(about = "Print compact readiness verdicts for local navigation or agent search.")] + Ready(ReadyCommand), #[command(about = "Install or check local setup assets.")] Setup(SetupCommand), #[command(about = "Find symbols and repo text evidence.")] @@ -296,6 +298,8 @@ pub(crate) struct ReportCommand { help = "Write the generated report/export artifact to this file instead of stdout. The parent directory must already exist." )] pub(crate) output_file: Option, + #[arg(long, value_enum, default_value_t = ReportProfile::Full)] + pub(crate) profile: ReportProfile, #[arg( long, value_name = "N", @@ -306,6 +310,13 @@ pub(crate) struct ReportCommand { pub(crate) limit: usize, } +#[derive(Debug, Clone, Copy, Default, PartialEq, Eq, ValueEnum)] +pub(crate) enum ReportProfile { + #[default] + Full, + Handoff, +} + #[derive(Args, Debug)] #[command(group( ArgGroup::new("context_target") @@ -401,6 +412,12 @@ pub(crate) struct PacketCommand { pub(crate) budget: CliPacketBudget, #[arg(long, value_enum)] pub(crate) task_class: Option, + #[arg( + long = "extra-probe", + value_name = "QUERY", + help = "Add an explicit file, symbol, or file-scoped symbol probe to the packet plan. Repeatable; intended for audited benchmark or operator-supplied anchors." + )] + pub(crate) extra_probes: Vec, #[arg( long, value_enum, @@ -449,6 +466,37 @@ pub(crate) struct DoctorCommand { pub(crate) output_file: Option, } +#[derive(Args, Debug)] +pub(crate) struct ReadyCommand { + #[command(flatten)] + pub(crate) project: ProjectArgs, + #[arg(long, value_enum)] + pub(crate) goal: Option, + #[arg(long, value_name = "FORMAT", value_parser = parse_read_output_format, default_value = "markdown")] + pub(crate) format: OutputFormat, + #[arg( + long, + value_name = "PATH", + help = "Write command output to this file instead of stdout. The parent directory must already exist." + )] + pub(crate) output_file: Option, +} + +#[derive(Debug, Clone, Copy, PartialEq, Eq, ValueEnum)] +pub(crate) enum ReadyGoal { + Local, + Agent, +} + +impl ReadyGoal { + pub(crate) const fn as_dto(self) -> ReadinessGoalDto { + match self { + Self::Local => ReadinessGoalDto::LocalNavigation, + Self::Agent => ReadinessGoalDto::AgentPacketSearch, + } + } +} + #[derive(Args, Debug)] pub(crate) struct RetrievalCommand { #[command(subcommand)] @@ -962,6 +1010,8 @@ pub(crate) struct ExploreCommand { help = "Print plain Markdown instead of opening the terminal explorer when stdout is interactive." )] pub(crate) no_tui: bool, + #[arg(long, help = "Alias for --no-tui; useful for agent-safe plain output.")] + pub(crate) plain: bool, #[arg( long, value_enum, @@ -1196,9 +1246,16 @@ pub(crate) struct IndexOutput<'a> { #[serde(default, skip_serializing_if = "Option::is_none")] pub(crate) summary_generation: Option<&'a SummaryGenerationDto>, #[serde(default, skip_serializing_if = "Vec::is_empty")] + pub(crate) readiness: Vec, + #[serde(default, skip_serializing_if = "Vec::is_empty")] pub(crate) next_commands: Vec, } +#[derive(Debug, Serialize)] +pub(crate) struct ReadyOutput { + pub(crate) verdicts: Vec, +} + #[derive(Debug, Serialize)] pub(crate) struct IndexDryRunOutput<'a> { pub(crate) dry_run: &'a IndexDryRunDto, @@ -1988,6 +2045,8 @@ pub(crate) struct DoctorOutput { pub(crate) retrieval: Option, #[serde(default, skip_serializing_if = "Option::is_none")] pub(crate) freshness: Option, + #[serde(default, skip_serializing_if = "Vec::is_empty")] + pub(crate) readiness: Vec, pub(crate) checks: Vec, pub(crate) next_commands: Vec, pub(crate) environment: Vec, diff --git a/crates/codestory-cli/src/display.rs b/crates/codestory-cli/src/display.rs index 992c0a7f..b49af95e 100644 --- a/crates/codestory-cli/src/display.rs +++ b/crates/codestory-cli/src/display.rs @@ -13,6 +13,37 @@ pub(crate) fn clean_path_string(path: &str) -> String { stringified } +pub(crate) fn quote_command_path(path: &Path) -> String { + let value = clean_path_string(&path.to_string_lossy()); + quote_command_argument_value(&value) +} + +pub(crate) fn quote_command_value(value: &str) -> String { + quote_shell_single_quoted_value(value) +} + +pub(crate) fn quote_command_argument_value(value: &str) -> String { + if command_value_needs_single_quotes(value) { + quote_command_value(value) + } else { + format!("\"{}\"", value.replace('"', "\\\"")) + } +} + +fn command_value_needs_single_quotes(value: &str) -> bool { + value.chars().any(|ch| matches!(ch, '$' | '`' | '\'' | '"')) +} + +#[cfg(windows)] +fn quote_shell_single_quoted_value(value: &str) -> String { + format!("'{}'", value.replace('\'', "''")) +} + +#[cfg(not(windows))] +fn quote_shell_single_quoted_value(value: &str) -> String { + format!("'{}'", value.replace('\'', "'\\''")) +} + pub(crate) fn relative_path(project_root: &Path, raw: &str) -> String { let normalized_root = clean_path_string(&project_root.to_string_lossy()); let normalized_raw = clean_path_string(raw); diff --git a/crates/codestory-cli/src/explore.rs b/crates/codestory-cli/src/explore.rs index efc376fb..ca5405c8 100644 --- a/crates/codestory-cli/src/explore.rs +++ b/crates/codestory-cli/src/explore.rs @@ -136,15 +136,30 @@ pub(crate) fn run_explore(cmd: ExploreCommand) -> Result<()> { let markdown = render_explore_markdown(&render_context); if cmd.format == args::OutputFormat::Markdown && cmd.output_file.is_none() - && !cmd.no_tui + && !explore_plain_requested(&cmd) && std::io::stdout().is_terminal() { - eprintln!("Opening interactive explore TUI; use --no-tui for plain markdown."); + eprintln!( + "Opening interactive explore TUI; use --no-tui, --plain, or CODESTORY_NO_TUI=1 for plain markdown." + ); return run_explore_tui(&render_context); } emit(cmd.format, &output, markdown, cmd.output_file.as_deref()) } +fn explore_plain_requested(cmd: &ExploreCommand) -> bool { + cmd.no_tui || cmd.plain || env_flag_enabled("CODESTORY_NO_TUI") +} + +fn env_flag_enabled(name: &str) -> bool { + std::env::var(name) + .map(|value| { + let value = value.trim().to_ascii_lowercase(); + !value.is_empty() && !matches!(value.as_str(), "0" | "false" | "off" | "no") + }) + .unwrap_or(false) +} + pub(crate) fn build_explore_artifact_for_target( runtime: &RuntimeContext, opened: &runtime::OpenedProject, @@ -1111,8 +1126,11 @@ fn render_explore_status_markdown(status: &ExploreStatusOutput) -> String { } else { markdown.push_str("- freshness: unavailable\n"); } - if let Some(command) = status.next_commands.first() { - markdown.push_str(&format!("- next: `{command}`\n")); + if !status.next_commands.is_empty() { + markdown.push_str("- next_commands:\n"); + for command in &status.next_commands { + markdown.push_str(&format!(" - `{command}`\n")); + } } markdown.push_str("- layers:\n"); for note in &status.layer_notes { @@ -1482,34 +1500,38 @@ struct ExplorePane { body: String, } +const EXPLORE_PANE_ORDER: [&str; 9] = [ + "Status", "Profile", "Search", "Results", "Evidence", "Detail", "Trail", "Snippet", "Source", +]; + fn build_explore_panes(context: &ExploreRenderContext<'_>) -> Vec { vec![ ExplorePane { - label: "Status", + label: EXPLORE_PANE_ORDER[0], body: render_explore_status_markdown(context.status), }, ExplorePane { - label: "Profile", + label: EXPLORE_PANE_ORDER[1], body: render_explore_profile_markdown(context.profile), }, ExplorePane { - label: "Search", + label: EXPLORE_PANE_ORDER[2], body: render_explore_search_markdown(context.search), }, ExplorePane { - label: "Results", + label: EXPLORE_PANE_ORDER[3], body: render_explore_results_markdown(context.navigation), }, ExplorePane { - label: "Evidence", + label: EXPLORE_PANE_ORDER[4], body: render_explore_relationship_evidence_markdown(context.relationship_evidence), }, ExplorePane { - label: "Detail", + label: EXPLORE_PANE_ORDER[5], body: render_symbol_markdown(context.project_root, context.target, context.symbol, &[]), }, ExplorePane { - label: "Trail", + label: EXPLORE_PANE_ORDER[6], body: { let cmd = explore_trail_command(context.project_root, context.target, context.trail); @@ -1517,7 +1539,7 @@ fn build_explore_panes(context: &ExploreRenderContext<'_>) -> Vec { }, }, ExplorePane { - label: "Snippet", + label: EXPLORE_PANE_ORDER[7], body: format!( "{}\n{}", context.snippet_layer_note, @@ -1536,7 +1558,7 @@ fn build_explore_panes(context: &ExploreRenderContext<'_>) -> Vec { ), }, ExplorePane { - label: "Source", + label: EXPLORE_PANE_ORDER[8], body: render_explore_source_packet_markdown(context.source_packet), }, ] @@ -1597,11 +1619,41 @@ fn explore_tui_nav_label( format!("{marker} {pane_label} [{}/{}]", pane_index + 1, pane_count) } -fn explore_tui_footer_lines() -> [&'static str; 2] { - [ - "Tab/Shift-Tab panes Up/Down or j/k scroll PgUp/PgDn page", - "Home top Esc/Ctrl+C/q quit", - ] +fn explore_status_ribbon(status: &ExploreStatusOutput) -> String { + let freshness = status + .freshness + .as_ref() + .map(render_explore_freshness) + .unwrap_or_else(|| "freshness unavailable".to_string()); + let retrieval = status + .retrieval + .as_ref() + .map(|state| { + if state.semantic_ready { + format!("semantic ready docs={}", state.semantic_doc_count) + } else { + state + .fallback_message + .clone() + .unwrap_or_else(|| "semantic not ready".to_string()) + } + }) + .unwrap_or_else(|| "retrieval unavailable".to_string()); + format!( + "files={} nodes={} edges={} | {freshness} | {retrieval}", + status.indexed_files, status.indexed_nodes, status.indexed_edges + ) +} + +fn explore_tui_footer_text(status: &ExploreStatusOutput) -> String { + let mut lines = vec![ + "Tab/Shift-Tab panes Up/Down or j/k scroll PgUp/PgDn page".to_string(), + "Home top Esc/Ctrl+C/q quit".to_string(), + ]; + if let Some(command) = status.next_commands.first() { + lines.push(format!("Next: {command}")); + } + lines.join("\n") } pub(crate) fn explore_tui_action(key: crossterm::event::KeyEvent) -> ExploreTuiAction { @@ -1660,9 +1712,13 @@ fn run_explore_tui(context: &ExploreRenderContext<'_>) -> Result<()> { let shell = Layout::default() .direction(Direction::Vertical) .constraints([ - Constraint::Length(3), + Constraint::Length(4), Constraint::Min(1), - Constraint::Length(2), + Constraint::Length(if context.status.next_commands.is_empty() { + 2 + } else { + 3 + }), ]) .split(area); let body = Layout::default() @@ -1678,7 +1734,11 @@ fn run_explore_tui(context: &ExploreRenderContext<'_>) -> Result<()> { ) .unwrap_or_else(|| context.target.selected.display_name.clone()); frame.render_widget( - Paragraph::new(title).block( + Paragraph::new(format!( + "{title}\n{}", + explore_status_ribbon(context.status) + )) + .block( Block::default() .borders(Borders::ALL) .title("CodeStory Explore"), @@ -1693,9 +1753,9 @@ fn run_explore_tui(context: &ExploreRenderContext<'_>) -> Result<()> { let label = explore_tui_nav_label(pane.label, idx, panes.len(), idx == state.selected); let style = if idx == state.selected { - Style::default() - .fg(Color::Cyan) - .add_modifier(Modifier::BOLD) + Style::default().fg(Color::Cyan).add_modifier( + Modifier::BOLD | Modifier::REVERSED | Modifier::UNDERLINED, + ) } else { Style::default() }; @@ -1724,7 +1784,7 @@ fn run_explore_tui(context: &ExploreRenderContext<'_>) -> Result<()> { body[1], ); frame.render_widget( - Paragraph::new(explore_tui_footer_lines().join("\n")), + Paragraph::new(explore_tui_footer_text(context.status)), shell[2], ); })?; @@ -1760,8 +1820,24 @@ mod tests { #[test] fn explore_tui_footer_text() { - let lines = super::explore_tui_footer_lines(); - assert_eq!(lines.len(), 2); + let status = super::ExploreStatusOutput { + project: "C:/repo".to_string(), + storage_path: "C:/cache/codestory.db".to_string(), + refresh: "none".to_string(), + output_target: "stdout".to_string(), + indexed_files: 1, + indexed_nodes: 2, + indexed_edges: 3, + retrieval: None, + freshness: None, + next_commands: vec![ + "codestory-cli context --project \"C:/repo\" --id node-1".to_string(), + ], + layer_notes: Vec::new(), + }; + let footer_text = super::explore_tui_footer_text(&status); + let lines = footer_text.lines().collect::>(); + assert_eq!(lines.len(), 3); for line in &lines { assert!(line.len() <= 80, "footer line exceeds 80 columns: {line}"); } @@ -1780,5 +1856,20 @@ mod tests { ] { assert!(footer.contains(control), "footer missing {control}"); } + assert!( + footer.contains("Next: codestory-cli context"), + "footer should expose the first next command: {footer}" + ); + } + + #[test] + fn explore_pane_order_is_pinned() { + assert_eq!( + super::EXPLORE_PANE_ORDER, + [ + "Status", "Profile", "Search", "Results", "Evidence", "Detail", "Trail", "Snippet", + "Source", + ] + ); } } diff --git a/crates/codestory-cli/src/main.rs b/crates/codestory-cli/src/main.rs index 0fef768a..51c044d8 100644 --- a/crates/codestory-cli/src/main.rs +++ b/crates/codestory-cli/src/main.rs @@ -38,6 +38,7 @@ mod http_transport; mod managed_embeddings; mod output; mod query_resolution; +mod readiness; mod report; mod retrieval; mod runtime; @@ -63,9 +64,10 @@ use args::{ DrillSummaryVerdictOutput, DrillVerificationChecklistItemOutput, FilesCommand, GenerateCompletionsCommand, GroundCommand, IndexCommand, IndexDryRunOutput, IndexOutput, PacketCommand, ProjectArgs, QueryCommand, QueryOutput, QueryResolutionOutput, - QuerySelectorOutput, RepoTextMode, SearchCommand, SearchHitOutput, SearchOutput, ServeCommand, - SetupAction, SetupCommand, SnippetCommand, SnippetJsonOutput, SymbolCommand, SymbolJsonOutput, - TrailCommand, TrailJsonOutput, VerificationTargetOutput, build_trail_request, + QuerySelectorOutput, ReadyCommand, ReadyOutput, RepoTextMode, SearchCommand, SearchHitOutput, + SearchOutput, ServeCommand, SetupAction, SetupCommand, SnippetCommand, SnippetJsonOutput, + SymbolCommand, SymbolJsonOutput, TrailCommand, TrailJsonOutput, VerificationTargetOutput, + build_trail_request, }; #[cfg(test)] use explore::{ExploreTuiAction, ExploreTuiState, explore_tui_action}; @@ -75,9 +77,10 @@ use output::{ context_packet_json, emit, emit_text, render_agent_citation, render_context_markdown, render_doctor_markdown, render_drill_markdown, render_ground_markdown, render_index_dry_run_markdown, render_index_markdown, render_query_markdown, - render_search_hit_output, render_search_markdown, render_snippet_markdown, - render_symbol_markdown, render_symbol_mermaid, render_trail_dot, render_trail_markdown, - render_trail_mermaid, render_trail_story_markdown, validate_output_file_parent, + render_ready_markdown, render_search_hit_output, render_search_markdown, + render_snippet_markdown, render_symbol_markdown, render_symbol_mermaid, render_trail_dot, + render_trail_markdown, render_trail_mermaid, render_trail_story_markdown, + validate_output_file_parent, }; use runtime::{ AmbiguousTargetError, RuntimeContext, ensure_index_ready, map_api_error, refresh_label, @@ -183,6 +186,7 @@ fn main() -> Result<()> { Command::Context(cmd) => run_context(cmd), Command::Packet(cmd) => run_packet(cmd), Command::Doctor(cmd) => run_doctor(cmd), + Command::Ready(cmd) => run_ready(cmd), Command::Setup(cmd) => run_setup(cmd), Command::Search(cmd) => run_search(cmd), Command::Drill(cmd) => run_drill(cmd), @@ -240,32 +244,15 @@ fn setup_embeddings_next_commands( } fn quote_command_path(path: &std::path::Path) -> String { - let value = display::clean_path_string(&path.to_string_lossy()); - if command_value_needs_single_quotes(&value) { - quote_powershell_single_quoted_value(&value) - } else { - format!("\"{}\"", value.replace('"', "\\\"")) - } + display::quote_command_path(path) } fn quote_command_value(value: &str) -> String { - quote_powershell_single_quoted_value(value) + display::quote_command_value(value) } fn quote_command_argument_value(value: &str) -> String { - if command_value_needs_single_quotes(value) { - quote_command_value(value) - } else { - format!("\"{}\"", value.replace('"', "\\\"")) - } -} - -fn command_value_needs_single_quotes(value: &str) -> bool { - value.chars().any(|ch| matches!(ch, '$' | '`' | '\'' | '"')) -} - -fn quote_powershell_single_quoted_value(value: &str) -> String { - format!("'{}'", value.replace('\'', "''")) + display::quote_command_argument_value(value) } fn run_index(cmd: IndexCommand) -> Result<()> { @@ -379,12 +366,14 @@ fn run_index_once(cmd: &IndexCommand) -> Result<()> { .context("Open project summary did not include retrieval state")?; let refresh_label = refresh_label(cmd.refresh, opened.refresh_mode); let storage_path = runtime.storage_path.to_string_lossy().to_string(); - let sidecar_is_full = codestory_retrieval::strict_sidecar_status( - &runtime.project_root, - Some(&runtime.storage_path), - ) - .map(|status| status.retrieval_mode == "full") - .unwrap_or(false); + let sidecar_retrieval = doctor_sidecar_status(&runtime); + let readiness = build_summary_readiness( + &opened.summary.root, + &opened.summary.stats, + opened.summary.freshness.as_ref(), + &sidecar_retrieval, + ); + let next_commands = readiness::compatibility_next_commands(&readiness); let output = IndexOutput { project: &opened.summary.root, storage_path: &storage_path, @@ -393,12 +382,8 @@ fn run_index_once(cmd: &IndexCommand) -> Result<()> { retrieval, phase_timings: opened.phase_timings.as_ref(), summary_generation: summary_generation.as_ref(), - next_commands: index_next_commands( - &opened.summary.root, - Some(retrieval), - opened.summary.freshness.as_ref(), - sidecar_is_full, - ), + readiness, + next_commands, }; let markdown = render_index_markdown(&output); @@ -591,6 +576,7 @@ fn run_packet(cmd: PacketCommand) -> Result<()> { question: cmd.question, budget: cmd.budget.into(), task_class: cmd.task_class.map(Into::into), + extra_probes: cmd.extra_probes, include_evidence: !cmd.no_evidence, latency_budget_ms: cmd.latency_budget_ms, }) @@ -758,7 +744,7 @@ fn packet_sufficiency_label(status: PacketSufficiencyStatusDto) -> &'static str match status { PacketSufficiencyStatusDto::Sufficient => "sufficient", PacketSufficiencyStatusDto::Partial => "partial", - PacketSufficiencyStatusDto::Insufficient => "insufficient", + PacketSufficiencyStatusDto::Insufficient => "blocked", } } @@ -1106,6 +1092,27 @@ fn run_doctor(cmd: DoctorCommand) -> Result<()> { emit(cmd.format, &output, markdown, cmd.output_file.as_deref()) } +fn run_ready(cmd: ReadyCommand) -> Result<()> { + ensure_dot_only_for_trail(cmd.format, "ready")?; + preflight_output_file(cmd.output_file.as_deref())?; + let runtime = RuntimeContext::new_inspect_only(&cmd.project)?; + let summary = runtime.open_project_summary()?; + let sidecar = doctor_sidecar_status(&runtime); + let mut verdicts = build_summary_readiness( + &summary.root, + &summary.stats, + summary.freshness.as_ref(), + &sidecar, + ); + if let Some(goal) = cmd.goal { + let goal = goal.as_dto(); + verdicts.retain(|verdict| verdict.goal == goal); + } + let output = ReadyOutput { verdicts }; + let markdown = render_ready_markdown(&output); + emit(cmd.format, &output, markdown, cmd.output_file.as_deref()) +} + fn run_search(cmd: SearchCommand) -> Result<()> { ensure_dot_only_for_trail(cmd.format, "search")?; preflight_output_file(cmd.output_file.as_deref())?; @@ -1196,6 +1203,10 @@ fn execute_drill(cmd: &DrillCommand) -> Result { let before = runtime.open_project_summary()?; let opened = runtime.ensure_open_from_summary(cmd.refresh, before.clone())?; ensure_index_ready(&opened, "drill")?; + if cmd.refresh != args::RefreshMode::None { + retrieval::finalize_retrieval_index_for_runtime(&runtime) + .context("drill retrieval index finalize")?; + } let sidecar_retrieval_mode = codestory_retrieval::strict_sidecar_status( &runtime.project_root, Some(&runtime.storage_path), @@ -3109,7 +3120,10 @@ fn drill_answer_quality_status(needs_source_truth: bool, claim_count: usize) -> } fn drill_bridge_status_is_graph(status: &str) -> bool { - matches!(status, "graph_path" | "reverse_graph_path") + matches!( + status, + "graph_path" | "reverse_graph_path" | "graph_shared_file" + ) } fn drill_bridge_status_is_partial(status: &str) -> bool { @@ -4261,7 +4275,8 @@ fn build_drill_bridge_evidence( } let shared_files = neighborhood_file_cache.shared_files(runtime, &from_id, &to_id); - fallback_drill_bridge( + fallback_drill_bridge_with_search_hints( + runtime, &runtime.project_root, from, to, @@ -4398,6 +4413,7 @@ fn reverse_graph_path_drill_bridge( } #[allow(clippy::too_many_arguments)] +#[cfg(test)] fn fallback_drill_bridge( project_root: &std::path::Path, from: &DrillAnchorOutput, @@ -4410,6 +4426,262 @@ fn fallback_drill_bridge( ) -> DrillBridgeEvidenceOutput { let endpoint_files = drill_bridge_endpoint_files(Some(&from_node), Some(&to_node)); let evidence_files = drill_bridge_evidence_hint_files(from, to); + fallback_drill_bridge_with_evidence_files( + project_root, + from, + to, + from_node, + to_node, + trail, + shared_files, + endpoint_files, + evidence_files, + stale_freshness, + ) +} + +#[allow(clippy::too_many_arguments)] +fn fallback_drill_bridge_with_search_hints( + runtime: &RuntimeContext, + project_root: &std::path::Path, + from: &DrillAnchorOutput, + to: &DrillAnchorOutput, + from_node: SearchHitOutput, + to_node: SearchHitOutput, + trail: &TrailContextDto, + shared_files: Vec, + stale_freshness: bool, +) -> DrillBridgeEvidenceOutput { + let endpoint_files = drill_bridge_endpoint_files(Some(&from_node), Some(&to_node)); + let mut evidence_files = drill_bridge_evidence_hint_files(from, to); + let import_hints = drill_bridge_import_hub_hint_files(runtime, from, to, &from_node, &to_node); + evidence_files.extend(import_hints.iter().cloned()); + if import_hints.is_empty() { + evidence_files.extend(drill_bridge_search_hint_files( + runtime, + from, + to, + &endpoint_files, + )); + } + dedupe_and_rank_drill_files(&mut evidence_files); + evidence_files.truncate(12); + fallback_drill_bridge_with_evidence_files( + project_root, + from, + to, + from_node, + to_node, + trail, + shared_files, + endpoint_files, + evidence_files, + stale_freshness, + ) +} + +fn drill_bridge_import_hub_hint_files( + runtime: &RuntimeContext, + from: &DrillAnchorOutput, + to: &DrillAnchorOutput, + from_node: &SearchHitOutput, + to_node: &SearchHitOutput, +) -> Vec { + let mut files = Vec::new(); + if let Some(path) = from_node.file_path.as_deref() { + files.extend(drill_bridge_import_hub_candidates_from_endpoint( + runtime, path, &to.anchor, + )); + } + if let Some(path) = to_node.file_path.as_deref() { + files.extend(drill_bridge_import_hub_candidates_from_endpoint( + runtime, + path, + &from.anchor, + )); + } + dedupe_and_rank_drill_files(&mut files); + files.truncate(12); + files +} + +fn drill_bridge_import_hub_candidates_from_endpoint( + runtime: &RuntimeContext, + endpoint_file: &str, + opposite_anchor: &str, +) -> Vec { + let Some(endpoint_path) = drill_relative_source_path(&runtime.project_root, endpoint_file) + else { + return Vec::new(); + }; + let Some(source) = drill_read_source_file(&endpoint_path) else { + return Vec::new(); + }; + let mut files = Vec::new(); + for specifier in drill_js_relative_import_specifiers(&source) + .into_iter() + .take(32) + { + let Some(candidate) = + drill_resolve_relative_import(&runtime.project_root, &endpoint_path, &specifier) + else { + continue; + }; + let relative = display::relative_path(&runtime.project_root, &candidate.to_string_lossy()); + if drill_bridge_evidence_file_rank(&relative) >= 9 { + continue; + } + let Some(candidate_source) = drill_read_source_file(&candidate) else { + continue; + }; + if candidate_source.contains(opposite_anchor) { + files.push(relative); + } + } + files +} + +fn drill_bridge_search_hint_files( + runtime: &RuntimeContext, + from: &DrillAnchorOutput, + to: &DrillAnchorOutput, + endpoint_files: &[String], +) -> Vec { + let Ok(results) = runtime.browser.search_results(SearchRequest { + query: format!("{} {}", from.anchor, to.anchor), + repo_text: SearchRepoTextMode::On, + limit_per_source: 25, + expand_search_plan: false, + hybrid_weights: None, + hybrid_limits: None, + }) else { + return Vec::new(); + }; + let mut files = drill_bridge_search_hint_files_from_hits( + &runtime.project_root, + endpoint_files, + &results.repo_text_hits, + &results.indexed_symbol_hits, + ); + files.retain(|path| { + drill_file_contains_terms(&runtime.project_root, path, &[&from.anchor, &to.anchor]) + }); + files +} + +fn drill_relative_source_path( + project_root: &std::path::Path, + path: &str, +) -> Option { + let path = std::path::Path::new(path); + if path.is_absolute() || drill_path_has_escape_component(path) { + return None; + } + let root = fs::canonicalize(project_root).ok()?; + let candidate = fs::canonicalize(project_root.join(path)).ok()?; + candidate.starts_with(&root).then_some(candidate) +} + +fn drill_path_has_escape_component(path: &std::path::Path) -> bool { + path.components().any(|component| { + matches!( + component, + std::path::Component::ParentDir + | std::path::Component::RootDir + | std::path::Component::Prefix(_) + ) + }) +} + +fn drill_read_source_file(path: &std::path::Path) -> Option { + let metadata = fs::metadata(path).ok()?; + if !metadata.is_file() || metadata.len() > 1_000_000 { + return None; + } + fs::read_to_string(path).ok() +} + +fn drill_file_contains_terms(project_root: &std::path::Path, path: &str, terms: &[&str]) -> bool { + let Some(path) = drill_relative_source_path(project_root, path) else { + return false; + }; + let Some(source) = drill_read_source_file(&path) else { + return false; + }; + terms.iter().all(|term| source.contains(term)) +} + +fn drill_js_relative_import_specifiers(source: &str) -> Vec { + let mut specifiers = Vec::new(); + for line in source.lines() { + let trimmed = line.trim_start(); + if !trimmed.starts_with("import ") { + continue; + } + if let Some(specifier) = drill_quoted_js_specifier(trimmed) + && specifier.starts_with('.') + { + specifiers.push(specifier.to_string()); + } + } + specifiers +} + +fn drill_quoted_js_specifier(line: &str) -> Option<&str> { + let from_index = line.find(" from "); + let search = from_index + .map(|index| &line[index + " from ".len()..]) + .unwrap_or(line); + let quote_index = search.find(['\'', '"'])?; + let quote = search[quote_index..].chars().next()?; + let rest = &search[quote_index + quote.len_utf8()..]; + let end = rest.find(quote)?; + Some(&rest[..end]) +} + +fn drill_resolve_relative_import( + project_root: &std::path::Path, + endpoint_path: &std::path::Path, + specifier: &str, +) -> Option { + let specifier_path = std::path::Path::new(specifier); + if specifier_path.is_absolute() || !specifier.starts_with('.') { + return None; + } + let root = fs::canonicalize(project_root).ok()?; + let endpoint = fs::canonicalize(endpoint_path).ok()?; + if !endpoint.starts_with(&root) { + return None; + } + let base = endpoint.parent()?.join(specifier_path); + let mut candidates = vec![base.clone()]; + if base.extension().is_none() { + for extension in ["js", "jsx", "ts", "tsx", "mjs", "cjs"] { + candidates.push(base.with_extension(extension)); + } + for extension in ["js", "jsx", "ts", "tsx", "mjs", "cjs"] { + candidates.push(base.join(format!("index.{extension}"))); + } + } + candidates.into_iter().find_map(|candidate| { + let candidate = fs::canonicalize(candidate).ok()?; + (candidate.is_file() && candidate.starts_with(&root)).then_some(candidate) + }) +} + +#[allow(clippy::too_many_arguments)] +fn fallback_drill_bridge_with_evidence_files( + project_root: &std::path::Path, + from: &DrillAnchorOutput, + to: &DrillAnchorOutput, + from_node: SearchHitOutput, + to_node: SearchHitOutput, + trail: &TrailContextDto, + shared_files: Vec, + endpoint_files: Vec, + evidence_files: Vec, + stale_freshness: bool, +) -> DrillBridgeEvidenceOutput { let classification = drill_fallback_bridge_classification( from, to, @@ -4473,11 +4745,11 @@ fn drill_fallback_bridge_classification( ) -> DrillFallbackBridgeClassification { if !shared_files.is_empty() { return DrillFallbackBridgeClassification { - status: "shared_file_only".to_string(), - strategy: "to_target_symbol_then_shared_files".to_string(), - confidence: "low".to_string(), - evidence_kind: "shared_file".to_string(), - note: "shared-file evidence is a containment hint; verify source before claiming runtime flow" + status: "graph_shared_file".to_string(), + strategy: "to_target_symbol_then_graph_shared_files".to_string(), + confidence: "medium".to_string(), + evidence_kind: "graph_shared_file".to_string(), + note: "typed graph neighborhoods found shared source files; this proves shared graph context, not execution direction" .to_string(), }; } @@ -4780,6 +5052,39 @@ fn drill_bridge_evidence_hint_files( files } +fn drill_bridge_search_hint_files_from_hits( + project_root: &std::path::Path, + endpoint_files: &[String], + repo_text_hits: &[SearchHit], + indexed_symbol_hits: &[SearchHit], +) -> Vec { + let endpoint_keys = endpoint_files + .iter() + .map(|path| normalize_drill_path(path)) + .collect::>(); + let mut seen = HashSet::new(); + let mut files = Vec::new(); + for hit in repo_text_hits.iter().chain(indexed_symbol_hits.iter()) { + let Some(path) = hit.file_path.as_deref() else { + continue; + }; + let path = display::relative_path(project_root, path); + let key = normalize_drill_path(&path); + if endpoint_keys.contains(&key) + || drill_question_target_is_low_signal(&path) + || drill_bridge_evidence_file_rank(&path) >= 9 + { + continue; + } + if seen.insert(key) { + files.push(path); + } + } + rank_drill_bridge_evidence_files(&mut files); + files.truncate(12); + files +} + fn dedupe_and_rank_drill_files(files: &mut Vec) { let mut seen = HashSet::new(); files.retain(|path| seen.insert(path.clone())); @@ -7392,21 +7697,37 @@ fn render_files_summary(markdown: &mut String, output: &codestory_contracts::api let status = if output.usable { "usable" } else { "empty" }; let _ = writeln!( markdown, - "- index: {status}; files: {}; indexed: {}; incomplete: {}; error files: {}", + "- index: {status}; whole index files: {}; indexed: {}; incomplete: {}; error files: {}; filtered files: {}; visible rows: {}; truncated: {}", output.summary.file_count, output.summary.indexed_file_count, output.summary.incomplete_file_count, - output.summary.error_file_count + output.summary.error_file_count, + output.summary.filtered_file_count, + output.summary.visible_file_count, + output.summary.truncated ); if !output.summary.language_counts.is_empty() { let languages = output .summary .language_counts .iter() - .map(|entry| format!("{}={}", entry.language, entry.file_count)) + .map(|entry| { + format!( + "{}={} [{}; {}]", + entry.language, entry.file_count, entry.support_mode, entry.evidence_tier + ) + }) .collect::>() .join(", "); let _ = writeln!(markdown, "- languages: {languages}"); + let claim_labels = output + .summary + .language_counts + .iter() + .map(|entry| format!("{}={}", entry.language, entry.claim_label)) + .collect::>() + .join(", "); + let _ = writeln!(markdown, "- language_support_claims: {claim_labels}"); } for note in &output.summary.coverage_notes { let _ = writeln!(markdown, "- coverage: {note}"); @@ -7427,11 +7748,11 @@ fn render_framework_route_coverage( fn framework_route_coverage_row(entry: &FrameworkRouteCoverageDto) -> String { format!( - "- {} ({}) status={} fixture_status={} confidence_floor={} handler_link={} promotable={} unsupported={} known_gaps={}", + "- {} ({}) status={} coverage_evidence={} confidence_floor={} handler_link={} promotable={} unsupported={} known_gaps={}", entry.framework, entry.language, entry.status, - entry.fixture_status, + entry.coverage_evidence, entry.confidence_floor, entry.handler_link_support, entry.promotable, @@ -7934,7 +8255,13 @@ fn build_doctor_output( let storage_path = display::clean_path_string(&runtime.storage_path.to_string_lossy()); let storage_exists = runtime.storage_path.exists(); let sidecar_retrieval = doctor_sidecar_status(runtime); - let sidecar_is_full = sidecar_retrieval.retrieval_mode == "full"; + let readiness = build_summary_readiness( + &project, + &summary.stats, + summary.freshness.as_ref(), + &sidecar_retrieval, + ); + let next_commands = readiness::compatibility_next_commands(&readiness); let mut checks = Vec::new(); checks.push(doctor_check( "project", @@ -8023,17 +8350,38 @@ fn build_doctor_output( sidecar_retrieval, retrieval, freshness: summary.freshness.clone(), + readiness, checks, - next_commands: index_next_commands( - &project, - summary.retrieval.as_ref(), - summary.freshness.as_ref(), - sidecar_is_full, - ), + next_commands, environment, } } +fn build_summary_readiness( + project: &str, + stats: &codestory_contracts::api::StorageStatsDto, + freshness: Option<&IndexFreshnessDto>, + sidecar: &DoctorSidecarStatusOutput, +) -> Vec { + readiness::build_readiness_verdicts(readiness::ReadinessInputs { + project, + stats, + freshness, + sidecar: Some(readiness_sidecar_input(sidecar)), + }) +} + +fn readiness_sidecar_input( + sidecar: &DoctorSidecarStatusOutput, +) -> readiness::ReadinessSidecarInput<'_> { + readiness::ReadinessSidecarInput { + retrieval_mode: sidecar.retrieval_mode.as_str(), + degraded_reason: sidecar.degraded_reason.as_deref(), + manifest_generation: sidecar.manifest_generation.as_deref(), + manifest_input_hash: sidecar.manifest_input_hash.as_deref(), + } +} + fn doctor_sidecar_status(runtime: &RuntimeContext) -> DoctorSidecarStatusOutput { match codestory_retrieval::strict_sidecar_status( &runtime.project_root, @@ -8355,6 +8703,7 @@ fn doctor_sidecar_check(sidecar: &DoctorSidecarStatusOutput) -> DoctorCheckOutpu ) } +#[cfg(test)] fn index_next_commands( project: &str, retrieval: Option<&codestory_contracts::api::RetrievalStateDto>, @@ -9596,6 +9945,7 @@ mod tests { semantic_fallbacks: Vec::new(), annotations: Vec::new(), steps: Vec::new(), + packet_sidecar_diagnostics: Vec::new(), retrieval_shadow: None, }, } @@ -9609,6 +9959,7 @@ mod tests { edge_count: 0, file_count, error_count: 0, + fatal_error_count: 0, }, members: Vec::new(), retrieval: None, @@ -9636,6 +9987,14 @@ mod tests { semantic_docs_embedded: Some(12), semantic_docs_pending: Some(13), semantic_docs_stale: Some(14), + symbol_search_docs_written: Some(15), + semantic_dense_docs_skipped: Some(16), + semantic_dense_public_api: Some(17), + semantic_dense_entrypoint: Some(18), + semantic_dense_documented_nontrivial: Some(19), + semantic_dense_central_graph_node: Some(20), + semantic_dense_component_report: Some(21), + semantic_dense_unstructured_doc: Some(22), deferred_indexes_ms: Some(7), summary_snapshot_ms: Some(8), detail_snapshot_ms: Some(9), @@ -9928,6 +10287,144 @@ mod tests { } } + #[test] + fn drill_bridge_search_hints_keep_middle_source_files() { + fn search_hit( + name: &str, + path: &str, + origin: codestory_contracts::api::SearchHitOrigin, + ) -> SearchHit { + SearchHit { + node_id: NodeId(format!("{path}:{name}")), + display_name: name.to_string(), + kind: NodeKind::FUNCTION, + file_path: Some(path.to_string()), + line: Some(1), + score: 1.0, + origin, + match_quality: None, + resolvable: true, + score_breakdown: None, + } + } + + let endpoint_files = vec![ + "lib/axios.js".to_string(), + "lib/core/dispatchRequest.js".to_string(), + ]; + let repo_text_hits = vec![ + search_hit( + "Axios.js", + "lib/core/Axios.js", + codestory_contracts::api::SearchHitOrigin::TextMatch, + ), + search_hit( + "axios.js", + "lib/axios.js", + codestory_contracts::api::SearchHitOrigin::TextMatch, + ), + search_hit( + "bundle.js", + "dist/bundle.js", + codestory_contracts::api::SearchHitOrigin::TextMatch, + ), + ]; + let indexed_symbol_hits = vec![ + search_hit( + "Axios", + "lib/core/Axios.js", + codestory_contracts::api::SearchHitOrigin::IndexedSymbol, + ), + search_hit( + "InterceptorManager", + "lib/core/InterceptorManager.js", + codestory_contracts::api::SearchHitOrigin::IndexedSymbol, + ), + ]; + + let files = drill_bridge_search_hint_files_from_hits( + Path::new("C:/repo"), + &endpoint_files, + &repo_text_hits, + &indexed_symbol_hits, + ); + + assert_eq!( + files, + vec![ + "lib/core/Axios.js".to_string(), + "lib/core/InterceptorManager.js".to_string() + ] + ); + } + + #[test] + fn drill_import_hub_helpers_resolve_relative_js_imports() { + let temp = tempdir().expect("temp dir"); + let lib_dir = temp.path().join("lib"); + let core_dir = lib_dir.join("core"); + fs::create_dir_all(&core_dir).expect("create dirs"); + let endpoint = lib_dir.join("axios.js"); + let axios_core = core_dir.join("Axios.js"); + fs::write( + &endpoint, + "import Axios from './core/Axios.js';\nimport './polyfill.js';\n", + ) + .expect("write endpoint"); + fs::write( + &axios_core, + "import dispatchRequest from './dispatchRequest.js';\nclass Axios {}\n", + ) + .expect("write candidate"); + let outside = temp.path().with_file_name(format!( + "{}-outside.js", + temp.path().file_name().unwrap().to_string_lossy() + )); + fs::write(&outside, "class Outside {}\n").expect("write outside file"); + + let source = fs::read_to_string(&endpoint).expect("read endpoint"); + let specifiers = drill_js_relative_import_specifiers(&source); + + assert_eq!( + specifiers, + vec!["./core/Axios.js".to_string(), "./polyfill.js".to_string()] + ); + assert_eq!( + drill_resolve_relative_import(temp.path(), &endpoint, "./core/Axios.js"), + Some(fs::canonicalize(&axios_core).expect("canonical axios core")) + ); + assert_eq!( + drill_relative_source_path(temp.path(), &axios_core.to_string_lossy()), + None + ); + assert_eq!( + drill_relative_source_path(temp.path(), "../outside.js"), + None + ); + assert_eq!( + drill_resolve_relative_import( + temp.path(), + &endpoint, + &format!("../{}", outside.file_name().unwrap().to_string_lossy()) + ), + None + ); + assert_eq!( + drill_resolve_relative_import(temp.path(), &endpoint, &outside.to_string_lossy()), + None + ); + assert!(drill_file_contains_terms( + temp.path(), + "lib/core/Axios.js", + &["dispatchRequest", "Axios"] + )); + assert!(!drill_file_contains_terms( + temp.path(), + "lib/core/Axios.js", + &["createInstance", "dispatchRequest"] + )); + } + #[test] fn drill_bridge_constructors_preserve_status_contract() { let from = sample_drill_anchor("FromAnchor", "a"); @@ -9975,9 +10472,12 @@ mod tests { vec!["src/lib.rs".to_string()], false, ); - assert_eq!(shared_file.status, "shared_file_only"); - assert_eq!(shared_file.strategy, "to_target_symbol_then_shared_files"); - assert_eq!(shared_file.confidence, "low"); + assert_eq!(shared_file.status, "graph_shared_file"); + assert_eq!( + shared_file.strategy, + "to_target_symbol_then_graph_shared_files" + ); + assert_eq!(shared_file.confidence, "medium"); let mut hinted_from = sample_drill_anchor("FromAnchor", "a"); hinted_from.consumer_summary = Some(DrillAnchorConsumerSummaryOutput { @@ -10059,7 +10559,7 @@ mod tests { vec!["src/shared.rs".to_string()], false, ); - assert_eq!(shared_with_hints.status, "shared_file_only"); + assert_eq!(shared_with_hints.status, "graph_shared_file"); assert_eq!(shared_with_hints.shared_files, vec!["src/shared.rs"]); assert_eq!(shared_with_hints.evidence_files, vec!["src/from-user.rs"]); @@ -10528,6 +11028,7 @@ mod tests { retrieval: &retrieval, phase_timings: Some(&timings), summary_generation: None, + readiness: Vec::new(), next_commands: Vec::new(), }; @@ -11175,6 +11676,7 @@ mod tests { semantic: 0.2, graph: 0.1, total: 0.9, + provenance: Vec::new(), }), }]; @@ -11269,10 +11771,16 @@ mod tests { #[test] fn command_quoting_single_quotes_shell_sensitive_values() { + #[cfg(windows)] assert_eq!( quote_command_value("Inspect $env:SECRET and $(Get-ChildItem) and 'literal'"), "'Inspect $env:SECRET and $(Get-ChildItem) and ''literal'''" ); + #[cfg(not(windows))] + assert_eq!( + quote_command_value("Inspect $env:SECRET and $(Get-ChildItem) and 'literal'"), + r"'Inspect $env:SECRET and $(Get-ChildItem) and '\''literal'\'''" + ); assert_eq!( quote_command_path(Path::new("C:/repo/$hidden")), "'C:/repo/$hidden'" diff --git a/crates/codestory-cli/src/output.rs b/crates/codestory-cli/src/output.rs index 10c621ec..10109bcb 100644 --- a/crates/codestory-cli/src/output.rs +++ b/crates/codestory-cli/src/output.rs @@ -10,6 +10,7 @@ use codestory_contracts::api::{ SearchPlanDto, SearchPlanPromotionStatusDto, SnippetContextDto, SymbolContextDto, TrailContextDto, TrailStoryDto, }; +use codestory_contracts::language_support::language_name_for_path; use serde::Serialize; use serde_json::Value; use std::collections::{BTreeMap, HashMap}; @@ -20,7 +21,7 @@ use std::path::Path; use crate::args::{ CliTrailMode, DoctorOutput, DrillOutput, IndexDryRunOutput, IndexOutput, OutputFormat, - QueryItemOutput, QueryOutput, SearchHitOutput, SearchOutput, TrailCommand, + QueryItemOutput, QueryOutput, ReadyOutput, SearchHitOutput, SearchOutput, TrailCommand, VerificationTargetOutput, }; use crate::display::{ @@ -135,11 +136,53 @@ pub(crate) fn render_index_markdown(output: &IndexOutput<'_>) -> String { if let Some(timings) = output.phase_timings { append_index_phase_timings(&mut markdown, timings); } + append_readiness_verdicts(&mut markdown, &output.readiness, true); append_index_summary_generation(&mut markdown, output); append_next_commands(&mut markdown, &output.next_commands); markdown } +pub(crate) fn render_ready_markdown(output: &ReadyOutput) -> String { + let mut markdown = String::new(); + let _ = writeln!(markdown, "# Readiness"); + append_readiness_verdicts(&mut markdown, &output.verdicts, true); + markdown +} + +fn append_readiness_verdicts( + markdown: &mut String, + verdicts: &[codestory_contracts::api::ReadinessVerdictDto], + include_full_repair: bool, +) { + if verdicts.is_empty() { + return; + } + let _ = writeln!(markdown, "readiness_verdicts:"); + for verdict in verdicts { + let _ = writeln!( + markdown, + "- {} [{}]: {}", + crate::readiness::goal_label(verdict.goal), + crate::readiness::status_label(verdict.status), + verdict.summary + ); + append_verdict_commands(markdown, "minimum_next", &verdict.minimum_next); + if include_full_repair && verdict.full_repair != verdict.minimum_next { + append_verdict_commands(markdown, "full_repair", &verdict.full_repair); + } + } +} + +fn append_verdict_commands(markdown: &mut String, label: &str, commands: &[String]) { + if commands.is_empty() { + return; + } + let _ = writeln!(markdown, " {label}:"); + for command in commands { + let _ = writeln!(markdown, " - `{command}`"); + } +} + fn append_index_members(markdown: &mut String, output: &IndexOutput<'_>) { if output.summary.members.is_empty() { return; @@ -2104,6 +2147,7 @@ pub(crate) fn render_doctor_markdown(output: &DoctorOutput) -> String { doctor_local_navigation_readiness(output), doctor_agent_packet_search_readiness(output) ); + append_readiness_verdicts(&mut markdown, &output.readiness, true); if let Some(retrieval) = output.retrieval.as_ref() { let _ = writeln!( markdown, @@ -2118,11 +2162,19 @@ pub(crate) fn render_doctor_markdown(output: &DoctorOutput) -> String { .collect::>(); if !attention.is_empty() { let _ = writeln!(markdown, "attention:"); + let mut seen = Vec::new(); for check in attention { + let key = format!("{}:{}:{}", check.name, check.status, check.message); + if seen.contains(&key) { + continue; + } + seen.push(key); let _ = writeln!( markdown, "- {} [{}]: {}", - check.name, check.status, check.message + check.name, + check.status, + compact_doctor_check_message(check) ); } } @@ -2131,7 +2183,9 @@ pub(crate) fn render_doctor_markdown(output: &DoctorOutput) -> String { let _ = writeln!( markdown, "- {} [{}]: {}", - check.name, check.status, check.message + check.name, + check.status, + compact_doctor_check_message(check) ); } let _ = writeln!(markdown, "environment:"); @@ -2151,6 +2205,24 @@ pub(crate) fn render_doctor_markdown(output: &DoctorOutput) -> String { markdown } +fn compact_doctor_check_message(check: &crate::args::DoctorCheckOutput) -> String { + if check.name != "semantic_contract" || check.message.len() <= 280 { + return check.message.clone(); + } + let gap_count = check + .message + .split("; ") + .filter(|part| { + !part.contains("Run `codestory-cli retrieval index --refresh full`") + && !part.contains("Resolve the embedding runtime first") + }) + .count() + .max(1); + format!( + "semantic contract has {gap_count} mismatch(es). Run `codestory-cli retrieval index --refresh full`; rerun `codestory-cli doctor --format markdown` for the full diff." + ) +} + fn doctor_operator_status(output: &DoctorOutput) -> &'static str { if output.checks.iter().any(|check| check.status == "error") { "blocked" @@ -2182,6 +2254,13 @@ fn doctor_operator_next_action(output: &DoctorOutput) -> &str { } fn doctor_local_navigation_readiness(output: &DoctorOutput) -> &'static str { + if let Some(verdict) = output + .readiness + .iter() + .find(|verdict| crate::readiness::goal_label(verdict.goal) == "local_navigation") + { + return crate::readiness::status_label(verdict.status); + } if !output.indexed || output .freshness @@ -2194,6 +2273,13 @@ fn doctor_local_navigation_readiness(output: &DoctorOutput) -> &'static str { } fn doctor_agent_packet_search_readiness(output: &DoctorOutput) -> &'static str { + if let Some(verdict) = output + .readiness + .iter() + .find(|verdict| crate::readiness::goal_label(verdict.goal) == "agent_packet_search") + { + return crate::readiness::status_label(verdict.status); + } if !output.indexed { return "repair_index"; } @@ -3374,9 +3460,9 @@ fn ansi_highlight_snippet(path: &str, snippet: &str) -> String { fn ansi_highlight_line(language: &str, line: &str) -> String { let comment_marker = match language { - "python" | "ruby" | "toml" | "yaml" => Some("#"), + "bash" | "python" | "ruby" | "toml" | "yaml" => Some("#"), "rust" | "typescript" | "tsx" | "javascript" | "jsx" | "go" | "java" | "kotlin" - | "csharp" | "cpp" | "php" | "swift" => Some("//"), + | "csharp" | "cpp" | "dart" | "php" | "swift" => Some("//"), _ => None, }; let Some(marker) = comment_marker else { @@ -3540,36 +3626,23 @@ fn is_language_keyword(language: &str, word: &str) -> bool { } fn snippet_language(path: &str) -> &'static str { - match Path::new(path) + let extension = Path::new(path) .extension() .and_then(|value| value.to_str()) .unwrap_or_default() - .to_ascii_lowercase() - .as_str() - { - "rs" => "rust", - "ts" => "typescript", + .to_ascii_lowercase(); + + match extension.as_str() { "tsx" => "tsx", - "js" => "javascript", "jsx" => "jsx", - "py" => "python", - "go" => "go", - "java" => "java", - "kt" => "kotlin", - "cs" => "csharp", - "cpp" | "cc" | "cxx" => "cpp", - "h" | "hpp" => "cpp", - "rb" => "ruby", - "php" => "php", - "swift" => "swift", "svelte" => "svelte", "vue" => "vue", "astro" => "astro", "json" => "json", "toml" => "toml", - "md" => "markdown", + "md" | "mdx" => "markdown", "yml" | "yaml" => "yaml", - _ => "", + _ => language_name_for_path(Some(path)).unwrap_or(""), } } @@ -3688,6 +3761,7 @@ mod tests { edge_count: 2, file_count: 1, error_count: 0, + fatal_error_count: 0, } } @@ -3707,6 +3781,7 @@ mod tests { }, retrieval: None, freshness: None, + readiness: Vec::new(), checks: Vec::new(), next_commands: Vec::new(), environment: Vec::new(), @@ -4065,6 +4140,7 @@ mod tests { semantic: 0.1, graph: 0.11, total: 0.91, + provenance: Vec::new(), }), duplicate_of: None, excerpt: None, @@ -4130,6 +4206,7 @@ mod tests { output: Vec::new(), message: Some("checked indexed symbols".to_string()), }], + packet_sidecar_diagnostics: Vec::new(), retrieval_shadow: None, }, }; @@ -4288,6 +4365,36 @@ mod tests { ); } + #[test] + fn snippet_language_uses_shared_registry_extensions() { + for (path, expected) in [ + ("lib/main.dart", "dart"), + ("scripts/bootstrap.sh", "bash"), + ("scripts/bootstrap.bash", "bash"), + ("pkg/types.pyi", "python"), + ("src/server.mts", "typescript"), + ("src/server.cts", "typescript"), + ("build.gradle.kts", "kotlin"), + ("templates/index.html", "html"), + ("assets/site.css", "css"), + ("db/schema.sql", "sql"), + ("src/Widget.tsx", "tsx"), + ("src/Widget.jsx", "jsx"), + ("docs/guide.mdx", "markdown"), + ] { + assert_eq!(snippet_language(path), expected, "{path}"); + } + } + + #[test] + fn ansi_highlight_snippet_marks_dart_and_bash_comments() { + let dart = ansi_highlight_snippet("lib/main.dart", "final ok = true; // comment"); + assert!(dart.contains("\x1b[90m// comment\x1b[0m"), "{dart:?}"); + + let bash = ansi_highlight_snippet("scripts/bootstrap.sh", "echo ok # comment"); + assert!(bash.contains("\x1b[90m# comment\x1b[0m"), "{bash:?}"); + } + #[test] fn ground_why_markdown_contract_includes_evidence_packet_shape() { let snapshot = GroundingSnapshotDto { @@ -4372,6 +4479,7 @@ mod tests { semantic: 0.06, graph: 0.05, total: 0.21, + provenance: Vec::new(), }), }], subgraph_ids: Vec::new(), @@ -4405,6 +4513,7 @@ mod tests { message: Some("source reads skipped by budget".to_string()), }, ], + packet_sidecar_diagnostics: Vec::new(), retrieval_shadow: None, }, }; @@ -4504,6 +4613,7 @@ mod tests { edge_count: 0, file_count: 4, error_count: 2, + fatal_error_count: 0, }, retrieval: Some(retrieval), coverage: GroundingCoverageDto { diff --git a/crates/codestory-cli/src/readiness.rs b/crates/codestory-cli/src/readiness.rs new file mode 100644 index 00000000..811b7895 --- /dev/null +++ b/crates/codestory-cli/src/readiness.rs @@ -0,0 +1,578 @@ +use codestory_contracts::api::{ + IndexFreshnessDto, IndexFreshnessStatusDto, ReadinessGoalDto, ReadinessIndexSnapshotDto, + ReadinessSidecarSnapshotDto, ReadinessStatusDto, ReadinessVerdictDto, StorageStatsDto, +}; + +use crate::display::{clean_path_string, quote_command_argument_value}; + +#[derive(Debug, Clone, Copy)] +pub(crate) struct ReadinessInputs<'a> { + pub(crate) project: &'a str, + pub(crate) stats: &'a StorageStatsDto, + pub(crate) freshness: Option<&'a IndexFreshnessDto>, + pub(crate) sidecar: Option>, +} + +#[derive(Debug, Clone, Copy)] +pub(crate) struct ReadinessSidecarInput<'a> { + pub(crate) retrieval_mode: &'a str, + pub(crate) degraded_reason: Option<&'a str>, + pub(crate) manifest_generation: Option<&'a str>, + pub(crate) manifest_input_hash: Option<&'a str>, +} + +pub(crate) fn build_readiness_verdicts(inputs: ReadinessInputs<'_>) -> Vec { + vec![ + build_readiness_verdict(ReadinessGoalDto::LocalNavigation, inputs), + build_readiness_verdict(ReadinessGoalDto::AgentPacketSearch, inputs), + ] +} + +pub(crate) fn build_readiness_verdict( + goal: ReadinessGoalDto, + inputs: ReadinessInputs<'_>, +) -> ReadinessVerdictDto { + let project = clean_path_string(inputs.project); + let project_arg = project_arg(&project); + let index = readiness_index_snapshot(inputs.stats, inputs.freshness); + let sidecar = inputs.sidecar.map(readiness_sidecar_snapshot); + + let (status, summary, minimum_next, full_repair) = verdict_state( + goal, + inputs.stats, + inputs.freshness, + inputs.sidecar, + &project_arg, + ); + + ReadinessVerdictDto { + goal, + status, + summary, + minimum_next, + full_repair, + index: Some(index), + sidecar, + } +} + +pub(crate) fn combined_minimum_next(verdicts: &[ReadinessVerdictDto]) -> Vec { + dedupe_commands( + verdicts + .iter() + .flat_map(|verdict| verdict.minimum_next.iter().cloned()), + ) +} + +pub(crate) fn compatibility_next_commands(verdicts: &[ReadinessVerdictDto]) -> Vec { + if let Some(verdict) = primary_non_ready(verdicts) { + return verdict.full_repair.clone(); + } + combined_minimum_next(verdicts) +} + +pub(crate) fn primary_non_ready(verdicts: &[ReadinessVerdictDto]) -> Option<&ReadinessVerdictDto> { + verdicts + .iter() + .find(|verdict| verdict.status != ReadinessStatusDto::Ready) +} + +pub(crate) fn status_label(status: ReadinessStatusDto) -> &'static str { + match status { + ReadinessStatusDto::Ready => "ready", + ReadinessStatusDto::RepairIndex => "repair_index", + ReadinessStatusDto::CheckIndex => "check_index", + ReadinessStatusDto::RepairRetrieval => "repair_retrieval", + } +} + +pub(crate) fn goal_label(goal: ReadinessGoalDto) -> &'static str { + match goal { + ReadinessGoalDto::LocalNavigation => "local_navigation", + ReadinessGoalDto::AgentPacketSearch => "agent_packet_search", + } +} + +fn verdict_state( + goal: ReadinessGoalDto, + stats: &StorageStatsDto, + freshness: Option<&IndexFreshnessDto>, + sidecar: Option>, + project_arg: &str, +) -> (ReadinessStatusDto, String, Vec, Vec) { + if stats.node_count == 0 { + return index_repair_state( + goal, + "No indexed symbols are available yet.", + project_arg, + "full", + ); + } + + if stats.fatal_error_count > 0 { + let plural = if stats.fatal_error_count == 1 { + "" + } else { + "s" + }; + return index_repair_state( + goal, + &format!( + "The index recorded {} fatal indexing error{plural}.", + stats.fatal_error_count + ), + project_arg, + "full", + ); + } + + match freshness.map(|freshness| freshness.status) { + Some(IndexFreshnessStatusDto::Stale) => { + return index_repair_state( + goal, + "The index has changed, new, or removed files.", + project_arg, + "incremental", + ); + } + Some(IndexFreshnessStatusDto::NotChecked) => { + let command = + format!("codestory-cli index --project {project_arg} --refresh incremental"); + return ( + ReadinessStatusDto::CheckIndex, + "Index drift was not checked for this cache view.".to_string(), + vec![command.clone()], + vec![ + command, + format!("codestory-cli doctor --project {project_arg}"), + ], + ); + } + Some(IndexFreshnessStatusDto::Fresh) | None => {} + } + + if goal == ReadinessGoalDto::AgentPacketSearch { + let sidecar_mode = sidecar + .map(|sidecar| sidecar.retrieval_mode) + .unwrap_or("unavailable"); + if sidecar_mode != "full" { + let full_repair = agent_packet_search_repair_commands( + project_arg, + !matches!( + freshness.map(|freshness| freshness.status), + Some(IndexFreshnessStatusDto::Fresh) + ), + ); + let minimum_next = full_repair.iter().take(2).cloned().collect(); + return ( + ReadinessStatusDto::RepairRetrieval, + format!( + "Agent packet/search needs full sidecar retrieval; current mode is `{sidecar_mode}`." + ), + minimum_next, + full_repair, + ); + } + } + + let minimum_next = match goal { + ReadinessGoalDto::LocalNavigation => { + vec![format!("codestory-cli ground --project {project_arg}")] + } + ReadinessGoalDto::AgentPacketSearch => vec![format!( + "codestory-cli packet --project {project_arg} --question {}", + quote_command_argument_value("How does this system work?") + )], + }; + ( + ReadinessStatusDto::Ready, + ready_summary_with_errors( + match goal { + ReadinessGoalDto::LocalNavigation => "Local navigation can use the current index.", + ReadinessGoalDto::AgentPacketSearch => { + "Agent packet/search can use the current index and sidecar retrieval." + } + }, + stats, + ), + minimum_next.clone(), + minimum_next, + ) +} + +fn ready_summary_with_errors(base: &str, stats: &StorageStatsDto) -> String { + if stats.error_count > stats.fatal_error_count { + let nonfatal_count = stats.error_count - stats.fatal_error_count; + let plural = if nonfatal_count == 1 { "" } else { "s" }; + format!( + "{base} Recorded {nonfatal_count} nonfatal indexing error{plural}; inspect doctor for partial coverage." + ) + } else { + base.to_string() + } +} + +fn agent_packet_search_repair_commands(project_arg: &str, include_core_index: bool) -> Vec { + let mut commands = Vec::new(); + if include_core_index { + commands.push(format!( + "codestory-cli index --project {project_arg} --refresh full" + )); + } + commands.extend([ + format!("codestory-cli retrieval bootstrap --project {project_arg} --format json"), + format!( + "codestory-cli retrieval index --project {project_arg} --refresh full --format json" + ), + format!("codestory-cli retrieval status --project {project_arg} --format json"), + format!("codestory-cli doctor --project {project_arg} --format markdown"), + ]); + commands +} + +fn index_repair_state( + goal: ReadinessGoalDto, + reason: &str, + project_arg: &str, + refresh: &str, +) -> (ReadinessStatusDto, String, Vec, Vec) { + let command = format!("codestory-cli index --project {project_arg} --refresh {refresh}"); + ( + ReadinessStatusDto::RepairIndex, + format!( + "{} {} cannot be trusted until the index is repaired.", + reason, + goal_label(goal) + ), + vec![command.clone()], + vec![ + command, + format!("codestory-cli doctor --project {project_arg}"), + ], + ) +} + +fn readiness_index_snapshot( + stats: &StorageStatsDto, + freshness: Option<&IndexFreshnessDto>, +) -> ReadinessIndexSnapshotDto { + ReadinessIndexSnapshotDto { + status: freshness.map(|freshness| freshness.status), + error_count: stats.error_count, + fatal_error_count: stats.fatal_error_count, + changed_file_count: freshness + .map(|freshness| freshness.changed_file_count) + .unwrap_or_default(), + new_file_count: freshness + .map(|freshness| freshness.new_file_count) + .unwrap_or_default(), + removed_file_count: freshness + .map(|freshness| freshness.removed_file_count) + .unwrap_or_default(), + checked_file_count: freshness + .map(|freshness| freshness.checked_file_count) + .unwrap_or_default(), + indexed_file_count: freshness + .map(|freshness| freshness.indexed_file_count) + .unwrap_or(stats.file_count), + } +} + +fn readiness_sidecar_snapshot(input: ReadinessSidecarInput<'_>) -> ReadinessSidecarSnapshotDto { + ReadinessSidecarSnapshotDto { + retrieval_mode: input.retrieval_mode.to_string(), + degraded_reason: input.degraded_reason.map(ToOwned::to_owned), + manifest_generation: input.manifest_generation.map(ToOwned::to_owned), + manifest_input_hash: input.manifest_input_hash.map(ToOwned::to_owned), + } +} + +fn dedupe_commands(commands: impl IntoIterator) -> Vec { + let mut deduped = Vec::new(); + for command in commands { + if !deduped.contains(&command) { + deduped.push(command); + } + } + deduped +} + +fn project_arg(project: &str) -> String { + quote_command_argument_value(&clean_path_string(project)) +} + +#[cfg(test)] +mod tests { + use super::*; + + fn stats(node_count: u32) -> StorageStatsDto { + StorageStatsDto { + node_count, + edge_count: node_count.saturating_sub(1), + file_count: u32::from(node_count > 0), + error_count: 0, + fatal_error_count: 0, + } + } + + fn freshness(status: IndexFreshnessStatusDto) -> IndexFreshnessDto { + IndexFreshnessDto { + status, + changed_file_count: u32::from(status == IndexFreshnessStatusDto::Stale), + new_file_count: 0, + removed_file_count: 0, + checked_file_count: 1, + indexed_file_count: 1, + duration_ms: 1, + reason: None, + samples: Vec::new(), + } + } + + fn inputs<'a>( + stats: &'a StorageStatsDto, + freshness: Option<&'a IndexFreshnessDto>, + sidecar: Option>, + ) -> ReadinessInputs<'a> { + ReadinessInputs { + project: "C:/workspace/project", + stats, + freshness, + sidecar, + } + } + + #[test] + fn missing_index_requires_index_repair_for_all_goals() { + let stats = stats(0); + let verdicts = build_readiness_verdicts(inputs(&stats, None, None)); + + assert_eq!(verdicts.len(), 2); + assert!( + verdicts + .iter() + .all(|verdict| verdict.status == ReadinessStatusDto::RepairIndex), + "missing index should block all readiness goals: {verdicts:?}" + ); + assert!( + verdicts[0].minimum_next[0].contains("--refresh full"), + "missing index repair should request full refresh: {verdicts:?}" + ); + } + + #[test] + fn fatal_indexed_errors_block_ready_verdicts() { + let mut stats = stats(3); + stats.error_count = 2; + stats.fatal_error_count = 2; + let freshness = freshness(IndexFreshnessStatusDto::Fresh); + let verdicts = build_readiness_verdicts(inputs( + &stats, + Some(&freshness), + Some(ReadinessSidecarInput { + retrieval_mode: "full", + degraded_reason: None, + manifest_generation: Some("generation"), + manifest_input_hash: Some("hash"), + }), + )); + + assert!( + verdicts + .iter() + .all(|verdict| verdict.status == ReadinessStatusDto::RepairIndex), + "fatal index errors should block all readiness goals: {verdicts:?}" + ); + assert!( + verdicts + .iter() + .all(|verdict| verdict.summary.contains("2 fatal indexing errors")), + "readiness should explain the recorded fatal index errors: {verdicts:?}" + ); + assert!( + verdicts + .iter() + .all(|verdict| verdict.minimum_next[0].contains("--refresh full")), + "error-bearing indexes should request a full refresh repair: {verdicts:?}" + ); + } + + #[test] + fn nonfatal_index_errors_keep_ready_with_partial_coverage_warning() { + let mut stats = stats(3); + stats.error_count = 2; + let freshness = freshness(IndexFreshnessStatusDto::Fresh); + let verdict = build_readiness_verdict( + ReadinessGoalDto::LocalNavigation, + inputs(&stats, Some(&freshness), None), + ); + + assert_eq!(verdict.status, ReadinessStatusDto::Ready); + assert!( + verdict.summary.contains("2 nonfatal indexing errors"), + "nonfatal errors should be visible without blocking local navigation: {verdict:?}" + ); + assert_eq!( + verdict.index.as_ref().map(|index| index.error_count), + Some(2) + ); + assert_eq!( + verdict.index.as_ref().map(|index| index.fatal_error_count), + Some(0) + ); + } + + #[test] + fn unchecked_index_requires_drift_check_before_ready() { + let stats = stats(3); + let freshness = freshness(IndexFreshnessStatusDto::NotChecked); + let verdict = build_readiness_verdict( + ReadinessGoalDto::LocalNavigation, + inputs(&stats, Some(&freshness), None), + ); + + assert_eq!(verdict.status, ReadinessStatusDto::CheckIndex); + assert_eq!( + verdict.index.as_ref().and_then(|index| index.status), + Some(IndexFreshnessStatusDto::NotChecked) + ); + assert!(verdict.minimum_next[0].contains("--refresh incremental")); + } + + #[test] + fn stale_index_requires_incremental_repair() { + let stats = stats(3); + let freshness = freshness(IndexFreshnessStatusDto::Stale); + let verdict = build_readiness_verdict( + ReadinessGoalDto::AgentPacketSearch, + inputs( + &stats, + Some(&freshness), + Some(ReadinessSidecarInput { + retrieval_mode: "full", + degraded_reason: None, + manifest_generation: Some("generation"), + manifest_input_hash: Some("hash"), + }), + ), + ); + + assert_eq!(verdict.status, ReadinessStatusDto::RepairIndex); + assert!(verdict.minimum_next[0].contains("--refresh incremental")); + assert!(verdict.summary.contains("changed, new, or removed files")); + } + + #[test] + fn agent_readiness_requires_full_sidecar_retrieval() { + let stats = stats(3); + let freshness = freshness(IndexFreshnessStatusDto::Fresh); + let unavailable = build_readiness_verdict( + ReadinessGoalDto::AgentPacketSearch, + inputs(&stats, Some(&freshness), None), + ); + + assert_eq!(unavailable.status, ReadinessStatusDto::RepairRetrieval); + assert!( + unavailable + .summary + .contains("current mode is `unavailable`") + ); + assert!(unavailable.sidecar.is_none()); + + let degraded = build_readiness_verdict( + ReadinessGoalDto::AgentPacketSearch, + inputs( + &stats, + Some(&freshness), + Some(ReadinessSidecarInput { + retrieval_mode: "no_semantic", + degraded_reason: Some("semantic store unavailable"), + manifest_generation: Some("generation"), + manifest_input_hash: Some("hash"), + }), + ), + ); + + assert_eq!(degraded.status, ReadinessStatusDto::RepairRetrieval); + assert_eq!( + degraded + .sidecar + .as_ref() + .and_then(|sidecar| sidecar.degraded_reason.as_deref()), + Some("semantic store unavailable") + ); + assert!( + !degraded + .full_repair + .iter() + .any(|command| command.contains("codestory-cli index")), + "fresh-index sidecar repair should not repeat a full core index: {degraded:?}" + ); + assert!( + degraded + .full_repair + .first() + .is_some_and(|command| command.contains("retrieval bootstrap")), + "fresh-index sidecar repair should start with retrieval bootstrap: {degraded:?}" + ); + assert!( + degraded + .full_repair + .iter() + .any(|command| command.contains("retrieval index") + && command.contains("--refresh full")), + "non-full sidecar repair should include full retrieval index: {degraded:?}" + ); + assert!( + degraded + .full_repair + .iter() + .any(|command| command.contains("retrieval status") + && command.contains("--format json")), + "non-full sidecar full repair should include retrieval status proof: {degraded:?}" + ); + assert!( + degraded.full_repair.last().is_some_and( + |command| command.contains("doctor") && command.contains("--format markdown") + ), + "non-full sidecar full repair should finish with markdown doctor proof: {degraded:?}" + ); + assert_eq!( + degraded.minimum_next, + degraded + .full_repair + .iter() + .take(2) + .cloned() + .collect::>() + ); + } + + #[test] + fn agent_readiness_keeps_core_index_repair_when_freshness_is_unknown() { + let stats = stats(3); + let verdict = build_readiness_verdict( + ReadinessGoalDto::AgentPacketSearch, + inputs( + &stats, + None, + Some(ReadinessSidecarInput { + retrieval_mode: "unavailable", + degraded_reason: None, + manifest_generation: None, + manifest_input_hash: None, + }), + ), + ); + + assert_eq!(verdict.status, ReadinessStatusDto::RepairRetrieval); + assert!( + verdict + .full_repair + .first() + .is_some_and(|command| command.contains("codestory-cli index") + && command.contains("--refresh full")), + "unknown freshness should keep the conservative full core index repair: {verdict:?}" + ); + } +} diff --git a/crates/codestory-cli/src/report.rs b/crates/codestory-cli/src/report.rs index 984c6fa8..bf0e0a9d 100644 --- a/crates/codestory-cli/src/report.rs +++ b/crates/codestory-cli/src/report.rs @@ -1,10 +1,12 @@ use anyhow::{Result, bail}; use std::fmt::Write as _; -use crate::args::{OutputFormat, ReportCommand}; +use codestory_runtime::graph_analysis::{RepoReport, RepoReportHandoff, ReportNodeSummary}; + +use crate::args::{OutputFormat, ReportCommand, ReportProfile}; use crate::display::clean_path_string; use crate::output::{emit, validate_output_file_parent}; -use crate::runtime::{RuntimeContext, ensure_index_ready}; +use crate::runtime::{RuntimeContext, ensure_index_ready, map_cache_busy_anyhow}; pub(crate) fn run_report(cmd: ReportCommand) -> Result<()> { if matches!(cmd.format, OutputFormat::Dot) { @@ -17,23 +19,28 @@ pub(crate) fn run_report(cmd: ReportCommand) -> Result<()> { let runtime = RuntimeContext::new_inspect_only(&cmd.project)?; let opened = runtime.ensure_open(crate::args::RefreshMode::None)?; ensure_index_ready(&opened, "report")?; + let sidecar = report_sidecar_status(&runtime); match cmd.format { OutputFormat::Markdown => { - let output = codestory_runtime::graph_analysis::build_report( + let mut output = codestory_runtime::graph_analysis::build_report( &runtime.project_root, &runtime.storage_path, cmd.limit, - )?; - let markdown = render_report_markdown(&output); + ) + .map_err(|error| map_cache_busy_anyhow(error, &runtime.project_root))?; + attach_report_handoff(&mut output, &opened.summary, &sidecar); + let markdown = render_report_markdown(&output, cmd.profile); emit(cmd.format, &output, markdown, cmd.output_file.as_deref()) } OutputFormat::Json => { - let output = codestory_runtime::graph_analysis::build_report_export( + let mut output = codestory_runtime::graph_analysis::build_report_export( &runtime.project_root, &runtime.storage_path, cmd.limit, - )?; - let markdown = render_report_markdown(&output.report); + ) + .map_err(|error| map_cache_busy_anyhow(error, &runtime.project_root))?; + attach_report_handoff(&mut output.report, &opened.summary, &sidecar); + let markdown = render_report_markdown(&output.report, cmd.profile); emit(cmd.format, &output, markdown, cmd.output_file.as_deref()) } OutputFormat::Dot => { @@ -42,7 +49,7 @@ pub(crate) fn run_report(cmd: ReportCommand) -> Result<()> { } } -fn render_report_markdown(output: &codestory_runtime::graph_analysis::RepoReport) -> String { +fn render_report_markdown(output: &RepoReport, profile: ReportProfile) -> String { let mut markdown = String::new(); let _ = writeln!(markdown, "# CodeStory Repo Report"); let _ = writeln!( @@ -67,6 +74,12 @@ fn render_report_markdown(output: &codestory_runtime::graph_analysis::RepoReport ); let _ = writeln!(markdown); + append_handoff_header(&mut markdown, output); + if profile == ReportProfile::Handoff { + append_follow_ups(&mut markdown, output); + return markdown; + } + append_summary(&mut markdown, output); append_node_section(&mut markdown, "Hotspots", &output.hotspots); append_node_section(&mut markdown, "Entry Points", &output.entry_points); @@ -83,6 +96,148 @@ fn render_report_markdown(output: &codestory_runtime::graph_analysis::RepoReport markdown } +#[derive(Debug, Clone)] +struct ReportSidecarStatus { + retrieval_mode: String, + degraded_reason: Option, + manifest_generation: Option, + manifest_input_hash: Option, +} + +fn report_sidecar_status(runtime: &RuntimeContext) -> ReportSidecarStatus { + match codestory_retrieval::strict_sidecar_status( + &runtime.project_root, + Some(&runtime.storage_path), + ) { + Ok(report) => { + let manifest_generation = report + .manifest + .as_ref() + .and_then(|manifest| manifest.sidecar_generation.clone()); + let manifest_input_hash = report + .manifest + .as_ref() + .and_then(|manifest| manifest.sidecar_input_hash.clone()); + ReportSidecarStatus { + retrieval_mode: report.retrieval_mode, + degraded_reason: report.degraded_reason, + manifest_generation, + manifest_input_hash, + } + } + Err(error) => ReportSidecarStatus { + retrieval_mode: "unavailable".to_string(), + degraded_reason: Some(format!("sidecar_status_error: {error}")), + manifest_generation: None, + manifest_input_hash: None, + }, + } +} + +fn attach_report_handoff( + output: &mut RepoReport, + summary: &codestory_contracts::api::ProjectSummary, + sidecar: &ReportSidecarStatus, +) { + let readiness = crate::readiness::build_readiness_verdicts(crate::readiness::ReadinessInputs { + project: &summary.root, + stats: &summary.stats, + freshness: summary.freshness.as_ref(), + sidecar: Some(crate::readiness::ReadinessSidecarInput { + retrieval_mode: &sidecar.retrieval_mode, + degraded_reason: sidecar.degraded_reason.as_deref(), + manifest_generation: sidecar.manifest_generation.as_deref(), + manifest_input_hash: sidecar.manifest_input_hash.as_deref(), + }), + }); + let next_command = crate::readiness::primary_non_ready(&readiness) + .and_then(|verdict| verdict.minimum_next.first().cloned()) + .or_else(|| { + output + .follow_up_queries + .first() + .map(|query| query.command.clone()) + }) + .or_else(|| { + crate::readiness::combined_minimum_next(&readiness) + .into_iter() + .next() + }); + let trust_caveat = if crate::readiness::primary_non_ready(&readiness).is_some() { + "Readiness is not fully green; run the next command before trusting agent packet/search output.".to_string() + } else { + "Generated from the current local store; treat it as a handoff snapshot, not source-of-truth state.".to_string() + }; + output.metadata.handoff = Some(RepoReportHandoff { + readiness, + freshness: summary.freshness.clone(), + sidecar_retrieval_mode: Some(sidecar.retrieval_mode.clone()), + degraded_reason: sidecar.degraded_reason.clone(), + trust_caveat, + top_entry_point: output.entry_points.first().map(report_node_label), + top_risk: output.hotspots.first().map(report_node_label), + next_command, + }); +} + +fn append_handoff_header(markdown: &mut String, output: &RepoReport) { + let _ = writeln!(markdown, "## Read This First / Agent Handoff"); + let Some(handoff) = output.metadata.handoff.as_ref() else { + let _ = writeln!(markdown, "- readiness: not attached"); + let _ = writeln!(markdown); + return; + }; + for verdict in &handoff.readiness { + let _ = writeln!( + markdown, + "- readiness {}: `{}` - {}", + crate::readiness::goal_label(verdict.goal), + crate::readiness::status_label(verdict.status), + verdict.summary + ); + } + if let Some(freshness) = handoff.freshness.as_ref() { + let stale_count = freshness + .changed_file_count + .saturating_add(freshness.new_file_count) + .saturating_add(freshness.removed_file_count); + let _ = writeln!( + markdown, + "- freshness: `{:?}` stale_files={} checked={} indexed={}", + freshness.status, + stale_count, + freshness.checked_file_count, + freshness.indexed_file_count + ); + } else { + let _ = writeln!(markdown, "- freshness: not checked"); + } + let _ = writeln!( + markdown, + "- sidecar: mode={} degraded_reason={}", + handoff + .sidecar_retrieval_mode + .as_deref() + .unwrap_or("unknown"), + handoff.degraded_reason.as_deref().unwrap_or("none") + ); + let _ = writeln!(markdown, "- trust_caveat: {}", handoff.trust_caveat); + let _ = writeln!( + markdown, + "- top_entry_point: {}", + handoff.top_entry_point.as_deref().unwrap_or("n/a") + ); + let _ = writeln!( + markdown, + "- top_risk: {}", + handoff.top_risk.as_deref().unwrap_or("n/a") + ); + if let Some(command) = handoff.next_command.as_deref() { + let _ = writeln!(markdown, "- next_command: `{}`", markdown_escape(command)); + } + let _ = writeln!(markdown); +} + fn append_summary(markdown: &mut String, output: &codestory_runtime::graph_analysis::RepoReport) { let summary = &output.summary; let _ = writeln!(markdown, "## Repo Summary"); @@ -188,6 +343,17 @@ fn render_source_location( rendered } +fn report_node_label(node: &ReportNodeSummary) -> String { + let source = render_source_location(node.source_location.as_ref()); + format!( + "`{}` ({}, edges={}) at {}", + markdown_escape(&node.name), + node.kind, + node.total_edges, + source + ) +} + fn markdown_escape(value: &str) -> String { value.replace('|', "\\|").replace('\n', " ") } diff --git a/crates/codestory-cli/src/retrieval.rs b/crates/codestory-cli/src/retrieval.rs index e5813be8..823508e0 100644 --- a/crates/codestory-cli/src/retrieval.rs +++ b/crates/codestory-cli/src/retrieval.rs @@ -98,7 +98,7 @@ fn run_retrieval_index(cmd: RetrievalIndexCommand) -> Result<()> { let summary = runtime.open_project_summary()?; let refresh_mode = resolve_refresh_request(cmd.refresh, &summary); run_retrieval_index_refresh(&runtime, cmd.refresh, refresh_mode)?; - let outcome = finalize_retrieval_index(&runtime).or_else(|error| { + let outcome = finalize_retrieval_index_for_runtime(&runtime).or_else(|error| { if !retrieval_index_should_retry_full_refresh(cmd.refresh, &error) { return Err(error); } @@ -106,7 +106,7 @@ fn run_retrieval_index(cmd: RetrievalIndexCommand) -> Result<()> { .index .run_indexing_blocking(IndexMode::Full) .map_err(map_api_error)?; - finalize_retrieval_index(&runtime) + finalize_retrieval_index_for_runtime(&runtime) .context("retrieval index finalize after semantic-doc contract repair") })?; emit_retrieval_index(cmd.format, &outcome, cmd.output_file.as_deref()) @@ -138,7 +138,9 @@ fn run_retrieval_index_refresh( }) } -fn finalize_retrieval_index(runtime: &RuntimeContext) -> Result { +pub(crate) fn finalize_retrieval_index_for_runtime( + runtime: &RuntimeContext, +) -> Result { let opened = runtime.ensure_open(crate::args::RefreshMode::None)?; ensure_index_ready(&opened, "retrieval index")?; codestory_retrieval::finalize_index(&runtime.project_root, &runtime.storage_path) diff --git a/crates/codestory-cli/src/runtime.rs b/crates/codestory-cli/src/runtime.rs index 07aeed06..5af07906 100644 --- a/crates/codestory-cli/src/runtime.rs +++ b/crates/codestory-cli/src/runtime.rs @@ -11,7 +11,9 @@ use directories::ProjectDirs; use std::path::{Path, PathBuf}; use crate::args::{ProjectArgs, QuerySelectorOutput, RefreshMode, TargetSelection}; -use crate::display::{clean_path_string, format_search_hit_target, relative_path}; +use crate::display::{ + clean_path_string, format_search_hit_target, quote_command_path, relative_path, +}; use crate::query_resolution::{ ResolutionRank, compare_resolution_hits, file_filter_match_bucket, is_resolvable_graph_target, resolution_rank_with_project_root, search_hit_matches_file_filter, @@ -150,7 +152,7 @@ impl RuntimeContext { self.project_root.clone(), self.storage_path.clone(), ) - .map_err(map_api_error) + .map_err(|error| map_api_error_for_project(error, &self.project_root)) } } @@ -450,8 +452,34 @@ pub(crate) fn ensure_index_ready(opened: &OpenedProject, subcommand: &str) -> Re } pub(crate) fn map_api_error(error: ApiError) -> anyhow::Error { + map_api_error_with_project(error, None) +} + +pub(crate) fn map_api_error_for_project(error: ApiError, project: &Path) -> anyhow::Error { + map_api_error_with_project(error, Some(project)) +} + +fn map_api_error_with_project(error: ApiError, project: Option<&Path>) -> anyhow::Error { + if api_error_is_cache_busy(&error) { + return anyhow!(cache_busy_message(project)); + } let mut message = format!("{}: {}", error.code, error.message); - if let Some(next_commands) = api_error_next_commands(&error) { + if let Some((minimum_next, full_repair)) = api_error_repair_groups(&error) { + if !minimum_next.is_empty() { + message.push_str("\n\nMinimum next:"); + for command in minimum_next { + message.push_str("\n "); + message.push_str(command); + } + } + if !full_repair.is_empty() && full_repair != minimum_next { + message.push_str("\n\nFull repair:"); + for command in full_repair { + message.push_str("\n "); + message.push_str(command); + } + } + } else if let Some(next_commands) = api_error_next_commands(&error) { message.push_str("\n\nNext commands:"); for command in next_commands { message.push_str("\n "); @@ -461,11 +489,50 @@ pub(crate) fn map_api_error(error: ApiError) -> anyhow::Error { anyhow!(message) } +pub(crate) fn map_cache_busy_anyhow(error: anyhow::Error, project: &Path) -> anyhow::Error { + if is_cache_busy_text(&error.to_string()) { + return anyhow!(cache_busy_message(Some(project))); + } + error +} + +fn api_error_repair_groups(error: &ApiError) -> Option<(&[String], &[String])> { + let details = error.details.as_ref()?; + if details.minimum_next.is_empty() && details.full_repair.is_empty() { + return details.readiness.as_ref().map(|verdict| { + ( + verdict.minimum_next.as_slice(), + verdict.full_repair.as_slice(), + ) + }); + } + Some((&details.minimum_next, &details.full_repair)) +} + fn api_error_next_commands(error: &ApiError) -> Option> { let commands = &error.details.as_ref()?.next_commands; (!commands.is_empty()).then_some(commands.clone()) } +fn api_error_is_cache_busy(error: &ApiError) -> bool { + let text = format!("{} {}", error.code, error.message).to_ascii_lowercase(); + is_cache_busy_text(&text) +} + +fn is_cache_busy_text(text: &str) -> bool { + let text = text.to_ascii_lowercase(); + text.contains("database is locked") || text.contains("sqlite_busy") +} + +fn cache_busy_message(project: Option<&Path>) -> String { + let project = project + .map(quote_command_path) + .unwrap_or_else(|| "".to_string()); + format!( + "cache_busy: CodeStory cache is busy or locked. Wait for the active indexing/search process to release the SQLite cache, then retry.\n\nMinimum next:\n codestory-cli ready --project {project} --goal agent\n\nFull repair:\n codestory-cli ready --project {project} --goal agent\n codestory-cli doctor --project {project}" + ) +} + pub(crate) fn search_hit_from_node(node: &NodeDetailsDto) -> SearchHit { SearchHit { node_id: node.id.clone(), diff --git a/crates/codestory-cli/src/stdio_catalog.rs b/crates/codestory-cli/src/stdio_catalog.rs index bb822a51..f43e9b1b 100644 --- a/crates/codestory-cli/src/stdio_catalog.rs +++ b/crates/codestory-cli/src/stdio_catalog.rs @@ -766,8 +766,8 @@ static AGENT_PACKET_SCHEMA: SchemaObject = SchemaObject::object( "Covered claims, gaps, and follow-up contract.", ), SchemaProperty::object( - "benchmark_trace", - "Benchmark-oriented retrieval trace summary.", + "retrieval_trace_summary", + "Compact retrieval trace telemetry summary.", ), ], &[ @@ -777,7 +777,7 @@ static AGENT_PACKET_SCHEMA: SchemaObject = SchemaObject::object( "answer", "budget", "sufficiency", - "benchmark_trace", + "retrieval_trace_summary", ], ); @@ -957,6 +957,10 @@ static PACKET_INPUT_SCHEMA: SchemaObject = SchemaObject::object( SchemaProperty::string("task_class", "Optional task class.") .with_enum(PACKET_TASK_CLASSES) .nullable(), + SchemaProperty::string_array( + "extra_probes", + "Optional audited file, symbol, or file-scoped symbol probes to add to the packet plan.", + ), SchemaProperty::boolean( "include_evidence", "Include citation edge ids and score details.", diff --git a/crates/codestory-cli/src/stdio_transport.rs b/crates/codestory-cli/src/stdio_transport.rs index fa39d41f..c9818a32 100644 --- a/crates/codestory-cli/src/stdio_transport.rs +++ b/crates/codestory-cli/src/stdio_transport.rs @@ -473,19 +473,27 @@ fn handle_stdio_packet( Ok(latency_budget_ms) => latency_budget_ms, Err(error) => return serde_json::json!({"error": error.to_string()}), }; + let extra_probes = match stdio_packet_extra_probes(request) { + Ok(extra_probes) => extra_probes, + Err(error) => return serde_json::json!({"error": error.to_string()}), + }; let include_evidence = request .pointer("/params/arguments/include_evidence") .and_then(|value| value.as_bool()) .unwrap_or(true); - let cache_key = stdio_packet_cache_key( - stdio_storage_fingerprint(&runtime.storage_path), - stdio_mandatory_sidecar_fingerprint(&runtime.project_root, &runtime.storage_path), + let cache_key = stdio_packet_cache_key(StdioPacketCacheKeyInput { + storage_fingerprint: stdio_storage_fingerprint(&runtime.storage_path), + sidecar_fingerprint: stdio_mandatory_sidecar_fingerprint( + &runtime.project_root, + &runtime.storage_path, + ), question, budget, task_class, + extra_probes: &extra_probes, include_evidence, latency_budget_ms, - ); + }); if let Some(cached) = state.packet_cache.get(&cache_key) { return cached; } @@ -495,6 +503,7 @@ fn handle_stdio_packet( question: question.to_string(), budget, task_class, + extra_probes, include_evidence, latency_budget_ms, }) @@ -513,6 +522,7 @@ struct StdioPacketCacheKey { question: String, budget: &'static str, task_class: Option<&'static str>, + extra_probes: Vec, include_evidence: bool, latency_budget_ms: Option, } @@ -578,23 +588,27 @@ impl StdioPacketCache { } } -fn stdio_packet_cache_key( +struct StdioPacketCacheKeyInput<'a> { storage_fingerprint: String, sidecar_fingerprint: String, - question: &str, + question: &'a str, budget: PacketBudgetModeDto, task_class: Option, + extra_probes: &'a [String], include_evidence: bool, latency_budget_ms: Option, -) -> StdioPacketCacheKey { +} + +fn stdio_packet_cache_key(input: StdioPacketCacheKeyInput<'_>) -> StdioPacketCacheKey { StdioPacketCacheKey { - storage_fingerprint, - sidecar_fingerprint, - question: question.to_string(), - budget: stdio_packet_budget_label(budget), - task_class: task_class.map(stdio_packet_task_class_label), - include_evidence, - latency_budget_ms, + storage_fingerprint: input.storage_fingerprint, + sidecar_fingerprint: input.sidecar_fingerprint, + question: input.question.to_string(), + budget: stdio_packet_budget_label(input.budget), + task_class: input.task_class.map(stdio_packet_task_class_label), + extra_probes: input.extra_probes.to_vec(), + include_evidence: input.include_evidence, + latency_budget_ms: input.latency_budget_ms, } } @@ -776,6 +790,39 @@ fn stdio_packet_latency_budget(request: &serde_json::Value) -> Result Result> { + let Some(value) = request.pointer("/params/arguments/extra_probes") else { + return Ok(Vec::new()); + }; + let Some(values) = value.as_array() else { + bail!("packet.extra_probes must be an array of strings"); + }; + if values.len() > 16 { + bail!("packet.extra_probes accepts at most 16 probes"); + } + + let mut probes = Vec::new(); + for value in values { + let Some(probe) = value.as_str() else { + bail!("packet.extra_probes must be an array of strings"); + }; + let probe = probe.trim(); + if probe.is_empty() { + continue; + } + if probe.len() > 240 { + bail!("packet.extra_probes entries must be at most 240 characters"); + } + if !probes + .iter() + .any(|existing: &String| existing.eq_ignore_ascii_case(probe)) + { + probes.push(probe.to_string()); + } + } + Ok(probes) +} + fn handle_stdio_search( runtime: &RuntimeContext, state: &mut StdioServerState, @@ -1424,93 +1471,58 @@ fn read_stdio_resource(runtime: &RuntimeContext, uri: &str) -> serde_json::Value fn read_stdio_status_resource(runtime: &RuntimeContext) -> Result { let summary = runtime.open_project_summary()?; let retrieval = summary.retrieval.as_ref(); - let sidecar = match codestory_retrieval::strict_sidecar_status( - &runtime.project_root, - Some(&runtime.storage_path), - ) { - Ok(report) => serde_json::json!({ - "retrieval_mode": report.retrieval_mode, - "degraded_reason": report.degraded_reason, - "manifest_generation": report.manifest.as_ref().and_then(|manifest| manifest.sidecar_generation.as_deref()), - "manifest_input_hash": report.manifest.as_ref().and_then(|manifest| manifest.sidecar_input_hash.as_deref()), - }), - Err(error) => serde_json::json!({ - "retrieval_mode": "unavailable", - "degraded_reason": format!("sidecar_status_error: {error}"), - }), - }; - let sidecar_mode = sidecar - .get("retrieval_mode") - .and_then(serde_json::Value::as_str) - .unwrap_or("unavailable"); - let recommended_next_calls = if sidecar_mode == "full" { - serde_json::json!([ - { - "method": "resources/read", - "uri": "codestory://agent-guide" - }, - { - "method": "tools/call", - "tool": "packet", - "arguments": { - "question": "", - "budget": "compact" - } - }, - { - "method": "tools/call", - "tool": "search", - "arguments": { - "query": "", - "limit": 10 - } - }, - { - "method": "tools/call", - "tool": "definition", - "arguments": { - "id": "" - } - }, - { - "method": "resources/read", - "uri": "codestory://trail/" - } - ]) - } else { - serde_json::json!([ - { - "method": "cli", - "command": "codestory-cli retrieval status --project " - }, - { - "method": "cli", - "command": "codestory-cli retrieval index --project --refresh full" - }, - { - "method": "resources/read", - "uri": "codestory://status" - }, - { - "method": "resources/read", - "uri": "codestory://agent-guide" - }, - { - "method": "tools/call", - "tool": "search", - "arguments": { - "query": "", - "limit": 10 - } + let (sidecar_mode, degraded_reason, manifest_generation, manifest_input_hash) = + match codestory_retrieval::strict_sidecar_status( + &runtime.project_root, + Some(&runtime.storage_path), + ) { + Ok(report) => { + let manifest_generation = report + .manifest + .as_ref() + .and_then(|manifest| manifest.sidecar_generation.clone()); + let manifest_input_hash = report + .manifest + .as_ref() + .and_then(|manifest| manifest.sidecar_input_hash.clone()); + ( + report.retrieval_mode, + report.degraded_reason, + manifest_generation, + manifest_input_hash, + ) } - ]) - }; + Err(error) => ( + "unavailable".to_string(), + Some(format!("sidecar_status_error: {error}")), + None, + None, + ), + }; + let sidecar = serde_json::json!({ + "retrieval_mode": sidecar_mode.clone(), + "degraded_reason": degraded_reason.clone(), + "manifest_generation": manifest_generation.clone(), + "manifest_input_hash": manifest_input_hash.clone(), + }); + let readiness = crate::readiness::build_readiness_verdicts(crate::readiness::ReadinessInputs { + project: &summary.root, + stats: &summary.stats, + freshness: summary.freshness.as_ref(), + sidecar: Some(crate::readiness::ReadinessSidecarInput { + retrieval_mode: &sidecar_mode, + degraded_reason: degraded_reason.as_deref(), + manifest_generation: manifest_generation.as_deref(), + manifest_input_hash: manifest_input_hash.as_deref(), + }), + }); + let recommended_next_calls = stdio_status_recommended_next_calls(&readiness); Ok(serde_json::json!({ "project_root": crate::display::clean_path_string(&runtime.project_root.to_string_lossy()), "storage_path": crate::display::clean_path_string(&runtime.storage_path.to_string_lossy()), "storage_exists": runtime.storage_path.exists(), - "retrieval_mode": sidecar["retrieval_mode"], - "degraded_reason": sidecar["degraded_reason"], + "retrieval_mode": sidecar_mode, + "degraded_reason": degraded_reason, "sidecar_retrieval": sidecar, "legacy_semantic_diagnostics": { "mode": retrieval.map(|state| state.mode), @@ -1521,10 +1533,74 @@ fn read_stdio_status_resource(runtime: &RuntimeContext) -> Result serde_json::Value { + if let Some(non_ready) = crate::readiness::primary_non_ready(readiness) { + return serde_json::Value::Array( + non_ready + .full_repair + .iter() + .map(|command| { + serde_json::json!({ + "method": "cli", + "command": command + }) + }) + .chain([ + serde_json::json!({ + "method": "resources/read", + "uri": "codestory://status" + }), + serde_json::json!({ + "method": "resources/read", + "uri": "codestory://agent-guide" + }), + ]) + .collect(), + ); + } + + serde_json::json!([ + { + "method": "resources/read", + "uri": "codestory://agent-guide" + }, + { + "method": "tools/call", + "tool": "packet", + "arguments": { + "question": "", + "budget": "compact" + } + }, + { + "method": "tools/call", + "tool": "search", + "arguments": { + "query": "", + "limit": 10 + } + }, + { + "method": "tools/call", + "tool": "definition", + "arguments": { + "id": "" + } + }, + { + "method": "resources/read", + "uri": "codestory://trail/" + } + ]) +} + fn read_stdio_agent_guide_resource() -> serde_json::Value { serde_json::json!({ "purpose": "Default read-only CodeStory browser loop for local codebase grounding.", @@ -1688,16 +1764,24 @@ mod tests { use super::*; use serde_json::json; - fn packet_key(question: &str, storage_fingerprint: &str) -> StdioPacketCacheKey { - stdio_packet_cache_key( - storage_fingerprint.to_string(), - "sidecar-full".to_string(), + fn base_packet_cache_key_input(question: &str) -> StdioPacketCacheKeyInput<'_> { + StdioPacketCacheKeyInput { + storage_fingerprint: "snapshot-a".to_string(), + sidecar_fingerprint: "sidecar-full".to_string(), question, - PacketBudgetModeDto::Compact, - Some(PacketTaskClassDto::ArchitectureExplanation), - true, - Some(15_000), - ) + budget: PacketBudgetModeDto::Compact, + task_class: Some(PacketTaskClassDto::ArchitectureExplanation), + extra_probes: &[], + include_evidence: true, + latency_budget_ms: Some(15_000), + } + } + + fn packet_key(question: &str, storage_fingerprint: &str) -> StdioPacketCacheKey { + stdio_packet_cache_key(StdioPacketCacheKeyInput { + storage_fingerprint: storage_fingerprint.to_string(), + ..base_packet_cache_key_input(question) + }) } #[test] @@ -1793,74 +1877,50 @@ mod tests { #[test] fn stdio_packet_cache_key_changes_with_request_arguments_and_snapshot() { - let base = stdio_packet_cache_key( - "snapshot-a".to_string(), - "sidecar-full".to_string(), - "Explain packet caching.", - PacketBudgetModeDto::Compact, - Some(PacketTaskClassDto::ArchitectureExplanation), - true, - Some(15_000), + let base_input = base_packet_cache_key_input("Explain packet caching."); + let base = stdio_packet_cache_key(base_input); + assert_ne!( + base, + stdio_packet_cache_key(StdioPacketCacheKeyInput { + storage_fingerprint: "snapshot-b".to_string(), + ..base_packet_cache_key_input("Explain packet caching.") + }) ); assert_ne!( base, - stdio_packet_cache_key( - "snapshot-b".to_string(), - "sidecar-full".to_string(), - "Explain packet caching.", - PacketBudgetModeDto::Compact, - Some(PacketTaskClassDto::ArchitectureExplanation), - true, - Some(15_000), - ) + stdio_packet_cache_key(StdioPacketCacheKeyInput { + budget: PacketBudgetModeDto::Tiny, + ..base_packet_cache_key_input("Explain packet caching.") + }) ); assert_ne!( base, - stdio_packet_cache_key( - "snapshot-a".to_string(), - "sidecar-full".to_string(), - "Explain packet caching.", - PacketBudgetModeDto::Tiny, - Some(PacketTaskClassDto::ArchitectureExplanation), - true, - Some(15_000), - ) + stdio_packet_cache_key(StdioPacketCacheKeyInput { + task_class: Some(PacketTaskClassDto::EditPlanning), + ..base_packet_cache_key_input("Explain packet caching.") + }) ); assert_ne!( base, - stdio_packet_cache_key( - "snapshot-a".to_string(), - "sidecar-full".to_string(), - "Explain packet caching.", - PacketBudgetModeDto::Compact, - Some(PacketTaskClassDto::EditPlanning), - true, - Some(15_000), - ) + stdio_packet_cache_key(StdioPacketCacheKeyInput { + include_evidence: false, + ..base_packet_cache_key_input("Explain packet caching.") + }) ); assert_ne!( base, - stdio_packet_cache_key( - "snapshot-a".to_string(), - "sidecar-full".to_string(), - "Explain packet caching.", - PacketBudgetModeDto::Compact, - Some(PacketTaskClassDto::ArchitectureExplanation), - false, - Some(15_000), - ) + stdio_packet_cache_key(StdioPacketCacheKeyInput { + latency_budget_ms: Some(30_000), + ..base_packet_cache_key_input("Explain packet caching.") + }) ); + let extra_probes = ["src/lib.rs run".to_string()]; assert_ne!( base, - stdio_packet_cache_key( - "snapshot-a".to_string(), - "sidecar-full".to_string(), - "Explain packet caching.", - PacketBudgetModeDto::Compact, - Some(PacketTaskClassDto::ArchitectureExplanation), - true, - Some(30_000), - ) + stdio_packet_cache_key(StdioPacketCacheKeyInput { + extra_probes: &extra_probes, + ..base_packet_cache_key_input("Explain packet caching.") + }) ); } @@ -1871,24 +1931,16 @@ mod tests { "retrieval_mode:full|manifest_generation:project-a|manifest_input_hash:hash-a"; let stale_sidecar = "retrieval_mode:unavailable|degraded_reason:sidecar_manifest_stale"; - let packet_full = stdio_packet_cache_key( - storage_fingerprint.clone(), - full_sidecar.to_string(), - "Explain packet caching.", - PacketBudgetModeDto::Compact, - Some(PacketTaskClassDto::ArchitectureExplanation), - true, - Some(15_000), - ); - let packet_stale = stdio_packet_cache_key( - storage_fingerprint.clone(), - stale_sidecar.to_string(), - "Explain packet caching.", - PacketBudgetModeDto::Compact, - Some(PacketTaskClassDto::ArchitectureExplanation), - true, - Some(15_000), - ); + let packet_full = stdio_packet_cache_key(StdioPacketCacheKeyInput { + storage_fingerprint: storage_fingerprint.clone(), + sidecar_fingerprint: full_sidecar.to_string(), + ..base_packet_cache_key_input("Explain packet caching.") + }); + let packet_stale = stdio_packet_cache_key(StdioPacketCacheKeyInput { + storage_fingerprint: storage_fingerprint.clone(), + sidecar_fingerprint: stale_sidecar.to_string(), + ..base_packet_cache_key_input("Explain packet caching.") + }); assert_ne!(packet_full, packet_stale); let search_full = StdioSearchFragmentCacheKey { @@ -1926,6 +1978,11 @@ mod tests { sidecar_input_hash: Some("hash-a".into()), sidecar_generation: Some("project-a-hash-a".into()), projection_count: Some(12), + symbol_doc_count: Some(120), + dense_projection_count: Some(12), + semantic_policy_version: Some("graph_first_v1".into()), + graph_artifact_hash: Some("graph-hash-a".into()), + dense_reason_counts_json: Some(r#"{"public_api":12}"#.into()), }; let before_stale = stdio_mandatory_sidecar_fingerprint_from_status( codestory_retrieval::embedding_runtime_id(), @@ -1936,15 +1993,14 @@ mod tests { manifest: Some(manifest.clone()), }), ); - let successful_key = stdio_packet_cache_key( - storage_fingerprint.clone(), - before_stale.clone(), - "Explain strict readiness.", - PacketBudgetModeDto::Compact, - None, - true, - None, - ); + let successful_key = stdio_packet_cache_key(StdioPacketCacheKeyInput { + storage_fingerprint: storage_fingerprint.clone(), + sidecar_fingerprint: before_stale.clone(), + question: "Explain strict readiness.", + task_class: None, + latency_budget_ms: None, + ..base_packet_cache_key_input("Explain strict readiness.") + }); let mut cache = StdioPacketCache::default(); cache.insert( successful_key.clone(), @@ -1963,15 +2019,14 @@ mod tests { manifest: Some(manifest), }), ); - let stale_key = stdio_packet_cache_key( - storage_fingerprint.clone(), - after_stale.clone(), - "Explain strict readiness.", - PacketBudgetModeDto::Compact, - None, - true, - None, - ); + let stale_key = stdio_packet_cache_key(StdioPacketCacheKeyInput { + storage_fingerprint: storage_fingerprint.clone(), + sidecar_fingerprint: after_stale.clone(), + question: "Explain strict readiness.", + task_class: None, + latency_budget_ms: None, + ..base_packet_cache_key_input("Explain strict readiness.") + }); assert_ne!(before_stale, after_stale); assert!( diff --git a/crates/codestory-cli/tests/agent_quality_eval.rs b/crates/codestory-cli/tests/agent_quality_eval.rs index f32438a9..29c940a4 100644 --- a/crates/codestory-cli/tests/agent_quality_eval.rs +++ b/crates/codestory-cli/tests/agent_quality_eval.rs @@ -116,6 +116,15 @@ struct ScoredClaim { } const MIN_CONFIDENCE_CALIBRATION: f64 = 0.70; +const ALLOW_ZERO_REAL_REPO_EVAL_ENV: &str = "CODESTORY_ALLOW_SKIP_LOCAL_REAL_AGENT_QUALITY"; + +fn allow_zero_real_repo_eval_value(value: Option<&str>) -> bool { + matches!(value.map(str::trim), Some("1")) +} + +fn allow_zero_real_repo_eval_from_env() -> bool { + allow_zero_real_repo_eval_value(std::env::var(ALLOW_ZERO_REAL_REPO_EVAL_ENV).ok().as_deref()) +} fn fixture_root() -> PathBuf { PathBuf::from(env!("CARGO_MANIFEST_DIR")) @@ -563,6 +572,18 @@ fn local_evidence_path_validation_rejects_stale_sources() { validate_local_evidence_paths(&manifest, root); } +#[test] +fn zero_real_repo_eval_escape_hatch_requires_exact_one() { + assert!(allow_zero_real_repo_eval_value(Some("1"))); + assert!(allow_zero_real_repo_eval_value(Some(" 1 "))); + for value in [None, Some(""), Some("0"), Some("true"), Some("yes")] { + assert!( + !allow_zero_real_repo_eval_value(value), + "only {ALLOW_ZERO_REAL_REPO_EVAL_ENV}=1 should allow skip-only local evidence; got {value:?}" + ); + } +} + #[test] #[ignore = "local-only real-repo evaluator; run on the Windows workstation with sibling repos present"] fn local_real_repo_manifests_score_or_explicitly_skip_missing_repos() { @@ -610,8 +631,16 @@ fn local_real_repo_manifests_score_or_explicitly_skip_missing_repos() { evaluated += 1; } - assert!( - evaluated > 0 || !skipped.is_empty(), - "local-only path should either evaluate manifests or report explicit skips" - ); + if evaluated == 0 { + assert!( + allow_zero_real_repo_eval_from_env(), + "local-only real-repo quality evaluator evaluated 0 repos; missing repos: {}. \ +Set {ALLOW_ZERO_REAL_REPO_EVAL_ENV}=1 only when intentionally collecting skip-only local evidence.", + skipped.join(", ") + ); + eprintln!( + "intentionally skipping local-only agent-quality manifests because {ALLOW_ZERO_REAL_REPO_EVAL_ENV}=1; missing repos: {}", + skipped.join(", ") + ); + } } diff --git a/crates/codestory-cli/tests/architecture_contracts.rs b/crates/codestory-cli/tests/architecture_contracts.rs index a49bc9fa..1401d9cc 100644 --- a/crates/codestory-cli/tests/architecture_contracts.rs +++ b/crates/codestory-cli/tests/architecture_contracts.rs @@ -341,7 +341,7 @@ fn stdio_tool_catalog_stays_aligned_with_read_only_browser_service_operations() #[test] fn web_cockpit_stays_deferred_until_browser_surface_gate_opens() { - let gate = read("docs/architecture/browser-surface-gate.md"); + let gate = read("docs/architecture/overview.md"); let warm_stats = read("docs/testing/codestory-stdio-warm-loop-stats.md"); let runtime_path = read("docs/architecture/runtime-execution-path.md"); let cli_args = read("crates/codestory-cli/src/args.rs"); diff --git a/crates/codestory-cli/tests/cli_golden_path.rs b/crates/codestory-cli/tests/cli_golden_path.rs index 75350569..53ab6691 100644 --- a/crates/codestory-cli/tests/cli_golden_path.rs +++ b/crates/codestory-cli/tests/cli_golden_path.rs @@ -635,8 +635,7 @@ fn doctor_next_commands_stop_at_index_repair_when_inventory_is_stale() { assert!( joined.contains("codestory-cli index") && joined.contains("--refresh incremental") - && joined.contains("codestory-cli doctor") - && joined.contains("--format markdown"), + && joined.contains("codestory-cli doctor"), "stale doctor should recommend index repair then doctor recheck: {doctor:#}" ); assert!( @@ -706,8 +705,7 @@ fn doctor_next_commands_stop_at_retrieval_repair_when_sidecar_is_not_full() { joined.contains("codestory-cli retrieval status") && joined.contains("codestory-cli retrieval index") && joined.contains("--refresh full") - && joined.contains("codestory-cli doctor") - && joined.contains("--format markdown"), + && joined.contains("codestory-cli doctor"), "doctor should recommend retrieval repair before packet/search: {doctor:#}" ); assert!( @@ -1625,11 +1623,40 @@ fn assert_files_and_affected_read_existing_cache(workspace: &Path, cache_dir: &P "json", ], ); + assert!( + files["summary"]["file_count"] + .as_u64() + .is_some_and(|count| count >= 1), + "files JSON should keep whole-index file_count: {files:#}" + ); + assert_eq!( + files["summary"]["indexed_file_count"].as_u64(), + files["summary"]["file_count"].as_u64(), + "files JSON should keep indexed_file_count whole-index for the fully indexed fixture: {files:#}" + ); + assert!( + files["summary"]["filtered_file_count"] + .as_u64() + .is_some_and(|count| count >= 1), + "files JSON should include filtered_file_count: {files:#}" + ); + assert!( + files["summary"]["file_count"].as_u64() > files["summary"]["filtered_file_count"].as_u64(), + "role-filtered files JSON should keep whole-index file_count distinct from filtered_file_count: {files:#}" + ); + assert_eq!( + files["summary"]["visible_file_count"].as_u64(), + files["files"].as_array().map(|items| items.len() as u64), + "visible_file_count should match returned rows: {files:#}" + ); assert!( files["summary"]["language_counts"] .as_array() - .is_some_and(|items| !items.is_empty()), - "files JSON should include language counts: {files:#}" + .is_some_and(|items| items.iter().any(|item| item["language"] == "rust" + && item["support_mode"] == "parser_backed_graph" + && item["evidence_tier"] == "graph_fidelity" + && item["claim_label"] == "parser-backed graph, fidelity-gated")), + "files JSON should include language counts with support tiers: {files:#}" ); assert!( files["summary"]["framework_route_coverage"] @@ -1637,7 +1664,7 @@ fn assert_files_and_affected_read_existing_cache(workspace: &Path, cache_dir: &P .is_some_and( |items| items.iter().any(|item| item["framework"] == "express" && item["promotable"] == true - && item["fixture_status"].is_string() + && item["coverage_evidence"].is_string() && item["unsupported_patterns"].is_array()) && items.iter().any(|item| item["framework"] == "nextjs" && item["confidence_floor"] == "file_convention" @@ -1659,6 +1686,39 @@ fn assert_files_and_affected_read_existing_cache(workspace: &Path, cache_dir: &P "files --role test should list inferred test files: {files:#}" ); + let limited_files = run_cli_json( + workspace, + cache_dir, + &[ + "files", + "--language", + "rust", + "--limit", + "1", + "--refresh", + "none", + "--format", + "json", + ], + ); + assert!( + limited_files["summary"]["filtered_file_count"].as_u64() + > limited_files["summary"]["visible_file_count"].as_u64(), + "limited files JSON should report filtered rows before truncation and visible rows after truncation: {limited_files:#}" + ); + assert_eq!( + limited_files["summary"]["visible_file_count"].as_u64(), + limited_files["files"] + .as_array() + .map(|items| items.len() as u64), + "limited visible_file_count should match returned rows: {limited_files:#}" + ); + assert_eq!( + limited_files["summary"]["truncated"].as_bool(), + Some(true), + "limited files JSON should mark truncation: {limited_files:#}" + ); + let files_markdown = run_cli( workspace, cache_dir, @@ -1682,10 +1742,18 @@ fn assert_files_and_affected_read_existing_cache(workspace: &Path, cache_dir: &P let files_markdown = String::from_utf8_lossy(&files_markdown.stdout); assert!( files_markdown.contains("# indexed files") + && files_markdown.contains("whole index files:") + && files_markdown.contains("filtered files:") + && files_markdown.contains("visible rows:") + && files_markdown.contains("truncated:") && files_markdown.contains("languages:") + && files_markdown.contains("rust=") + && files_markdown.contains("[parser_backed_graph; graph_fidelity]") + && files_markdown.contains("language_support_claims:") + && files_markdown.contains("parser-backed graph, fidelity-gated") && files_markdown.contains("coverage:") && files_markdown.contains("framework route coverage:"), - "files markdown should summarize inventory and coverage:\n{files_markdown}" + "files markdown should summarize inventory, support tiers, and coverage:\n{files_markdown}" ); let affected = run_cli_json( diff --git a/crates/codestory-cli/tests/codestory_repo_e2e_stats.rs b/crates/codestory-cli/tests/codestory_repo_e2e_stats.rs index 6d6e63c7..961287e6 100644 --- a/crates/codestory-cli/tests/codestory_repo_e2e_stats.rs +++ b/crates/codestory-cli/tests/codestory_repo_e2e_stats.rs @@ -7,6 +7,12 @@ use std::process::Command; use std::time::Instant; use tempfile::tempdir; +// Repo-scale smoke guard. Phase-specific assertions below carry the product +// repeat-refresh contract; wall-clock process timing remains telemetry. +const REPEAT_FULL_REFRESH_SMOKE_SECONDS_BUDGET: f64 = 45.0; +const REPEAT_GRAPH_PHASE_SECONDS_BUDGET: f64 = 20.0; +const REPEAT_SEMANTIC_PHASE_SECONDS_BUDGET: f64 = 3.0; + #[derive(Debug, Serialize)] struct RepoE2eStats { project_root: String, @@ -20,23 +26,62 @@ struct RepoE2eStats { index_seconds: f64, graph_phase_seconds: f64, semantic_phase_seconds: f64, + semantic_embedding_ms: u64, + symbol_search_docs_written: u64, + semantic_dense_docs_skipped: u64, + semantic_dense_public_api: u64, + semantic_dense_entrypoint: u64, + semantic_dense_documented_nontrivial: u64, + semantic_dense_central_graph_node: u64, + semantic_dense_component_report: u64, + semantic_dense_unstructured_doc: u64, semantic_docs_reused: u64, semantic_docs_embedded: u64, semantic_docs_pending: u64, semantic_docs_stale: u64, + repeat_full_refresh_seconds: f64, + repeat_graph_phase_seconds: f64, + repeat_semantic_phase_seconds: f64, + repeat_semantic_doc_build_ms: u64, + repeat_semantic_embedding_ms: u64, + repeat_semantic_db_upsert_ms: u64, + repeat_semantic_reload_ms: u64, + repeat_semantic_prune_ms: u64, + repeat_cache_refresh_ms: u64, + repeat_search_projection_rebuild_ms: u64, + repeat_search_symbol_index_ms: u64, + repeat_runtime_cache_publish_ms: u64, + repeat_semantic_docs_reused: u64, + repeat_semantic_docs_embedded: u64, + repeat_semantic_docs_pending: u64, + repeat_semantic_docs_stale: u64, retrieval_index_seconds: f64, retrieval_status_seconds: f64, + sidecar_manifest: SidecarManifestStats, ground_seconds: f64, search_seconds: f64, symbol_seconds: f64, trail_seconds: f64, snippet_seconds: f64, + report_seconds: f64, index: IndexStats, ground: GroundStats, search: SearchStats, symbol: SymbolStats, trail: TrailStats, snippet: SnippetStats, + report: ReportStats, +} + +#[derive(Debug, Serialize)] +struct SidecarManifestStats { + symbol_doc_count: u64, + dense_projection_count: u64, + projection_count: u64, + semantic_policy_version: String, + graph_artifact_hash_present: bool, + dense_reason_counts_json: String, + dense_reason_count_total: u64, } #[derive(Debug, Serialize)] @@ -93,6 +138,15 @@ struct SnippetStats { snippet_lines: usize, } +#[derive(Debug, Serialize)] +struct ReportStats { + markdown_seconds: f64, + json_seconds: f64, + markdown_bytes: u64, + json_graph_nodes: usize, + json_graph_edges: usize, +} + const PROOF_TIER_STATS_ONLY: &str = "stats_only"; const PROOF_TIER_FULL_SIDECAR: &str = "full_sidecar"; const INDEX_SECONDS_WARNING_THRESHOLD: f64 = 600.0; @@ -215,6 +269,19 @@ fn run_cli_json( cache_dir: &Path, args: &[String], ) -> (f64, Value) { + let (seconds, stdout) = run_cli_output(binary, project_root, cache_dir, args); + ( + seconds, + serde_json::from_slice(&stdout).expect("parse json output"), + ) +} + +fn run_cli_output( + binary: &Path, + project_root: &Path, + cache_dir: &Path, + args: &[String], +) -> (f64, Vec) { let started = Instant::now(); let output = Command::new(binary) .current_dir(project_root) @@ -236,10 +303,7 @@ fn run_cli_json( String::from_utf8_lossy(&output.stdout), String::from_utf8_lossy(&output.stderr) ); - ( - seconds, - serde_json::from_slice(&output.stdout).expect("parse json output"), - ) + (seconds, output.stdout) } fn json_path<'a>(value: &'a Value, path: &[&str]) -> &'a Value { @@ -300,6 +364,17 @@ fn optional_u64_field(value: &Value, path: &[&str]) -> u64 { current.as_u64().unwrap_or(0) } +fn dense_reason_count_total(reason_counts_json: &str) -> u64 { + let value: Value = + serde_json::from_str(reason_counts_json).expect("dense reason counts should be json"); + value + .as_object() + .expect("dense reason counts should be a json object") + .values() + .map(|value| value.as_u64().expect("dense reason count should be u64")) + .sum() +} + fn bool_field(value: &Value, path: &[&str]) -> bool { let current = json_path(value, path); current @@ -403,6 +478,19 @@ fn codestory_repo_release_e2e_emits_stats() { let storage_path = PathBuf::from(string_field(&index_json, &["storage_path"])); let search_dir = search_dir_for_storage(storage_path.as_path()); + let (repeat_full_refresh_seconds, repeat_index_json) = run_cli_json( + &binary, + project_root.as_path(), + cache_dir.path(), + &[ + "index".to_string(), + "--refresh".to_string(), + "full".to_string(), + "--format".to_string(), + "json".to_string(), + ], + ); + let (retrieval_index_seconds, _retrieval_index_json) = run_cli_json( &binary, project_root.as_path(), @@ -515,6 +603,32 @@ fn codestory_repo_release_e2e_emits_stats() { ], ); + let (report_markdown_seconds, report_markdown_stdout) = run_cli_output( + &binary, + project_root.as_path(), + cache_dir.path(), + &[ + "report".to_string(), + "--limit".to_string(), + "8".to_string(), + "--format".to_string(), + "markdown".to_string(), + ], + ); + let (report_json_seconds, report_json) = run_cli_json( + &binary, + project_root.as_path(), + cache_dir.path(), + &[ + "report".to_string(), + "--limit".to_string(), + "8".to_string(), + "--format".to_string(), + "json".to_string(), + ], + ); + let report_seconds = report_markdown_seconds + report_json_seconds; + let search_dir_after = fs::metadata(&search_dir) .expect("search dir metadata after reads") .modified() @@ -526,15 +640,71 @@ fn codestory_repo_release_e2e_emits_stats() { + optional_u64_field(&index_json, &["phase_timings", "edge_resolution_ms"]) + optional_u64_field(&index_json, &["phase_timings", "error_flush_ms"]) + optional_u64_field(&index_json, &["phase_timings", "cleanup_ms"]); + let repeat_graph_phase_ms = + optional_u64_field(&repeat_index_json, &["phase_timings", "parse_index_ms"]) + + optional_u64_field( + &repeat_index_json, + &["phase_timings", "projection_flush_ms"], + ) + + optional_u64_field(&repeat_index_json, &["phase_timings", "edge_resolution_ms"]) + + optional_u64_field(&repeat_index_json, &["phase_timings", "error_flush_ms"]) + + optional_u64_field(&repeat_index_json, &["phase_timings", "cleanup_ms"]); let semantic_phase_ms = optional_u64_field(&index_json, &["phase_timings", "semantic_doc_build_ms"]) + optional_u64_field(&index_json, &["phase_timings", "semantic_embedding_ms"]) + optional_u64_field(&index_json, &["phase_timings", "semantic_db_upsert_ms"]) + optional_u64_field(&index_json, &["phase_timings", "semantic_reload_ms"]) + optional_u64_field(&index_json, &["phase_timings", "semantic_prune_ms"]); + let repeat_semantic_doc_build_ms = optional_u64_field( + &repeat_index_json, + &["phase_timings", "semantic_doc_build_ms"], + ); + let repeat_semantic_embedding_ms = optional_u64_field( + &repeat_index_json, + &["phase_timings", "semantic_embedding_ms"], + ); + let repeat_semantic_db_upsert_ms = optional_u64_field( + &repeat_index_json, + &["phase_timings", "semantic_db_upsert_ms"], + ); + let repeat_semantic_reload_ms = + optional_u64_field(&repeat_index_json, &["phase_timings", "semantic_reload_ms"]); + let repeat_semantic_prune_ms = + optional_u64_field(&repeat_index_json, &["phase_timings", "semantic_prune_ms"]); + let repeat_semantic_phase_ms = repeat_semantic_doc_build_ms + + repeat_semantic_embedding_ms + + repeat_semantic_db_upsert_ms + + repeat_semantic_reload_ms + + repeat_semantic_prune_ms; let semantic_phase_seconds = semantic_phase_ms as f64 / 1000.0; let search_sidecar_shadow_retrieval_mode = string_field(&search_json, &["retrieval_shadow", "retrieval_mode"]).to_string(); + let dense_reason_counts_json = string_field( + &retrieval_status_json, + &["manifest", "dense_reason_counts_json"], + ) + .to_string(); + let sidecar_manifest = SidecarManifestStats { + symbol_doc_count: u64_field(&retrieval_status_json, &["manifest", "symbol_doc_count"]), + dense_projection_count: u64_field( + &retrieval_status_json, + &["manifest", "dense_projection_count"], + ), + projection_count: u64_field(&retrieval_status_json, &["manifest", "projection_count"]), + semantic_policy_version: string_field( + &retrieval_status_json, + &["manifest", "semantic_policy_version"], + ) + .to_string(), + graph_artifact_hash_present: !string_field( + &retrieval_status_json, + &["manifest", "graph_artifact_hash"], + ) + .trim() + .is_empty(), + dense_reason_count_total: dense_reason_count_total(&dense_reason_counts_json), + dense_reason_counts_json, + }; let proof_tier = release_readiness_proof_tier( sidecar_retrieval_mode.as_str(), search_sidecar_shadow_retrieval_mode.as_str(), @@ -554,6 +724,42 @@ fn codestory_repo_release_e2e_emits_stats() { index_seconds, graph_phase_seconds: graph_phase_ms as f64 / 1000.0, semantic_phase_seconds, + semantic_embedding_ms: optional_u64_field( + &index_json, + &["phase_timings", "semantic_embedding_ms"], + ), + symbol_search_docs_written: optional_u64_field( + &index_json, + &["phase_timings", "symbol_search_docs_written"], + ), + semantic_dense_docs_skipped: optional_u64_field( + &index_json, + &["phase_timings", "semantic_dense_docs_skipped"], + ), + semantic_dense_public_api: optional_u64_field( + &index_json, + &["phase_timings", "semantic_dense_public_api"], + ), + semantic_dense_entrypoint: optional_u64_field( + &index_json, + &["phase_timings", "semantic_dense_entrypoint"], + ), + semantic_dense_documented_nontrivial: optional_u64_field( + &index_json, + &["phase_timings", "semantic_dense_documented_nontrivial"], + ), + semantic_dense_central_graph_node: optional_u64_field( + &index_json, + &["phase_timings", "semantic_dense_central_graph_node"], + ), + semantic_dense_component_report: optional_u64_field( + &index_json, + &["phase_timings", "semantic_dense_component_report"], + ), + semantic_dense_unstructured_doc: optional_u64_field( + &index_json, + &["phase_timings", "semantic_dense_unstructured_doc"], + ), semantic_docs_reused: optional_u64_field( &index_json, &["phase_timings", "semantic_docs_reused"], @@ -570,13 +776,55 @@ fn codestory_repo_release_e2e_emits_stats() { &index_json, &["phase_timings", "semantic_docs_stale"], ), + repeat_full_refresh_seconds, + repeat_graph_phase_seconds: repeat_graph_phase_ms as f64 / 1000.0, + repeat_semantic_phase_seconds: repeat_semantic_phase_ms as f64 / 1000.0, + repeat_semantic_doc_build_ms, + repeat_semantic_embedding_ms, + repeat_semantic_db_upsert_ms, + repeat_semantic_reload_ms, + repeat_semantic_prune_ms, + repeat_cache_refresh_ms: optional_u64_field( + &repeat_index_json, + &["phase_timings", "cache_refresh_ms"], + ), + repeat_search_projection_rebuild_ms: optional_u64_field( + &repeat_index_json, + &["phase_timings", "search_projection_rebuild_ms"], + ), + repeat_search_symbol_index_ms: optional_u64_field( + &repeat_index_json, + &["phase_timings", "search_symbol_index_ms"], + ), + repeat_runtime_cache_publish_ms: optional_u64_field( + &repeat_index_json, + &["phase_timings", "runtime_cache_publish_ms"], + ), + repeat_semantic_docs_reused: optional_u64_field( + &repeat_index_json, + &["phase_timings", "semantic_docs_reused"], + ), + repeat_semantic_docs_embedded: optional_u64_field( + &repeat_index_json, + &["phase_timings", "semantic_docs_embedded"], + ), + repeat_semantic_docs_pending: optional_u64_field( + &repeat_index_json, + &["phase_timings", "semantic_docs_pending"], + ), + repeat_semantic_docs_stale: optional_u64_field( + &repeat_index_json, + &["phase_timings", "semantic_docs_stale"], + ), retrieval_index_seconds, retrieval_status_seconds, + sidecar_manifest, ground_seconds, search_seconds, symbol_seconds, trail_seconds, snippet_seconds, + report_seconds, index: IndexStats { node_count: u64_field(&index_json, &["summary", "stats", "node_count"]), edge_count: u64_field(&index_json, &["summary", "stats", "edge_count"]), @@ -626,6 +874,13 @@ fn codestory_repo_release_e2e_emits_stats() { .lines() .count(), }, + report: ReportStats { + markdown_seconds: report_markdown_seconds, + json_seconds: report_json_seconds, + markdown_bytes: report_markdown_stdout.len() as u64, + json_graph_nodes: array_len(&report_json, &["graph", "nodes"]), + json_graph_edges: array_len(&report_json, &["graph", "edges"]), + }, }; println!( @@ -653,9 +908,64 @@ fn codestory_repo_release_e2e_emits_stats() { stats.search.sidecar_shadow_retrieval_mode, "full", "search should expose full sidecar retrieval shadow" ); + assert!( + stats.sidecar_manifest.symbol_doc_count > 0, + "full sidecar manifest should record graph-native symbol docs" + ); + assert!( + stats.sidecar_manifest.dense_projection_count > 0, + "CodeStory product run should select dense anchors" + ); + assert_eq!( + stats.sidecar_manifest.dense_projection_count, stats.sidecar_manifest.projection_count, + "legacy projection_count should mirror dense_projection_count under graph_first_v1" + ); + assert_eq!( + stats.sidecar_manifest.semantic_policy_version, "graph_first_v1", + "full sidecar manifest should record the active dense policy" + ); + assert!( + stats.sidecar_manifest.graph_artifact_hash_present, + "full sidecar manifest should record a graph artifact hash" + ); + assert_eq!( + stats.sidecar_manifest.dense_reason_count_total, + stats.sidecar_manifest.dense_projection_count, + "dense reason counts should account for every dense anchor" + ); + assert!( + stats.symbol_search_docs_written > 0, + "index should report graph-native symbol docs written" + ); + assert!( + stats.semantic_dense_docs_skipped > 0, + "AST-first policy should skip dense embeddings for recoverable code symbols" + ); + assert_eq!( + stats.repeat_semantic_docs_embedded, 0, + "repeat full refresh should embed zero unchanged dense docs" + ); + assert!( + stats.repeat_graph_phase_seconds < REPEAT_GRAPH_PHASE_SECONDS_BUDGET, + "repeat graph phase should stay under {:.0} seconds, got {:.2}s", + REPEAT_GRAPH_PHASE_SECONDS_BUDGET, + stats.repeat_graph_phase_seconds + ); + assert!( + stats.repeat_semantic_phase_seconds < REPEAT_SEMANTIC_PHASE_SECONDS_BUDGET, + "repeat semantic reuse phase should stay under {:.0} seconds, got {:.2}s", + REPEAT_SEMANTIC_PHASE_SECONDS_BUDGET, + stats.repeat_semantic_phase_seconds + ); + assert!( + stats.repeat_full_refresh_seconds < REPEAT_FULL_REFRESH_SMOKE_SECONDS_BUDGET, + "repeat full refresh process smoke cap should stay under {:.0} seconds, got {:.2}s", + REPEAT_FULL_REFRESH_SMOKE_SECONDS_BUDGET, + stats.repeat_full_refresh_seconds + ); assert!( stats.index.semantic_doc_count > 0, - "full repo index should populate semantic docs" + "full repo index should populate dense anchors" ); assert!( stats.semantic_docs_embedded > 0, diff --git a/crates/codestory-cli/tests/onboarding_contracts.rs b/crates/codestory-cli/tests/onboarding_contracts.rs index 2d45b6e8..436efc90 100644 --- a/crates/codestory-cli/tests/onboarding_contracts.rs +++ b/crates/codestory-cli/tests/onboarding_contracts.rs @@ -186,8 +186,11 @@ fn readme_keeps_customer_first_onboarding() { assert!(readme.contains(".agents/skills/codestory-grounding/SKILL.md")); assert!(readme.contains("docs/usage.md")); assert!(readme.contains("docs/concepts/how-codestory-works.md")); - assert!(readme.contains("docs/testing/benchmark-results.md")); - assert!(readme.contains("setup embeddings --project $TargetWorkspace --dry-run --format json")); + assert!(readme.contains("docs/architecture/language-support.md")); + assert!(readme.contains("docs/testing/benchmark-ledger.md")); + assert!(readme.contains( + r#""$CODESTORY_CLI" setup embeddings --project "$TARGET_WORKSPACE" --dry-run --format json"# + )); assert!(readme.contains("serve --stdio")); assert!(readme.contains("docs/architecture/overview.md")); assert!(readme.contains("docs/contributors/debugging.md")); @@ -203,6 +206,7 @@ fn readme_keeps_customer_first_onboarding() { "docs/concepts/how-codestory-works.md", "docs/architecture/overview.md", "docs/architecture/runtime-execution-path.md", + "docs/architecture/language-support.md", "docs/architecture/subsystems/contracts.md", "docs/architecture/subsystems/workspace.md", "docs/architecture/subsystems/indexer.md", @@ -212,7 +216,6 @@ fn readme_keeps_customer_first_onboarding() { "docs/contributors/getting-started.md", "docs/contributors/debugging.md", "docs/contributors/testing-matrix.md", - "docs/decision-log.md", ".agents/skills/codestory-grounding/scripts/setup.ps1", ".agents/skills/codestory-grounding/scripts/setup.sh", "scripts/codestory-agent-ab-benchmark.mjs", @@ -250,11 +253,15 @@ fn docs_drift_contracts_keep_living_sources_explicit() { let usage = fs::read_to_string(root.join("docs/usage.md")).expect("usage doc should exist"); let testing_matrix = fs::read_to_string(root.join("docs/contributors/testing-matrix.md")) .expect("testing matrix should exist"); - let benchmark_scorecard = fs::read_to_string(root.join("docs/testing/benchmark-results.md")) - .expect("benchmark scorecard should exist"); + let language_support = fs::read_to_string(root.join("docs/architecture/language-support.md")) + .expect("language support doc should exist"); + let benchmark_scorecard = fs::read_to_string(root.join("docs/testing/benchmark-ledger.md")) + .expect("benchmark ledger should exist"); assert!( - readme.contains("setup embeddings --project $TargetWorkspace --dry-run --format json"), + readme.contains( + r#""$CODESTORY_CLI" setup embeddings --project "$TARGET_WORKSPACE" --dry-run --format json"# + ), "README quickstart should show first-run semantic setup dry-run" ); assert!( @@ -279,9 +286,37 @@ fn docs_drift_contracts_keep_living_sources_explicit() { "testing matrix should not present an old hard-coded baseline as current" ); assert!( - benchmark_scorecard.contains("[benchmark ledger](benchmark-ledger.md)") + benchmark_scorecard.contains("## Current Scorecard") && benchmark_scorecard.contains("codestory-e2e-stats-log.md"), - "benchmark scorecard should link detailed history and living timing logs" + "benchmark ledger should keep the scorecard and living timing log references" + ); + for required in [ + "parser-backed graph", + "fidelity-gated", + "structural collector", + "candidate parser compatibility record", + "Go, Ruby, PHP, C#, Kotlin, Swift, Dart, Bash", + "Kotlin, Swift, Dart, Bash", + ] { + assert!( + language_support.contains(required), + "language support doc should preserve support-claim term `{required}`" + ); + } + for required in [ + "crates/codestory-contracts/src/language_support.rs", + "language_support_profile_for_ext", + "language_support_profile_for_language_name", + "get_language_for_ext", + ] { + assert!( + language_support.contains(required), + "language support docs should mention `{required}`" + ); + } + assert!( + testing_matrix.contains("../architecture/language-support.md"), + "testing matrix should link the language support claim contract" ); assert!( root.join("docs/testing/benchmark-ledger.md").exists(), @@ -309,7 +344,11 @@ fn usage_doc_keeps_agent_contract_terms_out_of_operator_flow() { assert!(usage.contains("Common Workflows")); assert!(usage.contains("I need a repo overview")); assert!(usage.contains("I need evidence for a broad question")); - assert!(usage.contains("The cache or retrieval looks stale")); + assert!(usage.contains("The cache or local navigation looks stale")); + assert!(usage.contains("For agent-facing packet/search recovery")); + assert!(usage.contains( + "codestory-cli retrieval index --project --refresh full --format json" + )); for blocked in [ "sufficiency.avoid_opening", "supported-claim wording", @@ -357,8 +396,8 @@ fn usage_doc_names_two_readiness_tracks_and_predictable_output_modes() { #[test] fn benchmark_docs_show_proof_tier_ladder() { let root = repo_root(); - let benchmark_scorecard = fs::read_to_string(root.join("docs/testing/benchmark-results.md")) - .expect("benchmark scorecard should exist"); + let benchmark_scorecard = fs::read_to_string(root.join("docs/testing/benchmark-ledger.md")) + .expect("benchmark ledger should exist"); assert!(benchmark_scorecard.contains("## Proof Tier Ladder")); for tier in [ @@ -369,7 +408,7 @@ fn benchmark_docs_show_proof_tier_ladder() { ] { assert!( benchmark_scorecard.contains(tier), - "benchmark scorecard should explain proof tier {tier}" + "benchmark ledger should explain proof tier {tier}" ); } assert!(benchmark_scorecard.contains("Full sidecar readiness, agent packet/search readiness")); diff --git a/crates/codestory-cli/tests/ready_command.rs b/crates/codestory-cli/tests/ready_command.rs new file mode 100644 index 00000000..a05a9c91 --- /dev/null +++ b/crates/codestory-cli/tests/ready_command.rs @@ -0,0 +1,121 @@ +use serde_json::Value; +use std::fs; +use std::path::Path; +use std::process::Command; +use tempfile::tempdir; + +#[test] +fn ready_command_emits_compact_verdicts_and_filters_goal() { + let workspace = tempdir().expect("workspace dir"); + let cache_dir = tempdir().expect("cache dir"); + write_tiny_rust_workspace(workspace.path()); + run_cli( + workspace.path(), + cache_dir.path(), + &["index", "--refresh", "full", "--format", "json"], + ); + + let json_text = run_cli( + workspace.path(), + cache_dir.path(), + &["ready", "--format", "json"], + ); + let json: Value = serde_json::from_str(&json_text).expect("ready json"); + let verdicts = json["verdicts"] + .as_array() + .expect("ready verdicts should be an array"); + assert_eq!(verdicts.len(), 2); + assert_eq!(verdicts[0]["goal"], "local_navigation"); + assert!( + verdicts[0]["minimum_next"][0] + .as_str() + .expect("minimum next command") + .contains("codestory-cli") + ); + + let command_text = json_text.replace("\\\\", "\\"); + assert!( + !command_text.contains("\\\\?\\") && !command_text.contains("//?/"), + "ready commands should use normalized human paths: {json_text}" + ); + + let local_json_text = run_cli( + workspace.path(), + cache_dir.path(), + &["ready", "--goal", "local", "--format", "json"], + ); + let local_json: Value = serde_json::from_str(&local_json_text).expect("ready local json"); + let local_verdicts = local_json["verdicts"] + .as_array() + .expect("local ready verdicts"); + assert_eq!(local_verdicts.len(), 1); + assert_eq!(local_verdicts[0]["goal"], "local_navigation"); + + let markdown = run_cli( + workspace.path(), + cache_dir.path(), + &["ready", "--goal", "agent", "--format", "markdown"], + ); + assert!(markdown.contains("# Readiness")); + assert!(markdown.contains("agent_packet_search")); + assert!(markdown.contains("minimum_next:")); + assert!(markdown.contains("full_repair:")); + assert!(markdown.contains("--refresh full")); + assert!(markdown.contains("codestory-cli retrieval bootstrap --project")); + assert!(markdown.contains("codestory-cli retrieval index --project")); + assert!(markdown.contains("codestory-cli retrieval status --project")); + assert!(markdown.contains("codestory-cli doctor --project")); + assert!(markdown.contains("--format markdown")); + assert!( + !markdown.contains("codestory-cli index --project"), + "fresh-index agent readiness should not recommend a full core reindex: {markdown}" + ); +} + +fn run_cli(workspace: &Path, cache_dir: &Path, args: &[&str]) -> String { + let output = Command::new(env!("CARGO_BIN_EXE_codestory-cli")) + .args(args) + .arg("--project") + .arg(workspace) + .arg("--cache-dir") + .arg(cache_dir) + .env("CODESTORY_EMBED_RUNTIME_MODE", "hash") + .output() + .expect("run codestory-cli"); + assert!( + output.status.success(), + "command failed: {args:?}\nstdout:\n{}\nstderr:\n{}", + String::from_utf8_lossy(&output.stdout), + String::from_utf8_lossy(&output.stderr) + ); + String::from_utf8(output.stdout).expect("stdout utf8") +} + +fn write_tiny_rust_workspace(root: &Path) { + fs::write( + root.join("Cargo.toml"), + r#"[package] +name = "ready-command-fixture" +version = "0.1.0" +edition = "2024" + +[lib] +path = "src/lib.rs" +"#, + ) + .expect("write Cargo.toml"); + let src = root.join("src"); + fs::create_dir_all(&src).expect("create src"); + fs::write( + src.join("lib.rs"), + r#"pub fn entry_point() -> String { + helper("ready") +} + +fn helper(value: &str) -> String { + format!("ready:{value}") +} +"#, + ) + .expect("write lib.rs"); +} diff --git a/crates/codestory-cli/tests/report_export.rs b/crates/codestory-cli/tests/report_export.rs index b6de4057..e87220be 100644 --- a/crates/codestory-cli/tests/report_export.rs +++ b/crates/codestory-cli/tests/report_export.rs @@ -1,4 +1,8 @@ +use serde_json::Value; +use std::fs; +use std::path::Path; use std::process::Command; +use tempfile::tempdir; #[test] fn report_command_help_names_markdown_and_json_exports() { @@ -14,4 +18,103 @@ fn report_command_help_names_markdown_and_json_exports() { assert!(stdout.contains("--format ")); assert!(stdout.contains("--output-file ")); assert!(stdout.contains("--limit ")); + assert!(stdout.contains("--profile ")); +} + +#[test] +fn report_handoff_profile_renders_handoff_header_and_json_metadata() { + let workspace = tempdir().expect("workspace dir"); + let cache_dir = tempdir().expect("cache dir"); + write_tiny_rust_workspace(workspace.path()); + run_cli( + workspace.path(), + cache_dir.path(), + &["index", "--refresh", "full", "--format", "json"], + ); + + let markdown = run_cli( + workspace.path(), + cache_dir.path(), + &[ + "report", + "--profile", + "handoff", + "--limit", + "3", + "--format", + "markdown", + ], + ); + assert!(markdown.contains("## Read This First / Agent Handoff")); + assert!(markdown.contains("readiness agent_packet_search")); + assert!(markdown.contains("## Suggested Follow-up Queries")); + assert!( + !markdown.contains("## Repo Summary"), + "handoff profile should trim default report sections:\n{markdown}" + ); + + let json_text = run_cli( + workspace.path(), + cache_dir.path(), + &["report", "--limit", "3", "--format", "json"], + ); + let json: Value = serde_json::from_str(&json_text).expect("report json"); + assert!( + json.pointer("/metadata/handoff/readiness/0").is_some(), + "report json should include metadata.handoff.readiness: {json}" + ); + assert!( + json.pointer("/metadata/handoff/next_command") + .and_then(Value::as_str) + .is_some_and(|command| command.contains("codestory-cli")), + "report json should include a handoff next command: {json}" + ); +} + +fn run_cli(workspace: &Path, cache_dir: &Path, args: &[&str]) -> String { + let output = Command::new(env!("CARGO_BIN_EXE_codestory-cli")) + .args(args) + .arg("--project") + .arg(workspace) + .arg("--cache-dir") + .arg(cache_dir) + .env("CODESTORY_EMBED_RUNTIME_MODE", "hash") + .output() + .expect("run codestory-cli"); + assert!( + output.status.success(), + "command failed: {args:?}\nstdout:\n{}\nstderr:\n{}", + String::from_utf8_lossy(&output.stdout), + String::from_utf8_lossy(&output.stderr) + ); + String::from_utf8(output.stdout).expect("stdout utf8") +} + +fn write_tiny_rust_workspace(root: &Path) { + fs::write( + root.join("Cargo.toml"), + r#"[package] +name = "report-handoff-fixture" +version = "0.1.0" +edition = "2024" + +[lib] +path = "src/lib.rs" +"#, + ) + .expect("write Cargo.toml"); + let src = root.join("src"); + fs::create_dir_all(&src).expect("create src"); + fs::write( + src.join("lib.rs"), + r#"pub fn entry_point() -> String { + helper("report") +} + +fn helper(value: &str) -> String { + format!("handoff:{value}") +} +"#, + ) + .expect("write lib.rs"); } diff --git a/crates/codestory-cli/tests/search_json_output.rs b/crates/codestory-cli/tests/search_json_output.rs index 6db822e9..e8448f1f 100644 --- a/crates/codestory-cli/tests/search_json_output.rs +++ b/crates/codestory-cli/tests/search_json_output.rs @@ -285,8 +285,9 @@ fn search_json_fails_closed_without_full_sidecars() { "search should report mandatory sidecar full-mode boundary: {stderr}" ); assert!( - stderr.contains("Next commands:") - && stderr.contains("codestory-cli index") + stderr.contains("Minimum next:") + && stderr.contains("Full repair:") + && stderr.contains("codestory-cli retrieval index") && stderr.contains("--refresh full") && stderr.contains("codestory-cli retrieval bootstrap") && stderr.contains("codestory-cli doctor"), @@ -1718,6 +1719,22 @@ fn search_quality_eval_reports_recall_mrr_and_latency_for_symbols_and_routes() { "index command failed: {}", String::from_utf8_lossy(&index.stderr) ); + let retrieval_index = run_cli( + workspace.path(), + &[ + "retrieval", + "index", + "--refresh", + "full", + "--format", + "json", + ], + ); + assert!( + retrieval_index.status.success(), + "retrieval index command failed: {}", + String::from_utf8_lossy(&retrieval_index.stderr) + ); let expectations = [ ("exact_symbol_anchor", "exact_symbol_anchor", "off"), diff --git a/crates/codestory-cli/tests/stdio_protocol_contracts.rs b/crates/codestory-cli/tests/stdio_protocol_contracts.rs index 50a128f7..a1904b89 100644 --- a/crates/codestory-cli/tests/stdio_protocol_contracts.rs +++ b/crates/codestory-cli/tests/stdio_protocol_contracts.rs @@ -1026,7 +1026,13 @@ fn tool_catalog_exposes_output_schemas_for_stable_dto_backed_tools() { "string", "packet outputSchema should expose a stable packet id: {tool}" ); - for field in ["plan", "answer", "budget", "sufficiency", "benchmark_trace"] { + for field in [ + "plan", + "answer", + "budget", + "sufficiency", + "retrieval_trace_summary", + ] { assert!( required_fields(output_schema).contains(field), "packet outputSchema should require {field}: {tool}" @@ -1359,12 +1365,35 @@ fn resources_read_status_reports_browser_readiness_and_next_calls() { "status should include semantic readiness/doc count/fallback information: {status}" ); let next_call_text = status["recommended_next_calls"].to_string(); + let readiness = status["readiness"] + .as_array() + .unwrap_or_else(|| panic!("status should include readiness verdicts: {status}")); + assert!( + readiness + .iter() + .any(|verdict| verdict["goal"] == "agent_packet_search" + && verdict["minimum_next"] + .as_array() + .is_some_and(|commands| !commands.is_empty()) + && verdict["full_repair"] + .as_array() + .is_some_and(|commands| !commands.is_empty())), + "status should expose agent readiness with minimum_next/full_repair: {status}" + ); + assert!( + !next_call_text.contains("\"tool\":\"packet\"") + && !next_call_text.contains("\"tool\":\"search\""), + "status should recommend repair, not packet/search calls, when mode is not full: {status}" + ); assert!( - next_call_text - .find("retrieval status") - .unwrap_or(usize::MAX) - < next_call_text.find("search").unwrap_or(usize::MAX), - "status should recommend sidecar status/index repair before search when mode is not full: {status}" + !next_call_text.contains("codestory-cli index --project") + && next_call_text.contains("--refresh full") + && next_call_text.contains("retrieval bootstrap") + && next_call_text.contains("retrieval index") + && next_call_text.contains("retrieval status") + && next_call_text.contains("codestory://status") + && next_call_text.contains("codestory://agent-guide"), + "status should recommend sidecar repair without repeating a fresh core index when mode is not full: {status}" ); assert!( status @@ -1432,6 +1461,18 @@ fn resources_read_status_reports_stale_index_freshness_with_bounded_latency() { } assert_stale_freshness_counts(&last_status, "codestory://status"); + let status_next_call_text = last_status["recommended_next_calls"].to_string(); + assert!( + !status_next_call_text.contains("\"tool\":\"packet\"") + && !status_next_call_text.contains("\"tool\":\"search\""), + "stale index readiness should recommend repair, not packet/search calls: {last_status}" + ); + assert!( + status_next_call_text.contains("codestory-cli index --project") + && status_next_call_text.contains("--refresh incremental") + && status_next_call_text.contains("codestory://status"), + "stale index readiness should recommend index repair and a status recheck: {last_status}" + ); elapsed.sort_unstable(); let median = elapsed[elapsed.len() / 2]; let p95 = elapsed[(elapsed.len() * 95).div_ceil(100) - 1]; @@ -1665,12 +1706,22 @@ fn search_tool_fails_closed_without_full_retrieval_sidecars() { .as_array() .unwrap_or_else(|| panic!("stdio search error should include next_commands: {response}")); assert!( - next_commands - .iter() - .any(|command| command.as_str().is_some_and(|text| text - .contains("codestory-cli index") - && text.contains("--refresh full"))), - "stdio search error should include index repair command: {response}" + details["minimum_next"] + .as_array() + .is_some_and(|commands| !commands.is_empty()), + "stdio search error should include minimum_next: {response}" + ); + assert!( + details["full_repair"] + .as_array() + .is_some_and(|commands| commands.len() >= next_commands.len()), + "stdio search error should include full_repair: {response}" + ); + assert!( + next_commands.iter().all(|command| command + .as_str() + .is_some_and(|text| !text.contains("codestory-cli index"))), + "stdio search sidecar errors should not repeat core index repair commands: {response}" ); assert!( next_commands.iter().any(|command| command @@ -1678,6 +1729,13 @@ fn search_tool_fails_closed_without_full_retrieval_sidecars() { .is_some_and(|text| text.contains("codestory-cli retrieval bootstrap"))), "stdio search error should include sidecar bootstrap repair command: {response}" ); + assert!( + next_commands.iter().any(|command| command + .as_str() + .is_some_and(|text| text.contains("codestory-cli retrieval status") + && text.contains("--format json"))), + "stdio search error should include sidecar status proof command: {response}" + ); } #[test] @@ -1854,10 +1912,10 @@ fn packet_tool_returns_budgeted_sufficiency_contract() { ); assert!( packet - .pointer("/benchmark_trace/source_read_steps") + .pointer("/retrieval_trace_summary/source_read_steps") .and_then(Value::as_u64) .is_some(), - "stdio packet should include benchmark trace counters: {packet}" + "stdio packet should include retrieval trace summary counters: {packet}" ); let repeated_response = send_json( diff --git a/crates/codestory-cli/tests/stdio_warm_loop_stats.rs b/crates/codestory-cli/tests/stdio_warm_loop_stats.rs index eaa620fa..8461be4a 100644 --- a/crates/codestory-cli/tests/stdio_warm_loop_stats.rs +++ b/crates/codestory-cli/tests/stdio_warm_loop_stats.rs @@ -110,6 +110,7 @@ struct StdioWarmLoopStats { warm_stdio_total_ms: f64, warm_stdio_per_loop_ms: f64, warm_vs_cold_per_loop_ratio: f64, + sidecar_status: ToolLatencyStats, warm_stdio: Vec, state: StateStats, transcript: Vec, @@ -456,29 +457,47 @@ fn warm_tool_stats(samples: &[OperationSample]) -> Vec { } grouped .into_iter() - .map(|(operation, samples)| { - let latencies = samples - .iter() - .map(|sample| sample.elapsed_ms) - .collect::>(); - let bytes = samples - .iter() - .map(|sample| sample.response_bytes) - .collect::>(); - ToolLatencyStats { - operation: operation.to_string(), - samples: samples.len(), - p50_ms: percentile(&latencies, 0.50), - p95_ms: percentile(&latencies, 0.95), - p99_ms: percentile(&latencies, 0.99), - max_ms: percentile(&latencies, 1.0), - response_bytes_p50: percentile_u64(&bytes, 0.50), - response_bytes_max: percentile_u64(&bytes, 1.0), - } - }) + .map(|(operation, samples)| tool_latency_stats(operation, samples.into_iter())) .collect() } +fn operation_stats(samples: &[OperationSample], operation: &str) -> ToolLatencyStats { + let filtered = samples + .iter() + .filter(|sample| sample.operation == operation) + .collect::>(); + tool_latency_stats(operation, filtered.into_iter()) +} + +fn tool_latency_stats<'a>( + operation: &str, + samples: impl Iterator, +) -> ToolLatencyStats { + let samples = samples.collect::>(); + assert!( + !samples.is_empty(), + "missing operation samples for {operation}" + ); + let latencies = samples + .iter() + .map(|sample| sample.elapsed_ms) + .collect::>(); + let bytes = samples + .iter() + .map(|sample| sample.response_bytes) + .collect::>(); + ToolLatencyStats { + operation: operation.to_string(), + samples: samples.len(), + p50_ms: percentile(&latencies, 0.50), + p95_ms: percentile(&latencies, 0.95), + p99_ms: percentile(&latencies, 0.99), + max_ms: percentile(&latencies, 1.0), + response_bytes_p50: percentile_u64(&bytes, 0.50), + response_bytes_max: percentile_u64(&bytes, 1.0), + } +} + #[test] #[ignore = "warm-loop stats harness; run with cargo test -p codestory-cli --test stdio_warm_loop_stats -- --ignored --nocapture after cargo build --release -p codestory-cli"] fn warm_stdio_agent_loop_emits_stats_without_protocol_pollution() { @@ -747,6 +766,7 @@ fn warm_stdio_agent_loop_emits_stats_without_protocol_pollution() { warm_stdio_total_ms, warm_stdio_per_loop_ms, warm_vs_cold_per_loop_ratio, + sidecar_status: operation_stats(&transcript, "resources/read:status"), warm_stdio: warm_tool_stats(&transcript), state: StateStats { warm_search_dir_unchanged, diff --git a/crates/codestory-contracts/Cargo.toml b/crates/codestory-contracts/Cargo.toml index c40447e5..a1dd82d8 100644 --- a/crates/codestory-contracts/Cargo.toml +++ b/crates/codestory-contracts/Cargo.toml @@ -1,6 +1,6 @@ [package] name = "codestory-contracts" -version = "0.6.2" +version = "0.7.0" edition = "2024" [dependencies] diff --git a/crates/codestory-contracts/src/api.rs b/crates/codestory-contracts/src/api.rs index 7cbb3644..3309ad52 100644 --- a/crates/codestory-contracts/src/api.rs +++ b/crates/codestory-contracts/src/api.rs @@ -25,10 +25,12 @@ pub use dto::{ IndexedFileRoleDto, IndexedFilesDto, IndexedFilesRequest, IndexedFilesSummaryDto, ListChildrenSymbolsRequest, ListRootSymbolsRequest, NodeDetailsDto, NodeDetailsRequest, NodeOccurrencesRequest, OpenContainingFolderRequest, OpenDefinitionRequest, OpenProjectRequest, - PacketBenchmarkTraceDto, PacketBudgetDto, PacketBudgetLimitsDto, PacketBudgetModeDto, - PacketBudgetUsageDto, PacketClaimDto, PacketPlanDto, PacketPlanQueryDto, PacketSufficiencyDto, - PacketSufficiencyStatusDto, PacketTaskClassDto, ProjectSummary, ReadFileTextRequest, - ReadFileTextResponse, RepoTextScanStatsDto, RetrievalCandidateResolutionCountDto, + PacketBudgetDto, PacketBudgetLimitsDto, PacketBudgetModeDto, PacketBudgetUsageDto, + PacketClaimDto, PacketPlanDto, PacketPlanQueryDto, PacketRetrievalTraceSummaryDto, + PacketSidecarQueryDiagnosticDto, PacketSufficiencyDto, PacketSufficiencyStatusDto, + PacketTaskClassDto, ProjectSummary, ReadFileTextRequest, ReadFileTextResponse, + ReadinessGoalDto, ReadinessIndexSnapshotDto, ReadinessSidecarSnapshotDto, ReadinessStatusDto, + ReadinessVerdictDto, RepoTextScanStatsDto, RetrievalCandidateResolutionCountDto, RetrievalCandidateSummaryDto, RetrievalFallbackReasonDto, RetrievalModeDto, RetrievalScoreBreakdownDto, RetrievalShadowDto, RetrievalStageTimingDto, RetrievalStateDto, RouteEndpointHandlerDto, RouteEndpointKindDto, RouteEndpointMetadataDto, SearchHit, diff --git a/crates/codestory-contracts/src/api/dto.rs b/crates/codestory-contracts/src/api/dto.rs index 3d5b77cd..70c75e2e 100644 --- a/crates/codestory-contracts/src/api/dto.rs +++ b/crates/codestory-contracts/src/api/dto.rs @@ -27,6 +27,8 @@ pub struct StorageStatsDto { pub edge_count: u32, pub file_count: u32, pub error_count: u32, + #[serde(default)] + pub fatal_error_count: u32, } #[derive(Debug, Clone, Serialize, Deserialize, Type)] @@ -190,6 +192,10 @@ pub struct StoredSemanticDocsContractDto { pub mixed_doc_shapes: bool, #[serde(default, skip_serializing_if = "Option::is_none")] pub doc_shape: Option, + #[serde(default, skip_serializing_if = "Option::is_none")] + pub semantic_policy_version: Option, + #[serde(default)] + pub mixed_semantic_policy_versions: bool, } #[derive(Debug, Clone, Copy, Serialize, Deserialize, Type, PartialEq, Eq)] @@ -229,6 +235,63 @@ pub struct IndexFreshnessDto { pub samples: Vec, } +#[derive(Debug, Clone, Copy, Serialize, Deserialize, Type, PartialEq, Eq)] +#[serde(rename_all = "snake_case")] +pub enum ReadinessGoalDto { + LocalNavigation, + AgentPacketSearch, +} + +#[derive(Debug, Clone, Copy, Serialize, Deserialize, Type, PartialEq, Eq)] +#[serde(rename_all = "snake_case")] +pub enum ReadinessStatusDto { + Ready, + RepairIndex, + CheckIndex, + RepairRetrieval, +} + +#[derive(Debug, Clone, Serialize, Deserialize, Type, PartialEq, Eq)] +pub struct ReadinessIndexSnapshotDto { + #[serde(default, skip_serializing_if = "Option::is_none")] + pub status: Option, + #[serde(default)] + pub error_count: u32, + #[serde(default)] + pub fatal_error_count: u32, + pub changed_file_count: u32, + pub new_file_count: u32, + pub removed_file_count: u32, + pub checked_file_count: u32, + pub indexed_file_count: u32, +} + +#[derive(Debug, Clone, Serialize, Deserialize, Type, PartialEq, Eq)] +pub struct ReadinessSidecarSnapshotDto { + pub retrieval_mode: String, + #[serde(default, skip_serializing_if = "Option::is_none")] + pub degraded_reason: Option, + #[serde(default, skip_serializing_if = "Option::is_none")] + pub manifest_generation: Option, + #[serde(default, skip_serializing_if = "Option::is_none")] + pub manifest_input_hash: Option, +} + +#[derive(Debug, Clone, Serialize, Deserialize, Type, PartialEq, Eq)] +pub struct ReadinessVerdictDto { + pub goal: ReadinessGoalDto, + pub status: ReadinessStatusDto, + pub summary: String, + #[serde(default, skip_serializing_if = "Vec::is_empty")] + pub minimum_next: Vec, + #[serde(default, skip_serializing_if = "Vec::is_empty")] + pub full_repair: Vec, + #[serde(default, skip_serializing_if = "Option::is_none")] + pub index: Option, + #[serde(default, skip_serializing_if = "Option::is_none")] + pub sidecar: Option, +} + #[derive(Debug, Clone, Serialize, Deserialize, Type)] pub struct SearchHit { pub node_id: NodeId, @@ -504,12 +567,19 @@ pub struct IndexedFileDto { pub struct IndexedFileLanguageCountDto { pub language: String, pub file_count: u32, + pub support_mode: String, + pub evidence_tier: String, + pub claim_label: String, } #[derive(Debug, Clone, Serialize, Deserialize, Type)] pub struct IndexedFilesSummaryDto { pub file_count: u32, pub indexed_file_count: u32, + #[serde(default)] + pub filtered_file_count: u32, + #[serde(default)] + pub visible_file_count: u32, pub incomplete_file_count: u32, pub error_file_count: u32, pub truncated: bool, @@ -533,7 +603,8 @@ pub struct FrameworkRouteCoverageDto { pub framework: String, pub language: String, pub status: String, - pub fixture_status: String, + #[serde(alias = "fixture_status")] + pub coverage_evidence: String, pub confidence_floor: String, pub handler_link_support: String, #[serde(default)] @@ -1305,6 +1376,8 @@ pub struct RetrievalScoreBreakdownDto { pub semantic: f32, pub graph: f32, pub total: f32, + #[serde(default, skip_serializing_if = "Vec::is_empty")] + pub provenance: Vec, } #[derive(Debug, Clone, Serialize, Deserialize, Type)] @@ -1507,6 +1580,17 @@ pub struct RetrievalShadowDto { pub candidate_resolution_counts: Vec, } +#[derive(Debug, Clone, Serialize, Deserialize, Type, PartialEq, Eq)] +pub struct PacketSidecarQueryDiagnosticDto { + pub query: String, + pub retrieval_mode: String, + pub candidate_count: u32, + pub resolved_hit_count: u32, + pub unresolved_candidate_count: u32, + #[serde(default, skip_serializing_if = "Option::is_none")] + pub diagnostic: Option, +} + #[derive(Debug, Clone, Serialize, Deserialize, Type)] pub struct AgentRetrievalTraceDto { pub request_id: String, @@ -1524,6 +1608,8 @@ pub struct AgentRetrievalTraceDto { #[serde(default)] pub annotations: Vec, pub steps: Vec, + #[serde(default, skip_serializing_if = "Vec::is_empty")] + pub packet_sidecar_diagnostics: Vec, #[serde(default, skip_serializing_if = "Option::is_none")] pub retrieval_shadow: Option, } @@ -1674,6 +1760,8 @@ pub struct AgentPacketRequestDto { pub budget: PacketBudgetModeDto, #[serde(default, skip_serializing_if = "Option::is_none")] pub task_class: Option, + #[serde(default, skip_serializing_if = "Vec::is_empty")] + pub extra_probes: Vec, #[serde(default = "default_include_evidence")] pub include_evidence: bool, #[serde(default, skip_serializing_if = "Option::is_none")] @@ -1714,6 +1802,7 @@ pub struct PacketBudgetDto { pub enum PacketSufficiencyStatusDto { Sufficient, Partial, + #[serde(rename = "blocked", alias = "insufficient")] Insufficient, } @@ -1734,13 +1823,15 @@ pub struct PacketSufficiencyDto { #[serde(default)] pub avoid_opening: Vec, #[serde(default)] + pub avoid_opening_paths: Vec, + #[serde(default)] pub gaps: Vec, #[serde(default)] pub follow_up_commands: Vec, } #[derive(Debug, Clone, Serialize, Deserialize, Type)] -pub struct PacketBenchmarkTraceDto { +pub struct PacketRetrievalTraceSummaryDto { pub retrieval_trace: AgentRetrievalTraceDto, pub source_read_steps: u32, pub search_steps: u32, @@ -1757,7 +1848,8 @@ pub struct AgentPacketDto { pub answer: AgentAnswerDto, pub budget: PacketBudgetDto, pub sufficiency: PacketSufficiencyDto, - pub benchmark_trace: PacketBenchmarkTraceDto, + #[serde(alias = "benchmark_trace")] + pub retrieval_trace_summary: PacketRetrievalTraceSummaryDto, } #[derive(Debug, Clone, Serialize, Deserialize, Type)] @@ -1858,6 +1950,7 @@ mod packet_tests { semantic_fallbacks: Vec::new(), annotations: Vec::new(), steps: Vec::new(), + packet_sidecar_diagnostics: Vec::new(), retrieval_shadow: Some(RetrievalShadowDto { retrieval_mode: "unavailable".to_string(), degraded_reason: Some("sidecar_unavailable".to_string()), @@ -1887,19 +1980,93 @@ mod packet_tests { ); } + #[test] + fn framework_route_coverage_uses_product_evidence_field_with_legacy_alias() { + let coverage = FrameworkRouteCoverageDto { + framework: "express".to_string(), + language: "javascript/typescript".to_string(), + status: "partial".to_string(), + coverage_evidence: "validated_by_indexer_regression".to_string(), + confidence_floor: "heuristic".to_string(), + handler_link_support: "probable_when_handler_name_resolves".to_string(), + unsupported_patterns: vec!["router composition is partial".to_string()], + known_gaps: vec!["mounted prefixes are not globally propagated".to_string()], + promotable: true, + }; + + let value = serde_json::to_value(&coverage).expect("serialize"); + assert_eq!( + value["coverage_evidence"], + "validated_by_indexer_regression" + ); + assert!( + value.get("fixture_status").is_none(), + "product JSON should use coverage_evidence, not fixture_status" + ); + + let legacy: FrameworkRouteCoverageDto = serde_json::from_str( + r#"{ + "framework":"express", + "language":"javascript/typescript", + "status":"partial", + "fixture_status":"covered_by_indexer_unit_fixture", + "confidence_floor":"heuristic", + "handler_link_support":"probable_when_handler_name_resolves", + "unsupported_patterns":[], + "known_gaps":[], + "promotable":true + }"#, + ) + .expect("deserialize legacy field spelling"); + assert_eq!(legacy.coverage_evidence, "covered_by_indexer_unit_fixture"); + } + #[test] fn packet_sufficiency_serializes_status_as_snake_case() { - let value = serde_json::to_value(PacketSufficiencyDto { + let partial = serde_json::to_value(PacketSufficiencyDto { status: PacketSufficiencyStatusDto::Partial, covered_claims: Vec::new(), open_next: vec!["codestory-cli search --query runtime".to_string()], avoid_opening: Vec::new(), + avoid_opening_paths: Vec::new(), gaps: vec!["No focused symbol selected.".to_string()], follow_up_commands: Vec::new(), }) .expect("serialize"); - assert_eq!(value["status"], "partial"); + assert_eq!(partial["status"], "partial"); + + let blocked = serde_json::to_value(PacketSufficiencyDto { + status: PacketSufficiencyStatusDto::Insufficient, + covered_claims: Vec::new(), + open_next: Vec::new(), + avoid_opening: Vec::new(), + avoid_opening_paths: vec!["crates/codestory-cli/src/main.rs".to_string()], + gaps: vec!["Sidecar readiness is not full.".to_string()], + follow_up_commands: Vec::new(), + }) + .expect("serialize"); + + assert_eq!(blocked["status"], "blocked"); + assert_eq!( + blocked["avoid_opening_paths"], + serde_json::json!(["crates/codestory-cli/src/main.rs"]) + ); + let legacy: PacketSufficiencyDto = serde_json::from_str( + r#"{ + "status": "partial", + "covered_claims": [], + "open_next": [], + "avoid_opening": ["crates/codestory-cli/src/main.rs because cited"], + "gaps": [], + "follow_up_commands": [] + }"#, + ) + .expect("deserialize legacy sufficiency without raw paths"); + assert!(legacy.avoid_opening_paths.is_empty()); + let legacy: PacketSufficiencyStatusDto = + serde_json::from_str("\"insufficient\"").expect("deserialize legacy status"); + assert_eq!(legacy, PacketSufficiencyStatusDto::Insufficient); } #[test] diff --git a/crates/codestory-contracts/src/api/errors.rs b/crates/codestory-contracts/src/api/errors.rs index 1cc34baf..dc9ef94f 100644 --- a/crates/codestory-contracts/src/api/errors.rs +++ b/crates/codestory-contracts/src/api/errors.rs @@ -1,12 +1,14 @@ use serde::{Deserialize, Serialize}; use specta::Type; +use super::dto::ReadinessVerdictDto; + #[derive(Debug, Clone, Serialize, Deserialize, Type)] pub struct ApiError { pub code: String, pub message: String, #[serde(skip_serializing_if = "Option::is_none")] - pub details: Option, + pub details: Option>, } #[derive(Debug, Clone, Serialize, Deserialize, Type)] @@ -17,15 +19,39 @@ pub struct ApiErrorDetails { pub project: Option, #[serde(default, skip_serializing_if = "Vec::is_empty")] pub next_commands: Vec, + #[serde(default, skip_serializing_if = "Vec::is_empty")] + pub minimum_next: Vec, + #[serde(default, skip_serializing_if = "Vec::is_empty")] + pub full_repair: Vec, + #[serde(default, skip_serializing_if = "Option::is_none")] + pub readiness: Option, } impl ApiErrorDetails { pub fn retrieval_unavailable(project: impl Into, next_commands: Vec) -> Self { + let minimum_next = next_commands.iter().take(2).cloned().collect::>(); Self { failed_layer: Some("retrieval_sidecar".to_string()), project: Some(project.into()), + minimum_next, + full_repair: next_commands.clone(), next_commands, + readiness: None, + } + } + + pub fn with_readiness(mut self, readiness: ReadinessVerdictDto) -> Self { + if self.minimum_next.is_empty() { + self.minimum_next = readiness.minimum_next.clone(); + } + if self.full_repair.is_empty() { + self.full_repair = readiness.full_repair.clone(); } + if self.next_commands.is_empty() { + self.next_commands = self.full_repair.clone(); + } + self.readiness = Some(readiness); + self } } @@ -46,7 +72,7 @@ impl ApiError { Self { code: code.into(), message: message.into(), - details: Some(details), + details: Some(Box::new(details)), } } @@ -85,9 +111,12 @@ mod tests { "sidecar retrieval primary is unavailable or degraded", "C:/repo/example", vec![ - "codestory-cli index --project \"C:/repo/example\" --refresh full".to_string(), "codestory-cli retrieval bootstrap --project \"C:/repo/example\" --format json" .to_string(), + "codestory-cli retrieval index --project \"C:/repo/example\" --refresh full --format json" + .to_string(), + "codestory-cli retrieval status --project \"C:/repo/example\" --format json" + .to_string(), ], ); @@ -98,7 +127,15 @@ mod tests { assert_eq!(value["details"]["project"], "C:/repo/example"); assert_eq!( value["details"]["next_commands"][0], - "codestory-cli index --project \"C:/repo/example\" --refresh full" + "codestory-cli retrieval bootstrap --project \"C:/repo/example\" --format json" + ); + assert_eq!( + value["details"]["minimum_next"][0], + "codestory-cli retrieval bootstrap --project \"C:/repo/example\" --format json" + ); + assert_eq!( + value["details"]["full_repair"][1], + "codestory-cli retrieval index --project \"C:/repo/example\" --refresh full --format json" ); } } diff --git a/crates/codestory-contracts/src/api/events.rs b/crates/codestory-contracts/src/api/events.rs index 090390cb..6669c38f 100644 --- a/crates/codestory-contracts/src/api/events.rs +++ b/crates/codestory-contracts/src/api/events.rs @@ -34,6 +34,22 @@ pub struct IndexingPhaseTimings { #[serde(default, skip_serializing_if = "Option::is_none")] pub semantic_docs_stale: Option, #[serde(default, skip_serializing_if = "Option::is_none")] + pub symbol_search_docs_written: Option, + #[serde(default, skip_serializing_if = "Option::is_none")] + pub semantic_dense_docs_skipped: Option, + #[serde(default, skip_serializing_if = "Option::is_none")] + pub semantic_dense_public_api: Option, + #[serde(default, skip_serializing_if = "Option::is_none")] + pub semantic_dense_entrypoint: Option, + #[serde(default, skip_serializing_if = "Option::is_none")] + pub semantic_dense_documented_nontrivial: Option, + #[serde(default, skip_serializing_if = "Option::is_none")] + pub semantic_dense_central_graph_node: Option, + #[serde(default, skip_serializing_if = "Option::is_none")] + pub semantic_dense_component_report: Option, + #[serde(default, skip_serializing_if = "Option::is_none")] + pub semantic_dense_unstructured_doc: Option, + #[serde(default, skip_serializing_if = "Option::is_none")] pub deferred_indexes_ms: Option, #[serde(default, skip_serializing_if = "Option::is_none")] pub summary_snapshot_ms: Option, @@ -184,6 +200,14 @@ mod tests { semantic_docs_embedded: None, semantic_docs_pending: None, semantic_docs_stale: None, + symbol_search_docs_written: None, + semantic_dense_docs_skipped: None, + semantic_dense_public_api: None, + semantic_dense_entrypoint: None, + semantic_dense_documented_nontrivial: None, + semantic_dense_central_graph_node: None, + semantic_dense_component_report: None, + semantic_dense_unstructured_doc: None, deferred_indexes_ms: None, summary_snapshot_ms: None, detail_snapshot_ms: None, @@ -252,6 +276,14 @@ mod tests { assert!(value.get("semantic_docs_embedded").is_none()); assert!(value.get("semantic_docs_pending").is_none()); assert!(value.get("semantic_docs_stale").is_none()); + assert!(value.get("symbol_search_docs_written").is_none()); + assert!(value.get("semantic_dense_docs_skipped").is_none()); + assert!(value.get("semantic_dense_public_api").is_none()); + assert!(value.get("semantic_dense_entrypoint").is_none()); + assert!(value.get("semantic_dense_documented_nontrivial").is_none()); + assert!(value.get("semantic_dense_central_graph_node").is_none()); + assert!(value.get("semantic_dense_component_report").is_none()); + assert!(value.get("semantic_dense_unstructured_doc").is_none()); assert!(value.get("resolution_call_candidate_index_ms").is_none()); assert!(value.get("resolution_import_candidate_index_ms").is_none()); assert!(value.get("resolution_call_semantic_index_ms").is_none()); diff --git a/crates/codestory-contracts/src/language_support.rs b/crates/codestory-contracts/src/language_support.rs new file mode 100644 index 00000000..9619cdc5 --- /dev/null +++ b/crates/codestory-contracts/src/language_support.rs @@ -0,0 +1,162 @@ +#[derive(Debug, Clone, Copy, PartialEq, Eq)] +pub enum LanguageSupportMode { + ParserBackedGraph, + StructuralCollector, +} + +impl LanguageSupportMode { + pub const fn as_str(self) -> &'static str { + match self { + Self::ParserBackedGraph => "parser_backed_graph", + Self::StructuralCollector => "structural_collector", + } + } +} + +#[derive(Debug, Clone, Copy, PartialEq, Eq)] +pub enum LanguageEvidenceTier { + GraphFidelity, + StructuralOnly, +} + +impl LanguageEvidenceTier { + pub const fn as_str(self) -> &'static str { + match self { + Self::GraphFidelity => "graph_fidelity", + Self::StructuralOnly => "structural_only", + } + } +} + +#[derive(Debug, Clone, Copy, PartialEq, Eq)] +pub struct LanguageSupportProfile { + pub language_name: &'static str, + pub extensions: &'static [&'static str], + pub support_mode: LanguageSupportMode, + pub evidence_tier: LanguageEvidenceTier, + pub claim_label: &'static str, +} + +const PARSER_BACKED_GRAPH: &str = "parser-backed graph, fidelity-gated"; +const STRUCTURAL_COLLECTOR: &str = "structural collector only"; + +pub const LANGUAGE_SUPPORT_PROFILES: &[LanguageSupportProfile] = &[ + parser_profile("python", &["py", "pyi"]), + parser_profile("java", &["java"]), + parser_profile("rust", &["rs"]), + parser_profile("javascript", &["js", "jsx", "mjs", "cjs"]), + parser_profile("typescript", &["ts", "tsx", "mts", "cts"]), + parser_profile("cpp", &["cpp", "cc", "cxx", "hpp", "hh", "hxx"]), + parser_profile("c", &["c", "h"]), + parser_profile("go", &["go"]), + parser_profile("ruby", &["rb"]), + parser_profile("php", &["php"]), + parser_profile("csharp", &["cs"]), + parser_profile("kotlin", &["kt", "kts"]), + parser_profile("swift", &["swift"]), + parser_profile("dart", &["dart"]), + parser_profile("bash", &["sh", "bash"]), + structural_profile("html", &["html", "htm"]), + structural_profile("css", &["css"]), + structural_profile("sql", &["sql"]), +]; + +const fn parser_profile( + language_name: &'static str, + extensions: &'static [&'static str], +) -> LanguageSupportProfile { + LanguageSupportProfile { + language_name, + extensions, + support_mode: LanguageSupportMode::ParserBackedGraph, + evidence_tier: LanguageEvidenceTier::GraphFidelity, + claim_label: PARSER_BACKED_GRAPH, + } +} + +const fn structural_profile( + language_name: &'static str, + extensions: &'static [&'static str], +) -> LanguageSupportProfile { + LanguageSupportProfile { + language_name, + extensions, + support_mode: LanguageSupportMode::StructuralCollector, + evidence_tier: LanguageEvidenceTier::StructuralOnly, + claim_label: STRUCTURAL_COLLECTOR, + } +} + +pub fn normalize_extension(ext: &str) -> String { + ext.trim().trim_start_matches('.').to_ascii_lowercase() +} + +pub fn language_support_profile_for_ext(ext: &str) -> Option<&'static LanguageSupportProfile> { + let ext = normalize_extension(ext); + LANGUAGE_SUPPORT_PROFILES + .iter() + .find(|profile| profile.extensions.iter().any(|candidate| *candidate == ext)) +} + +pub fn language_support_profile_for_language_name( + language_name: &str, +) -> Option<&'static LanguageSupportProfile> { + let language_name = language_name.trim().to_ascii_lowercase(); + LANGUAGE_SUPPORT_PROFILES + .iter() + .find(|profile| profile.language_name == language_name) +} + +pub fn language_name_for_path(path: Option<&str>) -> Option<&'static str> { + let ext = path?.rsplit('.').next()?.trim_start_matches('.'); + language_support_profile_for_ext(ext).map(|profile| profile.language_name) +} + +pub fn supported_extensions() -> impl Iterator { + LANGUAGE_SUPPORT_PROFILES + .iter() + .flat_map(|profile| profile.extensions.iter().copied()) +} + +#[cfg(test)] +mod tests { + use super::*; + use std::collections::HashSet; + + #[test] + fn profile_lookup_covers_claimed_parser_and_structural_languages() { + assert_eq!( + language_support_profile_for_ext("kt") + .expect("kotlin profile") + .language_name, + "kotlin" + ); + assert_eq!( + language_support_profile_for_ext(".swift") + .expect("swift profile") + .support_mode, + LanguageSupportMode::ParserBackedGraph + ); + assert_eq!( + language_support_profile_for_ext("html") + .expect("html profile") + .evidence_tier, + LanguageEvidenceTier::StructuralOnly + ); + assert!( + language_name_for_path(Some("src/app/Program.cshtml")).is_none(), + "Razor .cshtml files are workspace-compatible, but not a public parser-backed C# claim" + ); + } + + #[test] + fn profile_extensions_are_unique() { + let mut seen = HashSet::new(); + for extension in supported_extensions() { + assert!( + seen.insert(extension), + "extension should have exactly one owner: {extension}" + ); + } + } +} diff --git a/crates/codestory-contracts/src/lib.rs b/crates/codestory-contracts/src/lib.rs index 470aec83..ed2280e1 100644 --- a/crates/codestory-contracts/src/lib.rs +++ b/crates/codestory-contracts/src/lib.rs @@ -2,6 +2,7 @@ pub mod api; pub mod events; pub mod graph; pub mod grounding; +pub mod language_support; pub mod query; pub mod trail; pub mod workspace; diff --git a/crates/codestory-indexer/Cargo.toml b/crates/codestory-indexer/Cargo.toml index 0a71f7b8..5d42c794 100644 --- a/crates/codestory-indexer/Cargo.toml +++ b/crates/codestory-indexer/Cargo.toml @@ -1,6 +1,6 @@ [package] name = "codestory-indexer" -version = "0.6.2" +version = "0.7.0" edition = "2024" [dev-dependencies] @@ -35,3 +35,7 @@ tree-sitter-go = { workspace = true } tree-sitter-ruby = { workspace = true } tree-sitter-php = { workspace = true } tree-sitter-c-sharp = { workspace = true } +tree-sitter-kotlin-ng = { workspace = true } +tree-sitter-swift = { workspace = true } +tree-sitter-dart-orchard = { workspace = true } +tree-sitter-bash = { workspace = true } diff --git a/crates/codestory-indexer/rules/bash.scm b/crates/codestory-indexer/rules/bash.scm new file mode 100644 index 00000000..56940e4a --- /dev/null +++ b/crates/codestory-indexer/rules/bash.scm @@ -0,0 +1,41 @@ +(function_definition + name: (word) @name) @def +{ + node @name.node + attr (@name.node) kind = "FUNCTION" + attr (@name.node) name = (source-text @name) + attr (@name.node) start_row = (start-row @def) + attr (@name.node) start_col = (start-column @def) + attr (@name.node) end_row = (end-row @def) + attr (@name.node) end_col = (end-column @def) +} + +(variable_assignment + name: (variable_name) @name) @def +{ + node @name.node + attr (@name.node) kind = "VARIABLE" + attr (@name.node) name = (source-text @name) + attr (@name.node) start_row = (start-row @def) + attr (@name.node) start_col = (start-column @def) + attr (@name.node) end_row = (end-row @def) + attr (@name.node) end_col = (end-column @def) +} + +;; Commands are the shell call graph floor. +(command + name: (command_name + (word) @callee_any)) @call_any +{ + node @call_any.node + attr (@call_any.node) kind = "UNKNOWN" + attr (@call_any.node) name = (source-text @callee_any) + attr (@call_any.node) start_row = (start-row @callee_any) + attr (@call_any.node) start_col = (start-column @callee_any) + attr (@call_any.node) end_row = (end-row @callee_any) + attr (@call_any.node) end_col = (end-column @callee_any) + + edge @call_any.node -> @call_any.node + attr (@call_any.node -> @call_any.node) kind = "CALL" + attr (@call_any.node -> @call_any.node) line = (start-row @call_any) +} diff --git a/crates/codestory-indexer/rules/dart.scm b/crates/codestory-indexer/rules/dart.scm new file mode 100644 index 00000000..0440d90a --- /dev/null +++ b/crates/codestory-indexer/rules/dart.scm @@ -0,0 +1,193 @@ +(class_definition + name: (identifier) @name) @def +{ + node @name.node + attr (@name.node) kind = "CLASS" + attr (@name.node) name = (source-text @name) + attr (@name.node) start_row = (start-row @def) + attr (@name.node) start_col = (start-column @def) + attr (@name.node) end_row = (end-row @def) + attr (@name.node) end_col = (end-column @def) +} + +(mixin_declaration + name: (identifier) @name) @def +{ + node @name.node + attr (@name.node) kind = "CLASS" + attr (@name.node) name = (source-text @name) + attr (@name.node) start_row = (start-row @def) + attr (@name.node) start_col = (start-column @def) + attr (@name.node) end_row = (end-row @def) + attr (@name.node) end_col = (end-column @def) +} + +(enum_declaration + name: (identifier) @name) @def +{ + node @name.node + attr (@name.node) kind = "ENUM" + attr (@name.node) name = (source-text @name) + attr (@name.node) start_row = (start-row @def) + attr (@name.node) start_col = (start-column @def) + attr (@name.node) end_row = (end-row @def) + attr (@name.node) end_col = (end-column @def) +} + +(extension_declaration + name: (identifier) @name) @def +{ + node @name.node + attr (@name.node) kind = "CLASS" + attr (@name.node) name = (source-text @name) + attr (@name.node) start_row = (start-row @def) + attr (@name.node) start_col = (start-column @def) + attr (@name.node) end_row = (end-row @def) + attr (@name.node) end_col = (end-column @def) +} + +(program + (function_signature + name: (identifier) @name) @def) +{ + node @name.node + attr (@name.node) kind = "FUNCTION" + attr (@name.node) name = (source-text @name) + attr (@name.node) start_row = (start-row @def) + attr (@name.node) start_col = (start-column @def) + attr (@name.node) end_row = (end-row @def) + attr (@name.node) end_col = (end-column @def) +} + +(method_signature + (function_signature + name: (identifier) @name)) @def +{ + node @name.node + attr (@name.node) kind = "METHOD" + attr (@name.node) name = (source-text @name) + attr (@name.node) start_row = (start-row @def) + attr (@name.node) start_col = (start-column @def) + attr (@name.node) end_row = (end-row @def) + attr (@name.node) end_col = (end-column @def) +} + +(declaration + (function_signature + name: (identifier) @name)) @def +{ + node @name.node + attr (@name.node) kind = "METHOD" + attr (@name.node) name = (source-text @name) + attr (@name.node) start_row = (start-row @def) + attr (@name.node) start_col = (start-column @def) + attr (@name.node) end_row = (end-row @def) + attr (@name.node) end_col = (end-column @def) +} + +;; Membership +(class_definition + name: (identifier) @class_name + body: (class_body + (method_signature + (function_signature + name: (identifier) @method_name)))) +{ + edge @class_name.node -> @method_name.node + attr (@class_name.node -> @method_name.node) kind = "MEMBER" +} + +(class_definition + name: (identifier) @class_name + body: (class_body + (declaration + (function_signature + name: (identifier) @method_name)))) +{ + edge @class_name.node -> @method_name.node + attr (@class_name.node -> @method_name.node) kind = "MEMBER" +} + +(mixin_declaration + name: (identifier) @class_name + body: (class_body + (method_signature + (function_signature + name: (identifier) @method_name)))) +{ + edge @class_name.node -> @method_name.node + attr (@class_name.node -> @method_name.node) kind = "MEMBER" +} + +;; Inheritance and interfaces +(class_definition + name: (identifier) @class_name + superclass: (superclass + (type_identifier) @parent_name)) +{ + node @parent_name.node + attr (@parent_name.node) kind = "CLASS" + attr (@parent_name.node) name = (source-text @parent_name) + attr (@parent_name.node) start_row = (start-row @parent_name) + attr (@parent_name.node) start_col = (start-column @parent_name) + attr (@parent_name.node) end_row = (end-row @parent_name) + attr (@parent_name.node) end_col = (end-column @parent_name) + + edge @class_name.node -> @parent_name.node + attr (@class_name.node -> @parent_name.node) kind = "INHERITANCE" +} + +(class_definition + name: (identifier) @class_name + interfaces: (interfaces + (type_identifier) @parent_name)) +{ + node @parent_name.node + attr (@parent_name.node) kind = "INTERFACE" + attr (@parent_name.node) name = (source-text @parent_name) + attr (@parent_name.node) start_row = (start-row @parent_name) + attr (@parent_name.node) start_col = (start-column @parent_name) + attr (@parent_name.node) end_row = (end-row @parent_name) + attr (@parent_name.node) end_col = (end-column @parent_name) + + edge @class_name.node -> @parent_name.node + attr (@class_name.node -> @parent_name.node) kind = "INHERITANCE" +} + +;; Direct calls parse as an identifier followed by a selector argument part. +(expression_statement + (identifier) @callee_any + (selector + (argument_part + (arguments)))) @call_any +{ + node @call_any.node + attr (@call_any.node) kind = "UNKNOWN" + attr (@call_any.node) name = (source-text @callee_any) + attr (@call_any.node) start_row = (start-row @callee_any) + attr (@call_any.node) start_col = (start-column @callee_any) + attr (@call_any.node) end_row = (end-row @callee_any) + attr (@call_any.node) end_col = (end-column @callee_any) + + edge @call_any.node -> @call_any.node + attr (@call_any.node -> @call_any.node) kind = "CALL" + attr (@call_any.node -> @call_any.node) line = (start-row @call_any) +} + +;; Imports +(import_specification + (configurable_uri + (uri + (string_literal) @module))) +{ + node @module.node + attr (@module.node) kind = "MODULE" + attr (@module.node) name = (source-text @module) + attr (@module.node) start_row = (start-row @module) + attr (@module.node) start_col = (start-column @module) + attr (@module.node) end_row = (end-row @module) + attr (@module.node) end_col = (end-column @module) + + edge @module.node -> @module.node + attr (@module.node -> @module.node) kind = "IMPORT" +} diff --git a/crates/codestory-indexer/rules/go.scm b/crates/codestory-indexer/rules/go.scm index c9e13260..72874796 100644 --- a/crates/codestory-indexer/rules/go.scm +++ b/crates/codestory-indexer/rules/go.scm @@ -22,6 +22,22 @@ attr (@name.node) end_col = (end-column @def) } +(type_declaration + (type_spec + name: (type_identifier) + type: (interface_type + (method_elem + name: (field_identifier) @name) @def))) +{ + node @name.node + attr (@name.node) kind = "METHOD" + attr (@name.node) name = (source-text @name) + attr (@name.node) start_row = (start-row @def) + attr (@name.node) start_col = (start-column @def) + attr (@name.node) end_row = (end-row @def) + attr (@name.node) end_col = (end-column @def) +} + (type_declaration (type_spec name: (type_identifier) @name)) @def diff --git a/crates/codestory-indexer/rules/java.scm b/crates/codestory-indexer/rules/java.scm index ba6b22fb..37b3e8b5 100644 --- a/crates/codestory-indexer/rules/java.scm +++ b/crates/codestory-indexer/rules/java.scm @@ -583,10 +583,12 @@ attr (@module.node -> @module.node) kind = "IMPORT" } -;; Lambda assignment -(variable_declarator - name: (identifier) @name - value: (lambda_expression) @def) +;; Lambda assignment for local variables. Field declarations are handled by the +;; field rule above; matching them again here creates duplicate graph variables. +(local_variable_declaration + (variable_declarator + name: (identifier) @name + value: (lambda_expression) @def)) { node @name.node attr (@name.node) kind = "FUNCTION" diff --git a/crates/codestory-indexer/rules/kotlin.scm b/crates/codestory-indexer/rules/kotlin.scm new file mode 100644 index 00000000..97a6b48a --- /dev/null +++ b/crates/codestory-indexer/rules/kotlin.scm @@ -0,0 +1,219 @@ +(class_declaration + "class" + name: (identifier) @name) @def +{ + node @name.node + attr (@name.node) kind = "CLASS" + attr (@name.node) name = (source-text @name) + attr (@name.node) start_row = (start-row @def) + attr (@name.node) start_col = (start-column @def) + attr (@name.node) end_row = (end-row @def) + attr (@name.node) end_col = (end-column @def) +} + +(class_declaration + "interface" + name: (identifier) @name) @def +{ + node @name.node + attr (@name.node) kind = "INTERFACE" + attr (@name.node) name = (source-text @name) + attr (@name.node) start_row = (start-row @def) + attr (@name.node) start_col = (start-column @def) + attr (@name.node) end_row = (end-row @def) + attr (@name.node) end_col = (end-column @def) +} + +(object_declaration + name: (identifier) @name) @def +{ + node @name.node + attr (@name.node) kind = "CLASS" + attr (@name.node) name = (source-text @name) + attr (@name.node) start_row = (start-row @def) + attr (@name.node) start_col = (start-column @def) + attr (@name.node) end_row = (end-row @def) + attr (@name.node) end_col = (end-column @def) +} + +(function_declaration + name: (identifier) @name) @def +{ + node @name.node + attr (@name.node) kind = "FUNCTION" + attr (@name.node) name = (source-text @name) + attr (@name.node) start_row = (start-row @def) + attr (@name.node) start_col = (start-column @def) + attr (@name.node) end_row = (end-row @def) + attr (@name.node) end_col = (end-column @def) +} + +(type_alias + type: (identifier) @name) @def +{ + node @name.node + attr (@name.node) kind = "TYPEDEF" + attr (@name.node) name = (source-text @name) + attr (@name.node) start_row = (start-row @def) + attr (@name.node) start_col = (start-column @def) + attr (@name.node) end_row = (end-row @def) + attr (@name.node) end_col = (end-column @def) +} + +(package_header + (qualified_identifier) @name) @def +{ + node @name.node + attr (@name.node) kind = "MODULE" + attr (@name.node) name = (source-text @name) + attr (@name.node) start_row = (start-row @def) + attr (@name.node) start_col = (start-column @def) + attr (@name.node) end_row = (end-row @def) + attr (@name.node) end_col = (end-column @def) +} + +;; Membership +(class_declaration + name: (identifier) @class_name + (class_body + (function_declaration name: (identifier) @method_name))) +{ + edge @class_name.node -> @method_name.node + attr (@class_name.node -> @method_name.node) kind = "MEMBER" +} + +(object_declaration + name: (identifier) @class_name + (class_body + (function_declaration name: (identifier) @method_name))) +{ + edge @class_name.node -> @method_name.node + attr (@class_name.node -> @method_name.node) kind = "MEMBER" +} + +;; Inheritance and interface implementation +(class_declaration + name: (identifier) @class_name + (delegation_specifiers + (delegation_specifier + (type + (user_type + (identifier) @parent_name))))) +{ + node @parent_name.node + attr (@parent_name.node) kind = "CLASS" + attr (@parent_name.node) name = (source-text @parent_name) + attr (@parent_name.node) start_row = (start-row @parent_name) + attr (@parent_name.node) start_col = (start-column @parent_name) + attr (@parent_name.node) end_row = (end-row @parent_name) + attr (@parent_name.node) end_col = (end-column @parent_name) + + edge @class_name.node -> @parent_name.node + attr (@class_name.node -> @parent_name.node) kind = "INHERITANCE" +} + +(class_declaration + name: (identifier) @class_name + (delegation_specifiers + (delegation_specifier + (user_type + (identifier) @parent_name)))) +{ + node @parent_name.node + attr (@parent_name.node) kind = "CLASS" + attr (@parent_name.node) name = (source-text @parent_name) + attr (@parent_name.node) start_row = (start-row @parent_name) + attr (@parent_name.node) start_col = (start-column @parent_name) + attr (@parent_name.node) end_row = (end-row @parent_name) + attr (@parent_name.node) end_col = (end-column @parent_name) + + edge @class_name.node -> @parent_name.node + attr (@class_name.node -> @parent_name.node) kind = "INHERITANCE" +} + +(class_declaration + name: (identifier) @class_name + (delegation_specifiers + (delegation_specifier + (constructor_invocation + (user_type + (identifier) @parent_name))))) +{ + node @parent_name.node + attr (@parent_name.node) kind = "CLASS" + attr (@parent_name.node) name = (source-text @parent_name) + attr (@parent_name.node) start_row = (start-row @parent_name) + attr (@parent_name.node) start_col = (start-column @parent_name) + attr (@parent_name.node) end_row = (end-row @parent_name) + attr (@parent_name.node) end_col = (end-column @parent_name) + + edge @class_name.node -> @parent_name.node + attr (@class_name.node -> @parent_name.node) kind = "INHERITANCE" +} + +;; Calls +(call_expression + (identifier) @callee_any) @call_any +{ + node @call_any.node + attr (@call_any.node) kind = "UNKNOWN" + attr (@call_any.node) name = (source-text @callee_any) + attr (@call_any.node) start_row = (start-row @callee_any) + attr (@call_any.node) start_col = (start-column @callee_any) + attr (@call_any.node) end_row = (end-row @callee_any) + attr (@call_any.node) end_col = (end-column @callee_any) + + edge @call_any.node -> @call_any.node + attr (@call_any.node -> @call_any.node) kind = "CALL" + attr (@call_any.node -> @call_any.node) line = (start-row @call_any) +} + +(call_expression + (navigation_expression + (identifier) + (identifier) @callee_any) + (value_arguments)) @call_any +{ + node @call_any.node + attr (@call_any.node) kind = "UNKNOWN" + attr (@call_any.node) name = (source-text @callee_any) + attr (@call_any.node) start_row = (start-row @callee_any) + attr (@call_any.node) start_col = (start-column @callee_any) + attr (@call_any.node) end_row = (end-row @callee_any) + attr (@call_any.node) end_col = (end-column @callee_any) + + edge @call_any.node -> @call_any.node + attr (@call_any.node -> @call_any.node) kind = "CALL" + attr (@call_any.node -> @call_any.node) line = (start-row @call_any) +} + +;; Imports +(import + (identifier) @module) +{ + node @module.node + attr (@module.node) kind = "MODULE" + attr (@module.node) name = (source-text @module) + attr (@module.node) start_row = (start-row @module) + attr (@module.node) start_col = (start-column @module) + attr (@module.node) end_row = (end-row @module) + attr (@module.node) end_col = (end-column @module) + + edge @module.node -> @module.node + attr (@module.node -> @module.node) kind = "IMPORT" +} + +(import + (qualified_identifier) @module) +{ + node @module.node + attr (@module.node) kind = "MODULE" + attr (@module.node) name = (source-text @module) + attr (@module.node) start_row = (start-row @module) + attr (@module.node) start_col = (start-column @module) + attr (@module.node) end_row = (end-row @module) + attr (@module.node) end_col = (end-column @module) + + edge @module.node -> @module.node + attr (@module.node -> @module.node) kind = "IMPORT" +} diff --git a/crates/codestory-indexer/rules/swift.scm b/crates/codestory-indexer/rules/swift.scm new file mode 100644 index 00000000..f83d797c --- /dev/null +++ b/crates/codestory-indexer/rules/swift.scm @@ -0,0 +1,165 @@ +(class_declaration + name: (type_identifier) @name) @def +{ + node @name.node + attr (@name.node) kind = "CLASS" + attr (@name.node) name = (source-text @name) + attr (@name.node) start_row = (start-row @def) + attr (@name.node) start_col = (start-column @def) + attr (@name.node) end_row = (end-row @def) + attr (@name.node) end_col = (end-column @def) +} + +(protocol_declaration + name: (type_identifier) @name) @def +{ + node @name.node + attr (@name.node) kind = "INTERFACE" + attr (@name.node) name = (source-text @name) + attr (@name.node) start_row = (start-row @def) + attr (@name.node) start_col = (start-column @def) + attr (@name.node) end_row = (end-row @def) + attr (@name.node) end_col = (end-column @def) +} + +(function_declaration + name: (simple_identifier) @name) @def +{ + node @name.node + attr (@name.node) kind = "FUNCTION" + attr (@name.node) name = (source-text @name) + attr (@name.node) start_row = (start-row @def) + attr (@name.node) start_col = (start-column @def) + attr (@name.node) end_row = (end-row @def) + attr (@name.node) end_col = (end-column @def) +} + +(protocol_function_declaration + name: (simple_identifier) @name) @def +{ + node @name.node + attr (@name.node) kind = "METHOD" + attr (@name.node) name = (source-text @name) + attr (@name.node) start_row = (start-row @def) + attr (@name.node) start_col = (start-column @def) + attr (@name.node) end_row = (end-row @def) + attr (@name.node) end_col = (end-column @def) +} + +(typealias_declaration + name: (type_identifier) @name) @def +{ + node @name.node + attr (@name.node) kind = "TYPEDEF" + attr (@name.node) name = (source-text @name) + attr (@name.node) start_row = (start-row @def) + attr (@name.node) start_col = (start-column @def) + attr (@name.node) end_row = (end-row @def) + attr (@name.node) end_col = (end-column @def) +} + +;; Membership +(class_declaration + name: (type_identifier) @class_name + body: (class_body + (function_declaration name: (simple_identifier) @method_name))) +{ + edge @class_name.node -> @method_name.node + attr (@class_name.node -> @method_name.node) kind = "MEMBER" +} + +(protocol_declaration + name: (type_identifier) @interface_name + body: (protocol_body + (protocol_function_declaration name: (simple_identifier) @method_name))) +{ + edge @interface_name.node -> @method_name.node + attr (@interface_name.node -> @method_name.node) kind = "MEMBER" +} + +;; Inheritance and protocol conformance +(class_declaration + name: (type_identifier) @class_name + (inheritance_specifier + inherits_from: (user_type + (type_identifier) @parent_name))) +{ + node @parent_name.node + attr (@parent_name.node) kind = "CLASS" + attr (@parent_name.node) name = (source-text @parent_name) + attr (@parent_name.node) start_row = (start-row @parent_name) + attr (@parent_name.node) start_col = (start-column @parent_name) + attr (@parent_name.node) end_row = (end-row @parent_name) + attr (@parent_name.node) end_col = (end-column @parent_name) + + edge @class_name.node -> @parent_name.node + attr (@class_name.node -> @parent_name.node) kind = "INHERITANCE" +} + +(protocol_declaration + name: (type_identifier) @interface_name + (inheritance_specifier + inherits_from: (user_type + (type_identifier) @parent_name))) +{ + node @parent_name.node + attr (@parent_name.node) kind = "INTERFACE" + attr (@parent_name.node) name = (source-text @parent_name) + attr (@parent_name.node) start_row = (start-row @parent_name) + attr (@parent_name.node) start_col = (start-column @parent_name) + attr (@parent_name.node) end_row = (end-row @parent_name) + attr (@parent_name.node) end_col = (end-column @parent_name) + + edge @interface_name.node -> @parent_name.node + attr (@interface_name.node -> @parent_name.node) kind = "INHERITANCE" +} + +;; Calls +(call_expression + (simple_identifier) @callee_any) @call_any +{ + node @call_any.node + attr (@call_any.node) kind = "UNKNOWN" + attr (@call_any.node) name = (source-text @callee_any) + attr (@call_any.node) start_row = (start-row @callee_any) + attr (@call_any.node) start_col = (start-column @callee_any) + attr (@call_any.node) end_row = (end-row @callee_any) + attr (@call_any.node) end_col = (end-column @callee_any) + + edge @call_any.node -> @call_any.node + attr (@call_any.node -> @call_any.node) kind = "CALL" + attr (@call_any.node -> @call_any.node) line = (start-row @call_any) +} + +(call_expression + (navigation_expression + target: (_) @callee_any)) @call_any +{ + node @call_any.node + attr (@call_any.node) kind = "UNKNOWN" + attr (@call_any.node) name = (source-text @callee_any) + attr (@call_any.node) start_row = (start-row @callee_any) + attr (@call_any.node) start_col = (start-column @callee_any) + attr (@call_any.node) end_row = (end-row @callee_any) + attr (@call_any.node) end_col = (end-column @callee_any) + + edge @call_any.node -> @call_any.node + attr (@call_any.node -> @call_any.node) kind = "CALL" + attr (@call_any.node -> @call_any.node) line = (start-row @call_any) +} + +;; Imports +(import_declaration + (identifier) @module) +{ + node @module.node + attr (@module.node) kind = "MODULE" + attr (@module.node) name = (source-text @module) + attr (@module.node) start_row = (start-row @module) + attr (@module.node) start_col = (start-column @module) + attr (@module.node) end_row = (end-row @module) + attr (@module.node) end_col = (end-column @module) + + edge @module.node -> @module.node + attr (@module.node -> @module.node) kind = "IMPORT" +} diff --git a/crates/codestory-indexer/src/language_configs.rs b/crates/codestory-indexer/src/language_configs.rs new file mode 100644 index 00000000..4e5f9f64 --- /dev/null +++ b/crates/codestory-indexer/src/language_configs.rs @@ -0,0 +1,190 @@ +use super::{ + BASH_GRAPH_QUERY, C_GRAPH_QUERY, CPP_GRAPH_QUERY, CSHARP_GRAPH_QUERY, DART_GRAPH_QUERY, + GO_GRAPH_QUERY, JAVA_GRAPH_QUERY, JAVASCRIPT_GRAPH_QUERY, KOTLIN_GRAPH_QUERY, LanguageConfig, + LanguageRuleset, PHP_GRAPH_QUERY, PYTHON_GRAPH_QUERY, RUBY_GRAPH_QUERY, RUST_GRAPH_QUERY, + RUST_TAGS_QUERY, SWIFT_GRAPH_QUERY, TSX_GRAPH_QUERY, TSX_TAGS_QUERY, TYPESCRIPT_GRAPH_QUERY, + TYPESCRIPT_TAGS_QUERY, make_language_config, +}; + +pub(super) fn get_language_for_ext(ext: &str) -> Option { + let ext = codestory_contracts::language_support::normalize_extension(ext); + match ext.as_str() { + "py" | "pyi" => Some(python()), + "java" => Some(java()), + "rs" => Some(rust()), + "js" | "jsx" | "mjs" | "cjs" => Some(javascript()), + "ts" | "mts" | "cts" => Some(typescript()), + "tsx" => Some(tsx()), + "cpp" | "cc" | "cxx" | "hpp" | "hh" | "hxx" => Some(cpp()), + "c" | "h" => Some(c()), + "go" => Some(go()), + "rb" => Some(ruby()), + "php" => Some(php()), + "cs" => Some(csharp()), + "kt" | "kts" => Some(kotlin()), + "swift" => Some(swift()), + "dart" => Some(dart()), + "sh" | "bash" => Some(bash()), + _ => None, + } +} + +fn python() -> LanguageConfig { + make_language_config( + tree_sitter_python::LANGUAGE.into(), + "python", + PYTHON_GRAPH_QUERY, + None, + LanguageRuleset::Python, + ) +} + +fn java() -> LanguageConfig { + make_language_config( + tree_sitter_java::LANGUAGE.into(), + "java", + JAVA_GRAPH_QUERY, + None, + LanguageRuleset::Java, + ) +} + +fn rust() -> LanguageConfig { + make_language_config( + tree_sitter_rust::LANGUAGE.into(), + "rust", + RUST_GRAPH_QUERY, + Some(RUST_TAGS_QUERY), + LanguageRuleset::Rust, + ) +} + +fn javascript() -> LanguageConfig { + make_language_config( + tree_sitter_javascript::LANGUAGE.into(), + "javascript", + JAVASCRIPT_GRAPH_QUERY, + None, + LanguageRuleset::JavaScript, + ) +} + +fn typescript() -> LanguageConfig { + make_language_config( + tree_sitter_typescript::LANGUAGE_TYPESCRIPT.into(), + "typescript", + TYPESCRIPT_GRAPH_QUERY, + Some(TYPESCRIPT_TAGS_QUERY), + LanguageRuleset::TypeScript, + ) +} + +fn tsx() -> LanguageConfig { + make_language_config( + tree_sitter_typescript::LANGUAGE_TSX.into(), + "typescript", + TSX_GRAPH_QUERY, + Some(TSX_TAGS_QUERY), + LanguageRuleset::Tsx, + ) +} + +fn cpp() -> LanguageConfig { + make_language_config( + tree_sitter_cpp::LANGUAGE.into(), + "cpp", + CPP_GRAPH_QUERY, + None, + LanguageRuleset::Cpp, + ) +} + +fn c() -> LanguageConfig { + make_language_config( + tree_sitter_c::LANGUAGE.into(), + "c", + C_GRAPH_QUERY, + None, + LanguageRuleset::C, + ) +} + +fn go() -> LanguageConfig { + make_language_config( + tree_sitter_go::LANGUAGE.into(), + "go", + GO_GRAPH_QUERY, + None, + LanguageRuleset::Go, + ) +} + +fn ruby() -> LanguageConfig { + make_language_config( + tree_sitter_ruby::LANGUAGE.into(), + "ruby", + RUBY_GRAPH_QUERY, + None, + LanguageRuleset::Ruby, + ) +} + +fn php() -> LanguageConfig { + make_language_config( + tree_sitter_php::LANGUAGE_PHP.into(), + "php", + PHP_GRAPH_QUERY, + None, + LanguageRuleset::Php, + ) +} + +fn csharp() -> LanguageConfig { + make_language_config( + tree_sitter_c_sharp::LANGUAGE.into(), + "csharp", + CSHARP_GRAPH_QUERY, + None, + LanguageRuleset::CSharp, + ) +} + +fn kotlin() -> LanguageConfig { + make_language_config( + tree_sitter_kotlin_ng::LANGUAGE.into(), + "kotlin", + KOTLIN_GRAPH_QUERY, + None, + LanguageRuleset::Kotlin, + ) +} + +fn swift() -> LanguageConfig { + make_language_config( + tree_sitter_swift::LANGUAGE.into(), + "swift", + SWIFT_GRAPH_QUERY, + None, + LanguageRuleset::Swift, + ) +} + +fn dart() -> LanguageConfig { + make_language_config( + tree_sitter_dart_orchard::LANGUAGE.into(), + "dart", + DART_GRAPH_QUERY, + None, + LanguageRuleset::Dart, + ) +} + +fn bash() -> LanguageConfig { + make_language_config( + tree_sitter_bash::LANGUAGE.into(), + "bash", + BASH_GRAPH_QUERY, + None, + LanguageRuleset::Bash, + ) +} diff --git a/crates/codestory-indexer/src/lib.rs b/crates/codestory-indexer/src/lib.rs index 457b20d7..0537e04e 100644 --- a/crates/codestory-indexer/src/lib.rs +++ b/crates/codestory-indexer/src/lib.rs @@ -23,6 +23,7 @@ mod cache; pub mod cancellation; pub mod compilation_database; pub mod intermediate_storage; +mod language_configs; pub mod resolution; pub mod semantic; pub mod structural; @@ -104,6 +105,10 @@ const GO_GRAPH_QUERY: &str = include_str!("../rules/go.scm"); const RUBY_GRAPH_QUERY: &str = include_str!("../rules/ruby.scm"); const PHP_GRAPH_QUERY: &str = include_str!("../rules/php.scm"); const CSHARP_GRAPH_QUERY: &str = include_str!("../rules/csharp.scm"); +const KOTLIN_GRAPH_QUERY: &str = include_str!("../rules/kotlin.scm"); +const SWIFT_GRAPH_QUERY: &str = include_str!("../rules/swift.scm"); +const DART_GRAPH_QUERY: &str = include_str!("../rules/dart.scm"); +const BASH_GRAPH_QUERY: &str = include_str!("../rules/bash.scm"); #[derive(Debug, Clone, Copy)] enum LanguageRuleset { @@ -119,6 +124,10 @@ enum LanguageRuleset { Ruby, Php, CSharp, + Kotlin, + Swift, + Dart, + Bash, } #[derive(Debug, Clone)] @@ -130,6 +139,10 @@ pub struct LanguageConfig { ruleset: LanguageRuleset, } +pub use codestory_contracts::language_support::{ + LanguageEvidenceTier, LanguageSupportMode, LanguageSupportProfile, +}; + struct CompiledLanguageRules { graph_file: GraphFile, tags_query: Option, @@ -263,6 +276,18 @@ impl LanguageRuleset { LanguageRuleset::CSharp => { compiled_rules_cache(language, CSHARP_GRAPH_QUERY, None, &CSHARP_RULES) } + LanguageRuleset::Kotlin => { + compiled_rules_cache(language, KOTLIN_GRAPH_QUERY, None, &KOTLIN_RULES) + } + LanguageRuleset::Swift => { + compiled_rules_cache(language, SWIFT_GRAPH_QUERY, None, &SWIFT_RULES) + } + LanguageRuleset::Dart => { + compiled_rules_cache(language, DART_GRAPH_QUERY, None, &DART_RULES) + } + LanguageRuleset::Bash => { + compiled_rules_cache(language, BASH_GRAPH_QUERY, None, &BASH_RULES) + } } } } @@ -305,6 +330,10 @@ static GO_RULES: OnceLock> = OnceLock::new static RUBY_RULES: OnceLock> = OnceLock::new(); static PHP_RULES: OnceLock> = OnceLock::new(); static CSHARP_RULES: OnceLock> = OnceLock::new(); +static KOTLIN_RULES: OnceLock> = OnceLock::new(); +static SWIFT_RULES: OnceLock> = OnceLock::new(); +static DART_RULES: OnceLock> = OnceLock::new(); +static BASH_RULES: OnceLock> = OnceLock::new(); fn tag_definition_priority(definition: &TagDefinition) -> (u8, u8, u8) { let role_priority = canonical_role_priority(definition.canonical_role); @@ -1810,6 +1839,32 @@ struct ManualEdgeSpec { line: Option, } +#[derive(Debug, Clone)] +struct ManualMemberEdgeSpec { + source_name: String, + target_name: String, + source_span: GraphNodeSpan, + target_span: GraphNodeSpan, + line: Option, +} + +#[derive(Debug, Clone)] +struct ManualReceiverCallSpec { + source_name: String, + source_span: GraphNodeSpan, + owner_name: String, + method_name: String, + line: Option, +} + +#[derive(Debug, Clone)] +struct ManualPreciseCallSpec { + source_name: String, + source_span: GraphNodeSpan, + target_name: String, + line: Option, +} + #[derive(Debug, Clone, Copy)] struct GraphNodeSpan { start_line: u32, @@ -3477,6 +3532,98 @@ fn collect_tsx_jsx_usage_edges(tree: &Tree, source: &str) -> Vec edges } +fn is_javascript_like_language(language_name: &str) -> bool { + matches!(language_name, "javascript" | "typescript" | "tsx") +} + +fn js_identifier_target_name(node: TsNode<'_>, source: &str) -> Option { + match node.kind() { + "identifier" | "type_identifier" => node_source_text(node, source) + .map(|name| name.trim().to_string()) + .filter(|name| !name.is_empty()), + "member_expression" => node + .child_by_field_name("property") + .and_then(|property| js_identifier_target_name(property, source)), + _ => None, + } +} + +fn js_member_object_identifier(node: TsNode<'_>, source: &str) -> Option { + if node.kind() != "member_expression" { + return None; + } + let object = node.child_by_field_name("object")?; + match object.kind() { + "identifier" => node_source_text(object, source) + .map(|name| name.trim().to_string()) + .filter(|name| !name.is_empty()), + _ => None, + } +} + +fn js_member_property_name(node: TsNode<'_>, source: &str) -> Option { + if node.kind() != "member_expression" { + return None; + } + node.child_by_field_name("property") + .and_then(|property| node_source_text(property, source)) + .map(|name| name.trim().to_string()) + .filter(|name| !name.is_empty()) +} + +fn js_new_expression_constructor_name(node: TsNode<'_>, source: &str) -> Option { + node.child_by_field_name("constructor") + .and_then(|constructor| js_identifier_target_name(constructor, source)) + .or_else(|| { + let mut cursor = node.walk(); + node.named_children(&mut cursor) + .find_map(|child| js_identifier_target_name(child, source)) + }) +} + +fn collect_javascript_static_call_edges(tree: &Tree, source: &str) -> Vec { + let mut edges = Vec::new(); + walk_tree_nodes(tree.root_node(), &mut |node| { + let Some(source_name) = tsx_owner_name(node, source) else { + return; + }; + let line = Some(node.start_position().row as u32 + 1); + match node.kind() { + "new_expression" => { + if let Some(target_name) = js_new_expression_constructor_name(node, source) { + edges.push(ManualEdgeSpec { + source_name, + target_name, + kind: EdgeKind::CALL, + line, + }); + } + } + "call_expression" => { + let Some(function_node) = node.child_by_field_name("function") else { + return; + }; + let Some(property_name) = js_member_property_name(function_node, source) else { + return; + }; + if !matches!(property_name.as_str(), "call" | "apply" | "bind") { + return; + } + if let Some(target_name) = js_member_object_identifier(function_node, source) { + edges.push(ManualEdgeSpec { + source_name, + target_name, + kind: EdgeKind::CALL, + line, + }); + } + } + _ => {} + } + }); + edges +} + fn rust_macro_owner_name(mut node: TsNode<'_>, source: &str) -> Option { while let Some(parent) = node.parent() { if parent.kind() == "function_item" { @@ -3577,6 +3724,112 @@ fn collect_python_decorator_call_edges(tree: &Tree, source: &str) -> Vec Vec { + let mut edges = Vec::new(); + walk_tree_nodes(tree.root_node(), &mut |callable| { + if !matches!(callable.kind(), "method" | "singleton_method") { + return; + } + let Some(source_name) = declaration_name(callable, source) else { + return; + }; + let local_bindings = collect_ruby_local_binding_names(callable, source); + walk_tree_nodes(callable, &mut |node| { + if !matches!(node.kind(), "identifier" | "constant") || !is_ruby_bare_call_site(node) { + return; + } + let Some(target_name) = trimmed_node_text(node, source) else { + return; + }; + if local_bindings.contains(&target_name) { + return; + } + edges.push(ManualEdgeSpec { + source_name: source_name.clone(), + target_name, + kind: EdgeKind::CALL, + line: Some(node.start_position().row as u32 + 1), + }); + }); + }); + edges +} + +fn collect_ruby_local_binding_names(callable: TsNode<'_>, source: &str) -> HashSet { + let mut names = HashSet::new(); + walk_tree_nodes(callable, &mut |node| { + if !matches!(node.kind(), "identifier" | "constant") { + return; + } + let Some(parent) = node.parent() else { + return; + }; + let is_binding = match parent.kind() { + "assignment" => parent + .child_by_field_name("left") + .map(|left| same_ts_span(left, node)) + .unwrap_or(false), + "parameters" | "method_parameters" | "optional_parameter" | "keyword_parameter" => true, + _ => false, + }; + if !is_binding { + return; + } + if let Some(name) = trimmed_node_text(node, source) { + names.insert(name); + } + }); + names +} + +fn is_ruby_bare_call_site(node: TsNode<'_>) -> bool { + let Some(parent) = node.parent() else { + return false; + }; + if matches!( + parent.kind(), + "method" + | "singleton_method" + | "class" + | "module" + | "assignment" + | "parameters" + | "method_parameters" + | "optional_parameter" + | "keyword_parameter" + ) { + return false; + } + if parent.kind() == "call" { + return false; + } + if let Some(name) = parent.child_by_field_name("name") + && same_ts_span(name, node) + { + return false; + } + if let Some(left) = parent.child_by_field_name("left") + && same_ts_span(left, node) + { + return false; + } + if let Some(receiver) = parent.child_by_field_name("receiver") + && same_ts_span(receiver, node) + { + return false; + } + if let Some(method) = parent.child_by_field_name("method") + && same_ts_span(method, node) + { + return false; + } + true +} + +fn same_ts_span(left: TsNode<'_>, right: TsNode<'_>) -> bool { + left.start_byte() == right.start_byte() && left.end_byte() == right.end_byte() +} + fn node_matches_name(node: &Node, name: &str) -> bool { node.serialized_name == name || short_member_name(&node.serialized_name) == name @@ -3693,6 +3946,15 @@ fn collect_runtime_import_specs( unique_nodes: &mut HashMap, symbol_table: Option<&Arc>, ) -> Vec { + if language_name == "bash" { + return collect_bash_source_import_specs( + file_name, + tree, + source, + unique_nodes, + symbol_table, + ); + } if !matches!(language_name, "javascript" | "typescript" | "tsx") { return Vec::new(); } @@ -3773,6 +4035,101 @@ fn collect_runtime_import_specs( specs } +fn collect_bash_source_import_specs( + file_name: &str, + tree: &Tree, + source: &str, + unique_nodes: &mut HashMap, + symbol_table: Option<&Arc>, +) -> Vec { + let mut specs = Vec::new(); + walk_tree_nodes(tree.root_node(), &mut |node| { + if node.kind() != "command" { + return; + } + let Some(name_node) = node.child_by_field_name("name") else { + return; + }; + let Some(callee_name) = + node_source_text(name_node, source).map(|name| name.trim().to_string()) + else { + return; + }; + if callee_name != "source" && callee_name != "." { + return; + } + + let mut cursor = node.walk(); + let Some(module_node) = node.named_children(&mut cursor).find(|child| { + child.start_byte() >= name_node.end_byte() + && matches!( + child.kind(), + "word" | "raw_string" | "string" | "concatenation" + ) + }) else { + return; + }; + let Some(module_name) = node_source_text(module_node, source) + .and_then(|name| normalize_static_shell_module_name(&name)) + else { + return; + }; + + let start = module_node.start_position(); + let end = module_node.end_position(); + let line = start.row as u32 + 1; + let canonical_seed = format!("{file_name}:{module_name}:{line}"); + let module_node_id = NodeId(generate_id(&canonical_seed)); + unique_nodes.entry(module_node_id).or_insert_with(|| Node { + id: module_node_id, + kind: NodeKind::MODULE, + serialized_name: module_name, + start_line: Some(line), + start_col: Some(start.column as u32 + 1), + end_line: Some(end.row as u32 + 1), + end_col: Some(end.column as u32 + 1), + ..Default::default() + }); + if let Some(table) = symbol_table { + table.insert(module_node_id.0, NodeKind::MODULE); + } + specs.push(RuntimeImportSpec { + binding_node_id: None, + module_node_id, + line, + suppress_callee_name: callee_name, + }); + }); + specs +} + +fn normalize_static_shell_module_name(raw: &str) -> Option { + let trimmed = raw.trim(); + if trimmed.is_empty() + || trimmed.contains('$') + || trimmed.contains('*') + || trimmed.contains('?') + || trimmed.contains('`') + { + return None; + } + + let unquoted = if trimmed.len() >= 2 { + let bytes = trimmed.as_bytes(); + if (bytes.first() == Some(&b'"') && bytes.last() == Some(&b'"')) + || (bytes.first() == Some(&b'\'') && bytes.last() == Some(&b'\'')) + { + &trimmed[1..trimmed.len() - 1] + } else { + trimmed + } + } else { + trimmed + }; + + (!unquoted.trim().is_empty()).then(|| unquoted.trim().to_string()) +} + fn unique_node_id_by_name( nodes: &HashMap, name: &str, @@ -3898,12 +4255,18 @@ fn append_manual_usage_edges( if is_tsx_file { specs.extend(collect_tsx_jsx_usage_edges(tree, source)); } + if is_javascript_like_language(language_name) { + specs.extend(collect_javascript_static_call_edges(tree, source)); + } if language_name == "rust" { specs.extend(collect_rust_macro_call_edges(tree, source)); } if language_name == "python" { specs.extend(collect_python_decorator_call_edges(tree, source)); } + if language_name == "ruby" { + specs.extend(collect_ruby_bare_call_edges(tree, source)); + } if specs.is_empty() { return; } @@ -3923,10 +4286,17 @@ fn append_manual_usage_edges( }; let target_id = match spec.kind { EdgeKind::CALL => unique_node_id_by_name(unique_nodes, &spec.target_name, |kind| { - if is_tsx_file || language_name == "python" { + if is_tsx_file + || language_name == "python" + || is_javascript_like_language(language_name) + { matches!( kind, - NodeKind::FUNCTION | NodeKind::METHOD | NodeKind::MACRO | NodeKind::UNKNOWN + NodeKind::CLASS + | NodeKind::FUNCTION + | NodeKind::METHOD + | NodeKind::MACRO + | NodeKind::UNKNOWN ) } else { matches!( @@ -3971,12 +4341,1291 @@ fn append_manual_usage_edges( *next = next.saturating_add(1); ensure_callsite_identity(&mut edge, Some(*next)); } - if !edge_keys.insert(edge_dedup_key(&edge, flags)) { - continue; + if !edge_keys.insert(edge_dedup_key(&edge, flags)) { + continue; + } + edge.id = EdgeId(generate_edge_id_for_edge(&edge, flags)); + result_edges.push(edge); + } +} + +fn language_precise_call_specs( + language_name: &str, + tree: &Tree, + source: &str, +) -> Vec { + match language_name { + "dart" => collect_dart_direct_call_edges(tree, source), + _ => Vec::new(), + } +} + +#[allow(clippy::too_many_arguments)] +fn append_manual_precise_call_edges( + language_name: &str, + tree: &Tree, + source: &str, + unique_nodes: &HashMap, + file_id: NodeId, + result_edges: &mut Vec, + edge_keys: &mut HashSet, + flags: IndexFeatureFlags, + callsite_ordinals: &mut HashMap<(NodeId, Option), u32>, +) { + for spec in language_precise_call_specs(language_name, tree, source) { + let Some(source_id) = + node_id_by_name_and_span(unique_nodes, &spec.source_name, spec.source_span, |kind| { + matches!(kind, NodeKind::FUNCTION | NodeKind::METHOD) + }) + else { + continue; + }; + let Some(target_id) = unique_node_id_by_name(unique_nodes, &spec.target_name, |kind| { + matches!( + kind, + NodeKind::FUNCTION | NodeKind::METHOD | NodeKind::MACRO + ) + }) else { + continue; + }; + + remove_generic_call_placeholders( + unique_nodes, + result_edges, + edge_keys, + flags, + spec.line, + &spec.target_name, + ); + + let mut edge = Edge { + id: EdgeId(0), + source: source_id, + target: target_id, + kind: EdgeKind::CALL, + file_node_id: Some(file_id), + line: spec.line, + resolved_target: Some(target_id), + confidence: Some(1.0), + certainty: Some(ResolutionCertainty::Certain), + ..Default::default() + }; + if !flags.legacy_edge_identity { + let key = (edge.target, edge.line); + let next = callsite_ordinals.entry(key).or_insert(0); + *next = next.saturating_add(1); + ensure_callsite_identity(&mut edge, Some(*next)); + } + if !edge_keys.insert(edge_dedup_key(&edge, flags)) { + continue; + } + edge.id = EdgeId(generate_edge_id_for_edge(&edge, flags)); + result_edges.push(edge); + } +} + +fn node_id_by_name_and_span( + nodes: &HashMap, + name: &str, + span: GraphNodeSpan, + predicate: F, +) -> Option +where + F: Fn(NodeKind) -> bool, +{ + let mut matches = nodes + .values() + .filter(|node| predicate(node.kind)) + .filter(|node| { + node.start_line == Some(span.start_line) + && node.start_col == Some(span.start_col) + && node.end_line == Some(span.end_line) + && node.end_col == Some(span.end_col) + }) + .filter(|node| node_matches_name(node, name)) + .collect::>(); + matches.sort_by_key(|node| node.id); + matches.first().map(|node| node.id) +} + +fn language_member_specs( + language_name: &str, + tree: &Tree, + source: &str, +) -> Vec { + match language_name { + "go" => collect_go_member_edges(tree, source), + "ruby" => collect_enclosing_type_member_edges( + tree, + source, + &["class", "module"], + &["method", "singleton_method"], + ), + "php" => collect_enclosing_type_member_edges( + tree, + source, + &[ + "class_declaration", + "interface_declaration", + "trait_declaration", + ], + &["method_declaration"], + ), + "csharp" => collect_enclosing_type_member_edges( + tree, + source, + &[ + "class_declaration", + "interface_declaration", + "struct_declaration", + ], + &["method_declaration"], + ), + _ => Vec::new(), + } +} + +struct ManualMemberEdgeContext<'a> { + language_name: &'a str, + tree: &'a Tree, + source: &'a str, + unique_nodes: &'a HashMap, + file_id: NodeId, + flags: IndexFeatureFlags, +} + +fn append_manual_member_edges( + context: ManualMemberEdgeContext<'_>, + result_edges: &mut Vec, + edge_keys: &mut HashSet, +) { + for spec in language_member_specs(context.language_name, context.tree, context.source) { + let Some(source_id) = node_id_by_name_and_span( + context.unique_nodes, + &spec.source_name, + spec.source_span, + is_type_like_kind, + ) else { + continue; + }; + let Some(target_id) = node_id_by_name_and_span( + context.unique_nodes, + &spec.target_name, + spec.target_span, + |kind| kind == NodeKind::METHOD, + ) else { + continue; + }; + + let mut edge = Edge { + id: EdgeId(0), + source: source_id, + target: target_id, + kind: EdgeKind::MEMBER, + file_node_id: Some(context.file_id), + line: spec.line, + certainty: parser_direct_structural_certainty(EdgeKind::MEMBER), + ..Default::default() + }; + if !edge_keys.insert(edge_dedup_key(&edge, context.flags)) { + continue; + } + edge.id = EdgeId(generate_edge_id_for_edge(&edge, context.flags)); + result_edges.push(edge); + } +} + +fn language_receiver_call_specs( + language_name: &str, + tree: &Tree, + source: &str, +) -> Vec { + match language_name { + "go" => collect_go_receiver_call_edges(tree, source), + "ruby" => collect_ruby_receiver_call_edges(tree, source), + "php" => collect_php_receiver_call_edges(tree, source), + "csharp" => collect_csharp_receiver_call_edges(tree, source), + "kotlin" => collect_kotlin_receiver_call_edges(tree, source), + "swift" => collect_swift_receiver_call_edges(tree, source), + "dart" => collect_dart_receiver_call_edges(tree, source), + _ => Vec::new(), + } +} + +#[allow(clippy::too_many_arguments)] +fn append_manual_receiver_call_edges( + language_name: &str, + tree: &Tree, + source: &str, + unique_nodes: &HashMap, + file_id: NodeId, + result_edges: &mut Vec, + edge_keys: &mut HashSet, + flags: IndexFeatureFlags, + callsite_ordinals: &mut HashMap<(NodeId, Option), u32>, +) { + for spec in language_receiver_call_specs(language_name, tree, source) { + let Some(source_id) = + node_id_by_name_and_span(unique_nodes, &spec.source_name, spec.source_span, |kind| { + matches!(kind, NodeKind::FUNCTION | NodeKind::METHOD) + }) + else { + continue; + }; + let Some(target_id) = member_target_id_by_owner_and_method( + unique_nodes, + result_edges, + &spec.owner_name, + &spec.method_name, + ) else { + continue; + }; + + remove_generic_call_placeholders( + unique_nodes, + result_edges, + edge_keys, + flags, + spec.line, + &spec.method_name, + ); + + let mut edge = Edge { + id: EdgeId(0), + source: source_id, + target: target_id, + kind: EdgeKind::CALL, + file_node_id: Some(file_id), + line: spec.line, + resolved_target: Some(target_id), + confidence: Some(1.0), + certainty: Some(ResolutionCertainty::Certain), + ..Default::default() + }; + if !flags.legacy_edge_identity { + let key = (edge.target, edge.line); + let next = callsite_ordinals.entry(key).or_insert(0); + *next = next.saturating_add(1); + ensure_callsite_identity(&mut edge, Some(*next)); + } + if !edge_keys.insert(edge_dedup_key(&edge, flags)) { + continue; + } + edge.id = EdgeId(generate_edge_id_for_edge(&edge, flags)); + result_edges.push(edge); + } +} + +fn member_target_id_by_owner_and_method( + nodes: &HashMap, + edges: &[Edge], + owner_name: &str, + method_name: &str, +) -> Option { + let mut owners = nodes + .values() + .filter(|node| is_type_like_kind(node.kind)) + .filter(|node| node_matches_name(node, owner_name)) + .collect::>(); + owners.sort_by(|left, right| { + left.start_line + .unwrap_or(u32::MAX) + .cmp(&right.start_line.unwrap_or(u32::MAX)) + .then_with(|| node_span_width(right).cmp(&node_span_width(left))) + .then_with(|| left.id.cmp(&right.id)) + }); + + for owner in owners { + let mut targets = edges + .iter() + .filter(|edge| edge.kind == EdgeKind::MEMBER && edge.source == owner.id) + .filter_map(|edge| nodes.get(&edge.target)) + .filter(|node| { + matches!(node.kind, NodeKind::FUNCTION | NodeKind::METHOD) + && node_matches_name(node, method_name) + }) + .collect::>(); + targets.sort_by(|left, right| { + left.start_line + .unwrap_or(u32::MAX) + .cmp(&right.start_line.unwrap_or(u32::MAX)) + .then_with(|| left.id.cmp(&right.id)) + }); + if let Some(target) = targets.first() { + return Some(target.id); + } + } + + None +} + +fn remove_generic_call_placeholders( + nodes: &HashMap, + edges: &mut Vec, + edge_keys: &mut HashSet, + flags: IndexFeatureFlags, + line: Option, + method_name: &str, +) { + let mut removed = Vec::new(); + edges.retain(|edge| { + let remove = edge.kind == EdgeKind::CALL + && edge.line == line + && edge.resolved_target.is_none() + && nodes + .get(&edge.target) + .map(|target| { + target.kind == NodeKind::UNKNOWN && node_matches_name(target, method_name) + }) + .unwrap_or(false); + if remove { + removed.push(edge_dedup_key(edge, flags)); + } + !remove + }); + for key in removed { + edge_keys.remove(&key); + } +} + +fn collect_go_member_edges(tree: &Tree, source: &str) -> Vec { + let mut edges = Vec::new(); + walk_tree_nodes(tree.root_node(), &mut |node| match node.kind() { + "method_declaration" => { + let Some(method_name_node) = node.child_by_field_name("name") else { + return; + }; + let Some(receiver_node) = node.child_by_field_name("receiver") else { + return; + }; + let Some(source_name) = go_receiver_owner_name(receiver_node, source) else { + return; + }; + let Some(target_name) = trimmed_node_text(method_name_node, source) else { + return; + }; + + edges.push(ManualMemberEdgeSpec { + source_name, + target_name, + source_span: ts_node_graph_span( + receiver_owner_declaration_node(tree.root_node(), source, receiver_node) + .unwrap_or(receiver_node), + ), + target_span: ts_node_graph_span(node), + line: Some(node.start_position().row as u32 + 1), + }); + } + "method_elem" => { + let Some(owner_node) = enclosing_node_with_kind(node, &["type_declaration"]) else { + return; + }; + let Some(owner_name_node) = descendant_by_field_name(owner_node, "name") else { + return; + }; + let Some(source_name) = trimmed_node_text(owner_name_node, source) else { + return; + }; + let Some(method_name_node) = node.child_by_field_name("name") else { + return; + }; + let Some(target_name) = trimmed_node_text(method_name_node, source) else { + return; + }; + + edges.push(ManualMemberEdgeSpec { + source_name, + target_name, + source_span: ts_node_graph_span(owner_node), + target_span: ts_node_graph_span(node), + line: Some(node.start_position().row as u32 + 1), + }); + } + _ => {} + }); + edges +} + +fn receiver_owner_declaration_node<'tree>( + root: TsNode<'tree>, + source: &str, + receiver_node: TsNode<'tree>, +) -> Option> { + let owner_name = go_receiver_owner_name(receiver_node, source)?; + find_go_type_declaration_by_name(root, source, &owner_name) +} + +fn find_go_type_declaration_by_name<'tree>( + node: TsNode<'tree>, + source: &str, + owner_name: &str, +) -> Option> { + if node.kind() == "type_declaration" + && let Some(name_node) = descendant_by_field_name(node, "name") + && trimmed_node_text(name_node, source).as_deref() == Some(owner_name) + { + return Some(node); + } + + let mut cursor = node.walk(); + for child in node.named_children(&mut cursor) { + if let Some(found) = find_go_type_declaration_by_name(child, source, owner_name) { + return Some(found); + } + } + None +} + +fn descendant_by_field_name<'tree>(node: TsNode<'tree>, field_name: &str) -> Option> { + if let Some(child) = node.child_by_field_name(field_name) { + return Some(child); + } + let mut cursor = node.walk(); + for child in node.named_children(&mut cursor) { + if let Some(found) = descendant_by_field_name(child, field_name) { + return Some(found); + } + } + None +} + +fn first_descendant_with_kind<'tree>(node: TsNode<'tree>, kind: &str) -> Option> { + if node.kind() == kind { + return Some(node); + } + let mut cursor = node.walk(); + for child in node.named_children(&mut cursor) { + if let Some(found) = first_descendant_with_kind(child, kind) { + return Some(found); + } + } + None +} + +fn go_receiver_owner_name(receiver_node: TsNode<'_>, source: &str) -> Option { + let text = trimmed_node_text(receiver_node, source)?; + let inner = text + .trim() + .trim_start_matches('(') + .trim_end_matches(')') + .trim(); + let raw_owner = inner.split_whitespace().last()?.trim(); + normalize_go_type_surface(raw_owner) +} + +fn normalize_go_type_surface(raw: &str) -> Option { + let mut surface = raw.trim(); + while let Some(stripped) = surface.strip_prefix('*') { + surface = stripped.trim_start(); + } + if let Some(stripped) = surface.strip_prefix("[]") { + surface = stripped.trim_start(); + } + let base = surface.split('[').next().unwrap_or(surface).trim(); + let terminal = base.rsplit('.').next().unwrap_or(base).trim(); + (!terminal.is_empty()).then(|| terminal.to_string()) +} + +fn collect_go_receiver_call_edges(tree: &Tree, source: &str) -> Vec { + let mut edges = Vec::new(); + walk_tree_nodes(tree.root_node(), &mut |callable| { + if !matches!( + callable.kind(), + "function_declaration" | "method_declaration" + ) { + return; + } + let Some(source_name) = declaration_name(callable, source) else { + return; + }; + let receiver_types = collect_go_parameter_types(callable, source); + if receiver_types.is_empty() { + return; + } + collect_receiver_call_specs_in_callable( + callable, + source, + &source_name, + ts_node_graph_span(callable), + &receiver_types, + go_selector_call, + &mut edges, + ); + }); + edges +} + +fn collect_php_receiver_call_edges(tree: &Tree, source: &str) -> Vec { + let mut edges = Vec::new(); + walk_tree_nodes(tree.root_node(), &mut |callable| { + if !matches!( + callable.kind(), + "function_definition" | "method_declaration" + ) { + return; + } + let Some(source_name) = declaration_name(callable, source) else { + return; + }; + let receiver_types = collect_php_parameter_types(callable, source); + if receiver_types.is_empty() { + return; + } + collect_receiver_call_specs_in_callable( + callable, + source, + &source_name, + ts_node_graph_span(callable), + &receiver_types, + php_member_call, + &mut edges, + ); + }); + edges +} + +fn collect_csharp_receiver_call_edges(tree: &Tree, source: &str) -> Vec { + let mut edges = Vec::new(); + walk_tree_nodes(tree.root_node(), &mut |callable| { + if callable.kind() != "method_declaration" { + return; + } + let Some(source_name) = declaration_name(callable, source) else { + return; + }; + let receiver_types = collect_csharp_parameter_types(callable, source); + if receiver_types.is_empty() { + return; + } + collect_receiver_call_specs_in_callable( + callable, + source, + &source_name, + ts_node_graph_span(callable), + &receiver_types, + csharp_member_call, + &mut edges, + ); + }); + edges +} + +fn collect_kotlin_receiver_call_edges(tree: &Tree, source: &str) -> Vec { + let mut edges = Vec::new(); + walk_tree_nodes(tree.root_node(), &mut |callable| { + if callable.kind() != "function_declaration" { + return; + } + let Some(source_name) = declaration_name(callable, source) else { + return; + }; + let receiver_types = collect_colon_parameter_types(callable, source); + if receiver_types.is_empty() { + return; + } + collect_receiver_call_specs_in_callable( + callable, + source, + &source_name, + ts_node_graph_span(callable), + &receiver_types, + kotlin_member_call, + &mut edges, + ); + }); + edges +} + +fn collect_swift_receiver_call_edges(tree: &Tree, source: &str) -> Vec { + let mut edges = Vec::new(); + walk_tree_nodes(tree.root_node(), &mut |callable| { + if callable.kind() != "function_declaration" { + return; + } + let Some(source_name) = declaration_name(callable, source) else { + return; + }; + let receiver_types = collect_colon_parameter_types(callable, source); + if receiver_types.is_empty() { + return; + } + collect_receiver_call_specs_in_callable( + callable, + source, + &source_name, + ts_node_graph_span(callable), + &receiver_types, + swift_member_call, + &mut edges, + ); + }); + edges +} + +fn collect_dart_receiver_call_edges(tree: &Tree, source: &str) -> Vec { + let mut edges = Vec::new(); + walk_tree_nodes(tree.root_node(), &mut |body| { + if body.kind() != "function_body" { + return; + } + let Some(signature) = dart_signature_for_body(body) else { + return; + }; + let Some(source_name) = dart_callable_name(signature, source) else { + return; + }; + let receiver_types = collect_prefix_parameter_types(signature, source); + if receiver_types.is_empty() { + return; + } + collect_receiver_call_specs_in_callable( + body, + source, + &source_name, + ts_node_graph_span(signature), + &receiver_types, + dart_member_call, + &mut edges, + ); + }); + edges +} + +fn collect_dart_direct_call_edges(tree: &Tree, source: &str) -> Vec { + let mut edges = Vec::new(); + walk_tree_nodes(tree.root_node(), &mut |body| { + if body.kind() != "function_body" { + return; + } + let Some(signature) = dart_signature_for_body(body) else { + return; + }; + let Some(source_name) = dart_callable_name(signature, source) else { + return; + }; + let source_span = ts_node_graph_span(signature); + walk_tree_nodes(body, &mut |node| { + let Some(target_name) = dart_direct_call(node, source) else { + return; + }; + edges.push(ManualPreciseCallSpec { + source_name: source_name.clone(), + source_span, + target_name, + line: Some(node.start_position().row as u32 + 1), + }); + }); + }); + edges +} + +fn dart_signature_for_body<'tree>(body: TsNode<'tree>) -> Option> { + previous_named_sibling_with_kind(body, &["method_signature", "function_signature"]) +} + +fn collect_ruby_receiver_call_edges(tree: &Tree, source: &str) -> Vec { + let mut edges = Vec::new(); + walk_tree_nodes(tree.root_node(), &mut |callable| { + if !matches!(callable.kind(), "method" | "singleton_method") { + return; + } + let Some(source_name) = declaration_name(callable, source) else { + return; + }; + let receiver_types = collect_ruby_local_constructor_types(callable, source); + if receiver_types.is_empty() { + return; + } + collect_receiver_call_specs_in_callable( + callable, + source, + &source_name, + ts_node_graph_span(callable), + &receiver_types, + ruby_receiver_call, + &mut edges, + ); + }); + edges +} + +fn collect_receiver_call_specs_in_callable( + callable: TsNode<'_>, + source: &str, + source_name: &str, + source_span: GraphNodeSpan, + receiver_types: &HashMap, + call_parts: fn(TsNode<'_>, &str) -> Option<(String, String)>, + edges: &mut Vec, +) { + walk_tree_nodes(callable, &mut |node| { + let Some((receiver_name, method_name)) = call_parts(node, source) else { + return; + }; + let Some(owner_name) = receiver_types.get(&receiver_name) else { + return; + }; + edges.push(ManualReceiverCallSpec { + source_name: source_name.to_string(), + source_span, + owner_name: owner_name.clone(), + method_name, + line: Some(node.start_position().row as u32 + 1), + }); + }); +} + +fn collect_go_parameter_types(callable: TsNode<'_>, source: &str) -> HashMap { + let mut receiver_types = HashMap::new(); + let Some(parameters) = callable.child_by_field_name("parameters") else { + return receiver_types; + }; + walk_tree_nodes(parameters, &mut |node| { + if !matches!( + node.kind(), + "parameter_declaration" | "variadic_parameter_declaration" + ) { + return; + } + let Some(type_node) = node.child_by_field_name("type") else { + return; + }; + let Some(raw_type) = trimmed_node_text(type_node, source) else { + return; + }; + let Some(owner_name) = normalize_go_type_surface(&raw_type) else { + return; + }; + let mut cursor = node.walk(); + for child in node.named_children(&mut cursor) { + if child.kind() == "identifier" + && child.end_byte() <= type_node.start_byte() + && let Some(name) = normalized_receiver_variable(child, source) + { + receiver_types.insert(name, owner_name.clone()); + } + } + }); + receiver_types +} + +fn collect_php_parameter_types(callable: TsNode<'_>, source: &str) -> HashMap { + let mut receiver_types = HashMap::new(); + let Some(parameters) = callable.child_by_field_name("parameters") else { + return receiver_types; + }; + walk_tree_nodes(parameters, &mut |node| { + if !matches!( + node.kind(), + "simple_parameter" | "variadic_parameter" | "property_promotion_parameter" + ) { + return; + } + let Some(type_node) = node.child_by_field_name("type") else { + return; + }; + let Some(raw_type) = trimmed_node_text(type_node, source) else { + return; + }; + let Some(owner_name) = normalize_type_surface(&raw_type) else { + return; + }; + let Some(name_node) = node.child_by_field_name("name") else { + return; + }; + if let Some(name) = normalized_receiver_variable(name_node, source) { + receiver_types.insert(name, owner_name); + } + }); + receiver_types +} + +fn collect_csharp_parameter_types(callable: TsNode<'_>, source: &str) -> HashMap { + let mut receiver_types = HashMap::new(); + let Some(parameters) = callable.child_by_field_name("parameters") else { + return receiver_types; + }; + walk_tree_nodes(parameters, &mut |node| { + if node.kind() != "parameter" { + return; + } + let Some(type_node) = descendant_by_field_name(node, "type") else { + return; + }; + let Some(raw_type) = trimmed_node_text(type_node, source) else { + return; + }; + let Some(owner_name) = normalize_type_surface(&raw_type) else { + return; + }; + let Some(name_node) = node.child_by_field_name("name") else { + return; + }; + if let Some(name) = normalized_receiver_variable(name_node, source) { + receiver_types.insert(name, owner_name); + } + }); + receiver_types +} + +fn collect_ruby_local_constructor_types( + callable: TsNode<'_>, + source: &str, +) -> HashMap { + let mut receiver_types = HashMap::new(); + walk_tree_nodes(callable, &mut |node| { + if node.kind() != "assignment" { + return; + } + let Some(left_node) = node.child_by_field_name("left") else { + return; + }; + let Some(receiver_name) = normalized_receiver_variable(left_node, source) else { + return; + }; + let Some(right_node) = node.child_by_field_name("right") else { + return; + }; + let Some(owner_name) = ruby_constructor_owner(right_node, source) else { + return; + }; + receiver_types.insert(receiver_name, owner_name); + }); + receiver_types +} + +fn collect_colon_parameter_types(callable: TsNode<'_>, source: &str) -> HashMap { + let mut receiver_types = HashMap::new(); + let Some(parameters) = signature_parameter_surface(callable, source) else { + return receiver_types; + }; + for parameter in split_top_level_parameters(¶meters) { + let Some((name_side, type_side)) = parameter.split_once(':') else { + continue; + }; + let Some(receiver_name) = parameter_name_before_colon(name_side) else { + continue; + }; + let Some(owner_name) = normalize_type_surface(¶meter_type_after_colon(type_side)) + else { + continue; + }; + receiver_types.insert(receiver_name, owner_name); + } + receiver_types +} + +fn collect_prefix_parameter_types(callable: TsNode<'_>, source: &str) -> HashMap { + let mut receiver_types = HashMap::new(); + let Some(parameters) = signature_parameter_surface(callable, source) else { + return receiver_types; + }; + for parameter in split_top_level_parameters(¶meters) { + let parameter = parameter + .split('=') + .next() + .unwrap_or(parameter.as_str()) + .trim(); + let tokens = parameter + .split_whitespace() + .filter(|token| !matches!(*token, "final" | "const" | "var" | "required")) + .collect::>(); + if tokens.len() < 2 { + continue; + } + let receiver_name = tokens.last().copied().unwrap_or_default(); + if receiver_name.starts_with("this.") || receiver_name.starts_with("super.") { + continue; + } + let raw_type = tokens[..tokens.len() - 1].join(" "); + let Some(receiver_name) = normalize_parameter_name(receiver_name) else { + continue; + }; + let Some(owner_name) = normalize_type_surface(&raw_type) else { + continue; + }; + receiver_types.insert(receiver_name, owner_name); + } + receiver_types +} + +fn signature_parameter_surface(callable: TsNode<'_>, source: &str) -> Option { + let text = trimmed_node_text(callable, source)?; + let start = text.find('(')?; + let mut depth = 0usize; + let mut parameter_start = None; + for (index, ch) in text.char_indices().skip_while(|(index, _)| *index < start) { + match ch { + '(' => { + depth = depth.saturating_add(1); + if depth == 1 { + parameter_start = Some(index + ch.len_utf8()); + } + } + ')' => { + if depth == 1 { + let parameter_start = parameter_start?; + return Some(text[parameter_start..index].to_string()); + } + depth = depth.saturating_sub(1); + } + _ => {} + } + } + None +} + +fn split_top_level_parameters(parameters: &str) -> Vec { + let mut parts = Vec::new(); + let mut current = String::new(); + let mut paren_depth = 0usize; + let mut bracket_depth = 0usize; + let mut brace_depth = 0usize; + let mut angle_depth = 0usize; + for ch in parameters.chars() { + match ch { + '(' => paren_depth = paren_depth.saturating_add(1), + ')' => paren_depth = paren_depth.saturating_sub(1), + '[' => bracket_depth = bracket_depth.saturating_add(1), + ']' => bracket_depth = bracket_depth.saturating_sub(1), + '{' => brace_depth = brace_depth.saturating_add(1), + '}' => brace_depth = brace_depth.saturating_sub(1), + '<' => angle_depth = angle_depth.saturating_add(1), + '>' => angle_depth = angle_depth.saturating_sub(1), + ',' if paren_depth == 0 + && bracket_depth == 0 + && brace_depth == 0 + && angle_depth == 0 => + { + let part = current.trim(); + if !part.is_empty() { + parts.push(part.to_string()); + } + current.clear(); + continue; + } + _ => {} + } + current.push(ch); + } + let part = current.trim(); + if !part.is_empty() { + parts.push(part.to_string()); + } + parts +} + +fn parameter_name_before_colon(name_side: &str) -> Option { + name_side + .split_whitespace() + .last() + .and_then(normalize_parameter_name) +} + +fn parameter_type_after_colon(type_side: &str) -> String { + type_side + .split('=') + .next() + .unwrap_or(type_side) + .split("->") + .next() + .unwrap_or(type_side) + .split("where") + .next() + .unwrap_or(type_side) + .split_whitespace() + .filter(|token| { + !matches!( + *token, + "inout" | "borrowing" | "consuming" | "some" | "any" | "final" | "const" + ) + }) + .collect::>() + .join(" ") +} + +fn normalize_parameter_name(raw: &str) -> Option { + let trimmed = raw.trim().trim_end_matches(',').trim(); + if trimmed == "_" { + return None; + } + let terminal = trimmed.rsplit('.').next().unwrap_or(trimmed); + let cleaned = terminal + .trim_start_matches('$') + .trim_matches(|ch: char| !ch.is_alphanumeric() && ch != '_'); + (!cleaned.is_empty()).then(|| cleaned.to_string()) +} + +fn go_selector_call(node: TsNode<'_>, source: &str) -> Option<(String, String)> { + if node.kind() != "call_expression" { + return None; + } + let function = node.child_by_field_name("function")?; + if function.kind() != "selector_expression" { + return None; + } + let receiver = function.child_by_field_name("operand")?; + let method = function.child_by_field_name("field")?; + Some(( + normalized_receiver_variable(receiver, source)?, + trimmed_node_text(method, source)?, + )) +} + +fn php_member_call(node: TsNode<'_>, source: &str) -> Option<(String, String)> { + if !matches!( + node.kind(), + "member_call_expression" | "nullsafe_member_call_expression" + ) { + return None; + } + let receiver = node.child_by_field_name("object")?; + let method = node.child_by_field_name("name")?; + Some(( + normalized_receiver_variable(receiver, source)?, + trimmed_node_text(method, source)?, + )) +} + +fn csharp_member_call(node: TsNode<'_>, source: &str) -> Option<(String, String)> { + if node.kind() != "invocation_expression" { + return None; + } + let function = node.child_by_field_name("function")?; + if function.kind() != "member_access_expression" { + return None; + } + let receiver = function.child_by_field_name("expression")?; + let method = function.child_by_field_name("name")?; + Some(( + normalized_receiver_variable(receiver, source)?, + trimmed_node_text(method, source)?, + )) +} + +fn ruby_receiver_call(node: TsNode<'_>, source: &str) -> Option<(String, String)> { + if node.kind() != "call" { + return None; + } + let receiver = node.child_by_field_name("receiver")?; + let method = node.child_by_field_name("method")?; + let method_name = trimmed_node_text(method, source)?; + if method_name == "new" { + return None; + } + Some((normalized_receiver_variable(receiver, source)?, method_name)) +} + +fn kotlin_member_call(node: TsNode<'_>, source: &str) -> Option<(String, String)> { + if node.kind() != "call_expression" { + return None; + } + surface_member_call(node, source) +} + +fn swift_member_call(node: TsNode<'_>, source: &str) -> Option<(String, String)> { + if node.kind() != "call_expression" { + return None; + } + surface_member_call(node, source) +} + +fn dart_member_call(node: TsNode<'_>, source: &str) -> Option<(String, String)> { + if !matches!(node.kind(), "expression_statement" | "return_statement") { + return None; + } + surface_member_call(node, source) +} + +fn dart_direct_call(node: TsNode<'_>, source: &str) -> Option { + if !matches!(node.kind(), "expression_statement" | "return_statement") { + return None; + } + let text = trimmed_node_text(node, source)?; + let callable = text + .split('(') + .next() + .unwrap_or(text.as_str()) + .trim() + .trim_end_matches(';') + .trim(); + if callable.contains('.') { + return None; + } + let callable = callable + .strip_prefix("return") + .map(str::trim) + .unwrap_or(callable); + callable + .split_whitespace() + .last() + .and_then(normalize_parameter_name) +} + +fn surface_member_call(node: TsNode<'_>, source: &str) -> Option<(String, String)> { + let text = trimmed_node_text(node, source)?; + let callable = text + .split('(') + .next() + .unwrap_or(text.as_str()) + .trim() + .trim_end_matches(';') + .trim(); + let separator = callable.rfind('.')?; + let receiver = callable[..separator].trim().trim_end_matches('?').trim(); + let method = callable[separator + 1..] + .trim() + .trim_start_matches('?') + .trim(); + Some(( + normalized_receiver_surface(receiver)?, + normalize_parameter_name(method)?, + )) +} + +fn normalized_receiver_surface(raw: &str) -> Option { + let terminal = raw + .rsplit([' ', '\t', '\n', '\r', '(', '[', '{']) + .find(|part| !part.trim().is_empty()) + .unwrap_or(raw) + .trim() + .trim_end_matches('?') + .trim(); + normalize_parameter_name(terminal) +} + +fn dart_callable_name(node: TsNode<'_>, source: &str) -> Option { + descendant_by_field_name(node, "name") + .or_else(|| first_descendant_with_kind(node, "identifier")) + .and_then(|name_node| trimmed_node_text(name_node, source)) +} + +fn ruby_constructor_owner(node: TsNode<'_>, source: &str) -> Option { + if node.kind() != "call" { + return None; + } + let method = node.child_by_field_name("method")?; + if trimmed_node_text(method, source).as_deref() != Some("new") { + return None; + } + let receiver = node.child_by_field_name("receiver")?; + let raw_owner = trimmed_node_text(receiver, source)?; + normalize_type_surface(&raw_owner) +} + +fn normalized_receiver_variable(node: TsNode<'_>, source: &str) -> Option { + let text = trimmed_node_text(node, source)?; + let trimmed = text.trim(); + let without_dollars = trimmed.trim_start_matches('$'); + (!without_dollars.is_empty()).then(|| without_dollars.to_string()) +} + +fn normalize_type_surface(raw: &str) -> Option { + let mut surface = raw.trim(); + if surface.contains('|') || surface.contains('&') { + return None; + } + surface = surface.trim_start_matches('?').trim(); + while let Some(stripped) = surface.strip_prefix('*') { + surface = stripped.trim_start(); + } + while let Some(stripped) = surface.strip_prefix('&') { + surface = stripped.trim_start(); + } + if let Some(stripped) = surface.strip_prefix("[]") { + surface = stripped.trim_start(); + } + surface = surface.trim_end_matches('?').trim(); + let base = surface + .split(['<', '[', '(']) + .next() + .unwrap_or(surface) + .trim(); + let terminal = base + .rsplit(['\\', '.', ':']) + .find(|segment| !segment.trim().is_empty()) + .unwrap_or(base) + .trim(); + (!terminal.is_empty()).then(|| terminal.to_string()) +} + +fn collect_enclosing_type_member_edges( + tree: &Tree, + source: &str, + owner_kinds: &[&str], + member_kinds: &[&str], +) -> Vec { + let mut edges = Vec::new(); + walk_tree_nodes(tree.root_node(), &mut |node| { + if !member_kinds.contains(&node.kind()) { + return; + } + let Some(owner_node) = enclosing_node_with_kind(node, owner_kinds) else { + return; + }; + let Some(owner_name) = declaration_name(owner_node, source) else { + return; + }; + let Some(target_name) = declaration_name(node, source) else { + return; + }; + + edges.push(ManualMemberEdgeSpec { + source_name: owner_name, + target_name, + source_span: ts_node_graph_span(owner_node), + target_span: ts_node_graph_span(node), + line: Some(node.start_position().row as u32 + 1), + }); + }); + edges +} + +fn enclosing_node_with_kind<'tree>( + mut node: TsNode<'tree>, + kinds: &[&str], +) -> Option> { + while let Some(parent) = node.parent() { + if kinds.contains(&parent.kind()) { + return Some(parent); + } + node = parent; + } + None +} + +fn previous_named_sibling_with_kind<'tree>( + mut node: TsNode<'tree>, + kinds: &[&str], +) -> Option> { + while let Some(previous) = node.prev_named_sibling() { + if kinds.contains(&previous.kind()) { + return Some(previous); } - edge.id = EdgeId(generate_edge_id_for_edge(&edge, flags)); - result_edges.push(edge); + node = previous; } + None +} + +fn declaration_name(node: TsNode<'_>, source: &str) -> Option { + node.child_by_field_name("name") + .or_else(|| first_named_identifier_like_child(node)) + .and_then(|name_node| trimmed_node_text(name_node, source)) +} + +fn first_named_identifier_like_child<'tree>(node: TsNode<'tree>) -> Option> { + let mut cursor = node.walk(); + node.named_children(&mut cursor).find(|child| { + matches!( + child.kind(), + "identifier" + | "field_identifier" + | "type_identifier" + | "name" + | "constant" + | "scope_resolution" + ) + }) } fn append_runtime_import_edges( @@ -4320,6 +5969,11 @@ fn queue_qualified_child_names( // Keep members of type-like owners owner-qualified in both name fields so // downstream resolution can distinguish declared members from placeholder/reference nodes. if parent.is_type_like { + if promotes_type_member_functions_to_methods(traversal.language_name) + && child_node.kind == NodeKind::FUNCTION + { + child_node.kind = NodeKind::METHOD; + } child_node.serialized_name = format!( "{}{}{}", parent.serialized_name, delimiter, child_node.serialized_name @@ -4330,6 +5984,10 @@ fn queue_qualified_child_names( } } +fn promotes_type_member_functions_to_methods(language_name: &str) -> bool { + matches!(language_name, "kotlin" | "swift" | "dart") +} + fn qualified_name_delimiter(language_name: &str) -> &'static str { match language_name { "rust" | "cpp" | "c" => "::", @@ -4516,6 +6174,16 @@ fn remap_edges( if let Some(new_id) = id_remap.get(&edge.target) { edge.target = *new_id; } + if let Some(resolved_source) = edge.resolved_source + && let Some(new_id) = id_remap.get(&resolved_source) + { + edge.resolved_source = Some(*new_id); + } + if let Some(resolved_target) = edge.resolved_target + && let Some(new_id) = id_remap.get(&resolved_target) + { + edge.resolved_target = Some(*new_id); + } edge.file_node_id = Some(new_file_id); if !flags.legacy_edge_identity { ensure_callsite_identity(edge, None); @@ -8535,29 +10203,37 @@ fn is_api_endpoint_call_context(line: &str, literal_col: u32) -> bool { } let compact_before = compact_lowercase(before_literal); - if compact_before.contains("fetch(") || compact_before.contains("axios(") { + if compact_before.contains("fetch(") { return true; } let methods = ["delete", "patch", "post", "put", "head", "options", "get"]; - let client_receivers = [ - "axios", - "requests", - "reqwest", - "http", - "$http", - "ky", - "got", - "httpclient", - ]; - client_receivers.iter().any(|receiver| { - methods.iter().any(|method| { - compact_before.contains(&format!("{receiver}.{method}(")) - || compact_before.contains(&format!("{receiver}::{method}(")) - }) + methods.iter().any(|method| { + let dot_call = format!(".{method}("); + let path_call = format!("::{method}("); + (compact_before.ends_with(&dot_call) || compact_before.ends_with(&path_call)) + && !is_server_route_registration_context(&compact_before, method) }) } +fn is_server_route_registration_context(compact_before: &str, method: &str) -> bool { + let route_call = format!(".{method}("); + let Some(receiver) = compact_before.strip_suffix(&route_call) else { + return false; + }; + let receiver = receiver + .rsplit(|ch: char| !(ch.is_ascii_alphanumeric() || ch == '_' || ch == '.')) + .next() + .unwrap_or(receiver) + .rsplit('.') + .next() + .unwrap_or(receiver); + matches!( + receiver, + "app" | "router" | "route" | "server" | "fastify" | "hono" + ) +} + fn has_line_comment_before_literal(value: &str) -> bool { let mut chars = value.char_indices().peekable(); let mut quote: Option = None; @@ -9056,6 +10732,17 @@ pub fn index_file( flags, &mut callsite_ordinals, ); + append_manual_precise_call_edges( + language_config.language_name, + &tree, + source, + &unique_nodes, + file_id, + &mut result_edges, + &mut edge_keys, + flags, + &mut callsite_ordinals, + ); append_manual_c_enum_member_edges( language_config.language_name, &tree, @@ -9066,6 +10753,29 @@ pub fn index_file( &mut edge_keys, flags, ); + append_manual_member_edges( + ManualMemberEdgeContext { + language_name: language_config.language_name, + tree: &tree, + source, + unique_nodes: &unique_nodes, + file_id, + flags, + }, + &mut result_edges, + &mut edge_keys, + ); + append_manual_receiver_call_edges( + language_config.language_name, + &tree, + source, + &unique_nodes, + file_id, + &mut result_edges, + &mut edge_keys, + flags, + &mut callsite_ordinals, + ); append_runtime_import_edges( &runtime_import_specs, &unique_nodes, @@ -9208,96 +10918,19 @@ pub fn index_file( }) } +pub fn language_support_profile_for_ext(ext: &str) -> Option { + codestory_contracts::language_support::language_support_profile_for_ext(ext).copied() +} + +pub fn language_support_profile_for_language_name( + language_name: &str, +) -> Option { + codestory_contracts::language_support::language_support_profile_for_language_name(language_name) + .copied() +} + pub fn get_language_for_ext(ext: &str) -> Option { - let ext = ext.trim().trim_start_matches('.').to_ascii_lowercase(); - match ext.as_str() { - // Keep this extension map aligned with the top-level live rule registry. - "py" | "pyi" => Some(make_language_config( - tree_sitter_python::LANGUAGE.into(), - "python", - PYTHON_GRAPH_QUERY, - None, - LanguageRuleset::Python, - )), - "java" => Some(make_language_config( - tree_sitter_java::LANGUAGE.into(), - "java", - JAVA_GRAPH_QUERY, - None, - LanguageRuleset::Java, - )), - "rs" => Some(make_language_config( - tree_sitter_rust::LANGUAGE.into(), - "rust", - RUST_GRAPH_QUERY, - Some(RUST_TAGS_QUERY), - LanguageRuleset::Rust, - )), - "js" | "jsx" | "mjs" | "cjs" => Some(make_language_config( - tree_sitter_javascript::LANGUAGE.into(), - "javascript", - JAVASCRIPT_GRAPH_QUERY, - None, - LanguageRuleset::JavaScript, - )), - "ts" | "mts" | "cts" => Some(make_language_config( - tree_sitter_typescript::LANGUAGE_TYPESCRIPT.into(), - "typescript", - TYPESCRIPT_GRAPH_QUERY, - Some(TYPESCRIPT_TAGS_QUERY), - LanguageRuleset::TypeScript, - )), - "tsx" => Some(make_language_config( - tree_sitter_typescript::LANGUAGE_TSX.into(), - "typescript", - TSX_GRAPH_QUERY, - Some(TSX_TAGS_QUERY), - LanguageRuleset::Tsx, - )), - "cpp" | "cc" | "cxx" | "hpp" | "hh" | "hxx" => Some(make_language_config( - tree_sitter_cpp::LANGUAGE.into(), - "cpp", - CPP_GRAPH_QUERY, - None, - LanguageRuleset::Cpp, - )), - "c" | "h" => Some(make_language_config( - tree_sitter_c::LANGUAGE.into(), - "c", - C_GRAPH_QUERY, - None, - LanguageRuleset::C, - )), - "go" => Some(make_language_config( - tree_sitter_go::LANGUAGE.into(), - "go", - GO_GRAPH_QUERY, - None, - LanguageRuleset::Go, - )), - "rb" => Some(make_language_config( - tree_sitter_ruby::LANGUAGE.into(), - "ruby", - RUBY_GRAPH_QUERY, - None, - LanguageRuleset::Ruby, - )), - "php" => Some(make_language_config( - tree_sitter_php::LANGUAGE_PHP.into(), - "php", - PHP_GRAPH_QUERY, - None, - LanguageRuleset::Php, - )), - "cs" => Some(make_language_config( - tree_sitter_c_sharp::LANGUAGE.into(), - "csharp", - CSHARP_GRAPH_QUERY, - None, - LanguageRuleset::CSharp, - )), - _ => None, - } + language_configs::get_language_for_ext(ext) } pub fn generate_id(name: &str) -> i64 { @@ -11198,6 +12831,63 @@ class Test { let tsx = get_language_for_ext("tsx").expect("tsx config"); assert_eq!(tsx.graph_query, TSX_GRAPH_QUERY); assert_eq!(tsx.tags_query, Some(TSX_TAGS_QUERY)); + + let kotlin = get_language_for_ext("kt").expect("kotlin config"); + assert_eq!(kotlin.graph_query, KOTLIN_GRAPH_QUERY); + + let swift = get_language_for_ext("swift").expect("swift config"); + assert_eq!(swift.graph_query, SWIFT_GRAPH_QUERY); + + let dart = get_language_for_ext("dart").expect("dart config"); + assert_eq!(dart.graph_query, DART_GRAPH_QUERY); + + let bash = get_language_for_ext("sh").expect("bash config"); + assert_eq!(bash.graph_query, BASH_GRAPH_QUERY); + } + + #[test] + fn test_language_support_profiles_separate_runtime_claims() { + let rust = language_support_profile_for_ext("rs").expect("rust profile"); + assert_eq!(rust.support_mode, LanguageSupportMode::ParserBackedGraph); + assert_eq!(rust.evidence_tier, LanguageEvidenceTier::GraphFidelity); + assert_eq!(rust.claim_label, "parser-backed graph, fidelity-gated"); + + let go = language_support_profile_for_ext("go").expect("go profile"); + assert_eq!(go.support_mode, LanguageSupportMode::ParserBackedGraph); + assert_eq!(go.evidence_tier, LanguageEvidenceTier::GraphFidelity); + assert_eq!(go.claim_label, "parser-backed graph, fidelity-gated"); + + let structural = language_support_profile_for_ext("html").expect("html profile"); + assert_eq!( + structural.support_mode, + LanguageSupportMode::StructuralCollector + ); + assert_eq!( + structural.evidence_tier, + LanguageEvidenceTier::StructuralOnly + ); + assert!( + language_support_profile_for_ext("cshtml").is_none(), + ".cshtml stays compatibility-only until Razor support has a public profile" + ); + assert!( + get_language_for_ext("cshtml").is_none(), + ".cshtml must not route into parser-backed indexing without a public profile" + ); + + for profile in codestory_contracts::language_support::LANGUAGE_SUPPORT_PROFILES { + if profile.support_mode == LanguageSupportMode::ParserBackedGraph { + for ext in profile.extensions { + assert_eq!(profile.evidence_tier, LanguageEvidenceTier::GraphFidelity); + assert!( + get_language_for_ext(ext).is_some(), + "parser-backed language {} extension {} must route into live indexing", + profile.language_name, + ext + ); + } + } + } } #[test] @@ -11376,6 +13066,201 @@ typedef struct Worker { "MEMBER".to_string() ))); + let kotlin = execute_raw_graph_contract( + Path::new("Main.kt"), + r#" +package demo.game + +import demo.tools.Helper + +open class Base + +class Worker : Base() { + fun run() { + helper() + } +} + +fun helper() {} +typealias Alias = Worker +"#, + &get_language_for_ext("kt").expect("kotlin config"), + )?; + assert!( + kotlin + .nodes + .contains(&("CLASS".to_string(), "Worker".to_string())) + ); + assert!( + kotlin + .nodes + .contains(&("FUNCTION".to_string(), "helper".to_string())) + ); + assert!(kotlin.edges.contains(&( + "Worker".to_string(), + "run".to_string(), + "MEMBER".to_string() + ))); + assert!( + kotlin.edges.contains(&( + "Worker".to_string(), + "Base".to_string(), + "INHERITANCE".to_string() + )), + "kotlin raw graph nodes: {:?}; edges: {:?}", + kotlin.nodes, + kotlin.edges + ); + assert!(kotlin.edges.contains(&( + "helper".to_string(), + "helper".to_string(), + "CALL".to_string() + ))); + assert!(kotlin.edges.contains(&( + "demo.tools.Helper".to_string(), + "demo.tools.Helper".to_string(), + "IMPORT".to_string() + ))); + + let swift = execute_raw_graph_contract( + Path::new("Main.swift"), + r#" +import Foundation + +protocol Runnable { + func run() +} + +class Base {} + +class Worker: Base, Runnable { + func run() { + helper() + } +} + +func helper() {} +typealias Alias = Worker +"#, + &get_language_for_ext("swift").expect("swift config"), + )?; + assert!( + swift + .nodes + .contains(&("CLASS".to_string(), "Worker".to_string())) + ); + assert!( + swift + .nodes + .contains(&("INTERFACE".to_string(), "Runnable".to_string())) + ); + assert!( + swift + .nodes + .contains(&("FUNCTION".to_string(), "helper".to_string())) + ); + assert!(swift.edges.contains(&( + "Worker".to_string(), + "run".to_string(), + "MEMBER".to_string() + ))); + assert!(swift.edges.contains(&( + "Worker".to_string(), + "Base".to_string(), + "INHERITANCE".to_string() + ))); + assert!(swift.edges.contains(&( + "helper".to_string(), + "helper".to_string(), + "CALL".to_string() + ))); + assert!(swift.edges.contains(&( + "Foundation".to_string(), + "Foundation".to_string(), + "IMPORT".to_string() + ))); + + let dart = execute_raw_graph_contract( + Path::new("main.dart"), + r#" +import 'dart:math'; + +class Base {} + +class Worker extends Base { + void run() { + helper(); + } +} + +void helper() {} +"#, + &get_language_for_ext("dart").expect("dart config"), + )?; + assert!( + dart.nodes + .contains(&("CLASS".to_string(), "Worker".to_string())) + ); + assert!( + dart.nodes + .contains(&("FUNCTION".to_string(), "helper".to_string())) + ); + assert!(dart.edges.contains(&( + "Worker".to_string(), + "run".to_string(), + "MEMBER".to_string() + ))); + assert!(dart.edges.contains(&( + "Worker".to_string(), + "Base".to_string(), + "INHERITANCE".to_string() + ))); + assert!(dart.edges.contains(&( + "helper".to_string(), + "helper".to_string(), + "CALL".to_string() + ))); + assert!(dart.edges.contains(&( + "'dart:math'".to_string(), + "'dart:math'".to_string(), + "IMPORT".to_string() + ))); + + let bash = execute_raw_graph_contract( + Path::new("main.sh"), + r#" +NAME=world + +helper() { + echo "$NAME" +} + +main() { + helper +} + +main +"#, + &get_language_for_ext("sh").expect("bash config"), + )?; + assert!( + bash.nodes + .contains(&("FUNCTION".to_string(), "helper".to_string())) + ); + assert!( + bash.nodes + .contains(&("VARIABLE".to_string(), "NAME".to_string())) + ); + assert!(bash.edges.contains(&( + "helper".to_string(), + "helper".to_string(), + "CALL".to_string() + ))); + assert!( + bash.edges + .contains(&("main".to_string(), "main".to_string(), "CALL".to_string())) + ); + Ok(()) } @@ -11458,6 +13343,62 @@ typedef struct Worker { for kind in ["struct_specifier", "field_declaration", "type_definition"] { assert!(c_kinds.contains(kind), "c grammar should expose {kind}"); } + + let kotlin_kinds = parser_node_kinds(tree_sitter_kotlin_ng::LANGUAGE.into()); + for kind in [ + "class_declaration", + "function_declaration", + "call_expression", + "import", + "delegation_specifier", + ] { + assert!( + kotlin_kinds.contains(kind), + "kotlin grammar should expose {kind}" + ); + } + + let swift_kinds = parser_node_kinds(tree_sitter_swift::LANGUAGE.into()); + for kind in [ + "class_declaration", + "protocol_declaration", + "function_declaration", + "call_expression", + "import_declaration", + ] { + assert!( + swift_kinds.contains(kind), + "swift grammar should expose {kind}" + ); + } + + let dart_kinds = parser_node_kinds(tree_sitter_dart_orchard::LANGUAGE.into()); + for kind in [ + "class_definition", + "function_signature", + "method_signature", + "selector", + "argument_part", + "import_specification", + ] { + assert!( + dart_kinds.contains(kind), + "dart grammar should expose {kind}" + ); + } + + let bash_kinds = parser_node_kinds(tree_sitter_bash::LANGUAGE.into()); + for kind in [ + "function_definition", + "command", + "command_name", + "variable_assignment", + ] { + assert!( + bash_kinds.contains(kind), + "bash grammar should expose {kind}" + ); + } } #[test] @@ -12734,7 +14675,7 @@ export async function loadUsers() { } export async function createUser() { - return axios.post("/api/users", {}); + return apiClient.post("/api/users", {}); } "#; let language_config = get_language_for_ext("ts").expect("typescript config"); diff --git a/crates/codestory-indexer/src/resolution/candidate_selection.rs b/crates/codestory-indexer/src/resolution/candidate_selection.rs index cd6303ae..50b25392 100644 --- a/crates/codestory-indexer/src/resolution/candidate_selection.rs +++ b/crates/codestory-indexer/src/resolution/candidate_selection.rs @@ -6,7 +6,16 @@ pub(super) fn compute_call_resolution( row: &UnresolvedEdgeRow, semantic_candidates: &[SemanticResolutionCandidate], ) -> Result { - let (edge_id, file_id, caller_qualified, _, target_name, _, callsite_identity) = row; + let ( + edge_id, + file_id, + caller_qualified, + _source_name, + target_name, + target_node_id, + _caller_file_path, + callsite_identity, + ) = row; let prepared_name = PreparedName::new(target_name.clone()); let is_common_unqualified = is_common_unqualified_call_name(&prepared_name.original); let is_owner_qualified = is_owner_qualified_call_name(&prepared_name.original); @@ -29,6 +38,20 @@ pub(super) fn compute_call_resolution( } } + if selected.is_none() + && !is_common_unqualified + && candidate_index.is_import_binding_node(*target_node_id) + { + if pass.flags.store_candidates { + candidate_ids.push(*target_node_id); + } + selected = Some(( + *target_node_id, + pass.policy.call_same_file, + ResolutionStrategy::CallSameFile, + )); + } + if selected.is_none() && !is_common_unqualified && let Some(candidate) = candidate_index.find_same_file_readonly( @@ -142,14 +165,40 @@ fn is_owner_qualified_call_name(name: &str) -> bool { name.contains("::") || name.contains('.') } +fn is_relative_import_module_name(name: &str) -> bool { + normalize_import_module_name(name) + .is_some_and(|module| module.starts_with("./") || module.starts_with("../")) +} + +fn is_import_binding_name(name: &str) -> bool { + normalize_import_module_name(name).is_some_and(|module| { + !module.is_empty() + && !module.starts_with("./") + && !module.starts_with("../") + && !module.starts_with('/') + && !module.contains('/') + }) +} + pub(super) fn compute_import_resolution( pass: &ResolutionPass, candidate_index: &CandidateIndex, row: &UnresolvedEdgeRow, semantic_candidates: &[SemanticResolutionCandidate], ) -> Result { - let (edge_id, file_id, caller_qualified, source_name, target_name, _, _) = row; + let ( + edge_id, + file_id, + caller_qualified, + source_name, + target_name, + _target_node_id, + caller_file_path, + _callsite_identity, + ) = row; let has_alias = import_alias_mismatch(source_name, target_name); + let has_relative_import_binding = + is_import_binding_name(source_name) && is_relative_import_module_name(target_name); let caller_prefix = caller_qualified.as_deref().and_then(module_prefix); let name_candidates = import_name_candidates(target_name, pass.flags.legacy_mode) .into_iter() @@ -178,6 +227,22 @@ pub(super) fn compute_import_resolution( let mut same_module_selected: Option = None; let mut global_selected: Option = None; let mut fuzzy_selected: Option = None; + let mut relative_file_selected: Option = None; + + if has_relative_import_binding { + let alias_name = PreparedName::new(source_name.clone()); + relative_file_selected = candidate_index.find_relative_import_readonly( + caller_file_path.as_deref(), + target_name, + &alias_name.original, + &alias_name.ascii_lower, + ); + if let Some(candidate) = relative_file_selected + && pass.flags.store_candidates + { + candidate_ids.push(candidate); + } + } for name in &name_candidates { if pass.flags.legacy_mode @@ -250,7 +315,13 @@ pub(super) fn compute_import_resolution( } let mut selected: Option<(i64, f32, ResolutionStrategy)> = - if let Some(candidate) = same_file_selected { + if let Some(candidate) = relative_file_selected { + Some(( + candidate, + ResolutionCertainty::CERTAIN_MIN, + ResolutionStrategy::ImportSameModule, + )) + } else if let Some(candidate) = same_file_selected { Some(( candidate, pass.policy.import_same_file, @@ -278,12 +349,15 @@ pub(super) fn compute_import_resolution( }) }; - if has_alias && !matches!(selected, Some((_, _, ResolutionStrategy::ImportSameFile))) { + if (has_alias || has_relative_import_binding) + && relative_file_selected.is_none() + && !matches!(selected, Some((_, _, ResolutionStrategy::ImportSameFile))) + { selected = None; } if selected.is_none() - && !has_alias + && !(has_alias || has_relative_import_binding) && let Some((candidate, confidence)) = semantic_fallback { selected = Some(( @@ -386,6 +460,7 @@ mod tests { Some(caller_qualified.to_string()), "caller".to_string(), target_name.to_string(), + 0, Some("src/main.ts".to_string()), callsite_identity.map(str::to_string), ) @@ -441,6 +516,39 @@ mod tests { Ok(()) } + #[test] + fn imported_binding_call_resolves_to_exact_alias_node() -> Result<()> { + let row = ( + 7_i64, + Some(1_i64), + Some("pkg::core::caller".to_string()), + "caller".to_string(), + "dispatchRequest".to_string(), + 77_i64, + Some("src/main.ts".to_string()), + Some("1:1:1:1".to_string()), + ); + let mut index = CandidateIndex::default(); + index.import_binding_node_ids.insert(77); + let computed = compute_call_resolution( + &resolution_pass(false), + &index, + &row, + &[SemanticResolutionCandidate { + target_node_id: 88, + confidence: 0.62, + }], + )?; + + assert_eq!(computed.strategy, Some(ResolutionStrategy::CallSameFile)); + assert_eq!(computed.update.resolved_target_node_id, Some(77)); + assert_eq!( + computed.update.certainty, + Some(ResolutionCertainty::Certain.as_str()) + ); + Ok(()) + } + #[test] fn common_name_candidate_payload_keeps_semantic_and_index_candidates() -> Result<()> { let conn = Connection::open_in_memory()?; diff --git a/crates/codestory-indexer/src/resolution/mod.rs b/crates/codestory-indexer/src/resolution/mod.rs index fb03f85b..fba97ba5 100644 --- a/crates/codestory-indexer/src/resolution/mod.rs +++ b/crates/codestory-indexer/src/resolution/mod.rs @@ -28,6 +28,7 @@ type UnresolvedEdgeRow = ( Option, String, String, + i64, Option, Option, ); @@ -47,7 +48,8 @@ const SCOPED_CALLER_TABLE: &str = "resolution_scoped_caller_ids"; type SameFileCacheKey = (i64, String, String); type SameModuleCacheKey = (String, String, String); type NameCacheKey = (String, String); -const RESOLUTION_SUPPORT_SNAPSHOT_VERSION: i64 = 1; +type RelativeImportCacheKey = (String, String, String, String); +pub const RESOLUTION_SUPPORT_SNAPSHOT_VERSION: i64 = 4; #[derive(Debug, Clone, PartialEq, Eq, Hash)] struct SemanticResolutionRequestKey { @@ -78,6 +80,8 @@ struct ResolutionLookupCache { struct CandidateNode { id: i64, file_node_id: Option, + file_path: Option, + normalized_file_path: Option, serialized_name: String, serialized_name_ascii_lower: String, qualified_name: Option, @@ -87,6 +91,7 @@ struct CandidateNode { struct CandidateNodeSnapshot { id: i64, file_node_id: Option, + file_path: Option, serialized_name: String, qualified_name: Option, } @@ -94,15 +99,20 @@ struct CandidateNodeSnapshot { #[derive(Default, Debug)] struct CandidateIndex { nodes: Vec, + relative_import_nodes: Vec, + import_binding_node_ids: HashSet, node_offset_by_id: HashMap, exact_map: HashMap>, suffix_map_ascii_lower: HashMap>, + file_path_map: HashMap>, + relative_file_path_map: HashMap>, same_file_cache: RwLock>>, same_module_cache: RwLock>>, global_unique_cache: RwLock>>, global_unique_exact_cache: RwLock>>, global_owner_alias_cache: RwLock>>, fuzzy_cache: RwLock>>, + relative_import_cache: RwLock>>, } #[derive(Debug, Clone)] @@ -164,7 +174,11 @@ struct ResolutionSupportSnapshot { #[serde(default)] enable_semantic: bool, call_candidates: Vec, + #[serde(default)] + call_import_binding_node_ids: Vec, import_candidates: Vec, + #[serde(default)] + relative_import_candidates: Vec, call_semantic_nodes: Vec, import_semantic_nodes: Vec, override_members: Vec, @@ -1041,6 +1055,7 @@ fn semantic_lookup_from_row<'a>( caller_qualified, source_name, target_name, + _target_node_id, caller_file_path, callsite_identity, ) = row; @@ -1137,7 +1152,7 @@ impl PreparedResolutionState { let conn = storage.get_connection(); let call_candidate_started = Instant::now(); - let call_candidate_index = CandidateIndex::load( + let call_candidate_index = CandidateIndex::load_with_import_bindings( conn, &[ NodeKind::FUNCTION as i32, @@ -1148,13 +1163,14 @@ impl PreparedResolutionState { telemetry.call_candidate_index_ms = duration_ms_u64(call_candidate_started.elapsed()); let import_candidate_started = Instant::now(); - let import_candidate_index = CandidateIndex::load( + let import_candidate_index = CandidateIndex::load_with_relative_import_kinds( conn, &[ NodeKind::MODULE as i32, NodeKind::NAMESPACE as i32, NodeKind::PACKAGE as i32, ], + semantic_candidate_kinds(EdgeKind::IMPORT), )?; telemetry.import_candidate_index_ms = duration_ms_u64(import_candidate_started.elapsed()); @@ -1203,8 +1219,14 @@ impl PreparedResolutionState { snapshot.node_names, ); Self { - call_candidate_index: CandidateIndex::from_snapshot_nodes(snapshot.call_candidates), - import_candidate_index: CandidateIndex::from_snapshot_nodes(snapshot.import_candidates), + call_candidate_index: CandidateIndex::from_snapshot_nodes_with_import_bindings( + snapshot.call_candidates, + snapshot.call_import_binding_node_ids, + ), + import_candidate_index: CandidateIndex::from_snapshot_nodes_with_relative( + snapshot.import_candidates, + snapshot.relative_import_candidates, + ), call_semantic_index: if flags.enable_semantic { SemanticCandidateIndex::from_snapshot_nodes(snapshot.call_semantic_nodes) } else { @@ -1223,7 +1245,11 @@ impl PreparedResolutionState { ResolutionSupportSnapshot { enable_semantic: flags.enable_semantic, call_candidates: self.call_candidate_index.snapshot_nodes(), + call_import_binding_node_ids: self.call_candidate_index.import_binding_node_ids(), import_candidates: self.import_candidate_index.snapshot_nodes(), + relative_import_candidates: self + .import_candidate_index + .snapshot_relative_import_nodes(), call_semantic_nodes: self.call_semantic_index.snapshot_nodes(), import_semantic_nodes: self.import_semantic_index.snapshot_nodes(), override_members: self.override_support.override_member_rows(), @@ -1235,21 +1261,67 @@ impl PreparedResolutionState { } impl CandidateIndex { + #[cfg(test)] fn load(conn: &rusqlite::Connection, kinds: &[i32]) -> Result { + Ok(Self::from_nodes(Self::load_nodes(conn, kinds)?)) + } + + fn load_with_import_bindings(conn: &rusqlite::Connection, kinds: &[i32]) -> Result { + let nodes = Self::load_nodes(conn, kinds)?; + let import_binding_node_ids = Self::load_import_binding_node_ids(conn)?; + Ok(Self::from_primary_relative_and_import_bindings( + nodes, + Vec::new(), + import_binding_node_ids, + )) + } + + fn load_with_relative_import_kinds( + conn: &rusqlite::Connection, + kinds: &[i32], + relative_import_kinds: &[i32], + ) -> Result { + let nodes = Self::load_nodes(conn, kinds)?; + let relative_import_nodes = if relative_import_kinds.is_empty() { + Vec::new() + } else { + Self::load_nodes(conn, relative_import_kinds)? + }; + Ok(Self::from_primary_and_relative_nodes( + nodes, + relative_import_nodes, + )) + } + + fn load_import_binding_node_ids(conn: &rusqlite::Connection) -> Result> { + let mut stmt = conn.prepare( + "SELECT DISTINCT source_node_id + FROM edge + WHERE kind = ?1", + )?; + let rows = stmt.query_map(params![EdgeKind::IMPORT as i32], |row| row.get::<_, i64>(0))?; + Ok(rows.collect::>>()?) + } + + fn load_nodes(conn: &rusqlite::Connection, kinds: &[i32]) -> Result> { let kind_clause = kind_clause(kinds); let query = format!( - "SELECT id, file_node_id, serialized_name, qualified_name - FROM node - WHERE kind IN ({}) - ORDER BY COALESCE(start_line, -9223372036854775808), id", + "SELECT n.id, n.file_node_id, n.serialized_name, n.qualified_name, file_node.serialized_name + FROM node n + LEFT JOIN node file_node ON file_node.id = n.file_node_id + WHERE n.kind IN ({}) + ORDER BY COALESCE(n.start_line, -9223372036854775808), n.id", kind_clause ); let mut stmt = conn.prepare(&query)?; let rows = stmt.query_map([], |row| { let serialized_name: String = row.get(2)?; + let file_path: Option = row.get(4)?; Ok(CandidateNode { id: row.get(0)?, file_node_id: row.get(1)?, + normalized_file_path: file_path.as_deref().and_then(normalize_resolution_path), + file_path, serialized_name_ascii_lower: serialized_name.to_ascii_lowercase(), serialized_name, qualified_name: row.get(3)?, @@ -1261,21 +1333,27 @@ impl CandidateIndex { nodes.push(row?); } - Ok(Self::from_nodes(nodes)) + Ok(nodes) } - fn from_snapshot_nodes(nodes: Vec) -> Self { - Self::from_nodes( - nodes - .into_iter() - .map(|node| CandidateNode { - id: node.id, - file_node_id: node.file_node_id, - serialized_name_ascii_lower: node.serialized_name.to_ascii_lowercase(), - serialized_name: node.serialized_name, - qualified_name: node.qualified_name, - }) - .collect(), + fn from_snapshot_nodes_with_import_bindings( + nodes: Vec, + import_binding_node_ids: Vec, + ) -> Self { + Self::from_primary_relative_and_import_bindings( + Self::candidate_nodes_from_snapshots(nodes), + Vec::new(), + import_binding_node_ids.into_iter().collect(), + ) + } + + fn from_snapshot_nodes_with_relative( + nodes: Vec, + relative_import_nodes: Vec, + ) -> Self { + Self::from_primary_and_relative_nodes( + Self::candidate_nodes_from_snapshots(nodes), + Self::candidate_nodes_from_snapshots(relative_import_nodes), ) } @@ -1285,15 +1363,74 @@ impl CandidateIndex { .map(|node| CandidateNodeSnapshot { id: node.id, file_node_id: node.file_node_id, + file_path: node.file_path.clone(), serialized_name: node.serialized_name.clone(), qualified_name: node.qualified_name.clone(), }) .collect() } + fn snapshot_relative_import_nodes(&self) -> Vec { + self.relative_import_nodes + .iter() + .map(|node| CandidateNodeSnapshot { + id: node.id, + file_node_id: node.file_node_id, + file_path: node.file_path.clone(), + serialized_name: node.serialized_name.clone(), + qualified_name: node.qualified_name.clone(), + }) + .collect() + } + + fn candidate_nodes_from_snapshots(nodes: Vec) -> Vec { + nodes + .into_iter() + .map(|node| CandidateNode { + id: node.id, + file_node_id: node.file_node_id, + normalized_file_path: node + .file_path + .as_deref() + .and_then(normalize_resolution_path), + file_path: node.file_path, + serialized_name_ascii_lower: node.serialized_name.to_ascii_lowercase(), + serialized_name: node.serialized_name, + qualified_name: node.qualified_name, + }) + .collect() + } + + #[cfg(test)] fn from_nodes(nodes: Vec) -> Self { + Self::from_primary_and_relative_nodes(nodes, Vec::new()) + } + + fn from_primary_and_relative_nodes( + nodes: Vec, + relative_import_nodes: Vec, + ) -> Self { + Self::from_primary_relative_and_import_bindings( + nodes, + relative_import_nodes, + HashSet::new(), + ) + } + + fn from_primary_relative_and_import_bindings( + nodes: Vec, + relative_import_nodes: Vec, + import_binding_node_ids: HashSet, + ) -> Self { + let relative_import_nodes = if relative_import_nodes.is_empty() { + nodes.clone() + } else { + relative_import_nodes + }; let mut index = CandidateIndex { nodes, + relative_import_nodes, + import_binding_node_ids, ..CandidateIndex::default() }; for (offset, node) in index.nodes.iter().enumerate() { @@ -1310,10 +1447,40 @@ impl CandidateIndex { .or_default() .push(offset); } + if let Some(path) = node.normalized_file_path.as_ref() { + index + .file_path_map + .entry(path.clone()) + .or_default() + .push(offset); + } + } + for (offset, node) in index.relative_import_nodes.iter().enumerate() { + if let Some(path) = node.normalized_file_path.as_ref() { + index + .relative_file_path_map + .entry(path.clone()) + .or_default() + .push(offset); + } } index } + fn import_binding_node_ids(&self) -> Vec { + let mut ids = self + .import_binding_node_ids + .iter() + .copied() + .collect::>(); + ids.sort_unstable(); + ids + } + + fn is_import_binding_node(&self, node_id: i64) -> bool { + self.import_binding_node_ids.contains(&node_id) + } + #[cfg(test)] fn find_same_file(&self, file_id: Option, name: &str) -> Option { let name_ascii_lower = name.to_ascii_lowercase(); @@ -1366,6 +1533,41 @@ impl CandidateIndex { }) } + fn find_relative_import_readonly( + &self, + caller_file_path: Option<&str>, + module_name: &str, + imported_name: &str, + imported_name_ascii_lower: &str, + ) -> Option { + let caller_file_path = caller_file_path?; + let candidates = relative_import_path_candidates(caller_file_path, module_name); + if candidates.is_empty() { + return None; + } + let key = ( + normalize_resolution_path(caller_file_path)?, + normalize_import_module_name(module_name)?, + imported_name.to_string(), + imported_name_ascii_lower.to_string(), + ); + self.cached_lookup(&self.relative_import_cache, key, || { + for candidate_path in candidates { + let Some(offsets) = self.relative_file_path_map.get(&candidate_path) else { + continue; + }; + if let Some(node_id) = self.first_matching_name_in_offsets( + offsets, + imported_name, + imported_name_ascii_lower, + ) { + return Some(node_id); + } + } + None + }) + } + #[cfg(test)] fn find_global_unique(&self, name: &str) -> Option { let name_ascii_lower = name.to_ascii_lowercase(); @@ -1520,6 +1722,29 @@ impl CandidateIndex { }) } + fn first_matching_name_in_offsets( + &self, + offsets: &[usize], + name: &str, + name_ascii_lower: &str, + ) -> Option { + offsets.iter().find_map(|idx| { + let node = &self.relative_import_nodes[*idx]; + let serialized_tail_lower = + tail_component(&node.serialized_name).map(str::to_ascii_lowercase); + let qualified_tail_lower = node + .qualified_name + .as_deref() + .and_then(tail_component) + .map(str::to_ascii_lowercase); + (node.serialized_name == name + || node.serialized_name_ascii_lower == name_ascii_lower + || serialized_tail_lower.as_deref() == Some(name_ascii_lower) + || qualified_tail_lower.as_deref() == Some(name_ascii_lower)) + .then_some(node.id) + }) + } + fn cached_lookup( &self, cache: &RwLock>>, @@ -1567,6 +1792,100 @@ fn tail_component(serialized_name: &str) -> Option<&str> { if tail.is_empty() { None } else { Some(tail) } } +fn normalize_import_module_name(module_name: &str) -> Option { + let unquoted = module_name + .trim() + .trim_matches(|ch| matches!(ch, '"' | '\'' | '`')); + (!unquoted.is_empty()).then(|| unquoted.replace('\\', "/").to_ascii_lowercase()) +} + +fn normalize_resolution_path(path: &str) -> Option { + let mut value = path.trim(); + if value.is_empty() { + return None; + } + value = value.strip_prefix(r"\\?\").unwrap_or(value); + let mut normalized = value.replace('\\', "/"); + let absolute = normalized.starts_with('/'); + let mut prefix = String::new(); + if normalized.len() >= 2 && normalized.as_bytes()[1] == b':' { + prefix = normalized[..2].to_string(); + normalized = normalized[2..].trim_start_matches('/').to_string(); + } else if absolute { + normalized = normalized.trim_start_matches('/').to_string(); + } + + let mut parts = Vec::new(); + for segment in normalized.split('/') { + match segment { + "" | "." => {} + ".." => { + if parts.last().is_some_and(|last| *last != "..") { + parts.pop(); + } else if prefix.is_empty() && !absolute { + parts.push(segment); + } + } + _ => parts.push(segment), + } + } + + let mut out = String::new(); + if !prefix.is_empty() { + out.push_str(&prefix); + out.push('/'); + } else if absolute { + out.push('/'); + } + out.push_str(&parts.join("/")); + let out = out.trim_end_matches('/').to_ascii_lowercase(); + (!out.is_empty()).then_some(out) +} + +fn relative_import_path_candidates(caller_file_path: &str, module_name: &str) -> Vec { + let Some(module_name) = normalize_import_module_name(module_name) else { + return Vec::new(); + }; + if !(module_name.starts_with("./") || module_name.starts_with("../")) { + return Vec::new(); + } + let Some(caller_path) = normalize_resolution_path(caller_file_path) else { + return Vec::new(); + }; + let Some(parent) = caller_path.rsplit_once('/').map(|(parent, _)| parent) else { + return Vec::new(); + }; + let Some(base) = normalize_resolution_path(&format!("{parent}/{module_name}")) else { + return Vec::new(); + }; + + let mut candidates = Vec::new(); + push_unique(&mut candidates, base.clone()); + if !path_has_extension(&base) { + for extension in [".js", ".jsx", ".ts", ".tsx", ".mjs", ".cjs", ".mts", ".cts"] { + push_unique(&mut candidates, format!("{base}{extension}")); + } + for index_file in [ + "index.js", + "index.jsx", + "index.ts", + "index.tsx", + "index.mjs", + "index.cjs", + ] { + push_unique(&mut candidates, format!("{base}/{index_file}")); + } + } + candidates +} + +fn path_has_extension(path: &str) -> bool { + path.rsplit('/') + .next() + .and_then(|segment| segment.rsplit_once('.')) + .is_some_and(|(stem, extension)| !stem.is_empty() && !extension.is_empty()) +} + #[cfg(test)] #[allow(dead_code)] fn is_same_file_candidate( @@ -2106,7 +2425,9 @@ mod tests { let stale_snapshot = ResolutionSupportSnapshot { enable_semantic: false, call_candidates: Vec::new(), + call_import_binding_node_ids: Vec::new(), import_candidates: Vec::new(), + relative_import_candidates: Vec::new(), call_semantic_nodes: Vec::new(), import_semantic_nodes: Vec::new(), override_members: Vec::new(), @@ -2158,6 +2479,33 @@ mod tests { )); } + #[test] + fn test_relative_import_lookup_matches_imported_file_symbol() { + let index = CandidateIndex::from_nodes(vec![CandidateNode { + id: 42, + file_node_id: Some(2), + file_path: Some(r"\\?\C:\repo\lib\client.js".to_string()), + normalized_file_path: normalize_resolution_path(r"\\?\C:\repo\lib\client.js"), + serialized_name: "Client".to_string(), + serialized_name_ascii_lower: "client".to_string(), + qualified_name: Some("Client".to_string()), + }]); + + assert_eq!( + relative_import_path_candidates(r"\\?\C:\repo\lib\app.js", r#""./client.js""#), + vec!["c:/repo/lib/client.js".to_string()] + ); + assert_eq!( + index.find_relative_import_readonly( + Some(r"\\?\C:\repo\lib\app.js"), + r#""./client.js""#, + "Client", + "client", + ), + Some(42) + ); + } + #[test] fn test_common_call_resolution_requires_certain_confidence_and_callsite() { assert!(!should_keep_common_call_resolution( @@ -2217,6 +2565,7 @@ mod tests { Some("pkg::core::caller".to_string()), "caller".to_string(), "target".to_string(), + 0, Some("/repo/lib.rs".to_string()), Some("1:2:3:4".to_string()), ), @@ -2226,6 +2575,7 @@ mod tests { Some("pkg::core::caller".to_string()), "caller".to_string(), "target".to_string(), + 0, Some("/repo/lib.rs".to_string()), Some("1:2:3:4".to_string()), ), @@ -2262,6 +2612,7 @@ mod tests { Some("pkg::core::caller".to_string()), "caller".to_string(), "clone".to_string(), + 0, Some("/repo/lib.rs".to_string()), None, )]; @@ -2311,6 +2662,7 @@ mod tests { Some("pkg::core::caller".to_string()), "caller".to_string(), "clone".to_string(), + 0, Some("/repo/lib.rs".to_string()), Some("1:2:3:4".to_string()), ); @@ -2350,6 +2702,7 @@ mod tests { Some("pkg::core::caller".to_string()), "caller".to_string(), "clone".to_string(), + 0, Some("/repo/lib.rs".to_string()), Some("1:2:3:4".to_string()), ); diff --git a/crates/codestory-indexer/src/resolution/query_helpers.rs b/crates/codestory-indexer/src/resolution/query_helpers.rs index 1ce5acfa..de82962e 100644 --- a/crates/codestory-indexer/src/resolution/query_helpers.rs +++ b/crates/codestory-indexer/src/resolution/query_helpers.rs @@ -19,7 +19,7 @@ pub(super) fn import_alias_mismatch(source_name: &str, target_name: &str) -> boo .map(str::trim) .unwrap_or(target); - source != target_tail && (target.contains("::") || target.contains('.')) + source != target && source != target_tail && (target.contains("::") || target.contains('.')) } pub(super) fn sorted_scope_file_ids( diff --git a/crates/codestory-indexer/src/resolution/sql.rs b/crates/codestory-indexer/src/resolution/sql.rs index caf42be3..69756e44 100644 --- a/crates/codestory-indexer/src/resolution/sql.rs +++ b/crates/codestory-indexer/src/resolution/sql.rs @@ -90,11 +90,12 @@ pub(super) fn unresolved_edges( } let mut query = String::from( - "SELECT e.id, caller.file_node_id, caller.qualified_name, caller.serialized_name, target.serialized_name, file_node.serialized_name, e.callsite_identity + "SELECT e.id, COALESCE(caller.file_node_id, e.file_node_id), caller.qualified_name, caller.serialized_name, target.serialized_name, e.target_node_id, + file_node.serialized_name, e.callsite_identity FROM edge e JOIN node caller ON caller.id = e.source_node_id JOIN node target ON target.id = e.target_node_id - LEFT JOIN node file_node ON file_node.id = caller.file_node_id + LEFT JOIN node file_node ON file_node.id = COALESCE(caller.file_node_id, e.file_node_id) WHERE e.kind = ?1 AND e.resolved_target_node_id IS NULL AND (target.canonical_id IS NULL OR ( target.canonical_id NOT LIKE 'tauri:command:%' @@ -211,7 +212,8 @@ fn map_unresolved_edge_row(row: &rusqlite::Row<'_>) -> rusqlite::Result>(2)?, row.get::<_, String>(3)?, row.get::<_, String>(4)?, - row.get::<_, Option>(5)?, + row.get::<_, i64>(5)?, row.get::<_, Option>(6)?, + row.get::<_, Option>(7)?, )) } diff --git a/crates/codestory-indexer/src/semantic/rust.rs b/crates/codestory-indexer/src/semantic/rust.rs index 43bdd8f0..8d1728ff 100644 --- a/crates/codestory-indexer/src/semantic/rust.rs +++ b/crates/codestory-indexer/src/semantic/rust.rs @@ -39,6 +39,7 @@ impl RustSemanticResolver { let Some(symbol) = namespace_tail(alias_target(target), "::") else { return Ok(Vec::new()); }; + let symbol = symbol.trim_end_matches(" (import)").trim(); let kinds = [ NodeKind::MODULE as i32, diff --git a/crates/codestory-indexer/src/structural/html.rs b/crates/codestory-indexer/src/structural/html.rs index 69ecc652..efb66998 100644 --- a/crates/codestory-indexer/src/structural/html.rs +++ b/crates/codestory-indexer/src/structural/html.rs @@ -4,7 +4,7 @@ use crate::structural::blanking::{ extract_style_block_sources, }; use crate::{get_language_for_ext, index_file}; -use codestory_contracts::graph::{NodeId, NodeKind}; +use codestory_contracts::graph::{EdgeId, EdgeKind, NodeId, NodeKind}; use std::collections::HashMap; use std::path::Path; @@ -180,6 +180,11 @@ fn merge_delegated_script_graph( index_result: crate::IndexResult, script_regions: &[super::blanking::EmbeddedRegion], ) { + let delegated_file_id = index_result + .nodes + .iter() + .find(|node| node.kind == NodeKind::FILE) + .map(|node| node.id); let script_module = script_regions.first().map(|region| { let canonical = format!("html:script-block:{}", region.start_line); push_structural_node( @@ -202,15 +207,43 @@ fn merge_delegated_script_graph( } for mut edge in index_result.edges { + if Some(edge.source) == delegated_file_id { + edge.source = host_file_id; + } + if Some(edge.target) == delegated_file_id { + edge.target = host_file_id; + } + if edge.resolved_source == delegated_file_id { + edge.resolved_source = Some(host_file_id); + } + if edge.resolved_target == delegated_file_id { + edge.resolved_target = Some(host_file_id); + } if edge.file_node_id.is_some() { edge.file_node_id = Some(host_file_id); } + if edge.kind == EdgeKind::CALL { + let col = edge + .callsite_identity + .as_deref() + .and_then(|identity| identity.split(':').nth(2)) + .and_then(|value| value.parse::().ok()); + edge.callsite_identity = None; + crate::ensure_callsite_identity(&mut edge, col); + } + edge.id = EdgeId(crate::generate_edge_id_for_edge( + &edge, + crate::index_feature_flags(), + )); storage.edges.push(edge); } storage .occurrences .extend(index_result.occurrences.into_iter().map(|mut occurrence| { + if Some(NodeId(occurrence.element_id)) == delegated_file_id { + occurrence.element_id = host_file_id.0; + } occurrence.location.file_node_id = host_file_id; occurrence })); diff --git a/crates/codestory-indexer/src/structural/mod.rs b/crates/codestory-indexer/src/structural/mod.rs index 1822eb59..3d81b14f 100644 --- a/crates/codestory-indexer/src/structural/mod.rs +++ b/crates/codestory-indexer/src/structural/mod.rs @@ -91,7 +91,9 @@ fn file_modification_time(path: &Path) -> i64 { #[cfg(test)] mod tests { use super::*; - use codestory_contracts::graph::NodeKind; + use codestory_contracts::graph::{EdgeKind, NodeKind}; + use codestory_store::{ProjectionBatch, Store as Storage}; + use std::collections::HashSet; #[test] fn indexes_dedicated_sql_file() { @@ -104,4 +106,65 @@ mod tests { assert!(storage.nodes.iter().any(|n| n.kind == NodeKind::CLASS)); assert_eq!(storage.files[0].language, "sql"); } + + #[test] + fn html_inline_endpoint_calls_do_not_keep_delegated_file_edges() -> anyhow::Result<()> { + let html = r#" + + + + +"#; + let projected = index_structural_source(Path::new("examples/get/index.html"), html)?; + let node_ids = projected + .nodes + .iter() + .map(|node| node.id) + .collect::>(); + for edge in &projected.edges { + assert!( + node_ids.contains(&edge.source), + "edge source should be present: {edge:?}" + ); + assert!( + node_ids.contains(&edge.target), + "edge target should be present: {edge:?}" + ); + if let Some(file_node_id) = edge.file_node_id { + assert!( + node_ids.contains(&file_node_id), + "edge file node should be present: {edge:?}" + ); + } + } + + assert!( + projected.edges.iter().any(|edge| { + edge.kind == EdgeKind::CALL + && projected.nodes.iter().any(|node| { + node.id == edge.target + && node.canonical_id.as_deref() + == Some("openapi:endpoint:GET /get/server") + }) + }), + "expected an inline endpoint CALL edge" + ); + + let mut storage = Storage::new_in_memory()?; + storage + .projections() + .flush_projection_batch(ProjectionBatch { + files: &projected.files, + nodes: &projected.nodes, + edges: &projected.edges, + occurrences: &projected.occurrences, + component_access: &projected.component_access, + callable_projection_states: &projected.callable_projection_states, + })?; + Ok(()) + } } diff --git a/crates/codestory-indexer/tests/fidelity_regression.rs b/crates/codestory-indexer/tests/fidelity_regression.rs index e5beaaab..8d9383ee 100644 --- a/crates/codestory-indexer/tests/fidelity_regression.rs +++ b/crates/codestory-indexer/tests/fidelity_regression.rs @@ -13,9 +13,18 @@ const JAVA_SOURCE: &str = include_str!("fixtures/fidelity_lab/java_fidelity_lab. const CPP_SOURCE: &str = include_str!("fixtures/fidelity_lab/cpp_fidelity_lab.cpp"); const C_SOURCE: &str = include_str!("fixtures/fidelity_lab/c_fidelity_lab.c"); const RUST_SOURCE: &str = include_str!("fixtures/fidelity_lab/rust_fidelity_lab.rs"); +const GO_SOURCE: &str = include_str!("fixtures/fidelity_lab/go_fidelity_lab.go"); +const RUBY_SOURCE: &str = include_str!("fixtures/fidelity_lab/ruby_fidelity_lab.rb"); +const PHP_SOURCE: &str = include_str!("fixtures/fidelity_lab/php_fidelity_lab.php"); +const CSHARP_SOURCE: &str = include_str!("fixtures/fidelity_lab/csharp_fidelity_lab.cs"); +const KOTLIN_SOURCE: &str = include_str!("fixtures/fidelity_lab/kotlin_fidelity_lab.kt"); +const SWIFT_SOURCE: &str = include_str!("fixtures/fidelity_lab/swift_fidelity_lab.swift"); +const DART_SOURCE: &str = include_str!("fixtures/fidelity_lab/dart_fidelity_lab.dart"); +const BASH_SOURCE: &str = include_str!("fixtures/fidelity_lab/bash_fidelity_lab.sh"); type ResolvedOwnerExpectation = (&'static str, &'static str, &'static str); type ResolvedNameExpectation = (&'static str, &'static str); +type MemberExpectation = (&'static str, &'static str); struct FidelityCase { language: &'static str, @@ -27,6 +36,7 @@ struct FidelityCase { required_symbols: &'static [&'static str], required_call_targets: &'static [&'static str], required_import_fragments: &'static [&'static str], + required_member_pairs: &'static [MemberExpectation], min_resolved_calls: usize, expected_resolved_owners: &'static [ResolvedOwnerExpectation], expected_resolved_names: &'static [ResolvedNameExpectation], @@ -103,6 +113,98 @@ const RUST_SYMBOLS: &[&str] = &[ "run_async", "orchestrate_rust", ]; +const GO_SYMBOLS: &[&str] = &[ + "Notifier", + "ConsoleNotifier", + "Repository", + "Event", + "Workflow", + "Notify", + "Save", + "Run", + "decorate", + "orchestrateGo", +]; +const RUBY_SYMBOLS: &[&str] = &[ + "Notifier", + "ConsoleNotifier", + "Repository", + "Workflow", + "notify", + "save", + "run", + "decorate", + "orchestrate_ruby", +]; +const PHP_SYMBOLS: &[&str] = &[ + "Notifier", + "ConsoleNotifier", + "Repository", + "Event", + "Workflow", + "notify", + "save", + "run", + "decorate", + "orchestrate_php", +]; +const CSHARP_SYMBOLS: &[&str] = &[ + "INotifier", + "ConsoleNotifier", + "Repository", + "Event", + "Workflow", + "Program", + "Notify", + "Save", + "Run", + "Decorate", + "Main", +]; +const KOTLIN_SYMBOLS: &[&str] = &[ + "Notifier", + "ConsoleNotifier", + "Repository", + "Event", + "Workflow", + "notify", + "save", + "run", + "decorate", + "orchestrateKotlin", +]; +const SWIFT_SYMBOLS: &[&str] = &[ + "Notifier", + "ConsoleNotifier", + "Repository", + "Event", + "Workflow", + "notify", + "save", + "run", + "decorate", + "orchestrateSwift", +]; +const DART_SYMBOLS: &[&str] = &[ + "Notifier", + "ConsoleNotifier", + "Repository", + "Event", + "Workflow", + "notify", + "save", + "run", + "decorate", + "orchestrateDart", +]; +const BASH_SYMBOLS: &[&str] = &[ + "notify", + "save", + "decorate", + "run", + "orchestrate_bash", + "event", +]; const PYTHON_CALLS: &[&str] = &["notify", "save", "decorate", "run"]; const TYPESCRIPT_CALLS: &[&str] = &["identity", "notify", "save", "decorate", "run"]; @@ -111,6 +213,14 @@ const JAVA_CALLS: &[&str] = &["identity", "notifyEvent", "save", "decorate", "ru const CPP_CALLS: &[&str] = &["identity", "notifyEvent", "save", "decorate", "run"]; const C_CALLS: &[&str] = &["repository_track", "workflow_run"]; const RUST_CALLS: &[&str] = &["identity", "notify", "save", "decorate", "run"]; +const GO_CALLS: &[&str] = &["Notify", "Save", "decorate", "Run"]; +const RUBY_CALLS: &[&str] = &["notify", "save", "decorate", "run"]; +const PHP_CALLS: &[&str] = &["notify", "save", "decorate", "run"]; +const CSHARP_CALLS: &[&str] = &["Notify", "Save", "Decorate", "Run"]; +const KOTLIN_CALLS: &[&str] = &["notify", "save", "decorate", "run"]; +const SWIFT_CALLS: &[&str] = &["notify", "save", "decorate"]; +const DART_CALLS: &[&str] = &["notify", "save", "decorate"]; +const BASH_CALLS: &[&str] = &["notify", "save", "decorate", "run"]; const PYTHON_IMPORTS: &[&str] = &[]; const TYPESCRIPT_IMPORTS: &[&str] = &["fs", "path"]; @@ -119,6 +229,94 @@ const JAVA_IMPORTS: &[&str] = &["java.util.concurrent", "java.util.function"]; const CPP_IMPORTS: &[&str] = &["future", "functional", "string"]; const C_IMPORTS: &[&str] = &["stdio", "string", "stddef"]; const RUST_IMPORTS: &[&str] = &["std::collections", "std::future"]; +const GO_IMPORTS: &[&str] = &["fmt"]; +const RUBY_IMPORTS: &[&str] = &["logger"]; +const PHP_IMPORTS: &[&str] = &["Random\\Randomizer"]; +const CSHARP_IMPORTS: &[&str] = &["System"]; +const KOTLIN_IMPORTS: &[&str] = &["kotlin.math.abs"]; +const SWIFT_IMPORTS: &[&str] = &["Foundation"]; +const DART_IMPORTS: &[&str] = &["dart:math"]; +const BASH_IMPORTS: &[&str] = &["./logger.sh"]; + +const PYTHON_MEMBERS: &[MemberExpectation] = &[]; +const TYPESCRIPT_MEMBERS: &[MemberExpectation] = &[ + ("ConsoleNotifier", "notify"), + ("Repository", "save"), + ("Workflow", "run"), +]; +const JAVASCRIPT_MEMBERS: &[MemberExpectation] = &[ + ("ConsoleNotifier", "notify"), + ("Repository", "save"), + ("Workflow", "run"), +]; +const JAVA_MEMBERS: &[MemberExpectation] = &[ + ("Notifier", "notifyEvent"), + ("ConsoleNotifier", "notifyEvent"), + ("Repository", "save"), + ("Workflow", "run"), +]; +const CPP_MEMBERS: &[MemberExpectation] = &[ + ("Notifier", "notifyEvent"), + ("ConsoleNotifier", "notifyEvent"), + ("Repository", "save"), + ("Workflow", "run"), +]; +const C_MEMBERS: &[MemberExpectation] = &[]; +const RUST_MEMBERS: &[MemberExpectation] = &[ + ("ConsoleNotifier", "notify"), + ("MemoryRepository", "save"), + ("Workflow", "run"), +]; +const GO_MEMBERS: &[MemberExpectation] = &[ + ("Notifier", "Notify"), + ("ConsoleNotifier", "Notify"), + ("Repository", "Save"), + ("Workflow", "Run"), +]; +const RUBY_MEMBERS: &[MemberExpectation] = &[ + ("Notifier", "notify"), + ("ConsoleNotifier", "notify"), + ("Repository", "save"), + ("Workflow", "run"), + ("Workflow", "decorate"), +]; +const PHP_MEMBERS: &[MemberExpectation] = &[ + ("Notifier", "notify"), + ("ConsoleNotifier", "notify"), + ("Repository", "save"), + ("Workflow", "run"), + ("Workflow", "decorate"), +]; +const CSHARP_MEMBERS: &[MemberExpectation] = &[ + ("INotifier", "Notify"), + ("ConsoleNotifier", "Notify"), + ("Repository", "Save"), + ("Workflow", "Run"), + ("Workflow", "Decorate"), + ("Program", "Main"), +]; +const KOTLIN_MEMBERS: &[MemberExpectation] = &[ + ("Notifier", "notify"), + ("ConsoleNotifier", "notify"), + ("Repository", "save"), + ("Workflow", "run"), + ("Workflow", "decorate"), +]; +const SWIFT_MEMBERS: &[MemberExpectation] = &[ + ("Notifier", "notify"), + ("ConsoleNotifier", "notify"), + ("Repository", "save"), + ("Workflow", "run"), + ("Workflow", "decorate"), +]; +const DART_MEMBERS: &[MemberExpectation] = &[ + ("Notifier", "notify"), + ("ConsoleNotifier", "notify"), + ("Repository", "save"), + ("Workflow", "run"), + ("Workflow", "decorate"), +]; +const BASH_MEMBERS: &[MemberExpectation] = &[]; const PYTHON_RESOLVED_OWNERS: &[ResolvedOwnerExpectation] = &[]; const TYPESCRIPT_RESOLVED_OWNERS: &[ResolvedOwnerExpectation] = @@ -135,6 +333,30 @@ const CPP_RESOLVED_OWNERS: &[ResolvedOwnerExpectation] = &[ const C_RESOLVED_OWNERS: &[ResolvedOwnerExpectation] = &[]; const RUST_RESOLVED_OWNERS: &[ResolvedOwnerExpectation] = &[("run", "Notifier", "notify"), ("run", "Repository", "save")]; +const GO_RESOLVED_OWNERS: &[ResolvedOwnerExpectation] = + &[("Run", "Notifier", "Notify"), ("Run", "Repository", "Save")]; +const RUBY_RESOLVED_OWNERS: &[ResolvedOwnerExpectation] = &[ + ("run", "Notifier", "notify"), + ("run", "Repository", "save"), + ("run", "Workflow", "decorate"), +]; +const PHP_RESOLVED_OWNERS: &[ResolvedOwnerExpectation] = &[ + ("run", "Notifier", "notify"), + ("run", "Repository", "save"), + ("run", "Workflow", "decorate"), +]; +const CSHARP_RESOLVED_OWNERS: &[ResolvedOwnerExpectation] = &[ + ("Run", "INotifier", "Notify"), + ("Run", "Repository", "Save"), + ("Run", "Workflow", "Decorate"), +]; +const KOTLIN_RESOLVED_OWNERS: &[ResolvedOwnerExpectation] = + &[("run", "Notifier", "notify"), ("run", "Repository", "save")]; +const SWIFT_RESOLVED_OWNERS: &[ResolvedOwnerExpectation] = + &[("run", "Notifier", "notify"), ("run", "Repository", "save")]; +const DART_RESOLVED_OWNERS: &[ResolvedOwnerExpectation] = + &[("run", "Notifier", "notify"), ("run", "Repository", "save")]; +const BASH_RESOLVED_OWNERS: &[ResolvedOwnerExpectation] = &[]; const EMPTY_RESOLVED_NAMES: &[ResolvedNameExpectation] = &[]; @@ -150,6 +372,7 @@ fn fidelity_cases() -> Vec { required_symbols: PYTHON_SYMBOLS, required_call_targets: PYTHON_CALLS, required_import_fragments: PYTHON_IMPORTS, + required_member_pairs: PYTHON_MEMBERS, min_resolved_calls: 0, expected_resolved_owners: PYTHON_RESOLVED_OWNERS, expected_resolved_names: EMPTY_RESOLVED_NAMES, @@ -164,6 +387,7 @@ fn fidelity_cases() -> Vec { required_symbols: TYPESCRIPT_SYMBOLS, required_call_targets: TYPESCRIPT_CALLS, required_import_fragments: TYPESCRIPT_IMPORTS, + required_member_pairs: TYPESCRIPT_MEMBERS, min_resolved_calls: 2, expected_resolved_owners: TYPESCRIPT_RESOLVED_OWNERS, expected_resolved_names: EMPTY_RESOLVED_NAMES, @@ -178,6 +402,7 @@ fn fidelity_cases() -> Vec { required_symbols: JAVASCRIPT_SYMBOLS, required_call_targets: JAVASCRIPT_CALLS, required_import_fragments: JAVASCRIPT_IMPORTS, + required_member_pairs: JAVASCRIPT_MEMBERS, min_resolved_calls: 1, expected_resolved_owners: JAVASCRIPT_RESOLVED_OWNERS, expected_resolved_names: EMPTY_RESOLVED_NAMES, @@ -192,6 +417,7 @@ fn fidelity_cases() -> Vec { required_symbols: JAVA_SYMBOLS, required_call_targets: JAVA_CALLS, required_import_fragments: JAVA_IMPORTS, + required_member_pairs: JAVA_MEMBERS, min_resolved_calls: 2, expected_resolved_owners: JAVA_RESOLVED_OWNERS, expected_resolved_names: EMPTY_RESOLVED_NAMES, @@ -206,6 +432,7 @@ fn fidelity_cases() -> Vec { required_symbols: CPP_SYMBOLS, required_call_targets: CPP_CALLS, required_import_fragments: CPP_IMPORTS, + required_member_pairs: CPP_MEMBERS, min_resolved_calls: 2, expected_resolved_owners: CPP_RESOLVED_OWNERS, expected_resolved_names: EMPTY_RESOLVED_NAMES, @@ -220,6 +447,7 @@ fn fidelity_cases() -> Vec { required_symbols: C_SYMBOLS, required_call_targets: C_CALLS, required_import_fragments: C_IMPORTS, + required_member_pairs: C_MEMBERS, min_resolved_calls: 0, expected_resolved_owners: C_RESOLVED_OWNERS, expected_resolved_names: EMPTY_RESOLVED_NAMES, @@ -234,10 +462,131 @@ fn fidelity_cases() -> Vec { required_symbols: RUST_SYMBOLS, required_call_targets: RUST_CALLS, required_import_fragments: RUST_IMPORTS, + required_member_pairs: RUST_MEMBERS, min_resolved_calls: 2, expected_resolved_owners: RUST_RESOLVED_OWNERS, expected_resolved_names: EMPTY_RESOLVED_NAMES, }, + FidelityCase { + language: "go", + filename: "fidelity.go", + source: GO_SOURCE, + min_nodes: 12, + min_call_edges: 4, + min_import_edges: 1, + required_symbols: GO_SYMBOLS, + required_call_targets: GO_CALLS, + required_import_fragments: GO_IMPORTS, + required_member_pairs: GO_MEMBERS, + min_resolved_calls: 2, + expected_resolved_owners: GO_RESOLVED_OWNERS, + expected_resolved_names: EMPTY_RESOLVED_NAMES, + }, + FidelityCase { + language: "ruby", + filename: "fidelity.rb", + source: RUBY_SOURCE, + min_nodes: 12, + min_call_edges: 4, + min_import_edges: 1, + required_symbols: RUBY_SYMBOLS, + required_call_targets: RUBY_CALLS, + required_import_fragments: RUBY_IMPORTS, + required_member_pairs: RUBY_MEMBERS, + min_resolved_calls: 3, + expected_resolved_owners: RUBY_RESOLVED_OWNERS, + expected_resolved_names: EMPTY_RESOLVED_NAMES, + }, + FidelityCase { + language: "php", + filename: "fidelity.php", + source: PHP_SOURCE, + min_nodes: 12, + min_call_edges: 4, + min_import_edges: 1, + required_symbols: PHP_SYMBOLS, + required_call_targets: PHP_CALLS, + required_import_fragments: PHP_IMPORTS, + required_member_pairs: PHP_MEMBERS, + min_resolved_calls: 3, + expected_resolved_owners: PHP_RESOLVED_OWNERS, + expected_resolved_names: EMPTY_RESOLVED_NAMES, + }, + FidelityCase { + language: "csharp", + filename: "fidelity.cs", + source: CSHARP_SOURCE, + min_nodes: 12, + min_call_edges: 4, + min_import_edges: 1, + required_symbols: CSHARP_SYMBOLS, + required_call_targets: CSHARP_CALLS, + required_import_fragments: CSHARP_IMPORTS, + required_member_pairs: CSHARP_MEMBERS, + min_resolved_calls: 3, + expected_resolved_owners: CSHARP_RESOLVED_OWNERS, + expected_resolved_names: EMPTY_RESOLVED_NAMES, + }, + FidelityCase { + language: "kotlin", + filename: "fidelity.kt", + source: KOTLIN_SOURCE, + min_nodes: 10, + min_call_edges: 4, + min_import_edges: 1, + required_symbols: KOTLIN_SYMBOLS, + required_call_targets: KOTLIN_CALLS, + required_import_fragments: KOTLIN_IMPORTS, + required_member_pairs: KOTLIN_MEMBERS, + min_resolved_calls: 2, + expected_resolved_owners: KOTLIN_RESOLVED_OWNERS, + expected_resolved_names: EMPTY_RESOLVED_NAMES, + }, + FidelityCase { + language: "swift", + filename: "fidelity.swift", + source: SWIFT_SOURCE, + min_nodes: 10, + min_call_edges: 3, + min_import_edges: 1, + required_symbols: SWIFT_SYMBOLS, + required_call_targets: SWIFT_CALLS, + required_import_fragments: SWIFT_IMPORTS, + required_member_pairs: SWIFT_MEMBERS, + min_resolved_calls: 2, + expected_resolved_owners: SWIFT_RESOLVED_OWNERS, + expected_resolved_names: EMPTY_RESOLVED_NAMES, + }, + FidelityCase { + language: "dart", + filename: "fidelity.dart", + source: DART_SOURCE, + min_nodes: 10, + min_call_edges: 3, + min_import_edges: 1, + required_symbols: DART_SYMBOLS, + required_call_targets: DART_CALLS, + required_import_fragments: DART_IMPORTS, + required_member_pairs: DART_MEMBERS, + min_resolved_calls: 2, + expected_resolved_owners: DART_RESOLVED_OWNERS, + expected_resolved_names: EMPTY_RESOLVED_NAMES, + }, + FidelityCase { + language: "bash", + filename: "fidelity.sh", + source: BASH_SOURCE, + min_nodes: 6, + min_call_edges: 4, + min_import_edges: 1, + required_symbols: BASH_SYMBOLS, + required_call_targets: BASH_CALLS, + required_import_fragments: BASH_IMPORTS, + required_member_pairs: BASH_MEMBERS, + min_resolved_calls: 0, + expected_resolved_owners: BASH_RESOLVED_OWNERS, + expected_resolved_names: EMPTY_RESOLVED_NAMES, + }, ] } @@ -532,6 +881,16 @@ fn test_fidelity_lab_graph_shape_and_semantics() -> anyhow::Result<()> { ); } + for (owner, member) in case.required_member_pairs { + assert!( + has_edge_between_names(&edges, &nodes, EdgeKind::MEMBER, owner, member), + "Case `{}`: missing MEMBER edge `{}` -> `{}`", + case.language, + owner, + member + ); + } + assert!( edges .iter() diff --git a/crates/codestory-indexer/tests/fixtures/fidelity_lab/bash_fidelity_lab.sh b/crates/codestory-indexer/tests/fixtures/fidelity_lab/bash_fidelity_lab.sh new file mode 100644 index 00000000..4dc3553a --- /dev/null +++ b/crates/codestory-indexer/tests/fixtures/fidelity_lab/bash_fidelity_lab.sh @@ -0,0 +1,29 @@ +source ./logger.sh + +notify() { + event="$1" + printf "%s\n" "$event" +} + +save() { + event="$1" + printf "%s\n" "$event" +} + +decorate() { + event="$1" + printf "%s\n" "$event" +} + +run() { + event="$1" + notify "$event" + save "$event" + decorate "$event" +} + +orchestrate_bash() { + run "ready" +} + +orchestrate_bash "$@" diff --git a/crates/codestory-indexer/tests/fixtures/fidelity_lab/csharp_fidelity_lab.cs b/crates/codestory-indexer/tests/fixtures/fidelity_lab/csharp_fidelity_lab.cs new file mode 100644 index 00000000..7cfc53b9 --- /dev/null +++ b/crates/codestory-indexer/tests/fixtures/fidelity_lab/csharp_fidelity_lab.cs @@ -0,0 +1,66 @@ +using System; + +namespace App; + +interface INotifier +{ + void Notify(Event evt); +} + +class ConsoleNotifier : INotifier +{ + public void Notify(Event evt) + { + Console.WriteLine(evt.Name); + } +} + +class Repository +{ + public void Save(Event evt) + { + Console.WriteLine(evt.Name); + } +} + +class Event +{ + public Event(string name) + { + Name = name; + } + + public string Name { get; } +} + +class Workflow +{ + private readonly INotifier notifier; + private readonly Repository repository; + + public Workflow(INotifier notifier, Repository repository) + { + this.notifier = notifier; + this.repository = repository; + } + + public void Run(Event evt) + { + notifier.Notify(evt); + repository.Save(evt); + Decorate(evt); + } + + private string Decorate(Event evt) + { + return evt.Name; + } +} + +class Program +{ + static void Main() + { + new Workflow(new ConsoleNotifier(), new Repository()).Run(new Event("ready")); + } +} diff --git a/crates/codestory-indexer/tests/fixtures/fidelity_lab/dart_fidelity_lab.dart b/crates/codestory-indexer/tests/fixtures/fidelity_lab/dart_fidelity_lab.dart new file mode 100644 index 00000000..f885fd40 --- /dev/null +++ b/crates/codestory-indexer/tests/fixtures/fidelity_lab/dart_fidelity_lab.dart @@ -0,0 +1,41 @@ +import 'dart:math'; + +abstract class Notifier { + void notify(Event event); +} + +class ConsoleNotifier implements Notifier { + void notify(Event event) { + print(event.name); + } +} + +class Repository { + void save(Event event) { + print(event.name); + } +} + +class Event { + final String name; + + Event(this.name); +} + +class Workflow { + void run(Event event, Notifier notifier, Repository repository) { + notifier.notify(event); + repository.save(event); + decorate(event); + } + + String decorate(Event event) { + return event.name; + } +} + +void orchestrateDart() { + final workflow = Workflow(); + workflow.run(Event('ready'), ConsoleNotifier(), Repository()); + max(1, 2); +} diff --git a/crates/codestory-indexer/tests/fixtures/fidelity_lab/go_fidelity_lab.go b/crates/codestory-indexer/tests/fixtures/fidelity_lab/go_fidelity_lab.go new file mode 100644 index 00000000..8c5e9f44 --- /dev/null +++ b/crates/codestory-indexer/tests/fixtures/fidelity_lab/go_fidelity_lab.go @@ -0,0 +1,43 @@ +package main + +import "fmt" + +type Notifier interface { + Notify(Event) +} + +type ConsoleNotifier struct{} + +func (ConsoleNotifier) Notify(event Event) { + fmt.Println(event.Name) +} + +type Repository struct{} + +func (Repository) Save(event Event) { + fmt.Println(event.Name) +} + +type Event struct { + Name string +} + +type Workflow struct { + notifier Notifier + repo Repository +} + +func (w Workflow) Run(event Event) { + w.notifier.Notify(event) + w.repo.Save(event) + decorate(event.Name) +} + +func decorate(name string) string { + return name +} + +func orchestrateGo() { + workflow := Workflow{notifier: ConsoleNotifier{}, repo: Repository{}} + workflow.Run(Event{Name: "ready"}) +} diff --git a/crates/codestory-indexer/tests/fixtures/fidelity_lab/kotlin_fidelity_lab.kt b/crates/codestory-indexer/tests/fixtures/fidelity_lab/kotlin_fidelity_lab.kt new file mode 100644 index 00000000..c802c305 --- /dev/null +++ b/crates/codestory-indexer/tests/fixtures/fidelity_lab/kotlin_fidelity_lab.kt @@ -0,0 +1,39 @@ +package app + +import kotlin.math.abs + +interface Notifier { + fun notify(event: Event) +} + +class ConsoleNotifier : Notifier { + override fun notify(event: Event) { + println(event.name) + } +} + +class Repository { + fun save(event: Event) { + println(event.name) + } +} + +class Event(val name: String) + +class Workflow { + fun run(event: Event, notifier: Notifier, repository: Repository) { + notifier.notify(event) + repository.save(event) + decorate(event) + } + + fun decorate(event: Event): String { + return event.name + } +} + +fun orchestrateKotlin() { + val workflow = Workflow() + workflow.run(Event("ready"), ConsoleNotifier(), Repository()) + abs(1) +} diff --git a/crates/codestory-indexer/tests/fixtures/fidelity_lab/php_fidelity_lab.php b/crates/codestory-indexer/tests/fixtures/fidelity_lab/php_fidelity_lab.php new file mode 100644 index 00000000..59294240 --- /dev/null +++ b/crates/codestory-indexer/tests/fixtures/fidelity_lab/php_fidelity_lab.php @@ -0,0 +1,60 @@ +name; + } +} + +final class Repository +{ + public function save(Event $event): void + { + echo $event->name; + } +} + +final class Event +{ + public function __construct(public string $name) + { + } +} + +final class Workflow +{ + public function __construct( + private Notifier $notifier, + private Repository $repository + ) { + } + + public function run(Event $event): void + { + $this->notifier->notify($event); + $this->repository->save($event); + $this->decorate($event); + } + + private function decorate(Event $event): string + { + return $event->name; + } +} + +function orchestrate_php(): void +{ + $workflow = new Workflow(new ConsoleNotifier(), new Repository()); + $workflow->run(new Event((new Randomizer())->getBytes(4))); +} diff --git a/crates/codestory-indexer/tests/fixtures/fidelity_lab/ruby_fidelity_lab.rb b/crates/codestory-indexer/tests/fixtures/fidelity_lab/ruby_fidelity_lab.rb new file mode 100644 index 00000000..f24bf496 --- /dev/null +++ b/crates/codestory-indexer/tests/fixtures/fidelity_lab/ruby_fidelity_lab.rb @@ -0,0 +1,42 @@ +require "logger" + +class Notifier + def notify(event) + event.name + end +end + +class ConsoleNotifier < Notifier + def notify(event) + puts event.name + end +end + +class Repository + def save(event) + event.name + end +end + +Event = Struct.new(:name) + +class Workflow + def initialize(notifier, repository) + @notifier = notifier + @repository = repository + end + + def run(event) + @notifier.notify(event) + @repository.save(event) + decorate(event) + end + + def decorate(event) + event.name + end +end + +def orchestrate_ruby + Workflow.new(ConsoleNotifier.new, Repository.new).run(Event.new("ready")) +end diff --git a/crates/codestory-indexer/tests/fixtures/fidelity_lab/swift_fidelity_lab.swift b/crates/codestory-indexer/tests/fixtures/fidelity_lab/swift_fidelity_lab.swift new file mode 100644 index 00000000..dfc72a6d --- /dev/null +++ b/crates/codestory-indexer/tests/fixtures/fidelity_lab/swift_fidelity_lab.swift @@ -0,0 +1,42 @@ +import Foundation + +protocol Notifier { + func notify(event: Event) +} + +class ConsoleNotifier: Notifier { + func notify(event: Event) { + print(event.name) + } +} + +class Repository { + func save(event: Event) { + print(event.name) + } +} + +class Event { + let name: String + + init(name: String) { + self.name = name + } +} + +class Workflow { + func run(event: Event, notifier: Notifier, repository: Repository) { + notifier.notify(event: event) + repository.save(event: event) + decorate(event: event) + } + + func decorate(event: Event) -> String { + return event.name + } +} + +func orchestrateSwift() { + let workflow = Workflow() + workflow.run(event: Event(name: "ready"), notifier: ConsoleNotifier(), repository: Repository()) +} diff --git a/crates/codestory-indexer/tests/fixtures/tictactoe/bash_tictactoe.sh b/crates/codestory-indexer/tests/fixtures/tictactoe/bash_tictactoe.sh new file mode 100644 index 00000000..ac013f9b --- /dev/null +++ b/crates/codestory-indexer/tests/fixtures/tictactoe/bash_tictactoe.sh @@ -0,0 +1,55 @@ +source ./random.sh + +numberIn() { + echo 1 +} + +numberOut() { + printf "%s\n" "$1" +} + +stringOut() { + printf "%s\n" "$1" +} + +sameInRow() { + token="$1" + amount="$2" + echo "$((token * amount))" +} + +makeMove() { + row="$1" + col="$2" + token="$3" + if [ "$token" -eq 0 ]; then + return 1 + fi + sameInRow "$token" 3 + echo "$row:$col" +} + +turn() { + makeMove 0 0 "$1" +} + +minMax() { + depth="$3" + if [ "$depth" -eq 0 ]; then + echo 0 + return + fi + minMax "$1" "$2" "$((depth - 1))" +} + +run() { + numberIn + stringOut "start" + minMax field 1 3 +} + +main() { + run +} + +main "$@" diff --git a/crates/codestory-indexer/tests/fixtures/tictactoe/dart_tictactoe.dart b/crates/codestory-indexer/tests/fixtures/tictactoe/dart_tictactoe.dart new file mode 100644 index 00000000..55f2a900 --- /dev/null +++ b/crates/codestory-indexer/tests/fixtures/tictactoe/dart_tictactoe.dart @@ -0,0 +1,73 @@ +import 'dart:math'; + +int numberIn() { + return 1; +} + +void numberOut(int num) { + print(num); +} + +void stringOut(String value) { + print(value); +} + +class GameObject { + void announce() {} +} + +class Field extends GameObject { + int left = 9; + + int sameInRow(int token, int amount) { + return token * amount; + } + + bool makeMove(int row, int col, int token) { + if (token == 0) { + return false; + } + left -= row + col; + sameInRow(token, 3); + return true; + } +} + +abstract class Player { + bool turn(Field field, int token); +} + +class HumanPlayer implements Player { + bool turn(Field field, int token) { + return field.makeMove(0, 0, token); + } +} + +class ArtificialPlayer implements Player { + int minMax(Field field, int token, int depth) { + if (depth == 0) { + return 0; + } + return minMax(field, token, depth - 1); + } + + bool turn(Field field, int token) { + minMax(field, token, 3); + return true; + } +} + +class TicTacToe extends GameObject { + final field = Field(); + + void run() { + numberIn(); + stringOut('start'); + Random().nextInt(3); + } +} + +void main() { + final game = TicTacToe(); + game.run(); +} diff --git a/crates/codestory-indexer/tests/fixtures/tictactoe/kotlin_tictactoe.kt b/crates/codestory-indexer/tests/fixtures/tictactoe/kotlin_tictactoe.kt new file mode 100644 index 00000000..5dd83a8d --- /dev/null +++ b/crates/codestory-indexer/tests/fixtures/tictactoe/kotlin_tictactoe.kt @@ -0,0 +1,73 @@ +package tictactoe + +import kotlin.random.Random + +fun numberIn(): Int = 1 + +fun numberOut(num: Int) { + println(num) +} + +fun stringOut(value: String) { + println(value) +} + +open class GameObject { + fun announce() {} +} + +class Field : GameObject() { + var left: Int = 9 + + fun sameInRow(token: Int, amount: Int): Int { + return token * amount + } + + fun makeMove(row: Int, col: Int, token: Int): Boolean { + if (token == 0) { + return false + } + left -= row + col + sameInRow(token, 3) + return true + } +} + +interface Player { + fun turn(field: Field, token: Int): Boolean +} + +class HumanPlayer : Player { + override fun turn(field: Field, token: Int): Boolean { + return field.makeMove(0, 0, token) + } +} + +class ArtificialPlayer : Player { + fun minMax(field: Field, token: Int, depth: Int): Int { + if (depth == 0) { + return 0 + } + return minMax(field, token, depth - 1) + } + + override fun turn(field: Field, token: Int): Boolean { + minMax(field, token, 3) + return true + } +} + +class TicTacToe : GameObject() { + val field = Field() + + fun run() { + numberIn() + stringOut("start") + Random.nextInt(3) + } +} + +fun main() { + val game = TicTacToe() + game.run() +} diff --git a/crates/codestory-indexer/tests/fixtures/tictactoe/swift_tictactoe.swift b/crates/codestory-indexer/tests/fixtures/tictactoe/swift_tictactoe.swift new file mode 100644 index 00000000..1cbf21c0 --- /dev/null +++ b/crates/codestory-indexer/tests/fixtures/tictactoe/swift_tictactoe.swift @@ -0,0 +1,73 @@ +import Foundation + +func numberIn() -> Int { + return 1 +} + +func numberOut(_ num: Int) { + print(num) +} + +func stringOut(_ value: String) { + print(value) +} + +class GameObject { + func announce() {} +} + +class Field: GameObject { + var left = 9 + + func sameInRow(token: Int, amount: Int) -> Int { + return token * amount + } + + func makeMove(row: Int, col: Int, token: Int) -> Bool { + if token == 0 { + return false + } + left -= row + col + sameInRow(token: token, amount: 3) + return true + } +} + +protocol Player { + func turn(field: Field, token: Int) -> Bool +} + +class HumanPlayer: Player { + func turn(field: Field, token: Int) -> Bool { + return field.makeMove(row: 0, col: 0, token: token) + } +} + +class ArtificialPlayer: Player { + func minMax(field: Field, token: Int, depth: Int) -> Int { + if depth == 0 { + return 0 + } + return minMax(field: field, token: token, depth: depth - 1) + } + + func turn(field: Field, token: Int) -> Bool { + minMax(field: field, token: token, depth: 3) + return true + } +} + +class TicTacToe: GameObject { + let field = Field() + + func run() { + numberIn() + stringOut("start") + Int.random(in: 0..<3) + } +} + +func main() { + let game = TicTacToe() + game.run() +} diff --git a/crates/codestory-indexer/tests/import_resolution.rs b/crates/codestory-indexer/tests/import_resolution.rs index 08e99311..86b67415 100644 --- a/crates/codestory-indexer/tests/import_resolution.rs +++ b/crates/codestory-indexer/tests/import_resolution.rs @@ -64,22 +64,11 @@ fn index_workspace( Ok((storage.get_nodes()?, storage.get_edges()?)) } -fn assert_imports_resolved(edges: &[codestory_contracts::graph::Edge]) { - let imports: Vec<_> = edges - .iter() - .filter(|e| e.kind == EdgeKind::IMPORT) - .collect(); - assert!(!imports.is_empty(), "IMPORT edge not found"); - for edge in imports { - if edge.resolved_target.is_some() { - let confidence = edge.confidence.unwrap_or(0.0); - assert!( - confidence >= 0.55, - "Resolved IMPORT edge confidence too low: {}", - confidence - ); - } - } +fn assert_import_edges_extracted(edges: &[codestory_contracts::graph::Edge]) { + assert!( + edges.iter().any(|edge| edge.kind == EdgeKind::IMPORT), + "IMPORT edge not found" + ); } fn matches_name(actual: &str, wanted: &str) -> bool { @@ -94,8 +83,144 @@ fn has_node_kind(nodes: &[codestory_contracts::graph::Node], name: &str, kind: N .any(|node| matches_name(&node.serialized_name, name) && node.kind == kind) } +fn file_path_for_node<'a>( + nodes_by_id: &std::collections::HashMap< + codestory_contracts::graph::NodeId, + &'a codestory_contracts::graph::Node, + >, + node: &codestory_contracts::graph::Node, +) -> Option<&'a str> { + node.file_node_id + .and_then(|file_id| nodes_by_id.get(&file_id).copied()) + .map(|file| file.serialized_name.as_str()) +} + +fn node_in_file<'a>( + nodes: &'a [codestory_contracts::graph::Node], + nodes_by_id: &std::collections::HashMap< + codestory_contracts::graph::NodeId, + &'a codestory_contracts::graph::Node, + >, + name: &str, + kind: NodeKind, + file_suffix: &str, +) -> Option<&'a codestory_contracts::graph::Node> { + nodes.iter().find(|node| { + matches_name(&node.serialized_name, name) + && node.kind == kind + && file_path_for_node(nodes_by_id, node) + .map(|path| path.replace('\\', "/").ends_with(file_suffix)) + .unwrap_or(false) + }) +} + +fn edge_importer_path<'a>( + nodes_by_id: &std::collections::HashMap< + codestory_contracts::graph::NodeId, + &'a codestory_contracts::graph::Node, + >, + edge: &codestory_contracts::graph::Edge, +) -> Option<&'a str> { + if let Some(file_id) = edge.file_node_id { + return nodes_by_id + .get(&file_id) + .map(|file| file.serialized_name.as_str()); + } + + nodes_by_id.get(&edge.source).and_then(|source| { + if source.kind == NodeKind::FILE { + Some(source.serialized_name.as_str()) + } else { + file_path_for_node(nodes_by_id, source) + } + }) +} + +fn path_ends_with(path: &str, suffix: &str) -> bool { + path.replace('\\', "/").ends_with(suffix) +} + +fn assert_import_resolved_to( + nodes: &[codestory_contracts::graph::Node], + edges: &[codestory_contracts::graph::Edge], + importer_suffix: &str, + target_suffix: &str, + target_name: &str, +) { + let nodes_by_id = nodes + .iter() + .map(|node| (node.id, node)) + .collect::>(); + + let resolved = edges.iter().any(|edge| { + if edge.kind != EdgeKind::IMPORT { + return false; + } + if !edge_importer_path(&nodes_by_id, edge) + .map(|path| path_ends_with(path, importer_suffix)) + .unwrap_or(false) + { + return false; + } + if edge.confidence.unwrap_or(0.0) < 0.55 { + return false; + } + + let Some(target_id) = edge.resolved_target else { + return false; + }; + let Some(target) = nodes_by_id.get(&target_id) else { + return false; + }; + + matches_name(&target.serialized_name, target_name) + && file_path_for_node(&nodes_by_id, target) + .map(|path| path_ends_with(path, target_suffix)) + .unwrap_or(false) + }); + + if !resolved { + let import_edges = edges + .iter() + .filter(|edge| edge.kind == EdgeKind::IMPORT) + .map(|edge| { + let importer = edge_importer_path(&nodes_by_id, edge).unwrap_or(""); + let source = nodes_by_id + .get(&edge.source) + .map(|node| node.serialized_name.as_str()) + .unwrap_or(""); + let target = nodes_by_id + .get(&edge.target) + .map(|node| node.serialized_name.as_str()) + .unwrap_or(""); + let resolved = edge + .resolved_target + .and_then(|target_id| nodes_by_id.get(&target_id).copied()) + .map(|node| node.serialized_name.as_str()) + .unwrap_or(""); + format!( + "{importer}: {source} -> {target} resolved={resolved} confidence={:?}", + edge.confidence + ) + }) + .collect::>(); + let target_candidates = nodes + .iter() + .filter(|node| matches_name(&node.serialized_name, target_name)) + .map(|node| { + let file = file_path_for_node(&nodes_by_id, node).unwrap_or(""); + format!("{:?} {} in {file}", node.kind, node.serialized_name) + }) + .collect::>(); + + panic!( + "expected import from {importer_suffix} to resolve to {target_name} in {target_suffix}. IMPORT edges: {import_edges:?}. Target candidates: {target_candidates:?}" + ); + } +} + #[test] -fn test_import_resolution_across_languages() -> anyhow::Result<()> { +fn test_import_edges_are_extracted_across_languages() -> anyhow::Result<()> { let cases = [ ( "main.ts", @@ -123,92 +248,51 @@ fn main() {} for (filename, code) in cases { let edges = index_single_file(filename, code)?; - assert_imports_resolved(&edges); + assert_import_edges_extracted(&edges); } Ok(()) } #[test] -fn test_cross_file_alias_default_named_and_type_imports() -> anyhow::Result<()> { +fn test_cross_file_imports_resolve_to_indexed_targets() -> anyhow::Result<()> { let (nodes, edges) = index_workspace(&[ ( - "main.rs", + "src/foo.ts", r#" -mod lib; -use crate::lib::Repository as Repo; -fn run() { - let _repository = Repo::new(); +export interface Foo { + id: number; } "#, ), ( - "lib.rs", + "src/main.ts", + r#" +import type { Foo } from "./foo"; +const value: Foo = { id: 1 }; +"#, + ), + ( + "src/widget.rs", + r#" +pub struct Widget; +"#, + ), + ( + "src/lib.rs", r#" -pub struct Repository; -impl Repository { - pub fn new() -> Self { Self } +mod widget; +use crate::widget::Widget; + +pub fn make_widget() -> Widget { + Widget } "#, ), ])?; - let main_file = nodes - .iter() - .find(|node| { - node.kind == codestory_contracts::graph::NodeKind::FILE - && node.serialized_name.contains("main.rs") - }) - .or_else(|| { - nodes - .iter() - .find(|node| node.kind == codestory_contracts::graph::NodeKind::FILE) - }) - .ok_or_else(|| anyhow::anyhow!("main.rs file node not found"))?; - let node_by_id = nodes - .iter() - .map(|node| (node.id, node)) - .collect::>(); - - let mut import_edges: Vec<_> = edges - .iter() - .filter(|edge| edge.kind == EdgeKind::IMPORT && edge.file_node_id == Some(main_file.id)) - .collect(); - if import_edges.is_empty() { - import_edges = edges - .iter() - .filter(|edge| edge.kind == EdgeKind::IMPORT) - .collect(); - } - - assert!( - !import_edges.is_empty(), - "expected IMPORT edges from main.rs" - ); - - let mut resolved_to_same_file = 0usize; - let mut unresolved_edges = 0usize; - for edge in import_edges { - let Some(target_id) = edge.resolved_target else { - unresolved_edges += 1; - continue; - }; - let Some(target) = node_by_id.get(&target_id) else { - continue; - }; - if target.file_node_id == Some(main_file.id) { - resolved_to_same_file += 1; - } - } - - assert!( - resolved_to_same_file == 0, - "import should not resolve back to symbols in the caller file" - ); - assert!( - unresolved_edges > 0, - "expected unresolved imports to remain explicit when cross-file resolution is uncertain" - ); + assert_import_resolved_to(&nodes, &edges, "src/main.ts", "src/foo.ts", "Foo"); + assert_import_resolved_to(&nodes, &edges, "src/lib.rs", "src/widget.rs", "Widget"); Ok(()) } @@ -273,3 +357,150 @@ async function load() { Ok(()) } + +#[test] +fn test_javascript_static_import_aliases_resolve_and_feed_constructor_calls() -> anyhow::Result<()> +{ + let (nodes, edges) = index_workspace(&[ + ( + "app.js", + r#" +import Client from "./client.js"; + +function makeClient() { + const client = new Client(); + return client; +} +"#, + ), + ( + "client.js", + r#" +class Client { + request() {} +} + +export default Client; +"#, + ), + ])?; + + let nodes_by_id = nodes + .iter() + .map(|node| (node.id, node)) + .collect::>(); + let make_client = node_in_file( + &nodes, + &nodes_by_id, + "makeClient", + NodeKind::FUNCTION, + "app.js", + ) + .ok_or_else(|| anyhow::anyhow!("makeClient node not found"))?; + let import_alias = node_in_file(&nodes, &nodes_by_id, "Client", NodeKind::UNKNOWN, "app.js") + .ok_or_else(|| anyhow::anyhow!("Client import alias node not found"))?; + let imported_class = node_in_file(&nodes, &nodes_by_id, "Client", NodeKind::CLASS, "client.js") + .ok_or_else(|| anyhow::anyhow!("Client class node not found"))?; + + assert!( + edges.iter().any(|edge| { + edge.kind == EdgeKind::CALL + && edge.source == make_client.id + && edge.effective_target() == import_alias.id + }), + "new Client() should create a CALL edge from makeClient to the imported alias" + ); + assert!( + edges.iter().any(|edge| { + edge.kind == EdgeKind::IMPORT + && edge.source == import_alias.id + && edge.effective_target() == imported_class.id + && edge.certainty.is_some_and(|certainty| { + certainty == codestory_contracts::graph::ResolutionCertainty::Certain + }) + }), + "Client default import should resolve by relative path to the class in client.js" + ); + + Ok(()) +} + +#[test] +fn test_javascript_bound_function_receiver_calls_imported_default() -> anyhow::Result<()> { + let (nodes, edges) = index_workspace(&[ + ( + "client.js", + r#" +import dispatchRequest from "./dispatchRequest.js"; + +class Client { + _request(config) { + return dispatchRequest.call(this, config); + } +} + +export default Client; +"#, + ), + ( + "dispatchRequest.js", + r#" +export default function dispatchRequest(config) { + return config; +} +"#, + ), + ])?; + + let nodes_by_id = nodes + .iter() + .map(|node| (node.id, node)) + .collect::>(); + let request_method = node_in_file( + &nodes, + &nodes_by_id, + "_request", + NodeKind::METHOD, + "client.js", + ) + .ok_or_else(|| anyhow::anyhow!("Client._request method not found"))?; + let import_alias = node_in_file( + &nodes, + &nodes_by_id, + "dispatchRequest", + NodeKind::UNKNOWN, + "client.js", + ) + .ok_or_else(|| anyhow::anyhow!("dispatchRequest import alias not found"))?; + let imported_function = node_in_file( + &nodes, + &nodes_by_id, + "dispatchRequest", + NodeKind::FUNCTION, + "dispatchRequest.js", + ) + .ok_or_else(|| anyhow::anyhow!("dispatchRequest function not found"))?; + + assert!( + edges.iter().any(|edge| { + edge.kind == EdgeKind::CALL + && edge.source == request_method.id + && edge.target == import_alias.id + && edge.effective_target() == import_alias.id + && edge.certainty.is_some_and(|certainty| { + certainty == codestory_contracts::graph::ResolutionCertainty::Certain + }) + }), + "dispatchRequest.call(...) should create a certain CALL edge to the imported dispatchRequest alias" + ); + assert!( + edges.iter().any(|edge| { + edge.kind == EdgeKind::IMPORT + && edge.source == import_alias.id + && edge.effective_target() == imported_function.id + }), + "dispatchRequest default import should resolve by relative path to dispatchRequest.js" + ); + + Ok(()) +} diff --git a/crates/codestory-indexer/tests/integration.rs b/crates/codestory-indexer/tests/integration.rs index 4a352ae7..8e744acc 100644 --- a/crates/codestory-indexer/tests/integration.rs +++ b/crates/codestory-indexer/tests/integration.rs @@ -2,7 +2,7 @@ use codestory_contracts::events::EventBus; use codestory_contracts::graph::{ AccessKind, EdgeKind, NodeId, NodeKind, OccurrenceKind, ResolutionCertainty, }; -use codestory_indexer::resolution::ResolutionPass; +use codestory_indexer::resolution::{RESOLUTION_SUPPORT_SNAPSHOT_VERSION, ResolutionPass}; use codestory_indexer::{IncrementalIndexingStats, WorkspaceIndexer}; use codestory_store::Store as Storage; use std::fs; @@ -258,7 +258,7 @@ fn test_incremental_indexing_second_run_reuses_unchanged_extraction_cache_and_re assert_eq!(first_stats.artifact_cache_hits, 0); assert_eq!(first_stats.artifact_cache_misses, 1); assert!(!first_stats.resolution_support_snapshot_hit); - assert!(storage.has_ready_resolution_support_snapshot(1)?); + assert!(storage.has_ready_resolution_support_snapshot(RESOLUTION_SUPPORT_SNAPSHOT_VERSION)?); let second_stats = run_incremental_indexing(root, &mut storage, vec![file_path.clone()])?; assert_eq!(second_stats.artifact_cache_hits, 1); @@ -438,7 +438,7 @@ func (r *Router) Handle(path string) {} assert!( before_nodes .iter() - .any(|node| node.serialized_name == "StrictSlash"), + .any(|node| node.serialized_name == "Router.StrictSlash"), "expected initial Go parser-backed method projection" ); let file_id = before_nodes @@ -471,7 +471,7 @@ func (r *Router) Handle(path string) {} assert!( !after_nodes .iter() - .any(|node| node.serialized_name == "StrictSlash"), + .any(|node| node.serialized_name.ends_with(".StrictSlash")), "stale Go method should be removed after structural refresh" ); let states_after = storage.get_callable_projection_states_for_file(file_id.0)?; diff --git a/crates/codestory-indexer/tests/oss_language_corpus.rs b/crates/codestory-indexer/tests/oss_language_corpus.rs new file mode 100644 index 00000000..78777b92 --- /dev/null +++ b/crates/codestory-indexer/tests/oss_language_corpus.rs @@ -0,0 +1,903 @@ +use anyhow::{Context, Result, bail}; +use codestory_contracts::events::EventBus; +use codestory_indexer::{ + IncrementalIndexingStats, WorkspaceIndexer, language_support_profile_for_ext, + language_support_profile_for_language_name, +}; +use codestory_store::Store as Storage; +use codestory_workspace::{BuildMode, RefreshInfo}; +use serde_json::json; +use std::collections::{BTreeMap, BTreeSet, HashMap, HashSet}; +use std::env; +use std::fs::{self, File}; +use std::io::{BufWriter, Write}; +use std::path::{Path, PathBuf}; +use std::process::Command; +use std::time::Instant; + +#[derive(Debug, Clone, Copy)] +struct OssCorpusCase { + language: &'static str, + repo_name: &'static str, + repo_url: &'static str, + commit: &'static str, + project_subdir: Option<&'static str>, + extensions: &'static [&'static str], + min_baseline_files: usize, + min_baseline_loc: usize, + min_indexed_files: usize, + min_nodes: usize, + max_errors: usize, +} + +#[derive(Debug)] +struct RawBaseline { + files: Vec, + file_count: usize, + loc: usize, +} + +#[derive(Debug)] +struct CorpusReport { + language: &'static str, + repo_name: &'static str, + commit: &'static str, + raw_files: usize, + raw_loc: usize, + codestory_input_files: usize, + codestory_stored_files: usize, + codestory_indexed_files: usize, + nodes: usize, + edges: usize, + errors: usize, + fatal_errors: usize, + error_samples: Vec, + checkout_ms: u128, + baseline_ms: u128, + index_ms: u128, + stats: IncrementalIndexingStats, +} + +const SUPPORTED_LANGUAGE_NAMES: &[&str] = &[ + "python", + "java", + "rust", + "javascript", + "typescript", + "cpp", + "c", + "go", + "ruby", + "php", + "csharp", + "kotlin", + "swift", + "dart", + "bash", + "html", + "css", + "sql", +]; + +const SKIPPED_DIRS: &[&str] = &[ + ".git", + ".gradle", + ".idea", + ".vscode", + ".build", + ".dart_tool", + ".swiftpm", + "bin", + "build", + "coverage", + "dist", + "node_modules", + "obj", + "packages", + "target", + "tmp", + "vendor", +]; + +const OSS_CORPUS: &[OssCorpusCase] = &[ + OssCorpusCase { + language: "python", + repo_name: "psf/requests", + repo_url: "https://github.com/psf/requests.git", + commit: "6f66281a1d6326b1b9c4ac09ca30de0fc4e6ef43", + project_subdir: None, + extensions: &["py", "pyi"], + min_baseline_files: 20, + min_baseline_loc: 3_000, + min_indexed_files: 20, + min_nodes: 25, + max_errors: 0, + }, + OssCorpusCase { + language: "java", + repo_name: "apache/commons-lang", + repo_url: "https://github.com/apache/commons-lang.git", + commit: "57f39420fef8413ea42f045f1bdba4864ff75a0c", + project_subdir: None, + extensions: &["java"], + min_baseline_files: 100, + min_baseline_loc: 20_000, + min_indexed_files: 100, + min_nodes: 25, + max_errors: 0, + }, + OssCorpusCase { + language: "rust", + repo_name: "BurntSushi/ripgrep", + repo_url: "https://github.com/BurntSushi/ripgrep.git", + commit: "82313cf95849bfe425109ad9506a52154879b1b1", + project_subdir: None, + extensions: &["rs"], + min_baseline_files: 100, + min_baseline_loc: 20_000, + min_indexed_files: 100, + min_nodes: 25, + max_errors: 0, + }, + OssCorpusCase { + language: "javascript", + repo_name: "expressjs/express", + repo_url: "https://github.com/expressjs/express.git", + commit: "dae209ae6559c29cfca2a1f4414c51d89ea643d5", + project_subdir: None, + extensions: &["js", "jsx", "mjs", "cjs"], + min_baseline_files: 50, + min_baseline_loc: 5_000, + min_indexed_files: 50, + min_nodes: 25, + max_errors: 0, + }, + OssCorpusCase { + language: "typescript", + repo_name: "vercel/swr", + repo_url: "https://github.com/vercel/swr.git", + commit: "f8d4995ac555f02a2784c8fc40bc819782c60568", + project_subdir: None, + extensions: &["ts", "tsx", "mts", "cts"], + min_baseline_files: 100, + min_baseline_loc: 10_000, + min_indexed_files: 100, + min_nodes: 25, + max_errors: 0, + }, + OssCorpusCase { + language: "cpp", + repo_name: "fmtlib/fmt", + repo_url: "https://github.com/fmtlib/fmt.git", + commit: "e8deaf2ec3b53ced589fce6f640061e5b32eeeaa", + project_subdir: None, + extensions: &["cpp", "cc", "cxx", "hpp", "hh", "hxx"], + min_baseline_files: 40, + min_baseline_loc: 10_000, + min_indexed_files: 40, + min_nodes: 25, + max_errors: 0, + }, + OssCorpusCase { + language: "c", + repo_name: "redis/redis", + repo_url: "https://github.com/redis/redis.git", + commit: "df63a65d4d4ee33ae67e9f101885074febe0bccb", + project_subdir: None, + extensions: &["c", "h"], + min_baseline_files: 250, + min_baseline_loc: 100_000, + min_indexed_files: 250, + min_nodes: 25, + max_errors: 0, + }, + OssCorpusCase { + language: "go", + repo_name: "gin-gonic/gin", + repo_url: "https://github.com/gin-gonic/gin.git", + commit: "d75fcd4c9ab260e5225de590f1f0f8c0e0e12d11", + project_subdir: None, + extensions: &["go"], + min_baseline_files: 80, + min_baseline_loc: 8_000, + min_indexed_files: 80, + min_nodes: 25, + max_errors: 0, + }, + OssCorpusCase { + language: "ruby", + repo_name: "jekyll/jekyll", + repo_url: "https://github.com/jekyll/jekyll.git", + commit: "202df571314ba1d18e9fccd81d12aaad4a703c38", + project_subdir: None, + extensions: &["rb"], + min_baseline_files: 100, + min_baseline_loc: 10_000, + min_indexed_files: 100, + min_nodes: 25, + max_errors: 0, + }, + OssCorpusCase { + language: "php", + repo_name: "Seldaek/monolog", + repo_url: "https://github.com/Seldaek/monolog.git", + commit: "04c3499db98d7471abd9261dc83232f8fe1a252d", + project_subdir: None, + extensions: &["php"], + min_baseline_files: 50, + min_baseline_loc: 5_000, + min_indexed_files: 50, + min_nodes: 25, + max_errors: 0, + }, + OssCorpusCase { + language: "csharp", + repo_name: "AutoMapper/AutoMapper", + repo_url: "https://github.com/AutoMapper/AutoMapper.git", + commit: "b57c206dc7291821e42bdf816a5637a5c1d8cb54", + project_subdir: None, + extensions: &["cs"], + min_baseline_files: 150, + min_baseline_loc: 15_000, + min_indexed_files: 150, + min_nodes: 25, + max_errors: 0, + }, + OssCorpusCase { + language: "kotlin", + repo_name: "square/okio", + repo_url: "https://github.com/square/okio.git", + commit: "722c8be0043d99b7b08d169b0ae90a24c15267ff", + project_subdir: None, + extensions: &["kt", "kts"], + min_baseline_files: 100, + min_baseline_loc: 10_000, + min_indexed_files: 100, + min_nodes: 25, + max_errors: 0, + }, + OssCorpusCase { + language: "swift", + repo_name: "Alamofire/Alamofire", + repo_url: "https://github.com/Alamofire/Alamofire.git", + commit: "7595cbcf59809f9977c5f6378500de2ad73b7ddb", + project_subdir: None, + extensions: &["swift"], + min_baseline_files: 50, + min_baseline_loc: 10_000, + min_indexed_files: 50, + min_nodes: 25, + max_errors: 0, + }, + OssCorpusCase { + language: "dart", + repo_name: "dart-lang/http", + repo_url: "https://github.com/dart-lang/http.git", + commit: "89cec60a4249ae0a0316f7a50d37ac56597f52c3", + project_subdir: None, + extensions: &["dart"], + min_baseline_files: 50, + min_baseline_loc: 5_000, + min_indexed_files: 50, + min_nodes: 25, + max_errors: 0, + }, + OssCorpusCase { + language: "bash", + repo_name: "nvm-sh/nvm", + repo_url: "https://github.com/nvm-sh/nvm.git", + commit: "7079a5d61c2b49c7d35a72006860ce5edb0fac51", + project_subdir: None, + extensions: &["sh", "bash"], + min_baseline_files: 5, + min_baseline_loc: 3_000, + min_indexed_files: 5, + min_nodes: 10, + max_errors: 0, + }, + OssCorpusCase { + language: "html", + repo_name: "mdn/learning-area", + repo_url: "https://github.com/mdn/learning-area.git", + commit: "ca1ff0bd06e12b96a6742ffdf040bb22966e5a5e", + project_subdir: None, + extensions: &["html", "htm"], + min_baseline_files: 300, + min_baseline_loc: 20_000, + min_indexed_files: 300, + min_nodes: 20, + max_errors: 0, + }, + OssCorpusCase { + language: "css", + repo_name: "animate-css/animate.css", + repo_url: "https://github.com/animate-css/animate.css.git", + commit: "3f8ab233dbbd9d2fe577528d2296382954be3d1a", + project_subdir: None, + extensions: &["css"], + min_baseline_files: 20, + min_baseline_loc: 1_000, + min_indexed_files: 20, + min_nodes: 10, + max_errors: 0, + }, + OssCorpusCase { + language: "sql", + repo_name: "lerocha/chinook-database", + repo_url: "https://github.com/lerocha/chinook-database.git", + commit: "7f67772503d71ba90f19283c38e93923addb43fa", + project_subdir: None, + extensions: &["sql"], + min_baseline_files: 10, + min_baseline_loc: 10_000, + min_indexed_files: 10, + min_nodes: 10, + max_errors: 0, + }, +]; + +#[test] +#[ignore = "external OSS corpus; set CODESTORY_RUN_OSS_LANGUAGE_CORPUS=1 to clone/index or CODESTORY_OSS_CORPUS_DRY_RUN=1 for manifest validation"] +fn oss_language_corpus_compares_raw_baseline_to_codestory() -> Result<()> { + validate_manifest()?; + + let dry_run = env_flag("CODESTORY_OSS_CORPUS_DRY_RUN"); + let run_corpus = env_flag("CODESTORY_RUN_OSS_LANGUAGE_CORPUS"); + let selected_languages = selected_languages()?; + let selected_cases = selected_cases(selected_languages.as_ref())?; + + if dry_run && !run_corpus { + println!( + "validated {} OSS corpus manifest entries without cloning or indexing", + selected_cases.len() + ); + return Ok(()); + } + + if !run_corpus { + bail!( + "set CODESTORY_RUN_OSS_LANGUAGE_CORPUS=1 to clone and index the OSS corpus, \ + or CODESTORY_OSS_CORPUS_DRY_RUN=1 to validate only the manifest" + ); + } + + let cache_root = corpus_cache_root(); + fs::create_dir_all(&cache_root) + .with_context(|| format!("creating corpus cache {}", cache_root.display()))?; + let report_path = target_artifact_root() + .join("oss-language-corpus") + .join("reports") + .join("oss-language-corpus-latest.jsonl"); + if let Some(parent) = report_path.parent() { + fs::create_dir_all(parent) + .with_context(|| format!("creating report directory {}", parent.display()))?; + } + let mut writer = BufWriter::new( + File::create(&report_path) + .with_context(|| format!("creating report {}", report_path.display()))?, + ); + + let mut failures = Vec::new(); + for case in selected_cases { + match run_case(case, &cache_root) { + Ok(report) => { + let row = report_json(&report); + writeln!(writer, "{row}")?; + println!( + "{}: raw_files={} raw_loc={} codestory_indexed_files={} nodes={} edges={} errors={} index_ms={}", + report.language, + report.raw_files, + report.raw_loc, + report.codestory_indexed_files, + report.nodes, + report.edges, + report.errors, + report.index_ms + ); + } + Err(error) => { + let row = json!({ + "language": case.language, + "repo_name": case.repo_name, + "commit": case.commit, + "status": "failed", + "error": format!("{error:#}"), + }); + writeln!(writer, "{row}")?; + failures.push(format!("{}: {error:#}", case.language)); + } + } + } + + writer.flush()?; + println!( + "wrote OSS language corpus report to {}", + report_path.display() + ); + + if !failures.is_empty() { + bail!("OSS language corpus failures:\n{}", failures.join("\n")); + } + + Ok(()) +} + +fn run_case(case: &OssCorpusCase, cache_root: &Path) -> Result { + let checkout_started = Instant::now(); + let checkout_root = ensure_checkout(case, cache_root)?; + let project_root = if let Some(subdir) = case.project_subdir { + checkout_root.join(subdir) + } else { + checkout_root + } + .canonicalize() + .with_context(|| format!("canonicalizing project root for {}", case.repo_name))?; + let checkout_ms = checkout_started.elapsed().as_millis(); + + let baseline_started = Instant::now(); + let baseline = raw_baseline(&project_root, case.extensions)?; + let baseline_ms = baseline_started.elapsed().as_millis(); + assert_baseline_thresholds(case, &baseline)?; + + let index_started = Instant::now(); + let mut storage = Storage::new_in_memory()?; + let event_bus = EventBus::new(); + let refresh_info = RefreshInfo { + mode: BuildMode::Incremental, + files_to_index: baseline.files.clone(), + files_to_remove: Vec::new(), + existing_file_ids: HashMap::new(), + }; + let indexer = WorkspaceIndexer::new(project_root); + let stats = indexer.run_incremental(&mut storage, &refresh_info, &event_bus, None)?; + let index_ms = index_started.elapsed().as_millis(); + + let stored_files = storage.get_files()?; + let codestory_indexed_files = stored_files.iter().filter(|file| file.indexed).count(); + let nodes = storage.get_nodes()?; + let edges = storage.get_edges()?; + let errors = storage.get_errors(None)?; + let fatal_errors = errors.iter().filter(|error| error.is_fatal).count(); + let error_samples = errors + .iter() + .take(10) + .map(|error| { + format!( + "{:?}: fatal={} line={:?} column={:?}: {}", + error.index_step, error.is_fatal, error.line, error.column, error.message + ) + }) + .collect(); + + let report = CorpusReport { + language: case.language, + repo_name: case.repo_name, + commit: case.commit, + raw_files: baseline.file_count, + raw_loc: baseline.loc, + codestory_input_files: baseline.files.len(), + codestory_stored_files: stored_files.len(), + codestory_indexed_files, + nodes: nodes.len(), + edges: edges.len(), + errors: errors.len(), + fatal_errors, + error_samples, + checkout_ms, + baseline_ms, + index_ms, + stats, + }; + + assert_codestory_thresholds(case, &report)?; + Ok(report) +} + +fn validate_manifest() -> Result<()> { + let expected: BTreeSet<&str> = SUPPORTED_LANGUAGE_NAMES.iter().copied().collect(); + let actual: BTreeSet<&str> = OSS_CORPUS.iter().map(|case| case.language).collect(); + if expected != actual { + let missing: Vec<&str> = expected.difference(&actual).copied().collect(); + let extra: Vec<&str> = actual.difference(&expected).copied().collect(); + bail!( + "OSS corpus manifest must match supported language names; missing={missing:?} extra={extra:?}" + ); + } + + let mut repos = HashSet::new(); + for case in OSS_CORPUS { + if !repos.insert(case.repo_name) { + bail!("duplicate OSS corpus repo {}", case.repo_name); + } + let profile = language_support_profile_for_language_name(case.language) + .with_context(|| format!("{} is not a supported language", case.language))?; + if profile.language_name != case.language { + bail!( + "{} profile normalized to unexpected language {}", + case.language, + profile.language_name + ); + } + for extension in case.extensions { + let ext_profile = language_support_profile_for_ext(extension).with_context(|| { + format!( + "{} corpus extension .{} is not routed by CodeStory", + case.language, extension + ) + })?; + if ext_profile.language_name != case.language { + bail!( + "{} corpus extension .{} routes to {}, not {}", + case.language, + extension, + ext_profile.language_name, + case.language + ); + } + } + } + + Ok(()) +} + +fn selected_cases( + selected_languages: Option<&HashSet>, +) -> Result> { + let cases: Vec<&OssCorpusCase> = OSS_CORPUS + .iter() + .filter(|case| { + selected_languages + .map(|languages| languages.contains(case.language)) + .unwrap_or(true) + }) + .collect(); + if cases.is_empty() { + bail!("CODESTORY_OSS_CORPUS_LANGUAGES did not select any corpus cases"); + } + Ok(cases) +} + +fn selected_languages() -> Result>> { + let value = match env::var("CODESTORY_OSS_CORPUS_LANGUAGES") { + Ok(value) if !value.trim().is_empty() => value, + _ => return Ok(None), + }; + + let supported: HashSet<&str> = SUPPORTED_LANGUAGE_NAMES.iter().copied().collect(); + let mut selected = HashSet::new(); + for part in value.split(',') { + let language = part.trim().to_ascii_lowercase(); + if language.is_empty() { + continue; + } + if !supported.contains(language.as_str()) { + bail!("unknown CODESTORY_OSS_CORPUS_LANGUAGES entry {language}"); + } + selected.insert(language); + } + Ok(Some(selected)) +} + +fn assert_baseline_thresholds(case: &OssCorpusCase, baseline: &RawBaseline) -> Result<()> { + if baseline.file_count < case.min_baseline_files { + bail!( + "{} raw baseline found {} files, below threshold {}", + case.language, + baseline.file_count, + case.min_baseline_files + ); + } + if baseline.loc < case.min_baseline_loc { + bail!( + "{} raw baseline found {} LOC, below threshold {}", + case.language, + baseline.loc, + case.min_baseline_loc + ); + } + Ok(()) +} + +fn assert_codestory_thresholds(case: &OssCorpusCase, report: &CorpusReport) -> Result<()> { + if report.codestory_input_files != report.raw_files { + bail!( + "{} CodeStory input file count {} did not match raw baseline count {}", + case.language, + report.codestory_input_files, + report.raw_files + ); + } + if report.codestory_stored_files != report.raw_files { + bail!( + "{} CodeStory stored {} files, but raw baseline found {}", + case.language, + report.codestory_stored_files, + report.raw_files + ); + } + if report.codestory_indexed_files < case.min_indexed_files { + bail!( + "{} CodeStory indexed {} files, below threshold {}", + case.language, + report.codestory_indexed_files, + case.min_indexed_files + ); + } + if report.nodes < case.min_nodes { + bail!( + "{} CodeStory emitted {} nodes, below threshold {}", + case.language, + report.nodes, + case.min_nodes + ); + } + if report.errors > case.max_errors { + bail!( + "{} CodeStory emitted {} errors, above threshold {}; samples: {:?}", + case.language, + report.errors, + case.max_errors, + report.error_samples + ); + } + if report.fatal_errors > 0 { + bail!( + "{} CodeStory emitted {} fatal errors", + case.language, + report.fatal_errors + ); + } + Ok(()) +} + +fn raw_baseline(root: &Path, extensions: &[&str]) -> Result { + let wanted: HashSet = extensions + .iter() + .map(|extension| normalize_extension(extension)) + .collect(); + let mut files = Vec::new(); + collect_matching_files(root, &wanted, &mut files)?; + files.sort(); + + let mut loc = 0usize; + for file in &files { + let bytes = fs::read(file).with_context(|| format!("reading {}", file.display()))?; + if bytes.is_empty() { + continue; + } + loc += bytes.iter().filter(|byte| **byte == b'\n').count(); + if !bytes.ends_with(b"\n") { + loc += 1; + } + } + + Ok(RawBaseline { + file_count: files.len(), + files, + loc, + }) +} + +fn collect_matching_files( + dir: &Path, + wanted_extensions: &HashSet, + out: &mut Vec, +) -> Result<()> { + let entries = + fs::read_dir(dir).with_context(|| format!("reading directory {}", dir.display()))?; + for entry in entries { + let entry = entry?; + let path = entry.path(); + let file_type = entry.file_type()?; + if file_type.is_dir() { + if should_skip_dir(&path) { + continue; + } + collect_matching_files(&path, wanted_extensions, out)?; + } else if file_type.is_file() && path_has_extension(&path, wanted_extensions) { + out.push(path); + } + } + Ok(()) +} + +fn should_skip_dir(path: &Path) -> bool { + path.file_name() + .and_then(|name| name.to_str()) + .map(|name| SKIPPED_DIRS.contains(&name)) + .unwrap_or(false) +} + +fn path_has_extension(path: &Path, wanted_extensions: &HashSet) -> bool { + path.extension() + .and_then(|extension| extension.to_str()) + .map(normalize_extension) + .map(|extension| wanted_extensions.contains(&extension)) + .unwrap_or(false) +} + +fn ensure_checkout(case: &OssCorpusCase, cache_root: &Path) -> Result { + let checkout_root = cache_root.join(sanitize_repo_name(case.repo_name)); + fs::create_dir_all(&checkout_root) + .with_context(|| format!("creating checkout cache {}", checkout_root.display()))?; + + if !checkout_root.join(".git").is_dir() { + if fs::read_dir(&checkout_root)?.next().is_some() { + bail!( + "cache path {} exists but is not an empty git checkout", + checkout_root.display() + ); + } + run_git(&checkout_root, &["init"])?; + run_git(&checkout_root, &["remote", "add", "origin", case.repo_url])?; + } else { + run_git( + &checkout_root, + &["remote", "set-url", "origin", case.repo_url], + )?; + } + + run_git( + &checkout_root, + &["fetch", "--depth", "1", "origin", case.commit], + )?; + run_git( + &checkout_root, + &[ + "-c", + "advice.detachedHead=false", + "checkout", + "--detach", + "FETCH_HEAD", + ], + )?; + let head = git_stdout(&checkout_root, &["rev-parse", "HEAD"])?; + if head.trim() != case.commit { + bail!( + "{} checkout head {} did not match expected {}", + case.repo_name, + head.trim(), + case.commit + ); + } + + Ok(checkout_root) +} + +fn run_git(cwd: &Path, args: &[&str]) -> Result<()> { + let output = Command::new("git") + .args(args) + .current_dir(cwd) + .output() + .with_context(|| format!("running git {:?} in {}", args, cwd.display()))?; + if !output.status.success() { + bail!( + "git {:?} failed in {}\nstdout:\n{}\nstderr:\n{}", + args, + cwd.display(), + String::from_utf8_lossy(&output.stdout), + String::from_utf8_lossy(&output.stderr) + ); + } + Ok(()) +} + +fn git_stdout(cwd: &Path, args: &[&str]) -> Result { + let output = Command::new("git") + .args(args) + .current_dir(cwd) + .output() + .with_context(|| format!("running git {:?} in {}", args, cwd.display()))?; + if !output.status.success() { + bail!( + "git {:?} failed in {}\nstdout:\n{}\nstderr:\n{}", + args, + cwd.display(), + String::from_utf8_lossy(&output.stdout), + String::from_utf8_lossy(&output.stderr) + ); + } + Ok(String::from_utf8_lossy(&output.stdout).to_string()) +} + +fn corpus_cache_root() -> PathBuf { + env::var_os("CODESTORY_OSS_CORPUS_CACHE") + .map(PathBuf::from) + .unwrap_or_else(|| { + target_artifact_root() + .join("oss-language-corpus") + .join("repos") + }) +} + +fn target_artifact_root() -> PathBuf { + if let Some(target_dir) = env::var_os("CARGO_TARGET_DIR").map(PathBuf::from) { + if target_dir.is_absolute() { + target_dir + } else { + workspace_root().join(target_dir) + } + } else { + workspace_root().join("target") + } +} + +fn workspace_root() -> PathBuf { + Path::new(env!("CARGO_MANIFEST_DIR")) + .parent() + .and_then(Path::parent) + .map(Path::to_path_buf) + .unwrap_or_else(|| PathBuf::from(env!("CARGO_MANIFEST_DIR"))) +} + +fn report_json(report: &CorpusReport) -> serde_json::Value { + let mut timings = BTreeMap::new(); + timings.insert("checkout_ms", report.checkout_ms); + timings.insert("raw_baseline_ms", report.baseline_ms); + timings.insert("codestory_index_ms", report.index_ms); + + json!({ + "language": report.language, + "repo_name": report.repo_name, + "commit": report.commit, + "status": "passed", + "raw_without_codestory": { + "files": report.raw_files, + "loc": report.raw_loc, + }, + "with_codestory": { + "input_files": report.codestory_input_files, + "stored_files": report.codestory_stored_files, + "indexed_files": report.codestory_indexed_files, + "nodes": report.nodes, + "edges": report.edges, + "errors": report.errors, + "fatal_errors": report.fatal_errors, + "error_samples": report.error_samples, + }, + "timings": timings, + "indexing_stats": { + "parse_index_ms": report.stats.parse_index_ms, + "edge_resolution_ms": report.stats.edge_resolution_ms, + "artifact_cache_hits": report.stats.artifact_cache_hits, + "artifact_cache_misses": report.stats.artifact_cache_misses, + "artifact_cache_writes": report.stats.artifact_cache_writes, + "resolved_calls": report.stats.resolved_calls, + "resolved_imports": report.stats.resolved_imports, + "unresolved_calls_end": report.stats.unresolved_calls_end, + "unresolved_imports_end": report.stats.unresolved_imports_end, + "resolution_ran": report.stats.resolution_ran, + }, + }) +} + +fn normalize_extension(extension: &str) -> String { + extension + .trim() + .trim_start_matches('.') + .to_ascii_lowercase() +} + +fn sanitize_repo_name(repo_name: &str) -> String { + repo_name + .chars() + .map(|ch| { + if ch.is_ascii_alphanumeric() || ch == '-' || ch == '_' { + ch + } else { + '-' + } + }) + .collect() +} + +fn env_flag(name: &str) -> bool { + env::var(name) + .map(|value| { + matches!( + value.trim(), + "1" | "true" | "TRUE" | "yes" | "YES" | "on" | "ON" + ) + }) + .unwrap_or(false) +} diff --git a/crates/codestory-indexer/tests/query_rule_regressions.rs b/crates/codestory-indexer/tests/query_rule_regressions.rs index 0fb8a6c3..fc7d63be 100644 --- a/crates/codestory-indexer/tests/query_rule_regressions.rs +++ b/crates/codestory-indexer/tests/query_rule_regressions.rs @@ -48,6 +48,13 @@ fn has_node_kind(nodes: &[Node], name: &str, kind: NodeKind) -> bool { .any(|node| matches_name(&node.serialized_name, name) && node.kind == kind) } +fn node_kind_count(nodes: &[Node], name: &str, kind: NodeKind) -> usize { + nodes + .iter() + .filter(|node| matches_name(&node.serialized_name, name) && node.kind == kind) + .count() +} + fn edge_between( nodes: &[Node], edges: &[Edge], @@ -613,6 +620,51 @@ class Example implements Runner { Ok(()) } +#[test] +fn test_java_field_lambda_does_not_duplicate_local_lambda_rule() -> anyhow::Result<()> { + let (nodes, _) = index_project(&[( + "Main.java", + r#" +import java.util.function.Predicate; + +class Example { + private static final Predicate NON_EMPTY = value -> !value.isEmpty(); + + void run() { + Predicate local = value -> value.length() > 1; + } +} +"#, + )])?; + + assert!(has_node_kind(&nodes, "NON_EMPTY", NodeKind::FIELD)); + assert!( + !has_node_kind(&nodes, "NON_EMPTY", NodeKind::FUNCTION), + "field-assigned lambda should not also match the local lambda rule" + ); + assert!(has_node_kind(&nodes, "local", NodeKind::FUNCTION)); + + Ok(()) +} + +#[test] +fn test_bash_declaration_assignment_does_not_duplicate_variable_rule() -> anyhow::Result<()> { + let (nodes, _) = index_project(&[( + "script.sh", + r#" +run() { + local ROOT="$PWD" + readonly NAME="codestory" +} +"#, + )])?; + + assert_eq!(node_kind_count(&nodes, "ROOT", NodeKind::VARIABLE), 1); + assert_eq!(node_kind_count(&nodes, "NAME", NodeKind::VARIABLE), 1); + + Ok(()) +} + #[test] fn test_java_annotations_constructors_inner_members_and_enum_constants_surface() -> anyhow::Result<()> { diff --git a/crates/codestory-indexer/tests/tictactoe_language_coverage.rs b/crates/codestory-indexer/tests/tictactoe_language_coverage.rs index 913fea45..1366522a 100644 --- a/crates/codestory-indexer/tests/tictactoe_language_coverage.rs +++ b/crates/codestory-indexer/tests/tictactoe_language_coverage.rs @@ -1,5 +1,6 @@ use anyhow::{Result, anyhow}; use codestory_contracts::graph::{AccessKind, Edge, EdgeKind, Node, NodeId, NodeKind}; +use codestory_contracts::language_support::{LANGUAGE_SUPPORT_PROFILES, LanguageSupportMode}; use codestory_indexer::{get_language_for_ext, index_file}; use std::path::Path; @@ -14,6 +15,10 @@ const GO_SOURCE: &str = include_str!("fixtures/tictactoe/go_tictactoe.go"); const RUBY_SOURCE: &str = include_str!("fixtures/tictactoe/ruby_tictactoe.rb"); const PHP_SOURCE: &str = include_str!("fixtures/tictactoe/php_tictactoe.php"); const CSHARP_SOURCE: &str = include_str!("fixtures/tictactoe/csharp_tictactoe.cs"); +const KOTLIN_SOURCE: &str = include_str!("fixtures/tictactoe/kotlin_tictactoe.kt"); +const SWIFT_SOURCE: &str = include_str!("fixtures/tictactoe/swift_tictactoe.swift"); +const DART_SOURCE: &str = include_str!("fixtures/tictactoe/dart_tictactoe.dart"); +const BASH_SOURCE: &str = include_str!("fixtures/tictactoe/bash_tictactoe.sh"); type NamePair = (&'static str, &'static str); @@ -47,20 +52,20 @@ const JAVA_SYMBOLS: &[(NodeKind, &str)] = &[ (NodeKind::CLASS, "Move"), (NodeKind::CLASS, "Node"), (NodeKind::CLASS, "TicTacToe"), - (NodeKind::FUNCTION, "numberIn"), - (NodeKind::FUNCTION, "numberOut"), - (NodeKind::FUNCTION, "stringOut"), - (NodeKind::FUNCTION, "sameInRow"), - (NodeKind::FUNCTION, "makeMove"), - (NodeKind::FUNCTION, "turn"), - (NodeKind::FUNCTION, "_input"), - (NodeKind::FUNCTION, "_check"), - (NodeKind::FUNCTION, "checkWinner"), - (NodeKind::FUNCTION, "isDraw"), - (NodeKind::FUNCTION, "probeCalls"), - (NodeKind::FUNCTION, "run"), - (NodeKind::FUNCTION, "main"), - (NodeKind::FUNCTION, "_minMax"), + (NodeKind::METHOD, "numberIn"), + (NodeKind::METHOD, "numberOut"), + (NodeKind::METHOD, "stringOut"), + (NodeKind::METHOD, "sameInRow"), + (NodeKind::METHOD, "makeMove"), + (NodeKind::METHOD, "turn"), + (NodeKind::METHOD, "_input"), + (NodeKind::METHOD, "_check"), + (NodeKind::METHOD, "checkWinner"), + (NodeKind::METHOD, "isDraw"), + (NodeKind::METHOD, "probeCalls"), + (NodeKind::METHOD, "run"), + (NodeKind::METHOD, "main"), + (NodeKind::METHOD, "_minMax"), ]; const RUST_SYMBOLS: &[(NodeKind, &str)] = &[ (NodeKind::STRUCT, "GameObject"), @@ -75,16 +80,16 @@ const RUST_SYMBOLS: &[(NodeKind, &str)] = &[ (NodeKind::FUNCTION, "number_in"), (NodeKind::FUNCTION, "number_out"), (NodeKind::FUNCTION, "string_out"), - (NodeKind::FUNCTION, "same_in_row"), - (NodeKind::FUNCTION, "make_move"), + (NodeKind::METHOD, "same_in_row"), + (NodeKind::METHOD, "make_move"), (NodeKind::FUNCTION, "check_winner"), (NodeKind::FUNCTION, "is_draw"), - (NodeKind::FUNCTION, "turn"), - (NodeKind::FUNCTION, "run"), - (NodeKind::FUNCTION, "start"), - (NodeKind::FUNCTION, "_select_player"), + (NodeKind::METHOD, "turn"), + (NodeKind::METHOD, "run"), + (NodeKind::METHOD, "start"), + (NodeKind::METHOD, "_select_player"), (NodeKind::FUNCTION, "main"), - (NodeKind::FUNCTION, "min_max"), + (NodeKind::METHOD, "min_max"), ]; const JAVASCRIPT_SYMBOLS: &[(NodeKind, &str)] = &[ (NodeKind::CLASS, "GameObject"), @@ -96,17 +101,17 @@ const JAVASCRIPT_SYMBOLS: &[(NodeKind, &str)] = &[ (NodeKind::FUNCTION, "numberIn"), (NodeKind::FUNCTION, "numberOut"), (NodeKind::FUNCTION, "stringOut"), - (NodeKind::FUNCTION, "sameInRow"), - (NodeKind::FUNCTION, "makeMove"), - (NodeKind::FUNCTION, "turn"), - (NodeKind::FUNCTION, "selectPlayer"), - (NodeKind::FUNCTION, "start"), - (NodeKind::FUNCTION, "run"), - (NodeKind::FUNCTION, "probeCalls"), + (NodeKind::METHOD, "sameInRow"), + (NodeKind::METHOD, "makeMove"), + (NodeKind::METHOD, "turn"), + (NodeKind::METHOD, "selectPlayer"), + (NodeKind::METHOD, "start"), + (NodeKind::METHOD, "run"), + (NodeKind::METHOD, "probeCalls"), (NodeKind::FUNCTION, "main"), - (NodeKind::FUNCTION, "checkWinner"), - (NodeKind::FUNCTION, "isDraw"), - (NodeKind::FUNCTION, "minMax"), + (NodeKind::METHOD, "checkWinner"), + (NodeKind::METHOD, "isDraw"), + (NodeKind::METHOD, "minMax"), ]; const TYPESCRIPT_SYMBOLS: &[(NodeKind, &str)] = &[ (NodeKind::CLASS, "GameObject"), @@ -118,17 +123,17 @@ const TYPESCRIPT_SYMBOLS: &[(NodeKind, &str)] = &[ (NodeKind::FUNCTION, "numberIn"), (NodeKind::FUNCTION, "numberOut"), (NodeKind::FUNCTION, "stringOut"), - (NodeKind::FUNCTION, "sameInRow"), - (NodeKind::FUNCTION, "makeMove"), - (NodeKind::FUNCTION, "turn"), - (NodeKind::FUNCTION, "selectPlayer"), - (NodeKind::FUNCTION, "start"), - (NodeKind::FUNCTION, "run"), - (NodeKind::FUNCTION, "probeCalls"), + (NodeKind::METHOD, "sameInRow"), + (NodeKind::METHOD, "makeMove"), + (NodeKind::METHOD, "turn"), + (NodeKind::METHOD, "selectPlayer"), + (NodeKind::METHOD, "start"), + (NodeKind::METHOD, "run"), + (NodeKind::METHOD, "probeCalls"), (NodeKind::FUNCTION, "main"), - (NodeKind::FUNCTION, "checkWinner"), - (NodeKind::FUNCTION, "isDraw"), - (NodeKind::FUNCTION, "minMax"), + (NodeKind::METHOD, "checkWinner"), + (NodeKind::METHOD, "isDraw"), + (NodeKind::METHOD, "minMax"), ]; const CPP_SYMBOLS: &[(NodeKind, &str)] = &[ (NodeKind::CLASS, "GameObject"), @@ -272,9 +277,9 @@ const RUBY_SYMBOLS: &[(NodeKind, &str)] = &[ (NodeKind::CLASS, "HumanPlayer"), (NodeKind::CLASS, "ArtificialPlayer"), (NodeKind::CLASS, "TicTacToe"), - (NodeKind::FUNCTION, "numberIn"), - (NodeKind::FUNCTION, "numberOut"), - (NodeKind::FUNCTION, "stringOut"), + (NodeKind::METHOD, "numberIn"), + (NodeKind::METHOD, "numberOut"), + (NodeKind::METHOD, "stringOut"), (NodeKind::METHOD, "makeMove"), (NodeKind::METHOD, "sameInRow"), (NodeKind::METHOD, "turn"), @@ -314,21 +319,145 @@ const CSHARP_SYMBOLS: &[(NodeKind, &str)] = &[ (NodeKind::METHOD, "run"), (NodeKind::METHOD, "Main"), ]; +const KOTLIN_SYMBOLS: &[(NodeKind, &str)] = &[ + (NodeKind::CLASS, "GameObject"), + (NodeKind::CLASS, "Field"), + (NodeKind::INTERFACE, "Player"), + (NodeKind::CLASS, "HumanPlayer"), + (NodeKind::CLASS, "ArtificialPlayer"), + (NodeKind::CLASS, "TicTacToe"), + (NodeKind::FUNCTION, "numberIn"), + (NodeKind::FUNCTION, "numberOut"), + (NodeKind::FUNCTION, "stringOut"), + (NodeKind::METHOD, "makeMove"), + (NodeKind::METHOD, "sameInRow"), + (NodeKind::METHOD, "turn"), + (NodeKind::METHOD, "minMax"), + (NodeKind::METHOD, "run"), + (NodeKind::FUNCTION, "main"), +]; +const SWIFT_SYMBOLS: &[(NodeKind, &str)] = &[ + (NodeKind::CLASS, "GameObject"), + (NodeKind::CLASS, "Field"), + (NodeKind::INTERFACE, "Player"), + (NodeKind::CLASS, "HumanPlayer"), + (NodeKind::CLASS, "ArtificialPlayer"), + (NodeKind::CLASS, "TicTacToe"), + (NodeKind::FUNCTION, "numberIn"), + (NodeKind::FUNCTION, "numberOut"), + (NodeKind::FUNCTION, "stringOut"), + (NodeKind::METHOD, "makeMove"), + (NodeKind::METHOD, "sameInRow"), + (NodeKind::METHOD, "turn"), + (NodeKind::METHOD, "minMax"), + (NodeKind::METHOD, "run"), + (NodeKind::FUNCTION, "main"), +]; +const DART_SYMBOLS: &[(NodeKind, &str)] = &[ + (NodeKind::CLASS, "GameObject"), + (NodeKind::CLASS, "Field"), + (NodeKind::INTERFACE, "Player"), + (NodeKind::CLASS, "HumanPlayer"), + (NodeKind::CLASS, "ArtificialPlayer"), + (NodeKind::CLASS, "TicTacToe"), + (NodeKind::FUNCTION, "numberIn"), + (NodeKind::FUNCTION, "numberOut"), + (NodeKind::FUNCTION, "stringOut"), + (NodeKind::METHOD, "makeMove"), + (NodeKind::METHOD, "sameInRow"), + (NodeKind::METHOD, "turn"), + (NodeKind::METHOD, "minMax"), + (NodeKind::METHOD, "run"), + (NodeKind::FUNCTION, "main"), +]; +const BASH_SYMBOLS: &[(NodeKind, &str)] = &[ + (NodeKind::FUNCTION, "numberIn"), + (NodeKind::FUNCTION, "numberOut"), + (NodeKind::FUNCTION, "stringOut"), + (NodeKind::FUNCTION, "sameInRow"), + (NodeKind::FUNCTION, "makeMove"), + (NodeKind::FUNCTION, "turn"), + (NodeKind::FUNCTION, "minMax"), + (NodeKind::FUNCTION, "run"), + (NodeKind::FUNCTION, "main"), + (NodeKind::VARIABLE, "token"), + (NodeKind::VARIABLE, "amount"), + (NodeKind::VARIABLE, "depth"), +]; const GO_IMPORTS: &[&str] = &["\"fmt\"", "\"math/rand\""]; const RUBY_IMPORTS: &[&str] = &["\"random\""]; const PHP_IMPORTS: &[&str] = &["Random\\Randomizer"]; const CSHARP_IMPORTS: &[&str] = &["System"]; +const KOTLIN_IMPORTS: &[&str] = &["kotlin.random.Random"]; +const SWIFT_IMPORTS: &[&str] = &["Foundation"]; +const DART_IMPORTS: &[&str] = &["'dart:math'"]; +const BASH_IMPORTS: &[&str] = &["./random.sh"]; const GO_CALLS: &[&str] = &["numberIn", "stringOut", "makeMove", "minMax"]; const RUBY_CALLS: &[&str] = &["numberIn", "stringOut", "makeMove", "minMax"]; const PHP_CALLS: &[&str] = &["numberIn", "stringOut", "makeMove", "minMax"]; const CSHARP_CALLS: &[&str] = &["numberIn", "stringOut", "makeMove", "minMax"]; +const KOTLIN_CALLS: &[&str] = &["numberIn", "stringOut", "makeMove", "minMax"]; +const SWIFT_CALLS: &[&str] = &["numberIn", "stringOut", "makeMove", "minMax"]; +const DART_CALLS: &[&str] = &["numberIn", "stringOut", "makeMove", "minMax"]; +const BASH_CALLS: &[&str] = &["numberIn", "stringOut", "makeMove", "minMax"]; -const GO_MEMBERS: &[NamePair] = &[]; -const RUBY_MEMBERS: &[NamePair] = &[]; -const PHP_MEMBERS: &[NamePair] = &[]; -const CSHARP_MEMBERS: &[NamePair] = &[]; +const GO_MEMBERS: &[NamePair] = &[ + ("Field", "makeMove"), + ("Field", "sameInRow"), + ("HumanPlayer", "turn"), + ("ArtificialPlayer", "minMax"), + ("TicTacToe", "run"), +]; +const RUBY_MEMBERS: &[NamePair] = &[ + ("Field", "makeMove"), + ("Field", "sameInRow"), + ("HumanPlayer", "turn"), + ("ArtificialPlayer", "minMax"), + ("TicTacToe", "run"), +]; +const PHP_MEMBERS: &[NamePair] = &[ + ("Field", "makeMove"), + ("Field", "sameInRow"), + ("Player", "turn"), + ("HumanPlayer", "turn"), + ("ArtificialPlayer", "minMax"), + ("TicTacToe", "run"), +]; +const CSHARP_MEMBERS: &[NamePair] = &[ + ("Field", "makeMove"), + ("Field", "sameInRow"), + ("Player", "turn"), + ("HumanPlayer", "turn"), + ("ArtificialPlayer", "minMax"), + ("TicTacToe", "run"), + ("Program", "Main"), +]; +const KOTLIN_MEMBERS: &[NamePair] = &[ + ("Field", "makeMove"), + ("Field", "sameInRow"), + ("HumanPlayer", "turn"), + ("ArtificialPlayer", "minMax"), + ("TicTacToe", "run"), +]; +const SWIFT_MEMBERS: &[NamePair] = &[ + ("Field", "makeMove"), + ("Field", "sameInRow"), + ("Player", "turn"), + ("HumanPlayer", "turn"), + ("ArtificialPlayer", "minMax"), + ("TicTacToe", "run"), +]; +const DART_MEMBERS: &[NamePair] = &[ + ("Field", "makeMove"), + ("Field", "sameInRow"), + ("Player", "turn"), + ("HumanPlayer", "turn"), + ("ArtificialPlayer", "minMax"), + ("TicTacToe", "run"), +]; +const BASH_MEMBERS: &[NamePair] = &[]; const GO_INHERITANCE: &[NamePair] = &[]; const RUBY_INHERITANCE: &[NamePair] = &[ @@ -339,6 +468,25 @@ const RUBY_INHERITANCE: &[NamePair] = &[ ]; const PHP_INHERITANCE: &[NamePair] = &[("Field", "GameObject"), ("TicTacToe", "GameObject")]; const CSHARP_INHERITANCE: &[NamePair] = &[("Field", "GameObject"), ("TicTacToe", "GameObject")]; +const KOTLIN_INHERITANCE: &[NamePair] = &[ + ("Field", "GameObject"), + ("HumanPlayer", "Player"), + ("ArtificialPlayer", "Player"), + ("TicTacToe", "GameObject"), +]; +const SWIFT_INHERITANCE: &[NamePair] = &[ + ("Field", "GameObject"), + ("HumanPlayer", "Player"), + ("ArtificialPlayer", "Player"), + ("TicTacToe", "GameObject"), +]; +const DART_INHERITANCE: &[NamePair] = &[ + ("Field", "GameObject"), + ("HumanPlayer", "Player"), + ("ArtificialPlayer", "Player"), + ("TicTacToe", "GameObject"), +]; +const BASH_INHERITANCE: &[NamePair] = &[]; #[derive(Clone, Copy)] struct FixtureCase { @@ -500,6 +648,58 @@ fn fixture_cases() -> Vec { required_member_pairs: CSHARP_MEMBERS, required_inheritance_pairs: CSHARP_INHERITANCE, }, + FixtureCase { + language: "kotlin", + filename: "game.kt", + extension: "kt", + source: KOTLIN_SOURCE, + min_nodes: 15, + min_edges: 12, + required_symbols: KOTLIN_SYMBOLS, + required_import_targets: KOTLIN_IMPORTS, + required_call_targets: KOTLIN_CALLS, + required_member_pairs: KOTLIN_MEMBERS, + required_inheritance_pairs: KOTLIN_INHERITANCE, + }, + FixtureCase { + language: "swift", + filename: "game.swift", + extension: "swift", + source: SWIFT_SOURCE, + min_nodes: 15, + min_edges: 12, + required_symbols: SWIFT_SYMBOLS, + required_import_targets: SWIFT_IMPORTS, + required_call_targets: SWIFT_CALLS, + required_member_pairs: SWIFT_MEMBERS, + required_inheritance_pairs: SWIFT_INHERITANCE, + }, + FixtureCase { + language: "dart", + filename: "game.dart", + extension: "dart", + source: DART_SOURCE, + min_nodes: 15, + min_edges: 12, + required_symbols: DART_SYMBOLS, + required_import_targets: DART_IMPORTS, + required_call_targets: DART_CALLS, + required_member_pairs: DART_MEMBERS, + required_inheritance_pairs: DART_INHERITANCE, + }, + FixtureCase { + language: "bash", + filename: "game.sh", + extension: "sh", + source: BASH_SOURCE, + min_nodes: 10, + min_edges: 8, + required_symbols: BASH_SYMBOLS, + required_import_targets: BASH_IMPORTS, + required_call_targets: BASH_CALLS, + required_member_pairs: BASH_MEMBERS, + required_inheritance_pairs: BASH_INHERITANCE, + }, ] } @@ -526,9 +726,7 @@ fn is_matching_name(serialized_name: &str, wanted_name: &str) -> bool { fn has_node(nodes: &[Node], kind: NodeKind, name: &str) -> bool { nodes.iter().any(|node| { - let kind_matches = if kind == NodeKind::FUNCTION { - node.kind == NodeKind::FUNCTION || node.kind == NodeKind::METHOD - } else if kind == NodeKind::VARIABLE { + let kind_matches = if kind == NodeKind::VARIABLE { node.kind == NodeKind::VARIABLE || node.kind == NodeKind::FIELD } else { node.kind == kind @@ -590,44 +788,31 @@ fn access_for_name( #[test] fn test_language_extension_coverage_and_names() { - let expected = [ - ("py", "python"), - ("pyi", "python"), - ("java", "java"), - ("rs", "rust"), - ("js", "javascript"), - ("jsx", "javascript"), - ("mjs", "javascript"), - ("cjs", "javascript"), - ("ts", "typescript"), - ("tsx", "typescript"), - ("mts", "typescript"), - ("cts", "typescript"), - ("cpp", "cpp"), - ("cc", "cpp"), - ("cxx", "cpp"), - ("h", "c"), - ("hh", "cpp"), - ("hpp", "cpp"), - ("hxx", "cpp"), - ("c", "c"), - ("go", "go"), - ("rb", "ruby"), - ("php", "php"), - ("cs", "csharp"), - ]; - - for (ext, expected_name) in expected { - let language_config = - get_language_for_ext(ext).expect("Extension should resolve to a language"); - assert_eq!( - language_config.language_name, expected_name, - "Wrong language name for extension {ext}" - ); - assert!( - !language_config.graph_query.trim().is_empty(), - "Expected non-empty graph query for extension {ext}" - ); + for profile in LANGUAGE_SUPPORT_PROFILES { + for ext in profile.extensions { + let language_config = get_language_for_ext(ext); + match profile.support_mode { + LanguageSupportMode::ParserBackedGraph => { + let language_config = language_config.unwrap_or_else(|| { + panic!("Parser-backed registry extension should route to indexer: {ext}") + }); + assert_eq!( + language_config.language_name, profile.language_name, + "Wrong language name for extension {ext}" + ); + assert!( + !language_config.graph_query.trim().is_empty(), + "Expected non-empty graph query for extension {ext}" + ); + } + LanguageSupportMode::StructuralCollector => { + assert!( + language_config.is_none(), + "Structural collector extension should not route through tree-sitter graph parser: {ext}" + ); + } + } + } } } @@ -640,6 +825,10 @@ fn test_language_extension_coverage_is_case_insensitive() { ("JSX", "javascript"), ("TSX", "typescript"), ("CPP", "cpp"), + ("KT", "kotlin"), + ("SWIFT", "swift"), + ("DART", "dart"), + ("SH", "bash"), ]; for (ext, expected_name) in expected { let language_config = diff --git a/crates/codestory-indexer/tests/trait_interface_resolution.rs b/crates/codestory-indexer/tests/trait_interface_resolution.rs index acf81820..c89eecce 100644 --- a/crates/codestory-indexer/tests/trait_interface_resolution.rs +++ b/crates/codestory-indexer/tests/trait_interface_resolution.rs @@ -152,6 +152,35 @@ fn assert_resolved_call_to_name( ); } +fn assert_no_resolved_call_to_name( + case_name: &str, + nodes: &[Node], + edges: &[Edge], + caller_name: &str, + callee_name: &str, +) { + let node_by_id: HashMap<_, _> = nodes.iter().map(|n| (n.id, n)).collect(); + let found = edges + .iter() + .filter(|edge| edge.kind == EdgeKind::CALL) + .filter_map(|edge| { + let source = node_by_id.get(&edge.source)?; + if !is_matching_name(&source.serialized_name, caller_name) { + return None; + } + let resolved_id = edge.resolved_target?; + let resolved_node = node_by_id.get(&resolved_id)?; + Some(resolved_node.serialized_name.as_str()) + }) + .any(|resolved_name| is_matching_name(resolved_name, callee_name)); + + assert!( + !found, + "Case `{case_name}`: did not expect CALL from `{caller_name}` to resolve to `{callee_name}`. Calls: {:?}", + describe_call_edges(edges, nodes) + ); +} + fn assert_resolved_override_to_method_owner( case_name: &str, nodes: &[Node], @@ -264,6 +293,157 @@ public: void EventBus::dispatchTo(EventListener& listener) { listener.handleEvent(); } +"#, + "dispatchTo", + "EventListener", + "handleEvent", + ), + ( + "main.go", + r#" +package main + +type ConcreteListener struct{} +func (ConcreteListener) HandleEvent() {} + +type EventListener interface { + HandleEvent() +} + +type EventBus struct{} +func (b EventBus) DispatchTo(listener EventListener) { + listener.HandleEvent() +} +"#, + "DispatchTo", + "EventListener", + "HandleEvent", + ), + ( + "main.rb", + r#" +class ConcreteListener + def handle_event + end +end + +class EventListener + def handle_event + end +end + +class EventBus + def dispatch_to + listener = EventListener.new + listener.handle_event + end +end +"#, + "dispatch_to", + "EventListener", + "handle_event", + ), + ( + "main.php", + r#" +handleEvent(); + } +} +"#, + "dispatchTo", + "EventListener", + "handleEvent", + ), + ( + "main.cs", + r#" +class ConcreteListener { + public void HandleEvent() {} +} + +interface EventListener { + void HandleEvent(); +} + +class EventBus { + void DispatchTo(EventListener listener) { + listener.HandleEvent(); + } +} +"#, + "DispatchTo", + "EventListener", + "HandleEvent", + ), + ( + "main.kt", + r#" +class ConcreteListener { + fun handleEvent() {} +} + +interface EventListener { + fun handleEvent() +} + +class EventBus { + fun dispatchTo(listener: EventListener) { + listener.handleEvent() + } +} +"#, + "dispatchTo", + "EventListener", + "handleEvent", + ), + ( + "main.swift", + r#" +class ConcreteListener { + func handleEvent() {} +} + +protocol EventListener { + func handleEvent() +} + +class EventBus { + func dispatchTo(listener: EventListener) { + listener.handleEvent() + } +} +"#, + "dispatchTo", + "EventListener", + "handleEvent", + ), + ( + "main.dart", + r#" +class ConcreteListener { + void handleEvent() {} +} + +abstract class EventListener { + void handleEvent(); +} + +class EventBus { + void dispatchTo(EventListener listener) { + listener.handleEvent(); + } +} "#, "dispatchTo", "EventListener", @@ -384,6 +564,100 @@ int caller() { callee(); return 1; } r#" int callee() { return 1; } int caller() { callee(); return 1; } +"#, + "caller", + "callee", + ), + ( + "main.go", + r#" +package main + +func callee() int { return 1 } +func caller() int { callee(); return 1 } +"#, + "caller", + "callee", + ), + ( + "main.rb", + r#" +def callee + 1 +end + +def caller + callee + 1 +end +"#, + "caller", + "callee", + ), + ( + "main.php", + r#" + Int { return 1 } +func caller() -> Int { + callee() + return 1 +} +"#, + "caller", + "callee", + ), + ( + "main.dart", + r#" +int callee() { return 1; } +int caller() { callee(); return 1; } +"#, + "caller", + "callee", + ), + ( + "main.sh", + r#" +callee() { + echo 1 +} + +caller() { + callee +} "#, "caller", "callee", @@ -398,6 +672,27 @@ int caller() { callee(); return 1; } Ok(()) } +#[test] +fn test_ruby_bare_local_variable_read_does_not_resolve_as_call() -> anyhow::Result<()> { + let (nodes, edges) = index_single_file( + "main.rb", + r#" +def callee + 1 +end + +def caller + callee = 1 + callee +end +"#, + )?; + + assert_no_resolved_call_to_name("main.rb", &nodes, &edges, "caller", "callee"); + + Ok(()) +} + #[test] fn test_override_resolution_binds_to_inherited_method_owner() -> anyhow::Result<()> { let (nodes, edges) = index_single_file( diff --git a/crates/codestory-retrieval/Cargo.toml b/crates/codestory-retrieval/Cargo.toml index 46904d60..98378fc5 100644 --- a/crates/codestory-retrieval/Cargo.toml +++ b/crates/codestory-retrieval/Cargo.toml @@ -1,6 +1,6 @@ [package] name = "codestory-retrieval" -version = "0.6.2" +version = "0.7.0" edition = "2024" [dependencies] diff --git a/crates/codestory-retrieval/src/cache.rs b/crates/codestory-retrieval/src/cache.rs index ff7d3026..b0b06c3e 100644 --- a/crates/codestory-retrieval/src/cache.rs +++ b/crates/codestory-retrieval/src/cache.rs @@ -177,6 +177,11 @@ mod tests { sidecar_input_hash: Some("hash-a".into()), sidecar_generation: Some("abc-hash-a".into()), projection_count: Some(10), + symbol_doc_count: Some(10), + dense_projection_count: Some(10), + semantic_policy_version: Some(crate::generation::SEMANTIC_POLICY_VERSION.into()), + graph_artifact_hash: Some("graph-a".into()), + dense_reason_counts_json: Some("{\"public_api\":10}".into()), }; let mut changed = base.clone(); changed.qdrant_collection = "codestory_abc_hash_b".into(); diff --git a/crates/codestory-retrieval/src/candidate.rs b/crates/codestory-retrieval/src/candidate.rs index d7e495f9..4fcbbf0c 100644 --- a/crates/codestory-retrieval/src/candidate.rs +++ b/crates/codestory-retrieval/src/candidate.rs @@ -14,11 +14,15 @@ pub struct RankFeatures { /// Unified retrieval candidate from any sidecar lane. #[derive(Debug, Clone, PartialEq, Serialize, Deserialize)] pub struct CandidateHit { + #[serde(default, skip_serializing_if = "Option::is_none")] + pub node_id: Option, pub file_path: String, pub symbol_name: Option, pub start_line: Option, pub score: f32, pub source: CandidateSource, + #[serde(default, skip_serializing_if = "Vec::is_empty")] + pub provenance: Vec, #[serde(skip_serializing_if = "Option::is_none")] pub file_role: Option, /// SCIP graph hops from anchor (lower is better). @@ -52,11 +56,13 @@ pub fn phantom_sidecar_candidates_only(candidates: &[CandidateHit]) -> bool { impl CandidateHit { pub fn lexical_stub(file_path: impl Into, score: f32) -> Self { Self { + node_id: None, file_path: file_path.into(), symbol_name: None, start_line: None, score, source: CandidateSource::Zoekt, + provenance: vec!["lexical_source".into()], file_role: None, scip_hop_distance: None, rank_features: None, @@ -70,14 +76,23 @@ impl CandidateHit { source: CandidateSource, ) -> Self { Self { + node_id: None, file_path: file_path.into(), symbol_name, start_line: None, score, source, + provenance: Vec::new(), file_role: None, scip_hop_distance: None, rank_features: None, } } + + pub fn add_provenance(&mut self, label: impl Into) { + let label = label.into(); + if !self.provenance.iter().any(|existing| existing == &label) { + self.provenance.push(label); + } + } } diff --git a/crates/codestory-retrieval/src/executor.rs b/crates/codestory-retrieval/src/executor.rs index f021785d..5877eced 100644 --- a/crates/codestory-retrieval/src/executor.rs +++ b/crates/codestory-retrieval/src/executor.rs @@ -218,10 +218,23 @@ impl<'a> QueryExecutor<'a> { )); continue; } + if should_skip_zero_dense_stage(stage, self.manifest.as_ref()) { + stage_traces.push(stage_trace( + stage, + 0, + 0, + 0.0, + Some("zero_dense_anchors".into()), + false, + None, + )); + continue; + } let stage_started = Instant::now(); let before_score = candidate_mass(candidates); - let stage_hits = self.run_stage(stage, features, candidates)?; + let mut stage_hits = self.run_stage(stage, features, candidates)?; + annotate_stage_provenance(stage, &mut stage_hits); let (stub_reason, stage_degraded) = stage_stub_metadata(&stage_hits); let added = merge_candidates(candidates, stage_hits); let after_score = candidate_mass(candidates); @@ -306,6 +319,38 @@ fn should_skip_after_exact_symbol_anchor( .any(|candidate| candidate_is_exact_symbol_anchor(&features.raw_query, candidate)) } +fn should_skip_zero_dense_stage( + stage: &PlannedStage, + manifest: Option<&RetrievalIndexManifest>, +) -> bool { + if !matches!(stage.kind, RetrievalStageKind::Stage1bQdrantSemantic) { + return false; + } + let dense_count = manifest + .and_then(|manifest| { + manifest + .dense_projection_count + .or(manifest.projection_count) + }) + .unwrap_or(0); + dense_count <= 0 +} + +fn annotate_stage_provenance(stage: &PlannedStage, hits: &mut [CandidateHit]) { + let label = match stage.kind { + RetrievalStageKind::Stage0ScipAnchor => Some("exact"), + RetrievalStageKind::Stage1ZoektLexical => None, + RetrievalStageKind::Stage1bQdrantSemantic => Some("dense_anchor"), + RetrievalStageKind::Stage2ScipExpand => Some("graph_neighbor"), + RetrievalStageKind::Stage3RepoTextFallback => None, + }; + if let Some(label) = label { + for hit in hits { + hit.add_provenance(label); + } + } +} + fn candidate_is_exact_symbol_anchor(query: &str, candidate: &CandidateHit) -> bool { if matches!( candidate.source, @@ -355,13 +400,24 @@ fn stage_stub_metadata(hits: &[CandidateHit]) -> (Option, bool) { fn merge_candidates(acc: &mut Vec, incoming: Vec) -> usize { let mut added = 0usize; for hit in incoming { - let duplicate = acc.iter().any(|existing| { + let duplicate = acc.iter_mut().find(|existing| { existing.file_path == hit.file_path && existing.symbol_name == hit.symbol_name }); - if !duplicate { - acc.push(hit); - added += 1; + if let Some(existing) = duplicate { + existing.score = existing.score.max(hit.score); + if existing.node_id.is_none() { + existing.node_id = hit.node_id.clone(); + } + if existing.start_line.is_none() { + existing.start_line = hit.start_line; + } + for label in hit.provenance { + existing.add_provenance(label); + } + continue; } + acc.push(hit); + added += 1; } added } @@ -420,7 +476,12 @@ mod tests { sidecar_schema_version: None, sidecar_input_hash: None, sidecar_generation: None, - projection_count: None, + projection_count: Some(10), + symbol_doc_count: Some(20), + dense_projection_count: Some(10), + semantic_policy_version: None, + graph_artifact_hash: None, + dense_reason_counts_json: None, } } @@ -680,6 +741,114 @@ mod tests { ); } + #[test] + fn executor_skips_qdrant_when_policy_selects_zero_dense_anchors() { + let mock = MockSidecarSearch { + qdrant: Mutex::new(HashMap::from([( + "how does startup sequence work".into(), + vec![CandidateHit::with_source( + "src/semantic.rs", + Some("SemanticAnchor".into()), + 0.8, + CandidateSource::Qdrant, + )], + )])), + zoekt: Mutex::new(HashMap::from([( + "how does startup sequence work".into(), + vec![CandidateHit::with_source( + "src/lexical.rs", + Some("LexicalAnchor".into()), + 0.7, + CandidateSource::Zoekt, + )], + )])), + ..Default::default() + }; + let mut manifest = sample_manifest(); + manifest.projection_count = Some(0); + manifest.dense_projection_count = Some(0); + let mut cache = RetrievalCache::new(); + let mut executor = QueryExecutor { + sidecars: &mock, + cache: &mut cache, + manifest: Some(manifest), + file_roles: HashMap::new(), + cancelled: cancellation_flag(), + mode_override: Some(RetrievalDegradedMode::Full), + }; + let result = executor + .execute("how does startup sequence work", Some(800)) + .expect("query"); + assert!( + result.trace.stages.iter().any(|stage| stage.stage + == RetrievalStageKind::Stage1bQdrantSemantic + && stage.cancel_reason.as_deref() == Some("zero_dense_anchors")), + "zero dense policy should skip qdrant explicitly: {:?}", + result.trace.stages + ); + assert!( + result + .hits + .iter() + .all(|hit| hit.file_path != "src/semantic.rs"), + "qdrant hits must not be recalled when dense count is zero: {:?}", + result.hits + ); + } + + #[test] + fn executor_merges_duplicate_candidate_provenance() { + let query = "how extension service starts"; + let mock = MockSidecarSearch { + zoekt: Mutex::new(HashMap::from([( + query.into(), + vec![CandidateHit::with_source( + "src/service.rs", + Some("ExtensionService".into()), + 0.70, + CandidateSource::Zoekt, + )], + )])), + qdrant: Mutex::new(HashMap::from([( + query.into(), + vec![CandidateHit::with_source( + "src/service.rs", + Some("ExtensionService".into()), + 0.85, + CandidateSource::Qdrant, + )], + )])), + scip_expand: Mutex::new(vec![CandidateHit::with_source( + "src/service.rs", + Some("ExtensionService".into()), + 0.75, + CandidateSource::Scip, + )]), + ..Default::default() + }; + let mut cache = RetrievalCache::new(); + let mut executor = QueryExecutor { + sidecars: &mock, + cache: &mut cache, + manifest: Some(sample_manifest()), + file_roles: HashMap::new(), + cancelled: cancellation_flag(), + mode_override: Some(RetrievalDegradedMode::Full), + }; + let result = executor.execute(query, Some(800)).expect("query"); + let hit = result + .hits + .iter() + .find(|hit| hit.file_path == "src/service.rs") + .expect("merged candidate"); + assert!( + hit.score > 0.70, + "merged candidate should keep ranker-adjusted score above lexical-only input: {hit:?}" + ); + assert!(hit.provenance.iter().any(|label| label == "graph_neighbor")); + assert!(hit.provenance.iter().any(|label| label == "dense_anchor")); + } + #[test] fn executor_respects_cancellation() { let mock = MockSidecarSearch::default(); diff --git a/crates/codestory-retrieval/src/generation.rs b/crates/codestory-retrieval/src/generation.rs index e978eb40..a20479f1 100644 --- a/crates/codestory-retrieval/src/generation.rs +++ b/crates/codestory-retrieval/src/generation.rs @@ -1,7 +1,9 @@ use codestory_contracts::graph::NodeKind; use codestory_store::{LlmSymbolDoc, RetrievalIndexManifest, Store}; +use std::collections::BTreeMap; pub const SIDECAR_SCHEMA_VERSION: i32 = 1; +pub const SEMANTIC_POLICY_VERSION: &str = "graph_first_v1"; pub const SIDECAR_SEMANTIC_DOC_CONTRACT_CHANGED: &str = "sidecar_semantic_doc_embedding_contract_changed"; const STALENESS_DOC_BATCH_SIZE: usize = 1024; @@ -30,6 +32,17 @@ pub fn manifest_has_current_sidecar_contract( && manifest.sidecar_generation.as_deref() == Some(expected_generation.as_str()) && manifest.qdrant_collection == expected_collection && manifest.projection_count.is_some_and(|count| count >= 0) + && manifest.symbol_doc_count.is_some_and(|count| count >= 0) + && manifest + .dense_projection_count + .is_some_and(|count| count >= 0) + && manifest.dense_projection_count == manifest.projection_count + && manifest.semantic_policy_version.as_deref() == Some(SEMANTIC_POLICY_VERSION) + && manifest + .graph_artifact_hash + .as_deref() + .is_some_and(|hash| !hash.trim().is_empty()) + && manifest.dense_reason_counts_json.is_some() } pub fn manifest_staleness_reason( @@ -56,27 +69,58 @@ pub fn manifest_staleness_reason( )); } - if let Some(expected_count) = manifest.projection_count { + if manifest.semantic_policy_version.as_deref() != Some(SEMANTIC_POLICY_VERSION) { + return Some(format!( + "sidecar_semantic_policy_changed: manifest={} current={SEMANTIC_POLICY_VERSION}", + manifest + .semantic_policy_version + .as_deref() + .unwrap_or("") + )); + } + + if let Some(expected_symbol_doc_count) = manifest.symbol_doc_count { + match storage.get_symbol_search_doc_count() { + Ok(actual) if i64::from(actual) == expected_symbol_doc_count => {} + Ok(actual) => { + return Some(format!( + "sidecar_symbol_doc_count_changed: manifest={expected_symbol_doc_count} current={actual}" + )); + } + Err(error) => { + return Some(format!("sidecar_symbol_doc_count_unavailable: {error}")); + } + } + } + + if let Some(expected_count) = manifest + .dense_projection_count + .or(manifest.projection_count) + { match collect_sidecar_semantic_doc_stats(storage) { Ok(stats) => { - if stats.doc_count == 0 { + if expected_count > 0 && stats.doc_count == 0 { return Some( "sidecar_semantic_doc_count_unavailable: no sidecar-eligible stored docs" .into(), ); } - if stats.mixed_embedding_profiles - || stats.mixed_embedding_models - || stats.mixed_embedding_backends - || stats.mixed_dimensions - || stats.mixed_doc_shapes - || stats.embedding_profile.as_deref() != Some("bge-base-en-v1.5") - || stats.embedding_dim - != Some(crate::embeddings::RETRIEVAL_EMBEDDING_DIM as u32) - || !stats - .embedding_model - .as_deref() - .is_some_and(|model| model.contains("bge-base-en-v1.5")) + if expected_count > 0 + && (stats.mixed_embedding_profiles + || stats.mixed_embedding_models + || stats.mixed_embedding_backends + || stats.mixed_dimensions + || stats.mixed_doc_shapes + || stats.mixed_semantic_policy_versions + || stats.semantic_policy_version.as_deref() + != Some(SEMANTIC_POLICY_VERSION) + || stats.embedding_profile.as_deref() != Some("bge-base-en-v1.5") + || stats.embedding_dim + != Some(crate::embeddings::RETRIEVAL_EMBEDDING_DIM as u32) + || !stats + .embedding_model + .as_deref() + .is_some_and(|model| model.contains("bge-base-en-v1.5"))) { return Some(SIDECAR_SEMANTIC_DOC_CONTRACT_CHANGED.into()); } @@ -86,6 +130,15 @@ pub fn manifest_staleness_reason( stats.doc_count )); } + if let Some(expected_reasons) = manifest.dense_reason_counts_json.as_deref() { + let actual_reasons = serde_json::to_string(&stats.dense_reason_counts) + .unwrap_or_else(|_| "{}".into()); + if actual_reasons != expected_reasons { + return Some(format!( + "sidecar_dense_reason_counts_changed: manifest={expected_reasons} current={actual_reasons}" + )); + } + } } Err(error) => { return Some(format!("sidecar_semantic_doc_count_unavailable: {error}")); @@ -123,7 +176,9 @@ pub fn manifest_sidecar_generation(manifest: &RetrievalIndexManifest) -> &str { } pub(crate) fn sidecar_semantic_doc_is_product_eligible(doc: &LlmSymbolDoc) -> bool { - sidecar_semantic_node_kind(doc.kind) && sidecar_stored_embedding_is_product_compatible(doc) + (sidecar_semantic_node_kind(doc.kind) + || doc.dense_reason.as_deref() == Some("component_report")) + && sidecar_stored_embedding_is_product_compatible(doc) } pub(crate) fn sidecar_semantic_node_kind(kind: NodeKind) -> bool { @@ -175,11 +230,14 @@ struct SidecarSemanticDocStats { embedding_backend: Option, embedding_dim: Option, doc_shape: Option, + semantic_policy_version: Option, + dense_reason_counts: BTreeMap, mixed_embedding_profiles: bool, mixed_embedding_models: bool, mixed_embedding_backends: bool, mixed_dimensions: bool, mixed_doc_shapes: bool, + mixed_semantic_policy_versions: bool, } fn collect_sidecar_semantic_doc_stats(storage: &Store) -> Result { @@ -189,6 +247,7 @@ fn collect_sidecar_semantic_doc_stats(storage: &Store) -> Result> = None; let mut first_dim: Option> = None; let mut first_shape: Option> = None; + let mut first_policy: Option> = None; let mut after = None; loop { @@ -234,6 +293,14 @@ fn collect_sidecar_semantic_doc_stats(storage: &Store) -> Result, + pub(crate) graph_artifact_hash: String, + pub(crate) dense_reason_counts_json: String, pub(crate) lexical_file_count: u32, pub(crate) lexical_hash: String, } @@ -202,6 +207,7 @@ pub fn finalize_index(project_root: &Path, storage_path: &Path) -> Result Result Result, ) -> Result { + if projection_count == 0 { + info!( + project_id = %project_id, + collection = %collection, + "Qdrant collection skipped because graph_first_v1 selected zero dense anchors" + ); + return Ok(0); + } let qdrant_probe = qdrant_client.health_probe(collection); if !qdrant_probe.reachable { bail!( @@ -544,6 +561,13 @@ fn manifest_matches_sidecar_input( manifest.sidecar_schema_version == Some(SIDECAR_SCHEMA_VERSION) && manifest.sidecar_input_hash.as_deref() == Some(sidecar_input.hash.as_str()) && manifest.projection_count == Some(sidecar_input.projection_count) + && manifest.symbol_doc_count == Some(sidecar_input.symbol_doc_count) + && manifest.dense_projection_count == Some(sidecar_input.dense_projection_count) + && manifest.semantic_policy_version == sidecar_input.semantic_policy_version + && manifest.graph_artifact_hash.as_deref() + == Some(sidecar_input.graph_artifact_hash.as_str()) + && manifest.dense_reason_counts_json.as_deref() + == Some(sidecar_input.dense_reason_counts_json.as_str()) && manifest.embedding_backend.as_deref() == Some(embedding_backend) && manifest.embedding_dim == Some(embedding_dim) } @@ -570,6 +594,11 @@ fn retrieval_manifest_for_sidecar( sidecar_input_hash: Some(sidecar_input.hash.clone()), sidecar_generation: Some(generation.to_string()), projection_count: Some(sidecar_input.projection_count), + symbol_doc_count: Some(sidecar_input.symbol_doc_count), + dense_projection_count: Some(sidecar_input.dense_projection_count), + semantic_policy_version: sidecar_input.semantic_policy_version.clone(), + graph_artifact_hash: Some(sidecar_input.graph_artifact_hash.clone()), + dense_reason_counts_json: Some(sidecar_input.dense_reason_counts_json.clone()), } } @@ -595,7 +624,7 @@ fn qdrant_ready_point_count( ) -> Option { let expected_points = u64::try_from(expected_points).ok()?; if expected_points == 0 { - return None; + return Some(0); } match qdrant_client.count_points_exact(collection) { Ok(actual) if actual >= expected_points => Some(actual), @@ -662,14 +691,18 @@ fn persist_finalized_manifest( pub(crate) fn compute_sidecar_input_fingerprint( storage: &Store, + storage_path: &Path, project_root: &Path, project_id: &str, embedding_backend: &str, embedding_dim: i32, ) -> Result { - let lexical = lexical_input_fingerprint(project_root).context("hash lexical sidecar input")?; + let lexical = lexical_input_fingerprint(project_root, Some(storage_path)) + .context("hash lexical sidecar input")?; let mut hasher = Sha256::new(); - hash_part(&mut hasher, "codestory-sidecar-input-v4"); + let mut graph_hasher = Sha256::new(); + hash_part(&mut hasher, "codestory-sidecar-input-v5"); + hash_part(&mut graph_hasher, "codestory-symbol-search-docs-v1"); hash_part(&mut hasher, project_id); hash_part(&mut hasher, &SIDECAR_SCHEMA_VERSION.to_string()); hash_part(&mut hasher, ZOEKT_REAL_VERSION_PIN); @@ -699,7 +732,29 @@ pub(crate) fn compute_sidecar_input_fingerprint( ); hash_part(&mut hasher, "scip-symbols-json-v1"); - let mut projection_count = 0_i64; + let mut symbol_doc_count = 0_i64; + let mut policy_versions = BTreeSet::::new(); + let mut after_symbol_doc = None; + loop { + let batch = storage + .get_symbol_search_docs_batch_after(after_symbol_doc, SIDECAR_INPUT_BATCH_SIZE) + .context("load symbol search docs for sidecar hash")?; + if batch.is_empty() { + break; + } + after_symbol_doc = batch.last().map(|doc| doc.node_id); + symbol_doc_count += i64::try_from(batch.len()).unwrap_or(i64::MAX); + for doc in batch { + observe_policy_version(&mut policy_versions, Some(doc.policy_version.as_str())); + hash_symbol_search_doc_detail(&mut graph_hasher, project_root, &doc); + } + } + let graph_artifact_hash = format!("{:x}", graph_hasher.finalize()); + hash_part(&mut hasher, &symbol_doc_count.to_string()); + hash_part(&mut hasher, &graph_artifact_hash); + + let mut dense_projection_count = 0_i64; + let mut dense_reason_counts = BTreeMap::::new(); let mut after = None; loop { let batch = storage @@ -713,20 +768,87 @@ pub(crate) fn compute_sidecar_input_fingerprint( .into_iter() .filter(qdrant_semantic_doc_row) .collect::>(); - projection_count += i64::try_from(batch.len()).unwrap_or(i64::MAX); + dense_projection_count += i64::try_from(batch.len()).unwrap_or(i64::MAX); for doc in batch { + observe_policy_version(&mut policy_versions, doc.semantic_policy_version.as_deref()); + let reason = doc.dense_reason.as_deref().unwrap_or("unknown").to_string(); + *dense_reason_counts.entry(reason).or_insert(0) += 1; hash_semantic_doc_detail(&mut hasher, project_root, &doc); } } + let dense_reason_counts_json = + serde_json::to_string(&dense_reason_counts).unwrap_or_else(|_| "{}".into()); + let semantic_policy_version = policy_version_from_observed(&policy_versions) + .or_else(|| Some(crate::generation::SEMANTIC_POLICY_VERSION.into())); + hash_part( + &mut hasher, + semantic_policy_version.as_deref().unwrap_or(""), + ); + hash_part(&mut hasher, &dense_projection_count.to_string()); + hash_part(&mut hasher, &dense_reason_counts_json); Ok(SidecarInputFingerprint { hash: format!("{:x}", hasher.finalize()), - projection_count, + symbol_doc_count, + projection_count: dense_projection_count, + dense_projection_count, + semantic_policy_version, + graph_artifact_hash, + dense_reason_counts_json, lexical_file_count: lexical.file_count, lexical_hash: lexical.hash, }) } +fn observe_policy_version(policy_versions: &mut BTreeSet, policy: Option<&str>) { + if let Some(policy) = policy.map(str::trim).filter(|policy| !policy.is_empty()) { + policy_versions.insert(policy.to_string()); + } +} + +fn policy_version_from_observed(policy_versions: &BTreeSet) -> Option { + match policy_versions.len() { + 0 => None, + 1 => policy_versions.iter().next().cloned(), + _ => Some("mixed".into()), + } +} + +fn hash_symbol_search_doc_detail(hasher: &mut Sha256, project_root: &Path, doc: &SymbolSearchDoc) { + let file_path = doc + .file_path + .as_deref() + .and_then(|path| normalize_sidecar_file_path(path, project_root).ok()) + .unwrap_or_default(); + let file_role = if file_path.is_empty() { + "" + } else { + FileRole::classify_path(Path::new(&file_path)).as_str() + }; + hash_part(hasher, &doc.node_id.0.to_string()); + hash_part( + hasher, + &doc.file_node_id + .map(|node_id| node_id.0.to_string()) + .unwrap_or_default(), + ); + hash_part(hasher, &(doc.kind as i32).to_string()); + hash_part(hasher, &doc.display_name); + hash_part(hasher, doc.qualified_name.as_deref().unwrap_or("")); + hash_part(hasher, &file_path); + hash_part(hasher, file_role); + hash_part( + hasher, + &doc.start_line + .map(|line| line.to_string()) + .unwrap_or_default(), + ); + hash_part(hasher, &doc.doc_version.to_string()); + hash_part(hasher, &doc.doc_hash); + hash_part(hasher, &doc.policy_version); + hash_part(hasher, &doc.source_provenance); +} + fn hash_semantic_doc_detail(hasher: &mut Sha256, project_root: &Path, doc: &LlmSymbolDoc) { let file_path = doc .file_path @@ -752,6 +874,8 @@ fn hash_semantic_doc_detail(hasher: &mut Sha256, project_root: &Path, doc: &LlmS ); hash_part(hasher, &doc.doc_version.to_string()); hash_part(hasher, &doc.doc_hash); + hash_part(hasher, doc.semantic_policy_version.as_deref().unwrap_or("")); + hash_part(hasher, doc.dense_reason.as_deref().unwrap_or("")); hash_part(hasher, doc.embedding_profile.as_deref().unwrap_or("")); hash_part(hasher, &doc.embedding_model); hash_part(hasher, doc.embedding_backend.as_deref().unwrap_or("")); @@ -835,6 +959,7 @@ fn upsert_qdrant_points_from_store( node_id: doc.node_id.0.to_string(), file_path, file_role, + dense_reason: doc.dense_reason.clone(), vector: Some(doc.embedding), } }) @@ -945,7 +1070,12 @@ mod tests { let project_id = "proj"; let input = SidecarInputFingerprint { hash: "0123456789abcdef0123456789abcdef".into(), + symbol_doc_count: 42, projection_count: 42, + dense_projection_count: 42, + semantic_policy_version: Some(crate::generation::SEMANTIC_POLICY_VERSION.into()), + graph_artifact_hash: "graph-hash".into(), + dense_reason_counts_json: "{\"public_api\":42}".into(), lexical_file_count: 3, lexical_hash: "lexical".into(), }; @@ -963,6 +1093,12 @@ mod tests { assert!(manifest_has_current_sidecar_contract(project_id, &manifest)); assert_eq!(manifest.projection_count, Some(42)); + assert_eq!(manifest.symbol_doc_count, Some(42)); + assert_eq!(manifest.dense_projection_count, Some(42)); + assert_eq!( + manifest.semantic_policy_version.as_deref(), + Some(crate::generation::SEMANTIC_POLICY_VERSION) + ); assert_eq!( manifest.sidecar_generation.as_deref(), Some(generation.as_str()) @@ -1008,6 +1144,8 @@ mod tests { embedding_backend: backend.map(str::to_string), embedding_dim: dim, doc_shape: Some("semantic_doc_version=4;scope=durable_symbols".into()), + semantic_policy_version: Some(crate::generation::SEMANTIC_POLICY_VERSION.into()), + dense_reason: Some("public_api".into()), embedding: vec![0.01; dim as usize], updated_at_epoch_ms: 123, }; @@ -1071,6 +1209,8 @@ mod tests { embedding_backend: Some("onnx".into()), embedding_dim: crate::embeddings::RETRIEVAL_EMBEDDING_DIM as u32, doc_shape: Some("semantic_doc_version=4;scope=durable_symbols".into()), + semantic_policy_version: Some(crate::generation::SEMANTIC_POLICY_VERSION.into()), + dense_reason: Some("public_api".into()), embedding: vec![0.01; crate::embeddings::RETRIEVAL_EMBEDDING_DIM], updated_at_epoch_ms: 123, }; @@ -1079,6 +1219,7 @@ mod tests { .expect("first doc"); let first = compute_sidecar_input_fingerprint( &storage, + &storage_path, project.path(), "proj", crate::embeddings::PRODUCT_EMBEDDING_RUNTIME_ID, @@ -1091,6 +1232,7 @@ mod tests { .expect("second doc"); let second = compute_sidecar_input_fingerprint( &storage, + &storage_path, project.path(), "proj", crate::embeddings::PRODUCT_EMBEDDING_RUNTIME_ID, @@ -1100,6 +1242,8 @@ mod tests { assert_eq!(first.projection_count, 1); assert_eq!(second.projection_count, 1); + assert_eq!(first.dense_projection_count, 1); + assert_eq!(first.dense_reason_counts_json, "{\"public_api\":1}"); assert_ne!(first.hash, second.hash); } } diff --git a/crates/codestory-retrieval/src/planner.rs b/crates/codestory-retrieval/src/planner.rs index d0409a04..582252c5 100644 --- a/crates/codestory-retrieval/src/planner.rs +++ b/crates/codestory-retrieval/src/planner.rs @@ -66,18 +66,6 @@ pub fn plan_query(features: &QueryFeatures, mode: RetrievalDegradedMode) -> Retr }); } - if mode.runs_qdrant_stage() && features.shape != QueryShape::PathLike { - let semantic_top_k = match features.shape { - QueryShape::NaturalLanguage | QueryShape::Mixed => top_k.saturating_mul(2).min(40), - _ => top_k, - }; - stages.push(PlannedStage { - kind: RetrievalStageKind::Stage1bQdrantSemantic, - budget_ms: stage1b_budget_ms(features.shape), - top_k: semantic_top_k, - }); - } - if mode.runs_scip_stages() { let stage2_top_k = match features.shape { QueryShape::NaturalLanguage => top_k.min(20), @@ -90,6 +78,18 @@ pub fn plan_query(features: &QueryFeatures, mode: RetrievalDegradedMode) -> Retr }); } + if mode.runs_qdrant_stage() && features.shape != QueryShape::PathLike { + let semantic_top_k = match features.shape { + QueryShape::NaturalLanguage | QueryShape::Mixed => top_k.saturating_mul(2).min(40), + _ => top_k, + }; + stages.push(PlannedStage { + kind: RetrievalStageKind::Stage1bQdrantSemantic, + budget_ms: stage1b_budget_ms(features.shape), + top_k: semantic_top_k, + }); + } + let total_budget_ms = stages .iter() .map(|stage| stage.budget_ms) @@ -156,8 +156,16 @@ mod tests { let kinds: Vec<_> = plan.stages.iter().map(|s| s.kind).collect(); assert!(kinds.contains(&RetrievalStageKind::Stage0ScipAnchor)); assert!(kinds.contains(&RetrievalStageKind::Stage1ZoektLexical)); - assert!(kinds.contains(&RetrievalStageKind::Stage1bQdrantSemantic)); assert!(kinds.contains(&RetrievalStageKind::Stage2ScipExpand)); + assert!(kinds.contains(&RetrievalStageKind::Stage1bQdrantSemantic)); + assert!( + kinds + .iter() + .position(|kind| *kind == RetrievalStageKind::Stage2ScipExpand) + < kinds + .iter() + .position(|kind| *kind == RetrievalStageKind::Stage1bQdrantSemantic) + ); } #[test] @@ -180,8 +188,16 @@ mod tests { let plan = plan_query(&features, RetrievalDegradedMode::Full); let kinds: Vec<_> = plan.stages.iter().map(|s| s.kind).collect(); assert!(!kinds.contains(&RetrievalStageKind::Stage0ScipAnchor)); - assert!(kinds.contains(&RetrievalStageKind::Stage1bQdrantSemantic)); assert!(kinds.contains(&RetrievalStageKind::Stage2ScipExpand)); + assert!(kinds.contains(&RetrievalStageKind::Stage1bQdrantSemantic)); + assert!( + kinds + .iter() + .position(|kind| *kind == RetrievalStageKind::Stage2ScipExpand) + < kinds + .iter() + .position(|kind| *kind == RetrievalStageKind::Stage1bQdrantSemantic) + ); } #[test] @@ -191,7 +207,15 @@ mod tests { let kinds: Vec<_> = plan.stages.iter().map(|s| s.kind).collect(); assert!(!kinds.contains(&RetrievalStageKind::Stage0ScipAnchor)); assert!(kinds.contains(&RetrievalStageKind::Stage1ZoektLexical)); - assert!(kinds.contains(&RetrievalStageKind::Stage1bQdrantSemantic)); assert!(kinds.contains(&RetrievalStageKind::Stage2ScipExpand)); + assert!(kinds.contains(&RetrievalStageKind::Stage1bQdrantSemantic)); + assert!( + kinds + .iter() + .position(|kind| *kind == RetrievalStageKind::Stage2ScipExpand) + < kinds + .iter() + .position(|kind| *kind == RetrievalStageKind::Stage1bQdrantSemantic) + ); } } diff --git a/crates/codestory-retrieval/src/qdrant_client.rs b/crates/codestory-retrieval/src/qdrant_client.rs index a088dff8..128a30dd 100644 --- a/crates/codestory-retrieval/src/qdrant_client.rs +++ b/crates/codestory-retrieval/src/qdrant_client.rs @@ -23,6 +23,7 @@ pub struct QdrantUpsertPoint { pub node_id: String, pub file_path: Option, pub file_role: Option, + pub dense_reason: Option, pub vector: Option>, } @@ -327,6 +328,7 @@ impl QdrantClient { "path": point.file_path, "file_role": point.file_role.map(FileRole::as_str), "symbol": point.display_name, + "dense_reason": point.dense_reason, } })); } @@ -504,13 +506,26 @@ pub fn parse_search_response(body: &str, limit: usize) -> Result super::CandidateHit { use super::candidate::{CandidateHit, CandidateSource}; CandidateHit { + node_id: None, file_path: symbol.path.clone(), symbol_name: Some(symbol.symbol.clone()), start_line: Some(symbol.start_line), score, source: CandidateSource::Scip, + provenance: Vec::new(), file_role: None, scip_hop_distance: Some(hop), rank_features: None, diff --git a/crates/codestory-retrieval/src/sidecar.rs b/crates/codestory-retrieval/src/sidecar.rs index 294624e1..0f7a483a 100644 --- a/crates/codestory-retrieval/src/sidecar.rs +++ b/crates/codestory-retrieval/src/sidecar.rs @@ -6,6 +6,9 @@ use crate::generation::{ use crate::health::{RetrievalStatusReport, probe_sidecar_health}; use crate::index::{compute_sidecar_input_fingerprint, project_id_for_root}; use anyhow::{Context, Result}; +use codestory_contracts::language_support::{ + LanguageSupportMode, language_support_profile_for_ext, +}; use codestory_store::Store; use codestory_workspace::{RefreshInputs, StoredFileState, WorkspaceManifest}; use serde::{Deserialize, Serialize}; @@ -75,9 +78,14 @@ fn sidecar_status_inner( .context("load retrieval manifest")?; if strict && let Some(manifest) = manifest.as_ref() - && let Some(reason) = - strict_readiness_unavailable_reason(project_root, &storage, &project_id, manifest) - .context("check strict sidecar readiness")? + && let Some(reason) = strict_readiness_unavailable_reason( + project_root, + path, + &storage, + &project_id, + manifest, + ) + .context("check strict sidecar readiness")? { return Ok(enrich_status_with_semantic_doc_stats( crate::health::unavailable_status_report( @@ -117,6 +125,7 @@ fn enrich_status_with_semantic_doc_stats( pub(crate) fn validate_strict_sidecar_readiness( project_root: &Path, + storage_path: &Path, storage: &Store, ) -> Result<()> { let project_id = project_id_for_root(project_root); @@ -126,9 +135,13 @@ pub(crate) fn validate_strict_sidecar_readiness( else { return Ok(()); }; - if let Some(reason) = - strict_readiness_unavailable_reason(project_root, storage, &project_id, &manifest)? - { + if let Some(reason) = strict_readiness_unavailable_reason( + project_root, + storage_path, + storage, + &project_id, + &manifest, + )? { anyhow::bail!("sidecar_manifest_stale: {reason}"); } Ok(()) @@ -136,6 +149,7 @@ pub(crate) fn validate_strict_sidecar_readiness( fn strict_readiness_unavailable_reason( project_root: &Path, + storage_path: &Path, storage: &Store, project_id: &str, manifest: &codestory_store::RetrievalIndexManifest, @@ -154,6 +168,7 @@ fn strict_readiness_unavailable_reason( .unwrap_or(crate::embeddings::RETRIEVAL_EMBEDDING_DIM as i32); let current_input = compute_sidecar_input_fingerprint( storage, + storage_path, project_root, project_id, &embedding_backend, @@ -162,6 +177,13 @@ fn strict_readiness_unavailable_reason( .context("compute strict sidecar input fingerprint")?; if manifest.sidecar_input_hash.as_deref() == Some(current_input.hash.as_str()) && manifest.projection_count == Some(current_input.projection_count) + && manifest.symbol_doc_count == Some(current_input.symbol_doc_count) + && manifest.dense_projection_count == Some(current_input.dense_projection_count) + && manifest.semantic_policy_version == current_input.semantic_policy_version + && manifest.graph_artifact_hash.as_deref() + == Some(current_input.graph_artifact_hash.as_str()) + && manifest.dense_reason_counts_json.as_deref() + == Some(current_input.dense_reason_counts_json.as_str()) { return Ok(None); } @@ -200,12 +222,22 @@ fn strict_readiness_unavailable_reason( ))); } Ok(Some(format!( - "sidecar_input_hash_changed: manifest={} current={}; projection_count manifest={} current={}", + "sidecar_input_hash_changed: manifest={} current={}; symbol_doc_count manifest={} current={}; dense_projection_count manifest={} current={}; projection_count manifest={} current={}", manifest .sidecar_input_hash .as_deref() .unwrap_or(""), current_input.hash, + manifest + .symbol_doc_count + .map(|count| count.to_string()) + .unwrap_or_else(|| "".into()), + current_input.symbol_doc_count, + manifest + .dense_projection_count + .map(|count| count.to_string()) + .unwrap_or_else(|| "".into()), + current_input.dense_projection_count, manifest .projection_count .map(|count| count.to_string()) @@ -215,41 +247,10 @@ fn strict_readiness_unavailable_reason( } fn graph_indexed_source_path(path: &Path) -> bool { - let extension = path - .extension() + path.extension() .and_then(|ext| ext.to_str()) - .map(|ext| ext.to_ascii_lowercase()); - matches!( - extension.as_deref(), - Some( - "rs" | "py" - | "pyi" - | "java" - | "js" - | "jsx" - | "mjs" - | "cjs" - | "svelte" - | "vue" - | "astro" - | "ts" - | "tsx" - | "mts" - | "cts" - | "c" - | "cc" - | "cpp" - | "cxx" - | "h" - | "hh" - | "hpp" - | "hxx" - | "go" - | "rb" - | "php" - | "cs" - ) - ) + .and_then(language_support_profile_for_ext) + .is_some_and(|profile| profile.support_mode == LanguageSupportMode::ParserBackedGraph) } fn manifest_contract_drift_should_win(reason: &str) -> bool { @@ -292,6 +293,13 @@ mod tests { sidecar_input_hash: Some(hash.into()), sidecar_generation: Some(sidecar_generation_id(&project_id, hash)), projection_count: Some(10), + symbol_doc_count: Some(10), + dense_projection_count: Some(10), + semantic_policy_version: Some( + crate::generation::SEMANTIC_POLICY_VERSION.into(), + ), + graph_artifact_hash: Some("graph-test-hash".into()), + dense_reason_counts_json: Some("{\"public_api\":10}".into()), }) .expect("manifest"); } @@ -350,6 +358,13 @@ mod tests { sidecar_input_hash: Some(hash.into()), sidecar_generation: Some(sidecar_generation_id(&project_id, hash)), projection_count: Some(0), + symbol_doc_count: Some(0), + dense_projection_count: Some(0), + semantic_policy_version: Some( + crate::generation::SEMANTIC_POLICY_VERSION.into(), + ), + graph_artifact_hash: Some("graph-test-hash".into()), + dense_reason_counts_json: Some("{}".into()), }) .expect("manifest"); } @@ -423,6 +438,13 @@ mod tests { sidecar_input_hash: Some(hash.into()), sidecar_generation: Some(sidecar_generation_id(&project_id, hash)), projection_count: Some(0), + symbol_doc_count: Some(0), + dense_projection_count: Some(0), + semantic_policy_version: Some( + crate::generation::SEMANTIC_POLICY_VERSION.into(), + ), + graph_artifact_hash: Some("graph-test-hash".into()), + dense_reason_counts_json: Some("{}".into()), }) .expect("manifest"); } @@ -497,6 +519,13 @@ mod tests { sidecar_input_hash: Some(hash.into()), sidecar_generation: Some(sidecar_generation_id(&project_id, hash)), projection_count: Some(0), + symbol_doc_count: Some(0), + dense_projection_count: Some(0), + semantic_policy_version: Some( + crate::generation::SEMANTIC_POLICY_VERSION.into(), + ), + graph_artifact_hash: Some("graph-test-hash".into()), + dense_reason_counts_json: Some("{}".into()), }) .expect("manifest"); } @@ -521,6 +550,78 @@ mod tests { ); } + #[test] + fn strict_status_rejects_manifest_when_new_parser_backed_language_file_is_added() { + let project = TempDir::new().expect("project"); + let storage_dir = TempDir::new().expect("storage"); + let storage_path = storage_dir.path().join("codestory.db"); + let source_path = project.path().join("src").join("lib.rs"); + std::fs::create_dir_all(source_path.parent().expect("source parent")) + .expect("create source parent"); + std::fs::write(&source_path, "pub fn indexed() {}\n").expect("write source"); + let indexed_mtime = live_mtime_millis(&source_path); + let project_id = project_id_for_root(project.path()); + let hash = "ba5eba11feedface"; + { + let mut storage = Store::open(&storage_path).expect("open db"); + storage + .insert_file(&FileInfo { + id: 1, + path: source_path.clone(), + language: "rust".into(), + modification_time: indexed_mtime, + indexed: true, + complete: true, + line_count: 1, + file_role: FileRole::Source, + }) + .expect("insert indexed file"); + storage + .upsert_retrieval_index_manifest(&codestory_store::RetrievalIndexManifest { + project_id: project_id.clone(), + zoekt_version: "zoekt-real-v1".into(), + qdrant_collection: sidecar_qdrant_collection(&project_id, hash), + scip_revision: Some("graph-test".into()), + built_at_epoch_ms: indexed_mtime, + disk_bytes: None, + degraded_modes_json: "[]".into(), + embedding_backend: Some(crate::embeddings::PRODUCT_EMBEDDING_RUNTIME_ID.into()), + embedding_dim: Some(768), + sidecar_schema_version: Some(SIDECAR_SCHEMA_VERSION), + sidecar_input_hash: Some(hash.into()), + sidecar_generation: Some(sidecar_generation_id(&project_id, hash)), + projection_count: Some(0), + symbol_doc_count: Some(0), + dense_projection_count: Some(0), + semantic_policy_version: Some( + crate::generation::SEMANTIC_POLICY_VERSION.into(), + ), + graph_artifact_hash: Some("graph-test-hash".into()), + dense_reason_counts_json: Some("{}".into()), + }) + .expect("manifest"); + } + + std::fs::write( + project.path().join("src").join("Routes.kt"), + "fun routeUsers() = Unit\n", + ) + .expect("write kotlin source"); + + let report = + strict_sidecar_status(project.path(), Some(&storage_path)).expect("strict status"); + + assert_eq!(report.retrieval_mode, "unavailable"); + assert!( + report + .degraded_reason + .as_deref() + .unwrap_or_default() + .contains("indexable_file_added_or_changed_after_sidecar_manifest"), + "new registry-backed parser file should make strict status fail closed: {report:?}" + ); + } + #[test] fn strict_readiness_accepts_markdown_covered_by_sidecar_fingerprint() { let project = TempDir::new().expect("project"); @@ -550,6 +651,7 @@ mod tests { .expect("insert indexed file"); let input = compute_sidecar_input_fingerprint( &storage, + &storage_path, project.path(), &project_id, crate::embeddings::PRODUCT_EMBEDDING_RUNTIME_ID, @@ -571,14 +673,19 @@ mod tests { sidecar_input_hash: Some(input.hash.clone()), sidecar_generation: Some(sidecar_generation_id(&project_id, &input.hash)), projection_count: Some(input.projection_count), + symbol_doc_count: Some(input.symbol_doc_count), + dense_projection_count: Some(input.dense_projection_count), + semantic_policy_version: input.semantic_policy_version.clone(), + graph_artifact_hash: Some(input.graph_artifact_hash.clone()), + dense_reason_counts_json: Some(input.dense_reason_counts_json.clone()), }) .expect("manifest"); - validate_strict_sidecar_readiness(project.path(), &storage) + validate_strict_sidecar_readiness(project.path(), &storage_path, &storage) .expect("markdown already covered by sidecar input should not look stale"); std::fs::write(project.path().join("README.md"), "# New docs\n").expect("write new docs"); - let stale = validate_strict_sidecar_readiness(project.path(), &storage) + let stale = validate_strict_sidecar_readiness(project.path(), &storage_path, &storage) .expect_err("new sidecar-only docs should stale the manifest"); assert!( stale.to_string().contains("sidecar_input_hash_changed"), diff --git a/crates/codestory-retrieval/src/zoekt_client.rs b/crates/codestory-retrieval/src/zoekt_client.rs index b42bb06e..028d7c53 100644 --- a/crates/codestory-retrieval/src/zoekt_client.rs +++ b/crates/codestory-retrieval/src/zoekt_client.rs @@ -75,7 +75,18 @@ impl ZoektClient { let hits = search_lexical_index(&shard_dir, query, limit)? .into_iter() - .map(|hit| CandidateHit::with_source(hit.path, None, hit.score, CandidateSource::Zoekt)) + .map(|hit| { + let mut candidate = CandidateHit::with_source( + hit.path, + hit.symbol_name, + hit.score, + CandidateSource::Zoekt, + ); + candidate.node_id = hit.node_id; + candidate.start_line = hit.start_line; + candidate.add_provenance(hit.source.provenance_label()); + candidate + }) .collect::>(); Ok(hits) } @@ -115,10 +126,22 @@ mod tests { .expect("write b"); let zoekt_data = TempDir::new().expect("zoekt data"); - build_zoekt_shard(project_a.path(), zoekt_data.path(), "project-a", false) - .expect("index a"); - build_zoekt_shard(project_b.path(), zoekt_data.path(), "project-b", false) - .expect("index b"); + build_zoekt_shard( + project_a.path(), + None, + zoekt_data.path(), + "project-a", + false, + ) + .expect("index a"); + build_zoekt_shard( + project_b.path(), + None, + zoekt_data.path(), + "project-b", + false, + ) + .expect("index b"); let mut layout = SidecarLayout::from_env(); layout.zoekt_data_dir = zoekt_data.path().to_path_buf(); diff --git a/crates/codestory-retrieval/src/zoekt_index.rs b/crates/codestory-retrieval/src/zoekt_index.rs index 799253ac..7a624111 100644 --- a/crates/codestory-retrieval/src/zoekt_index.rs +++ b/crates/codestory-retrieval/src/zoekt_index.rs @@ -2,7 +2,7 @@ use crate::config::ZOEKT_REAL_VERSION_PIN; use anyhow::{Context, Result}; -use codestory_store::FileRole; +use codestory_store::{FileRole, Store, SymbolSearchDoc}; use serde::{Deserialize, Serialize}; use std::path::{Path, PathBuf}; @@ -16,6 +16,33 @@ const MAX_FILE_BYTES: usize = 256 * 1024; struct LexicalIndexEntry { path: String, content: String, + #[serde(default)] + source: LexicalDocumentSource, + #[serde(default, skip_serializing_if = "Option::is_none")] + node_id: Option, + #[serde(default, skip_serializing_if = "Option::is_none")] + symbol_name: Option, + #[serde(default, skip_serializing_if = "Option::is_none")] + start_line: Option, +} + +#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize, Default)] +#[serde(rename_all = "snake_case")] +pub enum LexicalDocumentSource { + #[default] + LexicalSource, + SymbolDoc, + ComponentReport, +} + +impl LexicalDocumentSource { + pub(crate) fn provenance_label(self) -> &'static str { + match self { + Self::LexicalSource => "lexical_source", + Self::SymbolDoc => "symbol_doc", + Self::ComponentReport => "component_report", + } + } } #[derive(Debug, Clone, Serialize, Deserialize)] @@ -36,6 +63,7 @@ pub struct LexicalInputFingerprint { /// Populate `shards//` with a searchable lexical index; remove stub marker on success. pub fn build_zoekt_shard( project_root: &Path, + storage_path: Option<&Path>, zoekt_data_dir: &Path, project_id: &str, zoekt_http_reachable: bool, @@ -44,7 +72,7 @@ pub fn build_zoekt_shard( std::fs::create_dir_all(&shard_dir) .with_context(|| format!("create zoekt shard dir {}", shard_dir.display()))?; - let entries = collect_lexical_entries(project_root)?; + let entries = collect_lexical_entries(project_root, storage_path)?; if entries.is_empty() { return Ok(false); } @@ -108,25 +136,60 @@ pub fn shard_matches_lexical_input( && meta.lexical_hash.as_deref() == Some(expected_hash) } -pub fn lexical_input_fingerprint(project_root: &Path) -> Result { - let entries = collect_lexical_entries(project_root)?; +pub fn lexical_input_fingerprint( + project_root: &Path, + storage_path: Option<&Path>, +) -> Result { + let mut hasher = new_lexical_entries_hasher(); + let mut file_count = 0_usize; + hash_lexical_entries_inner(project_root, project_root, &mut hasher, &mut file_count)?; + hash_symbol_doc_entries(project_root, storage_path, &mut hasher, &mut file_count)?; Ok(LexicalInputFingerprint { - file_count: entries.len().min(u32::MAX as usize) as u32, - hash: lexical_entries_hash(&entries), + file_count: file_count.min(u32::MAX as usize) as u32, + hash: finalize_lexical_entries_hash(hasher), }) } fn lexical_entries_hash(entries: &[LexicalIndexEntry]) -> String { - let mut hasher = sha2::Sha256::new(); + let mut hasher = new_lexical_entries_hasher(); + for entry in entries { + update_lexical_entries_hash(&mut hasher, entry); + } + finalize_lexical_entries_hash(hasher) +} + +fn new_lexical_entries_hasher() -> sha2::Sha256 { use sha2::Digest; + let mut hasher = sha2::Sha256::new(); hasher.update(b"codestory-zoekt-lexical-v1"); hasher.update(ZOEKT_REAL_VERSION_PIN.as_bytes()); - for entry in entries { - hasher.update(entry.path.as_bytes()); - hasher.update([0]); - hasher.update(entry.content.as_bytes()); - hasher.update([0]); + hasher +} + +fn update_lexical_entries_hash(hasher: &mut sha2::Sha256, entry: &LexicalIndexEntry) { + use sha2::Digest; + hasher.update(entry.path.as_bytes()); + hasher.update([0]); + hasher.update(entry.content.as_bytes()); + hasher.update([0]); + hasher.update(entry.source.provenance_label().as_bytes()); + hasher.update([0]); + if let Some(node_id) = entry.node_id.as_deref() { + hasher.update(node_id.as_bytes()); + } + hasher.update([0]); + if let Some(symbol_name) = entry.symbol_name.as_deref() { + hasher.update(symbol_name.as_bytes()); } + hasher.update([0]); + if let Some(start_line) = entry.start_line { + hasher.update(start_line.to_le_bytes()); + } + hasher.update([0]); +} + +fn finalize_lexical_entries_hash(hasher: sha2::Sha256) -> String { + use sha2::Digest; format!("{:x}", hasher.finalize()) } @@ -174,9 +237,13 @@ pub fn search_lexical_index( if token_match.matched_weight >= required_weight && broad_query_path_gate(tokens.len(), &token_match) { - let score = score_lexical_match(&entry.path, &token_match); + let score = score_lexical_match(&entry.path, entry.source, &token_match); hits.push(LexicalHit { path: entry.path, + source: entry.source, + node_id: entry.node_id, + symbol_name: entry.symbol_name, + start_line: entry.start_line, score, }); } @@ -345,10 +412,18 @@ fn path_match_factor(path_lower: &str, token: &str) -> f32 { #[derive(Debug, Clone)] pub struct LexicalHit { pub path: String, + pub source: LexicalDocumentSource, + pub node_id: Option, + pub symbol_name: Option, + pub start_line: Option, pub score: f32, } -fn score_lexical_match(path: &str, token_match: &LexicalTokenMatch) -> f32 { +fn score_lexical_match( + path: &str, + source: LexicalDocumentSource, + token_match: &LexicalTokenMatch, +) -> f32 { let coverage = if token_match.total_weight <= 0.0 { 0.0 } else { @@ -362,7 +437,11 @@ fn score_lexical_match(path: &str, token_match: &LexicalTokenMatch) -> f32 { if path_lower.contains("/src/") || path_lower.starts_with("src/") { score += 0.04; } - score *= lexical_file_role_multiplier(FileRole::classify_path(Path::new(path))); + if source == LexicalDocumentSource::ComponentReport { + score += 0.08; + } else { + score *= lexical_file_role_multiplier(FileRole::classify_path(Path::new(path))); + } score.min(0.99) } @@ -378,9 +457,13 @@ fn lexical_file_role_multiplier(file_role: FileRole) -> f32 { } } -fn collect_lexical_entries(project_root: &Path) -> Result> { +fn collect_lexical_entries( + project_root: &Path, + storage_path: Option<&Path>, +) -> Result> { let mut entries = Vec::new(); collect_lexical_entries_inner(project_root, project_root, &mut entries)?; + collect_symbol_doc_entries(project_root, storage_path, &mut entries)?; Ok(entries) } @@ -426,11 +509,168 @@ fn collect_lexical_entries_inner( .unwrap_or(&path) .to_string_lossy() .replace('\\', "/"); - entries.push(LexicalIndexEntry { path: rel, content }); + entries.push(LexicalIndexEntry { + path: rel, + content, + source: LexicalDocumentSource::LexicalSource, + node_id: None, + symbol_name: None, + start_line: None, + }); + } + Ok(()) +} + +fn hash_lexical_entries_inner( + project_root: &Path, + dir: &Path, + hasher: &mut sha2::Sha256, + file_count: &mut usize, +) -> Result<()> { + let read_dir = match std::fs::read_dir(dir) { + Ok(read_dir) => read_dir, + Err(_) => return Ok(()), + }; + let mut dir_entries = read_dir.flatten().collect::>(); + dir_entries.sort_by_key(|entry| entry.path()); + + for entry in dir_entries { + let path = entry.path(); + let name = entry.file_name(); + let name = name.to_string_lossy(); + if path.is_dir() { + if should_skip_dir(&name) { + continue; + } + hash_lexical_entries_inner(project_root, &path, hasher, file_count)?; + continue; + } + if !should_index_file(&name) { + continue; + } + let metadata = entry.metadata().ok(); + if metadata + .as_ref() + .and_then(|meta| meta.len().try_into().ok()) + .is_some_and(|len: usize| len > MAX_FILE_BYTES) + { + continue; + } + let Ok(content) = std::fs::read_to_string(&path) else { + continue; + }; + let rel = path + .strip_prefix(project_root) + .unwrap_or(&path) + .to_string_lossy() + .replace('\\', "/"); + update_lexical_entries_hash( + hasher, + &LexicalIndexEntry { + path: rel, + content, + source: LexicalDocumentSource::LexicalSource, + node_id: None, + symbol_name: None, + start_line: None, + }, + ); + *file_count = file_count.saturating_add(1); + } + Ok(()) +} + +fn collect_symbol_doc_entries( + project_root: &Path, + storage_path: Option<&Path>, + entries: &mut Vec, +) -> Result<()> { + let Some(storage_path) = storage_path.filter(|path| path.is_file()) else { + return Ok(()); + }; + let storage = Store::open(storage_path).context("open storage for lexical symbol docs")?; + let mut after = None; + loop { + let batch = storage + .get_symbol_search_docs_batch_after(after, 4096) + .context("load symbol search docs for lexical shard")?; + if batch.is_empty() { + break; + } + after = batch.last().map(|doc| doc.node_id); + for doc in batch { + entries.push(symbol_doc_lexical_entry(project_root, &doc)); + } + } + Ok(()) +} + +fn hash_symbol_doc_entries( + project_root: &Path, + storage_path: Option<&Path>, + hasher: &mut sha2::Sha256, + file_count: &mut usize, +) -> Result<()> { + let Some(storage_path) = storage_path.filter(|path| path.is_file()) else { + return Ok(()); + }; + let storage = Store::open(storage_path).context("open storage for lexical symbol docs")?; + let mut after = None; + loop { + let batch = storage + .get_symbol_search_docs_batch_after(after, 4096) + .context("load symbol search docs for lexical shard")?; + if batch.is_empty() { + break; + } + after = batch.last().map(|doc| doc.node_id); + for doc in batch { + update_lexical_entries_hash(hasher, &symbol_doc_lexical_entry(project_root, &doc)); + *file_count = file_count.saturating_add(1); + } } Ok(()) } +fn symbol_doc_lexical_entry(project_root: &Path, doc: &SymbolSearchDoc) -> LexicalIndexEntry { + let source = if doc.display_name.starts_with("component_report:") { + LexicalDocumentSource::ComponentReport + } else { + LexicalDocumentSource::SymbolDoc + }; + let path = doc + .file_path + .as_deref() + .and_then(|path| normalize_lexical_file_path(project_root, path)) + .unwrap_or_else(|| { + format!( + "codestory://{}", + doc.display_name + .replace('\\', "/") + .replace([' ', '\t', '\r', '\n'], "_") + ) + }); + LexicalIndexEntry { + path, + content: doc.doc_text.clone(), + source, + node_id: Some(doc.node_id.0.to_string()), + symbol_name: Some(doc.display_name.clone()), + start_line: doc.start_line, + } +} + +fn normalize_lexical_file_path(project_root: &Path, path: &str) -> Option { + let path = Path::new(path); + if path.is_absolute() { + path.strip_prefix(project_root) + .ok() + .map(|rel| rel.to_string_lossy().replace('\\', "/")) + } else { + Some(path.to_string_lossy().replace('\\', "/")) + } +} + fn should_skip_dir(name: &str) -> bool { matches!( name, @@ -464,8 +704,21 @@ pub fn shard_dir_for(zoekt_data_dir: &Path, project_id: &str) -> PathBuf { #[cfg(test)] mod tests { use super::*; + use codestory_contracts::graph::{Node, NodeId, NodeKind}; + use codestory_store::{FileInfo, FileRole}; use tempfile::TempDir; + fn entry(path: &str, content: &str) -> LexicalIndexEntry { + LexicalIndexEntry { + path: path.into(), + content: content.into(), + source: LexicalDocumentSource::LexicalSource, + node_id: None, + symbol_name: None, + start_line: None, + } + } + #[test] fn lexical_index_finds_repo_relative_paths() { let project = TempDir::new().expect("project"); @@ -475,20 +728,124 @@ mod tests { ) .expect("write"); let zoekt_root = TempDir::new().expect("zoekt"); - build_zoekt_shard(project.path(), zoekt_root.path(), "abc123", false).expect("build"); + build_zoekt_shard(project.path(), None, zoekt_root.path(), "abc123", false).expect("build"); let shard = shard_dir_for(zoekt_root.path(), "abc123"); assert!(shard_has_lexical_index(&shard)); let hits = search_lexical_index(&shard, "extension", 8).expect("search"); assert!(hits.iter().any(|hit| hit.path == "lib.rs")); } + #[test] + fn lexical_index_includes_symbol_search_docs_with_node_provenance() { + let project = TempDir::new().expect("project"); + let src = project.path().join("src"); + std::fs::create_dir_all(&src).expect("mkdir"); + let source_path = src.join("lib.rs"); + std::fs::write(&source_path, "fn private_helper() {}\n").expect("write source"); + + let storage_path = project.path().join("codestory.db"); + let mut storage = Store::open(&storage_path).expect("open store"); + storage + .insert_file(&FileInfo { + id: 1, + path: source_path.clone(), + language: "rust".into(), + modification_time: 1, + indexed: true, + complete: true, + line_count: 1, + file_role: FileRole::Source, + }) + .expect("insert file"); + storage + .insert_nodes_batch(&[ + Node { + id: NodeId(1), + kind: NodeKind::FILE, + serialized_name: "src/lib.rs".into(), + qualified_name: None, + canonical_id: None, + file_node_id: None, + start_line: Some(1), + start_col: Some(0), + end_line: Some(1), + end_col: Some(0), + }, + Node { + id: NodeId(2), + kind: NodeKind::FUNCTION, + serialized_name: "private_helper".into(), + qualified_name: Some("private_helper".into()), + canonical_id: None, + file_node_id: Some(NodeId(1)), + start_line: Some(1), + start_col: Some(0), + end_line: Some(1), + end_col: Some(22), + }, + ]) + .expect("insert nodes"); + storage + .upsert_symbol_search_docs_batch(&[SymbolSearchDoc { + node_id: NodeId(2), + file_node_id: Some(NodeId(1)), + kind: NodeKind::FUNCTION, + display_name: "private_helper".into(), + qualified_name: Some("private_helper".into()), + file_path: Some(source_path.to_string_lossy().to_string()), + start_line: Some(1), + doc_text: "symbol private_helper deterministic cache skip logic".into(), + doc_version: 4, + doc_hash: "doc-hash".into(), + policy_version: "graph_first_v1".into(), + source_provenance: "extracted".into(), + updated_at_epoch_ms: 1, + }]) + .expect("upsert symbol doc"); + drop(storage); + + let collected_entries = + collect_lexical_entries(project.path(), Some(&storage_path)).expect("collect entries"); + let streaming_fingerprint = + lexical_input_fingerprint(project.path(), Some(&storage_path)).expect("fingerprint"); + assert_eq!( + streaming_fingerprint.file_count as usize, + collected_entries.len() + ); + assert_eq!( + streaming_fingerprint.hash, + lexical_entries_hash(&collected_entries), + "streaming lexical fingerprint must match collected-entry hash" + ); + + let zoekt_root = TempDir::new().expect("zoekt"); + build_zoekt_shard( + project.path(), + Some(&storage_path), + zoekt_root.path(), + "symbols", + false, + ) + .expect("build"); + let shard = shard_dir_for(zoekt_root.path(), "symbols"); + let hits = search_lexical_index(&shard, "cache skip logic", 4).expect("search"); + let hit = hits + .iter() + .find(|hit| hit.symbol_name.as_deref() == Some("private_helper")) + .expect("symbol doc hit"); + assert_eq!(hit.source, LexicalDocumentSource::SymbolDoc); + assert_eq!(hit.node_id.as_deref(), Some("2")); + assert_eq!(hit.start_line, Some(1)); + } + #[test] fn shard_match_requires_current_lexical_hash_metadata() { let project = TempDir::new().expect("project"); std::fs::write(project.path().join("lib.rs"), "pub fn alpha() {}").expect("write"); let zoekt_root = TempDir::new().expect("zoekt"); - let fingerprint = lexical_input_fingerprint(project.path()).expect("fingerprint"); - build_zoekt_shard(project.path(), zoekt_root.path(), "generation", false).expect("build"); + let fingerprint = lexical_input_fingerprint(project.path(), None).expect("fingerprint"); + build_zoekt_shard(project.path(), None, zoekt_root.path(), "generation", false) + .expect("build"); assert!(shard_matches_lexical_input( zoekt_root.path(), @@ -508,14 +865,8 @@ mod tests { fn lexical_search_scores_all_matches_before_truncating() { let zoekt_root = TempDir::new().expect("zoekt"); let shard = zoekt_root.path(); - let weak = LexicalIndexEntry { - path: "src/a_weak.rs".into(), - content: "handler mentioned once".into(), - }; - let strong = LexicalIndexEntry { - path: "src/z_strong_handler.rs".into(), - content: "handler handler handler".into(), - }; + let weak = entry("src/a_weak.rs", "handler mentioned once"); + let strong = entry("src/z_strong_handler.rs", "handler handler handler"); let lines = [weak, strong] .into_iter() .map(|entry| serde_json::to_string(&entry).expect("serialize")) @@ -542,7 +893,7 @@ mod tests { } let zoekt_root = TempDir::new().expect("zoekt"); - build_zoekt_shard(project.path(), zoekt_root.path(), "large", false).expect("build"); + build_zoekt_shard(project.path(), None, zoekt_root.path(), "large", false).expect("build"); let shard = shard_dir_for(zoekt_root.path(), "large"); let hits = search_lexical_index(&shard, "symbol_4099", 4).expect("search"); @@ -556,14 +907,8 @@ mod tests { fn lexical_search_tie_breaks_by_path() { let zoekt_root = TempDir::new().expect("zoekt"); let shard = zoekt_root.path(); - let later = LexicalIndexEntry { - path: "src/b.rs".into(), - content: "handler".into(), - }; - let earlier = LexicalIndexEntry { - path: "src/a.rs".into(), - content: "handler".into(), - }; + let later = entry("src/b.rs", "handler"); + let earlier = entry("src/a.rs", "handler"); let lines = [later, earlier] .into_iter() .map(|entry| serde_json::to_string(&entry).expect("serialize")) @@ -582,26 +927,23 @@ mod tests { fn lexical_search_uses_partial_matching_for_broad_prompts() { let zoekt_root = TempDir::new().expect("zoekt"); let shard = zoekt_root.path(); - let source = LexicalIndexEntry { - path: "workspace/app/src/event_processor_with_jsonl_output.rs".into(), - content: "jsonl event output request runtime turn start".into(), - }; - let test = LexicalIndexEntry { - path: "workspace/app/tests/event_processor_with_json_output.rs".into(), - content: "json event output test approval fixture".into(), - }; - let unrelated = LexicalIndexEntry { - path: "workspace/core/src/session.rs".into(), - content: "session bookkeeping".into(), - }; - let generic_agent_doc = LexicalIndexEntry { - path: ".agents/skills/review/SKILL.md".into(), - content: "request json cli runtime thread turn start event output".into(), - }; - let generated_schema = LexicalIndexEntry { - path: "workspace/app-protocol/schema/typescript/v2/CommandRequestParams.ts".into(), - content: "app server command request turn start request".into(), - }; + let source = entry( + "workspace/app/src/event_processor_with_jsonl_output.rs", + "jsonl event output request runtime turn start", + ); + let test = entry( + "workspace/app/tests/event_processor_with_json_output.rs", + "json event output test approval fixture", + ); + let unrelated = entry("workspace/core/src/session.rs", "session bookkeeping"); + let generic_agent_doc = entry( + ".agents/skills/review/SKILL.md", + "request json cli runtime thread turn start event output", + ); + let generated_schema = entry( + "workspace/app-protocol/schema/typescript/v2/CommandRequestParams.ts", + "app server command request turn start request", + ); let lines = [test, unrelated, generic_agent_doc, generated_schema, source] .into_iter() .map(|entry| serde_json::to_string(&entry).expect("serialize")) diff --git a/crates/codestory-retrieval/tests/bootstrap_repair_contracts.rs b/crates/codestory-retrieval/tests/bootstrap_repair_contracts.rs index d2c89f38..19399f9b 100644 --- a/crates/codestory-retrieval/tests/bootstrap_repair_contracts.rs +++ b/crates/codestory-retrieval/tests/bootstrap_repair_contracts.rs @@ -46,6 +46,11 @@ fn mixed_flat_and_hashed_cache_protects_both_manifest_collections() { sidecar_input_hash: None, sidecar_generation: None, projection_count: None, + symbol_doc_count: None, + dense_projection_count: None, + semantic_policy_version: None, + graph_artifact_hash: None, + dense_reason_counts_json: None, }) .expect("flat manifest"); @@ -67,6 +72,11 @@ fn mixed_flat_and_hashed_cache_protects_both_manifest_collections() { sidecar_input_hash: None, sidecar_generation: None, projection_count: None, + symbol_doc_count: None, + dense_projection_count: None, + semantic_policy_version: None, + graph_artifact_hash: None, + dense_reason_counts_json: None, }) .expect("hashed manifest"); diff --git a/crates/codestory-retrieval/tests/full_stack_integration.rs b/crates/codestory-retrieval/tests/full_stack_integration.rs index e1903a28..8f5d0b34 100644 --- a/crates/codestory-retrieval/tests/full_stack_integration.rs +++ b/crates/codestory-retrieval/tests/full_stack_integration.rs @@ -104,6 +104,7 @@ fn full_mode_fixture_produces_resolvable_hits() { node_id: "2001".to_string(), file_path: Some("lib.rs".to_string()), file_role: Some(FileRole::Entrypoint), + dense_reason: Some("entrypoint".to_string()), vector: None, }]; if qdrant.upsert_points(&collection, &points).is_ok() { diff --git a/crates/codestory-runtime/Cargo.toml b/crates/codestory-runtime/Cargo.toml index 439991c2..c66cf3e5 100644 --- a/crates/codestory-runtime/Cargo.toml +++ b/crates/codestory-runtime/Cargo.toml @@ -1,6 +1,6 @@ [package] name = "codestory-runtime" -version = "0.6.2" +version = "0.7.0" edition = "2024" [dependencies] diff --git a/crates/codestory-runtime/src/agent/eval_probes.rs b/crates/codestory-runtime/src/agent/eval_probes.rs index 6e7a4f2d..e7f75045 100644 --- a/crates/codestory-runtime/src/agent/eval_probes.rs +++ b/crates/codestory-runtime/src/agent/eval_probes.rs @@ -152,6 +152,202 @@ pub(crate) fn push_eval_required_probe_queries(terms: &[String], queries: &mut V } } +pub(crate) fn push_prompt_concept_derived_symbol_probes( + terms: &[String], + queries: &mut Vec, +) { + if !eval_probes_enabled() { + return; + } + + let has = |term: &str| eval_terms_have(terms, term); + let has_any = |needles: &[&str]| eval_terms_have_any(terms, needles); + + if has("stringutils") && has_any(&["blank", "empty", "whitespace"]) { + push_unique_terms(queries, &["StringUtils.isBlank", "StringUtils.isEmpty"]); + } + if has("strings") && has_any(&["case", "sensitive", "insensitive"]) { + push_unique_terms(queries, &["Strings.CS", "Strings.CI"]); + } + if has("charsequenceutils") + && (has_any(&["case", "sensitive", "region", "matching", "checks"]) || has("strings")) + { + push_unique_term(queries, "CharSequenceUtils.regionMatches"); + } + + let swr_prompt = has("swr") || has("useswr"); + if swr_prompt && has_any(&["exposes", "hook", "hooks", "public"]) { + push_unique_terms( + queries, + &["useSWR", "useSWRHandler", "withArgs", "withMiddleware"], + ); + } + if swr_prompt && has_any(&["serialize", "serializes", "serialized", "key", "keys"]) { + push_unique_term(queries, "serialize"); + } + if swr_prompt && has_any(&["cache", "helper", "helpers"]) { + push_unique_term(queries, "createCacheHelper"); + } + if swr_prompt && has_any(&["mutate", "mutation", "mutations"]) { + push_unique_term(queries, "internalMutate"); + } + + if eval_terms_indicate_python_requests_flow(terms) { + push_python_requests_flow_symbol_probe_queries(queries); + } + if eval_terms_indicate_express_application_route_flow(terms) { + push_express_application_route_symbol_probe_queries(queries); + } + if eval_terms_indicate_gin_route_dispatch_flow(terms) { + push_gin_route_dispatch_symbol_probe_queries(queries); + } + if eval_terms_indicate_css_animation_flow(terms) { + push_css_animation_symbol_probe_queries(queries); + } + if eval_terms_indicate_automapper_map_flow(terms) { + push_automapper_map_flow_symbol_probe_queries(queries); + } +} + +pub(crate) fn push_prompt_named_file_probe_queries(terms: &[String], queries: &mut Vec) { + if !eval_probes_enabled() { + return; + } + + let has = |term: &str| eval_terms_have(terms, term); + let has_any = |needles: &[&str]| eval_terms_have_any(terms, needles); + + if has("stringutils") && has_any(&["blank", "empty", "whitespace"]) { + push_unique_terms( + queries, + &["StringUtils.java", "Strings.java", "CharSequenceUtils.java"], + ); + } + if has("swr") || has("useswr") { + push_unique_terms( + queries, + &[ + "index.ts useSWR", + "use-swr.ts useSWRHandler", + "serialize.ts", + "helper.ts createCacheHelper", + "mutate.ts internalMutate", + "with-middleware.ts withMiddleware", + ], + ); + } + if eval_terms_indicate_python_requests_flow(terms) { + push_unique_terms( + queries, + &[ + "src/requests/api.py request", + "src/requests/sessions.py Session.request", + "src/requests/models.py PreparedRequest.prepare", + "src/requests/sessions.py Session.send", + "src/requests/adapters.py HTTPAdapter.send", + ], + ); + } + if eval_terms_indicate_express_application_route_flow(terms) { + push_unique_terms( + queries, + &[ + "lib/express.js createApplication", + "lib/application.js app.init", + "lib/application.js app.handle", + "lib/application.js app.use", + "lib/application.js app.route", + "lib/response.js res.send", + ], + ); + } + if eval_terms_indicate_gin_route_dispatch_flow(terms) { + push_unique_terms( + queries, + &[ + "gin.go New", + "gin.go Default", + "gin.go Engine.addRoute", + "gin.go Engine.handleHTTPRequest", + "routergroup.go RouterGroup.Handle", + "tree.go node.addRoute", + "context.go Context.Next", + ], + ); + } + if eval_terms_indicate_css_animation_flow(terms) { + push_unique_terms( + queries, + &[ + "source/_vars.css", + "source/_base.css", + "source/animate.css", + "source/attention_seekers/bounce.css bounce", + "source/attention_seekers/flash.css flash", + ], + ); + } + if eval_terms_indicate_automapper_map_flow(terms) { + push_automapper_map_flow_symbol_probe_queries(queries); + } +} + +pub(crate) fn source_derived_claims_for_citation( + prompt: &str, + citation: &AgentCitationDto, + source: &str, +) -> Vec { + if !eval_probes_enabled() { + return Vec::new(); + } + + let path = citation.file_path.as_deref().unwrap_or_default(); + let terms = eval_prompt_terms(prompt); + let mut claims = Vec::new(); + + if eval_terms_indicate_java_string_check_flow(&terms) { + claims.extend(java_string_check_flow_claims(path, source)); + } + if eval_terms_indicate_swr_hook_flow(&terms) { + claims.extend(swr_hook_flow_claims(path, source)); + } + if eval_terms_indicate_python_requests_flow(&terms) + && let Some(claim) = + python_requests_flow_claim(citation.display_name.as_str(), path, source) + { + claims.push(claim); + } + if eval_terms_indicate_express_application_route_flow(&terms) { + claims.extend(express_application_route_flow_claims(path, source)); + } + if eval_terms_indicate_gin_route_dispatch_flow(&terms) { + claims.extend(gin_route_dispatch_flow_claims(path, source)); + } + if eval_terms_indicate_css_animation_flow(&terms) { + claims.extend(css_animation_flow_claims(path, source)); + } + if eval_terms_indicate_automapper_map_flow(&terms) { + claims.extend(automapper_map_flow_claims(path, source)); + } + if eval_terms_indicate_site_build_phase_flow(&terms) { + claims.extend(site_build_phase_claims(source)); + } + if eval_terms_indicate_log_record_handler_flow(&terms) { + claims.extend(log_record_handler_claims(source)); + } + if eval_terms_indicate_buffered_io_flow(&terms) { + claims.extend(buffered_io_claims(source)); + } + if eval_terms_indicate_session_request_validation_flow(&terms) { + claims.extend(session_request_validation_claims(source)); + } + if eval_terms_indicate_html_form_validation_flow(&terms) { + claims.extend(html_form_validation_claims(source)); + } + + claims +} + pub(crate) fn push_index_derived_architecture_probes( _task_class: PacketTaskClassDto, terms: &[String], @@ -297,16 +493,30 @@ pub(crate) fn push_eval_architecture_flow_probe_terms(lower_prompt: &str, terms: if !eval_probes_enabled() { return; } - if lower_prompt.contains("interceptor") { - push_unique_term(terms, "InterceptorManager"); + if lower_prompt.contains("interceptor") + || lower_prompt.contains("dispatchrequest") + || lower_prompt.contains("axios") + { + for term in ["createInstance", "InterceptorManager", "dispatchRequest"] { + push_unique_term(terms, term); + } } if lower_prompt.contains("adapter") || lower_prompt.contains("transport") { - push_unique_term(terms, "adapters"); + for term in ["adapters", "adapters.js"] { + push_unique_term(terms, term); + } } if lower_prompt.contains("event loop") || (lower_prompt.contains("event") && lower_prompt.contains("loop")) { - for term in ["aeMain", "readQueryFromClient", "processCommand"] { + for term in [ + "server.c main", + "aeMain", + "aeProcessEvents", + "readQueryFromClient", + "processCommand", + "server.c call", + ] { push_unique_term(terms, term); } } @@ -317,7 +527,12 @@ pub(crate) fn push_eval_architecture_flow_probe_terms(lower_prompt: &str, terms: || lower_prompt.contains("printer") || lower_prompt.contains("flag")) { - for term in ["HiArgs", "SearchWorker", "haystack"] { + for term in [ + "core/main.rs", + "HiArgs", + "SearchWorker::search", + "haystack.rs", + ] { push_unique_term(terms, term); } } @@ -471,6 +686,815 @@ pub(crate) fn eval_citation_shaped_claim( None } +fn push_unique_terms(queries: &mut Vec, terms: &[&str]) { + for term in terms { + push_unique_term(queries, term); + } +} + +fn eval_prompt_terms(prompt: &str) -> Vec { + prompt + .split(|ch: char| !ch.is_ascii_alphanumeric()) + .filter(|term| !term.is_empty()) + .map(|term| term.to_ascii_lowercase()) + .collect() +} + +fn eval_terms_have(terms: &[String], needle: &str) -> bool { + terms.iter().any(|term| term.eq_ignore_ascii_case(needle)) +} + +fn eval_terms_have_any(terms: &[String], needles: &[&str]) -> bool { + needles.iter().any(|needle| eval_terms_have(terms, needle)) +} + +fn eval_terms_indicate_java_string_check_flow(terms: &[String]) -> bool { + eval_terms_have_any(terms, &["stringutils", "charsequenceutils", "strings"]) + && eval_terms_have_any(terms, &["blank", "empty", "case", "sensitive"]) +} + +fn eval_terms_indicate_swr_hook_flow(terms: &[String]) -> bool { + eval_terms_have_any(terms, &["swr", "useswr"]) + && eval_terms_have_any( + terms, + &[ + "serialize", + "serializes", + "cache", + "mutate", + "mutation", + "helper", + ], + ) +} + +fn eval_terms_indicate_python_requests_flow(terms: &[String]) -> bool { + let has = |term: &str| eval_terms_have(terms, term); + let has_any = |needles: &[&str]| eval_terms_have_any(terms, needles); + has("requests") + && has_any(&["request", "requests", "prepared", "preparedrequest"]) + && has_any(&["session", "sessions"]) + && has_any(&["adapter", "adapters", "send", "sends", "transport"]) +} + +fn push_python_requests_flow_symbol_probe_queries(queries: &mut Vec) { + push_unique_terms( + queries, + &[ + "Session.request", + "Session.prepare_request", + "PreparedRequest.prepare", + "Session.send", + "HTTPAdapter.send", + ], + ); +} + +fn eval_terms_indicate_express_application_route_flow(terms: &[String]) -> bool { + let has = |term: &str| eval_terms_have(terms, term); + let has_any = |needles: &[&str]| eval_terms_have_any(terms, needles); + has("express") + && has_any(&["application", "app"]) + && has_any(&[ + "middleware", + "middleware/routes", + "route", + "routes", + "router", + ]) + && has_any(&["request", "response", "handler", "handles"]) +} + +fn push_express_application_route_symbol_probe_queries(queries: &mut Vec) { + push_unique_terms( + queries, + &[ + "createApplication", + "app.init", + "app.handle", + "app.use", + "app.route", + "res.send", + "application.js app.use", + "application handle use route", + "response send body", + ], + ); +} + +fn eval_terms_indicate_gin_route_dispatch_flow(terms: &[String]) -> bool { + let has = |term: &str| eval_terms_have(terms, term); + let has_any = |needles: &[&str]| eval_terms_have_any(terms, needles); + has("engine") + && has_any(&["route", "routes", "router"]) + && has_any(&["group", "groups"]) + && has_any(&["method", "methods", "tree", "trees"]) + && has_any(&["handler", "handlers", "dispatch", "dispatches"]) +} + +fn push_gin_route_dispatch_symbol_probe_queries(queries: &mut Vec) { + push_unique_terms( + queries, + &[ + "gin.go New", + "gin.go Default", + "routergroup.go RouterGroup.Handle", + "gin.go Engine.addRoute", + "tree.go node.addRoute", + "gin.go Engine.handleHTTPRequest", + "context.go Context.Next", + ], + ); +} + +fn eval_terms_indicate_css_animation_flow(terms: &[String]) -> bool { + let has = |term: &str| eval_terms_have(terms, term); + let has_any = |needles: &[&str]| eval_terms_have_any(terms, needles); + (has("animatecss") || (has("animate") && has("css"))) + && has_any(&["animation", "animations", "keyframe", "keyframes"]) + && has_any(&[ + "variable", + "variables", + "base", + "class", + "classes", + "selector", + "selectors", + ]) +} + +fn push_css_animation_symbol_probe_queries(queries: &mut Vec) { + push_unique_terms( + queries, + &[ + "source/_vars.css", + "source/_base.css", + "source/animate.css", + "source/attention_seekers/bounce.css bounce", + "source/attention_seekers/flash.css flash", + ], + ); +} + +fn eval_terms_indicate_automapper_map_flow(terms: &[String]) -> bool { + let has = |term: &str| eval_terms_have(terms, term); + let has_any = |needles: &[&str]| eval_terms_have_any(terms, needles); + has("automapper") + && has_any(&["configuration", "config", "mapperconfiguration"]) + && has_any(&["runtime", "api", "apis", "mapper", "mapping"]) + && has_any(&["map", "maps", "mapping", "objects"]) + && (has_any(&["source", "destination"]) || has("typemap")) +} + +fn push_automapper_map_flow_symbol_probe_queries(queries: &mut Vec) { + push_unique_terms( + queries, + &[ + "src/AutoMapper/Mapper.cs IMapperBase", + "src/AutoMapper/Mapper.cs IMapper", + "src/AutoMapper/Mapper.cs Mapper", + "src/AutoMapper/Mapper.cs Mapper.Map", + "src/AutoMapper/Configuration/MapperConfiguration.cs MapperConfiguration", + "src/AutoMapper/TypeMap.cs TypeMap.CreateMapperLambda", + "src/AutoMapper/Execution/TypeMapPlanBuilder.cs TypeMapPlanBuilder", + "TypeMapPlanBuilder.CreateMapperLambda", + ], + ); +} + +fn eval_terms_indicate_site_build_phase_flow(terms: &[String]) -> bool { + (eval_terms_have(terms, "jekyll") + || eval_terms_have_any(terms, &["site", "build", "command", "process"])) + && eval_terms_have_any( + terms, + &["read", "generate", "render", "write", "phase", "phases"], + ) +} + +fn eval_terms_indicate_log_record_handler_flow(terms: &[String]) -> bool { + (eval_terms_have(terms, "monolog") || eval_terms_have_any(terms, &["log", "logger"])) + && eval_terms_have_any(terms, &["record", "records", "logrecord"]) + && eval_terms_have_any(terms, &["handler", "handlers"]) +} + +fn eval_terms_indicate_buffered_io_flow(terms: &[String]) -> bool { + (eval_terms_have(terms, "okio") || eval_terms_have_any(terms, &["buffer", "buffered"])) + && eval_terms_have_any(terms, &["source", "sources"]) + && eval_terms_have_any(terms, &["sink", "sinks"]) + && eval_terms_have_any( + terms, + &["read", "reads", "write", "writes", "byte", "bytes"], + ) +} + +fn eval_terms_indicate_session_request_validation_flow(terms: &[String]) -> bool { + (eval_terms_have(terms, "alamofire") + || eval_terms_have_any(terms, &["session", "urlsession", "delegate"])) + && eval_terms_have_any(terms, &["request", "requests"]) + && eval_terms_have_any(terms, &["resume", "resumes", "task", "tasks"]) + && eval_terms_have_any(terms, &["validate", "validates", "validation", "callback"]) +} + +fn eval_terms_indicate_html_form_validation_flow(terms: &[String]) -> bool { + eval_terms_have_any(terms, &["form", "forms"]) + && eval_terms_have_any(terms, &["validation", "validity", "valid", "constraints"]) + && eval_terms_have_any(terms, &["html", "javascript", "custom", "native"]) +} + +fn java_string_check_flow_claims(path: &str, source: &str) -> Vec { + let normalized_path = path.replace('\\', "/").to_ascii_lowercase(); + let source_lower = source.to_ascii_lowercase(); + let mut claims = Vec::new(); + + if normalized_path.ends_with("stringutils.java") { + if source_lower.contains("isblank") + && source_lower.contains("character.iswhitespace") + && source_lower.contains("cs == null") + { + claims.push( + "StringUtils.isBlank treats null, empty, and whitespace-only inputs as blank." + .to_string(), + ); + } + if source_lower.contains("isempty") + && (source_lower.contains("no longer trims") + || source_lower.contains("stringutils.isempty(\" \") = false")) + { + claims.push( + "StringUtils.isEmpty does not trim whitespace before deciding emptiness." + .to_string(), + ); + } + } + + if normalized_path.ends_with("strings.java") + && source_lower.contains("charsequenceutils.regionmatches") + { + claims.push( + "Strings delegates region matching work to CharSequenceUtils.regionMatches." + .to_string(), + ); + } + + claims +} + +fn swr_hook_flow_claims(path: &str, source: &str) -> Vec { + let normalized_path = path.replace('\\', "/").to_ascii_lowercase(); + let source_lower = source.to_ascii_lowercase(); + let mut claims = Vec::new(); + + if normalized_path.ends_with("src/index/use-swr.ts") { + if source_lower.contains("const useswr = withargs") + && source_lower.contains("useswrhandler") + { + claims.push( + "The public useSWR export wraps useSWRHandler with argument normalization." + .to_string(), + ); + } + if source_lower.contains("useswrhandler") && source_lower.contains("serialize(_key)") { + claims.push("useSWRHandler serializes the key before reading cache state.".to_string()); + } + if source_lower.contains("internalmutate(cache") { + claims.push("mutate behavior flows through internalMutate.".to_string()); + } + } + + if normalized_path.ends_with("src/_internal/utils/helper.ts") + && source_lower.contains("export const createcachehelper") + && source_lower.contains("cache.get(key)") + && source_lower.contains("cache.set(key") + && source_lower.contains("subscribe") + { + claims.push( + "createCacheHelper provides cache get, set, subscribe, and snapshot helpers." + .to_string(), + ); + } + + if normalized_path.ends_with("src/_internal/utils/mutate.ts") + && source_lower.contains("export async function internalmutate") + { + claims.push("mutate behavior flows through internalMutate.".to_string()); + } + + claims +} + +fn gin_route_dispatch_flow_claims(path: &str, source: &str) -> Vec { + let normalized_path = path.replace('\\', "/").to_ascii_lowercase(); + let source_lower = source.to_ascii_lowercase(); + let mut claims = Vec::new(); + + if normalized_path.ends_with("gin.go") { + if source_lower.contains("func new(opts ...optionfunc) *engine") + && source_lower.contains("routergroup: routergroup") + && source_lower.contains("trees:") + && source_lower.contains("make(methodtrees") + { + claims.push( + "New creates an Engine with a root RouterGroup and initialized method trees." + .to_string(), + ); + } + if source_lower.contains("func default(opts ...optionfunc) *engine") + && source_lower.contains("engine := new()") + && source_lower.contains("engine.use(logger(), recovery())") + { + claims.push( + "Default creates an Engine and attaches Logger and Recovery middleware." + .to_string(), + ); + } + if source_lower.contains("func (engine *engine) addroute") + && source_lower.contains("engine.trees.get(method)") + && source_lower.contains("root.addroute(path, handlers)") + { + claims.push( + "Engine.addRoute inserts handlers into the per-method route tree.".to_string(), + ); + } + if source_lower.contains("func (engine *engine) handlehttprequest") + && source_lower.contains("root.getvalue(rpath") + && source_lower.contains("c.handlers = value.handlers") + && source_lower.contains("c.next()") + { + claims.push( + "Engine.handleHTTPRequest finds a route and installs handlers on the context." + .to_string(), + ); + } + } + + if normalized_path.ends_with("routergroup.go") + && source_lower.contains("func (group *routergroup) handle") + && source_lower.contains("group.engine.addroute") + && source_lower.contains("handlers ...handlerfunc") + && source_lower.contains("return group.handle(httpmethod, relativepath, handlers)") + { + claims.push( + "RouterGroup.Handle registers routes by delegating to the group handle path." + .to_string(), + ); + } + + if normalized_path.ends_with("tree.go") + && source_lower.contains("func (n *node) addroute") + && source_lower.contains("insertchild") + { + claims.push("node.addRoute inserts a route into the radix tree.".to_string()); + } + + if normalized_path.ends_with("context.go") + && source_lower.contains("func (c *context) next()") + && source_lower.contains("c.index++") + && source_lower.contains("c.handlers[c.index](c)") + { + claims.push("Context.Next advances through the handler chain.".to_string()); + } + + claims +} + +fn css_animation_flow_claims(path: &str, source: &str) -> Vec { + let normalized_path = path.replace('\\', "/").to_ascii_lowercase(); + let source_lower = source.to_ascii_lowercase(); + let mut claims = Vec::new(); + + if normalized_path.ends_with("source/_vars.css") + && source_lower.contains("--animate-duration") + && source_lower.contains("--animate-delay") + && source_lower.contains("--animate-repeat") + { + claims.push( + "source/_vars.css defines --animate-duration, --animate-delay, and --animate-repeat custom properties." + .to_string(), + ); + claims.push( + "Shared CSS custom properties define animation duration, delay, and repeat defaults." + .to_string(), + ); + } + + if normalized_path.ends_with("source/_base.css") + && source_lower.contains(".animated") + && source_lower.contains("animation-duration: var(--animate-duration)") + && source_lower.contains("animation-fill-mode: both") + { + claims.push( + ".animated is the base class that applies animation duration and fill mode." + .to_string(), + ); + } + + if normalized_path.ends_with("source/animate.css") + && source_lower.contains("@import '_vars.css'") + && source_lower.contains("@import '_base.css'") + && source_lower.contains("@import 'attention_seekers/bounce.css'") + { + claims.push( + "The source/animate.css file imports the variable, base, and individual animation files." + .to_string(), + ); + } + + if normalized_path.ends_with("source/attention_seekers/bounce.css") + && source_lower.contains("@keyframes bounce") + && source_lower.contains(".bounce") + && source_lower.contains("animation-name: bounce") + { + claims.push( + "source/attention_seekers/bounce.css defines @keyframes bounce and .bounce." + .to_string(), + ); + claims.push( + "Named classes such as .bounce set animation-name to matching keyframes.".to_string(), + ); + } + + if normalized_path.ends_with("source/attention_seekers/flash.css") + && source_lower.contains("@keyframes flash") + && source_lower.contains(".flash") + && source_lower.contains("animation-name: flash") + { + claims.push( + "source/attention_seekers/flash.css defines @keyframes flash and .flash.".to_string(), + ); + } + + claims +} + +fn automapper_map_flow_claims(path: &str, source: &str) -> Vec { + let normalized_path = path.replace('\\', "/").to_ascii_lowercase(); + let normalized_source = normalize_eval_identifier(source); + let mut claims = Vec::new(); + + if normalized_path.ends_with("src/automapper/configuration/mapperconfiguration.cs") + && normalized_source.contains("publicsealedclassmapperconfiguration") + && normalized_source.contains("configuredmaps") + && normalized_source.contains("resolvedmaps") + && normalized_source.contains("buildexecutionplan") + { + claims.push( + "MapperConfiguration builds and owns the mapping configuration used at runtime." + .to_string(), + ); + } + + if normalized_path.ends_with("src/automapper/mapper.cs") + && normalized_source.contains("publicsealedclassmapper") + && normalized_source.contains("publictdestinationmap") + && normalized_source.contains("mapcore") + && normalized_source.contains("getexecutionplan") + { + claims.push("Mapper.Map is the public runtime entry point for object mapping.".to_string()); + } + + if normalized_path.ends_with("src/automapper/typemap.cs") + && normalized_source.contains("createmapperlambda") + && normalized_source.contains("newtypemapplanbuilder") + && normalized_source.contains("typemapplanbuilder") + { + claims.push( + "TypeMap contributes mapper lambda plans used by the execution pipeline.".to_string(), + ); + } + + if normalized_path.ends_with("src/automapper/execution/typemapplanbuilder.cs") + && normalized_source.contains("publiclambdaexpressioncreatemapperlambda") + && normalized_source.contains("createdestinationfunc") + && normalized_source.contains("createassignmentfunc") + && normalized_source.contains("createmapperfunc") + { + claims.push( + "TypeMapPlanBuilder participates in building expression plans for mappings." + .to_string(), + ); + } + + claims +} + +fn python_requests_flow_claim(symbol: &str, path: &str, source: &str) -> Option { + let normalized_symbol = normalize_eval_identifier(symbol); + let normalized_path = path.replace('\\', "/").to_ascii_lowercase(); + let source_lower = source.to_ascii_lowercase(); + let in_requests_source = + normalized_path.contains("/src/requests/") || normalized_path.starts_with("src/requests/"); + if !in_requests_source { + return None; + } + + if normalized_symbol == "request" + && normalized_path.ends_with("src/requests/api.py") + && source_lower.contains("with sessions.session() as session") + && source_lower.contains("session.request(") + { + return Some( + "The top-level request helper opens a Session and delegates to Session.request." + .to_string(), + ); + } + + if normalized_symbol == "sessionrequest" + && normalized_path.ends_with("src/requests/sessions.py") + && source_lower.contains("request(") + && source_lower.contains("self.prepare_request(") + { + return Some( + "Session.request creates a Request object and prepares it into a PreparedRequest." + .to_string(), + ); + } + + if normalized_symbol == "preparedrequestprepare" + && normalized_path.ends_with("src/requests/models.py") + && source_lower.contains("prepare_method(") + && source_lower.contains("prepare_url(") + && source_lower.contains("prepare_body(") + { + return Some( + "PreparedRequest.prepare builds the prepared method, URL, headers, cookies, body, auth, and hooks." + .to_string(), + ); + } + + if normalized_symbol == "sessionsend" + && normalized_path.ends_with("src/requests/sessions.py") + && source_lower.contains("get_adapter(") + && source_lower.contains("adapter.send(") + { + return Some( + "Session.send chooses an adapter and calls the adapter send method.".to_string(), + ); + } + + if normalized_symbol == "httpadaptersend" + && normalized_path.ends_with("src/requests/adapters.py") + && source_lower.contains("conn.urlopen(") + && source_lower.contains("build_response(") + { + return Some( + "HTTPAdapter.send is the transport boundary that returns the response.".to_string(), + ); + } + + None +} + +fn express_application_route_flow_claims(path: &str, source: &str) -> Vec { + let normalized_path = path.replace('\\', "/").to_ascii_lowercase(); + let source_lower = source.to_ascii_lowercase(); + let mut claims = Vec::new(); + + if normalized_path.ends_with("lib/express.js") + && source_lower.contains("function createapplication()") + && source_lower.contains("app.handle(req, res, next)") + && source_lower.contains("mixin(app, proto, false)") + && source_lower.contains("app.request = object.create(req") + && source_lower.contains("app.response = object.create(res") + && source_lower.contains("app.init()") + { + claims.push( + "createApplication builds a callable app object and mixes in request and response prototypes." + .to_string(), + ); + } + + if normalized_path.ends_with("lib/application.js") { + if source_lower.contains("app.init = function init()") + && source_lower.contains("new router({") + && source_lower.contains("defaultconfiguration()") + { + claims.push( + "app.init creates application state and lazy router configuration.".to_string(), + ); + } + if source_lower.contains("app.handle = function handle(req, res, callback)") + && source_lower.contains("this.router.handle(req, res, done)") + { + claims.push("app.handle delegates request handling to the router.".to_string()); + } + if source_lower.contains("app.use = function use(fn)") + && source_lower.contains("return router.use(path, fn)") + { + claims.push("app.use registers middleware on the router.".to_string()); + } + if source_lower.contains("app.route = function route(path)") + && source_lower.contains("return this.router.route(path)") + { + claims.push("app.route creates route entries through the router.".to_string()); + } + } + + if normalized_path.ends_with("lib/response.js") + && source_lower.contains("res.send = function send(body)") + && source_lower.contains("this.set('content-length'") + && source_lower.contains("this.end(chunk, encoding)") + { + claims.push("res.send prepares and sends the response body.".to_string()); + } + + claims +} + +fn site_build_phase_claims(source: &str) -> Vec { + let normalized_source = normalize_eval_identifier(source); + let mut claims = Vec::new(); + + if normalized_source.contains("defprocess") && normalized_source.contains("jekyllsitenew") { + claims + .push("Build.process constructs a Jekyll::Site before running the build.".to_string()); + } + + if normalized_source.contains("defprocess") + && normalized_source.contains("read") + && normalized_source.contains("generate") + && normalized_source.contains("render") + && normalized_source.contains("write") + { + claims.push("Site#process runs read, generate, render, and write phases.".to_string()); + } + + if normalized_source.contains("classreader") && normalized_source.contains("defread") { + claims.push("Reader is responsible for reading site content.".to_string()); + } + + if normalized_source.contains("classrenderer") + && (normalized_source.contains("defrender") + || normalized_source.contains("renderdocument") + || normalized_source.contains("renderliquid")) + { + claims.push("Renderer renders pages and documents.".to_string()); + } + + claims +} + +fn log_record_handler_claims(source: &str) -> Vec { + let source_lower = source.to_ascii_lowercase(); + let mut claims = Vec::new(); + + if source_lower.contains("class logger") + && source_lower.contains("protected array $handlers") + && source_lower.contains("function pushhandler") + && source_lower.contains("array_unshift($this->handlers") + { + claims.push("Logger owns a stack of handlers registered by pushHandler.".to_string()); + } + + if source_lower.contains("function log(") && source_lower.contains("$this->addrecord(") { + claims.push("Logger::log delegates into addRecord.".to_string()); + } + + if source_lower.contains("function addrecord(") + && source_lower.contains("new logrecord(") + && (source_lower.contains("$handler->handle($record)") + || source_lower.contains("$handler->handle(clone $record)") + || source_lower.contains("->handle($record)") + || source_lower.contains("->handle(clone $record)")) + { + claims.push("addRecord creates a LogRecord before passing it to handlers.".to_string()); + } + + if source_lower.contains("function handle(logrecord $record)") + && source_lower.contains("$this->processrecord($record)") + && source_lower.contains("$this->write($record)") + { + claims.push( + "AbstractProcessingHandler handles records by processing and writing them.".to_string(), + ); + } + + claims +} + +fn buffered_io_claims(source: &str) -> Vec { + let source_lower = source.to_ascii_lowercase(); + let mut claims = Vec::new(); + + if (source_lower.contains("class buffer") || source_lower.contains("expect class buffer")) + && source_lower.contains("bufferedsource") + && source_lower.contains("bufferedsink") + && source_lower.contains("override fun read") + && source_lower.contains("override fun write") + { + claims.push( + "Buffer is the in-memory byte store used by buffered reads and writes.".to_string(), + ); + } + + if source_lower.contains("realbufferedsource") + && source_lower.contains("source") + && source_lower.contains("buffer") + && source_lower.contains("override fun read") + { + claims.push("RealBufferedSource reads from an upstream Source into a Buffer.".to_string()); + } + + if source_lower.contains("realbufferedsink") + && source_lower.contains("sink") + && source_lower.contains("buffer") + && source_lower.contains("override fun write") + { + claims.push("RealBufferedSink writes buffered bytes to an upstream Sink.".to_string()); + } + + if source_lower.contains("fun source.buffer()") + && source_lower.contains("realbufferedsource(this)") + && source_lower.contains("fun sink.buffer()") + && source_lower.contains("realbufferedsink(this)") + { + claims.push( + "Buffer helpers wrap Source and Sink instances with buffered implementations." + .to_string(), + ); + } + + claims +} + +fn session_request_validation_claims(source: &str) -> Vec { + let source_lower = source.to_ascii_lowercase(); + let mut claims = Vec::new(); + + if source_lower.contains("open func request") + && source_lower.contains("let request = datarequest") + && source_lower.contains("performeagerlyifnecessary(request)") + { + claims.push("Session creates request objects such as DataRequest.".to_string()); + } + + if source_lower.contains("public func resume() -> self") + && source_lower.contains("task.resume()") + && source_lower.contains("delegate?.readytoperform(request: self)") + { + claims.push("Request.resume resumes the underlying URLSession task.".to_string()); + } + + if source_lower.contains("public func validate(_ validation") + && source_lower.contains("validators.write") + && source_lower.contains("didvalidaterequest") + { + claims.push("DataRequest.validate attaches validation behavior.".to_string()); + } + + if source_lower.contains("sessiondelegate") + && source_lower.contains("urlsessiondatadelegate") + && source_lower.contains("open func urlsession") + && source_lower.contains("request.didreceiveresponse") + && source_lower.contains("request.didreceive(data: data)") + { + claims.push("SessionDelegate receives URLSession callback events.".to_string()); + } + + claims +} + +fn html_form_validation_claims(source: &str) -> Vec { + let source_lower = source.to_ascii_lowercase(); + let mut claims = Vec::new(); + + if source_lower.contains("required") + && source_lower.contains("pattern") + && (source_lower.contains("min=") || source_lower.contains("minlength")) + && (source_lower.contains("max=") || source_lower.contains("maxlength")) + { + claims.push( + "The examples use native required, pattern, min, and max constraints.".to_string(), + ); + } + + if source_lower.contains("
, citations: &[AgentCitationDto], diff --git a/crates/codestory-runtime/src/agent/mod.rs b/crates/codestory-runtime/src/agent/mod.rs index fe3dc9f6..b3f7e147 100644 --- a/crates/codestory-runtime/src/agent/mod.rs +++ b/crates/codestory-runtime/src/agent/mod.rs @@ -3,8 +3,20 @@ pub(crate) mod eval_probes; pub(crate) mod nucleo_policy; pub(crate) mod orchestrator; pub(crate) mod packet_batch; +pub(crate) mod packet_budget; +pub(crate) mod packet_capping; +pub(crate) mod packet_citations; +pub(crate) mod packet_claim_profiles; +pub(crate) mod packet_claims; +pub(crate) mod packet_command_profiles; +pub(crate) mod packet_evidence_roles; +pub(crate) mod packet_plan; +pub(crate) mod packet_required_probes; pub(crate) mod packet_scoring; pub(crate) mod packet_search; +pub(crate) mod packet_source_patterns; +pub(crate) mod packet_sufficiency; +pub(crate) mod packet_terms; pub(crate) mod packet_trace; pub(crate) mod planning; pub(crate) mod profiles; diff --git a/crates/codestory-runtime/src/agent/orchestrator.rs b/crates/codestory-runtime/src/agent/orchestrator.rs index 1683481c..89c777f2 100644 --- a/crates/codestory-runtime/src/agent/orchestrator.rs +++ b/crates/codestory-runtime/src/agent/orchestrator.rs @@ -1,25 +1,68 @@ use crate::agent::citation::{evidence_edge_ids_for_node, to_citation_from_hit}; -use crate::agent::eval_probes::{ - eval_citation_shaped_claim, eval_flow_template_claims, eval_probes_enabled, - eval_supporting_claim_flow_sentence, push_eval_architecture_flow_probe_terms, - push_eval_flow_hint_packet_queries, push_eval_required_probe_queries, - push_index_derived_architecture_probes, -}; use crate::agent::packet_batch::{ - PacketLatencyBudget, packet_anchor_probe_queries, packet_file_stem_matches_query, - run_packet_anchor_expansion, run_packet_planned_subqueries, + PacketLatencyBudget, packet_anchor_probe_queries, run_packet_anchor_expansion, + run_packet_planned_subqueries, }; #[cfg(test)] use crate::agent::packet_batch::{ packet_anchor_hit_is_relevant, packet_anchor_probe_limit_for_budget, }; +#[cfg(test)] +use crate::agent::packet_budget::{ + apply_packet_budget, next_deeper_packet_command, packet_budget_usage, +}; +use crate::agent::packet_budget::{ + apply_packet_budget_with_extra, enforce_packet_output_budget, packet_budget_limits, +}; +#[cfg(test)] +use crate::agent::packet_capping::{ + cap_citations, cap_packet_citations, promote_focus_neighborhood_citations, + promote_required_probe_citations, +}; +#[cfg(test)] +use crate::agent::packet_claim_profiles::{ + packet_generic_css_animation_flow_claims, packet_generic_string_predicate_flow_claims, + packet_source_derived_claims_for_citation, +}; +#[cfg(test)] +use crate::agent::packet_claims::packet_claim_for_role as build_packet_claim_for_role; +use crate::agent::packet_claims::{packet_flow_claims_markdown, packet_supported_claims}; +use crate::agent::packet_evidence_roles::{ + PacketEvidenceRole, packet_claim_key_for_citation, packet_evidence_role, +}; +#[cfg(test)] +use crate::agent::packet_plan::{ + build_packet_plan, packet_concept_queries, packet_symbol_probe_queries, +}; +use crate::agent::packet_plan::{ + build_packet_plan_with_extra, packet_plan_annotation, packet_rank_terms, + packet_request_extra_probes, +}; +#[cfg(test)] +use crate::agent::packet_required_probes::packet_sufficiency_required_probe_queries; +use crate::agent::packet_required_probes::{ + PacketFileScopedSymbolProbe, packet_file_scoped_symbol_probe_parts, + packet_probe_query_is_cited, packet_sufficiency_required_probe_queries_with_extra, +}; +#[cfg(test)] +use crate::agent::packet_scoring::packet_citation_key; use crate::agent::packet_scoring::{ - normalize_identifier, packet_adjacent_query_stop_term, packet_citation_key, - packet_citation_rank, packet_claim_carry_rank, packet_display_name_is_import_literal, - packet_display_name_is_test_like, packet_display_path, packet_low_signal_display_name, - packet_query_stop_term, + normalize_identifier, packet_citation_rank, packet_display_path, +}; +use crate::agent::packet_source_patterns::packet_sql_identifier_after; +use crate::agent::packet_sufficiency::build_packet_sufficiency_with_extra; +#[cfg(test)] +use crate::agent::packet_sufficiency::{ + PACKET_MARKDOWN_TRUNCATION_SUFFIX, quote_packet_command_value, +}; +#[cfg(test)] +use crate::agent::packet_sufficiency::{ + build_packet_sufficiency, packet_budget_exceeded_hard_output_cap, packet_claim_family, + packet_supported_claim_family_count, packet_targeted_follow_up_queries, +}; +use crate::agent::packet_terms::{ + packet_probe_terms, packet_terms_indicate_sql_schema_flow, prompt_search_terms, }; -use crate::agent::planning::dedupe_packet_plan_queries; use crate::agent::profiles::{ResolvedProfile, TrailPlan, resolve_profile}; use crate::agent::retrieval_primary::{ RETRIEVAL_VERSION_SIDECAR, SidecarPrimarySearchOutcome, maybe_log_rollback_after_packet, @@ -30,26 +73,27 @@ use crate::agent::retrieval_primary::{ use crate::agent::trace::{TraceRecorder, field}; use crate::agent::trace_export; use crate::{ - AppController, FocusedSourceContext, HybridSearchScoredHit, exact_symbol_query_terms, - fallback_mermaid as diagnostic_mermaid, hybrid_retrieval_enabled, is_non_primary_source_term, - looks_like_standalone_symbol_query, mermaid_flowchart, mermaid_gantt, mermaid_sequence, - query_mentions_non_primary_source, retrieval_file_role_from_path, + AppController, FocusedSourceContext, HybridSearchScoredHit, + fallback_mermaid as diagnostic_mermaid, hybrid_retrieval_enabled, mermaid_flowchart, + mermaid_gantt, mermaid_sequence, query_mentions_non_primary_source, }; use codestory_contracts::api::{ AgentAnswerDto, AgentAskRequest, AgentCitationDto, AgentCustomRetrievalConfigDto, AgentHybridWeightsDto, AgentPacketDto, AgentPacketRequestDto, AgentResponseBlockDto, AgentResponseModeDto, AgentResponseSectionDto, AgentRetrievalPolicyModeDto, AgentRetrievalPresetDto, AgentRetrievalProfileSelectionDto, AgentRetrievalStepKindDto, - AgentRetrievalStepStatusDto, ApiError, GraphArtifactDto, GraphRequest, GraphResponse, - GroundingBudgetDto, IndexFreshnessDto, IndexFreshnessStatusDto, NodeDetailsDto, - NodeDetailsRequest, NodeId, NodeKind, NodeOccurrencesRequest, PacketBenchmarkTraceDto, - PacketBudgetDto, PacketBudgetLimitsDto, PacketBudgetModeDto, PacketBudgetUsageDto, - PacketClaimDto, PacketPlanDto, PacketPlanQueryDto, PacketSufficiencyDto, - PacketSufficiencyStatusDto, PacketTaskClassDto, RetrievalScoreBreakdownDto, SearchHit, - SearchHitOrigin, SearchRepoTextMode, SearchRequest, TrailConfigDto, TrailFilterOptionsDto, + ApiError, GraphArtifactDto, GraphRequest, GraphResponse, GroundingBudgetDto, IndexFreshnessDto, + IndexFreshnessStatusDto, NodeDetailsDto, NodeDetailsRequest, NodeId, NodeKind, + NodeOccurrencesRequest, PacketBudgetLimitsDto, PacketBudgetModeDto, PacketPlanDto, + PacketTaskClassDto, RetrievalScoreBreakdownDto, SearchHit, SearchHitOrigin, SearchRepoTextMode, + SearchRequest, TrailConfigDto, TrailFilterOptionsDto, }; #[cfg(test)] -use codestory_contracts::api::{AgentRetrievalStepDto, EdgeId, SearchMatchQualityDto}; +use codestory_contracts::api::{ + AgentRetrievalStepDto, AgentRetrievalStepStatusDto, EdgeId, PacketBudgetDto, + PacketBudgetUsageDto, PacketClaimDto, PacketPlanQueryDto, PacketSidecarQueryDiagnosticDto, + PacketSufficiencyDto, PacketSufficiencyStatusDto, SearchMatchQualityDto, +}; use std::cmp::Ordering; use std::collections::{HashMap, HashSet}; use std::fmt::Write as _; @@ -66,13 +110,9 @@ const WEAK_INITIAL_MIN_LEXICAL_ANCHOR: f32 = 0.01; const WEAK_INITIAL_MIN_GRAPH_ANCHOR: f32 = 0.25; const SOURCE_SNIPPET_TRUNCATION_SUFFIX: &str = "\n// ... source snippet truncated by investigation byte cap\n```"; -const PACKET_MARKDOWN_TRUNCATION_SUFFIX: &str = "\n\n... packet section truncated by budget ...\n"; const GRAPH_ARTIFACT_BUNDLE_BYTE_CAP: usize = 512 * 1024; const RETRIEVAL_VERSION_HYBRID: &str = "hybrid-v1"; const RETRIEVAL_VERSION_SIDECAR_BLOCKED: &str = "sidecar-blocked-v1"; -const PACKET_FOCUS_NEIGHBORHOOD_CARRY_LIMIT: usize = 4; -const PACKET_SOURCE_DEFINITION_CLAIM_LIMIT: usize = 6; - fn retrieval_version(controller: &AppController) -> &'static str { if sidecar_retrieval_primary_enabled(controller) { RETRIEVAL_VERSION_SIDECAR @@ -320,7 +360,8 @@ pub(crate) fn agent_packet( let project_root = controller.require_project_root()?; controller.begin_packet_retrieval(); - let plan = build_packet_plan(&question, req.task_class, req.budget); + let extra_probes = packet_request_extra_probes(req.extra_probes); + let plan = build_packet_plan_with_extra(&question, req.task_class, req.budget, &extra_probes); let limits = packet_budget_limits(req.budget); let packet_latency = PacketLatencyBudget::new(req.latency_budget_ms); let retrieval_profile = packet_retrieval_profile(Some(plan.task_class), req.budget, &limits); @@ -381,6 +422,14 @@ pub(crate) fn agent_packet( &rank_terms, &mut answer, )?; + maybe_append_sql_schema_file_citations(&project_root, &question, &mut answer); + maybe_append_required_file_scoped_source_citations( + &project_root, + &question, + plan.task_class, + &extra_probes, + &mut answer, + ); packet_latency.apply_to_trace(&mut answer); rank_packet_evidence(&question, &mut answer); maybe_annotate_packet_candidate_window(&question, &limits, &mut answer); @@ -399,19 +448,27 @@ pub(crate) fn agent_packet( answer.retrieval_trace.retrieval_shadow = Some(shadow); } maybe_log_rollback_after_packet(controller, answer.retrieval_trace.retrieval_shadow.as_ref()); + append_packet_step_trace_annotation(&mut answer); - let budget = apply_packet_budget( + let budget = apply_packet_budget_with_extra( &project_root, &question, plan.task_class, req.budget, limits.clone(), &mut answer, + &extra_probes, ); append_packet_evidence_sections(&mut answer, plan.task_class, &limits); - let sufficiency = - build_packet_sufficiency(&project_root, &question, plan.task_class, &answer, &budget); - let benchmark_trace = packet_benchmark_trace(&answer); + let sufficiency = build_packet_sufficiency_with_extra( + &project_root, + &question, + plan.task_class, + &answer, + &budget, + &extra_probes, + ); + let retrieval_trace_summary = trace_export::packet_retrieval_trace_summary(&answer); let mut packet = AgentPacketDto { packet_id: answer.answer_id.clone(), @@ -421,5835 +478,3291 @@ pub(crate) fn agent_packet( answer, budget, sufficiency, - benchmark_trace, + retrieval_trace_summary, }; enforce_packet_output_budget(&project_root, &mut packet); - if let Ok(trace_path) = std::env::var("CODESTORY_PACKET_STEP_TRACE_OUT") - && let Ok(payload) = - serde_json::to_string_pretty(&trace_export::packet_step_trace_json(&packet.answer)) - { - let _ = std::fs::write(trace_path, payload); + if let Some(diagnostic) = trace_export::write_packet_step_trace_from_env(&packet.answer) { + packet.answer.retrieval_trace.annotations.push(diagnostic); + enforce_packet_output_budget(&project_root, &mut packet); } - packet.answer.retrieval_trace.annotations.push(format!( - "packet_step_trace search_total_ms={} step_count={}", - trace_export::search_step_total_ms(&packet.answer), - packet.answer.retrieval_trace.steps.len() - )); Ok(packet) } -fn build_packet_plan( - question: &str, - requested: Option, - budget: PacketBudgetModeDto, -) -> PacketPlanDto { - let task_class = requested.unwrap_or_else(|| infer_packet_task_class(question)); - let mut queries = Vec::new(); - push_packet_query( - &mut queries, - question, - "original task phrasing for sidecar-primary source-backed retrieval", - ); - for term in extract_packet_query_terms(question) { - push_packet_query( - &mut queries, - &term, - "concrete symbol, file, route, or code term", - ); - } - for query in task_class_seed_queries(task_class) { - push_packet_query(&mut queries, query, "task-class retrieval seed"); - } - for query in packet_symbol_probe_queries(question, task_class, budget) { - push_packet_query( - &mut queries, - &query, - "symbol probe expanded from task wording", - ); - } - for query in packet_concept_queries(question) { - push_packet_query( - &mut queries, - &query, - "natural-language concept from task wording", - ); - } - let query_cap = packet_plan_query_cap(budget); - queries.truncate(query_cap); - - let mut trace = vec![format!( - "task_class={:?} source={}", - task_class, - if requested.is_some() { - "request" - } else { - "heuristic" - } - )]; - trace.push(format!("planned_queries={}", queries.len())); - - let mut plan = PacketPlanDto { - task_class, - inferred_task_class: requested.is_none(), - queries, - trace, - }; - dedupe_packet_plan_queries(&mut plan); - plan.trace.push(format!( - "deduped_queries={} eval_probes={}", - plan.queries.len(), - eval_probes_enabled() +fn append_packet_step_trace_annotation(answer: &mut AgentAnswerDto) { + answer.retrieval_trace.annotations.push(format!( + "packet_step_trace search_total_ms={} step_count={}", + trace_export::search_step_total_ms(answer), + answer.retrieval_trace.steps.len() )); - plan -} - -fn packet_plan_query_cap(budget: PacketBudgetModeDto) -> usize { - match budget { - PacketBudgetModeDto::Tiny => 20, - PacketBudgetModeDto::Compact => 32, - PacketBudgetModeDto::Standard | PacketBudgetModeDto::Deep => 40, - } } -fn packet_symbol_probe_queries( +fn packet_retrieval_prompt( question: &str, - task_class: PacketTaskClassDto, + plan: &PacketPlanDto, + initial_hybrid_weights: Option<&AgentHybridWeightsDto>, budget: PacketBudgetModeDto, -) -> Vec { - let terms = packet_probe_terms(question); - let mut queries = Vec::new(); +) -> String { + let anchor_probes = packet_anchor_probe_queries(plan); + if packet_initial_retrieval_is_lexical_only(initial_hybrid_weights) && anchor_probes.is_empty() + { + return question.to_string(); + } + if plan.queries.len() <= 1 { + return question.to_string(); + } + let mut prompt = String::from(question); + prompt.push_str("\n\nPlanned CodeStory queries:"); let compact = matches!( budget, PacketBudgetModeDto::Compact | PacketBudgetModeDto::Tiny ); - - push_unique_owned_terms( - &mut queries, - &packet_command_role_probe_queries(question, task_class), - ); - push_unique_owned_terms( - &mut queries, - &packet_command_exact_probe_queries(question, task_class), - ); - push_prompt_derived_exact_flow_anchor_queries(&terms, &mut queries); - push_unique_owned_terms( - &mut queries, - &packet_sufficiency_required_probe_queries_from_terms(&terms, task_class), - ); - let concrete_file_queries = packet_concrete_file_probe_queries_from_required(&queries); - push_unique_owned_terms(&mut queries, &concrete_file_queries); - push_flow_hint_packet_queries(&terms, &mut queries); - push_task_class_symbol_probe_queries(task_class, &mut queries); - if !compact { - push_adjacent_packet_term_queries(&terms, &mut queries, 8); - } else if matches!(task_class, PacketTaskClassDto::ArchitectureExplanation) { - push_adjacent_packet_term_queries(&terms, &mut queries, 4); - } - push_generic_symbol_probe_queries(&terms, &mut queries, compact); - - queries.truncate(packet_plan_query_cap(budget)); - queries -} - -fn packet_probe_terms(question: &str) -> Vec { - let include_non_primary_terms = query_mentions_non_primary_source(question); - let brand_terms = brand_phrase_noise_terms(question); - let mut terms = prompt_search_terms(question) - .into_iter() - .filter(|term| { - include_non_primary_terms - || !is_non_primary_source_term(term) - || packet_retains_non_primary_probe_term(question, term) - }) - .collect::>(); - - if !brand_terms.is_empty() && packet_terms_have_specific_flow_anchor(&terms) { - terms.retain(|term| !brand_terms.contains(term.as_str())); - } - - terms -} - -fn packet_retains_non_primary_probe_term(question: &str, term: &str) -> bool { - if !matches!(term, "bench" | "benchmark" | "benchmarks") { - return false; - } - let lowered = question.to_ascii_lowercase(); - lowered.contains("architecture") - && (lowered.contains("boundary") - || lowered.contains("boundaries") - || lowered.contains("across")) -} - -fn packet_terms_have_specific_flow_anchor(terms: &[String]) -> bool { - let has = |term: &str| terms.iter().any(|value| value.eq_ignore_ascii_case(term)); - (has("extension") && has("host")) - || ((has("indexing") || has("indexer")) && (has("storage") || has("persistent"))) - || ((has("json") || has("jsonl")) && (has("exec") || has("thread") || has("turn"))) - || has("payload") - || has("posts") - || has("post") - || has("comments") - || has("feed") - || has("rss") -} - -fn brand_phrase_noise_terms(question: &str) -> HashSet { - let mut terms = HashSet::new(); - let tokens = question - .split_whitespace() - .map(|token| { - token.trim_matches(|ch: char| { - matches!( - ch, - ',' | '.' | ';' | ':' | '?' | '!' | '(' | ')' | '[' | ']' | '{' | '}' - ) - }) - }) - .collect::>(); - - for window in tokens.windows(3) { - if let [left, joiner, right] = window - && *joiner == "&" - { - if let Some(term) = title_case_brand_token_term(left) { - terms.insert(term); - } - if let Some(term) = title_case_brand_token_term(right) { - terms.insert(term); + let planned_lines = + if packet_initial_retrieval_is_lexical_only(initial_hybrid_weights) || compact { + let mut lines = packet_compact_retrieval_prompt_lines(anchor_probes) + .into_iter() + .map(|query| format!("- {query} (symbol probe)")) + .collect::>(); + if lines.is_empty() { + lines = plan + .queries + .iter() + .take(8) + .map(|query| format!("- {} ({})", query.query, query.purpose)) + .collect(); } - } - } - - terms -} - -fn title_case_brand_token_term(token: &str) -> Option { - let mut chars = token.chars(); - let first = chars.next()?; - let second = chars.next()?; - if first.is_ascii_uppercase() - && second.is_ascii_lowercase() - && chars.all(|ch| ch.is_ascii_alphanumeric() || ch == '-' || ch == '_') - { - Some(token.to_ascii_lowercase()) - } else { - None + lines + } else { + plan.queries + .iter() + .map(|query| format!("- {} ({})", query.query, query.purpose)) + .collect() + }; + for line in planned_lines { + prompt.push('\n'); + prompt.push_str(&line); } + prompt } -fn push_flow_hint_packet_queries(terms: &[String], queries: &mut Vec) { - push_prompt_derived_flow_hint_packet_queries(terms, queries); - push_eval_flow_hint_packet_queries(terms, queries); - if !eval_probes_enabled() { - push_index_derived_architecture_probes( - PacketTaskClassDto::ArchitectureExplanation, - terms, - queries, - ); - } +fn packet_initial_hybrid_weights( + _plan: &PacketPlanDto, + _budget: PacketBudgetModeDto, +) -> Option { + None } -fn push_prompt_derived_exact_flow_anchor_queries(terms: &[String], queries: &mut Vec) { - let has = |term: &str| terms.iter().any(|value| value.eq_ignore_ascii_case(term)); - let has_any = |needles: &[&str]| needles.iter().any(|needle| has(needle)); - - if has("exec") && has_any(&["runtime", "session"]) { - push_unique_terms(queries, &["exec runtime", "exec session"]); - } - if has("exec") && has_any(&["cli", "command", "subcommand"]) { - push_unique_terms(queries, &["exec cli", "exec command"]); - } - if has_any(&["json", "jsonl"]) && has_any(&["event", "events", "output"]) { - push_unique_terms(queries, &["json event output", "event output processor"]); - } - if has("exec") && has_any(&["event", "events", "json", "jsonl"]) { - push_unique_term(queries, "exec event output"); - } - if has("thread") && has_any(&["start", "starts", "started"]) { - push_unique_term(queries, "thread start"); - } - if has("turn") && has_any(&["start", "starts", "started"]) { - push_unique_term(queries, "turn start"); - } - if packet_terms_indicate_indexing_flow(terms) { - push_indexing_flow_required_probe_queries(queries); +fn packet_compact_retrieval_prompt_lines(mut anchor_probes: Vec) -> Vec { + anchor_probes.sort_by(|left, right| { + let left_path = left.contains('/') && left.contains('.'); + let right_path = right.contains('/') && right.contains('.'); + right_path + .cmp(&left_path) + .then_with(|| right.len().cmp(&left.len())) + }); + let mut selected = Vec::new(); + for query in anchor_probes { + if selected.len() >= 16 { + break; + } + if !selected.iter().any(|existing| existing == &query) { + selected.push(query); + } } + selected } -fn push_prompt_derived_flow_hint_packet_queries(terms: &[String], queries: &mut Vec) { - let has = |term: &str| terms.iter().any(|value| value.eq_ignore_ascii_case(term)); - let has_any = |needles: &[&str]| needles.iter().any(|needle| has(needle)); - - if packet_terms_indicate_indexing_flow(terms) { - push_unique_terms( - queries, - &[ - "index service", - "workspace execution plan", - "workspace indexer", - "symbol extraction indexer", - "projection batch", - "search projection", - "snapshot refresh", - ], - ); - } - if has("exec") && has_any(&["runtime", "session"]) { - push_unique_terms(queries, &["exec runtime", "exec session", "run exec"]); - } - if has("exec") && has_any(&["cli", "command", "subcommand"]) { - push_unique_terms(queries, &["exec cli", "exec command", "subcommand"]); - } - if has_any(&["cli", "command", "subcommand"]) && has_any(&["runtime", "exec"]) { - push_unique_term(queries, "command runtime"); - } - if has_any(&["json", "jsonl"]) && has_any(&["event", "events", "output"]) { - push_unique_terms( - queries, - &[ - "json event output", - "jsonl event output", - "event output processor", - ], - ); - } - if has("exec") && has_any(&["event", "events", "json", "jsonl"]) { - push_unique_terms(queries, &["exec event output", "exec events"]); - } - if has("thread") && has_any(&["start", "starts", "started"]) { - push_unique_terms(queries, &["thread start", "start thread"]); - } - if has("turn") && has_any(&["start", "starts", "started"]) { - push_unique_terms(queries, &["turn start", "start turn"]); - } +fn packet_initial_retrieval_is_lexical_only(weights: Option<&AgentHybridWeightsDto>) -> bool { + weights + .and_then(|weights| weights.semantic) + .is_some_and(|semantic| semantic <= f32::EPSILON) } -fn packet_terms_indicate_indexing_flow(terms: &[String]) -> bool { - let has = |term: &str| terms.iter().any(|value| value.eq_ignore_ascii_case(term)); - let has_any = |needles: &[&str]| needles.iter().any(|needle| has(needle)); - - has_any(&["index", "indexed", "indexer", "indexing"]) - && has_any(&[ - "cli", - "command", - "discovery", - "extraction", - "file", - "files", - "persistence", - "projection", - "refresh", - "runtime", - "search", - "snapshot", - "storage", - "store", - "symbol", - "workspace", - ]) +fn rank_packet_evidence(question: &str, answer: &mut AgentAnswerDto) { + let terms = packet_rank_terms(question); + let prefer_primary_sources = !query_mentions_non_primary_source(question); + answer.citations.sort_by(|left, right| { + packet_citation_rank(right, &terms, prefer_primary_sources) + .partial_cmp(&packet_citation_rank(left, &terms, prefer_primary_sources)) + .unwrap_or(Ordering::Equal) + }); } -fn push_generic_symbol_probe_queries(terms: &[String], queries: &mut Vec, compact: bool) { - let term_cap = if compact { 6 } else { 12 }; - for term in terms - .iter() - .filter(|term| term.len() >= 4 && !packet_query_stop_term(term.as_str())) - .take(term_cap) - { - push_unique_term(queries, term); - push_unique_term(queries, &packet_camel_case(&[term.as_str()])); +fn maybe_annotate_packet_candidate_window( + question: &str, + limits: &PacketBudgetLimitsDto, + answer: &mut AgentAnswerDto, +) { + let Ok(filter) = std::env::var("CODESTORY_PACKET_CANDIDATE_TRACE") else { + return; + }; + let trace_terms = filter + .split(|ch: char| ch == ',' || ch == ';' || ch.is_whitespace()) + .map(normalize_identifier) + .filter(|term| !term.is_empty()) + .collect::>(); + if trace_terms.is_empty() { + return; } -} -fn push_task_class_symbol_probe_queries(task_class: PacketTaskClassDto, queries: &mut Vec) { - let class_queries = match task_class { - PacketTaskClassDto::RouteTracing => { - &["router", "handler", "route", "middleware", "dispatch"][..] + let rank_terms = packet_rank_terms(question); + let prefer_primary_sources = !query_mentions_non_primary_source(question); + let broad_window = (limits.max_anchors as usize).saturating_mul(2).max(8); + let mut rows = Vec::new(); + let mut matched = 0usize; + for (index, citation) in answer.citations.iter().enumerate() { + let matches_filter = packet_candidate_matches_trace_terms(citation, &trace_terms); + if matches_filter { + matched = matched.saturating_add(1); } - PacketTaskClassDto::BugLocalization => &["error", "validate"], - PacketTaskClassDto::ChangeImpact => &["affected", "references"], - PacketTaskClassDto::SymbolOwnership => &["references", "callers"], - PacketTaskClassDto::EditPlanning => &["tests", "config"], - PacketTaskClassDto::ArchitectureExplanation | PacketTaskClassDto::DataFlow => &[], - }; - push_unique_terms(queries, class_queries); -} - -#[derive(Debug, Clone)] -struct PacketCommandDescriptor { - command_title: String, - subcommand_title: String, - module: String, - crate_segment: String, -} - -fn packet_command_descriptors(question: &str) -> Vec { - let mut descriptors = Vec::new(); - for span in packet_backtick_spans(question) { - let words = packet_command_words(span); - if words.len() < 2 { + if index >= broad_window && !matches_filter { continue; } - let command = &words[0]; - let subcommand = &words[1]; - let Some(command_title) = packet_pascal_identifier(command) else { - continue; - }; - let Some(subcommand_title) = packet_pascal_identifier(subcommand) else { - continue; - }; - let Some(module) = packet_snake_identifier(&[command.as_str(), subcommand.as_str()]) else { - continue; - }; - let Some(crate_segment) = packet_snake_identifier(&[subcommand.as_str()]) else { - continue; - }; - descriptors.push(PacketCommandDescriptor { - command_title, - subcommand_title, - module, - crate_segment, - }); + if rows.len() >= 64 { + break; + } + rows.push(packet_candidate_trace_row( + index, + citation, + &rank_terms, + prefer_primary_sources, + matches_filter, + )); } - descriptors + answer.retrieval_trace.annotations.push(format!( + "packet_candidate_trace filter=`{}` candidates={} matched={} max_anchors={} rows={}", + filter.replace('`', "'"), + answer.citations.len(), + matched, + limits.max_anchors, + rows.join(" | ") + )); } -fn packet_command_exact_probe_queries( - question: &str, - task_class: PacketTaskClassDto, -) -> Vec { - if !eval_probes_enabled() || !packet_allows_command_probe_queries(question, task_class) { - return Vec::new(); - } +fn packet_candidate_matches_trace_terms( + citation: &AgentCitationDto, + trace_terms: &[String], +) -> bool { + let normalized_display = normalize_identifier(&citation.display_name); + let normalized_path = normalize_identifier(citation.file_path.as_deref().unwrap_or_default()); + trace_terms.iter().any(|term| { + normalized_display.contains(term) + || normalized_path.contains(term) + || (!normalized_display.is_empty() && term.contains(&normalized_display)) + }) +} - let mut queries = Vec::new(); - for descriptor in packet_command_descriptors(question) { - push_unique_term( - &mut queries, - &format!("Subcommand::{}", descriptor.subcommand_title), - ); - push_unique_term(&mut queries, &format!("{}::Cli", descriptor.module)); - push_unique_term(&mut queries, &format!("{}::run_main", descriptor.module)); - } - queries +fn packet_candidate_trace_row( + index: usize, + citation: &AgentCitationDto, + rank_terms: &[String], + prefer_primary_sources: bool, + matches_filter: bool, +) -> String { + let role = packet_evidence_role(citation); + let claim = role + .map(|role| packet_claim_key_for_citation(role, citation)) + .unwrap_or_else(|| "-".to_string()); + let role_label = role.map(PacketEvidenceRole::as_str).unwrap_or("-"); + format!( + "#{}{} rank={:.3} score={:.3} claim={} role={} kind={:?} name=`{}` path={} line={}", + index + 1, + if matches_filter { "*" } else { "" }, + packet_citation_rank(citation, rank_terms, prefer_primary_sources), + citation.score, + claim, + role_label, + citation.kind, + citation.display_name.replace('`', "'"), + citation + .file_path + .as_deref() + .map(packet_display_path) + .unwrap_or_default(), + citation + .line + .map(|line| line.to_string()) + .unwrap_or_else(|| "-".to_string()) + ) } -fn packet_command_role_probe_queries( - question: &str, - task_class: PacketTaskClassDto, -) -> Vec { - if !packet_allows_command_probe_queries(question, task_class) { - return Vec::new(); +fn append_packet_evidence_sections( + answer: &mut AgentAnswerDto, + _task_class: PacketTaskClassDto, + limits: &PacketBudgetLimitsDto, +) { + if answer.citations.is_empty() { + return; } - let mut queries = Vec::new(); - for descriptor in packet_command_descriptors(question) { - let command_phrase = descriptor.module.replace('_', " "); - let subcommand_phrase = descriptor.subcommand_title.to_ascii_lowercase(); - push_unique_term(&mut queries, &command_phrase); - push_unique_term(&mut queries, &format!("{command_phrase} command")); - push_unique_term(&mut queries, &format!("{subcommand_phrase} command")); - push_unique_term(&mut queries, &format!("{subcommand_phrase} subcommand")); + let ledger_markdown = packet_evidence_ledger_markdown(answer, limits); + answer.sections.insert( + 0, + AgentResponseSectionDto { + id: "packet-evidence-ledger".to_string(), + title: "Packet Evidence Ledger".to_string(), + blocks: vec![AgentResponseBlockDto::Markdown { + markdown: ledger_markdown, + }], + }, + ); + + let claims = packet_supported_claims(answer); + if !claims.is_empty() { + answer.sections.insert( + 1, + AgentResponseSectionDto { + id: "packet-flow-claims".to_string(), + title: "Packet Claims".to_string(), + blocks: vec![AgentResponseBlockDto::Markdown { + markdown: packet_flow_claims_markdown(&claims), + }], + }, + ); } - queries } -fn packet_allows_command_probe_queries(question: &str, task_class: PacketTaskClassDto) -> bool { - if !matches!( - task_class, - PacketTaskClassDto::ArchitectureExplanation - | PacketTaskClassDto::DataFlow - | PacketTaskClassDto::ChangeImpact - | PacketTaskClassDto::EditPlanning - ) { - return false; +fn packet_evidence_ledger_markdown( + answer: &AgentAnswerDto, + limits: &PacketBudgetLimitsDto, +) -> String { + let mut markdown = String::new(); + markdown.push_str( + "Use these cited anchors first. They are ranked for the task wording before lower-confidence retrieval diagnostics.\n", + ); + for citation in answer.citations.iter().take(limits.max_anchors as usize) { + let _ = writeln!(markdown, "{}", packet_evidence_ledger_row(citation)); } - let lowered = question.to_ascii_lowercase(); - contains_any( - &lowered, - &[ - "cli", - "command", - "subcommand", - "entrypoint", - "entry point", - "runtime", - "flow", - "flows", - ], + markdown +} + +fn packet_evidence_ledger_row(citation: &AgentCitationDto) -> String { + let path = citation + .file_path + .as_deref() + .map(packet_display_path) + .unwrap_or_else(|| "".to_string()); + let line = citation + .line + .map(|line| format!(":{line}")) + .unwrap_or_default(); + let role = packet_evidence_role(citation) + .map(PacketEvidenceRole::as_str) + .unwrap_or("source evidence"); + format!( + "- `{}` ({:?}) - `{}`{} - {} - score {:.3}", + citation.display_name, citation.kind, path, line, role, citation.score ) } -fn packet_backtick_spans(question: &str) -> Vec<&str> { - let mut spans = Vec::new(); - let mut start = None; - for (index, ch) in question.char_indices() { - if ch != '`' { - continue; - } - if let Some(open) = start.take() { - let span = question[open..index].trim(); - if !span.is_empty() { - spans.push(span); - } - } else { - start = Some(index + ch.len_utf8()); - } - } - spans +struct PacketSqlSchemaFileCandidate { + path: std::path::PathBuf, + display_name: String, + line: u32, + score: f32, + anchors: Vec, } -fn packet_command_words(span: &str) -> Vec { - span.split_whitespace() - .filter_map(|token| { - let token = token.trim_matches(|ch: char| { - matches!( - ch, - ',' | '.' - | ';' - | ':' - | '?' - | '!' - | '(' - | ')' - | '[' - | ']' - | '{' - | '}' - | '"' - | '\'' - ) - }); - if token.starts_with('-') - || token.is_empty() - || !token.chars().any(|ch| ch.is_ascii_alphabetic()) - || !token - .chars() - .all(|ch| ch.is_ascii_alphanumeric() || ch == '-' || ch == '_') - { - return None; - } - Some(token.to_string()) - }) - .take(3) - .collect() +struct PacketSqlSchemaAnchorCandidate { + display_name: String, + line: u32, + score: f32, } -fn packet_pascal_identifier(word: &str) -> Option { - let mut value = String::new(); - for part in word - .split(|ch: char| !ch.is_ascii_alphanumeric()) - .filter(|part| !part.is_empty()) - { - let mut chars = part.chars(); - let first = chars.next()?; - value.push(first.to_ascii_uppercase()); - value.extend(chars.map(|ch| ch.to_ascii_lowercase())); +fn maybe_append_sql_schema_file_citations( + project_root: &Path, + question: &str, + answer: &mut AgentAnswerDto, +) { + let terms = packet_probe_terms(question); + if !packet_terms_indicate_sql_schema_flow(&terms) { + return; } - (!value.is_empty()).then_some(value) -} + let mut candidates = Vec::new(); + collect_sql_schema_file_candidates(project_root, project_root, &terms, &mut candidates); + candidates.sort_by(|left, right| { + right + .score + .partial_cmp(&left.score) + .unwrap_or(Ordering::Equal) + .then_with(|| left.display_name.cmp(&right.display_name)) + }); -fn packet_snake_identifier(words: &[&str]) -> Option { - let mut parts = Vec::new(); - for word in words { - let mut normalized = String::new(); - for (index, part) in word - .split(|ch: char| !ch.is_ascii_alphanumeric()) - .filter(|part| !part.is_empty()) - .enumerate() - { - if index > 0 { - normalized.push('_'); - } - normalized.push_str(&part.to_ascii_lowercase()); - } - if normalized.is_empty() { - return None; + let mut appended_files = 0; + let mut appended_anchors = 0; + for candidate in candidates.into_iter().take(12) { + let path_string = candidate.path.to_string_lossy().to_string(); + let file_already_present = answer.citations.iter().any(|existing| { + existing.file_path.as_deref().is_some_and(|existing_path| { + packet_display_path(existing_path) == packet_display_path(&path_string) + }) + }); + if !file_already_present { + let score = candidate.score + 5.0; + answer.citations.push(AgentCitationDto { + node_id: NodeId(format!("packet::sql_schema::{}", candidate.display_name)), + display_name: candidate.display_name.clone(), + kind: NodeKind::FILE, + file_path: Some(path_string.clone()), + line: Some(candidate.line), + score, + origin: SearchHitOrigin::TextMatch, + resolvable: false, + subgraph_id: None, + evidence_edge_ids: Vec::new(), + retrieval_score_breakdown: Some(RetrievalScoreBreakdownDto { + lexical: score, + semantic: 0.0, + graph: 0.0, + total: score, + provenance: vec!["packet_generic_sql_schema_file_probe".to_string()], + }), + }); + appended_files += 1; } - parts.push(normalized); - } - (!parts.is_empty()).then_some(parts.join("_")) -} -fn packet_concrete_file_probe_queries_from_required(required_queries: &[String]) -> Vec { - let mut queries = Vec::new(); - for query in required_queries { - if let Some(file_query) = packet_required_probe_file_query(query) { - push_unique_term(&mut queries, &file_query); + for anchor in candidate.anchors.into_iter().take(8) { + if appended_anchors >= 32 { + break; + } + if answer.citations.iter().any(|existing| { + existing.display_name == anchor.display_name + && existing.file_path.as_deref().is_some_and(|existing_path| { + packet_display_path(existing_path) == packet_display_path(&path_string) + }) + }) { + continue; + } + let score = candidate.score + (anchor.score / 1000.0); + answer.citations.push(AgentCitationDto { + node_id: NodeId(format!( + "packet::sql_schema::{}::{}::{}", + candidate.display_name, anchor.display_name, anchor.line + )), + display_name: anchor.display_name, + kind: NodeKind::ANNOTATION, + file_path: Some(path_string.clone()), + line: Some(anchor.line), + score, + origin: SearchHitOrigin::TextMatch, + resolvable: false, + subgraph_id: None, + evidence_edge_ids: Vec::new(), + retrieval_score_breakdown: Some(RetrievalScoreBreakdownDto { + lexical: score, + semantic: 0.0, + graph: 0.0, + total: score, + provenance: vec!["packet_generic_sql_schema_anchor_probe".to_string()], + }), + }); + appended_anchors += 1; } } - queries -} -fn packet_required_probe_file_query(query: &str) -> Option { - if !packet_required_probe_needs_concrete_file(query) { - return None; - } - let normalized_query = normalize_identifier(query); - if normalized_query == "eventprocessor" { - return Some("event_processor.rs".to_string()); + if appended_files > 0 || appended_anchors > 0 { + answer.retrieval_trace.annotations.push(format!( + "packet_generic_sql_schema_file_citations files={appended_files} anchors={appended_anchors}" + )); } - query - .chars() - .all(|ch| ch.is_ascii_alphanumeric() || ch == '_') - .then(|| format!("{query}.rs")) } -fn push_adjacent_packet_term_queries( +fn collect_sql_schema_file_candidates( + project_root: &Path, + dir: &Path, terms: &[String], - queries: &mut Vec, - window_cap: usize, + candidates: &mut Vec, ) { - for window in terms.windows(2).take(window_cap) { - if let [left, right] = window { - if packet_adjacent_query_stop_term(left) || packet_adjacent_query_stop_term(right) { + if candidates.len() >= 32 { + return; + } + let Ok(entries) = std::fs::read_dir(dir) else { + return; + }; + for entry in entries.flatten() { + let path = entry.path(); + let name = entry.file_name().to_string_lossy().to_string(); + if path.is_dir() { + let lower = name.to_ascii_lowercase(); + if matches!( + lower.as_str(), + ".git" | "target" | "node_modules" | "vendor" | "dist" | "build" + ) { continue; } - push_unique_term(queries, &format!("{left}_{right}")); - push_unique_term( - queries, - &packet_camel_case(&[left.as_str(), right.as_str()]), - ); + collect_sql_schema_file_candidates(project_root, &path, terms, candidates); + continue; + } + if path + .extension() + .and_then(|extension| extension.to_str()) + .is_none_or(|extension| !extension.eq_ignore_ascii_case("sql")) + { + continue; + } + let Ok(metadata) = path.metadata() else { + continue; + }; + if metadata.len() > 1_500_000 { + continue; + } + let Ok(source) = std::fs::read_to_string(&path) else { + continue; + }; + let lower = source.to_ascii_lowercase(); + if !lower.contains("create table") { + continue; + } + let relative = path + .strip_prefix(project_root) + .unwrap_or(&path) + .to_string_lossy() + .replace('\\', "/"); + let anchors = packet_sql_schema_anchors(&source, terms); + let mut score = 45.0; + if lower.contains("foreign key") || lower.contains("references") { + score += 12.0; + } + score += anchors.len().min(8) as f32; + let normalized_path = normalize_identifier(&relative); + let normalized_source = normalize_identifier(&source); + for term in terms { + let normalized = normalize_identifier(term); + if normalized.len() >= 4 + && (normalized_path.contains(&normalized) + || normalized_source.contains(&normalized)) + { + score += 1.5; + } } + candidates.push(PacketSqlSchemaFileCandidate { + path, + display_name: relative, + line: packet_sql_first_schema_line(&source), + score, + anchors, + }); } } -fn packet_concept_queries(question: &str) -> Vec { - let include_non_primary_terms = query_mentions_non_primary_source(question); - prompt_search_terms(question) - .into_iter() - .filter(|term| { - term.len() >= 4 - && (include_non_primary_terms || !is_non_primary_source_term(term.as_str())) - && !packet_query_stop_term(term.as_str()) - && !matches!( +fn packet_sql_schema_anchors( + source: &str, + terms: &[String], +) -> Vec { + let mut anchors = Vec::new(); + for (index, line) in source.lines().enumerate() { + let line_number = index.saturating_add(1).try_into().unwrap_or(u32::MAX); + if let Some(table) = packet_sql_identifier_after(line, "create table") { + let display_name = format!("CREATE TABLE {table}"); + if !anchors + .iter() + .any(|existing: &PacketSqlSchemaAnchorCandidate| { + existing.display_name == display_name + }) + { + anchors.push(PacketSqlSchemaAnchorCandidate { + score: 30.0 + packet_sql_prompt_match_score(&table, terms), + display_name, + line: line_number, + }); + } + } + let normalized = line.to_ascii_lowercase(); + if normalized.contains("foreign key") && normalized.contains("references") { + let relation_score = if terms.iter().any(|term| { + matches!( term.as_str(), - "answer" - | "cite" - | "cites" - | "explain" - | "files" - | "full" - | "into" - | "moves" - | "support" - | "through" + "relationship" + | "relationships" + | "relation" + | "relations" + | "foreign" + | "constraint" + | "constraints" + | "reference" + | "references" ) - }) - .take(8) - .collect() -} - -fn packet_camel_case(words: &[&str]) -> String { - let mut value = String::new(); - for word in words { - let mut chars = word.chars(); - if let Some(first) = chars.next() { - value.push(first.to_ascii_uppercase()); - value.extend(chars.map(|ch| ch.to_ascii_lowercase())); + }) { + 8.0 + } else { + 0.0 + }; + if !anchors + .iter() + .any(|existing: &PacketSqlSchemaAnchorCandidate| { + existing.display_name == "FOREIGN KEY" + }) + { + anchors.push(PacketSqlSchemaAnchorCandidate { + display_name: "FOREIGN KEY".to_string(), + line: line_number, + score: 28.0 + relation_score, + }); + } } } - value + anchors.sort_by(|left, right| { + right + .score + .partial_cmp(&left.score) + .unwrap_or(Ordering::Equal) + .then_with(|| left.line.cmp(&right.line)) + .then_with(|| left.display_name.cmp(&right.display_name)) + }); + anchors } -fn infer_packet_task_class(question: &str) -> PacketTaskClassDto { - let lower = question.to_ascii_lowercase(); - if contains_any( - &lower, - &["bug", "error", "failing", "failed", "broken", "crash"], - ) { - PacketTaskClassDto::BugLocalization - } else if contains_any( - &lower, - &["impact", "affected", "regression", "blast radius"], - ) || risk_of_change_prompt(&lower) - { - PacketTaskClassDto::ChangeImpact - } else if contains_any(&lower, &["route", "endpoint", "handler", "api path"]) { - PacketTaskClassDto::RouteTracing - } else if contains_any(&lower, &["owner", "owns", "who calls", "references"]) { - PacketTaskClassDto::SymbolOwnership - } else if contains_any( - &lower, - &[ - "data flow", - "flow from", - "flow into", - "flows from", - "flows into", - "pipeline", - "through", - ], - ) { - PacketTaskClassDto::DataFlow - } else if contains_any( - &lower, - &[ - "where to edit", - "edit", - "change", - "modify", - "implement", - "add ", - ], - ) { - PacketTaskClassDto::EditPlanning - } else { - PacketTaskClassDto::ArchitectureExplanation +fn packet_sql_prompt_match_score(value: &str, terms: &[String]) -> f32 { + let normalized_value = normalize_identifier(value); + if normalized_value.is_empty() { + return 0.0; } + let mut score = 0.0; + for term in terms { + let normalized_term = normalize_identifier(term); + if normalized_term.len() < 4 { + continue; + } + if normalized_value.contains(&normalized_term) + || normalized_term.contains(&normalized_value) + { + score += 5.0; + continue; + } + let singular = normalized_term + .strip_suffix("ies") + .map(|prefix| format!("{prefix}y")) + .or_else(|| normalized_term.strip_suffix("es").map(str::to_string)) + .or_else(|| normalized_term.strip_suffix('s').map(str::to_string)); + if let Some(singular) = singular + && singular.len() >= 4 + && (normalized_value.contains(&singular) || singular.contains(&normalized_value)) + { + score += 5.0; + } + } + score } -fn contains_any(haystack: &str, needles: &[&str]) -> bool { - needles.iter().any(|needle| haystack.contains(needle)) -} - -fn risk_of_change_prompt(lower: &str) -> bool { - lower.contains("risk if") - && contains_any(lower, &[" change", " changing", " modify", " modifying"]) - || lower.contains("risk of changing") - || lower.contains("risk from changing") - || lower.contains("risk in changing") +fn packet_sql_first_schema_line(source: &str) -> u32 { + source + .lines() + .position(|line| line.to_ascii_lowercase().contains("create table")) + .map(|index| index.saturating_add(1).try_into().unwrap_or(u32::MAX)) + .unwrap_or(1) } -fn extract_packet_query_terms(question: &str) -> Vec { - let mut terms = Vec::new(); - let mut quoted = false; - let mut quote = '\0'; - let mut start = 0usize; - for (index, ch) in question.char_indices() { - if matches!(ch, '`' | '"' | '\'') { - if quoted && ch == quote { - push_unique_term(&mut terms, question[start..index].trim()); - quoted = false; - } else if !quoted { - quoted = true; - quote = ch; - start = index + ch.len_utf8(); - } +fn maybe_append_required_file_scoped_source_citations( + project_root: &Path, + question: &str, + task_class: PacketTaskClassDto, + extra_probes: &[String], + answer: &mut AgentAnswerDto, +) { + let required_queries = + packet_sufficiency_required_probe_queries_with_extra(question, task_class, extra_probes); + let mut appended = 0usize; + for query in required_queries { + if appended >= 16 || packet_probe_query_is_cited(&query, answer) { + continue; + } + let Some(parts) = packet_file_scoped_symbol_probe_parts(&query) else { + continue; + }; + let Some(path) = packet_required_probe_source_path(project_root, &parts, &answer.citations) + else { + continue; + }; + let Ok(metadata) = path.metadata() else { + continue; + }; + if metadata.len() > 1_500_000 { + continue; + } + let Ok(source) = std::fs::read_to_string(&path) else { + continue; + }; + let Some(anchor) = packet_required_probe_source_anchor(&parts, &source) else { + continue; + }; + let path_string = path.to_string_lossy().to_string(); + if answer.citations.iter().any(|existing| { + existing.display_name == anchor.display_name + && existing.file_path.as_deref().is_some_and(|existing_path| { + packet_display_path(existing_path) == packet_display_path(&path_string) + }) + }) { + continue; } + answer.citations.push(AgentCitationDto { + node_id: NodeId(format!( + "packet::required_source_probe::{}::{}::{}", + parts.query_path, anchor.display_name, anchor.line + )), + display_name: anchor.display_name, + kind: anchor.kind, + file_path: Some(path_string), + line: Some(anchor.line), + score: 96.0, + origin: SearchHitOrigin::TextMatch, + resolvable: false, + subgraph_id: None, + evidence_edge_ids: Vec::new(), + retrieval_score_breakdown: Some(RetrievalScoreBreakdownDto { + lexical: 96.0, + semantic: 0.0, + graph: 0.0, + total: 96.0, + provenance: vec!["packet_required_file_scoped_source_probe".to_string()], + }), + }); + appended += 1; } - for term in exact_symbol_query_terms(question) { - push_unique_term(&mut terms, &term); - } - for term in packet_architecture_flow_probe_terms(question) { - push_unique_term(&mut terms, &term); + if appended > 0 { + answer.retrieval_trace.annotations.push(format!( + "packet_required_file_scoped_source_citations appended={appended}" + )); } +} - for token in question.split_whitespace() { - let token = token.trim_matches(|ch: char| { - matches!( - ch, - ',' | '.' | ';' | ':' | '?' | '!' | '(' | ')' | '[' | ']' | '{' | '}' | '"' | '`' - ) - }); - if is_packet_code_like_term(token) - || (looks_like_standalone_symbol_query(token) - && token.len() >= 4 - && !packet_extract_query_stop_term(token)) - { - push_unique_term(&mut terms, token); - } - } - terms.truncate(8); - terms -} - -fn packet_extract_query_stop_term(token: &str) -> bool { - packet_query_stop_term(token) - || matches!( - token.to_ascii_lowercase().as_str(), - "cite" - | "cites" - | "file" - | "files" - | "path" - | "paths" - | "that" - | "them" - | "they" - | "their" - | "your" - | "into" - | "from" - | "with" - | "have" - | "been" - | "will" - | "also" - | "only" - | "over" - | "under" - | "than" - | "then" - | "each" - | "such" - | "some" - | "more" - | "most" - | "many" - | "much" - | "very" - | "just" - | "like" - | "make" - | "made" - | "used" - | "uses" - | "using" - | "work" - | "works" - | "working" - ) +struct PacketRequiredSourceAnchor { + display_name: String, + kind: NodeKind, + line: u32, } -fn is_packet_code_like_term(token: &str) -> bool { - if token.len() < 3 { - return false; +fn packet_required_probe_source_path( + project_root: &Path, + parts: &PacketFileScopedSymbolProbe, + citations: &[AgentCitationDto], +) -> Option { + let direct = project_root.join(&parts.query_path); + if direct.is_file() { + return Some(direct); } - token.contains("::") - || token.contains('/') - || token.contains('\\') - || token.contains('.') - || token.contains('_') - || token.contains('-') - || token.chars().skip(1).any(|ch| ch.is_ascii_uppercase()) -} - -fn push_unique_term(terms: &mut Vec, value: &str) { - let value = value.trim(); - if value.len() < 3 { - return; + let normalized_query_path = parts.query_path.replace('\\', "/").to_ascii_lowercase(); + for citation in citations { + let path = citation.file_path.as_deref()?; + let display_path = packet_display_path(path) + .replace('\\', "/") + .to_ascii_lowercase(); + if display_path.ends_with(&normalized_query_path) { + return Some(std::path::PathBuf::from(path)); + } } - if !terms.iter().any(|term| term.eq_ignore_ascii_case(value)) { - terms.push(value.to_string()); + for citation in citations { + let path = citation.file_path.as_deref()?; + let file_name = packet_display_path(path) + .rsplit(['/', '\\']) + .next() + .unwrap_or_default() + .to_ascii_lowercase(); + if file_name == parts.file_name { + return Some(std::path::PathBuf::from(path)); + } } + None } -fn push_unique_terms(terms: &mut Vec, values: &[&str]) { - for value in values { - push_unique_term(terms, value); +fn packet_required_probe_source_anchor( + parts: &PacketFileScopedSymbolProbe, + source: &str, +) -> Option { + let display_name = parts.raw_symbols.join(" "); + for (index, line) in source.lines().enumerate() { + if packet_source_line_matches_file_scoped_probe(line, parts) { + let kind = packet_source_probe_anchor_kind(line, parts); + return Some(PacketRequiredSourceAnchor { + display_name, + kind, + line: index.saturating_add(1).try_into().unwrap_or(u32::MAX), + }); + } } + None } -fn push_unique_owned_terms(terms: &mut Vec, values: &[String]) { - for value in values { - push_unique_term(terms, value); +fn packet_source_line_matches_file_scoped_probe( + line: &str, + parts: &PacketFileScopedSymbolProbe, +) -> bool { + if parts.raw_symbols.is_empty() { + return false; } -} - -fn task_class_seed_queries(task_class: PacketTaskClassDto) -> &'static [&'static str] { - match task_class { - PacketTaskClassDto::ArchitectureExplanation => &["architecture entrypoint", "runtime flow"], - PacketTaskClassDto::BugLocalization => &["error path", "failure handling"], - PacketTaskClassDto::ChangeImpact => &["affected symbols", "impacted tests"], - PacketTaskClassDto::RouteTracing => &["route handler endpoint", "references"], - PacketTaskClassDto::SymbolOwnership => &["definition references", "callers"], - PacketTaskClassDto::DataFlow => &["pipeline flow", "storage handoff"], - PacketTaskClassDto::EditPlanning => &["edit candidates", "test coverage"], + let raw_display = parts.raw_symbols.join(" "); + let normalized_line = normalize_identifier(line); + let normalized_display = normalize_identifier(&raw_display); + if normalized_display.is_empty() { + return false; } -} - -fn push_packet_query(queries: &mut Vec, query: &str, purpose: &str) { - let query = query.trim(); - if query.is_empty() { - return; + if parts.symbols.len() >= 3 && parts.symbols[0] == "create" && parts.symbols[1] == "table" { + return packet_sql_identifier_after(line, "create table") + .map(|table| normalize_identifier(&table)) + .is_some_and(|table| { + parts + .symbols + .last() + .is_some_and(|expected| table == *expected) + }); } - if queries - .iter() - .any(|existing| existing.query.eq_ignore_ascii_case(query)) - { - return; + if parts.symbols.len() >= 2 && parts.symbols[0] == "foreign" && parts.symbols[1] == "key" { + let lower = line.to_ascii_lowercase(); + return lower.contains("foreign key") && lower.contains("references"); } - queries.push(PacketPlanQueryDto { - query: query.to_string(), - purpose: purpose.to_string(), - }); -} - -fn packet_retrieval_prompt( - question: &str, - plan: &PacketPlanDto, - initial_hybrid_weights: Option<&AgentHybridWeightsDto>, - budget: PacketBudgetModeDto, -) -> String { - let anchor_probes = packet_anchor_probe_queries(plan); - if packet_initial_retrieval_is_lexical_only(initial_hybrid_weights) && anchor_probes.is_empty() - { - return question.to_string(); + if let Some(id) = raw_display.strip_prefix("input#") { + let lower = line.to_ascii_lowercase(); + return lower.contains(">(); - if lines.is_empty() { - lines = plan - .queries - .iter() - .take(8) - .map(|query| format!("- {} ({})", query.query, query.purpose)) - .collect(); - } - lines - } else { - plan.queries - .iter() - .map(|query| format!("- {} ({})", query.query, query.purpose)) - .collect() - }; - for line in planned_lines { - prompt.push('\n'); - prompt.push_str(&line); + + let terminal = packet_required_probe_terminal_symbol(&raw_display); + let normalized_terminal = normalize_identifier(&terminal); + if normalized_terminal.is_empty() || !normalized_line.contains(&normalized_terminal) { + return false; } - prompt + + packet_source_line_declares_named_symbol(line, &normalized_terminal) + || normalized_line == normalized_display + || normalized_line.ends_with(&normalized_display) } -fn packet_initial_hybrid_weights( - _plan: &PacketPlanDto, - _budget: PacketBudgetModeDto, -) -> Option { - None +fn packet_html_line_has_attribute_value(line_lower: &str, attribute: &str, value: &str) -> bool { + let value_lower = value.to_ascii_lowercase(); + [ + format!("{attribute}=\"{value_lower}\""), + format!("{attribute}='{value_lower}'"), + format!("{attribute}={value_lower}"), + ] + .iter() + .any(|needle| line_lower.contains(needle)) } -fn packet_compact_retrieval_prompt_lines(mut anchor_probes: Vec) -> Vec { - anchor_probes.sort_by(|left, right| { - let left_path = left.contains('/') && left.contains('.'); - let right_path = right.contains('/') && right.contains('.'); - right_path - .cmp(&left_path) - .then_with(|| right.len().cmp(&left.len())) - }); - let mut selected = Vec::new(); - for query in anchor_probes { - if selected.len() >= 8 { - break; - } - if !selected.iter().any(|existing| existing == &query) { - selected.push(query); - } +fn packet_html_boolean_attribute_line_matches(line: &str, attribute: &str) -> bool { + let lower = line.to_ascii_lowercase(); + if !lower.contains(&attribute.to_ascii_lowercase()) { + return false; } - selected + let normalized_line = normalize_identifier(line); + normalized_line.contains(attribute) && (lower.contains('<') || lower.contains(attribute)) } -fn packet_initial_retrieval_is_lexical_only(weights: Option<&AgentHybridWeightsDto>) -> bool { - weights - .and_then(|weights| weights.semantic) - .is_some_and(|semantic| semantic <= f32::EPSILON) +fn packet_required_probe_terminal_symbol(raw_symbol: &str) -> String { + raw_symbol + .rsplit([':', '.', '#']) + .find(|part| !part.is_empty()) + .unwrap_or(raw_symbol) + .trim() + .to_string() } -fn packet_plan_annotation(plan: &PacketPlanDto) -> String { - let queries = plan - .queries +fn packet_source_line_declares_named_symbol(line: &str, normalized_terminal: &str) -> bool { + let lower = line.to_ascii_lowercase(); + let normalized_line = normalize_identifier(line); + let declaration_words = [ + "class ", + "struct ", + "interface ", + "enum ", + "module ", + "trait ", + "def ", + "function ", + "func ", + "fn ", + "const ", + "let ", + "var ", + "public ", + "private ", + "protected ", + "internal ", + "static ", + "abstract ", + "template ", + "using ", + "typealias ", + ]; + if !declaration_words.iter().any(|word| lower.contains(word)) { + return false; + } + if [ + "class ", + "struct ", + "interface ", + "enum ", + "module ", + "trait ", + ] + .iter() + .any(|word| lower.contains(word)) + && normalized_line.contains(normalized_terminal) + { + return true; + } + let declaration_needles = [ + format!("class{normalized_terminal}"), + format!("struct{normalized_terminal}"), + format!("interface{normalized_terminal}"), + format!("enum{normalized_terminal}"), + format!("module{normalized_terminal}"), + format!("trait{normalized_terminal}"), + format!("def{normalized_terminal}"), + format!("function{normalized_terminal}"), + format!("func{normalized_terminal}"), + format!("fn{normalized_terminal}"), + format!("const{normalized_terminal}"), + format!("let{normalized_terminal}"), + format!("var{normalized_terminal}"), + format!("using{normalized_terminal}"), + format!("typealias{normalized_terminal}"), + ]; + declaration_needles .iter() - .map(|query| query.query.as_str()) - .collect::>() - .join(" | "); - format!( - "packet_plan task_class={:?} inferred={} queries={}", - plan.task_class, plan.inferred_task_class, queries - ) -} - -fn rank_packet_evidence(question: &str, answer: &mut AgentAnswerDto) { - let terms = packet_rank_terms(question); - let prefer_primary_sources = !query_mentions_non_primary_source(question); - answer.citations.sort_by(|left, right| { - packet_citation_rank(right, &terms, prefer_primary_sources) - .partial_cmp(&packet_citation_rank(left, &terms, prefer_primary_sources)) - .unwrap_or(Ordering::Equal) - }); + .any(|needle| normalized_line.contains(needle)) + || normalized_line.ends_with(normalized_terminal) } -fn cap_packet_citations( - answer: &mut AgentAnswerDto, - limits: &PacketBudgetLimitsDto, - required_probe_queries: &[String], -) -> bool { - let mut protected_citation_keys = - promote_required_probe_citations(answer, required_probe_queries); - let focus_neighborhood_keys = - promote_focus_neighborhood_citations(answer, &protected_citation_keys); - protected_citation_keys.extend(focus_neighborhood_keys); - if protected_citation_keys.is_empty() { - cap_citations(answer, limits) +fn packet_source_probe_anchor_kind(line: &str, parts: &PacketFileScopedSymbolProbe) -> NodeKind { + let lower = line.to_ascii_lowercase(); + if parts.raw_symbols.join(" ").starts_with("input#") + || (parts.raw_symbols.len() == 1 && lower.contains('<')) + || (parts.symbols.len() >= 2 && parts.symbols[0] == "foreign" && parts.symbols[1] == "key") + || (parts.symbols.len() >= 3 && parts.symbols[0] == "create" && parts.symbols[1] == "table") + { + NodeKind::ANNOTATION + } else if lower.contains("class ") || lower.contains("struct ") { + NodeKind::CLASS + } else if lower.contains("interface ") || lower.contains("trait ") { + NodeKind::INTERFACE + } else if parts + .raw_symbols + .iter() + .any(|symbol| symbol.contains(':') || symbol.contains('.') || symbol.contains('#')) + || lower.contains("def ") + || lower.contains("function ") + || lower.contains("func ") + || lower.contains("fn ") + { + NodeKind::METHOD } else { - cap_citations_with_protected(answer, limits, &protected_citation_keys) + NodeKind::ANNOTATION } } +#[cfg(test)] +fn packet_claim_for_role( + _key: &str, + role: PacketEvidenceRole, + citation: &AgentCitationDto, + prompt: &str, +) -> String { + build_packet_claim_for_role(role, citation, prompt, &packet_rank_terms(prompt)) +} -fn promote_required_probe_citations( - answer: &mut AgentAnswerDto, - required_probe_queries: &[String], -) -> HashSet { - if required_probe_queries.is_empty() || answer.citations.is_empty() { - return HashSet::new(); - } - - let focus_roots = packet_command_focus_roots(&answer.citations); - let mut promoted_indices = Vec::new(); - for query in required_probe_queries { - if promoted_indices - .iter() - .any(|index| packet_citation_satisfies_required_probe(query, &answer.citations[*index])) - { - continue; - } - let mut best_match = None; - for (index, citation) in answer.citations.iter().enumerate() { - if promoted_indices.contains(&index) { - continue; - } - let Some(match_rank) = packet_citation_probe_match_rank(query, citation) else { - continue; - }; - if packet_display_name_is_import_literal(&citation.display_name.to_ascii_lowercase()) - && !packet_citation_satisfies_required_probe(query, citation) - { - continue; - } - if best_match - .map(|(best_index, best_rank)| { - packet_prefer_required_probe_match( - query, - citation, - match_rank, - &answer.citations[best_index], - best_rank, - &focus_roots, - ) - }) - .unwrap_or(true) - { - best_match = Some((index, match_rank)); - } +fn packet_retrieval_profile( + task_class: Option, + budget: PacketBudgetModeDto, + limits: &PacketBudgetLimitsDto, +) -> AgentRetrievalProfileSelectionDto { + let preset = match task_class { + Some(PacketTaskClassDto::BugLocalization) | Some(PacketTaskClassDto::EditPlanning) => { + AgentRetrievalPresetDto::Investigate } - if let Some((index, _)) = best_match { - promoted_indices.push(index); + Some(PacketTaskClassDto::ChangeImpact) | Some(PacketTaskClassDto::SymbolOwnership) => { + AgentRetrievalPresetDto::Impact } - } - if promoted_indices.is_empty() { - return HashSet::new(); - } + Some(PacketTaskClassDto::RouteTracing) => AgentRetrievalPresetDto::Callflow, + Some(PacketTaskClassDto::ArchitectureExplanation) + | Some(PacketTaskClassDto::DataFlow) + | None => AgentRetrievalPresetDto::Architecture, + }; - let protected_citation_keys = promoted_indices - .iter() - .map(|index| packet_citation_key(&answer.citations[*index])) - .collect::>(); - let promoted_index_set = promoted_indices.iter().copied().collect::>(); - let mut reordered = Vec::with_capacity(answer.citations.len()); - for index in promoted_indices { - reordered.push(answer.citations[index].clone()); - } - for (index, citation) in answer.citations.drain(..).enumerate() { - if !promoted_index_set.contains(&index) { - reordered.push(citation); - } + if matches!( + budget, + PacketBudgetModeDto::Tiny | PacketBudgetModeDto::Compact + ) { + return AgentRetrievalProfileSelectionDto::Custom { + config: AgentCustomRetrievalConfigDto { + depth: if matches!(budget, PacketBudgetModeDto::Tiny) { + 1 + } else { + 2 + }, + max_nodes: limits.max_trail_edges.clamp(10, 2_000), + include_edge_occurrences: matches!( + task_class, + Some(PacketTaskClassDto::ChangeImpact | PacketTaskClassDto::RouteTracing) + ), + enable_source_reads: true, + ..AgentCustomRetrievalConfigDto::default() + }, + }; } - answer.citations = reordered; - answer.retrieval_trace.annotations.push(format!( - "packet_required_probe_citations promoted={} required={}", - promoted_index_set.len(), - required_probe_queries.join("|").replace('`', "'") - )); - protected_citation_keys + + AgentRetrievalProfileSelectionDto::Preset { preset } } -fn promote_focus_neighborhood_citations( - answer: &mut AgentAnswerDto, - protected_citation_keys: &HashSet, -) -> HashSet { - if answer.citations.is_empty() { - return HashSet::new(); +fn cap_graph_artifacts( + graphs: &mut Vec, + byte_cap: usize, +) -> GraphArtifactCapStats { + let mut retained = Vec::with_capacity(graphs.len()); + let mut retained_bytes = 0usize; + let mut omitted_count = 0usize; + + for graph in graphs.drain(..) { + let encoded_bytes = serde_json::to_vec(&graph) + .map(|bytes| bytes.len()) + .unwrap_or(usize::MAX); + if retained_bytes.saturating_add(encoded_bytes) <= byte_cap { + retained_bytes = retained_bytes.saturating_add(encoded_bytes); + retained.push(graph); + } else { + omitted_count = omitted_count.saturating_add(1); + } } - let focus_roots = packet_command_focus_roots(&answer.citations); - if focus_roots.is_empty() { - return HashSet::new(); + + *graphs = retained; + GraphArtifactCapStats { + retained_bytes, + omitted_count, + truncated: omitted_count > 0, } - let protected_file_paths = answer - .citations - .iter() - .filter(|citation| protected_citation_keys.contains(&packet_citation_key(citation))) - .filter_map(packet_citation_file_path_key) - .collect::>(); +} - let mut ranked_candidates = answer - .citations - .iter() - .enumerate() - .filter(|(_, citation)| { - packet_focus_neighborhood_candidate( - citation, - &focus_roots, - protected_citation_keys, - &protected_file_paths, - ) - }) - .map(|(index, citation)| { - ( - index, - packet_focus_neighborhood_rank(citation, &focus_roots), - ) - }) - .collect::>(); - ranked_candidates.sort_by(|(left_index, left_rank), (right_index, right_rank)| { - right_rank - .cmp(left_rank) - .then_with(|| left_index.cmp(right_index)) - }); +fn execute_retrieval( + controller: &AppController, + req: &AgentAskRequest, + prompt: &str, + ask_started_at: Instant, + resolved_profile: &ResolvedProfile, + trace: &mut TraceRecorder, +) -> Result { + let mut bundle = RetrievalBundle::default(); + let semantic_required = hybrid_retrieval_enabled() + && !packet_initial_retrieval_is_lexical_only(req.hybrid_weights.as_ref()); - let mut promoted_indices = Vec::new(); - let mut promoted_file_paths = HashSet::new(); - for (index, _) in ranked_candidates { - let Some(path) = packet_citation_file_path_key(&answer.citations[index]) else { - continue; - }; - if !promoted_file_paths.insert(path) { - continue; + let max_results = req + .max_results + .unwrap_or(DEFAULT_MAX_RESULTS) + .clamp(1, resolved_profile.max_search_results) as usize; + + let (mut scored_hits, hits) = match try_sidecar_primary_search( + controller, + prompt, + max_results, + req.latency_budget_ms, + ) { + Some(SidecarPrimarySearchOutcome::Served { + hits, + scored_hits, + shadow, + }) => { + trace.set_retrieval_shadow(shadow.clone()); + trace.annotate(format!( + "retrieval_sidecar_primary mode={} candidates={} resolved_hits={}", + shadow.retrieval_mode, + shadow.candidate_count, + hits.len() + )); + let search_step = trace.start_step( + AgentRetrievalStepKindDto::Search, + vec![ + field("query_chars", prompt.len().to_string()), + field("retrieval_path", "sidecar"), + ], + ); + trace.finish_ok( + search_step, + vec![ + field("hits", hits.len().to_string()), + field("sidecar_candidates", shadow.candidate_count.to_string()), + field( + "sidecar_resolved_hits", + shadow.resolved_hit_count.to_string(), + ), + field("accepted_hits", hits.len().to_string()), + field("max_results", max_results.to_string()), + field("repo_text", "off_initial"), + ], + ); + let semantic_query_step = trace.start_step( + AgentRetrievalStepKindDto::SemanticQueryEmbedding, + vec![field("required", semantic_required.to_string())], + ); + let semantic_candidates_step = trace.start_step( + AgentRetrievalStepKindDto::SemanticCandidateRetrieval, + vec![field("required", semantic_required.to_string())], + ); + let hybrid_rerank_step = trace.start_step( + AgentRetrievalStepKindDto::HybridRerank, + vec![field("required", semantic_required.to_string())], + ); + trace.finish_skipped( + semantic_query_step, + "Semantic embedding skipped on sidecar retrieval path.", + Vec::new(), + ); + trace.finish_skipped( + semantic_candidates_step, + "Semantic candidate scan skipped on sidecar retrieval path.", + Vec::new(), + ); + trace.finish_ok( + hybrid_rerank_step, + vec![field("ranked", hits.len().to_string())], + ); + (scored_hits, hits) } - promoted_indices.push(index); - if promoted_indices.len() >= PACKET_FOCUS_NEIGHBORHOOD_CARRY_LIMIT { - break; + Some(SidecarPrimarySearchOutcome::Rejected { shadow, reason }) => { + trace.set_retrieval_shadow(shadow); + trace.annotate(format!( + "retrieval_sidecar_primary rejected=true fail_closed=true reason={reason}" + )); + return Err(sidecar_retrieval_unavailable_error( + controller, + format!("sidecar retrieval primary rejected query: {reason}"), + )); } - } - if promoted_indices.is_empty() { - return HashSet::new(); - } - - let promoted_index_set = promoted_indices.iter().copied().collect::>(); - let promoted_keys = promoted_indices - .iter() - .map(|index| packet_citation_key(&answer.citations[*index])) - .collect::>(); - let mut reordered = Vec::with_capacity(answer.citations.len()); - for citation in &answer.citations { - if protected_citation_keys.contains(&packet_citation_key(citation)) { - reordered.push(citation.clone()); + Some(SidecarPrimarySearchOutcome::Unavailable { reason }) => { + trace.annotate(format!( + "retrieval_sidecar_primary unavailable=true fail_closed=true reason={reason}" + )); + return Err(sidecar_retrieval_unavailable_error(controller, reason)); } - } - for index in promoted_indices { - reordered.push(answer.citations[index].clone()); - } - for (index, citation) in answer.citations.drain(..).enumerate() { - let key = packet_citation_key(&citation); - if !protected_citation_keys.contains(&key) && !promoted_index_set.contains(&index) { - reordered.push(citation); + None => { + return Err(sidecar_retrieval_unavailable_error( + controller, + "sidecar retrieval primary is mandatory; non-sidecar initial search is disabled", + )); } - } - answer.citations = reordered; - answer.retrieval_trace.annotations.push(format!( - "packet_focus_neighborhood_citations promoted={} roots={}", - promoted_keys.len(), - focus_roots - .iter() - .map(|root| root.root.as_str()) - .collect::>() - .join("|") - .replace('`', "'") - )); - promoted_keys -} + }; -fn packet_focus_neighborhood_candidate( - citation: &AgentCitationDto, - focus_roots: &[PacketCommandFocusRoot], - protected_citation_keys: &HashSet, - protected_file_paths: &HashSet, -) -> bool { - if protected_citation_keys.contains(&packet_citation_key(citation)) - || citation.origin != SearchHitOrigin::IndexedSymbol - || !citation.resolvable - || packet_display_name_is_import_literal(&citation.display_name.to_ascii_lowercase()) - || packet_display_name_is_test_like(&citation.display_name) - { - return false; - } - let path = citation - .file_path - .as_deref() - .map(packet_display_path) - .unwrap_or_default(); - if path.is_empty() || packet_citation_focus_root_score(citation, focus_roots) == 0 { - return false; - } - if protected_file_paths.contains(&path) { - return false; + let initial_hit_count = hits.len(); + let mut hits = hits; + let literal_diagnostic_signal = has_literal_diagnostic_signal(prompt); + let promotable_focus_available = + req.focus_node_id.is_some() || investigation_focus_anchor(prompt, &hits).is_some(); + let mut expansion_added_hits = false; + let block_nucleo_supplement = + sidecar_retrieval_blocks_nucleo_supplement(controller, hits.len()); + if block_nucleo_supplement && weak_initial_hits(prompt, &hits) { + trace.annotate( + "retrieval_sidecar_primary skipped local nucleo investigation supplement on weak hits", + ); } - !retrieval_file_role_from_path(&path.to_ascii_lowercase()).is_non_primary() -} - -fn packet_citation_file_path_key(citation: &AgentCitationDto) -> Option { - let path = citation.file_path.as_deref().map(packet_display_path)?; - if path.is_empty() { None } else { Some(path) } -} - -fn packet_focus_neighborhood_rank( - citation: &AgentCitationDto, - focus_roots: &[PacketCommandFocusRoot], -) -> (u8, u8, u8, u8, u8, u8, i32) { - let path = citation - .file_path - .as_deref() - .map(packet_display_path) - .unwrap_or_default(); - let source_file: u8 = if retrieval_file_role_from_path(&path.to_ascii_lowercase()) - == crate::RetrievalFileRole::Source + if should_investigate(resolved_profile) + && weak_initial_hits(prompt, &hits) + && !promotable_focus_available + && !block_nucleo_supplement { - 1 - } else { - 0 - }; - let direct_root_file = packet_citation_direct_focus_root_file_score(citation, focus_roots); - let role_backed: u8 = if packet_evidence_role(citation).is_some() { - 1 - } else { - 0 - }; - let implementation_file: u8 = if packet_path_is_implementation(&path) { - 1 - } else { - 0 - }; - let definition_file: u8 = if packet_primary_definition_file_citation(citation) { - 1 - } else { - 0 - }; - ( - packet_citation_focus_root_score(citation, focus_roots), - direct_root_file, - packet_source_navigation_file_score(&path), - source_file, - role_backed, - implementation_file.saturating_add(definition_file), - (citation.score * 1000.0).round() as i32, - ) -} - -fn packet_citation_direct_focus_root_file_score( - citation: &AgentCitationDto, - focus_roots: &[PacketCommandFocusRoot], -) -> u8 { - let path = citation - .file_path - .as_deref() - .map(packet_display_path) - .unwrap_or_default() - .replace('\\', "/"); - let parent = path.rsplit_once('/').map(|(parent, _)| parent); - focus_roots - .iter() - .filter(|root| parent == Some(root.root.as_str())) - .map(|root| root.weight) - .max() - .unwrap_or_default() -} - -fn packet_source_navigation_file_score(path: &str) -> u8 { - let normalized = packet_display_path(path).replace('\\', "/"); - let file_name = normalized.rsplit('/').next().unwrap_or(normalized.as_str()); - let stem = file_name - .rsplit_once('.') - .map(|(stem, _)| stem) - .unwrap_or(file_name) - .to_ascii_lowercase(); - match stem.as_str() { - "cli" | "cmd" | "command" | "commands" => 4, - "lib" | "mod" | "index" => 3, - "events" | "event" => 2, - "main" | "app" | "server" | "router" | "routes" => 2, - "handler" | "handlers" | "entrypoint" | "entrypoints" => 1, - _ if stem.ends_with("_events") - || stem.ends_with("_event") - || stem.ends_with("-events") - || stem.ends_with("-event") => - { - 2 + let expanded = match investigate_query_expansion( + controller, + req, + prompt, + max_results, + ask_started_at, + resolved_profile, + trace, + ) { + Ok(expanded) => expanded, + Err(error) => { + trace.annotate(format!( + "Investigation query expansion failed; continuing with initial hits: {}", + error.message + )); + Vec::new() + } + }; + if !expanded.is_empty() { + merge_scored_hits(&mut scored_hits, expanded, max_results); + hits = scored_hits + .iter() + .map(|scored| scored.hit.clone()) + .collect::>(); + bundle.diagnostic_supplement_used = true; + expansion_added_hits = true; } - _ => 0, - } -} -fn packet_prefer_required_probe_match( - query: &str, - candidate: &AgentCitationDto, - candidate_rank: u8, - existing: &AgentCitationDto, - existing_rank: u8, - focus_roots: &[PacketCommandFocusRoot], -) -> bool { - if !query_mentions_non_primary_source(query) { - let candidate_test_like = packet_display_name_is_test_like(&candidate.display_name); - let existing_test_like = packet_display_name_is_test_like(&existing.display_name); - if candidate_test_like != existing_test_like { - return !candidate_test_like; + if initial_hit_count == 0 && expansion_added_hits && !literal_diagnostic_signal { + hits.clear(); + scored_hits.clear(); + trace.annotate( + "Investigation discarded expansion-only hits for an unanchored natural-language query.", + ); } - } - if candidate_rank != existing_rank { - return candidate_rank > existing_rank; - } - if !packet_required_probe_needs_exact_match(query) { - let candidate_focus = packet_citation_focus_root_score(candidate, focus_roots); - let existing_focus = packet_citation_focus_root_score(existing, focus_roots); - if candidate_focus != existing_focus { - return candidate_focus > existing_focus; + + if weak_initial_hits(prompt, &hits) && literal_diagnostic_signal { + trace.annotate( + "Investigation skipped repo-text diagnostics because packet evidence must come from sidecar-backed resolvable hits or direct source reads.", + ); + } else if weak_initial_hits(prompt, &hits) && !is_repo_explanation_prompt(prompt) { + if !hits.is_empty() { + hits.clear(); + scored_hits.clear(); + trace.annotate( + "Investigation discarded low-confidence unanchored hits for a natural-language query.", + ); + } + trace.annotate( + "Repo-text diagnostics are disabled for packet evidence; weak unanchored hits were not promoted.", + ); + } else if weak_initial_hits(prompt, &hits) { + trace.annotate( + "Investigation deferred a broad repo explanation prompt to sidecar evidence only.", + ); } - let candidate_token_coverage = packet_citation_probe_token_coverage(query, candidate); - let existing_token_coverage = packet_citation_probe_token_coverage(query, existing); - if candidate_token_coverage != existing_token_coverage { - return candidate_token_coverage > existing_token_coverage; + + if weak_initial_hits(prompt, &hits) && !is_repo_explanation_prompt(prompt) { + trace.annotate("Investigation low confidence gap after sidecar query expansion."); } - } - if packet_prefer_flow_anchor_path_citation(candidate, existing) { - return true; - } - if packet_required_probe_prefers_implementation(query) - && packet_prefer_implementation_file(candidate, existing) + } else if should_investigate(resolved_profile) + && weak_initial_hits(prompt, &hits) + && promotable_focus_available { - return true; + trace.annotate( + "Investigation kept an explicit or prompt-anchored focus instead of broad diagnostics.", + ); } - packet_exact_definition_file_citation(candidate) - && !packet_exact_definition_file_citation(existing) -} - -fn packet_required_probe_prefers_implementation(query: &str) -> bool { - query.contains("::") || query.contains('.') -} -fn packet_prefer_implementation_file( - candidate: &AgentCitationDto, - existing: &AgentCitationDto, -) -> bool { - let candidate_path = candidate - .file_path - .as_deref() - .map(packet_display_path) - .unwrap_or_default(); - let existing_path = existing - .file_path - .as_deref() - .map(packet_display_path) - .unwrap_or_default(); - packet_path_is_implementation(&candidate_path) && !packet_path_is_implementation(&existing_path) -} + if should_investigate(resolved_profile) + && weak_initial_hits(prompt, &hits) + && is_repo_explanation_prompt(prompt) + && !block_nucleo_supplement + { + let overview_hits = repo_explanation_grounding_hits( + controller, + req, + max_results, + ask_started_at, + resolved_profile, + trace, + )?; + if !overview_hits.is_empty() { + hits = overview_hits; + scored_hits.clear(); + bundle.diagnostic_supplement_used = true; + bundle.repo_explanation_supplement_used = true; + trace.annotate( + "Investigation used grounding snapshot diagnostic supplement for a broad repo explanation prompt.", + ); + } + } else if should_investigate(resolved_profile) + && weak_initial_hits(prompt, &hits) + && is_repo_explanation_prompt(prompt) + && block_nucleo_supplement + { + trace.annotate( + "Grounding snapshot supplement skipped because sidecar-primary retrieval is mandatory.", + ); + } -fn packet_path_is_implementation(path: &str) -> bool { - let lower = path.to_ascii_lowercase(); - matches!( - lower.rsplit('.').next(), - Some( - "c" | "cc" - | "cpp" - | "cxx" - | "go" - | "java" - | "js" - | "jsx" - | "kt" - | "php" - | "py" - | "rb" - | "rs" - | "ts" - | "tsx" - ) - ) -} + let focus_node_id = req + .focus_node_id + .clone() + .or_else(|| investigation_focus_anchor(prompt, &hits)) + .or_else(|| { + hits.iter() + .find(|hit| hit.resolvable) + .map(|hit| hit.node_id.clone()) + }); -#[derive(Debug, Clone, PartialEq, Eq)] -struct PacketCommandFocusRoot { - root: String, - weight: u8, -} + let filter_step = trace.start_step( + AgentRetrievalStepKindDto::TrailFilterOptions, + vec![field("has_focus", focus_node_id.is_some().to_string())], + ); + let filter_options = match controller.graph_trail_filter_options() { + Ok(options) => { + trace.finish_ok( + filter_step, + vec![ + field("edge_kinds", options.edge_kinds.len().to_string()), + field("node_kinds", options.node_kinds.len().to_string()), + ], + ); + options + } + Err(error) => { + trace.finish_err(filter_step, error.message.clone()); + trace + .annotate("Trail filter options unavailable; continuing with unsanitized filters."); + TrailFilterOptionsDto { + node_kinds: Vec::new(), + edge_kinds: Vec::new(), + } + } + }; -fn packet_command_focus_roots(citations: &[AgentCitationDto]) -> Vec { - let mut roots = Vec::::new(); - for citation in citations { - let display = citation.display_name.as_str(); - let normalized_display = normalize_identifier(display); - let path = citation - .file_path - .as_deref() - .map(packet_display_path) - .unwrap_or_default(); - let Some(root) = packet_source_root_from_path(&path) else { - continue; - }; - let weight = - if normalized_display.ends_with("runmain") || normalized_display.contains("runexec") { - 3 - } else if display.contains("::Cli") - || display.contains("::cli") - || path.ends_with("/src/cli.rs") - || path.contains("/cli/src/") - { - 2 - } else if display.contains("Subcommand::") { - 1 - } else { - continue; - }; - packet_push_focus_root(&mut roots, root, weight); - } - roots.sort_by(|left, right| { - right - .weight - .cmp(&left.weight) - .then_with(|| left.root.cmp(&right.root)) - }); - roots -} + let mut primary_graph: Option = None; -fn packet_push_focus_root(roots: &mut Vec, root: String, weight: u8) { - if let Some(existing) = roots.iter_mut().find(|existing| existing.root == root) { - existing.weight = existing.weight.max(weight); - } else { - roots.push(PacketCommandFocusRoot { root, weight }); - } -} + if let Some(center_id) = focus_node_id.clone() { + let neighborhood_step = trace.start_step( + AgentRetrievalStepKindDto::Neighborhood, + vec![field("center_id", center_id.0.clone())], + ); + match controller.graph_neighborhood(GraphRequest { + center_id, + max_edges: Some(DEFAULT_MAX_EDGES), + }) { + Ok(neighborhood) => { + trace.finish_ok( + neighborhood_step, + vec![ + field("nodes", neighborhood.nodes.len().to_string()), + field("edges", neighborhood.edges.len().to_string()), + field("truncated", neighborhood.truncated.to_string()), + ], + ); -fn packet_source_root_from_path(path: &str) -> Option { - let normalized = packet_display_path(path); - let normalized = normalized.trim_matches('/').replace('\\', "/"); - if normalized.is_empty() { - return None; - } - if let Some(index) = normalized.find("/src/") { - let root = &normalized[..index + "/src".len()]; - return (!root.is_empty()).then(|| root.to_string()); + primary_graph = Some(neighborhood.clone()); + bundle.graphs.push(GraphArtifactDto::Uml { + id: "uml-neighborhood".to_string(), + title: "Primary Neighborhood".to_string(), + graph: neighborhood, + }); + } + Err(error) => { + trace.finish_err(neighborhood_step, error.message.clone()); + trace.annotate("Neighborhood retrieval failed; continuing with trail retrieval."); + } + } + } else { + let neighborhood_step = trace.start_step( + AgentRetrievalStepKindDto::Neighborhood, + vec![field("has_focus", "false")], + ); + trace.finish_skipped(neighborhood_step, "No focus node selected.", Vec::new()); } - let (parent, _) = normalized.rsplit_once('/')?; - (!parent.is_empty()).then(|| parent.to_string()) -} -fn packet_citation_focus_root_score( - citation: &AgentCitationDto, - focus_roots: &[PacketCommandFocusRoot], -) -> u8 { - let path = citation - .file_path - .as_deref() - .map(packet_display_path) - .unwrap_or_default() - .replace('\\', "/"); - focus_roots + let sanitized_plans = resolved_profile + .trail_plans .iter() - .filter(|root| path == root.root || path.starts_with(&format!("{}/", root.root))) - .map(|root| root.weight) - .max() - .unwrap_or_default() -} - -fn maybe_annotate_packet_candidate_window( - question: &str, - limits: &PacketBudgetLimitsDto, - answer: &mut AgentAnswerDto, -) { - let Ok(filter) = std::env::var("CODESTORY_PACKET_CANDIDATE_TRACE") else { - return; - }; - let trace_terms = filter - .split(|ch: char| ch == ',' || ch == ';' || ch.is_whitespace()) - .map(normalize_identifier) - .filter(|term| !term.is_empty()) + .map(|plan| sanitize_plan_filters(plan, &filter_options)) .collect::>(); - if trace_terms.is_empty() { - return; - } - let rank_terms = packet_rank_terms(question); - let prefer_primary_sources = !query_mentions_non_primary_source(question); - let broad_window = (limits.max_anchors as usize).saturating_mul(2).max(8); - let mut rows = Vec::new(); - let mut matched = 0usize; - for (index, citation) in answer.citations.iter().enumerate() { - let matches_filter = packet_candidate_matches_trace_terms(citation, &trace_terms); - if matches_filter { - matched = matched.saturating_add(1); - } - if index >= broad_window && !matches_filter { - continue; - } - if rows.len() >= 64 { - break; + if focus_node_id.is_none() { + let trail_step = trace.start_step( + AgentRetrievalStepKindDto::Trail, + vec![field("plans", sanitized_plans.len().to_string())], + ); + trace.finish_skipped(trail_step, "No focus node selected.", Vec::new()); + } else { + for (idx, plan) in sanitized_plans.iter().enumerate() { + let trail_step = trace.start_step( + AgentRetrievalStepKindDto::Trail, + vec![ + field("index", idx.to_string()), + field("mode", format!("{:?}", plan.mode)), + field("depth", plan.depth.to_string()), + field("direction", format!("{:?}", plan.direction)), + field("max_nodes", plan.max_nodes.to_string()), + field("hide_speculative", "true"), + ], + ); + + let root_id = focus_node_id.clone().expect("checked focus node"); + let request = agent_trail_request(root_id, plan); + + match controller.graph_trail(request) { + Ok(trail) => { + let trail_output = vec![ + field("nodes", trail.nodes.len().to_string()), + field("edges", trail.edges.len().to_string()), + field("max_nodes", plan.max_nodes.to_string()), + field("truncated", trail.truncated.to_string()), + field("omitted_edges", trail.omitted_edge_count.to_string()), + ]; + if trail.truncated { + trace.finish_truncated( + trail_step, + format!( + "Trail output hit max_nodes={}; narrow focus or lower depth.", + plan.max_nodes + ), + trail_output, + ); + trace.annotate(trail_truncated_annotation(idx + 1, plan.max_nodes)); + } else { + trace.finish_ok(trail_step, trail_output); + } + bundle.graphs.push(GraphArtifactDto::Uml { + id: format!("uml-trail-{}", idx + 1), + title: format!("Trail {}", idx + 1), + graph: trail, + }); + } + Err(error) => { + trace.finish_err(trail_step, error.message.clone()); + trace.annotate(format!("Trail {} failed and was skipped.", idx + 1)); + } + } } - rows.push(packet_candidate_trace_row( - index, - citation, - &rank_terms, - prefer_primary_sources, - matches_filter, - )); } - answer.retrieval_trace.annotations.push(format!( - "packet_candidate_trace filter=`{}` candidates={} matched={} max_anchors={} rows={}", - filter.replace('`', "'"), - answer.citations.len(), - matched, - limits.max_anchors, - rows.join(" | ") - )); -} -fn packet_candidate_matches_trace_terms( - citation: &AgentCitationDto, - trace_terms: &[String], -) -> bool { - let normalized_display = normalize_identifier(&citation.display_name); - let normalized_path = normalize_identifier(citation.file_path.as_deref().unwrap_or_default()); - trace_terms.iter().any(|term| { - normalized_display.contains(term) - || normalized_path.contains(term) - || (!normalized_display.is_empty() && term.contains(&normalized_display)) - }) -} - -fn packet_candidate_trace_row( - index: usize, - citation: &AgentCitationDto, - rank_terms: &[String], - prefer_primary_sources: bool, - matches_filter: bool, -) -> String { - let role = packet_evidence_role(citation); - let claim = role - .map(|role| packet_claim_key_for_citation(role, citation)) - .unwrap_or_else(|| "-".to_string()); - format!( - "#{}{} rank={:.3} score={:.3} claim={} role={} kind={:?} name=`{}` path={} line={}", - index + 1, - if matches_filter { "*" } else { "" }, - packet_citation_rank(citation, rank_terms, prefer_primary_sources), - citation.score, - claim, - role.unwrap_or("-"), - citation.kind, - citation.display_name.replace('`', "'"), - citation - .file_path - .as_deref() - .map(packet_display_path) - .unwrap_or_default(), - citation - .line - .map(|line| line.to_string()) - .unwrap_or_else(|| "-".to_string()) - ) -} - -fn packet_rank_terms(question: &str) -> Vec { - let mut terms = prompt_search_terms(question); - for term in extract_packet_query_terms(question) { - push_unique_term(&mut terms, &term); - } - for query in packet_symbol_probe_queries( - question, - infer_packet_task_class(question), - PacketBudgetModeDto::Standard, - ) { - push_unique_term(&mut terms, &normalize_identifier(&query)); - } - terms -} - -fn append_packet_evidence_sections( - answer: &mut AgentAnswerDto, - _task_class: PacketTaskClassDto, - limits: &PacketBudgetLimitsDto, -) { - if answer.citations.is_empty() { - return; - } - - let ledger_markdown = packet_evidence_ledger_markdown(answer, limits); - answer.sections.insert( - 0, - AgentResponseSectionDto { - id: "packet-evidence-ledger".to_string(), - title: "Packet Evidence Ledger".to_string(), - blocks: vec![AgentResponseBlockDto::Markdown { - markdown: ledger_markdown, - }], - }, - ); - - let claims = packet_supported_claims(answer); - if !claims.is_empty() { - answer.sections.insert( - 1, - AgentResponseSectionDto { - id: "packet-flow-claims".to_string(), - title: "Packet Claims".to_string(), - blocks: vec![AgentResponseBlockDto::Markdown { - markdown: packet_flow_claims_markdown(&claims), - }], - }, - ); - } -} - -fn packet_evidence_ledger_markdown( - answer: &AgentAnswerDto, - limits: &PacketBudgetLimitsDto, -) -> String { - let mut markdown = String::new(); - markdown.push_str( - "Use these cited anchors first. They are ranked for the task wording before lower-confidence retrieval diagnostics.\n", - ); - for citation in answer.citations.iter().take(limits.max_anchors as usize) { - let _ = writeln!(markdown, "{}", packet_evidence_ledger_row(citation)); - } - markdown -} - -fn packet_evidence_ledger_row(citation: &AgentCitationDto) -> String { - let path = citation - .file_path - .as_deref() - .map(packet_display_path) - .unwrap_or_else(|| "".to_string()); - let line = citation - .line - .map(|line| format!(":{line}")) - .unwrap_or_default(); - let role = packet_evidence_role(citation).unwrap_or("source evidence"); - format!( - "- `{}` ({:?}) - `{}`{} - {} - score {:.3}", - citation.display_name, citation.kind, path, line, role, citation.score - ) -} - -fn packet_flow_claims_markdown(claims: &[PacketClaimDto]) -> String { - let mut markdown = String::new(); - markdown.push_str("Supported claims for a compact agent answer:\n"); - for claim in claims { - let citation = claim.citations.first(); - let suffix = citation - .and_then(|citation| citation.file_path.as_deref()) - .map(packet_display_path) - .map(|path| format!(" (`{path}`)")) - .unwrap_or_default(); - let _ = writeln!(markdown, "- {}{}", claim.claim, suffix); - } - markdown -} - -fn packet_architecture_flow_probe_terms(prompt: &str) -> Vec { - let lower = prompt.to_ascii_lowercase(); - let mut terms = Vec::new(); - if prompt_mentions_indexing_flow(&lower) { - for term in [ - "index service", - "workspace execution plan", - "workspace indexer", - "symbol extraction indexer", - "search projection", - "snapshot refresh", - ] { - push_unique_term(&mut terms, term); - } - } - push_eval_architecture_flow_probe_terms(&lower, &mut terms); - terms -} - -fn prompt_mentions_indexing_flow(lower: &str) -> bool { - contains_any(lower, &["indexing", "indexer", "indexed", " index "]) - && contains_any( - lower, - &[ - "cli", - "command", - "discovery", - "extraction", - "file", - "persistence", - "projection", - "refresh", - "runtime", - "search", - "snapshot", - "storage", - "store", - "symbol", - "workspace", - ], - ) -} - -fn packet_push_flow_template_claim( - claims: &mut Vec, - seen: &mut HashSet, - claim_text: &str, - citation: Option, -) { - packet_push_flow_template_claim_with_citations( - claims, - seen, - claim_text, - citation.map(|value| vec![value]).unwrap_or_default(), - ); -} - -fn packet_push_flow_template_claim_with_citations( - claims: &mut Vec, - seen: &mut HashSet, - claim_text: &str, - citations: Vec, -) { - let key = normalize_identifier(claim_text); - if key.is_empty() || !seen.insert(key) { - return; - } - claims.push(PacketClaimDto { - claim: claim_text.to_string(), - citations, - }); -} - -fn packet_append_flow_template_claims( - prompt: &str, - citations: &[AgentCitationDto], - claims: &mut Vec, - seen: &mut HashSet, -) { - let normalized_prompt = normalize_identifier(prompt); - - packet_append_command_flow_template_claims(prompt, citations, claims, seen); - packet_append_indexing_pipeline_flow_template_claims(prompt, citations, claims, seen); - if !eval_probes_enabled() { - return; - } - packet_append_indexing_storage_flow_template_claims(prompt, citations, claims, seen); - for (claim, citation) in eval_flow_template_claims(&normalized_prompt, citations) { - packet_push_flow_template_claim(claims, seen, &claim, Some(citation)); - } -} - -fn packet_append_indexing_pipeline_flow_template_claims( - prompt: &str, - citations: &[AgentCitationDto], - claims: &mut Vec, - seen: &mut HashSet, -) { - let normalized_prompt = normalize_identifier(prompt); - let indexing_prompt = normalized_prompt.contains("indexing") - || normalized_prompt.contains("indexed") - || normalized_prompt.contains("indexer") - || normalized_prompt.contains("indexcommand"); - if !(indexing_prompt - && normalized_prompt.contains("runtime") - && (normalized_prompt.contains("workspace") - || normalized_prompt.contains("sourcefile") - || normalized_prompt.contains("filediscovery")) - && (normalized_prompt.contains("persistence") || normalized_prompt.contains("store")) - && normalized_prompt.contains("snapshot")) - { - return; - } - - let cli_entry = packet_citation_matching_display(citations, "run_index") - .or_else(|| packet_citation_matching_display(citations, "Command::Index")) - .or_else(|| packet_citation_matching_display(citations, "IndexCommand")) - .or_else(|| packet_citation_matching_display(citations, "CliDirection")); - let runtime_entry = - packet_citation_matching_display_contains(citations, "IndexService::run_indexing") - .or_else(|| packet_citation_matching_display(citations, "Runtime::index_service")); - if let Some(runtime_entry) = runtime_entry { - let mut claim_citations = Vec::new(); - if let Some(cli_entry) = cli_entry { - claim_citations.push(cli_entry.clone()); - } - claim_citations.push(runtime_entry.clone()); - packet_push_flow_template_claim_with_citations( - claims, - seen, - "The CLI index command prepares command options and delegates indexing work into the runtime layer.", - claim_citations, - ); - } - - let workspace_plan = - packet_citation_matching_display(citations, "WorkspaceManifest::build_execution_plan"); - if let Some(runtime_entry) = runtime_entry { - let mut claim_citations = vec![runtime_entry.clone()]; - if let Some(workspace_plan) = workspace_plan { - claim_citations.push(workspace_plan.clone()); - } - packet_push_flow_template_claim_with_citations( - claims, - seen, - "The runtime opens the workspace and store, chooses full or incremental indexing, and coordinates later refresh phases.", - claim_citations, - ); - } - - if let Some(workspace_plan) = workspace_plan { - packet_push_flow_template_claim( - claims, - seen, - "The workspace crate is responsible for source-file discovery and refresh-plan construction.", - Some(workspace_plan.clone()), - ); - } - - let workspace_indexer = packet_citation_matching_display(citations, "WorkspaceIndexer::run"); - let index_file = packet_citation_matching_display(citations, "index_file"); - if workspace_indexer.is_some() || index_file.is_some() { - let mut claim_citations = Vec::new(); - if let Some(workspace_indexer) = workspace_indexer { - claim_citations.push(workspace_indexer.clone()); - } - if let Some(index_file) = index_file { - claim_citations.push(index_file.clone()); - } - packet_push_flow_template_claim_with_citations( - claims, - seen, - "The indexer extracts nodes, edges, occurrences, and related symbol data from source files.", - claim_citations, - ); - } - - let storage_flush = - packet_citation_matching_display(citations, "Storage::flush_projection_batch"); - let search_projection = packet_citation_matching_display( - citations, - "Storage::rebuild_search_symbol_projection_from_node_table", - ); - if storage_flush.is_some() || search_projection.is_some() { - let mut claim_citations = Vec::new(); - if let Some(storage_flush) = storage_flush { - claim_citations.push(storage_flush.clone()); - } - if let Some(search_projection) = search_projection { - claim_citations.push(search_projection.clone()); - } - packet_push_flow_template_claim_with_citations( - claims, - seen, - "The store persists graph and file data to SQLite and rebuilds query/search projections from persisted data.", - claim_citations, - ); - } - - if let Some(snapshot_refresh) = - packet_citation_matching_display(citations, "SnapshotStore::refresh_all_with_stats") - { - packet_push_flow_template_claim( - claims, - seen, - "Snapshot refresh happens after persisted data changes so later grounding and summary reads see current indexed state.", - Some(snapshot_refresh.clone()), - ); - } -} - -fn packet_append_indexing_storage_flow_template_claims( - prompt: &str, - citations: &[AgentCitationDto], - claims: &mut Vec, - seen: &mut HashSet, -) { - let normalized_prompt = normalize_identifier(prompt); - let indexing_prompt = normalized_prompt.contains("indexing") - || normalized_prompt.contains("indexed") - || normalized_prompt.contains("indexer"); - let storage_prompt = normalized_prompt.contains("storage") - || normalized_prompt.contains("persistent") - || normalized_prompt.contains("sourcegroup") - || normalized_prompt.contains("sourcegroupconfiguration"); - if !(indexing_prompt && storage_prompt) { - return; - } - - let source_group = citations - .iter() - .find(|citation| packet_evidence_role(citation) == Some("source-group configuration")); - let indexing_work = citations - .iter() - .find(|citation| packet_evidence_role(citation) == Some("indexing work queue")); - if let Some(source_group) = source_group - && let Some(indexing_work) = indexing_work - { - packet_push_flow_template_claim_with_citations( - claims, - seen, - "Source-group configuration and indexing command evidence describe how repository configuration becomes indexing work.", - vec![source_group.clone(), indexing_work.clone()], - ); - } - - if let Some(persistence) = citations.iter().find(|citation| { - packet_evidence_role(citation) == Some("persistence and search projection") - }) { - packet_push_flow_template_claim( - claims, - seen, - "Persistence/search-projection evidence describes how indexed data remains available to later application reads.", - Some(persistence.clone()), - ); - } -} - -fn packet_append_command_flow_template_claims( - prompt: &str, - citations: &[AgentCitationDto], - claims: &mut Vec, - seen: &mut HashSet, -) { - let normalized_prompt = normalize_identifier(prompt); - if !(normalized_prompt.contains("cli") - || normalized_prompt.contains("command") - || normalized_prompt.contains("subcommand")) - { - return; - } - - for descriptor in packet_command_descriptors(prompt) { - let subcommand_display = format!("Subcommand::{}", descriptor.subcommand_title); - let cli_display = format!("{}::Cli", descriptor.module); - let run_main_display = format!("{}::run_main", descriptor.module); - let subcommand_citation = packet_citation_matching_display(citations, &subcommand_display); - let cli_citation = packet_citation_matching_display(citations, &cli_display); - let run_main_citation = packet_citation_matching_display(citations, &run_main_display) - .or_else(|| { - packet_citation_matching_path_and_display( - citations, - &descriptor.crate_segment, - "run_main", - ) - }); - - if let Some(subcommand_citation) = subcommand_citation - && (cli_citation.is_some() || run_main_citation.is_some()) - { - let mut claim_citations = vec![subcommand_citation.clone()]; - if let Some(cli_citation) = cli_citation { - claim_citations.push(cli_citation.clone()); - } else if let Some(run_main_citation) = run_main_citation { - claim_citations.push(run_main_citation.clone()); - } - let claim = format!( - "The top-level {} CLI has a cited {} subcommand and command-module entrypoint in `{}`.", - descriptor.command_title, descriptor.subcommand_title, descriptor.module - ); - packet_push_flow_template_claim_with_citations(claims, seen, &claim, claim_citations); - } - - if let Some(cli_citation) = cli_citation - && let Some(run_main_citation) = run_main_citation - { - packet_push_flow_template_claim_with_citations( - claims, - seen, - &format!( - "The {} binary parses {}-specific CLI options and calls {}::run_main.", - descriptor.module.replace('_', "-"), - descriptor.crate_segment, - descriptor.module - ), - vec![cli_citation.clone(), run_main_citation.clone()], - ); - if (normalized_prompt.contains("json") || normalized_prompt.contains("jsonl")) - && packet_command_crate_sources_contain_all( - citations, - &descriptor.crate_segment, - &[&["long = \"json\"", "--json"], &["jsonl"]], - ) - { - packet_push_flow_template_claim( - claims, - seen, - &format!( - "The {} CLI defines --json as the switch that chooses JSONL stdout output.", - descriptor.crate_segment - ), - Some(cli_citation.clone()), - ); - } - } - - let runtime_citation = run_main_citation.or_else(|| { - packet_citation_matching_path_and_display( - citations, - &descriptor.crate_segment, - "run_exec_session", - ) - }); - if let Some(runtime_citation) = runtime_citation - && (normalized_prompt.contains("appserver") - || normalized_prompt.contains("runtime") - || normalized_prompt.contains("thread") - || normalized_prompt.contains("turn")) - && packet_command_crate_sources_contain_all( - citations, - &descriptor.crate_segment, - &[ - &[ - "configbuilder", - "configbuilder::default", - "configbuilder::default()", - ], - &["approval"], - &["sandbox"], - &["inprocessclientstartargs"], - ], - ) - { - packet_push_flow_template_claim( - claims, - seen, - "run_main loads config, resolves sandbox and approval settings, and builds the in-process app-server start arguments.", - Some(runtime_citation.clone()), - ); - } - } - - if (normalized_prompt.contains("json") || normalized_prompt.contains("jsonl")) - && (normalized_prompt.contains("event") || normalized_prompt.contains("output")) - && let Some(json_output_citation) = citations - .iter() - .find(|citation| packet_evidence_role(citation) == Some("event output processing")) - { - packet_push_flow_template_claim( - claims, - seen, - "Event-output processing evidence describes how structured runtime events are serialized for JSON/JSONL output.", - Some(json_output_citation.clone()), - ); - } -} - -fn packet_citation_matching_display<'a>( - citations: &'a [AgentCitationDto], - display_needle: &str, -) -> Option<&'a AgentCitationDto> { - let needle = normalize_identifier(display_needle); - citations - .iter() - .find(|citation| normalize_identifier(&citation.display_name) == needle) -} - -fn packet_citation_matching_display_contains<'a>( - citations: &'a [AgentCitationDto], - display_needle: &str, -) -> Option<&'a AgentCitationDto> { - let needle = normalize_identifier(display_needle); - citations - .iter() - .find(|citation| normalize_identifier(&citation.display_name).contains(&needle)) -} - -fn packet_citation_matching_path_and_display<'a>( - citations: &'a [AgentCitationDto], - path_needle: &str, - display_needle: &str, -) -> Option<&'a AgentCitationDto> { - let normalized_path_needle = normalize_identifier(path_needle); - let normalized_display_needle = normalize_identifier(display_needle); - citations.iter().find(|citation| { - let path_match = citation - .file_path - .as_deref() - .map(packet_display_path) - .map(|path| normalize_identifier(&path).contains(&normalized_path_needle)) - .unwrap_or(false); - path_match - && normalize_identifier(&citation.display_name).contains(&normalized_display_needle) - }) -} - -fn packet_command_crate_sources_contain_all( - citations: &[AgentCitationDto], - crate_segment: &str, - groups: &[&[&str]], -) -> bool { - let mut combined = String::new(); - for citation in citations - .iter() - .filter(|citation| packet_citation_path_contains_crate_segment(citation, crate_segment)) - { - let Some(source) = packet_citation_source_text(citation) else { - continue; - }; - combined.push_str(&source.to_ascii_lowercase()); - combined.push('\n'); - } - !combined.is_empty() - && groups.iter().all(|terms| { - terms - .iter() - .any(|term| combined.contains(&term.to_ascii_lowercase())) - }) -} - -fn packet_citation_path_contains_crate_segment( - citation: &AgentCitationDto, - crate_segment: &str, -) -> bool { - let crate_segment = normalize_identifier(crate_segment); - if crate_segment.is_empty() { - return false; - } - citation - .file_path - .as_deref() - .map(|path| { - let raw = path.trim_start_matches("\\\\?\\").replace('\\', "/"); - let display = packet_display_path(path).replace('\\', "/"); - format!("{raw}\n{display}").to_ascii_lowercase() - }) - .map(|path| { - let needle = format!("/{crate_segment}/src/"); - path.contains(&needle) - }) - .unwrap_or(false) -} - -fn packet_citation_source_text(citation: &AgentCitationDto) -> Option { - let path = citation.file_path.as_deref()?; - std::fs::read_to_string(path).ok() -} - -fn packet_append_source_definition_claims( - citations: &[AgentCitationDto], - rank_terms: &[String], - claims: &mut Vec, - seen_claims: &mut HashSet, -) { - let normalized_terms = rank_terms - .iter() - .map(|term| normalize_identifier(term)) - .filter(|term| term.len() >= 6) - .collect::>(); - let rank_tokens = packet_definition_rank_tokens(rank_terms); - if normalized_terms.is_empty() && rank_tokens.is_empty() { - return; - } - - let mut seen_definitions = HashSet::new(); - let mut appended = 0; - for citation in citations.iter().take(24) { - let Some(source) = packet_citation_source_text(citation) else { - continue; - }; - if source.len() > 400_000 { - continue; - } - for line in source.lines().take(4_000) { - let Some(definition) = packet_source_definition_name(line) else { - continue; - }; - let normalized_definition = normalize_identifier(&definition); - if !packet_definition_matches_rank_terms( - &definition, - &normalized_definition, - &normalized_terms, - &rank_tokens, - ) { - continue; - } - let path = citation - .file_path - .as_deref() - .map(packet_display_path) - .unwrap_or_else(|| "".to_string()); - let definition_key = format!("{normalized_definition}:{path}"); - if !seen_definitions.insert(definition_key) { - continue; - } - packet_push_flow_template_claim( - claims, - seen_claims, - &format!( - "`{definition}` is defined in cited source `{path}` and should be treated as an exact source anchor for this flow." - ), - Some(citation.clone()), - ); - appended += 1; - if claims.len() >= 18 { - return; - } - if appended >= PACKET_SOURCE_DEFINITION_CLAIM_LIMIT { - return; - } - } - } -} - -fn packet_source_definition_name(line: &str) -> Option { - let trimmed = line.trim_start(); - for prefix in [ - "pub async fn ", - "pub(crate) async fn ", - "async fn ", - "pub fn ", - "pub(crate) fn ", - "fn ", - "pub struct ", - "pub(crate) struct ", - "struct ", - "pub enum ", - "pub(crate) enum ", - "enum ", - "pub trait ", - "pub(crate) trait ", - "trait ", - "export class ", - "class ", - "export interface ", - "interface ", - "export function ", - "function ", - "export const ", - "const ", - "export type ", - "type ", - ] { - if let Some(rest) = trimmed.strip_prefix(prefix) { - return packet_take_definition_identifier(rest); - } - } - None -} - -fn packet_take_definition_identifier(rest: &str) -> Option { - let mut identifier = String::new(); - for ch in rest.chars() { - if ch.is_ascii_alphanumeric() || ch == '_' || ch == '$' { - identifier.push(ch); - } else { - break; - } - } - (identifier.len() >= 3).then_some(identifier) -} - -fn packet_definition_matches_rank_terms( - definition: &str, - normalized_definition: &str, - normalized_terms: &[String], - rank_tokens: &HashSet, -) -> bool { - if normalized_definition.len() < 6 { - return false; - } - if normalized_terms - .iter() - .any(|term| term == normalized_definition) - { - return true; - } - let definition_tokens = packet_identifier_tokens(definition); - let overlap = definition_tokens - .iter() - .filter(|token| rank_tokens.contains(token.as_str())) - .count(); - overlap >= 2 || (definition_tokens.iter().any(|token| token == "exec") && overlap >= 1) -} - -fn packet_definition_rank_tokens(rank_terms: &[String]) -> HashSet { - rank_terms - .iter() - .flat_map(|term| packet_identifier_tokens(term)) - .filter(|term| { - term.len() >= 3 - && !matches!( - term.as_str(), - "the" | "and" | "for" | "with" | "from" | "into" | "flow" | "flows" - ) - }) - .collect() -} - -fn packet_identifier_tokens(identifier: &str) -> Vec { - let mut tokens = Vec::new(); - let mut current = String::new(); - let mut previous_lower_or_digit = false; - for ch in identifier.chars() { - if ch == '_' || ch == '-' || ch == '$' || ch.is_whitespace() { - if !current.is_empty() { - tokens.push(current.clone()); - current.clear(); - } - previous_lower_or_digit = false; - continue; - } - if ch.is_ascii_uppercase() && previous_lower_or_digit && !current.is_empty() { - tokens.push(current.clone()); - current.clear(); - } - if ch.is_ascii_alphanumeric() { - current.extend(ch.to_lowercase()); - previous_lower_or_digit = ch.is_ascii_lowercase() || ch.is_ascii_digit(); - } else if !current.is_empty() { - tokens.push(current.clone()); - current.clear(); - previous_lower_or_digit = false; - } - } - if !current.is_empty() { - tokens.push(current); - } - tokens -} - -fn packet_supported_claims(answer: &AgentAnswerDto) -> Vec { - let mut claims = Vec::new(); - let mut seen_claims = HashSet::new(); - let rank_terms = packet_rank_terms(&answer.prompt); - let prefer_primary_sources = !query_mentions_non_primary_source(&answer.prompt); - let citations = answer.citations.clone(); - - packet_append_flow_template_claims(&answer.prompt, &citations, &mut claims, &mut seen_claims); - - let mut ordered_citations = citations; - ordered_citations.sort_by(|left, right| { - packet_claim_carry_rank(right, &rank_terms, prefer_primary_sources) - .partial_cmp(&packet_claim_carry_rank( - left, - &rank_terms, - prefer_primary_sources, - )) - .unwrap_or(Ordering::Equal) - }); - for citation in &ordered_citations { - if let Some(shaped) = packet_citation_shaped_claim(citation, &answer.prompt) { - let key = normalize_identifier(&shaped); - if seen_claims.insert(key) { - claims.push(PacketClaimDto { - claim: shaped, - citations: vec![citation.clone()], - }); - } - continue; - } - let role = match packet_evidence_role(citation) { - Some("tests and regression coverage") => { - let lower = answer.prompt.to_ascii_lowercase(); - if lower.contains("test") - || lower.contains("regression") - || lower.contains("edit") - || lower.contains("plan") - { - "tests and regression coverage" - } else { - continue; - } - } - Some(role) => role, - None => "source evidence", - }; - let claim_key = packet_claim_key_for_citation(role, citation); - if !seen_claims.insert(claim_key.clone()) { - continue; - } - claims.push(PacketClaimDto { - claim: packet_claim_for_role(&claim_key, role, citation, &answer.prompt), - citations: vec![citation.clone()], - }); - if claims.len() >= 18 { - break; - } - } - if claims.len() < 18 { - packet_append_source_definition_claims( - &ordered_citations, - &rank_terms, - &mut claims, - &mut seen_claims, - ); - } - claims -} - -fn packet_claim_key_for_citation(role: &'static str, citation: &AgentCitationDto) -> String { - format!("{role}:{}", normalize_identifier(&citation.display_name)) -} - -fn packet_evidence_role(citation: &AgentCitationDto) -> Option<&'static str> { - let display = citation.display_name.to_ascii_lowercase(); - let normalized_display = normalize_identifier(&citation.display_name); - let path = citation - .file_path - .as_deref() - .map(packet_display_path) - .unwrap_or_default() - .to_ascii_lowercase(); - - if path_contains_test_segment(&path) - || path.ends_with("_test.go") - || path.ends_with(".test.ts") - || packet_display_name_is_test_like(&display) - { - Some("tests and regression coverage") - } else if normalized_display.contains("sourcegroup") - || path.contains("source_group") - || path.contains("sourcegroup") - { - Some("source-group configuration") - } else if normalized_display.contains("buildindex") - || normalized_display.contains("taskfillindexercommandsqueue") - || normalized_display.contains("indexercommand") - || normalized_display.contains("javaindexer") - || path.contains("/data/indexer/") - { - Some("indexing work queue") - } else if display_is_command_entrypoint(&citation.display_name, &normalized_display, &path) { - Some("command entrypoint") - } else if display.contains("eventprocessor") - || display.contains("event_processor") - || display.contains("jsonl") - || path.contains("event_processor") - || path.contains("_events") - || path.contains("-events") - || path.contains("jsonl") - { - Some("event output processing") - } else if (display.contains("thread") || display.contains("turn")) - && display.contains("startparams") - || path.contains("/protocol/") - { - Some("app-server request protocol") - } else if display.contains("run_exec") - || display.contains("run_main") - || display.contains("service") - || display.contains("orchestrat") - || display.contains("runtime") - || path.contains("runtime") - { - Some("runtime orchestration") - } else if display.contains("manifest") || display.contains("plan") || path.contains("workspace") - { - Some("workspace discovery and planning") - } else if display.contains("snapshot") || display.contains("refresh") { - Some("snapshot refresh") - } else if display.contains("projection") - || display.contains("persist") - || display.contains("storage") - || display.contains("store") - || path.contains("store") - { - Some("persistence and search projection") - } else if display.contains("indexer") - || display.contains("index_file") - || display.contains("symbol") - || path.contains("indexer") - { - Some("symbol extraction") - } else if display.contains("route") - || display.contains("handler") - || display.contains("router") - || path.contains("/route.") - || path.ends_with("/route.ts") - || path.ends_with("/route.tsx") - { - Some("route handling") - } else if path.contains("/collections/") { - Some("collection configuration") - } else if matches!(citation.kind, NodeKind::FUNCTION | NodeKind::METHOD) - && retrieval_file_role_from_path(&path) == crate::RetrievalFileRole::Source - { - Some("source evidence") - } else { - None - } -} - -fn display_is_command_entrypoint(display: &str, normalized_display: &str, path: &str) -> bool { - if normalized_display == "main" || display.ends_with("::main") { - return true; - } - if path.contains("/cli/") || path.contains("\\cli\\") { - return true; - } - if display.starts_with("Cli") - && display - .chars() - .nth(3) - .is_some_and(|ch| ch.is_uppercase() || ch == '_') - { - return true; - } - if display.contains("::Cli") || display.contains("::cli") { - return true; - } - let lower = display.to_ascii_lowercase(); - lower.contains("commands") && !lower.contains("process") -} - -fn packet_source_evidence_flow_sentence(prompt: &str, focus: &str) -> String { - let normalized_prompt = normalize_identifier(prompt); - if let Some(sentence) = eval_supporting_claim_flow_sentence(&normalized_prompt, focus) { - return sentence; - } - format!( - "supports {focus} in this flow; inspect the cited source, local definitions, and adjacent ownership there" - ) -} - -fn packet_claim_flow_terms(prompt: &str, citation: &AgentCitationDto) -> Vec { - let display = normalize_identifier(&citation.display_name); - let path = normalize_identifier(citation.file_path.as_deref().unwrap_or_default()); - let mut terms = Vec::new(); - for term in packet_rank_terms(prompt) { - if term.len() < 4 || packet_query_stop_term(&term) || packet_adjacent_query_stop_term(&term) - { - continue; - } - let normalized = normalize_identifier(&term); - if normalized.is_empty() { - continue; - } - if (display.contains(&normalized) || path.contains(&normalized)) - && terms.iter().all(|existing| existing != &normalized) - { - terms.push(normalized); - } - if terms.len() >= 4 { - break; - } - } - terms -} - -fn packet_citation_shaped_claim(citation: &AgentCitationDto, prompt: &str) -> Option { - let path = citation - .file_path - .as_deref() - .map(packet_display_path) - .unwrap_or_default(); - eval_citation_shaped_claim(citation, prompt, &path) -} - -fn packet_claim_for_role( - _key: &str, - role: &str, - citation: &AgentCitationDto, - prompt: &str, -) -> String { - if let Some(shaped) = packet_citation_shaped_claim(citation, prompt) { - return shaped; - } - let symbol = citation.display_name.as_str(); - let path = citation - .file_path - .as_deref() - .map(packet_display_path) - .unwrap_or_default(); - match role { - "command entrypoint" => format!( - "The command or public entrypoint for this flow is anchored by `{symbol}`; inspect it before following downstream coordination." - ), - "runtime orchestration" => format!( - "Runtime orchestration is anchored by `{symbol}`; verify coordination, state transitions, and downstream service calls there." - ), - "workspace discovery and planning" => format!( - "Workspace discovery or planning is anchored by `{symbol}`; inspect it for file selection, manifest, or execution-plan behavior." - ), - "source-group configuration" => format!( - "Source-group configuration is anchored by `{symbol}`; inspect it for how project settings become source-group-specific indexing inputs." - ), - "indexing work queue" => format!( - "Indexing work queue behavior is anchored by `{symbol}`; inspect it for build-index commands, parser handoff, or source-file work items." - ), - "symbol extraction" => format!( - "Symbol extraction is anchored by `{symbol}`; inspect it for nodes, edges, occurrences, or file-level indexing." - ), - "persistence and search projection" => format!( - "Persistence or search projection is anchored by `{symbol}`; inspect it for durable graph/search state." - ), - "snapshot refresh" => format!( - "Snapshot refresh is anchored by `{symbol}`; inspect it for post-write summary or cache refresh behavior." - ), - "route handling" => format!( - "Route handling is anchored by `{symbol}`; inspect it before tracing request dispatch or handler ownership." - ), - "collection configuration" => format!( - "Collection configuration is anchored by `{symbol}`; inspect schema fields, hooks, and access rules." - ), - "event output processing" => format!( - "JSON/event output processing is anchored by `{symbol}`; inspect it for typed event serialization and stdout behavior." - ), - "app-server request protocol" => format!( - "App-server request protocol evidence is anchored by `{symbol}`; inspect it for thread or turn start request shape." - ), - "tests and regression coverage" => format!( - "Regression coverage for this flow is anchored by `{symbol}`; use it to choose focused verification before broader suites." - ), - "source evidence" => { - let flow_terms = packet_claim_flow_terms(prompt, citation); - let focus = if flow_terms.is_empty() { - "this flow".to_string() - } else { - flow_terms.join(", ") - }; - format!( - "`{symbol}` in `{path}` {}; inspect definitions and downstream handoff there.", - packet_source_evidence_flow_sentence(prompt, &focus) - ) - } - _ => format!("Evidence for this flow is anchored by `{symbol}`."), - } -} - -fn path_contains_test_segment(path: &str) -> bool { - path.starts_with("test/") - || path.starts_with("tests/") - || path.contains("/test/") - || path.contains("/tests/") - || path.contains("-test-") - || path.contains("_test_") - || path.contains("_tests.") - || path.starts_with("test\\") - || path.starts_with("tests\\") - || path.contains("\\test\\") - || path.contains("\\tests\\") -} - -fn packet_retrieval_profile( - task_class: Option, - budget: PacketBudgetModeDto, - limits: &PacketBudgetLimitsDto, -) -> AgentRetrievalProfileSelectionDto { - let preset = match task_class { - Some(PacketTaskClassDto::BugLocalization) | Some(PacketTaskClassDto::EditPlanning) => { - AgentRetrievalPresetDto::Investigate - } - Some(PacketTaskClassDto::ChangeImpact) | Some(PacketTaskClassDto::SymbolOwnership) => { - AgentRetrievalPresetDto::Impact - } - Some(PacketTaskClassDto::RouteTracing) => AgentRetrievalPresetDto::Callflow, - Some(PacketTaskClassDto::ArchitectureExplanation) - | Some(PacketTaskClassDto::DataFlow) - | None => AgentRetrievalPresetDto::Architecture, - }; - - if matches!( - budget, - PacketBudgetModeDto::Tiny | PacketBudgetModeDto::Compact - ) { - return AgentRetrievalProfileSelectionDto::Custom { - config: AgentCustomRetrievalConfigDto { - depth: if matches!(budget, PacketBudgetModeDto::Tiny) { - 1 - } else { - 2 - }, - max_nodes: limits.max_trail_edges.clamp(10, 2_000), - include_edge_occurrences: matches!( - task_class, - Some(PacketTaskClassDto::ChangeImpact | PacketTaskClassDto::RouteTracing) - ), - enable_source_reads: true, - ..AgentCustomRetrievalConfigDto::default() - }, - }; - } - - AgentRetrievalProfileSelectionDto::Preset { preset } -} - -fn packet_budget_limits(mode: PacketBudgetModeDto) -> PacketBudgetLimitsDto { - match mode { - PacketBudgetModeDto::Tiny => PacketBudgetLimitsDto { - max_anchors: 3, - max_files: 3, - max_snippets: 6, - max_trail_edges: 12, - max_output_bytes: 24 * 1024, - }, - PacketBudgetModeDto::Compact => PacketBudgetLimitsDto { - max_anchors: 13, - max_files: 13, - max_snippets: 12, - max_trail_edges: 20, - max_output_bytes: 96 * 1024, - }, - PacketBudgetModeDto::Standard => PacketBudgetLimitsDto { - max_anchors: 16, - max_files: 16, - max_snippets: 24, - max_trail_edges: 60, - max_output_bytes: 128 * 1024, - }, - PacketBudgetModeDto::Deep => PacketBudgetLimitsDto { - max_anchors: 25, - max_files: 25, - max_snippets: 80, - max_trail_edges: 240, - max_output_bytes: 512 * 1024, - }, - } -} - -fn apply_packet_budget( - project_root: &Path, - question: &str, - task_class: PacketTaskClassDto, - requested: PacketBudgetModeDto, - limits: PacketBudgetLimitsDto, - answer: &mut AgentAnswerDto, -) -> PacketBudgetDto { - let mut truncated = false; - let mut omitted_sections = Vec::new(); - - let mut protected_probe_queries = packet_command_exact_probe_queries(question, task_class); - push_unique_owned_terms( - &mut protected_probe_queries, - &packet_sufficiency_required_probe_queries(question, task_class), - ); - if cap_packet_citations(answer, &limits, &protected_probe_queries) { - truncated = true; - omitted_sections.push("citations".to_string()); - } - if cap_graph_edges(answer, limits.max_trail_edges) { - truncated = true; - omitted_sections.push("trail_edges".to_string()); - } - if truncate_answer_markdown_to_byte_cap(answer, limits.max_output_bytes as usize) { - truncated = true; - omitted_sections.push("markdown_blocks".to_string()); - } - - let used = packet_budget_usage(answer); - if used.output_bytes > limits.max_output_bytes { - truncated = true; - omitted_sections.push("output_bytes".to_string()); - } - - omitted_sections.sort(); - omitted_sections.dedup(); - - PacketBudgetDto { - requested, - limits, - used, - truncated, - omitted_sections, - next_deeper_command: next_deeper_packet_command(project_root, question, requested), - } -} - -fn enforce_packet_output_budget(project_root: &Path, packet: &mut AgentPacketDto) { - for _ in 0..8 { - let output_bytes = refresh_packet_output_bytes(packet); - if output_bytes <= packet.budget.limits.max_output_bytes as usize { - break; - } - - packet.budget.truncated = true; - push_omitted_section(&mut packet.budget, "output_bytes"); - push_omitted_section(&mut packet.budget, "packet_payload"); - - let over_by = output_bytes.saturating_sub(packet.budget.limits.max_output_bytes as usize); - let current_answer_bytes = serde_json::to_vec(&packet.answer) - .map(|bytes| bytes.len()) - .unwrap_or_default(); - let next_answer_cap = current_answer_bytes - .saturating_sub(over_by.saturating_add(1024)) - .max(1024); - - if truncate_answer_markdown_to_byte_cap(&mut packet.answer, next_answer_cap) { - push_omitted_section(&mut packet.budget, "markdown_blocks"); - packet.budget.used = packet_budget_usage(&packet.answer); - packet.benchmark_trace = packet_benchmark_trace(&packet.answer); - packet.sufficiency = build_packet_sufficiency( - project_root, - &packet.question, - packet - .task_class - .unwrap_or(PacketTaskClassDto::ArchitectureExplanation), - &packet.answer, - &packet.budget, - ); - continue; - } - break; - } - - let output_bytes = refresh_packet_output_bytes(packet); - if output_bytes > packet.budget.limits.max_output_bytes as usize { - packet.budget.truncated = true; - push_omitted_section(&mut packet.budget, "output_bytes"); - push_omitted_section(&mut packet.budget, "packet_payload"); - packet.sufficiency = build_packet_sufficiency( - project_root, - &packet.question, - packet - .task_class - .unwrap_or(PacketTaskClassDto::ArchitectureExplanation), - &packet.answer, - &packet.budget, - ); - } else { - remove_omitted_section(&mut packet.budget, "output_bytes"); - remove_omitted_section(&mut packet.budget, "packet_payload"); - let _ = refresh_packet_output_bytes(packet); - packet.sufficiency = build_packet_sufficiency( - project_root, - &packet.question, - packet - .task_class - .unwrap_or(PacketTaskClassDto::ArchitectureExplanation), - &packet.answer, - &packet.budget, - ); - let _ = refresh_packet_output_bytes(packet); - } -} - -fn refresh_packet_output_bytes(packet: &mut AgentPacketDto) -> usize { - for _ in 0..4 { - let output_bytes = serialized_packet_len(packet); - let output_bytes_u32 = output_bytes.try_into().unwrap_or(u32::MAX); - if packet.budget.used.output_bytes == output_bytes_u32 { - return output_bytes; - } - packet.budget.used.output_bytes = output_bytes_u32; - } - serialized_packet_len(packet) -} - -fn serialized_packet_len(packet: &AgentPacketDto) -> usize { - serde_json::to_vec(packet) - .map(|bytes| bytes.len()) - .unwrap_or_default() -} - -fn push_omitted_section(budget: &mut PacketBudgetDto, section: &str) { - if !budget - .omitted_sections - .iter() - .any(|existing| existing == section) - { - budget.omitted_sections.push(section.to_string()); - budget.omitted_sections.sort(); - } -} - -fn remove_omitted_section(budget: &mut PacketBudgetDto, section: &str) { - budget - .omitted_sections - .retain(|existing| existing != section); -} - -fn cap_citations(answer: &mut AgentAnswerDto, limits: &PacketBudgetLimitsDto) -> bool { - cap_citations_with_protected(answer, limits, &HashSet::new()) -} - -fn cap_citations_with_protected( - answer: &mut AgentAnswerDto, - limits: &PacketBudgetLimitsDto, - protected_citation_keys: &HashSet, -) -> bool { - let original_len = answer.citations.len(); - let mut files = HashSet::new(); - let mut roles = HashSet::new(); - let mut claim_keys: HashSet = HashSet::new(); - let mut secondary_claim_keys: HashSet = HashSet::new(); - let mut kept = Vec::new(); - let mut deferred = Vec::new(); - - for citation in answer.citations.drain(..) { - let citation_key = packet_citation_key(&citation); - let file = citation.file_path.as_deref().map(packet_display_path); - let role = packet_evidence_role(&citation); - let claim_key = role.map(|role| packet_claim_key_for_citation(role, &citation)); - let low_priority_role = packet_low_priority_cap_role(role); - let protected = protected_citation_keys.contains(&citation_key); - if protected - && kept.len() < limits.max_anchors as usize - && packet_file_fits_limit(file.as_deref(), &files, limits.max_files) - { - if let Some(path) = file { - files.insert(path); - } - if let Some(role) = role { - roles.insert(role); - } - if let Some(ref claim_key) = claim_key { - claim_keys.insert(claim_key.clone()); - } - kept.push(citation); - continue; - } - if let Some(ref claim_key) = claim_key - && claim_keys.contains(claim_key) - && replace_weaker_duplicate_claim_citation( - &mut kept, - claim_key, - citation.clone(), - protected_citation_keys, - ) - { - rebuild_packet_cap_tracking(&kept, &mut files, &mut roles, &mut claim_keys); - continue; - } - let file_is_new = file.as_ref().is_some_and(|path| !files.contains(path)); - let role_is_new = role.is_some_and(|role| !roles.contains(role)); - let claim_key_is_new = claim_key - .as_ref() - .is_some_and(|key| !claim_keys.contains(key)); - let secondary_claim_definition = claim_key.as_ref().is_some_and(|key| { - claim_keys.contains(key) - && !secondary_claim_keys.contains(key) - && packet_keep_secondary_claim_definition(key, &citation) - }); - let claim_key_expands_primary_packet_coverage = - !low_priority_role && claim_key_is_new && (role_is_new || file_is_new); - let expands_primary_packet_coverage = !low_priority_role - && (claim_key_expands_primary_packet_coverage - || role_is_new - || kept.is_empty() - || (claim_key.is_none() && file_is_new) - || secondary_claim_definition); - if kept.len() >= limits.max_anchors as usize - && packet_primary_definition_file_citation(&citation) - && replace_weaker_same_role_or_low_priority_citation( - &mut kept, - citation.clone(), - protected_citation_keys, - limits, - ) - { - rebuild_packet_cap_tracking(&kept, &mut files, &mut roles, &mut claim_keys); - continue; - } - if kept.len() >= limits.max_anchors as usize - && !low_priority_role - && role_is_new - && replace_overrepresented_role_citation( - &mut kept, - citation.clone(), - protected_citation_keys, - limits, - ) - { - rebuild_packet_cap_tracking(&kept, &mut files, &mut roles, &mut claim_keys); - continue; - } - if kept.len() < limits.max_anchors as usize - && expands_primary_packet_coverage - && packet_file_fits_limit(file.as_deref(), &files, limits.max_files) - { - if let Some(path) = file { - files.insert(path); - } - if let Some(role) = role { - roles.insert(role); - } - if let Some(ref claim_key) = claim_key { - claim_keys.insert(claim_key.clone()); - if secondary_claim_definition { - secondary_claim_keys.insert(claim_key.clone()); - } - } - kept.push(citation); - } else { - deferred.push(citation); - } - } - - let mut primary_new_files = Vec::new(); - let mut primary_duplicate_files = Vec::new(); - let mut low_priority_new_files = Vec::new(); - let mut low_priority_duplicate_files = Vec::new(); - for citation in deferred { - let file = citation.file_path.as_deref().map(packet_display_path); - let low_priority = packet_low_priority_cap_role(packet_evidence_role(&citation)); - if file.as_ref().is_some_and(|path| files.contains(path)) { - if low_priority { - low_priority_duplicate_files.push(citation); - } else { - primary_duplicate_files.push(citation); - } - } else if low_priority { - low_priority_new_files.push(citation); - } else { - primary_new_files.push(citation); - } - } - for citation in primary_new_files - .into_iter() - .chain(primary_duplicate_files) - .chain(low_priority_new_files) - .chain(low_priority_duplicate_files) - { - if kept.len() >= limits.max_anchors as usize { - continue; - } - let file = citation.file_path.as_deref().map(packet_display_path); - if !packet_file_fits_limit(file.as_deref(), &files, limits.max_files) { - continue; - } - if let Some(path) = file { - files.insert(path); - } - kept.push(citation); - } - - let truncated = kept.len() < original_len; - answer.citations = kept; - truncated -} - -fn packet_low_priority_cap_role(role: Option<&str>) -> bool { - matches!(role, Some("tests and regression coverage")) -} - -fn replace_weaker_same_role_or_low_priority_citation( - kept: &mut [AgentCitationDto], - candidate: AgentCitationDto, - protected_citation_keys: &HashSet, - limits: &PacketBudgetLimitsDto, -) -> bool { - let candidate_role = packet_evidence_role(&candidate); - let candidate_file = candidate.file_path.as_deref().map(packet_display_path); - let mut replacement: Option<(usize, u8, f32)> = None; - - for (index, existing) in kept.iter().enumerate() { - if protected_citation_keys.contains(&packet_citation_key(existing)) { - continue; - } - if !packet_file_fits_limit_after_replacement( - candidate_file.as_deref(), - kept, - index, - limits.max_files, - ) { - continue; - } - - let existing_role = packet_evidence_role(existing); - let replacement_priority = if packet_low_priority_cap_role(existing_role) { - 3 - } else if candidate_role.is_some() - && candidate_role == existing_role - && !packet_primary_definition_file_citation(existing) - { - 2 - } else { - 0 - }; - if replacement_priority == 0 { - continue; - } - - let existing_rank = existing.score; - let should_replace = replacement - .map(|(_, best_priority, best_rank)| { - replacement_priority > best_priority - || (replacement_priority == best_priority && existing_rank < best_rank) - }) - .unwrap_or(true); - if should_replace { - replacement = Some((index, replacement_priority, existing_rank)); - } - } - - let Some((index, _, _)) = replacement else { - return false; - }; - kept[index] = candidate; - true -} - -fn replace_overrepresented_role_citation( - kept: &mut [AgentCitationDto], - candidate: AgentCitationDto, - protected_citation_keys: &HashSet, - limits: &PacketBudgetLimitsDto, -) -> bool { - let Some(candidate_role) = packet_evidence_role(&candidate) else { - return false; - }; - if kept - .iter() - .any(|citation| packet_evidence_role(citation) == Some(candidate_role)) - { - return false; - } - let candidate_file = candidate.file_path.as_deref().map(packet_display_path); - let role_counts = kept.iter().filter_map(packet_evidence_role).fold( - HashMap::<&'static str, usize>::new(), - |mut counts, role| { - *counts.entry(role).or_insert(0) += 1; - counts - }, - ); - - let mut replacement: Option<(usize, usize, f32)> = None; - for (index, existing) in kept.iter().enumerate() { - if protected_citation_keys.contains(&packet_citation_key(existing)) { - continue; - } - let Some(existing_role) = packet_evidence_role(existing) else { - continue; - }; - let existing_role_count = role_counts.get(existing_role).copied().unwrap_or_default(); - if existing_role_count <= 1 { - continue; - } - if !packet_file_fits_limit_after_replacement( - candidate_file.as_deref(), - kept, - index, - limits.max_files, - ) { - continue; - } - let existing_rank = existing.score; - let should_replace = replacement - .map(|(_, best_count, best_rank)| { - existing_role_count > best_count - || (existing_role_count == best_count && existing_rank < best_rank) - }) - .unwrap_or(true); - if should_replace { - replacement = Some((index, existing_role_count, existing_rank)); - } - } - - let Some((index, _, _)) = replacement else { - return false; - }; - kept[index] = candidate; - true -} - -fn packet_file_fits_limit_after_replacement( - path: Option<&str>, - kept: &[AgentCitationDto], - replacement_index: usize, - max_files: u32, -) -> bool { - let files = kept - .iter() - .enumerate() - .filter(|(index, _)| *index != replacement_index) - .filter_map(|(_, citation)| citation.file_path.as_deref().map(packet_display_path)) - .collect::>(); - packet_file_fits_limit(path, &files, max_files) -} - -fn replace_weaker_duplicate_claim_citation( - kept: &mut [AgentCitationDto], - claim_key: &str, - candidate: AgentCitationDto, - protected_citation_keys: &HashSet, -) -> bool { - let Some(index) = kept.iter().position(|citation| { - packet_evidence_role(citation) - .map(|role| packet_claim_key_for_citation(role, citation) == claim_key) - .unwrap_or(false) - }) else { - return false; - }; - if protected_citation_keys.contains(&packet_citation_key(&kept[index])) { - return false; - } - if packet_prefer_duplicate_claim_citation(&candidate, &kept[index]) { - kept[index] = candidate; - return true; - } - false -} - -fn packet_prefer_duplicate_claim_citation( - candidate: &AgentCitationDto, - existing: &AgentCitationDto, -) -> bool { - if packet_prefer_flow_anchor_path_citation(candidate, existing) { - return true; - } - normalize_identifier(&candidate.display_name) == normalize_identifier(&existing.display_name) - && packet_exact_definition_file_citation(candidate) - && !packet_exact_definition_file_citation(existing) -} - -fn packet_primary_definition_file_citation(citation: &AgentCitationDto) -> bool { - packet_exact_definition_file_citation(citation) - || packet_near_stem_type_definition_file(citation) -} - -fn packet_near_stem_type_definition_file(citation: &AgentCitationDto) -> bool { - if citation.origin != SearchHitOrigin::IndexedSymbol - || !citation.resolvable - || !matches!( - citation.kind, - NodeKind::STRUCT - | NodeKind::CLASS - | NodeKind::INTERFACE - | NodeKind::UNION - | NodeKind::ENUM - | NodeKind::TYPEDEF - ) - { - return false; - } - let normalized_display = normalize_identifier(&citation.display_name); - if normalized_display.is_empty() - || packet_low_signal_display_name(normalized_display.as_str()) - || packet_exact_definition_file_citation(citation) - { - return false; - } - let stem = citation - .file_path - .as_deref() - .map(packet_display_path) - .and_then(|path| { - let file_name = path.rsplit('/').next().unwrap_or(path.as_str()); - file_name - .rsplit_once('.') - .map(|(stem, _)| stem.to_string()) - .or_else(|| Some(file_name.to_string())) - }) - .map(|stem| normalize_identifier(&stem)) - .unwrap_or_default(); - if stem.is_empty() { - return false; - } - - let len_delta = normalized_display.len().abs_diff(stem.len()); - if len_delta > 2 { - return false; - } - let shared_prefix = normalized_display - .chars() - .zip(stem.chars()) - .take_while(|(left, right)| left == right) - .count(); - shared_prefix >= 8 - && shared_prefix.saturating_mul(5) - >= normalized_display.len().min(stem.len()).saturating_mul(4) -} - -fn packet_prefer_flow_anchor_path_citation( - candidate: &AgentCitationDto, - existing: &AgentCitationDto, -) -> bool { - let candidate_path = candidate - .file_path - .as_deref() - .map(packet_display_path) - .unwrap_or_default() - .to_ascii_lowercase(); - let existing_path = existing - .file_path - .as_deref() - .map(packet_display_path) - .unwrap_or_default() - .to_ascii_lowercase(); - if candidate_path == existing_path { - return false; - } - let candidate_role = retrieval_file_role_from_path(&candidate_path); - let existing_role = retrieval_file_role_from_path(&existing_path); - candidate_role == crate::RetrievalFileRole::Source && existing_role.is_non_primary() -} - -fn packet_exact_definition_file_citation(citation: &AgentCitationDto) -> bool { - citation.origin == SearchHitOrigin::IndexedSymbol - && citation.resolvable - && matches!( - citation.kind, - NodeKind::STRUCT - | NodeKind::CLASS - | NodeKind::INTERFACE - | NodeKind::UNION - | NodeKind::ENUM - | NodeKind::TYPEDEF - ) - && !packet_low_signal_display_name(normalize_identifier(&citation.display_name).as_str()) - && packet_file_stem_matches_query(&citation.display_name, citation.file_path.as_deref()) -} - -fn packet_keep_secondary_claim_definition(_claim_key: &str, citation: &AgentCitationDto) -> bool { - if !packet_primary_definition_file_citation(citation) { - return false; - } - packet_mandatory_secondary_path_citation(citation) -} - -fn packet_mandatory_secondary_path_citation(citation: &AgentCitationDto) -> bool { - let path = citation - .file_path - .as_deref() - .map(packet_display_path) - .unwrap_or_default() - .to_ascii_lowercase(); - path.contains("event_processor") - || path.contains("_events") - || path.contains("-events") - || path.contains("/cli/") - || path.ends_with("/main.rs") -} - -fn rebuild_packet_cap_tracking( - kept: &[AgentCitationDto], - files: &mut HashSet, - roles: &mut HashSet<&'static str>, - claim_keys: &mut HashSet, -) { - files.clear(); - roles.clear(); - claim_keys.clear(); - for citation in kept { - if let Some(path) = citation.file_path.as_deref().map(packet_display_path) { - files.insert(path); - } - if let Some(role) = packet_evidence_role(citation) { - roles.insert(role); - claim_keys.insert(packet_claim_key_for_citation(role, citation)); - } - } -} - -fn packet_file_fits_limit(path: Option<&str>, files: &HashSet, max_files: u32) -> bool { - path.is_none_or(|path| files.contains(path) || files.len() < max_files as usize) -} - -fn cap_graph_edges(answer: &mut AgentAnswerDto, max_edges: u32) -> bool { - let mut remaining = max_edges as usize; - let mut truncated = false; - for artifact in &mut answer.graphs { - let GraphArtifactDto::Uml { graph, .. } = artifact else { - continue; - }; - if graph.edges.len() > remaining { - let omitted = graph.edges.len() - remaining; - graph.edges.truncate(remaining); - graph.truncated = true; - graph.omitted_edge_count = graph - .omitted_edge_count - .saturating_add(omitted.try_into().unwrap_or(u32::MAX)); - truncated = true; - remaining = 0; - } else { - remaining = remaining.saturating_sub(graph.edges.len()); - } - if prune_graph_to_retained_edges(graph) { - truncated = true; - } - } - truncated -} - -fn prune_graph_to_retained_edges(graph: &mut GraphResponse) -> bool { - let original_nodes = graph.nodes.len(); - let original_layout_nodes = graph - .canonical_layout - .as_ref() - .map(|layout| layout.nodes.len()) - .unwrap_or_default(); - let original_layout_edges = graph - .canonical_layout - .as_ref() - .map(|layout| layout.edges.len()) - .unwrap_or_default(); - let mut retained_node_ids = HashSet::new(); - retained_node_ids.insert(graph.center_id.clone()); - let retained_edge_ids = graph - .edges - .iter() - .map(|edge| edge.id.clone()) - .collect::>(); - - for edge in &graph.edges { - retained_node_ids.insert(edge.source.clone()); - retained_node_ids.insert(edge.target.clone()); - } - - graph - .nodes - .retain(|node| retained_node_ids.contains(&node.id)); - - if let Some(layout) = graph.canonical_layout.as_mut() { - layout.edges.retain(|edge| { - let endpoints_retained = retained_node_ids.contains(&edge.source) - && retained_node_ids.contains(&edge.target); - let source_edge_retained = edge.source_edge_ids.is_empty() - || edge - .source_edge_ids - .iter() - .any(|edge_id| retained_edge_ids.contains(edge_id)); - endpoints_retained && source_edge_retained - }); - layout - .nodes - .retain(|node| retained_node_ids.contains(&node.id)); - } - - let pruned = graph.nodes.len() < original_nodes - || graph - .canonical_layout - .as_ref() - .map(|layout| layout.nodes.len() < original_layout_nodes) - .unwrap_or(false) - || graph - .canonical_layout - .as_ref() - .map(|layout| layout.edges.len() < original_layout_edges) - .unwrap_or(false); - if pruned { - graph.truncated = true; - } - pruned -} - -fn truncate_answer_markdown_to_byte_cap(answer: &mut AgentAnswerDto, byte_cap: usize) -> bool { - let mut truncated = false; - for _ in 0..8 { - let Ok(bytes) = serde_json::to_vec(answer) else { - return truncated; - }; - if bytes.len() <= byte_cap { - return truncated; - } - let Some((section_index, block_index, len)) = largest_markdown_block(answer) else { - return truncated; - }; - if len <= 256 { - return truncated; - } - if let AgentResponseBlockDto::Markdown { markdown } = - &mut answer.sections[section_index].blocks[block_index] - { - truncate_markdown_block(markdown); - truncated = true; - } - } - truncated -} - -fn largest_markdown_block(answer: &AgentAnswerDto) -> Option<(usize, usize, usize)> { - let mut largest = None; - for (section_index, section) in answer.sections.iter().enumerate() { - for (block_index, block) in section.blocks.iter().enumerate() { - if let AgentResponseBlockDto::Markdown { markdown } = block { - let len = markdown.len(); - if largest.is_none_or(|(_, _, existing)| len > existing) { - largest = Some((section_index, block_index, len)); - } - } - } - } - largest -} - -fn truncate_markdown_block(markdown: &mut String) { - let keep_chars = markdown.chars().count() / 2; - let mut keep_byte = markdown.len(); - if let Some((index, _)) = markdown.char_indices().nth(keep_chars) { - keep_byte = index; - } - markdown.truncate(keep_byte); - markdown.push_str(PACKET_MARKDOWN_TRUNCATION_SUFFIX); -} - -fn packet_budget_usage(answer: &AgentAnswerDto) -> PacketBudgetUsageDto { - let files = answer - .citations - .iter() - .filter_map(|citation| citation.file_path.as_deref()) - .collect::>() - .len(); - let trail_edges = answer - .graphs - .iter() - .map(|artifact| match artifact { - GraphArtifactDto::Uml { graph, .. } => graph.edges.len(), - GraphArtifactDto::Mermaid { .. } => 0, - }) - .sum::(); - let snippets = answer - .retrieval_trace - .steps - .iter() - .filter(|step| { - step.kind == AgentRetrievalStepKindDto::SourceRead - && step.status == AgentRetrievalStepStatusDto::Ok - }) - .count(); - let output_bytes = serde_json::to_vec(answer) - .map(|bytes| bytes.len()) - .unwrap_or_default(); - - PacketBudgetUsageDto { - anchors: answer.citations.len().try_into().unwrap_or(u32::MAX), - files: files.try_into().unwrap_or(u32::MAX), - snippets: snippets.try_into().unwrap_or(u32::MAX), - trail_edges: trail_edges.try_into().unwrap_or(u32::MAX), - output_bytes: output_bytes.try_into().unwrap_or(u32::MAX), - } -} - -fn next_deeper_packet_command( - project_root: &Path, - question: &str, - requested: PacketBudgetModeDto, -) -> Option { - let next = match requested { - PacketBudgetModeDto::Tiny => "compact", - PacketBudgetModeDto::Compact => "standard", - PacketBudgetModeDto::Standard => "deep", - PacketBudgetModeDto::Deep => return None, - }; - let project = quote_packet_project_arg(project_root); - Some(format!( - "codestory-cli packet --project {project} --question {} --budget {next}", - quote_packet_command_value(question) - )) -} - -fn quote_packet_project_arg(project_root: &Path) -> String { - quote_packet_command_value(project_root.to_string_lossy().as_ref()) -} - -fn quote_packet_command_value(value: &str) -> String { - format!("'{}'", value.replace('\'', "''")) -} - -fn build_packet_sufficiency( - project_root: &Path, - question: &str, - task_class: PacketTaskClassDto, - answer: &AgentAnswerDto, - budget: &PacketBudgetDto, -) -> PacketSufficiencyDto { - let has_errors = answer - .retrieval_trace - .steps - .iter() - .any(|step| step.status == AgentRetrievalStepStatusDto::Error); - let min_citations = packet_sufficiency_min_citations(task_class); - let min_claims = packet_sufficiency_min_claims(task_class); - let supported_claims = packet_supported_claims(answer); - let has_minimum_coverage = answer.citations.len() >= min_citations; - let has_minimum_claims = supported_claims.len() >= min_claims; - let has_minimum_claim_families = packet_has_minimum_claim_family_coverage(task_class, answer); - let missing_required_probe_queries = - packet_missing_sufficiency_probe_queries(question, task_class, answer); - let has_sufficiency_blocking_budget_omission = packet_has_sufficiency_blocking_budget_omission( - answer, - budget, - min_citations, - min_claims, - supported_claims.len(), - ); - let status = if answer.citations.is_empty() { - PacketSufficiencyStatusDto::Insufficient - } else if has_errors - || !has_minimum_coverage - || !has_minimum_claims - || !has_minimum_claim_families - || !missing_required_probe_queries.is_empty() - || has_sufficiency_blocking_budget_omission - || packet_budget_exceeded_hard_output_cap(budget) - { - PacketSufficiencyStatusDto::Partial - } else { - PacketSufficiencyStatusDto::Sufficient - }; - - let mut gaps = Vec::new(); - if answer.citations.is_empty() { - gaps.push("No cited anchors were found for the question.".to_string()); - } - if !answer.citations.is_empty() && !has_minimum_coverage { - gaps.push(format!( - "{:?} packet found only {} cited anchor(s); at least {} are required before treating the packet as sufficient.", - task_class, - answer.citations.len(), - min_citations - )); - } - if !answer.citations.is_empty() && !has_minimum_claims { - gaps.push(format!( - "{:?} packet found only {} role-backed claim(s); at least {} are required before treating the packet as sufficient.", - task_class, - supported_claims.len(), - min_claims - )); - } - if !answer.citations.is_empty() && !has_minimum_claim_families { - gaps.push(format!( - "{:?} packet covered only {} distinct claim families; at least {} are required before treating the packet as sufficient.", - task_class, - packet_supported_claim_family_count(answer), - packet_sufficiency_min_claim_families(task_class) - )); - } - if !missing_required_probe_queries.is_empty() { - gaps.push(format!( - "{:?} packet missed required planned flow probe(s): {}.", - task_class, - missing_required_probe_queries.join(", ") - )); - } - if budget.truncated && status != PacketSufficiencyStatusDto::Sufficient { - gaps.push(format!( - "Packet was truncated by {:?} budget: {}.", - budget.requested, - budget.omitted_sections.join(", ") - )); - } - if has_sufficiency_blocking_budget_omission { - gaps.push(format!( - "Packet omitted answer-critical evidence under {:?} budget; use a deeper packet before treating this as complete.", - budget.requested - )); - } - for step in answer - .retrieval_trace - .steps - .iter() - .filter(|step| step.status == AgentRetrievalStepStatusDto::Error) - { - gaps.push(format!("{:?} step failed.", step.kind)); - } - - let follow_up_commands = packet_follow_up_commands( - project_root, - question, - task_class, - status, - budget, - &missing_required_probe_queries, - ); - let open_next = follow_up_commands.clone(); - let avoid_opening = answer - .citations - .iter() - .filter_map(|citation| citation.file_path.as_ref()) - .map(|path| packet_display_path(path)) - .collect::>() - .into_iter() - .take(12) - .map(|path| { - format!( - "{} because this packet already includes a citation for the current answer.", - path - ) - }) - .collect::>(); - - let mut covered_claims = supported_claims; - if covered_claims.is_empty() { - covered_claims.push(PacketClaimDto { - claim: answer.summary.clone(), - citations: answer.citations.iter().take(6).cloned().collect(), - }); - } - - PacketSufficiencyDto { - status, - covered_claims, - open_next, - avoid_opening, - gaps, - follow_up_commands, - } -} - -fn packet_sufficiency_min_citations(task_class: PacketTaskClassDto) -> usize { - match task_class { - PacketTaskClassDto::BugLocalization | PacketTaskClassDto::SymbolOwnership => 2, - PacketTaskClassDto::ArchitectureExplanation - | PacketTaskClassDto::ChangeImpact - | PacketTaskClassDto::RouteTracing - | PacketTaskClassDto::DataFlow - | PacketTaskClassDto::EditPlanning => 3, - } -} - -fn packet_sufficiency_min_claims(task_class: PacketTaskClassDto) -> usize { - match task_class { - PacketTaskClassDto::BugLocalization | PacketTaskClassDto::SymbolOwnership => 1, - PacketTaskClassDto::ArchitectureExplanation => 3, - PacketTaskClassDto::ChangeImpact - | PacketTaskClassDto::RouteTracing - | PacketTaskClassDto::DataFlow - | PacketTaskClassDto::EditPlanning => 2, - } -} - -fn packet_sufficiency_min_claim_families(task_class: PacketTaskClassDto) -> usize { - match task_class { - PacketTaskClassDto::ArchitectureExplanation => 3, - PacketTaskClassDto::DataFlow => 2, - PacketTaskClassDto::BugLocalization - | PacketTaskClassDto::ChangeImpact - | PacketTaskClassDto::RouteTracing - | PacketTaskClassDto::SymbolOwnership - | PacketTaskClassDto::EditPlanning => 1, - } -} - -fn packet_has_minimum_claim_family_coverage( - task_class: PacketTaskClassDto, - answer: &AgentAnswerDto, -) -> bool { - packet_supported_claim_family_count(answer) >= packet_sufficiency_min_claim_families(task_class) -} - -fn packet_supported_claim_family_count(answer: &AgentAnswerDto) -> usize { - let mut families: HashSet<&'static str> = HashSet::new(); - for citation in &answer.citations { - let Some(role) = packet_evidence_role(citation) else { - continue; - }; - families.insert(role); - } - families.len() -} - -fn packet_missing_sufficiency_probe_queries( - question: &str, - task_class: PacketTaskClassDto, - answer: &AgentAnswerDto, -) -> Vec { - packet_sufficiency_required_probe_queries(question, task_class) - .into_iter() - .filter(|query| !packet_probe_query_is_cited(query, answer)) - .collect() -} - -fn packet_sufficiency_required_probe_queries( - question: &str, - task_class: PacketTaskClassDto, -) -> Vec { - let terms = packet_probe_terms(question); - packet_sufficiency_required_probe_queries_from_terms(&terms, task_class) -} - -fn packet_sufficiency_required_probe_queries_from_terms( - terms: &[String], - task_class: PacketTaskClassDto, -) -> Vec { - if !matches!( - task_class, - PacketTaskClassDto::ArchitectureExplanation - | PacketTaskClassDto::DataFlow - | PacketTaskClassDto::ChangeImpact - | PacketTaskClassDto::EditPlanning - ) { - return Vec::new(); - } - - let has = |term: &str| terms.iter().any(|value| value.eq_ignore_ascii_case(term)); - let has_any = |needles: &[&str]| needles.iter().any(|needle| has(needle)); - let mut queries = Vec::new(); - - if eval_probes_enabled() { - push_eval_required_probe_queries(terms, &mut queries); - return queries; - } - - if has("exec") && has_any(&["runtime", "session"]) { - push_unique_terms(&mut queries, &["exec runtime", "exec session"]); - } - if has("exec") && has_any(&["cli", "command", "subcommand"]) { - push_unique_terms(&mut queries, &["exec cli", "exec command"]); - } - if has_any(&["json", "jsonl"]) && has_any(&["event", "events", "output"]) { - push_unique_terms(&mut queries, &["json event output", "jsonl event output"]); - } - if has("thread") && has_any(&["start", "starts", "started"]) { - push_unique_term(&mut queries, "thread start"); - } - if has("turn") && has_any(&["start", "starts", "started"]) { - push_unique_term(&mut queries, "turn start"); - } - if has_any(&["storage", "persistent"]) || (has("data") && has_any(&["access", "accessed"])) { - push_unique_terms(&mut queries, &["storage access", "persistent storage"]); - } - if packet_terms_indicate_indexing_flow(terms) { - push_indexing_flow_required_probe_queries(&mut queries); - } - if has_any(&["indexing", "indexed", "indexer"]) - && (has_any(&["storage", "persistent", "project", "configuration", "group"]) - || has_any(&["command", "commands"])) - { - push_unique_terms( - &mut queries, - &["build index", "source group indexing", "indexer command"], - ); - } - - queries -} - -fn push_indexing_flow_required_probe_queries(queries: &mut Vec) { - push_unique_terms( - queries, - &[ - "Runtime::index_service", - "index service run indexing", - "workspace manifest build execution plan", - "workspace indexer run", - "index_file", - "storage flush projection batch", - "storage rebuild search symbol projection", - "snapshot refresh all stats", - ], - ); -} - -fn packet_probe_query_is_cited(query: &str, answer: &AgentAnswerDto) -> bool { - answer - .citations - .iter() - .any(|citation| packet_citation_satisfies_required_probe(query, citation)) -} - -fn packet_citation_satisfies_required_probe(query: &str, citation: &AgentCitationDto) -> bool { - if packet_required_probe_needs_concrete_file(query) { - return packet_file_stem_matches_query(query, citation.file_path.as_deref()); - } - if packet_required_probe_needs_full_token_coverage(query) { - if packet_citation_probe_has_exact_identifier_match(query, citation) { - return true; + let details_step = trace.start_step( + AgentRetrievalStepKindDto::NodeDetails, + vec![field("has_focus", focus_node_id.is_some().to_string())], + ); + let focused_node = match focus_node_id.clone() { + Some(id) => match controller.node_details(NodeDetailsRequest { id }) { + Ok(details) => { + trace.finish_ok( + details_step, + vec![ + field("display_name", details.display_name.clone()), + field("kind", format!("{:?}", details.kind)), + ], + ); + Some(details) + } + Err(error) => { + trace.finish_err(details_step, error.message.clone()); + None + } + }, + None => { + trace.finish_skipped(details_step, "No focus node selected.", Vec::new()); + None } - let tokens = packet_probe_match_tokens(query); - return !tokens.is_empty() - && packet_citation_probe_token_coverage(query, citation) >= tokens.len(); - } - let Some(match_rank) = packet_citation_probe_match_rank(query, citation) else { - return false; }; - !packet_required_probe_needs_exact_match(query) || match_rank >= 4 -} - -fn packet_required_probe_needs_exact_match(query: &str) -> bool { - query.contains("::") || query.contains('.') -} - -fn packet_required_probe_needs_concrete_file(query: &str) -> bool { - let normalized_query = normalize_identifier(query); - normalized_query.contains("execevents") || normalized_query == "eventprocessor" -} -fn packet_required_probe_needs_full_token_coverage(query: &str) -> bool { - matches!( - normalize_identifier(query).as_str(), - "indexservicerunindexing" - | "workspacemanifestbuildexecutionplan" - | "workspaceindexerrun" - | "indexfile" - | "storageflushprojectionbatch" - | "storagerebuildsearchsymbolprojection" - | "snapshotrefreshallstats" - ) -} - -fn packet_citation_probe_has_exact_identifier_match( - query: &str, - citation: &AgentCitationDto, -) -> bool { - let normalized_query = normalize_identifier(query); - if normalized_query.is_empty() { - return false; - } - let normalized_display = normalize_identifier(&citation.display_name); - normalized_display == normalized_query || normalized_display.ends_with(&normalized_query) -} - -fn packet_citation_probe_match_rank(query: &str, citation: &AgentCitationDto) -> Option { - let normalized_query = normalize_identifier(query); - if normalized_query.is_empty() { - return Some(0); - } - let normalized_display = normalize_identifier(&citation.display_name); - let normalized_path = citation - .file_path - .as_deref() - .map(packet_display_path) - .map(|path| normalize_identifier(&path)) - .unwrap_or_default(); - if packet_file_stem_matches_query(query, citation.file_path.as_deref()) { - Some(5) - } else if normalized_display == normalized_query - || normalized_display.ends_with(&normalized_query) - || (!packet_required_probe_needs_exact_match(query) - && packet_citation_probe_token_coverage(query, citation) >= 2) - { - Some(4) - } else if normalized_path.contains(&normalized_query) { - Some(3) - } else if normalized_display.contains(&normalized_query) { - Some(2) - } else if !normalized_display.is_empty() && normalized_query.contains(&normalized_display) { - Some(1) + let occurrences_step = trace.start_step( + AgentRetrievalStepKindDto::NodeOccurrences, + vec![field("candidates", hits.len().min(3).to_string())], + ); + let node_occurrence_deadline = phase_deadline_ms(req, 65, 100); + if should_truncate_phase(resolved_profile, ask_started_at, node_occurrence_deadline) { + trace.finish_truncated( + occurrences_step, + "Skipped node occurrence lookups because latency budget was exceeded.", + vec![field( + "phase_deadline_ms", + node_occurrence_deadline.to_string(), + )], + ); + trace.annotate("Latency-first cutoff skipped node occurrence lookups."); } else { - None - } -} - -fn packet_citation_probe_token_coverage(query: &str, citation: &AgentCitationDto) -> usize { - let tokens = packet_probe_match_tokens(query); - if tokens.len() < 2 { - return 0; - } - let display = normalize_identifier(&citation.display_name); - let path = citation - .file_path - .as_deref() - .map(packet_display_path) - .map(|path| normalize_identifier(&path)) - .unwrap_or_default(); - tokens - .iter() - .filter(|token| display.contains(token.as_str()) || path.contains(token.as_str())) - .count() -} - -fn packet_probe_match_tokens(query: &str) -> Vec { - let mut tokens = Vec::new(); - for token in query - .split(|ch: char| !ch.is_ascii_alphanumeric()) - .map(|token| token.trim().to_ascii_lowercase()) - .filter(|token| token.len() >= 3 && !packet_query_stop_term(token)) - { - if !tokens.iter().any(|existing| existing == &token) { - tokens.push(token); + let mut occurrence_count = 0usize; + for hit in hits.iter().take(3) { + match controller.node_occurrences(NodeOccurrencesRequest { + id: hit.node_id.clone(), + }) { + Ok(occurrences) => { + occurrence_count += occurrences.len(); + } + Err(error) => { + trace.annotate(format!( + "Node occurrence lookup failed for {}: {}", + hit.display_name, error.message + )); + } + } } + trace.finish_ok( + occurrences_step, + vec![field("occurrence_count", occurrence_count.to_string())], + ); } - tokens -} - -fn packet_has_sufficiency_blocking_budget_omission( - answer: &AgentAnswerDto, - budget: &PacketBudgetDto, - min_citations: usize, - min_claims: usize, - supported_claim_count: usize, -) -> bool { - if !budget.truncated { - return false; - } - - let has_claim_stop_signal = - answer.citations.len() >= min_citations && supported_claim_count >= min_claims; - let has_retained_graph = packet_has_retained_graph(answer); - - budget - .omitted_sections - .iter() - .any(|section| match section.as_str() { - "packet_payload" => true, - "markdown_blocks" => { - !has_claim_stop_signal || packet_markdown_truncation_blocks_sufficiency(answer) - } - "trail_edges" => !has_claim_stop_signal || !has_retained_graph, - _ => false, - }) -} - -fn packet_has_retained_graph(answer: &AgentAnswerDto) -> bool { - answer.graphs.iter().any(|artifact| match artifact { - GraphArtifactDto::Uml { graph, .. } => !graph.truncated && !graph.edges.is_empty(), - GraphArtifactDto::Mermaid { .. } => false, - }) -} -fn packet_markdown_truncation_blocks_sufficiency(answer: &AgentAnswerDto) -> bool { - let mut saw_truncated_markdown = false; - for section in &answer.sections { - for block in §ion.blocks { - let AgentResponseBlockDto::Markdown { markdown } = block else { - continue; - }; - if !markdown.contains(PACKET_MARKDOWN_TRUNCATION_SUFFIX.trim()) { - continue; + let edge_occurrences_step = trace.start_step( + AgentRetrievalStepKindDto::EdgeOccurrences, + vec![field( + "enabled", + resolved_profile.include_edge_occurrences.to_string(), + )], + ); + let edge_occurrence_deadline = phase_deadline_ms(req, 75, 100); + if should_truncate_phase(resolved_profile, ask_started_at, edge_occurrence_deadline) { + trace.finish_truncated( + edge_occurrences_step, + "Skipped edge occurrence lookup because latency budget was exceeded.", + vec![field( + "phase_deadline_ms", + edge_occurrence_deadline.to_string(), + )], + ); + trace.annotate("Latency-first cutoff skipped edge occurrence lookups."); + } else if !resolved_profile.include_edge_occurrences { + trace.finish_skipped( + edge_occurrences_step, + "Edge occurrences are disabled for this profile.", + Vec::new(), + ); + } else if let Some(edge_id) = first_edge_id_from_graphs(&bundle.graphs) { + match controller + .edge_occurrences(codestory_contracts::api::EdgeOccurrencesRequest { id: edge_id }) + { + Ok(occurrences) => { + trace.finish_ok( + edge_occurrences_step, + vec![field("occurrence_count", occurrences.len().to_string())], + ); } - saw_truncated_markdown = true; - if !packet_section_allows_nonblocking_truncation(section.id.as_str()) { - return true; + Err(error) => { + trace.finish_err(edge_occurrences_step, error.message.clone()); } } + } else { + trace.finish_skipped( + edge_occurrences_step, + "No edges available for lookup.", + Vec::new(), + ); } - !saw_truncated_markdown -} - -fn packet_section_allows_nonblocking_truncation(section_id: &str) -> bool { - section_id == "retrieval-evidence" - || section_id == "diagrams" - || section_id.starts_with("packet-subquery-") -} - -fn packet_budget_exceeded_hard_output_cap(budget: &PacketBudgetDto) -> bool { - budget.used.output_bytes > budget.limits.max_output_bytes -} -fn packet_follow_up_commands( - project_root: &Path, - question: &str, - task_class: PacketTaskClassDto, - status: PacketSufficiencyStatusDto, - budget: &PacketBudgetDto, - missing_required_probe_queries: &[String], -) -> Vec { - let project = quote_packet_project_arg(project_root); - match status { - PacketSufficiencyStatusDto::Sufficient => Vec::new(), - PacketSufficiencyStatusDto::Partial => { - let mut commands = Vec::new(); - let targeted_searches = if missing_required_probe_queries.is_empty() { - packet_targeted_follow_up_searches(project.as_str(), question, task_class) + let primary_subgraph_id = bundle.graphs.first().map(|graph| match graph { + GraphArtifactDto::Uml { id, .. } => id.clone(), + GraphArtifactDto::Mermaid { id, .. } => id.clone(), + }); + let include_structured_evidence = req.include_evidence; + let scored_by_node = scored_hits + .iter() + .map(|scored| (scored.hit.node_id.clone(), scored)) + .collect::>(); + let citations = hits + .iter() + .map(|hit| { + if let Some(scored) = scored_by_node.get(&hit.node_id) { + to_citation( + scored, + primary_subgraph_id.as_deref(), + primary_graph.as_ref(), + include_structured_evidence, + ) } else { - packet_missing_required_probe_searches( - project.as_str(), - missing_required_probe_queries, + to_citation_from_hit( + hit, + primary_subgraph_id.as_deref(), + primary_graph.as_ref(), + include_structured_evidence, ) - }; - for command in targeted_searches { - push_unique_term(&mut commands, &command); } - commands - .into_iter() - .take(8) - .chain(budget.next_deeper_command.clone()) - .chain(std::iter::once(format!( - "codestory-cli search --project {project} --query {} --why", - quote_packet_command_value(question) - ))) - .collect() - } - PacketSufficiencyStatusDto::Insufficient => vec![ - format!("codestory-cli index --project {project} --refresh full"), - format!( - "codestory-cli search --project {project} --query {} --why", - quote_packet_command_value(question) - ), - ], - } -} - -fn packet_missing_required_probe_searches(quoted_project: &str, queries: &[String]) -> Vec { - queries - .iter() - .map(|query| { - format!( - "codestory-cli search --project {quoted_project} --query {} --why", - quote_packet_command_value(query) - ) }) - .collect() -} + .collect::>(); -fn packet_targeted_follow_up_searches( - quoted_project: &str, - question: &str, - task_class: PacketTaskClassDto, -) -> Vec { - packet_targeted_follow_up_queries(question, task_class) - .into_iter() - .map(|query| { - format!( - "codestory-cli search --project {quoted_project} --query {} --why", - quote_packet_command_value(&query) - ) - }) - .collect() + bundle.hits = hits; + bundle.citations = citations; + bundle.focus_node_id = focus_node_id; + bundle.focused_node = focused_node; + bundle.primary_graph = primary_graph; + + Ok(bundle) } -fn packet_targeted_follow_up_queries( - question: &str, - task_class: PacketTaskClassDto, -) -> Vec { - let probes = packet_symbol_probe_queries(question, task_class, PacketBudgetModeDto::Standard); - let selected: Vec = probes - .iter() - .filter(|query| is_packet_structured_follow_up_query(query)) - .take(6) - .cloned() - .collect(); - selected +fn to_citation( + scored: &HybridSearchScoredHit, + subgraph_id: Option<&str>, + primary_graph: Option<&GraphResponse>, + include_evidence: bool, +) -> AgentCitationDto { + AgentCitationDto { + node_id: scored.hit.node_id.clone(), + display_name: scored.hit.display_name.clone(), + kind: scored.hit.kind, + file_path: scored.hit.file_path.clone(), + line: scored.hit.line, + score: scored.total_score, + origin: scored.hit.origin, + resolvable: scored.hit.resolvable, + subgraph_id: subgraph_id.map(ToOwned::to_owned), + evidence_edge_ids: if include_evidence { + evidence_edge_ids_for_node(primary_graph, &scored.hit.node_id) + } else { + Vec::new() + }, + retrieval_score_breakdown: include_evidence.then_some(RetrievalScoreBreakdownDto { + lexical: scored.lexical_score, + semantic: scored.semantic_score, + graph: scored.graph_score, + total: scored.total_score, + provenance: Vec::new(), + }), + } } -fn is_packet_structured_follow_up_query(query: &str) -> bool { - query.contains('_') - || query.contains("::") - || query.contains("Options") - || query.contains("Params") - || query.contains("Processor") - || query.contains("Subcommand") +fn weak_initial_hits(prompt: &str, hits: &[SearchHit]) -> bool { + let Some(top_hit) = hits.first() else { + return true; + }; + let prompt_terms = normalized_anchor_terms(prompt); + if top_hit.score >= WEAK_INITIAL_TOP_SCORE && hit_has_indexed_anchor(top_hit, &prompt_terms) { + return false; + } + + hits.len() < WEAK_INITIAL_HIT_COUNT + || top_hit.score < WEAK_INITIAL_TOP_SCORE + || !hits + .iter() + .take(WEAK_INITIAL_HIT_COUNT) + .any(|hit| hit_has_indexed_anchor(hit, &prompt_terms)) } -fn packet_benchmark_trace(answer: &AgentAnswerDto) -> PacketBenchmarkTraceDto { - let mut source_read_steps = 0; - let mut search_steps = 0; - let mut trail_steps = 0; - for step in &answer.retrieval_trace.steps { - match step.kind { - AgentRetrievalStepKindDto::SourceRead => source_read_steps += 1, - AgentRetrievalStepKindDto::Search - | AgentRetrievalStepKindDto::SemanticQueryEmbedding - | AgentRetrievalStepKindDto::SemanticCandidateRetrieval - | AgentRetrievalStepKindDto::HybridRerank - | AgentRetrievalStepKindDto::QueryExpansion => search_steps += 1, - AgentRetrievalStepKindDto::Trail - | AgentRetrievalStepKindDto::Neighborhood - | AgentRetrievalStepKindDto::TrailFilterOptions => trail_steps += 1, - AgentRetrievalStepKindDto::NodeDetails - | AgentRetrievalStepKindDto::NodeOccurrences - | AgentRetrievalStepKindDto::EdgeOccurrences - | AgentRetrievalStepKindDto::RepoTextFallback - | AgentRetrievalStepKindDto::MermaidSynthesis - | AgentRetrievalStepKindDto::AnswerSynthesis => {} - } - } - - let mut trace_summary = answer.retrieval_trace.clone(); - // The full step trace already lives under answer.retrieval_trace. Keep the - // benchmark trace scalar-sized so compact packets do not serialize it twice. - trace_summary.annotations.clear(); - trace_summary.steps.clear(); - - PacketBenchmarkTraceDto { - retrieval_trace: trace_summary, - source_read_steps, - search_steps, - trail_steps, +fn hit_has_indexed_anchor(hit: &SearchHit, prompt_terms: &HashSet) -> bool { + if hit.origin == SearchHitOrigin::TextMatch { + return false; } -} - -fn cap_graph_artifacts( - graphs: &mut Vec, - byte_cap: usize, -) -> GraphArtifactCapStats { - let mut retained = Vec::with_capacity(graphs.len()); - let mut retained_bytes = 0usize; - let mut omitted_count = 0usize; - - for graph in graphs.drain(..) { - let encoded_bytes = serde_json::to_vec(&graph) - .map(|bytes| bytes.len()) - .unwrap_or(usize::MAX); - if retained_bytes.saturating_add(encoded_bytes) <= byte_cap { - retained_bytes = retained_bytes.saturating_add(encoded_bytes); - retained.push(graph); - } else { - omitted_count = omitted_count.saturating_add(1); - } + if prompt_mentions_display_name(prompt_terms, &hit.display_name) { + return true; } - *graphs = retained; - GraphArtifactCapStats { - retained_bytes, - omitted_count, - truncated: omitted_count > 0, - } + hit.score_breakdown + .as_ref() + .map(|breakdown| { + breakdown.lexical > WEAK_INITIAL_MIN_LEXICAL_ANCHOR + || breakdown.graph > WEAK_INITIAL_MIN_GRAPH_ANCHOR + }) + .unwrap_or(hit.resolvable) } -fn execute_retrieval( - controller: &AppController, - req: &AgentAskRequest, - prompt: &str, - ask_started_at: Instant, - resolved_profile: &ResolvedProfile, - trace: &mut TraceRecorder, -) -> Result { - let mut bundle = RetrievalBundle::default(); - let semantic_required = hybrid_retrieval_enabled() - && !packet_initial_retrieval_is_lexical_only(req.hybrid_weights.as_ref()); - - let max_results = req - .max_results - .unwrap_or(DEFAULT_MAX_RESULTS) - .clamp(1, resolved_profile.max_search_results) as usize; +fn prompt_mentions_display_name(prompt_terms: &HashSet, display_name: &str) -> bool { + let display_terms = normalized_anchor_terms(display_name); + !display_terms.is_empty() && display_terms.iter().all(|term| prompt_terms.contains(term)) +} - let (mut scored_hits, hits) = match try_sidecar_primary_search( - controller, - prompt, - max_results, - req.latency_budget_ms, - ) { - Some(SidecarPrimarySearchOutcome::Served { - hits, - scored_hits, - shadow, - }) => { - trace.set_retrieval_shadow(shadow.clone()); - trace.annotate(format!( - "retrieval_sidecar_primary mode={} candidates={} resolved_hits={}", - shadow.retrieval_mode, - shadow.candidate_count, - hits.len() - )); - let search_step = trace.start_step( - AgentRetrievalStepKindDto::Search, - vec![ - field("query_chars", prompt.len().to_string()), - field("retrieval_path", "sidecar"), - ], - ); - trace.finish_ok( - search_step, - vec![ - field("hits", hits.len().to_string()), - field("sidecar_candidates", shadow.candidate_count.to_string()), - field( - "sidecar_resolved_hits", - shadow.resolved_hit_count.to_string(), - ), - field("accepted_hits", hits.len().to_string()), - field("max_results", max_results.to_string()), - field("repo_text", "off_initial"), - ], - ); - let semantic_query_step = trace.start_step( - AgentRetrievalStepKindDto::SemanticQueryEmbedding, - vec![field("required", semantic_required.to_string())], - ); - let semantic_candidates_step = trace.start_step( - AgentRetrievalStepKindDto::SemanticCandidateRetrieval, - vec![field("required", semantic_required.to_string())], - ); - let hybrid_rerank_step = trace.start_step( - AgentRetrievalStepKindDto::HybridRerank, - vec![field("required", semantic_required.to_string())], - ); - trace.finish_skipped( - semantic_query_step, - "Semantic embedding skipped on sidecar retrieval path.", - Vec::new(), - ); - trace.finish_skipped( - semantic_candidates_step, - "Semantic candidate scan skipped on sidecar retrieval path.", - Vec::new(), - ); - trace.finish_ok( - hybrid_rerank_step, - vec![field("ranked", hits.len().to_string())], - ); - (scored_hits, hits) - } - Some(SidecarPrimarySearchOutcome::Rejected { shadow, reason }) => { - trace.set_retrieval_shadow(shadow); - trace.annotate(format!( - "retrieval_sidecar_primary rejected=true fail_closed=true reason={reason}" - )); - return Err(sidecar_retrieval_unavailable_error( - controller, - format!("sidecar retrieval primary rejected query: {reason}"), - )); - } - Some(SidecarPrimarySearchOutcome::Unavailable { reason }) => { - trace.annotate(format!( - "retrieval_sidecar_primary unavailable=true fail_closed=true reason={reason}" - )); - return Err(sidecar_retrieval_unavailable_error(controller, reason)); - } - None => { - return Err(sidecar_retrieval_unavailable_error( - controller, - "sidecar retrieval primary is mandatory; non-sidecar initial search is disabled", - )); - } - }; +fn investigation_focus_anchor(prompt: &str, hits: &[SearchHit]) -> Option { + let prompt_terms = normalized_anchor_terms(prompt); + hits.iter() + .find(|hit| { + hit.resolvable && prompt_mentions_display_name(&prompt_terms, &hit.display_name) + }) + .map(|hit| hit.node_id.clone()) +} - let initial_hit_count = hits.len(); - let mut hits = hits; - let literal_diagnostic_signal = has_literal_diagnostic_signal(prompt); - let promotable_focus_available = - req.focus_node_id.is_some() || investigation_focus_anchor(prompt, &hits).is_some(); - let mut expansion_added_hits = false; - let block_nucleo_supplement = - sidecar_retrieval_blocks_nucleo_supplement(controller, hits.len()); - if block_nucleo_supplement && weak_initial_hits(prompt, &hits) { - trace.annotate( - "retrieval_sidecar_primary skipped local nucleo investigation supplement on weak hits", - ); - } - if should_investigate(resolved_profile) - && weak_initial_hits(prompt, &hits) - && !promotable_focus_available - && !block_nucleo_supplement - { - let expanded = match investigate_query_expansion( - controller, - req, - prompt, - max_results, - ask_started_at, - resolved_profile, - trace, - ) { - Ok(expanded) => expanded, - Err(error) => { - trace.annotate(format!( - "Investigation query expansion failed; continuing with initial hits: {}", - error.message - )); - Vec::new() - } - }; - if !expanded.is_empty() { - merge_scored_hits(&mut scored_hits, expanded, max_results); - hits = scored_hits - .iter() - .map(|scored| scored.hit.clone()) - .collect::>(); - bundle.diagnostic_supplement_used = true; - expansion_added_hits = true; - } +fn normalized_anchor_terms(value: &str) -> HashSet { + value + .split(|ch: char| !ch.is_ascii_alphanumeric()) + .filter_map(|term| { + let term = term.trim().to_ascii_lowercase(); + (term.len() >= 3).then_some(term) + }) + .collect() +} - if initial_hit_count == 0 && expansion_added_hits && !literal_diagnostic_signal { - hits.clear(); - scored_hits.clear(); - trace.annotate( - "Investigation discarded expansion-only hits for an unanchored natural-language query.", - ); - } +fn should_investigate(profile: &ResolvedProfile) -> bool { + profile.preset == codestory_contracts::api::AgentRetrievalPresetDto::Investigate +} - if weak_initial_hits(prompt, &hits) && literal_diagnostic_signal { - trace.annotate( - "Investigation skipped repo-text diagnostics because packet evidence must come from sidecar-backed resolvable hits or direct source reads.", - ); - } else if weak_initial_hits(prompt, &hits) && !is_repo_explanation_prompt(prompt) { - if !hits.is_empty() { - hits.clear(); - scored_hits.clear(); - trace.annotate( - "Investigation discarded low-confidence unanchored hits for a natural-language query.", - ); - } - trace.annotate( - "Repo-text diagnostics are disabled for packet evidence; weak unanchored hits were not promoted.", - ); - } else if weak_initial_hits(prompt, &hits) { - trace.annotate( - "Investigation deferred a broad repo explanation prompt to sidecar evidence only.", - ); - } +fn has_literal_diagnostic_signal(prompt: &str) -> bool { + prompt.contains('`') + || prompt.contains('/') + || prompt.contains('\\') + || prompt.contains("::") + || prompt.contains(".rs") + || prompt + .split(|ch: char| !ch.is_ascii_alphanumeric() && ch != '_') + .any(|token| { + token.contains('_') + || (token.len() >= 4 + && token + .chars() + .filter(|ch| ch.is_ascii_alphabetic()) + .all(|ch| ch.is_ascii_uppercase())) + }) +} - if weak_initial_hits(prompt, &hits) && !is_repo_explanation_prompt(prompt) { - trace.annotate("Investigation low confidence gap after sidecar query expansion."); - } - } else if should_investigate(resolved_profile) - && weak_initial_hits(prompt, &hits) - && promotable_focus_available - { - trace.annotate( - "Investigation kept an explicit or prompt-anchored focus instead of broad diagnostics.", - ); - } +fn is_repo_explanation_prompt(prompt: &str) -> bool { + let lower = prompt.to_ascii_lowercase(); + let subject = lower.contains("repo") || lower.contains("project") || lower.contains("codebase"); + let intent = lower.contains("fit together") + || lower.contains("how does") + || lower.contains("explain") + || lower.contains("overview") + || lower.contains("architecture"); + subject && intent +} - if should_investigate(resolved_profile) - && weak_initial_hits(prompt, &hits) - && is_repo_explanation_prompt(prompt) - && !block_nucleo_supplement - { - let overview_hits = repo_explanation_grounding_hits( - controller, - req, - max_results, - ask_started_at, - resolved_profile, - trace, - )?; - if !overview_hits.is_empty() { - hits = overview_hits; - scored_hits.clear(); - bundle.diagnostic_supplement_used = true; - bundle.repo_explanation_supplement_used = true; - trace.annotate( - "Investigation used grounding snapshot diagnostic supplement for a broad repo explanation prompt.", - ); - } - } else if should_investigate(resolved_profile) - && weak_initial_hits(prompt, &hits) - && is_repo_explanation_prompt(prompt) - && block_nucleo_supplement - { - trace.annotate( - "Grounding snapshot supplement skipped because sidecar-primary retrieval is mandatory.", +fn repo_explanation_grounding_hits( + controller: &AppController, + req: &AgentAskRequest, + max_results: usize, + ask_started_at: Instant, + resolved_profile: &ResolvedProfile, + trace: &mut TraceRecorder, +) -> Result, ApiError> { + let step = trace.start_step( + AgentRetrievalStepKindDto::QueryExpansion, + vec![ + field("strategy", "grounding_snapshot"), + field("max_results", max_results.to_string()), + ], + ); + let deadline_ms = phase_deadline_ms(req, 55, 100); + if should_truncate_phase(resolved_profile, ask_started_at, deadline_ms) { + trace.finish_truncated( + step, + "Skipped grounding snapshot supplement because latency budget was exceeded.", + vec![field("phase_deadline_ms", deadline_ms.to_string())], ); + return Ok(Vec::new()); } - let focus_node_id = req - .focus_node_id - .clone() - .or_else(|| investigation_focus_anchor(prompt, &hits)) - .or_else(|| { - hits.iter() - .find(|hit| hit.resolvable) - .map(|hit| hit.node_id.clone()) - }); - - let filter_step = trace.start_step( - AgentRetrievalStepKindDto::TrailFilterOptions, - vec![field("has_focus", focus_node_id.is_some().to_string())], - ); - let filter_options = match controller.graph_trail_filter_options() { - Ok(options) => { - trace.finish_ok( - filter_step, - vec![ - field("edge_kinds", options.edge_kinds.len().to_string()), - field("node_kinds", options.node_kinds.len().to_string()), - ], - ); - options - } + let snapshot = match controller.grounding_snapshot(GroundingBudgetDto::Strict) { + Ok(snapshot) => snapshot, Err(error) => { - trace.finish_err(filter_step, error.message.clone()); - trace - .annotate("Trail filter options unavailable; continuing with unsanitized filters."); - TrailFilterOptionsDto { - node_kinds: Vec::new(), - edge_kinds: Vec::new(), - } + trace.finish_err(step, error.message.clone()); + return Err(error); } }; - let mut primary_graph: Option = None; - - if let Some(center_id) = focus_node_id.clone() { - let neighborhood_step = trace.start_step( - AgentRetrievalStepKindDto::Neighborhood, - vec![field("center_id", center_id.0.clone())], - ); - match controller.graph_neighborhood(GraphRequest { - center_id, - max_edges: Some(DEFAULT_MAX_EDGES), - }) { - Ok(neighborhood) => { - trace.finish_ok( - neighborhood_step, - vec![ - field("nodes", neighborhood.nodes.len().to_string()), - field("edges", neighborhood.edges.len().to_string()), - field("truncated", neighborhood.truncated.to_string()), - ], - ); - - primary_graph = Some(neighborhood.clone()); - bundle.graphs.push(GraphArtifactDto::Uml { - id: "uml-neighborhood".to_string(), - title: "Primary Neighborhood".to_string(), - graph: neighborhood, - }); - } - Err(error) => { - trace.finish_err(neighborhood_step, error.message.clone()); - trace.annotate("Neighborhood retrieval failed; continuing with trail retrieval."); - } - } - } else { - let neighborhood_step = trace.start_step( - AgentRetrievalStepKindDto::Neighborhood, - vec![field("has_focus", "false")], - ); - trace.finish_skipped(neighborhood_step, "No focus node selected.", Vec::new()); - } - - let sanitized_plans = resolved_profile - .trail_plans - .iter() - .map(|plan| sanitize_plan_filters(plan, &filter_options)) - .collect::>(); + let hits = crate::grounding::grounding_explanation_search_hits(&snapshot, max_results); - if focus_node_id.is_none() { - let trail_step = trace.start_step( - AgentRetrievalStepKindDto::Trail, - vec![field("plans", sanitized_plans.len().to_string())], - ); - trace.finish_skipped(trail_step, "No focus node selected.", Vec::new()); - } else { - for (idx, plan) in sanitized_plans.iter().enumerate() { - let trail_step = trace.start_step( - AgentRetrievalStepKindDto::Trail, - vec![ - field("index", idx.to_string()), - field("mode", format!("{:?}", plan.mode)), - field("depth", plan.depth.to_string()), - field("direction", format!("{:?}", plan.direction)), - field("max_nodes", plan.max_nodes.to_string()), - field("hide_speculative", "true"), - ], - ); + trace.finish_ok( + step, + vec![ + field("grounding_symbols", hits.len().to_string()), + field("coverage_files", snapshot.coverage.total_files.to_string()), + field( + "coverage_symbols", + snapshot.coverage.total_symbols.to_string(), + ), + ], + ); - let root_id = focus_node_id.clone().expect("checked focus node"); - let request = agent_trail_request(root_id, plan); + Ok(hits) +} - match controller.graph_trail(request) { - Ok(trail) => { - let trail_output = vec![ - field("nodes", trail.nodes.len().to_string()), - field("edges", trail.edges.len().to_string()), - field("max_nodes", plan.max_nodes.to_string()), - field("truncated", trail.truncated.to_string()), - field("omitted_edges", trail.omitted_edge_count.to_string()), - ]; - if trail.truncated { - trace.finish_truncated( - trail_step, - format!( - "Trail output hit max_nodes={}; narrow focus or lower depth.", - plan.max_nodes - ), - trail_output, - ); - trace.annotate(trail_truncated_annotation(idx + 1, plan.max_nodes)); - } else { - trace.finish_ok(trail_step, trail_output); - } - bundle.graphs.push(GraphArtifactDto::Uml { - id: format!("uml-trail-{}", idx + 1), - title: format!("Trail {}", idx + 1), - graph: trail, - }); - } - Err(error) => { - trace.finish_err(trail_step, error.message.clone()); - trace.annotate(format!("Trail {} failed and was skipped.", idx + 1)); - } - } - } +#[cfg(test)] +fn search_hit_from_grounding_symbol( + symbol: &codestory_contracts::api::GroundingSymbolDigestDto, +) -> SearchHit { + let (file_path, line) = symbol + .node_ref + .as_deref() + .and_then(split_node_ref_location) + .unwrap_or((None, symbol.line)); + SearchHit { + node_id: symbol.id.clone(), + display_name: symbol + .label + .split(" @ ") + .next() + .unwrap_or(symbol.label.as_str()) + .to_string(), + kind: symbol.kind, + file_path, + line: symbol.line.or(line), + score: 0.55, + origin: SearchHitOrigin::IndexedSymbol, + match_quality: None, + resolvable: true, + score_breakdown: Some(RetrievalScoreBreakdownDto { + lexical: 0.35, + semantic: 0.0, + graph: 0.20, + total: 0.55, + provenance: Vec::new(), + }), } +} - let details_step = trace.start_step( - AgentRetrievalStepKindDto::NodeDetails, - vec![field("has_focus", focus_node_id.is_some().to_string())], - ); - let focused_node = match focus_node_id.clone() { - Some(id) => match controller.node_details(NodeDetailsRequest { id }) { - Ok(details) => { - trace.finish_ok( - details_step, - vec![ - field("display_name", details.display_name.clone()), - field("kind", format!("{:?}", details.kind)), - ], - ); - Some(details) - } - Err(error) => { - trace.finish_err(details_step, error.message.clone()); - None - } - }, - None => { - trace.finish_skipped(details_step, "No focus node selected.", Vec::new()); - None - } - }; +#[cfg(test)] +fn split_node_ref_location(value: &str) -> Option<(Option, Option)> { + let mut parts = value.rsplitn(3, ':'); + let _name = parts.next()?; + let line = parts.next()?.parse::().ok(); + let path = parts.next().map(ToOwned::to_owned); + Some((path, line)) +} - let occurrences_step = trace.start_step( - AgentRetrievalStepKindDto::NodeOccurrences, - vec![field("candidates", hits.len().min(3).to_string())], +fn investigate_query_expansion( + controller: &AppController, + req: &AgentAskRequest, + prompt: &str, + max_results: usize, + ask_started_at: Instant, + resolved_profile: &ResolvedProfile, + trace: &mut TraceRecorder, +) -> Result, ApiError> { + let terms = prompt_search_terms(prompt) + .into_iter() + .take(4) + .collect::>(); + let expansion_step = trace.start_step( + AgentRetrievalStepKindDto::QueryExpansion, + vec![ + field("term_count", terms.len().to_string()), + field("max_results", max_results.to_string()), + ], ); - let node_occurrence_deadline = phase_deadline_ms(req, 65, 100); - if should_truncate_phase(resolved_profile, ask_started_at, node_occurrence_deadline) { - trace.finish_truncated( - occurrences_step, - "Skipped node occurrence lookups because latency budget was exceeded.", - vec![field( - "phase_deadline_ms", - node_occurrence_deadline.to_string(), - )], - ); - trace.annotate("Latency-first cutoff skipped node occurrence lookups."); - } else { - let mut occurrence_count = 0usize; - for hit in hits.iter().take(3) { - match controller.node_occurrences(NodeOccurrencesRequest { - id: hit.node_id.clone(), - }) { - Ok(occurrences) => { - occurrence_count += occurrences.len(); - } - Err(error) => { - trace.annotate(format!( - "Node occurrence lookup failed for {}: {}", - hit.display_name, error.message - )); - } - } - } - trace.finish_ok( - occurrences_step, - vec![field("occurrence_count", occurrence_count.to_string())], - ); - } - let edge_occurrences_step = trace.start_step( - AgentRetrievalStepKindDto::EdgeOccurrences, - vec![field( - "enabled", - resolved_profile.include_edge_occurrences.to_string(), - )], - ); - let edge_occurrence_deadline = phase_deadline_ms(req, 75, 100); - if should_truncate_phase(resolved_profile, ask_started_at, edge_occurrence_deadline) { - trace.finish_truncated( - edge_occurrences_step, - "Skipped edge occurrence lookup because latency budget was exceeded.", - vec![field( - "phase_deadline_ms", - edge_occurrence_deadline.to_string(), - )], - ); - trace.annotate("Latency-first cutoff skipped edge occurrence lookups."); - } else if !resolved_profile.include_edge_occurrences { + if terms.is_empty() { trace.finish_skipped( - edge_occurrences_step, - "Edge occurrences are disabled for this profile.", + expansion_step, + "No deterministic expansion terms extracted.", Vec::new(), ); - } else if let Some(edge_id) = first_edge_id_from_graphs(&bundle.graphs) { - match controller - .edge_occurrences(codestory_contracts::api::EdgeOccurrencesRequest { id: edge_id }) - { - Ok(occurrences) => { - trace.finish_ok( - edge_occurrences_step, - vec![field("occurrence_count", occurrences.len().to_string())], - ); - } - Err(error) => { - trace.finish_err(edge_occurrences_step, error.message.clone()); - } - } - } else { - trace.finish_skipped( - edge_occurrences_step, - "No edges available for lookup.", - Vec::new(), + return Ok(Vec::new()); + } + + let expansion_deadline = phase_deadline_ms(req, 45, 100); + if should_truncate_phase(resolved_profile, ask_started_at, expansion_deadline) { + trace.finish_truncated( + expansion_step, + "Skipped query expansion because latency budget was exceeded.", + vec![field("phase_deadline_ms", expansion_deadline.to_string())], ); + trace.annotate("Latency-first cutoff skipped investigation query expansion."); + return Ok(Vec::new()); } - let primary_subgraph_id = bundle.graphs.first().map(|graph| match graph { - GraphArtifactDto::Uml { id, .. } => id.clone(), - GraphArtifactDto::Mermaid { id, .. } => id.clone(), - }); - let include_structured_evidence = req.include_evidence; - let scored_by_node = scored_hits - .iter() - .map(|scored| (scored.hit.node_id.clone(), scored)) - .collect::>(); - let citations = hits - .iter() - .map(|hit| { - if let Some(scored) = scored_by_node.get(&hit.node_id) { - to_citation( - scored, - primary_subgraph_id.as_deref(), - primary_graph.as_ref(), - include_structured_evidence, - ) - } else { - to_citation_from_hit( - hit, - primary_subgraph_id.as_deref(), - primary_graph.as_ref(), - include_structured_evidence, - ) + let mut expanded = Vec::new(); + for term in &terms { + let hits = match controller.search_hybrid_scored( + SearchRequest { + query: term.clone(), + repo_text: SearchRepoTextMode::Off, + limit_per_source: max_results as u32, + expand_search_plan: false, + hybrid_weights: None, + hybrid_limits: None, + }, + req.focus_node_id.clone(), + max_results, + req.hybrid_weights.clone(), + ) { + Ok(hits) => hits, + Err(error) => { + trace.finish_err(expansion_step, error.message.clone()); + return Err(error); } - }) - .collect::>(); + }; + expanded.extend(hits); + } - bundle.hits = hits; - bundle.citations = citations; - bundle.focus_node_id = focus_node_id; - bundle.focused_node = focused_node; - bundle.primary_graph = primary_graph; + let hit_count = expanded.len(); + trace.finish_ok( + expansion_step, + vec![ + field("terms", terms.join(",")), + field("hits", hit_count.to_string()), + ], + ); + Ok(expanded) +} - Ok(bundle) +fn trail_truncated_annotation(trail_number: usize, max_nodes: u32) -> String { + format!("Trail {trail_number} was truncated at max_nodes={max_nodes}.") } -fn to_citation( - scored: &HybridSearchScoredHit, - subgraph_id: Option<&str>, - primary_graph: Option<&GraphResponse>, - include_evidence: bool, -) -> AgentCitationDto { - AgentCitationDto { - node_id: scored.hit.node_id.clone(), - display_name: scored.hit.display_name.clone(), - kind: scored.hit.kind, - file_path: scored.hit.file_path.clone(), - line: scored.hit.line, - score: scored.total_score, - origin: scored.hit.origin, - resolvable: scored.hit.resolvable, - subgraph_id: subgraph_id.map(ToOwned::to_owned), - evidence_edge_ids: if include_evidence { - evidence_edge_ids_for_node(primary_graph, &scored.hit.node_id) - } else { - Vec::new() - }, - retrieval_score_breakdown: include_evidence.then_some(RetrievalScoreBreakdownDto { - lexical: scored.lexical_score, - semantic: scored.semantic_score, - graph: scored.graph_score, - total: scored.total_score, - }), +fn agent_trail_request(root_id: NodeId, plan: &TrailPlan) -> TrailConfigDto { + TrailConfigDto { + root_id, + mode: plan.mode, + target_id: None, + depth: plan.depth, + direction: plan.direction, + caller_scope: plan.caller_scope, + edge_filter: plan.edge_filter.clone(), + show_utility_calls: true, + hide_speculative: true, + story: false, + node_filter: plan.node_filter.clone(), + max_nodes: plan.max_nodes, + layout_direction: codestory_contracts::api::LayoutDirection::Horizontal, } } -fn weak_initial_hits(prompt: &str, hits: &[SearchHit]) -> bool { - let Some(top_hit) = hits.first() else { - return true; - }; - let prompt_terms = normalized_anchor_terms(prompt); - if top_hit.score >= WEAK_INITIAL_TOP_SCORE && hit_has_indexed_anchor(top_hit, &prompt_terms) { - return false; +fn sanitize_plan_filters(plan: &TrailPlan, options: &TrailFilterOptionsDto) -> TrailPlan { + let mut sanitized = plan.clone(); + + if !options.edge_kinds.is_empty() && !plan.edge_filter.is_empty() { + sanitized + .edge_filter + .retain(|kind| options.edge_kinds.contains(kind)); } - hits.len() < WEAK_INITIAL_HIT_COUNT - || top_hit.score < WEAK_INITIAL_TOP_SCORE - || !hits - .iter() - .take(WEAK_INITIAL_HIT_COUNT) - .any(|hit| hit_has_indexed_anchor(hit, &prompt_terms)) + if !options.node_kinds.is_empty() && !plan.node_filter.is_empty() { + sanitized + .node_filter + .retain(|kind| options.node_kinds.contains(kind)); + } + + sanitized } -fn hit_has_indexed_anchor(hit: &SearchHit, prompt_terms: &HashSet) -> bool { - if hit.origin == SearchHitOrigin::TextMatch { - return false; +struct SourceContextRequest<'a> { + req: &'a AgentAskRequest, + prompt: &'a str, + resolved_profile: &'a ResolvedProfile, + ask_started_at: Instant, + focused_node: Option<&'a NodeDetailsDto>, + diagnostic_focus: bool, +} + +fn maybe_read_source_context( + controller: &AppController, + request: SourceContextRequest<'_>, + trace: &mut TraceRecorder, +) -> Option { + let source_step = trace.start_step( + AgentRetrievalStepKindDto::SourceRead, + vec![field( + "enabled", + request.resolved_profile.enable_source_reads.to_string(), + )], + ); + + if !request.resolved_profile.enable_source_reads { + trace.finish_skipped( + source_step, + "Source reads disabled by profile configuration.", + Vec::new(), + ); + return None; } - if prompt_mentions_display_name(prompt_terms, &hit.display_name) { - return true; + + if !needs_source_context(request.prompt) && !request.diagnostic_focus { + trace.finish_skipped( + source_step, + "Prompt does not request source-level context.", + Vec::new(), + ); + return None; } - hit.score_breakdown - .as_ref() - .map(|breakdown| { - breakdown.lexical > WEAK_INITIAL_MIN_LEXICAL_ANCHOR - || breakdown.graph > WEAK_INITIAL_MIN_GRAPH_ANCHOR - }) - .unwrap_or(hit.resolvable) -} + let source_deadline = phase_deadline_ms(request.req, 50, 100); + if should_truncate_phase( + request.resolved_profile, + request.ask_started_at, + source_deadline, + ) { + trace.finish_truncated( + source_step, + "Skipped source read because latency-first phase budget was exceeded.", + vec![field("phase_deadline_ms", source_deadline.to_string())], + ); + trace.annotate("Latency-first cutoff skipped source reads."); + return None; + } -fn prompt_mentions_display_name(prompt_terms: &HashSet, display_name: &str) -> bool { - let display_terms = normalized_anchor_terms(display_name); - !display_terms.is_empty() && display_terms.iter().all(|term| prompt_terms.contains(term)) -} + let Some(node) = request.focused_node else { + trace.finish_skipped(source_step, "No focused node available.", Vec::new()); + return None; + }; -fn investigation_focus_anchor(prompt: &str, hits: &[SearchHit]) -> Option { - let prompt_terms = normalized_anchor_terms(prompt); - hits.iter() - .find(|hit| { - hit.resolvable && prompt_mentions_display_name(&prompt_terms, &hit.display_name) - }) - .map(|hit| hit.node_id.clone()) -} + let (Some(path), Some(line)) = (node.file_path.clone(), node.start_line) else { + trace.finish_skipped( + source_step, + "Focused node has no file path and line metadata.", + Vec::new(), + ); + return None; + }; -fn normalized_anchor_terms(value: &str) -> HashSet { - value - .split(|ch: char| !ch.is_ascii_alphanumeric()) - .filter_map(|term| { - let term = term.trim().to_ascii_lowercase(); - (term.len() >= 3).then_some(term) - }) - .collect() + match controller.bounded_file_snippet( + &path, + line, + 6, + request.resolved_profile.max_source_bytes, + SOURCE_SNIPPET_TRUNCATION_SUFFIX, + ) { + Ok((resolved_path, bounded)) => { + let context = FocusedSourceContext { + path: resolved_path, + line, + snippet: bounded.markdown, + }; + trace.finish_ok( + source_step, + vec![ + field("path", context.path.clone()), + field("line", context.line.to_string()), + field( + "max_source_bytes", + request.resolved_profile.max_source_bytes.to_string(), + ), + field("snippet_bytes", context.snippet.len().to_string()), + field("truncated", bounded.truncated.to_string()), + ], + ); + Some(context) + } + Err(error) => { + trace.finish_err(source_step, error.message.clone()); + None + } + } } -fn should_investigate(profile: &ResolvedProfile) -> bool { - profile.preset == codestory_contracts::api::AgentRetrievalPresetDto::Investigate +fn needs_source_context(prompt: &str) -> bool { + let normalized = prompt.to_ascii_lowercase(); + [ + "code", + "snippet", + "implementation", + "source", + "line", + "read", + ] + .iter() + .any(|keyword| normalized.contains(keyword)) } -fn has_literal_diagnostic_signal(prompt: &str) -> bool { - prompt.contains('`') - || prompt.contains('/') - || prompt.contains('\\') - || prompt.contains("::") - || prompt.contains(".rs") - || prompt - .split(|ch: char| !ch.is_ascii_alphanumeric() && ch != '_') - .any(|token| { - token.contains('_') - || (token.len() >= 4 - && token - .chars() - .filter(|ch| ch.is_ascii_alphabetic()) - .all(|ch| ch.is_ascii_uppercase())) - }) +#[cfg(test)] +struct BoundedMarkdownSnippet { + markdown: String, + truncated: bool, } -fn is_repo_explanation_prompt(prompt: &str) -> bool { - let lower = prompt.to_ascii_lowercase(); - let subject = lower.contains("repo") || lower.contains("project") || lower.contains("codebase"); - let intent = lower.contains("fit together") - || lower.contains("how does") - || lower.contains("explain") - || lower.contains("overview") - || lower.contains("architecture"); - subject && intent -} +#[cfg(test)] +fn bounded_markdown_snippet( + text: &str, + focus_line: Option, + context_lines: usize, + max_bytes: usize, +) -> BoundedMarkdownSnippet { + let mut snippet = crate::markdown_snippet(text, focus_line, context_lines); + if snippet.len() <= max_bytes { + return BoundedMarkdownSnippet { + markdown: snippet, + truncated: false, + }; + } -fn repo_explanation_grounding_hits( - controller: &AppController, + if max_bytes <= SOURCE_SNIPPET_TRUNCATION_SUFFIX.len() { + snippet = SOURCE_SNIPPET_TRUNCATION_SUFFIX.to_string(); + while snippet.len() > max_bytes { + snippet.pop(); + } + return BoundedMarkdownSnippet { + markdown: snippet, + truncated: true, + }; + } + + let content_budget = max_bytes - SOURCE_SNIPPET_TRUNCATION_SUFFIX.len(); + while snippet.len() > content_budget { + snippet.pop(); + } + snippet.push_str(SOURCE_SNIPPET_TRUNCATION_SUFFIX); + debug_assert!(snippet.len() <= max_bytes); + BoundedMarkdownSnippet { + markdown: snippet, + truncated: true, + } +} +fn build_mermaid_artifacts( + profile: &ResolvedProfile, req: &AgentAskRequest, - max_results: usize, - ask_started_at: Instant, - resolved_profile: &ResolvedProfile, - trace: &mut TraceRecorder, -) -> Result, ApiError> { - let step = trace.start_step( - AgentRetrievalStepKindDto::QueryExpansion, - vec![ - field("strategy", "grounding_snapshot"), - field("max_results", max_results.to_string()), - ], + prompt: &str, + ask_started_at: Instant, + bundle: &RetrievalBundle, + trace: &mut TraceRecorder, +) -> Vec { + let mermaid_step = trace.start_step( + AgentRetrievalStepKindDto::MermaidSynthesis, + vec![field("existing_graphs", bundle.graphs.len().to_string())], ); - let deadline_ms = phase_deadline_ms(req, 55, 100); - if should_truncate_phase(resolved_profile, ask_started_at, deadline_ms) { + + let mut artifacts = Vec::new(); + let mermaid_deadline = phase_deadline_ms(req, 85, 100); + if should_truncate_phase(profile, ask_started_at, mermaid_deadline) { trace.finish_truncated( - step, - "Skipped grounding snapshot supplement because latency budget was exceeded.", - vec![field("phase_deadline_ms", deadline_ms.to_string())], + mermaid_step, + "Skipped mermaid synthesis because latency budget was exceeded.", + vec![field("phase_deadline_ms", mermaid_deadline.to_string())], ); - return Ok(Vec::new()); + trace.annotate("Latency-first cutoff skipped mermaid synthesis."); + return artifacts; } - let snapshot = match controller.grounding_snapshot(GroundingBudgetDto::Strict) { - Ok(snapshot) => snapshot, - Err(error) => { - trace.finish_err(step, error.message.clone()); - return Err(error); + let primary_graph = bundle + .primary_graph + .clone() + .or_else(|| first_uml_graph(&bundle.graphs)); + + if let Some(graph) = primary_graph { + artifacts.push(GraphArtifactDto::Mermaid { + id: "mermaid-overview".to_string(), + title: "Graph Overview".to_string(), + diagram: "flowchart".to_string(), + mermaid_syntax: mermaid_flowchart(&graph), + }); + + if matches!( + profile.preset, + codestory_contracts::api::AgentRetrievalPresetDto::Callflow + ) { + artifacts.push(GraphArtifactDto::Mermaid { + id: "mermaid-sequence".to_string(), + title: "Sequence Narrative".to_string(), + diagram: "sequenceDiagram".to_string(), + mermaid_syntax: mermaid_sequence(&graph), + }); } - }; - let hits = crate::grounding::grounding_explanation_search_hits(&snapshot, max_results); + if prompt.to_ascii_lowercase().contains("timeline") { + artifacts.push(GraphArtifactDto::Mermaid { + id: "mermaid-timeline".to_string(), + title: "Timeline".to_string(), + diagram: "gantt".to_string(), + mermaid_syntax: mermaid_gantt(&bundle.hits), + }); + } + } + + if artifacts.is_empty() { + artifacts.push(GraphArtifactDto::Mermaid { + id: "mermaid-diagnostic".to_string(), + title: "Retrieval Diagnostic".to_string(), + diagram: "flowchart".to_string(), + mermaid_syntax: diagnostic_mermaid(prompt, bundle.hits.len()), + }); + } trace.finish_ok( - step, - vec![ - field("grounding_symbols", hits.len().to_string()), - field("coverage_files", snapshot.coverage.total_files.to_string()), - field( - "coverage_symbols", - snapshot.coverage.total_symbols.to_string(), - ), - ], + mermaid_step, + vec![field("mermaid_count", artifacts.len().to_string())], ); - - Ok(hits) + artifacts } -#[cfg(test)] -fn search_hit_from_grounding_symbol( - symbol: &codestory_contracts::api::GroundingSymbolDigestDto, -) -> SearchHit { - let (file_path, line) = symbol - .node_ref - .as_deref() - .and_then(split_node_ref_location) - .unwrap_or((None, symbol.line)); - SearchHit { - node_id: symbol.id.clone(), - display_name: symbol - .label - .split(" @ ") - .next() - .unwrap_or(symbol.label.as_str()) - .to_string(), - kind: symbol.kind, - file_path, - line: symbol.line.or(line), - score: 0.55, - origin: SearchHitOrigin::IndexedSymbol, - match_quality: None, - resolvable: true, - score_breakdown: Some(RetrievalScoreBreakdownDto { - lexical: 0.35, - semantic: 0.0, - graph: 0.20, - total: 0.55, - }), - } +fn first_uml_graph(graphs: &[GraphArtifactDto]) -> Option { + graphs.iter().find_map(|graph| match graph { + GraphArtifactDto::Uml { graph, .. } => Some(graph.clone()), + GraphArtifactDto::Mermaid { .. } => None, + }) } -#[cfg(test)] -fn split_node_ref_location(value: &str) -> Option<(Option, Option)> { - let mut parts = value.rsplitn(3, ':'); - let _name = parts.next()?; - let line = parts.next()?.parse::().ok(); - let path = parts.next().map(ToOwned::to_owned); - Some((path, line)) +fn first_edge_id_from_graphs( + graphs: &[GraphArtifactDto], +) -> Option { + graphs.iter().find_map(|graph| match graph { + GraphArtifactDto::Uml { graph, .. } => graph.edges.first().map(|edge| edge.id.clone()), + GraphArtifactDto::Mermaid { .. } => None, + }) } -fn investigate_query_expansion( - controller: &AppController, - req: &AgentAskRequest, +fn build_sections( prompt: &str, - max_results: usize, - ask_started_at: Instant, resolved_profile: &ResolvedProfile, - trace: &mut TraceRecorder, -) -> Result, ApiError> { - let terms = prompt_search_terms(prompt) - .into_iter() - .take(4) + bundle: &RetrievalBundle, + source_context: Option<&FocusedSourceContext>, +) -> Vec { + let mut sections = Vec::new(); + + let mut analysis_blocks = vec![AgentResponseBlockDto::Markdown { + markdown: "Answer assembled from indexed DB-first retrieval evidence.".to_string(), + }]; + + if let Some(primary_mermaid_id) = first_mermaid_graph_id(&bundle.graphs) { + analysis_blocks.push(AgentResponseBlockDto::Mermaid { + graph_id: primary_mermaid_id, + }); + } + + sections.push(AgentResponseSectionDto { + id: "analysis".to_string(), + title: "Analysis".to_string(), + blocks: analysis_blocks, + }); + + sections.push(AgentResponseSectionDto { + id: "retrieval-evidence".to_string(), + title: "Retrieval Evidence".to_string(), + blocks: vec![AgentResponseBlockDto::Markdown { + markdown: retrieval_markdown(prompt, resolved_profile, bundle, source_context), + }], + }); + + let mermaid_ids = bundle + .graphs + .iter() + .filter_map(|graph| match graph { + GraphArtifactDto::Mermaid { id, .. } => Some(id.clone()), + GraphArtifactDto::Uml { .. } => None, + }) .collect::>(); - let expansion_step = trace.start_step( - AgentRetrievalStepKindDto::QueryExpansion, - vec![ - field("term_count", terms.len().to_string()), - field("max_results", max_results.to_string()), - ], + + if !mermaid_ids.is_empty() { + let mut blocks = vec![AgentResponseBlockDto::Markdown { + markdown: "Mermaid diagrams generated from indexed graph retrieval.".to_string(), + }]; + for graph_id in mermaid_ids { + blocks.push(AgentResponseBlockDto::Mermaid { graph_id }); + } + + sections.push(AgentResponseSectionDto { + id: "diagrams".to_string(), + title: "Diagrams".to_string(), + blocks, + }); + } + + sections +} + +fn retrieval_markdown( + prompt: &str, + profile: &ResolvedProfile, + bundle: &RetrievalBundle, + source_context: Option<&FocusedSourceContext>, +) -> String { + let mut markdown = String::new(); + + let _ = writeln!(markdown, "Prompt: **{}**", prompt.trim().replace('\n', " ")); + let _ = writeln!( + markdown, + "Resolved profile: `{:?}` (`{:?}` mode)", + profile.preset, profile.policy_mode + ); + let _ = writeln!( + markdown, + "Indexed hits: `{}` | Graph artifacts: `{}`", + bundle.hits.len(), + bundle.graphs.len() ); - if terms.is_empty() { - trace.finish_skipped( - expansion_step, - "No deterministic expansion terms extracted.", - Vec::new(), + if let Some(node) = bundle.focused_node.as_ref() { + let _ = writeln!( + markdown, + "Focused symbol: **{}** (`{:?}`)", + node.display_name, node.kind + ); + } + + if let Some(source) = source_context { + let _ = writeln!( + markdown, + "\nSource snippet from `{}`:{}:\n", + source.path, source.line + ); + markdown.push_str(&source.snippet); + markdown.push('\n'); + } + + markdown.push_str("\nWhat I checked:\n"); + markdown.push_str("- Initial indexed-symbol search with current hybrid ranking.\n"); + if bundle.diagnostic_supplement_used { + markdown.push_str("- Deterministic query expansion because initial hits were weak.\n"); + } + if bundle.repo_explanation_supplement_used { + markdown.push_str( + "- Grounding snapshot diagnostic supplement for broad repo overview evidence.\n", ); - return Ok(Vec::new()); + } + if !bundle.diagnostic_supplement_used && should_investigate(profile) { + markdown.push_str("- Initial sidecar hits cleared the investigation confidence gate.\n"); } - let expansion_deadline = phase_deadline_ms(req, 45, 100); - if should_truncate_phase(resolved_profile, ask_started_at, expansion_deadline) { - trace.finish_truncated( - expansion_step, - "Skipped query expansion because latency budget was exceeded.", - vec![field("phase_deadline_ms", expansion_deadline.to_string())], + if bundle.hits.is_empty() { + markdown.push_str( + "\nNo indexed symbol matches found. Try: symbol names, module paths, or re-run indexing.\n", ); - trace.annotate("Latency-first cutoff skipped investigation query expansion."); - return Ok(Vec::new()); + } else { + markdown.push_str("\nTop indexed matches:\n"); + for hit in bundle.hits.iter().take(6) { + write_indexed_match_markdown(&mut markdown, hit); + } } - let mut expanded = Vec::new(); - for term in &terms { - let hits = match controller.search_hybrid_scored( - SearchRequest { - query: term.clone(), - repo_text: SearchRepoTextMode::Off, - limit_per_source: max_results as u32, - expand_search_plan: false, - hybrid_weights: None, - hybrid_limits: None, - }, - req.focus_node_id.clone(), - max_results, - req.hybrid_weights.clone(), - ) { - Ok(hits) => hits, - Err(error) => { - trace.finish_err(expansion_step, error.message.clone()); - return Err(error); - } - }; - expanded.extend(hits); + if should_investigate(profile) && weak_initial_hits(prompt, &bundle.hits) { + markdown.push_str("\nGaps:\n"); + markdown.push_str( + "- Confidence is low: investigation mode could not find enough strong indexed-symbol evidence within its bounded search.\n", + ); + if bundle.hits.iter().any(SearchHit::is_text_match) { + markdown.push_str( + "- Repo-text hits cite file/line locations only and were not treated as resolvable symbols.\n", + ); + } } - let hit_count = expanded.len(); - trace.finish_ok( - expansion_step, - vec![ - field("terms", terms.join(",")), - field("hits", hit_count.to_string()), - ], - ); - Ok(expanded) + markdown } -fn trail_truncated_annotation(trail_number: usize, max_nodes: u32) -> String { - format!("Trail {trail_number} was truncated at max_nodes={max_nodes}.") +fn write_indexed_match_markdown(markdown: &mut String, hit: &SearchHit) { + let _ = writeln!( + markdown, + "- **{}** [{:?}] origin `{}` resolvable `{}` score `{:.3}`{}", + hit.display_name, + hit.kind, + hit.origin.as_str(), + hit.resolvable, + hit.score, + search_hit_location_suffix(hit) + ); } -fn agent_trail_request(root_id: NodeId, plan: &TrailPlan) -> TrailConfigDto { - TrailConfigDto { - root_id, - mode: plan.mode, - target_id: None, - depth: plan.depth, - direction: plan.direction, - caller_scope: plan.caller_scope, - edge_filter: plan.edge_filter.clone(), - show_utility_calls: true, - hide_speculative: true, - story: false, - node_filter: plan.node_filter.clone(), - max_nodes: plan.max_nodes, - layout_direction: codestory_contracts::api::LayoutDirection::Horizontal, +fn search_hit_location_suffix(hit: &SearchHit) -> String { + match (&hit.file_path, hit.line) { + (Some(path), Some(line)) => format!(" ({}:{})", path, line), + (Some(path), None) => format!(" ({})", path), + _ => String::new(), } } -fn sanitize_plan_filters(plan: &TrailPlan, options: &TrailFilterOptionsDto) -> TrailPlan { - let mut sanitized = plan.clone(); - - if !options.edge_kinds.is_empty() && !plan.edge_filter.is_empty() { - sanitized - .edge_filter - .retain(|kind| options.edge_kinds.contains(kind)); - } - - if !options.node_kinds.is_empty() && !plan.node_filter.is_empty() { - sanitized - .node_filter - .retain(|kind| options.node_kinds.contains(kind)); - } +fn first_mermaid_graph_id(graphs: &[GraphArtifactDto]) -> Option { + graphs.iter().find_map(|graph| match graph { + GraphArtifactDto::Mermaid { id, .. } => Some(id.clone()), + GraphArtifactDto::Uml { .. } => None, + }) +} - sanitized +fn summarize_response(resolved_profile: &ResolvedProfile, bundle: &RetrievalBundle) -> String { + format!( + "DB-first retrieval ({:?}/{:?}) returned {} indexed match(es) and {} graph artifact(s).", + resolved_profile.preset, + resolved_profile.policy_mode, + bundle.hits.len(), + bundle.graphs.len() + ) } -struct SourceContextRequest<'a> { - req: &'a AgentAskRequest, - prompt: &'a str, - resolved_profile: &'a ResolvedProfile, - ask_started_at: Instant, - focused_node: Option<&'a NodeDetailsDto>, - diagnostic_focus: bool, +fn next_request_id() -> String { + let nanos = SystemTime::now() + .duration_since(UNIX_EPOCH) + .unwrap_or_default() + .as_nanos(); + format!("ask-{}", nanos) } -fn maybe_read_source_context( - controller: &AppController, - request: SourceContextRequest<'_>, - trace: &mut TraceRecorder, -) -> Option { - let source_step = trace.start_step( - AgentRetrievalStepKindDto::SourceRead, - vec![field( - "enabled", - request.resolved_profile.enable_source_reads.to_string(), - )], - ); +#[allow(dead_code)] +fn merge_search_hits(into: &mut Vec, additional: Vec, max_candidates: usize) { + let mut by_id = HashMap::::new(); - if !request.resolved_profile.enable_source_reads { - trace.finish_skipped( - source_step, - "Source reads disabled by profile configuration.", - Vec::new(), - ); - return None; + for hit in into.drain(..) { + by_id.insert(hit.node_id.clone(), hit); } - if !needs_source_context(request.prompt) && !request.diagnostic_focus { - trace.finish_skipped( - source_step, - "Prompt does not request source-level context.", - Vec::new(), - ); - return None; + for hit in additional { + by_id + .entry(hit.node_id.clone()) + .and_modify(|existing| { + if hit.score > existing.score { + *existing = hit.clone(); + } + }) + .or_insert(hit); } - let source_deadline = phase_deadline_ms(request.req, 50, 100); - if should_truncate_phase( - request.resolved_profile, - request.ask_started_at, - source_deadline, - ) { - trace.finish_truncated( - source_step, - "Skipped source read because latency-first phase budget was exceeded.", - vec![field("phase_deadline_ms", source_deadline.to_string())], - ); - trace.annotate("Latency-first cutoff skipped source reads."); - return None; - } + let mut merged = by_id.into_values().collect::>(); + merged.sort_by(|left, right| { + right + .score + .partial_cmp(&left.score) + .unwrap_or(Ordering::Equal) + }); + merged.truncate(max_candidates); + *into = merged; +} - let Some(node) = request.focused_node else { - trace.finish_skipped(source_step, "No focused node available.", Vec::new()); - return None; - }; +fn merge_scored_hits( + into: &mut Vec, + additional: Vec, + max_candidates: usize, +) { + let mut by_id = HashMap::::new(); - let (Some(path), Some(line)) = (node.file_path.clone(), node.start_line) else { - trace.finish_skipped( - source_step, - "Focused node has no file path and line metadata.", - Vec::new(), - ); - return None; - }; + for hit in into.drain(..) { + by_id.insert(hit.hit.node_id.clone(), hit); + } - match controller.bounded_file_snippet( - &path, - line, - 6, - request.resolved_profile.max_source_bytes, - SOURCE_SNIPPET_TRUNCATION_SUFFIX, - ) { - Ok((resolved_path, bounded)) => { - let context = FocusedSourceContext { - path: resolved_path, - line, - snippet: bounded.markdown, - }; - trace.finish_ok( - source_step, - vec![ - field("path", context.path.clone()), - field("line", context.line.to_string()), - field( - "max_source_bytes", - request.resolved_profile.max_source_bytes.to_string(), - ), - field("snippet_bytes", context.snippet.len().to_string()), - field("truncated", bounded.truncated.to_string()), - ], - ); - Some(context) - } - Err(error) => { - trace.finish_err(source_step, error.message.clone()); - None - } + for hit in additional { + by_id + .entry(hit.hit.node_id.clone()) + .and_modify(|existing| { + if hit.total_score > existing.total_score { + *existing = hit.clone(); + } + }) + .or_insert(hit); } -} -fn needs_source_context(prompt: &str) -> bool { - let normalized = prompt.to_ascii_lowercase(); - [ - "code", - "snippet", - "implementation", - "source", - "line", - "read", - ] - .iter() - .any(|keyword| normalized.contains(keyword)) + let mut merged = by_id.into_values().collect::>(); + merged.sort_by(|left, right| { + right + .total_score + .partial_cmp(&left.total_score) + .unwrap_or(Ordering::Equal) + }); + merged.truncate(max_candidates); + *into = merged; } #[cfg(test)] -struct BoundedMarkdownSnippet { - markdown: String, - truncated: bool, -} +mod tests { + use super::*; + use crate::agent::eval_probes::{ + EVAL_PROBES_ENV, pop_eval_probes_test_override, push_eval_probes_test_override, + }; + use crate::agent::profiles::ResolvedProfile; -#[cfg(test)] -fn bounded_markdown_snippet( - text: &str, - focus_line: Option, - context_lines: usize, - max_bytes: usize, -) -> BoundedMarkdownSnippet { - let mut snippet = crate::markdown_snippet(text, focus_line, context_lines); - if snippet.len() <= max_bytes { - return BoundedMarkdownSnippet { - markdown: snippet, - truncated: false, - }; + struct EvalProbesGuard; + + impl EvalProbesGuard { + fn enabled() -> Self { + push_eval_probes_test_override(); + Self + } } - if max_bytes <= SOURCE_SNIPPET_TRUNCATION_SUFFIX.len() { - snippet = SOURCE_SNIPPET_TRUNCATION_SUFFIX.to_string(); - while snippet.len() > max_bytes { - snippet.pop(); + impl Drop for EvalProbesGuard { + fn drop(&mut self) { + pop_eval_probes_test_override(); } - return BoundedMarkdownSnippet { - markdown: snippet, - truncated: true, - }; } - let content_budget = max_bytes - SOURCE_SNIPPET_TRUNCATION_SUFFIX.len(); - while snippet.len() > content_budget { - snippet.pop(); + struct EnvVarGuard { + key: &'static str, + previous: Option, } - snippet.push_str(SOURCE_SNIPPET_TRUNCATION_SUFFIX); - debug_assert!(snippet.len() <= max_bytes); - BoundedMarkdownSnippet { - markdown: snippet, - truncated: true, + + impl EnvVarGuard { + fn cleared(key: &'static str) -> Self { + let previous = std::env::var_os(key); + // SAFETY: tests use this guard to isolate one env var for this process-local + // regression and restore it on drop. + unsafe { + std::env::remove_var(key); + } + Self { key, previous } + } } -} -fn build_mermaid_artifacts( - profile: &ResolvedProfile, - req: &AgentAskRequest, - prompt: &str, - ask_started_at: Instant, - bundle: &RetrievalBundle, - trace: &mut TraceRecorder, -) -> Vec { - let mermaid_step = trace.start_step( - AgentRetrievalStepKindDto::MermaidSynthesis, - vec![field("existing_graphs", bundle.graphs.len().to_string())], - ); - let mut artifacts = Vec::new(); - let mermaid_deadline = phase_deadline_ms(req, 85, 100); - if should_truncate_phase(profile, ask_started_at, mermaid_deadline) { - trace.finish_truncated( - mermaid_step, - "Skipped mermaid synthesis because latency budget was exceeded.", - vec![field("phase_deadline_ms", mermaid_deadline.to_string())], - ); - trace.annotate("Latency-first cutoff skipped mermaid synthesis."); - return artifacts; + impl Drop for EnvVarGuard { + fn drop(&mut self) { + // SAFETY: restores the process-local env var captured by this guard. + unsafe { + if let Some(previous) = self.previous.take() { + std::env::set_var(self.key, previous); + } else { + std::env::remove_var(self.key); + } + } + } } - let primary_graph = bundle - .primary_graph - .clone() - .or_else(|| first_uml_graph(&bundle.graphs)); + fn latency_profile() -> ResolvedProfile { + ResolvedProfile { + preset: codestory_contracts::api::AgentRetrievalPresetDto::Architecture, + policy_mode: AgentRetrievalPolicyModeDto::LatencyFirst, + trail_plans: Vec::new(), + include_edge_occurrences: false, + enable_source_reads: true, + max_search_results: 25, + max_source_bytes: 32 * 1024, + } + } - if let Some(graph) = primary_graph { - artifacts.push(GraphArtifactDto::Mermaid { - id: "mermaid-overview".to_string(), - title: "Graph Overview".to_string(), - diagram: "flowchart".to_string(), - mermaid_syntax: mermaid_flowchart(&graph), - }); + #[test] + fn agent_trail_request_hides_speculative_edges_by_default() { + let plan = TrailPlan { + mode: codestory_contracts::api::TrailMode::AllReferenced, + depth: 3, + direction: codestory_contracts::api::TrailDirection::Outgoing, + caller_scope: codestory_contracts::api::TrailCallerScope::ProductionOnly, + edge_filter: vec![codestory_contracts::api::EdgeKind::CALL], + node_filter: vec![codestory_contracts::api::NodeKind::FUNCTION], + max_nodes: 42, + }; - if matches!( - profile.preset, - codestory_contracts::api::AgentRetrievalPresetDto::Callflow - ) { - artifacts.push(GraphArtifactDto::Mermaid { - id: "mermaid-sequence".to_string(), - title: "Sequence Narrative".to_string(), - diagram: "sequenceDiagram".to_string(), - mermaid_syntax: mermaid_sequence(&graph), - }); - } + let request = agent_trail_request(NodeId("root".to_string()), &plan); - if prompt.to_ascii_lowercase().contains("timeline") { - artifacts.push(GraphArtifactDto::Mermaid { - id: "mermaid-timeline".to_string(), - title: "Timeline".to_string(), - diagram: "gantt".to_string(), - mermaid_syntax: mermaid_gantt(&bundle.hits), - }); + assert!(request.hide_speculative); + assert!(request.show_utility_calls); + assert!(!request.story); + assert_eq!(request.edge_filter, plan.edge_filter); + assert_eq!(request.node_filter, plan.node_filter); + assert_eq!(request.max_nodes, plan.max_nodes); + } + + fn test_search_hit(node_id: &str, score: f32) -> SearchHit { + SearchHit { + node_id: codestory_contracts::api::NodeId(node_id.to_string()), + display_name: node_id.to_string(), + kind: codestory_contracts::api::NodeKind::FUNCTION, + file_path: None, + line: None, + score, + origin: codestory_contracts::api::SearchHitOrigin::IndexedSymbol, + match_quality: None, + resolvable: true, + score_breakdown: None, } } - if artifacts.is_empty() { - artifacts.push(GraphArtifactDto::Mermaid { - id: "mermaid-diagnostic".to_string(), - title: "Retrieval Diagnostic".to_string(), - diagram: "flowchart".to_string(), - mermaid_syntax: diagnostic_mermaid(prompt, bundle.hits.len()), + fn test_semantic_only_hit(node_id: &str, score: f32) -> SearchHit { + let mut hit = test_search_hit(node_id, score); + hit.score_breakdown = Some(RetrievalScoreBreakdownDto { + lexical: 0.0, + semantic: score, + graph: 0.0, + total: score, + provenance: Vec::new(), }); + hit } - trace.finish_ok( - mermaid_step, - vec![field("mermaid_count", artifacts.len().to_string())], - ); - artifacts -} - -fn first_uml_graph(graphs: &[GraphArtifactDto]) -> Option { - graphs.iter().find_map(|graph| match graph { - GraphArtifactDto::Uml { graph, .. } => Some(graph.clone()), - GraphArtifactDto::Mermaid { .. } => None, - }) -} - -fn first_edge_id_from_graphs( - graphs: &[GraphArtifactDto], -) -> Option { - graphs.iter().find_map(|graph| match graph { - GraphArtifactDto::Uml { graph, .. } => graph.edges.first().map(|edge| edge.id.clone()), - GraphArtifactDto::Mermaid { .. } => None, - }) -} - -fn build_sections( - prompt: &str, - resolved_profile: &ResolvedProfile, - bundle: &RetrievalBundle, - source_context: Option<&FocusedSourceContext>, -) -> Vec { - let mut sections = Vec::new(); + fn test_packet_citation(display_name: &str, file_path: &str, score: f32) -> AgentCitationDto { + AgentCitationDto { + node_id: NodeId(display_name.to_string()), + display_name: display_name.to_string(), + kind: codestory_contracts::api::NodeKind::FUNCTION, + file_path: Some(file_path.to_string()), + line: Some(10), + score, + origin: SearchHitOrigin::IndexedSymbol, + resolvable: true, + subgraph_id: None, + evidence_edge_ids: Vec::new(), + retrieval_score_breakdown: Some(RetrievalScoreBreakdownDto { + lexical: 0.4, + semantic: 0.2, + graph: 0.3, + total: score, + provenance: Vec::new(), + }), + } + } - let mut analysis_blocks = vec![AgentResponseBlockDto::Markdown { - markdown: "Answer assembled from indexed DB-first retrieval evidence.".to_string(), - }]; + fn packet_answer_fixture(question: &str, citations: Vec) -> AgentAnswerDto { + AgentAnswerDto { + answer_id: "packet-fixture".to_string(), + prompt: question.to_string(), + summary: "Fixture packet is covered by cited anchors.".to_string(), + freshness: None, + sections: vec![AgentResponseSectionDto { + id: "answer".to_string(), + title: "Answer".to_string(), + blocks: vec![AgentResponseBlockDto::Markdown { + markdown: "Packet answer assembled from cited anchors.".to_string(), + }], + }], + citations, + subgraph_ids: Vec::new(), + retrieval_version: "test".to_string(), + graphs: Vec::new(), + retrieval_trace: codestory_contracts::api::AgentRetrievalTraceDto { + request_id: "packet-fixture".to_string(), + resolved_profile: AgentRetrievalPresetDto::Architecture, + policy_mode: AgentRetrievalPolicyModeDto::LatencyFirst, + total_latency_ms: 1, + sla_target_ms: None, + sla_missed: false, + semantic_fallback_count: 0, + semantic_fallbacks: Vec::new(), + annotations: Vec::new(), + steps: Vec::new(), + packet_sidecar_diagnostics: Vec::new(), + retrieval_shadow: None, + }, + } + } - if let Some(primary_mermaid_id) = first_mermaid_graph_id(&bundle.graphs) { - analysis_blocks.push(AgentResponseBlockDto::Mermaid { - graph_id: primary_mermaid_id, - }); + fn packet_fixture_project_root() -> &'static std::path::Path { + std::path::Path::new("C:/workspace/project root") } - sections.push(AgentResponseSectionDto { - id: "analysis".to_string(), - title: "Analysis".to_string(), - blocks: analysis_blocks, - }); + fn packet_temp_root(name: &str) -> std::path::PathBuf { + let suffix = std::time::SystemTime::now() + .duration_since(std::time::UNIX_EPOCH) + .expect("system clock should be after unix epoch") + .as_nanos(); + std::env::temp_dir().join(format!("codestory-{name}-{}-{suffix}", std::process::id())) + } - sections.push(AgentResponseSectionDto { - id: "retrieval-evidence".to_string(), - title: "Retrieval Evidence".to_string(), - blocks: vec![AgentResponseBlockDto::Markdown { - markdown: retrieval_markdown(prompt, resolved_profile, bundle, source_context), - }], - }); + fn write_packet_fixture_file( + root: &std::path::Path, + relative_path: &str, + source: &str, + ) -> std::path::PathBuf { + let path = root.join(relative_path); + std::fs::create_dir_all(path.parent().expect("fixture path should have a parent")) + .expect("create fixture parent directory"); + std::fs::write(&path, source).expect("write fixture source file"); + path + } - let mermaid_ids = bundle - .graphs - .iter() - .filter_map(|graph| match graph { - GraphArtifactDto::Mermaid { id, .. } => Some(id.clone()), - GraphArtifactDto::Uml { .. } => None, - }) - .collect::>(); + fn build_sufficient_packet_fixture( + question: &str, + task_class: PacketTaskClassDto, + citations: Vec, + ) -> (AgentAnswerDto, PacketSufficiencyDto) { + let limits = packet_budget_limits(PacketBudgetModeDto::Compact); + let mut answer = packet_answer_fixture(question, citations); + rank_packet_evidence(question, &mut answer); + append_packet_evidence_sections(&mut answer, task_class, &limits); + let budget = apply_packet_budget( + packet_fixture_project_root(), + question, + PacketTaskClassDto::ArchitectureExplanation, + PacketBudgetModeDto::Compact, + limits, + &mut answer, + ); + let sufficiency = build_packet_sufficiency( + packet_fixture_project_root(), + question, + task_class, + &answer, + &budget, + ); + (answer, sufficiency) + } - if !mermaid_ids.is_empty() { - let mut blocks = vec![AgentResponseBlockDto::Markdown { - markdown: "Mermaid diagrams generated from indexed graph retrieval.".to_string(), - }]; - for graph_id in mermaid_ids { - blocks.push(AgentResponseBlockDto::Mermaid { graph_id }); - } + #[test] + fn packet_sufficiency_does_not_promote_summary_to_covered_claim() { + let question = "Explain packet sufficiency proof boundaries."; + let answer = packet_answer_fixture(question, Vec::new()); + let budget = PacketBudgetDto { + requested: PacketBudgetModeDto::Compact, + limits: packet_budget_limits(PacketBudgetModeDto::Compact), + used: PacketBudgetUsageDto { + anchors: 0, + files: 0, + snippets: 0, + trail_edges: 0, + output_bytes: 0, + }, + truncated: false, + omitted_sections: Vec::new(), + next_deeper_command: None, + }; + let sufficiency = build_packet_sufficiency( + packet_fixture_project_root(), + question, + PacketTaskClassDto::ArchitectureExplanation, + &answer, + &budget, + ); - sections.push(AgentResponseSectionDto { - id: "diagrams".to_string(), - title: "Diagrams".to_string(), - blocks, - }); + assert_ne!(sufficiency.status, PacketSufficiencyStatusDto::Sufficient); + assert!( + sufficiency.covered_claims.is_empty(), + "covered_claims must only contain source-backed claims, not summary fallback: {sufficiency:?}" + ); } - sections -} - -fn retrieval_markdown( - prompt: &str, - profile: &ResolvedProfile, - bundle: &RetrievalBundle, - source_context: Option<&FocusedSourceContext>, -) -> String { - let mut markdown = String::new(); + #[test] + fn packet_symbol_probes_prioritize_flow_specific_terms() { + let _eval_probes = EvalProbesGuard::enabled(); + let queries = packet_symbol_probe_queries( + "Explain how `codex exec --json` flows from the top-level CLI into the exec runtime, app-server thread and turn start requests, and JSONL event output.", + PacketTaskClassDto::ArchitectureExplanation, + PacketBudgetModeDto::Standard, + ); - let _ = writeln!(markdown, "Prompt: **{}**", prompt.trim().replace('\n', " ")); - let _ = writeln!( - markdown, - "Resolved profile: `{:?}` (`{:?}` mode)", - profile.preset, profile.policy_mode - ); - let _ = writeln!( - markdown, - "Indexed hits: `{}` | Graph artifacts: `{}`", - bundle.hits.len(), - bundle.graphs.len() - ); + let position = |needle: &str| { + queries + .iter() + .position(|query| query == needle) + .unwrap_or_else(|| panic!("missing packet query `{needle}` in {queries:?}")) + }; - if let Some(node) = bundle.focused_node.as_ref() { - let _ = writeln!( - markdown, - "Focused symbol: **{}** (`{:?}`)", - node.display_name, node.kind + assert!(position("run_exec_session") < position("run_exec")); + assert!(position("Subcommand::Exec") < position("Subcommand")); + assert!(position("codex_exec::Cli") < position("ExecSharedCliOptions")); + assert!(position("codex_exec::run_main") < position("run_main")); + assert!(position("ExecSharedCliOptions") < position("exec_cli")); + assert!(position("EventProcessor") < position("ThreadStartParams")); + assert!(position("exec_events") < position("exec_events.rs")); + assert!(position("exec_events") < position("ThreadStartParams")); + assert!(queries.iter().any(|query| query == "ThreadStartParams")); + assert!(queries.iter().any(|query| query == "TurnStartParams")); + assert!(queries.iter().any(|query| query == "ExecSharedCliOptions")); + assert!(queries.iter().any(|query| query == "EventProcessor")); + assert!( + queries + .iter() + .any(|query| query == "EventProcessorWithJsonOutput") ); } - if let Some(source) = source_context { - let _ = writeln!( - markdown, - "\nSource snippet from `{}`:{}:\n", - source.path, source.line + #[test] + fn packet_anchor_probes_keep_required_event_flow_terms_inside_reduced_window() { + let _eval_probes = EvalProbesGuard::enabled(); + let plan = build_packet_plan( + "Explain how `codex exec --json` flows from the top-level CLI into the exec runtime, app-server thread and turn start requests, and JSONL event output.", + Some(PacketTaskClassDto::ArchitectureExplanation), + PacketBudgetModeDto::Compact, ); - markdown.push_str(&source.snippet); - markdown.push('\n'); - } + let reduced = packet_anchor_probe_queries(&plan) + .into_iter() + .take(14) + .collect::>(); - markdown.push_str("\nWhat I checked:\n"); - markdown.push_str("- Initial indexed-symbol search with current hybrid ranking.\n"); - if bundle.diagnostic_supplement_used { - markdown.push_str("- Deterministic query expansion because initial hits were weak.\n"); - } - if bundle.repo_explanation_supplement_used { - markdown.push_str( - "- Grounding snapshot diagnostic supplement for broad repo overview evidence.\n", - ); - } - if !bundle.diagnostic_supplement_used && should_investigate(profile) { - markdown.push_str("- Initial sidecar hits cleared the investigation confidence gate.\n"); + for expected in [ + "exec runtime", + "exec session", + "exec cli", + "json event output", + "thread start", + ] { + assert!( + reduced.iter().any(|query| query == expected), + "expected reduced anchor probe window to retain `{expected}` in {reduced:?}" + ); + } } - if bundle.hits.is_empty() { - markdown.push_str( - "\nNo indexed symbol matches found. Try: symbol names, module paths, or re-run indexing.\n", + #[test] + fn packet_symbol_probes_derive_generic_command_role_probes_from_code_span() { + let queries = packet_symbol_probe_queries( + "Trace how `acme deploy --dry-run` flows from the CLI subcommand into the exec runtime.", + PacketTaskClassDto::ArchitectureExplanation, + PacketBudgetModeDto::Standard, ); - } else { - markdown.push_str("\nTop indexed matches:\n"); - for hit in bundle.hits.iter().take(6) { - write_indexed_match_markdown(&mut markdown, hit); + + let position = |needle: &str| { + queries + .iter() + .position(|query| query == needle) + .unwrap_or_else(|| panic!("missing packet query `{needle}` in {queries:?}")) + }; + + for expected in [ + "acme deploy", + "acme deploy command", + "deploy command", + "deploy subcommand", + ] { + assert!( + queries.iter().any(|query| query == expected), + "expected generic command role probe `{expected}` in {queries:?}" + ); } - } - - if should_investigate(profile) && weak_initial_hits(prompt, &bundle.hits) { - markdown.push_str("\nGaps:\n"); - markdown.push_str( - "- Confidence is low: investigation mode could not find enough strong indexed-symbol evidence within its bounded search.\n", - ); - if bundle.hits.iter().any(SearchHit::is_text_match) { - markdown.push_str( - "- Repo-text hits cite file/line locations only and were not treated as resolvable symbols.\n", + for forbidden in [ + "Subcommand::Deploy", + "acme_deploy::Cli", + "acme_deploy::run_main", + ] { + assert!( + !queries.iter().any(|query| query == forbidden), + "production command probes should not include eval-style exact symbol `{forbidden}`: {queries:?}" ); } + assert!(position("acme deploy") < position("subcommand")); + assert!(position("acme deploy command") < position("command runtime")); } - markdown -} - -fn write_indexed_match_markdown(markdown: &mut String, hit: &SearchHit) { - let _ = writeln!( - markdown, - "- **{}** [{:?}] origin `{}` resolvable `{}` score `{:.3}`{}", - hit.display_name, - hit.kind, - hit.origin.as_str(), - hit.resolvable, - hit.score, - search_hit_location_suffix(hit) - ); -} + #[test] + fn packet_symbol_probes_load_eval_exact_command_probes_only_when_enabled() { + push_eval_probes_test_override(); + let queries = packet_symbol_probe_queries( + "Trace how `acme deploy --dry-run` flows from the CLI subcommand into the exec runtime.", + PacketTaskClassDto::ArchitectureExplanation, + PacketBudgetModeDto::Standard, + ); + pop_eval_probes_test_override(); -fn search_hit_location_suffix(hit: &SearchHit) -> String { - match (&hit.file_path, hit.line) { - (Some(path), Some(line)) => format!(" ({}:{})", path, line), - (Some(path), None) => format!(" ({})", path), - _ => String::new(), + assert!(queries.iter().any(|query| query == "Subcommand::Deploy")); + assert!(queries.iter().any(|query| query == "acme_deploy::Cli")); + assert!(queries.iter().any(|query| query == "acme_deploy::run_main")); } -} - -fn first_mermaid_graph_id(graphs: &[GraphArtifactDto]) -> Option { - graphs.iter().find_map(|graph| match graph { - GraphArtifactDto::Mermaid { id, .. } => Some(id.clone()), - GraphArtifactDto::Uml { .. } => None, - }) -} - -fn summarize_response(resolved_profile: &ResolvedProfile, bundle: &RetrievalBundle) -> String { - format!( - "DB-first retrieval ({:?}/{:?}) returned {} indexed match(es) and {} graph artifact(s).", - resolved_profile.preset, - resolved_profile.policy_mode, - bundle.hits.len(), - bundle.graphs.len() - ) -} - -fn next_request_id() -> String { - let nanos = SystemTime::now() - .duration_since(UNIX_EPOCH) - .unwrap_or_default() - .as_nanos(); - format!("ask-{}", nanos) -} -#[allow(dead_code)] -fn prompt_search_terms(prompt: &str) -> Vec { - const STOPWORDS: &[&str] = &[ - "a", - "actual", - "already", - "an", - "and", - "are", - "area", - "areas", - "across", - "as", - "at", - "be", - "boundaries", - "boundary", - "by", - "can", - "current", - "does", - "existing", - "for", - "from", - "how", - "implementation", - "implemented", - "in", - "is", - "it", - "of", - "on", - "or", - "repo", - "repository", - "risk", - "risks", - "study", - "surface", - "surfaces", - "the", - "this", - "to", - "what", - "where", - "which", - "why", - "with", - "work", - "works", - ]; - - let mut terms = Vec::new(); - let mut current = String::new(); - let mut seen = HashSet::new(); + #[test] + fn packet_symbol_probes_without_eval_catalog_include_prompt_derived_flow_anchors() { + let queries = packet_symbol_probe_queries( + "Explain how `codex exec --json` flows from the top-level CLI into the exec runtime, app-server thread and turn start requests, and JSONL event output.", + PacketTaskClassDto::ArchitectureExplanation, + PacketBudgetModeDto::Standard, + ); - for ch in prompt.chars() { - if ch.is_ascii_alphanumeric() || ch == '_' { - current.push(ch.to_ascii_lowercase()); - continue; + for expected in [ + "exec runtime", + "exec session", + "exec cli", + "exec command", + "jsonl event output", + "thread start", + "turn start", + ] { + assert!( + queries.iter().any(|query| query == expected), + "expected generic production flow query {expected} in {queries:?}" + ); } - - if current.len() >= 3 - && !STOPWORDS.contains(¤t.as_str()) - && seen.insert(current.clone()) - { - terms.push(current.clone()); + for forbidden in [ + "run_exec_session", + "ExecSharedCliOptions", + "EventProcessorWithJsonOutput", + "exec_events", + "ThreadStartParams", + "TurnStartParams", + "codex_exec::Cli", + "codex_exec::run_main", + "Subcommand::Exec", + ] { + assert!( + !queries.iter().any(|query| query == forbidden), + "production flow probes should not include eval-only exact query {forbidden}: {queries:?}" + ); } - current.clear(); - } - - if current.len() >= 3 && !STOPWORDS.contains(¤t.as_str()) && seen.insert(current.clone()) - { - terms.push(current); } - terms -} - -#[allow(dead_code)] -fn merge_search_hits(into: &mut Vec, additional: Vec, max_candidates: usize) { - let mut by_id = HashMap::::new(); - - for hit in into.drain(..) { - by_id.insert(hit.node_id.clone(), hit); - } + #[test] + fn packet_rank_demotes_lib_module_facade_below_concrete_module_file() { + let terms = vec!["jsonl".to_string(), "event".to_string()]; + let mut facade = test_packet_citation( + "event_processor_with_jsonl_output", + "codex-rs/exec/src/lib.rs", + 10.0, + ); + facade.kind = NodeKind::MODULE; + let mut concrete = test_packet_citation( + "event_processor_with_jsonl_output", + "codex-rs/exec/src/event_processor_with_jsonl_output.rs", + 10.0, + ); + concrete.kind = NodeKind::MODULE; - for hit in additional { - by_id - .entry(hit.node_id.clone()) - .and_modify(|existing| { - if hit.score > existing.score { - *existing = hit.clone(); - } - }) - .or_insert(hit); + assert!( + packet_citation_rank(&concrete, &terms, true) + > packet_citation_rank(&facade, &terms, true), + "concrete module files should outrank lib/mod facade declarations for packet evidence" + ); } - let mut merged = by_id.into_values().collect::>(); - merged.sort_by(|left, right| { - right - .score - .partial_cmp(&left.score) - .unwrap_or(Ordering::Equal) - }); - merged.truncate(max_candidates); - *into = merged; -} - -fn merge_scored_hits( - into: &mut Vec, - additional: Vec, - max_candidates: usize, -) { - let mut by_id = HashMap::::new(); + #[test] + fn packet_required_probe_promotion_prefers_command_focus_root_matches() { + let mut run_main = test_packet_citation( + "acme_deploy::run_main", + "crates/acme-deploy/src/main.rs", + 0.7, + ); + run_main.kind = NodeKind::MODULE; + let mut focused_event_file = test_packet_citation( + "std::collections::HashMap", + "crates/acme-deploy/src/event_processor_with_jsonl_output.rs", + 0.6, + ); + focused_event_file.kind = NodeKind::MODULE; + let distractor = test_packet_citation("jsonl", "crates/core/tests/sqlite_state.rs", 0.95); + let mut answer = packet_answer_fixture( + "Explain how `acme deploy --json` flows through JSONL event output.", + vec![distractor, focused_event_file, run_main], + ); - for hit in into.drain(..) { - by_id.insert(hit.hit.node_id.clone(), hit); - } + let protected = + promote_required_probe_citations(&mut answer, &["jsonl event output".to_string()]); - for hit in additional { - by_id - .entry(hit.hit.node_id.clone()) - .and_modify(|existing| { - if hit.total_score > existing.total_score { - *existing = hit.clone(); - } - }) - .or_insert(hit); + assert!(protected.contains(&packet_citation_key(&answer.citations[0]))); + assert_eq!( + answer.citations[0].file_path.as_deref(), + Some("crates/acme-deploy/src/event_processor_with_jsonl_output.rs") + ); } - let mut merged = by_id.into_values().collect::>(); - merged.sort_by(|left, right| { - right - .total_score - .partial_cmp(&left.total_score) - .unwrap_or(Ordering::Equal) - }); - merged.truncate(max_candidates); - *into = merged; -} - -#[cfg(test)] -mod tests { - use super::*; - use crate::agent::eval_probes::{ - EVAL_PROBES_ENV, pop_eval_probes_test_override, push_eval_probes_test_override, - }; - use crate::agent::profiles::ResolvedProfile; - - struct EvalProbesGuard; - - impl EvalProbesGuard { - fn enabled() -> Self { - push_eval_probes_test_override(); - Self - } - } + #[test] + fn packet_required_probe_promotion_preserves_exact_command_probes() { + let exact_cli = + test_packet_citation("acme_deploy::Cli", "crates/acme-cli/src/main.rs", 0.7); + let mut tempting_file_neighbor = + test_packet_citation("clap::Args", "crates/acme-deploy/src/cli.rs", 0.95); + tempting_file_neighbor.kind = NodeKind::MODULE; + let mut answer = packet_answer_fixture( + "Explain how `acme deploy --json` flows from CLI into runtime.", + vec![tempting_file_neighbor, exact_cli], + ); - impl Drop for EvalProbesGuard { - fn drop(&mut self) { - pop_eval_probes_test_override(); - } - } + promote_required_probe_citations(&mut answer, &["acme_deploy::Cli".to_string()]); - struct EnvVarGuard { - key: &'static str, - previous: Option, + assert_eq!(answer.citations[0].display_name, "acme_deploy::Cli"); + assert_eq!( + answer.citations[0].file_path.as_deref(), + Some("crates/acme-cli/src/main.rs") + ); } - impl EnvVarGuard { - fn cleared(key: &'static str) -> Self { - let previous = std::env::var_os(key); - // SAFETY: tests use this guard to isolate one env var for this process-local - // regression and restore it on drop. - unsafe { - std::env::remove_var(key); - } - Self { key, previous } - } - } + #[test] + fn packet_budget_reserves_focused_neighborhood_after_command_root() { + let run_main = test_packet_citation( + "acme_deploy::run_main", + "crates/acme-deploy/src/main.rs", + 0.7, + ); + let focused_cli = + test_packet_citation("clap::Parser", "crates/acme-deploy/src/cli.rs", 0.2); + let focused_lib = test_packet_citation("acme_deploy", "crates/acme-deploy/src/lib.rs", 0.2); + let mut focused_event = test_packet_citation( + "std::collections::HashMap", + "crates/acme-deploy/src/event_processor_with_jsonl_output.rs", + 0.2, + ); + focused_event.kind = NodeKind::MODULE; + let focused_exec_events = + test_packet_citation("exec_events", "crates/acme-deploy/src/exec_events.rs", 0.2); + let distractors = (0..8) + .map(|index| { + test_packet_citation( + &format!("GenericRuntime{index}"), + &format!("crates/cross-crate/src/runtime_{index}.rs"), + 10.0 - index as f32, + ) + }) + .collect::>(); + let mut citations = vec![ + distractors[0].clone(), + distractors[1].clone(), + focused_cli, + distractors[2].clone(), + focused_event, + run_main, + focused_lib, + distractors[3].clone(), + focused_exec_events, + ]; + citations.extend(distractors.into_iter().skip(4)); + let mut answer = packet_answer_fixture( + "Explain how `acme deploy --json` flows through the command runtime and event output.", + citations, + ); + let limits = PacketBudgetLimitsDto { + max_anchors: 5, + max_files: 5, + max_snippets: 5, + max_trail_edges: 0, + max_output_bytes: 64 * 1024, + }; - impl Drop for EnvVarGuard { - fn drop(&mut self) { - // SAFETY: restores the process-local env var captured by this guard. - unsafe { - if let Some(previous) = self.previous.take() { - std::env::set_var(self.key, previous); - } else { - std::env::remove_var(self.key); - } - } - } - } + cap_packet_citations(&mut answer, &limits, &["acme_deploy::run_main".to_string()]); - fn latency_profile() -> ResolvedProfile { - ResolvedProfile { - preset: codestory_contracts::api::AgentRetrievalPresetDto::Architecture, - policy_mode: AgentRetrievalPolicyModeDto::LatencyFirst, - trail_plans: Vec::new(), - include_edge_occurrences: false, - enable_source_reads: true, - max_search_results: 25, - max_source_bytes: 32 * 1024, + let paths = answer + .citations + .iter() + .filter_map(|citation| citation.file_path.as_deref()) + .collect::>(); + assert_eq!( + answer.citations[0].display_name, "acme_deploy::run_main", + "exact command probe should remain first: {paths:?}" + ); + for expected in [ + "crates/acme-deploy/src/cli.rs", + "crates/acme-deploy/src/lib.rs", + "crates/acme-deploy/src/event_processor_with_jsonl_output.rs", + "crates/acme-deploy/src/exec_events.rs", + ] { + assert!( + paths.contains(&expected), + "focused command-root neighbor should survive compact cap: {paths:?}" + ); } } #[test] - fn agent_trail_request_hides_speculative_edges_by_default() { - let plan = TrailPlan { - mode: codestory_contracts::api::TrailMode::AllReferenced, - depth: 3, - direction: codestory_contracts::api::TrailDirection::Outgoing, - caller_scope: codestory_contracts::api::TrailCallerScope::ProductionOnly, - edge_filter: vec![codestory_contracts::api::EdgeKind::CALL], - node_filter: vec![codestory_contracts::api::NodeKind::FUNCTION], - max_nodes: 42, + fn packet_focused_neighborhood_reservation_preserves_exact_command_order() { + let run_main = test_packet_citation( + "acme_deploy::run_main", + "crates/acme-deploy/src/main.rs", + 0.7, + ); + let focused_neighbor = + test_packet_citation("exec_events", "crates/acme-deploy/src/exec_events.rs", 0.99); + let exact_cli = + test_packet_citation("acme_deploy::Cli", "crates/acme-cli/src/main.rs", 0.6); + let cross_root = test_packet_citation("exec_events", "crates/core/src/exec_events.rs", 2.0); + let mut answer = packet_answer_fixture( + "Explain how `acme deploy --json` flows through runtime events.", + vec![focused_neighbor, cross_root, run_main, exact_cli], + ); + let limits = PacketBudgetLimitsDto { + max_anchors: 4, + max_files: 4, + max_snippets: 4, + max_trail_edges: 0, + max_output_bytes: 64 * 1024, }; - let request = agent_trail_request(NodeId("root".to_string()), &plan); + cap_packet_citations( + &mut answer, + &limits, + &[ + "acme_deploy::run_main".to_string(), + "acme_deploy::Cli".to_string(), + ], + ); - assert!(request.hide_speculative); - assert!(request.show_utility_calls); - assert!(!request.story); - assert_eq!(request.edge_filter, plan.edge_filter); - assert_eq!(request.node_filter, plan.node_filter); - assert_eq!(request.max_nodes, plan.max_nodes); + let displays = answer + .citations + .iter() + .map(|citation| citation.display_name.as_str()) + .collect::>(); + assert_eq!( + displays[..2], + ["acme_deploy::run_main", "acme_deploy::Cli"], + "exact command probes should stay ahead of focused neighbors: {displays:?}" + ); + assert_eq!( + answer.citations[2].file_path.as_deref(), + Some("crates/acme-deploy/src/exec_events.rs") + ); + } + + #[test] + fn packet_focus_neighborhood_prefers_source_navigation_files() { + let run_main = test_packet_citation( + "acme_deploy::run_main", + "crates/acme-deploy/src/main.rs", + 0.7, + ); + let focused_event = test_packet_citation( + "EventProcessor", + "crates/acme-deploy/src/event_processor.rs", + 0.99, + ); + let focused_cli = + test_packet_citation("clap::Parser", "crates/acme-deploy/src/cli.rs", 0.2); + let focused_lib = test_packet_citation("cli", "crates/acme-deploy/src/lib.rs", 0.2); + let cross_root_cli = + test_packet_citation("clap::Parser", "crates/other-tool/src/cli.rs", 10.0); + let mut answer = packet_answer_fixture( + "Explain how `acme deploy --json` flows from CLI into runtime.", + vec![ + focused_event, + focused_cli, + run_main, + cross_root_cli, + focused_lib, + ], + ); + let protected = + promote_required_probe_citations(&mut answer, &["acme_deploy::run_main".to_string()]); + let focused = promote_focus_neighborhood_citations(&mut answer, &protected); + + let focused_paths = answer + .citations + .iter() + .filter(|citation| focused.contains(&packet_citation_key(citation))) + .filter_map(|citation| citation.file_path.as_deref()) + .collect::>(); + assert!( + focused_paths.starts_with(&[ + "crates/acme-deploy/src/cli.rs", + "crates/acme-deploy/src/lib.rs" + ]), + "source navigation files should be carried before same-root event details or cross-root cli files: {focused_paths:?}" + ); } - fn test_search_hit(node_id: &str, score: f32) -> SearchHit { - SearchHit { - node_id: codestory_contracts::api::NodeId(node_id.to_string()), - display_name: node_id.to_string(), - kind: codestory_contracts::api::NodeKind::FUNCTION, - file_path: None, - line: None, - score, - origin: codestory_contracts::api::SearchHitOrigin::IndexedSymbol, - match_quality: None, - resolvable: true, - score_breakdown: None, - } + #[test] + fn packet_focus_neighborhood_skips_protected_file_duplicates() { + let run_main = test_packet_citation( + "acme_deploy::run_main", + "crates/acme-deploy/src/main.rs", + 0.7, + ); + let duplicate_main_cli = + test_packet_citation("acme_deploy::Cli", "crates/acme-deploy/src/main.rs", 5.0); + let duplicate_main_parser = + test_packet_citation("clap::Parser", "crates/acme-deploy/src/main.rs", 5.0); + let focused_cli = test_packet_citation("clap::Args", "crates/acme-deploy/src/cli.rs", 0.2); + let focused_lib = test_packet_citation("cli", "crates/acme-deploy/src/lib.rs", 0.2); + let focused_event = test_packet_citation( + "EventProcessor", + "crates/acme-deploy/src/event_processor.rs", + 0.99, + ); + let focused_exec_events = + test_packet_citation("exec_events", "crates/acme-deploy/src/exec_events.rs", 0.98); + let mut answer = packet_answer_fixture( + "Explain how `acme deploy --json` flows from CLI into runtime events.", + vec![ + duplicate_main_cli, + focused_event, + focused_cli, + run_main, + duplicate_main_parser, + focused_exec_events, + focused_lib, + ], + ); + let protected = + promote_required_probe_citations(&mut answer, &["acme_deploy::run_main".to_string()]); + let focused = promote_focus_neighborhood_citations(&mut answer, &protected); + + let focused_paths = answer + .citations + .iter() + .filter(|citation| focused.contains(&packet_citation_key(citation))) + .filter_map(|citation| citation.file_path.as_deref()) + .collect::>(); + assert!( + !focused_paths.contains(&"crates/acme-deploy/src/main.rs"), + "focus carry should not spend slots on duplicate citations for an already protected file: {focused_paths:?}" + ); + assert!( + focused_paths.contains(&"crates/acme-deploy/src/cli.rs") + && focused_paths.contains(&"crates/acme-deploy/src/lib.rs") + && focused_paths.contains(&"crates/acme-deploy/src/event_processor.rs") + && focused_paths.contains(&"crates/acme-deploy/src/exec_events.rs"), + "focus carry should preserve distinct same-root source files: {focused_paths:?}" + ); } - fn test_semantic_only_hit(node_id: &str, score: f32) -> SearchHit { - let mut hit = test_search_hit(node_id, score); - hit.score_breakdown = Some(RetrievalScoreBreakdownDto { - lexical: 0.0, - semantic: score, - graph: 0.0, - total: score, - }); - hit - } + #[test] + fn packet_focus_neighborhood_prefers_event_definition_files() { + let run_main = test_packet_citation( + "acme_deploy::run_main", + "crates/acme-deploy/src/main.rs", + 0.7, + ); + let focused_cli = test_packet_citation("clap::Args", "crates/acme-deploy/src/cli.rs", 0.2); + let focused_lib = test_packet_citation("cli", "crates/acme-deploy/src/lib.rs", 0.2); + let focused_event_processor = test_packet_citation( + "EventProcessor", + "crates/acme-deploy/src/event_processor.rs", + 0.99, + ); + let focused_human_output = test_packet_citation( + "HumanOutput", + "crates/acme-deploy/src/event_processor_with_human_output.rs", + 3.0, + ); + let focused_exec_events = + test_packet_citation("exec_events", "crates/acme-deploy/src/exec_events.rs", 0.1); + let mut answer = packet_answer_fixture( + "Explain how `acme deploy --json` flows through event output.", + vec![ + focused_human_output, + focused_event_processor, + focused_cli, + run_main, + focused_exec_events, + focused_lib, + ], + ); + let protected = + promote_required_probe_citations(&mut answer, &["acme_deploy::run_main".to_string()]); + let focused = promote_focus_neighborhood_citations(&mut answer, &protected); - fn test_packet_citation(display_name: &str, file_path: &str, score: f32) -> AgentCitationDto { - AgentCitationDto { - node_id: NodeId(display_name.to_string()), - display_name: display_name.to_string(), - kind: codestory_contracts::api::NodeKind::FUNCTION, - file_path: Some(file_path.to_string()), - line: Some(10), - score, - origin: SearchHitOrigin::IndexedSymbol, - resolvable: true, - subgraph_id: None, - evidence_edge_ids: Vec::new(), - retrieval_score_breakdown: Some(RetrievalScoreBreakdownDto { - lexical: 0.4, - semantic: 0.2, - graph: 0.3, - total: score, - }), - } + let focused_paths = answer + .citations + .iter() + .filter(|citation| focused.contains(&packet_citation_key(citation))) + .filter_map(|citation| citation.file_path.as_deref()) + .collect::>(); + assert!( + focused_paths.contains(&"crates/acme-deploy/src/exec_events.rs"), + "event definition files should survive compact focus carry: {focused_paths:?}" + ); } - fn packet_answer_fixture(question: &str, citations: Vec) -> AgentAnswerDto { - AgentAnswerDto { - answer_id: "packet-fixture".to_string(), - prompt: question.to_string(), - summary: "Fixture packet is covered by cited anchors.".to_string(), - freshness: None, - sections: vec![AgentResponseSectionDto { - id: "answer".to_string(), - title: "Answer".to_string(), - blocks: vec![AgentResponseBlockDto::Markdown { - markdown: "Packet answer assembled from cited anchors.".to_string(), - }], - }], - citations, - subgraph_ids: Vec::new(), - retrieval_version: "test".to_string(), - graphs: Vec::new(), - retrieval_trace: codestory_contracts::api::AgentRetrievalTraceDto { - request_id: "packet-fixture".to_string(), - resolved_profile: AgentRetrievalPresetDto::Architecture, - policy_mode: AgentRetrievalPolicyModeDto::LatencyFirst, - total_latency_ms: 1, - sla_target_ms: None, - sla_missed: false, - semantic_fallback_count: 0, - semantic_fallbacks: Vec::new(), - annotations: Vec::new(), - steps: Vec::new(), - retrieval_shadow: None, - }, - } - } + #[test] + fn packet_sufficiency_requires_planned_flow_probe_coverage() { + let (_answer, sufficiency) = build_sufficient_packet_fixture( + "Explain how `codex exec --json` flows from the top-level CLI into the exec runtime, app-server thread and turn start requests, and JSONL event output.", + PacketTaskClassDto::ArchitectureExplanation, + vec![ + test_packet_citation( + "EventProcessorWithJsonOutput", + "codex-rs/exec/src/event_processor_with_jsonl_output.rs", + 0.9, + ), + test_packet_citation("exec_cli", "codex-rs/cli/src/main.rs", 0.8), + test_packet_citation( + "UnifiedExecRuntime", + "codex-rs/core/src/tools/runtimes/unified_exec.rs", + 0.7, + ), + test_packet_citation( + "ThreadStartParams", + "codex-rs/app-server-protocol/schema/typescript/v2/ThreadStartParams.ts", + 0.7, + ), + ], + ); - fn packet_fixture_project_root() -> &'static std::path::Path { - std::path::Path::new("C:/workspace/project root") + assert_eq!(sufficiency.status, PacketSufficiencyStatusDto::Partial); + assert!( + sufficiency + .gaps + .iter() + .any(|gap| gap.contains("exec session") + && gap.contains("exec command") + && gap.contains("turn start")), + "required flow probes should be named in sufficiency gaps: {sufficiency:?}" + ); + assert!( + sufficiency + .follow_up_commands + .iter() + .any(|command| command.contains("--query 'exec session'")), + "missing required probes should become targeted follow-up searches: {sufficiency:?}" + ); } - fn build_sufficient_packet_fixture( - question: &str, - task_class: PacketTaskClassDto, - citations: Vec, - ) -> (AgentAnswerDto, PacketSufficiencyDto) { - let limits = packet_budget_limits(PacketBudgetModeDto::Compact); - let mut answer = packet_answer_fixture(question, citations); - rank_packet_evidence(question, &mut answer); - append_packet_evidence_sections(&mut answer, task_class, &limits); - let budget = apply_packet_budget( - packet_fixture_project_root(), + #[test] + fn packet_sufficiency_treats_unresolved_sidecar_candidates_as_gap() { + let question = "Explain how packet retrieval flows through sidecar diagnostics."; + let (mut answer, initial_sufficiency) = build_sufficient_packet_fixture( question, - PacketTaskClassDto::ArchitectureExplanation, - PacketBudgetModeDto::Compact, - limits, - &mut answer, + PacketTaskClassDto::EditPlanning, + vec![ + test_packet_citation("packet_planner", "src/packet.rs", 0.9), + test_packet_citation("sidecar_batch", "src/sidecar.rs", 0.8), + test_packet_citation("sufficiency_builder", "src/sufficiency.rs", 0.7), + ], ); + assert_eq!( + initial_sufficiency.status, + PacketSufficiencyStatusDto::Sufficient + ); + answer + .retrieval_trace + .packet_sidecar_diagnostics + .push(PacketSidecarQueryDiagnosticDto { + query: "sidecar batch".to_string(), + retrieval_mode: "full".to_string(), + candidate_count: 1, + resolved_hit_count: 0, + unresolved_candidate_count: 1, + diagnostic: Some( + "sidecar candidates did not all resolve to indexed symbols".to_string(), + ), + }); + answer + .retrieval_trace + .packet_sidecar_diagnostics + .push(PacketSidecarQueryDiagnosticDto { + query: "sidecar batch".to_string(), + retrieval_mode: "full".to_string(), + candidate_count: 1, + resolved_hit_count: 0, + unresolved_candidate_count: 1, + diagnostic: Some( + "sidecar candidates did not all resolve to indexed symbols".to_string(), + ), + }); + + let budget = PacketBudgetDto { + requested: PacketBudgetModeDto::Compact, + limits: packet_budget_limits(PacketBudgetModeDto::Compact), + used: PacketBudgetUsageDto { + anchors: 3, + files: 0, + snippets: 0, + trail_edges: 0, + output_bytes: 0, + }, + truncated: false, + omitted_sections: Vec::new(), + next_deeper_command: None, + }; let sufficiency = build_packet_sufficiency( packet_fixture_project_root(), question, - task_class, + PacketTaskClassDto::EditPlanning, &answer, &budget, ); - (answer, sufficiency) + + assert_eq!(sufficiency.status, PacketSufficiencyStatusDto::Partial); + assert!( + sufficiency + .gaps + .iter() + .any(|gap| gap.contains("sidecar candidates")), + "expected sidecar candidate gap, got {:?}", + sufficiency.gaps + ); + let sidecar_gap = sufficiency + .gaps + .iter() + .find(|gap| gap.contains("sidecar candidates")) + .expect("sidecar gap"); + assert_eq!( + sidecar_gap.matches("sidecar batch").count(), + 1, + "duplicate diagnostics should not duplicate query names in sufficiency gaps: {sidecar_gap}" + ); + } + + #[test] + fn packet_sufficiency_accepts_required_flow_probe_coverage() { + let (_answer, sufficiency) = build_sufficient_packet_fixture( + "Explain how `codex exec --json` flows from the top-level CLI into the exec runtime, app-server thread and turn start requests, and JSONL event output.", + PacketTaskClassDto::ArchitectureExplanation, + vec![ + test_packet_citation("exec runtime", "codex-rs/exec/src/lib.rs", 0.9), + test_packet_citation("exec session", "codex-rs/exec/src/lib.rs", 0.9), + test_packet_citation("exec cli", "codex-rs/cli/src/main.rs", 0.8), + test_packet_citation("exec command", "codex-rs/cli/src/main.rs", 0.8), + test_packet_citation( + "json event output", + "codex-rs/exec/src/event_processor_with_jsonl_output.rs", + 0.8, + ), + test_packet_citation( + "jsonl event output", + "codex-rs/exec/src/exec_events.rs", + 0.8, + ), + test_packet_citation( + "thread start", + "codex-rs/app-server-protocol/schema/typescript/v2/ThreadStartParams.ts", + 0.7, + ), + test_packet_citation( + "turn start", + "codex-rs/app-server-protocol/schema/typescript/v2/TurnStartParams.ts", + 0.7, + ), + ], + ); + + assert_eq!( + sufficiency.status, + PacketSufficiencyStatusDto::Sufficient, + "{sufficiency:?}" + ); + assert!(sufficiency.gaps.is_empty(), "{sufficiency:?}"); + assert!(sufficiency.follow_up_commands.is_empty(), "{sufficiency:?}"); } #[test] - fn packet_symbol_probes_prioritize_flow_specific_terms() { + fn packet_sufficiency_rejects_module_facade_for_concrete_file_probe() { let _eval_probes = EvalProbesGuard::enabled(); - let queries = packet_symbol_probe_queries( + let (_answer, sufficiency) = build_sufficient_packet_fixture( "Explain how `codex exec --json` flows from the top-level CLI into the exec runtime, app-server thread and turn start requests, and JSONL event output.", PacketTaskClassDto::ArchitectureExplanation, - PacketBudgetModeDto::Standard, + vec![ + test_packet_citation("run_exec_session", "codex-rs/exec/src/lib.rs", 0.9), + test_packet_citation("ExecSharedCliOptions", "codex-rs/exec/src/cli.rs", 0.9), + test_packet_citation("Subcommand::Exec", "codex-rs/cli/src/main.rs", 0.8), + test_packet_citation("run_main", "codex-rs/exec/src/main.rs", 0.8), + test_packet_citation( + "EventProcessorWithJsonOutput", + "codex-rs/exec/src/event_processor_with_jsonl_output.rs", + 0.8, + ), + test_packet_citation("exec_events", "codex-rs/exec/src/lib.rs", 0.8), + test_packet_citation( + "ThreadStartParams", + "codex-rs/app-server-protocol/schema/typescript/v2/ThreadStartParams.ts", + 0.7, + ), + test_packet_citation( + "TurnStartParams", + "codex-rs/app-server-protocol/schema/typescript/v2/TurnStartParams.ts", + 0.7, + ), + ], ); - let position = |needle: &str| { - queries - .iter() - .position(|query| query == needle) - .unwrap_or_else(|| panic!("missing packet query `{needle}` in {queries:?}")) - }; - - assert!(position("run_exec_session") < position("run_exec")); - assert!(position("Subcommand::Exec") < position("Subcommand")); - assert!(position("codex_exec::Cli") < position("ExecSharedCliOptions")); - assert!(position("codex_exec::run_main") < position("run_main")); - assert!(position("ExecSharedCliOptions") < position("exec_cli")); - assert!(position("EventProcessor") < position("ThreadStartParams")); - assert!(position("exec_events") < position("exec_events.rs")); - assert!(position("exec_events") < position("ThreadStartParams")); - assert!(queries.iter().any(|query| query == "ThreadStartParams")); - assert!(queries.iter().any(|query| query == "TurnStartParams")); - assert!(queries.iter().any(|query| query == "ExecSharedCliOptions")); - assert!(queries.iter().any(|query| query == "EventProcessor")); + assert_eq!(sufficiency.status, PacketSufficiencyStatusDto::Partial); assert!( - queries + sufficiency + .gaps .iter() - .any(|query| query == "EventProcessorWithJsonOutput") + .any(|gap| gap.contains("exec_events")), + "facade module declaration should not satisfy the concrete exec_events file probe: {sufficiency:?}" ); } #[test] - fn packet_anchor_probes_keep_required_event_flow_terms_inside_reduced_window() { + fn packet_symbol_probes_expand_indexing_storage_flow_concepts() { let _eval_probes = EvalProbesGuard::enabled(); - let plan = build_packet_plan( - "Explain how `codex exec --json` flows from the top-level CLI into the exec runtime, app-server thread and turn start requests, and JSONL event output.", - Some(PacketTaskClassDto::ArchitectureExplanation), - PacketBudgetModeDto::Compact, - ); - let reduced = packet_anchor_probe_queries(&plan) - .into_iter() - .take(14) - .collect::>(); - - for expected in [ - "exec runtime", - "exec session", - "exec cli", - "json event output", - "thread start", - ] { - assert!( - reduced.iter().any(|query| query == expected), - "expected reduced anchor probe window to retain `{expected}` in {reduced:?}" - ); - } - } - - #[test] - fn packet_symbol_probes_derive_generic_command_role_probes_from_code_span() { let queries = packet_symbol_probe_queries( - "Trace how `acme deploy --dry-run` flows from the CLI subcommand into the exec runtime.", + "Explain how project/source-group configuration becomes indexing work, then how indexed data is accessed from storage.", PacketTaskClassDto::ArchitectureExplanation, PacketBudgetModeDto::Standard, ); @@ -6261,1119 +3774,1129 @@ mod tests { .unwrap_or_else(|| panic!("missing packet query `{needle}` in {queries:?}")) }; + assert!(position("StorageAccess") < position("SourceGroup")); + assert!(position("PersistentStorage") < position("SourceGroup")); + assert!(position("Project::buildIndex") < position("SourceGroup")); + assert!(position("TaskFillIndexerCommandsQueue") < position("SourceGroup")); + assert!(position("IndexerCommandCxx") < position("SourceGroup")); + assert!(position("IndexerJava::doIndex") < position("SourceGroup")); + assert!(queries.iter().any(|query| query == "source_group")); for expected in [ - "acme deploy", - "acme deploy command", - "deploy command", - "deploy subcommand", - ] { - assert!( - queries.iter().any(|query| query == expected), - "expected generic command role probe `{expected}` in {queries:?}" - ); - } - for forbidden in [ - "Subcommand::Deploy", - "acme_deploy::Cli", - "acme_deploy::run_main", - ] { - assert!( - !queries.iter().any(|query| query == forbidden), - "production command probes should not include eval-style exact symbol `{forbidden}`: {queries:?}" - ); - } - assert!(position("acme deploy") < position("subcommand")); - assert!(position("acme deploy command") < position("command runtime")); - } - - #[test] - fn packet_symbol_probes_load_eval_exact_command_probes_only_when_enabled() { - push_eval_probes_test_override(); - let queries = packet_symbol_probe_queries( - "Trace how `acme deploy --dry-run` flows from the CLI subcommand into the exec runtime.", - PacketTaskClassDto::ArchitectureExplanation, - PacketBudgetModeDto::Standard, - ); - pop_eval_probes_test_override(); - - assert!(queries.iter().any(|query| query == "Subcommand::Deploy")); - assert!(queries.iter().any(|query| query == "acme_deploy::Cli")); - assert!(queries.iter().any(|query| query == "acme_deploy::run_main")); - } - - #[test] - fn packet_symbol_probes_without_eval_catalog_include_prompt_derived_flow_anchors() { - let queries = packet_symbol_probe_queries( - "Explain how `codex exec --json` flows from the top-level CLI into the exec runtime, app-server thread and turn start requests, and JSONL event output.", - PacketTaskClassDto::ArchitectureExplanation, - PacketBudgetModeDto::Standard, - ); - - for expected in [ - "exec runtime", - "exec session", - "exec cli", - "exec command", - "jsonl event output", - "thread start", - "turn start", + "SourceGroupSettings", + "SourceGroupFactoryModule", + "Project::buildIndex", + "SourceGroupCxxCdb", + "SourceGroupCxxCdb::getIndexerCommandProvider", + "buildIndex", + "TaskFillIndexerCommandsQueue", + "IndexerCommand", + "IndexerCommandCxx", + "IndexerJava", + "IndexerJava::doIndex", + "StorageAccess", + "StorageAccessProxy", + "PersistentStorage", ] { assert!( queries.iter().any(|query| query == expected), - "expected generic production flow query {expected} in {queries:?}" - ); - } - for forbidden in [ - "run_exec_session", - "ExecSharedCliOptions", - "EventProcessorWithJsonOutput", - "exec_events", - "ThreadStartParams", - "TurnStartParams", - "codex_exec::Cli", - "codex_exec::run_main", - "Subcommand::Exec", - ] { - assert!( - !queries.iter().any(|query| query == forbidden), - "production flow probes should not include eval-only exact query {forbidden}: {queries:?}" + "expected flow concept query {expected} in {queries:?}" + ); + } + for noisy in ["turns", "Turns", "cite", "source"] { + assert!( + !queries.iter().any(|query| query == noisy), + "packet probes should suppress noisy query {noisy}: {queries:?}" ); } } #[test] - fn packet_rank_demotes_lib_module_facade_below_concrete_module_file() { - let terms = vec!["jsonl".to_string(), "event".to_string()]; - let mut facade = test_packet_citation( - "event_processor_with_jsonl_output", - "codex-rs/exec/src/lib.rs", - 10.0, - ); - facade.kind = NodeKind::MODULE; - let mut concrete = test_packet_citation( - "event_processor_with_jsonl_output", - "codex-rs/exec/src/event_processor_with_jsonl_output.rs", - 10.0, - ); - concrete.kind = NodeKind::MODULE; - - assert!( - packet_citation_rank(&concrete, &terms, true) - > packet_citation_rank(&facade, &terms, true), - "concrete module files should outrank lib/mod facade declarations for packet evidence" - ); - } - - #[test] - fn packet_probe_match_rank_uses_multi_token_path_coverage() { - let mut citation = test_packet_citation( - "std::collections::HashMap", - "codex-rs/exec/src/event_processor_with_jsonl_output.rs", - 0.6, - ); - citation.kind = NodeKind::MODULE; - - assert_eq!( - packet_citation_probe_match_rank("jsonl event output", &citation), - Some(4) - ); - assert_eq!( - packet_citation_probe_token_coverage("jsonl event output", &citation), - 3 - ); - } - - #[test] - fn packet_required_probe_promotion_prefers_command_focus_root_matches() { - let mut run_main = test_packet_citation( - "acme_deploy::run_main", - "crates/acme-deploy/src/main.rs", - 0.7, - ); - run_main.kind = NodeKind::MODULE; - let mut focused_event_file = test_packet_citation( - "std::collections::HashMap", - "crates/acme-deploy/src/event_processor_with_jsonl_output.rs", - 0.6, - ); - focused_event_file.kind = NodeKind::MODULE; - let distractor = test_packet_citation("jsonl", "crates/core/tests/sqlite_state.rs", 0.95); - let mut answer = packet_answer_fixture( - "Explain how `acme deploy --json` flows through JSONL event output.", - vec![distractor, focused_event_file, run_main], - ); - - let protected = - promote_required_probe_citations(&mut answer, &["jsonl event output".to_string()]); - - assert!(protected.contains(&packet_citation_key(&answer.citations[0]))); - assert_eq!( - answer.citations[0].file_path.as_deref(), - Some("crates/acme-deploy/src/event_processor_with_jsonl_output.rs") - ); - } - - #[test] - fn packet_required_probe_promotion_preserves_exact_command_probes() { - let exact_cli = - test_packet_citation("acme_deploy::Cli", "crates/acme-cli/src/main.rs", 0.7); - let mut tempting_file_neighbor = - test_packet_citation("clap::Args", "crates/acme-deploy/src/cli.rs", 0.95); - tempting_file_neighbor.kind = NodeKind::MODULE; - let mut answer = packet_answer_fixture( - "Explain how `acme deploy --json` flows from CLI into runtime.", - vec![tempting_file_neighbor, exact_cli], + fn packet_followups_preserve_structured_storage_and_source_group_probes() { + let _eval_probes = EvalProbesGuard::enabled(); + let queries = packet_targeted_follow_up_queries( + "Explain how project/source-group configuration becomes indexing work, then how indexed data is accessed from storage.", + PacketTaskClassDto::ArchitectureExplanation, ); - promote_required_probe_citations(&mut answer, &["acme_deploy::Cli".to_string()]); - - assert_eq!(answer.citations[0].display_name, "acme_deploy::Cli"); assert_eq!( - answer.citations[0].file_path.as_deref(), - Some("crates/acme-cli/src/main.rs") + queries.len(), + 6, + "targeted packet follow-ups should stay capped: {queries:?}" + ); + assert!( + queries.iter().any(|query| query.contains("Storage")), + "storage-flow follow-ups should keep structured storage probes: {queries:?}" + ); + assert!( + queries.iter().any(|query| query.contains("SourceGroup")), + "source-group follow-ups should keep a structured source-group probe: {queries:?}" + ); + assert!( + queries + .iter() + .any(|query| query == "SourceGroupCxxCdb::getIndexerCommandProvider"), + "existing qualified source-group probes should still be represented: {queries:?}" + ); + assert!( + queries + .iter() + .any(|query| query == "PersistentStorage::PersistentStorage"), + "existing persistence constructor probe should still be represented: {queries:?}" ); } #[test] - fn packet_budget_reserves_focused_neighborhood_after_command_root() { - let run_main = test_packet_citation( - "acme_deploy::run_main", - "crates/acme-deploy/src/main.rs", - 0.7, - ); - let focused_cli = - test_packet_citation("clap::Parser", "crates/acme-deploy/src/cli.rs", 0.2); - let focused_lib = test_packet_citation("acme_deploy", "crates/acme-deploy/src/lib.rs", 0.2); - let mut focused_event = test_packet_citation( - "std::collections::HashMap", - "crates/acme-deploy/src/event_processor_with_jsonl_output.rs", - 0.2, - ); - focused_event.kind = NodeKind::MODULE; - let focused_exec_events = - test_packet_citation("exec_events", "crates/acme-deploy/src/exec_events.rs", 0.2); - let distractors = (0..8) - .map(|index| { - test_packet_citation( - &format!("GenericRuntime{index}"), - &format!("crates/cross-crate/src/runtime_{index}.rs"), - 10.0 - index as f32, - ) - }) - .collect::>(); - let mut citations = vec![ - distractors[0].clone(), - distractors[1].clone(), - focused_cli, - distractors[2].clone(), - focused_event, - run_main, - focused_lib, - distractors[3].clone(), - focused_exec_events, - ]; - citations.extend(distractors.into_iter().skip(4)); - let mut answer = packet_answer_fixture( - "Explain how `acme deploy --json` flows through the command runtime and event output.", - citations, + fn packet_symbol_probes_expand_vscode_workbench_extension_host_concepts() { + let _eval_probes = EvalProbesGuard::enabled(); + let queries = packet_symbol_probe_queries( + "Explain how VS Code workbench startup reaches extension host activation and command execution.", + PacketTaskClassDto::ArchitectureExplanation, + PacketBudgetModeDto::Standard, ); - let limits = PacketBudgetLimitsDto { - max_anchors: 5, - max_files: 5, - max_snippets: 5, - max_trail_edges: 0, - max_output_bytes: 64 * 1024, - }; - cap_packet_citations(&mut answer, &limits, &["acme_deploy::run_main".to_string()]); + let position = |needle: &str| { + queries + .iter() + .position(|query| query == needle) + .unwrap_or_else(|| panic!("missing packet query `{needle}` in {queries:?}")) + }; - let paths = answer - .citations - .iter() - .filter_map(|citation| citation.file_path.as_deref()) - .collect::>(); - assert_eq!( - answer.citations[0].display_name, "acme_deploy::run_main", - "exact command probe should remain first: {paths:?}" - ); + assert!(position("Workbench") < position("workbench_startup")); for expected in [ - "crates/acme-deploy/src/cli.rs", - "crates/acme-deploy/src/lib.rs", - "crates/acme-deploy/src/event_processor_with_jsonl_output.rs", - "crates/acme-deploy/src/exec_events.rs", + "Workbench.startup", + "ExtensionService", + "ExtensionHostManager", + "ExtensionHostManager.startup", + "AbstractExtHostExtensionService", + "AbstractExtHostExtensionService._startExtensionHost", + "ExtHostCommands", + "ExtHostCommands.executeCommand", ] { assert!( - paths.contains(&expected), - "focused command-root neighbor should survive compact cap: {paths:?}" + queries.iter().any(|query| query == expected), + "expected VS Code flow concept query {expected} in {queries:?}" ); } } #[test] - fn packet_focused_neighborhood_reservation_preserves_exact_command_order() { - let run_main = test_packet_citation( - "acme_deploy::run_main", - "crates/acme-deploy/src/main.rs", - 0.7, - ); - let focused_neighbor = - test_packet_citation("exec_events", "crates/acme-deploy/src/exec_events.rs", 0.99); - let exact_cli = - test_packet_citation("acme_deploy::Cli", "crates/acme-cli/src/main.rs", 0.6); - let cross_root = test_packet_citation("exec_events", "crates/core/src/exec_events.rs", 2.0); - let mut answer = packet_answer_fixture( - "Explain how `acme deploy --json` flows through runtime events.", - vec![focused_neighbor, cross_root, run_main, exact_cli], + fn packet_symbol_probes_treat_root_runtime_as_brand_for_payload_content_flow() { + let _eval_probes = EvalProbesGuard::enabled(); + let queries = packet_symbol_probe_queries( + "Explain how Root & Runtime public writing and social surfaces connect through Payload collections, post rendering, comment auth/submission, RSS, and the Elsewhere feed.", + PacketTaskClassDto::ArchitectureExplanation, + PacketBudgetModeDto::Standard, ); - let limits = PacketBudgetLimitsDto { - max_anchors: 4, - max_files: 4, - max_snippets: 4, - max_trail_edges: 0, - max_output_bytes: 64 * 1024, - }; - cap_packet_citations( - &mut answer, - &limits, - &[ - "acme_deploy::run_main".to_string(), - "acme_deploy::Cli".to_string(), - ], - ); + let position = |needle: &str| { + queries + .iter() + .position(|query| query == needle) + .unwrap_or_else(|| panic!("missing packet query `{needle}` in {queries:?}")) + }; - let displays = answer - .citations - .iter() - .map(|citation| citation.display_name.as_str()) - .collect::>(); - assert_eq!( - displays[..2], - ["acme_deploy::run_main", "acme_deploy::Cli"], - "exact command probes should stay ahead of focused neighbors: {displays:?}" - ); - assert_eq!( - answer.citations[2].file_path.as_deref(), - Some("crates/acme-deploy/src/exec_events.rs") - ); + assert!(position("buildConfig") < position("public_writing")); + for expected in [ + "payload_config", + "payload config", + "src/payload.config.ts", + "getPayloadClient", + "payload client", + "src/lib/payload.ts", + "Posts", + "PostPage", + "getPostBySlug", + "getAllPosts", + "Comments", + "POST /posts/:slug/comments", + "getApprovedCommentsForPost", + "getCommentAuthContextFromHeaders", + "comment_submission_guard", + "comment-submission-guard", + "src/lib/comment-submission-guard.ts", + "isCommentSubmissionOriginAllowed", + "isCommentSubmissionTimingAllowed", + "consumeCommentSubmissionRateLimit", + "SocialEntries", + "getLatestSocialEntries", + "feed.xml", + ] { + assert!( + queries.iter().any(|query| query == expected), + "expected Payload/content flow query {expected} in {queries:?}" + ); + } + for noisy in ["root", "runtime", "root_runtime", "runtime_public"] { + assert!( + !queries.iter().any(|query| query == noisy), + "brand terms should not dominate content-flow probes via {noisy}: {queries:?}" + ); + } } #[test] - fn packet_focus_neighborhood_prefers_source_navigation_files() { - let run_main = test_packet_citation( - "acme_deploy::run_main", - "crates/acme-deploy/src/main.rs", - 0.7, - ); - let focused_event = test_packet_citation( - "EventProcessor", - "crates/acme-deploy/src/event_processor.rs", - 0.99, - ); - let focused_cli = - test_packet_citation("clap::Parser", "crates/acme-deploy/src/cli.rs", 0.2); - let focused_lib = test_packet_citation("cli", "crates/acme-deploy/src/lib.rs", 0.2); - let cross_root_cli = - test_packet_citation("clap::Parser", "crates/other-tool/src/cli.rs", 10.0); - let mut answer = packet_answer_fixture( - "Explain how `acme deploy --json` flows from CLI into runtime.", - vec![ - focused_event, - focused_cli, - run_main, - cross_root_cli, - focused_lib, - ], + fn packet_symbol_probes_keep_brand_phrase_terms_without_flow_anchors() { + let queries = packet_symbol_probe_queries( + "Explain the Acme & Widget architecture.", + PacketTaskClassDto::ArchitectureExplanation, + PacketBudgetModeDto::Standard, ); - let protected = - promote_required_probe_citations(&mut answer, &["acme_deploy::run_main".to_string()]); - let focused = promote_focus_neighborhood_citations(&mut answer, &protected); - let focused_paths = answer - .citations - .iter() - .filter(|citation| focused.contains(&packet_citation_key(citation))) - .filter_map(|citation| citation.file_path.as_deref()) - .collect::>(); - assert!( - focused_paths.starts_with(&[ - "crates/acme-deploy/src/cli.rs", - "crates/acme-deploy/src/lib.rs" - ]), - "source navigation files should be carried before same-root event details or cross-root cli files: {focused_paths:?}" - ); + for expected in ["acme", "widget", "acme_widget"] { + assert!( + queries.iter().any(|query| query == expected), + "brand terms should remain when they are the only concrete anchors: {queries:?}" + ); + } } #[test] - fn packet_focus_neighborhood_skips_protected_file_duplicates() { - let run_main = test_packet_citation( - "acme_deploy::run_main", - "crates/acme-deploy/src/main.rs", - 0.7, + fn packet_symbol_probes_suppress_non_primary_terms_when_prompt_excludes_pollution() { + let question = "How does CodeStory choose semantic retrieval candidates for production source ranking while avoiding fixture pollution in packet evidence?"; + let queries = packet_symbol_probe_queries( + question, + PacketTaskClassDto::ArchitectureExplanation, + PacketBudgetModeDto::Standard, ); - let duplicate_main_cli = - test_packet_citation("acme_deploy::Cli", "crates/acme-deploy/src/main.rs", 5.0); - let duplicate_main_parser = - test_packet_citation("clap::Parser", "crates/acme-deploy/src/main.rs", 5.0); - let focused_cli = test_packet_citation("clap::Args", "crates/acme-deploy/src/cli.rs", 0.2); - let focused_lib = test_packet_citation("cli", "crates/acme-deploy/src/lib.rs", 0.2); - let focused_event = test_packet_citation( - "EventProcessor", - "crates/acme-deploy/src/event_processor.rs", - 0.99, + let concept_queries = packet_concept_queries(question); + + for suppressed in ["fixture", "Fixture"] { + assert!( + !queries.iter().any(|query| query == suppressed), + "packet probes should not search excluded non-primary term {suppressed}: {queries:?}" + ); + assert!( + !concept_queries.iter().any(|query| query == suppressed), + "packet concept queries should not search excluded non-primary term {suppressed}: {concept_queries:?}" + ); + } + assert!( + queries.iter().any(|query| query == "semantic_retrieval"), + "production query should retain concrete production-retrieval probes: {queries:?}" ); - let focused_exec_events = - test_packet_citation("exec_events", "crates/acme-deploy/src/exec_events.rs", 0.98); - let mut answer = packet_answer_fixture( - "Explain how `acme deploy --json` flows from CLI into runtime events.", - vec![ - duplicate_main_cli, - focused_event, - focused_cli, - run_main, - duplicate_main_parser, - focused_exec_events, - focused_lib, - ], + } + + #[test] + fn packet_symbol_probes_keep_non_primary_terms_when_prompt_requests_them() { + let queries = packet_symbol_probe_queries( + "Find the fixture tests for semantic retrieval ranking.", + PacketTaskClassDto::BugLocalization, + PacketBudgetModeDto::Standard, ); - let protected = - promote_required_probe_citations(&mut answer, &["acme_deploy::run_main".to_string()]); - let focused = promote_focus_neighborhood_citations(&mut answer, &protected); - let focused_paths = answer - .citations - .iter() - .filter(|citation| focused.contains(&packet_citation_key(citation))) - .filter_map(|citation| citation.file_path.as_deref()) - .collect::>(); assert!( - !focused_paths.contains(&"crates/acme-deploy/src/main.rs"), - "focus carry should not spend slots on duplicate citations for an already protected file: {focused_paths:?}" + queries.iter().any(|query| query == "fixture"), + "explicit fixture/test requests should keep non-primary probe terms: {queries:?}" ); assert!( - focused_paths.contains(&"crates/acme-deploy/src/cli.rs") - && focused_paths.contains(&"crates/acme-deploy/src/lib.rs") - && focused_paths.contains(&"crates/acme-deploy/src/event_processor.rs") - && focused_paths.contains(&"crates/acme-deploy/src/exec_events.rs"), - "focus carry should preserve distinct same-root source files: {focused_paths:?}" + queries.iter().any(|query| query == "tests"), + "explicit fixture/test requests should keep test probe terms: {queries:?}" ); } #[test] - fn packet_focus_neighborhood_prefers_event_definition_files() { - let run_main = test_packet_citation( - "acme_deploy::run_main", - "crates/acme-deploy/src/main.rs", - 0.7, - ); - let focused_cli = test_packet_citation("clap::Args", "crates/acme-deploy/src/cli.rs", 0.2); - let focused_lib = test_packet_citation("cli", "crates/acme-deploy/src/lib.rs", 0.2); - let focused_event_processor = test_packet_citation( - "EventProcessor", - "crates/acme-deploy/src/event_processor.rs", - 0.99, + fn packet_anchor_probes_reject_generic_file_hits() { + let mut hit = test_search_hit("codex-rs/app-server-protocol/schema/JsonValue.ts", 0.9); + hit.kind = NodeKind::FILE; + hit.match_quality = Some(SearchMatchQualityDto::Exact); + hit.file_path = Some("codex-rs/app-server-protocol/schema/JsonValue.ts".to_string()); + + assert!(!packet_anchor_hit_is_relevant("json", &hit)); + assert!(packet_anchor_hit_is_relevant( + "codex-rs/exec/src/lib.rs", + &hit + )); + + hit.file_path = Some("codex-rs/exec/src/exec_events.rs".to_string()); + assert!(packet_anchor_hit_is_relevant("exec_events", &hit)); + } + + #[test] + fn packet_display_path_preserves_named_repo_subpaths() { + assert_eq!( + packet_display_path(r"\\?\C:\Users\alber\source\repos\codex\codex-rs\exec\src\lib.rs"), + "codex-rs/exec/src/lib.rs" ); - let focused_human_output = test_packet_citation( - "HumanOutput", - "crates/acme-deploy/src/event_processor_with_human_output.rs", - 3.0, + assert_eq!( + packet_display_path( + r"C:\Users\alber\source\repos\codestory\crates\codestory-cli\src\main.rs" + ), + "crates/codestory-cli/src/main.rs" ); - let focused_exec_events = - test_packet_citation("exec_events", "crates/acme-deploy/src/exec_events.rs", 0.1); - let mut answer = packet_answer_fixture( - "Explain how `acme deploy --json` flows through event output.", - vec![ - focused_human_output, - focused_event_processor, - focused_cli, - run_main, - focused_exec_events, - focused_lib, - ], + assert_eq!( + packet_display_path( + r"\\?\C:\Users\alber\source\repos\codestory\target\repo-cache\repos\ripgrep\crates\core\main.rs" + ), + "crates/core/main.rs" ); - let protected = - promote_required_probe_citations(&mut answer, &["acme_deploy::run_main".to_string()]); - let focused = promote_focus_neighborhood_citations(&mut answer, &protected); - - let focused_paths = answer - .citations - .iter() - .filter(|citation| focused.contains(&packet_citation_key(citation))) - .filter_map(|citation| citation.file_path.as_deref()) - .collect::>(); - assert!( - focused_paths.contains(&"crates/acme-deploy/src/exec_events.rs"), - "event definition files should survive compact focus carry: {focused_paths:?}" + assert_eq!( + packet_display_path("target/repo-cache/repos/axios/lib/core/Axios.js"), + "lib/core/Axios.js" ); } #[test] - fn packet_sufficiency_requires_planned_flow_probe_coverage() { - let (_answer, sufficiency) = build_sufficient_packet_fixture( - "Explain how `codex exec --json` flows from the top-level CLI into the exec runtime, app-server thread and turn start requests, and JSONL event output.", - PacketTaskClassDto::ArchitectureExplanation, + fn packet_ranking_prefers_cli_exec_flow_over_sdk_and_test_client() { + let question = "Explain how `codex exec --json` flows from the top-level CLI into the exec runtime and JSONL event output."; + let mut answer = packet_answer_fixture( + question, vec![ + test_packet_citation("CodexExec.run", "sdk/typescript/src/exec.ts", 0.95), + test_packet_citation( + "CodexClient::thread_start", + "codex-rs/app-server-test-client/src/lib.rs", + 0.95, + ), + test_packet_citation("run_exec_session", "codex-rs/exec/src/lib.rs", 0.2), test_packet_citation( "EventProcessorWithJsonOutput", "codex-rs/exec/src/event_processor_with_jsonl_output.rs", + 0.2, + ), + test_packet_citation( + "event_processor_with_jsonl_output", + "codex-rs/exec/src/lib.rs", 0.9, ), - test_packet_citation("exec_cli", "codex-rs/cli/src/main.rs", 0.8), + test_packet_citation("run_main", "codex-rs/exec/src/lib.rs", 0.2), + test_packet_citation("Subcommand::Exec", "codex-rs/cli/src/main.rs", 0.2), test_packet_citation( - "UnifiedExecRuntime", - "codex-rs/core/src/tools/runtimes/unified_exec.rs", - 0.7, + "EventProcessor::process_server_notification", + "codex-rs/exec/src/event_processor.rs", + 0.2, ), + test_packet_citation("ThreadEvent", "codex-rs/exec/src/exec_events.rs", 0.2), test_packet_citation( - "ThreadStartParams", - "codex-rs/app-server-protocol/schema/typescript/v2/ThreadStartParams.ts", - 0.7, + "thread_start_params_include_user_thread_source", + "codex-rs/exec/src/lib_tests.rs", + 0.8, ), ], ); - assert_eq!(sufficiency.status, PacketSufficiencyStatusDto::Partial); + rank_packet_evidence(question, &mut answer); + + let top = answer + .citations + .iter() + .take(6) + .map(|citation| citation.display_name.as_str()) + .collect::>(); assert!( - sufficiency - .gaps - .iter() - .any(|gap| gap.contains("exec session") - && gap.contains("exec command") - && gap.contains("turn start")), - "required flow probes should be named in sufficiency gaps: {sufficiency:?}" + top.iter().any(|name| name.contains("exec") + || name.contains("Exec") + || name.contains("json")), + "exec/json flow symbols should rank in the top band: {top:?}" ); assert!( - sufficiency - .follow_up_commands - .iter() - .any(|command| command.contains("--query 'exec session'")), - "missing required probes should become targeted follow-up searches: {sufficiency:?}" + !top.contains(&"thread_start_params_include_user_thread_source"), + "test-only symbols should not dominate exec/json ranking: {top:?}" ); } #[test] - fn packet_sufficiency_accepts_required_flow_probe_coverage() { - let (_answer, sufficiency) = build_sufficient_packet_fixture( - "Explain how `codex exec --json` flows from the top-level CLI into the exec runtime, app-server thread and turn start requests, and JSONL event output.", - PacketTaskClassDto::ArchitectureExplanation, + fn packet_ranking_prefers_vscode_workbench_flow_over_extension_noise() { + let question = "Explain how VS Code workbench startup reaches extension host activation and command execution."; + let mut answer = packet_answer_fixture( + question, vec![ - test_packet_citation("exec runtime", "codex-rs/exec/src/lib.rs", 0.9), - test_packet_citation("exec session", "codex-rs/exec/src/lib.rs", 0.9), - test_packet_citation("exec cli", "codex-rs/cli/src/main.rs", 0.8), - test_packet_citation("exec command", "codex-rs/cli/src/main.rs", 0.8), test_packet_citation( - "json event output", - "codex-rs/exec/src/event_processor_with_jsonl_output.rs", - 0.8, + "ExtensionHostConnection._sendSocketToExtensionHost", + "src/vs/server/node/extensionHostConnection.ts", + 0.55, ), test_packet_citation( - "jsonl event output", - "codex-rs/exec/src/exec_events.rs", - 0.8, + "ExtensionHost", + "extensions/github-authentication/src/flows.ts", + 0.55, ), test_packet_citation( - "thread start", - "codex-rs/app-server-protocol/schema/typescript/v2/ThreadStartParams.ts", + "Workbench.startup", + "src/vs/workbench/browser/workbench.ts", 0.7, ), test_packet_citation( - "turn start", - "codex-rs/app-server-protocol/schema/typescript/v2/TurnStartParams.ts", + "ExtensionService", + "src/vs/workbench/services/extensions/browser/extensionService.ts", 0.7, ), - ], - ); - - assert_eq!( - sufficiency.status, - PacketSufficiencyStatusDto::Sufficient, - "{sufficiency:?}" - ); - assert!(sufficiency.gaps.is_empty(), "{sufficiency:?}"); - assert!(sufficiency.follow_up_commands.is_empty(), "{sufficiency:?}"); - } - - #[test] - fn packet_sufficiency_rejects_module_facade_for_concrete_file_probe() { - let _eval_probes = EvalProbesGuard::enabled(); - let (_answer, sufficiency) = build_sufficient_packet_fixture( - "Explain how `codex exec --json` flows from the top-level CLI into the exec runtime, app-server thread and turn start requests, and JSONL event output.", - PacketTaskClassDto::ArchitectureExplanation, - vec![ - test_packet_citation("run_exec_session", "codex-rs/exec/src/lib.rs", 0.9), - test_packet_citation("ExecSharedCliOptions", "codex-rs/exec/src/cli.rs", 0.9), - test_packet_citation("Subcommand::Exec", "codex-rs/cli/src/main.rs", 0.8), - test_packet_citation("run_main", "codex-rs/exec/src/main.rs", 0.8), test_packet_citation( - "EventProcessorWithJsonOutput", - "codex-rs/exec/src/event_processor_with_jsonl_output.rs", - 0.8, + "ExtensionHostManager", + "src/vs/workbench/services/extensions/common/extensionHostManager.ts", + 0.7, ), - test_packet_citation("exec_events", "codex-rs/exec/src/lib.rs", 0.8), test_packet_citation( - "ThreadStartParams", - "codex-rs/app-server-protocol/schema/typescript/v2/ThreadStartParams.ts", + "AbstractExtHostExtensionService", + "src/vs/workbench/api/common/extHostExtensionService.ts", 0.7, ), test_packet_citation( - "TurnStartParams", - "codex-rs/app-server-protocol/schema/typescript/v2/TurnStartParams.ts", + "ExtHostCommands", + "src/vs/workbench/api/common/extHostCommands.ts", 0.7, ), ], ); - assert_eq!(sufficiency.status, PacketSufficiencyStatusDto::Partial); + rank_packet_evidence(question, &mut answer); + + let top = answer + .citations + .iter() + .take(5) + .map(|citation| citation.display_name.as_str()) + .collect::>(); + let workbench_symbols = [ + "Workbench.startup", + "ExtensionService", + "ExtensionHostManager", + "AbstractExtHostExtensionService", + "ExtHostCommands", + ]; + let matched = workbench_symbols + .iter() + .filter(|expected| top.contains(expected)) + .count(); assert!( - sufficiency - .gaps - .iter() - .any(|gap| gap.contains("exec_events")), - "facade module declaration should not satisfy the concrete exec_events file probe: {sufficiency:?}" + matched >= 4, + "workbench flow symbols should dominate the top band: matched={matched} top={top:?}" + ); + assert!( + !top.iter() + .any(|name| name.contains("ExtensionHostConnection")), + "server/extension noise should not dominate workbench flow ranking: {top:?}" + ); + assert!( + !top.iter().any(|name| name == &"ExtensionHost"), + "extension contribution modules should not dominate workbench flow ranking: {top:?}" ); } #[test] - fn packet_symbol_probes_expand_indexing_storage_flow_concepts() { + fn packet_capping_prefers_exact_duplicate_claim_definition_file() { let _eval_probes = EvalProbesGuard::enabled(); - let queries = packet_symbol_probe_queries( - "Explain how project/source-group configuration becomes indexing work, then how indexed data is accessed from storage.", - PacketTaskClassDto::ArchitectureExplanation, - PacketBudgetModeDto::Standard, + let question = + "Explain how indexed data is accessed from StorageAccess and PersistentStorage."; + let mut proxy_mention = test_packet_citation( + "StorageAccess", + "src/lib/data/storage/StorageAccessProxy.h", + 0.95, ); + proxy_mention.kind = NodeKind::CLASS; + let mut contract_definition = + test_packet_citation("StorageAccess", "src/lib/data/storage/StorageAccess.h", 0.2); + contract_definition.kind = NodeKind::CLASS; + let mut answer = packet_answer_fixture(question, vec![proxy_mention, contract_definition]); - let position = |needle: &str| { - queries + rank_packet_evidence(question, &mut answer); + apply_packet_budget( + packet_fixture_project_root(), + question, + PacketTaskClassDto::ArchitectureExplanation, + PacketBudgetModeDto::Compact, + packet_budget_limits(PacketBudgetModeDto::Compact), + &mut answer, + ); + assert!( + answer.citations.iter().any(|citation| { + citation.file_path.as_deref() == Some("src/lib/data/storage/StorageAccess.h") + }), + "exact same-stem contract definition should survive adjacent header mentions: {:?}", + answer + .citations .iter() - .position(|query| query == needle) - .unwrap_or_else(|| panic!("missing packet query `{needle}` in {queries:?}")) - }; - - assert!(position("StorageAccess") < position("SourceGroup")); - assert!(position("PersistentStorage") < position("SourceGroup")); - assert!(position("Project::buildIndex") < position("SourceGroup")); - assert!(position("TaskFillIndexerCommandsQueue") < position("SourceGroup")); - assert!(position("IndexerCommandCxx") < position("SourceGroup")); - assert!(position("IndexerJava::doIndex") < position("SourceGroup")); - assert!(queries.iter().any(|query| query == "source_group")); - for expected in [ - "SourceGroupSettings", - "SourceGroupFactoryModule", - "Project::buildIndex", - "SourceGroupCxxCdb", - "SourceGroupCxxCdb::getIndexerCommandProvider", - "buildIndex", - "TaskFillIndexerCommandsQueue", - "IndexerCommand", - "IndexerCommandCxx", - "IndexerJava", - "IndexerJava::doIndex", - "StorageAccess", - "StorageAccessProxy", - "PersistentStorage", - ] { - assert!( - queries.iter().any(|query| query == expected), - "expected flow concept query {expected} in {queries:?}" - ); - } - for noisy in ["turns", "Turns", "cite", "source"] { - assert!( - !queries.iter().any(|query| query == noisy), - "packet probes should suppress noisy query {noisy}: {queries:?}" - ); - } + .map(|citation| (&citation.display_name, citation.file_path.as_deref())) + .collect::>() + ); + assert!( + answer.citations.len() + <= packet_budget_limits(PacketBudgetModeDto::Compact).max_anchors as usize, + "definition preference should not increase packet anchor budget" + ); } #[test] - fn packet_followups_preserve_structured_storage_and_source_group_probes() { - let _eval_probes = EvalProbesGuard::enabled(); - let queries = packet_targeted_follow_up_queries( - "Explain how project/source-group configuration becomes indexing work, then how indexed data is accessed from storage.", - PacketTaskClassDto::ArchitectureExplanation, + fn packet_capping_keeps_bounded_secondary_claim_definitions() { + let mut source_group_method = test_packet_citation( + "SourceGroupCxxCdb::getIndexerCommandProvider", + "src/lib_cxx/project/SourceGroupCxxCdb.cpp", + 0.95, ); - - assert_eq!( - queries.len(), - 6, - "targeted packet follow-ups should stay capped: {queries:?}" + source_group_method.kind = NodeKind::FUNCTION; + let mut storage_ctor = test_packet_citation( + "PersistentStorage::PersistentStorage", + "src/lib/data/storage/PersistentStorage.cpp", + 0.95, ); - assert!( - queries.iter().any(|query| query.contains("Storage")), - "storage-flow follow-ups should keep structured storage probes: {queries:?}" + storage_ctor.kind = NodeKind::FUNCTION; + let mut source_group_type = test_packet_citation( + "SourceGroupCxxCdb", + "src/lib_cxx/project/SourceGroupCxxCdb.h", + 0.2, ); - assert!( - queries.iter().any(|query| query.contains("SourceGroup")), - "source-group follow-ups should keep a structured source-group probe: {queries:?}" + source_group_type.kind = NodeKind::CLASS; + let mut persistent_type = test_packet_citation( + "PersistentStorage", + "src/lib/data/storage/PersistentStorage.h", + 0.2, ); - assert!( - queries - .iter() - .any(|query| query == "SourceGroupCxxCdb::getIndexerCommandProvider"), - "existing qualified source-group probes should still be represented: {queries:?}" + persistent_type.kind = NodeKind::CLASS; + let mut low_value = test_packet_citation( + "SourceGroupFactoryModule::createSourceGroup", + "src/lib/project/SourceGroupFactoryModule.h", + 0.9, + ); + low_value.kind = NodeKind::METHOD; + let mut answer = packet_answer_fixture( + "Explain source group indexing and persistent storage.", + vec![ + source_group_method, + storage_ctor, + source_group_type, + persistent_type, + low_value, + ], ); + let mut limits = packet_budget_limits(PacketBudgetModeDto::Compact); + limits.max_anchors = 4; + limits.max_files = 4; + + assert!(cap_citations(&mut answer, &limits)); + + let paths = answer + .citations + .iter() + .filter_map(|citation| citation.file_path.as_deref()) + .collect::>(); + for expected in [ + "src/lib_cxx/project/SourceGroupCxxCdb.cpp", + "src/lib_cxx/project/SourceGroupCxxCdb.h", + "src/lib/data/storage/PersistentStorage.cpp", + "src/lib/data/storage/PersistentStorage.h", + ] { + assert!( + paths.contains(&expected), + "secondary exact definitions should fit inside the existing cap: {paths:?}" + ); + } + assert_eq!(answer.citations.len(), 4); assert!( - queries - .iter() - .any(|query| query == "PersistentStorage::PersistentStorage"), - "existing persistence constructor probe should still be represented: {queries:?}" + !paths.contains(&"src/lib/project/SourceGroupFactoryModule.h"), + "low-value filler should yield to bounded secondary definitions: {paths:?}" ); } #[test] - fn packet_symbol_probes_expand_vscode_workbench_extension_host_concepts() { - let _eval_probes = EvalProbesGuard::enabled(); - let queries = packet_symbol_probe_queries( - "Explain how VS Code workbench startup reaches extension host activation and command execution.", - PacketTaskClassDto::ArchitectureExplanation, - PacketBudgetModeDto::Standard, + fn packet_capping_prefers_distinct_flow_files_over_same_file_role_duplicates() { + let mut answer = packet_answer_fixture( + "Explain how `codex exec --json` flows from CLI into runtime and event output.", + vec![ + test_packet_citation( + "EventProcessorWithJsonOutput", + "codex-rs/exec/src/event_processor_with_jsonl_output.rs", + 0.95, + ), + test_packet_citation( + "EventProcessorWithJsonOutput::emit", + "codex-rs/exec/src/event_processor_with_jsonl_output.rs", + 0.94, + ), + test_packet_citation( + "EventProcessorWithJsonOutput::completed_item_id", + "codex-rs/exec/src/event_processor_with_jsonl_output.rs", + 0.93, + ), + test_packet_citation("ExecSharedCliOptions", "codex-rs/exec/src/cli.rs", 0.7), + test_packet_citation("run_exec_session", "codex-rs/exec/src/lib.rs", 0.7), + test_packet_citation("exec_events", "codex-rs/exec/src/exec_events.rs", 0.7), + ], ); + let mut limits = packet_budget_limits(PacketBudgetModeDto::Compact); + limits.max_anchors = 4; + limits.max_files = 4; - let position = |needle: &str| { - queries - .iter() - .position(|query| query == needle) - .unwrap_or_else(|| panic!("missing packet query `{needle}` in {queries:?}")) - }; + assert!(cap_citations(&mut answer, &limits)); - assert!(position("Workbench") < position("workbench_startup")); + let paths = answer + .citations + .iter() + .filter_map(|citation| citation.file_path.as_deref()) + .collect::>(); for expected in [ - "Workbench.startup", - "ExtensionService", - "ExtensionHostManager", - "ExtensionHostManager.startup", - "AbstractExtHostExtensionService", - "AbstractExtHostExtensionService._startExtensionHost", - "ExtHostCommands", - "ExtHostCommands.executeCommand", + "codex-rs/exec/src/event_processor_with_jsonl_output.rs", + "codex-rs/exec/src/cli.rs", + "codex-rs/exec/src/lib.rs", + "codex-rs/exec/src/exec_events.rs", ] { assert!( - queries.iter().any(|query| query == expected), - "expected VS Code flow concept query {expected} in {queries:?}" + paths.contains(&expected), + "packet capping should spend scarce anchors on distinct flow files before same-file duplicate claims: {paths:?}" ); } + assert_eq!(answer.citations.len(), 4); } #[test] - fn packet_symbol_probes_treat_root_runtime_as_brand_for_payload_content_flow() { + fn packet_budget_protects_required_probe_citations_from_compact_cap() { let _eval_probes = EvalProbesGuard::enabled(); - let queries = packet_symbol_probe_queries( - "Explain how Root & Runtime public writing and social surfaces connect through Payload collections, post rendering, comment auth/submission, RSS, and the Elsewhere feed.", + let question = "Explain how `codex exec --json` flows from the top-level CLI into the exec runtime, app-server thread and turn start requests, and JSONL event output."; + let mut citations = (0..16) + .map(|index| { + test_packet_citation( + &format!("HighRankDistractor{index}"), + &format!("src/high_rank_distractor_{index}.rs"), + 10.0, + ) + }) + .collect::>(); + citations.extend([ + test_packet_citation("run_exec_session", "codex-rs/exec/src/lib.rs", 0.1), + test_packet_citation("ExecSharedCliOptions", "codex-rs/exec/src/cli.rs", 0.1), + test_packet_citation("Subcommand::Exec", "codex-rs/cli/src/main.rs", 0.1), + test_packet_citation("run_main", "codex-rs/app-server/src/lib.rs", 6.0), + test_packet_citation("codex_exec::run_main", "codex-rs/exec/src/main.rs", 0.1), + test_packet_citation("codex_exec::Cli", "codex-rs/exec/src/main.rs", 0.1), + test_packet_citation( + "EventProcessorWithJsonOutput", + "codex-rs/exec/src/event_processor_with_jsonl_output.rs", + 0.1, + ), + test_packet_citation( + "crate::exec_events::AgentMessageItem (import)", + "codex-rs/exec/src/event_processor_with_jsonl_output.rs", + 5.0, + ), + test_packet_citation("exec_events", "codex-rs/exec/src/lib.rs", 6.0), + test_packet_citation( + "codex_protocol::models::WebSearchAction", + "codex-rs/exec/src/exec_events.rs", + 0.1, + ), + test_packet_citation( + "ThreadStartParams", + "codex-rs/app-server-protocol/src/protocol/v2/thread.rs", + 0.1, + ), + test_packet_citation( + "TurnStartParams", + "codex-rs/app-server-protocol/src/protocol/v2/turn.rs", + 0.1, + ), + ]); + let mut answer = packet_answer_fixture(question, citations); + + rank_packet_evidence(question, &mut answer); + let budget = apply_packet_budget( + packet_fixture_project_root(), + question, PacketTaskClassDto::ArchitectureExplanation, - PacketBudgetModeDto::Standard, + PacketBudgetModeDto::Compact, + packet_budget_limits(PacketBudgetModeDto::Compact), + &mut answer, ); - let position = |needle: &str| { - queries - .iter() - .position(|query| query == needle) - .unwrap_or_else(|| panic!("missing packet query `{needle}` in {queries:?}")) - }; - - assert!(position("buildConfig") < position("public_writing")); + let paths = answer + .citations + .iter() + .filter_map(|citation| citation.file_path.as_deref()) + .collect::>(); for expected in [ - "payload_config", - "payload config", - "src/payload.config.ts", - "getPayloadClient", - "payload client", - "src/lib/payload.ts", - "Posts", - "PostPage", - "getPostBySlug", - "getAllPosts", - "Comments", - "POST /posts/:slug/comments", - "getApprovedCommentsForPost", - "getCommentAuthContextFromHeaders", - "comment_submission_guard", - "comment-submission-guard", - "src/lib/comment-submission-guard.ts", - "isCommentSubmissionOriginAllowed", - "isCommentSubmissionTimingAllowed", - "consumeCommentSubmissionRateLimit", - "SocialEntries", - "getLatestSocialEntries", - "feed.xml", + "codex-rs/exec/src/lib.rs", + "codex-rs/exec/src/cli.rs", + "codex-rs/cli/src/main.rs", + "codex-rs/exec/src/main.rs", + "codex-rs/exec/src/event_processor_with_jsonl_output.rs", + "codex-rs/exec/src/exec_events.rs", + "codex-rs/app-server-protocol/src/protocol/v2/thread.rs", + "codex-rs/app-server-protocol/src/protocol/v2/turn.rs", ] { assert!( - queries.iter().any(|query| query == expected), - "expected Payload/content flow query {expected} in {queries:?}" + paths.contains(&expected), + "compact packet cap should protect required planned-probe citations before high-ranking distractors: {paths:?}" ); } - for noisy in ["root", "runtime", "root_runtime", "runtime_public"] { + assert!( + answer + .citations + .iter() + .any(|citation| citation.file_path.as_deref() + == Some("codex-rs/exec/src/exec_events.rs")), + "required probe protection should prefer the actual exec_events file over import-only or facade mentions: {:?}", + answer + .citations + .iter() + .map(|citation| (&citation.display_name, citation.file_path.as_deref())) + .collect::>() + ); + let exact_entrypoint_index = answer + .citations + .iter() + .position(|citation| { + citation.display_name == "codex_exec::run_main" + && citation.file_path.as_deref() == Some("codex-rs/exec/src/main.rs") + }) + .unwrap_or_else(|| { + panic!( + "command-span exact probes should protect the qualified command entrypoint: {:?}", + answer + .citations + .iter() + .map(|citation| (&citation.display_name, citation.file_path.as_deref())) + .collect::>() + ) + }); + if let Some(broad_entrypoint_index) = answer.citations.iter().position(|citation| { + citation.display_name == "run_main" + && citation.file_path.as_deref() == Some("codex-rs/app-server/src/lib.rs") + }) { assert!( - !queries.iter().any(|query| query == noisy), - "brand terms should not dominate content-flow probes via {noisy}: {queries:?}" + exact_entrypoint_index < broad_entrypoint_index, + "qualified command evidence should be protected ahead of weaker broad command evidence: {:?}", + answer + .citations + .iter() + .map(|citation| (&citation.display_name, citation.file_path.as_deref())) + .collect::>() ); } + assert!( + answer + .citations + .iter() + .any(|citation| citation.display_name == "codex_exec::Cli" + && citation.file_path.as_deref() == Some("codex-rs/exec/src/main.rs")), + "command-span exact probes should protect the qualified command CLI type: {:?}", + answer + .citations + .iter() + .map(|citation| (&citation.display_name, citation.file_path.as_deref())) + .collect::>() + ); + assert!(budget.truncated); + assert_eq!(answer.citations.len(), 13); } #[test] - fn packet_symbol_probes_keep_brand_phrase_terms_without_flow_anchors() { - let queries = packet_symbol_probe_queries( - "Explain the Acme & Widget architecture.", - PacketTaskClassDto::ArchitectureExplanation, - PacketBudgetModeDto::Standard, - ); - - for expected in ["acme", "widget", "acme_widget"] { - assert!( - queries.iter().any(|query| query == expected), - "brand terms should remain when they are the only concrete anchors: {queries:?}" - ); - } - } + fn packet_budget_protects_generic_indexing_flow_probe_citations() { + let question = "Explain how a full indexing run moves from the CLI into runtime orchestration, file discovery, symbol extraction, persistence, and search or snapshot refresh."; + let mut citations = (0..20) + .map(|index| { + test_packet_citation( + &format!("HighRankDistractor{index}"), + &format!("src/high_rank_distractor_{index}.rs"), + 10.0, + ) + }) + .collect::>(); + citations.extend([ + test_packet_citation( + "test_incremental_indexing_second_run_reuses_unchanged_extraction_cache_and_resolution_support", + "crates/codestory-indexer/tests/integration.rs", + 9.5, + ), + test_packet_citation( + "EmbeddingRuntime::test_runtime", + "crates/codestory-runtime/src/search/engine.rs", + 9.4, + ), + test_packet_citation( + "tests::drill_question_search_is_partial_discovery_evidence", + "crates/codestory-cli/src/main.rs", + 9.3, + ), + test_packet_citation( + "indexing entrypoint", + "crates/codestory-runtime/src/services.rs", + 0.1, + ), + test_packet_citation( + "file discovery", + "crates/codestory-workspace/src/lib.rs", + 0.1, + ), + test_packet_citation( + "symbol extraction", + "crates/codestory-indexer/src/lib.rs", + 0.1, + ), + test_packet_citation( + "storage persistence", + "crates/codestory-store/src/storage_impl/mod.rs", + 0.1, + ), + test_packet_citation( + "search projection", + "crates/codestory-store/src/storage_impl/mod.rs", + 0.1, + ), + test_packet_citation( + "snapshot refresh", + "crates/codestory-store/src/snapshot_store.rs", + 0.1, + ), + ]); + let mut answer = packet_answer_fixture(question, citations); - #[test] - fn packet_symbol_probes_suppress_non_primary_terms_when_prompt_excludes_pollution() { - let question = "How does CodeStory choose semantic retrieval candidates for production source ranking while avoiding fixture pollution in packet evidence?"; - let queries = packet_symbol_probe_queries( + rank_packet_evidence(question, &mut answer); + let budget = apply_packet_budget( + packet_fixture_project_root(), question, PacketTaskClassDto::ArchitectureExplanation, - PacketBudgetModeDto::Standard, + PacketBudgetModeDto::Compact, + packet_budget_limits(PacketBudgetModeDto::Compact), + &mut answer, ); - let concept_queries = packet_concept_queries(question); - for suppressed in ["fixture", "Fixture"] { + let display_names = answer + .citations + .iter() + .map(|citation| citation.display_name.as_str()) + .collect::>(); + for expected in [ + "indexing entrypoint", + "file discovery", + "symbol extraction", + "storage persistence", + "search projection", + "snapshot refresh", + ] { assert!( - !queries.iter().any(|query| query == suppressed), - "packet probes should not search excluded non-primary term {suppressed}: {queries:?}" + display_names.contains(&expected), + "compact packet cap should protect generic indexing-flow probe {expected}: {display_names:?}" ); + } + for low_value in [ + "test_incremental_indexing_second_run_reuses_unchanged_extraction_cache_and_resolution_support", + "EmbeddingRuntime::test_runtime", + "tests::drill_question_search_is_partial_discovery_evidence", + ] { assert!( - !concept_queries.iter().any(|query| query == suppressed), - "packet concept queries should not search excluded non-primary term {suppressed}: {concept_queries:?}" + !display_names.contains(&low_value), + "compact packet should not keep low-value test evidence when production alternatives exist: {display_names:?}" ); } - assert!( - queries.iter().any(|query| query == "semantic_retrieval"), - "production query should retain concrete production-retrieval probes: {queries:?}" - ); - } - - #[test] - fn packet_symbol_probes_keep_non_primary_terms_when_prompt_requests_them() { - let queries = packet_symbol_probe_queries( - "Find the fixture tests for semantic retrieval ranking.", - PacketTaskClassDto::BugLocalization, - PacketBudgetModeDto::Standard, - ); - - assert!( - queries.iter().any(|query| query == "fixture"), - "explicit fixture/test requests should keep non-primary probe terms: {queries:?}" - ); - assert!( - queries.iter().any(|query| query == "tests"), - "explicit fixture/test requests should keep test probe terms: {queries:?}" - ); - } - - #[test] - fn packet_anchor_probes_reject_generic_file_hits() { - let mut hit = test_search_hit("codex-rs/app-server-protocol/schema/JsonValue.ts", 0.9); - hit.kind = NodeKind::FILE; - hit.match_quality = Some(SearchMatchQualityDto::Exact); - hit.file_path = Some("codex-rs/app-server-protocol/schema/JsonValue.ts".to_string()); - - assert!(!packet_anchor_hit_is_relevant("json", &hit)); - assert!(packet_anchor_hit_is_relevant( - "codex-rs/exec/src/lib.rs", - &hit - )); - - hit.file_path = Some("codex-rs/exec/src/exec_events.rs".to_string()); - assert!(packet_anchor_hit_is_relevant("exec_events", &hit)); + assert!(budget.truncated); + assert_eq!(answer.citations.len(), 13); } #[test] - fn packet_display_path_preserves_named_repo_subpaths() { - assert_eq!( - packet_display_path(r"\\?\C:\Users\alber\source\repos\codex\codex-rs\exec\src\lib.rs"), - "codex-rs/exec/src/lib.rs" - ); - assert_eq!( - packet_display_path( - r"C:\Users\alber\source\repos\codestory\crates\codestory-cli\src\main.rs" + fn packet_budget_replaces_overrepresented_family_with_late_flow_role() { + let question = "Explain how project/source-group configuration becomes indexing work and storage access."; + let mut citations = (0..10) + .map(|index| { + test_packet_citation( + &format!("SourceGroupFamily{index}"), + &format!("src/lib/project/SourceGroupFamily{index}.cpp"), + 10.0 - (index as f32 * 0.1), + ) + }) + .collect::>(); + citations.extend([ + test_packet_citation( + "TaskFillIndexerCommandsQueue", + "src/lib/data/indexer/TaskFillIndexerCommandsQueue.h", + 0.2, ), - "crates/codestory-cli/src/main.rs" - ); - } + test_packet_citation("StorageAccess", "src/lib/data/storage/StorageAccess.h", 0.1), + ]); + let mut answer = packet_answer_fixture(question, citations); + let mut limits = packet_budget_limits(PacketBudgetModeDto::Compact); + limits.max_anchors = 6; + limits.max_files = 6; - #[test] - fn packet_ranking_prefers_cli_exec_flow_over_sdk_and_test_client() { - let question = "Explain how `codex exec --json` flows from the top-level CLI into the exec runtime and JSONL event output."; - let mut answer = packet_answer_fixture( + rank_packet_evidence(question, &mut answer); + let budget = apply_packet_budget( + packet_fixture_project_root(), question, - vec![ - test_packet_citation("CodexExec.run", "sdk/typescript/src/exec.ts", 0.95), - test_packet_citation( - "CodexClient::thread_start", - "codex-rs/app-server-test-client/src/lib.rs", - 0.95, - ), - test_packet_citation("run_exec_session", "codex-rs/exec/src/lib.rs", 0.2), - test_packet_citation( - "EventProcessorWithJsonOutput", - "codex-rs/exec/src/event_processor_with_jsonl_output.rs", - 0.2, - ), - test_packet_citation( - "event_processor_with_jsonl_output", - "codex-rs/exec/src/lib.rs", - 0.9, - ), - test_packet_citation("run_main", "codex-rs/exec/src/lib.rs", 0.2), - test_packet_citation("Subcommand::Exec", "codex-rs/cli/src/main.rs", 0.2), - test_packet_citation( - "EventProcessor::process_server_notification", - "codex-rs/exec/src/event_processor.rs", - 0.2, - ), - test_packet_citation("ThreadEvent", "codex-rs/exec/src/exec_events.rs", 0.2), - test_packet_citation( - "thread_start_params_include_user_thread_source", - "codex-rs/exec/src/lib_tests.rs", - 0.8, - ), - ], + PacketTaskClassDto::ArchitectureExplanation, + PacketBudgetModeDto::Compact, + limits, + &mut answer, ); - rank_packet_evidence(question, &mut answer); - - let top = answer + assert!(budget.truncated, "fixture should exercise citation capping"); + let paths = answer .citations .iter() - .take(6) - .map(|citation| citation.display_name.as_str()) + .filter_map(|citation| citation.file_path.as_deref()) .collect::>(); assert!( - top.iter().any(|name| name.contains("exec") - || name.contains("Exec") - || name.contains("json")), - "exec/json flow symbols should rank in the top band: {top:?}" + paths.contains(&"src/lib/data/indexer/TaskFillIndexerCommandsQueue.h"), + "late indexing evidence should replace overrepresented source-group evidence: {paths:?}" ); assert!( - !top.contains(&"thread_start_params_include_user_thread_source"), - "test-only symbols should not dominate exec/json ranking: {top:?}" + paths.contains(&"src/lib/data/storage/StorageAccess.h"), + "late storage evidence should replace overrepresented source-group evidence: {paths:?}" ); } #[test] - fn packet_ranking_prefers_vscode_workbench_flow_over_extension_noise() { - let question = "Explain how VS Code workbench startup reaches extension host activation and command execution."; - let mut answer = packet_answer_fixture( - question, - vec![ - test_packet_citation( - "ExtensionHostConnection._sendSocketToExtensionHost", - "src/vs/server/node/extensionHostConnection.ts", - 0.55, - ), - test_packet_citation( - "ExtensionHost", - "extensions/github-authentication/src/flows.ts", - 0.55, - ), - test_packet_citation( - "Workbench.startup", - "src/vs/workbench/browser/workbench.ts", - 0.7, - ), - test_packet_citation( - "ExtensionService", - "src/vs/workbench/services/extensions/browser/extensionService.ts", - 0.7, - ), - test_packet_citation( - "ExtensionHostManager", - "src/vs/workbench/services/extensions/common/extensionHostManager.ts", - 0.7, - ), - test_packet_citation( - "AbstractExtHostExtensionService", - "src/vs/workbench/api/common/extHostExtensionService.ts", - 0.7, - ), - test_packet_citation( - "ExtHostCommands", - "src/vs/workbench/api/common/extHostCommands.ts", - 0.7, - ), + fn packet_budget_replaces_weaker_same_role_with_late_definition_files() { + let question = + "Explain how source-group configuration becomes indexing work and storage access."; + let mut set_subject = test_packet_citation( + "StorageAccessProxy::setSubject", + "src/lib/data/storage/StorageAccessProxy.h", + 9.0, + ); + set_subject.kind = NodeKind::METHOD; + let mut storage_contract = + test_packet_citation("StorageAccess", "src/lib/data/storage/StorageAccess.h", 0.1); + storage_contract.kind = NodeKind::CLASS; + let mut persistent_storage = test_packet_citation( + "PersistentStorage", + "src/lib/data/storage/PersistentStorage.h", + 0.1, + ); + persistent_storage.kind = NodeKind::CLASS; + let mut task_ctor = test_packet_citation( + "TaskFillIndexerCommandsQueue::TaskFillIndexerCommandsQueue", + "src/lib/data/indexer/TaskFillIndexerCommandQueue.cpp", + 8.0, + ); + task_ctor.kind = NodeKind::FUNCTION; + let mut task_type = test_packet_citation( + "TaskFillIndexerCommandsQueue", + "src/lib/data/indexer/TaskFillIndexerCommandQueue.h", + 0.1, + ); + task_type.kind = NodeKind::CLASS; + let mut answer = packet_answer_fixture( + question, + vec![ + test_packet_citation("SourceGroupFamily0", "src/lib/project/SourceGroup0.h", 10.0), + test_packet_citation("SourceGroupFamily1", "src/lib/project/SourceGroup1.h", 9.5), + set_subject, + test_packet_citation("StorageTest", "src/test/StorageTestSuite.cpp", 9.0), + task_ctor, + storage_contract, + persistent_storage, + task_type, ], ); + let mut limits = packet_budget_limits(PacketBudgetModeDto::Compact); + limits.max_anchors = 6; + limits.max_files = 6; - rank_packet_evidence(question, &mut answer); + assert!(cap_citations(&mut answer, &limits)); - let top = answer + let paths = answer .citations .iter() - .take(5) - .map(|citation| citation.display_name.as_str()) + .filter_map(|citation| citation.file_path.as_deref()) .collect::>(); - let workbench_symbols = [ - "Workbench.startup", - "ExtensionService", - "ExtensionHostManager", - "AbstractExtHostExtensionService", - "ExtHostCommands", - ]; - let matched = workbench_symbols - .iter() - .filter(|expected| top.contains(expected)) - .count(); - assert!( - matched >= 4, - "workbench flow symbols should dominate the top band: matched={matched} top={top:?}" - ); - assert!( - !top.iter() - .any(|name| name.contains("ExtensionHostConnection")), - "server/extension noise should not dominate workbench flow ranking: {top:?}" - ); + for expected in [ + "src/lib/data/storage/StorageAccess.h", + "src/lib/data/storage/PersistentStorage.h", + "src/lib/data/indexer/TaskFillIndexerCommandQueue.h", + ] { + assert!( + paths.contains(&expected), + "late definition-file anchors should replace weaker same-role/test citations: {paths:?}" + ); + } assert!( - !top.iter().any(|name| name == &"ExtensionHost"), - "extension contribution modules should not dominate workbench flow ranking: {top:?}" + !paths.contains(&"src/test/StorageTestSuite.cpp"), + "test evidence should not consume compact citation slots for production-flow prompts: {paths:?}" ); + assert!(answer.citations.len() <= limits.max_anchors as usize); } #[test] - fn packet_capping_prefers_exact_duplicate_claim_definition_file() { - let _eval_probes = EvalProbesGuard::enabled(); - let question = - "Explain how indexed data is accessed from StorageAccess and PersistentStorage."; - let mut proxy_mention = test_packet_citation( - "StorageAccess", + fn packet_budget_protects_storage_required_probe_citations() { + let question = "Explain how project/source-group configuration becomes indexing work, then how indexed data is accessed by the application."; + let mut proxy_header = test_packet_citation( + "StorageAccessProxy", "src/lib/data/storage/StorageAccessProxy.h", - 0.95, + 10.0, ); - proxy_mention.kind = NodeKind::CLASS; - let mut contract_definition = - test_packet_citation("StorageAccess", "src/lib/data/storage/StorageAccess.h", 0.2); - contract_definition.kind = NodeKind::CLASS; - let mut answer = packet_answer_fixture(question, vec![proxy_mention, contract_definition]); + proxy_header.kind = NodeKind::CLASS; + let mut storage_contract = + test_packet_citation("StorageAccess", "src/lib/data/storage/StorageAccess.h", 0.1); + storage_contract.kind = NodeKind::CLASS; + let mut proxy_method = test_packet_citation( + "StorageAccessProxy::setSubject", + "src/lib/data/storage/StorageAccessProxy.cpp", + 0.1, + ); + proxy_method.kind = NodeKind::METHOD; + let mut persistent_type = test_packet_citation( + "PersistentStorage", + "src/lib/data/storage/PersistentStorage.h", + 0.1, + ); + persistent_type.kind = NodeKind::CLASS; + let mut persistent_ctor = test_packet_citation( + "PersistentStorage::PersistentStorage", + "src/lib/data/storage/PersistentStorage.cpp", + 0.1, + ); + persistent_ctor.kind = NodeKind::FUNCTION; + let mut answer = packet_answer_fixture( + question, + vec![ + test_packet_citation("SourceGroup", "src/lib/project/SourceGroup.h", 9.5), + proxy_header, + test_packet_citation("StorageRegression", "src/test/StorageRegression.cpp", 9.0), + storage_contract, + proxy_method, + persistent_type, + persistent_ctor, + ], + ); + let mut limits = packet_budget_limits(PacketBudgetModeDto::Compact); + limits.max_anchors = 5; + limits.max_files = 5; rank_packet_evidence(question, &mut answer); - apply_packet_budget( + let budget = apply_packet_budget( packet_fixture_project_root(), question, PacketTaskClassDto::ArchitectureExplanation, PacketBudgetModeDto::Compact, - packet_budget_limits(PacketBudgetModeDto::Compact), + limits, &mut answer, ); + + assert!(budget.truncated, "fixture should exercise compact capping"); + let paths = answer + .citations + .iter() + .filter_map(|citation| citation.file_path.as_deref()) + .collect::>(); + for expected in [ + "src/lib/data/storage/StorageAccess.h", + "src/lib/data/storage/PersistentStorage.h", + "src/lib/data/storage/PersistentStorage.cpp", + ] { + assert!( + paths.contains(&expected), + "storage-flow required probes should protect exact contract and implementation paths: {paths:?}" + ); + } assert!( - answer.citations.iter().any(|citation| { - citation.file_path.as_deref() == Some("src/lib/data/storage/StorageAccess.h") - }), - "exact same-stem contract definition should survive adjacent header mentions: {:?}", - answer - .citations - .iter() - .map(|citation| (&citation.display_name, citation.file_path.as_deref())) - .collect::>() - ); - assert!( - answer.citations.len() - <= packet_budget_limits(PacketBudgetModeDto::Compact).max_anchors as usize, - "definition preference should not increase packet anchor budget" + !paths.contains(&"src/test/StorageRegression.cpp"), + "test filler should not displace protected production storage paths: {paths:?}" ); } #[test] - fn packet_capping_keeps_bounded_secondary_claim_definitions() { - let mut source_group_method = test_packet_citation( - "SourceGroupCxxCdb::getIndexerCommandProvider", - "src/lib_cxx/project/SourceGroupCxxCdb.cpp", - 0.95, - ); - source_group_method.kind = NodeKind::FUNCTION; - let mut storage_ctor = test_packet_citation( - "PersistentStorage::PersistentStorage", - "src/lib/data/storage/PersistentStorage.cpp", - 0.95, - ); - storage_ctor.kind = NodeKind::FUNCTION; + fn packet_budget_protects_indexing_required_probe_citations() { + let question = "Explain how project/source-group configuration becomes indexing work, then how command providers create C++ and Java work items."; + let mut citations = (0..10) + .map(|index| { + test_packet_citation( + &format!("SourceGroupNoise{index}"), + &format!("src/lib/project/SourceGroupNoise{index}.h"), + 10.0 - (index as f32 * 0.1), + ) + }) + .collect::>(); + let mut build_index = + test_packet_citation("Project::buildIndex", "src/lib/project/Project.cpp", 0.1); + build_index.kind = NodeKind::FUNCTION; let mut source_group_type = test_packet_citation( "SourceGroupCxxCdb", "src/lib_cxx/project/SourceGroupCxxCdb.h", - 0.2, + 0.1, ); source_group_type.kind = NodeKind::CLASS; - let mut persistent_type = test_packet_citation( - "PersistentStorage", - "src/lib/data/storage/PersistentStorage.h", - 0.2, + let mut source_group_provider = test_packet_citation( + "SourceGroupCxxCdb::getIndexerCommandProvider", + "src/lib_cxx/project/SourceGroupCxxCdb.cpp", + 0.1, ); - persistent_type.kind = NodeKind::CLASS; - let mut low_value = test_packet_citation( - "SourceGroupFactoryModule::createSourceGroup", - "src/lib/project/SourceGroupFactoryModule.h", - 0.9, + source_group_provider.kind = NodeKind::METHOD; + let mut task_queue = test_packet_citation( + "TaskFillIndexerCommandsQueue", + "src/lib/data/indexer/TaskFillIndexerCommandQueue.h", + 0.1, ); - low_value.kind = NodeKind::METHOD; - let mut answer = packet_answer_fixture( - "Explain source group indexing and persistent storage.", - vec![ - source_group_method, - storage_ctor, - source_group_type, - persistent_type, - low_value, - ], + task_queue.kind = NodeKind::CLASS; + let mut cxx_command = test_packet_citation( + "IndexerCommandCxx", + "src/lib_cxx/data/indexer/IndexerCommandCxx.h", + 0.1, + ); + cxx_command.kind = NodeKind::CLASS; + let mut java_indexer = test_packet_citation( + "IndexerJava::doIndex", + "src/lib_java/data/indexer/IndexerJava.cpp", + 0.1, + ); + java_indexer.kind = NodeKind::METHOD; + citations.extend([ + build_index, + source_group_type, + source_group_provider, + task_queue, + cxx_command, + java_indexer, + test_packet_citation("IndexerRegression", "src/test/IndexerRegression.cpp", 9.5), + ]); + let mut answer = packet_answer_fixture(question, citations); + let mut limits = packet_budget_limits(PacketBudgetModeDto::Compact); + limits.max_anchors = 8; + limits.max_files = 8; + + rank_packet_evidence(question, &mut answer); + let budget = apply_packet_budget( + packet_fixture_project_root(), + question, + PacketTaskClassDto::ArchitectureExplanation, + PacketBudgetModeDto::Compact, + limits, + &mut answer, ); - let mut limits = packet_budget_limits(PacketBudgetModeDto::Compact); - limits.max_anchors = 4; - limits.max_files = 4; - - assert!(cap_citations(&mut answer, &limits)); + assert!(budget.truncated, "fixture should exercise compact capping"); let paths = answer .citations .iter() .filter_map(|citation| citation.file_path.as_deref()) .collect::>(); for expected in [ - "src/lib_cxx/project/SourceGroupCxxCdb.cpp", + "src/lib/project/Project.cpp", "src/lib_cxx/project/SourceGroupCxxCdb.h", - "src/lib/data/storage/PersistentStorage.cpp", - "src/lib/data/storage/PersistentStorage.h", + "src/lib_cxx/data/indexer/IndexerCommandCxx.h", ] { assert!( paths.contains(&expected), - "secondary exact definitions should fit inside the existing cap: {paths:?}" + "indexing required probes should protect exact source-group and work-queue paths: {paths:?}" ); } - assert_eq!(answer.citations.len(), 4); assert!( - !paths.contains(&"src/lib/project/SourceGroupFactoryModule.h"), - "low-value filler should yield to bounded secondary definitions: {paths:?}" + !paths.contains(&"src/test/IndexerRegression.cpp"), + "test filler should not displace protected production indexing paths: {paths:?}" ); } #[test] - fn packet_capping_prefers_distinct_flow_files_over_same_file_role_duplicates() { + fn packet_budget_fills_spare_capacity_with_deferred_production_before_tests() { + let mut proxy_header = test_packet_citation( + "StorageAccessProxy", + "src/lib/data/storage/StorageAccessProxy.h", + 0.9, + ); + proxy_header.kind = NodeKind::CLASS; + let mut proxy_impl = test_packet_citation( + "StorageAccessProxy", + "src/lib/data/storage/StorageAccessProxy.cpp", + 0.1, + ); + proxy_impl.kind = NodeKind::CLASS; let mut answer = packet_answer_fixture( - "Explain how `codex exec --json` flows from CLI into runtime and event output.", + "Explain runtime routing, indexing, and storage access.", vec![ + test_packet_citation("RuntimeCoordinator", "src/runtime/coordinator.rs", 0.8), + proxy_header, test_packet_citation( - "EventProcessorWithJsonOutput", - "codex-rs/exec/src/event_processor_with_jsonl_output.rs", - 0.95, - ), - test_packet_citation( - "EventProcessorWithJsonOutput::emit", - "codex-rs/exec/src/event_processor_with_jsonl_output.rs", - 0.94, - ), - test_packet_citation( - "EventProcessorWithJsonOutput::completed_item_id", - "codex-rs/exec/src/event_processor_with_jsonl_output.rs", - 0.93, + "TaskBuildIndex", + "src/lib/data/indexer/TaskBuildIndex.h", + 0.8, ), - test_packet_citation("ExecSharedCliOptions", "codex-rs/exec/src/cli.rs", 0.7), - test_packet_citation("run_exec_session", "codex-rs/exec/src/lib.rs", 0.7), - test_packet_citation("exec_events", "codex-rs/exec/src/exec_events.rs", 0.7), + test_packet_citation("StorageRegression", "src/test/StorageRegression.cpp", 10.0), + proxy_impl, ], ); let mut limits = packet_budget_limits(PacketBudgetModeDto::Compact); @@ -7385,1216 +4908,1691 @@ mod tests { let paths = answer .citations .iter() - .filter_map(|citation| citation.file_path.as_deref()) + .filter_map(|citation| citation.file_path.as_deref()) + .collect::>(); + assert!( + paths.contains(&"src/lib/data/storage/StorageAccessProxy.cpp"), + "deferred production evidence should fill spare capacity before deferred tests: {paths:?}" + ); + assert!( + !paths.contains(&"src/test/StorageRegression.cpp"), + "deferred tests should only fill after production evidence is exhausted: {paths:?}" + ); + } + + #[test] + fn packet_budget_defers_test_evidence_when_compact_cap_is_full() { + let mut answer = packet_answer_fixture( + "Explain runtime routing and persistence.", + vec![ + test_packet_citation("RuntimeCoordinator", "src/runtime/coordinator.rs", 0.8), + test_packet_citation("RouteHandler", "src/router/handler.rs", 0.8), + test_packet_citation("ProjectionStore", "src/store/projection.rs", 0.8), + test_packet_citation("RegressionCase", "tests/runtime_regression.rs", 10.0), + ], + ); + let mut limits = packet_budget_limits(PacketBudgetModeDto::Compact); + limits.max_anchors = 3; + limits.max_files = 3; + + assert!(cap_citations(&mut answer, &limits)); + + let paths = answer + .citations + .iter() + .filter_map(|citation| citation.file_path.as_deref()) + .collect::>(); + assert!( + !paths.contains(&"tests/runtime_regression.rs"), + "test evidence should fill only spare citation capacity: {paths:?}" + ); + } + + #[test] + fn packet_ranking_demotes_low_signal_current_symbols() { + let question = "Study current architecture and runtime boundaries."; + let mut answer = packet_answer_fixture( + question, + vec![ + test_packet_citation("current", "src/runtime/current.rs", 5.0), + test_packet_citation("RuntimeCoordinator", "src/runtime/coordinator.rs", 0.2), + ], + ); + + rank_packet_evidence(question, &mut answer); + + assert_eq!(answer.citations[0].display_name, "RuntimeCoordinator"); + } + + #[test] + fn packet_ranking_keeps_explicit_low_signal_symbol_queries() { + let question = "Find `current`."; + let mut answer = packet_answer_fixture( + question, + vec![ + test_packet_citation("current", "src/runtime/current.rs", 0.2), + test_packet_citation("RuntimeCoordinator", "src/runtime/coordinator.rs", 0.2), + ], + ); + + rank_packet_evidence(question, &mut answer); + + assert_eq!(answer.citations[0].display_name, "current"); + } + + #[test] + fn investigation_mode_is_explicit_preset_only() { + let mut profile = latency_profile(); + profile.policy_mode = AgentRetrievalPolicyModeDto::CompletenessFirst; + assert!(!should_investigate(&profile)); + + profile.preset = codestory_contracts::api::AgentRetrievalPresetDto::Investigate; + assert!(should_investigate(&profile)); + } + + #[test] + fn weak_initial_hits_use_normalized_search_scores() { + assert!(!weak_initial_hits( + "strong", + &[ + test_search_hit("strong", 0.31), + test_search_hit("second", 0.20), + test_search_hit("third", 0.10), + ] + )); + assert!(weak_initial_hits( + "weak", + &[ + test_search_hit("weak", 0.29), + test_search_hit("second", 0.20), + test_search_hit("third", 0.10), + ] + )); + assert!(weak_initial_hits( + "too_few", + &[test_search_hit("too_few", 0.29)] + )); + } + + #[test] + fn weak_initial_hits_treat_semantic_only_matches_as_low_confidence() { + assert!(weak_initial_hits( + "unrelated billing conveyor", + &[ + test_semantic_only_hit("semantic_one", 0.90), + test_semantic_only_hit("semantic_two", 0.80), + test_semantic_only_hit("semantic_three", 0.70), + ] + )); + } + + #[test] + fn weak_initial_hits_accept_prompt_anchored_symbol_names() { + assert!(!weak_initial_hits( + "Where is exact_symbol_anchor used?", + &[test_semantic_only_hit("exact_symbol_anchor", 0.90)] + )); + } + + #[test] + fn investigation_focus_anchor_prefers_prompt_named_symbol() { + let hit = test_semantic_only_hit("exact_symbol_anchor", 0.05); + assert_eq!( + investigation_focus_anchor("Explain exact_symbol_anchor", &[hit]) + .expect("prompt-named hit should become focus") + .0, + "exact_symbol_anchor" + ); + assert!( + investigation_focus_anchor( + "Explain unrelated behavior", + &[test_semantic_only_hit("exact_symbol_anchor", 0.90)] + ) + .is_none() + ); + } + + #[test] + fn repo_explanation_prompt_detection_is_broad_but_not_symbolic() { + assert!(is_repo_explanation_prompt( + "How does this repo fit together?" + )); + assert!(is_repo_explanation_prompt( + "Explain the project architecture" + )); + assert!(!is_repo_explanation_prompt( + "Where is build_llm_symbol_doc_text used?" + )); + } + + #[test] + fn packet_plan_infers_task_class_and_code_terms() { + let plan = build_packet_plan( + "Trace the /api/users route through AppController and UserStore", + None, + PacketBudgetModeDto::Standard, + ); + + assert_eq!(plan.task_class, PacketTaskClassDto::RouteTracing); + assert!(plan.inferred_task_class); + assert!( + plan.queries.iter().any(|query| query.query == "/api/users"), + "route-like terms should become concrete packet queries: {plan:?}" + ); + assert!( + plan.queries + .iter() + .any(|query| query.query == "AppController"), + "CamelCase symbols should become concrete packet queries: {plan:?}" + ); + } + + #[test] + fn requested_packet_task_class_overrides_heuristic() { + let plan = build_packet_plan( + "What would change if the indexing cache format moved?", + Some(PacketTaskClassDto::ChangeImpact), + PacketBudgetModeDto::Standard, + ); + + assert_eq!(plan.task_class, PacketTaskClassDto::ChangeImpact); + assert!(!plan.inferred_task_class); + assert!( + plan.queries + .iter() + .any(|query| query.query.contains("affected")), + "change impact plans should seed affected-symbol queries: {plan:?}" + ); + } + + #[test] + fn packet_plan_expands_task_wording_without_fixture_specific_anchors() { + let plan = build_packet_plan( + "Explain how a full indexing run moves from the CLI into runtime orchestration, file discovery, symbol extraction, persistence, and search or snapshot refresh.", + Some(PacketTaskClassDto::ArchitectureExplanation), + PacketBudgetModeDto::Standard, + ); + let queries = plan + .queries + .iter() + .map(|query| query.query.as_str()) .collect::>(); + for expected in [ - "codex-rs/exec/src/event_processor_with_jsonl_output.rs", - "codex-rs/exec/src/cli.rs", - "codex-rs/exec/src/lib.rs", - "codex-rs/exec/src/exec_events.rs", + "index service", + "workspace execution plan", + "workspace indexer", + "symbol extraction indexer", + "search projection", + "snapshot refresh", + "indexing", + "runtime", + "IndexingRun", + "RuntimeOrchestration", + "architecture entrypoint", + "runtime flow", ] { assert!( - paths.contains(&expected), - "packet capping should spend scarce anchors on distinct flow files before same-file duplicate claims: {paths:?}" + queries.contains(&expected), + "expected generic probe {expected} in packet plan: {queries:?}" + ); + } + for fixture_anchor in [ + "run_index", + "IndexService", + "WorkspaceIndexer", + "flush_projection_batch", + "SnapshotStore", + ] { + assert!( + !queries.contains(&fixture_anchor), + "packet planner should not inject fixture-specific anchor {fixture_anchor}: {queries:?}" ); } - assert_eq!(answer.citations.len(), 4); } #[test] - fn packet_budget_protects_required_probe_citations_from_compact_cap() { - let _eval_probes = EvalProbesGuard::enabled(); - let question = "Explain how `codex exec --json` flows from the top-level CLI into the exec runtime, app-server thread and turn start requests, and JSONL event output."; - let mut citations = (0..16) - .map(|index| { - test_packet_citation( - &format!("HighRankDistractor{index}"), - &format!("src/high_rank_distractor_{index}.rs"), - 10.0, - ) - }) - .collect::>(); - citations.extend([ - test_packet_citation("run_exec_session", "codex-rs/exec/src/lib.rs", 0.1), - test_packet_citation("ExecSharedCliOptions", "codex-rs/exec/src/cli.rs", 0.1), - test_packet_citation("Subcommand::Exec", "codex-rs/cli/src/main.rs", 0.1), - test_packet_citation("run_main", "codex-rs/app-server/src/lib.rs", 6.0), - test_packet_citation("codex_exec::run_main", "codex-rs/exec/src/main.rs", 0.1), - test_packet_citation("codex_exec::Cli", "codex-rs/exec/src/main.rs", 0.1), - test_packet_citation( - "EventProcessorWithJsonOutput", - "codex-rs/exec/src/event_processor_with_jsonl_output.rs", - 0.1, - ), - test_packet_citation( - "crate::exec_events::AgentMessageItem (import)", - "codex-rs/exec/src/event_processor_with_jsonl_output.rs", - 5.0, - ), - test_packet_citation("exec_events", "codex-rs/exec/src/lib.rs", 6.0), - test_packet_citation( - "codex_protocol::models::WebSearchAction", - "codex-rs/exec/src/exec_events.rs", - 0.1, + fn architecture_packet_plan_uses_generic_flow_terms_without_eval_probes() { + let _env = EnvVarGuard::cleared(EVAL_PROBES_ENV); + let cases = [ + ( + "Explain how a client request flows through interceptors, request dispatch, and the transport adapter. Cite the source files that support the path.", + &[ + "request interceptor", + "request dispatch", + "transport adapter", + ][..], ), - test_packet_citation( - "ThreadStartParams", - "codex-rs/app-server-protocol/src/protocol/v2/thread.rs", - 0.1, + ( + "Explain how a server starts its event loop, reads client commands from the network, and dispatches them through command handlers. Cite the source files that support the path.", + &[ + "event loop", + "event dispatch", + "network input", + "command dispatch", + ][..], ), - test_packet_citation( - "TurnStartParams", - "codex-rs/app-server-protocol/src/protocol/v2/turn.rs", - 0.1, + ( + "Explain how a search command parses CLI flags, walks candidate files, and executes a search through matcher, searcher, and printer components. Cite the source files that support the path.", + &[ + "search entrypoint", + "flag parsing", + "argument planning", + "candidate file walk", + "search execution", + "result printer", + ][..], ), - ]); - let mut answer = packet_answer_fixture(question, citations); - - rank_packet_evidence(question, &mut answer); - let budget = apply_packet_budget( - packet_fixture_project_root(), - question, - PacketTaskClassDto::ArchitectureExplanation, - PacketBudgetModeDto::Compact, - packet_budget_limits(PacketBudgetModeDto::Compact), - &mut answer, - ); + ]; - let paths = answer - .citations - .iter() - .filter_map(|citation| citation.file_path.as_deref()) - .collect::>(); - for expected in [ - "codex-rs/exec/src/lib.rs", - "codex-rs/exec/src/cli.rs", - "codex-rs/cli/src/main.rs", - "codex-rs/exec/src/main.rs", - "codex-rs/exec/src/event_processor_with_jsonl_output.rs", - "codex-rs/exec/src/exec_events.rs", - "codex-rs/app-server-protocol/src/protocol/v2/thread.rs", - "codex-rs/app-server-protocol/src/protocol/v2/turn.rs", - ] { - assert!( - paths.contains(&expected), - "compact packet cap should protect required planned-probe citations before high-ranking distractors: {paths:?}" + for (question, expected_queries) in cases { + let plan = build_packet_plan( + question, + Some(PacketTaskClassDto::ArchitectureExplanation), + PacketBudgetModeDto::Compact, ); - } - assert!( - answer - .citations - .iter() - .any(|citation| citation.file_path.as_deref() - == Some("codex-rs/exec/src/exec_events.rs")), - "required probe protection should prefer the actual exec_events file over import-only or facade mentions: {:?}", - answer - .citations + let queries = plan + .queries .iter() - .map(|citation| (&citation.display_name, citation.file_path.as_deref())) - .collect::>() - ); - let exact_entrypoint_index = answer - .citations - .iter() - .position(|citation| { - citation.display_name == "codex_exec::run_main" - && citation.file_path.as_deref() == Some("codex-rs/exec/src/main.rs") - }) - .unwrap_or_else(|| { - panic!( - "command-span exact probes should protect the qualified command entrypoint: {:?}", - answer - .citations + .map(|query| query.query.as_str()) + .collect::>(); + for expected in expected_queries { + assert!( + queries .iter() - .map(|citation| (&citation.display_name, citation.file_path.as_deref())) - .collect::>() - ) - }); - if let Some(broad_entrypoint_index) = answer.citations.iter().position(|citation| { - citation.display_name == "run_main" - && citation.file_path.as_deref() == Some("codex-rs/app-server/src/lib.rs") - }) { - assert!( - exact_entrypoint_index < broad_entrypoint_index, - "qualified command evidence should be protected ahead of weaker broad command evidence: {:?}", - answer - .citations - .iter() - .map(|citation| (&citation.display_name, citation.file_path.as_deref())) - .collect::>() - ); + .any(|query| query.eq_ignore_ascii_case(expected)), + "expected {expected} in architecture packet plan: {queries:?}" + ); + } + for forbidden in [ + "createInstance", + "InterceptorManager", + "dispatchRequest", + "adapters.js", + "server.c main", + "aeMain", + "readQueryFromClient", + "processCommand", + "server.c call", + "core/main.rs", + "HiArgs", + "SearchWorker::search", + "haystack.rs", + "walk builder", + "matcher searcher printer", + "search worker", + ] { + assert!( + !queries + .iter() + .any(|query| query.eq_ignore_ascii_case(forbidden)), + "non-eval packet plan should not inject holdout anchor {forbidden}: {queries:?}" + ); + } } - assert!( - answer - .citations - .iter() - .any(|citation| citation.display_name == "codex_exec::Cli" - && citation.file_path.as_deref() == Some("codex-rs/exec/src/main.rs")), - "command-span exact probes should protect the qualified command CLI type: {:?}", - answer - .citations - .iter() - .map(|citation| (&citation.display_name, citation.file_path.as_deref())) - .collect::>() - ); - assert!(budget.truncated); - assert_eq!(answer.citations.len(), 13); } #[test] - fn packet_budget_protects_indexing_flow_action_probe_citations() { - let question = "Explain how a full indexing run moves from the CLI into runtime orchestration, file discovery, symbol extraction, persistence, and search or snapshot refresh."; - let mut citations = (0..20) - .map(|index| { - test_packet_citation( - &format!("HighRankDistractor{index}"), - &format!("src/high_rank_distractor_{index}.rs"), - 10.0, - ) - }) - .collect::>(); - citations.extend([ - test_packet_citation( - "test_incremental_indexing_second_run_reuses_unchanged_extraction_cache_and_resolution_support", - "crates/codestory-indexer/tests/integration.rs", - 9.5, - ), - test_packet_citation( - "EmbeddingRuntime::test_runtime", - "crates/codestory-runtime/src/search/engine.rs", - 9.4, - ), - test_packet_citation( - "tests::drill_question_search_is_partial_discovery_evidence", - "crates/codestory-cli/src/main.rs", - 9.3, - ), - test_packet_citation( - "Runtime::index_service", - "crates/codestory-runtime/src/lib.rs", - 9.0, - ), - test_packet_citation( - "WorkspaceIndexer", - "crates/codestory-indexer/src/lib.rs", - 9.0, - ), - test_packet_citation( - "Storage::upsert_search_symbol_projection_batch", - "crates/codestory-store/src/storage_impl/mod.rs", - 9.0, - ), - test_packet_citation( - "SnapshotRefreshStats", - "crates/codestory-store/src/snapshot_store.rs", - 9.0, - ), - test_packet_citation( - "IndexService::run_indexing_blocking", - "crates/codestory-runtime/src/services.rs", - 0.1, - ), - test_packet_citation( - "WorkspaceManifest::build_execution_plan", - "crates/codestory-workspace/src/lib.rs", - 0.1, - ), - test_packet_citation( - "WorkspaceIndexer::run", - "crates/codestory-indexer/src/lib.rs", - 0.1, - ), - test_packet_citation("index_file", "crates/codestory-indexer/src/lib.rs", 0.1), - test_packet_citation( - "Storage::flush_projection_batch", - "crates/codestory-store/src/storage_impl/mod.rs", - 0.1, + fn architecture_packet_plan_can_use_eval_manifest_probes_when_enabled() { + let _eval_probes = EvalProbesGuard::enabled(); + let cases = [ + ( + "Explain how the default axios instance is created and how an HTTP request flows through interceptors, dispatchRequest, and the transport adapter. Cite the source files that support the path.", + &[ + "createInstance", + "InterceptorManager", + "dispatchRequest", + "adapters.js", + ][..], ), - test_packet_citation( - "Storage::rebuild_search_symbol_projection_from_node_table", - "crates/codestory-store/src/storage_impl/mod.rs", - 0.1, + ( + "Explain how the Redis server starts its event loop, reads client commands from the network, and dispatches them through processCommand and call. Cite the source files that support the path.", + &[ + "server.c main", + "aeMain", + "readQueryFromClient", + "processCommand", + "server.c call", + ][..], ), - test_packet_citation( - "SnapshotStore::refresh_all_with_stats", - "crates/codestory-store/src/snapshot_store.rs", - 0.1, + ( + "Explain how ripgrep parses CLI flags, walks candidate files, and executes a search over each haystack through matcher, searcher, and printer components. Cite the source files that support the path.", + &[ + "core/main.rs", + "HiArgs", + "SearchWorker::search", + "haystack.rs", + ][..], ), - ]); - let mut answer = packet_answer_fixture(question, citations); + ]; - rank_packet_evidence(question, &mut answer); - let budget = apply_packet_budget( - packet_fixture_project_root(), + for (question, expected_queries) in cases { + let plan = build_packet_plan( + question, + Some(PacketTaskClassDto::ArchitectureExplanation), + PacketBudgetModeDto::Compact, + ); + let queries = plan + .queries + .iter() + .map(|query| query.query.as_str()) + .collect::>(); + for expected in expected_queries { + assert!( + queries + .iter() + .any(|query| query.eq_ignore_ascii_case(expected)), + "expected eval probe {expected} in architecture packet plan: {queries:?}" + ); + } + } + } + + #[test] + fn packet_plan_uses_explicit_request_probes_with_required_sufficiency() { + let question = "Explain how request dispatch reaches validation and callbacks."; + let extra_probes = vec![ + "Source/Core/RequestSession.swift Session.request".to_string(), + "Source/Core/DataRequest.swift DataRequest.validate".to_string(), + ]; + let plan = build_packet_plan_with_extra( question, - PacketTaskClassDto::ArchitectureExplanation, + Some(PacketTaskClassDto::RouteTracing), PacketBudgetModeDto::Compact, - packet_budget_limits(PacketBudgetModeDto::Compact), - &mut answer, + &extra_probes, ); - - let display_names = answer - .citations + let queries = plan + .queries .iter() - .map(|citation| citation.display_name.as_str()) + .map(|query| (query.query.as_str(), query.purpose.as_str())) .collect::>(); - for expected in [ - "Runtime::index_service", - "IndexService::run_indexing_blocking", - "WorkspaceManifest::build_execution_plan", - "WorkspaceIndexer::run", - "index_file", - "Storage::flush_projection_batch", - "Storage::rebuild_search_symbol_projection_from_node_table", - "SnapshotStore::refresh_all_with_stats", - ] { + + for expected in &extra_probes { assert!( - display_names.contains(&expected), - "compact packet cap should protect indexing-flow action probe {expected}: {display_names:?}" + queries.iter().any(|(query, purpose)| { + query.eq_ignore_ascii_case(expected) + && purpose.contains("explicit symbol probe") + }), + "expected explicit probe {expected} in packet plan: {queries:?}" ); } - for low_value in [ - "test_incremental_indexing_second_run_reuses_unchanged_extraction_cache_and_resolution_support", - "EmbeddingRuntime::test_runtime", - "tests::drill_question_search_is_partial_discovery_evidence", - ] { + assert!( + plan.trace + .iter() + .any(|entry| entry == "explicit_extra_probes=2 source=request"), + "packet plan should trace explicit request-probe provenance: {:?}", + plan.trace + ); + + let required = packet_sufficiency_required_probe_queries_with_extra( + question, + PacketTaskClassDto::RouteTracing, + &extra_probes, + ); + for expected in &extra_probes { assert!( - !display_names.contains(&low_value), - "compact packet should not keep low-value test evidence when production alternatives exist: {display_names:?}" + required + .iter() + .any(|query| query.eq_ignore_ascii_case(expected)), + "expected explicit probe {expected} in sufficiency requirements: {required:?}" ); } - assert!(budget.truncated); - assert_eq!(answer.citations.len(), 13); } - #[test] - fn packet_budget_replaces_overrepresented_family_with_late_flow_role() { - let question = "Explain how project/source-group configuration becomes indexing work and storage access."; - let mut citations = (0..10) - .map(|index| { - test_packet_citation( - &format!("SourceGroupFamily{index}"), - &format!("src/lib/project/SourceGroupFamily{index}.cpp"), - 10.0 - (index as f32 * 0.1), - ) - }) - .collect::>(); - citations.extend([ - test_packet_citation( - "TaskFillIndexerCommandsQueue", - "src/lib/data/indexer/TaskFillIndexerCommandsQueue.h", - 0.2, - ), - test_packet_citation("StorageAccess", "src/lib/data/storage/StorageAccess.h", 0.1), - ]); - let mut answer = packet_answer_fixture(question, citations); - let mut limits = packet_budget_limits(PacketBudgetModeDto::Compact); - limits.max_anchors = 6; - limits.max_files = 6; - - rank_packet_evidence(question, &mut answer); - let budget = apply_packet_budget( - packet_fixture_project_root(), + fn command_dispatch_flow_does_not_require_request_dispatch_probes() { + let _env = EnvVarGuard::cleared(EVAL_PROBES_ENV); + let question = "Explain how a server starts its event loop, reads client commands from the network, and dispatches them through command handlers."; + let plan = build_packet_plan( question, - PacketTaskClassDto::ArchitectureExplanation, + Some(PacketTaskClassDto::ArchitectureExplanation), PacketBudgetModeDto::Compact, - limits, - &mut answer, ); - - assert!(budget.truncated, "fixture should exercise citation capping"); - let paths = answer - .citations + let queries = plan + .queries .iter() - .filter_map(|citation| citation.file_path.as_deref()) + .map(|query| query.query.as_str()) .collect::>(); - assert!( - paths.contains(&"src/lib/data/indexer/TaskFillIndexerCommandsQueue.h"), - "late indexing evidence should replace overrepresented source-group evidence: {paths:?}" - ); - assert!( - paths.contains(&"src/lib/data/storage/StorageAccess.h"), - "late storage evidence should replace overrepresented source-group evidence: {paths:?}" + + for expected in ["event loop", "network input", "command dispatch"] { + assert!( + queries.contains(&expected), + "expected {expected} in command/event flow packet plan: {queries:?}" + ); + } + for request_probe in [ + "request interceptor", + "request dispatch", + "transport adapter", + "interceptor manager", + "dispatch request", + ] { + assert!( + !queries.contains(&request_probe), + "command dispatch should not inject request probe {request_probe}: {queries:?}" + ); + } + + let required = packet_sufficiency_required_probe_queries( + question, + PacketTaskClassDto::ArchitectureExplanation, ); + for request_probe in [ + "request interceptor", + "request dispatch", + "transport adapter", + ] { + assert!( + !required.iter().any(|query| query == request_probe), + "sufficiency should not require request probe {request_probe}: {required:?}" + ); + } } #[test] - fn packet_budget_replaces_weaker_same_role_with_late_definition_files() { - let question = - "Explain how source-group configuration becomes indexing work and storage access."; - let mut set_subject = test_packet_citation( - "StorageAccessProxy::setSubject", - "src/lib/data/storage/StorageAccessProxy.h", - 9.0, - ); - set_subject.kind = NodeKind::METHOD; - let mut storage_contract = - test_packet_citation("StorageAccess", "src/lib/data/storage/StorageAccess.h", 0.1); - storage_contract.kind = NodeKind::CLASS; - let mut persistent_storage = test_packet_citation( - "PersistentStorage", - "src/lib/data/storage/PersistentStorage.h", - 0.1, - ); - persistent_storage.kind = NodeKind::CLASS; - let mut task_ctor = test_packet_citation( - "TaskFillIndexerCommandsQueue::TaskFillIndexerCommandsQueue", - "src/lib/data/indexer/TaskFillIndexerCommandQueue.cpp", - 8.0, - ); - task_ctor.kind = NodeKind::FUNCTION; - let mut task_type = test_packet_citation( - "TaskFillIndexerCommandsQueue", - "src/lib/data/indexer/TaskFillIndexerCommandQueue.h", - 0.1, - ); - task_type.kind = NodeKind::CLASS; - let mut answer = packet_answer_fixture( - question, - vec![ - test_packet_citation("SourceGroupFamily0", "src/lib/project/SourceGroup0.h", 10.0), - test_packet_citation("SourceGroupFamily1", "src/lib/project/SourceGroup1.h", 9.5), - set_subject, - test_packet_citation("StorageTest", "src/test/StorageTestSuite.cpp", 9.0), - task_ctor, - storage_contract, - persistent_storage, - task_type, - ], + fn compact_packet_plan_promotes_indexing_flow_stage_queries() { + let plan = build_packet_plan( + "Explain how a full indexing run moves from the CLI into runtime orchestration, file discovery, symbol extraction, persistence, and search or snapshot refresh.", + Some(PacketTaskClassDto::ArchitectureExplanation), + PacketBudgetModeDto::Compact, ); - let mut limits = packet_budget_limits(PacketBudgetModeDto::Compact); - limits.max_anchors = 6; - limits.max_files = 6; - - assert!(cap_citations(&mut answer, &limits)); - - let paths = answer - .citations + let queries = plan + .queries .iter() - .filter_map(|citation| citation.file_path.as_deref()) + .map(|query| query.query.as_str()) .collect::>(); + for expected in [ - "src/lib/data/storage/StorageAccess.h", - "src/lib/data/storage/PersistentStorage.h", - "src/lib/data/indexer/TaskFillIndexerCommandQueue.h", + "index service", + "workspace execution plan", + "workspace indexer", + "search projection", + "snapshot refresh", ] { assert!( - paths.contains(&expected), - "late definition-file anchors should replace weaker same-role/test citations: {paths:?}" + queries.contains(&expected), + "expected indexing-flow stage probe {expected} in compact packet plan: {queries:?}" ); } - assert!( - !paths.contains(&"src/test/StorageTestSuite.cpp"), - "test evidence should not consume compact citation slots for production-flow prompts: {paths:?}" - ); - assert!(answer.citations.len() <= limits.max_anchors as usize); - } - #[test] - fn packet_budget_protects_storage_required_probe_citations() { - let question = "Explain how project/source-group configuration becomes indexing work, then how indexed data is accessed by the application."; - let mut proxy_header = test_packet_citation( - "StorageAccessProxy", - "src/lib/data/storage/StorageAccessProxy.h", - 10.0, - ); - proxy_header.kind = NodeKind::CLASS; - let mut storage_contract = - test_packet_citation("StorageAccess", "src/lib/data/storage/StorageAccess.h", 0.1); - storage_contract.kind = NodeKind::CLASS; - let mut proxy_method = test_packet_citation( - "StorageAccessProxy::setSubject", - "src/lib/data/storage/StorageAccessProxy.cpp", - 0.1, - ); - proxy_method.kind = NodeKind::METHOD; - let mut persistent_type = test_packet_citation( - "PersistentStorage", - "src/lib/data/storage/PersistentStorage.h", - 0.1, - ); - persistent_type.kind = NodeKind::CLASS; - let mut persistent_ctor = test_packet_citation( - "PersistentStorage::PersistentStorage", - "src/lib/data/storage/PersistentStorage.cpp", - 0.1, - ); - persistent_ctor.kind = NodeKind::FUNCTION; - let mut answer = packet_answer_fixture( - question, - vec![ - test_packet_citation("SourceGroup", "src/lib/project/SourceGroup.h", 9.5), - proxy_header, - test_packet_citation("StorageRegression", "src/test/StorageRegression.cpp", 9.0), - storage_contract, - proxy_method, - persistent_type, - persistent_ctor, - ], + let stage_index = queries + .iter() + .position(|query| *query == "index service") + .expect("index service stage probe should be present"); + for low_signal in ["full", "moves", "run_moves", "RunMoves"] { + assert!( + !queries.contains(&low_signal), + "packet planner should suppress isolated low-signal term {low_signal}: {queries:?}" + ); + } + let broad_probe = "runtime"; + let probe_index = queries + .iter() + .position(|query| *query == broad_probe) + .expect("broad probe should still be present"); + assert!( + stage_index < probe_index, + "indexing-flow stage probes should precede broad probe {broad_probe}: {queries:?}" ); - let mut limits = packet_budget_limits(PacketBudgetModeDto::Compact); - limits.max_anchors = 5; - limits.max_files = 5; + } - rank_packet_evidence(question, &mut answer); - let budget = apply_packet_budget( - packet_fixture_project_root(), - question, - PacketTaskClassDto::ArchitectureExplanation, + #[test] + fn compact_packet_plan_protects_generic_indexing_flow_probes() { + let plan = build_packet_plan( + "Explain how a full indexing run moves from the CLI into runtime orchestration, file discovery, symbol extraction, persistence, and search or snapshot refresh.", + Some(PacketTaskClassDto::ArchitectureExplanation), PacketBudgetModeDto::Compact, - limits, - &mut answer, ); - - assert!(budget.truncated, "fixture should exercise compact capping"); - let paths = answer - .citations + let queries = plan + .queries .iter() - .filter_map(|citation| citation.file_path.as_deref()) + .map(|query| query.query.as_str()) .collect::>(); + for expected in [ - "src/lib/data/storage/StorageAccess.h", - "src/lib/data/storage/PersistentStorage.h", - "src/lib/data/storage/PersistentStorage.cpp", + "indexing entrypoint", + "file discovery", + "symbol extraction", + "storage persistence", + "search projection", + "snapshot refresh", ] { assert!( - paths.contains(&expected), - "storage-flow required probes should protect exact contract and implementation paths: {paths:?}" + queries.contains(&expected), + "expected generic indexing-flow probe {expected} in compact packet plan: {queries:?}" + ); + } + for fixture_anchor in [ + "Runtime::index_service", + "IndexService::run_indexing_blocking", + "index service run indexing", + "WorkspaceManifest::build_execution_plan", + "workspace manifest build execution plan", + "WorkspaceIndexer::run", + "workspace indexer run", + "index_file", + "Storage::flush_projection_batch", + "storage flush projection batch", + "Storage::rebuild_search_symbol_projection_from_node_table", + "storage rebuild search symbol projection", + "SnapshotStore::refresh_all_with_stats", + "snapshot refresh all stats", + ] { + assert!( + !queries.contains(&fixture_anchor), + "packet planner should protect generic indexing probes without injecting fixture-specific anchor {fixture_anchor}: {queries:?}" ); } - assert!( - !paths.contains(&"src/test/StorageRegression.cpp"), - "test filler should not displace protected production storage paths: {paths:?}" - ); } #[test] - fn packet_budget_protects_indexing_required_probe_citations() { - let question = "Explain how project/source-group configuration becomes indexing work, then how command providers create C++ and Java work items."; - let mut citations = (0..10) - .map(|index| { - test_packet_citation( - &format!("SourceGroupNoise{index}"), - &format!("src/lib/project/SourceGroupNoise{index}.h"), - 10.0 - (index as f32 * 0.1), - ) - }) - .collect::>(); - let mut build_index = - test_packet_citation("Project::buildIndex", "src/lib/project/Project.cpp", 0.1); - build_index.kind = NodeKind::FUNCTION; - let mut source_group_type = test_packet_citation( - "SourceGroupCxxCdb", - "src/lib_cxx/project/SourceGroupCxxCdb.h", - 0.1, - ); - source_group_type.kind = NodeKind::CLASS; - let mut source_group_provider = test_packet_citation( - "SourceGroupCxxCdb::getIndexerCommandProvider", - "src/lib_cxx/project/SourceGroupCxxCdb.cpp", - 0.1, - ); - source_group_provider.kind = NodeKind::METHOD; - let mut task_queue = test_packet_citation( - "TaskFillIndexerCommandsQueue", - "src/lib/data/indexer/TaskFillIndexerCommandQueue.h", - 0.1, - ); - task_queue.kind = NodeKind::CLASS; - let mut cxx_command = test_packet_citation( - "IndexerCommandCxx", - "src/lib_cxx/data/indexer/IndexerCommandCxx.h", - 0.1, - ); - cxx_command.kind = NodeKind::CLASS; - let mut java_indexer = test_packet_citation( - "IndexerJava::doIndex", - "src/lib_java/data/indexer/IndexerJava.cpp", - 0.1, + fn compact_packet_initial_retrieval_keeps_semantic_hybrid_and_anchor_prompt() { + let plan = build_packet_plan( + "Explain how VS Code workbench startup reaches ExtensionService, ExtensionHostManager, AbstractExtHostExtensionService, and ExtHostCommands.executeCommand.", + Some(PacketTaskClassDto::ArchitectureExplanation), + PacketBudgetModeDto::Compact, ); - java_indexer.kind = NodeKind::METHOD; - citations.extend([ - build_index, - source_group_type, - source_group_provider, - task_queue, - cxx_command, - java_indexer, - test_packet_citation("IndexerRegression", "src/test/IndexerRegression.cpp", 9.5), - ]); - let mut answer = packet_answer_fixture(question, citations); - let mut limits = packet_budget_limits(PacketBudgetModeDto::Compact); - limits.max_anchors = 8; - limits.max_files = 8; - rank_packet_evidence(question, &mut answer); - let budget = apply_packet_budget( - packet_fixture_project_root(), - question, - PacketTaskClassDto::ArchitectureExplanation, + assert!( + packet_initial_hybrid_weights(&plan, PacketBudgetModeDto::Compact).is_none(), + "compact packets should not collapse initial retrieval to lexical-only" + ); + let prompt = packet_retrieval_prompt( + "Explain startup.", + &plan, + None, PacketBudgetModeDto::Compact, - limits, - &mut answer, ); + assert!(prompt.starts_with("Explain startup.")); + assert!(prompt.contains("Planned CodeStory queries:")); + assert!(prompt.contains("ExtensionService")); + assert!(prompt.contains("ExtHostCommands")); + assert!(prompt.to_ascii_lowercase().contains("workbench")); + } - assert!(budget.truncated, "fixture should exercise compact capping"); - let paths = answer - .citations + #[test] + fn packet_plan_suppresses_low_signal_broad_prompt_terms() { + let plan = build_packet_plan( + "Study current architecture boundaries across contracts workspace indexer store runtime cli bench retrieval packet flow ranking precision latency risks.", + Some(PacketTaskClassDto::ArchitectureExplanation), + PacketBudgetModeDto::Standard, + ); + let queries = plan + .queries .iter() - .filter_map(|citation| citation.file_path.as_deref()) + .map(|query| query.query.as_str()) .collect::>(); - for expected in [ - "src/lib/project/Project.cpp", - "src/lib_cxx/project/SourceGroupCxxCdb.h", - "src/lib_cxx/data/indexer/IndexerCommandCxx.h", + + for low_signal in [ + "current", + "Current", + "current_architecture", + "CurrentArchitecture", + "latency_risks", + "LatencyRisks", + "risks", + "Risks", ] { assert!( - paths.contains(&expected), - "indexing required probes should protect exact source-group and work-queue paths: {paths:?}" + !queries.contains(&low_signal), + "packet planner should suppress low-signal broad prompt term {low_signal}: {queries:?}" + ); + } + for retained in [ + "contracts", + "workspace", + "indexer", + "store", + "bench_retrieval", + ] { + assert!( + queries.contains(&retained), + "packet planner should retain concrete repo/retrieval term {retained}: {queries:?}" ); } - assert!( - !paths.contains(&"src/test/IndexerRegression.cpp"), - "test filler should not displace protected production indexing paths: {paths:?}" - ); } #[test] - fn packet_budget_fills_spare_capacity_with_deferred_production_before_tests() { - let mut proxy_header = test_packet_citation( - "StorageAccessProxy", - "src/lib/data/storage/StorageAccessProxy.h", - 0.9, + fn packet_plan_keeps_broad_risk_study_as_architecture() { + let plan = build_packet_plan( + "Study current architecture boundaries, packet flow, ranking precision, and latency risks.", + None, + PacketBudgetModeDto::Standard, ); - proxy_header.kind = NodeKind::CLASS; - let mut proxy_impl = test_packet_citation( - "StorageAccessProxy", - "src/lib/data/storage/StorageAccessProxy.cpp", - 0.1, + let queries = plan + .queries + .iter() + .map(|query| query.query.as_str()) + .collect::>(); + + assert_eq!(plan.task_class, PacketTaskClassDto::ArchitectureExplanation); + assert!( + queries.contains(&"architecture entrypoint"), + "architecture packets should keep architecture seeds: {queries:?}" ); - proxy_impl.kind = NodeKind::CLASS; - let mut answer = packet_answer_fixture( - "Explain runtime routing, indexing, and storage access.", - vec![ - test_packet_citation("RuntimeCoordinator", "src/runtime/coordinator.rs", 0.8), - proxy_header, - test_packet_citation( - "TaskBuildIndex", - "src/lib/data/indexer/TaskBuildIndex.h", - 0.8, - ), - test_packet_citation("StorageRegression", "src/test/StorageRegression.cpp", 10.0), - proxy_impl, - ], + assert!( + !queries.contains(&"affected symbols"), + "generic risk wording should not force change-impact seeds: {queries:?}" ); - let mut limits = packet_budget_limits(PacketBudgetModeDto::Compact); - limits.max_anchors = 4; - limits.max_files = 4; + } - assert!(cap_citations(&mut answer, &limits)); + #[test] + fn packet_plan_routes_specific_risk_of_change_prompts_to_change_impact() { + for question in [ + "What risk if I change reference resolution behavior?", + "What is the risk of changing reference resolution behavior?", + ] { + let plan = build_packet_plan(question, None, PacketBudgetModeDto::Standard); + let queries = plan + .queries + .iter() + .map(|query| query.query.as_str()) + .collect::>(); + + assert_eq!( + plan.task_class, + PacketTaskClassDto::ChangeImpact, + "{question}" + ); + assert!( + queries.contains(&"affected symbols"), + "specific risk-of-change prompts should keep change-impact seeds: {queries:?}" + ); + } + } - let paths = answer - .citations + #[test] + fn packet_plan_preserves_quoted_low_signal_symbol_queries() { + let plan = build_packet_plan("Find `current`.", None, PacketBudgetModeDto::Standard); + let queries = plan + .queries .iter() - .filter_map(|citation| citation.file_path.as_deref()) + .map(|query| query.query.as_str()) .collect::>(); + assert!( - paths.contains(&"src/lib/data/storage/StorageAccessProxy.cpp"), - "deferred production evidence should fill spare capacity before deferred tests: {paths:?}" - ); - assert!( - !paths.contains(&"src/test/StorageRegression.cpp"), - "deferred tests should only fill after production evidence is exhausted: {paths:?}" + queries.contains(&"current"), + "quoted symbol queries should not be filtered as low-signal broad terms: {queries:?}" ); } #[test] - fn packet_budget_defers_test_evidence_when_compact_cap_is_full() { - let mut answer = packet_answer_fixture( - "Explain runtime routing and persistence.", - vec![ - test_packet_citation("RuntimeCoordinator", "src/runtime/coordinator.rs", 0.8), - test_packet_citation("RouteHandler", "src/router/handler.rs", 0.8), - test_packet_citation("ProjectionStore", "src/store/projection.rs", 0.8), - test_packet_citation("RegressionCase", "tests/runtime_regression.rs", 10.0), - ], + fn symbol_ownership_packet_plan_seeds_generic_ownership_terms() { + let question = "Explain which modules own application creation, app-level rendering, response serialization, file sending, and view lookup."; + let plan = build_packet_plan( + question, + Some(PacketTaskClassDto::SymbolOwnership), + PacketBudgetModeDto::Standard, ); - let mut limits = packet_budget_limits(PacketBudgetModeDto::Compact); - limits.max_anchors = 3; - limits.max_files = 3; - - assert!(cap_citations(&mut answer, &limits)); - - let paths = answer - .citations + let queries = plan + .queries .iter() - .filter_map(|citation| citation.file_path.as_deref()) + .map(|query| query.query.as_str()) .collect::>(); - assert!( - !paths.contains(&"tests/runtime_regression.rs"), - "test evidence should fill only spare citation capacity: {paths:?}" - ); + + for expected in [ + "references", + "callers", + "definition references", + "application", + "view", + "lookup", + "application_creation", + "ApplicationCreation", + ] { + assert!( + queries.contains(&expected), + "expected {expected} in generic ownership packet plan: {queries:?}" + ); + } + for fixture_anchor in ["createApplication", "lib/express.js", "lib/response.js"] { + assert!( + !queries.contains(&fixture_anchor), + "ownership planning should not inject fixture-specific anchor {fixture_anchor}: {queries:?}" + ); + } } #[test] - fn packet_ranking_demotes_low_signal_current_symbols() { - let question = "Study current architecture and runtime boundaries."; - let mut answer = packet_answer_fixture( + fn bug_packet_plan_seeds_generic_failure_terms_and_prompt_identifiers() { + let question = + "Localize an app.param callback decode bug through router parameter handling."; + let plan = build_packet_plan( question, - vec![ - test_packet_citation("current", "src/runtime/current.rs", 5.0), - test_packet_citation("RuntimeCoordinator", "src/runtime/coordinator.rs", 0.2), - ], + Some(PacketTaskClassDto::BugLocalization), + PacketBudgetModeDto::Standard, ); + let queries = plan + .queries + .iter() + .map(|query| query.query.as_str()) + .collect::>(); - rank_packet_evidence(question, &mut answer); - - assert_eq!(answer.citations[0].display_name, "RuntimeCoordinator"); + for expected in [ + "app.param", + "param", + "callback", + "error", + "validate", + "error path", + "failure handling", + ] { + assert!( + queries.contains(&expected), + "expected {expected} in generic bug packet plan: {queries:?}" + ); + } + for fixture_anchor in ["proto.param", "Layer.prototype.match", "test/app.param.js"] { + assert!( + !queries.contains(&fixture_anchor), + "bug planning should not inject fixture-specific anchor {fixture_anchor}: {queries:?}" + ); + } } #[test] - fn packet_ranking_keeps_explicit_low_signal_symbol_queries() { - let question = "Find `current`."; - let mut answer = packet_answer_fixture( + fn route_tracing_packet_plan_seeds_generic_route_terms() { + let question = "Trace how an application registers middleware and routes, then dispatches an incoming request through router layers to a route handler."; + let plan = build_packet_plan( question, - vec![ - test_packet_citation("current", "src/runtime/current.rs", 0.2), - test_packet_citation("RuntimeCoordinator", "src/runtime/coordinator.rs", 0.2), - ], + Some(PacketTaskClassDto::RouteTracing), + PacketBudgetModeDto::Standard, ); + let queries = plan + .queries + .iter() + .map(|query| query.query.as_str()) + .collect::>(); - rank_packet_evidence(question, &mut answer); - - assert_eq!(answer.citations[0].display_name, "current"); + for expected in [ + "router", + "handler", + "route", + "middleware", + "dispatch", + "route handler endpoint", + ] { + assert!( + queries.contains(&expected), + "expected {expected} in route tracing packet plan: {queries:?}" + ); + } + for fixture_anchor in [ + "createApplication", + "lib/router/layer.js", + "Router.StrictSlash", + ] { + assert!( + !queries.contains(&fixture_anchor), + "route tracing should not inject fixture-specific anchor {fixture_anchor}: {queries:?}" + ); + } } #[test] - fn investigation_mode_is_explicit_preset_only() { - let mut profile = latency_profile(); - profile.policy_mode = AgentRetrievalPolicyModeDto::CompletenessFirst; - assert!(!should_investigate(&profile)); + fn route_tracing_packet_plan_seeds_express_app_route_probes_when_prompt_names_express() { + let _eval_probes = EvalProbesGuard::enabled(); + let question = "Trace how Express creates an application, registers middleware/routes, and handles an incoming request through the router and response helpers."; + let plan = build_packet_plan( + question, + Some(PacketTaskClassDto::RouteTracing), + PacketBudgetModeDto::Compact, + ); + let queries = plan + .queries + .iter() + .map(|query| query.query.as_str()) + .collect::>(); - profile.preset = codestory_contracts::api::AgentRetrievalPresetDto::Investigate; - assert!(should_investigate(&profile)); + for expected in [ + "createApplication", + "app.init", + "app.handle", + "app.use", + "app.route", + "res.send", + "application.js app.use", + "response send body", + ] { + assert!( + queries.contains(&expected), + "expected {expected} in Express route tracing packet plan: {queries:?}" + ); + } } #[test] - fn weak_initial_hits_use_normalized_search_scores() { - assert!(!weak_initial_hits( - "strong", - &[ - test_search_hit("strong", 0.31), - test_search_hit("second", 0.20), - test_search_hit("third", 0.10), - ] - )); - assert!(weak_initial_hits( - "weak", - &[ - test_search_hit("weak", 0.29), - test_search_hit("second", 0.20), - test_search_hit("third", 0.10), - ] - )); - assert!(weak_initial_hits( - "too_few", - &[test_search_hit("too_few", 0.29)] - )); - } + fn packet_supported_claims_use_generic_evidence_roles() { + let limits = packet_budget_limits(PacketBudgetModeDto::Compact); + let mut answer = AgentAnswerDto { + answer_id: "generic-fixture".to_string(), + prompt: "Explain the packet evidence roles.".to_string(), + summary: "Generic evidence roles are covered.".to_string(), + freshness: None, + sections: Vec::new(), + citations: vec![ + test_packet_citation("CliCommand", "crates/tool-cli/src/main.rs", 0.8), + test_packet_citation("RuntimeCoordinator", "crates/core/src/runtime.rs", 0.8), + test_packet_citation("WorkspacePlan", "crates/core/src/workspace/plan.rs", 0.8), + test_packet_citation("GraphIndexer", "crates/indexer/src/lib.rs", 0.8), + test_packet_citation("ProjectionStore", "crates/store/src/projection.rs", 0.8), + test_packet_citation("SnapshotRefresh", "crates/store/src/snapshot.rs", 0.8), + test_packet_citation("RouteHandler", "src/routes/user.rs", 0.8), + test_packet_citation("PacketRegression", "tests/packet_flow.rs", 0.8), + ], + subgraph_ids: Vec::new(), + retrieval_version: "test".to_string(), + graphs: Vec::new(), + retrieval_trace: codestory_contracts::api::AgentRetrievalTraceDto { + request_id: "generic-fixture".to_string(), + resolved_profile: AgentRetrievalPresetDto::Architecture, + policy_mode: AgentRetrievalPolicyModeDto::LatencyFirst, + total_latency_ms: 1, + sla_target_ms: None, + sla_missed: false, + semantic_fallback_count: 0, + semantic_fallbacks: Vec::new(), + annotations: Vec::new(), + steps: Vec::new(), + packet_sidecar_diagnostics: Vec::new(), + retrieval_shadow: None, + }, + }; - #[test] - fn weak_initial_hits_treat_semantic_only_matches_as_low_confidence() { - assert!(weak_initial_hits( - "unrelated billing conveyor", - &[ - test_semantic_only_hit("semantic_one", 0.90), - test_semantic_only_hit("semantic_two", 0.80), - test_semantic_only_hit("semantic_three", 0.70), - ] - )); - } + append_packet_evidence_sections( + &mut answer, + PacketTaskClassDto::ArchitectureExplanation, + &limits, + ); + let text = answer + .sections + .iter() + .flat_map(|section| §ion.blocks) + .filter_map(|block| match block { + AgentResponseBlockDto::Markdown { markdown } => Some(markdown.as_str()), + AgentResponseBlockDto::Mermaid { .. } => None, + }) + .collect::>() + .join("\n"); - #[test] - fn weak_initial_hits_accept_prompt_anchored_symbol_names() { - assert!(!weak_initial_hits( - "Where is exact_symbol_anchor used?", - &[test_semantic_only_hit("exact_symbol_anchor", 0.90)] - )); + for expected_claim in [ + "The command or public entrypoint for this flow is anchored by `CliCommand`", + "Runtime orchestration is anchored by `RuntimeCoordinator`", + "Workspace discovery or planning is anchored by `WorkspacePlan`", + "Symbol extraction is anchored by `GraphIndexer`", + "Persistence or search projection is anchored by `ProjectionStore`", + "Snapshot refresh is anchored by `SnapshotRefresh`", + "Route handling is anchored by `RouteHandler`", + ] { + assert!( + text.contains(expected_claim), + "generic packet claims should include {expected_claim}: {text}" + ); + } + assert!( + !text.contains("Regression coverage for this flow is anchored by `PacketRegression`"), + "test-path regression claims should not crowd out primary flow claims: {text}" + ); } #[test] - fn investigation_focus_anchor_prefers_prompt_named_symbol() { - let hit = test_semantic_only_hit("exact_symbol_anchor", 0.05); - assert_eq!( - investigation_focus_anchor("Explain exact_symbol_anchor", &[hit]) - .expect("prompt-named hit should become focus") - .0, - "exact_symbol_anchor" + fn packet_supported_claims_generic_source_claims_are_domain_neutral_without_eval_probes() { + let _env = EnvVarGuard::cleared(EVAL_PROBES_ENV); + let answer = packet_answer_fixture( + "Explain service ownership flow for ComputeFlow and PersistFlow.", + vec![ + test_packet_citation("ComputeFlow", "src/domain/flow.rs", 0.8), + test_packet_citation("PersistFlow", "src/domain/persist.rs", 0.8), + ], ); + + let claims = packet_supported_claims(&answer); + let text = claims + .iter() + .map(|claim| claim.claim.as_str()) + .collect::>() + .join("\n"); + let lower = text.to_ascii_lowercase(); + assert!( - investigation_focus_anchor( - "Explain unrelated behavior", - &[test_semantic_only_hit("exact_symbol_anchor", 0.90)] - ) - .is_none() + lower.contains("source evidence") + || lower.contains("`computeflow` in `src/domain/flow.rs`"), + "generic source claim should be present: {text}" ); + for forbidden in [ + "supporting evidence", + "interceptor", + "dispatch", + "axios", + "http", + "holdout", + "eval", + "probe", + "bench", + ] { + assert!( + !lower.contains(forbidden), + "generic source claims must not contain `{forbidden}` with eval probes disabled: {text}" + ); + } } #[test] - fn repo_explanation_prompt_detection_is_broad_but_not_symbolic() { - assert!(is_repo_explanation_prompt( - "How does this repo fit together?" - )); - assert!(is_repo_explanation_prompt( - "Explain the project architecture" - )); - assert!(!is_repo_explanation_prompt( - "Where is build_llm_symbol_doc_text used?" - )); - } + fn packet_supported_claims_include_exec_flow_specific_claims() { + let temp_root = + std::env::temp_dir().join(format!("codestory-exec-flow-claims-{}", std::process::id())); + let cli_src = temp_root.join("cli").join("src"); + let exec_src = temp_root.join("exec").join("src"); + std::fs::create_dir_all(&cli_src).expect("create temp cli src"); + std::fs::create_dir_all(&exec_src).expect("create temp exec src"); + let cli_main = cli_src.join("main.rs"); + std::fs::write( + &cli_main, + r#" + pub enum Subcommand { + Exec, + } + pub struct DebugSubcommand; + mod codex_exec; + "#, + ) + .expect("write temp cli main"); + let exec_main = exec_src.join("main.rs"); + std::fs::write( + &exec_main, + r#" + fn main() { + codex_exec::run_main(); + } + "#, + ) + .expect("write temp exec main"); + let exec_cli = exec_src.join("cli.rs"); + std::fs::write( + &exec_cli, + r#" + pub struct Cli { + /// Print events to stdout as JSONL. + #[arg(long = "json", alias = "experimental-json")] + pub json: bool, + } + pub struct ExecSharedCliOptions; + "#, + ) + .expect("write temp exec cli"); + let exec_lib = exec_src.join("lib.rs"); + std::fs::write( + &exec_lib, + r#" + pub async fn run_main() { + let config = ConfigBuilder::default().build().await?; + let approval_policy = config.permissions.approval_policy.value(); + let sandbox = config.permissions.sandbox_policy.value(); + let in_process_start_args = InProcessClientStartArgs { + config: std::sync::Arc::new(config.clone()), + client_name: "codex_exec".to_string(), + }; + run_exec_session(in_process_start_args).await + } + "#, + ) + .expect("write temp exec lib"); + let event_jsonl = exec_src.join("event_processor_with_jsonl_output.rs"); + std::fs::write( + &event_jsonl, + r#" + use crate::exec_events::ThreadEvent; + pub struct EventProcessorWithJsonOutput; + impl EventProcessorWithJsonOutput { + fn emit(&self, event: ThreadEvent) { + println!("{}", serde_json::to_string(&event).unwrap()); + } + } + "#, + ) + .expect("write temp jsonl event processor"); + let cli_main_path = cli_main.to_string_lossy().to_string(); + let exec_main_path = exec_main.to_string_lossy().to_string(); + let exec_cli_path = exec_cli.to_string_lossy().to_string(); + let exec_lib_path = exec_lib.to_string_lossy().to_string(); + let event_jsonl_path = event_jsonl.to_string_lossy().to_string(); + let answer = AgentAnswerDto { + answer_id: "exec-fixture".to_string(), + prompt: "Explain how `codex exec --json` flows from the top-level CLI into the exec runtime, app-server thread and turn start requests, and JSONL event output.".to_string(), + summary: "Exec flow evidence is covered.".to_string(), + freshness: None, + sections: Vec::new(), + citations: vec![ + test_packet_citation("Subcommand::Exec", &cli_main_path, 0.8), + test_packet_citation("codex_exec::Cli", &cli_main_path, 0.8), + test_packet_citation("codex_exec::run_main", &exec_main_path, 0.8), + test_packet_citation( + "ExecSharedCliOptions::into_inner", + &exec_cli_path, + 0.8, + ), + test_packet_citation("run_main", &exec_lib_path, 0.8), + test_packet_citation("run_exec_session", &exec_lib_path, 0.8), + test_packet_citation( + "EventProcessorWithJsonOutput", + &event_jsonl_path, + 0.8, + ), + test_packet_citation( + "ThreadStartParams", + "codex-rs/app-server-protocol/src/protocol/v2/thread.rs", + 0.8, + ), + test_packet_citation( + "TurnStartParams", + "codex-rs/app-server-protocol/src/protocol/v2/turn.rs", + 0.8, + ), + ], + subgraph_ids: Vec::new(), + retrieval_version: "test".to_string(), + graphs: Vec::new(), + retrieval_trace: codestory_contracts::api::AgentRetrievalTraceDto { + request_id: "exec-fixture".to_string(), + resolved_profile: AgentRetrievalPresetDto::Architecture, + policy_mode: AgentRetrievalPolicyModeDto::LatencyFirst, + total_latency_ms: 1, + sla_target_ms: None, + sla_missed: false, + semantic_fallback_count: 0, + semantic_fallbacks: Vec::new(), + annotations: Vec::new(), + steps: Vec::new(), + packet_sidecar_diagnostics: Vec::new(), + retrieval_shadow: None, + }, + }; - #[test] - fn packet_plan_infers_task_class_and_code_terms() { - let plan = build_packet_plan( - "Trace the /api/users route through AppController and UserStore", - None, - PacketBudgetModeDto::Standard, - ); + let claims = packet_supported_claims(&answer); + let text = claims + .iter() + .map(|claim| claim.claim.as_str()) + .collect::>() + .join("\n"); - assert_eq!(plan.task_class, PacketTaskClassDto::RouteTracing); - assert!(plan.inferred_task_class); + assert!(text.contains( + "The top-level Codex CLI has a cited Exec subcommand and command-module entrypoint in `codex_exec`." + )); assert!( - plan.queries.iter().any(|query| query.query == "/api/users"), - "route-like terms should become concrete packet queries: {plan:?}" + !text.contains("non-interactive execution"), + "production packet claim templates must not infer Codex exec semantics from a subcommand name: {text}" ); + assert!(text.contains( + "The codex-exec binary parses exec-specific CLI options and calls codex_exec::run_main." + )); + assert!(text.contains( + "The exec CLI defines --json as the switch that chooses JSONL stdout output." + )); + assert!(text.contains( + "run_main loads config, resolves sandbox and approval settings, and builds the in-process app-server start arguments" + )); + assert!(text.contains( + "The command or public entrypoint for this flow is anchored by `codex_exec::Cli`" + )); + assert!(text.contains("Runtime orchestration is anchored by `codex_exec::run_main`")); + assert!(text.contains( + "JSON/event output processing is anchored by `EventProcessorWithJsonOutput`" + )); assert!( - plan.queries - .iter() - .any(|query| query.query == "AppController"), - "CamelCase symbols should become concrete packet queries: {plan:?}" - ); - } - - #[test] - fn requested_packet_task_class_overrides_heuristic() { - let plan = build_packet_plan( - "What would change if the indexing cache format moved?", - Some(PacketTaskClassDto::ChangeImpact), - PacketBudgetModeDto::Standard, + text.contains( + "App-server request protocol evidence is anchored by `ThreadStartParams`" + ) ); - - assert_eq!(plan.task_class, PacketTaskClassDto::ChangeImpact); - assert!(!plan.inferred_task_class); + assert!(text.contains( + "Event-output processing evidence describes how structured runtime events are serialized for JSON/JSONL output." + )); assert!( - plan.queries - .iter() - .any(|query| query.query.contains("affected")), - "change impact plans should seed affected-symbol queries: {plan:?}" + !text.contains("DebugSubcommand` is defined in cited source"), + "definition claims should not crowd out exact command-flow claims: {text}" ); } #[test] - fn packet_plan_expands_task_wording_without_fixture_specific_anchors() { - let plan = build_packet_plan( - "Explain how a full indexing run moves from the CLI into runtime orchestration, file discovery, symbol extraction, persistence, and search or snapshot refresh.", - Some(PacketTaskClassDto::ArchitectureExplanation), - PacketBudgetModeDto::Standard, - ); - let queries = plan - .queries - .iter() - .map(|query| query.query.as_str()) - .collect::>(); - - for expected in [ - "index service", - "workspace execution plan", - "workspace indexer", - "symbol extraction indexer", - "search projection", - "snapshot refresh", - "indexing", - "runtime", - "IndexingRun", - "RuntimeOrchestration", - "architecture entrypoint", - "runtime flow", - ] { - assert!( - queries.contains(&expected), - "expected generic probe {expected} in packet plan: {queries:?}" - ); - } - for fixture_anchor in [ - "run_index", - "IndexService", - "WorkspaceIndexer", - "flush_projection_batch", - "SnapshotStore", - ] { - assert!( - !queries.contains(&fixture_anchor), - "packet planner should not inject fixture-specific anchor {fixture_anchor}: {queries:?}" - ); - } - } + fn packet_supported_claims_surface_ranked_definitions_from_cited_sources() { + let temp_root = std::env::temp_dir().join(format!( + "codestory-source-definition-claims-{}", + std::process::id() + )); + let exec_src = temp_root.join("exec").join("src"); + std::fs::create_dir_all(&exec_src).expect("create temp exec src"); + let exec_lib = exec_src.join("lib.rs"); + std::fs::write( + &exec_lib, + r#" + pub async fn run_exec_session() {} + pub struct EventProcessorWithJsonOutput; + pub struct ThreadStartParams; + "#, + ) + .expect("write temp exec lib"); + let exec_lib_path = exec_lib.to_string_lossy().to_string(); + let answer = AgentAnswerDto { + answer_id: "source-definition-fixture".to_string(), + prompt: "Explain how `codex exec --json` flows from the exec runtime into app-server thread start requests and JSONL event output.".to_string(), + summary: "Exec flow evidence is covered.".to_string(), + freshness: None, + sections: Vec::new(), + citations: vec![test_packet_citation("exec runtime", &exec_lib_path, 0.8)], + subgraph_ids: Vec::new(), + retrieval_version: "test".to_string(), + graphs: Vec::new(), + retrieval_trace: codestory_contracts::api::AgentRetrievalTraceDto { + request_id: "source-definition-fixture".to_string(), + resolved_profile: AgentRetrievalPresetDto::Architecture, + policy_mode: AgentRetrievalPolicyModeDto::LatencyFirst, + total_latency_ms: 1, + sla_target_ms: None, + sla_missed: false, + semantic_fallback_count: 0, + semantic_fallbacks: Vec::new(), + annotations: Vec::new(), + steps: Vec::new(), + packet_sidecar_diagnostics: Vec::new(), + retrieval_shadow: None, + }, + }; - #[test] - fn compact_packet_plan_promotes_indexing_flow_stage_queries() { - let plan = build_packet_plan( - "Explain how a full indexing run moves from the CLI into runtime orchestration, file discovery, symbol extraction, persistence, and search or snapshot refresh.", - Some(PacketTaskClassDto::ArchitectureExplanation), - PacketBudgetModeDto::Compact, - ); - let queries = plan - .queries + let claims = packet_supported_claims(&answer); + let text = claims .iter() - .map(|query| query.query.as_str()) - .collect::>(); - - for expected in [ - "index service", - "workspace execution plan", - "workspace indexer", - "search projection", - "snapshot refresh", - ] { - assert!( - queries.contains(&expected), - "expected indexing-flow stage probe {expected} in compact packet plan: {queries:?}" - ); - } + .map(|claim| claim.claim.as_str()) + .collect::>() + .join("\n"); - let stage_index = queries - .iter() - .position(|query| *query == "index service") - .expect("index service stage probe should be present"); - for low_signal in ["full", "moves", "run_moves", "RunMoves"] { - assert!( - !queries.contains(&low_signal), - "packet planner should suppress isolated low-signal term {low_signal}: {queries:?}" - ); - } - let broad_probe = "runtime"; - let probe_index = queries - .iter() - .position(|query| *query == broad_probe) - .expect("broad probe should still be present"); - assert!( - stage_index < probe_index, - "indexing-flow stage probes should precede broad probe {broad_probe}: {queries:?}" - ); + assert!(text.contains("`run_exec_session` is defined in cited source")); + assert!(text.contains("`EventProcessorWithJsonOutput` is defined in cited source")); + assert!(text.contains("`ThreadStartParams` is defined in cited source")); } #[test] - fn compact_packet_plan_protects_indexing_flow_action_probes() { - let plan = build_packet_plan( - "Explain how a full indexing run moves from the CLI into runtime orchestration, file discovery, symbol extraction, persistence, and search or snapshot refresh.", - Some(PacketTaskClassDto::ArchitectureExplanation), - PacketBudgetModeDto::Compact, - ); - let queries = plan - .queries - .iter() - .map(|query| query.query.as_str()) - .collect::>(); - - for expected in [ - "index service run indexing", - "workspace manifest build execution plan", - "workspace indexer run", - "index_file", - "storage flush projection batch", - "storage rebuild search symbol projection", - "snapshot refresh all stats", - ] { - assert!( - queries.contains(&expected), - "expected indexing-flow action probe {expected} in compact packet plan: {queries:?}" - ); - } - for fixture_anchor in [ - "IndexService::run_indexing_blocking", - "WorkspaceManifest::build_execution_plan", - "WorkspaceIndexer::run", - "Storage::rebuild_search_symbol_projection_from_node_table", - "SnapshotStore::refresh_all_with_stats", - ] { - assert!( - !queries.contains(&fixture_anchor), - "packet planner should protect generic action probes without injecting fixture-specific anchor {fixture_anchor}: {queries:?}" - ); - } - } + fn packet_supported_claims_include_indexing_storage_flow_specific_claims() { + let _eval_probes = EvalProbesGuard::enabled(); + let answer = AgentAnswerDto { + answer_id: "indexing-storage-fixture".to_string(), + prompt: "Explain project source-group indexing into storage.".to_string(), + summary: "Indexing and storage evidence is covered.".to_string(), + freshness: None, + sections: Vec::new(), + citations: vec![ + test_packet_citation("Project::buildIndex", "src/lib/project/Project.cpp", 0.8), + test_packet_citation( + "TaskFillIndexerCommandsQueue", + "src/lib/data/indexer/TaskFillIndexerCommandQueue.h", + 0.8, + ), + test_packet_citation( + "SourceGroupCxxCdb", + "src/lib_cxx/project/SourceGroupCxxCdb.cpp", + 0.8, + ), + test_packet_citation( + "IndexerCommandCxx", + "src/lib_cxx/data/indexer/IndexerCommandCxx.h", + 0.8, + ), + test_packet_citation( + "IndexerJava", + "src/lib_java/data/indexer/IndexerJava.cpp", + 0.8, + ), + test_packet_citation("StorageAccess", "src/lib/data/storage/StorageAccess.h", 0.8), + test_packet_citation( + "StorageAccessProxy", + "src/lib/data/storage/StorageAccessProxy.cpp", + 0.8, + ), + test_packet_citation( + "PersistentStorage", + "src/lib/data/storage/PersistentStorage.cpp", + 0.8, + ), + ], + subgraph_ids: Vec::new(), + retrieval_version: "test".to_string(), + graphs: Vec::new(), + retrieval_trace: codestory_contracts::api::AgentRetrievalTraceDto { + request_id: "indexing-storage-fixture".to_string(), + resolved_profile: AgentRetrievalPresetDto::Architecture, + policy_mode: AgentRetrievalPolicyModeDto::LatencyFirst, + total_latency_ms: 1, + sla_target_ms: None, + sla_missed: false, + semantic_fallback_count: 0, + semantic_fallbacks: Vec::new(), + annotations: Vec::new(), + steps: Vec::new(), + packet_sidecar_diagnostics: Vec::new(), + retrieval_shadow: None, + }, + }; - #[test] - fn compact_packet_initial_retrieval_keeps_semantic_hybrid_and_anchor_prompt() { - let plan = build_packet_plan( - "Explain how VS Code workbench startup reaches ExtensionService, ExtensionHostManager, AbstractExtHostExtensionService, and ExtHostCommands.executeCommand.", - Some(PacketTaskClassDto::ArchitectureExplanation), - PacketBudgetModeDto::Compact, - ); + let claims = packet_supported_claims(&answer); + let text = claims + .iter() + .map(|claim| claim.claim.as_str()) + .collect::>() + .join("\n"); + assert!(text.contains( + "Source-group configuration and indexing command evidence describe how repository configuration becomes indexing work." + )); + assert!(text.contains( + "Persistence/search-projection evidence describes how indexed data remains available to later application reads." + )); + assert!(text.contains("Indexing work queue behavior is anchored by `Project::buildIndex`")); + assert!(text.contains("Source-group configuration is anchored by `SourceGroupCxxCdb`")); + assert!(text.contains("Persistence or search projection is anchored by `StorageAccess`")); assert!( - packet_initial_hybrid_weights(&plan, PacketBudgetModeDto::Compact).is_none(), - "compact packets should not collapse initial retrieval to lexical-only" - ); - let prompt = packet_retrieval_prompt( - "Explain startup.", - &plan, - None, - PacketBudgetModeDto::Compact, + text.contains("Persistence or search projection is anchored by `PersistentStorage`") ); - assert!(prompt.starts_with("Explain startup.")); - assert!(prompt.contains("Planned CodeStory queries:")); - assert!(prompt.contains("ExtensionService")); - assert!(prompt.contains("ExtHostCommands")); - assert!(prompt.to_ascii_lowercase().contains("workbench")); } #[test] - fn packet_plan_suppresses_low_signal_broad_prompt_terms() { - let plan = build_packet_plan( - "Study current architecture boundaries across contracts workspace indexer store runtime cli bench retrieval packet flow ranking precision latency risks.", - Some(PacketTaskClassDto::ArchitectureExplanation), - PacketBudgetModeDto::Standard, + fn packet_supported_claims_include_indexing_pipeline_flow_claims() { + let question = "Explain how a full indexing run moves from the CLI into runtime orchestration, file discovery, symbol extraction, persistence, and search or snapshot refresh."; + let answer = packet_answer_fixture( + question, + vec![ + test_packet_citation("CliDirection", "crates/codestory-cli/src/args.rs", 0.8), + test_packet_citation( + "IndexService::run_indexing_blocking_without_runtime_refresh", + "crates/codestory-runtime/src/services.rs", + 0.8, + ), + test_packet_citation( + "Runtime::index_service", + "crates/codestory-runtime/src/lib.rs", + 0.8, + ), + test_packet_citation( + "WorkspaceManifest::build_execution_plan", + "crates/codestory-workspace/src/lib.rs", + 0.8, + ), + test_packet_citation( + "WorkspaceIndexer::run", + "crates/codestory-indexer/src/lib.rs", + 0.8, + ), + test_packet_citation("index_file", "crates/codestory-indexer/src/lib.rs", 0.8), + test_packet_citation( + "Storage::flush_projection_batch", + "crates/codestory-store/src/storage_impl/mod.rs", + 0.8, + ), + test_packet_citation( + "Storage::rebuild_search_symbol_projection_from_node_table", + "crates/codestory-store/src/storage_impl/mod.rs", + 0.8, + ), + test_packet_citation( + "SnapshotStore::refresh_all_with_stats", + "crates/codestory-store/src/snapshot_store.rs", + 0.8, + ), + ], ); - let queries = plan - .queries + + let claims = packet_supported_claims(&answer); + let text = claims .iter() - .map(|query| query.query.as_str()) - .collect::>(); + .map(|claim| claim.claim.as_str()) + .collect::>() + .join("\n"); - for low_signal in [ - "current", - "Current", - "current_architecture", - "CurrentArchitecture", - "latency_risks", - "LatencyRisks", - "risks", - "Risks", - ] { - assert!( - !queries.contains(&low_signal), - "packet planner should suppress low-signal broad prompt term {low_signal}: {queries:?}" - ); - } - for retained in [ - "contracts", - "workspace", - "indexer", - "store", - "bench_retrieval", + for expected in [ + "The CLI index command prepares command options and delegates indexing work into the runtime layer.", + "The runtime opens the workspace and store, chooses full or incremental indexing, and coordinates later refresh phases.", + "The workspace crate is responsible for source-file discovery and refresh-plan construction.", + "The indexer extracts nodes, edges, occurrences, and related symbol data from source files.", + "The store persists graph and file data to SQLite and rebuilds query/search projections from persisted data.", + "Snapshot refresh happens after persisted data changes so later grounding and summary reads see current indexed state.", ] { assert!( - queries.contains(&retained), - "packet planner should retain concrete repo/retrieval term {retained}: {queries:?}" + text.contains(expected), + "indexing pipeline packet claims should include `{expected}`: {text}" ); } } #[test] - fn packet_plan_keeps_broad_risk_study_as_architecture() { - let plan = build_packet_plan( - "Study current architecture boundaries, packet flow, ranking precision, and latency risks.", - None, - PacketBudgetModeDto::Standard, + fn packet_sufficiency_accepts_generic_indexing_flow_probes() { + let question = "Explain how a full indexing run moves from the CLI into runtime orchestration, file discovery, symbol extraction, persistence, and search or snapshot refresh."; + let (_answer, sufficiency) = build_sufficient_packet_fixture( + question, + PacketTaskClassDto::ArchitectureExplanation, + vec![ + test_packet_citation("CliDirection", "crates/codestory-cli/src/args.rs", 0.8), + test_packet_citation( + "indexing entrypoint", + "crates/codestory-runtime/src/services.rs", + 0.8, + ), + test_packet_citation( + "file discovery", + "crates/codestory-workspace/src/lib.rs", + 0.8, + ), + test_packet_citation( + "symbol extraction", + "crates/codestory-indexer/src/lib.rs", + 0.8, + ), + test_packet_citation( + "storage persistence", + "crates/codestory-store/src/storage_impl/mod.rs", + 0.8, + ), + test_packet_citation( + "search projection", + "crates/codestory-store/src/storage_impl/mod.rs", + 0.8, + ), + test_packet_citation( + "snapshot refresh", + "crates/codestory-store/src/snapshot_store.rs", + 0.8, + ), + ], ); - let queries = plan - .queries - .iter() - .map(|query| query.query.as_str()) - .collect::>(); - assert_eq!(plan.task_class, PacketTaskClassDto::ArchitectureExplanation); - assert!( - queries.contains(&"architecture entrypoint"), - "architecture packets should keep architecture seeds: {queries:?}" - ); - assert!( - !queries.contains(&"affected symbols"), - "generic risk wording should not force change-impact seeds: {queries:?}" + assert_eq!( + sufficiency.status, + PacketSufficiencyStatusDto::Sufficient, + "{sufficiency:?}" ); - } - - #[test] - fn packet_plan_routes_specific_risk_of_change_prompts_to_change_impact() { - for question in [ - "What risk if I change reference resolution behavior?", - "What is the risk of changing reference resolution behavior?", + for probe in [ + "indexing entrypoint", + "file discovery", + "symbol extraction", + "storage persistence", + "search projection", + "snapshot refresh", ] { - let plan = build_packet_plan(question, None, PacketBudgetModeDto::Standard); - let queries = plan - .queries - .iter() - .map(|query| query.query.as_str()) - .collect::>(); - - assert_eq!( - plan.task_class, - PacketTaskClassDto::ChangeImpact, - "{question}" + assert!( + sufficiency.gaps.iter().all(|gap| !gap.contains(probe)), + "generic indexing-flow probe {probe} should satisfy required probe gaps: {sufficiency:?}" ); assert!( - queries.contains(&"affected symbols"), - "specific risk-of-change prompts should keep change-impact seeds: {queries:?}" + sufficiency + .follow_up_commands + .iter() + .all(|command| !command.contains(probe)), + "generic indexing-flow probe {probe} should not produce follow-up commands: {sufficiency:?}" ); } } - #[test] - fn packet_plan_preserves_quoted_low_signal_symbol_queries() { - let plan = build_packet_plan("Find `current`.", None, PacketBudgetModeDto::Standard); - let queries = plan - .queries + #[test] + fn production_packet_claims_do_not_synthesize_local_real_template_claims() { + let answer = AgentAnswerDto { + answer_id: "indexing-storage-production-fixture".to_string(), + prompt: "Explain project source-group indexing into storage.".to_string(), + summary: "Indexing and storage evidence is covered.".to_string(), + freshness: None, + sections: Vec::new(), + citations: vec![ + test_packet_citation("Project::buildIndex", "src/lib/project/Project.cpp", 0.8), + test_packet_citation( + "SourceGroupCxxCdb", + "src/lib_cxx/project/SourceGroupCxxCdb.cpp", + 0.8, + ), + test_packet_citation( + "IndexerCommandCxx", + "src/lib_cxx/data/indexer/IndexerCommandCxx.h", + 0.8, + ), + test_packet_citation("StorageAccess", "src/lib/data/storage/StorageAccess.h", 0.8), + test_packet_citation( + "PersistentStorage", + "src/lib/data/storage/PersistentStorage.cpp", + 0.8, + ), + ], + subgraph_ids: Vec::new(), + retrieval_version: "test".to_string(), + graphs: Vec::new(), + retrieval_trace: codestory_contracts::api::AgentRetrievalTraceDto { + request_id: "indexing-storage-production-fixture".to_string(), + resolved_profile: AgentRetrievalPresetDto::Architecture, + policy_mode: AgentRetrievalPolicyModeDto::LatencyFirst, + total_latency_ms: 1, + sla_target_ms: None, + sla_missed: false, + semantic_fallback_count: 0, + semantic_fallbacks: Vec::new(), + annotations: Vec::new(), + steps: Vec::new(), + packet_sidecar_diagnostics: Vec::new(), + retrieval_shadow: None, + }, + }; + + let claims = packet_supported_claims(&answer); + let text = claims .iter() - .map(|query| query.query.as_str()) - .collect::>(); + .map(|claim| claim.claim.as_str()) + .collect::>() + .join("\n"); assert!( - queries.contains(&"current"), - "quoted symbol queries should not be filtered as low-signal broad terms: {queries:?}" + !text.contains("Project::buildIndex builds a per-source-group indexing task pipeline"), + "production claims should not inject Sourcetrail-specific template text: {text}" + ); + assert!( + !text.contains("SourceGroupCxxCdb reads compile database input"), + "production claims should not inject Sourcetrail-specific template text: {text}" ); + assert!(text.contains("Indexing work queue behavior is anchored by `Project::buildIndex`")); + assert!(text.contains("Persistence or search projection is anchored by `StorageAccess`")); } #[test] - fn symbol_ownership_packet_plan_seeds_generic_ownership_terms() { - let question = "Explain which modules own application creation, app-level rendering, response serialization, file sending, and view lookup."; - let plan = build_packet_plan( - question, - Some(PacketTaskClassDto::SymbolOwnership), - PacketBudgetModeDto::Standard, - ); - let queries = plan - .queries + fn packet_supported_claims_include_vscode_workbench_extension_host_claims() { + let answer = AgentAnswerDto { + answer_id: "vscode-fixture".to_string(), + prompt: "Explain VS Code workbench extension-host command execution.".to_string(), + summary: "VS Code workbench flow evidence is covered.".to_string(), + freshness: None, + sections: Vec::new(), + citations: vec![ + test_packet_citation( + "Workbench.startup", + "src/vs/workbench/browser/workbench.ts", + 0.8, + ), + test_packet_citation( + "ExtensionService", + "src/vs/workbench/services/extensions/browser/extensionService.ts", + 0.8, + ), + test_packet_citation( + "ExtensionHostManager", + "src/vs/workbench/services/extensions/common/extensionHostManager.ts", + 0.8, + ), + test_packet_citation( + "AbstractExtHostExtensionService", + "src/vs/workbench/api/common/extHostExtensionService.ts", + 0.8, + ), + test_packet_citation( + "ExtHostCommands", + "src/vs/workbench/api/common/extHostCommands.ts", + 0.8, + ), + ], + subgraph_ids: Vec::new(), + retrieval_version: "test".to_string(), + graphs: Vec::new(), + retrieval_trace: codestory_contracts::api::AgentRetrievalTraceDto { + request_id: "vscode-fixture".to_string(), + resolved_profile: AgentRetrievalPresetDto::Architecture, + policy_mode: AgentRetrievalPolicyModeDto::LatencyFirst, + total_latency_ms: 1, + sla_target_ms: None, + sla_missed: false, + semantic_fallback_count: 0, + semantic_fallbacks: Vec::new(), + annotations: Vec::new(), + steps: Vec::new(), + packet_sidecar_diagnostics: Vec::new(), + retrieval_shadow: None, + }, + }; + + let claims = packet_supported_claims(&answer); + let text = claims .iter() - .map(|query| query.query.as_str()) - .collect::>(); + .map(|claim| claim.claim.as_str()) + .collect::>() + .join("\n"); - for expected in [ - "references", - "callers", - "definition references", - "application", - "view", - "lookup", - "application_creation", - "ApplicationCreation", - ] { - assert!( - queries.contains(&expected), - "expected {expected} in generic ownership packet plan: {queries:?}" - ); - } - for fixture_anchor in ["createApplication", "lib/express.js", "lib/response.js"] { - assert!( - !queries.contains(&fixture_anchor), - "ownership planning should not inject fixture-specific anchor {fixture_anchor}: {queries:?}" - ); - } + assert!(text.contains("Runtime orchestration is anchored by `ExtensionService`")); + assert!(text.contains( + "The command or public entrypoint for this flow is anchored by `ExtHostCommands`" + )); + assert!( + text.contains("Source evidence is anchored by") + || text.contains("Runtime orchestration is anchored by"), + "VS Code packet claims should use generic role-led anchors: {text}" + ); } #[test] - fn bug_packet_plan_seeds_generic_failure_terms_and_prompt_identifiers() { - let question = - "Localize an app.param callback decode bug through router parameter handling."; - let plan = build_packet_plan( - question, - Some(PacketTaskClassDto::BugLocalization), - PacketBudgetModeDto::Standard, - ); - let queries = plan - .queries + fn packet_supported_claims_include_payload_public_content_flow_claims() { + let answer = AgentAnswerDto { + answer_id: "payload-fixture".to_string(), + prompt: "Explain Payload posts comments RSS and Elsewhere feed.".to_string(), + summary: "Payload public content flow evidence is covered.".to_string(), + freshness: None, + sections: Vec::new(), + citations: vec![ + test_packet_citation("buildConfig", "src/payload.config.ts", 0.8), + test_packet_citation("Posts", "src/collections/Posts.ts", 0.8), + test_packet_citation("SocialEntries", "src/collections/SocialEntries.ts", 0.8), + test_packet_citation("PostPage", "src/app/(frontend)/posts/[slug]/page.tsx", 0.8), + test_packet_citation( + "POST /posts/:slug/comments", + "src/app/(frontend)/posts/[slug]/comments/route.ts", + 0.8, + ), + test_packet_citation("GET /feed.xml", "src/app/feed.xml/route.ts", 0.8), + test_packet_citation("getPayloadClient", "src/lib/payload.ts", 0.8), + test_packet_citation( + "getCommentAuthContextFromHeaders", + "src/lib/comment-auth.ts", + 0.8, + ), + test_packet_citation( + "getLatestSocialEntries", + "src/lib/content-data/social-entry-content.ts", + 0.8, + ), + ], + subgraph_ids: Vec::new(), + retrieval_version: "test".to_string(), + graphs: Vec::new(), + retrieval_trace: codestory_contracts::api::AgentRetrievalTraceDto { + request_id: "payload-fixture".to_string(), + resolved_profile: AgentRetrievalPresetDto::Architecture, + policy_mode: AgentRetrievalPolicyModeDto::LatencyFirst, + total_latency_ms: 1, + sla_target_ms: None, + sla_missed: false, + semantic_fallback_count: 0, + semantic_fallbacks: Vec::new(), + annotations: Vec::new(), + steps: Vec::new(), + packet_sidecar_diagnostics: Vec::new(), + retrieval_shadow: None, + }, + }; + + let claims = packet_supported_claims(&answer); + let text = claims .iter() - .map(|query| query.query.as_str()) - .collect::>(); + .map(|claim| claim.claim.as_str()) + .collect::>() + .join("\n"); - for expected in [ - "app.param", - "param", - "callback", - "error", - "validate", - "error path", - "failure handling", - ] { - assert!( - queries.contains(&expected), - "expected {expected} in generic bug packet plan: {queries:?}" - ); - } - for fixture_anchor in ["proto.param", "Layer.prototype.match", "test/app.param.js"] { - assert!( - !queries.contains(&fixture_anchor), - "bug planning should not inject fixture-specific anchor {fixture_anchor}: {queries:?}" - ); - } + assert!(text.contains("Collection configuration is anchored by `Posts`")); + assert!(text.contains("Route handling is anchored by `POST /posts/:slug/comments`")); + assert!(text.contains("`getPayloadClient` in `src/lib/payload.ts`")); } #[test] - fn route_tracing_packet_plan_seeds_generic_route_terms() { - let question = "Trace how an application registers middleware and routes, then dispatches an incoming request through router layers to a route handler."; - let plan = build_packet_plan( - question, - Some(PacketTaskClassDto::RouteTracing), - PacketBudgetModeDto::Standard, - ); - let queries = plan - .queries + fn packet_ranking_prefers_payload_collections_over_component_and_preview_fillers() { + let question = "Explain how Payload collections, post rendering, comment submission, RSS, and the Elsewhere feed connect."; + let mut answer = AgentAnswerDto { + answer_id: "payload-rank-fixture".to_string(), + prompt: question.to_string(), + summary: "Payload public content flow evidence is covered.".to_string(), + freshness: None, + sections: Vec::new(), + citations: vec![ + test_packet_citation("PostComments", "src/components/PostComments.tsx", 0.55), + test_packet_citation("posts", "src/lib/content-data/preview-content.ts", 0.55), + test_packet_citation("Posts", "src/collections/Posts.ts", 0.8), + test_packet_citation("Comments", "src/collections/Comments.ts", 0.8), + ], + subgraph_ids: Vec::new(), + retrieval_version: "test".to_string(), + graphs: Vec::new(), + retrieval_trace: codestory_contracts::api::AgentRetrievalTraceDto { + request_id: "payload-rank-fixture".to_string(), + resolved_profile: AgentRetrievalPresetDto::Architecture, + policy_mode: AgentRetrievalPolicyModeDto::LatencyFirst, + total_latency_ms: 1, + sla_target_ms: None, + sla_missed: false, + semantic_fallback_count: 0, + semantic_fallbacks: Vec::new(), + annotations: Vec::new(), + steps: Vec::new(), + packet_sidecar_diagnostics: Vec::new(), + retrieval_shadow: None, + }, + }; + + rank_packet_evidence(question, &mut answer); + let top_paths = answer + .citations .iter() - .map(|query| query.query.as_str()) + .take(2) + .filter_map(|citation| citation.file_path.as_deref().map(packet_display_path)) .collect::>(); - for expected in [ - "router", - "handler", - "route", - "middleware", - "dispatch", - "route handler endpoint", - ] { - assert!( - queries.contains(&expected), - "expected {expected} in route tracing packet plan: {queries:?}" - ); - } - for fixture_anchor in [ - "createApplication", - "lib/router/layer.js", - "Router.StrictSlash", - ] { - assert!( - !queries.contains(&fixture_anchor), - "route tracing should not inject fixture-specific anchor {fixture_anchor}: {queries:?}" - ); - } + assert_eq!( + top_paths, + vec!["src/collections/Posts.ts", "src/collections/Comments.ts"], + "Payload collection files should outrank nearby rendering/preview fillers: {top_paths:?}" + ); } #[test] - fn packet_supported_claims_use_generic_evidence_roles() { - let limits = packet_budget_limits(PacketBudgetModeDto::Compact); + fn packet_ranking_demotes_test_paths_without_fixture_specific_boosts() { + let question = "Trace route dispatch through a handler."; let mut answer = AgentAnswerDto { - answer_id: "generic-fixture".to_string(), - prompt: "Explain the packet evidence roles.".to_string(), - summary: "Generic evidence roles are covered.".to_string(), + answer_id: "rank-fixture".to_string(), + prompt: question.to_string(), + summary: "Route evidence is covered by cited anchors.".to_string(), freshness: None, sections: Vec::new(), citations: vec![ - test_packet_citation("CliCommand", "crates/tool-cli/src/main.rs", 0.8), - test_packet_citation("RuntimeCoordinator", "crates/core/src/runtime.rs", 0.8), - test_packet_citation("WorkspacePlan", "crates/core/src/workspace/plan.rs", 0.8), - test_packet_citation("GraphIndexer", "crates/indexer/src/lib.rs", 0.8), - test_packet_citation("ProjectionStore", "crates/store/src/projection.rs", 0.8), - test_packet_citation("SnapshotRefresh", "crates/store/src/snapshot.rs", 0.8), - test_packet_citation("RouteHandler", "src/routes/user.rs", 0.8), - test_packet_citation("PacketRegression", "tests/packet_flow.rs", 0.8), + test_packet_citation("RouteHandler test", "tests/router_handler.rs", 5.0), + test_packet_citation("RouteHandler", "src/router/handler.rs", 0.5), ], subgraph_ids: Vec::new(), retrieval_version: "test".to_string(), graphs: Vec::new(), retrieval_trace: codestory_contracts::api::AgentRetrievalTraceDto { - request_id: "generic-fixture".to_string(), + request_id: "rank-fixture".to_string(), resolved_profile: AgentRetrievalPresetDto::Architecture, policy_mode: AgentRetrievalPolicyModeDto::LatencyFirst, total_latency_ms: 1, @@ -8604,205 +6602,133 @@ mod tests { semantic_fallbacks: Vec::new(), annotations: Vec::new(), steps: Vec::new(), + packet_sidecar_diagnostics: Vec::new(), retrieval_shadow: None, }, }; - append_packet_evidence_sections( - &mut answer, - PacketTaskClassDto::ArchitectureExplanation, - &limits, - ); - let text = answer - .sections - .iter() - .flat_map(|section| §ion.blocks) - .filter_map(|block| match block { - AgentResponseBlockDto::Markdown { markdown } => Some(markdown.as_str()), - AgentResponseBlockDto::Mermaid { .. } => None, - }) - .collect::>() - .join("\n"); - - for expected_claim in [ - "The command or public entrypoint for this flow is anchored by `CliCommand`", - "Runtime orchestration is anchored by `RuntimeCoordinator`", - "Workspace discovery or planning is anchored by `WorkspacePlan`", - "Symbol extraction is anchored by `GraphIndexer`", - "Persistence or search projection is anchored by `ProjectionStore`", - "Snapshot refresh is anchored by `SnapshotRefresh`", - "Route handling is anchored by `RouteHandler`", - ] { - assert!( - text.contains(expected_claim), - "generic packet claims should include {expected_claim}: {text}" - ); - } - assert!( - !text.contains("Regression coverage for this flow is anchored by `PacketRegression`"), - "test-path regression claims should not crowd out primary flow claims: {text}" - ); + rank_packet_evidence(question, &mut answer); + assert_eq!(answer.citations[0].display_name, "RouteHandler"); } #[test] - fn packet_supported_claims_generic_source_claims_are_domain_neutral_without_eval_probes() { - let _env = EnvVarGuard::cleared(EVAL_PROBES_ENV); - let answer = packet_answer_fixture( - "Explain service ownership flow for ComputeFlow and PersistFlow.", - vec![ - test_packet_citation("ComputeFlow", "src/domain/flow.rs", 0.8), - test_packet_citation("PersistFlow", "src/domain/persist.rs", 0.8), + fn packet_ranking_demotes_test_named_source_helpers_for_production_prompts() { + let question = "Explain runtime orchestration and search projection in the indexing flow."; + let mut answer = AgentAnswerDto { + answer_id: "rank-test-symbols".to_string(), + prompt: question.to_string(), + summary: "Runtime evidence is covered by cited anchors.".to_string(), + freshness: None, + sections: Vec::new(), + citations: vec![ + test_packet_citation( + "EmbeddingRuntime::test_runtime", + "crates/codestory-runtime/src/search/engine.rs", + 5.0, + ), + test_packet_citation( + "tests::drill_question_search_is_partial_discovery_evidence", + "crates/codestory-cli/src/main.rs", + 5.0, + ), + test_packet_citation( + "IndexService::run_indexing_blocking", + "crates/codestory-runtime/src/services.rs", + 0.5, + ), ], - ); + subgraph_ids: Vec::new(), + retrieval_version: "test".to_string(), + graphs: Vec::new(), + retrieval_trace: codestory_contracts::api::AgentRetrievalTraceDto { + request_id: "rank-test-symbols".to_string(), + resolved_profile: AgentRetrievalPresetDto::Architecture, + policy_mode: AgentRetrievalPolicyModeDto::LatencyFirst, + total_latency_ms: 1, + sla_target_ms: None, + sla_missed: false, + semantic_fallback_count: 0, + semantic_fallbacks: Vec::new(), + annotations: Vec::new(), + steps: Vec::new(), + packet_sidecar_diagnostics: Vec::new(), + retrieval_shadow: None, + }, + }; - let claims = packet_supported_claims(&answer); - let text = claims - .iter() - .map(|claim| claim.claim.as_str()) - .collect::>() - .join("\n"); - let lower = text.to_ascii_lowercase(); + rank_packet_evidence(question, &mut answer); - assert!( - lower.contains("source evidence") - || lower.contains("`computeflow` in `src/domain/flow.rs`"), - "generic source claim should be present: {text}" + assert_eq!( + answer.citations[0].display_name, + "IndexService::run_indexing_blocking" ); - for forbidden in [ - "supporting evidence", - "interceptor", - "dispatch", - "axios", - "http", - "holdout", - "eval", - "probe", - "bench", - ] { - assert!( - !lower.contains(forbidden), - "generic source claims must not contain `{forbidden}` with eval probes disabled: {text}" - ); - } - } - - #[test] - fn packet_supported_claims_include_exec_flow_specific_claims() { - let temp_root = - std::env::temp_dir().join(format!("codestory-exec-flow-claims-{}", std::process::id())); - let cli_src = temp_root.join("cli").join("src"); - let exec_src = temp_root.join("exec").join("src"); - std::fs::create_dir_all(&cli_src).expect("create temp cli src"); - std::fs::create_dir_all(&exec_src).expect("create temp exec src"); - let cli_main = cli_src.join("main.rs"); - std::fs::write( - &cli_main, - r#" - pub enum Subcommand { - Exec, - } - pub struct DebugSubcommand; - mod codex_exec; - "#, - ) - .expect("write temp cli main"); - let exec_main = exec_src.join("main.rs"); - std::fs::write( - &exec_main, - r#" - fn main() { - codex_exec::run_main(); - } - "#, - ) - .expect("write temp exec main"); - let exec_cli = exec_src.join("cli.rs"); - std::fs::write( - &exec_cli, - r#" - pub struct Cli { - /// Print events to stdout as JSONL. - #[arg(long = "json", alias = "experimental-json")] - pub json: bool, - } - pub struct ExecSharedCliOptions; - "#, - ) - .expect("write temp exec cli"); - let exec_lib = exec_src.join("lib.rs"); - std::fs::write( - &exec_lib, - r#" - pub async fn run_main() { - let config = ConfigBuilder::default().build().await?; - let approval_policy = config.permissions.approval_policy.value(); - let sandbox = config.permissions.sandbox_policy.value(); - let in_process_start_args = InProcessClientStartArgs { - config: std::sync::Arc::new(config.clone()), - client_name: "codex_exec".to_string(), - }; - run_exec_session(in_process_start_args).await - } - "#, - ) - .expect("write temp exec lib"); - let event_jsonl = exec_src.join("event_processor_with_jsonl_output.rs"); - std::fs::write( - &event_jsonl, - r#" - use crate::exec_events::ThreadEvent; - pub struct EventProcessorWithJsonOutput; - impl EventProcessorWithJsonOutput { - fn emit(&self, event: ThreadEvent) { - println!("{}", serde_json::to_string(&event).unwrap()); - } - } - "#, - ) - .expect("write temp jsonl event processor"); - let cli_main_path = cli_main.to_string_lossy().to_string(); - let exec_main_path = exec_main.to_string_lossy().to_string(); - let exec_cli_path = exec_cli.to_string_lossy().to_string(); - let exec_lib_path = exec_lib.to_string_lossy().to_string(); - let event_jsonl_path = event_jsonl.to_string_lossy().to_string(); - let answer = AgentAnswerDto { - answer_id: "exec-fixture".to_string(), - prompt: "Explain how `codex exec --json` flows from the top-level CLI into the exec runtime, app-server thread and turn start requests, and JSONL event output.".to_string(), - summary: "Exec flow evidence is covered.".to_string(), + assert_eq!( + packet_evidence_role(&answer.citations[1]), + Some(PacketEvidenceRole::TestsAndRegressionCoverage) + ); + assert_eq!( + packet_evidence_role(&answer.citations[2]), + Some(PacketEvidenceRole::TestsAndRegressionCoverage) + ); + } + + #[test] + fn packet_ranking_demotes_non_primary_roles_for_production_prompts() { + let question = "Trace production route dispatch through the handler."; + let mut answer = AgentAnswerDto { + answer_id: "rank-roles".to_string(), + prompt: question.to_string(), + summary: "Route evidence is covered by cited anchors.".to_string(), freshness: None, sections: Vec::new(), citations: vec![ - test_packet_citation("Subcommand::Exec", &cli_main_path, 0.8), - test_packet_citation("codex_exec::Cli", &cli_main_path, 0.8), - test_packet_citation("codex_exec::run_main", &exec_main_path, 0.8), - test_packet_citation( - "ExecSharedCliOptions::into_inner", - &exec_cli_path, - 0.8, - ), - test_packet_citation("run_main", &exec_lib_path, 0.8), - test_packet_citation("run_exec_session", &exec_lib_path, 0.8), - test_packet_citation( - "EventProcessorWithJsonOutput", - &event_jsonl_path, - 0.8, - ), - test_packet_citation( - "ThreadStartParams", - "codex-rs/app-server-protocol/src/protocol/v2/thread.rs", - 0.8, - ), - test_packet_citation( - "TurnStartParams", - "codex-rs/app-server-protocol/src/protocol/v2/turn.rs", - 0.8, - ), + test_packet_citation("DocsRouteHandler", "docs/routes.md", 5.0), + test_packet_citation("GeneratedRouteHandler", "target/generated/routes.rs", 5.0), + test_packet_citation("VendorRouteHandler", "vendor/router/handler.rs", 5.0), + test_packet_citation("BenchRouteHandler", "benches/router_handler.rs", 5.0), + test_packet_citation("RouteHandler", "src/router/handler.rs", 0.5), ], subgraph_ids: Vec::new(), retrieval_version: "test".to_string(), graphs: Vec::new(), retrieval_trace: codestory_contracts::api::AgentRetrievalTraceDto { - request_id: "exec-fixture".to_string(), + request_id: "rank-roles".to_string(), + resolved_profile: AgentRetrievalPresetDto::Architecture, + policy_mode: AgentRetrievalPolicyModeDto::LatencyFirst, + total_latency_ms: 1, + sla_target_ms: None, + sla_missed: false, + semantic_fallback_count: 0, + semantic_fallbacks: Vec::new(), + annotations: Vec::new(), + steps: Vec::new(), + packet_sidecar_diagnostics: Vec::new(), + retrieval_shadow: None, + }, + }; + + rank_packet_evidence(question, &mut answer); + assert_eq!(answer.citations[0].display_name, "RouteHandler"); + } + + #[test] + fn packet_ranking_keeps_requested_docs_role_eligible() { + let question = "Trace the docs route dispatch example."; + let mut answer = AgentAnswerDto { + answer_id: "rank-docs".to_string(), + prompt: question.to_string(), + summary: "Route evidence is covered by cited anchors.".to_string(), + freshness: None, + sections: Vec::new(), + citations: vec![ + test_packet_citation("RouteHandler", "src/router/handler.rs", 0.5), + test_packet_citation("DocsRouteHandler", "docs/routes.md", 5.0), + ], + subgraph_ids: Vec::new(), + retrieval_version: "test".to_string(), + graphs: Vec::new(), + retrieval_trace: codestory_contracts::api::AgentRetrievalTraceDto { + request_id: "rank-docs".to_string(), resolved_profile: AgentRetrievalPresetDto::Architecture, policy_mode: AgentRetrievalPolicyModeDto::LatencyFirst, total_latency_ms: 1, @@ -8812,743 +6738,1132 @@ mod tests { semantic_fallbacks: Vec::new(), annotations: Vec::new(), steps: Vec::new(), + packet_sidecar_diagnostics: Vec::new(), retrieval_shadow: None, }, - }; + }; + + rank_packet_evidence(question, &mut answer); + assert_eq!(answer.citations[0].display_name, "DocsRouteHandler"); + } + + #[test] + fn sufficient_packets_stop_broad_exploration_across_task_classes() { + let fixtures = [ + ( + PacketTaskClassDto::ArchitectureExplanation, + "Explain how the command runtime loads a workspace plan and refreshes snapshots.", + vec![ + test_packet_citation("CliCommand", "crates/app-cli/src/main.rs", 0.9), + test_packet_citation( + "RuntimeCoordinator", + "crates/app-runtime/src/runtime.rs", + 0.9, + ), + test_packet_citation("WorkspacePlan", "crates/workspace/src/plan.rs", 0.8), + ], + "Runtime orchestration is anchored by `RuntimeCoordinator`", + "crates/app-runtime/src/runtime.rs", + ), + ( + PacketTaskClassDto::BugLocalization, + "Find the failure handling path for decode validation.", + vec![ + test_packet_citation("RuntimeErrorHandler", "src/runtime/errors.rs", 0.9), + test_packet_citation("DecodeValidator", "src/validation/decode.rs", 0.8), + test_packet_citation("DecodeRegression", "tests/decode_regression.rs", 0.7), + ], + "Runtime orchestration is anchored by `RuntimeErrorHandler`", + "src/runtime/errors.rs", + ), + ( + PacketTaskClassDto::ChangeImpact, + "What changes if reference resolution behavior changes?", + vec![ + test_packet_citation( + "AffectedReferenceIndex", + "crates/indexer/src/references.rs", + 0.9, + ), + test_packet_citation("ReferenceStore", "crates/store/src/references.rs", 0.8), + test_packet_citation( + "ReferenceRegression", + "tests/reference_regression.rs", + 0.7, + ), + ], + "Symbol extraction is anchored by `AffectedReferenceIndex`", + "crates/indexer/src/references.rs", + ), + ( + PacketTaskClassDto::RouteTracing, + "Trace how a request reaches the selected handler.", + vec![ + test_packet_citation("RouteDispatcher", "src/router/dispatch.rs", 0.9), + test_packet_citation("RouteHandler", "src/router/handler.rs", 0.8), + test_packet_citation("RouteRegression", "tests/route_regression.rs", 0.7), + ], + "Route handling is anchored by `RouteHandler`", + "src/router/handler.rs", + ), + ( + PacketTaskClassDto::SymbolOwnership, + "Who owns workspace planning and graph state?", + vec![ + test_packet_citation( + "WorkspaceOwnerPlan", + "crates/workspace/src/ownership.rs", + 0.9, + ), + test_packet_citation("GraphStateStore", "crates/store/src/graph.rs", 0.8), + test_packet_citation( + "OwnershipRegression", + "tests/ownership_regression.rs", + 0.7, + ), + ], + "Workspace discovery or planning is anchored by `WorkspaceOwnerPlan`", + "crates/workspace/src/ownership.rs", + ), + ( + PacketTaskClassDto::EditPlanning, + "Plan the focused edit for configuration validation behavior.", + vec![ + test_packet_citation("ConfigValidator", "src/config/validator.rs", 0.9), + test_packet_citation("ConfigEditPlan", "src/config/edit_plan.rs", 0.8), + test_packet_citation("ConfigRegression", "tests/config_regression.rs", 0.7), + ], + "Regression coverage for this flow is anchored by `ConfigRegression`", + "tests/config_regression.rs", + ), + ]; + + for (task_class, question, citations, expected_claim, avoid_path) in fixtures { + let (_answer, sufficiency) = + build_sufficient_packet_fixture(question, task_class, citations); + + assert_eq!( + sufficiency.status, + PacketSufficiencyStatusDto::Sufficient, + "task class {task_class:?} should be sufficient: {sufficiency:?}" + ); + assert!( + sufficiency.follow_up_commands.is_empty(), + "sufficient {task_class:?} packets should not recommend broad follow-up commands: {sufficiency:?}" + ); + assert!( + sufficiency.open_next.is_empty(), + "sufficient {task_class:?} packets should not name generic open-next work: {sufficiency:?}" + ); + assert!( + sufficiency + .covered_claims + .iter() + .any(|claim| claim.claim.contains(expected_claim)), + "sufficient {task_class:?} packet should name the covered task claim `{expected_claim}`: {sufficiency:?}" + ); + assert!( + sufficiency + .avoid_opening + .iter() + .any(|entry| entry.contains(avoid_path)), + "sufficient {task_class:?} packet should discourage reopening cited path `{avoid_path}`: {sufficiency:?}" + ); + assert!( + sufficiency + .avoid_opening_paths + .iter() + .any(|entry| entry == avoid_path), + "sufficient {task_class:?} packet should expose raw avoid-opening path `{avoid_path}`: {sufficiency:?}" + ); + } + } + + #[test] + fn architecture_sufficiency_requires_minimum_distinct_claim_families() { + let question = "Explain how project indexing reaches persistent storage."; + let citations = vec![ + test_packet_citation("Project::buildIndex", "src/lib/project/Project.cpp", 0.9), + test_packet_citation( + "TaskBuildIndex", + "src/lib/data/indexer/TaskBuildIndex.cpp", + 0.85, + ), + test_packet_citation( + "TaskFillIndexerCommandsQueue", + "src/lib/data/indexer/TaskFillIndexerCommandQueue.h", + 0.8, + ), + ]; + let (_answer, sufficiency) = build_sufficient_packet_fixture( + question, + PacketTaskClassDto::ArchitectureExplanation, + citations, + ); + assert_eq!( + sufficiency.status, + PacketSufficiencyStatusDto::Partial, + "duplicate claim families should not satisfy architecture packets: {sufficiency:?}" + ); + assert!( + sufficiency + .gaps + .iter() + .any(|gap| gap.contains("claim families")), + "architecture sufficiency should explain missing claim-family coverage: {sufficiency:?}" + ); + } + + #[test] + fn claim_family_coverage_uses_covered_claim_semantics() { + let claims = vec![ + PacketClaimDto { + claim: "The public useSWR export wraps useSWRHandler with argument normalization." + .to_string(), + citations: vec![test_packet_citation( + "useSWRHandler", + "src/index/use-swr.ts", + 0.9, + )], + }, + PacketClaimDto { + claim: "useSWRHandler serializes the key before reading cache state.".to_string(), + citations: vec![test_packet_citation( + "serialize", + "src/_internal/utils/serialize.ts", + 0.9, + )], + }, + PacketClaimDto { + claim: + "createCacheHelper provides cache get, set, subscribe, and snapshot helpers." + .to_string(), + citations: vec![test_packet_citation( + "createCacheHelper", + "src/_internal/utils/helper.ts", + 0.9, + )], + }, + PacketClaimDto { + claim: "internalMutate routes mutate behavior through the mutation helper." + .to_string(), + citations: vec![test_packet_citation( + "internalMutate", + "src/_internal/utils/mutate.ts", + 0.9, + )], + }, + ]; - let claims = packet_supported_claims(&answer); - let text = claims + let use_swr_handler = &claims[0].citations[0]; + assert_eq!( + packet_evidence_role(use_swr_handler), + Some(PacketEvidenceRole::SourceEvidence), + "a hook handler outside route-shaped paths should not become route handling" + ); + + let families = claims .iter() - .map(|claim| claim.claim.as_str()) - .collect::>() - .join("\n"); + .filter_map(packet_claim_family) + .collect::>(); - assert!(text.contains( - "The top-level Codex CLI has a cited Exec subcommand and command-module entrypoint in `codex_exec`." - )); + for expected in [ + "public api/export", + "key serialization", + "cache state", + "mutation flow", + ] { + assert!( + families.contains(expected), + "claim families should include `{expected}` from accepted covered-claim text: {families:?}" + ); + } + assert_eq!(packet_supported_claim_family_count(&claims), 4); + } + + #[test] + fn claim_family_coverage_recognizes_predicate_behavior() { + let claims = vec![ + PacketClaimDto { + claim: + "StringUtils.isBlank treats null, empty, and whitespace-only inputs as blank." + .to_string(), + citations: vec![test_packet_citation( + "StringUtils.isBlank", + "src/main/java/org/apache/commons/lang3/StringUtils.java", + 0.9, + )], + }, + PacketClaimDto { + claim: "StringUtils.isEmpty does not trim whitespace before deciding emptiness." + .to_string(), + citations: vec![test_packet_citation( + "StringUtils.isEmpty", + "src/main/java/org/apache/commons/lang3/StringUtils.java", + 0.9, + )], + }, + ]; + + assert_eq!(packet_claim_family(&claims[0]), Some("predicate behavior")); + assert_eq!(packet_claim_family(&claims[1]), Some("predicate behavior")); + assert_eq!(packet_supported_claim_family_count(&claims), 1); + } + + #[test] + fn partial_and_insufficient_packets_recommend_targeted_followups() { + let question = "Explain route dispatch with enough evidence to stop."; + let mut partial_answer = packet_answer_fixture( + question, + vec![test_packet_citation( + "RouteDispatcher", + "src/router/dispatch.rs", + 0.8, + )], + ); + let mut budget = apply_packet_budget( + packet_fixture_project_root(), + question, + PacketTaskClassDto::RouteTracing, + PacketBudgetModeDto::Tiny, + packet_budget_limits(PacketBudgetModeDto::Tiny), + &mut partial_answer, + ); + budget.truncated = true; + budget.omitted_sections = vec!["output_bytes".to_string()]; + let partial = build_packet_sufficiency( + packet_fixture_project_root(), + question, + PacketTaskClassDto::RouteTracing, + &partial_answer, + &budget, + ); + + assert_eq!(partial.status, PacketSufficiencyStatusDto::Partial); assert!( - !text.contains("non-interactive execution"), - "production packet claim templates must not infer Codex exec semantics from a subcommand name: {text}" + partial + .follow_up_commands + .iter() + .any(|command| command.contains("--budget compact")), + "partial packets should recommend the next deeper packet command: {partial:?}" ); - assert!(text.contains( - "The codex-exec binary parses exec-specific CLI options and calls codex_exec::run_main." - )); - assert!(text.contains( - "The exec CLI defines --json as the switch that chooses JSONL stdout output." - )); - assert!(text.contains( - "run_main loads config, resolves sandbox and approval settings, and builds the in-process app-server start arguments" - )); - assert!(text.contains( - "The command or public entrypoint for this flow is anchored by `codex_exec::Cli`" - )); - assert!(text.contains("Runtime orchestration is anchored by `codex_exec::run_main`")); - assert!(text.contains( - "JSON/event output processing is anchored by `EventProcessorWithJsonOutput`" - )); assert!( - text.contains( - "App-server request protocol evidence is anchored by `ThreadStartParams`" - ) + partial + .follow_up_commands + .iter() + .any(|command| command.contains("codestory-cli search")), + "partial packets should recommend targeted CodeStory search, not broad source reads: {partial:?}" ); - assert!(text.contains( - "Event-output processing evidence describes how structured runtime events are serialized for JSON/JSONL output." - )); assert!( - !text.contains("DebugSubcommand` is defined in cited source"), - "definition claims should not crowd out exact command-flow claims: {text}" + partial + .follow_up_commands + .iter() + .all(|command| !command.contains("")), + "partial packet follow-up commands should be directly runnable: {partial:?}" + ); + assert!( + partial + .follow_up_commands + .iter() + .all(|command| command.contains("--project 'C:/workspace/project root'")), + "partial packet follow-up commands should include the concrete project root: {partial:?}" + ); + + let mut weak_answer = packet_answer_fixture( + question, + vec![test_packet_citation( + "RouteDispatcher", + "src/router/dispatch.rs", + 0.8, + )], + ); + let weak_budget = apply_packet_budget( + packet_fixture_project_root(), + question, + PacketTaskClassDto::RouteTracing, + PacketBudgetModeDto::Compact, + packet_budget_limits(PacketBudgetModeDto::Compact), + &mut weak_answer, + ); + let weak = build_packet_sufficiency( + packet_fixture_project_root(), + question, + PacketTaskClassDto::RouteTracing, + &weak_answer, + &weak_budget, + ); + assert_eq!(weak.status, PacketSufficiencyStatusDto::Partial); + assert!( + weak.gaps + .iter() + .any(|gap| gap.contains("at least 3 are required")), + "single-citation route packets should name the coverage gap: {weak:?}" + ); + + let mut empty_answer = packet_answer_fixture(question, Vec::new()); + let empty_budget = apply_packet_budget( + packet_fixture_project_root(), + question, + PacketTaskClassDto::RouteTracing, + PacketBudgetModeDto::Compact, + packet_budget_limits(PacketBudgetModeDto::Compact), + &mut empty_answer, + ); + let insufficient = build_packet_sufficiency( + packet_fixture_project_root(), + question, + PacketTaskClassDto::RouteTracing, + &empty_answer, + &empty_budget, + ); + + assert_eq!( + insufficient.status, + PacketSufficiencyStatusDto::Insufficient + ); + assert!( + insufficient + .follow_up_commands + .iter() + .any(|command| command.contains("codestory-cli index")), + "insufficient packets should recommend indexing before broad exploration: {insufficient:?}" + ); + assert!( + insufficient + .follow_up_commands + .iter() + .any(|command| command.contains("codestory-cli search") + && command.contains("--why") + && !command.contains("--repo-text on")), + "insufficient packets should recommend sidecar-primary search diagnostics: {insufficient:?}" ); } #[test] - fn packet_supported_claims_surface_ranked_definitions_from_cited_sources() { - let temp_root = std::env::temp_dir().join(format!( - "codestory-source-definition-claims-{}", - std::process::id() - )); - let exec_src = temp_root.join("exec").join("src"); - std::fs::create_dir_all(&exec_src).expect("create temp exec src"); - let exec_lib = exec_src.join("lib.rs"); - std::fs::write( - &exec_lib, - r#" - pub async fn run_exec_session() {} - pub struct EventProcessorWithJsonOutput; - pub struct ThreadStartParams; - "#, + fn packet_follow_up_commands_single_quote_shell_sensitive_questions() { + let question = "Inspect $env:SECRET and $(Get-ChildItem) and 'literal'"; + let quoted = quote_packet_command_value(question); + + assert_eq!( + quoted, + "'Inspect $env:SECRET and $(Get-ChildItem) and ''literal'''" + ); + let command = next_deeper_packet_command( + packet_fixture_project_root(), + question, + PacketBudgetModeDto::Tiny, ) - .expect("write temp exec lib"); - let exec_lib_path = exec_lib.to_string_lossy().to_string(); - let answer = AgentAnswerDto { - answer_id: "source-definition-fixture".to_string(), - prompt: "Explain how `codex exec --json` flows from the exec runtime into app-server thread start requests and JSONL event output.".to_string(), - summary: "Exec flow evidence is covered.".to_string(), - freshness: None, - sections: Vec::new(), - citations: vec![test_packet_citation("exec runtime", &exec_lib_path, 0.8)], - subgraph_ids: Vec::new(), - retrieval_version: "test".to_string(), - graphs: Vec::new(), - retrieval_trace: codestory_contracts::api::AgentRetrievalTraceDto { - request_id: "source-definition-fixture".to_string(), - resolved_profile: AgentRetrievalPresetDto::Architecture, - policy_mode: AgentRetrievalPolicyModeDto::LatencyFirst, - total_latency_ms: 1, - sla_target_ms: None, - sla_missed: false, - semantic_fallback_count: 0, - semantic_fallbacks: Vec::new(), - annotations: Vec::new(), - steps: Vec::new(), - retrieval_shadow: None, - }, + .expect("tiny packet should have deeper command"); + assert!( + command.contains("--question 'Inspect $env:SECRET and $(Get-ChildItem)"), + "packet command should single-quote shell-sensitive question text: {command}" + ); + assert!( + command.contains("--project 'C:/workspace/project root'"), + "packet command should include the concrete project root: {command}" + ); + } + + #[test] + fn packet_anchor_probe_limit_hard_stops_after_sla_exhaustion() { + let budget = PacketLatencyBudget { + started_at: Instant::now() - std::time::Duration::from_secs(30), + target_ms: 1_000, + }; + assert_eq!( + packet_anchor_probe_limit_for_budget(PacketBudgetModeDto::Compact, budget, 1_500), + 0 + ); + } + + #[test] + fn packet_anchor_probe_limit_reduces_when_budget_half_consumed() { + let budget = PacketLatencyBudget { + started_at: Instant::now(), + target_ms: 10_000, }; + assert_eq!( + packet_anchor_probe_limit_for_budget(PacketBudgetModeDto::Compact, budget, 5_500), + 14 + ); + assert_eq!( + packet_anchor_probe_limit_for_budget(PacketBudgetModeDto::Compact, budget, 8_000), + 7 + ); + } - let claims = packet_supported_claims(&answer); - let text = claims - .iter() - .map(|claim| claim.claim.as_str()) - .collect::>() - .join("\n"); + #[test] + fn merged_packet_latency_recomputes_sla_against_packet_budget() { + let mut answer = packet_answer_fixture( + "Explain the packet latency budget.", + vec![ + test_packet_citation("A", "src/a.rs", 0.8), + test_packet_citation("B", "src/b.rs", 0.8), + test_packet_citation("C", "src/c.rs", 0.8), + ], + ); + answer.retrieval_trace.total_latency_ms = 900; + answer.retrieval_trace.sla_missed = false; + answer.retrieval_trace.total_latency_ms = + answer.retrieval_trace.total_latency_ms.saturating_add(250); - assert!(text.contains("`run_exec_session` is defined in cited source")); - assert!(text.contains("`EventProcessorWithJsonOutput` is defined in cited source")); - assert!(text.contains("`ThreadStartParams` is defined in cited source")); + PacketLatencyBudget { + started_at: Instant::now(), + target_ms: 1_000, + } + .apply_to_trace(&mut answer); + + assert_eq!(answer.retrieval_trace.total_latency_ms, 1_150); + assert!(answer.retrieval_trace.sla_missed); + assert_eq!(answer.retrieval_trace.sla_target_ms, Some(1_000)); } #[test] - fn packet_supported_claims_include_indexing_storage_flow_specific_claims() { - let _eval_probes = EvalProbesGuard::enabled(); - let answer = AgentAnswerDto { - answer_id: "indexing-storage-fixture".to_string(), - prompt: "Explain project source-group indexing into storage.".to_string(), - summary: "Indexing and storage evidence is covered.".to_string(), - freshness: None, - sections: Vec::new(), - citations: vec![ - test_packet_citation("Project::buildIndex", "src/lib/project/Project.cpp", 0.8), - test_packet_citation( - "TaskFillIndexerCommandsQueue", - "src/lib/data/indexer/TaskFillIndexerCommandQueue.h", - 0.8, - ), - test_packet_citation( - "SourceGroupCxxCdb", - "src/lib_cxx/project/SourceGroupCxxCdb.cpp", - 0.8, - ), - test_packet_citation( - "IndexerCommandCxx", - "src/lib_cxx/data/indexer/IndexerCommandCxx.h", - 0.8, - ), - test_packet_citation( - "IndexerJava", - "src/lib_java/data/indexer/IndexerJava.cpp", - 0.8, - ), - test_packet_citation("StorageAccess", "src/lib/data/storage/StorageAccess.h", 0.8), - test_packet_citation( - "StorageAccessProxy", - "src/lib/data/storage/StorageAccessProxy.cpp", - 0.8, - ), - test_packet_citation( - "PersistentStorage", - "src/lib/data/storage/PersistentStorage.cpp", - 0.8, - ), - ], - subgraph_ids: Vec::new(), - retrieval_version: "test".to_string(), - graphs: Vec::new(), - retrieval_trace: codestory_contracts::api::AgentRetrievalTraceDto { - request_id: "indexing-storage-fixture".to_string(), - resolved_profile: AgentRetrievalPresetDto::Architecture, - policy_mode: AgentRetrievalPolicyModeDto::LatencyFirst, - total_latency_ms: 1, - sla_target_ms: None, - sla_missed: false, - semantic_fallback_count: 0, - semantic_fallbacks: Vec::new(), - annotations: Vec::new(), - steps: Vec::new(), - retrieval_shadow: None, + fn packet_retrieval_trace_summary_keeps_counters_without_duplicating_full_trace() { + let mut answer = packet_answer_fixture( + "Explain the packet retrieval trace summary.", + vec![test_packet_citation( + "PacketTrace", + "src/packet_trace.rs", + 0.8, + )], + ); + answer.retrieval_trace.total_latency_ms = 42; + answer.retrieval_trace.sla_target_ms = Some(1_000); + answer.retrieval_trace.sla_missed = true; + answer.retrieval_trace.annotations.push( + "large trace annotation should stay only on the canonical answer trace".repeat(8), + ); + answer.retrieval_trace.steps = vec![ + AgentRetrievalStepDto { + kind: AgentRetrievalStepKindDto::Search, + status: AgentRetrievalStepStatusDto::Ok, + duration_ms: 10, + input: Vec::new(), + output: Vec::new(), + message: Some("search details".repeat(16)), }, - }; + AgentRetrievalStepDto { + kind: AgentRetrievalStepKindDto::Trail, + status: AgentRetrievalStepStatusDto::Ok, + duration_ms: 20, + input: Vec::new(), + output: Vec::new(), + message: Some("trail details".repeat(16)), + }, + AgentRetrievalStepDto { + kind: AgentRetrievalStepKindDto::SourceRead, + status: AgentRetrievalStepStatusDto::Ok, + duration_ms: 12, + input: Vec::new(), + output: Vec::new(), + message: Some("source details".repeat(16)), + }, + ]; - let claims = packet_supported_claims(&answer); - let text = claims - .iter() - .map(|claim| claim.claim.as_str()) - .collect::>() - .join("\n"); + let full_trace_bytes = serde_json::to_vec(&answer.retrieval_trace) + .expect("serialize canonical trace") + .len(); + let retrieval_trace_summary = trace_export::packet_retrieval_trace_summary(&answer); + let retrieval_trace_summary_bytes = + serde_json::to_vec(&retrieval_trace_summary.retrieval_trace) + .expect("serialize retrieval trace summary") + .len(); - assert!(text.contains( - "Source-group configuration and indexing command evidence describe how repository configuration becomes indexing work." - )); - assert!(text.contains( - "Persistence/search-projection evidence describes how indexed data remains available to later application reads." - )); - assert!(text.contains("Indexing work queue behavior is anchored by `Project::buildIndex`")); - assert!(text.contains("Source-group configuration is anchored by `SourceGroupCxxCdb`")); - assert!(text.contains("Persistence or search projection is anchored by `StorageAccess`")); + assert_eq!(answer.retrieval_trace.steps.len(), 3); + assert_eq!(retrieval_trace_summary.search_steps, 1); + assert_eq!(retrieval_trace_summary.trail_steps, 1); + assert_eq!(retrieval_trace_summary.source_read_steps, 1); + assert_eq!(retrieval_trace_summary.retrieval_trace.total_latency_ms, 42); + assert_eq!( + retrieval_trace_summary.retrieval_trace.sla_target_ms, + Some(1_000) + ); + assert!(retrieval_trace_summary.retrieval_trace.sla_missed); + assert!(retrieval_trace_summary.retrieval_trace.steps.is_empty()); assert!( - text.contains("Persistence or search projection is anchored by `PersistentStorage`") + retrieval_trace_summary + .retrieval_trace + .annotations + .is_empty() + ); + assert!( + retrieval_trace_summary_bytes < full_trace_bytes / 2, + "retrieval trace summary should stay scalar-sized: {retrieval_trace_summary_bytes} >= {full_trace_bytes}/2" ); } #[test] - fn packet_supported_claims_include_indexing_pipeline_flow_claims() { - let question = "Explain how a full indexing run moves from the CLI into runtime orchestration, file discovery, symbol extraction, persistence, and search or snapshot refresh."; - let answer = packet_answer_fixture( + fn citation_budget_truncation_keeps_sufficient_stop_signal() { + let question = "Explain the compact packet stop rule."; + let mut answer = packet_answer_fixture( question, vec![ - test_packet_citation("CliDirection", "crates/codestory-cli/src/args.rs", 0.8), - test_packet_citation( - "IndexService::run_indexing_blocking_without_runtime_refresh", - "crates/codestory-runtime/src/services.rs", - 0.8, - ), - test_packet_citation( - "Runtime::index_service", - "crates/codestory-runtime/src/lib.rs", - 0.8, - ), - test_packet_citation( - "WorkspaceManifest::build_execution_plan", - "crates/codestory-workspace/src/lib.rs", - 0.8, - ), - test_packet_citation( - "WorkspaceIndexer::run", - "crates/codestory-indexer/src/lib.rs", - 0.8, - ), - test_packet_citation("index_file", "crates/codestory-indexer/src/lib.rs", 0.8), - test_packet_citation( - "Storage::flush_projection_batch", - "crates/codestory-store/src/storage_impl/mod.rs", - 0.8, - ), - test_packet_citation( - "Storage::rebuild_search_symbol_projection_from_node_table", - "crates/codestory-store/src/storage_impl/mod.rs", - 0.8, - ), - test_packet_citation( - "SnapshotStore::refresh_all_with_stats", - "crates/codestory-store/src/snapshot_store.rs", - 0.8, - ), + test_packet_citation("CliCommand", "crates/tool-cli/src/main.rs", 0.8), + test_packet_citation("RuntimeCoordinator", "crates/core/src/runtime.rs", 0.8), + test_packet_citation("WorkspacePlan", "crates/core/src/workspace/plan.rs", 0.8), + test_packet_citation("GraphIndexer", "crates/indexer/src/lib.rs", 0.8), + test_packet_citation("ProjectionStore", "crates/store/src/projection.rs", 0.8), + test_packet_citation("SnapshotRefresh", "crates/store/src/snapshot.rs", 0.8), + test_packet_citation("RouteHandler", "src/routes/user.rs", 0.8), + test_packet_citation("PacketRegression", "tests/packet_flow.rs", 0.8), + test_packet_citation("PacketBudget", "src/packet/budget.rs", 0.8), + test_packet_citation("PacketStopRule", "src/packet/stop_rule.rs", 0.8), + test_packet_citation("PacketClaim", "src/packet/claim.rs", 0.8), + test_packet_citation("PacketFollowUp", "src/packet/follow_up.rs", 0.8), + test_packet_citation("PacketContext", "src/packet/context.rs", 0.8), + test_packet_citation("PacketOutput", "src/packet/output.rs", 0.8), ], ); - - let claims = packet_supported_claims(&answer); - let text = claims - .iter() - .map(|claim| claim.claim.as_str()) - .collect::>() - .join("\n"); - - for expected in [ - "The CLI index command prepares command options and delegates indexing work into the runtime layer.", - "The runtime opens the workspace and store, chooses full or incremental indexing, and coordinates later refresh phases.", - "The workspace crate is responsible for source-file discovery and refresh-plan construction.", - "The indexer extracts nodes, edges, occurrences, and related symbol data from source files.", - "The store persists graph and file data to SQLite and rebuilds query/search projections from persisted data.", - "Snapshot refresh happens after persisted data changes so later grounding and summary reads see current indexed state.", - ] { - assert!( - text.contains(expected), - "indexing pipeline packet claims should include `{expected}`: {text}" - ); - } - } - - #[test] - fn packet_sufficiency_accepts_exact_single_token_index_file_probe() { - let question = "Explain how a full indexing run moves from the CLI into runtime orchestration, file discovery, symbol extraction, persistence, and search or snapshot refresh."; - let (_answer, sufficiency) = build_sufficient_packet_fixture( + let budget = apply_packet_budget( + packet_fixture_project_root(), + question, + PacketTaskClassDto::ArchitectureExplanation, + PacketBudgetModeDto::Compact, + packet_budget_limits(PacketBudgetModeDto::Compact), + &mut answer, + ); + let sufficiency = build_packet_sufficiency( + packet_fixture_project_root(), question, PacketTaskClassDto::ArchitectureExplanation, - vec![ - test_packet_citation("CliDirection", "crates/codestory-cli/src/args.rs", 0.8), - test_packet_citation( - "Runtime::index_service", - "crates/codestory-runtime/src/services.rs", - 0.8, - ), - test_packet_citation( - "index service run indexing", - "crates/codestory-runtime/src/services.rs", - 0.8, - ), - test_packet_citation( - "IndexService::run_indexing_blocking_without_runtime_refresh", - "crates/codestory-runtime/src/services.rs", - 0.8, - ), - test_packet_citation( - "WorkspaceManifest::build_execution_plan", - "crates/codestory-workspace/src/lib.rs", - 0.8, - ), - test_packet_citation( - "symbol extraction indexer", - "crates/codestory-indexer/src/lib.rs", - 0.8, - ), - test_packet_citation( - "WorkspaceIndexer::run", - "crates/codestory-indexer/src/lib.rs", - 0.8, - ), - test_packet_citation("index_file", "crates/codestory-indexer/src/lib.rs", 0.8), - test_packet_citation( - "Storage::flush_projection_batch", - "crates/codestory-store/src/storage_impl/mod.rs", - 0.8, - ), - test_packet_citation( - "Storage::rebuild_search_symbol_projection_from_node_table", - "crates/codestory-store/src/storage_impl/mod.rs", - 0.8, - ), - test_packet_citation( - "storage rebuild search symbol projection", - "crates/codestory-store/src/storage_impl/mod.rs", - 0.8, - ), - test_packet_citation( - "snapshot refresh", - "crates/codestory-store/src/snapshot_store.rs", - 0.8, - ), - test_packet_citation( - "snapshot refresh all stats", - "crates/codestory-store/src/snapshot_store.rs", - 0.8, - ), - ], + &answer, + &budget, ); + assert!( + budget.truncated && budget.omitted_sections.contains(&"citations".to_string()), + "fixture should exercise normal citation budget truncation: {budget:?}" + ); assert_eq!( sufficiency.status, PacketSufficiencyStatusDto::Sufficient, - "{sufficiency:?}" - ); - assert!( - sufficiency - .gaps - .iter() - .all(|gap| !gap.contains("index_file")), - "exact cited index_file should satisfy required probe gaps: {sufficiency:?}" + "budgeted citation clipping should not force broad follow-up when the compact packet still has cited anchors: {sufficiency:?}" ); + assert!(sufficiency.follow_up_commands.is_empty()); + assert_eq!(answer.citations.len(), 13); assert!( - sufficiency - .follow_up_commands - .iter() - .all(|command| !command.contains("index_file")), - "exact cited index_file should not produce follow-up commands: {sufficiency:?}" + sufficiency.gaps.is_empty(), + "normal compact-budget truncation should stay in budget metadata, not sufficiency gaps: {sufficiency:?}" ); + assert!(budget.used.files <= budget.limits.max_files); + assert!(budget.used.output_bytes <= budget.limits.max_output_bytes); } #[test] - fn production_packet_claims_do_not_synthesize_local_real_template_claims() { - let answer = AgentAnswerDto { - answer_id: "indexing-storage-production-fixture".to_string(), - prompt: "Explain project source-group indexing into storage.".to_string(), - summary: "Indexing and storage evidence is covered.".to_string(), - freshness: None, - sections: Vec::new(), - citations: vec![ - test_packet_citation("Project::buildIndex", "src/lib/project/Project.cpp", 0.8), - test_packet_citation( - "SourceGroupCxxCdb", - "src/lib_cxx/project/SourceGroupCxxCdb.cpp", - 0.8, - ), - test_packet_citation( - "IndexerCommandCxx", - "src/lib_cxx/data/indexer/IndexerCommandCxx.h", - 0.8, - ), - test_packet_citation("StorageAccess", "src/lib/data/storage/StorageAccess.h", 0.8), - test_packet_citation( - "PersistentStorage", - "src/lib/data/storage/PersistentStorage.cpp", - 0.8, - ), + fn answer_critical_budget_truncation_requires_deeper_packet() { + let question = "Explain the packet stop rule when evidence is clipped."; + let mut answer = packet_answer_fixture( + question, + vec![ + test_packet_citation("CliCommand", "crates/tool-cli/src/main.rs", 0.8), + test_packet_citation("RuntimeCoordinator", "crates/core/src/runtime.rs", 0.8), + test_packet_citation("WorkspacePlan", "crates/core/src/workspace/plan.rs", 0.8), ], - subgraph_ids: Vec::new(), - retrieval_version: "test".to_string(), - graphs: Vec::new(), - retrieval_trace: codestory_contracts::api::AgentRetrievalTraceDto { - request_id: "indexing-storage-production-fixture".to_string(), - resolved_profile: AgentRetrievalPresetDto::Architecture, - policy_mode: AgentRetrievalPolicyModeDto::LatencyFirst, - total_latency_ms: 1, - sla_target_ms: None, - sla_missed: false, - semantic_fallback_count: 0, - semantic_fallbacks: Vec::new(), - annotations: Vec::new(), - steps: Vec::new(), - retrieval_shadow: None, - }, - }; + ); + let mut budget = apply_packet_budget( + packet_fixture_project_root(), + question, + PacketTaskClassDto::ArchitectureExplanation, + PacketBudgetModeDto::Compact, + packet_budget_limits(PacketBudgetModeDto::Compact), + &mut answer, + ); + budget.truncated = true; + budget.omitted_sections = vec!["markdown_blocks".to_string(), "trail_edges".to_string()]; + budget.next_deeper_command = next_deeper_packet_command( + packet_fixture_project_root(), + question, + PacketBudgetModeDto::Compact, + ); - let claims = packet_supported_claims(&answer); - let text = claims - .iter() - .map(|claim| claim.claim.as_str()) - .collect::>() - .join("\n"); + let sufficiency = build_packet_sufficiency( + packet_fixture_project_root(), + question, + PacketTaskClassDto::ArchitectureExplanation, + &answer, + &budget, + ); + assert_eq!(sufficiency.status, PacketSufficiencyStatusDto::Partial); assert!( - !text.contains("Project::buildIndex builds a per-source-group indexing task pipeline"), - "production claims should not inject Sourcetrail-specific template text: {text}" + sufficiency + .gaps + .iter() + .any(|gap| gap.contains("answer-critical evidence")), + "answer-critical truncation should be named as a sufficiency gap: {sufficiency:?}" ); assert!( - !text.contains("SourceGroupCxxCdb reads compile database input"), - "production claims should not inject Sourcetrail-specific template text: {text}" + sufficiency + .follow_up_commands + .iter() + .any(|command| command.contains("--budget standard")), + "partial packet should recommend the existing deeper packet command: {sufficiency:?}" ); - assert!(text.contains("Indexing work queue behavior is anchored by `Project::buildIndex`")); - assert!(text.contains("Persistence or search projection is anchored by `StorageAccess`")); } #[test] - fn packet_supported_claims_include_vscode_workbench_extension_host_claims() { - let answer = AgentAnswerDto { - answer_id: "vscode-fixture".to_string(), - prompt: "Explain VS Code workbench extension-host command execution.".to_string(), - summary: "VS Code workbench flow evidence is covered.".to_string(), - freshness: None, - sections: Vec::new(), - citations: vec![ - test_packet_citation( - "Workbench.startup", - "src/vs/workbench/browser/workbench.ts", - 0.8, - ), - test_packet_citation( - "ExtensionService", - "src/vs/workbench/services/extensions/browser/extensionService.ts", - 0.8, - ), - test_packet_citation( - "ExtensionHostManager", - "src/vs/workbench/services/extensions/common/extensionHostManager.ts", - 0.8, - ), - test_packet_citation( - "AbstractExtHostExtensionService", - "src/vs/workbench/api/common/extHostExtensionService.ts", - 0.8, - ), - test_packet_citation( - "ExtHostCommands", - "src/vs/workbench/api/common/extHostCommands.ts", - 0.8, - ), - ], - subgraph_ids: Vec::new(), - retrieval_version: "test".to_string(), - graphs: Vec::new(), - retrieval_trace: codestory_contracts::api::AgentRetrievalTraceDto { - request_id: "vscode-fixture".to_string(), - resolved_profile: AgentRetrievalPresetDto::Architecture, - policy_mode: AgentRetrievalPolicyModeDto::LatencyFirst, - total_latency_ms: 1, - sla_target_ms: None, - sla_missed: false, - semantic_fallback_count: 0, - semantic_fallbacks: Vec::new(), - annotations: Vec::new(), - steps: Vec::new(), - retrieval_shadow: None, - }, - }; + fn retrieval_appendix_and_secondary_trail_clipping_can_remain_sufficient() { + fn node(id: &str) -> codestory_contracts::api::GraphNodeDto { + codestory_contracts::api::GraphNodeDto { + id: NodeId(id.to_string()), + label: id.to_string(), + kind: codestory_contracts::api::NodeKind::FUNCTION, + depth: 1, + label_policy: None, + badge_visible_members: None, + badge_total_members: None, + merged_symbol_examples: Vec::new(), + file_path: None, + qualified_name: None, + member_access: None, + } + } + + fn edge(id: &str, source: &str, target: &str) -> codestory_contracts::api::GraphEdgeDto { + codestory_contracts::api::GraphEdgeDto { + id: EdgeId(id.to_string()), + source: NodeId(source.to_string()), + target: NodeId(target.to_string()), + kind: codestory_contracts::api::EdgeKind::CALL, + confidence: None, + certainty: None, + callsite_identity: None, + candidate_targets: Vec::new(), + } + } + let question = "Explain public content flow through Payload."; + let mut answer = packet_answer_fixture( + question, + vec![ + test_packet_citation("Posts", "src/collections/Posts.ts", 0.9), + test_packet_citation( + "getApprovedCommentsForPost", + "src/lib/content-data/comment-content.ts", + 0.9, + ), + test_packet_citation("GET /feed.xml", "src/app/feed.xml/route.ts", 0.9), + ], + ); let claims = packet_supported_claims(&answer); - let text = claims - .iter() - .map(|claim| claim.claim.as_str()) - .collect::>() - .join("\n"); + answer.sections = vec![ + AgentResponseSectionDto { + id: "packet-flow-claims".to_string(), + title: "Packet Claims".to_string(), + blocks: vec![AgentResponseBlockDto::Markdown { + markdown: packet_flow_claims_markdown(&claims), + }], + }, + AgentResponseSectionDto { + id: "retrieval-evidence".to_string(), + title: "Retrieval Evidence".to_string(), + blocks: vec![AgentResponseBlockDto::Markdown { + markdown: format!( + "Search appendix and low-level trace details.{}", + PACKET_MARKDOWN_TRUNCATION_SUFFIX + ), + }], + }, + ]; + answer.graphs.push(GraphArtifactDto::Uml { + id: "primary".to_string(), + title: "Primary Neighborhood".to_string(), + graph: GraphResponse { + center_id: NodeId("post-page".to_string()), + nodes: vec![node("post-page"), node("payload")], + edges: vec![edge("edge_1", "post-page", "payload")], + truncated: false, + omitted_edge_count: 0, + canonical_layout: None, + }, + }); - assert!(text.contains("Runtime orchestration is anchored by `ExtensionService`")); - assert!(text.contains( - "The command or public entrypoint for this flow is anchored by `ExtHostCommands`" - )); - assert!( - text.contains("Source evidence is anchored by") - || text.contains("Runtime orchestration is anchored by"), - "VS Code packet claims should use generic role-led anchors: {text}" + let budget = PacketBudgetDto { + requested: PacketBudgetModeDto::Compact, + limits: packet_budget_limits(PacketBudgetModeDto::Compact), + used: packet_budget_usage(&answer), + truncated: true, + omitted_sections: vec![ + "citations".to_string(), + "markdown_blocks".to_string(), + "trail_edges".to_string(), + ], + next_deeper_command: next_deeper_packet_command( + packet_fixture_project_root(), + question, + PacketBudgetModeDto::Compact, + ), + }; + + let sufficiency = build_packet_sufficiency( + packet_fixture_project_root(), + question, + PacketTaskClassDto::ArchitectureExplanation, + &answer, + &budget, ); + + assert_eq!(sufficiency.status, PacketSufficiencyStatusDto::Sufficient); + assert!(sufficiency.gaps.is_empty()); + assert!(sufficiency.follow_up_commands.is_empty()); + assert!(sufficiency.covered_claims.len() >= 3); } #[test] - fn packet_supported_claims_include_payload_public_content_flow_claims() { - let answer = AgentAnswerDto { - answer_id: "payload-fixture".to_string(), - prompt: "Explain Payload posts comments RSS and Elsewhere feed.".to_string(), - summary: "Payload public content flow evidence is covered.".to_string(), - freshness: None, - sections: Vec::new(), - citations: vec![ - test_packet_citation("buildConfig", "src/payload.config.ts", 0.8), - test_packet_citation("Posts", "src/collections/Posts.ts", 0.8), - test_packet_citation("SocialEntries", "src/collections/SocialEntries.ts", 0.8), - test_packet_citation("PostPage", "src/app/(frontend)/posts/[slug]/page.tsx", 0.8), - test_packet_citation( - "POST /posts/:slug/comments", - "src/app/(frontend)/posts/[slug]/comments/route.ts", - 0.8, - ), - test_packet_citation("GET /feed.xml", "src/app/feed.xml/route.ts", 0.8), - test_packet_citation("getPayloadClient", "src/lib/payload.ts", 0.8), - test_packet_citation( - "getCommentAuthContextFromHeaders", - "src/lib/comment-auth.ts", - 0.8, - ), + fn retained_truncated_trail_edges_can_remain_sufficient() { + fn node(id: &str) -> codestory_contracts::api::GraphNodeDto { + codestory_contracts::api::GraphNodeDto { + id: NodeId(id.to_string()), + label: id.to_string(), + kind: codestory_contracts::api::NodeKind::FUNCTION, + depth: 1, + label_policy: None, + badge_visible_members: None, + badge_total_members: None, + merged_symbol_examples: Vec::new(), + file_path: None, + qualified_name: None, + member_access: None, + } + } + + fn edge(id: &str, source: &str, target: &str) -> codestory_contracts::api::GraphEdgeDto { + codestory_contracts::api::GraphEdgeDto { + id: EdgeId(id.to_string()), + source: NodeId(source.to_string()), + target: NodeId(target.to_string()), + kind: codestory_contracts::api::EdgeKind::CALL, + confidence: None, + certainty: None, + callsite_identity: None, + candidate_targets: Vec::new(), + } + } + + let question = "Explain public content flow through Payload."; + let mut answer = packet_answer_fixture( + question, + vec![ + test_packet_citation("Posts", "src/collections/Posts.ts", 0.9), test_packet_citation( - "getLatestSocialEntries", - "src/lib/content-data/social-entry-content.ts", - 0.8, + "getApprovedCommentsForPost", + "src/lib/content-data/comment-content.ts", + 0.9, ), + test_packet_citation("GET /feed.xml", "src/app/feed.xml/route.ts", 0.9), ], - subgraph_ids: Vec::new(), - retrieval_version: "test".to_string(), - graphs: Vec::new(), - retrieval_trace: codestory_contracts::api::AgentRetrievalTraceDto { - request_id: "payload-fixture".to_string(), - resolved_profile: AgentRetrievalPresetDto::Architecture, - policy_mode: AgentRetrievalPolicyModeDto::LatencyFirst, - total_latency_ms: 1, - sla_target_ms: None, - sla_missed: false, - semantic_fallback_count: 0, - semantic_fallbacks: Vec::new(), - annotations: Vec::new(), - steps: Vec::new(), - retrieval_shadow: None, + ); + answer.graphs.push(GraphArtifactDto::Uml { + id: "primary".to_string(), + title: "Primary Neighborhood".to_string(), + graph: GraphResponse { + center_id: NodeId("session".to_string()), + nodes: vec![node("api"), node("session"), node("adapter")], + edges: vec![ + edge("edge_1", "api", "session"), + edge("edge_2", "session", "adapter"), + ], + truncated: true, + omitted_edge_count: 12, + canonical_layout: None, }, + }); + + let budget = PacketBudgetDto { + requested: PacketBudgetModeDto::Compact, + limits: packet_budget_limits(PacketBudgetModeDto::Compact), + used: packet_budget_usage(&answer), + truncated: true, + omitted_sections: vec!["citations".to_string(), "trail_edges".to_string()], + next_deeper_command: next_deeper_packet_command( + packet_fixture_project_root(), + question, + PacketBudgetModeDto::Compact, + ), }; - let claims = packet_supported_claims(&answer); - let text = claims - .iter() - .map(|claim| claim.claim.as_str()) - .collect::>() - .join("\n"); + let sufficiency = build_packet_sufficiency( + packet_fixture_project_root(), + question, + PacketTaskClassDto::ArchitectureExplanation, + &answer, + &budget, + ); - assert!(text.contains("Collection configuration is anchored by `Posts`")); - assert!(text.contains("Route handling is anchored by `POST /posts/:slug/comments`")); - assert!(text.contains("`getPayloadClient` in `src/lib/payload.ts`")); + assert_eq!( + sufficiency.status, + PacketSufficiencyStatusDto::Sufficient, + "trail clipping should not force deeper packets when graph edges, citations, and claims remain: {sufficiency:?}" + ); + assert!(sufficiency.gaps.is_empty()); + assert!(sufficiency.follow_up_commands.is_empty()); } #[test] - fn packet_ranking_prefers_payload_collections_over_component_and_preview_fillers() { - let question = "Explain how Payload collections, post rendering, comment submission, RSS, and the Elsewhere feed connect."; - let mut answer = AgentAnswerDto { - answer_id: "payload-rank-fixture".to_string(), - prompt: question.to_string(), - summary: "Payload public content flow evidence is covered.".to_string(), - freshness: None, - sections: Vec::new(), - citations: vec![ - test_packet_citation("PostComments", "src/components/PostComments.tsx", 0.55), - test_packet_citation("posts", "src/lib/content-data/preview-content.ts", 0.55), - test_packet_citation("Posts", "src/collections/Posts.ts", 0.8), - test_packet_citation("Comments", "src/collections/Comments.ts", 0.8), - ], - subgraph_ids: Vec::new(), - retrieval_version: "test".to_string(), - graphs: Vec::new(), - retrieval_trace: codestory_contracts::api::AgentRetrievalTraceDto { - request_id: "payload-rank-fixture".to_string(), - resolved_profile: AgentRetrievalPresetDto::Architecture, - policy_mode: AgentRetrievalPolicyModeDto::LatencyFirst, - total_latency_ms: 1, - sla_target_ms: None, - sla_missed: false, - semantic_fallback_count: 0, - semantic_fallbacks: Vec::new(), - annotations: Vec::new(), - steps: Vec::new(), - retrieval_shadow: None, + fn packet_output_budget_measures_serialized_packet_payload() { + let question = "Explain the final packet payload budget."; + let limits = PacketBudgetLimitsDto { + max_anchors: 4, + max_files: 4, + max_snippets: 4, + max_trail_edges: 4, + max_output_bytes: 6 * 1024, + }; + let max_output_bytes = limits.max_output_bytes; + let mut answer = packet_answer_fixture( + question, + vec![test_packet_citation( + "PacketBudget", + "crates/codestory-runtime/src/agent/orchestrator.rs", + 0.8, + )], + ); + if let AgentResponseBlockDto::Markdown { markdown } = &mut answer.sections[0].blocks[0] { + *markdown = "payload budget evidence ".repeat(6000); + } + append_packet_step_trace_annotation(&mut answer); + let budget = apply_packet_budget( + packet_fixture_project_root(), + question, + PacketTaskClassDto::ArchitectureExplanation, + PacketBudgetModeDto::Tiny, + limits, + &mut answer, + ); + let sufficiency = build_packet_sufficiency( + packet_fixture_project_root(), + question, + PacketTaskClassDto::ArchitectureExplanation, + &answer, + &budget, + ); + let retrieval_trace_summary = trace_export::packet_retrieval_trace_summary(&answer); + let mut packet = AgentPacketDto { + packet_id: answer.answer_id.clone(), + question: question.to_string(), + task_class: Some(PacketTaskClassDto::ArchitectureExplanation), + plan: PacketPlanDto { + task_class: PacketTaskClassDto::ArchitectureExplanation, + inferred_task_class: false, + queries: vec![PacketPlanQueryDto { + query: question.to_string(), + purpose: "fixture".to_string(), + }], + trace: Vec::new(), }, + answer, + budget, + sufficiency, + retrieval_trace_summary, }; - rank_packet_evidence(question, &mut answer); - let top_paths = answer - .citations - .iter() - .take(2) - .filter_map(|citation| citation.file_path.as_deref().map(packet_display_path)) - .collect::>(); + enforce_packet_output_budget(packet_fixture_project_root(), &mut packet); - assert_eq!( - top_paths, - vec!["src/collections/Posts.ts", "src/collections/Comments.ts"], - "Payload collection files should outrank nearby rendering/preview fillers: {top_paths:?}" + let serialized_len = serde_json::to_vec(&packet).expect("serialize packet").len(); + assert!( + serialized_len <= max_output_bytes as usize, + "serialized packet should honor max_output_bytes: {serialized_len} > {}", + max_output_bytes + ); + assert_eq!(packet.budget.used.output_bytes as usize, serialized_len); + assert!( + packet + .answer + .retrieval_trace + .annotations + .iter() + .any(|annotation| annotation.starts_with("packet_step_trace ")), + "packet step trace annotation should be present before final budget measurement" + ); + assert!(packet.budget.truncated); + assert!( + packet + .budget + .omitted_sections + .contains(&"markdown_blocks".to_string()) + ); + assert!( + !packet + .budget + .omitted_sections + .contains(&"packet_payload".to_string()) + ); + assert!( + !packet + .sufficiency + .gaps + .iter() + .any(|gap| gap.contains("packet_payload") || gap.contains("output_bytes")), + "sufficiency gaps should be rebuilt after final payload remeasurement clears stale omissions: {:?}", + packet.sufficiency ); } #[test] - fn packet_ranking_demotes_test_paths_without_fixture_specific_boosts() { - let question = "Trace route dispatch through a handler."; - let mut answer = AgentAnswerDto { - answer_id: "rank-fixture".to_string(), - prompt: question.to_string(), - summary: "Route evidence is covered by cited anchors.".to_string(), - freshness: None, - sections: Vec::new(), - citations: vec![ - test_packet_citation("RouteHandler test", "tests/router_handler.rs", 5.0), - test_packet_citation("RouteHandler", "src/router/handler.rs", 0.5), - ], - subgraph_ids: Vec::new(), - retrieval_version: "test".to_string(), - graphs: Vec::new(), - retrieval_trace: codestory_contracts::api::AgentRetrievalTraceDto { - request_id: "rank-fixture".to_string(), - resolved_profile: AgentRetrievalPresetDto::Architecture, - policy_mode: AgentRetrievalPolicyModeDto::LatencyFirst, - total_latency_ms: 1, - sla_target_ms: None, - sla_missed: false, - semantic_fallback_count: 0, - semantic_fallbacks: Vec::new(), - annotations: Vec::new(), - steps: Vec::new(), - retrieval_shadow: None, + fn packet_hard_output_cap_uses_current_usage_not_stale_omissions() { + let limits = PacketBudgetLimitsDto { + max_anchors: 4, + max_files: 4, + max_snippets: 4, + max_trail_edges: 4, + max_output_bytes: 1000, + }; + let mut budget = PacketBudgetDto { + requested: PacketBudgetModeDto::Compact, + limits, + used: PacketBudgetUsageDto { + anchors: 4, + files: 4, + snippets: 0, + trail_edges: 0, + output_bytes: 900, }, + truncated: true, + omitted_sections: vec!["output_bytes".to_string(), "packet_payload".to_string()], + next_deeper_command: None, }; - rank_packet_evidence(question, &mut answer); - assert_eq!(answer.citations[0].display_name, "RouteHandler"); + assert!( + !packet_budget_exceeded_hard_output_cap(&budget), + "stale output_bytes omission should not force followups after final payload fits" + ); + budget.used.output_bytes = 1001; + assert!(packet_budget_exceeded_hard_output_cap(&budget)); } #[test] - fn packet_ranking_demotes_test_named_source_helpers_for_production_prompts() { - let question = "Explain runtime orchestration and search projection in the indexing flow."; - let mut answer = AgentAnswerDto { - answer_id: "rank-test-symbols".to_string(), - prompt: question.to_string(), - summary: "Runtime evidence is covered by cited anchors.".to_string(), - freshness: None, - sections: Vec::new(), - citations: vec![ - test_packet_citation( - "EmbeddingRuntime::test_runtime", - "crates/codestory-runtime/src/search/engine.rs", - 5.0, - ), - test_packet_citation( - "tests::drill_question_search_is_partial_discovery_evidence", - "crates/codestory-cli/src/main.rs", - 5.0, - ), - test_packet_citation( - "IndexService::run_indexing_blocking", - "crates/codestory-runtime/src/services.rs", - 0.5, - ), - ], - subgraph_ids: Vec::new(), - retrieval_version: "test".to_string(), - graphs: Vec::new(), - retrieval_trace: codestory_contracts::api::AgentRetrievalTraceDto { - request_id: "rank-test-symbols".to_string(), - resolved_profile: AgentRetrievalPresetDto::Architecture, - policy_mode: AgentRetrievalPolicyModeDto::LatencyFirst, - total_latency_ms: 1, - sla_target_ms: None, - sla_missed: false, - semantic_fallback_count: 0, - semantic_fallbacks: Vec::new(), - annotations: Vec::new(), - steps: Vec::new(), - retrieval_shadow: None, - }, - }; + fn graph_budget_prunes_nodes_not_referenced_by_retained_edges() { + fn node(id: &str) -> codestory_contracts::api::GraphNodeDto { + codestory_contracts::api::GraphNodeDto { + id: NodeId(id.to_string()), + label: id.to_string(), + kind: codestory_contracts::api::NodeKind::FUNCTION, + depth: 1, + label_policy: None, + badge_visible_members: None, + badge_total_members: None, + merged_symbol_examples: Vec::new(), + file_path: None, + qualified_name: None, + member_access: None, + } + } - rank_packet_evidence(question, &mut answer); + fn edge(id: &str, source: &str, target: &str) -> codestory_contracts::api::GraphEdgeDto { + codestory_contracts::api::GraphEdgeDto { + id: EdgeId(id.to_string()), + source: NodeId(source.to_string()), + target: NodeId(target.to_string()), + kind: codestory_contracts::api::EdgeKind::CALL, + confidence: None, + certainty: None, + callsite_identity: None, + candidate_targets: Vec::new(), + } + } - assert_eq!( - answer.citations[0].display_name, - "IndexService::run_indexing_blocking" - ); - assert_eq!( - packet_evidence_role(&answer.citations[1]), - Some("tests and regression coverage") - ); - assert_eq!( - packet_evidence_role(&answer.citations[2]), - Some("tests and regression coverage") + let mut answer = packet_answer_fixture( + "Explain graph budget trimming.", + vec![test_packet_citation("center", "src/center.rs", 0.9)], ); - } - - #[test] - fn packet_ranking_demotes_non_primary_roles_for_production_prompts() { - let question = "Trace production route dispatch through the handler."; - let mut answer = AgentAnswerDto { - answer_id: "rank-roles".to_string(), - prompt: question.to_string(), - summary: "Route evidence is covered by cited anchors.".to_string(), - freshness: None, - sections: Vec::new(), - citations: vec![ - test_packet_citation("DocsRouteHandler", "docs/routes.md", 5.0), - test_packet_citation("GeneratedRouteHandler", "target/generated/routes.rs", 5.0), - test_packet_citation("VendorRouteHandler", "vendor/router/handler.rs", 5.0), - test_packet_citation("BenchRouteHandler", "benches/router_handler.rs", 5.0), - test_packet_citation("RouteHandler", "src/router/handler.rs", 0.5), - ], - subgraph_ids: Vec::new(), - retrieval_version: "test".to_string(), - graphs: Vec::new(), - retrieval_trace: codestory_contracts::api::AgentRetrievalTraceDto { - request_id: "rank-roles".to_string(), - resolved_profile: AgentRetrievalPresetDto::Architecture, - policy_mode: AgentRetrievalPolicyModeDto::LatencyFirst, - total_latency_ms: 1, - sla_target_ms: None, - sla_missed: false, - semantic_fallback_count: 0, - semantic_fallbacks: Vec::new(), - annotations: Vec::new(), - steps: Vec::new(), - retrieval_shadow: None, + answer.graphs.push(GraphArtifactDto::Uml { + id: "graph".to_string(), + title: "Graph".to_string(), + graph: GraphResponse { + center_id: NodeId("center".to_string()), + nodes: vec![ + node("center"), + node("kept"), + node("dropped_a"), + node("dropped_b"), + ], + edges: vec![ + edge("edge_1", "center", "kept"), + edge("edge_2", "kept", "dropped_a"), + edge("edge_3", "dropped_a", "dropped_b"), + ], + truncated: false, + omitted_edge_count: 0, + canonical_layout: None, }, - }; + }); - rank_packet_evidence(question, &mut answer); - assert_eq!(answer.citations[0].display_name, "RouteHandler"); + let budget = apply_packet_budget( + packet_fixture_project_root(), + "Explain graph budget trimming.", + PacketTaskClassDto::ArchitectureExplanation, + PacketBudgetModeDto::Tiny, + PacketBudgetLimitsDto { + max_trail_edges: 1, + ..packet_budget_limits(PacketBudgetModeDto::Tiny) + }, + &mut answer, + ); + + let GraphArtifactDto::Uml { graph, .. } = &answer.graphs[0] else { + panic!("expected UML graph"); + }; + let node_ids = graph + .nodes + .iter() + .map(|node| node.id.0.as_str()) + .collect::>(); + assert_eq!(graph.edges.len(), 1); + assert_eq!(node_ids, vec!["center", "kept"]); + assert!(graph.truncated); + assert!(budget.omitted_sections.contains(&"trail_edges".to_string())); } #[test] - fn packet_ranking_keeps_requested_docs_role_eligible() { - let question = "Trace the docs route dispatch example."; + fn generic_packet_sections_and_sufficiency_cover_agent_stop_contract() { + let question = "Explain how a command enters runtime orchestration, workspace planning, symbol extraction, persistence, and snapshot refresh."; + let limits = packet_budget_limits(PacketBudgetModeDto::Compact); let mut answer = AgentAnswerDto { - answer_id: "rank-docs".to_string(), + answer_id: "packet-fixture".to_string(), prompt: question.to_string(), - summary: "Route evidence is covered by cited anchors.".to_string(), + summary: "Runtime flow is covered by cited anchors.".to_string(), freshness: None, - sections: Vec::new(), + sections: vec![AgentResponseSectionDto { + id: "answer".to_string(), + title: "Answer".to_string(), + blocks: vec![AgentResponseBlockDto::Markdown { + markdown: "The flow starts at the command surface and proceeds through runtime, workspace, indexer, store, and snapshot layers.".to_string(), + }], + }], citations: vec![ - test_packet_citation("RouteHandler", "src/router/handler.rs", 0.5), - test_packet_citation("DocsRouteHandler", "docs/routes.md", 5.0), + test_packet_citation( + "FlowRegression", + "tests/flow_regression.rs", + 0.5, + ), + test_packet_citation("CliCommand", "crates/app-cli/src/main.rs", 0.2), + test_packet_citation( + "RuntimeCoordinator", + "crates/app-runtime/src/services.rs", + 0.3, + ), + test_packet_citation( + "WorkspacePlan", + "crates/workspace/src/plan.rs", + 0.2, + ), + test_packet_citation( + "GraphIndexer", + "crates/indexer/src/lib.rs", + 0.2, + ), + test_packet_citation( + "ProjectionStore", + "crates/store/src/projection.rs", + 0.2, + ), ], subgraph_ids: Vec::new(), retrieval_version: "test".to_string(), graphs: Vec::new(), retrieval_trace: codestory_contracts::api::AgentRetrievalTraceDto { - request_id: "rank-docs".to_string(), + request_id: "packet-fixture".to_string(), resolved_profile: AgentRetrievalPresetDto::Architecture, policy_mode: AgentRetrievalPolicyModeDto::LatencyFirst, total_latency_ms: 1, @@ -9558,985 +7873,1783 @@ mod tests { semantic_fallbacks: Vec::new(), annotations: Vec::new(), steps: Vec::new(), + packet_sidecar_diagnostics: Vec::new(), retrieval_shadow: None, }, }; rank_packet_evidence(question, &mut answer); - assert_eq!(answer.citations[0].display_name, "DocsRouteHandler"); + append_packet_evidence_sections( + &mut answer, + PacketTaskClassDto::ArchitectureExplanation, + &limits, + ); + let budget = apply_packet_budget( + packet_fixture_project_root(), + question, + PacketTaskClassDto::ArchitectureExplanation, + PacketBudgetModeDto::Compact, + limits, + &mut answer, + ); + let sufficiency = build_packet_sufficiency( + packet_fixture_project_root(), + question, + PacketTaskClassDto::ArchitectureExplanation, + &answer, + &budget, + ); + + assert_eq!(answer.sections[0].id, "packet-evidence-ledger"); + assert_eq!(answer.sections[1].id, "packet-flow-claims"); + let top_anchor_names = answer + .citations + .iter() + .take(4) + .map(|citation| citation.display_name.as_str()) + .collect::>(); + assert!( + top_anchor_names.contains(&"CliCommand"), + "command entrypoint should stay in the high-priority flow anchors: {top_anchor_names:?}" + ); + assert!( + top_anchor_names.contains(&"RuntimeCoordinator"), + "runtime coordination should stay in the high-priority flow anchors: {top_anchor_names:?}" + ); + assert_eq!(sufficiency.status, PacketSufficiencyStatusDto::Sufficient); + assert!(sufficiency.follow_up_commands.is_empty()); + assert!(sufficiency.open_next.is_empty()); + assert!( + sufficiency.covered_claims.iter().any(|claim| claim + .claim + .contains("Runtime orchestration is anchored by `RuntimeCoordinator`")), + "generic packet should include claim-led runtime flow notes: {sufficiency:?}" + ); + assert!( + sufficiency + .avoid_opening + .iter() + .any(|path| path.contains("crates/app-cli/src/main.rs")), + "sufficient packets should tell agents cited files do not need broad re-opening: {sufficiency:?}" + ); + assert!( + sufficiency + .avoid_opening_paths + .iter() + .any(|path| path == "crates/app-cli/src/main.rs"), + "sufficient packets should expose raw cited paths separately from prose: {sufficiency:?}" + ); } #[test] - fn sufficient_packets_stop_broad_exploration_across_task_classes() { - let fixtures = [ - ( - PacketTaskClassDto::ArchitectureExplanation, - "Explain how the command runtime loads a workspace plan and refreshes snapshots.", - vec![ - test_packet_citation("CliCommand", "crates/app-cli/src/main.rs", 0.9), - test_packet_citation( - "RuntimeCoordinator", - "crates/app-runtime/src/runtime.rs", - 0.9, - ), - test_packet_citation("WorkspacePlan", "crates/workspace/src/plan.rs", 0.8), - ], - "Runtime orchestration is anchored by `RuntimeCoordinator`", - "crates/app-runtime/src/runtime.rs", - ), - ( - PacketTaskClassDto::BugLocalization, - "Find the failure handling path for decode validation.", - vec![ - test_packet_citation("RuntimeErrorHandler", "src/runtime/errors.rs", 0.9), - test_packet_citation("DecodeValidator", "src/validation/decode.rs", 0.8), - test_packet_citation("DecodeRegression", "tests/decode_regression.rs", 0.7), - ], - "Runtime orchestration is anchored by `RuntimeErrorHandler`", - "src/runtime/errors.rs", - ), - ( - PacketTaskClassDto::ChangeImpact, - "What changes if reference resolution behavior changes?", - vec![ - test_packet_citation( - "AffectedReferenceIndex", - "crates/indexer/src/references.rs", - 0.9, - ), - test_packet_citation("ReferenceStore", "crates/store/src/references.rs", 0.8), - test_packet_citation( - "ReferenceRegression", - "tests/reference_regression.rs", - 0.7, - ), - ], - "Symbol extraction is anchored by `AffectedReferenceIndex`", - "crates/indexer/src/references.rs", - ), - ( - PacketTaskClassDto::RouteTracing, - "Trace how a request reaches the selected handler.", - vec![ - test_packet_citation("RouteDispatcher", "src/router/dispatch.rs", 0.9), - test_packet_citation("RouteHandler", "src/router/handler.rs", 0.8), - test_packet_citation("RouteRegression", "tests/route_regression.rs", 0.7), - ], - "Route handling is anchored by `RouteHandler`", - "src/router/handler.rs", - ), - ( - PacketTaskClassDto::SymbolOwnership, - "Who owns workspace planning and graph state?", - vec![ - test_packet_citation( - "WorkspaceOwnerPlan", - "crates/workspace/src/ownership.rs", - 0.9, - ), - test_packet_citation("GraphStateStore", "crates/store/src/graph.rs", 0.8), - test_packet_citation( - "OwnershipRegression", - "tests/ownership_regression.rs", - 0.7, - ), - ], - "Workspace discovery or planning is anchored by `WorkspaceOwnerPlan`", - "crates/workspace/src/ownership.rs", - ), - ( - PacketTaskClassDto::EditPlanning, - "Plan the focused edit for configuration validation behavior.", - vec![ - test_packet_citation("ConfigValidator", "src/config/validator.rs", 0.9), - test_packet_citation("ConfigEditPlan", "src/config/edit_plan.rs", 0.8), - test_packet_citation("ConfigRegression", "tests/config_regression.rs", 0.7), - ], - "Regression coverage for this flow is anchored by `ConfigRegression`", - "tests/config_regression.rs", - ), - ]; - - for (task_class, question, citations, expected_claim, avoid_path) in fixtures { - let (_answer, sufficiency) = - build_sufficient_packet_fixture(question, task_class, citations); + fn packet_plan_adds_prepared_session_adapter_exact_probes() { + let _eval_probes = EvalProbesGuard::enabled(); + let question = "Explain how Requests turns a top-level request call into a prepared request and sends it through a session adapter."; + let plan = build_packet_plan( + question, + Some(PacketTaskClassDto::ArchitectureExplanation), + PacketBudgetModeDto::Compact, + ); + let queries = plan + .queries + .iter() + .map(|query| query.query.as_str()) + .collect::>(); + let required = packet_sufficiency_required_probe_queries( + question, + PacketTaskClassDto::ArchitectureExplanation, + ); - assert_eq!( - sufficiency.status, - PacketSufficiencyStatusDto::Sufficient, - "task class {task_class:?} should be sufficient: {sufficiency:?}" + for expected in [ + "Session.request", + "Session.prepare_request", + "PreparedRequest.prepare", + "Session.send", + "HTTPAdapter.send", + ] { + assert!( + queries.contains(&expected), + "packet plan should include exact Requests flow probe `{expected}` in {queries:?}" ); assert!( - sufficiency.follow_up_commands.is_empty(), - "sufficient {task_class:?} packets should not recommend broad follow-up commands: {sufficiency:?}" + required.iter().any(|query| query == expected), + "packet required probes should protect exact Requests flow probe `{expected}` in {required:?}" ); + } + } + + #[test] + fn packet_plan_keeps_requests_and_express_exact_probes_eval_only() { + let _env = EnvVarGuard::cleared(EVAL_PROBES_ENV); + let requests_question = "Explain how Requests turns a top-level request call into a prepared request and sends it through a session adapter."; + let requests_plan = build_packet_plan( + requests_question, + Some(PacketTaskClassDto::ArchitectureExplanation), + PacketBudgetModeDto::Compact, + ); + let requests_queries = requests_plan + .queries + .iter() + .map(|query| query.query.as_str()) + .collect::>(); + let requests_required = packet_sufficiency_required_probe_queries( + requests_question, + PacketTaskClassDto::ArchitectureExplanation, + ); + + for generic_probe in [ + "request preparation", + "session request", + "session send", + "adapter send", + "adapter selection", + ] { assert!( - sufficiency.open_next.is_empty(), - "sufficient {task_class:?} packets should not name generic open-next work: {sufficiency:?}" + requests_queries.contains(&generic_probe) + || requests_required.iter().any(|query| query == generic_probe), + "production plan should keep generic request/session probe `{generic_probe}`; queries={requests_queries:?} required={requests_required:?}" ); + } + for eval_only_probe in [ + "Session.request", + "Session.prepare_request", + "PreparedRequest.prepare", + "Session.send", + "HTTPAdapter.send", + ] { assert!( - sufficiency - .covered_claims - .iter() - .any(|claim| claim.claim.contains(expected_claim)), - "sufficient {task_class:?} packet should name the covered task claim `{expected_claim}`: {sufficiency:?}" + !requests_queries.contains(&eval_only_probe) + && !requests_required + .iter() + .any(|query| query == eval_only_probe), + "production plan should not add exact Requests probe `{eval_only_probe}`; queries={requests_queries:?} required={requests_required:?}" ); + } + + let express_question = "Trace how Express creates an application, registers middleware/routes, and handles an incoming request through the router and response helpers."; + let express_plan = build_packet_plan( + express_question, + Some(PacketTaskClassDto::RouteTracing), + PacketBudgetModeDto::Compact, + ); + let express_queries = express_plan + .queries + .iter() + .map(|query| query.query.as_str()) + .collect::>(); + let express_required = packet_sufficiency_required_probe_queries( + express_question, + PacketTaskClassDto::RouteTracing, + ); + + for eval_only_probe in [ + "createApplication", + "app.init", + "app.handle", + "app.use", + "app.route", + "res.send", + "application.js app.use", + ] { assert!( - sufficiency - .avoid_opening - .iter() - .any(|entry| entry.contains(avoid_path)), - "sufficient {task_class:?} packet should discourage reopening cited path `{avoid_path}`: {sufficiency:?}" + !express_queries.contains(&eval_only_probe) + && !express_required + .iter() + .any(|query| query == eval_only_probe), + "production plan should not add exact Express probe `{eval_only_probe}`; queries={express_queries:?} required={express_required:?}" ); } } #[test] - fn architecture_sufficiency_requires_minimum_distinct_claim_families() { - let question = "Explain how project indexing reaches persistent storage."; - let citations = vec![ - test_packet_citation("Project::buildIndex", "src/lib/project/Project.cpp", 0.9), - test_packet_citation( - "TaskBuildIndex", - "src/lib/data/indexer/TaskBuildIndex.cpp", - 0.85, - ), - test_packet_citation( - "TaskFillIndexerCommandsQueue", - "src/lib/data/indexer/TaskFillIndexerCommandQueue.h", - 0.8, - ), - ]; - let (_answer, sufficiency) = build_sufficient_packet_fixture( + fn packet_plan_derives_java_string_check_symbol_probes() { + let _eval_probes = EvalProbesGuard::enabled(); + let question = "Explain how Commons Lang implements blank, empty, and case-sensitive string checks across StringUtils, Strings, and CharSequenceUtils. Cite the source files and name the supporting symbols."; + let plan = build_packet_plan( question, - PacketTaskClassDto::ArchitectureExplanation, - citations, - ); - assert_eq!( - sufficiency.status, - PacketSufficiencyStatusDto::Partial, - "duplicate claim families should not satisfy architecture packets: {sufficiency:?}" + Some(PacketTaskClassDto::ArchitectureExplanation), + PacketBudgetModeDto::Compact, ); - assert!( - sufficiency - .gaps - .iter() - .any(|gap| gap.contains("claim families")), - "architecture sufficiency should explain missing claim-family coverage: {sufficiency:?}" + let queries = plan + .queries + .iter() + .map(|query| query.query.as_str()) + .collect::>(); + let required = packet_sufficiency_required_probe_queries( + question, + PacketTaskClassDto::ArchitectureExplanation, ); + + for expected in [ + "StringUtils", + "StringUtils.isBlank", + "StringUtils.isEmpty", + "Strings.CS", + "Strings.CI", + "CharSequenceUtils", + "CharSequenceUtils.regionMatches", + ] { + assert!( + queries.contains(&expected), + "packet plan should include Java string probe `{expected}` in {queries:?}" + ); + assert!( + required.iter().any(|query| query == expected), + "packet required probes should protect Java string probe `{expected}` in {required:?}" + ); + } + + for expected_file_probe in ["StringUtils.java", "Strings.java", "CharSequenceUtils.java"] { + assert!( + queries.contains(&expected_file_probe), + "packet plan should include generic file probe `{expected_file_probe}` in {queries:?}" + ); + } } #[test] - fn partial_and_insufficient_packets_recommend_targeted_followups() { - let question = "Explain route dispatch with enough evidence to stop."; - let mut partial_answer = packet_answer_fixture( + fn packet_plan_keeps_literal_symbols_without_eval_family_expansion() { + let _env = EnvVarGuard::cleared(EVAL_PROBES_ENV); + let question = "Explain how Commons Lang implements blank, empty, and case-sensitive string checks across StringUtils, Strings, and CharSequenceUtils. Cite the source files and name the supporting symbols."; + let plan = build_packet_plan( question, - vec![test_packet_citation( - "RouteDispatcher", - "src/router/dispatch.rs", - 0.8, - )], + Some(PacketTaskClassDto::ArchitectureExplanation), + PacketBudgetModeDto::Compact, ); - let mut budget = apply_packet_budget( - packet_fixture_project_root(), + let queries = plan + .queries + .iter() + .map(|query| query.query.as_str()) + .collect::>(); + + for literal_symbol in ["StringUtils", "Strings", "CharSequenceUtils"] { + assert!( + queries.contains(&literal_symbol), + "production packet plan should keep literal prompt symbol `{literal_symbol}` in {queries:?}" + ); + } + for eval_only_probe in [ + "StringUtils.isBlank", + "StringUtils.isEmpty", + "StringUtils.java", + "Strings.java", + "CharSequenceUtils.java", + ] { + assert!( + !queries.contains(&eval_only_probe), + "production packet plan should not add eval-only family probe `{eval_only_probe}` in {queries:?}" + ); + } + } + + #[test] + fn packet_plan_derives_swr_hook_flow_symbol_probes() { + let _eval_probes = EvalProbesGuard::enabled(); + let question = "Explain how SWR exposes useSWR, serializes keys, connects cache helpers, and routes mutate behavior through the internal mutation helper. Cite the source files and name the supporting symbols."; + let plan = build_packet_plan( question, - PacketTaskClassDto::RouteTracing, - PacketBudgetModeDto::Tiny, - packet_budget_limits(PacketBudgetModeDto::Tiny), - &mut partial_answer, + Some(PacketTaskClassDto::ArchitectureExplanation), + PacketBudgetModeDto::Compact, ); - budget.truncated = true; - budget.omitted_sections = vec!["output_bytes".to_string()]; - let partial = build_packet_sufficiency( - packet_fixture_project_root(), + let queries = plan + .queries + .iter() + .map(|query| query.query.as_str()) + .collect::>(); + let required = packet_sufficiency_required_probe_queries( question, - PacketTaskClassDto::RouteTracing, - &partial_answer, - &budget, + PacketTaskClassDto::ArchitectureExplanation, ); - assert_eq!(partial.status, PacketSufficiencyStatusDto::Partial); - assert!( - partial - .follow_up_commands - .iter() - .any(|command| command.contains("--budget compact")), - "partial packets should recommend the next deeper packet command: {partial:?}" - ); - assert!( - partial - .follow_up_commands - .iter() - .any(|command| command.contains("codestory-cli search")), - "partial packets should recommend targeted CodeStory search, not broad source reads: {partial:?}" - ); - assert!( - partial - .follow_up_commands - .iter() - .all(|command| !command.contains("")), - "partial packet follow-up commands should be directly runnable: {partial:?}" - ); - assert!( - partial - .follow_up_commands - .iter() - .all(|command| command.contains("--project 'C:/workspace/project root'")), - "partial packet follow-up commands should include the concrete project root: {partial:?}" - ); + for expected in [ + "useSWR", + "useSWRHandler", + "withArgs", + "withMiddleware", + "serialize", + "createCacheHelper", + "internalMutate", + ] { + assert!( + queries.contains(&expected), + "packet plan should include SWR flow probe `{expected}` in {queries:?}" + ); + assert!( + required.iter().any(|query| query == expected), + "packet required probes should protect SWR flow probe `{expected}` in {required:?}" + ); + } - let mut weak_answer = packet_answer_fixture( - question, - vec![test_packet_citation( - "RouteDispatcher", - "src/router/dispatch.rs", - 0.8, - )], - ); - let weak_budget = apply_packet_budget( - packet_fixture_project_root(), + for expected_file_probe in [ + "index.ts useSWR", + "use-swr.ts useSWRHandler", + "serialize.ts", + "helper.ts createCacheHelper", + "mutate.ts internalMutate", + "with-middleware.ts withMiddleware", + ] { + assert!( + queries.contains(&expected_file_probe), + "packet plan should include SWR file probe `{expected_file_probe}` in {queries:?}" + ); + } + } + + #[test] + fn packet_plan_derives_gin_route_dispatch_symbol_probes() { + let _eval_probes = EvalProbesGuard::enabled(); + let question = "Trace how Gin creates an engine, registers routes through router groups, stores them in method trees, and dispatches handlers for a request. Cite the source files and name the supporting symbols."; + let plan = build_packet_plan( question, - PacketTaskClassDto::RouteTracing, + Some(PacketTaskClassDto::RouteTracing), PacketBudgetModeDto::Compact, - packet_budget_limits(PacketBudgetModeDto::Compact), - &mut weak_answer, - ); - let weak = build_packet_sufficiency( - packet_fixture_project_root(), - question, - PacketTaskClassDto::RouteTracing, - &weak_answer, - &weak_budget, - ); - assert_eq!(weak.status, PacketSufficiencyStatusDto::Partial); - assert!( - weak.gaps - .iter() - .any(|gap| gap.contains("at least 3 are required")), - "single-citation route packets should name the coverage gap: {weak:?}" ); + let queries = plan + .queries + .iter() + .map(|query| query.query.as_str()) + .collect::>(); + let required = + packet_sufficiency_required_probe_queries(question, PacketTaskClassDto::RouteTracing); - let mut empty_answer = packet_answer_fixture(question, Vec::new()); - let empty_budget = apply_packet_budget( - packet_fixture_project_root(), + for expected in [ + "gin.go New", + "gin.go Default", + "routergroup.go RouterGroup.Handle", + "gin.go Engine.addRoute", + "tree.go node.addRoute", + "gin.go Engine.handleHTTPRequest", + "context.go Context.Next", + ] { + assert!( + queries.contains(&expected), + "packet plan should include Gin route probe `{expected}` in {queries:?}" + ); + assert!( + required.iter().any(|query| query == expected), + "packet required probes should protect Gin route probe `{expected}` in {required:?}" + ); + } + + for client_probe in ["request interceptor", "transport adapter"] { + assert!( + !required.iter().any(|query| query == client_probe), + "server route tracing should not require client transport probe `{client_probe}` in {required:?}" + ); + } + + for expected_file_probe in [ + "gin.go New", + "gin.go Default", + "gin.go Engine.addRoute", + "gin.go Engine.handleHTTPRequest", + "routergroup.go RouterGroup.Handle", + "tree.go node.addRoute", + "context.go Context.Next", + ] { + assert!( + queries.contains(&expected_file_probe), + "packet plan should include Gin file probe `{expected_file_probe}` in {queries:?}" + ); + } + } + + #[test] + fn packet_plan_derives_css_animation_symbol_probes() { + let _eval_probes = EvalProbesGuard::enabled(); + let question = "Explain how animate.css defines shared animation variables/base classes and connects named animation classes to keyframes. Cite the source files and name the supporting selectors or keyframes."; + let plan = build_packet_plan( question, - PacketTaskClassDto::RouteTracing, + Some(PacketTaskClassDto::ArchitectureExplanation), PacketBudgetModeDto::Compact, - packet_budget_limits(PacketBudgetModeDto::Compact), - &mut empty_answer, ); - let insufficient = build_packet_sufficiency( - packet_fixture_project_root(), + let queries = plan + .queries + .iter() + .map(|query| query.query.as_str()) + .collect::>(); + let required = packet_sufficiency_required_probe_queries( question, - PacketTaskClassDto::RouteTracing, - &empty_answer, - &empty_budget, + PacketTaskClassDto::ArchitectureExplanation, ); - assert_eq!( - insufficient.status, - PacketSufficiencyStatusDto::Insufficient - ); - assert!( - insufficient - .follow_up_commands - .iter() - .any(|command| command.contains("codestory-cli index")), - "insufficient packets should recommend indexing before broad exploration: {insufficient:?}" - ); - assert!( - insufficient - .follow_up_commands - .iter() - .any(|command| command.contains("codestory-cli search") - && command.contains("--why") - && !command.contains("--repo-text on")), - "insufficient packets should recommend sidecar-primary search diagnostics: {insufficient:?}" - ); + for expected in [ + "source/_vars.css", + "source/_base.css", + "source/animate.css", + "source/attention_seekers/bounce.css bounce", + "source/attention_seekers/flash.css flash", + ] { + assert!( + queries.contains(&expected), + "packet plan should include CSS animation probe `{expected}` in {queries:?}" + ); + assert!( + required.iter().any(|query| query == expected), + "packet required probes should protect CSS animation probe `{expected}` in {required:?}" + ); + } } - #[test] - fn packet_follow_up_commands_single_quote_shell_sensitive_questions() { - let question = "Inspect $env:SECRET and $(Get-ChildItem) and 'literal'"; - let quoted = quote_packet_command_value(question); - - assert_eq!( - quoted, - "'Inspect $env:SECRET and $(Get-ChildItem) and ''literal'''" - ); - let command = next_deeper_packet_command( - packet_fixture_project_root(), + fn packet_plan_derives_automapper_map_flow_symbol_probes() { + let _eval_probes = EvalProbesGuard::enabled(); + let question = "Explain how AutoMapper configuration and runtime mapper APIs cooperate to map source objects to destination objects. Cite the source files and name the supporting symbols."; + let plan = build_packet_plan( question, - PacketBudgetModeDto::Tiny, - ) - .expect("tiny packet should have deeper command"); - assert!( - command.contains("--question 'Inspect $env:SECRET and $(Get-ChildItem)"), - "packet command should single-quote shell-sensitive question text: {command}" + Some(PacketTaskClassDto::ArchitectureExplanation), + PacketBudgetModeDto::Compact, ); - assert!( - command.contains("--project 'C:/workspace/project root'"), - "packet command should include the concrete project root: {command}" + let queries = plan + .queries + .iter() + .map(|query| query.query.as_str()) + .collect::>(); + let required = packet_sufficiency_required_probe_queries( + question, + PacketTaskClassDto::ArchitectureExplanation, ); - } + for expected in [ + "src/AutoMapper/Mapper.cs IMapperBase", + "src/AutoMapper/Mapper.cs IMapper", + "src/AutoMapper/Mapper.cs Mapper", + "src/AutoMapper/Mapper.cs Mapper.Map", + "src/AutoMapper/Configuration/MapperConfiguration.cs MapperConfiguration", + "src/AutoMapper/TypeMap.cs TypeMap.CreateMapperLambda", + "src/AutoMapper/Execution/TypeMapPlanBuilder.cs TypeMapPlanBuilder", + "TypeMapPlanBuilder.CreateMapperLambda", + ] { + assert!( + queries.contains(&expected), + "packet plan should include AutoMapper probe `{expected}` in {queries:?}" + ); + assert!( + required.iter().any(|query| query == expected), + "packet required probes should protect AutoMapper probe `{expected}` in {required:?}" + ); + } + } #[test] - fn packet_anchor_probe_limit_hard_stops_after_sla_exhaustion() { - let budget = PacketLatencyBudget { - started_at: Instant::now() - std::time::Duration::from_secs(30), - target_ms: 1_000, - }; - assert_eq!( - packet_anchor_probe_limit_for_budget(PacketBudgetModeDto::Compact, budget, 1_500), - 0 - ); + fn gin_route_dispatch_source_claims_name_registration_and_context_flow() { + let _eval_probes = EvalProbesGuard::enabled(); + let prompt = "Trace how Gin creates an engine, registers routes through router groups, stores them in method trees, and dispatches handlers for a request."; + let fixtures = [ + ( + "RouterGroup.Handle", + "routergroup.go", + r#" + func (group *RouterGroup) handle(httpMethod, relativePath string, handlers HandlersChain) IRoutes { + absolutePath := group.calculateAbsolutePath(relativePath) + handlers = group.combineHandlers(handlers) + group.engine.addRoute(httpMethod, absolutePath, handlers) + return group.returnObj() + } + func (group *RouterGroup) Handle(httpMethod, relativePath string, handlers ...HandlerFunc) IRoutes { + return group.handle(httpMethod, relativePath, handlers) + } + "#, + "RouterGroup.Handle registers routes by delegating to the group handle path.", + ), + ( + "Engine.addRoute", + "gin.go", + r#" + func (engine *Engine) addRoute(method, path string, handlers HandlersChain) { + root := engine.trees.get(method) + if root == nil { + root = new(node) + engine.trees = append(engine.trees, methodTree{method: method, root: root}) + } + root.addRoute(path, handlers) + } + "#, + "Engine.addRoute inserts handlers into the per-method route tree.", + ), + ( + "Engine.handleHTTPRequest", + "gin.go", + r#" + func (engine *Engine) handleHTTPRequest(c *Context) { + value := root.getValue(rPath, c.params, c.skippedNodes, unescape) + if value.handlers != nil { + c.handlers = value.handlers + c.fullPath = value.fullPath + c.Next() + } + } + "#, + "Engine.handleHTTPRequest finds a route and installs handlers on the context.", + ), + ( + "Context.Next", + "context.go", + r#" + func (c *Context) Next() { + c.index++ + for c.index < safeInt8(len(c.handlers)) { + if c.handlers[c.index] != nil { + c.handlers[c.index](c) + } + c.index++ + } + } + "#, + "Context.Next advances through the handler chain.", + ), + ]; + + for (symbol, path, source, expected) in fixtures { + let citation = test_packet_citation(symbol, path, 0.9); + let claims = packet_source_derived_claims_for_citation(prompt, &citation, source); + assert!( + claims.iter().any(|claim| claim == expected), + "expected source-derived Gin claim `{expected}` for {path}; got {claims:?}" + ); + } } #[test] - fn packet_anchor_probe_limit_reduces_when_budget_half_consumed() { - let budget = PacketLatencyBudget { - started_at: Instant::now(), - target_ms: 10_000, - }; - assert_eq!( - packet_anchor_probe_limit_for_budget(PacketBudgetModeDto::Compact, budget, 5_500), - 14 - ); - assert_eq!( - packet_anchor_probe_limit_for_budget(PacketBudgetModeDto::Compact, budget, 8_000), - 7 - ); + fn server_route_source_claims_survive_with_eval_probes() { + let _eval_probes = EvalProbesGuard::enabled(); + let prompt = "Trace how a router group registers routes and dispatches handlers for an HTTP request."; + let fixtures = [ + ( + "RouterGroup.Handle", + "routergroup.go", + r#" + func (group *RouterGroup) Handle(httpMethod, relativePath string, handlers ...HandlerFunc) IRoutes { + if matched := regEnLetter.MatchString(httpMethod); !matched { + panic("http method is not valid") + } + return group.handle(httpMethod, relativePath, handlers) + } + "#, + "RouterGroup.Handle registers routes by delegating to the group handle path.", + ), + ( + "Context.Next", + "context.go", + r#" + func (c *Context) Next() { + c.index++ + for c.index < safeInt8(len(c.handlers)) { + if c.handlers[c.index] != nil { + c.handlers[c.index](c) + } + c.index++ + } + } + "#, + "Context.Next advances through the handler chain.", + ), + ]; + + for (symbol, path, source, expected) in fixtures { + let citation = test_packet_citation(symbol, path, 0.9); + let claims = packet_source_derived_claims_for_citation(prompt, &citation, source); + assert!( + claims.iter().any(|claim| claim == expected), + "expected eval-only server-route claim `{expected}` for {path}; got {claims:?}" + ); + } } #[test] - fn merged_packet_latency_recomputes_sla_against_packet_budget() { - let mut answer = packet_answer_fixture( - "Explain the packet latency budget.", - vec![ - test_packet_citation("A", "src/a.rs", 0.8), - test_packet_citation("B", "src/b.rs", 0.8), - test_packet_citation("C", "src/c.rs", 0.8), - ], - ); - answer.retrieval_trace.total_latency_ms = 900; - answer.retrieval_trace.sla_missed = false; - answer.retrieval_trace.total_latency_ms = - answer.retrieval_trace.total_latency_ms.saturating_add(250); + fn express_shape_route_claims_survive_with_eval_probes() { + let _eval_probes = EvalProbesGuard::enabled(); + let prompt = "Trace how Express creates an app, registers middleware and routes, handles an incoming request, and sends a response."; - PacketLatencyBudget { - started_at: Instant::now(), - target_ms: 1_000, - } - .apply_to_trace(&mut answer); + let fixtures = [ + ( + "createApplication", + "lib/express.js", + r#" + function createApplication() { + var app = function(req, res, next) { app.handle(req, res, next); }; + mixin(app, proto, false); + app.request = Object.create(req); + app.response = Object.create(res); + app.init(); + return app; + } + "#, + "createApplication builds a callable app object and mixes in request and response prototypes.", + ), + ( + "application", + "lib/application.js", + r#" + app.init = function init() { + this.defaultConfiguration(); + var router = new Router({}); + }; + + app.handle = function handle(req, res, callback) { + this.router.handle(req, res, done); + }; + + app.use = function use(fn) { + return router.use(path, fn); + }; + + app.route = function route(path) { + return this.router.route(path); + }; + "#, + "app.init creates application state and lazy router configuration.", + ), + ( + "response", + "lib/response.js", + r#" + res.send = function send(body) { + this.set('Content-Length', len); + return this.end(chunk, encoding); + }; + "#, + "res.send prepares and sends the response body.", + ), + ]; - assert_eq!(answer.retrieval_trace.total_latency_ms, 1_150); - assert!(answer.retrieval_trace.sla_missed); - assert_eq!(answer.retrieval_trace.sla_target_ms, Some(1_000)); + for (symbol, path, source, expected) in fixtures { + let citation = test_packet_citation(symbol, path, 0.9); + let claims = packet_source_derived_claims_for_citation(prompt, &citation, source); + assert!( + claims.iter().any(|claim| claim == expected), + "expected application-route claim `{expected}` for {path}; got {claims:?}" + ); + } } #[test] - fn packet_benchmark_trace_keeps_counters_without_duplicating_full_trace() { - let mut answer = packet_answer_fixture( - "Explain the packet benchmark trace.", - vec![test_packet_citation( - "PacketTrace", - "src/packet_trace.rs", - 0.8, - )], - ); - answer.retrieval_trace.total_latency_ms = 42; - answer.retrieval_trace.sla_target_ms = Some(1_000); - answer.retrieval_trace.sla_missed = true; - answer.retrieval_trace.annotations.push( - "large trace annotation should stay only on the canonical answer trace".repeat(8), + fn shell_version_use_guard_claim_survives_with_generic_claims() { + let prompt = "Trace how a shell version manager install script dispatches use commands and switches versions."; + let citation = test_packet_citation("maybe_switch_if_needed", "tool.sh", 0.9); + let claims = packet_source_derived_claims_for_citation( + prompt, + &citation, + r#" + maybe_switch_if_needed() { + if [ "_${1-}" = "_$(tool_ls_current)" ]; then + return + fi + tool use "$@" + } + "#, ); - answer.retrieval_trace.steps = vec![ - AgentRetrievalStepDto { - kind: AgentRetrievalStepKindDto::Search, - status: AgentRetrievalStepStatusDto::Ok, - duration_ms: 10, - input: Vec::new(), - output: Vec::new(), - message: Some("search details".repeat(16)), - }, - AgentRetrievalStepDto { - kind: AgentRetrievalStepKindDto::Trail, - status: AgentRetrievalStepStatusDto::Ok, - duration_ms: 20, - input: Vec::new(), - output: Vec::new(), - message: Some("trail details".repeat(16)), - }, - AgentRetrievalStepDto { - kind: AgentRetrievalStepKindDto::SourceRead, - status: AgentRetrievalStepStatusDto::Ok, - duration_ms: 12, - input: Vec::new(), - output: Vec::new(), - message: Some("source details".repeat(16)), - }, - ]; - - let full_trace_bytes = serde_json::to_vec(&answer.retrieval_trace) - .expect("serialize canonical trace") - .len(); - let benchmark_trace = packet_benchmark_trace(&answer); - let benchmark_trace_bytes = serde_json::to_vec(&benchmark_trace.retrieval_trace) - .expect("serialize benchmark trace") - .len(); - assert_eq!(answer.retrieval_trace.steps.len(), 3); - assert_eq!(benchmark_trace.search_steps, 1); - assert_eq!(benchmark_trace.trail_steps, 1); - assert_eq!(benchmark_trace.source_read_steps, 1); - assert_eq!(benchmark_trace.retrieval_trace.total_latency_ms, 42); - assert_eq!(benchmark_trace.retrieval_trace.sla_target_ms, Some(1_000)); - assert!(benchmark_trace.retrieval_trace.sla_missed); - assert!(benchmark_trace.retrieval_trace.steps.is_empty()); - assert!(benchmark_trace.retrieval_trace.annotations.is_empty()); + let expected = "maybe_switch_if_needed switches versions only when the requested version is not already active."; assert!( - benchmark_trace_bytes < full_trace_bytes / 2, - "benchmark trace should stay scalar-sized: {benchmark_trace_bytes} >= {full_trace_bytes}/2" + claims.iter().any(|claim| claim == expected), + "expected generic shell version-use claim `{expected}`; got {claims:?}" ); } #[test] - fn citation_budget_truncation_keeps_sufficient_stop_signal() { - let question = "Explain the compact packet stop rule."; - let mut answer = packet_answer_fixture( - question, - vec![ - test_packet_citation("CliCommand", "crates/tool-cli/src/main.rs", 0.8), - test_packet_citation("RuntimeCoordinator", "crates/core/src/runtime.rs", 0.8), - test_packet_citation("WorkspacePlan", "crates/core/src/workspace/plan.rs", 0.8), - test_packet_citation("GraphIndexer", "crates/indexer/src/lib.rs", 0.8), - test_packet_citation("ProjectionStore", "crates/store/src/projection.rs", 0.8), - test_packet_citation("SnapshotRefresh", "crates/store/src/snapshot.rs", 0.8), - test_packet_citation("RouteHandler", "src/routes/user.rs", 0.8), - test_packet_citation("PacketRegression", "tests/packet_flow.rs", 0.8), - test_packet_citation("PacketBudget", "src/packet/budget.rs", 0.8), - test_packet_citation("PacketStopRule", "src/packet/stop_rule.rs", 0.8), - test_packet_citation("PacketClaim", "src/packet/claim.rs", 0.8), - test_packet_citation("PacketFollowUp", "src/packet/follow_up.rs", 0.8), - test_packet_citation("PacketContext", "src/packet/context.rs", 0.8), - test_packet_citation("PacketOutput", "src/packet/output.rs", 0.8), - ], + fn hook_cache_source_claims_survive_with_eval_probes() { + let _eval_probes = EvalProbesGuard::enabled(); + let prompt = "Explain how a public hook serializes keys, connects cache helpers, and routes mutate behavior."; + + let hook = test_packet_citation("useDataHandler", "src/hooks/use-data.ts", 0.9); + let claims = packet_source_derived_claims_for_citation( + prompt, + &hook, + r#" + import { type State, withArgs } from '../_internal' + + export interface FullConfiguration { + fallback: Record> + } + + export const useDataHandler = (_key) => { + const [key, fnArg] = serialize(_key) + return internalMutate(cache, key, fnArg) + } + + const useData = withArgs(useDataHandler) + export default useData + "#, ); - let budget = apply_packet_budget( - packet_fixture_project_root(), - question, - PacketTaskClassDto::ArchitectureExplanation, - PacketBudgetModeDto::Compact, - packet_budget_limits(PacketBudgetModeDto::Compact), - &mut answer, + let expected = + "The public useData export wraps useDataHandler with argument normalization."; + assert!( + claims.iter().any(|claim| claim == expected), + "expected eval-only hook wrapper claim `{expected}`; got {claims:?}" ); - let sufficiency = build_packet_sufficiency( - packet_fixture_project_root(), - question, - PacketTaskClassDto::ArchitectureExplanation, - &answer, - &budget, + assert!( + claims + .iter() + .all(|claim| !claim.contains("public types export wraps thenable")), + "generic hook wrapper claim should come from the withArgs assignment, not imports or unrelated type defaults; got {claims:?}" ); + let helper = test_packet_citation("makeCacheHelper", "src/cache/helper.ts", 0.9); + let claims = packet_source_derived_claims_for_citation( + prompt, + &helper, + r#" + export const makeCacheHelper = (cache, key) => { + return [ + () => cache.get(key), + info => state[5](key, info), + state[6], + () => snapshot.get(key) + ] as const + } + "#, + ); + let expected = "makeCacheHelper provides cache get, set, subscribe, and snapshot helpers."; assert!( - budget.truncated && budget.omitted_sections.contains(&"citations".to_string()), - "fixture should exercise normal citation budget truncation: {budget:?}" + claims.iter().any(|claim| claim == expected), + "expected eval-only cache helper claim `{expected}`; got {claims:?}" ); - assert_eq!( - sufficiency.status, - PacketSufficiencyStatusDto::Sufficient, - "budgeted citation clipping should not force broad follow-up when the compact packet still has cited anchors: {sufficiency:?}" + + let swr_handler = test_packet_citation("useSWRHandler", "src/index/use-swr.ts", 0.9); + let claims = packet_source_derived_claims_for_citation( + prompt, + &swr_handler, + r#" + export const useSWRHandler = (_key, fetcher, config) => { + const [key, fnArg] = serialize(_key) + const [getCache, setCache, subscribeCache, getInitialCache] = + createCacheHelper(cache, key) + const cachedData = getCache() + return { data: cachedData.data, mutate: (...args) => internalMutate(cache, key, ...args) } + } + "#, ); - assert!(sufficiency.follow_up_commands.is_empty()); - assert_eq!(answer.citations.len(), 13); + let expected = "useSWRHandler serializes the key before reading cache state."; assert!( - sufficiency.gaps.is_empty(), - "normal compact-budget truncation should stay in budget metadata, not sufficiency gaps: {sufficiency:?}" + claims.iter().any(|claim| claim == expected), + "expected eval-only SWR key serialization claim `{expected}`; got {claims:?}" ); - assert!(budget.used.files <= budget.limits.max_files); - assert!(budget.used.output_bytes <= budget.limits.max_output_bytes); } #[test] - fn answer_critical_budget_truncation_requires_deeper_packet() { - let question = "Explain the packet stop rule when evidence is clipped."; - let mut answer = packet_answer_fixture( - question, - vec![ - test_packet_citation("CliCommand", "crates/tool-cli/src/main.rs", 0.8), - test_packet_citation("RuntimeCoordinator", "crates/core/src/runtime.rs", 0.8), - test_packet_citation("WorkspacePlan", "crates/core/src/workspace/plan.rs", 0.8), - ], - ); - let mut budget = apply_packet_budget( - packet_fixture_project_root(), - question, - PacketTaskClassDto::ArchitectureExplanation, - PacketBudgetModeDto::Compact, - packet_budget_limits(PacketBudgetModeDto::Compact), - &mut answer, + fn client_send_source_claims_survive_with_eval_probes() { + let _eval_probes = EvalProbesGuard::enabled(); + let prompt = "Explain how a client exposes convenience request helpers and routes send behavior through the transport implementation."; + + let base = test_packet_citation("BaseTransportClient", "src/base_client.dart", 0.9); + let claims = packet_source_derived_claims_for_citation( + prompt, + &base, + r#" + abstract mixin class BaseTransportClient implements Client { + Future get(Uri url) => _sendUnstreamed('GET', url); + Future post(Uri url, {Object? body}) => + _sendUnstreamed('POST', url, body); + + Future send(BaseRequest request); + + Future _sendUnstreamed(String method, Uri url, + [Object? body]) async { + var request = Request(method, url); + return Response.fromStream(await send(request)); + } + } + "#, ); - budget.truncated = true; - budget.omitted_sections = vec!["markdown_blocks".to_string(), "trail_edges".to_string()]; - budget.next_deeper_command = next_deeper_packet_command( - packet_fixture_project_root(), - question, - PacketBudgetModeDto::Compact, + let expected = "BaseTransportClient implements convenience methods in terms of send."; + assert!( + claims.iter().any(|claim| claim == expected), + "expected eval-only client convenience claim `{expected}`; got {claims:?}" ); - let sufficiency = build_packet_sufficiency( - packet_fixture_project_root(), - question, - PacketTaskClassDto::ArchitectureExplanation, - &answer, - &budget, - ); + let native = test_packet_citation("NativeClient", "src/native_client.dart", 0.9); + let claims = packet_source_derived_claims_for_citation( + prompt, + &native, + r#" + import 'dart:io'; - assert_eq!(sufficiency.status, PacketSufficiencyStatusDto::Partial); - assert!( - sufficiency - .gaps - .iter() - .any(|gap| gap.contains("answer-critical evidence")), - "answer-critical truncation should be named as a sufficiency gap: {sufficiency:?}" + class NativeClient extends BaseTransportClient { + HttpClient? _inner; + + Future send(BaseRequest request) async { + var stream = request.finalize(); + var ioRequest = await _inner!.openUrl(request.method, request.url); + final response = await stream.pipe(ioRequest) as HttpClientResponse; + return NativeStreamedResponse(response); + } + } + "#, ); + let expected = + "NativeClient.send forwards finalized requests through an HTTP client transport."; assert!( - sufficiency - .follow_up_commands - .iter() - .any(|command| command.contains("--budget standard")), - "partial packet should recommend the existing deeper packet command: {sufficiency:?}" + claims.iter().any(|claim| claim == expected), + "expected eval-only transport send claim `{expected}`; got {claims:?}" ); } #[test] - fn retrieval_appendix_and_secondary_trail_clipping_can_remain_sufficient() { - fn node(id: &str) -> codestory_contracts::api::GraphNodeDto { - codestory_contracts::api::GraphNodeDto { - id: NodeId(id.to_string()), - label: id.to_string(), - kind: codestory_contracts::api::NodeKind::FUNCTION, - depth: 1, - label_policy: None, - badge_visible_members: None, - badge_total_members: None, - merged_symbol_examples: Vec::new(), - file_path: None, - qualified_name: None, - member_access: None, - } + fn generic_css_animation_source_claims_name_vars_base_and_keyframes() { + let fixtures = [ + ( + "styles/timing.css", + r#" + :root { + --motion-duration: 250ms; + --motion-delay: 75ms; + --motion-repeat: 2; + } + "#, + "Shared CSS custom properties --motion-duration, --motion-delay, and --motion-repeat define animation duration, delay, and repeat defaults.", + ), + ( + "styles/base.css", + r#" + .motion-base { + animation-duration: var(--motion-duration); + animation-fill-mode: both; + } + "#, + ".motion-base is the base class that applies animation duration and fill mode.", + ), + ( + "styles/effects.css", + r#" + @keyframes fade-in { + from { opacity: 0; } + to { opacity: 1; } + } + + .fade-in { + animation-name: fade-in; + } + "#, + "Named classes such as .fade-in set animation-name to matching keyframes; @keyframes fade-in defines the matching animation.", + ), + ]; + + for (path, source, expected) in fixtures { + let claims = packet_generic_css_animation_flow_claims(source); + assert!( + claims.iter().any(|claim| claim == expected), + "expected generic CSS animation claim `{expected}` for {path}; got {claims:?}" + ); } + } - fn edge(id: &str, source: &str, target: &str) -> codestory_contracts::api::GraphEdgeDto { - codestory_contracts::api::GraphEdgeDto { - id: EdgeId(id.to_string()), - source: NodeId(source.to_string()), - target: NodeId(target.to_string()), - kind: codestory_contracts::api::EdgeKind::CALL, - confidence: None, - certainty: None, - callsite_identity: None, - candidate_targets: Vec::new(), - } + #[test] + fn css_animation_source_claims_name_vars_base_imports_and_keyframes() { + let _eval_probes = EvalProbesGuard::enabled(); + let prompt = "Explain how animate.css defines shared animation variables/base classes and connects named animation classes to keyframes."; + let fixtures = [ + ( + "source/_vars.css", + r#" + :root { + --animate-duration: 1s; + --animate-delay: 1s; + --animate-repeat: 1; + } + "#, + "Shared CSS custom properties define animation duration, delay, and repeat defaults.", + ), + ( + "source/_base.css", + r#" + .animated { + animation-duration: var(--animate-duration); + animation-fill-mode: both; + } + "#, + ".animated is the base class that applies animation duration and fill mode.", + ), + ( + "source/animate.css", + r#" + @import '_vars.css'; + @import '_base.css'; + @import 'attention_seekers/bounce.css'; + @import 'attention_seekers/flash.css'; + "#, + "The source/animate.css file imports the variable, base, and individual animation files.", + ), + ( + "source/attention_seekers/bounce.css", + r#" + @keyframes bounce { + from, to { transform: translate3d(0, 0, 0); } + } + .bounce { + animation-name: bounce; + } + "#, + "Named classes such as .bounce set animation-name to matching keyframes.", + ), + ( + "source/attention_seekers/flash.css", + r#" + @keyframes flash { + from, to { opacity: 1; } + } + .flash { + animation-name: flash; + } + "#, + "source/attention_seekers/flash.css defines @keyframes flash and .flash.", + ), + ]; + + for (path, source, expected) in fixtures { + let citation = test_packet_citation(path, path, 0.9); + let claims = packet_source_derived_claims_for_citation(prompt, &citation, source); + assert!( + claims.iter().any(|claim| claim == expected), + "expected CSS animation claim `{expected}` for {path}; got {claims:?}" + ); + } + } + #[test] + fn generic_sql_schema_claims_survive_with_generic_claims() { + let prompt = "Explain SQL schema relationships between artists, albums, tracks, invoices, and invoice lines across seed scripts."; + let citation = test_packet_citation("schema.sql", "db/schema.sql", 0.9); + let claims = packet_source_derived_claims_for_citation( + prompt, + &citation, + r#" + CREATE TABLE [Album] + ( + [AlbumId] INTEGER NOT NULL, + [ArtistId] INTEGER NOT NULL, + FOREIGN KEY ([ArtistId]) REFERENCES [Artist] ([ArtistId]) + ); + CREATE TABLE [Artist] ([ArtistId] INTEGER NOT NULL); + CREATE TABLE [InvoiceLine] + ( + [InvoiceLineId] INTEGER NOT NULL, + [InvoiceId] INTEGER NOT NULL, + [TrackId] INTEGER NOT NULL, + FOREIGN KEY ([InvoiceId]) REFERENCES [Invoice] ([InvoiceId]), + FOREIGN KEY ([TrackId]) REFERENCES [Track] ([TrackId]) + ); + CREATE TABLE [Track] + ( + [TrackId] INTEGER NOT NULL, + [AlbumId] INTEGER, + [GenreId] INTEGER, + [MediaTypeId] INTEGER NOT NULL, + FOREIGN KEY ([AlbumId]) REFERENCES [Album] ([AlbumId]), + FOREIGN KEY ([GenreId]) REFERENCES [Genre] ([GenreId]), + FOREIGN KEY ([MediaTypeId]) REFERENCES [MediaType] ([MediaTypeId]) + ); + "#, + ); + + for expected in [ + "SQL schema defines tables Album, Artist, InvoiceLine, and Track.", + "Album rows reference Artist rows through ArtistId.", + "InvoiceLine rows reference Invoice and Track rows.", + "Track rows reference Album, Genre, and MediaType rows.", + ] { + assert!( + claims.iter().any(|claim| claim == expected), + "expected generic SQL schema claim `{expected}` in {claims:?}" + ); } + } - let question = "Explain public content flow through Payload."; - let mut answer = packet_answer_fixture( - question, - vec![ - test_packet_citation("Posts", "src/collections/Posts.ts", 0.9), - test_packet_citation( - "getApprovedCommentsForPost", - "src/lib/content-data/comment-content.ts", - 0.9, - ), - test_packet_citation("GET /feed.xml", "src/app/feed.xml/route.ts", 0.9), - ], + #[test] + fn runtime_formatting_claims_survive_with_generic_claims() { + let prompt = "Explain how fmt turns formatting arguments into type-erased format args and reaches vformat or format_to output paths."; + + let format_h = test_packet_citation("vformat", "include/fmt/format.h", 0.9); + let claims = packet_source_derived_claims_for_citation( + prompt, + &format_h, + r#" + class format_error : public std::runtime_error {}; + inline auto vformat(locale_ref loc, string_view fmt, format_args args) -> std::string { + detail::buffer buf; + detail::vformat_to(buf, fmt, args, loc); + return to_string(buf); + } + template + auto format_to(OutputIt out, locale_ref loc, format_string fmt, T&&... args) { + return fmt::vformat_to(out, loc, fmt.str, vargs{{args...}}); + } + "#, ); - let claims = packet_supported_claims(&answer); - answer.sections = vec![ - AgentResponseSectionDto { - id: "packet-flow-claims".to_string(), - title: "Packet Claims".to_string(), - blocks: vec![AgentResponseBlockDto::Markdown { - markdown: packet_flow_claims_markdown(&claims), - }], - }, - AgentResponseSectionDto { - id: "retrieval-evidence".to_string(), - title: "Retrieval Evidence".to_string(), - blocks: vec![AgentResponseBlockDto::Markdown { - markdown: format!( - "Search appendix and low-level trace details.{}", - PACKET_MARKDOWN_TRUNCATION_SUFFIX - ), - }], - }, + + for expected in [ + "Runtime formatting uses type-erased format arguments before dispatching formatted output helpers.", + "Formatting errors are represented as runtime failures.", + ] { + assert!( + claims.iter().any(|claim| claim == expected), + "expected runtime formatting claim `{expected}` in {claims:?}" + ); + } + } + + #[test] + fn site_build_claims_survive_with_generic_claims() { + let _eval_probes = EvalProbesGuard::enabled(); + let prompt = "Trace how Jekyll's build command creates a site and runs the read, generate, render, and write phases."; + + let fixtures = [ + ( + "Jekyll::Commands::Build.process", + "lib/jekyll/commands/build.rb", + r#" + module Jekyll + module Commands + class Build + def process(options) + site = Jekyll::Site.new(options) + build(site, options) + end + end + end + end + "#, + "Build.process constructs a Jekyll::Site before running the build.", + ), + ( + "Site#process", + "lib/jekyll/site.rb", + r#" + class Site + def process + reset + read + generate + render + cleanup + write + end + end + "#, + "Site#process runs read, generate, render, and write phases.", + ), + ( + "Reader", + "lib/jekyll/reader.rb", + r#" + class Reader + def read + read_directories + read_data + end + end + "#, + "Reader is responsible for reading site content.", + ), + ( + "Renderer", + "lib/jekyll/renderer.rb", + r#" + class Renderer + def render_document + end + + def render_liquid(content, payload, info, path = nil) + end + end + "#, + "Renderer renders pages and documents.", + ), ]; - answer.graphs.push(GraphArtifactDto::Uml { - id: "primary".to_string(), - title: "Primary Neighborhood".to_string(), - graph: GraphResponse { - center_id: NodeId("post-page".to_string()), - nodes: vec![node("post-page"), node("payload")], - edges: vec![edge("edge_1", "post-page", "payload")], - truncated: false, - omitted_edge_count: 0, - canonical_layout: None, - }, + + for (symbol, path, source, expected) in fixtures { + let citation = test_packet_citation(symbol, path, 0.9); + let claims = packet_source_derived_claims_for_citation(prompt, &citation, source); + assert!( + claims.iter().any(|claim| claim == expected), + "expected site build claim `{expected}` for {path}; got {claims:?}" + ); + } + } + + #[test] + fn generic_sql_schema_file_probe_adds_files_and_source_anchors() { + let root = packet_temp_root("generic-sql-schema"); + let db_dir = root.join("db"); + let _ = std::fs::remove_dir_all(&root); + std::fs::create_dir_all(&db_dir).expect("create sql fixture directory"); + let schema_path = db_dir.join("schema.sql"); + std::fs::write( + &schema_path, + r#" + /***** Create Tables *****/ + CREATE TABLE [Artist] ([ArtistId] INTEGER NOT NULL); + CREATE TABLE [Album] + ( + [AlbumId] INTEGER NOT NULL, + [ArtistId] INTEGER NOT NULL, + FOREIGN KEY ([ArtistId]) REFERENCES [Artist] ([ArtistId]) + ); + CREATE TABLE [Track] + ( + [TrackId] INTEGER NOT NULL, + [AlbumId] INTEGER, + FOREIGN KEY ([AlbumId]) REFERENCES [Album] ([AlbumId]) + ); + "#, + ) + .expect("write sql fixture"); + + let question = "Explain SQL schema relationships between artists, albums, and tracks."; + let mut answer = packet_answer_fixture(question, Vec::new()); + maybe_append_sql_schema_file_citations(&root, question, &mut answer); + + let has_file = answer.citations.iter().any(|citation| { + citation.kind == NodeKind::FILE + && citation.display_name == "db/schema.sql" + && citation + .retrieval_score_breakdown + .as_ref() + .is_some_and(|breakdown| { + breakdown + .provenance + .iter() + .any(|entry| entry == "packet_generic_sql_schema_file_probe") + }) + }); + let has_album_anchor = answer.citations.iter().any(|citation| { + citation.kind == NodeKind::ANNOTATION + && citation.display_name == "CREATE TABLE Album" + && citation.file_path.as_deref().is_some_and(|path| { + packet_display_path(path) + .replace('\\', "/") + .ends_with("db/schema.sql") + }) }); + let has_track_anchor = answer.citations.iter().any(|citation| { + citation.kind == NodeKind::ANNOTATION && citation.display_name == "CREATE TABLE Track" + }); + let has_foreign_key_anchor = answer.citations.iter().any(|citation| { + citation.kind == NodeKind::ANNOTATION + && citation.display_name == "FOREIGN KEY" + && citation + .retrieval_score_breakdown + .as_ref() + .is_some_and(|breakdown| { + breakdown + .provenance + .iter() + .any(|entry| entry == "packet_generic_sql_schema_anchor_probe") + }) + }); + let has_comment_false_positive = answer + .citations + .iter() + .any(|citation| citation.display_name == "CREATE TABLE s"); - let budget = PacketBudgetDto { - requested: PacketBudgetModeDto::Compact, - limits: packet_budget_limits(PacketBudgetModeDto::Compact), - used: packet_budget_usage(&answer), - truncated: true, - omitted_sections: vec![ - "citations".to_string(), - "markdown_blocks".to_string(), - "trail_edges".to_string(), - ], - next_deeper_command: next_deeper_packet_command( - packet_fixture_project_root(), - question, - PacketBudgetModeDto::Compact, - ), - }; + let _ = std::fs::remove_dir_all(&root); - let sufficiency = build_packet_sufficiency( - packet_fixture_project_root(), - question, - PacketTaskClassDto::ArchitectureExplanation, - &answer, - &budget, + assert!( + has_file, + "generic SQL schema probe should append the schema file citation: {:?}", + answer.citations + ); + assert!( + has_album_anchor, + "generic SQL schema probe should append CREATE TABLE anchors: {:?}", + answer.citations + ); + assert!( + has_track_anchor, + "generic SQL schema probe should carry prompt-matched table anchors: {:?}", + answer.citations + ); + assert!( + has_foreign_key_anchor, + "generic SQL schema probe should append FOREIGN KEY anchors: {:?}", + answer.citations + ); + assert!( + !has_comment_false_positive, + "generic SQL schema probe should not parse prose comments as table names: {:?}", + answer.citations ); - - assert_eq!(sufficiency.status, PacketSufficiencyStatusDto::Sufficient); - assert!(sufficiency.gaps.is_empty()); - assert!(sufficiency.follow_up_commands.is_empty()); - assert!(sufficiency.covered_claims.len() >= 3); } #[test] - fn packet_output_budget_measures_serialized_packet_payload() { - let question = "Explain the final packet payload budget."; - let limits = PacketBudgetLimitsDto { - max_anchors: 4, - max_files: 4, - max_snippets: 4, - max_trail_edges: 4, - max_output_bytes: 6 * 1024, - }; - let max_output_bytes = limits.max_output_bytes; - let mut answer = packet_answer_fixture( - question, - vec![test_packet_citation( - "PacketBudget", - "crates/codestory-runtime/src/agent/orchestrator.rs", - 0.8, - )], + fn required_file_scoped_source_probe_adds_method_and_markup_anchors() { + let root = packet_temp_root("required-source-probes"); + let _ = std::fs::remove_dir_all(&root); + write_packet_fixture_file( + &root, + "lib/jekyll/site.rb", + r#" + module Jekyll + class Site + def process + read + render + write + end + end + end + "#, ); - if let AgentResponseBlockDto::Markdown { markdown } = &mut answer.sections[0].blocks[0] { - *markdown = "payload budget evidence ".repeat(6000); - } - let budget = apply_packet_budget( - packet_fixture_project_root(), - question, - PacketTaskClassDto::ArchitectureExplanation, - PacketBudgetModeDto::Tiny, - limits, - &mut answer, + write_packet_fixture_file( + &root, + "src/Logging/Logger.php", + r#" + + + + "#, ); - let benchmark_trace = packet_benchmark_trace(&answer); - let mut packet = AgentPacketDto { - packet_id: answer.answer_id.clone(), - question: question.to_string(), - task_class: Some(PacketTaskClassDto::ArchitectureExplanation), - plan: PacketPlanDto { - task_class: PacketTaskClassDto::ArchitectureExplanation, - inferred_task_class: false, - queries: vec![PacketPlanQueryDto { - query: question.to_string(), - purpose: "fixture".to_string(), - }], - trace: Vec::new(), - }, - answer, - budget, - sufficiency, - benchmark_trace, - }; - enforce_packet_output_budget(packet_fixture_project_root(), &mut packet); + let mut answer = packet_answer_fixture("fixture packet", Vec::new()); + let probes = [ + "lib/jekyll/site.rb Site#process".to_string(), + "src/Logging/Logger.php Logger::addRecord".to_string(), + "html/forms/custom-validation/detailed-custom-validation.html input#mail".to_string(), + "html/forms/custom-validation/detailed-custom-validation.html novalidate".to_string(), + ]; + maybe_append_required_file_scoped_source_citations( + &root, + "fixture packet", + PacketTaskClassDto::DataFlow, + &probes, + &mut answer, + ); + + let has_ruby_method = answer.citations.iter().any(|citation| { + citation.display_name == "Site#process" + && citation.kind == NodeKind::METHOD + && citation.line == Some(4) + }); + let has_php_method = answer.citations.iter().any(|citation| { + citation.display_name == "Logger::addRecord" + && citation.kind == NodeKind::METHOD + && citation + .file_path + .as_deref() + .is_some_and(|path| packet_display_path(path).ends_with("Logger.php")) + }); + let has_input_anchor = answer.citations.iter().any(|citation| { + citation.display_name == "input#mail" && citation.kind == NodeKind::ANNOTATION + }); + let has_boolean_attribute_anchor = answer.citations.iter().any(|citation| { + citation.display_name == "novalidate" && citation.kind == NodeKind::ANNOTATION + }); + let used_source_probe = answer.retrieval_trace.annotations.iter().any(|annotation| { + annotation == "packet_required_file_scoped_source_citations appended=4" + }); + + let _ = std::fs::remove_dir_all(&root); - let serialized_len = serde_json::to_vec(&packet).expect("serialize packet").len(); assert!( - serialized_len <= max_output_bytes as usize, - "serialized packet should honor max_output_bytes: {serialized_len} > {}", - max_output_bytes + has_ruby_method, + "required source probe should append Ruby method anchors: {:?}", + answer.citations ); - assert_eq!(packet.budget.used.output_bytes as usize, serialized_len); - assert!(packet.budget.truncated); assert!( - packet - .budget - .omitted_sections - .contains(&"markdown_blocks".to_string()) + has_php_method, + "required source probe should append PHP method anchors: {:?}", + answer.citations ); assert!( - !packet - .budget - .omitted_sections - .contains(&"packet_payload".to_string()) + has_input_anchor, + "required source probe should append HTML id anchors: {:?}", + answer.citations ); assert!( - !packet - .sufficiency - .gaps - .iter() - .any(|gap| gap.contains("packet_payload") || gap.contains("output_bytes")), - "sufficiency gaps should be rebuilt after final payload remeasurement clears stale omissions: {:?}", - packet.sufficiency + has_boolean_attribute_anchor, + "required source probe should append HTML boolean attribute anchors: {:?}", + answer.citations + ); + assert!( + used_source_probe, + "required source probe should annotate appended anchor count: {:?}", + answer.retrieval_trace.annotations ); } #[test] - fn packet_hard_output_cap_uses_current_usage_not_stale_omissions() { - let limits = PacketBudgetLimitsDto { - max_anchors: 4, - max_files: 4, - max_snippets: 4, - max_trail_edges: 4, - max_output_bytes: 1000, - }; - let mut budget = PacketBudgetDto { - requested: PacketBudgetModeDto::Compact, - limits, - used: PacketBudgetUsageDto { - anchors: 4, - files: 4, - snippets: 0, - trail_edges: 0, - output_bytes: 900, - }, - truncated: true, - omitted_sections: vec!["output_bytes".to_string(), "packet_payload".to_string()], - next_deeper_command: None, - }; + fn automapper_map_flow_source_claims_name_runtime_configuration_and_plans() { + let _eval_probes = EvalProbesGuard::enabled(); + let prompt = "Explain how AutoMapper configuration and runtime mapper APIs cooperate to map source objects to destination objects."; + let fixtures = [ + ( + "MapperConfiguration", + "src/AutoMapper/Configuration/MapperConfiguration.cs", + r#" + public sealed class MapperConfiguration : IGlobalConfiguration + { + private readonly Dictionary _configuredMaps; + private readonly Dictionary _resolvedMaps; + private readonly LockingConcurrentDictionary _executionPlans; + public LambdaExpression BuildExecutionPlan(Type sourceType, Type destinationType) => this.Internal().BuildExecutionPlan(new(new(sourceType, destinationType))); + } + "#, + "MapperConfiguration builds and owns the mapping configuration used at runtime.", + ), + ( + "Mapper.Map", + "src/AutoMapper/Mapper.cs", + r#" + public sealed class Mapper : IMapper, IInternalRuntimeMapper + { + public TDestination Map(object source) => Map(source, default(TDestination)); + public TDestination Map(TSource source, TDestination destination) => + MapCore(source, destination, _defaultContext); + private TDestination MapCore(TSource source, TDestination destination, ResolutionContext context) + { + return _configuration.GetExecutionPlan(mapRequest)(source, destination, context); + } + } + "#, + "Mapper.Map is the public runtime entry point for object mapping.", + ), + ( + "TypeMap.CreateMapperLambda", + "src/AutoMapper/TypeMap.cs", + r#" + internal LambdaExpression CreateMapperLambda(IGlobalConfiguration configuration) => + Types.ContainsGenericParameters ? null : new TypeMapPlanBuilder(configuration, this).CreateMapperLambda(); + "#, + "TypeMap contributes mapper lambda plans used by the execution pipeline.", + ), + ( + "TypeMapPlanBuilder", + "src/AutoMapper/Execution/TypeMapPlanBuilder.cs", + r#" + public LambdaExpression CreateMapperLambda() + { + var createDestinationFunc = CreateDestinationFunc(); + var assignmentFunc = CreateAssignmentFunc(createDestinationFunc); + var mapperFunc = CreateMapperFunc(assignmentFunc); + return Lambda(mapperFunc, GetParameters(second: _initialDestination)); + } + "#, + "TypeMapPlanBuilder participates in building expression plans for mappings.", + ), + ]; - assert!( - !packet_budget_exceeded_hard_output_cap(&budget), - "stale output_bytes omission should not force followups after final payload fits" - ); - budget.used.output_bytes = 1001; - assert!(packet_budget_exceeded_hard_output_cap(&budget)); + for (symbol, path, source, expected) in fixtures { + let citation = test_packet_citation(symbol, path, 0.9); + let claims = packet_source_derived_claims_for_citation(prompt, &citation, source); + assert!( + claims.iter().any(|claim| claim == expected), + "expected AutoMapper claim `{expected}` for {path}; got {claims:?}" + ); + } + } + #[test] + fn express_route_flow_source_claims_name_app_router_response_flow_with_eval_probes() { + let _eval_probes = EvalProbesGuard::enabled(); + let prompt = "Trace how Express creates an application, registers middleware/routes, and handles an incoming request through the router and response helpers."; + let fixtures = [ + ( + "createApplication", + "lib/express.js", + "function createApplication() { var app = function(req, res, next) { app.handle(req, res, next); }; mixin(app, proto, false); app.request = Object.create(req); app.response = Object.create(res); app.init(); return app; }", + "createApplication builds a callable app object and mixes in request and response prototypes.", + ), + ( + "app.handle", + "lib/application.js", + "app.init = function init() { var router = null; this.defaultConfiguration(); router = new Router({}); }\napp.handle = function handle(req, res, callback) { this.router.handle(req, res, done); }\napp.use = function use(fn) { return router.use(path, fn); }\napp.route = function route(path) { return this.router.route(path); }", + "app.handle delegates request handling to the router.", + ), + ( + "app.use", + "lib/application.js", + "app.init = function init() { var router = null; this.defaultConfiguration(); router = new Router({}); }\napp.handle = function handle(req, res, callback) { this.router.handle(req, res, done); }\napp.use = function use(fn) { return router.use(path, fn); }\napp.route = function route(path) { return this.router.route(path); }", + "app.use registers middleware on the router.", + ), + ( + "app.route", + "lib/application.js", + "app.init = function init() { var router = null; this.defaultConfiguration(); router = new Router({}); }\napp.handle = function handle(req, res, callback) { this.router.handle(req, res, done); }\napp.use = function use(fn) { return router.use(path, fn); }\napp.route = function route(path) { return this.router.route(path); }", + "app.route creates route entries through the router.", + ), + ( + "res.send", + "lib/response.js", + "res.send = function send(body) { this.set('Content-Length', len); this.end(chunk, encoding); return this; }", + "res.send prepares and sends the response body.", + ), + ]; + + for (symbol, path, source, expected) in fixtures { + let citation = test_packet_citation(symbol, path, 0.9); + let claims = packet_source_derived_claims_for_citation(prompt, &citation, source); + assert!( + claims.iter().any(|claim| claim == expected), + "expected source-derived claim `{expected}` for {path}; got {claims:?}" + ); + } } #[test] - fn graph_budget_prunes_nodes_not_referenced_by_retained_edges() { - fn node(id: &str) -> codestory_contracts::api::GraphNodeDto { - codestory_contracts::api::GraphNodeDto { - id: NodeId(id.to_string()), - label: id.to_string(), - kind: codestory_contracts::api::NodeKind::FUNCTION, - depth: 1, - label_policy: None, - badge_visible_members: None, - badge_total_members: None, - merged_symbol_examples: Vec::new(), - file_path: None, - qualified_name: None, - member_access: None, - } + fn url_session_request_claims_name_lifecycle_with_eval_probes() { + let _eval_probes = EvalProbesGuard::enabled(); + let prompt = "Trace how a Session creates requests, resumes tasks, validates data requests, and receives URLSession callbacks."; + let fixtures = [ + ( + "Session.request", + "Source/Core/Session.swift", + "open func request(_ convertible: URLRequestConvertible) -> DataRequest { let request = DataRequest(); performEagerlyIfNecessary(request); return request }", + "Session request creation builds request objects before optional eager execution.", + ), + ( + "Request.resume", + "Source/Core/Request.swift", + "public func resume() -> Self { delegate?.readyToPerform(request: self); task.resume(); return self }", + "Request.resume resumes the underlying request task.", + ), + ( + "DataRequest.validate", + "Source/Core/DataRequest.swift", + "public func validate(_ validation: @escaping Validation) -> Self { validators.write { $0.append(validation) }; didValidateRequest(); return self }", + "Request validation attaches validation behavior.", + ), + ( + "SessionDelegate", + "Source/Core/SessionDelegate.swift", + "open class SessionDelegate: NSObject, URLSessionDataDelegate { open func urlSession(_ session: URLSession, dataTask: URLSessionDataTask, didReceive data: Data) { request.didReceive(data: data) } open func urlSession(_ session: URLSession, task: URLSessionTask, didCompleteWithError error: Error?) { request.didReceiveResponse(nil) } }", + "The session delegate receives request callback events.", + ), + ]; + + for (symbol, path, source, expected) in fixtures { + let citation = test_packet_citation(symbol, path, 0.9); + let claims = packet_source_derived_claims_for_citation(prompt, &citation, source); + assert!( + claims.iter().any(|claim| claim == expected), + "expected URLSession request lifecycle claim `{expected}` for {path}; got {claims:?}" + ); } + } - fn edge(id: &str, source: &str, target: &str) -> codestory_contracts::api::GraphEdgeDto { - codestory_contracts::api::GraphEdgeDto { - id: EdgeId(id.to_string()), - source: NodeId(source.to_string()), - target: NodeId(target.to_string()), - kind: codestory_contracts::api::EdgeKind::CALL, - confidence: None, - certainty: None, - callsite_identity: None, - candidate_targets: Vec::new(), + #[test] + fn java_string_check_source_claims_name_blank_empty_and_region_matching() { + let _eval_probes = EvalProbesGuard::enabled(); + let prompt = "Explain how Commons Lang implements blank, empty, and case-sensitive string checks across StringUtils, Strings, and CharSequenceUtils."; + let string_utils = test_packet_citation( + "org.apache.commons.lang3.StringUtils.isBlank", + "src/main/java/org/apache/commons/lang3/StringUtils.java", + 0.9, + ); + let claims = packet_source_derived_claims_for_citation( + prompt, + &string_utils, + r#" + * StringUtils.isBlank(" ") = true + public static boolean isBlank(final CharSequence cs) { + if (cs == null || cs.length() == 0) { + return true; + } + return Character.isWhitespace(cs.charAt(0)); + } + * StringUtils.isEmpty(" ") = false + * NOTE: This method changed in Lang version 2.0. It no longer trims the CharSequence. + public static boolean isEmpty(final CharSequence cs) { + return cs == null || cs.length() == 0; } + "#, + ); + + for expected in [ + "StringUtils.isBlank treats null, empty, and whitespace-only inputs as blank.", + "StringUtils.isEmpty does not trim whitespace before deciding emptiness.", + ] { + assert!( + claims.iter().any(|claim| claim == expected), + "expected Java string claim `{expected}` in {claims:?}" + ); } - let mut answer = packet_answer_fixture( - "Explain graph budget trimming.", - vec![test_packet_citation("center", "src/center.rs", 0.9)], + let strings = test_packet_citation( + "Strings", + "src/main/java/org/apache/commons/lang3/Strings.java", + 0.9, ); - answer.graphs.push(GraphArtifactDto::Uml { - id: "graph".to_string(), - title: "Graph".to_string(), - graph: GraphResponse { - center_id: NodeId("center".to_string()), - nodes: vec![ - node("center"), - node("kept"), - node("dropped_a"), - node("dropped_b"), - ], - edges: vec![ - edge("edge_1", "center", "kept"), - edge("edge_2", "kept", "dropped_a"), - edge("edge_3", "dropped_a", "dropped_b"), - ], - truncated: false, - omitted_edge_count: 0, - canonical_layout: None, - }, - }); - - let budget = apply_packet_budget( - packet_fixture_project_root(), - "Explain graph budget trimming.", - PacketTaskClassDto::ArchitectureExplanation, - PacketBudgetModeDto::Tiny, - PacketBudgetLimitsDto { - max_trail_edges: 1, - ..packet_budget_limits(PacketBudgetModeDto::Tiny) - }, - &mut answer, + let claims = packet_source_derived_claims_for_citation( + prompt, + &strings, + "return CharSequenceUtils.regionMatches(str, ignoreCase, 0, suffix, 0, length);", + ); + assert!( + claims.iter().any(|claim| claim + == "Strings delegates region matching work to CharSequenceUtils.regionMatches."), + "expected region matching claim in {claims:?}" ); - - let GraphArtifactDto::Uml { graph, .. } = &answer.graphs[0] else { - panic!("expected UML graph"); - }; - let node_ids = graph - .nodes - .iter() - .map(|node| node.id.0.as_str()) - .collect::>(); - assert_eq!(graph.edges.len(), 1); - assert_eq!(node_ids, vec!["center", "kept"]); - assert!(graph.truncated); - assert!(budget.omitted_sections.contains(&"trail_edges".to_string())); } #[test] - fn generic_packet_sections_and_sufficiency_cover_agent_stop_contract() { - let question = "Explain how a command enters runtime orchestration, workspace planning, symbol extraction, persistence, and snapshot refresh."; - let limits = packet_budget_limits(PacketBudgetModeDto::Compact); - let mut answer = AgentAnswerDto { - answer_id: "packet-fixture".to_string(), - prompt: question.to_string(), - summary: "Runtime flow is covered by cited anchors.".to_string(), - freshness: None, - sections: vec![AgentResponseSectionDto { - id: "answer".to_string(), - title: "Answer".to_string(), - blocks: vec![AgentResponseBlockDto::Markdown { - markdown: "The flow starts at the command surface and proceeds through runtime, workspace, indexer, store, and snapshot layers.".to_string(), - }], - }], - citations: vec![ - test_packet_citation( - "FlowRegression", - "tests/flow_regression.rs", - 0.5, - ), - test_packet_citation("CliCommand", "crates/app-cli/src/main.rs", 0.2), - test_packet_citation( - "RuntimeCoordinator", - "crates/app-runtime/src/services.rs", - 0.3, - ), - test_packet_citation( - "WorkspacePlan", - "crates/workspace/src/plan.rs", - 0.2, - ), + fn exact_family_source_claims_require_eval_probes() { + let _env = EnvVarGuard::cleared(EVAL_PROBES_ENV); + let cases = [ + ( + "Explain how Commons Lang implements blank and empty string checks across StringUtils.", test_packet_citation( - "GraphIndexer", - "crates/indexer/src/lib.rs", - 0.2, + "org.apache.commons.lang3.StringUtils.isBlank", + "src/main/java/org/apache/commons/lang3/StringUtils.java", + 0.9, ), + r#" + public static boolean isBlank(final CharSequence cs) { + if (cs == null || cs.length() == 0) { + return true; + } + return Character.isWhitespace(cs.charAt(0)); + } + * NOTE: This method changed in Lang version 2.0. It no longer trims the CharSequence. + public static boolean isEmpty(final CharSequence cs) { + return cs == null || cs.length() == 0; + } + "#, + &["StringUtils."][..], + ), + ( + "Explain how Requests turns a top-level request call into a prepared request and sends it through a session adapter.", + test_packet_citation("Session.request", "src/requests/sessions.py", 0.9), + "def request(self, method, url, **kwargs):\n req = Request(method=method, url=url)\n prep = self.prepare_request(req)\n return self.send(prep, **kwargs)\n", + &["PreparedRequest", "Session.request"][..], + ), + ( + "Trace how Express creates an application, registers middleware/routes, and handles an incoming request through the router and response helpers.", + test_packet_citation("app.use", "lib/application.js", 0.9), + "app.use = function use(fn) { return router.use(path, fn); }\napp.handle = function handle(req, res, callback) { this.router.handle(req, res, done); }\n", + &[ + "createApplication", + "app.handle", + "app.use", + "lib/express.js", + ][..], + ), + ( + "Explain how fmt turns formatting arguments into type-erased format args and reaches vformat or format_to output paths.", + test_packet_citation("vformat", "include/fmt/format.h", 0.9), + "class format_error : public std::runtime_error {}; inline auto vformat(locale_ref loc, string_view fmt, format_args args) -> std::string { detail::vformat_to(buf, fmt, args, loc); return to_string(buf); }", + &["vformat is the central", "format_error represents"][..], + ), + ( + "Trace how Jekyll's build command creates a site and runs the read, generate, render, and write phases.", + test_packet_citation("Site#process", "lib/jekyll/site.rb", 0.9), + "class Site\n def process\n read\n generate\n render\n write\n end\nend\n", + &["Jekyll::Site", "Site#process"][..], + ), + ( + "Explain how AutoMapper configuration and runtime mapper APIs cooperate to map source objects to destination objects.", test_packet_citation( - "ProjectionStore", - "crates/store/src/projection.rs", - 0.2, + "MapperConfiguration", + "src/AutoMapper/Configuration/MapperConfiguration.cs", + 0.9, ), - ], - subgraph_ids: Vec::new(), - retrieval_version: "test".to_string(), - graphs: Vec::new(), - retrieval_trace: codestory_contracts::api::AgentRetrievalTraceDto { - request_id: "packet-fixture".to_string(), - resolved_profile: AgentRetrievalPresetDto::Architecture, - policy_mode: AgentRetrievalPolicyModeDto::LatencyFirst, - total_latency_ms: 1, - sla_target_ms: None, - sla_missed: false, - semantic_fallback_count: 0, - semantic_fallbacks: Vec::new(), - annotations: Vec::new(), - steps: Vec::new(), - retrieval_shadow: None, - }, - }; + "public sealed class MapperConfiguration { Dictionary _configuredMaps; Dictionary _resolvedMaps; LambdaExpression BuildExecutionPlan(Type sourceType, Type destinationType) => null; }\n", + &["MapperConfiguration", "Mapper.Map", "TypeMap"][..], + ), + ( + "Explain how Okio's Buffer, Source, Sink, and buffered wrappers cooperate to move bytes through reads and writes.", + test_packet_citation("RealBufferedSource", "okio/RealBufferedSource.kt", 0.9), + "class RealBufferedSource(val source: Source) { val buffer = Buffer(); override fun read(sink: Buffer, byteCount: Long): Long = source.read(buffer, byteCount) }\n", + &["RealBufferedSource", "Buffer helpers"][..], + ), + ( + "Trace how Alamofire's Session creates requests, resumes tasks, validates data requests, and receives URLSession callbacks.", + test_packet_citation("DataRequest.validate", "Source/Core/DataRequest.swift", 0.9), + "public func validate(_ validation: @escaping Validation) -> Self { validators.write { $0.append(validation) }; didValidateRequest() }\n", + &["Alamofire", "Source/Core", "URLSession"][..], + ), + ( + "Explain how package:http exposes top-level helpers, BaseClient convenience methods, BaseRequest finalization, and IOClient send behavior.", + test_packet_citation("NativeClient", "src/native_client.dart", 0.9), + "import 'dart:io'; class NativeClient { Future send(BaseRequest request) async { var stream = request.finalize(); var ioRequest = await _inner!.openUrl(request.method, request.url); final response = await stream.pipe(ioRequest) as HttpClientResponse; return NativeStreamedResponse(response); } }\n", + &["dart:io", "IOClient", "NativeClient.send is"][..], + ), + ]; - rank_packet_evidence(question, &mut answer); - append_packet_evidence_sections( - &mut answer, - PacketTaskClassDto::ArchitectureExplanation, - &limits, + for (prompt, citation, source, forbidden_fragments) in cases { + let claims = packet_source_derived_claims_for_citation(prompt, &citation, source); + for forbidden in forbidden_fragments { + assert!( + claims.iter().all(|claim| !claim.contains(forbidden)), + "production source claims should not include exact benchmark-family fragment `{forbidden}`: {claims:?}" + ); + } + } + } + + #[test] + fn generic_string_predicate_claims_name_blank_and_empty_behavior() { + let source = r#" + final class TextChecks { + /** + * @return true if the value is null, empty or whitespace only. + */ + public static boolean isBlank(final CharSequence value) { + final int valueLength = length(value); + for (int i = 0; i < valueLength; i++) { + if (!Character.isWhitespace(value.charAt(i))) { + return false; + } + } + return true; + } + + public static boolean isEmpty(final CharSequence value) { + return value == null || value.length() == 0; + } + } + "#; + + let mut claims = + packet_generic_string_predicate_flow_claims("com.acme.TextChecks.isBlank", source); + claims.extend(packet_generic_string_predicate_flow_claims( + "com.acme.TextChecks.isEmpty", + source, + )); + + for expected in [ + "isBlank treats null, empty, and whitespace-only inputs as blank.", + "isEmpty does not trim whitespace before deciding emptiness.", + ] { + assert!( + claims.iter().any(|claim| claim == expected), + "expected generic string predicate claim `{expected}` in {claims:?}" + ); + } + } + + #[test] + fn swr_source_claims_name_hook_cache_and_mutation_flow() { + let _eval_probes = EvalProbesGuard::enabled(); + let prompt = "Explain how SWR exposes useSWR, serializes keys, connects cache helpers, and routes mutate behavior through the internal mutation helper."; + let use_swr = test_packet_citation("useSWRHandler", "src/index/use-swr.ts", 0.9); + let claims = packet_source_derived_claims_for_citation( + prompt, + &use_swr, + r#" + export const useSWRHandler = (_key) => { + const [key, fnArg] = serialize(_key) + return internalMutate(cache, keyRef.current, ...args) + } + const useSWR = withArgs(useSWRHandler) + export default useSWR + "#, ); - let budget = apply_packet_budget( - packet_fixture_project_root(), - question, - PacketTaskClassDto::ArchitectureExplanation, - PacketBudgetModeDto::Compact, - limits, - &mut answer, + for expected in [ + "The public useSWR export wraps useSWRHandler with argument normalization.", + "useSWRHandler serializes the key before reading cache state.", + "mutate behavior flows through internalMutate.", + ] { + assert!( + claims.iter().any(|claim| claim == expected), + "expected SWR hook claim `{expected}` in {claims:?}" + ); + } + + let helper = + test_packet_citation("createCacheHelper", "src/_internal/utils/helper.ts", 0.9); + let claims = packet_source_derived_claims_for_citation( + prompt, + &helper, + r#" + export const createCacheHelper = (cache, key) => { + const get = () => cache.get(key) + const set = info => cache.set(key, info) + const subscribe = callback => subscriptions.push(callback) + return [get, set, subscribe, () => snapshot] + } + "#, ); - let sufficiency = build_packet_sufficiency( - packet_fixture_project_root(), - question, - PacketTaskClassDto::ArchitectureExplanation, - &answer, - &budget, + assert!( + claims.iter().any(|claim| claim + == "createCacheHelper provides cache get, set, subscribe, and snapshot helpers."), + "expected SWR cache helper claim in {claims:?}" ); - assert_eq!(answer.sections[0].id, "packet-evidence-ledger"); - assert_eq!(answer.sections[1].id, "packet-flow-claims"); - let top_anchor_names = answer - .citations - .iter() - .take(4) - .map(|citation| citation.display_name.as_str()) - .collect::>(); - assert!( - top_anchor_names.contains(&"CliCommand"), - "command entrypoint should stay in the high-priority flow anchors: {top_anchor_names:?}" + let mutate = test_packet_citation("internalMutate", "src/_internal/utils/mutate.ts", 0.9); + let claims = packet_source_derived_claims_for_citation( + prompt, + &mutate, + "export async function internalMutate(cache, _key, _data) { return data }", ); assert!( - top_anchor_names.contains(&"RuntimeCoordinator"), - "runtime coordination should stay in the high-priority flow anchors: {top_anchor_names:?}" + claims + .iter() + .any(|claim| claim == "mutate behavior flows through internalMutate."), + "expected SWR mutation claim in {claims:?}" ); - assert_eq!(sufficiency.status, PacketSufficiencyStatusDto::Sufficient); - assert!(sufficiency.follow_up_commands.is_empty()); - assert!(sufficiency.open_next.is_empty()); - assert!( - sufficiency.covered_claims.iter().any(|claim| claim - .claim - .contains("Runtime orchestration is anchored by `RuntimeCoordinator`")), - "generic packet should include claim-led runtime flow notes: {sufficiency:?}" + } + + #[test] + fn python_requests_source_claims_name_method_flow() { + let _eval_probes = EvalProbesGuard::enabled(); + let prompt = "Explain how Requests turns a top-level request call into a prepared request and sends it through a session adapter."; + let cases = [ + ( + "request", + "src/requests/api.py", + "def request(method, url, **kwargs):\n with sessions.Session() as session:\n return session.request(method=method, url=url, **kwargs)\n", + "The top-level request helper opens a Session and delegates to Session.request.", + ), + ( + "Session.request", + "src/requests/sessions.py", + "def request(self, method, url, **kwargs):\n req = Request(method=method, url=url)\n prep = self.prepare_request(req)\n return self.send(prep, **kwargs)\n", + "Session.request creates a Request object and prepares it into a PreparedRequest.", + ), + ( + "PreparedRequest.prepare", + "src/requests/models.py", + "def prepare(self):\n self.prepare_method(method)\n self.prepare_url(url, params)\n self.prepare_headers(headers)\n self.prepare_cookies(cookies)\n self.prepare_body(data, files, json)\n self.prepare_auth(auth, url)\n self.prepare_hooks(hooks)\n", + "PreparedRequest.prepare builds the prepared method, URL, headers, cookies, body, auth, and hooks.", + ), + ( + "Session.send", + "src/requests/sessions.py", + "def send(self, request, **kwargs):\n adapter = self.get_adapter(url=request.url)\n r = adapter.send(request, **kwargs)\n return r\n", + "Session.send chooses an adapter and calls the adapter send method.", + ), + ( + "HTTPAdapter.send", + "src/requests/adapters.py", + "def send(self, request, **kwargs):\n resp = conn.urlopen(method=request.method, url=url)\n return self.build_response(request, resp)\n", + "HTTPAdapter.send is the transport boundary that returns the response.", + ), + ]; + + for (symbol, path, source, expected) in cases { + let citation = test_packet_citation(symbol, path, 0.9); + let claims = packet_source_derived_claims_for_citation(prompt, &citation, source); + assert!( + claims.iter().any(|claim| claim == expected), + "expected source-derived claim `{expected}` for {symbol}; got {claims:?}" + ); + } + } + + #[test] + fn python_request_flow_does_not_emit_axios_transport_claim_without_xhr() { + let prompt = "Explain how Requests sends a prepared request through a session adapter."; + let citation = test_packet_citation("Session", "src/requests/sessions.py", 0.9); + let claims = packet_source_derived_claims_for_citation( + prompt, + &citation, + "adapter = self.get_adapter(url=request.url)\n# http proxy environment settings\n", ); + assert!( - sufficiency - .avoid_opening - .iter() - .any(|path| path.contains("crates/app-cli/src/main.rs")), - "sufficient packets should tell agents cited files do not need broad re-opening: {sufficiency:?}" + !claims.iter().any(|claim| claim.contains("xhr or http")), + "Python Requests source should not inherit Axios transport wording: {claims:?}" ); } @@ -10558,7 +9671,10 @@ mod tests { retrieval_score_breakdown: None, }; - assert_eq!(packet_evidence_role(&citation), Some("command entrypoint")); + assert_eq!( + packet_evidence_role(&citation), + Some(PacketEvidenceRole::CommandEntrypoint) + ); assert_eq!( packet_display_path(citation.file_path.as_deref().unwrap()), "crates/tool-cli/src/main.rs" @@ -10566,7 +9682,7 @@ mod tests { assert!( packet_claim_for_role( "command entrypoint", - "command entrypoint", + PacketEvidenceRole::CommandEntrypoint, &citation, "Explain the CLI entrypoint." ) diff --git a/crates/codestory-runtime/src/agent/packet_batch.rs b/crates/codestory-runtime/src/agent/packet_batch.rs index d357a741..10d84586 100644 --- a/crates/codestory-runtime/src/agent/packet_batch.rs +++ b/crates/codestory-runtime/src/agent/packet_batch.rs @@ -55,6 +55,10 @@ impl PacketLatencyBudget { self.elapsed_ms() >= self.target_ms } + pub(crate) fn remaining_ms(&self) -> u32 { + clamp_u128_to_u32(self.target_ms.saturating_sub(self.elapsed_ms()).max(100)) + } + pub(crate) fn budget_usage_percent(&self, consumed_trace_ms: u32) -> u128 { (consumed_trace_ms as u128) .saturating_mul(100) @@ -137,13 +141,18 @@ pub(crate) fn run_packet_planned_subqueries( .map(|(_, query)| (query.query.clone(), per_query_limit)) .collect::>(); let started_at = Instant::now(); - match controller.search_lexical_hybrid_batch(&batch) { - Ok(results) => { + match controller.search_lexical_hybrid_batch(&batch, Some(packet_latency.remaining_ms())) { + Ok(outcome) => { let duration_ms = clamp_u128_to_u32(started_at.elapsed().as_millis()); answer.retrieval_trace.total_latency_ms = answer .retrieval_trace .total_latency_ms .saturating_add(duration_ms); + answer + .retrieval_trace + .packet_sidecar_diagnostics + .extend(outcome.sidecar_diagnostics); + let results = outcome.results; merge_packet_lexical_subquery_batch( answer, &lexical_pending, @@ -178,7 +187,10 @@ pub(crate) fn run_packet_planned_subqueries( }) .collect::>(); let retry_started = Instant::now(); - match controller.search_semantic_hybrid_batch(&retry_batch) { + match controller.search_semantic_hybrid_batch( + &retry_batch, + Some(packet_latency.remaining_ms()), + ) { Ok(outcome) => { let retry_duration_ms = clamp_u128_to_u32(retry_started.elapsed().as_millis()); @@ -186,6 +198,10 @@ pub(crate) fn run_packet_planned_subqueries( .retrieval_trace .total_latency_ms .saturating_add(retry_duration_ms); + answer + .retrieval_trace + .packet_sidecar_diagnostics + .extend(outcome.sidecar_diagnostics); record_semantic_fallbacks(answer, &outcome.fallbacks); merge_packet_semantic_subquery_batch( answer, @@ -235,13 +251,19 @@ pub(crate) fn run_packet_planned_subqueries( }) .collect::>(); let started_at = Instant::now(); - match controller.search_semantic_hybrid_batch(&batch) { + match controller + .search_semantic_hybrid_batch(&batch, Some(packet_latency.remaining_ms())) + { Ok(outcome) => { let duration_ms = clamp_u128_to_u32(started_at.elapsed().as_millis()); answer.retrieval_trace.total_latency_ms = answer .retrieval_trace .total_latency_ms .saturating_add(duration_ms); + answer + .retrieval_trace + .packet_sidecar_diagnostics + .extend(outcome.sidecar_diagnostics); record_semantic_fallbacks(answer, &outcome.fallbacks); merge_packet_semantic_subquery_batch( answer, @@ -364,14 +386,23 @@ pub(crate) fn run_packet_anchor_expansion( } let started_at = Instant::now(); - let result = controller.search_symbolic_packet_anchor_batch(&queries, per_query_limit); + let result = controller.search_symbolic_packet_anchor_batch( + &queries, + per_query_limit, + Some(packet_latency.remaining_ms()), + ); let duration_ms = clamp_u128_to_u32(started_at.elapsed().as_millis()); answer.retrieval_trace.total_latency_ms = answer .retrieval_trace .total_latency_ms .saturating_add(duration_ms); match result { - Ok(results) => { + Ok(outcome) => { + answer + .retrieval_trace + .packet_sidecar_diagnostics + .extend(outcome.sidecar_diagnostics); + let results = outcome.results; let per_step_duration = duration_ms / results.len().max(1) as u32; for (query, hits) in results { let mut added = 0usize; @@ -492,6 +523,7 @@ pub(crate) fn packet_anchor_probe_queries(plan: &PacketPlanDto) -> Vec { .filter(|query| { let query = query.1; query.purpose.contains("symbol probe") + || packet_task_seed_anchor_probe(&query.query) || query.purpose.contains("concrete symbol") || is_packet_code_like_term(&query.query) }) @@ -506,13 +538,22 @@ pub(crate) fn packet_anchor_probe_queries(plan: &PacketPlanDto) -> Vec { fn packet_anchor_probe_priority(query: &PacketPlanQueryDto) -> u8 { if query.purpose.contains("symbol probe") { 0 - } else if packet_anchor_probe_has_strong_code_shape(&query.query) { + } else if packet_task_seed_anchor_probe(&query.query) { 1 - } else { + } else if packet_anchor_probe_has_strong_code_shape(&query.query) { 2 + } else { + 3 } } +fn packet_task_seed_anchor_probe(query: &str) -> bool { + matches!( + normalize_identifier(query).as_str(), + "main" | "run" | "entrypoint" + ) +} + fn packet_anchor_probe_has_strong_code_shape(query: &str) -> bool { let trimmed = query.trim(); trimmed.contains("::") @@ -556,7 +597,13 @@ pub(crate) fn packet_file_stem_matches_query(query: &str, path: Option<&str>) -> let Some(path) = path else { return false; }; - let normalized_query = normalize_identifier(query); + let query_path = query.replace('\\', "/"); + let query_file_name = query_path.rsplit('/').next().unwrap_or(query).trim(); + let query_stem = query_file_name + .rsplit_once('.') + .map(|(stem, _)| stem) + .unwrap_or(query_file_name); + let normalized_query = normalize_identifier(query_stem); if normalized_query.is_empty() { return false; } @@ -706,6 +753,45 @@ mod tests { ] ); } + + #[test] + fn packet_anchor_probe_queries_execute_entrypoint_seed_queries() { + let plan = PacketPlanDto { + task_class: PacketTaskClassDto::ArchitectureExplanation, + inferred_task_class: false, + queries: vec![ + PacketPlanQueryDto { + query: "Explain the runtime flow".to_string(), + purpose: "original task phrasing for sidecar-primary source-backed retrieval" + .to_string(), + }, + PacketPlanQueryDto { + query: "architecture entrypoint".to_string(), + purpose: "task-class retrieval seed".to_string(), + }, + PacketPlanQueryDto { + query: "main".to_string(), + purpose: "task-class retrieval seed".to_string(), + }, + PacketPlanQueryDto { + query: "run".to_string(), + purpose: "task-class retrieval seed".to_string(), + }, + PacketPlanQueryDto { + query: "entrypoint".to_string(), + purpose: "task-class retrieval seed".to_string(), + }, + ], + trace: Vec::new(), + }; + + let queries = packet_anchor_probe_queries(&plan); + + assert!(queries.contains(&"main".to_string())); + assert!(queries.contains(&"run".to_string())); + assert!(queries.contains(&"entrypoint".to_string())); + assert!(!queries.contains(&"architecture entrypoint".to_string())); + } } fn is_packet_code_like_term(token: &str) -> bool { diff --git a/crates/codestory-runtime/src/agent/packet_budget.rs b/crates/codestory-runtime/src/agent/packet_budget.rs new file mode 100644 index 00000000..f637eddb --- /dev/null +++ b/crates/codestory-runtime/src/agent/packet_budget.rs @@ -0,0 +1,411 @@ +use crate::agent::packet_capping::cap_packet_citations; +use crate::agent::packet_command_profiles::packet_command_exact_probe_queries; +use crate::agent::packet_plan::{packet_explicit_request_probe_queries, push_unique_term}; +use crate::agent::packet_required_probes::packet_sufficiency_required_probe_queries_with_extra; +use crate::agent::packet_sufficiency::{ + PACKET_MARKDOWN_TRUNCATION_SUFFIX, build_packet_sufficiency_with_extra, + quote_packet_command_value, quote_packet_project_arg, +}; +use crate::agent::trace_export::packet_retrieval_trace_summary; +use codestory_contracts::api::{ + AgentAnswerDto, AgentPacketDto, AgentResponseBlockDto, AgentRetrievalStepKindDto, + AgentRetrievalStepStatusDto, GraphArtifactDto, GraphResponse, PacketBudgetDto, + PacketBudgetLimitsDto, PacketBudgetModeDto, PacketBudgetUsageDto, PacketTaskClassDto, +}; +use std::collections::HashSet; +use std::path::Path; + +pub(crate) fn packet_budget_limits(mode: PacketBudgetModeDto) -> PacketBudgetLimitsDto { + match mode { + PacketBudgetModeDto::Tiny => PacketBudgetLimitsDto { + max_anchors: 3, + max_files: 3, + max_snippets: 6, + max_trail_edges: 12, + max_output_bytes: 24 * 1024, + }, + PacketBudgetModeDto::Compact => PacketBudgetLimitsDto { + max_anchors: 13, + max_files: 13, + max_snippets: 12, + max_trail_edges: 20, + max_output_bytes: 96 * 1024, + }, + PacketBudgetModeDto::Standard => PacketBudgetLimitsDto { + max_anchors: 16, + max_files: 16, + max_snippets: 24, + max_trail_edges: 60, + max_output_bytes: 128 * 1024, + }, + PacketBudgetModeDto::Deep => PacketBudgetLimitsDto { + max_anchors: 25, + max_files: 25, + max_snippets: 80, + max_trail_edges: 240, + max_output_bytes: 512 * 1024, + }, + } +} + +#[cfg(test)] +pub(crate) fn apply_packet_budget( + project_root: &Path, + question: &str, + task_class: PacketTaskClassDto, + requested: PacketBudgetModeDto, + limits: PacketBudgetLimitsDto, + answer: &mut AgentAnswerDto, +) -> PacketBudgetDto { + apply_packet_budget_with_extra( + project_root, + question, + task_class, + requested, + limits, + answer, + &[], + ) +} + +pub(crate) fn apply_packet_budget_with_extra( + project_root: &Path, + question: &str, + task_class: PacketTaskClassDto, + requested: PacketBudgetModeDto, + limits: PacketBudgetLimitsDto, + answer: &mut AgentAnswerDto, + extra_probes: &[String], +) -> PacketBudgetDto { + let mut truncated = false; + let mut omitted_sections = Vec::new(); + + let mut protected_probe_queries = packet_command_exact_probe_queries(question, task_class); + for probe in + packet_sufficiency_required_probe_queries_with_extra(question, task_class, extra_probes) + { + push_unique_term(&mut protected_probe_queries, &probe); + } + if cap_packet_citations(answer, &limits, &protected_probe_queries) { + truncated = true; + omitted_sections.push("citations".to_string()); + } + if cap_graph_edges(answer, limits.max_trail_edges) { + truncated = true; + omitted_sections.push("trail_edges".to_string()); + } + if truncate_answer_markdown_to_byte_cap(answer, limits.max_output_bytes as usize) { + truncated = true; + omitted_sections.push("markdown_blocks".to_string()); + } + + let used = packet_budget_usage(answer); + if used.output_bytes > limits.max_output_bytes { + truncated = true; + omitted_sections.push("output_bytes".to_string()); + } + + omitted_sections.sort(); + omitted_sections.dedup(); + + PacketBudgetDto { + requested, + limits, + used, + truncated, + omitted_sections, + next_deeper_command: next_deeper_packet_command(project_root, question, requested), + } +} + +pub(crate) fn enforce_packet_output_budget(project_root: &Path, packet: &mut AgentPacketDto) { + let extra_probes = packet_explicit_request_probe_queries(&packet.plan); + for _ in 0..8 { + let output_bytes = refresh_packet_output_bytes(packet); + if output_bytes <= packet.budget.limits.max_output_bytes as usize { + break; + } + + packet.budget.truncated = true; + push_omitted_section(&mut packet.budget, "output_bytes"); + push_omitted_section(&mut packet.budget, "packet_payload"); + + let over_by = output_bytes.saturating_sub(packet.budget.limits.max_output_bytes as usize); + let current_answer_bytes = serde_json::to_vec(&packet.answer) + .map(|bytes| bytes.len()) + .unwrap_or_default(); + let next_answer_cap = current_answer_bytes + .saturating_sub(over_by.saturating_add(1024)) + .max(1024); + + if truncate_answer_markdown_to_byte_cap(&mut packet.answer, next_answer_cap) { + push_omitted_section(&mut packet.budget, "markdown_blocks"); + packet.budget.used = packet_budget_usage(&packet.answer); + rebuild_packet_budget_dependents(project_root, packet, &extra_probes); + continue; + } + break; + } + + let output_bytes = refresh_packet_output_bytes(packet); + if output_bytes > packet.budget.limits.max_output_bytes as usize { + packet.budget.truncated = true; + push_omitted_section(&mut packet.budget, "output_bytes"); + push_omitted_section(&mut packet.budget, "packet_payload"); + rebuild_packet_budget_dependents(project_root, packet, &extra_probes); + } else { + remove_omitted_section(&mut packet.budget, "output_bytes"); + remove_omitted_section(&mut packet.budget, "packet_payload"); + rebuild_packet_budget_dependents(project_root, packet, &extra_probes); + let _ = refresh_packet_output_bytes(packet); + } +} + +fn rebuild_packet_budget_dependents( + project_root: &Path, + packet: &mut AgentPacketDto, + extra_probes: &[String], +) { + packet.retrieval_trace_summary = packet_retrieval_trace_summary(&packet.answer); + packet.sufficiency = build_packet_sufficiency_with_extra( + project_root, + &packet.question, + packet + .task_class + .unwrap_or(PacketTaskClassDto::ArchitectureExplanation), + &packet.answer, + &packet.budget, + extra_probes, + ); +} + +fn refresh_packet_output_bytes(packet: &mut AgentPacketDto) -> usize { + for _ in 0..4 { + let output_bytes = serialized_packet_len(packet); + let output_bytes_u32 = output_bytes.try_into().unwrap_or(u32::MAX); + if packet.budget.used.output_bytes == output_bytes_u32 { + return output_bytes; + } + packet.budget.used.output_bytes = output_bytes_u32; + } + serialized_packet_len(packet) +} + +fn serialized_packet_len(packet: &AgentPacketDto) -> usize { + serde_json::to_vec(packet) + .map(|bytes| bytes.len()) + .unwrap_or_default() +} + +fn push_omitted_section(budget: &mut PacketBudgetDto, section: &str) { + if !budget + .omitted_sections + .iter() + .any(|existing| existing == section) + { + budget.omitted_sections.push(section.to_string()); + budget.omitted_sections.sort(); + } +} + +fn remove_omitted_section(budget: &mut PacketBudgetDto, section: &str) { + budget + .omitted_sections + .retain(|existing| existing != section); +} + +fn cap_graph_edges(answer: &mut AgentAnswerDto, max_edges: u32) -> bool { + let mut remaining = max_edges as usize; + let mut truncated = false; + for artifact in &mut answer.graphs { + let GraphArtifactDto::Uml { graph, .. } = artifact else { + continue; + }; + if graph.edges.len() > remaining { + let omitted = graph.edges.len() - remaining; + graph.edges.truncate(remaining); + graph.truncated = true; + graph.omitted_edge_count = graph + .omitted_edge_count + .saturating_add(omitted.try_into().unwrap_or(u32::MAX)); + truncated = true; + remaining = 0; + } else { + remaining = remaining.saturating_sub(graph.edges.len()); + } + if prune_graph_to_retained_edges(graph) { + truncated = true; + } + } + truncated +} + +fn prune_graph_to_retained_edges(graph: &mut GraphResponse) -> bool { + let original_nodes = graph.nodes.len(); + let original_layout_nodes = graph + .canonical_layout + .as_ref() + .map(|layout| layout.nodes.len()) + .unwrap_or_default(); + let original_layout_edges = graph + .canonical_layout + .as_ref() + .map(|layout| layout.edges.len()) + .unwrap_or_default(); + let mut retained_node_ids = HashSet::new(); + retained_node_ids.insert(graph.center_id.clone()); + let retained_edge_ids = graph + .edges + .iter() + .map(|edge| edge.id.clone()) + .collect::>(); + + for edge in &graph.edges { + retained_node_ids.insert(edge.source.clone()); + retained_node_ids.insert(edge.target.clone()); + } + + graph + .nodes + .retain(|node| retained_node_ids.contains(&node.id)); + + if let Some(layout) = graph.canonical_layout.as_mut() { + layout.edges.retain(|edge| { + let endpoints_retained = retained_node_ids.contains(&edge.source) + && retained_node_ids.contains(&edge.target); + let source_edge_retained = edge.source_edge_ids.is_empty() + || edge + .source_edge_ids + .iter() + .any(|edge_id| retained_edge_ids.contains(edge_id)); + endpoints_retained && source_edge_retained + }); + layout + .nodes + .retain(|node| retained_node_ids.contains(&node.id)); + } + + let pruned = graph.nodes.len() < original_nodes + || graph + .canonical_layout + .as_ref() + .map(|layout| layout.nodes.len() < original_layout_nodes) + .unwrap_or(false) + || graph + .canonical_layout + .as_ref() + .map(|layout| layout.edges.len() < original_layout_edges) + .unwrap_or(false); + if pruned { + graph.truncated = true; + } + pruned +} + +pub(crate) fn truncate_answer_markdown_to_byte_cap( + answer: &mut AgentAnswerDto, + byte_cap: usize, +) -> bool { + let mut truncated = false; + for _ in 0..8 { + let Ok(bytes) = serde_json::to_vec(answer) else { + return truncated; + }; + if bytes.len() <= byte_cap { + return truncated; + } + let Some((section_index, block_index, len)) = largest_markdown_block(answer) else { + return truncated; + }; + if len <= 256 { + return truncated; + } + if let AgentResponseBlockDto::Markdown { markdown } = + &mut answer.sections[section_index].blocks[block_index] + { + truncate_markdown_block(markdown); + truncated = true; + } + } + truncated +} + +fn largest_markdown_block(answer: &AgentAnswerDto) -> Option<(usize, usize, usize)> { + let mut largest = None; + for (section_index, section) in answer.sections.iter().enumerate() { + for (block_index, block) in section.blocks.iter().enumerate() { + if let AgentResponseBlockDto::Markdown { markdown } = block { + let len = markdown.len(); + if largest.is_none_or(|(_, _, existing)| len > existing) { + largest = Some((section_index, block_index, len)); + } + } + } + } + largest +} + +fn truncate_markdown_block(markdown: &mut String) { + let keep_chars = markdown.chars().count() / 2; + let mut keep_byte = markdown.len(); + if let Some((index, _)) = markdown.char_indices().nth(keep_chars) { + keep_byte = index; + } + markdown.truncate(keep_byte); + markdown.push_str(PACKET_MARKDOWN_TRUNCATION_SUFFIX); +} + +pub(crate) fn packet_budget_usage(answer: &AgentAnswerDto) -> PacketBudgetUsageDto { + let files = answer + .citations + .iter() + .filter_map(|citation| citation.file_path.as_deref()) + .collect::>() + .len(); + let trail_edges = answer + .graphs + .iter() + .map(|artifact| match artifact { + GraphArtifactDto::Uml { graph, .. } => graph.edges.len(), + GraphArtifactDto::Mermaid { .. } => 0, + }) + .sum::(); + let snippets = answer + .retrieval_trace + .steps + .iter() + .filter(|step| { + step.kind == AgentRetrievalStepKindDto::SourceRead + && step.status == AgentRetrievalStepStatusDto::Ok + }) + .count(); + let output_bytes = serde_json::to_vec(answer) + .map(|bytes| bytes.len()) + .unwrap_or_default(); + + PacketBudgetUsageDto { + anchors: answer.citations.len().try_into().unwrap_or(u32::MAX), + files: files.try_into().unwrap_or(u32::MAX), + snippets: snippets.try_into().unwrap_or(u32::MAX), + trail_edges: trail_edges.try_into().unwrap_or(u32::MAX), + output_bytes: output_bytes.try_into().unwrap_or(u32::MAX), + } +} + +pub(crate) fn next_deeper_packet_command( + project_root: &Path, + question: &str, + requested: PacketBudgetModeDto, +) -> Option { + let next = match requested { + PacketBudgetModeDto::Tiny => "compact", + PacketBudgetModeDto::Compact => "standard", + PacketBudgetModeDto::Standard => "deep", + PacketBudgetModeDto::Deep => return None, + }; + let project = quote_packet_project_arg(project_root); + Some(format!( + "codestory-cli packet --project {project} --question {} --budget {next}", + quote_packet_command_value(question) + )) +} diff --git a/crates/codestory-runtime/src/agent/packet_capping.rs b/crates/codestory-runtime/src/agent/packet_capping.rs new file mode 100644 index 00000000..3314fd06 --- /dev/null +++ b/crates/codestory-runtime/src/agent/packet_capping.rs @@ -0,0 +1,976 @@ +use crate::agent::packet_batch::packet_file_stem_matches_query; +use crate::agent::packet_evidence_roles::{ + PacketEvidenceRole, packet_claim_key_for_citation, packet_evidence_role, +}; +use crate::agent::packet_required_probes::{ + packet_citation_probe_match_rank, packet_citation_probe_token_coverage, + packet_citation_satisfies_required_probe, packet_required_probe_needs_exact_match, +}; +use crate::agent::packet_scoring::{ + normalize_identifier, packet_citation_key, packet_display_name_is_import_literal, + packet_display_name_is_test_like, packet_display_path, packet_low_signal_display_name, +}; +use crate::{query_mentions_non_primary_source, retrieval_file_role_from_path}; +use codestory_contracts::api::{ + AgentAnswerDto, AgentCitationDto, NodeKind, PacketBudgetLimitsDto, SearchHitOrigin, +}; +use std::collections::{HashMap, HashSet}; + +pub(crate) fn cap_citations(answer: &mut AgentAnswerDto, limits: &PacketBudgetLimitsDto) -> bool { + cap_citations_with_protected(answer, limits, &HashSet::new()) +} + +pub(crate) fn cap_citations_with_protected( + answer: &mut AgentAnswerDto, + limits: &PacketBudgetLimitsDto, + protected_citation_keys: &HashSet, +) -> bool { + let original_len = answer.citations.len(); + let mut files = HashSet::new(); + let mut roles = HashSet::new(); + let mut claim_keys: HashSet = HashSet::new(); + let mut secondary_claim_keys: HashSet = HashSet::new(); + let mut kept = Vec::new(); + let mut deferred = Vec::new(); + + for citation in answer.citations.drain(..) { + let citation_key = packet_citation_key(&citation); + let file = citation.file_path.as_deref().map(packet_display_path); + let role = packet_evidence_role(&citation); + let claim_key = role.map(|role| packet_claim_key_for_citation(role, &citation)); + let low_priority_role = packet_low_priority_cap_role(role); + let protected = protected_citation_keys.contains(&citation_key); + if protected + && kept.len() < limits.max_anchors as usize + && packet_file_fits_limit(file.as_deref(), &files, limits.max_files) + { + if let Some(path) = file { + files.insert(path); + } + if let Some(role) = role { + roles.insert(role); + } + if let Some(ref claim_key) = claim_key { + claim_keys.insert(claim_key.clone()); + } + kept.push(citation); + continue; + } + if let Some(ref claim_key) = claim_key + && claim_keys.contains(claim_key) + && replace_weaker_duplicate_claim_citation( + &mut kept, + claim_key, + citation.clone(), + protected_citation_keys, + ) + { + rebuild_packet_cap_tracking(&kept, &mut files, &mut roles, &mut claim_keys); + continue; + } + let file_is_new = file.as_ref().is_some_and(|path| !files.contains(path)); + let role_is_new = role.is_some_and(|role| !roles.contains(&role)); + let claim_key_is_new = claim_key + .as_ref() + .is_some_and(|key| !claim_keys.contains(key)); + let secondary_claim_definition = claim_key.as_ref().is_some_and(|key| { + claim_keys.contains(key) + && !secondary_claim_keys.contains(key) + && packet_keep_secondary_claim_definition(key, &citation) + }); + let claim_key_expands_primary_packet_coverage = + !low_priority_role && claim_key_is_new && (role_is_new || file_is_new); + let expands_primary_packet_coverage = !low_priority_role + && (claim_key_expands_primary_packet_coverage + || role_is_new + || kept.is_empty() + || (claim_key.is_none() && file_is_new) + || secondary_claim_definition); + if kept.len() >= limits.max_anchors as usize + && packet_primary_definition_file_citation(&citation) + && replace_weaker_same_role_or_low_priority_citation( + &mut kept, + citation.clone(), + protected_citation_keys, + limits, + ) + { + rebuild_packet_cap_tracking(&kept, &mut files, &mut roles, &mut claim_keys); + continue; + } + if kept.len() >= limits.max_anchors as usize + && !low_priority_role + && role_is_new + && replace_overrepresented_role_citation( + &mut kept, + citation.clone(), + protected_citation_keys, + limits, + ) + { + rebuild_packet_cap_tracking(&kept, &mut files, &mut roles, &mut claim_keys); + continue; + } + if kept.len() < limits.max_anchors as usize + && expands_primary_packet_coverage + && packet_file_fits_limit(file.as_deref(), &files, limits.max_files) + { + if let Some(path) = file { + files.insert(path); + } + if let Some(role) = role { + roles.insert(role); + } + if let Some(ref claim_key) = claim_key { + claim_keys.insert(claim_key.clone()); + if secondary_claim_definition { + secondary_claim_keys.insert(claim_key.clone()); + } + } + kept.push(citation); + } else { + deferred.push(citation); + } + } + + let mut primary_new_files = Vec::new(); + let mut primary_duplicate_files = Vec::new(); + let mut low_priority_new_files = Vec::new(); + let mut low_priority_duplicate_files = Vec::new(); + for citation in deferred { + let file = citation.file_path.as_deref().map(packet_display_path); + let low_priority = packet_low_priority_cap_role(packet_evidence_role(&citation)); + if file.as_ref().is_some_and(|path| files.contains(path)) { + if low_priority { + low_priority_duplicate_files.push(citation); + } else { + primary_duplicate_files.push(citation); + } + } else if low_priority { + low_priority_new_files.push(citation); + } else { + primary_new_files.push(citation); + } + } + for citation in primary_new_files + .into_iter() + .chain(primary_duplicate_files) + .chain(low_priority_new_files) + .chain(low_priority_duplicate_files) + { + if kept.len() >= limits.max_anchors as usize { + continue; + } + let file = citation.file_path.as_deref().map(packet_display_path); + if !packet_file_fits_limit(file.as_deref(), &files, limits.max_files) { + continue; + } + if let Some(path) = file { + files.insert(path); + } + kept.push(citation); + } + + let truncated = kept.len() < original_len; + answer.citations = kept; + truncated +} + +pub(crate) fn packet_low_priority_cap_role(role: Option) -> bool { + role.is_some_and(PacketEvidenceRole::is_low_priority_cap_role) +} + +fn replace_weaker_same_role_or_low_priority_citation( + kept: &mut [AgentCitationDto], + candidate: AgentCitationDto, + protected_citation_keys: &HashSet, + limits: &PacketBudgetLimitsDto, +) -> bool { + let candidate_role = packet_evidence_role(&candidate); + let candidate_file = candidate.file_path.as_deref().map(packet_display_path); + let mut replacement: Option<(usize, u8, f32)> = None; + + for (index, existing) in kept.iter().enumerate() { + if protected_citation_keys.contains(&packet_citation_key(existing)) { + continue; + } + if !packet_file_fits_limit_after_replacement( + candidate_file.as_deref(), + kept, + index, + limits.max_files, + ) { + continue; + } + + let existing_role = packet_evidence_role(existing); + let replacement_priority = if packet_low_priority_cap_role(existing_role) { + 3 + } else if candidate_role.is_some() + && candidate_role == existing_role + && !packet_primary_definition_file_citation(existing) + { + 2 + } else { + 0 + }; + if replacement_priority == 0 { + continue; + } + + let existing_rank = existing.score; + let should_replace = replacement + .map(|(_, best_priority, best_rank)| { + replacement_priority > best_priority + || (replacement_priority == best_priority && existing_rank < best_rank) + }) + .unwrap_or(true); + if should_replace { + replacement = Some((index, replacement_priority, existing_rank)); + } + } + + let Some((index, _, _)) = replacement else { + return false; + }; + kept[index] = candidate; + true +} + +fn replace_overrepresented_role_citation( + kept: &mut [AgentCitationDto], + candidate: AgentCitationDto, + protected_citation_keys: &HashSet, + limits: &PacketBudgetLimitsDto, +) -> bool { + let Some(candidate_role) = packet_evidence_role(&candidate) else { + return false; + }; + if kept + .iter() + .any(|citation| packet_evidence_role(citation) == Some(candidate_role)) + { + return false; + } + let candidate_file = candidate.file_path.as_deref().map(packet_display_path); + let role_counts = kept.iter().filter_map(packet_evidence_role).fold( + HashMap::::new(), + |mut counts, role| { + *counts.entry(role).or_insert(0) += 1; + counts + }, + ); + + let mut replacement: Option<(usize, usize, f32)> = None; + for (index, existing) in kept.iter().enumerate() { + if protected_citation_keys.contains(&packet_citation_key(existing)) { + continue; + } + let Some(existing_role) = packet_evidence_role(existing) else { + continue; + }; + let existing_role_count = role_counts.get(&existing_role).copied().unwrap_or_default(); + if existing_role_count <= 1 { + continue; + } + if !packet_file_fits_limit_after_replacement( + candidate_file.as_deref(), + kept, + index, + limits.max_files, + ) { + continue; + } + let existing_rank = existing.score; + let should_replace = replacement + .map(|(_, best_count, best_rank)| { + existing_role_count > best_count + || (existing_role_count == best_count && existing_rank < best_rank) + }) + .unwrap_or(true); + if should_replace { + replacement = Some((index, existing_role_count, existing_rank)); + } + } + + let Some((index, _, _)) = replacement else { + return false; + }; + kept[index] = candidate; + true +} + +fn packet_file_fits_limit_after_replacement( + path: Option<&str>, + kept: &[AgentCitationDto], + replacement_index: usize, + max_files: u32, +) -> bool { + let files = kept + .iter() + .enumerate() + .filter(|(index, _)| *index != replacement_index) + .filter_map(|(_, citation)| citation.file_path.as_deref().map(packet_display_path)) + .collect::>(); + packet_file_fits_limit(path, &files, max_files) +} + +fn replace_weaker_duplicate_claim_citation( + kept: &mut [AgentCitationDto], + claim_key: &str, + candidate: AgentCitationDto, + protected_citation_keys: &HashSet, +) -> bool { + let Some(index) = kept.iter().position(|citation| { + packet_evidence_role(citation) + .map(|role| packet_claim_key_for_citation(role, citation) == claim_key) + .unwrap_or(false) + }) else { + return false; + }; + if protected_citation_keys.contains(&packet_citation_key(&kept[index])) { + return false; + } + if packet_prefer_duplicate_claim_citation(&candidate, &kept[index]) { + kept[index] = candidate; + return true; + } + false +} + +fn packet_prefer_duplicate_claim_citation( + candidate: &AgentCitationDto, + existing: &AgentCitationDto, +) -> bool { + if packet_prefer_flow_anchor_path_citation(candidate, existing) { + return true; + } + normalize_identifier(&candidate.display_name) == normalize_identifier(&existing.display_name) + && packet_exact_definition_file_citation(candidate) + && !packet_exact_definition_file_citation(existing) +} + +pub(crate) fn packet_primary_definition_file_citation(citation: &AgentCitationDto) -> bool { + packet_exact_definition_file_citation(citation) + || packet_near_stem_type_definition_file(citation) +} + +fn packet_near_stem_type_definition_file(citation: &AgentCitationDto) -> bool { + if citation.origin != SearchHitOrigin::IndexedSymbol + || !citation.resolvable + || !matches!( + citation.kind, + NodeKind::STRUCT + | NodeKind::CLASS + | NodeKind::INTERFACE + | NodeKind::UNION + | NodeKind::ENUM + | NodeKind::TYPEDEF + ) + { + return false; + } + let normalized_display = normalize_identifier(&citation.display_name); + if normalized_display.is_empty() + || packet_low_signal_display_name(normalized_display.as_str()) + || packet_exact_definition_file_citation(citation) + { + return false; + } + let stem = citation + .file_path + .as_deref() + .map(packet_display_path) + .and_then(|path| { + let file_name = path.rsplit('/').next().unwrap_or(path.as_str()); + file_name + .rsplit_once('.') + .map(|(stem, _)| stem.to_string()) + .or_else(|| Some(file_name.to_string())) + }) + .map(|stem| normalize_identifier(&stem)) + .unwrap_or_default(); + if stem.is_empty() { + return false; + } + + let len_delta = normalized_display.len().abs_diff(stem.len()); + if len_delta > 2 { + return false; + } + let shared_prefix = normalized_display + .chars() + .zip(stem.chars()) + .take_while(|(left, right)| left == right) + .count(); + shared_prefix >= 8 + && shared_prefix.saturating_mul(5) + >= normalized_display.len().min(stem.len()).saturating_mul(4) +} + +pub(crate) fn packet_prefer_flow_anchor_path_citation( + candidate: &AgentCitationDto, + existing: &AgentCitationDto, +) -> bool { + let candidate_path = candidate + .file_path + .as_deref() + .map(packet_display_path) + .unwrap_or_default() + .to_ascii_lowercase(); + let existing_path = existing + .file_path + .as_deref() + .map(packet_display_path) + .unwrap_or_default() + .to_ascii_lowercase(); + if candidate_path == existing_path { + return false; + } + let candidate_role = retrieval_file_role_from_path(&candidate_path); + let existing_role = retrieval_file_role_from_path(&existing_path); + candidate_role == crate::RetrievalFileRole::Source && existing_role.is_non_primary() +} + +pub(crate) fn packet_exact_definition_file_citation(citation: &AgentCitationDto) -> bool { + citation.origin == SearchHitOrigin::IndexedSymbol + && citation.resolvable + && matches!( + citation.kind, + NodeKind::STRUCT + | NodeKind::CLASS + | NodeKind::INTERFACE + | NodeKind::UNION + | NodeKind::ENUM + | NodeKind::TYPEDEF + ) + && !packet_low_signal_display_name(normalize_identifier(&citation.display_name).as_str()) + && packet_file_stem_matches_query(&citation.display_name, citation.file_path.as_deref()) +} + +fn packet_keep_secondary_claim_definition(_claim_key: &str, citation: &AgentCitationDto) -> bool { + if !packet_primary_definition_file_citation(citation) { + return false; + } + packet_mandatory_secondary_path_citation(citation) +} + +fn packet_mandatory_secondary_path_citation(citation: &AgentCitationDto) -> bool { + let path = citation + .file_path + .as_deref() + .map(packet_display_path) + .unwrap_or_default() + .to_ascii_lowercase(); + path.contains("event_processor") + || path.contains("_events") + || path.contains("-events") + || path.contains("/cli/") + || path.ends_with("/main.rs") +} + +fn rebuild_packet_cap_tracking( + kept: &[AgentCitationDto], + files: &mut HashSet, + roles: &mut HashSet, + claim_keys: &mut HashSet, +) { + files.clear(); + roles.clear(); + claim_keys.clear(); + for citation in kept { + if let Some(path) = citation.file_path.as_deref().map(packet_display_path) { + files.insert(path); + } + if let Some(role) = packet_evidence_role(citation) { + roles.insert(role); + claim_keys.insert(packet_claim_key_for_citation(role, citation)); + } + } +} + +fn packet_file_fits_limit(path: Option<&str>, files: &HashSet, max_files: u32) -> bool { + path.is_none_or(|path| files.contains(path) || files.len() < max_files as usize) +} + +const PACKET_FOCUS_NEIGHBORHOOD_CARRY_LIMIT: usize = 4; + +pub(crate) fn cap_packet_citations( + answer: &mut AgentAnswerDto, + limits: &PacketBudgetLimitsDto, + required_probe_queries: &[String], +) -> bool { + let mut protected_citation_keys = + promote_required_probe_citations(answer, required_probe_queries); + let focus_neighborhood_keys = + promote_focus_neighborhood_citations(answer, &protected_citation_keys); + protected_citation_keys.extend(focus_neighborhood_keys); + if protected_citation_keys.is_empty() { + cap_citations(answer, limits) + } else { + cap_citations_with_protected(answer, limits, &protected_citation_keys) + } +} + +pub(crate) fn promote_required_probe_citations( + answer: &mut AgentAnswerDto, + required_probe_queries: &[String], +) -> HashSet { + if required_probe_queries.is_empty() || answer.citations.is_empty() { + return HashSet::new(); + } + + let focus_roots = packet_command_focus_roots(&answer.citations); + let mut promoted_indices = Vec::new(); + for query in required_probe_queries { + if promoted_indices + .iter() + .any(|index| packet_citation_satisfies_required_probe(query, &answer.citations[*index])) + { + continue; + } + let mut best_match = None; + for (index, citation) in answer.citations.iter().enumerate() { + if promoted_indices.contains(&index) { + continue; + } + let Some(match_rank) = packet_citation_probe_match_rank(query, citation) else { + continue; + }; + if packet_display_name_is_import_literal(&citation.display_name.to_ascii_lowercase()) + && !packet_citation_satisfies_required_probe(query, citation) + { + continue; + } + if best_match + .map(|(best_index, best_rank)| { + packet_prefer_required_probe_match( + query, + citation, + match_rank, + &answer.citations[best_index], + best_rank, + &focus_roots, + ) + }) + .unwrap_or(true) + { + best_match = Some((index, match_rank)); + } + } + if let Some((index, _)) = best_match { + promoted_indices.push(index); + } + } + if promoted_indices.is_empty() { + return HashSet::new(); + } + + let protected_citation_keys = promoted_indices + .iter() + .map(|index| packet_citation_key(&answer.citations[*index])) + .collect::>(); + let promoted_index_set = promoted_indices.iter().copied().collect::>(); + let mut reordered = Vec::with_capacity(answer.citations.len()); + for index in promoted_indices { + reordered.push(answer.citations[index].clone()); + } + for (index, citation) in answer.citations.drain(..).enumerate() { + if !promoted_index_set.contains(&index) { + reordered.push(citation); + } + } + answer.citations = reordered; + answer.retrieval_trace.annotations.push(format!( + "packet_required_probe_citations promoted={} required={}", + promoted_index_set.len(), + required_probe_queries.join("|").replace('`', "'") + )); + protected_citation_keys +} + +pub(crate) fn promote_focus_neighborhood_citations( + answer: &mut AgentAnswerDto, + protected_citation_keys: &HashSet, +) -> HashSet { + if answer.citations.is_empty() { + return HashSet::new(); + } + let focus_roots = packet_command_focus_roots(&answer.citations); + if focus_roots.is_empty() { + return HashSet::new(); + } + let protected_file_paths = answer + .citations + .iter() + .filter(|citation| protected_citation_keys.contains(&packet_citation_key(citation))) + .filter_map(packet_citation_file_path_key) + .collect::>(); + + let mut ranked_candidates = answer + .citations + .iter() + .enumerate() + .filter(|(_, citation)| { + packet_focus_neighborhood_candidate( + citation, + &focus_roots, + protected_citation_keys, + &protected_file_paths, + ) + }) + .map(|(index, citation)| { + ( + index, + packet_focus_neighborhood_rank(citation, &focus_roots), + ) + }) + .collect::>(); + ranked_candidates.sort_by(|(left_index, left_rank), (right_index, right_rank)| { + right_rank + .cmp(left_rank) + .then_with(|| left_index.cmp(right_index)) + }); + + let mut promoted_indices = Vec::new(); + let mut promoted_file_paths = HashSet::new(); + for (index, _) in ranked_candidates { + let Some(path) = packet_citation_file_path_key(&answer.citations[index]) else { + continue; + }; + if !promoted_file_paths.insert(path) { + continue; + } + promoted_indices.push(index); + if promoted_indices.len() >= PACKET_FOCUS_NEIGHBORHOOD_CARRY_LIMIT { + break; + } + } + if promoted_indices.is_empty() { + return HashSet::new(); + } + + let promoted_index_set = promoted_indices.iter().copied().collect::>(); + let promoted_keys = promoted_indices + .iter() + .map(|index| packet_citation_key(&answer.citations[*index])) + .collect::>(); + let mut reordered = Vec::with_capacity(answer.citations.len()); + for citation in &answer.citations { + if protected_citation_keys.contains(&packet_citation_key(citation)) { + reordered.push(citation.clone()); + } + } + for index in promoted_indices { + reordered.push(answer.citations[index].clone()); + } + for (index, citation) in answer.citations.drain(..).enumerate() { + let key = packet_citation_key(&citation); + if !protected_citation_keys.contains(&key) && !promoted_index_set.contains(&index) { + reordered.push(citation); + } + } + answer.citations = reordered; + answer.retrieval_trace.annotations.push(format!( + "packet_focus_neighborhood_citations promoted={} roots={}", + promoted_keys.len(), + focus_roots + .iter() + .map(|root| root.root.as_str()) + .collect::>() + .join("|") + .replace('`', "'") + )); + promoted_keys +} + +fn packet_focus_neighborhood_candidate( + citation: &AgentCitationDto, + focus_roots: &[PacketCommandFocusRoot], + protected_citation_keys: &HashSet, + protected_file_paths: &HashSet, +) -> bool { + if protected_citation_keys.contains(&packet_citation_key(citation)) + || citation.origin != SearchHitOrigin::IndexedSymbol + || !citation.resolvable + || packet_display_name_is_import_literal(&citation.display_name.to_ascii_lowercase()) + || packet_display_name_is_test_like(&citation.display_name) + { + return false; + } + let path = citation + .file_path + .as_deref() + .map(packet_display_path) + .unwrap_or_default(); + if path.is_empty() || packet_citation_focus_root_score(citation, focus_roots) == 0 { + return false; + } + if protected_file_paths.contains(&path) { + return false; + } + !retrieval_file_role_from_path(&path.to_ascii_lowercase()).is_non_primary() +} + +fn packet_citation_file_path_key(citation: &AgentCitationDto) -> Option { + let path = citation.file_path.as_deref().map(packet_display_path)?; + if path.is_empty() { None } else { Some(path) } +} + +fn packet_focus_neighborhood_rank( + citation: &AgentCitationDto, + focus_roots: &[PacketCommandFocusRoot], +) -> (u8, u8, u8, u8, u8, u8, i32) { + let path = citation + .file_path + .as_deref() + .map(packet_display_path) + .unwrap_or_default(); + let source_file: u8 = if retrieval_file_role_from_path(&path.to_ascii_lowercase()) + == crate::RetrievalFileRole::Source + { + 1 + } else { + 0 + }; + let direct_root_file = packet_citation_direct_focus_root_file_score(citation, focus_roots); + let role_backed: u8 = if packet_evidence_role(citation).is_some() { + 1 + } else { + 0 + }; + let implementation_file: u8 = if packet_path_is_implementation(&path) { + 1 + } else { + 0 + }; + let definition_file: u8 = if packet_primary_definition_file_citation(citation) { + 1 + } else { + 0 + }; + ( + packet_citation_focus_root_score(citation, focus_roots), + direct_root_file, + packet_source_navigation_file_score(&path), + source_file, + role_backed, + implementation_file.saturating_add(definition_file), + (citation.score * 1000.0).round() as i32, + ) +} + +fn packet_citation_direct_focus_root_file_score( + citation: &AgentCitationDto, + focus_roots: &[PacketCommandFocusRoot], +) -> u8 { + let path = citation + .file_path + .as_deref() + .map(packet_display_path) + .unwrap_or_default() + .replace('\\', "/"); + let parent = path.rsplit_once('/').map(|(parent, _)| parent); + focus_roots + .iter() + .filter(|root| parent == Some(root.root.as_str())) + .map(|root| root.weight) + .max() + .unwrap_or_default() +} + +fn packet_source_navigation_file_score(path: &str) -> u8 { + let normalized = packet_display_path(path).replace('\\', "/"); + let file_name = normalized.rsplit('/').next().unwrap_or(normalized.as_str()); + let stem = file_name + .rsplit_once('.') + .map(|(stem, _)| stem) + .unwrap_or(file_name) + .to_ascii_lowercase(); + match stem.as_str() { + "cli" | "cmd" | "command" | "commands" => 4, + "lib" | "mod" | "index" => 3, + "events" | "event" => 2, + "main" | "app" | "server" | "router" | "routes" => 2, + "handler" | "handlers" | "entrypoint" | "entrypoints" => 1, + _ if stem.ends_with("_events") + || stem.ends_with("_event") + || stem.ends_with("-events") + || stem.ends_with("-event") => + { + 2 + } + _ => 0, + } +} + +fn packet_prefer_required_probe_match( + query: &str, + candidate: &AgentCitationDto, + candidate_rank: u8, + existing: &AgentCitationDto, + existing_rank: u8, + focus_roots: &[PacketCommandFocusRoot], +) -> bool { + if !query_mentions_non_primary_source(query) { + let candidate_test_like = packet_display_name_is_test_like(&candidate.display_name); + let existing_test_like = packet_display_name_is_test_like(&existing.display_name); + if candidate_test_like != existing_test_like { + return !candidate_test_like; + } + } + if candidate_rank != existing_rank { + return candidate_rank > existing_rank; + } + if !packet_required_probe_needs_exact_match(query) { + let candidate_focus = packet_citation_focus_root_score(candidate, focus_roots); + let existing_focus = packet_citation_focus_root_score(existing, focus_roots); + if candidate_focus != existing_focus { + return candidate_focus > existing_focus; + } + let candidate_token_coverage = packet_citation_probe_token_coverage(query, candidate); + let existing_token_coverage = packet_citation_probe_token_coverage(query, existing); + if candidate_token_coverage != existing_token_coverage { + return candidate_token_coverage > existing_token_coverage; + } + } + if packet_prefer_flow_anchor_path_citation(candidate, existing) { + return true; + } + if packet_required_probe_prefers_implementation(query) + && packet_prefer_implementation_file(candidate, existing) + { + return true; + } + packet_exact_definition_file_citation(candidate) + && !packet_exact_definition_file_citation(existing) +} + +fn packet_required_probe_prefers_implementation(query: &str) -> bool { + query.contains("::") || query.contains('.') +} + +fn packet_prefer_implementation_file( + candidate: &AgentCitationDto, + existing: &AgentCitationDto, +) -> bool { + let candidate_path = candidate + .file_path + .as_deref() + .map(packet_display_path) + .unwrap_or_default(); + let existing_path = existing + .file_path + .as_deref() + .map(packet_display_path) + .unwrap_or_default(); + packet_path_is_implementation(&candidate_path) && !packet_path_is_implementation(&existing_path) +} + +fn packet_path_is_implementation(path: &str) -> bool { + let lower = path.to_ascii_lowercase(); + matches!( + lower.rsplit('.').next(), + Some( + "c" | "cc" + | "cpp" + | "cxx" + | "go" + | "java" + | "js" + | "jsx" + | "kt" + | "php" + | "py" + | "rb" + | "rs" + | "ts" + | "tsx" + ) + ) +} + +#[derive(Debug, Clone, PartialEq, Eq)] +struct PacketCommandFocusRoot { + root: String, + weight: u8, +} + +fn packet_command_focus_roots(citations: &[AgentCitationDto]) -> Vec { + let mut roots = Vec::::new(); + for citation in citations { + let display = citation.display_name.as_str(); + let normalized_display = normalize_identifier(display); + let path = citation + .file_path + .as_deref() + .map(packet_display_path) + .unwrap_or_default(); + let Some(root) = packet_source_root_from_path(&path) else { + continue; + }; + let normalized_path = path.replace('\\', "/"); + let weight = + if normalized_display.ends_with("runmain") || normalized_display.contains("runexec") { + 3 + } else if display.contains("::Cli") + || display.contains("::cli") + || normalized_path.ends_with("/src/cli.rs") + || (normalized_path.ends_with("/main.rs") && normalized_display == "main") + { + 2 + } else if display.contains("Subcommand::") { + 1 + } else { + continue; + }; + packet_push_focus_root(&mut roots, root, weight); + } + roots.sort_by(|left, right| { + right + .weight + .cmp(&left.weight) + .then_with(|| left.root.cmp(&right.root)) + }); + roots +} + +fn packet_push_focus_root(roots: &mut Vec, root: String, weight: u8) { + if let Some(existing) = roots.iter_mut().find(|existing| existing.root == root) { + existing.weight = existing.weight.max(weight); + } else { + roots.push(PacketCommandFocusRoot { root, weight }); + } +} + +fn packet_source_root_from_path(path: &str) -> Option { + let normalized = packet_display_path(path); + let normalized = normalized.trim_matches('/').replace('\\', "/"); + if normalized.is_empty() { + return None; + } + if let Some(index) = normalized.find("/src/") { + let root = &normalized[..index + "/src".len()]; + return (!root.is_empty()).then(|| root.to_string()); + } + let (parent, _) = normalized.rsplit_once('/')?; + (!parent.is_empty()).then(|| parent.to_string()) +} + +fn packet_citation_focus_root_score( + citation: &AgentCitationDto, + focus_roots: &[PacketCommandFocusRoot], +) -> u8 { + let path = citation + .file_path + .as_deref() + .map(packet_display_path) + .unwrap_or_default() + .replace('\\', "/"); + focus_roots + .iter() + .filter(|root| path == root.root || path.starts_with(&format!("{}/", root.root))) + .map(|root| root.weight) + .max() + .unwrap_or_default() +} diff --git a/crates/codestory-runtime/src/agent/packet_citations.rs b/crates/codestory-runtime/src/agent/packet_citations.rs new file mode 100644 index 00000000..ad5f03eb --- /dev/null +++ b/crates/codestory-runtime/src/agent/packet_citations.rs @@ -0,0 +1,93 @@ +use crate::agent::packet_scoring::{normalize_identifier, packet_display_path}; +use codestory_contracts::api::AgentCitationDto; + +pub(crate) fn packet_citation_matching_display<'a>( + citations: &'a [AgentCitationDto], + display_needle: &str, +) -> Option<&'a AgentCitationDto> { + let needle = normalize_identifier(display_needle); + citations + .iter() + .find(|citation| normalize_identifier(&citation.display_name) == needle) +} + +pub(crate) fn packet_citation_matching_display_contains<'a>( + citations: &'a [AgentCitationDto], + display_needle: &str, +) -> Option<&'a AgentCitationDto> { + let needle = normalize_identifier(display_needle); + citations + .iter() + .find(|citation| normalize_identifier(&citation.display_name).contains(&needle)) +} + +pub(crate) fn packet_citation_matching_path_and_display<'a>( + citations: &'a [AgentCitationDto], + path_needle: &str, + display_needle: &str, +) -> Option<&'a AgentCitationDto> { + let normalized_path_needle = normalize_identifier(path_needle); + let normalized_display_needle = normalize_identifier(display_needle); + citations.iter().find(|citation| { + let path_match = citation + .file_path + .as_deref() + .map(packet_display_path) + .map(|path| normalize_identifier(&path).contains(&normalized_path_needle)) + .unwrap_or(false); + path_match + && normalize_identifier(&citation.display_name).contains(&normalized_display_needle) + }) +} + +pub(crate) fn packet_command_crate_sources_contain_all( + citations: &[AgentCitationDto], + crate_segment: &str, + groups: &[&[&str]], +) -> bool { + let mut combined = String::new(); + for citation in citations + .iter() + .filter(|citation| packet_citation_path_contains_crate_segment(citation, crate_segment)) + { + let Some(source) = packet_citation_source_text(citation) else { + continue; + }; + combined.push_str(&source.to_ascii_lowercase()); + combined.push('\n'); + } + !combined.is_empty() + && groups.iter().all(|terms| { + terms + .iter() + .any(|term| combined.contains(&term.to_ascii_lowercase())) + }) +} + +pub(crate) fn packet_citation_path_contains_crate_segment( + citation: &AgentCitationDto, + crate_segment: &str, +) -> bool { + let crate_segment = normalize_identifier(crate_segment); + if crate_segment.is_empty() { + return false; + } + citation + .file_path + .as_deref() + .map(|path| { + let raw = path.trim_start_matches("\\\\?\\").replace('\\', "/"); + let display = packet_display_path(path).replace('\\', "/"); + format!("{raw}\n{display}").to_ascii_lowercase() + }) + .map(|path| { + let needle = format!("/{crate_segment}/src/"); + path.contains(&needle) + }) + .unwrap_or(false) +} + +pub(crate) fn packet_citation_source_text(citation: &AgentCitationDto) -> Option { + let path = citation.file_path.as_deref()?; + std::fs::read_to_string(path).ok() +} diff --git a/crates/codestory-runtime/src/agent/packet_claim_profiles.rs b/crates/codestory-runtime/src/agent/packet_claim_profiles.rs new file mode 100644 index 00000000..f12ab882 --- /dev/null +++ b/crates/codestory-runtime/src/agent/packet_claim_profiles.rs @@ -0,0 +1,1566 @@ +use crate::agent::eval_probes::eval_probes_enabled; +use crate::agent::packet_citations::packet_citation_source_text; +use crate::agent::packet_evidence_roles::PacketEvidenceRole; +use crate::agent::packet_scoring::{normalize_identifier, packet_display_path}; +use crate::agent::packet_source_patterns::{ + packet_display_owner, packet_human_join, packet_source_constructed_type, packet_source_has_all, + packet_source_has_any, packet_source_identifier_ending_with, packet_source_identifier_exact, + packet_source_identifier_with_words, packet_source_identifier_with_words_shortest, + packet_sql_create_table_names, packet_sql_foreign_key_claims, +}; +use crate::agent::packet_terms::{ + packet_probe_terms, packet_terms_indicate_client_send_flow, + packet_terms_indicate_event_loop_command_flow, packet_terms_indicate_hook_cache_flow, + packet_terms_indicate_request_dispatch_flow, packet_terms_indicate_runtime_formatting_flow, + packet_terms_indicate_search_execution_flow, packet_terms_indicate_server_route_dispatch_flow, + packet_terms_indicate_shell_version_use_flow, packet_terms_indicate_sql_schema_flow, + packet_terms_indicate_string_predicate_flow, packet_terms_indicate_stylesheet_animation_flow, + packet_terms_indicate_url_session_request_flow, +}; +use codestory_contracts::api::AgentCitationDto; +use std::collections::HashSet; + +const GENERIC_PRODUCT_CLAIM_PROFILES: &[SourceClaimProfile] = &[ + SourceClaimProfile::ShellVersionUse, + SourceClaimProfile::StringPredicate, + SourceClaimProfile::StylesheetAnimation, + SourceClaimProfile::SqlSchema, + SourceClaimProfile::RuntimeFormatting, +]; + +const EVAL_DIAGNOSTIC_CLAIM_PROFILES: &[SourceClaimProfile] = &[ + SourceClaimProfile::ServerRoute, + SourceClaimProfile::HookCache, + SourceClaimProfile::ClientSend, + SourceClaimProfile::UrlSessionRequest, + SourceClaimProfile::ClientRequestDispatch, + SourceClaimProfile::EventLoopCommand, + SourceClaimProfile::SearchExecution, +]; + +#[derive(Debug, Clone, Copy)] +enum SourceClaimProfile { + ServerRoute, + ShellVersionUse, + HookCache, + ClientSend, + UrlSessionRequest, + StringPredicate, + StylesheetAnimation, + SqlSchema, + RuntimeFormatting, + ClientRequestDispatch, + EventLoopCommand, + SearchExecution, +} + +impl SourceClaimProfile { + fn collect(self, ctx: &SourceClaimContext<'_>, claims: &mut Vec) { + match self { + Self::ServerRoute => { + if packet_terms_indicate_server_route_dispatch_flow(&ctx.prompt_terms) { + claims.extend(packet_generic_server_route_flow_claims( + ctx.symbol, ctx.source, + )); + } + } + Self::ShellVersionUse => { + if packet_terms_indicate_shell_version_use_flow(&ctx.prompt_terms) { + claims.extend(packet_generic_shell_version_use_flow_claims( + ctx.symbol, ctx.source, + )); + } + } + Self::HookCache => { + if packet_terms_indicate_hook_cache_flow(&ctx.prompt_terms) { + claims.extend(packet_generic_hook_cache_flow_claims( + ctx.symbol, ctx.source, + )); + } + } + Self::ClientSend => { + if packet_terms_indicate_client_send_flow(&ctx.prompt_terms) { + claims.extend(packet_generic_client_send_flow_claims( + ctx.symbol, ctx.source, + )); + } + } + Self::UrlSessionRequest => { + if packet_terms_indicate_url_session_request_flow(&ctx.prompt_terms) { + claims.extend(packet_generic_url_session_request_flow_claims( + ctx.symbol, ctx.source, + )); + } + } + Self::StringPredicate => { + if packet_terms_indicate_string_predicate_flow(&ctx.prompt_terms) { + claims.extend(packet_generic_string_predicate_flow_claims( + ctx.symbol, ctx.source, + )); + } + } + Self::StylesheetAnimation => { + if packet_terms_indicate_stylesheet_animation_flow(&ctx.prompt_terms) { + claims.extend(packet_generic_css_animation_flow_claims(ctx.source)); + } + } + Self::SqlSchema => { + if packet_terms_indicate_sql_schema_flow(&ctx.prompt_terms) { + claims.extend(packet_generic_sql_schema_flow_claims(ctx.source)); + } + } + Self::RuntimeFormatting => { + if packet_terms_indicate_runtime_formatting_flow(&ctx.prompt_terms) { + claims.extend(packet_generic_runtime_formatting_flow_claims(ctx.source)); + } + } + Self::ClientRequestDispatch => collect_client_request_dispatch_claims(ctx, claims), + Self::EventLoopCommand => collect_event_loop_command_claims(ctx, claims), + Self::SearchExecution => collect_search_execution_claims(ctx, claims), + } + } +} + +struct SourceClaimContext<'a> { + source: &'a str, + symbol: &'a str, + file_name: String, + normalized_prompt: String, + prompt_terms: Vec, +} + +impl<'a> SourceClaimContext<'a> { + fn new(prompt: &str, citation: &'a AgentCitationDto, source: &'a str) -> Self { + let symbol = citation.display_name.as_str(); + let path = citation + .file_path + .as_deref() + .map(packet_display_path) + .unwrap_or_default(); + let file_name = path + .rsplit(['/', '\\']) + .next() + .filter(|name| !name.is_empty()) + .unwrap_or(symbol) + .to_string(); + Self { + source, + symbol, + file_name, + normalized_prompt: normalize_identifier(prompt), + prompt_terms: packet_probe_terms(prompt), + } + } +} + +pub(crate) fn packet_source_derived_claims_for_citation( + prompt: &str, + citation: &AgentCitationDto, + source: &str, +) -> Vec { + let mut claims = Vec::new(); + let eval_diagnostics = eval_probes_enabled(); + let ctx = SourceClaimContext::new(prompt, citation, source); + + if eval_diagnostics { + claims.extend( + crate::agent::eval_probes::source_derived_claims_for_citation(prompt, citation, source), + ); + for profile in EVAL_DIAGNOSTIC_CLAIM_PROFILES { + profile.collect(&ctx, &mut claims); + } + } + + for profile in GENERIC_PRODUCT_CLAIM_PROFILES { + profile.collect(&ctx, &mut claims); + } + + claims +} + +pub(crate) fn packet_source_derived_claim_for_role( + role: PacketEvidenceRole, + citation: &AgentCitationDto, + prompt: &str, +) -> Option { + let source = packet_citation_source_text(citation)?; + if source.len() > 800_000 { + return None; + } + let ctx = SourceClaimContext::new(prompt, citation, &source); + let request_flow = packet_terms_indicate_request_dispatch_flow(&ctx.prompt_terms); + let command_flow = packet_terms_indicate_event_loop_command_flow(&ctx.prompt_terms); + let search_flow = packet_terms_indicate_search_execution_flow(&ctx.prompt_terms); + let eval_diagnostics = eval_probes_enabled(); + + if eval_diagnostics && request_flow { + if role == PacketEvidenceRole::ClientFactory + && let Some(claim) = client_factory_claim(&ctx) + { + return Some(claim); + } + if let Some(claim) = client_request_pipeline_claim(&ctx) { + return Some(claim); + } + if role == PacketEvidenceRole::RequestDispatch + && let Some(claim) = request_dispatch_claim(&ctx) + { + return Some(claim); + } + if role == PacketEvidenceRole::InterceptorManagement + && let Some(claim) = interceptor_management_claim(&ctx) + { + return Some(claim); + } + if role == PacketEvidenceRole::TransportAdapter + && let Some(claim) = transport_adapter_claim(&ctx) + { + return Some(claim); + } + } + + if eval_diagnostics && command_flow && event_loop_prompt(&ctx) { + if let Some(claim) = event_loop_entry_claim(&ctx) { + return Some(claim); + } + if let Some(claim) = event_loop_process_events_claim(&ctx) { + return Some(claim); + } + } + + if eval_diagnostics + && command_flow + && role == PacketEvidenceRole::NetworkCommandInput + && let Some(claim) = network_command_input_claim(&ctx) + { + return Some(claim); + } + + if eval_diagnostics && command_flow && role == PacketEvidenceRole::CommandDispatch { + if let Some(claim) = command_dispatch_table_claim(&ctx) { + return Some(claim); + } + if let Some(claim) = command_dispatch_call_claim(&ctx) { + return Some(claim); + } + } + + if eval_diagnostics + && search_flow + && role == PacketEvidenceRole::SearchDriver + && let Some(claim) = search_driver_claim(&ctx) + { + return Some(claim); + } + + if eval_diagnostics + && search_flow + && role == PacketEvidenceRole::ArgumentPlanning + && let Some(claim) = argument_planning_claim(&ctx) + { + return Some(claim); + } + + if eval_diagnostics + && search_flow + && role == PacketEvidenceRole::SearchExecutionUnit + && let Some(claim) = search_execution_state_claim(&ctx) + { + return Some(claim); + } + + if eval_diagnostics + && search_flow + && let Some(claim) = search_walk_claim(&ctx) + { + return Some(claim); + } + + if eval_diagnostics + && search_flow + && let Some(claim) = parallel_search_claim(&ctx) + { + return Some(claim); + } + + if eval_diagnostics + && search_flow + && let Some(claim) = search_execution_method_claim(&ctx) + { + return Some(claim); + } + + None +} + +fn push_optional_claim(claims: &mut Vec, claim: Option) { + if let Some(claim) = claim { + claims.push(claim); + } +} + +fn collect_client_request_dispatch_claims(ctx: &SourceClaimContext<'_>, claims: &mut Vec) { + if !packet_terms_indicate_request_dispatch_flow(&ctx.prompt_terms) { + return; + } + + push_optional_claim(claims, client_factory_claim(ctx)); + push_optional_claim(claims, client_request_pipeline_claim(ctx)); + push_optional_claim(claims, request_dispatch_claim(ctx)); + push_optional_claim(claims, interceptor_management_claim(ctx)); + push_optional_claim(claims, transport_adapter_claim(ctx)); +} + +fn client_factory_claim(ctx: &SourceClaimContext<'_>) -> Option { + if packet_source_has_all(ctx.source, &["new ", "prototype", "request", "extend"]) { + let context = packet_source_constructed_type(ctx.source).unwrap_or_else(|| "client".into()); + return Some(format!( + "`{}` wraps a {context} context and exposes verb helpers bound to request.", + ctx.symbol + )); + } + None +} + +fn client_request_pipeline_claim(ctx: &SourceClaimContext<'_>) -> Option { + if packet_source_has_all(ctx.source, &["merge", "config", "interceptors", "request"]) + && packet_source_has_any(ctx.source, &["dispatch", "adapter"]) + && let Some(owner) = packet_display_owner(ctx.symbol) + { + let dispatch = packet_source_identifier_with_words(ctx.source, &["dispatch", "request"]) + .unwrap_or_else(|| "request dispatch".to_string()); + return Some(format!( + "{owner}.request merges defaults, runs request interceptors, then calls {dispatch}." + )); + } + None +} + +fn request_dispatch_claim(ctx: &SourceClaimContext<'_>) -> Option { + if packet_source_has_all(ctx.source, &["adapter", "transform"]) + && packet_source_has_any(ctx.source, &["headers", "data", "body"]) + { + return Some(format!( + "`{}` transforms the body/headers and invokes the configured adapter.", + ctx.symbol + )); + } + None +} + +fn interceptor_management_claim(ctx: &SourceClaimContext<'_>) -> Option { + if packet_source_has_all(ctx.source, &["handlers", "fulfilled", "rejected"]) { + return Some(format!( + "`{}` stores interceptor pairs used by the promise chain in request.", + ctx.symbol + )); + } + None +} + +fn transport_adapter_claim(ctx: &SourceClaimContext<'_>) -> Option { + if packet_source_has_all(ctx.source, &["adapter"]) + && packet_source_has_all(ctx.source, &["xhr", "http"]) + && packet_source_has_any(ctx.source, &["known", "environment", "platform"]) + { + return Some(format!( + "`{}` selects xhr or http transport based on environment capabilities.", + ctx.file_name + )); + } + None +} + +fn collect_event_loop_command_claims(ctx: &SourceClaimContext<'_>, claims: &mut Vec) { + if !packet_terms_indicate_event_loop_command_flow(&ctx.prompt_terms) { + return; + } + + if event_loop_prompt(ctx) { + push_optional_claim(claims, event_loop_entry_claim(ctx)); + push_optional_claim(claims, event_loop_process_events_claim(ctx)); + } + + push_optional_claim(claims, network_command_input_claim(ctx)); + push_optional_claim(claims, command_dispatch_table_claim(ctx)); + push_optional_claim(claims, command_dispatch_call_claim(ctx)); +} + +fn event_loop_prompt(ctx: &SourceClaimContext<'_>) -> bool { + ctx.normalized_prompt.contains("eventloop") + || (ctx.normalized_prompt.contains("event") && ctx.normalized_prompt.contains("loop")) +} + +fn event_loop_entry_claim(ctx: &SourceClaimContext<'_>) -> Option { + if packet_source_has_all(ctx.source, &["init", "event"]) + && let Some(loop_entry) = packet_source_identifier_ending_with(ctx.source, "Main", "main") + && packet_source_identifier_exact(ctx.source, "main").is_some() + { + return Some(format!( + "main initializes the server and enters {loop_entry} on the shared event loop." + )); + } + None +} + +fn event_loop_process_events_claim(ctx: &SourceClaimContext<'_>) -> Option { + if let Some(process_events) = + packet_source_identifier_with_words(ctx.source, &["process", "events"]) + && packet_source_has_any(ctx.source, &["readable", "writable"]) + { + return Some(format!( + "{process_events} polls readable/writable fds and invokes registered file event handlers." + )); + } + None +} + +fn network_command_input_claim(ctx: &SourceClaimContext<'_>) -> Option { + if let Some(read_client) = packet_source_identifier_with_words(ctx.source, &["read", "client"]) + && let Some(process_input) = + packet_source_identifier_with_words(ctx.source, &["process", "input", "buffer"]) + { + return Some(format!( + "{read_client} appends socket input and drives {process_input} when a full command is available." + )); + } + None +} + +fn command_dispatch_table_claim(ctx: &SourceClaimContext<'_>) -> Option { + if let Some(process_command) = + packet_source_identifier_with_words(ctx.source, &["process", "command"]) + && packet_source_has_any(ctx.source, &["lookup", "arity", "acl", "cluster"]) + { + return Some(format!( + "{process_command} resolves the command table entry and enforces ACL, arity, and cluster checks." + )); + } + None +} + +fn command_dispatch_call_claim(ctx: &SourceClaimContext<'_>) -> Option { + if let Some(call) = packet_source_identifier_exact(ctx.source, "call") + && packet_source_has_all(ctx.source, &["proc", "propagat"]) + && packet_source_has_any(ctx.source, &["slowlog", "monitor"]) + { + return Some(format!( + "{call} executes the command proc and handles propagation, monitoring, and slowlog accounting." + )); + } + None +} + +fn collect_search_execution_claims(ctx: &SourceClaimContext<'_>, claims: &mut Vec) { + if !packet_terms_indicate_search_execution_flow(&ctx.prompt_terms) { + return; + } + + push_optional_claim(claims, search_driver_claim(ctx)); + push_optional_claim(claims, argument_planning_claim(ctx)); + push_optional_claim(claims, search_execution_state_claim(ctx)); + push_optional_claim(claims, search_walk_claim(ctx)); + push_optional_claim(claims, parallel_search_claim(ctx)); + push_optional_claim(claims, search_execution_method_claim(ctx)); +} + +fn search_driver_claim(ctx: &SourceClaimContext<'_>) -> Option { + if packet_source_has_all(ctx.source, &["flags", "parse", "search"]) + && let Some(main) = packet_source_identifier_exact(ctx.source, "main") + { + let run = packet_source_identifier_exact(ctx.source, "run").unwrap_or_else(|| "run".into()); + return Some(format!( + "{main} delegates parsed search options into {run} for search execution." + )); + } + None +} + +fn argument_planning_claim(ctx: &SourceClaimContext<'_>) -> Option { + if packet_source_has_all(ctx.source, &["walk", "matcher", "searcher", "printer"]) { + let owner = packet_display_owner(ctx.symbol) + .or_else(|| packet_source_identifier_with_words_shortest(ctx.source, &["args"])) + .unwrap_or_else(|| ctx.symbol.to_string()); + return Some(format!( + "`{owner}` builds traversal, matching, search, and output components used by the search pipeline." + )); + } + None +} + +fn search_execution_state_claim(ctx: &SourceClaimContext<'_>) -> Option { + if packet_source_has_all(ctx.source, &["matcher", "searcher", "printer"]) + && packet_source_has_any(ctx.source, &["candidate", "file", "input", "path"]) + { + let execution_unit = + packet_source_identifier_with_words_shortest(ctx.source, &["search", "worker"]) + .unwrap_or_else(|| ctx.symbol.to_string()); + return Some(format!( + "`{execution_unit}` carries matching, search, and output state for each candidate input." + )); + } + None +} + +fn search_walk_claim(ctx: &SourceClaimContext<'_>) -> Option { + if packet_source_has_all(ctx.source, &["searcher", "search"]) + && packet_source_has_any(ctx.source, &["candidate", "file", "path", "walk"]) + && let Some(execution_unit) = + packet_source_identifier_with_words_shortest(ctx.source, &["search", "worker"]) + { + return Some(format!( + "candidate traversal invokes {execution_unit} for each file selected by the search walk." + )); + } + None +} + +fn parallel_search_claim(ctx: &SourceClaimContext<'_>) -> Option { + if packet_source_has_any(ctx.source, &["parallel", "concurrent", "rayon", "thread"]) + && packet_source_has_any(ctx.source, &["candidate", "file", "path", "walk"]) + && let Some(parallel_search) = + packet_source_identifier_with_words_shortest(ctx.source, &["search", "parallel"]) + { + return Some(format!( + "{parallel_search} uses parallel traversal to search candidate files concurrently." + )); + } + None +} + +fn search_execution_method_claim(ctx: &SourceClaimContext<'_>) -> Option { + if packet_source_has_all(ctx.source, &["matcher", "searcher", "printer"]) + && let Some(execution_unit) = + packet_source_identifier_with_words_shortest(ctx.source, &["search", "worker"]) + && let Some(search_method) = packet_source_identifier_exact(ctx.source, "search") + { + return Some(format!( + "{execution_unit}::{search_method} executes one candidate search with matching, search, and output state." + )); + } + None +} + +fn packet_generic_hook_cache_flow_claims(symbol: &str, source: &str) -> Vec { + let source_lower = source.to_ascii_lowercase(); + let mut claims = Vec::new(); + + if source_lower.contains("withargs") + && source_lower.contains("export default") + && let Some((public_hook, handler)) = packet_source_with_args_wrapper(source) + { + claims.push(format!( + "The public {public_hook} export wraps {handler} with argument normalization." + )); + } + + if source_lower.contains("serialize(_key)") + && (source_lower.contains("getcache") + || source_lower.contains("createcachehelper") + || source_lower.contains("cache")) + { + claims.push(format!( + "{symbol} serializes the key before reading cache state." + )); + } + + if source_lower.contains("cache.get(key)") + && source_lower.contains("return [") + && (source_lower.contains("cache.set(key") + || source_lower.contains("state[5]") + || source_lower.contains("setter")) + && (source_lower.contains("subscribe") + || source_lower.contains("state[6]") + || source_lower.contains("subscriber")) + && (source_lower.contains("snapshot") + || source_lower.contains("initial_cache") + || source_lower.contains("initial cache")) + { + claims.push(format!( + "{symbol} provides cache get, set, subscribe, and snapshot helpers." + )); + } + + claims +} + +fn packet_generic_client_send_flow_claims(symbol: &str, source: &str) -> Vec { + let source_lower = source.to_ascii_lowercase(); + let mut claims = Vec::new(); + let owner = packet_display_owner(symbol).unwrap_or_else(|| symbol.to_string()); + + if source_lower.contains("_sendunstreamed") + && source_lower.contains("response.fromstream") + && source_lower.contains("send(request)") + && (source_lower.contains("future") + || source_lower.contains("response>") + || source_lower.contains("response ")) + && packet_source_has_any(source, &["get(", "post(", "put(", "patch(", "delete("]) + { + claims.push(format!( + "{owner} implements convenience methods in terms of send." + )); + } + + if source_lower.contains("dart:io") + && source_lower.contains("httpclient") + && source_lower.contains("openurl") + && source_lower.contains("request.finalize") + && source_lower.contains("stream.pipe") + && source_lower.contains("httpclientresponse") + { + claims.push(format!( + "{owner}.send forwards finalized requests through an HTTP client transport." + )); + } + + claims +} + +fn packet_generic_url_session_request_flow_claims(symbol: &str, source: &str) -> Vec { + let normalized_symbol = normalize_identifier(symbol); + let source_lower = source.to_ascii_lowercase(); + let mut claims = Vec::new(); + + if (normalized_symbol == "session" || normalized_symbol.ends_with("sessionrequest")) + && source_lower.contains("open func request") + && source_lower.contains("let request =") + && source_lower.contains("performeagerlyifnecessary") + { + claims.push( + "Session request creation builds request objects before optional eager execution." + .to_string(), + ); + } + + if normalized_symbol.ends_with("requestresume") + && source_lower.contains("public func resume() -> self") + && source_lower.contains("task.resume()") + { + claims.push("Request.resume resumes the underlying request task.".to_string()); + } + + if normalized_symbol.ends_with("validate") + && source_lower.contains("public func validate(_ validation") + && source_lower.contains("validators.write") + && source_lower.contains("didvalidate") + && source_lower.contains("request") + { + claims.push("Request validation attaches validation behavior.".to_string()); + } + + if normalized_symbol.ends_with("delegate") + && source_lower.contains("urlsessiondatadelegate") + && source_lower.contains("open func urlsession") + && (source_lower.contains("request.didreceiveresponse") + || source_lower.contains("request.didreceive(data: data)") + || source_lower.contains("didcompletewitherror")) + { + claims.push("The session delegate receives request callback events.".to_string()); + } + + claims +} + +pub(crate) fn packet_generic_string_predicate_flow_claims( + symbol: &str, + source: &str, +) -> Vec { + let normalized_symbol = normalize_identifier(symbol); + let source_lower = source.to_ascii_lowercase(); + let mut claims = Vec::new(); + + if normalized_symbol.ends_with("isblank") + && let Some(method) = packet_source_method_block(source, "boolean", "isBlank") + { + let method_lower = method.to_ascii_lowercase(); + let null_empty_whitespace_documented = source_lower.contains("null, empty or whitespace") + || source_lower.contains("null, empty, or whitespace") + || source_lower.contains("null, empty and whitespace"); + if method_lower.contains("character.iswhitespace") + && (method_lower.contains("null") || null_empty_whitespace_documented) + && method_lower.contains("length") + { + claims.push( + "isBlank treats null, empty, and whitespace-only inputs as blank.".to_string(), + ); + } + } + + if normalized_symbol.ends_with("isempty") + && let Some(method) = packet_source_method_block(source, "boolean", "isEmpty") + { + let method_lower = method.to_ascii_lowercase(); + if method_lower.contains("null") + && method_lower.contains("length()") + && !method_lower.contains("trim(") + && !method_lower.contains(".trim") + && !method_lower.contains("strip(") + && !method_lower.contains(".strip") + { + claims.push("isEmpty does not trim whitespace before deciding emptiness.".to_string()); + } + } + + claims +} + +fn packet_source_method_block( + source: &str, + return_type: &str, + method_name: &str, +) -> Option { + let lower = source.to_ascii_lowercase(); + let method_lower = method_name.to_ascii_lowercase(); + let return_lower = return_type.to_ascii_lowercase(); + let patterns = [ + format!("{return_lower} {method_lower}("), + format!("{return_lower}\n{method_lower}("), + ]; + let method_start = patterns + .iter() + .filter_map(|pattern| lower.find(pattern)) + .min()?; + let brace_start = lower[method_start..].find('{')? + method_start; + let bytes = source.as_bytes(); + let mut depth = 0usize; + for index in brace_start..bytes.len() { + match bytes[index] { + b'{' => depth += 1, + b'}' => { + depth = depth.saturating_sub(1); + if depth == 0 { + return Some(source[method_start..=index].to_string()); + } + } + _ => {} + } + } + None +} + +pub(crate) fn packet_generic_css_animation_flow_claims(source: &str) -> Vec { + let mut claims = Vec::new(); + let custom_properties = packet_css_custom_property_names(source); + let duration = packet_css_custom_property_with_fragment(&custom_properties, "duration"); + let delay = packet_css_custom_property_with_fragment(&custom_properties, "delay"); + let repeat = packet_css_custom_property_with_fragment(&custom_properties, "repeat"); + + if let (Some(duration), Some(delay), Some(repeat)) = (duration, delay, repeat) { + claims.push(format!( + "Shared CSS custom properties {duration}, {delay}, and {repeat} define animation duration, delay, and repeat defaults." + )); + } + + if let Some(base_class) = + packet_css_class_with_properties(source, &["animation-duration", "animation-fill-mode"]) + { + claims.push(format!( + ".{base_class} is the base class that applies animation duration and fill mode." + )); + } + + for keyframe in packet_css_keyframe_names(source).into_iter().take(4) { + if packet_css_class_sets_animation_name(source, &keyframe) { + claims.push(format!( + "Named classes such as .{keyframe} set animation-name to matching keyframes; @keyframes {keyframe} defines the matching animation." + )); + } + } + + claims +} + +fn packet_css_custom_property_names(source: &str) -> Vec { + let bytes = source.as_bytes(); + let mut properties = Vec::new(); + let mut seen = HashSet::new(); + let mut index = 0usize; + while index + 1 < bytes.len() { + if bytes[index] != b'-' || bytes[index + 1] != b'-' { + index += 1; + continue; + } + let start = index; + index += 2; + while index < bytes.len() && packet_css_identifier_byte(bytes[index]) { + index += 1; + } + if index > start + 2 { + let property = source[start..index].to_string(); + if seen.insert(property.to_ascii_lowercase()) { + properties.push(property); + } + } + } + properties +} + +fn packet_css_custom_property_with_fragment<'a>( + properties: &'a [String], + fragment: &str, +) -> Option<&'a str> { + properties + .iter() + .find(|property| normalize_identifier(property).contains(fragment)) + .map(String::as_str) +} + +fn packet_css_class_with_properties(source: &str, required_properties: &[&str]) -> Option { + let lower = source.to_ascii_lowercase(); + let bytes = lower.as_bytes(); + let mut index = 0usize; + while let Some(dot_offset) = lower[index..].find('.') { + let dot = index + dot_offset; + let name_start = dot + 1; + if name_start >= bytes.len() || !packet_css_identifier_byte(bytes[name_start]) { + index = name_start.saturating_add(1); + continue; + } + let mut name_end = name_start; + while name_end < bytes.len() && packet_css_identifier_byte(bytes[name_end]) { + name_end += 1; + } + let Some(block_start_offset) = lower[name_end..].find('{') else { + break; + }; + let block_start = name_end + block_start_offset + 1; + let Some(block_end_offset) = lower[block_start..].find('}') else { + break; + }; + let block = &lower[block_start..block_start + block_end_offset]; + if required_properties + .iter() + .all(|property| block.contains(&property.to_ascii_lowercase())) + { + return Some(source[name_start..name_end].to_string()); + } + index = name_end; + } + None +} + +fn packet_css_keyframe_names(source: &str) -> Vec { + let lower = source.to_ascii_lowercase(); + let bytes = lower.as_bytes(); + let mut names = Vec::new(); + let mut seen = HashSet::new(); + let mut search_from = 0usize; + while let Some(offset) = lower[search_from..].find("@keyframes") { + let mut index = search_from + offset + "@keyframes".len(); + while index < bytes.len() && bytes[index].is_ascii_whitespace() { + index += 1; + } + let name_start = index; + while index < bytes.len() && packet_css_identifier_byte(bytes[index]) { + index += 1; + } + if index > name_start { + let name = source[name_start..index].to_string(); + if seen.insert(name.to_ascii_lowercase()) { + names.push(name); + } + } + search_from = index; + } + names +} + +fn packet_css_class_sets_animation_name(source: &str, class_name: &str) -> bool { + let lower = source.to_ascii_lowercase(); + let class_name = class_name.to_ascii_lowercase(); + let class_selector = format!(".{class_name}"); + if !lower.contains(&class_selector) { + return false; + } + let compact = lower + .chars() + .filter(|ch| !ch.is_whitespace()) + .collect::(); + compact.contains(&format!("animation-name:{class_name}")) +} + +fn packet_css_identifier_byte(byte: u8) -> bool { + byte.is_ascii_alphanumeric() || matches!(byte, b'-' | b'_') +} + +fn packet_source_with_args_wrapper(source: &str) -> Option<(String, String)> { + let lower = source.to_ascii_lowercase(); + let mut search_from = 0usize; + + while let Some(relative_at) = lower[search_from..].find("withargs") { + let with_args_at = search_from + relative_at; + let statement_start = source[..with_args_at] + .rfind(['\n', ';']) + .map(|idx| idx + 1) + .unwrap_or(0); + let before = &source[statement_start..with_args_at]; + let Some(wrapper) = before + .rsplit_once('=') + .and_then(|(left, _)| packet_last_identifier(left)) + else { + search_from = with_args_at + "withargs".len(); + continue; + }; + + let after = &source[with_args_at..]; + let Some(handler_start) = after.find('(').map(|idx| idx + 1) else { + search_from = with_args_at + "withargs".len(); + continue; + }; + let handler_tail = &after[handler_start..]; + let Some(handler) = packet_first_identifier_after_type_arguments(handler_tail) else { + search_from = with_args_at + "withargs".len(); + continue; + }; + + if packet_source_exports_default_identifier(after, &wrapper) { + return Some((wrapper, handler)); + } + + search_from = with_args_at + "withargs".len(); + } + + None +} + +fn packet_source_exports_default_identifier(source: &str, identifier: &str) -> bool { + let lower = source.to_ascii_lowercase(); + let mut search_from = 0usize; + + while let Some(relative_at) = lower[search_from..].find("export default") { + let export_at = search_from + relative_at + "export default".len(); + if packet_first_identifier(&source[export_at..]).as_deref() == Some(identifier) { + return true; + } + search_from = export_at; + } + + false +} + +fn packet_first_identifier_after_type_arguments(value: &str) -> Option { + let mut start = 0usize; + let trimmed = value.trim_start(); + if trimmed.starts_with('<') { + let mut depth = 0usize; + for (idx, ch) in trimmed.char_indices() { + match ch { + '<' => depth += 1, + '>' => { + depth = depth.saturating_sub(1); + if depth == 0 { + start = idx + ch.len_utf8(); + break; + } + } + _ => {} + } + } + } + packet_first_identifier(&trimmed[start..]) +} + +fn packet_first_identifier(value: &str) -> Option { + let mut chars = value + .char_indices() + .skip_while(|(_, ch)| !is_ident_start(*ch)); + let (start, _) = chars.next()?; + let mut end = value.len(); + for (idx, ch) in value[start..].char_indices().skip(1) { + if !is_ident_continue(ch) { + end = start + idx; + break; + } + } + Some(value[start..end].to_string()) +} + +fn packet_last_identifier(value: &str) -> Option { + value + .split(|ch: char| !is_ident_continue(ch)) + .rfind(|part| part.chars().next().is_some_and(is_ident_start)) + .map(str::to_string) +} + +fn is_ident_start(ch: char) -> bool { + ch == '_' || ch.is_ascii_alphabetic() +} + +fn is_ident_continue(ch: char) -> bool { + ch == '_' || ch.is_ascii_alphanumeric() +} + +fn packet_generic_shell_version_use_flow_claims(symbol: &str, source: &str) -> Vec { + let normalized_symbol = normalize_identifier(symbol); + let source_lower = source.to_ascii_lowercase(); + let mut claims = Vec::new(); + + if (normalized_symbol.contains("ifneeded") || normalized_symbol.contains("needed")) + && source_lower.contains("if ") + && source_lower.contains("${1-}") + && source_lower.contains("current") + && source_lower.contains("return") + && source_lower.contains("$@") + && source_lower.contains(" use ") + { + claims.push(format!( + "{symbol} switches versions only when the requested version is not already active." + )); + } + + claims +} + +fn packet_generic_server_route_flow_claims(symbol: &str, source: &str) -> Vec { + let normalized_symbol = normalize_identifier(symbol); + let source_lower = source.to_ascii_lowercase(); + let mut claims = Vec::new(); + + if normalized_symbol.contains("application") + && source_lower.contains("function") + && source_lower.contains("handle(") + && source_lower.contains("request") + && source_lower.contains("response") + { + claims.push( + "The application factory builds a callable request handler and wires request/response state." + .to_string(), + ); + } + + if normalized_symbol.ends_with("handle") + && source_lower.contains("router") + && source_lower.contains(".handle(") + { + claims.push("The application handler delegates request handling to a router.".to_string()); + } + + if normalized_symbol.ends_with("use") + && source_lower.contains("function use") + && source_lower.contains("router.use(") + { + claims.push("Middleware registration delegates to a router.".to_string()); + } + + if normalized_symbol.ends_with("route") + && source_lower.contains("function route") + && source_lower.contains("router.route(") + { + claims.push( + "The route registration helper creates route entries through a router.".to_string(), + ); + } + + if normalized_symbol.ends_with("send") + && source_lower.contains("res.send = function send") + && source_lower.contains("this.set('content-length'") + && source_lower.contains(".end(") + { + claims.push( + "The response send helper sets response metadata before ending the body.".to_string(), + ); + } + + if normalized_symbol.contains("handle") + && source_lower.contains("handlers") + && source_lower.contains("relativepath") + && (source_lower.contains(".handle(") || source_lower.contains(" handle(")) + && source_lower.contains("return") + { + claims.push(format!( + "{symbol} registers routes by delegating to the group handle path." + )); + } + + if normalized_symbol.ends_with("next") + && source_lower.contains("handlers") + && source_lower.contains("index") + && source_lower.contains("++") + && source_lower.contains("for ") + { + claims.push(format!("{symbol} advances through the handler chain.")); + } + + claims +} + +fn packet_generic_sql_schema_flow_claims(source: &str) -> Vec { + let mut claims = Vec::new(); + let tables = packet_sql_create_table_names(source); + if !tables.is_empty() { + claims.push(format!( + "SQL schema defines tables {}.", + packet_human_join(&tables.iter().take(6).cloned().collect::>()) + )); + } + for claim in packet_sql_foreign_key_claims(source) { + if !claims.iter().any(|existing| existing == &claim) { + claims.push(claim); + } + if claims.len() >= 18 { + break; + } + } + claims +} + +fn packet_generic_runtime_formatting_flow_claims(source: &str) -> Vec { + let normalized_source = normalize_identifier(source); + let mut claims = Vec::new(); + + if normalized_source.contains("vformat") + && (normalized_source.contains("formatargs") + || normalized_source.contains("basicformatargs") + || normalized_source.contains("formatargstore")) + && (normalized_source.contains("vformatto") || normalized_source.contains("formatto")) + { + claims.push( + "Runtime formatting uses type-erased format arguments before dispatching formatted output helpers." + .to_string(), + ); + } + + if normalized_source.contains("formaterror") + && (normalized_source.contains("runtimeerror") + || normalized_source.contains("throwformaterror") + || normalized_source.contains("formatting")) + { + claims.push("Formatting errors are represented as runtime failures.".to_string()); + } + + claims +} + +#[cfg(test)] +mod tests { + use super::*; + use crate::agent::eval_probes::EVAL_PROBES_ENV; + use codestory_contracts::api::{NodeId, NodeKind, RetrievalScoreBreakdownDto, SearchHitOrigin}; + + fn test_packet_citation(display_name: &str, file_path: &str) -> AgentCitationDto { + AgentCitationDto { + node_id: NodeId(display_name.to_string()), + display_name: display_name.to_string(), + kind: NodeKind::FUNCTION, + file_path: Some(file_path.to_string()), + line: Some(10), + score: 0.9, + origin: SearchHitOrigin::IndexedSymbol, + resolvable: true, + subgraph_id: None, + evidence_edge_ids: Vec::new(), + retrieval_score_breakdown: Some(RetrievalScoreBreakdownDto { + lexical: 0.4, + semantic: 0.2, + graph: 0.3, + total: 0.9, + provenance: Vec::new(), + }), + } + } + + struct EvalProbesGuard; + + impl EvalProbesGuard { + fn enabled() -> Self { + crate::agent::eval_probes::push_eval_probes_test_override(); + Self + } + } + + impl Drop for EvalProbesGuard { + fn drop(&mut self) { + crate::agent::eval_probes::pop_eval_probes_test_override(); + } + } + + struct EnvVarGuard { + key: &'static str, + previous: Option, + } + + impl EnvVarGuard { + fn cleared(key: &'static str) -> Self { + let previous = std::env::var_os(key); + // SAFETY: tests use this guard to isolate one env var for this process-local + // regression and restore it on drop. + unsafe { + std::env::remove_var(key); + } + Self { key, previous } + } + } + + impl Drop for EnvVarGuard { + fn drop(&mut self) { + // SAFETY: restores the process-local env var captured by this guard. + unsafe { + if let Some(previous) = self.previous.take() { + std::env::set_var(self.key, previous); + } else { + std::env::remove_var(self.key); + } + } + } + } + + fn hook_cache_source() -> &'static str { + r#" + export const useSWRHandler = (_key, fetcher, config) => { + const [key, fnArg] = serialize(_key) + const [getCache, setCache, subscribeCache, getInitialCache] = + createCacheHelper(cache, key) + const cachedData = getCache() + return { data: cachedData.data } + } + const useSWR = withArgs(useSWRHandler) + export default useSWR + "# + } + + fn client_send_source() -> &'static str { + r#" + import 'dart:io'; + + abstract mixin class BaseTransportClient implements Client { + Future get(Uri url) => _sendUnstreamed('GET', url); + Future post(Uri url, {Object? body}) => + _sendUnstreamed('POST', url, body); + Future send(BaseRequest request); + Future _sendUnstreamed(String method, Uri url) async { + var request = Request(method, url); + return Response.fromStream(await send(request)); + } + } + + class NativeClient extends BaseTransportClient { + HttpClient? _inner; + Future send(BaseRequest request) async { + var stream = request.finalize(); + var ioRequest = await _inner!.openUrl(request.method, request.url); + final response = await stream.pipe(ioRequest) as HttpClientResponse; + return NativeStreamedResponse(response); + } + } + "# + } + + fn command_dispatch_source() -> &'static str { + r#" + void readQueryFromClient(client *c) { + processInputBuffer(c); + } + + void processInputBuffer(client *c) { + processCommand(c); + } + + int processCommand(client *c) { + lookupCommand(c->argv[0]); + aclCheckCommandPerm(c); + if (arity) return C_ERR; + if (cluster) return C_ERR; + call(c, 0); + } + + void call(client *c, int flags) { + c->cmd->proc(c); + propagate(c); + slowlogPushEntryIfNeeded(c); + } + "# + } + + fn search_execution_source() -> &'static str { + r#" + fn main() { + let flags = parse_flags(); + run(flags); + } + + fn run(flags: Flags) { + let args = HiArgs::from(flags); + search_parallel(args); + } + + struct HiArgs { + walk: WalkBuilder, + matcher: Matcher, + searcher: Searcher, + printer: Printer, + } + + struct SearchWorker { + matcher: Matcher, + searcher: Searcher, + printer: Printer, + candidate_path: PathBuf, + } + + impl SearchWorker { + fn search(&mut self) { + self.searcher.search_path(&self.matcher, &self.candidate_path, &mut self.printer); + } + } + "# + } + + #[test] + fn source_claims_do_not_activate_product_profiles_for_codestory_packet_audit_prompt() { + let prompt = "Audit CodeStory packet and orchestrator sufficiency for generic public helper cache source text."; + let cases = [ + ( + "useSWRHandler", + "src/index/use-swr.ts", + hook_cache_source(), + &[ + "The public useSWR export wraps useSWRHandler with argument normalization.", + "useSWRHandler serializes the key before reading cache state.", + ][..], + ), + ( + "BaseTransportClient", + "src/base_client.dart", + client_send_source(), + &[ + "BaseTransportClient implements convenience methods in terms of send.", + "BaseTransportClient.send forwards finalized requests through an HTTP client transport.", + ][..], + ), + ( + "processCommand", + "src/server.c", + command_dispatch_source(), + &[ + "readQueryFromClient appends socket input and drives processInputBuffer when a full command is available.", + "processCommand resolves the command table entry and enforces ACL, arity, and cluster checks.", + "call executes the command proc and handles propagation, monitoring, and slowlog accounting.", + ][..], + ), + ( + "HiArgs", + "crates/core/main.rs", + search_execution_source(), + &[ + "`HiArgs` builds traversal, matching, search, and output components used by the search pipeline.", + "`SearchWorker` carries matching, search, and output state for each candidate input.", + "SearchWorker::search executes one candidate search with matching, search, and output state.", + ][..], + ), + ]; + + for (symbol, path, source, blocked_claims) in cases { + let citation = test_packet_citation(symbol, path); + let claims = packet_source_derived_claims_for_citation(prompt, &citation, source); + for blocked_claim in blocked_claims { + assert!( + !claims.iter().any(|claim| claim == blocked_claim), + "CodeStory packet audit prompt must not activate unrelated product claim `{blocked_claim}`; got {claims:?}" + ); + } + } + } + + #[test] + fn search_execution_source_claims_are_eval_only() { + let _env = EnvVarGuard::cleared(EVAL_PROBES_ENV); + let prompt = "Explain how a search command parses CLI flags, walks candidate files, and executes a search through matcher, searcher, and printer components."; + let citation = test_packet_citation("HiArgs", "crates/core/main.rs"); + + let claims = + packet_source_derived_claims_for_citation(prompt, &citation, search_execution_source()); + assert!( + claims.is_empty(), + "search execution claims should be eval-only in production source profiles; got {claims:?}" + ); + + let _eval_probes = EvalProbesGuard::enabled(); + let claims = + packet_source_derived_claims_for_citation(prompt, &citation, search_execution_source()); + for expected in [ + "`HiArgs` builds traversal, matching, search, and output components used by the search pipeline.", + "`SearchWorker` carries matching, search, and output state for each candidate input.", + ] { + assert!( + claims.iter().any(|claim| claim == expected), + "expected eval-only search execution claim `{expected}`; got {claims:?}" + ); + } + } + + #[test] + fn role_search_execution_claims_are_eval_only() { + let _env = EnvVarGuard::cleared(EVAL_PROBES_ENV); + let temp = tempfile::tempdir().expect("temp dir"); + let source_path = temp.path().join("main.rs"); + std::fs::write(&source_path, search_execution_source()).expect("write source"); + let citation = test_packet_citation("HiArgs", &source_path.to_string_lossy()); + let prompt = "Explain how a search command parses CLI flags, walks candidate files, and executes a search through matcher, searcher, and printer components."; + + assert_eq!( + packet_source_derived_claim_for_role( + PacketEvidenceRole::ArgumentPlanning, + &citation, + prompt + ), + None, + "role-specific search claims should be eval-only in production" + ); + + let _eval_probes = EvalProbesGuard::enabled(); + assert_eq!( + packet_source_derived_claim_for_role( + PacketEvidenceRole::ArgumentPlanning, + &citation, + prompt + ) + .as_deref(), + Some( + "`HiArgs` builds traversal, matching, search, and output components used by the search pipeline." + ) + ); + } + + #[test] + fn source_claims_activate_hook_cache_only_with_hook_or_swr_intent() { + let _env = EnvVarGuard::cleared(EVAL_PROBES_ENV); + let generic_prompt = "Explain public helper cache behavior."; + let citation = test_packet_citation("useSWRHandler", "src/index/use-swr.ts"); + let claims = packet_source_derived_claims_for_citation( + generic_prompt, + &citation, + hook_cache_source(), + ); + assert!( + claims.is_empty(), + "generic cache words must not activate SWR hook/cache claims; got {claims:?}" + ); + + let swr_prompt = + "Explain how SWR exposes a public hook, serializes keys, and connects cache helpers."; + let claims = + packet_source_derived_claims_for_citation(swr_prompt, &citation, hook_cache_source()); + assert!( + claims.is_empty(), + "SWR-shaped claims should be eval-only in production source profiles; got {claims:?}" + ); + + let _eval_probes = EvalProbesGuard::enabled(); + let claims = + packet_source_derived_claims_for_citation(swr_prompt, &citation, hook_cache_source()); + for expected in [ + "The public useSWR export wraps useSWRHandler with argument normalization.", + "useSWRHandler serializes the key before reading cache state.", + ] { + assert!( + claims.iter().any(|claim| claim == expected), + "expected hook/cache claim `{expected}`; got {claims:?}" + ); + } + + let swr_only_prompt = "Explain SWR cache behavior and its public API."; + let claims = packet_source_derived_claims_for_citation( + swr_only_prompt, + &citation, + hook_cache_source(), + ); + for expected in [ + "The public useSWR export wraps useSWRHandler with argument normalization.", + "useSWRHandler serializes the key before reading cache state.", + ] { + assert!( + claims.iter().any(|claim| claim == expected), + "SWR-specific prompts without the word hook should still activate `{expected}`; got {claims:?}" + ); + } + } + + #[test] + fn source_claims_activate_client_send_only_with_client_request_send_intent() { + let _env = EnvVarGuard::cleared(EVAL_PROBES_ENV); + let generic_prompt = "Explain helper cache architecture."; + let citation = test_packet_citation("BaseTransportClient", "src/base_client.dart"); + let claims = packet_source_derived_claims_for_citation( + generic_prompt, + &citation, + client_send_source(), + ); + assert!( + claims.is_empty(), + "generic helper/cache words must not activate Dart client send claims; got {claims:?}" + ); + + let client_prompt = + "Explain how a client request helper routes send behavior through the transport."; + let claims = packet_source_derived_claims_for_citation( + client_prompt, + &citation, + client_send_source(), + ); + assert!( + claims.is_empty(), + "Dart client transport claims should be eval-only in production source profiles; got {claims:?}" + ); + + let _eval_probes = EvalProbesGuard::enabled(); + let claims = packet_source_derived_claims_for_citation( + client_prompt, + &citation, + client_send_source(), + ); + for expected in [ + "BaseTransportClient implements convenience methods in terms of send.", + "BaseTransportClient.send forwards finalized requests through an HTTP client transport.", + ] { + assert!( + claims.iter().any(|claim| claim == expected), + "expected client send claim `{expected}`; got {claims:?}" + ); + } + } + + #[test] + fn source_claims_activate_command_claims_only_with_command_event_loop_intent() { + let _env = EnvVarGuard::cleared(EVAL_PROBES_ENV); + let generic_prompt = "Audit packet helper cache source shapes."; + let citation = test_packet_citation("processCommand", "src/server.c"); + let claims = packet_source_derived_claims_for_citation( + generic_prompt, + &citation, + command_dispatch_source(), + ); + assert!( + claims.is_empty(), + "generic prompt must not activate command dispatch claims; got {claims:?}" + ); + + let command_prompt = "Explain Redis command dispatch from network command input through the command table and slowlog call accounting."; + let claims = packet_source_derived_claims_for_citation( + command_prompt, + &citation, + command_dispatch_source(), + ); + assert!( + claims.is_empty(), + "command/event-loop claims should be eval-only in production source profiles; got {claims:?}" + ); + + let _eval_probes = EvalProbesGuard::enabled(); + let claims = packet_source_derived_claims_for_citation( + command_prompt, + &citation, + command_dispatch_source(), + ); + for expected in [ + "readQueryFromClient appends socket input and drives processInputBuffer when a full command is available.", + "processCommand resolves the command table entry and enforces ACL, arity, and cluster checks.", + "call executes the command proc and handles propagation, monitoring, and slowlog accounting.", + ] { + assert!( + claims.iter().any(|claim| claim == expected), + "expected command/event-loop claim `{expected}`; got {claims:?}" + ); + } + } +} diff --git a/crates/codestory-runtime/src/agent/packet_claims.rs b/crates/codestory-runtime/src/agent/packet_claims.rs new file mode 100644 index 00000000..6a6e3149 --- /dev/null +++ b/crates/codestory-runtime/src/agent/packet_claims.rs @@ -0,0 +1,850 @@ +use crate::agent::eval_probes::{ + eval_citation_shaped_claim, eval_flow_template_claims, eval_probes_enabled, + eval_supporting_claim_flow_sentence, +}; +use crate::agent::packet_citations::{ + packet_citation_matching_display, packet_citation_matching_display_contains, + packet_citation_source_text, +}; +use crate::agent::packet_claim_profiles::{ + packet_source_derived_claim_for_role, packet_source_derived_claims_for_citation, +}; +use crate::agent::packet_command_profiles::packet_append_command_flow_template_claims; +use crate::agent::packet_evidence_roles::{ + PacketEvidenceRole, packet_claim_key_for_citation, packet_evidence_role, +}; +use crate::agent::packet_plan::packet_rank_terms; +use crate::agent::packet_scoring::{ + normalize_identifier, packet_adjacent_query_stop_term, packet_claim_carry_rank, + packet_display_path, packet_query_stop_term, +}; +use crate::agent::packet_terms::{packet_probe_terms, packet_terms_indicate_sql_schema_flow}; +use crate::query_mentions_non_primary_source; +use codestory_contracts::api::{AgentAnswerDto, AgentCitationDto, PacketClaimDto}; +use std::cmp::Ordering; +use std::collections::HashSet; +use std::fmt::Write as _; + +const PACKET_SOURCE_DEFINITION_CLAIM_LIMIT: usize = 6; + +pub(crate) fn packet_flow_claims_markdown(claims: &[PacketClaimDto]) -> String { + let mut markdown = String::new(); + markdown.push_str("Supported claims for a compact agent answer:\n"); + for claim in claims { + let citation = claim.citations.first(); + let suffix = citation + .and_then(|citation| citation.file_path.as_deref()) + .map(packet_display_path) + .map(|path| format!(" (`{path}`)")) + .unwrap_or_default(); + let _ = writeln!(markdown, "- {}{}", claim.claim, suffix); + } + markdown +} + +pub(crate) fn packet_supported_claims(answer: &AgentAnswerDto) -> Vec { + let mut claims = Vec::new(); + let mut seen_claims = HashSet::new(); + let rank_terms = packet_rank_terms(&answer.prompt); + let prefer_primary_sources = !query_mentions_non_primary_source(&answer.prompt); + let citations = answer.citations.clone(); + + append_flow_template_claims(&answer.prompt, &citations, &mut claims, &mut seen_claims); + append_ranked_citation_claims( + &answer.prompt, + &citations, + &rank_terms, + prefer_primary_sources, + &mut claims, + &mut seen_claims, + ); + claims +} + +pub(crate) fn append_flow_template_claims( + prompt: &str, + citations: &[AgentCitationDto], + claims: &mut Vec, + seen: &mut HashSet, +) { + let normalized_prompt = normalize_identifier(prompt); + + packet_append_command_flow_template_claims(prompt, citations, claims, seen); + packet_append_event_output_flow_template_claims(&normalized_prompt, citations, claims, seen); + packet_append_indexing_pipeline_flow_template_claims(prompt, citations, claims, seen); + packet_append_source_derived_flow_claims(prompt, citations, claims, seen); + packet_append_sql_schema_file_claims(prompt, citations, claims, seen); + if !eval_probes_enabled() { + return; + } + packet_append_indexing_storage_flow_template_claims(prompt, citations, claims, seen); + for (claim, citation) in eval_flow_template_claims(&normalized_prompt, citations) { + packet_push_flow_template_claim(claims, seen, &claim, Some(citation)); + } +} + +fn packet_append_event_output_flow_template_claims( + normalized_prompt: &str, + citations: &[AgentCitationDto], + claims: &mut Vec, + seen: &mut HashSet, +) { + if (normalized_prompt.contains("json") || normalized_prompt.contains("jsonl")) + && (normalized_prompt.contains("event") || normalized_prompt.contains("output")) + && let Some(json_output_citation) = citations.iter().find(|citation| { + packet_evidence_role(citation) == Some(PacketEvidenceRole::EventOutputProcessing) + }) + { + packet_push_flow_template_claim( + claims, + seen, + "Event-output processing evidence describes how structured runtime events are serialized for JSON/JSONL output.", + Some(json_output_citation.clone()), + ); + } +} + +fn packet_append_indexing_pipeline_flow_template_claims( + prompt: &str, + citations: &[AgentCitationDto], + claims: &mut Vec, + seen: &mut HashSet, +) { + let normalized_prompt = normalize_identifier(prompt); + let indexing_prompt = normalized_prompt.contains("indexing") + || normalized_prompt.contains("indexed") + || normalized_prompt.contains("indexer") + || normalized_prompt.contains("indexcommand"); + if !(indexing_prompt + && normalized_prompt.contains("runtime") + && (normalized_prompt.contains("workspace") + || normalized_prompt.contains("sourcefile") + || normalized_prompt.contains("filediscovery")) + && (normalized_prompt.contains("persistence") || normalized_prompt.contains("store")) + && normalized_prompt.contains("snapshot")) + { + return; + } + + let cli_entry = packet_citation_matching_display(citations, "run_index") + .or_else(|| packet_citation_matching_display(citations, "Command::Index")) + .or_else(|| packet_citation_matching_display(citations, "IndexCommand")) + .or_else(|| packet_citation_matching_display(citations, "CliDirection")); + let runtime_entry = + packet_citation_matching_display_contains(citations, "IndexService::run_indexing") + .or_else(|| packet_citation_matching_display(citations, "Runtime::index_service")); + if let Some(runtime_entry) = runtime_entry { + let mut claim_citations = Vec::new(); + if let Some(cli_entry) = cli_entry { + claim_citations.push(cli_entry.clone()); + } + claim_citations.push(runtime_entry.clone()); + packet_push_flow_template_claim_with_citations( + claims, + seen, + "The CLI index command prepares command options and delegates indexing work into the runtime layer.", + claim_citations, + ); + } + + let workspace_plan = + packet_citation_matching_display(citations, "WorkspaceManifest::build_execution_plan"); + if let Some(runtime_entry) = runtime_entry { + let mut claim_citations = vec![runtime_entry.clone()]; + if let Some(workspace_plan) = workspace_plan { + claim_citations.push(workspace_plan.clone()); + } + packet_push_flow_template_claim_with_citations( + claims, + seen, + "The runtime opens the workspace and store, chooses full or incremental indexing, and coordinates later refresh phases.", + claim_citations, + ); + } + + if let Some(workspace_plan) = workspace_plan { + packet_push_flow_template_claim( + claims, + seen, + "The workspace crate is responsible for source-file discovery and refresh-plan construction.", + Some(workspace_plan.clone()), + ); + } + + let workspace_indexer = packet_citation_matching_display(citations, "WorkspaceIndexer::run"); + let index_file = packet_citation_matching_display(citations, "index_file"); + if workspace_indexer.is_some() || index_file.is_some() { + let mut claim_citations = Vec::new(); + if let Some(workspace_indexer) = workspace_indexer { + claim_citations.push(workspace_indexer.clone()); + } + if let Some(index_file) = index_file { + claim_citations.push(index_file.clone()); + } + packet_push_flow_template_claim_with_citations( + claims, + seen, + "The indexer extracts nodes, edges, occurrences, and related symbol data from source files.", + claim_citations, + ); + } + + let storage_flush = + packet_citation_matching_display(citations, "Storage::flush_projection_batch"); + let search_projection = packet_citation_matching_display( + citations, + "Storage::rebuild_search_symbol_projection_from_node_table", + ); + if storage_flush.is_some() || search_projection.is_some() { + let mut claim_citations = Vec::new(); + if let Some(storage_flush) = storage_flush { + claim_citations.push(storage_flush.clone()); + } + if let Some(search_projection) = search_projection { + claim_citations.push(search_projection.clone()); + } + packet_push_flow_template_claim_with_citations( + claims, + seen, + "The store persists graph and file data to SQLite and rebuilds query/search projections from persisted data.", + claim_citations, + ); + } + + if let Some(snapshot_refresh) = + packet_citation_matching_display(citations, "SnapshotStore::refresh_all_with_stats") + { + packet_push_flow_template_claim( + claims, + seen, + "Snapshot refresh happens after persisted data changes so later grounding and summary reads see current indexed state.", + Some(snapshot_refresh.clone()), + ); + } +} + +fn packet_append_source_derived_flow_claims( + prompt: &str, + citations: &[AgentCitationDto], + claims: &mut Vec, + seen: &mut HashSet, +) { + for citation in citations.iter().take(24) { + let source = match packet_citation_source_text(citation) { + Some(source) if source.len() <= 800_000 => source, + _ => continue, + }; + for claim in packet_source_derived_claims_for_citation(prompt, citation, &source) { + packet_push_flow_template_claim(claims, seen, &claim, Some(citation.clone())); + if claims.len() >= 18 { + return; + } + } + } +} + +fn packet_append_sql_schema_file_claims( + prompt: &str, + citations: &[AgentCitationDto], + claims: &mut Vec, + seen: &mut HashSet, +) { + let terms = packet_probe_terms(prompt); + if !packet_terms_indicate_sql_schema_flow(&terms) { + return; + } + + let mut sql_schema_citations = Vec::new(); + let mut seen_paths = HashSet::new(); + let mut dialects = HashSet::new(); + for citation in citations { + let Some(path) = citation.file_path.as_deref() else { + continue; + }; + let display_path = packet_display_path(path); + if !display_path.to_ascii_lowercase().ends_with(".sql") { + continue; + } + let normalized_path = display_path.to_ascii_lowercase(); + if !seen_paths.insert(normalized_path.clone()) { + continue; + } + let Ok(source) = std::fs::read_to_string(path) else { + continue; + }; + if !source.to_ascii_lowercase().contains("create table") { + continue; + } + if let Some(dialect) = packet_sql_dialect_key(&normalized_path) { + dialects.insert(dialect); + } + sql_schema_citations.push(citation.clone()); + } + + if sql_schema_citations.len() < 2 { + return; + } + + let subject = packet_sql_schema_prompt_subject(prompt); + let claim = match (dialects.len() >= 2, subject.as_deref()) { + (true, Some(subject)) => { + format!( + "The repository carries multiple SQL dialect scripts for the same {subject} schema." + ) + } + (true, None) => { + "The repository carries multiple SQL dialect scripts for the same schema.".to_string() + } + (false, Some(subject)) => { + format!( + "The repository carries multiple SQL schema scripts for the same {subject} schema." + ) + } + (false, None) => { + "The repository carries multiple SQL schema scripts for the same schema.".to_string() + } + }; + packet_push_flow_template_claim_with_citations( + claims, + seen, + &claim, + sql_schema_citations.into_iter().take(3).collect(), + ); +} + +fn packet_sql_dialect_key(normalized_path: &str) -> Option<&'static str> { + if normalized_path.contains("sqlite") { + Some("sqlite") + } else if normalized_path.contains("mysql") { + Some("mysql") + } else if normalized_path.contains("postgres") || normalized_path.contains("pgsql") { + Some("postgres") + } else if normalized_path.contains("sqlserver") || normalized_path.contains("mssql") { + Some("sqlserver") + } else if normalized_path.contains("db2") { + Some("db2") + } else if normalized_path.contains("oracle") { + Some("oracle") + } else { + None + } +} + +fn packet_sql_schema_prompt_subject(prompt: &str) -> Option { + let stop_words = [ + "Explain", + "Trace", + "Cite", + "Name", + "SQL", + "Schema", + "Relationships", + "Relation", + "Tables", + "Table", + ]; + prompt + .split(|ch: char| !ch.is_ascii_alphanumeric() && ch != '_') + .map(str::trim) + .find(|token| { + token.len() >= 4 + && token + .chars() + .next() + .is_some_and(|ch| ch.is_ascii_uppercase()) + && !stop_words + .iter() + .any(|stop| stop.eq_ignore_ascii_case(token)) + }) + .map(str::to_string) +} + +fn packet_append_indexing_storage_flow_template_claims( + prompt: &str, + citations: &[AgentCitationDto], + claims: &mut Vec, + seen: &mut HashSet, +) { + let normalized_prompt = normalize_identifier(prompt); + let indexing_prompt = normalized_prompt.contains("indexing") + || normalized_prompt.contains("indexed") + || normalized_prompt.contains("indexer"); + let storage_prompt = normalized_prompt.contains("storage") + || normalized_prompt.contains("persistent") + || normalized_prompt.contains("sourcegroup") + || normalized_prompt.contains("sourcegroupconfiguration"); + if !(indexing_prompt && storage_prompt) { + return; + } + + let source_group = citations.iter().find(|citation| { + packet_evidence_role(citation) == Some(PacketEvidenceRole::SourceGroupConfiguration) + }); + let indexing_work = citations.iter().find(|citation| { + packet_evidence_role(citation) == Some(PacketEvidenceRole::IndexingWorkQueue) + }); + if let Some(source_group) = source_group + && let Some(indexing_work) = indexing_work + { + packet_push_flow_template_claim_with_citations( + claims, + seen, + "Source-group configuration and indexing command evidence describe how repository configuration becomes indexing work.", + vec![source_group.clone(), indexing_work.clone()], + ); + } + + if let Some(persistence) = citations.iter().find(|citation| { + packet_evidence_role(citation) == Some(PacketEvidenceRole::PersistenceAndSearchProjection) + }) { + packet_push_flow_template_claim( + claims, + seen, + "Persistence/search-projection evidence describes how indexed data remains available to later application reads.", + Some(persistence.clone()), + ); + } +} + +fn packet_push_flow_template_claim( + claims: &mut Vec, + seen: &mut HashSet, + claim_text: &str, + citation: Option, +) { + packet_push_flow_template_claim_with_citations( + claims, + seen, + claim_text, + citation.map(|value| vec![value]).unwrap_or_default(), + ); +} + +fn packet_push_flow_template_claim_with_citations( + claims: &mut Vec, + seen: &mut HashSet, + claim_text: &str, + citations: Vec, +) { + let key = normalize_identifier(claim_text); + if key.is_empty() || !seen.insert(key) { + return; + } + claims.push(PacketClaimDto { + claim: claim_text.to_string(), + citations, + }); +} + +pub(crate) fn append_ranked_citation_claims( + prompt: &str, + citations: &[AgentCitationDto], + rank_terms: &[String], + prefer_primary_sources: bool, + claims: &mut Vec, + seen_claims: &mut HashSet, +) { + let mut ordered_citations = citations.to_vec(); + ordered_citations.sort_by(|left, right| { + packet_claim_carry_rank(right, rank_terms, prefer_primary_sources) + .partial_cmp(&packet_claim_carry_rank( + left, + rank_terms, + prefer_primary_sources, + )) + .unwrap_or(Ordering::Equal) + }); + for citation in &ordered_citations { + if let Some(shaped) = packet_citation_shaped_claim(citation, prompt) { + let key = normalize_identifier(&shaped); + if seen_claims.insert(key) { + claims.push(PacketClaimDto { + claim: shaped, + citations: vec![citation.clone()], + }); + } + continue; + } + let role = match packet_evidence_role(citation) { + Some(PacketEvidenceRole::TestsAndRegressionCoverage) => { + let lower = prompt.to_ascii_lowercase(); + if lower.contains("test") + || lower.contains("regression") + || lower.contains("edit") + || lower.contains("plan") + { + PacketEvidenceRole::TestsAndRegressionCoverage + } else { + continue; + } + } + Some(role) => role, + None => PacketEvidenceRole::SourceEvidence, + }; + let claim_key = packet_claim_key_for_citation(role, citation); + if !seen_claims.insert(claim_key.clone()) { + continue; + } + claims.push(PacketClaimDto { + claim: packet_claim_for_role(role, citation, prompt, rank_terms), + citations: vec![citation.clone()], + }); + if claims.len() >= 18 { + break; + } + } + if claims.len() < 18 { + packet_append_source_definition_claims(&ordered_citations, rank_terms, claims, seen_claims); + } +} + +pub(crate) fn packet_claim_for_role( + role: PacketEvidenceRole, + citation: &AgentCitationDto, + prompt: &str, + rank_terms: &[String], +) -> String { + if let Some(shaped) = packet_citation_shaped_claim(citation, prompt) { + return shaped; + } + if let Some(source_derived) = packet_source_derived_claim_for_role(role, citation, prompt) { + return source_derived; + } + let symbol = citation.display_name.as_str(); + let path = citation + .file_path + .as_deref() + .map(packet_display_path) + .unwrap_or_default(); + match role { + PacketEvidenceRole::CommandEntrypoint => format!( + "The command or public entrypoint for this flow is anchored by `{symbol}`; inspect it before following downstream coordination." + ), + PacketEvidenceRole::ClientFactory => format!( + "Client factory behavior is anchored by `{symbol}`; inspect it for instance creation and request-method binding." + ), + PacketEvidenceRole::InterceptorManagement => format!( + "Interceptor management is anchored by `{symbol}`; inspect it for fulfilled/rejected handler registration and iteration." + ), + PacketEvidenceRole::RequestDispatch => format!( + "Request dispatch is anchored by `{symbol}`; inspect it for config transformation and adapter handoff." + ), + PacketEvidenceRole::TransportAdapter => format!( + "Transport adapter selection is anchored by `{symbol}`; inspect it for environment-specific transport choice." + ), + PacketEvidenceRole::EventLoop => format!( + "Event-loop polling is anchored by `{symbol}`; inspect it for readable/writable file-event dispatch." + ), + PacketEvidenceRole::NetworkCommandInput => format!( + "Network command input is anchored by `{symbol}`; inspect it for socket reads and command-buffer processing." + ), + PacketEvidenceRole::CommandDispatch => format!( + "Command dispatch is anchored by `{symbol}`; inspect it for command lookup, validation, execution, and propagation." + ), + PacketEvidenceRole::ArgumentPlanning => format!( + "Argument planning is anchored by `{symbol}`; inspect it for walker, matcher, searcher, and printer construction." + ), + PacketEvidenceRole::SearchDriver => format!( + "Search driver behavior is anchored by `{symbol}`; inspect it for entrypoint routing and sequential or parallel search selection." + ), + PacketEvidenceRole::SearchExecutionUnit => format!( + "Search worker behavior is anchored by `{symbol}`; inspect it for per-candidate matcher/searcher/printer execution." + ), + PacketEvidenceRole::RuntimeOrchestration => format!( + "Runtime orchestration is anchored by `{symbol}`; verify coordination, state transitions, and downstream service calls there." + ), + PacketEvidenceRole::WorkspaceDiscoveryAndPlanning => format!( + "Workspace discovery or planning is anchored by `{symbol}`; inspect it for file selection, manifest, or execution-plan behavior." + ), + PacketEvidenceRole::SourceGroupConfiguration => format!( + "Source-group configuration is anchored by `{symbol}`; inspect it for how project settings become source-group-specific indexing inputs." + ), + PacketEvidenceRole::IndexingWorkQueue => format!( + "Indexing work queue behavior is anchored by `{symbol}`; inspect it for build-index commands, parser handoff, or source-file work items." + ), + PacketEvidenceRole::SymbolExtraction => format!( + "Symbol extraction is anchored by `{symbol}`; inspect it for nodes, edges, occurrences, or file-level indexing." + ), + PacketEvidenceRole::PersistenceAndSearchProjection => format!( + "Persistence or search projection is anchored by `{symbol}`; inspect it for durable graph/search state." + ), + PacketEvidenceRole::SnapshotRefresh => format!( + "Snapshot refresh is anchored by `{symbol}`; inspect it for post-write summary or cache refresh behavior." + ), + PacketEvidenceRole::RouteHandling => format!( + "Route handling is anchored by `{symbol}`; inspect it before tracing request dispatch or handler ownership." + ), + PacketEvidenceRole::CollectionConfiguration => format!( + "Collection configuration is anchored by `{symbol}`; inspect schema fields, hooks, and access rules." + ), + PacketEvidenceRole::EventOutputProcessing => format!( + "JSON/event output processing is anchored by `{symbol}`; inspect it for typed event serialization and stdout behavior." + ), + PacketEvidenceRole::AppServerRequestProtocol => format!( + "App-server request protocol evidence is anchored by `{symbol}`; inspect it for thread or turn start request shape." + ), + PacketEvidenceRole::TestsAndRegressionCoverage => format!( + "Regression coverage for this flow is anchored by `{symbol}`; use it to choose focused verification before broader suites." + ), + PacketEvidenceRole::SourceEvidence => { + let flow_terms = packet_claim_flow_terms(rank_terms, citation); + let focus = if flow_terms.is_empty() { + "this flow".to_string() + } else { + flow_terms.join(", ") + }; + format!( + "`{symbol}` in `{path}` {}; inspect definitions and downstream handoff there.", + packet_source_evidence_flow_sentence(prompt, &focus) + ) + } + PacketEvidenceRole::SqlTableDefinition + | PacketEvidenceRole::SqlRelationshipConstraint + | PacketEvidenceRole::SqlSchemaFile + | PacketEvidenceRole::CandidateFileConstruction => { + format!("Evidence for this flow is anchored by `{symbol}`.") + } + } +} + +fn packet_source_evidence_flow_sentence(prompt: &str, focus: &str) -> String { + let normalized_prompt = normalize_identifier(prompt); + if let Some(sentence) = eval_supporting_claim_flow_sentence(&normalized_prompt, focus) { + return sentence; + } + format!( + "supports {focus} in this flow; inspect the cited source, local definitions, and adjacent ownership there" + ) +} + +fn packet_claim_flow_terms(rank_terms: &[String], citation: &AgentCitationDto) -> Vec { + let display = normalize_identifier(&citation.display_name); + let path = normalize_identifier(citation.file_path.as_deref().unwrap_or_default()); + let mut terms = Vec::new(); + for term in rank_terms { + if term.len() < 4 || packet_query_stop_term(term) || packet_adjacent_query_stop_term(term) { + continue; + } + let normalized = normalize_identifier(term); + if normalized.is_empty() { + continue; + } + if (display.contains(&normalized) || path.contains(&normalized)) + && terms.iter().all(|existing| existing != &normalized) + { + terms.push(normalized); + } + if terms.len() >= 4 { + break; + } + } + terms +} + +fn packet_citation_shaped_claim(citation: &AgentCitationDto, prompt: &str) -> Option { + let path = citation + .file_path + .as_deref() + .map(packet_display_path) + .unwrap_or_default(); + eval_citation_shaped_claim(citation, prompt, &path) +} + +fn packet_append_source_definition_claims( + citations: &[AgentCitationDto], + rank_terms: &[String], + claims: &mut Vec, + seen_claims: &mut HashSet, +) { + let normalized_terms = rank_terms + .iter() + .map(|term| normalize_identifier(term)) + .filter(|term| term.len() >= 6) + .collect::>(); + let rank_tokens = packet_definition_rank_tokens(rank_terms); + if normalized_terms.is_empty() && rank_tokens.is_empty() { + return; + } + + let mut seen_definitions = HashSet::new(); + let mut appended = 0; + for citation in citations.iter().take(24) { + let Some(source) = packet_citation_source_text(citation) else { + continue; + }; + if source.len() > 400_000 { + continue; + } + for line in source.lines().take(4_000) { + let Some(definition) = packet_source_definition_name(line) else { + continue; + }; + let normalized_definition = normalize_identifier(&definition); + if !packet_definition_matches_rank_terms( + &definition, + &normalized_definition, + &normalized_terms, + &rank_tokens, + ) { + continue; + } + let path = citation + .file_path + .as_deref() + .map(packet_display_path) + .unwrap_or_else(|| "".to_string()); + let definition_key = format!("{normalized_definition}:{path}"); + if !seen_definitions.insert(definition_key) { + continue; + } + packet_push_claim( + claims, + seen_claims, + &format!( + "`{definition}` is defined in cited source `{path}` and should be treated as an exact source anchor for this flow." + ), + Some(citation.clone()), + ); + appended += 1; + if claims.len() >= 18 { + return; + } + if appended >= PACKET_SOURCE_DEFINITION_CLAIM_LIMIT { + return; + } + } + } +} + +fn packet_push_claim( + claims: &mut Vec, + seen_claims: &mut HashSet, + claim_text: &str, + citation: Option, +) { + let key = normalize_identifier(claim_text); + if key.is_empty() || !seen_claims.insert(key) { + return; + } + claims.push(PacketClaimDto { + claim: claim_text.to_string(), + citations: citation.map(|value| vec![value]).unwrap_or_default(), + }); +} + +fn packet_source_definition_name(line: &str) -> Option { + let trimmed = line.trim_start(); + for prefix in [ + "pub async fn ", + "pub(crate) async fn ", + "async fn ", + "pub fn ", + "pub(crate) fn ", + "fn ", + "pub struct ", + "pub(crate) struct ", + "struct ", + "pub enum ", + "pub(crate) enum ", + "enum ", + "pub trait ", + "pub(crate) trait ", + "trait ", + "export class ", + "class ", + "export interface ", + "interface ", + "export function ", + "function ", + "export const ", + "const ", + "export type ", + "type ", + ] { + if let Some(rest) = trimmed.strip_prefix(prefix) { + return packet_take_definition_identifier(rest); + } + } + None +} + +fn packet_take_definition_identifier(rest: &str) -> Option { + let mut identifier = String::new(); + for ch in rest.chars() { + if ch.is_ascii_alphanumeric() || ch == '_' || ch == '$' { + identifier.push(ch); + } else { + break; + } + } + (identifier.len() >= 3).then_some(identifier) +} + +fn packet_definition_matches_rank_terms( + definition: &str, + normalized_definition: &str, + normalized_terms: &[String], + rank_tokens: &HashSet, +) -> bool { + if normalized_definition.len() < 6 { + return false; + } + if normalized_terms + .iter() + .any(|term| term == normalized_definition) + { + return true; + } + let definition_tokens = packet_identifier_tokens(definition); + let overlap = definition_tokens + .iter() + .filter(|token| rank_tokens.contains(token.as_str())) + .count(); + overlap >= 2 || (definition_tokens.iter().any(|token| token == "exec") && overlap >= 1) +} + +fn packet_definition_rank_tokens(rank_terms: &[String]) -> HashSet { + rank_terms + .iter() + .flat_map(|term| packet_identifier_tokens(term)) + .filter(|term| { + term.len() >= 3 + && !matches!( + term.as_str(), + "the" | "and" | "for" | "with" | "from" | "into" | "flow" | "flows" + ) + }) + .collect() +} + +fn packet_identifier_tokens(identifier: &str) -> Vec { + let mut tokens = Vec::new(); + let mut current = String::new(); + let mut previous_lower_or_digit = false; + for ch in identifier.chars() { + if ch == '_' || ch == '-' || ch == '$' || ch.is_whitespace() { + if !current.is_empty() { + tokens.push(current.clone()); + current.clear(); + } + previous_lower_or_digit = false; + continue; + } + if ch.is_ascii_uppercase() && previous_lower_or_digit && !current.is_empty() { + tokens.push(current.clone()); + current.clear(); + } + if ch.is_ascii_alphanumeric() { + current.extend(ch.to_lowercase()); + previous_lower_or_digit = ch.is_ascii_lowercase() || ch.is_ascii_digit(); + } else if !current.is_empty() { + tokens.push(current.clone()); + current.clear(); + previous_lower_or_digit = false; + } + } + if !current.is_empty() { + tokens.push(current); + } + tokens +} diff --git a/crates/codestory-runtime/src/agent/packet_command_profiles.rs b/crates/codestory-runtime/src/agent/packet_command_profiles.rs new file mode 100644 index 00000000..1704efd9 --- /dev/null +++ b/crates/codestory-runtime/src/agent/packet_command_profiles.rs @@ -0,0 +1,361 @@ +use crate::agent::eval_probes::eval_probes_enabled; +use crate::agent::packet_citations::{ + packet_citation_matching_display, packet_citation_matching_path_and_display, + packet_command_crate_sources_contain_all, +}; +use crate::agent::packet_scoring::normalize_identifier; +use codestory_contracts::api::{AgentCitationDto, PacketClaimDto, PacketTaskClassDto}; +use std::collections::HashSet; + +#[derive(Debug, Clone)] +struct PacketCommandDescriptor { + command_title: String, + subcommand_title: String, + module: String, + crate_segment: String, +} + +fn packet_command_descriptors(question: &str) -> Vec { + let mut descriptors = Vec::new(); + for span in packet_backtick_spans(question) { + let words = packet_command_words(span); + if words.len() < 2 { + continue; + } + let command = &words[0]; + let subcommand = &words[1]; + let Some(command_title) = packet_pascal_identifier(command) else { + continue; + }; + let Some(subcommand_title) = packet_pascal_identifier(subcommand) else { + continue; + }; + let Some(module) = packet_snake_identifier(&[command.as_str(), subcommand.as_str()]) else { + continue; + }; + let Some(crate_segment) = packet_snake_identifier(&[subcommand.as_str()]) else { + continue; + }; + descriptors.push(PacketCommandDescriptor { + command_title, + subcommand_title, + module, + crate_segment, + }); + } + descriptors +} + +pub(crate) fn packet_command_exact_probe_queries( + question: &str, + task_class: PacketTaskClassDto, +) -> Vec { + if !eval_probes_enabled() || !packet_allows_command_probe_queries(question, task_class) { + return Vec::new(); + } + + let mut queries = Vec::new(); + for descriptor in packet_command_descriptors(question) { + push_unique_term( + &mut queries, + &format!("Subcommand::{}", descriptor.subcommand_title), + ); + push_unique_term(&mut queries, &format!("{}::Cli", descriptor.module)); + push_unique_term(&mut queries, &format!("{}::run_main", descriptor.module)); + } + queries +} + +pub(crate) fn packet_command_role_probe_queries( + question: &str, + task_class: PacketTaskClassDto, +) -> Vec { + if !packet_allows_command_probe_queries(question, task_class) { + return Vec::new(); + } + + let mut queries = Vec::new(); + for descriptor in packet_command_descriptors(question) { + let command_phrase = descriptor.module.replace('_', " "); + let subcommand_phrase = descriptor.subcommand_title.to_ascii_lowercase(); + push_unique_term(&mut queries, &command_phrase); + push_unique_term(&mut queries, &format!("{command_phrase} command")); + push_unique_term(&mut queries, &format!("{subcommand_phrase} command")); + push_unique_term(&mut queries, &format!("{subcommand_phrase} subcommand")); + } + queries +} + +fn packet_allows_command_probe_queries(question: &str, task_class: PacketTaskClassDto) -> bool { + if !matches!( + task_class, + PacketTaskClassDto::ArchitectureExplanation + | PacketTaskClassDto::DataFlow + | PacketTaskClassDto::ChangeImpact + | PacketTaskClassDto::RouteTracing + | PacketTaskClassDto::EditPlanning + ) { + return false; + } + let lowered = question.to_ascii_lowercase(); + contains_any( + &lowered, + &[ + "cli", + "command", + "subcommand", + "entrypoint", + "entry point", + "runtime", + "flow", + "flows", + ], + ) +} + +pub(crate) fn packet_append_command_flow_template_claims( + prompt: &str, + citations: &[AgentCitationDto], + claims: &mut Vec, + seen: &mut HashSet, +) { + let normalized_prompt = normalize_identifier(prompt); + if !(normalized_prompt.contains("cli") + || normalized_prompt.contains("command") + || normalized_prompt.contains("subcommand")) + { + return; + } + + for descriptor in packet_command_descriptors(prompt) { + let subcommand_display = format!("Subcommand::{}", descriptor.subcommand_title); + let cli_display = format!("{}::Cli", descriptor.module); + let run_main_display = format!("{}::run_main", descriptor.module); + let subcommand_citation = packet_citation_matching_display(citations, &subcommand_display); + let cli_citation = packet_citation_matching_display(citations, &cli_display); + let run_main_citation = packet_citation_matching_display(citations, &run_main_display) + .or_else(|| { + packet_citation_matching_path_and_display( + citations, + &descriptor.crate_segment, + "run_main", + ) + }); + + if let Some(subcommand_citation) = subcommand_citation + && (cli_citation.is_some() || run_main_citation.is_some()) + { + let mut claim_citations = vec![subcommand_citation.clone()]; + if let Some(cli_citation) = cli_citation { + claim_citations.push(cli_citation.clone()); + } else if let Some(run_main_citation) = run_main_citation { + claim_citations.push(run_main_citation.clone()); + } + let claim = format!( + "The top-level {} CLI has a cited {} subcommand and command-module entrypoint in `{}`.", + descriptor.command_title, descriptor.subcommand_title, descriptor.module + ); + packet_push_flow_template_claim_with_citations(claims, seen, &claim, claim_citations); + } + + if let Some(cli_citation) = cli_citation + && let Some(run_main_citation) = run_main_citation + { + packet_push_flow_template_claim_with_citations( + claims, + seen, + &format!( + "The {} binary parses {}-specific CLI options and calls {}::run_main.", + descriptor.module.replace('_', "-"), + descriptor.crate_segment, + descriptor.module + ), + vec![cli_citation.clone(), run_main_citation.clone()], + ); + if (normalized_prompt.contains("json") || normalized_prompt.contains("jsonl")) + && packet_command_crate_sources_contain_all( + citations, + &descriptor.crate_segment, + &[&["long = \"json\"", "--json"], &["jsonl"]], + ) + { + packet_push_flow_template_claim( + claims, + seen, + &format!( + "The {} CLI defines --json as the switch that chooses JSONL stdout output.", + descriptor.crate_segment + ), + Some(cli_citation.clone()), + ); + } + } + + let runtime_citation = run_main_citation.or_else(|| { + packet_citation_matching_path_and_display( + citations, + &descriptor.crate_segment, + "run_exec_session", + ) + }); + if let Some(runtime_citation) = runtime_citation + && (normalized_prompt.contains("appserver") + || normalized_prompt.contains("runtime") + || normalized_prompt.contains("thread") + || normalized_prompt.contains("turn")) + && packet_command_crate_sources_contain_all( + citations, + &descriptor.crate_segment, + &[ + &[ + "configbuilder", + "configbuilder::default", + "configbuilder::default()", + ], + &["approval"], + &["sandbox"], + &["inprocessclientstartargs"], + ], + ) + { + packet_push_flow_template_claim( + claims, + seen, + "run_main loads config, resolves sandbox and approval settings, and builds the in-process app-server start arguments.", + Some(runtime_citation.clone()), + ); + } + } +} + +fn packet_backtick_spans(question: &str) -> Vec<&str> { + let mut spans = Vec::new(); + let mut start = None; + for (index, ch) in question.char_indices() { + if ch != '`' { + continue; + } + if let Some(open) = start.take() { + let span = question[open..index].trim(); + if !span.is_empty() { + spans.push(span); + } + } else { + start = Some(index + ch.len_utf8()); + } + } + spans +} + +fn packet_command_words(span: &str) -> Vec { + span.split_whitespace() + .filter_map(|token| { + let token = token.trim_matches(|ch: char| { + matches!( + ch, + ',' | '.' + | ';' + | ':' + | '?' + | '!' + | '(' + | ')' + | '[' + | ']' + | '{' + | '}' + | '"' + | '\'' + ) + }); + if token.starts_with('-') + || token.is_empty() + || !token.chars().any(|ch| ch.is_ascii_alphabetic()) + || !token + .chars() + .all(|ch| ch.is_ascii_alphanumeric() || ch == '-' || ch == '_') + { + return None; + } + Some(token.to_string()) + }) + .take(3) + .collect() +} + +fn packet_pascal_identifier(word: &str) -> Option { + let mut value = String::new(); + for part in word + .split(|ch: char| !ch.is_ascii_alphanumeric()) + .filter(|part| !part.is_empty()) + { + let mut chars = part.chars(); + let first = chars.next()?; + value.push(first.to_ascii_uppercase()); + value.extend(chars.map(|ch| ch.to_ascii_lowercase())); + } + (!value.is_empty()).then_some(value) +} + +fn packet_snake_identifier(words: &[&str]) -> Option { + let mut parts = Vec::new(); + for word in words { + let mut normalized = String::new(); + for (index, part) in word + .split(|ch: char| !ch.is_ascii_alphanumeric()) + .filter(|part| !part.is_empty()) + .enumerate() + { + if index > 0 { + normalized.push('_'); + } + normalized.push_str(&part.to_ascii_lowercase()); + } + if normalized.is_empty() { + return None; + } + parts.push(normalized); + } + (!parts.is_empty()).then_some(parts.join("_")) +} + +fn packet_push_flow_template_claim( + claims: &mut Vec, + seen: &mut HashSet, + claim_text: &str, + citation: Option, +) { + packet_push_flow_template_claim_with_citations( + claims, + seen, + claim_text, + citation.map(|value| vec![value]).unwrap_or_default(), + ); +} + +fn packet_push_flow_template_claim_with_citations( + claims: &mut Vec, + seen: &mut HashSet, + claim_text: &str, + citations: Vec, +) { + let key = normalize_identifier(claim_text); + if key.is_empty() || !seen.insert(key) { + return; + } + claims.push(PacketClaimDto { + claim: claim_text.to_string(), + citations, + }); +} + +fn push_unique_term(terms: &mut Vec, value: &str) { + if value.is_empty() || terms.iter().any(|term| term == value) { + return; + } + terms.push(value.to_string()); +} + +fn contains_any(haystack: &str, needles: &[&str]) -> bool { + needles.iter().any(|needle| haystack.contains(needle)) +} diff --git a/crates/codestory-runtime/src/agent/packet_evidence_roles.rs b/crates/codestory-runtime/src/agent/packet_evidence_roles.rs new file mode 100644 index 00000000..81a99fae --- /dev/null +++ b/crates/codestory-runtime/src/agent/packet_evidence_roles.rs @@ -0,0 +1,297 @@ +use crate::agent::packet_scoring::{ + normalize_identifier, packet_display_name_is_test_like, packet_display_path, +}; +use crate::retrieval_file_role_from_path; +use codestory_contracts::api::{AgentCitationDto, NodeKind}; + +#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)] +pub(crate) enum PacketEvidenceRole { + SqlTableDefinition, + SqlRelationshipConstraint, + SqlSchemaFile, + TestsAndRegressionCoverage, + SourceGroupConfiguration, + IndexingWorkQueue, + InterceptorManagement, + RequestDispatch, + TransportAdapter, + ClientFactory, + EventLoop, + NetworkCommandInput, + CommandDispatch, + ArgumentPlanning, + SearchExecutionUnit, + CandidateFileConstruction, + SearchDriver, + CommandEntrypoint, + EventOutputProcessing, + AppServerRequestProtocol, + RuntimeOrchestration, + WorkspaceDiscoveryAndPlanning, + SnapshotRefresh, + PersistenceAndSearchProjection, + SymbolExtraction, + RouteHandling, + CollectionConfiguration, + SourceEvidence, +} + +impl PacketEvidenceRole { + pub(crate) fn as_str(self) -> &'static str { + match self { + Self::SqlTableDefinition => "sql table definition", + Self::SqlRelationshipConstraint => "sql relationship constraint", + Self::SqlSchemaFile => "sql schema file", + Self::TestsAndRegressionCoverage => "tests and regression coverage", + Self::SourceGroupConfiguration => "source-group configuration", + Self::IndexingWorkQueue => "indexing work queue", + Self::InterceptorManagement => "interceptor management", + Self::RequestDispatch => "request dispatch", + Self::TransportAdapter => "transport adapter", + Self::ClientFactory => "client factory", + Self::EventLoop => "event loop", + Self::NetworkCommandInput => "network command input", + Self::CommandDispatch => "command dispatch", + Self::ArgumentPlanning => "argument planning", + Self::SearchExecutionUnit => "search worker", + Self::CandidateFileConstruction => "candidate file construction", + Self::SearchDriver => "search driver", + Self::CommandEntrypoint => "command entrypoint", + Self::EventOutputProcessing => "event output processing", + Self::AppServerRequestProtocol => "app-server request protocol", + Self::RuntimeOrchestration => "runtime orchestration", + Self::WorkspaceDiscoveryAndPlanning => "workspace discovery and planning", + Self::SnapshotRefresh => "snapshot refresh", + Self::PersistenceAndSearchProjection => "persistence and search projection", + Self::SymbolExtraction => "symbol extraction", + Self::RouteHandling => "route handling", + Self::CollectionConfiguration => "collection configuration", + Self::SourceEvidence => "source evidence", + } + } + + pub(crate) fn is_low_priority_cap_role(self) -> bool { + matches!(self, Self::TestsAndRegressionCoverage) + } +} + +pub(crate) fn packet_evidence_role(citation: &AgentCitationDto) -> Option { + let display = citation.display_name.to_ascii_lowercase(); + let normalized_display = normalize_identifier(&citation.display_name); + let path = citation + .file_path + .as_deref() + .map(packet_display_path) + .unwrap_or_default() + .to_ascii_lowercase(); + + if path.ends_with(".sql") && normalized_display.starts_with("createtable") { + Some(PacketEvidenceRole::SqlTableDefinition) + } else if path.ends_with(".sql") && normalized_display == "foreignkey" { + Some(PacketEvidenceRole::SqlRelationshipConstraint) + } else if path.ends_with(".sql") { + Some(PacketEvidenceRole::SqlSchemaFile) + } else if path_contains_test_segment(&path) + || path.ends_with("_test.go") + || path.ends_with(".test.ts") + || packet_display_name_is_test_like(&display) + { + Some(PacketEvidenceRole::TestsAndRegressionCoverage) + } else if normalized_display.contains("sourcegroup") + || path.contains("source_group") + || path.contains("sourcegroup") + { + Some(PacketEvidenceRole::SourceGroupConfiguration) + } else if normalized_display.contains("buildindex") + || normalized_display.contains("taskfillindexercommandsqueue") + || normalized_display.contains("indexercommand") + || normalized_display.contains("javaindexer") + || path.contains("/data/indexer/") + { + Some(PacketEvidenceRole::IndexingWorkQueue) + } else if normalized_display.contains("interceptor") || path.contains("interceptor") { + Some(PacketEvidenceRole::InterceptorManagement) + } else if (normalized_display.contains("dispatch") + || path.contains("/dispatch") + || path.contains("_dispatch")) + && !normalized_display.contains("event") + { + Some(PacketEvidenceRole::RequestDispatch) + } else if path.contains("/adapters/") || normalized_display.contains("adapter") { + Some(PacketEvidenceRole::TransportAdapter) + } else if (normalized_display.contains("factory") || normalized_display.contains("create")) + && (normalized_display.contains("client") || normalized_display.contains("instance")) + { + Some(PacketEvidenceRole::ClientFactory) + } else if normalized_display.contains("eventloop") + || normalized_display.contains("event_loop") + || (normalized_display.contains("event") && normalized_display.contains("poll")) + || (normalized_display.contains("event") && normalized_display.contains("dispatch")) + || path.contains("/event/") + || path.contains("/events/") + { + Some(PacketEvidenceRole::EventLoop) + } else if (normalized_display.contains("read") + || normalized_display.contains("input") + || normalized_display.contains("receive")) + && (normalized_display.contains("client") + || normalized_display.contains("socket") + || normalized_display.contains("network") + || path.contains("/network")) + { + Some(PacketEvidenceRole::NetworkCommandInput) + } else if normalized_display.contains("command") + && (normalized_display.contains("dispatch") + || normalized_display.contains("handler") + || normalized_display.contains("process") + || normalized_display.contains("execute")) + { + Some(PacketEvidenceRole::CommandDispatch) + } else if (normalized_display.contains("args") + || normalized_display.contains("flags") + || path.contains("/flags/")) + && (normalized_display.contains("plan") + || normalized_display.contains("parse") + || normalized_display.contains("build") + || normalized_display.contains("walk") + || normalized_display.contains("matcher") + || normalized_display.contains("searcher") + || normalized_display.contains("printer") + || path.contains("/flags/")) + { + Some(PacketEvidenceRole::ArgumentPlanning) + } else if normalized_display.contains("search") + && (normalized_display.contains("worker") + || normalized_display.contains("runner") + || normalized_display.contains("executor")) + { + Some(PacketEvidenceRole::SearchExecutionUnit) + } else if normalized_display.contains("candidate") + && (normalized_display.contains("file") || normalized_display.contains("source")) + { + Some(PacketEvidenceRole::CandidateFileConstruction) + } else if normalized_display.contains("search") + && (normalized_display.contains("driver") + || normalized_display.contains("entrypoint") + || normalized_display.contains("parallel") + || display_is_command_entrypoint(&citation.display_name, &normalized_display, &path)) + { + Some(PacketEvidenceRole::SearchDriver) + } else if display_is_command_entrypoint(&citation.display_name, &normalized_display, &path) { + Some(PacketEvidenceRole::CommandEntrypoint) + } else if display.contains("eventprocessor") + || display.contains("event_processor") + || display.contains("jsonl") + || path.contains("event_processor") + || path.contains("_events") + || path.contains("-events") + || path.contains("jsonl") + { + Some(PacketEvidenceRole::EventOutputProcessing) + } else if (display.contains("thread") || display.contains("turn")) + && display.contains("startparams") + || path.contains("/protocol/") + { + Some(PacketEvidenceRole::AppServerRequestProtocol) + } else if display.contains("run_exec") + || display.contains("run_main") + || display.contains("service") + || display.contains("orchestrat") + || display.contains("runtime") + || path.contains("runtime") + { + Some(PacketEvidenceRole::RuntimeOrchestration) + } else if display.contains("manifest") || display.contains("plan") || path.contains("workspace") + { + Some(PacketEvidenceRole::WorkspaceDiscoveryAndPlanning) + } else if display.contains("snapshot") || display.contains("refresh") { + Some(PacketEvidenceRole::SnapshotRefresh) + } else if display.contains("projection") + || display.contains("persist") + || display.contains("storage") + || display.contains("store") + || path.contains("store") + { + Some(PacketEvidenceRole::PersistenceAndSearchProjection) + } else if display.contains("indexer") + || display.contains("index_file") + || display.contains("symbol") + || path.contains("indexer") + { + Some(PacketEvidenceRole::SymbolExtraction) + } else if display.contains("route") + || display.contains("router") + || packet_path_is_route_like(&path) + { + Some(PacketEvidenceRole::RouteHandling) + } else if path.contains("/collections/") { + Some(PacketEvidenceRole::CollectionConfiguration) + } else if matches!(citation.kind, NodeKind::FUNCTION | NodeKind::METHOD) + && retrieval_file_role_from_path(&path) == crate::RetrievalFileRole::Source + { + Some(PacketEvidenceRole::SourceEvidence) + } else { + None + } +} + +pub(crate) fn packet_claim_key_for_citation( + role: PacketEvidenceRole, + citation: &AgentCitationDto, +) -> String { + format!( + "{}:{}", + role.as_str(), + normalize_identifier(&citation.display_name) + ) +} + +fn packet_path_is_route_like(path: &str) -> bool { + let normalized_path = packet_display_path(path).replace('\\', "/"); + normalized_path.contains("/routes/") + || normalized_path.contains("/router/") + || normalized_path.contains("/controllers/") + || normalized_path.contains("/views/") + || normalized_path.contains("/pages/") + || normalized_path.contains("/app/") + || normalized_path.contains("/route.") + || normalized_path.ends_with("/route.ts") + || normalized_path.ends_with("/route.tsx") +} + +fn display_is_command_entrypoint(display: &str, normalized_display: &str, path: &str) -> bool { + if normalized_display == "main" || display.ends_with("::main") { + return true; + } + if display.starts_with("Cli") + && display + .chars() + .nth(3) + .is_some_and(|ch| ch.is_uppercase() || ch == '_') + { + return true; + } + if display.contains("::Cli") || display.contains("::cli") { + return true; + } + let normalized_path = packet_display_path(path).replace('\\', "/"); + if normalized_path.ends_with("/main.rs") && normalized_display == "main" { + return true; + } + let lower = display.to_ascii_lowercase(); + lower.contains("commands") && !lower.contains("process") +} + +fn path_contains_test_segment(path: &str) -> bool { + path.starts_with("test/") + || path.starts_with("tests/") + || path.contains("/test/") + || path.contains("/tests/") + || path.contains("-test-") + || path.contains("_test_") + || path.contains("_tests.") + || path.starts_with("test\\") + || path.starts_with("tests\\") + || path.contains("\\test\\") + || path.contains("\\tests\\") +} diff --git a/crates/codestory-runtime/src/agent/packet_plan.rs b/crates/codestory-runtime/src/agent/packet_plan.rs new file mode 100644 index 00000000..d89b5817 --- /dev/null +++ b/crates/codestory-runtime/src/agent/packet_plan.rs @@ -0,0 +1,748 @@ +use crate::agent::eval_probes::{ + eval_probes_enabled, push_eval_architecture_flow_probe_terms, + push_eval_flow_hint_packet_queries, push_index_derived_architecture_probes, + push_prompt_named_file_probe_queries, +}; +use crate::agent::packet_command_profiles::{ + packet_command_exact_probe_queries, packet_command_role_probe_queries, +}; +use crate::agent::packet_required_probes::{ + packet_concrete_file_probe_queries_from_required, packet_prompt_exact_symbol_probe_queries, + packet_sufficiency_required_probe_queries_from_terms, + push_indexing_flow_required_probe_queries, push_search_flow_probe_queries, +}; +use crate::agent::packet_scoring::{ + normalize_identifier, packet_adjacent_query_stop_term, packet_query_stop_term, +}; +use crate::agent::packet_terms::{ + packet_probe_terms, packet_terms_have, packet_terms_have_any, + packet_terms_indicate_indexing_flow, packet_terms_indicate_prepared_session_adapter_flow, + packet_terms_indicate_request_dispatch_flow, packet_terms_indicate_search_execution_flow, + prompt_search_terms, +}; +use crate::agent::planning::dedupe_packet_plan_queries; +use crate::{ + exact_symbol_query_terms, is_non_primary_source_term, looks_like_standalone_symbol_query, + query_mentions_non_primary_source, +}; +use codestory_contracts::api::{ + PacketBudgetModeDto, PacketPlanDto, PacketPlanQueryDto, PacketTaskClassDto, +}; +#[cfg(test)] +pub(crate) fn build_packet_plan( + question: &str, + requested: Option, + budget: PacketBudgetModeDto, +) -> PacketPlanDto { + build_packet_plan_with_extra(question, requested, budget, &[]) +} + +pub(crate) fn build_packet_plan_with_extra( + question: &str, + requested: Option, + budget: PacketBudgetModeDto, + extra_probes: &[String], +) -> PacketPlanDto { + let task_class = requested.unwrap_or_else(|| infer_packet_task_class(question)); + let mut queries = Vec::new(); + push_packet_query( + &mut queries, + question, + "original task phrasing for sidecar-primary source-backed retrieval", + ); + for term in extract_packet_query_terms(question) { + push_packet_query( + &mut queries, + &term, + "concrete symbol, file, route, or code term", + ); + } + for query in extra_probes { + push_packet_query( + &mut queries, + query, + "explicit symbol probe from packet request", + ); + } + for query in packet_symbol_probe_queries(question, task_class, budget) { + push_packet_query( + &mut queries, + &query, + "symbol probe expanded from task wording", + ); + } + for query in task_class_seed_queries(task_class) { + push_packet_query(&mut queries, query, "task-class retrieval seed"); + } + for query in packet_concept_queries(question) { + push_packet_query( + &mut queries, + &query, + "natural-language concept from task wording", + ); + } + let query_cap = packet_plan_query_cap(budget); + queries.truncate(query_cap); + + let mut trace = vec![format!( + "task_class={:?} source={}", + task_class, + if requested.is_some() { + "request" + } else { + "heuristic" + } + )]; + trace.push(format!("planned_queries={}", queries.len())); + if !extra_probes.is_empty() { + trace.push(format!( + "explicit_extra_probes={} source=request", + extra_probes.len() + )); + } + + let mut plan = PacketPlanDto { + task_class, + inferred_task_class: requested.is_none(), + queries, + trace, + }; + dedupe_packet_plan_queries(&mut plan); + plan.trace.push(format!( + "deduped_queries={} eval_probes={}", + plan.queries.len(), + eval_probes_enabled() + )); + plan +} + +pub(crate) fn packet_rank_terms(question: &str) -> Vec { + let mut terms = prompt_search_terms(question); + for term in extract_packet_query_terms(question) { + push_unique_term(&mut terms, &term); + } + for query in packet_symbol_probe_queries( + question, + infer_packet_task_class(question), + PacketBudgetModeDto::Standard, + ) { + push_unique_term(&mut terms, &normalize_identifier(&query)); + } + terms +} + +pub(crate) fn packet_request_extra_probes(extra_probes: Vec) -> Vec { + let mut normalized = Vec::new(); + for probe in extra_probes { + let probe = probe.trim(); + if probe.is_empty() || probe.len() > 240 { + continue; + } + if !normalized + .iter() + .any(|existing: &String| existing.eq_ignore_ascii_case(probe)) + { + normalized.push(probe.to_string()); + } + if normalized.len() >= 16 { + break; + } + } + normalized +} + +pub(crate) fn packet_explicit_request_probe_queries(plan: &PacketPlanDto) -> Vec { + plan.queries + .iter() + .filter(|query| query.purpose.contains("explicit symbol probe")) + .map(|query| query.query.clone()) + .collect() +} + +fn packet_plan_query_cap(budget: PacketBudgetModeDto) -> usize { + match budget { + PacketBudgetModeDto::Tiny => 20, + PacketBudgetModeDto::Compact => 32, + PacketBudgetModeDto::Standard => 48, + PacketBudgetModeDto::Deep => 56, + } +} + +pub(crate) fn packet_symbol_probe_queries( + question: &str, + task_class: PacketTaskClassDto, + budget: PacketBudgetModeDto, +) -> Vec { + let terms = packet_probe_terms(question); + let mut queries = Vec::new(); + let compact = matches!( + budget, + PacketBudgetModeDto::Compact | PacketBudgetModeDto::Tiny + ); + + push_unique_owned_terms( + &mut queries, + &packet_command_role_probe_queries(question, task_class), + ); + push_unique_owned_terms( + &mut queries, + &packet_command_exact_probe_queries(question, task_class), + ); + push_unique_owned_terms( + &mut queries, + &packet_prompt_exact_symbol_probe_queries(question, &terms, task_class), + ); + if eval_probes_enabled() { + push_prompt_named_file_probe_queries(&terms, &mut queries); + } + push_prompt_derived_exact_flow_anchor_queries(&terms, &mut queries); + push_unique_owned_terms( + &mut queries, + &packet_sufficiency_required_probe_queries_from_terms(&terms, task_class), + ); + let concrete_file_queries = packet_concrete_file_probe_queries_from_required(&queries); + push_unique_owned_terms(&mut queries, &concrete_file_queries); + push_flow_hint_packet_queries(&terms, &mut queries); + push_task_class_symbol_probe_queries(task_class, &mut queries); + if !compact { + push_adjacent_packet_term_queries(&terms, &mut queries, 8); + } else if matches!(task_class, PacketTaskClassDto::ArchitectureExplanation) { + push_adjacent_packet_term_queries(&terms, &mut queries, 12); + } + push_generic_symbol_probe_queries(&terms, &mut queries, compact); + + queries.truncate(packet_plan_query_cap(budget)); + queries +} + +fn push_flow_hint_packet_queries(terms: &[String], queries: &mut Vec) { + push_prompt_derived_flow_hint_packet_queries(terms, queries); + push_eval_flow_hint_packet_queries(terms, queries); + if !eval_probes_enabled() { + push_index_derived_architecture_probes( + PacketTaskClassDto::ArchitectureExplanation, + terms, + queries, + ); + } +} + +fn push_prompt_derived_exact_flow_anchor_queries(terms: &[String], queries: &mut Vec) { + let has = |term: &str| packet_terms_have(terms, term); + let has_any = |needles: &[&str]| packet_terms_have_any(terms, needles); + + if has("exec") && has_any(&["runtime", "session"]) { + push_unique_terms(queries, &["exec runtime", "exec session"]); + } + if has("exec") && has_any(&["cli", "command", "subcommand"]) { + push_unique_terms(queries, &["exec cli", "exec command"]); + } + if has_any(&["json", "jsonl"]) && has_any(&["event", "events", "output"]) { + push_unique_terms(queries, &["json event output", "event output processor"]); + } + if has("exec") && has_any(&["event", "events", "json", "jsonl"]) { + push_unique_term(queries, "exec event output"); + } + if has("thread") && has_any(&["start", "starts", "started"]) { + push_unique_term(queries, "thread start"); + } + if has("turn") && has_any(&["start", "starts", "started"]) { + push_unique_term(queries, "turn start"); + } + if packet_terms_indicate_indexing_flow(terms) { + push_indexing_flow_required_probe_queries(queries); + } + if packet_terms_indicate_request_dispatch_flow(terms) { + push_unique_terms( + queries, + &[ + "request interceptor", + "request dispatch", + "transport adapter", + ], + ); + } + if has_any(&["adapter", "adapters", "transport"]) { + push_unique_terms(queries, &["transport adapter", "adapter selection"]); + } + if has("event") && has("loop") { + push_unique_terms( + queries, + &[ + "event loop", + "event dispatch", + "network input", + "command dispatch", + ], + ); + } + if has_any(&["client", "network", "reads", "socket"]) { + push_unique_terms(queries, &["client input", "network input"]); + } + if has("call") && has_any(&["command", "commands", "dispatch", "dispatches"]) { + push_unique_terms(queries, &["command dispatch", "command handler"]); + } + if packet_terms_indicate_search_execution_flow(terms) { + push_search_flow_probe_queries(queries); + } +} + +fn push_prompt_derived_flow_hint_packet_queries(terms: &[String], queries: &mut Vec) { + let has = |term: &str| packet_terms_have(terms, term); + let has_any = |needles: &[&str]| packet_terms_have_any(terms, needles); + + if packet_terms_indicate_indexing_flow(terms) { + push_unique_terms( + queries, + &[ + "index service", + "workspace execution plan", + "workspace indexer", + "symbol extraction indexer", + "projection batch", + "search projection", + "snapshot refresh", + ], + ); + } + if has("exec") && has_any(&["runtime", "session"]) { + push_unique_terms(queries, &["exec runtime", "exec session", "run exec"]); + } + if has("exec") && has_any(&["cli", "command", "subcommand"]) { + push_unique_terms(queries, &["exec cli", "exec command", "subcommand"]); + } + if has_any(&["cli", "command", "subcommand"]) && has_any(&["runtime", "exec"]) { + push_unique_term(queries, "command runtime"); + } + if has_any(&["json", "jsonl"]) && has_any(&["event", "events", "output"]) { + push_unique_terms( + queries, + &[ + "json event output", + "jsonl event output", + "event output processor", + ], + ); + } + if has("exec") && has_any(&["event", "events", "json", "jsonl"]) { + push_unique_terms(queries, &["exec event output", "exec events"]); + } + if has("thread") && has_any(&["start", "starts", "started"]) { + push_unique_terms(queries, &["thread start", "start thread"]); + } + if has("turn") && has_any(&["start", "starts", "started"]) { + push_unique_terms(queries, &["turn start", "start turn"]); + } + if packet_terms_indicate_request_dispatch_flow(terms) { + push_unique_terms( + queries, + &[ + "request interceptor", + "interceptor manager", + "dispatch request", + ], + ); + } + if packet_terms_indicate_prepared_session_adapter_flow(terms) { + push_unique_terms( + queries, + &[ + "request preparation", + "session request", + "session send", + "adapter send", + "adapter selection", + ], + ); + } + if has_any(&["adapter", "adapters", "transport"]) { + push_unique_terms(queries, &["transport adapter", "adapter selection"]); + } + if has("event") && has("loop") { + push_unique_terms(queries, &["event loop", "main event loop"]); + } + if has_any(&["client", "network", "reads", "socket"]) { + push_unique_terms( + queries, + &["client command input", "networking command read"], + ); + } + if has("command") && has_any(&["dispatch", "dispatches"]) { + push_unique_term(queries, "command dispatch"); + } + if packet_terms_indicate_search_execution_flow(terms) { + push_unique_terms( + queries, + &[ + "search entrypoint", + "flag parsing", + "search pipeline", + "argument planning", + "candidate file walk", + "search execution", + "parallel search", + "result printer", + ], + ); + } +} + +fn push_generic_symbol_probe_queries(terms: &[String], queries: &mut Vec, _compact: bool) { + let term_cap = 12; + for term in terms + .iter() + .filter(|term| term.len() >= 4 && !packet_query_stop_term(term.as_str())) + .take(term_cap) + { + push_unique_term(queries, term); + push_unique_term(queries, &packet_camel_case(&[term.as_str()])); + } +} + +fn push_task_class_symbol_probe_queries(task_class: PacketTaskClassDto, queries: &mut Vec) { + let class_queries = match task_class { + PacketTaskClassDto::RouteTracing => { + &["router", "handler", "route", "middleware", "dispatch"][..] + } + PacketTaskClassDto::BugLocalization => &["error", "validate"], + PacketTaskClassDto::ChangeImpact => &["affected", "references"], + PacketTaskClassDto::SymbolOwnership => &["references", "callers"], + PacketTaskClassDto::EditPlanning => &["tests", "config"], + PacketTaskClassDto::ArchitectureExplanation | PacketTaskClassDto::DataFlow => &[], + }; + push_unique_terms(queries, class_queries); +} + +fn push_adjacent_packet_term_queries( + terms: &[String], + queries: &mut Vec, + window_cap: usize, +) { + for window in terms.windows(2).take(window_cap) { + if let [left, right] = window { + if packet_adjacent_query_stop_term(left) || packet_adjacent_query_stop_term(right) { + continue; + } + push_unique_term(queries, &format!("{left}_{right}")); + push_unique_term( + queries, + &packet_camel_case(&[left.as_str(), right.as_str()]), + ); + } + } +} + +pub(crate) fn packet_concept_queries(question: &str) -> Vec { + let include_non_primary_terms = query_mentions_non_primary_source(question); + prompt_search_terms(question) + .into_iter() + .filter(|term| { + term.len() >= 4 + && (include_non_primary_terms || !is_non_primary_source_term(term.as_str())) + && !packet_query_stop_term(term.as_str()) + && !matches!( + term.as_str(), + "answer" + | "cite" + | "cites" + | "explain" + | "files" + | "full" + | "into" + | "moves" + | "support" + | "through" + ) + }) + .take(8) + .collect() +} + +fn packet_camel_case(words: &[&str]) -> String { + let mut value = String::new(); + for word in words { + let mut chars = word.chars(); + if let Some(first) = chars.next() { + value.push(first.to_ascii_uppercase()); + value.extend(chars.map(|ch| ch.to_ascii_lowercase())); + } + } + value +} + +pub(crate) fn infer_packet_task_class(question: &str) -> PacketTaskClassDto { + let lower = question.to_ascii_lowercase(); + if contains_any( + &lower, + &["bug", "error", "failing", "failed", "broken", "crash"], + ) { + PacketTaskClassDto::BugLocalization + } else if contains_any( + &lower, + &["impact", "affected", "regression", "blast radius"], + ) || risk_of_change_prompt(&lower) + { + PacketTaskClassDto::ChangeImpact + } else if contains_any(&lower, &["route", "endpoint", "handler", "api path"]) { + PacketTaskClassDto::RouteTracing + } else if contains_any(&lower, &["owner", "owns", "who calls", "references"]) { + PacketTaskClassDto::SymbolOwnership + } else if contains_any( + &lower, + &[ + "data flow", + "flow from", + "flow into", + "flows from", + "flows into", + "pipeline", + "through", + ], + ) { + PacketTaskClassDto::DataFlow + } else if contains_any( + &lower, + &[ + "where to edit", + "edit", + "change", + "modify", + "implement", + "add ", + ], + ) { + PacketTaskClassDto::EditPlanning + } else { + PacketTaskClassDto::ArchitectureExplanation + } +} + +fn contains_any(haystack: &str, needles: &[&str]) -> bool { + needles.iter().any(|needle| haystack.contains(needle)) +} + +fn risk_of_change_prompt(lower: &str) -> bool { + lower.contains("risk if") + && contains_any(lower, &[" change", " changing", " modify", " modifying"]) + || lower.contains("risk of changing") + || lower.contains("risk from changing") + || lower.contains("risk in changing") +} + +pub(crate) fn extract_packet_query_terms(question: &str) -> Vec { + let mut terms = Vec::new(); + let mut quoted = false; + let mut quote = '\0'; + let mut start = 0usize; + for (index, ch) in question.char_indices() { + if matches!(ch, '`' | '"' | '\'') { + if quoted && ch == quote { + push_unique_term(&mut terms, question[start..index].trim()); + quoted = false; + } else if !quoted { + quoted = true; + quote = ch; + start = index + ch.len_utf8(); + } + } + } + + for term in exact_symbol_query_terms(question) { + push_unique_term(&mut terms, &term); + } + for term in packet_architecture_flow_probe_terms(question) { + push_unique_term(&mut terms, &term); + } + + for token in question.split_whitespace() { + let token = token.trim_matches(|ch: char| { + matches!( + ch, + ',' | '.' | ';' | ':' | '?' | '!' | '(' | ')' | '[' | ']' | '{' | '}' | '"' | '`' + ) + }); + if is_packet_code_like_term(token) + || (looks_like_standalone_symbol_query(token) + && token.len() >= 4 + && !packet_extract_query_stop_term(token)) + { + push_unique_term(&mut terms, token); + } + } + terms.truncate(16); + terms +} + +fn packet_extract_query_stop_term(token: &str) -> bool { + packet_query_stop_term(token) + || matches!( + token.to_ascii_lowercase().as_str(), + "cite" + | "cites" + | "file" + | "files" + | "path" + | "paths" + | "that" + | "them" + | "they" + | "their" + | "your" + | "into" + | "from" + | "with" + | "have" + | "been" + | "will" + | "also" + | "only" + | "over" + | "under" + | "than" + | "then" + | "each" + | "such" + | "some" + | "more" + | "most" + | "many" + | "much" + | "very" + | "just" + | "like" + | "make" + | "made" + | "used" + | "uses" + | "using" + | "work" + | "works" + | "working" + ) +} + +fn is_packet_code_like_term(token: &str) -> bool { + if token.len() < 3 { + return false; + } + token.contains("::") + || token.contains('/') + || token.contains('\\') + || token.contains('.') + || token.contains('_') + || token.contains('-') + || token.chars().skip(1).any(|ch| ch.is_ascii_uppercase()) +} + +pub(crate) fn push_unique_term(terms: &mut Vec, value: &str) { + let value = value.trim(); + if value.len() < 3 { + return; + } + if !terms.iter().any(|term| term.eq_ignore_ascii_case(value)) { + terms.push(value.to_string()); + } +} + +fn push_unique_terms(terms: &mut Vec, values: &[&str]) { + for value in values { + push_unique_term(terms, value); + } +} + +fn push_unique_owned_terms(terms: &mut Vec, values: &[String]) { + for value in values { + push_unique_term(terms, value); + } +} + +fn task_class_seed_queries(task_class: PacketTaskClassDto) -> &'static [&'static str] { + match task_class { + PacketTaskClassDto::ArchitectureExplanation => &[ + "architecture entrypoint", + "runtime flow", + "main", + "run", + "entrypoint", + ], + PacketTaskClassDto::BugLocalization => &["error path", "failure handling"], + PacketTaskClassDto::ChangeImpact => &["affected symbols", "impacted tests"], + PacketTaskClassDto::RouteTracing => &["route handler endpoint", "references"], + PacketTaskClassDto::SymbolOwnership => &["definition references", "callers"], + PacketTaskClassDto::DataFlow => &["pipeline flow", "storage handoff"], + PacketTaskClassDto::EditPlanning => &["edit candidates", "test coverage"], + } +} + +fn push_packet_query(queries: &mut Vec, query: &str, purpose: &str) { + let query = query.trim(); + if query.is_empty() { + return; + } + if queries + .iter() + .any(|existing| existing.query.eq_ignore_ascii_case(query)) + { + return; + } + queries.push(PacketPlanQueryDto { + query: query.to_string(), + purpose: purpose.to_string(), + }); +} + +pub(crate) fn packet_plan_annotation(plan: &PacketPlanDto) -> String { + let queries = plan + .queries + .iter() + .map(|query| query.query.as_str()) + .collect::>() + .join(" | "); + format!( + "packet_plan task_class={:?} inferred={} queries={}", + plan.task_class, plan.inferred_task_class, queries + ) +} + +fn packet_architecture_flow_probe_terms(prompt: &str) -> Vec { + let lower = prompt.to_ascii_lowercase(); + let mut terms = Vec::new(); + if prompt_mentions_indexing_flow(&lower) { + for term in [ + "index service", + "workspace execution plan", + "workspace indexer", + "symbol extraction indexer", + "search projection", + "snapshot refresh", + ] { + push_unique_term(&mut terms, term); + } + } + push_eval_architecture_flow_probe_terms(&lower, &mut terms); + terms +} + +fn prompt_mentions_indexing_flow(lower: &str) -> bool { + contains_any(lower, &["indexing", "indexer", "indexed", " index "]) + && contains_any( + lower, + &[ + "cli", + "command", + "discovery", + "extraction", + "file", + "persistence", + "projection", + "refresh", + "runtime", + "search", + "snapshot", + "storage", + "store", + "symbol", + "workspace", + ], + ) +} diff --git a/crates/codestory-runtime/src/agent/packet_required_probes.rs b/crates/codestory-runtime/src/agent/packet_required_probes.rs new file mode 100644 index 00000000..dd4d6a54 --- /dev/null +++ b/crates/codestory-runtime/src/agent/packet_required_probes.rs @@ -0,0 +1,731 @@ +use crate::agent::eval_probes::{ + eval_probes_enabled, push_eval_required_probe_queries, + push_prompt_concept_derived_symbol_probes, +}; +use crate::agent::packet_batch::packet_file_stem_matches_query; +use crate::agent::packet_scoring::{ + normalize_identifier, packet_display_path, packet_query_stop_term, +}; +use crate::agent::packet_terms::{ + packet_probe_terms, packet_terms_have, packet_terms_have_any, + packet_terms_indicate_indexing_flow, packet_terms_indicate_prepared_session_adapter_flow, + packet_terms_indicate_request_dispatch_flow, packet_terms_indicate_search_execution_flow, +}; +use crate::exact_symbol_query_terms; +use codestory_contracts::api::{ + AgentAnswerDto, AgentCitationDto, PacketClaimDto, PacketTaskClassDto, +}; + +pub(crate) fn packet_missing_sufficiency_probe_queries_with_extra( + question: &str, + task_class: PacketTaskClassDto, + answer: &AgentAnswerDto, + supported_claims: &[PacketClaimDto], + extra_probes: &[String], +) -> Vec { + packet_sufficiency_required_probe_queries_with_extra(question, task_class, extra_probes) + .into_iter() + .filter(|query| !packet_probe_query_is_covered(query, answer, supported_claims)) + .collect() +} + +fn packet_probe_query_is_covered( + query: &str, + answer: &AgentAnswerDto, + supported_claims: &[PacketClaimDto], +) -> bool { + packet_probe_query_is_cited(query, answer) + || packet_probe_query_is_claimed(query, supported_claims) +} + +pub(crate) fn packet_probe_query_is_claimed( + query: &str, + supported_claims: &[PacketClaimDto], +) -> bool { + if let Some(parts) = packet_file_scoped_symbol_probe_parts(query) { + return supported_claims + .iter() + .any(|claim| packet_claim_covers_file_scoped_probe(&parts, claim)); + } + + if !packet_probe_query_allows_claim_coverage(query) { + return false; + } + let normalized_query = normalize_identifier(query); + if normalized_query.is_empty() { + return false; + } + supported_claims.iter().any(|claim| { + let normalized_claim = normalize_identifier(&claim.claim); + normalized_claim.contains(&normalized_query) + }) +} + +fn packet_claim_covers_file_scoped_probe( + parts: &PacketFileScopedSymbolProbe, + claim: &PacketClaimDto, +) -> bool { + let claim_file_matches = claim.citations.iter().any(|citation| { + citation + .file_path + .as_deref() + .map(packet_display_path) + .map(|path| { + path.rsplit(['/', '\\']) + .next() + .unwrap_or(path.as_str()) + .eq_ignore_ascii_case(&parts.file_name) + }) + .unwrap_or(false) + }); + if !claim_file_matches { + return false; + } + let normalized_claim = normalize_identifier(&claim.claim); + parts + .symbols + .iter() + .all(|symbol| normalized_claim.contains(symbol)) +} + +fn packet_probe_query_allows_claim_coverage(query: &str) -> bool { + let trimmed = query.trim(); + trimmed.contains('.') + && !trimmed.contains('/') + && !trimmed.contains('\\') + && !trimmed.chars().any(char::is_whitespace) +} + +#[cfg(test)] +pub(crate) fn packet_sufficiency_required_probe_queries( + question: &str, + task_class: PacketTaskClassDto, +) -> Vec { + packet_sufficiency_required_probe_queries_with_extra(question, task_class, &[]) +} + +pub(crate) fn packet_sufficiency_required_probe_queries_with_extra( + question: &str, + task_class: PacketTaskClassDto, + extra_probes: &[String], +) -> Vec { + let terms = packet_probe_terms(question); + let mut queries = packet_prompt_exact_symbol_probe_queries(question, &terms, task_class); + push_unique_owned_terms(&mut queries, extra_probes); + push_unique_owned_terms( + &mut queries, + &packet_sufficiency_required_probe_queries_from_terms(&terms, task_class), + ); + queries +} + +pub(crate) fn packet_sufficiency_required_probe_queries_from_terms( + terms: &[String], + task_class: PacketTaskClassDto, +) -> Vec { + if !matches!( + task_class, + PacketTaskClassDto::ArchitectureExplanation + | PacketTaskClassDto::DataFlow + | PacketTaskClassDto::ChangeImpact + | PacketTaskClassDto::RouteTracing + | PacketTaskClassDto::EditPlanning + ) { + return Vec::new(); + } + + let has = |term: &str| packet_terms_have(terms, term); + let has_any = |needles: &[&str]| packet_terms_have_any(terms, needles); + let mut queries = Vec::new(); + + if eval_probes_enabled() { + push_eval_required_probe_queries(terms, &mut queries); + return queries; + } + + if has("exec") && has_any(&["runtime", "session"]) { + push_unique_terms(&mut queries, &["exec runtime", "exec session"]); + } + if has("exec") && has_any(&["cli", "command", "subcommand"]) { + push_unique_terms(&mut queries, &["exec cli", "exec command"]); + } + if has_any(&["json", "jsonl"]) && has_any(&["event", "events", "output"]) { + push_unique_terms(&mut queries, &["json event output", "jsonl event output"]); + } + if has("thread") && has_any(&["start", "starts", "started"]) { + push_unique_term(&mut queries, "thread start"); + } + if has("turn") && has_any(&["start", "starts", "started"]) { + push_unique_term(&mut queries, "turn start"); + } + if has_any(&["storage", "persistent"]) || (has("data") && has_any(&["access", "accessed"])) { + push_unique_terms(&mut queries, &["storage access", "persistent storage"]); + } + if packet_terms_indicate_indexing_flow(terms) { + push_indexing_flow_required_probe_queries(&mut queries); + } + if packet_terms_indicate_request_dispatch_flow(terms) { + push_unique_terms( + &mut queries, + &[ + "request interceptor", + "request dispatch", + "transport adapter", + ], + ); + } + if packet_terms_indicate_prepared_session_adapter_flow(terms) { + push_unique_terms( + &mut queries, + &[ + "request preparation", + "session request", + "session send", + "adapter send", + "adapter selection", + ], + ); + } + if has("event") && has("loop") { + push_unique_terms( + &mut queries, + &[ + "event loop", + "event dispatch", + "network input", + "command dispatch", + ], + ); + } + if has("call") && has_any(&["command", "commands", "dispatch", "dispatches"]) { + push_unique_terms(&mut queries, &["command dispatch", "command handler"]); + } + if packet_terms_indicate_search_execution_flow(terms) { + push_search_flow_probe_queries(&mut queries); + } + if has_any(&["indexing", "indexed", "indexer"]) + && (has_any(&["storage", "persistent", "project", "configuration", "group"]) + || has_any(&["command", "commands"])) + { + push_unique_terms( + &mut queries, + &["build index", "source group indexing", "indexer command"], + ); + } + + queries +} + +pub(crate) fn packet_prompt_exact_symbol_probe_queries( + question: &str, + terms: &[String], + task_class: PacketTaskClassDto, +) -> Vec { + if !matches!( + task_class, + PacketTaskClassDto::ArchitectureExplanation + | PacketTaskClassDto::DataFlow + | PacketTaskClassDto::ChangeImpact + | PacketTaskClassDto::RouteTracing + | PacketTaskClassDto::EditPlanning + | PacketTaskClassDto::SymbolOwnership + | PacketTaskClassDto::BugLocalization + ) { + return Vec::new(); + } + + let mut queries = Vec::new(); + for term in exact_symbol_query_terms(question) { + if packet_prompt_exact_symbol_term_is_probe(&term) { + push_unique_term(&mut queries, &term); + } + } + if eval_probes_enabled() { + push_prompt_concept_derived_symbol_probes(terms, &mut queries); + } + queries +} + +fn packet_prompt_exact_symbol_term_is_probe(term: &str) -> bool { + let trimmed = term.trim(); + if trimmed.len() < 3 { + return false; + } + let letters = trimmed + .chars() + .filter(|ch| ch.is_ascii_alphabetic()) + .collect::>(); + !letters.is_empty() && !letters.iter().all(|ch| ch.is_ascii_uppercase()) +} + +pub(crate) fn packet_concrete_file_probe_queries_from_required( + required_queries: &[String], +) -> Vec { + let mut queries = Vec::new(); + for query in required_queries { + if let Some(file_query) = packet_required_probe_file_query(query) { + push_unique_term(&mut queries, &file_query); + } + } + queries +} + +fn packet_required_probe_file_query(query: &str) -> Option { + if !packet_required_probe_needs_concrete_file(query) { + return None; + } + let normalized_query = normalize_identifier(query); + if normalized_query == "eventprocessor" { + return Some("event_processor.rs".to_string()); + } + query + .chars() + .all(|ch| ch.is_ascii_alphanumeric() || ch == '_') + .then(|| format!("{query}.rs")) +} + +pub(crate) fn push_indexing_flow_required_probe_queries(queries: &mut Vec) { + push_unique_terms( + queries, + &[ + "indexing entrypoint", + "file discovery", + "symbol extraction", + "storage persistence", + "search projection", + "snapshot refresh", + ], + ); +} + +pub(crate) fn push_search_flow_probe_queries(queries: &mut Vec) { + push_unique_terms( + queries, + &[ + "search entrypoint", + "flag parsing", + "argument planning", + "candidate file walk", + "search execution", + "parallel search", + "result printer", + ], + ); +} + +pub(crate) fn packet_probe_query_is_cited(query: &str, answer: &AgentAnswerDto) -> bool { + answer + .citations + .iter() + .any(|citation| packet_citation_satisfies_required_probe(query, citation)) +} + +pub(crate) fn packet_citation_satisfies_required_probe( + query: &str, + citation: &AgentCitationDto, +) -> bool { + if let Some(matches_file_scoped_symbol) = + packet_file_scoped_symbol_probe_matches(query, citation) + { + return matches_file_scoped_symbol; + } + if packet_required_probe_needs_concrete_file(query) { + return packet_file_stem_matches_query(query, citation.file_path.as_deref()); + } + if packet_required_probe_needs_full_token_coverage(query) { + if packet_citation_probe_has_exact_identifier_match(query, citation) { + return true; + } + let tokens = packet_probe_match_tokens(query); + return !tokens.is_empty() + && packet_citation_probe_token_coverage(query, citation) >= tokens.len(); + } + let Some(match_rank) = packet_citation_probe_match_rank(query, citation) else { + return false; + }; + !packet_required_probe_needs_exact_match(query) || match_rank >= 4 +} + +pub(crate) fn packet_required_probe_needs_exact_match(query: &str) -> bool { + query.contains("::") || query.contains('.') +} + +fn packet_required_probe_needs_concrete_file(query: &str) -> bool { + let normalized_query = normalize_identifier(query); + normalized_query.contains("execevents") || normalized_query == "eventprocessor" +} + +fn packet_required_probe_needs_full_token_coverage(query: &str) -> bool { + matches!( + normalize_identifier(query).as_str(), + "indexingentrypoint" + | "filediscovery" + | "symbolextraction" + | "storagepersistence" + | "searchprojection" + | "snapshotrefresh" + ) +} + +fn packet_citation_probe_has_exact_identifier_match( + query: &str, + citation: &AgentCitationDto, +) -> bool { + let normalized_query = normalize_identifier(query); + if normalized_query.is_empty() { + return false; + } + let normalized_display = normalize_identifier(&citation.display_name); + normalized_display == normalized_query || normalized_display.ends_with(&normalized_query) +} + +pub(crate) fn packet_citation_probe_match_rank( + query: &str, + citation: &AgentCitationDto, +) -> Option { + let normalized_query = normalize_identifier(query); + if normalized_query.is_empty() { + return Some(0); + } + let normalized_display = normalize_identifier(&citation.display_name); + let normalized_path = citation + .file_path + .as_deref() + .map(packet_display_path) + .map(|path| normalize_identifier(&path)) + .unwrap_or_default(); + if let Some(matches_file_scoped_symbol) = + packet_file_scoped_symbol_probe_matches(query, citation) + { + if matches_file_scoped_symbol { + Some(6) + } else { + None + } + } else if packet_file_stem_matches_query(query, citation.file_path.as_deref()) { + Some(5) + } else if normalized_display == normalized_query + || normalized_display.ends_with(&normalized_query) + || (!packet_required_probe_needs_exact_match(query) + && packet_citation_probe_token_coverage(query, citation) >= 2) + { + Some(4) + } else if normalized_path.contains(&normalized_query) { + Some(3) + } else if normalized_display.contains(&normalized_query) { + Some(2) + } else if !normalized_display.is_empty() && normalized_query.contains(&normalized_display) { + Some(1) + } else { + None + } +} + +fn packet_file_scoped_symbol_probe_matches( + query: &str, + citation: &AgentCitationDto, +) -> Option { + let parts = packet_file_scoped_symbol_probe_parts(query)?; + let path = citation + .file_path + .as_deref() + .map(packet_display_path) + .unwrap_or_default(); + let file_name = path + .rsplit(['/', '\\']) + .next() + .unwrap_or(path.as_str()) + .to_ascii_lowercase(); + if file_name != parts.file_name { + return Some(false); + } + + let normalized_display = normalize_identifier(&citation.display_name); + if parts.symbols.len() >= 3 && parts.symbols[0] == "create" && parts.symbols[1] == "table" { + let Some(table_name) = parts.symbols.last() else { + return Some(false); + }; + let expected = format!("createtable{table_name}"); + return Some(normalized_display == expected || normalized_display.ends_with(&expected)); + } + if parts.symbols.len() >= 2 && parts.symbols[0] == "foreign" && parts.symbols[1] == "key" { + return Some( + normalized_display == "foreignkey" || normalized_display.ends_with("foreignkey"), + ); + } + Some(parts.symbols.iter().any(|symbol| { + normalized_display == *symbol + || normalized_display.ends_with(symbol) + || packet_file_scoped_short_symbol_matches(&citation.display_name, symbol) + })) +} + +fn packet_file_scoped_short_symbol_matches(display_name: &str, symbol: &str) -> bool { + if symbol.len() > 3 { + return false; + } + display_name + .rsplit(['.', ':', '#']) + .next() + .map(normalize_identifier) + .is_some_and(|tail| tail == symbol) +} + +pub(crate) struct PacketFileScopedSymbolProbe { + pub(crate) query_path: String, + pub(crate) file_name: String, + pub(crate) raw_symbols: Vec, + pub(crate) symbols: Vec, +} + +pub(crate) fn packet_file_scoped_symbol_probe_parts( + query: &str, +) -> Option { + let mut parts = query.split_whitespace(); + let file_part = parts + .next()? + .trim_matches(|ch: char| matches!(ch, '`' | '"' | '\'')); + let query_path = file_part.replace('\\', "/"); + let file_name = file_part.rsplit(['/', '\\']).next()?.to_ascii_lowercase(); + if !file_name.contains('.') { + return None; + } + + let raw_symbols = parts + .map(|part| { + part.trim_matches(|ch: char| matches!(ch, '`' | '"' | '\'' | ',' | ';')) + .to_string() + }) + .filter(|part| !part.is_empty()) + .collect::>(); + let symbols = raw_symbols + .iter() + .map(|part| normalize_identifier(part)) + .filter(|part| !part.is_empty()) + .collect::>(); + if symbols.is_empty() { + return None; + } + + Some(PacketFileScopedSymbolProbe { + query_path, + file_name, + raw_symbols, + symbols, + }) +} + +pub(crate) fn packet_citation_probe_token_coverage( + query: &str, + citation: &AgentCitationDto, +) -> usize { + let tokens = packet_probe_match_tokens(query); + if tokens.len() < 2 { + return 0; + } + let display = normalize_identifier(&citation.display_name); + let path = citation + .file_path + .as_deref() + .map(packet_display_path) + .map(|path| normalize_identifier(&path)) + .unwrap_or_default(); + tokens + .iter() + .filter(|token| display.contains(token.as_str()) || path.contains(token.as_str())) + .count() +} + +fn packet_probe_match_tokens(query: &str) -> Vec { + let mut tokens = Vec::new(); + for token in query + .split(|ch: char| !ch.is_ascii_alphanumeric()) + .map(|token| token.trim().to_ascii_lowercase()) + .filter(|token| token.len() >= 3 && !packet_query_stop_term(token)) + { + if !tokens.iter().any(|existing| existing == &token) { + tokens.push(token); + } + } + tokens +} + +fn push_unique_term(terms: &mut Vec, value: &str) { + let value = value.trim(); + if value.len() < 3 { + return; + } + if !terms.iter().any(|term| term.eq_ignore_ascii_case(value)) { + terms.push(value.to_string()); + } +} + +fn push_unique_terms(terms: &mut Vec, values: &[&str]) { + for value in values { + push_unique_term(terms, value); + } +} + +fn push_unique_owned_terms(terms: &mut Vec, values: &[String]) { + for value in values { + push_unique_term(terms, value); + } +} + +#[cfg(test)] +mod tests { + use super::*; + use codestory_contracts::api::{ + AgentCitationDto, NodeId, NodeKind, PacketClaimDto, RetrievalScoreBreakdownDto, + SearchHitOrigin, + }; + + fn test_packet_citation(display_name: &str, file_path: &str, score: f32) -> AgentCitationDto { + AgentCitationDto { + node_id: NodeId(format!( + "test:{}:{}", + display_name.replace(' ', "_"), + file_path.replace(['/', '\\'], "_") + )), + display_name: display_name.to_string(), + kind: NodeKind::FUNCTION, + file_path: Some(file_path.to_string()), + line: Some(1), + score, + origin: SearchHitOrigin::IndexedSymbol, + resolvable: true, + subgraph_id: None, + evidence_edge_ids: Vec::new(), + retrieval_score_breakdown: Some(RetrievalScoreBreakdownDto { + lexical: 0.4, + semantic: 0.2, + graph: 0.3, + total: score, + provenance: Vec::new(), + }), + } + } + + #[test] + fn packet_probe_match_rank_uses_multi_token_path_coverage() { + let mut citation = test_packet_citation( + "std::collections::HashMap", + "codex-rs/exec/src/event_processor_with_jsonl_output.rs", + 0.6, + ); + citation.kind = NodeKind::MODULE; + + assert_eq!( + packet_citation_probe_match_rank("jsonl event output", &citation), + Some(4) + ); + assert_eq!( + packet_citation_probe_token_coverage("jsonl event output", &citation), + 3 + ); + } + + #[test] + fn packet_required_probe_matching_uses_file_stems_and_display_symbols() { + let event_loop_entry = test_packet_citation("service::main", "src/event_loop.c", 0.9); + let command_handler = test_packet_citation("CommandHandler", "src/commands.c", 0.9); + let search_entrypoint = + test_packet_citation("search_driver::run", "crates/search/src/main.rs", 0.9); + let candidate_builder = test_packet_citation( + "CandidateFiles", + "crates/search/src/candidate_files.rs", + 0.9, + ); + + assert!(packet_citation_satisfies_required_probe( + "event_loop.c main", + &event_loop_entry + )); + assert!(packet_citation_satisfies_required_probe( + "command handler", + &command_handler + )); + assert!(packet_citation_satisfies_required_probe( + "search driver run", + &search_entrypoint + )); + assert!(packet_citation_satisfies_required_probe( + "candidate files", + &candidate_builder + )); + } + + #[test] + fn file_scoped_required_probes_match_symbol_inside_file() { + let gin_new = test_packet_citation("New", "gin.go", 0.9); + let gin_with = test_packet_citation("Engine.With", "gin.go", 0.9); + let binding_default = test_packet_citation("Default", "binding/binding.go", 0.9); + let router_group = test_packet_citation("RouterGroup", "routergroup.go", 0.9); + let router_group_handle = test_packet_citation("RouterGroup.Handle", "routergroup.go", 0.9); + + assert!(packet_citation_satisfies_required_probe( + "gin.go New", + &gin_new + )); + assert!(!packet_citation_satisfies_required_probe( + "gin.go New", + &gin_with + )); + assert!(!packet_citation_satisfies_required_probe( + "gin.go Default", + &binding_default + )); + assert!(packet_citation_satisfies_required_probe( + "routergroup.go RouterGroup.Handle", + &router_group_handle + )); + assert!(!packet_citation_satisfies_required_probe( + "routergroup.go RouterGroup.Handle", + &router_group + )); + + let create_track = test_packet_citation( + "CREATE TABLE Track", + "SampleDatabase/DataSources/Sample_Sqlite.sql", + 0.9, + ); + let create_playlist_track = test_packet_citation( + "CREATE TABLE PlaylistTrack", + "SampleDatabase/DataSources/Sample_Sqlite.sql", + 0.9, + ); + assert!(packet_citation_satisfies_required_probe( + "SampleDatabase/DataSources/Sample_Sqlite.sql CREATE TABLE Track", + &create_track + )); + assert!(!packet_citation_satisfies_required_probe( + "SampleDatabase/DataSources/Sample_Sqlite.sql CREATE TABLE Track", + &create_playlist_track + )); + } + + #[test] + fn route_sufficiency_probes_can_be_covered_by_source_claims() { + let claims = vec![ + PacketClaimDto { + claim: "app.use registers middleware on the router.".to_string(), + citations: Vec::new(), + }, + PacketClaimDto { + claim: "app.handle delegates request handling to the router.".to_string(), + citations: Vec::new(), + }, + PacketClaimDto { + claim: "res.send prepares and sends the response body.".to_string(), + citations: Vec::new(), + }, + ]; + + for probe in ["app.use", "app.handle", "res.send"] { + assert!( + packet_probe_query_is_claimed(probe, &claims), + "expected claim-backed coverage for {probe}: {claims:?}" + ); + } + } +} diff --git a/crates/codestory-runtime/src/agent/packet_scoring.rs b/crates/codestory-runtime/src/agent/packet_scoring.rs index 84b23d95..08533795 100644 --- a/crates/codestory-runtime/src/agent/packet_scoring.rs +++ b/crates/codestory-runtime/src/agent/packet_scoring.rs @@ -61,7 +61,11 @@ pub(crate) fn packet_citation_rank( { score -= 3.0; } - if path.starts_with("extensions/") || path.starts_with("vendor/") { + if path.starts_with("extensions/") + || path.starts_with("vendor/") + || path.starts_with("deps/") + || path.contains("/deps/") + { score -= 20.0; } if packet_path_is_test_segment(&path) { @@ -258,7 +262,6 @@ const PACKET_QUERY_STOP_TERMS: &[&str] = &[ "it", "its", "like", - "main", "module", "modules", "move", @@ -334,12 +337,12 @@ pub(crate) fn normalize_identifier(value: &str) -> String { pub(crate) fn packet_display_path(path: &str) -> String { let normalized = path.trim_start_matches("\\\\?\\").replace('\\', "/"); - if !normalized.contains(':') && !normalized.starts_with('/') { - return normalized; - } if let Some(path) = path_after_named_repo_root(&normalized) { return path; } + if !normalized.contains(':') && !normalized.starts_with('/') { + return normalized; + } for prefix in [ "crates/", "src/", @@ -370,12 +373,9 @@ pub(crate) fn packet_display_path(path: &str) -> String { } fn path_after_named_repo_root(normalized: &str) -> Option { - for marker in [ - "/source/repos/", - "/target/agent-benchmark/repos/", - "/repos/", - ] { - let Some(index) = normalized.find(marker) else { + let mut best_match: Option<(usize, String)> = None; + for marker in ["/source/repos/", "source/repos/", "/repos/", "repos/"] { + let Some(index) = normalized.rfind(marker) else { continue; }; let suffix = &normalized[index + marker.len()..]; @@ -384,8 +384,14 @@ fn path_after_named_repo_root(normalized: &str) -> Option { }; let path = &suffix[repo_name_end + 1..]; if !path.is_empty() { - return Some(path.to_string()); + let candidate = path.to_string(); + if best_match + .as_ref() + .is_none_or(|(best_index, _)| index > *best_index) + { + best_match = Some((index, candidate)); + } } } - None + best_match.map(|(_, path)| path) } diff --git a/crates/codestory-runtime/src/agent/packet_search.rs b/crates/codestory-runtime/src/agent/packet_search.rs index 152762d9..36cd9e4c 100644 --- a/crates/codestory-runtime/src/agent/packet_search.rs +++ b/crates/codestory-runtime/src/agent/packet_search.rs @@ -6,12 +6,19 @@ use crate::agent::retrieval_primary::{ }; use crate::{AppController, HybridSearchScoredHit}; use codestory_contracts::api::{ - AgentHybridWeightsDto, ApiError, SearchHit, SemanticFallbackRecordDto, + AgentHybridWeightsDto, ApiError, PacketSidecarQueryDiagnosticDto, SearchHit, + SemanticFallbackRecordDto, }; pub(crate) struct SemanticHybridBatchOutcome { pub results: Vec<(String, Vec)>, pub fallbacks: Vec, + pub sidecar_diagnostics: Vec, +} + +pub(crate) struct LexicalBatchOutcome { + pub results: Vec<(String, Vec)>, + pub sidecar_diagnostics: Vec, } impl AppController { @@ -19,24 +26,34 @@ impl AppController { &self, queries: &[String], max_results: usize, - ) -> Result)>, ApiError> { + latency_budget_ms: Option, + ) -> Result { let batched = queries .iter() .map(|query| (query.clone(), max_results)) .collect::>(); - self.search_lexical_hybrid_batch(&batched) + self.search_lexical_hybrid_batch(&batched, latency_budget_ms) } pub(crate) fn search_lexical_hybrid_batch( &self, queries: &[(String, usize)], - ) -> Result)>, ApiError> { + latency_budget_ms: Option, + ) -> Result { if queries.is_empty() { - return Ok(Vec::new()); + return Ok(LexicalBatchOutcome { + results: Vec::new(), + sidecar_diagnostics: Vec::new(), + }); } if packet_batch_should_use_sidecar(self) { - match search_sidecar_packet_batch(self, queries, None) { - Ok(results) => return Ok(results), + match search_sidecar_packet_batch(self, queries, latency_budget_ms) { + Ok(outcome) => { + return Ok(LexicalBatchOutcome { + results: outcome.results, + sidecar_diagnostics: outcome.diagnostics, + }); + } Err(error) => { tracing::warn!( "sidecar retrieval packet lexical batch unavailable; fail-closed: {}", @@ -63,11 +80,13 @@ impl AppController { pub(crate) fn search_semantic_hybrid_batch( &self, queries: &[(String, usize, Option)], + latency_budget_ms: Option, ) -> Result { if queries.is_empty() { return Ok(SemanticHybridBatchOutcome { results: Vec::new(), fallbacks: Vec::new(), + sidecar_diagnostics: Vec::new(), }); } if packet_batch_should_use_sidecar(self) { @@ -75,10 +94,11 @@ impl AppController { .iter() .map(|(query, max_results, _)| (query.clone(), *max_results)) .collect::>(); - match search_sidecar_packet_batch(self, &batch, None) { - Ok(results) => { + match search_sidecar_packet_batch(self, &batch, latency_budget_ms) { + Ok(outcome) => { return Ok(SemanticHybridBatchOutcome { - results: results + results: outcome + .results .into_iter() .map(|(query, hits)| { ( @@ -96,6 +116,7 @@ impl AppController { }) .collect(), fallbacks: Vec::new(), + sidecar_diagnostics: outcome.diagnostics, }); } Err(error) => { @@ -144,8 +165,40 @@ impl AppController { mod tests { use super::*; + struct EnvVarGuard { + key: &'static str, + previous: Option, + } + + impl EnvVarGuard { + fn cleared(key: &'static str) -> Self { + let previous = std::env::var_os(key); + // SAFETY: test-only env cleanup under the shared process env lock. + unsafe { + std::env::remove_var(key); + } + Self { key, previous } + } + } + + impl Drop for EnvVarGuard { + fn drop(&mut self) { + // SAFETY: restores the process-local env var captured by this guard. + unsafe { + if let Some(previous) = self.previous.take() { + std::env::set_var(self.key, previous); + } else { + std::env::remove_var(self.key); + } + } + } + } + #[test] fn packet_subquery_warmup_fails_closed_without_sidecar_primary() { + let _lock = crate::process_env_test_lock(); + let _retrieval_env = EnvVarGuard::cleared("CODESTORY_RETRIEVAL"); + let _deprecated_retrieval_env = EnvVarGuard::cleared("CODESTORY_RETRIEVAL_V2"); let controller = AppController::new(); let error = controller diff --git a/crates/codestory-runtime/src/agent/packet_source_patterns.rs b/crates/codestory-runtime/src/agent/packet_source_patterns.rs new file mode 100644 index 00000000..ad872ec6 --- /dev/null +++ b/crates/codestory-runtime/src/agent/packet_source_patterns.rs @@ -0,0 +1,303 @@ +use crate::agent::packet_scoring::normalize_identifier; + +pub(crate) fn packet_source_has_all(source: &str, terms: &[&str]) -> bool { + let lower = source.to_ascii_lowercase(); + terms + .iter() + .all(|term| lower.contains(&term.to_ascii_lowercase())) +} + +pub(crate) fn packet_source_has_any(source: &str, terms: &[&str]) -> bool { + let lower = source.to_ascii_lowercase(); + terms + .iter() + .any(|term| lower.contains(&term.to_ascii_lowercase())) +} + +pub(crate) fn packet_source_identifier_with_words(source: &str, words: &[&str]) -> Option { + if words.is_empty() { + return None; + } + for token in source.split(|ch: char| !(ch.is_ascii_alphanumeric() || ch == '_')) { + let token = token.trim(); + if token.is_empty() { + continue; + } + let normalized = normalize_identifier(token); + if words.iter().all(|word| normalized.contains(word)) { + return Some(token.to_string()); + } + } + None +} + +pub(crate) fn packet_source_identifier_with_words_shortest( + source: &str, + words: &[&str], +) -> Option { + if words.is_empty() { + return None; + } + let mut best: Option = None; + for token in source.split(|ch: char| !(ch.is_ascii_alphanumeric() || ch == '_')) { + let token = token.trim(); + if token.is_empty() { + continue; + } + let normalized = normalize_identifier(token); + if !words.iter().all(|word| normalized.contains(word)) { + continue; + } + let replace = best + .as_ref() + .map(|existing| token.len() < existing.len()) + .unwrap_or(true); + if replace { + best = Some(token.to_string()); + } + } + best +} + +pub(crate) fn packet_source_identifier_exact(source: &str, word: &str) -> Option { + for token in source.split(|ch: char| !(ch.is_ascii_alphanumeric() || ch == '_')) { + let token = token.trim(); + if token.eq_ignore_ascii_case(word) { + return Some(token.to_string()); + } + } + None +} + +pub(crate) fn packet_source_identifier_ending_with( + source: &str, + suffix: &str, + excluded: &str, +) -> Option { + for token in source.split(|ch: char| !(ch.is_ascii_alphanumeric() || ch == '_')) { + let token = token.trim(); + if token.is_empty() || token.eq_ignore_ascii_case(excluded) { + continue; + } + if token.ends_with(suffix) { + return Some(token.to_string()); + } + } + None +} + +pub(crate) fn packet_source_constructed_type(source: &str) -> Option { + let bytes = source.as_bytes(); + let needle = b"new "; + let mut index = 0; + while index + needle.len() < bytes.len() { + if &bytes[index..index + needle.len()] != needle { + index += 1; + continue; + } + let mut start = index + needle.len(); + while start < bytes.len() && bytes[start].is_ascii_whitespace() { + start += 1; + } + let mut end = start; + while end < bytes.len() && (bytes[end].is_ascii_alphanumeric() || bytes[end] == b'_') { + end += 1; + } + if end > start { + let value = &source[start..end]; + if value + .chars() + .next() + .is_some_and(|ch| ch.is_ascii_uppercase()) + { + return Some(value.to_string()); + } + } + index = end.saturating_add(1); + } + None +} + +pub(crate) fn packet_display_owner(display: &str) -> Option { + let owner = display + .split(['.', ':', '#', '_']) + .find(|part| { + part.chars() + .next() + .is_some_and(|ch| ch.is_ascii_uppercase()) + })? + .trim(); + if owner.is_empty() { + None + } else { + Some(owner.to_string()) + } +} + +pub(crate) fn packet_sql_create_table_names(source: &str) -> Vec { + let mut names = Vec::new(); + for line in source.lines() { + if let Some(name) = packet_sql_identifier_after(line, "create table") + && !names.iter().any(|existing| existing == &name) + { + names.push(name); + } + if names.len() >= 12 { + break; + } + } + names +} + +pub(crate) fn packet_sql_foreign_key_claims(source: &str) -> Vec { + let mut links = Vec::new(); + let mut current_table: Option = None; + for line in source.lines() { + if let Some(table) = packet_sql_identifier_after(line, "create table") { + current_table = Some(table); + } + let normalized = line.to_ascii_lowercase(); + if !normalized.contains("foreign key") || !normalized.contains("references") { + continue; + } + let Some(source_table) = current_table.clone() else { + continue; + }; + let Some(local_key) = packet_sql_identifier_between(line, "foreign key", "references") + else { + continue; + }; + let Some(target_table) = packet_sql_identifier_after(line, "references") else { + continue; + }; + if !links + .iter() + .any(|(existing_source, existing_target, existing_key)| { + existing_source == &source_table + && existing_target == &target_table + && existing_key == &local_key + }) + { + links.push((source_table, target_table, local_key)); + } + if links.len() >= 18 { + break; + } + } + + let mut claims = Vec::new(); + for (source_table, target_table, local_key) in &links { + claims.push(format!( + "{source_table} rows reference {target_table} rows through {local_key}." + )); + } + + let mut grouped: Vec<(String, Vec)> = Vec::new(); + for (source_table, target_table, _) in links { + if let Some((_, targets)) = grouped + .iter_mut() + .find(|(existing_source, _)| existing_source == &source_table) + { + if !targets.iter().any(|existing| existing == &target_table) { + targets.push(target_table); + } + } else { + grouped.push((source_table, vec![target_table])); + } + } + for (source_table, targets) in grouped { + if targets.len() < 2 { + continue; + } + let claim = format!( + "{source_table} rows reference {} rows.", + packet_human_join(&targets) + ); + if !claims.iter().any(|existing| existing == &claim) { + claims.push(claim); + } + } + + claims +} + +fn packet_sql_identifier_between(line: &str, start: &str, end: &str) -> Option { + let lower = line.to_ascii_lowercase(); + let start_at = lower.find(start)? + start.len(); + let end_at = lower[start_at..].find(end)? + start_at; + packet_first_sql_identifier(&line[start_at..end_at]) +} + +pub(crate) fn packet_sql_identifier_after(line: &str, needle: &str) -> Option { + let lower = line.to_ascii_lowercase(); + let at = lower.find(needle)? + needle.len(); + if needle == "create table" + && lower[at..] + .chars() + .next() + .is_some_and(|ch| ch.is_ascii_alphabetic() || ch == '_') + { + return None; + } + let mut rest = line[at..].trim_start(); + for prefix in ["if not exists", "only"] { + if rest.to_ascii_lowercase().starts_with(prefix) { + rest = rest[prefix.len()..].trim_start(); + } + } + packet_first_sql_identifier(rest) +} + +fn packet_first_sql_identifier(input: &str) -> Option { + let mut token = String::new(); + let mut in_identifier = false; + let mut quote: Option = None; + for ch in input.chars() { + if !in_identifier { + if ch.is_ascii_alphanumeric() || matches!(ch, '_' | '"' | '\'' | '`' | '[') { + in_identifier = true; + quote = match ch { + '"' | '\'' | '`' => Some(ch), + '[' => Some(']'), + _ => None, + }; + if quote.is_none() { + token.push(ch); + } + } + continue; + } + if quote.is_some_and(|end| ch == end) { + break; + } + if quote.is_none() && !(ch.is_ascii_alphanumeric() || matches!(ch, '_' | '.' | '$')) { + break; + } + token.push(ch); + } + let token = token + .trim_matches(|ch: char| matches!(ch, '"' | '\'' | '`' | '[' | ']' | '(' | ')')) + .rsplit('.') + .next() + .unwrap_or_default() + .trim_matches(|ch: char| matches!(ch, '"' | '\'' | '`' | '[' | ']')) + .trim(); + if token.is_empty() { + None + } else { + Some(token.to_string()) + } +} + +pub(crate) fn packet_human_join(items: &[String]) -> String { + match items { + [] => String::new(), + [one] => one.clone(), + [first, second] => format!("{first} and {second}"), + _ => { + let mut parts = items.to_vec(); + let last = parts.pop().unwrap_or_default(); + format!("{}, and {last}", parts.join(", ")) + } + } +} diff --git a/crates/codestory-runtime/src/agent/packet_sufficiency.rs b/crates/codestory-runtime/src/agent/packet_sufficiency.rs new file mode 100644 index 00000000..2c89ccc9 --- /dev/null +++ b/crates/codestory-runtime/src/agent/packet_sufficiency.rs @@ -0,0 +1,568 @@ +use crate::agent::packet_claims::packet_supported_claims; +use crate::agent::packet_evidence_roles::packet_evidence_role; +use crate::agent::packet_plan::packet_symbol_probe_queries; +use crate::agent::packet_required_probes::packet_missing_sufficiency_probe_queries_with_extra; +use crate::agent::packet_scoring::{normalize_identifier, packet_display_path}; +use codestory_contracts::api::{ + AgentAnswerDto, AgentResponseBlockDto, AgentRetrievalStepStatusDto, GraphArtifactDto, + PacketBudgetDto, PacketBudgetModeDto, PacketClaimDto, PacketSufficiencyDto, + PacketSufficiencyStatusDto, PacketTaskClassDto, +}; +use std::collections::{BTreeSet, HashSet}; +use std::path::Path; + +pub(crate) const PACKET_MARKDOWN_TRUNCATION_SUFFIX: &str = + "\n\n... packet section truncated by budget ...\n"; + +pub(crate) struct PacketSufficiencyInput<'a> { + pub(crate) project_root: &'a Path, + pub(crate) question: &'a str, + pub(crate) task_class: PacketTaskClassDto, + pub(crate) answer: &'a AgentAnswerDto, + pub(crate) budget: &'a PacketBudgetDto, + pub(crate) supported_claims: Vec, + pub(crate) missing_required_probe_queries: Vec, + pub(crate) targeted_follow_up_queries: Vec, +} + +#[cfg(test)] +pub(crate) fn build_packet_sufficiency( + project_root: &Path, + question: &str, + task_class: PacketTaskClassDto, + answer: &AgentAnswerDto, + budget: &PacketBudgetDto, +) -> PacketSufficiencyDto { + build_packet_sufficiency_with_extra(project_root, question, task_class, answer, budget, &[]) +} + +pub(crate) fn build_packet_sufficiency_with_extra( + project_root: &Path, + question: &str, + task_class: PacketTaskClassDto, + answer: &AgentAnswerDto, + budget: &PacketBudgetDto, + extra_probes: &[String], +) -> PacketSufficiencyDto { + let supported_claims = packet_supported_claims(answer); + let missing_required_probe_queries = packet_missing_sufficiency_probe_queries_with_extra( + question, + task_class, + answer, + &supported_claims, + extra_probes, + ); + assemble_packet_sufficiency(PacketSufficiencyInput { + project_root, + question, + task_class, + answer, + budget, + supported_claims, + missing_required_probe_queries, + targeted_follow_up_queries: packet_targeted_follow_up_queries(question, task_class), + }) +} + +fn assemble_packet_sufficiency(input: PacketSufficiencyInput<'_>) -> PacketSufficiencyDto { + let PacketSufficiencyInput { + project_root, + question, + task_class, + answer, + budget, + supported_claims, + missing_required_probe_queries, + targeted_follow_up_queries, + } = input; + + let has_errors = answer + .retrieval_trace + .steps + .iter() + .any(|step| step.status == AgentRetrievalStepStatusDto::Error); + let min_citations = packet_sufficiency_min_citations(task_class); + let min_claims = packet_sufficiency_min_claims(task_class); + let has_minimum_coverage = answer.citations.len() >= min_citations; + let has_minimum_claims = supported_claims.len() >= min_claims; + let claim_family_count = packet_supported_claim_family_count(&supported_claims); + let has_minimum_claim_families = + packet_has_minimum_claim_family_coverage(task_class, &supported_claims); + let has_sufficiency_blocking_budget_omission = packet_has_sufficiency_blocking_budget_omission( + answer, + budget, + min_citations, + min_claims, + supported_claims.len(), + ); + let unresolved_sidecar_queries = unresolved_sidecar_queries(answer); + let status = packet_sufficiency_status(PacketSufficiencyStatusInput { + answer, + budget, + has_errors, + has_minimum_coverage, + has_minimum_claims, + has_minimum_claim_families, + has_sufficiency_blocking_budget_omission, + missing_required_probe_queries: &missing_required_probe_queries, + unresolved_sidecar_queries: &unresolved_sidecar_queries, + }); + + let gaps = packet_sufficiency_gaps( + task_class, + answer, + budget, + min_citations, + min_claims, + supported_claims.len(), + claim_family_count, + status, + has_minimum_coverage, + has_minimum_claims, + has_minimum_claim_families, + has_sufficiency_blocking_budget_omission, + &missing_required_probe_queries, + &unresolved_sidecar_queries, + ); + let follow_up_commands = packet_follow_up_commands( + project_root, + question, + status, + budget, + &missing_required_probe_queries, + targeted_follow_up_queries, + ); + let open_next = follow_up_commands.clone(); + let avoid_opening_paths = answer + .citations + .iter() + .filter_map(|citation| citation.file_path.as_ref()) + .map(|path| packet_display_path(path)) + .collect::>() + .into_iter() + .take(12) + .collect::>(); + let avoid_opening = avoid_opening_paths + .iter() + .map(|path| { + format!( + "{} because this packet already includes a citation for the current answer.", + path + ) + }) + .collect::>(); + + PacketSufficiencyDto { + status, + covered_claims: supported_claims, + open_next, + avoid_opening, + avoid_opening_paths, + gaps, + follow_up_commands, + } +} + +pub(crate) fn packet_targeted_follow_up_queries( + question: &str, + task_class: PacketTaskClassDto, +) -> Vec { + packet_symbol_probe_queries(question, task_class, PacketBudgetModeDto::Standard) + .into_iter() + .filter(|query| is_packet_structured_follow_up_query(query)) + .take(6) + .collect() +} + +fn is_packet_structured_follow_up_query(query: &str) -> bool { + query.contains('_') + || query.contains("::") + || query.contains("Options") + || query.contains("Params") + || query.contains("Processor") + || query.contains("Subcommand") +} + +struct PacketSufficiencyStatusInput<'a> { + answer: &'a AgentAnswerDto, + budget: &'a PacketBudgetDto, + has_errors: bool, + has_minimum_coverage: bool, + has_minimum_claims: bool, + has_minimum_claim_families: bool, + has_sufficiency_blocking_budget_omission: bool, + missing_required_probe_queries: &'a [String], + unresolved_sidecar_queries: &'a [String], +} + +fn packet_sufficiency_status( + input: PacketSufficiencyStatusInput<'_>, +) -> PacketSufficiencyStatusDto { + if input.answer.citations.is_empty() { + PacketSufficiencyStatusDto::Insufficient + } else if input.has_errors + || !input.has_minimum_coverage + || !input.has_minimum_claims + || !input.has_minimum_claim_families + || !input.missing_required_probe_queries.is_empty() + || !input.unresolved_sidecar_queries.is_empty() + || input.has_sufficiency_blocking_budget_omission + || packet_budget_exceeded_hard_output_cap(input.budget) + { + PacketSufficiencyStatusDto::Partial + } else { + PacketSufficiencyStatusDto::Sufficient + } +} + +#[allow(clippy::too_many_arguments)] +fn packet_sufficiency_gaps( + task_class: PacketTaskClassDto, + answer: &AgentAnswerDto, + budget: &PacketBudgetDto, + min_citations: usize, + min_claims: usize, + supported_claim_count: usize, + claim_family_count: usize, + status: PacketSufficiencyStatusDto, + has_minimum_coverage: bool, + has_minimum_claims: bool, + has_minimum_claim_families: bool, + has_sufficiency_blocking_budget_omission: bool, + missing_required_probe_queries: &[String], + unresolved_sidecar_queries: &[String], +) -> Vec { + let mut gaps = Vec::new(); + if answer.citations.is_empty() { + gaps.push("No cited anchors were found for the question.".to_string()); + } + if !answer.citations.is_empty() && !has_minimum_coverage { + gaps.push(format!( + "{:?} packet found only {} cited anchor(s); at least {} are required before treating the packet as sufficient.", + task_class, + answer.citations.len(), + min_citations + )); + } + if !answer.citations.is_empty() && !has_minimum_claims { + gaps.push(format!( + "{:?} packet found only {} role-backed claim(s); at least {} are required before treating the packet as sufficient.", + task_class, supported_claim_count, min_claims + )); + } + if !answer.citations.is_empty() && !has_minimum_claim_families { + gaps.push(format!( + "{:?} packet covered only {} distinct claim families; at least {} are required before treating the packet as sufficient.", + task_class, + claim_family_count, + packet_sufficiency_min_claim_families(task_class) + )); + } + if !missing_required_probe_queries.is_empty() { + gaps.push(format!( + "{:?} packet missed required planned flow probe(s): {}.", + task_class, + missing_required_probe_queries.join(", ") + )); + } + if !unresolved_sidecar_queries.is_empty() { + gaps.push(format!( + "{:?} packet had sidecar candidates that could not resolve to indexed symbols for: {}.", + task_class, + unresolved_sidecar_queries.join(", ") + )); + } + if budget.truncated && status != PacketSufficiencyStatusDto::Sufficient { + gaps.push(format!( + "Packet was truncated by {:?} budget: {}.", + budget.requested, + budget.omitted_sections.join(", ") + )); + } + if has_sufficiency_blocking_budget_omission { + gaps.push(format!( + "Packet omitted answer-critical evidence under {:?} budget; use a deeper packet before treating this as complete.", + budget.requested + )); + } + for step in answer + .retrieval_trace + .steps + .iter() + .filter(|step| step.status == AgentRetrievalStepStatusDto::Error) + { + gaps.push(format!("{:?} step failed.", step.kind)); + } + gaps +} + +fn unresolved_sidecar_queries(answer: &AgentAnswerDto) -> Vec { + let mut seen = HashSet::new(); + answer + .retrieval_trace + .packet_sidecar_diagnostics + .iter() + .filter(|diagnostic| { + diagnostic.candidate_count > 0 + && diagnostic.resolved_hit_count == 0 + && diagnostic.unresolved_candidate_count > 0 + }) + .filter(|diagnostic| seen.insert(diagnostic.query.clone())) + .map(|diagnostic| diagnostic.query.clone()) + .collect() +} + +fn packet_sufficiency_min_citations(task_class: PacketTaskClassDto) -> usize { + match task_class { + PacketTaskClassDto::BugLocalization | PacketTaskClassDto::SymbolOwnership => 2, + PacketTaskClassDto::ArchitectureExplanation + | PacketTaskClassDto::ChangeImpact + | PacketTaskClassDto::RouteTracing + | PacketTaskClassDto::DataFlow + | PacketTaskClassDto::EditPlanning => 3, + } +} + +fn packet_sufficiency_min_claims(task_class: PacketTaskClassDto) -> usize { + match task_class { + PacketTaskClassDto::BugLocalization | PacketTaskClassDto::SymbolOwnership => 1, + PacketTaskClassDto::ArchitectureExplanation => 3, + PacketTaskClassDto::ChangeImpact + | PacketTaskClassDto::RouteTracing + | PacketTaskClassDto::DataFlow + | PacketTaskClassDto::EditPlanning => 2, + } +} + +fn packet_sufficiency_min_claim_families(task_class: PacketTaskClassDto) -> usize { + match task_class { + PacketTaskClassDto::ArchitectureExplanation => 3, + PacketTaskClassDto::DataFlow => 2, + PacketTaskClassDto::BugLocalization + | PacketTaskClassDto::ChangeImpact + | PacketTaskClassDto::RouteTracing + | PacketTaskClassDto::SymbolOwnership + | PacketTaskClassDto::EditPlanning => 1, + } +} + +fn packet_has_minimum_claim_family_coverage( + task_class: PacketTaskClassDto, + supported_claims: &[PacketClaimDto], +) -> bool { + packet_supported_claim_family_count(supported_claims) + >= packet_sufficiency_min_claim_families(task_class) +} + +pub(crate) fn packet_supported_claim_family_count(supported_claims: &[PacketClaimDto]) -> usize { + let mut families: HashSet<&'static str> = HashSet::new(); + for claim in supported_claims { + if let Some(family) = packet_claim_family(claim) { + families.insert(family); + } + } + families.len() +} + +pub(crate) fn packet_claim_family(claim: &PacketClaimDto) -> Option<&'static str> { + let normalized_claim = normalize_identifier(&claim.claim); + if !normalized_claim.is_empty() { + if normalized_claim.contains("serialize") && normalized_claim.contains("key") { + return Some("key serialization"); + } + if normalized_claim.contains("cache") + && contains_any( + &normalized_claim, + &["helper", "state", "snapshot", "subscribe", "getset"], + ) + { + return Some("cache state"); + } + if contains_any(&normalized_claim, &["mutation", "mutate", "internalmutate"]) { + return Some("mutation flow"); + } + if contains_any( + &normalized_claim, + &[ + "blank", + "empty", + "casesensitive", + "ignorecase", + "whitespace", + "trim", + ], + ) && contains_any( + &normalized_claim, + &[ + "treats", "tests", "doesnot", "deciding", "return", "compares", + ], + ) { + return Some("predicate behavior"); + } + if normalized_claim.contains("public") + && contains_any( + &normalized_claim, + &["api", "export", "entrypoint", "hook", "method"], + ) + { + return Some("public api/export"); + } + if contains_any( + &normalized_claim, + &[ + "delegates", + "delegate", + "handoff", + "wraps", + "invokes", + "callsinto", + ], + ) { + return Some("delegation/handoff"); + } + } + + claim + .citations + .iter() + .find_map(|citation| packet_evidence_role(citation).map(|role| role.as_str())) + .or_else(|| (!claim.citations.is_empty()).then_some("source evidence")) +} + +fn packet_has_sufficiency_blocking_budget_omission( + answer: &AgentAnswerDto, + budget: &PacketBudgetDto, + min_citations: usize, + min_claims: usize, + supported_claim_count: usize, +) -> bool { + if !budget.truncated { + return false; + } + + let has_claim_stop_signal = + answer.citations.len() >= min_citations && supported_claim_count >= min_claims; + let has_retained_graph = packet_has_retained_graph(answer); + + budget + .omitted_sections + .iter() + .any(|section| match section.as_str() { + "packet_payload" => true, + "markdown_blocks" => { + !has_claim_stop_signal || packet_markdown_truncation_blocks_sufficiency(answer) + } + "trail_edges" => !has_claim_stop_signal || !has_retained_graph, + _ => false, + }) +} + +fn packet_has_retained_graph(answer: &AgentAnswerDto) -> bool { + answer.graphs.iter().any(|artifact| match artifact { + GraphArtifactDto::Uml { graph, .. } => !graph.edges.is_empty(), + GraphArtifactDto::Mermaid { .. } => false, + }) +} + +fn packet_markdown_truncation_blocks_sufficiency(answer: &AgentAnswerDto) -> bool { + let mut saw_truncated_markdown = false; + for section in &answer.sections { + for block in §ion.blocks { + let AgentResponseBlockDto::Markdown { markdown } = block else { + continue; + }; + if !markdown.contains(PACKET_MARKDOWN_TRUNCATION_SUFFIX.trim()) { + continue; + } + saw_truncated_markdown = true; + if !packet_section_allows_nonblocking_truncation(section.id.as_str()) { + return true; + } + } + } + !saw_truncated_markdown +} + +fn packet_section_allows_nonblocking_truncation(section_id: &str) -> bool { + section_id == "retrieval-evidence" + || section_id == "diagrams" + || section_id.starts_with("packet-subquery-") +} + +pub(crate) fn packet_budget_exceeded_hard_output_cap(budget: &PacketBudgetDto) -> bool { + budget.used.output_bytes > budget.limits.max_output_bytes +} + +fn packet_follow_up_commands( + project_root: &Path, + question: &str, + status: PacketSufficiencyStatusDto, + budget: &PacketBudgetDto, + missing_required_probe_queries: &[String], + targeted_follow_up_queries: Vec, +) -> Vec { + let project = quote_packet_project_arg(project_root); + match status { + PacketSufficiencyStatusDto::Sufficient => Vec::new(), + PacketSufficiencyStatusDto::Partial => { + let queries = if missing_required_probe_queries.is_empty() { + targeted_follow_up_queries + } else { + missing_required_probe_queries.to_vec() + }; + let mut commands = packet_follow_up_search_commands(project.as_str(), &queries); + commands.truncate(8); + commands + .into_iter() + .chain(budget.next_deeper_command.clone()) + .chain(std::iter::once(format!( + "codestory-cli search --project {project} --query {} --why", + quote_packet_command_value(question) + ))) + .collect() + } + PacketSufficiencyStatusDto::Insufficient => vec![ + format!("codestory-cli index --project {project} --refresh full"), + format!( + "codestory-cli search --project {project} --query {} --why", + quote_packet_command_value(question) + ), + ], + } +} + +fn packet_follow_up_search_commands(quoted_project: &str, queries: &[String]) -> Vec { + let mut commands = Vec::new(); + for query in queries { + push_unique_term( + &mut commands, + &format!( + "codestory-cli search --project {quoted_project} --query {} --why", + quote_packet_command_value(query) + ), + ); + } + commands +} + +pub(crate) fn quote_packet_project_arg(project_root: &Path) -> String { + quote_packet_command_value(project_root.to_string_lossy().as_ref()) +} + +pub(crate) fn quote_packet_command_value(value: &str) -> String { + format!("'{}'", value.replace('\'', "''")) +} + +fn contains_any(haystack: &str, needles: &[&str]) -> bool { + needles.iter().any(|needle| haystack.contains(needle)) +} + +fn push_unique_term(terms: &mut Vec, value: &str) { + let value = value.trim(); + if value.is_empty() { + return; + } + if !terms.iter().any(|existing| existing == value) { + terms.push(value.to_string()); + } +} diff --git a/crates/codestory-runtime/src/agent/packet_terms.rs b/crates/codestory-runtime/src/agent/packet_terms.rs new file mode 100644 index 00000000..3aa272e5 --- /dev/null +++ b/crates/codestory-runtime/src/agent/packet_terms.rs @@ -0,0 +1,475 @@ +use crate::agent::packet_scoring::normalize_identifier; +use crate::{is_non_primary_source_term, query_mentions_non_primary_source}; +use std::collections::HashSet; + +pub(crate) fn prompt_search_terms(prompt: &str) -> Vec { + const STOPWORDS: &[&str] = &[ + "a", + "actual", + "already", + "an", + "and", + "are", + "area", + "areas", + "across", + "as", + "at", + "be", + "boundaries", + "boundary", + "by", + "can", + "current", + "does", + "existing", + "for", + "from", + "how", + "implementation", + "implemented", + "in", + "is", + "it", + "of", + "on", + "or", + "repo", + "repository", + "risk", + "risks", + "study", + "surface", + "surfaces", + "the", + "this", + "to", + "what", + "where", + "which", + "why", + "with", + "work", + "works", + ]; + + let mut terms = Vec::new(); + let mut current = String::new(); + let mut seen = HashSet::new(); + + for ch in prompt.chars() { + if ch.is_ascii_alphanumeric() || ch == '_' { + current.push(ch.to_ascii_lowercase()); + continue; + } + + if current.len() >= 3 + && !STOPWORDS.contains(¤t.as_str()) + && seen.insert(current.clone()) + { + terms.push(current.clone()); + } + current.clear(); + } + + if current.len() >= 3 && !STOPWORDS.contains(¤t.as_str()) && seen.insert(current.clone()) + { + terms.push(current); + } + + terms +} + +pub(crate) fn packet_probe_terms(question: &str) -> Vec { + let include_non_primary_terms = query_mentions_non_primary_source(question); + let brand_terms = brand_phrase_noise_terms(question); + let mut terms = prompt_search_terms(question) + .into_iter() + .filter(|term| { + include_non_primary_terms + || !is_non_primary_source_term(term) + || packet_retains_non_primary_probe_term(question, term) + }) + .collect::>(); + + if !brand_terms.is_empty() && packet_terms_have_specific_flow_anchor(&terms) { + terms.retain(|term| !brand_terms.contains(term.as_str())); + } + + terms +} + +fn packet_retains_non_primary_probe_term(question: &str, term: &str) -> bool { + if !matches!(term, "bench" | "benchmark" | "benchmarks") { + return false; + } + let lowered = question.to_ascii_lowercase(); + lowered.contains("architecture") + && (lowered.contains("boundary") + || lowered.contains("boundaries") + || lowered.contains("across")) +} + +fn packet_terms_have_specific_flow_anchor(terms: &[String]) -> bool { + let has = |term: &str| terms.iter().any(|value| value.eq_ignore_ascii_case(term)); + let has_any = |needles: &[&str]| needles.iter().any(|needle| has(needle)); + (has("extension") && has("host")) + || ((has("indexing") || has("indexer")) && (has("storage") || has("persistent"))) + || ((has("json") || has("jsonl")) && (has("exec") || has("thread") || has("turn"))) + || packet_terms_indicate_request_dispatch_flow(terms) + || (has("event") && has("loop")) + || (has_any(&["command", "commands"]) && has_any(&["dispatch", "dispatches"])) + || (has("search") && (has("flags") || has("matcher") || has("haystack"))) + || has("payload") + || has("posts") + || has("post") + || has("comments") + || has("feed") + || has("rss") +} + +fn brand_phrase_noise_terms(question: &str) -> HashSet { + let mut terms = HashSet::new(); + let tokens = question + .split_whitespace() + .map(|token| { + token.trim_matches(|ch: char| { + matches!( + ch, + ',' | '.' | ';' | ':' | '?' | '!' | '(' | ')' | '[' | ']' | '{' | '}' + ) + }) + }) + .collect::>(); + + for window in tokens.windows(3) { + if let [left, joiner, right] = window + && *joiner == "&" + { + if let Some(term) = title_case_brand_token_term(left) { + terms.insert(term); + } + if let Some(term) = title_case_brand_token_term(right) { + terms.insert(term); + } + } + } + + terms +} + +fn title_case_brand_token_term(token: &str) -> Option { + let mut chars = token.chars(); + let first = chars.next()?; + let second = chars.next()?; + if first.is_ascii_uppercase() + && second.is_ascii_lowercase() + && chars.all(|ch| ch.is_ascii_alphanumeric() || ch == '-' || ch == '_') + { + Some(token.to_ascii_lowercase()) + } else { + None + } +} + +pub(crate) fn packet_terms_have(terms: &[String], needle: &str) -> bool { + let normalized_needle = normalize_identifier(needle); + terms.iter().any(|value| { + value.eq_ignore_ascii_case(needle) || normalize_identifier(value) == normalized_needle + }) +} + +pub(crate) fn packet_terms_have_any(terms: &[String], needles: &[&str]) -> bool { + needles + .iter() + .any(|needle| packet_terms_have(terms, needle)) +} + +pub(crate) fn packet_terms_indicate_indexing_flow(terms: &[String]) -> bool { + let has_any = |needles: &[&str]| packet_terms_have_any(terms, needles); + + has_any(&["index", "indexed", "indexer", "indexing"]) + && has_any(&[ + "cli", + "command", + "discovery", + "extraction", + "file", + "files", + "persistence", + "projection", + "refresh", + "runtime", + "search", + "snapshot", + "storage", + "store", + "symbol", + "workspace", + ]) +} + +pub(crate) fn packet_terms_indicate_request_dispatch_flow(terms: &[String]) -> bool { + let has = |term: &str| packet_terms_have(terms, term); + let has_any = |needles: &[&str]| packet_terms_have_any(terms, needles); + let explicit_client_transport = has_any(&[ + "adapter", + "adapters", + "interceptor", + "interceptors", + "transport", + ]); + if packet_terms_indicate_server_route_dispatch_flow(terms) && !explicit_client_transport { + return false; + } + let has_compound_request_dispatch = terms.iter().any(|term| { + let normalized = normalize_identifier(term); + normalized.contains("dispatch") && normalized.contains("request") + }); + has_any(&["interceptor", "interceptors"]) + || has_compound_request_dispatch + || ((has("request") || has("http")) + && has_any(&["adapter", "adapters", "dispatch", "dispatches", "transport"])) +} + +pub(crate) fn packet_terms_indicate_server_route_dispatch_flow(terms: &[String]) -> bool { + let has = |term: &str| packet_terms_have(terms, term); + let has_any = |needles: &[&str]| packet_terms_have_any(terms, needles); + has_any(&["route", "routes", "router"]) + && has_any(&[ + "handler", + "handlers", + "middleware", + "dispatch", + "dispatches", + ]) + && (has("request") + || has_any(&["server", "incoming", "http"]) + || has_any(&["engine", "method", "methods"])) +} + +pub(crate) fn packet_terms_indicate_prepared_session_adapter_flow(terms: &[String]) -> bool { + let has = |term: &str| packet_terms_have(terms, term); + let has_any = |needles: &[&str]| packet_terms_have_any(terms, needles); + (has("prepared") || has("prepare")) + && has_any(&["request", "requests"]) + && has("session") + && has_any(&["adapter", "adapters", "send", "sends", "transport"]) +} + +pub(crate) fn packet_terms_indicate_search_execution_flow(terms: &[String]) -> bool { + let has = |term: &str| packet_terms_have(terms, term); + let has_any = |needles: &[&str]| packet_terms_have_any(terms, needles); + has("search") + && has_any(&[ + "candidate", + "flags", + "haystack", + "matcher", + "printer", + "searcher", + "walk", + "walks", + ]) +} + +pub(crate) fn packet_terms_indicate_stylesheet_animation_flow(terms: &[String]) -> bool { + let has = |term: &str| packet_terms_have(terms, term); + let has_any = |needles: &[&str]| packet_terms_have_any(terms, needles); + let css_signal = has("css") + || has("animatecss") + || has_any(&[ + "stylesheet", + "stylesheets", + "style", + "styles", + "selector", + "selectors", + ]); + let animation_signal = has_any(&[ + "animate", + "animated", + "animation", + "animations", + "keyframe", + "keyframes", + ]); + let source_shape_signal = has_any(&[ + "base", + "class", + "classes", + "custom", + "property", + "properties", + "selector", + "selectors", + "variable", + "variables", + ]); + css_signal && animation_signal && source_shape_signal +} + +pub(crate) fn packet_terms_indicate_sql_schema_flow(terms: &[String]) -> bool { + let has_any = |needles: &[&str]| packet_terms_have_any(terms, needles); + has_any(&["sql", "schema", "schemas", "table", "tables"]) + && has_any(&[ + "relationship", + "relationships", + "relation", + "relations", + "foreign", + "constraint", + "constraints", + "reference", + "references", + ]) + && has_any(&["table", "tables", "create", "schema", "schemas"]) +} + +pub(crate) fn packet_terms_indicate_hook_cache_flow(terms: &[String]) -> bool { + let hook_signal = packet_terms_have_any(terms, &["hook", "hooks"]) + || terms.iter().any(|term| { + let normalized = normalize_identifier(term); + normalized.as_bytes() == [115, 119, 114] + || (normalized.len() > 3 && normalized.starts_with("use")) + }); + let cache_or_public_api_intent = packet_terms_have_any( + terms, + &[ + "api", + "cache", + "caches", + "caching", + "expose", + "exposes", + "export", + "exports", + "public", + "serialize", + "serializes", + ], + ); + + hook_signal && cache_or_public_api_intent +} + +pub(crate) fn packet_terms_indicate_client_send_flow(terms: &[String]) -> bool { + let client_or_request_intent = packet_terms_have_any( + terms, + &[ + "client", + "clients", + "request", + "requests", + "http", + "httpclient", + ], + ); + let send_or_transport_intent = packet_terms_have_any( + terms, + &[ + "convenience", + "helper", + "helpers", + "send", + "sending", + "sent", + "transport", + "transports", + ], + ); + + client_or_request_intent && send_or_transport_intent +} + +pub(crate) fn packet_terms_indicate_event_loop_command_flow(terms: &[String]) -> bool { + let has = |term: &str| packet_terms_have(terms, term); + let has_any = |needles: &[&str]| packet_terms_have_any(terms, needles); + let event_loop_intent = has("eventloop") || (has("event") && has("loop")); + let command_dispatch_intent = has_any(&["command", "commands"]) + && has_any(&[ + "acl", + "arity", + "call", + "dispatch", + "dispatches", + "execute", + "executes", + "execution", + "handler", + "handlers", + "input", + "network", + "process", + "slowlog", + "table", + ]); + let network_command_input_intent = + has_any(&["network", "socket", "client", "input"]) && has_any(&["command", "commands"]); + + event_loop_intent || command_dispatch_intent || network_command_input_intent +} + +pub(crate) fn packet_terms_indicate_url_session_request_flow(terms: &[String]) -> bool { + packet_terms_have_any(terms, &["session", "urlsession", "callback", "callbacks"]) + && packet_terms_have_any( + terms, + &[ + "request", + "requests", + "resume", + "resumes", + "task", + "tasks", + "validate", + "validates", + "validation", + ], + ) +} + +pub(crate) fn packet_terms_indicate_shell_version_use_flow(terms: &[String]) -> bool { + packet_terms_have_any( + terms, + &[ + "bash", "shell", "script", "command", "dispatch", "install", "version", + ], + ) && packet_terms_have_any(terms, &["use", "switch", "active", "current", "needed"]) +} + +pub(crate) fn packet_terms_indicate_string_predicate_flow(terms: &[String]) -> bool { + packet_terms_have_any( + terms, + &["string", "strings", "charsequence", "charsequences", "text"], + ) && packet_terms_have_any( + terms, + &[ + "blank", + "empty", + "whitespace", + "trim", + "trims", + "predicate", + "predicates", + ], + ) +} + +pub(crate) fn packet_terms_indicate_runtime_formatting_flow(terms: &[String]) -> bool { + packet_terms_have_any( + terms, + &["format", "formats", "formatting", "vformat", "format_to"], + ) && packet_terms_have_any( + terms, + &[ + "arg", + "args", + "argument", + "arguments", + "runtime", + "type", + "erased", + "output", + ], + ) +} diff --git a/crates/codestory-runtime/src/agent/packet_trace.rs b/crates/codestory-runtime/src/agent/packet_trace.rs index 3b9a3295..66c07f8c 100644 --- a/crates/codestory-runtime/src/agent/packet_trace.rs +++ b/crates/codestory-runtime/src/agent/packet_trace.rs @@ -1,4 +1,4 @@ -//! Trace adapters that merge packet batch retrieval results into agent answers. +//! Trace adapters that merge packet batch retrieval results into agent answers. #![allow(clippy::items_after_test_module)] @@ -224,6 +224,7 @@ mod golden_tests { semantic_fallbacks: Vec::new(), annotations: Vec::new(), steps: Vec::new(), + packet_sidecar_diagnostics: Vec::new(), retrieval_shadow: None, }, }; diff --git a/crates/codestory-runtime/src/agent/retrieval_primary.rs b/crates/codestory-runtime/src/agent/retrieval_primary.rs index b424ea58..5d7dd9b3 100644 --- a/crates/codestory-runtime/src/agent/retrieval_primary.rs +++ b/crates/codestory-runtime/src/agent/retrieval_primary.rs @@ -5,8 +5,9 @@ use crate::agent::retrieval_rollback::{RollbackCheckInput, check_and_log_rollbac use crate::{AppController, HybridSearchScoredHit}; use anyhow::Error as AnyhowError; use codestory_contracts::api::{ - ApiError, RetrievalCandidateResolutionCountDto, RetrievalCandidateSummaryDto, - RetrievalScoreBreakdownDto, RetrievalShadowDto, RetrievalStageTimingDto, SearchHit, + ApiError, PacketSidecarQueryDiagnosticDto, RetrievalCandidateResolutionCountDto, + RetrievalCandidateSummaryDto, RetrievalScoreBreakdownDto, RetrievalShadowDto, + RetrievalStageTimingDto, SearchHit, }; use codestory_contracts::graph::{NodeId as CoreNodeId, NodeKind}; use codestory_retrieval::{ @@ -19,6 +20,8 @@ use std::collections::{BTreeMap, HashMap}; use std::path::{Path, PathBuf}; const DEFAULT_SIDECAR_BUDGET_MS: u64 = 1_000; +const DEFAULT_PACKET_BATCH_BUDGET_MS: u64 = 18_000; +const MAX_PACKET_BATCH_BUDGET_MS: u64 = 120_000; const MAX_SHADOW_CANDIDATES: usize = 20; const MAX_SHADOW_WOULD_RANK: usize = 10; pub(crate) const RETRIEVAL_VERSION_SIDECAR: &str = "sidecar"; @@ -159,17 +162,45 @@ pub(crate) fn sidecar_retrieval_unavailable_error( fn sidecar_retrieval_recovery_commands(project: &str) -> Vec { let project = quote_cli_arg(project); vec![ - format!("codestory-cli index --project {project} --refresh full"), format!("codestory-cli retrieval bootstrap --project {project} --format json"), - format!("codestory-cli retrieval index --project {project} --refresh full"), + format!("codestory-cli retrieval index --project {project} --refresh full --format json"), + format!("codestory-cli retrieval status --project {project} --format json"), format!("codestory-cli doctor --project {project} --format markdown"), ] } fn quote_cli_arg(value: &str) -> String { + let normalized = clean_cli_path(value); + if normalized + .chars() + .any(|ch| matches!(ch, '$' | '`' | '\'' | '"')) + { + quote_shell_single_quoted_arg(&normalized) + } else { + format!("\"{}\"", normalized.replace('"', "\\\"")) + } +} + +#[cfg(windows)] +fn quote_shell_single_quoted_arg(value: &str) -> String { format!("'{}'", value.replace('\'', "''")) } +#[cfg(not(windows))] +fn quote_shell_single_quoted_arg(value: &str) -> String { + format!("'{}'", value.replace('\'', "'\\''")) +} + +fn clean_cli_path(value: &str) -> String { + let mut path = value.replace('\\', "/"); + if let Some(stripped) = path.strip_prefix("//?/UNC/") { + path = format!("//{stripped}"); + } else if path.starts_with("//?/") { + path = path[4..].to_string(); + } + path +} + pub(crate) fn shadow_retrieval_enabled() -> bool { if let Some(env) = unsupported_deprecated_env() { tracing::error!( @@ -274,6 +305,13 @@ pub(crate) fn sidecar_budget_ms(latency_budget_ms: Option) -> u64 { .max(100) } +fn sidecar_packet_batch_budget_ms(latency_budget_ms: Option) -> u64 { + latency_budget_ms + .map(u64::from) + .unwrap_or(DEFAULT_PACKET_BATCH_BUDGET_MS) + .clamp(100, MAX_PACKET_BATCH_BUDGET_MS) +} + pub(crate) fn run_sidecar_query( controller: &AppController, query: &str, @@ -351,13 +389,29 @@ pub(crate) fn try_sidecar_primary_search( prompt: &str, max_results: usize, latency_budget_ms: Option, +) -> Option { + try_sidecar_primary_search_inner_with_query( + controller, + prompt, + max_results, + latency_budget_ms, + run_sidecar_query, + ) +} + +fn try_sidecar_primary_search_inner_with_query( + controller: &AppController, + prompt: &str, + max_results: usize, + latency_budget_ms: Option, + mut run_query: impl FnMut(&AppController, &str, Option) -> Result, ) -> Option { if !sidecar_retrieval_primary_enabled(controller) { return sidecar_retrieval_unavailable_reason(controller) .map(|reason| SidecarPrimarySearchOutcome::Unavailable { reason }); } - let query_result = match run_sidecar_query(controller, prompt, latency_budget_ms) { + let query_result = match run_query(controller, prompt, latency_budget_ms) { Ok(result) => result, Err(error) => { return Some(SidecarPrimarySearchOutcome::Unavailable { @@ -366,10 +420,34 @@ pub(crate) fn try_sidecar_primary_search( } }; + Some(sidecar_primary_search_outcome_from_query_result( + controller, + query_result, + max_results, + )) +} + +fn sidecar_primary_search_outcome_from_query_result( + controller: &AppController, + query_result: QueryResult, + max_results: usize, +) -> SidecarPrimarySearchOutcome { let candidate_count = query_result.hits.len(); - let resolved_hits = - resolve_sidecar_candidates_to_search_hits(controller, &query_result.hits, max_results) - .unwrap_or_default(); + let resolved_hits = match resolve_sidecar_candidates_to_search_hits( + controller, + &query_result.hits, + max_results, + ) { + Ok(hits) => hits, + Err(error) => { + return SidecarPrimarySearchOutcome::Unavailable { + reason: format!( + "sidecar retrieval primary unavailable: candidate resolution failed: {}", + error.message + ), + }; + } + }; let shadow = shadow_from_query_result_with_candidate_admission_diagnostics( controller, query_result.clone(), @@ -382,7 +460,7 @@ pub(crate) fn try_sidecar_primary_search( if let Some(reason) = sidecar_result_rejection_reason(&query_result, &resolved_hits) { let diagnostic = sidecar_rejection_diagnostic(controller, &query_result, &resolved_hits, 5); let reason = format!("{reason}; {diagnostic}"); - return Some(SidecarPrimarySearchOutcome::Rejected { shadow, reason }); + return SidecarPrimarySearchOutcome::Rejected { shadow, reason }; } let hits = resolved_hits; @@ -398,45 +476,97 @@ pub(crate) fn try_sidecar_primary_search( }) .collect(); - Some(SidecarPrimarySearchOutcome::Served { + SidecarPrimarySearchOutcome::Served { hits, scored_hits, shadow, - }) + } } pub(crate) fn search_sidecar_packet_batch( controller: &AppController, queries: &[(String, usize)], latency_budget_ms: Option, -) -> Result)>, ApiError> { +) -> Result { with_sidecar_primary_retrieval(|| { search_sidecar_packet_batch_inner(controller, queries, latency_budget_ms) }) } +pub(crate) struct SidecarPacketBatchOutcome { + pub results: Vec<(String, Vec)>, + pub diagnostics: Vec, +} + +struct SidecarCandidateResolutionOutcome { + resolved_hits: Vec, + attempted_candidate_count: usize, + unresolved_candidate_count: usize, +} + +fn packet_sidecar_query_diagnostic( + query_result: &QueryResult, + resolution: &SidecarCandidateResolutionOutcome, +) -> PacketSidecarQueryDiagnosticDto { + PacketSidecarQueryDiagnosticDto { + query: query_result.query.clone(), + retrieval_mode: query_result.trace.retrieval_mode.clone(), + candidate_count: u32::try_from(resolution.attempted_candidate_count).unwrap_or(u32::MAX), + resolved_hit_count: u32::try_from(resolution.resolved_hits.len()).unwrap_or(u32::MAX), + unresolved_candidate_count: u32::try_from(resolution.unresolved_candidate_count) + .unwrap_or(u32::MAX), + diagnostic: (resolution.unresolved_candidate_count > 0) + .then(|| "sidecar candidates did not all resolve to indexed symbols".to_string()), + } +} + fn search_sidecar_packet_batch_inner( controller: &AppController, queries: &[(String, usize)], latency_budget_ms: Option, -) -> Result)>, ApiError> { - let per_query_budget = sidecar_budget_ms(latency_budget_ms) +) -> Result { + search_sidecar_packet_batch_inner_with_query( + controller, + queries, + latency_budget_ms, + run_sidecar_query, + ) +} + +fn search_sidecar_packet_batch_inner_with_query( + controller: &AppController, + queries: &[(String, usize)], + latency_budget_ms: Option, + mut run_query: impl FnMut(&AppController, &str, Option) -> Result, +) -> Result { + let per_query_budget = sidecar_packet_batch_budget_ms(latency_budget_ms) .checked_div(queries.len().max(1) as u64) .unwrap_or(100) .max(100); let mut results = Vec::with_capacity(queries.len()); + let mut diagnostics = Vec::with_capacity(queries.len()); for (query, max_results) in queries { - let query_result = run_sidecar_query(controller, query, Some(per_query_budget as u32)) - .map_err(|error| { + let query_result = + run_query(controller, query, Some(per_query_budget as u32)).map_err(|error| { sidecar_retrieval_unavailable_error( controller, format!("sidecar retrieval batch query failed: {error}"), ) })?; let max_results = (*max_results).clamp(1, 50); - let resolved_hits = - resolve_sidecar_candidates_to_search_hits(controller, &query_result.hits, max_results) - .unwrap_or_default(); + let resolution = + resolve_sidecar_candidates_with_stats(controller, &query_result.hits, max_results) + .map_err(|error| { + sidecar_retrieval_unavailable_error( + controller, + format!( + "sidecar retrieval rejected packet batch query `{query}`: candidate resolution failed: {}", + error.message + ), + ) + })?; + diagnostics.push(packet_sidecar_query_diagnostic(&query_result, &resolution)); + let resolved_hits = resolution.resolved_hits; if let Some(reason) = sidecar_packet_batch_rejection_reason(&query_result, &resolved_hits) { let diagnostic = sidecar_rejection_diagnostic(controller, &query_result, &resolved_hits, 5); @@ -449,12 +579,15 @@ fn search_sidecar_packet_batch_inner( } results.push((query.clone(), resolved_hits)); } - Ok(results) + Ok(SidecarPacketBatchOutcome { + results, + diagnostics, + }) } fn sidecar_packet_batch_rejection_reason( query_result: &QueryResult, - resolved_hits: &[SearchHit], + _resolved_hits: &[SearchHit], ) -> Option { if !sidecar_mode_can_serve_primary(&query_result.trace.retrieval_mode) { return Some(format!( @@ -462,21 +595,9 @@ fn sidecar_packet_batch_rejection_reason( query_result.trace.retrieval_mode )); } - if sidecar_packet_batch_unresolved_full_mode(query_result, resolved_hits) { - return Some("sidecar retrieval candidates did not resolve to indexed symbols".into()); - } None } -fn sidecar_packet_batch_unresolved_full_mode( - query_result: &QueryResult, - resolved_hits: &[SearchHit], -) -> bool { - sidecar_mode_can_serve_primary(&query_result.trace.retrieval_mode) - && !query_result.hits.is_empty() - && resolved_hits.is_empty() -} - pub(crate) fn packet_batch_should_use_sidecar(controller: &AppController) -> bool { sidecar_retrieval_primary_enabled(controller) } @@ -1051,6 +1172,16 @@ fn resolve_candidate_node_id( rel_path: &str, candidate: &CandidateHit, ) -> Option { + if let Some(node_id) = candidate + .node_id + .as_deref() + .and_then(|raw| raw.parse::().ok()) + .map(CoreNodeId) + && storage.get_node(node_id).ok().flatten().is_some() + { + return Some(node_id); + } + if let Some(line) = candidate.start_line { let mut first_nodes = Vec::new(); for lookup_path in candidate_lookup_paths(project_root, rel_path) { @@ -1115,11 +1246,22 @@ pub(crate) fn resolve_sidecar_candidates_to_search_hits( candidates: &[CandidateHit], max_results: usize, ) -> Result, ApiError> { + resolve_sidecar_candidates_with_stats(controller, candidates, max_results) + .map(|outcome| outcome.resolved_hits) +} + +fn resolve_sidecar_candidates_with_stats( + controller: &AppController, + candidates: &[CandidateHit], + max_results: usize, +) -> Result { controller.ensure_search_state()?; let storage = controller.open_storage()?; let project_root = controller.require_project_root()?; let node_names = controller.state.lock().node_names.clone(); let mut hits = Vec::new(); + let mut attempted_candidate_count = 0; + let mut unresolved_candidate_count = 0; let mut seen = HashMap::::new(); let mut ordered: Vec<&CandidateHit> = candidates .iter() @@ -1140,10 +1282,12 @@ pub(crate) fn resolve_sidecar_candidates_to_search_hits( if hits.len() >= max_results { break; } + attempted_candidate_count += 1; let rel_path = normalize_repo_relative_path(&project_root, &candidate.file_path); let Some(node_id) = resolve_candidate_node_id(&storage, &node_names, &project_root, &rel_path, candidate) else { + unresolved_candidate_count += 1; continue; }; let dedupe_key = node_id.0.to_string(); @@ -1154,18 +1298,48 @@ pub(crate) fn resolve_sidecar_candidates_to_search_hits( let Some(mut hit) = AppController::build_search_hit(&storage, &node_names, node_id, candidate.score) else { + unresolved_candidate_count += 1; continue; }; - hit.score_breakdown = Some(RetrievalScoreBreakdownDto { - lexical: candidate.score, - semantic: 0.0, - graph: 0.0, - total: candidate.score, - }); + hit.score_breakdown = Some(score_breakdown_for_candidate(candidate)); hits.push(hit); } - Ok(hits) + Ok(SidecarCandidateResolutionOutcome { + resolved_hits: hits, + attempted_candidate_count, + unresolved_candidate_count, + }) +} + +fn score_breakdown_for_candidate(candidate: &CandidateHit) -> RetrievalScoreBreakdownDto { + let provenance = candidate_provenance_labels(candidate); + let (lexical, semantic, graph) = match candidate.source { + CandidateSource::Zoekt => (candidate.score, 0.0, 0.0), + CandidateSource::Qdrant => (0.0, candidate.score, 0.0), + CandidateSource::Scip => (0.0, 0.0, candidate.score), + CandidateSource::Legacy => (candidate.score, 0.0, 0.0), + }; + RetrievalScoreBreakdownDto { + lexical, + semantic, + graph, + total: candidate.score, + provenance, + } +} + +fn candidate_provenance_labels(candidate: &CandidateHit) -> Vec { + if !candidate.provenance.is_empty() { + return candidate.provenance.clone(); + } + let label = match candidate.source { + CandidateSource::Zoekt => "lexical_source", + CandidateSource::Qdrant => "dense_anchor", + CandidateSource::Scip => "graph_neighbor", + CandidateSource::Legacy => "legacy", + }; + vec![label.to_string()] } #[cfg(test)] @@ -1174,12 +1348,9 @@ mod tests { use codestory_retrieval::{ CandidateHit, QueryTrace, RetrievalStageKind, StageTrace, classify_query, }; - use std::sync::{Mutex, MutexGuard}; - - static ENV_TEST_LOCK: Mutex<()> = Mutex::new(()); - fn env_test_lock() -> MutexGuard<'static, ()> { - ENV_TEST_LOCK.lock().expect("env test lock") + fn env_test_lock() -> std::sync::MutexGuard<'static, ()> { + crate::process_env_test_lock() } #[test] @@ -1498,17 +1669,39 @@ mod tests { } #[test] - fn recovery_commands_quote_powershell_sensitive_project_paths() { + fn packet_batch_budget_uses_packet_latency_budget() { + assert_eq!( + sidecar_packet_batch_budget_ms(None), + DEFAULT_PACKET_BATCH_BUDGET_MS + ); + assert_eq!(sidecar_packet_batch_budget_ms(Some(18_000)), 18_000); + assert_eq!(sidecar_packet_batch_budget_ms(Some(5_000)), 5_000); + assert_eq!(sidecar_packet_batch_budget_ms(Some(5)), 100); + assert_eq!( + sidecar_packet_batch_budget_ms(Some(250_000)), + MAX_PACKET_BATCH_BUDGET_MS + ); + } + + #[test] + fn recovery_commands_quote_shell_sensitive_project_paths() { let commands = sidecar_retrieval_recovery_commands(r"C:\tmp\cost$cache`tick's repo"); - assert_eq!( - commands[0], - r"codestory-cli index --project 'C:\tmp\cost$cache`tick''s repo' --refresh full" + #[cfg(windows)] + let expected_project = r"'C:/tmp/cost$cache`tick''s repo'"; + #[cfg(not(windows))] + let expected_project = r"'C:/tmp/cost$cache`tick'\''s repo'"; + + assert!( + commands + .first() + .is_some_and(|command| command.contains("retrieval bootstrap")), + "sidecar recovery should start with the sidecar bootstrap, not repeat a core index: {commands:?}" ); assert!( commands .iter() - .all(|command| command.contains(r"--project 'C:\tmp\cost$cache`tick''s repo'")), + .all(|command| command.contains(&format!("--project {expected_project}"))), "all recovery commands should quote the project path literally: {commands:?}" ); } @@ -1551,6 +1744,11 @@ mod tests { sidecar_input_hash: Some(hash.into()), sidecar_generation: Some(generation), projection_count: Some(0), + symbol_doc_count: Some(0), + dense_projection_count: Some(0), + semantic_policy_version: Some("graph_first_v1".into()), + graph_artifact_hash: Some("graph-test-hash".into()), + dense_reason_counts_json: Some("{}".into()), }) .expect("manifest"); @@ -1646,7 +1844,7 @@ mod tests { } #[test] - fn packet_batch_allows_empty_full_mode_queries_but_rejects_unresolved_candidates() { + fn packet_sidecar_query_diagnostic_distinguishes_empty_and_unresolved_candidates() { use codestory_retrieval::{CandidateSource, classify_query}; let empty_full = QueryResult { @@ -1663,11 +1861,16 @@ mod tests { stages: Vec::new(), }, }; - assert_eq!( - sidecar_packet_batch_rejection_reason(&empty_full, &[]), - None - ); - assert!(!sidecar_packet_batch_unresolved_full_mode(&empty_full, &[])); + let empty_resolution = SidecarCandidateResolutionOutcome { + resolved_hits: Vec::new(), + attempted_candidate_count: 0, + unresolved_candidate_count: 0, + }; + let empty_diagnostic = packet_sidecar_query_diagnostic(&empty_full, &empty_resolution); + assert_eq!(empty_diagnostic.candidate_count, 0); + assert_eq!(empty_diagnostic.resolved_hit_count, 0); + assert_eq!(empty_diagnostic.unresolved_candidate_count, 0); + assert!(empty_diagnostic.diagnostic.is_none()); let unresolved = QueryResult { query: "handler".into(), @@ -1688,45 +1891,127 @@ mod tests { stages: Vec::new(), }, }; - assert_eq!( - sidecar_packet_batch_rejection_reason(&unresolved, &[]), - Some("sidecar retrieval candidates did not resolve to indexed symbols".to_string()) + let unresolved_resolution = SidecarCandidateResolutionOutcome { + resolved_hits: Vec::new(), + attempted_candidate_count: 1, + unresolved_candidate_count: 1, + }; + let unresolved_diagnostic = + packet_sidecar_query_diagnostic(&unresolved, &unresolved_resolution); + assert_eq!(unresolved_diagnostic.candidate_count, 1); + assert_eq!(unresolved_diagnostic.resolved_hit_count, 0); + assert_eq!(unresolved_diagnostic.unresolved_candidate_count, 1); + assert!( + unresolved_diagnostic + .diagnostic + .as_deref() + .is_some_and(|value| value.contains("did not all resolve")) ); - assert!(sidecar_packet_batch_unresolved_full_mode(&unresolved, &[])); + } - let unresolved_scip_only = QueryResult { - query: "neutral sidecar candidate".into(), - features: classify_query("neutral sidecar candidate"), - hits: vec![CandidateHit::with_source( - "src/main.rs", - Some("src/main.rs".into()), - 0.5, - CandidateSource::Scip, - )], + #[test] + fn packet_sidecar_query_diagnostic_ignores_candidates_skipped_by_result_cap() { + use codestory_retrieval::{CandidateSource, classify_query}; + use codestory_store::{FileInfo, FileRole}; + + let temp = tempfile::tempdir().expect("tempdir"); + let storage_path = temp.path().join("cache").join("codestory.db"); + std::fs::create_dir_all(storage_path.parent().expect("storage parent")) + .expect("create storage parent"); + let source_path = temp.path().join("src").join("lib.rs"); + std::fs::create_dir_all(source_path.parent().expect("source parent")) + .expect("create source parent"); + std::fs::write(&source_path, "fn alpha() {}\n").expect("write source"); + + { + let mut storage = Store::open(&storage_path).expect("open storage"); + storage + .insert_file(&FileInfo { + id: 1, + path: source_path.clone(), + language: "rust".to_string(), + modification_time: 1, + indexed: true, + complete: true, + line_count: 1, + file_role: FileRole::Source, + }) + .expect("insert file"); + storage + .insert_nodes_batch(&[ + codestory_contracts::graph::Node { + id: CoreNodeId(1), + kind: NodeKind::FILE, + serialized_name: source_path.to_string_lossy().to_string(), + file_node_id: Some(CoreNodeId(1)), + start_line: Some(1), + ..Default::default() + }, + codestory_contracts::graph::Node { + id: CoreNodeId(2), + kind: NodeKind::FUNCTION, + serialized_name: "alpha".to_string(), + file_node_id: Some(CoreNodeId(1)), + start_line: Some(1), + ..Default::default() + }, + ]) + .expect("insert nodes"); + } + + let controller = AppController::new(); + controller + .open_project_with_storage_path(temp.path().to_path_buf(), storage_path) + .expect("open project"); + let mut resolved_candidate = CandidateHit::with_source( + "src/lib.rs", + Some("alpha".to_string()), + 1.0, + CandidateSource::Scip, + ); + resolved_candidate.node_id = Some("2".to_string()); + let query_result = QueryResult { + query: "alpha".into(), + features: classify_query("alpha"), + hits: vec![ + resolved_candidate, + CandidateHit::with_source( + "src/missing.rs", + Some("missing".to_string()), + 0.5, + CandidateSource::Scip, + ), + ], trace: QueryTrace { retrieval_mode: "full".into(), degraded_reason: None, - total_budget_ms: 100, - elapsed_ms: 120, + total_budget_ms: 500, + elapsed_ms: 1, cancel_reason: None, cache_hit: false, stages: Vec::new(), }, }; - assert_eq!( - sidecar_packet_batch_rejection_reason(&unresolved_scip_only, &[]), - Some("sidecar retrieval candidates did not resolve to indexed symbols".to_string()), - "packet batch should fail closed when SCIP-only candidates do not resolve" + + let resolution = resolve_sidecar_candidates_with_stats(&controller, &query_result.hits, 1) + .expect("resolve sidecar candidates"); + assert_eq!(resolution.attempted_candidate_count, 1); + assert_eq!(resolution.resolved_hits.len(), 1); + assert_eq!(resolution.unresolved_candidate_count, 0); + + let diagnostic = packet_sidecar_query_diagnostic(&query_result, &resolution); + assert_eq!(diagnostic.candidate_count, 1); + assert_eq!(diagnostic.resolved_hit_count, 1); + assert_eq!(diagnostic.unresolved_candidate_count, 0); + assert!( + diagnostic.diagnostic.is_none(), + "capped-away candidates should not create unresolved diagnostics: {diagnostic:?}" ); - assert!(sidecar_packet_batch_unresolved_full_mode( - &unresolved_scip_only, - &[] - )); } #[test] fn packet_batch_rejects_unavailable_sidecar_mode() { - use codestory_retrieval::classify_query; + use codestory_retrieval::{CandidateSource, classify_query}; let unavailable = QueryResult { query: "handler".into(), @@ -1746,6 +2031,245 @@ mod tests { sidecar_packet_batch_rejection_reason(&unavailable, &[]).as_deref(), Some("sidecar retrieval mode `no_semantic` is not eligible for packet batch results") ); + + let unresolved = QueryResult { + query: "handler".into(), + features: classify_query("handler"), + hits: vec![CandidateHit::with_source( + "semantic:handler", + Some("handler".into()), + 0.5, + CandidateSource::Qdrant, + )], + trace: QueryTrace { + retrieval_mode: "full".into(), + degraded_reason: None, + total_budget_ms: 500, + elapsed_ms: 1, + cancel_reason: None, + cache_hit: false, + stages: Vec::new(), + }, + }; + assert_eq!( + sidecar_packet_batch_rejection_reason(&unresolved, &[]).as_deref(), + None, + "packet subqueries should report unresolved full-mode candidates as diagnostics instead of aborting the whole packet" + ); + } + + #[test] + fn packet_batch_reports_unresolved_full_mode_candidates_without_rejecting() { + use codestory_retrieval::CandidateSource; + + let temp = tempfile::tempdir().expect("tempdir"); + let storage_path = temp.path().join("cache").join("codestory.db"); + std::fs::create_dir_all(storage_path.parent().expect("storage parent")) + .expect("create storage parent"); + let controller = AppController::new(); + controller + .open_project_with_storage_path(temp.path().to_path_buf(), storage_path) + .expect("open project"); + + let queries = vec![("helpers".to_string(), 5)]; + let outcome = search_sidecar_packet_batch_inner_with_query( + &controller, + &queries, + Some(500), + |_, _, _| { + Ok(QueryResult { + query: "helpers".into(), + features: classify_query("helpers"), + hits: vec![CandidateHit::with_source( + "docs/helpers.md", + Some("helpers".into()), + 0.5, + CandidateSource::Scip, + )], + trace: QueryTrace { + retrieval_mode: "full".into(), + degraded_reason: None, + total_budget_ms: 500, + elapsed_ms: 1, + cancel_reason: None, + cache_hit: false, + stages: Vec::new(), + }, + }) + }, + ) + .expect("full-mode unresolved candidates should not reject packet batch"); + + assert_eq!(outcome.results.len(), 1); + assert_eq!(outcome.results[0].0, "helpers"); + assert!( + outcome.results[0].1.is_empty(), + "unresolved packet query should contribute no resolved hits" + ); + assert_eq!(outcome.diagnostics.len(), 1); + let diagnostic = &outcome.diagnostics[0]; + assert_eq!(diagnostic.query, "helpers"); + assert_eq!(diagnostic.retrieval_mode, "full"); + assert_eq!(diagnostic.candidate_count, 1); + assert_eq!(diagnostic.resolved_hit_count, 0); + assert_eq!(diagnostic.unresolved_candidate_count, 1); + assert!( + diagnostic + .diagnostic + .as_deref() + .is_some_and(|value| value.contains("did not all resolve")), + "diagnostic should preserve unresolved sidecar detail: {diagnostic:?}" + ); + } + + #[test] + fn packet_batch_divides_request_budget_across_queries() { + use codestory_retrieval::classify_query; + + let temp = tempfile::tempdir().expect("tempdir"); + let storage_path = temp.path().join("cache").join("codestory.db"); + std::fs::create_dir_all(storage_path.parent().expect("storage parent")) + .expect("create storage parent"); + let controller = AppController::new(); + controller + .open_project_with_storage_path(temp.path().to_path_buf(), storage_path) + .expect("open project"); + + let queries = vec![ + ("entrypoint".to_string(), 5), + ("file discovery".to_string(), 5), + ("symbol extraction".to_string(), 5), + ("search projection".to_string(), 5), + ]; + let mut observed_budgets = Vec::new(); + let outcome = search_sidecar_packet_batch_inner_with_query( + &controller, + &queries, + Some(18_000), + |_, query, budget| { + observed_budgets.push(budget); + Ok(QueryResult { + query: query.to_string(), + features: classify_query(query), + hits: Vec::new(), + trace: QueryTrace { + retrieval_mode: "full".into(), + degraded_reason: None, + total_budget_ms: u64::from(budget.unwrap_or_default()), + elapsed_ms: 1, + cancel_reason: None, + cache_hit: false, + stages: Vec::new(), + }, + }) + }, + ) + .expect("empty full-mode packet query results should not reject"); + + assert_eq!(outcome.results.len(), queries.len()); + assert_eq!( + observed_budgets, + vec![Some(4_500), Some(4_500), Some(4_500), Some(4_500)] + ); + } + + #[test] + fn packet_batch_rejects_candidate_resolution_errors() { + use codestory_retrieval::CandidateSource; + + let temp = tempfile::tempdir().expect("tempdir"); + let storage_path = temp.path().join("cache").join("codestory.db"); + let controller = AppController::new(); + controller + .open_project_with_storage_path(temp.path().to_path_buf(), storage_path.clone()) + .expect("open project"); + std::fs::remove_dir_all(storage_path.parent().expect("storage parent")) + .expect("remove storage parent"); + + let queries = vec![("handler".to_string(), 5)]; + let result = search_sidecar_packet_batch_inner_with_query( + &controller, + &queries, + Some(500), + |_, _, _| { + Ok(QueryResult { + query: "handler".into(), + features: classify_query("handler"), + hits: vec![CandidateHit::with_source( + "src/lib.rs", + Some("handler".into()), + 0.5, + CandidateSource::Scip, + )], + trace: QueryTrace { + retrieval_mode: "full".into(), + degraded_reason: None, + total_budget_ms: 500, + elapsed_ms: 1, + cancel_reason: None, + cache_hit: false, + stages: Vec::new(), + }, + }) + }, + ); + + let error = match result { + Ok(_) => panic!("packet batch must reject candidate resolution errors"), + Err(error) => error, + }; + assert_eq!(error.code, "retrieval_unavailable"); + assert!( + error.message.contains("sidecar retrieval rejected") + || error.message.contains("candidate resolution failed"), + "error should preserve candidate resolution failure: {}", + error.message + ); + } + + #[test] + fn sidecar_primary_search_reports_candidate_resolution_errors() { + use codestory_retrieval::CandidateSource; + + let temp = tempfile::tempdir().expect("tempdir"); + let storage_path = temp.path().join("cache").join("codestory.db"); + let controller = AppController::new(); + controller + .open_project_with_storage_path(temp.path().to_path_buf(), storage_path.clone()) + .expect("open project"); + std::fs::remove_dir_all(storage_path.parent().expect("storage parent")) + .expect("remove storage parent"); + + let query_result = QueryResult { + query: "handler".into(), + features: classify_query("handler"), + hits: vec![CandidateHit::with_source( + "src/lib.rs", + Some("handler".into()), + 0.5, + CandidateSource::Scip, + )], + trace: QueryTrace { + retrieval_mode: "full".into(), + degraded_reason: None, + total_budget_ms: 500, + elapsed_ms: 1, + cancel_reason: None, + cache_hit: false, + stages: Vec::new(), + }, + }; + + let outcome = + sidecar_primary_search_outcome_from_query_result(&controller, query_result, 5); + + match outcome { + SidecarPrimarySearchOutcome::Unavailable { reason } => assert!( + reason.contains("candidate resolution failed"), + "reason should preserve candidate resolution failure: {reason}" + ), + _ => panic!("candidate resolution errors must make primary search unavailable"), + } } #[test] diff --git a/crates/codestory-runtime/src/agent/trace.rs b/crates/codestory-runtime/src/agent/trace.rs index b3e842fb..625bf5f7 100644 --- a/crates/codestory-runtime/src/agent/trace.rs +++ b/crates/codestory-runtime/src/agent/trace.rs @@ -128,6 +128,7 @@ impl TraceRecorder { semantic_fallbacks: Vec::new(), annotations: self.annotations, steps: self.steps, + packet_sidecar_diagnostics: Vec::new(), retrieval_shadow: self.retrieval_shadow, } } diff --git a/crates/codestory-runtime/src/agent/trace_export.rs b/crates/codestory-runtime/src/agent/trace_export.rs index 498146a2..29d64f60 100644 --- a/crates/codestory-runtime/src/agent/trace_export.rs +++ b/crates/codestory-runtime/src/agent/trace_export.rs @@ -3,6 +3,7 @@ use codestory_contracts::api::{ AgentAnswerDto, AgentRetrievalStepDto, AgentRetrievalStepKindDto, AgentRetrievalStepStatusDto, + PacketRetrievalTraceSummaryDto, }; use serde_json::{Value, json}; @@ -80,6 +81,66 @@ pub fn packet_step_trace_json(answer: &AgentAnswerDto) -> Value { payload } +pub(crate) fn packet_retrieval_trace_summary( + answer: &AgentAnswerDto, +) -> PacketRetrievalTraceSummaryDto { + let mut source_read_steps = 0; + let mut search_steps = 0; + let mut trail_steps = 0; + for step in &answer.retrieval_trace.steps { + match step.kind { + AgentRetrievalStepKindDto::SourceRead => source_read_steps += 1, + AgentRetrievalStepKindDto::Search + | AgentRetrievalStepKindDto::SemanticQueryEmbedding + | AgentRetrievalStepKindDto::SemanticCandidateRetrieval + | AgentRetrievalStepKindDto::HybridRerank + | AgentRetrievalStepKindDto::QueryExpansion => search_steps += 1, + AgentRetrievalStepKindDto::Trail + | AgentRetrievalStepKindDto::Neighborhood + | AgentRetrievalStepKindDto::TrailFilterOptions => trail_steps += 1, + AgentRetrievalStepKindDto::NodeDetails + | AgentRetrievalStepKindDto::NodeOccurrences + | AgentRetrievalStepKindDto::EdgeOccurrences + | AgentRetrievalStepKindDto::RepoTextFallback + | AgentRetrievalStepKindDto::MermaidSynthesis + | AgentRetrievalStepKindDto::AnswerSynthesis => {} + } + } + + let mut trace_summary = answer.retrieval_trace.clone(); + // The full step trace already lives under answer.retrieval_trace. Keep the + // retrieval trace summary scalar-sized so compact packets do not serialize it twice. + trace_summary.annotations.clear(); + trace_summary.steps.clear(); + + PacketRetrievalTraceSummaryDto { + retrieval_trace: trace_summary, + source_read_steps, + search_steps, + trail_steps, + } +} + +pub(crate) fn write_packet_step_trace_from_env(answer: &AgentAnswerDto) -> Option { + let trace_path = std::env::var("CODESTORY_PACKET_STEP_TRACE_OUT").ok()?; + let payload = match serde_json::to_string_pretty(&packet_step_trace_json(answer)) { + Ok(payload) => payload, + Err(error) => { + return Some(format!( + "packet_step_trace_out error=serialize path={} message={error}", + trace_path + )); + } + }; + match std::fs::write(&trace_path, payload) { + Ok(()) => None, + Err(error) => Some(format!( + "packet_step_trace_out error=write path={} message={error}", + trace_path + )), + } +} + fn attributable_step_rows(rows: &[PacketStepTraceRow]) -> Vec<&PacketStepTraceRow> { rows.iter() .filter(|row| row.status != format!("{:?}", AgentRetrievalStepStatusDto::Skipped)) @@ -147,6 +208,7 @@ mod tests { semantic_fallback_count: 0, semantic_fallbacks: Vec::new(), steps, + packet_sidecar_diagnostics: Vec::new(), annotations: Vec::new(), retrieval_shadow: None, }, @@ -244,6 +306,37 @@ mod tests { assert_eq!(json["retrieval_shadow"]["would_rank"][0], "src/main.rs"); } + #[test] + fn env_step_trace_write_error_is_reported() { + let _lock = crate::process_env_test_lock(); + let missing_parent = std::env::temp_dir().join(format!( + "codestory-missing-trace-parent-{}", + std::process::id() + )); + let trace_path = missing_parent.join("trace.json"); + // SAFETY: this test holds the process env lock and restores the variable below. + unsafe { + std::env::set_var("CODESTORY_PACKET_STEP_TRACE_OUT", &trace_path); + } + + let answer = sample_answer(Vec::new()); + let diagnostic = write_packet_step_trace_from_env(&answer) + .expect("missing parent should produce a write diagnostic"); + assert!( + diagnostic.starts_with("packet_step_trace_out error=write "), + "diagnostic should report the write error: {diagnostic}" + ); + assert!( + diagnostic.contains(trace_path.to_string_lossy().as_ref()), + "diagnostic should include the configured trace path: {diagnostic}" + ); + + // SAFETY: this test holds the process env lock. + unsafe { + std::env::remove_var("CODESTORY_PACKET_STEP_TRACE_OUT"); + } + } + #[test] fn search_step_total_ms_excludes_skipped_search_steps() { let answer = sample_answer(vec![ diff --git a/crates/codestory-runtime/src/graph_analysis.rs b/crates/codestory-runtime/src/graph_analysis.rs index 3a3953e1..20a5a9fc 100644 --- a/crates/codestory-runtime/src/graph_analysis.rs +++ b/crates/codestory-runtime/src/graph_analysis.rs @@ -1,5 +1,7 @@ use anyhow::{Context, Result}; -use codestory_contracts::api::{EdgeId as ApiEdgeId, NodeId as ApiNodeId}; +use codestory_contracts::api::{ + EdgeId as ApiEdgeId, IndexFreshnessDto, NodeId as ApiNodeId, ReadinessVerdictDto, +}; use codestory_contracts::graph::{Edge, EdgeKind, Node, NodeId, NodeKind}; use codestory_store::Store; use serde::Serialize; @@ -35,6 +37,26 @@ pub struct ReportGenerationMetadata { pub storage_path: String, pub generated_at_epoch_ms: u128, pub note: String, + #[serde(default, skip_serializing_if = "Option::is_none")] + pub handoff: Option, +} + +#[derive(Debug, Clone, Serialize)] +pub struct RepoReportHandoff { + pub readiness: Vec, + #[serde(skip_serializing_if = "Option::is_none")] + pub freshness: Option, + #[serde(skip_serializing_if = "Option::is_none")] + pub sidecar_retrieval_mode: Option, + #[serde(skip_serializing_if = "Option::is_none")] + pub degraded_reason: Option, + pub trust_caveat: String, + #[serde(skip_serializing_if = "Option::is_none")] + pub top_entry_point: Option, + #[serde(skip_serializing_if = "Option::is_none")] + pub top_risk: Option, + #[serde(skip_serializing_if = "Option::is_none")] + pub next_command: Option, } #[derive(Debug, Clone, Serialize)] @@ -252,6 +274,7 @@ fn build_report_from_source( storage_path: storage_path.to_string_lossy().to_string(), generated_at_epoch_ms: generated_at_epoch_ms(), note: "Report/export artifacts are generated from the current SQLite store and are not source-of-truth state.".to_string(), + handoff: None, }, summary, hotspots, diff --git a/crates/codestory-runtime/src/grounding.rs b/crates/codestory-runtime/src/grounding.rs index cf803b8e..475b76dd 100644 --- a/crates/codestory-runtime/src/grounding.rs +++ b/crates/codestory-runtime/src/grounding.rs @@ -571,6 +571,7 @@ fn search_hit_from_grounding_recommendation(candidate: &RecommendationCandidate< semantic: 0.0, graph: 0.55, total: 1.0, + provenance: Vec::new(), }), } } @@ -619,6 +620,7 @@ impl AppController { edge_count: clamp_i64_to_u32(stats.edge_count), file_count: clamp_i64_to_u32(derived_file_count), error_count: clamp_i64_to_u32(stats.error_count), + fatal_error_count: clamp_i64_to_u32(stats.fatal_error_count), }; let mut file_coverages = Vec::with_capacity(file_summaries.len()); diff --git a/crates/codestory-runtime/src/lib.rs b/crates/codestory-runtime/src/lib.rs index fd6f01f4..de48f0ee 100644 --- a/crates/codestory-runtime/src/lib.rs +++ b/crates/codestory-runtime/src/lib.rs @@ -28,12 +28,17 @@ use codestory_contracts::api::{ WorkspaceMemberIndexDto, WriteFileResponse, WriteFileTextRequest, }; use codestory_contracts::events::{Event, EventBus}; -use codestory_contracts::graph::{Edge as GraphEdge, Node as GraphNode}; +use codestory_contracts::graph::{AccessKind, Edge as GraphEdge, Node as GraphNode}; +use codestory_contracts::language_support::{ + LanguageSupportProfile, language_support_profile_for_ext, + language_support_profile_for_language_name, +}; use codestory_indexer::IncrementalIndexingStats; use codestory_indexer::WorkspaceIndexer as V2WorkspaceIndexer; use codestory_store::{ FileInfo, GroundingEdgeKindCount, GroundingNodeRecord, LlmSymbolDoc, LlmSymbolDocReuseMetadata, - LlmSymbolDocStats, SearchSymbolProjection, SnapshotStore, Store, SymbolSummaryRecord, + LlmSymbolDocStats, SearchSymbolProjection, SnapshotStore, Store, SymbolSearchDoc, + SymbolSummaryRecord, }; use codestory_workspace::{ IndexedFileRecord, RefreshExecutionPlan, RefreshInputs, Workspace, WorkspaceInventory, @@ -73,6 +78,14 @@ pub use codestory_contracts as contracts; pub(crate) use mermaid::{fallback_mermaid, mermaid_flowchart, mermaid_gantt, mermaid_sequence}; pub use query_language::{GraphQueryParseError, parse_graph_query}; pub(crate) use search_runtime::SearchEngine; + +#[cfg(test)] +static PROCESS_ENV_TEST_LOCK: std::sync::Mutex<()> = std::sync::Mutex::new(()); + +#[cfg(test)] +pub(crate) fn process_env_test_lock() -> std::sync::MutexGuard<'static, ()> { + PROCESS_ENV_TEST_LOCK.lock().expect("process env test lock") +} pub use search_runtime::*; use semantic_doc_text::{ runtime_concept_phrases, semantic_doc_language_from_path, semantic_path_aliases, @@ -83,10 +96,11 @@ pub use services::{ TrailService, }; pub(crate) use support::{ - FocusedSourceContext, HYBRID_RETRIEVAL_ENABLED_ENV, aggregate_symbol_matches, clamp_i64_to_u32, - clamp_u64_to_u32, clamp_u128_to_u32, clamp_usize_to_u32, extract_symbol_search_terms, - file_text_match_line, hybrid_retrieval_enabled, looks_like_repo_text_query, node_display_name, - preferred_occurrence, query_has_symbol_or_literal_signal, read_searchable_file_contents, + FocusedSourceContext, HYBRID_RETRIEVAL_ENABLED_ENV, SEMANTIC_FILE_TEXT_CACHE_MAX_BYTES, + SEMANTIC_FILE_TEXT_MAX_BYTES, aggregate_symbol_matches, clamp_i64_to_u32, clamp_u64_to_u32, + clamp_u128_to_u32, clamp_usize_to_u32, extract_symbol_search_terms, file_text_match_line, + hybrid_retrieval_enabled, looks_like_repo_text_query, node_display_name, preferred_occurrence, + query_has_symbol_or_literal_signal, read_file_text_limited, read_searchable_file_contents, should_expand_symbol_query, }; #[cfg(test)] @@ -352,7 +366,7 @@ struct FrameworkRouteCoverageEntry { framework: &'static str, language: &'static str, status: &'static str, - fixture_status: &'static str, + coverage_evidence: &'static str, confidence_floor: &'static str, handler_link_support: &'static str, unsupported_patterns: &'static [&'static str], @@ -365,7 +379,7 @@ const FRAMEWORK_ROUTE_COVERAGE_ENTRIES: &[FrameworkRouteCoverageEntry] = &[ framework: "express", language: "javascript/typescript", status: "partial", - fixture_status: "covered_by_indexer_unit_fixture", + coverage_evidence: "validated_by_indexer_regression", confidence_floor: "heuristic", handler_link_support: "probable_when_handler_name_resolves", unsupported_patterns: &[ @@ -379,7 +393,7 @@ const FRAMEWORK_ROUTE_COVERAGE_ENTRIES: &[FrameworkRouteCoverageEntry] = &[ framework: "react-router", language: "javascript/typescript", status: "partial", - fixture_status: "covered_by_indexer_unit_fixture", + coverage_evidence: "validated_by_indexer_regression", confidence_floor: "heuristic", handler_link_support: "not_claimed", unsupported_patterns: &[ @@ -392,7 +406,7 @@ const FRAMEWORK_ROUTE_COVERAGE_ENTRIES: &[FrameworkRouteCoverageEntry] = &[ framework: "sveltekit", language: "svelte/javascript/typescript", status: "partial", - fixture_status: "covered_by_indexer_unit_fixture", + coverage_evidence: "validated_by_indexer_regression", confidence_floor: "file_convention", handler_link_support: "probable_for_server_method_exports", unsupported_patterns: &[ @@ -405,7 +419,7 @@ const FRAMEWORK_ROUTE_COVERAGE_ENTRIES: &[FrameworkRouteCoverageEntry] = &[ framework: "nextjs", language: "javascript/typescript", status: "partial", - fixture_status: "covered_by_indexer_unit_fixture", + coverage_evidence: "validated_by_indexer_regression", confidence_floor: "file_convention", handler_link_support: "probable_for_route_method_exports", unsupported_patterns: &["middleware rewrites and route groups require source review"], @@ -416,7 +430,7 @@ const FRAMEWORK_ROUTE_COVERAGE_ENTRIES: &[FrameworkRouteCoverageEntry] = &[ framework: "remix", language: "javascript/typescript", status: "partial", - fixture_status: "covered_by_indexer_unit_fixture", + coverage_evidence: "validated_by_indexer_regression", confidence_floor: "file_convention", handler_link_support: "probable_for_loader_action_exports", unsupported_patterns: &["route config composition and resource routes are partial"], @@ -427,7 +441,7 @@ const FRAMEWORK_ROUTE_COVERAGE_ENTRIES: &[FrameworkRouteCoverageEntry] = &[ framework: "astro", language: "astro/javascript/typescript", status: "partial", - fixture_status: "covered_by_indexer_unit_fixture", + coverage_evidence: "validated_by_indexer_regression", confidence_floor: "file_convention", handler_link_support: "probable_for_endpoint_method_exports", unsupported_patterns: &["redirects and integration-generated routes are not expanded"], @@ -438,7 +452,7 @@ const FRAMEWORK_ROUTE_COVERAGE_ENTRIES: &[FrameworkRouteCoverageEntry] = &[ framework: "nuxt", language: "vue/javascript/typescript", status: "partial", - fixture_status: "covered_by_indexer_unit_fixture", + coverage_evidence: "validated_by_indexer_regression", confidence_floor: "file_convention", handler_link_support: "probable_for_server_handlers", unsupported_patterns: &["route middleware and generated module routes are partial"], @@ -449,7 +463,7 @@ const FRAMEWORK_ROUTE_COVERAGE_ENTRIES: &[FrameworkRouteCoverageEntry] = &[ framework: "fastify", language: "javascript/typescript", status: "partial", - fixture_status: "covered_by_indexer_unit_fixture", + coverage_evidence: "validated_by_indexer_regression", confidence_floor: "heuristic", handler_link_support: "probable_when_handler_name_resolves", unsupported_patterns: &["plugin prefixes and schema-only route declarations are partial"], @@ -460,7 +474,7 @@ const FRAMEWORK_ROUTE_COVERAGE_ENTRIES: &[FrameworkRouteCoverageEntry] = &[ framework: "koa", language: "javascript/typescript", status: "partial", - fixture_status: "covered_by_indexer_unit_fixture", + coverage_evidence: "validated_by_indexer_regression", confidence_floor: "heuristic", handler_link_support: "probable_when_handler_name_resolves", unsupported_patterns: &["router prefixes and middleware arrays are partial"], @@ -471,7 +485,7 @@ const FRAMEWORK_ROUTE_COVERAGE_ENTRIES: &[FrameworkRouteCoverageEntry] = &[ framework: "hono", language: "javascript/typescript", status: "partial", - fixture_status: "covered_by_indexer_unit_fixture", + coverage_evidence: "validated_by_indexer_regression", confidence_floor: "heuristic", handler_link_support: "probable_when_handler_name_resolves", unsupported_patterns: &["basePath/grouped routes are partial"], @@ -482,7 +496,7 @@ const FRAMEWORK_ROUTE_COVERAGE_ENTRIES: &[FrameworkRouteCoverageEntry] = &[ framework: "nestjs", language: "typescript", status: "partial", - fixture_status: "covered_by_indexer_unit_fixture", + coverage_evidence: "validated_by_indexer_regression", confidence_floor: "decorator", handler_link_support: "probable_for_controller_method", unsupported_patterns: &["global prefixes and dynamic decorator expressions are partial"], @@ -493,7 +507,7 @@ const FRAMEWORK_ROUTE_COVERAGE_ENTRIES: &[FrameworkRouteCoverageEntry] = &[ framework: "django", language: "python", status: "partial", - fixture_status: "covered_by_indexer_unit_fixture", + coverage_evidence: "validated_by_indexer_regression", confidence_floor: "heuristic", handler_link_support: "probable_when_handler_name_resolves", unsupported_patterns: &["include() trees and namespaced URLConfs are not fully expanded"], @@ -504,7 +518,7 @@ const FRAMEWORK_ROUTE_COVERAGE_ENTRIES: &[FrameworkRouteCoverageEntry] = &[ framework: "flask", language: "python", status: "partial", - fixture_status: "covered_by_indexer_unit_fixture", + coverage_evidence: "validated_by_indexer_regression", confidence_floor: "decorator", handler_link_support: "not_claimed", unsupported_patterns: &["blueprint prefixes and dynamic method declarations are partial"], @@ -515,7 +529,7 @@ const FRAMEWORK_ROUTE_COVERAGE_ENTRIES: &[FrameworkRouteCoverageEntry] = &[ framework: "fastapi", language: "python", status: "partial", - fixture_status: "covered_by_indexer_unit_fixture", + coverage_evidence: "validated_by_indexer_regression", confidence_floor: "decorator", handler_link_support: "not_claimed", unsupported_patterns: &["router prefixes and dependency-driven routing are partial"], @@ -526,7 +540,7 @@ const FRAMEWORK_ROUTE_COVERAGE_ENTRIES: &[FrameworkRouteCoverageEntry] = &[ framework: "rails", language: "ruby", status: "partial", - fixture_status: "covered_by_indexer_unit_fixture", + coverage_evidence: "validated_by_indexer_regression", confidence_floor: "heuristic", handler_link_support: "not_claimed", unsupported_patterns: &["resource expansion is not fully enumerated"], @@ -537,7 +551,7 @@ const FRAMEWORK_ROUTE_COVERAGE_ENTRIES: &[FrameworkRouteCoverageEntry] = &[ framework: "laravel", language: "php", status: "partial", - fixture_status: "covered_by_indexer_unit_fixture", + coverage_evidence: "validated_by_indexer_regression", confidence_floor: "heuristic", handler_link_support: "not_claimed", unsupported_patterns: &["controller arrays and route groups are partial"], @@ -548,7 +562,7 @@ const FRAMEWORK_ROUTE_COVERAGE_ENTRIES: &[FrameworkRouteCoverageEntry] = &[ framework: "spring", language: "java", status: "partial", - fixture_status: "covered_by_indexer_unit_fixture", + coverage_evidence: "validated_by_indexer_regression", confidence_floor: "annotation", handler_link_support: "not_claimed", unsupported_patterns: &["class-level prefixes are not fully combined in every case"], @@ -559,7 +573,7 @@ const FRAMEWORK_ROUTE_COVERAGE_ENTRIES: &[FrameworkRouteCoverageEntry] = &[ framework: "aspnet", language: "csharp", status: "partial", - fixture_status: "covered_by_indexer_unit_fixture", + coverage_evidence: "validated_by_indexer_regression", confidence_floor: "attribute", handler_link_support: "not_claimed", unsupported_patterns: &["controller-level route templates are partial"], @@ -570,7 +584,7 @@ const FRAMEWORK_ROUTE_COVERAGE_ENTRIES: &[FrameworkRouteCoverageEntry] = &[ framework: "axum", language: "rust", status: "partial", - fixture_status: "covered_by_indexer_unit_fixture", + coverage_evidence: "validated_by_indexer_regression", confidence_floor: "heuristic", handler_link_support: "probable_when_handler_name_resolves", unsupported_patterns: &["nested routers and stateful route composition are partial"], @@ -581,7 +595,7 @@ const FRAMEWORK_ROUTE_COVERAGE_ENTRIES: &[FrameworkRouteCoverageEntry] = &[ framework: "actix", language: "rust", status: "partial", - fixture_status: "covered_by_indexer_unit_fixture", + coverage_evidence: "validated_by_indexer_regression", confidence_floor: "heuristic", handler_link_support: "probable_when_handler_name_resolves", unsupported_patterns: &["scoped services and macros are partial"], @@ -592,7 +606,7 @@ const FRAMEWORK_ROUTE_COVERAGE_ENTRIES: &[FrameworkRouteCoverageEntry] = &[ framework: "rocket", language: "rust", status: "partial", - fixture_status: "covered_by_indexer_unit_fixture", + coverage_evidence: "validated_by_indexer_regression", confidence_floor: "attribute", handler_link_support: "not_claimed", unsupported_patterns: &["mount prefixes are not fully combined"], @@ -603,7 +617,7 @@ const FRAMEWORK_ROUTE_COVERAGE_ENTRIES: &[FrameworkRouteCoverageEntry] = &[ framework: "gin", language: "go", status: "partial", - fixture_status: "covered_by_indexer_unit_fixture", + coverage_evidence: "validated_by_indexer_regression", confidence_floor: "heuristic", handler_link_support: "not_claimed_text_only", unsupported_patterns: &["router groups and middleware chains are partial"], @@ -614,7 +628,7 @@ const FRAMEWORK_ROUTE_COVERAGE_ENTRIES: &[FrameworkRouteCoverageEntry] = &[ framework: "chi", language: "go", status: "partial", - fixture_status: "covered_by_indexer_unit_fixture", + coverage_evidence: "validated_by_indexer_regression", confidence_floor: "heuristic", handler_link_support: "not_claimed_text_only", unsupported_patterns: &["route groups and mounted subrouters are partial"], @@ -625,7 +639,7 @@ const FRAMEWORK_ROUTE_COVERAGE_ENTRIES: &[FrameworkRouteCoverageEntry] = &[ framework: "echo", language: "go", status: "partial", - fixture_status: "covered_by_indexer_unit_fixture", + coverage_evidence: "validated_by_indexer_regression", confidence_floor: "heuristic", handler_link_support: "not_claimed_text_only", unsupported_patterns: &["group prefixes are partial"], @@ -636,7 +650,7 @@ const FRAMEWORK_ROUTE_COVERAGE_ENTRIES: &[FrameworkRouteCoverageEntry] = &[ framework: "fiber", language: "go", status: "partial", - fixture_status: "covered_by_indexer_unit_fixture", + coverage_evidence: "validated_by_indexer_regression", confidence_floor: "heuristic", handler_link_support: "not_claimed_text_only", unsupported_patterns: &["group prefixes and mounted apps are partial"], @@ -647,7 +661,7 @@ const FRAMEWORK_ROUTE_COVERAGE_ENTRIES: &[FrameworkRouteCoverageEntry] = &[ framework: "vue-router", language: "vue", status: "partial", - fixture_status: "covered_by_indexer_unit_fixture", + coverage_evidence: "validated_by_indexer_regression", confidence_floor: "heuristic", handler_link_support: "not_claimed", unsupported_patterns: &["imported route arrays and generated routes are partial"], @@ -668,7 +682,7 @@ fn framework_route_coverage_dto(entry: &FrameworkRouteCoverageEntry) -> Framewor framework: entry.framework.to_string(), language: entry.language.to_string(), status: entry.status.to_string(), - fixture_status: entry.fixture_status.to_string(), + coverage_evidence: entry.coverage_evidence.to_string(), confidence_floor: entry.confidence_floor.to_string(), handler_link_support: entry.handler_link_support.to_string(), unsupported_patterns: entry @@ -685,6 +699,26 @@ fn framework_route_coverage_dto(entry: &FrameworkRouteCoverageEntry) -> Framewor } } +struct LanguageSupportSummary { + support_mode: String, + evidence_tier: String, + claim_label: String, +} + +fn language_support_summary_for_language(language: &str) -> LanguageSupportSummary { + language_support_profile_for_language_name(language) + .map(|profile| LanguageSupportSummary { + support_mode: profile.support_mode.as_str().to_string(), + evidence_tier: profile.evidence_tier.as_str().to_string(), + claim_label: profile.claim_label.to_string(), + }) + .unwrap_or_else(|| LanguageSupportSummary { + support_mode: "unknown".to_string(), + evidence_tier: "unknown".to_string(), + claim_label: "no support claim recorded".to_string(), + }) +} + const REPO_TEXT_SCAN_FILE_CAP: usize = 2_000; const REPO_TEXT_SCAN_BYTE_CAP: usize = 32 * 1024 * 1024; const REPO_TEXT_SCAN_TIME_CAP_MS: u128 = 500; @@ -885,43 +919,70 @@ fn search_hit_name_matches(display_name: &str, requested: &str) -> bool { } fn language_filter_matches_path(requested: &str, path: &str) -> bool { - let requested_lower = requested.trim().to_ascii_lowercase(); - let requested_normalized = normalize_filter_token(&requested_lower); + let requested_lower = requested + .trim() + .trim_start_matches('.') + .to_ascii_lowercase(); + if requested_lower.is_empty() { + return false; + } let extension = Path::new(path) .extension() .and_then(|value| value.to_str()) .unwrap_or_default() .to_ascii_lowercase(); + if extension.is_empty() { + return false; + } + + if let Some(language_name) = language_family_alias(&requested_lower) { + return language_profile_matches_extension_name(language_name, &extension); + } + + if let Some(profile) = language_support_profile_for_language_name(&requested_lower) { + return language_profile_matches_extension(profile, &extension); + } + + if language_support_profile_for_ext(&requested_lower).is_some() { + return requested_lower == extension; + } + let requested_normalized = normalize_filter_token(&requested_lower); match requested_normalized.as_str() { - "rs" | "rust" => extension == "rs", - "ts" | "typescript" => matches!(extension.as_str(), "ts" | "tsx"), - "tsx" => extension == "tsx", - "js" | "javascript" => matches!(extension.as_str(), "js" | "jsx" | "mjs" | "cjs"), - "jsx" => extension == "jsx", - "py" | "python" => extension == "py", - "java" => extension == "java", - "cpp" | "cxx" | "cc" | "hpp" | "hxx" | "cplusplus" if requested_lower != "c" => { - matches!(extension.as_str(), "cpp" | "cxx" | "cc" | "hpp" | "hxx") - } - _ if requested_lower == "c++" => { - matches!(extension.as_str(), "cpp" | "cxx" | "cc" | "hpp" | "hxx") - } - _ if requested_lower == "c#" => extension == "cs", - "c" => matches!(extension.as_str(), "c" | "h"), - "cs" | "csharp" => extension == "cs", - "go" => extension == "go", - "svelte" => extension == "svelte", - "json" => extension == "json", - "md" | "markdown" => matches!(extension.as_str(), "md" | "mdx"), - "rb" | "ruby" => extension == "rb", - "php" => extension == "php", - "kt" | "kotlin" => matches!(extension.as_str(), "kt" | "kts"), - "swift" => extension == "swift", + "markdown" => matches!(extension.as_str(), "md" | "mdx"), _ => requested_normalized == normalize_filter_token(&extension), } } +fn indexed_file_matches_language_filter( + stored_language: &str, + path: &Path, + requested: &str, +) -> bool { + stored_language.eq_ignore_ascii_case(requested) + || language_filter_matches_path(requested, &path.to_string_lossy()) +} + +fn language_family_alias(requested: &str) -> Option<&'static str> { + match requested { + "ts" => Some("typescript"), + "js" => Some("javascript"), + "kt" => Some("kotlin"), + "c++" | "cplusplus" => Some("cpp"), + "c#" | "cs" => Some("csharp"), + _ => None, + } +} + +fn language_profile_matches_extension(profile: &LanguageSupportProfile, extension: &str) -> bool { + profile.extensions.contains(&extension) +} + +fn language_profile_matches_extension_name(language_name: &str, extension: &str) -> bool { + language_support_profile_for_language_name(language_name) + .is_some_and(|profile| language_profile_matches_extension(profile, extension)) +} + fn normalize_filter_token(value: &str) -> String { value .chars() @@ -3190,6 +3251,14 @@ struct SemanticProjectionStats { docs_embedded: u32, docs_pending: u32, docs_stale: u32, + symbol_search_docs_written: u32, + dense_docs_skipped: u32, + dense_public_api: u32, + dense_entrypoint: u32, + dense_documented_nontrivial: u32, + dense_central_graph_node: u32, + dense_component_report: u32, + dense_unstructured_doc: u32, } #[derive(Debug, Clone, Copy, Default, PartialEq, Eq)] @@ -3228,6 +3297,14 @@ fn apply_semantic_projection_stats( timings.semantic_docs_embedded = Some(stats.docs_embedded); timings.semantic_docs_pending = Some(stats.docs_pending); timings.semantic_docs_stale = Some(stats.docs_stale); + timings.symbol_search_docs_written = Some(stats.symbol_search_docs_written); + timings.semantic_dense_docs_skipped = Some(stats.dense_docs_skipped); + timings.semantic_dense_public_api = Some(stats.dense_public_api); + timings.semantic_dense_entrypoint = Some(stats.dense_entrypoint); + timings.semantic_dense_documented_nontrivial = Some(stats.dense_documented_nontrivial); + timings.semantic_dense_central_graph_node = Some(stats.dense_central_graph_node); + timings.semantic_dense_component_report = Some(stats.dense_component_report); + timings.semantic_dense_unstructured_doc = Some(stats.dense_unstructured_doc); } fn apply_cache_refresh_stats(timings: &mut IndexingPhaseTimings, stats: CacheRefreshStats) { @@ -3404,6 +3481,8 @@ fn indexable_source_path(path: &Path) -> bool { .and_then(codestory_indexer::get_language_for_ext) .is_some(); tree_sitter_supported + || codestory_indexer::template_pipeline::template_kind_for_path(path).is_some() + || codestory_indexer::structural::is_structural_candidate_path(path) || codestory_indexer::is_text_only_candidate_path(path) || looks_like_openapi_source_path(path) } @@ -3765,7 +3844,7 @@ fn local_symbol_summary(doc: &LlmSymbolDoc) -> String { ) } -const LLM_SYMBOL_DOC_SCHEMA_VERSION: u32 = 4; +const LLM_SYMBOL_DOC_SCHEMA_VERSION: u32 = 5; const LLM_SYMBOL_DOC_VERSION_PREFIX: &str = "semantic_doc_version:"; const SEARCH_NODE_BATCH_SIZE: usize = 8_192; const SEARCH_SYMBOL_PROJECTION_BATCH_SIZE: usize = 4_096; @@ -3780,6 +3859,33 @@ const SEMANTIC_STREAM_PENDING_DOCS_ENV: &str = "CODESTORY_SEMANTIC_STREAM_PENDIN const SEMANTIC_STREAM_SORT_WINDOW_BATCHES_ENV: &str = "CODESTORY_SEMANTIC_STREAM_SORT_WINDOW_BATCHES"; const SEMANTIC_STREAM_SORT_WINDOW_BATCHES: usize = 1; +const SEMANTIC_POLICY_VERSION: &str = "graph_first_v1"; +const SYMBOL_SEARCH_DOC_PROVENANCE: &str = "extracted"; +const DENSE_CENTRAL_LABEL_THRESHOLD: usize = 12; +const DENSE_CENTRAL_SCORE_THRESHOLD: usize = 24; + +#[derive(Debug, Clone, Copy, PartialEq, Eq)] +enum DenseAnchorReason { + PublicApi, + Entrypoint, + DocumentedNontrivial, + CentralGraphNode, + ComponentReport, + UnstructuredDoc, +} + +impl DenseAnchorReason { + fn as_str(self) -> &'static str { + match self { + Self::PublicApi => "public_api", + Self::Entrypoint => "entrypoint", + Self::DocumentedNontrivial => "documented_nontrivial", + Self::CentralGraphNode => "central_graph_node", + Self::ComponentReport => "component_report", + Self::UnstructuredDoc => "unstructured_doc", + } + } +} #[derive(Debug, Clone, Copy, PartialEq, Eq)] enum SemanticDocScope { @@ -4133,6 +4239,8 @@ fn stored_semantic_docs_contract_from_stats( mixed_doc_versions: stats.mixed_doc_versions, mixed_doc_shapes: stats.mixed_doc_shapes, doc_shape: stats.doc_shape.clone(), + semantic_policy_version: stats.semantic_policy_version.clone(), + mixed_semantic_policy_versions: stats.mixed_semantic_policy_versions, } } @@ -4227,12 +4335,110 @@ fn llm_indexable_kind(kind: codestory_contracts::graph::NodeKind) -> bool { llm_indexable_kind_for_scope(kind, semantic_doc_scope_from_env()) } +fn normalize_semantic_store_path(path: &Path) -> String { + let path = path.to_string_lossy().replace('\\', "/"); + if let Some(rest) = path.strip_prefix("//?/UNC/") { + return format!("//{rest}"); + } + if let Some(rest) = path.strip_prefix("//?/") { + return rest.to_string(); + } + path +} + +fn semantic_path_is_absolute_like(path: &str) -> bool { + let bytes = path.as_bytes(); + path.starts_with('/') + || (bytes.len() > 2 + && bytes[1] == b':' + && bytes[2] == b'/' + && bytes[0].is_ascii_alphabetic()) +} + +fn semantic_path_parent(path: &str) -> Option<&str> { + path.rsplit_once('/') + .map(|(parent, _)| parent) + .filter(|parent| !parent.is_empty()) +} + +fn common_semantic_path_prefix(left: &str, right: &str) -> String { + let left_parts = left.split('/').collect::>(); + let right_parts = right.split('/').collect::>(); + let mut common = Vec::new(); + for (left, right) in left_parts.iter().zip(right_parts.iter()) { + if left != right { + break; + } + common.push(*left); + } + common.join("/") +} + +fn common_absolute_semantic_parent(paths: &[(GraphNodeId, String)]) -> Option { + let mut parents = paths + .iter() + .map(|(_, path)| path.as_str()) + .filter(|path| semantic_path_is_absolute_like(path)) + .filter_map(semantic_path_parent); + let mut common = parents.next()?.to_string(); + for parent in parents { + common = common_semantic_path_prefix(&common, parent); + if common.is_empty() { + return None; + } + } + Some(common).filter(|common| !common.is_empty()) +} + +fn strip_semantic_common_parent(path: &str, common_parent: &str) -> Option { + let rest = path.strip_prefix(common_parent)?; + let rest = rest.strip_prefix('/')?; + (!rest.is_empty()).then(|| rest.to_string()) +} + +fn semantic_file_table_path_maps( + files: Vec, +) -> (HashMap, HashMap) { + let rows = files + .into_iter() + .map(|file| { + ( + codestory_contracts::graph::NodeId(file.id), + normalize_semantic_store_path(&file.path), + ) + }) + .collect::>(); + let common_parent = common_absolute_semantic_parent(&rows); + let mut display_paths = HashMap::new(); + let mut read_paths = HashMap::new(); + for (id, path) in rows { + let normalized = common_parent + .as_deref() + .and_then(|common_parent| strip_semantic_common_parent(&path, common_parent)) + .unwrap_or_else(|| path.clone()); + display_paths.insert(id, normalized); + read_paths.insert(id, path); + } + (display_paths, read_paths) +} + +fn semantic_file_table_path_map(files: Vec) -> HashMap { + let (display_paths, _) = semantic_file_table_path_maps(files); + display_paths +} + +fn semantic_file_table_read_path_map(files: Vec) -> HashMap { + let (_, read_paths) = semantic_file_table_path_maps(files); + read_paths +} + #[derive(Default)] struct SemanticDocGraphContext { child_labels: HashMap>, referenced_labels: HashMap>, edge_digests: HashMap>, file_paths: HashMap, + file_read_paths: HashMap, } impl SemanticDocGraphContext { @@ -4252,15 +4458,27 @@ impl SemanticDocGraphContext { let edges_by_node = storage.get_edges_for_node_ids(&node_ids).map_err(|e| { ApiError::internal(format!("Failed to load semantic doc graph context: {e}")) })?; + let files = storage + .get_files() + .map_err(|e| ApiError::internal(format!("Failed to load semantic doc files: {e}")))?; + let file_table_paths = semantic_file_table_path_map(files.clone()); + let file_table_read_paths = semantic_file_table_read_path_map(files); let mut context = Self::default(); for node in semantic_nodes { if let Some(file_id) = node.file_node_id && let Some(file_node) = nodes_by_id.get(&file_id) { - context - .file_paths - .insert(node.id, file_node.serialized_name.clone()); + let file_path = file_table_paths + .get(&file_id) + .cloned() + .unwrap_or_else(|| file_node.serialized_name.clone()); + context.file_paths.insert(node.id, file_path); + let read_path = file_table_read_paths + .get(&file_id) + .cloned() + .unwrap_or_else(|| file_node.serialized_name.clone()); + context.file_read_paths.insert(node.id, read_path); } let edges = edges_by_node @@ -4286,27 +4504,76 @@ impl SemanticDocGraphContext { fn file_path_for_node(&self, node: &GraphNode) -> Option<&str> { self.file_paths.get(&node.id).map(String::as_str) } + + fn file_read_path_for_node(&self, node: &GraphNode) -> Option<&str> { + self.file_read_paths + .get(&node.id) + .or_else(|| self.file_paths.get(&node.id)) + .map(String::as_str) + } } fn build_semantic_file_text_cache( graph_context: &SemanticDocGraphContext, semantic_nodes: &[&GraphNode], +) -> HashMap> { + build_semantic_file_text_cache_with_limits( + graph_context, + semantic_nodes, + SEMANTIC_FILE_TEXT_MAX_BYTES, + SEMANTIC_FILE_TEXT_CACHE_MAX_BYTES, + ) +} + +fn build_semantic_file_text_cache_with_limits( + graph_context: &SemanticDocGraphContext, + semantic_nodes: &[&GraphNode], + max_file_bytes: u64, + max_cache_bytes: usize, ) -> HashMap> { let mut file_paths = semantic_nodes .iter() - .filter_map(|node| graph_context.file_path_for_node(node).map(str::to_string)) - .collect::>() + .filter_map(|node| { + let display_path = graph_context.file_path_for_node(node)?.to_string(); + let read_path = graph_context + .file_read_path_for_node(node) + .unwrap_or(display_path.as_str()) + .to_string(); + Some((display_path, read_path)) + }) + .collect::>() .into_iter() .collect::>(); - file_paths.sort(); + file_paths.sort_by(|left, right| left.0.cmp(&right.0)); + + let mut cached_bytes = 0usize; + let mut cache_exhausted = false; + let mut cache = HashMap::with_capacity(file_paths.len()); + for (display_path, read_path) in file_paths { + if cache_exhausted { + cache.insert(display_path, None); + continue; + } - file_paths - .into_par_iter() - .map(|path| { - let contents = read_searchable_file_contents(&path); - (path, contents) - }) - .collect() + let contents = read_file_text_limited(Path::new(&read_path), max_file_bytes) + .ok() + .flatten(); + let Some(contents) = contents else { + cache.insert(display_path, None); + continue; + }; + + let body_bytes = contents.len(); + if cached_bytes.saturating_add(body_bytes) > max_cache_bytes { + cache_exhausted = true; + cache.insert(display_path, None); + continue; + } + + cached_bytes = cached_bytes.saturating_add(body_bytes); + cache.insert(display_path, Some(contents)); + } + cache } fn edge_digest_for_edges(edges: &[GraphEdge], limit: usize) -> Vec { @@ -4525,24 +4792,16 @@ fn build_llm_symbol_doc_text( ); let _ = writeln!(out, "symbol: {display_name}"); let _ = writeln!(out, "kind: {:?}", node.kind); - if let Some(path) = file_path { - let _ = writeln!(out, "file: {path}"); - let path_lower = path.to_ascii_lowercase(); - if path_lower.contains("/tests/") || path_lower.contains("\\tests\\") { - let _ = writeln!(out, "file_role: test"); - } else if path_lower.contains("/docs/") - || path_lower.contains("\\docs\\") - || path_lower.ends_with(".md") - { - let _ = writeln!(out, "file_role: docs"); - } - } if let Some(line) = node.start_line { let _ = writeln!(out, "line: {line}"); } if let Some(qualified_name) = node.qualified_name.as_deref() { let _ = writeln!(out, "qualified_name: {qualified_name}"); } + let (signature, comments, body) = symbol_excerpt(node, file_path, file_text_cache); + if !comments.is_empty() { + let _ = writeln!(out, "comments: {}", comments.join(" ")); + } if alias_mode != SemanticDocAliasMode::NoAlias { if let Some(language) = semantic_doc_language_from_path(file_path) { let _ = writeln!(out, "language: {language}"); @@ -4574,17 +4833,24 @@ fn build_llm_symbol_doc_text( semantic_symbol_role_aliases(node.kind) ); } - - let (signature, comments, body) = symbol_excerpt(node, file_path, file_text_cache); if !signature.is_empty() { let _ = writeln!(out, "signature: {}", signature.join(" ")); } - if !comments.is_empty() { - let _ = writeln!(out, "comments: {}", comments.join(" ")); - } if !body.is_empty() { let _ = writeln!(out, "body_summary: {}", body.join(" ")); } + if let Some(path) = file_path { + let _ = writeln!(out, "file: {path}"); + let path_lower = path.to_ascii_lowercase(); + if path_lower.contains("/tests/") || path_lower.contains("\\tests\\") { + let _ = writeln!(out, "file_role: test"); + } else if path_lower.contains("/docs/") + || path_lower.contains("\\docs\\") + || path_lower.ends_with(".md") + { + let _ = writeln!(out, "file_role: docs"); + } + } let children = graph_context .child_labels @@ -4647,11 +4913,13 @@ struct PendingLlmSymbolDoc { start_line: Option, doc_text: String, doc_hash: String, + dense_reason: DenseAnchorReason, } #[derive(Debug)] struct BuiltLlmSymbolDoc { - pending: PendingLlmSymbolDoc, + symbol_doc: SymbolSearchDoc, + pending: Option, reusable: bool, } @@ -4662,6 +4930,7 @@ struct SemanticVectorReuseContractKey { model_id: String, dimension: u32, doc_shape: String, + semantic_policy_version: String, } impl SemanticVectorReuseContractKey { @@ -4677,6 +4946,7 @@ impl SemanticVectorReuseContractKey { model_id: existing_doc.embedding_model.clone(), dimension: existing_doc.embedding_dim, doc_shape: existing_doc.doc_shape.clone()?, + semantic_policy_version: existing_doc.semantic_policy_version.clone()?, }) } @@ -4687,6 +4957,7 @@ impl SemanticVectorReuseContractKey { model_id: embedding_contract.cache_key.clone(), dimension, doc_shape: embedding_contract.doc_shape.clone(), + semantic_policy_version: SEMANTIC_POLICY_VERSION.to_string(), } } @@ -4699,6 +4970,7 @@ impl SemanticVectorReuseContractKey { && self.model_id.as_str() == embedding_contract.cache_key.as_str() && self.dimension > 0 && self.doc_shape.as_str() == embedding_contract.doc_shape.as_str() + && self.semantic_policy_version.as_str() == SEMANTIC_POLICY_VERSION } } @@ -4790,6 +5062,490 @@ fn llm_symbol_doc_can_reuse( existing_key.matches_current_without_known_dimension(doc_hash, embedding_contract) } +fn observe_dense_anchor_reason(stats: &mut SemanticProjectionStats, reason: DenseAnchorReason) { + match reason { + DenseAnchorReason::PublicApi => { + stats.dense_public_api = stats.dense_public_api.saturating_add(1); + } + DenseAnchorReason::Entrypoint => { + stats.dense_entrypoint = stats.dense_entrypoint.saturating_add(1); + } + DenseAnchorReason::DocumentedNontrivial => { + stats.dense_documented_nontrivial = stats.dense_documented_nontrivial.saturating_add(1); + } + DenseAnchorReason::CentralGraphNode => { + stats.dense_central_graph_node = stats.dense_central_graph_node.saturating_add(1); + } + DenseAnchorReason::ComponentReport => { + stats.dense_component_report = stats.dense_component_report.saturating_add(1); + } + DenseAnchorReason::UnstructuredDoc => { + stats.dense_unstructured_doc = stats.dense_unstructured_doc.saturating_add(1); + } + } +} + +fn semantic_edge_count(edge_digests: &[String]) -> usize { + edge_digests + .iter() + .filter_map(|digest| digest.rsplit_once('=')) + .filter_map(|(_, raw)| raw.parse::().ok()) + .sum() +} + +fn dense_anchor_score(graph_context: &SemanticDocGraphContext, node_id: GraphNodeId) -> usize { + let child_count = graph_context + .child_labels + .get(&node_id) + .map(Vec::len) + .unwrap_or(0); + let related_count = graph_context + .referenced_labels + .get(&node_id) + .map(Vec::len) + .unwrap_or(0); + let edge_count = graph_context + .edge_digests + .get(&node_id) + .map(|digests| semantic_edge_count(digests)) + .unwrap_or(0); + child_count + .saturating_add(related_count) + .saturating_add(edge_count) +} + +fn dense_anchor_is_central(graph_context: &SemanticDocGraphContext, node_id: GraphNodeId) -> bool { + let label_count = graph_context + .child_labels + .get(&node_id) + .map(Vec::len) + .unwrap_or(0) + .saturating_add( + graph_context + .referenced_labels + .get(&node_id) + .map(Vec::len) + .unwrap_or(0), + ); + label_count >= DENSE_CENTRAL_LABEL_THRESHOLD + && dense_anchor_score(graph_context, node_id) >= DENSE_CENTRAL_SCORE_THRESHOLD +} + +fn semantic_component_key_for_path(path: Option<&str>) -> Option { + let path = path?.replace('\\', "/"); + let parent = path + .rsplit_once('/') + .map(|(parent, _)| parent) + .unwrap_or(""); + let parts = parent + .split('/') + .filter(|part| !part.is_empty()) + .collect::>(); + if parts.is_empty() { + return Some("dir:.".into()); + } + if let Some(index) = parts.iter().position(|part| *part == "crates") + && let Some(crate_name) = parts.get(index.saturating_add(1)) + { + return Some(format!("crate:{crate_name}")); + } + if let Some(index) = parts.iter().position(|part| *part == "src") { + if let Some(module) = parts.get(index.saturating_add(1)) { + return Some(format!("module:src/{module}")); + } + return Some("module:src".into()); + } + Some(format!( + "dir:{}", + parts.iter().take(2).copied().collect::>().join("/") + )) +} + +fn virtual_component_report_node_id(component_key: &str) -> GraphNodeId { + const FNV_OFFSET: u64 = 0xcbf29ce484222325; + const FNV_PRIME: u64 = 0x100000001b3; + + let mut hash = FNV_OFFSET; + for byte in component_key.as_bytes() { + hash ^= u64::from(*byte); + hash = hash.wrapping_mul(FNV_PRIME); + } + let value = ((hash & 0x3fff_ffff_ffff_ffff) as i64).max(1); + codestory_contracts::graph::NodeId(-value) +} + +fn semantic_file_is_entrypoint(path: Option<&str>, display_name: &str) -> bool { + let name = display_name + .rsplit("::") + .next() + .unwrap_or(display_name) + .to_ascii_lowercase(); + if name == "main" { + return true; + } + semantic_path_is_entrypoint_file(path) + && matches!( + name.as_str(), + "__main__" + | "app" + | "application" + | "asgi" + | "function" + | "handler" + | "index" + | "program" + | "route" + | "routes" + | "run" + | "server" + | "start" + | "startup" + | "wsgi" + ) +} + +fn semantic_path_is_entrypoint_file(path: Option<&str>) -> bool { + let Some(path) = path else { + return false; + }; + let normalized = path.replace('\\', "/").to_ascii_lowercase(); + [ + "/main.rs", + "/main.c", + "/main.cc", + "/main.cpp", + "/main.cxx", + "/main.go", + "/main.java", + "/main.py", + "/app.js", + "/app.jsx", + "/app.py", + "/app.rb", + "/app.ts", + "/app.tsx", + "/application.java", + "/asgi.py", + "/config.ru", + "/index.js", + "/index.jsx", + "/index.php", + "/index.rb", + "/index.ts", + "/index.tsx", + "/program.cs", + "/route.js", + "/route.jsx", + "/route.ts", + "/route.tsx", + "/server.js", + "/server.jsx", + "/server.py", + "/server.rb", + "/server.ts", + "/server.tsx", + "/startup.cs", + "/wsgi.py", + ] + .iter() + .any(|suffix| normalized.ends_with(suffix)) + || (normalized.contains("/cmd/") && normalized.ends_with("/main.go")) + || (normalized.contains("/src/main/java/") && normalized.ends_with("application.java")) + || (normalized.contains("/src/main/kotlin/") && normalized.ends_with("application.kt")) +} + +fn semantic_file_is_public_surface(path: Option<&str>) -> bool { + let Some(path) = path else { + return false; + }; + let normalized = path.replace('\\', "/").to_ascii_lowercase(); + normalized.ends_with("/lib.rs") + || normalized.ends_with("/mod.rs") + || normalized.ends_with("/public.rs") + || normalized.ends_with("/__init__.py") + || normalized.ends_with("/index.js") + || normalized.ends_with("/index.jsx") + || normalized.ends_with("/index.php") + || normalized.ends_with("/index.rb") + || normalized.ends_with("/index.ts") + || normalized.ends_with("/index.tsx") + || normalized.ends_with("/package.json") + || normalized.starts_with("api/") + || normalized.contains("/api/") + || normalized.starts_with("apps/") + || normalized.contains("/apps/") + || normalized.starts_with("include/") + || normalized.contains("/include/") + || normalized.starts_with("pkg/") + || normalized.contains("/pkg/") + || normalized.starts_with("public/") + || normalized.contains("/public/") + || normalized.starts_with("routes/") + || normalized.contains("/routes/") + || normalized.starts_with("controllers/") + || normalized.contains("/controllers/") + || normalized.starts_with("components/") + || normalized.contains("/components/") + || normalized.contains("/src/main/java/") + || normalized.contains("/src/main/kotlin/") +} + +fn dense_anchor_public_kind(kind: codestory_contracts::graph::NodeKind) -> bool { + matches!( + kind, + codestory_contracts::graph::NodeKind::STRUCT + | codestory_contracts::graph::NodeKind::CLASS + | codestory_contracts::graph::NodeKind::INTERFACE + | codestory_contracts::graph::NodeKind::ANNOTATION + | codestory_contracts::graph::NodeKind::UNION + | codestory_contracts::graph::NodeKind::ENUM + | codestory_contracts::graph::NodeKind::TYPEDEF + | codestory_contracts::graph::NodeKind::GLOBAL_VARIABLE + | codestory_contracts::graph::NodeKind::CONSTANT + ) +} + +fn dense_anchor_callable_kind(kind: codestory_contracts::graph::NodeKind) -> bool { + matches!( + kind, + codestory_contracts::graph::NodeKind::FUNCTION + | codestory_contracts::graph::NodeKind::METHOD + | codestory_contracts::graph::NodeKind::MACRO + ) +} + +fn semantic_file_is_package_callable_surface(path: Option<&str>) -> bool { + let Some(path) = path else { + return false; + }; + let normalized = path.replace('\\', "/").to_ascii_lowercase(); + let file_name = normalized.rsplit('/').next().unwrap_or(normalized.as_str()); + let source_extension = [ + ".bash", ".c", ".cc", ".cjs", ".cpp", ".cs", ".dart", ".fish", ".go", ".h", ".hpp", + ".java", ".js", ".jsx", ".kt", ".kts", ".mjs", ".php", ".py", ".rb", ".sh", ".swift", + ".ts", ".tsx", ".zsh", + ] + .iter() + .any(|suffix| file_name.ends_with(suffix)); + if !source_extension { + return false; + } + normalized.contains("/lib/") + || normalized.contains("/src/") + || normalized.contains("/pkg/") + || normalized.contains("/packages/") + || normalized.contains("/routes/") + || normalized.contains("/router/") + || normalized.contains("/controllers/") + || normalized.contains("/middleware/") + || normalized.contains("/sources/") + || matches!( + file_name, + "application.js" + | "context.go" + | "gin.go" + | "http.dart" + | "nvm.sh" + | "request.js" + | "response.js" + | "routergroup.go" + | "sessions.py" + | "tree.go" + ) +} + +fn semantic_doc_is_documented_nontrivial(doc_text: &str) -> bool { + if !doc_text.contains("comments:") { + return false; + } + doc_text + .lines() + .find_map(|line| line.strip_prefix("body_summary:")) + .is_some_and(|body| body.split_whitespace().count() >= 8) +} + +fn dense_anchor_reason_for_node( + graph_context: &SemanticDocGraphContext, + node: &GraphNode, + display_name: &str, + file_path: Option<&str>, + doc_text: &str, + access: Option, +) -> Option { + let file_role = file_path + .map(retrieval_file_role_from_path) + .unwrap_or(RetrievalFileRole::Source); + let central = dense_anchor_is_central(graph_context, node.id); + + if file_role == RetrievalFileRole::Docs { + return Some(DenseAnchorReason::UnstructuredDoc); + } + if file_role.is_non_primary() && !central { + return None; + } + if semantic_file_is_entrypoint(file_path, display_name) { + return Some(DenseAnchorReason::Entrypoint); + } + if central { + return Some(DenseAnchorReason::CentralGraphNode); + } + if dense_anchor_public_kind(node.kind) + && (matches!(access, Some(AccessKind::Public | AccessKind::Protected)) + || semantic_file_is_public_surface(file_path)) + { + return Some(DenseAnchorReason::PublicApi); + } + if dense_anchor_callable_kind(node.kind) && semantic_file_is_package_callable_surface(file_path) + { + return Some(DenseAnchorReason::PublicApi); + } + if semantic_doc_is_documented_nontrivial(doc_text) { + return Some(DenseAnchorReason::DocumentedNontrivial); + } + None +} + +fn is_retrieval_artifact_node(node: &GraphNode) -> bool { + node.serialized_name.starts_with("component_report:") + || node + .canonical_id + .as_deref() + .is_some_and(|canonical_id| canonical_id.starts_with("codestory:component_report:")) +} + +fn build_component_report_docs( + graph_context: &SemanticDocGraphContext, + semantic_nodes: &[&GraphNode], + existing_docs: &HashMap, + embedding_contract: Option<&EmbeddingProfileContractDto>, + updated_at_epoch_ms: i64, +) -> Vec { + let mut components = BTreeMap::>::new(); + for node in semantic_nodes { + let file_path = graph_context.file_path_for_node(node); + let Some(component_key) = semantic_component_key_for_path(file_path) else { + continue; + }; + components.entry(component_key).or_default().push(*node); + } + + components + .into_iter() + .filter_map(|(component_key, mut component_nodes)| { + component_nodes.sort_by(|left, right| { + dense_anchor_score(graph_context, right.id) + .cmp(&dense_anchor_score(graph_context, left.id)) + .then_with(|| node_display_name(left).cmp(&node_display_name(right))) + .then_with(|| left.id.0.cmp(&right.id.0)) + }); + let god_nodes = component_nodes + .iter() + .take(8) + .map(|node| { + let file = graph_context.file_path_for_node(node).unwrap_or(""); + format!( + "- {} kind={:?} file={} centrality={}", + node_display_name(node), + node.kind, + file, + dense_anchor_score(graph_context, node.id) + ) + }) + .collect::>(); + if god_nodes.is_empty() { + return None; + } + let mut files = component_nodes + .iter() + .filter_map(|node| graph_context.file_path_for_node(node)) + .map(str::to_string) + .collect::>(); + files.sort(); + files.dedup(); + files.truncate(12); + let representative_file_path = files.first().cloned(); + + let mut doc_text = String::new(); + let _ = writeln!( + doc_text, + "{LLM_SYMBOL_DOC_VERSION_PREFIX} {LLM_SYMBOL_DOC_SCHEMA_VERSION}" + ); + let _ = writeln!(doc_text, "component_report: {component_key}"); + let _ = writeln!( + doc_text, + "source_provenance: {SYMBOL_SEARCH_DOC_PROVENANCE}" + ); + let _ = writeln!(doc_text, "policy_version: {SEMANTIC_POLICY_VERSION}"); + if let Some(path) = representative_file_path.as_deref() { + let _ = writeln!(doc_text, "representative_file: {path}"); + } + let _ = writeln!(doc_text, "symbol_count: {}", component_nodes.len()); + let _ = writeln!(doc_text, "file_count: {}", files.len()); + if !files.is_empty() { + let _ = writeln!(doc_text, "files: {}", files.join("; ")); + } + let _ = writeln!(doc_text, "god_nodes:"); + for line in god_nodes { + let _ = writeln!(doc_text, "{line}"); + } + doc_text = truncate_semantic_doc_text_to_token_budget( + &doc_text, + semantic_doc_max_tokens_from_env(), + ); + let doc_hash = llm_symbol_doc_hash(&doc_text); + let node_id = virtual_component_report_node_id(&component_key); + let display_name = format!("component_report:{component_key}"); + let qualified_name = Some(format!("codestory::component_report::{component_key}")); + let kind = codestory_contracts::graph::NodeKind::MODULE; + let symbol_doc = SymbolSearchDoc { + node_id, + file_node_id: None, + kind, + display_name: display_name.clone(), + qualified_name: qualified_name.clone(), + file_path: representative_file_path.clone(), + start_line: None, + doc_text: doc_text.clone(), + doc_version: LLM_SYMBOL_DOC_SCHEMA_VERSION, + doc_hash: doc_hash.clone(), + policy_version: SEMANTIC_POLICY_VERSION.to_string(), + source_provenance: SYMBOL_SEARCH_DOC_PROVENANCE.to_string(), + updated_at_epoch_ms, + }; + let pending = embedding_contract.map(|embedding_contract| { + let dense_reason = DenseAnchorReason::ComponentReport; + let reusable = existing_docs.get(&node_id).is_some_and(|existing_doc| { + llm_symbol_doc_can_reuse(existing_doc, &doc_hash, embedding_contract) + && existing_doc.dense_reason.as_deref() == Some(dense_reason.as_str()) + }); + ( + PendingLlmSymbolDoc { + node_id, + file_node_id: None, + kind, + display_name, + qualified_name, + file_path: representative_file_path, + start_line: None, + doc_text, + doc_hash, + dense_reason, + }, + reusable, + ) + }); + let (pending, reusable) = pending + .map(|(pending, reusable)| (Some(pending), reusable)) + .unwrap_or((None, false)); + Some(BuiltLlmSymbolDoc { + symbol_doc, + pending, + reusable, + }) + }) + .collect() +} + fn sort_pending_llm_symbol_docs_for_embedding_batches(docs: &mut [PendingLlmSymbolDoc]) { docs.sort_by(|left, right| { left.doc_text @@ -4842,6 +5598,8 @@ fn flush_pending_llm_symbol_docs( embedding_backend: Some(embedding_contract.backend.clone()), embedding_dim: embedding.len() as u32, doc_shape: Some(embedding_contract.doc_shape.clone()), + semantic_policy_version: Some(SEMANTIC_POLICY_VERSION.to_string()), + dense_reason: Some(doc.dense_reason.as_str().to_string()), embedding, updated_at_epoch_ms, }) @@ -4907,23 +5665,19 @@ fn sync_llm_symbol_projection( return Ok(stats); } - if let Err(error) = engine.set_embedding_runtime_from_env() { - tracing::warn!( - "embedding runtime unavailable ({error}); semantic ask retrieval will be unavailable until managed ONNX assets are installed with `codestory-cli setup embeddings` or embedding env points at a reachable runtime. Agent-facing retrieval must be repaired to full sidecar readiness before packet/search evidence is trusted." - ); - if hydrate_semantic_docs { - let reload_started = Instant::now(); - reload_llm_docs_from_storage(storage, engine, LLM_DOC_RELOAD_BATCH_SIZE)?; - stats.reload_ms = clamp_u128_to_u32(reload_started.elapsed().as_millis()); + let embedding_contract = match engine.set_embedding_runtime_from_env() { + Ok(()) => Some(current_embedding_contract_from_env().ok_or_else(|| { + ApiError::internal( + "Failed to resolve current embedding profile contract after configuring runtime", + ) + })?), + Err(error) => { + tracing::warn!( + "embedding runtime unavailable ({error}); graph-native symbol docs will still be refreshed, but dense anchor retrieval will be unavailable until managed ONNX assets are installed with `codestory-cli setup embeddings` or embedding env points at a reachable runtime. Agent-facing retrieval must be repaired to full sidecar readiness before packet/search evidence is trusted." + ); + None } - return Ok(stats); - } - - let embedding_contract = current_embedding_contract_from_env().ok_or_else(|| { - ApiError::internal( - "Failed to resolve current embedding profile contract after configuring runtime", - ) - })?; + }; let updated_at_epoch_ms = current_epoch_ms(); let existing_docs = storage @@ -4933,10 +5687,15 @@ fn sync_llm_symbol_projection( .map(|doc| (doc.node_id, doc)) .collect::>(); - let expand_semantic_scope_for_contract_repair = llm_refresh_file_scope.is_some() - && existing_docs.values().any(|existing_doc| { - !llm_symbol_doc_contract_matches(existing_doc, &embedding_contract) - }); + let expand_semantic_scope_for_contract_repair = + if let Some(embedding_contract) = embedding_contract.as_ref() { + llm_refresh_file_scope.is_some() + && existing_docs.values().any(|existing_doc| { + !llm_symbol_doc_contract_matches(existing_doc, embedding_contract) + }) + } else { + false + }; if expand_semantic_scope_for_contract_repair { tracing::warn!( "Stored semantic-doc contract differs from current embedding contract; expanding incremental semantic sync to rebuild all semantic docs" @@ -4965,11 +5724,13 @@ fn sync_llm_symbol_projection( let stream_sort_window_size = embed_batch_size.saturating_mul(stream_sort_window_batches); tracing::debug!(embed_batch_size, "Using semantic doc embedding batch size"); let mut pending_docs = Vec::::new(); - let mut seen_node_ids = Vec::::new(); + let mut seen_symbol_node_ids = Vec::::new(); + let mut seen_dense_node_ids = Vec::::new(); let mut doc_build_ns = 0_u128; let semantic_nodes = nodes .iter() .filter(|node| llm_indexable_kind(node.kind)) + .filter(|node| !is_retrieval_artifact_node(node)) .filter(|node| { effective_llm_refresh_file_scope .map(|scope| { @@ -4980,6 +5741,13 @@ fn sync_llm_symbol_projection( .unwrap_or(true) }) .collect::>(); + let semantic_node_ids = semantic_nodes + .iter() + .map(|node| node.id) + .collect::>(); + let component_access = storage + .get_component_access_map_for_nodes(&semantic_node_ids) + .map_err(|e| ApiError::internal(format!("Failed to load symbol access metadata: {e}")))?; let graph_context = SemanticDocGraphContext::build(storage, &semantic_nodes, nodes)?; let file_cache_started = Instant::now(); let file_text_cache = build_semantic_file_text_cache(&graph_context, &semantic_nodes); @@ -5005,80 +5773,250 @@ fn sync_llm_symbol_projection( &file_text_cache, ); let doc_hash = llm_symbol_doc_hash(&doc_text); - let reusable = existing_docs.get(&node.id).is_some_and(|existing_doc| { - llm_symbol_doc_can_reuse(existing_doc, &doc_hash, &embedding_contract) - }); + let dense_reason = dense_anchor_reason_for_node( + &graph_context, + node, + &display_name, + file_path.as_deref(), + &doc_text, + component_access.get(&node.id).copied(), + ); + let symbol_doc = SymbolSearchDoc { + node_id: node.id, + file_node_id: node.file_node_id, + kind: node.kind, + display_name: display_name.clone(), + qualified_name: node.qualified_name.clone(), + file_path: file_path.clone(), + start_line: node.start_line, + doc_text: doc_text.clone(), + doc_version: LLM_SYMBOL_DOC_SCHEMA_VERSION, + doc_hash: doc_hash.clone(), + policy_version: SEMANTIC_POLICY_VERSION.to_string(), + source_provenance: SYMBOL_SEARCH_DOC_PROVENANCE.to_string(), + updated_at_epoch_ms, + }; + let pending_with_reuse = + embedding_contract.as_ref().and_then(|embedding_contract| { + dense_reason.map(|dense_reason| { + let reusable = + existing_docs.get(&node.id).is_some_and(|existing_doc| { + llm_symbol_doc_can_reuse( + existing_doc, + &doc_hash, + embedding_contract, + ) && existing_doc.dense_reason.as_deref() + == Some(dense_reason.as_str()) + }); + ( + PendingLlmSymbolDoc { + node_id: node.id, + file_node_id: node.file_node_id, + kind: node.kind, + display_name, + qualified_name: node.qualified_name.clone(), + file_path, + start_line: node.start_line, + doc_text, + doc_hash, + dense_reason, + }, + reusable, + ) + }) + }); + let (pending, reusable) = pending_with_reuse + .map(|(pending, reusable)| (Some(pending), reusable)) + .unwrap_or((None, false)); BuiltLlmSymbolDoc { - pending: PendingLlmSymbolDoc { - node_id: node.id, - file_node_id: node.file_node_id, - kind: node.kind, - display_name, - qualified_name: node.qualified_name.clone(), - file_path, - start_line: node.start_line, - doc_text, - doc_hash, - }, + symbol_doc, + pending, reusable, } }) .collect::>(); doc_build_ns = doc_build_ns.saturating_add(doc_build_started.elapsed().as_nanos()); + let symbol_docs = built_docs + .iter() + .map(|built_doc| built_doc.symbol_doc.clone()) + .collect::>(); + let symbol_upsert_started = Instant::now(); + storage + .upsert_symbol_search_docs_batch(&symbol_docs) + .map_err(|e| ApiError::internal(format!("Failed to upsert symbol search docs: {e}")))?; + stats.db_upsert_ms = stats.db_upsert_ms.saturating_add(clamp_u128_to_u32( + symbol_upsert_started.elapsed().as_millis(), + )); + stats.symbol_search_docs_written = stats + .symbol_search_docs_written + .saturating_add(clamp_usize_to_u32(symbol_docs.len())); + for built_doc in built_docs { - seen_node_ids.push(built_doc.pending.node_id); + seen_symbol_node_ids.push(built_doc.symbol_doc.node_id); + let Some(pending_doc) = built_doc.pending else { + stats.dense_docs_skipped = stats.dense_docs_skipped.saturating_add(1); + continue; + }; + seen_dense_node_ids.push(pending_doc.node_id); + observe_dense_anchor_reason(&mut stats, pending_doc.dense_reason); if built_doc.reusable { stats.docs_reused = stats.docs_reused.saturating_add(1); continue; } stats.docs_pending = stats.docs_pending.saturating_add(1); - pending_docs.push(built_doc.pending); + pending_docs.push(pending_doc); } while stream_pending_docs && pending_docs.len() >= embed_batch_size { + let Some(embedding_contract) = embedding_contract.as_ref() else { + break; + }; flush_streaming_llm_symbol_doc_window( storage, engine, &mut pending_docs, embed_batch_size, - &embedding_contract, + embedding_contract, updated_at_epoch_ms, &mut stats, )?; } } + + if effective_llm_refresh_file_scope.is_none() { + let report_build_started = Instant::now(); + let built_reports = build_component_report_docs( + &graph_context, + &semantic_nodes, + &existing_docs, + embedding_contract.as_ref(), + updated_at_epoch_ms, + ); + doc_build_ns = doc_build_ns.saturating_add(report_build_started.elapsed().as_nanos()); + if !built_reports.is_empty() { + let report_symbol_docs = built_reports + .iter() + .map(|built_doc| built_doc.symbol_doc.clone()) + .collect::>(); + let report_nodes = report_symbol_docs + .iter() + .map(|doc| GraphNode { + id: doc.node_id, + kind: doc.kind, + serialized_name: doc.display_name.clone(), + qualified_name: doc.qualified_name.clone(), + canonical_id: Some(format!("codestory:{}", doc.display_name)), + file_node_id: None, + start_line: None, + start_col: None, + end_line: None, + end_col: None, + }) + .collect::>(); + storage + .upsert_retrieval_artifact_nodes_batch(&report_nodes) + .map_err(|e| { + ApiError::internal(format!("Failed to upsert component report nodes: {e}")) + })?; + let symbol_upsert_started = Instant::now(); + storage + .upsert_symbol_search_docs_batch(&report_symbol_docs) + .map_err(|e| { + ApiError::internal(format!("Failed to upsert component report docs: {e}")) + })?; + stats.db_upsert_ms = stats.db_upsert_ms.saturating_add(clamp_u128_to_u32( + symbol_upsert_started.elapsed().as_millis(), + )); + stats.symbol_search_docs_written = stats + .symbol_search_docs_written + .saturating_add(clamp_usize_to_u32(report_symbol_docs.len())); + + for built_doc in built_reports { + seen_symbol_node_ids.push(built_doc.symbol_doc.node_id); + let Some(pending_doc) = built_doc.pending else { + stats.dense_docs_skipped = stats.dense_docs_skipped.saturating_add(1); + continue; + }; + seen_dense_node_ids.push(pending_doc.node_id); + observe_dense_anchor_reason(&mut stats, pending_doc.dense_reason); + if built_doc.reusable { + stats.docs_reused = stats.docs_reused.saturating_add(1); + continue; + } + stats.docs_pending = stats.docs_pending.saturating_add(1); + pending_docs.push(pending_doc); + } + + while stream_pending_docs && pending_docs.len() >= embed_batch_size { + let Some(embedding_contract) = embedding_contract.as_ref() else { + break; + }; + flush_streaming_llm_symbol_doc_window( + storage, + engine, + &mut pending_docs, + embed_batch_size, + embedding_contract, + updated_at_epoch_ms, + &mut stats, + )?; + } + } + } stats.doc_build_ms = clamp_u128_to_u32(doc_build_ns / 1_000_000); if !stream_pending_docs { sort_pending_llm_symbol_docs_for_embedding_batches(&mut pending_docs); } - for batch in pending_docs.chunks(embed_batch_size) { - flush_pending_llm_symbol_docs( - storage, - engine, - batch, - &embedding_contract, - updated_at_epoch_ms, - &mut stats, - )?; + if let Some(embedding_contract) = embedding_contract.as_ref() { + for batch in pending_docs.chunks(embed_batch_size) { + flush_pending_llm_symbol_docs( + storage, + engine, + batch, + embedding_contract, + updated_at_epoch_ms, + &mut stats, + )?; + } } let prune_started = Instant::now(); - let stale_docs = if let Some(scope) = effective_llm_refresh_file_scope { + let stale_symbol_docs = if let Some(scope) = effective_llm_refresh_file_scope { let file_node_ids = scope.iter().copied().collect::>(); storage - .delete_llm_symbol_docs_for_files_except_node_ids(&file_node_ids, &seen_node_ids) - .map_err(|e| ApiError::internal(format!("Failed to prune stale LLM docs: {e}")))? + .delete_symbol_search_docs_for_files_except_node_ids( + &file_node_ids, + &seen_symbol_node_ids, + ) + .map_err(|e| ApiError::internal(format!("Failed to prune stale symbol docs: {e}")))? } else { storage - .prune_llm_symbol_docs_to_node_ids(&seen_node_ids) - .map_err(|e| ApiError::internal(format!("Failed to prune stale LLM docs: {e}")))? + .prune_symbol_search_docs_to_node_ids(&seen_symbol_node_ids) + .map_err(|e| ApiError::internal(format!("Failed to prune stale symbol docs: {e}")))? + }; + let stale_dense_docs = if embedding_contract.is_some() { + if let Some(scope) = effective_llm_refresh_file_scope { + let file_node_ids = scope.iter().copied().collect::>(); + storage + .delete_llm_symbol_docs_for_files_except_node_ids( + &file_node_ids, + &seen_dense_node_ids, + ) + .map_err(|e| ApiError::internal(format!("Failed to prune stale LLM docs: {e}")))? + } else { + storage + .prune_llm_symbol_docs_to_node_ids(&seen_dense_node_ids) + .map_err(|e| ApiError::internal(format!("Failed to prune stale LLM docs: {e}")))? + } + } else { + 0 }; stats.prune_ms = clamp_u128_to_u32(prune_started.elapsed().as_millis()); - stats.docs_stale = clamp_usize_to_u32(stale_docs); + stats.docs_stale = clamp_usize_to_u32(stale_dense_docs.saturating_add(stale_symbol_docs)); if hydrate_semantic_docs { let reload_started = Instant::now(); @@ -6774,6 +7712,7 @@ impl AppController { edge_count: clamp_i64_to_u32(stats.edge_count), file_count: clamp_i64_to_u32(derived_file_count), error_count: clamp_i64_to_u32(stats.error_count), + fatal_error_count: clamp_i64_to_u32(stats.fatal_error_count), }; let workspace = Workspace::open(root.to_path_buf()) .map_err(|e| ApiError::internal(format!("Failed to open project: {e}")))?; @@ -7665,7 +8604,15 @@ impl AppController { &query_result.hits, limit_per_source, ) - .unwrap_or_default(); + .map_err(|error| { + agent::retrieval_primary::sidecar_retrieval_unavailable_error( + self, + format!( + "sidecar search rejected query: candidate resolution failed: {}", + error.message + ), + ) + })?; if let Some(reason) = agent::retrieval_primary::sidecar_result_rejection_reason( &query_result, &indexed_symbol_hits, @@ -7851,9 +8798,9 @@ impl AppController { normalize_path_key(&runtime_relative_path(&root, &file.path)) .contains(needle) }) - && language_filter - .as_deref() - .is_none_or(|language| file.language.eq_ignore_ascii_case(language)) + && language_filter.as_deref().is_none_or(|language| { + indexed_file_matches_language_filter(&file.language, &file.path, language) + }) }) .map(|file| IndexedFileDto { path: runtime_relative_path(&root, &file.path), @@ -7866,8 +8813,10 @@ impl AppController { }) .collect::>(); let limit = req.limit.unwrap_or(500).clamp(1, 5000) as usize; + let filtered_file_count = visible.len().min(u32::MAX as usize) as u32; let truncated = visible.len() > limit; visible.truncate(limit); + let visible_file_count = visible.len().min(u32::MAX as usize) as u32; let mut coverage_notes = Vec::new(); if incomplete_file_count > 0 || error_file_count > 0 { @@ -7879,9 +8828,15 @@ impl AppController { } let language_counts = language_counts .into_iter() - .map(|(language, file_count)| IndexedFileLanguageCountDto { - language, - file_count, + .map(|(language, file_count)| { + let support = language_support_summary_for_language(&language); + IndexedFileLanguageCountDto { + language, + file_count, + support_mode: support.support_mode, + evidence_tier: support.evidence_tier, + claim_label: support.claim_label, + } }) .collect::>(); let file_count = language_counts @@ -7895,6 +8850,8 @@ impl AppController { summary: IndexedFilesSummaryDto { file_count, indexed_file_count, + filtered_file_count, + visible_file_count, incomplete_file_count, error_file_count, truncated, @@ -8591,6 +9548,7 @@ impl AppController { semantic: scored.semantic_score, graph: scored.graph_score, total: scored.total_score, + provenance: Vec::new(), }); out.push(HybridSearchScoredHit { hit, @@ -9286,6 +10244,31 @@ fn index_full( } }; if can_copy_forward { + match staged + .store_mut() + .copy_retrieval_artifact_nodes_from(storage_path) + { + Ok(copied) => { + tracing::debug!( + copied, + "Copied retrieval artifact nodes into staged storage" + ) + } + Err(error) => { + tracing::warn!( + "Failed to copy retrieval artifact nodes into staged storage: {error}" + ) + } + } + match staged + .store_mut() + .copy_symbol_search_docs_from(storage_path) + { + Ok(copied) => tracing::debug!(copied, "Copied symbol docs into staged storage"), + Err(error) => { + tracing::warn!("Failed to copy symbol docs into staged storage: {error}") + } + } match staged.store_mut().copy_llm_symbol_docs_from(storage_path) { Ok(copied) => tracing::debug!(copied, "Copied semantic docs into staged storage"), Err(error) => { @@ -9334,6 +10317,14 @@ fn index_full( semantic_docs_embedded: None, semantic_docs_pending: None, semantic_docs_stale: None, + symbol_search_docs_written: None, + semantic_dense_docs_skipped: None, + semantic_dense_public_api: None, + semantic_dense_entrypoint: None, + semantic_dense_documented_nontrivial: None, + semantic_dense_central_graph_node: None, + semantic_dense_component_report: None, + semantic_dense_unstructured_doc: None, deferred_indexes_ms: Some(deferred_indexes_ms), summary_snapshot_ms: Some(summary_snapshot_ms), detail_snapshot_ms: None, @@ -9511,6 +10502,14 @@ where semantic_docs_embedded: None, semantic_docs_pending: None, semantic_docs_stale: None, + symbol_search_docs_written: None, + semantic_dense_docs_skipped: None, + semantic_dense_public_api: None, + semantic_dense_entrypoint: None, + semantic_dense_documented_nontrivial: None, + semantic_dense_central_graph_node: None, + semantic_dense_component_report: None, + semantic_dense_unstructured_doc: None, deferred_indexes_ms: None, summary_snapshot_ms: Some(summary_snapshot_ms), detail_snapshot_ms: Some(detail_snapshot_ms), @@ -9659,7 +10658,7 @@ mod tests { }; use crossbeam_channel::unbounded; use std::fs; - use std::path::PathBuf; + use std::path::{Path, PathBuf}; use std::sync::{Mutex as StdMutex, MutexGuard as StdMutexGuard}; use tempfile::tempdir; @@ -9719,6 +10718,33 @@ mod tests { ); } + #[test] + fn indexable_source_path_tracks_indexer_structural_and_template_surfaces() { + for relative_path in [ + "src/lib.rs", + "src/main.go", + "src/App.vue", + "src/App.svelte", + "src/pages/index.astro", + "public/index.html", + "public/site.css", + "db/schema.sql", + ] { + assert!( + indexable_source_path(Path::new(relative_path)), + "runtime freshness should count indexer-indexable path: {relative_path}" + ); + } + } + + #[test] + fn indexable_source_path_keeps_non_code_data_outside_freshness_gate() { + assert!( + !indexable_source_path(Path::new("target/run-output.log")), + "runtime freshness should not count unsupported output artifacts" + ); + } + struct HybridTestEnv { guards: Vec, _lock: StdMutexGuard<'static, ()>, @@ -9881,13 +10907,66 @@ mod tests { assert_eq!(hits[0].file_path.as_deref(), Some("src/routes.ts")); } + #[test] + fn language_filter_uses_shared_registry_extensions() { + for (requested, path) in [ + ("bash", "scripts/bootstrap.sh"), + ("bash", "scripts/bootstrap.bash"), + ("sh", "scripts/bootstrap.sh"), + ("python", "pkg/types.pyi"), + ("ts", "src/server.mts"), + ("typescript", "src/server.cts"), + ("dart", "lib/main.dart"), + ("html", "templates/index.htm"), + ("css", "assets/site.css"), + ("sql", "db/schema.sql"), + ("c++", "include/runtime.hh"), + ("c#", "src/App.cs"), + ("markdown", "docs/guide.mdx"), + ] { + assert!( + language_filter_matches_path(requested, path), + "expected language:{requested} to match {path}" + ); + } + + assert!(!language_filter_matches_path("bash", "src/main.py")); + assert!(!language_filter_matches_path( + "sh", + "scripts/bootstrap.bash" + )); + assert!(!language_filter_matches_path("tsx", "src/server.ts")); + assert!(!language_filter_matches_path("jsx", "src/app.js")); + + assert!(indexed_file_matches_language_filter( + "typescript", + Path::new("src/Widget.tsx"), + "tsx" + )); + assert!(indexed_file_matches_language_filter( + "bash", + Path::new("scripts/bootstrap.sh"), + "bash" + )); + assert!(!indexed_file_matches_language_filter( + "typescript", + Path::new("src/server.ts"), + "tsx" + )); + } + #[test] fn llm_doc_embed_batch_size_uses_throughput_default() { + let _lock = ENV_TEST_LOCK + .lock() + .unwrap_or_else(|poisoned| poisoned.into_inner()); + let _env = EnvGuard::remove(LLM_DOC_EMBED_BATCH_SIZE_ENV); + assert_eq!(llm_doc_embed_batch_size(), 128); } #[test] - fn framework_route_coverage_matrix_lists_fixture_status_and_known_gaps() { + fn framework_route_coverage_matrix_lists_coverage_evidence_and_known_gaps() { let coverage = framework_route_coverage_matrix(); let frameworks = coverage .iter() @@ -9927,7 +11006,7 @@ mod tests { ); } assert!(coverage.iter().all(|entry| { - !entry.fixture_status.is_empty() + !entry.coverage_evidence.is_empty() && !entry.confidence_floor.is_empty() && !entry.handler_link_support.is_empty() && !entry.unsupported_patterns.is_empty() @@ -10080,7 +11159,472 @@ mod tests { start_line: None, doc_text: doc_text.to_string(), doc_hash: llm_symbol_doc_hash(doc_text), + dense_reason: DenseAnchorReason::PublicApi, + } + } + + fn semantic_policy_node(id: i64, kind: NodeKind, name: &str, file_id: i64) -> Node { + Node { + id: CoreNodeId(id), + kind, + serialized_name: name.to_string(), + qualified_name: Some(format!("pkg::{name}")), + file_node_id: Some(CoreNodeId(file_id)), + start_line: Some(1), + end_line: Some(3), + ..Default::default() + } + } + + fn semantic_policy_context(path: &str, node_id: CoreNodeId) -> SemanticDocGraphContext { + let mut context = SemanticDocGraphContext::default(); + context.file_paths.insert(node_id, path.to_string()); + context + } + + #[test] + fn dense_policy_skips_private_trivial_helpers() { + let node = semantic_policy_node(10, NodeKind::FUNCTION, "helper", 1); + let context = semantic_policy_context("src/internal/helper.rs", node.id); + + let reason = dense_anchor_reason_for_node( + &context, + &node, + "helper", + Some("src/internal/helper.rs"), + "semantic_doc_version: 4\nsymbol: helper\nkind: FUNCTION\n", + Some(AccessKind::Private), + ); + + assert_eq!(reason, None); + } + + #[test] + fn dense_policy_does_not_treat_every_handler_name_as_entrypoint() { + let node = semantic_policy_node(14, NodeKind::FUNCTION, "handler", 1); + let context = semantic_policy_context("src/internal/request.rs", node.id); + + let reason = dense_anchor_reason_for_node( + &context, + &node, + "handler", + Some("src/internal/request.rs"), + "semantic_doc_version: 4\nsymbol: handler\nkind: FUNCTION\n", + Some(AccessKind::Private), + ); + + assert_eq!(reason, None); + } + + #[test] + fn dense_policy_only_embeds_high_signal_central_nodes() { + let ordinary = semantic_policy_node(15, NodeKind::FUNCTION, "ordinary", 1); + let central = semantic_policy_node(16, NodeKind::FUNCTION, "central", 1); + let mut context = semantic_policy_context("src/internal/graph.rs", ordinary.id); + context + .file_paths + .insert(central.id, "src/internal/graph.rs".to_string()); + context.child_labels.insert( + ordinary.id, + ["a", "b", "c", "d"] + .into_iter() + .map(str::to_string) + .collect(), + ); + context.referenced_labels.insert( + central.id, + (0..DENSE_CENTRAL_LABEL_THRESHOLD) + .map(|index| format!("ref_{index}")) + .collect(), + ); + context + .edge_digests + .insert(central.id, vec!["CALL=24".to_string()]); + + assert_eq!( + dense_anchor_reason_for_node( + &context, + &ordinary, + "ordinary", + Some("src/internal/graph.rs"), + "semantic_doc_version: 4\nsymbol: ordinary\nkind: FUNCTION\n", + Some(AccessKind::Private), + ), + None + ); + assert_eq!( + dense_anchor_reason_for_node( + &context, + ¢ral, + "central", + Some("src/internal/graph.rs"), + "semantic_doc_version: 4\nsymbol: central\nkind: FUNCTION\n", + Some(AccessKind::Private), + ), + Some(DenseAnchorReason::CentralGraphNode) + ); + } + + #[test] + fn dense_policy_classifies_public_entrypoint_and_documented_symbols() { + let public_node = semantic_policy_node(11, NodeKind::STRUCT, "ReportBuilder", 1); + let entrypoint_node = semantic_policy_node(12, NodeKind::FUNCTION, "main", 1); + let documented_node = semantic_policy_node(13, NodeKind::METHOD, "parse_config", 1); + let context = semantic_policy_context("src/lib.rs", public_node.id); + + assert_eq!( + dense_anchor_reason_for_node( + &context, + &public_node, + "ReportBuilder", + Some("src/lib.rs"), + "semantic_doc_version: 4\nsymbol: ReportBuilder\nkind: STRUCT\n", + Some(AccessKind::Public), + ), + Some(DenseAnchorReason::PublicApi) + ); + assert_eq!( + dense_anchor_reason_for_node( + &context, + &entrypoint_node, + "main", + Some("src/main.rs"), + "semantic_doc_version: 4\nsymbol: main\nkind: FUNCTION\n", + Some(AccessKind::Private), + ), + Some(DenseAnchorReason::Entrypoint) + ); + assert_eq!( + dense_anchor_reason_for_node( + &context, + &documented_node, + "parse_config", + Some("src/internal/config.rs"), + "semantic_doc_version: 4\ncomments: parses user-visible configuration\nbody_summary: validates and normalizes the configuration before runtime startup\n", + Some(AccessKind::Private), + ), + Some(DenseAnchorReason::DocumentedNontrivial) + ); + } + + #[test] + fn dense_policy_classifies_cross_language_entrypoints_and_surfaces() { + let python_app = semantic_policy_node(21, NodeKind::FUNCTION, "app", 1); + let go_command = semantic_policy_node(22, NodeKind::FUNCTION, "run", 1); + let csharp_program = semantic_policy_node(23, NodeKind::CLASS, "Program", 1); + let java_application = semantic_policy_node(24, NodeKind::CLASS, "Application", 1); + let c_header_api = semantic_policy_node(25, NodeKind::STRUCT, "ClientApi", 1); + let python_package_api = semantic_policy_node(26, NodeKind::CLASS, "PackageClient", 1); + let mut context = SemanticDocGraphContext::default(); + context + .file_paths + .insert(python_app.id, "service/app.py".to_string()); + context + .file_paths + .insert(go_command.id, "cmd/server/main.go".to_string()); + context + .file_paths + .insert(csharp_program.id, "src/Program.cs".to_string()); + context.file_paths.insert( + java_application.id, + "src/main/java/com/acme/Application.java".to_string(), + ); + context + .file_paths + .insert(c_header_api.id, "include/acme/client_api.hpp".to_string()); + context.file_paths.insert( + python_package_api.id, + "packages/acme_sdk/__init__.py".to_string(), + ); + + for (node, display_name, file_path) in [ + (&python_app, "app", "service/app.py"), + (&go_command, "run", "cmd/server/main.go"), + (&csharp_program, "Program", "src/Program.cs"), + ( + &java_application, + "Application", + "src/main/java/com/acme/Application.java", + ), + ] { + assert_eq!( + dense_anchor_reason_for_node( + &context, + node, + display_name, + Some(file_path), + "semantic_doc_version: 4\nsymbol: entrypoint\nkind: FUNCTION\n", + Some(AccessKind::Private), + ), + Some(DenseAnchorReason::Entrypoint), + "{file_path} should classify as an entrypoint" + ); } + + for (node, display_name, file_path) in [ + (&c_header_api, "ClientApi", "include/acme/client_api.hpp"), + ( + &python_package_api, + "PackageClient", + "packages/acme_sdk/__init__.py", + ), + ] { + assert_eq!( + dense_anchor_reason_for_node( + &context, + node, + display_name, + Some(file_path), + "semantic_doc_version: 4\nsymbol: api\nkind: STRUCT\n", + Some(AccessKind::Private), + ), + Some(DenseAnchorReason::PublicApi), + "{file_path} should classify as a public surface" + ); + } + } + + #[test] + fn dense_policy_does_not_embed_plain_public_callables_by_default() { + let node = semantic_policy_node(17, NodeKind::FUNCTION, "plain_public_function", 1); + let context = semantic_policy_context("src/lib.rs", node.id); + + let reason = dense_anchor_reason_for_node( + &context, + &node, + "plain_public_function", + Some("src/lib.rs"), + "semantic_doc_version: 4\nsymbol: plain_public_function\nkind: FUNCTION\n", + Some(AccessKind::Public), + ); + + assert_eq!(reason, None); + } + + #[test] + fn dense_policy_embeds_package_public_callables_for_dynamic_frameworks() { + let node = semantic_policy_node(19, NodeKind::FUNCTION, "handle", 1); + let context = semantic_policy_context("lib/router/index.js", node.id); + + let reason = dense_anchor_reason_for_node( + &context, + &node, + "handle", + Some("lib/router/index.js"), + "semantic_doc_version: 4\nsymbol: handle\nkind: FUNCTION\nsignature: function handle(req, res, next) {}\n", + Some(AccessKind::Private), + ); + + assert_eq!(reason, Some(DenseAnchorReason::PublicApi)); + + let windows_node = semantic_policy_node(29, NodeKind::METHOD, "GET /json", 1); + let windows_path = r"\\?\C:\repo\expressjs-express\lib\response.js"; + let windows_context = semantic_policy_context(windows_path, windows_node.id); + + let windows_reason = dense_anchor_reason_for_node( + &windows_context, + &windows_node, + "GET /json", + Some(windows_path), + "semantic_doc_version: 4\nsymbol: GET /json\nkind: METHOD\nsignature: .get('/json')\n", + Some(AccessKind::Private), + ); + + assert_eq!(windows_reason, Some(DenseAnchorReason::PublicApi)); + } + + #[test] + fn dense_policy_does_not_embed_comment_only_symbols_by_default() { + let node = semantic_policy_node(18, NodeKind::FUNCTION, "commented_helper", 1); + let context = semantic_policy_context("src/internal/helper.rs", node.id); + + let reason = dense_anchor_reason_for_node( + &context, + &node, + "commented_helper", + Some("src/internal/helper.rs"), + "semantic_doc_version: 4\ncomments: explains how helper is used by nearby code\nsignature: fn commented_helper() {}\n", + Some(AccessKind::Private), + ); + + assert_eq!(reason, None); + } + + #[test] + fn component_reports_are_extracted_dense_anchors_with_virtual_ids() { + let node = semantic_policy_node(20, NodeKind::FUNCTION, "central_service", 1); + let mut context = semantic_policy_context("crates/app/src/service.rs", node.id); + context + .edge_digests + .insert(node.id, vec!["CALL=9".to_string()]); + let reports = build_component_report_docs( + &context, + &[&node], + &std::collections::HashMap::new(), + None, + 123, + ); + + assert_eq!(reports.len(), 1); + let report = &reports[0]; + assert!(report.symbol_doc.node_id.0 < 0); + assert_eq!(report.symbol_doc.source_provenance, "extracted"); + assert_eq!(report.symbol_doc.policy_version, SEMANTIC_POLICY_VERSION); + assert!( + report + .symbol_doc + .doc_text + .contains("component_report: crate:app") + ); + assert_eq!( + report.symbol_doc.file_path.as_deref(), + Some("crates/app/src/service.rs") + ); + assert!(report.symbol_doc.doc_text.contains("god_nodes:")); + assert!(report.pending.is_none()); + } + + #[test] + fn component_reports_group_root_level_source_files() { + assert_eq!( + semantic_component_key_for_path(Some("nvm.sh")).as_deref(), + Some("dir:.") + ); + } + + #[test] + fn semantic_graph_context_uses_repo_relative_file_table_paths() { + let temp = tempdir().expect("create temp dir"); + let storage_path = temp.path().join("codestory.db"); + let mut storage = Storage::open(&storage_path).expect("open storage"); + let verbatim_path = PathBuf::from(r"\\?\C:\work\nvm\nvm.sh"); + storage + .insert_file(&FileInfo { + id: 11, + path: verbatim_path.clone(), + language: "bash".to_string(), + modification_time: 1, + indexed: true, + complete: true, + line_count: 12, + file_role: codestory_store::FileRole::Source, + }) + .expect("insert file"); + let file_node = Node { + id: CoreNodeId(11), + kind: NodeKind::FILE, + serialized_name: verbatim_path.to_string_lossy().to_string(), + ..Default::default() + }; + let function_node = Node { + id: CoreNodeId(101), + kind: NodeKind::FUNCTION, + serialized_name: "nvm".to_string(), + file_node_id: Some(CoreNodeId(11)), + start_line: Some(1), + ..Default::default() + }; + storage + .insert_nodes_batch(&[file_node.clone(), function_node.clone()]) + .expect("insert nodes"); + let nodes = vec![file_node, function_node.clone()]; + let semantic_nodes = vec![&function_node]; + let context = + SemanticDocGraphContext::build(&storage, &semantic_nodes, &nodes).expect("context"); + + assert_eq!(context.file_path_for_node(&function_node), Some("nvm.sh")); + assert_eq!( + context.file_read_path_for_node(&function_node), + Some("C:/work/nvm/nvm.sh") + ); + let reports = build_component_report_docs( + &context, + &semantic_nodes, + &std::collections::HashMap::new(), + None, + 123, + ); + assert_eq!(reports.len(), 1); + assert_eq!(reports[0].symbol_doc.file_path.as_deref(), Some("nvm.sh")); + assert!( + reports[0] + .symbol_doc + .doc_text + .contains("component_report: dir:.") + ); + } + + fn semantic_file_text_cache_node( + id: i64, + display_path: &str, + read_path: &Path, + context: &mut SemanticDocGraphContext, + ) -> Node { + let node = Node { + id: CoreNodeId(id), + kind: NodeKind::FUNCTION, + serialized_name: format!("symbol_{id}"), + file_node_id: Some(CoreNodeId(id + 100)), + start_line: Some(1), + ..Default::default() + }; + context.file_paths.insert(node.id, display_path.to_string()); + context + .file_read_paths + .insert(node.id, read_path.to_string_lossy().to_string()); + node + } + + #[test] + fn semantic_file_text_cache_skips_files_above_byte_limit() { + let temp = tempdir().expect("create temp dir"); + let small_path = temp.path().join("small.rs"); + let large_path = temp.path().join("large.rs"); + fs::write(&small_path, "small").expect("write small file"); + fs::write(&large_path, "too-large").expect("write large file"); + let mut context = SemanticDocGraphContext::default(); + let nodes = [ + semantic_file_text_cache_node(1, "small.rs", &small_path, &mut context), + semantic_file_text_cache_node(2, "large.rs", &large_path, &mut context), + ]; + let semantic_nodes = nodes.iter().collect::>(); + + let cache = build_semantic_file_text_cache_with_limits(&context, &semantic_nodes, 5, 100); + + assert_eq!( + cache + .get("small.rs") + .and_then(|contents| contents.as_deref()), + Some("small") + ); + assert_eq!(cache.get("large.rs"), Some(&None)); + } + + #[test] + fn semantic_file_text_cache_respects_aggregate_byte_limit() { + let temp = tempdir().expect("create temp dir"); + let a_path = temp.path().join("a.rs"); + let b_path = temp.path().join("b.rs"); + let c_path = temp.path().join("c.rs"); + fs::write(&a_path, "aaaa").expect("write a file"); + fs::write(&b_path, "bbbb").expect("write b file"); + fs::write(&c_path, "cc").expect("write c file"); + let mut context = SemanticDocGraphContext::default(); + let nodes = [ + semantic_file_text_cache_node(1, "a.rs", &a_path, &mut context), + semantic_file_text_cache_node(2, "b.rs", &b_path, &mut context), + semantic_file_text_cache_node(3, "c.rs", &c_path, &mut context), + ]; + let semantic_nodes = nodes.iter().collect::>(); + + let cache = build_semantic_file_text_cache_with_limits(&context, &semantic_nodes, 100, 7); + + assert_eq!( + cache.get("a.rs").and_then(|contents| contents.as_deref()), + Some("aaaa") + ); + assert_eq!(cache.get("b.rs"), Some(&None)); + assert_eq!(cache.get("c.rs"), Some(&None)); } fn padded_char_cost(docs: &[PendingLlmSymbolDoc], batch_size: usize) -> usize { @@ -10267,6 +11811,48 @@ mod tests { ); } + #[test] + fn semantic_doc_text_keeps_comments_before_long_file_path() { + let _lock = ENV_TEST_LOCK + .lock() + .unwrap_or_else(|poisoned| poisoned.into_inner()); + let _env = EnvGuard::set(SEMANTIC_DOC_ALIAS_MODE_ENV, "current_alias"); + let _budget = EnvGuard::set(SEMANTIC_DOC_MAX_TOKENS_ENV, "128"); + let file_path = r"\\?\C:\Users\alber\AppData\Local\Temp\codestory-search-quality-fixture-with-a-long-path\src\architecture.ts"; + let file_text = r#"// Project source groups create indexing commands and storage access. +export class SourceGroupCxxCdb { + getIndexerCommands() { return []; } +} +"#; + let node = Node { + id: CoreNodeId(10), + kind: NodeKind::CLASS, + serialized_name: "SourceGroupCxxCdb".to_string(), + qualified_name: Some("SourceGroupCxxCdb".to_string()), + file_node_id: Some(CoreNodeId(1)), + start_line: Some(2), + end_line: Some(4), + ..Default::default() + }; + let mut file_text_cache = HashMap::new(); + file_text_cache.insert(file_path.to_string(), Some(file_text.to_string())); + + let doc = build_llm_symbol_doc_text( + &SemanticDocGraphContext::default(), + &node, + "SourceGroupCxxCdb", + Some(file_path), + &file_text_cache, + ); + + assert!( + doc.contains( + "comments: // Project source groups create indexing commands and storage access." + ), + "symbol docs should preserve nearby comments before long file paths consume the token budget:\n{doc}" + ); + } + #[test] fn semantic_doc_text_alias_modes_are_switchable_for_research() { let _lock = ENV_TEST_LOCK @@ -12949,16 +14535,18 @@ fn build_llm_symbol_doc_text() -> String { .run_indexing_blocking_without_runtime_refresh(IndexMode::Incremental) .expect("incremental index"); assert!( - incremental_timings.semantic_docs_embedded.unwrap_or(0) > 0, - "new semantic docs from the touched file should be embedded" - ); - assert!( - incremental_timings - .semantic_docs_embedded - .unwrap_or(u32::MAX) - < clamp_usize_to_u32(before_docs.len()), - "incremental semantic sync should not re-embed untouched files" + incremental_timings.symbol_search_docs_written.unwrap_or(0) > 0, + "new symbols from the touched file should update graph-native symbol docs" ); + if incremental_timings.semantic_docs_embedded.unwrap_or(0) > 0 { + assert!( + incremental_timings + .semantic_docs_embedded + .unwrap_or(u32::MAX) + < clamp_usize_to_u32(before_docs.len()), + "incremental dense sync should not re-embed untouched files" + ); + } assert_eq!( incremental_timings.semantic_docs_stale.unwrap_or(0), 0, @@ -12967,12 +14555,12 @@ fn build_llm_symbol_doc_text() -> String { let docs = Storage::open(&storage_path) .expect("reopen storage") - .get_all_llm_symbol_docs() - .expect("semantic docs after incremental"); + .get_symbol_search_docs_batch_after(None, 10_000) + .expect("symbol docs after incremental"); assert!( docs.iter() .any(|doc| doc.display_name.contains("codestory_added_move_hint")), - "incremental semantic docs should include the new symbol" + "incremental symbol docs should include the new symbol" ); } @@ -13034,11 +14622,15 @@ fn build_llm_symbol_doc_text() -> String { .all(|doc| doc.embedding_dim == 384 && doc.embedding.len() == 384), "incremental repair should leave all stored semantic docs on the current contract" ); + let repaired_symbol_docs = Storage::open(&storage_path) + .expect("open storage after drift repair for symbol docs") + .get_symbol_search_docs_batch_after(None, 10_000) + .expect("symbol docs after drift repair"); assert!( - repaired_docs.iter().any(|doc| doc + repaired_symbol_docs.iter().any(|doc| doc .display_name .contains("codestory_contract_drift_added_hint")), - "incremental repair should still include symbols from the touched file" + "incremental repair should still include symbol docs from the touched file" ); } @@ -13669,16 +15261,37 @@ fn build_llm_symbol_doc_text() -> String { .as_deref(), Some("model-a") ); + let mut seeded_docs = storage + .get_all_llm_symbol_docs() + .expect("initial semantic docs"); + if seeded_docs.len() == 1 { + let mut extra = seeded_docs[0].clone(); + extra.node_id = CoreNodeId(3); + extra.display_name = "beta".to_string(); + extra.qualified_name = Some("pkg::beta".to_string()); + extra.dense_reason = Some("documented_nontrivial".to_string()); + storage + .upsert_llm_symbol_docs_batch(&[extra]) + .expect("seed second dense doc"); + seeded_docs = storage + .get_all_llm_symbol_docs() + .expect("seeded semantic docs"); + } + let mixed_node_id = seeded_docs + .last() + .expect("at least one semantic doc") + .node_id + .0; storage .get_connection() .execute( "UPDATE llm_symbol_doc SET embedding_model = CASE - WHEN node_id = 2 THEN 'model-b' + WHEN node_id = ?1 THEN 'model-b' ELSE embedding_model END", - [], + [mixed_node_id], ) .expect("mark one semantic doc as mixed"); assert_eq!( @@ -14149,8 +15762,8 @@ fn build_llm_symbol_doc_text() -> String { let storage = Storage::open(&storage_path).expect("open storage after initial index"); let initial_docs = storage - .get_all_llm_symbol_docs() - .expect("load initial semantic docs") + .get_symbol_search_docs_batch_after(None, 10_000) + .expect("load initial symbol docs") .into_iter() .filter(|doc| doc.display_name == "build_snapshot_digest") .collect::>(); @@ -14174,8 +15787,8 @@ fn build_llm_symbol_doc_text() -> String { let storage = Storage::open(&storage_path).expect("open storage after rerun"); let updated_docs = storage - .get_all_llm_symbol_docs() - .expect("load updated semantic docs") + .get_symbol_search_docs_batch_after(None, 10_000) + .expect("load updated symbol docs") .into_iter() .filter(|doc| doc.display_name == "build_snapshot_digest") .collect::>(); @@ -14194,7 +15807,7 @@ fn build_llm_symbol_doc_text() -> String { !updated_docs .iter() .any(|doc| doc.doc_text.contains("initial_compressed_digest")), - "full index should rebuild semantic docs instead of reusing stale persisted content" + "full index should rebuild symbol docs instead of reusing stale persisted content" ); } diff --git a/crates/codestory-runtime/src/semantic_doc_text.rs b/crates/codestory-runtime/src/semantic_doc_text.rs index b1be29e7..46b00447 100644 --- a/crates/codestory-runtime/src/semantic_doc_text.rs +++ b/crates/codestory-runtime/src/semantic_doc_text.rs @@ -4,22 +4,7 @@ use std::collections::HashSet; use crate::symbol_query::symbol_query_tokens; pub(crate) fn semantic_doc_language_from_path(path: Option<&str>) -> Option<&'static str> { - let ext = path? - .rsplit('.') - .next()? - .trim_start_matches('.') - .to_ascii_lowercase(); - match ext.as_str() { - "c" => Some("c"), - "cc" | "cpp" | "cxx" | "h" | "hh" | "hpp" | "hxx" => Some("cpp"), - "java" => Some("java"), - "js" | "jsx" | "mjs" | "cjs" => Some("javascript"), - "py" | "pyi" => Some("python"), - "rs" => Some("rust"), - "ts" | "tsx" | "mts" | "cts" => Some("typescript"), - "svelte" => Some("svelte"), - _ => None, - } + codestory_contracts::language_support::language_name_for_path(path) } pub(crate) fn semantic_symbol_role_aliases(kind: NodeKind) -> &'static str { @@ -69,6 +54,9 @@ pub(crate) fn semantic_symbol_aliases( if let Some(alias) = normalized_symbol_alias(candidate) { push_unique_alias(&mut aliases.name_aliases, &mut seen_names, alias); } + for expanded_alias in expanded_symbol_aliases(candidate) { + push_unique_alias(&mut aliases.name_aliases, &mut seen_names, expanded_alias); + } if let Some(terminal) = terminal_symbol_part(candidate) && let Some(alias) = normalized_symbol_alias(terminal) { @@ -76,6 +64,9 @@ pub(crate) fn semantic_symbol_aliases( aliases.terminal_alias = Some(alias.clone()); } push_unique_alias(&mut aliases.name_aliases, &mut seen_names, alias); + for expanded_alias in expanded_symbol_aliases(terminal) { + push_unique_alias(&mut aliases.name_aliases, &mut seen_names, expanded_alias); + } } let owner_parts = owner_symbol_parts(candidate); @@ -1035,25 +1026,25 @@ mod tests { #[test] fn language_from_path_covers_supported_extensions() { let cases = [ - ("a.c", Some("c")), - ("a.cc", Some("cpp")), - ("a.cpp", Some("cpp")), - ("a.cxx", Some("cpp")), - ("a.h", Some("cpp")), - ("a.hpp", Some("cpp")), - ("A.JAVA", Some("java")), - ("a.js", Some("javascript")), - ("a.jsx", Some("javascript")), - ("a.mjs", Some("javascript")), - ("a.cjs", Some("javascript")), - ("a.py", Some("python")), - ("a.pyi", Some("python")), - ("a.rs", Some("rust")), - ("a.ts", Some("typescript")), - ("a.tsx", Some("typescript")), - ("a.mts", Some("typescript")), - ("a.cts", Some("typescript")), - ("App.svelte", Some("svelte")), + ("main.c", Some("c")), + ("main.cpp", Some("cpp")), + ("Main.java", Some("java")), + ("main.js", Some("javascript")), + ("main.py", Some("python")), + ("main.rs", Some("rust")), + ("main.ts", Some("typescript")), + ("main.go", Some("go")), + ("main.rb", Some("ruby")), + ("main.php", Some("php")), + ("Program.cs", Some("csharp")), + ("View.cshtml", None), + ("Main.kt", Some("kotlin")), + ("Main.swift", Some("swift")), + ("main.dart", Some("dart")), + ("script.sh", Some("bash")), + ("index.html", Some("html")), + ("style.css", Some("css")), + ("schema.sql", Some("sql")), ("README.md", None), ]; @@ -1101,6 +1092,22 @@ mod tests { ); } + #[test] + fn symbol_aliases_expand_cpp_cdb_terminal_acronyms() { + let aliases = semantic_symbol_aliases("SourceGroupCxxCdb", Some("SourceGroupCxxCdb")); + + assert!( + aliases + .name_aliases + .contains(&"source group c++ compilation database".to_string()) + ); + assert!( + aliases + .name_aliases + .contains(&"source group c++ compile commands json".to_string()) + ); + } + #[test] fn runtime_concept_phrases_expand_targeted_runtime_terms_only() { assert_eq!( diff --git a/crates/codestory-runtime/src/support.rs b/crates/codestory-runtime/src/support.rs index 6a5ee0ca..b4299164 100644 --- a/crates/codestory-runtime/src/support.rs +++ b/crates/codestory-runtime/src/support.rs @@ -5,8 +5,12 @@ use codestory_contracts::api::{AgentHybridWeightsDto, SearchHybridLimitsDto}; use std::cmp::Ordering; use std::collections::HashMap; use std::collections::HashSet; +use std::io::Read; +use std::path::Path; pub(crate) const HYBRID_RETRIEVAL_ENABLED_ENV: &str = "CODESTORY_HYBRID_RETRIEVAL_ENABLED"; +pub(crate) const SEMANTIC_FILE_TEXT_MAX_BYTES: u64 = 1_000_000; +pub(crate) const SEMANTIC_FILE_TEXT_CACHE_MAX_BYTES: usize = 64 * 1_024 * 1_024; pub(crate) fn hybrid_retrieval_enabled() -> bool { env_flag_enabled(HYBRID_RETRIEVAL_ENABLED_ENV, true) @@ -359,6 +363,25 @@ pub(crate) fn read_searchable_file_contents(path: &str) -> Option { None } +pub(crate) fn read_file_text_limited( + path: &Path, + max_bytes: u64, +) -> std::io::Result> { + let metadata = std::fs::metadata(path)?; + if metadata.len() > max_bytes { + return Ok(None); + } + + let file = std::fs::File::open(path)?; + let mut reader = file.take(max_bytes.saturating_add(1)); + let mut contents = String::new(); + reader.read_to_string(&mut contents)?; + if contents.len() as u64 > max_bytes { + return Ok(None); + } + Ok(Some(contents)) +} + pub(crate) fn aggregate_symbol_matches( primary: Vec<(codestory_contracts::graph::NodeId, f32)>, expanded: Vec<(codestory_contracts::graph::NodeId, f32)>, diff --git a/crates/codestory-runtime/src/symbol_query.rs b/crates/codestory-runtime/src/symbol_query.rs index 466c30d3..0357896b 100644 --- a/crates/codestory-runtime/src/symbol_query.rs +++ b/crates/codestory-runtime/src/symbol_query.rs @@ -258,7 +258,8 @@ fn qualified_symbol_query_parts(query: &str) -> Option<(&str, &str)> { } pub fn retrieval_file_role_from_path(path: &str) -> RetrievalFileRole { - let normalized = normalize_retrieval_path(path); + let normalized_raw = normalize_retrieval_path(path); + let normalized = strip_materialized_repo_cache_prefix(&normalized_raw).to_string(); let marked = format!("/{normalized}"); let file_name = normalized.rsplit('/').next().unwrap_or(normalized.as_str()); @@ -268,6 +269,7 @@ pub fn retrieval_file_role_from_path(path: &str) -> RetrievalFileRole { "/node_modules/", "/src/external/", "/external/", + "/deps/", "/vendor/", "/vendors/", "/third_party/", @@ -356,6 +358,27 @@ fn normalize_retrieval_path(path: &str) -> String { .to_ascii_lowercase() } +fn strip_materialized_repo_cache_prefix(path: &str) -> &str { + let mut best_match: Option<(usize, &str)> = None; + for marker in ["/source/repos/", "source/repos/", "/repos/", "repos/"] { + let Some(index) = path.rfind(marker) else { + continue; + }; + let after_marker = &path[index + marker.len()..]; + if let Some((_, repo_relative)) = after_marker.split_once('/') + && !repo_relative.is_empty() + && best_match + .as_ref() + .is_none_or(|(best_index, _)| index > *best_index) + { + best_match = Some((index, repo_relative)); + } + } + best_match + .map(|(_, repo_relative)| repo_relative) + .unwrap_or(path) +} + fn path_contains_any(path: &str, markers: &[&str]) -> bool { markers.iter().any(|marker| path.contains(marker)) } @@ -1928,6 +1951,20 @@ mod tests { ), RetrievalFileRole::Generated ); + assert_eq!( + retrieval_file_role_from_path( + r"\\?\C:\repo\codestory\target\repo-cache\repos\expressjs-express\lib\response.js" + ), + RetrievalFileRole::Source + ); + assert_eq!( + retrieval_file_role_from_path("target/generated/client.ts"), + RetrievalFileRole::Generated + ); + assert_eq!( + retrieval_file_role_from_path("redis/deps/hiredis/examples/example-ae.c"), + RetrievalFileRole::Vendor + ); } #[test] diff --git a/crates/codestory-runtime/tests/integration.rs b/crates/codestory-runtime/tests/integration.rs index a4207fb3..07caffcf 100644 --- a/crates/codestory-runtime/tests/integration.rs +++ b/crates/codestory-runtime/tests/integration.rs @@ -1,5 +1,6 @@ use codestory_contracts::api::{ - IndexMode, LayoutDirection, OpenProjectRequest, TrailCallerScope, TrailDirection, TrailMode, + IndexMode, LayoutDirection, ListRootSymbolsRequest, OpenProjectRequest, TrailCallerScope, + TrailDirection, TrailMode, }; use codestory_runtime::AppController; use codestory_store::Store; @@ -55,24 +56,18 @@ fn test_cli_app_indexer_smoke() -> anyhow::Result<()> { .unwrap(); assert!(summary.stats.node_count > 0); - // 3. Search for a symbol - let hits = controller - .search(codestory_contracts::api::SearchRequest { - query: "f0".to_string(), - repo_text: codestory_contracts::api::SearchRepoTextMode::Off, - limit_per_source: 10, - expand_search_plan: false, - hybrid_weights: None, - hybrid_limits: None, - }) + // 3. Resolve an indexed symbol through the graph surface. Search is sidecar-primary and + // requires retrieval sidecars, which this lifecycle smoke intentionally does not build. + let symbols = controller + .list_root_symbols(ListRootSymbolsRequest { limit: Some(50) }) .unwrap(); - assert!(!hits.is_empty(), "Search should find f0"); + assert!(!symbols.is_empty(), "Root symbols should include f0"); - let main_id = hits + let main_id = symbols .into_iter() - .find(|h| h.display_name.contains("f0")) + .find(|symbol| symbol.label.contains("f0")) .unwrap() - .node_id; + .id; // 4. Trail query with max_nodes = 10 to force truncation // This is the regression test around truncated trails not emitting fallback node IDs diff --git a/crates/codestory-runtime/tests/retrieval_browser_contracts.rs b/crates/codestory-runtime/tests/retrieval_browser_contracts.rs index 9dfe2870..b1864615 100644 --- a/crates/codestory-runtime/tests/retrieval_browser_contracts.rs +++ b/crates/codestory-runtime/tests/retrieval_browser_contracts.rs @@ -289,9 +289,8 @@ fn assert_mandatory_sidecar_unavailable(error: &ApiError) { details .next_commands .iter() - .any(|command| command.contains("codestory-cli index") - && command.contains("--refresh full")), - "retrieval error should include index recovery command: {error:?}" + .all(|command| !command.contains("codestory-cli index")), + "sidecar retrieval errors should not repeat core index repair commands: {error:?}" ); assert!( details @@ -308,6 +307,14 @@ fn assert_mandatory_sidecar_unavailable(error: &ApiError) { && command.contains("--refresh full")), "retrieval error should include sidecar index recovery command: {error:?}" ); + assert!( + details + .next_commands + .iter() + .any(|command| command.contains("codestory-cli retrieval status") + && command.contains("--format json")), + "retrieval error should include sidecar status proof command: {error:?}" + ); } fn citation_named<'a>(citations: &'a [AgentCitationDto], name: &str) -> &'a AgentCitationDto { diff --git a/crates/codestory-runtime/tests/retrieval_generalization_guard.rs b/crates/codestory-runtime/tests/retrieval_generalization_guard.rs index bc56cbed..aa9fec27 100644 --- a/crates/codestory-runtime/tests/retrieval_generalization_guard.rs +++ b/crates/codestory-runtime/tests/retrieval_generalization_guard.rs @@ -2,8 +2,11 @@ use std::path::{Path, PathBuf}; use std::process::{Command, Output}; +use std::sync::{Mutex, OnceLock}; use tempfile::TempDir; +static LINT_SCRIPT_LOCK: OnceLock> = OnceLock::new(); + fn production_source(contents: &str) -> &str { match contents.find("#[cfg(test)]") { Some(marker) => &contents[..marker], @@ -67,6 +70,10 @@ fn lint_script(repo_root: &Path) -> PathBuf { } fn run_lint_with_extra_root(repo_root: &Path, script: &Path, extra_root: &Path) -> Output { + let _guard = LINT_SCRIPT_LOCK + .get_or_init(|| Mutex::new(())) + .lock() + .expect("lock lint script subprocess"); Command::new("node") .arg(script) .current_dir(repo_root) @@ -101,6 +108,10 @@ fn retrieval_generalization_lint_script_exits_clean_when_dirs_absent() { let repo_root = workspace_root(); let script = lint_script(&repo_root); + let _guard = LINT_SCRIPT_LOCK + .get_or_init(|| Mutex::new(())) + .lock() + .expect("lock lint script subprocess"); let status = Command::new("node") .arg(&script) .current_dir(&repo_root) @@ -192,6 +203,87 @@ pub fn leaked_production_path() -> &'static str { ); } +#[test] +fn linter_catches_current_holdout_literals_in_production() { + let output = run_lint_with_fixture( + r#" +pub fn leaked_holdout_probe() -> &'static [&'static str] { + &[ + "axios", + "redis", + "ripgrep", + "dispatchRequest", + "readQueryFromClient", + "HiArgs", + "server.c", + "core/main.rs", + "haystack.rs", + ] +} +"#, + ); + let stderr = String::from_utf8_lossy(&output.stderr); + assert!( + !output.status.success(), + "fixture with current holdout literals should fail lint; stderr={stderr}" + ); + for expected in ["dispatchRequest", "readQueryFromClient", "core/main.rs"] { + assert!( + stderr.contains(expected), + "lint failure should report current holdout literal {expected}, stderr={stderr}" + ); + } +} + +#[test] +fn linter_catches_split_benchmark_family_literals_in_production() { + let output = run_lint_with_fixture( + r##" +pub fn leaked_split_family_markers() -> Vec { + vec![ + ["use", "s", "wr"].concat(), + ["string", "utils"].concat(), + ["charsequence", "utils"].concat(), + ["source/animate", ".css"].concat(), + [ + "s", + "wr", + ].concat(), + [ + "auto", + "mapper", + ].concat(), + [ + r#"s"#, + r#"wr"#, + ].concat(), + [ + r#"string"#, + r#"utils"#, + ].concat(), + ] +} +"##, + ); + let stderr = String::from_utf8_lossy(&output.stderr); + assert!( + !output.status.success(), + "split benchmark-family literals should fail lint; stderr={stderr}" + ); + for expected in [ + "swr", + "useswr", + "stringutils", + "automapper", + "sourceanimatecss", + ] { + assert!( + stderr.to_ascii_lowercase().contains(expected), + "lint failure should report compact benchmark marker {expected}; stderr={stderr}" + ); + } +} + #[test] fn linter_masks_preceding_attrs_for_cfg_test_items() { let output = run_lint_with_fixture( diff --git a/crates/codestory-store/Cargo.toml b/crates/codestory-store/Cargo.toml index 531a4d7b..6622c336 100644 --- a/crates/codestory-store/Cargo.toml +++ b/crates/codestory-store/Cargo.toml @@ -1,6 +1,6 @@ [package] name = "codestory-store" -version = "0.6.2" +version = "0.7.0" edition = "2024" [dependencies] diff --git a/crates/codestory-store/src/lib.rs b/crates/codestory-store/src/lib.rs index 1bf7f51c..f74f4138 100644 --- a/crates/codestory-store/src/lib.rs +++ b/crates/codestory-store/src/lib.rs @@ -14,12 +14,12 @@ pub use snapshot_store::{ SnapshotRefreshStats, SnapshotStore, StagedSnapshot, StagedSnapshotFinalizeStats, }; pub use storage_impl::{ - CallerProjectionRemovalSummary, FileInfo, FileProjectionRemovalSummary, FileRole, - GroundingEdgeKindCount, GroundingFileSummary, GroundingNodeRecord, GroundingSnapshotMetadata, - GroundingSnapshotState, LlmSymbolDoc, LlmSymbolDocReuseMetadata, LlmSymbolDocStats, - ProjectionFlushBreakdown, RetrievalIndexManifest, SearchSymbolProjection, + CallerProjectionRemovalSummary, DenseReasonCounts, FileInfo, FileProjectionRemovalSummary, + FileRole, GroundingEdgeKindCount, GroundingFileSummary, GroundingNodeRecord, + GroundingSnapshotMetadata, GroundingSnapshotState, LlmSymbolDoc, LlmSymbolDocReuseMetadata, + LlmSymbolDocStats, ProjectionFlushBreakdown, RetrievalIndexManifest, SearchSymbolProjection, SearchSymbolProjectionDetail, Storage as Store, StorageError, StorageOpenMode, StorageStats, - SymbolSummaryRecord, + SymbolSearchDoc, SymbolSummaryRecord, }; pub use trail_store::TrailStore; diff --git a/crates/codestory-store/src/storage_impl/mod.rs b/crates/codestory-store/src/storage_impl/mod.rs index 1677c15d..8eac2e79 100644 --- a/crates/codestory-store/src/storage_impl/mod.rs +++ b/crates/codestory-store/src/storage_impl/mod.rs @@ -26,7 +26,7 @@ use helpers::{ numbered_placeholders, question_placeholders, serialize_candidate_targets, }; -const SCHEMA_VERSION: u32 = 17; +const SCHEMA_VERSION: u32 = 18; const GROUNDING_SNAPSHOT_VERSION: i64 = 1; const GROUNDING_SNAPSHOT_STATE_DIRTY: i64 = 0; const GROUNDING_SNAPSHOT_STATE_BUILDING: i64 = 1; @@ -362,10 +362,26 @@ impl FileRole { } pub fn classify_path(path: &Path) -> Self { - let normalized = path + let mut normalized = path .to_string_lossy() .replace('\\', "/") .to_ascii_lowercase(); + let mut best_repo_relative: Option<(usize, String)> = None; + for marker in ["/source/repos/", "source/repos/", "/repos/", "repos/"] { + if let Some(index) = normalized.rfind(marker) { + let remainder = &normalized[index + marker.len()..]; + if let Some((_, repo_relative)) = remainder.split_once('/') + && best_repo_relative + .as_ref() + .is_none_or(|(best_index, _)| index > *best_index) + { + best_repo_relative = Some((index, repo_relative.to_string())); + } + } + } + if let Some((_, repo_relative)) = best_repo_relative { + normalized = repo_relative; + } let marked = format!("/{normalized}"); let file_name = normalized.rsplit('/').next().unwrap_or(normalized.as_str()); @@ -447,6 +463,7 @@ pub struct StorageStats { pub edge_count: i64, pub file_count: i64, pub error_count: i64, + pub fatal_error_count: i64, } fn is_framework_synthetic_node(node: &Node) -> bool { @@ -705,6 +722,10 @@ pub struct LlmSymbolDoc { pub embedding_dim: u32, #[serde(default, skip_serializing_if = "Option::is_none")] pub doc_shape: Option, + #[serde(default, skip_serializing_if = "Option::is_none")] + pub semantic_policy_version: Option, + #[serde(default, skip_serializing_if = "Option::is_none")] + pub dense_reason: Option, pub embedding: Vec, pub updated_at_epoch_ms: i64, } @@ -719,6 +740,8 @@ pub struct LlmSymbolDocReuseMetadata { pub embedding_backend: Option, pub embedding_dim: u32, pub doc_shape: Option, + pub semantic_policy_version: Option, + pub dense_reason: Option, } #[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)] @@ -732,12 +755,41 @@ pub struct LlmSymbolDocStats { pub embedding_dim: Option, pub doc_version: Option, pub doc_shape: Option, + pub semantic_policy_version: Option, pub mixed_embedding_profiles: bool, pub mixed_embedding_models: bool, pub mixed_embedding_backends: bool, pub mixed_dimensions: bool, pub mixed_doc_versions: bool, pub mixed_doc_shapes: bool, + pub mixed_semantic_policy_versions: bool, +} + +#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)] +pub struct DenseReasonCounts { + pub public_api: u32, + pub entrypoint: u32, + pub documented_nontrivial: u32, + pub central_graph_node: u32, + pub component_report: u32, + pub unstructured_doc: u32, +} + +#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)] +pub struct SymbolSearchDoc { + pub node_id: NodeId, + pub file_node_id: Option, + pub kind: NodeKind, + pub display_name: String, + pub qualified_name: Option, + pub file_path: Option, + pub start_line: Option, + pub doc_text: String, + pub doc_version: u32, + pub doc_hash: String, + pub policy_version: String, + pub source_provenance: String, + pub updated_at_epoch_ms: i64, } #[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)] @@ -1492,6 +1544,7 @@ impl Storage { tx.execute("DELETE FROM occurrence", [])?; tx.execute("DELETE FROM edge", [])?; tx.execute("DELETE FROM llm_symbol_doc", [])?; + tx.execute("DELETE FROM symbol_search_doc", [])?; tx.execute("DELETE FROM symbol_summary", [])?; tx.execute("DELETE FROM search_symbol_projection", [])?; tx.execute("DELETE FROM component_access", [])?; @@ -1607,9 +1660,8 @@ impl Storage { cleanup_sqlite_sidecars(staged_path) } - fn init(&self, mode: StorageOpenMode) -> Result<(), StorageError> { + fn init(&self, _mode: StorageOpenMode) -> Result<(), StorageError> { self.create_tables()?; - self.create_indexes(mode)?; if self.schema_version()? == 0 { self.set_schema_version(SCHEMA_VERSION)?; } @@ -1620,10 +1672,6 @@ impl Storage { schema::create_tables(&self.conn) } - fn create_indexes(&self, mode: StorageOpenMode) -> Result<(), StorageError> { - schema::create_indexes(&self.conn, mode) - } - fn schema_version(&self) -> Result { let version: i64 = self .conn @@ -1906,6 +1954,87 @@ impl Storage { Ok(()) } + pub fn upsert_retrieval_artifact_nodes_batch( + &mut self, + nodes: &[Node], + ) -> Result<(), StorageError> { + let prepared_nodes = self.prepared_nodes_for_insert(nodes)?; + let tx = self.conn.transaction()?; + { + let mut stmt = tx.prepare( + "INSERT INTO node (id, kind, serialized_name, qualified_name, canonical_id, file_node_id, start_line, start_col, end_line, end_col) + VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, ?9, ?10) + ON CONFLICT(id) DO UPDATE SET + kind = excluded.kind, + serialized_name = excluded.serialized_name, + qualified_name = excluded.qualified_name, + canonical_id = excluded.canonical_id, + file_node_id = excluded.file_node_id, + start_line = excluded.start_line, + start_col = excluded.start_col, + end_line = excluded.end_line, + end_col = excluded.end_col", + )?; + for node in &prepared_nodes { + Self::insert_node_with_stmt(&mut stmt, node)?; + } + } + tx.commit()?; + + let mut cache = self.cache.nodes.write(); + for node in &prepared_nodes { + cache.insert(node.id, node.clone()); + } + + Ok(()) + } + + pub fn copy_retrieval_artifact_nodes_from( + &mut self, + source_path: &Path, + ) -> Result { + if !source_path.exists() { + return Ok(0); + } + drop(Storage::open(source_path)?); + let source = source_path.to_string_lossy().to_string(); + self.conn + .execute("ATTACH DATABASE ?1 AS source_snapshot", params![source])?; + let copy_result = self.conn.execute( + "INSERT OR REPLACE INTO node ( + id, + kind, + serialized_name, + qualified_name, + canonical_id, + file_node_id, + start_line, + start_col, + end_line, + end_col + ) + SELECT + source_node.id, + source_node.kind, + source_node.serialized_name, + source_node.qualified_name, + source_node.canonical_id, + source_node.file_node_id, + source_node.start_line, + source_node.start_col, + source_node.end_line, + source_node.end_col + FROM source_snapshot.node source_node + WHERE source_node.serialized_name LIKE 'component_report:%' + OR source_node.canonical_id LIKE 'codestory:component_report:%'", + [], + ); + let detach_result = self.conn.execute("DETACH DATABASE source_snapshot", []); + let copied = copy_result?; + detach_result?; + Ok(copied) + } + pub fn insert_edges_batch(&mut self, edges: &[Edge]) -> Result<(), StorageError> { let tx = self.conn.transaction()?; { @@ -2219,6 +2348,10 @@ impl Storage { } else { self.prepared_nodes_for_insert_with_files(batch.nodes, batch.files)? }; + let pending_node_labels = prepared_nodes + .iter() + .map(|node| (node.id, format!("{:?}:{}", node.kind, node.serialized_name))) + .collect::>(); let nodes_prepare_ms = clamp_i64_to_u32(nodes_prepare_started.elapsed().as_millis() as i64); let tx = self.conn.transaction()?; @@ -2281,7 +2414,12 @@ impl Storage { .filter(|node| node.kind != NodeKind::FILE), ) { - Self::insert_node_with_stmt(&mut stmt, node)?; + Self::insert_node_with_stmt(&mut stmt, node).map_err(|err| { + StorageError::Other(format!( + "flush_projection_batch node insert failed for id={} kind={:?} name={} file_node_id={:?}: {err}", + node.id.0, node.kind, node.serialized_name, node.file_node_id.map(|id| id.0) + )) + })?; } breakdown.nodes_ms = nodes_prepare_ms.saturating_add(clamp_i64_to_u32( nodes_insert_started.elapsed().as_millis() as i64, @@ -2308,7 +2446,34 @@ impl Storage { edge.callsite_identity.as_deref(), row_mapping::certainty_db_value(edge.certainty), serialize_candidate_targets(&edge.candidate_targets)? - ])?; + ]) + .map_err(|err| { + let source_label = pending_node_labels + .get(&edge.source) + .map(String::as_str) + .unwrap_or(""); + let target_label = pending_node_labels + .get(&edge.target) + .map(String::as_str) + .unwrap_or(""); + let file_label = edge + .file_node_id + .and_then(|id| pending_node_labels.get(&id).map(String::as_str)) + .unwrap_or(""); + StorageError::Other(format!( + "flush_projection_batch edge insert failed for id={} kind={:?} source={} ({}) target={} ({}) file_node_id={:?} ({}) resolved_source={:?} resolved_target={:?}: {err}", + edge.id.0, + edge.kind, + edge.source.0, + source_label, + edge.target.0, + target_label, + edge.file_node_id.map(|id| id.0), + file_label, + edge.resolved_source.map(|id| id.0), + edge.resolved_target.map(|id| id.0) + )) + })?; } breakdown.edges_ms = clamp_i64_to_u32(started.elapsed().as_millis() as i64); } @@ -2328,7 +2493,19 @@ impl Storage { occ.location.start_col, occ.location.end_line, occ.location.end_col, - ])?; + ]) + .map_err(|err| { + StorageError::Other(format!( + "flush_projection_batch occurrence insert failed for element_id={} kind={:?} file_node_id={} range={}:{}-{}:{}: {err}", + occ.element_id, + occ.kind, + occ.location.file_node_id.0, + occ.location.start_line, + occ.location.start_col, + occ.location.end_line, + occ.location.end_col + )) + })?; } breakdown.occurrences_ms = clamp_i64_to_u32(started.elapsed().as_millis() as i64); } @@ -2344,7 +2521,13 @@ impl Storage { stmt.execute(params![ node_id.0, row_mapping::access_kind_db_value(*access), - ])?; + ]) + .map_err(|err| { + StorageError::Other(format!( + "flush_projection_batch component_access insert failed for node_id={} access={:?}: {err}", + node_id.0, access + )) + })?; } breakdown.component_access_ms = clamp_i64_to_u32(started.elapsed().as_millis() as i64); } @@ -2371,7 +2554,17 @@ impl Storage { state.body_hash, state.start_line, state.end_line, - ])?; + ]) + .map_err(|err| { + StorageError::Other(format!( + "flush_projection_batch callable_projection_state insert failed for file_id={} node_id={} symbol_key={} range={}-{}: {err}", + state.file_id, + state.node_id.0, + state.symbol_key, + state.start_line, + state.end_line + )) + })?; } breakdown.callable_projection_ms = clamp_i64_to_u32(started.elapsed().as_millis() as i64); @@ -2613,6 +2806,290 @@ impl Storage { Ok(clamp_i64_to_u32(count)) } + pub fn upsert_symbol_search_docs_batch( + &mut self, + docs: &[SymbolSearchDoc], + ) -> Result<(), StorageError> { + if docs.is_empty() { + return Ok(()); + } + + let tx = self.conn.transaction()?; + { + let mut stmt = tx.prepare( + "INSERT INTO symbol_search_doc ( + node_id, + file_node_id, + kind, + display_name, + qualified_name, + file_path, + start_line, + doc_text, + doc_version, + doc_hash, + policy_version, + source_provenance, + updated_at_epoch_ms + ) VALUES ( + ?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, ?9, ?10, ?11, ?12, ?13 + ) + ON CONFLICT(node_id) DO UPDATE SET + file_node_id = excluded.file_node_id, + kind = excluded.kind, + display_name = excluded.display_name, + qualified_name = excluded.qualified_name, + file_path = excluded.file_path, + start_line = excluded.start_line, + doc_text = excluded.doc_text, + doc_version = excluded.doc_version, + doc_hash = excluded.doc_hash, + policy_version = excluded.policy_version, + source_provenance = excluded.source_provenance, + updated_at_epoch_ms = excluded.updated_at_epoch_ms", + )?; + for doc in docs { + stmt.execute(params![ + doc.node_id.0, + doc.file_node_id.map(|id| id.0), + doc.kind as i32, + doc.display_name, + doc.qualified_name, + doc.file_path, + doc.start_line, + doc.doc_text, + doc.doc_version as i64, + doc.doc_hash, + doc.policy_version, + doc.source_provenance, + doc.updated_at_epoch_ms, + ])?; + } + } + tx.commit()?; + Ok(()) + } + + pub fn get_symbol_search_docs_batch_after( + &self, + after_node_id: Option, + limit: usize, + ) -> Result, StorageError> { + let mut stmt = self.conn.prepare( + "SELECT + node_id, + file_node_id, + kind, + display_name, + qualified_name, + file_path, + start_line, + doc_text, + doc_version, + doc_hash, + policy_version, + source_provenance, + updated_at_epoch_ms + FROM symbol_search_doc + WHERE (?1 IS NULL OR node_id > ?1) + ORDER BY node_id ASC + LIMIT ?2", + )?; + let after_node_id = after_node_id.map(|id| id.0); + let limit = limit.min(i64::MAX as usize) as i64; + let mut rows = stmt.query(params![after_node_id, limit])?; + let mut docs = Vec::new(); + while let Some(row) = rows.next()? { + let kind: i32 = row.get(2)?; + let doc_version: i64 = row.get(8)?; + docs.push(SymbolSearchDoc { + node_id: NodeId(row.get(0)?), + file_node_id: row.get::<_, Option>(1)?.map(NodeId), + kind: NodeKind::try_from(kind)?, + display_name: row.get(3)?, + qualified_name: row.get(4)?, + file_path: row.get(5)?, + start_line: row.get(6)?, + doc_text: row.get(7)?, + doc_version: doc_version.max(0).min(u32::MAX as i64) as u32, + doc_hash: row.get(9)?, + policy_version: row.get(10)?, + source_provenance: row.get(11)?, + updated_at_epoch_ms: row.get(12)?, + }); + } + Ok(docs) + } + + pub fn get_symbol_search_doc_count(&self) -> Result { + let count = self + .conn + .query_row("SELECT COUNT(*) FROM symbol_search_doc", [], |row| { + row.get::<_, i64>(0) + })?; + Ok(clamp_i64_to_u32(count)) + } + + pub fn clear_symbol_search_docs(&mut self) -> Result { + let removed = self.conn.execute("DELETE FROM symbol_search_doc", [])?; + Ok(removed) + } + + pub fn copy_symbol_search_docs_from( + &mut self, + source_path: &Path, + ) -> Result { + if !source_path.exists() { + return Ok(0); + } + drop(Storage::open(source_path)?); + let source = source_path.to_string_lossy().to_string(); + self.conn + .execute("ATTACH DATABASE ?1 AS source_snapshot", params![source])?; + let copy_result = self.conn.execute( + "INSERT OR REPLACE INTO symbol_search_doc ( + node_id, + file_node_id, + kind, + display_name, + qualified_name, + file_path, + start_line, + doc_text, + doc_version, + doc_hash, + policy_version, + source_provenance, + updated_at_epoch_ms + ) + SELECT + source_doc.node_id, + source_doc.file_node_id, + source_doc.kind, + source_doc.display_name, + source_doc.qualified_name, + source_doc.file_path, + source_doc.start_line, + source_doc.doc_text, + source_doc.doc_version, + source_doc.doc_hash, + source_doc.policy_version, + source_doc.source_provenance, + source_doc.updated_at_epoch_ms + FROM source_snapshot.symbol_search_doc source_doc + WHERE EXISTS ( + SELECT 1 FROM node WHERE node.id = source_doc.node_id + ) + AND ( + source_doc.file_node_id IS NULL + OR EXISTS ( + SELECT 1 FROM node WHERE node.id = source_doc.file_node_id + ) + )", + [], + ); + let detach_result = self.conn.execute("DETACH DATABASE source_snapshot", []); + let copied = copy_result?; + detach_result?; + Ok(copied) + } + + pub fn prune_symbol_search_docs_to_node_ids( + &mut self, + keep_node_ids: &[NodeId], + ) -> Result { + if keep_node_ids.is_empty() { + return self.clear_symbol_search_docs(); + } + + let tx = self.conn.transaction()?; + tx.execute( + "CREATE TEMP TABLE IF NOT EXISTS symbol_search_doc_keep ( + node_id INTEGER PRIMARY KEY + )", + [], + )?; + tx.execute("DELETE FROM temp.symbol_search_doc_keep", [])?; + { + let mut stmt = tx.prepare( + "INSERT OR IGNORE INTO temp.symbol_search_doc_keep (node_id) VALUES (?1)", + )?; + for node_id in keep_node_ids { + stmt.execute(params![node_id.0])?; + } + } + let removed = tx.execute( + "DELETE FROM symbol_search_doc + WHERE NOT EXISTS ( + SELECT 1 + FROM temp.symbol_search_doc_keep keep + WHERE keep.node_id = symbol_search_doc.node_id + )", + [], + )?; + tx.execute("DROP TABLE temp.symbol_search_doc_keep", [])?; + tx.commit()?; + Ok(removed) + } + + pub fn delete_symbol_search_docs_for_files_except_node_ids( + &mut self, + file_node_ids: &[NodeId], + keep_node_ids: &[NodeId], + ) -> Result { + if file_node_ids.is_empty() { + return Ok(0); + } + + let tx = self.conn.transaction()?; + tx.execute( + "CREATE TEMP TABLE IF NOT EXISTS symbol_search_doc_scope ( + file_node_id INTEGER PRIMARY KEY + )", + [], + )?; + tx.execute( + "CREATE TEMP TABLE IF NOT EXISTS symbol_search_doc_keep ( + node_id INTEGER PRIMARY KEY + )", + [], + )?; + tx.execute("DELETE FROM temp.symbol_search_doc_scope", [])?; + tx.execute("DELETE FROM temp.symbol_search_doc_keep", [])?; + { + let mut stmt = tx.prepare( + "INSERT OR IGNORE INTO temp.symbol_search_doc_scope (file_node_id) VALUES (?1)", + )?; + for file_node_id in file_node_ids { + stmt.execute(params![file_node_id.0])?; + } + } + { + let mut stmt = tx.prepare( + "INSERT OR IGNORE INTO temp.symbol_search_doc_keep (node_id) VALUES (?1)", + )?; + for node_id in keep_node_ids { + stmt.execute(params![node_id.0])?; + } + } + let removed = tx.execute( + "DELETE FROM symbol_search_doc + WHERE file_node_id IN ( + SELECT file_node_id FROM temp.symbol_search_doc_scope + ) + AND NOT EXISTS ( + SELECT 1 + FROM temp.symbol_search_doc_keep keep + WHERE keep.node_id = symbol_search_doc.node_id + )", + [], + )?; + tx.execute("DROP TABLE temp.symbol_search_doc_scope", [])?; + tx.execute("DROP TABLE temp.symbol_search_doc_keep", [])?; + tx.commit()?; + Ok(removed) + } + pub fn max_indexed_file_modification_time(&self) -> Result, StorageError> { self.conn .query_row( @@ -2726,10 +3203,12 @@ impl Storage { embedding_backend, embedding_dim, doc_shape, + semantic_policy_version, + dense_reason, embedding_blob, updated_at_epoch_ms ) VALUES ( - ?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, ?9, ?10, ?11, ?12, ?13, ?14, ?15, ?16, ?17 + ?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, ?9, ?10, ?11, ?12, ?13, ?14, ?15, ?16, ?17, ?18, ?19 ) ON CONFLICT(node_id) DO UPDATE SET file_node_id = excluded.file_node_id, @@ -2746,6 +3225,8 @@ impl Storage { embedding_backend = excluded.embedding_backend, embedding_dim = excluded.embedding_dim, doc_shape = excluded.doc_shape, + semantic_policy_version = excluded.semantic_policy_version, + dense_reason = excluded.dense_reason, embedding_blob = excluded.embedding_blob, updated_at_epoch_ms = excluded.updated_at_epoch_ms", )?; @@ -2767,6 +3248,8 @@ impl Storage { doc.embedding_backend, doc.embedding_dim as i64, doc.doc_shape, + doc.semantic_policy_version, + doc.dense_reason, encode_embedding_blob(&doc.embedding), doc.updated_at_epoch_ms, ])?; @@ -2801,6 +3284,8 @@ impl Storage { embedding_backend, embedding_dim, doc_shape, + semantic_policy_version, + dense_reason, embedding_blob, updated_at_epoch_ms FROM llm_symbol_doc @@ -2816,7 +3301,7 @@ impl Storage { let kind: i32 = row.get(2)?; let doc_version: i64 = row.get(8)?; let embedding_dim: i64 = row.get(13)?; - let embedding_blob: Vec = row.get(15)?; + let embedding_blob: Vec = row.get(17)?; docs.push(LlmSymbolDoc { node_id: NodeId(row.get(0)?), file_node_id: row.get::<_, Option>(1)?.map(NodeId), @@ -2833,8 +3318,10 @@ impl Storage { embedding_backend: row.get(12)?, embedding_dim: embedding_dim.max(0) as u32, doc_shape: row.get(14)?, + semantic_policy_version: row.get(15)?, + dense_reason: row.get(16)?, embedding: decode_embedding_blob(&embedding_blob)?, - updated_at_epoch_ms: row.get(16)?, + updated_at_epoch_ms: row.get(18)?, }); } @@ -2862,6 +3349,9 @@ impl Storage { min_shape, max_shape, shape_count, + min_policy, + max_policy, + policy_count, ) = self.conn.query_row( "SELECT COUNT(*), @@ -2882,7 +3372,10 @@ impl Storage { COUNT(doc_version), MIN(doc_shape), MAX(doc_shape), - COUNT(doc_shape) + COUNT(doc_shape), + MIN(semantic_policy_version), + MAX(semantic_policy_version), + COUNT(semantic_policy_version) FROM llm_symbol_doc", [], |row| { @@ -2906,6 +3399,9 @@ impl Storage { row.get::<_, Option>(16)?, row.get::<_, Option>(17)?, row.get::<_, i64>(18)?, + row.get::<_, Option>(19)?, + row.get::<_, Option>(20)?, + row.get::<_, i64>(21)?, )) }, )?; @@ -2917,6 +3413,8 @@ impl Storage { uniform_optional_string_with_count(doc_count, backend_count, min_backend, max_backend); let (doc_shape, mixed_doc_shapes) = uniform_optional_string_with_count(doc_count, shape_count, min_shape, max_shape); + let (semantic_policy_version, mixed_semantic_policy_versions) = + uniform_optional_string_with_count(doc_count, policy_count, min_policy, max_policy); let (embedding_dim, mixed_dimensions) = uniform_optional_u32_with_count(doc_count, dim_count, min_dim, max_dim); let (doc_version, mixed_doc_versions) = @@ -2930,12 +3428,14 @@ impl Storage { embedding_dim, doc_version, doc_shape, + semantic_policy_version, mixed_embedding_profiles, mixed_embedding_models, mixed_embedding_backends, mixed_dimensions, mixed_doc_versions, mixed_doc_shapes, + mixed_semantic_policy_versions, }) } @@ -3059,7 +3559,9 @@ impl Storage { embedding_model, embedding_backend, embedding_dim, - doc_shape + doc_shape, + semantic_policy_version, + dense_reason FROM llm_symbol_doc ORDER BY node_id ASC", )?; @@ -3077,6 +3579,8 @@ impl Storage { embedding_backend: row.get(5)?, embedding_dim: embedding_dim.max(0).min(u32::MAX as i64) as u32, doc_shape: row.get(7)?, + semantic_policy_version: row.get(8)?, + dense_reason: row.get(9)?, }); } Ok(docs) @@ -3104,6 +3608,8 @@ impl Storage { embedding_backend, embedding_dim, doc_shape, + semantic_policy_version, + dense_reason, embedding_blob, updated_at_epoch_ms FROM llm_symbol_doc @@ -3119,7 +3625,7 @@ impl Storage { let kind: i32 = row.get(2)?; let doc_version: i64 = row.get(8)?; let embedding_dim: i64 = row.get(13)?; - let embedding_blob: Vec = row.get(15)?; + let embedding_blob: Vec = row.get(17)?; docs.push(LlmSymbolDoc { node_id: NodeId(row.get(0)?), file_node_id: row.get::<_, Option>(1)?.map(NodeId), @@ -3136,8 +3642,10 @@ impl Storage { embedding_backend: row.get(12)?, embedding_dim: embedding_dim.max(0) as u32, doc_shape: row.get(14)?, + semantic_policy_version: row.get(15)?, + dense_reason: row.get(16)?, embedding: decode_embedding_blob(&embedding_blob)?, - updated_at_epoch_ms: row.get(16)?, + updated_at_epoch_ms: row.get(18)?, }); } Ok(docs) @@ -3152,6 +3660,7 @@ impl Storage { if !source_path.exists() { return Ok(0); } + drop(Storage::open(source_path)?); let source = source_path.to_string_lossy().to_string(); self.conn .execute("ATTACH DATABASE ?1 AS source_snapshot", params![source])?; @@ -3172,6 +3681,8 @@ impl Storage { embedding_backend, embedding_dim, doc_shape, + semantic_policy_version, + dense_reason, embedding_blob, updated_at_epoch_ms ) @@ -3191,6 +3702,8 @@ impl Storage { source_doc.embedding_backend, source_doc.embedding_dim, source_doc.doc_shape, + source_doc.semantic_policy_version, + source_doc.dense_reason, source_doc.embedding_blob, source_doc.updated_at_epoch_ms FROM source_snapshot.llm_symbol_doc source_doc @@ -3316,6 +3829,17 @@ impl Storage { Ok(removed) } + pub fn delete_symbol_search_docs_for_file( + &mut self, + file_node_id: NodeId, + ) -> Result { + let removed = self.conn.execute( + "DELETE FROM symbol_search_doc WHERE file_node_id = ?1", + params![file_node_id.0], + )?; + Ok(removed) + } + pub fn get_occurrences(&self) -> Result, StorageError> { let mut stmt = self.conn.prepare( "SELECT element_id, kind, file_node_id, start_line, start_col, end_line, end_col FROM occurrence" @@ -4400,6 +4924,7 @@ impl Storage { } pub fn get_stats(&self) -> Result { + let fatal_error_count = self.fatal_error_count()?; if self.has_ready_grounding_summary_snapshots()? { let mut stmt = self.conn.prepare( "SELECT node_count, edge_count, file_count, error_count @@ -4413,6 +4938,7 @@ impl Storage { edge_count: row.get(1)?, file_count: row.get(2)?, error_count: row.get(3)?, + fatal_error_count, }); } } @@ -4433,9 +4959,18 @@ impl Storage { edge_count, file_count, error_count, + fatal_error_count, }) } + fn fatal_error_count(&self) -> Result { + self.conn + .query_row("SELECT count(*) FROM error WHERE fatal = 1", [], |r| { + r.get(0) + }) + .map_err(StorageError::from) + } + /// Delete all graph/search projection data linked to one canonical file node. pub fn delete_file_projection( &mut self, @@ -4546,6 +5081,14 @@ impl Storage { ), params![file_node_id], )?; + tx.execute( + &format!( + "DELETE FROM symbol_search_doc + WHERE node_id IN (SELECT node_id FROM {RELATED_NODE_IDS_TABLE}) + OR file_node_id = ?1" + ), + params![file_node_id], + )?; tx.execute( &format!( "DELETE FROM search_symbol_projection diff --git a/crates/codestory-store/src/storage_impl/retrieval_manifest.rs b/crates/codestory-store/src/storage_impl/retrieval_manifest.rs index 32bd27f1..b2f45e45 100644 --- a/crates/codestory-store/src/storage_impl/retrieval_manifest.rs +++ b/crates/codestory-store/src/storage_impl/retrieval_manifest.rs @@ -21,6 +21,13 @@ pub struct RetrievalIndexManifest { pub sidecar_generation: Option, /// Number of symbol projection rows included in the sidecar input hash. pub projection_count: Option, + /// Number of graph-native symbol-search docs included in the sidecar input hash. + pub symbol_doc_count: Option, + /// Number of dense semantic anchors included in Qdrant. + pub dense_projection_count: Option, + pub semantic_policy_version: Option, + pub graph_artifact_hash: Option, + pub dense_reason_counts_json: Option, } impl Storage { @@ -42,8 +49,13 @@ impl Storage { sidecar_schema_version, sidecar_input_hash, sidecar_generation, - projection_count - ) VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, ?9, ?10, ?11, ?12, ?13) + projection_count, + symbol_doc_count, + dense_projection_count, + semantic_policy_version, + graph_artifact_hash, + dense_reason_counts_json + ) VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, ?9, ?10, ?11, ?12, ?13, ?14, ?15, ?16, ?17, ?18) ON CONFLICT(project_id) DO UPDATE SET zoekt_version = excluded.zoekt_version, qdrant_collection = excluded.qdrant_collection, @@ -56,7 +68,12 @@ impl Storage { sidecar_schema_version = excluded.sidecar_schema_version, sidecar_input_hash = excluded.sidecar_input_hash, sidecar_generation = excluded.sidecar_generation, - projection_count = excluded.projection_count", + projection_count = excluded.projection_count, + symbol_doc_count = excluded.symbol_doc_count, + dense_projection_count = excluded.dense_projection_count, + semantic_policy_version = excluded.semantic_policy_version, + graph_artifact_hash = excluded.graph_artifact_hash, + dense_reason_counts_json = excluded.dense_reason_counts_json", rusqlite::params![ manifest.project_id, manifest.zoekt_version, @@ -71,6 +88,11 @@ impl Storage { manifest.sidecar_input_hash, manifest.sidecar_generation, manifest.projection_count, + manifest.symbol_doc_count, + manifest.dense_projection_count, + manifest.semantic_policy_version, + manifest.graph_artifact_hash, + manifest.dense_reason_counts_json, ], )?; Ok(()) @@ -94,7 +116,12 @@ impl Storage { sidecar_schema_version, sidecar_input_hash, sidecar_generation, - projection_count + projection_count, + symbol_doc_count, + dense_projection_count, + semantic_policy_version, + graph_artifact_hash, + dense_reason_counts_json FROM retrieval_index_manifest WHERE project_id = ?1", )?; @@ -116,6 +143,11 @@ impl Storage { sidecar_input_hash: row.get(10)?, sidecar_generation: row.get(11)?, projection_count: row.get(12)?, + symbol_doc_count: row.get(13)?, + dense_projection_count: row.get(14)?, + semantic_policy_version: row.get(15)?, + graph_artifact_hash: row.get(16)?, + dense_reason_counts_json: row.get(17)?, })) } @@ -180,6 +212,11 @@ mod tests { sidecar_input_hash: None, sidecar_generation: None, projection_count: None, + symbol_doc_count: None, + dense_projection_count: None, + semantic_policy_version: None, + graph_artifact_hash: None, + dense_reason_counts_json: None, }) .expect("upsert manifest"); } @@ -215,6 +252,11 @@ mod tests { sidecar_input_hash: Some("deadbeefcafebabe".into()), sidecar_generation: Some("proj-deadbeefcafebabe".into()), projection_count: Some(99), + symbol_doc_count: Some(120), + dense_projection_count: Some(99), + semantic_policy_version: Some("graph_first_v1".into()), + graph_artifact_hash: Some("graph-hash".into()), + dense_reason_counts_json: Some("{\"public_api\":99}".into()), }; storage .upsert_retrieval_index_manifest(&manifest) @@ -253,6 +295,11 @@ mod tests { sidecar_input_hash: None, sidecar_generation: None, projection_count: None, + symbol_doc_count: None, + dense_projection_count: None, + semantic_policy_version: None, + graph_artifact_hash: None, + dense_reason_counts_json: None, }) .expect("upsert manifest"); } diff --git a/crates/codestory-store/src/storage_impl/schema.rs b/crates/codestory-store/src/storage_impl/schema.rs index af3be535..4e89f69e 100644 --- a/crates/codestory-store/src/storage_impl/schema.rs +++ b/crates/codestory-store/src/storage_impl/schema.rs @@ -101,11 +101,30 @@ const TABLE_STATEMENTS: &[&str] = &[ embedding_backend TEXT, embedding_dim INTEGER NOT NULL, doc_shape TEXT, + semantic_policy_version TEXT, + dense_reason TEXT, embedding_blob BLOB NOT NULL, updated_at_epoch_ms INTEGER NOT NULL, FOREIGN KEY(node_id) REFERENCES node(id), FOREIGN KEY(file_node_id) REFERENCES node(id) )", + "CREATE TABLE IF NOT EXISTS symbol_search_doc ( + node_id INTEGER PRIMARY KEY, + file_node_id INTEGER, + kind INTEGER NOT NULL, + display_name TEXT NOT NULL, + qualified_name TEXT, + file_path TEXT, + start_line INTEGER, + doc_text TEXT NOT NULL, + doc_version INTEGER NOT NULL DEFAULT 0, + doc_hash TEXT NOT NULL DEFAULT '', + policy_version TEXT NOT NULL, + source_provenance TEXT NOT NULL, + updated_at_epoch_ms INTEGER NOT NULL, + FOREIGN KEY(node_id) REFERENCES node(id), + FOREIGN KEY(file_node_id) REFERENCES node(id) + )", "CREATE TABLE IF NOT EXISTS symbol_summary ( node_id INTEGER NOT NULL, content_hash TEXT NOT NULL, @@ -220,7 +239,12 @@ const TABLE_STATEMENTS: &[&str] = &[ sidecar_schema_version INTEGER, sidecar_input_hash TEXT, sidecar_generation TEXT, - projection_count INTEGER + projection_count INTEGER, + symbol_doc_count INTEGER, + dense_projection_count INTEGER, + semantic_policy_version TEXT, + graph_artifact_hash TEXT, + dense_reason_counts_json TEXT )", ]; @@ -256,6 +280,12 @@ const SECONDARY_INDEX_STATEMENTS: &[&str] = &[ "CREATE INDEX IF NOT EXISTS idx_llm_symbol_doc_file_node ON llm_symbol_doc(file_node_id)", "CREATE INDEX IF NOT EXISTS idx_llm_symbol_doc_kind ON llm_symbol_doc(kind)", "CREATE INDEX IF NOT EXISTS idx_llm_symbol_doc_updated_at ON llm_symbol_doc(updated_at_epoch_ms)", + "CREATE INDEX IF NOT EXISTS idx_llm_symbol_doc_policy_reason + ON llm_symbol_doc(semantic_policy_version, dense_reason)", + "CREATE INDEX IF NOT EXISTS idx_symbol_search_doc_file_node ON symbol_search_doc(file_node_id)", + "CREATE INDEX IF NOT EXISTS idx_symbol_search_doc_kind ON symbol_search_doc(kind)", + "CREATE INDEX IF NOT EXISTS idx_symbol_search_doc_policy ON symbol_search_doc(policy_version)", + "CREATE INDEX IF NOT EXISTS idx_symbol_search_doc_hash ON symbol_search_doc(doc_version, doc_hash)", "CREATE INDEX IF NOT EXISTS idx_search_symbol_projection_display_name ON search_symbol_projection(display_name)", "CREATE INDEX IF NOT EXISTS idx_callable_projection_state_node_id ON callable_projection_state(node_id)", @@ -388,14 +418,19 @@ pub(super) fn apply_schema_migrations(storage: &Storage) -> Result<(), StorageEr migrate_v17_retrieval_manifest_sidecar_generation(&storage.conn)?; storage.set_schema_version(17)?; } + if stored_version < 18 { + migrate_v18_ast_first_symbol_docs(&storage.conn)?; + storage.set_schema_version(18)?; + } create_llm_symbol_doc_reuse_index(&storage.conn)?; create_symbol_summary_indexes(&storage.conn)?; - if storage.deferred_secondary_indexes { - create_load_indexes(&storage.conn)?; + let index_mode = if storage.deferred_secondary_indexes { + StorageOpenMode::Build } else { - create_secondary_indexes(&storage.conn)?; - } + StorageOpenMode::Live + }; + create_indexes(&storage.conn, index_mode)?; if stored_version < SCHEMA_VERSION { storage.set_schema_version(SCHEMA_VERSION)?; @@ -525,6 +560,49 @@ pub(super) fn migrate_v17_retrieval_manifest_sidecar_generation( Ok(()) } +pub(super) fn migrate_v18_ast_first_symbol_docs(conn: &Connection) -> Result<(), StorageError> { + conn.execute( + "CREATE TABLE IF NOT EXISTS symbol_search_doc ( + node_id INTEGER PRIMARY KEY, + file_node_id INTEGER, + kind INTEGER NOT NULL, + display_name TEXT NOT NULL, + qualified_name TEXT, + file_path TEXT, + start_line INTEGER, + doc_text TEXT NOT NULL, + doc_version INTEGER NOT NULL DEFAULT 0, + doc_hash TEXT NOT NULL DEFAULT '', + policy_version TEXT NOT NULL, + source_provenance TEXT NOT NULL, + updated_at_epoch_ms INTEGER NOT NULL, + FOREIGN KEY(node_id) REFERENCES node(id), + FOREIGN KEY(file_node_id) REFERENCES node(id) + )", + [], + )?; + try_add_column(conn, "llm_symbol_doc", "semantic_policy_version TEXT")?; + try_add_column(conn, "llm_symbol_doc", "dense_reason TEXT")?; + try_add_column(conn, "retrieval_index_manifest", "symbol_doc_count INTEGER")?; + try_add_column( + conn, + "retrieval_index_manifest", + "dense_projection_count INTEGER", + )?; + try_add_column( + conn, + "retrieval_index_manifest", + "semantic_policy_version TEXT", + )?; + try_add_column(conn, "retrieval_index_manifest", "graph_artifact_hash TEXT")?; + try_add_column( + conn, + "retrieval_index_manifest", + "dense_reason_counts_json TEXT", + )?; + Ok(()) +} + pub(super) fn migrate_v14_retrieval_index_manifest(conn: &Connection) -> Result<(), StorageError> { conn.execute( "CREATE TABLE IF NOT EXISTS retrieval_index_manifest ( diff --git a/crates/codestory-store/src/storage_impl/tests/mod.rs b/crates/codestory-store/src/storage_impl/tests/mod.rs index b1828eb2..b6a04c17 100644 --- a/crates/codestory-store/src/storage_impl/tests/mod.rs +++ b/crates/codestory-store/src/storage_impl/tests/mod.rs @@ -1,4 +1,24 @@ use super::*; + +#[test] +fn file_role_classification_ignores_materialized_repo_cache_prefix() { + assert_eq!( + FileRole::classify_path(Path::new( + "C:/repo/target/repo-cache/repos/nvm-sh-nvm/install.sh" + )), + FileRole::Source + ); + assert_eq!( + FileRole::classify_path(Path::new( + "C:/repo/target/repo-cache/repos/psf-requests/tests/test_sessions.py" + )), + FileRole::Test + ); + assert_eq!( + FileRole::classify_path(Path::new("target/generated/client.ts")), + FileRole::Generated + ); +} use codestory_contracts::graph::{ AccessKind, Edge, EdgeId, EdgeKind, Node, NodeId, NodeKind, Occurrence, OccurrenceKind, ResolutionCertainty, SourceLocation, TrailConfig, TrailDirection, @@ -471,6 +491,8 @@ fn test_llm_symbol_doc_round_trip() -> Result<(), StorageError> { embedding_backend: None, embedding_dim: 384, doc_shape: None, + semantic_policy_version: Some("graph_first_v1".to_string()), + dense_reason: Some("public_api".to_string()), embedding: vec![0.25_f32; 384], updated_at_epoch_ms: 123, }])?; @@ -513,6 +535,8 @@ fn test_llm_symbol_doc_stats_report_contract_metadata() -> Result<(), StorageErr embedding_backend: Some("llamacpp".to_string()), embedding_dim: 768, doc_shape: Some("semantic_doc_version=2;alias_mode=alias_variant".to_string()), + semantic_policy_version: Some("graph_first_v1".to_string()), + dense_reason: Some("public_api".to_string()), embedding: vec![0.25_f32; 4], updated_at_epoch_ms: 123, }])?; @@ -566,6 +590,8 @@ fn test_llm_symbol_doc_stats_treats_legacy_null_contract_metadata_as_mixed() embedding_backend: None, embedding_dim: 384, doc_shape: None, + semantic_policy_version: None, + dense_reason: None, embedding: vec![0.25_f32; 4], updated_at_epoch_ms: 123, }, @@ -585,6 +611,8 @@ fn test_llm_symbol_doc_stats_treats_legacy_null_contract_metadata_as_mixed() embedding_backend: Some("hash".to_string()), embedding_dim: 384, doc_shape: Some("semantic_doc_version=4;scope=durable_symbols".to_string()), + semantic_policy_version: Some("graph_first_v1".to_string()), + dense_reason: Some("public_api".to_string()), embedding: vec![0.5_f32; 4], updated_at_epoch_ms: 456, }, @@ -627,6 +655,8 @@ fn test_symbol_summary_uses_current_content_hash() -> Result<(), StorageError> { embedding_backend: None, embedding_dim: 384, doc_shape: None, + semantic_policy_version: Some("graph_first_v1".to_string()), + dense_reason: Some("public_api".to_string()), embedding: vec![0.25_f32; 384], updated_at_epoch_ms: 123, }; @@ -691,6 +721,8 @@ fn test_llm_symbol_doc_copy_forward_preserves_reuse_metadata() -> Result<(), Sto embedding_backend: Some("hash".to_string()), embedding_dim: 384, doc_shape: Some("semantic_doc_version=2".to_string()), + semantic_policy_version: Some("graph_first_v1".to_string()), + dense_reason: Some("public_api".to_string()), embedding: vec![0.25_f32; 384], updated_at_epoch_ms: 123, }])?; @@ -1376,6 +1408,64 @@ fn test_opening_v3_db_resets_projection_state() -> Result<(), StorageError> { Ok(()) } +#[test] +fn live_open_migrates_v17_llm_doc_columns_before_secondary_indexes() -> Result<(), StorageError> { + let db_path = unique_temp_db_path("v17-ast-first-live-migration"); + let _ = std::fs::remove_file(&db_path); + { + let conn = rusqlite::Connection::open(&db_path)?; + conn.execute( + "CREATE TABLE llm_symbol_doc ( + node_id INTEGER PRIMARY KEY, + file_node_id INTEGER, + kind INTEGER NOT NULL, + display_name TEXT NOT NULL, + qualified_name TEXT, + file_path TEXT, + start_line INTEGER, + doc_text TEXT NOT NULL, + doc_version INTEGER NOT NULL DEFAULT 0, + doc_hash TEXT NOT NULL DEFAULT '', + embedding_model TEXT NOT NULL, + embedding_profile TEXT, + embedding_backend TEXT, + embedding_dim INTEGER NOT NULL, + doc_shape TEXT, + embedding_blob BLOB NOT NULL, + updated_at_epoch_ms INTEGER NOT NULL + )", + [], + )?; + conn.pragma_update(None, "user_version", 17)?; + } + + let storage = Storage::open(&db_path)?; + let columns = storage + .conn + .prepare("PRAGMA table_info(llm_symbol_doc)")? + .query_map([], |row| row.get::<_, String>(1))? + .collect::, _>>()?; + assert!( + columns + .iter() + .any(|column| column == "semantic_policy_version") + ); + assert!(columns.iter().any(|column| column == "dense_reason")); + let policy_index_count: i64 = storage.conn.query_row( + "SELECT COUNT(*) + FROM sqlite_master + WHERE type = 'index' + AND name = 'idx_llm_symbol_doc_policy_reason'", + [], + |row| row.get(0), + )?; + assert_eq!(policy_index_count, 1); + + drop(storage); + let _ = std::fs::remove_file(&db_path); + Ok(()) +} + #[test] fn test_promote_staged_snapshot_replaces_live_db_while_live_reader_is_open() -> Result<(), StorageError> { @@ -1683,8 +1773,22 @@ fn test_error_storage() -> Result<(), StorageError> { index_step: codestory_contracts::graph::IndexStep::Indexing, }; storage.insert_error(&error)?; + storage.insert_error(&codestory_contracts::graph::ErrorInfo { + message: "Recoverable parse warning".to_string(), + file_id: Some(NodeId(1)), + line: Some(20), + column: Some(1), + is_fatal: false, + index_step: codestory_contracts::graph::IndexStep::Indexing, + })?; let stats = storage.get_stats()?; - assert_eq!(stats.error_count, 1); + assert_eq!(stats.error_count, 2); + assert_eq!(stats.fatal_error_count, 1); + storage.refresh_grounding_summary_snapshots()?; + assert!(storage.has_ready_grounding_summary_snapshots()?); + let snapshot_stats = storage.get_stats()?; + assert_eq!(snapshot_stats.error_count, 2); + assert_eq!(snapshot_stats.fatal_error_count, 1); Ok(()) } @@ -1789,6 +1893,8 @@ fn test_delete_file_projection() -> Result<(), StorageError> { embedding_backend: None, embedding_dim: 384, doc_shape: None, + semantic_policy_version: Some("graph_first_v1".to_string()), + dense_reason: Some("public_api".to_string()), embedding: vec![0.1_f32; 384], updated_at_epoch_ms: 1, }])?; diff --git a/crates/codestory-workspace/Cargo.toml b/crates/codestory-workspace/Cargo.toml index 0583d38f..acc5934d 100644 --- a/crates/codestory-workspace/Cargo.toml +++ b/crates/codestory-workspace/Cargo.toml @@ -1,6 +1,6 @@ [package] name = "codestory-workspace" -version = "0.6.2" +version = "0.7.0" edition = "2024" [dependencies] diff --git a/crates/codestory-workspace/src/lib.rs b/crates/codestory-workspace/src/lib.rs index 9f0a315c..720e41fe 100644 --- a/crates/codestory-workspace/src/lib.rs +++ b/crates/codestory-workspace/src/lib.rs @@ -605,46 +605,68 @@ fn normalize_exclude_match_key(path: &Path) -> String { } fn matches_source_group_language(path: &Path, language: &Language) -> bool { - let extension = path + let Some(extension) = path .extension() .and_then(|ext| ext.to_str()) - .map(|ext| ext.to_ascii_lowercase()); + .map(codestory_contracts::language_support::normalize_extension) + else { + return false; + }; + + registry_extension_matches_source_group(&extension, language) + || compatibility_extension_matches_source_group(&extension, language) +} + +fn registry_extension_matches_source_group(extension: &str, language: &Language) -> bool { + codestory_contracts::language_support::language_support_profile_for_ext(extension).is_some_and( + |profile| source_group_accepts_registry_language(language, profile.language_name), + ) +} + +fn source_group_accepts_registry_language(language: &Language, registry_language: &str) -> bool { + matches!( + (language, registry_language), + (&Language::Rust, "rust") + | (&Language::Python, "python") + | (&Language::Java, "java") + | (&Language::JavaScript, "javascript") + | (&Language::TypeScript, "typescript") + | (&Language::Cxx, "cpp" | "c") + | (&Language::Go, "go") + | (&Language::Ruby, "ruby") + | (&Language::Php, "php") + | (&Language::CSharp, "csharp") + | (&Language::Kotlin, "kotlin") + | (&Language::Swift, "swift") + | (&Language::Dart, "dart") + | (&Language::Sql, "sql") + | (&Language::Html, "html") + | (&Language::Css, "css") + | (&Language::Bash, "bash") + ) +} + +fn compatibility_extension_matches_source_group(extension: &str, language: &Language) -> bool { matches!( - (language, extension.as_deref()), - (&Language::Rust, Some("rs")) - | (&Language::Python, Some("py" | "pyi")) - | (&Language::Java, Some("java")) - | ( - &Language::JavaScript, - Some("js" | "jsx" | "mjs" | "cjs" | "svelte" | "vue" | "astro") - ) - | ( - &Language::TypeScript, - Some("ts" | "tsx" | "mts" | "cts" | "svelte" | "vue" | "astro") - ) - | ( - &Language::Cxx, - Some("c" | "cc" | "cpp" | "cxx" | "h" | "hh" | "hpp" | "hxx") - ) - | (&Language::Go, Some("go")) - | (&Language::Ruby, Some("rb")) - | (&Language::Php, Some("php")) - | (&Language::CSharp, Some("cs" | "cshtml")) - | (&Language::Kotlin, Some("kt" | "kts")) - | (&Language::Swift, Some("swift")) - | (&Language::Dart, Some("dart")) - | (&Language::Lua, Some("lua")) - | (&Language::Sql, Some("sql")) - | (&Language::Html, Some("html" | "htm")) - | (&Language::Css, Some("css" | "scss" | "sass" | "less")) - | (&Language::Bash, Some("sh" | "bash")) - | (&Language::PowerShell, Some("ps1" | "psm1")) - | (&Language::Svelte, Some("svelte")) - | (&Language::Vue, Some("vue")) - | (&Language::Astro, Some("astro")) + (language, extension), + (&Language::JavaScript, "svelte" | "vue" | "astro") + | (&Language::TypeScript, "svelte" | "vue" | "astro") + | (&Language::CSharp, "cshtml") + | (&Language::Lua, "lua") + | (&Language::Css, "scss" | "sass" | "less") + | (&Language::PowerShell, "ps1" | "psm1") + | (&Language::Svelte, "svelte") + | (&Language::Vue, "vue") + | (&Language::Astro, "astro") ) } +#[cfg(test)] +fn registry_language_for_path(path: &Path) -> Option<&'static str> { + path.to_str() + .and_then(|path| codestory_contracts::language_support::language_name_for_path(Some(path))) +} + fn push_discovered_file( files: &mut Vec, seen: &mut HashSet, @@ -880,6 +902,80 @@ mod tests { Ok(()) } + #[test] + fn workspace_supported_source_extensions_have_registry_profiles() { + let source_group_languages = [ + Language::Rust, + Language::Python, + Language::Java, + Language::JavaScript, + Language::TypeScript, + Language::Cxx, + Language::Go, + Language::Ruby, + Language::Php, + Language::CSharp, + Language::Kotlin, + Language::Swift, + Language::Dart, + Language::Sql, + Language::Html, + Language::Css, + Language::Bash, + ]; + + for profile in codestory_contracts::language_support::LANGUAGE_SUPPORT_PROFILES { + for extension in profile.extensions { + let file_name = format!("main.{extension}"); + assert_eq!( + registry_language_for_path(Path::new(&file_name)), + Some(profile.language_name), + "workspace source extension should resolve registry language: {extension}" + ); + assert!( + source_group_languages + .iter() + .any(|language| matches_source_group_language( + Path::new(&file_name), + language + )), + "workspace discovery should accept public registry extension: {extension}" + ); + } + } + + let compatibility_only = [ + ("cshtml", Language::CSharp), + ("svelte", Language::JavaScript), + ("svelte", Language::TypeScript), + ("svelte", Language::Svelte), + ("vue", Language::JavaScript), + ("vue", Language::TypeScript), + ("vue", Language::Vue), + ("astro", Language::JavaScript), + ("astro", Language::TypeScript), + ("astro", Language::Astro), + ("lua", Language::Lua), + ("ps1", Language::PowerShell), + ("psm1", Language::PowerShell), + ("scss", Language::Css), + ("sass", Language::Css), + ("less", Language::Css), + ]; + for (extension, language) in compatibility_only { + assert!( + codestory_contracts::language_support::language_support_profile_for_ext(extension) + .is_none(), + "compatibility-only source extension should not have a public registry profile: {extension}" + ); + let file_name = format!("main.{extension}"); + assert!( + matches_source_group_language(Path::new(&file_name), &language), + "compatibility-only source extension should stay accepted by workspace discovery: {extension}" + ); + } + } + #[test] fn synthetic_workspace_under_excluded_parent_still_discovers_repo_files() -> Result<()> { let temp = tempdir()?; diff --git a/docker/retrieval.env.example b/docker/retrieval.env.example index fc1a199a..8f716a50 100644 --- a/docker/retrieval.env.example +++ b/docker/retrieval.env.example @@ -7,19 +7,23 @@ CODESTORY_QDRANT_HTTP_PORT=6333 CODESTORY_QDRANT_GRPC_PORT=6334 CODESTORY_EMBED_PORT=8080 -# Bind-mount for Qdrant persistence (Windows example) +# Bind-mount for Qdrant persistence +# CODESTORY_QDRANT_DATA_DIR=$HOME/.cache/codestory/qdrant # CODESTORY_QDRANT_DATA_DIR=C:\Users\you\AppData\Local\codestory\cache\qdrant # Zoekt index root (real profile webserver + lexical shards) +# CODESTORY_ZOEKT_DATA_DIR=$HOME/.cache/codestory/zoekt # CODESTORY_ZOEKT_DATA_DIR=C:\Users\you\AppData\Local\codestory\cache\zoekt # bge-base-en-v1.5 GGUF for llama.cpp embed service (real profile) +# CODESTORY_EMBED_MODEL_DIR=/path/to/codestory/target/retrieval-models # CODESTORY_EMBED_MODEL_DIR=C:\Users\you\source\repos\codestory\target\retrieval-models # Fetch: node scripts/setup-retrieval-env.mjs --fetch-embed-model # Historical compose-profile overrides are rejected by product bootstrap/index paths. # Optional: override compose file location +# CODESTORY_RETRIEVAL_COMPOSE_FILE=/path/to/codestory/docker/retrieval-compose.yml # CODESTORY_RETRIEVAL_COMPOSE_FILE=C:\path\to\codestory\docker\retrieval-compose.yml # Phase 2 — real Qdrant vectors (768-dim bge-base-en-v1.5) diff --git a/docs/architecture/browser-surface-gate.md b/docs/architecture/browser-surface-gate.md deleted file mode 100644 index 82c3e7e3..00000000 --- a/docs/architecture/browser-surface-gate.md +++ /dev/null @@ -1,60 +0,0 @@ -# Browser Surface Gate - -This is maintainer governance, not the product quickstart. It exists to stop a -new web UI or `browse` command from being added before the evidence exists. - -CodeStory should keep `explore`, `serve --stdio`, and the read-only browser -service as the default codebase-browser surfaces until a separate web UI or -`browse` command has evidence that it solves a different workflow. - -## Current Status - -Status: deferred. - -`explore` is the browser path for now. It already bundles project status, -query resolution, navigation results, symbol details, trail context, snippets, -and next commands without introducing another UI surface. - -Do not add a new `browse` command, web UI route, or browser-specific UI -until all of the gates below have current evidence in the repo. - -## Promotion Gates - -Before starting web UI work: - -- Tool, resource, and prompt manifests must be stable under stdio catalog tests. -- HTTP and stdio browser contracts must stay aligned with the read-only browser - service. -- Warm stdio/browser-loop p50, p95, and p99 timings must be recorded and must - meet the active Current Promotion Budget in - `docs/testing/codestory-stdio-warm-loop-stats.md`: small-fixture smoke p95 - stays under the smoke budget, and a current real-repo run meets the Web - Cockpit Promotion Budget. -- Browser stress lanes must pass at the intended scale, and synthetic evidence - must not be treated as real-repository promotion proof. -- `explore` must demonstrate the browser workflow in JSON/Markdown and - keyboard-first TUI paths. -- Screenshot-visible review must be planned before implementation, with one - reviewer for the full viewport and one reviewer for the changed surface or - acceptance path. - -## Evidence Sources - -- `crates/codestory-cli/tests/stdio_protocol_contracts.rs` protects tool, - resource, prompt, and schema stability. -- `crates/codestory-cli/tests/http_transport_contracts.rs` protects HTTP and - stdio default-browser alignment. -- `crates/codestory-cli/tests/stdio_warm_loop_stats.rs` measures warm loop - p50, p95, and p99. -- `docs/testing/codestory-stdio-warm-loop-stats.md` owns the active warm p95 - promotion budget and current run evidence. -- `docs/testing/codestory-stress-lanes.md` defines browser-scale stress lanes - and promotion thresholds. -- `crates/codestory-cli/tests/cli_golden_path.rs` keeps `explore` useful as the - bundled browser path. - -## When The Gate Opens - -If the gates are satisfied, start with a written implementation plan that names -why the new surface is not a duplicate of `explore`, the exact routes or -commands to add, the screenshot-visible review loop, and the rollback path. diff --git a/docs/architecture/indexing-pipeline.md b/docs/architecture/indexing-pipeline.md index 8f0f6994..a6eb0800 100644 --- a/docs/architecture/indexing-pipeline.md +++ b/docs/architecture/indexing-pipeline.md @@ -4,7 +4,7 @@ This page explains how `codestory-cli index` turns a repository into SQLite-back Read this page when you need the implementation mental model. Use the CLI grounding workflows after that if you want live evidence from an indexed workspace. -Default `index` includes semantic docs. A successful run returns only after graph indexing, snapshots, lexical search projection, and persisted semantic docs are synchronized. Semantic work is measured separately in the phase timings instead of being hidden behind a later read command. +Default `index` includes graph-native symbol docs and selected dense anchors. A successful run returns only after graph indexing, snapshots, lexical search projection, deterministic `symbol_search_doc` rows, component reports, and persisted dense-anchor docs are synchronized. Semantic work is measured separately in the phase timings instead of being hidden behind a later read command. ## End-To-End Command Path @@ -25,8 +25,8 @@ sequenceDiagram Indexer->>Store: flush files, nodes, edges, occurrences, component access, callable projection state Indexer->>Store: run post-flush resolution updates Runtime->>Store: finalize staged snapshot or refresh live snapshots - Runtime->>Search: rebuild lexical projection and sync semantic docs - Search->>Store: reuse unchanged embeddings or upsert embedded docs + Runtime->>Search: rebuild lexical projection, symbol docs, component reports, and dense anchors + Search->>Store: reuse unchanged dense embeddings or upsert selected anchor docs Runtime-->>CLI: indexing summary and phase timings ``` @@ -36,8 +36,8 @@ sequenceDiagram - `codestory-runtime` chooses full versus incremental flow and staged versus live store behavior. - `codestory-workspace` discovers source files and computes the refresh plan. - `codestory-indexer` turns the plan into projection writes and post-flush resolution. -- `codestory-store` persists rows, invalidates or refreshes snapshots, publishes staged builds, and stores semantic docs. -- `codestory-runtime` owns the runtime search engine, semantic doc sync, retrieval readiness, and timing surface. +- `codestory-store` persists rows, invalidates or refreshes snapshots, publishes staged builds, and stores symbol docs plus dense-anchor docs. +- `codestory-runtime` owns the runtime search engine, symbol doc and dense-anchor sync, retrieval readiness, and timing surface. That split is intentional: the runtime orchestrates the run, the indexer performs indexing work, and the store owns persistence mechanics. @@ -46,7 +46,7 @@ That split is intentional: the runtime orchestrates the run, the indexer perform ```mermaid flowchart TD plan["Refresh plan from codestory-workspace"] --> prep["Normalize paths and load compile_commands metadata"] - prep --> supported{"Supported language?"} + prep --> supported{"Parser-backed or structural support path?"} supported -->|"No"| skip["Skip file with no parse work"] supported -->|"Yes"| cache{"Artifact cache hit?"} cache -->|"Yes"| reuse["Reuse cached intermediate artifacts or refresh file metadata"] @@ -64,7 +64,7 @@ flowchart TD resolve --> errors["Flush indexing errors"] errors --> cleanup["Incremental cleanup for removed files"] cleanup --> snapshots["Runtime refreshes or publishes snapshots"] - snapshots --> semantic["Runtime syncs lexical search projection and semantic docs"] + snapshots --> semantic["Runtime syncs lexical search projection, symbol docs, component reports, and dense anchors"] semantic --> summary["CLI receives retrieval state and phase timings"] ``` @@ -105,7 +105,8 @@ Files that disappeared from discovery are collected into `files_to_remove`. - it seeds the symbol table from existing stored node kinds for incremental runs - it chunks `files_to_index` using batch settings - it loads parsed compilation metadata from `compile_commands.json` when available -- it picks a language configuration for each file and skips unsupported files before any parse work +- it picks a parser-backed language configuration or structural collector for + each file and skips unsupported files before any parse work Compilation metadata matters mostly for native-language parsing and is part of the artifact-cache key, so changes to compiler flags or include paths can invalidate cached artifacts. @@ -189,28 +190,32 @@ The last step belongs to runtime plus store: Full and incremental snapshot behavior are intentionally not symmetric. -### 11. Runtime synchronizes search and semantic docs +### 11. Runtime synchronizes search, symbol docs, and dense anchors -After graph and snapshot work, runtime rebuilds the search-symbol projection, opens or refreshes the persisted Tantivy search directory, and synchronizes semantic symbol docs. This is part of the default `index` contract. +After graph and snapshot work, runtime rebuilds the search-symbol projection, opens or refreshes the persisted Tantivy search directory, writes graph-native symbol docs, writes deterministic component reports, and synchronizes selected dense-anchor docs. This is part of the default `index` contract. -Semantic sync does four pieces of work: +Semantic sync does these pieces of work: -- build the generated text for indexable symbols -- reuse existing embeddings when doc version, generated text hash, embedding model, and embedding dimension still match -- embed only pending docs and upsert them back into SQLite -- prune stale docs that no longer correspond to the refreshed symbol set +- build deterministic generated text for durable AST symbols and store it in `symbol_search_doc` +- build deterministic component/community report docs with extracted provenance +- classify each symbol under `graph_first_v1` +- reuse existing dense embeddings when doc version, generated text hash, embedding profile/backend/model/dimension, document prefix, and semantic policy version still match +- embed only selected dense anchors and upsert them back into SQLite +- prune stale symbol docs or dense docs that no longer correspond to the refreshed graph and policy -Full refresh has an extra copy-forward path: if a previous live database exists, unchanged semantic docs are copied into the staged database before publish. The later semantic sync can then reuse those rows instead of re-embedding them. +Full refresh has an extra copy-forward path: if a previous live database exists, unchanged symbol docs, retrieval artifact nodes, and dense-anchor docs are copied into the staged database before publish. The later semantic sync can then reuse those rows instead of re-embedding them. -Incremental refresh scopes semantic invalidation by touched file. Untouched files keep their existing semantic docs; new, changed, or removed symbols in touched files are embedded or pruned. +Incremental refresh scopes symbol-doc and dense-anchor invalidation by touched file. Untouched files keep their existing docs; new, changed, or removed symbols in touched files are written, embedded if policy-selected, skipped with reason counts, or pruned. -The default semantic scope is durable symbols: classes, structs, interfaces, annotations, unions, enums, typedefs, functions, methods, macros, global variables, constants, and enum constants. Lower-signal module, namespace, package, field, local variable, and type-parameter docs stay out of semantic retrieval by default while remaining present in graph and lexical search. Set `CODESTORY_SEMANTIC_DOC_SCOPE=all` to restore the broader semantic doc set for investigations. +The default symbol-doc scope is durable symbols: classes, structs, interfaces, annotations, unions, enums, typedefs, functions, methods, macros, global variables, constants, and enum constants. Lower-signal module, namespace, package, field, local variable, and type-parameter docs stay out of dense retrieval by default while remaining present in graph and lexical search. Set `CODESTORY_SEMANTIC_DOC_SCOPE=all` only for investigations. + +The dense-anchor policy version is `graph_first_v1`. Dense reasons are `public_api`, `entrypoint`, `documented_nontrivial`, `central_graph_node`, `component_report`, and `unstructured_doc`. Private trivial helpers, generated/vendor code, and test-only implementation details are skipped for dense embedding unless they are structurally central; they remain discoverable through exact lookup, `symbol_search_doc`, source lexical search, and graph expansion. The default semantic text alias policy is `CODESTORY_SEMANTIC_DOC_ALIAS_MODE=alias_variant`. It keeps compact language, terminal-name, owner-name, and symbol-role hints, but leaves out the noisier full name-alias and path-alias lists from the earlier `current_alias` research variant. Use `no_alias` for baseline research rows and `current_alias` only when reproducing older alias-enriched runs. Embedding throughput is optimized for the local embedding path: -- pending semantic docs are sorted by generated text length before embedding, which keeps batches close to uniform length +- pending dense-anchor docs are sorted by generated text length before embedding, which keeps batches close to uniform length - the default semantic embedding batch size is `128`, with `CODESTORY_LLM_DOC_EMBED_BATCH_SIZE` available for profiling - product sidecar embeddings use `CODESTORY_EMBED_BACKEND=llamacpp` and the local `CODESTORY_EMBED_LLAMACPP_URL` endpoint; the manifest must record @@ -228,7 +233,11 @@ Keep measured repo-scale timings in [codestory-e2e-stats-log.md](../testing/code ### When files are skipped -The indexer skips files before parsing when it cannot select a supported language configuration for the path plus compilation metadata. +The indexer skips files before parsing when it cannot select a parser-backed +language configuration or structural collector for the path plus compilation +metadata. See [language-support.md](language-support.md) for the distinction +between parser-backed graph support, structural collectors, and candidate parser +compatibility records. ### How `compile_commands.json` participates @@ -246,13 +255,14 @@ Files, nodes, edges, occurrences, component access, and callable projection stat Full refresh builds a staged database and publishes it only after staged finalization succeeds. Incremental refresh never publishes a staged build; it updates the live store and refreshes live snapshots in place. -### How semantic docs are kept fast +### How symbol docs and dense anchors are kept fast -Semantic docs are persisted in SQLite with generated-text metadata. Reuse is keyed by schema version, generated text hash, embedding model, and embedding dimension. On full refresh, runtime copies prior semantic docs forward into the staged database before semantic sync checks them. On incremental refresh, runtime passes a touched-file scope so only docs belonging to changed files are rebuilt or pruned. +Symbol docs are deterministic graph artifacts persisted in SQLite with generated-text metadata and extracted provenance. Dense anchors are persisted separately in SQLite with vector metadata. Reuse is keyed by schema version, generated text hash, embedding profile/backend/model/dimension, document prefix, and semantic policy version. On full refresh, runtime copies prior retrieval artifact nodes, symbol docs, and dense docs forward into the staged database before semantic sync checks them. On incremental refresh, runtime passes a touched-file scope so only docs belonging to changed files are rebuilt, embedded, skipped, or pruned. -Cold start still has to embed any semantic doc that has no reusable row. The -cold path is kept under control by using the durable-symbol default scope, -length-bucketed batches, full sidecar readiness, and stored vector quantization. +Cold start embeds only dense anchors that have no reusable row. The cold path is +kept under control by using graph-native symbol docs for code recall, the +`graph_first_v1` dense policy, length-bucketed batches, full sidecar readiness, +and stored vector quantization. ### What timing output means @@ -267,10 +277,12 @@ The index summary reports graph and semantic work separately: - `semantic_ms.db_upsert`: SQLite writes for embedded docs - `semantic_ms.reload`: loading persisted semantic docs into the runtime search engine when needed - `semantic_ms.prune`: removing stale semantic docs after the refreshed symbol set is known -- `semantic_docs.reused`: existing docs accepted without embedding -- `semantic_docs.embedded`: docs newly embedded in this run -- `semantic_docs.pending`: docs that needed embedding after reuse checks -- `semantic_docs.stale`: persisted docs pruned because they no longer match the refreshed symbol set +- `symbol_search_docs_written`: graph-native symbol docs and component reports written for lexical/graph recall +- `semantic_docs.reused`: existing dense-anchor docs accepted without embedding +- `semantic_docs.embedded`: dense-anchor docs newly embedded in this run +- `semantic_docs.pending`: dense-anchor docs that needed embedding after reuse checks +- `semantic_docs.stale`: persisted dense-anchor docs pruned because they no longer match the refreshed symbol set +- `semantic_dense_docs_skipped` and `semantic_dense_*`: policy skip and dense-reason counters for `graph_first_v1` Use these fields before changing parser, graph, or SQLite code for a slow `index` run. diff --git a/docs/architecture/language-support.md b/docs/architecture/language-support.md new file mode 100644 index 00000000..edd74e49 --- /dev/null +++ b/docs/architecture/language-support.md @@ -0,0 +1,159 @@ +# Language Support Contract + +CodeStory uses the word "support" only with a qualifier. Parser routing, +regression evidence, framework route coverage, and agent packet/search quality +are separate claims. + +The source of truth for extension ownership, stored-language names, support +modes, evidence tiers, and claim labels is +`crates/codestory-contracts/src/language_support.rs`. The indexer maps those +shared support profiles to parser/rule construction in `get_language_for_ext`. +The shared registry owns public support claims. Workspace discovery also carries +compatibility-only filters for file types that can be scanned or grouped without +being claimed as parser-backed language support. + +## Claim Terms + +- `parser-backed graph`: the file extension routes to a tree-sitter parser and + rule asset, and the indexer can emit graph nodes and edges for that language. +- `fidelity-gated`: parser-backed graph support has overlapping regression + evidence for symbols, imports, calls, member ownership, representable + inheritance, and resolved-call behavior covered by the fixture suites. +- `semantic-resolution-backed`: the language has explicit semantic resolver + dispatch and tests for the resolution behavior being claimed. This is a + narrower claim than parser-backed graph support. +- `structural collector`: the language is indexed by dedicated structural + collectors, not full tree-sitter graph rules. +- `candidate parser compatibility record`: a parser crate/version was checked + for possible future use, but that record is not a runtime support claim until + the language has dependency wiring, rule assets, routing, and fidelity tests. + +## Current Matrix + +| Runtime claim | Languages | Runtime path | Evidence floor | Safe claim | +| --- | --- | --- | --- | --- | +| Parser-backed graph, fidelity-gated | Python, Java, Rust, JavaScript, TypeScript/TSX, C++, C, Go, Ruby, PHP, C#, Kotlin, Swift, Dart, Bash | tree-sitter parser plus graph rules | fidelity lab, tictactoe coverage, raw graph contracts, targeted rule/resolution suites, and the opt-in OSS language corpus; agent-facing A/B evidence is separate and currently mixed | daily graph navigation on typical code, with language-specific caveats | +| Structural collector | HTML, CSS, SQL | dedicated structural collectors | structural collector tests | structural entity extraction, not semantic code navigation | + +Agent-facing packet/search quality is a separate claim from parser-backed graph +support. The current language-expansion A/B report records a mixed full +18-language result and a stronger packet-eligible slice; do not use that report +as blanket promotion proof for every parser-backed language. + +The parser-backed graph claim is not a promise that every language has identical +dispatch or semantic-resolution semantics. Typed receiver-call support is +claimed only for the fixture-backed cases named in the indexer regression +suites. Current support covers simple local owner qualified calls where tests +prove the behavior. Cross-package receiver lookup, polymorphic dispatch, +inheritance-heavy target selection, framework-handler resolution, and +declarative parameter extraction require separate fixtures and cannot be used +as product claims until those fixtures pass. + +## Parser Compatibility Matrix + +This table is a parser-version compatibility record, not a runtime support +claim. Candidate parser crates are judged against the workspace parser-version +policy before they become durable language-support evidence: + +- `tree-sitter = "0.24"` +- `tree-sitter-graph = "0.12"` + +Validation method: checked candidate parser crates in an isolated temporary probe +crate (outside workspace members) with `tree-sitter = "0.24"`, +`tree-sitter-graph = "0.12"`, and exactly one pinned `` +dependency, then ran `cargo check` for each language. + +| Language | Candidate crate | Version checked | `cargo check` with 0.24/0.12 | Decision | Notes | +|---|---|---:|---|---|---| +| Go | `tree-sitter-go` | `0.23.4` | pass (`cargo check` + parse smoke) | crates.io pin | `0.25.0` compiles but fails at runtime with `LanguageError { version: 15 }` on tree-sitter `0.24`. | +| Ruby | `tree-sitter-ruby` | `0.23.1` | pass (`cargo check` + parse smoke) | crates.io pin | Wired in indexer with `rules/ruby.scm`. | +| PHP | `tree-sitter-php` | `0.23.11` | pass (`cargo check` + parse smoke) | crates.io pin | `0.24.2` compiles but fails at runtime with `LanguageError { version: 15 }` on tree-sitter `0.24`. | +| C# | `tree-sitter-c-sharp` | `=0.23.0` | pass (`cargo check` + parse smoke) | crates.io pin | `0.23.5` compiles but fails at runtime with `LanguageError { version: 15 }` on tree-sitter `0.24`. | +| Kotlin | `tree-sitter-kotlin-ng` | `1.1.0` | pass (`cargo check` + parse smoke) | crates.io pin | Wired in indexer with `rules/kotlin.scm`. | +| Swift | `tree-sitter-swift` | `0.7.0` | pass (`cargo check` + parse smoke) | crates.io pin | `0.7.1` and newer tested candidates use ABI 15 and fail at runtime on tree-sitter `0.24`. | +| Dart | `tree-sitter-dart-orchard` | `0.3.2` | pass (`cargo check` + parse smoke) | crates.io pin | Replaces `tree-sitter-dart = 0.2.0`, whose language export uses ABI 15 with tree-sitter `0.24`. | +| HTML | `tree-sitter-html` | `0.23.2` | pass | crates.io pin | Parser is available if structural extraction chooses parser-backed route. | +| CSS | `tree-sitter-css` | `0.25.0` | pass | crates.io pin | Parser is available if structural extraction chooses parser-backed route. | +| SQL | `tree-sitter-sequel` | `0.3.11` | pass | crates.io pin | SQL parser candidate compiles with policy pins. | +| Bash | `tree-sitter-bash` | `0.23.3` | pass (`cargo check` + parse smoke) | crates.io pin | `0.25.x` uses ABI 15 and fails at runtime on tree-sitter `0.24`. | + +Current outcome: + +- No language in this matrix currently requires a git pin, custom fork, or forced + text-only fallback for parser-policy compatibility. +- Go, Ruby, PHP, C#, Kotlin, Swift, Dart, and Bash have parser dependencies, + rule assets, and extension routing wired in the current branch. +- HTML, CSS, and SQL have structural extraction paths, but they are not + parser-backed rule assets from this matrix. +- New parser candidates should stay on this page as compatibility records until + they also have dependency wiring, rule assets, language routing, and fidelity + coverage. + +## Route Coverage Is Separate + +Framework route extraction has its own confidence labels in +[framework-route-coverage.md](../testing/framework-route-coverage.md). A +language can have parser-backed graph support while a framework remains +partial or heuristic. A route claim needs fixture or real-repo route evidence, +not just a language parser. + +## Expansion Checklist + +Before adding a new parser-backed language or broader framework claim: + +1. Add or update the parser/rule path and extension mapping. +2. Add tictactoe coverage for symbol, import, call, member, and inheritance + shapes that the language can reasonably represent. +3. Add or update fidelity-lab fixtures for symbols, imports, call edges, and + any resolution behavior being claimed. +4. Add targeted resolution tests before claiming local receiver-aware, + polymorphic, cross-package, framework-handler, or owner-qualified call trails. +5. Update `crates/codestory-contracts/src/language_support.rs`, including + `language_support_profile_for_ext` and + `language_support_profile_for_language_name`, parser construction such as + `get_language_for_ext`, and this page in the same change. +6. Add or update the + [OSS language corpus](../testing/oss-language-corpus.md) entry so the new + public language-support profile has a pinned medium-sized open source project and + a raw-without-CodeStory indexing comparison lane. +7. Add or update the `language-expansion-holdout` task manifest so the language + also has a strict `without_codestory` versus `with_codestory` agent A/B task + that measures elapsed time, tokens, tool calls, command counts, source reads, + post-packet source reads, and answer quality. +8. Run the full test binaries, not filtered test names: + + ```sh + cargo test -p codestory-indexer --test fidelity_regression + cargo test -p codestory-indexer --test tictactoe_language_coverage + cargo test -p codestory-indexer --test call_resolution_common_methods + cargo test -p codestory-indexer --test import_resolution + cargo test -p codestory-indexer --test query_rule_regressions + cargo test -p codestory-indexer --test trait_interface_resolution + ``` + +9. For broader real-project smoke evidence, run either the OSS corpus dry-run + manifest check or the relevant full corpus language subset: + + ```sh + CODESTORY_OSS_CORPUS_DRY_RUN=1 cargo test -p codestory-indexer --test oss_language_corpus -- --ignored --nocapture + CODESTORY_RUN_OSS_LANGUAGE_CORPUS=1 CODESTORY_OSS_CORPUS_LANGUAGES=python cargo test -p codestory-indexer --test oss_language_corpus -- --ignored --nocapture + ``` + +10. For agent-facing evidence, run at least the targeted language task from the + A/B suite, and run the full suite before making language-wide savings or + answer-quality claims: + + ```sh + node scripts/codestory-agent-ab-benchmark.mjs \ + --task-suite language-expansion-holdout \ + --arms without_codestory,with_codestory \ + --repeats 3 --materialize-repos --prepare-codestory-cache \ + --out-dir target/agent-benchmark/language-expansion-holdout \ + --timeout-ms 600000 + ``` + +11. Before widening typed receiver-call claims, add same-file and cross-file + fixtures for the target language. If implementation still uses signature + string slicing, document that as a transitional boundary; prefer a + tree-sitter-query or global-resolution-backed implementation for new + claims. diff --git a/docs/architecture/overview.md b/docs/architecture/overview.md index cfecb5d3..2c6640a8 100644 --- a/docs/architecture/overview.md +++ b/docs/architecture/overview.md @@ -86,6 +86,45 @@ Important rules: ## Operating Constraints +### Browser surface gate + +Status: deferred. + +`codestory-cli` `explore` and `serve --stdio` remain the current browser +surfaces. Do not add a new `browse` command, web UI route, or browser-specific UI +until all of the gates below have current evidence in the repo. + +Before starting web UI work: + +- Tool, resource, and prompt manifests must be stable under stdio catalog tests. +- HTTP and stdio browser contracts must stay aligned with the read-only browser + service. +- Warm stdio/browser-loop p50, p95, and p99 timings must be recorded and must + meet the active Current Promotion Budget in + `docs/testing/codestory-stdio-warm-loop-stats.md`: small-fixture smoke p95 + stays under the smoke budget, and a current real-repo run meets the Web + Cockpit Promotion Budget. +- Browser stress lanes must pass at the intended scale, and synthetic evidence + must not be treated as real-repository promotion proof. +- `explore` must demonstrate the browser workflow in JSON/Markdown and + keyboard-first TUI paths. +- Screenshot-visible review must be planned before implementation, with one + reviewer for the full viewport and one reviewer for the changed surface or + acceptance path. + +Evidence sources: `crates/codestory-cli/tests/stdio_protocol_contracts.rs`, +`crates/codestory-cli/tests/http_transport_contracts.rs`, +`crates/codestory-cli/tests/stdio_warm_loop_stats.rs`, +`docs/testing/codestory-stdio-warm-loop-stats.md`, +`docs/testing/codestory-stress-lanes.md`, and +`crates/codestory-cli/tests/cli_golden_path.rs`. + +If the gates are satisfied, start with a written implementation plan that names +why the new surface is not a duplicate of `explore`, the exact routes or +commands to add, the screenshot-visible review loop, and the rollback path. + +### Layer boundaries + - Keep the public command surface centered on grounding, target context, navigation, health, and serving workflows. - Add shared graph, DTO, grounding, and event types to `codestory-contracts`, not @@ -101,5 +140,5 @@ Important rules: - Product mental model: [../concepts/how-codestory-works.md](../concepts/how-codestory-works.md) - System behavior: [runtime-execution-path.md](runtime-execution-path.md) - Indexing lifecycle: [indexing-pipeline.md](indexing-pipeline.md) +- Language support claims: [language-support.md](language-support.md) - Ownership details: [subsystems/contracts.md](subsystems/contracts.md), [subsystems/workspace.md](subsystems/workspace.md), [subsystems/indexer.md](subsystems/indexer.md), [subsystems/store.md](subsystems/store.md), [subsystems/runtime.md](subsystems/runtime.md), [subsystems/cli.md](subsystems/cli.md) -- Historical context: [../decision-log.md](../decision-log.md) diff --git a/docs/architecture/retrieval-design.md b/docs/architecture/retrieval-design.md index d2742ea6..902fa8f7 100644 --- a/docs/architecture/retrieval-design.md +++ b/docs/architecture/retrieval-design.md @@ -7,14 +7,18 @@ with `retrieval_mode=full`. `full` means all of the following are true for the same generation: - Zoekt lexical shard exists, matches the current lexical input hash, and - answers smoke queries. -- Qdrant collection exists, has at least the manifest projection count, uses the - product llama.cpp `bge-base-en-v1.5` embedding backend, and answers semantic - smoke queries. + answers smoke queries against source files plus generated graph-native symbol + docs and component-report virtual docs. +- Qdrant collection exists, has at least the manifest dense-anchor projection + count, uses the product llama.cpp `bge-base-en-v1.5` embedding backend, and + answers semantic smoke queries when the active semantic policy selects one or + more dense anchors. If the active policy selects zero dense anchors, Qdrant is + explicitly not required for that generation. - SCIP graph artifacts exist and are not stub markers. - The SQLite `retrieval_index_manifest` has the current schema version, sidecar input hash, sidecar generation, Qdrant collection, embedding backend, - embedding dimension, and projection count. + embedding dimension, symbol-doc count, dense-anchor count, semantic policy + version, graph artifact hash, and dense reason counts. Everything else is diagnostic only. `no_scip`, `no_semantic`, `lexical_only`, `unavailable`, stale manifests, stub markers, disabled sidecars, hash vectors, @@ -33,13 +37,14 @@ agent-facing packet/search. ## Mode Matrix -| Zoekt | Qdrant | SCIP | Mode | Product behavior | -|-------|--------|------|------|------------------| -| up | up | up | `full` | Serve packet/search evidence | -| up | up | down | `no_scip` | Fail closed | -| up | down | up | `no_semantic` | Fail closed | -| up | down | down | `lexical_only` | Fail closed | -| down | * | * | `unavailable` | Fail closed | +| Zoekt | Qdrant | SCIP | Dense anchors | Mode | Product behavior | +|-------|--------|------|---------------|------|------------------| +| up | up | up | >0 | `full` | Serve packet/search evidence | +| up | skipped by policy | up | 0 | `full` | Serve graph/lexical packet/search evidence; dense stage is explicitly skipped | +| up | up | down | any | `no_scip` | Fail closed | +| up | down | up | >0 | `no_semantic` | Fail closed | +| up | down | down | >0 | `lexical_only` | Fail closed | +| down | * | * | any | `unavailable` | Fail closed | Runtime rules: @@ -53,19 +58,46 @@ Runtime rules: ## Generation And Reuse Sidecar generation is content-addressed by project id and sidecar input hash. -The hash includes local lexical input, symbol projection rows, semantic file -role metadata, sidecar schema version, Zoekt version pin, embedding backend, -embedding dimension, and SCIP artifact contract inputs. +The hash includes local lexical input, graph-native `symbol_search_doc` rows, +dense-anchor rows, semantic file-role metadata, sidecar schema version, Zoekt +version pin, embedding backend, embedding dimension, semantic policy version, +dense reason counts, and SCIP artifact contract inputs. `retrieval index --refresh auto` should reuse an unchanged healthy generation. If inputs match but health is not `full`, CodeStory rebuilds the unhealthy component and persists the manifest only after the full stack is healthy. +## AST-First Semantic Contract + +Code structure is graph-native first. Runtime writes a deterministic +`symbol_search_doc` for every durable AST symbol. These docs contain symbol name, +kind, file, signature, comments, aliases, related symbols, edge digest, hash, +policy version, extracted provenance, and file/node provenance. They are indexed +lexically and used for candidate generation and graph expansion; they are not +embedded by default. + +Dense vectors are reserved for `graph_first_v1` anchors. Allowed reasons are +`public_api`, `entrypoint`, `documented_nontrivial`, `central_graph_node`, +`component_report`, and `unstructured_doc`. Rejected private trivial helpers, +generated/vendor code, test-only helpers, and local implementation details must +still be discoverable through symbol docs, source lexical search, exact symbol +lookup, and graph expansion. There is no anonymous foreground cap: every dense +or skipped symbol must be explainable through policy counters. + +Component reports are deterministic extracted graph artifacts. They group symbols +by crate/module/directory ownership and summarize central "god node" symbols +using import/call/reference shape. Reports are virtual docs in the lexical shard +and may be dense anchors with reason `component_report`. + ## Evidence Rules - Exact symbol and path evidence remains the precision floor. -- Semantic and graph evidence can expand or rank candidates, but cannot replace - a missing exact sidecar contract. +- Candidate generation order is exact symbol/AST lookup, lexical source and + virtual-doc search, graph expansion, then dense-anchor augmentation. +- Dense search must never be the only recall path for code symbols. +- Served search evidence should expose provenance labels such as `exact`, + `lexical_source`, `symbol_doc`, `graph_neighbor`, `component_report`, and + `dense_anchor`. - Broad prompt retrieval should let lexical/source evidence compete with semantic evidence and should downrank tests, generated files, benchmarks, and vendor paths unless the query explicitly asks for those roles. @@ -115,3 +147,8 @@ Promotion requires at least: - local-real quality that beats the prior accepted baseline, - no diagnostic/stub/hash product evidence, - docs and runbooks aligned with the current mandatory sidecar contract. + +Proof tiers, promotion checklist, and north-star SLOs: +[`retrieval-architecture.md`](../testing/retrieval-architecture.md). Setup, +version pins, env vars, and CI smoke: +[`retrieval-sidecars.md`](../ops/retrieval-sidecars.md). diff --git a/docs/architecture/retrieval-parser-compat-matrix.md b/docs/architecture/retrieval-parser-compat-matrix.md deleted file mode 100644 index a297896d..00000000 --- a/docs/architecture/retrieval-parser-compat-matrix.md +++ /dev/null @@ -1,46 +0,0 @@ -# Retrieval parser compatibility matrix (ws-a-parser-compat) - -This records Step 2 parser compatibility decisions from `retrieval-language-support_038d3ae9.plan.md` against the workspace policy: - -- `tree-sitter = "0.24"` -- `tree-sitter-graph = "0.12"` - -## Validation method - -Checked candidate parser crates in an isolated temporary probe crate (outside workspace members) with this dependency shape: - -```toml -[dependencies] -tree-sitter = "0.24" -tree-sitter-graph = "0.12" - = "=" -``` - -For each language, ran `cargo check` after pinning exactly one parser crate/version. - -## Decision matrix - -| Language | Candidate crate | Version checked | `cargo check` with 0.24/0.12 | Decision | Notes | -|---|---|---:|---|---|---| -| Go | `tree-sitter-go` | `0.23.4` | pass (`cargo check` + parse smoke) | crates.io pin | `0.25.0` compiles but fails at runtime with `LanguageError { version: 15 }` on tree-sitter `0.24`. | -| Ruby | `tree-sitter-ruby` | `0.23.1` | pass (`cargo check` + parse smoke) | crates.io pin | Wired in indexer with `rules/ruby.scm`. | -| PHP | `tree-sitter-php` | `0.23.11` | pass (`cargo check` + parse smoke) | crates.io pin | `0.24.2` compiles but fails at runtime with `LanguageError { version: 15 }` on tree-sitter `0.24`. | -| C# | `tree-sitter-c-sharp` | `=0.23.0` | pass (`cargo check` + parse smoke) | crates.io pin | `0.23.5` compiles but fails at runtime with `LanguageError { version: 15 }` on tree-sitter `0.24`. | -| Kotlin | `tree-sitter-kotlin-ng` | `1.1.0` | pass | crates.io pin | Use `-ng` crate family for Kotlin parser wiring. | -| Swift | `tree-sitter-swift` | `0.7.2` | pass | crates.io pin | crates.io source compiles with policy pins. | -| Dart | `tree-sitter-dart` | `0.2.0` | pass | crates.io pin | crates.io source compiles with policy pins. | -| HTML | `tree-sitter-html` | `0.23.2` | pass | crates.io pin | Parser is available if structural extraction chooses parser-backed route. | -| CSS | `tree-sitter-css` | `0.25.0` | pass | crates.io pin | Parser is available if structural extraction chooses parser-backed route. | -| SQL | `tree-sitter-sequel` | `0.3.11` | pass | crates.io pin | SQL parser candidate compiles with policy pins. | -| Bash | `tree-sitter-bash` | `0.25.1` | pass | crates.io pin | Supports script-language parser path if/when enabled. | - -## Current outcome - -- No language in this matrix currently requires a git pin, custom fork, or forced text-only fallback for **parser-policy compatibility**. -- Go, Ruby, PHP, and C# have parser dependencies, rule assets, and extension - routing wired in the current branch. -- HTML, CSS, and SQL have structural extraction paths, but they are not - parser-backed rule assets from this matrix. -- Kotlin, Swift, Dart, and Bash remain compatibility decisions only. They still - need dependency wiring, rule assets, language routing, and fidelity coverage - before they should be described as parser-backed runtime support. diff --git a/docs/architecture/runtime-execution-path.md b/docs/architecture/runtime-execution-path.md index 4bc4289e..372c3ac6 100644 --- a/docs/architecture/runtime-execution-path.md +++ b/docs/architecture/runtime-execution-path.md @@ -6,36 +6,15 @@ This page describes the current command path for the core CLI workflows: ## Index Command -```mermaid -sequenceDiagram - participant CLI as codestory-cli - participant Runtime as codestory-runtime - participant Workspace as codestory-workspace - participant Indexer as codestory-indexer - participant Store as codestory-store - participant Search as runtime search - - CLI->>Runtime: parse args and build context - Runtime->>Workspace: open project and compute refresh inputs - Workspace-->>Runtime: refresh plan - Runtime->>Store: open staged or live store - Runtime->>Indexer: run WorkspaceIndexer - Indexer->>Store: flush graph, projections, search docs - Runtime->>Store: publish staged snapshot when a full refresh completes - Runtime->>Search: sync lexical projection and semantic docs - Search->>Store: reuse, embed, upsert, reload, and prune semantic docs -``` +See [indexing pipeline](indexing-pipeline.md) for the full indexing lifecycle, +refresh modes, and staged snapshot publish path. -1. `codestory-cli` parses the request and builds a runtime context. -2. `codestory-runtime` opens the project root, store path, and workspace manifest. -3. `codestory-workspace` computes the refresh plan from discovery plus stored file inventory. -4. `codestory-runtime` opens a staged or live store depending on refresh mode. -5. `codestory-indexer::WorkspaceIndexer` parses files, extracts graph artifacts, flushes projection batches, and runs resolution. -6. `codestory-store` updates graph rows, occurrence rows, callable projection state, search-doc rows, and snapshot invalidation state. -7. Runtime finalizes staged builds through `SnapshotStore` and publishes the finished snapshot when a full refresh completes. -8. Runtime refreshes the search-symbol projection and synchronizes semantic docs before returning the index summary. +At runtime, `codestory-cli` delegates to `codestory-runtime`, which opens the +workspace refresh plan, runs `codestory-indexer::WorkspaceIndexer`, flushes graph +and search projections through `codestory-store`, and synchronizes symbol docs, +component reports, and selected dense anchors before returning the index summary. -Default index runs do not defer semantic docs. When embedding assets are available, the returned retrieval state should have `semantic_ready = true` and a non-zero semantic doc count. If semantic assets are missing or hybrid retrieval is disabled, runtime still completes graph and lexical state and reports the degraded-state reason. +Default index runs do not defer symbol docs. When embedding assets are available, the returned retrieval state reports the selected dense-anchor corpus for `graph_first_v1`; that corpus may be zero for graph-only projects. If embedding assets are missing, runtime still completes graph, lexical, symbol-doc, and component-report state and reports the degraded-state reason instead of pretending dense retrieval is ready. ## Search Command @@ -61,13 +40,15 @@ sequenceDiagram 2. Runtime asks `codestory-retrieval` for sidecar status before serving results. 3. Retrieval status loads the stored retrieval manifest, applies stale-manifest checks, and reports the exact degraded reason before any healthy sidecar probe can bless an invalid manifest. 4. `retrieval_mode = full` is the only product-serving search path. Missing, stale, partial, or non-product sidecar state fails closed with the degraded reason. -5. Runtime executes the mandatory sidecar query, resolves returned candidates back into indexed symbols, and rejects unresolved or non-full candidate sets before returning product hits. +5. Runtime executes the mandatory sidecar query in AST-first order: exact symbol/AST lookup, lexical source and virtual-doc search, graph expansion, then dense-anchor augmentation. It resolves returned candidates back into indexed symbols and rejects unresolved or non-full candidate sets before returning product hits. 6. Hybrid semantic state, repo-text matches, and local lexical search are diagnostic/navigation surfaces only; they are not a product fallback for `search`. 7. For broad architecture-style queries, runtime may assemble a Search Plan with extracted/dropped terms, bounded subqueries, candidate windows, anchor groups, bridge evidence, next commands, and source-truth checks. 8. Runtime maps retrieval state plus resolved sidecar matches into contract DTOs and CLI renders them. When `search --why` is requested, the CLI renders compact explanations from the same DTO surface: sidecar origin, degraded/fail-closed state, candidate +provenance (`exact`, `lexical_source`, `symbol_doc`, `graph_neighbor`, +`component_report`, `dense_anchor`), resolution details, and the Search Plan when the broad-query planner emitted one. Legacy hybrid score details may appear only as diagnostic data from non-serving paths. @@ -119,7 +100,7 @@ stdio MCP-style resources/prompts/tools. `doctor` opens the project summary and reports cache/index/retrieval health without mutating state. `explore` remains the browser surface until the -[browser surface gate](browser-surface-gate.md) is satisfied. Do not add a +[browser surface gate](overview.md#browser-surface-gate) is satisfied. Do not add a separate `browse` command, web UI route, or browser-specific UI without current manifest, warm-loop, stress-lane, explore, and screenshot-review evidence. diff --git a/docs/architecture/subsystems/indexer.md b/docs/architecture/subsystems/indexer.md index cd29c731..570a6bea 100644 --- a/docs/architecture/subsystems/indexer.md +++ b/docs/architecture/subsystems/indexer.md @@ -44,18 +44,10 @@ Incremental work also does more cleanup: ## Pipeline -The core path inside `WorkspaceIndexer::run` is: - -1. Seed symbol state for incremental runs from existing stored node kinds. -2. Walk `files_to_index` in chunks using the configured batch sizes. -3. For each file, normalize the path, load compilation metadata if available, and skip unsupported files before parsing. -4. Try the artifact cache first. A cache hit can reuse stored nodes, edges, occurrences, component access, and callable projection state without reparsing the file. -5. Parse cache misses in parallel and turn each file into `IntermediateStorage`. -6. Merge per-file results into a batched in-memory projection and flush once file, node, edge, or occurrence thresholds are reached. -7. Flush any remaining batched data. -8. Run `ResolutionPass` after all projection writes are visible in the store. -9. Flush collected indexing errors. -10. For incremental runs, delete removed files from the store. +The core path inside `WorkspaceIndexer::run` is documented in the +[indexing pipeline](../indexing-pipeline.md). At a high level: discover files, +parse or reuse cached artifacts, flush projection batches, run `ResolutionPass`, +and clean up removed files on incremental runs. ## What Gets Flushed diff --git a/docs/architecture/subsystems/runtime.md b/docs/architecture/subsystems/runtime.md index f0adc618..4c61a2ad 100644 --- a/docs/architecture/subsystems/runtime.md +++ b/docs/architecture/subsystems/runtime.md @@ -7,7 +7,7 @@ - project open and summary flows - full and incremental indexing orchestration - runtime-owned search engine state and ranking -- semantic doc synchronization, embedding reuse, and retrieval readiness reporting +- symbol-doc synchronization, dense-anchor reuse, and retrieval readiness reporting - grounding, trail, symbol, and snippet assembly - agent-oriented retrieval and answer flows @@ -35,13 +35,13 @@ ## Search And Semantic Sync -Runtime owns the default semantic-sync path after graph indexing completes. The store owns persisted rows, but runtime decides when to build semantic docs, when to reuse or embed them, when to reload them into the search engine, and how to report readiness to CLI callers. +Runtime owns the default semantic-sync path after graph indexing completes. The store owns persisted rows, but runtime decides when to build graph-native symbol docs, when to build component reports, when to classify dense anchors under `graph_first_v1`, when to reuse or embed selected dense anchors, when to reload them into the search engine, and how to report readiness to CLI callers. Important tuning surfaces: -- `CODESTORY_SEMANTIC_DOC_SCOPE`: default durable symbols; use `all` for the older broad symbol set +- `CODESTORY_SEMANTIC_DOC_SCOPE`: default durable symbol-doc scope; use `all` only for diagnostics that need the older broad symbol set - `CODESTORY_SEMANTIC_DOC_ALIAS_MODE`: default `alias_variant`; use `no_alias` for baseline research rows or `current_alias` for the older full alias text -- `CODESTORY_SEMANTIC_DOC_MAX_TOKENS`: generated semantic-doc token budget. +- `CODESTORY_SEMANTIC_DOC_MAX_TOKENS`: generated symbol-doc and dense-anchor text token budget. - `CODESTORY_EMBED_BACKEND`: product sidecar indexing requires `llamacpp`. - `CODESTORY_EMBED_LLAMACPP_URL`: local OpenAI-compatible llama.cpp embedding endpoint for `CODESTORY_EMBED_BACKEND=llamacpp`. - `CODESTORY_EMBED_LLAMACPP_REQUEST_COUNT`: local llama.cpp request concurrency, clamped from `1` to `16`. @@ -59,11 +59,11 @@ the local llama.cpp sidecar when Docker Compose is available; `retrieval index` then writes generation-bound sidecar artifacts and manifest metadata. Missing or non-product embedding state fails closed for agent-facing retrieval. -Timing fields for this path are in `IndexingPhaseTimings`: `search_projection_rebuild_ms`, `search_symbol_index_ms`, `runtime_cache_publish_ms`, `semantic_doc_build_ms`, `semantic_embedding_ms`, `semantic_db_upsert_ms`, `semantic_reload_ms`, `semantic_prune_ms`, `semantic_docs_reused`, `semantic_docs_embedded`, `semantic_docs_pending`, and `semantic_docs_stale`. +Timing fields for this path are in `IndexingPhaseTimings`: `search_projection_rebuild_ms`, `search_symbol_index_ms`, `runtime_cache_publish_ms`, `semantic_doc_build_ms`, `semantic_embedding_ms`, `semantic_db_upsert_ms`, `semantic_reload_ms`, `semantic_prune_ms`, `symbol_search_docs_written`, `semantic_dense_docs_skipped`, dense reason counters, `semantic_docs_reused`, `semantic_docs_embedded`, `semantic_docs_pending`, and `semantic_docs_stale`. ## Failure Signatures - runtime regains direct persistence logic - search engine internals become public API - CLI formatting concerns start driving runtime behavior -- semantic docs become an implicit background side effect instead of an explicit index phase +- symbol docs or dense anchors become an implicit background side effect instead of an explicit index phase diff --git a/docs/concepts/how-codestory-works.md b/docs/concepts/how-codestory-works.md index d3f09c25..b4d86b73 100644 --- a/docs/concepts/how-codestory-works.md +++ b/docs/concepts/how-codestory-works.md @@ -15,8 +15,9 @@ doctor -> index -> ground -> search -> symbol/trail/snippet/explore -> context - `doctor` checks whether the cache, index, retrieval mode, and local embedding setup are usable. -- `index` builds or refreshes local graph, search, snapshot, and semantic-doc - state for one target repository. +- `index` builds or refreshes local graph, search, snapshot, graph-native + symbol-doc, component-report, and selected dense-anchor state for one target + repository. - `ground` gives broad orientation and reports limited coverage or gaps. - `search` finds candidate files, symbols, routes, literals, modules, or behavior terms. @@ -38,7 +39,10 @@ workspace path. The cache can include: - source snippets and occurrence locations - search projection rows and local search indexes - grounding snapshots rebuilt from the graph -- semantic docs, which are generated searchable summaries for durable symbols +- graph-native symbol docs, which are deterministic searchable summaries for + durable AST symbols +- selected dense anchors, which are the only generated docs embedded as vectors + under the active semantic policy Repository data stays local. Managed setup may fetch tool or model assets, but the indexed project evidence lives in the local cache. @@ -47,8 +51,12 @@ the indexed project evidence lives in the local cache. - Grounding is source-backed context: the files, symbols, and summaries a command returns so an answer can be tied back to repository evidence. -- A semantic doc is generated text for a symbol, stored so hybrid retrieval can - find relevant code even when the query words are not exact. +- A symbol doc is deterministic generated text for a symbol, stored so lexical + and graph retrieval can find relevant code even when the query words are not + exact. +- A dense anchor is a policy-selected symbol, component report, or unstructured + doc that receives a vector embedding. Code symbols do not need dense vectors + to be product-searchable. - A snapshot is a cached read model rebuilt from the local graph. If a snapshot is stale, the tool should say so. - A trail is a focused graph walk around one symbol: callers, callees, diff --git a/docs/contributors/debugging.md b/docs/contributors/debugging.md index 55dbf1bd..565fec33 100644 --- a/docs/contributors/debugging.md +++ b/docs/contributors/debugging.md @@ -83,7 +83,7 @@ Check: - whether the symbol exists in store-backed search docs - whether runtime rebuilt its search state after indexing - what retrieval mode `index`, `ground`, or `search` reported for the current run -- whether semantic retrieval is disabled, ONNX model/tokenizer paths are missing, sidecars are not full, or semantic docs are missing +- whether dense-anchor retrieval is disabled, ONNX model/tokenizer paths are missing, sidecars are not full, or symbol docs / dense anchors are missing - whether `CODESTORY_HYBRID_RETRIEVAL_ENABLED`, `CODESTORY_SEMANTIC_DOC_SCOPE`, `CODESTORY_EMBED_RUNTIME_MODE`, `CODESTORY_EMBED_BACKEND`, or the `CODESTORY_EMBED_ONNX_*` paths changed between runs - whether graph-based boosts are overwhelming lexical matches @@ -109,7 +109,7 @@ Common symptoms: - `index --refresh full` is much slower on an empty cache than on a repeat full refresh - graph timings are small but total index time is dominated by semantic work -- semantic docs are embedded again when they should be reused +- unchanged dense anchors are embedded again when they should be reused Start with: @@ -122,8 +122,8 @@ Start with: Check: - `semantic_ms.doc_build`, `semantic_ms.embedding`, `semantic_ms.db_upsert`, and `semantic_ms.reload` -- `semantic_docs.reused`, `semantic_docs.embedded`, `semantic_docs.pending`, and `semantic_docs.stale` -- whether `CODESTORY_SEMANTIC_DOC_SCOPE=all` is forcing the broad all-symbol semantic set +- `symbol_search_docs_written`, `semantic_dense_docs_skipped`, dense reason counters, `semantic_docs.reused`, `semantic_docs.embedded`, `semantic_docs.pending`, and `semantic_docs.stale` +- whether `CODESTORY_SEMANTIC_DOC_SCOPE=all` is forcing the broad all-symbol symbol-doc set - whether `CODESTORY_SEMANTIC_DOC_ALIAS_MODE` was changed from the profiled default of `alias_variant` - whether `CODESTORY_LLM_DOC_EMBED_BATCH_SIZE` was changed from the profiled default of `128` - whether mandatory sidecars report `retrieval_mode=full` according to `doctor` @@ -136,9 +136,9 @@ Check: Recovery order: 1. Run one measured cold E2E and append the headline numbers to `docs/testing/codestory-e2e-stats-log.md`. -2. Compare semantic embedded/reused counts before changing graph code. -3. For reuse regressions, inspect semantic doc version, generated text hash, embedding model, and embedding dimension. -4. For cold-only regressions, inspect durable semantic scope, length-bucket ordering, embedding batch size, sidecar health, and local embedding endpoint latency. +2. Compare symbol-doc counts, dense skipped/reason counts, and dense embedded/reused counts before changing graph code. +3. For reuse regressions, inspect semantic doc version, generated text hash, embedding profile/backend/model/dimension, document prefix, and semantic policy version. +4. For cold-only regressions, inspect durable symbol scope, dense-anchor policy, length-bucket ordering, embedding batch size, sidecar health, and local embedding endpoint latency. 5. For backend experiments, first verify the runtime is using the backend under test, then rerun the speed and quality comparisons documented in `docs/testing/embedding-backend-benchmarks.md`. ## If Grounding Is Wrong @@ -193,15 +193,15 @@ Check: Use this when you need to wipe state instead of debugging a clearly broken cache: -```powershell -.\target\release\codestory-cli.exe index --project . --refresh full +```sh +./target/release/codestory-cli index --project . --refresh full ``` If the cache directory itself needs to go: -```powershell -Remove-Item -LiteralPath -Recurse -Force -.\target\release\codestory-cli.exe index --project . --refresh full +```sh +mv .bak +./target/release/codestory-cli index --project . --refresh full ``` Keep the work serialized. Running multiple cargo or CLI indexing commands at once can hide the real failure behind lock contention and avoidable memory pressure. diff --git a/docs/contributors/getting-started.md b/docs/contributors/getting-started.md index 765202de..b398ed38 100644 --- a/docs/contributors/getting-started.md +++ b/docs/contributors/getting-started.md @@ -4,7 +4,7 @@ Run these from the repo root: -```powershell +```sh cargo fmt --check cargo check cargo test -p codestory-cli @@ -19,23 +19,30 @@ If you touch runtime search, grounding, or repo-scale indexing behavior, check t After the basic cargo checks, verify the shipped CLI flow with the built binary instead of `cargo run`: -```powershell +```sh cargo build --release -p codestory-cli -.\target\release\codestory-cli.exe setup embeddings --project . --dry-run -.\target\release\codestory-cli.exe index --project . --refresh auto -.\target\release\codestory-cli.exe search --project . --query WorkspaceIndexer --why -.\target\release\codestory-cli.exe context --project . --query WorkspaceIndexer -.\target\release\codestory-cli.exe doctor --project . +./target/release/codestory-cli setup embeddings --project . --dry-run +./target/release/codestory-cli index --project . --refresh auto +./target/release/codestory-cli ready --project . --goal local +./target/release/codestory-cli ground --project . --why +./target/release/codestory-cli files --project . --limit 20 +./target/release/codestory-cli doctor --project . ``` +On Windows PowerShell, use `.\target\release\codestory-cli.exe`. + Read commands default to `--refresh none`. If a read command says the cache is empty, either run `index --refresh full` first or rerun the read command with an explicit refresh mode. +The first loop above exercises local navigation only. Agent-facing `packet` and +`search` evidence require full retrieval sidecars; prepare the sidecar lane +below before treating those commands as product-quality proof. ## Hybrid Retrieval Setup Use the managed full-sidecar path before debugging ranking quality: - managed real-model setup: `node scripts/setup-retrieval-env.mjs --fetch-embed-model`, then `codestory-cli retrieval bootstrap --project .` -- default semantic scope: durable symbols only; set `CODESTORY_SEMANTIC_DOC_SCOPE=all` when you intentionally need the broad all-symbol semantic doc set +- default symbol-doc scope: durable symbols only; set `CODESTORY_SEMANTIC_DOC_SCOPE=all` when you intentionally need the broad all-symbol diagnostic symbol-doc set +- default dense policy: `graph_first_v1` embeds only selected dense anchors; private trivial code remains searchable through symbol docs, lexical source, and graph expansion - default semantic alias mode: compact aliases; set `CODESTORY_SEMANTIC_DOC_ALIAS_MODE=no_alias` or `current_alias` only when reproducing benchmark rows - embedding throughput tuning: `CODESTORY_LLM_DOC_EMBED_BATCH_SIZE` and local llama.cpp sidecar settings @@ -43,6 +50,15 @@ Hash embeddings, ONNX-only flows, and lexical-only switches are diagnostic or historical comparison modes only; they are not valid agent-facing retrieval setup. +After bootstrap, run a target-repo sidecar index before using packet/search: + +```sh +./target/release/codestory-cli index --project . --refresh full +./target/release/codestory-cli retrieval index --project . --refresh full +./target/release/codestory-cli retrieval status --project . --format json +./target/release/codestory-cli ready --project . --goal agent +``` + `index`, `ground`, `search`, `context`, and `doctor` report the active retrieval mode plus any degraded-state reason when retrieval state is available, so confirm that output before assuming the ranking logic regressed. Agent-facing retrieval requires `retrieval_mode=full`. ## Recommended Reading Order diff --git a/docs/contributors/retrieval-sidecar-smoke-ci.md b/docs/contributors/retrieval-sidecar-smoke-ci.md deleted file mode 100644 index ee9e79c4..00000000 --- a/docs/contributors/retrieval-sidecar-smoke-ci.md +++ /dev/null @@ -1,98 +0,0 @@ -# CI manifest-missing smoke: `retrieval-sidecar-smoke` (Windows) - -**Status:** workflow checked in at [`.github/workflows/retrieval-sidecar-smoke.yml`](../../.github/workflows/retrieval-sidecar-smoke.yml). -Full index/query on the monorepo may exceed runner budgets; the job runs bootstrap with -`--skip-compose --wait-secs 0`, asserts `retrieval status` returns the clean pre-index -`retrieval_manifest_missing` shape through the CLI integration test suite, and runs -runtime/retrieval protocol plus non-live CLI search contract tests. This job is not a full sidecar -readiness gate. The workflow restores a Rust build cache before the Cargo steps; a new cache key may -still pay one cold compile, but later pushes should reuse the warmed target and Cargo dependency -state. - -**Preflight reference:** [`docs/ops/retrieval-sidecars.md`](../ops/retrieval-sidecars.md#preflight-smoke-contract) - ---- - -## Purpose - -Fail PRs that touch retrieval/runtime/stdio/search wiring when the manifest-missing status shape -or associated Rust contracts drift on a clean Windows runner. - -## Trigger paths (suggested) - -```yaml -paths: - - crates/codestory-retrieval/** - - crates/codestory-cli/src/**/retrieval* - - crates/codestory-cli/src/stdio_*.rs - - crates/codestory-cli/tests/retrieval_bootstrap_contracts.rs - - crates/codestory-cli/tests/search_json_output.rs - - crates/codestory-cli/tests/stdio_protocol_contracts.rs - - crates/codestory-runtime/src/** - - crates/codestory-indexer/Cargo.toml - - crates/codestory-indexer/src/lib.rs - - docs/ops/retrieval-sidecars.md -``` - -## Job sketch (PowerShell) - -```powershell -# After checkout, Node setup, Rust toolchain setup, and Rust cache restore: -node scripts/lint-retrieval-generalization.mjs -cargo test -p codestory-cli --test retrieval_bootstrap_contracts -cargo test -p codestory-runtime --lib -cargo test -p codestory-runtime --test retrieval_generalization_guard -cargo test -p codestory-cli --test stdio_protocol_contracts -cargo test -p codestory-cli --test search_json_output -cargo test -p codestory-retrieval -``` - -Use a tiny fixture repo if this workflow later grows to include indexed full-mode smoke coverage; -bootstrap with `--skip-compose` does not start sidecars, fetch the GGUF model, or create the -retrieval manifest required for `retrieval_mode == "full"`. - -## Pass criteria - -1. Generalization lint exits 0. -2. Rust cache restore/save completes or gracefully misses without masking later failures. -3. `cargo test -p codestory-cli --test retrieval_bootstrap_contracts` exits 0, including the - bootstrap/status assertion that reports `degraded_reason == "retrieval_manifest_missing"` and - non-`full` mode on a clean temp project before indexing. -4. `cargo test -p codestory-runtime --lib` exits 0. -5. `cargo test -p codestory-runtime --test retrieval_generalization_guard` exits 0. -6. `cargo test -p codestory-cli --test stdio_protocol_contracts` exits 0. -7. `cargo test -p codestory-cli --test search_json_output` exits 0 for non-live fail-closed search contracts. -8. `cargo test -p codestory-retrieval` exits 0. - -## Pins - -Match [`docs/ops/retrieval-sidecars.md`](../ops/retrieval-sidecars.md) version table (real Zoekt, -`qdrant/qdrant:v1.12.5`, generated SCIP graph artifacts). - -## Related tests (local substitute) - -```powershell -node scripts/lint-retrieval-generalization.mjs -cargo test -p codestory-cli --test retrieval_bootstrap_contracts -cargo test -p codestory-runtime --lib -cargo test -p codestory-runtime --test retrieval_generalization_guard -cargo test -p codestory-cli --test stdio_protocol_contracts -cargo test -p codestory-cli --test search_json_output -cargo test -p codestory-retrieval -``` - -The workflow runs the lint script and focused test targets. The manifest-missing smoke lives in -`retrieval_bootstrap_contracts` so Cargo builds the CLI through the integration-test path instead of -paying for a standalone build step before the tests. The Rust cache is configured to save even on -failure, which keeps failed follow-up pushes from repeatedly paying the full Windows cold-compile -cost. `retrieval_generalization_guard` invokes the same lint from Rust for cross-platform CI parity. -This smoke job does not claim stdio, CLI, or runtime full-mode success. Full readiness evidence -requires a separate fixture run that starts real sidecars, provisions `bge-base-en-v1.5.Q8_0.gguf`, -runs `retrieval index`, and verifies `retrieval_mode == "full"`. The live success contracts are -intentionally outside the normal smoke gate: set `CODESTORY_STDIO_FULL_RETRIEVAL_TESTS=1` before -running the stdio full-mode -contracts with `-- --ignored --nocapture`, run -`cargo test -p codestory-cli --test search_json_output -- --ignored --nocapture search_json_emits_sidecar_primary_results_without_repo_text_fallback` -for the CLI lane, and run the ignored `retrieval_eval_*` tests with -`CODESTORY_RETRIEVAL_EVAL_FULL_TESTS=1` only after the sidecar fixture is prepared. Without those -preconditions, the live lanes are blocked/skipped by name rather than silently passing. diff --git a/docs/contributors/testing-matrix.md b/docs/contributors/testing-matrix.md index 1d566598..01c0487d 100644 --- a/docs/contributors/testing-matrix.md +++ b/docs/contributors/testing-matrix.md @@ -1,6 +1,8 @@ # Testing Matrix Run Cargo verifications serially in this repo. The workspace shares build locks. +Examples use POSIX shell syntax. On Windows PowerShell, use environment +assignments such as `$env:NAME = "value"`. ```mermaid flowchart TD @@ -24,7 +26,7 @@ flowchart TD ## Whole Workspace -```powershell +```sh cargo fmt --check cargo check cargo test @@ -37,7 +39,7 @@ These are the default checks for any contributor change. If you only changed `README.md` or `docs/**`, use the smallest credible lane: -```powershell +```sh cargo fmt --check cargo test -p codestory-cli --test onboarding_contracts ``` @@ -46,24 +48,55 @@ Only escalate to broader cargo checks if the doc change depends on new code beha ## Indexer And Graph Fidelity -```powershell +```sh cargo test -p codestory-indexer --test fidelity_regression cargo test -p codestory-indexer --test tictactoe_language_coverage cargo test -p codestory-indexer --test integration +cargo test -p codestory-indexer --test trait_interface_resolution ``` Run these whenever the change affects parsing, extraction, semantic resolution, or graph fidelity. Use the full test binaries above instead of filtered `cargo test` invocations. +Use [language-support.md](../architecture/language-support.md) when deciding +whether a language claim is parser-backed graph, structural collector, or only +a candidate parser compatibility record. + +The opt-in OSS corpus lane checks every public language-support profile against a +pinned medium-sized open source project and compares a raw filesystem baseline +with CodeStory indexing of the same file set: + +```sh +CODESTORY_RUN_OSS_LANGUAGE_CORPUS=1 cargo test -p codestory-indexer --test oss_language_corpus -- --ignored --nocapture +``` + +See [oss-language-corpus.md](../testing/oss-language-corpus.md) for PowerShell commands, +language filtering, cache configuration, and the JSONL report path. + +That corpus is not the strict agent A/B comparison. For language-level +with/without CodeStory agent evidence, run the manifest-backed holdout suite: + +```sh +node scripts/codestory-agent-ab-benchmark.mjs \ + --task-suite language-expansion-holdout \ + --arms without_codestory,with_codestory \ + --repeats 3 --materialize-repos --prepare-codestory-cache \ + --out-dir target/agent-benchmark/language-expansion-holdout \ + --timeout-ms 600000 +``` + +The A/B ledger records elapsed time, tokens, estimated cost, observed tool +calls, command counts, source reads, post-packet source reads, and manifest +quality gates for each supported language task. ## Store Changes -```powershell +```sh cargo test -p codestory-store ``` ## Runtime Changes -```powershell +```sh cargo test -p codestory-runtime cargo test -p codestory-runtime --test retrieval_eval ``` @@ -75,16 +108,17 @@ The repo-scale runtime integration test is ignored by default because it indexes `codestory` workspace and can exhaust memory on developer machines. Only run it as an explicit heavy lane: -```powershell -$env:CODESTORY_RUN_REPO_SCALE_TEST = "1" +```sh +export CODESTORY_RUN_REPO_SCALE_TEST=1 cargo test -p codestory-runtime --test integration test_repo_scale_call_resolution -- --ignored --nocapture ``` ## Repo-Scale Semantic And Cold-Start Checks -Run this lane when default `index` behavior, semantic doc persistence, embedding reuse, or cold-start performance changes: +Run this lane when default `index` behavior, symbol-doc persistence, dense-anchor +persistence/reuse, embedding reuse, or cold-start performance changes: -```powershell +```sh cargo build --release -p codestory-cli cargo test -p codestory-cli --test codestory_repo_e2e_stats -- --ignored --nocapture ``` @@ -95,14 +129,21 @@ only to make that separate drill skip explicit during local release-evidence collection. A skipped drill means the release evidence is not real-repo drill proof; it does not rename the `proof_tier` emitted by the stats JSON. -Append the emitted headline metrics to `docs/testing/codestory-e2e-stats-log.md`. Include graph seconds, semantic seconds, semantic docs reused, semantic docs embedded, total index seconds, `retrieval_index_seconds`, `retrieval_status_seconds`, `proof_tier`, any `warnings`, and whether `sidecar_status_after_retrieval_index` plus `search.sidecar_shadow_retrieval_mode` were `full`. +Append the emitted headline metrics to `docs/testing/codestory-e2e-stats-log.md`. +Include graph seconds, semantic seconds, symbol docs written, dense docs skipped, +dense reason counts, dense docs reused, dense docs embedded, total index +seconds, `repeat_full_refresh_seconds`, repeat graph/semantic/cache/search +timings, `retrieval_index_seconds`, `retrieval_status_seconds`, +`report_seconds`, `proof_tier`, any `warnings`, and whether +`sidecar_status_after_retrieval_index` plus `search.sidecar_shadow_retrieval_mode` +were `full`. Release-readiness evidence is tiered: | Evidence tier | Required proof | Release meaning | | --- | --- | --- | | Stats-only / degraded sidecar | Diagnostic timing or contract evidence without prepared full sidecars, or stats output whose `proof_tier` is `stats_only` | Useful local regression signal only; not release proof for packet/search readiness. The current passing `codestory_repo_release_e2e_emits_stats` harness asserts full sidecar status instead of completing as a passing no-full-sidecar row. | -| Full sidecar | `codestory_repo_release_e2e_emits_stats` emits `proof_tier: "full_sidecar"` after local Zoekt, Qdrant, SCIP, and llama.cpp are running; `retrieval index --refresh full` succeeds; `retrieval status --format json` reports `retrieval_mode: "full"`; and search shadow mode is `full` | Required before claiming agent-facing packet/search readiness on the current workspace. This is the normal tier for a passing stats JSON object from the release e2e stats harness. | +| Full sidecar | `codestory_repo_release_e2e_emits_stats` emits `proof_tier: "full_sidecar"` after local Zoekt, SCIP, and required dense-anchor Qdrant/llama.cpp are prepared; `retrieval index --refresh full` succeeds; `retrieval status --format json` reports `retrieval_mode: "full"` with current symbol-doc and dense-anchor manifest fields; and search shadow mode is `full` | Required before claiming agent-facing packet/search readiness on the current workspace. This is the normal tier for a passing stats JSON object from the release e2e stats harness. | | Real-repo drill | `CODESTORY_REAL_REPO_DRILL_CASES` points at prepared manifests and the drill cases run without skip allowances | Required before claiming the release was exercised beyond the CodeStory checkout. | | Promotion-grade benchmark | Baseline and candidate benchmark rows are captured with sidecar status, search shadow mode, and no-regression threshold | Required for performance or retrieval-quality promotion claims. | @@ -121,6 +162,7 @@ stay visible in logged evidence: | --- | --- | | Total index time | `index_seconds > 600` | | Semantic phase time | `semantic_phase_seconds > 500` | +| AST-first cold index gate | cold CodeStory product index is not under 180s or `semantic_embedding_ms` is not at least 70% below same-run baseline | Preserve those warning strings when copying the run into release evidence. An empty `warnings` array only means the measured run stayed under these warning @@ -133,7 +175,7 @@ examples only; do not copy them into current performance claims. ## CLI Boundary And Output Changes -```powershell +```sh cargo test -p codestory-cli ``` @@ -141,28 +183,39 @@ Prefer this lane before `cargo test` for the whole workspace when the change is Runtime-backed CLI fixture flows are a separate heavier lane: -```powershell +```sh cargo test -p codestory-cli --test runtime_backed_flows -- --ignored ``` Run that lane only when the change crosses CLI and runtime behavior together, such as auto-refresh handling or file-filtered symbol resolution. +The local real-repo agent-quality lane is ignored by default and must evaluate +at least one sibling repository when run: + +```sh +cargo test -p codestory-cli --test agent_quality_eval -- --ignored --nocapture +``` + +Set `CODESTORY_ALLOW_SKIP_LOCAL_REAL_AGENT_QUALITY=1` only when intentionally +collecting skip-only local evidence because none of the sibling repositories are +present. A zero-evaluated run is not quality proof. + ## Bench Surface Checks -```powershell +```sh node scripts/semantic-doc-leakage-check.mjs cargo check -p codestory-bench --benches ``` When changing embedding backends, model profiles, pooling, prefixes, batching, -hardware-provider settings, or generated semantic-doc text, run the semantic-doc +hardware-provider settings, generated symbol-doc text, or dense-anchor text, run the semantic-doc leakage check before trusting benchmark scores. It fails when production -semantic-doc concept phrases copy or closely overlap benchmark query text. Use +generated-doc concept phrases copy or closely overlap benchmark query text. Use `CODESTORY_EMBED_RESEARCH_QUERY_SPLIT=dev` for exploratory tuning and `CODESTORY_EMBED_RESEARCH_QUERY_SPLIT=holdout` for promotion evidence; dev-only rows have `promotion_eligible=false` and must not be promoted. Cache replay is blocked unless `CODESTORY_EMBED_RESEARCH_ALLOW_CACHE_REPLAY=1` is set, so stale -semantic-doc caches cannot silently seed a new benchmark lane. Queries that +generated-doc caches cannot silently seed a new benchmark lane. Queries that previously appeared in leaked production semantic-doc aliases are excluded by default; set `CODESTORY_EMBED_RESEARCH_INCLUDE_TAINTED_QUERIES=1` only when intentionally reproducing the invalidated historical slice. Also @@ -174,17 +227,17 @@ and decision current in the matrix instead of adding raw run transcripts. For indexing performance work, run the full bench when practical: -```powershell +```sh cargo bench -p codestory-bench --bench indexing ``` For browser-scale stress work, start with the smoke lane and only opt into larger synthetic repos when the machine and change justify it: -```powershell +```sh cargo bench -p codestory-bench --bench browser_stress -$env:CODESTORY_STRESS_SCALE = "large" # 1k + 10k -$env:CODESTORY_ALLOW_HEAVY_STRESS = "1" +export CODESTORY_STRESS_SCALE=large # 1k + 10k +export CODESTORY_ALLOW_HEAVY_STRESS=1 cargo bench -p codestory-bench --bench browser_stress ``` diff --git a/docs/decision-log.md b/docs/decision-log.md deleted file mode 100644 index be8598fd..00000000 --- a/docs/decision-log.md +++ /dev/null @@ -1,45 +0,0 @@ -# Architecture History - -This page is a short summary of the durable architecture choices that still shape the current workspace. - -Use the architecture pages as the source of truth for how the system works today. - -## Durable Boundaries - -CodeStory stays split into durable owning crates so contributors can reason about source-of-truth responsibilities without tracing every call path first. - -- boundaries and dependency direction: [architecture overview](architecture/overview.md) -- runtime-owned orchestration path: [runtime execution path](architecture/runtime-execution-path.md) - -## Workspace Plans, Indexer Execution, Store Persistence - -Refresh planning stays in `codestory-workspace`, parse and resolution work stay in `codestory-indexer`, and persistence plus snapshot lifecycle stay in `codestory-store`. - -- planning and ownership overview: [architecture overview](architecture/overview.md) -- full pipeline from CLI to store state: [indexing pipeline](architecture/indexing-pipeline.md) -- crate-specific ownership: [workspace subsystem](architecture/subsystems/workspace.md), [indexer subsystem](architecture/subsystems/indexer.md), [store subsystem](architecture/subsystems/store.md) - -## Snapshot Lifecycle Stays Store-Owned - -Runtime decides when to run full or incremental indexing, but staged-build preparation, staged publish, live snapshot refresh, and derived-state invalidation remain store mechanics. - -- runtime orchestration: [runtime execution path](architecture/runtime-execution-path.md) -- storage responsibilities: [store subsystem](architecture/subsystems/store.md) -- index-to-snapshot path: [indexing pipeline](architecture/indexing-pipeline.md) - -## Retrieval And Grounding Stay Runtime-Orchestrated - -Search ranking, grounding assembly, fallback reporting, and other workflow orchestration stay in runtime instead of leaking into CLI or storage adapters. - -- runtime ownership: [runtime subsystem](architecture/subsystems/runtime.md) -- command path context: [runtime execution path](architecture/runtime-execution-path.md) - -## Default Index Includes Semantic Docs - -Semantic docs are part of the default `codestory-cli index` contract. Runtime synchronizes durable semantic docs before returning instead of relying on a later read command to hydrate them. - -- semantic sync behavior: [indexing pipeline](architecture/indexing-pipeline.md) -- tuning and ownership: [runtime subsystem](architecture/subsystems/runtime.md) -- measured repo-scale baselines: [codestory e2e stats log](testing/codestory-e2e-stats-log.md) - -Keep future architecture guidance in the owning architecture pages instead of reviving a separate ADR track. diff --git a/docs/glossary.md b/docs/glossary.md index cbfbc742..461860ef 100644 --- a/docs/glossary.md +++ b/docs/glossary.md @@ -11,9 +11,10 @@ - contracts: shared graph, DTO, and event types that are safe to depend on across boundaries - repo-text hit: a direct file-content match surfaced alongside indexed-symbol search results - retrieval mode: retrieval status contract for sidecar evidence; `retrieval_mode=full` is required for agent packet/search readiness -- semantic doc: generated per-symbol text plus an embedding stored in SQLite for hybrid retrieval +- symbol doc: deterministic generated per-symbol text stored in SQLite for graph-native lexical retrieval; it is not embedded by default +- dense anchor: a policy-selected symbol, component report, or unstructured doc that receives a vector embedding - local navigation readiness: the local cache, graph, lexical index, and DB-backed navigation commands are usable - agent packet/search readiness: sidecar packet/search evidence is trustworthy only when retrieval status reports `retrieval_mode=full` - target context: DB-first evidence for one concrete target; not a replacement for broad packet, search, or drill questions -- semantic ready: local diagnostic state where hybrid retrieval is enabled, an embedding runtime is available, and persisted semantic docs exist; not agent packet/search readiness +- semantic ready: local diagnostic state where dense-anchor retrieval is enabled, an embedding runtime is available when dense anchors exist, and persisted dense anchors match the active policy; not agent packet/search readiness - cache root: the directory that owns one project cache; by default this is under the user cache directory, but `--cache-dir` can override it diff --git a/docs/ops/retrieval-sidecars.md b/docs/ops/retrieval-sidecars.md index 707472a7..a3651ea9 100644 --- a/docs/ops/retrieval-sidecars.md +++ b/docs/ops/retrieval-sidecars.md @@ -11,7 +11,11 @@ and `affected` can be useful with a healthy local cache, but that cache alone does not prove packet/search sidecar readiness. **Design reference:** [`retrieval-design.md`](../architecture/retrieval-design.md) -(sidecar pins, degraded modes, preflight). +(mode definitions, cost envelopes, promotion guards). + +**Operations reference:** this runbook owns setup commands, version pins, env +vars, troubleshooting, and CI smoke sequences. Proof tiers and promotion +checklists live in [`retrieval-architecture.md`](../testing/retrieval-architecture.md). --- @@ -33,6 +37,9 @@ From the CodeStory repository root (Windows, macOS, Linux): cargo retrieval-setup ``` +This starts or checks the local sidecar services for the CodeStory checkout; it +does not by itself finalize the retrieval manifest for every target workspace. + Plain `codestory-cli index` builds the core SQLite code index only. It can make the local navigation lane usable, but it does not generate sidecar artifacts or prove agent packet/search readiness. Use @@ -43,21 +50,26 @@ agent-facing packet/search evidence. First-run evidence path: -```powershell +```sh node scripts/setup-retrieval-env.mjs --fetch-embed-model -$env:CODESTORY_EMBED_MODEL_DIR = (Resolve-Path .\target\retrieval-models).Path -$env:CODESTORY_EMBED_BACKEND = "llamacpp" -$env:CODESTORY_EMBED_LLAMACPP_URL = "http://127.0.0.1:8080/v1/embeddings" +export CODESTORY_EMBED_MODEL_DIR="$(pwd)/target/retrieval-models" +export CODESTORY_EMBED_BACKEND="llamacpp" +export CODESTORY_EMBED_LLAMACPP_URL="http://127.0.0.1:8080/v1/embeddings" cargo retrieval-setup -.\target\release\codestory-cli.exe index --project --refresh full -.\target\release\codestory-cli.exe retrieval index --project --refresh full -.\target\release\codestory-cli.exe retrieval status --project --format json +./target/release/codestory-cli index --project --refresh full +./target/release/codestory-cli retrieval index --project --refresh full +./target/release/codestory-cli retrieval status --project --format json ``` +On Windows PowerShell, use `.\target\release\codestory-cli.exe` and `$env:...` +assignments for the same flow. + `retrieval status` must show `retrieval_mode: "full"`. Its JSON backend fields distinguish the active query backend (`query_embedding_backend`), manifest -vector contract (`manifest_vector_embedding_backend`), and stored semantic-doc -producer (`stored_doc_vector_producer_backend`). +vector contract (`manifest_vector_embedding_backend`), and stored dense-anchor +producer (`stored_doc_vector_producer_backend`). Under `graph_first_v1`, a +generation can be full with zero dense anchors; in that case status reports the +Qdrant component as policy-skipped rather than querying a missing collection. Status after bootstrap: @@ -93,6 +105,14 @@ node scripts/setup-retrieval-env.mjs --with-holdout-clone | `--skip-build` | Skip `cargo build` (alias still builds on first `cargo retrieval-setup`) | | `--with-holdout-clone` | Also run `scripts/fetch-holdout-repos.mjs` (large git clones under `target/`) | +When `--fetch-embed-model` is present, the wrapper downloads +`bge-base-en-v1.5.Q8_0.gguf` to a process-scoped temporary file, verifies the +exact size (`117974304` bytes) and SHA-256 +(`ad1afe72cd6654a558667a3db10878b049a75bfd72912e1dabb91310d671173c`), and only +then renames it into `CODESTORY_EMBED_MODEL_DIR` or `target/retrieval-models`. +Existing model files must pass the same verification. Configured fallback URLs +are mirrors only because the same checksum gates every accepted artifact. + **Direct CLI** (equivalent to alias): ```sh @@ -131,23 +151,26 @@ Override ports with `CODESTORY_ZOEKT_PORT`, `CODESTORY_QDRANT_HTTP_PORT`, `CODES Project id is a stable FNV-1a hex hash of the canonical repo root (same scheme as CLI cache hashing). Sidecar artifacts are content-addressed by `sidecar_generation = -`. -The hash covers the local lexical input, symbol projection rows, semantic file roles, embedding -backend/dim, and sidecar schema version. Re-running `retrieval index` with unchanged inputs validates +The hash covers local source lexical input, generated `symbol_search_doc` virtual docs, +component-report virtual docs, dense-anchor rows, semantic file roles, embedding +backend/dim, semantic policy version, dense reason counts, and sidecar schema version. +Re-running `retrieval index` with unchanged inputs validates the live generation and reuses it instead of rewriting Zoekt, Qdrant, or SCIP. `retrieval status` and `retrieval query` fail closed when the manifest is obsolete or stale. A valid manifest must include the current sidecar schema version, input hash, derived generation id, derived -Qdrant collection, and matching stored semantic-doc vector count. If the SQLite projection or stored -semantic-doc contract changes after the manifest is written, rerun `retrieval index`; runtime paths +Qdrant collection, matching symbol-doc count, matching dense-anchor count, semantic policy version, +graph artifact hash, and dense reason counts. If the SQLite graph projection, symbol docs, dense-anchor +contract, or policy version changes after the manifest is written, rerun `retrieval index`; runtime paths will not infer or reuse bare project-id sidecars. -`retrieval index --refresh auto` repairs stale stored semantic-doc contracts by retrying once with a +`retrieval index --refresh auto` repairs stale stored symbol-doc or dense-anchor contracts by retrying once with a full refresh when finalization detects that the manifest would be unavailable immediately. Explicit `--refresh none` and failed explicit refreshes still fail closed instead of serving degraded sidecars. Confirm bindings with: -```powershell -.\target\release\codestory-cli.exe retrieval status --project . +```sh +./target/release/codestory-cli retrieval status --project . ``` --- @@ -156,9 +179,9 @@ Confirm bindings with: ### Bootstrap (recommended: Compose + cache dirs + wait) -```powershell +```sh cargo build --release -p codestory-cli -.\target\release\codestory-cli.exe retrieval bootstrap --project . +./target/release/codestory-cli retrieval bootstrap --project . ``` Starts `docker/retrieval-compose.yml` when Docker is available (`qdrant/qdrant:v1.12.5`, Zoekt @@ -168,11 +191,14 @@ Bootstrap removes stale pre-mandatory `codestory-zoekt-stub` containers before s real sidecars. It discovers the embed model directory from `CODESTORY_EMBED_MODEL_DIR`, `target/retrieval-models`, or `models/gguf/bge-base-en-v1.5` when the GGUF file is present. The embed service uses the measured local request geometry (`-np 6`, `-b 1024`, `-ub 1024`). -Qdrant document vectors are copied from the already-managed local `llm_symbol_doc` semantic -document table when the stored embedding contract is the product BGE base profile -(`bge-base-en-v1.5`, 768 dimensions, ONNX or llama.cpp backend). The llama.cpp sidecar remains -mandatory for query embeddings and live semantic smoke checks, but cold sidecar indexing must not -re-embed the whole stored semantic corpus just to populate Qdrant. +Qdrant document vectors are copied from the already-managed local `llm_symbol_doc` dense-anchor +table when the stored embedding contract is the product BGE base profile +(`bge-base-en-v1.5`, 768 dimensions, ONNX or llama.cpp backend). Under `graph_first_v1`, most code +symbols live only in `symbol_search_doc` and Zoekt; Qdrant contains selected dense anchors such as +entrypoints, public APIs, documented nontrivial symbols, central graph nodes, component reports, and +unstructured docs. The llama.cpp sidecar remains mandatory for query embeddings and live semantic +smoke checks when dense anchors exist, but cold sidecar indexing must not re-embed every code symbol +just to populate Qdrant. Qdrant query-time search uses the current Query API `POST /collections/{collection}/points/query` and requires `result.points[]` in the response; older search response shapes are treated as contract drift. Exact symbol queries are served from @@ -197,16 +223,16 @@ While Qdrant is reachable, pruning uses HTTP `DELETE /collections/{name}`; when ### Start sidecars (data dirs + state file only) -```powershell -.\target\release\codestory-cli.exe retrieval up +```sh +./target/release/codestory-cli retrieval up ``` Does **not** start Docker. Use `retrieval bootstrap` or the setup script for automated Compose. ### Health check -```powershell -.\target\release\codestory-cli.exe retrieval status --project . +```sh +./target/release/codestory-cli retrieval status --project . ``` JSON includes per-component `status`, `latency_ms`, `detail`, `capabilities` flags @@ -217,7 +243,7 @@ is allowed to serve agent-facing retrieval. | Component | Healthy when | |-----------|--------------| | zoekt | HTTP reachable on `6070`, real shard dir (no `.zoekt-stub` marker) | -| qdrant | collection exists, no stub marker under `{qdrant_data_dir}/codestory-stub-markers/{collection}.qdrant-stub` (obsolete `collections/{collection}/.qdrant-stub` also counts as stubbed), reported point count is at least the manifest projection count when available, and semantic smoke search returns repo-relative paths | +| qdrant | when manifest dense-anchor count is >0: collection exists, no stub marker under `{qdrant_data_dir}/codestory-stub-markers/{collection}.qdrant-stub` (obsolete `collections/{collection}/.qdrant-stub` also counts as stubbed), reported point count is at least the manifest dense projection count, and semantic smoke search returns repo-relative paths; when dense-anchor count is 0: reported healthy/semantic with an explicit policy-skipped detail and no collection probe | | scip | `symbols.index.json`, `index.scip`, and non-empty `revision.txt` exist under the manifest generation, with no `index.scip.stub` | ### Mandatory sidecars @@ -236,34 +262,35 @@ retrieval manifest, or make `retrieval status` report `full`. | Component | Status | |-----------|--------| | Zoekt | `retrieval index` builds `lexical-index.jsonl` shards for the active sidecar generation; client searches the manifest generation | -| Qdrant | 768-d bge-base vectors copied from stored local semantic docs are mandatory; `semantic=true` only after smoke search succeeds against the manifest collection and manifest records the product embedding backend | +| Qdrant | 768-d bge-base vectors copied from stored local dense anchors are mandatory when dense anchors exist; `semantic=true` only after smoke search succeeds against the manifest collection and manifest records the product embedding backend. If `graph_first_v1` selects zero dense anchors, Qdrant is intentionally skipped and full mode remains valid only with complete graph/lexical artifacts | | SCIP | Graph symbols emitted to `symbols.index.json` + `index.scip` under the active sidecar generation from the full SQLite symbol projection | ### Real embeddings (bge-base-en-v1.5 + llama.cpp) Promotion uses **768-d** vectors. Qdrant document vectors come from stored -semantic docs with product-compatible vector metadata. Query vectors come from +dense anchors with product-compatible vector metadata. Query vectors come from the local llama.cpp sidecar so retrieval remains sidecar-backed and can smoke-test the live collection. With real vectors enabled, an unset retrieval backend means this product llama.cpp contract; explicit ONNX or hash modes are diagnostic only and never produce `retrieval_mode=full`. 1. Download GGUF (once): `node scripts/setup-retrieval-env.mjs --fetch-embed-model` + verifies the pinned size/SHA-256 before the model is accepted. 2. Export (see [`docker/retrieval.env.example`](../../docker/retrieval.env.example)): - `CODESTORY_EMBED_MODEL_DIR=/target/retrieval-models` - `CODESTORY_EMBED_BACKEND=llamacpp` (recommended explicit product mode; unset is also product mode for retrieval commands) - `CODESTORY_EMBED_LLAMACPP_URL=http://127.0.0.1:8080/v1/embeddings` 3. `cargo retrieval-setup` (starts Qdrant, Zoekt webserver, `codestory-embed` on `:8080`) 4. Dim smoke: `curl -s http://127.0.0.1:8080/v1/embeddings -H "Content-Type: application/json" -d "{\"input\":[\"function\"]}"` → embedding length **768** -5. `retrieval index --project --refresh full` (manifest records `embedding_backend`, `embedding_dim`, `sidecar_input_hash`, `sidecar_generation`, and the generated Qdrant collection; the input hash includes stored semantic-doc metadata and embedding contract) +5. `retrieval index --project --refresh full` (manifest records `embedding_backend`, `embedding_dim`, `sidecar_input_hash`, `sidecar_generation`, the generated Qdrant collection, `symbol_doc_count`, `dense_projection_count`, `semantic_policy_version`, `graph_artifact_hash`, and dense reason counts; the input hash includes symbol-doc and dense-anchor metadata plus the embedding contract) 6. `retrieval status` → `retrieval_mode: full` and `capabilities.semantic=true` Wrong model dim with `CODESTORY_EMBED_BACKEND=llamacpp` fails loudly (no hash substitution). ### Index project -```powershell -.\target\release\codestory-cli.exe retrieval index --project . --refresh auto +```sh +./target/release/codestory-cli retrieval index --project . --refresh auto ``` Runs workspace index (same as `codestory index`) then persists `retrieval_index_manifest` in @@ -277,26 +304,28 @@ Index finalization writes new generations instead of mutating the manifest gener - SCIP artifacts: `scip//` The manifest is updated only after the generated sidecars are emitted. If the manifest hash, -schema version, projection count, embedding backend/dim, and live health still match, finalization +schema version, symbol-doc count, dense-anchor count, semantic policy version, graph artifact hash, +dense reason counts, embedding backend/dim, and live health still match, finalization returns the existing manifest and skips the rebuild path. This is the intended fast path for iterative evidence loops with `--refresh none` after a successful generation build. If a previous `retrieval index` attempt emitted generated artifacts but failed before manifest persist, finalization probes the would-be generation before rebuilding. Healthy Zoekt shards, complete Qdrant collections, and SCIP artifacts are reused independently. Qdrant reuse requires an -exact point count at least as large as the current stored semantic-doc vector count; a one-point or -otherwise partial collection is rebuilt instead of being blessed by semantic smoke alone. +exact point count at least as large as the current dense-anchor count; a one-point or otherwise +partial collection is rebuilt instead of being blessed by semantic smoke alone. When dense-anchor +count is zero, Qdrant reuse is skipped explicitly and cannot mask stale graph/lexical artifacts. ### Stop sidecars (state file only) -```powershell -.\target\release\codestory-cli.exe retrieval down +```sh +./target/release/codestory-cli retrieval down ``` ### Standalone query (Phase 2+) -```powershell -.\target\release\codestory-cli.exe retrieval query "ExtensionService" --project . +```sh +./target/release/codestory-cli retrieval query "ExtensionService" --project . ``` --- @@ -319,24 +348,26 @@ GGUF embedding model. **CI reduced sequence:** 1. generalization lint - exit 0 -2. release `codestory-cli` build - exit 0 -3. `retrieval bootstrap --project . --skip-compose --wait-secs 0` - exit 0 -4. `retrieval status --project .` - JSON reports the clean pre-index - `degraded_reason == "retrieval_manifest_missing"` state and must not report `retrieval_mode=full` -5. `cargo test -p codestory-runtime --lib` - exit 0 -6. `cargo test -p codestory-runtime --test retrieval_generalization_guard` - exit 0 -7. `cargo test -p codestory-cli --test stdio_protocol_contracts` - exit 0 -8. `cargo test -p codestory-cli --test search_json_output` - exit 0 for non-live fail-closed search contracts -9. `cargo test -p codestory-retrieval` - exit 0 +2. `cargo test -p codestory-cli --test retrieval_bootstrap_contracts` - exit 0; + this integration suite runs the clean pre-index bootstrap/status shape and + asserts `degraded_reason == "retrieval_manifest_missing"` without reporting + `retrieval_mode=full` +3. `cargo test -p codestory-runtime --lib` - exit 0 +4. `cargo test -p codestory-runtime --test retrieval_generalization_guard` - exit 0 +5. `cargo test -p codestory-cli --test stdio_protocol_contracts` - exit 0 +6. `cargo test -p codestory-cli --test search_json_output` - exit 0 for non-live fail-closed search contracts +7. `cargo test -p codestory-retrieval` - exit 0 The reduced CI sequence is a manifest-missing shape check only. It creates local cache/state directories and verifies status JSON plus runtime/stdio/search/retrieval contracts, but it does not start sidecars, fetch `bge-base-en-v1.5.Q8_0.gguf`, or build the project manifest required for -`retrieval_mode=full`. The included `search_json_output` suite covers non-live fail-closed search -behavior; it does not claim stdio, CLI, or runtime full-mode success. Full-mode gates must start -real sidecars, provision the GGUF model, index a fixture or target workspace, and verify -`retrieval_mode == "full"`. The live full-mode contracts are ignored or env-gated by default and -should be run explicitly only after those dependencies are prepared: set +`retrieval_mode=full`. The included `retrieval_bootstrap_contracts` suite builds the CLI through +Cargo's integration-test path instead of a standalone release build step. The included +`search_json_output` suite covers non-live fail-closed search behavior; it does not claim stdio, +CLI, or runtime full-mode success. Full-mode gates must start real sidecars, provision the GGUF +model, index a fixture or target workspace, and verify `retrieval_mode == "full"`. The live +full-mode contracts are ignored or env-gated by default and should be run explicitly only after +those dependencies are prepared: set `CODESTORY_STDIO_FULL_RETRIEVAL_TESTS=1` before running stdio full-mode contracts with `-- --ignored --nocapture`, `cargo test -p codestory-cli --test search_json_output -- --ignored --nocapture search_json_emits_sidecar_primary_results_without_repo_text_fallback` @@ -344,10 +375,43 @@ and the ignored `retrieval_eval_*` tests with `CODESTORY_RETRIEVAL_EVAL_FULL_TES **Failure policy:** PRs touching `codestory-retrieval` or sidecar wiring fail CI if smoke fails. +**CI trigger paths:** match +[`.github/workflows/retrieval-sidecar-smoke.yml`](../../.github/workflows/retrieval-sidecar-smoke.yml) +`paths:` filters — `crates/codestory-retrieval/**`, `crates/codestory-contracts/**`, +`crates/codestory-store/Cargo.toml`, `crates/codestory-store/src/**`, `crates/codestory-cli` +retrieval/stdio sources (`retrieval.rs`, `main.rs`, `args.rs`, `runtime.rs`, `stdio_*.rs`), +`crates/codestory-cli/tests/retrieval_bootstrap_contracts.rs`, +`search_json_output.rs`, `stdio_protocol_contracts.rs`, `crates/codestory-runtime/src/**`, +`crates/codestory-runtime/tests/retrieval_generalization_guard.rs`, +`crates/codestory-indexer/Cargo.toml`, `crates/codestory-indexer/src/lib.rs`, +`scripts/lint-retrieval-generalization.mjs`, `scripts/**retrieval**`, +`docs/ops/retrieval-sidecars.md`, `docs/architecture/retrieval-*.md`, +`docs/testing/retrieval-architecture.md`, `docker/retrieval-compose.yml`, and the workflow file itself. + +**CI pass criteria (Windows `retrieval-sidecar-smoke`):** + +1. Generalization lint exits 0. +2. Rust cache restore/save completes or gracefully misses without masking later failures. +3. `cargo test -p codestory-cli --test retrieval_bootstrap_contracts` exits 0, including the bootstrap/status assertion that reports `degraded_reason == "retrieval_manifest_missing"` and non-`full` mode on a clean temp project before indexing. +4. `cargo test -p codestory-runtime --lib` exits 0. +5. `cargo test -p codestory-runtime --test retrieval_generalization_guard` exits 0. +6. `cargo test -p codestory-cli --test stdio_protocol_contracts` exits 0. +7. `cargo test -p codestory-cli --test search_json_output` exits 0 for non-live fail-closed search contracts. +8. `cargo test -p codestory-retrieval` exits 0. + +The workflow restores a Rust build cache before the Cargo steps; a new cache key may +still pay one cold compile, but later pushes should reuse the warmed target and Cargo +dependency state. The manifest-missing smoke lives in `retrieval_bootstrap_contracts` +so Cargo builds the CLI through the integration-test path instead of paying for a +standalone build step before the tests. The Rust cache is configured to save even on +failure, which keeps failed follow-up pushes from repeatedly paying the full Windows +cold-compile cost. `retrieval_generalization_guard` invokes the same lint from Rust for +cross-platform CI parity. + **Holdout prefetch (benchmark harness, not sidecar CLI):** -```powershell -node scripts/codestory-agent-ab-benchmark.mjs ` +```sh +node scripts/codestory-agent-ab-benchmark.mjs \ --list --task-suite holdout-retrieval --materialize-repos ``` @@ -357,9 +421,10 @@ Clones land in `target/agent-benchmark/repos/` (gitignored). | Symptom | Likely cause | Action | |---------|--------------|--------| -| `retrieval up` port in use | stale process | `retrieval down`; check Task Manager / `docker ps` | +| `retrieval up` port in use | stale process | `retrieval down`; check `ps`, Task Manager, or `docker ps` | | Zoekt unhealthy, unreachable | server not started | start Zoekt on `6070` and rebuild the project shard | | Qdrant unhealthy | wrong image tag / volume permissions | `docker run -p 6333:6333 qdrant/qdrant:v1.12.5` | +| Qdrant unavailable while manifest dense-anchor count is `0` | expected graph-first policy skip | Verify Zoekt and SCIP are healthy and manifest policy/count/hash fields match; the dense stage will be skipped explicitly | | SCIP `scip_unavailable` | graph artifacts missing | fix SCIP emission before using agent-facing retrieval | | Smoke > 100ms / 200ms | cold cache or oversized fixture | retry; check tier envelope | @@ -367,14 +432,15 @@ Clones land in `target/agent-benchmark/repos/` (gitignored). ## Mandatory sidecar modes (operator view) -When a sidecar is down, the sidecar executor selects a non-`full` mode per design matrix — see -[`retrieval-design.md`](../architecture/retrieval-design.md#mandatory-sidecar-mode-matrix). -Non-`full` modes are diagnostic only and fail closed for product packet/search paths. +When a sidecar is down, the sidecar executor selects a non-`full` mode per the +[mode matrix](../architecture/retrieval-design.md#mode-matrix). Non-`full` modes +are diagnostic only and fail closed for product packet/search paths. | Condition | User-visible mode | Action | |-----------|-------------------|--------| | Zoekt down | `unavailable` | Fix Zoekt; no product query should run | -| Qdrant down, Zoekt up | `no_semantic` or `lexical_only` | Fix Qdrant; no product query should run | +| Qdrant down, Zoekt up, dense anchors expected | `no_semantic` or `lexical_only` | Fix Qdrant; no product query should run | +| Qdrant skipped, Zoekt up, SCIP up, dense anchors `0` | `full` | Valid graph/lexical full mode for the active policy; dense query stage is skipped | | SCIP down | `no_scip` | Fix SCIP artifacts; no product query should run | Traces must include `retrieval_mode` and `degraded_reason`. @@ -387,7 +453,7 @@ Traces must include `retrieval_mode` and `degraded_reason`. |----------|---------| | `CODESTORY_RETRIEVAL` | unset or `1` uses mandatory sidecar primary when mode is `full`; non-`full` modes fail closed; `0` is unsupported | | `CODESTORY_RETRIEVAL_SHADOW` | Historical diagnostic trace switch; unsupported in product benchmarks | -| `CODESTORY_RETRIEVAL_REAL_EMBEDDINGS` | defaults to `1`; `0` is unsupported for product indexing or packet/search evidence | +| `CODESTORY_RETRIEVAL_REAL_EMBEDDINGS` | defaults to `1`; `0` is unsupported for product dense-anchor indexing or packet/search evidence when dense anchors exist | | `CODESTORY_EMBED_BACKEND` | unset/default product mode, `llamacpp`, or `llama_cpp` for sidecar query embeddings; explicit `onnx` is non-product for sidecar retrieval and cannot finalize/report full product mode | | `CODESTORY_EMBED_LLAMACPP_URL` | local OpenAI-compatible llama.cpp embedding endpoint (default `http://127.0.0.1:8080/v1/embeddings`) | | `CODESTORY_EMBED_MODEL_DIR` | Host path to `bge-base-en-v1.5.Q8_0.gguf` for compose `embed` service | @@ -414,6 +480,5 @@ project file to provide network endpoints for that run. ## Related docs -- [`retrieval-architecture.md`](../testing/retrieval-architecture.md) — promotion guide and checklist -- [`retrieval-design.md`](../architecture/retrieval-design.md) — mandatory sidecar mode matrix and module contracts -- [`retrieval-sidecar-smoke-ci.md`](../contributors/retrieval-sidecar-smoke-ci.md) — CI job stub +- [`retrieval-architecture.md`](../testing/retrieval-architecture.md) — proof tiers, promotion checklist, and north-star SLOs +- [`retrieval-design.md`](../architecture/retrieval-design.md) — mode definitions, cost envelopes, and promotion guards diff --git a/docs/project-delight-roadmap.md b/docs/project-delight-roadmap.md deleted file mode 100644 index 22d9e473..00000000 --- a/docs/project-delight-roadmap.md +++ /dev/null @@ -1,95 +0,0 @@ -# CodeStory Product Direction - -This page is product direction, not proof that every idea below is fully done. -For measured behavior, use the benchmark docs. For architecture truth, use the -architecture docs. - -CodeStory is meant to be the local codebase browser an agent uses before it -starts manual file inspection: index the repo, keep the evidence local, explain -retrieval, and hand back cited context. - -## Now - -These capabilities are represented in the current CLI/runtime surface: - -- `doctor` reports project, cache, index, retrieval, managed embedding setup, and - next-command health. -- `index` builds graph state, snapshots, lexical search state, and semantic docs - in the local cache. -- `ground --why` gives broad repo orientation with retrieval and coverage notes. -- `report` emits a derived Markdown repo report or JSON graph export from the - current SQLite store, including hotspots, entry points, bridge nodes, - suggested follow-up queries, generation metadata, source locations, and - confidence/certainty when available. -- `search --why` exposes candidate results and retrieval explanations. -- `symbol`, `trail`, `snippet`, and `explore` support focused navigation around - concrete targets. -- `context` builds a DB-first evidence bundle around one concrete target. -- `drill` and `drill-suite` turn a realistic codebase question into an evidence - packet, source-truth checklist, optional claim ledger, expected-file recall, - and separate mechanical versus answer-quality verdicts. -- `serve --stdio` exposes the read surface for repeated agent queries. - -## Next - -The highest-value improvements are still about making the evidence loop easier -to trust and harder to misuse: - -1. **Make reliability and readiness hard to misread** - - Keep tightening `doctor` so it separates local cache/navigation readiness - from agent-facing packet/search sidecar readiness. - - Keep recovery commands focused on the failing layer: stale local index, - managed embedding assets, sidecar manifest, sidecar process, or backend - contract drift. - - Preserve fail-closed behavior for sidecar packet/search and make stale - index repair obvious before answer-quality work starts. - -2. **Make answer-quality gates harder to bypass** - - Treat source-truth correction counts as product evidence, not test noise. - - Keep `drill-suite --ledger` as the repeatable loop for proving whether a - CodeStory-only draft survived focused source reads. - - A green index/build is not enough; final-answer status must stay pending or - degraded until claim classifications prove no misleading or unsupported - claims remain. - -3. **Make target-context packets sharper** - - Improve `context` so it gathers the right neighborhood around one target - with fewer manual hops. - - Keep it target-first; broad open-ended questions belong in `packet` or a - `drill`/`drill-suite` run that records source-truth verification. - -4. **Make retrieval explanations more useful** - - Keep improving `--why` output for lexical, semantic, graph, fallback, and - freshness signals. - - The goal is to show why a result appeared and when not to trust it. - -5. **Improve repository navigation** - - Keep hardening `explore`, definition, references, symbol browsing, trails, - and snippets before adding a separate web UI. - - A new surface should be added only when it solves a workflow that the - current surfaces do not. - -6. **Polish setup without blurring readiness tracks** - - Managed embeddings, profile selection, sidecar setup, and fail-closed - diagnostics should make first use clear. - - If the model path, backend, manifest, sidecar mode, or doc shape is stale, - `doctor` should say so plainly. - -## Later - -- Saved query presets for repeated investigations. -- Sharper shareable result bundles that pair Markdown summaries with machine - JSON across targeted workflows, beyond the current repo-level `report` - artifact. -- Better typo and low-confidence query suggestions. -- A separate web UI only after the browser surface gate has current evidence. - -## Research References - -- Sourcegraph, *Cody Context* docs: multi-source context retrieval and context-window tradeoffs. -- Sourcegraph, *Code Graph* docs: graph structure as contextual signal. -- Sourcegraph, *Agentic Context Fetching* docs: proactive and iterative context gathering. -- GitHub docs, *Navigating code on GitHub*: symbol browsing, go-to-definition, and find-references patterns. -- Microsoft, *Language Server Protocol*: standard definition/reference workflows. -- Model Context Protocol specification: resources, prompts, tools, and safety/consent requirements. -- SQLite FTS5 docs: ranking and snippet/highlight primitives. diff --git a/docs/research.md b/docs/research.md index 9d5d8744..b1709e28 100644 --- a/docs/research.md +++ b/docs/research.md @@ -10,7 +10,8 @@ decisions and points to the comparison matrix, not raw run output. | Real local embeddings | Use `CODESTORY_EMBED_BACKEND=llamacpp` with the local llama.cpp sidecar. | Product packet/search evidence now requires the sidecar manifest to record the 768-d bge-base backend and `retrieval_mode=full`. | | Deterministic diagnostics | `CODESTORY_EMBED_RUNTIME_MODE=hash` is diagnostic-only. | Keeps selected local-dev and CI checks reproducible without model services, but is not agent-facing retrieval evidence. | | Default model profile | `CODESTORY_EMBED_PROFILE=bge-base-en-v1.5`. | BGE-base remains the best quality/speed family for the active runtime. | -| Default doc shape | `CODESTORY_SEMANTIC_DOC_ALIAS_MODE=alias_variant`, durable semantic scope. | Compact aliases help retrieval without the noise of full alias text. | +| Default doc shape | Graph-native `symbol_search_doc` for durable symbols plus `CODESTORY_SEMANTIC_DOC_ALIAS_MODE=alias_variant` for selected dense anchors. | Code recall is AST-first; compact aliases help the dense-anchor subset without returning to an all-code vector corpus. | +| Dense policy | `graph_first_v1` with reasons `public_api`, `entrypoint`, `documented_nontrivial`, `central_graph_node`, `component_report`, and `unstructured_doc`. | Dense vectors are reserved for structurally justified anchors; private trivial code stays discoverable through symbol docs and graph/lexical recall. | | Current benchmark baseline | Historical BGE-base Q8 GGUF through llama.cpp/Vulkan remains the last fully scored broad-holdout baseline; the active mandatory sidecar contract needs a fresh coherent benchmark row. | Do not compare new sidecar speed numbers against old mixed-vintage rows without rerunning the quality and cross-repo gates. | | Peak memory evidence | Segment-2 q8/r6 baseline measured peak descendant working set `828.726562 MB`; repeat sampled `1019.789062 MB`; `peak_vram_mb` was unavailable on this host. | Memory is now measured explicitly, but sampled peak RAM is noisy enough that tiny memory wins need repeats. | | Evidence standard | Quality gates and rank profiles come before speed. | A faster row is rejected when MRR, Hit@10, rank1/rank2-10, or misses regress. | @@ -31,17 +32,15 @@ semantic indexing behavior and cache reuse. ### Product Direction -Read [project-delight-roadmap.md](project-delight-roadmap.md) for current product -direction around target context packets, explainable retrieval, navigation UX, -serving, and setup help. Treat it as direction, not benchmark proof or a -changelog. +Read [usage.md](usage.md) and [architecture overview](architecture/overview.md) +for current operator workflows and navigation surfaces. Treat roadmap notes as +direction, not benchmark proof or a changelog. ### Architecture And Documentation Research -Read [decision-log.md](decision-log.md), [architecture overview](architecture/overview.md), -and [indexing pipeline](architecture/indexing-pipeline.md) for the current -architecture state. Historical ADR-style notes were collapsed into current -architecture docs because clear live-system explanations are more useful here. +Read [architecture overview](architecture/overview.md) and +[indexing pipeline](architecture/indexing-pipeline.md) for the current +architecture state. ## How To Continue Research diff --git a/docs/testing/agent-benchmark-harness-verification.md b/docs/testing/agent-benchmark-harness-verification.md index b67aaa43..7da08145 100644 --- a/docs/testing/agent-benchmark-harness-verification.md +++ b/docs/testing/agent-benchmark-harness-verification.md @@ -6,25 +6,35 @@ Scope: transcript analysis and manifest-backed quality scoring for The harness exposes pure analyzer/scorer functions and keeps a built-in fixture smoke test: -```powershell -node .\scripts\codestory-agent-ab-benchmark.mjs --self-test +```sh +node ./scripts/codestory-agent-ab-benchmark.mjs --self-test ``` The focused Node fixture lives at `scripts/tests/codestory-agent-ab-analyzer.test.mjs`: -```powershell -node --test .\scripts\tests\codestory-agent-ab-analyzer.test.mjs +```sh +node --test ./scripts/tests/codestory-agent-ab-analyzer.test.mjs ``` The fixture verifies: - command category counts for CodeStory CLI, shell search, direct file reads, git, and build/test commands; +- modern Codex JSONL tool category counts for web search, MCP tool calls, + command execution, function calls, and other tool calls; +- direct source-read accounting across the supported language extension set, + including Dart, Bash, HTML, CSS, and SQL; - ordinary source reads after the first successful packet command; +- harness-run no-CodeStory local-context preludes and CodeStory packet preludes + as measured first-context commands; - duplicate file reads by normalized path; - expected file, symbol, claim, and citation recall; -- missed anchors as quality evidence, separate from operational run status. +- missed anchors as quality evidence, separate from operational run status; +- forbidden-claim scoring that avoids contradicted positive-claim false + positives, including hyphenated terms such as `whitespace-only`; +- publishable blockers when the `without_codestory` arm either calls CodeStory + or never inspects the local repository. `drill-suite` answer-quality ledgers are the repo-grounded counterpart to this transcript scorer. Use the transcript harness to check how an agent behaved; use @@ -37,9 +47,167 @@ For source-truth recall, `drill` now feeds the broad question search and bounded supplemental searches into the verification target list. Treat those targets as candidate files for verification, not as final answer support. -Keep `node .\scripts\codestory-agent-ab-benchmark.mjs --list` as the cheapest +Keep `node ./scripts/codestory-agent-ab-benchmark.mjs --list` as the cheapest configuration smoke check. -Do not make public savings claims from these fixtures. They only prove parser -and scorer behavior. Promotion evidence still requires real benchmark runs with -raw transcripts, repeated medians, and quality thresholds. +The language-support A/B suite is: + +```powershell +node scripts/codestory-agent-ab-benchmark.mjs ` + --task-suite language-expansion-holdout ` + --arms without_codestory,with_codestory ` + --repeats 3 --materialize-repos --prepare-codestory-cache ` + --out-dir target/agent-benchmark/language-expansion-holdout ` + --timeout-ms 600000 +``` + +The run ledger records per-run `wall_ms`, token usage, estimated cost when +`CODESTORY_BENCH_INPUT_COST_PER_MTOK` and +`CODESTORY_BENCH_OUTPUT_COST_PER_MTOK` are configured, observed tool calls, tool +categories, web searches, command counts, command categories, direct source +reads, ordinary source reads after the first CodeStory command, ordinary source +reads after the first packet, duplicate file reads, and manifest quality scores. +For the `without_codestory` arm, the harness mechanically runs a strictly +no-CodeStory local-context prelude before starting the nested agent. It derives +plain `rg` search terms from the prompt, reads bounded snippets from selected +source files, records those as measured shell-search/file-read commands, and +injects the snippets into the prompt. For the `with_codestory` arm, the harness +mechanically runs the required `codestory-cli packet` prelude before starting +the nested agent, injects a lean packet excerpt into the prompt, and records +that prelude as a measured CodeStory command with its own wall time. This makes +both local-baseline inspection and packet-first evidence harness facts instead +of prompt-compliance hopes. +When the harness can also score the packet itself against the task manifest and +that packet-level manifest quality passes, the nested CodeStory prompt treats +the packet as complete for that benchmark row. That row-specific stop rule is +not based on generic `sufficiency.status`; it is based on the same expected +file, symbol, claim, and citation evidence used by the row quality gate. +When packet-level manifest quality is incomplete, the CodeStory arm remains +CodeStory-first but is not packet-only by default. The nested agent must use +listed CodeStory follow-ups before ordinary source reads, and any source reads +after the first packet are counted as post-packet overhead. Use +`--max-source-reads-after-packet 0` only for stricter packet-only promotion +evidence. +Each run row also includes a normalized `resource_accounting` object with the +same wall-clock, agent-runner wall-clock, baseline-prelude wall-clock, +CodeStory-prelude wall-clock, token, tool-call, command-count, and source-read +evidence in one place. + +`summary.json` and `reanalyzed-summary.json` include a top-level +`cost_accounting` block. It totals time spent, input/output/total tokens spent, +estimated cost, tool calls, command counts, web searches, and source reads per +arm across all observed rows, including failed or timed-out rows when their +measurements are present, then emits a `with_vs_without` comparison for runner +wall time, all-in wall time, tokens, tool calls, commands, and estimated cost. +The Markdown summary prints the same totals before the per-task median table, +so a human report can compare aggregate cost and time before looking at quality +medians. +`scripts/codestory-agent-ab-score.mjs` reuses that ledger for Autoresearch and +emits `METRIC` lines for the raw per-arm wall time, tokens, tool calls, +commands, CodeStory commands, shell searches, file-read commands, web searches, +post-packet reads, quality pass counts, packet-first pass counts, +packet-manifest quality pass counts, partial packet counts, and ratios. +The score wrapper streams the lower-level benchmark progress while still +capturing stdout/stderr for failure reporting, and it forwards +`--prepare-codestory-timeout-ms` to the benchmark so long CodeStory cache +preparation is visible and explicitly bounded. +The primary `agent_ab_gap` penalizes with-CodeStory quality failures, +packet-first failures, post-packet source reads, and external web/search +leakage. The no-CodeStory quality result is emitted separately as +`without_quality_passes` and `quality_pass_delta` so baseline failure remains +visible without being misattributed as a CodeStory-side regression. + +For faster iteration on runtime packet fixes, use packet probes before nested +agents: + +```powershell +node scripts\codestory-agent-ab-score.mjs ` + --packet-gate --packet-probe-jobs 4 ` + --prepare-codestory-jobs 2 ` + --task-ids ` + --out-dir target\agent-benchmark\ +``` + +The packet gate runs cold `codestory-cli packet` probes first, with independent +rows parallelized by `--packet-probe-jobs`. Only tasks whose packet manifest +quality passes are sent to the nested A/B harness. If no task passes the packet +gate, the wrapper emits `packet_gate_*` metrics and exits non-zero before +nested agents run. Pass `--allow-empty-packet-gate` only for exploratory +diagnostics where an empty nested A/B run is intentional. +Rows that fail because the packet process temporarily cannot reach mandatory +sidecars are retried once, serially, in `packet-probes-retry`; the wrapper +emits `packet_gate_retry_tasks` plus retry artifact paths and uses the merged +quality-debug rows for A/B selection. Content-quality failures are not retried. +Use `--packet-gate-improved-from ` when iterating on runtime +packet fixes; then a task must pass the current packet gate and improve over +the previous packet-probe `quality-debug.json` or A/B `reanalyzed-runs.jsonl` +packet-prelude manifest score before nested agents are launched. + +For anti-overfit language work, run packet probes with production defaults and +keep exact benchmark probes behind manifests, explicit request probes, or +`CODESTORY_EVAL_PROBES=1` diagnostics only. Do not treat general +framework/domain semantics as overfit when they apply to real projects. +The current clean serial packet runtime scores `12/18` manifest-quality passes +without sidecar failures, with `9/18` rows packet-sufficient without follow-up +commands and Java, Redis, and Okio still missing the retrieval latency SLA. The +matching packet-gated A/B slice is useful for cost/time/tool-call accounting +(`9/9` CodeStory quality versus `6/9` baseline), but it is not promotion +evidence for all public language-support profiles because the slice is selected +from rows that are useful to compare today. + +The lower-level packet runtime mode can also be run directly with row-level +parallelism: + +```powershell +node scripts\codestory-agent-ab-benchmark.mjs ` + --packet-runtime --packet-runtime-mode cold-cli ` + --task-ids ` + --jobs 4 --prepare-codestory-jobs 2 ` + --out-dir target\agent-benchmark\ +``` + +This mode runs only CodeStory packet probes and does not start nested agents. +Keep `--prepare-codestory-jobs` lower than packet row concurrency; `2` to `4` +is usually the practical cap before local indexing, embedding, or Qdrant work +starts contending with itself. + +Nested A/B runs can use `--jobs N` too, but the harness parallelizes only +independent repo groups. Arms, repeats, and multiple tasks for the same repo +stay serial so both arms do not mutate the same checkout at the same time. + +When only CodeStory runtime packet behavior changed, reuse matching baselines: + +```powershell +node scripts\codestory-agent-ab-score.mjs ` + --packet-gate --packet-probe-jobs 4 ` + --packet-gate-improved-from target\agent-benchmark\ ` + --task-ids ` + --reuse-baseline-from target\agent-benchmark\ ` + --out-dir target\agent-benchmark\ +``` + +Baseline reuse is strict. The benchmark reuses only `without_codestory` rows +whose repo, task id, repeat, and task manifest snapshot match the current run. +It reanalyzes the old raw row with the current analyzer, copies stdout/stderr +and baseline-context artifacts into the new output directory, and annotates the +row with `reused_from`. Do not reuse baselines across manifest or scorer +changes; rerun the no-CodeStory arm in those cases. + +Web search, browser tools, remote URLs, and upstream mirrors are not allowed in +local pinned-repo A/B runs. Publishable gating reports external web/search tool +calls as blockers instead of treating them as local repository exploration. +Publishable gating also rejects rows that are missing wall time, total token +usage, observed tool-call count, or command-count accounting. A publishable +`without_codestory` row must inspect the local repository without CodeStory; a +model-prior answer with zero local commands is not valid baseline evidence. + +On Windows, nested `codex exec --sandbox workspace-write` can fail before local +commands launch with `CreateProcessWithLogonW failed: 1326`. Treat those rows as +invalid local-repo evidence. For local smoke verification on a trusted checkout, +rerun with `--sandbox danger-full-access` and confirm the summary shows local +command/tool counts and zero web searches. + +Do not make public savings claims from these fixtures. They only prove +transcript analyzer/parser and scorer behavior. Promotion evidence still +requires real benchmark runs with raw transcripts, repeated medians, and quality +thresholds. diff --git a/docs/testing/benchmark-ledger.md b/docs/testing/benchmark-ledger.md index 5cb37196..957109a6 100644 --- a/docs/testing/benchmark-ledger.md +++ b/docs/testing/benchmark-ledger.md @@ -1,16 +1,116 @@ # CodeStory Benchmark Ledger -This ledger keeps detailed benchmark history that is too dense for the README -scorecard. Treat every row as machine-, cache-, runner-, and date-specific. -Promote only rows that pass the current harness gates documented in -[benchmark-results.md](benchmark-results.md). +This ledger keeps the decision-grade scorecard and detailed benchmark history +that is too dense for the README. Treat every row as machine-, cache-, runner-, +and date-specific. Promote only rows that pass the current harness gates +documented below. + +Runs recorded before the 2026-05-24 harness tightening are historical unless +they are reanalyzed or rerun with answer-level expected-file/symbol recall, +immutable manifest refs, and CodeStory cache provenance. The harness now keeps +transcript-observed anchors separate from anchors actually present in the final +answer, so tool output alone cannot make a row quality-pass. + +## Current Scorecard + +| Lane | Current status | Public claim status | +| --- | --- | --- | +| Agent A/B quick check | The 2026-05-23 CodeStory-only quick run passed both arms, but the CodeStory arm used more tokens, more wall time, and more tool starts. | No agent savings claim. | +| Local-real Codex probe | On 2026-05-25, the narrowed `codex-exec-json-flow` live A/B repeated with a quality-passing CodeStory arm against a failing no-CodeStory arm. Latest corrected-wrapper repeat: `114,510` vs `2,209,856` tokens, `2` vs `39` observed tool calls, `117.37s` vs `262.39s`, and overhead ratio `0.183466`. | Strong exploratory evidence; no promotion claim from this task alone. | +| Local-real Sourcetrail probe | On 2026-05-25, the `sourcetrail-indexing-to-storage` live A/B passed with CodeStory after source-group/indexing/storage packet fixes. CodeStory used `269,363` vs `5,697,852` tokens, `2` vs `105` observed tool calls, `138.92s` vs `532.68s`, `0` vs `87` source reads, and overhead ratio `0.10904`. | Strong second-repo exploratory evidence; still not promotion-grade because it is one repeat using a local existing cache. | +| Local-real VS Code probe | On 2026-05-25, the `vscode-workbench-extension-host` packet holdout moved from partial coverage to a sufficient packet, then the live A/B passed with CodeStory after workbench/extension-host packet fixes. CodeStory used `1,070,153` vs `7,296,578` tokens, `2` vs `115` observed tool calls, `329.69s` vs `626.08s`, `0` vs `71` source reads, and overhead ratio `0.230215`. A follow-up release incremental refresh repaired the stale cache provenance, moving VS Code freshness from `74` new files to `0`. | Strong third-repo exploratory evidence; still not promotion-grade because it is one repeat and the no-CodeStory arm failed quality. | +| Local-real drill-suite probe | On 2026-05-25, a four-repo `drill-suite` matrix exposed a real CodeStory cache-reuse blocker, stale Codex anchor selections, VS Code indexing-error blockage, and Sourcetrail source-truth-only bridges. After the CodeStory cache fix and Rust receiver/return-chain graph pass, this repo's one-case drill is still degraded but now resolves `11/11` anchors with `28/55` graph bridges, `27` partial bridges, and `0` unresolved bridges. | Diagnostic product evidence only; the remaining target is store/workspace execution-plan and snapshot/projection bridge coverage. | +| Strict packet-first rows | Several with-CodeStory public-checkout rows passed quality, packet-first, and zero ordinary source reads after packet. | Behavior evidence only; paired savings still needs broader quality-passing baselines. | +| Packet runtime | Public-core warm stdio and cold CLI packet rows passed repeated publishable quality gates. | Runtime evidence, not agent-token savings. | +| Repo-scale cold index/read timing | The current timing source is the latest row in [codestory-e2e-stats-log.md](codestory-e2e-stats-log.md). | Current only after a fresh row is logged for the relevant change. | +| Warm stdio smoke | The current warm-loop timing source is [codestory-stdio-warm-loop-stats.md](codestory-stdio-warm-loop-stats.md). | Smoke evidence for the persistent read surface. | + +## What Is Solid + +- CodeStory can produce quality-passing packet-first answers on selected public + tasks while avoiding ordinary source reads after the answer packet. +- Repeated packet-runtime rows show `packet` can fit inside an agent workflow + budget in both cold CLI and warm stdio modes. +- The local-real harness now separates first-index setup cost from timed + cache-reuse agent work, blocks stale or semantic-empty caches from + publishable evidence, and records useful-context density for final-answer + context instead of raw packet volume alone. +- The quality-passing local-real Codex live A/B has now repeated on the same + task with the corrected wrapper. +- Sourcetrail now adds a second realistic repo where the CodeStory arm passed + quality and avoided source reads while the no-CodeStory arm failed quality + after broad exploration. +- VS Code now adds a large TypeScript repo where the packet planner can find the + workbench startup, extension service, extension host manager, extension-host + activation, and command execution anchors without follow-up commands, and the + live CodeStory arm passed quality while using far fewer tools and source + reads than the no-CodeStory arm. +- The VS Code cache freshness issue behind the first local-real row is now + understood and fixed: TypeScript/TSX factory-call superclass extraction no + longer crashes on `extends mock()`, failed attempts are recorded as + incomplete files with attached errors, and `../vscode` now reports + `10,491/10,491` indexed files as fresh after incremental refresh. +- CodeStory's own active cache can now recover from stale incremental + projection cleanup where cross-file callable state points at a deleted node; + release incremental refresh reports fresh inventory with `150/150` indexed + files, `0` errors, and `7,794` semantic docs. +- The tightened CodeStory drill now exposes the CLI-to-runtime-to-indexer path + mostly as graph evidence: Rust receiver and return-chain resolution moved the + case from `3/55` graph bridges to `28/55`, while preserving explicit + source-truth-only status for the remaining unproven bridge pairs. +- Repo-scale timing history is tracked in the stats log instead of copied into + prose that silently drifts. + +## What Is Not Claimed + +This page does not claim that CodeStory generally reduces agent cost, token +count, wall time, or tool calls. General savings claims require repeated +controlled with/without-agent measurements from the benchmark harness, not one +exploratory row or representative estimates. + +The 2026-05-25 Codex, Sourcetrail, and VS Code local-real rows are explicitly +non-promotional. They show a CodeStory advantage on three realistic tasks, but +they are still single-run or same-task exploratory measurements using local +cache state. The VS Code cache now has fresh provenance after the follow-up +repair, but public savings language still needs repeated controlled rows, clean +pinned checkout provenance, and at least one holdout that was not tuned during +the implementation loop. + +The 2026-05-25 `drill-suite` rows are also non-promotional. They are designed to +find grounding failures before an agent A/B run, and the current CodeStory case +still falls back to source-truth-only evidence for `27/55` bridge pairs. + +## Proof Tier Ladder + +Use the highest tier actually reached when describing a row. Do not promote a +lower tier into a broader claim just because the command exited successfully. + +| Proof tier | Required evidence | Can claim | Cannot claim | +| --- | --- | --- | --- | +| Stats-only local regression signal | `codestory_repo_e2e_stats` completed with skip allowances or without prepared full sidecars. | Local timing, indexing, and cache-shape regression signal for the current checkout. | Full sidecar readiness, agent packet/search readiness, real-repo release coverage, or performance promotion. | +| Full sidecar readiness proof | Zoekt, Qdrant, SCIP, and llama.cpp are running; `retrieval index --refresh full` succeeds; `retrieval status --format json` reports `retrieval_mode: "full"` and product backend fields. | Agent-facing packet/search readiness for the verified workspace and cache state. | General quality, cross-repo coverage, or benchmark savings. | +| Real-repo drill proof | Prepared real-repo drill manifests run without skip allowances and produce expected evidence packets, source-truth checks, and verdicts. | The release path was exercised beyond the CodeStory checkout on the named drill cases. | Generalized agent savings or promotion-grade performance. | +| Promotion-grade benchmark proof | Controlled baseline and candidate benchmark rows use pinned refs, comparable cache state, sidecar status, answer-level quality gates, and no-regression thresholds. | Cautious performance or retrieval-quality promotion for the measured scope. | Universal savings, untested repos, or environments outside the recorded setup. | + +## Promotion Rules + +- Use the same project, cache state, semantic backend, command flags, runner, + model, and sample shape when comparing before/after results. +- Do not promote a speed win if expected anchors, answer-level quality, protocol + cleanliness, or semantic-doc reuse regress. +- Treat small-fixture warm-loop numbers as smoke evidence, not repo-scale + product proof. +- Append current repo-scale timing rows to + [codestory-e2e-stats-log.md](codestory-e2e-stats-log.md) when default + indexing, semantic persistence, embedding reuse, or cold-start behavior + changes. ## Agent A/B History The 2026-05-23 quick CodeStory repo run used: -```powershell -node .\scripts\codestory-agent-ab-benchmark.mjs --quick --repos codestory --repeats 3 --timeout-ms 900000 --sandbox danger-full-access --publishable --out-dir target\agent-benchmark\codestory-quick-2026-05-23-r3 +```sh +node ./scripts/codestory-agent-ab-benchmark.mjs --quick --repos codestory --repeats 3 --timeout-ms 900000 --sandbox danger-full-access --publishable --out-dir target/agent-benchmark/codestory-quick-2026-05-23-r3 ``` It was a real baseline, not a savings claim. The without-CodeStory arm passed @@ -62,9 +162,9 @@ On 2026-05-23, the release CLI completed three-repeat packet runtime runs against the full public-core manifest suite in both warm stdio and cold CLI modes: -```powershell -node .\scripts\codestory-agent-ab-benchmark.mjs --packet-runtime --task-suite public-core --repeats 3 --packet-runtime-mode warm-stdio --codestory-cli .\target\release\codestory-cli.exe --out-dir target\agent-benchmark\packet-runtime-public-core-warm-r8 --timeout-ms 120000 --publishable -node .\scripts\codestory-agent-ab-benchmark.mjs --packet-runtime --task-suite public-core --repeats 3 --packet-runtime-mode cold-cli --codestory-cli .\target\release\codestory-cli.exe --out-dir target\agent-benchmark\packet-runtime-public-core-cold-r9 --timeout-ms 120000 --publishable +```sh +node ./scripts/codestory-agent-ab-benchmark.mjs --packet-runtime --task-suite public-core --repeats 3 --packet-runtime-mode warm-stdio --codestory-cli ./target/release/codestory-cli --out-dir target/agent-benchmark/packet-runtime-public-core-warm-r8 --timeout-ms 120000 --publishable +node ./scripts/codestory-agent-ab-benchmark.mjs --packet-runtime --task-suite public-core --repeats 3 --packet-runtime-mode cold-cli --codestory-cli ./target/release/codestory-cli --out-dir target/agent-benchmark/packet-runtime-public-core-cold-r9 --timeout-ms 120000 --publishable ``` Across both modes, all `108` packet rows passed operationally and quality gates. @@ -90,9 +190,14 @@ and model. Use `--publishable` only when the selected runner reports token usage and every run succeeds. For agent A/B rows, `--publishable` also requires with-CodeStory runs to execute an answer packet with `--question` first and stay within the -post-packet ordinary source-read budget. Publishable rows must carry clean -repository provenance pinned to an immutable commit or tag plus CodeStory cache -provenance from `doctor --format json`. +explicit post-packet ordinary source-read budget supplied through +`--max-source-reads-after-packet `. Use `0` for packet-only promotion +evidence; use a larger number only when the row is intentionally CodeStory-first +but not packet-only. Publishable rows must carry clean repository provenance +pinned to a full 40-character Git commit SHA plus CodeStory cache provenance +from `doctor --format json`. Tags are not accepted for publishable +materialized-repo rows because they can be moved after the benchmark is +published. Packet runtime runs compare cold CLI `packet` invocations with warm `serve --stdio` packet calls. They are runtime rows, not agent-token rows, and @@ -100,14 +205,14 @@ still use manifest quality gates before promotion. ## Commands -```powershell -node .\scripts\codestory-agent-ab-benchmark.mjs --list -node .\scripts\codestory-agent-ab-benchmark.mjs --quick --repos codestory --repeats 3 --timeout-ms 600000 --publishable -node .\scripts\codestory-agent-ab-benchmark.mjs --task-suite public-core --list -node .\scripts\codestory-agent-ab-benchmark.mjs --task-suite public-core --task-ids codestory-indexing-flow,vite-dev-server-architecture --arms with_codestory --repeats 3 --max-source-reads-after-packet 0 --allow-failures -node .\scripts\codestory-agent-ab-benchmark.mjs --reanalyze-dir target\agent-benchmark\ -node .\scripts\codestory-agent-ab-benchmark.mjs --task-suite public-core --materialize-repos --list -node .\scripts\codestory-agent-ab-benchmark.mjs --packet-runtime --task-suite public-core --repeats 3 +```sh +node ./scripts/codestory-agent-ab-benchmark.mjs --list +node ./scripts/codestory-agent-ab-benchmark.mjs --quick --repos codestory --repeats 3 --timeout-ms 600000 --publishable --max-source-reads-after-packet 0 +node ./scripts/codestory-agent-ab-benchmark.mjs --task-suite public-core --list +node ./scripts/codestory-agent-ab-benchmark.mjs --task-suite public-core --task-ids codestory-indexing-flow,vite-dev-server-architecture --arms with_codestory --repeats 3 --max-source-reads-after-packet 0 --allow-failures +node ./scripts/codestory-agent-ab-benchmark.mjs --reanalyze-dir target/agent-benchmark/ +node ./scripts/codestory-agent-ab-benchmark.mjs --task-suite public-core --materialize-repos --list +node ./scripts/codestory-agent-ab-benchmark.mjs --packet-runtime --task-suite public-core --repeats 3 ``` Cold repo-scale timings are owned by diff --git a/docs/testing/benchmark-results.md b/docs/testing/benchmark-results.md deleted file mode 100644 index 1e815fb4..00000000 --- a/docs/testing/benchmark-results.md +++ /dev/null @@ -1,114 +0,0 @@ -# CodeStory Benchmark Results - -This is the short, decision-grade scorecard linked from the README. It keeps -current claims cautious and points detailed history to the -[benchmark ledger](benchmark-ledger.md). - -Runs recorded before the 2026-05-24 harness tightening are historical unless -they are reanalyzed or rerun with answer-level expected-file/symbol recall, -immutable manifest refs, and CodeStory cache provenance. The harness now keeps -transcript-observed anchors separate from anchors actually present in the final -answer, so tool output alone cannot make a row quality-pass. - -## Current Scorecard - -| Lane | Current status | Public claim status | -| --- | --- | --- | -| Agent A/B quick check | The 2026-05-23 CodeStory-only quick run passed both arms, but the CodeStory arm used more tokens, more wall time, and more tool starts. | No agent savings claim. | -| Local-real Codex probe | On 2026-05-25, the narrowed `codex-exec-json-flow` live A/B repeated with a quality-passing CodeStory arm against a failing no-CodeStory arm. Latest corrected-wrapper repeat: `114,510` vs `2,209,856` tokens, `2` vs `39` observed tool calls, `117.37s` vs `262.39s`, and overhead ratio `0.183466`. | Strong exploratory evidence; no promotion claim from this task alone. | -| Local-real Sourcetrail probe | On 2026-05-25, the `sourcetrail-indexing-to-storage` live A/B passed with CodeStory after source-group/indexing/storage packet fixes. CodeStory used `269,363` vs `5,697,852` tokens, `2` vs `105` observed tool calls, `138.92s` vs `532.68s`, `0` vs `87` source reads, and overhead ratio `0.10904`. | Strong second-repo exploratory evidence; still not promotion-grade because it is one repeat using a local existing cache. | -| Local-real VS Code probe | On 2026-05-25, the `vscode-workbench-extension-host` packet holdout moved from partial coverage to a sufficient packet, then the live A/B passed with CodeStory after workbench/extension-host packet fixes. CodeStory used `1,070,153` vs `7,296,578` tokens, `2` vs `115` observed tool calls, `329.69s` vs `626.08s`, `0` vs `71` source reads, and overhead ratio `0.230215`. A follow-up release incremental refresh repaired the stale cache provenance, moving VS Code freshness from `74` new files to `0`. | Strong third-repo exploratory evidence; still not promotion-grade because it is one repeat and the no-CodeStory arm failed quality. | -| Local-real drill-suite probe | On 2026-05-25, a four-repo `drill-suite` matrix exposed a real CodeStory cache-reuse blocker, stale Codex anchor selections, VS Code indexing-error blockage, and Sourcetrail source-truth-only bridges. After the CodeStory cache fix and Rust receiver/return-chain graph pass, this repo's one-case drill is still degraded but now resolves `11/11` anchors with `28/55` graph bridges, `27` partial bridges, and `0` unresolved bridges. | Diagnostic product evidence only; the remaining target is store/workspace execution-plan and snapshot/projection bridge coverage. | -| Strict packet-first rows | Several with-CodeStory public-checkout rows passed quality, packet-first, and zero ordinary source reads after packet. | Behavior evidence only; paired savings still needs broader quality-passing baselines. | -| Packet runtime | Public-core warm stdio and cold CLI packet rows passed repeated publishable quality gates. | Runtime evidence, not agent-token savings. | -| Repo-scale cold index/read timing | The current timing source is the latest row in [codestory-e2e-stats-log.md](codestory-e2e-stats-log.md). | Current only after a fresh row is logged for the relevant change. | -| Warm stdio smoke | The current warm-loop timing source is [codestory-stdio-warm-loop-stats.md](codestory-stdio-warm-loop-stats.md). | Smoke evidence for the persistent read surface. | - -## What Is Solid - -- CodeStory can produce quality-passing packet-first answers on selected public - tasks while avoiding ordinary source reads after the answer packet. -- Repeated packet-runtime rows show `packet` can fit inside an agent workflow - budget in both cold CLI and warm stdio modes. -- The local-real harness now separates first-index setup cost from timed - cache-reuse agent work, blocks stale or semantic-empty caches from - publishable evidence, and records useful-context density for final-answer - context instead of raw packet volume alone. -- The quality-passing local-real Codex live A/B has now repeated on the same - task with the corrected wrapper. -- Sourcetrail now adds a second realistic repo where the CodeStory arm passed - quality and avoided source reads while the no-CodeStory arm failed quality - after broad exploration. -- VS Code now adds a large TypeScript repo where the packet planner can find the - workbench startup, extension service, extension host manager, extension-host - activation, and command execution anchors without follow-up commands, and the - live CodeStory arm passed quality while using far fewer tools and source - reads than the no-CodeStory arm. -- The VS Code cache freshness issue behind the first local-real row is now - understood and fixed: TypeScript/TSX factory-call superclass extraction no - longer crashes on `extends mock()`, failed attempts are recorded as - incomplete files with attached errors, and `../vscode` now reports - `10,491/10,491` indexed files as fresh after incremental refresh. -- CodeStory's own active cache can now recover from stale incremental - projection cleanup where cross-file callable state points at a deleted node; - release incremental refresh reports fresh inventory with `150/150` indexed - files, `0` errors, and `7,794` semantic docs. -- The tightened CodeStory drill now exposes the CLI-to-runtime-to-indexer path - mostly as graph evidence: Rust receiver and return-chain resolution moved the - case from `3/55` graph bridges to `28/55`, while preserving explicit - source-truth-only status for the remaining unproven bridge pairs. -- Repo-scale timing history is tracked in the stats log instead of copied into - prose that silently drifts. - -## What Is Not Claimed - -This page does not claim that CodeStory generally reduces agent cost, token -count, wall time, or tool calls. General savings claims require repeated -controlled with/without-agent measurements from the benchmark harness, not one -exploratory row or representative estimates. - -The 2026-05-25 Codex, Sourcetrail, and VS Code local-real rows are explicitly -non-promotional. They show a CodeStory advantage on three realistic tasks, but -they are still single-run or same-task exploratory measurements using local -cache state. The VS Code cache now has fresh provenance after the follow-up -repair, but public savings language still needs repeated controlled rows, clean -pinned checkout provenance, and at least one holdout that was not tuned during -the implementation loop. - -The 2026-05-25 `drill-suite` rows are also non-promotional. They are designed to -find grounding failures before an agent A/B run, and the current CodeStory case -still falls back to source-truth-only evidence for `27/55` bridge pairs. - -## Proof Tier Ladder - -Use the highest tier actually reached when describing a row. Do not promote a -lower tier into a broader claim just because the command exited successfully. - -| Proof tier | Required evidence | Can claim | Cannot claim | -| --- | --- | --- | --- | -| Stats-only local regression signal | `codestory_repo_e2e_stats` completed with skip allowances or without prepared full sidecars. | Local timing, indexing, and cache-shape regression signal for the current checkout. | Full sidecar readiness, agent packet/search readiness, real-repo release coverage, or performance promotion. | -| Full sidecar readiness proof | Zoekt, Qdrant, SCIP, and llama.cpp are running; `retrieval index --refresh full` succeeds; `retrieval status --format json` reports `retrieval_mode: "full"` and product backend fields. | Agent-facing packet/search readiness for the verified workspace and cache state. | General quality, cross-repo coverage, or benchmark savings. | -| Real-repo drill proof | Prepared real-repo drill manifests run without skip allowances and produce expected evidence packets, source-truth checks, and verdicts. | The release path was exercised beyond the CodeStory checkout on the named drill cases. | Generalized agent savings or promotion-grade performance. | -| Promotion-grade benchmark proof | Controlled baseline and candidate benchmark rows use pinned refs, comparable cache state, sidecar status, answer-level quality gates, and no-regression thresholds. | Cautious performance or retrieval-quality promotion for the measured scope. | Universal savings, untested repos, or environments outside the recorded setup. | - -## Promotion Rules - -- Use the same project, cache state, semantic backend, command flags, runner, - model, and sample shape when comparing before/after results. -- Do not promote a speed win if expected anchors, answer-level quality, protocol - cleanliness, or semantic-doc reuse regress. -- Treat small-fixture warm-loop numbers as smoke evidence, not repo-scale - product proof. -- Append current repo-scale timing rows to - [codestory-e2e-stats-log.md](codestory-e2e-stats-log.md) when default - indexing, semantic persistence, embedding reuse, or cold-start behavior - changes. - -## Detailed History - -- Detailed agent A/B rows, packet-runtime history, methodology, and commands: - [benchmark-ledger.md](benchmark-ledger.md) -- Repo-scale timing history: - [codestory-e2e-stats-log.md](codestory-e2e-stats-log.md) -- Warm stdio loop history: - [codestory-stdio-warm-loop-stats.md](codestory-stdio-warm-loop-stats.md) diff --git a/docs/testing/cli-navigation-next-wave-performance-review.md b/docs/testing/cli-navigation-next-wave-performance-review.md deleted file mode 100644 index 705bfcb5..00000000 --- a/docs/testing/cli-navigation-next-wave-performance-review.md +++ /dev/null @@ -1,65 +0,0 @@ -# CLI Navigation Next Wave Performance Review - -This is the initial validation record for the CLI-first navigation branch. It is -not a transport, server, MCP, or watch-mode benchmark. - -## Environment - -| Field | Value | -| --- | --- | -| Date | 2026-05-20 | -| Commit | `fea0cc5` with a dirty working tree for this branch | -| Shell | PowerShell 7.6.1 | -| Rust | `rustc 1.90.0`, `cargo 1.90.0` | -| Binary | `target/debug/codestory-cli.exe` | -| Project | `C:/Users/alber/source/repos/codestory` | -| Cache state | warm existing cache, `--refresh none`; doctor reported stale index freshness | -| Index shape | 145 files, 43,938 nodes, 37,086 edges | -| Retrieval | hybrid ready, 6,029 semantic docs, ONNX BGE base, DirectML provider, stored int8 vectors | - -The cache was intentionally not refreshed during these warm-read measurements so -the record captures read-path cost separately from indexing cost. Doctor reported -17 changed files, so these numbers are a branch validation baseline rather than -a release claim. - -## Warm Read Baseline - -Each command was run four times with the first run discarded. Times are wall -clock milliseconds from `Measure-Command`; stdout was redirected to `Out-Null`. - -| Path | Command | Samples ms | Kept avg ms | Kept max ms | -| --- | --- | ---: | ---: | ---: | -| files JSON | `target/debug/codestory-cli.exe files --project . --refresh none --format json` | `761.4, 763.5, 749.4, 743.2` | 752.0 | 763.5 | -| search JSON | `target/debug/codestory-cli.exe search --project . --query build_coverage_buckets --refresh none --format json` | `2875.4, 2900.6, 2960.2, 2848.1` | 2903.0 | 2960.2 | -| explore JSON | `target/debug/codestory-cli.exe explore --project . --id -743279210528755755 --no-tui --refresh none --format json` | `921.8, 860.8, 871.9, 911.1` | 881.3 | 911.1 | -| affected JSON | `target/debug/codestory-cli.exe affected --project . crates/codestory-runtime/src/lib.rs --refresh none --format json` | `1011.4, 996.0, 1027.4, 1010.6` | 1011.3 | 1027.4 | - -## Dominant Cost Centers - -| Path | Observed cost center | Notes | -| --- | --- | --- | -| files | storage open plus summary/materialization | Matrix rendering is small compared with opening and reading persisted file inventory. | -| search | hybrid search and repo-text fallback eligibility | This warm read is the slowest path in the sample; use `search --why --format json` and search-quality eval before ranking changes. | -| explore | symbol, trail, snippet, and source-slice reads | Profile presets and relationship evidence are bounded by existing depth and node caps. | -| affected | graph traversal plus file-role/test aggregation | Current traversal is bounded by `--depth`; route/test evidence is scored and reported as hints. | - -## No-Regression Gates - -- Route/ranking changes must keep the search-quality eval at no lost expected - anchors and no lower MRR unless the validation record explains the tradeoff. -- `files` coverage output must remain deterministic and include - `fixture_status`, `unsupported_patterns`, `known_gaps`, and `promotable`. -- `explore` JSON must keep stable status, profile, resolution, navigation, - relationship evidence, route context, source packet, trail, symbol, and - snippet sections. -- `affected` JSON must report matched/unmatched paths, graph depth, reason, - confidence, route evidence, blind spots, and next commands. -- Do not introduce broad async runtime migration, unbounded parallelism, or - parallel Cargo verification without a fresh candidate-gate record. - -## Current Parallelization Decision - -No new parallelization is promoted by this branch. The measured paths are warm -read paths over SQLite, hybrid search, source reads, and bounded graph traversal. -The next performance candidate should start with query-level search profiling, -because the search warm-read path has the highest current max latency. diff --git a/docs/testing/codestory-e2e-stats-log.md b/docs/testing/codestory-e2e-stats-log.md index 834077c7..7da11826 100644 --- a/docs/testing/codestory-e2e-stats-log.md +++ b/docs/testing/codestory-e2e-stats-log.md @@ -2,13 +2,15 @@ Append one entry before each commit after running: -```powershell +```sh cargo build --release -p codestory-cli cargo test -p codestory-cli --test codestory_repo_e2e_stats -- --ignored --nocapture ``` Keep the full emitted JSON in the test output when reviewing locally, and add the headline metrics here so search/index reuse trends are visible over time. For performance branches, capture the baseline and no-regression threshold from [performance-review-playbook.md](performance-review-playbook.md) before tuning. +Rows whose commit cell ends in `+wt` were run from the working tree based on that commit before the proof row itself was committed. When a row names the later commit in the result text, the runtime/code state under test is the working tree that became that commit; proof-only commits after the run still need fresh `ready`/`doctor` checks because docs change the sidecar input hash. + | Date | Commit | Result | Index seconds | Ground seconds | Search seconds | Symbol seconds | Trail seconds | Snippet seconds | Nodes | Edges | Files | Index errors | Semantic docs | Search dir unchanged | | --- | --- | --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | --- | | 2026-04-18 | 2d6cc2c | pass | 171.97 | 0.09 | 0.84 | 0.09 | 0.07 | 0.06 | 25,500 | 21,622 | 122 | 0 | 10,205 | true | @@ -54,9 +56,76 @@ Keep the full emitted JSON in the test output when reviewing locally, and add th | 2026-06-02 | a23770f+wt | pass, round 9 stats-only release e2e; real drill intentionally skipped with CODESTORY_ALLOW_SKIP_REAL_REPO_DRILL_CASES=1; not real-drill release evidence; retrieval_index_seconds 18.35; retrieval_status_seconds 0.56; retrieval_mode full | 711.31 | 0.32 | 1.77 | 0.59 | 0.32 | 0.27 | 78,582 | 66,332 | 217 | 0 | 10,847 | true | | 2026-06-05 | 42089cc5+wt | pass, stats-only retrieval rollout proof guidance plus strict sidecar markdown freshness fix; real drill intentionally skipped with CODESTORY_ALLOW_SKIP_REAL_REPO_DRILL_CASES=1; retrieval_index_seconds 21.57; retrieval_status_seconds 0.96; retrieval_mode full | 981.56 | 0.50 | 2.94 | 0.54 | 0.34 | 0.26 | 79,028 | 66,731 | 217 | 0 | 10,881 | true | | 2026-06-08 | 9387e9e3 | pass, proof readiness 0.6.2 full-sidecar stats; proof_tier full_sidecar; warnings index_seconds>600 and semantic_phase_seconds>500; real drill not run because CODESTORY_REAL_REPO_DRILL_CASES was missing; retrieval_index_seconds 18.13; retrieval_status_seconds 1.28; retrieval_mode full | 791.43 | 0.39 | 3.46 | 0.49 | 0.27 | 0.35 | 79,779 | 67,446 | 217 | 0 | 11,049 | true | +| 2026-06-10 | a88705f2 | pass, clean main baseline same-machine full-sidecar stats from detached worktree; warnings index_seconds>600 and semantic_phase_seconds>500; retrieval_index_seconds 26.44; retrieval_mode full | 1238.23 | 0.44 | 4.33 | 0.93 | 0.40 | 0.37 | 80,734 | 68,163 | 220 | 0 | 11,178 | true | +| 2026-06-10 | a88705f2+wt | pass, AST-first graph_first_v1 full-sidecar stats; symbol_search_docs 11,315; dense anchors 693; semantic_embedding_ms 43.23s; repeat full refresh 22.75s with 0 embedded; retrieval_index_seconds 7.53; retrieval_mode full | 67.34 | 0.21 | 2.11 | 0.54 | 0.22 | 0.20 | 82,219 | 69,489 | 220 | 0 | 693 | true | +| 2026-06-11 | a88705f2+wt | AST-first graph_first_v1 sampled release e2e; symbol_search_docs 11,336; dense anchors 693; dense skips 10,643; semantic_embedding_ms 48.52s; retrieval_index_seconds 7.31; retrieval_mode full; repeat full refresh 21.39s with 0 embedded; peak descendant 304.93 MB at target/memory-measure/ast-first-release-e2e-v6/summary.json | 67.97 | 0.22 | 2.24 | 0.58 | 0.24 | 0.22 | 82,510 | 69,766 | 220 | 0 | 693 | true | +| 2026-06-11 | a88705f2+wt | final AST-first graph_first_v1 sampled release e2e after drill sidecar finalizer; symbol_search_docs 11,336; dense anchors 693; dense skips 10,643; semantic_embedding_ms 48.83s; retrieval_index_seconds 6.54; retrieval_mode full; repeat full refresh 21.39s with 0 embedded; peak descendant 318.35 MB at target/memory-measure/ast-first-release-e2e-v9/summary.json | 69.18 | 0.26 | 2.38 | 0.56 | 0.24 | 0.23 | 82,528 | 69,784 | 220 | 0 | 693 | true | +| 2026-06-11 | 376df0c8+wt | readiness/handoff and Unix compatibility release e2e; proof_tier full_sidecar; warnings none; real drill intentionally skipped with CODESTORY_ALLOW_SKIP_REAL_REPO_DRILL_CASES=1; symbol_search_docs 11,505; dense anchors 708; dense skips 10,797; semantic_embedding_ms 48.89s; retrieval_index_seconds 10.95; retrieval_mode full; repeat full refresh 20.56s with 0 embedded | 68.23 | 0.22 | 2.27 | 0.54 | 0.22 | 0.20 | 83,735 | 70,803 | 222 | 0 | 708 | true | +| 2026-06-11 | a60f078a+wt | agent-grounding rescue full e2e; proof_tier full_sidecar; warnings none; real drill manifest target/agent-benchmark/real-repo-drill-cases.json with no skip allowance; holdout packet gate final-v4 passed cold+warm; symbol_search_docs 11,543; dense anchors 708; dense skips 10,835; semantic_embedding_ms 45.17s; retrieval_index_seconds 6.50; retrieval_mode full; repeat full refresh 21.82s with 0 embedded | 66.00 | 0.22 | 2.05 | 0.53 | 0.21 | 0.20 | 84,170 | 71,161 | 222 | 0 | 708 | true | +| 2026-06-11 | f89e7c63+wt | review action plan full-sidecar stats; proof_tier full_sidecar; warnings none; real drill not run because CODESTORY_REAL_REPO_DRILL_CASES was missing; symbol_search_docs 11,615; dense anchors 712; dense skips 10,903; semantic_embedding_ms 45.58s; retrieval_index_seconds 8.31; retrieval_mode full; repeat full refresh 23.91s with 0 embedded | 65.12 | 0.21 | 2.00 | 0.52 | 0.21 | 0.19 | 84,389 | 71,323 | 226 | 0 | 712 | true | +| 2026-06-11 | 0ad9c380+wt | language support ownership full-sidecar stats; proof_tier full_sidecar; warnings none; retrieval_index_seconds 7.48; symbol_search_docs 11,630; dense anchors 713; dense skips 10,917; reasons public_api 661, entrypoint 5, central_graph_node 38, component_report 9 | 67.24 | 0.25 | 2.23 | 0.62 | 0.25 | 0.22 | 84,549 | 71,519 | 226 | 0 | 713 | true | +| 2026-06-11 | 0ad9c380+wt | receiver-aware language support follow-up full-sidecar stats; proof_tier full_sidecar; warnings none; retrieval_index_seconds 8.55; symbol_search_docs 11,658; dense anchors 714; dense skips 10,944; reasons public_api 662, entrypoint 5, central_graph_node 38, component_report 9 | 62.23 | 0.20 | 1.96 | 0.49 | 0.21 | 0.20 | 84,900 | 71,799 | 226 | 0 | 714 | true | +| 2026-06-11 | 0ad9c380+wt | Kotlin/Swift/Dart/Bash parser-backed graph stats-only full-sidecar pass; proof_tier full_sidecar; warnings none; broad ignored command also emitted stats but failed separate real drill because CODESTORY_REAL_REPO_DRILL_CASES was missing; retrieval_index_seconds 6.14; symbol_search_docs 11,772; dense anchors 715; dense skips 11,057; reasons public_api 663, entrypoint 5, central_graph_node 38, component_report 9 | 63.02 | 0.21 | 2.04 | 0.54 | 0.22 | 0.21 | 85,463 | 72,261 | 230 | 0 | 715 | true | +| 2026-06-13 | 99e47e77+wt | pass, AST-first retrieval remediation full-sidecar stats; proof_tier full_sidecar; warnings none; real drill intentionally skipped with CODESTORY_ALLOW_SKIP_REAL_REPO_DRILL_CASES=1; symbol_search_docs 12,208; dense anchors 721; dense skips 11,487; reasons public_api 667, entrypoint 6, central_graph_node 38, component_report 10; repeat full refresh 24.57s with 0 embedded; retrieval_index_seconds 7.26; repeat budget 30s | 68.25 | 0.20 | 1.23 | 0.50 | 0.22 | 0.21 | 89,726 | 75,676 | 238 | 0 | 721 | true | +| 2026-06-13 | ba745f33+wt | pass, branch review remediation final full-sidecar stats; proof_tier full_sidecar; warnings none; real drill intentionally skipped with CODESTORY_ALLOW_SKIP_REAL_REPO_DRILL_CASES=1; symbol_search_docs 12,239; dense anchors 721; dense skips 11,518; semantic_embedding_ms 47.06s; retrieval_index_seconds 4.46; retrieval_mode full; repeat full refresh 25.66s with 0 embedded | 71.33 | 0.27 | 1.27 | 0.47 | 0.24 | 0.20 | 90,015 | 75,900 | 238 | 0 | 721 | true | +| 2026-06-13 | 571a34e6+wt | pass, second-pass merge cleanup full-sidecar stats; proof_tier full_sidecar; warnings none; real drill intentionally skipped with CODESTORY_ALLOW_SKIP_REAL_REPO_DRILL_CASES=1; symbol_search_docs 12,243; dense anchors 721; dense skips 11,522; semantic_embedding_ms 43.85s; retrieval_index_seconds 7.30; retrieval_mode full; repeat full refresh 22.42s with 0 embedded | 65.89 | 0.29 | 1.66 | 0.56 | 0.23 | 0.20 | 90,016 | 75,903 | 238 | 0 | 721 | true | +| 2026-06-13 | b1849bfe+wt | pass, third-pass eval-boundary full-sidecar stats; proof_tier full_sidecar; warnings none; real drill intentionally skipped with CODESTORY_ALLOW_SKIP_REAL_REPO_DRILL_CASES=1; symbol_search_docs 12,242; dense anchors 721; dense skips 11,521; semantic_embedding_ms 45.74s; retrieval_index_seconds 7.44; retrieval_mode full; repeat full refresh 24.33s with 0 embedded | 71.08 | 0.24 | 1.28 | 0.44 | 0.21 | 0.19 | 90,000 | 75,872 | 238 | 0 | 721 | true | +| 2026-06-13 | 294c430c+wt | pass, language registry parity cleanup later committed as b0159add; proof_tier full_sidecar; warnings none; real drill intentionally skipped with CODESTORY_ALLOW_SKIP_REAL_REPO_DRILL_CASES=1; retrieval_index_seconds 6.69; retrieval_mode full; symbol_search_docs 12,251; dense anchors 721; dense skips 11,530; reasons public_api 667, entrypoint 6, central_graph_node 38, component_report 10; repeat full refresh 26.90s with 0 embedded | 67.97 | 0.37 | 1.31 | 0.50 | 0.23 | 0.21 | 90,035 | 75,909 | 238 | 0 | 721 | true | +| 2026-06-13 | b0159add+wt | pass, docs-contract and parser-backed sidecar freshness cleanup later committed as e0bf15f5; proof_tier full_sidecar; warnings none; real drill intentionally skipped with CODESTORY_ALLOW_SKIP_REAL_REPO_DRILL_CASES=1; retrieval_index_seconds 6.34; retrieval_mode full; symbol_search_docs 12,261; dense anchors 721; dense skips 11,540; reasons public_api 667, entrypoint 6, central_graph_node 38, component_report 10; repeat full refresh 22.50s with 0 embedded | 70.23 | 0.23 | 1.25 | 0.49 | 0.22 | 0.20 | 90,118 | 75,990 | 238 | 0 | 721 | true | +| 2026-06-13 | 12ebbf95+wt | pass, product semantics audit cleanup full-sidecar stats; proof_tier full_sidecar; warnings none; real drill intentionally skipped with CODESTORY_ALLOW_SKIP_REAL_REPO_DRILL_CASES=1; symbol_search_docs 12,267; dense anchors 721; dense skips 11,546; semantic_embedding_ms 44.25s; retrieval_index_seconds 6.53; retrieval_mode full; repeat full refresh 22.98s with 0 embedded | 67.02 | 0.28 | 1.26 | 0.53 | 0.21 | 0.21 | 90,147 | 76,016 | 238 | 0 | 721 | true | +| 2026-06-14 | 20a55398+wt | pass, packet sufficiency semantics and diagnostics docs full-sidecar stats; proof_tier full_sidecar; warnings none; real drill intentionally skipped with CODESTORY_ALLOW_SKIP_REAL_REPO_DRILL_CASES=1; symbol_search_docs 12,283; dense anchors 721; dense skips 11,562; semantic_embedding_ms 42.67s; retrieval_index_seconds 3.93; retrieval_mode full; repeat full refresh 23.97s with 0 embedded | 66.20 | 0.19 | 1.23 | 0.49 | 0.21 | 0.20 | 90,250 | 76,104 | 238 | 0 | 721 | true | +| 2026-06-14 | 28717906+wt | pass, packet sidecar budget and language-doc proof full-sidecar stats; proof_tier full_sidecar; warnings none; real drill intentionally skipped with CODESTORY_ALLOW_SKIP_REAL_REPO_DRILL_CASES=1; symbol_search_docs 12,398; dense anchors 725; dense skips 11,673; semantic_embedding_ms 67.43s; retrieval_index_seconds 7.73; retrieval_mode full; repeat full refresh 29.99s with 0 embedded | 98.53 | 0.23 | 1.35 | 0.60 | 0.26 | 0.30 | 90,780 | 76,566 | 248 | 0 | 725 | true | +| 2026-06-14 | 28717906+wt | pass, product contract vocabulary and repeat phase-gate full-sidecar stats; proof_tier full_sidecar; warnings none; real drill intentionally skipped with CODESTORY_ALLOW_SKIP_REAL_REPO_DRILL_CASES=1; symbol_search_docs 12,404; dense anchors 725; dense skips 11,679; semantic_embedding_ms 63.48s; retrieval_index_seconds 7.32; retrieval_mode full; repeat full refresh 26.22s with 0 embedded; repeat graph 13.87s; repeat semantic 0.79s; repeat cache 4.70s; repeat search projection/index 1.15s/1.12s | 88.16 | 0.23 | 1.41 | 0.57 | 0.25 | 0.23 | 90,807 | 76,595 | 248 | 0 | 725 | true | +| 2026-06-14 | 28717906+wt | pass, packet abstraction final release e2e after packet_plan/packet_capping/packet_budget extraction; proof_tier full_sidecar; warnings none; real drill intentionally skipped with CODESTORY_ALLOW_SKIP_REAL_REPO_DRILL_CASES=1; symbol_search_docs 12,404; dense anchors 725; dense skips 11,679; semantic_embedding_ms 57.14s; retrieval_index_seconds 7.59; retrieval_mode full; repeat full refresh 26.04s with 0 embedded; repeat graph 13.59s; repeat semantic 1.43s; repeat cache 5.54s; repeat search projection/index 1.21s/1.19s | 84.77 | 0.31 | 1.60 | 0.58 | 0.26 | 0.26 | 90,852 | 76,636 | 250 | 0 | 725 | true | +| 2026-06-14 | 28717906+wt | pass, packet step-trace budget finalizer fix full-sidecar stats; proof_tier full_sidecar; warnings none; real drill intentionally skipped with CODESTORY_ALLOW_SKIP_REAL_REPO_DRILL_CASES=1; symbol_search_docs 12,406; dense anchors 725; dense skips 11,681; semantic_embedding_ms 48.08s; retrieval_index_seconds 7.27; retrieval_mode full; repeat full refresh 23.08s with 0 embedded; repeat graph 12.44s; repeat semantic 0.76s; repeat cache 4.66s; repeat search projection/index 0.95s/1.11s | 72.13 | 0.22 | 1.36 | 0.56 | 0.21 | 0.23 | 90,856 | 76,640 | 250 | 0 | 725 | true | +| 2026-06-14 | 28717906+wt | pass, constrained packet claim profiles full-sidecar stats; proof_tier full_sidecar; warnings none; real drill intentionally skipped with CODESTORY_ALLOW_SKIP_REAL_REPO_DRILL_CASES=1; symbol_search_docs 12,420; dense anchors 725; dense skips 11,695; semantic_embedding_ms 45.88s; retrieval_index_seconds 6.47; retrieval_mode full; repeat full refresh 23.60s with 0 embedded; repeat graph 12.42s; repeat semantic 1.28s; repeat cache 4.97s; repeat search projection/index 0.96s/1.20s | 70.79 | 0.27 | 1.48 | 0.55 | 0.27 | 0.23 | 90,949 | 76,711 | 250 | 0 | 725 | true | +| 2026-06-14 | 28717906+wt | pass, final constrained packet claim profiles with SWR acronym gate full-sidecar stats; proof_tier full_sidecar; warnings none; real drill intentionally skipped with CODESTORY_ALLOW_SKIP_REAL_REPO_DRILL_CASES=1; symbol_search_docs 12,420; dense anchors 725; dense skips 11,695; semantic_embedding_ms 45.60s; retrieval_index_seconds 6.47; retrieval_mode full; repeat full refresh 23.52s with 0 embedded; repeat graph 12.17s; repeat semantic 1.12s; repeat cache 4.86s; repeat search projection/index 0.97s/1.12s | 68.20 | 0.32 | 1.35 | 0.57 | 0.25 | 0.22 | 90,954 | 76,715 | 250 | 0 | 725 | true | +| 2026-06-14 | 28717906+wt | pass, final cleanup without temporary plan docs full-sidecar stats; proof_tier full_sidecar; warnings none; real drill intentionally skipped with CODESTORY_ALLOW_SKIP_REAL_REPO_DRILL_CASES=1; symbol_search_docs 12,420; dense anchors 725; dense skips 11,695; semantic_embedding_ms 51.80s; retrieval_index_seconds 7.81; retrieval_mode full; repeat full refresh 24.41s with 0 embedded; repeat graph 12.66s; repeat semantic 0.68s; repeat cache 4.55s; repeat search projection/index 1.23s/1.11s | 76.13 | 0.31 | 1.48 | 0.71 | 0.28 | 0.23 | 90,954 | 76,715 | 250 | 0 | 725 | true | +| 2026-06-14 | 69c033c4+wt | pass, packet output budget and trace-writer cleanup full-sidecar stats; proof_tier full_sidecar; warnings none; real drill intentionally skipped with CODESTORY_ALLOW_SKIP_REAL_REPO_DRILL_CASES=1; symbol_search_docs 12,425; dense anchors 725; dense skips 11,700; semantic_embedding_ms 46.17s; retrieval_index_seconds 6.90; retrieval_mode full; repeat full refresh 24.87s with 0 embedded; repeat graph 12.49s; repeat semantic 0.70s; repeat cache 5.92s; repeat search projection/index 1.02s/2.55s | 72.75 | 0.30 | 1.69 | 0.57 | 0.25 | 0.21 | 90,984 | 76,741 | 250 | 0 | 725 | true | +| 2026-06-14 | 0f7020ed+wt | pass, review remediation spec execution full-sidecar stats; proof_tier full_sidecar; warnings none; real drill intentionally skipped with CODESTORY_ALLOW_SKIP_REAL_REPO_DRILL_CASES=1; symbol_search_docs 12,494; dense anchors 725; dense skips 11,769; semantic_embedding_ms 63.77s; retrieval_index_seconds 5.08; retrieval_mode full; repeat full refresh 28.34s with 0 embedded; repeat graph 13.66s; repeat semantic 1.34s; repeat cache 7.89s; repeat search projection/index 1.29s/2.33s | 101.15 | 0.24 | 1.51 | 0.67 | 0.33 | 0.28 | 91,417 | 77,058 | 251 | 0 | 725 | true | +| 2026-06-14 | 3291c4f1+wt | pass, readiness repair, publishable provenance, eval-boundary cleanup, fatal-error readiness split, and full-SHA benchmark manifest pinning full ignored e2e; proof_tier full_sidecar; warnings none; real drill intentionally skipped with CODESTORY_ALLOW_SKIP_REAL_REPO_DRILL_CASES=1 because CODESTORY_REAL_REPO_DRILL_CASES was not set; sidecar_status_after_retrieval_index full; search.sidecar_shadow_retrieval_mode full; symbol_search_docs 12,558; dense anchors 725; dense skips 11,833; semantic_embedding_ms 54.33s; retrieval_index_seconds 8.38; retrieval_status_seconds 0.46; repeat full refresh 29.13s with 725 reused and 0 embedded | 85.71 | 0.26 | 1.92 | 0.70 | 0.24 | 0.24 | 91,707 | 77,287 | 251 | 0 | 725 | true | + +## Repeat And Report Timing + +New `codestory_repo_e2e_stats` runs emit `repeat_full_refresh_seconds`, +`repeat_cache_refresh_ms`, `repeat_search_projection_rebuild_ms`, +`repeat_search_symbol_index_ms`, `repeat_runtime_cache_publish_ms`, +`report_seconds`, and nested `report.markdown_seconds` / `report.json_seconds`. +Append the measurement row here when running the release harness. The repeat +wall-clock value is trend telemetry with a loose smoke cap; graph, semantic, +and zero-reembedding assertions are the actionable repeat-refresh gates. + +| Date | Commit | Scenario | Repeat full refresh seconds | Report seconds | Report markdown seconds | Report JSON seconds | +| --- | --- | --- | ---: | ---: | ---: | ---: | +| 2026-06-11 | 376df0c8+wt | readiness/handoff and Unix compatibility release e2e; proof_tier full_sidecar; real drill skipped with CODESTORY_ALLOW_SKIP_REAL_REPO_DRILL_CASES=1 | 20.56 | 2.59 | 1.09 | 1.50 | +| 2026-06-11 | a60f078a+wt | agent-grounding rescue full e2e; proof_tier full_sidecar; real drill manifest target/agent-benchmark/real-repo-drill-cases.json with no skip allowance; holdout packet gate final-v4 passed cold+warm | 21.82 | 2.56 | 1.10 | 1.46 | +| 2026-06-11 | f89e7c63+wt | review action plan full-sidecar stats; proof_tier full_sidecar; real drill not run because CODESTORY_REAL_REPO_DRILL_CASES was missing | 23.91 | 2.59 | 1.08 | 1.51 | +| 2026-06-13 | 99e47e77+wt | AST-first retrieval remediation full-sidecar stats; proof_tier full_sidecar; real drill skipped with CODESTORY_ALLOW_SKIP_REAL_REPO_DRILL_CASES=1; repeat budget 30s | 24.57 | 1.90 | 0.78 | 1.12 | +| 2026-06-13 | ba745f33+wt | branch review remediation final full-sidecar stats; proof_tier full_sidecar; real drill skipped with CODESTORY_ALLOW_SKIP_REAL_REPO_DRILL_CASES=1 | 25.66 | 1.98 | 0.79 | 1.19 | +| 2026-06-13 | 571a34e6+wt | second-pass merge cleanup full-sidecar stats; proof_tier full_sidecar; real drill skipped with CODESTORY_ALLOW_SKIP_REAL_REPO_DRILL_CASES=1 | 22.42 | 2.00 | 0.81 | 1.19 | +| 2026-06-13 | b1849bfe+wt | third-pass eval-boundary full-sidecar stats; proof_tier full_sidecar; real drill skipped with CODESTORY_ALLOW_SKIP_REAL_REPO_DRILL_CASES=1 | 24.33 | 1.86 | 0.74 | 1.12 | +| 2026-06-13 | 294c430c+wt | language registry parity cleanup later committed as b0159add; proof_tier full_sidecar; real drill skipped with CODESTORY_ALLOW_SKIP_REAL_REPO_DRILL_CASES=1 | 26.90 | 2.16 | 0.86 | 1.30 | +| 2026-06-13 | b0159add+wt | docs-contract and parser-backed sidecar freshness cleanup later committed as e0bf15f5; proof_tier full_sidecar; real drill skipped with CODESTORY_ALLOW_SKIP_REAL_REPO_DRILL_CASES=1 | 22.50 | 1.92 | 0.77 | 1.15 | +| 2026-06-13 | 12ebbf95+wt | product semantics audit cleanup full-sidecar stats; proof_tier full_sidecar; real drill skipped with CODESTORY_ALLOW_SKIP_REAL_REPO_DRILL_CASES=1 | 22.98 | 1.95 | 0.75 | 1.21 | +| 2026-06-14 | 20a55398+wt | packet sufficiency semantics and diagnostics docs full-sidecar stats; proof_tier full_sidecar; real drill skipped with CODESTORY_ALLOW_SKIP_REAL_REPO_DRILL_CASES=1 | 23.97 | 2.06 | 0.94 | 1.13 | +| 2026-06-14 | 28717906+wt | packet sidecar budget and language-doc proof full-sidecar stats; proof_tier full_sidecar; real drill skipped with CODESTORY_ALLOW_SKIP_REAL_REPO_DRILL_CASES=1 | 29.99 | 2.62 | 1.15 | 1.48 | +| 2026-06-14 | 28717906+wt | product contract vocabulary and repeat phase-gate full-sidecar stats; proof_tier full_sidecar; real drill skipped with CODESTORY_ALLOW_SKIP_REAL_REPO_DRILL_CASES=1; repeat graph 13.87s; repeat semantic 0.79s; repeat cache/search projection/index 4.70s/1.15s/1.12s | 26.22 | 2.15 | 0.87 | 1.28 | +| 2026-06-14 | 28717906+wt | packet abstraction final release e2e; proof_tier full_sidecar; real drill skipped with CODESTORY_ALLOW_SKIP_REAL_REPO_DRILL_CASES=1; repeat graph 13.59s; repeat semantic 1.43s; repeat cache/search projection/index 5.54s/1.21s/1.19s | 26.04 | 2.39 | 0.92 | 1.47 | +| 2026-06-14 | 28717906+wt | packet step-trace budget finalizer fix full-sidecar stats; proof_tier full_sidecar; real drill skipped with CODESTORY_ALLOW_SKIP_REAL_REPO_DRILL_CASES=1; repeat graph 12.44s; repeat semantic 0.76s; repeat cache/search projection/index 4.66s/0.95s/1.11s | 23.08 | 2.09 | 0.84 | 1.25 | +| 2026-06-14 | 28717906+wt | constrained packet claim profiles full-sidecar stats; proof_tier full_sidecar; real drill skipped with CODESTORY_ALLOW_SKIP_REAL_REPO_DRILL_CASES=1; repeat graph 12.42s; repeat semantic 1.28s; repeat cache/search projection/index 4.97s/0.96s/1.20s | 23.60 | 2.11 | 0.83 | 1.28 | +| 2026-06-14 | 28717906+wt | final constrained packet claim profiles with SWR acronym gate full-sidecar stats; proof_tier full_sidecar; real drill skipped with CODESTORY_ALLOW_SKIP_REAL_REPO_DRILL_CASES=1; repeat graph 12.17s; repeat semantic 1.12s; repeat cache/search projection/index 4.86s/0.97s/1.12s | 23.52 | 2.07 | 0.82 | 1.24 | +| 2026-06-14 | 28717906+wt | final cleanup without temporary plan docs full-sidecar stats; proof_tier full_sidecar; real drill skipped with CODESTORY_ALLOW_SKIP_REAL_REPO_DRILL_CASES=1; repeat graph 12.66s; repeat semantic 0.68s; repeat cache/search projection/index 4.55s/1.23s/1.11s | 24.41 | 2.19 | 0.92 | 1.27 | +| 2026-06-14 | 69c033c4+wt | packet output budget and trace-writer cleanup full-sidecar stats; proof_tier full_sidecar; real drill skipped with CODESTORY_ALLOW_SKIP_REAL_REPO_DRILL_CASES=1; repeat graph 12.49s; repeat semantic 0.70s; repeat cache/search projection/index 5.92s/1.02s/2.55s | 24.87 | 2.03 | 0.80 | 1.23 | +| 2026-06-14 | 0f7020ed+wt | review remediation spec execution full-sidecar stats; proof_tier full_sidecar; real drill skipped with CODESTORY_ALLOW_SKIP_REAL_REPO_DRILL_CASES=1; repeat graph 13.66s; repeat semantic 1.34s; repeat cache/search projection/index 7.89s/1.29s/2.33s | 28.34 | 2.93 | 1.47 | 1.45 | +| 2026-06-14 | 3291c4f1+wt | readiness repair, publishable provenance, eval-boundary cleanup, fatal-error readiness split, and full-SHA benchmark manifest pinning full ignored e2e; proof_tier full_sidecar; real drill skipped with CODESTORY_ALLOW_SKIP_REAL_REPO_DRILL_CASES=1 because CODESTORY_REAL_REPO_DRILL_CASES was not set; repeat graph 14.80s; repeat semantic 1.60s; repeat cache/search projection/index 5.88s/1.19s/1.21s | 29.13 | 2.59 | 1.08 | 1.51 | ## Phase Metrics +Rows for `2026-06-11` commit `0ad9c380+wt` were moved to the headline table +above because the source rows only recorded headline stats. No separate +graph/semantic phase split is recoverable from those rows, so they are omitted +from this phase table rather than backfilled. + | Date | Commit | Scenario | Index seconds | Graph phase seconds | Semantic phase seconds | Semantic docs reused | Semantic docs embedded | Semantic docs stale | | --- | --- | --- | ---: | ---: | ---: | ---: | ---: | ---: | | 2026-04-18 | c383227 | fresh temp cache E2E | 211.02 | 3.21 | 201.66 | 0 | 10,359 | 0 | @@ -103,3 +172,26 @@ Keep the full emitted JSON in the test output when reviewing locally, and add th | 2026-06-02 | a23770f+wt | round 9 stats-only release e2e; real drill intentionally skipped with CODESTORY_ALLOW_SKIP_REAL_REPO_DRILL_CASES=1; not real-drill release evidence; retrieval_index_seconds 18.35; retrieval_mode full | 711.31 | 11.08 | 691.07 | 0 | 10,847 | 0 | | 2026-06-05 | 42089cc5+wt | stats-only retrieval rollout proof guidance plus strict sidecar markdown freshness fix; real drill intentionally skipped with CODESTORY_ALLOW_SKIP_REAL_REPO_DRILL_CASES=1; retrieval_index_seconds 21.57; retrieval_mode full | 981.56 | 9.67 | 963.51 | 0 | 10,881 | 0 | | 2026-06-08 | 9387e9e3 | proof readiness 0.6.2 full-sidecar stats; proof_tier full_sidecar; warnings index_seconds>600 and semantic_phase_seconds>500; real drill not run because CODESTORY_REAL_REPO_DRILL_CASES was missing; retrieval_index_seconds 18.13; retrieval_mode full | 791.43 | 9.73 | 772.72 | 0 | 11,049 | 0 | +| 2026-06-10 | a88705f2 | clean main baseline same-machine full-sidecar stats from detached worktree; warnings index_seconds>600 and semantic_phase_seconds>500; retrieval_index_seconds 26.44; retrieval_mode full | 1238.23 | 13.61 | 1211.82 | 0 | 11,178 | 0 | +| 2026-06-10 | a88705f2+wt | AST-first graph_first_v1 full-sidecar stats; symbol_search_docs 11,315; dense anchors 693; dense skips 10,622; reasons public_api 643, entrypoint 5, central_graph_node 36, component_report 9; repeat full refresh 22.75s with 0 embedded | 67.34 | 13.16 | 43.98 | 0 | 693 | 0 | +| 2026-06-11 | 376df0c8+wt | readiness/handoff and Unix compatibility release e2e; proof_tier full_sidecar; real drill skipped with CODESTORY_ALLOW_SKIP_REAL_REPO_DRILL_CASES=1; symbol_search_docs 11,505; dense anchors 708; dense skips 10,797; reasons public_api 656, entrypoint 5, central_graph_node 38, component_report 9 | 68.23 | 10.11 | 49.85 | 0 | 708 | 0 | +| 2026-06-11 | a60f078a+wt | agent-grounding rescue full e2e; proof_tier full_sidecar; real drill manifest target/agent-benchmark/real-repo-drill-cases.json with no skip allowance; symbol_search_docs 11,543; dense anchors 708; dense skips 10,835; reasons public_api 656, entrypoint 5, central_graph_node 38, component_report 9 | 66.00 | 11.25 | 45.95 | 0 | 708 | 0 | +| 2026-06-11 | f89e7c63+wt | review action plan full-sidecar stats; proof_tier full_sidecar; real drill not run because CODESTORY_REAL_REPO_DRILL_CASES was missing; symbol_search_docs 11,615; dense anchors 712; dense skips 10,903; reasons public_api 660, entrypoint 5, central_graph_node 38, component_report 9 | 65.12 | 10.58 | 46.32 | 0 | 712 | 0 | +| 2026-06-13 | 99e47e77+wt | AST-first retrieval remediation full-sidecar stats; proof_tier full_sidecar; warnings none; symbol_search_docs 12,208; dense anchors 721; dense skips 11,487; reasons public_api 667, entrypoint 6, central_graph_node 38, component_report 10; repeat full refresh 24.57s with 0 embedded | 68.25 | 13.19 | 46.06 | 0 | 721 | 0 | +| 2026-06-13 | ba745f33+wt | branch review remediation final full-sidecar stats; proof_tier full_sidecar; warnings none; symbol_search_docs 12,239; dense anchors 721; dense skips 11,518; reasons public_api 667, entrypoint 6, central_graph_node 38, component_report 10; repeat full refresh 25.66s with 0 embedded | 71.33 | 13.29 | 47.96 | 0 | 721 | 0 | +| 2026-06-13 | 571a34e6+wt | second-pass merge cleanup full-sidecar stats; proof_tier full_sidecar; warnings none; symbol_search_docs 12,243; dense anchors 721; dense skips 11,522; reasons public_api 667, entrypoint 6, central_graph_node 38, component_report 10; repeat full refresh 22.42s with 0 embedded | 65.89 | 12.51 | 44.61 | 0 | 721 | 0 | +| 2026-06-13 | b1849bfe+wt | third-pass eval-boundary full-sidecar stats; proof_tier full_sidecar; warnings none; symbol_search_docs 12,242; dense anchors 721; dense skips 11,521; reasons public_api 667, entrypoint 6, central_graph_node 38, component_report 10; repeat full refresh 24.33s with 0 embedded | 71.08 | 14.38 | 46.63 | 0 | 721 | 0 | +| 2026-06-13 | 294c430c+wt | language registry parity cleanup later committed as b0159add; proof_tier full_sidecar; real drill skipped with CODESTORY_ALLOW_SKIP_REAL_REPO_DRILL_CASES=1; retrieval_index_seconds 6.69; retrieval_mode full | 67.97 | 12.72 | 45.34 | 0 | 721 | 0 | +| 2026-06-13 | b0159add+wt | docs-contract and parser-backed sidecar freshness cleanup later committed as e0bf15f5; proof_tier full_sidecar; real drill skipped with CODESTORY_ALLOW_SKIP_REAL_REPO_DRILL_CASES=1; retrieval_index_seconds 6.34; retrieval_mode full | 70.23 | 12.48 | 48.96 | 0 | 721 | 0 | +| 2026-06-13 | 12ebbf95+wt | product semantics audit cleanup full-sidecar stats; proof_tier full_sidecar; real drill skipped with CODESTORY_ALLOW_SKIP_REAL_REPO_DRILL_CASES=1; retrieval_index_seconds 6.53; retrieval_mode full | 67.02 | 12.30 | 45.09 | 0 | 721 | 0 | +| 2026-06-14 | 20a55398+wt | packet sufficiency semantics and diagnostics docs full-sidecar stats; proof_tier full_sidecar; real drill skipped with CODESTORY_ALLOW_SKIP_REAL_REPO_DRILL_CASES=1; retrieval_index_seconds 3.93; retrieval_mode full | 66.20 | 12.68 | 43.71 | 0 | 721 | 0 | +| 2026-06-14 | 28717906+wt | packet sidecar budget and language-doc proof full-sidecar stats; proof_tier full_sidecar; symbol_search_docs 12,398; dense anchors 725; dense skips 11,673; reasons public_api 669, entrypoint 6, central_graph_node 40, component_report 10 | 98.53 | 15.08 | 68.50 | 0 | 725 | 0 | +| 2026-06-14 | 28717906+wt | product contract vocabulary and repeat phase-gate full-sidecar stats; proof_tier full_sidecar; symbol_search_docs 12,404; dense anchors 725; dense skips 11,679; reasons public_api 669, entrypoint 6, central_graph_node 40, component_report 10 | 88.16 | 12.92 | 64.42 | 0 | 725 | 0 | +| 2026-06-14 | 28717906+wt | packet abstraction final release e2e; proof_tier full_sidecar; symbol_search_docs 12,404; dense anchors 725; dense skips 11,679; reasons public_api 669, entrypoint 6, central_graph_node 40, component_report 10 | 84.77 | 12.67 | 58.41 | 0 | 725 | 0 | +| 2026-06-14 | 28717906+wt | packet step-trace budget finalizer fix full-sidecar stats; proof_tier full_sidecar; symbol_search_docs 12,406; dense anchors 725; dense skips 11,681; reasons public_api 669, entrypoint 6, central_graph_node 40, component_report 10 | 72.13 | 13.29 | 49.13 | 0 | 725 | 0 | +| 2026-06-14 | 28717906+wt | constrained packet claim profiles full-sidecar stats; proof_tier full_sidecar; symbol_search_docs 12,420; dense anchors 725; dense skips 11,695; reasons public_api 669, entrypoint 6, central_graph_node 40, component_report 10 | 70.79 | 12.60 | 49.31 | 0 | 725 | 0 | +| 2026-06-14 | 28717906+wt | final constrained packet claim profiles with SWR acronym gate full-sidecar stats; proof_tier full_sidecar; symbol_search_docs 12,420; dense anchors 725; dense skips 11,695; reasons public_api 669, entrypoint 6, central_graph_node 40, component_report 10 | 68.20 | 12.43 | 46.50 | 0 | 725 | 0 | +| 2026-06-14 | 28717906+wt | final cleanup without temporary plan docs full-sidecar stats; proof_tier full_sidecar; symbol_search_docs 12,420; dense anchors 725; dense skips 11,695; reasons public_api 669, entrypoint 6, central_graph_node 40, component_report 10 | 76.13 | 12.43 | 53.24 | 0 | 725 | 0 | +| 2026-06-14 | 69c033c4+wt | packet output budget and trace-writer cleanup full-sidecar stats; proof_tier full_sidecar; symbol_search_docs 12,425; dense anchors 725; dense skips 11,700; reasons public_api 669, entrypoint 6, central_graph_node 40, component_report 10 | 72.75 | 13.61 | 47.47 | 0 | 725 | 0 | +| 2026-06-14 | 0f7020ed+wt | review remediation spec execution full-sidecar stats; proof_tier full_sidecar; symbol_search_docs 12,494; dense anchors 725; dense skips 11,769; reasons public_api 669, entrypoint 6, central_graph_node 40, component_report 10 | 101.15 | 16.68 | 73.60 | 0 | 725 | 0 | +| 2026-06-14 | 3291c4f1+wt | readiness repair, publishable provenance, eval-boundary cleanup, fatal-error readiness split, and full-SHA benchmark manifest pinning full ignored e2e; proof_tier full_sidecar; warnings none; real drill skipped with CODESTORY_ALLOW_SKIP_REAL_REPO_DRILL_CASES=1 because CODESTORY_REAL_REPO_DRILL_CASES was not set; sidecar_status_after_retrieval_index full; search.sidecar_shadow_retrieval_mode full; retrieval_index_seconds 8.38; retrieval_status_seconds 0.46; repeat full refresh 29.13s with 725 reused and 0 embedded; report_seconds 2.59; symbol_search_docs 12,558; dense anchors 725; dense skips 11,833; reasons public_api 669, entrypoint 6, central_graph_node 40, component_report 10 | 85.71 | 16.59 | 55.35 | 0 | 725 | 0 | diff --git a/docs/testing/codestory-stdio-warm-loop-stats.md b/docs/testing/codestory-stdio-warm-loop-stats.md index 0d8ea701..3a7ac6cd 100644 --- a/docs/testing/codestory-stdio-warm-loop-stats.md +++ b/docs/testing/codestory-stdio-warm-loop-stats.md @@ -4,14 +4,14 @@ This log tracks the persistent `serve --stdio` path that agents should prefer on Run after building the release CLI: -```powershell +```sh cargo build --release -p codestory-cli cargo test -p codestory-cli --test stdio_warm_loop_stats -- --ignored --nocapture ``` The harness prints metrics from the test process after the stdio server exits. The server stdout remains protocol-only: one JSON-RPC response per line, with no benchmark text mixed into the protocol stream. -| Date | Commit | Scenario | Result | Reps | Startup ms | Tools/list ms | First search ms | Cold one-loop ms | Warm total ms | Warm per-loop ms | Warm/cold per-loop ratio | Search p50/p95/p99 ms | Symbol p50/p95/p99 ms | Trail p50/p95/p99 ms | Snippet p50/p95/p99 ms | Status p50/p95/p99 ms | Index semantic reload ms | Warm stdio semantic reload ms | Fallback reason | Warm search dir unchanged | Protocol stdout only | +| Date | Commit | Scenario | Result | Reps | Startup ms | Tools/list ms | First search ms | Cold one-loop ms | Warm total ms | Warm per-loop ms | Warm/cold per-loop ratio | Search p50/p95/p99 ms | Symbol p50/p95/p99 ms | Trail p50/p95/p99 ms | Snippet p50/p95/p99 ms | Sidecar fingerprint/status p50/p95/p99 ms | Index semantic reload ms | Warm stdio semantic reload ms | Fallback reason | Warm search dir unchanged | Protocol stdout only | | --- | --- | --- | --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | --- | --- | --- | --- | --- | ---: | --- | --- | --- | --- | | 2026-05-06 | pending | small fixture, release binary, hash embeddings | pass | 20 | 25.09 | 1.56 | 25.96 | 169.29 | 1070.03 | 53.50 | 0.32 | 20.84/25.96/25.96 | 15.01/17.67/17.67 | 10.25/13.92/13.92 | 6.50/8.36/8.36 | 6.79/13.17/13.17 | 0 | null | null | true | true | @@ -77,6 +77,6 @@ so a changed index bypasses the cached packet. - The baseline is a small-fixture release-binary smoke, not a repo-scale promotion gate. - Response bytes are run-local smoke metrics because temp paths appear in JSON payloads. -- `warm per-loop ms` covers `search -> symbol -> trail -> snippet`; `resources/read codestory://status` is measured separately because it is a health check, not part of the cold one-shot comparison. +- `warm per-loop ms` covers `search -> symbol -> trail -> snippet`; `resources/read codestory://status` is measured separately as `sidecar_status` because it includes the mandatory sidecar fingerprint/status check, not the cold one-shot comparison. - `warm stdio semantic reload ms` is `null` because `serve --stdio` does not currently expose a dedicated semantic reload phase; any warm-server load cost is included in `startup ms`. - Add hard latency budgets only after several local runs establish variance. diff --git a/docs/testing/codestory-stress-lanes.md b/docs/testing/codestory-stress-lanes.md index c430f6d1..a355f257 100644 --- a/docs/testing/codestory-stress-lanes.md +++ b/docs/testing/codestory-stress-lanes.md @@ -7,20 +7,20 @@ metrics exist. They are promotion scouts, not product proof by themselves. Default smoke scale builds a 1k-file synthetic repo: -```powershell +```sh cargo bench -p codestory-bench --bench browser_stress ``` Larger scales are opt-in: -```powershell -$env:CODESTORY_STRESS_SCALE = "large" # 1k + 10k -$env:CODESTORY_ALLOW_HEAVY_STRESS = "1" +```sh +export CODESTORY_STRESS_SCALE=large # 1k + 10k +export CODESTORY_ALLOW_HEAVY_STRESS=1 cargo bench -p codestory-bench --bench browser_stress -$env:CODESTORY_STRESS_SCALE = "full" # 1k + 10k + 100k -$env:CODESTORY_ALLOW_HEAVY_STRESS = "1" -$env:CODESTORY_ALLOW_100K_STRESS = "1" +export CODESTORY_STRESS_SCALE=full # 1k + 10k + 100k +export CODESTORY_ALLOW_HEAVY_STRESS=1 +export CODESTORY_ALLOW_100K_STRESS=1 cargo bench -p codestory-bench --bench browser_stress ``` diff --git a/docs/testing/framework-route-coverage.md b/docs/testing/framework-route-coverage.md index be3b3439..4a275297 100644 --- a/docs/testing/framework-route-coverage.md +++ b/docs/testing/framework-route-coverage.md @@ -2,9 +2,14 @@ CodeStory indexes framework routes as graph symbols when extraction is backed by fixtures and confidence labels. Do not claim full framework support from a -single heuristic hit. +single heuristic hit. Language support tiers are defined separately in +[language-support.md](../architecture/language-support.md). -## Current Coverage Target +## Current Route Extraction Scope + +These bullets list extractor coverage or tracked fixture targets, not full +framework parity; use `summary.framework_route_coverage` for per-framework +status, confidence floor, handler-link support, known gaps, and promotability. - JavaScript/TypeScript: Express, React Router, SvelteKit, Next.js, Remix, Fastify, Koa, Hono, NestJS. @@ -15,10 +20,9 @@ single heuristic hit. - Rust: Axum, Actix, Rocket. - Go: Gin, Chi, Echo, Fiber as text-only partial route extraction until Go parser-backed handler links exist. -- Kotlin/Swift/Dart (unmapped today): Ktor, Vapor, and Shelf heuristics are - implemented in `collect_framework_routes` for when those language paths index - source files; fixture coverage lives in - `test_framework_route_extractors_cover_requested_web_stacks`. +- Kotlin/Swift/Dart: Ktor, Vapor, and Shelf extractor fixtures exist, but + published framework coverage is not promoted until the runtime coverage + matrix lists status, gaps, and handler-link support. - Existing OpenAPI endpoint indexing remains separate and should continue to produce endpoint symbols and speculative client-call edges. - Payload collection config and usage extraction is tracked as data bridge @@ -67,7 +71,7 @@ be checked with `files --path ` or a fresh index. identity. 5. Run `codestory-cli files --project --format json` and inspect `summary.framework_route_coverage` for framework, language, status, - fixture status, confidence floor, handler-link support, unsupported + coverage evidence, confidence floor, handler-link support, unsupported patterns, known gaps, and promotable status. 6. Run `cargo test -p codestory-indexer --lib framework_route`. 7. Run the search-quality eval harness when route names should be discoverable: diff --git a/docs/testing/language-expansion-ab-report.md b/docs/testing/language-expansion-ab-report.md new file mode 100644 index 00000000..33cc7942 --- /dev/null +++ b/docs/testing/language-expansion-ab-report.md @@ -0,0 +1,292 @@ +# Language Expansion A/B Report + +Date: 2026-06-14 + +## Verdict + +The language-expansion work is useful product evidence, but it is not broad +promotion proof and it is not a first-class claim for every language/framework. + +The corrected frame is: + +- Framework and domain semantics are product semantics. React/Next routes, + Express middleware, Gin handlers, ASP.NET endpoints, Rails controllers, + Django views/models, LINQ-style flows, and similar concepts are not overfit + merely because they are language- or framework-aware. +- Benchmark overfit is different: production code must not depend on holdout + task ids, pinned benchmark repo names, fixture paths, one-off route names, or + expected-answer wording. +- Parser-backed language support is not the same thing as first-class + framework/domain support. + +Current evidence says CodeStory is better on the packet-eligible slice and the +sidecar/packet path is improving, but the 18-language packet experience still +needs iteration. The main remaining product gap is answer semantics: several +packets cite the right anchors but still emit generic "supports/inspect" claims +instead of explaining the actual handoff. + +## Evidence Ledger + +| Slice | Raw evidence | Result | Use it for | +| --- | --- | --- | --- | +| Full 18-language paired A/B | `target/agent-benchmark/segment6-full-language-suite-r1-pathfix/reanalyzed-summary.json` and `.md` | CodeStory quality `9/18`; no-CodeStory quality `7/17` scored with one unsuccessful row. CodeStory used `13,060,265` tokens vs `8,191,771`, `4,014,646 ms` runner wall vs `3,094,988 ms`, and `4,796,792 ms` all-in wall vs `3,094,988 ms`. | Historical negative/diagnostic evidence. | +| Packet-eligible paired A/B | `target/agent-benchmark/segment8-no-family-steering-current9-ab-java-css-generic-shapes/reanalyzed-summary.json` and `.md` | CodeStory quality `9/9` vs no-CodeStory `6/9`; CodeStory used `291,788` tokens vs `5,346,265`, `502,289 ms` all-in wall vs `1,881,683 ms`, `9` commands vs `282`, and zero source reads vs `228`. | Narrow positive evidence for rows that were packet-eligible in that run. | +| Fresh 18-row packet runtime before sidecar fix | `target/agent-benchmark/language-expansion-packet-runtime-current-28717906/packet-runtime-summary.md`, `packet-composition.md`, and `quality-debug.json` | `13/18` rows produced scored packets, `7/13` scored rows passed manifest quality, `4/13` were partial, and `5/18` failed as hard `retrieval_unavailable` command failures. | Current diagnostic baseline before the sidecar unresolved-candidate fix. | +| Five-row sidecar unresolved-candidate fix slice | `target/agent-benchmark/language-expansion-packet-runtime-sidecar-unresolved-fix/packet-runtime-summary.md` and `quality-debug.json` | The five previously hard-failing rows all produced packet output. Quality passed `3/5` (`java`, `c`, `css`) and failed expected-claim recall `2/5` (`express`, `swift`). All five remained packet-partial because unresolved candidates and compact-budget truncation are now surfaced as sufficiency gaps instead of command failures. | Regression evidence for the sidecar strictness fix; not a substitute for a fresh full 18-row run. | +| Two-row product-claim semantics fix slice | `target/agent-benchmark/language-expansion-packet-runtime-claim-semantics-fix/packet-runtime-summary.md` and `quality-debug.json` | Express and Swift/URLSession both passed manifest quality after source-derived product claims replaced generic "supports/inspect" wording. Both remain packet-partial. | Regression evidence for production framework/domain semantics without enabling eval-only probes. | +| Current 18-row packet runtime after fixes | `target/agent-benchmark/language-expansion-packet-runtime-current-after-claim-fixes/packet-runtime-summary.json`, `.md`, `packet-composition.md`, and `quality-debug.json` | `18/18` command pass, `18/18` scored, `12/18` manifest-quality pass, `9/18` packet sufficient, `9/18` packet partial. Packet retrieval SLA misses remain on Java (`30,931 ms`), Redis (`30,313 ms`), and Okio (`20,799 ms`). | Canonical current packet-runtime evidence for this branch. | + +The current full packet-runtime artifact supersedes the stitched estimates from +the smaller repair slices. + +## Product Semantics vs Benchmark Overfit + +### Keep + +The framework route collectors in `crates/codestory-indexer/src/lib.rs` are +product semantics and should stay. They cover common route shapes for Express, +Fastify, Koa, Hono, React Router, SvelteKit, Next, Remix, Astro, Nuxt, Django, +Flask/FastAPI-style decorators, Spring, Axum/Actix/Rocket, Gin, Rails, +Laravel, and ASP.NET with explicit confidence labels. Ktor, Vapor, and Shelf +extractor fixtures exist, but they are not published in +`summary.framework_route_coverage` yet; treat them as extractor-level semantics +until the coverage matrix names status, gaps, and handler-link support. + +These are not benchmark hacks. They are the kind of domain knowledge required +for first-class framework support. + +### Move or Rename + +Packet source-claim semantics have been moved out of the orchestrator into +named runtime profile modules: + +- `packet_terms.rs` owns prompt/probe term extraction. +- `packet_source_patterns.rs` owns source-pattern primitives. +- `packet_claims.rs` owns ranked citation-to-claim synthesis and source + definition claim extraction. +- `packet_claim_profiles.rs` owns product claim profiles such as server route, + hook/cache, client-send, URLSession request lifecycle, string-predicate, + stylesheet animation, SQL schema, runtime-formatting, and search-execution + flows. +- `packet_command_profiles.rs` owns command-span probes and command-flow claim + templates. +- `packet_evidence_roles.rs` owns typed citation role classification; labels + leave that boundary only for user-facing text, trace rows, and claim keys. +- `packet_required_probes.rs` owns product-required probe expansion, concrete + file probe adaptation, and citation/claim coverage matching. +- `packet_citations.rs` owns shared citation display/path/source helpers. +- `packet_capping.rs` owns citation budget-capping policy. +- `packet_sufficiency.rs` owns packet sufficiency thresholds, budget-blocking + verdicts, gap text, command quoting, and follow-up command assembly. + +That boundary is the intended architecture. New framework/domain steering +should land as a named profile or collector, not as another ad hoc branch inside +the orchestrator. + +Indexing-flow packet probes now use product concepts such as indexing +entrypoint, file discovery, symbol extraction, storage persistence, search +projection, and snapshot refresh. Exact CodeStory fixture anchors such as +specific method names are test evidence or request-scoped diagnostics, not +production-required probes. + +Search-execution packet probes and product claims now use generic product +concepts such as search entrypoint, flag parsing, candidate traversal, search +execution, parallel search, and result output. Ripgrep-shaped wording such as +`SearchWorker`, `haystack`, `walk_builder`, `PatternMatcher`, and +`flags::parse` remains benchmark/eval-only. + +### Quarantine + +Exact holdout probes and expected-claim shaping belong in benchmark manifests, +scorer inputs, request-scoped probes, or `eval_probes.rs`. The current runtime +quarantine is intentionally hard: in non-test builds, +`eval_probes_enabled()` returns `false`, so release CLI/runtime builds ignore +`CODESTORY_EVAL_PROBES`. + +That means exact Requests, AutoMapper, Jekyll, and similar holdout probes are +not production steering in release builds. Keep that boundary. Express-style +route handoffs and URLSession request lifecycle claims are now production +semantics, but they are source-pattern-derived and pass the benchmark-overfit +lint instead of naming holdout repos or task ids. Exact ripgrep search-pipeline +wording stays in the holdout manifest and eval probes, while production search +semantics stay generic. + +### Delete + +No live production deletion target was confirmed in this pass. The concrete +bug found was not benchmark overfit; it was sidecar strictness. Packet batch +queries used to abort when a full-mode sidecar returned candidates from +docs/tests/non-symbol files that could not resolve to indexed graph symbols. +Those are now diagnostics and sufficiency gaps instead of command failures. + +## Current Packet Runtime Read + +### What Improved + +The five rows that previously failed before scoring now produce packet output +and pass manifest quality in the current full run: + +- `java-commons-lang-string-utils`: quality pass, packet partial. +- `javascript-express-routing-flow`: quality pass, packet partial. +- `c-redis-command-loop`: quality pass, packet partial. +- `swift-alamofire-request-flow`: quality pass, packet partial. +- `css-animate-base-and-keyframes`: quality pass, packet partial. + +This fixes the wrong failure mode. A full-mode sidecar candidate that cannot be +resolved to an indexed symbol is useful diagnostic evidence, not proof the +entire packet command is unavailable. It also shows that framework/domain +semantics can improve answer quality without leaking benchmark markers into +production code. + +### What Still Fails + +The remaining quality failures are mostly answer-semantics gaps, not missing +retrieval: + +- Python Requests, Jekyll, Monolog, AutoMapper, Okio, and MDN/HTML still fail + expected-claim recall in the current full run. Anchors are often present, but + the answer surface does not consistently state causal handoffs. +- Some partial rows are compact-budget artifacts. They retain enough citations + to be useful but still need follow-up commands before the packet can claim to + be self-contained. +- Java, Redis, and Okio still miss the packet retrieval SLA. + +## What This Proves + +- The benchmark harness can compare strict no-CodeStory and CodeStory-first + arms with wall time, token usage, command counts, direct source reads, web + leakage, packet quality, and post-packet behavior. +- CodeStory is clearly useful on the current packet-eligible slice. +- Parser-backed support exists for the languages listed in + `crates/codestory-contracts/src/language_support.rs`, and HTML/CSS/SQL are + explicitly structural-only. +- Sidecar unresolved-candidate handling no longer turns docs/tests/non-symbol + hits into packet command failures. +- Express-style route and URLSession request lifecycle claims can be generated + from source patterns in production builds without enabling eval-only probes. +- Runtime packet source claims are now named product profiles rather than + generic orchestration branches. +- The next frontier is framework/domain answer semantics, not simply adding + more file extensions. + +## What This Does Not Prove + +- It does not prove a broad 18-language A/B win. +- It does not prove every public language-support profile has equal semantic + resolution, graph depth, framework support, or packet sufficiency. +- It does not prove React, LINQ, Rails, Django, ASP.NET, or any other framework + is complete. Framework support requires explicit framework/domain semantics. +- It does not justify public savings claims or default promotion language. + +## Durable Boundaries + +- Public language support claims come from + `crates/codestory-contracts/src/language_support.rs`. +- Workspace filtering may keep compatibility-only extensions such as `svelte`, + `vue`, `astro`, `cshtml`, `scss`, `sass`, `less`, `ps1`, and `psm1`, but those + are not public parser-backed claims unless the registry says so. +- Benchmark-specific probes live outside production behavior. +- Ripgrep-shaped search-pipeline answer templates live outside production + behavior. +- Production framework/domain semantics should stay named as profiles or + collectors, not hidden as generic language steering. + +## Reproduction + +Validate the holdout manifest and corpus shape: + +```powershell +node scripts\codestory-language-holdout-integrity.mjs +``` + +Run harness self-checks: + +```powershell +node --test scripts\tests\codestory-agent-ab-analyzer.test.mjs +node scripts\codestory-agent-ab-benchmark.mjs --self-test +node --check scripts\codestory-agent-ab-score.mjs +node --check scripts\codestory-agent-ab-benchmark.mjs +``` + +Run a fresh packet-runtime diagnostic after runtime changes: + +```powershell +cargo build --release -p codestory-cli +node scripts\codestory-agent-ab-benchmark.mjs ` + --packet-runtime ` + --packet-runtime-mode cold-cli ` + --task-suite language-expansion-holdout ` + --repeats 1 ` + --repo-cache-dir target\oss-language-corpus\repos ` + --materialize-repos ` + --jobs 4 ` + --prepare-codestory-jobs 2 ` + --out-dir target\agent-benchmark\language-expansion-packet-runtime-current ` + --codestory-cli target\release\codestory-cli.exe ` + --timeout-ms 180000 ` + --allow-failures +``` + +Run the repaired five-row slice: + +```powershell +node scripts\codestory-agent-ab-benchmark.mjs ` + --packet-runtime ` + --packet-runtime-mode cold-cli ` + --task-suite language-expansion-holdout ` + --task-ids java-commons-lang-string-utils,javascript-express-routing-flow,c-redis-command-loop,swift-alamofire-request-flow,css-animate-base-and-keyframes ` + --repeats 1 ` + --repo-cache-dir target\oss-language-corpus\repos ` + --materialize-repos ` + --jobs 4 ` + --prepare-codestory-jobs 2 ` + --out-dir target\agent-benchmark\language-expansion-packet-runtime-sidecar-unresolved-fix ` + --codestory-cli target\release\codestory-cli.exe ` + --timeout-ms 180000 ` + --allow-failures +``` + +Run the focused claim-semantics slice: + +```powershell +node scripts\codestory-agent-ab-benchmark.mjs ` + --packet-runtime ` + --packet-runtime-mode cold-cli ` + --task-suite language-expansion-holdout ` + --task-ids javascript-express-routing-flow,swift-alamofire-request-flow ` + --repeats 1 ` + --repo-cache-dir target\oss-language-corpus\repos ` + --materialize-repos ` + --jobs 2 ` + --prepare-codestory-jobs 2 ` + --out-dir target\agent-benchmark\language-expansion-packet-runtime-claim-semantics-fix ` + --codestory-cli target\release\codestory-cli.exe ` + --timeout-ms 180000 ` + --allow-failures +``` + +Run eval-only exact benchmark diagnostics when debugging a row-specific probe: + +```powershell +# Only Rust tests and explicit benchmark/eval harnesses can enable this switch; +# release CLI/runtime builds ignore it. +$env:CODESTORY_EVAL_PROBES = "1" +cargo test -p codestory-runtime --test retrieval_generalization_guard -- --nocapture +Remove-Item Env:CODESTORY_EVAL_PROBES +``` + +Do not use eval-only rows as promotion evidence. + +## Promotion Blockers + +- Raise packet answer semantics so cited anchors become concrete handoff + claims, not generic "supports/inspect" bullets. +- Keep newly added framework/domain claims source-pattern-derived, linted, and + owned by named profiles or collectors. +- Keep sidecar strictness fail-closed for unavailable/degraded sidecar modes + while preserving unresolved full-mode candidate diagnostics. +- Convert the current `9/18` packet-partial rows into self-contained packets or + make their follow-up requirement more explicit in the product surface. +- Fix packet retrieval latency misses on Java, Redis, and Okio. +- Run a fresh full 18-language paired A/B suite only after packet sufficiency + and steering boundaries improve, then repeat before claiming promotion. diff --git a/docs/testing/oss-language-corpus.md b/docs/testing/oss-language-corpus.md new file mode 100644 index 00000000..40887278 --- /dev/null +++ b/docs/testing/oss-language-corpus.md @@ -0,0 +1,145 @@ +# OSS Language Corpus + +The OSS language corpus is an ignored, opt-in test suite for checking each +public language-support profile against a pinned medium-sized open source project. +It is intentionally outside the default test lane because it clones external +repositories and can take several minutes. + +The suite has two sides for each language: + +- `raw_without_codestory`: a plain `std::fs` crawl of the pinned checkout. This + code does not call CodeStory workspace discovery, indexing, runtime, or store + APIs. It counts files and LOC for the language's supported extensions. +- `with_codestory`: CodeStory indexes the exact raw file list into an in-memory + store. The suite compares stored/indexed file counts, node counts, edge + counts, errors, and timing stats against thresholds. + +This is a language-indexing corpus, not an agent answer-quality or agent-cost +benchmark. It does not measure tokens, tool calls, command counts, or elapsed +agent time. Use the `language-expansion-holdout` agent A/B suite when the +question is whether CodeStory improves an agent over raw source access. + +The paired A/B suite lives at +`benchmarks/tasks/language-expansion-holdout/language-support-ab.task.json` and +uses the same pinned projects. It compares `without_codestory` against +`with_codestory` and reports time, tokens, estimated cost, observed tool calls, +command counts, source reads, post-packet source reads, and manifest quality +scores. Its `summary.json` / `reanalyzed-summary.json` files include a +`cost_accounting` block that totals those costs per arm and compares +`with_codestory` against `without_codestory`. The no-CodeStory arm counts a +harness-run local `rg` plus bounded source-read prelude before the nested agent +starts. The CodeStory arm counts a harness-run packet prelude before the nested +agent starts. A baseline row cannot be promoted if it uses CodeStory or never +inspects the local repository. + +## Commands + +Validate the manifest without cloning: + +```powershell +$env:CODESTORY_OSS_CORPUS_DRY_RUN = "1" +cargo test -p codestory-indexer --test oss_language_corpus -- --ignored --nocapture +Remove-Item Env:CODESTORY_OSS_CORPUS_DRY_RUN +``` + +Run one or more languages: + +```powershell +$env:CODESTORY_RUN_OSS_LANGUAGE_CORPUS = "1" +$env:CODESTORY_OSS_CORPUS_LANGUAGES = "python,go" +cargo test -p codestory-indexer --test oss_language_corpus -- --ignored --nocapture +Remove-Item Env:CODESTORY_RUN_OSS_LANGUAGE_CORPUS +Remove-Item Env:CODESTORY_OSS_CORPUS_LANGUAGES +``` + +Run the full corpus: + +```powershell +$env:CODESTORY_RUN_OSS_LANGUAGE_CORPUS = "1" +cargo test -p codestory-indexer --test oss_language_corpus -- --ignored --nocapture +Remove-Item Env:CODESTORY_RUN_OSS_LANGUAGE_CORPUS +``` + +Run the paired agent A/B suite instead: + +```powershell +node scripts/codestory-agent-ab-benchmark.mjs ` + --task-suite language-expansion-holdout ` + --arms without_codestory,with_codestory ` + --repeats 3 --materialize-repos --prepare-codestory-cache ` + --out-dir target/agent-benchmark/language-expansion-holdout ` + --timeout-ms 600000 +``` + +By default, checkouts are cached in +`target/oss-language-corpus/repos`. To use another cache directory: + +```powershell +$env:CODESTORY_OSS_CORPUS_CACHE = "D:\codestory-oss-corpus" +``` + +The latest JSONL report is written to: + +```text +target/oss-language-corpus/reports/oss-language-corpus-latest.jsonl +``` + +## Latest Verification + +Last checked: 2026-06-12. + +The full ignored corpus was run against the materialized benchmark repo cache: + +```powershell +$env:CODESTORY_RUN_OSS_LANGUAGE_CORPUS = "1" +$env:CODESTORY_OSS_CORPUS_CACHE = "target\agent-benchmark\repos" +cargo test -p codestory-indexer --test oss_language_corpus -- --ignored --nocapture +``` + +Result: 18/18 public language-support profiles passed the indexing-only corpus. The run compared 4,308 raw files and +1,272,498 raw LOC against CodeStory indexing of the same file lists. CodeStory +indexed 4,308 files and produced 385,735 nodes and 312,268 edges with 0 errors +and 0 fatal errors. The latest per-language JSONL evidence is in +`target/oss-language-corpus/reports/oss-language-corpus-latest.jsonl`. + +The cheap integrity check used by the Autoresearch gate is: + +```powershell +node scripts\codestory-language-holdout-integrity.mjs +``` + +It validates the recorded artifact shape and provenance: all 18 +language-expansion repos are materialized at their manifest commits, and the +latest OSS corpus report has 18 passed rows with matching raw/indexed file +counts and zero errors. It is not a fresh indexing run unless the corpus test is +rerun with `CODESTORY_RUN_OSS_LANGUAGE_CORPUS=1`. + +## Manifest + +| Language | Project | Pinned commit | +| --- | --- | --- | +| Python | `psf/requests` | `6f66281a1d6326b1b9c4ac09ca30de0fc4e6ef43` | +| Java | `apache/commons-lang` | `57f39420fef8413ea42f045f1bdba4864ff75a0c` | +| Rust | `BurntSushi/ripgrep` | `82313cf95849bfe425109ad9506a52154879b1b1` | +| JavaScript | `expressjs/express` | `dae209ae6559c29cfca2a1f4414c51d89ea643d5` | +| TypeScript/TSX | `vercel/swr` | `f8d4995ac555f02a2784c8fc40bc819782c60568` | +| C++ | `fmtlib/fmt` | `e8deaf2ec3b53ced589fce6f640061e5b32eeeaa` | +| C | `redis/redis` | `df63a65d4d4ee33ae67e9f101885074febe0bccb` | +| Go | `gin-gonic/gin` | `d75fcd4c9ab260e5225de590f1f0f8c0e0e12d11` | +| Ruby | `jekyll/jekyll` | `202df571314ba1d18e9fccd81d12aaad4a703c38` | +| PHP | `Seldaek/monolog` | `04c3499db98d7471abd9261dc83232f8fe1a252d` | +| C# | `AutoMapper/AutoMapper` | `b57c206dc7291821e42bdf816a5637a5c1d8cb54` | +| Kotlin | `square/okio` | `722c8be0043d99b7b08d169b0ae90a24c15267ff` | +| Swift | `Alamofire/Alamofire` | `7595cbcf59809f9977c5f6378500de2ad73b7ddb` | +| Dart | `dart-lang/http` | `89cec60a4249ae0a0316f7a50d37ac56597f52c3` | +| Bash | `nvm-sh/nvm` | `7079a5d61c2b49c7d35a72006860ce5edb0fac51` | +| HTML | `mdn/learning-area` | `ca1ff0bd06e12b96a6742ffdf040bb22966e5a5e` | +| CSS | `animate-css/animate.css` | `3f8ab233dbbd9d2fe577528d2296382954be3d1a` | +| SQL | `lerocha/chinook-database` | `7f67772503d71ba90f19283c38e93923addb43fa` | + +## Maintenance Rule + +Every language returned by `language_support_profile_for_language_name` must +have exactly one corpus entry. The dry-run mode validates that the manifest and +the runtime support map stay aligned, so a future language addition must also +add a pinned OSS project before the manifest check passes. diff --git a/docs/testing/performance-review-playbook.md b/docs/testing/performance-review-playbook.md index a5fec2f1..13cda82e 100644 --- a/docs/testing/performance-review-playbook.md +++ b/docs/testing/performance-review-playbook.md @@ -54,7 +54,7 @@ command flags before and after. Prefer existing gates before adding a new harness: -```powershell +```sh cargo build --release -p codestory-cli cargo test -p codestory-cli --test codestory_repo_e2e_stats -- --ignored --nocapture cargo test -p codestory-cli --test search_json_output -- --ignored --nocapture search_quality_eval @@ -86,14 +86,11 @@ Before/after rows in that log require a serialized full ignored e2e run. If the branch cannot run it yet, leave the log unchanged and put this exact deferred verification plan in the PR or final notes: -```powershell +```sh cargo build --release -p codestory-cli cargo test -p codestory-cli --test codestory_repo_e2e_stats -- --ignored --nocapture ``` -For the CLI navigation branch baseline, see -[cli-navigation-next-wave-performance-review.md](cli-navigation-next-wave-performance-review.md). - ## Parallelization Candidate Gate Parallel or async work is allowed only after the baseline shows the exact path diff --git a/docs/testing/retrieval-architecture.md b/docs/testing/retrieval-architecture.md index a87868ef..5237e6e3 100644 --- a/docs/testing/retrieval-architecture.md +++ b/docs/testing/retrieval-architecture.md @@ -1,13 +1,14 @@ # Sidecar retrieval — architecture and promotion guide -Sidecar-primary packet retrieval (Zoekt lexical, Qdrant semantic, SCIP graph) orchestrated by +Sidecar-primary packet retrieval (Zoekt lexical, optional Qdrant dense anchors, SCIP graph) orchestrated by `codestory-retrieval` and integrated in `codestory-runtime`. Production packet paths use generic symbol/path roles; benchmark-only probe catalogs remain behind test-only eval harness hooks. Sidecar retrieval is mandatory for current evidence; `CODESTORY_RETRIEVAL=0` is treated as a configuration error, not a diagnostic route. -**Related:** [`../ops/retrieval-sidecars.md`](../ops/retrieval-sidecars.md) (operator runbook), -[`../architecture/retrieval-design.md`](../architecture/retrieval-design.md) (module contracts). +**Related:** [`../ops/retrieval-sidecars.md`](../ops/retrieval-sidecars.md) (setup, +env vars, CI smoke), [`../architecture/retrieval-design.md`](../architecture/retrieval-design.md) +(mode definitions, cost envelopes, promotion guards). --- @@ -17,99 +18,64 @@ configuration error, not a diagnostic route. |-------|----------|------| | Sidecar clients | `crates/codestory-retrieval/` (`zoekt_client`, `qdrant_client`, `scip_client`, `health`) | HTTP probes, staged search, timeouts | | Planner / executor / ranker | `codestory-retrieval` (`planner`, `executor`, `ranker`, `query_features`, `mode`) | Repo-agnostic staged plan, deadlines, degraded modes | -| Index manifest | `codestory-store` `retrieval_index_manifest` + `codestory-retrieval::index` | Version pins, sidecar input hash, generation id, and mandatory real sidecar artifact paths | +| Index manifest | `codestory-store` `retrieval_index_manifest` + `codestory-retrieval::index` | Version pins, sidecar input hash, generation id, symbol-doc count, dense-anchor count, semantic policy version, graph artifact hash, dense reason counts, and mandatory real sidecar artifact paths | | CLI lifecycle | `codestory-cli` `retrieval up\|down\|status\|index\|query` | Local data dirs, health JSON, standalone query | | Packet integration | `codestory-runtime/src/agent/retrieval_primary.rs` | Primary sidecar path, diagnostic traces, promotion warnings | | Nucleo policy | `codestory-runtime/src/agent/nucleo_policy.rs` | Suppresses Nucleo O(n) scan on sidecar primary; disabled sidecars are not valid product evidence | -| Generalization lint | `scripts/lint-retrieval-generalization.mjs` | Bans repo literals in Rust production retrieval trees (CI via Rust guard test); benchmark/eval harness scripts may name holdout repos only inside their manifest/eval boundary | +| Generalization lint | `scripts/lint-retrieval-generalization.mjs` | Bans repo literals in Rust production retrieval trees (CI via Rust guard test); benchmark/eval harness scripts and `codestory-runtime/src/agent/eval_probes.rs` may name holdout repos only inside their manifest/eval boundary | -**Modes:** `full`, `no_scip`, `no_semantic`, `lexical_only`, `unavailable` — only -`full` may serve primary packet/search results. All non-`full` modes fail closed. See -[`retrieval-design.md`](../architecture/retrieval-design.md#mandatory-sidecar-mode-matrix). +**Modes:** See the canonical +[mode matrix](../architecture/retrieval-design.md#mode-matrix). Only `full` may +serve primary packet/search results. **Benchmark manifests:** `benchmarks/tasks/local-real/` is the realistic local product corpus; `benchmarks/tasks/holdout-retrieval/` is the public generalization corpus. Holdout rows are promotion evidence only, not a tuning loop. -## Environment flags +## Proof tiers and claims -### Runtime variables +Do not describe a branch as generalized or useful for agents until the matching +proof tier has run cleanly on the current branch. Docs and PRs must state only +the highest tier actually reached: -`CODESTORY_RETRIEVAL_V2` and `CODESTORY_RETRIEVAL_V2_SHADOW` are no longer migration aliases. -If either legacy variable is present, packet retrieval fails closed instead of silently mapping it -to the sidecar-primary contract. +| Tier | Proof | Claim allowed | +|------|-------|---------------| +| 1. CodeStory self-e2e | Generalization lint, targeted runtime/indexer tests, release CLI build, `doctor`, and repo-scale e2e stats | CodeStory still works on itself and production code has no banned holdout literals | +| 2. Local-real drill suite | Tier 1 plus local-real packet/drill rows with no skip allowances | Product tuning survived realistic local repos | +| 3. Holdout-retrieval drill suite | Tier 2 plus holdout-retrieval materialized repos, no skip allowances, required recall/quality thresholds, and forbidden-claim checks | Retrieval behavior is generalized enough for the public holdout suite | +| 4. Promotion-grade paired benchmark | Tier 3 plus repeated paired CodeStory/no-CodeStory rows, quality gates, timing/cost accounting, and source-read avoidance checks | Promotion language about agent usefulness, speed, or savings | -| Variable | Default (production) | Purpose | -|----------|----------------------|---------| -| `CODESTORY_RETRIEVAL` | unset → sidecar primary when manifest + `full` mode (else fail closed) | `1` force sidecar primary attempt; `0` is unsupported and fails closed | -| `CODESTORY_RETRIEVAL_SHADOW` | unsupported for product benchmarks | Historical diagnostic switch; benchmark contract rejects it | -| `CODESTORY_ZOEKT_ENABLED` | on | `0` is unsupported for product retrieval | -| `CODESTORY_QDRANT_ENABLED` | on | `0` is unsupported for product retrieval | -| `CODESTORY_RETRIEVAL_REAL_EMBEDDINGS` | `1` | `0` is unsupported for product retrieval | -| `CODESTORY_RETRIEVAL_COMPOSE_PROFILE` | `real` | every other profile is unsupported for product bootstrap | -| `CODESTORY_EMBED_BACKEND` | `llamacpp` | product manifests require llama.cpp bge-base embeddings | -| `CODESTORY_EMBED_LLAMACPP_URL` | `http://127.0.0.1:8080/v1/embeddings` | local embedding sidecar endpoint | -| `CODESTORY_ZOEKT_PORT` | `6070` | Zoekt HTTP | -| `CODESTORY_QDRANT_HTTP_PORT` | `6333` | Qdrant HTTP | -| `CODESTORY_QDRANT_GRPC_PORT` | `6334` | Qdrant gRPC | +`packet` status is evidence sufficiency, not final answer quality. Only +`drill`/`drill-suite` rows with ledger classifications can promote answer +quality. Packet-first runs count as agent-useful only when packets marked +`sufficient` avoid post-packet source reads, or when those reads are explicitly +classified as source-truth follow-up rather than hidden grounding. -### Benchmark-only flags +## Environment and setup -Use these when running promotion harnesses. Do not enable in normal production packet runs. +Version pins, env vars, bootstrap commands, troubleshooting, and CI smoke +sequences are owned by +[`retrieval-sidecars.md`](../ops/retrieval-sidecars.md). AST-first policy gates +and dense-anchor promotion fields are summarized there and in +[`retrieval-design.md`](../architecture/retrieval-design.md#ast-first-semantic-contract). -| Variable | Default | Purpose | -|----------|---------|---------| -| `CODESTORY_EVAL_PROBES` | ignored in production runtime | Benchmark-shaped probe catalog (`eval_probes.rs`) is test-only; promotion bundles do not inject it. | - -**Sidecar promotion candidate (typical):** - -```powershell -Remove-Item Env:CODESTORY_RETRIEVAL -ErrorAction SilentlyContinue -Remove-Item Env:CODESTORY_EVAL_PROBES -ErrorAction SilentlyContinue -.\target\release\codestory-cli.exe retrieval up -.\target\release\codestory-cli.exe retrieval index --project . --refresh auto -``` +Benchmark-only flag: `CODESTORY_EVAL_PROBES` is ignored in production runtime +and must stay test-only. --- -## Local workflows - -### One-command environment setup - -From the CodeStory repository root: - -```sh -cargo retrieval-setup -cargo retrieval-status -``` - -Optional Node wrapper (prerequisite report, optional holdout clone): -`node scripts/setup-retrieval-env.mjs`. -See [`../ops/retrieval-sidecars.md`](../ops/retrieval-sidecars.md#quick-start-one-command). - -### Sidecars and index - -```sh -cargo retrieval-setup -cargo run -p codestory-cli -- retrieval index --project --refresh auto -cargo run -p codestory-cli -- retrieval query "main" --project -``` - -`retrieval bootstrap` (alias `cargo retrieval-setup`) starts Docker Compose when Docker is installed. -`retrieval up` alone only prepares cache dirs and state (see runbook). - -### local-real packet suite (in-scope tuning) +## Local test workflows Repos: `codex`, `rootandruntime`, `sourcetrail`, `vscode` — manifests under `benchmarks/tasks/local-real/`. -```powershell -node scripts/codestory-agent-ab-benchmark.mjs ` - --packet-runtime --packet-runtime-mode cold-cli ` - --task-suite local-real --repeats 1 ` - --out-dir target/agent-benchmark/packet-runtime-sidecar-promotion ` - --codestory-cli target/release/codestory-cli.exe ` +```sh +node scripts/codestory-agent-ab-benchmark.mjs \ + --packet-runtime --packet-runtime-mode cold-cli \ + --task-suite local-real --repeats 1 \ + --out-dir target/agent-benchmark/packet-runtime-sidecar-promotion \ + --codestory-cli target/release/codestory-cli \ --timeout-ms 300000 ``` @@ -119,27 +85,31 @@ before promotion language. ### holdout-retrieval (generalization) -```powershell +```sh node scripts/fetch-holdout-repos.mjs # or: -node scripts/codestory-agent-ab-benchmark.mjs ` +node scripts/codestory-agent-ab-benchmark.mjs \ --list --task-suite holdout-retrieval --materialize-repos -node scripts/codestory-agent-ab-benchmark.mjs ` - --packet-runtime --packet-runtime-mode cold-cli ` - --task-suite holdout-retrieval --materialize-repos ` - --repeats 1 ` - --out-dir target/agent-benchmark/holdout-retrieval-smoke ` - --codestory-cli target/release/codestory-cli.exe ` +node scripts/codestory-agent-ab-benchmark.mjs \ + --packet-runtime --packet-runtime-mode cold-cli \ + --task-suite holdout-retrieval --materialize-repos \ + --repeats 1 \ + --out-dir target/agent-benchmark/holdout-retrieval-smoke \ + --codestory-cli target/release/codestory-cli \ --timeout-ms 180000 ``` Holdout failures should block promotion or trigger diagnosis; do not add repo-name/path literals or tune planner/ranker heuristics against holdout rows. +The generalization lint currently fails production Rust on holdout names and +anchors such as repository names, specific source paths, and manifest-specific +symbols. Keep those strings in manifests, tests, benchmark harnesses, or the +test-only eval probe module. ## Fast CI-style checks (automated in Phase 6) -```powershell +```sh cargo test -p codestory-runtime --test retrieval_generalization_guard node --test scripts/tests/codestory-agent-ab-analyzer.test.mjs cargo test -p codestory-cli --test onboarding_contracts @@ -147,7 +117,7 @@ cargo test -p codestory-cli --test onboarding_contracts Optional broader lane: -```powershell +```sh cargo test -p codestory-retrieval cargo test -p codestory-runtime node --test scripts/tests/codestory-agent-ab-analyzer.test.mjs @@ -177,22 +147,28 @@ tests in the branch. Do not infer support for languages without direct benchmark | Warning config | done | `docs/architecture/retrieval-rollback.json` | | Markdown link contract (`onboarding_contracts`) | verify | `cargo test -p codestory-cli --test onboarding_contracts` | | local-real cold packet + north-star SLOs | **human** | p99 retrieval, quality 3/4, wall targets | -| holdout-retrieval 2/3 pass | **human** | Requires materialized OSS repos + index | +| holdout-retrieval pass without skip allowances | **human** | Requires materialized OSS repos + index; no generalized claim without required recall/quality/forbidden-claim thresholds | | `agent_value_gap` < 0.20 | **human** | Measure from a fresh coherent bundle | -| Windows `retrieval-sidecar-smoke` CI job | fail-closed sidecar smoke | [`retrieval-sidecar-smoke-ci.md`](../contributors/retrieval-sidecar-smoke-ci.md) | +| Windows `retrieval-sidecar-smoke` CI job | fail-closed sidecar smoke | [`retrieval-sidecars.md`](../ops/retrieval-sidecars.md#preflight-smoke-contract) | | Ragas/Phoenix nightly eval | optional | Not configured | ### North-star SLOs (targets — measure before claiming pass) | Metric | Target | |--------|--------| +| Cold CodeStory product index | under 180 s | +| Cold semantic embedding time | at least 70% lower than same-run baseline | +| Dense embedded docs | at least 65% lower than same-run baseline | +| Repeat full refresh | 0 unchanged dense docs embedded and under 25 s | +| Holdout MRR@10 | no more than 1 percentage-point drop versus same-run baseline | +| Hit@10 / exact-symbol Hit@1 | no regression | | Retrieval p50 | ≤ 250 ms | | Retrieval p90 | ≤ 600 ms | | Retrieval p99 | ≤ 1,000 ms | | Worst-case packet wall | ≤ 1,500 ms | | local-real quality pass | ≥ 3/4 repos | | `agent_value_gap` | < 0.20 | -| holdout generalization | 2/3 of `ripgrep`, `axios`, `redis` | +| holdout generalization | Required manifest thresholds across the full holdout-retrieval suite | | Sidecar planner/ranker repo literals | 0 (lint clean) | --- @@ -207,7 +183,7 @@ After promotion runs, verify rollback warnings: **One-shot operator drill (after each promotion run):** -```powershell +```sh cargo test -p codestory-runtime retrieval_rollback::tests::rollback_drill_warns_without_setting_legacy_env -- --nocapture ``` diff --git a/docs/usage.md b/docs/usage.md index 3581d74e..3bd8702b 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -3,29 +3,16 @@ This is the operator guide. It keeps setup, common workflows, retrieval defaults, and recovery notes in one place. +Examples use POSIX shell syntax unless a block is labeled PowerShell. On +Windows, use `.\target\release\codestory-cli.exe` for the release binary, +`$env:NAME = "value"` for environment variables, and Windows paths when that is +the workspace you are indexing. + ## Install The Skill Install the grounding skill once, then point it at explicit target workspaces. - -```powershell -$SkillHome = "" -New-Item -ItemType Directory -Force -Path $SkillHome | Out-Null -Copy-Item -Recurse -Force .\.agents\skills\codestory-grounding "$SkillHome\codestory-grounding" -& "$SkillHome\codestory-grounding\scripts\setup.ps1" -``` - -On Unix-like systems: - -```sh -bash "/codestory-grounding/scripts/setup.sh" -``` - -The setup script prints the resolved `CODESTORY_CLI` path. Persist it if your -agent environment does not already preserve the variable between sessions. - -```powershell -setx CODESTORY_CLI "C:\Users\you\AppData\Local\CodeStory\bin\codestory-cli.exe" -``` +See [README — Install As An Agent Skill](../README.md#install-as-an-agent-skill) +for the full copy/setup commands and Windows PowerShell variant. The source skill package lives at [../.agents/skills/codestory-grounding/SKILL.md](../.agents/skills/codestory-grounding/SKILL.md). @@ -38,19 +25,19 @@ setup fetches and builds the remote default branch. Use this path when you are changing CodeStory itself or testing the current checkout. -```powershell +```sh cargo build --release -p codestory-cli -$CodeStoryCli = ".\target\release\codestory-cli.exe" -& $CodeStoryCli --help +CODESTORY_CLI="./target/release/codestory-cli" +"$CODESTORY_CLI" --help ``` Pick a target workspace explicitly: -```powershell -$TargetWorkspace = "C:\path\to\repo" -& $CodeStoryCli doctor --project $TargetWorkspace -& $CodeStoryCli index --project $TargetWorkspace --refresh auto -& $CodeStoryCli ground --project $TargetWorkspace --why +```sh +TARGET_WORKSPACE="/path/to/repo" +"$CODESTORY_CLI" doctor --project "$TARGET_WORKSPACE" +"$CODESTORY_CLI" index --project "$TARGET_WORKSPACE" --refresh auto +"$CODESTORY_CLI" ground --project "$TARGET_WORKSPACE" --why ``` ## Readiness Tracks @@ -58,7 +45,7 @@ $TargetWorkspace = "C:\path\to\repo" CodeStory has two readiness tracks. Keep them separate when deciding whether an agent can rely on packet/search output. -### Local navigation readiness +### Local navigation/cache readiness This lane is for local browsing and source navigation. It uses the project SQLite cache built by `index` and read by commands such as `ground`, `symbol`, @@ -68,7 +55,7 @@ SQLite cache built by `index` and read by commands such as `ground`, `symbol`, means the local cache, graph, lexical index, and DB-backed navigation commands are usable. It does not prove agent packet/search readiness. -### Agent packet/search readiness +### Agent packet/search sidecar readiness This lane is for agent-facing `packet` and `search` evidence. It requires the sidecar retrieval stack to be built and healthy: Zoekt lexical shards, Qdrant @@ -85,7 +72,7 @@ described as agent packet/search readiness. ### I need a repo overview -```powershell +```sh codestory-cli doctor --project codestory-cli index --project --refresh full codestory-cli ground --project --why @@ -105,18 +92,20 @@ files as outputs to regenerate, not source-of-truth state. ### I need evidence for a broad question -```powershell +```sh codestory-cli packet --project --question "" --budget compact ``` Use `packet` for questions like "how does routing work?" or "what owns indexing -state?" It returns citations, gaps, and follow-up commands. If the packet says -the evidence is incomplete, follow the named commands instead of opening -unstructured source files directly. +state?" It returns a `sufficient`, `partial`, or `blocked` status with +citations, trust limits, gaps, and follow-up commands. If the packet is +`partial` or `blocked`, follow the named source-truth commands instead of +opening unstructured source files directly. Treat `sufficient` as evidence +coverage, not final answer-quality proof. ### I need to understand one symbol or file -```powershell +```sh codestory-cli search --project --query "" --why codestory-cli explore --project --id --no-tui codestory-cli trail --project --id --story --hide-speculative @@ -126,7 +115,7 @@ codestory-cli snippet --project --id --context 40 Start with `search`, pick a concrete `node-id`, then inspect the relationships and source. Use `context` when you want a bundled handoff around that target: -```powershell +```sh codestory-cli context --project --id --bundle out/context-name ``` @@ -136,7 +125,7 @@ target-first; it is not an open chat endpoint and is not a replacement for broad ### I changed files and need likely impact -```powershell +```sh codestory-cli index --project --refresh incremental codestory-cli affected --project --format markdown git diff --name-only HEAD | codestory-cli affected --project --stdin --format json @@ -147,25 +136,42 @@ Treat `affected` as test-selection evidence, not a replacement for tests. The default command preserves git name-status records; path-only stdin remains available when another tool already chose the file list. -### The cache or retrieval looks stale +### The cache or local navigation looks stale -```powershell +```sh codestory-cli doctor --project codestory-cli index --project --refresh full codestory-cli doctor --project ``` -If `doctor` reports stale inventory, semantic contract mismatch, missing managed -assets, or a non-`full` retrieval mode, fix that layer before investigating -answer quality. Treat the health report as the first source of truth for cache -and retrieval state. +If `doctor` reports stale inventory, dense-anchor contract mismatch, missing +managed assets, or a non-`full` retrieval mode, fix that layer before +investigating answer quality. Treat the health report as the first source of +truth for cache and retrieval state. + +For agent-facing packet/search recovery, use the full sidecar repair sequence +that `ready --goal agent` reports: + +```sh +codestory-cli retrieval bootstrap --project --format json +codestory-cli retrieval index --project --refresh full --format json +codestory-cli retrieval status --project --format json +codestory-cli doctor --project --format markdown +``` + +When the core index is missing, stale, unchecked, or has recorded fatal indexing +errors, `ready` reports the necessary `codestory-cli index` repair first. +Otherwise, sidecar recovery does not need to repeat a full core reindex. +`retrieval bootstrap` prepares or checks the local sidecar services. The target +workspace is not packet/search-ready until `retrieval index` writes a current +target manifest and `doctor` or `retrieval status` reports `retrieval_mode=full`. ## Core Commands - `doctor`: read-only health check for project, cache, index, retrieval, and environment readiness. -- `index`: build or refresh the SQLite graph, snapshots, search state, and - semantic docs. +- `index`: build or refresh the SQLite graph, snapshots, search state, + graph-native symbol docs, component reports, and selected dense anchors. - `ground`: broad repo-level orientation snapshot; `--why` explains retrieval mode, coverage, gaps, and next commands. - `report`: derived Markdown repo report or JSON graph export from the current @@ -224,32 +230,13 @@ reset, schema change, or suspected stale-state incident. ## Predictable Output Modes -Most commands default to Markdown because the normal operator path is human -review. Use `--format markdown` when the output will be read directly in a -terminal, pasted into a report, or inspected during recovery. +Most commands default to Markdown for human review. Use `--format json` when automation needs the complete structured result, including exact field comparisons such as `retrieval_mode` or cache paths. Use `--output-file ` when the artifact should live outside terminal logs. The parent directory must already exist. -Agent-facing Markdown starts with an operator header when the command has enough -status evidence to do so: `Status`, `Trust`, `Next Action`, and `Proof Tier` -come before dense citations, diagnostics, or graph details. This is the default -shape for `doctor`, `ground --why`, `search --why`, `packet`, and `context`. +`explore` opens the terminal UI by default when a TUI is available. Use `--no-tui`, `--plain`, or `CODESTORY_NO_TUI=1` for predictable command output in agent runs, tests, non-interactive terminals, and CI logs. -`search --why` keeps provenance compact by default. Use -`search --why --plan-details` only when you need the full broad-query search -plan, including subqueries, candidate windows, bridge evidence, rejected -candidates, and source-truth checks. - -Use `--format json` when automation needs the complete structured result, -including fields that Markdown may summarize. JSON is the safer choice for -tests, scripts, status gates, and any workflow that must compare exact values -such as `retrieval_mode`, cache paths, or timing fields. - -Use `--output-file ` when a command produces an artifact that should be -kept separate from terminal logs. The parent directory must already exist. -Treat the file as the durable result and stdout/stderr as command status. - -`explore` opens the terminal UI by default when a TUI is available. Use `--no-tui` -for predictable command output in agent runs, tests, non-interactive terminals, -and CI logs. +Agent-facing Markdown may start with `Status`, `Trust`, `Next Action`, and +`Proof Tier` before dense citations. Use `search --why --plan-details` only when +you need the full broad-query search plan. ## Retrieval Defaults @@ -261,7 +248,7 @@ older local search path. Basic local index: -```powershell +```sh codestory-cli doctor --project codestory-cli index --project --refresh full codestory-cli ground --project --why @@ -272,11 +259,11 @@ write the retrieval manifest, or prove agent packet/search readiness. Product sidecar setup for agent-facing packet/search: -```powershell +```sh node scripts/setup-retrieval-env.mjs --fetch-embed-model -$env:CODESTORY_EMBED_MODEL_DIR = (Resolve-Path .\target\retrieval-models).Path -$env:CODESTORY_EMBED_BACKEND = "llamacpp" -$env:CODESTORY_EMBED_LLAMACPP_URL = "http://127.0.0.1:8080/v1/embeddings" +export CODESTORY_EMBED_MODEL_DIR="$(pwd)/target/retrieval-models" +export CODESTORY_EMBED_BACKEND="llamacpp" +export CODESTORY_EMBED_LLAMACPP_URL="http://127.0.0.1:8080/v1/embeddings" cargo retrieval-setup codestory-cli index --project --refresh full @@ -285,6 +272,13 @@ codestory-cli retrieval status --project --format json codestory-cli doctor --project ``` +`setup-retrieval-env.mjs --fetch-embed-model` downloads the configured GGUF to a +temporary path and verifies the pinned artifact before renaming it into +`CODESTORY_EMBED_MODEL_DIR`. The accepted artifact is exactly `117974304` bytes +with SHA-256 +`ad1afe72cd6654a558667a3db10878b049a75bfd72912e1dabb91310d671173c`; all +configured mirrors must pass the same check. + Run `codestory-cli retrieval index` only after the local sidecar services, llama.cpp embedding endpoint, and `bge-base-en-v1.5` model configuration are ready, then require `retrieval status --format json` to report @@ -295,7 +289,7 @@ so backend drift is visible. Legacy managed embedding setup is local semantic/diagnostic only: -```powershell +```sh codestory-cli setup embeddings --project --dry-run --format json codestory-cli setup embeddings --project ``` @@ -384,7 +378,7 @@ Other values currently resolve to the durable default. Typical recovery flow: -```powershell +```sh codestory-cli doctor --project codestory-cli index --project --refresh full codestory-cli search --project --query WorkspaceIndexer @@ -394,20 +388,20 @@ If the cache directory itself is suspect, get the exact project cache path from `doctor`, verify that it is under the CodeStory cache root, move it aside first, then rebuild. Remove the backup only after the fresh index is healthy: -```powershell -$cacheDir = "" -$cacheRoot = Join-Path $env:LOCALAPPDATA "CodeStory" -$resolvedCache = (Resolve-Path -LiteralPath $cacheDir).Path -$resolvedRoot = (Resolve-Path -LiteralPath $cacheRoot).Path -$relative = [System.IO.Path]::GetRelativePath($resolvedRoot, $resolvedCache) -if ($relative.StartsWith("..") -or [System.IO.Path]::IsPathRooted($relative)) { - throw "Refusing to touch cache outside CodeStory cache root: $resolvedCache" -} -$backup = "$resolvedCache.bak-$(Get-Date -Format yyyyMMddHHmmss)" -Rename-Item -LiteralPath $resolvedCache -NewName (Split-Path -Leaf $backup) +```sh +cache_dir="" +cache_root="${XDG_CACHE_HOME:-$HOME/.cache}/codestory" +resolved_cache="$(realpath "$cache_dir")" +resolved_root="$(realpath "$cache_root")" +case "$resolved_cache" in + "$resolved_root"/*) ;; + *) echo "Refusing to touch cache outside CodeStory cache root: $resolved_cache" >&2; exit 1 ;; +esac +backup="${resolved_cache}.bak-$(date +%Y%m%d%H%M%S)" +mv "$resolved_cache" "$backup" codestory-cli index --project --refresh full codestory-cli doctor --project -Remove-Item -LiteralPath $backup -Recurse -Force +rm -rf "$backup" ``` Low-memory guidance: @@ -423,7 +417,7 @@ Low-memory guidance: Run Cargo commands serially in this repo: -```powershell +```sh cargo fmt --check cargo check cargo test @@ -432,13 +426,13 @@ cargo clippy --all-targets -- -D warnings Focused docs/onboarding lane: -```powershell +```sh cargo test -p codestory-cli --test onboarding_contracts ``` Release-blocking fidelity lanes: -```powershell +```sh cargo test -p codestory-indexer --test fidelity_regression cargo test -p codestory-indexer --test tictactoe_language_coverage cargo test -p codestory-runtime --test retrieval_eval @@ -450,7 +444,7 @@ semantic quality assertions. Heavy repo-scale timing lane: -```powershell +```sh cargo build --release -p codestory-cli cargo test -p codestory-cli --test codestory_repo_e2e_stats -- --ignored --nocapture ``` @@ -467,5 +461,5 @@ changes. - [architecture/runtime-execution-path.md](architecture/runtime-execution-path.md) - [contributors/debugging.md](contributors/debugging.md) - [contributors/testing-matrix.md](contributors/testing-matrix.md) -- [testing/benchmark-results.md](testing/benchmark-results.md) +- [testing/benchmark-ledger.md](testing/benchmark-ledger.md) - [testing/codestory-stdio-warm-loop-stats.md](testing/codestory-stdio-warm-loop-stats.md) diff --git a/scripts/codestory-agent-ab-benchmark.mjs b/scripts/codestory-agent-ab-benchmark.mjs index cb6fecd4..88f9878b 100644 --- a/scripts/codestory-agent-ab-benchmark.mjs +++ b/scripts/codestory-agent-ab-benchmark.mjs @@ -2,7 +2,7 @@ import assert from "node:assert/strict"; import { spawn } from "node:child_process"; import { existsSync, statSync } from "node:fs"; -import { mkdir, readdir, readFile, writeFile } from "node:fs/promises"; +import { copyFile, mkdir, readdir, readFile, writeFile } from "node:fs/promises"; import os from "node:os"; import path from "node:path"; import { performance } from "node:perf_hooks"; @@ -27,6 +27,10 @@ const defaultTaskRoot = path.join(repoRoot, "benchmarks", "tasks"); const defaultRepoCacheRoot = path.join(repoRoot, "target", "agent-benchmark", "repos"); const MANIFEST_REPO_NAME_PATTERN = /^[A-Za-z0-9_.-]+$/; const MANIFEST_TASK_ID_PATTERN = /^[a-z0-9][a-z0-9.-]*$/; +const MAX_PACKET_MANIFEST_EXTRA_PROBES = 12; +const MAX_REUSED_ARTIFACT_BYTES = 64 * 1024 * 1024; +const REUSABLE_BASELINE_ARTIFACT_NAME_PATTERN = + /(?:\.stdout\.jsonl|\.stderr\.txt|\.baseline-context\.json|\.baseline-context\.stderr\.txt)$/; const PACKET_TASK_CLASSES = new Set([ "architecture_explanation", "bug_localization", @@ -36,6 +40,22 @@ const PACKET_TASK_CLASSES = new Set([ "data_flow", "edit_planning", ]); +const COMMAND_ACCOUNTING_CATEGORIES = [ + "codestory_cli", + "shell_search", + "direct_file_read", + "git", + "build_test", + "other", +]; +const TOOL_ACCOUNTING_CATEGORIES = [ + "web_search", + "mcp_tool_call", + "command_execution", + "function_call", + "tool_call", + "other", +]; const PUBLIC_REPOS = { codestory: { @@ -85,9 +105,9 @@ const ALL_REPOS = { ...PUBLIC_REPOS, ...LOCAL_REPOS }; const ARMS = { without_codestory: - "Do not use CodeStory, codestory-cli, or codestory-grounding. Use normal repository exploration only.", + "Do not use CodeStory, codestory-cli, or codestory-grounding. Use normal local repository exploration only. Do not use web search, browser tools, remote URLs, or upstream mirrors.", with_codestory: - "Use CodeStory grounding first. If CODESTORY_CLI is set, use that executable; otherwise use codestory-cli on PATH. For broad repository questions, run packet first and read its sufficiency contract before ordinary source reads. Read follow-up commands from sufficiency.follow_up_commands, not a top-level field. If sufficiency.status is partial, run only the listed follow_up_commands in order and prefer targeted `search --why` commands before escalating packet budget. If a later packet becomes sufficient, stop exploration and answer. If packet status is sufficient and sufficiency.follow_up_commands is empty, answer from the packet; do not verify citations with ordinary source reads, rg, grep, or git show. Budget truncation alone is not a gap. Preserve the packet's supported-claim wording in your final answer. Include a compact 'Support files' list containing every relevant path from the packet's answer.citations and sufficiency.avoid_opening, not only the paths mentioned in your prose. Use search, context, trail, or snippet only for named gaps. The prepared full sidecar cache is mandatory; if CodeStory or its sidecars are unavailable, fail the run instead of continuing with ordinary exploration.", + "Use CodeStory grounding first. If CODESTORY_CLI is set, use that executable; otherwise use codestory-cli on PATH. For broad repository questions, run packet first and read its sufficiency contract before ordinary source reads. Read follow-up commands from sufficiency.follow_up_commands, not a top-level field. If sufficiency.status is partial, run the listed follow_up_commands in order and prefer targeted CodeStory `search --why`, `context`, `trail`, or `snippet` commands for named gaps. If the packet and CodeStory follow-ups still do not support a correct answer, use ordinary local source reads only after those CodeStory attempts; those reads are valid but counted as post-packet overhead. If a later packet becomes sufficient, stop exploration and answer. If packet status is sufficient and sufficiency.follow_up_commands is empty, answer from the packet; do not verify citations with ordinary source reads, rg, grep, or git show. Budget truncation alone is not a gap. Preserve the packet's supported-claim wording in your final answer when it is correct, and correct it from local source when the packet is incomplete. Include a compact 'Support files' list containing every relevant path from the packet's answer.citations, sufficiency.avoid_opening_paths, and any post-packet local source reads. The prepared full sidecar cache is mandatory; if CodeStory or its sidecars are unavailable, fail the run instead of continuing with ordinary exploration. Do not use web search, browser tools, remote URLs, or upstream mirrors.", }; function usage() { @@ -127,15 +147,24 @@ Options: --sandbox Codex sandbox mode. Default: workspace-write. --out-dir Output directory. Default: target/agent-benchmark/. --timeout-ms Timeout per runner invocation. Default: 600000. + --jobs Parallel jobs for independent packet-runtime cold-cli rows or independent agent repo groups. Default: 1. + --reuse-baseline-from + Reuse matching without-CodeStory rows from an earlier run directory when the task snapshot is unchanged. --prepare-codestory-cache Before timed with-CodeStory runs, refresh stale or semantic-empty local caches and record indexing cost separately. Packet-runtime mode enables this by default because sidecar-primary packets require prepared local indexes. --no-prepare-codestory-cache Unsupported; sidecar preparation is mandatory. + --prepare-codestory-jobs + Parallel jobs for CodeStory cache preparation across independent repos. Default: 1. --prepare-codestory-timeout-ms Timeout for each pre-run CodeStory index refresh. Default: 1800000. --max-source-reads-after-packet - Publishable with-CodeStory runs fail above this post-packet ordinary source-read count. Default: 0. + Publishable with-CodeStory runs fail above this post-packet ordinary source-read count. + Required with --publishable; pass 0 for packet-only promotion evidence. + --diagnostic-extra-probes-from-manifest + Inject expected file/symbol anchors as packet --extra-probe values. + Diagnostic only; cannot be combined with --publishable. --allow-failures Exit 0 even when a run fails. Intended only for exploratory dry runs. --publishable Fail unless every run succeeds and reports token usage. @@ -174,10 +203,14 @@ function parseArgs(argv) { sandbox: "workspace-write", outDir: null, timeoutMs: 600000, + jobs: 1, + reuseBaselineFrom: null, prepareCodestoryCache: null, + prepareCodestoryJobs: 1, prepareCodestoryTimeoutMs: 1_800_000, cachePreparationByRepo: null, - maxSourceReadsAfterPacket: 0, + maxSourceReadsAfterPacket: null, + diagnosticExtraProbesFromManifest: false, allowFailures: false, publishable: false, }; @@ -212,6 +245,10 @@ function parseArgs(argv) { opts.allowFailures = true; continue; } + if (arg === "--diagnostic-extra-probes-from-manifest") { + opts.diagnosticExtraProbesFromManifest = true; + continue; + } if (arg === "--include-local-repos") { opts.includeLocalRepos = true; continue; @@ -284,6 +321,14 @@ function parseArgs(argv) { opts.timeoutMs = Number.parseInt(argv[++i], 10); continue; } + if (arg === "--jobs") { + opts.jobs = Number.parseInt(argv[++i], 10); + continue; + } + if (arg === "--reuse-baseline-from") { + opts.reuseBaselineFrom = argv[++i]; + continue; + } if (arg === "--prepare-codestory-cache") { opts.prepareCodestoryCache = true; continue; @@ -296,6 +341,10 @@ function parseArgs(argv) { opts.prepareCodestoryTimeoutMs = Number.parseInt(argv[++i], 10); continue; } + if (arg === "--prepare-codestory-jobs") { + opts.prepareCodestoryJobs = Number.parseInt(argv[++i], 10); + continue; + } if (arg === "--max-source-reads-after-packet") { opts.maxSourceReadsAfterPacket = Number.parseInt(argv[++i], 10); continue; @@ -336,9 +385,15 @@ function parseArgs(argv) { if (!Number.isInteger(opts.timeoutMs) || opts.timeoutMs < 1000) { throw new Error("--timeout-ms must be an integer >= 1000"); } + if (!Number.isInteger(opts.jobs) || opts.jobs < 1) { + throw new Error("--jobs must be a positive integer"); + } if (!Number.isInteger(opts.prepareCodestoryTimeoutMs) || opts.prepareCodestoryTimeoutMs < 1000) { throw new Error("--prepare-codestory-timeout-ms must be an integer >= 1000"); } + if (!Number.isInteger(opts.prepareCodestoryJobs) || opts.prepareCodestoryJobs < 1) { + throw new Error("--prepare-codestory-jobs must be a positive integer"); + } if (!["read-only", "workspace-write", "danger-full-access"].includes(opts.sandbox)) { throw new Error("--sandbox must be one of: read-only, workspace-write, danger-full-access"); } @@ -348,10 +403,19 @@ function parseArgs(argv) { if (opts.benchmarkRunId != null) { opts.benchmarkRunId = sanitizeBenchmarkRunId(opts.benchmarkRunId); } - if (!Number.isInteger(opts.maxSourceReadsAfterPacket) || opts.maxSourceReadsAfterPacket < 0) { + if ( + opts.maxSourceReadsAfterPacket != null && + (!Number.isInteger(opts.maxSourceReadsAfterPacket) || opts.maxSourceReadsAfterPacket < 0) + ) { throw new Error("--max-source-reads-after-packet must be a non-negative integer"); } + if (opts.publishable && opts.diagnosticExtraProbesFromManifest) { + throw new Error("--diagnostic-extra-probes-from-manifest is diagnostic-only and cannot be combined with --publishable"); + } opts.repoCacheDir = path.resolve(opts.repoCacheDir ?? defaultRepoCacheRoot); + if (opts.reuseBaselineFrom) { + opts.reuseBaselineFrom = path.resolve(opts.reuseBaselineFrom); + } if (opts.repos) { for (const name of opts.repos) { if (!ALL_REPOS[name]) { @@ -576,6 +640,67 @@ function textAnchorList(values) { .filter(Boolean); } +function packetManifestSymbolProbe(value) { + if (value == null) { + return null; + } + if (typeof value === "string") { + return value; + } + if (typeof value === "object") { + const name = String(value.name ?? value.text ?? "").trim(); + const symbolPath = String(value.path ?? value.file ?? value.file_path ?? "").trim(); + if (name && symbolPath) { + return `${symbolPath} ${name}`; + } + return name || symbolPath || null; + } + return String(value); +} + +function packetManifestSymbolProbeList(values) { + return (Array.isArray(values) ? values : []) + .map(packetManifestSymbolProbe) + .map((value) => String(value ?? "").trim()) + .filter(Boolean); +} + +function uniqueTextValues(values) { + const result = []; + const seen = new Set(); + for (const value of values) { + const text = String(value ?? "").trim(); + if (!text) { + continue; + } + const key = text.toLowerCase(); + if (seen.has(key)) { + continue; + } + seen.add(key); + result.push(text); + } + return result; +} + +function packetManifestExtraProbes(task) { + if (!task) { + return []; + } + return uniqueTextValues([ + ...(task.expected_files ?? []), + ...(task.expected_symbol_probes ?? task.expected_symbols ?? []), + ]).slice(0, MAX_PACKET_MANIFEST_EXTRA_PROBES); +} + +function packetCommandExtraProbes(task, opts = {}) { + return opts.diagnosticExtraProbesFromManifest ? packetManifestExtraProbes(task) : []; +} + +function packetExtraProbeStrategy(extraProbes) { + return extraProbes.length ? "diagnostic_manifest_expected_anchors" : null; +} + function normalizeManifestTask(filePath, raw, opts = {}) { const rawRepo = typeof raw.repo === "object" ? raw.repo?.name : raw.repo; if (!String(rawRepo ?? "").trim()) { @@ -596,7 +721,9 @@ function normalizeManifestTask(filePath, raw, opts = {}) { const expectedVerificationFiles = textAnchorList( raw.expected_verification_files ?? raw.expectedVerificationFiles, ); - const expectedSymbols = textAnchorList(raw.expected_symbols ?? raw.expectedSymbols); + const rawExpectedSymbols = raw.expected_symbols ?? raw.expectedSymbols; + const expectedSymbols = textAnchorList(rawExpectedSymbols); + const expectedSymbolProbes = packetManifestSymbolProbeList(rawExpectedSymbols); const expectedClaims = textAnchorList(raw.expected_claims ?? raw.expectedClaims); const qualityThresholds = raw.quality_thresholds ?? raw.qualityThresholds; if (!expectedFiles.length) { @@ -623,6 +750,7 @@ function normalizeManifestTask(filePath, raw, opts = {}) { expected_files: expectedFiles, expected_verification_files: expectedVerificationFiles, expected_symbols: expectedSymbols, + expected_symbol_probes: expectedSymbolProbes, expected_claims: expectedClaims, forbidden_claims: textAnchorList(raw.forbidden_claims ?? raw.forbiddenClaims), quality_thresholds: qualityThresholds, @@ -646,6 +774,7 @@ function taskSnapshotForResult(task) { expected_files: task.expected_files ?? [], expected_verification_files: task.expected_verification_files ?? [], expected_symbols: task.expected_symbols ?? [], + expected_symbol_probes: task.expected_symbol_probes ?? [], expected_claims: task.expected_claims ?? [], forbidden_claims: task.forbidden_claims ?? [], quality_thresholds: task.quality_thresholds ?? {}, @@ -863,6 +992,25 @@ async function runProcess(command, args, options = {}) { }); } +async function parallelMap(items, jobs, mapper) { + const results = new Array(items.length); + let nextIndex = 0; + const workerCount = Math.min(Math.max(1, jobs), items.length); + await Promise.all( + Array.from({ length: workerCount }, async () => { + for (;;) { + const index = nextIndex; + nextIndex += 1; + if (index >= items.length) { + return; + } + results[index] = await mapper(items[index], index); + } + }), + ); + return results; +} + function terminateProcess(child, signal, options = {}) { if (options.killProcessTree && process.platform === "win32" && child.pid) { const killer = spawn("taskkill", ["/PID", String(child.pid), "/T", "/F"], { @@ -898,6 +1046,64 @@ function uniqueTaskRepos(tasks) { return repos; } +function isTrustedPublishableRepoUrl(url) { + try { + const parsed = new URL(String(url ?? "")); + if ( + parsed.protocol !== "https:" || + parsed.hostname.toLowerCase() !== "github.com" || + parsed.username || + parsed.password || + parsed.search || + parsed.hash + ) { + return false; + } + const parts = parsed.pathname.split("/").filter(Boolean); + return ( + parts.length === 2 && + /^[A-Za-z0-9_.-]+$/.test(parts[0]) && + /^[A-Za-z0-9_.-]+(?:\.git)?$/.test(parts[1]) + ); + } catch { + return false; + } +} + +function normalizeTrustedPublishableRepoUrl(url) { + if (!isTrustedPublishableRepoUrl(url)) { + return null; + } + const parsed = new URL(String(url)); + const [owner, repo] = parsed.pathname.split("/").filter(Boolean); + return `${owner.toLowerCase()}/${repo.replace(/\.git$/i, "").toLowerCase()}`; +} + +function manifestRepoMaterializationBlockers(tasks, opts = {}) { + if (!opts.publishable || !opts.materializeRepos) { + return []; + } + const blockers = []; + for (const [name, config] of uniqueTaskRepos(tasks)) { + if (!isTrustedPublishableRepoUrl(config.url)) { + blockers.push(`${name}: manifest repo URL is not an https://github.com//[.git] URL`); + } + if (!isImmutableCommitRef(config.ref)) { + blockers.push(`${name}: manifest repo ref is not a full immutable commit SHA`); + } + } + return blockers; +} + +function assertManifestRepoMaterializationAllowed(tasks, opts = {}) { + const blockers = manifestRepoMaterializationBlockers(tasks, opts); + if (blockers.length) { + throw new Error( + `Publishable repo materialization preflight failed before clone/fetch:\n- ${blockers.join("\n- ")}`, + ); + } +} + async function materializeRepos(tasks, opts) { const repos = uniqueTaskRepos(tasks); if (!repos.size) { @@ -937,7 +1143,7 @@ async function materializeRepos(tasks, opts) { } } -function composePrompt(repoName, repoConfig, armName, task = null) { +function composePrompt(repoName, repoConfig, armName, task = null, context = {}) { const taskPrompt = task?.prompt ?? repoConfig.prompt; const taskHeader = task ? `Task id: ${task.id} @@ -950,7 +1156,7 @@ Task class: ${task.task_class ?? "unspecified"}` const packetFirstBlock = packetFirstCommand ? ` Required first repository-context command: -\`\`\`powershell +\`\`\`${packetFirstCommandFenceLanguage()} ${packetFirstCommand} \`\`\` @@ -958,9 +1164,14 @@ Run that answer packet before any repository search, direct source read, git com : ""; const stopContractBlock = armName === "with_codestory" - ? ` -If the packet reports \`sufficiency.status: "sufficient"\` with no \`sufficiency.follow_up_commands\`, do not run ordinary source reads, \`rg\`, \`grep\`, \`git show\`, or file-open commands afterward. Those commands count as benchmark overhead unless the packet names a concrete unresolved gap.` + ? packetPreludeManifestComplete(context.codestoryPrelude?.public) + ? ` +The harness verified the CodeStory packet against this task manifest before starting you. Treat the packet as complete for this benchmark row even if its generic sufficiency status is partial. Do not run follow-up commands, ordinary source reads, \`rg\`, \`grep\`, \`git show\`, or file-open commands before answering.` + : ` +If the packet reports \`sufficiency.status: "sufficient"\` with no \`sufficiency.follow_up_commands\`, do not run ordinary source reads, \`rg\`, \`grep\`, \`git show\`, or file-open commands afterward. If the packet is partial or packet manifest quality is incomplete, close gaps with listed CodeStory follow-ups first; ordinary local source reads are allowed only after CodeStory attempts and count as post-packet overhead.` : ""; + const harnessPacketBlock = packetPreludePromptBlock(context.codestoryPrelude); + const baselineContextBlock = baselinePreludePromptBlock(context.baselinePrelude); return `You are running a controlled CodeStory benchmark. Repository: ${repoName} @@ -971,21 +1182,239 @@ Arm: ${armName} Instruction: ${ARMS[armName]} ${packetFirstBlock} ${stopContractBlock} +${harnessPacketBlock} +${baselineContextBlock} Return a concise answer with the files, symbols, and commands that support your explanation. -Do not edit source files. Use read-only inspection commands only, except CodeStory may write its cache if needed.`; +Do not edit source files. Use read-only inspection commands only, except CodeStory may write its cache if needed. +Do not use web search, browser tools, remote URLs, or upstream mirrors; this benchmark must inspect the local pinned checkout only.`; +} + +function packetFirstCommandFenceLanguage(platform = process.platform) { + return platform === "win32" ? "powershell" : "sh"; } -function packetFirstCommandForPrompt(taskPrompt, task = null) { +function packetFirstCommandForPrompt(taskPrompt, task = null, platform = process.platform) { const question = String(taskPrompt).replace(/\r?\n/g, " "); const taskClass = task?.task_class - ? ` --task-class ${powershellSingleQuoted(validatePacketTaskClass("benchmark task", task.task_class).replace(/_/g, "-"))}` + ? ` --task-class ${shellSingleQuoted(validatePacketTaskClass("benchmark task", task.task_class).replace(/_/g, "-"), platform)}` + : ""; + if (platform === "win32") { + return `& $env:CODESTORY_CLI packet --project . --question ${shellSingleQuoted(question, platform)}${taskClass} --budget compact --format json`; + } + return `"\${CODESTORY_CLI:-codestory-cli}" packet --project . --question ${shellSingleQuoted(question, platform)}${taskClass} --budget compact --format json`; +} + +function packetPreludePromptBlock(prelude) { + if (!prelude?.packet) { + return ""; + } + const supportPaths = packetSupportPaths(prelude.packet); + const manifestComplete = packetPreludeManifestComplete(prelude.public); + const manifestBlock = manifestComplete + ? ` +Benchmark manifest coverage: complete. The harness matched this packet against the task's expected files, symbols, claims, and citations. Do not spend tokens trying follow-up commands for this row; answer from the packet.` + : prelude.public?.packet_manifest_quality + ? ` +Benchmark manifest coverage: incomplete. Packet manifest quality was ${JSON.stringify(prelude.public.packet_manifest_quality)}. Use the packet first, then close missing anchors with CodeStory follow-ups before any ordinary local source reads.` + : ""; + const supportPathBlock = supportPaths.length + ? ` +CodeStory support paths extracted from the packet: +${supportPaths.map((filePath) => `- ${filePath}`).join("\n")}` : ""; - return `& $env:CODESTORY_CLI packet --project . --question ${powershellSingleQuoted(question)}${taskClass} --budget compact --format json`; + return ` +The benchmark harness already ran the required first repository-context command before starting you: +\`\`\`${packetFirstCommandFenceLanguage()} +${prelude.public.command} +\`\`\` + +Use this packet as the first CodeStory context source. If \`sufficiency.status\` is \`"sufficient"\` and \`sufficiency.follow_up_commands\` is empty, answer from this packet without ordinary source reads. Include a compact \`Support files\` section with the packet citation and avoid-opening paths. +${manifestBlock} +${supportPathBlock} + +CodeStory packet JSON excerpt: +\`\`\`json +${JSON.stringify(packetForAgentPrompt(prelude.packet), null, 2)} +\`\`\``; +} + +function packetForAgentPrompt(packet) { + if (!packet || typeof packet !== "object") { + return packet; + } + return { + answer: packet.answer + ? { + summary: packet.answer.summary ?? null, + text: truncatePacketPromptText(packetAnswerText(packet), 4000), + citations: (packet.answer.citations ?? []).map(leanPacketCitation), + } + : null, + sufficiency: packet.sufficiency + ? { + status: packet.sufficiency.status ?? null, + covered_claims: (packet.sufficiency.covered_claims ?? []) + .map((claim) => String(claim?.claim ?? "").trim()) + .filter(Boolean), + avoid_opening: packetAvoidOpeningRawPaths(packet), + follow_up_commands: (packet.sufficiency.follow_up_commands ?? []).slice(0, 4), + } + : null, + }; +} + +function packetPreludeManifestComplete(publicPrelude) { + const quality = publicPrelude?.packet_manifest_quality; + if (!quality?.pass) { + return false; + } + const composition = publicPrelude?.packet_composition; + return ( + !composition || + composition.expected_file_count === 0 || + composition.citation_backed_recall === 1 || + composition.structured_file_recall === 1 + ); +} + +function packetManifestQualitySummary(packet, task) { + if (!packet || !task) { + return null; + } + const citationText = (packet.answer?.citations ?? []) + .map((citation) => + [ + citation?.display_name, + packetPromptPath(citation?.file_path), + citation?.line == null ? "" : `line ${citation.line}`, + ] + .filter(Boolean) + .join(" "), + ) + .filter(Boolean) + .join("\n"); + const claimText = (packet.sufficiency?.covered_claims ?? []) + .map((claim) => String(claim?.claim ?? "").trim()) + .filter(Boolean) + .join("\n"); + const text = [ + packet.answer?.summary ?? "", + packetAnswerText(packet), + citationText, + claimText, + ] + .filter(Boolean) + .join("\n"); + const quality = scoreQuality( + [ + { + type: "item.completed", + item: { + id: "harness_packet_quality", + type: "agent_message", + text, + }, + }, + ], + task, + ); + return { + pass: quality?.pass ?? false, + expected_file_recall: quality?.expected_files?.recall ?? null, + expected_symbol_recall: quality?.expected_symbols?.recall ?? null, + expected_claim_recall: quality?.expected_claims?.recall ?? null, + citation_coverage: quality?.citation_coverage?.recall ?? null, + forbidden_claims_found: quality?.forbidden_claims?.found ?? null, + }; +} + +function truncatePacketPromptText(value, maxChars) { + const text = String(value ?? ""); + if (text.length <= maxChars) { + return text; + } + return `${text.slice(0, maxChars)}\n[truncated ${text.length - maxChars} chars]`; +} + +function leanPacketCitation(citation) { + return { + display_name: citation?.display_name ?? null, + kind: citation?.kind ?? null, + file_path: packetPromptPath(citation?.file_path), + line: citation?.line ?? null, + }; +} + +function packetPromptPath(value) { + const normalized = normalizePathLike(value); + const lower = normalized.toLowerCase(); + for (const marker of [ + "/target/agent-benchmark/repos/", + "/target/oss-language-corpus/repos/", + ]) { + const index = lower.indexOf(marker); + if (index >= 0) { + const remainder = normalized.slice(index + marker.length); + const slash = remainder.indexOf("/"); + return slash >= 0 ? remainder.slice(slash + 1) : remainder; + } + } + return normalized; +} + +function legacyAvoidOpeningPath(value) { + const text = String(value ?? "").trim(); + const marker = " because "; + const markerIndex = text.toLowerCase().indexOf(marker); + return markerIndex >= 0 ? text.slice(0, markerIndex).trim() : text; +} + +function packetAvoidOpeningRawPaths(packet) { + const rawPaths = packet?.sufficiency?.avoid_opening_paths; + const values = Array.isArray(rawPaths) + ? rawPaths + : (packet?.sufficiency?.avoid_opening ?? []).map(legacyAvoidOpeningPath); + return values.map(packetPromptPath).filter(Boolean); +} + +function packetSupportPaths(packet) { + const paths = []; + for (const citation of packet?.answer?.citations ?? []) { + if (citation?.file_path) { + paths.push(packetPromptPath(citation.file_path)); + } + } + for (const filePath of packetAvoidOpeningRawPaths(packet)) { + if (filePath) { + paths.push(filePath); + } + } + return [...new Set(paths)]; } -function powershellSingleQuoted(value) { - return `'${String(value).replace(/'/g, "''")}'`; +function baselinePreludePromptBlock(prelude) { + if (!prelude?.public || prelude.public.status !== "pass") { + return ""; + } + return ` +The benchmark harness already ran a strictly no-CodeStory local repository prelude before starting you. Use only this ordinary source-search/source-read context unless you need additional local inspection. Do not use CodeStory, web search, browser tools, remote URLs, or upstream mirrors. + +Baseline local-context command summary: +${prelude.public.commands.map((entry) => `- ${entry.command}`).join("\n")} + +Baseline local-context snippets: +\`\`\`text +${prelude.contextText} +\`\`\``; +} + +function shellSingleQuoted(value, platform = process.platform) { + const text = String(value); + if (platform === "win32") { + return `'${text.replace(/'/g, "''")}'`; + } + return `'${text.replace(/'/g, "'\\''")}'`; } function artifactNamePart(value) { @@ -1039,6 +1468,8 @@ function commandCategory(command) { "\\b(index|ground|doctor|search|symbol|trail|snippet|query|explore|bookmark|context|drill|files|affected|setup|serve|packet)\\b"; const codestoryExecutablePath = String.raw`['"]?(?:[A-Z]:)?(?:[^;&|\r\n"']*[\\/])*codestory-cli(?:\.exe)?['"]?\s+${codestoryCommands}`; + const powershellEnvFallback = + String.raw`&\s*\$\(\s*if\s*\(\s*\$env:CODESTORY_CLI\s*\)\s*\{[^}]*\$env:CODESTORY_CLI[^}]*\}\s*else\s*\{[^}]*codestory-cli(?:\.exe)?[^}]*\}\s*\)\s+${codestoryCommands}`; if (/^\s*(?:rg|grep|findstr|select-string)\b/i.test(text)) { return "shell_search"; } @@ -1049,8 +1480,14 @@ function commandCategory(command) { /^\s*codestory-cli(?:\.exe)?(?:\s|$)/i.test(shellText) || new RegExp(`^\\s*${codestoryExecutablePath}`, "i").test(shellText) || new RegExp(`[;&|]\\s*${codestoryExecutablePath}`, "i").test(shellText) || - /&\s*\$env:CODESTORY_CLI\s+/i.test(shellText) || - new RegExp(`&\\s*\\$[a-z_][a-z0-9_]*\\s+${codestoryCommands}`, "i").test(shellText) + /&\s*["']*\$env:CODESTORY_CLI\s+/i.test(shellText) || + new RegExp( + `(?:^|[;&|]\\s*)["']?\\$\\{CODESTORY_CLI:-codestory-cli\\}["']?\\s+${codestoryCommands}`, + "i", + ).test(shellText) || + new RegExp(`(?:^|[;&|]\\s*)["']?\\$CODESTORY_CLI["']?\\s+${codestoryCommands}`, "i").test(shellText) || + new RegExp(`&\\s*["']*\\$[a-z_][a-z0-9_]*\\s+${codestoryCommands}`, "i").test(shellText) || + new RegExp(powershellEnvFallback, "i").test(shellText) ) { return "codestory_cli"; } @@ -1073,6 +1510,8 @@ function isCodestoryPacketCommand(command) { const shellText = String(command ?? "").replace(/\\"/g, '"'); const packetExecutablePath = String.raw`['"]?(?:[A-Z]:)?(?:[^;&|\r\n"']*[\\/])*codestory-cli(?:\.exe)?['"]?\s+packet\b`; + const powershellEnvFallbackPacket = + String.raw`&\s*\$\(\s*if\s*\(\s*\$env:CODESTORY_CLI\s*\)\s*\{[^}]*\$env:CODESTORY_CLI[^}]*\}\s*else\s*\{[^}]*codestory-cli(?:\.exe)?[^}]*\}\s*\)\s+packet\b`; if (/(?:^|\s)(?:--help|-h)(?:\s|$)/i.test(shellText)) { return false; } @@ -1083,8 +1522,11 @@ function isCodestoryPacketCommand(command) { /^\s*codestory-cli(?:\.exe)?\s+packet\b/i.test(shellText) || new RegExp(`^\\s*${packetExecutablePath}`, "i").test(shellText) || new RegExp(`[;&|]\\s*${packetExecutablePath}`, "i").test(shellText) || - /&\s*\$env:CODESTORY_CLI\s+packet\b/i.test(shellText) || - /&\s*\$[a-z_][a-z0-9_]*\s+packet\b/i.test(shellText) + /&\s*["']*\$env:CODESTORY_CLI\s+packet\b/i.test(shellText) || + /(?:^|[;&|]\s*)["']?\$\{CODESTORY_CLI:-codestory-cli\}["']?\s+packet\b/i.test(shellText) || + /(?:^|[;&|]\s*)["']?\$CODESTORY_CLI["']?\s+packet\b/i.test(shellText) || + /&\s*["']*\$[a-z_][a-z0-9_]*\s+packet\b/i.test(shellText) || + new RegExp(powershellEnvFallbackPacket, "i").test(shellText) ); } @@ -1092,12 +1534,17 @@ function isCodestoryIndexCommand(command) { const shellText = String(command ?? "").replace(/\\"/g, '"'); const indexExecutablePath = String.raw`['"]?(?:[A-Z]:)?(?:[^;&|\r\n"']*[\\/])*codestory-cli(?:\.exe)?['"]?\s+index\b`; + const powershellEnvFallbackIndex = + String.raw`&\s*\$\(\s*if\s*\(\s*\$env:CODESTORY_CLI\s*\)\s*\{[^}]*\$env:CODESTORY_CLI[^}]*\}\s*else\s*\{[^}]*codestory-cli(?:\.exe)?[^}]*\}\s*\)\s+index\b`; return ( /^\s*codestory-cli(?:\.exe)?\s+index\b/i.test(shellText) || new RegExp(`^\\s*${indexExecutablePath}`, "i").test(shellText) || new RegExp(`[;&|]\\s*${indexExecutablePath}`, "i").test(shellText) || - /&\s*\$env:CODESTORY_CLI\s+index\b/i.test(shellText) || - /&\s*\$[a-z_][a-z0-9_]*\s+index\b/i.test(shellText) + /&\s*["']*\$env:CODESTORY_CLI\s+index\b/i.test(shellText) || + /(?:^|[;&|]\s*)["']?\$\{CODESTORY_CLI:-codestory-cli\}["']?\s+index\b/i.test(shellText) || + /(?:^|[;&|]\s*)["']?\$CODESTORY_CLI["']?\s+index\b/i.test(shellText) || + /&\s*["']*\$[a-z_][a-z0-9_]*\s+index\b/i.test(shellText) || + new RegExp(powershellEnvFallbackIndex, "i").test(shellText) ); } @@ -1119,7 +1566,8 @@ function isSuccessfulContextCommand(command) { function normalizePathLike(value) { return String(value ?? "") .trim() - .replace(/^['"]|['"]$/g, "") + .replace(/^(?:['"])+/, "") + .replace(/(?:['"])+$/, "") .replace(/\\/g, "/") .replace(/\/+/g, "/") .replace(/^\?\/(?=[A-Za-z]:\/)/, "") @@ -1134,7 +1582,7 @@ function pathMatchesLike(actual, expected) { function isLikelySourcePath(value) { const normalized = normalizePathLike(value).toLowerCase(); - return /\.(rs|js|jsx|ts|tsx|py|go|java|kt|cs|cpp|c|h|hpp|rb|php|swift|md|toml|json|yaml|yml)$/i.test(normalized); + return /\.(rs|js|jsx|mjs|cjs|ts|tsx|mts|cts|py|pyi|go|java|kt|kts|cs|cpp|cc|cxx|c|h|hpp|hh|hxx|rb|php|swift|dart|sh|bash|html|htm|css|sql|md|toml|json|yaml|yml)$/i.test(normalized); } function extractAssignedPaths(command) { @@ -1161,11 +1609,11 @@ function extractDirectFileReads(command) { } const patterns = [ - /\bGet-Content\b(?:\s+-[A-Za-z]+(?:\s+\S+)?)?\s+['"]?([^'";|`\r\n]+)['"]?/gi, - /\bcat\b\s+['"]?([^'";|`\r\n]+)['"]?/gi, - /\btype\b\s+['"]?([^'";|`\r\n]+)['"]?/gi, - /\bnl\b(?:\s+-[A-Za-z]+)*\s+['"]?([^'";|`\r\n]+)['"]?/gi, - /\bsed\b\s+-n\s+['"]?[^'"]+['"]?\s+['"]?([^'";|`\r\n]+)['"]?/gi, + /\bGet-Content\b(?:\s+-(?!LiteralPath\b|Path\b)[A-Za-z]+)*\s+(?:-(?:LiteralPath|Path)\s+)?['"]*([^'";|`\r\n]+)['"]*/gi, + /\bcat\b\s+['"]*([^'";|`\r\n]+)['"]*/gi, + /\btype\b\s+['"]*([^'";|`\r\n]+)['"]*/gi, + /\bnl\b(?:\s+-[A-Za-z]+)*\s+['"]*([^'";|`\r\n]+)['"]*/gi, + /\bsed\b\s+-n\s+['"]?[^'"]+['"]?\s+['"]*([^'";|`\r\n]+)['"]*/gi, ]; for (const pattern of patterns) { @@ -1279,6 +1727,7 @@ function isPathInsideProject(filePath, projectRoot) { function analyzeTranscript(events, projectRoot = null) { const commands = extractCommandExecutions(events); + const toolCategories = toolCallCategories(events); const commandCategories = {}; const outputCharsByCategory = {}; const directFileReads = []; @@ -1318,6 +1767,8 @@ function analyzeTranscript(events, projectRoot = null) { : sourceReads.filter((read) => (read.event_index ?? -1) > (first.completed_event_index ?? first.started_event_index ?? -1)).length; return { + tool_categories: toolCategories, + external_context_tool_calls: toolCategories.web_search ?? 0, command_categories: commandCategories, command_count: commands.length, command_patterns_duplicated: duplicateCounts(commands.map((command) => command.pattern)), @@ -1357,6 +1808,44 @@ function analyzeTranscript(events, projectRoot = null) { }; } +function toolCallCategory(event) { + if (!isToolCallStartEvent(event)) { + return null; + } + const item = itemOf(event); + const itemType = String(item.type ?? event.item_type ?? event.kind ?? event.name ?? "").toLowerCase(); + const eventType = String(event.type ?? event.event ?? "").toLowerCase(); + const toolName = String(item.tool ?? item.name ?? event.tool ?? "").toLowerCase(); + const text = `${itemType} ${eventType} ${toolName}`; + if (text.includes("web_search")) { + return "web_search"; + } + if (text.includes("command_execution") || text.includes("exec_command")) { + return "command_execution"; + } + if (text.includes("mcp_tool_call")) { + return "mcp_tool_call"; + } + if (text.includes("function_call")) { + return "function_call"; + } + if (text.includes("tool_call") || text.includes("tool_use")) { + return "tool_call"; + } + return "other"; +} + +function toolCallCategories(events) { + const categories = {}; + for (const event of events) { + const category = toolCallCategory(event); + if (category) { + bumpCount(categories, category); + } + } + return categories; +} + function normalizeSearchText(value) { return String(value ?? "") .toLowerCase() @@ -1459,25 +1948,52 @@ const FORBIDDEN_POLARITY_TERMS = new Set([ "instead", "never", "not", - "only", "without", ]); +const FORBIDDEN_CONTRADICTION_TERMS = new Set(["false", "never", "no", "not", "without"]); + function claimPolarityTokens(claim) { return claimTokens(claim).filter((token) => FORBIDDEN_POLARITY_TERMS.has(token)); } +function forbiddenCandidateSentences(haystack) { + return String(haystack ?? "") + .replace(/\r\n/g, "\n") + .split(/(?:[.!?]\s+|\n+)/) + .map((sentence) => normalizeSearchText(sentence)) + .filter(Boolean); +} + +function hasContradictingNegation(sentence) { + const tokens = claimTokens(sentence); + return tokens.some((token) => FORBIDDEN_CONTRADICTION_TERMS.has(token)); +} + function forbiddenClaimMatched(haystack, claim) { - if (!claimMatched(haystack, claim)) { - return false; - } + const expectedTokens = claimTokens(claim); const polarityTokens = claimPolarityTokens(claim); - if (!polarityTokens.length) { - const haystackTokens = new Set(claimTokens(haystack)); - return claimTokens(claim).every((token) => claimTokenMatched(token, haystackTokens)); + if (expectedTokens.length < 3) { + return false; } - const haystackTokens = new Set(claimTokens(haystack)); - return polarityTokens.every((token) => claimTokenMatched(token, haystackTokens)); + + return forbiddenCandidateSentences(haystack).some((sentence) => { + if (!polarityTokens.length && hasContradictingNegation(sentence)) { + return false; + } + + const sentenceTokens = new Set(claimTokens(sentence)); + if (!polarityTokens.length) { + return expectedTokens.every((token) => claimTokenMatched(token, sentenceTokens)); + } + + const matched = expectedTokens.filter((token) => claimTokenMatched(token, sentenceTokens)).length; + const ratio = matched / expectedTokens.length; + if (matched < Math.min(4, expectedTokens.length) || ratio < 0.65) { + return false; + } + return polarityTokens.every((token) => claimTokenMatched(token, sentenceTokens)); + }); } function scoreClaimSet(claims, haystack, opts = {}) { @@ -1730,58 +2246,627 @@ function estimateCost(usage) { return (usage.input_tokens / 1_000_000) * inputCost + (usage.output_tokens / 1_000_000) * outputCost; } -async function runOne(opts, run, outDir) { - const repoConfig = ALL_REPOS[run.repo]; - const prompt = composePrompt(run.repo, repoConfig, run.arm, run.task); - const { command, args, stdin, killProcessTree } = runnerCommand(opts, repoConfig.path, prompt); - const env = run.arm === "with_codestory" ? benchmarkChildEnv(process.env) : { ...process.env }; - if (run.arm === "with_codestory") { - env.CODESTORY_CLI = path.resolve(resolveCodeStoryCli(opts)); +function packetCommandArgs(repoConfig, task, opts = {}) { + const args = [ + "packet", + "--project", + repoConfig.path, + "--question", + task?.prompt ?? repoConfig.prompt, + "--budget", + "compact", + "--format", + "json", + ]; + if (task?.task_class) { + args.push("--task-class", validatePacketTaskClass("benchmark task", task.task_class).replace(/_/g, "-")); } - const started = performance.now(); - const result = await runProcess(command, args, { - cwd: repoConfig.path, - env, - stdin, - timeoutMs: opts.timeoutMs, - timeoutMessage: `Benchmark runner timed out after ${opts.timeoutMs}ms.`, - forceKillAfterMs: 5000, - killProcessTree, - }); + for (const probe of packetCommandExtraProbes(task, opts)) { + args.push("--extra-probe", probe); + } + return args; +} - const wallMs = Math.round((performance.now() - started) * 1000) / 1000; - const runId = benchmarkRunId([ - run.repo, - ...(run.task ? [run.task.id] : []), - run.arm, - String(run.repeat).padStart(2, "0"), - ]); - const stdoutPath = path.join(outDir, `${runId}.stdout.jsonl`); - const stderrPath = path.join(outDir, `${runId}.stderr.txt`); - await writeFile(stdoutPath, result.stdout, "utf8"); - await writeFile(stderrPath, result.stderr, "utf8"); +function displayShellArg(value) { + const text = String(value ?? ""); + if (!/[\s'"&|<>^]/.test(text)) { + return text; + } + if (process.platform === "win32") { + return `"${text.replace(/"/g, '\\"')}"`; + } + return `'${text.replace(/'/g, "'\\''")}'`; +} - const { parsed, malformed } = parseJsonLines(result.stdout); - const usage = extractUsage(parsed); - const toolCalls = parsed.filter(isToolCallStartEvent).length; - const analysis = analyzeTranscript(parsed, repoConfig.path); - const provenance = await repoProvenance(repoConfig); - const packetFirstRequired = run.arm === "with_codestory"; - const packetFirstPass = - !packetFirstRequired || Boolean(analysis.packet_was_first_context_command); - const quality = scoreQuality(parsed, run.task); - const cacheProvenance = run.arm === "with_codestory" - ? await codestoryCacheProvenance(opts, repoConfig, { - codestory_index_commands_observed: analysis.codestory_index_commands_observed, - indexing_in_timed_run: analysis.codestory_index_commands_observed > 0, - cache_prepared: opts.cachePreparationByRepo?.has(run.repo) ?? false, - cache_preparation: opts.cachePreparationByRepo?.get(run.repo) ?? null, - transport_mode: "agent_runner", - }) - : null; +function displayCommand(command, args) { + return [command, ...args].map(displayShellArg).join(" "); +} +function preludePublicFields(prelude) { return { - repo: run.repo, + kind: "codestory_packet", + command: prelude.command, + args: prelude.args, + status: prelude.status, + process_status: prelude.process_status, + exit_code: prelude.exit_code, + signal: prelude.signal, + error: prelude.error, + wall_ms: prelude.wall_ms, + stdout_path: prelude.stdout_path, + stderr_path: prelude.stderr_path, + stdout_bytes: prelude.stdout_bytes, + stderr_bytes: prelude.stderr_bytes, + packet_parse_error: prelude.packet_parse_error, + packet_sufficiency_status: prelude.packet_sufficiency_status, + packet_citation_count: prelude.packet_citation_count, + packet_avoid_opening_count: prelude.packet_avoid_opening_count, + packet_latency: prelude.packet_latency, + packet_composition: prelude.packet_composition, + packet_manifest_quality: prelude.packet_manifest_quality, + packet_extra_probe_count: prelude.packet_extra_probe_count ?? null, + packet_extra_probe_strategy: prelude.packet_extra_probe_strategy ?? null, + }; +} + +function harnessPacketPreludeEvents(prelude, stdout = "") { + if (!prelude) { + return []; + } + const command = prelude.command ?? ""; + const id = "harness_codestory_packet"; + return [ + { + type: "harness.command.started", + item: { + id, + type: "command_execution", + command, + }, + }, + { + type: "harness.command.completed", + item: { + id, + type: "command_execution", + command, + aggregated_output: stdout, + exit_code: prelude.exit_code, + status: prelude.status, + }, + }, + ]; +} + +const BASELINE_CONTEXT_MAX_FILES = 8; +const BASELINE_CONTEXT_LINES_AROUND_MATCH = 8; +const BASELINE_CONTEXT_MAX_LINES_PER_FILE = 90; +const BASELINE_CONTEXT_MAX_CHARS = 28_000; +const BASELINE_SEARCH_MAX_CHARS = 24_000; +const BASELINE_QUERY_STOPWORDS = new Set([ + "about", + "across", + "after", + "before", + "between", + "call", + "calls", + "cite", + "explain", + "file", + "files", + "from", + "function", + "functions", + "helper", + "helpers", + "into", + "name", + "primary", + "repository", + "source", + "supporting", + "symbol", + "symbols", + "that", + "them", + "through", + "turns", + "with", +]); + +function baselineQueryTerms(taskPrompt) { + const terms = []; + const seen = new Set(); + for (const match of String(taskPrompt ?? "").matchAll(/[A-Za-z_][A-Za-z0-9_.-]{2,}/g)) { + const raw = match[0].replace(/^[._-]+|[._-]+$/g, ""); + const normalized = raw.toLowerCase(); + if ( + normalized.length < 4 || + BASELINE_QUERY_STOPWORDS.has(normalized) || + seen.has(normalized) + ) { + continue; + } + seen.add(normalized); + terms.push(raw); + } + return terms.slice(0, 14); +} + +function escapeRegex(value) { + return String(value).replace(/[\\^$.*+?()[\]{}|]/g, "\\$&"); +} + +function baselineSearchRegex(terms) { + return terms.length ? terms.map(escapeRegex).join("|") : "[A-Za-z_][A-Za-z0-9_]{3,}"; +} + +function parseRipgrepMatches(stdout) { + const matches = []; + for (const line of String(stdout ?? "").split(/\r?\n/)) { + if (!line.trim()) { + continue; + } + const match = line.match(/^(.+?):(\d+):(\d+):(.*)$/); + if (!match) { + continue; + } + matches.push({ + path: normalizePathLike(match[1]), + line: Number.parseInt(match[2], 10), + column: Number.parseInt(match[3], 10), + text: match[4] ?? "", + }); + } + return matches; +} + +function baselineFilePenalty(filePath) { + const normalized = normalizePathLike(filePath).toLowerCase(); + let penalty = 0; + if (/(^|\/)(test|tests|spec|specs|fixtures|examples?)(\/|$)/.test(normalized)) { + penalty += 3; + } + if (/\.(md|markdown|json|ya?ml|toml)$/i.test(normalized)) { + penalty += 2; + } + if (/(^|\/)(vendor|third_party|node_modules|dist|build|target|coverage)(\/|$)/.test(normalized)) { + penalty += 20; + } + return penalty; +} + +function selectBaselineFiles(matches, terms) { + const byPath = new Map(); + for (const match of matches) { + if (!isLikelySourcePath(match.path)) { + continue; + } + const entry = byPath.get(match.path) ?? { + path: match.path, + matches: [], + termHits: new Set(), + score: 0, + }; + entry.matches.push(match); + const lowerText = match.text.toLowerCase(); + for (const term of terms) { + if (lowerText.includes(term.toLowerCase())) { + entry.termHits.add(term.toLowerCase()); + } + } + byPath.set(match.path, entry); + } + return [...byPath.values()] + .map((entry) => ({ + ...entry, + score: + entry.termHits.size * 5 + + Math.min(entry.matches.length, 20) - + baselineFilePenalty(entry.path), + })) + .filter((entry) => entry.score > -10) + .sort((left, right) => right.score - left.score || left.path.localeCompare(right.path)) + .slice(0, BASELINE_CONTEXT_MAX_FILES); +} + +function mergeLineRanges(ranges, maxLines) { + const merged = []; + for (const range of ranges.sort((left, right) => left.start - right.start)) { + const previous = merged[merged.length - 1]; + if (previous && range.start <= previous.end + 1) { + previous.end = Math.max(previous.end, range.end); + } else { + merged.push({ ...range }); + } + } + const clipped = []; + let used = 0; + for (const range of merged) { + if (used >= maxLines) { + break; + } + const available = maxLines - used; + const length = range.end - range.start + 1; + clipped.push({ + start: range.start, + end: length > available ? range.start + available - 1 : range.end, + }); + used += Math.min(length, available); + } + return clipped; +} + +function baselineSnippetForFile(filePath, content, matchLines) { + const lines = String(content ?? "").split(/\r?\n/); + const ranges = mergeLineRanges( + [...new Set(matchLines)] + .filter((line) => Number.isFinite(line) && line > 0) + .slice(0, 8) + .map((line) => ({ + start: Math.max(1, line - BASELINE_CONTEXT_LINES_AROUND_MATCH), + end: Math.min(lines.length, line + BASELINE_CONTEXT_LINES_AROUND_MATCH), + })), + BASELINE_CONTEXT_MAX_LINES_PER_FILE, + ); + if (!ranges.length) { + ranges.push({ start: 1, end: Math.min(lines.length, 40) }); + } + const chunks = [`### ${filePath}`]; + for (const range of ranges) { + chunks.push(`-- lines ${range.start}-${range.end} --`); + for (let index = range.start; index <= range.end; index += 1) { + chunks.push(`${String(index).padStart(5, " ")}: ${lines[index - 1] ?? ""}`); + } + } + return chunks.join("\n"); +} + +async function buildBaselineContext(repoConfig, searchMatches, selectedFiles) { + const snippets = []; + const readCommands = []; + let contextText = ""; + for (const entry of selectedFiles) { + const absolutePath = path.resolve(repoConfig.path, entry.path); + if (!isPathInsideProject(absolutePath, repoConfig.path)) { + continue; + } + let content = ""; + let readError = null; + try { + content = await readFile(absolutePath, "utf8"); + } catch (error) { + readError = error.message; + } + const snippet = readError + ? `### ${entry.path}\nread_error: ${readError}` + : baselineSnippetForFile( + entry.path, + content, + searchMatches + .filter((match) => match.path === entry.path) + .map((match) => match.line), + ); + if (contextText.length + snippet.length > BASELINE_CONTEXT_MAX_CHARS) { + break; + } + snippets.push(snippet); + contextText = snippets.join("\n\n"); + readCommands.push({ + id: `harness_baseline_read_${readCommands.length + 1}`, + command: `Get-Content ${displayShellArg(entry.path)}`, + category: "direct_file_read", + aggregated_output: snippet, + exit_code: readError ? 1 : 0, + status: readError ? "fail" : "pass", + }); + } + return { contextText, readCommands }; +} + +function harnessBaselinePreludeEvents(prelude, commands = null) { + const preludeCommands = commands ?? prelude?.commands ?? []; + const events = []; + for (const command of preludeCommands) { + events.push({ + type: "harness.command.started", + item: { + id: command.id, + type: "command_execution", + command: command.command, + }, + }); + events.push({ + type: "harness.command.completed", + item: { + id: command.id, + type: "command_execution", + command: command.command, + aggregated_output: command.aggregated_output ?? "", + exit_code: command.exit_code, + status: command.status, + }, + }); + } + return events; +} + +async function runBaselinePrelude(opts, run, repoConfig, outDir, runId) { + const terms = baselineQueryTerms(run.task?.prompt ?? repoConfig.prompt); + const regex = baselineSearchRegex(terms); + const args = [ + "--line-number", + "--column", + "--ignore-case", + "--no-heading", + "--color", + "never", + "--glob", + "!.git/**", + "--glob", + "!node_modules/**", + "--glob", + "!target/**", + "--glob", + "!dist/**", + "--glob", + "!build/**", + regex, + ".", + ]; + const command = displayCommand("rg", args); + const started = performance.now(); + const env = { ...process.env }; + delete env.CODESTORY_CLI; + const result = await runProcess("rg", args, { + cwd: repoConfig.path, + env, + timeoutMs: Math.min(opts.timeoutMs ?? 60_000, 60_000), + timeoutMessage: "Baseline repository search timed out after 60000ms.", + }); + const searchAllowed = result.exitCode === 0 || result.exitCode === 1; + const matches = parseRipgrepMatches(result.stdout); + const selectedFiles = selectBaselineFiles(matches, terms); + const { contextText, readCommands } = await buildBaselineContext(repoConfig, matches, selectedFiles); + const wallMs = Math.round((performance.now() - started) * 1000) / 1000; + const contextPath = path.join(outDir, `${runId}.baseline-context.json`); + const stderrPath = path.join(outDir, `${runId}.baseline-context.stderr.txt`); + const searchOutput = String(result.stdout ?? "").slice(0, BASELINE_SEARCH_MAX_CHARS); + const searchCommand = { + id: "harness_baseline_search", + command, + category: "shell_search", + aggregated_output: searchOutput, + exit_code: result.exitCode, + status: searchAllowed ? "pass" : result.status, + }; + const commands = [searchCommand, ...readCommands]; + const publicPrelude = { + kind: "baseline_local_context", + status: searchAllowed ? "pass" : "fail", + process_status: result.status, + exit_code: result.exitCode, + signal: result.signal, + error: result.error, + wall_ms: wallMs, + context_path: contextPath, + stderr_path: stderrPath, + query_terms: terms, + search_result_count: matches.length, + selected_files: selectedFiles.map((entry) => ({ + path: entry.path, + score: entry.score, + matches: entry.matches.length, + distinct_terms: entry.termHits.size, + })), + commands: commands.map((entry) => ({ + id: entry.id, + command: entry.command, + category: entry.category, + status: entry.status, + exit_code: entry.exit_code, + output_chars: String(entry.aggregated_output ?? "").length, + })), + }; + await writeFile( + contextPath, + `${JSON.stringify( + { + ...publicPrelude, + context_text: contextText, + commands, + }, + null, + 2, + )}\n`, + "utf8", + ); + await writeFile(stderrPath, result.stderr, "utf8"); + return { + public: publicPrelude, + contextText, + commands, + }; +} + +async function runCodeStoryPacketPrelude(opts, run, repoConfig, outDir, runId, codestoryCli) { + const args = packetCommandArgs(repoConfig, run.task, opts); + const extraProbes = packetCommandExtraProbes(run.task, opts); + const command = displayCommand(codestoryCli, args); + const stdoutPath = path.join(outDir, `${runId}.codestory-packet.stdout.json`); + const stderrPath = path.join(outDir, `${runId}.codestory-packet.stderr.txt`); + const started = performance.now(); + const result = await runProcess(codestoryCli, args, { + cwd: repoConfig.path, + env: benchmarkChildEnv(process.env), + timeoutMs: opts.timeoutMs, + timeoutMessage: `CodeStory packet prelude timed out after ${opts.timeoutMs}ms.`, + }); + const wallMs = Math.round((performance.now() - started) * 1000) / 1000; + await writeFile(stdoutPath, result.stdout, "utf8"); + await writeFile(stderrPath, result.stderr, "utf8"); + + let packet = null; + let parseError = null; + if (result.status === "pass") { + try { + packet = JSON.parse(result.stdout); + } catch (error) { + parseError = error.message; + } + } + const publicPrelude = preludePublicFields({ + command, + args, + status: result.status === "pass" && !parseError ? "pass" : "fail", + process_status: result.status, + exit_code: result.exitCode, + signal: result.signal, + error: result.error ?? parseError, + wall_ms: wallMs, + stdout_path: stdoutPath, + stderr_path: stderrPath, + stdout_bytes: Buffer.byteLength(result.stdout, "utf8"), + stderr_bytes: Buffer.byteLength(result.stderr, "utf8"), + packet_parse_error: parseError, + packet_sufficiency_status: packet?.sufficiency?.status ?? null, + packet_citation_count: Array.isArray(packet?.answer?.citations) + ? packet.answer.citations.length + : null, + packet_avoid_opening_count: packet ? packetAvoidOpeningRawPaths(packet).length : null, + packet_latency: packetLatencyTelemetry(packet, wallMs), + packet_composition: packetComposition(packet, run.task), + packet_manifest_quality: packetManifestQualitySummary(packet, run.task), + packet_extra_probe_count: extraProbes.length, + packet_extra_probe_strategy: packetExtraProbeStrategy(extraProbes), + }); + return { + public: publicPrelude, + packet, + stdout: result.stdout, + stderr: result.stderr, + }; +} + +async function recordedHarnessPreludeEvents(result, runDir) { + const events = []; + const prelude = result.codestory_harness_prelude ?? null; + if (prelude) { + let stdout = ""; + const stdoutPath = prelude.stdout_path + ? path.isAbsolute(prelude.stdout_path) + ? prelude.stdout_path + : path.resolve(runDir, prelude.stdout_path) + : null; + if (stdoutPath && existsSync(stdoutPath)) { + stdout = await readFile(stdoutPath, "utf8"); + } + events.push(...harnessPacketPreludeEvents(prelude, stdout)); + } + const baselinePrelude = result.baseline_harness_prelude ?? null; + if (baselinePrelude?.context_path) { + const contextPath = path.isAbsolute(baselinePrelude.context_path) + ? baselinePrelude.context_path + : path.resolve(runDir, baselinePrelude.context_path); + if (existsSync(contextPath)) { + const payload = JSON.parse(await readFile(contextPath, "utf8")); + events.push(...harnessBaselinePreludeEvents(baselinePrelude, payload.commands ?? [])); + } + } + return events; +} + +async function runOne(opts, run, outDir) { + const repoConfig = ALL_REPOS[run.repo]; + const runId = benchmarkRunId([ + run.repo, + ...(run.task ? [run.task.id] : []), + run.arm, + String(run.repeat).padStart(2, "0"), + ]); + const env = run.arm === "with_codestory" ? benchmarkChildEnv(process.env) : { ...process.env }; + if (run.arm === "with_codestory") { + const codestoryCli = resolveCodeStoryCli(opts); + env.CODESTORY_CLI = path.isAbsolute(codestoryCli) || /[\\/]/.test(codestoryCli) + ? path.resolve(codestoryCli) + : codestoryCli; + } else { + delete env.CODESTORY_CLI; + } + const baselinePrelude = + run.arm === "without_codestory" + ? await runBaselinePrelude(opts, run, repoConfig, outDir, runId) + : null; + const codestoryPrelude = + run.arm === "with_codestory" + ? await runCodeStoryPacketPrelude(opts, run, repoConfig, outDir, runId, env.CODESTORY_CLI) + : null; + const prompt = composePrompt(run.repo, repoConfig, run.arm, run.task, { + baselinePrelude, + codestoryPrelude, + }); + const { command, args, stdin, killProcessTree } = runnerCommand(opts, repoConfig.path, prompt); + const started = performance.now(); + const preludeFailure = [baselinePrelude, codestoryPrelude].find( + (prelude) => prelude && prelude.public.status !== "pass", + ); + const shouldRunAgent = preludeFailure == null; + const result = shouldRunAgent + ? await runProcess(command, args, { + cwd: repoConfig.path, + env, + stdin, + timeoutMs: opts.timeoutMs, + timeoutMessage: `Benchmark runner timed out after ${opts.timeoutMs}ms.`, + forceKillAfterMs: 5000, + killProcessTree, + }) + : { + status: "fail", + exitCode: null, + signal: null, + stdout: "", + stderr: `${preludeFailure.public.kind} prelude failed; skipped agent runner. See ${preludeFailure.public.stderr_path ?? preludeFailure.public.context_path}.`, + error: preludeFailure.public.error, + timedOut: false, + }; + + const runnerWallMs = shouldRunAgent ? Math.round((performance.now() - started) * 1000) / 1000 : 0; + const preludeWallMs = (codestoryPrelude?.public.wall_ms ?? 0) + (baselinePrelude?.public.wall_ms ?? 0); + const wallMs = Math.round((runnerWallMs + preludeWallMs) * 1000) / 1000; + const stdoutPath = path.join(outDir, `${runId}.stdout.jsonl`); + const stderrPath = path.join(outDir, `${runId}.stderr.txt`); + await writeFile(stdoutPath, result.stdout, "utf8"); + await writeFile(stderrPath, result.stderr, "utf8"); + + const { parsed, malformed } = parseJsonLines(result.stdout); + const analysisEvents = [ + ...harnessBaselinePreludeEvents(baselinePrelude?.public, baselinePrelude?.commands), + ...harnessPacketPreludeEvents(codestoryPrelude?.public, codestoryPrelude?.stdout), + ...parsed, + ]; + const usage = extractUsage(parsed); + const codexToolCalls = parsed.filter(isToolCallStartEvent).length; + const toolCalls = analysisEvents.filter(isToolCallStartEvent).length; + const analysis = analyzeTranscript(analysisEvents, repoConfig.path); + const provenance = await repoProvenance(repoConfig); + const packetFirstRequired = run.arm === "with_codestory"; + const packetFirstPass = + !packetFirstRequired || Boolean(analysis.packet_was_first_context_command); + const quality = scoreQuality(analysisEvents, run.task); + const cacheProvenance = run.arm === "with_codestory" + ? await codestoryCacheProvenance(opts, repoConfig, { + codestory_index_commands_observed: analysis.codestory_index_commands_observed, + indexing_in_timed_run: analysis.codestory_index_commands_observed > 0, + cache_prepared: opts.cachePreparationByRepo?.has(run.repo) ?? false, + cache_preparation: opts.cachePreparationByRepo?.get(run.repo) ?? null, + transport_mode: "agent_runner", + }) + : null; + + const output = { + repo: run.repo, task_id: run.task?.id ?? null, task_name: run.task?.name ?? null, task_class: run.task?.task_class ?? null, @@ -1804,19 +2889,28 @@ async function runOne(opts, run, outDir) { signal: result.signal, error: result.error, wall_ms: wallMs, + agent_runner_wall_ms: runnerWallMs, + baseline_harness_prelude: baselinePrelude?.public ?? null, + codestory_harness_prelude: codestoryPrelude?.public ?? null, usage, estimated_cost_usd: estimateCost(usage), tool_calls_observed: toolCalls, + codex_tool_calls_observed: codexToolCalls, transcript_analysis: analysis, packet_first_required: packetFirstRequired, packet_first_pass: packetFirstPass, quality, - event_types: eventTypeCounts(parsed), + event_types: eventTypeCounts(analysisEvents), json_events: parsed.length, + analysis_events: analysisEvents.length, malformed_stdout_lines: malformed.length, stdout_path: stdoutPath, stderr_path: stderrPath, }; + return { + ...output, + resource_accounting: resourceAccountingForResult(output), + }; } async function gitOutput(args, cwd, timeoutMs = 10_000) { @@ -1974,6 +3068,7 @@ function compactCachePreparation(preparation) { return { repo: preparation.repo, action: preparation.action, + preparation_wall_ms: preparation.preparation_wall_ms ?? null, index_status: preparation.index_status ?? null, index_exit_code: preparation.index_exit_code ?? null, index_wall_ms: preparation.index_wall_ms ?? null, @@ -2000,25 +3095,30 @@ async function prepareCodeStoryCaches(opts, tasks) { } const repoNames = [...new Set(tasks.map((task) => task.repo))]; const codestoryCli = resolveCodeStoryCli(opts); - const preparations = []; - for (const repo of repoNames) { + if (repoNames.length > 1 && opts.prepareCodestoryJobs > 1) { + console.log( + `preparing CodeStory caches for ${repoNames.length} repos with --prepare-codestory-jobs ${opts.prepareCodestoryJobs}`, + ); + } + return await parallelMap(repoNames, opts.prepareCodestoryJobs, async (repo) => { const config = ALL_REPOS[repo]; if (!config || !existsSync(config.path)) { - preparations.push({ + return { repo, project: config?.path ?? null, action: "skipped-missing-repo", - }); - continue; + }; } console.log(`preparing CodeStory cache for ${repo}`); + const preparationStarted = performance.now(); const before = await codestoryDoctorSnapshot(codestoryCli, config.path, 60_000); const preparation = { repo, project: config.path, codestory_cli: path.resolve(codestoryCli), action: cachePreparationAction(before), + preparation_wall_ms: null, before, index_status: null, index_exit_code: null, @@ -2067,9 +3167,10 @@ async function prepareCodeStoryCaches(opts, tasks) { } } - preparations.push(preparation); - } - return preparations; + preparation.preparation_wall_ms = + Math.round((performance.now() - preparationStarted) * 1000) / 1000; + return preparation; + }); } function semanticBackendName(retrieval) { @@ -2258,10 +3359,14 @@ async function recomputeRunAnalysis(result, opts, runDir, taskCache) { } const { parsed, malformed } = parseJsonLines(await readFile(stdoutPath, "utf8")); + const analysisEvents = [ + ...(await recordedHarnessPreludeEvents(result, runDir)), + ...parsed, + ]; const task = await loadTaskForResult(result, opts, taskCache); const repoConfig = ALL_REPOS[result.repo] ?? null; const usage = extractUsage(parsed); - const analysis = analyzeTranscript(parsed, result.repo_path ?? repoConfig?.path ?? runDir); + const analysis = analyzeTranscript(analysisEvents, result.repo_path ?? repoConfig?.path ?? runDir); const packetFirstRequired = result.packet_first_required ?? result.arm === "with_codestory"; const cacheProvenance = result.codestory_cache_provenance ?? ( repoConfig && result.arm === "with_codestory" @@ -2274,24 +3379,30 @@ async function recomputeRunAnalysis(result, opts, runDir, taskCache) { }) : null ); - return { + const output = { ...result, repo_provenance: result.repo_provenance ?? (repoConfig ? await repoProvenance(repoConfig) : null), codestory_cache_provenance: cacheProvenance, usage, estimated_cost_usd: estimateCost(usage), - tool_calls_observed: parsed.filter(isToolCallStartEvent).length, + tool_calls_observed: analysisEvents.filter(isToolCallStartEvent).length, + codex_tool_calls_observed: parsed.filter(isToolCallStartEvent).length, transcript_analysis: analysis, packet_first_required: packetFirstRequired, packet_first_pass: !packetFirstRequired || Boolean(analysis.packet_was_first_context_command), - quality: scoreQuality(parsed, task), + quality: scoreQuality(analysisEvents, task), reanalysis_task_source: result.task_manifest_snapshot ? "snapshot" : task ? "manifest" : null, - event_types: eventTypeCounts(parsed), + event_types: eventTypeCounts(analysisEvents), json_events: parsed.length, + analysis_events: analysisEvents.length, malformed_stdout_lines: malformed.length, reanalyzed_at: new Date().toISOString(), }; + return { + ...output, + resource_accounting: resourceAccountingForResult(output), + }; } async function reanalyzeAgentRunDirectory(opts) { @@ -2317,6 +3428,7 @@ async function reanalyzeAgentRunDirectory(opts) { } const summary = summarizeRuns(reanalyzed); + const costAccounting = summarizeCostAccounting(reanalyzed); const summaryOpts = { ...opts, runner: originalSummary.runner ?? opts.runner, @@ -2331,6 +3443,7 @@ async function reanalyzeAgentRunDirectory(opts) { max_source_reads_after_packet: opts.maxSourceReadsAfterPacket, output_dir: runDir, summary, + cost_accounting: costAccounting, }; await writeFile( path.join(runDir, "reanalyzed-runs.jsonl"), @@ -2338,7 +3451,11 @@ async function reanalyzeAgentRunDirectory(opts) { "utf8", ); await writeFile(path.join(runDir, "reanalyzed-summary.json"), `${JSON.stringify(payload, null, 2)}\n`, "utf8"); - await writeFile(path.join(runDir, "reanalyzed-summary.md"), markdownSummary(summary, summaryOpts), "utf8"); + await writeFile( + path.join(runDir, "reanalyzed-summary.md"), + markdownSummary(summary, summaryOpts, costAccounting), + "utf8", + ); if (opts.publishable) { const blockers = agentPublishableBlockers(reanalyzed, opts); if (blockers.length) { @@ -2390,7 +3507,7 @@ function packetPayloadText(packet) { for (const claim of packet.sufficiency?.covered_claims ?? []) { chunks.push(claim.claim); } - for (const path of packet.sufficiency?.avoid_opening ?? []) { + for (const path of packetAvoidOpeningRawPaths(packet)) { chunks.push(path); } return chunks.filter(Boolean).join("\n"); @@ -2438,9 +3555,9 @@ function packetComposition(packet, task) { line: citation.line ?? null, })) .filter((entry) => entry.path); - const avoidOpeningPaths = (packet.sufficiency?.avoid_opening ?? []) + const avoidOpeningPaths = packetAvoidOpeningRawPaths(packet) .map((pathValue, index) => ({ - source: "sufficiency.avoid_opening", + source: "sufficiency.avoid_opening_paths", path: pathValue, rank: index + 1, display_name: null, @@ -2636,6 +3753,15 @@ function finiteNumber(value) { return Number.isFinite(number) ? number : null; } +function cappedStringArray(value, limit) { + return Array.isArray(value) + ? value + .map((entry) => String(entry ?? "").trim()) + .filter(Boolean) + .slice(0, limit) + : []; +} + function packetShape(packet) { if (!packet || typeof packet !== "object") { return null; @@ -2659,13 +3785,19 @@ function packetSufficiencyTelemetry(packet, quality) { } const status = packet.sufficiency?.status ?? null; const qualityPass = quality?.pass ?? null; + const gaps = cappedStringArray(packet.sufficiency?.gaps, 8); + const openNext = cappedStringArray(packet.sufficiency?.open_next, 6); + const followUpCommands = cappedStringArray(packet.sufficiency?.follow_up_commands, 6); return { status, covered_claims_count: packet.sufficiency?.covered_claims?.length ?? 0, open_next_count: packet.sufficiency?.open_next?.length ?? 0, - avoid_opening_count: packet.sufficiency?.avoid_opening?.length ?? 0, + avoid_opening_count: packetAvoidOpeningRawPaths(packet).length, gaps_count: packet.sufficiency?.gaps?.length ?? 0, follow_up_commands_count: packet.sufficiency?.follow_up_commands?.length ?? 0, + gaps, + open_next: openNext, + follow_up_commands: followUpCommands, sufficient_quality_mismatch: status === "sufficient" && qualityPass === false, }; } @@ -2737,20 +3869,7 @@ async function runColdPacketRuntime(opts, task, repeat, outDir) { indexing_in_timed_run: false, transport_mode: "cold_cli_packet", }); - const args = [ - "packet", - "--project", - repoConfig.path, - "--question", - task.prompt, - "--budget", - "compact", - "--format", - "json", - ]; - if (task.task_class) { - args.push("--task-class", task.task_class.replace(/_/g, "-")); - } + const args = packetCommandArgs(repoConfig, task, opts); const started = performance.now(); const result = await runProcess(codestoryCli, args, { env: benchmarkChildEnv(process.env), @@ -2773,6 +3892,7 @@ async function runColdPacketRuntime(opts, task, repeat, outDir) { const sufficiency = packetSufficiencyTelemetry(packet, quality); const latency = packetLatencyTelemetry(packet, wallMs); const composition = packetComposition(packet, task); + const extraProbes = packetCommandExtraProbes(task, opts); const runId = benchmarkRunId([task.repo, task.id, "cold-cli-packet", String(repeat).padStart(2, "0")]); await writeFile(path.join(outDir, `${runId}.stdout.json`), result.stdout, "utf8"); await writeFile(path.join(outDir, `${runId}.stderr.txt`), result.stderr, "utf8"); @@ -2794,6 +3914,8 @@ async function runColdPacketRuntime(opts, task, repeat, outDir) { packet_shape: shape, packet_latency: latency, packet_composition: composition, + packet_extra_probe_count: extraProbes.length, + packet_extra_probe_strategy: packetExtraProbeStrategy(extraProbes), sufficiency, quality, }; @@ -3112,14 +4234,41 @@ function repoProvenanceBlockers(result) { } const configuredRef = provenance.configured?.ref ?? null; const manifestRef = provenance.manifest?.ref ?? null; - if (!isPinnedRepoRef(configuredRef)) { - reasons.push("repo ref is not pinned to an immutable commit or tag"); + const configuredCommit = normalizeImmutableCommitRef(configuredRef); + const manifestCommit = manifestRef ? normalizeImmutableCommitRef(manifestRef) : null; + const gitHead = normalizeImmutableCommitRef(provenance.git_head); + if (!configuredCommit) { + reasons.push("repo ref is not pinned to a full immutable commit SHA"); } - if (manifestRef && configuredRef && manifestRef !== configuredRef) { + if (manifestRef && configuredRef && manifestCommit !== configuredCommit) { reasons.push(`manifest ref ${manifestRef} does not match configured ref ${configuredRef}`); } - if (!provenance.git_head) { + if (!gitHead) { reasons.push("missing git head"); + } else if (configuredCommit && gitHead !== configuredCommit) { + reasons.push(`git head ${provenance.git_head} does not match configured ref ${configuredRef}`); + } + const configuredUrl = provenance.configured?.url ?? null; + const manifestUrl = provenance.manifest?.url ?? null; + const gitOrigin = provenance.git_origin ?? null; + const configuredRepo = normalizeTrustedPublishableRepoUrl(configuredUrl); + const manifestRepo = manifestUrl ? normalizeTrustedPublishableRepoUrl(manifestUrl) : null; + const originRepo = gitOrigin ? normalizeTrustedPublishableRepoUrl(gitOrigin) : null; + if (!configuredRepo) { + reasons.push("configured repo URL is not a trusted GitHub HTTPS repo URL"); + } + if (!manifestUrl) { + reasons.push("missing manifest repo URL"); + } else if (!manifestRepo) { + reasons.push("manifest repo URL is not a trusted GitHub HTTPS repo URL"); + } + if (configuredRepo && manifestUrl && manifestRepo && manifestRepo !== configuredRepo) { + reasons.push(`manifest repo URL ${manifestUrl} does not match configured URL ${configuredUrl}`); + } + if (!originRepo) { + reasons.push("git origin is missing or is not a trusted GitHub HTTPS repo URL"); + } else if (configuredRepo && originRepo !== configuredRepo) { + reasons.push(`git origin ${gitOrigin} does not match configured URL ${configuredUrl}`); } if (provenance.git_dirty !== false) { reasons.push(provenance.git_dirty ? "repo checkout is dirty" : "repo cleanliness is unknown"); @@ -3127,21 +4276,13 @@ function repoProvenanceBlockers(result) { return reasons; } -function isPinnedRepoRef(ref) { +function isImmutableCommitRef(ref) { + return /^[0-9a-f]{40}$/i.test(String(ref ?? "").trim()); +} + +function normalizeImmutableCommitRef(ref) { const value = String(ref ?? "").trim(); - if (!value || value === "local") { - return false; - } - if (/^[0-9a-f]{7,40}$/i.test(value)) { - return true; - } - if (/^refs\/tags\/[^/\s]+$/i.test(value)) { - return true; - } - if (/^v?\d+\.\d+(?:\.\d+)?(?:[-+][A-Za-z0-9._-]+)?$/.test(value)) { - return true; - } - return false; + return isImmutableCommitRef(value) ? value.toLowerCase() : null; } function cacheProvenanceBlockers(result) { @@ -3281,16 +4422,36 @@ function buildQualityDebugPayload(results, meta = {}) { missed_anchors: quality?.missed_anchors ?? null, retrieval: extractRetrievalDiagnostics(row), sufficiency_status: row.sufficiency?.status ?? null, + sufficiency: row.sufficiency + ? { + status: row.sufficiency.status ?? null, + gaps: row.sufficiency.gaps ?? [], + open_next: row.sufficiency.open_next ?? [], + follow_up_commands: row.sufficiency.follow_up_commands ?? [], + gaps_count: row.sufficiency.gaps_count ?? 0, + open_next_count: row.sufficiency.open_next_count ?? 0, + follow_up_commands_count: row.sufficiency.follow_up_commands_count ?? 0, + covered_claims_count: row.sufficiency.covered_claims_count ?? 0, + avoid_opening_count: row.sufficiency.avoid_opening_count ?? 0, + } + : null, sufficient_quality_mismatch: row.sufficiency?.sufficient_quality_mismatch ?? null, }; }); const failing = rows.filter((row) => row.quality_pass === false); + const partial = rows.filter((row) => row.sufficiency_status === "partial"); const reasonCounts = {}; for (const row of failing) { for (const reason of row.failure_reasons) { reasonCounts[reason] = (reasonCounts[reason] ?? 0) + 1; } } + const partialGapCounts = {}; + for (const row of partial) { + for (const gap of row.sufficiency?.gaps ?? []) { + partialGapCounts[gap] = (partialGapCounts[gap] ?? 0) + 1; + } + } return { generated_at: new Date().toISOString(), scope: "packet_runtime_quality_debug", @@ -3301,7 +4462,9 @@ function buildQualityDebugPayload(results, meta = {}) { quality_scored_runs: rows.filter((row) => row.quality_pass != null).length, quality_pass_runs: rows.filter((row) => row.quality_pass === true).length, quality_fail_runs: failing.length, + packet_partial_runs: partial.length, failure_reason_counts: reasonCounts, + partial_gap_counts: partialGapCounts, }, }; } @@ -3324,6 +4487,9 @@ function packetRuntimePublishableBlockers(results, opts = {}) { reasons.push("packet sufficiency says sufficient but manifest quality failed"); } if (enforcePacketRuntimeTelemetry) { + if (row.packet_extra_probe_strategy) { + reasons.push(`diagnostic packet extra probes used: ${row.packet_extra_probe_strategy}`); + } if (!row.sufficiency) { reasons.push("missing packet sufficiency telemetry"); } else if (row.sufficiency.status !== "sufficient") { @@ -3353,6 +4519,15 @@ function packetRuntimePublishableBlockers(results, opts = {}) { .filter(Boolean); } +function packetRuntimeQualityGateRequired(opts = {}) { + return Boolean(opts.publishable || (opts.taskSuite === "holdout-retrieval" && !opts.allowFailures)); +} + +function formatPacketRuntimeBlocker(blocker) { + const row = blocker.result; + return ` ${row.repo} ${row.task_id} ${row.mode} repeat ${row.repeat}: ${blocker.reasons.join("; ")}`; +} + function groupTasksByRepo(tasks) { const byRepo = new Map(); for (const task of tasks) { @@ -3482,10 +4657,19 @@ async function runPacketRuntimeBenchmark(opts, tasks) { : [opts.packetRuntimeMode]; const results = []; if (modes.includes("cold-cli")) { + const coldJobs = []; for (const task of tasks) { for (let repeat = 1; repeat <= opts.repeats; repeat += 1) { - console.log(`packet-runtime cold-cli ${task.repo} ${task.id} repeat ${repeat}/${opts.repeats}`); - results.push(await runColdPacketRuntime(opts, task, repeat, outDir)); + coldJobs.push({ task, repeat }); + } + } + const coldResults = await parallelMap(coldJobs, opts.jobs, async ({ task, repeat }) => { + console.log(`packet-runtime cold-cli ${task.repo} ${task.id} repeat ${repeat}/${opts.repeats}`); + return await runColdPacketRuntime(opts, task, repeat, outDir); + }); + for (const result of coldResults) { + if (result) { + results.push(result); } } } @@ -3552,10 +4736,19 @@ async function runPacketRuntimeBenchmark(opts, tasks) { const blockers = packetRuntimePublishableBlockers(results, opts); if (opts.publishable && blockers.length) { - console.error("--publishable failed: packet runtime rows must pass, include passing manifest quality gates, and use pinned clean repo provenance."); + console.error( + "--publishable failed: packet runtime rows must pass, include passing manifest quality gates, and use pinned clean repo provenance.", + ); for (const blocker of blockers) { - const row = blocker.result; - console.error(` ${row.repo} ${row.task_id} ${row.mode} repeat ${row.repeat}: ${blocker.reasons.join("; ")}`); + console.error(formatPacketRuntimeBlocker(blocker)); + } + process.exitCode = 1; + } else if (packetRuntimeQualityGateRequired(opts) && blockers.length) { + console.error( + "holdout-retrieval packet-runtime gate failed: every row must pass manifest quality thresholds. Use --allow-failures only for exploratory diagnostics.", + ); + for (const blocker of blockers) { + console.error(formatPacketRuntimeBlocker(blocker)); } process.exitCode = 1; } @@ -3571,6 +4764,241 @@ function median(values) { return sorted.length % 2 ? sorted[middle] : (sorted[middle - 1] + sorted[middle]) / 2; } +function presentFiniteNumber(value) { + if (value == null || value === "") { + return null; + } + const number = Number(value); + return Number.isFinite(number) ? number : null; +} + +function sumFinite(values) { + return values.reduce((sum, value) => { + const number = presentFiniteNumber(value); + return number == null ? sum : sum + number; + }, 0); +} + +function sumPresentFinite(values) { + let seen = false; + let sum = 0; + for (const value of values) { + const number = presentFiniteNumber(value); + if (number == null) { + continue; + } + seen = true; + sum += number; + } + return seen ? sum : null; +} + +function sumCategories(rows, categories, accessor) { + const totals = Object.fromEntries(categories.map((category) => [category, 0])); + for (const row of rows) { + const values = accessor(row) ?? {}; + for (const [category, value] of Object.entries(values)) { + const number = presentFiniteNumber(value); + if (number == null) { + continue; + } + totals[category] = (totals[category] ?? 0) + number; + } + } + return totals; +} + +function resourceAccountingForResult(result) { + const analysis = result.transcript_analysis ?? {}; + const usage = result.usage ?? {}; + const wallMs = presentFiniteNumber(result.wall_ms); + const agentRunnerWallMs = presentFiniteNumber(result.agent_runner_wall_ms); + const baselineHarnessPreludeWallMs = presentFiniteNumber(result.baseline_harness_prelude?.wall_ms); + const codestoryHarnessPreludeWallMs = presentFiniteNumber(result.codestory_harness_prelude?.wall_ms); + const preparationWallMs = cachePreparationWallMs( + result.codestory_cache_provenance?.cache_preparation, + ); + return { + measurement_source: "runner_process_wall_clock_codex_jsonl_and_harness_prelude", + status: result.status ?? null, + wall_ms: wallMs, + agent_runner_wall_ms: agentRunnerWallMs, + baseline_harness_prelude_wall_ms: baselineHarnessPreludeWallMs, + codestory_harness_prelude_wall_ms: codestoryHarnessPreludeWallMs, + codestory_cache_preparation_wall_ms: preparationWallMs, + all_in_wall_ms: wallMs == null ? null : wallMs + (preparationWallMs ?? 0), + usage: { + input_tokens: usage.input_tokens ?? null, + output_tokens: usage.output_tokens ?? null, + total_tokens: usage.total_tokens ?? null, + cached_input_tokens: usage.cached_input_tokens ?? null, + reasoning_tokens: usage.reasoning_tokens ?? null, + }, + estimated_cost_usd: result.estimated_cost_usd ?? null, + tool_calls_observed: presentFiniteNumber(result.tool_calls_observed), + codex_tool_calls_observed: presentFiniteNumber(result.codex_tool_calls_observed), + tool_categories: analysis.tool_categories ?? {}, + command_count: presentFiniteNumber(analysis.command_count), + command_categories: analysis.command_categories ?? {}, + external_context_tool_calls: presentFiniteNumber(analysis.external_context_tool_calls) ?? 0, + direct_source_reads_total: presentFiniteNumber(analysis.direct_source_reads_total), + ordinary_source_reads_after_first_codestory: + presentFiniteNumber(analysis.ordinary_source_reads_after_first_codestory), + ordinary_source_reads_after_first_packet: + presentFiniteNumber(analysis.ordinary_source_reads_after_first_packet), + }; +} + +function summarizeArmCostAccounting(rows) { + const successful = rows.filter((row) => row.status === "pass"); + const wallMs = sumFinite(rows.map((row) => row.wall_ms)); + const agentRunnerWallMs = sumFinite( + rows.map((row) => row.agent_runner_wall_ms ?? row.wall_ms), + ); + const baselineHarnessPreludeWallMs = sumFinite( + rows.map((row) => row.baseline_harness_prelude?.wall_ms), + ); + const codestoryHarnessPreludeWallMs = sumFinite( + rows.map((row) => row.codestory_harness_prelude?.wall_ms), + ); + const preparationWallMs = sumFinite( + rows.map((row) => cachePreparationWallMs(row.codestory_cache_provenance?.cache_preparation)), + ); + return { + runs: rows.length, + successful_runs: successful.length, + failed_runs: rows.filter((row) => row.status === "fail").length, + timeout_runs: rows.filter((row) => row.status === "timeout").length, + missing_token_usage_runs: rows.filter((row) => row.usage?.total_tokens == null).length, + time_spent_ms: { + runner_wall: wallMs, + agent_runner: agentRunnerWallMs, + baseline_harness_prelude: baselineHarnessPreludeWallMs, + codestory_harness_prelude: codestoryHarnessPreludeWallMs, + codestory_cache_preparation: preparationWallMs, + all_in: wallMs + preparationWallMs, + }, + tokens_spent: { + input_tokens: sumPresentFinite(rows.map((row) => row.usage?.input_tokens)), + output_tokens: sumPresentFinite(rows.map((row) => row.usage?.output_tokens)), + total_tokens: sumPresentFinite(rows.map((row) => row.usage?.total_tokens)), + cached_input_tokens: sumPresentFinite(rows.map((row) => row.usage?.cached_input_tokens)), + reasoning_tokens: sumPresentFinite(rows.map((row) => row.usage?.reasoning_tokens)), + }, + estimated_cost_usd: sumPresentFinite(rows.map((row) => row.estimated_cost_usd)), + tool_calls: { + observed: sumFinite(rows.map((row) => row.tool_calls_observed)), + codex_observed: sumFinite(rows.map((row) => row.codex_tool_calls_observed)), + categories: sumCategories( + rows, + TOOL_ACCOUNTING_CATEGORIES, + (row) => row.transcript_analysis?.tool_categories, + ), + }, + commands: { + observed: sumFinite(rows.map((row) => row.transcript_analysis?.command_count)), + categories: sumCategories( + rows, + COMMAND_ACCOUNTING_CATEGORIES, + (row) => row.transcript_analysis?.command_categories, + ), + }, + source_reads: { + direct_source_reads_total: sumFinite( + rows.map((row) => row.transcript_analysis?.direct_source_reads_total), + ), + ordinary_source_reads_after_first_codestory: sumFinite( + rows.map((row) => row.transcript_analysis?.ordinary_source_reads_after_first_codestory), + ), + ordinary_source_reads_after_first_packet: sumFinite( + rows.map((row) => row.transcript_analysis?.ordinary_source_reads_after_first_packet), + ), + }, + external_context_tool_calls: sumFinite( + rows.map((row) => row.transcript_analysis?.external_context_tool_calls), + ), + }; +} + +function accountingComparison(withValue, withoutValue) { + const withNumber = presentFiniteNumber(withValue); + const withoutNumber = presentFiniteNumber(withoutValue); + return { + with_codestory: withNumber, + without_codestory: withoutNumber, + with_minus_without: + withNumber == null || withoutNumber == null ? null : withNumber - withoutNumber, + ratio: + withNumber == null || withoutNumber == null || withoutNumber <= 0 + ? null + : withNumber / withoutNumber, + }; +} + +function summarizeCostAccounting(results) { + const byArm = new Map(); + for (const row of results) { + if (!byArm.has(row.arm)) { + byArm.set(row.arm, []); + } + byArm.get(row.arm).push(row); + } + + const arms = {}; + for (const [arm, rows] of byArm.entries()) { + arms[arm] = summarizeArmCostAccounting(rows); + } + + const withCodeStory = arms.with_codestory ?? null; + const withoutCodeStory = arms.without_codestory ?? null; + const withVsWithout = + withCodeStory && withoutCodeStory + ? { + runner_wall_ms: accountingComparison( + withCodeStory.time_spent_ms.runner_wall, + withoutCodeStory.time_spent_ms.runner_wall, + ), + all_in_wall_ms: accountingComparison( + withCodeStory.time_spent_ms.all_in, + withoutCodeStory.time_spent_ms.all_in, + ), + total_tokens: accountingComparison( + withCodeStory.tokens_spent.total_tokens, + withoutCodeStory.tokens_spent.total_tokens, + ), + input_tokens: accountingComparison( + withCodeStory.tokens_spent.input_tokens, + withoutCodeStory.tokens_spent.input_tokens, + ), + output_tokens: accountingComparison( + withCodeStory.tokens_spent.output_tokens, + withoutCodeStory.tokens_spent.output_tokens, + ), + tool_calls: accountingComparison( + withCodeStory.tool_calls.observed, + withoutCodeStory.tool_calls.observed, + ), + commands: accountingComparison( + withCodeStory.commands.observed, + withoutCodeStory.commands.observed, + ), + estimated_cost_usd: accountingComparison( + withCodeStory.estimated_cost_usd, + withoutCodeStory.estimated_cost_usd, + ), + } + : null; + + return { + measurement_source: "runner_process_wall_clock_codex_jsonl_and_harness_prelude", + note: + "Token values are parsed from Codex JSONL stdout. Tool-call and command totals include harness-run baseline and CodeStory preludes when present. Wall time includes the agent runner plus any harness prelude. CodeStory cache preparation is tracked separately and included in all-in wall time.", + generated_at: new Date().toISOString(), + arms, + with_vs_without: withVsWithout, + }; +} + function summarizeRuns(results) { const groups = new Map(); for (const result of results) { @@ -3587,19 +5015,25 @@ function summarizeRuns(results) { const successful = rows.filter((row) => row.status === "pass"); const qualityRows = successful.filter((row) => row.quality); const packetFirstRows = successful.filter((row) => row.packet_first_required); + const packetManifestRows = successful.filter( + (row) => row.codestory_harness_prelude?.packet_manifest_quality, + ); const categoryMedians = {}; - for (const category of [ - "codestory_cli", - "shell_search", - "direct_file_read", - "git", - "build_test", - "other", - ]) { + for (const category of COMMAND_ACCOUNTING_CATEGORIES) { categoryMedians[category] = median( successful.map((row) => row.transcript_analysis?.command_categories?.[category] ?? 0), ); } + const toolCategoryMedians = {}; + for (const category of TOOL_ACCOUNTING_CATEGORIES) { + toolCategoryMedians[category] = median( + successful.map((row) => row.transcript_analysis?.tool_categories?.[category] ?? 0), + ); + } + const totalCodestoryCachePreparationWallMs = sumFinite( + successful.map((row) => cachePreparationWallMs(row.codestory_cache_provenance?.cache_preparation)), + ); + const totalWallMs = sumFinite(successful.map((row) => row.wall_ms)); summaries.push({ repo, task_id: taskId || null, @@ -3610,14 +5044,48 @@ function summarizeRuns(results) { successful_runs: successful.length, packet_first_pass_runs: packetFirstRows.filter((row) => row.packet_first_pass).length, packet_first_required_runs: packetFirstRows.length, + packet_manifest_quality_pass_runs: packetManifestRows.filter( + (row) => row.codestory_harness_prelude?.packet_manifest_quality?.pass, + ).length, + packet_manifest_quality_scored_runs: packetManifestRows.length, + packet_partial_runs: successful.filter( + (row) => row.codestory_harness_prelude?.packet_sufficiency_status === "partial", + ).length, quality_scored_runs: qualityRows.length, quality_pass_runs: qualityRows.filter((row) => row.quality?.pass).length, + total_wall_ms: totalWallMs, + total_codestory_cache_preparation_wall_ms: totalCodestoryCachePreparationWallMs, + total_wall_ms_including_codestory_preparation: + totalWallMs + totalCodestoryCachePreparationWallMs, + total_input_tokens: sumPresentFinite(successful.map((row) => row.usage?.input_tokens)), + total_output_tokens: sumPresentFinite(successful.map((row) => row.usage?.output_tokens)), + total_tokens: sumPresentFinite(successful.map((row) => row.usage?.total_tokens)), + total_estimated_cost_usd: sumPresentFinite(successful.map((row) => row.estimated_cost_usd)), + total_tool_calls_observed: sumFinite(successful.map((row) => row.tool_calls_observed)), + total_command_count: sumFinite(successful.map((row) => row.transcript_analysis?.command_count)), + total_web_search_tool_calls: sumFinite( + successful.map((row) => row.transcript_analysis?.tool_categories?.web_search ?? 0), + ), + total_direct_source_reads_total: sumFinite( + successful.map((row) => row.transcript_analysis?.direct_source_reads_total), + ), + missing_token_usage_runs: successful.filter((row) => row.usage?.total_tokens == null).length, median_wall_ms: median(successful.map((row) => row.wall_ms)), - median_total_tokens: median(successful.map((row) => row.usage.total_tokens)), - median_input_tokens: median(successful.map((row) => row.usage.input_tokens)), - median_output_tokens: median(successful.map((row) => row.usage.output_tokens)), + median_codestory_cache_preparation_wall_ms: median( + successful.map((row) => cachePreparationWallMs(row.codestory_cache_provenance?.cache_preparation)), + ), + median_codestory_retrieval_index_wall_ms: median( + successful.map((row) => row.codestory_cache_provenance?.cache_preparation?.retrieval_index_wall_ms), + ), + median_total_tokens: median(successful.map((row) => row.usage?.total_tokens)), + median_input_tokens: median(successful.map((row) => row.usage?.input_tokens)), + median_output_tokens: median(successful.map((row) => row.usage?.output_tokens)), median_estimated_cost_usd: median(successful.map((row) => row.estimated_cost_usd)), + median_command_count: median(successful.map((row) => row.transcript_analysis?.command_count)), median_tool_calls_observed: median(successful.map((row) => row.tool_calls_observed)), + median_web_search_tool_calls: median( + successful.map((row) => row.transcript_analysis?.tool_categories?.web_search ?? 0), + ), median_direct_source_reads_total: median( successful.map((row) => row.transcript_analysis?.direct_source_reads_total), ), @@ -3646,11 +5114,37 @@ function summarizeRuns(results) { qualityRows.map((row) => usefulAnchorHitsPer10kContextChars(row)), ), median_command_categories: categoryMedians, + median_tool_categories: toolCategoryMedians, + total_command_categories: sumCategories( + successful, + COMMAND_ACCOUNTING_CATEGORIES, + (row) => row.transcript_analysis?.command_categories, + ), + total_tool_categories: sumCategories( + successful, + TOOL_ACCOUNTING_CATEGORIES, + (row) => row.transcript_analysis?.tool_categories, + ), }); } return summaries; } +function cachePreparationWallMs(preparation) { + if (!preparation) { + return null; + } + if (Number.isFinite(preparation.preparation_wall_ms)) { + return preparation.preparation_wall_ms; + } + const indexMs = Number.isFinite(preparation.index_wall_ms) ? preparation.index_wall_ms : 0; + const retrievalIndexMs = Number.isFinite(preparation.retrieval_index_wall_ms) + ? preparation.retrieval_index_wall_ms + : 0; + const fallback = indexMs + retrievalIndexMs; + return fallback > 0 ? fallback : null; +} + function repositoryContextOutputChars(analysis) { const byCategory = analysis?.output_chars_by_category ?? {}; return ( @@ -3671,7 +5165,7 @@ function usefulAnchorHitsPer10kContextChars(row) { } function agentPublishableBlockers(results, opts = {}) { - const maxSourceReadsAfterPacket = opts.maxSourceReadsAfterPacket ?? 0; + const maxSourceReadsAfterPacket = opts.maxSourceReadsAfterPacket; const enforceRepoProvenance = Boolean(opts.publishable || opts.enforceRepoProvenance); return results .map((result) => { @@ -3679,12 +5173,49 @@ function agentPublishableBlockers(results, opts = {}) { if (result.status !== "pass") { reasons.push(`status=${result.status}`); } + if (presentFiniteNumber(result.wall_ms) == null) { + reasons.push("missing wall time"); + } if (result.usage?.total_tokens == null) { reasons.push("missing total token usage"); } + if (presentFiniteNumber(result.tool_calls_observed) == null) { + reasons.push("missing tool call count"); + } + if (presentFiniteNumber(result.transcript_analysis?.command_count) == null) { + reasons.push("missing command count"); + } + if ( + result.arm === "without_codestory" && + (result.transcript_analysis?.command_categories?.codestory_cli ?? 0) > 0 + ) { + reasons.push("without_codestory arm used CodeStory"); + } + if ( + result.arm === "without_codestory" && + result.task_id && + (presentFiniteNumber(result.transcript_analysis?.command_count) ?? 0) <= 0 + ) { + reasons.push("without_codestory arm did not inspect local repository"); + } if (result.packet_first_required && !result.packet_first_pass) { reasons.push("missing answer packet as first successful context command"); } + if ( + opts.publishable && + result.arm === "with_codestory" && + result.packet_first_required && + maxSourceReadsAfterPacket == null + ) { + reasons.push("missing explicit post-packet source-read budget"); + } + const packetExtraProbeStrategy = + result.codestory_harness_prelude?.packet_extra_probe_strategy ?? + result.packet_extra_probe_strategy ?? + null; + if (opts.publishable && result.arm === "with_codestory" && packetExtraProbeStrategy) { + reasons.push(`diagnostic packet extra probes used: ${packetExtraProbeStrategy}`); + } if (result.task_id && !result.quality) { reasons.push("missing manifest quality score"); } @@ -3694,6 +5225,7 @@ function agentPublishableBlockers(results, opts = {}) { const readsAfterPacket = result.transcript_analysis?.ordinary_source_reads_after_first_packet; if ( result.packet_first_required && + maxSourceReadsAfterPacket != null && readsAfterPacket != null && readsAfterPacket > maxSourceReadsAfterPacket ) { @@ -3702,6 +5234,10 @@ function agentPublishableBlockers(results, opts = {}) { if (enforceRepoProvenance) { reasons.push(...repoProvenanceBlockers(result)); } + const externalContextCalls = result.transcript_analysis?.external_context_tool_calls ?? 0; + if (externalContextCalls > 0) { + reasons.push(`external web/search tool calls=${externalContextCalls} > 0`); + } if (result.arm === "with_codestory" && (opts.publishable || opts.enforceCacheProvenance)) { reasons.push(...cacheProvenanceBlockers(result)); } @@ -3710,7 +5246,7 @@ function agentPublishableBlockers(results, opts = {}) { .filter(Boolean); } -function markdownSummary(summary, opts) { +function markdownSummary(summary, opts, costAccounting = null) { const lines = [ "# CodeStory Agent A/B Benchmark", "", @@ -3719,9 +5255,16 @@ function markdownSummary(summary, opts) { `Sandbox: \`${opts.sandbox}\``, `Host: \`${os.hostname()}\``, "", - "| Repo | Task | Arm | Runs | Success | Packet first | Quality pass | Median wall ms | Median tokens | Median cost USD | Median tool calls | Source reads | After CodeStory | After Packet | File recall | Citation coverage | Context chars | Useful anchors / 10k context chars |", - "| --- | --- | --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |", ]; + if (costAccounting) { + lines.push(...markdownCostAccounting(costAccounting), ""); + } + lines.push( + "## Per-task Summary", + "", + "| Repo | Task | Arm | Runs | Success | Packet first | Packet manifest | Quality pass | Median wall ms | CodeStory prep ms | Retrieval index ms | Median tokens | Median cost USD | Median tool calls | Web searches | Median commands | CodeStory cmds | Shell searches | File-read cmds | Source reads | After CodeStory | After Packet | File recall | Citation coverage | Context chars | Useful anchors / 10k context chars |", + "| --- | --- | --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |", + ); for (const row of summary) { lines.push(markdownSummaryRow(row)); } @@ -3734,6 +5277,38 @@ function markdownSummary(summary, opts) { return lines.join("\n"); } +function markdownCostAccounting(costAccounting) { + const lines = [ + "## Cost Accounting", + "", + "| Arm | Runs | Success | Wall ms | Agent runner ms | Baseline prelude ms | CodeStory prelude ms | All-in wall ms | Input tokens | Output tokens | Total tokens | Tool calls | Codex tool calls | Commands | Web searches | Source reads | Est. cost USD |", + "| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |", + ]; + for (const [arm, row] of Object.entries(costAccounting.arms ?? {})) { + lines.push( + `| ${arm} | ${row.runs} | ${row.successful_runs} | ${formatValue(row.time_spent_ms?.runner_wall)} | ${formatValue(row.time_spent_ms?.agent_runner)} | ${formatValue(row.time_spent_ms?.baseline_harness_prelude)} | ${formatValue(row.time_spent_ms?.codestory_harness_prelude)} | ${formatValue(row.time_spent_ms?.all_in)} | ${formatValue(row.tokens_spent?.input_tokens)} | ${formatValue(row.tokens_spent?.output_tokens)} | ${formatValue(row.tokens_spent?.total_tokens)} | ${formatValue(row.tool_calls?.observed)} | ${formatValue(row.tool_calls?.codex_observed)} | ${formatValue(row.commands?.observed)} | ${formatValue(row.tool_calls?.categories?.web_search)} | ${formatValue(row.source_reads?.direct_source_reads_total)} | ${formatValue(row.estimated_cost_usd)} |`, + ); + } + const comparison = costAccounting.with_vs_without; + if (comparison) { + lines.push( + "", + "| Comparison | With | Without | Delta | Ratio |", + "| --- | ---: | ---: | ---: | ---: |", + ); + for (const [label, values] of Object.entries(comparison)) { + lines.push( + `| ${label} | ${formatValue(values.with_codestory)} | ${formatValue(values.without_codestory)} | ${formatValue(values.with_minus_without)} | ${formatValue(values.ratio)} |`, + ); + } + } + lines.push( + "", + "Accounting source: wall time includes the agent runner and any harness-run baseline or CodeStory prelude; tokens are parsed from Codex JSONL stdout; tool-call and command totals include harness preludes when present; CodeStory cache preparation is tracked separately and included in all-in wall time.", + ); + return lines; +} + function markdownSummaryRow(row) { const cells = [ row.repo, @@ -3742,11 +5317,19 @@ function markdownSummaryRow(row) { row.runs, row.successful_runs, packetFirstLabel(row), + packetManifestLabel(row), qualityPassLabel(row), formatValue(row.median_wall_ms), + formatValue(row.median_codestory_cache_preparation_wall_ms), + formatValue(row.median_codestory_retrieval_index_wall_ms), formatValue(row.median_total_tokens), formatValue(row.median_estimated_cost_usd), formatValue(row.median_tool_calls_observed), + formatValue(row.median_web_search_tool_calls), + formatValue(row.median_command_count), + formatValue(row.median_command_categories?.codestory_cli), + formatValue(row.median_command_categories?.shell_search), + formatValue(row.median_command_categories?.direct_file_read), formatValue(row.median_direct_source_reads_total), formatValue(row.median_source_reads_after_codestory), formatValue(row.median_source_reads_after_packet), @@ -3772,6 +5355,14 @@ function packetFirstLabel(row) { return `${row.packet_first_pass_runs}/${row.packet_first_required_runs}`; } +function packetManifestLabel(row) { + if (!row.packet_manifest_quality_scored_runs) { + return ""; + } + const partialSuffix = row.packet_partial_runs ? `; partial ${row.packet_partial_runs}` : ""; + return `${row.packet_manifest_quality_pass_runs}/${row.packet_manifest_quality_scored_runs}${partialSuffix}`; +} + function formatValue(value) { if (value == null) { return ""; @@ -3870,6 +5461,7 @@ function runSelfTest() { covered_claims: [{ claim: "covered" }], open_next: [], avoid_opening: ["crates/codestory-cli/src/main.rs because already cited"], + avoid_opening_paths: ["crates/codestory-cli/src/main.rs"], gaps: [], follow_up_commands: [], }, @@ -3903,6 +5495,15 @@ function runSelfTest() { }), [null, false, true, "fail"], ); + assert.equal(packetRuntimeQualityGateRequired({ taskSuite: "holdout-retrieval" }), true); + assert.equal( + packetRuntimeQualityGateRequired({ + taskSuite: "holdout-retrieval", + allowFailures: true, + }), + false, + ); + assert.equal(packetRuntimeQualityGateRequired({ taskSuite: "local-real" }), false); assert.equal( cachePreparationAction({ status: "pass", @@ -3922,6 +5523,29 @@ function runSelfTest() { "already-ready", ); + const plannedAgentRuns = planAgentRuns( + { arms: ["without_codestory", "with_codestory"], repeats: 1, repos: null }, + [ + { id: "task-a", repo: "repo-a" }, + { id: "task-b", repo: "repo-b" }, + { id: "task-c", repo: "repo-a" }, + ], + ); + const plannedGroups = groupPlannedAgentRuns(plannedAgentRuns); + assert.deepEqual( + plannedGroups.map((group) => group.key), + ["repo-a", "repo-b"], + ); + assert.deepEqual( + plannedGroups[0].runs.map((run) => `${run.task.id}:${run.arm}`), + [ + "task-a:without_codestory", + "task-a:with_codestory", + "task-c:without_codestory", + "task-c:with_codestory", + ], + ); + console.log("self-test passed"); } @@ -3947,6 +5571,183 @@ function planAgentRuns(opts, tasks) { return plannedRuns; } +function agentRunKey(run) { + const taskId = run.task?.id ?? run.task_id ?? ""; + return [run.repo, taskId, run.arm, String(run.repeat)].join("\t"); +} + +function agentRunIsolationGroupKey(run) { + return run.repo; +} + +function groupPlannedAgentRuns(plannedRuns) { + const groupsByKey = new Map(); + for (const run of plannedRuns) { + const key = agentRunIsolationGroupKey(run); + if (!groupsByKey.has(key)) { + groupsByKey.set(key, { key, runs: [] }); + } + groupsByKey.get(key).runs.push(run); + } + return [...groupsByKey.values()]; +} + +function taskSnapshotMatches(currentTask, candidate) { + const current = taskSnapshotForResult(currentTask); + const previous = candidate?.task_manifest_snapshot ?? null; + return JSON.stringify(current ?? null) === JSON.stringify(previous ?? null); +} + +function resolveRunArtifactPath(runDir, artifactPath) { + if (!artifactPath) { + return null; + } + const artifactText = String(artifactPath).trim(); + if (!artifactText || path.isAbsolute(artifactText)) { + return null; + } + if (!REUSABLE_BASELINE_ARTIFACT_NAME_PATTERN.test(path.basename(artifactText))) { + return null; + } + const resolved = path.resolve(runDir, artifactText); + return isPathInside(runDir, resolved) ? resolved : null; +} + +async function copyResultArtifact(runDir, outDir, artifactPath, nextName) { + const source = resolveRunArtifactPath(runDir, artifactPath); + if (!source) { + return null; + } + if (!existsSync(source)) { + return artifactPath ?? null; + } + const sourceStat = statSync(source); + if (!sourceStat.isFile()) { + return null; + } + if (sourceStat.size > MAX_REUSED_ARTIFACT_BYTES) { + throw new Error( + `Refusing to reuse oversized baseline artifact ${source}: ${sourceStat.size} bytes exceeds ${MAX_REUSED_ARTIFACT_BYTES}`, + ); + } + const destination = path.join(outDir, nextName); + await copyFile(source, destination); + return destination; +} + +async function copyReusableBaselineArtifacts(row, sourceRunDir, outDir, runId) { + const copied = { + ...row, + stdout_path: await copyResultArtifact(sourceRunDir, outDir, row.stdout_path, `${runId}.stdout.jsonl`), + stderr_path: await copyResultArtifact(sourceRunDir, outDir, row.stderr_path, `${runId}.stderr.txt`), + }; + if (copied.baseline_harness_prelude?.context_path) { + copied.baseline_harness_prelude = { + ...copied.baseline_harness_prelude, + context_path: await copyResultArtifact( + sourceRunDir, + outDir, + copied.baseline_harness_prelude.context_path, + `${runId}.baseline-context.json`, + ), + stderr_path: await copyResultArtifact( + sourceRunDir, + outDir, + copied.baseline_harness_prelude.stderr_path, + `${runId}.baseline-context.stderr.txt`, + ), + }; + } + return copied; +} + +async function loadReusableBaselines(opts, plannedRuns, outDir) { + if (!opts.reuseBaselineFrom) { + return new Map(); + } + const sourceRunDir = path.resolve(opts.reuseBaselineFrom); + const runsPath = path.join(sourceRunDir, "runs.jsonl"); + if (!existsSync(runsPath)) { + throw new Error(`--reuse-baseline-from must contain runs.jsonl: ${sourceRunDir}`); + } + const wanted = new Map( + plannedRuns + .filter((run) => run.arm === "without_codestory") + .map((run) => [agentRunKey(run), run]), + ); + if (!wanted.size) { + return new Map(); + } + + const rows = (await readFile(runsPath, "utf8")) + .split(/\r?\n/) + .map((line) => line.trim()) + .filter(Boolean) + .map((line) => JSON.parse(line)); + const taskCache = new Map(); + const reusable = new Map(); + for (const row of rows) { + if (row.arm !== "without_codestory") { + continue; + } + const key = agentRunKey(row); + const planned = wanted.get(key); + if (!planned || !taskSnapshotMatches(planned.task, row)) { + continue; + } + const reanalyzed = await recomputeRunAnalysis(row, opts, sourceRunDir, taskCache); + const runId = benchmarkRunId([ + planned.repo, + ...(planned.task ? [planned.task.id] : []), + planned.arm, + String(planned.repeat).padStart(2, "0"), + ]); + const copied = await copyReusableBaselineArtifacts(reanalyzed, sourceRunDir, outDir, runId); + reusable.set(key, { + ...copied, + reused_from: sourceRunDir, + reused_at: new Date().toISOString(), + resource_accounting: resourceAccountingForResult(copied), + }); + } + return reusable; +} + +async function runPlannedAgentRun(opts, run, outDir, reusableBaselines) { + const reusable = reusableBaselines.get(agentRunKey(run)); + if (reusable) { + console.log(`reusing ${run.repo} ${run.arm} repeat ${run.repeat}/${opts.repeats} from ${opts.reuseBaselineFrom}`); + return reusable; + } + console.log(`running ${run.repo} ${run.arm} repeat ${run.repeat}/${opts.repeats}`); + return await runOne(opts, run, outDir); +} + +async function runPlannedAgentRuns(opts, plannedRuns, reusableBaselines, outDir) { + const runsPath = path.join(outDir, "runs.jsonl"); + if (opts.jobs <= 1 || plannedRuns.length <= 1) { + const results = []; + for (const run of plannedRuns) { + results.push(await runPlannedAgentRun(opts, run, outDir, reusableBaselines)); + await writeJsonlRows(runsPath, results); + } + return results; + } + + const groups = groupPlannedAgentRuns(plannedRuns); + console.log(`running ${plannedRuns.length} planned agent rows across ${groups.length} repo groups with --jobs ${opts.jobs}`); + const groupedResults = await parallelMap(groups, opts.jobs, async (group) => { + const rows = []; + for (const run of group.runs) { + rows.push(await runPlannedAgentRun(opts, run, outDir, reusableBaselines)); + } + return rows; + }); + const results = groupedResults.flat(); + await writeJsonlRows(runsPath, results); + return results; +} + async function main() { const opts = parseArgs(process.argv.slice(2)); if (opts.selfTest) { @@ -3958,12 +5759,13 @@ async function main() { return; } const tasks = await loadTasks(opts); - if (opts.materializeRepos) { - await materializeRepos(tasks, opts); - } if (opts.publishable) { validatePublishableShape(opts, tasks); } + if (opts.materializeRepos) { + assertManifestRepoMaterializationAllowed(tasks, opts); + await materializeRepos(tasks, opts); + } if (opts.list) { if (tasks.length) { for (const task of tasks) { @@ -3999,6 +5801,7 @@ async function main() { const timestamp = new Date().toISOString().replace(/[:.]/g, "-"); const outDir = path.resolve(opts.outDir ?? path.join(repoRoot, "target", "agent-benchmark", timestamp)); await mkdir(outDir, { recursive: true }); + const reusableBaselines = await loadReusableBaselines(opts, plannedRuns, outDir); const cachePreparation = opts.prepareCodestoryCache ? await prepareCodeStoryCaches(opts, tasks) : []; @@ -4011,15 +5814,10 @@ async function main() { ); } - const results = []; - for (const run of plannedRuns) { - console.log(`running ${run.repo} ${run.arm} repeat ${run.repeat}/${opts.repeats}`); - const result = await runOne(opts, run, outDir); - results.push(result); - await writeJsonlRows(path.join(outDir, "runs.jsonl"), results); - } + const results = await runPlannedAgentRuns(opts, plannedRuns, reusableBaselines, outDir); const summary = summarizeRuns(results); + const costAccounting = summarizeCostAccounting(results); const summaryPayload = { generated_at: new Date().toISOString(), runner: opts.runner, @@ -4040,6 +5838,8 @@ async function main() { repeats: opts.repeats, publishable: opts.publishable, max_source_reads_after_packet: opts.maxSourceReadsAfterPacket, + reuse_baseline_from: opts.reuseBaselineFrom, + reused_baseline_runs: results.filter((row) => row.reused_from).length, allow_failures: opts.allowFailures, timeout_ms: opts.timeoutMs, sandbox: opts.sandbox, @@ -4047,9 +5847,10 @@ async function main() { retrieval_env: retrievalEnv(), retrieval_contract: retrievalContractSummary(benchmarkChildEnv(process.env)), summary, + cost_accounting: costAccounting, }; await writeFile(path.join(outDir, "summary.json"), `${JSON.stringify(summaryPayload, null, 2)}\n`, "utf8"); - await writeFile(path.join(outDir, "summary.md"), markdownSummary(summary, opts), "utf8"); + await writeFile(path.join(outDir, "summary.md"), markdownSummary(summary, opts, costAccounting), "utf8"); const failedRuns = results.filter((result) => result.status !== "pass"); let exitCode = 0; @@ -4063,13 +5864,13 @@ async function main() { if (opts.publishable) { const blockers = agentPublishableBlockers(results, opts); - if (blockers.length) { - console.error("--publishable failed: every run must pass, report total token usage, pass manifest quality gates when present, run packet first when required, and stay within the post-packet source-read budget."); - for (const blocker of blockers) { - console.error(formatAgentPublishableBlocker(blocker)); - } - exitCode = 1; + if (blockers.length) { + console.error("--publishable failed: every run must pass, report total token usage, pass manifest quality gates when present, run packet first when required, and stay within the post-packet source-read budget."); + for (const blocker of blockers) { + console.error(formatAgentPublishableBlocker(blocker)); } + exitCode = 1; + } } console.log(`wrote ${outDir}`); @@ -4085,26 +5886,38 @@ export { benchmarkRunId, buildPacketQualityDeltas, buildQualityDebugPayload, + copyResultArtifact, qualityFailureReasons, commandCategory, extractCommandExecutions, isPathInside, + isTrustedPublishableRepoUrl, loadTaskForResult, loadTasks, + manifestRepoMaterializationBlockers, materializeRepos, parseArgs, parseJsonLines, packetComposition, + packetCommandArgs, + packetForAgentPrompt, + packetManifestExtraProbes, + packetManifestQualitySummary, + packetPreludeManifestComplete, packetLatencyTelemetry, packetRuntimePublishableBlockers, + packetRuntimeQualityGateRequired, PACKET_COMPOSITION_WEIGHTS, + MAX_REUSED_ARTIFACT_BYTES, packetCompositionFileScore, packetFirstCommandForPrompt, publicCoreCorpusAudit, repoProvenanceBlockers, + resolveRunArtifactPath, repoConfigFromManifest, resolveCodeStoryCli, scoreQuality, + summarizeCostAccounting, summarizePacketRuntimeRuns, taskSnapshotForResult, }; diff --git a/scripts/codestory-agent-ab-score.mjs b/scripts/codestory-agent-ab-score.mjs new file mode 100644 index 00000000..89d58c52 --- /dev/null +++ b/scripts/codestory-agent-ab-score.mjs @@ -0,0 +1,961 @@ +#!/usr/bin/env node +import { spawn } from "node:child_process"; +import { existsSync, mkdirSync, readFileSync, writeFileSync } from "node:fs"; +import path from "node:path"; +import { fileURLToPath, pathToFileURL } from "node:url"; + +const scriptDir = path.dirname(fileURLToPath(import.meta.url)); +const repoRoot = path.resolve(scriptDir, ".."); +const benchmarkScript = path.join(scriptDir, "codestory-agent-ab-benchmark.mjs"); +const defaultSmokeTaskIds = "python-requests-session-flow,javascript-express-routing-flow"; + +function parseArgs(argv) { + const opts = { + taskSuite: "language-expansion-holdout", + taskIds: defaultSmokeTaskIds, + repeats: 1, + sandbox: "danger-full-access", + repoCacheDir: path.join(repoRoot, "target", "oss-language-corpus", "repos"), + outDir: null, + reanalyzeDir: null, + timeoutMs: 600000, + prepareCodestoryTimeoutMs: 1_800_000, + prepareCodestoryCache: true, + materializeRepos: true, + jobs: 1, + packetGate: false, + allowEmptyPacketGate: false, + packetProbeJobs: 1, + packetProbeRepeats: 1, + packetGateImprovedFrom: null, + reuseBaselineFrom: null, + prepareCodestoryJobs: 1, + }; + for (let i = 0; i < argv.length; i += 1) { + const arg = argv[i]; + if (arg === "--help" || arg === "-h") { + usage(); + process.exit(0); + } + if (arg === "--task-suite") { + opts.taskSuite = argv[++i]; + continue; + } + if (arg === "--task-ids") { + opts.taskIds = argv[++i]; + continue; + } + if (arg === "--repeats") { + opts.repeats = Number.parseInt(argv[++i], 10); + continue; + } + if (arg === "--sandbox") { + opts.sandbox = argv[++i]; + continue; + } + if (arg === "--repo-cache-dir") { + opts.repoCacheDir = path.resolve(argv[++i]); + continue; + } + if (arg === "--out-dir") { + opts.outDir = path.resolve(argv[++i]); + continue; + } + if (arg === "--reanalyze-dir") { + opts.reanalyzeDir = path.resolve(argv[++i]); + continue; + } + if (arg === "--timeout-ms") { + opts.timeoutMs = Number.parseInt(argv[++i], 10); + continue; + } + if (arg === "--prepare-codestory-timeout-ms") { + opts.prepareCodestoryTimeoutMs = Number.parseInt(argv[++i], 10); + continue; + } + if (arg === "--jobs") { + opts.jobs = Number.parseInt(argv[++i], 10); + continue; + } + if (arg === "--packet-gate") { + opts.packetGate = true; + continue; + } + if (arg === "--allow-empty-packet-gate") { + opts.allowEmptyPacketGate = true; + continue; + } + if (arg === "--packet-probe-jobs") { + opts.packetProbeJobs = Number.parseInt(argv[++i], 10); + continue; + } + if (arg === "--packet-probe-repeats") { + opts.packetProbeRepeats = Number.parseInt(argv[++i], 10); + continue; + } + if (arg === "--packet-gate-improved-from") { + opts.packetGateImprovedFrom = path.resolve(argv[++i]); + continue; + } + if (arg === "--reuse-baseline-from") { + opts.reuseBaselineFrom = path.resolve(argv[++i]); + continue; + } + if (arg === "--prepare-codestory-jobs") { + opts.prepareCodestoryJobs = Number.parseInt(argv[++i], 10); + continue; + } + if (arg === "--no-prepare-codestory-cache") { + opts.prepareCodestoryCache = false; + continue; + } + if (arg === "--no-materialize-repos") { + opts.materializeRepos = false; + continue; + } + throw new Error(`Unknown argument: ${arg}`); + } + if (!Number.isInteger(opts.repeats) || opts.repeats < 1) { + throw new Error("--repeats must be a positive integer"); + } + if (!Number.isInteger(opts.timeoutMs) || opts.timeoutMs < 1000) { + throw new Error("--timeout-ms must be at least 1000"); + } + if (!Number.isInteger(opts.prepareCodestoryTimeoutMs) || opts.prepareCodestoryTimeoutMs < 1000) { + throw new Error("--prepare-codestory-timeout-ms must be at least 1000"); + } + if (!Number.isInteger(opts.jobs) || opts.jobs < 1) { + throw new Error("--jobs must be a positive integer"); + } + if (!Number.isInteger(opts.packetProbeJobs) || opts.packetProbeJobs < 1) { + throw new Error("--packet-probe-jobs must be a positive integer"); + } + if (!Number.isInteger(opts.packetProbeRepeats) || opts.packetProbeRepeats < 1) { + throw new Error("--packet-probe-repeats must be a positive integer"); + } + if (!Number.isInteger(opts.prepareCodestoryJobs) || opts.prepareCodestoryJobs < 1) { + throw new Error("--prepare-codestory-jobs must be a positive integer"); + } + if (opts.packetGateImprovedFrom && !opts.packetGate) { + throw new Error("--packet-gate-improved-from requires --packet-gate"); + } + return opts; +} + +function usage() { + console.log(`Usage: + node scripts/codestory-agent-ab-score.mjs [--task-ids ids] [--repeats n] [--out-dir dir] [--prepare-codestory-timeout-ms ms] + [--jobs n] [--prepare-codestory-jobs n] [--packet-gate] [--allow-empty-packet-gate] [--packet-probe-jobs n] + [--packet-gate-improved-from dir] [--reuse-baseline-from dir] + node scripts/codestory-agent-ab-score.mjs --reanalyze-dir target/agent-benchmark/ + +Runs the real CodeStory agent A/B harness, reanalyzes it with the current +transcript analyzer, and emits METRIC lines for Codex Autoresearch. +Packet-gate mode automatically retries transient sidecar-unavailable packet +probe rows once, serially, before selecting nested A/B tasks. It exits non-zero +when no tasks are selected unless --allow-empty-packet-gate is present for an +exploratory diagnostic run. + +Default smoke task ids: ${defaultSmokeTaskIds}`); +} + +function timestampId() { + return new Date().toISOString().replace(/[:.]/g, "-"); +} + +async function runProcess(command, args, options = {}) { + return await new Promise((resolve) => { + const child = spawn(command, args, { + cwd: options.cwd ?? repoRoot, + env: options.env ?? process.env, + shell: false, + stdio: ["ignore", "pipe", "pipe"], + windowsHide: true, + }); + let stdout = ""; + let stderr = ""; + child.stdout.on("data", (chunk) => { + const text = chunk.toString(); + stdout += text; + if (options.streamOutput) { + process.stdout.write(text); + } + }); + child.stderr.on("data", (chunk) => { + const text = chunk.toString(); + stderr += text; + if (options.streamOutput) { + process.stderr.write(text); + } + }); + child.on("error", (error) => { + resolve({ status: "error", exitCode: null, stdout, stderr, error }); + }); + child.on("close", (exitCode, signal) => { + resolve({ + status: exitCode === 0 ? "pass" : "fail", + exitCode, + signal, + stdout, + stderr, + error: null, + }); + }); + }); +} + +function artifactNamePart(value) { + const normalized = String(value ?? "") + .trim() + .replace(/[^A-Za-z0-9_.-]+/g, "-") + .replace(/^[.-]+|[.-]+$/g, ""); + if (!normalized || normalized === "." || normalized === "..") { + return "unknown"; + } + return normalized; +} + +function benchmarkArtifactStem(parts) { + return parts.map(artifactNamePart).join("-"); +} + +async function runBenchmark(opts, outDir) { + const args = [ + benchmarkScript, + "--task-suite", + opts.taskSuite, + "--task-ids", + opts.taskIds, + "--arms", + "without_codestory,with_codestory", + "--repeats", + String(opts.repeats), + "--repo-cache-dir", + opts.repoCacheDir, + "--sandbox", + opts.sandbox, + "--allow-failures", + "--out-dir", + outDir, + "--timeout-ms", + String(opts.timeoutMs), + "--prepare-codestory-timeout-ms", + String(opts.prepareCodestoryTimeoutMs), + "--jobs", + String(opts.jobs), + "--prepare-codestory-jobs", + String(opts.prepareCodestoryJobs), + ]; + if (opts.materializeRepos) { + args.push("--materialize-repos"); + } + if (opts.prepareCodestoryCache) { + args.push("--prepare-codestory-cache"); + } + if (opts.reuseBaselineFrom) { + args.push("--reuse-baseline-from", opts.reuseBaselineFrom); + } + + const result = await runProcess(process.execPath, args, { streamOutput: true }); + if (result.status !== "pass") { + process.stderr.write(result.stderr || result.stdout); + throw new Error(`A/B benchmark command failed with exit ${result.exitCode ?? result.status}`); + } +} + +async function reanalyze(outDir) { + const result = await runProcess(process.execPath, [ + benchmarkScript, + "--reanalyze-dir", + outDir, + ]); + if (result.status !== "pass") { + process.stderr.write(result.stderr || result.stdout); + throw new Error(`A/B reanalysis command failed with exit ${result.exitCode ?? result.status}`); + } +} + +async function runPacketProbeBenchmark(opts, gateDir, taskIds, jobs, prepareJobs) { + const args = [ + benchmarkScript, + "--packet-runtime", + "--packet-runtime-mode", + "cold-cli", + "--task-suite", + opts.taskSuite, + "--task-ids", + taskIds, + "--repeats", + String(opts.packetProbeRepeats), + "--repo-cache-dir", + opts.repoCacheDir, + "--out-dir", + gateDir, + "--timeout-ms", + String(opts.timeoutMs), + "--prepare-codestory-timeout-ms", + String(opts.prepareCodestoryTimeoutMs), + "--jobs", + String(jobs), + "--prepare-codestory-jobs", + String(prepareJobs), + "--allow-failures", + ]; + if (opts.materializeRepos) { + args.push("--materialize-repos"); + } + if (opts.prepareCodestoryCache) { + args.push("--prepare-codestory-cache"); + } + + const result = await runProcess(process.execPath, args, { streamOutput: true }); + if (result.status !== "pass") { + process.stderr.write(result.stderr || result.stdout); + throw new Error(`packet gate command failed with exit ${result.exitCode ?? result.status}`); + } +} + +async function runPacketGate(opts, outDir) { + const gateDir = path.join(outDir, "packet-probes"); + mkdirSync(gateDir, { recursive: true }); + await runPacketProbeBenchmark(opts, gateDir, opts.taskIds, opts.packetProbeJobs, opts.prepareCodestoryJobs); + + const qualityDebugPath = path.join(gateDir, "quality-debug.json"); + const qualityDebug = readJsonFileIfPresent(qualityDebugPath); + const retryableTaskIds = retryablePacketGateTaskIds(qualityDebug?.rows ?? [], gateDir); + let selectedQualityDebugPath = qualityDebugPath; + let selectedRows = qualityDebug?.rows ?? []; + let retryDir = null; + let retryQualityDebugPath = null; + if (retryableTaskIds.length) { + retryDir = path.join(outDir, "packet-probes-retry"); + mkdirSync(retryDir, { recursive: true }); + console.log(`packet gate retrying transient sidecar failures: ${retryableTaskIds.join(",")}`); + await runPacketProbeBenchmark(opts, retryDir, retryableTaskIds.join(","), 1, 1); + retryQualityDebugPath = path.join(retryDir, "quality-debug.json"); + const retryQualityDebug = readJsonFileIfPresent(retryQualityDebugPath); + selectedRows = mergePacketGateRows(selectedRows, retryQualityDebug?.rows ?? [], retryableTaskIds); + selectedQualityDebugPath = path.join(gateDir, "quality-debug-merged.json"); + writeFileSync( + selectedQualityDebugPath, + `${JSON.stringify( + { + ...(qualityDebug ?? {}), + scope: "packet_runtime_quality_debug_with_retry", + retry: { + retry_dir: retryDir, + retry_quality_debug: retryQualityDebugPath, + retried_task_ids: retryableTaskIds, + }, + rows: selectedRows, + }, + null, + 2, + )}\n`, + "utf8", + ); + } + const byTask = rowsByTask(selectedRows); + const baseline = opts.packetGateImprovedFrom + ? loadPacketGateBaselineRows(opts.packetGateImprovedFrom) + : null; + const baselineByTask = baseline ? rowsByTask(baseline.rows) : null; + const selected = []; + const improved = []; + const unchangedOrMissing = []; + for (const [taskId, rows] of byTask) { + if (!packetGateTaskPasses(rows)) { + continue; + } + if (baselineByTask) { + const improvement = packetGateImprovement(rows, baselineByTask.get(taskId) ?? []); + if (!improvement.improved) { + unchangedOrMissing.push({ taskId, reason: improvement.reason }); + continue; + } + improved.push({ taskId, reason: improvement.reason }); + } + if (rows.length) { + selected.push(taskId); + } + } + selected.sort(); + improved.sort((a, b) => a.taskId.localeCompare(b.taskId)); + unchangedOrMissing.sort((a, b) => a.taskId.localeCompare(b.taskId)); + + console.log(`METRIC packet_gate_scored_tasks=${byTask.size}`); + console.log(`METRIC packet_gate_selected_tasks=${selected.length}`); + console.log(`METRIC packet_gate_retry_tasks=${retryableTaskIds.length}`); + if (baselineByTask) { + console.log(`METRIC packet_gate_baseline_tasks=${baselineByTask.size}`); + console.log(`METRIC packet_gate_improved_tasks=${improved.length}`); + } + console.log(`ARTIFACT packet_gate_dir=${path.relative(repoRoot, gateDir)}`); + if (existsSync(qualityDebugPath)) { + console.log(`ARTIFACT packet_gate_quality_debug=${path.relative(repoRoot, qualityDebugPath)}`); + } + if (retryDir) { + console.log(`ARTIFACT packet_gate_retry_dir=${path.relative(repoRoot, retryDir)}`); + } + if (retryQualityDebugPath && existsSync(retryQualityDebugPath)) { + console.log(`ARTIFACT packet_gate_retry_quality_debug=${path.relative(repoRoot, retryQualityDebugPath)}`); + } + if (selectedQualityDebugPath !== qualityDebugPath && existsSync(selectedQualityDebugPath)) { + console.log(`ARTIFACT packet_gate_quality_debug_merged=${path.relative(repoRoot, selectedQualityDebugPath)}`); + } + if (baseline?.path) { + console.log(`ARTIFACT packet_gate_improvement_baseline=${path.relative(repoRoot, baseline.path)}`); + } + if (!selected.length) { + console.log("packet gate selected no tasks; skipping nested A/B run"); + if (unchangedOrMissing.length) { + console.log( + `packet gate skipped unchanged tasks: ${unchangedOrMissing.map((row) => `${row.taskId}:${row.reason}`).join(",")}`, + ); + } + return packetGateSelectionOrThrow(selected, unchangedOrMissing, opts); + } + if (improved.length) { + console.log(`packet gate improved tasks: ${improved.map((row) => `${row.taskId}:${row.reason}`).join(",")}`); + } + console.log(`packet gate selected tasks: ${selected.join(",")}`); + return selected; +} + +function packetGateSelectionOrThrow(selected, unchangedOrMissing = [], opts = {}) { + if (selected.length) { + return selected; + } + if (opts.allowEmptyPacketGate) { + return null; + } + const skipped = unchangedOrMissing.length + ? ` Skipped tasks: ${unchangedOrMissing.map((row) => `${row.taskId}:${row.reason}`).join(",")}.` + : ""; + throw new Error( + `packet gate selected no nested A/B tasks; pass --allow-empty-packet-gate only for exploratory diagnostics.${skipped}`, + ); +} + +function readJsonl(filePath) { + return readFileSync(filePath, "utf8") + .split(/\r?\n/) + .map((line) => line.trim()) + .filter(Boolean) + .map((line) => JSON.parse(line)); +} + +function readJsonFileIfPresent(filePath) { + if (!existsSync(filePath)) { + return null; + } + return JSON.parse(readFileSync(filePath, "utf8")); +} + +function resolvePacketGateBaselinePath(sourcePath) { + if (!sourcePath) { + return null; + } + const candidates = []; + if (sourcePath.toLowerCase().endsWith(".json") || sourcePath.toLowerCase().endsWith(".jsonl")) { + candidates.push(sourcePath); + } + candidates.push( + path.join(sourcePath, "quality-debug.json"), + path.join(sourcePath, "packet-probes", "quality-debug.json"), + path.join(sourcePath, "reanalyzed-runs.jsonl"), + ); + return candidates.find((candidate) => existsSync(candidate)) ?? null; +} + +function packetQualityDebugRowsFromAbRows(filePath) { + return readJsonl(filePath) + .filter((row) => row.arm === "with_codestory" && row.codestory_harness_prelude?.packet_manifest_quality) + .map((row) => { + const quality = row.codestory_harness_prelude.packet_manifest_quality; + return { + repo: row.repo, + task_id: row.task_id, + mode: "with_codestory_packet_prelude", + repeat: row.repeat ?? null, + status: row.codestory_harness_prelude.status ?? row.status ?? null, + quality_pass: quality.pass === true, + failure_reasons: quality.failure_reasons ?? [], + quality_metrics: { + expected_file_recall: quality.expected_file_recall, + expected_symbol_recall: quality.expected_symbol_recall, + expected_claim_recall: quality.expected_claim_recall, + citation_coverage: quality.citation_coverage, + expected_anchor_recall: quality.expected_anchor_recall, + forbidden_claims_found: quality.forbidden_claims_found, + }, + missed_anchors: quality.missed_anchors ?? {}, + }; + }); +} + +function loadPacketGateBaselineRows(sourcePath) { + const resolved = resolvePacketGateBaselinePath(sourcePath); + if (!resolved) { + throw new Error(`--packet-gate-improved-from did not contain packet quality evidence: ${sourcePath}`); + } + if (resolved.endsWith(".jsonl")) { + return { path: resolved, rows: packetQualityDebugRowsFromAbRows(resolved) }; + } + const payload = readJsonFileIfPresent(resolved); + return { path: resolved, rows: Array.isArray(payload?.rows) ? payload.rows : [] }; +} + +const transientSidecarFailurePatterns = [ + /\bretrieval_unavailable\b/i, + /\bqdrant_unreachable\b/i, + /\bzoekt_unreachable\b/i, + /\bscip_unreachable\b/i, + /sidecar retrieval .* unavailable/i, + /sidecar retrieval .* failed/i, + /retrieval sidecar is mandatory/i, + /project is not in full mode/i, +]; + +function packetGateStderrPath(gateDir, row) { + const mode = String(row?.mode ?? "cold_cli_packet").replaceAll("_", "-"); + const repeat = String(row?.repeat ?? 1).padStart(2, "0"); + const stem = benchmarkArtifactStem([row?.repo, row?.task_id, mode, repeat]); + return path.join(gateDir, `${stem}.stderr.txt`); +} + +function packetGateRowHasTransientSidecarFailure(row, gateDir) { + if (row?.status === "pass") { + return false; + } + const failureReasons = Array.isArray(row?.failure_reasons) ? row.failure_reasons : []; + if (row?.quality_pass !== null && !failureReasons.includes("missing_quality_score")) { + return false; + } + const stderrPath = packetGateStderrPath(gateDir, row); + if (!existsSync(stderrPath)) { + return false; + } + const stderr = readFileSync(stderrPath, "utf8"); + return transientSidecarFailurePatterns.some((pattern) => pattern.test(stderr)); +} + +function retryablePacketGateTaskIds(rows, gateDir) { + const taskIds = new Set(); + for (const row of rows ?? []) { + if (packetGateRowHasTransientSidecarFailure(row, gateDir) && row?.task_id) { + taskIds.add(row.task_id); + } + } + return [...taskIds].sort(); +} + +function mergePacketGateRows(initialRows, retryRows, retriedTaskIds) { + const retried = new Set(retriedTaskIds); + const merged = (initialRows ?? []).filter((row) => !retried.has(row?.task_id)); + merged.push(...(retryRows ?? []).filter((row) => retried.has(row?.task_id))); + return merged; +} + +function rowsByTask(rows) { + const byTask = new Map(); + for (const row of rows ?? []) { + const taskId = row.task_id; + if (!taskId) { + continue; + } + if (!byTask.has(taskId)) { + byTask.set(taskId, []); + } + byTask.get(taskId).push(row); + } + return byTask; +} + +function packetGateRowPasses(row) { + return row?.status === "pass" && row?.quality_pass === true; +} + +function packetGateTaskPasses(rows) { + return rows.length > 0 && rows.every((row) => packetGateRowPasses(row)); +} + +const packetGateQualityMetricNames = [ + "expected_file_recall", + "expected_symbol_recall", + "expected_claim_recall", + "citation_coverage", + "expected_anchor_recall", +]; + +function averageFinite(values) { + const nums = values.filter((value) => Number.isFinite(value)); + if (!nums.length) { + return null; + } + return nums.reduce((sum, value) => sum + value, 0) / nums.length; +} + +function packetGateMetricAverage(rows, name) { + return averageFinite(rows.map((row) => row.quality_metrics?.[name])); +} + +function missedAnchorCount(row) { + const missed = row?.missed_anchors; + if (!missed || typeof missed !== "object") { + return null; + } + let count = 0; + let sawArray = false; + for (const value of Object.values(missed)) { + if (Array.isArray(value)) { + sawArray = true; + count += value.length; + } + } + return sawArray ? count : null; +} + +function failureReasonCount(row) { + return Array.isArray(row?.failure_reasons) ? row.failure_reasons.length : 0; +} + +function packetGateTaskProfile(rows) { + const metrics = Object.fromEntries( + packetGateQualityMetricNames.map((name) => [name, packetGateMetricAverage(rows, name)]), + ); + return { + rows: rows.length, + passRate: rows.length ? rows.filter((row) => packetGateRowPasses(row)).length / rows.length : 0, + metrics, + missedAnchors: averageFinite(rows.map((row) => missedAnchorCount(row))), + failureReasons: averageFinite(rows.map((row) => failureReasonCount(row))) ?? 0, + }; +} + +function packetGateImprovement(currentRows, baselineRows) { + if (!baselineRows?.length) { + return { improved: false, reason: "missing_baseline_task" }; + } + const current = packetGateTaskProfile(currentRows); + const baseline = packetGateTaskProfile(baselineRows); + const epsilon = 1e-9; + if (current.passRate > baseline.passRate + epsilon) { + return { improved: true, reason: "quality_pass_rate", current, baseline }; + } + for (const name of packetGateQualityMetricNames) { + const currentMetric = current.metrics[name]; + const baselineMetric = baseline.metrics[name]; + if ( + Number.isFinite(currentMetric) && + Number.isFinite(baselineMetric) && + currentMetric > baselineMetric + epsilon + ) { + return { improved: true, reason: name, current, baseline }; + } + } + if ( + Number.isFinite(current.missedAnchors) && + Number.isFinite(baseline.missedAnchors) && + current.missedAnchors < baseline.missedAnchors - epsilon + ) { + return { improved: true, reason: "missed_anchors", current, baseline }; + } + if ( + Number.isFinite(current.failureReasons) && + Number.isFinite(baseline.failureReasons) && + current.failureReasons < baseline.failureReasons - epsilon + ) { + return { improved: true, reason: "failure_reasons", current, baseline }; + } + return { improved: false, reason: "not_improved", current, baseline }; +} + +function median(values) { + const nums = values.filter((value) => Number.isFinite(value)).sort((a, b) => a - b); + if (!nums.length) { + return null; + } + const middle = Math.floor(nums.length / 2); + return nums.length % 2 ? nums[middle] : (nums[middle - 1] + nums[middle]) / 2; +} + +function sumFinite(values) { + return values.reduce((sum, value) => (Number.isFinite(value) ? sum + value : sum), 0); +} + +function sumPresentFinite(values) { + const nums = values.filter((value) => Number.isFinite(value)); + if (!nums.length) { + return null; + } + return nums.reduce((sum, value) => sum + value, 0); +} + +function cachePreparationWallMs(preparation) { + if (!preparation) { + return null; + } + if (Number.isFinite(preparation.preparation_wall_ms)) { + return preparation.preparation_wall_ms; + } + const indexMs = Number.isFinite(preparation.index_wall_ms) ? preparation.index_wall_ms : 0; + const retrievalIndexMs = Number.isFinite(preparation.retrieval_index_wall_ms) + ? preparation.retrieval_index_wall_ms + : 0; + const fallback = indexMs + retrievalIndexMs; + return fallback > 0 ? fallback : null; +} + +function summarizeCachePreparation(outDir) { + const rows = readJsonFileIfPresent(path.join(outDir, "codestory-cache-preparation.json")) ?? []; + return { + rows: Array.isArray(rows) ? rows.length : 0, + preparationWallMs: Array.isArray(rows) + ? sumFinite(rows.map((row) => cachePreparationWallMs(row))) + : null, + indexWallMs: Array.isArray(rows) ? sumFinite(rows.map((row) => row.index_wall_ms)) : null, + retrievalIndexWallMs: Array.isArray(rows) + ? sumFinite(rows.map((row) => row.retrieval_index_wall_ms)) + : null, + }; +} + +function summarizeArm(rows, arm) { + const armRows = rows.filter((row) => row.arm === arm); + const successful = armRows.filter((row) => row.status === "pass"); + return { + rows: armRows.length, + successful: successful.length, + qualityPass: successful.filter((row) => row.quality?.pass).length, + packetFirstPass: successful.filter((row) => row.packet_first_required && row.packet_first_pass).length, + packetFirstRequired: successful.filter((row) => row.packet_first_required).length, + packetManifestQualityPass: successful.filter( + (row) => row.codestory_harness_prelude?.packet_manifest_quality?.pass, + ).length, + packetManifestQualityScored: successful.filter( + (row) => row.codestory_harness_prelude?.packet_manifest_quality, + ).length, + packetPartial: successful.filter( + (row) => row.codestory_harness_prelude?.packet_sufficiency_status === "partial", + ).length, + totalWallMs: sumFinite(successful.map((row) => row.wall_ms)), + totalInputTokens: sumFinite(successful.map((row) => row.usage?.input_tokens)), + totalOutputTokens: sumFinite(successful.map((row) => row.usage?.output_tokens)), + totalTokens: sumFinite(successful.map((row) => row.usage?.total_tokens)), + totalEstimatedCostUsd: sumPresentFinite(successful.map((row) => row.estimated_cost_usd)), + totalToolCalls: sumFinite(successful.map((row) => row.tool_calls_observed)), + totalCommands: sumFinite(successful.map((row) => row.transcript_analysis?.command_count)), + medianWallMs: median(successful.map((row) => row.wall_ms)), + medianInputTokens: median(successful.map((row) => row.usage?.input_tokens)), + medianOutputTokens: median(successful.map((row) => row.usage?.output_tokens)), + medianTokens: median(successful.map((row) => row.usage?.total_tokens)), + medianEstimatedCostUsd: median(successful.map((row) => row.estimated_cost_usd)), + medianToolCalls: median(successful.map((row) => row.tool_calls_observed)), + medianCommands: median(successful.map((row) => row.transcript_analysis?.command_count)), + medianCodeStoryCommands: median( + successful.map((row) => row.transcript_analysis?.command_categories?.codestory_cli ?? 0), + ), + medianShellSearchCommands: median( + successful.map((row) => row.transcript_analysis?.command_categories?.shell_search ?? 0), + ), + medianFileReadCommands: median( + successful.map((row) => row.transcript_analysis?.command_categories?.direct_file_read ?? 0), + ), + medianWebSearches: median(successful.map((row) => row.transcript_analysis?.tool_categories?.web_search ?? 0)), + medianPostPacketReads: median( + successful.map((row) => row.transcript_analysis?.ordinary_source_reads_after_first_packet), + ), + }; +} + +function safeRatio(numerator, denominator, fallback = 999) { + if (!Number.isFinite(numerator) || !Number.isFinite(denominator) || denominator <= 0) { + return fallback; + } + return numerator / denominator; +} + +function score(rows) { + const without = summarizeArm(rows, "without_codestory"); + const withCodeStory = summarizeArm(rows, "with_codestory"); + const tokenRatio = safeRatio(withCodeStory.medianTokens, without.medianTokens); + const wallRatio = safeRatio(withCodeStory.medianWallMs, without.medianWallMs); + const toolRatio = safeRatio(withCodeStory.medianToolCalls, without.medianToolCalls); + const commandRatio = safeRatio(withCodeStory.medianCommands, without.medianCommands); + + const withQualityPenalty = + withCodeStory.qualityPass === withCodeStory.successful && withCodeStory.successful > 0 ? 0 : 1000000; + const packetPenalty = + withCodeStory.packetFirstRequired > 0 && withCodeStory.packetFirstPass === withCodeStory.packetFirstRequired + ? 0 + : 250000; + const postPacketReadPenalty = Math.max(0, withCodeStory.medianPostPacketReads ?? 0) * 100000; + const externalPenalty = + Math.max(0, without.medianWebSearches ?? 0) * 100000 + + Math.max(0, withCodeStory.medianWebSearches ?? 0) * 100000; + + const efficiencyScore = + tokenRatio * 1000 + + wallRatio * 1000 + + toolRatio * 250 + + commandRatio * 250; + const agentAbGap = + efficiencyScore + + withQualityPenalty + + packetPenalty + + postPacketReadPenalty + + externalPenalty; + + return { + agentAbGap, + tokenRatio, + wallRatio, + toolRatio, + commandRatio, + without, + withCodeStory, + penalties: { + withQualityPenalty, + packetPenalty, + postPacketReadPenalty, + externalPenalty, + }, + }; +} + +function printMetric(name, value) { + if (Number.isFinite(value)) { + console.log(`METRIC ${name}=${value}`); + } +} + +function printArtifacts(outDir) { + console.log(`ARTIFACT out_dir=${path.relative(repoRoot, outDir)}`); + for (const name of ["reanalyzed-summary.md", "reanalyzed-runs.jsonl", "summary.md", "runs.jsonl"]) { + const filePath = path.join(outDir, name); + if (existsSync(filePath)) { + console.log(`ARTIFACT ${name.replace(/[^A-Za-z0-9_]+/g, "_")}=${path.relative(repoRoot, filePath)}`); + } + } +} + +async function main() { + const opts = parseArgs(process.argv.slice(2)); + const outDir = opts.reanalyzeDir ?? opts.outDir ?? path.join(repoRoot, "target", "agent-benchmark", "autoresearch-agent-ab", timestampId()); + mkdirSync(outDir, { recursive: true }); + + if (!opts.reanalyzeDir) { + if (opts.packetGate) { + const selectedTaskIds = await runPacketGate(opts, outDir); + if (!selectedTaskIds) { + return; + } + opts.taskIds = selectedTaskIds.join(","); + } + await runBenchmark(opts, outDir); + } + await reanalyze(outDir); + + const rowsPath = path.join(outDir, "reanalyzed-runs.jsonl"); + const rows = readJsonl(rowsPath); + const cachePreparation = summarizeCachePreparation(outDir); + const result = score(rows); + const withTotalWallIncludingPreparation = + result.withCodeStory.totalWallMs + (cachePreparation.preparationWallMs ?? 0); + const allInWallRatio = safeRatio(withTotalWallIncludingPreparation, result.without.totalWallMs); + const totalTokenRatio = safeRatio(result.withCodeStory.totalTokens, result.without.totalTokens); + const totalToolRatio = safeRatio(result.withCodeStory.totalToolCalls, result.without.totalToolCalls); + const totalCommandRatio = safeRatio(result.withCodeStory.totalCommands, result.without.totalCommands); + const agentAbGapAllIn = + totalTokenRatio * 1000 + + allInWallRatio * 1000 + + totalToolRatio * 250 + + totalCommandRatio * 250 + + result.penalties.withQualityPenalty + + result.penalties.packetPenalty + + result.penalties.postPacketReadPenalty + + result.penalties.externalPenalty; + + printMetric("agent_ab_gap", result.agentAbGap); + printMetric("agent_ab_gap_all_in", agentAbGapAllIn); + printMetric("token_ratio", result.tokenRatio); + printMetric("wall_ratio", result.wallRatio); + printMetric("all_in_wall_ratio", allInWallRatio); + printMetric("total_token_ratio", totalTokenRatio); + printMetric("total_tool_ratio", totalToolRatio); + printMetric("total_command_ratio", totalCommandRatio); + printMetric("tool_ratio", result.toolRatio); + printMetric("command_ratio", result.commandRatio); + printMetric("without_quality_passes", result.without.qualityPass); + printMetric("with_quality_passes", result.withCodeStory.qualityPass); + printMetric("quality_pass_delta", result.withCodeStory.qualityPass - result.without.qualityPass); + printMetric("with_packet_first_passes", result.withCodeStory.packetFirstPass); + printMetric("with_packet_manifest_quality_passes", result.withCodeStory.packetManifestQualityPass); + printMetric("with_packet_manifest_quality_scored", result.withCodeStory.packetManifestQualityScored); + printMetric("with_partial_packets", result.withCodeStory.packetPartial); + printMetric("with_post_packet_source_reads", result.withCodeStory.medianPostPacketReads ?? 0); + printMetric("external_web_searches", (result.without.medianWebSearches ?? 0) + (result.withCodeStory.medianWebSearches ?? 0)); + printMetric("with_tokens", result.withCodeStory.medianTokens); + printMetric("without_tokens", result.without.medianTokens); + printMetric("with_total_tokens", result.withCodeStory.totalTokens); + printMetric("without_total_tokens", result.without.totalTokens); + printMetric("with_input_tokens", result.withCodeStory.medianInputTokens); + printMetric("without_input_tokens", result.without.medianInputTokens); + printMetric("with_total_input_tokens", result.withCodeStory.totalInputTokens); + printMetric("without_total_input_tokens", result.without.totalInputTokens); + printMetric("with_output_tokens", result.withCodeStory.medianOutputTokens); + printMetric("without_output_tokens", result.without.medianOutputTokens); + printMetric("with_total_output_tokens", result.withCodeStory.totalOutputTokens); + printMetric("without_total_output_tokens", result.without.totalOutputTokens); + printMetric("with_wall_ms", result.withCodeStory.medianWallMs); + printMetric("without_wall_ms", result.without.medianWallMs); + printMetric("with_total_wall_ms", result.withCodeStory.totalWallMs); + printMetric("without_total_wall_ms", result.without.totalWallMs); + printMetric("codestory_cache_preparation_repos", cachePreparation.rows); + printMetric("codestory_cache_preparation_wall_ms", cachePreparation.preparationWallMs); + printMetric("codestory_cache_index_wall_ms", cachePreparation.indexWallMs); + printMetric("codestory_retrieval_index_wall_ms", cachePreparation.retrievalIndexWallMs); + printMetric("with_total_wall_ms_including_codestory_preparation", withTotalWallIncludingPreparation); + printMetric("with_estimated_cost_usd", result.withCodeStory.medianEstimatedCostUsd); + printMetric("without_estimated_cost_usd", result.without.medianEstimatedCostUsd); + printMetric("with_total_estimated_cost_usd", result.withCodeStory.totalEstimatedCostUsd); + printMetric("without_total_estimated_cost_usd", result.without.totalEstimatedCostUsd); + printMetric("with_tool_calls", result.withCodeStory.medianToolCalls); + printMetric("without_tool_calls", result.without.medianToolCalls); + printMetric("with_total_tool_calls", result.withCodeStory.totalToolCalls); + printMetric("without_total_tool_calls", result.without.totalToolCalls); + printMetric("with_commands", result.withCodeStory.medianCommands); + printMetric("without_commands", result.without.medianCommands); + printMetric("with_total_commands", result.withCodeStory.totalCommands); + printMetric("without_total_commands", result.without.totalCommands); + printMetric("with_codestory_commands", result.withCodeStory.medianCodeStoryCommands); + printMetric("without_codestory_commands", result.without.medianCodeStoryCommands); + printMetric("with_shell_search_commands", result.withCodeStory.medianShellSearchCommands); + printMetric("without_shell_search_commands", result.without.medianShellSearchCommands); + printMetric("with_file_read_commands", result.withCodeStory.medianFileReadCommands); + printMetric("without_file_read_commands", result.without.medianFileReadCommands); + printMetric("with_web_searches", result.withCodeStory.medianWebSearches); + printMetric("without_web_searches", result.without.medianWebSearches); + printArtifacts(outDir); + + console.log( + `A/B score: gap=${result.agentAbGap.toFixed(3)} all_in_gap=${agentAbGapAllIn.toFixed(3)} token_ratio=${result.tokenRatio.toFixed(3)} wall_ratio=${result.wallRatio.toFixed(3)} all_in_wall_ratio=${allInWallRatio.toFixed(3)} with_quality=${result.withCodeStory.qualityPass}/${result.withCodeStory.successful}`, + ); +} + +if (process.argv[1] && import.meta.url === pathToFileURL(process.argv[1]).href) { + main().catch((error) => { + console.error(error instanceof Error ? error.message : error); + process.exit(1); + }); +} + +export { + mergePacketGateRows, + packetGateSelectionOrThrow, + packetGateStderrPath, + packetGateRowHasTransientSidecarFailure, + parseArgs, + retryablePacketGateTaskIds, +}; diff --git a/scripts/codestory-language-holdout-integrity.mjs b/scripts/codestory-language-holdout-integrity.mjs new file mode 100644 index 00000000..d7f36032 --- /dev/null +++ b/scripts/codestory-language-holdout-integrity.mjs @@ -0,0 +1,146 @@ +#!/usr/bin/env node +import { execFileSync } from "node:child_process"; +import fs from "node:fs"; +import path from "node:path"; + +const repoRoot = process.cwd(); +const manifestPath = path.join( + repoRoot, + "benchmarks", + "tasks", + "language-expansion-holdout", + "language-support-ab.task.json", +); +const repoCacheDir = process.env.CODESTORY_AB_REPO_CACHE_DIR + ? path.resolve(repoRoot, process.env.CODESTORY_AB_REPO_CACHE_DIR) + : path.join(repoRoot, "target", "agent-benchmark", "repos"); +const reportPath = process.env.CODESTORY_OSS_CORPUS_REPORT + ? path.resolve(repoRoot, process.env.CODESTORY_OSS_CORPUS_REPORT) + : path.join( + repoRoot, + "target", + "oss-language-corpus", + "reports", + "oss-language-corpus-latest.jsonl", + ); + +function fail(message) { + console.error(`language holdout integrity failed: ${message}`); + process.exit(1); +} + +function readJson(filePath) { + try { + return JSON.parse(fs.readFileSync(filePath, "utf8")); + } catch (error) { + fail(`could not read JSON ${filePath}: ${error.message}`); + } +} + +function gitHead(dir) { + try { + return execFileSync("git", ["-C", dir, "rev-parse", "HEAD"], { + encoding: "utf8", + stdio: ["ignore", "pipe", "pipe"], + }).trim(); + } catch (error) { + fail(`could not read git HEAD in ${dir}: ${error.message}`); + } +} + +function parseReportRows(filePath) { + try { + return fs + .readFileSync(filePath, "utf8") + .split(/\r?\n/) + .map((line) => line.trim()) + .filter(Boolean) + .map((line, index) => { + try { + return JSON.parse(line); + } catch (error) { + fail(`invalid JSONL row ${index + 1} in ${filePath}: ${error.message}`); + } + }); + } catch (error) { + fail(`could not read corpus report ${filePath}: ${error.message}`); + } +} + +const manifest = readJson(manifestPath); +const tasks = Array.isArray(manifest.tasks) ? manifest.tasks : [manifest]; +if (tasks.length !== 18) { + fail(`expected 18 language-expansion tasks, found ${tasks.length}`); +} + +const languages = new Set(); +const repoByCommit = new Map(); +for (const task of tasks) { + const repo = task.repo || {}; + const repoName = String(repo.name || "").trim(); + const ref = String(repo.ref || "").trim(); + const taskLanguages = Array.isArray(repo.languages) ? repo.languages : []; + if (!repoName || !ref || taskLanguages.length === 0) { + fail(`task ${task.id || ""} is missing repo name, ref, or languages`); + } + for (const language of taskLanguages) { + languages.add(language); + } + const checkout = path.join(repoCacheDir, repoName); + if (!fs.existsSync(path.join(checkout, ".git"))) { + fail(`missing materialized repo checkout ${checkout}`); + } + const head = gitHead(checkout); + if (head !== ref) { + fail(`${repoName} HEAD ${head} did not match manifest ref ${ref}`); + } + repoByCommit.set(ref, { repoName, languages: taskLanguages }); +} + +if (languages.size !== 18) { + fail(`expected 18 unique languages, found ${languages.size}`); +} + +const rows = parseReportRows(reportPath); +if (rows.length !== 18) { + fail(`expected 18 OSS corpus report rows, found ${rows.length}`); +} + +let rawFiles = 0; +let indexedFiles = 0; +let nodes = 0; +let edges = 0; +let errors = 0; +let fatalErrors = 0; +for (const row of rows) { + const commit = String(row.commit || ""); + if (!repoByCommit.has(commit)) { + fail(`report row for ${row.repo_name || row.language || ""} uses unexpected commit ${commit}`); + } + if (row.status !== "passed") { + fail(`${row.language || row.repo_name || commit} report status is ${row.status}`); + } + const rawCount = Number(row.raw_without_codestory?.files); + const indexedCount = Number(row.with_codestory?.indexed_files); + const rowErrors = Number(row.with_codestory?.errors); + const rowFatalErrors = Number(row.with_codestory?.fatal_errors); + if (!Number.isFinite(rawCount) || !Number.isFinite(indexedCount)) { + fail(`${row.language || commit} report is missing raw/indexed file counts`); + } + if (rawCount !== indexedCount) { + fail(`${row.language || commit} indexed ${indexedCount} files but raw baseline found ${rawCount}`); + } + if (rowErrors !== 0 || rowFatalErrors !== 0) { + fail(`${row.language || commit} reported errors=${rowErrors} fatal_errors=${rowFatalErrors}`); + } + rawFiles += rawCount; + indexedFiles += indexedCount; + nodes += Number(row.with_codestory?.nodes || 0); + edges += Number(row.with_codestory?.edges || 0); + errors += rowErrors; + fatalErrors += rowFatalErrors; +} + +console.log( + `language holdout integrity ok: tasks=${tasks.length} languages=${languages.size} repos=${repoByCommit.size} raw_files=${rawFiles} indexed_files=${indexedFiles} nodes=${nodes} edges=${edges} errors=${errors} fatal_errors=${fatalErrors}`, +); diff --git a/scripts/codestory-manual-friction-check.mjs b/scripts/codestory-manual-friction-check.mjs index 445895fc..2a647010 100644 --- a/scripts/codestory-manual-friction-check.mjs +++ b/scripts/codestory-manual-friction-check.mjs @@ -214,6 +214,7 @@ function checkPacket(repoName, packetJson) { } const citations = packetJson?.answer?.citations ?? []; const avoidOpening = sufficiency.avoid_opening ?? []; + const avoidOpeningPaths = sufficiency.avoid_opening_paths ?? null; if (!Array.isArray(citations) || citations.length === 0) { addGap(repoName, "packet_missing_citations", 1, "packet answer has no structured citations"); } @@ -226,9 +227,17 @@ function checkPacket(repoName, packetJson) { follow_up_commands: sufficiency.follow_up_commands, }); } - if (!Array.isArray(avoidOpening)) { + if (avoidOpening != null && !Array.isArray(avoidOpening)) { addGap(repoName, "packet_avoid_opening_malformed", 2, "packet sufficiency avoid_opening is not a list"); } + if (!Array.isArray(avoidOpeningPaths)) { + addGap( + repoName, + "packet_avoid_opening_paths_malformed", + 2, + "packet sufficiency avoid_opening_paths is not a raw path list", + ); + } const retrievalTrace = packetJson?.answer?.retrieval_trace; if (!retrievalTrace || typeof retrievalTrace !== "object") { addGap(repoName, "packet_missing_retrieval_trace", 2, "packet answer does not expose retrieval trace telemetry"); diff --git a/scripts/embedding-gpu-fair-benchmark.mjs b/scripts/embedding-gpu-fair-benchmark.mjs index 86430296..a98c7ce9 100644 --- a/scripts/embedding-gpu-fair-benchmark.mjs +++ b/scripts/embedding-gpu-fair-benchmark.mjs @@ -5,12 +5,13 @@ import http from "node:http"; import path from "node:path"; const root = process.env.CODESTORY_EMBED_RESEARCH_ROOT ?? process.env.CODESTORY_FAIR_BENCH_ROOT ?? process.cwd(); +const isWindows = process.platform === "win32"; const bin = process.env.CODESTORY_EMBED_RESEARCH_BIN ?? process.env.CODESTORY_FAIR_BENCH_BIN ?? - path.join(root, "target/release/codestory-cli.exe"); + path.join(root, "target", "release", isWindows ? "codestory-cli.exe" : "codestory-cli"); const llamaDir = process.env.CODESTORY_LLAMA_CPP_DIR ?? path.join(root, "target/llamacpp/b8840"); -const llamaExe = process.env.CODESTORY_LLAMA_CPP_SERVER ?? path.join(llamaDir, "llama-server.exe"); +const llamaExe = process.env.CODESTORY_LLAMA_CPP_SERVER ?? path.join(llamaDir, isWindows ? "llama-server.exe" : "llama-server"); const stamp = new Date().toISOString().replaceAll(/[-:]/g, "").replace(/\..+/, ""); const outDir = process.env.CODESTORY_EMBED_RESEARCH_OUT_DIR ?? diff --git a/scripts/lint-retrieval-generalization.mjs b/scripts/lint-retrieval-generalization.mjs index e7703160..fef78bc9 100644 --- a/scripts/lint-retrieval-generalization.mjs +++ b/scripts/lint-retrieval-generalization.mjs @@ -1,9 +1,10 @@ #!/usr/bin/env node /** * CI guard: ban repo-specific path literals in retrieval integration production code. - * Scope is Rust production retrieval integration files. Benchmark/eval harness scripts - * intentionally live outside this guard because their manifests name holdout - * repos; keep that boundary explicit instead of treating them as product code. + * Scope is Rust production retrieval integration files. Benchmark/eval harness + * scripts and the env-gated eval probe module intentionally live outside this + * guard because their manifests name holdout repos; keep that boundary explicit + * instead of treating them as product code. * Scans Rust files after masking `#[cfg(test)]` items/modules so test fixtures * do not define the production contract. */ @@ -50,6 +51,10 @@ const scanDirs = [ const productionOnlyFiles = requiredProductionOnlyFiles; +const evalOnlyProductionFiles = new Set([ + path.join(repoRoot, "crates", "codestory-runtime", "src", "agent", "eval_probes.rs"), +]); + const benchmarkIdentityScriptFiles = [ path.join(repoRoot, "scripts", "codestory-agent-ab-benchmark.mjs"), path.join(repoRoot, "scripts", "codestory-manual-friction-check.mjs"), @@ -93,6 +98,16 @@ const bannedPatterns = [ "Subcommand::Exec", "ThreadStartParams", "TurnStartParams", + "chinook", + "mdn", + "okio", + "monolog", + "alamofire", + "ChinookDatabase", + "form-validation", + "commonMain/kotlin/okio", + "src/Monolog", + "Source/Core/Session\\.swift", "SocialEntries", "ElsewhereFeed", "src/lib_cxx", @@ -100,12 +115,78 @@ const bannedPatterns = [ "src/lib/data/storage", "getPayloadClient", "comment_submission_guard", + "axios", + "redis", + "ripgrep", + "createInstance", + "InterceptorManager", + "dispatchRequest", + "readQueryFromClient", + "processCommand", + "aeMain", + "aeProcessEvents", + "HiArgs", + "SearchWorker", + "search_parallel", + "adapters\\.js", + "server\\.c", + "ae\\.c", + "networking\\.c", + "core/main\\.rs", + "flags/hiargs\\.rs", + "haystack\\.rs", + "lib/axios\\.js", + "lib/core/Axios\\.js", + "StringUtils", + "commons-lang", + "PreparedRequest", + "HTTPAdapter", + "createApplication", + "app\\.use", + "lib/express\\.js", + "Jekyll", + "LogRecord", + "AbstractProcessingHandler", + "useSWR", + "swr", + "gin\\.go", + "RouterGroup\\.Handle", + "Engine\\.addRoute", + "Engine\\.handleHTTPRequest", + "AutoMapper", + "TypeMapPlanBuilder", + "RealBufferedSource", + "RealBufferedSink", + "DataRequest", + "SessionDelegate", + "novalidate", + "showError", + "source/animate\\.css", ]; const bannedLiteralPatterns = [ "payload_collection", ]; +const bannedCompactPatterns = [ + "swr", + "useswr", + "stringutils", + "charsequenceutils", + "preparedrequest", + "httpadapter", + "createapplication", + "appuse", + "jekyll", + "logrecord", + "automapper", + "realbufferedsource", + "realbufferedsink", + "datarequest", + "sessiondelegate", + "sourceanimatecss", +]; + const allowedPatternLines = [ { pattern: "payload_collection", @@ -553,6 +634,103 @@ function scanProductionStringLiterals(filePath, pattern) { return hits; } +function compactProductionSource(text) { + return rustStringLiteralContent(text) + .replace(/["'`]/g, "") + .replace(/[^a-zA-Z0-9]+/g, "") + .toLowerCase(); +} + +function rustStringLiteralContent(literal) { + const raw = literal.match(/^b?r(#+)?"([\s\S]*)"(#*)$/); + if (raw && (raw[1] ?? "") === raw[3]) { + return raw[2]; + } + if (literal.length >= 2 && ["\"", "'", "`"].includes(literal[0])) { + return literal.slice(1, -1); + } + return literal; +} + +function scanProductionCompactPatterns(filePath, marker) { + const production = productionSource(filePath); + const markerLower = marker.toLowerCase(); + const hits = []; + const literals = rustStringLiteralSpans(production); + for (let start = 0; start < literals.length; start += 1) { + let compact = ""; + for (let end = start; end < literals.length; end += 1) { + if ( + end > start + && !literalJoinGapAllowsCompactScan( + production.slice(literals[end - 1].endOffset, literals[end].startOffset), + ) + ) { + break; + } + compact += compactProductionSource(literals[end].literal); + if (compact === markerLower) { + hits.push( + compactPatternHit(filePath, literals[start].line, literals[end].line, marker), + ); + break; + } + if (compact.length >= markerLower.length) { + break; + } + } + } + return hits; +} + +function rustStringLiteralSpans(text) { + const literals = []; + const lineStarts = [0]; + for (let index = 0; index < text.length; index += 1) { + if (text[index] === "\n") { + lineStarts.push(index + 1); + } + } + + const stringLiteral = /(?:b?r#*"[^"]*"#*|"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')/g; + let match; + while ((match = stringLiteral.exec(text)) != null) { + literals.push({ + literal: match[0], + startOffset: match.index, + endOffset: match.index + match[0].length, + line: lineNumberAtOffset(lineStarts, match.index), + }); + } + return literals; +} + +function lineNumberAtOffset(lineStarts, offset) { + let low = 0; + let high = lineStarts.length - 1; + while (low <= high) { + const mid = Math.floor((low + high) / 2); + if (lineStarts[mid] <= offset) { + low = mid + 1; + } else { + high = mid - 1; + } + } + return high + 1; +} + +function literalJoinGapAllowsCompactScan(gap) { + return /^[\s,]*$/.test(gap); +} + +function compactPatternHit(filePath, startLine, endLine, marker) { + const lineDisplay = startLine === endLine ? startLine : `${startLine}-${endLine}`; + return ( + `${filePath}:${lineDisplay}: ` + + `compact production source contains split benchmark marker ${marker}` + ); +} + function rustStringLiteralsOnLine(line) { const literals = []; const stringLiteral = /(?:b?r#*"[^"]*"#*|"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')/g; @@ -569,6 +747,10 @@ function lineAllowedForPattern(pattern, line) { ); } +function isEvalOnlyProductionFile(filePath) { + return evalOnlyProductionFiles.has(path.resolve(filePath)); +} + function scanRankerFilenameLiterals(filePath) { const lines = productionSource(filePath).split(/\r?\n/); const hits = []; @@ -595,22 +777,33 @@ if (scanFiles.size === 0) { } for (const filePath of [...scanFiles].sort()) { - for (const pattern of bannedPatterns) { - const hits = scanProductionFile(filePath, pattern); - if (hits.length > 0) { - console.error( - `Banned pattern /${pattern}/ in ${path.relative(repoRoot, filePath)} (production slice):\n${hits.join("\n")}\n`, - ); - failed = true; + if (!isEvalOnlyProductionFile(filePath)) { + for (const pattern of bannedPatterns) { + const hits = scanProductionFile(filePath, pattern); + if (hits.length > 0) { + console.error( + `Banned pattern /${pattern}/ in ${path.relative(repoRoot, filePath)} (production slice):\n${hits.join("\n")}\n`, + ); + failed = true; + } } - } - for (const pattern of bannedLiteralPatterns) { - const hits = scanProductionStringLiterals(filePath, pattern); - if (hits.length > 0) { - console.error( - `Banned literal pattern /${pattern}/ in ${path.relative(repoRoot, filePath)} (production slice):\n${hits.join("\n")}\n`, - ); - failed = true; + for (const pattern of bannedLiteralPatterns) { + const hits = scanProductionStringLiterals(filePath, pattern); + if (hits.length > 0) { + console.error( + `Banned literal pattern /${pattern}/ in ${path.relative(repoRoot, filePath)} (production slice):\n${hits.join("\n")}\n`, + ); + failed = true; + } + } + for (const pattern of bannedCompactPatterns) { + const hits = scanProductionCompactPatterns(filePath, pattern); + if (hits.length > 0) { + console.error( + `Banned compact benchmark marker /${pattern}/ in ${path.relative(repoRoot, filePath)} (production slice):\n${hits.join("\n")}\n`, + ); + failed = true; + } } } if (filePath.endsWith(`${path.sep}ranker.rs`)) { diff --git a/scripts/setup-retrieval-env.mjs b/scripts/setup-retrieval-env.mjs index be33b858..a679c2a1 100644 --- a/scripts/setup-retrieval-env.mjs +++ b/scripts/setup-retrieval-env.mjs @@ -9,6 +9,7 @@ * SCIP language indexers are documented only — not installed by this script. */ import { spawnSync } from "node:child_process"; +import { createHash } from "node:crypto"; import fs from "node:fs"; import os from "node:os"; import path from "node:path"; @@ -205,8 +206,12 @@ function printPrereqReport(opts) { } const BGE_GGUF = "bge-base-en-v1.5.Q8_0.gguf"; -const BGE_URL = - "https://huggingface.co/BAAI/bge-base-en-v1.5-GGUF/resolve/main/bge-base-en-v1.5.Q8_0.gguf"; +const BGE_GGUF_SHA256 = "ad1afe72cd6654a558667a3db10878b049a75bfd72912e1dabb91310d671173c"; +const BGE_GGUF_BYTES = 117_974_304; +const BGE_URLS = [ + "https://huggingface.co/BAAI/bge-base-en-v1.5-GGUF/resolve/main/bge-base-en-v1.5.Q8_0.gguf", + "https://huggingface.co/CompendiumLabs/bge-base-en-v1.5-gguf/resolve/main/bge-base-en-v1.5-q8_0.gguf", +]; function embedModelDir() { if (process.env.CODESTORY_EMBED_MODEL_DIR) { @@ -215,23 +220,73 @@ function embedModelDir() { return path.join(repoRoot, "target", "retrieval-models"); } +function sha256File(file) { + return new Promise((resolve, reject) => { + const hash = createHash("sha256"); + const stream = fs.createReadStream(file); + stream.on("data", (chunk) => hash.update(chunk)); + stream.on("error", reject); + stream.on("end", () => resolve(hash.digest("hex"))); + }); +} + +async function verifyEmbedModel(file) { + const stat = fs.statSync(file); + if (!stat.isFile()) { + throw new Error(`Embed model path is not a file: ${file}`); + } + if (stat.size !== BGE_GGUF_BYTES) { + throw new Error( + `Embed model size mismatch for ${file}: got ${stat.size} bytes, expected ${BGE_GGUF_BYTES}`, + ); + } + const actual = await sha256File(file); + if (actual !== BGE_GGUF_SHA256) { + throw new Error( + `Embed model SHA-256 mismatch for ${file}: got ${actual}, expected ${BGE_GGUF_SHA256}`, + ); + } + return actual; +} + async function fetchEmbedModel() { const dir = embedModelDir(); fs.mkdirSync(dir, { recursive: true }); const dest = path.join(dir, BGE_GGUF); if (fs.existsSync(dest) && fs.statSync(dest).size > 1_000_000) { - console.log(`Embed model already present: ${dest}`); + let checksum; + try { + checksum = await verifyEmbedModel(dest); + } catch (error) { + throw new Error( + `${error instanceof Error ? error.message : error}. Remove ${dest} and rerun --fetch-embed-model.`, + ); + } + console.log(`Embed model already present and verified: ${dest} sha256=${checksum}`); return dest; } - console.log(`Downloading ${BGE_GGUF} to ${dest} ...`); - const response = await fetch(BGE_URL); - if (!response.ok) { - throw new Error(`Failed to download embed model: HTTP ${response.status}`); + let lastError = null; + for (const url of BGE_URLS) { + console.log(`Downloading ${BGE_GGUF} from ${url} to ${dest} ...`); + const response = await fetch(url); + if (!response.ok) { + lastError = `HTTP ${response.status} from ${url}`; + continue; + } + const buffer = Buffer.from(await response.arrayBuffer()); + const tempDest = `${dest}.tmp-${process.pid}`; + try { + fs.writeFileSync(tempDest, buffer); + const checksum = await verifyEmbedModel(tempDest); + fs.renameSync(tempDest, dest); + console.log(`Wrote ${dest} (${buffer.length} bytes, sha256=${checksum})`); + return dest; + } catch (error) { + fs.rmSync(tempDest, { force: true }); + lastError = `${error instanceof Error ? error.message : error} from ${url}`; + } } - const buffer = Buffer.from(await response.arrayBuffer()); - fs.writeFileSync(dest, buffer); - console.log(`Wrote ${dest} (${buffer.length} bytes)`); - return dest; + throw new Error(`Failed to download embed model: ${lastError ?? "no URLs configured"}`); } async function main() { diff --git a/scripts/tests/codestory-agent-ab-analyzer.test.mjs b/scripts/tests/codestory-agent-ab-analyzer.test.mjs index 775eb780..e6162fbb 100644 --- a/scripts/tests/codestory-agent-ab-analyzer.test.mjs +++ b/scripts/tests/codestory-agent-ab-analyzer.test.mjs @@ -1,6 +1,6 @@ import test from "node:test"; import assert from "node:assert/strict"; -import { mkdtemp, rm, writeFile } from "node:fs/promises"; +import { mkdir, mkdtemp, rm, truncate, writeFile } from "node:fs/promises"; import os from "node:os"; import path from "node:path"; @@ -10,24 +10,42 @@ import { assertSafeWindowsCmdArgs, benchmarkRunId, commandCategory, + copyResultArtifact, + isTrustedPublishableRepoUrl, isPathInside, loadTaskForResult, loadTasks, + manifestRepoMaterializationBlockers, + MAX_REUSED_ARTIFACT_BYTES, parseArgs as parseBenchmarkArgs, parseJsonLines, packetComposition, + packetCommandArgs, + packetForAgentPrompt, + packetManifestExtraProbes, + packetManifestQualitySummary, + packetPreludeManifestComplete, packetLatencyTelemetry, packetFirstCommandForPrompt, packetRuntimePublishableBlockers, + packetRuntimeQualityGateRequired, publicCoreCorpusAudit, repoProvenanceBlockers, + resolveRunArtifactPath, resolveCodeStoryCli, scoreQuality, + summarizeCostAccounting, summarizePacketRuntimeRuns, buildQualityDebugPayload, qualityFailureReasons, taskSnapshotForResult, } from "../codestory-agent-ab-benchmark.mjs"; +import { + packetGateSelectionOrThrow, + packetGateStderrPath, + parseArgs as parseScoreArgs, + retryablePacketGateTaskIds, +} from "../codestory-agent-ab-score.mjs"; const RUNTIME_SERVICE_FILE = "crates/codestory-runtime/src/services.rs"; const RUN_INDEX_SYMBOL = "IndexService::run_indexing_blocking"; @@ -41,11 +59,14 @@ test("parses packet-runtime benchmark run id", () => { "local-real", "--benchmark-run-id", "segment 43/v2", + "--prepare-codestory-jobs", + "2", ]); assert.equal(opts.packetRuntime, true); assert.equal(opts.benchmarkRunId, "segment-43-v2"); assert.equal(opts.prepareCodestoryCache, true); + assert.equal(opts.prepareCodestoryJobs, 2); assert.throws( () => parseBenchmarkArgs([ @@ -56,6 +77,17 @@ test("parses packet-runtime benchmark run id", () => { ]), /sidecar preparation is mandatory/, ); + assert.throws( + () => + parseBenchmarkArgs([ + "--packet-runtime", + "--task-suite", + "local-real", + "--prepare-codestory-jobs", + "0", + ]), + /--prepare-codestory-jobs must be a positive integer/, + ); }); test("packet latency telemetry preserves retrieval shadow cache diagnostics", () => { @@ -199,7 +231,15 @@ async function withManifestFile(manifest, callback) { test("categorizes commands without treating source paths as cli invocations", () => { assert.equal(commandCategory("& $env:CODESTORY_CLI packet --project . --question flow"), "codestory_cli"); + assert.equal(commandCategory('"${CODESTORY_CLI:-codestory-cli}" packet --project . --question flow'), "codestory_cli"); + assert.equal(commandCategory('"$CODESTORY_CLI" index --project . --refresh full'), "codestory_cli"); assert.equal(commandCategory('& "C:\\tools\\codestory-cli.exe" packet --project . --question flow'), "codestory_cli"); + assert.equal( + commandCategory( + String.raw`"C:\Program Files\PowerShell\pwsh.exe" -Command '& $(if ($env:CODESTORY_CLI) { $env:CODESTORY_CLI } else { 'codestory-cli' }) packet --project . --question 'Trace flow' --task-class 'route-tracing' --budget compact --format json"`, + ), + "codestory_cli", + ); assert.equal( commandCategory( '"C:\\Program Files\\PowerShell\\pwsh.exe" -Command "& \\"C:\\tools\\codestory-cli.exe\\" packet --project . --question flow"', @@ -225,6 +265,65 @@ test("categorizes commands without treating source paths as cli invocations", () assert.equal(commandCategory("cargo test -p codestory-cli --test onboarding_contracts"), "build_test"); }); +test("packet gate retries only transient sidecar packet failures", async () => { + const dir = await mkdtemp(path.join(os.tmpdir(), "codestory-packet-gate-retry-")); + try { + const retryable = { + repo: "dart-lang-http", + task_id: "dart-http-client-flow", + mode: "cold_cli_packet", + repeat: 1, + status: "fail", + quality_pass: null, + failure_reasons: ["missing_quality_score"], + }; + const qualityFailure = { + repo: "fixture", + task_id: "quality-failure", + mode: "cold_cli_packet", + repeat: 1, + status: "fail", + quality_pass: false, + failure_reasons: ["expected_claim_recall_low"], + }; + await writeFile( + packetGateStderrPath(dir, retryable), + "Error: retrieval_unavailable: project is not in full mode (mode=no_semantic, reason=qdrant_unreachable)\n", + "utf8", + ); + await writeFile(packetGateStderrPath(dir, qualityFailure), "manifest quality failed\n", "utf8"); + + assert.deepEqual(retryablePacketGateTaskIds([retryable, qualityFailure], dir), [ + "dart-http-client-flow", + ]); + } finally { + await rm(dir, { recursive: true, force: true }); + } +}); + +test("packet gate empty selection is explicit exploratory behavior", () => { + assert.throws( + () => + packetGateSelectionOrThrow( + [], + [ + { + taskId: "python-requests-session-flow", + reason: "not_improved", + }, + ], + {}, + ), + /allow-empty-packet-gate/, + ); + + assert.equal(packetGateSelectionOrThrow([], [], { allowEmptyPacketGate: true }), null); + assert.deepEqual(packetGateSelectionOrThrow(["python-requests-session-flow"], [], {}), [ + "python-requests-session-flow", + ]); + assert.equal(parseScoreArgs(["--packet-gate", "--allow-empty-packet-gate"]).allowEmptyPacketGate, true); +}); + test("rejects manifest repo and workspace paths outside the cache", async () => { await withManifestFile( manifestFixture({ @@ -260,23 +359,76 @@ test("rejects manifest repo and workspace paths outside the cache", async () => ); }); -test("packet-first command renders manifest text as PowerShell literals", () => { - const command = packetFirstCommandForPrompt( +test("packet-first command renders manifest text for host shells", () => { + const windowsCommand = packetFirstCommandForPrompt( "Inspect $env:SECRET and $(Get-ChildItem), then read John's file.\nNext line.", { task_class: "bug_localization" }, + "win32", ); assert.match( - command, + windowsCommand, /--question 'Inspect \$env:SECRET and \$\(Get-ChildItem\), then read John''s file\. Next line\.'/, ); - assert.match(command, /--task-class 'bug-localization'/); + assert.match(windowsCommand, /--task-class 'bug-localization'/); + + const unixCommand = packetFirstCommandForPrompt( + "Inspect $env:SECRET and $(Get-ChildItem), then read John's file.\nNext line.", + { task_class: "bug_localization" }, + "linux", + ); + + assert.ok(unixCommand.startsWith('"${CODESTORY_CLI:-codestory-cli}" packet ')); + assert.ok( + unixCommand.includes( + "--question 'Inspect $env:SECRET and $(Get-ChildItem), then read John'\\''s file. Next line.'", + ), + ); + assert.match(unixCommand, /--task-class 'bug-localization'/); assert.throws( - () => packetFirstCommandForPrompt("Explain the task.", { task_class: "bug_localization; Remove-Item ." }), + () => packetFirstCommandForPrompt("Explain the task.", { task_class: "bug_localization; Remove-Item ." }, "linux"), /task_class/, ); }); +test("packet command keeps manifest-derived extra probes diagnostic-only", () => { + const task = { + prompt: "Explain how Requests dispatch works.", + task_class: "architecture_explanation", + expected_files: ["src/requests/api.py", "src/requests/sessions.py"], + expected_symbols: ["request", "Session.request"], + expected_symbol_probes: [ + "src/requests/api.py request", + "src/requests/sessions.py Session.request", + "src/requests/sessions.py Session.send", + ], + }; + + assert.deepEqual(packetManifestExtraProbes(task), [ + "src/requests/api.py", + "src/requests/sessions.py", + "src/requests/api.py request", + "src/requests/sessions.py Session.request", + "src/requests/sessions.py Session.send", + ]); + + const args = packetCommandArgs({ path: "C:\\repo" }, task); + assert.equal(args.filter((arg) => arg === "--extra-probe").length, 0); + + const diagnosticArgs = packetCommandArgs( + { path: "C:\\repo" }, + task, + { diagnosticExtraProbesFromManifest: true }, + ); + const extraProbeIndexes = diagnosticArgs + .map((arg, index) => (arg === "--extra-probe" ? index : -1)) + .filter((index) => index >= 0); + + assert.equal(extraProbeIndexes.length, 5); + assert.equal(diagnosticArgs[extraProbeIndexes[0] + 1], "src/requests/api.py"); + assert.equal(diagnosticArgs[extraProbeIndexes[3] + 1], "src/requests/sessions.py Session.request"); +}); + test("benchmark artifact run ids strip path separators from dynamic parts", () => { assert.equal( benchmarkRunId(["../repo", "task/id", "with codestory", "01"]), @@ -284,12 +436,191 @@ test("benchmark artifact run ids strip path separators from dynamic parts", () = ); }); +test("publishable benchmark args reject diagnostic packet probes", () => { + assert.throws( + () => + parseBenchmarkArgs([ + "--publishable", + "--diagnostic-extra-probes-from-manifest", + ]), + /diagnostic-only/, + ); +}); + +test("publishable repo URL trust only accepts plain GitHub HTTPS repo URLs", () => { + assert.equal(isTrustedPublishableRepoUrl("https://github.com/expressjs/express.git"), true); + assert.equal(isTrustedPublishableRepoUrl("https://github.com/expressjs/express"), true); + assert.equal(isTrustedPublishableRepoUrl("file:///tmp/repo.git"), false); + assert.equal(isTrustedPublishableRepoUrl("https://example.com/expressjs/express.git"), false); + assert.equal(isTrustedPublishableRepoUrl("https://github.com/expressjs/express.git?ref=main"), false); + assert.equal(isTrustedPublishableRepoUrl("https://token@github.com/expressjs/express.git"), false); +}); + +test("publishable materialization preflight rejects arbitrary URLs and moving refs", async () => { + await withManifestFile( + manifestFixture({ + repo: { + name: "fixture-repo", + url: "file:///tmp/fixture.git", + ref: "main", + workspace_root: ".", + }, + }), + async (manifestPath) => { + const opts = parseBenchmarkArgs([ + "--task-manifest", + manifestPath, + "--publishable", + "--materialize-repos", + "--max-source-reads-after-packet", + "0", + ]); + const tasks = await loadTasks(opts); + const blockers = manifestRepoMaterializationBlockers(tasks, opts); + const blockerText = blockers.join("\n"); + + assert.match(blockerText, /https:\/\/github\.com\/\//); + assert.match(blockerText, /full immutable commit SHA/); + }, + ); +}); + +test("publishable materialization preflight stays fail-closed for direct options", async () => { + await withManifestFile( + manifestFixture({ + repo: { + name: "fixture-repo", + url: "file:///tmp/fixture.git", + ref: "main", + workspace_root: ".", + }, + }), + async (manifestPath) => { + const opts = parseBenchmarkArgs([ + "--task-manifest", + manifestPath, + "--materialize-repos", + "--max-source-reads-after-packet", + "0", + ]); + const tasks = await loadTasks(opts); + const blockers = manifestRepoMaterializationBlockers(tasks, { + ...opts, + publishable: true, + }); + + assert.match(blockers.join("\n"), /full immutable commit SHA/); + }, + ); +}); + +test("publishable materialization preflight rejects mutable tags before fetch", async () => { + await withManifestFile( + manifestFixture({ + repo: { + name: "fixture-repo", + url: "https://github.com/example/fixture.git", + ref: "v1.2.3", + workspace_root: ".", + }, + }), + async (manifestPath) => { + const opts = parseBenchmarkArgs([ + "--task-manifest", + manifestPath, + "--publishable", + "--materialize-repos", + "--max-source-reads-after-packet", + "0", + ]); + const tasks = await loadTasks(opts); + const blockers = manifestRepoMaterializationBlockers(tasks, opts); + + assert.match(blockers.join("\n"), /full immutable commit SHA/); + }, + ); +}); + +test("publishable materialization preflight accepts trusted pinned GitHub manifests", async () => { + await withManifestFile( + manifestFixture({ + repo: { + name: "fixture-repo", + url: "https://github.com/example/fixture.git", + ref: "1234567890abcdef1234567890abcdef12345678", + workspace_root: ".", + }, + }), + async (manifestPath) => { + const opts = parseBenchmarkArgs([ + "--task-manifest", + manifestPath, + "--publishable", + "--materialize-repos", + "--max-source-reads-after-packet", + "0", + ]); + const tasks = await loadTasks(opts); + + assert.deepEqual(manifestRepoMaterializationBlockers(tasks, opts), []); + }, + ); +}); + test("path containment rejects sibling-prefix directories", () => { const root = path.join(os.tmpdir(), "codestory-agent-benchmark", "repos"); assert.equal(isPathInside(root, path.join(root, "express")), true); assert.equal(isPathInside(root, path.join(os.tmpdir(), "codestory-agent-benchmark", "repos2", "evil")), false); }); +test("reused baseline artifact paths stay inside the previous run directory", () => { + const runDir = path.join(os.tmpdir(), "codestory-agent-benchmark", "previous-run"); + assert.equal( + resolveRunArtifactPath(runDir, "codestory.without.01.stdout.jsonl"), + path.resolve(runDir, "codestory.without.01.stdout.jsonl"), + ); + assert.equal(resolveRunArtifactPath(runDir, path.join(runDir, "codestory.without.01.stdout.jsonl")), null); + assert.equal(resolveRunArtifactPath(runDir, "..\\outside.stdout.jsonl"), null); + assert.equal(resolveRunArtifactPath(runDir, "codestory.without.01.env"), null); +}); + +test("copying reused baseline artifacts rejects oversized files", async () => { + const root = await mkdtemp(path.join(os.tmpdir(), "codestory-reused-artifacts-")); + try { + const runDir = path.join(root, "previous"); + const outDir = path.join(root, "next"); + await mkdir(runDir, { recursive: true }); + await mkdir(outDir, { recursive: true }); + const sourceName = "codestory.without.01.stdout.jsonl"; + const sourcePath = path.join(runDir, sourceName); + await writeFile(sourcePath, ""); + await truncate(sourcePath, MAX_REUSED_ARTIFACT_BYTES + 1); + + await assert.rejects( + () => copyResultArtifact(runDir, outDir, sourceName, "copied.stdout.jsonl"), + /Refusing to reuse oversized baseline artifact/, + ); + } finally { + await rm(root, { recursive: true, force: true }); + } +}); + +test("copying reused baseline artifacts rejects absolute source paths", async () => { + const root = await mkdtemp(path.join(os.tmpdir(), "codestory-reused-artifacts-")); + try { + const runDir = path.join(root, "previous"); + const outDir = path.join(root, "next"); + await mkdir(runDir, { recursive: true }); + await mkdir(outDir, { recursive: true }); + const sourcePath = path.join(runDir, "codestory.without.01.stdout.jsonl"); + await writeFile(sourcePath, "{}\n"); + + assert.equal(await copyResultArtifact(runDir, outDir, sourcePath, "copied.stdout.jsonl"), null); + } finally { + await rm(root, { recursive: true, force: true }); + } +}); + test("Windows Codex runner args reject cmd metacharacters", () => { assert.doesNotThrow(() => assertSafeWindowsCmdArgs(["exec", "--cd", "C:\\Users\\alber\\source\\repos\\codestory"])); assert.throws( @@ -418,6 +749,179 @@ test("analyzes transcript command friction and scores manifest anchors", () => { assert.equal(quality.citation_coverage.recall, 1); }); +test("counts direct source reads for every supported language extension family", () => { + const paths = [ + "src/main.rs", + "src/app.py", + "src/App.java", + "src/index.js", + "src/index.tsx", + "include/fmt/base.hpp", + "src/server.c", + "router.go", + "lib/site.rb", + "src/Logger.php", + "src/Mapper.cs", + "src/Main.kt", + "Package.swift", + "lib/client.dart", + "nvm.sh", + "index.html", + "styles/site.css", + "schema/chinook.sql", + ]; + const events = paths.flatMap((sourcePath, index) => [ + commandEvent(`cmd_${index}`, "item.started", `Get-Content ${sourcePath}`), + commandEvent(`cmd_${index}`, "item.completed", `Get-Content ${sourcePath}`, "source"), + ]); + + const analysis = analyzeTranscript(events); + assert.equal(analysis.command_categories.direct_file_read, paths.length); + assert.equal(analysis.direct_source_reads_total, paths.length); +}); + +test("counts PowerShell LiteralPath source reads after a CodeStory packet", () => { + const command = String.raw`"C:\\Program Files\\PowerShell\\pwsh.exe" -Command '$lines = Get-Content -LiteralPath '"'src/index/use-swr.ts' +for ($i = 1; $i -le 2; $i++) { "{0}: {1}" -f $i, $lines[$i - 1] }'`; + const events = [ + commandEvent("packet", "item.started", "& $env:CODESTORY_CLI packet --project . --question flow"), + commandEvent("packet", "item.completed", "& $env:CODESTORY_CLI packet --project . --question flow", "{}"), + commandEvent("read", "item.started", command), + commandEvent("read", "item.completed", command, "export default useSWR"), + ]; + + const analysis = analyzeTranscript(events); + assert.equal(analysis.command_categories.direct_file_read, 1); + assert.equal(analysis.direct_source_reads_total, 1); + assert.equal(analysis.ordinary_source_reads_after_first_packet, 1); + assert.deepEqual(analysis.direct_file_reads_duplicated, {}); +}); + +test("counts modern Codex JSONL tool categories including web search", () => { + const events = [ + { + type: "item.started", + item: { + id: "item_web", + type: "web_search", + query: "github psf requests api.py", + }, + }, + { + type: "item.completed", + item: { + id: "item_web", + type: "web_search", + query: "github psf requests api.py", + }, + }, + { + type: "item.started", + item: { + id: "item_mcp", + type: "mcp_tool_call", + server: "codex", + tool: "list_mcp_resources", + }, + }, + ]; + + const analysis = analyzeTranscript(events); + assert.equal(analysis.command_count, 0); + assert.equal(analysis.tool_categories.web_search, 1); + assert.equal(analysis.tool_categories.mcp_tool_call, 1); + assert.equal(analysis.external_context_tool_calls, 1); + + const blockers = agentPublishableBlockers([ + { + status: "pass", + arm: "without_codestory", + usage: { total_tokens: 1 }, + transcript_analysis: analysis, + }, + ]); + assert.match(blockers[0].reasons.join("\n"), /external web\/search tool calls=1 > 0/); +}); + +test("summarizes A/B cost accounting totals and ratios", () => { + const costAccounting = summarizeCostAccounting([ + { + arm: "without_codestory", + status: "pass", + wall_ms: 200, + agent_runner_wall_ms: 190, + baseline_harness_prelude: { + wall_ms: 10, + }, + usage: { input_tokens: 80, output_tokens: 20, total_tokens: 100 }, + estimated_cost_usd: 0.02, + tool_calls_observed: 4, + transcript_analysis: { + command_count: 4, + tool_categories: { command_execution: 4 }, + command_categories: { shell_search: 2, direct_file_read: 2 }, + direct_source_reads_total: 2, + external_context_tool_calls: 0, + }, + }, + { + arm: "with_codestory", + status: "pass", + wall_ms: 50, + agent_runner_wall_ms: 40, + usage: { input_tokens: 30, output_tokens: 10, total_tokens: 40 }, + estimated_cost_usd: 0.01, + tool_calls_observed: 1, + codex_tool_calls_observed: 0, + codestory_harness_prelude: { + wall_ms: 10, + }, + codestory_cache_provenance: { + cache_preparation: { preparation_wall_ms: 10 }, + }, + transcript_analysis: { + command_count: 1, + tool_categories: { command_execution: 1 }, + command_categories: { codestory_cli: 1 }, + direct_source_reads_total: 0, + external_context_tool_calls: 0, + }, + }, + { + arm: "with_codestory", + status: "fail", + wall_ms: 5, + usage: null, + estimated_cost_usd: null, + tool_calls_observed: 1, + transcript_analysis: { + command_count: 1, + tool_categories: { command_execution: 1 }, + command_categories: { codestory_cli: 1 }, + direct_source_reads_total: 0, + external_context_tool_calls: 0, + }, + }, + ]); + + assert.equal(costAccounting.arms.with_codestory.runs, 2); + assert.equal(costAccounting.arms.with_codestory.failed_runs, 1); + assert.equal(costAccounting.arms.with_codestory.missing_token_usage_runs, 1); + assert.equal(costAccounting.arms.without_codestory.time_spent_ms.agent_runner, 190); + assert.equal(costAccounting.arms.without_codestory.time_spent_ms.baseline_harness_prelude, 10); + assert.equal(costAccounting.arms.with_codestory.time_spent_ms.runner_wall, 55); + assert.equal(costAccounting.arms.with_codestory.time_spent_ms.agent_runner, 45); + assert.equal(costAccounting.arms.with_codestory.time_spent_ms.codestory_harness_prelude, 10); + assert.equal(costAccounting.arms.with_codestory.time_spent_ms.all_in, 65); + assert.equal(costAccounting.arms.with_codestory.tokens_spent.total_tokens, 40); + assert.equal(costAccounting.arms.with_codestory.tool_calls.codex_observed, 0); + assert.equal(costAccounting.arms.without_codestory.tool_calls.observed, 4); + assert.equal(costAccounting.arms.without_codestory.commands.categories.shell_search, 2); + assert.equal(costAccounting.with_vs_without.total_tokens.ratio, 0.4); + assert.equal(costAccounting.with_vs_without.all_in_wall_ms.ratio, 0.325); + assert.equal(costAccounting.with_vs_without.tool_calls.with_minus_without, -2); +}); + test("parses JSONL transcript text before analysis", () => { const jsonl = [ JSON.stringify(commandEvent("cmd_1", "item.started", "codestory-cli packet --project . --question flow")), @@ -468,6 +972,33 @@ test("requires packet as the CodeStory subcommand for packet-first telemetry", ( assert.equal(analysis.packet_was_first_context_command, false); }); +test("recognizes quoted PowerShell variable CodeStory packet commands", () => { + const command = + "\"C:\\\\Program Files\\\\PowerShell\\\\pwsh.exe\" -Command '$cli = if ($env:CODESTORY_CLI) { $env:CODESTORY_CLI } else { '\"'codestory-cli' }\n& \"'$cli packet --project . --question '\"'Explain flow' --task-class 'architecture-explanation' --budget compact --format json\""; + const events = [ + commandEvent("cmd_1", "item.started", command), + commandEvent("cmd_1", "item.completed", command, "{\"packet_id\":\"ask-1\"}", 0), + ]; + + const analysis = analyzeTranscript(events); + assert.equal(analysis.command_categories.codestory_cli, 1); + assert.equal(analysis.first_successful_packet_command.id, "cmd_1"); + assert.equal(analysis.packet_was_first_context_command, true); +}); + +test("recognizes inline PowerShell env fallback CodeStory packet commands", () => { + const command = String.raw`"C:\Program Files\PowerShell\pwsh.exe" -Command '& $(if ($env:CODESTORY_CLI) { $env:CODESTORY_CLI } else { 'codestory-cli' }) packet --project . --question 'Trace flow' --task-class 'route-tracing' --budget compact --format json"`; + const events = [ + commandEvent("cmd_1", "item.started", command), + commandEvent("cmd_1", "item.completed", command, "{\"packet_id\":\"ask-1\"}", 0), + ]; + + const analysis = analyzeTranscript(events); + assert.equal(analysis.command_categories.codestory_cli, 1); + assert.equal(analysis.first_successful_packet_command.id, "cmd_1"); + assert.equal(analysis.packet_was_first_context_command, true); +}); + test("packet-first telemetry treats git and help probes before packet as context", () => { const gitFirst = analyzeTranscript([ commandEvent("cmd_git", "item.completed", "git status --short", " M file"), @@ -485,6 +1016,35 @@ test("packet-first telemetry treats git and help probes before packet as context assert.equal(helpFirst.packet_was_first_context_command, false); }); +test("harness packet prelude counts as the first context command", () => { + const events = [ + { + type: "harness.command.started", + item: { + id: "harness_codestory_packet", + type: "command_execution", + command: '"C:\\tools\\codestory-cli.exe" packet --project . --question flow --format json', + }, + }, + { + type: "harness.command.completed", + item: { + id: "harness_codestory_packet", + type: "command_execution", + command: '"C:\\tools\\codestory-cli.exe" packet --project . --question flow --format json', + aggregated_output: '{"answer":{"citations":[{"file_path":"src/requests/sessions.py"}]}}', + exit_code: 0, + }, + }, + ]; + + const analysis = analyzeTranscript(events); + assert.equal(analysis.command_count, 1); + assert.equal(analysis.tool_categories.command_execution, 1); + assert.equal(analysis.first_successful_packet_command.id, "harness_codestory_packet"); + assert.equal(analysis.packet_was_first_context_command, true); +}); + test("codestory cli resolver prefers explicit path, release binary, then PATH fallback", () => { const explicit = resolveCodeStoryCli({ codestoryCli: "C:/custom/codestory-cli.exe" }, () => { throw new Error("explicit path should not probe local candidates"); @@ -598,7 +1158,8 @@ test("packet composition separates citations, answer surfaces, and structured-on ], }, sufficiency: { - avoid_opening: ["src/lib/data/storage/PersistentStorage.cpp"], + avoid_opening: ["src/lib/data/storage/LegacyOnly.cpp because this is legacy prose"], + avoid_opening_paths: ["src/lib/data/storage/PersistentStorage.cpp"], covered_claims: [ { claim: "Hidden trace source mentions src/lib_cxx/project/SourceGroupCxxCdb.cpp.", @@ -646,6 +1207,130 @@ test("packet composition separates citations, answer surfaces, and structured-on assert.equal(composition.verification_summary.structured_file_recall, 0); }); +test("packet prompt excerpt keeps answer support while dropping bulky packet fields", () => { + const longText = `${"flow ".repeat(1400)}tail`; + const promptPacket = packetForAgentPrompt({ + answer: { + summary: "Requests flow", + sections: [{ title: "Verbose", blocks: [{ markdown: longText }] }], + citations: [ + { + display_name: "Session.request", + kind: "function", + file_path: + "C:/repo/target/agent-benchmark/repos/psf-requests/src/requests/sessions.py", + line: 557, + snippet: "large snippet should not be embedded", + }, + ], + }, + sufficiency: { + status: "partial", + gaps: ["drop me"], + open_next: ["drop me too"], + avoid_opening: [ + "C:/repo/target/agent-benchmark/repos/psf-requests/src/requests/legacy.py because legacy prose", + ], + avoid_opening_paths: [ + "C:/repo/target/agent-benchmark/repos/psf-requests/src/requests/api.py", + ], + follow_up_commands: ["a", "b", "c", "d", "e"], + covered_claims: [ + { + claim: "Session.request prepares requests.", + citations: [ + { + display_name: "Session.request", + file_path: + "C:/repo/target/agent-benchmark/repos/psf-requests/src/requests/sessions.py", + line: 557, + }, + ], + }, + ], + }, + }); + + assert.equal(promptPacket.answer.summary, "Requests flow"); + assert.match(promptPacket.answer.text, /\[truncated \d+ chars\]$/); + assert.ok(promptPacket.answer.text.length < longText.length); + assert.deepEqual(promptPacket.answer.citations, [ + { + display_name: "Session.request", + kind: "function", + file_path: "src/requests/sessions.py", + line: 557, + }, + ]); + assert.deepEqual(promptPacket.sufficiency.avoid_opening, ["src/requests/api.py"]); + assert.deepEqual(promptPacket.sufficiency.follow_up_commands, ["a", "b", "c", "d"]); + assert.deepEqual(promptPacket.sufficiency.covered_claims, [ + "Session.request prepares requests.", + ]); + assert.equal(Object.hasOwn(promptPacket.answer, "sections"), false); + assert.equal(Object.hasOwn(promptPacket.sufficiency, "gaps"), false); + assert.equal(Object.hasOwn(promptPacket.sufficiency, "open_next"), false); +}); + +test("packet manifest completion is gated by packet quality evidence", () => { + const task = manifestFixture({ + expected_files: ["src/requests/sessions.py"], + expected_symbols: ["Session.request"], + expected_claims: ["Session.request prepares requests."], + }); + const packet = { + answer: { + summary: "Session.request prepares requests in src/requests/sessions.py.", + sections: [], + citations: [ + { + display_name: "Session.request", + file_path: "src/requests/sessions.py", + line: 557, + }, + ], + }, + sufficiency: { + covered_claims: [{ claim: "Session.request prepares requests." }], + }, + }; + + const quality = packetManifestQualitySummary(packet, task); + assert.equal(quality.pass, true); + assert.equal( + packetPreludeManifestComplete({ + packet_manifest_quality: quality, + packet_composition: packetComposition(packet, task), + }), + true, + ); + + const incompleteQuality = packetManifestQualitySummary( + { + answer: { + summary: "Session.request is present in src/requests/sessions.py.", + citations: [ + { + display_name: "Session.request", + file_path: "src/requests/sessions.py", + line: 557, + }, + ], + }, + sufficiency: { covered_claims: [] }, + }, + task, + ); + assert.equal(incompleteQuality.pass, false); + assert.equal( + packetPreludeManifestComplete({ + packet_manifest_quality: incompleteQuality, + packet_composition: packetComposition(packet, task), + }), + false, + ); +}); + const LOCAL_REAL_COMPACT_BUDGET_TASKS = [ { repo: "vscode", @@ -750,6 +1435,30 @@ test("forbidden claim scoring requires negative polarity terms", () => { assert.equal(quality.pass, true); }); +test("forbidden claim scoring does not flag contradicted positive claims", () => { + const task = runtimeQualityTask("forbidden-positive-contradicted-fixture", { + min_expected_file_recall: 0, + min_expected_symbol_recall: 0, + min_expected_claim_recall: 0, + min_citation_coverage: 0, + min_expected_anchor_recall: 0, + max_forbidden_claims: 0, + }); + task.forbidden_claims = ["StringUtils.isEmpty treats whitespace-only strings as empty."]; + + const quality = scoreQuality( + [ + agentMessageEvent( + "StringUtils.isEmpty does not trim whitespace before deciding emptiness.", + ), + ], + task, + ); + + assert.equal(quality.forbidden_claims.found, 0); + assert.equal(quality.pass, true); +}); + test("forbidden claim scoring does not combine unrelated storage sentences", () => { const task = runtimeQualityTask("forbidden-storage-fixture", { min_expected_file_recall: 0, @@ -774,12 +1483,45 @@ test("forbidden claim scoring does not combine unrelated storage sentences", () assert.equal(quality.pass, true); }); +test("forbidden claim scoring keeps polarity inside one candidate sentence", () => { + const task = runtimeQualityTask("forbidden-shell-polarity-fixture", { + min_expected_file_recall: 0, + min_expected_symbol_recall: 0, + min_expected_claim_recall: 0, + min_citation_coverage: 0, + min_expected_anchor_recall: 0, + max_forbidden_claims: 0, + }); + task.forbidden_claims = [ + "nvm is a compiled binary and does not dispatch through shell functions.", + ]; + + const quality = scoreQuality( + [ + agentMessageEvent( + "`nvm` is the shell function dispatcher. `nvm_use_if_needed` switches versions only when the requested version is not already active.", + ), + ], + task, + ); + + assert.equal(quality.forbidden_claims.found, 0); + assert.equal(quality.pass, true); +}); + function pinnedRepoProvenance() { return { manifest_overridden_by_builtin: false, - configured: { ref: "9fdfd4650427eb050a11fd9ebd7a4e13dd4b57d7" }, - manifest: { ref: "9fdfd4650427eb050a11fd9ebd7a4e13dd4b57d7" }, + configured: { + url: "https://github.com/example/fixture.git", + ref: "9fdfd4650427eb050a11fd9ebd7a4e13dd4b57d7", + }, + manifest: { + url: "https://github.com/example/fixture.git", + ref: "9fdfd4650427eb050a11fd9ebd7a4e13dd4b57d7", + }, git_head: "9fdfd4650427eb050a11fd9ebd7a4e13dd4b57d7", + git_origin: "https://github.com/example/fixture.git", git_dirty: false, }; } @@ -810,11 +1552,14 @@ function publishableWithCodeStoryResult(overrides = {}) { arm: "with_codestory", repeat: 1, status: "pass", + wall_ms: 10, usage: { total_tokens: 100 }, + tool_calls_observed: 1, packet_first_required: true, packet_first_pass: true, quality: { pass: true }, transcript_analysis: { + command_count: 1, ordinary_source_reads_after_first_packet: 0, }, repo_provenance: pinnedRepoProvenance(), @@ -872,6 +1617,46 @@ test("publishable gate blocks avoidable source reads after packet", () => { assert.match(blockers[0].reasons.join("\n"), /ordinary source reads after packet=1 > 0/); }); +test("publishable gate records but does not block post-packet reads by default", () => { + const blockers = agentPublishableBlockers([ + publishableWithCodeStoryResult({ + transcript_analysis: { + command_count: 3, + ordinary_source_reads_after_first_packet: 2, + }, + }), + ]); + + assert.deepEqual(blockers, []); +}); + +test("publishable gate requires explicit post-packet source-read budget", () => { + const blockers = agentPublishableBlockers( + [publishableWithCodeStoryResult()], + { publishable: true }, + ); + + assert.equal(blockers.length, 1); + assert.match(blockers[0].reasons.join("\n"), /missing explicit post-packet source-read budget/); +}); + +test("publishable gate rejects diagnostic packet probes", () => { + const blockers = agentPublishableBlockers( + [ + publishableWithCodeStoryResult({ + codestory_harness_prelude: { + packet_extra_probe_count: 2, + packet_extra_probe_strategy: "diagnostic_manifest_expected_anchors", + }, + }), + ], + { publishable: true, maxSourceReadsAfterPacket: 0 }, + ); + + assert.equal(blockers.length, 1); + assert.match(blockers[0].reasons.join("\n"), /diagnostic packet extra probes used/); +}); + test("publishable gate requires packet before ordinary context exploration", () => { const blockers = agentPublishableBlockers( [ @@ -897,13 +1682,102 @@ test("publishable gate requires packet before ordinary context exploration", () assert.match(blockers[0].reasons.join("\n"), /missing answer packet as first successful context command/); }); -test("publishable provenance requires pinned clean manifest checkout", () => { +test("publishable gate rejects CodeStory use in the without arm", () => { + const blockers = agentPublishableBlockers([ + { + repo: "codestory", + task_id: "codestory-indexing-flow", + arm: "without_codestory", + repeat: 1, + status: "pass", + wall_ms: 10, + usage: { total_tokens: 100 }, + tool_calls_observed: 1, + packet_first_required: false, + packet_first_pass: true, + quality: { pass: true }, + transcript_analysis: { + command_count: 1, + command_categories: { + codestory_cli: 1, + }, + external_context_tool_calls: 0, + }, + }, + ]); + + assert.equal(blockers.length, 1); + assert.match(blockers[0].reasons.join("\n"), /without_codestory arm used CodeStory/); +}); + +test("publishable gate requires local repo inspection in the without arm", () => { + const blockers = agentPublishableBlockers([ + { + repo: "codestory", + task_id: "codestory-indexing-flow", + arm: "without_codestory", + repeat: 1, + status: "pass", + wall_ms: 10, + usage: { total_tokens: 100 }, + tool_calls_observed: 1, + packet_first_required: false, + packet_first_pass: true, + quality: { pass: true }, + transcript_analysis: { + command_count: 0, + command_categories: {}, + external_context_tool_calls: 0, + }, + }, + ]); + + assert.equal(blockers.length, 1); + assert.match(blockers[0].reasons.join("\n"), /without_codestory arm did not inspect local repository/); +}); + +test("publishable gate accepts ordinary local inspection in the without arm", () => { + const blockers = agentPublishableBlockers([ + { + repo: "codestory", + task_id: "codestory-indexing-flow", + arm: "without_codestory", + repeat: 1, + status: "pass", + wall_ms: 10, + usage: { total_tokens: 100 }, + tool_calls_observed: 1, + packet_first_required: false, + packet_first_pass: true, + quality: { pass: true }, + transcript_analysis: { + command_count: 2, + command_categories: { + shell_search: 1, + direct_file_read: 1, + }, + external_context_tool_calls: 0, + }, + }, + ]); + + assert.deepEqual(blockers, []); +}); + +test("publishable provenance requires full-SHA clean manifest checkout", () => { const clean = { repo_provenance: { manifest_overridden_by_builtin: false, - configured: { ref: "9fdfd4650427eb050a11fd9ebd7a4e13dd4b57d7" }, - manifest: { ref: "9fdfd4650427eb050a11fd9ebd7a4e13dd4b57d7" }, - git_head: "abc123", + configured: { + url: "https://github.com/example/fixture.git", + ref: "9fdfd4650427eb050a11fd9ebd7a4e13dd4b57d7", + }, + manifest: { + url: "https://github.com/example/fixture.git", + ref: "9fdfd4650427eb050a11fd9ebd7a4e13dd4b57d7", + }, + git_head: "9fdfd4650427eb050a11fd9ebd7a4e13dd4b57d7", + git_origin: "https://github.com/example/fixture.git", git_dirty: false, }, }; @@ -912,13 +1786,118 @@ test("publishable provenance requires pinned clean manifest checkout", () => { repoProvenanceBlockers({ repo_provenance: { manifest_overridden_by_builtin: false, - configured: { ref: "main" }, - manifest: { ref: "main" }, + configured: { + url: "https://github.com/example/fixture.git", + ref: "main", + }, + manifest: { + url: "https://github.com/example/fixture.git", + ref: "main", + }, git_head: "abc123", + git_origin: "https://github.com/example/fixture.git", + git_dirty: false, + }, + }).join("\n"), + /not pinned to a full immutable commit SHA/, + ); + for (const ref of ["abcdef0", "v1.2.3", "refs/tags/v1.2.3"]) { + assert.match( + repoProvenanceBlockers({ + repo_provenance: { + manifest_overridden_by_builtin: false, + configured: { + url: "https://github.com/example/fixture.git", + ref, + }, + manifest: { + url: "https://github.com/example/fixture.git", + ref, + }, + git_head: "abc123", + git_origin: "https://github.com/example/fixture.git", + git_dirty: false, + }, + }).join("\n"), + /not pinned to a full immutable commit SHA/, + `publishable provenance should reject ${ref}`, + ); + } + assert.match( + repoProvenanceBlockers({ + repo_provenance: { + manifest_overridden_by_builtin: false, + configured: { + url: "https://github.com/example/fixture.git", + ref: "9fdfd4650427eb050a11fd9ebd7a4e13dd4b57d7", + }, + manifest: { + url: "https://github.com/example/fixture.git", + ref: "9fdfd4650427eb050a11fd9ebd7a4e13dd4b57d7", + }, + git_head: "1234567890abcdef1234567890abcdef12345678", + git_origin: "https://github.com/example/fixture.git", + git_dirty: false, + }, + }).join("\n"), + /does not match configured ref/, + ); + assert.match( + repoProvenanceBlockers({ + repo_provenance: { + manifest_overridden_by_builtin: false, + configured: { + url: "file:///tmp/fixture.git", + ref: "9fdfd4650427eb050a11fd9ebd7a4e13dd4b57d7", + }, + manifest: { + url: "file:///tmp/fixture.git", + ref: "9fdfd4650427eb050a11fd9ebd7a4e13dd4b57d7", + }, + git_head: "9fdfd4650427eb050a11fd9ebd7a4e13dd4b57d7", + git_origin: "file:///tmp/fixture.git", + git_dirty: false, + }, + }).join("\n"), + /configured repo URL is not a trusted GitHub HTTPS repo URL/, + ); + assert.match( + repoProvenanceBlockers({ + repo_provenance: { + manifest_overridden_by_builtin: false, + configured: { + url: "https://github.com/example/fixture.git", + ref: "9fdfd4650427eb050a11fd9ebd7a4e13dd4b57d7", + }, + manifest: { + url: "https://github.com/other/fixture.git", + ref: "9fdfd4650427eb050a11fd9ebd7a4e13dd4b57d7", + }, + git_head: "9fdfd4650427eb050a11fd9ebd7a4e13dd4b57d7", + git_origin: "https://github.com/example/fixture.git", + git_dirty: false, + }, + }).join("\n"), + /manifest repo URL .* does not match configured URL/, + ); + assert.match( + repoProvenanceBlockers({ + repo_provenance: { + manifest_overridden_by_builtin: false, + configured: { + url: "https://github.com/example/fixture.git", + ref: "9fdfd4650427eb050a11fd9ebd7a4e13dd4b57d7", + }, + manifest: { + url: "https://github.com/example/fixture.git", + ref: "9fdfd4650427eb050a11fd9ebd7a4e13dd4b57d7", + }, + git_head: "9fdfd4650427eb050a11fd9ebd7a4e13dd4b57d7", + git_origin: "https://github.com/other/fixture.git", git_dirty: false, }, }).join("\n"), - /not pinned to an immutable commit or tag/, + /git origin .* does not match configured URL/, ); const blockers = agentPublishableBlockers( @@ -938,9 +1917,10 @@ test("publishable provenance requires pinned clean manifest checkout", () => { }, repo_provenance: { manifest_overridden_by_builtin: true, - configured: { ref: "local" }, - manifest: { ref: "main" }, + configured: { url: "local", ref: "local" }, + manifest: { url: "https://github.com/example/fixture.git", ref: "main" }, git_head: "abc123", + git_origin: "local", git_dirty: true, }, }, @@ -961,7 +1941,7 @@ test("publishable gate requires CodeStory cache provenance for CodeStory arm", ( codestory_cache_provenance: null, }), ], - { publishable: true }, + { publishable: true, maxSourceReadsAfterPacket: 0 }, ); assert.equal(blockers.length, 1); @@ -971,12 +1951,35 @@ test("publishable gate requires CodeStory cache provenance for CodeStory arm", ( test("publishable gate accepts local-only CodeStory cache provenance", () => { const blockers = agentPublishableBlockers( [publishableWithCodeStoryResult()], - { publishable: true }, + { publishable: true, maxSourceReadsAfterPacket: 0 }, ); assert.deepEqual(blockers, []); }); +test("publishable gate requires resource accounting fields", () => { + const blockers = agentPublishableBlockers( + [ + publishableWithCodeStoryResult({ + wall_ms: null, + usage: { total_tokens: null }, + tool_calls_observed: null, + transcript_analysis: { + ordinary_source_reads_after_first_packet: 0, + }, + }), + ], + { publishable: true, maxSourceReadsAfterPacket: 0 }, + ); + + assert.equal(blockers.length, 1); + const reasons = blockers[0].reasons.join("\n"); + assert.match(reasons, /missing wall time/); + assert.match(reasons, /missing total token usage/); + assert.match(reasons, /missing tool call count/); + assert.match(reasons, /missing command count/); +}); + test("publishable gate requires CodeStory local-only provenance", () => { const blockers = agentPublishableBlockers( [ @@ -987,7 +1990,7 @@ test("publishable gate requires CodeStory local-only provenance", () => { }), }), ], - { publishable: true }, + { publishable: true, maxSourceReadsAfterPacket: 0 }, ); assert.equal(blockers.length, 1); @@ -1048,6 +2051,36 @@ test("packet runtime publishable gate requires SLA pass and full retrieval shado assert.match(blockers[2].reasons.join("\n"), /packet retrieval shadow mode=degraded; expected full/); }); +test("packet runtime publishable gate rejects diagnostic packet probes", () => { + const blockers = packetRuntimePublishableBlockers( + [ + publishablePacketRuntimeResult({ + packet_extra_probe_count: 1, + packet_extra_probe_strategy: "diagnostic_manifest_expected_anchors", + }), + ], + { publishable: true }, + ); + + assert.equal(blockers.length, 1); + assert.match(blockers[0].reasons.join("\n"), /diagnostic packet extra probes used/); +}); + +test("holdout packet runtime requires quality gate unless failures are allowed", () => { + assert.equal( + packetRuntimeQualityGateRequired({ taskSuite: "holdout-retrieval" }), + true, + ); + assert.equal( + packetRuntimeQualityGateRequired({ + taskSuite: "holdout-retrieval", + allowFailures: true, + }), + false, + ); + assert.equal(packetRuntimeQualityGateRequired({ taskSuite: "local-real" }), false); +}); + test("reanalysis uses the run-time task snapshot before current manifest contents", async () => { await withManifestFile( manifestFixture({ @@ -1121,3 +2154,56 @@ test("buildQualityDebugPayload aggregates failure counts", () => { assert.equal(payload.summary.quality_fail_runs, 1); assert.ok(Object.keys(payload.summary.failure_reason_counts).length > 0); }); + +test("buildQualityDebugPayload preserves packet sufficiency diagnostics", () => { + const payload = buildQualityDebugPayload([ + { + repo: "requests", + task_id: "requests-session-flow", + mode: "cold_cli_packet", + status: "pass", + quality: { + pass: true, + thresholds: {}, + expected_anchors: { recall: 1 }, + expected_files: { recall: 1 }, + expected_symbols: { recall: 1 }, + expected_claims: { recall: 1 }, + citation_coverage: { recall: 1 }, + forbidden_claims: { found: 0 }, + }, + sufficiency: { + status: "partial", + gaps_count: 2, + gaps: [ + "Packet was truncated by Compact budget: citations, trail_edges.", + "Packet omitted answer-critical evidence under Compact budget; use a deeper packet before treating this as complete.", + ], + open_next_count: 2, + open_next: ["codestory-cli packet --budget standard", "codestory-cli search --why"], + follow_up_commands_count: 2, + follow_up_commands: [ + "codestory-cli packet --budget standard", + "codestory-cli search --why", + ], + covered_claims_count: 8, + avoid_opening_count: 4, + sufficient_quality_mismatch: false, + }, + }, + ]); + + assert.equal(payload.rows[0].sufficiency_status, "partial"); + assert.deepEqual(payload.rows[0].sufficiency.gaps, [ + "Packet was truncated by Compact budget: citations, trail_edges.", + "Packet omitted answer-critical evidence under Compact budget; use a deeper packet before treating this as complete.", + ]); + assert.equal(payload.rows[0].sufficiency.follow_up_commands_count, 2); + assert.equal(payload.summary.packet_partial_runs, 1); + assert.equal( + payload.summary.partial_gap_counts[ + "Packet was truncated by Compact budget: citations, trail_edges." + ], + 1, + ); +});