diff --git a/.gobby/plans/gcode-graph-enhancements.md b/.gobby/plans/gcode-graph-enhancements.md index 2d2374b..4326a3e 100644 --- a/.gobby/plans/gcode-graph-enhancements.md +++ b/.gobby/plans/gcode-graph-enhancements.md @@ -1,56 +1,76 @@ -# gcode graph enhancements on the Rust foundation +# gcode-owned code projections on the Rust foundation -## Overview +**Plan ID:** gcode-graph-enhancements + +## O1: Overview `kind: framing` -`gcode` owns code-index and code-graph behavior on top of the shared Rust foundation defined in `.gobby/plans/gcore-rust-foundation.md`. The durable target is not a generic datastore/search substrate inside `gobby-code`; it is a code-specific graph API that the CLI wraps today and the future Rust daemon links directly. Python daemon shims may shell out to `gcode` during the migration window, but that is a compatibility bridge, not the final architecture. +`gcode` owns code-index behavior and the code projections derived from it on top of the shared Rust foundation defined in `.gobby/plans/gcore-rust-foundation.md`. The durable target is not a graph-only API or a generic datastore/search substrate inside `gobby-code`; it is a code-specific projection layer that owns PostgreSQL code facts, the FalkorDB `gobby_code` graph projection, and Qdrant `code_symbols_` symbol-vector collections. + +This plan moves code fact writes, graph/vector projection sync, lifecycle operations, and project graph reporting into `gobby-code` library boundaries first. Shared context/config resolution, attached versus standalone setup contracts, PostgreSQL/FalkorDB/Qdrant adapters, generic indexing/search primitives, and degradation vocabulary are consumed from `gobby-core`. `gcode` remains the user-facing CLI wrapper for code APIs. Code projections must work without the daemon process, and standalone mode gets an explicit setup path for the minimal gcode-owned app schema it needs. 
-This plan moves code-graph writes, reads, lifecycle operations, and project graph reporting into `gobby-code` library boundaries first. Shared context/config resolution, attached versus standalone setup contracts, PostgreSQL/FalkorDB/Qdrant adapters, generic indexing/search primitives, and degradation vocabulary are consumed from `gobby-core`. `gcode` remains the user-facing CLI wrapper for code APIs. The code graph must work without the daemon process, and standalone mode gets an explicit setup path for the minimal gcode-owned app schema it needs. +For code-symbol vectors, `gcode` calls OpenAI-compatible `/v1/embeddings` endpoints directly. In attached Gobby mode, bootstrap resolves the PostgreSQL hub and service settings come from `config_store` plus secret resolution; env vars remain explicit overrides for standalone, tests, and emergency diagnostics. The daemon embedding service is bypassed for code-index projection sync. Python remains a scheduler, API, UI, and MCP bridge during the migration window; it may shell out to stable `gcode` JSON commands until the future Rust daemon links the same library APIs directly. LLM-generated symbol summaries stay daemon-side for now and remain optional enrichment, not a projection-sync prerequisite. The code/memory boundary stays sharp. Rust code-index modules own deterministic code facts: files, symbols, imports, definitions, calls, unresolved call targets, and code graph reports derived from those facts. Gobby memory services own memories, knowledge graph extraction, and `RELATES_TO_CODE` bridge creation. Rust report code may read bridge edges when present so agents can see hypotheses beside extracted code facts, but it must not create or mutate memory-owned data. -## Architecture Principles +## D1: Dependent Plans +`kind: framing` + +This plan depends on the shared Rust foundation defined in `.gobby/plans/gcore-rust-foundation.md`. 
Shared context/config resolution, attached/standalone setup contracts, PostgreSQL/FalkorDB/Qdrant adapters, generic indexing/search primitives, and degradation vocabulary are consumed from `gobby-core`. `gobby-code` owns code-specific PostgreSQL fact writes, FalkorDB `gobby_code` graph projection, Qdrant `code_symbols_` vector projection, lifecycle commands, and project graph reports. + +Memory graph behavior, `RELATES_TO_CODE` bridge edges, and LLM-generated symbol summaries remain owned by the Gobby daemon's memory services and are not in scope for this plan. + +## A1: Architecture Principles `kind: framing` - Foundation dependency: shared context/config, setup contracts, datastore adapters, generic indexing/search primitives, and degradation types come from `gobby-core`. -- Code-specific core: code graph writes, reads, lifecycle operations, reports, symbol IDs, language facts, and code graph API shapes live in `gobby-code`. +- Vector projection lifecycle exception: `gobby-core::qdrant` intentionally scopes to client surface (collection naming, `with_qdrant` ServiceState boundary, `search`, and `upsert`); collection lifecycle (ensure collection with vector params, delete-by-filter, clear/drop, rebuild) is consumer-owned. Code-specific Qdrant lifecycle HTTP is allowed inside `crates/gcode/src/vector/code_symbols.rs` only and must still resolve config through `gobby-core::config::resolve_qdrant_config`, enter the ServiceState/degradation boundary through `gobby-core::qdrant::with_qdrant`, derive collection names through `gobby-core::qdrant::collection_name(.., CollectionScope::Custom(..))`, and use `gobby-core::qdrant::search` / `gobby-core::qdrant::upsert` for non-lifecycle operations. No other `gobby-code` file may issue raw Qdrant lifecycle REST. 
+- Phase 7 compatibility facade exception: `crates/gcode/src/falkor.rs` is allowed to instantiate `falkordb::FalkorClientBuilder` directly and own its own `SyncGraph` because the foundation `gobby_core::falkor::GraphClient` exposes a private `graph: SyncGraph` field and no public hook (no `into_sync_graph`, `from_graph_client`, or `with_graph_client` constructor) for unwrapping or wrapping. The external Phase 7 source-inspection contract requires the local shape `pub struct FalkorClient { graph: SyncGraph }`, which cannot be built from `gobby_core::falkor::GraphClient` today. Within this exception, the facade still resolves FalkorConfig fields (host, port, password) through `gobby_core::config::resolve_falkordb_config` via the §1.5 `ConfigSource` adapter; only connection plumbing and `SyncGraph` ownership are local. All other `gobby-code` graph consumers (graph writes, graph reads, search graph boost, projection lifecycle, report generation, and CLI command handlers) MUST enter Falkor through `gobby_core::falkor::with_graph` and MUST NOT instantiate `falkordb::FalkorClientBuilder` themselves. §1.5.22 pins the single-file scope. +- Code-specific core: PostgreSQL code facts, graph projection sync, vector projection sync, reports, symbol IDs, language facts, and code API shapes live in `gobby-code`. - CLI wrapper: `gcode` parses CLI args, resolves context, calls library APIs, and formats output. - Direct linking is the target daemon integration. The future Rust daemon links the same Rust code directly; no daemon HTTP, MCP, or CLI shell boundary is the target internal architecture. - Python daemon shell-outs are transitional only. If needed before the Rust daemon lands, Python calls stable `gcode` JSON commands and treats failures as explicit degradation. +- Bootstrap-first attached config: `bootstrap.yaml` or the daemon broker resolves the PostgreSQL hub; FalkorDB, Qdrant, and embedding settings are read from `config_store` with secret resolution. 
+- Env vars are overrides, not the attached-mode source of truth. They support standalone mode, tests, and emergency diagnostics. +- Direct embedding ownership: code-symbol vectors are generated by `gcode` through OpenAI-compatible embedding endpoints, not through the daemon embedding service. +- Qdrant compatibility: existing `code_symbols_` collection names remain the public storage contract; no vector collection migration is required. - Gobby-attached mode remains non-destructive. It validates the externally managed hub schema and must not create, alter, or drop Gobby-owned tables. -- Standalone mode is explicit. A setup command creates only the minimal gcode-owned app schema in a selected standalone database or schema namespace. Runtime graph commands validate prerequisites and never run implicit migrations. +- Standalone mode is explicit. A setup command creates only the minimal gcode-owned app schema in a selected standalone database or schema namespace. Runtime projection commands validate prerequisites and never run implicit migrations. - Code owns code; memory owns memory. `CALLS`, `IMPORTS`, and `DEFINES` are extracted code facts. `RELATES_TO_CODE` and LLM-created memory relationships are inferred memory facts. -- Degraded behavior is honest. Missing PostgreSQL/FalkorDB/Qdrant pieces produce typed errors or degraded report sections; graph commands must not return fake empty success payloads for unavailable services. +- Degraded behavior is honest. Missing PostgreSQL/FalkorDB/Qdrant/embedding pieces produce typed errors, degraded report sections, or clear non-zero exits depending on command hardness; projection lifecycle commands must not return fake empty success payloads for unavailable services. - JSON compatibility is preserved. New metadata fields are optional with `#[serde(skip_serializing_if = "Option::is_none")]`. - Phase 7 contract tests in the Gobby repo remain a compatibility gate until that external source-inspection contract changes. 
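The "degraded behavior is honest" principle can be illustrated with a small sketch. It is not the `gobby-core` degradation vocabulary, which this plan does not spell out; `Service`, `ProjectionError`, `ReportSection`, `report_section`, and `exit_code` are hypothetical names used only to show the shape: an unavailable service becomes a typed error, a soft report path turns that error into a degraded section, and a hard lifecycle path turns it into a non-zero exit instead of an empty success payload.

```rust
use std::fmt;

/// Hypothetical service labels; the real vocabulary is defined by `gobby-core`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Service {
    Postgres,
    FalkorDb,
    Qdrant,
    Embedding,
}

/// Typed error for a backing service that could not be reached.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ProjectionError {
    ServiceUnavailable { service: Service, reason: String },
}

impl fmt::Display for ProjectionError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ProjectionError::ServiceUnavailable { service, reason } => {
                write!(f, "{service:?} unavailable: {reason}")
            }
        }
    }
}

impl std::error::Error for ProjectionError {}

/// A report section either carries data or states why it is missing.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ReportSection<T> {
    Available(T),
    Degraded { service: Service, reason: String },
}

/// Soft path: a report keeps going but marks the section as degraded,
/// so an outage is never confused with "the project has no data".
pub fn report_section<T>(result: Result<T, ProjectionError>) -> ReportSection<T> {
    match result {
        Ok(data) => ReportSection::Available(data),
        Err(ProjectionError::ServiceUnavailable { service, reason }) => {
            ReportSection::Degraded { service, reason }
        }
    }
}

/// Hard path: a lifecycle command maps any projection error to a non-zero exit.
pub fn exit_code<T>(result: &Result<T, ProjectionError>) -> i32 {
    match result {
        Ok(_) => 0,
        Err(_) => 1,
    }
}
```

The point of the split is that a genuinely empty result (`Ok(vec![])`) and an outage (`Err(..)`) stay distinguishable all the way to the caller.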
-## Non-Goals +## N1: Non-Goals `kind: framing` - Do not make `gcode` the long-term owner of daemon orchestration, UI, MCP, or memory graph behavior. - Do not add generic datastore, search, indexing, or degradation primitives to `gobby-code` when they belong in `gobby-core`. -- Do not add daemon-backed graph/report CLI commands as the target architecture. +- Do not add daemon-backed projection/report CLI commands as the target architecture. - Do not rely on inherited Gobby-owned migrations as the standalone story. - Do not write `.gobby/project.json`, mutate `config_store`, or run `gcode invalidate`. - Do not add Graphify or any third-party graph product as a runtime dependency. -- Do not move code-symbol embedding generation to Rust in this plan. +- Do not move LLM-generated symbol summaries to Rust in this plan. ## P1: Core Boundary And Setup `kind: framing` -### 1.1 Create the gobby-code graph library boundary [category: code] +### 1.1 Create the gobby-code projection library boundary [category: code] `kind: deliverable` -Targets: `crates/gcode/Cargo.toml`, `crates/gcode/src/lib.rs`, `crates/gcode/src/main.rs`, `crates/gcode/src/commands/graph.rs`, `crates/gcode/src/falkor.rs` +Targets: `crates/gcode/Cargo.toml`, `crates/gcode/src/lib.rs`, `crates/gcode/src/main.rs`, `crates/gcode/src/commands/graph.rs`, `crates/gcode/src/commands/vector.rs`, `crates/gcode/src/falkor.rs`, `crates/gcode/src/search/semantic.rs` -Add a library target for `gobby-code` and move code-specific graph behavior behind modules callable from both the CLI and a future Rust daemon. The CLI keeps the existing binary surface, but implementation entry points should be library functions with input structs and serializable output structs. +Add a library target for `gobby-code` and move code-specific projection behavior behind modules callable from both the CLI and a future Rust daemon. 
The CLI keeps the existing binary surface, but implementation entry points should be library functions with input structs and serializable output structs. -Shared context/config, setup contracts, datastore adapters, generic indexing/search primitives, and degradation contracts come from `gobby-core`. `gobby-code` owns code graph API types, code fact models, graph writes/reads, lifecycle commands, reports, and code-specific search boosts. +Shared context/config, setup contracts, datastore adapters, generic indexing/search primitives, and degradation contracts come from `gobby-core`. `gobby-code` owns code fact models, graph projection APIs, vector projection APIs, lifecycle commands, reports, and code-specific search boosts. Initial module shape: - `crates/gcode/src/lib.rs` exports reusable modules. +- `index::api` owns code-fact write APIs for files, symbols, imports, calls, unresolved targets, and content chunks, callable independent of CLI types. Detailed contract lives in §1.4. - `graph::typed_query` owns safe FalkorDB parameter rendering. -- `graph::code_graph` owns code graph writes, reads, and lifecycle operations. +- `graph::code_graph` owns FalkorDB `gobby_code` graph projection writes, reads, and lifecycle operations. +- `vector::code_symbols` owns embedding requests, Qdrant collection ensure/upsert/delete/clear/rebuild, and lifecycle operations for `code_symbols_`. +- `projection::sync` coordinates graph/vector sync status after PostgreSQL code fact writes. - `graph::report` owns project graph report generation. - `setup` integrates explicit standalone setup through `gobby-core` contracts. - `schema` keeps gcode-specific attached-mode validation. @@ -60,20 +80,44 @@ Initial module shape: **Acceptance:** - 1.1.1 - `gobby-code` builds as both a library and `gcode` binary. file: `crates/gcode/Cargo.toml`, `crates/gcode/src/lib.rs`. -- 1.1.2 - `main.rs` and `commands/*` call library APIs rather than owning graph business logic. 
file: `crates/gcode/src/main.rs`, `crates/gcode/src/commands/graph.rs`. -- 1.1.3 - Library APIs avoid CLI-only types in public input/output contracts. test: `crates/gcode/src/lib.rs::tests::public_graph_api_is_cli_independent`. -- 1.1.4 - Phase 7 compatibility surface in `falkor.rs` remains available. test: `gobby/tests/code_index/test_gcode_phase7_contract.py`. +- 1.1.2 - `main.rs` and `commands/*` call library APIs rather than owning projection business logic. file: `crates/gcode/src/main.rs`, `crates/gcode/src/commands/graph.rs`, `crates/gcode/src/commands/vector.rs`. +- 1.1.3 - Library APIs avoid CLI-only types in public input/output contracts. test: `crates/gcode/src/lib.rs::tests::public_projection_api_is_cli_independent`. +- 1.1.4 - Phase 7 compatibility surface in `falkor.rs` remains available (file exists, is not a pure re-export, and exposes the basic facade symbols `FalkorClient`, `with_falkor` referenced by downstream gcode modules). The deep source-inspection contract that the Gobby-repo Phase 7 test asserts is pinned by §1.5.11 / §1.5.12 / §1.5.13 / §1.5.14 / §1.5.15 / §1.5.16 once §1.5 lands. test: `crates/gcode/src/lib.rs::tests::falkor_facade_is_available`, test: `gobby/tests/code_index/test_gcode_phase7_contract.py`. ### 1.2 Add explicit standalone setup [category: code] (depends: 1.1) `kind: deliverable` -Targets: `crates/gcode/src/schema.rs`, `crates/gcode/src/setup.rs` +Targets: `crates/gcode/src/schema.rs`, `crates/gcode/src/setup.rs`, `crates/gcode/src/commands/setup.rs`, `crates/gcode/src/commands/mod.rs`, `crates/gcode/src/main.rs` Separate attached-mode validation from standalone setup: - Attached mode validates the Gobby hub schema, `pg_search`, and BM25 indexes without creating or migrating Gobby-owned objects. 
-- Standalone setup is an explicit user action, for example `gcode setup --standalone`, that creates only the minimal gcode-owned app schema needed for indexing, graph sync state, and search in a selected database/schema namespace.
+- Standalone setup is an explicit user action invoked through `gcode setup --standalone [--database-url ...] [--schema ...]` that creates only the minimal gcode-owned app schema needed for indexing, graph/vector sync state, and search in a selected database/schema namespace.
 - Runtime commands fail with clear setup guidance when prerequisites are missing.

+CLI surface:
+
+- `crates/gcode/src/main.rs` defines the `setup` subcommand with `--standalone` (required for write actions in v1), `--database-url`, and `--schema` flags, and routes execution to `commands::setup::run`.
+- `crates/gcode/src/commands/setup.rs` parses the resolved arguments, builds a `setup::StandaloneSetupRequest`, calls the library API in `setup.rs`, and formats output through `output::print_json` / `output::print_text`.
+- `crates/gcode/src/commands/mod.rs` exposes the new `setup` module to the binary.
+
+**Early-dispatch requirement**:
+
+`gcode setup --standalone` must dispatch from `main.rs` in the early-dispatch block (alongside `Init`, `Projects`, and `Prune`) **before** `Context::resolve()` is called. The current CLI runs context resolution only for commands that require a resolved project root, PostgreSQL DSN, and validated schema; setup is the command that creates those prerequisites for standalone mode, so it cannot depend on them existing. The dispatch site reads `--database-url` and `--schema` directly from the parsed clap struct, constructs a `setup::StandaloneSetupRequest`, and invokes `commands::setup::run(request)` without touching `Context::resolve()`.
Running the command in a project lacking normal gcode context (no `.gobby/project.json`, no resolvable bootstrap PostgreSQL DSN) must succeed when the user supplies an explicit `--database-url` and `--schema`.
+
+**Foundation contract requirement**:
+
+`crates/gcode/src/setup.rs` performs standalone schema/DDL work by implementing the foundation-defined `gobby_core::setup::StandaloneSetup` trait. The foundation plan (`.gobby/plans/gcore-rust-foundation.md` §1.4) defines the trait as `namespace(&self) -> &str`, `owned_objects(&self) -> Vec<OwnedObject>`, and `create(&self, ctx: &mut SetupContext<'_>) -> Result<SetupReport, SetupError>`; `gobby-core` deliberately knows nothing about gcode-owned tables, columns, or BM25 index DDL, so gcode-owned DDL strings live inside gcode's creator callbacks rather than inside `gobby-core`.
+
+Concretely, `crates/gcode/src/setup.rs` defines a struct (for example `GcodeStandaloneSetup`) implementing `gobby_core::setup::StandaloneSetup` with:
+
+- `namespace()` returning the gcode-owned namespace string (for example `"gcode"`).
+- `owned_objects()` returning a `Vec<OwnedObject>` enumerating every gcode-owned standalone resource (indexed-files table, symbols table, content-chunks table, sync-state tables, BM25 indexes, etc.). Each `OwnedObject` carries its human-readable `name`, `store: StoreKind::Postgres`, and a `creator: Box<dyn Fn(&mut SetupContext<'_>) -> Result<(), SetupError>>` closure that owns the literal `CREATE TABLE`/`CREATE INDEX`/`CREATE EXTENSION` strings for that resource.
+- `create(ctx)` walks the declared `owned_objects()` list and invokes each creator against the supplied `gobby_core::setup::SetupContext` (which exposes `pg: Option<&mut postgres::Client>` for DDL execution), returning a `SetupReport` summarising created/skipped/failed objects.
+
+All gcode-owned DDL strings live in gcode creator closures; `gobby-core` is the contract owner (trait definition, `SetupContext`, `OwnedObject`, `SetupReport`, `SetupError`, `StoreKind`) but does not contain gcode domain DDL.
The foundation contract's `SetupContext` is the only handle through which gcode standalone DDL touches the PostgreSQL connection — gcode does not open its own raw connections or issue DDL outside the creator-callback path. + +Standalone-only: the implementation MUST refuse to declare or execute any DDL that touches Gobby-owned tables, the `config_store` table, or the `.gobby/project.json` file. The acceptance test enumerates the gcode-owned object names and asserts the namespace plus this exclusion list explicitly. + Standalone setup must not write `.gobby/project.json`, `config_store`, Gobby migrations, or daemon-owned metadata. It may create only gcode-owned objects after explicit opt-in. **Acceptance:** @@ -82,8 +126,12 @@ Standalone setup must not write `.gobby/project.json`, `config_store`, Gobby mig - 1.2.2 - Standalone setup is implemented in a separate module from runtime validation. file: `crates/gcode/src/setup.rs`. - 1.2.3 - Missing standalone prerequisites produce an actionable error instead of implicit creation. test: `crates/gcode/src/schema.rs::tests::missing_schema_requires_setup`. - 1.2.4 - Standalone setup creates only gcode-owned objects and never touches `config_store` or `.gobby/project.json`. test: `crates/gcode/src/setup.rs::tests::standalone_setup_is_scoped`. +- 1.2.5 - `gcode setup --standalone [--database-url ...] [--schema ...]` parses via clap and dispatches to `commands::setup::run`. test: `crates/gcode/src/main.rs::tests::parse_setup_standalone`. +- 1.2.6 - `gcode setup --standalone` executes the library setup API end-to-end against the selected standalone database/schema namespace without touching `.gobby/project.json`, `config_store`, or daemon-owned metadata. test: `crates/gcode/src/commands/setup.rs::tests::standalone_command_is_scoped`. 
+- 1.2.7 - `gcode setup --standalone` dispatches in the early-dispatch block before `Context::resolve()` (alongside `Init`, `Projects`, and `Prune`), and the command runs successfully with an explicit `--database-url` plus `--schema` in a directory lacking `.gobby/project.json` or a resolvable bootstrap PostgreSQL DSN. test: `crates/gcode/src/main.rs::tests::setup_runs_before_context_resolve`. +- 1.2.8 - `crates/gcode/src/setup.rs` defines a struct (for example `GcodeStandaloneSetup`) that implements `gobby_core::setup::StandaloneSetup`; its `namespace()` returns a gcode-owned string (for example `"gcode"`), `owned_objects()` enumerates every gcode-owned standalone resource (indexed-files, symbols, content chunks, sync-state, BM25 indexes) as `OwnedObject` entries whose `creator` closures own the literal `CREATE TABLE`/`CREATE INDEX`/`CREATE EXTENSION` strings, and the declared object list refuses to include Gobby-owned tables, `config_store`, or `.gobby/project.json`. The `create` implementation executes the creator closures against the foundation-supplied `gobby_core::setup::SetupContext`; gcode does not bypass `SetupContext` to open raw PostgreSQL connections or issue DDL outside the creator-callback path. test: `crates/gcode/src/setup.rs::tests::standalone_setup_uses_gobby_core_contract`. -### 1.3 Add safe typed FalkorDB query rendering [category: code] (depends: 1.1) +### 1.3 Add safe typed FalkorDB query rendering [category: code] (depends: 1.1, 1.5) `kind: deliverable` Targets: `crates/gcode/src/graph/typed_query.rs`, `crates/gcode/src/falkor.rs` @@ -103,12 +151,221 @@ Rules: - 1.3.2 - Invalid identifiers, control characters, and NaN/Inf values return typed errors. test: `crates/gcode/src/graph/typed_query.rs::tests`. - 1.3.3 - The wrapper reuses the existing Falkor row conversion boundary. file: `crates/gcode/src/falkor.rs`. 
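The validation rules in §1.3 (invalid identifiers, control characters, and NaN/Inf values return typed errors) can be sketched as follows. This is an illustration under stated assumptions, not the `graph::typed_query` API: `ParamValue`, `RenderError`, and `render_param` are hypothetical names, and the `name=literal` output assumes parameters are passed in the `CYPHER name=value` prefix form that FalkorDB accepts ahead of the query text.

```rust
use std::fmt;

/// Hypothetical parameter value type; the real module may expose a different set.
#[derive(Debug, Clone, PartialEq)]
pub enum ParamValue {
    Str(String),
    Int(i64),
    Float(f64),
    Bool(bool),
}

/// Typed rendering failures; each variant carries the offending parameter name.
#[derive(Debug, Clone, PartialEq)]
pub enum RenderError {
    InvalidIdentifier(String),
    ControlCharacter(String),
    NonFiniteFloat(String),
}

impl fmt::Display for RenderError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            RenderError::InvalidIdentifier(n) => write!(f, "invalid identifier: {n}"),
            RenderError::ControlCharacter(n) => write!(f, "control character in parameter: {n}"),
            RenderError::NonFiniteFloat(n) => write!(f, "non-finite float in parameter: {n}"),
        }
    }
}

impl std::error::Error for RenderError {}

/// Identifiers are restricted to `[A-Za-z_][A-Za-z0-9_]*`.
fn is_valid_identifier(name: &str) -> bool {
    let mut chars = name.chars();
    match chars.next() {
        Some(c) if c.is_ascii_alphabetic() || c == '_' => {}
        _ => return false,
    }
    chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
}

/// Render one parameter as a `name=literal` fragment, rejecting unsafe input
/// instead of trying to sanitize it silently.
pub fn render_param(name: &str, value: &ParamValue) -> Result<String, RenderError> {
    if !is_valid_identifier(name) {
        return Err(RenderError::InvalidIdentifier(name.to_string()));
    }
    let literal = match value {
        ParamValue::Str(s) => {
            if s.chars().any(char::is_control) {
                return Err(RenderError::ControlCharacter(name.to_string()));
            }
            // Escape backslashes first, then single quotes.
            let escaped = s.replace('\\', "\\\\").replace('\'', "\\'");
            format!("'{escaped}'")
        }
        ParamValue::Int(i) => i.to_string(),
        ParamValue::Float(x) => {
            if !x.is_finite() {
                return Err(RenderError::NonFiniteFloat(name.to_string()));
            }
            // Debug formatting keeps a decimal point (1.0 renders as "1.0").
            format!("{x:?}")
        }
        ParamValue::Bool(b) => b.to_string(),
    };
    Ok(format!("{name}={literal}"))
}
```

Rejecting control characters and non-finite floats outright, rather than escaping or coercing them, keeps the renderer's behavior easy to pin down in the unit tests that 1.3.2 calls for.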
-## P2: Code Graph Core +### 1.4 Add reusable code-fact indexing library API [category: code] (depends: 1.1, 1.5) +`kind: deliverable` +Targets: `crates/gcode/src/lib.rs`, `crates/gcode/src/index/mod.rs`, `crates/gcode/src/index/indexer.rs`, `crates/gcode/src/commands/index.rs`, `crates/gcode/src/db.rs` + +Decompose the existing code-fact write path into a reusable library API so the future Rust daemon can link the same indexing surface that `gcode index` uses today. The library API owns PostgreSQL code-fact writes for files, symbols, imports, calls, unresolved targets, and content chunks. CLI parsing, output formatting, progress reporting, and freshness messaging stay in `commands/index.rs`. + +Library shape: + +- `index::api::index_files(IndexRequest, &Context) -> Result` is the public entry point. The function lives in `crates/gcode/src/index/indexer.rs` (or a sibling `api.rs` re-exported through `index::mod`) and is exported from `crates/gcode/src/lib.rs`. +- `IndexRequest` carries: project root, optional file/path filter, optional explicit file list, `full` versus incremental flag, `require_cpp_semantics`, `sync_projections` flag (consumed by §2.6), and other behavior toggles. It must not embed clap derive types or formatter handles. +- `IndexOutcome` is serializable via `serde` and exposes counts: `scanned_files`, `indexed_files`, `skipped_files`, `symbols_indexed`, `chunks_indexed`, plus per-step duration metadata where useful and a typed `degraded` field for partially completed runs. +- `commands/index.rs` parses CLI args, builds the request, calls the library API, and dispatches output through `output::print_json` / `output::print_text`. It must not contain inline PostgreSQL code-fact write logic, language parsing, or chunk assembly. +- `db.rs` exposes connection helpers used by both the library API and projection sync; library entry points must not bypass these helpers to access PostgreSQL directly. + +The library API owns code-fact writes only. 
Graph and vector projection sync is delegated to the projection modules defined in §2.4 and §2.5 via `projection::sync` (see §2.6); the indexing library API does not call FalkorDB or Qdrant directly. + +**Acceptance:** + +- 1.4.1 - A public `index::api::index_files` library function accepts an `IndexRequest` and returns a serializable `IndexOutcome` covering files, symbols, imports, calls, unresolved targets, and chunks. file: `crates/gcode/src/index/indexer.rs`, `crates/gcode/src/lib.rs`. +- 1.4.2 - `commands/index.rs` calls the library API and contains no inline PostgreSQL code-fact write logic, language parsing, or chunk assembly. file: `crates/gcode/src/commands/index.rs`. +- 1.4.3 - Library input/output structs avoid CLI-only types (no clap derive types, no `output::Format`, no formatter handles). test: `crates/gcode/src/index/indexer.rs::tests::library_api_is_cli_independent`. +- 1.4.4 - Files, symbols, imports, calls, unresolved targets, and chunks are all written through the library API and reflected in `IndexOutcome` counts. test: `crates/gcode/src/index/indexer.rs::tests::library_writes_all_code_facts`. + +### 1.5 Wire gcode to the gobby-core foundation [category: code] (depends: 1.1) +`kind: deliverable` +Targets: `crates/gcode/Cargo.toml`, `Cargo.lock`, `crates/gcode/src/lib.rs`, `crates/gcode/src/config.rs`, `crates/gcode/src/db.rs`, `crates/gcode/src/falkor.rs`, `crates/gcode/src/search/semantic.rs`, `crates/gcode/src/secrets.rs` + +Migrate `gobby-code` from its duplicated foundation plumbing to the shared `gobby-core` crate so the architectural commitment in O1/D1/A1/AC1 is enforced by code, not just by prose. The current `crates/gcode/src/config.rs` resolves `FalkorConfig`/`QdrantConfig`/`EmbeddingConfig` inline, `crates/gcode/src/db.rs` owns its own PostgreSQL connection helpers and config-store reads, `crates/gcode/src/falkor.rs` owns its own FalkorDB client and probe, and `crates/gcode/src/search/semantic.rs` issues raw Qdrant search REST. 
All four of these surfaces have direct counterparts in `gobby-core` (`gobby-core::config::resolve_*_config` + `CoreContext`, `gobby-core::postgres`, `gobby-core::falkor::with_graph` + `GraphClient`, `gobby-core::qdrant::with_qdrant` + `collection_name` + `search` + `upsert`). + +Cargo wiring: + +- `crates/gcode/Cargo.toml` declares the `gobby-core` dependency with the features this plan needs enabled: `postgres`, `falkor`, `qdrant`, `search`, `indexing` (or `full`). The enablement is unconditional in `[dependencies]` so the consumer migration compiles in both default and `--no-default-features` builds of `gobby-code`. + +Module migration: + +- `crates/gcode/src/config.rs` keeps `Context` building but resolves FalkorDB/Qdrant/embedding configs via `gobby_core::config::resolve_falkordb_config` / `resolve_qdrant_config` / `resolve_embedding_config` (or by composing `gobby_core::config::CoreContext`). `QdrantConfig` and `EmbeddingConfig` references in `gobby-code` become thin re-exports of the gobby-core types so existing call sites keep compiling. `FalkorConfig` cannot be a pure re-export because `gobby_core::config::FalkorConfig` exposes only connection-level fields (`host`, `port`, `password`) while the external Phase 7 contract test in the Gobby repo source-inspects `crates/gcode/src/config.rs` for a local `FalkorConfig { graph_name: String }`; see "Phase 7 compatibility wrapper" below for the explicit wrapper contract. The duplicated resolver bodies (env precedence over `config_store` over defaults, `decode_config_value`, JSON-null handling) are removed regardless. 
Code-specific projection settings that are not part of `gobby-core`'s connection/auth surface — for example the optional vector dimension override consumed by §2.5's code-symbol vector lifecycle — are added as sibling consumer-owned config types in `crates/gcode/src/config.rs` (such as `CodeVectorSettings { vector_dim: Option }`), resolved through the same `ConfigSource` adapter pipeline (env → `config_store` JSON-decoded → defaults), rather than extending the re-exported `gobby-core` types. `gobby-core::config::EmbeddingConfig` remains the connection/auth surface (`api_base`, `model`, `api_key`) and is not extended for code-specific projection metadata. +- `crates/gcode/src/db.rs` delegates `connect_readonly`, `connect_readwrite`, raw `config_store` reads, and any schema-validation plumbing to `gobby_core::postgres` adapters. `gobby-code` keeps only code-specific helpers on top of the shared adapter; duplicated PostgreSQL client/connect logic is removed. +- `crates/gcode/src/falkor.rs` retains its public facade for the external Phase 7 contract. Per the A1 "Phase 7 compatibility facade exception" bullet and the "Phase 7 compatibility wrapper" subsection below, `falkor.rs` resolves connection-level FalkorConfig fields (host, port, password) through `gobby_core::config::resolve_falkordb_config` via the §1.5 `ConfigSource` adapter, but owns the local Phase 7 `FalkorClient { graph: SyncGraph }` / `with_falkor` connection path — `FalkorClient::from_config` instantiates `falkordb::FalkorClientBuilder` directly to build the local `SyncGraph` because the foundation `gobby_core::falkor::GraphClient.graph` field is private and the foundation API exposes no public hook (no `into_sync_graph`, `from_graph_client`, or `with_graph_client` constructor) for building the Phase 7-required local shape. The `"gobby_code"` graph name remains consumer-supplied at every call site; no graph default leaks into gobby-core. 
The facade is an explicit compatibility wrapper, not a pure re-export — `falkor.rs` keeps the local `FalkorClient`, `from_config`, and `with_falkor` symbols that the Phase 7 test source-inspects. The single-file scope of this exception is pinned by §1.5.22; all other `gobby-code` graph consumers MUST enter Falkor through `gobby_core::falkor::with_graph` and MUST NOT instantiate `falkordb::FalkorClientBuilder` themselves. See "Phase 7 compatibility wrapper" below for the full wrapper contract. +- `crates/gcode/src/search/semantic.rs` calls `gobby_core::qdrant::with_qdrant`, `gobby_core::qdrant::collection_name(.., CollectionScope::Custom("code_symbols_"))`, and `gobby_core::qdrant::search` for the soft semantic-search path instead of issuing raw Qdrant REST calls. Embedding config absence remains consumer-owned: the search path checks `Option<&EmbeddingConfig>` and reports missing embedding via the shared degradation vocabulary before entering the Qdrant adapter. +- `crates/gcode/src/lib.rs` re-exports the foundation-bridged module surface used by the rest of this plan and hosts the regression test that asserts the consumer-migration invariants. +- `crates/gcode/src/secrets.rs` keeps the Fernet-backed `resolve_config_value` / `resolve_secret` helpers that the consumer adapter calls through. Secret-token decryption stays in `gobby-code` (Fernet crypto is not pulled into `gobby-core`); the adapter simply pipes the gobby-core decoded value through `secrets::resolve_config_value` before returning it. + +**Consumer adapter contract** (matches the foundation plan's `ConfigSource` trait): + +`crates/gcode/src/config.rs` defines a PostgreSQL-backed `ConfigSource` implementation owned by the consumer. The adapter wraps `&mut postgres::Client` and routes every config-store read through the shared decode pipeline plus the local secret-resolution helper: + +```rust,ignore +// Lives in crates/gcode/src/config.rs (or a sibling consumer adapter module). 
+// Implements gobby_core::config::ConfigSource for the attached-mode resolver. +struct PostgresConfigSource<'a> { + conn: &'a mut postgres::Client, +} + +impl gobby_core::config::ConfigSource for PostgresConfigSource<'_> { + fn config_value(&mut self, key: &str) -> Option<String> { + gobby_core::postgres::read_config_value(self.conn, key) + .ok() + .flatten() + .and_then(|raw| gobby_core::config::decode_config_value(&raw)) + } + + fn resolve_value(&mut self, value: &str) -> anyhow::Result<String> { + crate::secrets::resolve_config_value(value, self.conn) + } +} +``` + +`gobby-code` then calls `gobby_core::config::resolve_falkordb_config(&mut source)` / `resolve_qdrant_config(&mut source)` / `resolve_embedding_config(&mut source)` with that adapter in attached mode. Standalone / no-database paths pass `gobby_core::config::EnvOnlySource` instead, matching the foundation plan's contract. + +The adapter is the single boundary between gobby-code's Fernet-backed secret store and gobby-core's database-agnostic resolver. It preserves the existing four-step pipeline `env → config_store (JSON-decoded) → $secret:/${VAR} interpolation → defaults` exactly: + +- **Env precedence**: `resolve_*_config` checks env vars (`GOBBY_FALKORDB_HOST`, `GOBBY_QDRANT_URL`, `GOBBY_EMBEDDING_API_KEY`, etc.) before calling `ConfigSource.config_value`, so env overrides remain authoritative for standalone, tests, and diagnostics. +- **JSON decode**: `ConfigSource.config_value` always pipes raw `read_config_value` output through `decode_config_value`; a JSON-encoded value such as `"\"http://host:7474\""` is unwrapped to `http://host:7474`; JSON null returns `None` so missing values surface cleanly. +- **Secret resolution**: every config-store value still passes through `crate::secrets::resolve_config_value`, so `$secret:falkordb_password`, `$secret:qdrant_api_key`, and `$secret:embedding_api_key` continue to resolve from `gcode`-managed Fernet tokens.
`${VAR}` and `${VAR:-default}` interpolation also continues to work for non-secret env templates. + +**Phase 7 compatibility wrapper** (matches A1's Phase 7 contract gate and §1.1's compatibility-facade clause): + +The Gobby-repo Phase 7 contract test at `gobby/tests/code_index/test_gcode_phase7_contract.py` source-inspects `gobby-code` for a specific set of public symbols and field shapes. Until that external source-inspection contract is revised (see VS1 and DF1), `gobby-code` MUST preserve the following local shapes in `gobby-code` source — they cannot collapse into pure re-exports of `gobby_core` types: + +- `crates/gcode/src/config.rs` defines a local `pub struct FalkorConfig { pub host: String, pub port: u16, pub password: Option, pub graph_name: String }`. The `host`/`port`/`password` fields mirror `gobby_core::config::FalkorConfig` so connection-level data is sourced from `gobby_core::config::resolve_falkordb_config`. The `graph_name` field is gcode-owned and defaults to the `"gobby_code"` constant defined in `config.rs`. Config-key and env-var strings the Phase 7 test inspects (`GOBBY_FALKORDB_HOST`, `GOBBY_FALKORDB_PORT`, `GOBBY_FALKORDB_PASSWORD`, and the corresponding `config_store` keys) remain present in `config.rs` even though the resolver bodies are replaced by calls into `gobby_core`. +- `crates/gcode/src/falkor.rs` defines a local `pub struct FalkorClient { graph: SyncGraph }` plus `impl FalkorClient { pub fn from_config(config: &FalkorConfig) -> anyhow::Result }` and the free function `pub fn with_falkor(ctx: &Context, default: T, f: impl FnOnce(&mut FalkorClient) -> anyhow::Result) -> anyhow::Result`. The `falkordb::{FalkorClientBuilder, FalkorConnectionInfo, SyncGraph}` import chain remains visible in `falkor.rs` so the source-inspection contract resolves. 
**Facade exception**: because `gobby_core::falkor::GraphClient { graph: SyncGraph }` has a private `graph` field and `gobby_core::falkor::with_graph(..., |gc| ...)` exposes only `&mut GraphClient` to the closure, `falkor.rs` cannot construct the Phase 7-required `FalkorClient { graph: SyncGraph }` by unwrapping a `GraphClient`; the foundation adapter does not expose a public hook (no `into_sync_graph`, `from_graph_client`, or `with_graph_client` constructor). Until the foundation API gains such a hook or the external Phase 7 source-inspection contract is retired, `falkor.rs` is the single `gobby-code` source file allowed to instantiate `falkordb::FalkorClientBuilder` directly: `FalkorClient::from_config(config)` builds the local `SyncGraph` via the `FalkorClientBuilder` / `FalkorConnectionInfo` chain (preserving the `urlencoding::encode(password)` and `falkor://:{}@{}:{}` source fragments pinned by §1.5.14), and `with_falkor(ctx, default, f)` reads the resolved `FalkorConfig` from `ctx.falkordb`, builds a `FalkorClient` via `FalkorClient::from_config`, and invokes `f(&mut client)` against that local handle. Connection-level FalkorConfig fields (host, port, password) are still resolved through `gobby_core::config::resolve_falkordb_config` via the §1.5 `ConfigSource` adapter, so attached-mode `config_store` and `$secret:` resolution remain shared with all other consumers. All other `gobby-code` graph consumers (graph writes and reads owned by §2.2/§2.3, search graph boost, projection lifecycle code owned by §2.6, report generation owned by §3.1, and CLI command handlers owned by §2.4/§3.2) MUST enter Falkor through `gobby_core::falkor::with_graph` and MUST NOT instantiate `falkordb::FalkorClientBuilder` themselves. §1.5.22 pins this single-file scope. 
+- The `gobby_code` graph name is sourced from `FalkorConfig.graph_name` at every call site; the literal string `"gobby_code"` lives only in the `FALKORDB_GRAPH_NAME` constant in `config.rs` (gcode-owned default) and any necessary call-site wiring. No graph-name default leaks into `gobby_core`. +- `crates/gcode/src/falkor.rs` preserves the public read API that the external Phase 7 test source-inspects: + - `pub fn count_callers(ctx: &Context, symbol_id: &str) -> anyhow::Result` + - `pub fn count_usages(ctx: &Context, symbol_id: &str) -> anyhow::Result` + - `pub fn find_callers(ctx: &Context, symbol_id: &str, limit: usize, offset: usize) -> anyhow::Result>` + - `pub fn find_usages(ctx: &Context, symbol_id: &str, limit: usize, offset: usize) -> anyhow::Result>` + - `pub fn find_callers_batch(ctx: &Context, symbol_ids: &[String], limit: usize) -> anyhow::Result>>` + - `pub fn find_callees_batch(ctx: &Context, symbol_ids: &[String], limit: usize) -> anyhow::Result>>` + - `pub fn get_imports(ctx: &Context, file_path: &str) -> anyhow::Result>` + - `pub fn blast_radius(ctx: &Context, target: &BlastRadiusTarget, depth: usize, limit: usize) -> anyhow::Result>` + Each helper retains its sibling Cypher-builder function in the same file (`count_callers_query`, `count_usages_query`, `find_callers_query`, `find_usages_query`, `find_callers_batch_query`, `find_callees_batch_query`, `get_imports_query`, `blast_radius_query`), keeping the existing numeric clamping (for example `depth`/`limit`/`offset` upper bounds) and string-parameter escaping behavior. Internals MAY delegate to `graph::code_graph` once §2.3 lands so query construction has a single canonical owner (see §2.3.4), but the public signatures and the named `*_query` helpers MUST remain visible to compile-time and source-inspection assertions in `falkor.rs`. +- The following source fragments must remain visible in `crates/gcode/src/falkor.rs`. 
Per the facade exception above, the connection-building bodies (`FalkorClient::from_config`, `with_falkor`) own the local `falkordb::FalkorClientBuilder` / `FalkorConnectionInfo` / `SyncGraph` chain directly rather than delegating to `gobby_core::falkor::with_graph` / `gobby_core::falkor::GraphClient`; the read-helper query bodies internally delegate to `graph::code_graph` once §2.3 lands (per §2.3.4), and `graph::code_graph` itself enters Falkor through `gobby_core::falkor::with_graph`. The named fragments MUST remain visible in `falkor.rs` regardless of which delegation path the enclosing body follows: + - `urlencoding::encode(password)` — used when constructing the Falkor connection URL. + - The `falkor://:{}@{}:{}` URL shape literal in the connection-string builder. + - `.with_connection_info(conn_info)` on the `FalkorClientBuilder` chain. + - `.with_params(&` (for example `.with_params(&params)`) when issuing parameterized graph queries. + - `result.header` referenced when iterating the result set of a Falkor query. + - `FalkorValue::None` referenced when normalising row values. + - `let mut client =` — used to bind a mutable Falkor client handle before issuing query work. + - `ctx.falkordb` — read on the resolved `Context` to access the gcode-owned FalkorDB config struct. + These fragments are what the Gobby-repo Phase 7 test searches for as a proxy for "gcode still owns a local Falkor connection/query surface." Per the facade exception, the connection-building bodies retain the named fragments and own the `FalkorClientBuilder` / `FalkorConnectionInfo` / `SyncGraph` chain locally rather than delegating to `gobby_core::falkor::with_graph`; the read-helper query bodies retain the named fragments and delegate internally to `graph::code_graph` once §2.3 lands (per §2.3.4). The named source fragments above MUST NOT be erased from `falkor.rs`.
+- `crates/gcode/src/falkor.rs` retains the following query/row-handling surface, which the Phase 7 test also source-inspects: + - `pub type Row = HashMap` — a public type alias used by the row-handling helpers and the public read API, where `Value` is `serde_json::Value` (imported as `use serde_json::Value;` so the unqualified name `Value` appears at the public alias declaration). The Phase 7 test source-inspects for the exact substring `pub type Row = HashMap`, so the alias name must remain `Row`, the unqualified type `Value` must appear in the declaration (not `serde_json::Value`), and the alias must remain at the file's public surface. `FalkorValue` is the raw row type returned by the `falkordb` crate; it remains visible in `falkor.rs` (per §1.5.14) for the internal conversion helper and source-fragment checks, but the public `Row` alias is `HashMap` with `Value = serde_json::Value`, not `HashMap`. + - `pub fn query(&mut self, cypher: &str, params: Option>) -> anyhow::Result>` (or the equivalent signature the existing wrapper uses) on `impl FalkorClient` — the public Cypher entry point that the Phase 7 test asserts. Internals MAY delegate to `gobby_core::falkor::GraphClient::query` but the public method name, the `cypher: &str` parameter, the `Option>` params shape, and the `Vec` return type must remain visible at the public API. + - `fn parse_falkor_result(...)` — a private helper that converts `FalkorResult` rows into the public `Row` type, preserving null/value normalisation. The helper consumes `FalkorValue` rows from `falkordb` and produces `Row = HashMap` entries (where `Value = serde_json::Value`) via the internal `falkor_value_to_json` conversion. The Phase 7 test asserts this helper exists by name in `falkor.rs`. +- `crates/gcode/src/falkor.rs` retains the production-read-query helper and literal-fragment surface that the Phase 7 production-read-query test asserts. 
These are query-builder utilities and literal Cypher fragments that the existing `*_query` helpers compose; both the helper functions and the literal substrings must remain visible in `falkor.rs`: + - `fn cypher_string_literal(value: &str) -> String` — escapes and quotes a string for inline Cypher literal substitution. + - `fn id_list_literal(ids: &[String]) -> String` — renders a comma-separated list of quoted IDs for inline `IN [...]` clauses. + - `fn clamp_offset(offset: usize) -> usize` (or matching signature) — clamps the pagination offset to the defined upper bound and is consumed by `find_callers_query`, `find_usages_query`, and similar paginated helpers. + - The literal Cypher fragment `target:CodeSymbol OR target:UnresolvedCallee OR target:ExternalSymbol` — must appear verbatim inside the relevant `*_query` helper bodies (callers/usages production reads) so the union of allowed target labels is testable by source inspection. + - The literal `SKIP {offset} LIMIT {limit}` fragment — must appear verbatim inside the paginated `*_query` helper bodies (`find_callers_query`, `find_usages_query`) where pagination clamping is applied via `clamp_offset` and a clamped `limit`. + - The literal `target.id IN [{ids}]` fragment — must appear verbatim inside the batch helpers (`find_callers_batch_query`, `find_callees_batch_query`) where `{ids}` is the inline list rendered via `id_list_literal`. + - **Unbound-parameter ban**: the generated Cypher strings produced by these helpers MUST NOT contain `$offset`, `$limit`, or `$ids`. Pagination and ID-list values are substituted inline via `clamp_offset`, `cypher_string_literal`, and `id_list_literal`; they are not bound through `.with_params(...)`. The Phase 7 production-read-query test asserts both that the named literal fragments are present and that `$offset` / `$limit` / `$ids` do not appear in the produced query strings. 
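The inline-substitution rule for the production read queries can be illustrated with a minimal sketch. It is not the existing `falkor.rs` implementation: the `MAX_GRAPH_LIMIT` value, the `CALLS` relationship name, the returned column names, the helper signatures beyond those quoted above, and the backslash-based escaping are all assumptions made for illustration. Only the helper names, the clamping expressions, and the literal fragments come from this plan.

```rust
/// Hypothetical upper bound; the plan names MAX_GRAPH_LIMIT but not its value.
const MAX_GRAPH_LIMIT: usize = 1000;

/// Quotes a string for inline Cypher use (assumed escaping: backslash, then single quote).
fn cypher_string_literal(value: &str) -> String {
    let escaped = value.replace('\\', "\\\\").replace('\'', "\\'");
    format!("'{escaped}'")
}

/// Renders a comma-separated list of quoted IDs for an inline `IN [...]` clause.
fn id_list_literal(ids: &[String]) -> String {
    ids.iter()
        .map(|id| cypher_string_literal(id))
        .collect::<Vec<_>>()
        .join(", ")
}

/// Clamps the pagination offset to the shared upper bound.
fn clamp_offset(offset: usize) -> usize {
    offset.min(MAX_GRAPH_LIMIT)
}

/// Paginated builder: values are substituted inline, never bound as `$offset` / `$limit`.
fn find_callers_query(symbol_id: &str, limit: usize, offset: usize) -> String {
    let id = cypher_string_literal(symbol_id);
    let limit = limit.clamp(1, MAX_GRAPH_LIMIT);
    let offset = clamp_offset(offset);
    // The MATCH/RETURN shape below is illustrative only.
    format!(
        "MATCH (src:CodeSymbol)-[:CALLS]->(target) \
         WHERE (target:CodeSymbol OR target:UnresolvedCallee OR target:ExternalSymbol) \
         AND target.id = {id} \
         RETURN src.id AS caller_id SKIP {offset} LIMIT {limit}"
    )
}

/// Batch builder: the ID list is rendered inline instead of being bound as `$ids`.
fn find_callers_batch_query(symbol_ids: &[String], limit: usize) -> String {
    let ids = id_list_literal(symbol_ids);
    let limit = limit.clamp(1, MAX_GRAPH_LIMIT);
    format!(
        "MATCH (src:CodeSymbol)-[:CALLS]->(target) WHERE target.id IN [{ids}] \
         RETURN target.id AS target_id, src.id AS caller_id LIMIT {limit}"
    )
}
```

Because the format strings use captured identifiers, the fragments `SKIP {offset} LIMIT {limit}` and `target.id IN [{ids}]` appear verbatim in the source text while the produced query contains only concrete values, which is the combination the source-inspection and unbound-parameter assertions look for.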
+ +**Additional Phase 7 contract assertions** (mirror the full external test surface so local validation catches divergences before the cross-repo gate runs): + +The external Phase 7 test in `gobby/tests/code_index/test_gcode_phase7_contract.py` also asserts the following items. Until that external source-inspection contract is revised (see VS1 and DF1), `gobby-code` MUST preserve them locally so the §1.5 leaf cannot pass validation while the external gate still fails: + +- **Cargo manifest and lockfile state**: + - `crates/gcode/Cargo.toml` declares `[package].name = "gobby-code"` and the `[[bin]]` table includes `{ name = "gcode", path = "src/main.rs" }`. + - `[dependencies]` pins `falkordb = "0.2"` and `urlencoding = "2"` exactly (string equality, not range), and includes `base64` and `reqwest` (either form — direct version string or `{ version = "...", features = [...] }`). + - `Cargo.lock` at the workspace root contains packages named `falkordb` and `urlencoding`, and does NOT contain packages named `neo4j` or `neo4rs`. +- **`Context` struct, resolver invocation, and graph-name source pattern**: + - `crates/gcode/src/config.rs` declares the `Context` struct with the field `pub falkordb: Option` exactly. + - The literal expression `let falkordb = resolve_falkordb_config(` appears in `config.rs` (the resolver entry-point that builds the optional FalkorConfig the `Context` field carries). + - `FalkorConfig.graph_name` is populated via one of two patterns: either the inline literal `graph_name: "gobby_code".to_string()`, or the pair `const FALKORDB_GRAPH_NAME: &str = "gobby_code";` plus `graph_name: FALKORDB_GRAPH_NAME.to_string()`. The const-and-assignment pattern is the current canonical form and is preferred. 
+- **`config_store` key literals**: + - `crates/gcode/src/config.rs` contains the literal substrings `databases.falkordb.host`, `databases.falkordb.port`, and `databases.falkordb.requirepass` (these are the config-store keys the `PostgresConfigSource` adapter reads in attached mode). These appear in addition to the env-var literals (`GOBBY_FALKORDB_HOST`, `GOBBY_FALKORDB_PORT`, `GOBBY_FALKORDB_PASSWORD`) already pinned by §1.5.11. +- **Production-read-query clamping and additional literal fragments**: + - `crates/gcode/src/falkor.rs` production code (outside `#[cfg(test)]`) contains the numeric-clamping expressions `depth.clamp(1, 5)` (blast-radius depth clamp), `limit.clamp(1, MAX_GRAPH_LIMIT)` (generic limit clamp), and `offset.min(MAX_GRAPH_LIMIT)` (offset upper-bound clamp inside or adjacent to `clamp_offset`). + - `falkor.rs` production code contains the literal Cypher fragment `src.id IN [{ids}]` (used in batch helpers that filter by the source-side ID list) in addition to the `target.id IN [{ids}]` fragment already pinned by §1.5.16. + - `falkor.rs` production code contains the standalone literal Cypher fragment `LIMIT {limit}` (used by non-paginated helpers such as `blast_radius_query`/`get_imports_query`) in addition to the paginated `SKIP {offset} LIMIT {limit}` fragment already pinned by §1.5.16. + - `falkor.rs` exposes the function signature `fn blast_radius_query(depth: usize, limit: usize)` — the depth-and-limit Cypher builder for `blast_radius`. +- **Neo4j transition state (source-level absence branch)**: + - The external Phase 7 test's `_assert_neo4j_transition_state` helper accepts either a complete transitional `Neo4jConfig` shape (with `pub struct Neo4jConfig { ... }` in `config.rs`, `pub neo4j: Option` on `Context`, and `let neo4j = resolve_neo4j_config(` in `config.rs`) or source-level absence of every Neo4j artifact. 
This plan commits to the **source-level absence branch** because the current `gobby-code` source has no Neo4j references and FalkorDB is the only graph adapter going forward. + - `crates/gcode/src/config.rs` MUST NOT declare a `pub struct Neo4jConfig { ... }`; MUST NOT contain a `resolve_neo4j_config` function, free function, or any symbol named `resolve_neo4j_config`; and MUST NOT declare any struct field of the shape `pub neo4j: Option` on `Context` or any other struct in this file. The Cargo.lock state pinned by §1.5.17 (no `neo4j`/`neo4rs` packages) is a separate dependency-side assertion; this bullet pins the source-side absence the external `_assert_neo4j_transition_state` helper checks against `config.rs` directly. + - If a future Neo4j transition reintroduces those fields, the wrapper must switch to satisfying the full transitional shape branch (re-add `Neo4jConfig`, `Context.neo4j`, and `resolve_neo4j_config`); the plan must be updated to pin the transitional shape before §1.5.21 is removed. + +The wrapper layer is the only place in `gobby-code` allowed to keep the duplicated symbol shapes that mirror `gobby_core::falkor::GraphClient` / `with_graph` and the only place allowed to instantiate `falkordb::FalkorClientBuilder` (per the facade exception above and the A1 "Phase 7 compatibility facade exception" bullet). All other code-graph consumers in `gobby-code` — the §2.2/§2.3/§2.4 writers, readers, and CLI commands plus §2.6 projection lifecycle code and §3.1 report generation — call `gobby_core::falkor::with_graph` directly; they do not call the wrapper or instantiate `falkordb::FalkorClientBuilder` themselves. + +Behavioral guarantees: + +- All FalkorDB ServiceState transitions in `gobby-code` graph consumers outside `crates/gcode/src/falkor.rs` enter through `gobby_core::falkor::with_graph`. 
`falkor.rs` itself owns the local Phase 7 facade connection path (`FalkorClient::from_config` and `with_falkor` instantiate `falkordb::FalkorClientBuilder` directly) per the A1 "Phase 7 compatibility facade exception" bullet and the §1.5 "Phase 7 compatibility wrapper" subsection; the single-file scope is pinned by §1.5.22. `gobby-code` does not implement its own four-state Falkor probe — the facade exception is limited to the local connection-building chain required by the external Phase 7 source-inspection contract. +- All non-lifecycle Qdrant ServiceState transitions enter through `gobby_core::qdrant::with_qdrant`; raw Qdrant REST is allowed only inside the §2.5 lifecycle exception scope (see A1). +- PostgreSQL connection plumbing flows through `gobby_core::postgres`; gobby-code does not duplicate `connect_readonly` / `connect_readwrite` bodies. +- `cargo build -p gobby-code` succeeds with default features and with `--no-default-features`, matching the workspace VS1 verification. +- Attached mode resolves FalkorDB, Qdrant, and embedding service settings from `config_store` plus `$secret:` resolution rather than from env-only paths or duplicated resolver bodies; standalone/tests use `EnvOnlySource` for the same call sites. + +**Acceptance:** + +- 1.5.1 - `crates/gcode/Cargo.toml` enables the required `gobby-core` features for the consumer migration: `postgres`, `falkor`, `qdrant`, `search`, and `indexing` (or the umbrella `full` feature). file: `crates/gcode/Cargo.toml`. +- 1.5.2 - `gobby-code` compiles with default features and with `--no-default-features` after the foundation wiring lands, with the gobby-core feature gates supplying the adapters used by `config.rs`, `db.rs`, `falkor.rs`, and `search/semantic.rs`. file: `crates/gcode/Cargo.toml`. 
+- 1.5.3 - `crates/gcode/src/config.rs` resolves FalkorDB, Qdrant, and embedding configs via `gobby_core::config::resolve_*_config` (or `gobby_core::config::CoreContext`) and contains no duplicated env-precedence/`config_store`/`decode_config_value` resolver bodies. `QdrantConfig` and `EmbeddingConfig` are thin re-exports of the gobby-core types; `FalkorConfig` remains a local compatibility wrapper per the §1.5 "Phase 7 compatibility wrapper" subsection. file: `crates/gcode/src/config.rs`. +- 1.5.4 - `crates/gcode/src/db.rs` delegates `connect_readonly`, `connect_readwrite`, and `config_store` reads to `gobby_core::postgres` adapters; no duplicated PostgreSQL client/connect/config-store logic remains. file: `crates/gcode/src/db.rs`. +- 1.5.5 - `crates/gcode/src/falkor.rs` keeps its public facade as an explicit compatibility wrapper (not a pure re-export) per the §1.5 "Phase 7 compatibility wrapper" subsection. The wrapper resolves connection-level FalkorConfig fields (host, port, password) through `gobby_core::config::resolve_falkordb_config` via the §1.5 `ConfigSource` adapter, and is allowed to instantiate `falkordb::FalkorClientBuilder` and own its local `SyncGraph` directly inside `falkor.rs` because the foundation `gobby_core::falkor::GraphClient.graph` field is private and cannot be unpacked into the `SyncGraph` the local Phase 7 facade `FalkorClient { graph: SyncGraph }` requires. The scope of this exception is pinned by §1.5.22; the `"gobby_code"` graph name remains consumer-supplied at every call site. file: `crates/gcode/src/falkor.rs`. +- 1.5.6 - `crates/gcode/src/search/semantic.rs` calls `gobby_core::qdrant::with_qdrant`, `gobby_core::qdrant::collection_name(.., CollectionScope::Custom(..))`, and `gobby_core::qdrant::search` for the soft semantic-search path instead of issuing raw Qdrant search REST. file: `crates/gcode/src/search/semantic.rs`. 
+- 1.5.7 - A consumer-migration regression test asserts that `gobby-code` config resolution, PostgreSQL connection plumbing, Falkor ServiceState boundaries, and non-lifecycle Qdrant operations route through `gobby_core` modules rather than duplicated `gobby-code` wrappers. test: `crates/gcode/src/lib.rs::tests::foundation_consumer_migration`. +- 1.5.8 - `crates/gcode/src/config.rs` defines a `PostgresConfigSource` (or equivalently named consumer adapter) that implements `gobby_core::config::ConfigSource`, reads via `gobby_core::postgres::read_config_value`, decodes via `gobby_core::config::decode_config_value`, and resolves `$secret:NAME` / `${VAR}` patterns via `crate::secrets::resolve_config_value`. Attached-mode callers pass this adapter to `resolve_*_config`; standalone/no-database call sites use `gobby_core::config::EnvOnlySource`. file: `crates/gcode/src/config.rs`, `crates/gcode/src/secrets.rs`. +- 1.5.9 - Env vars take precedence over `config_store` and JSON-encoded `config_store` values are decoded correctly through the adapter pipeline (string values unwrapped, arrays/objects re-serialized, JSON null returns `None`) for FalkorDB host/port/password, Qdrant URL/API key, and embedding URL/model/API key. test: `crates/gcode/src/config.rs::tests::adapter_env_precedence_and_json_decode`. +- 1.5.10 - `$secret:falkordb_password`, `$secret:qdrant_api_key`, and `$secret:embedding_api_key` stored in `config_store` still resolve through the adapter in attached mode via `crate::secrets::resolve_config_value`, yielding decrypted plaintext for the resulting `FalkorConfig.password`, `QdrantConfig.api_key`, and `EmbeddingConfig.api_key` fields. test: `crates/gcode/src/config.rs::tests::adapter_resolves_config_store_secrets`. 
+- 1.5.11 - `crates/gcode/src/config.rs` defines a local `pub struct FalkorConfig { pub host: String, pub port: u16, pub password: Option, pub graph_name: String }`; `graph_name` is populated from the gcode-owned `FALKORDB_GRAPH_NAME = "gobby_code"` constant; connection-level fields are sourced from `gobby_core::config::resolve_falkordb_config`. The `FalkorConfig { graph_name: String }` shape that the external Phase 7 contract test source-inspects is preserved. test: `crates/gcode/src/config.rs::tests::falkor_config_wrapper_shape`. +- 1.5.12 - `crates/gcode/src/falkor.rs` defines a local `pub struct FalkorClient { graph: SyncGraph }`, an `impl FalkorClient { pub fn from_config(config: &FalkorConfig) -> anyhow::Result }`, and a free function `pub fn with_falkor(ctx: &Context, default: T, f: impl FnOnce(&mut FalkorClient) -> anyhow::Result) -> anyhow::Result`; the `falkordb::{FalkorClientBuilder, FalkorConnectionInfo, SyncGraph}` import chain remains visible in `falkor.rs`. Internally `FalkorClient::from_config(config)` builds the local `SyncGraph` via the `FalkorClientBuilder` / `FalkorConnectionInfo` chain (because the foundation `gobby_core::falkor::GraphClient.graph` field is private and the Phase 7-required shape `FalkorClient { graph: SyncGraph }` cannot wrap the foundation type); `with_falkor(ctx, default, f)` reads the resolved `FalkorConfig` from `ctx.falkordb`, builds a `FalkorClient` through the same `FalkorClient::from_config` path, and runs the closure against that local handle. Connection-level FalkorConfig fields (host, port, password) still route through `gobby_core::config::resolve_falkordb_config` via the §1.5 `ConfigSource` adapter; the `gobby_code` graph name remains consumer-supplied. The single-file scope of this facade exception is pinned by §1.5.22. test: `crates/gcode/src/falkor.rs::tests::falkor_client_wrapper_shape`. 
+- 1.5.13 - `crates/gcode/src/falkor.rs` preserves the eight public read helpers (`count_callers`, `count_usages`, `find_callers`, `find_usages`, `find_callers_batch`, `find_callees_batch`, `get_imports`, `blast_radius`) and their sibling Cypher-builder helpers (`count_callers_query`, `count_usages_query`, `find_callers_query`, `find_usages_query`, `find_callers_batch_query`, `find_callees_batch_query`, `get_imports_query`, `blast_radius_query`) at the file's public surface. Internals may delegate to `graph::code_graph` reads (§2.3.4) or `gobby_core::falkor`, but the names and signatures listed in the §1.5 Phase 7 compatibility wrapper subsection remain visible to compile-time references and source-inspection assertions. test: `crates/gcode/src/falkor.rs::tests::phase7_read_helpers_visible`. +- 1.5.14 - `crates/gcode/src/falkor.rs` retains the source fragments the Gobby-repo Phase 7 test asserts: `urlencoding::encode(password)`, the `falkor://:{}@{}:{}` URL literal, `.with_connection_info(conn_info)`, `.with_params(&` (for example `.with_params(&params)`), `result.header`, `FalkorValue::None`, `let mut client =`, and `ctx.falkordb`. Wrapper internals may add `gobby_core::falkor` delegation alongside but must not erase the named fragments. test: `crates/gcode/src/falkor.rs::tests::phase7_source_fragments_visible`. +- 1.5.15 - `crates/gcode/src/falkor.rs` exposes a public `Row` type alias declared as `pub type Row = HashMap` where `Value` is `serde_json::Value` imported into scope (`use serde_json::{..., Value};`) so the unqualified name `Value` appears at the alias declaration site exactly as the Gobby-repo Phase 7 test source-inspects (it asserts the literal substring `pub type Row = HashMap`).
The file also exposes a public `query(&mut self, cypher: &str, params: Option>) -> anyhow::Result>` method on `FalkorClient` and a `parse_falkor_result` helper for converting Falkor result rows (`FalkorValue` entries from `falkordb`) into the public `Row` type via the internal `falkor_value_to_json` conversion. Wrapper internals may delegate to `gobby_core::falkor::GraphClient::query` but the public type alias declaration, the `query` method shape, and the named `parse_falkor_result` helper remain visible to source-inspection assertions; the local test reuses the same regex/literal-substring assertions the external Phase 7 test applies so the local and external gates stay aligned. test: `crates/gcode/src/falkor.rs::tests::phase7_query_surface_visible`. +- 1.5.16 - `crates/gcode/src/falkor.rs` retains the Cypher-builder helpers (`cypher_string_literal`, `id_list_literal`, `clamp_offset`) and literal fragments (`target:CodeSymbol OR target:UnresolvedCallee OR target:ExternalSymbol`, `SKIP {offset} LIMIT {limit}`, `target.id IN [{ids}]`) the Phase 7 production-read-query test asserts, and the Cypher strings produced by the public `*_query` helpers must not contain unbound `$offset`, `$limit`, or `$ids` parameters; pagination and ID-list values are substituted inline via `clamp_offset`, `cypher_string_literal`, and `id_list_literal`. test: `crates/gcode/src/falkor.rs::tests::phase7_query_helpers_and_literal_fragments_visible`. +- 1.5.17 - `crates/gcode/Cargo.toml` declares `[package].name = "gobby-code"` and a `[[bin]]` entry `{ name = "gcode", path = "src/main.rs" }`, and `[dependencies]` pins `falkordb = "0.2"` and `urlencoding = "2"` (string equality, not range) plus the `base64` and `reqwest` dependencies. The workspace-root `Cargo.lock` contains packages named `falkordb` and `urlencoding`, and does NOT contain packages named `neo4j` or `neo4rs`. file: `crates/gcode/Cargo.toml`, `Cargo.lock`. 
test: `crates/gcode/src/falkor.rs::tests::phase7_cargo_and_lockfile_state`. +- 1.5.18 - `crates/gcode/src/config.rs` declares the `Context` struct with the literal field `pub falkordb: Option`, contains the literal expression `let falkordb = resolve_falkordb_config(` at the resolver call site that populates that field, and populates `FalkorConfig.graph_name` either via the inline literal `graph_name: "gobby_code".to_string()` or via the pair `const FALKORDB_GRAPH_NAME: &str = "gobby_code";` plus `graph_name: FALKORDB_GRAPH_NAME.to_string()` (canonical form). test: `crates/gcode/src/config.rs::tests::phase7_context_and_falkor_resolver_visible`. +- 1.5.19 - `crates/gcode/src/config.rs` contains the literal `config_store` key strings `databases.falkordb.host`, `databases.falkordb.port`, and `databases.falkordb.requirepass` so the attached-mode `PostgresConfigSource` adapter reads them verbatim. These appear in addition to the env-var literals (`GOBBY_FALKORDB_HOST`, `GOBBY_FALKORDB_PORT`, `GOBBY_FALKORDB_PASSWORD`) already pinned by §1.5.11. test: `crates/gcode/src/config.rs::tests::phase7_falkordb_config_store_keys_visible`. +- 1.5.20 - `crates/gcode/src/falkor.rs` production code (outside `#[cfg(test)]`) contains the numeric-clamping expressions `depth.clamp(1, 5)`, `limit.clamp(1, MAX_GRAPH_LIMIT)`, and `offset.min(MAX_GRAPH_LIMIT)`; the additional literal Cypher fragments `src.id IN [{ids}]` and standalone `LIMIT {limit}`; and the function signature `fn blast_radius_query(depth: usize, limit: usize)`. These are pinned in addition to the existing fragments listed in §1.5.16. test: `crates/gcode/src/falkor.rs::tests::phase7_additional_query_fragments_visible`. +- 1.5.21 - `crates/gcode/src/config.rs` contains neither a `pub struct Neo4jConfig { ... }` declaration nor any function or symbol named `resolve_neo4j_config`, and no struct in `config.rs` declares a `pub neo4j: Option` field (including on `Context`). 
This matches the source-level-absence branch of the external Phase 7 `_assert_neo4j_transition_state` helper; the plan commits to absence because `gobby-code` has no transitional Neo4j artifacts and FalkorDB is the only graph adapter. If a future change reintroduces a Neo4j transition, the wrapper must switch to satisfying the full transitional shape branch and §1.5.21 must be updated before that change lands. test: `crates/gcode/src/config.rs::tests::phase7_neo4j_transition_state_absent`. +- 1.5.22 - `crates/gcode/src/falkor.rs` is the only `gobby-code` source file that instantiates `falkordb::FalkorClientBuilder` directly or bypasses `gobby_core::falkor::with_graph`'s ServiceState boundary; the exception exists because `gobby_core::falkor::GraphClient { graph: SyncGraph }` exposes a private `graph` field and provides no public hook (no `into_sync_graph`, `from_graph_client`, or `with_graph_client` constructor) for building the Phase 7-required local `FalkorClient { graph: SyncGraph }`. All other `gobby-code` graph consumers (graph writes/reads owned by §2.2/§2.3, search graph boost, projection lifecycle code owned by §2.6, report generation owned by §3.1, and CLI command handlers owned by §2.4/§3.2) enter Falkor through `gobby_core::falkor::with_graph` and do not import `falkordb::FalkorClientBuilder`. The narrowed single-file scope mirrors the A1 "Vector projection lifecycle exception" pattern and is documented in the A1 "Phase 7 compatibility facade exception" bullet. test: `crates/gcode/src/lib.rs::tests::falkor_facade_exception_scoped_to_falkor_rs`. 
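Several acceptance items above (1.5.14, 1.5.16, 1.5.20, 1.5.22) are source-inspection checks rather than behavioral tests. A minimal sketch of the two check shapes they imply follows. The function names `missing_fragments` and `files_violating_scope` are hypothetical, and the sketch takes file contents as in-memory strings; how the real tests load `gobby-code` sources (for example through `include_str!` or filesystem reads) is not specified by this plan.

```rust
use std::collections::BTreeMap;

/// Returns the fragments from `required` that do not occur in `source`.
/// An empty result means every pinned fragment is still visible.
fn missing_fragments<'a>(source: &str, required: &[&'a str]) -> Vec<&'a str> {
    required
        .iter()
        .copied()
        .filter(|fragment| !source.contains(*fragment))
        .collect()
}

/// Returns the paths, other than `allowed`, whose contents mention `needle`.
/// `sources` maps a file path to that file's full text.
fn files_violating_scope<'a>(
    sources: &'a BTreeMap<String, String>,
    needle: &str,
    allowed: &str,
) -> Vec<&'a str> {
    sources
        .iter()
        .filter(|(path, body)| path.as_str() != allowed && body.contains(needle))
        .map(|(path, _)| path.as_str())
        .collect()
}
```

A check in the spirit of 1.5.22 would call `files_violating_scope` with `"FalkorClientBuilder"` as the needle and the `falkor.rs` path as the single allowed file, then assert that the result is empty. Plain substring matching also flags comments and test modules, so a stricter test may need to strip `#[cfg(test)]` blocks first, as 1.5.20 already implies for production-only assertions.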
+ +### 1.6 Consume gobby-core generic indexing and search primitives [category: code] (depends: 1.4) +`kind: deliverable` +Targets: `crates/gcode/src/index/walker.rs`, `crates/gcode/src/index/hasher.rs`, `crates/gcode/src/index/chunker.rs`, `crates/gcode/src/search/rrf.rs`, `crates/gcode/src/commands/search.rs`, `crates/gcode/src/lib.rs` + +O1/D1/A1 require generic indexing and search primitives to be consumed from `gobby-core`, not duplicated inside `gobby-code`. The current `gobby-code` source still owns local copies of three foundation-eligible primitives: `crates/gcode/src/index/walker.rs` builds an `ignore::WalkBuilder` directly with the `hidden`/`git_ignore`/`git_global`/`git_exclude` chain that `gobby_core::indexing::WalkerSettings::into_walker` already encapsulates; `crates/gcode/src/index/hasher.rs::file_content_hash` duplicates the SHA-256 streaming body that `gobby_core::indexing::file_content_hash` already exposes (with the same 65_536-byte buffer shape); and `crates/gcode/src/search/rrf.rs::merge` duplicates the RRF fusion algorithm (with the same `RRF_K = 60.0` constant) that `gobby_core::search::rrf_merge` already provides. This deliverable consumes those overlapping primitives from `gobby-core` and explicitly narrows the boundary where overlap is partial. + +§1.5.1 already requires `crates/gcode/Cargo.toml` to enable the `indexing` and `search` features on `gobby-core` (or the umbrella `full` feature), so the foundation modules are reachable when this deliverable lands. No additional Cargo wiring is required. + +**Migrated primitives** (delegate to `gobby-core`): + +- **`crates/gcode/src/index/walker.rs`**: `discover_files` and `classify_file` build their `ignore::WalkBuilder` via `gobby_core::indexing::WalkerSettings::new(root).into_walker()` (or `try_into_walker` to surface invalid-glob errors) instead of constructing `ignore::WalkBuilder::new(root)` directly with the duplicated `git_ignore(true).git_global(true).git_exclude(true)` chain. 
gcode-specific classification (`FileClassification::Ast` vs `ContentOnly` via the `languages` module, security filters, secret-extension filters) wraps the gobby-core walker output but does not duplicate the gitignore/extra-ignore plumbing. The `MAX_FILE_SIZE = 10 * 1024 * 1024` constant moves into `WalkerSettings::max_filesize`. The duplicated `hidden(true)` toggle, if needed, is composed at the call site rather than reimplemented inside `walker.rs`. +- **`crates/gcode/src/index/hasher.rs`**: `file_content_hash(path: &Path)` delegates to `gobby_core::indexing::file_content_hash(path)`. The duplicated SHA-256 streaming body (`Sha256::new()`, 65_536-byte buffer loop, `format!("{:x}", hasher.finalize())`) is removed. The remaining `symbol_content_hash(source: &[u8], start: usize, end: usize)` helper stays gcode-owned because it operates on a byte range of an in-memory source slice, not a file path, and does not match the foundation primitive's `impl AsRef` shape; the foundation's `content_hash(data: &[u8])` could be used inside `symbol_content_hash` as an implementation detail, and this deliverable optionally delegates the inner SHA-256 step to it. +- **`crates/gcode/src/search/rrf.rs`**: `merge(sources: Vec<(&str, Vec)>) -> Vec` delegates to `gobby_core::search::rrf_merge` and removes the local `const RRF_K: f64 = 60.0;` declaration plus the duplicated fusion loop. The wrapper translates the resulting `Vec` into the existing `Vec<(String, f64, Vec)>` tuple shape that `crates/gcode/src/commands/search.rs` consumes today so call sites continue to compile without rewriting; alternatively, the wrapper returns `Vec` directly and the call sites in `commands/search.rs` are updated to read `.id` / `.score` / `.sources` instead of tuple positions. Either path satisfies the acceptance, and the chosen path is exercised by the named test. 
The within-source dedup semantics of `gobby_core::search::rrf_merge` (best/min rank) are preserved; the current `gobby-code` implementation overwrites with the last-seen rank, but in practice gcode's BM25/semantic/graph source lists do not contain duplicate IDs so the behavioral delta is bounded. +- **`crates/gcode/src/commands/search.rs`**: continues to call `crate::search::rrf::merge` (the wrapper). If the wrapper switches to returning `SearchResult` directly, the call sites in this file are updated to consume the struct shape (`.id`, `.score`, `.sources`) instead of tuple positions `.0`, `.1`, `.2`. No domain logic changes; the changes are purely shape translation. +- **`crates/gcode/src/lib.rs`**: hosts the regression test that asserts the migrated primitives route through `gobby_core` and that no duplicated foundation logic remains in `gobby-code`. + +**Narrowed primitives** (explicit non-overlap, stay gcode-owned): + +- **`crates/gcode/src/index/chunker.rs`**: remains gcode-owned. The current chunker produces line-based `ContentChunk` records (100-line chunks with 10-line overlap, populating `project_id`, `chunk_index`, `line_start`, `line_end`, `language`, and `created_at` fields for the BM25 content search index) and does not overlap with `gobby_core::indexing::Chunk` (byte-range chunks with opaque metadata and `ChunkIdentity`). Line-based BM25 chunking is a domain-specific projection primitive, not a foundation primitive. `crates/gcode/src/index/chunker.rs` MUST NOT import or compose `gobby_core::indexing::Chunk` / `ChunkIdentity` / `IndexEvent` / `index_events_from_hashes`. The narrowing is documented in the module's source comment (a `//!` doc-comment block above the chunker function describing why `Chunk`/`ChunkIdentity` are intentionally not consumed). Consumers of `ContentChunk` (the §1.4 indexing API, the PostgreSQL writer in §1.4, the BM25 search path) keep the existing gcode-specific shape. 
+- **`gobby_core::indexing::IndexEvent` and `index_events_from_hashes`**: not consumed by `gobby-code`. gcode tracks file change classification via PostgreSQL `indexed_files.content_hash` state — when the indexer reads PostgreSQL during incremental runs, "added"/"changed"/"unchanged"/"deleted" are derived by comparing on-disk hashes against the persisted PostgreSQL row, not via the in-memory `BTreeMap` snapshots that `index_events_from_hashes` consumes. The non-use is intentional; no acceptance change is required for the indexer beyond documenting the narrowing in §1.6's body. + +**Behavioral guarantees:** + +- After §1.6, no SHA-256 streaming body, no `WalkBuilder::new(root).hidden(...).git_ignore(...).git_global(...).git_exclude(...)` chain, and no `RRF_K = 60.0` constant remains duplicated in `gobby-code`. Each migrated primitive's body is a call into `gobby_core::indexing::*` or `gobby_core::search::*`. +- gcode-specific wrappers continue to expose the existing public API shapes (`discover_files`, `classify_file`, `file_content_hash(path)`, `symbol_content_hash`, `crate::search::rrf::merge`) so downstream gcode code is unaffected; the migration is internal. +- `cargo build -p gobby-code` and `cargo build -p gobby-code --no-default-features` continue to succeed (matching §1.5.2's existing assertion). + +**Acceptance:** + +- 1.6.1 - `crates/gcode/src/index/walker.rs::discover_files` builds its `ignore::WalkBuilder` via `gobby_core::indexing::WalkerSettings::new(root).into_walker()` (or `try_into_walker`). The duplicated `ignore::WalkBuilder::new(root).hidden(true).git_ignore(true).git_global(true).git_exclude(true)` chain is removed from `walker.rs`; gcode-specific classification (`FileClassification`, language detection, security/secret filters) still wraps the walker output. test: `crates/gcode/src/index/walker.rs::tests::walker_consumes_gobby_core_walker_settings`. 
+- 1.6.2 - `crates/gcode/src/index/hasher.rs::file_content_hash` is implemented as a delegation to `gobby_core::indexing::file_content_hash`; the duplicated `Sha256::new()` / 65_536-byte buffer loop / `format!("{:x}", hasher.finalize())` body is removed. `symbol_content_hash` remains gcode-owned because it operates on a byte range of an in-memory source. test: `crates/gcode/src/index/hasher.rs::tests::file_content_hash_delegates_to_gobby_core`. +- 1.6.3 - `crates/gcode/src/search/rrf.rs::merge` delegates to `gobby_core::search::rrf_merge` and removes the local `const RRF_K: f64 = 60.0;` declaration plus the duplicated fusion loop; the wrapper either translates `Vec` back into the existing `Vec<(String, f64, Vec)>` tuple shape (preserving call-site compatibility) or returns `SearchResult` directly with `commands/search.rs` updated to consume `.id`/`.score`/`.sources`. test: `crates/gcode/src/search/rrf.rs::tests::merge_delegates_to_gobby_core_rrf`. +- 1.6.4 - `crates/gcode/src/index/chunker.rs` remains gcode-owned and gcode-specific: the line-based `ContentChunk` type is the public chunk record; the file MUST NOT import `gobby_core::indexing::Chunk`, `gobby_core::indexing::ChunkIdentity`, `gobby_core::indexing::IndexEvent`, or `gobby_core::indexing::index_events_from_hashes`; the explicit narrowing is documented in the module's source comment block above `chunk_file_content`. test: `crates/gcode/src/index/chunker.rs::tests::chunker_stays_gcode_owned_with_documented_narrowing`. 
+- 1.6.5 - A regression test asserts that gcode's generic indexing and search primitives route through `gobby_core::indexing` / `gobby_core::search` rather than duplicating the foundation logic in `gobby-code`: `index/walker.rs` references `gobby_core::indexing::WalkerSettings`, `index/hasher.rs` references `gobby_core::indexing::file_content_hash`, `search/rrf.rs` references `gobby_core::search::rrf_merge`, and the narrowed boundary for `index/chunker.rs` (no `gobby_core::indexing::Chunk` import) is preserved. test: `crates/gcode/src/lib.rs::tests::indexing_search_primitive_migration`. + +## P2: Code Projection Core `kind: framing` ### 2.1 Define provenance and confidence metadata [category: code] (depends: 1.1) `kind: deliverable` -Targets: `crates/gcode/src/models.rs`, `crates/gcode/src/graph/code_graph.rs`, `crates/gcode/src/graph/report.rs` +Targets: `crates/gcode/src/models.rs`, `crates/gcode/src/graph/code_graph.rs`, `crates/gcode/src/graph/report.rs`, `crates/gcode/src/vector/code_symbols.rs` Add a shared graph metadata model: @@ -117,7 +374,7 @@ Add a shared graph metadata model: - `source_system`: producer name, such as `gcode`, `gobby-memory`, or `qdrant`. - Source details such as file path, line, symbol ID, or matching method where available. -Code-derived `CALLS`, `IMPORTS`, and `DEFINES` are always `EXTRACTED` with `source_system = "gcode"`. Memory-derived and code/memory bridge edges are `INFERRED` or `AMBIGUOUS` and remain memory-owned. +Code-derived `CALLS`, `IMPORTS`, and `DEFINES` are always `EXTRACTED` with `source_system = "gcode"`. Code-symbol vector payloads use `source_system = "gcode"` and may include symbol summary text only when already present. Memory-derived and code/memory bridge edges are `INFERRED` or `AMBIGUOUS` and remain memory-owned. 
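One possible shape for the metadata model described above is sketched here. The type and field names (`Provenance`, `EdgeMetadata`, `extracted_by_gcode`) are hypothetical, and the `confidence = 1.0` default for extracted edges is borrowed from the §2.5 vector payload rule rather than stated for edges in this section.

```rust
/// Hypothetical enum covering the three provenance values named in §2.1.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Provenance {
    Extracted,
    Inferred,
    Ambiguous,
}

impl Provenance {
    /// Wire form used in payloads, e.g. `provenance = "EXTRACTED"`.
    fn as_str(self) -> &'static str {
        match self {
            Provenance::Extracted => "EXTRACTED",
            Provenance::Inferred => "INFERRED",
            Provenance::Ambiguous => "AMBIGUOUS",
        }
    }
}

/// Hypothetical metadata record; the real model lives in `models.rs`.
#[derive(Debug, Clone, PartialEq)]
struct EdgeMetadata {
    provenance: Provenance,
    confidence: f64,
    source_system: String,
    file_path: Option<String>,
    line: Option<u32>,
}

impl EdgeMetadata {
    /// Code-derived `CALLS` / `IMPORTS` / `DEFINES` edges are always
    /// `EXTRACTED` with `source_system = "gcode"`. The 1.0 confidence
    /// is an assumption taken from the §2.5 payload rule.
    fn extracted_by_gcode(file_path: &str, line: u32) -> Self {
        EdgeMetadata {
            provenance: Provenance::Extracted,
            confidence: 1.0,
            source_system: "gcode".to_string(),
            file_path: Some(file_path.to_string()),
            line: Some(line),
        }
    }
}
```

Keeping the constructor as the only way to build code-derived metadata would make it hard for a writer to emit a `CALLS` edge tagged `INFERRED` by accident; memory-owned producers would use their own constructors.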
**Acceptance:** @@ -129,7 +386,7 @@ Code-derived `CALLS`, `IMPORTS`, and `DEFINES` are always `EXTRACTED` with `sour `kind: deliverable` Targets: `crates/gcode/src/graph/code_graph.rs`, `crates/gcode/src/models.rs` -Implement `CodeGraph` write APIs for deterministic code graph projection: +Implement `CodeGraph` write APIs for the deterministic FalkorDB `gobby_code` projection: - `ensure_file_node` - `add_imports` @@ -151,7 +408,7 @@ The write path preserves Python parity where IDs are externally visible. UUID5 g ### 2.3 Port code graph reads into the Rust core [category: code] (depends: 2.2) `kind: deliverable` -Targets: `crates/gcode/src/graph/code_graph.rs`, `crates/gcode/src/search/graph_boost.rs`, `crates/gcode/src/commands/graph.rs` +Targets: `crates/gcode/src/graph/code_graph.rs`, `crates/gcode/src/search/graph_boost.rs`, `crates/gcode/src/commands/graph.rs`, `crates/gcode/src/falkor.rs` Implement reusable read APIs that return stable graph payloads: @@ -163,13 +420,16 @@ Implement reusable read APIs that return stable graph payloads: These APIs require available FalkorDB for graph reads. Empty graphs are valid only for successful queries against an available graph service. +The existing public read helpers in `crates/gcode/src/falkor.rs` (`count_callers`, `count_usages`, `find_callers`, `find_usages`, `find_callers_batch`, `find_callees_batch`, `get_imports`, `blast_radius`) remain visible per the §1.5 Phase 7 compatibility wrapper. Their internal Falkor query work delegates to the new `graph::code_graph` read APIs after this section lands so query construction has a single canonical owner. 
The helpers' public signatures, sibling `*_query` Cypher builders, `Row`/`query`/`parse_falkor_result` surface, Cypher-builder helpers (`cypher_string_literal`, `id_list_literal`, `clamp_offset`), literal Cypher fragments (`target:CodeSymbol OR target:UnresolvedCallee OR target:ExternalSymbol`, `SKIP {offset} LIMIT {limit}`, `target.id IN [{ids}]`), the unbound-parameter ban for `$offset` / `$limit` / `$ids` in generated query strings, clamping behavior, and Phase 7 source-inspection fragments listed in §1.5 (including §1.5.13 / §1.5.14 / §1.5.15 / §1.5.16) remain unchanged. + **Acceptance:** - 2.3.1 - Read APIs return the existing node/link JSON shape with optional metadata on links. file: `crates/gcode/src/graph/code_graph.rs`. - 2.3.2 - Existing search graph boost behavior still handles missing graph config gracefully where search semantics allow degradation. test: `crates/gcode/src/search/graph_boost.rs::tests`. - 2.3.3 - Hard graph commands fail non-zero with typed errors when FalkorDB is unavailable. test: `crates/gcode/src/commands/graph.rs::tests::graph_reads_require_falkor`. +- 2.3.4 - Public `falkor.rs` read helpers (`count_callers`, `count_usages`, `find_callers`, `find_usages`, `find_callers_batch`, `find_callees_batch`, `get_imports`, `blast_radius`) delegate their internal Falkor query work to the new `graph::code_graph` read APIs while preserving public signatures, `*_query` siblings, clamping behavior, the `Row` / `query` / `parse_falkor_result` surface, the Cypher-builder helpers and literal fragments, and the unbound-parameter ban pinned by §1.5.13 / §1.5.14 / §1.5.15 / §1.5.16. test: `crates/gcode/src/falkor.rs::tests::read_helpers_delegate_to_code_graph`. 
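The unbound-parameter ban means generated query strings inline escaped literals instead of `$ids` / `$offset` / `$limit` placeholders. A rough sketch of that approach follows. The helper names `cypher_string_literal` and `id_list_literal` come from the plan, but their bodies, the backslash-escaping rules, and the `find_callers_batch_query` signature and Cypher text are assumptions for illustration only.

```rust
/// Hypothetical body: wraps a value in single quotes and escapes
/// backslashes and single quotes so it is safe to inline.
fn cypher_string_literal(value: &str) -> String {
    let mut out = String::with_capacity(value.len() + 2);
    out.push('\'');
    for ch in value.chars() {
        match ch {
            '\\' => out.push_str("\\\\"),
            '\'' => out.push_str("\\'"),
            other => out.push(other),
        }
    }
    out.push('\'');
    out
}

/// Hypothetical body: renders IDs as a comma-separated list of literals
/// suitable for the `target.id IN [{ids}]` fragment.
fn id_list_literal(ids: &[&str]) -> String {
    ids.iter()
        .map(|id| cypher_string_literal(id))
        .collect::<Vec<_>>()
        .join(", ")
}

/// Illustrative builder; the signature and surrounding Cypher are assumed.
/// Offset and limit are expected to be clamped by the caller.
fn find_callers_batch_query(ids: &[&str], offset: usize, limit: usize) -> String {
    let ids = id_list_literal(ids);
    format!(
        "MATCH (src)-[:CALLS]->(target) WHERE target.id IN [{ids}] \
         RETURN src.id, target.id SKIP {offset} LIMIT {limit}"
    )
}
```

Because every value is formatted into the string, a source-inspection test can assert that no `$` placeholder appears in any generated query.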
-### 2.4 Wrap core operations with gcode graph commands [category: code] (depends: 2.2, 2.3) +### 2.4 Wrap core operations with gcode graph commands [category: code] (depends: 1.2, 1.4, 2.2, 2.3) `kind: deliverable` Targets: `crates/gcode/src/commands/graph.rs`, `crates/gcode/src/db.rs`, `crates/gcode/src/main.rs`, `crates/gcode/tests/graph_standalone.rs` @@ -185,14 +445,122 @@ Add or rewire CLI commands so they call Rust core APIs directly: No daemon HTTP calls are allowed from these commands. Output uses the existing global `--format` flag with `output::print_json` and `output::print_text`. +The existing top-level read commands `gcode callers`, `gcode usages`, `gcode imports`, and `gcode blast-radius` remain available as **additive** thin wrappers over the new `graph::code_graph` read APIs introduced by §2.3 — they are not removed, renamed, or replaced by the new `gcode graph overview|file|neighbors|blast-radius` surface. Both surfaces stay available. Each top-level command keeps its current clap surface (positional query argument plus existing flags such as `--limit`, `--offset`, and `--depth` where applicable), routes through `commands::graph::callers` / `commands::graph::usages` / `commands::graph::imports` / `commands::graph::blast_radius`, and preserves its current JSON output shape (field names, payload structure, pagination metadata). New optional metadata fields introduced by §2.1 are added with `#[serde(skip_serializing_if = "Option::is_none")]` so existing JSON consumers continue to parse the responses without changes. The existing parse tests `test_parse_callers_remains_top_level`, `test_parse_usages_remains_top_level`, `test_parse_imports_remains_top_level`, and `test_parse_blast_radius_remains_top_level` in `crates/gcode/src/main.rs` remain green after this section lands. + **Acceptance:** - 2.4.1 - Lifecycle commands call `CodeGraph` directly and do not depend on the daemon process. file: `crates/gcode/src/commands/graph.rs`. 
- 2.4.2 - `sync-file`, `clear`, and `rebuild` update graph sync state in PostgreSQL consistently with existing daemon semantics. file: `crates/gcode/src/db.rs`. - 2.4.3 - Clap parsing covers all graph subcommands and global format handling. test: `crates/gcode/src/main.rs::tests::parse_graph_commands`. - 2.4.4 - A daemon-stopped smoke test covers `overview`, `file`, `neighbors`, `blast-radius`, `sync-file`, `clear`, and `rebuild` against PostgreSQL plus FalkorDB. test: `crates/gcode/tests/graph_standalone.rs`. +- 2.4.5 - Existing top-level `gcode callers`, `gcode usages`, `gcode imports`, and `gcode blast-radius` commands remain available as additive wrappers over `graph::code_graph` reads with their current clap surface (positional query plus `--limit`, `--offset`, `--depth` flags as applicable), and the parse tests `test_parse_callers_remains_top_level`, `test_parse_usages_remains_top_level`, `test_parse_imports_remains_top_level`, and `test_parse_blast_radius_remains_top_level` still pass. test: `crates/gcode/src/main.rs::tests::test_parse_callers_remains_top_level`. +- 2.4.6 - JSON output shape for top-level `gcode callers`, `gcode usages`, `gcode imports`, and `gcode blast-radius` (field names, payload structure, pagination metadata) stays compatible with current consumers; new optional metadata fields per §2.1 are tagged with `#[serde(skip_serializing_if = "Option::is_none")]` so existing parsers continue to accept the responses. test: `crates/gcode/src/commands/graph.rs::tests::top_level_read_commands_preserve_json_shape`. 
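The compatibility rule in 2.4.6 depends on absent metadata leaving the serialized payload byte-for-byte unchanged. The std-only sketch below imitates the wire-level effect of `#[serde(skip_serializing_if = "Option::is_none")]` with a hand-written serializer. It is a stand-in for illustration: the `LinkPayload` type and its field names are hypothetical, and production code would use the serde attribute named in the plan rather than manual string building.

```rust
/// Hypothetical link payload; field names are illustrative only.
struct LinkPayload {
    source: String,
    target: String,
    /// Optional §2.1 metadata; `None` must not change the output shape.
    provenance: Option<String>,
}

/// Hand-rolled stand-in for serde's skip-if-none behavior: the key is
/// omitted entirely when the value is `None`. Assumes the values are
/// plain identifiers that need no JSON escaping.
fn to_json(link: &LinkPayload) -> String {
    let mut fields = vec![
        format!("\"source\":\"{}\"", link.source),
        format!("\"target\":\"{}\"", link.target),
    ];
    if let Some(provenance) = &link.provenance {
        fields.push(format!("\"provenance\":\"{}\"", provenance));
    }
    format!("{{{}}}", fields.join(","))
}
```

A payload without metadata serializes exactly as it did before the field existed, so an existing consumer sees no new keys. A payload with metadata only appends a key, which a tolerant parser can ignore.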
+ +### 2.5 Port code-symbol vector projection into the Rust core [category: code] (depends: 1.1, 1.5, 2.1) +`kind: deliverable` +Targets: `crates/gcode/src/vector/code_symbols.rs`, `crates/gcode/src/search/semantic.rs`, `crates/gcode/src/config.rs`, `crates/gcode/src/commands/vector.rs`, `crates/gcode/tests/vector_projection.rs` + +Implement reusable vector projection APIs for Qdrant code-symbol collections: + +- Resolve embedding configuration through the standard Gobby config chain via `gobby_core::config::resolve_embedding_config` (env overrides first, then PostgreSQL `config_store`, then hardcoded defaults where a default is valid). Attached mode reaches `config_store` through Gobby bootstrap or the daemon database broker, not through a project `.env`. +- Generate code-symbol vectors by calling OpenAI-compatible `/v1/embeddings` directly from Rust. The daemon embedding service is not used for code-index projection sync. +- Ensure Qdrant collections using the existing `code_symbols_` names. The collection name is derived via `gobby_core::qdrant::collection_name(.., CollectionScope::Custom("code_symbols_"))` so existing collections are preserved verbatim without migration. +- Build vector payloads from indexed symbol facts: project ID, file path, symbol ID, name, kind, language, signature/docstring where present, source range, and optional existing summary text. Every payload also carries the §2.1 provenance metadata fields: `provenance = "EXTRACTED"`, `confidence = 1.0`, `source_system = "gcode"`, and source-detail fields (file path, source range/line, symbol ID) so vector hits can be distinguished from other vector producers in downstream agents and reports. +- Delete stale vectors by `project_id` plus `file_path` before file-level re-upsert. +- Upsert all current symbols for a file via `gobby_core::qdrant::upsert` after embedding succeeds. +- Clear project vectors and rebuild vectors from PostgreSQL code facts. 
+- Treat LLM-generated symbol summaries as optional enrichment; missing summaries must not block vector sync. + +**Foundation boundary for vector lifecycle** (matches A1's vector projection lifecycle exception): + +- `gobby_core::qdrant` exposes connection-level config (`QdrantConfig`), `with_qdrant` (the ServiceState boundary), `collection_name` (caller-controlled naming), `search`, and `upsert`. Collection lifecycle (ensure collection with vector params, delete-by-filter, clear/drop) is not part of the foundation surface and is consumer-owned. +- The ensure-collection, delete-by-filter, and clear-collection HTTP requests live only in `crates/gcode/src/vector/code_symbols.rs`. No other `gobby-code` file may issue raw Qdrant lifecycle REST. +- Even inside `vector/code_symbols.rs`, lifecycle operations resolve config through `gobby_core::config::resolve_qdrant_config`, enter the ServiceState/degradation boundary through `gobby_core::qdrant::with_qdrant`, and reuse `gobby_core::qdrant::collection_name(.., CollectionScope::Custom("code_symbols_"))` for naming. +- Non-lifecycle vector operations (upsert after embedding, soft search) call `gobby_core::qdrant::upsert` and `gobby_core::qdrant::search` directly. + +**Vector parameter handling for ensure/rebuild lifecycle**: + +Qdrant collection creation requires explicit vector parameters (`size` and `distance`), and OpenAI-compatible embedding models can return different dimensions (`text-embedding-3-small` returns 1536, `text-embedding-3-large` returns 3072, third-party models vary). Lifecycle operations in `vector::code_symbols` must derive these parameters deterministically and refuse to migrate or silently rebuild incompatible existing collections. + +Vector parameter rules: + +- **Distance metric is fixed at `Cosine`** for `code_symbols_` collections. 
The choice is documented in `crates/gcode/src/vector/code_symbols.rs` alongside the ensure-collection helper; it matches the existing Python daemon behavior so attached-mode collections remain compatible. +- **Vector size source order**: an explicit consumer-owned `vector_dim` setting wins when present. The setting lives on a `gobby-code`-owned sibling config type (`CodeVectorSettings { vector_dim: Option }` in `crates/gcode/src/config.rs`); it is **not** added to `gobby_core::config::EmbeddingConfig`. The value is resolved through the §1.5 `ConfigSource` adapter pipeline: env `GOBBY_EMBEDDING_VECTOR_DIM` first, then `config_store` key `embeddings.vector_dim` decoded via `gobby_core::config::decode_config_value` (JSON integers are accepted; JSON null returns `None`; invalid values return a typed config error). When no explicit value is present, the dimension is probed once by sending a fixed canary prompt (e.g. `"dimension_probe"`) to the configured embedding endpoint and using the response vector length. The probed value is cached on the `vector::code_symbols` lifecycle context for the remainder of the command so a single command does not pay the probe cost more than once. +- **Ensure semantics for missing collection**: `ensure_collection` issues `PUT /collections/` with `vector_size`, `distance: "Cosine"`, and any payload/index settings required for symbol filtering. On success the collection is ready for upsert. +- **Ensure semantics for compatible existing collection**: when the collection already exists and its reported `config.params.vectors.size` plus `config.params.vectors.distance` match the resolved parameters, `ensure_collection` is a no-op. Upsert and delete-by-filter proceed against the existing collection without recreation. 
+- **Ensure semantics for incompatible existing collection (no migration)**: when the collection already exists but its reported `size` or `distance` does not match the resolved parameters, hard lifecycle commands (`gcode vector sync-file`, `gcode vector rebuild`) fail with a typed `VectorLifecycleError::DimensionMismatch { collection, expected_size, found_size, expected_distance, found_distance }` carrying actionable text. The collection is **not** dropped, cleared, or recreated. `gcode vector clear` deletes all points but does not change the collection's vector schema; it surfaces the same dimension-mismatch error before issuing any destructive HTTP if the resolved schema is being asked to write incompatible vectors. +- **Soft search compatibility**: the soft search path in `search/semantic.rs` does not call `ensure_collection`; it reports missing-collection / dimension-mismatch responses from Qdrant via the shared degradation vocabulary rather than failing the entire `gcode search` invocation. + +**Acceptance:** -## P3: Report And Migration Surfaces +- 2.5.1 - The embedding client sends OpenAI-compatible requests and parses successful/error responses. test: `crates/gcode/src/vector/code_symbols.rs::tests::embedding_request_response`. +- 2.5.2 - Qdrant REST coverage proves ensure collection, delete-by-file, upsert, clear, and rebuild behavior. test: `crates/gcode/tests/vector_projection.rs`. +- 2.5.3 - Collection naming remains `code_symbols_` via `gobby_core::qdrant::collection_name(.., CollectionScope::Custom(..))` and does not require migration. test: `crates/gcode/src/vector/code_symbols.rs::tests::collection_name_compatibility`. +- 2.5.4 - Missing Qdrant or embedding config produces typed degradation for soft search paths and clear non-zero errors for hard vector lifecycle commands. test: `crates/gcode/src/commands/vector.rs::tests::vector_lifecycle_requires_config`. +- 2.5.5 - Missing symbol summaries do not block vector projection sync. 
test: `crates/gcode/src/vector/code_symbols.rs::tests::summaries_are_optional_enrichment`. +- 2.5.6 - Code-specific Qdrant lifecycle HTTP (ensure collection, delete-by-filter, clear) stays scoped to `crates/gcode/src/vector/code_symbols.rs` and is the only `gobby-code` source file that issues raw Qdrant lifecycle REST requests. test: `crates/gcode/src/vector/code_symbols.rs::tests::lifecycle_http_scoped_to_module`. +- 2.5.7 - Vector projection resolves Qdrant config via `gobby_core::config::resolve_qdrant_config`, enters the ServiceState boundary via `gobby_core::qdrant::with_qdrant`, derives collection names via `gobby_core::qdrant::collection_name(.., CollectionScope::Custom(..))`, and routes search/upsert through `gobby_core::qdrant::search` / `gobby_core::qdrant::upsert`; direct REST is limited to lifecycle ensure/delete-by-filter/clear. test: `crates/gcode/src/vector/code_symbols.rs::tests::routes_through_gobby_core_qdrant`. +- 2.5.8 - `ensure_collection` resolves vector size from the consumer-owned `CodeVectorSettings.vector_dim` (resolved through the §1.5 `ConfigSource` adapter pipeline: env `GOBBY_EMBEDDING_VECTOR_DIM` first, then `config_store` key `embeddings.vector_dim` JSON-decoded via `gobby_core::config::decode_config_value`) when set, otherwise probes the configured embedding endpoint exactly once per lifecycle command; distance is `Cosine`. The probed dimension matches the response vector length for the configured model. The consumer-owned setting does not extend `gobby_core::config::EmbeddingConfig`. test: `crates/gcode/src/vector/code_symbols.rs::tests::ensure_collection_resolves_vector_size_and_distance`, `crates/gcode/src/config.rs::tests::vector_dim_setting_resolves_env_and_config_store`. +- 2.5.9 - Ensuring a missing collection creates it via `PUT /collections/` with the resolved `vector_size` and `Cosine` distance; ensuring an existing collection whose `size`+`distance` match is a no-op (no destructive HTTP, no recreation). 
test: `crates/gcode/tests/vector_projection.rs::ensure_creates_missing_and_reuses_compatible`. +- 2.5.10 - Ensuring or rebuilding against an existing collection whose `size` or `distance` is incompatible with the resolved parameters fails with `VectorLifecycleError::DimensionMismatch { collection, expected_size, found_size, expected_distance, found_distance }`; no migration, drop, or recreation is performed and `clear` refuses incompatible destructive HTTP before issuing it. test: `crates/gcode/tests/vector_projection.rs::incompatible_existing_collection_errors_without_migration`. +- 2.5.11 - Vector payloads carry the §2.1 provenance metadata fields (`provenance = "EXTRACTED"`, `confidence = 1.0`, `source_system = "gcode"`, plus source-detail fields covering file path, source range, and symbol ID). Payloads round-trip through Qdrant upsert without losing these fields. test: `crates/gcode/src/vector/code_symbols.rs::tests::payloads_carry_provenance_metadata`. + +### 2.6 Add projection lifecycle orchestration commands [category: code] (depends: 1.2, 1.4, 2.4, 2.5) +`kind: deliverable` +Targets: `crates/gcode/src/main.rs`, `crates/gcode/src/commands/index.rs`, `crates/gcode/src/commands/vector.rs`, `crates/gcode/src/commands/mod.rs`, `crates/gcode/src/projection/sync.rs`, `crates/gcode/src/db.rs`, `crates/gcode/src/output.rs`, `crates/gcode/tests/projection_standalone.rs` + +Expose stable projection lifecycle surfaces for humans and Python transition shims: + +- `gcode graph sync-file --file ` +- `gcode graph clear` +- `gcode graph rebuild` +- `gcode vector sync-file --file ` +- `gcode vector clear` +- `gcode vector rebuild` +- `gcode index --sync-projections` + +`gcode index --sync-projections` writes PostgreSQL code facts via the §1.4 library API, then synchronously syncs graph and vector projections for the affected files through `projection::sync`. It is the daemon-triggered indexing path during migration. 
CLI JSON output is stable enough for Python shell-out shims: each projection reports `status`, `synced_files`, `synced_symbols`, `degraded`, and typed error details when available. + +Required JSON shape for `gcode index --sync-projections --format json`: + +```json +{ + "indexed_files": 12, + "skipped_files": 0, + "symbols_indexed": 348, + "chunks_indexed": 921, + "projections": { + "graph": { + "status": "ok | degraded | failed", + "synced_files": 12, + "synced_symbols": 348, + "degraded": false, + "error": null + }, + "vector": { + "status": "ok | degraded | failed", + "synced_files": 12, + "synced_symbols": 348, + "degraded": false, + "error": null + } + } +} +``` + +Hard lifecycle commands fail non-zero when their explicitly requested backing service is unavailable or misconfigured. Search/index paths that can return useful PostgreSQL-only results may return typed degradation instead, but they must make skipped projection work visible in JSON. Text-mode output for `gcode index --sync-projections` must go through `output::print_text` so shell-out consumers get a stable structured payload rather than free-form stderr writes. + +**Acceptance:** + +- 2.6.1 - Clap parsing covers graph/vector lifecycle commands and `gcode index --sync-projections`. test: `crates/gcode/src/main.rs::tests::parse_projection_lifecycle_commands`. +- 2.6.2 - `index --sync-projections` updates PostgreSQL sync state only after corresponding graph/vector sync succeeds. test: `crates/gcode/src/projection/sync.rs::tests::sync_state_tracks_projection_success`. +- 2.6.3 - JSON output for graph/vector lifecycle commands is stable and includes typed degradation/error fields. test: `crates/gcode/src/commands/vector.rs::tests::lifecycle_json_contract`. +- 2.6.4 - Daemon-stopped smoke tests cover graph plus vector lifecycle commands against PostgreSQL, FalkorDB, Qdrant, and a mock embedding endpoint. test: `crates/gcode/tests/projection_standalone.rs`. 
+- 2.6.5 - `gcode index --sync-projections --format json` emits indexing counts plus `projections.graph` and `projections.vector` objects with `status`, `synced_files`, `synced_symbols`, `degraded`, and optional `error` fields exactly matching the shape documented above. test: `crates/gcode/src/commands/index.rs::tests::sync_projections_json_contract`. +- 2.6.6 - `gcode index --sync-projections --format text` routes structured payload through `output::print_text` (no raw stderr-only status). test: `crates/gcode/src/commands/index.rs::tests::sync_projections_text_contract`. +- 2.6.7 - `crates/gcode/src/commands/mod.rs` exports the new `vector` command module via `pub mod vector;`, sequenced after §1.2's `pub mod setup;` edit so both command-module exports land in a single owner chain rather than racing on the same file. file: `crates/gcode/src/commands/mod.rs`. + +## P3: Report And Daemon Migration Surfaces `kind: framing` ### 3.1 Generate a project graph report in Rust core [category: code] (depends: 2.3) @@ -223,7 +591,7 @@ Keep v1 metrics simple and explainable. Do not add advanced community detection - 3.1.3 - Bridge edges are read-only and clearly marked as inferred. test: `crates/gcode/src/graph/report.rs::tests::bridge_edges_are_read_only`. - 3.1.4 - Missing optional bridge data does not fail a code-only report; missing required graph service fails with a typed error. test: `crates/gcode/src/graph/report.rs::tests::report_degradation_contract`. -### 3.2 Add gcode graph report CLI wrapper [category: code] (depends: 3.1) +### 3.2 Add gcode graph report CLI wrapper [category: code] (depends: 2.6, 3.1) `kind: deliverable` Targets: `crates/gcode/src/commands/graph.rs`, `crates/gcode/src/main.rs` @@ -236,7 +604,7 @@ Expose `gcode graph report --top-n ` as a thin wrapper over the Rust report A - 3.2.3 - Missing required graph services fail non-zero with a clear error and no fake empty report. 
test: `crates/gcode/src/commands/graph.rs::tests::report_requires_graph_service`. - 3.2.4 - Clap parsing proves `--format` remains global and report-specific args stay minimal. test: `crates/gcode/src/main.rs::tests::parse_graph_report_global_format`. -### 3.3 Document daemon migration contracts [category: docs] (depends: 2.4, 3.2) +### 3.3 Document daemon migration contracts [category: docs] (depends: 2.6, 3.2) `kind: deliverable` Target: `docs/guides/gcode-graph-core.md` @@ -244,19 +612,26 @@ Document the migration contract for Gobby daemon consumers: - Future Rust daemon links the library APIs directly. - Python daemon shims may temporarily shell out to `gcode` JSON commands. -- Python shims must treat graph/report failures as explicit degraded states. +- Python `CodeIndexTrigger` calls `gcode index --sync-projections` for daemon-triggered indexing. +- Python maintenance flows call Rust-owned `gcode graph clear|rebuild` and `gcode vector clear|rebuild`, or stable JSON wrapper functions around those commands. +- After parity, retire Python `CodeGraph`, graph/vector projection code in `sync_worker.py`, and projection lifecycle methods in `CodeIndexContext`. +- Python shims must treat projection/report failures as explicit degraded states. +- The daemon embedding service is bypassed for code-index projection sync; Rust calls OpenAI-compatible embedding endpoints directly. +- LLM-generated symbol summaries remain daemon-side and optional. - Memory services continue to own memory graph and `RELATES_TO_CODE` writes. - UI/MCP/HTTP surfaces belong in the daemon repo and should call daemon services, not become `gcode` responsibilities. **Acceptance:** - 3.3.1 - Daemon integration notes identify direct Rust linking as the target. file: `docs/guides/gcode-graph-core.md`. -- 3.3.2 - Transitional Python shell-out behavior is documented as temporary. file: `docs/guides/gcode-graph-core.md`. 
-- 3.3.3 - Ownership boundaries for code graph, memory graph, and bridge edges are explicit. file: `docs/guides/gcode-graph-core.md`. +- 3.3.2 - Transitional Python shell-out behavior names `CodeIndexTrigger`, `sync_worker.py`, and `CodeIndexContext` migration points. file: `docs/guides/gcode-graph-core.md`. +- 3.3.3 - Ownership boundaries for PostgreSQL code facts, FalkorDB graph projection, Qdrant vector projection, memory graph, and bridge edges are explicit. file: `docs/guides/gcode-graph-core.md`. +- 3.3.4 - Symbol summaries are documented as daemon-side optional enrichment. file: `docs/guides/gcode-graph-core.md`. -## Test Plan -`kind: framing` +## VS1: Verification +`kind: verification` +- `uv run gobby plans validate .gobby/plans/gcode-graph-enhancements.md` - `cargo build --workspace --no-default-features` - `cargo test -p gobby-code --no-default-features` - `cargo clippy -p gobby-code --no-default-features -- -D warnings` @@ -264,181 +639,342 @@ Document the migration contract for Gobby daemon consumers: - `cargo clippy --workspace -- -D warnings` - Phase 7 contract tests in the Gobby repo pass against the updated `gcode` binary. - FalkorDB integration tests are gated by `FALKORDB_HOST` and skip with a clear message when unavailable. -- Standalone smoke tests run with the daemon stopped against PostgreSQL plus FalkorDB. -- JSON compatibility tests prove current consumers can parse outputs with optional graph metadata. +- Mock embedding endpoint tests cover code-symbol embedding request and response handling. +- Qdrant REST tests cover ensure collection, delete-by-file, upsert, clear, and rebuild. +- Standalone smoke tests run with the daemon stopped against PostgreSQL, FalkorDB, Qdrant, and a mock embedding endpoint. 
+- `docs/guides/gcode-graph-core.md` documents the daemon migration contract: future Rust daemon links library APIs directly, transitional Python shims shell out to stable `gcode` JSON commands, and projection/report failures are explicit degraded states. The actual Python shim migration in the Gobby repo (consumer-side `CodeIndexTrigger` / `sync_worker.py` / `CodeIndexContext` rewrites plus corresponding transition tests) is deferred to the Gobby-repo task referenced from `DF1` and is not in scope for this plan's verification. +- Regression tests prove symbol summaries remain optional and do not block projection sync. +- JSON compatibility tests prove current consumers can parse outputs with optional projection metadata. -## Acceptance Criteria -`kind: framing` +## AC1: Acceptance Criteria +`kind: verification` -- `gobby-code` library APIs own code graph writes, reads, lifecycle, setup integration, and report generation. +- `gobby-code` library APIs own PostgreSQL code facts, graph/vector projection sync, lifecycle, setup integration, and report generation. - Shared foundation concerns route through `gobby-core`, not copied `gcode` utilities. -- `gcode` graph commands are CLI wrappers over library APIs. +- `gcode` graph and vector commands are CLI wrappers over library APIs. +- `gcode index --sync-projections` is available for daemon-triggered indexing. - Future Rust daemon can link the same code directly. -- Python daemon shell-outs, if used, are explicitly transitional. +- Python daemon shell-outs, if used, are explicitly transitional and expose stable JSON output. - Standalone mode has explicit setup and does not depend on inherited Gobby-owned migrations. - Attached mode remains non-destructive to Gobby-owned schema and files. - Code graph facts and memory graph facts keep separate ownership. +- Qdrant code-symbol collections keep existing `code_symbols_` names. +- Rust code-symbol embedding uses OpenAI-compatible embedding endpoints directly. 
+- LLM-generated symbol summaries remain daemon-owned optional enrichment. - Provenance/confidence metadata lets agents distinguish extracted code facts from inferred bridge/memory links. -- Graph/report degraded behavior is explicit and never masquerades as successful empty data. +- Projection/report degraded behavior is explicit and never masquerades as successful empty data. - Existing JSON consumers remain compatible. -## Plan Changelog -`kind: framing` +## DF1: Deferred Gobby-Repo Python Daemon Shim Transition +`kind: deferred` + +The actual Python shim migration in the Gobby repo — rewriting `CodeIndexTrigger`, `gobby/services/code_index/sync_worker.py`, and `gobby/services/code_index/context.py` (`CodeIndexContext`) to shell out to `gcode index --sync-projections`, `gcode graph clear|rebuild`, and `gcode vector clear|rebuild`; removing Python-side `CodeGraph`, graph/vector projection code paths, and projection lifecycle methods; and adding Gobby-repo transition tests proving the shims invoke `gcode` and stop instantiating Python projection code — is out of scope for this `gobby-cli` epic. This plan owns the `gcode` JSON contract (defined in §2.6) and gcode-side migration documentation (defined in §3.3) only. + +```yaml +task_ref: "#15147" +reason: "Python shim consumer work (CodeIndexTrigger / sync_worker.py / CodeIndexContext rewrites plus Gobby-repo transition tests) lives in the Gobby repository, not in gobby-cli. This plan's gcode-side scope is the stable JSON contract documented in §2.6 and the migration documentation in §3.3; actually rewriting Python consumers and the corresponding transition tests must happen in the Gobby repo against the gcode binary this plan produces." +owner: "gobby-daemon-team" +original_acceptance_items: + - item_id: DF1.1 + prose: "Update Python CodeIndexTrigger to shell out to gcode index --sync-projections and treat projection failures as explicit degraded states." 
+ artifact_kind: file + artifact_ref: "gobby/services/code_index/trigger.py" + - item_id: DF1.2 + prose: "Remove Python-side CodeGraph and graph/vector projection code from sync_worker.py; maintenance flows call gcode graph clear|rebuild and gcode vector clear|rebuild instead." + artifact_kind: file + artifact_ref: "gobby/services/code_index/sync_worker.py" + - item_id: DF1.3 + prose: "Remove projection lifecycle methods from CodeIndexContext and route them through stable gcode JSON commands." + artifact_kind: file + artifact_ref: "gobby/services/code_index/context.py" + - item_id: DF1.4 + prose: "Add Gobby-repo transition tests proving Python shims shell out to gcode and do not instantiate Python graph/vector projection code." + artifact_kind: test + artifact_ref: "gobby/tests/code_index/test_gcode_shim_transition.py" +``` + +Provenance label (must be applied to `#15147`): `deferred-from:gcode-graph-enhancements:DF1`. + +## V1 Plan Changelog +`kind: verification` - **R1-R12 (2026-05-24)**: Earlier iterations specified direct `gcode` ownership of graph writes/reads, route-shaped CLI commands, provenance metadata, graph lifecycle cleanup, report output, and Phase 7 compatibility constraints. - **R13 (2026-05-26)**: Reframed the plan around reusable Rust core/library boundaries with `gcode` as CLI wrapper; made future Rust daemon direct linking the target; limited Python daemon shell-outs to transitional shims; added explicit standalone setup and minimal app-schema creation; preserved provenance/confidence, code-vs-memory ownership, graph report, and degraded behavior concepts; removed stale daemon-backed CLI and inherited-migration framing. - **R14 (2026-05-26)**: Added dependency on `gobby-core` foundation plan; clarified that `gobby-code` owns code-specific graph APIs while shared context/config, setup, datastore, search/index primitives, and degradation contracts route through `gobby-core`. 
- -## Task Plan -`kind: framing` +- **R15 (2026-05-28)**: Reframed graph work as gcode-owned code projections: PostgreSQL code facts, FalkorDB `gobby_code`, and Qdrant `code_symbols_`. Moved code-symbol embedding generation into Rust through OpenAI-compatible endpoints, added vector lifecycle commands, added `gcode index --sync-projections`, and made Python daemon projection code transitional. Left LLM-generated symbol summaries daemon-side. +- **R16 (2026-05-28)**: Normalized framing/verification headings to contract grammar (`O1`, `D1`, `A1`, `N1`, `VS1`, `AC1`, `V1`); added explicit `**Plan ID:** gcode-graph-enhancements` header; added `D1: Dependent Plans` mirroring the foundation plan; promoted `## Task Plan` to `## M1 Task Manifest` with `kind: manifest`; rewrote coverage labels to `covers:gcode-graph-enhancements:
:` so the expansion contract resolves plan identity instead of `unknown`. +- **R17 (2026-05-28)**: Addressed Round 16 blocking findings. F1: added `gcode setup --standalone` CLI wiring to §1.2 with `commands/setup.rs` + `main.rs` targets and acceptance items 1.2.5/1.2.6 proving clap parsing and end-to-end command execution. F2: added new §1.4 deliverable for the reusable code-fact indexing library API (`index::api::index_files`/`IndexRequest`/`IndexOutcome`) decomposing PostgreSQL fact writes out of CLI modules, and threaded the dependency through §2.6. F3: added §2.6 JSON shape contract for `gcode index --sync-projections --format json` with acceptance items 2.6.5/2.6.6 covering JSON and text-mode output. Sweeps: added `vector/code_symbols.rs` to §2.1 targets (provenance applies to vector payloads), added `tests/projection_standalone.rs` and `output.rs` to §2.6 targets. Updated M1 manifest to include the §1.4 entry, new §2.6 dependency on §1.4, and expanded covers labels for §1.2/§2.6. +- **R18 (2026-05-28)**: Addressed Round 17 blocking findings on shared-file sequencing. F1 (§2.1 ↔ §2.5 sharing `crates/gcode/src/vector/code_symbols.rs`): added `2.1` to the §2.5 heading and M1 manifest `depends_on`, so the vector projection implementation waits for the provenance/source-system metadata contract that the vector payload path must carry. F2 (CLI/DB shared-file edits): added `1.4` to the §2.4 heading and M1 manifest `depends_on` (both touch `crates/gcode/src/db.rs`; §1.4 owns the reusable DB/helper boundary used by later projection sync work), and added `2.6` to the §3.2 heading and M1 manifest `depends_on` so the report CLI leaf runs after the graph/projection lifecycle CLI rewrites it shares `crates/gcode/src/main.rs` and `crates/gcode/src/commands/graph.rs` with. 
F2 sweep (whole-plan): re-checked every shared-file pair against the new dependency graph — `main.rs` chain is §1.1 → {§1.2, §1.4} → §2.4 → §2.6 → §3.2 (§1.2 vs §2.4 remain parallel siblings adding independent clap subcommand variants under §1.1's CLI structure; this matches the adversary's explicit scoping of the finding to runtime CLI rewrites §2.4/§2.6/§3.2 and is not flagged); `commands/graph.rs` chain is §1.1 → §2.3 → §2.4 → §3.2 (after F2 fix); `commands/vector.rs` chain is §1.1 → §2.5 → §2.6; `commands/index.rs` chain is §1.4 → §2.6; `db.rs` chain is §1.4 → §2.4 → §2.6 (after F2 fix); `vector/code_symbols.rs` chain is §2.1 → §2.5 (after F1 fix); `graph/code_graph.rs` chain is §2.1 → §2.2 → §2.3; `models.rs` chain is §2.1 → §2.2; `graph/report.rs` chain is §2.1 → §3.1; `falkor.rs` chain is §1.1 → §1.3; `search/semantic.rs` chain is §1.1 → §2.5. No section bodies, acceptance items, or coverage labels changed. +- **R19 (2026-05-28)**: Addressed Round 18 blocking findings. F1 (missing gobby-core consumer migration deliverable): added new §1.5 "Wire gcode to the gobby-core foundation" with targets `crates/gcode/Cargo.toml`, `crates/gcode/src/lib.rs`, `crates/gcode/src/config.rs`, `crates/gcode/src/db.rs`, `crates/gcode/src/falkor.rs`, `crates/gcode/src/search/semantic.rs`. Acceptance items 1.5.1–1.5.7 require Cargo.toml to enable `postgres`/`falkor`/`qdrant`/`search`/`indexing` (or `full`) features on gobby-core, both default and `--no-default-features` builds to succeed, config resolution to delegate to `gobby_core::config::resolve_*_config` / `CoreContext`, PostgreSQL plumbing to delegate to `gobby_core::postgres`, the Phase 7 `falkor.rs` facade to route internals through `gobby_core::falkor::with_graph` / `GraphClient`, the soft semantic-search path in `search/semantic.rs` to use `gobby_core::qdrant::with_qdrant` / `collection_name` / `search`, and a `lib::tests::foundation_consumer_migration` regression test to assert the migration. 
Threaded §1.5 as a dependency through §1.3 (shares `falkor.rs`), §1.4 (shares `db.rs` and `lib.rs`), and §2.5 (shares `search/semantic.rs` and `config.rs`). F2 (Qdrant lifecycle gap in gobby-core foundation surface): narrowed A1 with a "Vector projection lifecycle exception" bullet that allows code-specific Qdrant lifecycle HTTP (ensure collection, delete-by-filter, clear, rebuild) inside `crates/gcode/src/vector/code_symbols.rs` only, while requiring gobby-core for config (`resolve_qdrant_config`), ServiceState (`with_qdrant`), collection naming (`collection_name(.., CollectionScope::Custom(..))`), and non-lifecycle `search`/`upsert`. Added §2.5 acceptance items 2.5.6 (lifecycle HTTP scoped to `vector::code_symbols`) and 2.5.7 (routing through gobby-core for config/ServiceState/naming/search/upsert), and expanded §2.5 body with an explicit "Foundation boundary for vector lifecycle" subsection. Whole-plan sweeps: F1 sweep — re-verified every gobby-core consumer surface in framing has a deliverable owner; all FalkorDB/Qdrant/PostgreSQL plumbing anchors to §1.5, transitively reached by §1.3 (falkor.rs facade), §1.4 (db.rs helpers), §2.2/§2.3 (graph through `gobby_core::falkor::with_graph` via §1.3 → §1.5), §2.5 (vector through §1.5), and §2.4/§2.6 (CLI through §1.4 → §1.5). F2 sweep — re-verified every datastore adapter usage against the narrowed exception: §2.2/§2.3 graph ops use `gobby_core::falkor::with_graph` only; §2.6 lifecycle reuses §2.5/§2.4 lifecycle APIs and does not introduce new raw Qdrant HTTP outside `vector::code_symbols`; §3.1/§3.2 report paths do no Qdrant calls. Shared-file sequencing sweep (after §1.5): `Cargo.toml` chain is §1.1 → §1.5; `config.rs` chain is §1.5 → §2.5; `db.rs` chain is §1.5 → §1.4 → §2.4 → §2.6; `falkor.rs` chain is §1.1 → §1.5 → §1.3; `search/semantic.rs` chain is §1.1 → §1.5 → §2.5; `lib.rs` chain is §1.1 → §1.5 → §1.4 (both §1.4 and §1.5 add re-exports; §1.4 now depends on §1.5). 
M1 manifest updated: new §1.5 entry, refreshed §1.3/§1.4/§2.5 depends_on lists and validation criteria, and 2.5.6/2.5.7 covers labels appended. +- **R21 (2026-05-28)**: Addressed Round 20 blocking findings. F1 (bad-sequencing, §2.5 vs §1.5 and gobby-core foundation): chose the consumer-owned wrapper option from the adversary's suggested fix — vector dimension is now owned by a `gobby-code`-side sibling config type (`CodeVectorSettings { vector_dim: Option }` in `crates/gcode/src/config.rs`) rather than added to `gobby_core::config::EmbeddingConfig`. Updated §1.5 body to spell out that retained `EmbeddingConfig` references stay thin re-exports of the gobby-core type and that code-specific projection settings (such as `vector_dim`) live on sibling consumer-owned types resolved through the same §1.5 `ConfigSource` adapter pipeline. Updated §2.5 "Vector parameter handling" subsection to reference the consumer-owned setting and the `env → config_store JSON-decoded → defaults` resolution order. Updated acceptance 2.5.8 to reference `CodeVectorSettings.vector_dim` (not `EmbeddingConfig.vector_dim`) and added a second covering test `crates/gcode/src/config.rs::tests::vector_dim_setting_resolves_env_and_config_store`. No new deliverable was needed because the consumer-owned setting fits inside the existing §2.5 and §1.5 target inventories (`crates/gcode/src/config.rs` already targeted by both). F2 (weak-testability, §2.1 and §2.5): added provenance fields explicitly to the §2.5 vector payload list (`provenance = "EXTRACTED"`, `confidence = 1.0`, `source_system = "gcode"`, plus source-detail fields covering file path, source range, and symbol ID) and added new acceptance item 2.5.11 with covering test `crates/gcode/src/vector/code_symbols.rs::tests::payloads_carry_provenance_metadata` so the manifest covers labels and validation criteria pin the provenance contract on vector payloads. 
F3 (traceability, VS1 / §3.3): added new top-level `DF1: Deferred Gobby-Repo Python Daemon Shim Transition` section with typed `deferral` object pointing at open Gobby-repo task `#15147` (`Update daemon graph sync handoff after gcode sync-file contract`); narrowed VS1 to remove the Gobby-repo transition-test bullet that this `gobby-cli` epic cannot satisfy and replaced it with a documentation-scoped bullet plus an explicit pointer to DF1. §3.3 remains the docs-only deliverable that owns the migration contract narrative inside this plan. Whole-plan sweeps: F1 sweep — re-verified that no other deliverable claims `EmbeddingConfig.vector_dim` or adds new fields to gobby-core config types from gobby-code; all code-specific projection settings continue to live in `crates/gcode/src/config.rs` sibling types and resolve through the §1.5 adapter. F2 sweep — re-verified every projection payload writer pins provenance: graph edges already covered by 2.1.2 (`code_edges_carry_provenance`), bridge edges by 2.1.3 (`bridge_edges_are_hypotheses`), and now vector payloads by 2.5.11; no other projection producer is missing a provenance acceptance. F3 sweep — re-verified every VS1 bullet against deliverable coverage; remaining bullets all map to in-scope deliverables (foundation build under `--no-default-features` via §1.5, FalkorDB integration gating via §1.3/§2.2/§2.3, mock embedding tests via §2.5, Qdrant REST tests via §2.5/§2.6, standalone smoke tests via §2.4/§2.6, optional summaries via §2.5.5, JSON compatibility via §2.6/§3.2). M1 Task Manifest updated: §2.5 covers labels expanded to include `2.5.11` and validation_criteria expanded to invoke `vector::code_symbols::tests::payloads_carry_provenance_metadata` and `config::tests::vector_dim_setting_resolves_env_and_config_store`. Plan changelog R21 entry appended. +- **R20 (2026-05-28)**: Addressed Round 19 blocking findings. 
F1 (missing consumer `ConfigSource` adapter contract, O1/A1/D1/AC1 vs §1.5 and §2.5): added `crates/gcode/src/secrets.rs` to §1.5 targets; added a "Consumer adapter contract" subsection to §1.5 body specifying that `crates/gcode/src/config.rs` defines a PostgreSQL-backed `ConfigSource` implementation that wraps `&mut postgres::Client`, reads via `gobby_core::postgres::read_config_value`, decodes via `gobby_core::config::decode_config_value`, and resolves `$secret:NAME` / `${VAR}` patterns via `crate::secrets::resolve_config_value`; documented `EnvOnlySource` as the no-database baseline, and explicitly pinned the four-step pipeline `env → config_store (JSON-decoded) → $secret:/${VAR} interpolation → defaults`. Added three new acceptance items: 1.5.8 (adapter exists and uses the gobby-core decode pipeline plus `crate::secrets`), 1.5.9 (env precedence and JSON decode pipeline behave correctly for FalkorDB host/port/password, Qdrant URL/API key, embedding URL/model/API key with covering `crates/gcode/src/config.rs::tests::adapter_env_precedence_and_json_decode`), 1.5.10 (`$secret:` resolution still resolves FalkorDB password, Qdrant API key, and embedding API key in attached mode with `crates/gcode/src/config.rs::tests::adapter_resolves_config_store_secrets`). Added a behavioral guarantee stating attached mode resolves service settings from `config_store` plus `$secret:` resolution, not env-only paths. 
F2 (vector parameter handling for ensure/rebuild lifecycle, §2.5): added a "Vector parameter handling for ensure/rebuild lifecycle" subsection to §2.5 body specifying distance is fixed `Cosine`, vector size source order is explicit `EmbeddingConfig.vector_dim` then one-time per-command probe of the configured embedding endpoint, ensure-collection semantics for missing/compatible/incompatible existing collections, the typed `VectorLifecycleError::DimensionMismatch` (no migration, drop, or recreation), and that soft search reports missing-collection / dimension-mismatch via the shared degradation vocabulary. Added three new acceptance items: 2.5.8 (vector size resolution from explicit config or one-time probe with `Cosine` distance covering `vector::code_symbols::tests::ensure_collection_resolves_vector_size_and_distance`), 2.5.9 (missing-collection PUT/`Cosine` creation and compatible-existing no-op via `tests/vector_projection.rs::ensure_creates_missing_and_reuses_compatible`), 2.5.10 (incompatible-existing collection fails with `DimensionMismatch` and no destructive HTTP via `tests/vector_projection.rs::incompatible_existing_collection_errors_without_migration`). Whole-plan sweeps: F1 sweep — re-verified every gobby-core consumer surface that reads `config_store` values routes through the §1.5 `ConfigSource` adapter; the only attached-mode resolvers are `resolve_falkordb_config` / `resolve_qdrant_config` / `resolve_embedding_config` in §1.5, all consumed by §2.5 (vector lifecycle), §1.3 (Falkor facade), §2.2/§2.3 (graph reads/writes), §2.4 (graph CLI), and §2.6 (projection lifecycle CLI) through §1.5; no other section issues raw `read_config_value`/`decode_config_value`/`resolve_config_value` calls outside the adapter. 
F2 sweep — re-verified every vector lifecycle path uses the new vector-parameter handling: §2.5's `ensure_collection` is called from `sync-file`, `rebuild`, and the §2.6 `gcode index --sync-projections` projection-sync path; `clear` reuses the same compatibility check before issuing destructive HTTP; soft-search in `search/semantic.rs` does not call `ensure_collection` and surfaces dimension-mismatch via degradation, matching A1's lifecycle exception scope. M1 manifest updated: §1.5 covers labels expanded to 1.5.8/1.5.9/1.5.10 with refreshed validation_criteria pointing at the new adapter tests; §2.5 covers labels expanded to 2.5.8/2.5.9/2.5.10 with refreshed validation_criteria pointing at both unit and integration tests for vector parameter handling. +- **R22 (2026-05-28)**: Addressed Round 21 blocking findings. F1 (Phase 7 compatibility boundary, VS1 / §1.1 and §1.5): chose the compatibility-wrapper option — `crates/gcode/src/config.rs` keeps a local `FalkorConfig { host, port, password, graph_name: String }` (not a pure re-export of `gobby_core::config::FalkorConfig`, which has no `graph_name`) and `crates/gcode/src/falkor.rs` keeps a local `FalkorClient { graph: SyncGraph }` with `from_config(&FalkorConfig)` and free `with_falkor(ctx, ...)` so the external Phase 7 source-inspection contract resolves; wrapper internals delegate to `gobby_core::falkor::with_graph` / `gobby_core::falkor::GraphClient::from_config(&core_config, &config.graph_name)` so behavior still routes through gobby-core. Added a "Phase 7 compatibility wrapper" subsection to §1.5 body documenting the exact local symbols, field shapes, and `falkordb::{FalkorClientBuilder, FalkorConnectionInfo, SyncGraph}` import chain that must remain in `gobby-code` source. Reworked acceptance 1.5.3 to say `QdrantConfig`/`EmbeddingConfig` are thin re-exports while `FalkorConfig` is a wrapper; reworked 1.5.5 to flag `falkor.rs` as an explicit compatibility wrapper. 
Added new acceptance items 1.5.11 (`config::tests::falkor_config_wrapper_shape` covering the local `FalkorConfig` field shape) and 1.5.12 (`falkor::tests::falkor_client_wrapper_shape` covering the local `FalkorClient`/`with_falkor` symbols and the gobby-core delegation). F2 (manifest validation criteria, multiple sections): rewrote every multi-filter `cargo test` command into `&&`-chained single-filter invocations (Cargo only accepts one `[TESTNAME]` filter per command), and replaced every `main::tests::*` filter with the actual binary-crate filter path `tests::*` (verified via `cargo test -p gobby-code --no-default-features tests::test_parse_graph_clear -- --list`, which resolves to `tests::test_parse_graph_clear: test` from `src/main.rs`). Affected entries: §1.2, §1.4, §1.5, §2.4, §2.5, §2.6, and §3.2. F3 (commands/mod.rs shared-file ownership, §2.6 vs §1.2): added `crates/gcode/src/commands/mod.rs` to §2.6 targets, added `1.2` to the §2.6 heading and M1 manifest `depends_on`, and added acceptance item 2.6.7 requiring `pub mod vector;` to be exported from `commands/mod.rs` after the §1.2 `pub mod setup;` edit lands. Whole-plan sweeps: F1 sweep — re-verified no other deliverable claims `FalkorConfig`/`FalkorClient`/`with_falkor` are pure re-exports of `gobby_core` types; §1.1's compatibility-facade clause for `falkor.rs` and §1.5's wrapper subsection are the only owners of the wrapper shape; no other gcode source file is required by the Phase 7 test. F2 sweep — re-checked every M1 manifest `validation_criteria` string against `cargo test`'s single-filter rule; the remaining entries (§1.1, §1.3, §2.1, §2.2, §2.3, §3.1, §3.3) already use single-filter or non-test commands and were left unchanged. 
F3 sweep — re-checked shared mod.rs export edits across the plan: `crates/gcode/src/commands/mod.rs` is the only existing mod.rs edited by multiple deliverables (§1.2 adds `pub mod setup;`, §2.6 adds `pub mod vector;`); new directories (`vector/`, `graph/`, `projection/`) each have a single deliverable owner that creates the directory's `mod.rs` alongside its module files, so no further sequencing is needed. `mcp__gobby-plans__validate_plan` reports valid=true. +- **R23 (2026-05-28)**: Addressed Round 22 blocking findings. F1 (Phase 7 source-inspection surface, VS1 / §1.5 and §2.3): expanded the §1.5 "Phase 7 compatibility wrapper" subsection to enumerate the eight public read helpers (`count_callers`, `count_usages`, `find_callers`, `find_usages`, `find_callers_batch`, `find_callees_batch`, `get_imports`, `blast_radius`) and their sibling Cypher-builder helpers (`count_callers_query`, `count_usages_query`, `find_callers_query`, `find_usages_query`, `find_callers_batch_query`, `find_callees_batch_query`, `get_imports_query`, `blast_radius_query`) that must remain in `crates/gcode/src/falkor.rs`, plus the literal source fragments the external test asserts (`urlencoding::encode(password)`, `falkor://:{}@{}:{}`, `.with_connection_info(conn_info)`, `.with_params(&` for example `with_params(&params)`, `result.header`, `FalkorValue::None`). Added acceptance items 1.5.13 (`crates/gcode/src/falkor.rs::tests::phase7_read_helpers_visible` pins read-helper plus `*_query` visibility) and 1.5.14 (`crates/gcode/src/falkor.rs::tests::phase7_source_fragments_visible` pins source-fragment visibility).
Added `crates/gcode/src/falkor.rs` to §2.3 targets, added a paragraph to §2.3 body specifying that the public `falkor.rs` read helpers delegate their internal Falkor query work to the new `graph::code_graph` read APIs after §2.3 lands while keeping public signatures, `*_query` siblings, clamping behavior, and Phase 7 source fragments unchanged, and added acceptance 2.3.4 (`crates/gcode/src/falkor.rs::tests::read_helpers_delegate_to_code_graph`). F2 (existing top-level read command compatibility, AC1 / §2.4): added a paragraph to §2.4 body requiring existing top-level `gcode callers|usages|imports|blast-radius` commands to remain available as additive (not replacement) thin wrappers over `graph::code_graph` reads, preserving clap argument names, pagination behavior (`--limit`, `--offset`), `--depth` semantics, JSON field names, payload structure, and pagination metadata; new optional metadata fields per §2.1 are tagged with `#[serde(skip_serializing_if = "Option::is_none")]`. Added acceptance items 2.4.5 (existing parse tests `test_parse_callers_remains_top_level`, `test_parse_usages_remains_top_level`, `test_parse_imports_remains_top_level`, `test_parse_blast_radius_remains_top_level` stay green) and 2.4.6 (`crates/gcode/src/commands/graph.rs::tests::top_level_read_commands_preserve_json_shape` pins JSON shape compatibility). Whole-plan sweeps: F1 sweep — re-confirmed `crates/gcode/src/falkor.rs` is the only `gobby-code` source file the Phase 7 test source-inspects beyond `config.rs` (handled in R22); the read helpers, `*_query` siblings, and connection/query source fragments are now pinned in §1.5 and the §2.3 delegation is the only other plan-side touch point. 
F2 sweep — re-verified no other top-level CLI surface is at risk in this plan: `gcode index`, `gcode status`, `gcode invalidate`, `gcode search*`, `gcode outline`, `gcode symbol(s)`, `gcode kinds`, `gcode tree`, `gcode repo-outline`, `gcode init`, `gcode projects`, `gcode prune` are either unchanged or explicitly covered (the `gcode graph clear|rebuild` parse tests already exist as sub-commands, and graph/vector sync-file/clear/rebuild remain owned by §2.4/§2.6). M1 Task Manifest updated: §1.5 entry adds covers labels 1.5.13/1.5.14 and chains two new `&&` single-filter `cargo test` invocations; §2.3 entry adds covers label 2.3.4 and a chained single-filter test invocation; §2.4 entry adds covers labels 2.4.5/2.4.6 and chains the four existing parse-test filters plus the new JSON-shape test filter as separate `cargo test` invocations. Manifest still emits one leaf per deliverable; deliverable_count=14. +- **R25 (2026-05-28)**: Addressed all three Round 24 blocking findings. F1 (Phase 7 `Row` shape mismatch, VS1 / §1.5.15): the previous text required `pub type Row = HashMap`, but the external Phase 7 test source-inspects for the exact substring `pub type Row = HashMap` (with `Value` = `serde_json::Value`, which matches the current `crates/gcode/src/falkor.rs` shape — `use serde_json::{Map, Number, Value};` then `pub type Row = HashMap;`). Updated the §1.5 Phase 7 compatibility wrapper subsection prose and acceptance 1.5.15 to require `pub type Row = HashMap` where `Value` is `serde_json::Value` imported into scope so the unqualified token `Value` appears at the alias declaration site, kept `FalkorValue` for the internal `parse_falkor_result` / `falkor_value_to_json` conversion helper, and aligned the local `phase7_query_surface_visible` test with the external Phase 7 substring assertion. 
F2 (foundation `StandaloneSetup` contract direction, D1 / §1.2): rewrote §1.2's "Foundation contract requirement" subsection so `crates/gcode/src/setup.rs` implements `gobby_core::setup::StandaloneSetup` (defined in the foundation plan §1.4 as a trait with `namespace`, `owned_objects`, and `create` methods consuming a `SetupContext`) and declares gcode-owned `OwnedObject` entries whose `creator` closures own the literal gcode `CREATE TABLE`/`CREATE INDEX`/`CREATE EXTENSION` strings. gcode-owned DDL stays inside gcode creator callbacks; `gobby-core` is the contract owner (trait, `SetupContext`, `OwnedObject`, `SetupReport`, `SetupError`, `StoreKind`) but does not contain gcode domain DDL. Updated acceptance 1.2.8 to require a `GcodeStandaloneSetup`-like struct implementing the trait, an enumerated `owned_objects()` list with creator closures, namespace plus exclusion-list assertions (refusing Gobby-owned tables, `config_store`, `.gobby/project.json`), and execution of creator closures only through the foundation-supplied `SetupContext`. F3 (forbidden file in deliverable Targets, P1 / §1.2): removed `.gobby/project.json` from §1.2 `Targets` so the leaf is not routed to a file it is explicitly prohibited from modifying. `.gobby/project.json` remains in §1.2 prose, N1, A1, AC1, and acceptance 1.2.4/1.2.6/1.2.8 as a forbidden artifact the setup path must not touch. Whole-plan sweeps: F1 sweep — re-grepped the plan for any other `HashMap` / `pub type Row =` references; the only remaining `HashMap` mention is the §R24 changelog entry, which is a historical record of the previous (incorrect) shape and is intentionally left in place. The Phase 7 §2.3.4 cross-reference, §1.5.13/§1.5.14/§1.5.16 acceptance items, and `falkor.rs` source-fragment list reference IDs and other shapes only — no other section needed updating. 
F2 sweep — re-checked every deliverable for `setup`/`schema`/`DDL` plumbing references; §1.2 is the only deliverable that owns `setup.rs` and DDL execution, so the F2 fix is contained. Re-confirmed §1.5's `gobby-core` Cargo feature list (`postgres`, `falkor`, `qdrant`, `search`, `indexing` or `full`) already enables the `gobby_core::setup` module path because the foundation `setup.rs` is always available behind the `postgres` feature gate for the `pg` field; no Cargo feature change was needed for the F2 fix. F3 sweep — re-grepped every `Targets:` line for forbidden artifacts (`.gobby/project.json`, `config_store`); no other deliverable Targets list mentions either. M1 Task Manifest unchanged: deliverable_count=14, no covers labels changed, no validation_criteria changed (the existing chained `cargo test -p gobby-code --no-default-features setup::tests::standalone_setup_uses_gobby_core_contract` already covers the revised 1.2.8 contract; the existing chained `cargo test -p gobby-code --no-default-features falkor::tests::phase7_query_surface_visible` already covers the revised 1.5.15 contract). `mcp__gobby-plans__validate_plan` is expected to report valid=true. +- **R24 (2026-05-28)**: Addressed all three Round 23 blocking findings. F1 (Phase 7 source-inspection surface still incomplete, VS1 / §1.5 and §2.3): extended the §1.5 "Phase 7 compatibility wrapper" subsection to enumerate the remaining shapes the external Phase 7 test asserts but that R23 did not yet pin — `pub type Row = HashMap`, `pub fn query(&mut self, cypher: &str, params: Option>) -> anyhow::Result>` on `impl FalkorClient`, `fn parse_falkor_result(...)`, the source fragments `let mut client =` and `ctx.falkordb`, the production-read-query helpers `cypher_string_literal`/`id_list_literal`/`clamp_offset`, and the literal Cypher fragments `target:CodeSymbol OR target:UnresolvedCallee OR target:ExternalSymbol`, `SKIP {offset} LIMIT {limit}`, and `target.id IN [{ids}]`. 
Documented the unbound-parameter ban: generated Cypher strings produced by the public `*_query` helpers must not contain `$offset`, `$limit`, or `$ids`; pagination and ID-list values are substituted inline via `clamp_offset` / `cypher_string_literal` / `id_list_literal`. Added two new source fragments (`let mut client =`, `ctx.falkordb`) to acceptance 1.5.14. Added acceptance items 1.5.15 (`crates/gcode/src/falkor.rs::tests::phase7_query_surface_visible` pins `Row` / `query` / `parse_falkor_result` shape) and 1.5.16 (`crates/gcode/src/falkor.rs::tests::phase7_query_helpers_and_literal_fragments_visible` pins helper presence, literal Cypher fragments, and unbound-parameter ban). Extended the §2.3 delegation paragraph and acceptance 2.3.4 to cross-reference §1.5.15/§1.5.16 alongside the previously pinned §1.5.13/§1.5.14 so the delegated helpers continue to preserve every Phase 7 source-inspection assertion. F2 (early dispatch + foundation setup contract, P1 / §1.2): added an "Early-dispatch requirement" subsection to §1.2 body stating that `gcode setup --standalone` dispatches from `main.rs` in the early-dispatch block alongside `Init`, `Projects`, and `Prune` before `Context::resolve()` is called, since setup creates the prerequisites that context resolution would otherwise require. Added a "Foundation contract requirement" subsection requiring `crates/gcode/src/setup.rs` to perform standalone schema/DDL work through the shared `gobby_core::setup::StandaloneSetup` (or equivalent foundation contract) boundary rather than bespoke DDL plumbing, with the foundation contract owning all `CREATE TABLE`/`CREATE INDEX`/`CREATE EXTENSION` calls and rejecting any request that would touch Gobby-owned tables, `config_store`, or `.gobby/project.json`. Added acceptance items 1.2.7 (`crates/gcode/src/main.rs::tests::setup_runs_before_context_resolve`) and 1.2.8 (`crates/gcode/src/setup.rs::tests::standalone_setup_uses_gobby_core_contract`). 
F3 (manifest validation criteria sweep): updated the M1 manifest so every entry whose source section contains a `test:` acceptance item runs each declared test through `validation_criteria`. Specifically: §1.1 adds `cargo test -p gobby-code --no-default-features lib::tests::public_projection_api_is_cli_independent` (covers 1.1.3) and `cargo test -p gobby-code --no-default-features lib::tests::falkor_facade_is_available` (covers 1.1.4 local proxy; the external Phase 7 test remains a VS1 cross-repo gate referenced informationally on acceptance 1.1.4); §1.2 chains the new `setup_runs_before_context_resolve` and `standalone_setup_uses_gobby_core_contract` tests; §1.5 chains the new `phase7_query_surface_visible` and `phase7_query_helpers_and_literal_fragments_visible` tests; §2.1 chains `graph::report::tests::bridge_edges_are_hypotheses` (covers 2.1.3); §2.2 chains `graph::code_graph::tests::cleanup_orphans_is_project_scoped` (covers 2.2.3) and `models::tests::uuid5_python_parity` (covers 2.2.4); §2.3 chains `search::graph_boost::tests` (covers 2.3.2); §2.4 chains `cargo test -p gobby-code --no-default-features --test graph_standalone` (covers 2.4.4); §2.5 chains `vector::code_symbols::tests::collection_name_compatibility` (2.5.3), `commands::vector::tests::vector_lifecycle_requires_config` (2.5.4), and `vector::code_symbols::tests::summaries_are_optional_enrichment` (2.5.5); §2.6 chains `projection::sync::tests::sync_state_tracks_projection_success` (2.6.2), `commands::vector::tests::lifecycle_json_contract` (2.6.3), and `cargo test -p gobby-code --no-default-features --test projection_standalone` (2.6.4); §3.1 chains `graph::report::tests::bridge_edges_are_read_only` (3.1.3) and `graph::report::tests::report_degradation_contract` (3.1.4); §3.2 chains `commands::graph::tests::report_text_structured_output` (3.2.2) and `commands::graph::tests::report_requires_graph_service` (3.2.3). 
Service-gated integration tests (`--test graph_standalone`, `--test projection_standalone`, `--test vector_projection ...`) keep their env-gated skip behavior but are now invoked at leaf-validation time so the contract is exercised when the required services are present. Whole-plan sweeps: F1 sweep — re-confirmed `crates/gcode/src/falkor.rs` remains the only `gobby-code` source file the Phase 7 test source-inspects; §1.5 now pins every shape, fragment, helper, and unbound-parameter ban that the Gobby-repo test asserts, and §2.3.4 cross-references the full §1.5.13/§1.5.14/§1.5.15/§1.5.16 set so internal delegation must preserve all of them. F2 sweep — re-checked every command for early-dispatch sequencing: `Init`, `Projects`, `Prune`, and now `Setup` are the early-dispatch commands; all other commands (graph/vector lifecycle, index, search, status, etc.) correctly require `Context::resolve()`. F3 sweep — re-verified every M1 entry whose `source_section` contains a `test:` acceptance item: §1.1, §1.2, §1.3, §1.4, §1.5, §2.1, §2.2, §2.3, §2.4, §2.5, §2.6, §3.1, §3.2 now all have validation criteria that run every declared test. §3.3 is docs-only (no `test:` acceptance) and was left unchanged. Manifest covers labels expanded: §1.2 +2 (1.2.7/1.2.8), §1.5 +2 (1.5.15/1.5.16). Manifest still emits one leaf per deliverable; deliverable_count=14. +- **R27 (2026-05-28)**: Addressed both Round 26 blocking findings. F1 (missing-requirement — Phase 7 source-inspection contract still partial, VS1 / §1.5.11 and §1.5.16 vs `gobby/tests/code_index/test_gcode_phase7_contract.py`): extended the §1.5 "Phase 7 compatibility wrapper" subsection with an "Additional Phase 7 contract assertions" subsection enumerating the remaining items the external test asserts but that R25/R26 had not pinned. 
Added Cargo manifest contract (`[package].name = "gobby-code"`, `[[bin]]` entry for `gcode`, `[dependencies]` pinning `falkordb = "0.2"` and `urlencoding = "2"` plus `base64` and `reqwest`), `Cargo.lock` state (must contain `falkordb` and `urlencoding`; must NOT contain `neo4j` or `neo4rs`), `Context` struct shape (`pub falkordb: Option`), resolver invocation literal (`let falkordb = resolve_falkordb_config(`), graph-name source pattern (inline literal or const+assignment), config-store key literals (`databases.falkordb.host`, `databases.falkordb.port`, `databases.falkordb.requirepass`), additional production-read-query clamping (`depth.clamp(1, 5)`, `limit.clamp(1, MAX_GRAPH_LIMIT)`, `offset.min(MAX_GRAPH_LIMIT)`), additional literal Cypher fragments (`src.id IN [{ids}]` and standalone `LIMIT {limit}`), and the `fn blast_radius_query(depth: usize, limit: usize)` signature. Added §1.5 acceptance items 1.5.17 (`crates/gcode/src/falkor.rs::tests::phase7_cargo_and_lockfile_state` pins Cargo+lockfile state — Cargo.lock added to §1.5 Targets), 1.5.18 (`crates/gcode/src/config.rs::tests::phase7_context_and_falkor_resolver_visible` pins `Context` field + resolver call + graph-name pattern), 1.5.19 (`crates/gcode/src/config.rs::tests::phase7_falkordb_config_store_keys_visible` pins the config-store key literals), 1.5.20 (`crates/gcode/src/falkor.rs::tests::phase7_additional_query_fragments_visible` pins the additional clamping expressions, literal fragments, and `blast_radius_query` signature). VS1's existing "Phase 7 contract tests in the Gobby repo pass against the updated gcode binary" bullet remains the cross-repo backstop. 
F2 (weak-testability — M1 manifest §2.5 / acceptance 2.5.2 only ran two named filters from `crates/gcode/tests/vector_projection.rs`): replaced the two `--test vector_projection ensure_creates_missing_and_reuses_compatible` and `--test vector_projection incompatible_existing_collection_errors_without_migration` filtered invocations with the unfiltered `cargo test -p gobby-code --no-default-features --test vector_projection` binary invocation, so every test in the integration file runs (including delete-by-file, upsert, clear, and rebuild coverage that acceptance 2.5.2 names). The acceptance text itself is unchanged because it already points at the file as a whole and enumerates the required behaviors. Whole-plan sweeps: F1 sweep — re-grepped every M1 manifest entry and every `kind: deliverable` section for residual Phase 7 source-inspection gaps; `crates/gcode/src/falkor.rs` and `crates/gcode/src/config.rs` remain the only `gobby-code` source files the external test source-inspects, and §1.5.13/§1.5.14/§1.5.15/§1.5.16/§1.5.17/§1.5.18/§1.5.19/§1.5.20 together now enumerate every assertion in `test_gcode_phase7_contract.py`. The §2.3.4 cross-reference still pins `falkor.rs` read-helper delegation against §1.5.13–§1.5.16; it does not need to reference the new items because §1.5.17/§1.5.18/§1.5.19 are config/cargo-side and §1.5.20 covers production fragments already inside §1.5's wrapper scope. F2 sweep — re-checked every M1 `validation_criteria` for filtered-only `--test` invocations that should run a full integration binary; the only other integration binaries in the manifest are §2.4 (`--test graph_standalone` — already unfiltered) and §2.6 (`--test projection_standalone` — already unfiltered). No other entry needed the same fix. 
M1 Task Manifest changes: §1.5 entry adds covers labels 1.5.17/1.5.18/1.5.19/1.5.20 and chains four new `&&` single-filter `cargo test` invocations for the new tests; §2.5 entry replaces two filtered `--test vector_projection` invocations with one unfiltered invocation. Manifest still emits one leaf per deliverable; deliverable_count=14. `mcp__gobby-plans__validate_plan` expected to report valid=true. +- **R28 (2026-05-28)**: Addressed both Round 27 blocking findings. F1 (missing-requirement — O1/D1/A1 vs §1.4/§1.5/M1 on generic indexing/search primitive consumer migration): added new §1.6 deliverable "Consume gobby-core generic indexing and search primitives" with targets `crates/gcode/src/index/walker.rs`, `crates/gcode/src/index/hasher.rs`, `crates/gcode/src/index/chunker.rs`, `crates/gcode/src/search/rrf.rs`, `crates/gcode/src/commands/search.rs`, `crates/gcode/src/lib.rs` and `depends: 1.4` (transitively §1.5 for the Cargo feature wiring §1.5.1 already requires). The deliverable migrates the three overlapping primitives that currently duplicate `gobby-core` logic: `index/walker.rs::discover_files` consumes `gobby_core::indexing::WalkerSettings::into_walker` instead of constructing `ignore::WalkBuilder` directly with the duplicated `git_ignore/git_global/git_exclude` chain; `index/hasher.rs::file_content_hash` delegates to `gobby_core::indexing::file_content_hash` and removes the duplicated 65_536-byte buffer SHA-256 streaming body; `search/rrf.rs::merge` delegates to `gobby_core::search::rrf_merge` and removes the duplicated `RRF_K = 60.0` constant plus fusion loop, with either a tuple-compatibility wrapper or a `commands/search.rs` call-site update to consume `SearchResult` directly. 
The deliverable explicitly narrows two primitives that do not cleanly overlap: `index/chunker.rs` remains gcode-owned because line-based `ContentChunk` (100-line chunks with 10-line overlap, `project_id`/`chunk_index`/`line_start`/`line_end`/`language`/`created_at` fields) does not overlap with `gobby_core::indexing::Chunk` (byte-range with opaque metadata), and gcode tracks file-change events via PostgreSQL `indexed_files.content_hash` state rather than `gobby_core::indexing::IndexEvent` / `index_events_from_hashes`. Acceptance items 1.6.1–1.6.5 pin each delegation site plus the narrowing assertion plus a `crates/gcode/src/lib.rs::tests::indexing_search_primitive_migration` regression test. F2 (missing-requirement — VS1 / §1.5.17–§1.5.20 vs Phase 7 `_assert_neo4j_transition_state`): extended the §1.5 "Additional Phase 7 contract assertions" subsection with a "Neo4j transition state (source-level absence branch)" bullet explicitly committing to the absence branch of `_assert_neo4j_transition_state` (the external helper accepts either a complete transitional `Neo4jConfig`/`Context.neo4j`/`resolve_neo4j_config` shape or source-level absence; the plan chooses absence because `gobby-code` has no Neo4j references today). The new bullet states `config.rs` MUST NOT declare `pub struct Neo4jConfig { ... }`, MUST NOT contain `resolve_neo4j_config`, and MUST NOT carry `pub neo4j: Option` on any struct. Added acceptance 1.5.21 (`crates/gcode/src/config.rs::tests::phase7_neo4j_transition_state_absent`) pinning the absence-branch checks and a clear escalation path if a future Neo4j transition is reintroduced. 
Whole-plan sweeps: F1 sweep — re-grepped every deliverable Targets list for additional generic indexing/search primitives that `gobby-core` exposes but `gobby-code` still duplicates: `gobby_core::indexing` exposes `WalkerSettings`, `content_hash`, `file_content_hash`, `Chunk`, `ChunkIdentity`, `IndexEvent`, `index_events_from_hashes`; `gobby_core::search` exposes `SearchResult`, `SourceExplanation`, `SearchDegradation`, `rrf_merge`. §1.6 now owns the three overlapping ones (walker, file content hash, rrf merge) and narrows the four that do not overlap (Chunk/ChunkIdentity/IndexEvent/index_events_from_hashes via chunker narrowing + indexer non-use). Other gobby-core primitives are already consumed via §1.5 (config/postgres/falkor/qdrant). F2 sweep — re-grepped every Phase 7 assertion in `_assert_*` helpers; §1.5.13/§1.5.14/§1.5.15/§1.5.16/§1.5.17/§1.5.18/§1.5.19/§1.5.20/§1.5.21 now collectively cover every assertion (FalkorConfig wrapper shape, FalkorClient wrapper shape, read-helper surface, source fragments, Row/query/parse_falkor_result surface, helper/literal fragments + unbound-parameter ban, Cargo.toml/Cargo.lock state, Context+resolver, config-store key literals, additional clamping/literal fragments + blast_radius_query signature, and Neo4j transition state). No remaining `_assert_*` helper in `test_gcode_phase7_contract.py` is unmapped. M1 Task Manifest changes: §1.5 entry adds `covers:gcode-graph-enhancements:1.5:1.5.21` label and chains `&& cargo test -p gobby-code --no-default-features config::tests::phase7_neo4j_transition_state_absent` to `validation_criteria`; new §1.6 entry inserted between §1.5 and §2.1 with `depends_on: ["1.4"]`, 5 covers labels (1.6.1–1.6.5), and chained single-filter `cargo test` invocations for each test. Manifest still emits one leaf per deliverable; deliverable_count=15. +- **R26 (2026-05-28)**: Addressed both Round 25 blocking findings. 
F1 (weak-testability — `lib::tests::*` cargo filters match zero tests, M1 manifest §§1.1 and 1.5): tests defined under `#[cfg(test)] mod tests {}` in `crates/gcode/src/lib.rs` are filtered by cargo as `tests::`, not `lib::tests::` (there is no implicit `lib` segment in the test path because `lib.rs` is the crate root; the prior R24 change followed the same pattern that R22 already fixed for `main::tests::*` → `tests::*` in main.rs). Cargo exits success when a filter matches zero tests, so `lib::tests::*` filters silently passed without exercising the underlying lib tests. Rewrote the affected `validation_criteria` strings: §1.1 entry now invokes `cargo test -p gobby-code --no-default-features tests::public_projection_api_is_cli_independent && cargo test -p gobby-code --no-default-features tests::falkor_facade_is_available` (covers 1.1.3 and 1.1.4); §1.5 entry now invokes `cargo test -p gobby-code --no-default-features tests::foundation_consumer_migration` for the consumer-migration regression (covers 1.5.7). Acceptance-item text in §1.1 and §1.5 already uses `file::tests::name` documentation form (`crates/gcode/src/lib.rs::tests::*`) and is unchanged because the acceptance refs are file-rooted descriptors, not cargo filters. F2 (bad-sequencing — §1.2 and §2.4 share `crates/gcode/src/main.rs` with no dependency edge): §1.2 adds the top-level `setup` subcommand plus its early-dispatch handler in `main.rs` (before `Context::resolve()`), while §2.4 adds or rewires the graph subcommand registrations and dispatch arms in the same `Command` enum and `match` block. The previous depends_on (`1.4, 2.2, 2.3`) let expansion schedule the setup and graph leaves in parallel, risking file contention and one rewrite missing the other's match arms. Added `1.2` to the §2.4 heading dependency list (now `(depends: 1.2, 1.4, 2.2, 2.3)`) and to the M1 manifest §2.4 entry `depends_on` list so §2.4 cannot start before §1.2's CLI registration lands. 
Whole-plan sweeps: F1 sweep: re-grepped every M1 `validation_criteria` for `lib::tests::*` after the §1.1/§1.5 fixes; no other entry references the `lib::tests::` prefix; the remaining manifest entries either use file-rooted module filters (e.g., `commands::graph::tests::*`, `config::tests::*`, `falkor::tests::*`, `vector::code_symbols::tests::*`, `graph::report::tests::*`, `projection::sync::tests::*`, `models::tests::*`, `search::graph_boost::tests`, `setup::tests::*`, `schema::tests::*`, `commands::setup::tests::*`, `commands::vector::tests::*`, `commands::index::tests::*`, `graph::code_graph::tests::*`, `graph::typed_query::tests`, `index::indexer::tests::*`) which match the actual module path of the test, the `tests::*` form for main.rs binary tests that R22 already fixed, or `--test <name>` for integration tests under `crates/gcode/tests/`. F2 sweep: re-checked every deliverable that targets `crates/gcode/src/main.rs` against the dependency graph: §1.1 (foundation, no deps), §1.2 (depends: 1.1), §2.4 (now depends: 1.2, 1.4, 2.2, 2.3 after F2 fix), §2.6 (depends: 1.2, 1.4, 2.4, 2.5; already sequences after §1.2), §3.2 (depends: 2.6, 3.1; transitively after §1.2 and §2.4). The chain is now §1.1 → §1.2 → §2.4 → §2.6 → §3.2 with §1.4 a parallel sibling under §1.1 feeding §2.4 and §2.6, so every later leaf inherits the §1.2 edge transitively. No other shared-file pair on `main.rs` is missing a dependency edge. M1 Task Manifest changes: §1.1 entry `validation_criteria` rewritten (no covers-label change); §1.5 entry `validation_criteria` rewritten (no covers-label change); §2.4 entry `depends_on` adds `"1.2"` (no covers-label change). Manifest still emits one leaf per deliverable; deliverable_count=14. +- **R29 (2026-05-28)**: Addressed Round 28 blocking finding. 
F1 (missing-requirement — P1 / §1.5 and the foundation `gobby_core::falkor::GraphClient` API): the previous §1.5 wording required `crates/gcode/src/falkor.rs` to preserve the Phase 7 source-inspection facade shape `pub struct FalkorClient { graph: SyncGraph }` while routing through `gobby_core::falkor::with_graph` / `GraphClient::from_config`. The foundation `gobby_core::falkor::GraphClient { graph: SyncGraph }` has a private `graph` field, `with_graph(..., |gc| ...)` exposes only `&mut GraphClient` to the closure, and there is no public `into_sync_graph` / `from_graph_client` / `with_graph_client` hook for unwrapping the `SyncGraph` or wrapping a `GraphClient` as the Phase 7-required local `FalkorClient`. The previous wording therefore demanded an impossible delegation. Chose the adversary's narrowing option (the alternative — adding a foundation-side prerequisite to extend `crates/gcore/src/falkor.rs` — would expand scope outside any `gobby-cli` deliverable target and bleed into the gobby-core foundation plan). Surgical fixes: (a) added a new A1 bullet "Phase 7 compatibility facade exception" allowing `crates/gcode/src/falkor.rs` only to instantiate `falkordb::FalkorClientBuilder` directly and own its own `SyncGraph` because the foundation API does not expose a public hook for building the Phase 7-required local shape; config resolution (host, port, password) still routes through `gobby_core::config::resolve_falkordb_config` via the §1.5 `ConfigSource` adapter, and the exception is scoped to `falkor.rs` only. The new A1 bullet mirrors the existing "Vector projection lifecycle exception" pattern. 
(b) Rewrote the §1.5 "Phase 7 compatibility wrapper" subsection: the second half of the `FalkorClient`/`with_falkor` bullet now spells out the facade exception (`FalkorClient::from_config` builds the local `SyncGraph` via `FalkorClientBuilder` / `FalkorConnectionInfo`; `with_falkor` reads `ctx.falkordb` and builds a `FalkorClient` via the same path; config resolution still routes through `gobby_core::config::resolve_falkordb_config`; all other graph consumers MUST use `gobby_core::falkor::with_graph`). (c) Updated the closing line of the §1.5 subsection (the "wrapper layer is the only place..." sentence) so other consumers MUST use `gobby_core::falkor::with_graph` directly rather than "call the wrapper or `gobby_core::falkor::with_graph` directly". (d) Rewrote acceptance 1.5.5 to drop the impossible "routes through `gobby_core::falkor::with_graph` / `GraphClient`" claim and state the narrowed boundary (config via gobby-core, builder allowed in `falkor.rs`, scope pinned by §1.5.22). (e) Rewrote acceptance 1.5.12 to drop the impossible delegation `with_falkor(... |gc| f(&mut FalkorClient::wrapping(gc)))` text and describe the actual implementation path (`FalkorClient::from_config` builds via `FalkorClientBuilder` chain because GraphClient.graph is private; `with_falkor` uses `FalkorClient::from_config` against `ctx.falkordb`; config still gobby-core). (f) Added new acceptance 1.5.22 pinning the single-file scope: `falkor.rs` is the only `gobby-code` source file that instantiates `falkordb::FalkorClientBuilder` or bypasses `gobby_core::falkor::with_graph`'s ServiceState boundary; all other graph consumers (§2.2, §2.3, §2.4, §2.6, §3.1, §3.2, plus `search::graph_boost`) enter Falkor through `gobby_core::falkor::with_graph`. Covering test: `crates/gcode/src/lib.rs::tests::falkor_facade_exception_scoped_to_falkor_rs`. 
Whole-plan sweep (F1 class — other gobby-core consumer assumptions that require non-existent public adapter hooks): re-checked every `gobby_core::*` consumer surface the plan invokes — `gobby_core::config::resolve_*_config` / `CoreContext` / `EnvOnlySource` / `ConfigSource` / `decode_config_value` (trait + functions, no facade conflict), `gobby_core::postgres::read_config_value` / `connect_readonly` / `connect_readwrite` (free functions, no facade conflict), `gobby_core::qdrant::with_qdrant` / `collection_name` / `search` / `upsert` (`gobby-code` has no Phase 7-mandated local Qdrant wrapper, so consumers use whatever the foundation provides directly — no facade conflict), `gobby_core::setup::StandaloneSetup` / `SetupContext` / `OwnedObject` / `SetupReport` / `SetupError` / `StoreKind` (trait + public types implemented/consumed, no facade conflict), `gobby_core::indexing::WalkerSettings::into_walker` / `file_content_hash` / `content_hash` (functions/types consumed via §1.6, no facade conflict), `gobby_core::search::rrf_merge` / `SearchResult` (functions/types consumed via §1.6, no facade conflict). The Falkor facade is the only case where the plan attempts to wrap a foundation type whose internal field is private and whose public API does not expose the hook required for the wrapping. No other section needed the same narrowing. M1 Task Manifest changes: §1.5 entry adds `covers:gcode-graph-enhancements:1.5:1.5.22` label and chains `&& cargo test -p gobby-code --no-default-features tests::falkor_facade_exception_scoped_to_falkor_rs` to `validation_criteria`. Manifest still emits one leaf per deliverable; deliverable_count=15. +- **R30 (2026-05-28)**: Addressed Round 29 blocking finding. 
F1 (traceability — P1 / §1.5): R29 added the A1 "Phase 7 compatibility facade exception" and rewrote the §1.5 "Phase 7 compatibility wrapper" subsection plus acceptance items 1.5.5 / 1.5.12 / 1.5.22 to allow `crates/gcode/src/falkor.rs` to instantiate `falkordb::FalkorClientBuilder` directly and bypass `gobby_core::falkor::with_graph`'s ServiceState boundary, but left two stale normative prose locations that still required the old impossible boundary, contradicting the facade exception and producing a §1.5 leaf that could follow one part of the section and fail another. Surgical fixes (R30): (a) Rewrote the §1.5 "Module migration" bullet for `crates/gcode/src/falkor.rs` (previously claimed `falkor.rs` "routes connection plumbing and graph queries through `gobby_core::falkor::with_graph` / `gobby_core::falkor::GraphClient::from_config(config, graph_name)`"); the new wording matches the R29 facade exception — `falkor.rs` resolves connection-level FalkorConfig fields (host, port, password) through `gobby_core::config::resolve_falkordb_config` via the §1.5 `ConfigSource` adapter, but owns the local Phase 7 `FalkorClient { graph: SyncGraph }` / `with_falkor` connection path (instantiating `falkordb::FalkorClientBuilder` directly) because the foundation `GraphClient.graph` field is private and exposes no public hook for building the Phase 7-required local shape; single-file scope pinned by §1.5.22, all other graph consumers MUST use `gobby_core::falkor::with_graph`; cross-references the A1 facade exception bullet and the "Phase 7 compatibility wrapper" subsection. 
(b) Rewrote the §1.5 "Behavioral guarantees" bullet for FalkorDB ServiceState transitions (previously claimed "All FalkorDB ServiceState transitions in `gobby-code` enter through `gobby_core::falkor::with_graph`"); the new wording carves out `falkor.rs` — graph consumers OUTSIDE `crates/gcode/src/falkor.rs` enter through `gobby_core::falkor::with_graph`; `falkor.rs` itself owns the local Phase 7 facade connection path per the A1 facade exception and the §1.5 wrapper subsection; single-file scope pinned by §1.5.22; the "does not implement its own four-state Falkor probe" claim is preserved (the facade exception is limited to the connection-building chain required by the external Phase 7 source-inspection contract, not to reimplementing ServiceState probing). Whole-plan F1 sweep (per the adversary's "sweep the current normative plan body, excluding historical changelog entries, for remaining `falkor.rs` + `with_graph` delegation wording before resubmission" instruction): re-grepped the normative body (lines before the V1 Plan Changelog) for any other prose claiming `falkor.rs` delegates connection plumbing or graph queries through `gobby_core::falkor::with_graph` / `gobby_core::falkor::GraphClient::from_config`. 
Found and fixed one additional location — the §1.5 source-fragment subsection intro line and its closing line (previously said the named source fragments must remain "even if their surrounding bodies are restructured to delegate to `gobby_core::falkor::with_graph` / `gobby_core::falkor::GraphClient`" and that "Wrapper internals MAY add `gobby_core::falkor` delegation alongside them"); the new wording is explicit that the connection-building bodies (`FalkorClient::from_config`, `with_falkor`) own the local `falkordb::FalkorClientBuilder` / `FalkorConnectionInfo` / `SyncGraph` chain directly per the facade exception, while the read-helper query bodies internally delegate to `graph::code_graph` once §2.3 lands (per §2.3.4, which itself uses `gobby_core::falkor::with_graph`); the named fragments MUST remain visible regardless of which delegation path the enclosing body follows. The remaining `with_graph` / `GraphClient` references in the normative body (A1 facade exception bullet at line 28, §1.5 wrapper subsection at line 234, §1.5 closing wrapper-layer sentence at line 293, acceptance 1.5.5 / 1.5.12 / 1.5.22, acceptance 1.5.13 / 1.5.14 / 1.5.15 with permissive "may delegate" language, §2.3 read-helper delegation at line 423, acceptance 2.3.4 at line 430) are all consistent with the facade exception and the R30 fixes — they either describe the facade exception itself, scope the single-file boundary, or use permissive language ("may") that doesn't conflict with the carved-out connection-building bodies. No acceptance items were added or removed by R30, no covers labels changed, and no manifest validation_criteria changed (the §1.5 entry already chains the `falkor_facade_exception_scoped_to_falkor_rs` test that pins the single-file scope). `gobby plans validate` is expected to report valid=true, phase_count=3, deliverable_count=15, contract_plan=true. 
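The unbound-parameter ban recorded in R24 and R27 (generated Cypher must not contain `$offset`, `$limit`, or `$ids`; values are substituted inline) can be illustrated with a minimal sketch. The helper names `clamp_offset`, `cypher_string_literal`, and `id_list_literal`, the clamping expressions, and the fragments `target.id IN [{ids}]` and `SKIP {offset} LIMIT {limit}` come from the changelog; everything else is an assumption made for illustration: the value of `MAX_GRAPH_LIMIT`, the escaping rules, the `CALLS` edge label, and the hypothetical `example_page_query` builder. The real bodies in `crates/gcode/src/falkor.rs` may differ.

```rust
// Illustrative sketch only; not the actual contents of crates/gcode/src/falkor.rs.

/// Assumed cap. The plan names MAX_GRAPH_LIMIT but does not state its value.
const MAX_GRAPH_LIMIT: usize = 100;

/// Clamp a pagination offset with the `offset.min(MAX_GRAPH_LIMIT)` expression
/// that the changelog pins.
fn clamp_offset(offset: usize) -> usize {
    offset.min(MAX_GRAPH_LIMIT)
}

/// Render a string as a single-quoted Cypher literal.
/// Assumption: escaping backslashes and single quotes is sufficient here.
fn cypher_string_literal(value: &str) -> String {
    let mut out = String::with_capacity(value.len() + 2);
    out.push('\'');
    for ch in value.chars() {
        match ch {
            '\\' => out.push_str("\\\\"),
            '\'' => out.push_str("\\'"),
            other => out.push(other),
        }
    }
    out.push('\'');
    out
}

/// Join IDs into the comma-separated body of a Cypher list.
fn id_list_literal(ids: &[&str]) -> String {
    ids.iter()
        .map(|id| cypher_string_literal(id))
        .collect::<Vec<_>>()
        .join(", ")
}

/// Hypothetical query builder: every value is substituted inline, so the
/// generated string carries no `$offset`, `$limit`, or `$ids` placeholders.
/// The `CALLS` edge label is an assumption.
fn example_page_query(ids: &[&str], offset: usize, limit: usize) -> String {
    let ids = id_list_literal(ids);
    let offset = clamp_offset(offset);
    let limit = limit.clamp(1, MAX_GRAPH_LIMIT);
    format!(
        "MATCH (src)-[:CALLS]->(target) WHERE target.id IN [{ids}] \
         RETURN src.id SKIP {offset} LIMIT {limit}"
    )
}
```

Under these assumptions, `example_page_query(&["sym-1"], 0, 25)` produces a string containing `target.id IN ['sym-1']` and `SKIP 0 LIMIT 25`, which is the kind of literal fragment a source-inspection test can match without any bound parameters.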
+ +## M1 Task Manifest +`kind: manifest` ```yaml -- title: Create gobby-code graph library boundary +- title: Create gobby-code projection library boundary category: code task_type: feature depends_on: [] - validation_criteria: "cargo build -p gobby-code --no-default-features" + validation_criteria: "cargo build -p gobby-code --no-default-features && cargo test -p gobby-code --no-default-features tests::public_projection_api_is_cli_independent && cargo test -p gobby-code --no-default-features tests::falkor_facade_is_available" labels: - - covers:unknown:1.1:1.1.1 - - covers:unknown:1.1:1.1.2 - - covers:unknown:1.1:1.1.3 - - covers:unknown:1.1:1.1.4 - tdd: true - source_section: '1.1' + - covers:gcode-graph-enhancements:1.1:1.1.1 + - covers:gcode-graph-enhancements:1.1:1.1.2 + - covers:gcode-graph-enhancements:1.1:1.1.3 + - covers:gcode-graph-enhancements:1.1:1.1.4 implementation_domain: backend - assigned_agent: backend-developer + tdd: true + source_section: "1.1" - title: Add explicit standalone setup category: code task_type: feature depends_on: - - '1.1' - validation_criteria: "crates/gcode/src/setup.rs::tests::standalone_setup_is_scoped" + - "1.1" + validation_criteria: "cargo test -p gobby-code --no-default-features schema::tests::missing_schema_requires_setup && cargo test -p gobby-code --no-default-features setup::tests::standalone_setup_is_scoped && cargo test -p gobby-code --no-default-features commands::setup::tests::standalone_command_is_scoped && cargo test -p gobby-code --no-default-features tests::parse_setup_standalone && cargo test -p gobby-code --no-default-features tests::setup_runs_before_context_resolve && cargo test -p gobby-code --no-default-features setup::tests::standalone_setup_uses_gobby_core_contract" labels: - - covers:unknown:1.2:1.2.1 - - covers:unknown:1.2:1.2.2 - - covers:unknown:1.2:1.2.3 - - covers:unknown:1.2:1.2.4 - tdd: true - source_section: '1.2' + - covers:gcode-graph-enhancements:1.2:1.2.1 + - 
covers:gcode-graph-enhancements:1.2:1.2.2 + - covers:gcode-graph-enhancements:1.2:1.2.3 + - covers:gcode-graph-enhancements:1.2:1.2.4 + - covers:gcode-graph-enhancements:1.2:1.2.5 + - covers:gcode-graph-enhancements:1.2:1.2.6 + - covers:gcode-graph-enhancements:1.2:1.2.7 + - covers:gcode-graph-enhancements:1.2:1.2.8 implementation_domain: backend - assigned_agent: backend-developer + tdd: true + source_section: "1.2" - title: Add typed Falkor query boundary category: code task_type: feature depends_on: - - '1.1' - validation_criteria: "crates/gcode/src/graph/typed_query.rs::tests" + - "1.1" + - "1.5" + validation_criteria: "cargo test -p gobby-code --no-default-features graph::typed_query::tests" labels: - - covers:unknown:1.3:1.3.1 - - covers:unknown:1.3:1.3.2 - - covers:unknown:1.3:1.3.3 + - covers:gcode-graph-enhancements:1.3:1.3.1 + - covers:gcode-graph-enhancements:1.3:1.3.2 + - covers:gcode-graph-enhancements:1.3:1.3.3 + implementation_domain: backend tdd: true - source_section: '1.3' + source_section: "1.3" +- title: Add reusable code-fact indexing library API + category: code + task_type: feature + depends_on: + - "1.1" + - "1.5" + validation_criteria: "cargo test -p gobby-code --no-default-features index::indexer::tests::library_api_is_cli_independent && cargo test -p gobby-code --no-default-features index::indexer::tests::library_writes_all_code_facts" + labels: + - covers:gcode-graph-enhancements:1.4:1.4.1 + - covers:gcode-graph-enhancements:1.4:1.4.2 + - covers:gcode-graph-enhancements:1.4:1.4.3 + - covers:gcode-graph-enhancements:1.4:1.4.4 implementation_domain: backend - assigned_agent: backend-developer -- title: Define graph provenance metadata + tdd: true + source_section: "1.4" +- title: Wire gcode to the gobby-core foundation category: code task_type: feature depends_on: - - '1.1' - validation_criteria: "crates/gcode/src/graph/code_graph.rs::tests::code_edges_carry_provenance" + - "1.1" + validation_criteria: "cargo build -p gobby-code && cargo 
build -p gobby-code --no-default-features && cargo test -p gobby-code --no-default-features tests::foundation_consumer_migration && cargo test -p gobby-code --no-default-features config::tests::adapter_env_precedence_and_json_decode && cargo test -p gobby-code --no-default-features config::tests::adapter_resolves_config_store_secrets && cargo test -p gobby-code --no-default-features config::tests::falkor_config_wrapper_shape && cargo test -p gobby-code --no-default-features falkor::tests::falkor_client_wrapper_shape && cargo test -p gobby-code --no-default-features falkor::tests::phase7_read_helpers_visible && cargo test -p gobby-code --no-default-features falkor::tests::phase7_source_fragments_visible && cargo test -p gobby-code --no-default-features falkor::tests::phase7_query_surface_visible && cargo test -p gobby-code --no-default-features falkor::tests::phase7_query_helpers_and_literal_fragments_visible && cargo test -p gobby-code --no-default-features falkor::tests::phase7_cargo_and_lockfile_state && cargo test -p gobby-code --no-default-features config::tests::phase7_context_and_falkor_resolver_visible && cargo test -p gobby-code --no-default-features config::tests::phase7_falkordb_config_store_keys_visible && cargo test -p gobby-code --no-default-features falkor::tests::phase7_additional_query_fragments_visible && cargo test -p gobby-code --no-default-features config::tests::phase7_neo4j_transition_state_absent && cargo test -p gobby-code --no-default-features tests::falkor_facade_exception_scoped_to_falkor_rs" labels: - - covers:unknown:2.1:2.1.1 - - covers:unknown:2.1:2.1.2 - - covers:unknown:2.1:2.1.3 + - covers:gcode-graph-enhancements:1.5:1.5.1 + - covers:gcode-graph-enhancements:1.5:1.5.2 + - covers:gcode-graph-enhancements:1.5:1.5.3 + - covers:gcode-graph-enhancements:1.5:1.5.4 + - covers:gcode-graph-enhancements:1.5:1.5.5 + - covers:gcode-graph-enhancements:1.5:1.5.6 + - covers:gcode-graph-enhancements:1.5:1.5.7 + - 
covers:gcode-graph-enhancements:1.5:1.5.8 + - covers:gcode-graph-enhancements:1.5:1.5.9 + - covers:gcode-graph-enhancements:1.5:1.5.10 + - covers:gcode-graph-enhancements:1.5:1.5.11 + - covers:gcode-graph-enhancements:1.5:1.5.12 + - covers:gcode-graph-enhancements:1.5:1.5.13 + - covers:gcode-graph-enhancements:1.5:1.5.14 + - covers:gcode-graph-enhancements:1.5:1.5.15 + - covers:gcode-graph-enhancements:1.5:1.5.16 + - covers:gcode-graph-enhancements:1.5:1.5.17 + - covers:gcode-graph-enhancements:1.5:1.5.18 + - covers:gcode-graph-enhancements:1.5:1.5.19 + - covers:gcode-graph-enhancements:1.5:1.5.20 + - covers:gcode-graph-enhancements:1.5:1.5.21 + - covers:gcode-graph-enhancements:1.5:1.5.22 + implementation_domain: backend tdd: true - source_section: '2.1' + source_section: "1.5" +- title: Consume gobby-core generic indexing and search primitives + category: code + task_type: feature + depends_on: + - "1.4" + validation_criteria: "cargo build -p gobby-code --no-default-features && cargo test -p gobby-code --no-default-features index::walker::tests::walker_consumes_gobby_core_walker_settings && cargo test -p gobby-code --no-default-features index::hasher::tests::file_content_hash_delegates_to_gobby_core && cargo test -p gobby-code --no-default-features search::rrf::tests::merge_delegates_to_gobby_core_rrf && cargo test -p gobby-code --no-default-features index::chunker::tests::chunker_stays_gcode_owned_with_documented_narrowing && cargo test -p gobby-code --no-default-features tests::indexing_search_primitive_migration" + labels: + - covers:gcode-graph-enhancements:1.6:1.6.1 + - covers:gcode-graph-enhancements:1.6:1.6.2 + - covers:gcode-graph-enhancements:1.6:1.6.3 + - covers:gcode-graph-enhancements:1.6:1.6.4 + - covers:gcode-graph-enhancements:1.6:1.6.5 implementation_domain: backend - assigned_agent: backend-developer -- title: Port code graph writes to Rust core + tdd: true + source_section: "1.6" +- title: Define projection provenance metadata category: code 
task_type: feature depends_on: - - '1.3' - - '2.1' - validation_criteria: "crates/gcode/src/graph/code_graph.rs::tests::delete_preserves_current_symbols" + - "1.1" + validation_criteria: "cargo test -p gobby-code --no-default-features graph::code_graph::tests::code_edges_carry_provenance && cargo test -p gobby-code --no-default-features graph::report::tests::bridge_edges_are_hypotheses" labels: - - covers:unknown:2.2:2.2.1 - - covers:unknown:2.2:2.2.2 - - covers:unknown:2.2:2.2.3 - - covers:unknown:2.2:2.2.4 + - covers:gcode-graph-enhancements:2.1:2.1.1 + - covers:gcode-graph-enhancements:2.1:2.1.2 + - covers:gcode-graph-enhancements:2.1:2.1.3 + implementation_domain: backend tdd: true - source_section: '2.2' + source_section: "2.1" +- title: Port code graph writes to Rust core + category: code + task_type: feature + depends_on: + - "1.3" + - "2.1" + validation_criteria: "cargo test -p gobby-code --no-default-features graph::code_graph::tests::delete_preserves_current_symbols && cargo test -p gobby-code --no-default-features graph::code_graph::tests::cleanup_orphans_is_project_scoped && cargo test -p gobby-code --no-default-features models::tests::uuid5_python_parity" + labels: + - covers:gcode-graph-enhancements:2.2:2.2.1 + - covers:gcode-graph-enhancements:2.2:2.2.2 + - covers:gcode-graph-enhancements:2.2:2.2.3 + - covers:gcode-graph-enhancements:2.2:2.2.4 implementation_domain: backend - assigned_agent: backend-developer + tdd: true + source_section: "2.2" - title: Port code graph reads to Rust core category: code task_type: feature depends_on: - - '2.2' - validation_criteria: "crates/gcode/src/commands/graph.rs::tests::graph_reads_require_falkor" + - "2.2" + validation_criteria: "cargo test -p gobby-code --no-default-features commands::graph::tests::graph_reads_require_falkor && cargo test -p gobby-code --no-default-features falkor::tests::read_helpers_delegate_to_code_graph && cargo test -p gobby-code --no-default-features search::graph_boost::tests" labels: - 
- covers:unknown:2.3:2.3.1 - - covers:unknown:2.3:2.3.2 - - covers:unknown:2.3:2.3.3 - tdd: true - source_section: '2.3' + - covers:gcode-graph-enhancements:2.3:2.3.1 + - covers:gcode-graph-enhancements:2.3:2.3.2 + - covers:gcode-graph-enhancements:2.3:2.3.3 + - covers:gcode-graph-enhancements:2.3:2.3.4 implementation_domain: backend - assigned_agent: backend-developer + tdd: true + source_section: "2.3" - title: Wrap graph core with gcode commands category: code task_type: feature depends_on: - - '2.2' - - '2.3' - validation_criteria: "crates/gcode/src/main.rs::tests::parse_graph_commands" + - "1.2" + - "1.4" + - "2.2" + - "2.3" + validation_criteria: "cargo test -p gobby-code --no-default-features tests::parse_graph_commands && cargo test -p gobby-code --no-default-features tests::test_parse_callers_remains_top_level && cargo test -p gobby-code --no-default-features tests::test_parse_usages_remains_top_level && cargo test -p gobby-code --no-default-features tests::test_parse_imports_remains_top_level && cargo test -p gobby-code --no-default-features tests::test_parse_blast_radius_remains_top_level && cargo test -p gobby-code --no-default-features commands::graph::tests::top_level_read_commands_preserve_json_shape && cargo test -p gobby-code --no-default-features --test graph_standalone" labels: - - covers:unknown:2.4:2.4.1 - - covers:unknown:2.4:2.4.2 - - covers:unknown:2.4:2.4.3 - - covers:unknown:2.4:2.4.4 + - covers:gcode-graph-enhancements:2.4:2.4.1 + - covers:gcode-graph-enhancements:2.4:2.4.2 + - covers:gcode-graph-enhancements:2.4:2.4.3 + - covers:gcode-graph-enhancements:2.4:2.4.4 + - covers:gcode-graph-enhancements:2.4:2.4.5 + - covers:gcode-graph-enhancements:2.4:2.4.6 + implementation_domain: backend tdd: true - source_section: '2.4' + source_section: "2.4" +- title: Port code-symbol vector projection to Rust core + category: code + task_type: feature + depends_on: + - "1.1" + - "1.5" + - "2.1" + validation_criteria: "cargo test -p gobby-code 
--no-default-features vector::code_symbols::tests::embedding_request_response && cargo test -p gobby-code --no-default-features vector::code_symbols::tests::collection_name_compatibility && cargo test -p gobby-code --no-default-features commands::vector::tests::vector_lifecycle_requires_config && cargo test -p gobby-code --no-default-features vector::code_symbols::tests::summaries_are_optional_enrichment && cargo test -p gobby-code --no-default-features vector::code_symbols::tests::lifecycle_http_scoped_to_module && cargo test -p gobby-code --no-default-features vector::code_symbols::tests::routes_through_gobby_core_qdrant && cargo test -p gobby-code --no-default-features vector::code_symbols::tests::ensure_collection_resolves_vector_size_and_distance && cargo test -p gobby-code --no-default-features vector::code_symbols::tests::payloads_carry_provenance_metadata && cargo test -p gobby-code --no-default-features config::tests::vector_dim_setting_resolves_env_and_config_store && cargo test -p gobby-code --no-default-features --test vector_projection" + labels: + - covers:gcode-graph-enhancements:2.5:2.5.1 + - covers:gcode-graph-enhancements:2.5:2.5.2 + - covers:gcode-graph-enhancements:2.5:2.5.3 + - covers:gcode-graph-enhancements:2.5:2.5.4 + - covers:gcode-graph-enhancements:2.5:2.5.5 + - covers:gcode-graph-enhancements:2.5:2.5.6 + - covers:gcode-graph-enhancements:2.5:2.5.7 + - covers:gcode-graph-enhancements:2.5:2.5.8 + - covers:gcode-graph-enhancements:2.5:2.5.9 + - covers:gcode-graph-enhancements:2.5:2.5.10 + - covers:gcode-graph-enhancements:2.5:2.5.11 implementation_domain: backend - assigned_agent: backend-developer -- title: Generate project graph report in Rust core + tdd: true + source_section: "2.5" +- title: Add projection lifecycle orchestration commands category: code task_type: feature depends_on: - - '2.3' - validation_criteria: "crates/gcode/src/graph/report.rs::tests::report_shape" + - "1.2" + - "1.4" + - "2.4" + - "2.5" + validation_criteria: 
"cargo test -p gobby-code --no-default-features tests::parse_projection_lifecycle_commands && cargo test -p gobby-code --no-default-features projection::sync::tests::sync_state_tracks_projection_success && cargo test -p gobby-code --no-default-features commands::vector::tests::lifecycle_json_contract && cargo test -p gobby-code --no-default-features commands::index::tests::sync_projections_json_contract && cargo test -p gobby-code --no-default-features commands::index::tests::sync_projections_text_contract && cargo test -p gobby-code --no-default-features --test projection_standalone" labels: - - covers:unknown:3.1:3.1.1 - - covers:unknown:3.1:3.1.2 - - covers:unknown:3.1:3.1.3 - - covers:unknown:3.1:3.1.4 + - covers:gcode-graph-enhancements:2.6:2.6.1 + - covers:gcode-graph-enhancements:2.6:2.6.2 + - covers:gcode-graph-enhancements:2.6:2.6.3 + - covers:gcode-graph-enhancements:2.6:2.6.4 + - covers:gcode-graph-enhancements:2.6:2.6.5 + - covers:gcode-graph-enhancements:2.6:2.6.6 + - covers:gcode-graph-enhancements:2.6:2.6.7 + implementation_domain: backend tdd: true - source_section: '3.1' + source_section: "2.6" +- title: Generate project graph report in Rust core + category: code + task_type: feature + depends_on: + - "2.3" + validation_criteria: "cargo test -p gobby-code --no-default-features graph::report::tests::report_shape && cargo test -p gobby-code --no-default-features graph::report::tests::bridge_edges_are_read_only && cargo test -p gobby-code --no-default-features graph::report::tests::report_degradation_contract" + labels: + - covers:gcode-graph-enhancements:3.1:3.1.1 + - covers:gcode-graph-enhancements:3.1:3.1.2 + - covers:gcode-graph-enhancements:3.1:3.1.3 + - covers:gcode-graph-enhancements:3.1:3.1.4 implementation_domain: backend - assigned_agent: backend-developer + tdd: true + source_section: "3.1" - title: Add gcode graph report CLI wrapper category: code task_type: feature depends_on: - - '3.1' - validation_criteria: 
"crates/gcode/src/main.rs::tests::parse_graph_report_global_format" + - "2.6" + - "3.1" + validation_criteria: "cargo test -p gobby-code --no-default-features tests::parse_graph_report_global_format && cargo test -p gobby-code --no-default-features commands::graph::tests::report_text_structured_output && cargo test -p gobby-code --no-default-features commands::graph::tests::report_requires_graph_service" labels: - - covers:unknown:3.2:3.2.1 - - covers:unknown:3.2:3.2.2 - - covers:unknown:3.2:3.2.3 - - covers:unknown:3.2:3.2.4 - tdd: true - source_section: '3.2' + - covers:gcode-graph-enhancements:3.2:3.2.1 + - covers:gcode-graph-enhancements:3.2:3.2.2 + - covers:gcode-graph-enhancements:3.2:3.2.3 + - covers:gcode-graph-enhancements:3.2:3.2.4 implementation_domain: backend - assigned_agent: backend-developer + tdd: true + source_section: "3.2" - title: Document daemon migration contracts category: docs task_type: documentation depends_on: - - '2.4' - - '3.2' - validation_criteria: "docs/guides/gcode-graph-core.md documents direct Rust linking target" + - "2.6" + - "3.2" + validation_criteria: "docs/guides/gcode-graph-core.md exists and documents direct Rust linking target, transitional Python shell-out shims, code/memory ownership boundaries, and daemon-side optional symbol summaries" labels: - - covers:unknown:3.3:3.3.1 - - covers:unknown:3.3:3.3.2 - - covers:unknown:3.3:3.3.3 - tdd: false - source_section: '3.3' + - covers:gcode-graph-enhancements:3.3:3.3.1 + - covers:gcode-graph-enhancements:3.3:3.3.2 + - covers:gcode-graph-enhancements:3.3:3.3.3 + - covers:gcode-graph-enhancements:3.3:3.3.4 implementation_domain: backend - assigned_agent: backend-developer + tdd: false + source_section: "3.3" ``` diff --git a/.gobby/plans/gcore-rust-foundation.md b/.gobby/plans/gcore-rust-foundation.md index 57697cc..90b1eb3 100644 --- a/.gobby/plans/gcore-rust-foundation.md +++ b/.gobby/plans/gcore-rust-foundation.md @@ -9,6 +9,8 @@ Domain behavior stays out of this crate. 
Code graph facts, symbol IDs, language parsing policy, and graph APIs stay in `gobby-code`. Wiki vault layout, llm-wiki UX, ingestion flows, and article synthesis stay in `gobby-wiki`. `gobby-core` exists so those crates stop copying infrastructure while keeping their domain boundaries sharp. +The crate already exists at `crates/gcore/` with three small modules (`project`, `bootstrap`, `daemon_url`, ~250 lines total) consumed by `gobby-code` and `gobby-hooks`. This plan expands it into the full foundation layer behind Cargo feature gates so consumers that only need project discovery continue to get a tiny dependency. + ## C1: Constraints `kind: framing` @@ -18,6 +20,7 @@ Domain behavior stays out of this crate. Code graph facts, symbol IDs, language - **No domain ownership**: `gobby-core` must not know about code symbols, wiki documents, vault UX, task behavior, memories, or daemon workflow semantics. - **Graceful absence**: FalkorDB and Qdrant are optional dependencies at runtime. Missing services surface typed degradation rather than panics or fake empty success. - **Small public API**: APIs should expose stable structs, traits, and error enums, not CLI parser types or command-output formatting. +- **Feature-gated heavy dependencies**: the baseline crate (no features) stays dependency-light (`anyhow`, `dirs`, `serde_json`, `serde_yaml`). Heavy dependencies like `postgres`, `falkordb`, and `reqwest` live behind Cargo features. Consumers opt in to only the adapters they need. `gsqz`, `gloc`, and `ghook` must not inherit datastore dependencies they never use. ## D1: Dependent Plans `kind: framing` @@ -31,76 +34,616 @@ This plan is the foundation dependency for `.gobby/plans/gcode-graph-enhancement ## P1: Context And Setup Contracts `kind: framing` -**Goal**: define the shared Rust foundation boundary and the setup modes that every consumer crate can use without inheriting another domain's behavior. 
+**Goal**: define the shared Rust foundation boundary, degradation vocabulary, context/config resolution, and setup modes that every consumer crate can use without inheriting another domain's behavior. ### 1.1 Define the gobby-core public boundary [category: code] `kind: deliverable` -Targets: `crates/gcore/src/lib.rs`, `docs/guides/gcore-development-guide.md` +Targets: `crates/gcore/Cargo.toml`, `crates/gcore/src/lib.rs`, `docs/guides/gcore-development-guide.md` -Expand the current small `gobby-core` crate into a clearly documented foundation layer. The public module map should describe intended modules for: +Expand the existing `gobby-core` crate into the documented foundation layer. The crate currently exposes three modules (`project`, `bootstrap`, `daemon_url`). This task adds the module map, Cargo feature gates, and dev-guide updates for the full foundation surface. -- `context` - resolved project root, project id, bootstrap path, database URL, and service configs. -- `config` - env/config-store/default resolution helpers. -- `setup` - attached versus standalone setup contracts. -- `postgres` - PostgreSQL hub connections and config-store reads. -- `falkor` - FalkorDB client config, query execution, and escaping helpers. -- `qdrant` - Qdrant collection/config helpers and vector upsert/search contracts. -- `indexing` - generic walker, hash, chunk, and artifact traits. -- `search` - generic BM25/semantic/graph result contracts and RRF fusion. -- `degradation` - typed optional-service and partial-result states. +**Cargo.toml feature structure:** -The boundary must stay dependency-light where possible and must not expose `gcode` command types, wiki vault structs, or daemon workflow types. 
+```toml +[features] +default = [] +postgres = ["dep:postgres", "dep:postgres-types"] +falkor = ["dep:falkordb", "dep:urlencoding"] +qdrant = ["dep:reqwest"] +indexing = ["dep:ignore", "dep:sha2"] +search = [] +full = ["postgres", "falkor", "qdrant", "indexing", "search"] -**Acceptance:** +[dependencies] +anyhow = "1" +dirs = "6" +serde = { version = "1", features = ["derive"] } +serde_json = "1" +serde_yaml = "0.9" +thiserror = "2" -- 1.1.1 - `crates/gcore/src/lib.rs` documents the foundation module map and domain boundary. file: `crates/gcore/src/lib.rs`. -- 1.1.2 - `docs/guides/gcore-development-guide.md` names `gobby-core` as shared substrate, not a code-graph or wiki domain crate. file: `docs/guides/gcore-development-guide.md`. -- 1.1.3 - Public APIs avoid `clap`, command-output, code-symbol, and wiki-vault types. test: `crates/gcore/src/lib.rs::tests::public_api_has_no_domain_types`. -- 1.1.4 - `gobby-core` remains buildable under CI's no-default-features profile. test: `cargo build -p gobby-core --no-default-features`. +# Feature-gated +postgres = { version = "0.19", optional = true } +postgres-types = { version = "0.2", optional = true } +falkordb = { version = "0.2", optional = true } +reqwest = { version = "0.12", features = ["blocking", "json"], optional = true } +ignore = { version = "0.4", optional = true } +sha2 = { version = "0.10", optional = true } +urlencoding = { version = "2", optional = true } +``` -### 1.2 Add shared context and config resolution [category: code] (depends: 1.1) -`kind: deliverable` +The `falkor` feature includes `dep:urlencoding` because `GraphClient::from_config` uses `urlencoding::encode` for password-safe FalkorDB connection URLs. Without this, `cargo build -p gobby-core --features falkor` would fail to compile while `--all-features` would mask the error. 
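The password-encoding concern behind `dep:urlencoding` can be illustrated with a minimal, std-only sketch. Everything here is hypothetical: `encode_component` is a stand-in for `urlencoding::encode`, `falkor_url` is not the plan's `GraphClient::from_config`, and the `falkor://:password@host:port` layout is an assumed URL shape rather than a confirmed FalkorDB client contract.

```rust
/// Percent-encode every byte outside the RFC 3986 unreserved set.
/// Hypothetical std-only stand-in for `urlencoding::encode`.
fn encode_component(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for byte in input.bytes() {
        match byte {
            // Unreserved characters pass through unchanged.
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'_' | b'.' | b'~' => {
                out.push(byte as char)
            }
            // Everything else becomes %XX with uppercase hex digits.
            _ => out.push_str(&format!("%{byte:02X}")),
        }
    }
    out
}

/// Build a connection URL with an optional password.
/// The scheme and the `:password@` layout are assumptions for illustration.
fn falkor_url(host: &str, port: u16, password: Option<&str>) -> String {
    match password {
        Some(pw) => format!("falkor://:{}@{}:{}", encode_component(pw), host, port),
        None => format!("falkor://{host}:{port}"),
    }
}
```

Without the encoding step, a password such as `p@ss:w/rd` would add extra `@`, `:`, and `/` delimiters to the URL, so the host and port could be parsed incorrectly.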
-Targets: `crates/gcore/src/context.rs`, `crates/gcore/src/config.rs`, `crates/gcore/src/bootstrap.rs`, `crates/gcore/src/project.rs` +**lib.rs module map (expanded):** + +```rust +//! Shared primitives for Gobby CLI tools. + +// Always available — existing modules +pub mod bootstrap; +pub mod daemon_url; +pub mod project; -Create a shared context resolver that consumer crates can call before opening datastore clients. It should combine the existing project-root and bootstrap helpers with service config resolution: +// Always available — new lightweight modules +pub mod config; +pub mod context; +pub mod degradation; +pub mod setup; -- Resolve project root and project id from `.gobby/project.json` without writing that file. -- Resolve PostgreSQL DSN from environment or `bootstrap.yaml`. -- Resolve FalkorDB and Qdrant config from environment, then `config_store`, then defaults. -- Preserve explicit absence for optional services instead of manufacturing usable configs. -- Return a reusable context struct that is independent of any CLI command enum. +// Feature-gated modules +#[cfg(feature = "postgres")] +pub mod postgres; -Config-store reads belong behind a PostgreSQL connection supplied by the caller or built from the resolved DSN. The helper may read `config_store`; it must not write it. +#[cfg(feature = "falkor")] +pub mod falkor; + +#[cfg(feature = "qdrant")] +pub mod qdrant; + +#[cfg(feature = "indexing")] +pub mod indexing; + +#[cfg(feature = "search")] +pub mod search; +``` + +Update `docs/guides/gcore-development-guide.md` to document the expanded module map, feature gate rationale, and updated "Adding a New Helper" guidance that accounts for feature-gated modules. **Acceptance:** -- 1.2.1 - Context resolution returns project root, project id, database URL, and optional service configs. file: `crates/gcore/src/context.rs`. -- 1.2.2 - FalkorDB and Qdrant resolution preserves env-var precedence over `config_store` over defaults. 
test: `crates/gcore/src/config.rs::tests::env_overrides_config_store`. -- 1.2.3 - Missing optional service config is represented explicitly and does not panic. test: `crates/gcore/src/context.rs::tests::missing_optional_services_are_none`. -- 1.2.4 - Project identity is read-only and never writes `.gobby/project.json`. test: `crates/gcore/src/project.rs::tests::read_project_id_is_non_destructive`. +- 1.1.1 - `crates/gcore/Cargo.toml` defines feature gates for `postgres`, `falkor` (including `dep:urlencoding`), `qdrant`, `indexing`, `search`, and `full`. file: `crates/gcore/Cargo.toml`. +- 1.1.2 - `crates/gcore/src/lib.rs` documents the foundation module map with `#[cfg(feature)]` guards on heavy modules. file: `crates/gcore/src/lib.rs`. +- 1.1.3 - `docs/guides/gcore-development-guide.md` describes `gobby-core` as shared substrate with feature gate documentation. file: `docs/guides/gcore-development-guide.md`. +- 1.1.4 - `gobby-core` builds under `--no-default-features` with only the lightweight baseline modules. test: `cargo build -p gobby-core --no-default-features`. +- 1.1.5 - Each individual feature compiles in isolation: `cargo build -p gobby-core --features falkor`, `--features qdrant`, `--features postgres`, `--features indexing`. test: `cargo build -p gobby-core --features falkor && cargo build -p gobby-core --features qdrant && cargo build -p gobby-core --features postgres && cargo build -p gobby-core --features indexing`. +- 1.1.6 - Baseline `gobby-core` (no features) passes build, test, and clippy without datastore dependencies, matching CI's `--no-default-features` requirement from `AGENTS.md`. test: `cargo build -p gobby-core --no-default-features && cargo test -p gobby-core --no-default-features && cargo clippy -p gobby-core --no-default-features -- -D warnings`. 
-### 1.3 Define attached and standalone setup contracts [category: code] (depends: 1.2) +### 1.2 Add shared error and degradation contracts [category: code] (depends: 1.1) `kind: deliverable` -Targets: `crates/gcore/src/setup.rs`, `docs/guides/gcore-development-guide.md` +Targets: `crates/gcore/src/degradation.rs`, `docs/guides/gcore-development-guide.md` -Add shared setup contracts that domain crates can implement without copying policy: +Define shared error and degradation contracts used by datastore adapters, setup contracts, indexing, and search. This module is always available (no feature gate) so even lightweight consumers can use the vocabulary. It must be defined before the setup contracts (§1.4), datastore adapters (§2.2, §2.3), and search fusion (§3.2) that consume its types. + +**degradation.rs — error and degradation types:** + +```rust +use serde::{Serialize, Deserialize}; + +/// Service availability state, returned alongside results from adapters. +#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)] +pub enum ServiceState { + /// Service is connected and responding. + Available, + /// Service is not configured (no config found from any source). + NotConfigured, + /// Service is configured but unreachable. + Unreachable { message: String }, +} + +impl ServiceState { + pub fn is_available(&self) -> bool { + matches!(self, Self::Available) + } +} + +/// Setup validation issue with actionable guidance. +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct SetupIssue { + pub object_name: String, + pub store: String, + pub guidance: Guidance, +} + +/// Structured guidance text for setup issues. +/// +/// Callers render these; `gobby-core` does not format CLI output. +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct Guidance { + /// What is missing or wrong. + pub problem: String, + /// What the user should do. + pub action: String, + /// Optional command suggestion. 
+    pub command_hint: Option<String>,
+}
+
+/// Fatal errors that prevent a command from completing.
+#[derive(Debug, Serialize, Deserialize, thiserror::Error)]
+pub enum CoreError {
+    #[error("invalid configuration: {0}")]
+    InvalidConfig(String),
+    #[error("required service unavailable: {service} — {message}")]
+    RequiredServiceUnavailable { service: String, message: String },
+    #[error("write failed: {0}")]
+    WriteFailed(String),
+    #[error("corrupted input: {0}")]
+    CorruptedInput(String),
+}
+
+/// Degradation states for partial results.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub enum DegradationKind {
+    /// An optional service was unavailable during this operation.
+    ServiceUnavailable { service: String, state: ServiceState },
+    /// Search completed with fewer sources than requested.
+    PartialSearch { available: Vec<String>, unavailable: Vec<String> },
+    /// Index data may be stale (content hash mismatch or age threshold).
+    StaleIndex { paths: Vec<String> },
+    /// Some artifacts were skipped during indexing.
+    SkippedArtifacts { count: usize, reason: String },
+}
+```
+
+Consumers decide which services are required versus optional for each command. `gobby-core` supplies the vocabulary, serialization, and structured guidance.
+
+**Acceptance:**
+
+- 1.2.1 - `ServiceState`, `CoreError`, `DegradationKind`, and `Guidance` are documented and `Serialize + Deserialize`. file: `crates/gcore/src/degradation.rs`.
+- 1.2.2 - `ServiceState::NotConfigured` and `ServiceState::Unreachable` are distinct from `CoreError::RequiredServiceUnavailable`. test: `crates/gcore/src/degradation.rs::tests::optional_service_degradation_is_not_fatal`.
+- 1.2.3 - `Guidance` carries structured `problem`, `action`, and optional `command_hint` fields. test: `crates/gcore/src/degradation.rs::tests::guidance_is_structured`.
 - 1.2.4 - Development guide documents how `gobby-code` and `gobby-wiki` consume degradation contracts. file: `docs/guides/gcore-development-guide.md`.
+- 1.2.5 - `CoreError` round-trips through `serde_json::to_string` / `serde_json::from_str` for at least the `InvalidConfig` and `RequiredServiceUnavailable` variants. test: `crates/gcore/src/degradation.rs::tests::core_error_serialization_roundtrip`. + +### 1.3 Add shared context and config resolution [category: code] (depends: 1.1) +`kind: deliverable` + +Targets: `crates/gcore/src/context.rs`, `crates/gcore/src/config.rs`, `crates/gcore/src/bootstrap.rs`, `crates/gcore/src/project.rs` + +Extend the existing `bootstrap.rs` and `project.rs` modules and create new `context.rs` and `config.rs` modules for shared context resolution. The existing project-root and bootstrap helpers remain unchanged; this task adds service config resolution on top. + +`gcode` currently resolves its own `Context` struct (at `crates/gcode/src/config.rs:43`) with project root, database URL, and optional `FalkorConfig`/`QdrantConfig`/`EmbeddingConfig`. The shared `CoreContext` generalizes this pattern so `gobby-wiki` can resolve the same services without duplicating resolution logic. + +**context.rs — shared context struct:** + +```rust +use std::path::PathBuf; + +/// Resolved runtime context for any gobby-core consumer. +/// +/// Built by `CoreContext::build()` from pre-resolved inputs. +/// Contains project identity and optional service configs. +/// Domain-specific fields (quiet flags, output format) stay +/// in consumer crates. 
+pub struct CoreContext {
+    /// Project root directory (contains `.gobby/`)
+    pub project_root: PathBuf,
+    /// Project ID from `.gobby/project.json`
+    pub project_id: String,
+    /// PostgreSQL hub DSN (from env or bootstrap.yaml)
+    pub database_url: Option<String>,
+    /// FalkorDB config (None when service is absent)
+    pub falkordb: Option<FalkorConfig>,
+    /// Qdrant config (None when service is absent)
+    pub qdrant: Option<QdrantConfig>,
+    /// Embedding API config (None → no semantic search)
+    pub embedding: Option<EmbeddingConfig>,
+    /// Gobby daemon base URL
+    pub daemon_url: Option<String>,
+}
+
+impl CoreContext {
+    /// Build a CoreContext from pre-resolved inputs.
+    ///
+    /// **DSN resolution is consumer-owned.** Each consumer resolves
+    /// its database URL through its own fallback chain:
+    /// - `gcode`: daemon broker → `GCODE_DATABASE_URL` →
+    ///   `GOBBY_POSTGRES_DSN` → `~/.gobby/gcode.yaml` →
+    ///   `bootstrap.yaml` direct DSN
+    /// - `gwiki`: same chain with `GWIKI_DATABASE_URL` priority
+    ///
+    /// **Project identity** uses existing `gobby-core` helpers:
+    /// `project::find_project_root` walks up from cwd,
+    /// `project::read_project_id` reads `.gobby/project.json`.
+    ///
+    /// **Service config resolution** (FalkorDB, Qdrant, embedding)
+    /// is shared through the `ConfigSource` trait. The consumer
+    /// provides a `ConfigSource` implementation that owns its own
+    /// database connection (e.g. `PostgresConfigSource` wrapping
+    /// `&mut Client`). When no database is available, pass an
+    /// `EnvOnlySource` to resolve from environment variables only.
+    ///
+    /// **Daemon URL** is resolved from the existing
+    /// `daemon_url::daemon_url()` helper.
+    pub fn build(
+        project_root: PathBuf,
+        project_id: String,
+        database_url: Option<String>,
+        source: &mut impl ConfigSource,
+    ) -> Self {
+        let falkordb = resolve_falkordb_config(source);
+        let qdrant = resolve_qdrant_config(source);
+        let embedding = resolve_embedding_config(source);
+        let daemon_url = Some(crate::daemon_url::daemon_url());
+        Self {
+            project_root,
+            project_id,
+            database_url,
+            falkordb,
+            qdrant,
+            embedding,
+            daemon_url,
+        }
+    }
+}
+```
+
+**config.rs — service config types, value decoding, and resolution:**
+
+```rust
+/// FalkorDB connection configuration.
+///
+/// Contains connection-level config only. Graph name selection is
+/// consumer-owned — gcode supplies "gobby_code", gwiki supplies its
+/// own graph name — so it is passed to `GraphClient::from_config`
+/// (§2.2) rather than stored here.
+#[derive(Debug, Clone)]
+pub struct FalkorConfig {
+    pub host: String,
+    pub port: u16,
+    pub password: Option<String>,
+}
+
+/// Qdrant connection configuration.
+///
+/// Contains connection-level config only. Collection naming is
+/// consumer-owned via `CollectionScope` and `collection_name` (§2.3).
+#[derive(Debug, Clone)]
+pub struct QdrantConfig {
+    pub url: Option<String>,
+    pub api_key: Option<String>,
+}
+
+/// Embedding API configuration (OpenAI-compatible endpoint).
+#[derive(Debug, Clone)]
+pub struct EmbeddingConfig {
+    pub api_base: String,
+    pub model: String,
+    pub api_key: Option<String>,
+}
+
+/// Decode a config_store value from its stored representation.
+///
+/// The Gobby config_store stores values as raw strings that may be
+/// JSON-encoded. This function handles:
+/// - JSON strings (`"http://host:7474"`) → inner value (`http://host:7474`)
+/// - JSON arrays/objects → re-serialized as JSON strings (preserves
+///   structured config values like `["alpha",1,true]`)
+/// - JSON scalars (numbers, bools) → `value.to_string()`
+/// - Plain strings (`http://host:7474`) → pass-through
+/// - JSON null → `None`
+///
+/// **Intentional divergence from current `gcode`:** current `gcode`
+/// (`crates/gcode/src/config.rs:378-386`) has no explicit `Null` branch,
+/// so JSON null falls through to `Ok(value) => Some(value.to_string())`
+/// producing `Some("null")`. This is a bug: the literal string `"null"`
+/// would be passed as a FalkorDB password, Qdrant URL, or embedding
+/// model name, which is incorrect. JSON null in config-store semantically
+/// means "absent", so this shared version returns `None`. The `gcode`
+/// migration to `gobby-core` will adopt the fixed behavior.
+pub fn decode_config_value(raw: &str) -> Option<String> {
+    match serde_json::from_str::<serde_json::Value>(raw) {
+        Ok(serde_json::Value::String(s)) => Some(s),
+        Ok(value @ (serde_json::Value::Array(_) | serde_json::Value::Object(_))) => {
+            Some(serde_json::to_string(&value).unwrap_or_else(|_| raw.to_string()))
+        }
+        Ok(serde_json::Value::Null) => None,
+        Ok(value) => Some(value.to_string()),
+        Err(_) => Some(raw.to_string()), // plain string
+    }
+}
+
+/// Resolve `${VAR}` and `${VAR:-default}` environment variable patterns.
+///
+/// Returns `Ok(None)` when the var is unset and no default is provided.
+/// Returns `Ok(Some(value))` with the resolved value.
+/// Non-pattern strings pass through unchanged.
+pub fn resolve_env_pattern(value: &str) -> anyhow::Result<Option<String>> {
+    if !value.contains("${") {
+        return Ok(Some(value.to_string()));
+    }
+    // Handle ${VAR:-default} and ${VAR} patterns
+    // (regex or manual parsing of the pattern)
+    todo!("implementation")
+}
+
+/// Source for config values and value resolution.
+/// +/// Implementors own their datastore connection internally, avoiding +/// the borrow conflict between connection access and value resolution +/// that a `Box` callback would cause. `gobby-core` calls +/// `config_value` to read settings and `resolve_value` to expand +/// `$secret:NAME` and `${VAR}` patterns. +/// +/// This mirrors the existing `FalkorConfigSource` trait pattern in +/// `gcode` (`crates/gcode/src/config.rs`). `gcode` implements this +/// with a `PostgresConfigSource` that holds `&'a mut postgres::Client`: +/// +/// ```rust,ignore +/// // Requires feature "postgres" — references gobby_core::postgres +/// // and consumer-only crate::secrets (not in gobby-core). +/// struct PostgresConfigSource<'a> { +/// conn: &'a mut postgres::Client, +/// } +/// impl ConfigSource for PostgresConfigSource<'_> { +/// fn config_value(&mut self, key: &str) -> Option { +/// gobby_core::postgres::read_config_value(self.conn, key) +/// .ok().flatten() +/// .and_then(|raw| gobby_core::config::decode_config_value(&raw)) +/// } +/// fn resolve_value(&mut self, value: &str) -> anyhow::Result { +/// crate::secrets::resolve_config_value(value, self.conn) +/// } +/// } +/// ``` +pub trait ConfigSource { + /// Read a decoded config value by key (from config_store, env, etc.). + /// Returns `None` for missing keys. Implementations that read from + /// `config_store` MUST pipe raw values through `decode_config_value` + /// before returning — this unwraps JSON string encoding and + /// re-serializes arrays/objects, matching `gcode`'s existing behavior. + /// Returning raw `read_config_value` output would pass JSON-encoded + /// strings into service config resolution. + fn config_value(&mut self, key: &str) -> Option; + + /// Resolve interpolation patterns in a config value. + /// Handles `$secret:NAME`, `${VAR}`, `${VAR:-default}`. + fn resolve_value(&mut self, value: &str) -> anyhow::Result; +} + +/// Env-only config source for consumers without database access. 
+/// +/// Returns `None` for all config keys (no config_store) and resolves +/// only `${VAR}` patterns (not `$secret:NAME`). Used when +/// `database_url` is `None` or the `postgres` feature is disabled. +pub struct EnvOnlySource; + +impl ConfigSource for EnvOnlySource { + fn config_value(&mut self, _key: &str) -> Option { + None + } + fn resolve_value(&mut self, value: &str) -> anyhow::Result { + resolve_env_pattern(value)? + .ok_or_else(|| anyhow::anyhow!("unresolved pattern: {value}")) + } +} + +/// Resolve FalkorDB config from env → config_store → defaults. +/// +/// Env vars: GOBBY_FALKORDB_HOST, GOBBY_FALKORDB_PORT, +/// GOBBY_FALKORDB_PASSWORD +/// +/// Graph name is NOT resolved here — it is consumer-owned. +/// See `GraphClient::from_config` (§2.2). +/// +/// The `ConfigSource` handles reading config values and resolving +/// `$secret:NAME` and `${VAR}` patterns. Env vars take precedence +/// over config_store; config_store values are decoded and resolved +/// through `source.config_value` and `source.resolve_value`. +/// +/// Returns None when no host is configured from any source. +pub fn resolve_falkordb_config( + source: &mut impl ConfigSource, +) -> Option { /* ... */ } + +/// Resolve Qdrant config from env → config_store → defaults. +/// +/// Env vars: GOBBY_QDRANT_URL, GOBBY_QDRANT_API_KEY +/// +/// Collection naming is NOT resolved here — it is consumer-owned +/// via `CollectionScope` and `collection_name` (§2.3). +/// +/// Same source semantics as `resolve_falkordb_config`. +pub fn resolve_qdrant_config( + source: &mut impl ConfigSource, +) -> Option { /* ... */ } + +/// Resolve embedding API config from env → config_store → defaults. +/// +/// Env vars: GOBBY_EMBEDDING_URL, GOBBY_EMBEDDING_MODEL, +/// GOBBY_EMBEDDING_API_KEY +/// +/// `GOBBY_EMBEDDING_URL` is the canonical env var, matching existing +/// `gcode` behavior (`crates/gcode/src/config.rs:534`). +/// +/// Same source semantics as `resolve_falkordb_config`. 
+pub fn resolve_embedding_config( + source: &mut impl ConfigSource, +) -> Option { /* ... */ } +``` + +The resolution functions are not feature-gated themselves — they take `&mut impl ConfigSource`, and the consumer's `ConfigSource` implementation decides how to access the datastore. When no database is available, consumers pass `EnvOnlySource` to resolve from environment variables only. + +**Config value pipeline:** `ConfigSource.config_value` reads and decodes the config value (the PostgreSQL implementation calls `read_config_value` then `decode_config_value`) → `ConfigSource.resolve_value` handles `$secret:NAME` and `${VAR}` patterns. This preserves the existing `gcode` resolution semantics (env → config_store with JSON decode → secret/env interpolation → defaults) without pulling Fernet crypto dependencies into `gobby-core`. The `ConfigSource` trait eliminates the borrow conflict between connection access and value resolution because the implementor owns its connection internally via `&mut self`. + +**Existing modules — no breaking changes:** + +- `bootstrap.rs`: no API changes. The existing `DaemonEndpoint`, `read_daemon_endpoint`, and `bootstrap_path` remain as-is. +- `project.rs`: no API changes. The existing `find_project_root` and `read_project_id` remain as-is. + +**Acceptance:** + +- 1.3.1 - `CoreContext` struct holds project root, project id, database URL, and optional service configs. file: `crates/gcore/src/context.rs`. +- 1.3.2 - FalkorDB and Qdrant resolution preserves env-var precedence over `config_store` over defaults. test: `crates/gcore/src/config.rs::tests::env_overrides_config_store`. +- 1.3.3 - Missing optional service config is represented as `None` and does not panic. test: `crates/gcore/src/context.rs::tests::missing_optional_services_are_none`. +- 1.3.4 - Existing `bootstrap.rs` and `project.rs` public APIs are unchanged. test: `crates/gcore/src/project.rs::tests::read_project_id_is_non_destructive`. 
+- 1.3.5 - `decode_config_value` unwraps JSON strings, re-serializes JSON arrays/objects as JSON strings (preserving structured config values), converts scalars via `to_string`, passes through plain strings, and returns `None` for JSON null. This is an intentional fix: current `gcode` returns `Some("null")` for JSON null, which is incorrect for config values used as passwords, URLs, or model names. test: `crates/gcore/src/config.rs::tests::decode_config_value_handles_json_and_plain`. +- 1.3.6 - `resolve_env_pattern` resolves `${VAR}` and `${VAR:-default}` patterns from environment variables. test: `crates/gcore/src/config.rs::tests::resolve_env_pattern_with_defaults`. +- 1.3.7 - Resolution functions accept `&mut impl ConfigSource` for config reads and `$secret:NAME` interpolation; `EnvOnlySource` provides a no-database baseline. test: `crates/gcore/src/config.rs::tests::config_source_handles_secrets`. +- 1.3.8 - `CoreContext::build` resolves service configs through `ConfigSource` and produces a complete context from pre-resolved DSN, project root, and project ID. test: `crates/gcore/src/context.rs::tests::build_with_env_only_source`. +- 1.3.9 - `resolve_embedding_config` uses `GOBBY_EMBEDDING_URL` as the canonical env var, preserving existing `gcode` behavior. test: `crates/gcore/src/config.rs::tests::embedding_url_env_var_is_canonical`. +- 1.3.10 - `ConfigSource` trait methods use `&mut self`, allowing implementations to hold mutable database connections without borrow conflicts. test: `crates/gcore/src/config.rs::tests::postgres_config_source_resolves_secrets`. +- 1.3.11 - End-to-end: `resolve_falkordb_config`, `resolve_qdrant_config`, and `resolve_embedding_config` correctly consume JSON-encoded config-store strings (e.g. `"\"http://host:7474\""` unwrapped to `http://host:7474`) and structured values (e.g. `"[\"alpha\",1]"` re-serialized) when the `ConfigSource` implementation pipes through `decode_config_value`. 
test: `crates/gcore/src/config.rs::tests::resolve_config_handles_json_encoded_store_values`. +- 1.3.12 - `FalkorConfig` contains only connection-level config (host, port, password); no `graph_name` field — graph name selection is consumer-owned via `GraphClient::from_config(config, graph_name)` (§2.2). `gobby-core` source contains no `"gobby_code"` string or wiki graph name default. test: `crates/gcore/src/config.rs::tests::falkordb_config_has_no_domain_graph_name`. +- 1.3.13 - `QdrantConfig` contains only connection-level config (url, api_key); no `collection_prefix` field — collection naming is consumer-owned via `CollectionScope` and `collection_name` (§2.3). test: `crates/gcore/src/config.rs::tests::qdrant_config_has_no_domain_collection_prefix`. + +### 1.4 Define attached and standalone setup contracts [category: code] (depends: 1.2, 1.3) +`kind: deliverable` -- `AttachedMode` validates externally managed Gobby resources and never creates or migrates them. -- `StandaloneMode` is an explicit setup operation that may create only consumer-owned resources in a selected database/schema/collection namespace. -- Runtime validation reports missing prerequisites with actionable setup guidance. -- Domain crates supply ownership rules, required objects, and creation callbacks for standalone mode. +Targets: `crates/gcore/src/setup.rs`, `docs/guides/gcore-development-guide.md` -`gobby-core` should provide the contract and guardrails. It should not hardcode `gcode_*`, `gwiki_*`, `Symbol`, `WikiDoc`, or any other domain-owned objects. +Add shared setup contracts that domain crates implement without copying policy. `gcode` currently validates schema in its own schema module with inline checks; `gobby-wiki` will need similar validation for `gwiki_*` tables. The shared contract generalizes the validation/creation boundary. + +**setup.rs — contract traits and types:** + +```rust +use crate::degradation::{SetupIssue, Guidance}; + +/// Datastore kind for object classification. 
+pub enum StoreKind { + Postgres, + FalkorDB, + Qdrant, +} + +/// Context supplied to validation callbacks. +/// +/// Contains optional mutable connections to each datastore. Consumers +/// use whichever connection their validator needs; `None` means the +/// service is not configured. The mutable references are required +/// because `postgres::Client::query` takes `&mut self`. +pub struct ValidationContext<'a> { + #[cfg(feature = "postgres")] + pub pg: Option<&'a mut postgres::Client>, + pub falkor_config: Option<&'a crate::config::FalkorConfig>, + pub qdrant_config: Option<&'a crate::config::QdrantConfig>, +} + +/// Result of running all attached-mode validators. +#[derive(Debug, Default)] +pub struct ValidationReport { + /// Names of objects that passed validation. + pub present: Vec, + /// Objects that failed validation, with structured issue details. + pub missing: Vec<(String, SetupIssue)>, +} + +impl ValidationReport { + pub fn is_healthy(&self) -> bool { + self.missing.is_empty() + } +} + +/// Required object that a consumer crate declares for setup validation. +pub struct RequiredObject { + /// Human-readable name (e.g. "symbols table", "wiki_docs table") + pub name: String, + /// Store kind: Postgres, FalkorDB, Qdrant + pub store: StoreKind, + /// Consumer-supplied check function (mutable for database queries) + pub validator: Box) -> Result<(), SetupIssue>>, +} + +/// Attached-mode validation: check that externally managed resources exist. +/// Never creates, alters, or drops anything. +pub trait AttachedValidator { + /// Declare the objects this consumer requires. + fn required_objects(&self) -> Vec; + + /// Run all validators and return a report of present/missing objects. 
+ fn validate(&self, ctx: &mut ValidationContext<'_>) -> ValidationReport { + let mut report = ValidationReport::default(); + for mut obj in self.required_objects() { + match (obj.validator)(ctx) { + Ok(()) => report.present.push(obj.name.clone()), + Err(issue) => report.missing.push((obj.name.clone(), issue)), + } + } + report + } +} + +/// Context supplied to standalone setup creation callbacks. +/// +/// Mutable references are required because `postgres::Client::execute` +/// takes `&mut self` for DDL/DML operations. +pub struct SetupContext<'a> { + #[cfg(feature = "postgres")] + pub pg: Option<&'a mut postgres::Client>, + pub falkor_config: Option<&'a crate::config::FalkorConfig>, + pub qdrant_config: Option<&'a crate::config::QdrantConfig>, + /// If true, skip prompts and apply defaults. + pub non_interactive: bool, +} + +/// Report from a standalone setup creation run. +#[derive(Debug, Default)] +pub struct SetupReport { + /// Objects successfully created. + pub created: Vec, + /// Objects that already existed and were skipped. + pub skipped: Vec, + /// Objects that failed creation, with error detail. + pub failed: Vec<(String, String)>, +} + +/// Error from standalone setup creation. +#[derive(Debug, thiserror::Error)] +pub enum SetupError { + #[error("connection failed for {store}: {message}")] + ConnectionFailed { store: String, message: String }, + #[error("creation failed for {object}: {message}")] + CreationFailed { object: String, message: String }, + #[error("setup refused in attached mode — use standalone setup")] + AttachedModeRefused, +} + +/// An object that a consumer crate owns and can create in standalone mode. +pub struct OwnedObject { + /// Human-readable name (e.g. 
"gcode_symbols table") + pub name: String, + /// Store kind: Postgres, FalkorDB, Qdrant + pub store: StoreKind, + /// Consumer-supplied creation function (mutable for DDL execution) + pub creator: Box) -> Result<(), SetupError>>, +} + +/// Standalone-mode setup: explicit opt-in creation of consumer-owned resources. +pub trait StandaloneSetup { + /// Namespace prefix for this consumer's owned resources (e.g. "gcode", "gwiki"). + fn namespace(&self) -> &str; + + /// Declare what this consumer owns and can create. + fn owned_objects(&self) -> Vec; + + /// Create consumer-owned resources. Called only on explicit `setup` command. + fn create(&self, ctx: &mut SetupContext<'_>) -> Result; +} +``` + +`SetupIssue` and `Guidance` are imported from `crate::degradation` (defined in §1.2, always available without feature gates). + +`gobby-core` provides the contract traits and validation helpers. It does not hardcode `gcode_*`, `gwiki_*`, `Symbol`, `WikiDoc`, or any domain-owned objects. **Acceptance:** -- 1.3.1 - Attached-mode setup exposes read-only validation hooks. file: `crates/gcore/src/setup.rs`. -- 1.3.2 - Standalone setup requires explicit opt-in and consumer-owned object declarations. file: `crates/gcore/src/setup.rs`. -- 1.3.3 - Runtime validation returns a typed missing-prerequisite error with setup guidance. test: `crates/gcore/src/setup.rs::tests::runtime_validation_reports_setup_guidance`. -- 1.3.4 - Setup docs state that `gobby-core` does not create Gobby-owned schema in attached mode. file: `docs/guides/gcore-development-guide.md`. +- 1.4.1 - `AttachedValidator` trait exposes read-only validation hooks that never mutate datastore schema. file: `crates/gcore/src/setup.rs`. +- 1.4.2 - `StandaloneSetup` trait requires explicit namespace and consumer-owned object declarations. file: `crates/gcore/src/setup.rs`. +- 1.4.3 - `ValidationReport` returns typed `SetupIssue` with actionable guidance text. 
test: `crates/gcore/src/setup.rs::tests::runtime_validation_reports_setup_guidance`. +- 1.4.4 - Setup docs state that `gobby-core` does not create Gobby-owned schema in attached mode. file: `docs/guides/gcore-development-guide.md`. +- 1.4.5 - `ValidationContext` and `SetupContext` supply mutable references, allowing consumer validators to query and creators to execute against the supplied PostgreSQL connection. test: `crates/gcore/src/setup.rs::tests::validator_can_query_through_mutable_context`. +- 1.4.6 - A standalone creator can execute DDL through the mutable `SetupContext` without moving ownership from subsequent callbacks. test: `crates/gcore/src/setup.rs::tests::creator_executes_without_moving_ownership`. ## P2: Datastore Adapters `kind: framing` @@ -110,60 +653,377 @@ Add shared setup contracts that domain crates can implement without copying poli ### 2.1 Add PostgreSQL hub adapter [category: code] (depends: P1) `kind: deliverable` -Targets: `crates/gcore/src/postgres.rs`, `crates/gcore/src/config.rs`, `crates/gcore/src/degradation.rs` - -Provide shared PostgreSQL connection helpers and read-only config-store access: - -- `connect_readonly` and `connect_readwrite` wrappers with consistent error context. -- A typed config-store reader that parses service config values without mutating rows. -- Schema validation helpers that accept consumer-supplied validators. -- No built-in migrations, `CREATE TABLE`, `ALTER TABLE`, or `DROP TABLE` behavior for attached mode. - -Domain crates remain responsible for their own table names, indexes, and standalone creation callbacks. +Targets: `crates/gcore/src/postgres.rs` + +Provide shared PostgreSQL connection helpers and read-only config-store access behind the `postgres` feature gate. `gcode` currently has these at `crates/gcode/src/db.rs` (649 lines) with `resolve_database_url`, `connect_readonly`, `connect_readwrite`, and config-store reads. The shared module extracts the domain-independent parts. 
+ +**postgres.rs — connection helpers (feature = "postgres"):** + +```rust +use postgres::{Client, NoTls}; + +/// Connect to the PostgreSQL hub in read-only mode. +/// +/// Sets `default_transaction_read_only = on` to guard against accidental writes. +/// Connection errors carry the same context as `connect_readwrite`. +pub fn connect_readonly(database_url: &str) -> anyhow::Result<Client> { + let mut client = Client::connect(database_url, NoTls) + .map_err(|e| anyhow::anyhow!("PostgreSQL connection failed: {e}"))?; + client.execute("SET default_transaction_read_only = on", &[])?; + Ok(client) +} + +/// Connect to the PostgreSQL hub with write access. +pub fn connect_readwrite(database_url: &str) -> anyhow::Result<Client> { + Client::connect(database_url, NoTls) + .map_err(|e| anyhow::anyhow!("PostgreSQL connection failed: {e}")) +} + +/// Read a raw config value from the Gobby `config_store` table. +/// +/// Returns the raw stored value (may be JSON-encoded). Callers should +/// pipe the result through `config::decode_config_value` to unwrap +/// JSON string encoding, then through their value resolver for +/// `$secret:NAME` and `${VAR}` interpolation. +/// +/// Returns `None` for missing keys. Does not write. +pub fn read_config_value( + conn: &mut Client, + key: &str, +) -> anyhow::Result<Option<String>> { + let row = conn + .query_opt("SELECT value FROM config_store WHERE key = $1", &[&key])?; + Ok(row.map(|r| r.get(0))) +} + +/// Result of a single schema object check (table, index, column, etc.). +#[derive(Debug, Clone)] +pub struct SchemaCheck { + /// Object name (e.g. "symbols", "bm25_symbols_idx") + pub object_name: String, + /// What was checked (e.g. "table exists", "index exists", "column type") + pub check_kind: String, + /// Whether the check passed + pub passed: bool, + /// Detail on failure (e.g. "table 'symbols' not found") + pub detail: Option<String>, +} + +/// Consumer-supplied schema validator for attached-mode checks. +/// +/// The callback receives a mutable connection (required by +/// `postgres::Client::query`) and returns validation results. 
+/// `gobby-core` runs the callback; it does not know which tables to check. +pub fn validate_schema( + conn: &mut Client, + validator: impl FnOnce(&mut Client) -> Vec<SchemaCheck>, +) -> Vec<SchemaCheck> { + validator(conn) +} +``` + +Domain crates remain responsible for their own table names, indexes, and standalone creation callbacks. `gobby-core` supplies connection wrappers and config-store reads. **Acceptance:** - 2.1.1 - Read-only and read-write connection helpers share consistent error context. file: `crates/gcore/src/postgres.rs`. -- 2.1.2 - `config_store` reads are available without write helpers. file: `crates/gcore/src/config.rs`. -- 2.1.3 - Attached schema validation helpers reject migration callbacks. test: `crates/gcore/src/postgres.rs::tests::attached_validation_is_non_destructive`. +- 2.1.2 - `read_config_value` reads raw `config_store` values without write access; callers decode via `config::decode_config_value`. file: `crates/gcore/src/postgres.rs`. +- 2.1.3 - `validate_schema` accepts consumer-supplied validators and never runs its own migrations. test: `crates/gcore/src/postgres.rs::tests::attached_validation_is_non_destructive`. - 2.1.4 - Domain table names are supplied by consumers, not embedded in `gobby-core`. test: `crates/gcore/src/postgres.rs::tests::schema_validator_is_domain_supplied`. ### 2.2 Add FalkorDB adapter and query safety boundary [category: code] (depends: P1) `kind: deliverable` -Targets: `crates/gcore/src/falkor.rs`, `crates/gcore/src/degradation.rs` - -Provide a shared FalkorDB adapter that handles connection config, request execution, parameter escaping, and unavailable-service degradation. It must make safe query construction easy without owning graph semantics. - -The adapter may expose small typed helpers for escaped labels, relation names, property keys, and parameters. It must not define code graph APIs such as `CALLS`, `IMPORTS`, `DEFINES`, or wiki APIs such as `LINKS_TO`; those belong to consumer crates. 
+Targets: `crates/gcore/src/falkor.rs` + +Provide a shared FalkorDB adapter behind the `falkor` feature gate. `gcode` currently has `FalkorClient` at `crates/gcode/src/falkor.rs` (558 lines) with `from_config`, `query`, and `with_falkor` degradation. The shared adapter extracts the domain-independent connection, query execution, and escaping parts. + +The `degradation.rs` module is owned by §1.2; this task consumes `ServiceState` from it but does not modify the file. + +**falkor.rs — adapter (feature = "falkor"):** + +```rust +use std::collections::HashMap; +use falkordb::{FalkorClientBuilder, FalkorConnectionInfo, SyncGraph}; +use serde_json::Value; + +use crate::config::FalkorConfig; +use crate::degradation::ServiceState; + +pub type Row = HashMap; + +/// Blocking FalkorDB graph client. +/// +/// Owns a connection to a named graph. Domain crates supply Cypher queries; +/// this adapter handles connection lifecycle and result parsing. +pub struct GraphClient { + graph: SyncGraph, +} + +impl GraphClient { + /// `graph_name` is consumer-supplied — gcode passes "gobby_code", + /// gwiki passes its own graph name. The shared `FalkorConfig` holds + /// only connection-level config (host/port/password). + pub fn from_config(config: &FalkorConfig, graph_name: &str) -> anyhow::Result { + let password = config.password.as_deref().unwrap_or_default(); + let url = format!( + "falkor://:{}@{}:{}", + urlencoding::encode(password), + config.host, + config.port, + ); + let conn_info: FalkorConnectionInfo = url.as_str().try_into()?; + let client = FalkorClientBuilder::new() + .with_connection_info(conn_info) + .build()?; + Ok(Self { + graph: client.select_graph(graph_name), + }) + } + + pub fn query( + &mut self, + cypher: &str, + params: Option>, + ) -> anyhow::Result> { /* parse FalkorDB result into rows */ } +} + +/// Run a closure with a FalkorDB client, with typed degradation. 
+/// +/// Degradation contract: +/// - Config missing (`None`) → `Ok((default, ServiceState::NotConfigured))` +/// - Connection failure → `Ok((default, ServiceState::Unreachable{...}))` +/// - Closure returns `Ok(value)` → `Ok((value, ServiceState::Available))` +/// - Closure returns `Err(e)` → `Err(e)` (propagated — consumer decides +/// whether to degrade or fail) +/// +/// This gives consumers explicit control: optional search-boost paths can +/// `.unwrap_or((default, ServiceState::...))`, while hard graph commands +/// (e.g. `gcode callers`) can `?` the error to surface it. +/// `graph_name` is consumer-supplied, matching the `from_config` contract. +pub fn with_graph<T>( + config: Option<&FalkorConfig>, + graph_name: &str, + default: T, + f: impl FnOnce(&mut GraphClient) -> anyhow::Result<T>, +) -> anyhow::Result<(T, ServiceState)> { + let Some(cfg) = config else { + return Ok((default, ServiceState::NotConfigured)); + }; + let mut client = match GraphClient::from_config(cfg, graph_name) { + Ok(c) => c, + Err(e) => { + return Ok((default, ServiceState::Unreachable { + message: e.to_string(), + })); + } + }; + let value = f(&mut client)?; + Ok((value, ServiceState::Available)) +} + +// --- Escaping helpers (no domain labels) --- + +/// Escape a graph label for safe Cypher embedding. +pub fn escape_label(label: &str) -> String { /* backtick-wrap */ } + +/// Escape a relationship type for safe Cypher embedding. +pub fn escape_rel_type(rel: &str) -> String { /* backtick-wrap */ } + +/// Escape a property key for safe Cypher embedding. +pub fn escape_property(key: &str) -> String { /* backtick-wrap */ } + +/// Escape a string parameter value for Cypher. +pub fn escape_string(value: &str) -> String { /* single-quote, escape internal quotes */ } +``` + +The adapter has no hardcoded code labels (`CodeSymbol`, `CALLS`, `IMPORTS`) or wiki labels (`WikiDoc`, `LINKS_TO`). Domain crates build Cypher with their own labels and call `GraphClient::query`. 
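
The escaping helpers above are stubbed with `/* backtick-wrap */`. A minimal sketch of one possible body follows. It assumes FalkorDB accepts the openCypher convention of doubling an embedded backtick inside a backtick-quoted identifier and backslash escapes inside single-quoted string literals; both assumptions should be verified against the target FalkorDB version before adoption. The private `backtick_wrap` helper is hypothetical and not part of the planned API.

```rust
/// Hypothetical shared helper: wrap an identifier in backticks and
/// double any embedded backtick (assumed openCypher convention).
fn backtick_wrap(ident: &str) -> String {
    format!("`{}`", ident.replace('`', "``"))
}

/// Escape a graph label for Cypher embedding.
pub fn escape_label(label: &str) -> String {
    backtick_wrap(label)
}

/// Escape a relationship type for Cypher embedding.
pub fn escape_rel_type(rel: &str) -> String {
    backtick_wrap(rel)
}

/// Escape a property key for Cypher embedding.
pub fn escape_property(key: &str) -> String {
    backtick_wrap(key)
}

/// Single-quote a string value. Backslashes are escaped first so the
/// backslash added in front of each quote is not escaped a second time.
pub fn escape_string(value: &str) -> String {
    let escaped = value.replace('\\', "\\\\").replace('\'', "\\'");
    format!("'{escaped}'")
}
```

Where the driver supports query parameters, passing values through `params` is generally preferable to string escaping; the helpers are meant for tokens such as labels and relationship types that cannot be parameterized.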
**Acceptance:** -- 2.2.1 - FalkorDB config and connection errors map to typed degradation states. file: `crates/gcore/src/falkor.rs`. -- 2.2.2 - Query escaping helpers cover labels, relation names, property keys, and string parameters. test: `crates/gcore/src/falkor.rs::tests::escapes_graph_tokens`. -- 2.2.3 - The adapter has no hardcoded code or wiki labels. test: `crates/gcore/src/falkor.rs::tests::no_domain_labels_in_adapter`. -- 2.2.4 - Consumers can distinguish unavailable graph service from a successful empty graph result. test: `crates/gcore/src/degradation.rs::tests::graph_unavailable_is_not_empty_success`. +- 2.2.1 - `with_graph` accepts a consumer-supplied `graph_name` parameter and returns `Ok((default, ServiceState::NotConfigured))` when config is `None`, `Ok((default, ServiceState::Unreachable))` on connection failure, `Ok((value, ServiceState::Available))` on success, and propagates `Err` from the closure. test: `crates/gcore/src/falkor.rs::tests::with_graph_degradation_contract`. +- 2.2.2 - Escaping helpers cover labels, relation types, property keys, and string parameters. test: `crates/gcore/src/falkor.rs::tests::escapes_graph_tokens`. +- 2.2.3 - The adapter source contains no code-graph or wiki-graph label strings. test: `crates/gcore/src/falkor.rs::tests::no_domain_labels_in_adapter`. +- 2.2.4 - `with_graph` distinguishes `ServiceState::NotConfigured`, `ServiceState::Unreachable`, and `ServiceState::Available` — consumers can differentiate unavailable-service from successful-empty-result. test: `crates/gcore/src/falkor.rs::tests::graph_unavailable_is_not_empty_success`. +- 2.2.5 - `GraphClient::from_config` accepts a consumer-supplied `graph_name` parameter; `gobby-core` contains no hardcoded `"gobby_code"` or wiki graph name defaults. test: `crates/gcore/src/falkor.rs::tests::graph_name_is_consumer_supplied`. 
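
The four outcomes named in 2.2.1 and 2.2.4 can be exercised without a running FalkorDB instance. The sketch below is a dependency-free model of the contract, not the adapter itself: `ServiceState` is redeclared locally with only the three variants this section mentions, `Config` and `Client` are hypothetical stand-ins for `FalkorConfig` and `GraphClient`, the `connect` parameter is an added injection point, and `String` replaces `anyhow::Error` so the block compiles with the standard library alone.

```rust
/// Local stand-in for `crate::degradation::ServiceState` (assumed shape).
#[derive(Debug, PartialEq)]
pub enum ServiceState {
    Available,
    NotConfigured,
    Unreachable { message: String },
}

/// Hypothetical stand-in for `FalkorConfig`.
pub struct Config {
    pub host: String,
}

/// Hypothetical stand-in for `GraphClient`.
pub struct Client;

/// Model of the `with_graph` contract with an injectable connector.
pub fn with_service<T>(
    config: Option<&Config>,
    connect: impl FnOnce(&Config) -> Result<Client, String>,
    default: T,
    f: impl FnOnce(&mut Client) -> Result<T, String>,
) -> Result<(T, ServiceState), String> {
    // Missing config degrades to the default value.
    let Some(cfg) = config else {
        return Ok((default, ServiceState::NotConfigured));
    };
    // Connection failure also degrades, carrying the error message.
    let mut client = match connect(cfg) {
        Ok(c) => c,
        Err(message) => {
            return Ok((default, ServiceState::Unreachable { message }));
        }
    };
    // Closure errors propagate; the caller decides whether to degrade.
    let value = f(&mut client)?;
    Ok((value, ServiceState::Available))
}
```

Because the state travels next to the value, a caller can tell an empty result from a reachable graph (`Available`) apart from a default returned because the service was missing or down.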
### 2.3 Add Qdrant and embedding configuration adapter [category: code] (depends: P1) `kind: deliverable` -Targets: `crates/gcore/src/qdrant.rs`, `crates/gcore/src/config.rs`, `crates/gcore/src/degradation.rs` - -Provide shared Qdrant and embedding configuration primitives: - -- Resolve Qdrant endpoint and embedding endpoint/API key with env/config-store/default precedence. -- Build collection names from consumer-supplied namespaces. -- Provide vector upsert/search request types that carry opaque payload maps. -- Represent missing embeddings or Qdrant as optional-service degradation. - -`gobby-core` should not choose model names for domains, embed text itself unless the consumer passes a configured embedding client, or define code/wiki payload schemas. +Targets: `crates/gcore/src/qdrant.rs` + +Provide shared Qdrant and embedding configuration primitives behind the `qdrant` feature gate. `gcode` currently resolves these configs inline in `crates/gcode/src/config.rs` and uses them in `crates/gcode/src/search/semantic.rs`. + +The `degradation.rs` module is owned by §1.2; this task consumes `ServiceState` from it but does not modify the file. + +**Runtime contract:** The adapter uses `reqwest::blocking` for HTTP calls to Qdrant's REST API, matching `gcode`'s existing approach (`crates/gcode/src/search/semantic.rs` lines 22, 76). All functions are synchronous. No Tokio runtime is required. This keeps CLI consumers simple and avoids async/sync boundary confusion. The `qdrant` Cargo feature pulls in `reqwest` (with `blocking` + `json` features), not `qdrant-client` or `tokio`. + +**Collection naming and `gcode` migration:** Current `gcode` uses `code_symbols_` via `collection_prefix + project_id` concatenation. The shared adapter introduces `CollectionScope` for caller-scoped naming. `gcode` uses `CollectionScope::Custom("code_symbols_")` to preserve its existing collection names — no migration required. 
New consumers like `gwiki` use `CollectionScope::Project` or `CollectionScope::Topic` for canonical scoped naming. + +**qdrant.rs — adapter (feature = "qdrant"):** + +```rust +use crate::config::QdrantConfig; +use crate::degradation::ServiceState; +use serde_json::Value; + +/// Scope for a Qdrant collection, allowing caller-controlled naming. +pub enum CollectionScope<'a> { + /// `{namespace}:project:{id}` — per-project vector store. + Project(&'a str), + /// `{namespace}:topic:{name}` — topic-scoped store (e.g. gwiki topics). + Topic(&'a str), + /// Verbatim collection name — returns the supplied name as-is, + /// without namespace prefixing. Preserves existing collections + /// (e.g. gcode's `code_symbols_`). The `namespace` + /// parameter is unused for this variant. + Custom(&'a str), +} + +/// Build a collection name from namespace and scope. +/// +/// Examples: +/// collection_name("gwiki", CollectionScope::Project("abc-123")) +/// → "gwiki:project:abc-123" +/// collection_name("gwiki", CollectionScope::Topic("rust-async")) +/// → "gwiki:topic:rust-async" +/// collection_name("gcode", CollectionScope::Custom("code_symbols_abc-123")) +/// → "code_symbols_abc-123" +pub fn collection_name(namespace: &str, scope: CollectionScope<'_>) -> String { + match scope { + CollectionScope::Project(id) => format!("{namespace}:project:{id}"), + CollectionScope::Topic(name) => format!("{namespace}:topic:{name}"), + CollectionScope::Custom(name) => name.to_string(), + } +} + +/// Vector upsert request with opaque domain payload. +pub struct UpsertRequest { + pub id: String, + pub vector: Vec, + /// Domain-specific payload (code symbols, wiki docs, etc.) + pub payload: serde_json::Map, +} + +/// Vector search request with opaque domain filter. +pub struct SearchRequest { + pub vector: Vec, + pub limit: usize, + /// Domain-specific filter conditions + pub filter: Option, +} + +/// Vector search result with score and opaque payload. 
+pub struct SearchHit { + pub id: String, + pub score: f32, + pub payload: serde_json::Map<String, Value>, +} + +/// Run a closure with Qdrant config, with typed degradation. +/// +/// Degradation contract (mirrors `with_graph` from §2.2): +/// - Config missing (`None`) → `Ok((default, ServiceState::NotConfigured))` +/// - Config present, `url` is `None` → `Ok((default, ServiceState::NotConfigured))` +/// - Closure returns `Ok(value)` → `Ok((value, ServiceState::Available))` +/// - Closure returns `Err(e)` → `Err(e)` (propagated — consumer decides) +/// +/// This is the primary entry point for consumers. It bridges the gap +/// between `CoreContext.qdrant: Option<QdrantConfig>` and the search/upsert +/// functions that require `&QdrantConfig`, providing honest typed +/// degradation for the missing-config path. +pub fn with_qdrant<T>( + config: Option<&QdrantConfig>, + default: T, + f: impl FnOnce(&QdrantConfig) -> anyhow::Result<T>, +) -> anyhow::Result<(T, ServiceState)> { + let Some(cfg) = config else { + return Ok((default, ServiceState::NotConfigured)); + }; + if cfg.url.is_none() { + return Ok((default, ServiceState::NotConfigured)); + } + let value = f(cfg)?; + Ok((value, ServiceState::Available)) +} + +/// Execute a vector search via Qdrant REST API (synchronous). +/// +/// Uses `reqwest::blocking::Client` to POST to `/collections/{collection}/points/search`. +/// Returns only search results — `ServiceState` is NOT part of this +/// function's return type. `with_qdrant` is the single degradation +/// boundary for Qdrant operations: +/// +/// ```rust,ignore +/// // Typical composition — with_qdrant handles config-missing states, +/// // search handles the HTTP call, Err propagates to consumer. +/// with_qdrant(ctx.qdrant.as_ref(), vec![], |cfg| { +/// search(cfg, &collection, request) +/// }) +/// ``` +/// +/// Both connection failure (`reqwest` send error) and non-success HTTP +/// status produce `Err`. 
Consumers that want degradation instead of +/// propagation catch the outer `Err`: +/// +/// ```rust,ignore +/// match with_qdrant(ctx.qdrant.as_ref(), vec![], |cfg| search(cfg, coll, req)) { +/// Ok((hits, state)) => (hits, vec![], state), +/// Err(e) => (vec![], vec![DegradationKind::ServiceUnavailable { +/// service: "qdrant".into(), +/// state: ServiceState::Unreachable { message: e.to_string() }, +/// }], ServiceState::Unreachable { message: e.to_string() }), +/// } +/// ``` +pub fn search( + config: &QdrantConfig, + collection: &str, + request: SearchRequest, +) -> anyhow::Result<Vec<SearchHit>> { + let url = config.url.as_deref() + .ok_or_else(|| anyhow::anyhow!("Qdrant URL not configured"))?; + let client = reqwest::blocking::Client::builder() + .timeout(std::time::Duration::from_secs(5)) + .build()?; + let mut req = client.post(format!( + "{url}/collections/{collection}/points/search" + )); + if let Some(key) = &config.api_key { + req = req.header("api-key", key); + } + let body = serde_json::json!({ + "vector": request.vector, + "limit": request.limit, + "filter": request.filter, + "with_payload": true, + }); + let resp = req.json(&body).send()?; + if !resp.status().is_success() { + anyhow::bail!("Qdrant search failed: HTTP {}", resp.status()); + } + // Parse response into SearchHit vec + // (implementation parses Qdrant JSON response format) + Ok(/* parsed hits */) +} + +/// Execute a batch vector upsert via Qdrant REST API (synchronous). +pub fn upsert( + config: &QdrantConfig, + collection: &str, + points: Vec<UpsertRequest>, +) -> anyhow::Result<()> { /* PUT /collections/{collection}/points */ } +``` + +`gobby-core` does not choose model names, embed text, or define code/wiki payload schemas. Embedding is performed by the consumer or a configured embedding API; the adapter handles only the Qdrant client surface. **Acceptance:** -- 2.3.1 - Qdrant and embedding configs follow the shared resolution order.
test: `crates/gcore/src/config.rs::tests::qdrant_and_embedding_resolution_order`. -- 2.3.2 - Collection names require a caller-supplied namespace prefix. file: `crates/gcore/src/qdrant.rs`. -- 2.3.3 - Vector upsert/search contracts accept domain payloads without knowing their schema. test: `crates/gcore/src/qdrant.rs::tests::payload_schema_is_opaque`. -- 2.3.4 - Missing embeddings or Qdrant surfaces typed degradation. test: `crates/gcore/src/degradation.rs::tests::vector_services_degrade_explicitly`. +- 2.3.1 - Qdrant and embedding configs follow the shared env → `config_store` → default resolution order. test: `crates/gcore/src/config.rs::tests::qdrant_and_embedding_resolution_order`. +- 2.3.2 - `collection_name` accepts `CollectionScope::Project`, `CollectionScope::Topic`, and `CollectionScope::Custom` for caller-controlled naming. test: `crates/gcore/src/qdrant.rs::tests::collection_name_covers_all_scopes`. +- 2.3.3 - `UpsertRequest` and `SearchRequest` accept domain payloads without knowing their schema. test: `crates/gcore/src/qdrant.rs::tests::payload_schema_is_opaque`. +- 2.3.4 - `with_qdrant` returns `Ok((default, ServiceState::NotConfigured))` when config is `None` or `url` is `None`, `Ok((value, ServiceState::Available))` on success, and propagates `Err` from the closure. test: `crates/gcore/src/qdrant.rs::tests::with_qdrant_degradation_contract`. +- 2.3.5 - All Qdrant adapter functions are synchronous (`reqwest::blocking`); no Tokio runtime is required. test: `crates/gcore/src/qdrant.rs::tests::sync_search_from_cli_path`. +- 2.3.5a - `with_qdrant(config, vec![], |cfg| search(cfg, coll, req))` composes without nested `ServiceState` — the closure returns `Vec<SearchHit>`, `with_qdrant` maps it to `(Vec<SearchHit>, ServiceState)`. test: `crates/gcore/src/qdrant.rs::tests::with_qdrant_search_composition`. +- 2.3.6 - `CollectionScope::Custom` returns the supplied name verbatim without namespace prefixing, preserving existing gcode collection names (`code_symbols_`).
test: `crates/gcore/src/qdrant.rs::tests::custom_scope_returns_verbatim_name`. +- 2.3.7 - `with_qdrant` is the single `ServiceState` boundary for Qdrant operations. `search` returns `anyhow::Result<Vec<SearchHit>>` with no `ServiceState` — both connection failure and HTTP non-success produce `Err`. Composing `with_qdrant(config, vec![], |cfg| search(cfg, coll, req))` produces three distinguishable outcomes: config missing (`None` or `url: None`) → `Ok((vec![], ServiceState::NotConfigured))`, successful search → `Ok((hits, ServiceState::Available))`, search error → `Err(e)` (consumer decides whether to degrade or propagate). Embedding config absence is consumer-owned — consumers check `Option<&EmbeddingConfig>` before generating a query vector and report missing embedding via `DegradationKind::ServiceUnavailable { service: "embedding", state: ServiceState::NotConfigured }` without entering the Qdrant adapter. test: `crates/gcore/src/qdrant.rs::tests::qdrant_single_state_boundary`. ## P3: Generic Indexing And Search Primitives `kind: framing` @@ -175,63 +1035,162 @@ Provide shared Qdrant and embedding configuration primitives: Targets: `crates/gcore/src/indexing.rs`, `crates/gcore/src/lib.rs` -Extract or define generic primitives for file discovery, content hashing, chunk identity, and index event flow: - -- Filesystem walker settings and ignore rules that consumers can extend. -- SHA-256 content hashing. -- Chunk records with byte ranges, heading/context metadata, and opaque domain payload. -- Index events for added, changed, unchanged, deleted, and skipped artifacts. - -Language parsing, markdown parsing, symbol extraction, wiki link extraction, and domain write models stay in `gobby-code` or `gobby-wiki`. +Extract or define generic primitives for file discovery, content hashing, chunk identity, and index event flow behind the `indexing` feature gate.
`gcode` currently implements these in `crates/gcode/src/index/` (walker, hasher, chunker, indexer) with code-specific parsing baked in. The shared module extracts the domain-independent parts. + +**indexing.rs — generic primitives (feature = "indexing"):** + +```rust +use std::path::PathBuf; +use ignore::WalkBuilder; +use sha2::{Sha256, Digest}; + +/// Walker configuration that consumers can extend with domain-specific rules. +pub struct WalkerSettings { + pub root: PathBuf, + pub respect_gitignore: bool, + pub max_filesize: Option<u64>, + /// Extra ignore patterns (e.g. "*.pyc", "node_modules/") + pub extra_ignores: Vec<String>, +} + +impl WalkerSettings { + /// Build an `ignore::WalkBuilder` from these settings. + pub fn into_walker(self) -> WalkBuilder { /* ... */ } +} + +/// SHA-256 content hash for incremental indexing. +pub fn content_hash(data: &[u8]) -> String { + let mut hasher = Sha256::new(); + hasher.update(data); + format!("{:x}", hasher.finalize()) +} + +/// A content chunk with byte range and opaque domain metadata. +pub struct Chunk { + pub file_path: PathBuf, + pub byte_start: usize, + pub byte_end: usize, + pub heading: Option<String>, + /// Opaque domain payload (symbol refs, wiki links, etc.) + pub metadata: serde_json::Value, +} + +/// Index lifecycle events for incremental indexing. +pub enum IndexEvent { + Added(PathBuf), + Changed(PathBuf), + Unchanged(PathBuf), + Deleted(PathBuf), + Skipped { path: PathBuf, reason: String }, +} +``` + +Language parsing, markdown parsing, symbol extraction, wiki link extraction, and domain write models stay in `gobby-code` or `gobby-wiki`. The generic module provides only the file discovery, hashing, chunking structure, and event vocabulary. **Acceptance:** -- 3.1.1 - Generic walker settings support consumer extension without code/wiki-specific defaults. file: `crates/gcore/src/indexing.rs`. -- 3.1.2 - Hashing and chunk records are reusable with opaque domain metadata.
test: `crates/gcore/src/indexing.rs::tests::chunk_metadata_is_opaque`. -- 3.1.3 - Index events distinguish unchanged, changed, deleted, and skipped artifacts. test: `crates/gcore/src/indexing.rs::tests::index_events_cover_incremental_cases`. -- 3.1.4 - `gobby-core` does not import tree-sitter language grammars. test: `crates/gcore/src/indexing.rs::tests::no_domain_parser_dependency`. +- 3.1.1 - `WalkerSettings` supports consumer extension without code/wiki-specific defaults. file: `crates/gcore/src/indexing.rs`. +- 3.1.2 - `Chunk` carries opaque `serde_json::Value` metadata without domain-specific fields. test: `crates/gcore/src/indexing.rs::tests::chunk_metadata_is_opaque`. +- 3.1.3 - `IndexEvent` distinguishes `Added`, `Changed`, `Unchanged`, `Deleted`, and `Skipped` with reason. test: `crates/gcore/src/indexing.rs::tests::index_events_cover_incremental_cases`. +- 3.1.4 - The indexing module does not depend on tree-sitter or any language grammar crate. test: `crates/gcore/src/indexing.rs::tests::no_domain_parser_dependency`. ### 3.2 Add generic search fusion primitives [category: code] (depends: P2) `kind: deliverable` -Targets: `crates/gcore/src/search.rs`, `crates/gcore/src/degradation.rs` - -Provide reusable search result and fusion primitives: - -- Common result identity, score, source, and explanation fields. -- RRF fusion over BM25, semantic, and graph-ranked result lists. -- Degradation metadata when one source is unavailable. -- Stable JSON-serializable structs that domain CLIs can wrap. - -PostgreSQL query SQL, Qdrant payload filters, graph boost semantics, and user-facing output remain consumer-specific. - -**Acceptance:** - -- 3.2.1 - RRF fusion is available from `gobby_core::search`. file: `crates/gcore/src/search.rs`. -- 3.2.2 - Fusion preserves source explanations and unavailable-source degradation. test: `crates/gcore/src/search.rs::tests::rrf_preserves_explanations_and_degradation`. 
-- 3.2.3 - Result structs are serializable without CLI formatting types. test: `crates/gcore/src/search.rs::tests::search_result_is_cli_independent`. -- 3.2.4 - Domain-specific SQL, graph labels, and payload filters are absent from the shared search module. test: `crates/gcore/src/search.rs::tests::search_core_has_no_domain_queries`. - -### 3.3 Add shared error and degradation contracts [category: code] (depends: 3.1, 3.2) -`kind: deliverable` - -Targets: `crates/gcore/src/degradation.rs`, `docs/guides/gcore-development-guide.md` - -Define shared error and degradation contracts used by datastore adapters, indexing, and search: - -- Fatal errors for corrupted inputs, invalid config, unavailable required services, and write failures. -- Degradation states for unavailable optional services, partial search sources, stale indexes, and skipped artifacts. -- Caller-facing guidance text that command crates can render without parsing error strings. -- JSON-friendly representations for machine consumers. - -Consumers decide which services are required for each command. `gobby-core` supplies the vocabulary and serialization. +Targets: `crates/gcore/src/search.rs` + +Provide reusable search result and fusion primitives behind the `search` feature gate. `gcode` currently has RRF fusion at `crates/gcode/src/search/rrf.rs` (133 lines) as a pure function operating on string IDs. The shared version generalizes this to work for any consumer. + +The search fusion layer is degradation-agnostic. Adapters (§2.2, §2.3) own `ServiceState` boundaries; consumers build `SearchDegradation` from their adapter results. The fusion module operates on ranked ID lists and does not import or carry degradation types. + +**search.rs — generic fusion (feature = "search"):** + +```rust +use std::collections::HashMap; +use serde::{Serialize, Deserialize}; + +/// RRF constant — matches Python RRF_K in code_index/searcher.py. 
+const RRF_K: f64 = 60.0; + +/// A search result from any source, with opaque identity and metadata. +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct SearchResult { + /// Opaque identifier (symbol UUID, doc UUID, chunk ID, etc.) + pub id: String, + /// Combined score after fusion + pub score: f64, + /// Which sources contributed this result + pub sources: Vec<String>, + /// Source-level explanations for debugging + #[serde(skip_serializing_if = "Vec::is_empty")] + pub explanations: Vec<SourceExplanation>, +} + +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct SourceExplanation { + pub source: String, + pub rank: usize, + pub score: f64, +} + +/// Degradation metadata for a search that had unavailable sources. +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct SearchDegradation { + pub unavailable_sources: Vec<String>, + pub available_sources: Vec<String>, +} + +/// Merge multiple ranked lists using Reciprocal Rank Fusion. +/// +/// Each source is a `(name, ranked_ids)` pair where index 0 = most relevant. +/// Returns results sorted by combined RRF score descending. +pub fn rrf_merge( + sources: Vec<(&str, Vec<String>)>, +) -> Vec<SearchResult> { + let mut entries: HashMap<String, Vec<SourceExplanation>> = HashMap::new(); + for (source_name, ids) in &sources { + // Deduplicate IDs within this source, keeping the best (lowest) rank + let mut best_rank: HashMap<&String, usize> = HashMap::new(); + for (rank, id) in ids.iter().enumerate() { + best_rank.entry(id).and_modify(|r| *r = (*r).min(rank)).or_insert(rank); + } + for (id, rank) in best_rank { + let score = 1.0 / (RRF_K + rank as f64); + entries.entry(id.clone()).or_default().push(SourceExplanation { + source: source_name.to_string(), + rank, + score, + }); + } + } + let mut results: Vec<SearchResult> = entries + .into_iter() + .map(|(id, mut explanations)| { + let score = explanations.iter().map(|e| e.score).sum(); + // Sort explanations by source name for deterministic output, + // matching current gcode's source_names.sort() behavior.
+ explanations.sort_by(|a, b| a.source.cmp(&b.source)); + let sources = explanations.iter().map(|e| e.source.clone()).collect(); + SearchResult { id, score, sources, explanations } + }) + .collect(); + results.sort_by(|a, b| b.score.partial_cmp(&a.score) + .unwrap_or(std::cmp::Ordering::Equal) + .then_with(|| a.id.cmp(&b.id))); + results +} +``` + +PostgreSQL query SQL, Qdrant payload filters, graph boost semantics, and user-facing output remain consumer-specific. The shared module provides the fusion algorithm and result types. **Acceptance:** -- 3.3.1 - Shared error and degradation enums are serializable and documented. file: `crates/gcore/src/degradation.rs`. -- 3.3.2 - Optional-service degradation is distinct from fatal command failure. test: `crates/gcore/src/degradation.rs::tests::optional_service_degradation_is_not_fatal`. -- 3.3.3 - Guidance text is structured, not string-parsed by callers. test: `crates/gcore/src/degradation.rs::tests::guidance_is_structured`. -- 3.3.4 - Development guide documents how `gobby-code` and `gobby-wiki` consume the shared contracts. file: `docs/guides/gcore-development-guide.md`. +- 3.2.1 - `rrf_merge` is available from `gobby_core::search` behind the `search` feature. file: `crates/gcore/src/search.rs`. +- 3.2.2 - `SearchResult` preserves per-source explanations and `SearchDegradation` tracks unavailable sources. test: `crates/gcore/src/search.rs::tests::rrf_preserves_explanations_and_degradation`. +- 3.2.3 - `SearchResult` is `Serialize + Deserialize` without CLI formatting types. test: `crates/gcore/src/search.rs::tests::search_result_is_cli_independent`. +- 3.2.4 - The search module source contains no domain-specific SQL, graph labels, or payload filter code. test: `crates/gcore/src/search.rs::tests::search_core_has_no_domain_queries`. +- 3.2.5 - `rrf_merge` deduplicates IDs within a single source, keeping the best rank; a source returning `["a", "a"]` contributes one RRF score and one `SourceExplanation` for `"a"`. 
test: `crates/gcore/src/search.rs::tests::rrf_deduplicates_within_source`. +- 3.2.6 - `rrf_merge` sorts `sources` and `explanations` within each `SearchResult` by source name, producing deterministic output matching current `gcode`'s `source_names.sort()` behavior. A two-source overlap case (e.g. `("semantic", ["b"]), ("fts", ["b"])`) produces `sources: ["fts", "semantic"]` in alphabetical order. test: `crates/gcore/src/search.rs::tests::rrf_sorts_sources_deterministically`. ## VS1: Verification `kind: verification` @@ -242,6 +1201,13 @@ Validation for this plan: - `cargo build -p gobby-core --no-default-features` - `cargo test -p gobby-core --no-default-features` - `cargo clippy -p gobby-core --no-default-features -- -D warnings` +- `cargo build -p gobby-core --features postgres` +- `cargo build -p gobby-core --features falkor` +- `cargo build -p gobby-core --features qdrant` +- `cargo build -p gobby-core --features indexing` +- `cargo build -p gobby-core --all-features` +- `cargo test -p gobby-core --all-features` +- `cargo clippy -p gobby-core --all-features -- -D warnings` Integration validation after dependent plans land: @@ -252,7 +1218,8 @@ Integration validation after dependent plans land: ## AC1: Acceptance Criteria `kind: verification` -- `gobby-core` exposes shared context/config, setup, datastore, indexing, search, and degradation primitives. +- `gobby-core` exposes shared context/config, setup, datastore, indexing, search, and degradation primitives behind feature gates. +- Baseline `gobby-core` (no features) stays dependency-light and builds without datastore crates. - Attached mode remains non-destructive to Gobby-owned schema, files, and `config_store`. - Standalone setup is explicit and scoped to consumer-owned resources. - `gobby-code` keeps code graph and code-index domain APIs. 
@@ -263,3 +1230,178 @@ Integration validation after dependent plans land: `kind: verification` - **R1 (2026-05-26)**: Created the foundation plan for shared Rust substrate work. Scoped shared primitives to `gobby-core`; kept code graph behavior in `gobby-code` and wiki vault behavior in `gobby-wiki`; defined attached/standalone setup, datastore adapters, generic indexing/search primitives, and shared degradation contracts. +- **R2 (2026-05-26)**: Added Cargo feature-gate strategy to constraints and task 1.1; concrete code examples (struct/trait/function signatures) to all deliverable sections; acknowledged existing `bootstrap.rs`/`project.rs` modules in task 1.2; added `Cargo.toml` as target for 1.1; added `--all-features` build/test to verification; clarified config-store resolution requires `postgres` feature. +- **R3 (2026-05-26)**: Addressed R2 adversary findings F1–F4. (F1) Fixed `with_graph` return type to `anyhow::Result<(T, ServiceState)>` with explicit four-state degradation contract; updated acceptance 2.2.1/2.2.4. (F2) Replaced `collection_name(namespace, id)` with `CollectionScope` enum supporting `Project`, `Topic`, and `Custom` scopes; documented gcode's legacy `code_symbols_` preservation via `Custom`; added acceptance 2.3.2 covering all scopes. (F3) Replaced `qdrant-client` + `tokio` with `reqwest::blocking` matching gcode's existing sync HTTP pattern; documented runtime contract; added acceptance 2.3.5 for sync CLI path. (F4) Added concrete definitions for `ValidationContext`, `ValidationReport`, `SetupContext`, `SetupReport`, `SetupError`, `OwnedObject` in §1.3 and `SchemaCheck` in §2.1. Swept: updated §1.1 Cargo.toml features; fixed §3.3 dependency from `(depends: 3.1, 3.2)` to `(depends: 1.1)` since degradation.rs is always-available and consumed by P1/P2 tasks. +- **R4 (2026-05-26)**: Addressed R3 adversary findings F1–F4. 
(F1) Moved degradation contract from §3.3/P3 to §1.2/P1 so it precedes all consumers (§1.4, §2.2, §2.3, §3.2); renumbered §1.2→§1.3, §1.3→§1.4; removed `degradation.rs` from §2.2, §2.3, §3.2 targets to prevent multi-agent edits to the same file; §1.4 now depends on both §1.2 and §1.3. (F2) Changed `ValidationContext`/`SetupContext` to use `&'a mut postgres::Client` mutable borrows instead of owned `postgres::Client`; changed validators to `FnMut(&mut ValidationContext<'_>)` and creators to `FnMut(&mut SetupContext<'_>)`; added acceptance items 1.4.5/1.4.6 proving mutable query and DDL execution. (F3) Added `dep:urlencoding` to `falkor` feature in Cargo.toml; added acceptance 1.1.5 for per-feature isolation builds. (F4) Added `decode_config_value` (JSON string unwrapping), `resolve_env_pattern` (`${VAR}`/`${VAR:-default}`), and `ValueResolver` callback type to §1.3; resolution functions accept consumer-supplied resolver for `$secret:NAME` interpolation; documented config value pipeline; updated `read_config_value` docs to reference decode step; added acceptance items 1.3.5/1.3.6/1.3.7. Swept all deliverables for same finding classes: verified no other shared type definitions are consumed before being defined; verified all targets correctly reflect file ownership. +- **R6 (2026-05-26)**: Addressed R5 adversary findings F1–F2. (F1) Changed `falkordb` dependency from `"0.3"` (not published on crates.io) to `"0.2"` matching the workspace version at `crates/gcode/Cargo.toml:36`. (F2) Updated `decode_config_value` to match gcode's actual behavior: JSON arrays/objects are re-serialized as JSON strings via `serde_json::to_string` instead of returning `None`; JSON null returns `None`; updated acceptance 1.3.5 to assert array/object preservation. Swept: verified all planned dependency versions against workspace Cargo.toml files — no other version mismatches found; verified no other plan sections reference the old arrays-return-None semantics. 
+- **R7 (2026-05-26)**: Addressed R6 adversary findings F1–F3. (F1) Updated `PostgresConfigSource` example in §1.3 to pipe raw `read_config_value` through `decode_config_value` before returning; strengthened `ConfigSource.config_value` trait doc to mandate decode step; added acceptance 1.3.11 for end-to-end proof that `resolve_*_config` functions handle JSON-encoded config-store values correctly. (F2) Added `with_qdrant` wrapper to §2.3 accepting `Option<&QdrantConfig>` with four-state degradation contract matching `with_graph` pattern; bridges the gap between `CoreContext.qdrant: Option<QdrantConfig>` and `search`/`upsert` functions; updated acceptance 2.3.4 and added 2.3.7 covering missing config, unreachable, and empty-result degradation paths. (F3) Added `cargo test -p gobby-core --no-default-features` and `cargo clippy -p gobby-core --no-default-features -- -D warnings` to VS1; added acceptance 1.1.6 for baseline build/test/clippy matching CI's `AGENTS.md` requirement. Swept: no other raw `read_config_value` pass-throughs in plan; no other `Option` adapter patterns missing degradation wrappers; `--no-default-features` now covered in both §1.1 acceptance and VS1. +- **R8 (2026-05-26)**: Addressed R7 adversary findings F1–F5. (F1) Added `Serialize, Deserialize` derives to `CoreError` matching acceptance 1.2.1's contract; added acceptance 1.2.5 for serialization round-trip test. (F2) Removed "missing embedding config" from §2.3 acceptance 2.3.7 — embedding config absence is consumer-owned (checked before generating query vector, reported via `DegradationKind::ServiceUnavailable { service: "embedding", ... }`); Qdrant adapter's three states (`NotConfigured`, `Unreachable`, `Available`) are distinguishable. (F3) Removed `crates/gcore/src/config.rs` from §2.1 targets (only implements `postgres.rs`); sweep: also removed from §2.3 targets (only implements `qdrant.rs`); all P2 tasks now own disjoint files.
(F4) Added per-source ID deduplication to `rrf_merge` keeping best rank per `(source, id)` pair; added acceptance 3.2.5 for duplicate-ID-within-source behavior. (F5) Changed `ConfigSource` doc example fence to `rust,ignore` with feature-gate note — references consumer-only `crate::secrets` and feature-gated `postgres::Client`; no other un-tagged doctest fences in plan. Swept: all `#[derive]` annotations match acceptance Serialize+Deserialize claims; all P2 target lists are disjoint; no other doctest-compilability issues. +- **R5 (2026-05-26)**: Addressed R4 adversary findings F1–F4. (F1) Replaced `ValueResolver` callback with `ConfigSource` trait that owns its datastore connection via `&mut self`, eliminating the borrow conflict between `&mut postgres::Client` and the resolver closure; added `EnvOnlySource` for no-database baseline; removed `#[cfg(feature = "postgres")]` from resolution functions since `ConfigSource` abstracts the connection; showed `PostgresConfigSource` implementation example matching gcode's existing `FalkorConfigSource` pattern; added acceptance 1.3.7/1.3.10. (F2) Changed `CollectionScope::Custom` to return the supplied name verbatim without namespace prefix, so `collection_name("gcode", Custom("code_symbols_abc"))` returns `"code_symbols_abc"` preserving existing gcode collections without migration; added acceptance 2.3.6. (F3) Changed `GOBBY_EMBEDDING_API_BASE` to `GOBBY_EMBEDDING_URL` matching existing gcode env var at `crates/gcode/src/config.rs:534`; added acceptance 1.3.9. (F4) Added concrete `CoreContext::build` method with explicit parameter contract: DSN resolution is consumer-owned (documented gcode/gwiki fallback chains), project identity uses existing `project.rs` helpers, service configs resolve through `ConfigSource`, daemon URL from `daemon_url::daemon_url()`; added acceptance 1.3.8. 
Swept: verified all env var names in plan match codebase (GOBBY_FALKORDB_HOST/PORT/PASSWORD, GOBBY_QDRANT_URL, GOBBY_EMBEDDING_URL, GOBBY_EMBEDDING_MODEL, GOBBY_EMBEDDING_API_KEY); verified no other resolution functions have borrow-conflict patterns. +- **R9 (2026-05-27)**: Addressed R8 adversary findings F1–F2. (F1) Removed `graph_name` from `FalkorConfig` — graph name selection is now consumer-owned via `GraphClient::from_config(config, graph_name)` and `with_graph(config, graph_name, default, f)` in §2.2; removed `GOBBY_FALKORDB_GRAPH` from `resolve_falkordb_config` env vars; added acceptance 1.3.12 proving `gobby-core` contains no `"gobby_code"` or wiki graph default, and 2.2.5 proving graph name is consumer-supplied. (F2) Removed `ServiceState` from `search` return type — now returns `anyhow::Result<Vec<SearchHit>>` with both connection and HTTP failures as `Err`; `with_qdrant` is the single `ServiceState` boundary; added composition examples showing `with_qdrant(config, vec![], |cfg| search(cfg, coll, req))`; updated acceptance 2.3.7 to describe single-state-owner design; added 2.3.5a for composition path test. Swept same finding classes: (a) removed `collection_prefix` from `QdrantConfig` (same domain-leak pattern as `graph_name` — unused by `CollectionScope` API); removed `GOBBY_QDRANT_COLLECTION_PREFIX` from `resolve_qdrant_config`; added acceptance 1.3.13; (b) verified no other adapter functions return nested `ServiceState` — `with_graph` closure returns `T` not `(T, ServiceState)`, consistent with fixed `search`; `upsert` returns `anyhow::Result<()>` with no state, consistent. +- **R10 (2026-05-27)**: Addressed R9 adversary findings F1–F3.
(F1) Documented `decode_config_value` JSON-null behavior as an intentional divergence from current `gcode` — current `gcode` returns `Some("null")` for JSON null (no explicit `Null` branch), which is incorrect for config values used as passwords, URLs, or model names; updated docstring and acceptance 1.3.5 to state the fix explicitly. (F2) Removed unused `ServiceState` import from search.rs; replaced degradation-consumption claim with statement that search fusion is degradation-agnostic (adapters own `ServiceState`, consumers build `SearchDegradation`). (F3) Added deterministic source ordering to `rrf_merge` — `explanations` sorted by source name before deriving `sources`, matching current `gcode`'s `source_names.sort()` at `rrf.rs:42`; added acceptance 3.2.6 for two-source overlap ordering parity. Swept same finding classes: (a) removed unused `EmbeddingConfig` import from qdrant.rs (same unused-import class as F2 — embedding config is consumer-owned, not used in adapter functions); (b) verified no other `HashMap` iteration order reaches output-facing fields; (c) verified remaining parity claims (env var names, `reqwest::blocking`, `CollectionScope::Custom`) match codebase. 
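As a worked illustration of the `rrf_merge` behavior recorded in R8 (F4) and R10 (F3), the following std-only sketch recomputes the fusion on a tiny input. It is not the plan's implementation: `rrf_scores` and its tuple return shape are hypothetical names chosen for this example, and it assumes the same 0-based rank and `RRF_K = 60.0` that the §3.2 code uses.

```rust
use std::collections::HashMap;

/// RRF constant; 60.0 is assumed here to match the plan's `RRF_K`.
const RRF_K: f64 = 60.0;

/// Hypothetical helper (not part of the plan): returns `(id, score, sources)`
/// triples with per-source deduplication and alphabetically sorted sources.
fn rrf_scores(sources: &[(&str, Vec<&str>)]) -> Vec<(String, f64, Vec<String>)> {
    let mut acc: HashMap<String, Vec<(String, f64)>> = HashMap::new();
    for (name, ids) in sources {
        // Keep only the best (lowest) 0-based rank for each ID within this source.
        let mut best: HashMap<&str, usize> = HashMap::new();
        for (rank, id) in ids.iter().enumerate() {
            best.entry(*id).and_modify(|r| *r = (*r).min(rank)).or_insert(rank);
        }
        for (id, rank) in best {
            acc.entry(id.to_string())
                .or_default()
                .push((name.to_string(), 1.0 / (RRF_K + rank as f64)));
        }
    }
    let mut out: Vec<(String, f64, Vec<String>)> = acc
        .into_iter()
        .map(|(id, mut parts)| {
            // Sort by source name so HashMap iteration order never reaches output.
            parts.sort_by(|a, b| a.0.cmp(&b.0));
            let score: f64 = parts.iter().map(|p| p.1).sum();
            let names: Vec<String> = parts.into_iter().map(|p| p.0).collect();
            (id, score, names)
        })
        .collect();
    // Highest score first; ties broken by ID for a stable order.
    out.sort_by(|a, b| {
        b.1.partial_cmp(&a.1)
            .unwrap_or(std::cmp::Ordering::Equal)
            .then_with(|| a.0.cmp(&b.0))
    });
    out
}
```

For the input `[("semantic", ["b", "a", "a"]), ("fts", ["b"])]`, `b` scores 1/60 + 1/60 with sources `["fts", "semantic"]`, while `a` scores 1/61 from `semantic` alone because its duplicate at rank 2 is ignored.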
+ +## M1 Task Manifest +`kind: manifest` + +```yaml +- title: Define the gobby-core public boundary + category: code + task_type: feature + depends_on: [] + validation_criteria: "cargo build -p gobby-core --no-default-features && cargo clippy -p gobby-core --no-default-features -- -D warnings" + labels: + - covers:gcore-rust-foundation:1.1:1.1.1 + - covers:gcore-rust-foundation:1.1:1.1.2 + - covers:gcore-rust-foundation:1.1:1.1.3 + - covers:gcore-rust-foundation:1.1:1.1.4 + - covers:gcore-rust-foundation:1.1:1.1.5 + - covers:gcore-rust-foundation:1.1:1.1.6 + implementation_domain: backend + tdd: true + source_section: "1.1" +- title: Add shared error and degradation contracts + category: code + task_type: feature + depends_on: + - "1.1" + validation_criteria: "cargo test -p gobby-core --no-default-features degradation::tests" + labels: + - covers:gcore-rust-foundation:1.2:1.2.1 + - covers:gcore-rust-foundation:1.2:1.2.2 + - covers:gcore-rust-foundation:1.2:1.2.3 + - covers:gcore-rust-foundation:1.2:1.2.4 + - covers:gcore-rust-foundation:1.2:1.2.5 + implementation_domain: backend + tdd: true + source_section: "1.2" +- title: Add shared context and config resolution + category: code + task_type: feature + depends_on: + - "1.1" + validation_criteria: "cargo test -p gobby-core --no-default-features config::tests && cargo test -p gobby-core --no-default-features context::tests" + labels: + - covers:gcore-rust-foundation:1.3:1.3.1 + - covers:gcore-rust-foundation:1.3:1.3.2 + - covers:gcore-rust-foundation:1.3:1.3.3 + - covers:gcore-rust-foundation:1.3:1.3.4 + - covers:gcore-rust-foundation:1.3:1.3.5 + - covers:gcore-rust-foundation:1.3:1.3.6 + - covers:gcore-rust-foundation:1.3:1.3.7 + - covers:gcore-rust-foundation:1.3:1.3.8 + - covers:gcore-rust-foundation:1.3:1.3.9 + - covers:gcore-rust-foundation:1.3:1.3.10 + - covers:gcore-rust-foundation:1.3:1.3.11 + - covers:gcore-rust-foundation:1.3:1.3.12 + - covers:gcore-rust-foundation:1.3:1.3.13 + implementation_domain: 
backend + tdd: true + source_section: "1.3" +- title: Define attached and standalone setup contracts + category: code + task_type: feature + depends_on: + - "1.2" + - "1.3" + validation_criteria: "cargo test -p gobby-core --no-default-features setup::tests" + labels: + - covers:gcore-rust-foundation:1.4:1.4.1 + - covers:gcore-rust-foundation:1.4:1.4.2 + - covers:gcore-rust-foundation:1.4:1.4.3 + - covers:gcore-rust-foundation:1.4:1.4.4 + - covers:gcore-rust-foundation:1.4:1.4.5 + - covers:gcore-rust-foundation:1.4:1.4.6 + implementation_domain: backend + tdd: true + source_section: "1.4" +- title: Add PostgreSQL hub adapter + category: code + task_type: feature + depends_on: + - "1.1" + - "1.2" + - "1.3" + - "1.4" + validation_criteria: "cargo test -p gobby-core --features postgres postgres::tests" + labels: + - covers:gcore-rust-foundation:2.1:2.1.1 + - covers:gcore-rust-foundation:2.1:2.1.2 + - covers:gcore-rust-foundation:2.1:2.1.3 + - covers:gcore-rust-foundation:2.1:2.1.4 + implementation_domain: backend + tdd: true + source_section: "2.1" +- title: Add FalkorDB adapter and query safety boundary + category: code + task_type: feature + depends_on: + - "1.1" + - "1.2" + - "1.3" + - "1.4" + validation_criteria: "cargo test -p gobby-core --features falkor falkor::tests" + labels: + - covers:gcore-rust-foundation:2.2:2.2.1 + - covers:gcore-rust-foundation:2.2:2.2.2 + - covers:gcore-rust-foundation:2.2:2.2.3 + - covers:gcore-rust-foundation:2.2:2.2.4 + - covers:gcore-rust-foundation:2.2:2.2.5 + implementation_domain: backend + tdd: true + source_section: "2.2" +- title: Add Qdrant and embedding configuration adapter + category: code + task_type: feature + depends_on: + - "1.1" + - "1.2" + - "1.3" + - "1.4" + validation_criteria: "cargo test -p gobby-core --features qdrant qdrant::tests" + labels: + - covers:gcore-rust-foundation:2.3:2.3.1 + - covers:gcore-rust-foundation:2.3:2.3.2 + - covers:gcore-rust-foundation:2.3:2.3.3 + - covers:gcore-rust-foundation:2.3:2.3.4 
+ - covers:gcore-rust-foundation:2.3:2.3.5 + - covers:gcore-rust-foundation:2.3:2.3.5a + - covers:gcore-rust-foundation:2.3:2.3.6 + - covers:gcore-rust-foundation:2.3:2.3.7 + implementation_domain: backend + tdd: true + source_section: "2.3" +- title: Add generic indexing primitives + category: code + task_type: feature + depends_on: + - "2.1" + - "2.2" + - "2.3" + validation_criteria: "cargo test -p gobby-core --features indexing indexing::tests" + labels: + - covers:gcore-rust-foundation:3.1:3.1.1 + - covers:gcore-rust-foundation:3.1:3.1.2 + - covers:gcore-rust-foundation:3.1:3.1.3 + - covers:gcore-rust-foundation:3.1:3.1.4 + implementation_domain: backend + tdd: true + source_section: "3.1" +- title: Add generic search fusion primitives + category: code + task_type: feature + depends_on: + - "2.1" + - "2.2" + - "2.3" + validation_criteria: "cargo test -p gobby-core --features search search::tests" + labels: + - covers:gcore-rust-foundation:3.2:3.2.1 + - covers:gcore-rust-foundation:3.2:3.2.2 + - covers:gcore-rust-foundation:3.2:3.2.3 + - covers:gcore-rust-foundation:3.2:3.2.4 + - covers:gcore-rust-foundation:3.2:3.2.5 + - covers:gcore-rust-foundation:3.2:3.2.6 + implementation_domain: backend + tdd: true + source_section: "3.2" +``` diff --git a/CHANGELOG.md b/CHANGELOG.md index 898aa8c..67e7d3a 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -9,6 +9,116 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [Unreleased] +## [0.9.2] — gcode + +### Added + +#### gcode + +- **Project-id graph clear** — `gcode graph clear --project-id ` + now clears a code graph projection by explicit project id before normal + project-root resolution. This is the daemon stale-project cleanup path and can + run from any cwd without `.gobby/project.json`. 
+ +### Fixed + +#### gcode + +- **Deleted-file projection cleanup** — `gcode index` now removes FalkorDB code + graph nodes/edges and Qdrant code-symbol vectors for deleted files before + deleting PostgreSQL hub facts. This covers missing explicit + `--files ` inputs and whole-project stale/orphan cleanup without + relying on daemon reconciliation. +- **Projection ownership boundary** — code graph clears remain scoped to + code-index FalkorDB labels, and code vector clears remain scoped to + `code_symbols_{project_id}`. Memory graph nodes and memory vector collections + are not targeted by these lifecycle paths. + +## [0.9.1] — gcode + +### Added + +#### gcode + +- **Overview graph limit** — `gcode graph overview` now accepts `--limit N` + to cap the number of files included in the overview graph, matching the + daemon's graph overview limit contract. + +### Fixed + +#### gcode + +- **File graph read aliases** — `gcode graph file` now keeps node file paths + and edge metadata file paths under distinct FalkorDB result aliases, fixing + duplicate-column failures when returning JSON graph payloads. + +## [0.9.0] — gcode + +### Added + +#### gcode + +- **Standalone setup reset boundary** — `gcode setup --standalone` now fails + safely when it detects incompatible existing code-index PostgreSQL state and + prints guidance to rerun with `--overwrite-code-index` only when a full + code-index reset is intended. +- **Advanced full code-index overwrite** — + `gcode setup --standalone --overwrite-code-index` drops/recreates only + allowlisted gcode code-index PostgreSQL relations and BM25 indexes, clears + code-index graph nodes in FalkorDB, and deletes Qdrant collections with the + `code_symbols_` prefix. Gobby project files, config, secrets, tasks, + sessions, memory, and daemon-owned data stay untouched. 
+- **Rust graph/vector projection lifecycle** — graph reads, graph reports, + vector projection sync, and graph/vector lifecycle operations now route + through the Rust `gobby-code` library boundary for daemon adoption. + +### Changed + +#### gcode + +- **Project-scoped invalidation** — `gcode invalidate` remains the normal + project reset. PostgreSQL deletes stay filtered to the current project, and + configured standalone FalkorDB/Qdrant projections are cleaned only for that + project. +- **Shared foundation dependency** — `gobby-code` now consumes + `gobby-core 0.2`. + +## [0.2.0] — gobby-core + +### Added + +#### gobby-core + +- **Expanded shared foundation** — added reusable context/config contracts, + attached/standalone setup contracts, PostgreSQL hub helpers, FalkorDB and + Qdrant adapters, standalone service provisioning helpers, indexing + primitives, search-fusion primitives, and degradation vocabulary for Rust + Gobby CLI consumers. + +### Changed + +#### gobby-core + +- **Consumer dependency line** — workspace consumers now target the + `gobby-core 0.2` minor line. + +## [0.8.7] — gcode + +### Fixed + +#### gcode + +- **Project identity resolution** — self-referential + `parent_project_path` / `parent_project_id` markers now keep the owning + `.gobby/project.json` ID, while linked worktrees and isolated roots keep + filesystem-scoped code index IDs. +- **Source `build` package indexing** — root generated `build` / `dist` + directories stay excluded, while source directories such as + `src/gobby/build/` are indexed. +- **Duplicate-root pruning** — `gcode prune` now marks stale duplicate project + entries for an existing root when they differ from that root's resolved + project ID. 
+ ## [0.4.4] — gsqz ### Fixed diff --git a/Cargo.lock b/Cargo.lock index 89ee1ea..922e11d 100644 --- a/Cargo.lock +++ b/Cargo.lock @@ -760,7 +760,7 @@ dependencies = [ [[package]] name = "gobby-code" -version = "0.8.6" +version = "0.9.2" dependencies = [ "anyhow", "base64 0.22.1", @@ -812,13 +812,22 @@ dependencies = [ [[package]] name = "gobby-core" -version = "0.1.0" +version = "0.2.0" dependencies = [ "anyhow", "dirs", + "falkordb", + "ignore", + "postgres", + "postgres-types", + "reqwest", + "serde", "serde_json", "serde_yaml", + "sha2 0.10.9", "tempfile", + "thiserror", + "urlencoding", ] [[package]] diff --git a/README.md b/README.md index 0e6d0e3..acce851 100644 --- a/README.md +++ b/README.md @@ -32,8 +32,9 @@ navigation, and hybrid ranking. When FalkorDB, Qdrant, and an embeddings endpoint are configured - typically through Gobby - `gcode` adds graph-aware search, semantic search, optional graph expansion for exact symbol lookup (`gcode search-symbol --with-graph`), dependency analysis (`callers`, `usages`, -`imports`, `blast-radius`), and daemon-backed graph lifecycle commands -(`gcode graph clear`, `gcode graph rebuild`). +`imports`, `blast-radius`), and Rust-owned graph/vector projection lifecycle. +`gcode graph clear --project-id ` is available for daemon +stale-project graph cleanup without cwd project resolution. For non-Gobby-managed projects, `gcode init` installs the bundled `gcode` skill for Claude Code, Codex, Droid, Grok, Qwen, Gemini CLI (deprecated @@ -52,7 +53,9 @@ One command to launch Claude Code or Codex against a local LLM backend. Auto-det Sandbox-tolerant hook dispatcher invoked by host AI CLIs (Claude Code, Codex, Gemini CLI, Qwen CLI) on lifecycle and tool-use events. Spools envelopes to `~/.gobby/hooks/inbox/` *before* POSTing to the local Gobby daemon, so the daemon's drain worker can replay any delivery lost to a sandbox FS-read denial, network blip, or daemon restart. 
You don't usually invoke it directly — Gobby wires it into your AI CLI for you. -`gobby-core` underpins them all — a small shared-primitives library (project root walk-up, bootstrap config, daemon URL). Not a standalone tool. +`gobby-core` underpins them all — a small shared-primitives library for project +root walk-up, bootstrap config, daemon URL composition, setup/provisioning +contracts, and optional datastore adapters. It is not a standalone tool. ## Documentation @@ -100,6 +103,17 @@ Bootstrap fallback is valid only when `hub_backend: postgres` and bootstrap contains an inline `database_url`. Bootstrap `database_url_ref` is rejected during bootstrap validation; it is never resolved or used to restart the fallback chain. + +For daemon-independent service provisioning, use `gcode setup --standalone`. +The default setup path is non-destructive. If incompatible code-index state is +already present, rerun with `gcode setup --standalone --overwrite-code-index` +only when you intend to reset all gcode-owned code-index PostgreSQL, +FalkorDB, and Qdrant projection state. + +Graph/vector lifecycle is code-index scoped. FalkorDB clears target only +code-index labels, and Qdrant clears target only `code_symbols_{project_id}`; +Gobby memory graph and memory vector collections stay outside this boundary. + Installing from source or crates.io requires Rust 1.88+. 
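The scoping rule stated above (Qdrant clears target only `code_symbols_{project_id}`) can be sketched as a small naming guard. This is a minimal illustration, not the crate's implementation: `CODE_SYMBOLS_PREFIX`, `code_symbols_collection`, and `may_clear` are hypothetical names introduced here, and only the `code_symbols_` prefix and the `code_symbols_{project_id}` pattern come from the text.

```rust
/// Prefix for gcode-owned code-symbol collections, as named in the README text.
/// The constant name itself is an assumption made for this sketch.
const CODE_SYMBOLS_PREFIX: &str = "code_symbols_";

/// Hypothetical helper: the collection name for one project's code-symbol vectors.
fn code_symbols_collection(project_id: &str) -> String {
    format!("{}{}", CODE_SYMBOLS_PREFIX, project_id)
}

/// Hypothetical guard: a project-scoped clear may touch only that project's
/// own collection, never memory collections or another project's vectors.
fn may_clear(collection: &str, project_id: &str) -> bool {
    collection == code_symbols_collection(project_id)
}
```

Under these assumptions, `may_clear("memories", "project-123")` is false, so a lifecycle path that checks the guard before deleting would leave memory vector collections untouched.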
### From source diff --git a/crates/gcode/Cargo.toml b/crates/gcode/Cargo.toml index ddbffca..60e7128 100644 --- a/crates/gcode/Cargo.toml +++ b/crates/gcode/Cargo.toml @@ -1,6 +1,6 @@ [package] name = "gobby-code" -version = "0.8.6" +version = "0.9.2" edition = "2024" rust-version = "1.88" authors = ["Josh Wilhelmi "] @@ -12,13 +12,17 @@ readme = "README.md" keywords = ["code-search", "ast", "code-index", "developer-tools", "gobby"] categories = ["command-line-utilities", "development-tools"] +[lib] +name = "gobby_code" +path = "src/lib.rs" + [[bin]] name = "gcode" path = "src/main.rs" [dependencies] # Internal -gobby-core = { path = "../gcore", version = "0.1" } +gobby-core = { path = "../gcore", version = "0.2", features = ["postgres", "falkor", "qdrant", "search", "indexing"] } # CLI clap = { version = "4", features = ["derive"] } diff --git a/crates/gcode/README.md b/crates/gcode/README.md index 605971b..34e4611 100644 --- a/crates/gcode/README.md +++ b/crates/gcode/README.md @@ -136,18 +136,21 @@ gcode symbols ... # Batch retrieve gcode tree # File tree with symbol counts # Dependency graph reads (requires FalkorDB) +gcode graph overview --limit 100 # Project overview graph gcode callers "handleAuth" # Who calls this? 
gcode usages "handleAuth" # Incoming call sites gcode imports src/auth.ts # Import graph for a file gcode blast-radius "handleAuth" --depth 3 # Transitive impact analysis -# Graph lifecycle (requires Gobby daemon) +# Graph lifecycle (requires FalkorDB) gcode graph clear # Clear current project's graph projection +gcode graph clear --project-id # Clear graph projection by explicit project id gcode graph rebuild # Rebuild current project's graph projection # Project management gcode status # Index stats gcode projects # List all indexed projects +gcode setup --standalone # Provision daemon-independent services gcode index # Re-index (incremental) gcode invalidate # Clear index, force full re-index @@ -193,7 +196,7 @@ source of truth for code-index rows. codebase → tree-sitter → PostgreSQL hub + pg_search BM25 FalkorDB → call graphs, blast radius, imports Qdrant + embeddings → semantic vector search - Gobby daemon → auto-indexing, graph/vector sync, + Gobby daemon → auto-indexing, graph/vector orchestration, config, secrets, sessions, agents ``` @@ -217,7 +220,7 @@ bootstrap `database_url`. Bootstrap `database_url_ref` is rejected. | Semantic vector search | When Qdrant + embeddings are configured | Yes | | Call graph / blast radius | When FalkorDB is configured | Yes | | Import graph | When FalkorDB is configured | Yes | -| Graph clear / rebuild lifecycle | Requires daemon | Yes | +| Graph clear / rebuild lifecycle | When FalkorDB is configured | Yes | | Auto-indexing on file change | Manual `gcode index` | Yes (daemon file watcher) | | Centralized config + secrets | Reads PostgreSQL `config_store` + secrets | Yes | | Shared index (daemon + CLI) | PostgreSQL hub | PostgreSQL hub | @@ -237,9 +240,11 @@ Get started with Gobby at [github.com/GobbyAI/gobby](https://github.com/GobbyAI/ | PostgreSQL hub unavailable | Runtime index/search commands fail with a bootstrap or connection error. | | No index yet | Commands error with `Run gcode init to initialize`. 
| -Read-side graph commands depend on FalkorDB. `gcode graph clear` and `gcode graph rebuild` -are separate lifecycle operations routed through the Gobby daemon for the -current resolved project. +Read-side graph commands and graph lifecycle depend on FalkorDB. Vector +lifecycle depends on Qdrant plus embeddings for sync/rebuild. All code-index +projection lifecycle paths are Rust-owned and scoped to code projection state: +graph clears target code-index FalkorDB labels only, and vector clears target +only `code_symbols_{project_id}` rather than memory vector collections. ## Language Support diff --git a/crates/gcode/src/commands/graph.rs b/crates/gcode/src/commands/graph.rs index 8f43036..da58bef 100644 --- a/crates/gcode/src/commands/graph.rs +++ b/crates/gcode/src/commands/graph.rs @@ -1,203 +1,309 @@ -use anyhow::Context as _; -use reqwest::StatusCode; -use serde_json::Value; - use crate::config::Context; use crate::db; -use crate::falkor; +use crate::graph::code_graph::{ + self, GraphBlastRadiusTarget, GraphLifecycleAction, GraphLifecycleOutput, GraphPayload, +}; +use crate::graph::report::{ProjectGraphReport, ProjectGraphReportOptions}; use crate::models::PagedResponse; use crate::output::{self, Format}; +use crate::projection::sync::ProjectionSyncReport; use crate::search::fts::{self, ResolvedGraphSymbol}; +use serde_json::{Value, json}; const GOBBY_HINT: &str = "Graph commands require FalkorDB, available with Gobby. 
See: https://github.com/GobbyAI/gobby"; - -#[derive(Clone, Copy, Debug)] -enum GraphLifecycleAction { - Clear, - Rebuild, +fn format_success_text(output: &GraphLifecycleOutput) -> String { + format!( + "{} for project {}: {}", + output.action.success_prefix(), + output.project_id, + output.summary + ) } -impl GraphLifecycleAction { - fn cli_command(self) -> &'static str { - match self { - Self::Clear => "gcode graph clear", - Self::Rebuild => "gcode graph rebuild", +fn run_lifecycle_action( + ctx: &Context, + action: GraphLifecycleAction, + format: Format, +) -> anyhow::Result<()> { + let output = match action { + GraphLifecycleAction::Clear => clear_project_graph(ctx)?, + GraphLifecycleAction::Rebuild => rebuild_project_graph(ctx)?, + }; + match format { + Format::Json => output::print_json(&output.payload), + Format::Text => { + output::print_text(&format_success_text(&output))?; + output::print_json_compact(&output.payload) } } +} - fn endpoint_path(self) -> &'static str { - match self { - Self::Clear => "/api/code-index/graph/clear", - Self::Rebuild => "/api/code-index/graph/rebuild", - } +fn lifecycle_output( + action: GraphLifecycleAction, + ctx: &Context, + payload: Value, +) -> GraphLifecycleOutput { + let summary = code_graph::extract_summary_text(&payload).unwrap_or_else(|| payload.to_string()); + GraphLifecycleOutput { + project_id: ctx.project_id.clone(), + action, + summary, + payload, } +} - fn success_prefix(self) -> &'static str { - match self { - Self::Clear => "Cleared code-index graph", - Self::Rebuild => "Rebuilt code-index graph", - } - } +struct GraphFileSyncOutcome { + relationships_written: usize, + symbols_synced: usize, } -fn require_daemon_url( - daemon_url: Option<&str>, - action: GraphLifecycleAction, -) -> anyhow::Result<&str> { - daemon_url.ok_or_else(|| { - anyhow::anyhow!( - "Gobby daemon URL is not configured. 
`{}` requires the Gobby daemon.", - action.cli_command() - ) +fn sync_file_graph(ctx: &Context, file_path: &str) -> anyhow::Result<GraphFileSyncOutcome> { + code_graph::require_graph_reads(ctx)?; + let mut conn = db::connect_readwrite(&ctx.database_url)?; + let facts = db::read_graph_file_facts(&mut conn, &ctx.project_id, file_path)?; + if !db::mark_graph_sync_attempted(&mut conn, &ctx.project_id, file_path)? { + anyhow::bail!( + "indexed file `{file_path}` was not found for project {}", + ctx.project_id + ); + } + let relationships_written = code_graph::sync_file_graph( + ctx, + &facts.file_path, + &facts.imports, + &facts.definitions, + &facts.calls, + )?; + db::mark_graph_synced(&mut conn, &ctx.project_id, file_path)?; + Ok(GraphFileSyncOutcome { + relationships_written, + symbols_synced: facts.definitions.len(), }) } -fn build_lifecycle_url( - base_url: &str, - action: GraphLifecycleAction, - project_id: &str, -) -> anyhow::Result<reqwest::Url> { - let base = base_url.trim_end_matches('/'); - let mut url = reqwest::Url::parse(&format!("{base}{}", action.endpoint_path())) - .with_context(|| format!("invalid Gobby daemon URL: {base_url}"))?; - url.query_pairs_mut().append_pair("project_id", project_id); - Ok(url) +fn clear_project_graph(ctx: &Context) -> anyhow::Result<GraphLifecycleOutput> { + code_graph::require_graph_reads(ctx)?; + let mut conn = db::connect_readwrite(&ctx.database_url)?; + let files_marked_pending = db::reset_graph_sync_for_project(&mut conn, &ctx.project_id)?; + code_graph::clear_project(ctx)?; + let report = ProjectionSyncReport::ok(0, 0); + Ok(lifecycle_output( + GraphLifecycleAction::Clear, + ctx, + json!({ + "success": true, + "project_id": ctx.project_id, + "status": report.status, + "synced_files": report.synced_files, + "synced_symbols": report.synced_symbols, + "degraded": report.degraded, + "error": report.error, + "files_marked_pending": files_marked_pending, + "summary": format!("marked {files_marked_pending} files pending and cleared graph projection"), + }), + )) } -fn
compact_detail(body: &str) -> String { - let detail = body.split_whitespace().collect::>().join(" "); - let detail = detail.trim(); - if detail.len() > 240 { - format!("{}...", &detail[..237]) - } else { - detail.to_string() +fn rebuild_project_graph(ctx: &Context) -> anyhow::Result { + code_graph::require_graph_reads(ctx)?; + let mut conn = db::connect_readwrite(&ctx.database_url)?; + let file_paths = db::list_indexed_file_paths(&mut conn, &ctx.project_id)?; + code_graph::clear_project(ctx)?; + db::reset_graph_sync_for_project(&mut conn, &ctx.project_id)?; + + let mut files_synced = 0usize; + let mut symbols_synced = 0usize; + let mut errors = Vec::new(); + for file_path in &file_paths { + let synced_symbols = + match db::mark_graph_sync_attempted(&mut conn, &ctx.project_id, file_path) + .and_then(|updated| { + if updated { + Ok(()) + } else { + anyhow::bail!("indexed file no longer exists") + } + }) + .and_then(|_| { + let facts = db::read_graph_file_facts(&mut conn, &ctx.project_id, file_path)?; + code_graph::sync_file_graph( + ctx, + &facts.file_path, + &facts.imports, + &facts.definitions, + &facts.calls, + )?; + db::mark_graph_synced(&mut conn, &ctx.project_id, file_path)?; + Ok(facts.definitions.len()) + }) { + Ok(symbols) => symbols, + Err(err) => { + errors.push(format!("{file_path}: {err}")); + continue; + } + }; + files_synced += 1; + symbols_synced += synced_symbols; } -} -fn format_http_error( - action: GraphLifecycleAction, - url: &reqwest::Url, - status: StatusCode, - body: &str, -) -> String { - let detail = compact_detail(body); - if detail.is_empty() { - format!( - "`{}` failed: daemon returned HTTP {status} from {url}", - action.cli_command() - ) + let report = if errors.is_empty() { + ProjectionSyncReport::ok(files_synced, symbols_synced) } else { - format!( - "`{}` failed: daemon returned HTTP {status} from {url}: {detail}", - action.cli_command() + ProjectionSyncReport::degraded( + "sync_failed", + errors.join("; "), + files_synced, + 
symbols_synced, ) - } + }; + Ok(lifecycle_output( + GraphLifecycleAction::Rebuild, + ctx, + json!({ + "success": true, + "project_id": ctx.project_id, + "status": report.status, + "synced_files": report.synced_files, + "synced_symbols": report.synced_symbols, + "degraded": report.degraded, + "error": report.error, + "files_processed": file_paths.len(), + "files_synced": files_synced, + "files_failed": errors.len(), + "errors": errors, + "summary": format!("synced {files_synced}/{} files", file_paths.len()), + }), + )) } -fn parse_success_payload( - action: GraphLifecycleAction, - status: StatusCode, - body: &str, -) -> anyhow::Result { - serde_json::from_str(body).map_err(|err| { - let detail = compact_detail(body); - if detail.is_empty() { - anyhow::anyhow!( - "`{}` failed: daemon returned HTTP {status} with invalid JSON: {err}", - action.cli_command() - ) - } else { - anyhow::anyhow!( - "`{}` failed: daemon returned HTTP {status} with invalid JSON: {err}. Response: {detail}", - action.cli_command() - ) - } - }) +pub fn clear(ctx: &Context, format: Format) -> anyhow::Result<()> { + run_lifecycle_action(ctx, GraphLifecycleAction::Clear, format) } -fn extract_summary_text(payload: &Value) -> Option { - match payload { - Value::String(text) => { - let text = text.trim(); - (!text.is_empty()).then(|| text.to_string()) +pub fn rebuild(ctx: &Context, format: Format) -> anyhow::Result<()> { + run_lifecycle_action(ctx, GraphLifecycleAction::Rebuild, format) +} + +pub fn sync_file(ctx: &Context, file_path: &str, format: Format) -> anyhow::Result<()> { + let sync = sync_file_graph(ctx, file_path)?; + let relationships_written = sync.relationships_written; + let report = ProjectionSyncReport::ok(1, sync.symbols_synced); + let summary = format!("synced {relationships_written} graph relationships for {file_path}"); + let payload = json!({ + "success": true, + "project_id": ctx.project_id, + "file_path": file_path, + "status": report.status, + "synced_files": 
report.synced_files, + "synced_symbols": report.synced_symbols, + "degraded": report.degraded, + "error": report.error, + "relationships_written": relationships_written, + "summary": summary, + }); + match format { + Format::Json => output::print_json(&payload), + Format::Text => { + output::print_text(&format!( + "Synced code-index graph for project {}: {summary}", + ctx.project_id + ))?; + output::print_json_compact(&payload) } - Value::Object(map) => ["summary", "message", "detail", "status"] - .iter() - .find_map(|key| map.get(*key).and_then(Value::as_str)) - .map(str::trim) - .filter(|text| !text.is_empty()) - .map(ToOwned::to_owned), - _ => None, } } -fn format_success_text( - action: GraphLifecycleAction, - project_id: &str, - payload: &Value, -) -> anyhow::Result { - let detail = match extract_summary_text(payload) { - Some(summary) => summary, - None => serde_json::to_string(payload)?, - }; - - Ok(format!( - "{} for project {}: {}", - action.success_prefix(), - project_id, - detail - )) +fn format_graph_payload_text(payload: &GraphPayload) -> String { + let mut lines = Vec::new(); + lines.push(format!( + "nodes: {}, links: {}", + payload.nodes.len(), + payload.links.len() + )); + if let Some(center) = &payload.center { + lines.push(format!("center: {center}")); + } + for node in &payload.nodes { + let file = node.file_path.as_deref().unwrap_or(""); + if file.is_empty() { + lines.push(format!( + "node {} [{}] {}", + node.id, node.node_type, node.name + )); + } else { + lines.push(format!( + "node {} [{}] {} {}", + node.id, node.node_type, node.name, file + )); + } + } + for link in &payload.links { + lines.push(format!( + "link {} -[{}]-> {}", + link.source, link.link_type, link.target + )); + } + lines.join("\n") } -fn run_lifecycle_action( - ctx: &Context, - action: GraphLifecycleAction, - format: Format, -) -> anyhow::Result<()> { - let daemon_url = require_daemon_url(ctx.daemon_url.as_deref(), action)?; - let url = build_lifecycle_url(daemon_url, 
action, &ctx.project_id)?; - let client = reqwest::blocking::Client::builder() - .timeout(std::time::Duration::from_secs(15)) - .build() - .context("failed to build HTTP client")?; - - let response = client - .post(url.clone()) - .header("Accept", "application/json") - .send() - .with_context(|| { - format!( - "Failed to reach Gobby daemon at {daemon_url} for `{}`", - action.cli_command() - ) - })?; - - let status = response.status(); - let body = response.text().unwrap_or_default(); - if !status.is_success() { - anyhow::bail!("{}", format_http_error(action, &url, status, &body)); +fn print_graph_payload(payload: &GraphPayload, format: Format) -> anyhow::Result<()> { + match format { + Format::Json => output::print_json(payload), + Format::Text => output::print_text(&format_graph_payload_text(payload)), } +} + +fn format_report_text(report: &ProjectGraphReport) -> anyhow::Result { + Ok(serde_json::to_string_pretty(report)?) +} - let payload = parse_success_payload(action, status, &body)?; +pub fn report(ctx: &Context, top_n: usize, format: Format) -> anyhow::Result<()> { + let report = crate::graph::report::generate_report_with_options( + ctx, + ProjectGraphReportOptions { top_n }, + )?; match format { - Format::Json => output::print_json(&payload), - Format::Text => { - eprintln!( - "{}", - format_success_text(action, &ctx.project_id, &payload)? 
- ); - output::print_json_compact(&payload) - } + Format::Json => output::print_json(&report), + Format::Text => output::print_text(&format_report_text(&report)?), } } -pub fn clear(ctx: &Context, format: Format) -> anyhow::Result<()> { - run_lifecycle_action(ctx, GraphLifecycleAction::Clear, format) +pub fn overview(ctx: &Context, limit: usize, format: Format) -> anyhow::Result<()> { + let payload = code_graph::project_overview_graph(ctx, limit)?; + print_graph_payload(&payload, format) } -pub fn rebuild(ctx: &Context, format: Format) -> anyhow::Result<()> { - run_lifecycle_action(ctx, GraphLifecycleAction::Rebuild, format) +pub fn file(ctx: &Context, file_path: &str, format: Format) -> anyhow::Result<()> { + let payload = code_graph::file_graph(ctx, file_path)?; + print_graph_payload(&payload, format) +} + +pub fn neighbors( + ctx: &Context, + symbol_id: &str, + limit: usize, + format: Format, +) -> anyhow::Result<()> { + let payload = code_graph::symbol_neighbors(ctx, symbol_id, limit)?; + print_graph_payload(&payload, format) +} + +pub fn graph_blast_radius( + ctx: &Context, + symbol_id: Option<&str>, + file_path: Option<&str>, + depth: usize, + limit: usize, + format: Format, +) -> anyhow::Result<()> { + let target = match (symbol_id, file_path) { + (Some(symbol_id), None) => GraphBlastRadiusTarget::SymbolId(symbol_id.to_string()), + (None, Some(file_path)) => GraphBlastRadiusTarget::FilePath(file_path.to_string()), + _ => anyhow::bail!("provide exactly one of --symbol-id or --file"), + }; + let payload = code_graph::blast_radius_graph(ctx, target, depth, limit)?; + print_graph_payload(&payload, format) } fn hint_for(ctx: &Context) -> Option { @@ -259,12 +365,13 @@ pub fn callers( offset: usize, format: Format, ) -> anyhow::Result<()> { + code_graph::require_graph_reads(ctx)?; let symbol = match resolve_symbol(ctx, symbol_name) { Some(symbol) => symbol, None => return empty_response_for_unresolved(ctx, format), }; - let total = falkor::count_callers(ctx, 
&symbol.id)?; - let results = falkor::find_callers(ctx, &symbol.id, offset, limit)?; + let total = code_graph::count_callers(ctx, &symbol.id)?; + let results = code_graph::find_callers(ctx, &symbol.id, offset, limit)?; match format { Format::Json => output::print_json(&PagedResponse { @@ -309,12 +416,13 @@ pub fn usages( offset: usize, format: Format, ) -> anyhow::Result<()> { + code_graph::require_graph_reads(ctx)?; let symbol = match resolve_symbol(ctx, symbol_name) { Some(symbol) => symbol, None => return empty_response_for_unresolved(ctx, format), }; - let total = falkor::count_usages(ctx, &symbol.id)?; - let results = falkor::find_usages(ctx, &symbol.id, offset, limit)?; + let total = code_graph::count_usages(ctx, &symbol.id)?; + let results = code_graph::find_usages(ctx, &symbol.id, offset, limit)?; match format { Format::Json => output::print_json(&PagedResponse { @@ -354,7 +462,8 @@ pub fn usages( } pub fn imports(ctx: &Context, file: &str, format: Format) -> anyhow::Result<()> { - let results = falkor::get_imports(ctx, file)?; + code_graph::require_graph_reads(ctx)?; + let results = code_graph::get_imports(ctx, file)?; let total = results.len(); match format { Format::Json => output::print_json(&PagedResponse { @@ -385,11 +494,12 @@ pub fn blast_radius( depth: usize, format: Format, ) -> anyhow::Result<()> { + code_graph::require_graph_reads(ctx)?; let symbol = match resolve_symbol(ctx, target) { Some(symbol) => symbol, None => return empty_response_for_unresolved(ctx, format), }; - let results = falkor::blast_radius(ctx, &symbol.id, depth)?; + let results = code_graph::blast_radius(ctx, &symbol.id, depth)?; let total = results.len(); match format { Format::Json => output::print_json(&PagedResponse { @@ -418,11 +528,94 @@ pub fn blast_radius( #[cfg(test)] mod tests { use super::*; + use crate::models::{GraphResult, ProjectionMetadata, ProjectionProvenance}; use serde_json::json; + use std::path::PathBuf; + + fn make_ctx_no_falkordb() -> Context { + Context 
{ + database_url: "postgresql://localhost/nonexistent".to_string(), + project_root: PathBuf::from("/nonexistent"), + project_id: "test-project".to_string(), + quiet: true, + falkordb: None, + qdrant: None, + embedding: None, + code_vectors: crate::config::CodeVectorSettings::default(), + daemon_url: None, + } + } + + #[test] + fn graph_reads_require_falkor() { + let ctx = make_ctx_no_falkordb(); + + let err = imports(&ctx, "src/lib.rs", Format::Json).expect_err("imports must fail"); + + assert!(matches!( + err.downcast_ref::<code_graph::GraphReadError>(), + Some(code_graph::GraphReadError::NotConfigured) + )); + assert!( + err.to_string().contains("FalkorDB is not configured"), + "unexpected error: {err}" + ); + } + + #[test] + fn report_text_structured_output() { + let report = crate::graph::report::empty_report("project-123"); + + let text = format_report_text(&report).expect("format report text"); + let value: serde_json::Value = serde_json::from_str(&text).expect("structured JSON text"); + + assert_eq!(value["project_id"], "project-123"); + assert_eq!(value["summary"]["node_count"], 0); + assert!( + value["markdown"] + .as_str() + .expect("markdown field") + .contains("# Project Graph Report") + ); + assert!(!text.trim_start().starts_with('#')); + } + + #[test] + fn report_requires_graph_service() { + let ctx = make_ctx_no_falkordb(); + + let err = report(&ctx, 10, Format::Json).expect_err("report must fail"); + + assert!(matches!( + err.downcast_ref::<crate::graph::report::ProjectGraphReportError>(), + Some(crate::graph::report::ProjectGraphReportError::GraphServiceNotConfigured) + )); + assert!( + err.to_string() + .contains("project graph report requires FalkorDB"), + "unexpected error: {err}" + ); + } + + #[test] + fn graph_lifecycle_commands_call_core_directly() { + let manifest_dir = PathBuf::from(env!("CARGO_MANIFEST_DIR")); + let source = std::fs::read_to_string(manifest_dir.join("src/commands/graph.rs")) + .expect("read commands/graph.rs"); + let clear_project = ["code_graph", "::clear_project(ctx)"].concat(); + let
sync_file_graph = ["code_graph", "::sync_file_graph("].concat(); + let lifecycle_request = ["GraphLifecycleRequest", "::from_context"].concat(); + let daemon_lifecycle = ["code_graph", "::run_lifecycle_action"].concat(); + + assert!(source.contains(&clear_project)); + assert!(source.contains(&sync_file_graph)); + assert!(!source.contains(&lifecycle_request)); + assert!(!source.contains(&daemon_lifecycle)); + } #[test] fn test_build_lifecycle_url_clear_uses_project_id_query() { - let url = build_lifecycle_url( + let url = code_graph::build_lifecycle_url( "http://localhost:60887/", GraphLifecycleAction::Clear, "project-123", @@ -437,7 +630,7 @@ mod tests { #[test] fn test_build_lifecycle_url_rebuild_uses_project_id_query() { - let url = build_lifecycle_url( + let url = code_graph::build_lifecycle_url( "http://localhost:60887", GraphLifecycleAction::Rebuild, "project-123", @@ -452,7 +645,8 @@ mod tests { #[test] fn test_require_daemon_url_errors_when_missing() { - let err = require_daemon_url(None, GraphLifecycleAction::Clear).expect_err("must fail"); + let err = code_graph::require_daemon_url(None, GraphLifecycleAction::Clear) + .expect_err("must fail"); assert!( err.to_string() @@ -469,10 +663,10 @@ mod tests { fn test_format_http_error_includes_status_and_body() { let url = reqwest::Url::parse("http://localhost:60887/api/code-index/graph/clear") .expect("valid url"); - let message = format_http_error( + let message = code_graph::format_http_error( GraphLifecycleAction::Clear, &url, - StatusCode::BAD_GATEWAY, + reqwest::StatusCode::BAD_GATEWAY, "daemon upstream unavailable", ); @@ -485,8 +679,12 @@ mod tests { #[test] fn test_parse_success_payload_fails_on_invalid_json() { - let err = parse_success_payload(GraphLifecycleAction::Rebuild, StatusCode::OK, "not json") - .expect_err("invalid json must fail"); + let err = code_graph::parse_success_payload( + GraphLifecycleAction::Rebuild, + reqwest::StatusCode::OK, + "not json", + ) + .expect_err("invalid json must 
fail"); assert!( err.to_string().contains("invalid JSON"), @@ -504,9 +702,13 @@ mod tests { "message": "cleared 12 graph nodes", "removed_nodes": 12 }); - - let text = format_success_text(GraphLifecycleAction::Clear, "project-123", &payload) - .expect("text formats"); + let output = GraphLifecycleOutput { + project_id: "project-123".to_string(), + action: GraphLifecycleAction::Clear, + summary: "cleared 12 graph nodes".to_string(), + payload, + }; + let text = format_success_text(&output); assert_eq!( text, @@ -520,13 +722,78 @@ mod tests { "replayed": 18, "synced": 18 }); - - let text = format_success_text(GraphLifecycleAction::Rebuild, "project-123", &payload) - .expect("text formats"); + let output = GraphLifecycleOutput { + project_id: "project-123".to_string(), + action: GraphLifecycleAction::Rebuild, + summary: payload.to_string(), + payload, + }; + let text = format_success_text(&output); assert_eq!( text, "Rebuilt code-index graph for project project-123: {\"replayed\":18,\"synced\":18}" ); } + + #[test] + fn top_level_read_commands_preserve_json_shape() { + let response = PagedResponse { + project_id: "project-123".to_string(), + total: 1, + offset: 0, + limit: 10, + results: vec![GraphResult { + id: "sym-1".to_string(), + name: "run".to_string(), + file_path: "src/lib.rs".to_string(), + line: 12, + relation: Some("CALLS".to_string()), + distance: Some(1), + metadata: None, + }], + hint: None, + }; + + let value = serde_json::to_value(&response).expect("serialize response"); + + assert_eq!(value["project_id"], "project-123"); + assert_eq!(value["total"], 1); + assert_eq!(value["offset"], 0); + assert_eq!(value["limit"], 10); + assert_eq!(value["results"][0]["id"], "sym-1"); + assert_eq!(value["results"][0]["name"], "run"); + assert_eq!(value["results"][0]["file_path"], "src/lib.rs"); + assert_eq!(value["results"][0]["line"], 12); + assert_eq!(value["results"][0]["relation"], "CALLS"); + assert_eq!(value["results"][0]["distance"], 1); + 
assert!(value["hint"].is_null()); + assert!(value["results"][0].get("metadata").is_none()); + + let response = PagedResponse { + project_id: "project-123".to_string(), + total: 1, + offset: 0, + limit: 10, + results: vec![GraphResult { + id: "sym-1".to_string(), + name: "run".to_string(), + file_path: "src/lib.rs".to_string(), + line: 12, + relation: Some("CALLS".to_string()), + distance: Some(1), + metadata: Some( + ProjectionMetadata::new(ProjectionProvenance::Extracted, "gcode") + .with_source_file_path("src/lib.rs"), + ), + }], + hint: None, + }; + let value = serde_json::to_value(&response).expect("serialize metadata response"); + + assert_eq!( + value["results"][0]["metadata"]["source_file_path"], + "src/lib.rs" + ); + } } diff --git a/crates/gcode/src/commands/index.rs b/crates/gcode/src/commands/index.rs index 9bb0308..f419aa8 100644 --- a/crates/gcode/src/commands/index.rs +++ b/crates/gcode/src/commands/index.rs @@ -1,6 +1,10 @@ +use crate::config; use crate::config::Context; -use crate::db; -use crate::index::indexer; +use crate::index::api::{self, IndexDegradation, IndexOutcome, IndexRequest}; +use crate::output::{self, Format}; +use crate::projection::sync::{self, ProjectionSyncReports}; +use crate::utils::short_id; +use serde::Serialize; pub fn run( ctx: &Context, @@ -8,81 +12,246 @@ pub fn run( files: Option>, full: bool, require_cpp_semantics: bool, + sync_projections: bool, + format: Format, ) -> anyhow::Result<()> { - // Resolve root, project_id, and DB connection — re-resolve if path - // belongs to a different project than the CWD-derived context. 
- let (root, project_id, mut conn) = match path.as_deref() { - Some(p) => { - let target = std::path::PathBuf::from(p); - let target_root = crate::config::detect_project_root_from(&target)?; - if target_root != ctx.project_root { - // Path belongs to a different project — re-resolve everything - let identity = crate::config::resolve_project_identity( - &target_root, - crate::config::MissingIdentity::Generate, - )?; - crate::config::warn_project_identity(&identity, ctx.quiet); - if !ctx.quiet { - eprintln!( - "Warning: path '{}' belongs to project {} (not {}), re-resolving context", - p, - short_id(&identity.project_id), - &ctx.project_id[..8] - ); - } - let conn = db::connect_readwrite(&ctx.database_url)?; - if identity.should_write_gcode_json { - crate::project::ensure_gcode_json(&target_root)?; - } - (target_root, identity.project_id, conn) - } else { - let conn = db::connect_readwrite(&ctx.database_url)?; - (target_root, ctx.project_id.clone(), conn) - } - } - None => { - let conn = db::connect_readwrite(&ctx.database_url)?; - (ctx.project_root.clone(), ctx.project_id.clone(), conn) - } + let (target_ctx, path_filter) = resolve_index_context(ctx, path.as_deref())?; + let explicit_files: Vec = files + .unwrap_or_default() + .into_iter() + .map(std::path::PathBuf::from) + .collect(); + let request = IndexRequest { + project_root: target_ctx.project_root.clone(), + path_filter: if explicit_files.is_empty() { + path_filter + } else { + None + }, + explicit_files, + full, + require_cpp_semantics, + sync_projections, + }; + + let outcome = api::index_files(request, &target_ctx)?; + if sync_projections { + let projections = sync::sync_after_index(&target_ctx, &outcome.indexed_file_paths)?; + let payload = sync_projections_payload(&outcome, projections); + return match format { + Format::Json => output::print_json(&payload), + Format::Text => output::print_text(&sync_projections_text(&payload)?), + }; + } + + match format { + Format::Json => 
output::print_json(&outcome), + Format::Text => output::print_text(&format!( + "Indexed {} files ({} skipped), {} symbols, {} chunks in {}ms", + outcome.indexed_files, + outcome.skipped_files, + outcome.symbols_indexed, + outcome.chunks_indexed, + outcome.durations.total_ms + )), + } +} + +#[derive(Debug, Clone, PartialEq, Eq, Serialize)] +pub(crate) struct IndexSyncProjectionsOutput { + pub indexed_files: usize, + pub skipped_files: usize, + pub symbols_indexed: usize, + pub chunks_indexed: usize, + #[serde(default, skip_serializing_if = "Vec::is_empty")] + pub degraded: Vec, + pub projections: ProjectionSyncReports, +} + +pub(crate) fn sync_projections_payload( + outcome: &IndexOutcome, + projections: ProjectionSyncReports, +) -> IndexSyncProjectionsOutput { + IndexSyncProjectionsOutput { + indexed_files: outcome.indexed_files, + skipped_files: outcome.skipped_files, + symbols_indexed: outcome.symbols_indexed, + chunks_indexed: outcome.chunks_indexed, + degraded: outcome.degraded.clone(), + projections, + } +} + +pub(crate) fn sync_projections_text( + payload: &IndexSyncProjectionsOutput, +) -> anyhow::Result { + Ok(serde_json::to_string(payload)?) +} + +fn resolve_index_context( + ctx: &Context, + path: Option<&str>, +) -> anyhow::Result<(Context, Option)> { + let Some(p) = path else { + return Ok(( + clone_context(ctx, ctx.project_root.clone(), ctx.project_id.clone()), + None, + )); }; - if let Some(file_list) = files { - let result = indexer::index_files( - &mut conn, - &root, - &project_id, - &file_list, - require_cpp_semantics, + // Resolve root and project_id. If the path belongs to a different project + // than the CWD-derived context, re-resolve identity for that project. 
+ let target = std::path::PathBuf::from(p); + let target_root = crate::config::detect_project_root_from(&target)?; + let target_filter = path_filter_for(&target_root, &target); + if target_root != ctx.project_root { + let identity = crate::config::resolve_project_identity( + &target_root, + crate::config::MissingIdentity::Generate, )?; + crate::config::warn_project_identity(&identity, ctx.quiet); if !ctx.quiet { eprintln!( - "Indexed {} files, {} symbols in {}ms", - result.files_indexed, result.symbols_found, result.duration_ms + "Warning: path '{}' belongs to project {} (not {}), re-resolving context", + p, + short_id(&identity.project_id), + &ctx.project_id[..8] ); } - } else { - let result = indexer::index_directory( - &mut conn, - &root, - &project_id, - !full, - ctx.quiet, - require_cpp_semantics, - )?; - if !ctx.quiet { - eprintln!( - "Indexed {} files ({} skipped), {} symbols in {}ms", - result.files_indexed, - result.files_skipped, - result.symbols_found, - result.duration_ms - ); + if identity.should_write_gcode_json { + crate::project::ensure_gcode_json(&target_root)?; } + Ok(( + clone_context(ctx, target_root, identity.project_id), + target_filter, + )) + } else { + Ok(( + clone_context(ctx, target_root, ctx.project_id.clone()), + target_filter, + )) + } +} + +fn clone_context(ctx: &Context, project_root: std::path::PathBuf, project_id: String) -> Context { + config::Context { + database_url: ctx.database_url.clone(), + project_root, + project_id, + quiet: ctx.quiet, + falkordb: ctx.falkordb.clone(), + qdrant: ctx.qdrant.clone(), + embedding: ctx.embedding.clone(), + code_vectors: ctx.code_vectors.clone(), + daemon_url: ctx.daemon_url.clone(), } +} + +fn path_filter_for( + project_root: &std::path::Path, + target: &std::path::Path, +) -> Option { + let target_abs = if target.is_absolute() { + target.to_path_buf() + } else { + std::env::current_dir() + .map(|cwd| cwd.join(target)) + .unwrap_or_else(|_| project_root.join(target)) + }; + + let root_abs = 
project_root + .canonicalize() + .unwrap_or_else(|_| project_root.to_path_buf()); + let target_abs = target_abs.canonicalize().unwrap_or(target_abs); - Ok(()) + if target_abs == root_abs { + None + } else { + Some(target_abs) + } } -fn short_id(id: &str) -> &str { - id.get(..8).unwrap_or(id) +#[cfg(test)] +mod tests { + use super::*; + use crate::index::api::IndexOutcome; + use crate::projection::sync::{ + ProjectionStatus, ProjectionSyncError, ProjectionSyncReport, ProjectionSyncReports, + }; + use serde_json::{Value, json}; + + fn sample_outcome() -> IndexOutcome { + IndexOutcome { + indexed_files: 12, + skipped_files: 0, + symbols_indexed: 348, + chunks_indexed: 921, + ..IndexOutcome::default() + } + } + + fn sample_reports() -> ProjectionSyncReports { + ProjectionSyncReports { + graph: ProjectionSyncReport { + status: ProjectionStatus::Ok, + synced_files: 12, + synced_symbols: 348, + degraded: false, + error: None, + }, + vector: ProjectionSyncReport { + status: ProjectionStatus::Degraded, + synced_files: 0, + synced_symbols: 0, + degraded: true, + error: Some(ProjectionSyncError { + kind: "missing_qdrant_config".to_string(), + message: "Qdrant config is required".to_string(), + }), + }, + } + } + + #[test] + fn sync_projections_json_contract() { + let payload = sync_projections_payload(&sample_outcome(), sample_reports()); + assert_eq!( + serde_json::to_value(&payload).expect("payload serializes"), + json!({ + "indexed_files": 12, + "skipped_files": 0, + "symbols_indexed": 348, + "chunks_indexed": 921, + "projections": { + "graph": { + "status": "ok", + "synced_files": 12, + "synced_symbols": 348, + "degraded": false, + "error": null + }, + "vector": { + "status": "degraded", + "synced_files": 0, + "synced_symbols": 0, + "degraded": true, + "error": { + "kind": "missing_qdrant_config", + "message": "Qdrant config is required" + } + } + } + }) + ); + } + + #[test] + fn sync_projections_text_contract() { + let payload = 
sync_projections_payload(&sample_outcome(), sample_reports()); + let text = sync_projections_text(&payload).expect("text payload"); + let parsed: Value = serde_json::from_str(&text).expect("text mode is structured JSON"); + assert_eq!(parsed["indexed_files"], 12); + assert_eq!(parsed["projections"]["graph"]["status"], "ok"); + assert_eq!(parsed["projections"]["vector"]["status"], "degraded"); + } } diff --git a/crates/gcode/src/commands/init.rs b/crates/gcode/src/commands/init.rs index 88851b8..414a8a2 100644 --- a/crates/gcode/src/commands/init.rs +++ b/crates/gcode/src/commands/init.rs @@ -2,7 +2,7 @@ use std::path::Path; use crate::config; use crate::db; -use crate::index::indexer; +use crate::index::api; use crate::output::{self, Format}; use crate::project; use crate::skill; @@ -53,13 +53,34 @@ pub fn run(project_root: &Path, format: Format, quiet: bool) -> anyhow::Result<( // Auto-index the project. The daemon process is not required, but a migrated // PostgreSQL hub must already be configured in Gobby bootstrap. 
let database_url = db::resolve_database_url()?; - let mut conn = db::connect_readwrite(&database_url)?; - let index_result = - indexer::index_directory(&mut conn, project_root, &project_id, true, quiet, false)?; + let index_ctx = config::Context { + database_url, + project_root: project_root.to_path_buf(), + project_id: project_id.clone(), + quiet, + falkordb: None, + qdrant: None, + embedding: None, + code_vectors: config::CodeVectorSettings::default(), + daemon_url: None, + }; + let index_result = api::index_files( + api::IndexRequest { + project_root: project_root.to_path_buf(), + path_filter: None, + explicit_files: Vec::new(), + full: false, + require_cpp_semantics: false, + sync_projections: false, + }, + &index_ctx, + )?; if !quiet { eprintln!( "Indexed {} files, {} symbols in {}ms", - index_result.files_indexed, index_result.symbols_found, index_result.duration_ms + index_result.indexed_files, + index_result.symbols_indexed, + index_result.durations.total_ms ); } @@ -69,9 +90,9 @@ pub fn run(project_root: &Path, format: Format, quiet: bool) -> anyhow::Result<( "project_id": project_id, "project_root": project_root.to_string_lossy(), "status": status, - "files_indexed": index_result.files_indexed, - "symbols_found": index_result.symbols_found, - "duration_ms": index_result.duration_ms, + "files_indexed": index_result.indexed_files, + "symbols_found": index_result.symbols_indexed, + "duration_ms": index_result.durations.total_ms, }); if !installed_skills.is_empty() { result["skills_installed"] = serde_json::json!(installed_skills); diff --git a/crates/gcode/src/commands/mod.rs b/crates/gcode/src/commands/mod.rs index e8c3d54..d6c7499 100644 --- a/crates/gcode/src/commands/mod.rs +++ b/crates/gcode/src/commands/mod.rs @@ -3,5 +3,7 @@ pub mod index; pub mod init; pub(crate) mod scope; pub mod search; +pub mod setup; pub mod status; pub mod symbols; +pub mod vector; diff --git a/crates/gcode/src/commands/scope.rs b/crates/gcode/src/commands/scope.rs index 
6ccdc1e..7cbc1bb 100644 --- a/crates/gcode/src/commands/scope.rs +++ b/crates/gcode/src/commands/scope.rs @@ -164,6 +164,7 @@ mod tests { falkordb: None, qdrant: None, embedding: None, + code_vectors: crate::config::CodeVectorSettings::default(), daemon_url: None, } } diff --git a/crates/gcode/src/commands/setup.rs b/crates/gcode/src/commands/setup.rs new file mode 100644 index 0000000..f439436 --- /dev/null +++ b/crates/gcode/src/commands/setup.rs @@ -0,0 +1,398 @@ +use anyhow::Context as _; +use gobby_core::provisioning::{ + DEFAULT_EMBEDDING_VECTOR_DIM, DEFAULT_LM_STUDIO_API_BASE, DEFAULT_OLLAMA_API_BASE, + DEFAULT_OLLAMA_MODEL, DockerProvisioningReport, DockerServiceOptions, EmbeddingBootstrap, + StandaloneConfig, compose_file_path, gcore_config_path, provision_docker_services, +}; +use postgres::{Client, NoTls}; +use std::net::{TcpStream, ToSocketAddrs}; +use std::time::Duration; + +use crate::config::{self, QdrantConfig}; +use crate::db; +use crate::graph::code_graph; +use crate::output::{self, Format}; +use crate::setup::{ + self, StandaloneEmbeddingStatus, StandaloneServicesStatus, StandaloneSetupRequest, +}; +use crate::vector::code_symbols; + +pub fn run(request: StandaloneSetupRequest, format: Format, quiet: bool) -> anyhow::Result<()> { + setup::validate_standalone_request(&request)?; + + let home = db::gobby_home()?; + let mut service_options = DockerServiceOptions::new(home.clone()); + apply_service_overrides(&request, &mut service_options); + + let embedding = resolve_embedding_bootstrap(&request)?; + let (database_url, service_report) = resolve_or_provision_database(&request, &service_options)?; + let mut client = connect_postgres_with_retry(&database_url, service_report.is_some())?; + if request.overwrite_code_index { + clear_overwrite_projections(&home, &request, &service_options, service_report.as_ref())?; + } + let mut status = setup::run_standalone_setup(&request, &mut client)?; + + let config_file = write_gcore_config( + &home, + &request, + 
&service_options, + &database_url, + service_report.as_ref(), + embedding.as_ref(), + )?; + status.config_file = Some(config_file.display().to_string()); + status.services = Some(match service_report { + Some(report) => StandaloneServicesStatus { + provisioned: true, + compose_file: Some(report.compose_file.display().to_string()), + health_checks: report.health_checks, + }, + None => StandaloneServicesStatus { + provisioned: false, + compose_file: service_configured_compose_file(&home), + health_checks: Vec::new(), + }, + }); + status.embedding = embedding.map(|embedding| StandaloneEmbeddingStatus { + provider: embedding.provider, + api_base: embedding.api_base, + model: embedding.model, + vector_dim: embedding.vector_dim, + api_key_env: embedding.api_key_env, + }); + + match format { + Format::Json => output::print_json(&status), + Format::Text => { + if !quiet { + output::print_text(&format!( + "Standalone gcode setup complete in schema {}", + status.schema + ))?; + } + Ok(()) + } + } +} + +struct OverwriteProjectionConfigs { + falkordb: Option, + qdrant: Option, +} + +fn clear_overwrite_projections( + home: &std::path::Path, + request: &StandaloneSetupRequest, + service_options: &DockerServiceOptions, + service_report: Option<&DockerProvisioningReport>, +) -> anyhow::Result<()> { + let configs = overwrite_projection_configs(home, request, service_options, service_report)?; + if let Some(falkordb) = configs.falkordb { + code_graph::clear_all_code_index(&falkordb) + .context("failed to clear FalkorDB code-index projection during overwrite setup")?; + } + if let Some(qdrant) = configs.qdrant { + code_symbols::delete_code_symbol_collections_with_prefix(&qdrant) + .context("failed to delete Qdrant code-symbol collections during overwrite setup")?; + } + Ok(()) +} + +fn overwrite_projection_configs( + home: &std::path::Path, + request: &StandaloneSetupRequest, + service_options: &DockerServiceOptions, + service_report: Option<&DockerProvisioningReport>, +) -> 
anyhow::Result { + let mut standalone = StandaloneConfig::read_at(&gcore_config_path(home))? + .unwrap_or_else(StandaloneConfig::empty); + + if service_report.is_some() { + standalone.set("databases.falkordb.host", &service_options.falkordb_host); + standalone.set( + "databases.falkordb.port", + service_options.falkordb_port.to_string(), + ); + standalone.set( + "databases.falkordb.password", + &service_options.falkordb_password, + ); + standalone.set("databases.qdrant.url", service_options.qdrant_url()); + } + + if let Some(host) = request.falkordb_host.as_deref() { + standalone.set("databases.falkordb.host", host); + } + if let Some(port) = request.falkordb_port { + standalone.set("databases.falkordb.port", port.to_string()); + } + if let Some(password) = request.falkordb_password.as_deref() { + standalone.set("databases.falkordb.password", password); + } + if let Some(qdrant_url) = request.qdrant_url.as_deref() { + standalone.set("databases.qdrant.url", qdrant_url); + } + + let falkordb = gobby_core::config::resolve_falkordb_config(&mut standalone).map(|connection| { + config::FalkorConfig { + host: connection.host, + port: connection.port, + password: connection.password, + graph_name: config::FALKORDB_GRAPH_NAME.to_string(), + } + }); + let qdrant = gobby_core::config::resolve_qdrant_config(&mut standalone); + + Ok(OverwriteProjectionConfigs { falkordb, qdrant }) +} + +fn resolve_or_provision_database( + request: &StandaloneSetupRequest, + service_options: &DockerServiceOptions, +) -> anyhow::Result<(String, Option)> { + if let Some(database_url) = request.database_url.as_deref() { + return Ok((database_url.to_string(), None)); + } + + if request.no_services { + return db::resolve_database_url().map(|url| (url, None)); + } + + match db::resolve_database_url() { + Ok(database_url) => Ok((database_url, None)), + Err(_) => { + let report = provision_docker_services(service_options) + .context("failed to provision standalone Docker services")?; + 
Ok((service_options.database_url(), Some(report))) + } + } +} + +fn apply_service_overrides( + request: &StandaloneSetupRequest, + service_options: &mut DockerServiceOptions, +) { + if let Some(host) = request.falkordb_host.as_deref() { + service_options.falkordb_host = host.to_string(); + } + if let Some(port) = request.falkordb_port { + service_options.falkordb_port = port; + } + if let Some(password) = request.falkordb_password.as_deref() { + service_options.falkordb_password = password.to_string(); + } +} + +fn connect_postgres_with_retry(database_url: &str, retry: bool) -> anyhow::Result { + let attempts = if retry { 30 } else { 1 }; + let mut last_error = None; + for attempt in 0..attempts { + match Client::connect(database_url, NoTls) { + Ok(client) => return Ok(client), + Err(err) => last_error = Some(err), + } + if attempt + 1 < attempts { + std::thread::sleep(Duration::from_secs(2)); + } + } + match last_error { + Some(err) => Err(anyhow::Error::new(err) + .context("failed to connect to the standalone PostgreSQL database")), + None => anyhow::bail!("failed to connect to the standalone PostgreSQL database"), + } +} + +fn write_gcore_config( + home: &std::path::Path, + request: &StandaloneSetupRequest, + service_options: &DockerServiceOptions, + database_url: &str, + service_report: Option<&DockerProvisioningReport>, + embedding: Option<&EmbeddingBootstrap>, +) -> anyhow::Result { + let path = gcore_config_path(home); + let mut config = StandaloneConfig::read_at(&path)?.unwrap_or_else(StandaloneConfig::empty); + + config.set("databases.postgres.dsn", database_url); + + if let Some(report) = service_report { + config.set("databases.falkordb.host", &service_options.falkordb_host); + config.set( + "databases.falkordb.port", + service_options.falkordb_port.to_string(), + ); + config.set( + "databases.falkordb.password", + &service_options.falkordb_password, + ); + config.remove("databases.falkordb.requirepass"); + config.set("databases.qdrant.url", 
service_options.qdrant_url()); + config.set( + "services.compose_file", + report.compose_file.display().to_string(), + ); + } else { + if let Some(host) = request.falkordb_host.as_deref() { + config.set("databases.falkordb.host", host); + } + if let Some(port) = request.falkordb_port { + config.set("databases.falkordb.port", port.to_string()); + } + if let Some(password) = request.falkordb_password.as_deref() { + config.set("databases.falkordb.password", password); + config.remove("databases.falkordb.requirepass"); + } + if let Some(qdrant_url) = request.qdrant_url.as_deref() { + config.set("databases.qdrant.url", qdrant_url); + } + } + + if let Some(embedding) = embedding { + config.set("embeddings.provider", &embedding.provider); + config.set("embeddings.api_base", &embedding.api_base); + config.set("embeddings.model", &embedding.model); + config.set("embeddings.vector_dim", embedding.vector_dim.to_string()); + match embedding.api_key_env.as_deref() { + Some(api_key_env) => config.set("embeddings.api_key_env", api_key_env), + None => config.remove("embeddings.api_key_env"), + } + } + + config.write_at(&path)?; + Ok(path) +} + +fn service_configured_compose_file(home: &std::path::Path) -> Option { + let compose = compose_file_path(home); + compose.exists().then(|| compose.display().to_string()) +} + +fn resolve_embedding_bootstrap( + request: &StandaloneSetupRequest, +) -> anyhow::Result> { + let provider = request + .embedding_provider + .as_deref() + .map(|provider| provider.trim().to_ascii_lowercase()); + + let mut embedding = match provider.as_deref() { + Some("none") => return Ok(None), + Some("lm-studio") | Some("lmstudio") => EmbeddingBootstrap::lm_studio(), + Some("ollama") => EmbeddingBootstrap::ollama(), + Some("openai-compatible") | Some("openai") | Some("remote") => { + explicit_embedding_bootstrap(request)? 
+ } + Some(other) => anyhow::bail!( + "unsupported embedding provider `{other}`; expected lm-studio, ollama, openai-compatible, or none" + ), + None if request.embedding_api_base.is_some() || request.embedding_model.is_some() => { + explicit_embedding_bootstrap(request)? + } + None if endpoint_reachable(DEFAULT_LM_STUDIO_API_BASE) => EmbeddingBootstrap::lm_studio(), + None if endpoint_reachable(DEFAULT_OLLAMA_API_BASE) => EmbeddingBootstrap::ollama(), + None => EmbeddingBootstrap::lm_studio(), + }; + + if let Some(api_base) = request.embedding_api_base.as_deref() { + embedding.api_base = api_base.to_string(); + } + if let Some(model) = request.embedding_model.as_deref() { + embedding.model = model.to_string(); + } + if let Some(vector_dim) = request.embedding_vector_dim { + if vector_dim == 0 { + anyhow::bail!("--embedding-vector-dim must be positive"); + } + embedding.vector_dim = vector_dim; + } + if let Some(api_key_env) = request.embedding_api_key_env.as_deref() { + embedding.api_key_env = Some(api_key_env.to_string()); + } + + Ok(Some(embedding)) +} + +fn explicit_embedding_bootstrap( + request: &StandaloneSetupRequest, +) -> anyhow::Result { + let Some(api_base) = request.embedding_api_base.as_deref() else { + anyhow::bail!("--embedding-api-base is required for openai-compatible embeddings"); + }; + Ok(EmbeddingBootstrap { + provider: "openai-compatible".to_string(), + api_base: api_base.to_string(), + model: request + .embedding_model + .clone() + .unwrap_or_else(|| DEFAULT_OLLAMA_MODEL.to_string()), + vector_dim: request + .embedding_vector_dim + .unwrap_or(DEFAULT_EMBEDDING_VECTOR_DIM), + api_key_env: request.embedding_api_key_env.clone(), + }) +} + +fn endpoint_reachable(api_base: &str) -> bool { + let Ok(url) = reqwest::Url::parse(api_base) else { + return false; + }; + let Some(host) = url.host_str() else { + return false; + }; + let Some(port) = url.port_or_known_default() else { + return false; + }; + let Ok(addrs) = (host, port).to_socket_addrs() 
else { + return false; + }; + addrs + .into_iter() + .any(|addr| TcpStream::connect_timeout(&addr, Duration::from_millis(150)).is_ok()) +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + #[serial_test::serial] + fn standalone_command_installs_public_code_index_subset() { + let Ok(database_url) = std::env::var("GCODE_POSTGRES_TEST_DATABASE_URL") else { + return; + }; + let home = tempfile::tempdir().expect("temp home"); + unsafe { std::env::set_var("GOBBY_HOME", home.path()) }; + let request = StandaloneSetupRequest::new(true, Some(database_url.clone()), None); + + run(request, Format::Json, true).expect("standalone setup runs"); + + let mut client = + postgres::Client::connect(&database_url, postgres::NoTls).expect("connect test db"); + let exists: bool = client + .query_one("SELECT to_regclass('public.code_symbols') IS NOT NULL", &[]) + .expect("check code_symbols") + .get(0); + assert!(exists); + + let forbidden_exists: bool = client + .query_one("SELECT to_regclass('public.config_store') IS NOT NULL", &[]) + .expect("check config_store") + .get(0); + assert!(!forbidden_exists); + assert!(home.path().join("gcore.yaml").exists()); + + client + .batch_execute( + "DROP INDEX IF EXISTS public.code_symbols_search_bm25; + DROP INDEX IF EXISTS public.code_content_search_bm25; + DROP TABLE IF EXISTS public.code_calls; + DROP TABLE IF EXISTS public.code_imports; + DROP TABLE IF EXISTS public.code_content_chunks; + DROP TABLE IF EXISTS public.code_symbols; + DROP TABLE IF EXISTS public.code_indexed_files; + DROP TABLE IF EXISTS public.code_indexed_projects;", + ) + .expect("drop code-index test objects"); + unsafe { std::env::remove_var("GOBBY_HOME") }; + } +} diff --git a/crates/gcode/src/commands/status.rs b/crates/gcode/src/commands/status.rs index 1d5457a..b1eca50 100644 --- a/crates/gcode/src/commands/status.rs +++ b/crates/gcode/src/commands/status.rs @@ -1,11 +1,15 @@ -use std::path::Path; +use std::collections::{BTreeMap, HashSet}; +use std::path::{Path, 
PathBuf}; use crate::config; use crate::config::Context; use crate::db; +use crate::graph::code_graph; use crate::index::indexer; use crate::models::IndexedProject; use crate::output::{self, Format}; +use crate::utils::short_id; +use crate::vector::code_symbols; /// Format a `last_indexed_at` value for display. /// Handles both epoch seconds ("1774970556") and ISO 8601 ("2026-03-29T18:52:25.750230+00:00"). @@ -123,7 +127,20 @@ pub fn invalidate(ctx: &Context, force: bool) -> anyhow::Result<()> { } let mut conn = db::connect_readwrite(&ctx.database_url)?; - indexer::invalidate(&mut conn, &ctx.project_id, ctx.daemon_url.as_deref()) + indexer::invalidate(&mut conn, &ctx.project_id, ctx.daemon_url.as_deref())?; + cleanup_project_projections(ctx) +} + +fn cleanup_project_projections(ctx: &Context) -> anyhow::Result<()> { + if ctx.falkordb.is_some() { + code_graph::clear_project(ctx) + .map_err(|err| anyhow::anyhow!("failed to clear FalkorDB projection: {err}"))?; + } + if let Some(qdrant) = &ctx.qdrant { + code_symbols::delete_project_collection(qdrant, &ctx.project_id) + .map_err(|err| anyhow::anyhow!("failed to delete Qdrant projection: {err}"))?; + } + Ok(()) } /// Collect indexed projects from the PostgreSQL hub. 
@@ -240,13 +257,72 @@ fn is_stale(p: &IndexedProject) -> Option<&'static str> { None } +#[derive(Debug)] +struct StaleProject<'a> { + project: &'a IndexedProject, + reason: String, +} + +fn stale_projects(projects: &[IndexedProject]) -> Vec> { + let mut stale = Vec::new(); + let mut stale_ids = HashSet::new(); + + for project in projects { + if let Some(reason) = is_stale(project) { + stale_ids.insert(project.id.clone()); + stale.push(StaleProject { + project, + reason: reason.to_string(), + }); + } + } + + let mut by_root: BTreeMap> = BTreeMap::new(); + for project in projects { + if stale_ids.contains(&project.id) { + continue; + } + let Ok(canonical_root) = Path::new(&project.root_path).canonicalize() else { + continue; + }; + by_root.entry(canonical_root).or_default().push(project); + } + + for (root, entries) in by_root { + if entries.len() < 2 { + continue; + } + let Ok(identity) = config::resolve_project_identity(&root, config::MissingIdentity::Error) + else { + continue; + }; + if !entries + .iter() + .any(|project| project.id == identity.project_id) + { + continue; + } + for project in entries { + if project.id == identity.project_id || !stale_ids.insert(project.id.clone()) { + continue; + } + stale.push(StaleProject { + project, + reason: format!( + "duplicate root superseded by current project id {}", + short_id(&identity.project_id) + ), + }); + } + } + + stale +} + /// Remove stale project entries from the code index. 
pub fn prune(force: bool) -> anyhow::Result<()> { let all_projects = collect_projects()?; - let stale: Vec<_> = all_projects - .iter() - .filter_map(|p| is_stale(p).map(|reason| (p, reason))) - .collect(); + let stale = stale_projects(&all_projects); if stale.is_empty() { eprintln!("No stale projects found."); @@ -254,8 +330,12 @@ pub fn prune(force: bool) -> anyhow::Result<()> { } eprintln!("Found {} stale project(s):", stale.len()); - for (p, reason) in &stale { - eprintln!(" {} — {}", display_name(p), reason); + for stale_project in &stale { + eprintln!( + " {} — {}", + display_name(stale_project.project), + stale_project.reason + ); } if !force { @@ -274,8 +354,8 @@ pub fn prune(force: bool) -> anyhow::Result<()> { let database_url = db::resolve_database_url()?; let mut conn = db::connect_readwrite(&database_url)?; - for (p, _) in &stale { - indexer::invalidate(&mut conn, &p.id, daemon_url.as_deref())?; + for stale_project in &stale { + indexer::invalidate(&mut conn, &stale_project.project.id, daemon_url.as_deref())?; } eprintln!("Pruned {} stale project(s).", stale.len()); @@ -329,3 +409,57 @@ pub fn repo_outline(ctx: &Context, format: Format) -> anyhow::Result<()> { } } } + +#[cfg(test)] +mod tests { + use super::*; + + fn indexed_project(id: &str, root_path: &Path) -> IndexedProject { + IndexedProject { + id: id.to_string(), + root_path: root_path.to_string_lossy().to_string(), + total_files: 1, + total_symbols: 1, + last_indexed_at: "1".to_string(), + index_duration_ms: 1, + total_eligible_files: Some(1), + } + } + + fn write_project_json(root: &Path, id: &str) { + let gobby_dir = root.join(".gobby"); + std::fs::create_dir_all(&gobby_dir).expect("create .gobby"); + std::fs::write( + gobby_dir.join("project.json"), + serde_json::json!({ + "id": id, + "name": "project", + "parent_project_path": root.to_string_lossy(), + "parent_project_id": id + }) + .to_string(), + ) + .expect("write project.json"); + } + + #[test] + fn 
duplicate_root_prune_detection_keeps_resolved_project_id() { + let tmp = tempfile::tempdir().expect("tempdir"); + let root = tmp.path().canonicalize().expect("canonical root"); + let current_id = "d45545c5-current-project-id"; + let stale_id = "39c31b8f-stale-project-id"; + write_project_json(&root, current_id); + + let projects = vec![ + indexed_project(current_id, &root), + indexed_project(stale_id, &root), + ]; + + let stale = stale_projects(&projects); + + assert_eq!(stale.len(), 1); + assert_eq!(stale[0].project.id, stale_id); + assert!(stale[0].reason.contains("duplicate root")); + assert!(stale.iter().all(|entry| entry.project.id != current_id)); + } +} diff --git a/crates/gcode/src/commands/symbols.rs b/crates/gcode/src/commands/symbols.rs index cf14d89..0af51db 100644 --- a/crates/gcode/src/commands/symbols.rs +++ b/crates/gcode/src/commands/symbols.rs @@ -4,6 +4,7 @@ use crate::db; use crate::models::Symbol; use crate::output::{self, Format}; use crate::savings; +use crate::utils::short_id; pub fn outline(ctx: &Context, file: &str, format: Format, verbose: bool) -> anyhow::Result<()> { let mut conn = db::connect_readonly(&ctx.database_url)?; @@ -114,10 +115,6 @@ fn format_outline_text_line(symbol: &Symbol) -> String { line } -fn short_id(id: &str) -> &str { - id.get(..8).unwrap_or(id) -} - pub fn symbol(ctx: &Context, id: &str, format: Format) -> anyhow::Result<()> { let mut conn = db::connect_readonly(&ctx.database_url)?; let columns = db::symbol_select_columns(""); @@ -332,9 +329,4 @@ mod tests { assert!(line.contains("id=12345678-1234-5678-1234-567812345678")); assert!(line.contains("sig=pub fn outline() -> anyhow::Result<()> {")); } - - #[test] - fn short_id_truncates_long_ids() { - assert_eq!(short_id("1234567890"), "12345678"); - } } diff --git a/crates/gcode/src/commands/vector.rs b/crates/gcode/src/commands/vector.rs new file mode 100644 index 0000000..c718e53 --- /dev/null +++ b/crates/gcode/src/commands/vector.rs @@ -0,0 +1,235 @@ +use 
crate::config::{CODE_SYMBOL_COLLECTION_PREFIX, Context}; +use crate::db; +use crate::output::{self, Format}; +use crate::projection::sync::{ProjectionStatus, ProjectionSyncReport}; +use crate::vector::code_symbols::{ + self, CodeSymbolVectorLifecycle, CodeSymbolVectorLifecycleAction, + CodeSymbolVectorLifecycleOutput, CodeSymbolVectorLifecycleStatus, VectorLifecycleError, +}; +use serde::Serialize; + +pub fn lifecycle_status( + ctx: &Context, + action: CodeSymbolVectorLifecycleAction, +) -> CodeSymbolVectorLifecycleStatus { + let prefix = CODE_SYMBOL_COLLECTION_PREFIX; + code_symbols::lifecycle_status(ctx.project_id.clone(), prefix, action) +} + +pub(crate) fn lifecycle_from_context( + ctx: &Context, +) -> Result { + let qdrant = ctx + .qdrant + .clone() + .ok_or(VectorLifecycleError::MissingQdrantConfig)?; + let embedding = ctx + .embedding + .clone() + .ok_or(VectorLifecycleError::MissingEmbeddingConfig)?; + CodeSymbolVectorLifecycle::new( + ctx.project_id.clone(), + qdrant, + embedding, + ctx.code_vectors.clone(), + ) +} + +pub fn sync_file(ctx: &Context, file_path: &str, format: Format) -> anyhow::Result<()> { + let mut lifecycle = lifecycle_from_context(ctx)?; + let mut conn = db::connect_readwrite(&ctx.database_url)?; + if !db::indexed_file_exists(&mut conn, &ctx.project_id, file_path)? { + anyhow::bail!( + "indexed file `{file_path}` was not found for project {}", + ctx.project_id + ); + } + let symbols = code_symbols::fetch_symbols_for_file(&mut conn, &ctx.project_id, file_path)?; + let output = lifecycle.sync_file_symbols(file_path, &symbols)?; + if !db::mark_vectors_synced(&mut conn, &ctx.project_id, file_path)? 
{ + anyhow::bail!( + "indexed file `{file_path}` was not found for project {}", + ctx.project_id + ); + } + let report = ProjectionSyncReport::ok(1, output.symbols); + print_lifecycle_output(&output, report, format) +} + +pub fn clear(ctx: &Context, format: Format) -> anyhow::Result<()> { + let mut lifecycle = lifecycle_from_context(ctx)?; + let mut conn = db::connect_readwrite(&ctx.database_url)?; + db::reset_vectors_sync_for_project(&mut conn, &ctx.project_id)?; + let output = lifecycle.clear_project_vectors()?; + let report = ProjectionSyncReport::ok(0, 0); + print_lifecycle_output(&output, report, format) +} + +pub fn rebuild(ctx: &Context, format: Format) -> anyhow::Result<()> { + let mut lifecycle = lifecycle_from_context(ctx)?; + let mut conn = db::connect_readwrite(&ctx.database_url)?; + let file_paths = db::list_indexed_file_paths(&mut conn, &ctx.project_id)?; + db::reset_vectors_sync_for_project(&mut conn, &ctx.project_id)?; + let symbols = code_symbols::fetch_symbols_for_project(&mut conn, &ctx.project_id)?; + let output = lifecycle.rebuild_symbols(&symbols)?; + let files_synced = db::mark_project_vectors_synced(&mut conn, &ctx.project_id)? 
as usize; + let report = ProjectionSyncReport::ok(files_synced.min(file_paths.len()), output.symbols); + print_lifecycle_output(&output, report, format) +} + +fn print_lifecycle_output( + output: &CodeSymbolVectorLifecycleOutput, + report: ProjectionSyncReport, + format: Format, +) -> anyhow::Result<()> { + let payload = lifecycle_json_payload(output, report); + match format { + Format::Json => output::print_json(&payload), + Format::Text => output::print_text(&serde_json::to_string(&payload)?), + } +} + +#[derive(Debug, Clone, PartialEq, Eq, Serialize)] +pub(crate) struct VectorLifecycleJsonPayload { + pub project_id: String, + pub projection: &'static str, + pub action: CodeSymbolVectorLifecycleAction, + pub file_path: Option, + pub collection: String, + pub status: ProjectionStatus, + pub synced_files: usize, + pub synced_symbols: usize, + pub degraded: bool, + pub error: Option, + pub symbols: usize, + pub vectors_upserted: usize, + pub vectors_deleted: usize, + pub summary: String, +} + +pub(crate) fn lifecycle_json_payload( + output: &CodeSymbolVectorLifecycleOutput, + report: ProjectionSyncReport, +) -> VectorLifecycleJsonPayload { + VectorLifecycleJsonPayload { + project_id: output.project_id.clone(), + projection: "vector", + action: output.action, + file_path: output.file_path.clone(), + collection: output.collection.clone(), + status: report.status, + synced_files: report.synced_files, + synced_symbols: report.synced_symbols, + degraded: report.degraded, + error: report.error, + symbols: output.symbols, + vectors_upserted: output.vectors_upserted, + vectors_deleted: output.vectors_deleted, + summary: output.summary.clone(), + } +} + +#[cfg(test)] +mod tests { + use super::*; + use crate::projection::sync::{ProjectionStatus, ProjectionSyncError, ProjectionSyncReport}; + use serde_json::json; + use std::path::PathBuf; + + fn make_ctx() -> Context { + Context { + database_url: "postgresql://localhost/nonexistent".to_string(), + project_root: 
PathBuf::from("/nonexistent"), + project_id: "project-1".to_string(), + quiet: true, + falkordb: None, + qdrant: None, + embedding: None, + code_vectors: crate::config::CodeVectorSettings { vector_dim: None }, + daemon_url: None, + } + } + + #[test] + fn vector_lifecycle_requires_config() { + let err = lifecycle_from_context(&make_ctx()).expect_err("missing config must fail"); + assert!(matches!( + err, + code_symbols::VectorLifecycleError::MissingQdrantConfig + )); + + let ctx = Context { + qdrant: Some(crate::config::QdrantConfig { + url: Some("http://localhost:6333".to_string()), + api_key: None, + }), + ..make_ctx() + }; + let err = lifecycle_from_context(&ctx).expect_err("missing embedding must fail"); + assert!(matches!( + err, + code_symbols::VectorLifecycleError::MissingEmbeddingConfig + )); + } + + #[test] + fn lifecycle_json_contract() { + let output = CodeSymbolVectorLifecycleOutput { + project_id: "project-1".to_string(), + collection: "gcode_code_symbols_project-1".to_string(), + action: CodeSymbolVectorLifecycleAction::SyncFile, + file_path: Some("src/lib.rs".to_string()), + symbols: 2, + vectors_upserted: 2, + vectors_deleted: 1, + summary: "2 vector(s) upserted, 1 delete operation(s) issued".to_string(), + }; + + let payload = lifecycle_json_payload( + &output, + ProjectionSyncReport { + status: ProjectionStatus::Ok, + synced_files: 1, + synced_symbols: 2, + degraded: false, + error: None, + }, + ); + assert_eq!( + serde_json::to_value(&payload).expect("payload serializes"), + json!({ + "project_id": "project-1", + "projection": "vector", + "action": "sync_file", + "file_path": "src/lib.rs", + "collection": "gcode_code_symbols_project-1", + "status": "ok", + "synced_files": 1, + "synced_symbols": 2, + "degraded": false, + "error": null, + "symbols": 2, + "vectors_upserted": 2, + "vectors_deleted": 1, + "summary": "2 vector(s) upserted, 1 delete operation(s) issued" + }) + ); + + let degraded = lifecycle_json_payload( + &output, + 
ProjectionSyncReport { + status: ProjectionStatus::Degraded, + synced_files: 0, + synced_symbols: 0, + degraded: true, + error: Some(ProjectionSyncError { + kind: "missing_qdrant_config".to_string(), + message: "Qdrant config is required".to_string(), + }), + }, + ); + let degraded = serde_json::to_value(&degraded).expect("payload serializes"); + assert_eq!(degraded["status"], "degraded"); + assert_eq!(degraded["error"]["kind"], "missing_qdrant_config"); + } +} diff --git a/crates/gcode/src/config.rs b/crates/gcode/src/config.rs index 40dba4f..b80ee37 100644 --- a/crates/gcode/src/config.rs +++ b/crates/gcode/src/config.rs @@ -5,17 +5,21 @@ //! //! Source: src/gobby/config/bootstrap.py, src/gobby/config/persistence.py +use std::fmt; use std::path::{Path, PathBuf}; +use gobby_core::config::ConfigSource; use gobby_core::project::{find_project_root, read_project_id}; +use gobby_core::provisioning::{GCORE_CONFIG_FILENAME, StandaloneConfig}; use postgres::Client; use crate::db; use crate::git::{self, WorktreeKind}; use crate::secrets; +use crate::utils::short_id; /// FalkorDB connection configuration. -#[derive(Debug, Clone)] +#[derive(Debug, Clone, PartialEq, Eq)] pub struct FalkorConfig { pub host: String, pub port: u16, @@ -24,19 +28,55 @@ pub struct FalkorConfig { } /// Qdrant connection configuration. -#[derive(Debug, Clone)] -pub struct QdrantConfig { - pub url: Option, - pub api_key: Option, - pub collection_prefix: String, -} +pub type QdrantConfig = gobby_core::config::QdrantConfig; /// Embedding API configuration (OpenAI-compatible endpoint). 
-#[derive(Debug, Clone)] -pub struct EmbeddingConfig { - pub api_base: String, - pub model: String, - pub api_key: Option, +pub type EmbeddingConfig = gobby_core::config::EmbeddingConfig; + +pub const FALKORDB_GRAPH_NAME: &str = "gobby_code"; +pub const CODE_SYMBOL_COLLECTION_PREFIX: &str = "code_symbols_"; +pub const GOBBY_EMBEDDING_VECTOR_DIM_ENV: &str = "GOBBY_EMBEDDING_VECTOR_DIM"; +pub const EMBEDDING_VECTOR_DIM_CONFIG_KEY: &str = "embeddings.vector_dim"; + +pub const GOBBY_FALKORDB_HOST_ENV: &str = "GOBBY_FALKORDB_HOST"; +pub const GOBBY_FALKORDB_PORT_ENV: &str = "GOBBY_FALKORDB_PORT"; +pub const GOBBY_FALKORDB_PASSWORD_ENV: &str = "GOBBY_FALKORDB_PASSWORD"; + +pub const FALKORDB_HOST_CONFIG_KEY: &str = "databases.falkordb.host"; +pub const FALKORDB_PORT_CONFIG_KEY: &str = "databases.falkordb.port"; +pub const FALKORDB_PASSWORD_CONFIG_KEY: &str = "databases.falkordb.requirepass"; + +#[derive(Debug, Clone, PartialEq, Eq, Default)] +pub struct CodeVectorSettings { + pub vector_dim: Option, +} + +#[derive(Debug, Clone, PartialEq, Eq)] +pub enum CodeVectorConfigError { + InvalidVectorDim { source: &'static str, value: String }, +} + +impl fmt::Display for CodeVectorConfigError { + fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result { + match self { + Self::InvalidVectorDim { source, value } => write!( + f, + "invalid code vector dimension from {source}: `{value}` must be a positive integer" + ), + } + } +} + +impl std::error::Error for CodeVectorConfigError {} + +impl FalkorConfig { + pub fn connection_config(&self) -> gobby_core::config::FalkorConfig { + gobby_core::config::FalkorConfig { + host: self.host.clone(), + port: self.port, + password: self.password.clone(), + } + } } /// Resolved runtime context for gcode commands. @@ -55,6 +95,8 @@ pub struct Context { pub qdrant: Option, /// Embedding API config (None if unavailable → no semantic search) pub embedding: Option, + /// Code-symbol vector projection settings owned by gcode. 
+ pub code_vectors: CodeVectorSettings, /// Gobby daemon base URL (e.g. http://localhost:60887) pub daemon_url: Option, } @@ -105,10 +147,12 @@ impl Context { let project_id = identity.project_id; // Resolve service configs from config_store (best-effort). + let standalone_config = read_standalone_config(); let mut conn = db::connect_readonly(&database_url)?; - let falkordb = resolve_falkordb_config(&mut conn, quiet); - let qdrant = resolve_qdrant_config(&mut conn, quiet); - let embedding = resolve_embedding_config(&mut conn, quiet); + let falkordb = resolve_falkordb_config(&mut conn, standalone_config.clone(), quiet); + let qdrant = resolve_qdrant_config(&mut conn, standalone_config.clone(), quiet); + let embedding = resolve_embedding_config(&mut conn, standalone_config.clone(), quiet); + let code_vectors = resolve_code_vector_settings(&mut conn, standalone_config)?; let daemon_url = resolve_daemon_url(); @@ -120,6 +164,31 @@ impl Context { falkordb, qdrant, embedding, + code_vectors, + daemon_url, + }) + } + + /// Resolve service config for a caller-supplied project id without touching cwd identity. 
+ pub fn resolve_for_project_id(project_id: &str, quiet: bool) -> anyhow::Result { + let project_id = normalize_project_id(project_id)?; + let database_url = db::resolve_database_url()?; + + let standalone_config = read_standalone_config(); + let mut conn = db::connect_readonly(&database_url)?; + let falkordb = resolve_falkordb_config(&mut conn, standalone_config, quiet); + + let daemon_url = resolve_daemon_url(); + + Ok(Self { + database_url, + project_root: PathBuf::new(), + project_id, + quiet, + falkordb, + qdrant: None, + embedding: None, + code_vectors: CodeVectorSettings::default(), daemon_url, }) } @@ -133,7 +202,9 @@ pub fn resolve_project_identity( .canonicalize() .unwrap_or_else(|_| absolute_fallback(project_root)); - if crate::project::read_isolation_marker(&root).is_some() { + if let Some(marker) = crate::project::read_isolation_marker(&root) + && !is_self_referential_isolation_marker(&marker, &root) + { return Ok(ProjectIdentity { project_id: crate::project::code_index_id_for_root(&root), root, @@ -202,6 +273,31 @@ pub fn resolve_project_identity( } } +fn is_self_referential_isolation_marker( + marker: &crate::project::IsolationMarker, + root: &Path, +) -> bool { + let Some(parent_project_path) = marker.parent_project_path.as_deref() else { + return false; + }; + let parent = PathBuf::from(parent_project_path); + let parent = if parent.is_absolute() { + parent + } else { + root.join(parent) + }; + let parent = parent.canonicalize().unwrap_or(parent); + parent == root +} + +fn normalize_project_id(project_id: &str) -> anyhow::Result { + let project_id = project_id.trim(); + if project_id.is_empty() { + anyhow::bail!("--project-id must not be empty"); + } + Ok(project_id.to_string()) +} + pub fn warn_project_identity(identity: &ProjectIdentity, quiet: bool) { if quiet { return; @@ -339,70 +435,66 @@ fn absolute_fallback(path: &Path) -> PathBuf { } } -fn short_id(id: &str) -> &str { - id.get(..8).unwrap_or(id) -} - -// ── Config store helpers 
───────────────────────────────────────────── +// ── Config store adapter ───────────────────────────────────────────── -/// Read a value from the config_store table, returning None if missing. -/// Values are stored as JSON — decode string values while preserving legacy text. -fn read_config_value(conn: &mut Client, key: &str) -> Option { - let raw: String = conn - .query_opt("SELECT value FROM config_store WHERE key = $1", &[&key]) - .ok()?? - .try_get("value") - .ok()?; - decode_config_value(&raw) +pub(crate) struct PostgresConfigSource<'a> { + conn: &'a mut Client, } -fn decode_config_value(raw: &str) -> Option { - match serde_json::from_str::(raw) { - Ok(serde_json::Value::String(text)) => Some(text), - Ok(value @ (serde_json::Value::Array(_) | serde_json::Value::Object(_))) => { - Some(serde_json::to_string(&value).unwrap_or_else(|_| raw.to_string())) - } - Ok(value) => Some(value.to_string()), - Err(_) => Some(raw.to_string()), +impl gobby_core::config::ConfigSource for PostgresConfigSource<'_> { + fn config_value(&mut self, key: &str) -> Option { + let key = canonical_config_key(key); + gobby_core::postgres::read_config_value(self.conn, key) + .ok() + .flatten() + .and_then(|raw| gobby_core::config::decode_config_value(&raw)) } -} - -const FALKORDB_DEFAULT_PORT: u16 = 16379; -const FALKORDB_GRAPH_NAME: &str = "gobby_code"; -trait FalkorConfigSource { - fn config_value(&mut self, key: &str) -> Option; - fn resolve_value(&mut self, value: &str) -> anyhow::Result; + fn resolve_value(&mut self, value: &str) -> anyhow::Result { + secrets::resolve_config_value(value, self.conn) + } } -struct PostgresFalkorConfigSource<'a> { - conn: &'a mut Client, +struct FallbackConfigSource<'a> { + postgres: PostgresConfigSource<'a>, + standalone: Option, } -impl FalkorConfigSource for PostgresFalkorConfigSource<'_> { +impl ConfigSource for FallbackConfigSource<'_> { fn config_value(&mut self, key: &str) -> Option { - read_config_value(self.conn, key) + 
self.postgres.config_value(key).or_else(|| { + self.standalone + .as_mut() + .and_then(|standalone| standalone.config_value(key)) + }) } fn resolve_value(&mut self, value: &str) -> anyhow::Result { - secrets::resolve_config_value(value, self.conn) + self.postgres.resolve_value(value) } } +fn read_standalone_config() -> Option { + let home = db::gobby_home().ok()?; + StandaloneConfig::read_at(&home.join(GCORE_CONFIG_FILENAME)) + .ok() + .flatten() +} + #[cfg(test)] -struct ClosureFalkorConfigSource { +struct ClosureConfigSource { read_config_value: R, resolve_value: S, } #[cfg(test)] -impl FalkorConfigSource for ClosureFalkorConfigSource +impl ConfigSource for ClosureConfigSource where R: FnMut(&str) -> Option, S: FnMut(&str) -> anyhow::Result, { fn config_value(&mut self, key: &str) -> Option { - (self.read_config_value)(key) + (self.read_config_value)(key).and_then(|raw| gobby_core::config::decode_config_value(&raw)) } fn resolve_value(&mut self, value: &str) -> anyhow::Result { @@ -410,156 +502,169 @@ where } } +fn canonical_config_key(key: &str) -> &str { + match key { + FALKORDB_HOST_CONFIG_KEY => FALKORDB_HOST_CONFIG_KEY, + FALKORDB_PORT_CONFIG_KEY => FALKORDB_PORT_CONFIG_KEY, + FALKORDB_PASSWORD_CONFIG_KEY => FALKORDB_PASSWORD_CONFIG_KEY, + _ => key, + } +} + #[cfg(test)] fn resolve_falkordb_config_from_values( read_config_value: R, - quiet: bool, resolve_value: S, ) -> Option where R: FnMut(&str) -> Option, S: FnMut(&str) -> anyhow::Result, { - let mut source = ClosureFalkorConfigSource { + let mut source = ClosureConfigSource { read_config_value, resolve_value, }; - resolve_falkordb_config_from_source(&mut source, quiet) + resolve_falkordb_config_from_source(&mut source) } -/// Resolve FalkorDB configuration from config_store + env vars. 
-fn resolve_falkordb_config(conn: &mut Client, quiet: bool) -> Option { - let mut source = PostgresFalkorConfigSource { conn }; - resolve_falkordb_config_from_source(&mut source, quiet) +#[cfg(test)] +fn resolve_qdrant_config_from_values( + read_config_value: R, + resolve_value: S, +) -> Option +where + R: FnMut(&str) -> Option, + S: FnMut(&str) -> anyhow::Result, +{ + let mut source = ClosureConfigSource { + read_config_value, + resolve_value, + }; + gobby_core::config::resolve_qdrant_config(&mut source) } -fn resolve_falkordb_config_from_source( - source: &mut impl FalkorConfigSource, - quiet: bool, -) -> Option { - let host = std::env::var("GOBBY_FALKORDB_HOST") - .ok() - .filter(|value| !value.trim().is_empty()) - .or_else(|| { - source - .config_value("databases.falkordb.host") - .filter(|value| !value.trim().is_empty()) - })?; +#[cfg(test)] +fn resolve_embedding_config_from_values( + read_config_value: R, + resolve_value: S, +) -> Option +where + R: FnMut(&str) -> Option, + S: FnMut(&str) -> anyhow::Result, +{ + let mut source = ClosureConfigSource { + read_config_value, + resolve_value, + }; + gobby_core::config::resolve_embedding_config(&mut source) +} - let raw_port = std::env::var("GOBBY_FALKORDB_PORT") - .ok() - .or_else(|| source.config_value("databases.falkordb.port")); - let port = parse_falkordb_port(raw_port.as_deref(), quiet); +#[cfg(test)] +fn resolve_code_vector_settings_from_values( + read_config_value: R, +) -> Result +where + R: FnMut(&str) -> Option, +{ + let mut source = ClosureConfigSource { + read_config_value, + resolve_value: |value: &str| Ok(value.to_string()), + }; + resolve_code_vector_settings_from_source(&mut source) +} - let raw_password = std::env::var("GOBBY_FALKORDB_PASSWORD") - .ok() - .or_else(|| source.config_value("databases.falkordb.requirepass")) - .filter(|value| !value.trim().is_empty()); - let password = match raw_password { - Some(value) => match source.resolve_value(&value) { - Ok(resolved) => Some(resolved), - Err(e) 
=> { - if !quiet { - eprintln!("Warning: failed to resolve FalkorDB password: {e}"); - } - None - } - }, - None => None, +/// Resolve FalkorDB configuration from config_store + env vars. +fn resolve_falkordb_config( + conn: &mut Client, + standalone: Option, + _quiet: bool, +) -> Option { + let mut source = FallbackConfigSource { + postgres: PostgresConfigSource { conn }, + standalone, }; + resolve_falkordb_config_from_source(&mut source) +} + +fn resolve_falkordb_config_from_source(source: &mut impl ConfigSource) -> Option { + let connection = gobby_core::config::resolve_falkordb_config(source)?; Some(FalkorConfig { - host, - port, - password, + host: connection.host, + port: connection.port, + password: connection.password, graph_name: FALKORDB_GRAPH_NAME.to_string(), }) } -fn parse_falkordb_port(raw_port: Option<&str>, quiet: bool) -> u16 { - match raw_port { - Some(raw) => match raw.parse::() { - Ok(port) => port, - Err(e) => { - if !quiet { - eprintln!( - "Warning: invalid FalkorDB port `{raw}` ({e}); using {FALKORDB_DEFAULT_PORT}" - ); - } - FALKORDB_DEFAULT_PORT - } - }, - None => FALKORDB_DEFAULT_PORT, - } -} - /// Resolve Qdrant configuration from config_store + env vars. 
-fn resolve_qdrant_config(conn: &mut Client, quiet: bool) -> Option { - let url = std::env::var("GOBBY_QDRANT_URL") - .ok() - .or_else(|| read_config_value(conn, "databases.qdrant.url")); - - let raw_api_key = read_config_value(conn, "databases.qdrant.api_key"); - let api_key = match raw_api_key { - Some(v) => match secrets::resolve_config_value(&v, conn) { - Ok(resolved) => Some(resolved), - Err(e) => { - if !quiet { - eprintln!("Warning: failed to resolve Qdrant API key: {e}"); - } - None - } - }, - None => None, +fn resolve_qdrant_config( + conn: &mut Client, + standalone: Option, + _quiet: bool, +) -> Option { + let mut source = FallbackConfigSource { + postgres: PostgresConfigSource { conn }, + standalone, }; - - let collection_prefix = read_config_value(conn, "databases.qdrant.collection_prefix") - .unwrap_or_else(|| "code_symbols_".to_string()); - - // Only return Some if there's a URL (qdrant_path = embedded mode, not accessible from CLI) - url.as_ref()?; - - Some(QdrantConfig { - url, - api_key, - collection_prefix, - }) + gobby_core::config::resolve_qdrant_config(&mut source) } /// Resolve embedding API configuration from config_store + env vars. /// /// Returns None if no api_base is found (→ no semantic search, BM25 only). 
-fn resolve_embedding_config(conn: &mut Client, quiet: bool) -> Option { - // Env var overrides - let api_base = std::env::var("GOBBY_EMBEDDING_URL").ok(); +fn resolve_embedding_config( + conn: &mut Client, + standalone: Option, + _quiet: bool, +) -> Option { + let mut source = FallbackConfigSource { + postgres: PostgresConfigSource { conn }, + standalone, + }; + gobby_core::config::resolve_embedding_config(&mut source) +} - let api_base = api_base.or_else(|| read_config_value(conn, "embeddings.api_base"))?; +pub(crate) fn resolve_code_vector_settings( + conn: &mut Client, + standalone: Option, +) -> Result { + let mut source = FallbackConfigSource { + postgres: PostgresConfigSource { conn }, + standalone, + }; + resolve_code_vector_settings_from_source(&mut source) +} - // Model (env override → config_store → default) - let model = std::env::var("GOBBY_EMBEDDING_MODEL") +pub(crate) fn resolve_code_vector_settings_from_source( + source: &mut impl ConfigSource, +) -> Result { + let vector_dim = match std::env::var(GOBBY_EMBEDDING_VECTOR_DIM_ENV) .ok() - .or_else(|| read_config_value(conn, "embeddings.model")) - .unwrap_or_else(|| "nomic-embed-text".to_string()); - - // API key (env override → config_store with secret resolution) - let api_key = std::env::var("GOBBY_EMBEDDING_API_KEY").ok().or_else(|| { - let raw = read_config_value(conn, "embeddings.api_key")?; - match secrets::resolve_config_value(&raw, conn) { - Ok(resolved) => Some(resolved), - Err(e) => { - if !quiet { - eprintln!("Warning: failed to resolve embedding API key: {e}"); - } - None - } - } - }); + .filter(|value| !value.trim().is_empty()) + { + Some(value) => Some(parse_vector_dim( + GOBBY_EMBEDDING_VECTOR_DIM_ENV, + value.trim(), + )?), + None => source + .config_value(EMBEDDING_VECTOR_DIM_CONFIG_KEY) + .map(|value| parse_vector_dim(EMBEDDING_VECTOR_DIM_CONFIG_KEY, value.trim())) + .transpose()?, + }; - Some(EmbeddingConfig { - api_base, - model, - api_key, - }) + Ok(CodeVectorSettings { vector_dim 
}) +} + +fn parse_vector_dim(source: &'static str, value: &str) -> Result { + value + .parse::() + .ok() + .filter(|size| *size > 0) + .ok_or_else(|| CodeVectorConfigError::InvalidVectorDim { + source, + value: value.to_string(), + }) } #[cfg(test)] @@ -619,30 +724,20 @@ mod tests { (repo, linked) } - #[test] - fn test_decode_config_store_values() { - assert_eq!( - decode_config_value("\"http://test:7474\""), - Some("http://test:7474".to_string()) - ); - assert_eq!( - decode_config_value("http://legacy:7474"), - Some("http://legacy:7474".to_string()) - ); - assert_eq!( - decode_config_value(r#"["alpha",1,true]"#), - Some(r#"["alpha",1,true]"#.to_string()) - ); - assert_eq!( - decode_config_value(r#"{"host":"falkor.local","port":16379}"#), - Some(r#"{"host":"falkor.local","port":16379}"#.to_string()) - ); - } - - fn clear_falkordb_env() { - unsafe { std::env::remove_var("GOBBY_FALKORDB_HOST") }; - unsafe { std::env::remove_var("GOBBY_FALKORDB_PORT") }; - unsafe { std::env::remove_var("GOBBY_FALKORDB_PASSWORD") }; + fn clear_service_env() { + for key in [ + "GOBBY_FALKORDB_HOST", + "GOBBY_FALKORDB_PORT", + "GOBBY_FALKORDB_PASSWORD", + "GOBBY_QDRANT_URL", + "GOBBY_QDRANT_API_KEY", + "GOBBY_EMBEDDING_URL", + "GOBBY_EMBEDDING_MODEL", + "GOBBY_EMBEDDING_API_KEY", + "GOBBY_EMBEDDING_VECTOR_DIM", + ] { + unsafe { std::env::remove_var(key) }; + } } fn config_value_for<'a>( @@ -653,102 +748,158 @@ mod tests { #[test] #[serial_test::serial] - fn falkordb_config_store_only_resolves_host_port_password() { - clear_falkordb_env(); + fn adapter_env_precedence_and_json_decode() { + clear_service_env(); + unsafe { std::env::set_var("GOBBY_FALKORDB_HOST", "env-falkor.local") }; let values = std::collections::HashMap::from([ - ("databases.falkordb.host", "falkor.local"), - ("databases.falkordb.port", "16380"), - ("databases.falkordb.requirepass", "stored-pass"), + ("databases.falkordb.host", r#""stored-falkor.local""#), + ("databases.falkordb.port", r#""16380""#), + 
("databases.falkordb.requirepass", r#""stored-pass""#), + ("databases.qdrant.url", r#""http://qdrant.local:6333""#), + ("databases.qdrant.api_key", r#""qdrant-key""#), + ("embeddings.api_base", r#""http://embeddings.local:11434""#), + ("embeddings.model", r#""embed-model""#), + ("embeddings.api_key", "null"), ]); - let config = - resolve_falkordb_config_from_values(config_value_for(&values), true, |value| { - Ok(value.to_string()) - }) - .expect("falkordb config"); - - assert_eq!(config.host, "falkor.local"); - assert_eq!(config.port, 16380); - assert_eq!(config.password.as_deref(), Some("stored-pass")); - assert_eq!(config.graph_name, "gobby_code"); - } - - #[test] - #[serial_test::serial] - fn falkordb_env_only_resolves_host_port_password() { - clear_falkordb_env(); - unsafe { std::env::set_var("GOBBY_FALKORDB_HOST", "env-falkor.local") }; - unsafe { std::env::set_var("GOBBY_FALKORDB_PORT", "16381") }; - unsafe { std::env::set_var("GOBBY_FALKORDB_PASSWORD", "env-pass") }; - - let values = std::collections::HashMap::new(); - let config = - resolve_falkordb_config_from_values(config_value_for(&values), true, |value| { - Ok(value.to_string()) - }) - .expect("falkordb config"); - - assert_eq!(config.host, "env-falkor.local"); - assert_eq!(config.port, 16381); - assert_eq!(config.password.as_deref(), Some("env-pass")); - clear_falkordb_env(); + let falkor = resolve_falkordb_config_from_values(config_value_for(&values), |value| { + Ok(value.to_string()) + }) + .expect("falkordb config"); + let qdrant = resolve_qdrant_config_from_values(config_value_for(&values), |value| { + Ok(value.to_string()) + }) + .expect("qdrant config"); + let embedding = resolve_embedding_config_from_values(config_value_for(&values), |value| { + Ok(value.to_string()) + }) + .expect("embedding config"); + + assert_eq!(falkor.host, "env-falkor.local"); + assert_eq!(falkor.port, 16380); + assert_eq!(falkor.password.as_deref(), Some("stored-pass")); + assert_eq!(qdrant.url.as_deref(), 
Some("http://qdrant.local:6333")); + assert_eq!(qdrant.api_key.as_deref(), Some("qdrant-key")); + assert_eq!(embedding.api_base, "http://embeddings.local:11434"); + assert_eq!(embedding.model, "embed-model"); + assert_eq!(embedding.api_key, None); + clear_service_env(); } #[test] #[serial_test::serial] - fn falkordb_env_host_overrides_config_store_host() { - clear_falkordb_env(); - unsafe { std::env::set_var("GOBBY_FALKORDB_HOST", "env-host.local") }; + fn adapter_resolves_config_store_secrets() { + clear_service_env(); let values = std::collections::HashMap::from([ - ("databases.falkordb.host", "stored-host.local"), - ("databases.falkordb.port", "16382"), + ("databases.falkordb.host", "falkor.local"), + ( + "databases.falkordb.requirepass", + "$secret:falkordb_password", + ), + ("databases.qdrant.url", "http://qdrant.local:6333"), + ("databases.qdrant.api_key", "$secret:qdrant_api_key"), + ("embeddings.api_base", "http://embeddings.local:11434"), + ("embeddings.api_key", "$secret:embedding_api_key"), ]); - let config = - resolve_falkordb_config_from_values(config_value_for(&values), true, |value| { - Ok(value.to_string()) - }) - .expect("falkordb config"); + fn resolve_secret_stub(value: &str) -> anyhow::Result { + match value { + "$secret:falkordb_password" => Ok("resolved-falkor".to_string()), + "$secret:qdrant_api_key" => Ok("resolved-qdrant".to_string()), + "$secret:embedding_api_key" => Ok("resolved-embedding".to_string()), + value => Ok(value.to_string()), + } + } - assert_eq!(config.host, "env-host.local"); - assert_eq!(config.port, 16382); - clear_falkordb_env(); + let falkor = + resolve_falkordb_config_from_values(config_value_for(&values), resolve_secret_stub) + .expect("falkordb config"); + let qdrant = + resolve_qdrant_config_from_values(config_value_for(&values), resolve_secret_stub) + .expect("qdrant config"); + let embedding = + resolve_embedding_config_from_values(config_value_for(&values), resolve_secret_stub) + .expect("embedding config"); + + 
assert_eq!(falkor.password.as_deref(), Some("resolved-falkor")); + assert_eq!(qdrant.api_key.as_deref(), Some("resolved-qdrant")); + assert_eq!(embedding.api_key.as_deref(), Some("resolved-embedding")); } #[test] #[serial_test::serial] - fn falkordb_secret_password_resolves_through_secret_resolver() { - clear_falkordb_env(); - let values = std::collections::HashMap::from([ - ("databases.falkordb.host", "falkor.local"), - ("databases.falkordb.requirepass", "$secret:requirepass"), - ]); - - let config = - resolve_falkordb_config_from_values(config_value_for(&values), true, |value| { - assert_eq!(value, "$secret:requirepass"); - Ok("resolved-pass".to_string()) - }) - .expect("falkordb config"); + fn vector_dim_setting_resolves_env_and_config_store() { + clear_service_env(); + let values = std::collections::HashMap::from([("embeddings.vector_dim", "1536")]); + + let settings = resolve_code_vector_settings_from_values(config_value_for(&values)) + .expect("config-store vector settings"); + assert_eq!(settings.vector_dim, Some(1536)); + + unsafe { std::env::set_var("GOBBY_EMBEDDING_VECTOR_DIM", "3072") }; + let settings = resolve_code_vector_settings_from_values(config_value_for(&values)) + .expect("env vector settings"); + assert_eq!(settings.vector_dim, Some(3072)); + + unsafe { std::env::remove_var("GOBBY_EMBEDDING_VECTOR_DIM") }; + let null_values = std::collections::HashMap::from([("embeddings.vector_dim", "null")]); + let settings = resolve_code_vector_settings_from_values(config_value_for(&null_values)) + .expect("null config-store vector settings"); + assert_eq!(settings.vector_dim, None); + + let invalid_values = + std::collections::HashMap::from([("embeddings.vector_dim", r#""wide""#)]); + let err = resolve_code_vector_settings_from_values(config_value_for(&invalid_values)) + .expect_err("invalid vector dim must error"); + assert!(matches!( + err, + CodeVectorConfigError::InvalidVectorDim { .. 
} + )); + clear_service_env(); + } - assert_eq!(config.password.as_deref(), Some("resolved-pass")); + #[test] + fn falkor_config_wrapper_shape() { + let source = include_str!("config.rs"); + assert!(source.contains("pub struct FalkorConfig")); + assert!(source.contains("pub graph_name: String")); + assert!(source.contains("gobby_core::config::resolve_falkordb_config")); + assert!(source.contains("graph_name: FALKORDB_GRAPH_NAME.to_string()")); } #[test] - #[serial_test::serial] - fn falkordb_config_missing_host_returns_none() { - clear_falkordb_env(); - let values = std::collections::HashMap::from([ - ("databases.falkordb.port", "16379"), - ("databases.falkordb.requirepass", "stored-pass"), - ]); + fn phase7_context_and_falkor_resolver_visible() { + let source = include_str!("config.rs"); + assert!(source.contains("pub falkordb: Option")); + assert!(source.contains("let falkordb = resolve_falkordb_config(")); + assert!(source.contains("pub const FALKORDB_GRAPH_NAME: &str = \"gobby_code\";")); + assert!(source.contains("graph_name: FALKORDB_GRAPH_NAME.to_string()")); + } - let config = - resolve_falkordb_config_from_values(config_value_for(&values), true, |value| { - Ok(value.to_string()) - }); + #[test] + fn phase7_falkordb_config_store_keys_visible() { + let source = include_str!("config.rs"); + for key in [ + FALKORDB_HOST_CONFIG_KEY, + FALKORDB_PORT_CONFIG_KEY, + FALKORDB_PASSWORD_CONFIG_KEY, + GOBBY_FALKORDB_HOST_ENV, + GOBBY_FALKORDB_PORT_ENV, + GOBBY_FALKORDB_PASSWORD_ENV, + ] { + assert!(source.contains(key), "missing {key}"); + } + } - assert!(config.is_none()); + #[test] + fn phase7_neo4j_transition_state_absent() { + let source = include_str!("config.rs"); + let config_type = ["pub struct Neo", "4jConfig"].concat(); + let resolver = ["resolve_neo", "4j_config"].concat(); + let context_field = ["pub neo", "4j: Option"].concat(); + assert!(!source.contains(&config_type)); + assert!(!source.contains(&resolver)); + assert!(!source.contains(&context_field)); } 
#[test] @@ -786,13 +937,35 @@ mod tests { assert!(identity.warning.is_none()); } + #[test] + fn self_referential_parent_marker_keeps_project_json_id() { + let tmp = tempfile::tempdir().expect("tempdir"); + let root = tmp.path().canonicalize().expect("canonical root"); + write_project_json( + &root, + serde_json::json!({ + "id": "main-project-id", + "name": "main", + "parent_project_path": root.to_string_lossy(), + "parent_project_id": "main-project-id" + }), + ); + + let identity = resolve_project_identity(&root, MissingIdentity::Error).expect("identity"); + + assert_eq!(identity.project_id, "main-project-id"); + assert_eq!(identity.source, ProjectIdentitySource::ProjectJson); + assert!(!identity.should_write_gcode_json); + assert!(identity.warning.is_none()); + } + #[test] fn isolated_marker_uses_path_derived_id_without_warning() { let tmp = tempfile::tempdir().expect("tempdir"); write_project_json( tmp.path(), serde_json::json!({ - "id": "copied-parent-id", + "id": "parent-id", "parent_project_path": "/parent", "parent_project_id": "parent-id" }), @@ -858,4 +1031,14 @@ mod tests { crate::project::code_index_id_for_root(tmp.path()) ); } + + #[test] + fn project_id_only_context_rejects_empty_id_before_runtime_resolution() { + let err = match Context::resolve_for_project_id(" ", true) { + Ok(_) => panic!("empty project id should fail before DB resolution"), + Err(err) => err, + }; + + assert!(err.to_string().contains("--project-id must not be empty")); + } } diff --git a/crates/gcode/src/db.rs b/crates/gcode/src/db.rs index a5ee089..5690ae1 100644 --- a/crates/gcode/src/db.rs +++ b/crates/gcode/src/db.rs @@ -2,9 +2,11 @@ use std::path::{Path, PathBuf}; use std::time::Duration; use anyhow::{Context as _, anyhow, bail}; -use postgres::{Client, NoTls}; +use gobby_core::provisioning::{GCORE_CONFIG_FILENAME, StandaloneConfig}; +use postgres::{Client, GenericClient}; use serde::Deserialize; +use crate::models::{CallRelation, CallTargetKind, ImportRelation, Symbol}; use 
crate::schema; const GCODE_DATABASE_URL_ENV: &str = "GCODE_DATABASE_URL"; @@ -58,11 +60,19 @@ fn resolve_database_url_from_sources( ) -> anyhow::Result { let path = home.join("bootstrap.yaml"); + if let Some(database_url) = resolve_database_url_from_env(get_var) { + return Ok(database_url); + } + if let Ok(database_url) = broker_resolver(&path) { return Ok(database_url); } - if let Some(database_url) = resolve_database_url_from_env(get_var) { + if let Some(database_url) = resolve_database_url_from_bootstrap_file(&path)? { + return Ok(database_url); + } + + if let Some(database_url) = resolve_database_url_from_gcore_config(home)? { return Ok(database_url); } @@ -72,14 +82,28 @@ fn resolve_database_url_from_sources( return Ok(database_url); } - let contents = std::fs::read_to_string(&path).with_context(|| { - format!( - "missing Gobby bootstrap at {}. Configure the Gobby PostgreSQL hub before running gcode.", - path.display() - ) - })?; + bail!( + "missing Gobby PostgreSQL configuration. Run `gcode setup --standalone`, set {GCODE_DATABASE_URL_ENV}, or configure the Gobby daemon bootstrap." + ) +} + +fn resolve_database_url_from_bootstrap_file(path: &Path) -> anyhow::Result> { + if !path.exists() { + return Ok(None); + } + let contents = std::fs::read_to_string(path) + .with_context(|| format!("failed to read Gobby bootstrap at {}", path.display()))?; let bootstrap = parse_bootstrap_database(&contents)?; - resolve_database_url_from_bootstrap(&bootstrap) + resolve_database_url_from_bootstrap(&bootstrap).map(Some) +} + +fn resolve_database_url_from_gcore_config(home: &Path) -> anyhow::Result> { + let Some(config) = StandaloneConfig::read_at(&home.join(GCORE_CONFIG_FILENAME))? 
else { + return Ok(None); + }; + Ok(config + .get("databases.postgres.dsn") + .and_then(|value| non_empty_trimmed(Some(value.to_string())))) } fn resolve_database_url_from_env( @@ -225,7 +249,9 @@ fn request_broker_database_url(daemon_url: &str, token: &str) -> anyhow::Result< /// keeping the intent explicit preserves a routing point for future pools, /// permissions, or replicas. pub fn connect_readwrite(database_url: &str) -> anyhow::Result { - connect(database_url) + let mut client = gobby_core::postgres::connect_readwrite(database_url)?; + schema::validate_runtime_schema(&mut client)?; + Ok(client) } /// Open a connection for command paths that only read from the hub. @@ -234,7 +260,230 @@ pub fn connect_readwrite(database_url: &str) -> anyhow::Result { /// keeping the intent explicit preserves a routing point for future pools, /// permissions, or replicas. pub fn connect_readonly(database_url: &str) -> anyhow::Result { - connect(database_url) + let mut client = gobby_core::postgres::connect_readonly(database_url)?; + schema::validate_runtime_schema(&mut client)?; + Ok(client) +} + +pub fn read_config_value(conn: &mut Client, key: &str) -> anyhow::Result> { + gobby_core::postgres::read_config_value(conn, key) +} + +#[derive(Debug, Clone)] +pub struct GraphFileFacts { + pub file_path: String, + pub imports: Vec, + pub definitions: Vec, + pub calls: Vec, +} + +pub fn list_indexed_file_paths( + conn: &mut impl GenericClient, + project_id: &str, +) -> anyhow::Result> { + let rows = conn.query( + "SELECT file_path FROM code_indexed_files WHERE project_id = $1 ORDER BY file_path", + &[&project_id], + )?; + rows.into_iter() + .map(|row| row.try_get("file_path").map_err(Into::into)) + .collect() +} + +pub fn read_graph_file_facts( + conn: &mut impl GenericClient, + project_id: &str, + file_path: &str, +) -> anyhow::Result { + let imports = read_imports_for_file(conn, project_id, file_path)?; + let definitions = read_symbols_for_file(conn, project_id, file_path)?; + let 
calls = read_calls_for_file(conn, project_id, file_path)?; + + Ok(GraphFileFacts { + file_path: file_path.to_string(), + imports, + definitions, + calls, + }) +} + +pub fn indexed_file_exists( + conn: &mut impl GenericClient, + project_id: &str, + file_path: &str, +) -> anyhow::Result { + Ok(conn + .query_opt( + "SELECT 1 FROM code_indexed_files + WHERE project_id = $1 AND file_path = $2", + &[&project_id, &file_path], + )? + .is_some()) +} + +pub fn mark_graph_sync_attempted( + conn: &mut impl GenericClient, + project_id: &str, + file_path: &str, +) -> anyhow::Result { + let updated = conn.execute( + "UPDATE code_indexed_files + SET graph_synced = false, graph_sync_attempted_at = NOW() + WHERE project_id = $1 AND file_path = $2", + &[&project_id, &file_path], + )?; + Ok(updated > 0) +} + +pub fn mark_graph_synced( + conn: &mut impl GenericClient, + project_id: &str, + file_path: &str, +) -> anyhow::Result { + let updated = conn.execute( + "UPDATE code_indexed_files + SET graph_synced = true, graph_sync_attempted_at = NOW() + WHERE project_id = $1 AND file_path = $2", + &[&project_id, &file_path], + )?; + Ok(updated > 0) +} + +pub fn reset_graph_sync_for_project( + conn: &mut impl GenericClient, + project_id: &str, +) -> anyhow::Result { + Ok(conn.execute( + "UPDATE code_indexed_files + SET graph_synced = false, graph_sync_attempted_at = NULL + WHERE project_id = $1", + &[&project_id], + )?) +} + +pub fn mark_vectors_synced( + conn: &mut impl GenericClient, + project_id: &str, + file_path: &str, +) -> anyhow::Result { + let updated = conn.execute( + "UPDATE code_indexed_files + SET vectors_synced = true + WHERE project_id = $1 AND file_path = $2", + &[&project_id, &file_path], + )?; + Ok(updated > 0) +} + +pub fn mark_project_vectors_synced( + conn: &mut impl GenericClient, + project_id: &str, +) -> anyhow::Result { + Ok(conn.execute( + "UPDATE code_indexed_files + SET vectors_synced = true + WHERE project_id = $1", + &[&project_id], + )?) 
+} + +pub fn reset_vectors_sync_for_project( + conn: &mut impl GenericClient, + project_id: &str, +) -> anyhow::Result { + Ok(conn.execute( + "UPDATE code_indexed_files + SET vectors_synced = false + WHERE project_id = $1", + &[&project_id], + )?) +} + +fn read_imports_for_file( + conn: &mut impl GenericClient, + project_id: &str, + file_path: &str, +) -> anyhow::Result> { + let rows = conn.query( + "SELECT source_file, target_module + FROM code_imports + WHERE project_id = $1 AND source_file = $2 + ORDER BY target_module", + &[&project_id, &file_path], + )?; + rows.into_iter() + .map(|row| { + Ok(ImportRelation { + file_path: row.try_get("source_file")?, + module_name: row.try_get("target_module")?, + }) + }) + .collect() +} + +fn read_symbols_for_file( + conn: &mut impl GenericClient, + project_id: &str, + file_path: &str, +) -> anyhow::Result> { + let query = format!( + "SELECT {} FROM code_symbols s + WHERE s.project_id = $1 AND s.file_path = $2 + ORDER BY s.line_start, s.byte_start", + symbol_select_columns("s") + ); + let rows = conn.query(&query, &[&project_id, &file_path])?; + rows.iter().map(Symbol::from_row).collect() +} + +fn read_calls_for_file( + conn: &mut impl GenericClient, + project_id: &str, + file_path: &str, +) -> anyhow::Result> { + let rows = conn.query( + "SELECT caller_symbol_id, callee_symbol_id, callee_name, + callee_target_kind, callee_external_module, file_path, line::BIGINT AS line + FROM code_calls + WHERE project_id = $1 AND file_path = $2 + ORDER BY line, caller_symbol_id, callee_name", + &[&project_id, &file_path], + )?; + rows.into_iter() + .map(|row| { + let target_kind: String = row.try_get("callee_target_kind")?; + let callee_symbol_id: String = row.try_get("callee_symbol_id")?; + let callee_external_module: String = row.try_get("callee_external_module")?; + Ok(CallRelation { + caller_symbol_id: row.try_get("caller_symbol_id")?, + callee_symbol_id: non_empty(callee_symbol_id), + callee_name: row.try_get("callee_name")?, + 
callee_target_kind: call_target_kind_from_str(&target_kind)?, + callee_external_module: non_empty(callee_external_module), + file_path: row.try_get("file_path")?, + line: i64_to_usize(row.try_get("line")?, "line")?, + }) + }) + .collect() +} + +fn non_empty(value: String) -> Option { + if value.is_empty() { None } else { Some(value) } +} + +fn call_target_kind_from_str(value: &str) -> anyhow::Result { + match value { + "symbol" => Ok(CallTargetKind::Symbol), + "unresolved" => Ok(CallTargetKind::Unresolved), + "external" => Ok(CallTargetKind::External), + other => bail!("unknown code_calls.callee_target_kind `{other}`"), + } +} + +fn i64_to_usize(value: i64, column: &str) -> anyhow::Result { + value + .try_into() + .with_context(|| format!("column `{column}` contains negative or too-large value {value}")) } pub fn symbol_select_columns(alias: &str) -> String { @@ -254,13 +503,6 @@ pub fn symbol_select_columns(alias: &str) -> String { ) } -fn connect(database_url: &str) -> anyhow::Result { - let mut client = Client::connect(database_url, NoTls) - .context("failed to connect to the Gobby PostgreSQL hub")?; - schema::validate_runtime_schema(&mut client)?; - Ok(client) -} - #[cfg(test)] mod tests { use super::*; @@ -308,7 +550,7 @@ mod tests { } #[test] - fn database_url_sources_prefer_daemon_broker() { + fn database_url_sources_prefer_env_over_daemon_broker() { let home = tempfile::tempdir().expect("temp home"); let resolved = resolve_database_url_from_sources( @@ -321,24 +563,21 @@ mod tests { ) .expect("resolve database url"); - assert_eq!(resolved, "postgresql://broker/db"); + assert_eq!(resolved, "postgresql://env/db"); } #[test] - fn database_url_sources_fall_back_to_env_when_daemon_is_unavailable() { + fn database_url_sources_use_daemon_broker_after_env() { let home = tempfile::tempdir().expect("temp home"); let resolved = resolve_database_url_from_sources( home.path(), - |_| bail!("daemon unavailable"), - |name| match name { - GOBBY_POSTGRES_DSN_ENV => 
Some("postgresql://env/db".to_string()), - _ => None, - }, + |_| Ok("postgresql://broker/db".to_string()), + |_| None, ) .expect("resolve database url"); - assert_eq!(resolved, "postgresql://env/db"); + assert_eq!(resolved, "postgresql://broker/db"); } #[test] @@ -360,6 +599,30 @@ mod tests { assert_eq!(resolved, "postgresql://inline/db"); } + #[test] + fn database_url_sources_fall_back_to_gcore_before_legacy_gcode_config() { + let home = tempfile::tempdir().expect("temp home"); + std::fs::write( + home.path().join(GCORE_CONFIG_FILENAME), + "databases.postgres.dsn: postgresql://gcore/db\n", + ) + .expect("write gcore config"); + std::fs::write( + home.path().join(GCODE_CONFIG_FILENAME), + "database_url: postgresql://legacy/db\n", + ) + .expect("write legacy config"); + + let resolved = resolve_database_url_from_sources( + home.path(), + |_| bail!("daemon unavailable"), + |_| None, + ) + .expect("resolve database url"); + + assert_eq!(resolved, "postgresql://gcore/db"); + } + #[test] fn gcode_config_accepts_database_url() { let home = tempfile::tempdir().expect("temp home"); diff --git a/crates/gcode/src/falkor.rs b/crates/gcode/src/falkor.rs index e4f217f..7ab884c 100644 --- a/crates/gcode/src/falkor.rs +++ b/crates/gcode/src/falkor.rs @@ -1,8 +1,8 @@ -//! FalkorDB read client for graph queries. +//! Compatibility facade for FalkorDB graph queries. //! -//! Read helpers degrade gracefully through `with_falkor`: missing config, -//! connection construction failure, and query failures return caller-provided -//! defaults so search and graph commands remain usable without FalkorDB. +//! The reusable projection/query implementation lives under +//! `crate::graph::code_graph`; this module keeps the Phase 7 public surface +//! available for downstream callers that still import `crate::falkor`. 
use std::collections::HashMap; @@ -12,6 +12,7 @@ use falkordb::{ use serde_json::{Map, Number, Value}; use crate::config::{Context, FalkorConfig}; +use crate::graph::typed_query; use crate::models::GraphResult; const CALL_TARGET_PREDICATE: &str = @@ -61,176 +62,53 @@ impl FalkorClient { } } } -} - -pub fn cypher_string_literal(s: &str) -> String { - let escaped = s.replace('\\', "\\\\").replace('\'', "\\'"); - format!("'{escaped}'") -} - -fn parse_falkor_result(result: QueryResult>) -> Vec { - parse_falkor_records(result.header, result.data) -} - -fn parse_falkor_records(headers: Vec, records: I) -> Vec -where - I: IntoIterator>, -{ - records - .into_iter() - .map(|record| { - let mut row = HashMap::new(); - for (i, field) in headers.iter().enumerate() { - let value = record.get(i).cloned().unwrap_or(FalkorValue::None); - row.insert(field.clone(), falkor_value_to_json(value)); - } - row - }) - .collect() -} -fn falkor_value_to_json(value: FalkorValue) -> Value { - match value { - FalkorValue::String(value) => Value::String(value), - FalkorValue::Bool(value) => Value::Bool(value), - FalkorValue::I64(value) => Value::Number(Number::from(value)), - FalkorValue::F64(value) => Number::from_f64(value) - .map(Value::Number) - .unwrap_or(Value::Null), - FalkorValue::Array(values) => Value::Array( - values - .into_iter() - .map(falkor_value_to_json) - .collect::>(), - ), - FalkorValue::Map(values) => Value::Object( - values - .into_iter() - .map(|(key, value)| (key, falkor_value_to_json(value))) - .collect::>(), - ), - FalkorValue::None => Value::Null, - value => Value::String(format!("{value:?}")), + /// Execute a typed query after its parameters have been rendered safely. 
+ pub fn query_typed(&mut self, query: typed_query::TypedQuery) -> anyhow::Result> { + let typed_query::TypedQuery { cypher, params } = query; + self.query(&cypher, Some(params)) } } -pub fn with_falkor( - ctx: &Context, - default: T, - f: impl FnOnce(&mut FalkorClient) -> anyhow::Result, -) -> anyhow::Result { - let Some(config) = &ctx.falkordb else { - return Ok(default); - }; - - let mut client = match FalkorClient::from_config(config) { - Ok(client) => client, - Err(e) => { - if !ctx.quiet { - eprintln!("Warning: FalkorDB connection failed: {e}"); - } - return Ok(default); - } - }; - - match f(&mut client) { - Ok(value) => Ok(value), - Err(e) => { - if !ctx.quiet { - eprintln!("Warning: FalkorDB query failed: {e}"); - } - Ok(default) - } - } -} - -fn row_to_graph_result(row: &Row) -> GraphResult { - GraphResult { - id: row - .get("caller_id") - .or_else(|| row.get("callee_id")) - .or_else(|| row.get("source_id")) - .or_else(|| row.get("node_id")) - .or_else(|| row.get("symbol_id")) - .or_else(|| row.get("id")) - .and_then(|v| v.as_str()) - .unwrap_or("") - .to_string(), - name: row - .get("caller_name") - .or_else(|| row.get("callee_name")) - .or_else(|| row.get("source_name")) - .or_else(|| row.get("node_name")) - .or_else(|| row.get("symbol_name")) - .or_else(|| row.get("name")) - .or_else(|| row.get("module_name")) - .and_then(|v| v.as_str()) - .unwrap_or("") - .to_string(), - file_path: row - .get("file") - .or_else(|| row.get("file_path")) - .and_then(|v| v.as_str()) - .unwrap_or("") - .to_string(), - line: row.get("line").and_then(|v| v.as_u64()).unwrap_or(0) as usize, - relation: row - .get("relation") - .or_else(|| row.get("rel_type")) - .and_then(|v| v.as_str()) - .map(String::from), - distance: row - .get("distance") - .and_then(|v| v.as_u64()) - .map(|d| d as usize), - } +pub fn cypher_string_literal(s: &str) -> String { + crate::graph::typed_query::cypher_string_literal(s) } -fn string_params(values: &[(&str, &str)]) -> HashMap { - values - .iter() 
- .map(|(key, value)| ((*key).to_string(), cypher_string_literal(value))) - .collect() +pub fn id_list_literal(ids: &[String]) -> String { + typed_query::id_list_literal(ids) } fn clamp_limit(limit: usize) -> usize { limit.clamp(1, MAX_GRAPH_LIMIT) } -fn clamp_offset(offset: usize) -> usize { +pub fn clamp_offset(offset: usize) -> usize { offset.min(MAX_GRAPH_LIMIT) } -fn id_list_literal(ids: &[String]) -> String { - ids.iter() - .map(|id| cypher_string_literal(id)) - .collect::>() - .join(", ") -} - -fn count_callers_query(project_id: &str, symbol_id: &str) -> (String, HashMap) { +pub fn count_callers_query(project_id: &str, symbol_id: &str) -> (String, HashMap) { ( format!( "MATCH (caller:CodeSymbol {{project: $project}})-[:CALLS]->(target {{id: $id, project: $project}}) \ WHERE {CALL_TARGET_PREDICATE} \ RETURN count(caller) AS cnt" ), - string_params(&[("project", project_id), ("id", symbol_id)]), + typed_query::string_params(&[("project", project_id), ("id", symbol_id)]), ) } -fn count_usages_query(project_id: &str, symbol_id: &str) -> (String, HashMap) { +pub fn count_usages_query(project_id: &str, symbol_id: &str) -> (String, HashMap) { ( format!( "MATCH (source:CodeSymbol {{project: $project}})-[r:CALLS]->(target {{id: $id, project: $project}}) \ WHERE {CALL_TARGET_PREDICATE} \ RETURN count(source) AS cnt" ), - string_params(&[("project", project_id), ("id", symbol_id)]), + typed_query::string_params(&[("project", project_id), ("id", symbol_id)]), ) } -fn find_callers_query( +pub fn find_callers_query( project_id: &str, symbol_id: &str, offset: usize, @@ -246,11 +124,11 @@ fn find_callers_query( r.file AS file, r.line AS line \ SKIP {offset} LIMIT {limit}" ), - string_params(&[("project", project_id), ("id", symbol_id)]), + typed_query::string_params(&[("project", project_id), ("id", symbol_id)]), ) } -fn find_usages_query( +pub fn find_usages_query( project_id: &str, symbol_id: &str, offset: usize, @@ -266,11 +144,11 @@ fn find_usages_query( 'CALLS' AS 
rel_type, r.file AS file, r.line AS line \ SKIP {offset} LIMIT {limit}" ), - string_params(&[("project", project_id), ("id", symbol_id)]), + typed_query::string_params(&[("project", project_id), ("id", symbol_id)]), ) } -fn find_callers_batch_query( +pub fn find_callers_batch_query( project_id: &str, symbol_ids: &[String], limit: usize, @@ -285,11 +163,11 @@ fn find_callers_batch_query( r.file AS file, r.line AS line \ LIMIT {limit}" ), - string_params(&[("project", project_id)]), + typed_query::string_params(&[("project", project_id)]), ) } -fn find_callees_batch_query( +pub fn find_callees_batch_query( project_id: &str, symbol_ids: &[String], limit: usize, @@ -304,22 +182,25 @@ fn find_callees_batch_query( r.file AS file, r.line AS line \ LIMIT {limit}" ), - string_params(&[("project", project_id)]), + typed_query::string_params(&[("project", project_id)]), ) } -fn get_imports_query(project_id: &str, file_path: &str) -> (String, HashMap) { +pub fn get_imports_query(project_id: &str, file_path: &str) -> (String, HashMap) { + let limit = clamp_limit(MAX_GRAPH_LIMIT); ( - "MATCH (f:CodeFile {path: $path, project: $project})-[:IMPORTS]->(m:CodeModule) \ - RETURN m.name AS module_name" - .to_string(), - string_params(&[("project", project_id), ("path", file_path)]), + format!( + "MATCH (f:CodeFile {{path: $path, project: $project}})-[:IMPORTS]->(m:CodeModule) \ + RETURN m.name AS module_name \ + LIMIT {limit}" + ), + typed_query::string_params(&[("project", project_id), ("path", file_path)]), ) } -fn blast_radius_query(depth: usize, limit: usize) -> String { +pub fn blast_radius_query(depth: usize, limit: usize) -> String { let depth = depth.clamp(1, 5); - let limit = clamp_limit(limit); + let limit = limit.clamp(1, MAX_GRAPH_LIMIT); format!( "MATCH (target {{id: $id, project: $project}}) \ WHERE {CALL_TARGET_PREDICATE} \ @@ -336,32 +217,90 @@ fn blast_radius_query(depth: usize, limit: usize) -> String { ) } -fn count_from_rows(rows: &[Row]) -> usize { - rows.first() 
- .and_then(|r| r.get("cnt")) - .and_then(|v| { - v.as_u64() - .or_else(|| v.as_i64().and_then(|value| value.try_into().ok())) +fn parse_falkor_result(result: QueryResult>) -> Vec { + parse_falkor_records(result.header, result.data) +} + +fn parse_falkor_records(headers: Vec, records: I) -> Vec +where + I: IntoIterator>, +{ + records + .into_iter() + .map(|record| { + let mut row = HashMap::new(); + for (i, field) in headers.iter().enumerate() { + let value = record.get(i).cloned().unwrap_or(FalkorValue::None); + row.insert(field.clone(), falkor_value_to_json(value)); + } + row }) - .unwrap_or(0) as usize + .collect() +} + +fn falkor_value_to_json(value: FalkorValue) -> Value { + match value { + FalkorValue::String(value) => Value::String(value), + FalkorValue::Bool(value) => Value::Bool(value), + FalkorValue::I64(value) => Value::Number(Number::from(value)), + FalkorValue::F64(value) => Number::from_f64(value) + .map(Value::Number) + .unwrap_or(Value::Null), + FalkorValue::Array(values) => Value::Array( + values + .into_iter() + .map(falkor_value_to_json) + .collect::>(), + ), + FalkorValue::Map(values) => Value::Object( + values + .into_iter() + .map(|(key, value)| (key, falkor_value_to_json(value))) + .collect::>(), + ), + FalkorValue::None => Value::Null, + value => Value::String(format!("{value:?}")), + } +} + +pub fn with_falkor( + ctx: &Context, + default: T, + f: impl FnOnce(&mut FalkorClient) -> anyhow::Result, +) -> anyhow::Result { + let Some(config) = &ctx.falkordb else { + return Ok(default); + }; + + let mut client = match FalkorClient::from_config(config) { + Ok(client) => client, + Err(e) => { + if !ctx.quiet { + eprintln!("Warning: FalkorDB connection failed: {e}"); + } + return Ok(default); + } + }; + + match f(&mut client) { + Ok(value) => Ok(value), + Err(e) => { + if !ctx.quiet { + eprintln!("Warning: FalkorDB query failed: {e}"); + } + Ok(default) + } + } } /// Count callers of a symbol. 
pub fn count_callers(ctx: &Context, symbol_id: &str) -> anyhow::Result { - with_falkor(ctx, 0, |client| { - let (query, params) = count_callers_query(&ctx.project_id, symbol_id); - let rows = client.query(&query, Some(params))?; - Ok(count_from_rows(&rows)) - }) + crate::graph::code_graph::count_callers(ctx, symbol_id) } /// Count incoming call usages of a symbol. pub fn count_usages(ctx: &Context, symbol_id: &str) -> anyhow::Result { - with_falkor(ctx, 0, |client| { - let (query, params) = count_usages_query(&ctx.project_id, symbol_id); - let rows = client.query(&query, Some(params))?; - Ok(count_from_rows(&rows)) - }) + crate::graph::code_graph::count_usages(ctx, symbol_id) } /// Find symbols that call the given symbol id. @@ -371,11 +310,7 @@ pub fn find_callers( offset: usize, limit: usize, ) -> anyhow::Result> { - with_falkor(ctx, vec![], |client| { - let (query, params) = find_callers_query(&ctx.project_id, symbol_id, offset, limit); - let rows = client.query(&query, Some(params))?; - Ok(rows.iter().map(row_to_graph_result).collect()) - }) + crate::graph::code_graph::find_callers(ctx, symbol_id, offset, limit) } /// Find incoming CALLS usages for a canonical, unresolved, or external target. @@ -385,11 +320,7 @@ pub fn find_usages( offset: usize, limit: usize, ) -> anyhow::Result> { - with_falkor(ctx, vec![], |client| { - let (query, params) = find_usages_query(&ctx.project_id, symbol_id, offset, limit); - let rows = client.query(&query, Some(params))?; - Ok(rows.iter().map(row_to_graph_result).collect()) - }) + crate::graph::code_graph::find_usages(ctx, symbol_id, offset, limit) } /// Find symbols that call any of the given target ids. 
@@ -397,15 +328,15 @@ pub fn find_callers_batch( ctx: &Context, symbol_ids: &[String], limit: usize, -) -> anyhow::Result> { - if symbol_ids.is_empty() { - return Ok(vec![]); +) -> anyhow::Result>> { + let mut grouped = HashMap::new(); + for symbol_id in symbol_ids { + grouped.insert( + symbol_id.clone(), + crate::graph::code_graph::find_callers(ctx, symbol_id, 0, limit)?, + ); } - with_falkor(ctx, vec![], |client| { - let (query, params) = find_callers_batch_query(&ctx.project_id, symbol_ids, limit); - let rows = client.query(&query, Some(params))?; - Ok(rows.iter().map(row_to_graph_result).collect()) - }) + Ok(grouped) } /// Find call targets reached by any of the given source ids. @@ -413,24 +344,24 @@ pub fn find_callees_batch( ctx: &Context, symbol_ids: &[String], limit: usize, -) -> anyhow::Result> { - if symbol_ids.is_empty() { - return Ok(vec![]); +) -> anyhow::Result>> { + let mut grouped = HashMap::new(); + for symbol_id in symbol_ids { + grouped.insert( + symbol_id.clone(), + crate::graph::code_graph::find_callees_batch( + ctx, + std::slice::from_ref(symbol_id), + limit, + )?, + ); } - with_falkor(ctx, vec![], |client| { - let (query, params) = find_callees_batch_query(&ctx.project_id, symbol_ids, limit); - let rows = client.query(&query, Some(params))?; - Ok(rows.iter().map(row_to_graph_result).collect()) - }) + Ok(grouped) } /// Get import graph for a file. pub fn get_imports(ctx: &Context, file_path: &str) -> anyhow::Result> { - with_falkor(ctx, vec![], |client| { - let (query, params) = get_imports_query(&ctx.project_id, file_path); - let rows = client.query(&query, Some(params))?; - Ok(rows.iter().map(row_to_graph_result).collect()) - }) + crate::graph::code_graph::get_imports(ctx, file_path) } /// Find transitive blast radius of changing a symbol. 
@@ -439,12 +370,12 @@ pub fn blast_radius( symbol_id: &str, depth: usize, ) -> anyhow::Result> { - with_falkor(ctx, vec![], |client| { - let query = blast_radius_query(depth, MAX_GRAPH_LIMIT); - let params = string_params(&[("project", &ctx.project_id), ("id", symbol_id)]); - let rows = client.query(&query, Some(params))?; - Ok(rows.iter().map(row_to_graph_result).collect()) - }) + crate::graph::code_graph::blast_radius(ctx, symbol_id, depth) +} + +#[cfg(test)] +fn row_to_graph_result(row: &Row) -> GraphResult { + crate::graph::code_graph::row_to_graph_result(row) } #[cfg(test)] @@ -555,4 +486,151 @@ mod tests { assert_eq!(result.relation.as_deref(), Some("call")); assert_eq!(result.distance, Some(2)); } + + #[test] + fn falkor_client_wrapper_shape() { + let source = include_str!("falkor.rs"); + assert!(source.contains("pub struct FalkorClient")); + assert!(source.contains("graph: SyncGraph")); + assert!( + source.contains("pub fn from_config(config: &FalkorConfig) -> anyhow::Result") + ); + assert!(source.contains("pub fn with_falkor")); + assert!(source.contains("FalkorClientBuilder, FalkorConnectionInfo, FalkorValue, LazyResultSet, QueryResult, SyncGraph")); + assert!(source.contains("client.select_graph(&config.graph_name)")); + } + + #[test] + fn phase7_read_helpers_visible() { + let source = include_str!("falkor.rs"); + for symbol in [ + "pub fn count_callers(", + "pub fn count_usages(", + "pub fn find_callers(", + "pub fn find_usages(", + "pub fn find_callers_batch(", + "pub fn find_callees_batch(", + "pub fn get_imports(", + "pub fn blast_radius(", + "pub fn count_callers_query(", + "pub fn count_usages_query(", + "pub fn find_callers_query(", + "pub fn find_usages_query(", + "pub fn find_callers_batch_query(", + "pub fn find_callees_batch_query(", + "pub fn get_imports_query(", + "fn blast_radius_query(depth: usize, limit: usize)", + ] { + assert!(source.contains(symbol), "missing {symbol}"); + } + } + + #[test] + fn phase7_source_fragments_visible() { + 
let source = include_str!("falkor.rs"); + for fragment in [ + "urlencoding::encode(password)", + "falkor://:{}@{}:{}", + ".with_connection_info(conn_info)", + ".with_params(&", + "result.header", + "FalkorValue::None", + "let mut client =", + "ctx.falkordb", + ] { + assert!(source.contains(fragment), "missing {fragment}"); + } + } + + #[test] + fn phase7_query_surface_visible() { + let source = include_str!("falkor.rs"); + assert!(source.contains("pub type Row = HashMap")); + assert!(source.contains("pub fn query(")); + assert!(source.contains("cypher: &str")); + assert!(source.contains("params: Option>")); + assert!(source.contains("anyhow::Result>")); + assert!(source.contains("fn parse_falkor_result(")); + } + + #[test] + fn phase7_query_helpers_and_literal_fragments_visible() { + let source = include_str!("falkor.rs"); + for fragment in [ + "pub fn cypher_string_literal", + "pub fn id_list_literal", + "pub fn clamp_offset", + "target:CodeSymbol OR target:UnresolvedCallee OR target:ExternalSymbol", + "SKIP {offset} LIMIT {limit}", + "target.id IN [{ids}]", + ] { + assert!(source.contains(fragment), "missing {fragment}"); + } + + let queries = [ + find_callers_query("project-1", "symbol-1", 5, 10).0, + find_usages_query("project-1", "symbol-1", 5, 10).0, + find_callers_batch_query("project-1", &["a".to_string()], 10).0, + find_callees_batch_query("project-1", &["a".to_string()], 10).0, + ]; + for query in queries { + assert_no_numeric_or_list_placeholders(&query); + } + } + + #[test] + fn phase7_cargo_and_lockfile_state() { + let manifest_dir = std::path::Path::new(env!("CARGO_MANIFEST_DIR")); + let cargo = std::fs::read_to_string(manifest_dir.join("Cargo.toml")) + .expect("read gobby-code Cargo.toml"); + assert!(cargo.contains("name = \"gobby-code\"")); + assert!(cargo.contains("name = \"gcode\"")); + assert!(cargo.contains("path = \"src/main.rs\"")); + assert!(cargo.contains("falkordb = \"0.2\"")); + assert!(cargo.contains("urlencoding = \"2\"")); + 
assert!(cargo.contains("base64")); + assert!(cargo.contains("reqwest")); + + let lock = std::fs::read_to_string(manifest_dir.join("../../Cargo.lock")) + .expect("read workspace Cargo.lock"); + assert!(lock.contains("name = \"falkordb\"")); + assert!(lock.contains("name = \"urlencoding\"")); + assert!(!lock.contains("name = \"neo4j\"")); + assert!(!lock.contains("name = \"neo4rs\"")); + } + + #[test] + fn phase7_additional_query_fragments_visible() { + let source = include_str!("falkor.rs"); + for fragment in [ + "depth.clamp(1, 5)", + "limit.clamp(1, MAX_GRAPH_LIMIT)", + "offset.min(MAX_GRAPH_LIMIT)", + "src.id IN [{ids}]", + "LIMIT {limit}", + "fn blast_radius_query(depth: usize, limit: usize)", + ] { + assert!(source.contains(fragment), "missing {fragment}"); + } + } + + #[test] + fn read_helpers_delegate_to_code_graph() { + let source = include_str!("falkor.rs"); + for fragment in [ + "crate::graph::code_graph::count_callers(ctx, symbol_id)", + "crate::graph::code_graph::count_usages(ctx, symbol_id)", + "crate::graph::code_graph::find_callers(ctx, symbol_id, offset, limit)", + "crate::graph::code_graph::find_usages(ctx, symbol_id, offset, limit)", + "crate::graph::code_graph::find_callers(", + "crate::graph::code_graph::find_callees_batch(", + "crate::graph::code_graph::get_imports(ctx, file_path)", + "crate::graph::code_graph::blast_radius(ctx, symbol_id, depth)", + ] { + assert!( + source.contains(fragment), + "missing delegation fragment {fragment}" + ); + } + } } diff --git a/crates/gcode/src/freshness.rs b/crates/gcode/src/freshness.rs index ad407c1..4921a28 100644 --- a/crates/gcode/src/freshness.rs +++ b/crates/gcode/src/freshness.rs @@ -2,7 +2,7 @@ use std::path::{Path, PathBuf}; use crate::config::Context; use crate::db; -use crate::index::{hasher, indexer}; +use crate::index::{api, hasher}; use crate::models::Symbol; const INFLIGHT_ENV: &str = "GCODE_FRESHNESS_INFLIGHT"; @@ -18,25 +18,38 @@ pub fn ensure_fresh(ctx: &Context, scope: FreshnessScope) -> 
anyhow::Result<()> } let _guard = FreshnessGuard::enter(); - let mut conn = db::connect_readwrite(&ctx.database_url)?; match scope { FreshnessScope::Project => { - indexer::index_directory( - &mut conn, - &ctx.project_root, - &ctx.project_id, - true, - ctx.quiet, - false, + api::index_files( + api::IndexRequest { + project_root: ctx.project_root.clone(), + path_filter: None, + explicit_files: Vec::new(), + full: false, + require_cpp_semantics: false, + sync_projections: false, + }, + ctx, )?; } FreshnessScope::Files(paths) => { - let files: Vec = paths + let files: Vec = paths .iter() .map(|path| normalize_file_path(&ctx.project_root, path)) + .map(PathBuf::from) .collect(); if !files.is_empty() { - indexer::index_files(&mut conn, &ctx.project_root, &ctx.project_id, &files, false)?; + api::index_files( + api::IndexRequest { + project_root: ctx.project_root.clone(), + path_filter: None, + explicit_files: files, + full: false, + require_cpp_semantics: false, + sync_projections: false, + }, + ctx, + )?; } } } @@ -142,6 +155,7 @@ mod tests { falkordb: None, qdrant: None, embedding: None, + code_vectors: crate::config::CodeVectorSettings::default(), daemon_url: None, } } diff --git a/crates/gcode/src/graph/code_graph.rs b/crates/gcode/src/graph/code_graph.rs new file mode 100644 index 0000000..039c761 --- /dev/null +++ b/crates/gcode/src/graph/code_graph.rs @@ -0,0 +1,2095 @@ +use std::collections::HashMap; +use std::fmt; + +use anyhow::Context as _; +use reqwest::StatusCode; +use serde::{Deserialize, Serialize}; +use serde_json::Value; + +use crate::config::Context; +use crate::graph::typed_query::{self, TypedQuery, TypedValue}; +use crate::models::{ + CallRelation, CallTargetKind, GraphResult, ImportRelation, ProjectionMetadata, + ProjectionProvenance, Symbol, make_external_symbol_id, make_unresolved_callee_id, +}; +use gobby_core::degradation::ServiceState; +use gobby_core::falkor::{GraphClient, Row}; + +const CALL_TARGET_PREDICATE: &str = + "target:CodeSymbol OR 
target:UnresolvedCallee OR target:ExternalSymbol"; +const NEIGHBOR_PREDICATE: &str = + "neighbor:CodeSymbol OR neighbor:UnresolvedCallee OR neighbor:ExternalSymbol"; +const PROJECT_NODE_PREDICATE: &str = + "n:CodeFile OR n:CodeSymbol OR n:CodeModule OR n:UnresolvedCallee OR n:ExternalSymbol"; +const TARGET_TYPE_CASE: &str = "CASE \ + WHEN target:CodeSymbol THEN coalesce(target.kind, 'function') \ + WHEN target:ExternalSymbol THEN 'external' \ + ELSE 'unresolved' \ + END"; +const NEIGHBOR_TYPE_CASE: &str = "CASE \ + WHEN neighbor:CodeSymbol THEN coalesce(neighbor.kind, 'function') \ + WHEN neighbor:ExternalSymbol THEN 'external' \ + ELSE 'unresolved' \ + END"; +const NODE_TYPE_CASE: &str = "CASE \ + WHEN n:CodeFile THEN 'file' \ + WHEN n:CodeModule THEN 'module' \ + WHEN n:CodeSymbol THEN coalesce(n.kind, 'function') \ + WHEN n:ExternalSymbol THEN 'external' \ + ELSE 'unresolved' \ + END"; +const LINK_METADATA_RETURN: &str = "r.provenance AS provenance, \ + r.confidence AS confidence, \ + r.source_system AS source_system, \ + r.source_file_path AS metadata_source_file_path, \ + r.source_line AS source_line, \ + r.source_symbol_id AS source_symbol_id, \ + r.matching_method AS matching_method"; +const MAX_GRAPH_LIMIT: usize = 100; +const EXTRACTED_PROVENANCE: &str = "EXTRACTED"; +const SOURCE_SYSTEM_GCODE: &str = crate::models::SOURCE_SYSTEM_GCODE; + +pub struct CodeGraph<'a> { + project_id: &'a str, + client: &'a mut GraphClient, +} + +impl<'a> CodeGraph<'a> { + pub fn new(project_id: &'a str, client: &'a mut GraphClient) -> Self { + Self { project_id, client } + } + + pub fn sync_file( + &mut self, + file_path: &str, + imports: &[ImportRelation], + definitions: &[Symbol], + calls: &[CallRelation], + ) -> anyhow::Result { + self.ensure_file_node(file_path, definitions.len())?; + let current_symbol_ids = definitions + .iter() + .map(|symbol| symbol.id.clone()) + .collect::>(); + self.delete_file_graph(file_path, &current_symbol_ids)?; + + let mut relationship_count = 0; +
relationship_count += self.add_imports(file_path, imports)?; + relationship_count += self.add_definitions(file_path, definitions)?; + relationship_count += self.add_calls(file_path, calls)?; + self.cleanup_orphans()?; + Ok(relationship_count) + } + + pub fn ensure_file_node(&mut self, file_path: &str, symbol_count: usize) -> anyhow::Result<()> { + execute_write_query( + self.client, + ensure_file_node_query(self.project_id, file_path, symbol_count)?, + ) + } + + pub fn add_imports( + &mut self, + file_path: &str, + imports: &[ImportRelation], + ) -> anyhow::Result { + let mut written = 0; + for import in imports { + if import.module_name.is_empty() { + continue; + } + let source_file = if import.file_path.is_empty() { + file_path + } else { + &import.file_path + }; + execute_write_query( + self.client, + add_import_query(self.project_id, source_file, &import.module_name)?, + )?; + written += 1; + } + Ok(written) + } + + pub fn add_definitions( + &mut self, + file_path: &str, + definitions: &[Symbol], + ) -> anyhow::Result { + let mut written = 0; + for symbol in definitions { + if symbol.id.is_empty() || symbol.name.is_empty() { + continue; + } + execute_write_query( + self.client, + add_definition_query(self.project_id, file_path, symbol)?, + )?; + written += 1; + } + Ok(written) + } + + pub fn add_calls(&mut self, file_path: &str, calls: &[CallRelation]) -> anyhow::Result { + let mut written = 0; + for call in calls { + if let Some(query) = add_call_query(self.project_id, file_path, call)? { + execute_write_query(self.client, query)?; + written += 1; + } + } + Ok(written) + } + + pub fn delete_file_graph( + &mut self, + file_path: &str, + current_symbol_ids: &[String], + ) -> anyhow::Result<()> { + for query in delete_file_graph_queries(self.project_id, file_path, current_symbol_ids)? 
{ + execute_write_query(self.client, query)?; + } + Ok(()) + } + + pub fn delete_file_node(&mut self, file_path: &str) -> anyhow::Result<()> { + execute_write_query( + self.client, + delete_file_node_query(self.project_id, file_path)?, + ) + } + + pub fn delete_file_projection(&mut self, file_path: &str) -> anyhow::Result<()> { + self.delete_file_graph(file_path, &[])?; + self.delete_file_node(file_path)?; + self.cleanup_orphans() + } + + pub fn cleanup_orphans(&mut self) -> anyhow::Result<()> { + for query in cleanup_orphans_queries(self.project_id)? { + execute_write_query(self.client, query)?; + } + Ok(()) + } + + pub fn clear_project(&mut self) -> anyhow::Result<()> { + execute_write_query(self.client, clear_project_query(self.project_id)?) + } +} + +pub fn sync_file_graph( + ctx: &Context, + file_path: &str, + imports: &[ImportRelation], + definitions: &[Symbol], + calls: &[CallRelation], +) -> anyhow::Result { + with_required_core_graph(ctx, |client| { + CodeGraph::new(&ctx.project_id, client).sync_file(file_path, imports, definitions, calls) + }) +} + +pub fn delete_file_graph( + ctx: &Context, + file_path: &str, + current_symbol_ids: &[String], +) -> anyhow::Result<()> { + with_required_core_graph(ctx, |client| { + CodeGraph::new(&ctx.project_id, client).delete_file_graph(file_path, current_symbol_ids) + }) +} + +pub fn delete_file_projection(ctx: &Context, file_path: &str) -> anyhow::Result<()> { + with_required_core_graph(ctx, |client| { + CodeGraph::new(&ctx.project_id, client).delete_file_projection(file_path) + }) +} + +pub fn cleanup_orphans(ctx: &Context) -> anyhow::Result<()> { + with_required_core_graph(ctx, |client| { + CodeGraph::new(&ctx.project_id, client).cleanup_orphans() + }) +} + +pub fn clear_project(ctx: &Context) -> anyhow::Result<()> { + with_required_core_graph(ctx, |client| { + CodeGraph::new(&ctx.project_id, client).clear_project() + }) +} + +pub fn clear_all_code_index(config: &crate::config::FalkorConfig) -> anyhow::Result<()> { + 
let connection_config = config.connection_config(); + match gobby_core::falkor::with_graph( + Some(&connection_config), + &config.graph_name, + None, + |client| execute_write_query(client, clear_all_code_index_query()?).map(Some), + ) { + Ok((Some(()), ServiceState::Available)) => Ok(()), + Ok((_, ServiceState::NotConfigured)) => Err(GraphReadError::NotConfigured.into()), + Ok((_, ServiceState::Unreachable { message })) => { + Err(GraphReadError::Unreachable { message }.into()) + } + Ok((None, ServiceState::Available)) => Err(GraphReadError::QueryFailed { + message: "graph clear returned no value".to_string(), + } + .into()), + Err(error) => Err(GraphReadError::QueryFailed { + message: error.to_string(), + } + .into()), + } +} + +fn execute_write_query(client: &mut GraphClient, query: TypedQuery) -> anyhow::Result<()> { + let TypedQuery { cypher, params } = query; + client.query(&cypher, Some(params))?; + Ok(()) +} + +fn typed_query(cypher: impl Into, params: I) -> anyhow::Result +where + I: IntoIterator, + K: Into, +{ + Ok(TypedQuery::with_params(cypher, params)?) 
+} + +fn usize_value(value: usize) -> TypedValue { + TypedValue::Integer(value.min(i64::MAX as usize) as i64) +} + +fn optional_string_value(value: Option<&str>) -> TypedValue { + value + .filter(|value| !value.is_empty()) + .map(|value| TypedValue::String(value.to_string())) + .unwrap_or(TypedValue::Null) +} + +fn base_metadata_params(file_path: &str) -> Vec<(&'static str, TypedValue)> { + vec![ + ( + "provenance", + TypedValue::String(EXTRACTED_PROVENANCE.to_string()), + ), + ("confidence", TypedValue::Float(1.0)), + ( + "source_system", + TypedValue::String(SOURCE_SYSTEM_GCODE.to_string()), + ), + ( + "source_file_path", + TypedValue::String(file_path.to_string()), + ), + ] +} + +fn extracted_edge_params( + file_path: &str, + source_line: usize, + source_symbol_id: Option<&str>, +) -> Vec<(&'static str, TypedValue)> { + let mut params = base_metadata_params(file_path); + params.push(("source_line", usize_value(source_line))); + params.push(("source_symbol_id", optional_string_value(source_symbol_id))); + params +} + +pub(crate) fn ensure_file_node_query( + project_id: &str, + file_path: &str, + symbol_count: usize, +) -> anyhow::Result { + typed_query( + "MERGE (f:CodeFile {path: $file_path, project: $project}) + SET f.updated_at = timestamp(), f.symbol_count = $symbol_count", + [ + ("project", TypedValue::String(project_id.to_string())), + ("file_path", TypedValue::String(file_path.to_string())), + ("symbol_count", usize_value(symbol_count)), + ], + ) +} + +pub(crate) fn add_import_query( + project_id: &str, + source_file: &str, + target_module: &str, +) -> anyhow::Result { + let mut params = vec![ + ("project", TypedValue::String(project_id.to_string())), + ("source_file", TypedValue::String(source_file.to_string())), + ( + "target_module", + TypedValue::String(target_module.to_string()), + ), + ]; + params.extend(base_metadata_params(source_file)); + typed_query( + "MERGE (f:CodeFile {path: $source_file, project: $project}) + MERGE (m:CodeModule {name: 
$target_module, project: $project}) + MERGE (f)-[r:IMPORTS]->(m) + SET r.provenance = $provenance, + r.confidence = $confidence, + r.source_system = $source_system, + r.source_file_path = $source_file_path", + params, + ) +} + +pub(crate) fn add_definition_query( + project_id: &str, + file_path: &str, + symbol: &Symbol, +) -> anyhow::Result { + let mut params = vec![ + ("project", TypedValue::String(project_id.to_string())), + ("file_path", TypedValue::String(file_path.to_string())), + ("symbol_id", TypedValue::String(symbol.id.clone())), + ("name", TypedValue::String(symbol.name.clone())), + ( + "qualified_name", + TypedValue::String(symbol.qualified_name.clone()), + ), + ("kind", TypedValue::String(symbol.kind.clone())), + ("language", TypedValue::String(symbol.language.clone())), + ("line_start", usize_value(symbol.line_start)), + ("line_end", usize_value(symbol.line_end)), + ]; + params.extend(extracted_edge_params( + file_path, + symbol.line_start, + Some(&symbol.id), + )); + typed_query( + "MERGE (f:CodeFile {path: $file_path, project: $project}) + MERGE (s:CodeSymbol {id: $symbol_id, project: $project}) + SET s.name = $name, + s.qualified_name = $qualified_name, + s.kind = $kind, + s.language = $language, + s.file_path = $file_path, + s.line_start = $line_start, + s.line_end = $line_end, + s.updated_at = timestamp() + MERGE (f)-[r:DEFINES]->(s) + SET r.provenance = $provenance, + r.confidence = $confidence, + r.source_system = $source_system, + r.source_file_path = $source_file_path, + r.source_line = $source_line, + r.source_symbol_id = $source_symbol_id", + params, + ) +} + +enum GraphCallTarget { + Symbol { id: String }, + External { id: String, module: String }, + Unresolved { id: String }, +} + +impl GraphCallTarget { + fn from_call(project_id: &str, call: &CallRelation) -> Option { + if let Some(id) = call.callee_symbol_id.as_deref().filter(|id| !id.is_empty()) { + return Some(Self::Symbol { id: id.to_string() }); + } + if call.callee_name.is_empty() { 
+ return None; + } + if call.callee_target_kind == CallTargetKind::External { + let module = call.callee_external_module.clone().unwrap_or_default(); + return Some(Self::External { + id: make_external_symbol_id(project_id, &call.callee_name, Some(&module)), + module, + }); + } + Some(Self::Unresolved { + id: make_unresolved_callee_id(project_id, &call.callee_name), + }) + } +} + +pub fn call_target_id(project_id: &str, call: &CallRelation) -> Option { + match GraphCallTarget::from_call(project_id, call)? { + GraphCallTarget::Symbol { id } + | GraphCallTarget::External { id, .. } + | GraphCallTarget::Unresolved { id } => Some(id), + } +} + +pub(crate) fn add_call_query( + project_id: &str, + default_file_path: &str, + call: &CallRelation, +) -> anyhow::Result> { + if call.caller_symbol_id.is_empty() { + return Ok(None); + } + let Some(target) = GraphCallTarget::from_call(project_id, call) else { + return Ok(None); + }; + let file_path = if call.file_path.is_empty() { + default_file_path + } else { + &call.file_path + }; + let target_id = match &target { + GraphCallTarget::Symbol { id } + | GraphCallTarget::External { id, .. } + | GraphCallTarget::Unresolved { id } => id, + }; + let mut params = vec![ + ("project", TypedValue::String(project_id.to_string())), + ( + "caller_id", + TypedValue::String(call.caller_symbol_id.clone()), + ), + ("target_id", TypedValue::String(target_id.clone())), + ("callee_name", TypedValue::String(call.callee_name.clone())), + ("file_path", TypedValue::String(file_path.to_string())), + ("line", usize_value(call.line)), + ]; + params.extend(extracted_edge_params( + file_path, + call.line, + Some(&call.caller_symbol_id), + )); + + let cypher = match target { + GraphCallTarget::Symbol { .. 
} => { + "MERGE (caller:CodeSymbol {id: $caller_id, project: $project}) + MERGE (callee:CodeSymbol {id: $target_id, project: $project}) + ON CREATE SET callee.name = $callee_name, callee.updated_at = timestamp() + MERGE (caller)-[r:CALLS {file: $file_path, line: $line}]->(callee) + SET r.provenance = $provenance, + r.confidence = $confidence, + r.source_system = $source_system, + r.source_file_path = $source_file_path, + r.source_line = $source_line, + r.source_symbol_id = $source_symbol_id" + .to_string() + } + GraphCallTarget::External { module, .. } => { + params.push(("callee_module", TypedValue::String(module))); + "MERGE (caller:CodeSymbol {id: $caller_id, project: $project}) + MERGE (callee:ExternalSymbol {id: $target_id, project: $project}) + SET callee.name = $callee_name, + callee.external_module = $callee_module, + callee.module = $callee_module, + callee.updated_at = timestamp() + MERGE (caller)-[r:CALLS {file: $file_path, line: $line}]->(callee) + SET r.provenance = $provenance, + r.confidence = $confidence, + r.source_system = $source_system, + r.source_file_path = $source_file_path, + r.source_line = $source_line, + r.source_symbol_id = $source_symbol_id" + .to_string() + } + GraphCallTarget::Unresolved { .. 
} => { + "MERGE (caller:CodeSymbol {id: $caller_id, project: $project}) + MERGE (callee:UnresolvedCallee {id: $target_id, project: $project}) + SET callee.name = $callee_name, + callee.updated_at = timestamp() + MERGE (caller)-[r:CALLS {file: $file_path, line: $line}]->(callee) + SET r.provenance = $provenance, + r.confidence = $confidence, + r.source_system = $source_system, + r.source_file_path = $source_file_path, + r.source_line = $source_line, + r.source_symbol_id = $source_symbol_id" + .to_string() + } + }; + + Ok(Some(typed_query(cypher, params)?)) +} + +pub(crate) fn delete_file_graph_queries( + project_id: &str, + file_path: &str, + current_symbol_ids: &[String], +) -> anyhow::Result> { + let base_params = || { + [ + ("project", TypedValue::String(project_id.to_string())), + ("file_path", TypedValue::String(file_path.to_string())), + ] + }; + let mut queries = vec![ + typed_query( + "MATCH (f:CodeFile {path: $file_path, project: $project})-[r:IMPORTS]->(:CodeModule) + DELETE r", + base_params(), + )?, + typed_query( + "MATCH (f:CodeFile {path: $file_path, project: $project})-[r:DEFINES]->(:CodeSymbol) + DELETE r", + base_params(), + )?, + typed_query( + "MATCH (s:CodeSymbol {project: $project, file_path: $file_path})-[r:CALLS]->() + DELETE r", + base_params(), + )?, + ]; + + if current_symbol_ids.is_empty() { + queries.push(typed_query( + "MATCH (s:CodeSymbol {project: $project, file_path: $file_path}) + DETACH DELETE s", + base_params(), + )?); + } else { + let mut params = vec![ + ("project", TypedValue::String(project_id.to_string())), + ("file_path", TypedValue::String(file_path.to_string())), + ( + "symbol_ids", + TypedValue::List( + current_symbol_ids + .iter() + .map(|id| TypedValue::String(id.clone())) + .collect(), + ), + ), + ]; + queries.push(typed_query( + "MATCH (s:CodeSymbol {project: $project, file_path: $file_path}) + WHERE NOT s.id IN $symbol_ids + DETACH DELETE s", + params.drain(..), + )?); + } + + Ok(queries) +} + +pub(crate) fn 
delete_file_node_query( + project_id: &str, + file_path: &str, +) -> anyhow::Result { + typed_query( + "MATCH (f:CodeFile {path: $file_path, project: $project}) + DETACH DELETE f", + [ + ("project", TypedValue::String(project_id.to_string())), + ("file_path", TypedValue::String(file_path.to_string())), + ], + ) +} + +pub(crate) fn cleanup_orphans_queries(project_id: &str) -> anyhow::Result> { + let project_param = || [("project", TypedValue::String(project_id.to_string()))]; + Ok(vec![ + typed_query( + "MATCH (m:CodeModule {project: $project}) + WHERE NOT (m)<-[:IMPORTS]-() + DETACH DELETE m", + project_param(), + )?, + typed_query( + "MATCH (n {project: $project}) + WHERE (n:UnresolvedCallee OR n:ExternalSymbol) + AND NOT ()-[:CALLS]->(n) + DETACH DELETE n", + project_param(), + )?, + typed_query( + "MATCH (s:CodeSymbol {project: $project}) + WHERE s.file_path IS NULL + AND NOT ()-[:DEFINES]->(s) + AND NOT ()-[:CALLS]->(s) + AND NOT (s)-[:CALLS]->() + DETACH DELETE s", + project_param(), + )?, + ]) +} + +pub(crate) fn clear_project_query(project_id: &str) -> anyhow::Result { + typed_query( + format!( + "MATCH (n {{project: $project}}) + WHERE {PROJECT_NODE_PREDICATE} + DETACH DELETE n" + ), + [("project", TypedValue::String(project_id.to_string()))], + ) +} + +pub(crate) fn clear_all_code_index_query() -> anyhow::Result { + typed_query( + format!( + "MATCH (n) + WHERE {PROJECT_NODE_PREDICATE} + DETACH DELETE n" + ), + Vec::<(&str, TypedValue)>::new(), + ) +} + +#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)] +#[serde(rename_all = "snake_case")] +pub enum GraphLifecycleAction { + Clear, + Rebuild, +} + +impl GraphLifecycleAction { + pub fn cli_command(self) -> &'static str { + match self { + Self::Clear => "gcode graph clear", + Self::Rebuild => "gcode graph rebuild", + } + } + + pub fn endpoint_path(self) -> &'static str { + match self { + Self::Clear => "/api/code-index/graph/clear", + Self::Rebuild => "/api/code-index/graph/rebuild", + } + } 
+ + pub fn success_prefix(self) -> &'static str { + match self { + Self::Clear => "Cleared code-index graph", + Self::Rebuild => "Rebuilt code-index graph", + } + } +} + +#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)] +pub struct GraphLifecycleRequest { + pub project_id: String, + pub daemon_url: Option, +} + +impl GraphLifecycleRequest { + pub fn from_context(ctx: &Context) -> Self { + Self { + project_id: ctx.project_id.clone(), + daemon_url: ctx.daemon_url.clone(), + } + } +} + +#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)] +pub struct GraphLifecycleOutput { + pub project_id: String, + pub action: GraphLifecycleAction, + pub summary: String, + pub payload: Value, +} + +#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)] +pub struct GraphReadRequest { + pub project_id: String, + pub symbol_id: String, + pub offset: usize, + pub limit: usize, + pub depth: usize, +} + +#[derive(Debug, Clone, Default, PartialEq, Serialize, Deserialize)] +pub struct GraphPayload { + pub nodes: Vec, + pub links: Vec, + #[serde(skip_serializing_if = "Option::is_none")] + pub center: Option, +} + +impl GraphPayload { + pub fn with_center(center: impl Into) -> Self { + Self { + nodes: vec![], + links: vec![], + center: Some(center.into()), + } + } + + pub fn push_node(&mut self, node: GraphNode) { + if node.id.is_empty() || self.nodes.iter().any(|existing| existing.id == node.id) { + return; + } + self.nodes.push(node); + } +} + +#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)] +pub struct GraphNode { + pub id: String, + pub name: String, + #[serde(rename = "type")] + pub node_type: String, + #[serde(skip_serializing_if = "Option::is_none")] + pub kind: Option, + #[serde(skip_serializing_if = "Option::is_none")] + pub file_path: Option, + #[serde(skip_serializing_if = "Option::is_none")] + pub line_start: Option, + #[serde(skip_serializing_if = "Option::is_none")] + pub signature: Option, + #[serde(skip_serializing_if = 
"Option::is_none")] + pub symbol_count: Option, + #[serde(skip_serializing_if = "Option::is_none")] + pub language: Option, + #[serde(skip_serializing_if = "Option::is_none")] + pub blast_distance: Option, +} + +impl GraphNode { + pub fn new( + id: impl Into, + name: impl Into, + node_type: impl Into, + ) -> Self { + Self { + id: id.into(), + name: name.into(), + node_type: node_type.into(), + kind: None, + file_path: None, + line_start: None, + signature: None, + symbol_count: None, + language: None, + blast_distance: None, + } + } + + fn from_row(row: &Row, default_type: &str) -> Option { + let id = row_string(row, &["id", "node_id"])?; + let mut node = Self::new( + id.clone(), + row_string(row, &["name", "node_name"]).unwrap_or(id), + row_string(row, &["type", "node_type"]).unwrap_or_else(|| default_type.to_string()), + ); + node.kind = row_string(row, &["kind"]); + node.file_path = row_string(row, &["file_path"]); + node.line_start = row_usize(row, &["line_start", "line"]); + node.signature = row_string(row, &["signature"]); + node.symbol_count = row_usize(row, &["symbol_count"]); + node.language = row_string(row, &["language"]); + node.blast_distance = row_usize(row, &["blast_distance", "distance"]); + Some(node) + } + + fn from_prefixed_row(row: &Row, prefix: &str, default_type: &str) -> Option { + let id_key = format!("{prefix}_id"); + let name_key = format!("{prefix}_name"); + let type_key = format!("{prefix}_type"); + let kind_key = format!("{prefix}_kind"); + let file_path_key = format!("{prefix}_file_path"); + let line_start_key = format!("{prefix}_line_start"); + let signature_key = format!("{prefix}_signature"); + + let id = row_string_owned(row, &[id_key.as_str()])?; + let mut node = Self::new( + id.clone(), + row_string_owned(row, &[name_key.as_str()]).unwrap_or(id), + row_string_owned(row, &[type_key.as_str()]).unwrap_or_else(|| default_type.to_string()), + ); + node.kind = row_string_owned(row, &[kind_key.as_str()]); + node.file_path = 
row_string_owned(row, &[file_path_key.as_str()]); + node.line_start = row_usize_owned(row, &[line_start_key.as_str()]); + node.signature = row_string_owned(row, &[signature_key.as_str()]); + Some(node) + } +} + +#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)] +pub struct GraphLink { + pub source: String, + pub target: String, + #[serde(rename = "type")] + pub link_type: String, + #[serde(skip_serializing_if = "Option::is_none")] + pub line: Option, + #[serde(skip_serializing_if = "Option::is_none")] + pub distance: Option, + #[serde(default, skip_serializing_if = "Option::is_none")] + pub metadata: Option, +} + +impl GraphLink { + pub fn new( + source: impl Into, + target: impl Into, + link_type: impl Into, + ) -> Self { + Self { + source: source.into(), + target: target.into(), + link_type: link_type.into(), + line: None, + distance: None, + metadata: None, + } + } + + pub fn from_row(row: &Row) -> Self { + let mut link = Self::new( + row_string(row, &["source"]).unwrap_or_default(), + row_string(row, &["target"]).unwrap_or_default(), + row_string(row, &["type", "rel_type"]).unwrap_or_else(|| "CALLS".to_string()), + ); + link.line = row_usize(row, &["line"]); + link.distance = row_usize(row, &["distance"]); + link.metadata = row_to_projection_metadata(row); + link + } +} + +#[derive(Debug, Clone, PartialEq, Eq)] +pub enum GraphBlastRadiusTarget { + SymbolId(String), + FilePath(String), +} + +#[derive(Debug, Clone, PartialEq, Eq)] +pub enum GraphReadError { + NotConfigured, + Unreachable { message: String }, + QueryFailed { message: String }, + InvalidTarget { message: String }, +} + +impl fmt::Display for GraphReadError { + fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result { + match self { + Self::NotConfigured => { + f.write_str("FalkorDB is not configured; graph read APIs require FalkorDB") + } + Self::Unreachable { message } => { + write!( + f, + "FalkorDB is unreachable; graph read APIs require FalkorDB: {message}" + ) + } + Self::QueryFailed 
{ message } => { + write!(f, "FalkorDB graph read failed: {message}") + } + Self::InvalidTarget { message } => f.write_str(message), + } + } +} + +impl std::error::Error for GraphReadError {} + +pub fn require_daemon_url( + daemon_url: Option<&str>, + action: GraphLifecycleAction, +) -> anyhow::Result<&str> { + daemon_url.ok_or_else(|| { + anyhow::anyhow!( + "Gobby daemon URL is not configured. `{}` requires the Gobby daemon.", + action.cli_command() + ) + }) +} + +pub(crate) fn build_lifecycle_url( + base_url: &str, + action: GraphLifecycleAction, + project_id: &str, +) -> anyhow::Result { + let base = base_url.trim_end_matches('/'); + let mut url = reqwest::Url::parse(&format!("{base}{}", action.endpoint_path())) + .with_context(|| format!("invalid Gobby daemon URL: {base_url}"))?; + url.query_pairs_mut().append_pair("project_id", project_id); + Ok(url) +} + +fn compact_detail(body: &str) -> String { + let detail = body.split_whitespace().collect::>().join(" "); + let detail = detail.trim(); + if detail.len() > 240 { + format!("{}...", &detail[..237]) + } else { + detail.to_string() + } +} + +pub(crate) fn format_http_error( + action: GraphLifecycleAction, + url: &reqwest::Url, + status: StatusCode, + body: &str, +) -> String { + let detail = compact_detail(body); + if detail.is_empty() { + format!( + "`{}` failed: daemon returned HTTP {status} from {url}", + action.cli_command() + ) + } else { + format!( + "`{}` failed: daemon returned HTTP {status} from {url}: {detail}", + action.cli_command() + ) + } +} + +pub(crate) fn parse_success_payload( + action: GraphLifecycleAction, + status: StatusCode, + body: &str, +) -> anyhow::Result { + serde_json::from_str(body).map_err(|err| { + let detail = compact_detail(body); + if detail.is_empty() { + anyhow::anyhow!( + "`{}` failed: daemon returned HTTP {status} with invalid JSON: {err}", + action.cli_command() + ) + } else { + anyhow::anyhow!( + "`{}` failed: daemon returned HTTP {status} with invalid JSON: {err}. 
Response: {detail}", + action.cli_command() + ) + } + }) +} + +pub(crate) fn extract_summary_text(payload: &Value) -> Option { + match payload { + Value::String(text) => { + let text = text.trim(); + (!text.is_empty()).then(|| text.to_string()) + } + Value::Object(map) => ["summary", "message", "detail", "status"] + .iter() + .find_map(|key| map.get(*key).and_then(Value::as_str)) + .map(str::trim) + .filter(|text| !text.is_empty()) + .map(ToOwned::to_owned), + _ => None, + } +} + +pub fn run_lifecycle_action( + request: &GraphLifecycleRequest, + action: GraphLifecycleAction, +) -> anyhow::Result { + let daemon_url = require_daemon_url(request.daemon_url.as_deref(), action)?; + let url = build_lifecycle_url(daemon_url, action, &request.project_id)?; + let client = reqwest::blocking::Client::builder() + .timeout(std::time::Duration::from_secs(15)) + .build() + .context("failed to build HTTP client")?; + + let response = client + .post(url.clone()) + .header("Accept", "application/json") + .send() + .with_context(|| { + format!( + "Failed to reach Gobby daemon at {daemon_url} for `{}`", + action.cli_command() + ) + })?; + + let status = response.status(); + let body = response.text().unwrap_or_default(); + if !status.is_success() { + anyhow::bail!("{}", format_http_error(action, &url, status, &body)); + } + + let payload = parse_success_payload(action, status, &body)?; + let summary = extract_summary_text(&payload).unwrap_or_else(|| payload.to_string()); + Ok(GraphLifecycleOutput { + project_id: request.project_id.clone(), + action, + summary, + payload, + }) +} + +pub(crate) fn row_to_graph_result(row: &Row) -> GraphResult { + GraphResult { + id: row + .get("caller_id") + .or_else(|| row.get("callee_id")) + .or_else(|| row.get("source_id")) + .or_else(|| row.get("node_id")) + .or_else(|| row.get("symbol_id")) + .or_else(|| row.get("id")) + .and_then(|v| v.as_str()) + .unwrap_or("") + .to_string(), + name: row + .get("caller_name") + .or_else(|| 
row.get("callee_name")) + .or_else(|| row.get("source_name")) + .or_else(|| row.get("node_name")) + .or_else(|| row.get("symbol_name")) + .or_else(|| row.get("name")) + .or_else(|| row.get("module_name")) + .and_then(|v| v.as_str()) + .unwrap_or("") + .to_string(), + file_path: row + .get("file") + .or_else(|| row.get("file_path")) + .and_then(|v| v.as_str()) + .unwrap_or("") + .to_string(), + line: row.get("line").and_then(|v| v.as_u64()).unwrap_or(0) as usize, + relation: row + .get("relation") + .or_else(|| row.get("rel_type")) + .and_then(|v| v.as_str()) + .map(String::from), + distance: row + .get("distance") + .and_then(|v| v.as_u64()) + .map(|d| d as usize), + metadata: row_to_projection_metadata(row), + } +} + +pub fn extracted_code_edge_metadata( + file_path: impl Into, + line: usize, + source_symbol_id: Option<&str>, +) -> ProjectionMetadata { + let mut metadata = ProjectionMetadata::gcode_extracted() + .with_source_file_path(file_path) + .with_source_line(line); + if let Some(source_symbol_id) = source_symbol_id { + metadata = metadata.with_source_symbol_id(source_symbol_id); + } + metadata +} + +fn row_to_projection_metadata(row: &Row) -> Option { + let provenance = row + .get("provenance") + .and_then(|v| v.as_str()) + .and_then(ProjectionProvenance::from_wire_value)?; + let source_system = row.get("source_system").and_then(|v| v.as_str())?; + + let mut metadata = ProjectionMetadata::new(provenance, source_system); + metadata.confidence = row.get("confidence").and_then(|v| v.as_f64()); + metadata.source_file_path = row_string(row, &["metadata_source_file_path"]); + metadata.source_line = row + .get("source_line") + .or_else(|| row.get("line")) + .and_then(|v| v.as_u64()) + .map(|line| line as usize); + metadata.source_symbol_id = row + .get("source_symbol_id") + .or_else(|| row.get("caller_id")) + .or_else(|| row.get("source_id")) + .and_then(|v| v.as_str()) + .map(ToOwned::to_owned); + metadata.matching_method = row + .get("matching_method") + 
.and_then(|v| v.as_str()) + .map(ToOwned::to_owned); + Some(metadata) +} + +fn row_string(row: &Row, keys: &[&str]) -> Option { + row_string_owned(row, keys) +} + +fn row_string_owned(row: &Row, keys: &[&str]) -> Option { + keys.iter() + .find_map(|key| row.get(*key).and_then(|value| value.as_str())) + .filter(|value| !value.is_empty()) + .map(ToOwned::to_owned) +} + +fn row_usize(row: &Row, keys: &[&str]) -> Option { + row_usize_owned(row, keys) +} + +fn row_usize_owned(row: &Row, keys: &[&str]) -> Option { + keys.iter() + .find_map(|key| row.get(*key)) + .and_then(|value| { + value + .as_u64() + .or_else(|| value.as_i64().and_then(|value| value.try_into().ok())) + }) + .map(|value| value as usize) +} + +fn add_link_from_row(payload: &mut GraphPayload, row: &Row) { + let link = GraphLink::from_row(row); + if link.source.is_empty() || link.target.is_empty() { + return; + } + payload.links.push(link); +} + +fn add_node_from_row(payload: &mut GraphPayload, row: &Row, default_type: &str) { + if let Some(node) = GraphNode::from_row(row, default_type) { + payload.push_node(node); + } +} + +fn add_prefixed_node_from_row( + payload: &mut GraphPayload, + row: &Row, + prefix: &str, + default_type: &str, +) { + if let Some(node) = GraphNode::from_prefixed_row(row, prefix, default_type) { + payload.push_node(node); + } +} + +fn clamp_limit(limit: usize) -> usize { + typed_query::clamp_limit(limit, MAX_GRAPH_LIMIT) +} + +fn clamp_offset(offset: usize) -> usize { + typed_query::clamp_offset(offset, MAX_GRAPH_LIMIT) +} + +pub(crate) fn count_callers_query( + project_id: &str, + symbol_id: &str, +) -> (String, HashMap) { + ( + format!( + "MATCH (caller:CodeSymbol {{project: $project}})-[:CALLS]->(target {{id: $id, project: $project}}) \ + WHERE {CALL_TARGET_PREDICATE} \ + RETURN count(caller) AS cnt" + ), + typed_query::string_params(&[("project", project_id), ("id", symbol_id)]), + ) +} + +pub(crate) fn count_usages_query( + project_id: &str, + symbol_id: &str, +) -> (String, 
HashMap) { + ( + format!( + "MATCH (source:CodeSymbol {{project: $project}})-[r:CALLS]->(target {{id: $id, project: $project}}) \ + WHERE {CALL_TARGET_PREDICATE} \ + RETURN count(source) AS cnt" + ), + typed_query::string_params(&[("project", project_id), ("id", symbol_id)]), + ) +} + +pub(crate) fn find_callers_query( + project_id: &str, + symbol_id: &str, + offset: usize, + limit: usize, +) -> (String, HashMap) { + let offset = clamp_offset(offset); + let limit = clamp_limit(limit); + ( + format!( + "MATCH (caller:CodeSymbol {{project: $project}})-[r:CALLS]->(target {{id: $id, project: $project}}) \ + WHERE {CALL_TARGET_PREDICATE} \ + RETURN caller.id AS caller_id, caller.name AS caller_name, \ + r.file AS file, r.line AS line \ + SKIP {offset} LIMIT {limit}" + ), + typed_query::string_params(&[("project", project_id), ("id", symbol_id)]), + ) +} + +pub(crate) fn find_usages_query( + project_id: &str, + symbol_id: &str, + offset: usize, + limit: usize, +) -> (String, HashMap) { + let offset = clamp_offset(offset); + let limit = clamp_limit(limit); + ( + format!( + "MATCH (source:CodeSymbol {{project: $project}})-[r:CALLS]->(target {{id: $id, project: $project}}) \ + WHERE {CALL_TARGET_PREDICATE} \ + RETURN source.id AS source_id, source.name AS source_name, \ + 'CALLS' AS rel_type, r.file AS file, r.line AS line \ + SKIP {offset} LIMIT {limit}" + ), + typed_query::string_params(&[("project", project_id), ("id", symbol_id)]), + ) +} + +pub(crate) fn find_callers_batch_query( + project_id: &str, + symbol_ids: &[String], + limit: usize, +) -> (String, HashMap) { + let limit = clamp_limit(limit); + let ids = typed_query::id_list_literal(symbol_ids); + ( + format!( + "MATCH (caller:CodeSymbol {{project: $project}})-[r:CALLS]->(target {{project: $project}}) \ + WHERE ({CALL_TARGET_PREDICATE}) AND target.id IN [{ids}] \ + RETURN caller.id AS caller_id, caller.name AS caller_name, \ + r.file AS file, r.line AS line \ + LIMIT {limit}" + ), + 
typed_query::string_params(&[("project", project_id)]), + ) +} + +pub(crate) fn find_callees_batch_query( + project_id: &str, + symbol_ids: &[String], + limit: usize, +) -> (String, HashMap) { + let limit = clamp_limit(limit); + let ids = typed_query::id_list_literal(symbol_ids); + ( + format!( + "MATCH (src:CodeSymbol {{project: $project}})-[r:CALLS]->(target {{project: $project}}) \ + WHERE src.id IN [{ids}] AND ({CALL_TARGET_PREDICATE}) \ + RETURN target.id AS callee_id, target.name AS callee_name, \ + r.file AS file, r.line AS line \ + LIMIT {limit}" + ), + typed_query::string_params(&[("project", project_id)]), + ) +} + +pub(crate) fn get_imports_query( + project_id: &str, + file_path: &str, +) -> (String, HashMap) { + ( + "MATCH (f:CodeFile {path: $path, project: $project})-[:IMPORTS]->(m:CodeModule) \ + RETURN m.name AS module_name" + .to_string(), + typed_query::string_params(&[("project", project_id), ("path", file_path)]), + ) +} + +pub(crate) fn blast_radius_query(depth: usize, limit: usize) -> String { + let depth = depth.clamp(1, 5); + let limit = clamp_limit(limit); + format!( + "MATCH (target {{id: $id, project: $project}}) \ + WHERE {CALL_TARGET_PREDICATE} \ + MATCH path = (affected:CodeSymbol {{project: $project}})-[:CALLS*1..{depth}]->(target) \ + WITH affected, min(length(path)) AS distance \ + OPTIONAL MATCH (file:CodeFile {{project: $project}})-[:DEFINES]->(affected) \ + RETURN DISTINCT affected.id AS node_id, \ + affected.name AS node_name, \ + affected.kind AS kind, file.path AS file_path, \ + affected.line_start AS line, \ + distance, 'call' AS rel_type \ + ORDER BY distance ASC, affected.name ASC \ + LIMIT {limit}" + ) +} + +fn project_overview_files_query( + project_id: &str, + limit: usize, +) -> (String, HashMap) { + let limit = clamp_limit(limit); + ( + format!( + "MATCH (f:CodeFile {{project: $project}}) \ + OPTIONAL MATCH (f)-[:DEFINES]->(s:CodeSymbol) \ + WITH f, count(DISTINCT s) AS sym_count \ + OPTIONAL MATCH 
(f)-[:IMPORTS]->(m:CodeModule) \ + WITH f, sym_count, count(m) AS imp_count \ + RETURN f.path AS id, f.path AS name, 'file' AS type, \ + f.path AS file_path, sym_count AS symbol_count \ + ORDER BY imp_count DESC, sym_count DESC, f.path \ + LIMIT {limit}" + ), + typed_query::string_params(&[("project", project_id)]), + ) +} + +fn project_overview_imports_query( + project_id: &str, + file_paths: &[String], + limit: usize, +) -> (String, HashMap) { + let limit = clamp_limit(limit); + let file_paths = typed_query::id_list_literal(file_paths); + ( + format!( + "MATCH (f:CodeFile {{project: $project}})-[r:IMPORTS]->(m:CodeModule {{project: $project}}) \ + WHERE f.path IN [{file_paths}] \ + RETURN f.path AS source, m.name AS target, 'IMPORTS' AS type, {LINK_METADATA_RETURN} \ + LIMIT {limit}" + ), + typed_query::string_params(&[("project", project_id)]), + ) +} + +fn project_overview_defines_query( + project_id: &str, + file_paths: &[String], + limit: usize, +) -> (String, HashMap) { + let limit = clamp_limit(limit); + let file_paths = typed_query::id_list_literal(file_paths); + ( + format!( + "MATCH (f:CodeFile {{project: $project}})-[r:DEFINES]->(s:CodeSymbol {{project: $project}}) \ + WHERE f.path IN [{file_paths}] \ + RETURN f.path AS source, s.id AS target, 'DEFINES' AS type, \ + s.name AS symbol_name, s.kind AS symbol_kind, \ + s.file_path AS symbol_file_path, s.line_start AS line_start, \ + {LINK_METADATA_RETURN} \ + LIMIT {limit}" + ), + typed_query::string_params(&[("project", project_id)]), + ) +} + +fn project_overview_calls_query( + project_id: &str, + file_paths: &[String], + limit: usize, +) -> (String, HashMap) { + let limit = clamp_limit(limit); + let file_paths = typed_query::id_list_literal(file_paths); + ( + format!( + "MATCH (f:CodeFile {{project: $project}})-[:DEFINES]->(s:CodeSymbol {{project: $project}})-[r:CALLS]->(target {{project: $project}}) \ + WHERE f.path IN [{file_paths}] AND ({CALL_TARGET_PREDICATE}) \ + RETURN s.id AS source, target.id AS 
target, 'CALLS' AS type, \ + target.name AS target_name, {TARGET_TYPE_CASE} AS target_type, \ + target.kind AS target_kind, target.file_path AS target_file_path, \ + target.line_start AS target_line_start, r.line AS line, \ + {LINK_METADATA_RETURN} \ + LIMIT {limit}" + ), + typed_query::string_params(&[("project", project_id)]), + ) +} + +fn file_symbols_query(project_id: &str, file_path: &str) -> (String, HashMap) { + ( + format!( + "MATCH (:CodeFile {{path: $path, project: $project}})-[r:DEFINES]->(s:CodeSymbol {{project: $project}}) \ + RETURN s.id AS id, s.name AS name, coalesce(s.kind, 'function') AS type, \ + s.kind AS kind, s.file_path AS file_path, \ + s.line_start AS line_start, s.signature AS signature, \ + {LINK_METADATA_RETURN}" + ), + typed_query::string_params(&[("project", project_id), ("path", file_path)]), + ) +} + +fn file_calls_query(project_id: &str, file_path: &str) -> (String, HashMap) { + ( + format!( + "MATCH (source:CodeSymbol {{project: $project}})-[r:CALLS]->(target {{project: $project}}) \ + WHERE ({CALL_TARGET_PREDICATE}) \ + AND (source.file_path = $path OR (target:CodeSymbol AND target.file_path = $path)) \ + RETURN source.id AS source_id, source.name AS source_name, \ + coalesce(source.kind, 'function') AS source_type, \ + source.kind AS source_kind, source.file_path AS source_file_path, \ + source.line_start AS source_line_start, source.signature AS source_signature, \ + target.id AS target_id, target.name AS target_name, \ + {TARGET_TYPE_CASE} AS target_type, target.kind AS target_kind, \ + target.file_path AS target_file_path, \ + target.line_start AS target_line_start, target.signature AS target_signature, \ + source.id AS source, target.id AS target, 'CALLS' AS type, r.line AS line, \ + {LINK_METADATA_RETURN}" + ), + typed_query::string_params(&[("project", project_id), ("path", file_path)]), + ) +} + +fn symbol_neighbors_query( + project_id: &str, + symbol_id: &str, + limit: usize, +) -> (String, HashMap) { + let limit = 
clamp_limit(limit); + ( + format!( + "MATCH (center {{id: $id, project: $project}}) \ + WHERE center:CodeSymbol OR center:UnresolvedCallee OR center:ExternalSymbol \ + MATCH (center)-[r:CALLS]-(neighbor {{project: $project}}) \ + WHERE {NEIGHBOR_PREDICATE} \ + RETURN neighbor.id AS id, neighbor.name AS name, {NEIGHBOR_TYPE_CASE} AS type, \ + neighbor.kind AS kind, neighbor.file_path AS file_path, \ + neighbor.line_start AS line_start, neighbor.signature AS signature, \ + CASE WHEN startNode(r) = center THEN 'outgoing' ELSE 'incoming' END AS direction, \ + r.line AS line, {LINK_METADATA_RETURN} \ + LIMIT {limit}" + ), + typed_query::string_params(&[("project", project_id), ("id", symbol_id)]), + ) +} + +fn blast_radius_center_query( + project_id: &str, + symbol_id: &str, +) -> (String, HashMap) { + ( + format!( + "MATCH (n {{id: $id, project: $project}}) \ + WHERE n:CodeSymbol OR n:UnresolvedCallee OR n:ExternalSymbol \ + RETURN n.id AS id, n.name AS name, {NODE_TYPE_CASE} AS type, \ + n.kind AS kind, n.file_path AS file_path \ + LIMIT 1" + ), + typed_query::string_params(&[("project", project_id), ("id", symbol_id)]), + ) +} + +fn blast_radius_file_call_query( + project_id: &str, + file_path: &str, + depth: usize, + limit: usize, +) -> (String, HashMap) { + let depth = depth.clamp(1, 5); + let limit = clamp_limit(limit); + ( + format!( + "MATCH (tf:CodeFile {{path: $path, project: $project}})-[:DEFINES]->(target_sym:CodeSymbol {{project: $project}}) \ + MATCH path = (affected:CodeSymbol {{project: $project}})-[:CALLS*1..{depth}]->(target_sym) \ + WITH affected, min(length(path)) AS distance \ + OPTIONAL MATCH (file:CodeFile {{project: $project}})-[:DEFINES]->(affected) \ + RETURN DISTINCT affected.id AS node_id, \ + affected.name AS node_name, \ + affected.kind AS kind, file.path AS file_path, \ + affected.line_start AS line, distance, 'call' AS rel_type, \ + coalesce(affected.kind, 'function') AS node_type \ + ORDER BY distance ASC, affected.name ASC \ + LIMIT 
{limit}" + ), + typed_query::string_params(&[("project", project_id), ("path", file_path)]), + ) +} + +fn blast_radius_file_import_query( + project_id: &str, + file_path: &str, + depth: usize, + limit: usize, +) -> (String, HashMap) { + let depth = depth.clamp(1, 5); + let limit = clamp_limit(limit); + ( + format!( + "MATCH (tf:CodeFile {{path: $path, project: $project}})-[:IMPORTS]->(m:CodeModule {{project: $project}}) \ + MATCH path = (importer:CodeFile {{project: $project}})-[:IMPORTS*1..{depth}]->(m) \ + WHERE importer.path <> $path \ + WITH importer, min(length(path)) AS distance \ + RETURN DISTINCT importer.path AS node_id, \ + importer.path AS node_name, NULL AS kind, importer.path AS file_path, \ + NULL AS line, distance, 'import' AS rel_type, 'file' AS node_type \ + ORDER BY distance ASC \ + LIMIT {limit}" + ), + typed_query::string_params(&[("project", project_id), ("path", file_path)]), + ) +} + +fn count_from_rows(rows: &[Row]) -> usize { + rows.first() + .and_then(|r| r.get("cnt")) + .and_then(|v| { + v.as_u64() + .or_else(|| v.as_i64().and_then(|value| value.try_into().ok())) + }) + .unwrap_or(0) as usize +} + +pub fn require_graph_reads(ctx: &Context) -> anyhow::Result<()> { + if ctx.falkordb.is_none() { + return Err(GraphReadError::NotConfigured.into()); + } + Ok(()) +} + +fn with_required_core_graph( + ctx: &Context, + f: impl FnOnce(&mut GraphClient) -> anyhow::Result, +) -> anyhow::Result { + let config = ctx.falkordb.as_ref().ok_or(GraphReadError::NotConfigured)?; + let connection_config = config.connection_config(); + match gobby_core::falkor::with_graph( + Some(&connection_config), + &config.graph_name, + None, + |client| f(client).map(Some), + ) { + Ok((Some(value), ServiceState::Available)) => Ok(value), + Ok((_, ServiceState::NotConfigured)) => Err(GraphReadError::NotConfigured.into()), + Ok((_, ServiceState::Unreachable { message })) => { + Err(GraphReadError::Unreachable { message }.into()) + } + Ok((None, ServiceState::Available)) => 
Err(GraphReadError::QueryFailed { + message: "graph read returned no value".to_string(), + } + .into()), + Err(error) => Err(GraphReadError::QueryFailed { + message: error.to_string(), + } + .into()), + } +} + +pub fn project_overview_graph(ctx: &Context, limit: usize) -> anyhow::Result { + with_required_core_graph(ctx, |client| { + let limit = clamp_limit(limit); + let link_limit = clamp_limit(limit.saturating_mul(4)); + let max_nodes = limit.saturating_mul(8); + + let (query, params) = project_overview_files_query(&ctx.project_id, limit); + let file_rows = client.query(&query, Some(params))?; + let mut payload = GraphPayload::default(); + for row in &file_rows { + add_node_from_row(&mut payload, row, "file"); + } + + let file_paths = payload + .nodes + .iter() + .filter(|node| node.node_type == "file") + .map(|node| node.id.clone()) + .collect::>(); + if file_paths.is_empty() { + return Ok(payload); + } + + let (query, params) = + project_overview_imports_query(&ctx.project_id, &file_paths, link_limit); + for row in client.query(&query, Some(params))? { + add_link_from_row(&mut payload, &row); + if let Some(module_id) = row_string(&row, &["target"]) { + payload.push_node(GraphNode::new(module_id.clone(), module_id, "module")); + } + if payload.nodes.len() >= max_nodes { + break; + } + } + + let (query, params) = + project_overview_defines_query(&ctx.project_id, &file_paths, link_limit); + for row in client.query(&query, Some(params))? 
{ + add_link_from_row(&mut payload, &row); + if let Some(symbol_id) = row_string(&row, &["target"]) { + let mut node = GraphNode::new( + symbol_id.clone(), + row_string(&row, &["symbol_name"]).unwrap_or(symbol_id), + row_string(&row, &["symbol_kind"]).unwrap_or_else(|| "function".to_string()), + ); + node.kind = row_string(&row, &["symbol_kind"]); + node.file_path = row_string(&row, &["symbol_file_path", "source"]); + node.line_start = row_usize(&row, &["line_start"]); + payload.push_node(node); + } + if payload.nodes.len() >= max_nodes { + break; + } + } + + let (query, params) = + project_overview_calls_query(&ctx.project_id, &file_paths, link_limit); + for row in client.query(&query, Some(params))? { + add_link_from_row(&mut payload, &row); + if let Some(target_id) = row_string(&row, &["target"]) { + let mut node = GraphNode::new( + target_id.clone(), + row_string(&row, &["target_name"]).unwrap_or(target_id), + row_string(&row, &["target_type"]).unwrap_or_else(|| "unresolved".to_string()), + ); + node.kind = row_string(&row, &["target_kind"]); + node.file_path = row_string(&row, &["target_file_path"]); + node.line_start = row_usize(&row, &["target_line_start"]); + payload.push_node(node); + } + if payload.nodes.len() >= max_nodes { + break; + } + } + + Ok(payload) + }) +} + +pub fn file_graph(ctx: &Context, file_path: &str) -> anyhow::Result { + with_required_core_graph(ctx, |client| { + let mut payload = GraphPayload::default(); + let (query, params) = file_symbols_query(&ctx.project_id, file_path); + for row in client.query(&query, Some(params))? { + add_node_from_row(&mut payload, &row, "function"); + if let Some(symbol_id) = row_string(&row, &["id"]) { + let mut link = GraphLink::new(file_path, symbol_id, "DEFINES"); + link.metadata = row_to_projection_metadata(&row); + payload.links.push(link); + } + } + + let (query, params) = file_calls_query(&ctx.project_id, file_path); + for row in client.query(&query, Some(params))? 
{ + add_prefixed_node_from_row(&mut payload, &row, "source", "function"); + add_prefixed_node_from_row(&mut payload, &row, "target", "unresolved"); + add_link_from_row(&mut payload, &row); + } + + Ok(payload) + }) +} + +pub fn symbol_neighbors( + ctx: &Context, + symbol_id: &str, + limit: usize, +) -> anyhow::Result { + with_required_core_graph(ctx, |client| { + let (query, params) = symbol_neighbors_query(&ctx.project_id, symbol_id, limit); + let rows = client.query(&query, Some(params))?; + let mut payload = GraphPayload::default(); + + for row in rows { + add_node_from_row(&mut payload, &row, "unresolved"); + let Some(neighbor_id) = row_string(&row, &["id"]) else { + continue; + }; + let direction = row_string(&row, &["direction"]).unwrap_or_default(); + let mut link = if direction == "outgoing" { + GraphLink::new(symbol_id, neighbor_id, "CALLS") + } else { + GraphLink::new(neighbor_id, symbol_id, "CALLS") + }; + link.line = row_usize(&row, &["line"]); + link.metadata = row_to_projection_metadata(&row); + payload.links.push(link); + } + + Ok(payload) + }) +} + +pub fn blast_radius_graph( + ctx: &Context, + target: GraphBlastRadiusTarget, + depth: usize, + limit: usize, +) -> anyhow::Result { + with_required_core_graph(ctx, |client| { + let (center_id, mut center_node, rows) = match target { + GraphBlastRadiusTarget::SymbolId(symbol_id) => { + let (query, params) = blast_radius_center_query(&ctx.project_id, &symbol_id); + let center_rows = client.query(&query, Some(params))?; + let center_node = center_rows + .first() + .and_then(|row| GraphNode::from_row(row, "function")) + .unwrap_or_else(|| GraphNode::new(&symbol_id, &symbol_id, "function")); + + let query = blast_radius_query(depth, limit); + let params = + typed_query::string_params(&[("project", &ctx.project_id), ("id", &symbol_id)]); + (symbol_id, center_node, client.query(&query, Some(params))?) 
+ } + GraphBlastRadiusTarget::FilePath(file_path) => { + let mut rows = vec![]; + let (query, params) = + blast_radius_file_call_query(&ctx.project_id, &file_path, depth, limit); + rows.extend(client.query(&query, Some(params))?); + let (query, params) = + blast_radius_file_import_query(&ctx.project_id, &file_path, depth, limit); + rows.extend(client.query(&query, Some(params))?); + ( + file_path.clone(), + GraphNode::new(&file_path, &file_path, "file"), + rows, + ) + } + }; + + center_node.blast_distance = Some(0); + let mut payload = GraphPayload::with_center(center_id.clone()); + payload.push_node(center_node); + + for row in rows { + let Some(node_id) = row_string(&row, &["node_id"]) else { + continue; + }; + let mut node = GraphNode::new( + node_id.clone(), + row_string(&row, &["node_name"]).unwrap_or_else(|| node_id.clone()), + row_string(&row, &["node_type"]).unwrap_or_else(|| "function".to_string()), + ); + node.kind = row_string(&row, &["kind"]); + node.file_path = row_string(&row, &["file_path"]); + node.line_start = row_usize(&row, &["line"]); + node.blast_distance = row_usize(&row, &["distance"]); + payload.push_node(node); + + let relation = row_string(&row, &["rel_type"]).unwrap_or_else(|| "call".to_string()); + let mut link = GraphLink::new( + node_id, + &center_id, + if relation == "call" { + "CALLS" + } else { + "IMPORTS" + }, + ); + link.distance = row_usize(&row, &["distance"]); + link.metadata = row_to_projection_metadata(&row); + payload.links.push(link); + } + + Ok(payload) + }) +} + +pub fn count_callers(ctx: &Context, symbol_id: &str) -> anyhow::Result<usize> { + with_required_core_graph(ctx, |client| { + let (query, params) = count_callers_query(&ctx.project_id, symbol_id); + let rows = client.query(&query, Some(params))?; + Ok(count_from_rows(&rows)) + }) +} + +pub fn count_usages(ctx: &Context, symbol_id: &str) -> anyhow::Result<usize> { + with_required_core_graph(ctx, |client| { + let (query, params) = count_usages_query(&ctx.project_id, symbol_id); + let
rows = client.query(&query, Some(params))?; + Ok(count_from_rows(&rows)) + }) +} + +pub fn find_callers( + ctx: &Context, + symbol_id: &str, + offset: usize, + limit: usize, +) -> anyhow::Result> { + with_required_core_graph(ctx, |client| { + let (query, params) = find_callers_query(&ctx.project_id, symbol_id, offset, limit); + let rows = client.query(&query, Some(params))?; + Ok(rows.iter().map(row_to_graph_result).collect()) + }) +} + +pub fn find_usages( + ctx: &Context, + symbol_id: &str, + offset: usize, + limit: usize, +) -> anyhow::Result> { + with_required_core_graph(ctx, |client| { + let (query, params) = find_usages_query(&ctx.project_id, symbol_id, offset, limit); + let rows = client.query(&query, Some(params))?; + Ok(rows.iter().map(row_to_graph_result).collect()) + }) +} + +pub fn find_callers_batch( + ctx: &Context, + symbol_ids: &[String], + limit: usize, +) -> anyhow::Result> { + if symbol_ids.is_empty() { + return Ok(vec![]); + } + with_required_core_graph(ctx, |client| { + let (query, params) = find_callers_batch_query(&ctx.project_id, symbol_ids, limit); + let rows = client.query(&query, Some(params))?; + Ok(rows.iter().map(row_to_graph_result).collect()) + }) +} + +pub fn find_callees_batch( + ctx: &Context, + symbol_ids: &[String], + limit: usize, +) -> anyhow::Result> { + if symbol_ids.is_empty() { + return Ok(vec![]); + } + with_required_core_graph(ctx, |client| { + let (query, params) = find_callees_batch_query(&ctx.project_id, symbol_ids, limit); + let rows = client.query(&query, Some(params))?; + Ok(rows.iter().map(row_to_graph_result).collect()) + }) +} + +pub fn get_imports(ctx: &Context, file_path: &str) -> anyhow::Result> { + with_required_core_graph(ctx, |client| { + let (query, params) = get_imports_query(&ctx.project_id, file_path); + let rows = client.query(&query, Some(params))?; + Ok(rows.iter().map(row_to_graph_result).collect()) + }) +} + +pub fn blast_radius( + ctx: &Context, + symbol_id: &str, + depth: usize, +) -> 
anyhow::Result> { + with_required_core_graph(ctx, |client| { + let query = blast_radius_query(depth, MAX_GRAPH_LIMIT); + let params = typed_query::string_params(&[("project", &ctx.project_id), ("id", symbol_id)]); + let rows = client.query(&query, Some(params))?; + Ok(rows.iter().map(row_to_graph_result).collect()) + }) +} + +#[cfg(test)] +mod tests { + use super::*; + use crate::models::{ProjectionProvenance, SOURCE_SYSTEM_GCODE}; + use serde_json::json; + + #[test] + fn code_edges_carry_provenance() { + let metadata = extracted_code_edge_metadata("src/lib.rs", 42, Some("caller-1")); + + assert_eq!(metadata.provenance, ProjectionProvenance::Extracted); + assert_eq!(metadata.confidence, Some(1.0)); + assert_eq!(metadata.source_system, SOURCE_SYSTEM_GCODE); + assert_eq!(metadata.source_file_path.as_deref(), Some("src/lib.rs")); + assert_eq!(metadata.source_line, Some(42)); + assert_eq!(metadata.source_symbol_id.as_deref(), Some("caller-1")); + } + + #[test] + fn read_apis_return_node_link_payloads_with_link_metadata() { + let mut payload = GraphPayload::default(); + payload.push_node(GraphNode::new("src/lib.rs", "src/lib.rs", "file")); + + let link_row = Row::from([ + ("source".to_string(), json!("src/lib.rs")), + ("target".to_string(), json!("symbol-1")), + ("type".to_string(), json!("DEFINES")), + ("line".to_string(), json!(12)), + ("provenance".to_string(), json!("EXTRACTED")), + ("confidence".to_string(), json!(1.0)), + ("source_system".to_string(), json!("gcode")), + ("source_file_path".to_string(), json!("src/lib.rs")), + ("source_line".to_string(), json!(12)), + ("source_symbol_id".to_string(), json!("symbol-1")), + ]); + payload.links.push(GraphLink::from_row(&link_row)); + + let encoded = serde_json::to_value(&payload).expect("payload serializes"); + + assert_eq!(encoded["nodes"][0]["id"], "src/lib.rs"); + assert_eq!(encoded["nodes"][0]["type"], "file"); + assert_eq!(encoded["links"][0]["source"], "src/lib.rs"); + assert_eq!(encoded["links"][0]["target"], 
"symbol-1"); + assert_eq!(encoded["links"][0]["type"], "DEFINES"); + assert_eq!(encoded["links"][0]["metadata"]["provenance"], "EXTRACTED"); + assert_eq!(encoded["links"][0]["metadata"]["source_system"], "gcode"); + } + + #[test] + fn file_calls_query_keeps_node_and_metadata_source_paths_distinct() { + let (query, _) = file_calls_query("project-1", "src/lib.rs"); + + assert!(query.contains("source.file_path AS source_file_path")); + assert!(query.contains("r.source_file_path AS metadata_source_file_path")); + assert!(!query.contains("r.source_file_path AS source_file_path")); + } + + #[test] + fn projection_metadata_uses_only_metadata_source_file_path() { + let row = Row::from([ + ("provenance".to_string(), json!("EXTRACTED")), + ("source_system".to_string(), json!("gcode")), + ("source_file_path".to_string(), json!("src/node.rs")), + ( + "metadata_source_file_path".to_string(), + json!("src/edge.rs"), + ), + ]); + + let metadata = row_to_projection_metadata(&row).expect("metadata"); + + assert_eq!(metadata.source_file_path.as_deref(), Some("src/edge.rs")); + } + + #[test] + fn projection_metadata_does_not_fallback_to_node_source_file_path() { + let row = Row::from([ + ("provenance".to_string(), json!("EXTRACTED")), + ("source_system".to_string(), json!("gcode")), + ("source_file_path".to_string(), json!("src/node.rs")), + ]); + + let metadata = row_to_projection_metadata(&row).expect("metadata"); + + assert_eq!(metadata.source_file_path, None); + } + + #[test] + fn delete_preserves_current_symbols() { + let current_ids = vec!["symbol-current".to_string()]; + let queries = + delete_file_graph_queries("project-1", "src/lib.rs", &current_ids).expect("queries"); + + let combined = queries + .iter() + .map(|query| query.cypher.as_str()) + .collect::<Vec<_>>() + .join("\n"); + + assert!( + combined.contains( + "MATCH (s:CodeSymbol {project: $project, file_path: $file_path})-[r:CALLS]->()" + ), + "{combined}" + ); + assert!( + combined.contains("WHERE NOT s.id IN $symbol_ids"), +
"{combined}" + ); + assert!( + !combined.contains( + "MATCH (s:CodeSymbol {project: $project, file_path: $file_path})\n DETACH DELETE s" + ), + "{combined}" + ); + + let stale_symbol_cleanup = queries + .iter() + .find(|query| query.cypher.contains("WHERE NOT s.id IN $symbol_ids")) + .expect("stale symbol cleanup query"); + assert_eq!( + stale_symbol_cleanup + .params + .get("symbol_ids") + .map(String::as_str), + Some("['symbol-current']") + ); + } + + #[test] + fn cleanup_orphans_is_project_scoped() { + let queries = cleanup_orphans_queries("project-1").expect("queries"); + assert_eq!(queries.len(), 3); + + for query in &queries { + assert_eq!( + query.params.get("project").map(String::as_str), + Some("'project-1'") + ); + assert!( + query.cypher.contains("{project: $project}"), + "{}", + query.cypher + ); + } + + assert!( + queries[0] + .cypher + .contains("MATCH (m:CodeModule {project: $project})"), + "{}", + queries[0].cypher + ); + assert!( + queries[1] + .cypher + .contains("WHERE (n:UnresolvedCallee OR n:ExternalSymbol)"), + "{}", + queries[1].cypher + ); + assert!( + queries[2] + .cypher + .contains("MATCH (s:CodeSymbol {project: $project})") + && queries[2].cypher.contains("s.file_path IS NULL") + && queries[2].cypher.contains("NOT ()-[:DEFINES]->(s)") + && queries[2].cypher.contains("NOT ()-[:CALLS]->(s)") + && queries[2].cypher.contains("NOT (s)-[:CALLS]->()"), + "{}", + queries[2].cypher + ); + } + + #[test] + fn delete_file_node_is_project_and_path_scoped() { + let query = delete_file_node_query("project-1", "src/lib.rs").expect("query"); + + assert!( + query + .cypher + .contains("MATCH (f:CodeFile {path: $file_path, project: $project})"), + "{}", + query.cypher + ); + assert!(query.cypher.contains("DETACH DELETE f"), "{}", query.cypher); + assert_eq!( + query.params.get("project").map(String::as_str), + Some("'project-1'") + ); + assert_eq!( + query.params.get("file_path").map(String::as_str), + Some("'src/lib.rs'") + ); + } + + #[test] + fn 
clear_project_is_project_scoped() { + let query = clear_project_query("project-1").expect("query"); + + assert!(query.cypher.contains("MATCH (n {project: $project})")); + assert!(query.cypher.contains("n:CodeFile")); + assert!(query.cypher.contains("n:CodeSymbol")); + assert_eq!( + query.params.get("project").map(String::as_str), + Some("'project-1'") + ); + } + + #[test] + fn clear_project_targets_only_code_index_labels() { + let query = clear_project_query("project-1").expect("query"); + + for code_label in [ + "n:CodeFile", + "n:CodeSymbol", + "n:CodeModule", + "n:UnresolvedCallee", + "n:ExternalSymbol", + ] { + assert!(query.cypher.contains(code_label), "missing {code_label}"); + } + + for memory_label in [ + "Memory", + "MemoryNode", + "MemoryGraph", + "Entity", + "Observation", + "Relationship", + "RELATES_TO_CODE", + ] { + assert!( + !query.cypher.contains(memory_label), + "code graph clear must not target memory label {memory_label}" + ); + } + } + + #[test] + fn clear_all_code_index_targets_only_code_index_labels() { + let query = clear_all_code_index_query().expect("query"); + + assert!(query.cypher.contains("MATCH (n)")); + assert!(query.cypher.contains("n:CodeFile")); + assert!(query.cypher.contains("n:CodeSymbol")); + assert!(query.cypher.contains("n:CodeModule")); + assert!(query.cypher.contains("n:UnresolvedCallee")); + assert!(query.cypher.contains("n:ExternalSymbol")); + assert!(!query.cypher.contains("config_store")); + assert!(!query.cypher.contains("MATCH (n {project: $project})")); + assert!(query.params.is_empty()); + } +} diff --git a/crates/gcode/src/graph/mod.rs b/crates/gcode/src/graph/mod.rs new file mode 100644 index 0000000..6c058d0 --- /dev/null +++ b/crates/gcode/src/graph/mod.rs @@ -0,0 +1,3 @@ +pub mod code_graph; +pub mod report; +pub mod typed_query; diff --git a/crates/gcode/src/graph/report.rs b/crates/gcode/src/graph/report.rs new file mode 100644 index 0000000..2a2fd71 --- /dev/null +++ b/crates/gcode/src/graph/report.rs @@ 
-0,0 +1,1093 @@ +use std::collections::{BTreeMap, HashMap}; +use std::fmt; +use std::time::{SystemTime, UNIX_EPOCH}; + +use gobby_core::degradation::ServiceState; +use gobby_core::falkor::{GraphClient, Row}; +use serde::{Deserialize, Serialize}; +use serde_json::Value; + +use crate::config::Context; +use crate::graph::typed_query; +use crate::models::{ProjectionMetadata, ProjectionProvenance}; + +const RELATES_TO_CODE: &str = "RELATES_TO_CODE"; +const DEFAULT_TOP_LIMIT: usize = 10; + +#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)] +pub struct BridgeEdgeHypothesis { + pub source_id: String, + pub target_symbol_id: String, + pub relation: String, + pub label: String, + pub read_only: bool, + pub metadata: ProjectionMetadata, +} + +impl BridgeEdgeHypothesis { + pub fn new( + source_id: impl Into, + target_symbol_id: impl Into, + relation: impl Into, + metadata: ProjectionMetadata, + ) -> Self { + Self { + source_id: source_id.into(), + target_symbol_id: target_symbol_id.into(), + relation: relation.into(), + label: "inferred hypothesis".to_string(), + read_only: true, + metadata: inferred_bridge_metadata(metadata), + } + } + + pub fn inferred( + source_id: impl Into, + target_symbol_id: impl Into, + relation: impl Into, + source_system: impl Into, + confidence: Option, + ) -> Self { + Self::new( + source_id, + target_symbol_id, + relation, + ProjectionMetadata::inferred(source_system, confidence), + ) + } +} + +#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)] +pub struct ProjectGraphReport { + pub project_id: String, + pub generated_at: String, + pub summary: GraphReportSummary, + pub hotspots: GraphReportHotspots, + pub unresolved_targets: Vec, + pub external_targets: Vec, + #[serde(skip_serializing_if = "Option::is_none")] + pub bridge_summary: Option, + #[serde(default, skip_serializing_if = "Vec::is_empty")] + pub bridge_edges: Vec, + #[serde(default, skip_serializing_if = "Vec::is_empty")] + pub degradation_details: Vec, + pub 
suggested_investigation_questions: Vec, + pub markdown: String, +} + +#[derive(Debug, Clone, Copy, PartialEq, Eq)] +pub struct ProjectGraphReportOptions { + pub top_n: usize, +} + +impl Default for ProjectGraphReportOptions { + fn default() -> Self { + Self { + top_n: DEFAULT_TOP_LIMIT, + } + } +} + +impl ProjectGraphReportOptions { + fn normalized(self) -> Self { + Self { + top_n: self.top_n.max(1), + } + } +} + +#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)] +pub struct GraphReportSummary { + pub node_count: usize, + pub edge_count: usize, + pub node_counts_by_type: BTreeMap, + pub code_edge_counts: BTreeMap, +} + +#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)] +pub struct GraphReportHotspots { + pub high_degree_files: Vec, + pub high_degree_symbols: Vec, + pub high_degree_modules: Vec, + pub incoming_call_hotspots: Vec, +} + +#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)] +pub struct GraphHotspot { + pub id: String, + pub name: String, + #[serde(rename = "type")] + pub node_type: String, + pub degree: usize, + pub incoming: usize, + pub outgoing: usize, + #[serde(skip_serializing_if = "Option::is_none")] + pub file_path: Option, +} + +#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)] +pub struct TargetFrequency { + pub id: String, + pub name: String, + pub count: usize, +} + +#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)] +pub struct BridgeReportSummary { + pub relation: String, + pub edge_count: usize, + pub inferred: bool, + pub read_only: bool, + pub source_system_counts: Vec, + #[serde(skip_serializing_if = "Option::is_none")] + pub confidence_range: Option, +} + +#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)] +pub struct NamedCount { + pub name: String, + pub count: usize, +} + +#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)] +pub struct ConfidenceRange { + pub min: f64, + pub max: f64, +} + +#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)] +pub 
struct ReportDegradation { + pub input: String, + pub required: bool, + pub detail: String, +} + +#[derive(Debug, Clone, PartialEq, Eq)] +pub enum ProjectGraphReportError { + GraphServiceNotConfigured, + GraphServiceUnreachable { message: String }, + GraphQueryFailed { message: String }, +} + +impl fmt::Display for ProjectGraphReportError { + fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result { + match self { + Self::GraphServiceNotConfigured => { + f.write_str("FalkorDB is not configured; project graph report requires FalkorDB") + } + Self::GraphServiceUnreachable { message } => write!( + f, + "FalkorDB is unreachable; project graph report requires FalkorDB: {message}" + ), + Self::GraphQueryFailed { message } => { + write!(f, "project graph report query failed: {message}") + } + } + } +} + +impl std::error::Error for ProjectGraphReportError {} + +#[derive(Debug, Clone, Default, PartialEq)] +struct ReportGraphSnapshot { + nodes: Vec, + code_edges: Vec, + bridge_edges: BridgeEdgeInput, +} + +#[derive(Debug, Clone, PartialEq, Eq)] +struct ReportNode { + id: String, + name: String, + node_type: String, + file_path: Option, +} + +impl ReportNode { + fn new(id: impl Into, name: impl Into, node_type: impl Into) -> Self { + Self { + id: id.into(), + name: name.into(), + node_type: node_type.into(), + file_path: None, + } + } + + #[cfg(test)] + fn with_file_path(mut self, file_path: impl Into) -> Self { + self.file_path = Some(file_path.into()); + self + } +} + +#[derive(Debug, Clone, PartialEq, Eq)] +struct ReportCodeEdge { + source: String, + target: String, + edge_type: String, +} + +impl ReportCodeEdge { + fn new( + source: impl Into, + target: impl Into, + edge_type: impl Into, + ) -> Self { + Self { + source: source.into(), + target: target.into(), + edge_type: edge_type.into(), + } + } +} + +#[derive(Debug, Clone, PartialEq)] +enum BridgeEdgeInput { + Available(Vec), + Unavailable(String), +} + +impl BridgeEdgeInput { + fn available(edges: Vec) -> Self { + 
Self::Available(edges) + } + + fn unavailable(reason: impl Into) -> Self { + Self::Unavailable(reason.into()) + } +} + +impl Default for BridgeEdgeInput { + fn default() -> Self { + Self::Available(vec![]) + } +} + +#[derive(Debug, Clone, Copy, Default)] +struct DegreeStats { + incoming: usize, + outgoing: usize, +} + +pub fn generate_report(ctx: &Context) -> Result { + generate_report_with_options(ctx, ProjectGraphReportOptions::default()) +} + +pub fn generate_report_with_options( + ctx: &Context, + options: ProjectGraphReportOptions, +) -> Result { + let Some(config) = ctx.falkordb.as_ref() else { + return Err(ProjectGraphReportError::GraphServiceNotConfigured); + }; + + let connection_config = config.connection_config(); + let result = gobby_core::falkor::with_graph( + Some(&connection_config), + &config.graph_name, + ReportGraphSnapshot::default(), + |client| load_report_snapshot(client, &ctx.project_id), + ); + + match result { + Ok((snapshot, ServiceState::Available)) => Ok(generate_report_from_snapshot_with_options( + &ctx.project_id, + now_iso8601(), + snapshot, + options, + )), + Ok((_, ServiceState::NotConfigured)) => { + Err(ProjectGraphReportError::GraphServiceNotConfigured) + } + Ok((_, ServiceState::Unreachable { message })) => { + Err(ProjectGraphReportError::GraphServiceUnreachable { message }) + } + Err(error) => Err(ProjectGraphReportError::GraphQueryFailed { + message: error.to_string(), + }), + } +} + +pub fn empty_report(project_id: impl Into) -> ProjectGraphReport { + generate_report_from_snapshot(project_id, now_iso8601(), ReportGraphSnapshot::default()) +} + +fn generate_report_from_snapshot( + project_id: impl Into, + generated_at: impl Into, + snapshot: ReportGraphSnapshot, +) -> ProjectGraphReport { + generate_report_from_snapshot_with_options( + project_id, + generated_at, + snapshot, + ProjectGraphReportOptions::default(), + ) +} + +fn generate_report_from_snapshot_with_options( + project_id: impl Into, + generated_at: impl Into, + 
snapshot: ReportGraphSnapshot, + options: ProjectGraphReportOptions, +) -> ProjectGraphReport { + let options = options.normalized(); + let project_id = project_id.into(); + let generated_at = generated_at.into(); + let node_by_id = snapshot + .nodes + .iter() + .map(|node| (node.id.as_str(), node)) + .collect::<HashMap<_, _>>(); + + let summary = summarize_graph(&snapshot.nodes, &snapshot.code_edges); + let hotspots = summarize_hotspots(&snapshot.nodes, &snapshot.code_edges, options.top_n); + let unresolved_targets = target_frequencies( + &snapshot.code_edges, + &node_by_id, + "unresolved", + options.top_n, + ); + let external_targets = + target_frequencies(&snapshot.code_edges, &node_by_id, "external", options.top_n); + + let (bridge_edges, mut degradation_details) = match snapshot.bridge_edges { + BridgeEdgeInput::Available(edges) => (normalize_bridge_edges(edges), vec![]), + BridgeEdgeInput::Unavailable(reason) => ( + vec![], + vec![ReportDegradation { + input: RELATES_TO_CODE.to_string(), + required: false, + detail: reason, + }], + ), + }; + let bridge_summary = summarize_bridge_edges(&bridge_edges); + let suggested_investigation_questions = suggested_questions( + &hotspots, + &unresolved_targets, + &external_targets, + bridge_summary.as_ref(), + &degradation_details, + ); + let markdown = render_markdown(RenderMarkdownInput { + project_id: &project_id, + generated_at: &generated_at, + summary: &summary, + hotspots: &hotspots, + unresolved_targets: &unresolved_targets, + external_targets: &external_targets, + bridge_summary: bridge_summary.as_ref(), + degradation_details: &degradation_details, + top_n: options.top_n, + }); + + degradation_details.sort_by(|left, right| left.input.cmp(&right.input)); + + ProjectGraphReport { + project_id, + generated_at, + summary, + hotspots, + unresolved_targets, + external_targets, + bridge_summary, + bridge_edges, + degradation_details, + suggested_investigation_questions, + markdown, + } +} + +fn load_report_snapshot( + client: &mut
GraphClient, + project_id: &str, +) -> anyhow::Result { + let (query, params) = report_nodes_query(project_id); + let nodes = client + .query(&query, Some(params))? + .iter() + .filter_map(row_to_report_node) + .collect::>(); + + let (query, params) = report_code_edges_query(project_id); + let code_edges = client + .query(&query, Some(params))? + .iter() + .filter_map(row_to_report_code_edge) + .collect::>(); + + let (query, params) = report_bridge_edges_query(project_id); + let bridge_edges = match client.query(&query, Some(params)) { + Ok(rows) => BridgeEdgeInput::available( + rows.iter() + .filter_map(row_to_bridge_edge_hypothesis) + .collect(), + ), + Err(error) => BridgeEdgeInput::unavailable(format!("bridge edge query failed: {error}")), + }; + + Ok(ReportGraphSnapshot { + nodes, + code_edges, + bridge_edges, + }) +} + +fn report_nodes_query(project_id: &str) -> (String, HashMap) { + ( + "MATCH (n {project: $project}) \ + WHERE n:CodeFile OR n:CodeSymbol OR n:CodeModule OR n:UnresolvedCallee OR n:ExternalSymbol \ + RETURN coalesce(n.id, n.path, n.name) AS id, \ + coalesce(n.name, n.path, n.id) AS name, \ + CASE \ + WHEN n:CodeFile THEN 'file' \ + WHEN n:CodeModule THEN 'module' \ + WHEN n:CodeSymbol THEN coalesce(n.kind, 'symbol') \ + WHEN n:UnresolvedCallee THEN 'unresolved' \ + WHEN n:ExternalSymbol THEN 'external' \ + ELSE 'node' \ + END AS node_type, \ + coalesce(n.file_path, n.path) AS file_path" + .to_string(), + typed_query::string_params(&[("project", project_id)]), + ) +} + +fn report_code_edges_query(project_id: &str) -> (String, HashMap) { + ( + "MATCH (source {project: $project})-[r]->(target {project: $project}) \ + WHERE type(r) IN ['DEFINES', 'IMPORTS', 'CALLS'] \ + RETURN coalesce(source.id, source.path, source.name) AS source, \ + coalesce(target.id, target.path, target.name) AS target, \ + type(r) AS edge_type" + .to_string(), + typed_query::string_params(&[("project", project_id)]), + ) +} + +fn report_bridge_edges_query(project_id: &str) 
-> (String, HashMap) { + ( + "MATCH (source)-[r:RELATES_TO_CODE]->(target:CodeSymbol {project: $project}) \ + RETURN coalesce(source.id, source.uuid, source.name) AS source_id, \ + target.id AS target_symbol_id, \ + 'RELATES_TO_CODE' AS relation, \ + r.provenance AS provenance, \ + r.confidence AS confidence, \ + coalesce(r.source_system, 'gobby-memory') AS source_system, \ + r.source_file_path AS source_file_path, \ + r.source_line AS source_line, \ + r.source_symbol_id AS source_symbol_id, \ + r.matching_method AS matching_method" + .to_string(), + typed_query::string_params(&[("project", project_id)]), + ) +} + +fn row_to_report_node(row: &Row) -> Option { + let id = row_string(row, &["id"])?; + let name = row_string(row, &["name"]).unwrap_or_else(|| id.clone()); + let node_type = row_string(row, &["node_type"]).unwrap_or_else(|| "node".to_string()); + let mut node = ReportNode::new(id, name, node_type); + node.file_path = row_string(row, &["file_path"]); + Some(node) +} + +fn row_to_report_code_edge(row: &Row) -> Option { + let source = row_string(row, &["source"])?; + let target = row_string(row, &["target"])?; + let edge_type = row_string(row, &["edge_type"]).unwrap_or_else(|| "CALLS".to_string()); + Some(ReportCodeEdge::new(source, target, edge_type)) +} + +fn row_to_bridge_edge_hypothesis(row: &Row) -> Option { + let source_id = row_string(row, &["source_id"])?; + let target_symbol_id = row_string(row, &["target_symbol_id"])?; + let relation = row_string(row, &["relation"]).unwrap_or_else(|| RELATES_TO_CODE.to_string()); + let source_system = + row_string(row, &["source_system"]).unwrap_or_else(|| "gobby-memory".to_string()); + + let mut metadata = ProjectionMetadata::new( + row_string(row, &["provenance"]) + .and_then(|value| ProjectionProvenance::from_wire_value(&value)) + .unwrap_or(ProjectionProvenance::Inferred), + source_system, + ); + metadata.confidence = row_f64(row, &["confidence"]); + metadata.source_file_path = row_string(row, 
&["source_file_path"]); + metadata.source_line = row_usize(row, &["source_line"]); + metadata.source_symbol_id = row_string(row, &["source_symbol_id"]); + metadata.matching_method = row_string(row, &["matching_method"]); + + Some(BridgeEdgeHypothesis::new( + source_id, + target_symbol_id, + relation, + metadata, + )) +} + +fn summarize_graph(nodes: &[ReportNode], edges: &[ReportCodeEdge]) -> GraphReportSummary { + let mut node_counts_by_type = BTreeMap::new(); + for node in nodes { + *node_counts_by_type + .entry(node.node_type.clone()) + .or_insert(0) += 1; + } + + let mut code_edge_counts = BTreeMap::new(); + for edge in edges { + *code_edge_counts.entry(edge.edge_type.clone()).or_insert(0) += 1; + } + + GraphReportSummary { + node_count: nodes.len(), + edge_count: edges.len(), + node_counts_by_type, + code_edge_counts, + } +} + +fn summarize_hotspots( + nodes: &[ReportNode], + edges: &[ReportCodeEdge], + top_n: usize, +) -> GraphReportHotspots { + let mut degree = HashMap::<&str, DegreeStats>::new(); + let mut incoming_calls = HashMap::<&str, usize>::new(); + for edge in edges { + degree.entry(&edge.source).or_default().outgoing += 1; + degree.entry(&edge.target).or_default().incoming += 1; + if edge.edge_type == "CALLS" { + *incoming_calls.entry(&edge.target).or_insert(0) += 1; + } + } + + GraphReportHotspots { + high_degree_files: top_hotspots(nodes, &degree, top_n, |node| node.node_type == "file"), + high_degree_symbols: top_hotspots(nodes, &degree, top_n, |node| { + is_symbol_node(&node.node_type) + }), + high_degree_modules: top_hotspots(nodes, &degree, top_n, |node| node.node_type == "module"), + incoming_call_hotspots: top_incoming_call_hotspots(nodes, &incoming_calls, top_n), + } +} + +fn top_hotspots( + nodes: &[ReportNode], + degree: &HashMap<&str, DegreeStats>, + top_n: usize, + include: impl Fn(&ReportNode) -> bool, +) -> Vec<GraphHotspot> { + let mut hotspots = nodes + .iter() + .filter(|node| include(node)) + .filter_map(|node| { + let stats = 
degree.get(node.id.as_str())?; + let total = stats.incoming + stats.outgoing; + (total > 0).then(|| GraphHotspot { + id: node.id.clone(), + name: node.name.clone(), + node_type: node.node_type.clone(), + degree: total, + incoming: stats.incoming, + outgoing: stats.outgoing, + file_path: node.file_path.clone(), + }) + }) + .collect::>(); + sort_hotspots(&mut hotspots); + hotspots.truncate(top_n); + hotspots +} + +fn top_incoming_call_hotspots( + nodes: &[ReportNode], + incoming_calls: &HashMap<&str, usize>, + top_n: usize, +) -> Vec { + let mut hotspots = nodes + .iter() + .filter(|node| is_symbol_node(&node.node_type)) + .filter_map(|node| { + let incoming = incoming_calls.get(node.id.as_str()).copied().unwrap_or(0); + (incoming > 0).then(|| GraphHotspot { + id: node.id.clone(), + name: node.name.clone(), + node_type: node.node_type.clone(), + degree: incoming, + incoming, + outgoing: 0, + file_path: node.file_path.clone(), + }) + }) + .collect::>(); + sort_hotspots(&mut hotspots); + hotspots.truncate(top_n); + hotspots +} + +fn target_frequencies( + edges: &[ReportCodeEdge], + node_by_id: &HashMap<&str, &ReportNode>, + target_type: &str, + top_n: usize, +) -> Vec { + let mut counts = BTreeMap::::new(); + for edge in edges.iter().filter(|edge| edge.edge_type == "CALLS") { + let Some(node) = node_by_id.get(edge.target.as_str()) else { + continue; + }; + if node.node_type != target_type { + continue; + } + let entry = counts + .entry(node.id.clone()) + .or_insert_with(|| TargetFrequency { + id: node.id.clone(), + name: node.name.clone(), + count: 0, + }); + entry.count += 1; + } + + let mut frequencies = counts.into_values().collect::>(); + frequencies.sort_by(|left, right| { + right + .count + .cmp(&left.count) + .then_with(|| left.name.cmp(&right.name)) + .then_with(|| left.id.cmp(&right.id)) + }); + frequencies.truncate(top_n); + frequencies +} + +fn summarize_bridge_edges(edges: &[BridgeEdgeHypothesis]) -> Option { + if edges.is_empty() { + return None; + } + + 
let mut source_counts = BTreeMap::::new(); + let mut confidence_min = f64::INFINITY; + let mut confidence_max = f64::NEG_INFINITY; + let mut has_confidence = false; + for edge in edges { + *source_counts + .entry(edge.metadata.source_system.clone()) + .or_insert(0) += 1; + if let Some(confidence) = edge.metadata.confidence + && confidence.is_finite() + { + confidence_min = confidence_min.min(confidence); + confidence_max = confidence_max.max(confidence); + has_confidence = true; + } + } + + let source_system_counts = source_counts + .into_iter() + .map(|(name, count)| NamedCount { name, count }) + .collect(); + + Some(BridgeReportSummary { + relation: RELATES_TO_CODE.to_string(), + edge_count: edges.len(), + inferred: true, + read_only: true, + source_system_counts, + confidence_range: has_confidence.then_some(ConfidenceRange { + min: confidence_min, + max: confidence_max, + }), + }) +} + +fn normalize_bridge_edges(edges: Vec) -> Vec { + edges + .into_iter() + .map(|edge| { + BridgeEdgeHypothesis::new( + edge.source_id, + edge.target_symbol_id, + edge.relation, + edge.metadata, + ) + }) + .collect() +} + +fn suggested_questions( + hotspots: &GraphReportHotspots, + unresolved_targets: &[TargetFrequency], + external_targets: &[TargetFrequency], + bridge_summary: Option<&BridgeReportSummary>, + degradation_details: &[ReportDegradation], +) -> Vec { + let mut questions = + vec!["Which high-degree files or symbols should be reviewed before refactors?".to_string()]; + + if !hotspots.incoming_call_hotspots.is_empty() { + questions.push("Which incoming-call hotspots define the largest blast radius?".to_string()); + } + if !unresolved_targets.is_empty() || !external_targets.is_empty() { + questions.push( + "Which unresolved or external call targets should be resolved first?".to_string(), + ); + } + if bridge_summary.is_some() { + questions + .push("Which inferred RELATES_TO_CODE bridges need human confirmation?".to_string()); + } + if !degradation_details.is_empty() { + 
questions.push( + "Which degraded optional inputs should be restored for the next report?".to_string(), + ); + } + + questions +} + +struct RenderMarkdownInput<'a> { + project_id: &'a str, + generated_at: &'a str, + summary: &'a GraphReportSummary, + hotspots: &'a GraphReportHotspots, + unresolved_targets: &'a [TargetFrequency], + external_targets: &'a [TargetFrequency], + bridge_summary: Option<&'a BridgeReportSummary>, + degradation_details: &'a [ReportDegradation], + top_n: usize, +} + +fn render_markdown(input: RenderMarkdownInput<'_>) -> String { + let mut lines = vec![ + "# Project Graph Report".to_string(), + String::new(), + format!("- Project: {}", input.project_id), + format!("- Generated: {}", input.generated_at), + format!("- Nodes: {}", input.summary.node_count), + format!("- Edges: {}", input.summary.edge_count), + ]; + + if !input.summary.code_edge_counts.is_empty() { + lines.push(format!( + "- Code edges: {}", + named_counts_inline(&input.summary.code_edge_counts) + )); + } + + append_hotspot_section( + &mut lines, + "High-degree files", + &input.hotspots.high_degree_files, + input.top_n, + ); + append_hotspot_section( + &mut lines, + "High-degree symbols", + &input.hotspots.high_degree_symbols, + input.top_n, + ); + append_hotspot_section( + &mut lines, + "Incoming-call hotspots", + &input.hotspots.incoming_call_hotspots, + input.top_n, + ); + append_target_section( + &mut lines, + "Unresolved call targets", + input.unresolved_targets, + input.top_n, + ); + append_target_section( + &mut lines, + "External call targets", + input.external_targets, + input.top_n, + ); + + if let Some(summary) = input.bridge_summary { + lines.push(String::new()); + lines.push("RELATES_TO_CODE bridges".to_string()); + lines.push(format!( + "- {} inferred read-only edge(s)", + summary.edge_count + )); + if let Some(range) = &summary.confidence_range { + lines.push(format!("- Confidence: {:.3}..{:.3}", range.min, range.max)); + } + } + + if 
!input.degradation_details.is_empty() { + lines.push(String::new()); + lines.push("Degradation".to_string()); + for detail in input.degradation_details { + lines.push(format!("- {}: {}", detail.input, detail.detail)); + } + } + + lines.join("\n") +} + +fn append_hotspot_section( + lines: &mut Vec, + title: &str, + hotspots: &[GraphHotspot], + top_n: usize, +) { + if hotspots.is_empty() { + return; + } + lines.push(String::new()); + lines.push(title.to_string()); + for hotspot in hotspots.iter().take(top_n) { + lines.push(format!( + "- {} ({}, degree {})", + hotspot.name, hotspot.node_type, hotspot.degree + )); + } +} + +fn append_target_section( + lines: &mut Vec, + title: &str, + targets: &[TargetFrequency], + top_n: usize, +) { + if targets.is_empty() { + return; + } + lines.push(String::new()); + lines.push(title.to_string()); + for target in targets.iter().take(top_n) { + lines.push(format!("- {} ({})", target.name, target.count)); + } +} + +fn named_counts_inline(counts: &BTreeMap) -> String { + counts + .iter() + .map(|(name, count)| format!("{name}={count}")) + .collect::>() + .join(", ") +} + +fn sort_hotspots(hotspots: &mut [GraphHotspot]) { + hotspots.sort_by(|left, right| { + right + .degree + .cmp(&left.degree) + .then_with(|| left.name.cmp(&right.name)) + .then_with(|| left.id.cmp(&right.id)) + }); +} + +fn is_symbol_node(node_type: &str) -> bool { + !matches!(node_type, "file" | "module" | "unresolved" | "external") +} + +fn inferred_bridge_metadata(mut metadata: ProjectionMetadata) -> ProjectionMetadata { + metadata.provenance = ProjectionProvenance::Inferred; + metadata +} + +fn row_string(row: &Row, keys: &[&str]) -> Option { + keys.iter() + .find_map(|key| row.get(*key).and_then(Value::as_str)) + .filter(|value| !value.is_empty()) + .map(ToOwned::to_owned) +} + +fn row_usize(row: &Row, keys: &[&str]) -> Option { + keys.iter() + .find_map(|key| row.get(*key)) + .and_then(|value| { + value + .as_u64() + .or_else(|| value.as_i64().and_then(|value| 
value.try_into().ok())) + }) + .map(|value| value as usize) +} + +fn row_f64(row: &Row, keys: &[&str]) -> Option { + keys.iter() + .find_map(|key| row.get(*key)) + .and_then(Value::as_f64) +} + +fn now_iso8601() -> String { + let dur = SystemTime::now() + .duration_since(UNIX_EPOCH) + .unwrap_or_default(); + let secs = dur.as_secs(); + let micros = dur.subsec_micros(); + + let (year, month, day) = days_to_ymd(secs / 86400); + let daytime = secs % 86400; + let hour = daytime / 3600; + let minute = (daytime % 3600) / 60; + let second = daytime % 60; + + format!("{year:04}-{month:02}-{day:02}T{hour:02}:{minute:02}:{second:02}.{micros:06}+00:00") +} + +fn days_to_ymd(days: u64) -> (u64, u64, u64) { + let z = days as i64 + 719468; + let era = if z >= 0 { z } else { z - 146096 } / 146097; + let doe = (z - era * 146097) as u64; + let yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365; + let y = yoe as i64 + era * 400; + let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); + let mp = (5 * doy + 2) / 153; + let day = doy - (153 * mp + 2) / 5 + 1; + let month = if mp < 10 { mp + 3 } else { mp - 9 }; + let year = if month <= 2 { y + 1 } else { y }; + (year as u64, month, day) +} + +#[cfg(test)] +mod tests { + use super::*; + use crate::config::{CodeVectorSettings, Context}; + use crate::models::{ProjectionMetadata, ProjectionProvenance}; + use std::path::PathBuf; + + #[test] + fn report_shape() { + let snapshot = ReportGraphSnapshot { + nodes: vec![ + ReportNode::new("src/lib.rs", "src/lib.rs", "file"), + ReportNode::new("mod:api", "api", "module"), + ReportNode::new("sym:handler", "handler", "function").with_file_path("src/lib.rs"), + ReportNode::new("sym:parse", "parse", "function").with_file_path("src/lib.rs"), + ReportNode::new("unresolved:do_work", "do_work", "unresolved"), + ReportNode::new("external:serde_json", "serde_json", "external"), + ], + code_edges: vec![ + ReportCodeEdge::new("src/lib.rs", "sym:handler", "DEFINES"), + ReportCodeEdge::new("src/lib.rs", 
"mod:api", "IMPORTS"), + ReportCodeEdge::new("sym:handler", "sym:parse", "CALLS"), + ReportCodeEdge::new("sym:parse", "unresolved:do_work", "CALLS"), + ReportCodeEdge::new("sym:handler", "external:serde_json", "CALLS"), + ], + bridge_edges: BridgeEdgeInput::available(vec![BridgeEdgeHypothesis::inferred( + "memory-1", + "sym:handler", + RELATES_TO_CODE, + "gobby-memory", + Some(0.72), + )]), + }; + + let report = generate_report_from_snapshot("project-1", "2026-05-28T00:00:00Z", snapshot); + let json = serde_json::to_value(&report).expect("report serializes"); + + assert_eq!(json["project_id"], "project-1"); + assert_eq!(json["summary"]["node_count"], 6); + assert_eq!(json["summary"]["edge_count"], 5); + assert_eq!(json["summary"]["code_edge_counts"]["CALLS"], 3); + assert_eq!(json["hotspots"]["high_degree_files"][0]["id"], "src/lib.rs"); + assert_eq!( + json["hotspots"]["incoming_call_hotspots"][0]["id"], + "sym:parse" + ); + assert_eq!(json["unresolved_targets"][0]["name"], "do_work"); + assert_eq!(json["external_targets"][0]["name"], "serde_json"); + assert_eq!(json["bridge_summary"]["relation"], RELATES_TO_CODE); + assert_eq!(json["bridge_summary"]["confidence_range"]["min"], 0.72); + assert!(json["markdown"].as_str().unwrap().contains("project-1")); + assert!( + !json["suggested_investigation_questions"] + .as_array() + .unwrap() + .is_empty() + ); + } + + #[test] + fn bridge_edges_are_read_only() { + let edge = BridgeEdgeHypothesis::new( + "memory-1", + "symbol-1", + RELATES_TO_CODE, + ProjectionMetadata::gcode_extracted(), + ); + + assert!(edge.read_only); + assert_eq!(edge.label, "inferred hypothesis"); + assert_eq!(edge.metadata.provenance, ProjectionProvenance::Inferred); + + let snapshot = ReportGraphSnapshot { + nodes: vec![ReportNode::new("symbol-1", "handler", "function")], + code_edges: vec![], + bridge_edges: BridgeEdgeInput::available(vec![edge]), + }; + let report = generate_report_from_snapshot("project-1", "2026-05-28T00:00:00Z", snapshot); + let 
json = serde_json::to_value(&report).expect("report serializes"); + + assert_eq!(json["bridge_edges"][0]["read_only"], true); + assert_eq!( + json["bridge_edges"][0]["metadata"]["provenance"], + "INFERRED" + ); + } + + #[test] + fn report_degradation_contract() { + let ctx = Context { + database_url: "postgresql://localhost/unavailable".to_string(), + project_root: PathBuf::from("/tmp/project"), + project_id: "project-1".to_string(), + quiet: true, + falkordb: None, + qdrant: None, + embedding: None, + code_vectors: CodeVectorSettings::default(), + daemon_url: None, + }; + let err = generate_report(&ctx).expect_err("missing graph service is required"); + assert_eq!(err, ProjectGraphReportError::GraphServiceNotConfigured); + + let report = generate_report_from_snapshot( + "project-1", + "2026-05-28T00:00:00Z", + ReportGraphSnapshot { + nodes: vec![ReportNode::new("symbol-1", "handler", "function")], + code_edges: vec![], + bridge_edges: BridgeEdgeInput::unavailable("bridge edge query timed out"), + }, + ); + + assert_eq!(report.summary.node_count, 1); + assert_eq!(report.degradation_details.len(), 1); + assert_eq!(report.degradation_details[0].input, RELATES_TO_CODE); + assert!(!report.degradation_details[0].required); + } + + #[test] + fn bridge_edges_are_hypotheses() { + let edge = BridgeEdgeHypothesis::inferred( + "memory-1", + "symbol-1", + RELATES_TO_CODE, + "gobby-memory", + Some(0.72), + ); + + assert_eq!(edge.label, "inferred hypothesis"); + assert_eq!(edge.metadata.provenance, ProjectionProvenance::Inferred); + assert!(edge.metadata.is_hypothesis()); + + let mut report = empty_report("project-1"); + report.bridge_edges.push(edge); + + let json = serde_json::to_value(&report).expect("report serializes"); + assert_eq!(json["bridge_edges"][0]["label"], "inferred hypothesis"); + assert_eq!( + json["bridge_edges"][0]["metadata"]["provenance"], + "INFERRED" + ); + } +} diff --git a/crates/gcode/src/graph/typed_query.rs b/crates/gcode/src/graph/typed_query.rs new 
file mode 100644 index 0000000..6800de8 --- /dev/null +++ b/crates/gcode/src/graph/typed_query.rs @@ -0,0 +1,353 @@ +use std::collections::{BTreeMap, HashMap}; +use std::fmt; + +use serde::{Deserialize, Serialize}; + +#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)] +pub struct TypedQuery { + pub cypher: String, + pub params: HashMap, +} + +#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)] +pub enum TypedValue { + Null, + String(String), + Integer(i64), + Float(f64), + Bool(bool), + List(Vec), + Map(BTreeMap), +} + +#[derive(Debug, Clone, Copy, PartialEq, Eq)] +pub enum IdentifierKind { + ParameterName, + MapKey, +} + +#[derive(Debug, Clone, Copy, PartialEq, Eq)] +pub enum ValueContext { + String, +} + +#[derive(Debug, Clone, PartialEq, Eq)] +pub enum TypedQueryError { + InvalidIdentifier { + kind: IdentifierKind, + identifier: String, + }, + ControlCharacter { + context: ValueContext, + codepoint: u32, + }, + NonFiniteFloat { + value: String, + }, +} + +impl TypedQuery { + pub fn new(cypher: impl Into) -> Self { + Self { + cypher: cypher.into(), + params: HashMap::new(), + } + } + + pub fn with_params(cypher: impl Into, params: I) -> Result + where + I: IntoIterator, + K: Into, + { + let mut query = Self::new(cypher); + for (name, value) in params { + query.insert_param(name, value)?; + } + Ok(query) + } + + pub fn insert_param( + &mut self, + name: impl Into, + value: TypedValue, + ) -> Result<(), TypedQueryError> { + let name = name.into(); + validate_identifier(&name, IdentifierKind::ParameterName)?; + let rendered = render_cypher_value(&value)?; + self.params.insert(name, rendered); + Ok(()) + } +} + +pub fn cypher_string_literal(s: &str) -> String { + format!("'{}'", escape_string_contents(s)) +} + +pub fn render_cypher_value(value: &TypedValue) -> Result { + match value { + TypedValue::Null => Ok("null".to_string()), + TypedValue::String(value) => render_string_literal(value), + TypedValue::Integer(value) => Ok(value.to_string()), + 
TypedValue::Float(value) => render_float(*value), + TypedValue::Bool(value) => Ok(value.to_string()), + TypedValue::List(values) => values + .iter() + .map(render_cypher_value) + .collect::, _>>() + .map(|values| format!("[{}]", values.join(", "))), + TypedValue::Map(values) => values + .iter() + .map(|(key, value)| { + validate_identifier(key, IdentifierKind::MapKey)?; + Ok(format!("{key}: {}", render_cypher_value(value)?)) + }) + .collect::, _>>() + .map(|values| format!("{{{}}}", values.join(", "))), + } +} + +pub fn string_params(values: &[(&str, &str)]) -> HashMap { + values + .iter() + .map(|(key, value)| ((*key).to_string(), cypher_string_literal(value))) + .collect() +} + +pub fn clamp_limit(limit: usize, max: usize) -> usize { + limit.clamp(1, max) +} + +pub fn clamp_offset(offset: usize, max: usize) -> usize { + offset.min(max) +} + +pub fn id_list_literal(ids: &[String]) -> String { + ids.iter() + .map(|id| cypher_string_literal(id)) + .collect::>() + .join(", ") +} + +pub fn validate_identifier(identifier: &str, kind: IdentifierKind) -> Result<(), TypedQueryError> { + let mut chars = identifier.chars(); + let Some(first) = chars.next() else { + return Err(TypedQueryError::InvalidIdentifier { + kind, + identifier: identifier.to_string(), + }); + }; + + if !(first == '_' || first.is_ascii_alphabetic()) + || !chars.all(|ch| ch == '_' || ch.is_ascii_alphanumeric()) + { + return Err(TypedQueryError::InvalidIdentifier { + kind, + identifier: identifier.to_string(), + }); + } + + Ok(()) +} + +fn render_string_literal(value: &str) -> Result { + reject_control_characters(value, ValueContext::String)?; + Ok(cypher_string_literal(value)) +} + +fn reject_control_characters(value: &str, context: ValueContext) -> Result<(), TypedQueryError> { + if let Some(ch) = value.chars().find(|ch| ch.is_control()) { + return Err(TypedQueryError::ControlCharacter { + context, + codepoint: ch as u32, + }); + } + Ok(()) +} + +fn escape_string_contents(value: &str) -> String { + 
value + .replace('\\', "\\\\") + .replace('\'', "\\'") + .replace('"', "\\\"") +} + +fn render_float(value: f64) -> Result { + if !value.is_finite() { + return Err(TypedQueryError::NonFiniteFloat { + value: value.to_string(), + }); + } + + let mut rendered = value.to_string(); + if !rendered.contains('.') && !rendered.contains('e') && !rendered.contains('E') { + rendered.push_str(".0"); + } + Ok(rendered) +} + +impl fmt::Display for IdentifierKind { + fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result { + match self { + Self::ParameterName => f.write_str("parameter name"), + Self::MapKey => f.write_str("map key"), + } + } +} + +impl fmt::Display for ValueContext { + fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result { + match self { + Self::String => f.write_str("string"), + } + } +} + +impl fmt::Display for TypedQueryError { + fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result { + match self { + Self::InvalidIdentifier { kind, identifier } => write!( + f, + "invalid {kind} `{identifier}`; expected ^[A-Za-z_][A-Za-z0-9_]*$" + ), + Self::ControlCharacter { context, codepoint } => write!( + f, + "control character U+{codepoint:04X} is not allowed in {context} value" + ), + Self::NonFiniteFloat { value } => { + write!(f, "non-finite float `{value}` is not allowed") + } + } + } +} + +impl std::error::Error for TypedQueryError {} + +#[cfg(test)] +mod tests { + use super::*; + use std::collections::BTreeMap; + + #[test] + fn typed_params_render_nested_safe_cypher_literals() { + let mut props = BTreeMap::new(); + props.insert("enabled".to_string(), TypedValue::Bool(true)); + props.insert( + "label".to_string(), + TypedValue::String("caf\u{00e9} \"quote\" and 'single' \\ slash".to_string()), + ); + props.insert( + "nested".to_string(), + TypedValue::List(vec![ + TypedValue::Integer(1), + TypedValue::Float(2.25), + TypedValue::Bool(false), + ]), + ); + + let query = TypedQuery::with_params( + "RETURN $name, $count, $ratio, $whole, $enabled, $items, 
$props", + [ + ( + "name", + TypedValue::String("O'Reilly \\ path \u{2603}".to_string()), + ), + ("count", TypedValue::Integer(42)), + ("ratio", TypedValue::Float(1.5)), + ("whole", TypedValue::Float(1.0)), + ("enabled", TypedValue::Bool(true)), + ( + "items", + TypedValue::List(vec![ + TypedValue::String("a".to_string()), + TypedValue::Integer(-7), + TypedValue::Bool(false), + ]), + ), + ("props", TypedValue::Map(props)), + ], + ) + .expect("valid typed params should render"); + + assert_eq!( + query.cypher, + "RETURN $name, $count, $ratio, $whole, $enabled, $items, $props" + ); + assert_eq!( + query.params.get("name").map(String::as_str), + Some("'O\\'Reilly \\\\ path \u{2603}'") + ); + assert_eq!(query.params.get("count").map(String::as_str), Some("42")); + assert_eq!(query.params.get("ratio").map(String::as_str), Some("1.5")); + assert_eq!(query.params.get("whole").map(String::as_str), Some("1.0")); + assert_eq!( + query.params.get("enabled").map(String::as_str), + Some("true") + ); + assert_eq!( + query.params.get("items").map(String::as_str), + Some("['a', -7, false]") + ); + assert_eq!( + query.params.get("props").map(String::as_str), + Some( + "{enabled: true, label: 'caf\u{00e9} \\\"quote\\\" and \\'single\\' \\\\ slash', nested: [1, 2.25, false]}" + ) + ); + } + + #[test] + fn string_literals_escape_both_quote_delimiters() { + let rendered = render_cypher_value(&TypedValue::String("a 'single' and \"double\"".into())) + .expect("valid string should render"); + + assert_eq!(rendered, "'a \\'single\\' and \\\"double\\\"'"); + } + + #[test] + fn invalid_identifiers_return_typed_errors() { + let param_error = + TypedQuery::with_params("RETURN $bad", [("bad-name", TypedValue::Bool(true))]) + .expect_err("invalid parameter name should fail"); + assert_eq!( + param_error, + TypedQueryError::InvalidIdentifier { + kind: IdentifierKind::ParameterName, + identifier: "bad-name".to_string(), + } + ); + + let mut props = BTreeMap::new(); + 
props.insert("bad.key".to_string(), TypedValue::Integer(1)); + let map_error = + render_cypher_value(&TypedValue::Map(props)).expect_err("invalid map key should fail"); + assert_eq!( + map_error, + TypedQueryError::InvalidIdentifier { + kind: IdentifierKind::MapKey, + identifier: "bad.key".to_string(), + } + ); + } + + #[test] + fn unsafe_values_return_typed_errors() { + let control_error = TypedQuery::with_params( + "RETURN $name", + [("name", TypedValue::String("line\nbreak".to_string()))], + ) + .expect_err("control characters should fail"); + assert!(matches!( + control_error, + TypedQueryError::ControlCharacter { + context: ValueContext::String, + .. + } + )); + + for value in [f64::NAN, f64::INFINITY, f64::NEG_INFINITY] { + let error = render_cypher_value(&TypedValue::Float(value)) + .expect_err("non-finite float should fail"); + assert!(matches!(error, TypedQueryError::NonFiniteFloat { .. })); + } + } +} diff --git a/crates/gcode/src/index/api.rs b/crates/gcode/src/index/api.rs new file mode 100644 index 0000000..535338c --- /dev/null +++ b/crates/gcode/src/index/api.rs @@ -0,0 +1,260 @@ +use postgres::GenericClient; +use serde::{Deserialize, Serialize}; + +pub use crate::index::indexer::{ + IndexDegradation, IndexDurations, IndexOutcome, IndexRequest, index_files, +}; + +use crate::models::{ + CallRelation, ContentChunk, ImportRelation, IndexedFile, IndexedProject, Symbol, +}; + +#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)] +pub struct CodeFactWriteRequest { + pub project_id: String, + pub file_path: String, + pub symbols: usize, + pub imports: usize, + pub calls: usize, + pub chunks: usize, +} + +#[derive(Debug, Clone, Default, PartialEq, Eq, Serialize, Deserialize)] +pub struct CodeFactWriteSummary { + pub files_written: usize, + pub symbols_written: usize, + pub imports_written: usize, + pub calls_written: usize, + pub chunks_written: usize, + pub graph_sync_pending: bool, + pub vectors_sync_pending: bool, +} + +impl 
CodeFactWriteSummary { + pub fn for_file(symbols: usize, imports: usize, calls: usize, chunks: usize) -> Self { + Self { + files_written: 1, + symbols_written: symbols, + imports_written: imports, + calls_written: calls, + chunks_written: chunks, + graph_sync_pending: true, + vectors_sync_pending: true, + } + } +} + +pub fn delete_file_facts( + conn: &mut impl GenericClient, + project_id: &str, + file_path: &str, +) -> anyhow::Result<()> { + conn.execute( + "DELETE FROM code_symbols WHERE project_id = $1 AND file_path = $2", + &[&project_id, &file_path], + )?; + conn.execute( + "DELETE FROM code_indexed_files WHERE project_id = $1 AND file_path = $2", + &[&project_id, &file_path], + )?; + conn.execute( + "DELETE FROM code_content_chunks WHERE project_id = $1 AND file_path = $2", + &[&project_id, &file_path], + )?; + conn.execute( + "DELETE FROM code_imports WHERE project_id = $1 AND source_file = $2", + &[&project_id, &file_path], + )?; + conn.execute( + "DELETE FROM code_calls WHERE project_id = $1 AND file_path = $2", + &[&project_id, &file_path], + )?; + Ok(()) +} + +pub fn upsert_symbols(conn: &mut impl GenericClient, symbols: &[Symbol]) -> anyhow::Result { + for sym in symbols { + conn.execute( + "INSERT INTO code_symbols ( + id, project_id, file_path, name, qualified_name, + kind, language, byte_start, byte_end, + line_start, line_end, signature, docstring, + parent_symbol_id, content_hash, summary, + created_at, updated_at + ) VALUES ($1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16,NOW(),NOW()) + ON CONFLICT(id) DO UPDATE SET + name=excluded.name, qualified_name=excluded.qualified_name, + kind=excluded.kind, byte_start=excluded.byte_start, + byte_end=excluded.byte_end, line_start=excluded.line_start, + line_end=excluded.line_end, signature=excluded.signature, + docstring=excluded.docstring, parent_symbol_id=excluded.parent_symbol_id, + language=excluded.language, content_hash=excluded.content_hash, + summary=CASE WHEN excluded.content_hash != 
code_symbols.content_hash + THEN NULL ELSE code_symbols.summary END, + updated_at=NOW()", + &[ + &sym.id, + &sym.project_id, + &sym.file_path, + &sym.name, + &sym.qualified_name, + &sym.kind, + &sym.language, + &to_i32(sym.byte_start), + &to_i32(sym.byte_end), + &to_i32(sym.line_start), + &to_i32(sym.line_end), + &sym.signature, + &sym.docstring, + &sym.parent_symbol_id, + &sym.content_hash, + &sym.summary, + ], + )?; + } + Ok(symbols.len()) +} + +pub fn upsert_file(conn: &mut impl GenericClient, file: &IndexedFile) -> anyhow::Result<()> { + conn.execute( + "INSERT INTO code_indexed_files ( + id, project_id, file_path, language, content_hash, + symbol_count, byte_size, graph_synced, vectors_synced, + graph_sync_attempted_at, indexed_at + ) VALUES ($1,$2,$3,$4,$5,$6,$7,false,false,NULL,NOW()) + ON CONFLICT(id) DO UPDATE SET + content_hash=excluded.content_hash, + symbol_count=excluded.symbol_count, + byte_size=excluded.byte_size, + graph_synced=false, + vectors_synced=false, + graph_sync_attempted_at=NULL, + indexed_at=NOW()", + &[ + &file.id, + &file.project_id, + &file.file_path, + &file.language, + &file.content_hash, + &to_i32(file.symbol_count), + &to_i32(file.byte_size), + ], + )?; + Ok(()) +} + +pub fn upsert_content_chunks( + conn: &mut impl GenericClient, + chunks: &[ContentChunk], +) -> anyhow::Result { + for chunk in chunks { + conn.execute( + "INSERT INTO code_content_chunks ( + id, project_id, file_path, chunk_index, + line_start, line_end, content, language, created_at + ) VALUES ($1,$2,$3,$4,$5,$6,$7,$8,NOW()) + ON CONFLICT(id) DO UPDATE SET + content=excluded.content, + line_start=excluded.line_start, + line_end=excluded.line_end", + &[ + &chunk.id, + &chunk.project_id, + &chunk.file_path, + &to_i32(chunk.chunk_index), + &to_i32(chunk.line_start), + &to_i32(chunk.line_end), + &chunk.content, + &chunk.language, + ], + )?; + } + Ok(chunks.len()) +} + +pub fn upsert_project_stats( + conn: &mut impl GenericClient, + project: &IndexedProject, +) -> 
anyhow::Result<()> { + conn.execute( + "INSERT INTO code_indexed_projects ( + id, root_path, total_files, total_symbols, + last_indexed_at, index_duration_ms + ) VALUES ($1,$2,$3,$4,NOW(),$5) + ON CONFLICT(id) DO UPDATE SET + root_path=excluded.root_path, + total_files=excluded.total_files, + total_symbols=excluded.total_symbols, + last_indexed_at=excluded.last_indexed_at, + index_duration_ms=excluded.index_duration_ms, + updated_at=NOW()", + &[ + &project.id, + &project.root_path, + &to_i32(project.total_files), + &to_i32(project.total_symbols), + &to_i32(project.index_duration_ms as usize), + ], + )?; + Ok(()) +} + +pub fn upsert_imports( + conn: &mut impl GenericClient, + project_id: &str, + file_path: &str, + imports: &[ImportRelation], +) -> anyhow::Result { + conn.execute( + "DELETE FROM code_imports WHERE project_id = $1 AND source_file = $2", + &[&project_id, &file_path], + )?; + for imp in imports { + conn.execute( + "INSERT INTO code_imports (project_id, source_file, target_module) + VALUES ($1, $2, $3) + ON CONFLICT (project_id, source_file, target_module) DO NOTHING", + &[&project_id, &imp.file_path, &imp.module_name], + )?; + } + Ok(imports.len()) +} + +pub fn upsert_calls( + conn: &mut impl GenericClient, + project_id: &str, + file_path: &str, + calls: &[CallRelation], +) -> anyhow::Result { + conn.execute( + "DELETE FROM code_calls WHERE project_id = $1 AND file_path = $2", + &[&project_id, &file_path], + )?; + for call in calls { + conn.execute( + "INSERT INTO code_calls + (project_id, caller_symbol_id, callee_symbol_id, callee_name, \ + callee_target_kind, callee_external_module, file_path, line) + VALUES ($1, $2, $3, $4, $5, $6, $7, $8) + ON CONFLICT ( + project_id, caller_symbol_id, callee_symbol_id, callee_name, + callee_target_kind, callee_external_module, file_path, line + ) DO NOTHING", + &[ + &project_id, + &call.caller_symbol_id, + &call.callee_symbol_id.as_deref().unwrap_or(""), + &call.callee_name, + &call.callee_target_kind.as_str(), + 
&call.callee_external_module.as_deref().unwrap_or(""), + &call.file_path, + &to_i32(call.line), + ], + )?; + } + Ok(calls.len()) +} + +fn to_i32(value: usize) -> i32 { + value.min(i32::MAX as usize) as i32 +} diff --git a/crates/gcode/src/index/chunker.rs b/crates/gcode/src/index/chunker.rs index 0c2fd67..a79a706 100644 --- a/crates/gcode/src/index/chunker.rs +++ b/crates/gcode/src/index/chunker.rs @@ -1,5 +1,14 @@ //! Content chunking: 100-line chunks with 10-line overlap. //! Ports logic from src/gobby/code_index/chunker.py. +//! +//! This remains gcode-owned because BM25 content indexing stores +//! line-based `ContentChunk` records with project, path, line range, language, +//! and timestamp fields. The generic `gobby_core::indexing::Chunk` and +//! `ChunkIdentity` primitives model byte ranges with opaque metadata, so +//! composing them here would hide a domain-specific projection rather than +//! remove shared foundation logic. gcode also derives incremental state from +//! PostgreSQL `indexed_files.content_hash` rows instead of consuming core +//! `IndexEvent` snapshots. use crate::models::ContentChunk; @@ -61,3 +70,22 @@ fn epoch_secs_str() -> String { .as_secs(); format!("{secs}") } + +#[cfg(test)] +mod tests { + #[test] + fn chunker_stays_gcode_owned_with_documented_narrowing() { + let source = include_str!("chunker.rs"); + let doc_phrase = ["line-based `ContentChunk`", " records"].concat(); + assert!(source.contains(&doc_phrase)); + + for forbidden in [ + ["use gobby_core", "::indexing::Chunk"].concat(), + ["use gobby_core", "::indexing::ChunkIdentity"].concat(), + ["use gobby_core", "::indexing::IndexEvent"].concat(), + ["use gobby_core", "::indexing::index_events_from_hashes"].concat(), + ] { + assert!(!source.contains(&forbidden)); + } + } +} diff --git a/crates/gcode/src/index/hasher.rs b/crates/gcode/src/index/hasher.rs index 6083312..1fd0751 100644 --- a/crates/gcode/src/index/hasher.rs +++ b/crates/gcode/src/index/hasher.rs @@ -1,23 +1,11 @@ //! 
Content hashing for incremental indexing. //! Ports logic from src/gobby/code_index/hasher.py. -use sha2::{Digest, Sha256}; -use std::io::Read; use std::path::Path; /// SHA-256 hash of entire file contents. pub fn file_content_hash(path: &Path) -> anyhow::Result<String> { - let mut file = std::fs::File::open(path)?; - let mut hasher = Sha256::new(); - let mut buf = [0u8; 65536]; - loop { - let n = file.read(&mut buf)?; - if n == 0 { - break; - } - hasher.update(&buf[..n]); - } - Ok(format!("{:x}", hasher.finalize())) + Ok(gobby_core::indexing::file_content_hash(path)?) } /// SHA-256 hash of a byte slice (symbol source). @@ -30,7 +18,27 @@ pub fn symbol_content_hash(source: &[u8], start: usize, end: usize) -> anyhow::R source.len() ) })?; - let mut hasher = Sha256::new(); - hasher.update(slice); - Ok(format!("{:x}", hasher.finalize())) + Ok(gobby_core::indexing::content_hash(slice)) +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn file_content_hash_delegates_to_gobby_core() { + let tmp = tempfile::NamedTempFile::new().expect("tempfile"); + std::fs::write(tmp.path(), b"hash me\n").expect("write file"); + + let actual = file_content_hash(tmp.path()).expect("hash via wrapper"); + let expected = + gobby_core::indexing::file_content_hash(tmp.path()).expect("hash via gobby-core"); + assert_eq!(actual, expected); + + let source = include_str!("hasher.rs"); + let delegate = ["gobby_core", "::indexing::file_content_hash"].concat(); + let local_buffer = format!("let mut buf = [0u8; {}]", 64 * 1024); + assert!(source.contains(&delegate)); + assert!(!source.contains(&local_buffer)); + } } diff --git a/crates/gcode/src/index/indexer.rs b/crates/gcode/src/index/indexer.rs index c6c7a51..007d91d 100644 --- a/crates/gcode/src/index/indexer.rs +++ b/crates/gcode/src/index/indexer.rs @@ -1,24 +1,35 @@ //! Full and incremental indexing orchestrator. //! -//! Writes symbols, files, and content chunks to the PostgreSQL hub. External sync -//! 
(Qdrant vectors, FalkorDB graph) is handled by the Gobby daemon's sync worker, -//! which polls for files with `vectors_synced=false` / `graph_synced=false`. +//! Writes files, symbols, imports, calls, unresolved targets, and content chunks +//! to the PostgreSQL hub. External sync (Qdrant vectors, FalkorDB graph) is +//! delegated through projection sync status and handled outside this module. use std::collections::HashMap; -use std::path::Path; +use std::path::{Path, PathBuf}; use std::time::Instant; -use anyhow::Context; +use anyhow::Context as _; use postgres::{Client, GenericClient}; +use serde::{Deserialize, Serialize}; +use crate::config::Context; +use crate::db; +use crate::graph::code_graph; +use crate::index::api; use crate::index::chunker; use crate::index::hasher; use crate::index::languages; use crate::index::parser; use crate::index::semantic::{self, SemanticCallResolver}; use crate::index::walker; -use crate::models::{IndexResult, IndexedFile, IndexedProject}; -use crate::progress::ProgressBar; +use crate::models::{ + CallRelation, CallTargetKind, ContentChunk, ImportRelation, IndexedFile, IndexedProject, + ParseResult, Symbol, +}; +use crate::projection::sync::{ + self, ProjectionSyncRequest, ProjectionSyncStatus, ProjectionTarget, +}; +use crate::vector::code_symbols; /// Default exclude patterns (matching Python CodeIndexConfig defaults). const DEFAULT_EXCLUDES: &[&str] = &[ @@ -40,61 +51,234 @@ const DEFAULT_EXCLUDES: &[&str] = &[ ".cache", ]; -/// Index a directory (full or incremental). 
-pub fn index_directory( +#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)] +pub struct IndexRequest { + pub project_root: PathBuf, + #[serde(default, skip_serializing_if = "Option::is_none")] + pub path_filter: Option, + #[serde(default)] + pub explicit_files: Vec, + pub full: bool, + pub require_cpp_semantics: bool, + pub sync_projections: bool, +} + +#[derive(Debug, Clone, Default, PartialEq, Eq, Serialize, Deserialize)] +pub struct IndexDurations { + pub discovery_ms: u64, + pub indexing_ms: u64, + pub stats_ms: u64, + pub total_ms: u64, +} + +#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)] +#[serde(tag = "kind", rename_all = "snake_case")] +pub enum IndexDegradation { + FileIndexError { + file_path: String, + message: String, + }, + ProjectionSyncSkipped { + reason: String, + }, + ProjectionCleanupFailed { + file_path: String, + target: ProjectionTarget, + message: String, + }, +} + +#[derive(Debug, Clone, Default, PartialEq, Eq, Serialize, Deserialize)] +pub struct IndexOutcome { + pub project_id: String, + pub scanned_files: usize, + pub indexed_files: usize, + pub skipped_files: usize, + pub symbols_indexed: usize, + pub imports_indexed: usize, + pub calls_indexed: usize, + pub unresolved_targets_indexed: usize, + pub chunks_indexed: usize, + #[serde(default, skip_serializing_if = "Vec::is_empty")] + pub indexed_file_paths: Vec, + pub durations: IndexDurations, + #[serde(default, skip_serializing_if = "Vec::is_empty")] + pub degraded: Vec, + #[serde(default, skip_serializing_if = "Option::is_none")] + pub projection_sync: Option, +} + +impl IndexOutcome { + fn new(project_id: &str) -> Self { + Self { + project_id: project_id.to_string(), + ..Self::default() + } + } + + fn add_counts(&mut self, counts: FileIndexCounts) { + self.indexed_files += counts.indexed_files; + self.symbols_indexed += counts.symbols_indexed; + self.imports_indexed += counts.imports_indexed; + self.calls_indexed += counts.calls_indexed; + 
self.unresolved_targets_indexed += counts.unresolved_targets_indexed; + self.chunks_indexed += counts.chunks_indexed; + if counts.indexed_files > 0 { + self.indexed_file_paths.push(counts.file_path); + } + } +} + +#[derive(Debug, Clone, Default, PartialEq, Eq)] +struct FileIndexCounts { + file_path: String, + indexed_files: usize, + symbols_indexed: usize, + imports_indexed: usize, + calls_indexed: usize, + unresolved_targets_indexed: usize, + chunks_indexed: usize, +} + +trait CodeFactSink { + fn delete_file_facts(&mut self, project_id: &str, file_path: &str) -> anyhow::Result<()>; + fn upsert_symbols(&mut self, symbols: &[Symbol]) -> anyhow::Result; + fn upsert_file(&mut self, file: &IndexedFile) -> anyhow::Result<()>; + fn upsert_imports( + &mut self, + project_id: &str, + file_path: &str, + imports: &[ImportRelation], + ) -> anyhow::Result; + fn upsert_calls( + &mut self, + project_id: &str, + file_path: &str, + calls: &[CallRelation], + ) -> anyhow::Result; + fn upsert_content_chunks(&mut self, chunks: &[ContentChunk]) -> anyhow::Result; +} + +struct PostgresCodeFactSink<'a, C> { + conn: &'a mut C, +} + +impl<'a, C> PostgresCodeFactSink<'a, C> { + fn new(conn: &'a mut C) -> Self { + Self { conn } + } +} + +impl CodeFactSink for PostgresCodeFactSink<'_, C> +where + C: GenericClient, +{ + fn delete_file_facts(&mut self, project_id: &str, file_path: &str) -> anyhow::Result<()> { + api::delete_file_facts(self.conn, project_id, file_path) + } + + fn upsert_symbols(&mut self, symbols: &[Symbol]) -> anyhow::Result { + api::upsert_symbols(self.conn, symbols) + } + + fn upsert_file(&mut self, file: &IndexedFile) -> anyhow::Result<()> { + api::upsert_file(self.conn, file) + } + + fn upsert_imports( + &mut self, + project_id: &str, + file_path: &str, + imports: &[ImportRelation], + ) -> anyhow::Result { + api::upsert_imports(self.conn, project_id, file_path, imports) + } + + fn upsert_calls( + &mut self, + project_id: &str, + file_path: &str, + calls: &[CallRelation], + 
) -> anyhow::Result { + api::upsert_calls(self.conn, project_id, file_path, calls) + } + + fn upsert_content_chunks(&mut self, chunks: &[ContentChunk]) -> anyhow::Result { + api::upsert_content_chunks(self.conn, chunks) + } +} + +pub fn index_files(request: IndexRequest, ctx: &Context) -> anyhow::Result { + let mut conn = db::connect_readwrite(&ctx.database_url)?; + index_files_with_connection(&mut conn, request, ctx) +} + +fn index_files_with_connection( conn: &mut Client, - root_path: &Path, - project_id: &str, - incremental: bool, - quiet: bool, - require_cpp_semantics: bool, -) -> anyhow::Result { + request: IndexRequest, + ctx: &Context, +) -> anyhow::Result { + if request.explicit_files.is_empty() { + index_discovered_files(conn, &request, ctx) + } else { + index_explicit_files_with_connection(conn, &request, ctx) + } +} + +fn index_discovered_files( + conn: &mut Client, + request: &IndexRequest, + ctx: &Context, +) -> anyhow::Result { + let project_id = ctx.project_id.as_str(); let start = Instant::now(); - let mut result = IndexResult { - project_id: project_id.to_string(), - files_indexed: 0, - files_skipped: 0, - symbols_found: 0, - errors: Vec::new(), - duration_ms: 0, - }; + let discovery_start = Instant::now(); + let root_path = &request.project_root; + let mut outcome = IndexOutcome::new(project_id); let excludes: Vec = DEFAULT_EXCLUDES.iter().map(|s| s.to_string()).collect(); - let (candidates, content_only) = walker::discover_files(root_path, &excludes); + let (mut candidates, mut content_only) = walker::discover_files(root_path, &excludes); + if let Some(filter) = request.path_filter.as_deref() { + candidates = filter_discovered_paths(root_path, filter, candidates); + content_only = filter_discovered_paths(root_path, filter, content_only); + } let import_context = parser::build_import_resolution_context(root_path, &candidates); let mut semantic_resolver = - create_semantic_resolver_if_needed(root_path, &candidates, require_cpp_semantics)?; + 
create_semantic_resolver_if_needed(root_path, &candidates, request.require_cpp_semantics)?; // Build current hash map for incremental detection and orphan cleanup. let current_hashes = current_file_hashes(root_path, &candidates, &content_only); - let stale: Option> = if incremental { + let stale: Option> = if !request.full { Some(get_stale_files(conn, project_id, &current_hashes)) } else { None }; - // Clean orphans from the hub; daemon handles FalkorDB/Qdrant cleanup. - let orphans = get_orphan_files(conn, project_id, &current_hashes); - for orphan in &orphans { - delete_file_postgres_data(conn, project_id, orphan); + // Clean orphans only during whole-project scans. Filtered scans do not know + // about files outside the requested subtree. + if request.path_filter.is_none() { + let orphans = get_orphan_files(conn, project_id, &current_hashes); + for orphan in &orphans { + cleanup_deleted_file_projections(ctx, orphan, &mut outcome); + api::delete_file_facts(conn, project_id, orphan)?; + } } - // Index each candidate file let eligible_files = candidates.len() + content_only.len(); - let mut progress = ProgressBar::new(eligible_files, quiet); + outcome.scanned_files = eligible_files; + outcome.durations.discovery_ms = discovery_start.elapsed().as_millis() as u64; + let indexing_start = Instant::now(); for path in &candidates { let rel = match relative_path(path, root_path) { Ok(r) => r, Err(_) => continue, }; - progress.tick(&rel); - if let Some(ref stale_map) = stale && !stale_map.contains_key(&rel) { - result.files_skipped += 1; + outcome.skipped_files += 1; continue; } @@ -107,66 +291,54 @@ pub fn index_directory( &import_context, semantic_resolver.as_deref_mut(), )? 
{ - Some(count) => { - result.files_indexed += 1; - result.symbols_found += count; - } + Some(counts) => outcome.add_counts(counts), None => { - result.files_skipped += 1; + outcome.skipped_files += 1; } } } - // Index content-only files for path in &content_only { let rel = relative_path(path, root_path).unwrap_or_default(); - progress.tick(&rel); if let Some(ref stale_map) = stale && !stale_map.contains_key(&rel) { - result.files_skipped += 1; + outcome.skipped_files += 1; continue; } - if index_content_only(conn, path, project_id, root_path, &excludes)? { - result.files_indexed += 1; - } else { - result.files_skipped += 1; + match index_content_only(conn, path, project_id, root_path, &excludes)? { + Some(counts) => outcome.add_counts(counts), + None => outcome.skipped_files += 1, } } + outcome.durations.indexing_ms = indexing_start.elapsed().as_millis() as u64; - progress.finish(); - - let elapsed_ms = start.elapsed().as_millis() as u64; - result.duration_ms = elapsed_ms; - + let stats_start = Instant::now(); refresh_project_stats( conn, root_path, project_id, - elapsed_ms, + start.elapsed().as_millis() as u64, Some(eligible_files), ); + outcome.durations.stats_ms = stats_start.elapsed().as_millis() as u64; + outcome.durations.total_ms = start.elapsed().as_millis() as u64; - Ok(result) + attach_projection_sync(&mut outcome, request); + Ok(outcome) } -/// Index specific changed files. 
-pub fn index_files( +fn index_explicit_files_with_connection( conn: &mut Client, - root_path: &Path, - project_id: &str, - file_paths: &[String], - require_cpp_semantics: bool, -) -> anyhow::Result { + request: &IndexRequest, + ctx: &Context, +) -> anyhow::Result { + let project_id = ctx.project_id.as_str(); let start = Instant::now(); - let mut result = IndexResult { - project_id: project_id.to_string(), - files_indexed: 0, - files_skipped: 0, - symbols_found: 0, - errors: Vec::new(), - duration_ms: 0, - }; + let discovery_start = Instant::now(); + let root_path = &request.project_root; + let mut outcome = IndexOutcome::new(project_id); + outcome.scanned_files = request.explicit_files.len(); let excludes: Vec = DEFAULT_EXCLUDES.iter().map(|s| s.to_string()).collect(); let (candidates, content_only) = walker::discover_files(root_path, &excludes); @@ -174,16 +346,17 @@ pub fn index_files( let mut routed_files = Vec::new(); let mut ast_files = Vec::new(); - for fp in file_paths { - let abs = if Path::new(fp).is_absolute() { - std::path::PathBuf::from(fp) + for fp in &request.explicit_files { + let abs = if fp.is_absolute() { + fp.clone() } else { root_path.join(fp) }; if !abs.exists() { - // File deleted — clean up hub rows (daemon handles external cleanup). 
- delete_file_postgres_data(conn, project_id, fp); + let rel = requested_relative_path(root_path, fp); + cleanup_deleted_file_projections(ctx, &rel, &mut outcome); + api::delete_file_facts(conn, project_id, &rel)?; continue; } @@ -196,14 +369,16 @@ pub fn index_files( routed_files.push((abs, ExplicitFileRoute::ContentOnly)); } ExplicitFileRoute::Skip => { - result.files_skipped += 1; + outcome.skipped_files += 1; } } } let mut semantic_resolver = - create_semantic_resolver_if_needed(root_path, &ast_files, require_cpp_semantics)?; + create_semantic_resolver_if_needed(root_path, &ast_files, request.require_cpp_semantics)?; + outcome.durations.discovery_ms = discovery_start.elapsed().as_millis() as u64; + let indexing_start = Instant::now(); for (abs, route) in routed_files { match route { ExplicitFileRoute::Ast => { @@ -216,32 +391,35 @@ pub fn index_files( &import_context, semantic_resolver.as_deref_mut(), )? { - result.files_indexed += 1; - result.symbols_found += count; + outcome.add_counts(count); } else { - result.files_skipped += 1; + outcome.skipped_files += 1; } } ExplicitFileRoute::ContentOnly => { - if index_content_only(conn, &abs, project_id, root_path, &excludes)? { - result.files_indexed += 1; - } else { - result.files_skipped += 1; + match index_content_only(conn, &abs, project_id, root_path, &excludes)? 
{ + Some(counts) => outcome.add_counts(counts), + None => outcome.skipped_files += 1, } } _ => unreachable!("skip routes are filtered before indexing"), } } + outcome.durations.indexing_ms = indexing_start.elapsed().as_millis() as u64; - result.duration_ms = start.elapsed().as_millis() as u64; + let stats_start = Instant::now(); refresh_project_stats( conn, root_path, project_id, - result.duration_ms, + start.elapsed().as_millis() as u64, Some(candidates.len() + content_only.len()), ); - Ok(result) + outcome.durations.stats_ms = stats_start.elapsed().as_millis() as u64; + outcome.durations.total_ms = start.elapsed().as_millis() as u64; + + attach_projection_sync(&mut outcome, request); + Ok(outcome) } /// Index a single file. Returns symbol count or None if skipped. @@ -253,7 +431,7 @@ fn index_file( exclude_patterns: &[String], import_context: &parser::ImportResolutionContext, semantic_resolver: Option<&mut (dyn SemanticCallResolver + '_)>, -) -> anyhow::Result> { +) -> anyhow::Result> { let rel = match relative_path(file_path, root_path) { Ok(rel) => rel, Err(_) => return Ok(None), @@ -271,46 +449,28 @@ fn index_file( return Ok(None); }; - let count = parse_result.symbols.len(); - // PostgreSQL hub writes (transactional). 
let mut tx = conn .transaction() .context("start indexed file transaction")?; - delete_file_postgres_data(&mut tx, project_id, &rel); - upsert_symbols(&mut tx, &parse_result.symbols)?; - let language = languages::detect_language(&file_path.to_string_lossy()).unwrap_or("unknown"); let h = hasher::file_content_hash(file_path).unwrap_or_default(); let size = file_path.metadata().map(|m| m.len()).unwrap_or(0); - - upsert_file( - &mut tx, - &IndexedFile { - id: IndexedFile::make_id(project_id, &rel), - project_id: project_id.to_string(), - file_path: rel.clone(), - language: language.to_string(), - content_hash: h, - symbol_count: count, - byte_size: size as usize, - indexed_at: epoch_secs_str(), - }, - ); - - upsert_imports(&mut tx, project_id, &rel, &parse_result.imports); - upsert_calls(&mut tx, project_id, &rel, &parse_result.calls); - - let chunks = - chunker::chunk_file_content(&parse_result.source, &rel, project_id, Some(language)); - if !chunks.is_empty() { - upsert_content_chunks(&mut tx, &chunks); - } + let mut sink = PostgresCodeFactSink::new(&mut tx); + let counts = write_parsed_file_facts( + &mut sink, + project_id, + &rel, + language, + &h, + size as usize, + &parse_result, + )?; tx.commit().context("commit indexed file transaction")?; - Ok(Some(count)) + Ok(Some(counts)) } fn create_semantic_resolver_if_needed( @@ -356,19 +516,19 @@ fn index_content_only( project_id: &str, root_path: &Path, exclude_patterns: &[String], -) -> anyhow::Result { +) -> anyhow::Result> { if !walker::is_content_indexable(root_path, path, exclude_patterns) { - return Ok(false); + return Ok(None); } let rel = match relative_path(path, root_path) { Ok(r) => r, - Err(_) => return Ok(false), + Err(_) => return Ok(None), }; let source = match std::fs::read(path) { Ok(s) => s, - Err(_) => return Ok(false), + Err(_) => return Ok(None), }; let lang = walker::content_language(path); @@ -377,30 +537,192 @@ fn index_content_only( let mut tx = conn .transaction() .context("start content-only 
file transaction")?; + let mut sink = PostgresCodeFactSink::new(&mut tx); + let counts = write_content_only_file_facts( + &mut sink, + project_id, + &rel, + &lang, + &content_hash, + source.len(), + &source, + )?; - delete_file_postgres_data(&mut tx, project_id, &rel); - upsert_file( - &mut tx, - &IndexedFile { - id: IndexedFile::make_id(project_id, &rel), - project_id: project_id.to_string(), - file_path: rel.clone(), - language: lang.clone(), - content_hash, - symbol_count: 0, - byte_size: source.len(), - indexed_at: epoch_secs_str(), - }, - ); + tx.commit() + .context("commit content-only file transaction")?; + Ok(Some(counts)) +} + +fn write_parsed_file_facts( + sink: &mut impl CodeFactSink, + project_id: &str, + rel: &str, + language: &str, + content_hash: &str, + byte_size: usize, + parse_result: &ParseResult, +) -> anyhow::Result { + sink.delete_file_facts(project_id, rel)?; + let symbols_indexed = sink.upsert_symbols(&parse_result.symbols)?; + sink.upsert_file(&IndexedFile { + id: IndexedFile::make_id(project_id, rel), + project_id: project_id.to_string(), + file_path: rel.to_string(), + language: language.to_string(), + content_hash: content_hash.to_string(), + symbol_count: parse_result.symbols.len(), + byte_size, + indexed_at: epoch_secs_str(), + })?; + let imports_indexed = sink.upsert_imports(project_id, rel, &parse_result.imports)?; + let calls_indexed = sink.upsert_calls(project_id, rel, &parse_result.calls)?; + let unresolved_targets_indexed = parse_result + .calls + .iter() + .filter(|call| call.callee_target_kind == CallTargetKind::Unresolved) + .count(); + let chunks = chunker::chunk_file_content(&parse_result.source, rel, project_id, Some(language)); + let chunks_indexed = if chunks.is_empty() { + 0 + } else { + sink.upsert_content_chunks(&chunks)? 
+ }; + + Ok(FileIndexCounts { + file_path: rel.to_string(), + indexed_files: 1, + symbols_indexed, + imports_indexed, + calls_indexed, + unresolved_targets_indexed, + chunks_indexed, + }) +} + +fn write_content_only_file_facts( + sink: &mut impl CodeFactSink, + project_id: &str, + rel: &str, + language: &str, + content_hash: &str, + byte_size: usize, + source: &[u8], +) -> anyhow::Result { + sink.delete_file_facts(project_id, rel)?; + sink.upsert_file(&IndexedFile { + id: IndexedFile::make_id(project_id, rel), + project_id: project_id.to_string(), + file_path: rel.to_string(), + language: language.to_string(), + content_hash: content_hash.to_string(), + symbol_count: 0, + byte_size, + indexed_at: epoch_secs_str(), + })?; + let chunks = chunker::chunk_file_content(source, rel, project_id, Some(language)); + let chunks_indexed = if chunks.is_empty() { + 0 + } else { + sink.upsert_content_chunks(&chunks)? + }; + + Ok(FileIndexCounts { + file_path: rel.to_string(), + indexed_files: 1, + chunks_indexed, + ..FileIndexCounts::default() + }) +} + +fn filter_discovered_paths( + root_path: &Path, + path_filter: &Path, + paths: Vec, +) -> Vec { + let filter_abs = if path_filter.is_absolute() { + path_filter.to_path_buf() + } else { + root_path.join(path_filter) + }; + let filter_abs = filter_abs.canonicalize().unwrap_or(filter_abs); + + paths + .into_iter() + .filter(|path| { + let path_abs = path.canonicalize().unwrap_or_else(|_| path.clone()); + path_abs == filter_abs || path_abs.starts_with(&filter_abs) + }) + .collect() +} - let chunks = chunker::chunk_file_content(&source, &rel, project_id, Some(&lang)); - if !chunks.is_empty() { - upsert_content_chunks(&mut tx, &chunks); +fn requested_relative_path(root_path: &Path, requested_path: &Path) -> String { + if requested_path.is_absolute() { + return requested_path + .strip_prefix(root_path) + .unwrap_or(requested_path) + .to_string_lossy() + .to_string(); } + requested_path.to_string_lossy().to_string() +} - tx.commit() - 
.context("commit content-only file transaction")?; - Ok(true) +fn cleanup_deleted_file_projections(ctx: &Context, file_path: &str, outcome: &mut IndexOutcome) { + if let Err(error) = code_graph::delete_file_projection(ctx, file_path) { + push_projection_cleanup_degradation( + outcome, + file_path, + ProjectionTarget::Graph, + error.to_string(), + ); + } + + match ctx.qdrant.as_ref() { + Some(qdrant) => { + if let Err(error) = + code_symbols::delete_file_vectors(qdrant, &ctx.project_id, file_path) + { + push_projection_cleanup_degradation( + outcome, + file_path, + ProjectionTarget::Vectors, + error.to_string(), + ); + } + } + None => push_projection_cleanup_degradation( + outcome, + file_path, + ProjectionTarget::Vectors, + "Qdrant config is required for deleted-file vector cleanup".to_string(), + ), + } +} + +fn push_projection_cleanup_degradation( + outcome: &mut IndexOutcome, + file_path: &str, + target: ProjectionTarget, + message: String, +) { + outcome + .degraded + .push(IndexDegradation::ProjectionCleanupFailed { + file_path: file_path.to_string(), + target, + message, + }); +} + +fn attach_projection_sync(outcome: &mut IndexOutcome, request: &IndexRequest) { + if !request.sync_projections { + return; + } + + outcome.projection_sync = Some(sync::pending_after_code_fact_write(ProjectionSyncRequest { + project_id: outcome.project_id.clone(), + file_paths: outcome.indexed_file_paths.clone(), + targets: vec![ProjectionTarget::Graph, ProjectionTarget::Vectors], + })); } /// Invalidate all index data for a project. 
@@ -472,129 +794,6 @@ fn notify_daemon_invalidate(base_url: &str, project_id: &str) { } } -// ── PostgreSQL helpers ───────────────────────────────────────────────── - -fn upsert_symbols( - conn: &mut impl GenericClient, - symbols: &[crate::models::Symbol], -) -> anyhow::Result<()> { - for sym in symbols { - conn.execute( - "INSERT INTO code_symbols ( - id, project_id, file_path, name, qualified_name, - kind, language, byte_start, byte_end, - line_start, line_end, signature, docstring, - parent_symbol_id, content_hash, summary, - created_at, updated_at - ) VALUES ($1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16,NOW(),NOW()) - ON CONFLICT(id) DO UPDATE SET - name=excluded.name, qualified_name=excluded.qualified_name, - kind=excluded.kind, byte_start=excluded.byte_start, - byte_end=excluded.byte_end, line_start=excluded.line_start, - line_end=excluded.line_end, signature=excluded.signature, - docstring=excluded.docstring, parent_symbol_id=excluded.parent_symbol_id, - language=excluded.language, content_hash=excluded.content_hash, - summary=CASE WHEN excluded.content_hash != code_symbols.content_hash - THEN NULL ELSE code_symbols.summary END, - updated_at=NOW()", - &[ - &sym.id, - &sym.project_id, - &sym.file_path, - &sym.name, - &sym.qualified_name, - &sym.kind, - &sym.language, - &to_i32(sym.byte_start), - &to_i32(sym.byte_end), - &to_i32(sym.line_start), - &to_i32(sym.line_end), - &sym.signature, - &sym.docstring, - &sym.parent_symbol_id, - &sym.content_hash, - &sym.summary, - ], - )?; - } - Ok(()) -} - -fn upsert_file(conn: &mut impl GenericClient, file: &IndexedFile) { - let _ = conn.execute( - "INSERT INTO code_indexed_files ( - id, project_id, file_path, language, content_hash, - symbol_count, byte_size, graph_synced, vectors_synced, - graph_sync_attempted_at, indexed_at - ) VALUES ($1,$2,$3,$4,$5,$6,$7,false,false,NULL,NOW()) - ON CONFLICT(id) DO UPDATE SET - content_hash=excluded.content_hash, - symbol_count=excluded.symbol_count, - 
byte_size=excluded.byte_size, - graph_synced=false, - vectors_synced=false, - graph_sync_attempted_at=NULL, - indexed_at=NOW()", - &[ - &file.id, - &file.project_id, - &file.file_path, - &file.language, - &file.content_hash, - &to_i32(file.symbol_count), - &to_i32(file.byte_size), - ], - ); -} - -fn upsert_content_chunks(conn: &mut impl GenericClient, chunks: &[crate::models::ContentChunk]) { - for chunk in chunks { - let _ = conn.execute( - "INSERT INTO code_content_chunks ( - id, project_id, file_path, chunk_index, - line_start, line_end, content, language, created_at - ) VALUES ($1,$2,$3,$4,$5,$6,$7,$8,NOW()) - ON CONFLICT(id) DO UPDATE SET - content=excluded.content, - line_start=excluded.line_start, - line_end=excluded.line_end", - &[ - &chunk.id, - &chunk.project_id, - &chunk.file_path, - &to_i32(chunk.chunk_index), - &to_i32(chunk.line_start), - &to_i32(chunk.line_end), - &chunk.content, - &chunk.language, - ], - ); - } -} - -fn upsert_project_stats(conn: &mut impl GenericClient, project: &IndexedProject) { - let _ = conn.execute( - "INSERT INTO code_indexed_projects ( - id, root_path, total_files, total_symbols, - last_indexed_at, index_duration_ms - ) VALUES ($1,$2,$3,$4,NOW(),$5) - ON CONFLICT(id) DO UPDATE SET - root_path=excluded.root_path, - total_files=excluded.total_files, - total_symbols=excluded.total_symbols, - last_indexed_at=excluded.last_indexed_at, - index_duration_ms=excluded.index_duration_ms, - updated_at=NOW()", - &[ - &project.id, - &project.root_path, - &to_i32(project.total_files), - &to_i32(project.total_symbols), - &to_i32(project.index_duration_ms as usize), - ], - ); -} - fn refresh_project_stats( conn: &mut Client, root_path: &Path, @@ -605,7 +804,7 @@ fn refresh_project_stats( let total_files = count_rows(conn, "code_indexed_files", project_id); let total_symbols = count_rows(conn, "code_symbols", project_id); - upsert_project_stats( + let _ = api::upsert_project_stats( conn, &IndexedProject { id: project_id.to_string(), @@ 
-619,86 +818,6 @@ fn refresh_project_stats( ); } -/// Delete PostgreSQL hub data for a file. -fn delete_file_postgres_data(conn: &mut impl GenericClient, project_id: &str, file_path: &str) { - let _ = conn.execute( - "DELETE FROM code_symbols WHERE project_id = $1 AND file_path = $2", - &[&project_id, &file_path], - ); - let _ = conn.execute( - "DELETE FROM code_indexed_files WHERE project_id = $1 AND file_path = $2", - &[&project_id, &file_path], - ); - let _ = conn.execute( - "DELETE FROM code_content_chunks WHERE project_id = $1 AND file_path = $2", - &[&project_id, &file_path], - ); - let _ = conn.execute( - "DELETE FROM code_imports WHERE project_id = $1 AND source_file = $2", - &[&project_id, &file_path], - ); - let _ = conn.execute( - "DELETE FROM code_calls WHERE project_id = $1 AND file_path = $2", - &[&project_id, &file_path], - ); -} - -/// Write import relations to Postgres (delete-then-insert per file). -fn upsert_imports( - conn: &mut impl GenericClient, - project_id: &str, - file_path: &str, - imports: &[crate::models::ImportRelation], -) { - let _ = conn.execute( - "DELETE FROM code_imports WHERE project_id = $1 AND source_file = $2", - &[&project_id, &file_path], - ); - for imp in imports { - let _ = conn.execute( - "INSERT INTO code_imports (project_id, source_file, target_module) - VALUES ($1, $2, $3) - ON CONFLICT (project_id, source_file, target_module) DO NOTHING", - &[&project_id, &imp.file_path, &imp.module_name], - ); - } -} - -/// Write call relations to Postgres (delete-then-insert per file). 
-fn upsert_calls( - conn: &mut impl GenericClient, - project_id: &str, - file_path: &str, - calls: &[crate::models::CallRelation], -) { - let _ = conn.execute( - "DELETE FROM code_calls WHERE project_id = $1 AND file_path = $2", - &[&project_id, &file_path], - ); - for call in calls { - let _ = conn.execute( - "INSERT INTO code_calls - (project_id, caller_symbol_id, callee_symbol_id, callee_name, \ - callee_target_kind, callee_external_module, file_path, line) - VALUES ($1, $2, $3, $4, $5, $6, $7, $8) - ON CONFLICT ( - project_id, caller_symbol_id, callee_symbol_id, callee_name, - callee_target_kind, callee_external_module, file_path, line - ) DO NOTHING", - &[ - &project_id, - &call.caller_symbol_id, - &call.callee_symbol_id.as_deref().unwrap_or(""), - &call.callee_name, - &call.callee_target_kind.as_str(), - &call.callee_external_module.as_deref().unwrap_or(""), - &call.file_path, - &to_i32(call.line), - ], - ); - } -} - fn get_stale_files( conn: &mut Client, project_id: &str, @@ -775,10 +894,6 @@ fn count_rows(conn: &mut Client, table: &str, project_id: &str) -> usize { .unwrap_or(0) as usize } -fn to_i32(value: usize) -> i32 { - value.try_into().unwrap_or(i32::MAX) -} - fn relative_path(path: &Path, root: &Path) -> anyhow::Result { let abs = path.canonicalize()?; let root_abs = root.canonicalize()?; @@ -797,8 +912,11 @@ fn epoch_secs_str() -> String { #[cfg(test)] mod tests { use super::*; - use crate::models::{CallRelation, CallTargetKind}; + use crate::models::{CallRelation, CallTargetKind, ImportRelation, ParseResult, Symbol}; + use serde::Serialize; + use serde::de::DeserializeOwned; use std::path::Path; + use std::path::PathBuf; fn write_file(root: &Path, rel: &str, contents: &[u8]) { let path = root.join(rel); @@ -808,6 +926,192 @@ mod tests { std::fs::write(path, contents).expect("write file"); } + fn assert_cli_independent_contract<T>() + where + T: Serialize + DeserializeOwned, + { + let type_name = std::any::type_name::<T>(); + 
assert!(!type_name.contains("commands::"), "{type_name}"); + assert!(!type_name.contains("output::"), "{type_name}"); + assert!(!type_name.contains("clap"), "{type_name}"); + } + + #[test] + fn library_api_is_cli_independent() { + assert_cli_independent_contract::(); + assert_cli_independent_contract::(); + assert_cli_independent_contract::(); + assert_cli_independent_contract::(); + + let request = IndexRequest { + project_root: PathBuf::from("/tmp/project"), + path_filter: Some(PathBuf::from("src")), + explicit_files: vec![PathBuf::from("src/lib.rs")], + full: true, + require_cpp_semantics: false, + sync_projections: true, + }; + + let json = serde_json::to_value(&request).expect("request serializes"); + assert_eq!(json["project_root"], "/tmp/project"); + assert_eq!(json["path_filter"], "src"); + assert_eq!(json["explicit_files"][0], "src/lib.rs"); + } + + #[test] + fn invalidate_postgres_deletes_are_project_scoped() { + let source = include_str!("indexer.rs"); + for expected in [ + "DELETE FROM code_symbols WHERE project_id = $1", + "DELETE FROM code_indexed_files WHERE project_id = $1", + "DELETE FROM code_content_chunks WHERE project_id = $1", + "DELETE FROM code_imports WHERE project_id = $1", + "DELETE FROM code_calls WHERE project_id = $1", + "DELETE FROM code_indexed_projects WHERE id = $1", + ] { + assert!( + source.contains(expected), + "missing scoped delete: {expected}" + ); + } + let truncate_code = ["TRUNCATE", " code_"].concat(); + let drop_table = ["DROP", " TABLE"].concat(); + assert!(!source.contains(&truncate_code)); + assert!(!source.contains(&drop_table)); + } + + #[derive(Default)] + struct RecordingCodeFactSink { + writes: Vec<&'static str>, + files: usize, + symbols: usize, + imports: usize, + calls: usize, + unresolved_targets: usize, + chunks: usize, + } + + impl CodeFactSink for RecordingCodeFactSink { + fn delete_file_facts(&mut self, _project_id: &str, _file_path: &str) -> anyhow::Result<()> { + self.writes.push("delete"); + Ok(()) + } 
+ + fn upsert_symbols(&mut self, symbols: &[Symbol]) -> anyhow::Result<usize> { + self.writes.push("symbols"); + self.symbols += symbols.len(); + Ok(symbols.len()) + } + + fn upsert_file(&mut self, _file: &IndexedFile) -> anyhow::Result<()> { + self.writes.push("file"); + self.files += 1; + Ok(()) + } + + fn upsert_imports( + &mut self, + _project_id: &str, + _file_path: &str, + imports: &[ImportRelation], + ) -> anyhow::Result<usize> { + self.writes.push("imports"); + self.imports += imports.len(); + Ok(imports.len()) + } + + fn upsert_calls( + &mut self, + _project_id: &str, + _file_path: &str, + calls: &[CallRelation], + ) -> anyhow::Result<usize> { + self.writes.push("calls"); + self.calls += calls.len(); + self.unresolved_targets += calls + .iter() + .filter(|call| call.callee_target_kind == CallTargetKind::Unresolved) + .count(); + Ok(calls.len()) + } + + fn upsert_content_chunks(&mut self, chunks: &[ContentChunk]) -> anyhow::Result<usize> { + self.writes.push("chunks"); + self.chunks += chunks.len(); + Ok(chunks.len()) + } + } + + #[test] + fn library_writes_all_code_facts() { + let project_id = "project-1"; + let rel = "src/lib.rs"; + let source = b"use std::fmt;\nfn caller() {\n missing();\n}\n"; + let caller_id = Symbol::make_id(project_id, rel, "caller", "function", 14); + let parse_result = ParseResult { + symbols: vec![Symbol { + id: caller_id.clone(), + project_id: project_id.to_string(), + file_path: rel.to_string(), + name: "caller".to_string(), + qualified_name: "caller".to_string(), + kind: "function".to_string(), + language: "rust".to_string(), + byte_start: 14, + byte_end: 45, + line_start: 2, + line_end: 4, + signature: Some("fn caller()".to_string()), + docstring: None, + parent_symbol_id: None, + content_hash: "hash-1".to_string(), + summary: None, + created_at: String::new(), + updated_at: String::new(), + }], + imports: vec![ImportRelation { + file_path: rel.to_string(), + module_name: "std::fmt".to_string(), + }], + calls: 
vec![CallRelation::new( + caller_id, + 
"missing".to_string(), + rel.to_string(), + 3, + )], + source: source.to_vec(), + }; + + let mut sink = RecordingCodeFactSink::default(); + let counts = write_parsed_file_facts( + &mut sink, + project_id, + rel, + "rust", + "hash-1", + source.len(), + &parse_result, + ) + .expect("write parsed file facts"); + + assert_eq!( + sink.writes, + vec!["delete", "symbols", "file", "imports", "calls", "chunks"] + ); + assert_eq!(sink.files, 1); + assert_eq!(sink.symbols, 1); + assert_eq!(sink.imports, 1); + assert_eq!(sink.calls, 1); + assert_eq!(sink.unresolved_targets, 1); + assert_eq!(sink.chunks, 1); + assert_eq!(counts.indexed_files, 1); + assert_eq!(counts.symbols_indexed, 1); + assert_eq!(counts.imports_indexed, 1); + assert_eq!(counts.calls_indexed, 1); + assert_eq!(counts.unresolved_targets_indexed, 1); + assert_eq!(counts.chunks_indexed, 1); + } + #[test] fn call_relation_contract_uses_empty_optional_storage_values() { let resolved = CallRelation::new( @@ -871,4 +1175,42 @@ mod tests { ExplicitFileRoute::Skip ); } + + #[test] + fn deleted_file_projection_cleanup_degrades_without_services() { + let ctx = Context { + database_url: "postgresql://localhost/nonexistent".to_string(), + project_root: PathBuf::from("/project"), + project_id: "project-1".to_string(), + quiet: true, + falkordb: None, + qdrant: None, + embedding: None, + code_vectors: crate::config::CodeVectorSettings { vector_dim: None }, + daemon_url: None, + }; + let mut outcome = IndexOutcome::new("project-1"); + + cleanup_deleted_file_projections(&ctx, "src/deleted.rs", &mut outcome); + + assert_eq!(outcome.degraded.len(), 2); + assert!(outcome.degraded.iter().any(|degradation| matches!( + degradation, + IndexDegradation::ProjectionCleanupFailed { + file_path, + target: ProjectionTarget::Graph, + message, + } if file_path == "src/deleted.rs" + && message.contains("FalkorDB is not configured") + ))); + assert!(outcome.degraded.iter().any(|degradation| matches!( + degradation, + 
IndexDegradation::ProjectionCleanupFailed { + file_path, + target: ProjectionTarget::Vectors, + message, + } if file_path == "src/deleted.rs" + && message.contains("Qdrant config is required") + ))); + } } diff --git a/crates/gcode/src/index/mod.rs b/crates/gcode/src/index/mod.rs index 650aba1..8b39917 100644 --- a/crates/gcode/src/index/mod.rs +++ b/crates/gcode/src/index/mod.rs @@ -1,3 +1,4 @@ +pub mod api; pub mod chunker; pub mod hasher; pub mod import_resolution; diff --git a/crates/gcode/src/index/parser.rs b/crates/gcode/src/index/parser.rs index b03932b..c038a1e 100644 --- a/crates/gcode/src/index/parser.rs +++ b/crates/gcode/src/index/parser.rs @@ -43,7 +43,7 @@ pub(crate) fn parse_file_with_semantic( if !security::is_symlink_safe(file_path, root_path) { return Ok(None); } - if security::should_exclude(file_path, exclude_patterns) { + if security::should_exclude_path(root_path, file_path, exclude_patterns) { return Ok(None); } if security::has_secret_extension(file_path) { diff --git a/crates/gcode/src/index/security.rs b/crates/gcode/src/index/security.rs index 5b0ebf2..5bff86c 100644 --- a/crates/gcode/src/index/security.rs +++ b/crates/gcode/src/index/security.rs @@ -18,6 +18,10 @@ const SECRET_PREFIXES: &[&str] = &["credentials", ".env", "id_rsa", "id_ed25519" const SECRET_SUBSTRINGS: &[&str] = &["api_key", "apikey", "_secret.", "_token."]; +/// Generated output directories that are excluded only when they are the +/// first component under the indexed root. +const ROOT_GENERATED_DIRS: &[&str] = &["build", "dist"]; + /// Check that `path` resolves within `root` (prevents directory traversal). pub fn validate_path(path: &Path, root: &Path) -> bool { match (path.canonicalize(), root.canonicalize()) { @@ -49,19 +53,42 @@ pub fn is_binary(path: &Path) -> bool { buf[..n].contains(&0) } -/// Check if any path component matches an exclusion pattern. -pub fn should_exclude(path: &Path, patterns: &[String]) -> bool { +/// Check if a path should be excluded. 
+/// +/// Patterns listed in `ROOT_GENERATED_DIRS` match only the first relative path +/// component, so source paths like `src/package/build/mod.rs` remain indexable. +/// Other exclude patterns match any component of the relative path. +pub fn should_exclude_path(root: &Path, path: &Path, patterns: &[String]) -> bool { + let rel = path.strip_prefix(root).unwrap_or(path); + for pattern in patterns { - for component in path.components() { + if is_root_generated_dir(pattern) { + if rel + .components() + .next() + .map(|component| glob_match(pattern, &component.as_os_str().to_string_lossy())) + .unwrap_or(false) + { + return true; + } + continue; + } + + for component in rel.components() { let name = component.as_os_str().to_string_lossy(); if glob_match(pattern, &name) { return true; } } } + false } +fn is_root_generated_dir(pattern: &str) -> bool { + ROOT_GENERATED_DIRS.contains(&pattern) +} + /// Check if file extension suggests secret content. pub fn has_secret_extension(path: &Path) -> bool { let name = path diff --git a/crates/gcode/src/index/walker.rs b/crates/gcode/src/index/walker.rs index cdb97e8..4ebe8de 100644 --- a/crates/gcode/src/index/walker.rs +++ b/crates/gcode/src/index/walker.rs @@ -3,8 +3,6 @@ use std::path::{Path, PathBuf}; -use ignore::WalkBuilder; - use crate::index::languages; use crate::index::security; @@ -24,12 +22,11 @@ pub fn discover_files(root: &Path, exclude_patterns: &[String]) -> (Vec let mut candidates = Vec::new(); let mut content_only = Vec::new(); - let walker = WalkBuilder::new(root) - .hidden(true) - .git_ignore(true) - .git_global(true) - .git_exclude(true) - .build(); + let mut settings = gobby_core::indexing::WalkerSettings::new(root); + settings.max_filesize = Some(MAX_FILE_SIZE); + let mut builder = settings.into_walker(); + builder.hidden(true); + let walker = builder.build(); for entry in walker.flatten() { let path = entry.path(); @@ -90,7 +87,7 @@ fn is_safe_text_file(root: &Path, path: &Path, exclude_patterns: 
&[String]) -> b if !security::is_symlink_safe(path, root) { return false; } - if security::should_exclude(path, exclude_patterns) { + if security::should_exclude_path(root, path, exclude_patterns) { return false; } if security::has_secret_extension(path) { @@ -186,4 +183,44 @@ mod tests { ); assert_eq!(content_language(&root.join("Makefile")), "text"); } + + #[test] + fn classifies_source_build_directory_as_ast_indexable() { + let tmp = tempfile::tempdir().expect("tempdir"); + let root = tmp.path(); + write_file( + root, + "src/gobby/build/workspaces.py", + b"class WorkspaceBuilder:\n pass\n", + ); + let excludes = vec!["build".to_string(), "dist".to_string()]; + + assert_eq!( + classify_file(root, &root.join("src/gobby/build/workspaces.py"), &excludes), + Some(FileClassification::Ast) + ); + } + + #[test] + fn skips_root_build_directory() { + let tmp = tempfile::tempdir().expect("tempdir"); + let root = tmp.path(); + write_file(root, "build/generated.py", b"class Generated:\n pass\n"); + let excludes = vec!["build".to_string(), "dist".to_string()]; + + assert_eq!( + classify_file(root, &root.join("build/generated.py"), &excludes), + None + ); + } + + #[test] + fn walker_consumes_gobby_core_walker_settings() { + let source = include_str!("walker.rs"); + let settings = ["gobby_core", "::indexing::WalkerSettings"].concat(); + let direct_builder = ["WalkBuilder", "::new(root)"].concat(); + + assert!(source.contains(&settings)); + assert!(!source.contains(&direct_builder)); + } } diff --git a/crates/gcode/src/lib.rs b/crates/gcode/src/lib.rs new file mode 100644 index 0000000..1082ee5 --- /dev/null +++ b/crates/gcode/src/lib.rs @@ -0,0 +1,193 @@ +pub mod commands; +pub mod config; +pub mod db; +pub mod falkor; +pub mod freshness; +pub mod git; +pub mod graph; +pub mod index; +pub mod models; +pub mod output; +pub mod progress; +pub mod project; +pub mod projection; +pub mod savings; +pub mod schema; +pub mod search; +pub mod secrets; +pub mod setup; +pub mod skill; 
+pub mod utils; +pub mod vector; + +pub use index::api::{IndexDegradation, IndexDurations, IndexOutcome, IndexRequest, index_files}; + +#[cfg(test)] +mod tests { + use serde::Serialize; + use serde::de::DeserializeOwned; + + fn assert_cli_independent_contract<T>() + where + T: Serialize + DeserializeOwned, + { + let type_name = std::any::type_name::<T>(); + assert!(!type_name.contains("commands::"), "{type_name}"); + assert!(!type_name.contains("output::"), "{type_name}"); + assert!(!type_name.contains("clap"), "{type_name}"); + } + + #[test] + fn public_projection_api_is_cli_independent() { + let manifest_dir = std::path::Path::new(env!("CARGO_MANIFEST_DIR")); + for rel_path in [ + "src/index/api.rs", + "src/graph/typed_query.rs", + "src/graph/code_graph.rs", + "src/vector/code_symbols.rs", + "src/projection/sync.rs", + ] { + assert!( + manifest_dir.join(rel_path).exists(), + "missing projection boundary module {rel_path}" + ); + } + + assert_cli_independent_contract::(); + assert_cli_independent_contract::(); + assert_cli_independent_contract::(); + assert_cli_independent_contract::(); + assert_cli_independent_contract::(); + assert_cli_independent_contract::(); + assert_cli_independent_contract::(); + assert_cli_independent_contract::(); + assert_cli_independent_contract::(); + assert_cli_independent_contract::( + ); + assert_cli_independent_contract::(); + assert_cli_independent_contract::(); + assert_cli_independent_contract::(); + assert_cli_independent_contract::(); + } + + #[test] + fn falkor_facade_is_available() { + let _ = std::any::type_name::(); + + let ctx = crate::config::Context { + database_url: "postgresql://localhost/nonexistent".to_string(), + project_root: std::path::PathBuf::from("/nonexistent"), + project_id: "project-1".to_string(), + quiet: true, + falkordb: None, + qdrant: None, + embedding: None, + code_vectors: crate::config::CodeVectorSettings::default(), + daemon_url: None, + }; + + let value = 
crate::falkor::with_falkor(&ctx, 7usize, |_| 
Ok(9usize)) + .expect("missing FalkorDB config degrades to default"); + assert_eq!(value, 7); + } + + #[test] + fn foundation_consumer_migration() { + let manifest_dir = std::path::Path::new(env!("CARGO_MANIFEST_DIR")); + let cargo = std::fs::read_to_string(manifest_dir.join("Cargo.toml")) + .expect("read gobby-code Cargo.toml"); + for feature in ["postgres", "falkor", "qdrant", "search", "indexing"] { + assert!( + cargo.contains(feature), + "gobby-code must enable gobby-core feature `{feature}`" + ); + } + + let config = + std::fs::read_to_string(manifest_dir.join("src/config.rs")).expect("read config.rs"); + assert!(config.contains("gobby_core::config::resolve_falkordb_config")); + assert!(config.contains("gobby_core::config::resolve_qdrant_config")); + assert!(config.contains("gobby_core::config::resolve_embedding_config")); + assert!(config.contains("impl gobby_core::config::ConfigSource for PostgresConfigSource")); + assert!(config.contains("gobby_core::postgres::read_config_value")); + assert!(!config.contains("fn decode_config_value(")); + + let db = std::fs::read_to_string(manifest_dir.join("src/db.rs")).expect("read db.rs"); + assert!(db.contains("gobby_core::postgres::connect_readonly")); + assert!(db.contains("gobby_core::postgres::connect_readwrite")); + assert!(!db.contains("Client::connect(database_url, NoTls)")); + + let graph = std::fs::read_to_string(manifest_dir.join("src/graph/code_graph.rs")) + .expect("read graph/code_graph.rs"); + assert!(graph.contains("gobby_core::falkor::with_graph")); + assert!(!graph.contains("falkor::with_falkor")); + + let semantic = std::fs::read_to_string(manifest_dir.join("src/search/semantic.rs")) + .expect("read search/semantic.rs"); + assert!(semantic.contains("gobby_core::qdrant::with_qdrant")); + assert!(semantic.contains("gobby_core::qdrant::collection_name")); + assert!(semantic.contains("gobby_core::qdrant::search")); + } + + #[test] + fn indexing_search_primitive_migration() { + let manifest_dir = 
std::path::Path::new(env!("CARGO_MANIFEST_DIR")); + + let walker = std::fs::read_to_string(manifest_dir.join("src/index/walker.rs")) + .expect("read index/walker.rs"); + assert!(walker.contains("gobby_core::indexing::WalkerSettings")); + let local_walker_builder = ["WalkBuilder", "::new(root)"].concat(); + assert!(!walker.contains(&local_walker_builder)); + + let hasher = std::fs::read_to_string(manifest_dir.join("src/index/hasher.rs")) + .expect("read index/hasher.rs"); + assert!(hasher.contains("gobby_core::indexing::file_content_hash")); + let local_buffer = format!("let mut buf = [0u8; {}]", 64 * 1024); + assert!(!hasher.contains(&local_buffer)); + + let rrf = + std::fs::read_to_string(manifest_dir.join("src/search/rrf.rs")).expect("read rrf.rs"); + assert!(rrf.contains("gobby_core::search::rrf_merge")); + let local_rrf_const = ["const ", "RRF_K"].concat(); + assert!(!rrf.contains(&local_rrf_const)); + + let chunker = std::fs::read_to_string(manifest_dir.join("src/index/chunker.rs")) + .expect("read index/chunker.rs"); + assert!(!chunker.contains("use gobby_core::indexing::Chunk")); + assert!(!chunker.contains("use gobby_core::indexing::ChunkIdentity")); + assert!(!chunker.contains("use gobby_core::indexing::IndexEvent")); + assert!(!chunker.contains("use gobby_core::indexing::index_events_from_hashes")); + } + + #[test] + fn falkor_facade_exception_scoped_to_falkor_rs() { + let manifest_dir = std::path::Path::new(env!("CARGO_MANIFEST_DIR")); + let src_dir = manifest_dir.join("src"); + let mut offenders = Vec::new(); + + fn visit(path: &std::path::Path, offenders: &mut Vec<std::path::PathBuf>) { + for entry in std::fs::read_dir(path).expect("read source directory") { + let entry = entry.expect("source entry"); + let path = entry.path(); + if path.is_dir() { + visit(&path, offenders); + continue; + } + if path.extension().and_then(|ext| ext.to_str()) != Some("rs") { + continue; + } + let source = std::fs::read_to_string(&path).expect("read source file"); + let 
builder = ["Falkor", 
"Client", "Builder"].concat(); + if source.contains(&builder) && !path.ends_with("falkor.rs") { + offenders.push(path); + } + } + } + + visit(&src_dir, &mut offenders); + assert!( + offenders.is_empty(), + "Falkor client builder must remain scoped to falkor.rs: {offenders:?}" + ); + } +} diff --git a/crates/gcode/src/main.rs b/crates/gcode/src/main.rs index a845df5..b64614e 100644 --- a/crates/gcode/src/main.rs +++ b/crates/gcode/src/main.rs @@ -1,21 +1,5 @@ -mod commands; -mod config; -mod db; -mod falkor; -mod freshness; -mod git; -mod index; -mod models; -mod output; -mod progress; -mod project; -mod savings; -mod schema; -mod search; -mod secrets; -mod skill; - -use clap::{Parser, Subcommand}; +use clap::{ArgGroup, Parser, Subcommand}; +use gobby_code::{commands, config, freshness, output, setup}; #[derive(Parser)] #[command(name = "gcode", version, about = "Fast code index CLI for Gobby")] @@ -49,6 +33,51 @@ enum Command { // ── Project Setup ──────────────────────────────────────────────── /// Initialize project context (.gobby/gcode.json) Init, + /// Explicitly create gcode-owned standalone database objects + Setup { + /// Required opt-in for setup writes in v1 + #[arg(long, required = true)] + standalone: bool, + /// PostgreSQL database URL to set up + #[arg(long)] + database_url: Option, + /// Skip Docker service provisioning + #[arg(long)] + no_services: bool, + /// Drop/recreate gcode-owned code-index state and clear code-index projections + #[arg(long)] + overwrite_code_index: bool, + /// PostgreSQL schema namespace for gcode-owned objects + #[arg(long, default_value = "public")] + schema: String, + /// Embedding provider to store in gcore.yaml + #[arg(long)] + embedding_provider: Option, + /// OpenAI-compatible embedding API base URL + #[arg(long)] + embedding_api_base: Option, + /// Embedding model name + #[arg(long)] + embedding_model: Option, + /// Embedding vector dimension + #[arg(long)] + embedding_vector_dim: Option, + /// Environment variable 
name containing the embedding API key
+        #[arg(long)]
+        embedding_api_key_env: Option,
+        /// FalkorDB host to store in gcore.yaml
+        #[arg(long)]
+        falkordb_host: Option,
+        /// FalkorDB port to store in gcore.yaml
+        #[arg(long)]
+        falkordb_port: Option,
+        /// FalkorDB password for Docker provisioning or external config
+        #[arg(long)]
+        falkordb_password: Option,
+        /// Qdrant URL to store in gcore.yaml when services are not provisioned
+        #[arg(long)]
+        qdrant_url: Option,
+    },
     /// Index a directory (full or incremental). Writes symbols, files, and chunks to PostgreSQL hub
     Index {
         /// Path to index (default: project root)
@@ -62,6 +91,9 @@ enum Command {
         /// Fail C/C++ indexing when clangd or compile_commands.json semantics are unavailable
         #[arg(long)]
         require_cpp_semantics: bool,
+        /// Synchronously update graph and vector projections after PostgreSQL indexing
+        #[arg(long)]
+        sync_projections: bool,
     },
     /// Show project index status
     Status,
@@ -71,11 +103,16 @@ enum Command {
         #[arg(long)]
         force: bool,
     },
-    /// Manage code-index graph lifecycle through the Gobby daemon; read graph queries remain top-level [requires Gobby]
+    /// Manage and inspect the code-index graph projection [requires FalkorDB]
     Graph {
         #[command(subcommand)]
         command: GraphCommand,
     },
+    /// Manage the code-symbol vector projection [requires Qdrant and embeddings]
+    Vector {
+        #[command(subcommand)]
+        command: VectorCommand,
+    },
 
     // ── Search (works in all modes) ──────────────────────────────────
     /// Hybrid search: pg_search BM25 + optional semantic (Qdrant) + optional graph boost (FalkorDB)
@@ -204,9 +241,77 @@ enum Command {
 
 #[derive(Subcommand)]
 enum GraphCommand {
-    /// Clear the current project's code-index graph projection via the Gobby daemon [requires Gobby]
+    /// Sync one indexed file into the code-index graph projection
+    SyncFile {
+        /// Indexed file path to sync
+        #[arg(long)]
+        file: String,
+    },
+    /// Clear the current project's code-index graph projection
+    Clear {
+        /// Clear graph projection for this project id without resolving cwd project context
+        #[arg(long)]
+        project_id: Option,
+    },
+    /// Rebuild the current project's code-index graph projection from PostgreSQL facts
+    Rebuild,
+    /// Generate a project graph report
+    Report {
+        /// Number of top hotspot and target rows to include
+        #[arg(long, default_value = "10")]
+        top_n: usize,
+    },
+    /// Show an overview graph for the current project
+    Overview {
+        /// Maximum files to include
+        #[arg(long, default_value = "100")]
+        limit: usize,
+    },
+    /// Show graph nodes and links for one indexed file
+    File {
+        /// Indexed file path to inspect
+        #[arg(long)]
+        file: String,
+    },
+    /// Show graph neighbors for one symbol ID
+    Neighbors {
+        /// Symbol ID to inspect
+        #[arg(long)]
+        symbol_id: String,
+        #[arg(long, default_value = "100")]
+        limit: usize,
+    },
+    /// Show transitive graph impact for a symbol ID or file path
+    #[command(group(
+        ArgGroup::new("target")
+            .required(true)
+            .args(["symbol_id", "file"])
+    ))]
+    BlastRadius {
+        /// Symbol ID to inspect
+        #[arg(long)]
+        symbol_id: Option,
+        /// Indexed file path to inspect
+        #[arg(long)]
+        file: Option,
+        #[arg(long, default_value = "3")]
+        depth: usize,
+        #[arg(long, default_value = "100")]
+        limit: usize,
+    },
+}
+
+#[derive(Subcommand)]
+enum VectorCommand {
+    /// Sync one indexed file into the code-symbol vector projection
+    SyncFile {
+        /// Indexed file path to sync
+        #[arg(long)]
+        file: String,
+    },
+    /// Clear the current project's code-symbol vector projection
     Clear,
-    /// Rebuild the current project's code-index graph projection via the Gobby daemon [requires Gobby]
+    /// Rebuild the current project's code-symbol vector projection from PostgreSQL facts
     Rebuild,
 }
 
@@ -235,48 +340,200 @@ fn ensure_symbol_fresh(ctx: &config::Context, disabled: bool, id: &str) -> anyho
     Ok(())
 }
 
-fn main() -> anyhow::Result<()> {
-    let cli = Cli::parse();
-
-    // Commands that must run before Context::resolve() (work on uninitialized projects)
+fn dispatch_early_command(cli: &Cli, setup_runner: F) -> anyhow::Result
+where
+    F: FnOnce(setup::StandaloneSetupRequest, output::Format, bool) -> anyhow::Result<()>,
+{
     match &cli.command {
         Command::Init => {
             let root = match &cli.project {
                 Some(p) => std::path::PathBuf::from(p).canonicalize()?,
                 None => config::detect_project_root()?,
             };
-            return commands::init::run(&root, cli.format, cli.quiet);
+            commands::init::run(&root, cli.format, cli.quiet)?;
+            Ok(true)
+        }
+        Command::Setup {
+            standalone,
+            database_url,
+            no_services,
+            overwrite_code_index,
+            schema,
+            embedding_provider,
+            embedding_api_base,
+            embedding_model,
+            embedding_vector_dim,
+            embedding_api_key_env,
+            falkordb_host,
+            falkordb_port,
+            falkordb_password,
+            qdrant_url,
+        } => {
+            let mut request = setup::StandaloneSetupRequest::new(
+                *standalone,
+                database_url.clone(),
+                Some(schema.clone()),
+            );
+            request.no_services = *no_services;
+            request.overwrite_code_index = *overwrite_code_index;
+            request.embedding_provider = embedding_provider.clone();
+            request.embedding_api_base = embedding_api_base.clone();
+            request.embedding_model = embedding_model.clone();
+            request.embedding_vector_dim = *embedding_vector_dim;
+            request.embedding_api_key_env = embedding_api_key_env.clone();
+            request.falkordb_host = falkordb_host.clone();
+            request.falkordb_port = *falkordb_port;
+            request.falkordb_password = falkordb_password.clone();
+            request.qdrant_url = qdrant_url.clone();
+            setup_runner(request, cli.format, cli.quiet)?;
+            Ok(true)
         }
         Command::Projects => {
-            return commands::status::projects(cli.format);
+            commands::status::projects(cli.format)?;
+            Ok(true)
         }
         Command::Prune { force } => {
-            return commands::status::prune(*force);
+            commands::status::prune(*force)?;
+            Ok(true)
+        }
+        Command::Graph {
+            command:
+                GraphCommand::Clear {
+                    project_id: Some(project_id),
+                },
+        } => {
+            let ctx = config::Context::resolve_for_project_id(project_id, cli.quiet)?;
+            commands::graph::clear(&ctx, cli.format)?;
+            Ok(true)
         }
-        _ => {}
+        _ => Ok(false),
+    }
+}
+
+fn main() -> anyhow::Result<()> {
+    let cli = Cli::parse();
+
+    // Commands that must run before Context::resolve() (work on uninitialized projects)
+    if dispatch_early_command(&cli, commands::setup::run)? {
+        return Ok(());
     }
 
     let ctx = config::Context::resolve(cli.project.as_deref(), cli.quiet)?;
 
     match cli.command {
-        Command::Init | Command::Projects | Command::Prune { .. } => unreachable!(),
+        Command::Init | Command::Setup { .. } | Command::Projects | Command::Prune { .. } => {
+            unreachable!()
+        }
         Command::Index {
             path,
             files,
             full,
             require_cpp_semantics,
-        } => commands::index::run(&ctx, path, files, full, require_cpp_semantics),
+            sync_projections,
+        } => commands::index::run(
+            &ctx,
+            path,
+            files,
+            full,
+            require_cpp_semantics,
+            sync_projections,
+            cli.format,
+        ),
         Command::Status => {
             ensure_project_fresh(&ctx, cli.no_freshness)?;
             commands::status::run(&ctx, cli.format)
         }
         Command::Invalidate { force } => commands::status::invalidate(&ctx, force),
         Command::Graph {
-            command: GraphCommand::Clear,
+            command: GraphCommand::SyncFile { file },
+        } => {
+            ensure_files_fresh(
+                &ctx,
+                cli.no_freshness,
+                vec![std::path::PathBuf::from(&file)],
+            )?;
+            commands::graph::sync_file(&ctx, &file, cli.format)
+        }
+        Command::Graph {
+            command: GraphCommand::Clear { project_id: None },
         } => commands::graph::clear(&ctx, cli.format),
+        Command::Graph {
+            command: GraphCommand::Clear {
+                project_id: Some(_),
+            },
+        } => unreachable!(),
         Command::Graph {
             command: GraphCommand::Rebuild,
-        } => commands::graph::rebuild(&ctx, cli.format),
+        } => {
+            ensure_project_fresh(&ctx, cli.no_freshness)?;
+            commands::graph::rebuild(&ctx, cli.format)
+        }
+        Command::Graph {
+            command: GraphCommand::Report { top_n },
+        } => {
+            ensure_project_fresh(&ctx, cli.no_freshness)?;
+            commands::graph::report(&ctx, top_n, cli.format)
+        }
+        Command::Vector {
+            command: VectorCommand::SyncFile { file },
+        } => {
+            ensure_files_fresh(
+                &ctx,
+                cli.no_freshness,
+                vec![std::path::PathBuf::from(&file)],
+            )?;
+            commands::vector::sync_file(&ctx, &file, cli.format)
+        }
+        Command::Vector {
+            command: VectorCommand::Clear,
+        } => commands::vector::clear(&ctx, cli.format),
+        Command::Vector {
+            command: VectorCommand::Rebuild,
+        } => {
+            ensure_project_fresh(&ctx, cli.no_freshness)?;
+            commands::vector::rebuild(&ctx, cli.format)
+        }
+        Command::Graph {
+            command: GraphCommand::Overview { limit },
+        } => {
+            ensure_project_fresh(&ctx, cli.no_freshness)?;
+            commands::graph::overview(&ctx, limit, cli.format)
+        }
+        Command::Graph {
+            command: GraphCommand::File { file },
+        } => {
+            ensure_files_fresh(
+                &ctx,
+                cli.no_freshness,
+                vec![std::path::PathBuf::from(&file)],
+            )?;
+            commands::graph::file(&ctx, &file, cli.format)
+        }
+        Command::Graph {
+            command: GraphCommand::Neighbors { symbol_id, limit },
+        } => {
+            ensure_symbol_fresh(&ctx, cli.no_freshness, &symbol_id)?;
+            commands::graph::neighbors(&ctx, &symbol_id, limit, cli.format)
+        }
+        Command::Graph {
+            command:
+                GraphCommand::BlastRadius {
+                    symbol_id,
+                    file,
+                    depth,
+                    limit,
+                },
+        } => {
+            ensure_project_fresh(&ctx, cli.no_freshness)?;
+            commands::graph::graph_blast_radius(
+                &ctx,
+                symbol_id.as_deref(),
+                file.as_deref(),
+                depth,
+                limit,
+                cli.format,
+            )
+        }
 
         Command::Search {
             query,
@@ -425,27 +682,231 @@ mod tests {
     use clap::Parser;
 
     #[test]
-    fn test_parse_graph_clear() {
-        let cli = Cli::try_parse_from(["gcode", "graph", "clear"]).expect("graph clear parses");
+    fn parse_projection_lifecycle_commands() {
+        let cli = Cli::try_parse_from([
+            "gcode",
+            "--format",
+            "text",
+            "graph",
+            "sync-file",
+            "--file",
+            "src/lib.rs",
+        ])
+        .expect("graph sync-file parses");
+        assert!(matches!(cli.format, output::Format::Text));
+        match cli.command {
+            Command::Graph {
+                command: GraphCommand::SyncFile { file },
+            } => assert_eq!(file, "src/lib.rs"),
+            _ => panic!("expected graph sync-file command"),
+        }
+
+        let cli = Cli::try_parse_from([
+            "gcode",
+            "--format",
+            "text",
+            "vector",
+            "sync-file",
+            "--file",
+            "src/lib.rs",
+        ])
+        .expect("vector sync-file parses");
+        assert!(matches!(cli.format, output::Format::Text));
+        match cli.command {
+            Command::Vector {
+                command: VectorCommand::SyncFile { file },
+            } => assert_eq!(file, "src/lib.rs"),
+            _ => panic!("expected vector sync-file command"),
+        }
+
         let cli = Cli::try_parse_from(["gcode", "graph", "clear"]).expect("graph clear parses");
         assert!(matches!(
             cli.command,
             Command::Graph {
-                command: GraphCommand::Clear
+                command: GraphCommand::Clear { project_id: None }
             }
         ));
-    }
-    #[test]
-    fn test_parse_graph_rebuild() {
-        let cli = Cli::try_parse_from(["gcode", "graph", "rebuild"]).expect("graph rebuild parses");
+        let cli = Cli::try_parse_from(["gcode", "graph", "clear", "--project-id", "project-1"])
+            .expect("graph clear --project-id parses");
+        assert!(matches!(
+            cli.command,
+            Command::Graph {
+                command: GraphCommand::Clear {
+                    project_id: Some(project_id)
+                }
+            } if project_id == "project-1"
+        ));
 
         let cli = Cli::try_parse_from(["gcode", "graph", "rebuild"]).expect("graph rebuild parses");
         assert!(matches!(
             cli.command,
             Command::Graph {
                 command: GraphCommand::Rebuild
             }
         ));
+
+        let cli = Cli::try_parse_from(["gcode", "graph", "report"]).expect("graph report parses");
+        assert!(matches!(
+            cli.command,
+            Command::Graph {
+                command: GraphCommand::Report { top_n: 10 }
+            }
+        ));
+
+        let cli =
+            Cli::try_parse_from(["gcode", "graph", "overview"]).expect("graph overview parses");
+        assert!(matches!(
+            cli.command,
+            Command::Graph {
+                command: GraphCommand::Overview { limit: 100 }
+            }
+        ));
+
+        let cli = Cli::try_parse_from(["gcode", "graph", "overview", "--limit", "25"])
+            .expect("graph overview limit parses");
+        assert!(matches!(
+            cli.command,
+            Command::Graph {
+                command: GraphCommand::Overview { limit: 25 }
+            }
+        ));
+
+        let cli = Cli::try_parse_from(["gcode", "graph", "file", "--file", "src/main.rs"])
+            .expect("graph file parses");
+        match cli.command {
+            Command::Graph {
+                command: GraphCommand::File { file },
+            } => assert_eq!(file, "src/main.rs"),
+            _ => panic!("expected graph file command"),
+        }
+
+        let cli = Cli::try_parse_from([
+            "gcode",
+            "graph",
+            "neighbors",
+            "--symbol-id",
+            "sym-1",
+            "--limit",
+            "7",
+        ])
+        .expect("graph neighbors parses");
+        match cli.command {
+            Command::Graph {
+                command: GraphCommand::Neighbors { symbol_id, limit },
+            } => {
+                assert_eq!(symbol_id, "sym-1");
+                assert_eq!(limit, 7);
+            }
+            _ => panic!("expected graph neighbors command"),
+        }
+
+        let cli = Cli::try_parse_from([
+            "gcode",
+            "graph",
+            "blast-radius",
+            "--symbol-id",
+            "sym-1",
+            "--depth",
+            "2",
+            "--limit",
+            "9",
+        ])
+        .expect("graph blast-radius symbol parses");
+        match cli.command {
+            Command::Graph {
+                command:
+                    GraphCommand::BlastRadius {
+                        symbol_id,
+                        file,
+                        depth,
+                        limit,
+                    },
+            } => {
+                assert_eq!(symbol_id.as_deref(), Some("sym-1"));
+                assert_eq!(file, None);
+                assert_eq!(depth, 2);
+                assert_eq!(limit, 9);
+            }
+            _ => panic!("expected graph blast-radius command"),
+        }
+
+        let cli = Cli::try_parse_from([
+            "gcode",
+            "graph",
+            "blast-radius",
+            "--file",
+            "src/lib.rs",
+            "--depth",
+            "2",
+            "--limit",
+            "9",
+        ])
+        .expect("graph blast-radius file parses");
+        match cli.command {
+            Command::Graph {
+                command:
+                    GraphCommand::BlastRadius {
+                        symbol_id,
+                        file,
+                        depth,
+                        limit,
+                    },
+            } => {
+                assert_eq!(symbol_id, None);
+                assert_eq!(file.as_deref(), Some("src/lib.rs"));
+                assert_eq!(depth, 2);
+                assert_eq!(limit, 9);
+            }
+            _ => panic!("expected graph blast-radius command"),
+        }
+
+        let cli = Cli::try_parse_from(["gcode", "vector", "clear"]).expect("vector clear parses");
+        assert!(matches!(
+            cli.command,
+            Command::Vector {
+                command: VectorCommand::Clear
+            }
+        ));
+
+        let cli =
+            Cli::try_parse_from(["gcode", "vector", "rebuild"]).expect("vector rebuild parses");
+        assert!(matches!(
+            cli.command,
+            Command::Vector {
+                command: VectorCommand::Rebuild
+            }
+        ));
+
+        let cli =
+            Cli::try_parse_from(["gcode", "index", "--sync-projections"]).expect("index parses");
+        match cli.command {
+            Command::Index {
+                sync_projections, ..
+            } => assert!(sync_projections),
+            _ => panic!("expected index command"),
+        }
+    }
+
+    #[test]
+    fn parse_graph_report_global_format() {
+        let cli = Cli::try_parse_from([
+            "gcode", "graph", "report", "--top-n", "5", "--format", "text",
+        ])
+        .expect("graph report parses");
+        assert!(matches!(cli.format, output::Format::Text));
+        match cli.command {
+            Command::Graph {
+                command: GraphCommand::Report { top_n },
+            } => assert_eq!(top_n, 5),
+            _ => panic!("expected graph report command"),
+        }
+
+        let err = match Cli::try_parse_from(["gcode", "graph", "report", "--limit", "5"]) {
+            Ok(_) => panic!("report keeps minimal args"),
+            Err(err) => err,
+        };
+        assert_eq!(err.kind(), clap::error::ErrorKind::UnknownArgument);
     }
 
     #[test]
@@ -456,8 +917,12 @@ mod tests {
         match cli.command {
             Command::Index {
                 require_cpp_semantics,
+                sync_projections,
                 ..
-            } => assert!(require_cpp_semantics),
+            } => {
+                assert!(require_cpp_semantics);
+                assert!(!sync_projections);
+            }
             _ => panic!("expected index command"),
         }
     }
@@ -700,4 +1165,90 @@ mod tests {
         assert!(cli.no_freshness);
         assert!(matches!(cli.command, Command::Tree));
     }
+
+    #[test]
+    fn parse_setup_standalone() {
+        let cli = Cli::try_parse_from([
+            "gcode",
+            "setup",
+            "--standalone",
+            "--database-url",
+            "postgresql://localhost/gcode",
+            "--no-services",
+            "--overwrite-code-index",
+            "--embedding-provider",
+            "ollama",
+            "--embedding-vector-dim",
+            "768",
+            "--falkordb-password",
+            "secret-pass",
+        ])
+        .expect("setup parses");
+
+        match cli.command {
+            Command::Setup {
+                standalone,
+                database_url,
+                no_services,
+                overwrite_code_index,
+                schema,
+                embedding_provider,
+                embedding_vector_dim,
+                falkordb_password,
+                ..
+            } => {
+                assert!(standalone);
+                assert_eq!(
+                    database_url.as_deref(),
+                    Some("postgresql://localhost/gcode")
+                );
+                assert!(no_services);
+                assert!(overwrite_code_index);
+                assert_eq!(schema, "public");
+                assert_eq!(embedding_provider.as_deref(), Some("ollama"));
+                assert_eq!(embedding_vector_dim, Some(768));
+                assert_eq!(falkordb_password.as_deref(), Some("secret-pass"));
+            }
+            _ => panic!("expected setup command"),
+        }
+    }
+
+    #[test]
+    fn setup_runs_before_context_resolve() {
+        let project = tempfile::tempdir().expect("temp project");
+        let cli = Cli::try_parse_from([
+            "gcode",
+            "--project",
+            project.path().to_str().expect("utf8 temp path"),
+            "setup",
+            "--standalone",
+            "--database-url",
+            "postgresql://localhost/gcode",
+            "--overwrite-code-index",
+            "--embedding-api-base",
+            "https://embeddings.example/v1",
+        ])
+        .expect("setup parses");
+
+        let mut called = false;
+        let dispatched = dispatch_early_command(&cli, |request, _format, _quiet| {
+            called = true;
+            assert!(request.standalone);
+            assert_eq!(
+                request.database_url.as_deref(),
+                Some("postgresql://localhost/gcode")
+            );
+            assert_eq!(request.schema, "public");
+            assert!(request.overwrite_code_index);
+            assert_eq!(
+                request.embedding_api_base.as_deref(),
+                Some("https://embeddings.example/v1")
+            );
+            Ok(())
+        })
+        .expect("early dispatch succeeds without resolving project context");
+
+        assert!(dispatched);
+        assert!(called);
+    }
 }
diff --git a/crates/gcode/src/models.rs b/crates/gcode/src/models.rs
index 88f983a..0dd0cf9 100644
--- a/crates/gcode/src/models.rs
+++ b/crates/gcode/src/models.rs
@@ -9,6 +9,103 @@ pub const CODE_INDEX_UUID_NAMESPACE: Uuid = Uuid::from_bytes([
     0xc0, 0xde, 0x1d, 0xe0, 0x00, 0x00, 0x40, 0x00, 0x80, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
 ]);
 
+pub const SOURCE_SYSTEM_GCODE: &str = "gcode";
+
+/// Producer confidence classification for graph and vector projection facts.
+#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
+#[serde(rename_all = "SCREAMING_SNAKE_CASE")]
+pub enum ProjectionProvenance {
+    Extracted,
+    Inferred,
+    Ambiguous,
+}
+
+impl ProjectionProvenance {
+    pub fn from_wire_value(value: &str) -> Option {
+        match value {
+            "EXTRACTED" | "extracted" => Some(Self::Extracted),
+            "INFERRED" | "inferred" => Some(Self::Inferred),
+            "AMBIGUOUS" | "ambiguous" => Some(Self::Ambiguous),
+            _ => None,
+        }
+    }
+}
+
+/// Optional provenance attached to graph results and projection payloads.
+#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
+pub struct ProjectionMetadata {
+    pub provenance: ProjectionProvenance,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub confidence: Option,
+    pub source_system: String,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub source_file_path: Option,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub source_line: Option,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub source_symbol_id: Option,
+    #[serde(skip_serializing_if = "Option::is_none")]
+    pub matching_method: Option,
+}
+
+impl ProjectionMetadata {
+    pub fn new(provenance: ProjectionProvenance, source_system: impl Into) -> Self {
+        Self {
+            provenance,
+            confidence: None,
+            source_system: source_system.into(),
+            source_file_path: None,
+            source_line: None,
+            source_symbol_id: None,
+            matching_method: None,
+        }
+    }
+
+    pub fn gcode_extracted() -> Self {
+        Self::new(ProjectionProvenance::Extracted, SOURCE_SYSTEM_GCODE).with_confidence(Some(1.0))
+    }
+
+    pub fn inferred(source_system: impl Into, confidence: Option) -> Self {
+        Self::new(ProjectionProvenance::Inferred, source_system).with_confidence(confidence)
+    }
+
+    pub fn ambiguous(source_system: impl Into, confidence: Option) -> Self {
+        Self::new(ProjectionProvenance::Ambiguous, source_system).with_confidence(confidence)
+    }
+
+    pub fn with_confidence(mut self, confidence: Option) -> Self {
+        self.confidence = confidence;
+        self
+    }
+
+    pub fn with_source_file_path(mut self, file_path: impl Into) -> Self {
+        self.source_file_path = Some(file_path.into());
+        self
+    }
+
+    pub fn with_source_line(mut self, line: usize) -> Self {
+        self.source_line = Some(line);
+        self
+    }
+
+    pub fn with_source_symbol_id(mut self, symbol_id: impl Into) -> Self {
+        self.source_symbol_id = Some(symbol_id.into());
+        self
+    }
+
+    pub fn with_matching_method(mut self, matching_method: impl Into) -> Self {
+        self.matching_method = Some(matching_method.into());
+        self
+    }
+
+    pub fn is_hypothesis(&self) -> bool {
+        matches!(
+            self.provenance,
+            ProjectionProvenance::Inferred | ProjectionProvenance::Ambiguous
+        )
+    }
+}
+
 /// A code symbol extracted from AST parsing.
 #[derive(Debug, Clone, Serialize, Deserialize)]
 pub struct Symbol {
@@ -118,6 +215,21 @@ impl Symbol {
     }
 }
 
+pub fn make_unresolved_callee_id(project_id: &str, callee_name: &str) -> String {
+    let key = format!("unresolved:{project_id}:{callee_name}");
+    Uuid::new_v5(&CODE_INDEX_UUID_NAMESPACE, key.as_bytes()).to_string()
+}
+
+pub fn make_external_symbol_id(
+    project_id: &str,
+    callee_name: &str,
+    module: Option<&str>,
+) -> String {
+    let module_key = module.unwrap_or_default();
+    let key = format!("external:{project_id}:{module_key}:{callee_name}");
+    Uuid::new_v5(&CODE_INDEX_UUID_NAMESPACE, key.as_bytes()).to_string()
+}
+
 fn i64_to_usize(value: i64, column: &str) -> anyhow::Result {
     value
         .try_into()
@@ -284,6 +396,8 @@ pub struct GraphResult {
     pub relation: Option,
     #[serde(skip_serializing_if = "Option::is_none")]
     pub distance: Option,
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub metadata: Option,
 }
 
 /// Result of parsing a single file.
@@ -347,20 +461,27 @@ mod tests {
     use super::*;
 
     #[test]
-    fn test_uuid5_parity_with_python() {
-        // Python: Symbol.make_id("proj1", "src/main.py", "foo", "function", 42)
-        // Must produce the same UUID in Rust.
-        let id = Symbol::make_id("proj1", "src/main.py", "foo", "function", 42);
-        // The key is "proj1:src/main.py:foo:function:42"
-        // This is a deterministic UUID5 — verify it's stable across runs.
-        let id2 = Symbol::make_id("proj1", "src/main.py", "foo", "function", 42);
-        assert_eq!(id, id2);
-
-        // Verify the namespace UUID bytes match Python's c0de1de0-0000-4000-8000-000000000000
+    fn uuid5_python_parity() {
         assert_eq!(
             CODE_INDEX_UUID_NAMESPACE.to_string(),
             "c0de1de0-0000-4000-8000-000000000000"
         );
+        assert_eq!(
+            Symbol::make_id("proj1", "src/main.py", "foo", "function", 42),
+            "403e2117-92e7-5390-ad83-226629486481"
+        );
+        assert_eq!(
+            make_unresolved_callee_id("proj1", "missing_func"),
+            "42693df1-99e6-5daa-be29-3535096cd2b5"
+        );
+        assert_eq!(
+            make_external_symbol_id("proj1", "get", Some("requests")),
+            "7c7e6ebe-47c6-5a3d-a83d-d5160f10cb74"
+        );
+        assert_eq!(
+            make_external_symbol_id("proj1", "println", None),
+            "c6b97498-448e-5ef1-9cb5-ab1cf37b6596"
+        );
     }
     #[test]
     fn test_call_relation_promotes_symbol_targets() {
@@ -375,4 +496,21 @@ mod tests {
         assert_eq!(call.callee_symbol_id.as_deref(), Some("callee-id"));
         assert_eq!(call.callee_target_kind, CallTargetKind::Symbol);
     }
+
+    #[test]
+    fn graph_result_metadata_is_optional_for_json_compatibility() {
+        let old_json = serde_json::json!({
+            "id": "sym-1",
+            "name": "foo",
+            "file_path": "src/main.rs",
+            "line": 10
+        });
+
+        let parsed: GraphResult =
+            serde_json::from_value(old_json).expect("old graph result JSON still parses");
+        assert!(parsed.metadata.is_none());
+
+        let serialized = serde_json::to_value(&parsed).expect("graph result serializes");
+        assert!(serialized.get("metadata").is_none());
+    }
 }
diff --git a/crates/gcode/src/output.rs b/crates/gcode/src/output.rs
index 0d4c7c1..fbcc0db 100644
--- a/crates/gcode/src/output.rs
+++ b/crates/gcode/src/output.rs
@@ -18,3 +18,9 @@ pub fn print_json_compact(value: &T) -> anyhow::Result<()
     println!("{}", serde_json::to_string(value)?);
     Ok(())
 }
+
+/// Print a plain text command result to stdout.
+pub fn print_text(text: &str) -> anyhow::Result<()> {
+    println!("{text}");
+    Ok(())
+}
diff --git a/crates/gcode/src/projection/mod.rs b/crates/gcode/src/projection/mod.rs
new file mode 100644
index 0000000..d086d5b
--- /dev/null
+++ b/crates/gcode/src/projection/mod.rs
@@ -0,0 +1 @@
+pub mod sync;
diff --git a/crates/gcode/src/projection/sync.rs b/crates/gcode/src/projection/sync.rs
new file mode 100644
index 0000000..81bc90b
--- /dev/null
+++ b/crates/gcode/src/projection/sync.rs
@@ -0,0 +1,361 @@
+use crate::config::Context;
+use crate::db;
+use crate::graph::code_graph::{self, GraphReadError};
+use crate::vector::code_symbols::{self, CodeSymbolVectorLifecycle, VectorLifecycleError};
+use serde::{Deserialize, Serialize};
+
+#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
+#[serde(rename_all = "snake_case")]
+pub enum ProjectionTarget {
+    Graph,
+    Vectors,
+}
+
+#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
+pub struct ProjectionSyncRequest {
+    pub project_id: String,
+    pub file_paths: Vec,
+    pub targets: Vec,
+}
+
+#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
+pub struct ProjectionSyncStatus {
+    pub project_id: String,
+    pub file_paths: Vec,
+    pub graph_pending: bool,
+    pub vectors_pending: bool,
+}
+
+#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
+#[serde(rename_all = "snake_case")]
+pub enum ProjectionStatus {
+    Ok,
+    Degraded,
+    Failed,
+}
+
+#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
+pub struct ProjectionSyncError {
+    pub kind: String,
+    pub message: String,
+}
+
+#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
+pub struct ProjectionSyncReport {
+    pub status: ProjectionStatus,
+    pub synced_files: usize,
+    pub synced_symbols: usize,
+    pub degraded: bool,
+    pub error: Option,
+}
+
+impl ProjectionSyncReport {
+    pub fn ok(synced_files: usize, synced_symbols: usize) -> Self {
+        Self {
+            status: ProjectionStatus::Ok,
+            synced_files,
+            synced_symbols,
+            degraded: false,
+            error: None,
+        }
+    }
+
+    pub fn degraded(
+        kind: impl Into,
+        message: impl Into,
+        synced_files: usize,
+        synced_symbols: usize,
+    ) -> Self {
+        Self {
+            status: ProjectionStatus::Degraded,
+            synced_files,
+            synced_symbols,
+            degraded: true,
+            error: Some(ProjectionSyncError {
+                kind: kind.into(),
+                message: message.into(),
+            }),
+        }
+    }
+
+    fn degraded_from_error(
+        error: &anyhow::Error,
+        synced_files: usize,
+        synced_symbols: usize,
+    ) -> Self {
+        let typed = typed_projection_error(error);
+        Self {
+            status: ProjectionStatus::Degraded,
+            synced_files,
+            synced_symbols,
+            degraded: true,
+            error: Some(typed),
+        }
+    }
+}
+
+#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
+pub struct ProjectionSyncReports {
+    pub graph: ProjectionSyncReport,
+    pub vector: ProjectionSyncReport,
+}
+
+pub fn pending_after_code_fact_write(request: ProjectionSyncRequest) -> ProjectionSyncStatus {
+    ProjectionSyncStatus {
+        graph_pending: request.targets.contains(&ProjectionTarget::Graph),
+        vectors_pending: request.targets.contains(&ProjectionTarget::Vectors),
+        project_id: request.project_id,
+        file_paths: request.file_paths,
+    }
+}
+
+pub fn sync_after_index(
+    ctx: &Context,
+    file_paths: &[String],
+) -> anyhow::Result {
+    Ok(ProjectionSyncReports {
+        graph: sync_graph_files(ctx, file_paths)?,
+        vector: sync_vector_files(ctx, file_paths)?,
+    })
+}
+
+pub(crate) fn sync_files_with_state(
+    file_paths: &[String],
+    state: &mut S,
+    mut sync_one: impl FnMut(&mut S, &str) -> anyhow::Result,
+    mut mark_synced: impl FnMut(&mut S, &str) -> anyhow::Result<()>,
+) -> ProjectionSyncReport {
+    let mut synced_files = 0usize;
+    let mut synced_symbols = 0usize;
+
+    for file_path in file_paths {
+        let symbols = match sync_one(state, file_path)
+            .and_then(|symbols| mark_synced(state, file_path).map(|()| symbols))
+        {
+            Ok(symbols) => symbols,
+            Err(error) => {
+                return ProjectionSyncReport::degraded_from_error(
+                    &error,
+                    synced_files,
+                    synced_symbols,
+                );
+            }
+        };
+        synced_files += 1;
+        synced_symbols += symbols;
+    }
+
+    ProjectionSyncReport::ok(synced_files, synced_symbols)
+}
+
+fn sync_graph_files(ctx: &Context, file_paths: &[String]) -> anyhow::Result {
+    if file_paths.is_empty() {
+        return Ok(ProjectionSyncReport::ok(0, 0));
+    }
+    if let Err(error) = code_graph::require_graph_reads(ctx) {
+        return Ok(ProjectionSyncReport::degraded_from_error(&error, 0, 0));
+    }
+
+    let conn = db::connect_readwrite(&ctx.database_url)?;
+    let mut state = GraphProjectionState { ctx, conn };
+    Ok(sync_files_with_state(
+        file_paths,
+        &mut state,
+        GraphProjectionState::sync_file,
+        GraphProjectionState::mark_synced,
+    ))
+}
+
+fn sync_vector_files(ctx: &Context, file_paths: &[String]) -> anyhow::Result {
+    if file_paths.is_empty() {
+        return Ok(ProjectionSyncReport::ok(0, 0));
+    }
+
+    let lifecycle = match vector_lifecycle_from_context(ctx) {
+        Ok(lifecycle) => lifecycle,
+        Err(error) => {
+            return Ok(ProjectionSyncReport::degraded(
+                vector_error_kind(&error),
+                error.to_string(),
+                0,
+                0,
+            ));
+        }
+    };
+    let conn = db::connect_readwrite(&ctx.database_url)?;
+    let mut state = VectorProjectionState {
+        ctx,
+        conn,
+        lifecycle,
+    };
+    Ok(sync_files_with_state(
+        file_paths,
+        &mut state,
+        VectorProjectionState::sync_file,
+        VectorProjectionState::mark_synced,
+    ))
+}
+
+struct GraphProjectionState<'a> {
+    ctx: &'a Context,
+    conn: postgres::Client,
+}
+
+impl GraphProjectionState<'_> {
+    fn sync_file(&mut self, file_path: &str) -> anyhow::Result {
+        let facts = db::read_graph_file_facts(&mut self.conn, &self.ctx.project_id, file_path)?;
+        if !db::mark_graph_sync_attempted(&mut self.conn, &self.ctx.project_id, file_path)? {
+            anyhow::bail!(
+                "indexed file `{file_path}` was not found for project {}",
+                self.ctx.project_id
+            );
+        }
+        code_graph::sync_file_graph(
+            self.ctx,
+            &facts.file_path,
+            &facts.imports,
+            &facts.definitions,
+            &facts.calls,
+        )?;
+        Ok(facts.definitions.len())
+    }
+
+    fn mark_synced(&mut self, file_path: &str) -> anyhow::Result<()> {
+        if db::mark_graph_synced(&mut self.conn, &self.ctx.project_id, file_path)? {
+            Ok(())
+        } else {
+            anyhow::bail!(
+                "indexed file `{file_path}` was not found for project {}",
+                self.ctx.project_id
+            )
+        }
+    }
+}
+
+struct VectorProjectionState<'a> {
+    ctx: &'a Context,
+    conn: postgres::Client,
+    lifecycle: CodeSymbolVectorLifecycle,
+}
+
+impl VectorProjectionState<'_> {
+    fn sync_file(&mut self, file_path: &str) -> anyhow::Result {
+        if !db::indexed_file_exists(&mut self.conn, &self.ctx.project_id, file_path)? {
+            anyhow::bail!(
+                "indexed file `{file_path}` was not found for project {}",
+                self.ctx.project_id
+            );
+        }
+        let symbols =
+            code_symbols::fetch_symbols_for_file(&mut self.conn, &self.ctx.project_id, file_path)?;
+        let symbol_count = symbols.len();
+        self.lifecycle.sync_file_symbols(file_path, &symbols)?;
+        Ok(symbol_count)
+    }
+
+    fn mark_synced(&mut self, file_path: &str) -> anyhow::Result<()> {
+        if db::mark_vectors_synced(&mut self.conn, &self.ctx.project_id, file_path)? {
+            Ok(())
+        } else {
+            anyhow::bail!(
+                "indexed file `{file_path}` was not found for project {}",
+                self.ctx.project_id
+            )
+        }
+    }
+}
+
+fn vector_lifecycle_from_context(
+    ctx: &Context,
+) -> Result {
+    let qdrant = ctx
+        .qdrant
+        .clone()
+        .ok_or(VectorLifecycleError::MissingQdrantConfig)?;
+    let embedding = ctx
+        .embedding
+        .clone()
+        .ok_or(VectorLifecycleError::MissingEmbeddingConfig)?;
+    CodeSymbolVectorLifecycle::new(
+        ctx.project_id.clone(),
+        qdrant,
+        embedding,
+        ctx.code_vectors.clone(),
+    )
+}
+
+fn typed_projection_error(error: &anyhow::Error) -> ProjectionSyncError {
+    let kind = error
+        .downcast_ref::()
+        .map(vector_error_kind)
+        .or_else(|| error.downcast_ref::().map(graph_error_kind))
+        .unwrap_or("sync_failed");
+    ProjectionSyncError {
+        kind: kind.to_string(),
+        message: error.to_string(),
+    }
+}
+
+fn graph_error_kind(error: &GraphReadError) -> &'static str {
+    match error {
+        GraphReadError::NotConfigured => "missing_falkordb_config",
+        GraphReadError::Unreachable { .. } => "falkordb_unreachable",
+        GraphReadError::QueryFailed { .. } => "falkordb_query_failed",
+        GraphReadError::InvalidTarget { .. } => "invalid_graph_target",
+    }
+}
+
+fn vector_error_kind(error: &VectorLifecycleError) -> &'static str {
+    match error {
+        VectorLifecycleError::MissingQdrantConfig => "missing_qdrant_config",
+        VectorLifecycleError::MissingEmbeddingConfig => "missing_embedding_config",
+        VectorLifecycleError::EmbeddingHttp { .. } => "embedding_http",
+        VectorLifecycleError::EmbeddingResponse(_) => "embedding_response",
+        VectorLifecycleError::QdrantHttp { .. } => "qdrant_http",
+        VectorLifecycleError::QdrantOperation(_) => "qdrant_operation",
+        VectorLifecycleError::DimensionMismatch { .. } => "dimension_mismatch",
+    }
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn sync_state_tracks_projection_success() {
+        let files = vec!["src/ok.rs".to_string(), "src/fail.rs".to_string()];
+        #[derive(Default)]
+        struct State {
+            synced: Vec,
+            marked_synced: Vec,
+        }
+        let mut state = State::default();
+
+        let report = sync_files_with_state(
+            &files,
+            &mut state,
+            |state, file_path| {
+                state.synced.push(file_path.to_string());
+                if file_path == "src/fail.rs" {
+                    anyhow::bail!("projection write failed");
+                }
+                Ok(3)
+            },
+            |state, file_path| {
+                state.marked_synced.push(file_path.to_string());
+                Ok(())
+            },
+        );
+
+        assert_eq!(state.synced, vec!["src/ok.rs", "src/fail.rs"]);
+        assert_eq!(state.marked_synced, vec!["src/ok.rs"]);
+        assert_eq!(report.status, ProjectionStatus::Degraded);
+        assert_eq!(report.synced_files, 1);
+        assert_eq!(report.synced_symbols, 3);
+        assert!(report.degraded);
+        assert_eq!(
+            report.error.as_ref().map(|error| error.kind.as_str()),
+            Some("sync_failed")
+        );
+    }
+}
diff --git a/crates/gcode/src/schema.rs b/crates/gcode/src/schema.rs
index a1a1fe1..eab7997 100644
--- a/crates/gcode/src/schema.rs
+++ b/crates/gcode/src/schema.rs
@@ -12,7 +12,7 @@ const REQUIRED_TABLES: &[&str] = &[
 
 const REQUIRED_BM25_INDEXES: &[&str] = &["code_symbols_search_bm25", "code_content_search_bm25"];
 
-const MIGRATION_HINT: &str = "Configure the Gobby PostgreSQL hub with the required code-index schema, `pg_search` extension, and BM25 indexes.";
+const MIGRATION_HINT: &str = "Configure the Gobby PostgreSQL hub with the required code-index schema, `pg_search` extension, and BM25 indexes. For standalone databases, run `gcode setup --standalone --database-url `.";
 
 /// Validate that the Gobby-owned PostgreSQL hub schema exists.
 ///
@@ -92,4 +92,12 @@ mod tests {
             Client::connect(&database_url, NoTls).expect("connect test PostgreSQL hub");
         validate_runtime_schema(&mut client).expect("validate test PostgreSQL hub schema");
     }
+
+    #[test]
+    fn missing_schema_requires_setup() {
+        assert!(
+            MIGRATION_HINT.contains("gcode setup --standalone"),
+            "missing runtime schema guidance must point standalone users at explicit setup"
+        );
+    }
 }
diff --git a/crates/gcode/src/search/graph_boost.rs b/crates/gcode/src/search/graph_boost.rs
index 74be1a0..5e6965f 100644
--- a/crates/gcode/src/search/graph_boost.rs
+++ b/crates/gcode/src/search/graph_boost.rs
@@ -9,7 +9,7 @@ use std::collections::HashSet;
 
 use crate::config::Context;
 use crate::db;
-use crate::falkor;
+use crate::graph::code_graph;
 use crate::search::fts;
 
 /// Get symbol IDs related to query via the call/import graph.
@@ -30,8 +30,8 @@ pub fn graph_boost(ctx: &Context, query: &str) -> Vec {
         return vec![];
     };
 
-    let callers = falkor::find_callers(ctx, &symbol.id, 0, 10).unwrap_or_default();
-    let usages = falkor::find_usages(ctx, &symbol.id, 0, 10).unwrap_or_default();
+    let callers = code_graph::find_callers(ctx, &symbol.id, 0, 10).unwrap_or_default();
+    let usages = code_graph::find_usages(ctx, &symbol.id, 0, 10).unwrap_or_default();
 
     let mut ids = Vec::new();
     let mut seen = HashSet::new();
@@ -55,9 +55,9 @@ pub fn graph_expand(ctx: &Context, seed_ids: &[String]) -> Vec {
     }
 
     // Callees first — "what do these symbols call?" surfaces implementation details
-    let callees = falkor::find_callees_batch(ctx, seed_ids, 30).unwrap_or_default();
+    let callees = code_graph::find_callees_batch(ctx, seed_ids, 30).unwrap_or_default();
     // Callers second — "who calls these symbols?" surfaces broader context
-    let callers = falkor::find_callers_batch(ctx, seed_ids, 30).unwrap_or_default();
+    let callers = code_graph::find_callers_batch(ctx, seed_ids, 30).unwrap_or_default();
 
     let mut ids = Vec::new();
     let mut seen = HashSet::new();
@@ -83,6 +83,7 @@ mod tests {
             falkordb: None,
             qdrant: None,
             embedding: None,
+            code_vectors: crate::config::CodeVectorSettings::default(),
             daemon_url: None,
         }
     }
diff --git a/crates/gcode/src/search/rrf.rs b/crates/gcode/src/search/rrf.rs
index 96011d1..f7b62a5 100644
--- a/crates/gcode/src/search/rrf.rs
+++ b/crates/gcode/src/search/rrf.rs
@@ -3,16 +3,6 @@
 //! score(rank) = 1.0 / (K + rank) where K = 60.
 //! Ports logic from src/gobby/code_index/searcher.py.
 
-use std::collections::HashMap;
-
-/// RRF constant — matches Python RRF_K in code_index/searcher.py and memory/manager.py.
-const RRF_K: f64 = 60.0;
-
-/// Compute RRF score for a given rank (0-indexed).
-fn rrf_score(rank: usize) -> f64 {
-    1.0 / (RRF_K + rank as f64)
-}
-
 /// Merged result: (symbol_id, combined_score, source_names).
 pub type MergedResult = (String, f64, Vec);
 
@@ -23,51 +13,16 @@ pub type MergedResult = (String, f64, Vec);
 ///
 /// Returns `(id, score, sources)` sorted by score descending.
pub fn merge(sources: Vec<(&str, Vec<String>)>) -> Vec<MergedResult> { - let mut entries: HashMap<String, HashMap<String, usize>> = HashMap::new(); - - for (source_name, ids) in &sources { - for (rank, id) in ids.iter().enumerate() { - entries - .entry(id.clone()) - .or_default() - .insert(source_name.to_string(), rank); - } - } - - let mut results: Vec<MergedResult> = entries + gobby_core::search::rrf_merge(sources) .into_iter() - .map(|(id, source_ranks)| { - let score: f64 = source_ranks.values().map(|&rank| rrf_score(rank)).sum(); - let mut source_names: Vec<String> = source_ranks.into_keys().collect(); - source_names.sort(); - (id, score, source_names) - }) - .collect(); - - results.sort_by(|a, b| { - b.1.partial_cmp(&a.1) - .unwrap_or(std::cmp::Ordering::Equal) - .then_with(|| a.0.cmp(&b.0)) - }); - results + .map(|result| (result.id, result.score, result.sources)) + .collect() } #[cfg(test)] mod tests { use super::*; - #[test] - fn test_rrf_score_rank_zero() { - let score = rrf_score(0); - assert!((score - 1.0 / 60.0).abs() < 1e-10); - } - - #[test] - fn test_rrf_score_rank_ten() { - let score = rrf_score(10); - assert!((score - 1.0 / 70.0).abs() < 1e-10); - } - #[test] fn test_merge_single_source() { let results = merge(vec![("fts", vec!["a".into(), "b".into(), "c".into()])]); @@ -84,9 +39,9 @@ mod tests { ("fts", vec!["a".into(), "b".into()]), ("graph", vec!["a".into(), "c".into()]), ]); - // "a" appears in both sources at rank 0, so it gets 2 * rrf_score(0) + // "a" appears in both sources at rank 0, so it gets two rank-zero contributions. 
let a_result = results.iter().find(|r| r.0 == "a").unwrap(); - let expected = 2.0 * rrf_score(0); + let expected = 2.0 * (1.0 / 60.0); assert!((a_result.1 - expected).abs() < 1e-10); assert_eq!(a_result.2.len(), 2); // "a" should be ranked first @@ -130,4 +85,30 @@ mod tests { let results = merge(vec![("fts", vec![]), ("graph", vec![])]); assert!(results.is_empty()); } + + #[test] + fn merge_delegates_to_gobby_core_rrf() { + let sources = vec![ + ( + "fts", + vec!["a".to_string(), "a".to_string(), "b".to_string()], + ), + ("semantic", vec!["b".to_string()]), + ]; + let results = merge(sources.clone()); + let expected = gobby_core::search::rrf_merge(sources); + + assert_eq!(results.len(), expected.len()); + for (actual, expected) in results.iter().zip(expected.iter()) { + assert_eq!(actual.0, expected.id); + assert!((actual.1 - expected.score).abs() < 1e-10); + assert_eq!(actual.2, expected.sources); + } + + let source = include_str!("rrf.rs"); + let delegate = ["gobby_core", "::search::rrf_merge"].concat(); + let local_const = ["const ", "RRF_K"].concat(); + assert!(source.contains(&delegate)); + assert!(!source.contains(&local_const)); + } } diff --git a/crates/gcode/src/search/semantic.rs b/crates/gcode/src/search/semantic.rs index c32870e..4aba9e4 100644 --- a/crates/gcode/src/search/semantic.rs +++ b/crates/gcode/src/search/semantic.rs @@ -1,148 +1,50 @@ -//! Qdrant vector search + OpenAI-compatible embedding API. +//! Compatibility wrapper for Qdrant vector search. //! -//! Provides semantic search via Qdrant REST API. Query embeddings are generated -//! by calling an OpenAI-compatible `/v1/embeddings` endpoint (LM Studio, Ollama, -//! OpenAI, etc.) — the same API the Gobby daemon uses for index-time embeddings. -//! -//! Graceful degradation: -//! - No embedding API configured → semantic search disabled (BM25 + graph only) -//! - No Qdrant URL → semantic search disabled -//! 
- API call fails → semantic search disabled for that query - -use serde_json::Value; - -use crate::config::{Context, EmbeddingConfig, QdrantConfig}; - -// ── Query embedding (OpenAI-compatible HTTP API) ──────────────────── - -/// Embed a search query via OpenAI-compatible `/v1/embeddings` endpoint. -/// -/// Returns None if the API is unreachable or returns an error (graceful degradation). -pub fn embed_query(config: &EmbeddingConfig, text: &str) -> Option<Vec<f32>> { - let client = reqwest::blocking::Client::builder() - .timeout(std::time::Duration::from_secs(10)) - .build() - .ok()?; - - let body = serde_json::json!({ - "model": config.model, - "input": format!("search_query: {text}"), - }); - - let url = format!("{}/embeddings", config.api_base.trim_end_matches('/')); - let mut req = client.post(&url).json(&body); - - if let Some(key) = &config.api_key { - req = req.header("Authorization", format!("Bearer {key}")); - } - - let resp = req.send().ok()?; - if !resp.status().is_success() { - return None; - } - - let data: Value = resp.json().ok()?; - let embedding: Vec<f32> = data - .get("data")? - .as_array()? - .first()? - .get("embedding")? - .as_array()? - .iter() - .filter_map(|v| v.as_f64().map(|f| f as f32)) - .collect(); +//! Reusable vector projection behavior lives in `crate::vector::code_symbols`. - if embedding.is_empty() { - None - } else { - Some(embedding) - } -} +pub use crate::vector::code_symbols::{embed_query, vector_search}; -// ── Qdrant REST API (read-only) ───────────────────────────────────── - -/// Search Qdrant for similar vectors. Returns (point_id, score) pairs. 
-pub fn vector_search( - config: &QdrantConfig, - collection: &str, - query_vector: &[f32], - limit: usize, -) -> anyhow::Result<Vec<(String, f64)>> { - let url = match &config.url { - Some(u) => u, - None => return Ok(vec![]), - }; +use crate::config::{CODE_SYMBOL_COLLECTION_PREFIX, Context}; +use gobby_core::qdrant::{CollectionScope, SearchRequest}; - let client = reqwest::blocking::Client::builder() - .timeout(std::time::Duration::from_secs(10)) - .build()?; - - let body = serde_json::json!({ - "vector": query_vector, - "limit": limit, - "with_payload": false, - }); - - let mut req = client - .post(format!("{url}/collections/{collection}/points/search")) - .json(&body); - - if let Some(key) = &config.api_key { - req = req.header("api-key", key); - } - - let resp = req.send()?; - if !resp.status().is_success() { - return Ok(vec![]); - } - - let data: Value = resp.json()?; - let results = data - .get("result") - .and_then(|r| r.as_array()) - .map(|arr| { - arr.iter() - .filter_map(|hit| { - let id = hit.get("id")?.as_str()?.to_string(); - let score = hit.get("score")?.as_f64()?; - Some((id, score)) - }) - .collect() - }) - .unwrap_or_default(); - - Ok(results) -} - -// ── Composite functions ────────────────────────────────────────────── - -/// Run semantic search for a query. Returns (symbol_id, score) pairs. -/// -/// Returns empty if Qdrant or embedding API unavailable. 
pub fn semantic_search(ctx: &Context, query: &str, limit: usize) -> Vec<(String, f64)> { - let qdrant_config = match &ctx.qdrant { - Some(c) => c, - None => return vec![], + let Some(embedding_config) = ctx.embedding.as_ref() else { + return vec![]; }; - - let embedding_config = match &ctx.embedding { - Some(c) => c, - None => return vec![], + let Some(query_vector) = embed_query(embedding_config, query) else { + return vec![]; }; - let embedding = match embed_query(embedding_config, query) { - Some(e) => e, - None => return vec![], + let collection = gobby_core::qdrant::collection_name( + "gcode", + CollectionScope::Custom(&format!( + "{CODE_SYMBOL_COLLECTION_PREFIX}{}", + ctx.project_id + )), + ); + let request = SearchRequest { + vector: query_vector, + limit, + filter: None, }; - let collection = format!("{}{}", qdrant_config.collection_prefix, ctx.project_id); + let Ok((hits, _state)) = + gobby_core::qdrant::with_qdrant(ctx.qdrant.as_ref(), Vec::new(), |config| { + gobby_core::qdrant::search(config, &collection, request) + }) + else { + return vec![]; + }; - vector_search(qdrant_config, &collection, &embedding, limit).unwrap_or_default() + hits.into_iter() + .map(|hit| (hit.id, f64::from(hit.score))) + .collect() } #[cfg(test)] mod tests { use super::*; + use crate::config::QdrantConfig; use std::path::PathBuf; fn make_ctx_no_qdrant() -> Context { @@ -154,6 +56,7 @@ mod tests { falkordb: None, qdrant: None, embedding: None, + code_vectors: crate::config::CodeVectorSettings::default(), daemon_url: None, } } @@ -171,11 +74,9 @@ mod tests { qdrant: Some(QdrantConfig { url: Some("http://localhost:6333".to_string()), api_key: None, - collection_prefix: "code_symbols_".to_string(), }), ..make_ctx_no_qdrant() }; - // No embedding config → returns empty let result = semantic_search(&ctx, "test query", 10); assert!(result.is_empty()); } diff --git a/crates/gcode/src/setup.rs b/crates/gcode/src/setup.rs new file mode 100644 index 0000000..2cb844f --- /dev/null +++ 
b/crates/gcode/src/setup.rs @@ -0,0 +1,1121 @@ +use gobby_core::setup::{ + OwnedObject, SetupContext, SetupError, SetupReport, StandaloneSetup, StoreKind, +}; +use postgres::Client; +use serde::{Deserialize, Serialize}; +use std::collections::HashSet; + +const DEFAULT_SCHEMA: &str = "public"; +const NAMESPACE: &str = "gcode"; +const OVERWRITE_GUIDANCE: &str = "Rerun with `gcode setup --standalone --overwrite-code-index` to replace only gcode-owned code-index relations."; + +const CODE_INDEX_TABLES: &[&str] = &[ + "code_indexed_projects", + "code_indexed_files", + "code_symbols", + "code_content_chunks", + "code_imports", + "code_calls", +]; + +const CODE_INDEX_INDEXES: &[&str] = &[ + "idx_cif_project", + "idx_cif_graph_synced", + "idx_cif_vectors_synced", + "idx_cs_project", + "idx_cs_file", + "idx_cs_name", + "idx_cs_qualified", + "idx_cs_kind", + "idx_cs_parent", + "idx_ccc_project", + "idx_ccc_file", + "idx_ci_file", + "idx_cc_file", + "idx_cc_caller", + "idx_cc_target", + "code_symbols_search_bm25", + "code_content_search_bm25", +]; + +struct TableContract { + name: &'static str, + required_columns: &'static [&'static str], +} + +struct IndexContract { + name: &'static str, + table: &'static str, + method: &'static str, +} + +const TABLE_CONTRACTS: &[TableContract] = &[ + TableContract { + name: "code_indexed_projects", + required_columns: &[ + "id", + "root_path", + "total_files", + "total_symbols", + "last_indexed_at", + "index_duration_ms", + "created_at", + "updated_at", + ], + }, + TableContract { + name: "code_indexed_files", + required_columns: &[ + "id", + "project_id", + "file_path", + "language", + "content_hash", + "symbol_count", + "byte_size", + "graph_synced", + "vectors_synced", + "graph_sync_attempted_at", + "indexed_at", + ], + }, + TableContract { + name: "code_symbols", + required_columns: &[ + "id", + "project_id", + "file_path", + "name", + "qualified_name", + "kind", + "language", + "byte_start", + "byte_end", + "line_start", + "line_end", 
+ "signature", + "docstring", + "parent_symbol_id", + "content_hash", + "summary", + "created_at", + "updated_at", + ], + }, + TableContract { + name: "code_content_chunks", + required_columns: &[ + "id", + "project_id", + "file_path", + "chunk_index", + "line_start", + "line_end", + "content", + "language", + "created_at", + ], + }, + TableContract { + name: "code_imports", + required_columns: &["id", "project_id", "source_file", "target_module"], + }, + TableContract { + name: "code_calls", + required_columns: &[ + "id", + "project_id", + "caller_symbol_id", + "callee_symbol_id", + "callee_name", + "callee_target_kind", + "callee_external_module", + "file_path", + "line", + ], + }, +]; + +const INDEX_CONTRACTS: &[IndexContract] = &[ + IndexContract { + name: "idx_cif_project", + table: "code_indexed_files", + method: "btree", + }, + IndexContract { + name: "idx_cif_graph_synced", + table: "code_indexed_files", + method: "btree", + }, + IndexContract { + name: "idx_cif_vectors_synced", + table: "code_indexed_files", + method: "btree", + }, + IndexContract { + name: "idx_cs_project", + table: "code_symbols", + method: "btree", + }, + IndexContract { + name: "idx_cs_file", + table: "code_symbols", + method: "btree", + }, + IndexContract { + name: "idx_cs_name", + table: "code_symbols", + method: "btree", + }, + IndexContract { + name: "idx_cs_qualified", + table: "code_symbols", + method: "btree", + }, + IndexContract { + name: "idx_cs_kind", + table: "code_symbols", + method: "btree", + }, + IndexContract { + name: "idx_cs_parent", + table: "code_symbols", + method: "btree", + }, + IndexContract { + name: "idx_ccc_project", + table: "code_content_chunks", + method: "btree", + }, + IndexContract { + name: "idx_ccc_file", + table: "code_content_chunks", + method: "btree", + }, + IndexContract { + name: "idx_ci_file", + table: "code_imports", + method: "btree", + }, + IndexContract { + name: "idx_cc_file", + table: "code_calls", + method: "btree", + }, + IndexContract 
{ + name: "idx_cc_caller", + table: "code_calls", + method: "btree", + }, + IndexContract { + name: "idx_cc_target", + table: "code_calls", + method: "btree", + }, + IndexContract { + name: "code_symbols_search_bm25", + table: "code_symbols", + method: "bm25", + }, + IndexContract { + name: "code_content_search_bm25", + table: "code_content_chunks", + method: "bm25", + }, +]; + +#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)] +pub struct StandaloneSetupRequest { + pub standalone: bool, + pub database_url: Option, + pub no_services: bool, + pub overwrite_code_index: bool, + pub schema: String, + pub embedding_provider: Option, + pub embedding_api_base: Option, + pub embedding_model: Option, + pub embedding_vector_dim: Option, + pub embedding_api_key_env: Option, + pub falkordb_host: Option, + pub falkordb_port: Option, + pub falkordb_password: Option, + pub qdrant_url: Option, +} + +impl StandaloneSetupRequest { + pub fn new(standalone: bool, database_url: Option, schema: Option) -> Self { + Self { + standalone, + database_url, + no_services: false, + overwrite_code_index: false, + schema: schema.unwrap_or_else(|| DEFAULT_SCHEMA.to_string()), + embedding_provider: None, + embedding_api_base: None, + embedding_model: None, + embedding_vector_dim: None, + embedding_api_key_env: None, + falkordb_host: None, + falkordb_port: None, + falkordb_password: None, + qdrant_url: None, + } + } +} + +#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)] +pub struct StandaloneServicesStatus { + pub provisioned: bool, + pub compose_file: Option, + pub health_checks: Vec, +} + +#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)] +pub struct StandaloneEmbeddingStatus { + pub provider: String, + pub api_base: String, + pub model: String, + pub vector_dim: usize, + pub api_key_env: Option, +} + +#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)] +pub struct StandaloneSetupStatus { + pub namespace: String, + pub schema: String, + pub 
created: Vec, + pub skipped: Vec, + pub failed: Vec<(String, String)>, + pub config_file: Option, + pub services: Option, + pub embedding: Option, +} + +#[derive(Debug, Clone, PartialEq, Eq)] +pub struct GcodeStandaloneSetup { + schema: String, +} + +impl GcodeStandaloneSetup { + pub fn new(schema: impl Into) -> Self { + Self { + schema: schema.into(), + } + } + + pub fn schema(&self) -> &str { + &self.schema + } + + fn object(&self, name: &str, sql: String) -> OwnedObject { + let object_name = name.to_string(); + OwnedObject { + name: object_name.clone(), + store: StoreKind::Postgres, + creator: Box::new(move |ctx| execute_postgres_ddl(ctx, &object_name, &sql)), + } + } + + fn qualified(&self, relation: &str) -> Result { + Ok(format!( + "{}.{}", + quote_identifier(&self.schema, "schema")?, + quote_identifier(relation, "relation")? + )) + } +} + +impl StandaloneSetup for GcodeStandaloneSetup { + fn namespace(&self) -> &str { + NAMESPACE + } + + fn owned_objects(&self) -> Vec { + let code_indexed_projects = match self.qualified("code_indexed_projects") { + Ok(name) => name, + Err(err) => return vec![invalid_object("code_indexed_projects table", err)], + }; + let code_indexed_files = match self.qualified("code_indexed_files") { + Ok(name) => name, + Err(err) => return vec![invalid_object("code_indexed_files table", err)], + }; + let code_symbols = match self.qualified("code_symbols") { + Ok(name) => name, + Err(err) => return vec![invalid_object("code_symbols table", err)], + }; + let code_content_chunks = match self.qualified("code_content_chunks") { + Ok(name) => name, + Err(err) => return vec![invalid_object("code_content_chunks table", err)], + }; + let code_imports = match self.qualified("code_imports") { + Ok(name) => name, + Err(err) => return vec![invalid_object("code_imports table", err)], + }; + let code_calls = match self.qualified("code_calls") { + Ok(name) => name, + Err(err) => return vec![invalid_object("code_calls table", err)], + }; + + vec![ + 
self.object( + "pg_search extension", + "CREATE EXTENSION IF NOT EXISTS pg_search;".to_string(), + ), + self.object( + "code_indexed_projects table", + format!( + "CREATE TABLE IF NOT EXISTS {code_indexed_projects} ( + id TEXT PRIMARY KEY, + root_path TEXT NOT NULL, + total_files INTEGER NOT NULL DEFAULT 0, + total_symbols INTEGER NOT NULL DEFAULT 0, + last_indexed_at TIMESTAMPTZ, + index_duration_ms INTEGER, + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW() + );" + ), + ), + self.object( + "code_indexed_files table", + format!( + "CREATE TABLE IF NOT EXISTS {code_indexed_files} ( + id TEXT PRIMARY KEY, + project_id TEXT NOT NULL, + file_path TEXT NOT NULL, + language TEXT NOT NULL, + content_hash TEXT NOT NULL, + symbol_count INTEGER NOT NULL DEFAULT 0, + byte_size INTEGER NOT NULL DEFAULT 0, + graph_synced BOOLEAN NOT NULL DEFAULT FALSE, + vectors_synced BOOLEAN NOT NULL DEFAULT FALSE, + graph_sync_attempted_at TIMESTAMPTZ, + indexed_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + UNIQUE (project_id, file_path) + );" + ), + ), + self.object( + "idx_cif_project index", + format!( + "CREATE INDEX IF NOT EXISTS idx_cif_project + ON {code_indexed_files}(project_id);" + ), + ), + self.object( + "idx_cif_graph_synced index", + format!( + "CREATE INDEX IF NOT EXISTS idx_cif_graph_synced + ON {code_indexed_files}(project_id, graph_synced);" + ), + ), + self.object( + "idx_cif_vectors_synced index", + format!( + "CREATE INDEX IF NOT EXISTS idx_cif_vectors_synced + ON {code_indexed_files}(project_id, vectors_synced);" + ), + ), + self.object( + "code_symbols table", + format!( + "CREATE TABLE IF NOT EXISTS {code_symbols} ( + id TEXT PRIMARY KEY, + project_id TEXT NOT NULL, + file_path TEXT NOT NULL, + name TEXT NOT NULL, + qualified_name TEXT NOT NULL, + kind TEXT NOT NULL, + language TEXT NOT NULL, + byte_start INTEGER NOT NULL, + byte_end INTEGER NOT NULL, + line_start INTEGER NOT NULL, + line_end INTEGER NOT NULL, + signature 
TEXT, + docstring TEXT, + parent_symbol_id TEXT, + content_hash TEXT NOT NULL, + summary TEXT, + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW() + );" + ), + ), + self.object( + "idx_cs_project index", + format!("CREATE INDEX IF NOT EXISTS idx_cs_project ON {code_symbols}(project_id);"), + ), + self.object( + "idx_cs_file index", + format!( + "CREATE INDEX IF NOT EXISTS idx_cs_file + ON {code_symbols}(project_id, file_path);" + ), + ), + self.object( + "idx_cs_name index", + format!("CREATE INDEX IF NOT EXISTS idx_cs_name ON {code_symbols}(name);"), + ), + self.object( + "idx_cs_qualified index", + format!( + "CREATE INDEX IF NOT EXISTS idx_cs_qualified + ON {code_symbols}(qualified_name);" + ), + ), + self.object( + "idx_cs_kind index", + format!("CREATE INDEX IF NOT EXISTS idx_cs_kind ON {code_symbols}(kind);"), + ), + self.object( + "idx_cs_parent index", + format!( + "CREATE INDEX IF NOT EXISTS idx_cs_parent + ON {code_symbols}(parent_symbol_id);" + ), + ), + self.object( + "code_content_chunks table", + format!( + "CREATE TABLE IF NOT EXISTS {code_content_chunks} ( + id TEXT PRIMARY KEY, + project_id TEXT NOT NULL, + file_path TEXT NOT NULL, + chunk_index INTEGER NOT NULL, + line_start INTEGER NOT NULL, + line_end INTEGER NOT NULL, + content TEXT NOT NULL, + language TEXT, + created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(), + UNIQUE (project_id, file_path, chunk_index) + );" + ), + ), + self.object( + "idx_ccc_project index", + format!( + "CREATE INDEX IF NOT EXISTS idx_ccc_project + ON {code_content_chunks}(project_id);" + ), + ), + self.object( + "idx_ccc_file index", + format!( + "CREATE INDEX IF NOT EXISTS idx_ccc_file + ON {code_content_chunks}(project_id, file_path);" + ), + ), + self.object( + "code_imports table", + format!( + "CREATE TABLE IF NOT EXISTS {code_imports} ( + id INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY, + project_id TEXT NOT NULL, + source_file TEXT NOT NULL, + target_module TEXT NOT 
NULL, + UNIQUE (project_id, source_file, target_module) + );" + ), + ), + self.object( + "idx_ci_file index", + format!( + "CREATE INDEX IF NOT EXISTS idx_ci_file + ON {code_imports}(project_id, source_file);" + ), + ), + self.object( + "code_calls table", + format!( + "CREATE TABLE IF NOT EXISTS {code_calls} ( + id INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY, + project_id TEXT NOT NULL, + caller_symbol_id TEXT NOT NULL, + callee_symbol_id TEXT NOT NULL DEFAULT '', + callee_name TEXT NOT NULL, + callee_target_kind TEXT NOT NULL DEFAULT 'unresolved', + callee_external_module TEXT NOT NULL DEFAULT '', + file_path TEXT NOT NULL, + line INTEGER NOT NULL DEFAULT 0, + UNIQUE ( + project_id, caller_symbol_id, callee_symbol_id, callee_name, + callee_target_kind, callee_external_module, file_path, line + ) + );" + ), + ), + self.object( + "idx_cc_file index", + format!( + "CREATE INDEX IF NOT EXISTS idx_cc_file + ON {code_calls}(project_id, file_path);" + ), + ), + self.object( + "idx_cc_caller index", + format!( + "CREATE INDEX IF NOT EXISTS idx_cc_caller + ON {code_calls}(project_id, caller_symbol_id);" + ), + ), + self.object( + "idx_cc_target index", + format!( + "CREATE INDEX IF NOT EXISTS idx_cc_target + ON {code_calls}(project_id, callee_target_kind, callee_symbol_id, callee_name);" + ), + ), + self.object( + "code_symbols_search_bm25 index", + format!( + "CREATE INDEX IF NOT EXISTS code_symbols_search_bm25 + ON {code_symbols} + USING bm25 (id, name, qualified_name, signature, docstring, summary) + WITH (key_field = 'id');" + ), + ), + self.object( + "code_content_search_bm25 index", + format!( + "CREATE INDEX IF NOT EXISTS code_content_search_bm25 + ON {code_content_chunks} + USING bm25 (id, content) + WITH (key_field = 'id');" + ), + ), + ] + } + + fn create(&self, ctx: &mut SetupContext<'_>) -> Result { + let mut report = SetupReport::default(); + for mut object in self.owned_objects() { + match (object.creator)(ctx) { + Ok(()) => 
report.created.push(object.name), + Err(err) => { + report.failed.push((object.name, err.to_string())); + return Err(err); + } + } + } + Ok(report) + } +} + +pub fn run_standalone_setup( + request: &StandaloneSetupRequest, + client: &mut Client, +) -> Result { + validate_standalone_request(request)?; + + let setup = GcodeStandaloneSetup::new(request.schema.clone()); + if request.overwrite_code_index { + reset_postgres_code_index(client, setup.schema())?; + } else { + ensure_postgres_code_index_compatible(client, setup.schema())?; + } + + let mut ctx = SetupContext { + pg: Some(client), + falkor_config: None, + qdrant_config: None, + non_interactive: true, + }; + let report = setup.create(&mut ctx)?; + + Ok(StandaloneSetupStatus { + namespace: setup.namespace().to_string(), + schema: setup.schema().to_string(), + created: report.created, + skipped: report.skipped, + failed: report.failed, + config_file: None, + services: None, + embedding: None, + }) +} + +pub(crate) fn ensure_postgres_code_index_compatible( + client: &mut Client, + schema: &str, +) -> Result<(), SetupError> { + let issues = incompatible_postgres_code_index_relations(client, schema)?; + if issues.is_empty() { + return Ok(()); + } + + Err(SetupError::CreationFailed { + object: "code-index preflight".to_string(), + message: format!( + "existing code-index PostgreSQL state is incompatible: {}. 
{OVERWRITE_GUIDANCE}", + issues.join("; ") + ), + }) +} + +pub(crate) fn reset_postgres_code_index( + client: &mut Client, + schema: &str, +) -> Result<(), SetupError> { + let sql = postgres_overwrite_reset_sql(schema)?; + client + .batch_execute(&sql) + .map_err(|err| SetupError::CreationFailed { + object: "code-index overwrite reset".to_string(), + message: err.to_string(), + }) +} + +pub(crate) fn postgres_overwrite_reset_sql(schema: &str) -> Result { + let mut statements = Vec::new(); + for index in CODE_INDEX_INDEXES { + statements.push(format!( + "DROP INDEX IF EXISTS {};", + qualified_relation(schema, index, "index")? + )); + } + for table in CODE_INDEX_TABLES.iter().rev() { + statements.push(format!( + "DROP TABLE IF EXISTS {};", + qualified_relation(schema, table, "table")? + )); + } + Ok(statements.join("\n")) +} + +fn incompatible_postgres_code_index_relations( + client: &mut Client, + schema: &str, +) -> Result, SetupError> { + let mut issues = Vec::new(); + for contract in TABLE_CONTRACTS { + inspect_table_contract(client, schema, contract, &mut issues)?; + } + for contract in INDEX_CONTRACTS { + inspect_index_contract(client, schema, contract, &mut issues)?; + } + Ok(issues) +} + +fn inspect_table_contract( + client: &mut Client, + schema: &str, + contract: &TableContract, + issues: &mut Vec, +) -> Result<(), SetupError> { + let Some(kind) = relation_kind(client, schema, contract.name)? 
else { + return Ok(()); + }; + if kind != "r" { + issues.push(format!( + "{} exists but is not an ordinary table", + contract.name + )); + return Ok(()); + } + + let existing = table_columns(client, schema, contract.name)?; + let missing = contract + .required_columns + .iter() + .filter(|column| !existing.contains::(column)) + .copied() + .collect::>(); + if !missing.is_empty() { + issues.push(format!( + "{} is missing column(s): {}", + contract.name, + missing.join(", ") + )); + } + Ok(()) +} + +fn inspect_index_contract( + client: &mut Client, + schema: &str, + contract: &IndexContract, + issues: &mut Vec, +) -> Result<(), SetupError> { + let Some(index) = index_info(client, schema, contract.name)? else { + return Ok(()); + }; + + if index.relkind != "i" && index.relkind != "I" { + issues.push(format!("{} exists but is not an index", contract.name)); + return Ok(()); + } + if index.table_name.as_deref() != Some(contract.table) { + issues.push(format!( + "{} is attached to {}, expected {}", + contract.name, + index.table_name.as_deref().unwrap_or(""), + contract.table + )); + } + if index.method.as_deref() != Some(contract.method) { + issues.push(format!( + "{} uses access method {}, expected {}", + contract.name, + index.method.as_deref().unwrap_or(""), + contract.method + )); + } + Ok(()) +} + +fn relation_kind( + client: &mut Client, + schema: &str, + relation: &str, +) -> Result, SetupError> { + let row = client + .query_opt( + "SELECT c.relkind::TEXT + FROM pg_class c + JOIN pg_namespace n ON n.oid = c.relnamespace + WHERE n.nspname = $1 AND c.relname = $2", + &[&schema, &relation], + ) + .map_err(|err| SetupError::CreationFailed { + object: format!("{relation} preflight"), + message: err.to_string(), + })?; + Ok(row.map(|row| row.get(0))) +} + +fn table_columns( + client: &mut Client, + schema: &str, + table: &str, +) -> Result, SetupError> { + let rows = client + .query( + "SELECT a.attname + FROM pg_attribute a + JOIN pg_class c ON c.oid = a.attrelid + 
JOIN pg_namespace n ON n.oid = c.relnamespace + WHERE n.nspname = $1 + AND c.relname = $2 + AND a.attnum > 0 + AND NOT a.attisdropped", + &[&schema, &table], + ) + .map_err(|err| SetupError::CreationFailed { + object: format!("{table} preflight"), + message: err.to_string(), + })?; + Ok(rows.into_iter().map(|row| row.get(0)).collect()) +} + +struct ExistingIndexInfo { + relkind: String, + table_name: Option, + method: Option, +} + +fn index_info( + client: &mut Client, + schema: &str, + index: &str, +) -> Result, SetupError> { + let row = client + .query_opt( + "SELECT c.relkind::TEXT, + table_class.relname::TEXT AS table_name, + am.amname::TEXT AS method + FROM pg_class c + JOIN pg_namespace n ON n.oid = c.relnamespace + LEFT JOIN pg_index idx ON idx.indexrelid = c.oid + LEFT JOIN pg_class table_class ON table_class.oid = idx.indrelid + LEFT JOIN pg_am am ON am.oid = c.relam + WHERE n.nspname = $1 AND c.relname = $2", + &[&schema, &index], + ) + .map_err(|err| SetupError::CreationFailed { + object: format!("{index} preflight"), + message: err.to_string(), + })?; + + Ok(row.map(|row| ExistingIndexInfo { + relkind: row.get(0), + table_name: row.get(1), + method: row.get(2), + })) +} + +pub fn validate_standalone_request(request: &StandaloneSetupRequest) -> Result<(), SetupError> { + if !request.standalone { + return Err(SetupError::AttachedModeRefused); + } + if request.schema != DEFAULT_SCHEMA { + return Err(SetupError::CreationFailed { + object: "schema".to_string(), + message: "standalone code-index schema must be `public` for daemon adoption" + .to_string(), + }); + } + Ok(()) +} + +fn qualified_relation(schema: &str, relation: &str, label: &str) -> Result { + Ok(format!( + "{}.{}", + quote_identifier(schema, "schema")?, + quote_identifier(relation, label)? 
+ )) +} + +fn execute_postgres_ddl( + ctx: &mut SetupContext<'_>, + object: &str, + sql: &str, +) -> Result<(), SetupError> { + let Some(pg) = ctx.pg.as_deref_mut() else { + return Err(SetupError::ConnectionFailed { + store: "postgres".to_string(), + message: "PostgreSQL connection was not supplied to setup context".to_string(), + }); + }; + + pg.batch_execute(sql) + .map_err(|err| SetupError::CreationFailed { + object: object.to_string(), + message: err.to_string(), + }) +} + +fn invalid_object(name: &str, err: SetupError) -> OwnedObject { + let message = err.to_string(); + let object_name = name.to_string(); + OwnedObject { + name: object_name.clone(), + store: StoreKind::Postgres, + creator: Box::new(move |_| { + Err(SetupError::CreationFailed { + object: object_name.clone(), + message: message.clone(), + }) + }), + } +} + +fn quote_identifier(value: &str, label: &str) -> Result { + let trimmed = value.trim(); + if trimmed.is_empty() { + return Err(SetupError::CreationFailed { + object: label.to_string(), + message: format!("{label} identifier must not be empty"), + }); + } + if trimmed.contains('\0') { + return Err(SetupError::CreationFailed { + object: label.to_string(), + message: format!("{label} identifier must not contain NUL bytes"), + }); + } + Ok(format!("\"{}\"", trimmed.replace('"', "\"\""))) +} + +#[cfg(test)] +mod tests { + use super::*; + use gobby_core::setup::{StandaloneSetup, StoreKind}; + use postgres::NoTls; + + #[test] + fn standalone_setup_declares_public_daemon_code_index_subset() { + let setup = GcodeStandaloneSetup::new("public"); + assert_eq!(setup.namespace(), "gcode"); + assert_eq!(setup.schema(), "public"); + + let object_names: Vec = setup + .owned_objects() + .into_iter() + .map(|object| object.name) + .collect(); + + assert!( + object_names + .iter() + .any(|name| name.contains("indexed_files")) + ); + assert!(object_names.iter().any(|name| name.contains("symbols"))); + assert!( + object_names + .iter() + .any(|name| 
name.contains("content_chunks")) + ); + assert!(object_names.iter().any(|name| name.contains("idx_cif"))); + assert!(object_names.iter().any(|name| name.contains("bm25"))); + + let forbidden = [ + "config_store", + "schema_migrations", + "secrets", + ".gobby/project.json", + "project_json", + "code_graph_sync_state", + "code_vector_sync_state", + ]; + for name in object_names { + for forbidden_name in forbidden { + assert!( + !name.contains(forbidden_name), + "standalone setup declared forbidden object {name}" + ); + } + } + } + + #[test] + fn standalone_setup_uses_gobby_core_contract() { + fn assert_standalone_setup() {} + assert_standalone_setup::(); + + let setup = GcodeStandaloneSetup::new("public"); + let objects = setup.owned_objects(); + assert!( + objects + .iter() + .all(|object| object.store == StoreKind::Postgres) + ); + assert!( + objects + .iter() + .any(|object| object.name == "code_symbols table") + ); + assert!( + objects + .iter() + .any(|object| object.name == "code_symbols_search_bm25 index") + ); + assert!( + objects + .iter() + .any(|object| object.name == "pg_search extension") + ); + } + + #[test] + fn standalone_setup_rejects_non_public_schema() { + let request = StandaloneSetupRequest::new( + true, + Some("postgresql://localhost/gcode".to_string()), + Some("gcode_ci".to_string()), + ); + let err = validate_standalone_request(&request).expect_err("non-public schema fails"); + assert!(err.to_string().contains("public")); + } + + #[test] + fn overwrite_reset_sql_is_allowlisted() { + let sql = postgres_overwrite_reset_sql("public").expect("reset SQL"); + + for table in CODE_INDEX_TABLES { + assert!( + sql.contains(&format!("DROP TABLE IF EXISTS \"public\".\"{table}\";")), + "{sql}" + ); + } + for index in CODE_INDEX_INDEXES { + assert!( + sql.contains(&format!("DROP INDEX IF EXISTS \"public\".\"{index}\";")), + "{sql}" + ); + } + + for forbidden in [ + "config_store", + "schema_migrations", + "secrets", + "tasks", + "sessions", + "memory", + 
".gobby/project.json", + ] { + assert!(!sql.contains(forbidden), "{sql}"); + } + assert!(!sql.contains("CASCADE"), "{sql}"); + assert!(!sql.contains("DROP DATABASE"), "{sql}"); + assert!(!sql.contains("DROP SCHEMA"), "{sql}"); + } + + #[test] + fn overwrite_guidance_names_flag() { + let request = StandaloneSetupRequest::new(true, None, None); + assert!(!request.overwrite_code_index); + assert!(OVERWRITE_GUIDANCE.contains("--overwrite-code-index")); + } + + #[test] + #[serial_test::serial] + fn overwrite_recreates_incompatible_code_index_and_preserves_sentinel_table() { + let Ok(database_url) = std::env::var("GCODE_POSTGRES_TEST_DATABASE_URL") else { + return; + }; + let mut client = + Client::connect(&database_url, NoTls).expect("connect test PostgreSQL hub"); + cleanup_code_index_relations(&mut client); + client + .batch_execute( + "CREATE TABLE public.code_symbols (id TEXT PRIMARY KEY); + CREATE TABLE IF NOT EXISTS public.gobby_owned_sentinel ( + key TEXT PRIMARY KEY, + value TEXT NOT NULL + ); + INSERT INTO public.gobby_owned_sentinel (key, value) + VALUES ('gcode-overwrite-sentinel', 'keep-me') + ON CONFLICT (key) DO UPDATE SET value = EXCLUDED.value;", + ) + .expect("seed incompatible code index and sentinel"); + + let request = StandaloneSetupRequest::new(true, Some(database_url.clone()), None); + let err = run_standalone_setup(&request, &mut client) + .expect_err("incompatible setup fails without overwrite"); + assert!(err.to_string().contains("--overwrite-code-index")); + + let mut overwrite = StandaloneSetupRequest::new(true, Some(database_url), None); + overwrite.overwrite_code_index = true; + run_standalone_setup(&overwrite, &mut client).expect("overwrite setup succeeds"); + + let has_project_id: bool = client + .query_one( + "SELECT EXISTS( + SELECT 1 + FROM pg_attribute + WHERE attrelid = 'public.code_symbols'::regclass + AND attname = 'project_id' + AND attnum > 0 + AND NOT attisdropped + )", + &[], + ) + .expect("check recreated code_symbols") + 
.get(0); + assert!(has_project_id); + + let sentinel: String = client + .query_one( + "SELECT value FROM public.gobby_owned_sentinel WHERE key = 'gcode-overwrite-sentinel'", + &[], + ) + .expect("read sentinel") + .get(0); + assert_eq!(sentinel, "keep-me"); + + cleanup_code_index_relations(&mut client); + client + .batch_execute( + "DELETE FROM public.gobby_owned_sentinel WHERE key = 'gcode-overwrite-sentinel'; + DROP TABLE IF EXISTS public.gobby_owned_sentinel;", + ) + .expect("cleanup sentinel"); + } + + fn cleanup_code_index_relations(client: &mut Client) { + let sql = postgres_overwrite_reset_sql("public").expect("reset SQL"); + client + .batch_execute(&sql) + .expect("cleanup code index objects"); + } +} diff --git a/crates/gcode/src/utils.rs b/crates/gcode/src/utils.rs new file mode 100644 index 0000000..4d24c40 --- /dev/null +++ b/crates/gcode/src/utils.rs @@ -0,0 +1,23 @@ +pub fn short_id(id: &str) -> &str { + id.get(..8).unwrap_or(id) +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn short_id_truncates_long_ids() { + assert_eq!(short_id("1234567890"), "12345678"); + } + + #[test] + fn short_id_returns_input_for_short_strings() { + assert_eq!(short_id("abc"), "abc"); + } + + #[test] + fn short_id_returns_input_for_exact_length() { + assert_eq!(short_id("12345678"), "12345678"); + } +} diff --git a/crates/gcode/src/vector/code_symbols.rs b/crates/gcode/src/vector/code_symbols.rs new file mode 100644 index 0000000..5512c7f --- /dev/null +++ b/crates/gcode/src/vector/code_symbols.rs @@ -0,0 +1,1348 @@ +use postgres::GenericClient; +use reqwest::StatusCode; +use serde::{Deserialize, Serialize}; +use serde_json::{Map, Value, json}; +use std::fmt; +use std::time::Duration; + +use crate::config::{ + CODE_SYMBOL_COLLECTION_PREFIX, CodeVectorSettings, Context, EmbeddingConfig, QdrantConfig, +}; +use crate::db; +use crate::models::{ProjectionMetadata, ProjectionProvenance, Symbol}; +use gobby_core::degradation::ServiceState; +use 
gobby_core::qdrant::{CollectionScope, SearchRequest, UpsertRequest}; + +// Keep code-symbol collections compatible with the Python daemon's Qdrant schema. +pub const VECTOR_DISTANCE_COSINE: &str = "Cosine"; +const DIMENSION_PROBE_TEXT: &str = "dimension_probe"; +const HTTP_TIMEOUT: Duration = Duration::from_secs(10); + +#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)] +pub struct CodeSymbolVectorSearchRequest { + pub project_id: String, + pub query: String, + pub limit: usize, + pub collection_prefix: String, +} + +#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)] +pub struct CodeSymbolVectorSearchHit { + pub symbol_id: String, + pub score: f64, +} + +#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)] +pub struct CodeSymbolVectorPayload { + pub project_id: String, + pub file_path: String, + pub symbol_id: String, + pub name: String, + pub kind: String, + pub language: String, + pub line_start: usize, + pub line_end: usize, + pub byte_start: usize, + pub byte_end: usize, + #[serde(skip_serializing_if = "Option::is_none")] + pub signature: Option, + #[serde(skip_serializing_if = "Option::is_none")] + pub docstring: Option, + pub provenance: ProjectionProvenance, + #[serde(skip_serializing_if = "Option::is_none")] + pub confidence: Option, + pub source_system: String, + pub source_file_path: String, + pub source_line: usize, + pub source_line_start: usize, + pub source_line_end: usize, + pub source_byte_start: usize, + pub source_byte_end: usize, + pub source_symbol_id: String, + #[serde(skip_serializing_if = "Option::is_none")] + pub summary: Option, +} + +impl CodeSymbolVectorPayload { + pub fn from_symbol(symbol: &Symbol) -> Self { + let metadata = ProjectionMetadata::gcode_extracted() + .with_source_file_path(&symbol.file_path) + .with_source_line(symbol.line_start) + .with_source_symbol_id(&symbol.id); + + Self { + project_id: symbol.project_id.clone(), + file_path: symbol.file_path.clone(), + symbol_id: symbol.id.clone(), + name: 
symbol.name.clone(), + kind: symbol.kind.clone(), + language: symbol.language.clone(), + line_start: symbol.line_start, + line_end: symbol.line_end, + byte_start: symbol.byte_start, + byte_end: symbol.byte_end, + signature: symbol.signature.clone(), + docstring: symbol.docstring.clone(), + provenance: metadata.provenance, + confidence: metadata.confidence, + source_system: metadata.source_system, + source_file_path: metadata + .source_file_path + .unwrap_or_else(|| symbol.file_path.clone()), + source_line: metadata.source_line.unwrap_or(symbol.line_start), + source_line_start: symbol.line_start, + source_line_end: symbol.line_end, + source_byte_start: symbol.byte_start, + source_byte_end: symbol.byte_end, + source_symbol_id: metadata + .source_symbol_id + .unwrap_or_else(|| symbol.id.clone()), + summary: symbol.summary.clone(), + } + } +} + +#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)] +#[serde(rename_all = "snake_case")] +pub enum CodeSymbolVectorLifecycleAction { + Ensure, + SyncFile, + Clear, + Rebuild, +} + +#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)] +pub struct CodeSymbolVectorLifecycleStatus { + pub project_id: String, + pub collection: String, + pub action: CodeSymbolVectorLifecycleAction, +} + +#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)] +pub struct VectorCollectionSchema { + pub size: usize, + pub distance: String, +} + +#[derive(Debug, Clone, PartialEq, Eq)] +struct ExistingVectorCollectionSchema { + size: Option, + distance: Option, +} + +#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)] +pub struct CodeSymbolVectorLifecycleOutput { + pub project_id: String, + pub collection: String, + pub action: CodeSymbolVectorLifecycleAction, + pub file_path: Option, + pub symbols: usize, + pub vectors_upserted: usize, + pub vectors_deleted: usize, + pub summary: String, +} + +#[derive(Debug, Clone, PartialEq, Eq)] +pub enum VectorLifecycleError { + MissingQdrantConfig, + 
MissingEmbeddingConfig, + EmbeddingHttp { + status: u16, + body: String, + }, + EmbeddingResponse(String), + QdrantHttp { + operation: &'static str, + status: u16, + body: String, + }, + QdrantOperation(String), + DimensionMismatch { + collection: String, + expected_size: usize, + found_size: Option, + expected_distance: &'static str, + found_distance: Option, + }, +} + +impl fmt::Display for VectorLifecycleError { + fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result { + match self { + Self::MissingQdrantConfig => { + write!(f, "Qdrant config is required for vector lifecycle commands") + } + Self::MissingEmbeddingConfig => write!( + f, + "embedding config is required for vector lifecycle commands" + ), + Self::EmbeddingHttp { status, body } => { + write!(f, "embedding request failed: HTTP {status}: {body}") + } + Self::EmbeddingResponse(reason) => { + write!(f, "embedding response was invalid: {reason}") + } + Self::QdrantHttp { + operation, + status, + body, + } => write!(f, "Qdrant {operation} failed: HTTP {status}: {body}"), + Self::QdrantOperation(reason) => write!(f, "Qdrant operation failed: {reason}"), + Self::DimensionMismatch { + collection, + expected_size, + found_size, + expected_distance, + found_distance, + } => write!( + f, + "Qdrant collection `{collection}` has incompatible vector schema: expected size {expected_size} distance {expected_distance}, found size {} distance {}. 
Refusing to migrate, drop, or recreate the collection.", + found_size + .map(|value| value.to_string()) + .unwrap_or_else(|| "unknown".to_string()), + found_distance.as_deref().unwrap_or("unknown") + ), + } + } +} + +impl std::error::Error for VectorLifecycleError {} + +#[derive(Debug)] +pub struct CodeSymbolVectorLifecycle { + project_id: String, + collection: String, + qdrant: QdrantConfig, + embedding: EmbeddingConfig, + settings: CodeVectorSettings, + probed_vector_size: Option<usize>, + client: reqwest::blocking::Client, +} + +pub fn collection_name(collection_prefix: &str, project_id: &str) -> String { + let collection = format!("{collection_prefix}{project_id}"); + gobby_core::qdrant::collection_name("gcode", CollectionScope::Custom(&collection)) +} + +pub fn delete_project_collection( + qdrant: &QdrantConfig, + project_id: &str, +) -> Result<bool, VectorLifecycleError> { + let client = qdrant_http_client()?; + let collection = collection_name(CODE_SYMBOL_COLLECTION_PREFIX, project_id); + delete_qdrant_collection(&client, qdrant, &collection) +} + +pub fn delete_file_vectors( + qdrant: &QdrantConfig, + project_id: &str, + file_path: &str, +) -> Result<bool, VectorLifecycleError> { + let client = qdrant_http_client()?; + let collection = collection_name(CODE_SYMBOL_COLLECTION_PREFIX, project_id); + delete_vectors_for_filter(&client, qdrant, &collection, project_id, Some(file_path)) +} + +pub fn delete_code_symbol_collections_with_prefix( + qdrant: &QdrantConfig, +) -> Result<Vec<String>, VectorLifecycleError> { + let client = qdrant_http_client()?; + let resp = qdrant_request_for_config(&client, qdrant, reqwest::Method::GET, "/collections")? 
+ .send() + .map_err(|err| VectorLifecycleError::QdrantOperation(err.to_string()))?; + let status = resp.status(); + if !status.is_success() { + return Err(qdrant_http_error("list collections", status, resp)); + } + + let data: Value = resp + .json() + .map_err(|err| VectorLifecycleError::QdrantOperation(err.to_string()))?; + let collections = parse_collection_names(&data) + .into_iter() + .filter(|name| name.starts_with(CODE_SYMBOL_COLLECTION_PREFIX)) + .collect::>(); + + let mut deleted = Vec::new(); + for collection in collections { + if delete_qdrant_collection(&client, qdrant, &collection)? { + deleted.push(collection); + } + } + Ok(deleted) +} + +pub fn resolve_lifecycle_qdrant_config( + source: &mut impl gobby_core::config::ConfigSource, +) -> Option { + gobby_core::config::resolve_qdrant_config(source) +} + +pub fn lifecycle_status( + project_id: impl Into, + collection_prefix: &str, + action: CodeSymbolVectorLifecycleAction, +) -> CodeSymbolVectorLifecycleStatus { + let project_id = project_id.into(); + CodeSymbolVectorLifecycleStatus { + collection: collection_name(collection_prefix, &project_id), + project_id, + action, + } +} + +pub fn embed_text(config: &EmbeddingConfig, text: &str) -> Result, VectorLifecycleError> { + let client = reqwest::blocking::Client::builder() + .timeout(HTTP_TIMEOUT) + .build() + .map_err(|err| VectorLifecycleError::EmbeddingResponse(err.to_string()))?; + + let body = json!({ + "model": config.model, + "input": text, + }); + + let url = format!("{}/embeddings", config.api_base.trim_end_matches('/')); + let mut req = client.post(&url).json(&body); + + if let Some(key) = &config.api_key { + req = req.header("Authorization", format!("Bearer {key}")); + } + + let resp = req + .send() + .map_err(|err| VectorLifecycleError::EmbeddingResponse(err.to_string()))?; + if !resp.status().is_success() { + let status = resp.status().as_u16(); + let body = resp.text().unwrap_or_default(); + return Err(VectorLifecycleError::EmbeddingHttp { 
status, body }); + } + + let data: Value = resp + .json() + .map_err(|err| VectorLifecycleError::EmbeddingResponse(err.to_string()))?; + let embedding: Vec = data + .get("data") + .and_then(Value::as_array) + .and_then(|values| values.first()) + .and_then(|value| value.get("embedding")) + .and_then(Value::as_array) + .ok_or_else(|| { + VectorLifecycleError::EmbeddingResponse("missing data[0].embedding array".to_string()) + })? + .iter() + .map(|value| { + value.as_f64().map(|f| f as f32).ok_or_else(|| { + VectorLifecycleError::EmbeddingResponse( + "embedding array contains a non-number".to_string(), + ) + }) + }) + .collect::, _>>()?; + + if embedding.is_empty() { + Err(VectorLifecycleError::EmbeddingResponse( + "embedding vector was empty".to_string(), + )) + } else { + Ok(embedding) + } +} + +pub fn embed_query(config: &EmbeddingConfig, text: &str) -> Option> { + embed_text(config, &format!("search_query: {text}")).ok() +} + +pub fn vector_text_for_symbol(symbol: &Symbol) -> String { + let mut lines = vec![ + format!("name: {}", symbol.name), + format!("qualified_name: {}", symbol.qualified_name), + format!("kind: {}", symbol.kind), + format!("language: {}", symbol.language), + format!("file_path: {}", symbol.file_path), + format!("range: {}-{}", symbol.line_start, symbol.line_end), + ]; + if let Some(signature) = symbol + .signature + .as_deref() + .filter(|value| !value.trim().is_empty()) + { + lines.push(format!("signature: {signature}")); + } + if let Some(docstring) = symbol + .docstring + .as_deref() + .filter(|value| !value.trim().is_empty()) + { + lines.push(format!("docstring: {docstring}")); + } + if let Some(summary) = symbol + .summary + .as_deref() + .filter(|value| !value.trim().is_empty()) + { + lines.push(format!("summary: {summary}")); + } + lines.join("\n") +} + +pub fn vector_search( + config: &QdrantConfig, + collection: &str, + query_vector: &[f32], + limit: usize, +) -> anyhow::Result> { + let request = SearchRequest { + vector: 
query_vector.to_vec(), + limit, + filter: None, + }; + let (hits, _) = gobby_core::qdrant::with_qdrant(Some(config), Vec::new(), |config| { + gobby_core::qdrant::search(config, collection, request) + })?; + Ok(hits + .into_iter() + .map(|hit| (hit.id, f64::from(hit.score))) + .collect()) +} + +impl CodeSymbolVectorLifecycle { + pub fn new( + project_id: String, + qdrant: QdrantConfig, + embedding: EmbeddingConfig, + settings: CodeVectorSettings, + ) -> Result { + if qdrant + .url + .as_deref() + .filter(|url| !url.trim().is_empty()) + .is_none() + { + return Err(VectorLifecycleError::MissingQdrantConfig); + } + if embedding.api_base.trim().is_empty() { + return Err(VectorLifecycleError::MissingEmbeddingConfig); + } + + let collection = collection_name(CODE_SYMBOL_COLLECTION_PREFIX, &project_id); + let client = reqwest::blocking::Client::builder() + .timeout(HTTP_TIMEOUT) + .build() + .map_err(|err| VectorLifecycleError::QdrantOperation(err.to_string()))?; + Ok(Self { + project_id, + collection, + qdrant, + embedding, + settings, + probed_vector_size: None, + client, + }) + } + + pub fn collection(&self) -> &str { + &self.collection + } + + pub fn ensure_collection(&mut self) -> Result { + let expected = self.expected_schema()?; + self.require_qdrant_boundary()?; + match self.get_collection_schema()? 
{ + Some(found) => self.ensure_compatible_schema(expected, found), + None => { + self.create_collection(&expected)?; + Ok(expected) + } + } + } + + pub fn sync_file_symbols( + &mut self, + file_path: &str, + symbols: &[Symbol], + ) -> Result { + self.ensure_collection()?; + let points = self.points_for_symbols(symbols)?; + self.delete_vectors(Some(file_path))?; + self.upsert_points(points)?; + + Ok(self.output( + CodeSymbolVectorLifecycleAction::SyncFile, + Some(file_path.to_string()), + symbols.len(), + symbols.len(), + 1, + )) + } + + pub fn clear_project_vectors( + &mut self, + ) -> Result { + let expected = self.expected_schema()?; + self.require_qdrant_boundary()?; + let deleted = match self.get_collection_schema()? { + Some(found) => { + self.ensure_compatible_schema(expected, found)?; + self.delete_vectors(None)?; + 1 + } + None => 0, + }; + + Ok(self.output(CodeSymbolVectorLifecycleAction::Clear, None, 0, 0, deleted)) + } + + pub fn rebuild_symbols( + &mut self, + symbols: &[Symbol], + ) -> Result { + self.ensure_collection()?; + let points = self.points_for_symbols(symbols)?; + self.delete_vectors(None)?; + self.upsert_points(points)?; + + Ok(self.output( + CodeSymbolVectorLifecycleAction::Rebuild, + None, + symbols.len(), + symbols.len(), + 1, + )) + } + + fn output( + &self, + action: CodeSymbolVectorLifecycleAction, + file_path: Option, + symbols: usize, + vectors_upserted: usize, + vectors_deleted: usize, + ) -> CodeSymbolVectorLifecycleOutput { + CodeSymbolVectorLifecycleOutput { + project_id: self.project_id.clone(), + collection: self.collection.clone(), + action, + file_path, + symbols, + vectors_upserted, + vectors_deleted, + summary: format!( + "{vectors_upserted} vector(s) upserted, {vectors_deleted} delete operation(s) issued" + ), + } + } + + fn expected_schema(&mut self) -> Result { + let size = match self.settings.vector_dim { + Some(size) => size, + None => match self.probed_vector_size { + Some(size) => size, + None => { + let size = 
embed_text(&self.embedding, DIMENSION_PROBE_TEXT)?.len(); + self.probed_vector_size = Some(size); + size + } + }, + }; + + Ok(VectorCollectionSchema { + size, + distance: VECTOR_DISTANCE_COSINE.to_string(), + }) + } + + fn require_qdrant_boundary(&self) -> Result<(), VectorLifecycleError> { + let ((), state) = gobby_core::qdrant::with_qdrant(Some(&self.qdrant), (), |_| Ok(())) + .map_err(|err| VectorLifecycleError::QdrantOperation(err.to_string()))?; + match state { + ServiceState::Available => Ok(()), + ServiceState::NotConfigured => Err(VectorLifecycleError::MissingQdrantConfig), + other => Err(VectorLifecycleError::QdrantOperation(format!( + "unexpected Qdrant service state: {other:?}" + ))), + } + } + + fn ensure_compatible_schema( + &self, + expected: VectorCollectionSchema, + found: ExistingVectorCollectionSchema, + ) -> Result<VectorCollectionSchema, VectorLifecycleError> { + if found.size == Some(expected.size) + && found.distance.as_deref() == Some(&expected.distance) + { + return Ok(VectorCollectionSchema { + size: expected.size, + distance: expected.distance, + }); + } + + Err(VectorLifecycleError::DimensionMismatch { + collection: self.collection.clone(), + expected_size: expected.size, + found_size: found.size, + expected_distance: VECTOR_DISTANCE_COSINE, + found_distance: found.distance, + }) + } + + fn get_collection_schema( + &self, + ) -> Result<Option<ExistingVectorCollectionSchema>, VectorLifecycleError> { + let resp = self + .qdrant_request( + reqwest::Method::GET, + &format!("/collections/{}", self.collection), + )? 
+ .send() + .map_err(|err| VectorLifecycleError::QdrantOperation(err.to_string()))?; + let status = resp.status(); + if status == StatusCode::NOT_FOUND { + return Ok(None); + } + if !status.is_success() { + return Err(qdrant_http_error("get collection", status, resp)); + } + + let data: Value = resp + .json() + .map_err(|err| VectorLifecycleError::QdrantOperation(err.to_string()))?; + Ok(parse_collection_schema(&data)) + } + + fn create_collection( + &self, + schema: &VectorCollectionSchema, + ) -> Result<(), VectorLifecycleError> { + let body = json!({ + "vectors": { + "size": schema.size, + "distance": schema.distance, + }, + }); + let resp = self + .qdrant_request( + reqwest::Method::PUT, + &format!("/collections/{}", self.collection), + )? + .json(&body) + .send() + .map_err(|err| VectorLifecycleError::QdrantOperation(err.to_string()))?; + if !resp.status().is_success() { + return Err(qdrant_http_error("create collection", resp.status(), resp)); + } + Ok(()) + } + + fn delete_vectors(&self, file_path: Option<&str>) -> Result<(), VectorLifecycleError> { + delete_vectors_for_filter( + &self.client, + &self.qdrant, + &self.collection, + &self.project_id, + file_path, + ) + .map(|_| ()) + } + + fn upsert_points(&self, points: Vec) -> Result<(), VectorLifecycleError> { + if points.is_empty() { + return Ok(()); + } + let ((), state) = gobby_core::qdrant::with_qdrant(Some(&self.qdrant), (), |config| { + gobby_core::qdrant::upsert(config, &self.collection, points) + }) + .map_err(|err| VectorLifecycleError::QdrantOperation(err.to_string()))?; + match state { + ServiceState::Available => Ok(()), + ServiceState::NotConfigured => Err(VectorLifecycleError::MissingQdrantConfig), + other => Err(VectorLifecycleError::QdrantOperation(format!( + "unexpected Qdrant service state: {other:?}" + ))), + } + } + + fn points_for_symbols( + &self, + symbols: &[Symbol], + ) -> Result, VectorLifecycleError> { + symbols + .iter() + .map(|symbol| { + let vector = 
embed_text(&self.embedding, &vector_text_for_symbol(symbol))?; + let payload = payload_map(CodeSymbolVectorPayload::from_symbol(symbol))?; + Ok(UpsertRequest { + id: symbol.id.clone(), + vector, + payload, + }) + }) + .collect() + } + + fn qdrant_request( + &self, + method: reqwest::Method, + path: &str, + ) -> Result { + qdrant_request_for_config(&self.client, &self.qdrant, method, path) + } +} + +pub fn fetch_symbols_for_file( + conn: &mut impl GenericClient, + project_id: &str, + file_path: &str, +) -> anyhow::Result> { + let columns = db::symbol_select_columns(""); + conn.query( + &format!( + "SELECT {columns} FROM code_symbols + WHERE project_id = $1 AND file_path = $2 + ORDER BY file_path, byte_start, id" + ), + &[&project_id, &file_path], + )? + .into_iter() + .map(|row| Symbol::from_row(&row)) + .collect() +} + +pub fn fetch_symbols_for_project( + conn: &mut impl GenericClient, + project_id: &str, +) -> anyhow::Result> { + let columns = db::symbol_select_columns(""); + conn.query( + &format!( + "SELECT {columns} FROM code_symbols + WHERE project_id = $1 + ORDER BY file_path, byte_start, id" + ), + &[&project_id], + )? + .into_iter() + .map(|row| Symbol::from_row(&row)) + .collect() +} + +fn payload_map( + payload: CodeSymbolVectorPayload, +) -> Result, VectorLifecycleError> { + match serde_json::to_value(payload) + .map_err(|err| VectorLifecycleError::QdrantOperation(err.to_string()))? 
+ { + Value::Object(map) => Ok(map), + _ => Err(VectorLifecycleError::QdrantOperation( + "vector payload did not serialize to an object".to_string(), + )), + } +} + +fn parse_collection_schema(data: &Value) -> Option { + let vectors = data.pointer("/result/config/params/vectors")?; + let size = vectors + .get("size") + .and_then(Value::as_u64) + .map(|size| size as usize); + let distance = vectors + .get("distance") + .and_then(Value::as_str) + .map(str::to_string); + Some(ExistingVectorCollectionSchema { size, distance }) +} + +fn parse_collection_names(data: &Value) -> Vec { + data.pointer("/result/collections") + .and_then(Value::as_array) + .map(|collections| { + collections + .iter() + .filter_map(|collection| { + collection + .get("name") + .and_then(Value::as_str) + .map(str::to_string) + }) + .collect() + }) + .unwrap_or_default() +} + +fn qdrant_http_client() -> Result { + reqwest::blocking::Client::builder() + .timeout(HTTP_TIMEOUT) + .build() + .map_err(|err| VectorLifecycleError::QdrantOperation(err.to_string())) +} + +fn qdrant_request_for_config( + client: &reqwest::blocking::Client, + qdrant: &QdrantConfig, + method: reqwest::Method, + path: &str, +) -> Result { + let base = qdrant + .url + .as_deref() + .ok_or(VectorLifecycleError::MissingQdrantConfig)? + .trim_end_matches('/'); + let url = format!("{base}{path}"); + let mut req = client.request(method, url); + if let Some(key) = &qdrant.api_key { + req = req.header("api-key", key); + } + Ok(req) +} + +fn delete_qdrant_collection( + client: &reqwest::blocking::Client, + qdrant: &QdrantConfig, + collection: &str, +) -> Result { + let resp = qdrant_request_for_config( + client, + qdrant, + reqwest::Method::DELETE, + &format!("/collections/{collection}"), + )? 
+ .send() + .map_err(|err| VectorLifecycleError::QdrantOperation(err.to_string()))?; + let status = resp.status(); + if status == StatusCode::NOT_FOUND { + return Ok(false); + } + if !status.is_success() { + return Err(qdrant_http_error("delete collection", status, resp)); + } + Ok(true) +} + +fn delete_vectors_for_filter( + client: &reqwest::blocking::Client, + qdrant: &QdrantConfig, + collection: &str, + project_id: &str, + file_path: Option<&str>, +) -> Result { + let mut must = vec![json!({ + "key": "project_id", + "match": {"value": project_id}, + })]; + if let Some(file_path) = file_path { + must.push(json!({ + "key": "file_path", + "match": {"value": file_path}, + })); + } + let body = json!({ + "filter": { + "must": must, + }, + }); + let resp = qdrant_request_for_config( + client, + qdrant, + reqwest::Method::POST, + &format!("/collections/{collection}/points/delete"), + )? + .json(&body) + .send() + .map_err(|err| VectorLifecycleError::QdrantOperation(err.to_string()))?; + let status = resp.status(); + if status == StatusCode::NOT_FOUND { + return Ok(false); + } + if !status.is_success() { + return Err(qdrant_http_error("delete points", status, resp)); + } + Ok(true) +} + +fn qdrant_http_error( + operation: &'static str, + status: StatusCode, + resp: reqwest::blocking::Response, +) -> VectorLifecycleError { + VectorLifecycleError::QdrantHttp { + operation, + status: status.as_u16(), + body: resp.text().unwrap_or_default(), + } +} + +pub fn search_code_symbols( + ctx: &Context, + request: &CodeSymbolVectorSearchRequest, +) -> Vec { + let qdrant_config = match &ctx.qdrant { + Some(c) => c, + None => return vec![], + }; + + let embedding_config = match &ctx.embedding { + Some(c) => c, + None => return vec![], + }; + + let embedding = match embed_query(embedding_config, &request.query) { + Some(e) => e, + None => return vec![], + }; + + let collection = collection_name(&request.collection_prefix, &request.project_id); + vector_search(qdrant_config, 
&collection, &embedding, request.limit) + .unwrap_or_default() + .into_iter() + .map(|(symbol_id, score)| CodeSymbolVectorSearchHit { symbol_id, score }) + .collect() +} + +pub fn semantic_search(ctx: &Context, query: &str, limit: usize) -> Vec<(String, f64)> { + if ctx.qdrant.is_none() { + return vec![]; + } + + let request = CodeSymbolVectorSearchRequest { + project_id: ctx.project_id.clone(), + query: query.to_string(), + limit, + collection_prefix: CODE_SYMBOL_COLLECTION_PREFIX.to_string(), + }; + + search_code_symbols(ctx, &request) + .into_iter() + .map(|hit| (hit.symbol_id, hit.score)) + .collect() +} + +#[cfg(test)] +mod tests { + use super::*; + use crate::config::{CodeVectorSettings, QdrantConfig}; + use crate::models::{SOURCE_SYSTEM_GCODE, Symbol}; + use serde_json::{Value, json}; + use std::io::{Read, Write}; + use std::net::TcpListener; + use std::thread; + + fn test_symbol(summary: Option) -> Symbol { + Symbol { + id: "symbol-1".to_string(), + project_id: "project-1".to_string(), + file_path: "src/lib.rs".to_string(), + name: "run".to_string(), + qualified_name: "crate::run".to_string(), + kind: "function".to_string(), + language: "rust".to_string(), + byte_start: 10, + byte_end: 40, + line_start: 3, + line_end: 5, + signature: None, + docstring: None, + parent_symbol_id: None, + content_hash: "hash".to_string(), + summary, + created_at: String::new(), + updated_at: String::new(), + } + } + + #[test] + fn payloads_carry_provenance_metadata() { + let payload = CodeSymbolVectorPayload::from_symbol(&test_symbol(Some("does work".into()))); + + assert_eq!(payload.provenance, ProjectionProvenance::Extracted); + assert_eq!(payload.confidence, Some(1.0)); + assert_eq!(payload.source_system, SOURCE_SYSTEM_GCODE); + assert_eq!(payload.source_file_path, "src/lib.rs"); + assert_eq!(payload.source_line_start, 3); + assert_eq!(payload.source_line_end, 5); + assert_eq!(payload.source_byte_start, 10); + assert_eq!(payload.source_byte_end, 40); + 
assert_eq!(payload.source_line, 3); + assert_eq!(payload.source_symbol_id, "symbol-1"); + assert_eq!(payload.summary.as_deref(), Some("does work")); + assert_eq!(payload.signature, None); + assert_eq!(payload.docstring, None); + + let value = serde_json::to_value(payload).expect("payload serializes"); + assert_eq!(value["provenance"], "EXTRACTED"); + assert_eq!(value["confidence"], 1.0); + assert_eq!(value["source_system"], SOURCE_SYSTEM_GCODE); + assert_eq!(value["source_file_path"], "src/lib.rs"); + assert_eq!(value["source_line_start"], 3); + assert_eq!(value["source_line_end"], 5); + assert_eq!(value["source_byte_start"], 10); + assert_eq!(value["source_byte_end"], 40); + assert_eq!(value["source_symbol_id"], "symbol-1"); + } + + #[test] + fn summaries_are_optional_enrichment() { + let symbol = test_symbol(None); + let payload = CodeSymbolVectorPayload::from_symbol(&symbol); + let vector_text = vector_text_for_symbol(&symbol); + let value = serde_json::to_value(payload).expect("payload serializes"); + + assert!(value.get("summary").is_none()); + assert!(vector_text.contains("name: run")); + assert!(!vector_text.contains("summary:")); + } + + #[test] + fn collection_name_compatibility() { + assert_eq!( + collection_name(CODE_SYMBOL_COLLECTION_PREFIX, "project-1"), + "code_symbols_project-1" + ); + } + + #[test] + fn delete_project_collection_targets_only_project_collection() { + let (qdrant_url, handle) = spawn_http_responses(vec![(200, json!({"result": true}))]); + let deleted = delete_project_collection( + &QdrantConfig { + url: Some(qdrant_url), + api_key: Some("qdrant-key".to_string()), + }, + "project-1", + ) + .expect("delete collection"); + let requests = handle.join().expect("qdrant requests"); + + assert!(deleted); + assert_eq!(requests.len(), 1); + assert!(requests[0].contains("DELETE /collections/code_symbols_project-1 HTTP/1.1")); + assert!(requests[0].contains("api-key: qdrant-key")); + assert!(!requests[0].contains("project-2")); + } + + #[test] + 
fn delete_file_vectors_filters_by_project_and_file_without_embedding() { + let (qdrant_url, handle) = + spawn_http_responses(vec![(200, json!({"result": {"operation_id": 1}}))]); + let deleted = delete_file_vectors( + &QdrantConfig { + url: Some(qdrant_url), + api_key: Some("qdrant-key".to_string()), + }, + "project-1", + "src/lib.rs", + ) + .expect("delete vectors"); + let requests = handle.join().expect("qdrant requests"); + + assert!(deleted); + assert_eq!(requests.len(), 1); + assert!( + requests[0].contains("POST /collections/code_symbols_project-1/points/delete HTTP/1.1") + ); + assert!(requests[0].contains("api-key: qdrant-key")); + assert!(requests[0].contains(r#""key":"project_id""#)); + assert!(requests[0].contains(r#""value":"project-1""#)); + assert!(requests[0].contains(r#""key":"file_path""#)); + assert!(requests[0].contains(r#""value":"src/lib.rs""#)); + } + + #[test] + fn clear_project_vectors_does_not_touch_memory_vector_collections() { + let (qdrant_url, handle) = spawn_http_responses(vec![ + ( + 200, + json!({"result": {"config": {"params": {"vectors": {"size": 3, "distance": "Cosine"}}}}}), + ), + (200, json!({"result": {"operation_id": 1}})), + ]); + let mut lifecycle = CodeSymbolVectorLifecycle::new( + "project-1".to_string(), + QdrantConfig { + url: Some(qdrant_url), + api_key: None, + }, + EmbeddingConfig { + api_base: "http://127.0.0.1:9/v1".to_string(), + model: "unused".to_string(), + api_key: None, + }, + CodeVectorSettings { + vector_dim: Some(3), + }, + ) + .expect("lifecycle"); + + let cleared = lifecycle.clear_project_vectors().expect("clear vectors"); + let requests = handle.join().expect("qdrant requests"); + + assert_eq!(cleared.vectors_deleted, 1); + assert_eq!(requests.len(), 2); + assert!(requests[0].contains("GET /collections/code_symbols_project-1 HTTP/1.1")); + assert!( + requests[1].contains("POST /collections/code_symbols_project-1/points/delete HTTP/1.1") + ); + assert!(requests[1].contains(r#""key":"project_id""#)); + 
assert!(requests[1].contains(r#""value":"project-1""#)); + assert!(!requests[1].contains(r#""key":"file_path""#)); + assert!(requests.iter().all(|request| !request.contains("memory"))); + assert!( + requests + .iter() + .all(|request| !request.contains("GET /collections HTTP/1.1")) + ); + assert!( + requests + .iter() + .all(|request| !request.contains("DELETE /collections/")) + ); + } + + #[test] + fn delete_prefixed_collections_deletes_only_code_symbol_collections() { + let (qdrant_url, handle) = spawn_http_responses(vec![ + ( + 200, + json!({ + "result": { + "collections": [ + {"name": "code_symbols_project-1"}, + {"name": "memory_vectors"}, + {"name": "code_symbols_project-2"} + ] + } + }), + ), + (200, json!({"result": true})), + (200, json!({"result": true})), + ]); + let deleted = delete_code_symbol_collections_with_prefix(&QdrantConfig { + url: Some(qdrant_url), + api_key: None, + }) + .expect("delete prefixed collections"); + let requests = handle.join().expect("qdrant requests"); + + assert_eq!( + deleted, + vec![ + "code_symbols_project-1".to_string(), + "code_symbols_project-2".to_string() + ] + ); + assert_eq!(requests.len(), 3); + assert!(requests[0].contains("GET /collections HTTP/1.1")); + assert!(requests[1].contains("DELETE /collections/code_symbols_project-1 HTTP/1.1")); + assert!(requests[2].contains("DELETE /collections/code_symbols_project-2 HTTP/1.1")); + assert!( + requests + .iter() + .all(|request| !request.contains("DELETE /collections/memory_vectors")) + ); + } + + #[test] + fn embedding_request_response() { + let (base_url, handle) = spawn_http_responses(vec![( + 200, + json!({"data": [{"embedding": [0.25, 0.5, 0.75]}]}), + )]); + let config = EmbeddingConfig { + api_base: format!("{base_url}/v1"), + model: "embed-small".to_string(), + api_key: Some("embedding-key".to_string()), + }; + + let embedding = embed_text(&config, "dimension_probe").expect("embedding response"); + let requests = handle.join().expect("server thread"); + + 
assert_eq!(embedding, vec![0.25, 0.5, 0.75]); + assert_eq!(requests.len(), 1); + assert!(requests[0].contains("POST /v1/embeddings HTTP/1.1")); + assert!(requests[0].contains("authorization: Bearer embedding-key")); + assert!(requests[0].contains(r#""model":"embed-small""#)); + assert!(requests[0].contains(r#""input":"dimension_probe""#)); + } + + #[test] + fn ensure_collection_resolves_vector_size_and_distance() { + let (embedding_url, embedding_handle) = spawn_http_responses(vec![( + 200, + json!({"data": [{"embedding": [0.1, 0.2, 0.3]}]}), + )]); + let (qdrant_url, qdrant_handle) = spawn_http_responses(vec![ + (404, json!({"status": "not found"})), + (200, json!({"result": true})), + ( + 200, + json!({"result": {"config": {"params": {"vectors": {"size": 3, "distance": "Cosine"}}}}}), + ), + ]); + let mut lifecycle = CodeSymbolVectorLifecycle::new( + "project-1".to_string(), + QdrantConfig { + url: Some(qdrant_url), + api_key: None, + }, + EmbeddingConfig { + api_base: format!("{embedding_url}/v1"), + model: "embed-small".to_string(), + api_key: None, + }, + CodeVectorSettings { vector_dim: None }, + ) + .expect("lifecycle"); + + let created = lifecycle.ensure_collection().expect("create collection"); + let reused = lifecycle.ensure_collection().expect("reuse collection"); + let embedding_requests = embedding_handle.join().expect("embedding requests"); + let qdrant_requests = qdrant_handle.join().expect("qdrant requests"); + + assert_eq!(created.size, 3); + assert_eq!(created.distance, VECTOR_DISTANCE_COSINE); + assert_eq!(reused.size, 3); + assert_eq!(embedding_requests.len(), 1, "dimension probe is cached"); + assert!(qdrant_requests[1].contains("PUT /collections/code_symbols_project-1 HTTP/1.1")); + assert!(qdrant_requests[1].contains(r#""size":3"#)); + assert!(qdrant_requests[1].contains(r#""distance":"Cosine""#)); + + let (explicit_qdrant_url, explicit_handle) = spawn_http_responses(vec![ + (404, json!({"status": "not found"})), + (200, json!({"result": 
true})), + ]); + let mut explicit = CodeSymbolVectorLifecycle::new( + "project-1".to_string(), + QdrantConfig { + url: Some(explicit_qdrant_url), + api_key: None, + }, + EmbeddingConfig { + api_base: "http://127.0.0.1:9/v1".to_string(), + model: "unused".to_string(), + api_key: None, + }, + CodeVectorSettings { + vector_dim: Some(1536), + }, + ) + .expect("lifecycle with explicit size"); + + let schema = explicit.ensure_collection().expect("explicit size create"); + let explicit_requests = explicit_handle.join().expect("explicit qdrant requests"); + assert_eq!(schema.size, 1536); + assert!(explicit_requests[1].contains(r#""size":1536"#)); + } + + #[test] + fn lifecycle_http_scoped_to_module() { + let manifest_dir = std::path::Path::new(env!("CARGO_MANIFEST_DIR")); + let src_dir = manifest_dir.join("src"); + let mut offenders = Vec::new(); + + fn visit(path: &std::path::Path, offenders: &mut Vec<std::path::PathBuf>) { + for entry in std::fs::read_dir(path).expect("read source directory") { + let entry = entry.expect("source entry"); + let path = entry.path(); + if path.is_dir() { + visit(&path, offenders); + continue; + } + if path.extension().and_then(|ext| ext.to_str()) != Some("rs") { + continue; + } + let source = std::fs::read_to_string(&path).expect("read source file"); + let lifecycle_rest = [ + "/points/delete", + "points/delete", + "collections/{collection}", + "/collections/{collection}", + ]; + if lifecycle_rest.iter().any(|needle| source.contains(needle)) + && !path.ends_with("vector/code_symbols.rs") + { + offenders.push(path); + } + } + } + + visit(&src_dir, &mut offenders); + assert!( + offenders.is_empty(), + "Qdrant lifecycle REST must stay scoped to vector/code_symbols.rs: {offenders:?}" + ); + } + + #[test] + fn routes_through_gobby_core_qdrant() { + let source = include_str!("code_symbols.rs"); + assert!(source.contains("gobby_core::config::resolve_qdrant_config")); + assert!(source.contains("gobby_core::qdrant::with_qdrant")); +
assert!(source.contains("gobby_core::qdrant::collection_name")); + assert!(source.contains("CollectionScope::Custom")); + assert!(source.contains("gobby_core::qdrant::search")); + assert!(source.contains("gobby_core::qdrant::upsert")); + } + + fn spawn_http_responses( + responses: Vec<(u16, Value)>, + ) -> (String, thread::JoinHandle<Vec<String>>) { + let listener = TcpListener::bind("127.0.0.1:0").expect("bind test server"); + let addr = listener.local_addr().expect("local addr"); + let handle = thread::spawn(move || { + let mut requests = Vec::new(); + for (status, body) in responses { + let (mut stream, _) = listener.accept().expect("accept request"); + requests.push(read_http_request(&mut stream)); + + let body = body.to_string(); + write!( + stream, + "HTTP/1.1 {status} OK\r\nContent-Type: application/json\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{body}", + body.len() + ) + .expect("write response"); + } + requests + }); + + (format!("http://{addr}"), handle) + } + + fn read_http_request(stream: &mut impl Read) -> String { + let mut request = Vec::new(); + let mut buffer = [0; 4096]; + let mut expected_len = None; + + loop { + let n = stream.read(&mut buffer).expect("read request"); + if n == 0 { + break; + } + request.extend_from_slice(&buffer[..n]); + + if expected_len.is_none() + && let Some(header_end) = + request.windows(4).position(|window| window == b"\r\n\r\n") + { + let headers = String::from_utf8_lossy(&request[..header_end]); + let content_len = headers + .lines() + .find_map(|line| { + line.to_ascii_lowercase() + .strip_prefix("content-length: ") + .and_then(|value| value.parse::<usize>().ok()) + }) + .unwrap_or(0); + expected_len = Some(header_end + 4 + content_len); + } + + if let Some(expected_len) = expected_len + && request.len() >= expected_len + { + break; + } + } + + String::from_utf8_lossy(&request).into_owned() + } +} diff --git a/crates/gcode/src/vector/mod.rs b/crates/gcode/src/vector/mod.rs new file mode 100644 index 0000000..a2ac55f ---
/dev/null +++ b/crates/gcode/src/vector/mod.rs @@ -0,0 +1 @@ +pub mod code_symbols; diff --git a/crates/gcode/tests/graph_standalone.rs b/crates/gcode/tests/graph_standalone.rs new file mode 100644 index 0000000..40cc76d --- /dev/null +++ b/crates/gcode/tests/graph_standalone.rs @@ -0,0 +1,234 @@ +use postgres::{Client, NoTls}; +use serde_json::Value; +use std::fs; +use std::process::{Command, Output}; + +const TEST_PROJECT_ID: &str = "graph-standalone-project"; +const TEST_FILE: &str = "src/lib.rs"; +const CALLER_ID: &str = "graph-standalone-caller"; +const CALLEE_ID: &str = "graph-standalone-callee"; + +#[test] +fn graph_commands_run_without_daemon_when_services_are_available() { + let Some(env) = StandaloneEnv::from_env() else { + eprintln!( + "skipping graph_standalone smoke; set GCODE_GRAPH_STANDALONE_DATABASE_URL, GCODE_GRAPH_STANDALONE_FALKOR_HOST, and GCODE_GRAPH_STANDALONE_FALKOR_PORT" + ); + return; + }; + + let project = tempfile::tempdir().expect("temp project"); + fs::create_dir_all(project.path().join(".gobby")).expect("create .gobby"); + fs::create_dir_all(project.path().join("src")).expect("create src"); + fs::write( + project.path().join("src/lib.rs"), + "pub fn caller() { callee(); }\npub fn callee() {}\n", + ) + .expect("write source"); + fs::write( + project.path().join(".gobby/gcode.json"), + serde_json::json!({ + "id": TEST_PROJECT_ID, + "name": "graph-standalone", + "created_at": "2026-05-28T00:00:00Z" + }) + .to_string(), + ) + .expect("write gcode identity"); + + let mut conn = Client::connect(&env.database_url, NoTls).expect("connect PostgreSQL"); + seed_project(&mut conn); + + let sync = run_gcode( + &env, + project.path(), + &["graph", "sync-file", "--file", TEST_FILE], + ); + assert_success(sync, "graph sync-file"); + + let overview = json_command(&env, project.path(), &["graph", "overview"]); + assert!( + overview["nodes"] + .as_array() + .is_some_and(|nodes| !nodes.is_empty()) + ); + + let file = json_command( + &env, + 
project.path(), + &["graph", "file", "--file", TEST_FILE], + ); + assert!( + file["links"] + .as_array() + .is_some_and(|links| !links.is_empty()) + ); + + let neighbors = json_command( + &env, + project.path(), + &[ + "graph", + "neighbors", + "--symbol-id", + CALLER_ID, + "--limit", + "10", + ], + ); + assert!( + neighbors["nodes"] + .as_array() + .is_some_and(|nodes| nodes.iter().any(|node| node["id"] == CALLEE_ID)) + ); + + let blast_symbol = json_command( + &env, + project.path(), + &[ + "graph", + "blast-radius", + "--symbol-id", + CALLER_ID, + "--depth", + "2", + "--limit", + "10", + ], + ); + assert_eq!(blast_symbol["center"], CALLER_ID); + + let blast_file = json_command( + &env, + project.path(), + &[ + "graph", + "blast-radius", + "--file", + TEST_FILE, + "--depth", + "2", + "--limit", + "10", + ], + ); + assert_eq!(blast_file["center"], TEST_FILE); + + let clear = json_command(&env, project.path(), &["graph", "clear"]); + assert_eq!(clear["success"], true); + + let rebuild = json_command(&env, project.path(), &["graph", "rebuild"]); + assert_eq!(rebuild["success"], true); + assert_eq!(rebuild["files_processed"], 1); + assert_eq!(rebuild["files_synced"], 1); + + cleanup_project(&mut conn); +} + +struct StandaloneEnv { + database_url: String, + falkor_host: String, + falkor_port: String, + falkor_password: Option<String>, +} + +impl StandaloneEnv { + fn from_env() -> Option<Self> { + Some(Self { + database_url: std::env::var("GCODE_GRAPH_STANDALONE_DATABASE_URL").ok()?, + falkor_host: std::env::var("GCODE_GRAPH_STANDALONE_FALKOR_HOST").ok()?, + falkor_port: std::env::var("GCODE_GRAPH_STANDALONE_FALKOR_PORT").ok()?, + falkor_password: std::env::var("GCODE_GRAPH_STANDALONE_FALKOR_PASSWORD").ok(), + }) + } +} + +fn run_gcode(env: &StandaloneEnv, cwd: &std::path::Path, args: &[&str]) -> Output { + let mut command = Command::new(env!("CARGO_BIN_EXE_gcode")); + command + .current_dir(cwd) + .env("GCODE_DATABASE_URL", &env.database_url) + .env("GOBBY_FALKORDB_HOST",
&env.falkor_host) + .env("GOBBY_FALKORDB_PORT", &env.falkor_port) + .env("GOBBY_HOME", cwd.join(".no-daemon-home")) + .arg("--no-freshness") + .arg("--format") + .arg("json") + .args(args); + if let Some(password) = &env.falkor_password { + command.env("GOBBY_FALKORDB_PASSWORD", password); + } + command.output().expect("run gcode") +} + +fn json_command(env: &StandaloneEnv, cwd: &std::path::Path, args: &[&str]) -> Value { + let output = run_gcode(env, cwd, args); + assert_success(output, &args.join(" ")) +} + +fn assert_success(output: Output, label: &str) -> Value { + assert!( + output.status.success(), + "{label} failed\nstdout:\n{}\nstderr:\n{}", + String::from_utf8_lossy(&output.stdout), + String::from_utf8_lossy(&output.stderr) + ); + serde_json::from_slice(&output.stdout).unwrap_or_else(|err| { + panic!( + "{label} did not emit JSON: {err}\nstdout:\n{}", + String::from_utf8_lossy(&output.stdout) + ) + }) +} + +fn seed_project(conn: &mut Client) { + cleanup_project(conn); + conn.batch_execute( + "INSERT INTO code_indexed_projects + (id, root_path, total_files, total_symbols, last_indexed_at, index_duration_ms) + VALUES + ('graph-standalone-project', '/tmp/graph-standalone', 1, 2, NOW(), 0); + + INSERT INTO code_indexed_files + (id, project_id, file_path, language, content_hash, symbol_count, byte_size, + graph_synced, vectors_synced, graph_sync_attempted_at, indexed_at) + VALUES + ('graph-standalone-file', 'graph-standalone-project', 'src/lib.rs', 'rust', + 'hash-1', 2, 54, false, true, NULL, NOW()); + + INSERT INTO code_symbols + (id, project_id, file_path, name, qualified_name, kind, language, byte_start, byte_end, + line_start, line_end, signature, docstring, parent_symbol_id, content_hash, + summary, created_at, updated_at) + VALUES + ('graph-standalone-caller', 'graph-standalone-project', 'src/lib.rs', 'caller', + 'crate::caller', 'function', 'rust', 0, 28, 1, 1, 'pub fn caller()', NULL, NULL, + 'hash-1', NULL, NOW(), NOW()), + ('graph-standalone-callee', 
'graph-standalone-project', 'src/lib.rs', 'callee', + 'crate::callee', 'function', 'rust', 29, 47, 2, 2, 'pub fn callee()', NULL, NULL, + 'hash-1', NULL, NOW(), NOW()); + + INSERT INTO code_imports (project_id, source_file, target_module) + VALUES ('graph-standalone-project', 'src/lib.rs', 'std'); + + INSERT INTO code_calls + (project_id, caller_symbol_id, callee_symbol_id, callee_name, callee_target_kind, + callee_external_module, file_path, line) + VALUES + ('graph-standalone-project', 'graph-standalone-caller', 'graph-standalone-callee', + 'callee', 'symbol', '', 'src/lib.rs', 1);", + ) + .expect("seed graph rows"); +} + +fn cleanup_project(conn: &mut Client) { + conn.batch_execute( + "DELETE FROM code_calls WHERE project_id = 'graph-standalone-project'; + DELETE FROM code_imports WHERE project_id = 'graph-standalone-project'; + DELETE FROM code_symbols WHERE project_id = 'graph-standalone-project'; + DELETE FROM code_content_chunks WHERE project_id = 'graph-standalone-project'; + DELETE FROM code_indexed_files WHERE project_id = 'graph-standalone-project'; + DELETE FROM code_indexed_projects WHERE id = 'graph-standalone-project';", + ) + .expect("cleanup graph rows"); +} diff --git a/crates/gcode/tests/projection_standalone.rs b/crates/gcode/tests/projection_standalone.rs new file mode 100644 index 0000000..adb5860 --- /dev/null +++ b/crates/gcode/tests/projection_standalone.rs @@ -0,0 +1,326 @@ +use postgres::{Client, NoTls}; +use serde_json::{Value, json}; +use std::fs; +use std::io::{Read, Write}; +use std::net::TcpListener; +use std::process::{Command, Output}; +use std::thread; + +const TEST_PROJECT_ID: &str = "projection-standalone-project"; +const TEST_FILE: &str = "src/lib.rs"; + +#[test] +fn graph_and_vector_lifecycle_commands_run_without_daemon() { + let Some(env) = StandaloneEnv::from_env() else { + eprintln!( + "skipping projection_standalone smoke; set GCODE_GRAPH_STANDALONE_DATABASE_URL, GCODE_GRAPH_STANDALONE_FALKOR_HOST, and 
GCODE_GRAPH_STANDALONE_FALKOR_PORT" + ); + return; + }; + + let (embedding_url, embedding_handle) = spawn_http_responses(vec![ + (200, json!({"data": [{"embedding": [0.1, 0.2, 0.3]}]})), + (200, json!({"data": [{"embedding": [0.4, 0.5, 0.6]}]})), + (200, json!({"data": [{"embedding": [0.7, 0.8, 0.9]}]})), + (200, json!({"data": [{"embedding": [0.2, 0.3, 0.4]}]})), + ]); + let (qdrant_url, qdrant_handle) = spawn_http_responses(vec![ + (404, json!({"status": "not found"})), + (200, json!({"result": true})), + (200, json!({"result": {"operation_id": 1}})), + (200, json!({"result": {"operation_id": 2}})), + ( + 200, + json!({"result": {"config": {"params": {"vectors": {"size": 3, "distance": "Cosine"}}}}}), + ), + (200, json!({"result": {"operation_id": 3}})), + ( + 200, + json!({"result": {"config": {"params": {"vectors": {"size": 3, "distance": "Cosine"}}}}}), + ), + (200, json!({"result": {"operation_id": 4}})), + (200, json!({"result": {"operation_id": 5}})), + ]); + + let project = tempfile::tempdir().expect("temp project"); + fs::create_dir_all(project.path().join(".gobby")).expect("create .gobby"); + fs::create_dir_all(project.path().join("src")).expect("create src"); + fs::write( + project.path().join("src/lib.rs"), + "pub fn caller() { callee(); }\npub fn callee() {}\n", + ) + .expect("write source"); + fs::write( + project.path().join(".gobby/gcode.json"), + serde_json::json!({ + "id": TEST_PROJECT_ID, + "name": "projection-standalone", + "created_at": "2026-05-28T00:00:00Z" + }) + .to_string(), + ) + .expect("write gcode identity"); + + let mut conn = Client::connect(&env.database_url, NoTls).expect("connect PostgreSQL"); + seed_project(&mut conn); + + let graph_sync = json_command( + &env, + project.path(), + &qdrant_url, + &embedding_url, + &["graph", "sync-file", "--file", TEST_FILE], + ); + assert_eq!(graph_sync["status"], "ok"); + assert_eq!(graph_sync["synced_files"], 1); + assert_eq!(graph_sync["synced_symbols"], 2); + + let vector_sync = 
json_command( + &env, + project.path(), + &qdrant_url, + &embedding_url, + &["vector", "sync-file", "--file", TEST_FILE], + ); + assert_eq!(vector_sync["status"], "ok"); + assert_eq!(vector_sync["synced_files"], 1); + assert_eq!(vector_sync["synced_symbols"], 2); + + let graph_clear = json_command( + &env, + project.path(), + &qdrant_url, + &embedding_url, + &["graph", "clear"], + ); + assert_eq!(graph_clear["status"], "ok"); + + let graph_rebuild = json_command( + &env, + project.path(), + &qdrant_url, + &embedding_url, + &["graph", "rebuild"], + ); + assert_eq!(graph_rebuild["status"], "ok"); + assert_eq!(graph_rebuild["synced_files"], 1); + assert_eq!(graph_rebuild["synced_symbols"], 2); + + let vector_clear = json_command( + &env, + project.path(), + &qdrant_url, + &embedding_url, + &["vector", "clear"], + ); + assert_eq!(vector_clear["status"], "ok"); + + let vector_rebuild = json_command( + &env, + project.path(), + &qdrant_url, + &embedding_url, + &["vector", "rebuild"], + ); + assert_eq!(vector_rebuild["status"], "ok"); + assert_eq!(vector_rebuild["synced_files"], 1); + assert_eq!(vector_rebuild["synced_symbols"], 2); + + let embedding_requests = embedding_handle.join().expect("embedding requests"); + let qdrant_requests = qdrant_handle.join().expect("qdrant requests"); + assert_eq!(embedding_requests.len(), 4); + assert!(qdrant_requests.iter().any(|request| { + request.contains("PUT /collections/code_symbols_projection-standalone-project HTTP/1.1") + })); + assert!( + qdrant_requests.iter().any(|request| request.contains( + "PUT /collections/code_symbols_projection-standalone-project/points HTTP/1.1" + )) + ); + + cleanup_project(&mut conn); +} + +struct StandaloneEnv { + database_url: String, + falkor_host: String, + falkor_port: String, + falkor_password: Option<String>, +} + +impl StandaloneEnv { + fn from_env() -> Option<Self> { + Some(Self { + database_url: std::env::var("GCODE_GRAPH_STANDALONE_DATABASE_URL").ok()?, + falkor_host:
std::env::var("GCODE_GRAPH_STANDALONE_FALKOR_HOST").ok()?, + falkor_port: std::env::var("GCODE_GRAPH_STANDALONE_FALKOR_PORT").ok()?, + falkor_password: std::env::var("GCODE_GRAPH_STANDALONE_FALKOR_PASSWORD").ok(), + }) + } +} + +fn run_gcode( + env: &StandaloneEnv, + cwd: &std::path::Path, + qdrant_url: &str, + embedding_url: &str, + args: &[&str], +) -> Output { + let mut command = Command::new(env!("CARGO_BIN_EXE_gcode")); + command + .current_dir(cwd) + .env("GCODE_DATABASE_URL", &env.database_url) + .env("GOBBY_FALKORDB_HOST", &env.falkor_host) + .env("GOBBY_FALKORDB_PORT", &env.falkor_port) + .env("GOBBY_QDRANT_URL", qdrant_url) + .env("GOBBY_EMBEDDING_URL", format!("{embedding_url}/v1")) + .env("GOBBY_EMBEDDING_MODEL", "embed-small") + .env("GOBBY_EMBEDDING_VECTOR_DIM", "3") + .env("GOBBY_HOME", cwd.join(".no-daemon-home")) + .arg("--no-freshness") + .arg("--format") + .arg("json") + .args(args); + if let Some(password) = &env.falkor_password { + command.env("GOBBY_FALKORDB_PASSWORD", password); + } + command.output().expect("run gcode") +} + +fn json_command( + env: &StandaloneEnv, + cwd: &std::path::Path, + qdrant_url: &str, + embedding_url: &str, + args: &[&str], +) -> Value { + let output = run_gcode(env, cwd, qdrant_url, embedding_url, args); + assert_success(output, &args.join(" ")) +} + +fn assert_success(output: Output, label: &str) -> Value { + assert!( + output.status.success(), + "{label} failed\nstdout:\n{}\nstderr:\n{}", + String::from_utf8_lossy(&output.stdout), + String::from_utf8_lossy(&output.stderr) + ); + serde_json::from_slice(&output.stdout).unwrap_or_else(|err| { + panic!( + "{label} did not emit JSON: {err}\nstdout:\n{}", + String::from_utf8_lossy(&output.stdout) + ) + }) +} + +fn seed_project(conn: &mut Client) { + cleanup_project(conn); + conn.batch_execute( + "INSERT INTO code_indexed_projects + (id, root_path, total_files, total_symbols, last_indexed_at, index_duration_ms) + VALUES + ('projection-standalone-project', 
'/tmp/projection-standalone', 1, 2, NOW(), 0); + + INSERT INTO code_indexed_files + (id, project_id, file_path, language, content_hash, symbol_count, byte_size, + graph_synced, vectors_synced, graph_sync_attempted_at, indexed_at) + VALUES + ('projection-standalone-file', 'projection-standalone-project', 'src/lib.rs', 'rust', + 'hash-1', 2, 54, false, false, NULL, NOW()); + + INSERT INTO code_symbols + (id, project_id, file_path, name, qualified_name, kind, language, byte_start, byte_end, + line_start, line_end, signature, docstring, parent_symbol_id, content_hash, + summary, created_at, updated_at) + VALUES + ('projection-standalone-caller', 'projection-standalone-project', 'src/lib.rs', 'caller', + 'crate::caller', 'function', 'rust', 0, 28, 1, 1, 'pub fn caller()', NULL, NULL, + 'hash-1', NULL, NOW(), NOW()), + ('projection-standalone-callee', 'projection-standalone-project', 'src/lib.rs', 'callee', + 'crate::callee', 'function', 'rust', 29, 47, 2, 2, 'pub fn callee()', NULL, NULL, + 'hash-1', NULL, NOW(), NOW()); + + INSERT INTO code_imports (project_id, source_file, target_module) + VALUES ('projection-standalone-project', 'src/lib.rs', 'std'); + + INSERT INTO code_calls + (project_id, caller_symbol_id, callee_symbol_id, callee_name, callee_target_kind, + callee_external_module, file_path, line) + VALUES + ('projection-standalone-project', 'projection-standalone-caller', 'projection-standalone-callee', + 'callee', 'symbol', '', 'src/lib.rs', 1);", + ) + .expect("seed projection rows"); +} + +fn cleanup_project(conn: &mut Client) { + conn.batch_execute( + "DELETE FROM code_calls WHERE project_id = 'projection-standalone-project'; + DELETE FROM code_imports WHERE project_id = 'projection-standalone-project'; + DELETE FROM code_symbols WHERE project_id = 'projection-standalone-project'; + DELETE FROM code_content_chunks WHERE project_id = 'projection-standalone-project'; + DELETE FROM code_indexed_files WHERE project_id = 'projection-standalone-project'; + DELETE 
FROM code_indexed_projects WHERE id = 'projection-standalone-project';", + ) + .expect("cleanup projection rows"); +} + +fn spawn_http_responses(responses: Vec<(u16, Value)>) -> (String, thread::JoinHandle<Vec<String>>) { + let listener = TcpListener::bind("127.0.0.1:0").expect("bind test server"); + let addr = listener.local_addr().expect("local addr"); + let handle = thread::spawn(move || { + let mut requests = Vec::new(); + for (status, body) in responses { + let (mut stream, _) = listener.accept().expect("accept request"); + requests.push(read_http_request(&mut stream)); + + let body = body.to_string(); + write!( + stream, + "HTTP/1.1 {status} OK\r\nContent-Type: application/json\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{body}", + body.len() + ) + .expect("write response"); + } + requests + }); + + (format!("http://{addr}"), handle) +} + +fn read_http_request(stream: &mut impl Read) -> String { + let mut request = Vec::new(); + let mut buffer = [0; 4096]; + let mut expected_len = None; + + loop { + let n = stream.read(&mut buffer).expect("read request"); + if n == 0 { + break; + } + request.extend_from_slice(&buffer[..n]); + + if expected_len.is_none() + && let Some(header_end) = request.windows(4).position(|window| window == b"\r\n\r\n") + { + let headers = String::from_utf8_lossy(&request[..header_end]); + let content_len = headers + .lines() + .find_map(|line| { + line.to_ascii_lowercase() + .strip_prefix("content-length: ") + .and_then(|value| value.parse::<usize>().ok()) + }) + .unwrap_or(0); + expected_len = Some(header_end + 4 + content_len); + } + + if let Some(expected_len) = expected_len + && request.len() >= expected_len + { + break; + } + } + + String::from_utf8_lossy(&request).into_owned() +} diff --git a/crates/gcode/tests/vector_projection.rs b/crates/gcode/tests/vector_projection.rs new file mode 100644 index 0000000..7c03952 --- /dev/null +++ b/crates/gcode/tests/vector_projection.rs @@ -0,0 +1,282 @@ +use gobby_code::config::{CodeVectorSettings,
EmbeddingConfig, QdrantConfig}; +use gobby_code::models::Symbol; +use gobby_code::vector::code_symbols::{ + CodeSymbolVectorLifecycle, VECTOR_DISTANCE_COSINE, VectorLifecycleError, +}; +use serde_json::{Value, json}; +use std::io::{Read, Write}; +use std::net::TcpListener; +use std::thread; + +fn symbol(id: &str, file_path: &str, summary: Option<&str>) -> Symbol { + Symbol { + id: id.to_string(), + project_id: "project-1".to_string(), + file_path: file_path.to_string(), + name: "run".to_string(), + qualified_name: "crate::run".to_string(), + kind: "function".to_string(), + language: "rust".to_string(), + byte_start: 10, + byte_end: 40, + line_start: 3, + line_end: 5, + signature: Some("fn run()".to_string()), + docstring: None, + parent_symbol_id: None, + content_hash: "hash".to_string(), + summary: summary.map(str::to_string), + created_at: String::new(), + updated_at: String::new(), + } +} + +#[test] +fn ensure_creates_missing_and_reuses_compatible() { + let (embedding_url, embedding_handle) = spawn_http_responses(vec![ + (200, json!({"data": [{"embedding": [0.1, 0.2, 0.3]}]})), + (200, json!({"data": [{"embedding": [0.4, 0.5, 0.6]}]})), + ]); + let (qdrant_url, qdrant_handle) = spawn_http_responses(vec![ + (404, json!({"status": "not found"})), + (200, json!({"result": true})), + (200, json!({"result": {"operation_id": 1}})), + (200, json!({"result": {"operation_id": 2}})), + ( + 200, + json!({"result": {"config": {"params": {"vectors": {"size": 3, "distance": VECTOR_DISTANCE_COSINE}}}}}), + ), + (200, json!({"result": {"operation_id": 3}})), + (200, json!({"result": {"operation_id": 4}})), + ]); + let mut lifecycle = CodeSymbolVectorLifecycle::new( + "project-1".to_string(), + QdrantConfig { + url: Some(qdrant_url), + api_key: Some("qdrant-key".to_string()), + }, + EmbeddingConfig { + api_base: format!("{embedding_url}/v1"), + model: "embed-small".to_string(), + api_key: None, + }, + CodeVectorSettings { + vector_dim: Some(3), + }, + ) + .expect("lifecycle"); + 
+ let first = lifecycle + .sync_file_symbols("src/lib.rs", &[symbol("sym-1", "src/lib.rs", None)]) + .expect("first sync"); + let second = lifecycle + .sync_file_symbols( + "src/lib.rs", + &[symbol("sym-1", "src/lib.rs", Some("summary"))], + ) + .expect("second sync"); + let embedding_requests = embedding_handle.join().expect("embedding requests"); + let qdrant_requests = qdrant_handle.join().expect("qdrant requests"); + + assert_eq!(first.vectors_upserted, 1); + assert_eq!(second.vectors_upserted, 1); + assert_eq!(embedding_requests.len(), 2); + assert!(qdrant_requests[0].contains("GET /collections/code_symbols_project-1 HTTP/1.1")); + assert!(qdrant_requests[1].contains("PUT /collections/code_symbols_project-1 HTTP/1.1")); + assert!(qdrant_requests[1].contains(r#""size":3"#)); + assert!(qdrant_requests[1].contains(r#""distance":"Cosine""#)); + assert!( + qdrant_requests[2] + .contains("POST /collections/code_symbols_project-1/points/delete HTTP/1.1") + ); + assert!(qdrant_requests[2].contains(r#""key":"project_id""#)); + assert!(qdrant_requests[2].contains(r#""value":"project-1""#)); + assert!(qdrant_requests[2].contains(r#""key":"file_path""#)); + assert!(qdrant_requests[2].contains(r#""value":"src/lib.rs""#)); + assert!(qdrant_requests[3].contains("PUT /collections/code_symbols_project-1/points HTTP/1.1")); + assert!(qdrant_requests[3].contains(r#""provenance":"EXTRACTED""#)); + assert!(qdrant_requests[3].contains(r#""source_system":"gcode""#)); + assert!(qdrant_requests[3].contains(r#""source_line_start":3"#)); + assert!(qdrant_requests[3].contains(r#""source_byte_end":40"#)); + assert!(qdrant_requests[4].contains("GET /collections/code_symbols_project-1 HTTP/1.1")); + assert!(!qdrant_requests[4].contains("DELETE")); +} + +#[test] +fn clear_and_rebuild_delete_project_and_upsert_current_symbols() { + let (embedding_url, embedding_handle) = spawn_http_responses(vec![( + 200, + json!({"data": [{"embedding": [0.7, 0.8, 0.9]}]}), + )]); + let (qdrant_url, 
qdrant_handle) = spawn_http_responses(vec![ + ( + 200, + json!({"result": {"config": {"params": {"vectors": {"size": 3, "distance": VECTOR_DISTANCE_COSINE}}}}}), + ), + (200, json!({"result": {"operation_id": 1}})), + ( + 200, + json!({"result": {"config": {"params": {"vectors": {"size": 3, "distance": VECTOR_DISTANCE_COSINE}}}}}), + ), + (200, json!({"result": {"operation_id": 2}})), + (200, json!({"result": {"operation_id": 3}})), + ]); + let mut lifecycle = CodeSymbolVectorLifecycle::new( + "project-1".to_string(), + QdrantConfig { + url: Some(qdrant_url), + api_key: None, + }, + EmbeddingConfig { + api_base: format!("{embedding_url}/v1"), + model: "embed-small".to_string(), + api_key: None, + }, + CodeVectorSettings { + vector_dim: Some(3), + }, + ) + .expect("lifecycle"); + + let cleared = lifecycle.clear_project_vectors().expect("clear"); + let rebuilt = lifecycle + .rebuild_symbols(&[symbol("sym-1", "src/lib.rs", None)]) + .expect("rebuild"); + let embedding_requests = embedding_handle.join().expect("embedding requests"); + let qdrant_requests = qdrant_handle.join().expect("qdrant requests"); + + assert_eq!(cleared.vectors_deleted, 1); + assert_eq!(rebuilt.vectors_upserted, 1); + assert_eq!(embedding_requests.len(), 1); + assert!( + qdrant_requests[1] + .contains("POST /collections/code_symbols_project-1/points/delete HTTP/1.1") + ); + assert!(qdrant_requests[1].contains(r#""key":"project_id""#)); + assert!(!qdrant_requests[1].contains(r#""key":"file_path""#)); + assert!( + qdrant_requests[3] + .contains("POST /collections/code_symbols_project-1/points/delete HTTP/1.1") + ); + assert!(qdrant_requests[4].contains("PUT /collections/code_symbols_project-1/points HTTP/1.1")); +} + +#[test] +fn incompatible_existing_collection_errors_without_migration() { + let (qdrant_url, qdrant_handle) = spawn_http_responses(vec![ + ( + 200, + json!({"result": {"config": {"params": {"vectors": {"size": 4, "distance": "Dot"}}}}}), + ), + ( + 200, + json!({"result": {"config": 
{"params": {"vectors": {"size": 4, "distance": "Dot"}}}}}), + ), + ]); + let mut lifecycle = CodeSymbolVectorLifecycle::new( + "project-1".to_string(), + QdrantConfig { + url: Some(qdrant_url), + api_key: None, + }, + EmbeddingConfig { + api_base: "http://127.0.0.1:9/v1".to_string(), + model: "unused".to_string(), + api_key: None, + }, + CodeVectorSettings { + vector_dim: Some(3), + }, + ) + .expect("lifecycle"); + + let err = lifecycle + .ensure_collection() + .expect_err("incompatible ensure must fail"); + assert!(matches!( + err, + VectorLifecycleError::DimensionMismatch { + expected_size: 3, + found_size: Some(4), + expected_distance: VECTOR_DISTANCE_COSINE, + found_distance: Some(ref distance), + .. + } if distance == "Dot" + )); + + let err = lifecycle + .clear_project_vectors() + .expect_err("incompatible clear must fail before delete"); + assert!(matches!( + err, + VectorLifecycleError::DimensionMismatch { .. } + )); + let qdrant_requests = qdrant_handle.join().expect("qdrant requests"); + + assert_eq!(qdrant_requests.len(), 2); + assert!(qdrant_requests.iter().all(|request| { + request.contains("GET /collections/code_symbols_project-1 HTTP/1.1") + && !request.contains("/points/delete") + && !request.contains("/points HTTP/1.1") + })); +} + +fn spawn_http_responses(responses: Vec<(u16, Value)>) -> (String, thread::JoinHandle<Vec<String>>) { + let listener = TcpListener::bind("127.0.0.1:0").expect("bind test server"); + let addr = listener.local_addr().expect("local addr"); + let handle = thread::spawn(move || { + let mut requests = Vec::new(); + for (status, body) in responses { + let (mut stream, _) = listener.accept().expect("accept request"); + requests.push(read_http_request(&mut stream)); + + let body = body.to_string(); + write!( + stream, + "HTTP/1.1 {status} OK\r\nContent-Type: application/json\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{body}", + body.len() + ) + .expect("write response"); + } + requests + }); + + (format!("http://{addr}"), handle) +} +
+fn read_http_request(stream: &mut impl Read) -> String { + let mut request = Vec::new(); + let mut buffer = [0; 4096]; + let mut expected_len = None; + + loop { + let n = stream.read(&mut buffer).expect("read request"); + if n == 0 { + break; + } + request.extend_from_slice(&buffer[..n]); + + if expected_len.is_none() + && let Some(header_end) = request.windows(4).position(|window| window == b"\r\n\r\n") + { + let headers = String::from_utf8_lossy(&request[..header_end]); + let content_len = headers + .lines() + .find_map(|line| { + line.to_ascii_lowercase() + .strip_prefix("content-length: ") + .and_then(|value| value.parse::<usize>().ok()) + }) + .unwrap_or(0); + expected_len = Some(header_end + 4 + content_len); + } + + if let Some(expected_len) = expected_len + && request.len() >= expected_len + { + break; + } + } + + String::from_utf8_lossy(&request).into_owned() +} diff --git a/crates/gcore/Cargo.toml b/crates/gcore/Cargo.toml index 962c96f..f0328de 100644 --- a/crates/gcore/Cargo.toml +++ b/crates/gcore/Cargo.toml @@ -1,21 +1,40 @@ [package] name = "gobby-core" -version = "0.1.0" +version = "0.2.0" edition = "2024" rust-version = "1.88" authors = ["Josh Wilhelmi "] -description = "Shared primitives for Gobby CLI tools — project root resolution, bootstrap config, daemon URL helpers" +description = "Shared foundation primitives for Gobby CLI tools" license = "Apache-2.0" repository = "https://github.com/GobbyAI/gobby-cli" homepage = "https://gobby.ai" keywords = ["gobby", "cli"] categories = ["development-tools"] +[features] +default = [] +postgres = ["dep:postgres", "dep:postgres-types"] +falkor = ["dep:falkordb", "dep:urlencoding"] +qdrant = ["dep:reqwest"] +indexing = ["dep:ignore", "dep:sha2"] +search = [] +full = ["postgres", "falkor", "qdrant", "indexing", "search"] + [dependencies] anyhow = "1" dirs = "6" +serde = { version = "1", features = ["derive"] } serde_json = "1" serde_yaml = "0.9" +thiserror = "2" + +postgres = { version = "0.19", optional = true } 
+postgres-types = { version = "0.2", optional = true } +falkordb = { version = "0.2", optional = true } +reqwest = { version = "0.12", default-features = false, features = ["blocking", "json", "rustls-tls"], optional = true } +ignore = { version = "0.4", optional = true } +sha2 = { version = "0.10", optional = true } +urlencoding = { version = "2", optional = true } [dev-dependencies] tempfile = "3" diff --git a/crates/gcore/assets/docker-compose.services.yml b/crates/gcore/assets/docker-compose.services.yml new file mode 100644 index 0000000..06bc9c5 --- /dev/null +++ b/crates/gcore/assets/docker-compose.services.yml @@ -0,0 +1,127 @@ +# Gobby service dependencies +# Installed via: gobby install +# Managed by daemon start/stop via Docker Compose profiles. + +services: + falkordb: + image: falkordb/falkordb:latest + ports: + - "${GOBBY_FALKORDB_PORT:-16379}:6379" + - "${GOBBY_FALKORDB_BROWSER_PORT:-13000}:3000" + environment: + # Redis AUTH - REDIS_ARGS is the documented entry point for redis-server flags. + # FALKORDB_ARGS is reserved for module options; do not put auth there. + - REDIS_ARGS=--requirepass ${GOBBY_FALKORDB_PASSWORD:-gobbyfalkor} + # Pass-through so the healthcheck below can read the same value. + - GOBBY_FALKORDB_PASSWORD=${GOBBY_FALKORDB_PASSWORD:-gobbyfalkor} + volumes: + - gobby_falkordb_data:/data + healthcheck: + # Quote the shell-expanded password so future relaxed password rules do not word-split. 
+ test: + - CMD-SHELL + - 'redis-cli -a "$$GOBBY_FALKORDB_PASSWORD" PING | grep -q PONG' + interval: 10s + timeout: 5s + retries: 5 + restart: unless-stopped + profiles: [falkordb, all] + + qdrant: + image: qdrant/qdrant:latest + ports: + - "${GOBBY_QDRANT_HTTP_PORT:-6333}:6333" + - "${GOBBY_QDRANT_GRPC_PORT:-6334}:6334" + environment: + # Auth disabled for local-only access (no TLS = no point in API key) + # To enable: set QDRANT__SERVICE__API_KEY and configure TLS + - QDRANT__LOG_LEVEL=${GOBBY_QDRANT_LOG_LEVEL:-WARN} + volumes: + - gobby_qdrant_data:/qdrant/storage + healthcheck: + test: ["CMD-SHELL", "bash -c 'exec 3<>/dev/tcp/localhost/6333 && printf \"GET /healthz HTTP/1.0\\r\\nHost: localhost\\r\\n\\r\\n\" >&3 && grep -q \"healthz check passed\" <&3'"] + interval: 10s + timeout: 5s + retries: 5 + restart: unless-stopped + profiles: + - qdrant + - all + + postgres: + build: + context: ./postgres-pgsearch + args: + PG_SEARCH_VERSION: ${GOBBY_PG_SEARCH_VERSION:-0.23.4} + PG_SEARCH_SHA256: ${GOBBY_PG_SEARCH_SHA256} + image: gobby-postgres-local:18-pgsearch + container_name: gobby-postgres + command: + - postgres + - -c + - shared_preload_libraries=pg_search,pgaudit + - -c + - pgaudit.log=${GOBBY_PGAUDIT_LOG:-ddl} + - -c + - pgaudit.log_catalog=off + - -c + - logging_collector=on + - -c + - log_destination=stderr + - -c + - log_directory=/var/log/pgaudit + - -c + - log_filename=pgaudit-%Y-%m-%d_%H%M%S.log + - -c + - log_rotation_age=1d + - -c + - log_rotation_size=0 + - -c + - log_file_mode=0640 + - -c + - log_min_messages=log + environment: + POSTGRES_DB: ${GOBBY_POSTGRES_DB:-gobby} + POSTGRES_USER: ${GOBBY_POSTGRES_USER:-gobby} + POSTGRES_PASSWORD: ${GOBBY_POSTGRES_PASSWORD:-gobby_dev} + GOBBY_PGAUDIT_LOG: ${GOBBY_PGAUDIT_LOG:-ddl} + ports: + - "${GOBBY_POSTGRES_PORT:-60891}:5432" + volumes: + - gobby_postgres_data:/var/lib/postgresql + - gobby_pgaudit_log:/var/log/pgaudit + healthcheck: + test: + - CMD-SHELL + - >- + set -eu; + pg_isready -U 
${GOBBY_POSTGRES_USER:-gobby}; + test "$(psql -U ${GOBBY_POSTGRES_USER:-gobby} -d ${GOBBY_POSTGRES_DB:-gobby} -tAc "SELECT 1 FROM pg_extension WHERE extname='pgaudit'")" = "1"; + expected_audit_log="$${GOBBY_PGAUDIT_LOG:-ddl}"; + test "$(psql -U ${GOBBY_POSTGRES_USER:-gobby} -d ${GOBBY_POSTGRES_DB:-gobby} -tAc 'SHOW pgaudit.log')" = "$$expected_audit_log"; + test -d /var/log/pgaudit; + audit_file="$$(find /var/log/pgaudit -name 'pgaudit-*.log' -size +0c -type f | sort | tail -n1)"; + test -n "$$audit_file"; + test "$$(stat -c '%U %a' "$$audit_file")" = "postgres 640"; + if [ "$$expected_audit_log" = "write" ]; then + psql -U ${GOBBY_POSTGRES_USER:-gobby} -d ${GOBBY_POSTGRES_DB:-gobby} -c 'UPDATE _pgaudit_probe SET last_probed_at = NOW() WHERE id = 1 RETURNING last_probed_at;'; + audit_file="$$(find /var/log/pgaudit -name 'pgaudit-*.log' -size +0c -type f | sort | tail -n1)"; + tail -n 20 "$$audit_file" | grep -E 'LOG: AUDIT: SESSION,.*UPDATE'; + fi + interval: 5s + timeout: 3s + retries: 10 + restart: unless-stopped + profiles: + - postgres + - all + +volumes: + gobby_falkordb_data: + name: gobby_falkordb_data + gobby_qdrant_data: + name: gobby_qdrant_data + gobby_postgres_data: + name: gobby_postgres_data + gobby_pgaudit_log: + name: gobby_pgaudit_log diff --git a/crates/gcore/assets/postgres-pgsearch/Dockerfile b/crates/gcore/assets/postgres-pgsearch/Dockerfile new file mode 100644 index 0000000..c56d251 --- /dev/null +++ b/crates/gcore/assets/postgres-pgsearch/Dockerfile @@ -0,0 +1,35 @@ +FROM postgres:18-trixie@sha256:41da01536bc3ae26308cefb0c57235e7488001360bdb15191eb0b7955b570299 + +ARG PG_SEARCH_VERSION +ARG PG_SEARCH_SHA256 + +SHELL ["/bin/bash", "-o", "pipefail", "-c"] + +RUN test -n "$PG_SEARCH_VERSION" \ + && test -n "$PG_SEARCH_SHA256" \ + && apt-get update \ + && apt-get install -y --no-install-recommends \ + ca-certificates=20250419 \ + curl=8.14.1-2+deb13u3 \ + postgresql-18-pgaudit=18.0-2.pgdg13+1 \ + && arch="$(dpkg --print-architecture)" \ + && 
curl -fsSL \ + "https://github.com/paradedb/paradedb/releases/download/v${PG_SEARCH_VERSION}/postgresql-18-pg-search_${PG_SEARCH_VERSION}-1PARADEDB-trixie_${arch}.deb" \ + -o /tmp/pg_search.deb \ + && echo "${PG_SEARCH_SHA256} /tmp/pg_search.deb" | sha256sum -c - \ + && apt-get install -y --no-install-recommends /tmp/pg_search.deb \ + && rm /tmp/pg_search.deb \ + && rm -rf /var/lib/apt/lists/* + +RUN { \ + echo "shared_preload_libraries = 'pg_search,pgaudit'"; \ + echo "pgaudit.log = 'write'"; \ + } >> /usr/share/postgresql/postgresql.conf.sample + +RUN mkdir -p /var/log/pgaudit \ + && chown postgres:postgres /var/log/pgaudit \ + && chmod 0750 /var/log/pgaudit + +COPY initdb.d/ /docker-entrypoint-initdb.d/ +COPY scripts/pg_audit_export.sh /usr/local/bin/pg_audit_export.sh +RUN chmod 0755 /usr/local/bin/pg_audit_export.sh diff --git a/crates/gcore/assets/postgres-pgsearch/initdb.d/01-pg_search.sql b/crates/gcore/assets/postgres-pgsearch/initdb.d/01-pg_search.sql new file mode 100644 index 0000000..d7d036c --- /dev/null +++ b/crates/gcore/assets/postgres-pgsearch/initdb.d/01-pg_search.sql @@ -0,0 +1 @@ +CREATE EXTENSION IF NOT EXISTS pg_search; diff --git a/crates/gcore/assets/postgres-pgsearch/initdb.d/02-pgaudit.sql b/crates/gcore/assets/postgres-pgsearch/initdb.d/02-pgaudit.sql new file mode 100644 index 0000000..3e1a494 --- /dev/null +++ b/crates/gcore/assets/postgres-pgsearch/initdb.d/02-pgaudit.sql @@ -0,0 +1,11 @@ +CREATE EXTENSION IF NOT EXISTS pgaudit; + +-- Audit-only probe row. Created here so the validation-window healthcheck has +-- a stable target before the application schema exists. 
+CREATE TABLE IF NOT EXISTS _pgaudit_probe ( + id INTEGER PRIMARY KEY, + last_probed_at TIMESTAMPTZ NOT NULL DEFAULT NOW() +); + +INSERT INTO _pgaudit_probe (id) VALUES (1) +ON CONFLICT (id) DO NOTHING; diff --git a/crates/gcore/assets/postgres-pgsearch/scripts/pg_audit_export.sh b/crates/gcore/assets/postgres-pgsearch/scripts/pg_audit_export.sh new file mode 100644 index 0000000..f82b180 --- /dev/null +++ b/crates/gcore/assets/postgres-pgsearch/scripts/pg_audit_export.sh @@ -0,0 +1,152 @@ +#!/usr/bin/env bash +set -euo pipefail + +readonly DEFAULT_LOG_DIR="/var/log/pgaudit" + +log_dir="$DEFAULT_LOG_DIR" +start="" +end="" + +usage() { + cat <<'EOF' +Usage: pg_audit_export.sh --start --end [--log-dir ] + +Emit pgAudit AUDIT lines whose PostgreSQL log timestamp falls within the +inclusive validation window. +EOF +} + +die_usage() { + echo "$1" >&2 + usage >&2 + exit 2 +} + +require_value() { + local flag="$1" + local value="${2:-}" + local description="$3" + + if [[ -z "$value" || "$value" == --* ]]; then + echo "$flag requires $description." >&2 + exit 2 + fi + + printf '%s\n' "$value" +} + +parse_epoch() { + local flag="$1" + local timestamp="$2" + local epoch + + if ! 
epoch="$(timestamp_epoch "$timestamp")"; then + echo "Invalid $flag timestamp: $timestamp" >&2 + exit 2 + fi + + printf '%s\n' "$epoch" +} + +timestamp_epoch() { + local timestamp="$1" + local portable_timestamp + + if date -u -d "$timestamp" +%s 2>/dev/null; then + return + fi + + portable_timestamp="$timestamp" + if [[ "$portable_timestamp" =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}T ]]; then + portable_timestamp="${portable_timestamp/T/ }" + fi + if [[ "$portable_timestamp" == *Z ]]; then + portable_timestamp="${portable_timestamp%Z} UTC" + fi + if [[ "$portable_timestamp" =~ ^([0-9]{4}-[0-9]{2}-[0-9]{2})[[:space:]]+([0-9]{2}:[0-9]{2}:[0-9]{2})(\.[0-9]+)?[[:space:]]+([^[:space:]]+)$ ]]; then + date -u -j -f "%Y-%m-%d %H:%M:%S %Z" \ + "${BASH_REMATCH[1]} ${BASH_REMATCH[2]} ${BASH_REMATCH[4]}" +%s 2>/dev/null + return + fi + + return 1 +} + +audit_line_epoch() { + local line="$1" + + if [[ "$line" =~ ^([0-9]{4}-[0-9]{2}-[0-9]{2})[[:space:]]+([0-9]{2}:[0-9]{2}:[0-9]{2}(\.[0-9]+)?)[[:space:]]+([^[:space:]]+) ]]; then + timestamp_epoch "${BASH_REMATCH[1]} ${BASH_REMATCH[2]} ${BASH_REMATCH[4]}" + return + fi + + return 1 +} + +emit_windowed_audit_lines() { + local start_epoch="$1" + local end_epoch="$2" + local -a log_files=("${@:3}") + local line + local line_epoch + local log_file + + LC_ALL=C sort -z < <(printf '%s\0' "${log_files[@]}") | while IFS= read -r -d '' log_file; do + while IFS= read -r line; do + [[ "$line" == *"AUDIT:"* ]] || continue + if line_epoch="$(audit_line_epoch "$line")" \ + && ((line_epoch >= start_epoch && line_epoch <= end_epoch)); then + printf '%s\n' "$line" + fi + done < "$log_file" + done +} + +while (($#)); do + case "$1" in + --start) + start="$(require_value "--start" "${2:-}" "an ISO 8601 timestamp")" + shift 2 + ;; + --end) + end="$(require_value "--end" "${2:-}" "an ISO 8601 timestamp")" + shift 2 + ;; + --log-dir) + log_dir="$(require_value "--log-dir" "${2:-}" "a path")" + shift 2 + ;; + --help|-h) + usage + exit 0 + ;; + *) + die_usage 
"Unknown argument: $1" + ;; + esac +done + +if [[ -z "$start" || -z "$end" ]]; then + die_usage "Both --start and --end are required." +fi + +if [[ ! -d "$log_dir" ]]; then + echo "pgAudit log directory not found: $log_dir" >&2 + exit 1 +fi + +start_epoch="$(parse_epoch "--start" "$start")" +end_epoch="$(parse_epoch "--end" "$end")" + +if ((start_epoch > end_epoch)); then + echo "--start must be earlier than or equal to --end." >&2 + exit 2 +fi + +shopt -s nullglob +log_files=("$log_dir"/pgaudit-*.log) +if ((${#log_files[@]} == 0)); then + exit 0 +fi + +emit_windowed_audit_lines "$start_epoch" "$end_epoch" "${log_files[@]}" diff --git a/crates/gcore/assets/postgres-pgsearch/version.json b/crates/gcore/assets/postgres-pgsearch/version.json new file mode 100644 index 0000000..8e543aa --- /dev/null +++ b/crates/gcore/assets/postgres-pgsearch/version.json @@ -0,0 +1,9 @@ +{ + "pg_search_version": "0.23.4", + "pg_search_sha256": "6b042d61d156ca5fdcb1c417e291d90bffe3026848890be30bf6e578146b4676", + "pg_search_sha256_by_arch": { + "amd64": "6b042d61d156ca5fdcb1c417e291d90bffe3026848890be30bf6e578146b4676", + "arm64": "5ad13a80b76c46590914e0c366bd8deaf807d5b352f5ad489876ec836d06d3d1" + }, + "postgres_major": "18" +} diff --git a/crates/gcore/src/config.rs b/crates/gcore/src/config.rs new file mode 100644 index 0000000..adbca4b --- /dev/null +++ b/crates/gcore/src/config.rs @@ -0,0 +1,558 @@ +//! Shared configuration-resolution boundary. +//! +//! This module is the public home for lightweight configuration contracts that +//! are shared across Gobby Rust crates. Concrete service resolution is added in +//! focused follow-up modules so this baseline crate remains small. + +/// FalkorDB connection configuration. +/// +/// Graph name selection is consumer-owned. +#[derive(Debug, Clone, PartialEq, Eq)] +pub struct FalkorConfig { + pub host: String, + pub port: u16, + pub password: Option, +} + +/// Qdrant connection configuration. +/// +/// Collection naming is consumer-owned. 
+#[derive(Debug, Clone, PartialEq, Eq)] +pub struct QdrantConfig { + pub url: Option<String>, + pub api_key: Option<String>, +} + +/// Embedding API configuration for an OpenAI-compatible endpoint. +#[derive(Debug, Clone, PartialEq, Eq)] +pub struct EmbeddingConfig { + pub api_base: String, + pub model: String, + pub api_key: Option<String>, +} + +const FALKORDB_DEFAULT_PORT: u16 = 16379; +const EMBEDDING_DEFAULT_MODEL: &str = "nomic-embed-text"; + +#[cfg(test)] +pub(crate) static TEST_ENV_LOCK: std::sync::Mutex<()> = std::sync::Mutex::new(()); + +/// Decode a config_store value from its stored representation. +pub fn decode_config_value(raw: &str) -> Option<String> { + match serde_json::from_str::<serde_json::Value>(raw) { + Ok(serde_json::Value::String(value)) => Some(value), + Ok(value @ (serde_json::Value::Array(_) | serde_json::Value::Object(_))) => { + Some(serde_json::to_string(&value).unwrap_or_else(|_| raw.to_string())) + } + Ok(serde_json::Value::Null) => None, + Ok(value) => Some(value.to_string()), + Err(_) => Some(raw.to_string()), + } +} + +/// Resolve `${VAR}` and `${VAR:-default}` environment variable patterns. 
+pub fn resolve_env_pattern(value: &str) -> anyhow::Result<Option<String>> { + if !value.contains("${") { + return Ok(Some(value.to_string())); + } + + let mut output = String::with_capacity(value.len()); + let mut rest = value; + let mut unresolved = false; + + while let Some(start) = rest.find("${") { + output.push_str(&rest[..start]); + let pattern = &rest[start + 2..]; + let Some(end) = pattern.find('}') else { + anyhow::bail!("unterminated environment pattern in `{value}`"); + }; + + let expression = &pattern[..end]; + if expression.is_empty() { + anyhow::bail!("empty environment pattern in `{value}`"); + } + + let (name, default) = match expression.split_once(":-") { + Some((name, default)) => (name, Some(default)), + None => (expression, None), + }; + if name.is_empty() { + anyhow::bail!("empty environment variable name in `{value}`"); + } + + match std::env::var(name) { + Ok(current) if !(current.is_empty() && default.is_some()) => { + output.push_str(&current); + } + Ok(_) | Err(std::env::VarError::NotPresent) => match default { + Some(default) => output.push_str(default), + None => unresolved = true, + }, + Err(std::env::VarError::NotUnicode(_)) => { + anyhow::bail!("environment variable `{name}` is not valid unicode"); + } + } + + rest = &pattern[end + 1..]; + } + + output.push_str(rest); + if unresolved { + Ok(None) + } else { + Ok(Some(output)) + } +} + +/// Source for config values and interpolation. +pub trait ConfigSource { + /// Read a decoded config value by key. + fn config_value(&mut self, key: &str) -> Option<String>; + + /// Resolve interpolation patterns in a config value. + fn resolve_value(&mut self, value: &str) -> anyhow::Result<String>; +} + +/// Environment-only source for consumers without database access. 
+pub struct EnvOnlySource; + +impl ConfigSource for EnvOnlySource { + fn config_value(&mut self, _key: &str) -> Option<String> { + None + } + + fn resolve_value(&mut self, value: &str) -> anyhow::Result<String> { + if value.contains("$secret:") { + anyhow::bail!("secret resolution requires a datastore-backed config source"); + } + resolve_env_pattern(value)?.ok_or_else(|| anyhow::anyhow!("unresolved pattern: {value}")) + } +} + +/// Resolve FalkorDB config from env, config_store, then defaults. +pub fn resolve_falkordb_config(source: &mut impl ConfigSource) -> Option<FalkorConfig> { + let host = resolve_setting(source, "GOBBY_FALKORDB_HOST", "databases.falkordb.host")?; + let port = resolve_port( + source, + "GOBBY_FALKORDB_PORT", + "databases.falkordb.port", + FALKORDB_DEFAULT_PORT, + ); + let password = resolve_setting( + source, + "GOBBY_FALKORDB_PASSWORD", + "databases.falkordb.requirepass", + ); + + Some(FalkorConfig { + host, + port, + password, + }) +} + +/// Resolve Qdrant config from env and config_store. +pub fn resolve_qdrant_config(source: &mut impl ConfigSource) -> Option<QdrantConfig> { + let url = resolve_setting(source, "GOBBY_QDRANT_URL", "databases.qdrant.url"); + url.as_ref()?; + let api_key = resolve_setting(source, "GOBBY_QDRANT_API_KEY", "databases.qdrant.api_key"); + + Some(QdrantConfig { url, api_key }) +} + +/// Resolve embedding API config from env, config_store, then defaults. 
+pub fn resolve_embedding_config(source: &mut impl ConfigSource) -> Option<EmbeddingConfig> { + let api_base = resolve_setting(source, "GOBBY_EMBEDDING_URL", "embeddings.api_base")?; + let model = resolve_setting(source, "GOBBY_EMBEDDING_MODEL", "embeddings.model") + .unwrap_or_else(|| EMBEDDING_DEFAULT_MODEL.to_string()); + let api_key = resolve_setting(source, "GOBBY_EMBEDDING_API_KEY", "embeddings.api_key"); + + Some(EmbeddingConfig { + api_base, + model, + api_key, + }) +} + +fn resolve_setting( + source: &mut impl ConfigSource, + env_key: &str, + config_key: &str, +) -> Option<String> { + let value = env_value(env_key).or_else(|| source.config_value(config_key))?; + resolve_non_empty(source, &value) +} + +fn resolve_port( + source: &mut impl ConfigSource, + env_key: &str, + config_key: &str, + default: u16, +) -> u16 { + let Some(raw_port) = env_value(env_key).or_else(|| source.config_value(config_key)) else { + return default; + }; + let Some(resolved) = resolve_non_empty(source, &raw_port) else { + return default; + }; + resolved.parse::<u16>().unwrap_or(default) +} + +fn resolve_non_empty(source: &mut impl ConfigSource, value: &str) -> Option<String> { + if value.trim().is_empty() { + return None; + } + source + .resolve_value(value) + .ok() + .filter(|resolved| !resolved.trim().is_empty()) +} + +fn env_value(key: &str) -> Option<String> { + std::env::var(key) + .ok() + .filter(|value| !value.trim().is_empty()) +} + +#[cfg(test)] +mod tests { + use super::*; + use std::collections::HashMap; + use std::sync::MutexGuard; + + struct EnvGuard { + _lock: MutexGuard<'static, ()>, + } + + impl EnvGuard { + fn new() -> Self { + let guard = Self { + _lock: TEST_ENV_LOCK + .lock() + .unwrap_or_else(|poisoned| poisoned.into_inner()), + }; + guard.clear(); + guard + } + + fn clear(&self) { + for key in [ + "GOBBY_FALKORDB_HOST", + "GOBBY_FALKORDB_PORT", + "GOBBY_FALKORDB_PASSWORD", + "GOBBY_QDRANT_URL", + "GOBBY_QDRANT_API_KEY", + "GOBBY_EMBEDDING_URL", + "GOBBY_EMBEDDING_MODEL", + "GOBBY_EMBEDDING_API_KEY", + 
"GOBBY_TEST_PRESENT", + "GOBBY_TEST_MISSING", + ] { + unsafe { std::env::remove_var(key) }; + } + } + + fn set(&self, key: &str, value: &str) { + unsafe { std::env::set_var(key, value) }; + } + } + + impl Drop for EnvGuard { + fn drop(&mut self) { + self.clear(); + } + } + + #[derive(Default)] + struct TestSource { + values: HashMap<&'static str, String>, + resolved_values: Vec, + } + + impl TestSource { + fn with_values(values: impl IntoIterator) -> Self { + Self { + values: values + .into_iter() + .map(|(key, value)| (key, value.to_string())) + .collect(), + resolved_values: Vec::new(), + } + } + + fn with_raw_values(values: impl IntoIterator) -> Self { + Self { + values: values + .into_iter() + .filter_map(|(key, value)| decode_config_value(value).map(|v| (key, v))) + .collect(), + resolved_values: Vec::new(), + } + } + } + + impl ConfigSource for TestSource { + fn config_value(&mut self, key: &str) -> Option { + self.values.get(key).cloned() + } + + fn resolve_value(&mut self, value: &str) -> anyhow::Result { + self.resolved_values.push(value.to_string()); + if let Some(secret_name) = value.strip_prefix("$secret:") { + return Ok(format!("resolved-{secret_name}")); + } + Ok(resolve_env_pattern(value)?.unwrap_or_else(|| value.to_string())) + } + } + + #[test] + fn decode_config_value_handles_json_and_plain() { + assert_eq!( + decode_config_value("\"http://host:7474\""), + Some("http://host:7474".to_string()) + ); + assert_eq!( + decode_config_value(r#"["alpha",1,true]"#), + Some(r#"["alpha",1,true]"#.to_string()) + ); + assert_eq!( + decode_config_value(r#"{"host":"falkor.local","port":16379}"#), + Some(r#"{"host":"falkor.local","port":16379}"#.to_string()) + ); + assert_eq!(decode_config_value("42"), Some("42".to_string())); + assert_eq!(decode_config_value("true"), Some("true".to_string())); + assert_eq!( + decode_config_value("http://plain:7474"), + Some("http://plain:7474".to_string()) + ); + assert_eq!(decode_config_value("null"), None); + } + + #[test] + fn 
resolve_env_pattern_with_defaults() { + let env = EnvGuard::new(); + env.set("GOBBY_TEST_PRESENT", "present-value"); + + assert_eq!( + resolve_env_pattern("${GOBBY_TEST_PRESENT}").unwrap(), + Some("present-value".to_string()) + ); + assert_eq!( + resolve_env_pattern("prefix-${GOBBY_TEST_PRESENT}-suffix").unwrap(), + Some("prefix-present-value-suffix".to_string()) + ); + assert_eq!( + resolve_env_pattern("${GOBBY_TEST_MISSING:-fallback}").unwrap(), + Some("fallback".to_string()) + ); + assert_eq!(resolve_env_pattern("${GOBBY_TEST_MISSING}").unwrap(), None); + assert_eq!( + resolve_env_pattern("plain-value").unwrap(), + Some("plain-value".to_string()) + ); + } + + #[test] + fn env_overrides_config_store() { + let env = EnvGuard::new(); + env.set("GOBBY_FALKORDB_HOST", "env-falkor.local"); + env.set("GOBBY_FALKORDB_PORT", "17000"); + env.set("GOBBY_FALKORDB_PASSWORD", "env-pass"); + env.set("GOBBY_QDRANT_URL", "http://env-qdrant:6333"); + env.set("GOBBY_QDRANT_API_KEY", "env-qdrant-key"); + + let mut source = TestSource::with_values([ + ("databases.falkordb.host", "stored-falkor.local"), + ("databases.falkordb.port", "16000"), + ("databases.falkordb.requirepass", "stored-pass"), + ("databases.qdrant.url", "http://stored-qdrant:6333"), + ("databases.qdrant.api_key", "stored-qdrant-key"), + ]); + + let falkordb = resolve_falkordb_config(&mut source).expect("falkordb config"); + let qdrant = resolve_qdrant_config(&mut source).expect("qdrant config"); + + assert_eq!(falkordb.host, "env-falkor.local"); + assert_eq!(falkordb.port, 17000); + assert_eq!(falkordb.password.as_deref(), Some("env-pass")); + assert_eq!(qdrant.url.as_deref(), Some("http://env-qdrant:6333")); + assert_eq!(qdrant.api_key.as_deref(), Some("env-qdrant-key")); + } + + #[test] + fn config_source_handles_secrets() { + let _env = EnvGuard::new(); + let mut source = TestSource::with_values([ + ("databases.falkordb.host", "falkor.local"), + ("databases.falkordb.requirepass", "$secret:FALKOR_PASS"), + ]); + + 
let config = resolve_falkordb_config(&mut source).expect("falkordb config"); + + assert_eq!(config.password.as_deref(), Some("resolved-FALKOR_PASS")); + assert!( + source + .resolved_values + .iter() + .any(|value| value == "$secret:FALKOR_PASS") + ); + } + + #[test] + fn env_only_source_rejects_secret_patterns() { + let _env = EnvGuard::new(); + let mut source = EnvOnlySource; + + let error = source + .resolve_value("$secret:FALKOR_PASS") + .expect_err("secret resolution should require a datastore-backed source"); + + assert!(error.to_string().contains("secret resolution")); + } + + #[test] + fn embedding_url_env_var_is_canonical() { + let env = EnvGuard::new(); + env.set("GOBBY_EMBEDDING_URL", "http://env-embedding:11434"); + + let mut source = TestSource::with_values([ + ("embeddings.api_base", "http://stored-embedding:11434"), + ("embeddings.model", "stored-model"), + ]); + + let config = resolve_embedding_config(&mut source).expect("embedding config"); + + assert_eq!(config.api_base, "http://env-embedding:11434"); + assert_eq!(config.model, "stored-model"); + } + + #[test] + fn postgres_config_source_resolves_secrets() { + let _env = EnvGuard::new(); + + struct ConnectionLike { + values: HashMap<&'static str, String>, + secret_reads: usize, + } + + struct PostgresConfigSource<'a> { + conn: &'a mut ConnectionLike, + } + + impl ConfigSource for PostgresConfigSource<'_> { + fn config_value(&mut self, key: &str) -> Option<String> { + self.conn.values.get(key).cloned() + } + + fn resolve_value(&mut self, value: &str) -> anyhow::Result<String> { + self.conn.secret_reads += 1; + Ok(format!("secret::{value}")) + } + } + + let mut conn = ConnectionLike { + values: HashMap::from([ + ( + "embeddings.api_base", + "http://stored-embedding:11434".to_string(), + ), + ("embeddings.api_key", "$secret:OPENAI_API_KEY".to_string()), + ]), + secret_reads: 0, + }; + let config = { + let mut source = PostgresConfigSource { conn: &mut conn }; + resolve_embedding_config(&mut source).expect("embedding 
config") + }; + + assert_eq!( + config.api_key.as_deref(), + Some("secret::$secret:OPENAI_API_KEY") + ); + assert_eq!(conn.secret_reads, 2); + } + + #[test] + fn resolve_config_handles_json_encoded_store_values() { + let _env = EnvGuard::new(); + let mut source = TestSource::with_raw_values([ + ("databases.falkordb.host", r#""json-falkor.local""#), + ("databases.falkordb.port", r#""17001""#), + ("databases.falkordb.requirepass", r#""$secret:FALKOR_PASS""#), + ("databases.qdrant.url", r#""http://json-qdrant:6333""#), + ("databases.qdrant.api_key", r#"["alpha",1]"#), + ("embeddings.api_base", r#""http://json-embedding:11434""#), + ("embeddings.model", r#"["model",1]"#), + ]); + + let falkordb = resolve_falkordb_config(&mut source).expect("falkordb config"); + let qdrant = resolve_qdrant_config(&mut source).expect("qdrant config"); + let embedding = resolve_embedding_config(&mut source).expect("embedding config"); + + assert_eq!(falkordb.host, "json-falkor.local"); + assert_eq!(falkordb.port, 17001); + assert_eq!(falkordb.password.as_deref(), Some("resolved-FALKOR_PASS")); + assert_eq!(qdrant.url.as_deref(), Some("http://json-qdrant:6333")); + assert_eq!(qdrant.api_key.as_deref(), Some(r#"["alpha",1]"#)); + assert_eq!(embedding.api_base, "http://json-embedding:11434"); + assert_eq!(embedding.model, r#"["model",1]"#); + } + + #[test] + fn qdrant_and_embedding_resolution_order() { + { + let env = EnvGuard::new(); + env.set("GOBBY_QDRANT_API_KEY", "env-qdrant-key"); + env.set("GOBBY_EMBEDDING_MODEL", "env-embedding-model"); + + let mut source = TestSource::with_values([ + ("databases.qdrant.url", "http://stored-qdrant:6333"), + ("databases.qdrant.api_key", "stored-qdrant-key"), + ("embeddings.api_base", "http://stored-embedding:11434/v1"), + ("embeddings.model", "stored-embedding-model"), + ("embeddings.api_key", "$secret:EMBEDDING_KEY"), + ]); + + let qdrant = resolve_qdrant_config(&mut source).expect("qdrant config"); + let embedding = resolve_embedding_config(&mut 
source).expect("embedding config"); + + assert_eq!(qdrant.url.as_deref(), Some("http://stored-qdrant:6333")); + assert_eq!(qdrant.api_key.as_deref(), Some("env-qdrant-key")); + assert_eq!(embedding.api_base, "http://stored-embedding:11434/v1"); + assert_eq!(embedding.model, "env-embedding-model"); + assert_eq!(embedding.api_key.as_deref(), Some("resolved-EMBEDDING_KEY")); + } + + let _env = EnvGuard::new(); + let mut default_source = + TestSource::with_values([("embeddings.api_base", "http://stored-embedding:11434/v1")]); + let default_embedding = + resolve_embedding_config(&mut default_source).expect("embedding config"); + + assert_eq!(default_embedding.model, EMBEDDING_DEFAULT_MODEL); + assert!(resolve_qdrant_config(&mut TestSource::default()).is_none()); + } + + #[test] + fn falkordb_config_has_no_domain_graph_name() { + let config = FalkorConfig { + host: "falkor.local".to_string(), + port: 16379, + password: None, + }; + + assert!(!format!("{config:?}").contains("graph")); + let forbidden = ["gobby", "_", "code"].concat(); + assert!(!include_str!("config.rs").contains(&forbidden)); + } + + #[test] + fn qdrant_config_has_no_domain_collection_prefix() { + let config = QdrantConfig { + url: Some("http://qdrant:6333".to_string()), + api_key: None, + }; + + assert!(!format!("{config:?}").contains("collection")); + } +} diff --git a/crates/gcore/src/context.rs b/crates/gcore/src/context.rs new file mode 100644 index 0000000..28518c5 --- /dev/null +++ b/crates/gcore/src/context.rs @@ -0,0 +1,162 @@ +//! Shared runtime context boundary. +//! +//! Consumer crates keep their CLI flags and domain state locally. This module +//! owns the public location for cross-crate project, daemon, and service context +//! types as the Rust foundation expands. 
+ +use std::path::PathBuf; + +use crate::config::{ + ConfigSource, EmbeddingConfig, FalkorConfig, QdrantConfig, resolve_embedding_config, + resolve_falkordb_config, resolve_qdrant_config, +}; + +/// Resolved runtime context for any gobby-core consumer. +#[derive(Debug, Clone, PartialEq, Eq)] +pub struct CoreContext { + /// Project root directory containing `.gobby/`. + pub project_root: PathBuf, + /// Project ID from `.gobby/project.json`. + pub project_id: String, + /// PostgreSQL hub DSN resolved by the consumer. + pub database_url: Option<String>, + /// FalkorDB config when available. + pub falkordb: Option<FalkorConfig>, + /// Qdrant config when available. + pub qdrant: Option<QdrantConfig>, + /// Embedding API config when available. + pub embedding: Option<EmbeddingConfig>, + /// Gobby daemon base URL. + pub daemon_url: Option<String>, +} + +impl CoreContext { + /// Build a context from pre-resolved project identity and DSN inputs. + pub fn build( + project_root: PathBuf, + project_id: String, + database_url: Option<String>, + source: &mut impl ConfigSource, + ) -> Self { + let falkordb = resolve_falkordb_config(source); + let qdrant = resolve_qdrant_config(source); + let embedding = resolve_embedding_config(source); + let daemon_url = Some(crate::daemon_url::daemon_url()); + + Self { + project_root, + project_id, + database_url, + falkordb, + qdrant, + embedding, + daemon_url, + } + } +} + +#[cfg(test)] +mod tests { + use super::*; + use crate::config::{EnvOnlySource, TEST_ENV_LOCK}; + use std::sync::MutexGuard; + + struct EnvGuard { + _lock: MutexGuard<'static, ()>, + } + + impl EnvGuard { + fn new() -> Self { + let guard = Self { + _lock: TEST_ENV_LOCK + .lock() + .unwrap_or_else(|poisoned| poisoned.into_inner()), + }; + guard.clear(); + guard + } + + fn clear(&self) { + for key in [ + "GOBBY_FALKORDB_HOST", + "GOBBY_FALKORDB_PORT", + "GOBBY_FALKORDB_PASSWORD", + "GOBBY_QDRANT_URL", + "GOBBY_QDRANT_API_KEY", + "GOBBY_EMBEDDING_URL", + "GOBBY_EMBEDDING_MODEL", + "GOBBY_EMBEDDING_API_KEY", + ] { + unsafe { 
std::env::remove_var(key) }; + } + } + + fn set(&self, key: &str, value: &str) { + unsafe { std::env::set_var(key, value) }; + } + } + + impl Drop for EnvGuard { + fn drop(&mut self) { + self.clear(); + } + } + + #[test] + fn missing_optional_services_are_none() { + let _env = EnvGuard::new(); + let mut source = EnvOnlySource; + let root = std::path::PathBuf::from("/tmp/gobby-project"); + + let context = CoreContext::build(root.clone(), "project-id".to_string(), None, &mut source); + + assert_eq!(context.project_root, root); + assert_eq!(context.project_id, "project-id"); + assert_eq!(context.database_url, None); + assert!(context.falkordb.is_none()); + assert!(context.qdrant.is_none()); + assert!(context.embedding.is_none()); + assert!(context.daemon_url.is_some()); + } + + #[test] + fn build_with_env_only_source() { + let env = EnvGuard::new(); + env.set("GOBBY_FALKORDB_HOST", "env-falkor.local"); + env.set("GOBBY_FALKORDB_PORT", "17000"); + env.set("GOBBY_QDRANT_URL", "http://env-qdrant:6333"); + env.set("GOBBY_EMBEDDING_URL", "http://env-embedding:11434"); + env.set("GOBBY_EMBEDDING_MODEL", "env-model"); + + let mut source = EnvOnlySource; + let root = std::path::PathBuf::from("/tmp/gobby-project"); + + let context = CoreContext::build( + root.clone(), + "project-id".to_string(), + Some("postgres://example".to_string()), + &mut source, + ); + + assert_eq!(context.project_root, root); + assert_eq!(context.project_id, "project-id"); + assert_eq!(context.database_url.as_deref(), Some("postgres://example")); + assert_eq!( + context.falkordb.as_ref().map(|c| c.host.as_str()), + Some("env-falkor.local") + ); + assert_eq!( + context.qdrant.as_ref().and_then(|c| c.url.as_deref()), + Some("http://env-qdrant:6333") + ); + assert_eq!( + context.embedding.as_ref().map(|c| c.api_base.as_str()), + Some("http://env-embedding:11434") + ); + assert_eq!( + context.embedding.as_ref().map(|c| c.model.as_str()), + Some("env-model") + ); + assert!(context.daemon_url.is_some()); + } 
+} diff --git a/crates/gcore/src/degradation.rs b/crates/gcore/src/degradation.rs new file mode 100644 index 0000000..c3954ad --- /dev/null +++ b/crates/gcore/src/degradation.rs @@ -0,0 +1,182 @@ +//! Shared degradation vocabulary boundary. +//! +//! Degradation types describe partial availability without forcing every +//! command to treat optional service absence as fatal. Detailed contracts live +//! here so lightweight consumers can share the same vocabulary. + +use serde::{Deserialize, Serialize}; + +/// Service availability state, returned alongside results from adapters. +#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)] +pub enum ServiceState { + /// Service is connected and responding. + Available, + /// Service is not configured because no config was found from any source. + NotConfigured, + /// Service is configured but could not be reached. + Unreachable { + /// Adapter-provided diagnostic message for the failed connection. + message: String, + }, +} + +impl ServiceState { + /// Returns true when the backing service is connected and responding. + pub fn is_available(&self) -> bool { + matches!(self, Self::Available) + } +} + +/// Setup validation issue with actionable guidance. +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct SetupIssue { + /// Name of the missing, invalid, or degraded resource. + pub object_name: String, + /// Store or service that owns the resource. + pub store: String, + /// Structured remediation guidance for callers to render. + pub guidance: Guidance, +} + +/// Structured guidance text for setup issues. +/// +/// Callers render these fields; `gobby-core` does not format CLI output. +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct Guidance { + /// What is missing or wrong. + pub problem: String, + /// What the user should do. + pub action: String, + /// Optional command suggestion. + pub command_hint: Option, +} + +/// Fatal errors that prevent a command from completing. 
+#[derive(Debug, Serialize, Deserialize, thiserror::Error)] +pub enum CoreError { + /// Configuration was present but invalid for the requested operation. + #[error("invalid configuration: {0}")] + InvalidConfig(String), + /// A service required by this command could not be used. + #[error("required service unavailable: {service} — {message}")] + RequiredServiceUnavailable { + /// Required service name. + service: String, + /// Diagnostic message explaining the unavailability. + message: String, + }, + /// A write operation failed after validation began. + #[error("write failed: {0}")] + WriteFailed(String), + /// Input could not be parsed or failed integrity checks. + #[error("corrupted input: {0}")] + CorruptedInput(String), +} + +/// Degradation states for partial results. +#[derive(Debug, Clone, Serialize, Deserialize)] +pub enum DegradationKind { + /// An optional service was unavailable during this operation. + ServiceUnavailable { + /// Optional service name. + service: String, + /// Availability state observed by the caller. + state: ServiceState, + }, + /// Search completed with fewer sources than requested. + PartialSearch { + /// Source names that contributed results. + available: Vec, + /// Source names that could not contribute results. + unavailable: Vec, + }, + /// Index data may be stale because of content drift or age thresholds. + StaleIndex { + /// Paths whose indexed data may be stale. + paths: Vec, + }, + /// Some artifacts were skipped during indexing. + SkippedArtifacts { + /// Number of skipped artifacts. + count: usize, + /// Human-readable reason the artifacts were skipped. 
+ reason: String, + }, +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn optional_service_degradation_is_not_fatal() { + let unconfigured = ServiceState::NotConfigured; + let unreachable = ServiceState::Unreachable { + message: "connection refused".to_string(), + }; + + assert!(!unconfigured.is_available()); + assert!(!unreachable.is_available()); + + let degradation = DegradationKind::ServiceUnavailable { + service: "qdrant".to_string(), + state: unconfigured, + }; + let fatal = CoreError::RequiredServiceUnavailable { + service: "postgres".to_string(), + message: "hub is required for writes".to_string(), + }; + + assert!(matches!( + degradation, + DegradationKind::ServiceUnavailable { + service, + state: ServiceState::NotConfigured + } if service == "qdrant" + )); + assert_eq!( + fatal.to_string(), + "required service unavailable: postgres — hub is required for writes" + ); + } + + #[test] + fn guidance_is_structured() { + let guidance = Guidance { + problem: "BM25 index missing".to_string(), + action: "run attached setup validation".to_string(), + command_hint: Some("gobby setup validate".to_string()), + }; + + assert_eq!(guidance.problem, "BM25 index missing"); + assert_eq!(guidance.action, "run attached setup validation"); + assert_eq!( + guidance.command_hint.as_deref(), + Some("gobby setup validate") + ); + } + + #[test] + fn core_error_serialization_roundtrip() { + let invalid_config = CoreError::InvalidConfig("missing project id".to_string()); + let encoded = serde_json::to_string(&invalid_config).expect("serialize invalid config"); + let decoded: CoreError = + serde_json::from_str(&encoded).expect("deserialize invalid config"); + assert!(matches!( + decoded, + CoreError::InvalidConfig(message) if message == "missing project id" + )); + + let unavailable = CoreError::RequiredServiceUnavailable { + service: "postgres".to_string(), + message: "connection refused".to_string(), + }; + let encoded = 
serde_json::to_string(&unavailable).expect("serialize unavailable"); + let decoded: CoreError = serde_json::from_str(&encoded).expect("deserialize unavailable"); + assert!(matches!( + decoded, + CoreError::RequiredServiceUnavailable { service, message } + if service == "postgres" && message == "connection refused" + )); + } +} diff --git a/crates/gcore/src/falkor.rs b/crates/gcore/src/falkor.rs new file mode 100644 index 0000000..0f4319d --- /dev/null +++ b/crates/gcore/src/falkor.rs @@ -0,0 +1,328 @@ +//! FalkorDB foundation adapter boundary. +//! +//! This module is available with the `falkor` feature. The feature also enables +//! `urlencoding` so FalkorDB connection URLs can encode passwords safely when +//! graph client construction is added. + +use std::collections::HashMap; + +use falkordb::{ + FalkorClientBuilder, FalkorConnectionInfo, FalkorValue, LazyResultSet, QueryResult, SyncGraph, +}; +use serde_json::{Map, Number, Value}; + +use crate::config::FalkorConfig; +use crate::degradation::ServiceState; + +/// Row from a FalkorDB query response. +pub type Row = HashMap; + +/// Blocking FalkorDB graph client. +/// +/// Owns a connection to a named graph. Domain crates supply Cypher queries; +/// this adapter handles connection lifecycle and result parsing. +pub struct GraphClient { + graph: SyncGraph, +} + +impl GraphClient { + /// Build a client for a consumer-selected graph. + pub fn from_config(config: &FalkorConfig, graph_name: &str) -> anyhow::Result { + let password = config.password.as_deref().unwrap_or_default(); + let url = format!( + "falkor://:{}@{}:{}", + urlencoding::encode(password), + config.host, + config.port, + ); + let conn_info: FalkorConnectionInfo = url.as_str().try_into()?; + let client = FalkorClientBuilder::new() + .with_connection_info(conn_info) + .build()?; + Ok(Self { + graph: client.select_graph(graph_name), + }) + } + + /// Execute a Cypher query and return parsed rows. 
+ pub fn query( + &mut self, + cypher: &str, + params: Option>, + ) -> anyhow::Result> { + match params { + Some(params) => { + let result = self.graph.query(cypher).with_params(&params).execute()?; + Ok(parse_falkor_result(result)) + } + None => { + let result = self.graph.query(cypher).execute()?; + Ok(parse_falkor_result(result)) + } + } + } +} + +/// Run a closure with a FalkorDB client, with typed degradation. +/// +/// Degradation contract: +/// - missing config returns the caller default with `ServiceState::NotConfigured` +/// - connection failure returns the caller default with `ServiceState::Unreachable` +/// - a successful closure returns its value with `ServiceState::Available` +/// - a closure error is propagated to the caller +pub fn with_graph( + config: Option<&FalkorConfig>, + graph_name: &str, + default: T, + f: impl FnOnce(&mut GraphClient) -> anyhow::Result, +) -> anyhow::Result<(T, ServiceState)> { + with_graph_client(config, graph_name, default, GraphClient::from_config, f) +} + +fn with_graph_client( + config: Option<&FalkorConfig>, + graph_name: &str, + default: T, + make_client: impl FnOnce(&FalkorConfig, &str) -> anyhow::Result, + f: impl FnOnce(&mut C) -> anyhow::Result, +) -> anyhow::Result<(T, ServiceState)> { + let Some(config) = config else { + return Ok((default, ServiceState::NotConfigured)); + }; + + let mut client = match make_client(config, graph_name) { + Ok(client) => client, + Err(error) => { + return Ok(( + default, + ServiceState::Unreachable { + message: error.to_string(), + }, + )); + } + }; + + let value = f(&mut client)?; + Ok((value, ServiceState::Available)) +} + +/// Escape a graph label for safe Cypher embedding. +pub fn escape_label(label: &str) -> String { + escape_identifier(label) +} + +/// Escape a relationship type for safe Cypher embedding. +pub fn escape_rel_type(rel: &str) -> String { + escape_identifier(rel) +} + +/// Escape a property key for safe Cypher embedding. 
+pub fn escape_property(key: &str) -> String { + escape_identifier(key) +} + +/// Escape a string parameter value for Cypher. +pub fn escape_string(value: &str) -> String { + let escaped = value.replace('\\', "\\\\").replace('\'', "\\'"); + format!("'{escaped}'") +} + +fn escape_identifier(value: &str) -> String { + format!("`{}`", value.replace('`', "``")) +} + +fn parse_falkor_result(result: QueryResult>) -> Vec { + parse_falkor_records(result.header, result.data) +} + +fn parse_falkor_records(headers: Vec, records: I) -> Vec +where + I: IntoIterator>, +{ + records + .into_iter() + .map(|record| { + let mut row = HashMap::new(); + for (i, field) in headers.iter().enumerate() { + let value = record.get(i).cloned().unwrap_or(FalkorValue::None); + row.insert(field.clone(), falkor_value_to_json(value)); + } + row + }) + .collect() +} + +fn falkor_value_to_json(value: FalkorValue) -> Value { + match value { + FalkorValue::String(value) => Value::String(value), + FalkorValue::Bool(value) => Value::Bool(value), + FalkorValue::I64(value) => Value::Number(Number::from(value)), + FalkorValue::F64(value) => Number::from_f64(value) + .map(Value::Number) + .unwrap_or(Value::Null), + FalkorValue::Array(values) => Value::Array( + values + .into_iter() + .map(falkor_value_to_json) + .collect::>(), + ), + FalkorValue::Map(values) => Value::Object( + values + .into_iter() + .map(|(key, value)| (key, falkor_value_to_json(value))) + .collect::>(), + ), + FalkorValue::None => Value::Null, + value => Value::String(format!("{value:?}")), + } +} + +#[cfg(test)] +mod tests { + use super::*; + use crate::config::FalkorConfig; + use crate::degradation::ServiceState; + use anyhow::anyhow; + + struct FakeGraphClient; + + fn test_config() -> FalkorConfig { + FalkorConfig { + host: "127.0.0.1".to_string(), + port: 1, + password: None, + } + } + + #[test] + fn with_graph_degradation_contract() { + let default = vec!["default".to_string()]; + let missing = with_graph::>(None, "consumer_graph", 
default.clone(), |_| { + unreachable!("missing config should not construct a client") + }) + .expect("missing config should degrade"); + assert_eq!(missing, (default.clone(), ServiceState::NotConfigured)); + + let unreachable = with_graph_client( + Some(&test_config()), + "consumer_graph", + default.clone(), + |_config, _graph_name| Err(anyhow!("connection refused")), + |_client: &mut FakeGraphClient| Ok(vec!["value".to_string()]), + ) + .expect("connection failure should degrade"); + assert!(matches!( + unreachable, + (value, ServiceState::Unreachable { ref message }) + if value == default && message.contains("connection refused") + )); + + let available = with_graph_client( + Some(&test_config()), + "consumer_graph", + default.clone(), + |_config, _graph_name| Ok(FakeGraphClient), + |_client| Ok(vec!["value".to_string()]), + ) + .expect("successful closure should return available state"); + assert_eq!( + available, + (vec!["value".to_string()], ServiceState::Available) + ); + + let propagated = with_graph_client( + Some(&test_config()), + "consumer_graph", + default, + |_config, _graph_name| Ok(FakeGraphClient), + |_client| Err::, _>(anyhow!("query failed")), + ); + assert_eq!( + propagated + .expect_err("closure error should propagate") + .to_string(), + "query failed" + ); + } + + #[test] + fn escapes_graph_tokens() { + assert_eq!(escape_label("Node`Label"), "`Node``Label`"); + assert_eq!(escape_rel_type("REL`OUT"), "`REL``OUT`"); + assert_eq!(escape_property("line`start"), "`line``start`"); + assert_eq!( + escape_string("module\\path'symbol"), + "'module\\\\path\\'symbol'" + ); + } + + #[test] + fn no_domain_labels_in_adapter() { + let source = include_str!("falkor.rs"); + let forbidden = [ + ["Code", "Symbol"].concat(), + ["CA", "LLS"].concat(), + ["IM", "PORTS"].concat(), + ["Wiki", "Doc"].concat(), + ["LINKS", "_TO"].concat(), + ]; + + for token in forbidden { + assert!(!source.contains(&token), "{token} leaked into adapter"); + } + } + + #[test] + fn 
graph_unavailable_is_not_empty_success() { + let unavailable = with_graph_client( + Some(&test_config()), + "consumer_graph", + Vec::::new(), + |_config, _graph_name| Err(anyhow!("dial tcp failed")), + |_client: &mut FakeGraphClient| Ok(vec![Row::new()]), + ) + .expect("connection failure should degrade"); + + assert!(matches!( + unavailable, + (rows, ServiceState::Unreachable { .. }) if rows.is_empty() + )); + + let empty_success = with_graph_client( + Some(&test_config()), + "consumer_graph", + vec![Row::new()], + |_config, _graph_name| Ok(FakeGraphClient), + |_client| Ok(Vec::::new()), + ) + .expect("successful empty query should be available"); + + assert_eq!(empty_success, (Vec::::new(), ServiceState::Available)); + } + + #[test] + fn graph_name_is_consumer_supplied() { + let mut selected_graph = None; + let result = with_graph_client( + Some(&test_config()), + "consumer_graph", + (), + |_config, graph_name| { + selected_graph = Some(graph_name.to_string()); + Ok(FakeGraphClient) + }, + |_client| Ok(()), + ) + .expect("graph selection should succeed"); + + assert_eq!(result, ((), ServiceState::Available)); + assert_eq!(selected_graph.as_deref(), Some("consumer_graph")); + + let source = include_str!("falkor.rs"); + let code_graph_name = ["gobby", "_code"].concat(); + assert!( + !source.contains(&code_graph_name), + "adapter must not hardcode a consumer graph name" + ); + } +} diff --git a/crates/gcore/src/indexing.rs b/crates/gcore/src/indexing.rs new file mode 100644 index 0000000..0fb477f --- /dev/null +++ b/crates/gcore/src/indexing.rs @@ -0,0 +1,388 @@ +//! Generic indexing primitives shared by indexing consumers. +//! +//! This module is available with the `indexing` feature. Domain-specific +//! parsers, symbol models, and graph facts stay in consumer crates. 
+ +use std::collections::{BTreeMap, BTreeSet}; +use std::io::{self, Read}; +use std::path::{Path, PathBuf}; + +use ignore::{WalkBuilder, overrides::OverrideBuilder}; +use serde_json::Value; +use sha2::{Digest, Sha256}; + +/// Walker configuration that consumers can extend with domain-specific rules. +#[derive(Debug, Clone, PartialEq, Eq)] +pub struct WalkerSettings { + /// Root directory to walk. + pub root: PathBuf, + /// Whether to respect git ignore sources such as `.gitignore`. + pub respect_gitignore: bool, + /// Maximum file size to yield, in bytes. + pub max_filesize: Option, + /// Extra ignore patterns such as `*.pyc` or `node_modules/`. + pub extra_ignores: Vec, +} + +impl WalkerSettings { + /// Create walker settings with generic defaults consumers can extend. + pub fn new(root: impl Into) -> Self { + Self { + root: root.into(), + respect_gitignore: true, + max_filesize: None, + extra_ignores: Vec::new(), + } + } + + /// Build an `ignore::WalkBuilder` from these settings. + /// + /// Panics when `extra_ignores` contains an invalid glob. Use + /// [`try_into_walker`](Self::try_into_walker) to handle invalid patterns. + pub fn into_walker(self) -> WalkBuilder { + self.try_into_walker() + .expect("invalid extra ignore pattern") + } + + /// Build an `ignore::WalkBuilder`, returning invalid glob errors. + pub fn try_into_walker(self) -> Result { + let mut walker = WalkBuilder::new(&self.root); + walker + .git_ignore(self.respect_gitignore) + .git_global(self.respect_gitignore) + .git_exclude(self.respect_gitignore) + .max_filesize(self.max_filesize); + + if !self.extra_ignores.is_empty() { + let mut overrides = OverrideBuilder::new(&self.root); + for pattern in self.extra_ignores { + overrides.add(&format!("!{pattern}"))?; + } + walker.overrides(overrides.build()?); + } + + Ok(walker) + } +} + +/// SHA-256 content hash for incremental indexing. 
+pub fn content_hash(data: &[u8]) -> String { + let mut hasher = Sha256::new(); + hasher.update(data); + format!("{:x}", hasher.finalize()) +} + +/// SHA-256 content hash for a file, read incrementally. +pub fn file_content_hash(path: impl AsRef) -> io::Result { + let mut file = std::fs::File::open(path)?; + let mut hasher = Sha256::new(); + let mut buffer = [0u8; 65_536]; + + loop { + let read = file.read(&mut buffer)?; + if read == 0 { + break; + } + hasher.update(&buffer[..read]); + } + + Ok(format!("{:x}", hasher.finalize())) +} + +/// A content chunk with byte range and opaque domain metadata. +#[derive(Debug, Clone, PartialEq)] +pub struct Chunk { + /// Path to the source file for the chunk. + pub file_path: PathBuf, + /// Inclusive byte offset where the chunk starts. + pub byte_start: usize, + /// Exclusive byte offset where the chunk ends. + pub byte_end: usize, + /// Optional human-readable heading associated with the chunk. + pub heading: Option, + /// Opaque domain payload such as symbol refs or wiki links. + pub metadata: Value, +} + +/// Stable identity for a content chunk, independent of domain metadata. +#[derive(Debug, Clone, PartialEq, Eq, Hash)] +pub struct ChunkIdentity { + /// Path to the source file for the chunk. + pub file_path: PathBuf, + /// Inclusive byte offset where the chunk starts. + pub byte_start: usize, + /// Exclusive byte offset where the chunk ends. + pub byte_end: usize, +} + +impl Chunk { + /// Return the domain-independent identity for this chunk. + pub fn identity(&self) -> ChunkIdentity { + ChunkIdentity { + file_path: self.file_path.clone(), + byte_start: self.byte_start, + byte_end: self.byte_end, + } + } +} + +/// Index lifecycle events for incremental indexing. 
+#[derive(Debug, Clone, PartialEq, Eq)] +pub enum IndexEvent { + Added(PathBuf), + Changed(PathBuf), + Unchanged(PathBuf), + Deleted(PathBuf), + Skipped { path: PathBuf, reason: String }, +} + +/// Classify content-hash snapshots into deterministic incremental index events. +pub fn index_events_from_hashes( + previous_hashes: &BTreeMap, + current_hashes: &BTreeMap, +) -> Vec { + let paths: BTreeSet<&PathBuf> = previous_hashes + .keys() + .chain(current_hashes.keys()) + .collect(); + + paths + .into_iter() + .map( + |path| match (previous_hashes.get(path), current_hashes.get(path)) { + (None, Some(_)) => IndexEvent::Added(path.clone()), + (Some(previous), Some(current)) if previous != current => { + IndexEvent::Changed(path.clone()) + } + (Some(_), Some(_)) => IndexEvent::Unchanged(path.clone()), + (Some(_), None) => IndexEvent::Deleted(path.clone()), + (None, None) => unreachable!("path came from at least one snapshot"), + }, + ) + .collect() +} + +#[cfg(test)] +mod tests { + use std::path::{Path, PathBuf}; + + use serde_json::json; + + use super::*; + + fn write_file(root: &Path, rel: &str, contents: &[u8]) { + let path = root.join(rel); + if let Some(parent) = path.parent() { + std::fs::create_dir_all(parent).expect("create parent"); + } + std::fs::write(path, contents).expect("write file"); + } + + fn rels(root: &Path, settings: WalkerSettings) -> Vec { + let mut files: Vec = settings + .into_walker() + .build() + .flatten() + .filter(|entry| entry.path().is_file()) + .map(|entry| { + entry + .path() + .strip_prefix(root) + .expect("path under root") + .to_string_lossy() + .to_string() + }) + .collect(); + files.sort(); + files + } + + #[test] + fn walker_settings_new_has_consumer_extendable_defaults() { + let root = PathBuf::from("workspace"); + + let settings = WalkerSettings::new(&root); + + assert_eq!(settings.root, root); + assert!(settings.respect_gitignore); + assert_eq!(settings.max_filesize, None); + assert!(settings.extra_ignores.is_empty()); + } + + 
#[test] + fn walker_settings_apply_generic_discovery_rules() { + let tmp = tempfile::tempdir().expect("tempdir"); + let root = tmp.path(); + write_file(root, ".gitignore", b"ignored.txt\n"); + write_file(root, "keep.txt", b"ok\n"); + write_file(root, "ignored.txt", b"ignored\n"); + write_file(root, "cache.pyc", b"pyc\n"); + write_file(root, "node_modules/pkg.js", b"pkg\n"); + write_file(root, "large.log", b"long\n"); + + let settings = WalkerSettings { + root: root.to_path_buf(), + respect_gitignore: true, + max_filesize: Some(3), + extra_ignores: vec!["*.pyc".to_string(), "node_modules/".to_string()], + }; + + assert_eq!(rels(root, settings), vec!["keep.txt"]); + } + + #[test] + fn content_hash_returns_sha256_hex() { + assert_eq!( + content_hash(b"hello"), + "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824" + ); + } + + #[test] + fn file_content_hash_returns_sha256_hex() -> std::io::Result<()> { + let tmp = tempfile::tempdir()?; + let path = tmp.path().join("content.txt"); + std::fs::write(&path, b"hello")?; + + assert_eq!( + file_content_hash(&path)?, + "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824" + ); + Ok(()) + } + + #[test] + fn chunk_metadata_is_opaque() { + let metadata = json!({ + "symbols": ["module::Item"], + "wiki_links": ["Indexing"], + "consumer": { + "domain": "docs", + "score": 7 + } + }); + + let chunk = Chunk { + file_path: PathBuf::from("docs/indexing.md"), + byte_start: 12, + byte_end: 48, + heading: Some("Indexing".to_string()), + metadata: metadata.clone(), + }; + + assert_eq!(chunk.metadata, metadata); + } + + #[test] + fn chunk_identity_uses_path_and_byte_range_only() { + let base = Chunk { + file_path: PathBuf::from("docs/indexing.md"), + byte_start: 12, + byte_end: 48, + heading: Some("Indexing".to_string()), + metadata: json!({ "consumer": "docs" }), + }; + let same_identity = Chunk { + file_path: PathBuf::from("docs/indexing.md"), + byte_start: 12, + byte_end: 48, + heading: Some("Different 
heading".to_string()), + metadata: json!({ "consumer": "wiki" }), + }; + let different_range = Chunk { + byte_end: 49, + ..base.clone() + }; + + assert_eq!(base.identity(), same_identity.identity()); + assert_ne!(base.identity(), different_range.identity()); + } + + #[test] + fn index_events_cover_incremental_cases() { + let events = [ + IndexEvent::Added(PathBuf::from("added.md")), + IndexEvent::Changed(PathBuf::from("changed.md")), + IndexEvent::Unchanged(PathBuf::from("same.md")), + IndexEvent::Deleted(PathBuf::from("deleted.md")), + IndexEvent::Skipped { + path: PathBuf::from("large.bin"), + reason: "too large".to_string(), + }, + ]; + + assert!(matches!(events[0], IndexEvent::Added(_))); + assert!(matches!(events[1], IndexEvent::Changed(_))); + assert!(matches!(events[2], IndexEvent::Unchanged(_))); + assert!(matches!(events[3], IndexEvent::Deleted(_))); + assert!(matches!( + &events[4], + IndexEvent::Skipped { path, reason } + if path == &PathBuf::from("large.bin") && reason == "too large" + )); + } + + #[test] + fn index_events_from_hashes_classify_incremental_flow() { + let previous = std::collections::BTreeMap::from([ + (PathBuf::from("changed.md"), "old".to_string()), + (PathBuf::from("deleted.md"), "gone".to_string()), + (PathBuf::from("same.md"), "same".to_string()), + ]); + let current = std::collections::BTreeMap::from([ + (PathBuf::from("added.md"), "new".to_string()), + (PathBuf::from("changed.md"), "new".to_string()), + (PathBuf::from("same.md"), "same".to_string()), + ]); + + assert_eq!( + index_events_from_hashes(&previous, &current), + vec![ + IndexEvent::Added(PathBuf::from("added.md")), + IndexEvent::Changed(PathBuf::from("changed.md")), + IndexEvent::Deleted(PathBuf::from("deleted.md")), + IndexEvent::Unchanged(PathBuf::from("same.md")), + ] + ); + } + + #[test] + fn no_domain_parser_dependency() { + let manifest = std::fs::read_to_string(concat!(env!("CARGO_MANIFEST_DIR"), "/Cargo.toml")) + .expect("read manifest"); + + 
assert!(!manifest.contains("tree-sitter")); + } + + #[test] + fn manifest_keeps_indexing_feature_generic() { + let manifest = std::fs::read_to_string(concat!(env!("CARGO_MANIFEST_DIR"), "/Cargo.toml")) + .expect("read manifest"); + + let feature_sections = manifest + .lines() + .filter(|line| line.trim() == "[features]") + .count(); + assert_eq!( + feature_sections, 1, + "gcore manifest should have exactly one [features] section" + ); + + let indexing_feature = manifest + .lines() + .find(|line| line.trim_start().starts_with("indexing = [")) + .expect("indexing feature"); + for dependency in ["dep:ignore", "dep:sha2"] { + assert!( + indexing_feature.contains(&format!("\"{dependency}\"")), + "indexing feature should include {dependency}" + ); + } + for forbidden in ["tree-sitter", "markdown", "wiki"] { + assert!( + !indexing_feature.contains(forbidden), + "indexing feature should not include domain dependency {forbidden:?}" + ); + } + } +} diff --git a/crates/gcore/src/lib.rs b/crates/gcore/src/lib.rs index 167a0d0..0dec04b 100644 --- a/crates/gcore/src/lib.rs +++ b/crates/gcore/src/lib.rs @@ -1,9 +1,34 @@ //! Shared primitives for Gobby CLI tools. //! -//! Small, dependency-light helpers that multiple Gobby binaries (`gcode`, -//! `gsqz`, `gloc`, `ghook`) share: project-root walk-up, project-id reading, -//! bootstrap config resolution, daemon URL construction. +//! The baseline crate stays dependency-light for consumers that only need +//! project discovery, bootstrap config, daemon URLs, and shared foundation +//! vocabulary. Datastore and indexing adapters sit behind Cargo feature gates +//! so small binaries do not inherit services they never call. +// Always available - existing modules. pub mod bootstrap; pub mod daemon_url; pub mod project; +pub mod provisioning; + +// Always available - lightweight foundation modules. +pub mod config; +pub mod context; +pub mod degradation; +pub mod setup; + +// Feature-gated modules. 
+#[cfg(feature = "postgres")] +pub mod postgres; + +#[cfg(feature = "falkor")] +pub mod falkor; + +#[cfg(feature = "qdrant")] +pub mod qdrant; + +#[cfg(feature = "indexing")] +pub mod indexing; + +#[cfg(feature = "search")] +pub mod search; diff --git a/crates/gcore/src/postgres.rs b/crates/gcore/src/postgres.rs new file mode 100644 index 0000000..fc8c93f --- /dev/null +++ b/crates/gcore/src/postgres.rs @@ -0,0 +1,135 @@ +//! PostgreSQL foundation adapter boundary and hub connection helpers. +//! +//! This module is available with the `postgres` feature. Gobby-owned schemas are +//! externally managed; adapter code must validate required objects without +//! creating, altering, or dropping them. This module is intentionally +//! schema-agnostic; consumers supply any table or index validation. + +use anyhow::Context; +use postgres::{Client, NoTls}; + +/// Connect to the PostgreSQL hub in read-only mode. +/// +/// Sets `default_transaction_read_only = on` to guard against accidental writes. +pub fn connect_readonly(database_url: &str) -> anyhow::Result { + let mut client = connect(database_url)?; + client + .execute("SET default_transaction_read_only = on", &[]) + .context("failed to set PostgreSQL connection read-only")?; + Ok(client) +} + +/// Connect to the PostgreSQL hub with write access. +pub fn connect_readwrite(database_url: &str) -> anyhow::Result { + connect(database_url) +} + +/// Read a raw config value from the Gobby `config_store` table. +/// +/// Returns the raw stored value (which may be JSON-encoded). Callers should +/// decode JSON string encoding and resolve `$secret:NAME` or `${VAR}` values +/// in their own config layer. +/// +/// Returns `None` for missing keys. Does not write. 
+pub fn read_config_value(conn: &mut Client, key: &str) -> anyhow::Result> { + let row = conn + .query_opt("SELECT value FROM config_store WHERE key = $1", &[&key]) + .with_context(|| format!("failed to read config_store key {key:?}"))?; + row.map(|r| { + r.try_get("value") + .with_context(|| format!("config_store key {key:?} value was not text")) + }) + .transpose() +} + +/// Result of a single schema object check (table, index, column, etc.). +#[derive(Debug, Clone, PartialEq, Eq)] +pub struct SchemaCheck { + /// Object name (for example, `symbols` or `bm25_symbols_idx`). + pub object_name: String, + /// What was checked (for example, `table exists` or `column type`). + pub check_kind: String, + /// Whether the check passed. + pub passed: bool, + /// Detail on failure. + pub detail: Option, +} + +/// Run a consumer-supplied schema validator for attached-mode checks. +/// +/// The callback receives a mutable connection because `postgres::Client` +/// query methods require `&mut self`. `gobby-core` does not know which tables +/// to check and never runs migrations. 
+pub fn validate_schema( + conn: &mut Client, + validator: impl FnOnce(&mut Client) -> Vec, +) -> Vec { + run_schema_validator(conn, validator) +} + +fn connect(database_url: &str) -> anyhow::Result { + Client::connect(database_url, NoTls).context("failed to connect to the Gobby PostgreSQL hub") +} + +fn run_schema_validator( + conn: &mut C, + validator: impl FnOnce(&mut C) -> Vec, +) -> Vec { + validator(conn) +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn attached_validation_is_non_destructive() { + let mut conn = vec!["existing-state"]; + + let checks = run_schema_validator(&mut conn, |conn| { + assert_eq!(conn.as_slice(), ["existing-state"]); + conn.push("validator-ran"); + vec![SchemaCheck { + object_name: "consumer_table".to_string(), + check_kind: "table exists".to_string(), + passed: true, + detail: None, + }] + }); + + assert_eq!(conn, vec!["existing-state", "validator-ran"]); + assert_eq!(checks.len(), 1); + assert_eq!(checks[0].object_name, "consumer_table"); + assert!(checks[0].passed); + } + + #[test] + fn schema_validator_is_domain_supplied() { + let mut domain_objects = ["domain_symbols", "domain_bm25_idx"].into_iter(); + + let checks = run_schema_validator(&mut domain_objects, |objects| { + objects + .map(|object_name| SchemaCheck { + object_name: object_name.to_string(), + check_kind: "consumer supplied".to_string(), + passed: true, + detail: None, + }) + .collect::>() + }); + + assert_eq!( + checks + .iter() + .map(|check| check.object_name.as_str()) + .collect::>(), + vec!["domain_symbols", "domain_bm25_idx"] + ); + } + + #[test] + fn validate_schema_accepts_postgres_client_validators() { + let _validate: fn(&mut Client, fn(&mut Client) -> Vec) -> Vec = + validate_schema; + } +} diff --git a/crates/gcore/src/project.rs b/crates/gcore/src/project.rs index f3ac65f..bb3e8fb 100644 --- a/crates/gcore/src/project.rs +++ b/crates/gcore/src/project.rs @@ -37,3 +37,31 @@ pub fn read_project_id(project_root: &Path) -> anyhow::Result { 
.map(String::from) .context("'id' field not found in .gobby/project.json") } + +#[cfg(test)] +mod tests { + use super::*; + use std::fs; + + #[test] + fn read_project_id_is_non_destructive() { + let tmp = tempfile::tempdir().expect("tempdir"); + let gobby_dir = tmp.path().join(".gobby"); + fs::create_dir(&gobby_dir).expect("create .gobby"); + let project_json = gobby_dir.join("project.json"); + let contents = r#"{ + "id": "project-id", + "name": "example" +} +"#; + fs::write(&project_json, contents).expect("write project json"); + + let project_id = read_project_id(tmp.path()).expect("read project id"); + + assert_eq!(project_id, "project-id"); + assert_eq!( + fs::read_to_string(&project_json).expect("read project json"), + contents + ); + } +} diff --git a/crates/gcore/src/provisioning.rs b/crates/gcore/src/provisioning.rs new file mode 100644 index 0000000..d7fb8d2 --- /dev/null +++ b/crates/gcore/src/provisioning.rs @@ -0,0 +1,844 @@ +//! Standalone bootstrap and Docker service provisioning. +//! +//! The bundled service assets mirror the Python daemon package layout. Runtime +//! callers can copy them into `~/.gobby/services` and start the same profiles +//! the daemon manages, then persist daemon-style bootstrap keys in `gcore.yaml`. 
+ +use std::collections::BTreeMap; +use std::fs; +use std::io::{Read as _, Write as _}; +use std::net::TcpStream; +use std::path::{Path, PathBuf}; +use std::process::Command; +use std::time::Duration; + +use serde::Deserialize; + +use crate::config::{ConfigSource, resolve_env_pattern}; + +pub const GCORE_CONFIG_FILENAME: &str = "gcore.yaml"; +pub const SERVICES_DIRNAME: &str = "services"; +pub const COMPOSE_FILENAME: &str = "docker-compose.yml"; + +pub const DEFAULT_POSTGRES_HOST: &str = "127.0.0.1"; +pub const DEFAULT_POSTGRES_PORT: u16 = 60891; +pub const DEFAULT_POSTGRES_DB: &str = "gobby"; +pub const DEFAULT_POSTGRES_USER: &str = "gobby"; +pub const DEFAULT_POSTGRES_PASSWORD: &str = "gobby_dev"; + +pub const DEFAULT_FALKORDB_HOST: &str = "127.0.0.1"; +pub const DEFAULT_FALKORDB_PORT: u16 = 16379; +pub const DEFAULT_FALKORDB_BROWSER_PORT: u16 = 13000; +pub const DEFAULT_FALKORDB_PASSWORD: &str = "gobbyfalkor"; + +pub const DEFAULT_QDRANT_HTTP_PORT: u16 = 6333; +pub const DEFAULT_QDRANT_GRPC_PORT: u16 = 6334; + +pub const DEFAULT_LM_STUDIO_API_BASE: &str = "http://localhost:1234/v1"; +pub const DEFAULT_LM_STUDIO_MODEL: &str = "text-embedding-nomic-embed-text-v1.5@f16"; +pub const DEFAULT_OLLAMA_API_BASE: &str = "http://localhost:11434/v1"; +pub const DEFAULT_OLLAMA_MODEL: &str = "nomic-embed-text"; +pub const DEFAULT_EMBEDDING_VECTOR_DIM: usize = 768; + +pub const COMPOSE_TEMPLATE: &str = include_str!("../assets/docker-compose.services.yml"); +const PGSEARCH_DOCKERFILE: &str = include_str!("../assets/postgres-pgsearch/Dockerfile"); +const PGSEARCH_VERSION: &str = include_str!("../assets/postgres-pgsearch/version.json"); +const PGSEARCH_INIT_PG_SEARCH: &str = + include_str!("../assets/postgres-pgsearch/initdb.d/01-pg_search.sql"); +const PGSEARCH_INIT_PGAUDIT: &str = + include_str!("../assets/postgres-pgsearch/initdb.d/02-pgaudit.sql"); +const PG_AUDIT_EXPORT: &str = + include_str!("../assets/postgres-pgsearch/scripts/pg_audit_export.sh"); + +#[derive(Debug, 
Clone, Default, PartialEq, Eq)] +pub struct StandaloneConfig { + values: BTreeMap<String, String>, +} + +impl StandaloneConfig { + pub fn new(values: BTreeMap<String, String>) -> Self { + Self { values } + } + + pub fn empty() -> Self { + Self::default() + } + + pub fn read_at(path: &Path) -> anyhow::Result<Option<Self>> { + if !path.exists() { + return Ok(None); + } + let contents = fs::read_to_string(path) + .map_err(|err| anyhow::anyhow!("failed to read {}: {err}", path.display()))?; + Self::from_yaml_str(&contents) + .map(Some) + .map_err(|err| anyhow::anyhow!("failed to parse {}: {err}", path.display())) + } + + pub fn from_yaml_str(contents: &str) -> anyhow::Result<Self> { + if contents.trim().is_empty() { + return Ok(Self::default()); + } + let yaml: serde_yaml::Value = serde_yaml::from_str(contents)?; + let mut values = BTreeMap::new(); + flatten_yaml_value(None, &yaml, &mut values)?; + Ok(Self { values }) + } + + pub fn write_at(&self, path: &Path) -> anyhow::Result<()> { + if let Some(parent) = path.parent() { + fs::create_dir_all(parent)?; + } + let mut mapping = serde_yaml::Mapping::new(); + for (key, value) in &self.values { + mapping.insert( + serde_yaml::Value::String(key.clone()), + serde_yaml::Value::String(value.clone()), + ); + } + let yaml = serde_yaml::to_string(&serde_yaml::Value::Mapping(mapping))?; + fs::write(path, yaml)?; + Ok(()) + } + + pub fn get(&self, key: &str) -> Option<&str> { + self.values.get(key).map(String::as_str) + } + + pub fn set(&mut self, key: impl Into<String>, value: impl Into<String>) { + self.values.insert(key.into(), value.into()); + } + + pub fn remove(&mut self, key: &str) { + self.values.remove(key); + } + + pub fn values(&self) -> &BTreeMap<String, String> { + &self.values + } +} + +impl ConfigSource for StandaloneConfig { + fn config_value(&mut self, key: &str) -> Option<String> { + if key == "embeddings.api_key" + && let Some(env_name) = self.values.get("embeddings.api_key_env") + && !env_name.trim().is_empty() + { + return std::env::var(env_name.trim()) + .ok() + .filter(|value| 
!value.trim().is_empty()); + } + self.values.get(key).cloned().or_else(|| match key { + "databases.falkordb.requirepass" => { + self.values.get("databases.falkordb.password").cloned() + } + _ => None, + }) + } + + fn resolve_value(&mut self, value: &str) -> anyhow::Result<String> { + if value.contains("$secret:") { + anyhow::bail!("secret resolution requires daemon config_store"); + } + resolve_env_pattern(value)?.ok_or_else(|| anyhow::anyhow!("unresolved pattern: {value}")) + } +} + +pub fn gcore_config_path(gobby_home: &Path) -> PathBuf { + gobby_home.join(GCORE_CONFIG_FILENAME) +} + +pub fn services_dir(gobby_home: &Path) -> PathBuf { + gobby_home.join(SERVICES_DIRNAME) +} + +pub fn compose_file_path(gobby_home: &Path) -> PathBuf { + services_dir(gobby_home).join(COMPOSE_FILENAME) +} + +pub fn default_database_url(port: u16) -> String { + format!( + "postgresql://{user}:{password}@localhost:{port}/{db}", + user = DEFAULT_POSTGRES_USER, + password = DEFAULT_POSTGRES_PASSWORD, + db = DEFAULT_POSTGRES_DB + ) +} + +#[derive(Debug, Clone, PartialEq, Eq)] +pub struct DockerServiceOptions { + pub gobby_home: PathBuf, + pub postgres_port: u16, + pub qdrant_http_port: u16, + pub qdrant_grpc_port: u16, + pub falkordb_host: String, + pub falkordb_port: u16, + pub falkordb_browser_port: u16, + pub falkordb_password: String, +} + +impl DockerServiceOptions { + pub fn new(gobby_home: PathBuf) -> Self { + Self { + gobby_home, + postgres_port: DEFAULT_POSTGRES_PORT, + qdrant_http_port: DEFAULT_QDRANT_HTTP_PORT, + qdrant_grpc_port: DEFAULT_QDRANT_GRPC_PORT, + falkordb_host: DEFAULT_FALKORDB_HOST.to_string(), + falkordb_port: DEFAULT_FALKORDB_PORT, + falkordb_browser_port: DEFAULT_FALKORDB_BROWSER_PORT, + falkordb_password: DEFAULT_FALKORDB_PASSWORD.to_string(), + } + } + + pub fn database_url(&self) -> String { + default_database_url(self.postgres_port) + } + + pub fn qdrant_url(&self) -> String { + format!("http://localhost:{}", self.qdrant_http_port) + } +} + +#[derive(Debug, Clone, 
PartialEq, Eq)] +pub struct ServiceAssetReport { + pub services_dir: PathBuf, + pub compose_file: PathBuf, + pub env_file: PathBuf, + pub postgres_asset_dir: PathBuf, +} + +#[derive(Debug, Clone, PartialEq, Eq)] +pub struct DockerProvisioningReport { + pub services_dir: PathBuf, + pub compose_file: PathBuf, + pub env_file: PathBuf, + pub started_profiles: Vec<String>, + pub health_checks: Vec<String>, +} + +#[derive(Debug, Clone, PartialEq, Eq)] +pub struct CommandSpec { + pub program: String, + pub args: Vec<String>, + pub env: BTreeMap<String, String>, + pub cwd: Option<PathBuf>, +} + +#[derive(Debug, Clone, PartialEq, Eq)] +pub struct CommandOutput { + pub status: i32, + pub stdout: String, + pub stderr: String, +} + +pub trait CommandRunner { + fn run(&mut self, spec: &CommandSpec) -> std::io::Result<CommandOutput>; +} + +pub struct RealCommandRunner; + +impl CommandRunner for RealCommandRunner { + fn run(&mut self, spec: &CommandSpec) -> std::io::Result<CommandOutput> { + let mut command = Command::new(&spec.program); + command.args(&spec.args); + if let Some(cwd) = &spec.cwd { + command.current_dir(cwd); + } + for (key, value) in &spec.env { + command.env(key, value); + } + let output = command.output()?; + Ok(CommandOutput { + status: output.status.code().unwrap_or(1), + stdout: String::from_utf8_lossy(&output.stdout).into_owned(), + stderr: String::from_utf8_lossy(&output.stderr).into_owned(), + }) + } +} + +pub trait DockerHealthChecker { + fn wait_postgres(&mut self, host: &str, port: u16) -> anyhow::Result<()>; + fn wait_qdrant(&mut self, host: &str, port: u16) -> anyhow::Result<()>; + fn wait_falkordb(&mut self, host: &str, port: u16) -> anyhow::Result<()>; +} + +pub struct TcpDockerHealthChecker { + pub retries: usize, + pub interval: Duration, +} + +impl Default for TcpDockerHealthChecker { + fn default() -> Self { + Self { + retries: 30, + interval: Duration::from_secs(2), + } + } +} + +impl DockerHealthChecker for TcpDockerHealthChecker { + fn wait_postgres(&mut self, host: &str, port: u16) -> anyhow::Result<()> { + 
wait_for_tcp(host, port, self.retries, self.interval) + .map_err(|err| anyhow::anyhow!("PostgreSQL did not become reachable: {err}")) + } + + fn wait_qdrant(&mut self, host: &str, port: u16) -> anyhow::Result<()> { + let healthz = || -> anyhow::Result<()> { + let mut stream = TcpStream::connect((host, port))?; + stream.set_read_timeout(Some(Duration::from_secs(3)))?; + stream.set_write_timeout(Some(Duration::from_secs(3)))?; + stream.write_all(b"GET /healthz HTTP/1.0\r\nHost: localhost\r\n\r\n")?; + let mut body = String::new(); + stream.read_to_string(&mut body)?; + if body.starts_with("HTTP/1.1 200") || body.starts_with("HTTP/1.0 200") { + Ok(()) + } else { + anyhow::bail!("unexpected Qdrant health response") + } + }; + wait_for(healthz, self.retries, self.interval) + .map_err(|err| anyhow::anyhow!("Qdrant did not become healthy: {err}")) + } + + fn wait_falkordb(&mut self, host: &str, port: u16) -> anyhow::Result<()> { + wait_for_tcp(host, port, self.retries, self.interval) + .map_err(|err| anyhow::anyhow!("FalkorDB did not become reachable: {err}")) + } +} + +pub fn provision_docker_services( + options: &DockerServiceOptions, +) -> anyhow::Result<DockerProvisioningReport> { + let mut runner = RealCommandRunner; + let mut health = TcpDockerHealthChecker::default(); + provision_docker_services_with(options, &mut runner, &mut health) +} + +pub fn provision_docker_services_with( + options: &DockerServiceOptions, + runner: &mut impl CommandRunner, + health: &mut impl DockerHealthChecker, +) -> anyhow::Result<DockerProvisioningReport> { + let assets = prepare_service_assets(options)?; + let spec = docker_compose_up_spec(options, &assets.compose_file, &assets.services_dir); + let output = runner.run(&spec).map_err(|err| { + anyhow::anyhow!("failed to execute docker compose for standalone services: {err}") + })?; + if output.status != 0 { + anyhow::bail!( + "docker compose up failed: {}", + first_non_empty(&output.stderr, &output.stdout) + ); + } + + health.wait_postgres(DEFAULT_POSTGRES_HOST, options.postgres_port)?; + 
health.wait_qdrant(DEFAULT_POSTGRES_HOST, options.qdrant_http_port)?; + health.wait_falkordb(&options.falkordb_host, options.falkordb_port)?; + + Ok(DockerProvisioningReport { + services_dir: assets.services_dir, + compose_file: assets.compose_file, + env_file: assets.env_file, + started_profiles: vec!["all".to_string()], + health_checks: vec![ + "postgres".to_string(), + "qdrant".to_string(), + "falkordb".to_string(), + ], + }) +} + +pub fn prepare_service_assets( + options: &DockerServiceOptions, +) -> anyhow::Result<ServiceAssetReport> { + let services = services_dir(&options.gobby_home); + let compose = services.join(COMPOSE_FILENAME); + let pgsearch = services.join("postgres-pgsearch"); + let env_file = services.join(".env"); + + fs::create_dir_all(pgsearch.join("initdb.d"))?; + fs::create_dir_all(pgsearch.join("scripts"))?; + fs::write(&compose, COMPOSE_TEMPLATE)?; + fs::write(pgsearch.join("Dockerfile"), PGSEARCH_DOCKERFILE)?; + fs::write(pgsearch.join("version.json"), PGSEARCH_VERSION)?; + fs::write( + pgsearch.join("initdb.d").join("01-pg_search.sql"), + PGSEARCH_INIT_PG_SEARCH, + )?; + fs::write( + pgsearch.join("initdb.d").join("02-pgaudit.sql"), + PGSEARCH_INIT_PGAUDIT, + )?; + let audit_script = pgsearch.join("scripts").join("pg_audit_export.sh"); + fs::write(&audit_script, PG_AUDIT_EXPORT)?; + make_executable(&audit_script)?; + + let manifest = pgsearch_manifest()?; + update_env_file( + &env_file, + BTreeMap::from([ + ( + "GOBBY_PG_SEARCH_VERSION".to_string(), + manifest.pg_search_version, + ), + ("GOBBY_PG_SEARCH_SHA256".to_string(), manifest.sha256), + ( + "GOBBY_POSTGRES_PORT".to_string(), + options.postgres_port.to_string(), + ), + ( + "GOBBY_POSTGRES_DB".to_string(), + DEFAULT_POSTGRES_DB.to_string(), + ), + ( + "GOBBY_POSTGRES_USER".to_string(), + DEFAULT_POSTGRES_USER.to_string(), + ), + ( + "GOBBY_POSTGRES_PASSWORD".to_string(), + DEFAULT_POSTGRES_PASSWORD.to_string(), + ), + ( + "GOBBY_QDRANT_HTTP_PORT".to_string(), + options.qdrant_http_port.to_string(), + ), + 
( + "GOBBY_QDRANT_GRPC_PORT".to_string(), + options.qdrant_grpc_port.to_string(), + ), + ( + "GOBBY_FALKORDB_PORT".to_string(), + options.falkordb_port.to_string(), + ), + ( + "GOBBY_FALKORDB_BROWSER_PORT".to_string(), + options.falkordb_browser_port.to_string(), + ), + ( + "GOBBY_FALKORDB_PASSWORD".to_string(), + options.falkordb_password.clone(), + ), + ]), + )?; + + Ok(ServiceAssetReport { + services_dir: services, + compose_file: compose, + env_file, + postgres_asset_dir: pgsearch, + }) +} + +pub fn docker_compose_up_spec( + options: &DockerServiceOptions, + compose_file: &Path, + services_dir: &Path, +) -> CommandSpec { + CommandSpec { + program: "docker".to_string(), + args: vec![ + "compose".to_string(), + "-f".to_string(), + compose_file.display().to_string(), + "--profile".to_string(), + "all".to_string(), + "up".to_string(), + "-d".to_string(), + "--remove-orphans".to_string(), + ], + env: BTreeMap::from([ + ( + "GOBBY_FALKORDB_PASSWORD".to_string(), + options.falkordb_password.clone(), + ), + ( + "GOBBY_POSTGRES_PORT".to_string(), + options.postgres_port.to_string(), + ), + ( + "GOBBY_QDRANT_HTTP_PORT".to_string(), + options.qdrant_http_port.to_string(), + ), + ]), + cwd: Some(services_dir.to_path_buf()), + } +} + +#[derive(Debug, Clone, PartialEq, Eq)] +pub struct EmbeddingBootstrap { + pub provider: String, + pub api_base: String, + pub model: String, + pub vector_dim: usize, + pub api_key_env: Option<String>, +} + +impl EmbeddingBootstrap { + pub fn lm_studio() -> Self { + Self { + provider: "lm-studio".to_string(), + api_base: DEFAULT_LM_STUDIO_API_BASE.to_string(), + model: DEFAULT_LM_STUDIO_MODEL.to_string(), + vector_dim: DEFAULT_EMBEDDING_VECTOR_DIM, + api_key_env: None, + } + } + + pub fn ollama() -> Self { + Self { + provider: "ollama".to_string(), + api_base: DEFAULT_OLLAMA_API_BASE.to_string(), + model: DEFAULT_OLLAMA_MODEL.to_string(), + vector_dim: DEFAULT_EMBEDDING_VECTOR_DIM, + api_key_env: None, + } + } +} + +pub fn write_standalone_bootstrap( + 
path: &Path, + database_url: &str, + options: &DockerServiceOptions, + compose_file: Option<&Path>, + embedding: Option<&EmbeddingBootstrap>, +) -> anyhow::Result<StandaloneConfig> { + let mut config = StandaloneConfig::empty(); + config.set("databases.postgres.dsn", database_url); + config.set("databases.falkordb.host", &options.falkordb_host); + config.set("databases.falkordb.port", options.falkordb_port.to_string()); + config.set("databases.falkordb.password", &options.falkordb_password); + config.set("databases.qdrant.url", options.qdrant_url()); + if let Some(embedding) = embedding { + config.set("embeddings.provider", &embedding.provider); + config.set("embeddings.api_base", &embedding.api_base); + config.set("embeddings.model", &embedding.model); + config.set("embeddings.vector_dim", embedding.vector_dim.to_string()); + if let Some(api_key_env) = &embedding.api_key_env { + config.set("embeddings.api_key_env", api_key_env); + } + } + if let Some(compose_file) = compose_file { + config.set("services.compose_file", compose_file.display().to_string()); + } + config.write_at(path)?; + Ok(config) +} + +fn flatten_yaml_value( + prefix: Option<&str>, + value: &serde_yaml::Value, + output: &mut BTreeMap<String, String>, +) -> anyhow::Result<()> { + match value { + serde_yaml::Value::Null => Ok(()), + serde_yaml::Value::Mapping(mapping) => { + for (key, value) in mapping { + let Some(key) = key.as_str() else { + anyhow::bail!("gcore.yaml keys must be strings"); + }; + let joined = match prefix { + Some(prefix) if !prefix.is_empty() => format!("{prefix}.{key}"), + _ => key.to_string(), + }; + match value { + serde_yaml::Value::Mapping(_) if !key.contains('.') => { + flatten_yaml_value(Some(&joined), value, output)?; + } + _ => { + if let Some(text) = scalar_to_string(value)? { + output.insert(joined, text); + } + } + } + } + Ok(()) + } + _ => { + let Some(prefix) = prefix else { + anyhow::bail!("gcore.yaml must be a mapping"); + }; + if let Some(text) = scalar_to_string(value)? 
{ + output.insert(prefix.to_string(), text); + } + Ok(()) + } + } +} + +fn scalar_to_string(value: &serde_yaml::Value) -> anyhow::Result<Option<String>> { + Ok(match value { + serde_yaml::Value::Null => None, + serde_yaml::Value::String(value) => Some(value.clone()), + serde_yaml::Value::Bool(value) => Some(value.to_string()), + serde_yaml::Value::Number(value) => Some(value.to_string()), + other => Some(serde_yaml::to_string(other)?.trim().to_string()), + }) +} + +#[derive(Debug, Deserialize)] +struct PgSearchVersionFile { + pg_search_version: String, + pg_search_sha256: String, + pg_search_sha256_by_arch: Option<BTreeMap<String, String>>, +} + +struct PgSearchManifest { + pg_search_version: String, + sha256: String, +} + +fn pgsearch_manifest() -> anyhow::Result<PgSearchManifest> { + let parsed: PgSearchVersionFile = serde_json::from_str(PGSEARCH_VERSION)?; + let arch = debian_arch(std::env::consts::ARCH); + let sha256 = parsed + .pg_search_sha256_by_arch + .and_then(|by_arch| by_arch.get(&arch).cloned()) + .unwrap_or(parsed.pg_search_sha256); + Ok(PgSearchManifest { + pg_search_version: parsed.pg_search_version, + sha256, + }) +} + +fn debian_arch(arch: &str) -> String { + match arch { + "x86_64" | "amd64" => "amd64".to_string(), + "aarch64" | "arm64" => "arm64".to_string(), + other => other.to_string(), + } +} + +fn update_env_file(path: &Path, updates: BTreeMap<String, String>) -> anyhow::Result<()> { + if let Some(parent) = path.parent() { + fs::create_dir_all(parent)?; + } + let mut lines = Vec::new(); + if path.exists() { + for line in fs::read_to_string(path)?.lines() { + let key = line.split_once('=').map(|(key, _)| key).unwrap_or(line); + if !updates.contains_key(key) { + lines.push(line.to_string()); + } + } + if lines.last().is_some_and(|line| !line.trim().is_empty()) { + lines.push(String::new()); + } + } + for (key, value) in updates { + lines.push(format!("{key}={value}")); + } + fs::write(path, format!("{}\n", lines.join("\n")))?; + Ok(()) +} + +fn first_non_empty<'a>(first: &'a str, second: &'a str) -> &'a str { + if 
first.trim().is_empty() { + second.trim() + } else { + first.trim() + } +} + +fn wait_for_tcp(host: &str, port: u16, retries: usize, interval: Duration) -> anyhow::Result<()> { + wait_for( + || { + TcpStream::connect((host, port)) + .map(|_| ()) + .map_err(Into::into) + }, + retries, + interval, + ) +} + +fn wait_for( + mut check: impl FnMut() -> anyhow::Result<()>, + retries: usize, + interval: Duration, +) -> anyhow::Result<()> { + let mut last_error = None; + for attempt in 0..retries { + match check() { + Ok(()) => return Ok(()), + Err(err) => last_error = Some(err), + } + if attempt + 1 < retries { + std::thread::sleep(interval); + } + } + Err(last_error.unwrap_or_else(|| anyhow::anyhow!("health check failed"))) +} + +fn make_executable(path: &Path) -> anyhow::Result<()> { + #[cfg(unix)] + { + use std::os::unix::fs::PermissionsExt; + let mut permissions = fs::metadata(path)?.permissions(); + permissions.set_mode(0o755); + fs::set_permissions(path, permissions)?; + } + #[cfg(not(unix))] + { + let _ = path; + } + Ok(()) +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn gcore_yaml_reads_flat_and_nested_keys() { + let config = StandaloneConfig::from_yaml_str( + r#" +databases.postgres.dsn: postgresql://flat/db +databases: + falkordb: + port: 16379 +embeddings: + api_key_env: OPENAI_API_KEY +"#, + ) + .expect("parse config"); + + assert_eq!( + config.get("databases.postgres.dsn"), + Some("postgresql://flat/db") + ); + assert_eq!(config.get("databases.falkordb.port"), Some("16379")); + assert_eq!(config.get("embeddings.api_key_env"), Some("OPENAI_API_KEY")); + } + + #[test] + fn gcore_yaml_writes_flat_keys() { + let dir = tempfile::tempdir().expect("tempdir"); + let path = dir.path().join(GCORE_CONFIG_FILENAME); + let mut config = StandaloneConfig::empty(); + config.set("databases.postgres.dsn", "postgresql://local/db"); + config.set("embeddings.vector_dim", "768"); + + config.write_at(&path).expect("write config"); + let raw = 
fs::read_to_string(&path).expect("read config"); + + assert!(raw.contains("databases.postgres.dsn:")); + assert!(raw.contains("embeddings.vector_dim:")); + assert_eq!( + StandaloneConfig::read_at(&path) + .expect("read config") + .expect("config present") + .get("embeddings.vector_dim"), + Some("768") + ); + } + + #[test] + fn standalone_config_resolves_service_keys_and_api_key_env() { + unsafe { std::env::set_var("GCORE_TEST_EMBEDDING_KEY", "test-key") }; + let mut config = StandaloneConfig::from_yaml_str( + r#" +databases.falkordb.host: 127.0.0.1 +databases.falkordb.port: "16379" +databases.falkordb.password: falkor-pass +databases.qdrant.url: http://localhost:6333 +embeddings.api_base: http://localhost:1234/v1 +embeddings.model: text-embedding-nomic-embed-text-v1.5@f16 +embeddings.api_key_env: GCORE_TEST_EMBEDDING_KEY +"#, + ) + .expect("parse config"); + + let falkor = crate::config::resolve_falkordb_config(&mut config).expect("falkor"); + assert_eq!(falkor.password.as_deref(), Some("falkor-pass")); + let qdrant = crate::config::resolve_qdrant_config(&mut config).expect("qdrant"); + assert_eq!(qdrant.url.as_deref(), Some("http://localhost:6333")); + let embedding = crate::config::resolve_embedding_config(&mut config).expect("embedding"); + assert_eq!(embedding.api_key.as_deref(), Some("test-key")); + unsafe { std::env::remove_var("GCORE_TEST_EMBEDDING_KEY") }; + } + + #[test] + fn compose_template_matches_daemon_checkout_when_present() { + let daemon = + Path::new("/Users/josh/Projects/gobby/src/gobby/data/docker-compose.services.yml"); + if !daemon.exists() { + return; + } + let daemon_template = fs::read_to_string(daemon).expect("read daemon compose template"); + assert_eq!(COMPOSE_TEMPLATE, daemon_template); + } + + #[test] + fn docker_provisioning_prepares_assets_runs_compose_and_health_checks() { + let dir = tempfile::tempdir().expect("tempdir"); + let mut runner = RecordingRunner::default(); + let mut health = RecordingHealth::default(); + let options = 
DockerServiceOptions::new(dir.path().join(".gobby")); + + let report = provision_docker_services_with(&options, &mut runner, &mut health) + .expect("provision services"); + + assert_eq!(runner.commands.len(), 1); + assert_eq!(runner.commands[0].program, "docker"); + assert!(runner.commands[0].args.contains(&"--profile".to_string())); + assert!(runner.commands[0].args.contains(&"all".to_string())); + assert_eq!(health.checks, vec!["postgres", "qdrant", "falkordb"]); + assert_eq!(report.started_profiles, vec!["all"]); + assert_eq!(report.health_checks, vec!["postgres", "qdrant", "falkordb"]); + assert_eq!( + fs::read_to_string(&report.compose_file).expect("read compose"), + COMPOSE_TEMPLATE + ); + assert!( + report + .services_dir + .join("postgres-pgsearch") + .join("Dockerfile") + .exists() + ); + assert!( + fs::read_to_string(&report.env_file) + .expect("read env") + .contains("GOBBY_PG_SEARCH_VERSION=0.23.4") + ); + } + + #[derive(Default)] + struct RecordingRunner { + commands: Vec<CommandSpec>, + } + + impl CommandRunner for RecordingRunner { + fn run(&mut self, spec: &CommandSpec) -> std::io::Result<CommandOutput> { + self.commands.push(spec.clone()); + Ok(CommandOutput { + status: 0, + stdout: String::new(), + stderr: String::new(), + }) + } + } + + #[derive(Default)] + struct RecordingHealth { + checks: Vec<&'static str>, + } + + impl DockerHealthChecker for RecordingHealth { + fn wait_postgres(&mut self, _host: &str, _port: u16) -> anyhow::Result<()> { + self.checks.push("postgres"); + Ok(()) + } + + fn wait_qdrant(&mut self, _host: &str, _port: u16) -> anyhow::Result<()> { + self.checks.push("qdrant"); + Ok(()) + } + + fn wait_falkordb(&mut self, _host: &str, _port: u16) -> anyhow::Result<()> { + self.checks.push("falkordb"); + Ok(()) + } + } +} diff --git a/crates/gcore/src/qdrant.rs b/crates/gcore/src/qdrant.rs new file mode 100644 index 0000000..9edacb7 --- /dev/null +++ b/crates/gcore/src/qdrant.rs @@ -0,0 +1,428 @@ +//! Qdrant foundation adapter boundary. +//! +//! 
This module is available with the `qdrant` feature. Consumers should surface +//! missing or unreachable vector services as typed degradation unless a command +//! explicitly requires semantic search. + +use crate::config::QdrantConfig; +use crate::degradation::ServiceState; +use serde_json::{Map, Value}; +use std::time::Duration; + +const QDRANT_TIMEOUT: Duration = Duration::from_secs(5); + +/// Scope for a Qdrant collection, allowing caller-controlled naming. +#[derive(Debug, Clone, Copy, PartialEq, Eq)] +pub enum CollectionScope<'a> { + /// `{namespace}:project:{id}` — per-project vector store. + Project(&'a str), + /// `{namespace}:topic:{name}` — topic-scoped store. + Topic(&'a str), + /// Verbatim collection name, without namespace prefixing. + Custom(&'a str), +} + +/// Build a collection name from namespace and scope. +pub fn collection_name(namespace: &str, scope: CollectionScope<'_>) -> String { + match scope { + CollectionScope::Project(id) => format!("{namespace}:project:{id}"), + CollectionScope::Topic(name) => format!("{namespace}:topic:{name}"), + CollectionScope::Custom(name) => name.to_string(), + } +} + +/// Vector upsert request with opaque domain payload. +#[derive(Debug, Clone, PartialEq)] +pub struct UpsertRequest { + pub id: String, + pub vector: Vec<f32>, + pub payload: Map<String, Value>, +} + +/// Vector search request with opaque domain filter. +#[derive(Debug, Clone, PartialEq)] +pub struct SearchRequest { + pub vector: Vec<f32>, + pub limit: usize, + pub filter: Option<Value>, +} + +/// Vector search result with score and opaque payload. +#[derive(Debug, Clone, PartialEq)] +pub struct SearchHit { + pub id: String, + pub score: f32, + pub payload: Map<String, Value>, +} + +/// Run a closure with Qdrant config, with typed degradation for missing config. 
+pub fn with_qdrant<T>( + config: Option<&QdrantConfig>, + default: T, + f: impl FnOnce(&QdrantConfig) -> anyhow::Result<T>, +) -> anyhow::Result<(T, ServiceState)> { + let Some(config) = config else { + return Ok((default, ServiceState::NotConfigured)); + }; + if config.url.is_none() { + return Ok((default, ServiceState::NotConfigured)); + } + + let value = f(config)?; + Ok((value, ServiceState::Available)) +} + +/// Execute a vector search via Qdrant REST API. +pub fn search( + config: &QdrantConfig, + collection: &str, + request: SearchRequest, +) -> anyhow::Result<Vec<SearchHit>> { + let url = config + .url + .as_deref() + .ok_or_else(|| anyhow::anyhow!("Qdrant URL not configured"))? + .trim_end_matches('/'); + let client = reqwest::blocking::Client::builder() + .timeout(QDRANT_TIMEOUT) + .build()?; + + let mut req = client.post(format!("{url}/collections/{collection}/points/search")); + if let Some(key) = &config.api_key { + req = req.header("api-key", key); + } + + let body = serde_json::json!({ + "vector": request.vector, + "limit": request.limit, + "filter": request.filter, + "with_payload": true, + }); + let resp = req.json(&body).send()?; + let status = resp.status(); + if !status.is_success() { + anyhow::bail!("Qdrant search failed: HTTP {status}"); + } + + let data: Value = resp.json()?; + let hits = data + .get("result") + .and_then(Value::as_array) + .map(|results| { + results + .iter() + .filter_map(parse_search_hit) + .collect::<Vec<_>>() + }) + .unwrap_or_default(); + + Ok(hits) +} + +/// Execute a batch vector upsert via Qdrant REST API. +pub fn upsert( + config: &QdrantConfig, + collection: &str, + points: Vec<UpsertRequest>, +) -> anyhow::Result<()> { + let url = config + .url + .as_deref() + .ok_or_else(|| anyhow::anyhow!("Qdrant URL not configured"))? 
+ .trim_end_matches('/'); + let client = reqwest::blocking::Client::builder() + .timeout(QDRANT_TIMEOUT) + .build()?; + + let points: Vec<Value> = points + .into_iter() + .map(|point| { + serde_json::json!({ + "id": point.id, + "vector": point.vector, + "payload": point.payload, + }) + }) + .collect(); + let body = serde_json::json!({ "points": points }); + + let mut req = client.put(format!("{url}/collections/{collection}/points")); + if let Some(key) = &config.api_key { + req = req.header("api-key", key); + } + + let resp = req.json(&body).send()?; + let status = resp.status(); + if !status.is_success() { + anyhow::bail!("Qdrant upsert failed: HTTP {status}"); + } + + Ok(()) +} + +fn parse_search_hit(hit: &Value) -> Option<SearchHit> { + let id = parse_point_id(hit.get("id")?)?; + let score = hit.get("score")?.as_f64()? as f32; + let payload = hit + .get("payload") + .and_then(Value::as_object) + .cloned() + .unwrap_or_default(); + + Some(SearchHit { id, score, payload }) +} + +fn parse_point_id(id: &Value) -> Option<String> { + match id { + Value::String(value) => Some(value.clone()), + Value::Number(value) => Some(value.to_string()), + _ => None, + } +} + +#[cfg(test)] +mod tests { + use super::*; + use crate::config::QdrantConfig; + use crate::degradation::ServiceState; + use serde_json::{Map, Value, json}; + use std::io::{Read, Write}; + use std::net::TcpListener; + use std::thread; + + #[test] + fn collection_name_covers_all_scopes() { + assert_eq!( + collection_name("gwiki", CollectionScope::Project("abc-123")), + "gwiki:project:abc-123" + ); + assert_eq!( + collection_name("gwiki", CollectionScope::Topic("rust-async")), + "gwiki:topic:rust-async" + ); + assert_eq!( + collection_name("gcode", CollectionScope::Custom("code_symbols_abc-123")), + "code_symbols_abc-123" + ); + } + + #[test] + fn payload_schema_is_opaque() { + let mut payload = Map::new(); + payload.insert("symbol_id".to_string(), json!("sym-1")); + payload.insert("wiki".to_string(), json!({"topic": "rust"})); + + let 
upsert = UpsertRequest { + id: "point-1".to_string(), + vector: vec![0.25, 0.5], + payload: payload.clone(), + }; + let search = SearchRequest { + vector: vec![0.25, 0.5], + limit: 5, + filter: Some(json!({"must": [{"key": "kind", "match": {"value": "fn"}}]})), + }; + + assert_eq!(upsert.payload, payload); + assert_eq!(search.filter.as_ref().unwrap()["must"][0]["key"], "kind"); + } + + #[test] + fn with_qdrant_degradation_contract() { + let config = QdrantConfig { + url: Some("http://localhost:6333".to_string()), + api_key: None, + }; + let missing_url = QdrantConfig { + url: None, + api_key: None, + }; + + assert_eq!( + with_qdrant(None, vec!["default"], |_| Ok(vec!["value"])).unwrap(), + (vec!["default"], ServiceState::NotConfigured) + ); + assert_eq!( + with_qdrant(Some(&missing_url), 7, |_| Ok(9)).unwrap(), + (7, ServiceState::NotConfigured) + ); + assert_eq!( + with_qdrant(Some(&config), "default", |_| Ok("value")).unwrap(), + ("value", ServiceState::Available) + ); + + let err = with_qdrant(Some(&config), 0, |_| anyhow::bail!("qdrant failed")) + .expect_err("closure errors propagate"); + assert_eq!(err.to_string(), "qdrant failed"); + } + + #[test] + fn sync_search_from_cli_path() { + let (base_url, request_handle) = spawn_qdrant_response( + 200, + json!({ + "result": [ + { + "id": "point-1", + "score": 0.93, + "payload": {"symbol_id": "sym-1", "kind": "function"} + } + ] + }), + ); + let config = QdrantConfig { + url: Some(base_url), + api_key: Some("secret-key".to_string()), + }; + + let hits = search( + &config, + "code_symbols_project", + SearchRequest { + vector: vec![0.1, 0.2], + limit: 3, + filter: Some(json!({"must": []})), + }, + ) + .expect("search succeeds"); + let request = request_handle.join().expect("request thread"); + + assert_eq!(hits.len(), 1); + assert_eq!(hits[0].id, "point-1"); + assert_eq!(hits[0].score, 0.93); + assert_eq!(hits[0].payload["symbol_id"], "sym-1"); + assert!(request.contains("POST 
/collections/code_symbols_project/points/search HTTP/1.1")); + assert!(request.contains("api-key: secret-key")); + assert!(request.contains(r#""with_payload":true"#)); + } + + #[test] + fn with_qdrant_search_composition() { + let (base_url, request_handle) = spawn_qdrant_response( + 200, + json!({"result": [{"id": "point-1", "score": 0.5, "payload": {}}]}), + ); + let config = QdrantConfig { + url: Some(base_url), + api_key: None, + }; + + let (hits, state) = with_qdrant(Some(&config), vec![], |cfg| { + search( + cfg, + "collection", + SearchRequest { + vector: vec![0.1], + limit: 1, + filter: None, + }, + ) + }) + .expect("composed search"); + request_handle.join().expect("request thread"); + + assert_eq!(state, ServiceState::Available); + assert_eq!(hits[0].id, "point-1"); + } + + #[test] + fn custom_scope_returns_verbatim_name() { + assert_eq!( + collection_name("ignored", CollectionScope::Custom("code_symbols_project-1")), + "code_symbols_project-1" + ); + } + + #[test] + fn qdrant_single_state_boundary() { + let missing_url = QdrantConfig { + url: None, + api_key: None, + }; + let (default_hits, state) = + with_qdrant(Some(&missing_url), Vec::::new(), |_| { + unreachable!("search should not run without qdrant url") + }) + .expect("missing url degrades"); + assert_eq!(default_hits.len(), 0); + assert_eq!(state, ServiceState::NotConfigured); + + let (base_url, request_handle) = + spawn_qdrant_response(503, json!({"status": "service unavailable"})); + let config = QdrantConfig { + url: Some(base_url), + api_key: None, + }; + let err = with_qdrant(Some(&config), Vec::::new(), |cfg| { + search( + cfg, + "collection", + SearchRequest { + vector: vec![0.1], + limit: 1, + filter: None, + }, + ) + }) + .expect_err("http errors propagate out of qdrant boundary"); + request_handle.join().expect("request thread"); + + assert!(err.to_string().contains("Qdrant search failed: HTTP 503")); + } + + fn spawn_qdrant_response(status: u16, body: Value) -> (String, 
thread::JoinHandle) { + let listener = TcpListener::bind("127.0.0.1:0").expect("bind test server"); + let addr = listener.local_addr().expect("local addr"); + let handle = thread::spawn(move || { + let (mut stream, _) = listener.accept().expect("accept request"); + let request = read_http_request(&mut stream); + + let body = body.to_string(); + write!( + stream, + "HTTP/1.1 {status} OK\r\nContent-Type: application/json\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{body}", + body.len() + ) + .expect("write response"); + + request + }); + + (format!("http://{addr}"), handle) + } + + fn read_http_request(stream: &mut impl Read) -> String { + let mut request = Vec::new(); + let mut buffer = [0; 4096]; + let mut expected_len = None; + + loop { + let n = stream.read(&mut buffer).expect("read request"); + if n == 0 { + break; + } + request.extend_from_slice(&buffer[..n]); + + if expected_len.is_none() + && let Some(header_end) = + request.windows(4).position(|window| window == b"\r\n\r\n") + { + let headers = String::from_utf8_lossy(&request[..header_end]); + let content_len = headers + .lines() + .find_map(|line| line.strip_prefix("content-length: ")) + .and_then(|value| value.parse::().ok()) + .unwrap_or(0); + expected_len = Some(header_end + 4 + content_len); + } + + if let Some(expected_len) = expected_len + && request.len() >= expected_len + { + break; + } + } + + String::from_utf8_lossy(&request).into_owned() + } +} diff --git a/crates/gcore/src/search.rs b/crates/gcore/src/search.rs new file mode 100644 index 0000000..0bc81cd --- /dev/null +++ b/crates/gcore/src/search.rs @@ -0,0 +1,205 @@ +//! Generic search result and rank-fusion primitives. +//! +//! This module is available with the `search` feature. Domain-specific query +//! behavior stays with the consuming crate. + +use serde::{Deserialize, Serialize}; +use std::collections::HashMap; + +/// RRF constant — matches Python RRF_K in code_index/searcher.py. 
+const RRF_K: f64 = 60.0;
+
+/// A search result from any source, with opaque identity and metadata.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct SearchResult {
+    /// Opaque identifier (symbol UUID, doc UUID, chunk ID, etc.).
+    pub id: String,
+    /// Combined score after fusion.
+    pub score: f64,
+    /// Which sources contributed this result.
+    pub sources: Vec<String>,
+    /// Source-level explanations for debugging.
+    #[serde(skip_serializing_if = "Vec::is_empty")]
+    pub explanations: Vec<SourceExplanation>,
+}
+
+/// Per-source contribution to a fused search result.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct SourceExplanation {
+    pub source: String,
+    pub rank: usize,
+    pub score: f64,
+}
+
+/// Metadata for a search that had unavailable sources.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct SearchDegradation {
+    pub unavailable_sources: Vec<String>,
+    pub available_sources: Vec<String>,
+}
+
+/// Merge multiple ranked lists using Reciprocal Rank Fusion.
+///
+/// Each source is a `(name, ranked_ids)` pair where index 0 = most relevant.
+/// Returns results sorted by combined RRF score descending.
+pub fn rrf_merge(sources: Vec<(&str, Vec<String>)>) -> Vec<SearchResult> {
+    let mut entries: HashMap<String, Vec<SourceExplanation>> = HashMap::new();
+
+    for (source_name, ids) in &sources {
+        let mut best_rank: HashMap<&String, usize> = HashMap::new();
+        for (rank, id) in ids.iter().enumerate() {
+            best_rank
+                .entry(id)
+                .and_modify(|best| *best = (*best).min(rank))
+                .or_insert(rank);
+        }
+
+        for (id, rank) in best_rank {
+            let score = 1.0 / (RRF_K + rank as f64);
+            entries
+                .entry(id.clone())
+                .or_default()
+                .push(SourceExplanation {
+                    source: source_name.to_string(),
+                    rank,
+                    score,
+                });
+        }
+    }
+
+    let mut results: Vec<SearchResult> = entries
+        .into_iter()
+        .map(|(id, mut explanations)| {
+            explanations.sort_by(|a, b| a.source.cmp(&b.source));
+            let score = explanations
+                .iter()
+                .map(|explanation| explanation.score)
+                .sum();
+            let sources = explanations
+                .iter()
+                .map(|explanation| explanation.source.clone())
+                .collect();
+
+            SearchResult {
+                id,
+                score,
+                sources,
+                explanations,
+            }
+        })
+        .collect();
+
+    results.sort_by(|a, b| {
+        b.score
+            .partial_cmp(&a.score)
+            .unwrap_or(std::cmp::Ordering::Equal)
+            .then_with(|| a.id.cmp(&b.id))
+    });
+    results
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn rrf_preserves_explanations_and_degradation() {
+        let results = rrf_merge(vec![
+            ("semantic", vec!["b".to_string(), "a".to_string()]),
+            ("fts", vec!["a".to_string()]),
+        ]);
+
+        let a = results.iter().find(|result| result.id == "a").unwrap();
+        assert_eq!(a.sources, vec!["fts".to_string(), "semantic".to_string()]);
+        assert_eq!(a.explanations.len(), 2);
+        assert_eq!(a.explanations[0].source, "fts");
+        assert_eq!(a.explanations[0].rank, 0);
+        assert_eq!(a.explanations[1].source, "semantic");
+        assert_eq!(a.explanations[1].rank, 1);
+
+        let degradation = SearchDegradation {
+            unavailable_sources: vec!["fallback".to_string()],
+            available_sources: vec!["fts".to_string(), "semantic".to_string()],
+        };
+        assert_eq!(degradation.unavailable_sources, vec!["fallback"]);
+        assert_eq!(degradation.available_sources, vec!["fts", "semantic"]);
+    }
+
+    #[test]
+    fn search_result_is_cli_independent() {
+        let result = SearchResult {
+            id: "symbol-1".to_string(),
+            score: 0.25,
+            sources: vec!["fts".to_string()],
+            explanations: vec![SourceExplanation {
+                source: "fts".to_string(),
+                rank: 0,
+                score: 1.0 / 60.0,
+            }],
+        };
+
+        let json = serde_json::to_string(&result).unwrap();
+        assert!(json.contains("\"id\":\"symbol-1\""));
+
+        let round_trip: SearchResult = serde_json::from_str(&json).unwrap();
+        assert_eq!(round_trip.id, result.id);
+        assert_eq!(round_trip.sources, result.sources);
+        assert_eq!(round_trip.explanations[0].source, "fts");
+    }
+
+    #[test]
+    fn search_core_has_no_domain_queries() {
+        let source = include_str!("search.rs");
+        for forbidden in forbidden_domain_fragments() {
+            assert!(
+                !source.contains(&forbidden),
+                "search core should not contain domain-specific query fragment {forbidden:?}"
+            );
+        }
+    }
+
+    fn forbidden_domain_fragments() -> Vec<String> {
+        [
+            ["SEL", "ECT "],
+            ["FR", "OM "],
+            ["WHE", "RE "],
+            ["qd", "rant"],
+            ["pay", "load"],
+            ["CA", "LLS"],
+            ["gra", "ph"],
+            ["Fal", "kor"],
+            ["Gra", "ph"],
+        ]
+        .into_iter()
+        .map(|parts| parts.concat())
+        .collect()
+    }
+
+    #[test]
+    fn rrf_deduplicates_within_source() {
+        let results = rrf_merge(vec![("fts", vec!["a".to_string(), "a".to_string()])]);
+
+        assert_eq!(results.len(), 1);
+        assert_eq!(results[0].id, "a");
+        assert_eq!(results[0].sources, vec!["fts".to_string()]);
+        assert_eq!(results[0].explanations.len(), 1);
+        assert_eq!(results[0].explanations[0].rank, 0);
+        assert!((results[0].score - (1.0 / 60.0)).abs() < 1e-10);
+    }
+
+    #[test]
+    fn rrf_sorts_sources_deterministically() {
+        let results = rrf_merge(vec![
+            ("semantic", vec!["b".to_string()]),
+            ("fts", vec!["b".to_string()]),
+        ]);
+
+        assert_eq!(results[0].id, "b");
+        assert_eq!(
+            results[0].sources,
+            vec!["fts".to_string(), "semantic".to_string()]
+        );
+        assert_eq!(results[0].explanations[0].source, "fts");
+        assert_eq!(results[0].explanations[1].source, "semantic");
+    }
+}
diff --git a/crates/gcore/src/setup.rs b/crates/gcore/src/setup.rs
new file mode 100644
index 0000000..c6d9343
--- /dev/null
+++ b/crates/gcore/src/setup.rs
@@ -0,0 +1,295 @@
+//! Shared setup-mode boundary.
+//!
+//! Attached and standalone setup contracts belong here. Runtime callers should
+//! validate externally managed state explicitly and avoid implicit schema or
+//! service creation.
+
+pub use crate::degradation::{Guidance, SetupIssue};
+
+/// Datastore kind for setup object classification.
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub enum StoreKind {
+    /// PostgreSQL hub datastore.
+    Postgres,
+    /// FalkorDB graph datastore.
+    FalkorDB,
+    /// Qdrant vector datastore.
+    Qdrant,
+}
+
+/// Context supplied to validation callbacks.
+///
+/// Contains optional mutable connections to each datastore. Consumers use
+/// whichever connection their validator needs; `None` means the service is not
+/// configured. PostgreSQL is feature-gated because `postgres::Client::query`
+/// requires `&mut self`.
+pub struct ValidationContext<'a> {
+    /// PostgreSQL connection supplied by the caller when the `postgres` feature is enabled.
+    #[cfg(feature = "postgres")]
+    pub pg: Option<&'a mut postgres::Client>,
+    /// FalkorDB connection configuration, when configured.
+    pub falkor_config: Option<&'a crate::config::FalkorConfig>,
+    /// Qdrant connection configuration, when configured.
+    pub qdrant_config: Option<&'a crate::config::QdrantConfig>,
+}
+
+/// Result of running all attached-mode validators.
+#[derive(Debug, Default)]
+pub struct ValidationReport {
+    /// Names of objects that passed validation.
+    pub present: Vec<String>,
+    /// Objects that failed validation, with structured issue details.
+    pub missing: Vec<(String, SetupIssue)>,
+}
+
+impl ValidationReport {
+    /// Returns true when every required object passed validation.
+    pub fn is_healthy(&self) -> bool {
+        self.missing.is_empty()
+    }
+}
+
+/// Consumer-supplied validation callback for a required object.
+pub type RequiredValidator =
+    dyn for<'ctx> FnMut(&mut ValidationContext<'ctx>) -> Result<(), SetupIssue>;
+
+/// Required object that a consumer crate declares for setup validation.
+pub struct RequiredObject {
+    /// Human-readable name, such as `symbols table` or `wiki_docs table`.
+    pub name: String,
+    /// Store kind that owns the object.
+    pub store: StoreKind,
+    /// Consumer-supplied check function.
+    pub validator: Box<RequiredValidator>,
+}
+
+/// Attached-mode validation: check that externally managed resources exist.
+///
+/// Attached validation must never create, alter, or drop datastore schema.
+pub trait AttachedValidator {
+    /// Declare the objects this consumer requires.
+    fn required_objects(&self) -> Vec<RequiredObject>;
+
+    /// Run all validators and return a report of present and missing objects.
+    fn validate(&self, ctx: &mut ValidationContext<'_>) -> ValidationReport {
+        let mut report = ValidationReport::default();
+        for mut obj in self.required_objects() {
+            match (obj.validator)(ctx) {
+                Ok(()) => report.present.push(obj.name),
+                Err(issue) => report.missing.push((obj.name, issue)),
+            }
+        }
+        report
+    }
+}
+
+/// Context supplied to standalone setup creation callbacks.
+///
+/// PostgreSQL is feature-gated because `postgres::Client::execute` requires
+/// `&mut self` for DDL and DML operations.
+pub struct SetupContext<'a> {
+    /// PostgreSQL connection supplied by the caller when the `postgres` feature is enabled.
+    #[cfg(feature = "postgres")]
+    pub pg: Option<&'a mut postgres::Client>,
+    /// FalkorDB connection configuration, when configured.
+    pub falkor_config: Option<&'a crate::config::FalkorConfig>,
+    /// Qdrant connection configuration, when configured.
+    pub qdrant_config: Option<&'a crate::config::QdrantConfig>,
+    /// If true, skip prompts and apply defaults.
+    pub non_interactive: bool,
+}
+
+/// Report from a standalone setup creation run.
+#[derive(Debug, Default)]
+pub struct SetupReport {
+    /// Objects successfully created.
+    pub created: Vec<String>,
+    /// Objects that already existed and were skipped.
+    pub skipped: Vec<String>,
+    /// Objects that failed creation, with error detail.
+    pub failed: Vec<(String, String)>,
+}
+
+/// Error from standalone setup creation.
+#[derive(Debug, thiserror::Error)]
+pub enum SetupError {
+    /// Connection setup failed for a datastore.
+    #[error("connection failed for {store}: {message}")]
+    ConnectionFailed {
+        /// Store name.
+        store: String,
+        /// Diagnostic message.
+        message: String,
+    },
+    /// Object creation failed.
+    #[error("creation failed for {object}: {message}")]
+    CreationFailed {
+        /// Object name.
+        object: String,
+        /// Diagnostic message.
+        message: String,
+    },
+    /// Creation was attempted in attached mode.
+    #[error("setup refused in attached mode — use standalone setup")]
+    AttachedModeRefused,
+}
+
+/// Consumer-supplied creation callback for an owned object.
+pub type OwnedCreator = dyn for<'ctx> FnMut(&mut SetupContext<'ctx>) -> Result<(), SetupError>;
+
+/// An object that a consumer crate owns and can create in standalone mode.
+pub struct OwnedObject {
+    /// Human-readable name, such as `gcode_symbols table`.
+    pub name: String,
+    /// Store kind that owns the object.
+    pub store: StoreKind,
+    /// Consumer-supplied creation function.
+    pub creator: Box<OwnedCreator>,
+}
+
+/// Standalone-mode setup: explicit opt-in creation of consumer-owned resources.
+pub trait StandaloneSetup {
+    /// Namespace prefix for this consumer's owned resources, such as `gcode` or `gwiki`.
+    fn namespace(&self) -> &str;
+
+    /// Declare what this consumer owns and can create.
+    fn owned_objects(&self) -> Vec<OwnedObject>;
+
+    /// Create consumer-owned resources. Called only on an explicit setup command.
+    fn create(&self, ctx: &mut SetupContext<'_>) -> Result<SetupReport, SetupError>;
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use std::cell::{Cell, RefCell};
+    use std::rc::Rc;
+
+    #[test]
+    fn runtime_validation_reports_setup_guidance() {
+        struct RuntimeValidator;
+
+        impl AttachedValidator for RuntimeValidator {
+            fn required_objects(&self) -> Vec<RequiredObject> {
+                vec![
+                    RequiredObject {
+                        name: "symbols table".to_string(),
+                        store: StoreKind::Postgres,
+                        validator: Box::new(|_| Ok(())),
+                    },
+                    RequiredObject {
+                        name: "BM25 index".to_string(),
+                        store: StoreKind::Postgres,
+                        validator: Box::new(|_| {
+                            Err(SetupIssue {
+                                object_name: "BM25 index".to_string(),
+                                store: "postgres".to_string(),
+                                guidance: Guidance {
+                                    problem: "BM25 index is missing".to_string(),
+                                    action: "run the standalone setup command".to_string(),
+                                    command_hint: Some("gobby setup standalone".to_string()),
+                                },
+                            })
+                        }),
+                    },
+                ]
+            }
+        }
+
+        let falkor_config = crate::config::FalkorConfig {
+            host: "localhost".to_string(),
+            port: 16379,
+            password: None,
+        };
+        let mut ctx = ValidationContext {
+            #[cfg(feature = "postgres")]
+            pg: None,
+            falkor_config: Some(&falkor_config),
+            qdrant_config: None,
+        };
+
+        let report = RuntimeValidator.validate(&mut ctx);
+
+        assert!(!report.is_healthy());
+        assert_eq!(report.present, vec!["symbols table"]);
+        assert_eq!(report.missing.len(), 1);
+        let (object, issue) = &report.missing[0];
+        assert_eq!(object, "BM25 index");
+        assert_eq!(issue.object_name, "BM25 index");
+        assert_eq!(issue.guidance.problem, "BM25 index is missing");
+        assert_eq!(
+            issue.guidance.command_hint.as_deref(),
+            Some("gobby setup standalone")
+        );
+    }
+
+    #[test]
+    fn validator_can_query_through_mutable_context() {
+        let falkor_config = crate::config::FalkorConfig {
+            host: "graph.local".to_string(),
+            port: 16379,
+            password: None,
+        };
+        let mut ctx = ValidationContext {
+            #[cfg(feature = "postgres")]
+            pg: None,
+            falkor_config: Some(&falkor_config),
+            qdrant_config: None,
+        };
+        let observed_port =
Rc::new(Cell::new(None)); + let captured_port = Rc::clone(&observed_port); + let mut validator = RequiredObject { + name: "graph config".to_string(), + store: StoreKind::FalkorDB, + validator: Box::new(move |ctx| { + captured_port.set(ctx.falkor_config.map(|config| config.port)); + Ok(()) + }), + }; + + (validator.validator)(&mut ctx).expect("validator can read mutable context"); + + assert_eq!(observed_port.get(), Some(16379)); + } + + #[test] + fn creator_executes_without_moving_ownership() { + let mut ctx = SetupContext { + #[cfg(feature = "postgres")] + pg: None, + falkor_config: None, + qdrant_config: None, + non_interactive: true, + }; + let calls = Rc::new(RefCell::new(Vec::new())); + let first_calls = Rc::clone(&calls); + let second_calls = Rc::clone(&calls); + let mut creators = vec![ + OwnedObject { + name: "first table".to_string(), + store: StoreKind::Postgres, + creator: Box::new(move |ctx| { + assert!(ctx.non_interactive); + first_calls.borrow_mut().push("first"); + Ok(()) + }), + }, + OwnedObject { + name: "second table".to_string(), + store: StoreKind::Postgres, + creator: Box::new(move |ctx| { + assert!(ctx.non_interactive); + second_calls.borrow_mut().push("second"); + Ok(()) + }), + }, + ]; + + for creator in &mut creators { + (creator.creator)(&mut ctx).expect("creator can execute through mutable context"); + } + + assert!(ctx.non_interactive); + assert_eq!(*calls.borrow(), vec!["first", "second"]); + } +} diff --git a/crates/gcore/tests/public_boundary.rs b/crates/gcore/tests/public_boundary.rs new file mode 100644 index 0000000..807921c --- /dev/null +++ b/crates/gcore/tests/public_boundary.rs @@ -0,0 +1,98 @@ +use std::fs; +use std::path::PathBuf; + +fn crate_file(path: &str) -> String { + fs::read_to_string(PathBuf::from(env!("CARGO_MANIFEST_DIR")).join(path)) + .unwrap_or_else(|err| panic!("failed to read {path}: {err}")) +} + +fn repo_file(path: &str) -> String { + fs::read_to_string( + PathBuf::from(env!("CARGO_MANIFEST_DIR")) + 
.join("../..") + .join(path), + ) + .unwrap_or_else(|err| panic!("failed to read {path}: {err}")) +} + +#[test] +fn cargo_features_define_public_boundary() { + let manifest = crate_file("Cargo.toml"); + + for expected in [ + "default = []", + r#"postgres = ["dep:postgres", "dep:postgres-types"]"#, + r#"falkor = ["dep:falkordb", "dep:urlencoding"]"#, + r#"qdrant = ["dep:reqwest"]"#, + r#"indexing = ["dep:ignore", "dep:sha2"]"#, + "search = []", + r#"full = ["postgres", "falkor", "qdrant", "indexing", "search"]"#, + r#"serde = { version = "1", features = ["derive"] }"#, + r#"thiserror = "2""#, + r#"postgres = { version = "0.19", optional = true }"#, + r#"postgres-types = { version = "0.2", optional = true }"#, + r#"falkordb = { version = "0.2", optional = true }"#, + r#"reqwest = { version = "0.12", default-features = false, features = ["blocking", "json", "rustls-tls"], optional = true }"#, + r#"ignore = { version = "0.4", optional = true }"#, + r#"sha2 = { version = "0.10", optional = true }"#, + r#"urlencoding = { version = "2", optional = true }"#, + ] { + assert!( + manifest.contains(expected), + "Cargo.toml is missing expected public-boundary snippet: {expected}" + ); + } +} + +#[test] +fn lib_rs_exposes_lightweight_and_feature_gated_modules() { + let lib_rs = crate_file("src/lib.rs"); + + for expected in [ + "pub mod bootstrap;", + "pub mod daemon_url;", + "pub mod project;", + "pub mod config;", + "pub mod context;", + "pub mod degradation;", + "pub mod setup;", + r#"#[cfg(feature = "postgres")]"#, + "pub mod postgres;", + r#"#[cfg(feature = "falkor")]"#, + "pub mod falkor;", + r#"#[cfg(feature = "qdrant")]"#, + "pub mod qdrant;", + r#"#[cfg(feature = "indexing")]"#, + "pub mod indexing;", + r#"#[cfg(feature = "search")]"#, + "pub mod search;", + ] { + assert!( + lib_rs.contains(expected), + "lib.rs is missing expected public-boundary snippet: {expected}" + ); + } +} + +#[test] +fn development_guide_documents_foundation_boundary() { + let guide = 
repo_file("docs/guides/gcore-development-guide.md"); + + for expected in [ + "shared Rust migration substrate", + "Feature Gates", + "`postgres`", + "`falkor`", + "`qdrant`", + "`indexing`", + "`search`", + "`full`", + "Feature-gated modules", + "Adding a New Helper", + ] { + assert!( + guide.contains(expected), + "development guide is missing expected public-boundary text: {expected}" + ); + } +} diff --git a/crates/ghook/Cargo.toml b/crates/ghook/Cargo.toml index 94cca3d..277946d 100644 --- a/crates/ghook/Cargo.toml +++ b/crates/ghook/Cargo.toml @@ -22,7 +22,7 @@ base64 = "0.22" chrono = { version = "0.4", default-features = false, features = ["clock", "std"] } clap = { version = "4", features = ["derive"] } dirs = "6" -gobby-core = { path = "../gcore", version = "0.1.0" } +gobby-core = { path = "../gcore", version = "0.2" } libc = "0.2" serde = { version = "1", features = ["derive"] } serde_json = "1" diff --git a/docs/guides/gcode-development-guide.md b/docs/guides/gcode-development-guide.md index fefa032..fb9bc54 100644 --- a/docs/guides/gcode-development-guide.md +++ b/docs/guides/gcode-development-guide.md @@ -129,7 +129,7 @@ walker::discover_files(root, excludes) → indexer::upsert_symbols + upsert_file + upsert_chunks → PostgreSQL hub writes → upsert_imports + upsert_calls - → graph_synced=false / vectors_synced=false for daemon sync + → graph_synced=false / vectors_synced=false for projection sync ``` ### File Discovery (walker.rs) @@ -234,7 +234,7 @@ write `code_indexed_files` rows with `symbol_count=0`, so `tree`, `status`, and 1. **Hash comparison**: SHA-256 content hash per file, stored in `code_indexed_files` 2. **Stale detection**: Compare current hashes against stored hashes; files with changed hashes are re-indexed. -3. **Orphan cleanup**: Files in the DB that no longer exist on disk have their hub rows deleted. External cleanup (FalkorDB/Qdrant) is handled by the daemon's reconcile/sync workers. +3. 
**Orphan cleanup**: Files in the DB that no longer exist on disk have their FalkorDB code graph projection and Qdrant code-symbol vectors cleaned first, then their hub rows are deleted. Cleanup failures are recorded in `IndexOutcome.degraded`; PostgreSQL deletes still proceed. 4. **Per-file transactions**: PostgreSQL writes (delete old data, upsert symbols, upsert file, upsert content chunks, upsert imports/calls) are wrapped in a single transaction to prevent half-indexed files on crash. 5. **External sync flags**: Changed files are written with `graph_synced=false`, `vectors_synced=false`, and `graph_sync_attempted_at=NULL` so the daemon can regenerate graph and vector projections. @@ -275,17 +275,20 @@ local-symbol edges from unresolved or external calls, and `callee_external_module` preserves package/module provenance for external calls. The runtime schema validator requires these tables before gcode starts index/search work. -### Graph Lifecycle RPCs +### Graph Lifecycle -Read-side graph queries still go straight to FalkorDB. Graph lifecycle operations -are daemon-backed orchestration commands instead: +Read-side graph queries still go straight to FalkorDB. `gcode graph overview` +accepts `--limit N`, which maps to the daemon's +`GET /api/code-index/graph?limit=...` contract when the daemon delegates overview +reads to `gcode`. Graph lifecycle operations are Rust-owned FalkorDB operations: -- `gcode graph clear` → `POST /api/code-index/graph/clear?project_id=...` -- `gcode graph rebuild` → `POST /api/code-index/graph/rebuild?project_id=...` +- `gcode graph clear` clears the current resolved project id. +- `gcode graph clear --project-id ` clears by explicit project id before normal `Context::resolve()` and is the daemon stale-project cleanup path. +- `gcode graph rebuild` clears and rebuilds the current resolved project id from PostgreSQL graph facts. 
-These commands use the current resolved `Context.project_id`, require a daemon -URL, and fail hard on transport errors, non-2xx responses, or invalid JSON -success bodies. They do not talk to FalkorDB directly. +Graph clear uses `MATCH (n {project: $project})` plus the code-index label +predicate (`CodeFile`, `CodeSymbol`, `CodeModule`, `UnresolvedCallee`, +`ExternalSymbol`). It must not target memory graph labels or bridge ownership. ### UUID5 Parity @@ -507,10 +510,14 @@ Single match → used directly. Multiple matches → fail closed with alternativ ### Vector Lifecycle -1. **Index**: PostgreSQL writes mark files dirty for external sync; the daemon handles vector upserts when configured -2. **Re-index**: stale file cleanup and invalidation trigger corresponding Qdrant cleanup through daemon reconciliation +1. **Index**: PostgreSQL writes mark files dirty for external sync; `gcode index --sync-projections` handles vector upserts when Qdrant and embeddings are configured +2. **Re-index**: stale file cleanup deletes Qdrant points filtered by `project_id` and `file_path` before hub facts are removed 3. **Search**: embed query text, search Qdrant, return `(symbol_id, score)` pairs +Project vector clear/rebuild targets only the `code_symbols_{project_id}` +collection and filters by `project_id`; it must not list, drop, or mutate memory +vector collections. + ### Collection Naming `{collection_prefix}{project_id}` — default prefix is `code_symbols_` from Qdrant config. 
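The naming rule above can be sketched in Rust as follows. This is a minimal illustration, not the crate's API: `code_symbol_collection` and `is_code_symbol_collection` are hypothetical helper names, and only the `code_symbols_` default prefix and the `{collection_prefix}{project_id}` shape come from the guide.

```rust
/// Default prefix for code-symbol collections, as stated in the guide.
const DEFAULT_COLLECTION_PREFIX: &str = "code_symbols_";

/// Hypothetical helper: joins the configured prefix and the project id
/// with no separator, following `{collection_prefix}{project_id}`.
fn code_symbol_collection(prefix: &str, project_id: &str) -> String {
    format!("{prefix}{project_id}")
}

/// Hypothetical guard: true only when `name` starts with the code-symbol
/// prefix and carries a non-empty project id after it. A lifecycle command
/// could use a check like this before touching a collection, so that
/// memory vector collections are never matched.
fn is_code_symbol_collection(prefix: &str, name: &str) -> bool {
    name.len() > prefix.len() && name.starts_with(prefix)
}
```

With the default prefix, project `abc-123` maps to `code_symbols_abc-123`, while a name such as `gwiki:project:abc-123` fails the guard and is left alone.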
@@ -524,7 +531,7 @@ Each external service degrades independently: | FalkorDB | No config or connection refused | Graph commands return `[]` with hint; search loses graph boost | | Qdrant | No URL configured | Search loses semantic source; BM25 still works | | Embeddings API | No API base, auth failure, or request error | Semantic search disabled for that query | -| Daemon | Not running | Normal index/search still work; graph lifecycle RPCs fail and external sync waits for the daemon | +| Daemon | Not running | Normal index/search and configured graph/vector lifecycle still work; daemon automation is unavailable | | PostgreSQL hub | Missing bootstrap, non-postgres backend, unreachable DB, or missing schema | Runtime index/search commands fail clearly | The system always works without the daemon process once the PostgreSQL hub is configured with the required schema. @@ -576,4 +583,15 @@ gcode index --full # re-process all files, clean stale vectors gcode invalidate # destructive reset of current project's code-index rows ``` -_Last verified: 2026-05-24_ +`gcode invalidate` is project-scoped: PostgreSQL deletes are filtered by the +resolved project id, FalkorDB cleanup targets nodes with that project id, and +Qdrant cleanup targets only `code_symbols_{project_id}`. + +`gcode setup --standalone --overwrite-code-index` is the full standalone +code-index reset. It drops/recreates only allowlisted gcode PostgreSQL +relations and BM25 indexes, clears code-index graph labels in FalkorDB, and +deletes Qdrant collections with the `code_symbols_` prefix. Default standalone +setup fails on incompatible existing code-index state and prints the overwrite +rerun guidance. 
+ +_Last verified: 2026-05-28_ diff --git a/docs/guides/gcode-graph-core.md b/docs/guides/gcode-graph-core.md new file mode 100644 index 0000000..1e96184 --- /dev/null +++ b/docs/guides/gcode-graph-core.md @@ -0,0 +1,126 @@ +# gcode Graph Core Migration Contract + +This guide defines the contract between `gcode` and Gobby daemon consumers while +code graph and vector projection ownership moves into Rust. + +## Target Integration + +The target architecture is direct Rust linking. A future Rust daemon should call +the `gobby-code` library APIs directly instead of shelling out through the CLI. +The stable daemon-facing boundaries are the library modules that already avoid +CLI output and `clap` types: + +- `index::api` for code fact indexing. +- `projection::sync` for graph and vector projection sync reports. +- `graph::code_graph` and `graph::report` for graph lifecycle, graph reads, and + project graph reports. +- `vector::code_symbols` for code-symbol vector lifecycle and search helpers. + +The CLI commands remain compatibility wrappers over these APIs. They are useful +for temporary Python shims and operator workflows, but they are not the long-term +daemon integration surface. + +## Transitional Python Shims + +Python daemon consumers may temporarily shell out to stable `gcode` JSON +commands while the daemon is still Python-owned. + +`CodeIndexTrigger` is the daemon-triggered indexing entry point. During the +transition it should invoke: + +```bash +gcode index --sync-projections --format json +``` + +The JSON response contains indexing counts plus `projections.graph` and +`projections.vector` reports. Deleted-file projection cleanup failures appear in +the top-level `degraded` list and must be surfaced without blocking PostgreSQL +fact deletion. Each projection report includes: + +- `status`: `ok`, `degraded`, or `failed`. +- `synced_files`. +- `synced_symbols`. +- `degraded`. +- `error`: `null` or an object with `kind` and `message`. 
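The projection report fields listed above could be modeled on the consumer side roughly as below. This is a sketch under stated assumptions rather than the actual `gobby-code` types: the names `ProjectionStatus`, `ProjectionReport`, `ProjectionError`, and `is_clean` are hypothetical, the counters are assumed to be unsigned integers, and `degraded` is assumed to be a boolean flag on each projection report.

```rust
/// The three status strings listed for a projection report.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ProjectionStatus {
    Ok,
    Degraded,
    Failed,
}

impl ProjectionStatus {
    /// Parses the wire value; any unknown string is rejected rather than
    /// silently treated as success.
    fn parse(raw: &str) -> Option<Self> {
        match raw {
            "ok" => Some(Self::Ok),
            "degraded" => Some(Self::Degraded),
            "failed" => Some(Self::Failed),
            _ => None,
        }
    }
}

/// Hypothetical shape of the `error` object (`kind` and `message`).
#[derive(Debug, Clone, PartialEq, Eq)]
struct ProjectionError {
    kind: String,
    message: String,
}

/// Hypothetical in-memory form of one `projections.graph` or
/// `projections.vector` report. Field types are assumptions.
#[derive(Debug, Clone, PartialEq, Eq)]
struct ProjectionReport {
    status: ProjectionStatus,
    synced_files: u64,
    synced_symbols: u64,
    degraded: bool,
    error: Option<ProjectionError>,
}

impl ProjectionReport {
    /// A report counts as clean only when the status is `ok`, the degraded
    /// flag is unset, and no error object is attached.
    fn is_clean(&self) -> bool {
        self.status == ProjectionStatus::Ok && !self.degraded && self.error.is_none()
    }
}
```

A caller that treats anything other than `is_clean()` as degraded avoids reporting a successful sync when the status string is unrecognized or an error object is present.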
+ +Python maintenance flows that previously owned projection lifecycle work should +call Rust-owned lifecycle commands, or stable JSON wrapper functions around the +same commands: + +```bash +gcode graph overview --limit 100 --format json +gcode graph clear --format json +gcode graph clear --project-id --format json +gcode graph rebuild --format json +gcode vector clear --format json +gcode vector rebuild --format json +``` + +Migration points in the Python daemon: + +- `CodeIndexTrigger` calls `gcode index --sync-projections` for daemon-triggered + indexing. +- `sync_worker.py` stops rebuilding graph and vector projections directly and + delegates clear/rebuild work to Rust. +- `CodeIndexContext` drops projection lifecycle methods once callers route + through the Rust commands or direct Rust library APIs. +- Python `CodeGraph` is retired after parity with Rust graph lifecycle, read, + and report behavior. + +Temporary shell-out wrappers must parse JSON stdout and preserve failure state. +A non-zero exit, invalid JSON payload, `status: "degraded"`, `status: "failed"`, +or `degraded: true` must become an explicit daemon degraded state with the Rust +`error.kind` and `error.message` attached when available. Shims must not report a +successful daemon sync when graph, vector, or report generation returned a +degraded or failed Rust result. + +## Ownership Boundaries + +| Surface | Owner | Contract | +|---------|-------|----------| +| PostgreSQL code facts | `gcode` Rust indexing APIs | Rust writes and updates code symbols, indexed files, content chunks, imports, calls, and sync flags. The PostgreSQL hub schema is Gobby-managed and externally migrated; `gcode` validates and uses it but does not create, alter, or drop Gobby-owned tables. | +| FalkorDB code graph projection | `gcode` Rust graph APIs | Rust clears, rebuilds, and syncs the code graph projection from PostgreSQL code facts. Python projection code delegates here during transition and is removed after parity. 
| +| Qdrant code-symbol vector projection | `gcode` Rust vector APIs | Rust owns collection lifecycle and vector upserts/deletes for code symbols. Projection sync calls OpenAI-compatible embedding endpoints directly from Rust. | +| Daemon embedding service | Gobby daemon, outside code projection sync | Code-index projection sync bypasses the daemon embedding service. Runtime config may still come from Gobby-managed config, but embedding HTTP calls for code vectors are performed by Rust. | +| Symbol summaries | Gobby daemon enrichment | LLM-generated summaries remain daemon-side and optional. `gcode` may read existing summaries for BM25/vector text, but it must treat missing summaries as normal and must not require LLM generation for indexing or projection sync. | +| Memory graph | Gobby memory services | Memory services continue to own memory nodes, memory relationships, and memory lifecycle. | +| Memory vectors | Gobby memory services | Memory vector collections are not part of code-symbol projection lifecycle. `gcode` clears and deletes only `code_symbols_{project_id}` points/collections. | +| `RELATES_TO_CODE` bridge edges | Gobby memory services | Bridge edges are memory-owned hints. `gcode graph report` may read and display them as inferred, optional report input; it must not create, update, or delete them. | +| UI, MCP, and HTTP surfaces | Gobby daemon repo | User-facing daemon APIs, MCP tools, and HTTP routes call daemon services. They should not become `gcode` CLI responsibilities. | + +## Lifecycle Boundaries + +`gcode graph clear` and `gcode graph clear --project-id ` delete only +FalkorDB nodes with code-index labels (`CodeFile`, `CodeSymbol`, `CodeModule`, +`UnresolvedCallee`, `ExternalSymbol`) for the target project id. They must not +match memory graph labels or memory-owned `RELATES_TO_CODE` bridge queries. + +`gcode vector clear` deletes points from `code_symbols_{project_id}` filtered by +`project_id`. 
It must not list, drop, or mutate memory vector collections. + +`gcode index` handles deleted-file projection cleanup in Rust before hub fact +deletion. Missing explicit files and whole-project stale/orphan files delete the +file's code graph projection and Qdrant code-symbol points using +`project_id + file_path`; daemon reconciliation is no longer the required cleanup +mechanism for these cases. + +## Report And Degradation Contract + +`gcode graph report --format json` is the daemon-readable report surface for +code graph summaries. Missing required graph services fail the command instead +of returning a fake empty report. Optional inputs, such as memory-owned +`RELATES_TO_CODE` bridge data, appear as report degradation details when they are +unavailable. + +Daemon consumers should preserve three states: + +- `ok`: Rust completed the requested operation and no degraded flag is present. +- `degraded`: Rust completed part of the operation or produced a report with + unavailable optional input; callers should surface the degraded reason. +- `failed`: Rust could not complete the required operation; callers should keep + the previous projection/report state or mark it stale. + +The Python shim period ends when the daemon can link the Rust APIs directly and +all callers have moved off Python `CodeGraph`, Python graph/vector projection +code in `sync_worker.py`, and projection lifecycle methods on +`CodeIndexContext`. diff --git a/docs/guides/gcode-user-guide.md b/docs/guides/gcode-user-guide.md index e195a77..31d45fe 100644 --- a/docs/guides/gcode-user-guide.md +++ b/docs/guides/gcode-user-guide.md @@ -24,6 +24,25 @@ contains an inline `database_url`. Bootstrap `database_url_ref` is rejected during bootstrap validation; it is never resolved or used to restart the fallback chain. +For daemon-independent service provisioning: + +```bash +gcode setup --standalone +``` + +The default setup path is non-destructive. 
If incompatible existing code-index +PostgreSQL state is detected, setup fails with guidance instead of dropping +objects. For daemon adoption or explicit recovery, run: + +```bash +gcode setup --standalone --overwrite-code-index +``` + +That advanced reset recreates only gcode-owned code-index PostgreSQL objects, +clears code-index graph nodes in FalkorDB, and deletes Qdrant collections named +with the `code_symbols_` prefix. It leaves Gobby project files, config, +secrets, tasks, sessions, memory, and other daemon-owned data untouched. + If you use [Gobby](https://github.com/GobbyAI/gobby), gcode is already installed. ### Initialize and Index @@ -229,21 +248,34 @@ For Python, JavaScript, and TypeScript, graph edges are import-aware. Calls to external packages/modules stay external instead of being misclassified as local symbol-to-symbol edges. +### Graph Overview + +```bash +gcode graph overview --limit 100 +``` + +- `--limit N` caps the number of files used as overview graph roots +- Default: `100` +- Output uses the global `--format` flag; default output remains `json` + ### Graph Lifecycle `gcode` owns code-index lifecycle commands, including graph clear/rebuild. 
These -commands use the current resolved project context and require the Gobby daemon: +commands use the current resolved project context and require FalkorDB: ```bash gcode graph clear +gcode graph clear --project-id gcode graph rebuild ``` -- `gcode graph clear` clears the current project's graph projection through the daemon -- `gcode graph rebuild` asks the daemon to rebuild the current project's graph projection -- Both commands fail if project context cannot be resolved, if the daemon is unreachable, or if the daemon returns non-JSON success output +- `gcode graph clear` clears the current project's graph projection +- `gcode graph clear --project-id ` is for daemon stale-project cleanup and runs without cwd project-root resolution +- `gcode graph rebuild` rebuilds the current project's graph projection from PostgreSQL facts +- These commands fail if required project context cannot be resolved or if FalkorDB is unavailable - They respect the existing global `--format` flag; default output remains `json` - No confirmation prompt is shown; these are project-scoped graph projection operators, not full index invalidation +- Code graph clears target only code-index FalkorDB labels, not memory graph labels ### Callers @@ -336,19 +368,29 @@ gcode index --files src/config.rs docs/notes.md Dockerfile ``` `gcode index` writes symbols, files, chunks, imports, and calls to the -PostgreSQL hub. It marks graph/vector sync flags dirty; the Gobby daemon handles -FalkorDB graph edges and Qdrant vector sync asynchronously when it is running. +PostgreSQL hub. It marks graph/vector sync flags dirty; `gcode index +--sync-projections` updates FalkorDB graph edges and Qdrant code-symbol vectors +from Rust. Deleted-file cleanup removes code graph/vector projection rows before +PostgreSQL facts are deleted, including explicit `--files ` and +whole-project orphan cleanup. 
BM25 search (`search-text`, `search-content`) works as soon as the transaction commits; graph and semantic search improve once the external stores sync. -Reset and rebuild from scratch (destructive — prompts for confirmation): +Reset the current project and rebuild from scratch (destructive — prompts for confirmation): ```bash gcode invalidate gcode index ``` -In Gobby mode, `invalidate` also notifies the daemon to clean up FalkorDB graph nodes and Qdrant vectors for the project. Use `--force` to skip the confirmation prompt. +`invalidate` deletes only rows for the current project from PostgreSQL. When a +daemon URL or standalone service config is available, it also cleans only that +project's FalkorDB graph nodes and `code_symbols_{project_id}` Qdrant +projection. Use `--force` to skip the confirmation prompt. + +For a full standalone code-index reset across projects and projections, use +`gcode setup --standalone --overwrite-code-index`. That command is intended for +daemon adoption and explicit recovery. Graph projection lifecycle is separate: @@ -357,7 +399,10 @@ gcode graph clear gcode graph rebuild ``` -Use those when you want the daemon to clear or replay graph state for the current project without performing a full destructive code-index invalidation. +Use those to clear or replay graph state for the current project without +performing a full destructive code-index invalidation. Code vector lifecycle is +similarly scoped to `code_symbols_{project_id}` and does not touch Gobby memory +vector collections. ## Operating Model @@ -401,7 +446,7 @@ The database connection is resolved in this order: Bootstrap `database_url_ref` is rejected. Use the daemon broker path or an explicit fallback source for daemonless access. -The daemon URL (used by `invalidate`, `graph clear`, and `graph rebuild`) is resolved from: +The daemon URL (used by `invalidate`) is resolved from: 1. `GOBBY_PORT` environment variable (e.g. `60887`) 2. 
`~/.gobby/bootstrap.yaml` `daemon_port` + `bind_host` keys 3. Default: `http://localhost:60887` @@ -516,8 +561,8 @@ gcode status ### `gcode graph clear` / `gcode graph rebuild` fail immediately - If you see a project-context error, initialize the project first with `gcode init` or use `--project ` -- If you see a daemon connectivity error, confirm the Gobby daemon is running and `~/.gobby/bootstrap.yaml` points to the right port -- If you see an invalid-JSON success error, the daemon endpoint returned a malformed response and the command intentionally aborts instead of guessing +- If you see a FalkorDB configuration or connectivity error, confirm `GOBBY_FALKORDB_HOST` / `GOBBY_FALKORDB_PORT` or `config_store` are correct +- For stale-project cleanup where cwd has no project context, use `gcode graph clear --project-id ` ### Slow first index diff --git a/docs/guides/gcore-development-guide.md b/docs/guides/gcore-development-guide.md index 00f9de4..f743ab8 100644 --- a/docs/guides/gcore-development-guide.md +++ b/docs/guides/gcore-development-guide.md @@ -4,21 +4,32 @@ Technical internals for developers and agents working in the `gobby-core` crate ## What gobby-core Is -`gobby-core` is a small, dependency-light shared-primitives crate consumed by every Gobby CLI binary (`gcode`, `gsqz`, `gloc`, `ghook`). It exists so the binaries don't reimplement the same project-discovery and daemon-addressing logic four times — and so a behavior change (e.g. how the daemon URL is normalized) propagates with one PR instead of four. +`gobby-core` is the shared Rust migration substrate for Gobby CLI crates and future Rust daemon work. It holds the boring, reusable platform layer: project discovery, bootstrap and daemon addressing, shared context/config contracts, setup boundaries, degradation vocabulary, optional datastore adapters, and generic indexing/search primitives. -It has no CLI. It has no public state. It's a library — that's the whole shape. 
+Domain behavior stays out of this crate. Code graph facts, symbol IDs, language parsing policy, wiki vault layout, task behavior, memory behavior, and CLI output formatting belong to consumer crates. + +The baseline crate remains dependency-light. Consumers that only need project discovery and daemon helpers do not inherit PostgreSQL, FalkorDB, Qdrant, reqwest, ignore, or sha2 unless they opt in through Cargo features. ## Module Map `crates/gcore/src/`: -| Module | Responsibility | -|--------|----------------| -| `project` | Walk up from a starting directory to find a `.gobby/` directory containing `project.json` or `gcode.json`. Read the `id` (or legacy `project_id`) field from `project.json`. | -| `bootstrap` | Read `~/.gobby/bootstrap.yaml` to get the daemon's listen endpoint (`bind_host`, `daemon_port`). Falls back to `127.0.0.1:60887` when the file is missing or malformed. | -| `daemon_url` | Compose a dial URL from a `DaemonEndpoint`, normalizing wildcard listen addresses (`0.0.0.0`, `::`, `::0`) to `127.0.0.1`. | - -Roughly 250 lines of source total. Adding a fourth module should require justification. +| Module | Feature | Responsibility | +|--------|---------|----------------| +| `project` | always | Walk up from a starting directory to find a `.gobby/` directory containing `project.json` or `gcode.json`. Read the `id` (or legacy `project_id`) field from `project.json`. | +| `bootstrap` | always | Read `~/.gobby/bootstrap.yaml` to get the daemon's listen endpoint (`bind_host`, `daemon_port`). Falls back to `127.0.0.1:60887` when the file is missing or malformed. | +| `daemon_url` | always | Compose a dial URL from a `DaemonEndpoint`, normalizing wildcard listen addresses (`0.0.0.0`, `::`, `::0`) to `127.0.0.1`. | +| `config` | always | Shared configuration-resolution contracts. Environment variables, `config_store`, and defaults are represented here as the foundation expands. 
| +| `context` | always | Shared runtime context contracts for project identity, daemon URL, and optional service configuration. Consumer-specific CLI state stays outside. | +| `degradation` | always | Shared vocabulary for optional-service absence, partial search, stale indexes, skipped artifacts, and fatal core errors. | +| `setup` | always | Attached and standalone setup contracts. Runtime commands validate externally managed resources and do not implicitly migrate them. | +| `postgres` | `postgres` | PostgreSQL hub adapter boundary. Validates Gobby-owned schema and BM25 requirements without creating, altering, or dropping managed objects. | +| `falkor` | `falkor` | FalkorDB adapter boundary. Graph connection helpers live here without making FalkorDB a baseline dependency. | +| `qdrant` | `qdrant` | Qdrant adapter boundary for vector search/storage integration. | +| `indexing` | `indexing` | Generic file walking, hashing, and indexing primitives that are not tied to one domain model. | +| `search` | `search` | Generic search result and fusion primitives. Code-specific or wiki-specific search behavior stays in consumers. | + +Feature-gated modules are part of the public module map but compile only when their feature is selected. ## Public API @@ -77,25 +88,83 @@ ureq::post(&format!("{url}/api/hooks/execute")).send_string(body)?; Bracketing IPv6 literals for URL embedding is **not** handled here — in practice `bootstrap.yaml` is always `localhost`, an IPv4 literal, or a wildcard. If that ever stops being true, this is the place to add it. -## Why These Three Modules Specifically +### `degradation` + +```rust +pub enum ServiceState; +pub struct SetupIssue; +pub struct Guidance; +pub enum CoreError; +pub enum DegradationKind; +``` + +`degradation` defines the shared vocabulary for fatal core failures and non-fatal partial results. 
`ServiceState` travels with adapter results so callers can distinguish an available service, a service with no configuration, and a configured service that is unreachable. `CoreError` is reserved for command-stopping failures such as invalid configuration, unavailable required services, failed writes, and corrupted input. + +`DegradationKind` is for successful operations that returned less than the ideal result. A `gobby-code` search can return symbol or content results while marking Qdrant or FalkorDB as an optional `ServiceUnavailable` degradation. It can also report `PartialSearch`, `StaleIndex`, or `SkippedArtifacts` without converting those states into fatal CLI errors. + +`gobby-wiki` should use the same contracts for wiki search and indexing. Missing vector search, stale vault index data, or skipped files should be reported as degradation metadata alongside partial results. A required store or write path failure should become `CoreError` only when the command cannot complete. + +`Guidance` and `SetupIssue` carry structured setup remediation for attached and standalone validation. Consumer CLIs render the `problem`, `action`, and optional `command_hint` fields in their own output style; `gobby-core` only provides the serializable contract. + +### `setup` + +`setup` defines the shared contracts for two separate workflows: + +- **Attached mode** uses `AttachedValidator` and `RequiredObject` declarations to check that externally managed resources already exist. It returns a `ValidationReport` containing present objects and missing objects with typed `SetupIssue` guidance. `gobby-core` does not create, alter, drop, or migrate Gobby-owned schema in attached mode. +- **Standalone mode** uses `StandaloneSetup` and `OwnedObject` declarations for explicit setup commands that create consumer-owned resources. Consumers must declare a namespace such as `gcode` or `gwiki` so owned tables, graph labels, and vector collections stay domain-scoped. 
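The attached-mode contract above can be sketched as a read-only validation pass. This is a minimal illustration under stated assumptions: the names `RequiredObject`, `ValidationReport`, and `SetupIssue`, and the `problem`, `action`, and `command_hint` fields, come from this guide, but every other field, the `validate_attached` function, the `exists` probe, and the table names used in the usage note are hypothetical and may not match the real `gobby-core` signatures.

```rust
// Hypothetical shapes for illustration only; the real gobby-core types may differ.

#[derive(Debug, Clone, PartialEq)]
struct RequiredObject {
    kind: &'static str, // e.g. "table" (assumed field)
    name: &'static str, // assumed field
}

#[derive(Debug, Clone, PartialEq)]
struct SetupIssue {
    problem: String,
    action: String,
    command_hint: Option<String>,
}

#[derive(Debug, Default)]
struct ValidationReport {
    present: Vec<RequiredObject>,
    missing: Vec<(RequiredObject, SetupIssue)>,
}

impl ValidationReport {
    /// True when every required object was found.
    fn is_ok(&self) -> bool {
        self.missing.is_empty()
    }
}

/// Checks each required object with a read-only `exists` probe.
/// Nothing is created, altered, or dropped: a missing object only
/// produces guidance for the operator.
fn validate_attached(
    required: &[RequiredObject],
    exists: impl Fn(&RequiredObject) -> bool,
) -> ValidationReport {
    let mut report = ValidationReport::default();
    for object in required {
        if exists(object) {
            report.present.push(object.clone());
        } else {
            let issue = SetupIssue {
                problem: format!("{} `{}` is missing", object.kind, object.name),
                action: "Run the externally managed migration that owns this object"
                    .to_string(),
                command_hint: None,
            };
            report.missing.push((object.clone(), issue));
        }
    }
    report
}
```

A caller might declare two required tables (for example the hypothetical `code_symbols` and `code_files`), pass a probe that queries the hub, and render each `SetupIssue` in its own CLI style when `is_ok()` returns `false`. Keeping the probe as a plain closure is one way to keep the validator itself free of datastore dependencies.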
+ +`ValidationContext` and `SetupContext` pass optional datastore handles/configuration into callbacks. PostgreSQL handles are mutable because `postgres::Client::query` and `postgres::Client::execute` both require `&mut self`; the callbacks borrow the supplied context and do not take ownership from later validators or creators. + +## Boundary Rules + +Each module exists because multiple Rust consumers need the same infrastructure contract, and getting it slightly wrong in one crate would silently misbehave. + +| Boundary | Consumers | What stays out | +|----------|-----------|----------------| +| Project/bootstrap/daemon helpers | `gcode`, `ghook`, future Rust consumers | CLI rendering, command dispatch, daemon workflow semantics. | +| Context/config/setup/degradation contracts | `gcode`, `gobby-wiki`, future daemon work | Domain-specific flags, output formats, setup UX, and task/memory behavior. | +| Datastore adapters | Consumers that opt in to `postgres`, `falkor`, or `qdrant` | Schema ownership, migrations, code graph facts, vector content policy. | +| Indexing/search primitives | Consumers that opt in to `indexing` or `search` | Code symbol IDs, language parsing policy, wiki document models, ranking UX. | -Each module exists because at least two binaries need exactly this logic, and getting it slightly wrong in one of them would silently misbehave: +`gobby-core` can validate attached-mode resources, but it must not create, alter, drop, or migrate Gobby-owned resources during normal runtime commands. -| Module | Consumers (today) | What goes wrong if duplicated | -|--------|-------------------|-------------------------------| -| `project` | `gcode`, `ghook` (and `gsqz`/`gloc` could use it) | Project discovery walks up across mounts, weird symlink loops, race conditions with `.gobby/` creation. One implementation = one set of edge cases. | -| `bootstrap` | `gcode`, `ghook` | YAML field naming, fallback semantics. 
Easy for two implementations to disagree on whether a missing field is fatal. | -| `daemon_url` | `ghook` (and `gcode` daemon RPC) | Wildcard-host normalization is non-obvious. A binary that POSTs to `0.0.0.0` will hang for the connect timeout instead of failing fast. | +## Feature Gates + +The crate's default feature set is empty: + +```toml +[features] +default = [] +postgres = ["dep:postgres", "dep:postgres-types"] +falkor = ["dep:falkordb", "dep:urlencoding"] +qdrant = ["dep:reqwest"] +indexing = ["dep:ignore", "dep:sha2"] +search = [] +full = ["postgres", "falkor", "qdrant", "indexing", "search"] +``` + +Feature rationale: + +| Feature | Enables | Why gated | +|---------|---------|-----------| +| `postgres` | `postgres`, `postgres-types` | Hub validation and adapter code are only needed by datastore consumers. Lightweight binaries should not inherit PostgreSQL. | +| `falkor` | `falkordb`, `urlencoding` | Graph helpers need FalkorDB. `urlencoding` is included because FalkorDB connection URLs must encode passwords safely. | +| `qdrant` | `reqwest` with `blocking` and `json` | Vector search/storage helpers need HTTP. Other consumers should not pull reqwest. | +| `indexing` | `ignore`, `sha2` | File walking and content hashing are useful for indexing consumers only. | +| `search` | no extra dependency today | Search fusion contracts are lightweight, but still opt-in so the public surface remains explicit. | +| `full` | all feature modules | Convenience feature for development and consumers that need the whole foundation layer. | + +Every individual feature must compile in isolation. Do not rely on `--all-features` to hide missing feature dependencies. ## Versioning Policy `gobby-core` is `0.x`. The contract: -- **Patch bumps (0.1.x)** — bug fixes, doc changes, internal refactors with no public API change. +- **Patch bumps (0.2.x)** — bug fixes, doc changes, internal refactors with no public API change. 
- **Minor bumps (0.x.0)** — additive public API (new functions, new fields). Existing consumers stay compatible. - **Pre-1.0 breaking changes** — bump the minor and bump *every* consumer crate's gobby-core dep in the same release. Don't strand consumers on an old gobby-core. -Consumers pin to a minor version (`gobby-core = "0.1"`) so patch updates are picked up automatically but additive changes require a coordinated bump. +Consumers pin to a minor version (`gobby-core = "0.2"`) so patch updates are picked up automatically but additive changes require a coordinated bump. ## How to Consume @@ -103,51 +172,64 @@ Consumers pin to a minor version (`gobby-core = "0.1"`) so patch updates are pic ```toml [dependencies] -gobby-core = { path = "../gcore", version = "0.1" } +gobby-core = { path = "../gcore", version = "0.2" } ``` The `path` is for local workspace builds; `version` is required by `cargo publish` and gets used when consumers install the crate from crates.io. Don't drop the `version` field — `cargo publish` will reject the consumer's manifest. +Opt in to heavier modules explicitly: + +```toml +[dependencies] +gobby-core = { path = "../gcore", version = "0.2", features = ["postgres", "search"] } +``` + +Small binaries should keep the default empty feature set unless they directly use a feature-gated module. + ### Out-of-tree ```toml [dependencies] -gobby-core = "0.1" +gobby-core = "0.2" ``` -Resolves against crates.io. The crate has no opinionated dependencies — `anyhow`, `dirs`, `serde_json`, `serde_yaml`, and `tempfile` (dev-only). It will not pull in tokio, reqwest, tracing, or anything else heavy. +Resolves against crates.io. The default crate has no datastore dependencies. It will not pull in PostgreSQL, FalkorDB, Qdrant, reqwest, ignore, sha2, tokio, tracing, or anything else heavy unless the consumer selects the matching feature. ## Adding a New Helper Before adding a module or function to `gobby-core`, check: 1. 
**Do at least two binaries need it?** If only one does, keep it in that binary. -2. **Is it dependency-light?** New deps in `gobby-core` propagate to *every* binary. Adding `tokio` here would 5x the binary size of `ghook` for zero benefit. If the helper needs heavy deps, it probably belongs in a separate shared crate. -3. **Is it stateless or near-stateless?** `gobby-core` functions are pure or do narrow I/O (read one file, return result). A module that holds connection pools or background workers belongs elsewhere. -4. **Is the public surface small?** Three functions + a `DaemonEndpoint` struct is the right order of magnitude. If you find yourself adding a builder, a config object, and an `init()` function, reconsider. - -If yes to all four, add the module: - -1. Create `crates/gcore/src/.rs` with `//!` module docs. -2. Add `pub mod ;` to `crates/gcore/src/lib.rs`. -3. Write tests that pin behavior under the failure modes the consumer cares about (missing input, malformed input, edge-case values). -4. Update this guide's module map. -5. Bump `gobby-core` to the next minor version (`0.2.0`) since you're adding public API. -6. Update consumer crates to use the new helper, replacing any duplicated implementation. Bump their versions too. +2. **Does it belong in an existing boundary?** Prefer `config`, `context`, `degradation`, `setup`, `postgres`, `falkor`, `qdrant`, `indexing`, or `search` before adding a new top-level module. +3. **Is it dependency-light, or properly feature-gated?** New baseline deps propagate to *every* binary. Heavy deps belong behind a narrowly named feature. +4. **Does it respect setup mode?** Attached-mode helpers validate externally managed state. Standalone setup helpers run only through explicit setup flows. +5. **Is it stateless or near-stateless?** `gobby-core` functions are pure or do narrow I/O (read one file, return result). A module that holds connection pools or background workers belongs elsewhere. +6. 
**Is the public surface small?** A few focused functions and structs per module is the right order of magnitude. If you find yourself adding a builder, a config object, and an `init()` function, reconsider. + +If yes to all checks, add the helper: + +1. Add it to the appropriate module with `//!` or item docs. +2. For a new lightweight module, add `pub mod ;` to `crates/gcore/src/lib.rs`. +3. For a new heavy module, add an optional dependency, a feature entry, and a `#[cfg(feature = "")] pub mod ;` guard. +4. Write tests that pin behavior under the failure modes the consumer cares about (missing input, malformed input, edge-case values). +5. Update this guide's module map and feature gate table when the public boundary changes. +6. Bump `gobby-core` to the next minor version since you're adding public API. +7. Update consumer crates to use the new helper, replacing any duplicated implementation. Bump consumer package versions when those crates are part of the release. ## Testing -Each module has `#[cfg(test)] mod tests` with `tempfile::tempdir()` for filesystem isolation: +Behavioral modules use `#[cfg(test)] mod tests` with `tempfile::tempdir()` for filesystem isolation: - **project**: implicitly tested via consumer binaries (`gcode`, `ghook`); the module mirrors `gcode/src/project.rs` line-for-line. - **bootstrap**: missing/malformed/empty files all return defaults; custom port/host parsing; out-of-range port falls back to default. - **daemon_url**: wildcard IPv4/IPv6 normalize to loopback; localhost passes through; custom host+port composes correctly. +- **public_boundary**: integration test that pins feature gates, `lib.rs` module guards, and this guide's boundary documentation. ```bash -cargo test -p gobby-core +cargo test -p gobby-core --no-default-features ``` -Fast, no I/O outside `tempdir()`, no network. Should run in well under a second. +Baseline tests are fast, perform no network I/O, and keep filesystem writes inside temporary directories. 
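
As an illustration of the test style described in this section, here is a minimal sketch for the wildcard-normalization case. `normalize_host` and `dial_url` are hypothetical stand-ins written for this example, not the actual `daemon_url` API; only the wildcard list (`0.0.0.0`, `::`, `::0`), the loopback target, and the default port `60887` are taken from this guide.

```rust
// Hypothetical stand-ins for illustration; not the real gobby-core functions.

/// Maps wildcard listen addresses to the loopback address a client can dial.
fn normalize_host(host: &str) -> &str {
    match host {
        "0.0.0.0" | "::" | "::0" => "127.0.0.1",
        other => other,
    }
}

/// Composes a dial URL from a listen host and port.
/// IPv6 literals are not bracketed, matching the limitation noted in this guide.
fn dial_url(host: &str, port: u16) -> String {
    format!("http://{}:{}", normalize_host(host), port)
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn wildcards_normalize_to_loopback() {
        for host in ["0.0.0.0", "::", "::0"] {
            assert_eq!(dial_url(host, 60887), "http://127.0.0.1:60887");
        }
    }

    #[test]
    fn localhost_passes_through() {
        assert_eq!(dial_url("localhost", 60887), "http://localhost:60887");
    }
}
```

Tests of this shape need no filesystem or network access, which is consistent with the baseline expectations stated above.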
## Design Decisions