Feature catalog

The full surface, grouped by concern. Each item links to the deeper reference where one exists.

Graph & resolution

  • Knowledge graph — every file, symbol, import, call chain, and type relationship in one queryable structure.
  • Multi-repo workspaces — index multiple repos into one graph with cross-repo symbol resolution, project grouping, reference tags, and per-repo scoping. A dedicated cross-repo edge layer materialises cross_repo_calls / cross_repo_implements / cross_repo_extends whenever a relation's endpoints live in different repos — surfaced via analyze kind: "cross_repo". Resolution is evidence-gated so unrelated same-named symbols don't get wired together. See multi-repo.md.
  • Per-session workspace isolation — under the daemon, each MCP session's queries are scoped to the workspace it connected from. Sessions on different repos never see each other's graph slices.
  • 257 languages across three tiers — bespoke tree-sitter extractors (~30) for the deep-resolution tier, regex extractors (~60) for niche/legacy, forest-backed signature-only (~165 via alexaandru/go-sitter-forest, extended by drop-in user grammars via .gortex.yaml's index.grammars key) for the long tail. Plus notebook-style sources: Jupyter .ipynb (nbformat 3+4) and Databricks notebooks (.dbc archives + source-format .py / .scala / .sql / .R with # COMMAND ---------- separators) extract each cell as its own graph node, tagged with cell_index / cell_kind / cell_language. See languages.md.
  • Type-aware resolution — infers receiver types from variable declarations, composite literals, and Go constructor conventions to disambiguate same-named methods across types.
  • Scope-based static resolver for C / C++ / Java / PHP — disambiguates same-named symbols using language scope rules before falling back to directory-locality. C prefers same-file static over same-name extern; C++ prefers same-namespace then walks ADL namespaces; Java pins to the enclosing class, then walks extends up to 8 hops; PHP resolves parent::/self::/static:: against the inheritance chain. Scope-resolved edges stamp OriginASTResolved + Meta["resolution"]="scope".
  • LSP-enriched call-graph tiers — every edge carries an origin tier (lsp_resolved / lsp_dispatch / ast_resolved / ast_inferred / text_matched) plus a coarse tier label (lsp / ast / heuristic). Pass min_tier to get_callers, find_usages, find_implementations, flow_between, etc. for compiler-verified-only edges. For TS/JS/JSX/TSX the cross-file resolver consults typescript-language-server on the hot path. See lsp.md.
  • Sub-millisecond impact analysis — explain_change_impact, detect_changes, flow_between step-impact, and the safe_to_change / pre_commit prompts share a precomputed reach index walking every node's incoming edges to depth 3 at index time. Blast-radius queries become O(seeds × reach) map lookups. The daemon snapshot persists the index so warm starts skip the build.
  • Semantic enrichment — pluggable SCIP, go/types, and LSP providers upgrade edge confidence from ~70-85% (tree-sitter) to 95-100% (compiler-verified). Additive: graceful degradation when external tools are unavailable. Per-language connect: { network, address, fallback_spawn } dials an already-running language server instead of spawning a duplicate.
  • IMPLEMENTS inference — structural interface satisfaction for Go, TypeScript, Java, Rust, C#, Scala, Swift, Protobuf. C# bases in another compilation unit are split into extends vs implements via a local-interface prescan + the I-prefix convention.
  • Framework dynamic-dispatch synthesis — a provenance-tagged synthesizer engine materialises the call edges frameworks wire at runtime that static resolution can't see: gRPC stub→handler, Temporal workflow→activity, in-process / native event channels, and the cross-language native bridges — Swift↔Objective-C selectors, React Native (RCT_EXPORT_* / @ReactMethod ↔ JS NativeModules), Expo (Function(...) DSL ↔ requireNativeModule), and Fabric codegen view managers. Every synthesized edge carries synthesized_by provenance; analyze kind: "synthesizers" rolls them up.
  • Language-specific resolution — Rust impl-block method-owner / self-receiver / crate::-super:: module-path resolution; Kotlin companion-object static dispatch + lambda-parameter scoping. analyze kind: "resolution_outcomes" explains why a call/reference edge was left unresolved (ambiguous_multi_match / candidate_out_of_scope / cross_language_only / stub_only / no_definition).
  • Per-reference contexts — find_usages classifies every usage by the role it plays (parameter_type / return_type / field / value / type / attribute / call) and accepts a context: filter — e.g. "every place this type is used as a parameter".
  • External-package call qualification — calls into un-indexed third parties are retargeted onto stable per-package identity nodes (dep:: / stdlib:: / external::, per-language: Go / Rust / Java / Python / C# / TS), so call chains keep the external hop and a service's external surface aggregates across repos. On by default, incremental on the reindex hot path; opt out with index.synthesize_external_calls: false.
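
The precomputed reach index described above can be pictured with a small sketch. This is not Gortex's implementation: the `Graph` type and the `buildReach` and `blastRadius` helpers are hypothetical names, and the only details taken from the text are the depth-3 walk over incoming edges at index time and the per-seed map lookup at query time.

```go
package main

import "fmt"

// Graph maps a node to the nodes that call or reference it (its incoming edges).
// Hypothetical type, used only for this illustration.
type Graph map[string][]string

// buildReach precomputes, for every node, the set of nodes that reach it
// through at most maxDepth incoming edges (a bounded reverse BFS).
func buildReach(incoming Graph, maxDepth int) map[string]map[string]bool {
	reach := make(map[string]map[string]bool, len(incoming))
	for node := range incoming {
		seen := map[string]bool{}
		frontier := []string{node}
		for depth := 0; depth < maxDepth && len(frontier) > 0; depth++ {
			var next []string
			for _, n := range frontier {
				for _, caller := range incoming[n] {
					if caller != node && !seen[caller] {
						seen[caller] = true
						next = append(next, caller)
					}
				}
			}
			frontier = next
		}
		reach[node] = seen
	}
	return reach
}

// blastRadius unions the precomputed reach sets of the changed seeds,
// so the query cost is proportional to seeds × reach, with no graph walk.
func blastRadius(reach map[string]map[string]bool, seeds ...string) map[string]bool {
	out := map[string]bool{}
	for _, s := range seeds {
		for n := range reach[s] {
			out[n] = true
		}
	}
	return out
}

func main() {
	incoming := Graph{
		"parse": {"load"},
		"load":  {"serve"},
		"serve": {"main"},
		"main":  {"boot"},
		"boot":  nil,
	}
	reach := buildReach(incoming, 3)
	fmt.Println(len(blastRadius(reach, "parse")))
}
```

In this toy chain, `boot` sits four hops above `parse`, so it falls outside the depth-3 reach set. That is the trade-off a bounded index makes in exchange for constant-time lookups.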

Search & navigation

  • Semantic search default-on — hybrid BM25 + vector with RRF fusion, baked GloVe-50d (~3.8 MB embedded in the binary, top 20k tokens), CPU-only, zero native deps. Large symbols are split into AST-aware windows and de-chunked at query time. Equivalence-class vocabulary expansion bridges auth ≈ authentication ≈ login without an LLM. A HITS authority/hub signal feeds the rerank pipeline. A keyword-soup query defense detects degenerate OR-soup and operator-free phrasing and skips wasted LLM expansion. Markdown documentation is a first-class corpus with its own retrieval channel and prose-tuned ranking. Opt-in embedding.provider: local (Hugot MiniLM-L6-v2) or api (Ollama / OpenAI). See semantic-search.md.
  • Provenance-aware ranking — the BM25↔vector balance is scored continuously from query shape; edge-resolution provenance attenuates LSP-inflated framework wiring in centrality and rerank; generated files are ranked below a real same-named implementation; the implementation is lifted above its own test; and a post-rerank pass recovers exact embedding cosine. Zero-result identifier queries are auto-decomposed into leaf terms.
  • context_closure — given seed files/symbols, walks the transitive import/dependency closure and packs it under one token_budget, ranked by graph distance or seeded random-walk proximity.
  • Code search beyond symbols — search_text is a trigram-indexed literal/regex search; search_ast runs structural tree-sitter queries; analyze kind=sast is a 190-rule, CWE/OWASP-tagged security scan across 8 languages.
  • Multi-axis structured retrieval (winnow_symbols) — constraint-chain filter over kind, language, community, path_prefix, min_fan_in, min_fan_out, min_churn, text_match with per-axis score contributions.
  • Token-budgeted free traversal (walk_graph) — walks arbitrary edge_kinds outward / inward / both from a starting symbol; auto-stops at token_budget.
  • Ad-hoc graph query (graph_query) — small read-only DSL with nodes / traverse / filter stages, bounded by limit and a five-stage cap.
  • Per-session symbol cursor (nav) — verb-dispatched (goto / into / up / sibling / back / where / read); adjacency preview on every response.
  • Opening-move routing (plan_turn) — ~200-token ranked list of next calls with pre-filled args for a task description.
  • Narrative repo overview (get_repo_outline) — single-call: top languages, communities, hotspots, most-imported files, entry points.
  • Cold-start query suggestions (suggest_queries) — 5-10 starter queries from entry points, hubs, bridges, subsystems.
  • Use-site → declaration resolver (find_declaration) — literal substring or regex over use sites; trigram-prefiltered.
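
For readers unfamiliar with the RRF fusion mentioned under semantic search, here is a minimal sketch of Reciprocal Rank Fusion over two ranked lists. The `rrfFuse` function, the sample symbol names, and the constant `k = 60` (a common default in the literature) are assumptions for illustration; the text does not say which constant or tie-break Gortex uses.

```go
package main

import (
	"fmt"
	"sort"
)

// rrfFuse merges ranked ID lists with Reciprocal Rank Fusion:
// score(id) = sum over lists of 1 / (k + rank), with rank starting at 1.
func rrfFuse(k float64, lists ...[]string) []string {
	scores := map[string]float64{}
	for _, list := range lists {
		for i, id := range list {
			scores[id] += 1.0 / (k + float64(i+1))
		}
	}
	ids := make([]string, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(a, b int) bool {
		if scores[ids[a]] != scores[ids[b]] {
			return scores[ids[a]] > scores[ids[b]]
		}
		return ids[a] < ids[b] // deterministic tie-break (an assumption)
	})
	return ids
}

func main() {
	bm25 := []string{"AuthHandler", "Login", "ParseToken"}     // lexical ranking
	vector := []string{"Login", "Authenticate", "AuthHandler"} // embedding ranking
	fmt.Println(rrfFuse(60, bm25, vector))
}
```

Because RRF only looks at ranks, the BM25 and cosine scores never need to be put on a common scale. A symbol that appears near the top of both lists outranks one that tops only a single list.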

Dataflow, structure, concurrency

  • CPG-lite dataflow — value_flow (intra-procedural assignment / return / range), arg_of (caller arg → callee param), returns_to (callee → assignment LHS) built at index time. flow_between returns ranked dataflow paths; taint_paths does pattern-driven source→sink sweeps for security audits.
  • Near-duplicate clone detection — every substantial function body is reduced to a 64-slot token-normalised MinHash signature at index time; LSH banding finds candidate pairs; a Jaccard threshold filter keeps the true clones — emitted as similar_to edges. find_clones surfaces clusters; dead_only: true yields dead duplicates of live code.
  • Bundled unsafe-patterns scan — analyze kind=unsafe_patterns fans out seven tree-sitter detectors in one call: Go panic, Rust .unwrap / .expect / panic! / todo! / unimplemented! / unreachable! / assert! / unsafe { } / unsafe fn, Python assert (stripped by -O), JS/TS throw. Each rule is also invocable individually via search_ast detector=….
  • Language-agnostic concurrency analyzers — analyze kind=race_writes flags struct-field writes from inside a goroutine-reachable function whose writer has no detected lock acquisition (Lock / RLock / Acquire / WithLock / synchronized / Do recognised across Go, Rust, Java, TS, Python, C#). analyze kind=unclosed_channels flags channels with sends but no close() call, classified high / medium / low risk. Rides existing EdgeSpawns / EdgeSends / EdgeRecvs / EdgeWrites / EdgeCalls. get_callers and analyze kind=goroutine_spawns annotate each result with sync_guarded and cross_concurrent plus a human-readable explanation.
  • Composite code-health score — analyze kind=health_score aggregates coverage_pct + complexity + recency + churn into one 0..100 value per symbol plus an A..F grade. Population distribution (mean / median / std-dev / Gini / per-grade counts). roll_up=file or roll_up=repo for per-file / per-repo averages.
  • Composite change-impact score — analyze kind=impact blends PageRank centrality + transitive reach + cyclomatic complexity + co-change coupling + community span into one 0..100 score and risk label.
  • analyze is a 59-kind dispatcher — beyond structural kinds, covers impact, health_score, sast / named / unsafe_patterns, clusters, connectivity_health, tests_as_edges, channel_ops, goroutine_spawns, field_writers, config_readers, event_emitters, error_surface, routes, models, components, k8s_resources, images, kustomize, cross_repo, dbt_models, env_var_users, sql_call_sites, fixes_history, edge_audit, domain, synthesizers (framework-dispatch-synthesized edges grouped by pass), resolution_outcomes (why the resolver left an edge unresolved), and more.
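
The clone-detection bullet above rests on MinHash signatures. The sketch below shows the core idea under stated assumptions: tokens are taken as already normalised, FNV-1a plus a splitmix64-style mixer stand in for whatever hash family Gortex actually uses, and the LSH banding step is left out. `minhash`, `similarity`, and `mix` are hypothetical names.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math"
)

const slots = 64 // signature width taken from the text

// Signature holds one minimum hash value per slot.
type Signature [slots]uint64

// mix is a splitmix64-style finalizer that decorrelates the per-slot hashes.
func mix(x uint64) uint64 {
	x ^= x >> 30
	x *= 0xbf58476d1ce4e5b9
	x ^= x >> 27
	x *= 0x94d049bb133111eb
	x ^= x >> 31
	return x
}

// minhash reduces a token stream to a 64-slot signature: slot i keeps the
// minimum of the i-th hash function over all tokens.
func minhash(tokens []string) Signature {
	var sig Signature
	for i := range sig {
		sig[i] = math.MaxUint64
	}
	for _, t := range tokens {
		h := fnv.New64a()
		h.Write([]byte(t))
		base := h.Sum64()
		for i := range sig {
			v := mix(base + uint64(i)*0x9e3779b97f4a7c15)
			if v < sig[i] {
				sig[i] = v
			}
		}
	}
	return sig
}

// similarity estimates the Jaccard similarity of the underlying token sets
// as the fraction of slots on which the two signatures agree.
func similarity(a, b Signature) float64 {
	same := 0
	for i := range a {
		if a[i] == b[i] {
			same++
		}
	}
	return float64(same) / slots
}

func main() {
	// Identifiers are already replaced by the placeholder "ID".
	f := minhash([]string{"func", "ID", "(", ")", "{", "return", "ID", "+", "ID", "}"})
	g := minhash([]string{"func", "ID", "(", ")", "{", "return", "ID", "-", "ID", "}"})
	fmt.Printf("estimated Jaccard: %.2f\n", similarity(f, g))
}
```

A real pipeline would group signatures into bands so that only pairs sharing a whole band are compared, then apply the Jaccard threshold to those candidates. The estimate above is what that threshold would be checked against.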

Refactoring, simulation, overlays

  • Atomic refactors — edit_symbol (edit by ID, optional base_sha drift guard, returns new_sha for pipelined edits), edit_file (any file, no graph required, kills Read-before-Edit), write_file (atomic temp+rename, re-indexes on write), rename_symbol (coordinated multi-file rename), move_symbol (relocate function/method/type/variable/const across files — same-package leaves callers untouched, cross-package rewrites every qualified reference, drops/adds imports, synthesises the target file; Go for now), inline_symbol (replace every callsite of a trivial callee with the body; refuses cleanly on defer/spawn/close-over-scope/multi-return/side-effecting args; delete_after: true removes the declaration), safe_delete_symbol (atomic dead-code removal with graph-aware safety gate and a fixed-point orphan-propagation pass).
  • Speculative execution — preview_edit and simulate_chain answer "what would change if I applied this WorkspaceEdit?" without touching disk or mutating the base graph. Standard LSP WorkspaceEdit input. Per-step impact: touched files, added/removed/renamed symbols, broken callers, broken interface implementors, blast-radius rollup, suggested test targets, round-trip LSP diagnostics. simulate_chain with keep: true promotes the final state into a real overlay.
  • Live editor overlays (shadow graph) — overlay_register / overlay_push / overlay_list / overlay_delete / overlay_drop / overlay_keepalive / compare_with_overlay. Editor extensions push in-flight (unsaved) buffers; every subsequent tool call in the same MCP session reads through the shadow. Base graph is never mutated. Concurrent sessions each see their own view. Overlays are bound to the MCP session lifecycle; idle TTL via GORTEX_OVERLAY_IDLE_TTL (default 30 m).
  • Overlay branching — N parallel speculative sessions off one baseline. overlay_fork / overlay_branches / overlay_switch / overlay_merge / overlay_drop_branch / compare_branches. Hold strategy A and strategy B simultaneously, evaluate each, merge the winner.
  • Dependency-ordered batch refactors — get_edit_plan returns the file order; batch_edit applies atomically, re-indexing between steps.
  • Scaffolding — scaffold generates code, registration wiring, and test stubs from an example symbol.
  • Pattern extraction — suggest_pattern extracts the code pattern from an example (source, registration, tests).

Safety, guards, agent config

  • Proactive safety — verify_change checks proposed signature changes against all callers and interface implementors; check_guards evaluates project guard rules (.gortex.yaml) against changed symbols.
  • Graph-validated config hygiene (audit_agent_config) — scans CLAUDE.md, AGENTS.md, .cursor/rules, .github/copilot-instructions.md, .windsurf/rules, .antigravity/rules for stale symbol references, dead file paths, and bloat — validated against the live graph.
  • Guard rules — project-specific constraints (co-change, boundary) enforced via check_guards. The architecture: block carries declarative rules: (max_fan_out dependency-cone limits, deny_callers_outside caller boundaries) and named layers: (path globs with directional allow / deny dependency lists).
  • Graph-grounded PR review — review / review_pack run a deterministic correctness rulepack (NPE / thread-safety check-then-act / N+1 / logic-error, Go + Python; also analyze kind=review) over a changeset, graph-grounded to drop false positives, and return a BLOCK/REVIEW/APPROVE verdict with line-anchored inline comments; critique_review adds an adversarial LLM false-positive pass and post_review posts the gated findings as inline PR/MR comments (secrets redacted before egress). PR triage rides the graph: list_prs / triage_prs / pr_risk / get_pr_impact (blast radius + risk score), conflicts_prs (merge-order hotspots), suggest_reviewers, suggested_review_questions. Exposed on the CLI as gortex prs / gortex review. See cli.md and mcp.md.
  • Phase-enforcement workflow — set_planning_mode switches the session between a no-writes planning phase (every editing tool removed and hard-blocked) and editing mode. workflow drives a phase-enforcement state machine (explore → implement → verify); editing tools are gated until the implement phase.
  • Diagnostics & code actions — subscribe_diagnostics / unsubscribe_diagnostics push LSP publishDiagnostics; get_diagnostics reads the current state. get_code_actions / apply_code_action / fix_all_in_file wire LSP code actions (quickfix / organizeImports / refactor / source) across every running language server. Server-driven capability registration (client/registerCapability) is honoured live.
  • Prompt-injection screening — every tool call is screened for injection patterns; non-blocking _meta.gortex_security advisory on hit. Disable with GORTEX_MCP_SANITIZE=0.

Notifications & live signals

Five proactive push channels — per-session opt-in, delta-filtered, initial replay, auto-cleanup on disconnect. Subscriber counts surface in graph_stats.

  • notifications/diagnostics — LSP publishDiagnostics fan-out. Filter by min_severity / path_prefix.
  • notifications/workspace_readiness — daemon warmup phase transitions (snapshot_loaded → parallel_parse → deferred_passes_all → global_resolve → end_batch → watcher_started → ready). Late subscribers get the last-known phase replayed.
  • notifications/daemon_health — periodic ticker (default 15 s, clampable 1 s..5 min). Snapshots uptime, alloc/sys/heap, num_goroutine, num_gc, tracked_repos, sessions, lsp_alive, graph nodes/edges. Only runs while ≥1 subscriber is attached.
  • notifications/stale_refs — per-session intersect of watcher symbol-change events against the session's viewed/modified working set. Fires only when a change actually touches what this session has consumed.
  • notifications/graph_invalidated — coarse "the graph was rebuilt, drop cached results" signal. {node_count, edge_count, reason, ts}. Unfiltered.

Plus MCP progress notifications on long-running indexing / track_repository calls (notifications/progress with stage messages walking files → parsing → resolving → semantic enrichment → search index → contracts → done).

Coverage, churn, ownership

  • Test taxonomy — functions/methods in test files carry Meta["is_test"] + Meta["test_role"] (test / benchmark / fuzz / example) + Meta["test_runner"]. Runner identifier resolved from parser-stamped imports across gotest / pytest / unittest / rspec / minitest / test-unit / jest / vitest / mocha / bun-test / node-test / playwright / cypress.
  • Stratified test classification — tests edges carry the role so winnow_symbols can filter "production functions only" and coverage analyzers can reason per-tier.
  • Test-coverage gaps (get_untested_symbols) — inverse of get_test_targets. Functions/methods not reached from any test file, ranked by fan-in.
  • get_churn_rate — per-symbol commit density from blame.
  • find_co_changing_symbols — ranked git co-change neighbours over mined cosine-weighted co_change edges.
  • Enrichment CLI — gortex enrich blame | coverage | releases | all hydrates the graph with the metadata stale_* / coverage* / ownership / releases analyzers need.

Framework, infra, contracts

  • Framework graph layer — handler→route edges from HTTP / gRPC / GraphQL / WebSocket / Phoenix / Kafka topic registrations. ORM model→table edges across GORM, SQLAlchemy, Django, ActiveRecord, JPA, TypeORM, Ecto. Component-tree edges for JSX/TSX and Phoenix HEEx. analyze kind=routes/models/components.
  • Cross-file contract resolution — router-mount prefixes are folded into HTTP contract paths (FastAPI include_router(prefix=), Express app.use, NestJS @Controller); project-specific HTTP client wrappers register as consumers via a configurable alias list; Symfony services.yaml / Spring beans XML emit interface→implementation DI bindings; MyBatis mapper XML is indexed (a node per <select|insert|update|delete>, <namespace>::<id>) and linked from the Java DAO methods that execute it; supabase.rpc / SQLAlchemy func.* call sites link to SQL CREATE FUNCTION nodes.
  • Real-time transport edges — WebSocket / SSE client constructors (new WebSocket / new EventSource) and server upgrade handlers (Go gorilla / gobwas / coder) surface as channel edges, reusing the pub/sub event model.
  • Infrastructure graph layer — KindResource (K8s Deployments, Services, Ingresses, ConfigMaps, Secrets, CronJobs), KindKustomization (overlay tree), KindImage (Dockerfile FROM and K8s container.image) with depends_on / configures / mounts / exposes / uses_env edges. Cross-references with code-side os.Getenv automatically. analyze kind=k8s_resources/kustomize/images.
  • Framework-aware extraction — first-class entry points (Alembic migrations, Next.js pages / App-Router files, ASP.NET host files) stamped so the dead-code analyzer never flags runtime-invoked symbols. Swift HTTP routes (Vapor + Alamofire). XAML / AXAML extractor (x:Class code-behind link, named controls, {Binding} expressions). .NET DI registrations + COM-interop flags. Lombok / MapStruct / Kotlin / CommunityToolkit.Mvvm source-generated members surfaced via has_generated_members.
  • Cross-repo API contracts — auto-detection across HTTP routes, gRPC services, GraphQL schemas, Kafka/RabbitMQ/NATS/Redis pub/sub topics, WebSocket events, env vars, OpenAPI specs, Temporal workflows. Normalised to canonical IDs (e.g. http::GET::/api/users/{id}) and matched across repos to detect orphan providers/consumers and mismatches. See contracts.md.
  • Artifacts — non-code knowledge files (DB schemas, API specs, ADRs, infra configs) declared in .gortex.yaml::artifacts are indexed as artifact nodes. search_artifacts / get_artifact surface them; EdgeReferences links code to spec.
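
To make the canonical-ID idea from the cross-repo contracts bullet concrete, the sketch below normalises a few framework-specific route syntaxes into the `http::GET::/api/users/{id}` shape quoted above, then lists consumer IDs with no matching provider. Only the ID shape comes from the source; `canonicalHTTP`, `orphanConsumers`, and the handled syntaxes (`:id`, `<int:id>`) are assumptions for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// canonicalHTTP rewrites a route into "http::<METHOD>::<path>" with every
// path parameter expressed as {name}, regardless of the framework's syntax.
func canonicalHTTP(method, path string) string {
	segs := strings.Split(strings.Trim(path, "/"), "/")
	for i, s := range segs {
		switch {
		case strings.HasPrefix(s, ":"): // Express-style :id
			segs[i] = "{" + s[1:] + "}"
		case strings.HasPrefix(s, "<") && strings.HasSuffix(s, ">"): // Flask-style <int:id>
			name := s[1 : len(s)-1]
			if j := strings.LastIndex(name, ":"); j >= 0 {
				name = name[j+1:]
			}
			segs[i] = "{" + name + "}"
		}
	}
	return "http::" + strings.ToUpper(method) + "::/" + strings.Join(segs, "/")
}

// orphanConsumers returns the consumer IDs that no provider serves.
func orphanConsumers(providers, consumers []string) []string {
	served := map[string]bool{}
	for _, p := range providers {
		served[p] = true
	}
	var orphans []string
	for _, c := range consumers {
		if !served[c] {
			orphans = append(orphans, c)
		}
	}
	return orphans
}

func main() {
	providers := []string{canonicalHTTP("get", "/api/users/:id")}
	consumers := []string{
		canonicalHTTP("GET", "/api/users/<int:id>/"),
		canonicalHTTP("POST", "/api/orders"),
	}
	fmt.Println(orphanConsumers(providers, consumers))
}
```

Once both sides are reduced to the same string form, cross-repo matching becomes a set comparison. This sketch ignores parameter names that differ between repos, which a real matcher would need to handle.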

Persistence, scale, isolation

  • On-disk persistence — snapshots the graph on shutdown, restores on startup with incremental re-indexing of only changed files (~200 ms vs 3–5 s full re-index). Snapshots keyed by (repo, branch) so branch-switches reuse each branch's cached index and git worktrees of one repo share the base. Change detection is mtime-based by default, opt-in BLAKE3 Merkle-tree mode (index.merkle / GORTEX_MERKLE) diffs by content hash. Cross-process advisory lock guards the store. See architecture.md.
  • Crash-resilient indexing — opt-in (index.crash_isolation) tree-sitter extraction in worker subprocesses, so a grammar SIGSEGV / OOM / hang on one pathological file is contained. The worker pool is long-lived. Per-file extraction budget, size cap, content-based bundled/minified detection — each leaves a synthetic node carrying skipped_due_to_* telemetry. Pluggable pre-ingestion content transforms (index.transforms) — BOM stripping plus user external-command processors (minified-bundle expansion, SVG/TOON, PDF→markdown).
  • Long-living daemon (optional) — gortex daemon start runs a single shared process holding the graph for every tracked repo. Each Claude Code / Cursor / Kiro window connects as a thin stdio proxy over a Unix socket with per-client session isolation. Live fsnotify on every tracked repo. gortex install sets up user-level config; gortex daemon install-service installs a LaunchAgent (macOS) or systemd --user unit (Linux). Binaries fall back to embedded mode if the daemon isn't running.
  • MCP 2026 Streamable HTTP transport (/mcp) — the wire format the June 2026 MCP release locks in. Opt-in on the daemon via gortex daemon start --http-addr <addr> (non-localhost binds require --http-auth-token); served on the same address as the /v1/* JSON API. See server.md.
  • Watch mode — surgical graph updates on file change across all tracked repos, live sync with agents.

Agent ergonomics

  • PreToolUse + PostToolUse + PreCompact + Stop hooks — two postures picked at install (gortex install --hook-mode={deny,enrich}). deny (default) has PreToolUse enrich Read/Grep/Glob/Bash with graph context and redirect, by denying the call, to Gortex MCP tools. enrich never denies — PreToolUse downgrades to soft additionalContext, and a PostToolUse hook augments the actual tool output with graph context. PreCompact injects a condensed orientation snapshot before context compaction. Stop runs post-task diagnostics (detect_changes, get_test_targets, check_guards, analyze dead_code, contracts check).
  • Agent feedback loop — unified feedback tool (action: "record" / "query"). Cross-session persistence improves future smart_context quality via feedback-aware reranking.
  • Per-community skills — gortex init --skills (default on) auto-generates SKILL.md per detected community with key files, entry points, cross-community connections, and MCP tool invocations. The same routing table lands in every detected agent's per-repo instructions file. See skills.md.
  • Session memory — save_note / query_notes / distill_session persist agent-authored notes per repo, auto-linked to symbols mentioned in the body. Notes survive daemon restarts and context compactions.
  • Development memories — store_memory / query_memories / surface_memories / edit_memory / rename_memory — cross-session, symbol-linked durable knowledge with kind (invariant / constraint / convention / gotcha / decision / incident / reference), importance (1..5), confidence (0..1). Surfaced proactively by surface_memories when anchor symbols / files enter the working set.
  • Repository-local persistent notebook — notebook_save / notebook_find / notebook_list / notebook_show / notebook_used. Markdown entries committed to git so agent journal entries surface in PR review.
  • Context export — export_context tool + gortex context CLI render graph context as portable markdown/JSON briefings for sharing outside MCP.
  • Composed workflow primitives — get_architecture (single-shot snapshot; pass resolution for a hierarchical file→package→service→system rollup), replay_episode (incident investigation from a symptom anchor), get_knowledge_gaps, get_surprising_connections, verify_citation, check_onboarding_performed, check_references, generate_skill, gortex_wakeup.

Token economy

  • GCX1 compact wire format — published, round-trippable text format. Opt-in per call via format: "gcx" on every list-shaped tool. Auto-served as the default for known clients (Claude Code, Cursor, VS Code, Zed, Aider, Kilo Code, OpenCode, OpenClaw, Codex) when no format is passed. Median −27.4 % savings vs JSON, best case −38.3 %, 100 % round-trip integrity. Spec: wire-format.md. Standalone MIT-licensed reference implementations: github.com/gortexhq/gcx-go and github.com/gortexhq/gcx-ts (npm @gortex/wire).
  • TOON fallback wire format — second-tier compact text (~10–15 % smaller than JSON, lossy but human-friendly). format: "toon".
  • Budget-by-default MCP responses — list-shaped tools cap each page at the project default budget and return next_cursor for the tail. Per-call caps via max_bytes and max_tokens (composable — tighter wins). Truncation markers ride on the response.
  • Graded-fidelity context economy — smart_context fidelity: "graded" returns a context_manifest that tiers symbols by graph distance: focus symbols at full source, caller/callee ring as signature stubs, keyword-match remainder as outline — packed under one token_budget (which, like the seed count, scales with project size when unset). Large interchangeable symbol families are skeletonized to one representative. Every call also emits a blast_radius (callers by file + covering tests + a no-tests warning) and a file-clustered working_set. estimate: true projects a call's token cost before fetching. if_none_match turns a repeated call on unchanged code into a near-zero-token not_modified no-op. compress_bodies takes a keep predicate; fidelity_globs sets a per-glob full/compress/omit tier. max_lines does AST-aware salience truncation (control-flow skeleton kept, leaf runs collapse to … N lines elided …).
  • ETag conditional fetch — content-hash if_none_match on source-reading tools avoids re-transmitting unchanged symbols.
  • Response post-filter re-cutting — every large tool response captured into a bounded per-session ring; ctx_grep / ctx_slice / ctx_peek / head_results / ctx_stats re-cut a prior result without re-issuing the original query.
  • Token-savings tracking — per-call tokens_saved field; session-level metrics in graph_stats; gortex savings three-bucket dashboard (Today / Last 7 days / All time) with USD avoided priced per model. See savings.md.
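
The ETag conditional fetch above follows the familiar content-hash pattern. Here is a minimal sketch, assuming a truncated SHA-256 digest as the tag; the `Response` shape, `contentTag`, and `readSymbol` are hypothetical, and the source does not say which hash the real tools use.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Response is a hypothetical reply shape for a source-reading tool.
type Response struct {
	NotModified bool
	ETag        string
	Source      string
}

// contentTag derives a short tag from the symbol's current source text.
func contentTag(source string) string {
	sum := sha256.Sum256([]byte(source))
	return hex.EncodeToString(sum[:8])
}

// readSymbol returns the source unless the caller's ifNoneMatch tag already
// matches the current content, in which case only the tag is sent back.
func readSymbol(source, ifNoneMatch string) Response {
	tag := contentTag(source)
	if ifNoneMatch == tag {
		return Response{NotModified: true, ETag: tag}
	}
	return Response{ETag: tag, Source: source}
}

func main() {
	src := "func Add(a, b int) int { return a + b }"
	first := readSymbol(src, "")           // cold read: full body
	second := readSymbol(src, first.ETag)  // unchanged: tag only
	third := readSymbol(src+"\n", first.ETag) // edited: full body again
	fmt.Println(first.NotModified, second.NotModified, third.NotModified)
}
```

The saving comes from the second call. The agent already holds the body in context, so the reply can shrink to a tag and a flag.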