Code-intelligence improvements: call-edge return-usage, per-function CFG reaching-defs, affected-by re-resolution, contract bridges, in-process type resolvers #83
Merged
Stamp return_usage on every call edge at extraction time across Go, Python, JavaScript, TypeScript, Java, Rust, Ruby, and C#: discarded, assigned, partially_ignored, returned, goroutine, deferred, argument, condition. One shared parent-chain classifier driven by per-grammar node-kind tables, with closure and switch-expression boundaries honored so a call inside a block or match arm is not mislabeled by its enclosing statement; unknown shapes stay unstamped rather than mislabeled. Surfaced on find_usages (per-usage return_usage field + filter param) and verify_change (per-function return-usage distribution of real call sites, so a return-signature change shows how every call site consumes the value).
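The classifier described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration. `Node` is a hypothetical stand-in for a tree-sitter node, and `goTable` is an illustrative kind table that loosely follows tree-sitter-go node names; it is not the table this PR ships. The classifier climbs from the call through kinds it doesn't recognize until an ancestor kind decides the usage. Blocks and closures act as boundaries that produce no stamp, and an assignment is refined into `partially_ignored` when some of its targets are the blank identifier.

```go
package main

import "fmt"

// Usage says how a call site consumes the callee's return value.
type Usage string

const (
	Unstamped        Usage = "" // unknown shape: leave the edge unlabeled
	Discarded        Usage = "discarded"
	Assigned         Usage = "assigned"
	PartiallyIgnored Usage = "partially_ignored"
	Returned         Usage = "returned"
	Goroutine        Usage = "goroutine"
	Deferred         Usage = "deferred"
	Argument         Usage = "argument"
	Condition        Usage = "condition"
)

// Node is a hypothetical stand-in for a tree-sitter node (named children only).
type Node struct {
	Kind, Text string
	Parent     *Node
	Children   []*Node
}

// add attaches children to n and returns n, so small trees can be built inline.
func (n *Node) add(children ...*Node) *Node {
	for _, c := range children {
		c.Parent = n
		n.Children = append(n.Children, c)
	}
	return n
}

// KindTable is the per-grammar data that drives the one shared classifier.
type KindTable struct {
	Classify map[string]Usage // ancestor kind -> usage it implies
	Boundary map[string]bool  // blocks and closures: stop, emit no stamp
}

// goTable is illustrative; kind names follow tree-sitter-go.
var goTable = KindTable{
	Classify: map[string]Usage{
		"expression_statement": Discarded, "short_var_declaration": Assigned,
		"assignment_statement": Assigned, "return_statement": Returned,
		"go_statement": Goroutine, "defer_statement": Deferred,
		"argument_list": Argument, "if_statement": Condition,
	},
	Boundary: map[string]bool{"block": true, "func_literal": true, "source_file": true},
}

// classify climbs from a call until an ancestor kind decides; other kinds are climbed through.
func classify(call *Node, t KindTable) Usage {
	from := call
	for p := call.Parent; p != nil; from, p = p, p.Parent {
		u, ok := t.Classify[p.Kind]
		switch {
		case t.Boundary[p.Kind]:
			return Unstamped
		case !ok:
			continue // e.g. parentheses, binary or selector expressions
		case u != Assigned:
			return u
		case p.Children[0] == from:
			return Unstamped // the call sits on the left-hand side
		}
		return assignedShape(p.Children[0])
	}
	return Unstamped
}

// assignedShape counts `_` targets; all-blank => discarded is this sketch's assumption.
func assignedShape(lhs *Node) Usage {
	targets, blank := lhs.Children, 0
	if len(targets) == 0 {
		targets = []*Node{lhs} // a single bare target such as `x`
	}
	for _, t := range targets {
		if t.Text == "_" {
			blank++
		}
	}
	switch blank {
	case 0:
		return Assigned
	case len(targets):
		return Discarded
	}
	return PartiallyIgnored
}

func main() {
	// `if ok { var y = fetch() }`: the block boundary stops the climb before if_statement.
	call := &Node{Kind: "call_expression"}
	spec := (&Node{Kind: "var_spec"}).add(&Node{Kind: "identifier", Text: "y"}, (&Node{Kind: "expression_list"}).add(call))
	(&Node{Kind: "if_statement"}).add((&Node{Kind: "block"}).add((&Node{Kind: "var_declaration"}).add(spec)))
	fmt.Println(classify(call, goTable) == Unstamped) // true
}
```

Without the `block` boundary, that call would climb to the `if_statement` and be mislabeled `condition`. Keeping grammar knowledge in data is what lets one walker serve many languages. In this sketch, supporting another grammar means writing another `KindTable` (for example, with `lambda` as a Python boundary) rather than another classifier.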
…xpoint New internal/cfg package: on-demand per-function CFGs from the tree-sitter AST for Go, Python, JavaScript, TypeScript, Java, Rust, and Ruby — basic blocks with per-statement def/use sets, labeled edges (branches, loops, labeled break/continue, switch/match fallthrough, try/except/finally), and a bitset GEN/KILL reaching-definitions fixpoint producing statement-granular def-to-use chains, with a Mermaid renderer. Exposed as the get_cfg MCP tool and analyze kind=def_use. internal/dataflow gained a CFG-backed refiner on flow_between and taint_paths: same-function value_flow hops are confirmed or pruned based on whether the def reaches the use, and pruned paths sink in the ranking.
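The reaching-definitions step is the classic bitset dataflow problem, and a small sketch makes the GEN/KILL fixpoint concrete. This is not the `internal/cfg` code. `Stmt`, `Block`, and `ReachingDefs` are hypothetical, each statement is assumed to define at most one variable, and edge labels are dropped because reaching definitions only needs successor lists. Per block, GEN holds the definitions still live at the block's exit, and KILL holds every definition of a variable the block assigns. The loop iterates IN[b] = ⋃ OUT[p] over predecessors p and OUT[b] = GEN[b] ∪ (IN[b] − KILL[b]) until no OUT set changes. It then replays each block statement by statement to emit def→use chains.

```go
package main

import "fmt"

// Stmt is a hypothetical per-statement summary: at most one def, plus uses.
type Stmt struct {
	Def  string
	Uses []string
}

// Block is a basic block; Succs index into the function's block slice.
type Block struct {
	Stmts []Stmt
	Succs []int
}

type bitset []uint64

func newBitset(n int) bitset { return make(bitset, (n+63)/64) }
func (b bitset) set(i int)   { b[i/64] |= 1 << uint(i%64) }

// ReachingDefs returns def->use chains as "var: B<blk>.<stmt> -> B<blk>.<stmt>".
func ReachingDefs(blocks []Block) []string {
	var defs [][2]int           // def number -> (block, stmt)
	id := map[[2]int]int{}      // (block, stmt) -> def number
	byVar := map[string][]int{} // var -> its def numbers, ascending
	for bi, b := range blocks {
		for si, s := range b.Stmts {
			if s.Def != "" {
				id[[2]int{bi, si}] = len(defs)
				byVar[s.Def] = append(byVar[s.Def], len(defs))
				defs = append(defs, [2]int{bi, si})
			}
		}
	}
	n, nb := len(defs), len(blocks)
	// step applies one statement: a def of v kills every def of v, then generates itself.
	step := func(live bitset, bi, si int, s Stmt) {
		for _, d := range byVar[s.Def] {
			live[d/64] &^= 1 << uint(d%64)
		}
		if s.Def != "" {
			live.set(id[[2]int{bi, si}])
		}
	}
	gen, kill, in, out, preds := make([]bitset, nb), make([]bitset, nb), make([]bitset, nb), make([]bitset, nb), make([][]int, nb)
	for bi, b := range blocks {
		gen[bi], kill[bi], out[bi] = newBitset(n), newBitset(n), newBitset(n)
		for si, s := range b.Stmts {
			step(gen[bi], bi, si, s) // GEN: the block's defs still live at its exit
			for _, d := range byVar[s.Def] {
				kill[bi].set(d) // KILL: every def of a var this block defines
			}
		}
		for _, s := range b.Succs {
			preds[s] = append(preds[s], bi)
		}
	}
	// IN[b] = OR of OUT[pred]; OUT[b] = GEN[b] | (IN[b] &^ KILL[b]); repeat until stable.
	for changed := true; changed; {
		changed = false
		for bi := range blocks {
			in[bi] = newBitset(n)
			for _, p := range preds[bi] {
				for w := range in[bi] {
					in[bi][w] |= out[p][w]
				}
			}
			for w := range out[bi] {
				if nw := gen[bi][w] | (in[bi][w] &^ kill[bi][w]); nw != out[bi][w] {
					out[bi][w], changed = nw, true
				}
			}
		}
	}
	// Replay each block from its IN set to emit statement-granular chains.
	var chains []string
	for bi, b := range blocks {
		cur := append(bitset(nil), in[bi]...)
		for si, s := range b.Stmts {
			for _, v := range s.Uses {
				for _, d := range byVar[v] {
					if cur[d/64]&(1<<uint(d%64)) != 0 {
						chains = append(chains, fmt.Sprintf("%s: B%d.%d -> B%d.%d", v, defs[d][0], defs[d][1], bi, si))
					}
				}
			}
			step(cur, bi, si, s) // a statement reads its uses before its own def lands
		}
	}
	return chains
}

func main() {
	// x := 1; if c { x = 2 }; use(x): a diamond CFG.
	x := []Stmt{{Def: "x"}}
	blocks := []Block{{x, []int{1, 2}}, {x, []int{3}}, {nil, []int{3}}, {[]Stmt{{Uses: []string{"x"}}}, nil}}
	fmt.Println(ReachingDefs(blocks))
}
```

In the diamond, both definitions of `x` reach the use after the join, so the sketch reports two chains: `B0.0 -> B3.0` and `B1.0 -> B3.0`. A same-function hop whose definition appears in no chain is the kind a CFG-backed refiner could prune.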
When an incremental sync changes a file's symbol signatures (or removes symbols / changes their kind), the files that reference those symbols are re-resolved synchronously in the same pipeline; a body-only edit produces no delta and fans out to nothing. The delta is computed on a line-insensitive, graph-derived symbol shape so it is meaningful across languages and not defeated by line-embedded node IDs, and parse failures are not mistaken for symbol removal. Affected files come from a reverse reference-facts lookup (RefFactsReader.LoadRefFactsByTargets, backed by a new by-target index) unioned with a pre-evict in-edge snapshot, capped with truncation accounting. The no-delta path stays cheap. Also fixes the cold-index resolver shadow-swap and a stale ref-facts row left when a reference disappears.
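A sketch of the delta and fan-out steps follows. The names and data shapes are hypothetical. `SymbolShape` reduces the graph-derived shape to a kind plus signature text keyed by qualified name, and plain maps stand in for the by-target reference-facts lookup and the pre-evict in-edge snapshot. The sketch also assumes that newly added symbols need no fan-out, since the description only mentions signature changes, removals, and kind changes.

```go
package main

import (
	"fmt"
	"sort"
)

// SymbolShape is a hypothetical line-insensitive summary of one symbol: no
// positions or line-embedded node IDs, only what references depend on.
type SymbolShape struct {
	Kind      string
	Signature string
}

// Delta lists symbols whose shape changed or which disappeared. A failed parse
// yields no delta instead of looking like every symbol was removed.
func Delta(before, after map[string]SymbolShape, parseOK bool) []string {
	if !parseOK {
		return nil
	}
	var changed []string
	for name, old := range before {
		if cur, ok := after[name]; !ok || cur != old {
			changed = append(changed, name)
		}
	}
	sort.Strings(changed)
	return changed
}

// AffectedFiles unions the reverse reference lookup (target -> referencing files)
// with the in-edge snapshot taken before eviction, then caps the result and
// reports how many files were truncated.
func AffectedFiles(delta []string, refsByTarget, preEvict map[string][]string, limit int) ([]string, int) {
	if len(delta) == 0 {
		return nil, 0 // body-only edit: nothing to fan out to
	}
	seen := map[string]bool{}
	for _, target := range delta {
		for _, f := range refsByTarget[target] {
			seen[f] = true
		}
		for _, f := range preEvict[target] {
			seen[f] = true
		}
	}
	files := make([]string, 0, len(seen))
	for f := range seen {
		files = append(files, f)
	}
	sort.Strings(files)
	if len(files) > limit {
		return files[:limit], len(files) - limit
	}
	return files, 0
}

func main() {
	before := map[string]SymbolShape{
		"store.Fetch": {"function", "(ctx context.Context) (Item, error)"},
		"store.Save":  {"function", "(Item) error"},
	}
	// A body-only edit or shifted lines leave every shape identical.
	fmt.Println(len(Delta(before, before, true))) // 0

	after := map[string]SymbolShape{
		"store.Fetch": {"function", "(ctx context.Context, id string) (Item, error)"},
	}
	delta := Delta(before, after, true) // Fetch's signature changed; Save was removed
	refs := map[string][]string{"store.Fetch": {"api/handler.go", "cmd/main.go"}}
	snapshot := map[string][]string{"store.Save": {"jobs/sync.go"}}
	files, truncated := AffectedFiles(delta, refs, snapshot, 2)
	fmt.Println(delta, files, truncated) // [store.Fetch store.Save] [api/handler.go cmd/main.go] 1
}
```

Because the shape carries no line numbers, moving a function or editing only its body compares equal, and the early return keeps the no-delta path cheap.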
IDL-aware contract extraction (.proto package/service/method canonical identities with brace-bounded service blocks; a Thrift extractor) plus a matcher join that pairs gRPC/Thrift providers and consumers across casing and package-qualification, gated on real gRPC evidence so plain Register*Server function definitions do not mint phantom providers. Each matched provider-consumer group materializes one persisted contract-bridge node, scoped to the (workspace, project) match boundary so unrelated services never merge, with deterministic node fields and reconcile serialization. The contracts tool gained action=bridge: a reciprocal-rank-fusion group query and a cross-service impact mode.
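One way to picture the matcher join is as grouping on a canonical key inside the match boundary. The sketch below is hypothetical throughout: `Endpoint`, `Bridge`, `canonicalKey`, and the `bridge:` ID format are invented. It canonicalizes by stripping the package qualifier and lowercasing, which is only one plausible reading of "across casing and package-qualification". Providers without gRPC evidence are skipped, and a group becomes a bridge only when it has both a provider and a consumer.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Endpoint is a hypothetical extracted contract endpoint.
type Endpoint struct {
	Workspace, Project string
	Service, Method    string // as written at the site, possibly package-qualified
	Provider           bool   // false means consumer
	GRPCEvidence       bool   // hypothetical flag, e.g. the server is actually registered
	File               string
}

// Bridge is one materialized provider<->consumer group.
type Bridge struct {
	ID                   string
	Providers, Consumers []string
}

// canonicalKey drops package qualification and case, so
// "/acme.users.v1.UserService" + "GetUser" and "UserService" + "getUser" meet.
func canonicalKey(service, method string) string {
	service = strings.TrimPrefix(service, "/")
	if i := strings.LastIndex(service, "."); i >= 0 {
		service = service[i+1:]
	}
	return strings.ToLower(service) + "/" + strings.ToLower(method)
}

// MatchBridges joins providers and consumers on (workspace, project, key) and
// keeps only groups that have at least one of each.
func MatchBridges(eps []Endpoint) []Bridge {
	groups := map[string]*Bridge{}
	for _, e := range eps {
		if e.Provider && !e.GRPCEvidence {
			continue // no phantom providers from bare Register*Server definitions
		}
		id := "bridge:" + e.Workspace + "/" + e.Project + "/" + canonicalKey(e.Service, e.Method)
		g := groups[id]
		if g == nil {
			g = &Bridge{ID: id}
			groups[id] = g
		}
		if e.Provider {
			g.Providers = append(g.Providers, e.File)
		} else {
			g.Consumers = append(g.Consumers, e.File)
		}
	}
	var out []Bridge
	for _, g := range groups {
		if len(g.Providers) > 0 && len(g.Consumers) > 0 {
			sort.Strings(g.Providers)
			sort.Strings(g.Consumers)
			out = append(out, *g)
		}
	}
	// Deterministic order, so repeated runs produce identical bridge lists.
	sort.Slice(out, func(i, j int) bool { return out[i].ID < out[j].ID })
	return out
}

func main() {
	eps := []Endpoint{
		{Workspace: "ws", Project: "shop", Service: "acme.users.v1.UserService", Method: "GetUser",
			Provider: true, GRPCEvidence: true, File: "users/server.go"},
		{Workspace: "ws", Project: "shop", Service: "UserService", Method: "getUser", File: "web/client.ts"},
		// Another project is another match boundary, and this provider lacks gRPC evidence.
		{Workspace: "ws", Project: "billing", Service: "UserService", Method: "GetUser", File: "billing/client.go"},
		{Workspace: "ws", Project: "billing", Service: "UserService", Method: "GetUser", Provider: true, File: "billing/stub.go"},
	}
	for _, b := range MatchBridges(eps) {
		fmt.Println(b.ID, b.Providers, b.Consumers)
	}
}
```

Only the `shop` group prints. The `billing` consumer never gets a provider: the `billing` provider lacks gRPC evidence, and the `shop` provider sits in a different project.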
New internal/semantic/tstypes package: per-language type resolvers for Java, Python, Ruby, Rust, TypeScript/JavaScript, and C# that run fully in-process over the shared tree-sitter AST — no external language server. A table-driven engine builds per-file scope graphs, binds declared and constructor types, propagates them through local assignments, resolves receivers against the graph's method sets via import-aware cross-file lookup, and synthesizes implements/extends edges per language. Resolutions are stamped at the ast_resolved tier with semantic_source <lang>-types, never downgrading a stronger edge; ambiguous receivers are skipped. Enrichment is scoped to the repo being enriched, runs its graph-apply phase under the resolve mutex, persists full edge provenance on disk backends, and wires single-file incremental enrichment. Providers register as supplemental in the semantic manager and coexist with LSP providers.
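The following is a reduced sketch of the resolver's core steps: bind a type, propagate it through an assignment, resolve a receiver against a method set, skip ambiguous receivers, and stamp without downgrading a stronger edge. The names are hypothetical (`Scope`, `Edge`, `Stamp`, and the tier constants, whose relative order is assumed). The scope is a flat map for one function rather than the per-file, import-aware scope graph described above.

```go
package main

import "fmt"

// Tier ranks edge confidence; higher is stronger. The ordering is hypothetical.
type Tier int

const (
	TierNameMatch Tier = iota
	TierASTResolved
	TierLanguageServer
)

// Edge is a hypothetical call edge carrying its provenance.
type Edge struct {
	Target string
	Tier   Tier
	Source string
}

// Scope is a flat, single-function slice of a scope graph: variable -> candidate types.
type Scope map[string]map[string]bool

// Bind records a declared or constructor type, e.g. `UserRepo repo = new UserRepo()`.
func (s Scope) Bind(v, typ string) {
	if s[v] == nil {
		s[v] = map[string]bool{}
	}
	s[v][typ] = true
}

// Assign propagates every candidate type of src into dst, as in `var dst = src`.
func (s Scope) Assign(dst, src string) {
	for t := range s[src] {
		s.Bind(dst, t)
	}
}

// Resolve looks recv.method up in the receiver type's method set. Unknown or
// ambiguous receivers (zero or several candidate types) are skipped.
func (s Scope) Resolve(recv, method string, methodSets map[string]map[string]bool) (string, bool) {
	if len(s[recv]) != 1 {
		return "", false
	}
	for t := range s[recv] {
		if methodSets[t][method] {
			return t + "." + method, true
		}
	}
	return "", false
}

// Stamp records an AST-resolved target but never downgrades a stronger edge.
func Stamp(e *Edge, target, lang string) {
	if e.Tier > TierASTResolved {
		return
	}
	e.Target, e.Tier, e.Source = target, TierASTResolved, lang+"-types"
}

func main() {
	methodSets := map[string]map[string]bool{"UserRepo": {"find": true}, "CacheRepo": {"find": true}}
	s := Scope{}
	s.Bind("repo", "UserRepo")
	s.Assign("alias", "repo")
	e := &Edge{Target: "find", Tier: TierNameMatch}
	if target, ok := s.Resolve("alias", "find", methodSets); ok {
		Stamp(e, target, "java")
	}
	fmt.Printf("%+v\n", *e) // {Target:UserRepo.find Tier:1 Source:java-types}

	s.Bind("either", "UserRepo")
	s.Bind("either", "CacheRepo")
	_, ok := s.Resolve("either", "find", methodSets)
	fmt.Println(ok) // false
}
```

Skipping any receiver with more than one candidate type errs toward fewer edges rather than wrong ones, which is the trade-off behind skipping ambiguous receivers.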
Summary
Five independent code-intelligence improvements, one commit each. Every commit is fully implemented and tested; the branch builds with CGO and passes `go test -race` across all touched packages.

1. Return-value usage classification on call edges (26d50ae7)

Every call edge is stamped at extraction time with how its call site consumes the callee's return value: `discarded`, `assigned`, `partially_ignored`, `returned`, `goroutine`, `deferred`, `argument`, or `condition`. A single parent-chain classifier, driven by per-grammar node-kind tables, covers Go, Python, JavaScript, TypeScript, Java, Rust, Ruby, and C#. Closure and switch-expression boundaries are honored, so a call inside a block or match arm is not mislabeled by its enclosing statement, and unknown shapes stay unstamped rather than guessed. The classification is surfaced on `find_usages` (a per-usage field plus a filter) and on `verify_change` (a per-function distribution of real call sites), so a return-signature change shows exactly how each caller uses the value.

2. Per-function control-flow graphs + reaching-definitions fixpoint (878dd0e7)

New `internal/cfg` package: on-demand per-function CFGs built from the tree-sitter AST for Go, Python, JavaScript, TypeScript, Java, Rust, and Ruby. Each CFG has basic blocks with per-statement def/use sets and labeled edges (branches, loops, labeled break/continue, switch/match fallthrough, try/except/finally). A bitset GEN/KILL reaching-definitions fixpoint produces statement-granular def→use chains, and a Mermaid renderer is included. Exposed as the `get_cfg` MCP tool and `analyze kind=def_use`. `internal/dataflow` gained a CFG-backed refiner on `flow_between` and `taint_paths`: same-function value-flow hops are confirmed or pruned based on whether the definition actually reaches the use, and pruned paths sink in the ranking.

3. Affected-by re-resolution on incremental sync (f518c494)

When an incremental sync changes a file's symbol signatures (or removes symbols or changes their kind), the files that reference those symbols are re-resolved synchronously in the same pipeline; a body-only edit produces no delta and fans out to nothing. The delta is computed on a line-insensitive, graph-derived symbol shape, so it is meaningful across languages and is not defeated by line-embedded node IDs, and parse failures are no longer mistaken for symbol removal. Affected files come from a reverse reference-facts lookup (`RefFactsReader.LoadRefFactsByTargets`, backed by a new by-target index) unioned with a pre-evict in-edge snapshot, and the resulting set is capped with truncation accounting. The no-delta path stays cheap.

4. Persisted cross-service contract-bridge subgraph (30e58f45)

IDL-aware contract extraction (`.proto` package/service/method canonical identities with brace-bounded service blocks, plus a Thrift extractor) and a matcher join that pairs gRPC/Thrift providers and consumers across casing and package qualification. The join is gated on real gRPC evidence, so plain `Register*Server` function definitions don't mint phantom providers. Each matched provider↔consumer group materializes one persisted contract-bridge node, scoped to the `(workspace, project)` match boundary so unrelated services never merge, with deterministic node fields and reconcile serialization. The `contracts` tool gained `action=bridge`: a reciprocal-rank-fusion group query and a cross-service impact mode.

5. In-process tree-sitter type resolvers for six languages (8605e8fc)

New `internal/semantic/tstypes` package: per-language type resolvers for Java, Python, Ruby, Rust, TypeScript/JavaScript, and C# that run fully in-process over the shared tree-sitter AST, without spawning an external language server. A table-driven engine builds per-file scope graphs, binds declared and constructor types, propagates them through local assignments, resolves receivers against the graph's method sets via import-aware cross-file lookup, and synthesizes implements/extends edges per language. Resolutions are stamped at the `ast_resolved` tier with `semantic_source` set to `<lang>-types` and never downgrade a stronger edge; ambiguous receivers are skipped. Enrichment is scoped to the repo being enriched, runs its graph-apply phase under the resolve mutex, persists full edge provenance on disk backends, and wires single-file incremental enrichment. Providers register as supplemental and coexist with LSP providers.

Quality
`main` (v0.44.1): conflicts with the recently merged work were resolved by union, and the merge surfaced (and fixed) one real omission where two additional call edges were left unstamped.

`go build ./...` and the CGO binary build are green; `go vet` is clean; `go test -race` is green across `parser/languages`, `cfg`, `dataflow`, `graph`, `graph/store_sqlite`, `graph/storetest`, `analysis`, `contracts`, `semantic/...`, `serverstack`, `resolver`, `config`, `agents/...`, `indexer`, and `mcp` (5,791 tests).

Notes
`.js` (not `.ts`) function remains undetectable by the affected-by delta, because the JavaScript extractor emits no parameter-shape nodes; TypeScript, Python, Java, C#, Rust, and Go are covered. Closing this gap would require a change to the JS extractor, which is outside the scope of this branch.