
Add deeper Python/Go semantic indexing passes beyond tree-sitter AST #84

@mohanagy

Goal

Move Python and Go from the current tree-sitter AST baseline to a deeper semantic indexing pass that resolves cross-file calls, type relationships, and (where possible) framework-shaped nodes for the major Python/Go web stacks.

Why

The README's honesty disclosure explicitly states:

Deep extraction is best on JS/TS with framework-aware passes for Express, Redux Toolkit, React Router, NestJS, and Next.js. Python / Ruby / Go / Java / Rust use tree-sitter AST.

For polyglot codebases, and especially Python backends on Django/FastAPI or Go services on Gin/Chi, the tree-sitter AST baseline gives us syntax structure but not the cross-file behavioral edges (calls, imports, type relationships) that make retrieve / impact / pr_impact strong on TS today. Closing this gap is the single biggest unlock for non-JS users.

Deliverables

Python:

  • Cross-file import resolution with module/package awareness (so from a.b import c resolves to the actual symbol).
  • Call edges between resolved symbols where statically determinable.
  • First-pass framework awareness for at least Django (URL conf → view → model) and FastAPI (router → endpoint → dependency injection).
  • Test fixtures under examples/demo-repo/ (or a Python sibling) exercising the new edges.
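A minimal sketch of the module/package-aware import resolution described above. It assumes the indexer already knows, for every project file, the set of top-level names that file defines (for example, collected from tree-sitter nodes). `module_to_file`, `resolve_from_import`, and the `symbols` map are hypothetical names, not existing project APIs. Namespace packages, sys.path entries outside the repo, and re-exports via `__all__` are deliberately ignored, and an unresolved import returns None so the caller can keep the tree-sitter edge.

```python
from __future__ import annotations

from pathlib import PurePosixPath
from typing import Optional


def module_to_file(module: str, files: set[str]) -> Optional[str]:
    """Map a dotted module name such as "app.models" to a project-relative file path."""
    base = module.replace(".", "/")
    # Check the package form first: a regular package directory shadows a
    # same-named module file, mirroring CPython's path-based finder.
    for candidate in (f"{base}/__init__.py", f"{base}.py"):
        if candidate in files:
            return candidate
    return None


def resolve_from_import(
    importer: str,
    module: str,
    name: str,
    level: int,
    symbols: dict[str, set[str]],
) -> Optional[tuple[str, Optional[str]]]:
    """Resolve `from <level dots><module> import <name>` inside `importer`.

    Returns (file, symbol) when `name` is a top-level definition of the target
    module, (file, None) when `name` is itself a submodule, or None when the
    target is outside the project (the caller falls back to the tree-sitter edge).
    """
    files = set(symbols)
    if level > 0:
        # level 1 = the importer's own package, level 2 = its parent, and so on.
        package = PurePosixPath(importer).parent
        for _ in range(level - 1):
            package = package.parent
        module = ".".join([*package.parts, *([module] if module else [])])

    target = module_to_file(module, files) if module else None
    # Like the runtime, look for an attribute of the module before a submodule.
    if target is not None and name in symbols[target]:
        return target, name
    submodule = module_to_file(f"{module}.{name}" if module else name, files)
    if submodule is not None:
        return submodule, None
    return None
```

Checking module attributes before submodules follows the runtime's own order for `from package import item`, which matters when a package's `__init__.py` binds a name that also exists as a submodule.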

Go:

  • Cross-package import resolution.
  • Method-receiver and interface-implementation edges.
  • First-pass framework awareness for at least net/http handlers, Gin routers, and Chi routers (route → handler → dependency).
  • Test fixtures exercising the new edges.
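For the method-receiver and interface-implementation edges, one possible starting point is Go's method-set rule: the method set of a defined type T holds the methods declared with receiver T, while the method set of *T holds the methods declared with receiver T or *T. The Python sketch below assumes receiver methods have already been collected from every file of a package (Go allows a type's methods to be spread across files) and that each interface's method names are known. `Method`, `method_sets`, and `implements_edges` are hypothetical names. It compares method names only, so it ignores signatures, embedded-field promotion, and generics; a precise check would need go/types on the Go side.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Method:
    receiver: str  # receiver base type name, e.g. "UserStore"
    pointer: bool  # True for `func (s *UserStore) Get()`, False for `func (c Clock) Now()`
    name: str


def method_sets(methods: list[Method]) -> tuple[dict[str, set[str]], dict[str, set[str]]]:
    """Build Go method sets: T gets value-receiver methods, *T gets both kinds."""
    value: dict[str, set[str]] = {}
    pointer: dict[str, set[str]] = {}
    for m in methods:
        value.setdefault(m.receiver, set())
        pointer.setdefault(m.receiver, set()).add(m.name)
        if not m.pointer:
            value[m.receiver].add(m.name)
    return value, pointer


def implements_edges(
    methods: list[Method], interfaces: dict[str, set[str]]
) -> list[tuple[str, str]]:
    """Emit (type, interface) edges, comparing method names only."""
    value, pointer = method_sets(methods)
    edges: list[tuple[str, str]] = []
    for type_name in sorted(pointer):
        for iface in sorted(interfaces):
            required = interfaces[iface]
            if not required:
                continue  # skip empty interfaces such as interface{}, which every type satisfies
            if required <= value[type_name]:
                edges.append((type_name, iface))  # T satisfies it, so *T does too
            elif required <= pointer[type_name]:
                edges.append(("*" + type_name, iface))  # only *T satisfies it
    return edges
```

Emitting the edge from T when possible, and from *T only otherwise, keeps the graph small, since any interface T satisfies is also satisfied by *T.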

Cross-cutting:

  • Updated docs/language-capability-matrix.md describing the new tier ("Tree-sitter + semantic resolution") for Python and Go.
  • Updated README honesty disclosure once the new tier ships.

Acceptance criteria

  • A Python/Django fixture produces URL-conf → view → model edges; a FastAPI fixture produces router → endpoint → dependency edges.
  • A Go/Gin fixture produces route → handler edges with method-receiver resolution working across files.
  • retrieve against a polyglot fixture produces materially more relevant results than the current tree-sitter-only baseline.
  • The capability matrix and README accurately describe the new tier — no overclaiming.
  • Existing tree-sitter extraction continues to work as a fallback when the deeper pass cannot resolve a target.
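The fallback criterion could be enforced by merging edges per call site: where the deeper pass resolved a site, its edges win; otherwise the tree-sitter edge is kept. This is a minimal Python sketch; the `Edge` fields, the `tier` labels, and `merge_edges` are hypothetical, and it assumes both passes identify call sites with the same key.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    source: str  # calling symbol, e.g. "app/views.py::user_detail"
    site: str    # call-site location, e.g. "app/views.py:12:4"
    target: str  # resolved symbol id, or the raw callee text when unresolved
    tier: str    # "semantic" or "tree-sitter"


def merge_edges(baseline: list[Edge], semantic: list[Edge]) -> list[Edge]:
    """Prefer semantic edges per call site; keep tree-sitter edges the deeper pass could not resolve."""
    # A site may carry several semantic edges (e.g. an interface call with
    # multiple implementations), so coverage is keyed by site, not by target.
    covered = {(e.source, e.site) for e in semantic}
    fallback = [e for e in baseline if (e.source, e.site) not in covered]
    return [*semantic, *fallback]
```

Keeping `tier` on each edge would also give the capability matrix and README a concrete basis for stating which results come from the new tier.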

Suggested milestone

v0.17-language-expansion

Roadmap source

README "Roadmap" section.

Metadata

Assignees

No one assigned

Labels

enhancement (New feature or request), indexing (Index build / extraction / SPI), language-support (Non-JS/TS language extractors), roadmap (Tracked on the public README roadmap)

Projects

No projects

Relationships

None yet

Development

No branches or pull requests
