feat: wire up framework route extraction #89

Merged
colbymchenry merged 19 commits into colbymchenry:main from timomeara:feat/framework-extract-wiring on May 8, 2026

Conversation

@timomeara
Contributor

Problem

FrameworkResolver.extractNodes is declared in the type at src/resolution/types.ts but has zero callers across the entire src/ tree (confirmed via grep). Meanwhile every framework resolver (Django, Flask, FastAPI, Express, Laravel, Rails, Spring, Go, Rust, C#, Swift, React, Svelte) ships an extractNodes implementation that does real work and is then discarded.

As a result, the graph has zero route kind nodes in practice — checked on a real Django codebase: 23 urls.py files indexed, 0 route nodes produced, 0 edges from URL configs to view classes. codegraph_callers(MyView) silently misses its most important caller: the URL pattern that binds it.

Separately, the existing Django extractor's regex captures the view name in group 2 but the destructure discards it, so even if the hook were wired up it wouldn't link routes to views. Similar bugs exist in other frameworks.
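
The destructuring bug is easy to reproduce in isolation. The following is a reduced sketch, not the project's actual extractor: the regex and the `findRoutes` helper are simplified, hypothetical stand-ins that only illustrate how a captured group gets lost when the destructure stops one element short.

```typescript
interface RouteMatch {
  url: string;
  view: string;
}

// Hypothetical, simplified pattern: group 1 = URL, group 2 = view expression.
function findRoutes(source: string): RouteMatch[] {
  const routeRe = /\bpath\s*\(\s*['"]([^'"]*)['"]\s*,\s*([\w.]+)/g;
  const found: RouteMatch[] = [];
  let match: RegExpExecArray | null;
  while ((match = routeRe.exec(source)) !== null) {
    // Buggy shape: `const [, url] = match;` keeps the URL and silently
    // drops group 2, so no route -> view link can ever be emitted.
    const [, url = '', view = ''] = match;
    found.push({ url, view });
  }
  return found;
}

const routes = findRoutes(
  'urlpatterns = [path("users/", UserListView.as_view(), name="users")]',
);
console.log(routes);
```

With the second group kept, the view expression (`UserListView.as_view`) is available to turn into a handler reference.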

Fix

  • Replaces the dead extractNodes?(filePath, content): Node[] hook with extract?(filePath, content): { nodes, references }.
  • Runs extract() inside the extraction pipeline for every framework whose declared languages include the current file's language. The orchestrator detects frameworks once per index run via a filesystem-backed ResolutionContext and plumbs the names through the parse-worker boundary (strings, not function refs — structured clone can't serialize methods).
  • Updates all 13 existing framework resolvers to emit both route nodes AND handler references. The references flow through the existing resolution pipeline (name matching, import resolution, framework-specific resolve()) to produce route -> handler edges with kind references.

After this change, codegraph_callers(UserListView) on a Django project returns the URL pattern that binds it.
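
To make the new hook's shape concrete, here is a minimal sketch of a resolver that returns both route nodes and handler references. The type definitions are trimmed-down assumptions (the real Node and reference types in the repository carry more fields), and `flaskLike` with its regex is a hypothetical resolver written only for this illustration.

```typescript
// Assumed, reduced shapes; only the fields needed for the illustration.
interface RouteNode {
  id: string;
  kind: 'route';
  name: string;
  filePath: string;
  startLine: number;
}
interface UnresolvedRef {
  fromNodeId: string;
  referenceName: string;
  referenceKind: 'references';
  line: number;
}
interface ExtractResult {
  nodes: RouteNode[];
  references: UnresolvedRef[];
}
interface FrameworkResolver {
  name: string;
  languages: string[];
  extract?(filePath: string, content: string): ExtractResult;
}

// Hypothetical resolver: matches a `@x.route('/url')` decorator followed by `def handler`.
const flaskLike: FrameworkResolver = {
  name: 'flask-like',
  languages: ['python'],
  extract(filePath, content) {
    const result: ExtractResult = { nodes: [], references: [] };
    const re = /@\w+\.route\(\s*['"]([^'"]+)['"][^)]*\)\s*\n\s*def\s+(\w+)/g;
    let m: RegExpExecArray | null;
    while ((m = re.exec(content)) !== null) {
      const [, url = '', handler = ''] = m;
      const line = content.slice(0, m.index).split('\n').length;
      const id = `route:${filePath}:${line}:${url}`;
      result.nodes.push({ id, kind: 'route', name: url, filePath, startLine: line });
      // The reference is what lets the existing resolution pipeline build the edge.
      result.references.push({ fromNodeId: id, referenceName: handler, referenceKind: 'references', line });
    }
    return result;
  },
};

const out = flaskLike.extract!(
  'app.py',
  "@app.route('/users', methods=['GET'])\ndef list_users():\n    return []\n",
);
console.log(out.nodes, out.references);
```

The key difference from the old `Node[]` return value is the `references` array: without it, a route node has nothing connecting it to its handler.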

Frameworks covered

| Framework | Shapes recognized |
| --- | --- |
| Django | path(), re_path(), url(), include() in urls.py (CBV .as_view(), dotted module paths) |
| Flask | @app.route('/x', methods=[...]), blueprint routes |
| FastAPI | @app.get(...), @router.post(...), all standard methods |
| Express | app.get(...), router.post(...) with middleware chains (handler = last arg) |
| Laravel | Route::get(), Route::resource(), Controller@action, tuple syntax |
| Rails | get '/x', to: 'users#index', hash-rocket => |
| Spring | @GetMapping, @PostMapping, @RequestMapping on methods |
| Gin / chi / gorilla / mux | r.GET(...), router.HandleFunc(...) |
| Axum / actix / Rocket | .route("/x", get(handler)) |
| ASP.NET | [HttpGet("/x")] attributes |
| Vapor | app.get("x", use: handler) |
| React Router / SvelteKit | Route component nodes (interface migration only; handler refs are a follow-up) |

Tests

  • Unit tests per framework in __tests__/frameworks.test.ts (29 tests). For each framework, a representative route pattern is asserted to produce both the expected route node and a handler reference with the correct fromNodeId / referenceName / referenceKind.
  • End-to-end Django test in __tests__/frameworks-integration.test.ts — builds a real tmp Django project on disk (manage.py, requirements.txt, users/views.py with UserListView, users/urls.py with path("users/", UserListView.as_view(), ...)), runs full indexAll(), asserts the route node exists, the class node exists, and an edge between them with kind references.

Before this PR the integration test fails (0 route nodes). After, it passes. Full suite: 410 tests, 409 pass. The 1 pre-existing failure is FileWatcher > debounced sync > should trigger sync after file change — an fs.watch timing flake that reproduces on the base commit too and is unrelated to this work.

Architecture

The cleanest hook point turned out to be inside extractFromSource itself, because both the main-thread fallback path and the worker-thread parse path go through it. That way the worker doesn't need to know anything about framework objects, only a string[] of detected names.

indexAll()
  ├─ detectFrameworks() → string[]        (once per run, filesystem-backed context)
  └─ for each file: postMessage({ ..., frameworkNames })
       worker: extractFromSource(path, content, lang, frameworkNames)
         ├─ tree-sitter pass → {nodes, unresolvedReferences, errors}
         └─ for fw in getApplicableFrameworks(names, lang):
              fw.extract(path, content) → {nodes, references}
              merge into result

The references flow through the existing ReferenceResolver.resolveAll so they're linked by the same name-matching / import-resolution / framework resolve() machinery that handles every other kind of reference. That means Django's view-class-targeting logic in djangoResolver.resolve() is re-used automatically for route references — no new resolution path to maintain.
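
The dispatch step in the diagram above can be sketched roughly as follows. This is an illustration under stated assumptions, not the merged code: `REGISTRY`, the string-only `Extracted` shape, and `runFrameworkExtractors` are hypothetical, and only the `getApplicableFrameworks(names, lang)` call shape is taken from the diagram.

```typescript
interface Extracted {
  nodes: string[];
  references: string[];
}
interface Resolver {
  name: string;
  languages: string[];
  extract?(filePath: string, content: string): Extracted;
}

// Hypothetical registry. It lives on the worker side, so only framework
// names (plain strings) ever need to cross the postMessage boundary.
const REGISTRY: Resolver[] = [
  { name: 'django', languages: ['python'], extract: (p) => ({ nodes: [`route@${p}`], references: ['UserListView'] }) },
  { name: 'express', languages: ['javascript', 'typescript'], extract: () => ({ nodes: [], references: [] }) },
  { name: 'react', languages: ['javascript', 'typescript'] }, // no extract hook
];

// Keep only detected frameworks whose declared languages include the file's language.
function getApplicableFrameworks(names: string[], language: string): Resolver[] {
  return REGISTRY.filter((fw) => names.includes(fw.name) && fw.languages.includes(language));
}

// Merge framework output into the result of the tree-sitter pass.
function runFrameworkExtractors(
  filePath: string,
  content: string,
  language: string,
  names: string[],
  base: Extracted,
): Extracted {
  const merged: Extracted = { nodes: [...base.nodes], references: [...base.references] };
  for (const fw of getApplicableFrameworks(names, language)) {
    if (!fw.extract) continue;
    const extracted = fw.extract(filePath, content);
    merged.nodes.push(...extracted.nodes);
    merged.references.push(...extracted.references);
  }
  return merged;
}

const mergedResult = runFrameworkExtractors(
  'users/urls.py',
  '',
  'python',
  ['django', 'express'],
  { nodes: ['class:UserListView'], references: [] },
);
console.log(mergedResult);
```

Because the merged references are ordinary unresolved references, nothing downstream needs to know they came from a framework extractor.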

Scope notes

  • Regex-based extraction throughout. AST-based is a tracked follow-up (the plan doc explicitly scopes it out). Current regex handles the realistic shapes covered by the test suite; known edge cases (namespaced include(('api.urls', 'api')), comments containing fake path(...) calls, DRF router.register action expansion) are listed as follow-ups.
  • Node IDs embed line numbers (route:<file>:<line>:<url>). Matches existing framework precedent; an edit that adds a route at the top of a file will churn downstream IDs. Worth revisiting when incremental indexing lands.
  • React Router / SvelteKit only migrate to the new interface without emitting handler refs; the <Route element={<Page/>}/> -> Page wiring is a follow-up.
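
The ID churn mentioned in the node-ID note can be seen with a small sketch. The `routeIds` helper and its regex are hypothetical; only the `route:<file>:<line>:<url>` format comes from the description above.

```typescript
// Hypothetical helper: derive route IDs in the route:<file>:<line>:<url> format.
function routeIds(filePath: string, source: string): string[] {
  const ids: string[] = [];
  source.split('\n').forEach((text, index) => {
    const m = /path\(\s*['"]([^'"]+)['"]/.exec(text);
    if (m) ids.push(`route:${filePath}:${index + 1}:${m[1]}`);
  });
  return ids;
}

const beforeEdit = [
  'urlpatterns = [',
  '    path("users/", UserListView.as_view()),',
  ']',
].join('\n');

// Same file after a new route is added above the existing one.
const afterEdit = [
  'urlpatterns = [',
  '    path("health/", HealthView.as_view()),',
  '    path("users/", UserListView.as_view()),',
  ']',
].join('\n');

const idsBefore = routeIds('urls.py', beforeEdit);
const idsAfter = routeIds('urls.py', afterEdit);
console.log(idsBefore, idsAfter);
```

The `users/` route is unchanged in the source, yet its ID moves from line 2 to line 3, which is why stable IDs matter once indexing becomes incremental.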

Stats

| Category | Lines |
| --- | --- |
| Production code (src/) | +760 / -683 |
| Tests (__tests__/) | +370 |
| Docs (README + plan) | +1139 |

The bulk of the docs delta is docs/plans/2026-04-24-framework-resolver-extract.md — the implementation plan. Happy to drop that commit if you'd prefer the PR without the planning artifact.

Commit sequence

16 commits, with the framework changes split out so each can be reverted independently:

docs: add framework extract wiring plan
feat(resolution): replace extractNodes with extract() returning nodes and references
feat(resolution): add getApplicableFrameworks helper for per-language dispatch
feat(django): emit route nodes and route->view references in extract()
feat(flask,fastapi): emit route nodes and route->handler references
feat(express): emit route nodes and route->handler references
feat(laravel): emit route nodes and route->handler references
feat(rails): emit route nodes and route->handler references
feat(spring): emit route nodes and route->handler references
feat(go): emit route nodes and route->handler references
feat(rust): emit route nodes and route->handler references
feat(aspnet): emit route nodes and route->handler references
feat(swift,vapor): emit route nodes and route->handler references
chore(react,svelte): migrate resolvers to extract() interface
feat(extraction): run framework extractors after tree-sitter parse
docs: document framework route extraction

@andreinknv
Contributor

Hi @timomeara — solid PR. The bug verification is rock-solid (I confirmed the same extractNodes zero-callers result via grep) and the architectural moves are well-chosen: the worker-boundary string-passing, reusing ReferenceResolver.resolveAll for handler-ref resolution, per-framework commits.

Two concrete improvements that I think would land cleanly inside this PR. Both prevent latent bugs and one of them removes a merge conflict you'll hit with #101 when both merge.

1. Apply stripCommentsForRegex inside every framework's extract()

Why: Your regex-based extractors run against raw source. A commented-out or docstring example like:

# Example: path('/users/', UserListView.as_view())
# or
"""
Routing example:
    path('/admin/', AdminPanel.as_view())
"""

…will currently produce phantom route nodes. PR #101 fixed exactly this for the (now-dead) extractNodes path by adding stripCommentsForRegex(content, language) before regex matching — newlines preserved so line numbers stay correct.

That fix needs to be ported forward into your new extract() methods. Without it, the bug #101 fixed will silently be reintroduced as soon as your PR merges.

Bonus: doing this also makes #101 + this PR mergeable in either order. Without it, #101 and this PR conflict on every framework file.

Patch (Django, as a representative — same shape for Flask/FastAPI/Express/Laravel/etc.):

--- a/src/resolution/frameworks/python.ts
+++ b/src/resolution/frameworks/python.ts
+import { stripCommentsForRegex } from '../../utils';

 export const djangoResolver: FrameworkResolver = {
   // ...
   extract(filePath, content) {
     if (!filePath.endsWith('.py')) return { nodes: [], references: [] };

+    // Neutralize comments and docstrings so a `path('/x', view)` example
+    // inside a docstring or commented line isn't extracted as a real route.
+    // Newlines preserved so line numbers map back to the original source.
+    const safe = stripCommentsForRegex(content, 'python');
+
     const nodes: Node[] = [];
     const references: UnresolvedRef[] = [];
     const now = Date.now();
     const routeRegex = /\b(path|re_path|url)\s*\(\s*r?['"]([^'"]+)['"]\s*,\s*([\w.]+(?:\s*\([^)]*\))?)/g;

     let match: RegExpExecArray | null;
-    while ((match = routeRegex.exec(content)) !== null) {
+    while ((match = routeRegex.exec(safe)) !== null) {
       const [, _fn, urlPath, handlerExpr] = match;
-      const line = content.slice(0, match.index).split('\n').length;
+      const line = safe.slice(0, match.index).split('\n').length;
       // ... rest unchanged
     }
   },
 };

stripCommentsForRegex is exported from src/utils.ts — it handles per-language comment markers (Python # + triple-quoted docstrings, JS/TS // + /* */, Ruby # + =begin/=end, PHP // # /* */, etc.) and replaces comment characters with spaces so regex offsets stay valid.
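
For readers who have not seen the helper, the offset-preserving idea can be sketched as below. This is not the stripCommentsForRegex implementation from #101; `stripPythonComments` is a hypothetical, Python-only reduction that blanks `#` comments and triple-quoted strings while keeping every newline and the total length.

```typescript
// Hypothetical Python-only sketch of offset-preserving comment stripping.
function stripPythonComments(source: string): string {
  let out = '';
  let i = 0;
  while (i < source.length) {
    const three = source.slice(i, i + 3);
    if (three === '"""' || three === "'''") {
      // Triple-quoted block: blank everything except newlines.
      const close = source.indexOf(three, i + 3);
      const end = close === -1 ? source.length : close + 3;
      out += source.slice(i, end).replace(/[^\n]/g, ' ');
      i = end;
    } else if (source[i] === '"' || source[i] === "'") {
      // Ordinary string literal: copy verbatim so a '#' inside is not a comment.
      const quote = source[i];
      let j = i + 1;
      while (j < source.length && source[j] !== quote && source[j] !== '\n') {
        j += source[j] === '\\' ? 2 : 1;
      }
      const end = Math.min(j + 1, source.length);
      out += source.slice(i, end);
      i = end;
    } else if (source[i] === '#') {
      // Line comment: blank up to (not including) the newline.
      let end = source.indexOf('\n', i);
      if (end === -1) end = source.length;
      out += ' '.repeat(end - i);
      i = end;
    } else {
      out += source[i];
      i += 1;
    }
  }
  return out;
}

const sample = [
  "# path('/admin/', AdminPanel.as_view())",
  '"""',
  "    path('/users/', UserListView.as_view())",
  '"""',
  "urlpatterns = [path('/real/', RealView.as_view())]  # real",
].join('\n');

const safeSource = stripPythonComments(sample);
const realLine = safeSource.slice(0, safeSource.indexOf("path('/real/'")).split('\n').length;
console.log(realLine);
```

Replacing with spaces rather than deleting is the important design choice: a match index in the stripped text is also a valid index into the original, so line-number computation needs no adjustment.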

Files that need this in your PR: python.ts (Django/Flask/FastAPI all share the file), express.ts, laravel.ts, csharp.ts, rust.ts, go.ts, java.ts, ruby.ts, swift.ts. One import line + one stripCommentsForRegex call + one variable rename per file. ~10 minutes of mechanical work.

Regression test to add:

// __tests__/frameworks.test.ts > Django
it('does not extract commented-out routes', () => {
  const content = `
# urls.py example:
# path('/admin/', AdminPanel.as_view())
"""
Other routing example:
    path('/users/', UserListView.as_view())
"""
urlpatterns = [
    path('/real/', RealView.as_view()),  # this one is real
]
`;
  const result = djangoResolver.extract!('app/urls.py', content);
  const urls = result.nodes.map(n => n.name);
  expect(urls).toEqual(['/real/']);  // not '/admin/' or '/users/'
});

This test fails on the current PR; passes once stripCommentsForRegex is wired in.

2. Extract makeRouteNode() helper (kills ~150 LOC of duplication)

Why: every framework rebuilds the same Node shape. Centralizing construction means the Node shape evolves in one place if it ever changes (and it has; see PR #112 adding centrality).

New file src/resolution/frameworks/utils.ts:

import type { Node, Language } from '../../types';

/**
 * Build a `route` Node from a regex match. Used by every framework
 * extractor to keep route Node construction consistent and DRY.
 *
 * Note on IDs: route IDs embed the line number so two routes in the
 * same file with the same URL stay distinct. Adding a route at the
 * top of a file churns downstream IDs — non-blocking until incremental
 * indexing requires stable IDs across edits.
 */
export function makeRouteNode(args: {
  filePath: string;
  url: string;
  line: number;
  endColumn: number;
  language: Language;
}): Node {
  return {
    id: `route:${args.filePath}:${args.line}:${args.url}`,
    kind: 'route',
    name: args.url,
    qualifiedName: `${args.filePath}::route:${args.url}`,
    filePath: args.filePath,
    startLine: args.line,
    endLine: args.line,
    startColumn: 0,
    endColumn: args.endColumn,
    language: args.language,
    updatedAt: Date.now(),
  };
}

Per-framework usage becomes:

-      const routeNode: Node = {
-        id: `route:${filePath}:${line}:${urlPath}`,
-        kind: 'route',
-        name: urlPath!,
-        qualifiedName: `${filePath}::route:${urlPath}`,
-        filePath,
-        startLine: line,
-        endLine: line,
-        startColumn: 0,
-        endColumn: match[0].length,
-        language: 'python',
-        updatedAt: now,
-      };
-      nodes.push(routeNode);
+      const routeNode = makeRouteNode({
+        filePath, url: urlPath!, line,
+        endColumn: match[0].length, language: 'python',
+      });
+      nodes.push(routeNode);

13 frameworks × ~12 lines of boilerplate → 1 import + 1 call each. Net: ~150 LOC removed, every Node-shape change becomes one-file.

Optional follow-ups (NOT blockers for this PR)

These two are worth tracking but I wouldn't expand scope here:

(a) Move frameworks to a per-file registry — currently src/resolution/frameworks/index.ts is a list-of-imports + ALL_RESOLVERS array, which is the same shape that caused conflict bottlenecks for languages (#116) and MCP tools (#117). Same per-file registry pattern would make adding a framework a one-file addition. But that's a structural refactor that should be its own PR; doing it inside this one would distract from the (significant) bug fix.

(b) Surface framework-extractor failures in IndexResult — currently a thrown error becomes a per-file result.errors entry with severity 'warning'. Useful, but easy to miss. A summary like frameworkExtractorFailures: { django: 47, express: 0 } on IndexResult would let users detect "wired up but producing nothing." 5-line addition; can ship as a follow-up.
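
A rough sketch of what such a summary could look like follows. Everything here is hypothetical: the `FileError.framework` tag, the `FileResult` shape, and `summarizeFrameworkFailures` are assumptions made for illustration, and the real IndexResult and error types may differ.

```typescript
// Assumed shapes; the `framework` tag on an error is hypothetical.
interface FileError {
  message: string;
  severity: 'warning' | 'error';
  framework?: string;
}
interface FileResult {
  filePath: string;
  errors: FileError[];
}

// Count extractor failures per detected framework, reporting zeros as well
// so a framework with no failures is still visible in the summary.
function summarizeFrameworkFailures(results: FileResult[], frameworks: string[]): Record<string, number> {
  const summary: Record<string, number> = {};
  for (const fw of frameworks) summary[fw] = 0;
  for (const result of results) {
    for (const err of result.errors) {
      if (err.framework !== undefined && err.framework in summary) {
        summary[err.framework] = (summary[err.framework] ?? 0) + 1;
      }
    }
  }
  return summary;
}

const failureSummary = summarizeFrameworkFailures(
  [
    { filePath: 'a/urls.py', errors: [{ message: 'regex threw', severity: 'warning', framework: 'django' }] },
    {
      filePath: 'b/urls.py',
      errors: [
        { message: 'regex threw', severity: 'warning', framework: 'django' },
        { message: 'parse error', severity: 'error' }, // not a framework failure
      ],
    },
  ],
  ['django', 'express'],
);
console.log(failureSummary);
```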

Bottom line

Suggested addition order:

  1. Add makeRouteNode() helper file (one new file, ~25 lines)
  2. Migrate each framework's extract() to use it (purely mechanical replacement)
  3. Add stripCommentsForRegex import + call to each framework that scans source content (one line + one variable rename per file)
  4. Add the regression test above (one new it block per framework)

Items 1+2 are pure cleanup. Item 3 is the conflict fix with #101 and prevents a real bug class. Item 4 locks in the behavior so it can't regress.

Happy to send a follow-up patch directly if it'd help — let me know.

@timomeara
Contributor Author

Thanks @andreinknv — appreciate the careful read.

On comment stripping: agreed, this is a real bug class and worth fixing in this PR rather than relying on follow-up coordination. Since stripCommentsForRegex lives in #101 (also unmerged), I'll add a small inline comment-stripper to this PR so it can land independently. If #101 merges first with stripCommentsForRegex exported from src/utils.ts, we can dedupe in a follow-up — happy to defer to whichever helper @colbymchenry prefers as the canonical one. Either way, the comments-as-routes bug gets fixed and our PRs stop conflicting.

On makeRouteNode(): the duplication is real, but I'd rather keep this PR focused on the wiring fix and let @colbymchenry decide whether the dedup belongs here or as a follow-up. If he wants it in this PR, I'll add it — happy to do whichever lands cleaner.

On the optional follow-ups: agree both make sense as separate PRs (per-file registry, IndexResult failure summary). I'd rather not expand scope here.

Will push the comment-stripper + regression tests shortly.

… extractors

Replaces comment characters and string-literal contents with spaces (not
removal) so source offsets stay valid for downstream regex match index ->
line number conversion. Handles Python triple-quoted docstrings, Ruby
=begin/=end, Rust nested block comments, and the standard //, #, /* */
forms across the supported languages.

This is consumed by framework extract() methods in a follow-up commit so
that commented-out / docstring routing examples don't surface as phantom
route nodes in the graph.
…antom routes)

Pipes the per-language stripCommentsForRegex helper into every framework
extract() that scans raw source: django/flask/fastapi (python.ts),
express, laravel, rails, spring, go, rust, aspnet, vapor, plus
swiftui/uikit struct extraction in swift.ts.

Without this, examples like:

    # path('/admin/', AdminPanel.as_view())
    """ path('/users/', UserListView.as_view()) """
    urlpatterns = [path('/real/', RealView.as_view())]

produced 3 route nodes, 2 of them phantom. Now only the real one is extracted.

Each framework gets a regression test in __tests__/frameworks.test.ts
asserting that line-, block-, docstring- and (where relevant)
heredoc-style commented-out routes do not surface as nodes.

Conflict resolution after rebasing main onto this PR:

- src/extraction/tree-sitter.ts: main added VueExtractor (new file
  src/extraction/vue-extractor.ts via colbymchenry#66). The PR's restructured
  if/else chain in extractFromSource gets a new vue branch alongside
  svelte/liquid/dfm so the framework-extract pipeline runs uniformly
  for vue files too.
- src/resolution/frameworks/vue.ts: vue resolver still used the dead
  extractNodes(): Node[] interface that this PR replaced. Migrated to
  extract(): { nodes, references } matching the other 13 resolvers —
  Vue's nuxt route detection (pages/, server/api/, middleware/) keeps
  working, just emits no references (matches react.ts shape).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@colbymchenry
Owner

This is excellent work — thanks Tim. The bug analysis is genuine (extractNodes had zero callers, so every framework's route extraction was being silently discarded), the per-framework commits make this reviewable, and the integration test that builds a real Django project on disk to assert the route→view edge is exactly the right kind of verification.

Pushed a conflict-resolution commit to your branch:

  • The PR restructured extractFromSource from independent if/return branches into an if/else if/else chain so framework extraction can run after every language extractor. Main has since added Vue support (feat: add Vue support #66) — added a Vue branch to your chain.
  • src/resolution/frameworks/vue.ts (added by feat: add Vue support #66) was still using the dead extractNodes interface that this PR replaces. Migrated it to extract() so it builds.

Full test suite: 481/481 passing (423 → 481, +58 new from your tests). Merging.

@colbymchenry merged commit 7432781 into colbymchenry:main on May 8, 2026

3 participants