Feature Request: Support Graph Query Languages for AST‑Based Code Analysis #37

@qdrddr

Description

Add support for querying the AST using graph query languages such as Cypher, SPARQL, or GraphQL.
This would allow developers to explore relationships between symbols (nodes) and their dependencies (edges), rather than extracting isolated symbols. It would enable structural queries on code relationships, such as:

  • Finding callers/callees of functions
  • Traversing call chains
  • Identifying unused code (nodes with no incoming edges)
  • Building dependency graphs
  • Cross-file call graph construction

This would treat the AST as a graph where functions are nodes and calls are edges, enabling richer static analysis.

Motivation

When analyzing codebases, it’s often more valuable to understand how functions depend on each other than to list them.
By exposing the AST as a graph and enabling graph‑query execution, users could perform powerful structural searches and build call graphs directly from their code.

Example

Given this code

def load_data():
    return fetch_from_db()

def fetch_from_db():
    return parse_record("raw")

def parse_record(x):
    return x.upper()

def main():
    data = load_data()
    print(data)

The dependency chain is: main → load_data → fetch_from_db → parse_record. This structure is ideal for graph queries.
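For illustration, the chain above can be recovered with Python's standard `ast` module. This is only a minimal sketch of the idea, not a proposal for how the project should implement it: `build_call_graph` and `SOURCE` are hypothetical names, and the sketch only follows direct calls by bare name to functions defined at module level, so `print(...)` and the method call `x.upper()` are ignored.

```python
import ast

# The example module from this issue, embedded as a string.
SOURCE = '''
def load_data():
    return fetch_from_db()

def fetch_from_db():
    return parse_record("raw")

def parse_record(x):
    return x.upper()

def main():
    data = load_data()
    print(data)
'''


def build_call_graph(source: str) -> dict[str, set[str]]:
    """Map each top-level function to the module-level functions it calls."""
    tree = ast.parse(source)
    defined = {n.name for n in tree.body if isinstance(n, ast.FunctionDef)}
    graph: dict[str, set[str]] = {name: set() for name in defined}
    for func in tree.body:
        if not isinstance(func, ast.FunctionDef):
            continue
        for node in ast.walk(func):
            # Keep only calls by bare name to functions defined in this module.
            if (
                isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in defined
            ):
                graph[func.name].add(node.func.id)
    return graph


call_graph = build_call_graph(SOURCE)
for caller in sorted(call_graph):
    print(caller, "->", sorted(call_graph[caller]))
```

A real implementation would also need to resolve methods, imports, and cross-file references, which this sketch deliberately leaves out.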

Graph Query Examples

Each of the following examples looks up all callers of parse_record:

Cypher Examples

MATCH (caller)-[:CALLS]->(callee {name: "parse_record"})
RETURN caller;

SPARQL Examples

PREFIX code: <http://example.com/code#>

SELECT ?caller
WHERE {
  ?caller code:calls ?callee .
  ?callee code:name "parse_record" .
}

GraphQL API Examples

query {
  functions(where: { calls: { name: "parse_record" } }) {
    name
  }
}

Use Cases

| Query | Purpose |
| --- | --- |
| Which functions does load_data call? | Inspect outgoing edges (dependencies). |
| Which functions call parse_record? | Reverse edges to find dependents. |
| Show the full call chain starting from main() | Traverse the graph (DFS/BFS). |
| Which functions are leaf nodes? | Identify functions with no outgoing calls. |
| Which functions are unused? | Identify nodes with no incoming edges. |
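To make these use cases concrete, here is a rough Python sketch of what each query would compute on the example above. The adjacency map `CALLS` is written out by hand, and all helper names (`callees`, `callers`, `call_chain`, `leaves`, `uncalled`) are hypothetical and are not part of any existing API.

```python
# Call graph for the example in this issue: caller -> set of callees.
CALLS = {
    "main": {"load_data"},
    "load_data": {"fetch_from_db"},
    "fetch_from_db": {"parse_record"},
    "parse_record": set(),
}


def callees(graph, name):
    """Outgoing edges: functions that `name` calls."""
    return sorted(graph.get(name, set()))


def callers(graph, name):
    """Reverse edges: functions that call `name`."""
    return sorted(f for f, targets in graph.items() if name in targets)


def call_chain(graph, start):
    """Depth-first traversal from `start`, returning functions in visit order."""
    seen, order, stack = set(), [], [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # already visited; also guards against recursion cycles
        seen.add(node)
        order.append(node)
        # Push in reverse-sorted order so the smallest name is visited first.
        stack.extend(sorted(graph.get(node, set()), reverse=True))
    return order


def leaves(graph):
    """Functions with no outgoing calls."""
    return sorted(f for f, targets in graph.items() if not targets)


def uncalled(graph):
    """Functions with no incoming edges."""
    called = set().union(*graph.values())
    return sorted(f for f in graph if f not in called)


print(callees(CALLS, "load_data"))     # ['fetch_from_db']
print(callers(CALLS, "parse_record"))  # ['fetch_from_db']
print(call_chain(CALLS, "main"))       # ['main', 'load_data', 'fetch_from_db', 'parse_record']
print(leaves(CALLS))                   # ['parse_record']
print(uncalled(CALLS))                 # ['main']
```

Note that `uncalled` reports `main` here, so a real "unused code" query would probably need a way to exclude known entry points.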

Why This Matters

  • Enables richer static analysis
  • Supports knowledge‑graph workflows
  • Helps with refactoring, dead‑code detection, and architecture mapping
  • Aligns with modern developer expectations around structural search

Architectural comments

  1. Cypher may offer the simplest query syntax, but I don’t have a strong preference. GraphQL, despite being API‑oriented, can still serve well for graph‑style lookups. I’d prefer to leave the final choice of query language to the development team.
  2. The graph model should be as granular as possible, capturing precise relationships across classes, methods, libraries, constants, and other symbol types.
  3. Querying should operate on an in‑memory AST graph with no external indexing or synchronization steps. Import/export‑based indexing introduces lag and complexity, which would significantly reduce developer adoption.
  4. A practical starting point would be supporting widely used languages such as Python, TypeScript, and Rust.
  5. Query capabilities should be accessible both manually by users and programmatically by LLMs through the MCP query tool.
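As a rough illustration of points 2 and 3, an in-memory graph with typed nodes and edges could look something like the sketch below. Everything here is an assumption made for discussion: the `Node`, `Edge`, and `CodeGraph` classes, the `match` method, and the kind labels such as "function" and "CALLS" are hypothetical, and the development team may well prefer a different model.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Node:
    name: str
    kind: str  # hypothetical labels, e.g. "function", "class", "constant"


@dataclass(frozen=True)
class Edge:
    source: Node
    target: Node
    kind: str  # hypothetical labels, e.g. "CALLS", "IMPORTS", "DEFINES"


@dataclass
class CodeGraph:
    """Edges held in memory only; no external index or sync step."""

    edges: list[Edge] = field(default_factory=list)

    def add(self, source: Node, kind: str, target: Node) -> None:
        self.edges.append(Edge(source, target, kind))

    def match(
        self,
        edge_kind: str,
        source_name: Optional[str] = None,
        target_name: Optional[str] = None,
    ) -> list[Edge]:
        """Return edges of `edge_kind`, optionally filtered by endpoint names."""
        return [
            e
            for e in self.edges
            if e.kind == edge_kind
            and (source_name is None or e.source.name == source_name)
            and (target_name is None or e.target.name == target_name)
        ]


graph = CodeGraph()
load = Node("load_data", "function")
fetch = Node("fetch_from_db", "function")
parse = Node("parse_record", "function")
graph.add(load, "CALLS", fetch)
graph.add(fetch, "CALLS", parse)

# Roughly what the Cypher example above asks for: callers of parse_record.
parse_record_callers = [
    e.source.name for e in graph.match("CALLS", target_name="parse_record")
]
print(parse_record_callers)  # ['fetch_from_db']
```

A linear scan over edges is enough for a sketch; a real implementation would likely index edges by kind and endpoint, and a query language front end (Cypher, SPARQL, or GraphQL) would translate its patterns into lookups like `match`.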
