Description
Add support for querying the AST using graph query languages such as Cypher, SPARQL, or GraphQL.
This would allow developers to explore relationships between symbols (nodes) and their dependencies (edges), rather than extracting isolated symbols. Enable structural queries on code relationships, such as:
- Finding callers/callees of functions
- Traversing call chains
- Identifying unused code (nodes with no incoming edges)
- Building dependency graphs
- Cross-file call graph construction
This would treat the AST as a graph where functions are nodes and calls are edges, enabling richer static analysis.
Motivation
When analyzing codebases, it’s often more valuable to understand how functions depend on each other than to list them.
By exposing the AST as a graph and enabling graph‑query execution, users could perform powerful structural searches and build call graphs directly from their code.
Example
Given this code:

    def load_data():
        return fetch_from_db()

    def fetch_from_db():
        return parse_record("raw")

    def parse_record(x):
        return x.upper()

    def main():
        data = load_data()
        print(data)

The dependency chain is: main → load_data → fetch_from_db → parse_record. This structure is ideal for graph queries.
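As a rough illustration of how that chain could be derived, here is a minimal sketch using Python's standard `ast` module. The helper name `build_call_graph` is hypothetical and not part of this project. It assumes only direct calls by bare name between top-level functions matter, so method calls such as `x.upper()` and builtins such as `print()` are ignored.

```python
import ast

SOURCE = '''
def load_data():
    return fetch_from_db()

def fetch_from_db():
    return parse_record("raw")

def parse_record(x):
    return x.upper()

def main():
    data = load_data()
    print(data)
'''


def build_call_graph(source: str) -> dict[str, set[str]]:
    """Map each top-level function to the top-level functions it calls."""
    tree = ast.parse(source)
    functions = [n for n in tree.body if isinstance(n, ast.FunctionDef)]
    defined = {f.name for f in functions}
    graph: dict[str, set[str]] = {f.name: set() for f in functions}
    for func in functions:
        for node in ast.walk(func):
            # Resolve only calls written as a bare name, e.g. foo().
            # Attribute calls (x.upper()) and undefined names (print) are skipped.
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                if node.func.id in defined:
                    graph[func.name].add(node.func.id)
    return graph


call_graph = build_call_graph(SOURCE)
for caller, callees in call_graph.items():
    print(caller, "->", sorted(callees))
```

A real implementation would also need to resolve imports, methods, and aliases across files. This sketch only shows that the node/edge view falls out of the AST directly.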
Graph Query Examples
When a user needs to find all callers of parse_record:
Cypher Examples

    MATCH (caller)-[:CALLS]->(callee {name: "parse_record"})
    RETURN caller;

SPARQL Examples

    PREFIX code: <http://example.com/code#>
    SELECT ?caller
    WHERE {
      ?caller code:calls ?callee .
      ?callee code:name "parse_record" .
    }

GraphQL API Examples

    query {
      functions(where: { calls: { name: "parse_record" } }) {
        name
      }
    }

Use Cases
| Query | Purpose |
|---|---|
| Which functions does load_data call? | Inspect outgoing edges (dependencies). |
| Which functions call parse_record? | Reverse edges to find dependents. |
| Show the full call chain starting from main() | Traverse the graph (DFS/BFS). |
| Which functions are leaf nodes? | Identify functions with no outgoing calls. |
| Which functions are unused? | Identify nodes with no incoming edges. |
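To make the table concrete, the sketch below shows what each query amounts to on a plain in-memory adjacency map. The `CALLS` dictionary is hand-written for the example code, and every function name here is hypothetical. This is not a proposed API, only an illustration of the underlying graph operations.

```python
from collections import deque

# Hypothetical call graph for the example: caller -> set of callees.
CALLS = {
    "main": {"load_data"},
    "load_data": {"fetch_from_db"},
    "fetch_from_db": {"parse_record"},
    "parse_record": set(),
}


def callees(graph, name):
    """Outgoing edges: functions that `name` calls."""
    return set(graph.get(name, ()))


def callers(graph, name):
    """Incoming edges: functions that call `name`."""
    return {f for f, targets in graph.items() if name in targets}


def call_chain(graph, start):
    """Functions reachable from `start`, in breadth-first order."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        current = queue.popleft()
        order.append(current)
        for nxt in sorted(graph.get(current, ())):
            if nxt not in seen:  # guards against cycles (recursion)
                seen.add(nxt)
                queue.append(nxt)
    return order


def leaf_nodes(graph):
    """Functions with no outgoing calls."""
    return {f for f, targets in graph.items() if not targets}


def unused(graph, entry_points=()):
    """Functions with no incoming calls, excluding known entry points."""
    called = set().union(*graph.values())
    return set(graph) - called - set(entry_points)


print(callees(CALLS, "load_data"))
print(callers(CALLS, "parse_record"))
print(call_chain(CALLS, "main"))
print(leaf_nodes(CALLS))
print(unused(CALLS, entry_points={"main"}))
```

Note that `main` itself has no incoming edges, so an "unused" query probably needs a way to exclude entry points. The `entry_points` parameter is one assumed way to handle that.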
Why This Matters
- Enables richer static analysis
- Supports knowledge‑graph workflows
- Helps with refactoring, dead‑code detection, and architecture mapping
- Aligns with modern developer expectations around structural search
Architectural comments
- Cypher may offer the simplest query syntax, but I don’t have a strong preference. GraphQL, despite being API‑oriented, can still serve well for graph‑style lookups. I’d prefer to leave the final choice of query language to the development team.
- The graph model should be as granular as possible, capturing precise relationships across classes, methods, libraries, constants, and other symbol types.
- Querying should operate on an in‑memory AST graph with no external indexing or synchronization steps. Import/export‑based indexing introduces lag and complexity, which would significantly reduce developer adoption.
- A practical starting point would be supporting widely used languages such as Python, TypeScript, and Rust.
- Query capabilities should be accessible both manually by users and programmatically by LLMs through the MCP query tool.
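As one possible reading of the "granular, in-memory" points above, here is a small sketch of a typed graph model. All class names, node kinds, edge kinds, and the sample symbols (`Loader`, `DEFAULT_TABLE`, the file names) are assumptions made for illustration. The actual schema would be up to the development team.

```python
from dataclasses import dataclass, field
from enum import Enum


class NodeKind(Enum):
    MODULE = "module"
    CLASS = "class"
    FUNCTION = "function"
    METHOD = "method"
    CONSTANT = "constant"


class EdgeKind(Enum):
    CALLS = "calls"
    DEFINES = "defines"
    IMPORTS = "imports"
    REFERENCES = "references"


@dataclass(frozen=True)
class Node:
    kind: NodeKind
    name: str
    file: str


@dataclass(frozen=True)
class Edge:
    kind: EdgeKind
    source: Node
    target: Node


@dataclass
class CodeGraph:
    """Held entirely in memory; no external index or sync step."""
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)

    def add_edge(self, kind, source, target):
        self.nodes.update((source, target))
        self.edges.add(Edge(kind, source, target))

    def neighbors(self, node, kind, reverse=False):
        """Nodes one `kind` edge away from `node` (incoming if reverse)."""
        if reverse:
            return {e.source for e in self.edges
                    if e.kind is kind and e.target == node}
        return {e.target for e in self.edges
                if e.kind is kind and e.source == node}


# Hypothetical symbols, not taken from any real codebase.
graph = CodeGraph()
loader = Node(NodeKind.CLASS, "Loader", "loader.py")
load = Node(NodeKind.METHOD, "Loader.load", "loader.py")
table = Node(NodeKind.CONSTANT, "DEFAULT_TABLE", "config.py")
graph.add_edge(EdgeKind.DEFINES, loader, load)
graph.add_edge(EdgeKind.REFERENCES, load, table)

users_of_table = graph.neighbors(table, EdgeKind.REFERENCES, reverse=True)
print(sorted(n.name for n in users_of_table))
```

Typing both nodes and edges keeps queries such as "which methods reference this constant?" as simple as the caller/callee lookups, whichever query language ends up being exposed on top.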