Engineering Codebase Intelligence Systems for Rapid FDE Onboarding.
The Brownfield Cartographer is a multi-agent system that ingests any GitHub repository or local path and produces a living, queryable knowledge graph of the system's architecture, data flows, and semantic structure.
- Structural Analysis (Surveyor Agent): Uses `tree-sitter` for language-agnostic AST parsing. Builds module import graphs, identifies architectural hubs (PageRank), and detects circular dependencies.
- Data Lineage (Hydrologist Agent): Specialized for data engineering. Analyzes data flows across Python (pandas, PySpark), SQL (`sqlglot`), and configuration boundaries.
- Semantic Analysis (Semanticist Agent): Uses Gemini LLM to generate business-oriented purpose statements for every module.
- Semantic Search (Semantic Index): Vector-indexed knowledge base powered by Qdrant and Gemini embeddings.
- Living Context (Archivist Agent): Produces `CODEBASE.md` and `onboarding_brief.md` for instant architectural awareness.
- uv for dependency management.
- Google Gemini API Key (`GEMINI_API_KEY`).
- Qdrant Cluster (Endpoint and API Key).
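
The credentials above come down to three settings. A small preflight check can confirm they are present before a run. This is a sketch, not part of the project: it assumes the variable names from this project's `.env` file, treats the `your_...` template values as unset, and lets real environment variables take precedence over the file. `parse_env_text` and `missing_settings` are hypothetical helpers.

```python
import os

REQUIRED_VARS = ("GEMINI_API_KEY", "QDRANT_API_KEY", "QDRANT_CLUSTER_ENDPOINT")


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_settings(env_text: str = "", environ=None) -> list[str]:
    """Return the required variables that are unset or still placeholders."""
    environ = os.environ if environ is None else environ
    # Assumption: process environment overrides values from the file.
    merged = {**parse_env_text(env_text), **environ}
    return [
        name for name in REQUIRED_VARS
        if not merged.get(name) or merged[name].startswith("your_")
    ]


example_env = "GEMINI_API_KEY=abc123\n# Qdrant\nQDRANT_API_KEY=your_qdrant_key\n"
still_missing = missing_settings(example_env, environ={})
```

Here `still_missing` lists `QDRANT_API_KEY` (still the template value) and `QDRANT_CLUSTER_ENDPOINT` (absent).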
git clone <this-repo-url>
cd "The Brownfield Cartographer"
uv sync

Create a .env file in the root directory:
GEMINI_API_KEY=your_gemini_key
QDRANT_API_KEY=your_qdrant_key
QDRANT_CLUSTER_ENDPOINT=your_qdrant_endpoint

Analyze a GitHub repository or local path:
# Analyze a GitHub repo
uv run main.py https://github.com/meltano/meltano  # example
# Analyze a local path
uv run main.py /path/to/your/project

Results are stored in .cartography/<project_name>/.
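
After a run, a short script can check what landed in the output directory. The sketch below makes two assumptions that are not confirmed here: that `CODEBASE.md` and `onboarding_brief.md` are written directly under `.cartography/<project_name>/`, and that the project name for the example repository is `meltano`. `summarize_results` is a hypothetical helper, and the demo builds a throwaway directory instead of relying on a real run.

```python
import tempfile
from pathlib import Path

# Assumed artifact names, taken from the Archivist description.
EXPECTED = ("CODEBASE.md", "onboarding_brief.md")


def summarize_results(project_name: str, root: str = ".") -> dict:
    """Report which expected artifacts exist under .cartography/<project_name>/."""
    out_dir = Path(root) / ".cartography" / project_name
    found = [name for name in EXPECTED if (out_dir / name).is_file()]
    missing = [name for name in EXPECTED if name not in found]
    return {"exists": out_dir.is_dir(), "found": found, "missing": missing}


# Demo against a temporary layout that mimics a partially finished run.
with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp) / ".cartography" / "meltano"
    out_dir.mkdir(parents=True)
    (out_dir / "CODEBASE.md").write_text("# Codebase overview\n")
    summary = summarize_results("meltano", root=tmp)
    absent = summarize_results("no-such-project", root=tmp)
```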
- Surveyor: Static structure analyst.
- Hydrologist: Data flow & lineage analyst.
- Semanticist: LLM-powered purpose analyst.
- Archivist: Living context maintainer.
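
As an illustration of the kind of lineage the Hydrologist extracts, the sketch below pulls file sources and sinks out of pandas-style code. It is a simplified stand-in, not the agent itself: it uses only the standard-library `ast` module, handles only string-literal paths passed as the first argument to a few reader and writer methods, and does not cover PySpark or SQL. `extract_lineage` is a hypothetical name.

```python
import ast

# Method names treated as dataset reads and writes (a small assumed subset).
READERS = {"read_csv", "read_parquet"}
WRITERS = {"to_csv", "to_parquet"}


def extract_lineage(code: str) -> tuple[list[str], list[str]]:
    """Return (sources, sinks) found in calls such as pd.read_csv("x.csv")."""
    sources: list[str] = []
    sinks: list[str] = []
    for node in ast.walk(ast.parse(code)):
        # Only method-style calls with at least one positional argument.
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)):
            continue
        if not node.args:
            continue
        first = node.args[0]
        # Paths built at runtime cannot be resolved statically, so skip them.
        if not (isinstance(first, ast.Constant) and isinstance(first.value, str)):
            continue
        if node.func.attr in READERS:
            sources.append(first.value)
        elif node.func.attr in WRITERS:
            sinks.append(first.value)
    return sorted(sources), sorted(sinks)


pipeline = '''
import pandas as pd
orders = pd.read_csv("raw/orders.csv")
users = pd.read_parquet("raw/users.parquet")
orders.merge(users, on="user_id").to_csv("out/report.csv")
'''
sources, sinks = extract_lineage(pipeline)
```

For this pipeline, `sources` holds the two raw inputs and `sinks` holds `out/report.csv`, which gives the edges of a small lineage graph.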
MIT License - see LICENSE for details.