Understand any GitHub repository instantly with a single command.
Repo Understanding Engine is a Python-based CLI tool that scans, analyzes, and summarizes codebases. It combines static analysis (files, folders, languages, entry points, dependencies) with optional AI-powered summarization (via OpenRouter) to generate professional documentation-ready summaries.
The project is currently in an active development phase (v0.1.0), focusing on:
- Robust repository scanning and metadata extraction
- AI-enhanced, presentation-ready summaries
- Cleaner storage, persistence, and CLI ergonomics
- Overview
- Key Languages and Development
- Top-Level Folders
- Main Entry Points
- Dependencies
- Setup & Installation
- Usage
- Contribution
- Notes for the New Phase
Repo Understanding Engine is a command-line tool that analyzes local and remote (GitHub) repositories to produce structured and human-readable summaries. It scans the repository file tree, detects languages and frameworks, identifies entry points, and aggregates dependency information. On top of that, it can leverage OpenRouter to generate professional, documentation-ready narratives.
The CLI exposes multiple commands to:
- Scan and list files, extensions, and folders
- Produce structured JSON summaries for downstream tooling
- Generate concise or enhanced markdown summaries suitable for READMEs and technical docs
This project itself is primarily a Python CLI application with configuration and documentation files.
-
Python (
.py)- Core language for the CLI, scanning logic, AI integration, and output generation.
- Used across modules in
src/repo_understanding_engine(core, ai, cli, output, models).
-
TOML (
.toml)- Project configuration and dependency management via
pyproject.toml. - Defines the
repo-understandconsole script, dependencies, and build system settings.
- Project configuration and dependency management via
-
Markdown (
.md)- Human-readable documentation (this
README.md). - Generated summaries are also written as markdown files into
repos/.
- Human-readable documentation (this
-
Environment Files (
.env,.env.example)- Used to configure environment variables such as
OPENROUTER_API_KEY. - Loaded via
python-dotenvat runtime.
- Used to configure environment variables such as
No frontend stack or database schema lives in this repo; instead, this tool is designed to analyze other repositories that may contain JS/TS, SQL, YAML, Docker, etc.
The repository is organized as follows:
-
src/- Root for the Python package
repo_understanding_engine. - Contains all application logic (CLI, scanning, AI integration, output formatting, models).
- Root for the Python package
-
src/repo_understanding_engine/ai/- AI integration layer.
- Includes the OpenRouter HTTP client, prompt templates, and AI summarizer utilities.
-
src/repo_understanding_engine/cli/- CLI entrypoint code based on Typer.
- Declares the
scan,analyze,summarize, andenhancedcommands.
-
src/repo_understanding_engine/core/- Core engine modules:
- Repository fetching (local path vs GitHub URL clone)
- Filesystem scanning and metadata extraction
- Language, folder, dependency, and entry-point detection
- Core engine modules:
-
src/repo_understanding_engine/output/- Output rendering and persistence:
- Raw JSON summary writing
- Human-readable and enhanced markdown summary generation
- File naming and timestamping for summaries
- Output rendering and persistence:
-
src/repo_understanding_engine/models/- Placeholder for data models (e.g., for potential Pydantic-based schemas).
-
tests/- Python test package for validating core scanning behavior.
- Currently minimal, ready to be extended as the engine evolves.
-
repos/- Runtime directory created by the tool.
- Stores cloned repositories when
--keepis used, along with generated summaries and raw structured outputs (e.g.,repos/raw/*.json).
-
Console script:
repo-understand- Defined in
pyproject.tomlunder[tool.poetry.scripts]. - Points to
repo_understanding_engine.cli.app:main.
- Defined in
-
Python entry function:
repo_understanding_engine.cli.app.main- Wires up Typer’s
appand exposes CLI commands.
- Wires up Typer’s
-
scan- Input: GitHub repo URL or local path.
- Behavior: Clones/fetches the repo, scans files and folders, and prints counts by extension and folder list.
-
analyze- Input: GitHub repo URL or local path.
- Behavior: Uses the scanner to produce a basic structured summary (path, languages, frameworks, main files, folders) and prints it to the console.
-
summarize- Input: GitHub repo URL or local path.
- Options:
--ai openrouter– enable AI-powered human-readable summary.--keep– keep cloned repositories under./repos/instead of using a temporary directory.
- Behavior:
- Fetches/clones the repo.
- Scans structure into a structured summary.
- Optionally calls OpenRouter to generate a human-friendly narrative summary.
- Saves both raw JSON (
repos/raw/*.json) and markdown summary (repos/*.md).
-
enhanced- Input: GitHub repo URL or local path.
- Options:
--ai openrouter– let AI generate a full professional README-style summary.--keep– keep cloned repository under./repos/.
- Behavior:
- Fetches/clones the repo.
- Runs an advanced scanner that extracts:
- Languages with purposes and file counts
- Top-level folders with inferred roles
- Entry points (server, app, index, Docker, etc.)
- Dependencies (Python/Node) and frameworks.
- If AI is enabled, prompts OpenRouter to generate a complete markdown summary following a strict structure (overview, languages, folders, entry points, dependencies, notes for the phase).
- Saves the enhanced summary to
repos/<repo>_summary_<timestamp>.md.
Core runtime dependencies are defined in pyproject.toml:
-
Typer
- Framework for building modern CLI applications.
- Powers argument parsing, command registration, and help text.
-
Rich
- Provides rich terminal output (colors, tables, formatting).
- Can be used to improve CLI UX (e.g., future progress bars, formatted summaries).
-
GitPython
- Used to clone and interact with Git repositories from Python.
- Powers handling of remote GitHub URLs vs local paths.
-
Pydantic (v2)
- Planned/available for defining typed models and validation.
- Suitable for structured summary schemas or configuration models.
-
httpx
- Modern HTTP client used for communicating with the OpenRouter API.
- Handles request/response lifecycle, timeouts, and error states.
-
python-dotenv
- Loads environment variables from
.env. - Used to configure
OPENROUTER_API_KEYand other secrets.
- Loads environment variables from
Development and build tooling:
- Poetry / poetry-core
- Dependency management and packaging.
- Defines the build backend and entry points.
External service integration:
- OpenRouter (via
ai/openrouter_client.py)- Provides access to hosted LLMs (e.g.,
meta-llama/llama-3.2-3b-instruct). - Used to generate advanced human-readable summaries and technical documentation.
- Provides access to hosted LLMs (e.g.,
- Python 3.11.14 or later (per
pyproject.toml). - Poetry installed for dependency management.
git clone <YOUR-FORK-OR-ORIGIN-URL> repo-understanding-engine
cd repo-understanding-enginepoetry installCreate a .env file (or copy from .env.example) and set:
OPENROUTER_API_KEY=sk-or-v1-... # your OpenRouter API keyAI features are optional. If OPENROUTER_API_KEY is not set, the tool will gracefully fall back to
non-AI summaries.
All commands are exposed through the repo-understand console script.
poetry run repo-understand scan https://github.com/owner/repoOutputs:
- Files grouped by extension (with counts)
- List of folders in the repository
poetry run repo-understand analyze /path/to/local/repoOutputs a simple structured summary:
- Repository path
- Languages and frameworks
- Main files
- Folders
# Without AI (structured only)
poetry run repo-understand summarize https://github.com/owner/repo
# With AI (OpenRouter-based human-readable summary)
poetry run repo-understand summarize https://github.com/owner/repo --ai openrouter
# Keep cloned repository under ./repos/
poetry run repo-understand summarize https://github.com/owner/repo --ai openrouter --keepResults:
- Raw JSON summary file in
repos/raw/<repo>.json. - Markdown summary file in
repos/<repo>_summary_<timestamp>.md.
# Rule-based enhanced summary
poetry run repo-understand enhanced https://github.com/owner/repo
# AI-generated full professional summary
poetry run repo-understand enhanced https://github.com/owner/repo --ai openrouterThe enhanced command is designed to produce documentation-quality output that can be used directly
in READMEs, technical docs, or slide decks.
Contributions are welcome. To propose changes or new features:
- Fork the repository.
- Create a feature branch:
git checkout -b feature/my-improvement
- Add tests under
tests/when introducing new behavior. - Run the CLI against a few sample repositories to validate behavior.
- Open a Pull Request, describing:
- What you changed
- Why it is useful
- How to reproduce or test it
You can also use GitHub Issues to report bugs, request features, or discuss improvements to the scanning, AI prompts, or CLI ergonomics.
The current development phase introduced several significant improvements:
-
Enhanced Scanner
- Language analysis now accounts for file counts and inferred purposes.
- Top-level folder purposes are detected using name patterns (e.g.,
backend,frontend,ai). - Entry points are detected for common file names (
app.py,server.js,Dockerfile, etc.). - Dependencies are extracted from
pyproject.toml,requirements*.txt, and nestedpackage.jsonfiles for target repositories.
-
AI-Driven Summaries
- Integration with OpenRouter via a dedicated client (
ai/openrouter_client.py) with robust error handling and logging. - New
enhancedcommand that can generate full professional summaries (README-style) for analyzed repositories using GPT-5.1-compatible models via OpenRouter.
- Integration with OpenRouter via a dedicated client (
-
Improved Storage & Cleanup
- All summaries and raw structured outputs are now stored under
repos/. - Optionally persistent cloned repositories (
--keep) to avoid re-cloning for repeated analysis. - Temporary directories are automatically cleaned up when not needed.
- All summaries and raw structured outputs are now stored under
-
CLI Experience
- Clearer commands (
scan,analyze,summarize,enhanced) with helpful options and messages. - Better separation of concerns across
core,ai,cli, andoutputmodules.
- Clearer commands (
Future phases can extend this engine with:
- Richer framework and architecture detection (e.g., microservices maps, monorepo layout).
- Deeper test coverage and CI integration.
- Optional HTML or PDF summary exports.