Repo Understanding Engine

Understand any GitHub repository instantly with a single command.

Repo Understanding Engine is a Python-based CLI tool that scans, analyzes, and summarizes codebases. It combines static analysis (files, folders, languages, entry points, dependencies) with optional AI-powered summarization (via OpenRouter) to generate professional documentation-ready summaries.

The project is currently in an active development phase (v0.1.0), focusing on:

Robust repository scanning and metadata extraction
AI-enhanced, presentation-ready summaries
Cleaner storage, persistence, and CLI ergonomics

Overview

Repo Understanding Engine is a command-line tool that analyzes local and remote (GitHub) repositories to produce structured and human-readable summaries. It scans the repository file tree, detects languages and frameworks, identifies entry points, and aggregates dependency information. On top of that, it can leverage OpenRouter to generate professional, documentation-ready narratives.

The CLI exposes multiple commands to:

Scan and list files, extensions, and folders
Produce structured JSON summaries for downstream tooling
Generate concise or enhanced markdown summaries suitable for READMEs and technical docs

Key Languages and Development

This project itself is primarily a Python CLI application with configuration and documentation files.

Python (.py)
- Core language for the CLI, scanning logic, AI integration, and output generation.
- Used across modules in src/repo_understanding_engine (core, ai, cli, output, models).
TOML (.toml)
- Project configuration and dependency management via pyproject.toml.
- Defines the repo-understand console script, dependencies, and build system settings.
Markdown (.md)
- Human-readable documentation (this README.md).
- Generated summaries are also written as markdown files into repos/.
Environment Files (.env, .env.example)
- Used to configure environment variables such as OPENROUTER_API_KEY.
- Loaded via python-dotenv at runtime.

No frontend stack or database schema lives in this repo; instead, this tool is designed to analyze other repositories that may contain JS/TS, SQL, YAML, Docker, etc.

Top-Level Folders

The repository is organized as follows:

src/
- Root for the Python package repo_understanding_engine.
- Contains all application logic (CLI, scanning, AI integration, output formatting, models).
src/repo_understanding_engine/ai/
- AI integration layer.
- Includes the OpenRouter HTTP client, prompt templates, and AI summarizer utilities.
src/repo_understanding_engine/cli/
- CLI entrypoint code based on Typer.
- Declares the scan, analyze, summarize, and enhanced commands.
src/repo_understanding_engine/core/
- Core engine modules:
  - Repository fetching (local path vs GitHub URL clone)
  - Filesystem scanning and metadata extraction
  - Language, folder, dependency, and entry-point detection
src/repo_understanding_engine/output/
- Output rendering and persistence:
  - Raw JSON summary writing
  - Human-readable and enhanced markdown summary generation
  - File naming and timestamping for summaries
src/repo_understanding_engine/models/
- Placeholder for data models (e.g., for potential Pydantic-based schemas).
tests/
- Python test package for validating core scanning behavior.
- Currently minimal, ready to be extended as the engine evolves.
repos/
- Runtime directory created by the tool.
- Stores cloned repositories when --keep is used, along with generated summaries and raw structured outputs (e.g., repos/raw/*.json).

Main Entry Points

CLI Entry Point

Console script: repo-understand
- Defined in pyproject.toml under [tool.poetry.scripts].
- Points to repo_understanding_engine.cli.app:main.
Python entry function: repo_understanding_engine.cli.app.main
- Wires up Typer’s app and exposes CLI commands.

Key CLI Commands (in `src/repo_understanding_engine/cli/app.py`)

scan
- Input: GitHub repo URL or local path.
- Behavior: Clones/fetches the repo, scans files and folders, and prints counts by extension and folder list.
analyze
- Input: GitHub repo URL or local path.
- Behavior: Uses the scanner to produce a basic structured summary (path, languages, frameworks, main files, folders) and prints it to the console.
summarize
- Input: GitHub repo URL or local path.
- Options:
  - --ai openrouter – enable AI-powered human-readable summary.
  - --keep – keep cloned repositories under ./repos/ instead of using a temporary directory.
- Behavior:
  - Fetches/clones the repo.
  - Scans structure into a structured summary.
  - Optionally calls OpenRouter to generate a human-friendly narrative summary.
  - Saves both raw JSON (repos/raw/*.json) and markdown summary (repos/*.md).
enhanced
- Input: GitHub repo URL or local path.
- Options:
  - --ai openrouter – let AI generate a full professional README-style summary.
  - --keep – keep cloned repository under ./repos/.
- Behavior:
  - Fetches/clones the repo.
  - Runs an advanced scanner that extracts:
    - Languages with purposes and file counts
    - Top-level folders with inferred roles
    - Entry points (server, app, index, Docker, etc.)
    - Dependencies (Python/Node) and frameworks.
  - If AI is enabled, prompts OpenRouter to generate a complete markdown summary following a strict structure (overview, languages, folders, entry points, dependencies, notes for the phase).
  - Saves the enhanced summary to repos/<repo>_summary_<timestamp>.md.

Dependencies

Core runtime dependencies are defined in pyproject.toml:

Typer
- Framework for building modern CLI applications.
- Powers argument parsing, command registration, and help text.
Rich
- Provides rich terminal output (colors, tables, formatting).
- Can be used to improve CLI UX (e.g., future progress bars, formatted summaries).
GitPython
- Used to clone and interact with Git repositories from Python.
- Powers handling of remote GitHub URLs vs local paths.
Pydantic (v2)
- Planned/available for defining typed models and validation.
- Suitable for structured summary schemas or configuration models.
httpx
- Modern HTTP client used for communicating with the OpenRouter API.
- Handles request/response lifecycle, timeouts, and error states.
python-dotenv
- Loads environment variables from .env.
- Used to configure OPENROUTER_API_KEY and other secrets.

Development and build tooling:

Poetry / poetry-core
- Dependency management and packaging.
- Defines the build backend and entry points.

External service integration:

OpenRouter (via ai/openrouter_client.py)
- Provides access to hosted LLMs (e.g., meta-llama/llama-3.2-3b-instruct).
- Used to generate advanced human-readable summaries and technical documentation.

Setup & Installation

Prerequisites

Python 3.11.14 or later (per pyproject.toml).
Poetry installed for dependency management.

Clone the Repository

git clone <YOUR-FORK-OR-ORIGIN-URL> repo-understanding-engine
cd repo-understanding-engine

Install Dependencies

poetry install

Configure Environment (for AI features)

Create a .env file (or copy from .env.example) and set:

OPENROUTER_API_KEY=sk-or-v1-...   # your OpenRouter API key

AI features are optional. If OPENROUTER_API_KEY is not set, the tool will gracefully fall back to non-AI summaries.

Usage

All commands are exposed through the repo-understand console script.

Basic Scan

poetry run repo-understand scan https://github.com/owner/repo

Outputs:

Files grouped by extension (with counts)
List of folders in the repository

Basic Analysis

poetry run repo-understand analyze /path/to/local/repo

Outputs a simple structured summary:

Repository path
Languages and frameworks
Main files
Folders

Summarize (Structured + Optional AI)

# Without AI (structured only)
poetry run repo-understand summarize https://github.com/owner/repo

# With AI (OpenRouter-based human-readable summary)
poetry run repo-understand summarize https://github.com/owner/repo --ai openrouter

# Keep cloned repository under ./repos/
poetry run repo-understand summarize https://github.com/owner/repo --ai openrouter --keep

Results:

Raw JSON summary file in repos/raw/<repo>.json.
Markdown summary file in repos/<repo>_summary_<timestamp>.md.

Enhanced Summary (README-Style)

# Rule-based enhanced summary
poetry run repo-understand enhanced https://github.com/owner/repo

# AI-generated full professional summary
poetry run repo-understand enhanced https://github.com/owner/repo --ai openrouter

The enhanced command is designed to produce documentation-quality output that can be used directly in READMEs, technical docs, or slide decks.

Contribution

Contributions are welcome. To propose changes or new features:

Fork the repository.
Create a feature branch:
```
git checkout -b feature/my-improvement
```
Add tests under tests/ when introducing new behavior.
Run the CLI against a few sample repositories to validate behavior.
Open a Pull Request, describing:
- What you changed
- Why it is useful
- How to reproduce or test it

You can also use GitHub Issues to report bugs, request features, or discuss improvements to the scanning, AI prompts, or CLI ergonomics.

Notes for the New Phase

The current development phase introduced several significant improvements:

Enhanced Scanner
- Language analysis now accounts for file counts and inferred purposes.
- Top-level folder purposes are detected using name patterns (e.g., backend, frontend, ai).
- Entry points are detected for common file names (app.py, server.js, Dockerfile, etc.).
- Dependencies are extracted from pyproject.toml, requirements*.txt, and nested package.json files for target repositories.
AI-Driven Summaries
- Integration with OpenRouter via a dedicated client (ai/openrouter_client.py) with robust error handling and logging.
- New enhanced command that can generate full professional summaries (README-style) for analyzed repositories using GPT-5.1-compatible models via OpenRouter.
Improved Storage & Cleanup
- All summaries and raw structured outputs are now stored under repos/.
- Optionally persistent cloned repositories (--keep) to avoid re-cloning for repeated analysis.
- Temporary directories are automatically cleaned up when not needed.
CLI Experience
- Clearer commands (scan, analyze, summarize, enhanced) with helpful options and messages.
- Better separation of concerns across core, ai, cli, and output modules.

Future phases can extend this engine with:

Richer framework and architecture detection (e.g., microservices maps, monorepo layout).
Deeper test coverage and CI integration.
Optional HTML or PDF summary exports.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Repo Understanding Engine

Table of Contents

Overview

Key Languages and Development

Top-Level Folders

Main Entry Points

CLI Entry Point

Key CLI Commands (in `src/repo_understanding_engine/cli/app.py`)

Dependencies

Setup & Installation

Prerequisites

Clone the Repository

Install Dependencies

Configure Environment (for AI features)

Usage

Basic Scan

Basic Analysis

Summarize (Structured + Optional AI)

Enhanced Summary (README-Style)

Contribution

Notes for the New Phase

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
src/repo_understanding_engine		src/repo_understanding_engine
tests		tests
.env.example		.env.example
.gitignore		.gitignore
README.md		README.md
poetry.lock		poetry.lock
pyproject.toml		pyproject.toml

Folders and files

Latest commit

History

Repository files navigation

Repo Understanding Engine

Table of Contents

Overview

Key Languages and Development

Top-Level Folders

Main Entry Points

CLI Entry Point

Key CLI Commands (in src/repo_understanding_engine/cli/app.py)

Dependencies

Setup & Installation

Prerequisites

Clone the Repository

Install Dependencies

Configure Environment (for AI features)

Usage

Basic Scan

Basic Analysis

Summarize (Structured + Optional AI)

Enhanced Summary (README-Style)

Contribution

Notes for the New Phase

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Key CLI Commands (in `src/repo_understanding_engine/cli/app.py`)

Packages