Skip to content

ASHEN-IX/RepoRadar

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Repo Understanding Engine

Understand any GitHub repository instantly with a single command.

Repo Understanding Engine is a Python-based CLI tool that scans, analyzes, and summarizes codebases. It combines static analysis (files, folders, languages, entry points, dependencies) with optional AI-powered summarization (via OpenRouter) to generate professional documentation-ready summaries.

The project is currently in an active development phase (v0.1.0), focusing on:

  • Robust repository scanning and metadata extraction
  • AI-enhanced, presentation-ready summaries
  • Cleaner storage, persistence, and CLI ergonomics

Table of Contents


Overview

Repo Understanding Engine is a command-line tool that analyzes local and remote (GitHub) repositories to produce structured and human-readable summaries. It scans the repository file tree, detects languages and frameworks, identifies entry points, and aggregates dependency information. On top of that, it can leverage OpenRouter to generate professional, documentation-ready narratives.

The CLI exposes multiple commands to:

  • Scan and list files, extensions, and folders
  • Produce structured JSON summaries for downstream tooling
  • Generate concise or enhanced markdown summaries suitable for READMEs and technical docs

Key Languages and Development

This project itself is primarily a Python CLI application with configuration and documentation files.

  • Python (.py)

    • Core language for the CLI, scanning logic, AI integration, and output generation.
    • Used across modules in src/repo_understanding_engine (core, ai, cli, output, models).
  • TOML (.toml)

    • Project configuration and dependency management via pyproject.toml.
    • Defines the repo-understand console script, dependencies, and build system settings.
  • Markdown (.md)

    • Human-readable documentation (this README.md).
    • Generated summaries are also written as markdown files into repos/.
  • Environment Files (.env, .env.example)

    • Used to configure environment variables such as OPENROUTER_API_KEY.
    • Loaded via python-dotenv at runtime.

No frontend stack or database schema lives in this repo; instead, this tool is designed to analyze other repositories that may contain JS/TS, SQL, YAML, Docker, etc.


Top-Level Folders

The repository is organized as follows:

  • src/

    • Root for the Python package repo_understanding_engine.
    • Contains all application logic (CLI, scanning, AI integration, output formatting, models).
  • src/repo_understanding_engine/ai/

    • AI integration layer.
    • Includes the OpenRouter HTTP client, prompt templates, and AI summarizer utilities.
  • src/repo_understanding_engine/cli/

    • CLI entrypoint code based on Typer.
    • Declares the scan, analyze, summarize, and enhanced commands.
  • src/repo_understanding_engine/core/

    • Core engine modules:
      • Repository fetching (local path vs GitHub URL clone)
      • Filesystem scanning and metadata extraction
      • Language, folder, dependency, and entry-point detection
  • src/repo_understanding_engine/output/

    • Output rendering and persistence:
      • Raw JSON summary writing
      • Human-readable and enhanced markdown summary generation
      • File naming and timestamping for summaries
  • src/repo_understanding_engine/models/

    • Placeholder for data models (e.g., for potential Pydantic-based schemas).
  • tests/

    • Python test package for validating core scanning behavior.
    • Currently minimal, ready to be extended as the engine evolves.
  • repos/

    • Runtime directory created by the tool.
    • Stores cloned repositories when --keep is used, along with generated summaries and raw structured outputs (e.g., repos/raw/*.json).

Main Entry Points

CLI Entry Point

  • Console script: repo-understand

    • Defined in pyproject.toml under [tool.poetry.scripts].
    • Points to repo_understanding_engine.cli.app:main.
  • Python entry function: repo_understanding_engine.cli.app.main

    • Wires up Typer’s app and exposes CLI commands.

Key CLI Commands (in src/repo_understanding_engine/cli/app.py)

  • scan

    • Input: GitHub repo URL or local path.
    • Behavior: Clones/fetches the repo, scans files and folders, and prints counts by extension and folder list.
  • analyze

    • Input: GitHub repo URL or local path.
    • Behavior: Uses the scanner to produce a basic structured summary (path, languages, frameworks, main files, folders) and prints it to the console.
  • summarize

    • Input: GitHub repo URL or local path.
    • Options:
      • --ai openrouter – enable AI-powered human-readable summary.
      • --keep – keep cloned repositories under ./repos/ instead of using a temporary directory.
    • Behavior:
      • Fetches/clones the repo.
      • Scans structure into a structured summary.
      • Optionally calls OpenRouter to generate a human-friendly narrative summary.
      • Saves both raw JSON (repos/raw/*.json) and markdown summary (repos/*.md).
  • enhanced

    • Input: GitHub repo URL or local path.
    • Options:
      • --ai openrouter – let AI generate a full professional README-style summary.
      • --keep – keep cloned repository under ./repos/.
    • Behavior:
      • Fetches/clones the repo.
      • Runs an advanced scanner that extracts:
        • Languages with purposes and file counts
        • Top-level folders with inferred roles
        • Entry points (server, app, index, Docker, etc.)
        • Dependencies (Python/Node) and frameworks.
      • If AI is enabled, prompts OpenRouter to generate a complete markdown summary following a strict structure (overview, languages, folders, entry points, dependencies, notes for the phase).
      • Saves the enhanced summary to repos/<repo>_summary_<timestamp>.md.

Dependencies

Core runtime dependencies are defined in pyproject.toml:

  • Typer

    • Framework for building modern CLI applications.
    • Powers argument parsing, command registration, and help text.
  • Rich

    • Provides rich terminal output (colors, tables, formatting).
    • Can be used to improve CLI UX (e.g., future progress bars, formatted summaries).
  • GitPython

    • Used to clone and interact with Git repositories from Python.
    • Powers handling of remote GitHub URLs vs local paths.
  • Pydantic (v2)

    • Planned/available for defining typed models and validation.
    • Suitable for structured summary schemas or configuration models.
  • httpx

    • Modern HTTP client used for communicating with the OpenRouter API.
    • Handles request/response lifecycle, timeouts, and error states.
  • python-dotenv

    • Loads environment variables from .env.
    • Used to configure OPENROUTER_API_KEY and other secrets.

Development and build tooling:

  • Poetry / poetry-core
    • Dependency management and packaging.
    • Defines the build backend and entry points.

External service integration:

  • OpenRouter (via ai/openrouter_client.py)
    • Provides access to hosted LLMs (e.g., meta-llama/llama-3.2-3b-instruct).
    • Used to generate advanced human-readable summaries and technical documentation.

Setup & Installation

Prerequisites

  • Python 3.11.14 or later (per pyproject.toml).
  • Poetry installed for dependency management.

Clone the Repository

git clone <YOUR-FORK-OR-ORIGIN-URL> repo-understanding-engine
cd repo-understanding-engine

Install Dependencies

poetry install

Configure Environment (for AI features)

Create a .env file (or copy from .env.example) and set:

OPENROUTER_API_KEY=sk-or-v1-...   # your OpenRouter API key

AI features are optional. If OPENROUTER_API_KEY is not set, the tool will gracefully fall back to non-AI summaries.


Usage

All commands are exposed through the repo-understand console script.

Basic Scan

poetry run repo-understand scan https://github.com/owner/repo

Outputs:

  • Files grouped by extension (with counts)
  • List of folders in the repository

Basic Analysis

poetry run repo-understand analyze /path/to/local/repo

Outputs a simple structured summary:

  • Repository path
  • Languages and frameworks
  • Main files
  • Folders

Summarize (Structured + Optional AI)

# Without AI (structured only)
poetry run repo-understand summarize https://github.com/owner/repo

# With AI (OpenRouter-based human-readable summary)
poetry run repo-understand summarize https://github.com/owner/repo --ai openrouter

# Keep cloned repository under ./repos/
poetry run repo-understand summarize https://github.com/owner/repo --ai openrouter --keep

Results:

  • Raw JSON summary file in repos/raw/<repo>.json.
  • Markdown summary file in repos/<repo>_summary_<timestamp>.md.

Enhanced Summary (README-Style)

# Rule-based enhanced summary
poetry run repo-understand enhanced https://github.com/owner/repo

# AI-generated full professional summary
poetry run repo-understand enhanced https://github.com/owner/repo --ai openrouter

The enhanced command is designed to produce documentation-quality output that can be used directly in READMEs, technical docs, or slide decks.


Contribution

Contributions are welcome. To propose changes or new features:

  1. Fork the repository.
  2. Create a feature branch:
    git checkout -b feature/my-improvement
  3. Add tests under tests/ when introducing new behavior.
  4. Run the CLI against a few sample repositories to validate behavior.
  5. Open a Pull Request, describing:
    • What you changed
    • Why it is useful
    • How to reproduce or test it

You can also use GitHub Issues to report bugs, request features, or discuss improvements to the scanning, AI prompts, or CLI ergonomics.


Notes for the New Phase

The current development phase introduced several significant improvements:

  • Enhanced Scanner

    • Language analysis now accounts for file counts and inferred purposes.
    • Top-level folder purposes are detected using name patterns (e.g., backend, frontend, ai).
    • Entry points are detected for common file names (app.py, server.js, Dockerfile, etc.).
    • Dependencies are extracted from pyproject.toml, requirements*.txt, and nested package.json files for target repositories.
  • AI-Driven Summaries

    • Integration with OpenRouter via a dedicated client (ai/openrouter_client.py) with robust error handling and logging.
    • New enhanced command that can generate full professional summaries (README-style) for analyzed repositories using GPT-5.1-compatible models via OpenRouter.
  • Improved Storage & Cleanup

    • All summaries and raw structured outputs are now stored under repos/.
    • Optionally persistent cloned repositories (--keep) to avoid re-cloning for repeated analysis.
    • Temporary directories are automatically cleaned up when not needed.
  • CLI Experience

    • Clearer commands (scan, analyze, summarize, enhanced) with helpful options and messages.
    • Better separation of concerns across core, ai, cli, and output modules.

Future phases can extend this engine with:

  • Richer framework and architecture detection (e.g., microservices maps, monorepo layout).
  • Deeper test coverage and CI integration.
  • Optional HTML or PDF summary exports.

About

RepoRadar instantly scans GitHub or local repos, revealing files, folders, languages, frameworks, and main entry points. AI-powered summaries explain the project in plain language, helping developers quickly understand any codebase.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages