Generate DeepWiki-style onboarding documentation for any codebase — source-linked, diagram-rich, and structured for both humans and LLMs to navigate.
Validated against axios, flask, and express with eval scores between 91.7 and 95.3 / 100 across three independent LLM harnesses (Claude, Gemini CLI, OpenAI Codex CLI); the Claude baseline scores 95.3.
The skill produces a hierarchical wiki of Markdown pages, each containing:
- TL;DR — 2–3 sentences summarising the subsystem
- Architecture diagram — Mermaid flowchart or sequence diagram
- Relevant Source Files table — every claim links back to real file paths
- Key Concepts table — structured reference for onboarding engineers
- Prose with inline citations — path/to/file.ext:L45–L87 on every claim
- [NEEDS INVESTIGATION] markers — honest flags on anything unverified
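The citation format above is regular enough to check mechanically. The following is a minimal sketch, not code from this repository: the regular expression, the `find_citations` helper, and the sample paths are all assumptions made for illustration, and the real scorer may recognise citations differently.

```python
import re

# Assumed citation shape: <path>:L<start>, optionally followed by a dash and L<end>.
# Both the en dash and the ASCII hyphen are accepted as range separators.
CITATION = re.compile(
    r"(?P<path>[\w./-]+\.\w+):L(?P<start>\d+)(?:[–-]L?(?P<end>\d+))?"
)


def find_citations(text):
    """Return (path, start_line, end_line) for every citation found in text."""
    found = []
    for match in CITATION.finditer(text):
        start = int(match.group("start"))
        # A single-line citation has no end; treat it as a one-line range.
        end = int(match.group("end")) if match.group("end") else start
        found.append((match.group("path"), start, end))
    return found


# Hypothetical sentence from a generated page.
sample = "Requests are dispatched in src/client.py:L45–L87 and merged in src/config.py:L12."
citations = find_citations(sample)
```

A check like this could also confirm that each cited path exists and that the line range falls inside the file, which is what "links back to real file paths" implies.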
wiki/
├── 00-index.md Table of contents + navigation guide
├── 01-overview.md High-level architecture + Mermaid diagram
├── 02-request-lifecycle.md
├── 03-middleware.md
└── ...
axios — Promise-based HTTP client
- 00 · Index
- 01 · Overview & Architecture
- 02 · HTTP Client Core
- 03 · Request Pipeline
- 04 · Interceptors
- 05 · Adapters
- 06 · Config Merging
- 07 · Error Handling
- 08 · Build & Development
- 09 · Testing
flask — Python web framework
- 00 · Index
- 01 · Overview & Architecture
- 02 · Application Core
- 03 · Request / Response Cycle
- 04 · Routing System
- 05 · Blueprints
- 06 · Context Management
- 07 · Globals & Proxies
- 08 · Templating
- 09 · Sessions & Cookies
- 10 · Build & Deployment
express — Node.js web framework
- 00 · Index
- 01 · Overview & Architecture
- 02 · Application Core
- 03 · Routing System
- 04 · Middleware Pipeline
- 05 · Request & Response
- 06 · View Engine
- 07 · Static Middleware
- 08 · Build & Testing
codebase-onboarding-skill/
├── SKILL.md ← Skill invocation & workflow instructions
├── scripts/
│ ├── analyze.py ← Codebase reconnaissance (AST, PageRank, git)
│ ├── eval.py ← Wiki quality scorer (6 dimensions, max 100)
│ └── requirements.txt ← Optional Python deps
├── references/
│ ├── page-template.md ← Required page structure
│ ├── diagram-patterns.md ← Mermaid diagram templates by scenario
│ └── language-guides.md ← Language/framework-specific analysis guidance
├── agents/
│ └── openai.yaml ← Harness metadata for implicit invocation
└── evals/
    ├── run.sh ← Unified eval runner (Claude + Gemini + Codex)
    ├── score.py ← Multi-harness comparison scorer
    ├── parse_gemini.py ← Splits Gemini's delimited output into .md files
    ├── README.md ← Eval setup & runbook
    └── results/ ← Committed benchmark artifacts
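The tree lists PageRank as one of the techniques behind scripts/analyze.py. As a rough illustration of that idea only, the sketch below ranks files in a small import graph by power iteration. It is not the script's actual implementation: the `pagerank` function, its parameters, and the file names are hypothetical.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Rank nodes of a directed graph given as {node: [nodes it imports]}."""
    nodes = set(graph) | {dst for targets in graph.values() for dst in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing edges is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not graph.get(node))
        new_rank = {}
        for node in nodes:
            # Each importer passes an equal share of its rank to every file it imports.
            incoming = sum(
                rank[src] / len(targets)
                for src, targets in graph.items()
                if node in targets
            )
            new_rank[node] = (1 - damping) / n + damping * (incoming + dangling / n)
        rank = new_rank
    return rank


# Hypothetical import graph: each file maps to the files it imports.
imports = {
    "app.py": ["router.py", "utils.py"],
    "router.py": ["utils.py"],
    "cli.py": ["app.py", "utils.py"],
    "utils.py": [],
}
ranks = pagerank(imports)
most_central = max(ranks, key=ranks.get)
```

Files that many others import end up with the highest rank, which makes them natural candidates for the first pages of an onboarding wiki.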
Install via the Vercel Labs Skills framework:
# Install to all agents
npx skills add eabait/codebase-onboarding-skill --all
# Or install to a specific agent
npx skills add eabait/codebase-onboarding-skill -a claude-code

Point your agent at a repository and ask:
"Onboard me to this codebase."
That's it. The skill instructs the agent to install dependencies, run the codebase analysis, and generate the full wiki automatically — no manual steps required.
The agent will:
- Install Python deps (scripts/requirements.txt) if needed
- Run scripts/analyze.py on the repository to produce a structured analysis
- Read SKILL.md and the reference docs
- Generate a complete wiki and write the pages to wiki/
If you're using an agent directly, pass SKILL.md and the reference docs in context
alongside the repo path. See evals/README.md for per-harness
examples with Claude Code, Gemini CLI, and Codex CLI.
Run the built-in eval harness to benchmark quality across models:
# All harnesses (Claude + Gemini + Codex), all 3 reference repos
bash evals/run.sh
# Single harness
bash evals/run.sh --harness gemini
bash evals/run.sh --harness codex axios

| Harness | Model | Score | Cit/page | vs Baseline |
|---|---|---|---|---|
| Claude | claude-sonnet-4-6 | 95.3 | 13.7 | — |
| Codex | gpt-5.2 | 92.2 | 39.6 | −3.1 |
| Gemini | gemini-2.5-pro | 91.7 | 6.3 | −3.6 |
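The "vs Baseline" column is each harness's score minus the Claude score. A small sketch that reproduces the column from the scores in the table; the variable names are illustrative and this is not taken from evals/score.py.

```python
# Scores copied from the table above.
scores = {"Claude": 95.3, "Codex": 92.2, "Gemini": 91.7}
baseline = scores["Claude"]

# Difference from the baseline, rounded to one decimal place as in the table.
deltas = {
    name: round(score - baseline, 1)
    for name, score in scores.items()
    if name != "Claude"
}
```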
Full results:
- Multi-harness comparison report — Claude vs Codex vs Gemini, all repos
- Claude baseline report — per-repo subscores and notes
- Benchmark data (JSON)
scripts/eval.py grades six dimensions (max 100 points):
| Dimension | Max | Measures |
|---|---|---|
| Structure | 25 | TL;DR, required headings, source-files table, index page |
| Citations | 30 | Citation count, format (file.py:L45), density per page |
| Diagrams | 15 | Mermaid blocks, multi-node diagrams |
| Tables | 10 | Key Concepts tables, component references |
| Completeness | 10 | Page count, word count, topic coverage |
| Transparency | 10 | [NEEDS INVESTIGATION] markers |
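To make the rubric concrete, here is a minimal sketch of how per-dimension points might be capped and summed, along with a few surface signals a scorer could count on a single page. Only the dimension maxima come from the table; `page_signals`, `total_score`, the heuristics, and the sample page are assumptions and do not describe how scripts/eval.py actually works.

```python
import re

# Dimension maxima copied from the rubric table; they sum to 100.
MAX_POINTS = {
    "Structure": 25,
    "Citations": 30,
    "Diagrams": 15,
    "Tables": 10,
    "Completeness": 10,
    "Transparency": 10,
}

FENCE = "`" * 3  # built at runtime so this example contains no nested fence


def page_signals(markdown):
    """Count a few surface signals on one wiki page (illustrative heuristics)."""
    mermaid = re.compile(r"^" + re.escape(FENCE) + r"mermaid\s*$", re.MULTILINE)
    return {
        "has_tldr": "TL;DR" in markdown,
        "mermaid_blocks": len(mermaid.findall(markdown)),
        "citations": len(re.findall(r"[\w./-]+\.\w+:L\d+", markdown)),
        "needs_investigation": markdown.count("[NEEDS INVESTIGATION]"),
    }


def total_score(points):
    """Sum per-dimension points, capping each dimension at its maximum."""
    return sum(min(points.get(dim, 0), cap) for dim, cap in MAX_POINTS.items())


# Hypothetical page content.
page = "\n".join([
    "TL;DR: The router maps URLs to handlers.",
    FENCE + "mermaid",
    "flowchart LR",
    "  A --> B",
    FENCE,
    "Dispatch happens in src/router.py:L10-L42.",
    "[NEEDS INVESTIGATION] Caching behaviour is unverified.",
])
signals = page_signals(page)
score = total_score({"Structure": 25, "Citations": 28, "Diagrams": 15,
                     "Tables": 10, "Completeness": 10, "Transparency": 12})
```

Capping each dimension keeps a page that is strong in one area, such as citation count, from compensating for a missing diagram or TL;DR.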
# Score any wiki output directory
python3 scripts/eval.py path/to/runs/ --output report.json

Pull requests welcome. When adding support for a new language or framework, update
references/language-guides.md. When adding a new eval harness, follow the pattern
in evals/run.sh and document it in evals/README.md.
MIT — see LICENSE.