structured documentation infrastructure for humans and agents.
docforge crawls, renders, versions, and caches live documentation — turning messy, hard-to-scrape docs into clean, searchable artifacts that both humans and agents can reason over.
agents get a stable http interface.
humans get a readable, inspectable UI.
docs stop being scraped repeatedly and start being infrastructure.
live documentation is one of the worst inputs for agents:
- js-heavy pages
- inconsistent structure
- high token cost to scrape repeatedly
- no versioning or freshness guarantees
docforge fixes this by:
- rendering docs once (properly)
- extracting structure, not just text
- storing versioned docsets with diffs
- exposing a deterministic api agents can trust
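
to make "extracting structure, not just text" concrete, here is a minimal sketch that splits already-rendered html into heading-delimited sections using only python's standard library. the names `SectionExtractor` and `extract_sections` are hypothetical and are not docforge's actual extractor; a real one would likely also handle code blocks, tables, and script/style content.

```python
from html.parser import HTMLParser


class SectionExtractor(HTMLParser):
    """Collect heading-delimited sections from rendered HTML (illustrative only)."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.sections = []        # dicts with keys: level, title, text
        self._in_heading = None   # tag name while inside a heading element
        self._buf = []            # text fragments collected since the last boundary

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self._flush_text()    # body text seen so far belongs to the previous section
            self._in_heading = tag

    def handle_endtag(self, tag):
        if tag == self._in_heading:
            title = " ".join("".join(self._buf).split())
            self.sections.append({"level": int(tag[1]), "title": title, "text": ""})
            self._in_heading = None
            self._buf = []

    def handle_data(self, data):
        self._buf.append(data)

    def _flush_text(self):
        # Normalize whitespace and attach the text to the most recent section.
        # Text that appears before the first heading is dropped in this sketch.
        text = " ".join("".join(self._buf).split())
        if text and self.sections:
            self.sections[-1]["text"] = text
        self._buf = []

    def close(self):
        super().close()
        self._flush_text()


def extract_sections(html: str) -> list:
    parser = SectionExtractor()
    parser.feed(html)
    parser.close()
    return parser.sections
```

each section keeps its heading level, so a consumer can rebuild the outline or fetch a single section instead of re-reading the whole page.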
docforge is:
- an agent-native api for documentation
- a docset store with versioning + freshness
- a human-readable ui to inspect what agents actually see
- infra, not a chatbot
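
a docset store with versioning and freshness could look roughly like the sketch below. it is an in-memory illustration under stated assumptions, not docforge's schema: `Docset`, `Version`, and `max_age_s` are hypothetical names, and using a sha-256 content hash as the version id is an assumption made here for simplicity.

```python
import difflib
import hashlib
import time
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Version:
    digest: str        # sha256 of the content; doubles as a version id (assumption)
    content: str
    fetched_at: float  # unix seconds


@dataclass
class Docset:
    versions: List[Version] = field(default_factory=list)
    last_checked: float = 0.0

    def ingest(self, content: str, now: Optional[float] = None) -> bool:
        """Record a crawl result; return True only if a new version was stored."""
        now = time.time() if now is None else now
        self.last_checked = now
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if self.versions and self.versions[-1].digest == digest:
            return False  # unchanged content: only the freshness timestamp moves
        self.versions.append(Version(digest, content, now))
        return True

    def is_fresh(self, max_age_s: float, now: Optional[float] = None) -> bool:
        """True if the docset was checked against the source within max_age_s."""
        now = time.time() if now is None else now
        return bool(self.versions) and (now - self.last_checked) <= max_age_s

    def diff(self, old: int, new: int) -> str:
        """Unified diff between two stored versions, addressed by index."""
        a = self.versions[old].content.splitlines(keepends=True)
        b = self.versions[new].content.splitlines(keepends=True)
        return "".join(difflib.unified_diff(a, b, f"v{old}", f"v{new}"))
```

separating "last checked" from "last changed" lets a re-crawl that finds identical content refresh the docset without creating a new version.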
the stack:
- next.js: human ui + api gateway (agent entrypoint)
- fastapi workers + playwright: render + crawl js-heavy documentation
- redis + bullmq: async ingestion and crawling jobs
- postgres: metadata, versions, chunks, search
- s3-compatible storage (r2 / minio): raw snapshots + extracted artifacts
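
one way to picture how these pieces hand off to each other is the sketch below. it uses in-memory stand-ins only: a `deque` in place of the redis-backed queue, a caller-supplied `render` function in place of a playwright worker, a dict in place of s3-compatible storage, and a list of dicts in place of postgres rows. `run_pipeline` and the row fields are hypothetical, and keying snapshots by content hash is an assumption, not a documented docforge behavior.

```python
import hashlib
from collections import deque
from typing import Callable, Dict, Iterable, List, Tuple


def run_pipeline(
    urls: Iterable[str],
    render: Callable[[str], str],
) -> Tuple[Dict[str, bytes], List[dict]]:
    jobs = deque(urls)            # stand-in for the async job queue
    blobs: Dict[str, bytes] = {}  # stand-in for raw snapshot storage
    rows: List[dict] = []         # stand-in for metadata rows

    while jobs:
        url = jobs.popleft()
        html = render(url)        # stand-in for rendering a js-heavy page
        data = html.encode("utf-8")
        key = hashlib.sha256(data).hexdigest()
        blobs[key] = data         # identical pages collapse to one stored snapshot
        rows.append({"url": url, "snapshot_key": key, "bytes": len(data)})

    return blobs, rows
```

the split mirrors the list above: large raw artifacts go to blob storage, while small queryable facts about them go to the metadata store.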