docforge

structured documentation infrastructure for humans and agents.

docforge crawls, renders, versions, and caches live documentation — turning messy, hard-to-scrape docs into clean, searchable artifacts that both humans and agents can reason over.

agents get a stable http interface.
humans get a readable, inspectable UI.
docs stop being scraped repeatedly and start being infrastructure.

why this exists

live documentation is one of the worst inputs for agents:

js-heavy pages
inconsistent structure
high token cost to scrape repeatedly
no versioning or freshness guarantees

docforge fixes this by:

rendering docs once (properly)
extracting structure, not just text
storing versioned docsets with diffs
exposing a deterministic api agents can trust
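
as a rough illustration of "versioned docsets with diffs", here is a minimal sketch of committing a snapshot only when its content changed. the `DocVersion` and `LineDiff` shapes and the `diffLines` / `commitSnapshot` helpers are hypothetical names for this example, not docforge's actual schema or api, and the set-based diff is a simplification of whatever diffing the real store uses.

```typescript
// Hypothetical shapes: this README does not show docforge's real schema.
interface DocVersion {
  version: number;
  lines: string[];
}

interface LineDiff {
  added: string[];
  removed: string[];
}

// Set-based line diff: reports lines that appear in only one snapshot.
// A pure reordering of lines is treated as "no change".
function diffLines(prev: string[], next: string[]): LineDiff {
  const prevSet = new Set(prev);
  const nextSet = new Set(next);
  return {
    added: next.filter((line) => !prevSet.has(line)),
    removed: prev.filter((line) => !nextSet.has(line)),
  };
}

// Append a new version only when the diff against the latest one is non-empty.
function commitSnapshot(history: DocVersion[], lines: string[]): DocVersion[] {
  const latest = history[history.length - 1];
  if (latest) {
    const diff = diffLines(latest.lines, lines);
    if (diff.added.length === 0 && diff.removed.length === 0) {
      return history; // unchanged content: keep the existing version
    }
  }
  return [...history, { version: (latest?.version ?? 0) + 1, lines }];
}
```

re-crawling an unchanged page leaves the history untouched, so version numbers only move when the docs themselves do.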

what docforge is

an agent-native api for documentation
a docset store with versioning + freshness
a human-readable ui to inspect what agents actually see
infra, not a chatbot
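
to make "versioning + freshness" concrete, the sketch below shows one way a docset record could report how stale it is. `DocsetMeta`, `maxAgeMs`, and `describeFreshness` are assumptions made for this example. the README does not specify how docforge tracks freshness.

```typescript
// Hypothetical metadata record: field names are illustrative only.
interface DocsetMeta {
  id: string;
  version: number;
  fetchedAtMs: number; // when the snapshot was crawled (epoch milliseconds)
  maxAgeMs: number;    // how long a snapshot counts as fresh
}

interface FreshnessReport {
  id: string;
  version: number;
  ageMs: number;
  fresh: boolean;
}

// Pure function of (meta, now): the same inputs always give the same report,
// which keeps the result easy to test and cache.
function describeFreshness(meta: DocsetMeta, nowMs: number): FreshnessReport {
  const ageMs = Math.max(0, nowMs - meta.fetchedAtMs);
  return {
    id: meta.id,
    version: meta.version,
    ageMs,
    fresh: ageMs <= meta.maxAgeMs,
  };
}
```

passing the clock in as `nowMs` rather than reading it inside the function is a deliberate choice here: it keeps the check deterministic for callers and tests.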

architecture (high level)

next.js: human ui + api gateway (agent entrypoint)
fastapi workers + playwright: render + crawl js-heavy documentation
redis + bullmq: async ingestion and crawling jobs
postgres: metadata, versions, chunks, search
s3-compatible storage (r2 / minio): raw snapshots + extracted artifacts
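
the components above hand work to each other through jobs and stored objects. the sketch below shows one possible way to derive stable job ids and object-storage keys so that repeated crawls of the same page map to the same place. the `CrawlJob` shape, the key layout, and the helper names are all hypothetical. the README does not describe docforge's real queue payloads or bucket layout.

```typescript
// Hypothetical job payload for the ingestion queue.
interface CrawlJob {
  docsetId: string;
  version: number;
  url: string;
}

type ArtifactKind = "raw" | "extracted";

// Stable id: the same (docset, version, url) always yields the same string,
// so a queue can use it to skip duplicate work.
function jobId(job: CrawlJob): string {
  return `${job.docsetId}:v${job.version}:${job.url}`;
}

// Object-storage key for a snapshot or an extracted artifact.
// The url is percent-encoded so it is safe as a single path segment.
function artifactKey(job: CrawlJob, kind: ArtifactKind): string {
  const ext = kind === "raw" ? "html" : "json";
  const page = encodeURIComponent(job.url);
  return `docsets/${job.docsetId}/v${job.version}/${kind}/${page}.${ext}`;
}

// Drop jobs whose id has already been seen, keeping the first occurrence.
function dedupeJobs(jobs: CrawlJob[]): CrawlJob[] {
  const seen = new Set<string>();
  return jobs.filter((job) => {
    const id = jobId(job);
    if (seen.has(id)) return false;
    seen.add(id);
    return true;
  });
}
```

deriving keys from the job content alone means a retry overwrites the same object instead of creating a new one.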
