michellepace/docs-for-ai

Curate Docs For AI (with Claude Code)

Curate and index documentation from any website into collections like tailwind/ or horses/, then /ask-docs [collection] [your question] for a grounded answer — cleaner than a web-fetch, more focussed than a web-search, and keeps AI context sharp.

Each collection is curated from source docs — fetched directly where possible (.md URLs, GitHub blobs, or markdown-allowlist.txt), and scraped via the FireCrawl Python SDK only as a last resort. Its INDEX.xml is a routing signal an LLM reader uses for targeted context retrieval.

Terminal showing three-step workflow: (1) Running /curate-doc biome command, (2) Curation success output showing curated documentation and generated INDEX.xml entry, (3) Use /ask-docs to query docs. Handwritten annotations highlight each step.

Three Steps: (1) run /curate-doc on a URL → (2) the doc is curated and indexed → (3) /ask-docs to query it


🚀 Setup

This sets up /ask-docs, /curate-doc and /recurate-docs to work anywhere, not just inside this repo.

# 1. Install UV
# 👉 https://docs.astral.sh/uv/getting-started/installation/

# 2. Clone repository
git clone https://github.com/michellepace/docs-for-ai.git
cd docs-for-ai

# 3. Get free FireCrawl API key
# Visit: https://www.firecrawl.dev/app/api-keys

# 4. Add to your shell profile
echo 'export API_KEY_MCP_FIRECRAWL=your-api-key-here' >> ~/.zshrc
source ~/.zshrc  # Use ~/.bashrc if that's your shell

# 5. Install dependencies and git hooks (commit/push)
uv sync && uv run pre-commit install

# 6. Symlink slash commands so they work anywhere
mkdir -p ~/.claude/commands
ln -s "$PWD" ~/.claude/docs-for-ai # anchor (run from repo root)
ln -s ~/.claude/docs-for-ai/.claude/commands/*.md ~/.claude/commands/

To always curate via direct fetch (not scraping), add a URL prefix (e.g. https://nextjs.org/docs/), not a full page URL, to markdown-allowlist.txt. GitHub URLs and any URL ending in .md are always fetched directly.
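A minimal sketch of why a prefix is more useful than a full URL: one prefix covers every page beneath it. The one-prefix-per-line file format and the helper names `load_prefixes` and `is_allowlisted` are assumptions for illustration, not the repo's actual code.

```python
def load_prefixes(text: str) -> list[str]:
    """Return the non-empty lines of an allowlist file (assumed: one prefix per line)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def is_allowlisted(url: str, prefixes: list[str]) -> bool:
    """True when the URL starts with any allowlisted prefix."""
    return any(url.startswith(prefix) for prefix in prefixes)


# Hypothetical allowlist content with a single prefix.
prefixes = load_prefixes("https://nextjs.org/docs/\n")

print(is_allowlisted("https://nextjs.org/docs/app/getting-started", prefixes))  # True
print(is_allowlisted("https://nextjs.org/blog/some-post", prefixes))  # False
```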

📖 Usage

| I want to… | Command |
| --- | --- |
| Ask a collection a question | /ask-docs <collection> <question> |
| Add or refresh a doc | /curate-doc <collection> <url> |
| Re-curate a whole collection | /recurate-docs <collection> |

Example:

# Ask a question — the everyday command
/ask-docs tailwind Is my project using utility classes correctly?

# Add a doc — a new URL starts a collection or extends an existing one
/curate-doc tailwind https://tailwindcss.com/docs/theme

# Refresh a doc — re-run the same URL to pull the latest content
/curate-doc tailwind https://tailwindcss.com/docs/theme

# Re-curate every doc in a collection at once
/recurate-docs tailwind

📦 Repo Collections

My curations — a starting point. Keep what's useful, delete the rest, re-curate anytime to refresh.

| Collection | Collection Index | Description | Curated | Source |
| --- | --- | --- | --- | --- |
| 📦 biome/ | 📄 INDEX.xml | Fast linter/formatter | 2025-11-04 | Official |
| 📦 claudecode/ | 📄 INDEX.xml | Anthropic Claude Code | 2026-02-05 | Official |
| 📦 claudeplat/ | 📄 INDEX.xml | Anthropic Claude Platform | 2026-01-07 | Official |
| 📦 clerk/ | 📄 INDEX.xml | Authentication | 2025-12-03 | Official |
| 📦 convex/ | 📄 INDEX.xml | Reactive database | 2026-01-07 | Official |
| 🪝 lefthook/ | 📄 INDEX.xml | Git hooks manager | 2025-11-24 | Official |
| 📦 marimo/ | 📄 INDEX.xml | Reactive Python notebooks | 2025-11-11 | Official |
| 📦 nextjs/ | 📄 INDEX.xml | React framework | 2025-12-02 | Official |
| 📦 playwright/ | 📄 INDEX.xml | Browser testing | 2025-11-07 | Official |
| 📦 shadcn/ | 📄 INDEX.xml | React UI components | 2025-12-16 | Official, Guide |
| 📦 shiny/ | 📄 INDEX.xml | Python web apps | 2025-11-02 | Official |
| 📦 tailwind/ | 📄 INDEX.xml | CSS framework | 2025-10-15 | Official |
| 📦 tailwindplus/ | 📄 INDEX.xml | Paid UI Components | 2025-11-16 | Official |
| 📦 uv/ | 📄 INDEX.xml | Python projects | 2026-05-30 | Official |
| 📦 vercel/ | 📄 INDEX.xml | Deployment platform | 2025-10-20 | Official |
| 📦 vitest/ | 📄 INDEX.xml | Testing framework | 2025-11-05 | Official |
| 📦 zustand/ | 📄 INDEX.xml | State management | 2026-01-03 | Official |

🏗️ How This Repo Works

Workflow: /curate-doc <collection> <url> runs a Python script that fetches the source URL → writes a .md file → adds a collections/<collection>/INDEX.xml entry with a PLACEHOLDER description → Claude Code fills in the description.
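The index step of this workflow might look roughly like the sketch below, which uses the field names from the INDEX.xml schema in this README. The function name `add_placeholder_entry`, the example title and filename, and the choice to drop an earlier entry for the same URL on refresh are all assumptions, not the repo's actual script.

```python
import xml.etree.ElementTree as ET
from datetime import date
from typing import Optional


def add_placeholder_entry(root: ET.Element, title: str, source_url: str,
                          local_file: str, today: Optional[date] = None) -> ET.Element:
    """Append a <source> entry whose description is filled in later."""
    # Assumption: refreshing a doc replaces the earlier entry for the same URL.
    for existing in root.findall("source"):
        if existing.findtext("source_url") == source_url:
            root.remove(existing)
    source = ET.SubElement(root, "source")
    fields = {
        "title": title,
        "description": "PLACEHOLDER",  # Claude Code replaces this afterwards
        "source_url": source_url,
        "local_file": local_file,
        "curated_at": (today or date.today()).isoformat(),
    }
    for tag, text in fields.items():
        ET.SubElement(source, tag).text = text
    return source


root = ET.Element("docs_index")
# Hypothetical title and filename, for illustration only.
add_placeholder_entry(root, "Theme variables",
                      "https://tailwindcss.com/docs/theme", "theme.md")
ET.indent(root)  # pretty-print; requires Python 3.9+
print(ET.tostring(root, encoding="unicode"))
```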

The /curate-doc command always regenerates the description, whereas /recurate-docs only regenerates descriptions for files with content changes.
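The difference between the two commands can be stated as a small rule. This is a sketch only: the function name `needs_new_description` and the plain text comparison are assumptions, and the real commands may detect content changes differently (for example via git).

```python
from typing import Optional


def needs_new_description(command: str, old_text: Optional[str], new_text: str) -> bool:
    """Decide whether a doc's INDEX.xml description should be regenerated."""
    if command == "curate-doc":
        return True  # always regenerates
    # recurate-docs: only when the doc is new or its content changed
    return old_text is None or old_text != new_text


print(needs_new_description("curate-doc", "same", "same"))     # True
print(needs_new_description("recurate-docs", "same", "same"))  # False
print(needs_new_description("recurate-docs", "old", "new"))    # True
```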

Source routing: Direct .md URLs, GitHub blobs (.md/.mdx/.qmd), and URLs matching a prefix in markdown-allowlist.txt are fetched as-is; FireCrawl scrapes everything else as a fallback.
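One way to picture the routing decision, including the markdown-allowlist.txt prefixes described under Setup. The function name `choose_fetch_method`, the returned labels, and the exact GitHub blob check are assumptions for illustration; the repo's script may route differently.

```python
from urllib.parse import urlparse

MARKDOWN_EXTENSIONS = (".md", ".mdx", ".qmd")


def choose_fetch_method(url: str, allowlist_prefixes: tuple[str, ...] = ()) -> str:
    """Return "direct" for sources fetched as-is, otherwise "firecrawl"."""
    parsed = urlparse(url)
    path = parsed.path.lower()
    if path.endswith(".md"):
        return "direct"  # direct .md URL
    if (parsed.netloc == "github.com" and "/blob/" in path
            and path.endswith(MARKDOWN_EXTENSIONS)):
        return "direct"  # GitHub blob in a markdown-like format
    if url.startswith(allowlist_prefixes):  # str.startswith accepts a tuple
        return "direct"  # covered by an allowlisted prefix
    return "firecrawl"  # last resort: scrape


# The GitHub URL below is hypothetical.
print(choose_fetch_method("https://github.com/example/repo/blob/main/docs/intro.mdx"))
print(choose_fetch_method("https://nextjs.org/docs/app", ("https://nextjs.org/docs/",)))
print(choose_fetch_method("https://tailwindcss.com/docs/theme"))
```

Under these assumptions the first two URLs route to a direct fetch and the third falls back to scraping.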

Curated Collection:

collections/
└── <collection>/       # e.g. biome/, clerk/, uv/
    ├── INDEX.xml       # Routing index for targeted retrieval
    ├── README.md
    └── *.md            # Curated doc files

INDEX.xml Schema:

<docs_index>
  <source>
    <title>[curated source document title]</title>
    <description>[20-30 word routing signal an LLM reader uses to pick this file]</description>
    <source_url>[document source url]</source_url>
    <local_file>[curated .md filename]</local_file>
    <curated_at>YYYY-MM-DD</curated_at>
  </source>
  <!-- One <source> entry per curated .md file -->
</docs_index>
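To show how a reader might consume this schema, here is a short sketch that parses an index and lists each file with its routing description. The sample entry (title, description, filename) is hypothetical, and `list_sources` is an illustrative helper, not part of the repo.

```python
import xml.etree.ElementTree as ET

# Hypothetical index content following the schema above.
SAMPLE_INDEX = """<docs_index>
  <source>
    <title>Theme variables</title>
    <description>Hypothetical description used only for this sketch.</description>
    <source_url>https://tailwindcss.com/docs/theme</source_url>
    <local_file>theme.md</local_file>
    <curated_at>2025-10-15</curated_at>
  </source>
</docs_index>"""


def list_sources(index_xml: str) -> list[dict[str, str]]:
    """Return one dict per <source> entry, keyed by child tag name."""
    root = ET.fromstring(index_xml)
    return [
        {child.tag: (child.text or "") for child in source}
        for source in root.findall("source")
    ]


for entry in list_sources(SAMPLE_INDEX):
    # A reader would scan the descriptions, then open only the matching file.
    print(f"{entry['local_file']}: {entry['description']}")
```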

📝 TODO

Regenerate "Descriptions":

  • uv run scripts/collection_status.py
  • Framing has shifted from "semantic search" to LLM routing; descriptions still need re-curating.
  • Possibly replace /recurate-docs with a sequential claude -p shell script that gets all the URLs → downloads or scrapes each → diffs the result (or checks the git % change) → updates the index only if needed. Replace the command? Possibly related to scripts/curate-batch.sh?

Sort out scripts/

  • delete things

Markdown allowlist
