A personal knowledge base about AI/ML topics, inspired by Andrej Karpathy's approach to LLM-assisted knowledge management.
The idea is to collect raw sources (articles, papers, etc.) and use LLMs - primarily Claude Code - to incrementally compile and maintain a structured wiki from them. The pattern:
- knowledge-base/raw/: source documents as ingested markdown
- knowledge-base/summaries/: LLM-generated summaries of raw articles
- knowledge-base/wiki/: LLM-compiled wiki (concept articles, cross-links, and index files)
- knowledge-base/manual/: hand-written notes and cheat sheets that I had already been maintaining
The wiki is maintained by the LLM, not edited by hand. New raw sources get "compiled" into it incrementally — summaries written, concepts updated, backlinks added.
Articles and other sources are clipped from the web as markdown using MarkSnip, which also downloads all images locally alongside the document. After clipping, image references in the markdown are often broken due to URL encoding issues — fix_images.py fixes those paths so images render correctly in Obsidian.
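As an illustration of the kind of repair involved, here is a minimal sketch. It is not the actual fix_images.py; it assumes the breakage is over-encoding (for example `%2520` where `%20` was meant) and that the image file exists on disk under its decoded name. The function name `fix_image_refs` and the pass limit are hypothetical.

```python
import re
from pathlib import Path
from urllib.parse import quote, unquote

# Matches ![alt](target); targets containing whitespace or ")" are not handled.
IMAGE_REF = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)\)")
MAX_DECODE_PASSES = 3  # assumption: more layers of encoding than this are unlikely


def fix_image_refs(markdown: str, doc_dir: Path) -> str:
    """Rewrite local image targets that only resolve after URL-decoding."""

    def repair(match: re.Match) -> str:
        alt, target = match.group(1), match.group(2)
        # Leave remote images and references that already resolve untouched.
        if target.startswith(("http://", "https://")) or (doc_dir / target).exists():
            return match.group(0)
        decoded = target
        for _ in range(MAX_DECODE_PASSES):
            decoded = unquote(decoded)
            if (doc_dir / decoded).exists():
                # Re-encode exactly once so the link stays valid markdown.
                return f"![{alt}]({quote(decoded)})"
        return match.group(0)  # nothing matched on disk; keep the original

    return IMAGE_REF.sub(repair, markdown)
```

References that cannot be matched to a file are left unchanged, so running the sketch repeatedly on the same document is harmless.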
Obsidian is used as the reading and browsing frontend for both raw sources and the compiled wiki.
I've been collecting and hoarding information about topics I care about for a while. This repo is an attempt to make that habit more structured and maintainable. LLMs (especially Claude Code) handle the organization, synthesis, and upkeep — so I can focus on feeding in sources and asking questions.
Eventually: Q&A against the wiki, linting for inconsistencies, and generating outputs (slides, visualizations) — all viewable in Obsidian.
- Clip web articles as local markdown with images downloaded alongside
- Fix broken image references so the local markdown renders correctly
- Establish a repeatable LLM workflow to incrementally compile raw sources into the wiki (summaries, concept articles, backlinks)
- Auto-maintain an index and brief per-document summaries to support Q&A without needing retrieval infrastructure
- Test complex multi-document questions against the wiki once it reaches meaningful size
- Evaluate whether the LLM can navigate the wiki well via index files alone vs. needing a dedicated search tool
- Have query outputs rendered as markdown and filed back into the wiki so explorations accumulate
- Try other output formats (slides, visualizations) viewable directly in the knowledge base
- Run LLM health checks to surface inconsistent or conflicting information across articles
- Impute missing data in incomplete articles using web search
- Generate suggestions for new article candidates based on gaps and unexplored connections
- Build a lightweight search interface over the wiki, usable both directly and as a tool handed off to an LLM
A local semantic search app lives under tools/search/. It indexes the wiki, summaries, and manual notes using sentence embeddings, stores them in SQLite, and serves a web UI with deep links back into Obsidian.
1. Install dependencies (Python 3.10+ required)
pip install -r tools/search/requirements.txt
2. Build the index
python3 tools/search/index.py
Chunks all .md files in wiki/, summaries/, and manual/ by section, embeds them with all-MiniLM-L6-v2, and writes to tools/search/search.db. First run takes ~10s; subsequent runs only re-embed changed files.
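The section chunking and the "only re-embed changed files" behaviour could look roughly like the sketch below. It is an assumption about the approach, not the code in index.py: the `files` table, the SHA-256 content hash, and both function names are hypothetical. The embedding step is omitted so the sketch runs on the standard library alone.

```python
import hashlib
import re
import sqlite3

HEADING = re.compile(r"^#{1,6}\s+(.*)$")  # ATX headings: "# Title" to "###### Title"


def chunk_by_section(text: str) -> list[tuple[str, str]]:
    """Split markdown into (heading, body) chunks at each heading line."""
    chunks: list[tuple[str, str]] = []
    heading, lines = "", []
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            body = "\n".join(lines).strip()
            if body:
                chunks.append((heading, body))
            heading, lines = match.group(1).strip(), []
        else:
            lines.append(line)
    body = "\n".join(lines).strip()
    if body:
        chunks.append((heading, body))
    return chunks


def needs_reindex(db: sqlite3.Connection, path: str, text: str) -> bool:
    """Return True (and record the new hash) if the file changed since the last run."""
    db.execute("CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY, sha256 TEXT)")
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    row = db.execute("SELECT sha256 FROM files WHERE path = ?", (path,)).fetchone()
    if row and row[0] == digest:
        return False
    db.execute("INSERT OR REPLACE INTO files (path, sha256) VALUES (?, ?)", (path, digest))
    return True
```

One limitation of this sketch: a `#` comment line inside a fenced code block would be mistaken for a heading, so a real indexer would need to track fence state.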
To force a full rebuild:
python3 tools/search/index.py --rebuild
3. Start the server
python3 tools/search/server.py
Open http://127.0.0.1:8000. Results link directly into Obsidian via obsidian:// deep links, so make sure the vault is open in Obsidian first.
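For reference, a deep link of this kind can be built with Obsidian's `obsidian://open` URI, which takes URL-encoded `vault` and `file` parameters. The helper below is a hypothetical sketch rather than the server's own code, and the vault name in the usage line is a placeholder.

```python
from urllib.parse import quote


def obsidian_link(vault: str, note_path: str) -> str:
    """Build an obsidian://open URI for a note path relative to the vault root."""
    target = note_path.removesuffix(".md")  # the .md extension is optional in the URI
    # safe="" also encodes "/" as %2F, which the Obsidian URI format expects.
    return f"obsidian://open?vault={quote(vault, safe='')}&file={quote(target, safe='')}"


print(obsidian_link("My Vault", "wiki/training/Fine-tuning.md"))
# obsidian://open?vault=My%20Vault&file=wiki%2Ftraining%2FFine-tuning
```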
- search.db is gitignored; regenerate it locally after cloning
- The model (~80MB) downloads automatically on first run and is cached in ~/.cache/huggingface/
- To re-index on server startup:
  SEARCH_REINDEX_ON_STARTUP=1 python3 tools/search/server.py
- Explore synthetic data generation + fine-tuning so the LLM "knows" the knowledge base in its weights rather than just its context
Claude Code skills live under .claude/skills/ and automate recurring workflows. Invoke them by typing /skill-name <argument> in a Claude Code session.
The /process-article skill ingests a raw source file (markdown article, PDF, etc.) into the knowledge base end-to-end:
- Fix images: repairs broken URL-encoded image paths in clipped markdown
- Read: reads the full source (chunked if long)
- Summarise: writes a structured summary to knowledge-base/summaries/<Title>.md with sections, tables, formulas, and wikilinks
- Identify wiki pages: finds which existing wiki pages the article's concepts belong to
- Update wiki pages: adds new content and backlinks to affected pages
- New wiki page: creates a new page if the article introduces a concept cluster with no existing home
- Update wiki index: adds the new page to knowledge-base/wiki/index.md
- Update Q&A index: adds/updates a brief entry in knowledge-base/qa-index.md, a compact master index loadable in a single LLM context for Q&A without retrieval infrastructure
- Report: lists all files created or modified
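The "adds/updates a brief entry" step amounts to an upsert on a flat index file. The sketch below shows one way that could work, assuming one bullet per document in the form `- [[Title]]: blurb`. The real layout of qa-index.md is not specified here, so both the entry format and the function name are hypothetical.

```python
def upsert_index_entry(index_text: str, title: str, blurb: str) -> str:
    """Replace the entry for `title` if present, otherwise append a new one."""
    prefix = f"- [[{title}]]:"
    entry = f"{prefix} {blurb}"
    lines = index_text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith(prefix):
            lines[i] = entry  # update in place so the ordering stays stable
            break
    else:
        lines.append(entry)  # no existing entry: add one at the end
    return "\n".join(lines) + "\n"
```

Keeping each entry to a single line is what lets the whole index fit into one LLM context as the knowledge base grows.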
Example:
/process-article knowledge-base/raw/articles/My Article.md
The /health-check skill reads wiki pages and summaries (full wiki, or scoped to a topic), then produces a structured report at knowledge-base/health-checks/YYYY-MM-DD.md covering:
- Contradictions — directly opposing claims about the same concept or result
- Inconsistent framing — same concept described with conflicting terminology across pages
- Superseded claims — older wiki content not updated to reflect newer sources
- Missing cross-links — related pages that don't reference each other
- Gaps — concepts mentioned across multiple pages but lacking a dedicated entry
Every finding includes the pages involved and a suggested fix.
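The health check itself is LLM-driven, but one of the categories above, missing cross-links, has a simple mechanical approximation: find pages that link to another page without being linked back. The sketch below assumes Obsidian-style `[[wikilinks]]` and takes pages as an in-memory mapping of title to text. The function names are hypothetical.

```python
import re

# Captures the page name from [[Page]], [[Page|alias]], and [[Page#Heading]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:[#|][^\]]*)?\]\]")


def outgoing_links(text: str) -> set[str]:
    """Return the set of page names a document links to."""
    return {match.group(1).strip() for match in WIKILINK.finditer(text)}


def one_way_links(pages: dict[str, str]) -> list[tuple[str, str]]:
    """Return (source, target) pairs where source links to target but not vice versa."""
    known = set(pages)
    # Ignore links to pages that do not exist (those are gaps, not missing backlinks).
    links = {title: outgoing_links(text) & known for title, text in pages.items()}
    return sorted(
        (src, dst)
        for src, targets in links.items()
        for dst in targets
        if dst != src and src not in links[dst]
    )
```

A one-way link is only a candidate finding: whether the target page should actually reference the source is still a judgment call, which is where the LLM pass comes in.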
Examples:
/health-check
/health-check training/fine-tuning
/health-check continual learning