ai-ml-knowledge-base

A personal knowledge base about AI/ML topics, inspired by Andrej Karpathy's approach to LLM-assisted knowledge management.

Concept

The idea is to collect raw sources (articles, papers, etc.) and use LLMs, primarily Claude Code, to incrementally compile and maintain a structured wiki from them. The pattern:

knowledge-base/raw/: source documents as ingested markdown
knowledge-base/summaries/: LLM-generated summaries of raw articles
knowledge-base/wiki/: LLM-compiled wiki with concept articles, cross-links, and index files
knowledge-base/manual/: hand-written notes and cheat sheets that predate this repo

The wiki is maintained by the LLM, not edited by hand. New raw sources get "compiled" into it incrementally — summaries written, concepts updated, backlinks added.

Collection Workflow

Articles and other sources are clipped from the web as markdown using MarkSnip, which also downloads all images locally alongside the document. After clipping, image references in the markdown are often broken due to URL encoding issues — fix_images.py fixes those paths so images render correctly in Obsidian.
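To make the path repair more concrete, here is a minimal sketch of one way such a fix could work. It is not the contents of fix_images.py: the function name `fix_image_refs`, the regex, and the strategy (try the literal target, then one and two rounds of percent-decoding, and keep the first variant that exists on disk) are all assumptions for illustration.

```python
import re
from pathlib import Path
from urllib.parse import quote, unquote

# Matches markdown image syntax ![alt](target).
# Targets containing spaces or ")" are not handled by this sketch.
IMAGE_REF = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)\)")


def fix_image_refs(markdown: str, base_dir: Path) -> str:
    """Rewrite local image targets so they point at files that exist under base_dir."""

    def repair(match: re.Match) -> str:
        alt, target = match.group(1), match.group(2)
        if "://" in target:
            return match.group(0)  # remote image, leave untouched
        # Candidate on-disk names: the literal target, then one and two decoding rounds.
        candidates = [target, unquote(target), unquote(unquote(target))]
        for name in candidates:
            if (base_dir / name).is_file():
                # Encode exactly once so spaces stay valid inside a markdown link.
                return f"![{alt}]({quote(name)})"
        return match.group(0)  # nothing resolved, keep the original reference

    return IMAGE_REF.sub(repair, markdown)
```

A double-encoded reference such as `img/my%2520pic.png` would be rewritten to `img/my%20pic.png` when `img/my pic.png` exists next to the document; references that cannot be resolved are left unchanged.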

Obsidian is used as the reading and browsing frontend for both raw sources and the compiled wiki.

Goal

I've been collecting and hoarding information about topics I care about for a while. This repo is an attempt to make that habit more structured and maintainable. LLMs (especially Claude Code) handle the organization, synthesis, and upkeep — so I can focus on feeding in sources and asking questions.

Eventually: Q&A against the wiki, linting for inconsistencies, and generating outputs (slides, visualizations) — all viewable in Obsidian.

TODO / Things to explore

Ingest

Clip web articles as local markdown with images downloaded alongside
Fix broken image references so the local markdown renders correctly

Wiki compilation

Establish a repeatable LLM workflow to incrementally compile raw sources into the wiki (summaries, concept articles, backlinks)
Auto-maintain an index and brief per-document summaries to support Q&A without needing retrieval infrastructure

Q&A

Test complex multi-document questions against the wiki once it reaches meaningful size
Evaluate whether the LLM can navigate the wiki well via index files alone vs. needing a dedicated search tool

Output

Have query outputs rendered as markdown and filed back into the wiki so explorations accumulate
Try other output formats (slides, visualizations) viewable directly in the knowledge base

Linting / health checks

Run LLM health checks to surface inconsistent or conflicting information across articles
Impute missing data in incomplete articles using web search
Generate suggestions for new article candidates based on gaps and unexplored connections

Tooling

Build a lightweight search interface over the wiki, usable both directly and as a tool handed off to an LLM

Search Interface

A local semantic search app lives under tools/search/. It indexes the wiki, summaries, and manual notes using sentence embeddings, stores them in SQLite, and serves a web UI with deep links back into Obsidian.
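For readers unfamiliar with the pattern, the sketch below shows roughly how embeddings can be stored in SQLite and queried by cosine similarity. The `chunks` table, its columns, and the toy 3-dimensional vectors are assumptions made for illustration; the real schema of search.db and the actual ranking code may differ.

```python
import math
import sqlite3
from array import array


def to_blob(vector: list[float]) -> bytes:
    """Pack a vector as float32 bytes so it fits in a BLOB column."""
    return array("f", vector).tobytes()


def from_blob(blob: bytes) -> list[float]:
    """Unpack float32 bytes back into a list of floats."""
    vec = array("f")
    vec.frombytes(blob)
    return vec.tolist()


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(conn: sqlite3.Connection, query_vec: list[float], limit: int = 3):
    """Brute-force scan: score every stored chunk against the query vector."""
    rows = conn.execute("SELECT path, heading, embedding FROM chunks").fetchall()
    scored = [(cosine(query_vec, from_blob(blob)), path, heading)
              for path, heading, blob in rows]
    return sorted(scored, reverse=True)[:limit]


# Hypothetical schema and rows; real embeddings would come from the model.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (path TEXT, heading TEXT, embedding BLOB)")
conn.executemany(
    "INSERT INTO chunks VALUES (?, ?, ?)",
    [
        ("wiki/Attention.md", "Self-attention", to_blob([1.0, 0.0, 0.0])),
        ("wiki/LoRA.md", "Low-rank updates", to_blob([0.0, 1.0, 0.0])),
    ],
)
```

A full scan like this is usually adequate for a personal wiki with a few thousand chunks, which is one reason a plain SQLite file can stand in for a dedicated vector database at this scale.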

Setup

1. Install dependencies (Python 3.10+ required)

pip install -r tools/search/requirements.txt

2. Build the index

python3 tools/search/index.py

This chunks all .md files under knowledge-base/wiki/, knowledge-base/summaries/, and knowledge-base/manual/ by section, embeds each chunk with all-MiniLM-L6-v2, and writes the results to tools/search/search.db. The first run takes about 10 seconds; subsequent runs re-embed only changed files.
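Chunking by section can be pictured with the sketch below. It is an assumption about the general approach, not a copy of index.py: the function name `chunk_by_section` and the returned dict shape are hypothetical, and the embedding step is left out.

```python
import re

# ATX headings only ("# Title" through "###### Title").
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def chunk_by_section(text: str) -> list[dict]:
    """Split markdown into one chunk per heading.

    Text before the first heading gets an empty heading; sections with no body are dropped.
    """
    chunks = []
    heading, body = "", []
    in_fence = False

    def flush() -> None:
        content = "\n".join(body).strip()
        if content:
            chunks.append({"heading": heading, "text": content})

    for line in text.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # do not treat '#' lines inside code fences as headings
        match = None if in_fence else HEADING.match(line)
        if match:
            flush()
            heading, body = match.group(2).strip(), []
        else:
            body.append(line)
    flush()
    return chunks
```

Each chunk's text would then be passed to the embedding model, with the heading kept alongside it so search results can point at a specific section.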

To force a full rebuild:

python3 tools/search/index.py --rebuild
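One plausible way to decide which files need re-embedding is to compare content hashes against those recorded on the previous run. The sketch below assumes that approach; index.py may instead rely on modification times or something else, and the names `file_digest` and `files_to_embed` are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of the file contents, used as a change marker."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def files_to_embed(paths, known_digests: dict[str, str], rebuild: bool = False):
    """Return (path, digest) pairs that are new or changed since the last run.

    With rebuild=True every file is returned, mirroring a forced full rebuild.
    """
    stale = []
    for path in paths:
        digest = file_digest(path)
        if rebuild or known_digests.get(str(path)) != digest:
            stale.append((path, digest))
    return stale
```

After embedding, the new digests would be written back to the store so the next run can skip unchanged files.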

3. Start the server

python3 tools/search/server.py

Open http://127.0.0.1:8000. Results link directly into Obsidian via obsidian:// deep links — make sure the vault is open in Obsidian first.
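As an illustration of how such a deep link can be built, the sketch below uses the `obsidian://open` URI with `vault` and `file` parameters, both percent-encoded (including slashes). The helper name `obsidian_link` and the vault name in the example are assumptions; the server may construct its links differently.

```python
from urllib.parse import quote


def obsidian_link(vault: str, note_path: str) -> str:
    """Build an obsidian://open URI for a note path relative to the vault root."""
    # safe="" forces "/" to be encoded as %2F, which Obsidian URIs expect.
    return (
        "obsidian://open"
        f"?vault={quote(vault, safe='')}"
        f"&file={quote(note_path, safe='')}"
    )


# Hypothetical vault name and note path.
print(obsidian_link("ai-ml-knowledge-base", "knowledge-base/wiki/My Page.md"))
```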

Notes

search.db is gitignored — regenerate locally after cloning
The model (~80MB) downloads automatically on first run and is cached in ~/.cache/huggingface/
To re-index on server startup: SEARCH_REINDEX_ON_STARTUP=1 python3 tools/search/server.py

Future

Explore synthetic data generation + fine-tuning so the LLM "knows" the knowledge base in its weights rather than just its context

LLM Skills (Claude Code)

Claude Code skills live under .claude/skills/ and automate recurring workflows. Invoke them by typing /skill-name <argument> in a Claude Code session.

`/process-article <path-to-article>`

Ingests a raw source file (markdown article, PDF, etc.) into the knowledge base end-to-end:

Fix images — repairs broken URL-encoded image paths in clipped markdown
Read — reads the full source (chunked if long)
Summarise — writes a structured summary to knowledge-base/summaries/<Title>.md with sections, tables, formulas, and wikilinks
Identify wiki pages — finds which existing wiki pages the article's concepts belong to
Update wiki pages — adds new content and backlinks to affected pages
New wiki page — creates a new page if the article introduces a concept cluster with no existing home
Update wiki index — adds the new page to knowledge-base/wiki/index.md
Update Q&A index — adds/updates a brief entry in knowledge-base/qa-index.md, a compact master index loadable in a single LLM context for Q&A without retrieval infrastructure
Report — lists all files created or modified

Example:

/process-article knowledge-base/raw/articles/My Article.md

`/health-check [topic]`

Reads wiki pages and summaries (full wiki, or scoped to a topic), then produces a structured report at knowledge-base/health-checks/YYYY-MM-DD.md covering:

Contradictions — directly opposing claims about the same concept or result
Inconsistent framing — same concept described with conflicting terminology across pages
Superseded claims — older wiki content not updated to reflect newer sources
Missing cross-links — related pages that don't reference each other
Gaps — concepts mentioned across multiple pages but lacking a dedicated entry

Every finding includes the pages involved and a suggested fix.
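The skill itself is LLM-driven, but one of the categories above, missing cross-links, can be approximated deterministically. The sketch below is such an approximation and not part of the skill: it assumes pages link to each other with `[[wikilinks]]`, treats one-way links as a proxy for missing cross-links, and uses hypothetical names (`outgoing_links`, `one_way_links`).

```python
import re

# Captures the target of [[Target]], [[Target|alias]] and [[Target#heading]].
# Image embeds written as ![[file.png]] would also match and may need filtering.
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def outgoing_links(text: str) -> set[str]:
    """Collect wikilink targets, dropping any |alias or #heading suffix."""
    return {target.strip() for target in WIKILINK.findall(text)}


def one_way_links(pages: dict[str, str]) -> list[tuple[str, str]]:
    """Return (source, target) pairs where source links to target but not back."""
    links = {title: outgoing_links(body) for title, body in pages.items()}
    missing = []
    for source, targets in links.items():
        for target in targets:
            if target in links and source not in links[target]:
                missing.append((source, target))
    return sorted(missing)
```

A check like this only catches pages that are already linked in one direction; finding related pages with no link at all still needs the semantic judgment the LLM pass provides.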

Examples:

/health-check
/health-check training/fine-tuning
/health-check continual learning
