
ML Drill

A personal ML interview preparation tool powered by Claude AI and RAG over your own ML textbooks.



Screenshots

Home · Book Exercises · Progress · Study Journal

What it does

  • Claude-generated interview questions — every question is generated by the Claude API for your chosen topic, grounded in your books via RAG. No hardcoded question banks.
  • Socratic evaluation — submit your answer and Claude evaluates it: what you got right, what's missing, a book reference, and an improved answer.
  • Weak spot tracking — scores per topic are tracked automatically. The random drill pulls 70% of its questions from your weakest topics.
  • Book Exercises — real end-of-chapter exercises extracted from your PDFs with Claude model answers backed by book passages.
  • Equation Cheat Sheet — key ML equations with Claude explanations grounded in your books.
  • Study Journal — every answer and evaluation saved and reviewable.
  • Framework Studio — learn OpenAI Agents SDK, CrewAI, and LangGraph by reading real project code with Claude explanations.
  • Study Guide — structured 10-phase ML interview learning path with interview tips and direct drill links.
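The 70/30 weak-spot sampling mentioned above can be sketched roughly like this. This is a minimal illustration, assuming a `scores` dict mapping topic to average score; the repo's actual sampling logic lives in `core/` and may differ:

```python
import random

def pick_drill_topics(scores: dict[str, float], n: int = 10,
                      weak_ratio: float = 0.7) -> list[str]:
    """Pick n drill topics, ~70% from the weakest half of topics.

    `scores` maps topic -> average score (higher = stronger).
    """
    ranked = sorted(scores, key=scores.get)        # weakest topics first
    weak_pool = ranked[: max(1, len(ranked) // 2)]  # bottom half by score
    n_weak = round(n * weak_ratio)
    # Sample with replacement: weak topics first, then anything
    picks = random.choices(weak_pool, k=n_weak)
    picks += random.choices(ranked, k=n - n_weak)
    return picks
```

The weak pool here is simply the bottom half by score; a real implementation might weight by score instead.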

Tech stack

  • Backend: Python + FastAPI + ChromaDB + PyMuPDF
  • Frontend: React (single HTML file, no build step) + Tailwind CDN
  • LLM: Claude API (claude-sonnet-4-20250514)
  • Vector DB: ChromaDB (local, embedded)
  • Package manager: UV

No LangChain. No LlamaIndex. Raw API calls only.


Setup

1. Install UV

curl -LsSf https://astral.sh/uv/install.sh | sh

2. Clone and install

git clone https://github.com/Mituvinci/ml-interview-drill.git
cd ml-interview-drill
uv sync

3. Add your API key

cp .env.example .env
# Edit .env — add your ANTHROPIC_API_KEY

4. Add your books

Place PDFs in data/books/:

data/books/
├── goodfellow.pdf        # Goodfellow — Deep Learning
├── chip_huyen.pdf        # Chip Huyen — Designing ML Systems
└── geron.pdf             # Géron — Hands-On ML

Any PDF works. Add more by editing BOOK_FILES in ingest.py.
You can also add markdown content (notes, Q&A books) via MARKDOWN_DIRS in ingest.py.
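`BOOK_FILES` and `MARKDOWN_DIRS` are real names from `ingest.py`, but the shapes below are only a guess at what they might look like; check the actual file before editing:

```python
# ingest.py — illustrative shapes only, not the repo's real definitions
BOOK_FILES = {
    "goodfellow.pdf": "Goodfellow — Deep Learning",
    "chip_huyen.pdf": "Chip Huyen — Designing ML Systems",
    "geron.pdf": "Géron — Hands-On ML",
    "my_new_book.pdf": "Author — Title",  # add your own PDFs here
}

# Directories of .md files (notes, Q&A books) to index alongside the PDFs
MARKDOWN_DIRS = ["data/notes_md"]
```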

5. Index your books

uv run python ingest.py

Creates a local ChromaDB vector store in data/chroma_db/.

6. (Optional) Extract book exercises

uv run python extract_exercises.py

Extracts end-of-chapter exercises from your PDFs into data/book_exercises.json.

7. Run

uv run uvicorn main:app --reload --port 8000

Open http://localhost:8000


Project structure

ml-drill/
├── main.py                    # FastAPI app
├── ingest.py                  # PDF + Markdown → ChromaDB
├── extract_exercises.py       # Extract book exercises from PDFs
├── api/                       # All API endpoints
├── core/
│   ├── question_generator.py  # Claude question generation + caching
│   ├── evaluator.py           # Claude evaluation prompts
│   ├── rag.py                 # ChromaDB retrieval
│   └── ...
├── frontend/
│   ├── index.html             # React app shell
│   └── components/            # React component files
├── data/
│   ├── books/                 # Your PDFs go here (gitignored)
│   ├── study_guide.json       # Learning path config
│   └── code_projects/         # Framework Studio code examples
└── notes/                     # Personal study notes HTML (gitignored)

How questions are generated

Every drill question is generated by Claude:

  1. You select a topic (e.g. "Adam Optimizer")
  2. The app checks the cache (data/generated_questions.json) for unanswered questions on that topic
  3. If none exist, it calls Claude with a RAG-grounded prompt built from your indexed books
  4. The question is cached: generated once, free forever after

The prompt produces Zoom-interview-style questions (conceptual, comparison, explanation) rather than "derive from scratch" proofs.
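The cache-then-generate flow above can be sketched like this. The cache filename matches the repo, but the record fields and the `generate` callable (standing in for the Claude + RAG call) are hypothetical:

```python
import json
from pathlib import Path

def next_question(topic: str, generate,
                  cache_path: Path = Path("data/generated_questions.json")) -> dict:
    """Return a cached unanswered question for `topic`, or generate and cache one.

    `generate(topic)` stands in for the Claude API call with a RAG-grounded prompt.
    """
    cache = json.loads(cache_path.read_text()) if cache_path.exists() else []
    for q in cache:
        if q["topic"] == topic and not q.get("answered"):
            return q  # cache hit — no API call, no cost
    # Cache miss: generate once, persist for free reuse later
    q = {"topic": topic, "question": generate(topic), "answered": False}
    cache.append(q)
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    cache_path.write_text(json.dumps(cache, indent=2))
    return q
```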


Adding personal study notes

Place HTML files in notes/ and add the filename to ALLOWED_NOTES in main.py. Served at /notes/<filename>.
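A minimal sketch of such an allowlist check (`ALLOWED_NOTES` is the repo's name; the filenames and helper function below are hypothetical):

```python
# main.py — hypothetical sketch; see the real file for the actual mechanism
ALLOWED_NOTES = {"transformers.html", "optimizers.html"}

def is_allowed_note(filename: str) -> bool:
    # Reject path separators and traversal, then require an explicit allowlist hit
    return "/" not in filename and ".." not in filename and filename in ALLOWED_NOTES
```

An explicit allowlist like this is what prevents `/notes/` from serving arbitrary files.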


Deploying to Hugging Face Spaces

Set these secrets in your Space settings:

ANTHROPIC_API_KEY=your_key
HF_TOKEN=your_hf_token           # optional — for syncing progress to HF Dataset
HF_DATASET_REPO=user/repo-name   # optional

See core/hf_sync.py for the sync logic.


Acknowledgements

This tool is built to study from these excellent resources. No book content is distributed in this repository. You must obtain your own legal copies.

  • Ian Goodfellow, Yoshua Bengio, Aaron Courville — Deep Learning (free online)
  • Aurélien Géron — Hands-On Machine Learning with Scikit-Learn, Keras & TensorFlow (O'Reilly)
  • Chip Huyen — Designing Machine Learning Systems (O'Reilly) and ML Interviews Book (free online)
  • Ed Donner — LLM Engineering: Master AI & Large Language Models (Udemy) — the agentic AI module that inspired the Framework Studio feature

Place your own PDF copies in data/books/ before running ingest.py.


License

MIT
