```
 _     _     __  __   _          _
| |   | |   |  \/  | | |    __ _| |__  ___
| |   | |   | |\/| | | |   / _` | '_ \/ __|
| |___| |___| |  | | | |__| (_| | |_) \__ \
|_____|_____|_|  |_| |_____\__,_|_.__/|___/
```
Made with ❤️ by Kumar Ishan, with a lot of help from ✨ 🤖 — so please be kind 😍.
This repo holds two tutorial tracks that follow code from Andrej Karpathy: a minimal microgpt walkthrough (single-file GPT, no ML frameworks) and a nanochat walkthrough (full tokenizer → pretrain → SFT → RL → eval → inference → web UI). Each chapter is reading-first: you trace real code, run hands-on steps, and build intuition for why each piece exists.
The material here was generated with the repo-tutorial skill from kumarishan/kistack — structured, chapter-by-chapter guides derived from the repositories themselves.
## microgpt — `microgpt.py`
“The most atomic way to train and run inference for a GPT in pure, dependency-free Python.”
| Chapter | What your reading covers |
|---|---|
| Chapter 1: Setup and First Run | Running the file with zero deps, reading train/inference output, a map of the whole pipeline |
| Chapter 2: Dataset and Tokenization | Character vocab, token IDs, BOS trick, sliding-window targets |
| Chapter 3: Autograd from Scratch | The Value type, computation graphs, local grads, reverse-mode autodiff |
| Chapter 4: Model Architecture Foundations | Init, linear(), stable softmax(), rmsnorm() |
| Chapter 5: Multi-Head Self-Attention | Q/K/V, scaled dot-product attention, KV cache, causal masking |
| Chapter 6: The Full GPT Forward Pass | MLP block, residual stream, token → logits |
| Chapter 7: Training — Loss, Backprop, and Adam | Cross-entropy, backward() through the graph, Adam |
| Chapter 8: Inference and Sampling | Autoregressive generation, temperature, sampling, what to try next |
Start here: microgpt/README.md
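As a taste of what the autograd chapter builds toward, here is a minimal sketch of a scalar `Value` type with reverse-mode autodiff. It is illustrative only: the class layout and method names here are assumptions for this sketch, not the repo's actual `Value` implementation, though the ideas (computation graph, local grads, `backward()`) are the ones Chapter 3 traces.

```python
# Tiny scalar autograd sketch (illustrative, not microgpt's actual code).
class Value:
    def __init__(self, data, children=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._children = children        # inputs in the computation graph
        self._local_grads = local_grads  # d(out)/d(input) for each child

    def __add__(self, other):
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def backward(self):
        # Reverse-mode autodiff: topo-sort the graph, then apply the chain rule.
        topo, seen = [], set()
        def build(v):
            if v not in seen:
                seen.add(v)
                for c in v._children:
                    build(c)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            for child, lg in zip(v._children, v._local_grads):
                child.grad += lg * v.grad

x = Value(2.0); y = Value(3.0)
z = x * y + x          # z = x*y + x = 8
z.backward()
print(x.grad, y.grad)  # dz/dx = y + 1 = 4, dz/dy = x = 2
```

The real file builds every model operation (matmul, softmax, rmsnorm, attention) out of exactly this kind of node, which is why the whole pipeline can run with zero dependencies.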
## nanochat — full LLM stack (review in progress)
A path from BPE and architecture through distributed pretraining, SFT, RL (GRPO), evaluation, fast inference, and a streaming chat server.
| Chapter | What your reading covers |
|---|---|
| Chapter 1: Setup and First Run | uv, project layout, device selection, tiny CPU end-to-end run, full pipeline overview |
| Chapter 2: LLM Fundamentals — Tokens, Embeddings, and Attention | Autoregressive LM, embeddings, self-attention, transformer blocks, cross-entropy |
| Chapter 3: Tokenization — Training BPE from Scratch | BPE, split patterns, RustBPE / tiktoken, specials, chat tokens, loss masking |
| Chapter 4: The GPT Architecture — Attention, RoPE, GQA, and Modern Innovations | RoPE, QK-norm, GQA, sliding windows, FA3, smearing, scalars, logit softcap |
| Chapter 5: Data Pipeline — Loading, Packing, and Distributing Training Data | Parquet / mix data, BOS-aligned cropping, DDP sharding, resumable state |
| Chapter 6: Pretraining — The Training Loop, Optimizers, and Mixed Precision | Training loop, CE + BPB, AdamW, Muon / Polar Express, AMP, accumulation, LR schedule |
| Chapter 7: Distributed Training — Multi-GPU with DDP | Data parallel, process groups, torchrun, all_reduce, checkpoints, MFU |
| Chapter 8: Evaluation — Bits-Per-Byte, CORE, and In-Context Learning | Bits-per-byte, in-context learning, CORE / MMLU-style evals, running eval scripts |
| Chapter 9: Supervised Fine-Tuning — Teaching the Model to Chat | Chat templates, assistant-only loss, benchmarks, mixtures, forgetting |
| Chapter 10: Reinforcement Learning — GRPO, Rewards, and Tool Use | Policy gradients, REINFORCE, GRPO/DAPO, rewards, tool use (e.g. Python sandbox) |
| Chapter 11: Inference — KV Cache, Flash Attention, and Sampling | KV cache, prefill vs decode, Flash vs SDPA, temperature / top-k, streaming |
| Chapter 12: Chat Interfaces — CLI and Web Server | CLI state, FastAPI, SSE, OpenAI-style API, async workers, multi-GPU serve |
| Chapter 13: Scaling Laws, Compute-Optimal Training, and Going Further | Chinchilla-style scaling, compute-optimal training, depth dial, MFU, FP8, where to go next |
Start here: nanochat/README.md
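To preview the tokenizer track, here is a sketch of a single BPE merge step in plain Python. The helper names `most_common_pair` and `merge` are invented for this sketch and are not nanochat's RustBPE / tiktoken API; they only show the core idea Chapter 3 develops: repeatedly mint a new token id for the most frequent adjacent pair.

```python
# One BPE merge step (illustrative sketch, not the repo's tokenizer code).
from collections import Counter

def most_common_pair(ids):
    """Count adjacent id pairs and return the most frequent one."""
    return Counter(zip(ids, ids[1:])).most_common(1)[0][0]

def merge(ids, pair, new_id):
    """Replace every occurrence of `pair` in `ids` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

ids = list(b"aaabdaaabac")    # raw bytes as initial token ids
pair = most_common_pair(ids)  # (97, 97): "aa" is the most frequent pair
ids = merge(ids, pair, 256)   # mint a new token id for "aa"
```

Training a real vocabulary just repeats this loop until the target vocab size is reached; the walkthrough covers why split patterns, special tokens, and loss masking complicate the picture.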
- Pick microgpt if you want every idea visible in ~200 lines of Python, and then head over to nanochat when you want the full production-shaped stack.
- Open the track README for time estimates and prerequisites, then follow chapters in order.
- Do the hands-on steps in each chapter — the goal is not skimming but tracing and running the real code paths.
See CONTRIBUTING.md. Please open an issue first to discuss improvements, vetting, or ideas for other repos that could use a similar tutorial.