This repository contains docs and code that build a small-scale GPT trained on the Tiny Shakespeare dataset. It is designed to teach people like me: those who have a basic understanding of deep learning and the related math (namely basic multivariable calculus and linear algebra) and want to dive deeper into the world of attention and transformers.
The repository was generated as follows:
- I asked Claude Code to fetch Harvard's The Annotated Transformer and Karpathy's nanoGPT, summarize and restructure them in a way that is easy to understand for someone with my background.
- I followed the generated docs and asked Claude Code to make edits as I went along. Rinse and repeat.
```bash
uv sync # install deps
uv run python -m babygpt.dataset # print dataset stats
uv run python -m babygpt.train # train (~14 min on RTX 3080)
uv run python -m babygpt.generate --prompt "ROMEO:" --temperature 0.8 # generate text
./build.sh # rebuild tutorial PDF
```

I cannot take credit for this repository, which is based heavily on the work of others. I'd like to thank:
- The original authors of the paper "Attention Is All You Need" and the many researchers who have built on top of it.
- The authors of The Annotated Transformer (MIT License).
- Andrej Karpathy, for his tutorials and his nanoGPT codebase (MIT License).
I read everything.
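A note on the `--temperature 0.8` flag in the generate command: in nanoGPT's sampling loop, the model's final logits are divided by the temperature before the softmax, so values below 1 sharpen the next-token distribution and values above 1 flatten it. The pure-Python sketch below illustrates that idea, assuming babygpt follows the same convention. The helper names `softmax_with_temperature` and `sample_next_token` are hypothetical and are not part of babygpt.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw logits into probabilities after temperature scaling.

    Hypothetical helper for illustration; babygpt.generate may differ.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Divide by T: T < 1 sharpens the distribution, T > 1 flattens it.
    scaled = [x / temperature for x in logits]
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(logits, temperature=1.0, rng=random):
    """Draw one token index from the temperature-scaled distribution.

    Hypothetical helper; a real model would feed the last position's logits.
    """
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.1]  # toy scores for a 3-token vocabulary
    for t in (0.5, 0.8, 1.5):
        print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
    print(sample_next_token(logits, 0.8, random.Random(0)))
```

Running the script prints the distribution at a few temperatures: the most likely token's probability grows as the temperature drops, which is why lower values tend to produce more repetitive but more coherent text.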