learn-attn

This repository contains docs and code that build a small-scale GPT on the Tiny Shakespeare dataset. It is designed to teach people like me: readers who have a basic understanding of DL and the related math (namely basic multivariable calculus and linear algebra) and want to dive deeper into attention and transformers.

The repository was generated as follows:

  1. I asked Claude Code to fetch Harvard's The Annotated Transformer and Karpathy's nanoGPT, then summarize and restructure them so they are easy to follow for someone with my background.
  2. I followed the generated docs and asked Claude Code to make edits as I went along. Rinse and repeat.

Quick Start

uv sync                                                                # install deps
uv run python -m babygpt.dataset                                       # print dataset stats
uv run python -m babygpt.train                                         # train (~14 min on RTX 3080)
uv run python -m babygpt.generate --prompt "ROMEO:" --temperature 0.8  # generate text
./build.sh                                                             # rebuild tutorial PDF
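
The `--temperature 0.8` flag above is not explained in this README. In nanoGPT-style samplers, temperature usually divides the logits before the softmax, so values below 1 sharpen the distribution and values above 1 flatten it. The snippet below is a minimal standard-library sketch of that idea, assuming `babygpt.generate` follows the same convention. The names `softmax_with_temperature` and `sample_index` are hypothetical and are not taken from this repository.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities after dividing by the temperature.

    Hypothetical helper for illustration; not part of babygpt.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_index(logits, temperature=1.0, rng=random):
    """Draw one index according to the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
    for t in (0.5, 0.8, 1.5):
        probs = softmax_with_temperature(toy_logits, t)
        print(t, [round(p, 3) for p in probs])
```

Running the script prints the three probabilities at each temperature, which makes it easy to see the top token gaining mass as the temperature drops.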

Credits

I cannot take credit for this repository, which is based heavily on the work of others. I'd like to thank:

  • The original authors of the paper Attention Is All You Need and the many researchers who have built on top of it.
  • The authors of the Annotated Transformer. (MIT License)
  • Karpathy's tutorials and his nanoGPT codebase. (MIT License)

How is this not AI slop?

I read everything.
