Transformer decoder built from scratch, following Sebastian Raschka's Build a Large Language Model (From Scratch). Covers the full architecture, from tokenization to autoregressive generation.
Goal: build a standalone Transformer engine to master the architecture.
Data - setup done ✅, using a short story from Wikipedia ("The Journey" by Edith Wharton).
Attention.py - done, implements the full multi-head attention mechanism.
Transformer.ipynb - contains the full decoder.
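For orientation, the operation at the heart of each attention head in a decoder is causal scaled dot-product attention. The sketch below is a plain-Python illustration of that idea only; it is not the code in Attention.py, and the names softmax and causal_attention are hypothetical. It assumes the inputs have already been projected into queries, keys, and values, and it handles a single head with no batching, dropout, or output projection.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(q, k, v):
    """Single-head causal attention (illustrative helper, not from the repo).

    q, k: lists of T vectors of dimension d_k.
    v:    list of T vectors of dimension d_v.
    Returns a list of T output vectors of dimension d_v.
    """
    d_k = len(q[0])
    d_v = len(v[0])
    out = []
    for t in range(len(q)):
        # Causal mask: position t only attends to positions 0..t.
        scores = [
            sum(qi * ki for qi, ki in zip(q[t], k[s])) / math.sqrt(d_k)
            for s in range(t + 1)
        ]
        weights = softmax(scores)
        # Output is the attention-weighted sum of the visible value vectors.
        out.append([
            sum(w * v[s][j] for s, w in enumerate(weights))
            for j in range(d_v)
        ])
    return out


if __name__ == "__main__":
    x = [[1.0, 0.0], [0.0, 1.0]]
    for row in causal_attention(x, x, x):
        print(row)
```

A multi-head layer would typically run several such heads in parallel on separately projected slices of the embedding and concatenate their outputs; a tensor library makes that far more efficient than the nested loops shown here.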