Build a Large Language Model from scratch, following the Stanford CS336 assignments.
The official assignment repository: https://github.com/stanford-cs336/assignment1-basics/tree/main
What you will implement
- Byte-pair encoding (BPE) tokenizer (§2)
- Transformer language model (LM) (§3)
- The cross-entropy loss function and the AdamW optimizer (§4); a minimal loss sketch follows this list
- The training loop, with support for serializing and loading model and optimizer state (§5)
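The §4 loss is easiest to get right with the log-sum-exp trick, so large logits do not overflow when exponentiated. A minimal sketch of that idea in PyTorch (illustrative only, not the repository's implementation):

import torch

def cross_entropy(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    # logits: (batch, vocab_size); targets: (batch,) integer class indices.
    # Subtract the per-row max before exponentiating for numerical stability.
    logits = logits - logits.max(dim=-1, keepdim=True).values
    log_probs = logits - torch.log(torch.exp(logits).sum(dim=-1, keepdim=True))
    # Negative log-probability of each target token, averaged over the batch.
    nll = -log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    return nll.mean()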
What you will run
- Train a BPE tokenizer on the TinyStories dataset.
- Run your trained tokenizer on the dataset to convert it into a sequence of integer IDs.
- Train a Transformer LM on the TinyStories dataset.
- Generate samples and evaluate perplexity using the trained Transformer LM.
- Train models on OpenWebText and submit your attained perplexities to a leaderboard.
assignment1-basics directory structure:
assignment1-basics/
├── config/ # configs for TinyStories / OWT experiments
├── cs336_basics/ # starter code and basic tokenizer examples
├── data/ # TinyStories and OpenWebText data
├── model/ # BPE vocab and merges used by the tokenizer
├── runs/ # training runs and saved checkpoints
├── script/ # scripts for training, tokenization, and experiments
├── src/ # core tokenizer and transformer implementation
│ ├── attention.py # attention modules
│ ├── config.py # model & training configs
│ ├── dataloader.py # dataset & dataloader
│ ├── embedding.py # token & positional embeddings
│ ├── generate.py # text generation logic
│ ├── linear.py # linear layers
│ ├── optimizer.py # optimizer implementations
│ ├── rmsnorm.py # RMSNorm layer
│ ├── rope.py # rotary positional embeddings (RoPE)
│ ├── softmax.py # numerically stable softmax
│ ├── swiglu.py # SwiGLU feedforward components
│ ├── tokenizer.py # BPE tokenizer (train / encode / decode)
│ ├── tracker.py # training metrics & logging
│ ├── transformer.py # Transformer language model
│ └── utils.py # shared utilities
├── tests/ # pytest tests and fixtures
├── wandb/ # Weights & Biases logs and metadata
└── ...
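Among the components above, src/softmax.py is described as a numerically stable softmax; the standard trick is to subtract the row-wise maximum before exponentiating, which leaves the result unchanged. A minimal sketch of that idea (not necessarily the file's exact code):

import torch

def softmax(x: torch.Tensor, dim: int = -1) -> torch.Tensor:
    # exp() of large logits overflows; shifting by the max is safe because
    # softmax is invariant to adding a constant along `dim`.
    x_shifted = x - x.max(dim=dim, keepdim=True).values
    exp_x = torch.exp(x_shifted)
    return exp_x / exp_x.sum(dim=dim, keepdim=True)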
Prepare the environment with uv as described in assignment1 README – Environment.
Download the pretraining datasets as described in assignment1 README – Download Data.
Run unit tests for the components that have been implemented:
- cd assignment1-basics
- Run unit tests (they call functions in assignment1-basics/tests/adapters.py):
  - Run all 48 tests:
    uv run pytest
  - Run a specific component test, e.g.:
    uv run pytest -k test_transformer_lm
- Train a BPE tokenizer on the TinyStories dataset:
  uv run python script/train_bpe_tokenizer.py
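BPE training repeatedly counts adjacent symbol pairs and merges the most frequent pair until the target number of merges is reached. A toy sketch of that loop over pre-split words (the real tokenizer works on bytes, handles special tokens, and breaks ties deterministically; this shows only the core idea):

from collections import Counter

def train_bpe(words: dict[tuple[str, ...], int], num_merges: int) -> list[tuple[str, str]]:
    # words maps a word, already split into symbols, to its corpus frequency,
    # e.g. {("l", "o", "w"): 5, ("l", "o", "w", "e", "r"): 2}.
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for symbols, freq in words.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        # Apply the merge: replace every occurrence of the best pair with its concatenation.
        new_words = {}
        for symbols, freq in words.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_words[tuple(merged)] = new_words.get(tuple(merged), 0) + freq
        words = new_words
    return merges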
- Run your trained tokenizer on the dataset to convert it into a sequence of integer IDs:
  uv run python script/tokenize_and_bin.py
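Pre-tokenizing the corpus and storing the IDs as a flat binary file keeps training I/O simple, since the file can later be memory-mapped. A rough sketch of that idea (the file names and tokenizer interface here are assumptions, not the script's actual API):

import numpy as np

def tokenize_to_bin(text_path: str, out_path: str, tokenizer) -> None:
    # tokenizer is assumed to expose encode(str) -> list[int], as a trained BPE tokenizer would.
    with open(text_path, encoding="utf-8") as f:
        ids = tokenizer.encode(f.read())
    # uint16 is enough for a vocabulary of up to 65,535 tokens and halves disk usage vs. int32.
    np.array(ids, dtype=np.uint16).tofile(out_path)

# During training the file can be memory-mapped instead of loaded into RAM:
# data = np.memmap("tinystories_train.bin", dtype=np.uint16, mode="r")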
- Train a Transformer LM on the TinyStories dataset.
  - Use the tokenized TinyStories dataset to train the model and evaluate perplexity (a condensed training-step sketch follows this step):
    uv run python script/train.py
  - Tune the learning rate over [1e-1, 5e-2, 2e-2, 1e-2, 5e-3, 2e-3, 1e-3]:
    uv run python script/learning_rate_experiment.py
  - Vary the batch size over [8, 16, 32, 64] (within the GPU memory limit):
    uv run python script/batch_size_experiment.py
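The training step ties the pieces together: sample a batch of token IDs, compute cross-entropy against next-token targets, step the optimizer, and periodically serialize model and optimizer state (§5). A condensed sketch under those assumptions, with torch.optim.AdamW standing in for the repository's own AdamW (hyperparameters and paths are illustrative):

import numpy as np
import torch

def get_batch(data: np.memmap, batch_size: int, context_length: int, device: str):
    # Sample random windows; targets are the inputs shifted one token to the right.
    starts = np.random.randint(0, len(data) - context_length - 1, size=batch_size)
    x = torch.stack([torch.from_numpy(data[s:s + context_length].astype(np.int64)) for s in starts])
    y = torch.stack([torch.from_numpy(data[s + 1:s + context_length + 1].astype(np.int64)) for s in starts])
    return x.to(device), y.to(device)

def train(model, data, steps, device="cuda"):
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.01)
    for step in range(steps):
        x, y = get_batch(data, batch_size=32, context_length=256, device=device)
        logits = model(x)  # (batch, seq, vocab)
        loss = torch.nn.functional.cross_entropy(logits.view(-1, logits.size(-1)), y.view(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if step % 1000 == 0:
            # Serialize model and optimizer state so training can be resumed.
            torch.save({"model": model.state_dict(),
                        "optimizer": optimizer.state_dict(),
                        "step": step}, f"runs/ckpt_{step}.pt")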
- Generate samples and evaluate perplexity using the trained Transformer LM.
  - Generate and decode:
    uv run python script/generate_and_decode.py
  - Generate and decode results:
    Input: Once upon a time
    Output: Once upon a time, there was a small dog named Spot. Spot loved to play with his toy car. He would run around the park to play with. They all day, and the sun went on a tree. They liked to play with the toys with a ball. One day, Tom and Sam were playing with the ball together. They played together and had lots of fun. At the end of the day, Tim and his friends were very happy. They played together all day, laughing and having fun. <|endoftext|>
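Generation is a simple autoregressive loop: run the model on the current context, turn the last position's logits into a distribution, sample one token, append it, and stop at <|endoftext|> or a length cap. A temperature-sampling sketch under those assumptions (the actual script may decode differently, e.g. with top-p):

import torch

@torch.no_grad()
def generate(model, tokenizer, prompt: str, max_new_tokens: int = 256,
             temperature: float = 0.8, eos_id: int | None = None) -> str:
    # tokenizer is assumed to expose encode/decode; model maps (1, seq) IDs to (1, seq, vocab) logits.
    ids = torch.tensor([tokenizer.encode(prompt)], dtype=torch.long)
    for _ in range(max_new_tokens):
        logits = model(ids)[:, -1, :] / temperature       # last-position logits
        probs = torch.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)  # sample instead of argmax
        ids = torch.cat([ids, next_id], dim=1)
        if eos_id is not None and next_id.item() == eos_id:
            break
    return tokenizer.decode(ids[0].tolist())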
- Train models on OpenWebText and submit your attained perplexities to a leaderboard (a perplexity sketch follows this step).
  - Train the tokenizer:
    uv run python script/train_bpe_tokenizer.py --config owt
  - Tokenize the dataset:
    uv run python script/tokenize_and_bin.py --config owt
  - Pretrain the model:
    uv run python script/train.py --config owt
  - Training result:
    wandb report: https://api.wandb.ai/links/viko/axveizyy
    Official leaderboard: Assignment 1 (Basics) Leaderboard
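The leaderboard metric, perplexity, is just the exponential of the mean per-token cross-entropy on held-out data, so it falls out of the same loss used for training. A small sketch, assuming batches of (inputs, targets) token IDs:

import math
import torch

@torch.no_grad()
def perplexity(model, batches) -> float:
    # batches yields (inputs, targets) pairs of token IDs; the loss is averaged over all tokens.
    total_nll, total_tokens = 0.0, 0
    for x, y in batches:
        logits = model(x)
        nll = torch.nn.functional.cross_entropy(
            logits.view(-1, logits.size(-1)), y.view(-1), reduction="sum")
        total_nll += nll.item()
        total_tokens += y.numel()
    return math.exp(total_nll / total_tokens)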
- Tuning the learning rate with SGD example:
  uv run python script/learning_rate_tuning_sgd.py
- Layer normalization
- Position embeddings
- SwiGLU vs. SiLU (see the sketch after this list)
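The SwiGLU vs. SiLU experiment contrasts a gated feed-forward block, SwiGLU(x) = W2(SiLU(W1 x) * W3 x), with a plain two-layer SiLU MLP. A minimal sketch of both (dimensions and the bias-free choice are illustrative):

import torch
import torch.nn as nn

class SwiGLUFeedForward(nn.Module):
    # SwiGLU: the SiLU branch gates a second linear projection elementwise.
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.w1 = nn.Linear(d_model, d_ff, bias=False)  # gate projection
        self.w3 = nn.Linear(d_model, d_ff, bias=False)  # value projection
        self.w2 = nn.Linear(d_ff, d_model, bias=False)  # output projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w2(torch.nn.functional.silu(self.w1(x)) * self.w3(x))

class SiLUFeedForward(nn.Module):
    # Plain two-layer MLP with a SiLU nonlinearity and no gating.
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.w1 = nn.Linear(d_model, d_ff, bias=False)
        self.w2 = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w2(torch.nn.functional.silu(self.w1(x)))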
This project is licensed under the MIT License - see the LICENSE file for details.







