ScratchSeq is a from-scratch learning project for sequence modeling and language understanding, implemented with a PyTorch-first mindset.
The goal is not benchmark performance but mechanistic understanding: how sequence models evolved, why each innovation mattered, and how to implement them cleanly.
- Implement core models before using abstractions
- Read original papers alongside code
- Prefer minimal, inspectable implementations
- Focus on learning signals, gradients, and inductive biases
This repository is designed as a learning timeline, not a model zoo.
The roadmap is available at TIMELINE.
- ❌ No large-scale pretraining
- ❌ No SOTA chasing
- ❌ No heavy frameworks or wrappers
- ❌ No “black box” usage
By completing ScratchSeq, one should be able to:
- Derive sequence models from first principles
- Understand why transformers replaced recurrence
- Reason about attention, memory, and scaling limits
- Read modern LLM papers without hand-waving gaps
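
As one illustration of the kind of minimal, inspectable implementation this aims for, here is a sketch of scaled dot-product attention in plain Python. It is not code from this repository: the function names `softmax` and `attention` are hypothetical, and it uses only the standard library rather than PyTorch so that every intermediate value can be printed and checked by hand.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention on lists of vectors.

    queries: n vectors of size d
    keys:    m vectors of size d
    values:  m vectors of any size
    Returns (outputs, weights), where weights is an n x m matrix.
    """
    d = len(keys[0])
    scale = 1.0 / math.sqrt(d)
    outputs, weights = [], []
    for q in queries:
        # One score per key: this inner loop is what makes the cost n * m.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Each output is a weighted average of the value vectors.
        out = [
            sum(wj * v[c] for wj, v in zip(w, values))
            for c in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs, weights


# Self-attention over a toy sequence of three 2-dimensional tokens.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(x, x, x)
for row in weights:
    print([round(w, 3) for w in row])
```

For self-attention over a sequence of length n, the weight matrix has n × n entries, which is one concrete way to see where the scaling limits mentioned above come from.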
🚧 Work in progress: built incrementally alongside paper reading and experimentation.
ScratchSeq is about earning intuition, not importing it.