ScratchOptim is the optimization-focused core of the ScratchMind ecosystem — an educational project dedicated to re-implementing the algorithms that power modern deep learning training.
This repository rebuilds every major neural network optimizer from scratch, following their historical evolution from classical gradient descent to the Adam family and modern adaptive variants.
ScratchOptim first implements optimizers in PyTorch, then re-implements them on top of the custom autograd engine from ScratchGrad, making the stack fully self-contained.
Our mission is to understand optimization in machine learning deeply by rebuilding every optimizer from the ground up, both mathematically and programmatically.
We focus on clarity, correctness, and reproducibility — not shortcuts.
Understand the update rule before writing a single line of code.
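For reference, here are the standard textbook update rules for three of the optimizers covered here, written with one shared indexing convention: $\eta$ is the learning rate, $g_t = \nabla_\theta \mathcal{L}(\theta_t)$, and all buffers start at zero ($u_0 = m_0 = v_0 = 0$). The momentum line is the heavy-ball form without dampening (the default in PyTorch's `torch.optim.SGD`); the symbol names are our own choice and may differ from TIMELINE.md.

```latex
\begin{aligned}
\text{GD:}\quad & \theta_{t+1} = \theta_t - \eta\, g_t \\
\text{Momentum:}\quad & u_{t+1} = \beta\, u_t + g_t, \qquad \theta_{t+1} = \theta_t - \eta\, u_{t+1} \\
\text{Adam:}\quad & m_{t+1} = \beta_1 m_t + (1-\beta_1)\, g_t, \qquad v_{t+1} = \beta_2 v_t + (1-\beta_2)\, g_t^2 \\
& \hat m_{t+1} = \frac{m_{t+1}}{1-\beta_1^{t+1}}, \qquad \hat v_{t+1} = \frac{v_{t+1}}{1-\beta_2^{t+1}}, \qquad
  \theta_{t+1} = \theta_t - \eta\, \frac{\hat m_{t+1}}{\sqrt{\hat v_{t+1}} + \epsilon}
\end{aligned}
```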
Recreate optimizers in the order they were invented, to reveal how each one addressed the limitations of its predecessors.
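As a small illustration of one such limitation, the pure-Python sketch below (no PyTorch; the quadratic, learning rate, β, and step count are illustrative choices of ours, not values from ScratchOptim) runs plain gradient descent and heavy-ball momentum on an ill-conditioned quadratic. The steep direction caps GD's stable learning rate, so GD crawls along the shallow direction; momentum accumulates past gradients and reaches a much lower loss in the same number of steps.

```python
"""Plain gradient descent vs. heavy-ball momentum on an ill-conditioned quadratic.

Loss: f(x, y) = 0.5 * (x**2 + 100 * y**2), minimum at (0, 0).
The steep y-direction caps the stable GD learning rate (lr < 2/100),
so GD makes slow progress along the shallow x-direction.
"""


def grad(x, y):
    # Gradient of 0.5 * (x**2 + 100 * y**2)
    return x, 100.0 * y


def loss(x, y):
    return 0.5 * (x * x + 100.0 * y * y)


def run_gd(lr=0.015, steps=200, start=(1.0, 1.0)):
    x, y = start
    for _ in range(steps):
        gx, gy = grad(x, y)
        x, y = x - lr * gx, y - lr * gy
    return loss(x, y)


def run_momentum(lr=0.015, beta=0.9, steps=200, start=(1.0, 1.0)):
    x, y = start
    vx, vy = 0.0, 0.0  # velocity buffers, initialised to zero
    for _ in range(steps):
        gx, gy = grad(x, y)
        # Accumulate a decaying sum of past gradients, then step along it
        vx, vy = beta * vx + gx, beta * vy + gy
        x, y = x - lr * vx, y - lr * vy
    return loss(x, y)


if __name__ == "__main__":
    print(f"GD       final loss: {run_gd():.3e}")
    print(f"Momentum final loss: {run_momentum():.3e}")
```

With the same learning rate, both methods are stable here, but only momentum makes fast progress on the shallow axis.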
- Stage 1: Implement with PyTorch tensors for stability and testing.
- Stage 2: Re-write on top of ScratchGrad once it reaches optimizer-ready maturity.
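One way to keep the Stage 1 → Stage 2 rewrite cheap is to write the update logic against a minimal parameter interface (here just `.data` and `.grad`) rather than against a specific tensor library. In the sketch below, `ListParam` and this `SGD` class are hypothetical stand-ins, not ScratchOptim or ScratchGrad APIs; a PyTorch version would perform the same arithmetic with in-place tensor ops under `torch.no_grad()`.

```python
"""Sketch: an optimizer that touches only `.data` and `.grad`.

`ListParam` is a hypothetical stand-in for a PyTorch tensor (Stage 1) or a
ScratchGrad value (Stage 2); it is not part of either library.
"""
from dataclasses import dataclass, field
from typing import List


@dataclass
class ListParam:
    data: List[float]
    grad: List[float] = field(default_factory=list)


class SGD:
    """Vanilla SGD: p <- p - lr * grad, applied element-wise."""

    def __init__(self, params, lr=0.01):
        self.params = list(params)
        self.lr = lr

    def zero_grad(self):
        for p in self.params:
            p.grad = [0.0] * len(p.data)

    def step(self):
        for p in self.params:
            if not p.grad:  # parameter has no gradient yet; leave it unchanged
                continue
            p.data = [w - self.lr * g for w, g in zip(p.data, p.grad)]


if __name__ == "__main__":
    w = ListParam(data=[1.0, -2.0], grad=[0.5, 0.5])
    opt = SGD([w], lr=0.1)
    opt.step()
    print(w.data)  # approximately [0.95, -2.05]
```

Because `step()` never imports a tensor library, swapping backends only means providing parameter objects with the same two attributes.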
Readable, fully documented implementations over clever shortcuts.
ScratchOptim includes:
- From-scratch implementations of foundational optimizers (GD, SGD, Momentum)
- Adaptive optimizers (AdaGrad, RMSProp, Adam family)
- Modern improvements (RAdam, AdaBelief)
- Clean, unified optimizer API
- Visualization scripts (optimizer paths, convergence behaviors)
- Unit tests comparing outputs to PyTorch equivalents
- Full research-backed timeline (TIMELINE.md)
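A unified optimizer API could take many shapes; the sketch below assumes one common design: a base class that owns the parameter list, `zero_grad()`, `step()`, and a per-parameter state dict, with each optimizer implementing only a single-parameter `update`. All names (`Param`, `Optimizer`, `Momentum`, `Adam`) are hypothetical, not ScratchOptim's actual API, and parameters are scalars to keep it short. The final check mirrors the unit-test idea using a hand-derived value instead of a PyTorch reference.

```python
"""Sketch of a unified optimizer API: shared base class, per-parameter state.

All names here are hypothetical illustrations, not ScratchOptim's actual API.
Parameters are plain scalars to keep the sketch short.
"""
import math


class Param:
    def __init__(self, value):
        self.data = float(value)
        self.grad = 0.0


class Optimizer:
    """Base class: subclasses implement `update(p, state)` for one parameter."""

    def __init__(self, params, lr):
        self.params = list(params)
        self.lr = lr
        self.state = {id(p): {} for p in self.params}  # per-parameter buffers

    def zero_grad(self):
        for p in self.params:
            p.grad = 0.0

    def step(self):
        for p in self.params:
            self.update(p, self.state[id(p)])

    def update(self, p, state):
        raise NotImplementedError


class Momentum(Optimizer):
    def __init__(self, params, lr=0.01, beta=0.9):
        super().__init__(params, lr)
        self.beta = beta

    def update(self, p, state):
        u = self.beta * state.get("u", 0.0) + p.grad  # velocity buffer
        state["u"] = u
        p.data -= self.lr * u


class Adam(Optimizer):
    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
        super().__init__(params, lr)
        self.b1, self.b2 = betas
        self.eps = eps

    def update(self, p, state):
        t = state.get("t", 0) + 1
        m = self.b1 * state.get("m", 0.0) + (1 - self.b1) * p.grad
        v = self.b2 * state.get("v", 0.0) + (1 - self.b2) * p.grad ** 2
        state.update(t=t, m=m, v=v)
        m_hat = m / (1 - self.b1 ** t)  # bias correction for zero-initialised m
        v_hat = v / (1 - self.b2 ** t)  # bias correction for zero-initialised v
        p.data -= self.lr * m_hat / (math.sqrt(v_hat) + self.eps)


if __name__ == "__main__":
    # Hand-derived check: on Adam's first step the bias-corrected ratio is
    # g / (|g| + eps), so the update is approximately lr * sign(g).
    p = Param(1.0)
    p.grad = 2.0
    opt = Adam([p], lr=0.1)
    opt.step()
    print(p.data)  # close to 0.9
```

Adding another optimizer then means writing one `update` method. A test against PyTorch would replace the hand-derived value with the result of `torch.optim.Adam` run on the same inputs.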
No learning rate schedulers, clipping, warmup, EMA, or training utilities — those live in ScratchTrain.
See the full roadmap:
👉 TIMELINE.md (included in repo)
This document explains:
- Key innovations
- Learning focus
- Required math
- Which optimizers to fully implement vs. partially vs. read-only
ScratchOptim is closely tied to:
- ScratchGrad → provides autograd needed for final rewrites
- ScratchTrain → uses optimizers for training loop experiments
- ScratchVision, ScratchSeq, ScratchGen → earliest testbeds for experimenting with optimizers
Eventually, all ScratchMind models will train exclusively using ScratchOptim + ScratchGrad.
By the end of ScratchOptim, you will:
✔ Implement every major optimizer from scratch
✔ Understand their math, intuition, and dynamics
✔ Visualize how optimizers behave on real loss landscapes
✔ Build a consistent optimizer API
✔ Integrate with ScratchGrad for a self-sustaining deep learning stack
✔ Be able to modify or design your own optimization algorithms
ScratchOptim is open to contributions:
- Fixes and improvements
- Additional visualizations
- Mathematical notes
- Comparisons between variants
- Optimizer research replicas
Open an issue or submit a pull request in this repository.
Inspired by open-source ML communities, classical optimization literature, and the philosophy of learning by rebuilding.