
📘 ScratchOptim

🔧 ScratchOptim — Optimizers Rebuilt from First Principles

ScratchOptim is the optimization-focused core of the ScratchMind ecosystem — an educational project dedicated to re-implementing the algorithms that power modern deep learning training.

This repository rebuilds every major neural network optimizer from scratch, following their historical evolution from classical gradient descent to the Adam family and modern adaptive variants.

ScratchOptim implements each optimizer twice: first with PyTorch tensors, then again on top of the custom autograd engine from ScratchGrad, making the stack fully self-contained.


🎯 Mission

To deeply understand optimization in machine learning by rebuilding every optimizer from the ground up, mathematically and programmatically.

We focus on clarity, correctness, and reproducibility — not shortcuts.


🧠 Philosophy

1. Math First, Code Second

Understand the update rule before writing a single line of code.

2. Historical Evolution

Recreate optimizers in the order they were invented, to show how each one addressed the limitations of its predecessors.

3. Two-Stage Implementation

  • Stage 1: Implement using PyTorch Tensors for stability & testing.
  • Stage 2: Rewrite on top of ScratchGrad once it is mature enough to support optimizers.

4. Minimal, Transparent Code

Readable, fully documented implementations over clever shortcuts.
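As an example of the "math first" step, here are the update rules for a few optimizers from the timeline, written down before any code. The notation is illustrative rather than taken from the repository: θ_t are the parameters, g_t = ∇L(θ_{t-1}) is the gradient, and η is the learning rate. The momentum row uses the buffer convention of PyTorch's torch.optim.SGD (no dampening, no Nesterov), and RMSProp is shown in one common variant with ε outside the square root.

```latex
\begin{aligned}
&\text{Gradient descent:} && \theta_t = \theta_{t-1} - \eta\, g_t \\
&\text{SGD + momentum:}   && b_t = \mu\, b_{t-1} + g_t, \quad b_0 = 0, \quad \theta_t = \theta_{t-1} - \eta\, b_t \\
&\text{RMSProp:}          && s_t = \rho\, s_{t-1} + (1-\rho)\, g_t^2, \quad \theta_t = \theta_{t-1} - \frac{\eta\, g_t}{\sqrt{s_t} + \epsilon} \\
&\text{Adam:}             && m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \quad v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2 \\
&                         && \hat m_t = \frac{m_t}{1-\beta_1^t}, \quad \hat v_t = \frac{v_t}{1-\beta_2^t}, \quad \theta_t = \theta_{t-1} - \frac{\eta\, \hat m_t}{\sqrt{\hat v_t} + \epsilon}
\end{aligned}
```

Each row builds on an earlier idea: momentum accumulates past gradients, RMSProp rescales each coordinate by a running average of squared gradients, and Adam combines both and corrects the bias of the zero-initialised averages.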


🗂️ What’s Inside

ScratchOptim includes:

  • From-scratch implementations of foundational optimizers (GD, SGD, Momentum)
  • Adaptive optimizers (AdaGrad, RMSProp, Adam family)
  • Modern improvements (RAdam, AdaBelief)
  • Clean, unified optimizer API
  • Visualization scripts (optimizer paths, convergence behaviors)
  • Unit tests comparing outputs to PyTorch equivalents
  • Full research-backed timeline (TIMELINE.md)
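A unified optimizer API and the comparison tests could take roughly the following shape. This is a minimal sketch under stated assumptions, not ScratchOptim's actual code: the names Param, Optimizer, update and fit are hypothetical, and parameters are plain Python floats standing in for tensors. SGD momentum follows the PyTorch buffer convention (b ← μb + g, no dampening) and Adam uses the standard bias-corrected update, so the results are intended to agree with torch.optim.SGD and torch.optim.Adam at default settings without weight decay; that agreement is not verified here.

```python
import math
from dataclasses import dataclass


@dataclass
class Param:
    """Hypothetical scalar parameter: a stand-in for a tensor and its .grad."""
    value: float
    grad: float = 0.0


class Optimizer:
    """Hypothetical base class: owns the parameters and per-parameter state."""

    def __init__(self, params, lr):
        self.params = list(params)
        self.lr = lr
        self.state = [{} for _ in self.params]  # one state dict per parameter

    def zero_grad(self):
        for p in self.params:
            p.grad = 0.0

    def step(self):
        # Subclasses only define the per-parameter update rule.
        for p, s in zip(self.params, self.state):
            self.update(p, s)

    def update(self, p, s):
        raise NotImplementedError


class SGD(Optimizer):
    """SGD with optional momentum, PyTorch convention: b <- mu * b + g."""

    def __init__(self, params, lr=0.01, momentum=0.0):
        super().__init__(params, lr)
        self.momentum = momentum

    def update(self, p, s):
        g = p.grad
        if self.momentum:
            # The first step initialises the buffer to the raw gradient.
            b = g if "b" not in s else self.momentum * s["b"] + g
            s["b"] = b
            g = b
        p.value -= self.lr * g


class Adam(Optimizer):
    """Adam with bias-corrected first and second moment estimates."""

    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
        super().__init__(params, lr)
        self.beta1, self.beta2 = betas
        self.eps = eps

    def update(self, p, s):
        t = s.get("t", 0) + 1
        m = self.beta1 * s.get("m", 0.0) + (1 - self.beta1) * p.grad
        v = self.beta2 * s.get("v", 0.0) + (1 - self.beta2) * p.grad ** 2
        s.update(t=t, m=m, v=v)
        m_hat = m / (1 - self.beta1 ** t)  # bias correction of the first moment
        v_hat = v / (1 - self.beta2 ** t)  # bias correction of the second moment
        p.value -= self.lr * m_hat / (math.sqrt(v_hat) + self.eps)


def fit(opt_cls, steps=500, **kwargs):
    """Minimise f(x) = (x - 3)^2 starting from x = 0 and return the final x."""
    x = Param(0.0)
    opt = opt_cls([x], **kwargs)
    for _ in range(steps):
        opt.zero_grad()
        x.grad = 2.0 * (x.value - 3.0)  # analytic gradient of (x - 3)^2
        opt.step()
    return x.value


print(fit(SGD, lr=0.1), fit(SGD, lr=0.1, momentum=0.9), fit(Adam, lr=0.1))
```

A single Adam step from x = 0 moves x by almost exactly lr, because the bias-corrected ratio m̂/√v̂ equals g/|g| on the first step; small properties like this make cheap unit-test targets alongside direct comparisons with PyTorch.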

Learning-rate schedulers, gradient clipping, warmup, EMA, and other training utilities are out of scope; those live in ScratchTrain.
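To make that boundary concrete, here is a minimal sketch in which a training loop owns the warmup schedule and gradient clipping while the optimizer only applies its update rule. The names SGD, warmup_lr, clip_grad_norm and train are hypothetical and are not the actual ScratchOptim or ScratchTrain API; parameters are plain Python [value, grad] lists standing in for tensors.

```python
import math


class SGD:
    """Hypothetical minimal optimizer: it applies the update rule and nothing else."""

    def __init__(self, params, lr):
        self.params = params  # list of [value, grad] pairs (stand-ins for tensors)
        self.lr = lr

    def step(self):
        for p in self.params:
            p[0] -= self.lr * p[1]


def warmup_lr(step, base_lr, warmup_steps):
    """Linear warmup from base_lr / warmup_steps up to base_lr, then constant."""
    return base_lr * min(1.0, (step + 1) / warmup_steps)


def clip_grad_norm(params, max_norm):
    """Rescale all gradients in place so their global L2 norm is at most max_norm."""
    norm = math.sqrt(sum(p[1] ** 2 for p in params))
    if norm > max_norm:
        for p in params:
            p[1] *= max_norm / norm
    return norm


def train(steps=200, base_lr=0.1, warmup_steps=10, max_norm=1.0):
    """Minimise (a - 1)^2 + (b + 2)^2; the loop, not the optimizer, owns lr and clipping."""
    a, b = [0.0, 0.0], [0.0, 0.0]
    opt = SGD([a, b], lr=base_lr)
    for step in range(steps):
        a[1] = 2.0 * (a[0] - 1.0)  # d/da
        b[1] = 2.0 * (b[0] + 2.0)  # d/db
        clip_grad_norm(opt.params, max_norm)             # training-loop concern
        opt.lr = warmup_lr(step, base_lr, warmup_steps)  # training-loop concern
        opt.step()                                       # optimizer concern
    return a[0], b[0]


print(train())
```

Because the optimizer simply reads `lr` on every step, any schedule can be layered on from outside without touching the update rule. This is one possible way to keep the two projects decoupled, not a description of how ScratchTrain actually does it.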


📚 Learning Timeline

See the full roadmap: 👉 TIMELINE.md (included in repo)

This document explains:

  • Key innovations
  • Learning focus
  • Required math
  • Which optimizers are fully implemented, partially implemented, or covered by reading only

🔄 Relation to Other ScratchMind Projects

ScratchOptim is closely tied to:

  • ScratchGrad → provides autograd needed for final rewrites
  • ScratchTrain → uses optimizers for training loop experiments
  • ScratchVision, ScratchSeq, ScratchGen → earliest testbeds for experimenting with optimizers

Eventually, all ScratchMind models will train exclusively using ScratchOptim + ScratchGrad.


🚀 Goals

By the end of ScratchOptim, you will:

  ✔ Implement every major optimizer from scratch
  ✔ Understand their math, intuition, and dynamics
  ✔ Visualize how optimizers behave on real loss landscapes
  ✔ Build a consistent optimizer API
  ✔ Integrate with ScratchGrad for a self-sustaining deep learning stack
  ✔ Be able to modify or design your own optimization algorithms


🤝 Contributing

ScratchOptim is open to contributions:

  • Fixes and improvements
  • Additional visualizations
  • Mathematical notes
  • Comparisons between variants
  • Optimizer research replicas

Submit an issue or pull request in this repository.


Acknowledgements

Inspired by open-source ML communities, classical optimization literature, and the philosophy of learning by rebuilding.

