An interactive playground to learn reinforcement learning — from first principles to policy gradients.
wadekarg.github.io/RL-interactive-lab
Watch RL algorithms learn in real time. Adjust hyperparameters, step through episodes, and build intuition for how agents explore, exploit, and improve. No backend. No accounts. Just open the link above and start learning.
| Lab | Description |
|---|---|
| 🎰 Bandit | The exploration vs exploitation dilemma. Choose between slot machines with unknown payoffs. |
| 🐘 GridWorld | Help Boru the elephant navigate to water while avoiding lions and cliffs. Design your own worlds. |
| CartPole | The classic RL benchmark. Push a cart left or right to keep a pole balanced for 500 steps. |
| 🚀 Rocket Landing | Land a rocket softly under gravity with 3 thrusters. 6D continuous state space — the bridge to deep RL. |
> [!TIP]
> New to RL? The app includes a complete 10-chapter interactive course built right in. Learn the theory, then see it in action in the labs. Start from Chapter 1 — no prerequisites needed.
| # | Chapter | Key Concepts | Lab |
|---|---|---|---|
| 1 | What is Reinforcement Learning? | Agent-environment loop, reward signal, RL vs supervised learning | — |
| 2 | States and Actions | State/action spaces, discrete vs continuous, trajectories | — |
| 3 | Rewards and Returns | Reward hypothesis, cumulative return G_t, discount factor γ | — |
| 4 | Policies | π(a|s), deterministic vs stochastic, optimal policy π* | — |
| 5 | Markov Decision Processes | MDP framework, transition dynamics, Markov property | — |
| 6 | Value Functions | V(s), Q(s,a), value estimation, action-value intuition | — |
| 7 | Bellman Equations | Expectation equations, optimality equations, recursive structure | — |
| 8 | Exploration vs Exploitation | ε-greedy, UCB, Thompson Sampling, regret | 🎰 Bandit |
| 9 | Temporal Difference Learning | TD(0), Q-Learning, SARSA, bootstrapping | 🐘 GridWorld |
| 10 | Policy Gradients | REINFORCE, baselines, softmax policy | 🚀 CartPole & Rocket |
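Chapter 3's cumulative return G_t = r_{t+1} + γ·r_{t+2} + γ²·r_{t+3} + … can be computed with a short helper. This is a sketch for building intuition, not part of the app's code:

```typescript
// Discounted return G_t for a sequence of rewards, computed backwards so
// each step folds in the already-discounted future return.
// (Illustrative helper; the app's internals may differ.)
function discountedReturn(rewards: number[], gamma: number): number {
  let g = 0;
  for (let i = rewards.length - 1; i >= 0; i--) {
    g = rewards[i] + gamma * g;
  }
  return g;
}

// Example: rewards [1, 1, 1] with γ = 0.5 gives 1 + 0.5·1 + 0.25·1 = 1.75
console.log(discountedReturn([1, 1, 1], 0.5)); // → 1.75
```

A smaller γ makes the agent value near-term rewards more heavily, which you can observe directly in the labs by moving the γ slider.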
Every chapter includes interactive widgets, KaTeX-rendered equations, and direct links to the hands-on labs.
> [!NOTE]
> Everything runs client-side. No backend, no accounts, no data collection. Open the link and start learning.
- 🎮 Real-time simulation — Play, pause, step-by-step, adjustable speed
- 🎛️ Hyperparameter tuning — Adjust α, γ, ε, bins, learning rate and see instant effects
- 🔍 Step-by-step breakdowns — See Q-value updates, TD errors, policy gradients as they happen
- 📊 Episode tracking — Duration and reward charts with success markers
- 🎨 3 themes — Dark (default), Light, and Warm — persisted in localStorage
- 📐 KaTeX equations — Bellman equations, policy gradient theorem, and more rendered beautifully
- 📱 Responsive — Works on desktop and tablet
> [!IMPORTANT]
> All 13 RL algorithms are implemented from scratch in TypeScript — no external RL libraries. Every environment runs entirely in the browser.
Each environment follows the same architecture:

```
Environment.reset() → initial state
        ↓
Agent.act(state) → action
        ↓
Environment.step(state, action) → { nextState, reward, done }
        ↓
Agent.learn(state, action, reward, nextState, done)
        ↓
repeat until done → new episode
```
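The loop above can be sketched as TypeScript interfaces. The names here are hypothetical; the repo's actual types may differ:

```typescript
// Hypothetical shapes for the agent-environment loop (names assumed).
interface StepResult<S> { nextState: S; reward: number; done: boolean; }

interface Environment<S, A> {
  reset(): S;
  step(state: S, action: A): StepResult<S>;
}

interface Agent<S, A> {
  act(state: S): A;
  learn(state: S, action: A, reward: number, nextState: S, done: boolean): void;
}

// One episode: act → step → learn, repeated until the environment signals done.
function runEpisode<S, A>(env: Environment<S, A>, agent: Agent<S, A>): number {
  let state = env.reset();
  let totalReward = 0;
  let done = false;
  while (!done) {
    const action = agent.act(state);
    const { nextState, reward, done: d } = env.step(state, action);
    agent.learn(state, action, reward, nextState, d);
    totalReward += reward;
    state = nextState;
    done = d;
  }
  return totalReward;
}
```

Keeping environments as pure functions of `(state, action)` is what lets the same simulators run headless in tests and visually in the browser.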
The UI records every step as a `SimulationStep` and renders:
- Live visualization — canvas-based environment rendering
- Step breakdown — what the algorithm computed and why
- Episode charts — duration and reward trends over time
- Algorithm explainer — educational context for the selected algorithm
| Algorithm | Update / Selection Rule |
|---|---|
| ε-Greedy | Random arm with probability ε, best arm otherwise |
| UCB | Q(a) + c√(ln t / N(a)) |
| Thompson Sampling | Sample from Beta(α, β) posteriors |
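The UCB rule in the table selects the arm maximizing Q(a) + c·√(ln t / N(a)). A minimal sketch of that selection step, with our own variable names rather than the app's:

```typescript
// UCB action selection. Q = estimated arm values, N = pull counts,
// t = total pulls so far, c = exploration constant.
// (A sketch of the rule, not the app's actual implementation.)
function ucbSelect(Q: number[], N: number[], t: number, c: number): number {
  let best = 0;
  let bestScore = -Infinity;
  for (let a = 0; a < Q.length; a++) {
    // An unpulled arm gets an infinite bonus, so every arm is tried once first.
    const score =
      N[a] === 0 ? Infinity : Q[a] + c * Math.sqrt(Math.log(t) / N[a]);
    if (score > bestScore) {
      bestScore = score;
      best = a;
    }
  }
  return best;
}
```

The √(ln t / N(a)) bonus shrinks as an arm is pulled more, so uncertainty (not just estimated value) drives which arm gets chosen.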
| Algorithm | Approach |
|---|---|
| Random Baseline | Uniform random actions (baseline to beat) |
| Discretized Q-Learning | Bin continuous state → Q-table |
| REINFORCE | Linear softmax policy, Monte Carlo updates |
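Once the continuous state is binned into a table, Discretized Q-Learning applies the standard tabular update. This sketch uses our own names and a plain 2D array for the table:

```typescript
// Tabular Q-learning update: move Q(s,a) toward the TD target
// r + γ·max_a' Q(s',a'). Sketch only; the app's internals may differ.
function qLearningUpdate(
  Q: number[][], // Q[state][action]
  s: number, a: number, r: number, sNext: number, done: boolean,
  alpha: number, // learning rate α
  gamma: number  // discount factor γ
): void {
  // Terminal states bootstrap from 0 — there is no future return to estimate.
  const maxNext = done ? 0 : Math.max(...Q[sNext]);
  const tdError = r + gamma * maxNext - Q[s][a];
  Q[s][a] += alpha * tdError;
}
```

The `tdError` term is exactly what the app surfaces in its step-by-step breakdowns: a large error means the agent's estimate was far from the bootstrapped target.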
```
src/
├── algorithms/        # 13 RL algorithm implementations
│   ├── bandit/            # ε-Greedy, UCB, Thompson Sampling
│   ├── gridworld/         # Value Iteration, Policy Iteration, Q-Learning, SARSA
│   ├── classicCartpole/   # Random, Discretized Q-Learning, REINFORCE
│   └── rocketLanding/     # Random, Discretized Q-Learning, REINFORCE
├── environments/      # 4 environment simulators (pure TypeScript, no UI)
├── components/        # React components per environment + shared UI
├── pages/             # Route pages + 10-chapter Learn course
│   └── learn/             # Interactive educational content
├── content/           # Algorithm explanations & environment stories
├── hooks/             # useSimulation, useAnimationFrame, useThemeColors
├── store/             # Zustand stores (simulation state, theme)
└── utils/             # Math helpers, step breakdowns, color utilities
```
The project deploys automatically to GitHub Pages via GitHub Actions.
```bash
# Build for production
npm run build

# Preview the production build locally
npm run preview
```

The Vite config handles the base path automatically:

- Dev: `/` (localhost)
- Production: `/RL-interactive-lab/` (GitHub Pages)
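One common way to express that base-path switch in `vite.config.ts` is a conditional config. This is a sketch of the described behavior, not necessarily the repo's exact config file:

```typescript
// vite.config.ts — serve from "/" in dev, "/RL-interactive-lab/" in the
// production build for GitHub Pages. (Assumed config; repo's may differ.)
import { defineConfig } from 'vite';

export default defineConfig(({ command }) => ({
  base: command === 'build' ? '/RL-interactive-lab/' : '/',
}));
```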
This project was built for learning and draws inspiration from:
- 📖 Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto — the foundational textbook that shaped the curriculum
- 🎓 Coursework at UT Arlington — lectures and notes from Dr. Manfred Huber's reinforcement learning course
- 🎥 The RL community on YouTube — countless explanations, visualizations, and intuition-building videos
- 🏋️ OpenAI Gym — CartPole-v1 physics equations used as reference for the CartPole and Rocket environments
> [!CAUTION]
> Found an error in the educational content or algorithms? Please open an issue — accuracy matters for a learning tool.
The easiest way to help is to open an issue — no code required.
If you'd like to contribute code:
```bash
git clone https://github.com/wadekarg/RL-interactive-lab.git
cd RL-interactive-lab
npm install
npm run dev   # → http://localhost:5173
```

Then create a feature branch and submit a pull request.
MIT License. See LICENSE for details.
Vibe coded by Gajanan Wadekar
Built for education. Runs entirely in your browser.