An interactive playground to learn reinforcement learning — from first principles to policy gradients.
wadekarg.github.io/RL-interactive-lab
Watch RL algorithms learn in real time. Adjust hyperparameters, step through episodes, and build intuition for how agents explore, exploit, and improve. No backend. No accounts. Just open the link above and start learning.
| Lab | Description |
|---|---|
| 🎰 Bandit | The exploration vs exploitation dilemma. Choose between slot machines with unknown payoffs. |
| 🐘 GridWorld | Help Boru the elephant navigate to water while avoiding lions and cliffs. Design your own worlds. |
| CartPole | The classic RL benchmark. Push a cart left or right to keep a pole balanced for 500 steps. |
| 🚀 Rocket Landing | Land a rocket softly under gravity with 3 thrusters. 6D continuous state space — the bridge to deep RL. |
> [!TIP]
> New to RL? The app includes a complete 10-chapter interactive course built right in. Learn the theory, then see it in action in the labs. Start from Chapter 1 — no prerequisites needed.
| # | Chapter | Key Concepts | Lab |
|---|---|---|---|
| 1 | What is Reinforcement Learning? | Agent-environment loop, reward signal, RL vs supervised learning | — |
| 2 | States and Actions | State/action spaces, discrete vs continuous, trajectories | — |
| 3 | Rewards and Returns | Reward hypothesis, cumulative return G_t, discount factor γ | — |
| 4 | Policies | π(a|s), deterministic vs stochastic, optimal policy π* | — |
| 5 | Markov Decision Processes | MDP framework, transition dynamics, Markov property | — |
| 6 | Value Functions | V(s), Q(s,a), value estimation, action-value intuition | — |
| 7 | Bellman Equations | Expectation equations, optimality equations, recursive structure | — |
| 8 | Exploration vs Exploitation | ε-greedy, UCB, Thompson Sampling, regret | 🎰 Bandit |
| 9 | Temporal Difference Learning | TD(0), Q-Learning, SARSA, bootstrapping | 🐘 GridWorld |
| 10 | Policy Gradients | REINFORCE, baselines, softmax policy | 🚀 CartPole & Rocket |
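Chapter 3's cumulative return G_t = r_{t+1} + γ·r_{t+2} + γ²·r_{t+3} + … can be computed with a short helper. This is a sketch for building intuition, not part of the app's code:

```typescript
// Discounted return G_t for a sequence of rewards, computed backwards so
// each step folds in the already-discounted future return.
// (Illustrative helper; the app's internals may differ.)
function discountedReturn(rewards: number[], gamma: number): number {
  let g = 0;
  for (let i = rewards.length - 1; i >= 0; i--) {
    g = rewards[i] + gamma * g;
  }
  return g;
}

// Example: rewards [1, 1, 1] with γ = 0.5 gives 1 + 0.5·1 + 0.25·1 = 1.75
console.log(discountedReturn([1, 1, 1], 0.5)); // → 1.75
```

A smaller γ makes the agent value near-term rewards more heavily, which you can observe directly in the labs by moving the γ slider.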
Every chapter includes interactive widgets, KaTeX-rendered equations, and direct links to the hands-on labs.
> [!NOTE]
> Everything runs client-side. No backend, no accounts, no data collection. Open the link and start learning.
- 🎮 Real-time simulation — Play, pause, step-by-step, adjustable speed
- 🎛️ Hyperparameter tuning — Adjust α, γ, ε, bins, learning rate and see instant effects
- 🔍 Step-by-step breakdowns — See Q-value updates, TD errors, policy gradients as they happen
- 📊 Episode tracking — Duration and reward charts with success markers
- 🎨 3 themes — Dark (default), Light, and Warm — persisted in localStorage
- 📐 KaTeX equations — Bellman equations, policy gradient theorem, and more rendered beautifully
- 📱 Responsive — Works on desktop and tablet
> [!IMPORTANT]
> All 13 RL algorithms are implemented from scratch in TypeScript — no external RL libraries. Every environment runs entirely in the browser.
Each environment follows the same architecture:

```
Environment.reset() → initial state
        ↓
Agent.act(state) → action
        ↓
Environment.step(state, action) → { nextState, reward, done }
        ↓
Agent.learn(state, action, reward, nextState, done)
        ↓
repeat until done → new episode
```
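The loop above can be sketched as TypeScript interfaces. The names here are hypothetical; the repo's actual types may differ:

```typescript
// Hypothetical shapes for the agent-environment loop (names assumed).
interface StepResult<S> { nextState: S; reward: number; done: boolean; }

interface Environment<S, A> {
  reset(): S;
  step(state: S, action: A): StepResult<S>;
}

interface Agent<S, A> {
  act(state: S): A;
  learn(state: S, action: A, reward: number, nextState: S, done: boolean): void;
}

// One episode: act → step → learn, repeated until the environment signals done.
function runEpisode<S, A>(env: Environment<S, A>, agent: Agent<S, A>): number {
  let state = env.reset();
  let totalReward = 0;
  let done = false;
  while (!done) {
    const action = agent.act(state);
    const { nextState, reward, done: d } = env.step(state, action);
    agent.learn(state, action, reward, nextState, d);
    totalReward += reward;
    state = nextState;
    done = d;
  }
  return totalReward;
}
```

Keeping environments as pure functions of `(state, action)` is what lets the same simulators run headless in tests and visually in the browser.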
The UI records every step as a `SimulationStep` and renders:
- Live visualization — canvas-based environment rendering
- Step breakdown — what the algorithm computed and why
- Episode charts — duration and reward trends over time
- Algorithm explainer — educational context for the selected algorithm
| Algorithm | Update / Selection Rule |
|---|---|
| ε-Greedy | Random arm with probability ε, best arm otherwise |
| UCB | Q(a) + c√(ln t / N(a)) |
| Thompson Sampling | Sample from Beta(α, β) posteriors |
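The UCB rule in the table selects the arm maximizing Q(a) + c·√(ln t / N(a)). A minimal sketch of that selection step, with our own variable names rather than the app's:

```typescript
// UCB action selection. Q = estimated arm values, N = pull counts,
// t = total pulls so far, c = exploration constant.
// (A sketch of the rule, not the app's actual implementation.)
function ucbSelect(Q: number[], N: number[], t: number, c: number): number {
  let best = 0;
  let bestScore = -Infinity;
  for (let a = 0; a < Q.length; a++) {
    // An unpulled arm gets an infinite bonus, so every arm is tried once first.
    const score =
      N[a] === 0 ? Infinity : Q[a] + c * Math.sqrt(Math.log(t) / N[a]);
    if (score > bestScore) {
      bestScore = score;
      best = a;
    }
  }
  return best;
}
```

The √(ln t / N(a)) bonus shrinks as an arm is pulled more, so uncertainty (not just estimated value) drives which arm gets chosen.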
| Algorithm | Approach |
|---|---|
| Random Baseline | Uniform random actions (baseline to beat) |
| Discretized Q-Learning | Bin continuous state → Q-table |
| REINFORCE | Linear softmax policy, Monte Carlo updates |
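Once the continuous state is binned into a table, Discretized Q-Learning applies the standard tabular update. This sketch uses our own names and a plain 2D array for the table:

```typescript
// Tabular Q-learning update: move Q(s,a) toward the TD target
// r + γ·max_a' Q(s',a'). Sketch only; the app's internals may differ.
function qLearningUpdate(
  Q: number[][], // Q[state][action]
  s: number, a: number, r: number, sNext: number, done: boolean,
  alpha: number, // learning rate α
  gamma: number  // discount factor γ
): void {
  // Terminal states bootstrap from 0 — there is no future return to estimate.
  const maxNext = done ? 0 : Math.max(...Q[sNext]);
  const tdError = r + gamma * maxNext - Q[s][a];
  Q[s][a] += alpha * tdError;
}
```

The `tdError` term is exactly what the app surfaces in its step-by-step breakdowns: a large error means the agent's estimate was far from the bootstrapped target.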
```
src/
├── algorithms/        # 13 RL algorithm implementations
│   ├── bandit/            # ε-Greedy, UCB, Thompson Sampling
│   ├── gridworld/         # Value Iteration, Policy Iteration, Q-Learning, SARSA
│   ├── classicCartpole/   # Random, Discretized Q-Learning, REINFORCE
│   └── rocketLanding/     # Random, Discretized Q-Learning, REINFORCE
├── environments/      # 4 environment simulators (pure TypeScript, no UI)
├── components/        # React components per environment + shared UI
├── pages/             # Route pages + 10-chapter Learn course
│   └── learn/             # Interactive educational content
├── content/           # Algorithm explanations & environment stories
├── hooks/             # useSimulation, useAnimationFrame, useThemeColors
├── store/             # Zustand stores (simulation state, theme)
└── utils/             # Math helpers, step breakdowns, color utilities
```
The project deploys automatically to GitHub Pages via GitHub Actions.
```bash
# Build for production
npm run build

# Preview the production build locally
npm run preview
```

The Vite config handles the base path automatically:

- Dev: `/` (localhost)
- Production: `/RL-interactive-lab/` (GitHub Pages)
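One common way to express that base-path switch in `vite.config.ts` is a conditional config. This is a sketch of the described behavior, not necessarily the repo's exact config file:

```typescript
// vite.config.ts — serve from "/" in dev, "/RL-interactive-lab/" in the
// production build for GitHub Pages. (Assumed config; repo's may differ.)
import { defineConfig } from 'vite';

export default defineConfig(({ command }) => ({
  base: command === 'build' ? '/RL-interactive-lab/' : '/',
}));
```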
This project was built for learning and draws inspiration from:
- 📖 Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto — the foundational textbook that shaped the curriculum
- 🎓 Coursework at UT Arlington — lectures and notes from Dr. Manfred Huber's reinforcement learning course
- 🎥 The RL community on YouTube — countless explanations, visualizations, and intuition-building videos
- 🏋️ OpenAI Gym — CartPole-v1 physics equations used as reference for the CartPole and Rocket environments
> [!CAUTION]
> Found an error in the educational content or algorithms? Please open an issue — accuracy matters for a learning tool.
The easiest way to help is to open an issue — no code required.
If you'd like to contribute code:
```bash
git clone https://github.com/wadekarg/RL-interactive-lab.git
cd RL-interactive-lab
npm install
npm run dev   # → http://localhost:5173
```

Then create a feature branch and submit a pull request.
MIT License. See LICENSE for details.
Vibe coded by Gajanan Wadekar
Built for education. Runs entirely in your browser.