Hands-on notebooks covering classical and modern Reinforcement Learning algorithms, from the foundations of decision-making to Deep RL with neural networks.
- Model sequential decision problems as Markov Decision Processes (MDPs)
- Understand and apply the Bellman Equation for value estimation
- Implement Q-Learning from scratch in discrete environments
- Explore the exploration vs. exploitation trade-off
- Scale to complex environments using Deep Q-Networks (DQN)
- Interpret learning curves and agent performance
| Topic | Level |
|---|---|
| Python | Intermediate |
| NumPy & Matplotlib | Basic |
| Neural Networks | Basic (for Lesson 3) |
Aula1_Q_Learning_Taxi.ipynb
Introduction to Q-Learning in a discrete environment. The agent learns to navigate the Taxi grid by maximizing cumulative reward.
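The update rule at the heart of this lesson can be sketched without any dependencies. The example below is a minimal illustration, not the notebook's code: it swaps the Taxi grid for a hypothetical one-dimensional corridor so that it runs with the standard library alone. The function name `train_q_learning`, the reward values, and the default hyperparameters are all assumptions chosen for illustration.

```python
import random


def train_q_learning(n_states=6, episodes=500, alpha=0.1, gamma=0.9,
                     epsilon=0.1, seed=0):
    """Tabular Q-learning on a hypothetical 1-D corridor (a stand-in for Taxi).

    States are 0..n_states-1. The agent starts at state 0 and the episode ends
    when it reaches the rightmost state. Actions: 0 = left, 1 = right.
    Every step costs -1, and the step that reaches the goal pays +10 instead.
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]
    goal = n_states - 1

    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            # Epsilon-greedy action selection.
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                action = max(range(2), key=lambda a: q[state][a])

            # Deterministic corridor dynamics (illustrative assumption).
            next_state = max(0, state - 1) if action == 0 else state + 1
            done = next_state == goal
            reward = 10.0 if done else -1.0

            # Q-learning update: move Q(s, a) toward the bootstrapped target.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])

            state = next_state
            if done:
                break
    return q


q_table = train_q_learning()
# Greedy action for every non-terminal state (1 means "move right").
greedy_policy = [max(range(2), key=lambda a: row[a]) for row in q_table[:-1]]
print(greedy_policy)
```

The same loop structure carries over to a Gymnasium environment: the corridor dynamics are replaced by the environment's step call, and the Q-table grows to one row per discrete state.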
Aula2_Bellman.ipynb·Aula2_Grid_movements_RL.ipynb·Aula2_Q_Learning_FrozenLakev1.ipynb
From the Bellman recurrence to a full Q-Learning implementation in the stochastic FrozenLake environment.
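To make the Bellman recurrence concrete, here is a small value-iteration sketch on a hand-written stochastic MDP. It is an illustration under stated assumptions, not the notebook's implementation: the three-state MDP (`start`, `goal`, `hole`), its probabilities, and the `transitions` layout are hypothetical, loosely inspired by the slippery moves in FrozenLake.

```python
def value_iteration(transitions, gamma=0.9, tol=1e-8):
    """Apply the Bellman optimality backup until the values stop changing.

    transitions[state][action] is a list of (probability, next_state, reward)
    tuples. A state with no actions is treated as terminal (value 0).
    """
    values = {state: 0.0 for state in transitions}
    while True:
        delta = 0.0
        for state, actions in transitions.items():
            if not actions:  # terminal state: nothing to back up
                continue
            # V(s) = max_a sum_{s'} P(s' | s, a) * (r + gamma * V(s'))
            best = max(
                sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)
                for outcomes in actions.values()
            )
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta < tol:
            return values


# Hypothetical MDP: "cross" usually reaches the goal but may slip into the hole.
toy_mdp = {
    "start": {
        "cross": [(0.8, "goal", 1.0), (0.2, "hole", 0.0)],
        "wait": [(1.0, "start", 0.0)],
    },
    "goal": {},
    "hole": {},
}

state_values = value_iteration(toy_mdp)
print(state_values["start"])
```

Here crossing is worth 0.8 in expectation, while waiting only discounts that same value by `gamma`, so the backup settles on crossing.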
Aula3_DQN_Breakout.ipynb·Aula3_DQN_LunarLander.ipynb
Moving from Q-tables to neural network function approximation. Trains DQN agents on Breakout and LunarLander.
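Two ingredients of the standard DQN recipe can be sketched without a deep learning framework: an experience replay buffer and the bootstrapped training target. The code below is a dependency-free illustration and makes no claim about how the notebooks structure their agents. The names `ReplayBuffer` and `td_targets` are hypothetical, and the Q-values for the next states are passed in as plain lists rather than produced by a network.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of transitions; the oldest entries are dropped first."""

    def __init__(self, capacity, seed=None):
        self._storage = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self._storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling breaks the correlation between consecutive steps.
        return self._rng.sample(list(self._storage), batch_size)

    def __len__(self):
        return len(self._storage)


def td_targets(batch, next_q_values, gamma=0.99):
    """Compute r + gamma * max_a' Q(s', a') for each transition in the batch.

    next_q_values[i] holds the Q-value estimates for the i-th next state
    (in a full agent these would come from a neural network).
    Terminal transitions use the reward alone.
    """
    return [
        reward if done else reward + gamma * max(next_q)
        for (_, _, reward, _, done), next_q in zip(batch, next_q_values)
    ]


buffer = ReplayBuffer(capacity=3, seed=0)
for step in range(5):
    buffer.push(step, 0, 1.0, step + 1, False)

example_batch = [(0, 1, 1.0, 1, False), (1, 0, 2.0, 2, True)]
example_targets = td_targets(example_batch, [[0.5, 2.0], [9.0, 9.0]], gamma=0.5)
print(len(buffer), example_targets)
```

In a full agent, the network is trained to bring its prediction for each stored state-action pair closer to these targets, which takes the place of the in-place Q-table update.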
Option 1 — Google Colab (recommended)
Click any "Open in Colab" badge above to run directly in your browser, no setup required.
Option 2 — Local
git clone https://github.com/ahirtonlopes/Mastering-Reinforcement-Learning.git
cd Mastering-Reinforcement-Learning
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
jupyter notebook
Dependencies
gymnasium
numpy
matplotlib
tensorflow # Lesson 3
Run notebooks in order. After each lesson, try modifying the key hyperparameters:
- alpha: learning rate
- gamma: discount factor
- epsilon: exploration rate
Observe how each change affects the reward curve and ask: is the agent actually learning, or just getting lucky?
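One way to tell learning from luck is to smooth the per-episode rewards and compare early and late windows. The helper below is a minimal sketch. The name `moving_average` and the sample reward list are illustrative assumptions, not taken from the notebooks.

```python
def moving_average(values, window):
    """Return the mean of each consecutive `window`-sized slice of `values`."""
    if window <= 0:
        raise ValueError("window must be a positive integer")
    averages = []
    running_sum = 0.0
    for i, value in enumerate(values):
        running_sum += value
        if i >= window:
            running_sum -= values[i - window]  # drop the value leaving the window
        if i >= window - 1:
            averages.append(running_sum / window)
    return averages


# Hypothetical noisy episode rewards with an upward trend.
episode_rewards = [1, 2, 3, 4, 5]
smoothed = moving_average(episode_rewards, window=2)
print(smoothed)
```

A single high-reward episode barely moves the smoothed curve, whereas sustained improvement raises it steadily. Repeating a run with several random seeds gives a further check.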
Prof. Dr. Ahirton Lopes · LinkedIn · Google Scholar
Contributions are welcome — open an issue or submit a pull request.