
ahirtonlopes/Mastering-Reinforcement-Learning


Mastering Reinforcement Learning

Hands-on notebooks covering classical and modern Reinforcement Learning algorithms, from the foundations of decision-making to Deep RL with neural networks.



What you'll learn

  • Model sequential decision problems as Markov Decision Processes (MDPs)
  • Understand and apply the Bellman Equation for value estimation
  • Implement Q-Learning from scratch in discrete environments
  • Balance exploration and exploitation when choosing actions
  • Scale to complex environments using Deep Q-Networks (DQN)
  • Interpret learning curves and agent performance
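
For reference, these are the standard textbook forms of the two equations the objectives above revolve around. The first is the Bellman optimality equation for the action-value function. The second is the tabular Q-Learning update, which samples that equation one transition at a time. The symbols P (transition probability), R (reward), α (`alpha`, learning rate) and γ (`gamma`, discount factor) follow common convention and may not match the notebooks' own notation exactly.

```latex
Q^*(s,a) = \sum_{s'} P(s' \mid s,a)\,\Bigl[R(s,a,s') + \gamma \max_{a'} Q^*(s',a')\Bigr]

Q(s,a) \leftarrow Q(s,a) + \alpha\,\Bigl[r + \gamma \max_{a'} Q(s',a') - Q(s,a)\Bigr]
```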

Prerequisites

| Topic | Level |
|---|---|
| Python | Intermediate |
| NumPy & Matplotlib | Basic |
| Neural Networks | Basic (for Lesson 3) |

Contents

Lesson 1 — Q-Learning

Aula1_Q_Learning_Taxi.ipynb

Introduction to Q-Learning in a discrete environment. The agent learns to navigate the Taxi grid by maximizing cumulative reward.

Open in Colab
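
This is a minimal, dependency-free sketch of the tabular Q-Learning loop with ε-greedy exploration. To keep it runnable without Gymnasium, it swaps Taxi for a hypothetical six-cell corridor (`step`, `N_STATES` and the reward values are illustrative assumptions). The notebook's own environment, hyperparameters and code may differ.

```python
import random

# Hypothetical toy environment (NOT Gymnasium's Taxi): a 1-D corridor.
# The agent starts in cell 0; action 0 = left, action 1 = right.
# Entering the last cell ends the episode with reward +1; every other step costs -0.01.
N_STATES, N_ACTIONS = 6, 2


def step(state, action):
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else -0.01
    return next_state, reward, done


def epsilon_greedy(q_row, epsilon, rng):
    # Explore with probability epsilon; otherwise pick a greedy action (random tie-break).
    if rng.random() < epsilon:
        return rng.randrange(N_ACTIONS)
    best = max(q_row)
    return rng.choice([a for a, q in enumerate(q_row) if q == best])


def q_learning(episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]  # the Q-table
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = epsilon_greedy(q[state], epsilon, rng)
            next_state, reward, done = step(state, action)
            # TD target bootstraps from the best next action (off-policy);
            # terminal transitions have no future return.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = q_learning()
# Greedy action in every non-terminal cell (1 = move right, toward the goal).
greedy_policy = [max(range(N_ACTIONS), key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(greedy_policy)
```

With Gymnasium's Taxi, the same loop would take states from `env.reset()` and `env.step()` instead of the toy `step` function; the update line itself does not change.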


Lesson 2 — Bellman Equation & FrozenLake

Aula2_Bellman.ipynb · Aula2_Grid_movements_RL.ipynb · Aula2_Q_Learning_FrozenLakev1.ipynb

From the Bellman recurrence to a full Q-Learning implementation in the stochastic FrozenLake environment.

Open in Colab
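
The Bellman recurrence translates directly into an algorithm. The sketch below runs value iteration, repeatedly applying the backup V(s) = max_a Σ_s' p(s'|s,a) [r + γ V(s')] until the values stop changing. It uses a hypothetical slippery corridor, chosen only to show how stochastic transitions enter the backup: the intended move succeeds with probability 0.8, otherwise the agent slips the opposite way. This is an assumption, not FrozenLake's dynamics; in slippery FrozenLake the agent may instead slide in a perpendicular direction.

```python
# Hypothetical slippery corridor (illustration only, NOT FrozenLake's dynamics):
# states 0..4, state 4 is a terminal goal worth +1 on entry. Action 0 = left, 1 = right.
# The intended move happens with probability 0.8; otherwise the agent slips the other way.
N_STATES, GOAL, SLIP, GAMMA = 5, 4, 0.2, 0.9


def transitions(state, action):
    """Return (probability, next_state, reward) triples for one state-action pair."""
    intended = 1 if action == 1 else -1
    outcomes = []
    for prob, direction in ((1 - SLIP, intended), (SLIP, -intended)):
        next_state = min(N_STATES - 1, max(0, state + direction))
        outcomes.append((prob, next_state, 1.0 if next_state == GOAL else 0.0))
    return outcomes


def q_value(values, state, action):
    # Expected one-step return: sum over s' of p(s'|s,a) * [r + gamma * V(s')]
    return sum(p * (r + GAMMA * values[s2]) for p, s2, r in transitions(state, action))


def value_iteration(tol=1e-8):
    values = [0.0] * N_STATES  # the terminal state keeps value 0
    while True:
        delta = 0.0
        for s in range(N_STATES):
            if s == GOAL:
                continue
            best = max(q_value(values, s, a) for a in (0, 1))  # Bellman optimality backup
            delta = max(delta, abs(best - values[s]))
            values[s] = best  # in-place (Gauss-Seidel style) update
        if delta < tol:
            return values


values = value_iteration()
policy = [max((0, 1), key=lambda a: q_value(values, s, a)) for s in range(GOAL)]
print(policy)
```

Updating `values` in place lets later states in the same sweep use fresh estimates; a synchronous version that writes into a copy converges to the same fixed point.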


Lesson 3 — Deep Reinforcement Learning (DQN)

Aula3_DQN_Breakout.ipynb · Aula3_DQN_LunarLander.ipynb

Moving from Q-tables to neural network function approximation. Trains DQN agents on Breakout and LunarLander.

Open in Colab
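
Two ingredients usually distinguish DQN from tabular Q-Learning: an experience replay buffer and a separate target network used to compute the TD target. This framework-free sketch shows only those two pieces. `ReplayBuffer`, `dqn_targets` and the `target_q` callable are hypothetical names, and the notebooks' TensorFlow implementation may organize them differently.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity FIFO store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity, seed=0):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are dropped first
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks the correlation between consecutive steps.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def dqn_targets(batch, target_q, gamma=0.99):
    """TD targets y = r + gamma * max_a' Q_target(s', a'), or y = r if the episode ended.

    `target_q` stands in for the frozen target network: any callable mapping a state
    to a list of action values.
    """
    return [r if done else r + gamma * max(target_q(s2)) for _, _, r, s2, done in batch]


# Toy usage with integer "states" and a made-up target network.
buffer = ReplayBuffer(capacity=3)
for t in range(5):
    buffer.push(t, 0, 1.0, t + 1, t == 4)
batch = buffer.sample(2)
print(len(buffer), dqn_targets(batch, lambda s: [0.5 * s, 1.0 * s], gamma=0.9))
```

In a full agent, the online network's prediction Q(s, a) for each stored action is regressed toward these targets (for example with a squared or Huber loss), and the target network's weights are copied from the online network at a fixed interval.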


Getting Started

Option 1 — Google Colab (recommended)

Click any "Open in Colab" badge above to run that notebook directly in your browser; no local setup is required.

Option 2 — Local

```bash
git clone https://github.com/ahirtonlopes/Mastering-Reinforcement-Learning.git
cd Mastering-Reinforcement-Learning
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
jupyter notebook
```

Dependencies

gymnasium
numpy
matplotlib
tensorflow  # Lesson 3

Study Tips

Run notebooks in order. After each lesson, try modifying the key hyperparameters:

  • alpha — learning rate
  • gamma — discount factor
  • epsilon — exploration rate

Observe how each change affects the reward curve and ask: is the agent actually learning, or just getting lucky?
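
Single-episode returns are noisy, so a smoothed curve and several random seeds make that question easier to answer. The helpers below are hypothetical: `moving_average` smooths a reward history, and `summarize_seeds` reports the mean and sample standard deviation of one final score per seed (it needs at least two seeds). Comparing those numbers with the same statistics for a random policy is one rough way to separate learning from luck.

```python
import statistics


def moving_average(values, window):
    """Trailing mean over the last `window` values (fewer at the start of the run)."""
    smoothed = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def summarize_seeds(final_scores):
    """Mean and sample standard deviation of one score per independent seed."""
    return statistics.mean(final_scores), statistics.stdev(final_scores)


# Example with made-up numbers, only to show the call pattern.
print(moving_average([0, 0, 1, 1], window=2))  # [0.0, 0.0, 0.5, 1.0]
print(summarize_seeds([1.0, 2.0, 3.0]))        # (2.0, 1.0)
```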


Author

Prof. Dr. Ahirton Lopes · LinkedIn · Google Scholar

Contributions are welcome — open an issue or submit a pull request.

License

MIT
