- Dyna-Q
- DQN
- MCTS
- MuZero
Work in progress. Feedback welcome.
Supervised learning on a replay buffer with value targets generated by self-play and Monte-Carlo tree search?
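That one-sentence summary can be sketched end-to-end on a toy problem. Everything below is illustrative, not code from the notebooks: a random-rollout average stands in for the tree search, and a value table trained by SGD stands in for the network.

```python
import random

random.seed(0)  # reproducibility for this toy run

def rollout(state):
    # Toy "environment": even states are worth ~1.0, odd states ~0.0.
    return float(state % 2 == 0) + random.gauss(0.0, 0.1)

def search_value(state, num_rollouts=16):
    # Stand-in for Monte-Carlo tree search: just average random rollouts.
    # A real implementation would build and reuse a search tree.
    return sum(rollout(state) for _ in range(num_rollouts)) / num_rollouts

def self_play(num_positions=100):
    # Self-play fills the replay buffer with (state, value-target) pairs.
    return [(s, search_value(s))
            for s in (random.randrange(10) for _ in range(num_positions))]

buffer = self_play()

# Supervised learning on the buffer: stochastic gradient steps on the
# squared error between the stored target and the current value estimate.
values = [0.0] * 10
for _ in range(50):
    random.shuffle(buffer)
    for s, target in buffer:
        values[s] += 0.1 * (target - values[s])  # gradient step on (v - target)^2
```

After training, the value table recovers the toy environment's true values (≈1.0 for even states, ≈0.0 for odd ones) from the search-generated targets alone.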
Signs of life of DQN on Ms. Pac-Man after 1 million frames:
TODO
- Dyna-Q Notebook
- DQN Notebook (work in progress)
  - Replay buffer
  - Atari environment
  - Neural network, stochastic gradient descent
  - Training loop
  - Signs of life :)
  - GPU
  - Debug!
  - Remaining details from both DQN papers
  - Run for the full number of frames
- MuZero
  - Monte-Carlo tree search Notebook
    - Does it work with tensors / batches?
  - Other changes to DQN
    - Different loss
    - TD targets
    - Non-uniform sampling from the replay buffer
    - ...
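The replay buffer on the list above can be quite small. A minimal sketch (not the notebook's actual code) with a fixed capacity and uniform sampling:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity FIFO store of transitions with uniform sampling."""

    def __init__(self, capacity):
        self.storage = deque(maxlen=capacity)  # oldest transitions are evicted

    def add(self, state, action, reward, next_state, done):
        self.storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling; the "non-uniform sampling" item above would
        # instead weight transitions, e.g. by TD error (prioritized replay).
        batch = random.sample(self.storage, batch_size)
        return tuple(zip(*batch))  # columns: states, actions, rewards, ...

    def __len__(self):
        return len(self.storage)
```

Returning column tuples rather than a list of transitions makes it easy to hand each column to the network as one batch.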
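For the TD-targets item, the standard one-step DQN target can be computed as a small batch helper. A sketch with assumed argument shapes (per-transition rewards, per-transition lists of next-state Q-values, done flags):

```python
def td_targets(rewards, next_q_values, dones, gamma=0.99):
    """One-step TD targets: r + gamma * max_a' Q(s', a'), cut off at episode end."""
    return [r + gamma * max(q) * (1.0 - float(d))
            for r, q, d in zip(rewards, next_q_values, dones)]
```

Multiplying by `(1.0 - done)` zeroes the bootstrap term on terminal transitions, so the target collapses to the bare reward at the end of an episode.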
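For the Monte-Carlo tree search notebook, the four phases (selection, expansion, simulation, backup) fit on one page. A UCT-style sketch on a made-up counting game; the game, constants, and names are illustrative only:

```python
import math
import random

GOAL, ACTIONS = 10, (1, 2)

def step(state, action):
    # Toy game: add 1 or 2 to a counter; reward 1.0 for hitting GOAL exactly.
    nxt = state + action
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt >= GOAL

class Node:
    def __init__(self, state):
        self.state = state
        self.children = {}   # action -> Node
        self.visits = 0
        self.value_sum = 0.0

def ucb(parent, child, c=1.4):
    # Upper confidence bound: exploit the mean value, explore rare children.
    if child.visits == 0:
        return float("inf")
    return (child.value_sum / child.visits
            + c * math.sqrt(math.log(parent.visits) / child.visits))

def random_playout(state):
    done, reward = state >= GOAL, 0.0
    while not done:
        state, reward, done = step(state, random.choice(ACTIONS))
    return reward

def mcts(root_state, num_simulations=200):
    root = Node(root_state)
    for _ in range(num_simulations):
        node, path = root, [root]
        # 1. Selection: descend by UCB while fully expanded and non-terminal.
        while node.state < GOAL and len(node.children) == len(ACTIONS):
            parent = node
            node = max(node.children.values(), key=lambda ch: ucb(parent, ch))
            path.append(node)
        # 2. Expansion and 3. simulation from the new leaf.
        if node.state < GOAL:
            action = random.choice([a for a in ACTIONS if a not in node.children])
            child_state, reward, done = step(node.state, action)
            node.children[action] = child = Node(child_state)
            path.append(child)
            value = reward if done else random_playout(child_state)
        else:
            value = 1.0 if node.state == GOAL else 0.0
        # 4. Backup: propagate the playout value to every node on the path.
        for visited in path:
            visited.visits += 1
            visited.value_sum += value
    # Recommend the most-visited root action.
    return max(root.children, key=lambda a: root.children[a].visits)
```

Making this work with tensors / batches would mean replacing `random_playout` with batched network evaluations, which is exactly the MuZero-shaped question on the list.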
