adrische/MuZero-MsPacman


Reimplementation attempt of MuZero for Ms Pacman

  1. Dyna-Q
  2. DQN
  3. MCTS
  4. MuZero

Work in progress. Feedback welcome.

Open question: is MuZero essentially supervised learning on a replay buffer, with value targets generated by self-play and Monte-Carlo tree search?
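To make the question above concrete, here is a minimal sketch of how value targets could be computed from one self-play trajectory: an n-step discounted return that bootstraps from the search value at the root. This is an illustration, not the code in the notebooks. The function name `n_step_value_targets` and its arguments are hypothetical, and the indexing convention (`rewards[t]` is the reward received after acting at step `t`) is an assumption.

```python
def n_step_value_targets(rewards, root_values, n_steps, discount):
    """Compute one value target per step of a finished episode.

    rewards[t]     -- reward received after acting at step t (assumed convention)
    root_values[t] -- value estimate from the tree search at step t
    """
    episode_length = len(rewards)
    targets = []
    for t in range(episode_length):
        end = min(t + n_steps, episode_length)
        # Discounted sum of the observed rewards inside the n-step window.
        target = 0.0
        for k in range(t, end):
            target += discount ** (k - t) * rewards[k]
        # Bootstrap from the search value only if the window ends before the episode does.
        if end < episode_length:
            target += discount ** (end - t) * root_values[end]
        targets.append(target)
    return targets


if __name__ == "__main__":
    print(n_step_value_targets([1.0, 0.0, 2.0], [5.0, 5.0, 5.0], n_steps=2, discount=0.5))
```

Under this reading, training would regress the network's value output towards these targets on batches sampled from the replay buffer, which is what makes it look like supervised learning.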

Signs of life from DQN on Ms Pacman after 1 million frames:

TODO

  • Dyna-Q Notebook
  • DQN Notebook (work in progress)
    • Replay buffer
    • Atari environment
    • Neural network, stochastic gradient descent
    • Training loop
    • Signs of life :)
    • GPU
    • Debug!
    • Remaining details from both DQN papers
    • Run for the full number of frames
  • MuZero
    • Monte-Carlo tree search Notebook
      • Does it work with tensors / batches?
    • Other changes to DQN
      • Different loss
      • TD-targets
      • Non-uniform sampling from replay buffer
      • ...
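Several items in the list above (replay buffer, TD-targets, non-uniform sampling) can be sketched in a few lines. The following is only an illustration using the standard library, not the implementation in the notebooks. The class `ReplayBuffer`, its methods, and the function `td_target` are hypothetical names, and sampling proportionally to a stored priority is just one possible form of non-uniform sampling.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity buffer; the oldest transitions are evicted first."""

    def __init__(self, capacity, seed=None):
        self.transitions = deque(maxlen=capacity)
        self.priorities = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, transition, priority=1.0):
        self.transitions.append(transition)
        self.priorities.append(priority)

    def sample_uniform(self, batch_size):
        # Sampling with replacement, every stored transition equally likely.
        return self.rng.choices(list(self.transitions), k=batch_size)

    def sample_prioritized(self, batch_size):
        # Sampling with replacement, probability proportional to the stored priority.
        return self.rng.choices(
            list(self.transitions), weights=list(self.priorities), k=batch_size
        )


def td_target(reward, next_q_values, done, discount):
    """One-step TD target: r if the episode ended, else r + discount * max_a Q(s', a)."""
    if done:
        return reward
    return reward + discount * max(next_q_values)


if __name__ == "__main__":
    buffer = ReplayBuffer(capacity=2, seed=0)
    for name in ["a", "b", "c"]:
        buffer.add(name)
    print(list(buffer.transitions))
    print(td_target(1.0, [0.5, 2.0], done=False, discount=0.9))
```

A bounded `deque` keeps the eviction logic trivial. For a buffer holding a million Atari frames, preallocated arrays would likely be the better choice.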
