- Dyna-Q
- DQN
- MCTS
- MuZero
Work in progress. Feedback welcome.
Supervised learning on a replay buffer with value targets generated by self-play and Monte-Carlo tree search?
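That one-sentence summary can be sketched end-to-end on a toy problem. Everything below is illustrative, not code from the notebooks: a random-rollout average stands in for the tree search, and a value table trained by SGD stands in for the network.

```python
import random

random.seed(0)  # reproducibility for this toy run

def rollout(state):
    # Toy "environment": even states are worth ~1.0, odd states ~0.0.
    return float(state % 2 == 0) + random.gauss(0.0, 0.1)

def search_value(state, num_rollouts=16):
    # Stand-in for Monte-Carlo tree search: just average random rollouts.
    # A real implementation would build and reuse a search tree.
    return sum(rollout(state) for _ in range(num_rollouts)) / num_rollouts

def self_play(num_positions=100):
    # Self-play fills the replay buffer with (state, value-target) pairs.
    return [(s, search_value(s))
            for s in (random.randrange(10) for _ in range(num_positions))]

buffer = self_play()

# Supervised learning on the buffer: stochastic gradient steps on the
# squared error between the stored target and the current value estimate.
values = [0.0] * 10
for _ in range(50):
    random.shuffle(buffer)
    for s, target in buffer:
        values[s] += 0.1 * (target - values[s])  # gradient step on (v - target)^2
```

After training, the value table recovers the toy environment's true values (≈1.0 for even states, ≈0.0 for odd ones) from the search-generated targets alone.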
Signs of life of DQN on Ms. Pac-Man after 1 million frames:
TODO
- Dyna-Q Notebook
- DQN Notebook (work in progress)
  - Replay buffer
  - Atari environment
  - Neural network, stochastic gradient descent
  - Training loop
  - Signs of life :)
  - GPU
  - Debug!
  - Remaining details from both DQN papers
  - Run for the full number of frames
- MuZero
  - Monte-Carlo tree search Notebook
    - Does it work with tensors / batches?
  - Other changes to DQN
    - Different loss
    - TD targets
    - Non-uniform sampling from the replay buffer
    - ...
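The replay buffer on the list above can be quite small. A minimal sketch (not the notebook's actual code) with a fixed capacity and uniform sampling:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity FIFO store of transitions with uniform sampling."""

    def __init__(self, capacity):
        self.storage = deque(maxlen=capacity)  # oldest transitions are evicted

    def add(self, state, action, reward, next_state, done):
        self.storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling; the "non-uniform sampling" item above would
        # instead weight transitions, e.g. by TD error (prioritized replay).
        batch = random.sample(self.storage, batch_size)
        return tuple(zip(*batch))  # columns: states, actions, rewards, ...

    def __len__(self):
        return len(self.storage)
```

Returning column tuples rather than a list of transitions makes it easy to hand each column to the network as one batch.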
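For the TD-targets item, the standard one-step DQN target can be computed as a small batch helper. A sketch with assumed argument shapes (per-transition rewards, per-transition lists of next-state Q-values, done flags):

```python
def td_targets(rewards, next_q_values, dones, gamma=0.99):
    """One-step TD targets: r + gamma * max_a' Q(s', a'), cut off at episode end."""
    return [r + gamma * max(q) * (1.0 - float(d))
            for r, q, d in zip(rewards, next_q_values, dones)]
```

Multiplying by `(1.0 - done)` zeroes the bootstrap term on terminal transitions, so the target collapses to the bare reward at the end of an episode.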
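For the Monte-Carlo tree search notebook, the four phases (selection, expansion, simulation, backup) fit on one page. A UCT-style sketch on a made-up counting game; the game, constants, and names are illustrative only:

```python
import math
import random

GOAL, ACTIONS = 10, (1, 2)

def step(state, action):
    # Toy game: add 1 or 2 to a counter; reward 1.0 for hitting GOAL exactly.
    nxt = state + action
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt >= GOAL

class Node:
    def __init__(self, state):
        self.state = state
        self.children = {}   # action -> Node
        self.visits = 0
        self.value_sum = 0.0

def ucb(parent, child, c=1.4):
    # Upper confidence bound: exploit the mean value, explore rare children.
    if child.visits == 0:
        return float("inf")
    return (child.value_sum / child.visits
            + c * math.sqrt(math.log(parent.visits) / child.visits))

def random_playout(state):
    done, reward = state >= GOAL, 0.0
    while not done:
        state, reward, done = step(state, random.choice(ACTIONS))
    return reward

def mcts(root_state, num_simulations=200):
    root = Node(root_state)
    for _ in range(num_simulations):
        node, path = root, [root]
        # 1. Selection: descend by UCB while fully expanded and non-terminal.
        while node.state < GOAL and len(node.children) == len(ACTIONS):
            parent = node
            node = max(node.children.values(), key=lambda ch: ucb(parent, ch))
            path.append(node)
        # 2. Expansion and 3. simulation from the new leaf.
        if node.state < GOAL:
            action = random.choice([a for a in ACTIONS if a not in node.children])
            child_state, reward, done = step(node.state, action)
            node.children[action] = child = Node(child_state)
            path.append(child)
            value = reward if done else random_playout(child_state)
        else:
            value = 1.0 if node.state == GOAL else 0.0
        # 4. Backup: propagate the playout value to every node on the path.
        for visited in path:
            visited.visits += 1
            visited.value_sum += value
    # Recommend the most-visited root action.
    return max(root.children, key=lambda a: root.children[a].visits)
```

Making this work with tensors / batches would mean replacing `random_playout` with batched network evaluations, which is exactly the MuZero-shaped question on the list.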
