This package provides the core MuZero algorithm in the Julia language:
MuZero is a state-of-the-art RL algorithm for board games (Chess, Go, ...) and Atari games. It is the successor to AlphaZero, but requires no knowledge of the environment's underlying dynamics. MuZero learns a model of the environment and uses an internal representation that contains only the information useful for predicting the reward, value, policy, and transitions.
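At its core, MuZero combines three networks: a representation network `h` that encodes an observation into a latent state, a dynamics network `g` that predicts the next latent state and reward, and a prediction network `f` that outputs a policy and value. A minimal sketch of such networks in Flux might look as follows (the sizes are illustrative, not this package's actual architecture):

```julia
using Flux

# Illustrative sizes for TicTacToe; not this package's actual hyperparameters.
latent_dim, n_actions, obs_dim = 32, 9, 27

# Representation h: observation -> latent state
h = Chain(Dense(obs_dim => 64, relu), Dense(64 => latent_dim))

# Dynamics g: (latent state, one-hot action) -> (next latent state, reward)
g = Chain(Dense(latent_dim + n_actions => 64, relu), Dense(64 => latent_dim + 1))

# Prediction f: latent state -> (policy logits, value)
f = Chain(Dense(latent_dim => 64, relu), Dense(64 => n_actions + 1))
```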
Because MuZero is resource-hungry, the motivation for this project is to provide an implementation of MuZero that is simple enough to be widely accessible, while also being sufficiently powerful and fast to enable meaningful experiments on limited computing resources. I found the Julia language to be instrumental in achieving this goal.
To install Julia on your platform, download it from the appropriate mirror and add it to your PATH; instructions can be found here.
To set up Git on your computer, follow the instructions here.
To download MuZero.jl and start training a TicTacToe agent with 3 threads, just run:

```bash
git clone https://github.com/deveshjawla/MuZero.jl
cd MuZero.jl
julia --project -e 'import Pkg; Pkg.instantiate()'
julia --project -t 3 ./games/tictactoe/main.jl
```
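You can check that Julia picked up the expected number of threads from within a session:

```julia
julia> Threads.nthreads()  # 3 when Julia is launched with -t 3
3
```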
Note that the MuZero agent is never exposed to baseline opponents during training: it learns purely from self-play, without any form of supervision or prior knowledge.
To play against the trained agent, run:

```bash
julia --project ./games/tictactoe/play.jl
```
- Residual network and fully connected network in Flux
- Reinforcement learning environment and TicTacToe example adapted from ReinforcementLearning.jl
- Parallel computing natively supported by Julia (see the sketch after this list)
- Multi-GPU support for training and self-play
- Model weights automatically saved at checkpoints
- Single- and two-player modes
- Easily adaptable to new games
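To illustrate the parallelism point above: self-play games can be distributed over threads with Julia's built-in task system. The sketch below assumes a hypothetical `play_one_game` function and is not this package's actual self-play loop:

```julia
using Base.Threads

# Hypothetical: play_one_game() runs one self-play game and returns
# its trajectory; it is not an actual function of this package.
function generate_games(n_games)
    results = Vector{Any}(undef, n_games)
    @sync for i in 1:n_games
        Threads.@spawn results[i] = play_one_game()
    end
    return results
end
```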
Currently implemented games:
- Tic-tac-toe (tested with the fully connected network)
You can adapt the configuration of each game by editing the `Config` in the `params.jl` file in the game's folder.
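As a hypothetical illustration of such a configuration (the actual `Config` in `params.jl` may have different fields), a keyword-based config could look like this:

```julia
# Hypothetical fields; check params.jl for the actual Config definition.
Base.@kwdef struct ExampleConfig
    n_simulations::Int = 25    # MCTS simulations per move
    batch_size::Int    = 64    # training batch size
    lr::Float64        = 1e-3  # learning rate
    discount::Float64  = 0.997 # reward discount factor
end

config = ExampleConfig(n_simulations = 50)  # override a single field
```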
I would like to invite you to contribute to this project by addressing any of the following points:
- User interface: session management, tracking learning performance with TensorBoard, and diagnostic tools for understanding the learned model
- Benchmarking: an interface and tools for benchmarking against perfect solvers, MCTS-only players, or network-only players
- Logging tools to track code performance
- Performance optimization of the code
- Support for more than two players
- Hyperparameter search
- Support for continuous action spaces
- Support for new environments: zero-sum games, RL, control problems, etc.
My next aim is to implement an easy-to-use interface, which can be expected in v0.4.0. The user interface and benchmarking tools will most likely be adapted from Jonathan Laurent's AlphaZero.jl.
- David Foster for his excellent tutorial
- Werner Duvaud: the core algorithm of this Julia implementation is mostly based on his Python implementation. Some parts of this ReadMe are also adapted from his GitHub repository.
- Julian Schrittwieser for his tutorial and the associated pseudocode
- Jonathan Laurent: some parts of this ReadMe are adapted from his GitHub repository.
- Author: Devesh Jawla
- Contributors
If you want to support this project and help it gain visibility, please consider starring the repository. Doing well on such metrics may also help us secure academic funding in the future. Also, if you use this software as part of your research, I would appreciate it if you included the following citation in your paper.