Othello RL

An implementation of a self-play reinforcement learning algorithm for 8x8 Othello. The algorithm uses Monte Carlo Tree Search (MCTS) to improve its policy on every turn. It then updates the policy so that actions chosen by MCTS become more probable if they led to winning the game.
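The idea of reinforcing moves that led to a win can be sketched as follows. This is a minimal illustration, not the repository's code: the function name `label_outcomes`, the encoding of players as `1` and `-1`, and the use of `0` for a draw are all assumptions made for the example.

```python
from typing import List, Tuple


def label_outcomes(
    moves: List[Tuple[int, int]], winner: int
) -> List[Tuple[int, int, float]]:
    """Attach the final game outcome to every (player, action) pair.

    Hypothetical encoding: players are 1 and -1, winner == 0 means a draw.
    A move gets +1.0 if its player won, -1.0 if its player lost, 0.0 on a draw.
    """
    labeled = []
    for player, action in moves:
        if winner == 0:
            outcome = 0.0
        elif player == winner:
            outcome = 1.0
        else:
            outcome = -1.0
        labeled.append((player, action, outcome))
    return labeled


# Three moves from one game; actions are board indices 0..63 on the 8x8 board.
game = [(1, 19), (-1, 26), (1, 34)]
labeled_game = label_outcomes(game, winner=1)
```

A policy update can then raise the probability of moves labeled `+1.0` and lower it for moves labeled `-1.0`.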

Results

The agent was trained for approximately 24 hours on a node with 32 CPU cores. In every iteration it played 160 games against a randomly sampled version of past agent parameters. This prevents the agent's policy from overfitting. Each turn used 50 MCTS traversals, and the game tree was reset after every game. To update the agent parameters I used a clipped PPO loss.
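For reference, the standard clipped PPO surrogate loss can be written in a few lines. This is a generic sketch rather than the training code from this repository: the function name `clipped_ppo_loss`, the clip range of `0.2`, and the choice to pass per-move advantages directly are assumptions for illustration.

```python
import math
from typing import Sequence


def clipped_ppo_loss(
    new_logp: Sequence[float],
    old_logp: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,  # assumed value, not taken from the repository
) -> float:
    """Return the negated mean of the clipped PPO surrogate objective."""
    total = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the updated and the data-collecting policy.
        ratio = math.exp(nl - ol)
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)


# A move whose probability doubled under the new policy, with advantage +1:
# the ratio 2.0 is clipped to 1.2, so the gain from this sample is capped.
loss = clipped_ppo_loss([math.log(0.5)], [math.log(0.25)], [1.0])
```

The clipping removes the incentive to move the policy far from the one that generated the games within a single update.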

marekgalovic/othello-rl
