Description
As I mentioned in another issue, I've been working on training an AI agent to play Othello/Reversi. I wanted to report that I've had some pretty decent success using AlphaZero.jl — much more than I was able to achieve with PyTorch, TensorFlow, or Flux.jl directly. That's the good news. The not-so-good news is that while I've gotten a relatively good player, it's still not that great. It easily beats really bad players (like me) and plays roughly 50/50 against a basic minimax heuristic (translated from https://github.com/sadeqsheikhi/reversi_python_ai).
In my training, I've done around 25 iterations (the repository is here: https://git.sr.ht/~bwanab/AZ_Reversi.jl). The loss seems to have flatlined around iteration 10 and slopes very gradually upward after that.
Are there any particular hyperparameters I should look at? One thing I tried that didn't seem to make much difference was making the net a little bigger by increasing the number of residual blocks from 5 to 8.
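For reference, here's a sketch of the kind of change I mean, assuming a `ResNetHP` network spec along the lines of AlphaZero.jl's connect-four example (the filter and head sizes below are illustrative, not my actual values):

```julia
using AlphaZero  # NetLib below refers to AlphaZero's Flux-based network library

# Before: the 5-block tower.
# netparams = NetLib.ResNetHP(num_blocks=5, ...)

# After: 8 residual blocks, everything else left unchanged.
netparams = NetLib.ResNetHP(
  num_blocks=8,                 # was 5; a deeper residual tower
  num_filters=128,              # illustrative; keep whatever you use now
  conv_kernel_size=(3, 3),
  num_policy_head_filters=32,
  num_value_head_filters=32,
  batch_norm_momentum=0.1)
```

Changing only `num_blocks` keeps the comparison clean, but it didn't noticeably move the loss curve for me.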