As I mentioned in another issue, I've been working on training an AI agent to play Othello/Reversi. I wanted to report that I've had some pretty decent success using AlphaZero.jl, much more than I was able to achieve with PyTorch, TensorFlow, or Flux.jl. That's the good news. The not-so-good news is that while I've gotten a relatively good player, it's still not that great: it easily beats really bad players (like me) and plays about 50/50 against a basic MinMax heuristic (translated from https://github.com/sadeqsheikhi/reversi_python_ai).
In my training, I've done around 25 iterations (the repository is here: https://git.sr.ht/~bwanab/AZ_Reversi.jl). The loss seems to have flatlined at around iteration 10 and very gradually slopes upward after that.
Are there any particular hyper-parameters that I should look at? One thing I tried that didn't seem to make much difference was making the net a little bigger by changing the number of blocks from 5 to 8.
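For concreteness, the change amounts to something like this in an AlphaZero.jl params file. The values below are illustrative, in the style of the bundled examples, not necessarily the ones in AZ_Reversi.jl; only `num_blocks` reflects the 5 → 8 change:

```julia
using AlphaZero: NetLib

# Illustrative ResNet hyperparameters; everything except num_blocks is a
# stand-in borrowed from the style of AlphaZero.jl's bundled examples.
netparams = NetLib.ResNetHP(
  num_filters=128,
  num_blocks=8,               # was 5; a deeper residual tower
  conv_kernel_size=(3, 3),
  num_policy_head_filters=32,
  num_value_head_filters=32,
  batch_norm_momentum=1.0)
```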
Replying to myself, but I've found that increasing the timeout when creating the AlphaZeroPlayer raises the level of play considerably. For example, in the case I gave above of playing 50/50 against the MinMax heuristic, using a 5-second timeout instead of the default 2 seconds raises the result to more like 80/20. At 10 seconds, MinMax can't beat it.
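For anyone wanting to try this, the change looks roughly like the sketch below. The `experiment` object and the session directory are placeholders for whatever AZ_Reversi.jl defines; the `timeout` keyword (in seconds) is the setting I changed:

```julia
using AlphaZero

# Placeholders: `experiment` is the Reversi experiment object from
# AZ_Reversi.jl, and the directory is wherever training saved its state.
session = Session(experiment, dir="sessions/az-reversi")

# Give MCTS more thinking time per move (the default timeout is 2.0 seconds).
player = AlphaZeroPlayer(session.env, timeout=10.0)
```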
If anybody has insight into this, I'd love to hear it.
Thanks for reporting on your experience! Tuning AlphaZero can be pretty hard indeed. Could you share some of the automatically generated metrics and graphs from your experiment?