Hardware sizing with regards to problem complexity #165
There is no perfect answer here. Also, note that AlphaZero was not designed to work optimally with every possible cluster configuration out of the box: you may need to do some tweaking to achieve the performance you want on your hardware. One of the main factors determining how much compute you will need is the branching factor of your game, along with the average number of moves per game. For example, connect four has a maximum branching factor of 7 and a typical game lasts about 30 moves. Connect four is about the hardest problem you can solve easily on commodity hardware (one gaming laptop with a decent GPU). Since AlphaZero is sample-inefficient, the amount of required compute can scale very fast with the complexity of your game. Depending on your hardware, this compute can be invested in different ways.
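To see how game length and search budget multiply, here is a back-of-envelope cost model for self-play. This is only a sketch: the function and every number in it are hypothetical, not part of AlphaZero.jl.

```python
# Back-of-envelope self-play cost model (all numbers below are hypothetical).

def selfplay_nn_evals(num_games, avg_moves, sims_per_move):
    # In MCTS-based self-play, each simulation triggers at most one
    # network evaluation (fewer with transpositions / an NN cache).
    return num_games * avg_moves * sims_per_move

# Connect-four-like settings: short games, modest search budget.
small = selfplay_nn_evals(num_games=5_000, avg_moves=30, sims_per_move=600)

# A longer game with the same number of games and search budget.
big = selfplay_nn_evals(num_games=5_000, avg_moves=100, sims_per_move=600)

print(small)  # 90000000 network evaluations
print(big)    # 300000000 network evaluations

# At a (hypothetical) batched throughput of 10,000 evals/s on one GPU,
# the small run already costs 9,000 GPU-seconds (2.5 hours) per iteration,
# before the network even grows.
print(small / 10_000 / 3600)
```

A longer game also usually means a larger board and a bigger network, so the per-evaluation cost grows on top of the evaluation count.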
What the best tradeoff is depends on your available hardware, your specific use case, how costly it is to simulate your environment, and so on. Finally, the best way to make AlphaZero suitable for challenging games without spending too much compute is to initialize the policy with a decent heuristic (possibly learned from human data with supervised learning). In practice, this considerably reduces your effective branching factor, since only actions that are not clearly bad are considered most of the time.
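The "effective branching factor" effect of a pretrained prior can be illustrated numerically: count how many top actions are needed to cover most of the policy mass. A sketch with hypothetical distributions (the peaked prior's numbers are made up for illustration):

```python
import numpy as np

def effective_branching(prior, mass=0.95):
    # Number of top-prior actions needed to cover `mass` of the policy.
    p = np.sort(np.asarray(prior))[::-1]
    return int(np.searchsorted(np.cumsum(p), mass) + 1)

n = 50
uniform = np.full(n, 1.0 / n)  # untrained prior: all moves look alike
# Hypothetical pretrained prior: most mass on a few plausible moves.
peaked = np.concatenate([[0.4, 0.3, 0.15, 0.1], np.full(n - 4, 0.05 / (n - 4))])

print(effective_branching(uniform))  # ~48: nearly every move must be searched
print(effective_branching(peaked))   # ~4: search can focus on a handful of moves
```

With a good prior, MCTS visits concentrate on the few high-prior moves, which is what makes harder games tractable on the same hardware.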
Thank you for your reply. Regarding the compute investment strategy: if I want to explore using larger networks vs. more MCTS simulations, what are the main parameters I should play with that don't require a deep understanding of all the under-the-hood mechanisms? Would you recommend some readings on this question?
Also, if your branching factor is huge you will be penalized, because as it is currently coded, AlphaZero.jl stores all possible moves. For example, using AlphaZero.jl for chess would require storing more than 1800 moves (with their policy entries, etc.), whereas a given position has at most around 250 legal moves. So you waste a lot of memory, which I think prevents training such games without a huge amount of RAM. (I tried it for the game Ataxx; it is very slow, because you can't play that many games in parallel.) It is quite easy to fix, e.g. by storing the move or a move id in actions and retaining only valid actions, instead of masking.
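The memory cost described above can be made concrete by comparing a dense masked policy vector against a sparse (move id, prior) representation. The numbers are illustrative, not AlphaZero.jl's actual memory layout:

```python
import numpy as np

TOTAL_MOVES = 1858   # size of a chess-like global move encoding (illustrative)
LEGAL_MOVES = 40     # legal moves in a typical middlegame position

# Dense: one slot per encodable move, with illegal moves masked out.
dense_policy = np.zeros(TOTAL_MOVES, dtype=np.float32)

# Sparse: store only the legal moves' ids and their priors.
sparse_ids = np.zeros(LEGAL_MOVES, dtype=np.int16)
sparse_priors = np.zeros(LEGAL_MOVES, dtype=np.float32)

print(dense_policy.nbytes)                       # 7432 bytes per search node
print(sparse_ids.nbytes + sparse_priors.nbytes)  # 240 bytes per search node
```

Multiplied over millions of search nodes across thousands of parallel games, this ~30x per-node difference is what limits how many games fit in RAM.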
@fabricerosay You are perfectly right. The reason I made this implementation choice initially is that any problem with a branching factor for which this becomes problematic is probably not learnable from scratch with a reasonable amount of compute. This no longer holds when initializing the policy from supervised learning though, so I may indeed want to lift this restriction.
I started working on AlphaZero again: a new implementation more in line with AlphaGPU, but not wholly on GPU (I dropped node structs, etc. for an SoA implementation) and added an NN cache. On connect four I saw a huge performance gain: 4096 games, 600 rollouts with a 128x5 resnet in under 5 minutes.
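The NN cache mentioned here is essentially memoization of network evaluations keyed by position, so transpositions hit a lookup table instead of the GPU. A minimal sketch in Python, with a hypothetical `eval_fn` standing in for a real network call:

```python
class NNCache:
    """Memoize network evaluations by a hashable position key (a sketch)."""

    def __init__(self):
        self.store = {}
        self.hits = 0
        self.misses = 0

    def evaluate(self, state_key, eval_fn):
        # Return the cached (policy, value) if this position was seen before;
        # otherwise run the expensive network once and remember the result.
        if state_key in self.store:
            self.hits += 1
        else:
            self.misses += 1
            self.store[state_key] = eval_fn(state_key)
        return self.store[state_key]

# Usage: repeated positions (transpositions) skip the network entirely.
calls = []
def fake_net(key):              # stand-in for a real network evaluation
    calls.append(key)
    return ("policy", 0.0)

cache = NNCache()
cache.evaluate("pos1", fake_net)
cache.evaluate("pos1", fake_net)  # cache hit: fake_net is not called again
print(len(calls), cache.hits)     # 1 1
```

In games with many transpositions, a cache like this can save a large fraction of GPU evaluations, which is consistent with the speedup reported above.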
Using a heuristic to artificially prevent (mask) the dumbest actions after
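One way to realize this idea: score each legal action with a cheap heuristic and mask everything far below the best score, so the search (and the policy storage) only deals with plausible moves. A sketch with a hypothetical `score_fn`; the margin and the toy heuristic are made up:

```python
def prune_with_heuristic(actions, score_fn, margin=2.0):
    # Keep actions whose heuristic score is within `margin` of the best;
    # the rest are masked before MCTS ever expands them.
    scores = {a: score_fn(a) for a in actions}
    best = max(scores.values())
    return [a for a in actions if scores[a] >= best - margin]

# Toy example: a heuristic preferring central columns in a connect-four-like game.
center_bias = lambda col: -abs(col - 3)
print(prune_with_heuristic(range(7), center_bias, margin=2.0))  # [1, 2, 3, 4, 5]
```

The risk, of course, is that a bad heuristic can mask a move the network would eventually have discovered, so the margin must be chosen conservatively.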
Interesting. Does your system offer an API similar to AlphaZero.jl's |
No, it is different, very experimental, and miles away from AlphaZero.jl in terms of code quality. It is not as generic, but it is probably faster. If you were to use it, you would have to dig into the ugly, uncommented code. Very amateurish work, by the amateur that I am.
Does AlphaZero.jl take advantage of multiple GPUs on a single machine, or is a cluster of single-GPU machines the only way to parallelize GPU computing?
AlphaZero.jl cannot leverage multi-GPU machines out of the box, but making it do so would probably only require a small change.
It would be super helpful for exploring the framework and its possibilities to have a list of rules and constraints linking the parameters, the output indicators, and the system's hardware characteristics. For example (I don't know if these are true or false):
etc. If everyone could contribute their observations, it would be a useful start.
You are perfectly right, and this would be useful. More generally, I regularly think about what a smart framework could look like that performs as much autotuning as possible given one's configuration, makes hyperparameter sanity checks, and even suggests relevant hyperparameter variants. This is an open research question though, and in any case I am skeptical that an algorithm as complex and computationally demanding as AlphaZero can ever be used as a black box.
I understand the complexity of the question of a self-adapting framework, and the value of any solution that would get us closer to this goal. From my amateur point of view, this is simply beyond my power. But believe it or not, I was able to create a fairly good agent for my game, in nominal conditions (a 16x16 board), without any deep knowledge of the under-the-hood mechanics, "just" by coding my game's rules according to the game interface. I wouldn't call this "black-box" usage, but it demonstrates the great versatility of this framework you created following DeepMind's guidelines.
Hihi,
Do you have a rule of thumb that could be used to determine what hardware sizing would be required to train an agent, given the size and complexity of a game model? In terms of CUDA cores, GPU memory, CPU memory, Mflops, or whatever unit could help configure the hardware before starting to `dummy_run` a game?