Is it worth continuing training after the memory buffer has been filled with samples? #176
Comments
First of all, you should make sure that the buffer size is at least as big as the number of samples generated during a single iteration. Otherwise, you are just throwing away compute.

If the network is not replaced after an evaluation, it means that it is not significantly better than the one already in use to generate data. However, it is possible for the optimization process to go through local minima, and sometimes getting an improvement takes several iterations. As a rule of thumb, though, if the network is never updated during enough consecutive iterations for the memory buffer to be fully renewed, then learning has likely stalled or something is wrong with your hyperparameters. Indeed, in this case, the quality of the data in your buffer is no longer improving and self-play data is being wasted.

Finally, the advantage of a bigger buffer is that you get more training data and possibly better generalization. You also get better sample efficiency by reusing samples in multiple batch updates. The ideal size of the memory buffer is hard to determine in general and is a critical hyperparameter to tune. However, 40K samples looks pretty low (2M is typical for connect four).
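For a rough sanity check on the first point, you can compare the buffer capacity against one iteration's output; all numbers below are hypothetical placeholders, not recommended values:

```julia
# Back-of-the-envelope check: the buffer should hold at least one full
# iteration's worth of self-play samples. All numbers are assumptions.
num_games_per_iter = 5_000   # self-play games per iteration (hypothetical)
avg_game_length    = 21      # average plies per game (rough connect-four guess)
samples_per_iter   = num_games_per_iter * avg_game_length  # ≈ 105_000 samples

mem_buffer_size = 40_000     # the size mentioned in this thread
if mem_buffer_size < samples_per_iter
    @warn "Buffer cannot hold one iteration of data; some samples are dropped unseen"
end
```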
Thank you for all the insight.
Note that the agent does not technically improve when the NN is not replaced, since the same NN is still used to generate data. But this does not mean progress isn't being made, either by improving the quality of the data in the memory buffer or by optimizing the current network (a lower loss may not mean a better NN right now, but it may lead to one in the future).
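To make the replacement logic concrete, here is a minimal sketch of the generic AlphaZero evaluation gate; this is not AlphaZero.jl's actual implementation, and `play_arena` is a hypothetical stub standing in for MCTS-guided evaluation games:

```julia
# Hypothetical stub: in a real setup this would play `n` arena games between
# the two networks and return the candidate's number of wins.
play_arena(candidate, incumbent, n) = count(_ -> rand(Bool), 1:n)

# Keep the candidate only if it wins clearly; otherwise keep generating
# self-play data with the incumbent. Threshold and game count are assumptions.
function maybe_replace(incumbent, candidate; num_games = 100, threshold = 0.55)
    winrate = play_arena(candidate, incumbent, num_games) / num_games
    return winrate >= threshold ? candidate : incumbent
end
```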
In order to avoid an `ERROR: Out of GPU memory`, I had to reduce my `mem_buffer_size` quite drastically (even though there is not yet a logical explanation for why this works). As a result, the memory buffer is filled with samples after only a few training iterations.
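If the out-of-memory issue forces a small buffer early on, one option (sketched below) is to grow the buffer with the iteration number, assuming AlphaZero.jl's `PLSchedule` API as used in the connect-four example; the exact constructor signature and values here are assumptions:

```julia
using AlphaZero  # assumes the package is installed

# Piecewise-linear schedule (assumed API): start small to fit in memory,
# then grow the buffer so older samples are not evicted too aggressively.
buffer_schedule = PLSchedule(
    [0, 20],             # iteration breakpoints (illustrative)
    [40_000, 400_000])   # buffer capacity at those iterations (illustrative)
```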
I am wondering whether subsequent training iterations truly improve my agent or are just a waste of computing time, since I am observing the following after the memory buffer was filled (using `netparams` taken from the connect-four example, which might not be optimal for my game): playing against my agent, it has reached a fair level, yet I suspect it stopped improving after the memory buffer was filled. I can't be sure of this; I will test further and maybe try pitting two versions of it against each other.
But I have the following theoretical questions:
(I understand that the samples generated during self-play are meant to produce both new visits to existing MCTS nodes and brand-new nodes.)
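As a toy illustration of that parenthetical, not the library's actual data structure: revisited states increment the visit counts of existing nodes, while unseen states create new ones.

```julia
# Toy visit-count table: the states and trajectory are purely hypothetical.
visits = Dict{String,Int}()
for state in ["root", "a", "root", "b"]  # a made-up self-play trajectory
    visits[state] = get(visits, state, 0) + 1
end
@show visits  # "root" visited twice (existing node); "a" and "b" brand new
```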