Is it worth to continue training after the memory buffer was filled with samples? #176

smart-fr · 2023-02-22T10:42:59Z

In order to avoid an ERROR: Out of GPU memory, I had to reduce my mem_buffer_size quite drastically (even though there's not yet a logical explanation why this works).

As a result, the memory buffer is filled with samples after only a few training iterations.
I am wondering whether subsequent training iterations truly improve my agent or are just a waste of computing time, since I have observed the following since the memory buffer filled up (using netparams taken from the connect-four example, which might not be optimal for my game):

- The learning phase very rarely manages to reduce the loss.
- The network is very rarely replaced after the checkpoint evaluation (it seems to be replaced only when the loss was successfully reduced), as if it had converged to a stable state.

Playing against my agent, I find that it has reached a fair level, yet I suspect it stopped improving after the memory buffer was filled. I can't be sure of this; I will test further and maybe try pitting two versions of it against each other.

But I have the following theoretical questions:
(I understand that the samples generated during self-play are meant to incur both new visits of existing MCTS nodes and brand new nodes.)

- During a new iteration after the memory buffer was filled with samples, are some nodes / visits of the current MCTS forgotten, while new ones are created based on the new samples generated during self-play?
- Can we say that the impacted nodes are probably improved by this "creative destruction" process, since their statistics are less polluted by silly moves inspired by the early NN?
- Since the MCTS learns at every iteration, is it correct to say that the agent improves during an iteration even when the NN isn't replaced after the checkpoint evaluation?


jonathan-laurent · 2023-02-22T14:18:21Z

First of all, you should make sure that the buffer size is at least as big as the number of samples generated during a single iteration. Otherwise, you are just throwing away compute.
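As a rough illustration of this check, here is a minimal Python sketch. It assumes the memory buffer behaves as a plain FIFO queue and that the number of samples per iteration is known; the function names and the figure of 120,000 samples per iteration are hypothetical and not taken from the library.

```python
import math


def wasted_samples(buffer_size: int, samples_per_iteration: int) -> int:
    """Samples from one iteration that are evicted before training ever sees them.

    Assumes a FIFO buffer: if one iteration produces more samples than the
    buffer can hold, the overflow is discarded immediately.
    """
    if buffer_size <= 0 or samples_per_iteration <= 0:
        raise ValueError("both sizes must be positive")
    return max(0, samples_per_iteration - buffer_size)


def iterations_to_renew(buffer_size: int, samples_per_iteration: int) -> int:
    """Iterations needed before every sample in a full FIFO buffer has been replaced."""
    if buffer_size <= 0 or samples_per_iteration <= 0:
        raise ValueError("both sizes must be positive")
    return math.ceil(buffer_size / samples_per_iteration)


# Hypothetical figure: 120,000 samples generated per self-play iteration.
samples_per_iteration = 120_000

# A 40K buffer (the size discussed in this thread) cannot hold one iteration.
print(wasted_samples(40_000, samples_per_iteration))

# A 2M buffer holds several iterations of data and wastes nothing.
print(wasted_samples(2_000_000, samples_per_iteration))
print(iterations_to_renew(2_000_000, samples_per_iteration))
```

If the first number is above zero, part of the self-play compute is being thrown away, which is the situation described above.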

If the network is not replaced after an evaluation, it means that it is not significantly better than the one already in use to generate data. However, it is possible for the optimization process to go through local minima and sometimes getting an improvement is going to take several iterations.

As a rule of thumb though, if the network is never updated during a number of consecutive iterations large enough for the memory buffer to be fully renewed, then it is likely that learning has stalled or something is wrong with your hyperparameters. Indeed, in this case, the quality of data in your buffer is not improving anymore and self-play data is being wasted.
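This rule of thumb can be sketched as a small check in Python. It is only an illustration: it assumes a FIFO buffer, a constant number of samples per iteration, and a hand-kept log of whether the network was replaced at each iteration. The function name and the example numbers are hypothetical.

```python
import math
from typing import Sequence


def learning_probably_stalled(
    replaced_history: Sequence[bool],
    buffer_size: int,
    samples_per_iteration: int,
) -> bool:
    """Return True if the network was never replaced over a full buffer renewal.

    replaced_history[i] is True when the network was replaced after the
    checkpoint evaluation of iteration i (oldest first).
    """
    if buffer_size <= 0 or samples_per_iteration <= 0:
        raise ValueError("both sizes must be positive")
    # Number of iterations after which a full FIFO buffer holds only new samples.
    renewal_window = math.ceil(buffer_size / samples_per_iteration)
    if len(replaced_history) < renewal_window:
        return False  # not enough history yet to judge
    return not any(replaced_history[-renewal_window:])


# Hypothetical run: buffer of 40,000 samples, 10,000 new samples per iteration,
# so the buffer is fully renewed every 4 iterations.
history = [True, True, False, False, False, False]
print(learning_probably_stalled(history, 40_000, 10_000))
```

A True result is a hint to revisit the hyperparameters, not proof that training has converged.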

Finally, the advantage of a bigger buffer is that you get more training data and possibly better generalization. You also get better sample efficiency by reusing samples in multiple batch updates. The ideal size of the memory buffer is hard to state in general, and it is a critical hyperparameter to tune. However, 40K samples looks pretty low (2M is typical for connect four).

smart-fr · 2023-02-22T17:24:57Z

Thank you for all the insight.
I understand it probably makes no sense to say that the agent improves during an iteration even when the NN isn't replaced after the checkpoint evaluation.
As a matter of fact, it now appears (after 60+ iterations) that replacing the NN takes almost exactly the number of consecutive iterations needed for the memory buffer to be fully renewed.

jonathan-laurent · 2023-02-22T19:06:49Z

Note that the agent does not technically improve when the NN is not replaced, since the same NN is still used to generate data. However, this does not mean that no progress is being made: progress can come either from improving the quality of the data in the memory buffer or from optimizing the current network (a lower loss may not mean a better NN right now, but it may lead to one in the future).
