[PPO - early-step / green penalty] Value Function coeff c1 = 2.0, entropy coeff c2 = 0.08 #63

xeviknal · 2021-04-18T13:00:41Z

While tuning the PPO hyperparameters, we found that increasing the value function coefficient (c1) from 1.0 to 2.0 boosts training.

The first part of the training (orange) uses c1 = 1.0, whereas the second part uses c1 = 2.0.

However, running a whole training with c1 = 2.0 and c2 = 0.08 (the entropy coefficient) shows that the model ends up overfitting and the entropy drops sharply.
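For reference, here is a minimal sketch of how c1 and c2 typically enter the combined PPO loss. It is not the code from this PR: the function name `ppo_loss`, the `clip_eps=0.2` default, and the discrete action distribution are assumptions made for illustration only.

```python
import math


def ppo_loss(ratios, advantages, values, returns, action_probs,
             c1=2.0, c2=0.08, clip_eps=0.2):
    """Combined PPO loss (to be minimized) over a batch of transitions.

    Hypothetical helper: ratios are pi_new/pi_old per sample, action_probs
    holds one discrete probability distribution per sample (an assumption).
    """
    n = len(ratios)

    # Clipped surrogate objective, negated so it can be minimized.
    surrogate = 0.0
    for r, a in zip(ratios, advantages):
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, r))
        surrogate += min(r * a, clipped * a)
    policy_loss = -surrogate / n

    # Value function loss: mean squared error against the returns.
    value_loss = sum((v - g) ** 2 for v, g in zip(values, returns)) / n

    # Mean policy entropy; higher entropy means more exploration.
    entropy = sum(
        -sum(p * math.log(p) for p in probs if p > 0.0)
        for probs in action_probs
    ) / n

    # c1 scales the value loss, c2 scales the entropy bonus.
    total = policy_loss + c1 * value_loss - c2 * entropy
    return total, policy_loss, value_loss, entropy
```

In this formulation, going from c1 = 1.0 to c1 = 2.0 doubles the weight of the value error relative to the policy and entropy terms, which may be one reason the entropy bonus at c2 = 0.08 is not enough to keep the policy exploring.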

The early-stop wrapper makes it easy for the agent to learn to go fast and straight ahead. However, once it has learned this, it is not able to learn how to drive through curves.

Running experiment with c1=2, c2=0.08

c4bbe2b
