[RL-baseline] Model v5, experiment #3 #46

ziritrion · 2021-04-05T07:57:07Z

Action set #2 was chosen for this experiment:
[0.0, 0.0, 0.0], # no action
[0.0, 0.8, 0.0], # throttle
[0.0, 0.0, 0.6], # brake
[-0.9, 0.0, 0.0], # left
[0.9, 0.0, 0.0], # right
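
For reference, here is a minimal sketch of how a discrete action set like this could be mapped to continuous action vectors. The `[steering, throttle, brake]` ordering is inferred from the comments above. The names `ACTION_SET_2` and `to_continuous` are hypothetical and not taken from the repository.

```python
# Action set #2 as listed above.
# Assumed component order (inferred from the comments): [steering, throttle, brake].
ACTION_SET_2 = [
    [0.0, 0.0, 0.0],   # no action
    [0.0, 0.8, 0.0],   # throttle
    [0.0, 0.0, 0.6],   # brake
    [-0.9, 0.0, 0.0],  # left
    [0.9, 0.0, 0.0],   # right
]


def to_continuous(action_index):
    """Return a copy of the continuous action vector for a discrete action index.

    Hypothetical helper: the policy is assumed to output an integer in
    range(len(ACTION_SET_2)), which is translated here before being passed
    to the environment.
    """
    return list(ACTION_SET_2[action_index])


if __name__ == "__main__":
    for index in range(len(ACTION_SET_2)):
        print(index, to_continuous(index))
```

With this layout the policy only needs one output per entry in the table (five here), and changing the action set means editing the table rather than the network code.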

The Running Reward oscillated around 200 up until the 20k episode mark, where it suddenly dropped. Since the other experiments had gone through long intervals of low reward during training, I decided to extend training to 30k episodes to see whether the RR would recover, but the result was disappointing.

The final Running Reward was 72, with a maximum of 379 reached around the 4k episode mark.
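
As a rough illustration of how a final and a maximum running reward like the ones above can be tracked, here is a sketch that assumes the running reward is an exponential moving average of episode rewards. That formula, the smoothing factor `alpha`, and the function names are assumptions made for illustration. The description does not state how the Running Reward is actually computed.

```python
def update_running_reward(running_reward, episode_reward, alpha=0.05):
    """Exponential moving average update (assumed formula, not confirmed by the PR)."""
    return alpha * episode_reward + (1 - alpha) * running_reward


def track_running_reward(episode_rewards, alpha=0.05):
    """Return (final running reward, max running reward, episode where the max occurred).

    Episodes are numbered from 1. The running reward starts at 0.0,
    which is also an assumption.
    """
    running = 0.0
    best = float("-inf")
    best_episode = None
    for episode, reward in enumerate(episode_rewards, start=1):
        running = update_running_reward(running, reward, alpha)
        if running > best:
            best, best_episode = running, episode
    return running, best, best_episode


if __name__ == "__main__":
    # Made-up rewards purely to exercise the function; not data from this experiment.
    final, best, best_episode = track_running_reward([100.0, 0.0], alpha=0.5)
    print(final, best, best_episode)
```

Recording the episode at which the maximum occurs makes it easier to decide which checkpoint to keep when the reward collapses late in training.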

TensorBoard screenshots below:

Sample video below:
https://user-images.githubusercontent.com/1465235/113551628-44479480-95f5-11eb-9a6f-dc3a6e61ced5.mp4

ziritrion added 3 commits April 2, 2021 12:01

Start of experiment 3 with action set #2

5fb1c26

Changed to 30k episodes due to sudden reward drop

ef2f614

30k episodes, running reward 72

274c87b
