reinforce-learningrate #actions1,2,3,4 #45
Experiments manually modifying the actions of a trained REINFORCE agent to solve gym CarRacing.
We start from the following model:
Branch --> reinforce-learningrate
Set of actions:
available_actions = [
[0.0, 0.7, 0.0], # throttle
[0.0, 0.5, 0.0], # throttle
[0.0, 0.2, 0.0], # throttle
[0.0, 0.0, 0.7], # brake
[0.0, 0.0, 0.5], # brake
[0.0, 0.0, 0.2], # brake
[-0.8, 0.1, 0.0], # left
[-0.5, 0.1, 0.0], # left
[-0.2, 0.1, 0.0], # left
[0.8, 0.1, 0.0], # right
[0.5, 0.1, 0.0], # right
[0.2, 0.1, 0.0], # right
]
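For reference, a discrete REINFORCE policy consumes this table by producing a probability distribution over the 12 indices and passing the selected row to the environment. Below is a minimal sketch of that loop; the policy_probs placeholder and the old four-value gym step API are assumptions for illustration, not the repository's actual code.

```python
import gym
import numpy as np

# Same 12 [steering, gas, brake] triples as listed above.
available_actions = [
    [0.0, 0.7, 0.0], [0.0, 0.5, 0.0], [0.0, 0.2, 0.0],     # throttle
    [0.0, 0.0, 0.7], [0.0, 0.0, 0.5], [0.0, 0.0, 0.2],     # brake
    [-0.8, 0.1, 0.0], [-0.5, 0.1, 0.0], [-0.2, 0.1, 0.0],  # left
    [0.8, 0.1, 0.0], [0.5, 0.1, 0.0], [0.2, 0.1, 0.0],     # right
]

def policy_probs(observation):
    # Placeholder for the trained network's forward pass, which should
    # return a softmax distribution over the 12 actions. A uniform
    # distribution is used here only so the sketch runs end to end.
    return np.ones(len(available_actions)) / len(available_actions)

env = gym.make("CarRacing-v0")
obs = env.reset()
done, episode_reward = False, 0.0
while not done:
    idx = np.random.choice(len(available_actions), p=policy_probs(obs))
    obs, reward, done, info = env.step(np.array(available_actions[idx]))
    episode_reward += reward
print("episode reward:", episode_reward)
```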
(Figure: average reward during training.)
The model was stuck between average rewards of roughly 200 and 600. There were some high-average-reward moments, with the average reaching up to 800 between 46k and 55k episodes.
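The averages quoted here are running averages over recent episodes; as a point of reference, a common way to track such a metric is a fixed window over the latest episode returns (the 100-episode window below is an assumption, not necessarily what the training script uses).

```python
from collections import deque

reward_window = deque(maxlen=100)  # assumed window size
average_history = []

def record_episode(episode_reward):
    # Store the finished episode's return and log the running average
    # that would be plotted as the curves referred to in these experiments.
    reward_window.append(episode_reward)
    average_history.append(sum(reward_window) / len(reward_window))
    return average_history[-1]
```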
We analyzed what was happening by watching the actual video. The agent was not able to handle low-speed corners correctly; it was not even braking.
The car kept accelerating until the speed was too high to manage any corner.
openaigym.video.0.632780.video000000.mp4
We started an investigation using different sets of actions, but starting from the already trained network (the one in the plot above).
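In other words, only the action table changes from here on; the network weights are carried over from the previous run. As one possible shape for that, here is a minimal sketch of running a new action table through the frozen network, reusing the policy_probs placeholder from the sketch above (in the experiments below the network keeps learning as well; the greedy evaluation here is only an illustration, not the repository's actual code).

```python
import gym
import numpy as np

def evaluate(policy_probs, actions, episodes=10):
    """Run the trained policy with a given action table and return the
    average episode reward. `policy_probs` wraps the already-trained
    network; only `actions` differs between the -act1 ... -act4 branches."""
    env = gym.make("CarRacing-v0")
    returns = []
    for _ in range(episodes):
        obs, done, total = env.reset(), False, 0.0
        while not done:
            idx = int(np.argmax(policy_probs(obs)))  # greedy action choice
            obs, reward, done, info = env.step(np.array(actions[idx]))
            total += reward
        returns.append(total)
    return float(np.mean(returns))
```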
Branch --> reinforce-learningrate-act1
We changed the set of actions, trying to reduce speed.
available_actions = [
[0.0, 0.2, 0.0], # throttle – lower acc (from 0.7 to 0.2)
[0.0, 0.1, 0.0], # throttle – lower acc (from 0.5 to 0.1)
[0.0, 0.0, 0.0], # no action
[0.0, 0.0, 0.7], # brake
[0.0, 0.0, 0.5], # brake
[0.0, 0.0, 0.2], # brake
[-1.0, 0.0, 0.0], # left – more steering angle (from -0.8 to -1) / and no throttle when turning
[-0.5, 0.0, 0.0], # left
[-0.2, 0.0, 0.0], # left
[1.0, 0.0, 0.0], # right – more steering angle (from 0.8 to 1) / and no throttle when turning
[0.5, 0.0, 0.0], # right
[0.2, 0.0, 0.0], # right
]
Results improved significantly.
It was now quite clear that the car was not losing the track as easily, since acceleration was reduced.
We had some good examples when we were lucky and the track was easy, with many straights and few sharp corners.
929_reward_act1_good_example.mp4
But there were also bad examples when the track had sharp corners. The car was still driving too fast to turn successfully.
554_reward_act1_bad_example.mp4
Branch --> reinforce-learningrate-act2
From this point, we thought about introducing some braking when turning, and adding a bit more acceleration to compensate for the braking at corners.
available_actions = [
[0.0, 0.3, 0.0], # throttle – higher acc (from 0.2 to 0.3)
[0.0, 0.1, 0.0], # throttle
[0.0, 0.0, 0.0], # no action
[0.0, 0.0, 0.7], # brake
[0.0, 0.0, 0.5], # brake
[0.0, 0.0, 0.2], # brake
[-1.0, 0.0, 0.2], # left – slight braking at corners (from 0.0 to 0.2)
[-0.5, 0.0, 0.2], # left – slight braking at corners (from 0.0 to 0.2)
[-0.2, 0.0, 0.2], # left – slight braking at corners (from 0.0 to 0.2)
[1.0, 0.0, 0.2], # right – slight braking at corners (from 0.0 to 0.2)
[0.5, 0.0, 0.2], # right – slight braking at corners (from 0.0 to 0.2)
[0.2, 0.0, 0.2], # right – slight braking at corners (from 0.0 to 0.2)
]
Here we were obviously too conservative with the braking, but we thought it might be a good working path to prevent the car from accelerating too much and crashing on sharp corners.
openaigym.video.0.610454.video000000.mp4
Branch --> reinforce-learningrate-act3
We adjusted the corner braking, and the results were as follows:
available_actions = [
[0.0, 0.3, 0.0], # throttle
[0.0, 0.1, 0.0], # throttle
[0.0, 0.0, 0.0], # throttle
[0.0, 0.0, 0.7], # brake
[0.0, 0.0, 0.5], # brake
[0.0, 0.0, 0.2], # brake
[-1.0, 0.0, 0.05], # left – slight braking at corners (from 0.2 to 0.05)
[-0.5, 0.0, 0.05], # left – slight braking at corners (from 0.2 to 0.05)
[-0.2, 0.0, 0.05], # left – slight braking at corners (from 0.2 to 0.05)
[1.0, 0.0, 0.05], # right – slight braking at corners (from 0.2 to 0.05)
[0.5, 0.0, 0.05], # right – slight braking at corners (from 0.2 to 0.05)
[0.2, 0.0, 0.05], # right – slight braking at corners (from 0.2 to 0.05)
]
Results were even better (final orange reward curve), with a maximum average reward of 889.
openaigym.video.0.618566.video000000.mp4
It seems we are very close to an average reward of 900, but we think the action setup is critical.
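Since these branches differ only in a couple of scalar values (the main throttle and the corner brake), a small parametric builder makes this kind of fine-tuning explicit. A sketch, with hypothetical parameter names:

```python
def build_actions(throttle=0.3, corner_brake=0.05):
    """Build the 12-entry action table from the two values actually
    tuned in these branches (parameter names are hypothetical)."""
    return [
        [0.0, throttle, 0.0],       # throttle (main)
        [0.0, 0.1, 0.0],            # throttle (light)
        [0.0, 0.0, 0.0],            # no action
        [0.0, 0.0, 0.7],            # brake (hard)
        [0.0, 0.0, 0.5],            # brake (medium)
        [0.0, 0.0, 0.2],            # brake (light)
        [-1.0, 0.0, corner_brake],  # left (full)
        [-0.5, 0.0, corner_brake],  # left (medium)
        [-0.2, 0.0, corner_brake],  # left (slight)
        [1.0, 0.0, corner_brake],   # right (full)
        [0.5, 0.0, corner_brake],   # right (medium)
        [0.2, 0.0, corner_brake],   # right (slight)
    ]

# reinforce-learningrate-act2: build_actions(throttle=0.3, corner_brake=0.2)
# reinforce-learningrate-act3: build_actions(throttle=0.3, corner_brake=0.05)
```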
Branch --> reinforce-learningrate-act4
We tried minimal changes to check whether we could get better results.
available_actions = [
[0.0, 0.25, 0.0], # throttle – lower acc (from 0.3 to 0.25)
[0.0, 0.1, 0.0], # throttle
[0.0, 0.0, 0.0], # throttle
[0.0, 0.0, 0.7], # brake
[0.0, 0.0, 0.5], # brake
[0.0, 0.0, 0.2], # brake
[-1.0, 0.0, 0.04], # left – slight braking at corners (from 0.05 to 0.04)
[-0.5, 0.0, 0.04], # left – slight braking at corners (from 0.05 to 0.04)
[-0.2, 0.0, 0.04], # left – slight braking at corners (from 0.05 to 0.04)
[1.0, 0.0, 0.04], # right – slight braking at corners (from 0.05 to 0.04)
[0.5, 0.0, 0.04], # right – slight braking at corners (from 0.05 to 0.04)
[0.2, 0.0, 0.04], # right – slight braking at corners (from 0.05 to 0.04)
]
Average reward results were now similar to those of the previous action set (final blue curve area).
openaigym.video.0.630905.video000000.mp4
CONCLUSION:
The set of available actions plays a key role in agent performance. It is not possible to solve the CarRacing environment with a poor set of actions using REINFORCE.
Defining a good set of actions seems as important as defining a proper neural network for solving this environment.
To reach an average reward of 900, additional fine-tuning of the actions, or a continuous action space, should be used.
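As a pointer on the continuous-action direction, one common approach with REINFORCE is a Gaussian policy head over [steering, gas, brake] instead of a softmax over a fixed table. A minimal sketch of the idea (PyTorch is used here only for illustration and is not necessarily the framework of this repository):

```python
import torch
import torch.nn as nn

class GaussianHead(nn.Module):
    """Maps policy features to a diagonal Gaussian over the three
    continuous CarRacing controls instead of a fixed action table."""
    def __init__(self, feature_dim):
        super().__init__()
        self.mean = nn.Linear(feature_dim, 3)
        self.log_std = nn.Parameter(torch.zeros(3))  # learned, state-independent

    def forward(self, features):
        dist = torch.distributions.Normal(self.mean(features), self.log_std.exp())
        raw = dist.sample()
        # Log-probability used in the REINFORCE loss (the correction for
        # the squashing below is omitted for brevity).
        log_prob = dist.log_prob(raw).sum(-1)
        # Squash into valid ranges: steering in [-1, 1], gas/brake in [0, 1].
        action = torch.cat([torch.tanh(raw[..., :1]),
                            torch.sigmoid(raw[..., 1:])], dim=-1)
        return action, log_prob
```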