reinforce-learningrate #actions1,2,3,4 #45

Open
wants to merge 5 commits into
base: reinforce-learningrate
from

Conversation

jaimepedretp
Collaborator

Experiments that manually modify the action set of an already trained REINFORCE agent to solve Gym CarRacing.

We start from the following model:

Branch --> reinforce-learningrate

Set of actions:

available_actions = [
[0.0, 0.7, 0.0], # throttle
[0.0, 0.5, 0.0], # throttle
[0.0, 0.2, 0.0], # throttle
[0.0, 0.0, 0.7], # brake
[0.0, 0.0, 0.5], # brake
[0.0, 0.0, 0.2], # brake
[-0.8, 0.1, 0.0], # left
[-0.5, 0.1, 0.0], # left
[-0.2, 0.1, 0.0], # left
[0.8, 0.1, 0.0], # right
[0.5, 0.1, 0.0], # right
[0.2, 0.1, 0.0], # right
]
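As a minimal sketch of how such a discrete set is typically used: the policy outputs one probability per entry, an index is sampled, and the matching `[steering, gas, brake]` vector is passed to the environment. The function name `select_action` and the `probs` argument are hypothetical and not taken from this branch.

```python
import random

# Action set copied from the list above: [steering, gas, brake].
AVAILABLE_ACTIONS = [
    [0.0, 0.7, 0.0], [0.0, 0.5, 0.0], [0.0, 0.2, 0.0],     # throttle
    [0.0, 0.0, 0.7], [0.0, 0.0, 0.5], [0.0, 0.0, 0.2],     # brake
    [-0.8, 0.1, 0.0], [-0.5, 0.1, 0.0], [-0.2, 0.1, 0.0],  # left
    [0.8, 0.1, 0.0], [0.5, 0.1, 0.0], [0.2, 0.1, 0.0],     # right
]


def select_action(probs, rng=random):
    """Sample an action index from the policy probabilities.

    `probs` is assumed to hold one probability per entry in
    AVAILABLE_ACTIONS (hypothetical interface, not the branch's code).
    Returns the sampled index and the control vector for the environment.
    """
    if len(probs) != len(AVAILABLE_ACTIONS):
        raise ValueError("policy output size must match the action set size")
    index = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return index, AVAILABLE_ACTIONS[index]
```

The sampled index is what a REINFORCE update would use for the log-probability term, while only the control vector reaches the simulator.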

Average Reward.

image

The model was stuck between roughly 200 and 600 average reward. There were some high-reward periods, with the average reaching up to 800 between episodes 46k and 55k.

We analyzed what was happening by watching the recorded video. We noticed that the agent could not handle low-speed corners correctly: it was not even braking.
The car kept accelerating until its speed was too high to manage any corner.

openaigym.video.0.632780.video000000.mp4

We started an investigation using different sets of actions, but starting from an already trained network (the one in the picture above).

Branch --> reinforce-learningrate-act1

We changed the set of actions, trying to reduce speed.

available_actions = [
[0.0, 0.2, 0.0], # throttle – lower acc (from 0.7 to 0.2)
[0.0, 0.1, 0.0], # throttle – lower acc (from 0.5 to 0.1)
[0.0, 0.0, 0.0], # no action
[0.0, 0.0, 0.7], # brake
[0.0, 0.0, 0.5], # brake
[0.0, 0.0, 0.2], # brake
[-1.0, 0.0, 0.0], # left – more steering angle (from -0.8 to -1) / and no throttle when turning
[-0.5, 0.0, 0.0], # left
[-0.2, 0.0, 0.0], # left
[1.0, 0.0, 0.0], # right – more steering angle (from 0.8 to 1) / and no throttle when turning
[0.5, 0.0, 0.0], # right
[0.2, 0.0, 0.0], # right
]
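Since the experiments reuse an already trained network, a new action set presumably has to keep the same number of entries (12 here) so that each policy output still maps to an action. The values also need to stay inside Gym CarRacing's action bounds: steering in [-1, 1], gas and brake in [0, 1]. A small, hypothetical sanity check (the name `validate_action_set` is not part of this branch) could look like this:

```python
# Action set from the list above: [steering, gas, brake].
ACT1_ACTIONS = [
    [0.0, 0.2, 0.0], [0.0, 0.1, 0.0], [0.0, 0.0, 0.0],     # throttle / no action
    [0.0, 0.0, 0.7], [0.0, 0.0, 0.5], [0.0, 0.0, 0.2],     # brake
    [-1.0, 0.0, 0.0], [-0.5, 0.0, 0.0], [-0.2, 0.0, 0.0],  # left
    [1.0, 0.0, 0.0], [0.5, 0.0, 0.0], [0.2, 0.0, 0.0],     # right
]


def validate_action_set(actions, expected_size=12):
    """Return a list of problems; an empty list means the set looks usable.

    expected_size is the assumed number of outputs of the trained policy.
    """
    problems = []
    if len(actions) != expected_size:
        problems.append(f"expected {expected_size} actions, got {len(actions)}")
    for i, action in enumerate(actions):
        if len(action) != 3:
            problems.append(f"action {i}: expected 3 values, got {len(action)}")
            continue
        steering, gas, brake = action
        if not -1.0 <= steering <= 1.0:
            problems.append(f"action {i}: steering {steering} outside [-1, 1]")
        if not 0.0 <= gas <= 1.0:
            problems.append(f"action {i}: gas {gas} outside [0, 1]")
        if not 0.0 <= brake <= 1.0:
            problems.append(f"action {i}: brake {brake} outside [0, 1]")
    return problems
```

Running it before loading the trained weights gives an early warning if an edited list no longer matches the network's output layer.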

Results improved significantly:

image

Now it was quite clear that the car was not losing the track so easily, since acceleration was reduced.

We had some good examples when we were lucky and the track was easy, with many straights and few sharp corners.

929_reward_act1_good_example.mp4

But there were also bad examples when the track had sharp corners. The car was still driving too fast to turn successfully.

554_reward_act1_bad_example.mp4

Branch --> reinforce-learningrate-act2

From this point, we thought about introducing some braking when turning and adding some more acceleration to compensate for braking at corners.

available_actions = [
[0.0, 0.3, 0.0], # throttle – higher acc (from 0.2 to 0.3)
[0.0, 0.1, 0.0], # throttle
[0.0, 0.0, 0.0], # no action
[0.0, 0.0, 0.7], # brake
[0.0, 0.0, 0.5], # brake
[0.0, 0.0, 0.2], # brake
[-1.0, 0.0, 0.2], # left – slight braking at corners (from 0.0 to 0.2)
[-0.5, 0.0, 0.2], # left – slight braking at corners (from 0.0 to 0.2)
[-0.2, 0.0, 0.2], # left – slight braking at corners (from 0.0 to 0.2)
[1.0, 0.0, 0.2], # right – slight braking at corners (from 0.0 to 0.2)
[0.5, 0.0, 0.2], # right – slight braking at corners (from 0.0 to 0.2)
[0.2, 0.0, 0.2], # right – slight braking at corners (from 0.0 to 0.2)
]

Here we were obviously too conservative on braking, but we thought it might be a good path to prevent the car from accelerating too much and crashing in sharp corners.

openaigym.video.0.610454.video000000.mp4

Branch --> reinforce-learningrate-act3

We adjusted corner braking and results were as follows:

available_actions = [
[0.0, 0.3, 0.0], # throttle
[0.0, 0.1, 0.0], # throttle
[0.0, 0.0, 0.0], # no action
[0.0, 0.0, 0.7], # brake
[0.0, 0.0, 0.5], # brake
[0.0, 0.0, 0.2], # brake
[-1.0, 0.0, 0.05], # left – slight braking at corners (from 0.2 to 0.05)
[-0.5, 0.0, 0.05], # left – slight braking at corners (from 0.2 to 0.05)
[-0.2, 0.0, 0.05], # left – slight braking at corners (from 0.2 to 0.05)
[1.0, 0.0, 0.05], # right – slight braking at corners (from 0.2 to 0.05)
[0.5, 0.0, 0.05], # right – slight braking at corners (from 0.2 to 0.05)
[0.2, 0.0, 0.05], # right – slight braking at corners (from 0.2 to 0.05)
]

image

Results were even better (final orange section of the curve), with a maximum average reward of 889.

openaigym.video.0.618566.video000000.mp4

It seems we are very close to an average reward of 900, but we think the action setup is critical.
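For reference, here is a minimal sketch of how an average-reward curve can be computed from per-episode rewards. It assumes a trailing moving average over the last 100 episodes; the window actually used for the plots in this PR is not stated, and `moving_average` is a hypothetical helper.

```python
from collections import deque


def moving_average(rewards, window=100):
    """Trailing mean of the last `window` episode rewards.

    Early entries average over fewer episodes until the window fills.
    The default window of 100 is an assumption, not taken from this PR.
    """
    recent = deque()
    total = 0.0
    averages = []
    for reward in rewards:
        recent.append(reward)
        total += reward
        if len(recent) > window:
            total -= recent.popleft()  # drop the oldest episode
        averages.append(total / len(recent))
    return averages
```

With such a helper, `max(moving_average(episode_rewards))` gives the peak average that can be compared against the 900 target.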

Branch --> reinforce-learningrate-act4

We tried minimal changes to check if we could see better results.

available_actions = [
[0.0, 0.25, 0.0], # throttle – lower acc (from 0.3 to 0.25)
[0.0, 0.1, 0.0], # throttle
[0.0, 0.0, 0.0], # no action
[0.0, 0.0, 0.7], # brake
[0.0, 0.0, 0.5], # brake
[0.0, 0.0, 0.2], # brake
[-1.0, 0.0, 0.04], # left – slight braking at corners (from 0.05 to 0.04)
[-0.5, 0.0, 0.04], # left – slight braking at corners (from 0.05 to 0.04)
[-0.2, 0.0, 0.04], # left – slight braking at corners (from 0.05 to 0.04)
[1.0, 0.0, 0.04], # right – slight braking at corners (from 0.05 to 0.04)
[0.5, 0.0, 0.04], # right – slight braking at corners (from 0.05 to 0.04)
[0.2, 0.0, 0.04], # right – slight braking at corners (from 0.05 to 0.04)
]

image

Average reward results were now similar to those of the previous action set (final blue area of the curve).
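The action sets tried in these branches differ only in a handful of numbers (throttle levels, steering levels, and the gas or brake applied while turning). The sketch below shows one way to generate them from those parameters, which may make further tuning less error-prone. `build_action_set` and its parameter names are hypothetical and not part of the branches.

```python
def build_action_set(throttles, brakes, steer_levels,
                     corner_gas=0.0, corner_brake=0.0):
    """Build a discrete [steering, gas, brake] action list.

    Order matches the lists in this PR: throttle, brake, left, right.
    corner_gas / corner_brake are applied to every steering action.
    """
    actions = [[0.0, t, 0.0] for t in throttles]
    actions += [[0.0, 0.0, b] for b in brakes]
    actions += [[-s, corner_gas, corner_brake] for s in steer_levels]  # left
    actions += [[s, corner_gas, corner_brake] for s in steer_levels]   # right
    return actions


# Reproduces the act3 set: slight braking (0.05) while turning.
act3 = build_action_set(
    throttles=[0.3, 0.1, 0.0],
    brakes=[0.7, 0.5, 0.2],
    steer_levels=[1.0, 0.5, 0.2],
    corner_brake=0.05,
)
```

Because the ordering and the number of entries stay fixed, every generated set keeps the same index layout as the original 12 actions.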

openaigym.video.0.630905.video000000.mp4

CONCLUSION:

The set of available actions plays a key role in agent performance. It is not possible to solve the CarRacing environment with a poor set of actions using REINFORCE.
Defining a good set of actions seems as important as defining a proper neural network to solve this environment.

To reach an average reward of 900, additional fine-tuning of the actions or a continuous action space should be used.
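As a rough illustration of the continuous alternative, raw (unbounded) policy outputs would need to be mapped into CarRacing's action bounds: steering in [-1, 1], gas and brake in [0, 1]. The sketch below uses tanh and a sigmoid for that. It is one common choice, not something tested in these branches, and `squash_to_car_racing_action` is a hypothetical name.

```python
import math


def squash_to_car_racing_action(raw_steering, raw_gas, raw_brake):
    """Map unbounded values to a valid [steering, gas, brake] vector."""
    def sigmoid(x):
        # tanh-based form avoids overflow for large negative inputs
        return 0.5 * (math.tanh(0.5 * x) + 1.0)

    steering = math.tanh(raw_steering)  # -> [-1, 1]
    gas = sigmoid(raw_gas)              # -> [0, 1]
    brake = sigmoid(raw_brake)          # -> [0, 1]
    return [steering, gas, brake]
```

A continuous policy would let the agent choose intermediate values such as a brake of 0.04 or 0.05 on its own instead of relying on hand-tuned entries.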
