[RL-baseline] Model v5, experiment #4 #47

Open
wants to merge 2 commits into
base: RL-baseline-v5

ziritrion
Collaborator

For this experiment, a brand new action set with much finer granularity was chosen. It has 81 actions, each in the form [steering, throttle, brake]:
[0.0, 0.0, 0.0], # no action
[0.0, 0.9, 0.0], # throttle high
[0.0, 0.6, 0.0], # throttle medium-high
[0.0, 0.4, 0.0], # throttle medium-low
[0.0, 0.2, 0.0], # throttle low
[0.0, 0.0, 0.9], # brake high
[0.0, 0.0, 0.6], # brake medium-high
[0.0, 0.0, 0.4], # brake medium-low
[0.0, 0.0, 0.2], # brake low
[-0.9, 0.9, 0.0], # left high, throttle high
[-0.9, 0.6, 0.0], # left high, throttle medium-high
[-0.9, 0.4, 0.0], # left high, throttle medium-low
[-0.9, 0.2, 0.0], # left high, throttle low
[-0.9, 0.0, 0.9], # left high, brake high
[-0.9, 0.0, 0.6], # left high, brake medium-high
[-0.9, 0.0, 0.4], # left high, brake medium-low
[-0.9, 0.0, 0.2], # left high, brake low
[-0.9, 0.0, 0.0], # left high, no throttle
[-0.6, 0.9, 0.0], # left medium-high, throttle high
[-0.6, 0.6, 0.0], # left medium-high, throttle medium-high
[-0.6, 0.4, 0.0], # left medium-high, throttle medium-low
[-0.6, 0.2, 0.0], # left medium-high, throttle low
[-0.6, 0.0, 0.9], # left medium-high, brake high
[-0.6, 0.0, 0.6], # left medium-high, brake medium-high
[-0.6, 0.0, 0.4], # left medium-high, brake medium-low
[-0.6, 0.0, 0.2], # left medium-high, brake low
[-0.6, 0.0, 0.0], # left medium-high, no throttle
[-0.4, 0.9, 0.0], # left medium-low, throttle high
[-0.4, 0.6, 0.0], # left medium-low, throttle medium-high
[-0.4, 0.4, 0.0], # left medium-low, throttle medium-low
[-0.4, 0.2, 0.0], # left medium-low, throttle low
[-0.4, 0.0, 0.9], # left medium-low, brake high
[-0.4, 0.0, 0.6], # left medium-low, brake medium-high
[-0.4, 0.0, 0.4], # left medium-low, brake medium-low
[-0.4, 0.0, 0.2], # left medium-low, brake low
[-0.4, 0.0, 0.0], # left medium-low, no throttle
[-0.2, 0.9, 0.0], # left low, throttle high
[-0.2, 0.6, 0.0], # left low, throttle medium-high
[-0.2, 0.4, 0.0], # left low, throttle medium-low
[-0.2, 0.2, 0.0], # left low, throttle low
[-0.2, 0.0, 0.9], # left low, brake high
[-0.2, 0.0, 0.6], # left low, brake medium-high
[-0.2, 0.0, 0.4], # left low, brake medium-low
[-0.2, 0.0, 0.2], # left low, brake low
[-0.2, 0.0, 0.0], # left low, no throttle
[0.9, 0.9, 0.0], # right high, throttle high
[0.9, 0.6, 0.0], # right high, throttle medium-high
[0.9, 0.4, 0.0], # right high, throttle medium-low
[0.9, 0.2, 0.0], # right high, throttle low
[0.9, 0.0, 0.9], # right high, brake high
[0.9, 0.0, 0.6], # right high, brake medium-high
[0.9, 0.0, 0.4], # right high, brake medium-low
[0.9, 0.0, 0.2], # right high, brake low
[0.9, 0.0, 0.0], # right high, no throttle
[0.6, 0.9, 0.0], # right medium-high, throttle high
[0.6, 0.6, 0.0], # right medium-high, throttle medium-high
[0.6, 0.4, 0.0], # right medium-high, throttle medium-low
[0.6, 0.2, 0.0], # right medium-high, throttle low
[0.6, 0.0, 0.9], # right medium-high, brake high
[0.6, 0.0, 0.6], # right medium-high, brake medium-high
[0.6, 0.0, 0.4], # right medium-high, brake medium-low
[0.6, 0.0, 0.2], # right medium-high, brake low
[0.6, 0.0, 0.0], # right medium-high, no throttle
[0.4, 0.9, 0.0], # right medium-low, throttle high
[0.4, 0.6, 0.0], # right medium-low, throttle medium-high
[0.4, 0.4, 0.0], # right medium-low, throttle medium-low
[0.4, 0.2, 0.0], # right medium-low, throttle low
[0.4, 0.0, 0.9], # right medium-low, brake high
[0.4, 0.0, 0.6], # right medium-low, brake medium-high
[0.4, 0.0, 0.4], # right medium-low, brake medium-low
[0.4, 0.0, 0.2], # right medium-low, brake low
[0.4, 0.0, 0.0], # right medium-low, no throttle
[0.2, 0.9, 0.0], # right low, throttle high
[0.2, 0.6, 0.0], # right low, throttle medium-high
[0.2, 0.4, 0.0], # right low, throttle medium-low
[0.2, 0.2, 0.0], # right low, throttle low
[0.2, 0.0, 0.9], # right low, brake high
[0.2, 0.0, 0.6], # right low, brake medium-high
[0.2, 0.0, 0.4], # right low, brake medium-low
[0.2, 0.0, 0.2], # right low, brake low
[0.2, 0.0, 0.0], # right low, no throttle
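
The list above follows a regular pattern, so it could also be generated instead of typed out by hand. The snippet below is only a sketch of that idea: `LEVELS` and `build_action_set` are hypothetical names that are not part of this PR, and the code assumes the same four magnitudes (0.9, 0.6, 0.4, 0.2) and the same ordering as the list above.

```python
# Hypothetical helper, not taken from the PR: rebuilds the 81-action list above.
LEVELS = [0.9, 0.6, 0.4, 0.2]  # high, medium-high, medium-low, low


def build_action_set(levels=LEVELS):
    """Return a list of [steering, throttle, brake] actions in the order listed above."""
    throttle = [(t, 0.0) for t in levels]  # throttle only, brake released
    brake = [(0.0, b) for b in levels]     # brake only, throttle released
    idle = [(0.0, 0.0)]                    # neither pedal pressed

    # Going straight: "no action" comes first, then throttle, then brake.
    actions = [[0.0, t, b] for t, b in idle + throttle + brake]

    # Steering left (negative) first, then right (positive), strongest first.
    steering = [-s for s in levels] + list(levels)
    for s in steering:
        # For each steering value: throttle levels, brake levels, then no pedal.
        actions += [[s, t, b] for t, b in throttle + brake + idle]
    return actions


ACTIONS = build_action_set()
print(len(ACTIONS))
```

Generating the list this way would make it easier to try other granularities by editing `LEVELS`, at the cost of the list being less readable at a glance.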

The maximum Running Reward was 448, reached around the 3.5k episode mark. For most of the experiment, however, the Running Reward was negative, and the run ended on a very sharp drop: the final value was -13, even though it had been above 200 fewer than 50 episodes earlier.

I believe the results could likely be improved with additional training. However, after the success we've had with fine-tuning the action set in the REINFORCE experiments, we believe the only way to get noticeably better results than what we have so far with REINFORCE with Baseline is to limit the action set so that the network cannot choose actions with catastrophic consequences, which essentially means driving slowly.
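
As a rough illustration of what limiting the action set could look like, the sketch below drops every action whose throttle exceeds a cap. The function name `limit_actions` and the 0.4 threshold are assumptions made for this example only; the PR does not specify which actions would be removed.

```python
# Hypothetical sketch: restrict a [steering, throttle, brake] action set
# by removing high-throttle actions. The cap value is an assumption.
def limit_actions(actions, max_throttle=0.4):
    """Keep only the actions whose throttle component is at most max_throttle."""
    return [a for a in actions if a[1] <= max_throttle]


# A few actions taken from the list in this PR, used as sample input.
sample = [
    [0.0, 0.9, 0.0],   # throttle high
    [0.0, 0.4, 0.0],   # throttle medium-low
    [-0.9, 0.2, 0.0],  # left high, throttle low
    [0.2, 0.0, 0.9],   # right low, brake high
]

slow_actions = limit_actions(sample)
print(slow_actions)
```

With a discrete policy, the network's output layer would also need to be resized to the length of the reduced list.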

TensorBoard screenshots below:
[two TensorBoard screenshots; images not preserved]

Sample video below:
https://user-images.githubusercontent.com/1465235/113552629-aead0480-95f6-11eb-8e44-0c70b081d4c2.mp4

@ziritrion ziritrion changed the base branch from main to RL-baseline-v5 April 5, 2021 08:09
@ziritrion ziritrion changed the title from "Rl baseline v5 exp4" to "[RL-baseline] Model v5, experiment #4" Apr 5, 2021