About PPO train on the Intersection and DQN train on the highway-v0 #585

Open
AHPUymhd opened this issue Apr 1, 2024 · 4 comments

AHPUymhd commented Apr 1, 2024

Hello dear authors, thanks for your contributions to highway-env. I recently ran into some problems when training agents with stable-baselines3:
1. I trained DQN on highway-v0 for 20,000 steps, but it only learned to steer to the far right; it cannot dodge vehicles, let alone overtake. The code is the official documentation code, as follows:
[3 images: code screenshots]
Is there anything wrong with this code?
2. I trained PPO on the intersection environment for 400,000 steps, but the results were very poor and I don't know what went wrong. Can you help me? The code is as follows:
[2 images: code screenshots]

Author

AHPUymhd commented Apr 1, 2024

@eleurent

Collaborator

eleurent commented Apr 6, 2024

Hi,
For the highway-v0 run, I think the problem is that the observation is configured with absolute coordinates (absolute: True) instead of relative ones (absolute: False). This means that the observed features (e.g. the x position) will diverge quickly as the vehicle drives along the road, so the learned decisions will not generalise to new positions in the scene.

So I would set the observation config to relative (absolute: False) and try again.
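
The suggestion above could be tried with something like the following. This is only a minimal sketch, not the code from the screenshots: the helper names `kinematics_observation` and `train_dqn` are hypothetical, the `vehicles_count` and `features` values are assumptions, and it assumes an installed highway-env, gymnasium and stable-baselines3.

```python
def kinematics_observation(absolute: bool) -> dict:
    """Kinematics observation settings; only `absolute` is varied here.

    `vehicles_count` and `features` are illustrative assumptions, not
    values taken from the issue.
    """
    return {
        "type": "Kinematics",
        "vehicles_count": 5,
        "features": ["presence", "x", "y", "vx", "vy"],
        "absolute": absolute,
    }


def train_dqn(total_timesteps: int = 20_000):
    """Train DQN on highway-v0 with relative coordinates (hypothetical helper)."""
    # Imported lazily so the config helper above works without the RL stack.
    import gymnasium as gym
    import highway_env  # noqa: F401  (assumed to register the highway environments)
    from stable_baselines3 import DQN

    env = gym.make("highway-v0")
    # Switch the observation to relative coordinates, as suggested above.
    env.unwrapped.configure({"observation": kinematics_observation(absolute=False)})
    env.reset()  # the new configuration is applied on reset

    model = DQN("MlpPolicy", env, verbose=1)
    model.learn(total_timesteps=total_timesteps)
    return model
```

Calling `train_dqn()` starts the training run; DQN is left at its library defaults here, so hyperparameters from the original script would need to be passed in separately.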

For intersection-v0, however, absolute coordinates are more appropriate, since it's always the same locations in the scene that are visited (but relative coordinates may work well too). PPO should definitely be able to learn a medium policy, e.g. one that tries to cross the intersection and sometimes collides. The MLP is a bad model for this task because it cannot easily understand and generalise interactions between vehicles, and I got much better results with Transformer models (see paper), but an MLP should at least get off the ground and improve a bit over a random policy.
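
As a rough baseline for the intersection case, a PPO run with an MLP policy and absolute coordinates might look like this. It is a sketch under assumptions: `intersection_config` and `train_ppo` are hypothetical names, the `vehicles_count` and `features` values are illustrative, and PPO is left at its stable-baselines3 defaults rather than any tuned settings.

```python
def intersection_config(absolute: bool = True) -> dict:
    """Observation config for intersection-v0 (illustrative values, assumed)."""
    return {
        "observation": {
            "type": "Kinematics",
            "vehicles_count": 15,
            "features": ["presence", "x", "y", "vx", "vy", "cos_h", "sin_h"],
            "absolute": absolute,
        }
    }


def train_ppo(total_timesteps: int = 400_000):
    """Train an MLP policy with PPO on intersection-v0 (hypothetical helper)."""
    # Imported lazily so the config helper above works without the RL stack.
    import gymnasium as gym
    import highway_env  # noqa: F401  (assumed to register the highway environments)
    from stable_baselines3 import PPO

    env = gym.make("intersection-v0")
    env.unwrapped.configure(intersection_config(absolute=True))
    env.reset()  # the new configuration is applied on reset

    # "MlpPolicy" is the plain MLP baseline discussed above, not a Transformer.
    model = PPO("MlpPolicy", env, verbose=1)
    model.learn(total_timesteps=total_timesteps)
    return model
```

Passing `absolute=False` to `intersection_config` gives the relative-coordinate variant for comparison.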

Author

AHPUymhd commented Apr 6, 2024

Thank you very much for your help. I will follow your suggestions and modify the code, and thank you for replying to the open-source community. By the way, I don't understand the ego_spacing, "destination": "o1" and "scaling": 5.5 * 1.3 parameters. Could you please explain them to me? I would appreciate it very much.

Collaborator

eleurent commented Apr 7, 2024

By the way, I don't understand the ego_spacing, "destination": "o1" and "scaling": 5.5 * 1.3 parameters. Could you please explain them to me? I would appreciate it very much.

  • destination is the name of the node in the road network that the ego-vehicle is driving to. The node names are defined here:

(o:outer | i:inner + [r:right, l:left]) + (0:south | 1:west | 2:north | 3:east)

o1 is the west outer location.

  • scaling is just the zoom level of the camera.
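
To make the node naming scheme above concrete, here is a small decoder. It is a sketch based on one reading of the scheme (prefix `o`, `ir` or `il`, followed by a digit from 0 to 3); `describe_node` is a hypothetical helper and not part of highway-env.

```python
import re

# Prefixes and suffixes taken from the naming scheme quoted above.
ZONES = {"o": "outer", "ir": "inner right", "il": "inner left"}
SIDES = {"0": "south", "1": "west", "2": "north", "3": "east"}

# Assumed grammar: (o | ir | il) followed by one digit in 0..3.
_NODE_PATTERN = re.compile(r"(o|ir|il)([0-3])")


def describe_node(name: str) -> str:
    """Turn a node name such as "o1" into a readable description."""
    match = _NODE_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognised node name: {name!r}")
    zone, side = match.groups()
    return f"{ZONES[zone]} {SIDES[side]}"


print(describe_node("o1"))  # outer west
```

With this reading, setting "destination": "o1" asks the ego-vehicle to drive to the outer node on the west side.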
