Adding support to PPO for continuous actions spaces #103

Closed

Conversation

@kuds commented Sep 10, 2024

#77

Design

Created a new class called ContinuousProximalPolicyOptimization to handle continuous action spaces for PPO. Updated the README with a new tutorial, added new unit tests, and did a general clean-up of comments.

@facebook-github-bot added the CLA Signed label (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) on Sep 10, 2024
@kuds changed the title from "Add support to PPO for continuous actions spaces" to "Adding support to PPO for continuous actions spaces" on Sep 11, 2024
@rodrigodesalvobraz (Contributor):

Thank you! I am currently reviewing the PR.

@rodrigodesalvobraz (Contributor) commented Sep 13, 2024

Thanks, I've reviewed it and have two requests:

  • You duplicated quite a bit of code from ppo.py to continuous_ppo.py. It seems that almost all changes are in _actor_loss. Would you be able to define a base class BasePPO containing most of the common code now in PPO, and subclass it into two subclasses PPO and ContinuousPPO, each of them defining the appropriate _actor_loss? (See the sketch after this list.)
  • Can you replace tuple[Tensor, Normal] with Tuple[Tensor, Normal]? The former does not work with Python 3.8, which we still support in pyproject.toml.
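
For concreteness, here is a minimal sketch of the structure requested above. It is not Pearl's actual implementation: only BasePPO, PPO, ContinuousPPO, and _actor_loss come from the suggestion itself, while the batch fields (action_logits, action_mean, action_std, old_log_prob, advantage) and the epsilon clipping parameter are hypothetical placeholders. It also uses typing.Tuple rather than the built-in tuple generic, per the second point.

```python
# Illustrative sketch only; names of batch fields are assumed, not Pearl's API.
from abc import ABC, abstractmethod
from typing import Tuple

import torch
from torch import Tensor
from torch.distributions import Categorical, Normal


class BasePPO(ABC):
    """Holds the PPO logic shared by the discrete and continuous variants."""

    def __init__(self, epsilon: float = 0.2) -> None:
        self._epsilon = epsilon  # clipping range for the surrogate objective

    @abstractmethod
    def _actor_loss(self, batch) -> Tensor:
        """Each subclass computes its own clipped surrogate objective."""

    def _clipped_objective(self, log_prob: Tensor, batch) -> Tensor:
        # Shared clipped-ratio computation used by both subclasses.
        ratio = torch.exp(log_prob - batch.old_log_prob)
        clipped = torch.clamp(ratio, 1 - self._epsilon, 1 + self._epsilon)
        return -torch.min(ratio * batch.advantage, clipped * batch.advantage).mean()


class PPO(BasePPO):
    def _actor_loss(self, batch) -> Tensor:
        # Discrete actions: log-probabilities come from a Categorical distribution.
        dist = Categorical(logits=batch.action_logits)
        return self._clipped_objective(dist.log_prob(batch.action), batch)


class ContinuousPPO(BasePPO):
    def _sample(self, mean: Tensor, std: Tensor) -> Tuple[Tensor, Normal]:
        # `Tuple[...]` from the typing module keeps Python 3.8 happy.
        dist = Normal(mean, std)
        return dist.rsample(), dist

    def _actor_loss(self, batch) -> Tensor:
        # Continuous actions: log-probabilities come from a Normal distribution,
        # summed over action dimensions.
        dist = Normal(batch.action_mean, batch.action_std)
        log_prob = dist.log_prob(batch.action).sum(dim=-1)
        return self._clipped_objective(log_prob, batch)
```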

@kuds (Author) commented Sep 14, 2024

Sure, it makes sense to consolidate the core PPO logic into a base class and have the discrete and continuous versions override where needed (e.g., the _actor_loss method). I will work on incorporating both requests into my pull request and should have those done sometime next week. Thanks for the feedback!

@kuds (Author) commented Sep 30, 2024

Sorry for the delay on this. I have the base class created for PPO with the discrete and continuous versions extending it. I will get my changes checked in by end of this week.

@rodrigodesalvobraz (Contributor):

> Sorry for the delay on this. I have the base class created for PPO with the discrete and continuous versions extending it. I will get my changes checked in by end of this week.

No problem, thanks for the update! Looking forward to it.

@kuds (Author) commented Oct 8, 2024

I appreciate your patience on this! I have finished creating the PPO base class, with the discrete and continuous versions inheriting from it and overriding methods where needed. Let me know your thoughts and whether I can help with any other issues or development priorities!

@rodrigodesalvobraz (Contributor) commented Oct 18, 2024

It's looking good! I was running the Lunar Lander tutorial (nice!) and encountered an error. I am curious if you are seeing the same thing by any chance? Here's the notebook with the error.

@kuds (Author) commented Oct 18, 2024

The issue you are running into is a known one (Issue #1142) with Gymnasium 0.29.1 and NumPy version 2+. You have to downgrade NumPy to a 1.x version. There is a comment in the Lunar Lander tutorial about this to help during the transition.

[screenshot of the tutorial comment about the NumPy downgrade]
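
Until the upgrade lands, a small guard near the top of the notebook can make the requirement explicit. This is just a sketch of the workaround described above, not part of the tutorial itself:

```python
import numpy as np

# Gymnasium 0.29.1 breaks with NumPy 2.x (see Issue #1142 mentioned above);
# pin NumPy to 1.x, e.g. `pip install "numpy<2"`, until moving to Gymnasium 1.0.0.
if int(np.__version__.split(".")[0]) >= 2:
    raise RuntimeError(
        f"NumPy {np.__version__} detected; install a 1.x release for Gymnasium 0.29.1"
    )
```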

The Farama Foundation released version 1.0.0 of Gymnasium about a week ago, which should resolve this issue with NumPy 2+, but I have not tried it with Pearl. The library upgrade for Gymnasium and NumPy should probably be its own effort/issue/pull request, if you are ok with that.

Let me know if you have any other issues or additional questions!

@kuds (Author) commented Oct 25, 2024

@rodrigodesalvobraz

It looks like the Pearl repo has undergone some significant changes in the last couple of days, particularly around the replay buffer. Would you like me to rework my pull request to handle these latest updates?

@rodrigodesalvobraz (Contributor):

@kuds, that would be greatly appreciated, thank you!
Thank you also for diagnosing that last issue!

@kuds (Author) commented Oct 29, 2024

I have merged in the latest round of changes from the main Pearl branch and incorporated the renamed replay buffers into the PPO code base. I am still running some tests, but I wanted to get this up for an initial review.

@rodrigodesalvobraz (Contributor):

> I have merged in the latest round of changes from the main Pearl branch and incorporated the renamed replay buffers into the PPO code base. I am still running some tests, but I wanted to get this up for an initial review.

Sounds good, thanks. I am looking things over, but I see you are still making changes, so let me know when you're ready.

@rodrigodesalvobraz (Contributor):

Hi @kuds,
I checked out your PR today and a few unit tests failed:
[screenshot of the failing unit tests]

Errors were:

pearl\policy_learners\sequential_decision_making\ppo.py", line 68, in __init__:
TypeError: ProximalPolicyOptimizationBase.__init__() got an unexpected keyword argument 'history_summarization_learning_rate'
pearl\policy_learners\sequential_decision_making\ppo_continuous.py", line 101, in __init__:
AttributeError: 'ContinuousProximalPolicyOptimization' object has no attribute 'is_action_continuous'
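
For what it's worth, a common pattern for errors like these, sketched here purely illustratively and not taken from the PR or from Pearl's code, is to have the base __init__ accept and store the extra keyword and to declare the attribute the framework expects on the learner:

```python
# Illustrative only; class and attribute names are taken from the error messages
# above, but the fix shown is just one common way such errors get resolved.
class ProximalPolicyOptimizationBase:
    def __init__(
        self, history_summarization_learning_rate: float = 1e-3, **kwargs
    ) -> None:
        # Accept the keyword the subclass passes through instead of raising TypeError.
        self._history_summarization_learning_rate = history_summarization_learning_rate


class ContinuousProximalPolicyOptimization(ProximalPolicyOptimizationBase):
    # Expose the attribute the AttributeError reports as missing.
    is_action_continuous: bool = True
```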

@kuds (Author) commented Dec 20, 2024

@rodrigodesalvobraz

Sorry for the delay in responding. I have time this week and next to take a look at this.

Thank you for your patience!

@kuds (Author) commented Jan 24, 2025

@rodrigodesalvobraz

Quick update: I am working on migrating my code base to the latest version of Pearl. I should have this pull request re-opened by the weekend.

@rodrigodesalvobraz (Contributor):

> @rodrigodesalvobraz
>
> Quick update: I am working on migrating my code base to the latest version of Pearl. I should have this pull request re-opened by the weekend.

Thanks. I really appreciate you updating so many times. Next time you submit I will make a point of integrating it right away so we don't have this problem anymore!
