Pure python training, evaluation and rollout documentation request. #209

redzhepdx · 2024-02-17T13:18:44Z

Hi everyone,

As a professional who has worked with a few RL frameworks in the past, I can confidently say that this is one of the cleanest, most user-friendly, and most advanced RL libraries I've encountered. In fact, I'm planning to introduce it to my team as our future RL framework, and we're excited to contribute to its development. I especially appreciate the Dreamer implementations and the informative blog posts – amazing work!

Based on my experience with RL framework development, I have a few recommendations that could make this library even more appealing to a wider range of engineers:

Pure Python Examples:

While I understand the value of Hydra as a tool for configuration management and rapid experimentation, it can be intimidating for newcomers. To lower this barrier and encourage broader adoption, I recommend creating 3-4 pure Python documentation/tutorial examples demonstrating training, evaluation, and rollout using existing Sheep-RL functionality. This approach has been successful in attracting large-scale users to other RL libraries.

Here are some examples that might be helpful:

Similar to: https://github.com/araffin/rl-tutorial-jnrr19/blob/sb3/1_getting_started.ipynb
A bit more complex (but valuable): https://github.com/araffin/rl-tutorial-jnrr19/blob/sb3/1_getting_started.ipynb
Integrating with Isaac Gym: https://docs.omniverse.nvidia.com/isaacsim/latest/isaac_gym_tutorials/tutorial_advanced_rl_stable_baselines.html

Tips and Tricks:

As we all know, RL algorithms are sensitive to hyperparameters and often require specific techniques like action masking, observation normalization, and reward scaling to be successful on new environments. Given the library's advanced capabilities with World Models, sharing insights and best practices on these topics would be incredibly valuable to the community (including myself!). Here are some examples from other libraries:

https://stable-baselines.readthedocs.io/en/master/guide/rl_tips.html
https://maze-rl.readthedocs.io/en/latest/best_practices_and_tutorials/tricks_of_the_trade.html
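
To make the kind of tip meant here concrete, below is a minimal sketch in plain Python of two of the techniques named above: observation normalization with a running mean and standard deviation (which can also be used to scale rewards), and action masking on logits. The names `RunningNormalizer` and `mask_logits` are hypothetical and chosen for illustration only; they are not part of Sheep-RL or any other library.

```python
import math


class RunningNormalizer:
    """Tracks a running mean/variance (Welford's algorithm) for scalar values.

    Hypothetical helper for illustration; not a Sheep-RL API.
    """

    def __init__(self, eps: float = 1e-8):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean
        self.eps = eps  # avoids division by zero when the std is 0

    def update(self, x: float) -> None:
        # Incrementally fold one new sample into the statistics.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        # Population standard deviation; 0.0 before any sample is seen.
        return math.sqrt(self.m2 / self.count) if self.count > 0 else 0.0

    def normalize(self, x: float) -> float:
        # Center and scale a value using the statistics seen so far.
        return (x - self.mean) / (self.std + self.eps)


def mask_logits(logits, valid):
    """Action masking: invalid actions get -inf so a softmax assigns them zero probability."""
    return [logit if ok else float("-inf") for logit, ok in zip(logits, valid)]


if __name__ == "__main__":
    norm = RunningNormalizer()
    for obs in (1.0, 2.0, 3.0):
        norm.update(obs)
    print(norm.mean, norm.std, norm.normalize(3.0))
    print(mask_logits([0.5, 1.5, -0.2], [True, False, True]))
```

In practice the same running statistics would be kept per observation dimension (and saved alongside the model so evaluation uses the training-time statistics); the scalar version above only shows the idea.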

Transitioning to Hydra:

Once users become comfortable with the library's fundamentals, they'll naturally progress towards exploring scalability and advanced experimentation, which is where Hydra shines. Consider creating a separate tutorial or example notebook showcasing how to leverage Hydra and Sheep-RL's train and evaluate functionalities to achieve this transition smoothly.

I hope you find these recommendations helpful. Best of luck to the developers!

belerico · 2024-02-20T07:23:44Z

Hi @redzhepdx! Thank you for the suggestions, we really appreciate them!
We can definitely have something similar to this and this: what do you think @michele-milesi?
As for contributing, we still have to add a how to contribute.md, but if you want somewhere to start, there is an old issue about implementing the DQN methods and their variants.
Thank you

michele-milesi · 2024-02-20T10:39:00Z

Hi there,
@belerico, yes, we can start with something similar to the two examples you mentioned.
For the environment part, I think we can try to recycle this. Or are you thinking of using a more complex environment (like this)?

redzhepdx · 2024-02-20T12:16:01Z

Hi @michele-milesi ,
I believe the complexity of the environment matters little. You can use any environment, but I would recommend something like Crawler, or any of the MuJoCo or classic Gym environments, to show the capabilities of the framework on decently challenging cases that anyone can test locally.

Thanks a lot for your prompt reaction to this topic.

verityw · 2024-03-26T03:14:58Z

Is there any update on this? I would really appreciate a pure Python example to use for research, so I can better integrate it with my existing stable-baselines code!

michele-milesi · 2024-03-26T08:31:31Z

Hi @verityw,
We are fixing a few problems we found with half-precision training. After this, we will move on to the pure Python examples.
Thank you for your patience.

belerico assigned belerico and michele-milesi May 5, 2024
