Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The N Implementation Details of RLHF with PPO #1580

Merged
merged 12 commits into from
Oct 24, 2023

Conversation

vwxyzjn
Copy link
Contributor

@vwxyzjn vwxyzjn commented Oct 13, 2023

Adding the blog post here 🤗. CC @liutianlin0121

@vwxyzjn vwxyzjn requested a review from lvwerra October 13, 2023 16:31
@liutianlin0121
Copy link

Adding the blog post here 🤗. CC @liutianlin0121

Looks awesome! 🤗

@liutianlin0121
Copy link

Just a small recommendation (do feel free to disregard!): perhaps adding a Bibtex citation section at the end of the blog like in https://huggingface.co/blog/rlhf and in https://lilianweng.github.io/posts/2018-04-08-policy-gradient/?

For instance:

@article{Huang2023implementation,
  author = {Huang, Costa and Liu, Tianlin and von Werra, Leandro},
  title = {The N Implementation Details of RLHF with PPO},
  journal = {Hugging Face Blog},
  year = {2023},
  note = {https://huggingface.co/blog/the_n_implementation_details_of_rlhf_with_ppo},
}

@vwxyzjn
Copy link
Contributor Author

vwxyzjn commented Oct 19, 2023

@liutianlin0121 thanks for the suggestion. Done! Btw would you mind creating an HF account and giving me the username? It's gonna be highlighted like this:

image

@liutianlin0121
Copy link

@liutianlin0121 thanks for the suggestion. Done! Btw would you mind creating an HF account and giving me the username? It's gonna be highlighted like this:

image

Thanks! My HF account is tianlinliu0121: https://huggingface.co/tianlinliu0121

Copy link
Member

@pcuenca pcuenca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great, deep technical post, I just took a quick initial look. As a personal opinion, I think it reads too much like a paper and therefore demands a lot of effort from the reader (for example, they have to go through several intros and lists of resources before getting to the "meat"). I'm sure there's a lot of interest in RLHF and PPO from non-specialists, so I would have loved if it was more engaging for that audience, contextualizing stuff and explaining what this means and why it's important in addition to how it works.

I understand a rewrite is a lot of work and impractical at this point, but there may be some small quick wins in the way we present things or introduce concepts.

Of course, feel free to completely disregard this opinion if you don't agree!


# The N Implementation Details of RLHF with PPO

**Correspondence goes to** [[email protected]](mailto:[email protected])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a tad too paper-y for me :) The blog style is more informal than academia, but happy to keep it if you think it's necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

the_n_implementation_details_of_rlhf_with_ppo.md Outdated Show resolved Hide resolved
the_n_implementation_details_of_rlhf_with_ppo.md Outdated Show resolved Hide resolved
the_n_implementation_details_of_rlhf_with_ppo.md Outdated Show resolved Hide resolved
Copy link
Contributor Author

@vwxyzjn vwxyzjn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much @pcuenca for the review. I have addressed most of your comments and updated the _blog.yml.

Copy link
Member

@lvwerra lvwerra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM :)

_blog.yml Outdated Show resolved Hide resolved
Co-authored-by: Leandro von Werra <[email protected]>
@vwxyzjn vwxyzjn merged commit 25bf01e into huggingface:main Oct 24, 2023
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants