The N Implementation Details of RLHF with PPO #1580
Looks awesome! 🤗
Just a small recommendation (do feel free to disregard!): perhaps adding a Bibtex citation section at the end of the blog like in https://huggingface.co/blog/rlhf and in https://lilianweng.github.io/posts/2018-04-08-policy-gradient/? For instance:
@liutianlin0121 thanks for the suggestion. Done! Btw would you mind creating an HF account and giving me the username? It's gonna be highlighted like this:
Thanks! My HF account is tianlinliu0121: https://huggingface.co/tianlinliu0121
This is a great, deep technical post; I have only taken a quick initial look. As a personal opinion, I think it reads too much like a paper and therefore demands a lot of effort from the reader (for example, they have to go through several intros and lists of resources before getting to the "meat"). I'm sure there's a lot of interest in RLHF and PPO from non-specialists, so I would have loved it if the post were more engaging for that audience, contextualizing things and explaining what this means and why it's important in addition to how it works.
I understand a rewrite is a lot of work and impractical at this point, but there may be some small quick wins in the way we present things or introduce concepts.
Of course, feel free to completely disregard this opinion if you don't agree!
# The N Implementation Details of RLHF with PPO
**Correspondence goes to** [[email protected]](mailto:[email protected])
Maybe a tad too paper-y for me :) The blog style is more informal than academia, but happy to keep it if you think it's necessary.
removed
Co-authored-by: Pedro Cuenca <[email protected]>
Thanks so much @pcuenca for the review. I have addressed most of your comments and updated the _blog.yml.
LGTM :)
Co-authored-by: Leandro von Werra <[email protected]>
Adding the blog post here 🤗. CC @liutianlin0121