-
Is this the section you are looking at? They say it is similar to the Cringe paper, but then go on to outline what they actually did.
-
I think the authors of the self-rewarding LLM paper didn't use standard DPO but Iterative DPO, which comes from another of their papers: https://arxiv.org/pdf/2312.16682.pdf.
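
To make the distinction concrete, here is a rough sketch (not the authors' code) of the standard per-pair DPO loss, plus an outer loop that repeats DPO over several rounds. The helper names `build_pairs` and `train_dpo` are hypothetical placeholders, and using the previous round's model as both the starting point and the frozen reference is my assumption about how the iterations chain together.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair.

    Inputs are sequence log-probabilities of the chosen and rejected
    responses under the policy and under the frozen reference model.
    """
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    x = beta * margin
    # -log(sigmoid(x)) == softplus(-x), computed in a numerically stable way.
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


def iterative_dpo(model, rounds, build_pairs, train_dpo):
    """Hypothetical outer loop: repeat DPO, regenerating pairs each round.

    build_pairs(model) -> preference pairs produced with the current model
    train_dpo(policy_init, reference, pairs) -> newly trained model
    Both callables are placeholders, not part of any real library.
    """
    history = [model]
    for _ in range(rounds):
        pairs = build_pairs(model)
        # Assumption: the current model is both the init and the reference.
        model = train_dpo(policy_init=model, reference=model, pairs=pairs)
        history.append(model)
    return history
```

Standard DPO would be a single call to `train_dpo` on a fixed preference dataset; the iterative variant differs only in regenerating the pairs with the latest model before each round.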