-
Notifications
You must be signed in to change notification settings - Fork 147
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add new functionality to binarize preference datasets directly from distilabel #264
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good, added some comments :)
keep_ties: bool = True, | ||
**kwargs: Any, | ||
) -> "CustomDataset": | ||
"""Binarizes a distilabel dataset. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again, some people might be like: "what is binarizing?"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you both @davidberenstein1957 and @dvsrepo take a look at the new section of the docs for the dataset preparation/binarization? This commit e4602d4 contains the updates.
@plaguss it looks good, I don't have cycles to do an in-depth review. One suggestion: would it be possible to return the chosen and rejected in the OpenAI format, like this? https://huggingface.co/datasets/HuggingFaceH4/ultrafeedback_binarized?row=0 I think this would make it more interoperable |
|
@plaguss can you resolve the merge conflicts. After that you can merge it. @sdiazlor, this might be interesting to include or at least briefly mention in your tutorial too. #247 "this dataset is already binarized don't know about binarization or do you want to know how to binarize a dataset?" look here |
Description
This PR adds a function to binarize a
CustomDataset
with aPreferenceTask
:Closes #263