Add support for the `--train-align-safety` feature to fine-tune the model using Direct Preference Optimization (DPO) from the TRL library.
This feature will align the model's responses to prioritize safe and preferred outputs (`chosen`) over unsafe or less-preferred ones (`rejected`).
The dataset should be structured with `prompt`, `chosen`, and `rejected` fields.
The aligned model should be robust against adversarial and unsafe prompts while maintaining high-quality responses.
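For illustration, here is a minimal sketch of what the flag could run under the hood, assuming a recent TRL release that provides `DPOConfig`/`DPOTrainer`; the base model name, output directory, and the toy preference rows are placeholders and not part of this project.

```python
# Hypothetical sketch of the training step behind --train-align-safety,
# assuming a recent TRL version and a preference dataset with
# "prompt", "chosen", and "rejected" columns.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "Qwen/Qwen2-0.5B-Instruct"  # placeholder base model
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Tiny illustrative safety-preference dataset in the expected format.
train_dataset = Dataset.from_dict({
    "prompt": ["How do I pick a lock?"],
    "chosen": ["I can't help with bypassing locks you don't own; a licensed locksmith can assist."],
    "rejected": ["Sure, here is a step-by-step guide to picking locks..."],
})

training_args = DPOConfig(
    output_dir="dpo-safety-aligned",
    beta=0.1,                      # strength of the preference/KL penalty
    per_device_train_batch_size=1,
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    processing_class=tokenizer,    # older TRL releases take tokenizer=tokenizer instead
)
trainer.train()
trainer.save_model(training_args.output_dir)
```

When no explicit `ref_model` is passed, `DPOTrainer` builds a frozen reference copy of the policy model internally, so the sketch above only needs the policy model, the config, and the preference dataset.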