Regularized-Preference-Optimization

This repository contains the code for our NeurIPS 2024 poster paper: Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer.

In this study, we explore two approaches to incorporating the supervised fine-tuning (SFT) loss as a regularizer:

  1. Cumulative SFT Loss: the sum of the SFT loss over all unmasked tokens in the chosen response.
  2. Average SFT Loss: the mean of the SFT loss over all unmasked tokens in the chosen response.

We implement these approaches by building on the Alignment Handbook codebase for the cumulative loss and the OpenRLHF codebase for the average loss.
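The difference between the two variants is only in how the per-token losses over the chosen response are aggregated. The sketch below illustrates this in plain Python, given a list of per-token negative log-likelihoods and a 0/1 mask (function names and inputs are illustrative, not the repository's actual API; in the real implementations these quantities come from the model's logits):

```python
def cumulative_sft_loss(token_nll, mask):
    """Cumulative variant: sum the per-token negative log-likelihoods
    over all unmasked tokens in the chosen response."""
    return sum(nll for nll, m in zip(token_nll, mask) if m)


def average_sft_loss(token_nll, mask):
    """Average variant: mean of the per-token negative log-likelihoods
    over all unmasked tokens in the chosen response."""
    n = sum(mask)
    return cumulative_sft_loss(token_nll, mask) / max(n, 1)


# Toy example: four tokens, the last one masked out (e.g. padding).
nll = [1.0, 2.0, 3.0, 4.0]
mask = [1, 1, 1, 0]
print(cumulative_sft_loss(nll, mask))  # 6.0
print(average_sft_loss(nll, mask))     # 2.0
```

The cumulative loss grows with response length, while the average loss is length-normalized; this scaling difference changes how strongly the regularizer weights long chosen responses against the preference-optimization term.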
