This is the codebase for the Starling project from UC Berkeley, which includes:
- Paper: Starling-7B: Improving Helpfulness and Harmlessness with RLAIF (COLM 2024)
- Blog Post: starling.cs.berkeley.edu
- LLM: Starling-LM-7B-alpha
- RM: Starling-RM-7B-alpha
- RM: Starling-RM-34B
- Dataset: Nectar
We include code for the full pipeline, from dataset curation through reward model training to PPO finetuning.
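For a quick sanity check of the released model, here is a minimal generation sketch. The Hugging Face repo ID `berkeley-nest/Starling-LM-7B-alpha` and the OpenChat-style prompt format are assumptions taken from the model card; consult it for the exact recommended usage.

```python
# Minimal sketch: generating with Starling-LM-7B-alpha via Hugging Face transformers.
# Assumes the model is hosted at "berkeley-nest/Starling-LM-7B-alpha" and follows
# the OpenChat-style chat template; check the model card for the exact format.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "berkeley-nest/Starling-LM-7B-alpha"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "GPT4 Correct User: What is RLAIF?<|end_of_turn|>GPT4 Correct Assistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```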
The codebase is split into three parts:
- Nectar: All code pertaining to dataset curation, including prompt sourcing, response distillation, and judgment curation (see the dataset-loading sketch below).
- Reward Model Training: All code pertaining to reward model training using the Nectar dataset.
- trlx: A customized fork of the original trlx codebase, containing all code pertaining to PPO finetuning.*
Each part has its own documentation.
*Note that the upstream trlx codebase appears to be no longer maintained, so parts of the code may be outdated or incompatible with newer systems.
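The Nectar dataset itself can be inspected directly. Below is a minimal loading sketch, assuming the dataset is hosted on Hugging Face as `berkeley-nest/Nectar`; the field names used (`prompt`, `answers`, `rank`, `model`) are assumptions based on the dataset card and may differ, so verify the schema there.

```python
# Minimal sketch: inspecting the Nectar dataset with the `datasets` library.
# Assumes the dataset is hosted at "berkeley-nest/Nectar"; field names below
# are taken from the dataset card and should be verified against the schema.
from datasets import load_dataset

nectar = load_dataset("berkeley-nest/Nectar", split="train")
example = nectar[0]

print(example["prompt"])           # the sourced prompt
for answer in example["answers"]:  # distilled responses with rankings
    print(answer["rank"], answer["model"])
```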
If you find this work useful, please cite:

```bibtex
@inproceedings{starling2024,
  title     = {Starling-7B: Improving Helpfulness and Harmlessness with RLAIF},
  author    = {Zhu, Banghua and Frick, Evan and Wu, Tianhao and Zhu, Hanlin and Ganesan, Karthik and Chiang, Wei-Lin and Zhang, Jian and Jiao, Jiantao},
  booktitle = {First Conference on Language Modeling},
  year      = {2024},
}
```