This is the codebase for the Starling project from UC Berkeley, which includes:
- Paper: Starling-7B: Improving Helpfulness and Harmlessness with RLAIF (COLM 2024)
- Blog Post: starling.cs.berkeley.edu
- LLM: Starling-LM-7B-alpha
- RM: Starling-RM-7B-alpha
- RM: Starling-RM-34B
- Dataset: Nectar
We include code for the full pipeline, from dataset curation through reward model training to PPO finetuning.
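For a quick sanity check of the released model, here is a minimal generation sketch. The Hugging Face repo ID `berkeley-nest/Starling-LM-7B-alpha` and the OpenChat-style prompt format are assumptions taken from the model card; consult it for the exact recommended usage.

```python
# Minimal sketch: generating with Starling-LM-7B-alpha via Hugging Face transformers.
# Assumes the model is hosted at "berkeley-nest/Starling-LM-7B-alpha" and follows
# the OpenChat-style chat template; check the model card for the exact format.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "berkeley-nest/Starling-LM-7B-alpha"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "GPT4 Correct User: What is RLAIF?<|end_of_turn|>GPT4 Correct Assistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```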
The codebase is split into three parts:
- Nectar: All code pertaining to dataset curation, including prompt sourcing, response distillation, and judgment curation (see the dataset-loading sketch below).
- Reward Model Training: All code pertaining to reward model training using the Nectar dataset.
- trlx: A customized fork of the original trlx codebase, containing all code pertaining to PPO finetuning.*
Each part has its own documentation.
*Note that the upstream trlx codebase appears to be no longer maintained, so parts of the code may be outdated or incompatible with newer systems.
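The Nectar dataset itself can be inspected directly. Below is a minimal loading sketch, assuming the dataset is hosted on Hugging Face as `berkeley-nest/Nectar`; the field names used (`prompt`, `answers`, `rank`, `model`) are assumptions based on the dataset card and may differ, so verify the schema there.

```python
# Minimal sketch: inspecting the Nectar dataset with the `datasets` library.
# Assumes the dataset is hosted at "berkeley-nest/Nectar"; field names below
# are taken from the dataset card and should be verified against the schema.
from datasets import load_dataset

nectar = load_dataset("berkeley-nest/Nectar", split="train")
example = nectar[0]

print(example["prompt"])           # the sourced prompt
for answer in example["answers"]:  # distilled responses with rankings
    print(answer["rank"], answer["model"])
```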
If you find this work useful, please cite:

```bibtex
@inproceedings{starling2024,
  title     = {Starling-7B: Improving Helpfulness and Harmlessness with RLAIF},
  author    = {Zhu, Banghua and Frick, Evan and Wu, Tianhao and Zhu, Hanlin and Ganesan, Karthik and Chiang, Wei-Lin and Zhang, Jian and Jiao, Jiantao},
  booktitle = {First Conference on Language Modeling},
  year      = {2024},
}
```