Special thanks to Hugging Face for helpful code and examples. See here and here.
Install all the dependencies in requirements.txt:
$ pip install -U -r requirements.txt
Since we will use accelerate for training, make sure to run:
$ accelerate config
There were three main steps to the experiment:
- Supervised fine-tuning of the base llama-v2 model. See pre_training_script.py. This makes the language model "on-policy".
- DPO fine-tuning. See training_script.py. This fine-tunes the model to align with the provided preferences.
- Testing. See either testing_script.py to run one question through the model, or checkpoints_to_csv.py to run many questions through many models.
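The DPO step above trains the policy against the frozen SFT model as a reference. As a rough illustration of what that objective looks like per preference pair (the function name and the example log-probabilities below are made up for this sketch, not taken from the scripts), the loss rewards the policy for preferring the chosen response more strongly than the reference does:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair (illustrative sketch).

    Each argument is the summed log-probability of the chosen or
    rejected response under the trained policy or the frozen
    SFT reference model.
    """
    # Implicit reward margin: how much more the policy favors the
    # chosen response over the rejected one, relative to the reference.
    logits = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(logits)): shrinks as the margin grows positive.
    return -math.log(1.0 / (1.0 + math.exp(-logits)))

# A positive margin (policy prefers the chosen response more than the
# reference does) gives a loss below log(2); a zero margin gives exactly log(2).
print(dpo_loss(-10.0, -12.0, -11.0, -11.0))
print(dpo_loss(-11.0, -11.0, -11.0, -11.0))
```

In practice libraries such as TRL implement this batched over token-level log-probabilities; this scalar version only shows the shape of the objective.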
See commands.txt for the command line arguments used for these experiments.