Special thanks to Hugging Face for helpful code and examples. See here and here.
Install all the dependencies in requirements.txt:
$ pip install -U -r requirements.txt
Since we will use accelerate for training, make sure to run:
$ accelerate config
There were three main steps to the experiment:
- Supervised fine-tuning of the base llama-v2 model. See pre_training_script.py. This makes the language model "on-policy".
- DPO fine-tuning. See training_script.py. This fine-tunes the model to align with the provided preferences.
- Testing. See either testing_script.py to run one question through the model, or checkpoints_to_csv.py to run many questions through many models.
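The DPO step above trains the policy against the frozen SFT model as a reference. As a rough illustration of what that objective looks like per preference pair (the function name and the example log-probabilities below are made up for this sketch, not taken from the scripts), the loss rewards the policy for preferring the chosen response more strongly than the reference does:

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair (illustrative sketch).

    Each argument is the summed log-probability of the chosen or
    rejected response under the trained policy or the frozen
    SFT reference model.
    """
    # Implicit reward margin: how much more the policy favors the
    # chosen response over the rejected one, relative to the reference.
    logits = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(logits)): shrinks as the margin grows positive.
    return -math.log(1.0 / (1.0 + math.exp(-logits)))

# A positive margin (policy prefers the chosen response more than the
# reference does) gives a loss below log(2); a zero margin gives exactly log(2).
print(dpo_loss(-10.0, -12.0, -11.0, -11.0))
print(dpo_loss(-11.0, -11.0, -11.0, -11.0))
```

In practice libraries such as TRL implement this batched over token-level log-probabilities; this scalar version only shows the shape of the objective.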
See commands.txt for the command line arguments used for these experiments.