This repository contains implementations for Reinforcement Learning from Human Feedback (RLHF) training of Large Language Models (LLMs) using Supervised Fine-Tuning (SFT), Reward Modeling, and Proximal Policy Optimization (PPO). The goal is a modular and maintainable codebase for replicating RLHF training on LLMs such as LLaMA. The codebase targets LLaMA 2 specifically: most components are model-agnostic, but data-related components (such as special-token prompt formatting) need to be adapted for other models.
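For example, LLaMA 2 chat models expect prompts wrapped in their instruction tokens. The helper below is an illustrative sketch of that template (the function name and constants are not part of this repo); this is the kind of data-related code that would change for another model family:

```python
# Hypothetical helper illustrating LLaMA 2 chat special-token formatting.
# Other model families use different templates, so code like this must be adapted.
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"

def format_llama2_prompt(instruction: str, system_prompt: str = "") -> str:
    """Wrap a user instruction in the LLaMA 2 chat template."""
    system = f"{B_SYS}{system_prompt}{E_SYS}" if system_prompt else ""
    return f"<s>{B_INST} {system}{instruction.strip()} {E_INST}"

print(format_llama2_prompt("Summarize RLHF in one sentence."))
```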
- Clone the repository:

  ```bash
  git clone https://github.com/lightmatmul/rlhf_training.git
  cd rlhf_training
  ```
- Create and activate a virtual environment:

  ```bash
  python -m venv env
  source env/bin/activate  # On Windows, use `env\Scripts\activate`
  ```
- Install the required packages:

  ```bash
  pip install -r requirements.txt
  ```
To train a model using Supervised Fine-Tuning (SFT), run the following script:

```bash
python scripts/train_sft.py
```
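Under the hood, SFT is standard causal-language-model fine-tuning on instruction/response text, typically with LoRA adapters attached. Below is a minimal sketch using Hugging Face `transformers` and `peft`; the model name, toy dataset, and hyperparameters are placeholders, not the values used by `scripts/train_sft.py`:

```python
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Attach LoRA adapters so only a small set of weights is trained.
model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16,
                                         target_modules=["q_proj", "v_proj"],
                                         task_type="CAUSAL_LM"))

# Tiny in-memory dataset; in practice this would be an instruction-tuning corpus.
texts = ["[INST] What is RLHF? [/INST] RLHF fine-tunes a model on human preference signals.</s>"]
dataset = Dataset.from_dict({"text": texts})
dataset = dataset.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
                      remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="sft_out", per_device_train_batch_size=2,
                           num_train_epochs=1, logging_steps=10),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```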
To train a reward model, run the following script:

```bash
python scripts/train_reward.py
```
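The core of reward modeling is a pairwise ranking loss: the model scores a chosen and a rejected response to the same prompt and is trained to rank the chosen one higher. Here is a minimal sketch of that loss in PyTorch (illustrative only; the actual training loop lives in `scripts/train_reward.py`):

```python
import torch
import torch.nn.functional as F

def pairwise_reward_loss(chosen_rewards: torch.Tensor,
                         rejected_rewards: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry style loss: -log sigmoid(r_chosen - r_rejected)."""
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy example: scalar rewards for a batch of 3 preference pairs.
chosen = torch.tensor([1.2, 0.4, 0.9])
rejected = torch.tensor([0.3, 0.8, -0.1])
print(pairwise_reward_loss(chosen, rejected))  # lower when chosen outranks rejected
```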
To train a model using Proximal Policy Optimization (PPO), run the following script:

```bash
python scripts/train_ppo.py
```
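PPO treats the SFT model as a policy: it generates responses to prompts, scores them with the reward model, and updates the policy while a KL penalty keeps it close to a frozen reference model. The sketch below uses the classic `trl` PPOTrainer API (versions before the 0.12 rewrite); the model names, batch sizes, and the constant stand-in reward are placeholders, not the setup of `scripts/train_ppo.py`:

```python
import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder; use the SFT checkpoint in practice
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# Policy with a value head, plus a frozen reference copy for the KL penalty.
policy = AutoModelForCausalLMWithValueHead.from_pretrained(model_name)
ref_policy = AutoModelForCausalLMWithValueHead.from_pretrained(model_name)

ppo_trainer = PPOTrainer(PPOConfig(batch_size=1, mini_batch_size=1),
                         policy, ref_policy, tokenizer)

query = tokenizer("Explain PPO briefly.", return_tensors="pt").input_ids[0]
response = ppo_trainer.generate(query, max_new_tokens=32, return_prompt=False)[0]

# In the real pipeline the reward comes from the trained reward model;
# a constant stands in for it here.
reward = [torch.tensor(1.0)]
stats = ppo_trainer.step([query], [response], reward)
```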
The configuration files are located in the `configs/` directory. Here’s a brief description of each:
- `lora_config.py`: Contains the configuration for LoRA (Low-Rank Adaptation); a sketch of what such a config might look like follows this list.
- `reward_config.py`: Contains the constants and configurations specific to Reward Modeling.
- `ppo_config.py`: Contains the constants and configurations specific to PPO.
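As an illustration of what such a config module typically contains, here is a sketch of a LoRA configuration built on `peft`; the specific rank, targets, and variable names are examples, not necessarily the values in `configs/lora_config.py`:

```python
# Example of what a LoRA config module might expose; values are illustrative.
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                                  # rank of the low-rank update matrices
    lora_alpha=32,                         # scaling factor applied to the update
    lora_dropout=0.05,                     # dropout on the LoRA layers
    target_modules=["q_proj", "v_proj"],   # LLaMA attention projections to adapt
    bias="none",
    task_type="CAUSAL_LM",
)
```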
GPT is used as an AI evaluator to assess the impact of alignment tuning compared to the original supervised fine-tuned model:

```bash
python eval/gpt_evaluator.py
python eval/count_wins.py
```
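The evaluator prompts GPT to compare a response from the PPO-aligned model against one from the SFT baseline and pick a winner, and `count_wins.py` aggregates the verdicts. A minimal sketch of such a judging call with the `openai` client follows; the prompt wording and judge model are placeholders, not what `eval/gpt_evaluator.py` actually uses:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def judge(prompt: str, answer_a: str, answer_b: str) -> str:
    """Ask GPT which of two answers better follows the prompt ('A' or 'B')."""
    completion = client.chat.completions.create(
        model="gpt-4",  # placeholder judge model
        messages=[
            {"role": "system",
             "content": "You compare two answers and reply with only 'A' or 'B'."},
            {"role": "user",
             "content": f"Prompt: {prompt}\n\nAnswer A: {answer_a}\n\nAnswer B: {answer_b}"},
        ],
    )
    return completion.choices[0].message.content.strip()

print(judge("Explain RLHF.", "Answer from the SFT model...", "Answer from the PPO model..."))
```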
To interact with the trained models, run the following script:

```bash
python scripts/inference.py
```
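Inference is ordinary autoregressive generation from the fine-tuned checkpoint. A minimal sketch with `transformers` is shown below; the checkpoint path is a placeholder, and `scripts/inference.py` handles prompt formatting and any adapter loading itself:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "path/to/trained/checkpoint"  # placeholder for an SFT or PPO checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, torch_dtype=torch.float16,
                                             device_map="auto")

prompt = "[INST] What does RLHF stand for? [/INST]"  # LLaMA 2 chat format
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```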