This repository contains code for fine-tuning permissively licensed open-source LLMs using low-rank adaptation (LoRA).
The code is tested with the Stanford Alpaca dataset.
- Estimated training time for fine-tuning RedPajama-INCITE-Base-7B-v0.1 on Stanford Alpaca with a single RTX 3090 is ~12 hours.
- Estimated training time for fine-tuning RedPajama-INCITE-Base-7B-v0.1 on Stanford Alpaca with an RTX 3090 and an RTX Titan is ~6.5 hours.
- Currently only LoRA instruct fine-tuning of RedPajama-INCITE-Base-7B-v0.1 is supported.
Inspired by Alpaca-LoRA
| Model | Runs | Training Time | Link |
|---|---|---|---|
| LLaMA 3B | ⬜ | | |
| LLaMA 7B | ⬜ | | |
| RedPajama 3B | ✅ | 1:44:14 | |
| RedPajama 7B | ✅ | 3:09:58 | |
| MPT 3B | ⬜ | | |
| MPT 7B | ⬜ | | |
| Falcon 7B | ✅ | | |
- Ubuntu 20.04.1 LTS (WSL2)
- Driver Version: 531.41
- CUDA Version: 12.1
- cuDNN Version: 8.5.0
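To quickly confirm that PyTorch can see the CUDA toolkit and GPUs in this environment, a minimal sanity check (assuming PyTorch is already installed) is:

```python
# Sanity check: confirm PyTorch was built with CUDA support and sees the GPUs.
import torch

print(torch.__version__)          # installed PyTorch version
print(torch.version.cuda)         # CUDA version PyTorch was built against
print(torch.cuda.is_available())  # True if a usable GPU is visible
print(torch.cuda.device_count())  # number of visible GPUs
```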
Install dependencies:

```bash
poetry install
```
To fine-tune using an NVIDIA 2000-series GPU or earlier, please comment out this line in finetune.py:

```python
model = prepare_model_for_int8_training(model)
```
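For context, here is a minimal sketch of how 8-bit loading and `prepare_model_for_int8_training` typically fit together; the model name is taken from the example below, but the exact arguments are illustrative and may differ from the code in finetune.py:

```python
# Sketch: load the base model in 8-bit and prepare it for int8 LoRA training.
# On NVIDIA 2000-series GPUs or earlier, comment out the prepare call as noted above.
import torch
from transformers import AutoModelForCausalLM
from peft import prepare_model_for_int8_training

model = AutoModelForCausalLM.from_pretrained(
    "togethercomputer/RedPajama-INCITE-Base-7B-v0.1",
    load_in_8bit=True,           # requires bitsandbytes
    torch_dtype=torch.float16,
    device_map="auto",
)

# Freezes the base weights and prepares norms/output layers for stable
# int8 training -- this is the line to comment out on older GPUs.
model = prepare_model_for_int8_training(model)
```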
This file contains a straightforward application of PEFT / LoRA to a decoder-only model, as well as some code for prompt construction and tokenization (see the sketch after the example below).
Example usage:

```bash
python finetune.py \
    --base_model 'togethercomputer/RedPajama-INCITE-Base-7B-v0.1' \
    --output_dir './lora-redpajama'
```
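For reference, a rough sketch of what the PEFT / LoRA setup and Alpaca-style prompt construction inside finetune.py look like; the hyperparameters, target modules, and prompt template here are illustrative assumptions, not necessarily the exact code in this repo:

```python
# Sketch: attach LoRA adapters to a decoder-only model and build an
# Alpaca-style prompt. Values below are illustrative, not the repo's exact config.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_model = "togethercomputer/RedPajama-INCITE-Base-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model)

lora_config = LoraConfig(
    r=8,                                 # rank of the low-rank update matrices
    lora_alpha=16,                       # scaling factor for the LoRA updates
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["query_key_value"],  # GPT-NeoX-style attention projection
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()       # only the LoRA weights are trainable

def build_prompt(instruction: str, input_text: str = "") -> str:
    """Alpaca-style prompt; the exact template in the repo may differ."""
    if input_text:
        return (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. Write a response that appropriately "
            "completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n### Input:\n{input_text}\n\n### Response:\n"
        )
    return (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n### Response:\n"
    )

tokens = tokenizer(build_prompt("List three primary colors."), return_tensors="pt")
```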
We use Hugging Face's accelerate
library for distributed training. The following is an example of distributed training with two GPUs; a sketch of how finetune.py typically consumes these environment variables follows the command.
- NOTE: please set the following environment variables:

```bash
export WORLD_SIZE=2
export CUDA_VISIBLE_DEVICES=0,1
```
```bash
torchrun \
    --nproc_per_node=2 \
    --master_port=1234 \
    finetune.py \
    --base_model 'togethercomputer/RedPajama-INCITE-Base-7B-v0.1' \
    --output_dir './lora-redpajama'
```
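As a hedged illustration, training scripts of this kind (e.g. the Alpaca-LoRA pattern this repo is inspired by) usually consume `WORLD_SIZE` and the `LOCAL_RANK` set by torchrun roughly like this; the exact logic and numbers in finetune.py may differ:

```python
# Sketch: choose a device_map depending on whether we run under torchrun
# (WORLD_SIZE > 1), so each process loads the model on its own GPU.
import os

device_map = "auto"
world_size = int(os.environ.get("WORLD_SIZE", 1))
ddp = world_size != 1
if ddp:
    # one GPU per process, indexed by LOCAL_RANK (set automatically by torchrun)
    device_map = {"": int(os.environ.get("LOCAL_RANK", 0))}
    # scale gradient accumulation so the effective batch size stays constant
    gradient_accumulation_steps = max(1, 32 // world_size)  # 32 is illustrative
```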