Commit 8f280dc (parent: 541efba). 6 changed files with 541 additions and 0 deletions.

Commit message:

* Add finetuning example for llama 3.1
* LoRA finetuning llama 3.1 with torch tune
* Add checkpoint file mounts
* working recipe
* Add instructions
* fix mounting
* Allow custom dataset source
* Add serve finetuned yaml
* remove newlines
* fix serving yaml
* rename folder
* remove gradio
* Add readme
* update readme
* Add news
* Update llm/llama-31-finetuning/readme.md (Co-authored-by: Zongheng Yang <[email protected]>)
* Update llm/llama-31-finetuning/readme.md (Co-authored-by: Zongheng Yang <[email protected]>)
* Update llm/llama-31-finetuning/readme.md (Co-authored-by: Zongheng Yang <[email protected]>)
* Update llm/llama-31-finetuning/readme.md (Co-authored-by: Zongheng Yang <[email protected]>)
* Update llm/llama-31-finetuning/readme.md (Co-authored-by: Zongheng Yang <[email protected]>)
* change to underscore
* fix serve.yaml
* fix readme

---------

Co-authored-by: Zongheng Yang <[email protected]>
configs/70B-lora.yaml (new file, +99 lines):

# Config for multi-device LoRA finetuning in lora_finetune_distributed.py
# using a Llama3.1 70B model
#
# This config assumes that you've run the following command before launching
# this run:
#   tune download meta-llama/Meta-Llama-3.1-70B-Instruct --output-dir /tmp/Meta-Llama-3.1-70B-Instruct --ignore-patterns "original/consolidated*"
#
# This config needs 8 GPUs to run:
#   tune run --nproc_per_node 8 lora_finetune_distributed --config llama3_1/70B_lora

# Model Arguments
model:
  _component_: torchtune.models.llama3_1.lora_llama3_1_70b
  lora_attn_modules: ['q_proj', 'k_proj', 'v_proj']
  apply_lora_to_mlp: False
  apply_lora_to_output: False
  lora_rank: 16
  lora_alpha: 32

tokenizer:
  _component_: torchtune.models.llama3.llama3_tokenizer
  path: /tmp/Meta-Llama-3.1-70B-Instruct/original/tokenizer.model

checkpointer:
  _component_: torchtune.utils.FullModelHFCheckpointer
  checkpoint_dir: /tmp/Meta-Llama-3.1-70B-Instruct/
  checkpoint_files: [
    model-00001-of-00030.safetensors,
    model-00002-of-00030.safetensors,
    model-00003-of-00030.safetensors,
    model-00004-of-00030.safetensors,
    model-00005-of-00030.safetensors,
    model-00006-of-00030.safetensors,
    model-00007-of-00030.safetensors,
    model-00008-of-00030.safetensors,
    model-00009-of-00030.safetensors,
    model-00010-of-00030.safetensors,
    model-00011-of-00030.safetensors,
    model-00012-of-00030.safetensors,
    model-00013-of-00030.safetensors,
    model-00014-of-00030.safetensors,
    model-00015-of-00030.safetensors,
    model-00016-of-00030.safetensors,
    model-00017-of-00030.safetensors,
    model-00018-of-00030.safetensors,
    model-00019-of-00030.safetensors,
    model-00020-of-00030.safetensors,
    model-00021-of-00030.safetensors,
    model-00022-of-00030.safetensors,
    model-00023-of-00030.safetensors,
    model-00024-of-00030.safetensors,
    model-00025-of-00030.safetensors,
    model-00026-of-00030.safetensors,
    model-00027-of-00030.safetensors,
    model-00028-of-00030.safetensors,
    model-00029-of-00030.safetensors,
    model-00030-of-00030.safetensors,
  ]
  recipe_checkpoint: null
  output_dir: /tmp/Meta-Llama-3.1-70B-Instruct/
  model_type: LLAMA3
resume_from_checkpoint: False

# Dataset and Sampler
dataset:
  _component_: torchtune.datasets.alpaca_dataset
seed: null
shuffle: True
batch_size: 2

# Optimizer and Scheduler
optimizer:
  _component_: torch.optim.AdamW
  weight_decay: 0.01
  lr: 3e-4
lr_scheduler:
  _component_: torchtune.modules.get_cosine_schedule_with_warmup
  num_warmup_steps: 100

loss:
  _component_: torch.nn.CrossEntropyLoss

# Training
epochs: 1
max_steps_per_epoch: null
gradient_accumulation_steps: 1

# Logging
output_dir: /tmp/lora_finetune_output
metric_logger:
  _component_: torchtune.utils.metric_logging.DiskLogger
  log_dir: ${output_dir}
log_every_n_steps: 1
log_peak_memory_stats: False

# Environment
device: cuda
dtype: bf16
enable_activation_checkpointing: True
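
The header comments above describe a two-step workflow: download the weights, then launch the distributed recipe. A minimal sketch of both steps, assuming a Hugging Face token with Llama 3.1 access; the trailing dotted key=value overrides use the syntax documented in the 8B config below, and the override values are illustrative assumptions, not tuned recommendations:

  # 1) Fetch the 70B weights (gated model; pass a token with access granted):
  tune download meta-llama/Meta-Llama-3.1-70B-Instruct \
    --hf-token $HF_TOKEN \
    --output-dir /tmp/Meta-Llama-3.1-70B-Instruct \
    --ignore-patterns "original/consolidated*"

  # 2) Launch LoRA finetuning across 8 GPUs; any field in the config can be
  #    overridden from the command line (values here are illustrative only):
  tune run --nproc_per_node 8 lora_finetune_distributed \
    --config llama3_1/70B_lora \
    model.lora_rank=32 model.lora_alpha=64 optimizer.lr=1e-4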
configs/8B-lora.yaml (new file, +83 lines):

# Config for multi-device LoRA finetuning in lora_finetune_distributed.py
# using a Llama3.1 8B Instruct model
#
# This config assumes that you've run the following command before launching
# this run:
#   tune download meta-llama/Meta-Llama-3.1-8B-Instruct --output-dir /tmp/Meta-Llama-3.1-8B-Instruct --ignore-patterns "original/consolidated.00.pth"
#
# To launch on 2 devices, run the following command from root:
#   tune run --nproc_per_node 2 lora_finetune_distributed --config llama3_1/8B_lora
#
# You can add specific overrides through the command line. For example
# to override the checkpointer directory while launching training
# you can run:
#   tune run --nproc_per_node 2 lora_finetune_distributed --config llama3_1/8B_lora checkpointer.checkpoint_dir=<YOUR_CHECKPOINT_DIR>
#
# This config works best when the model is being fine-tuned on 2+ GPUs.
# For single device LoRA finetuning please use 8B_lora_single_device.yaml
# or 8B_qlora_single_device.yaml

# Tokenizer
tokenizer:
  _component_: torchtune.models.llama3.llama3_tokenizer
  path: /tmp/Meta-Llama-3.1-8B-Instruct/original/tokenizer.model

# Model Arguments
model:
  _component_: torchtune.models.llama3_1.lora_llama3_1_8b
  lora_attn_modules: ['q_proj', 'v_proj']
  apply_lora_to_mlp: False
  apply_lora_to_output: False
  lora_rank: 8
  lora_alpha: 16

checkpointer:
  _component_: torchtune.utils.FullModelHFCheckpointer
  checkpoint_dir: /tmp/Meta-Llama-3.1-8B-Instruct/
  checkpoint_files: [
    model-00001-of-00004.safetensors,
    model-00002-of-00004.safetensors,
    model-00003-of-00004.safetensors,
    model-00004-of-00004.safetensors
  ]
  recipe_checkpoint: null
  output_dir: /tmp/Meta-Llama-3.1-8B-Instruct/
  model_type: LLAMA3
resume_from_checkpoint: False

# Dataset and Sampler
dataset:
  _component_: torchtune.datasets.alpaca_cleaned_dataset
seed: null
shuffle: True
batch_size: 2

# Optimizer and Scheduler
optimizer:
  _component_: torch.optim.AdamW
  weight_decay: 0.01
  lr: 3e-4
lr_scheduler:
  _component_: torchtune.modules.get_cosine_schedule_with_warmup
  num_warmup_steps: 100

loss:
  _component_: torch.nn.CrossEntropyLoss

# Training
epochs: 1
max_steps_per_epoch: null
gradient_accumulation_steps: 32

# Logging
output_dir: /tmp/lora_finetune_output
metric_logger:
  _component_: torchtune.utils.metric_logging.DiskLogger
  log_dir: ${output_dir}
log_every_n_steps: 1
log_peak_memory_stats: False

# Environment
device: cuda
dtype: bf16
enable_activation_checkpointing: False
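
Unlike the 70B recipe, this config compensates for its small per-step batch with gradient accumulation. Assuming standard data-parallel training, where each device accumulates its own micro-batches before the synchronized optimizer update, the effective global batch size on the 2 GPUs suggested in the header is batch_size × gradient_accumulation_steps × devices = 2 × 32 × 2 = 128 sequences per optimizer step.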
lora.yaml (new file, +58 lines):

# LoRA finetuning Meta Llama-3.1 on any of your own infra.
#
# Usage:
#
#   HF_TOKEN=xxx sky launch lora.yaml -c llama31 --env HF_TOKEN
#
# To finetune a 70B model:
#
#   HF_TOKEN=xxx sky launch lora.yaml -c llama31-70 --env HF_TOKEN --env MODEL_SIZE=70B

envs:
  MODEL_SIZE: 8B
  HF_TOKEN:
  DATASET: "yahma/alpaca-cleaned"
  # Change this to your own checkpoint bucket
  CHECKPOINT_BUCKET_NAME: sky-llama-31-checkpoints

resources:
  accelerators: A100:8
  disk_tier: best
  use_spot: true

file_mounts:
  /configs: ./configs
  /output:
    name: $CHECKPOINT_BUCKET_NAME
    mode: MOUNT
    # Optionally, specify the store to enforce using one of the stores below:
    #   r2/azure/gcs/s3/cos
    # store: r2

setup: |
  pip install torch torchvision
  # Install torchtune from source for the latest Llama-3.1 model
  pip install git+https://github.com/pytorch/torchtune.git@58255001bd0b1e3a81a6302201024e472af05379
  # pip install torchtune
  tune download meta-llama/Meta-Llama-3.1-${MODEL_SIZE}-Instruct \
    --hf-token $HF_TOKEN \
    --output-dir /tmp/Meta-Llama-3.1-${MODEL_SIZE}-Instruct \
    --ignore-patterns "original/consolidated*"

run: |
  tune run --nproc_per_node $SKYPILOT_NUM_GPUS_PER_NODE \
    lora_finetune_distributed \
    --config /configs/${MODEL_SIZE}-lora.yaml \
    dataset.source=$DATASET
  # Remove the checkpoint files to save space; LoRA serving only needs the
  # adapter files.
  rm /tmp/Meta-Llama-3.1-${MODEL_SIZE}-Instruct/*.pt
  rm /tmp/Meta-Llama-3.1-${MODEL_SIZE}-Instruct/*.safetensors
  mkdir -p /output/$MODEL_SIZE-lora
  rsync -Pavz /tmp/Meta-Llama-3.1-${MODEL_SIZE}-Instruct /output/$MODEL_SIZE-lora
  cp -r /tmp/lora_finetune_output /output/$MODEL_SIZE-lora/
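
A usage sketch that overrides the dataset and checkpoint bucket at launch time; the bucket name is a placeholder you must own, and DATASET can be any Hugging Face dataset accepted by torchtune's dataset.source override:

  HF_TOKEN=xxx sky launch lora.yaml -c llama31 --env HF_TOKEN \
    --env DATASET="yahma/alpaca-cleaned" \
    --env CHECKPOINT_BUCKET_NAME="my-lora-checkpoints"

  # With the default MODEL_SIZE, the adapters land in the bucket under
  # 8B-lora/, so the (spot) cluster can be torn down once the run finishes:
  sky down llama31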