update LoRA part WIP

aws-samples · KeitaW · Mar 16, 2024 · Mar 17, 2024 · Mar 17, 2024 · Mar 17, 2024
commit b024895670872f18dba09b73257aa1fcbaeba84a
diff --git a/3.test_cases/torchtune/slurm/pretraining/README.md b/3.test_cases/torchtune/slurm/pretraining/README.md
diff --git a/...chtune/slurm/deployment/7.generate.sbatch → ...-llama3-70b-development/7.generate.sbatch b/...chtune/slurm/deployment/7.generate.sbatch → ...-llama3-70b-development/7.generate.sbatch
diff --git a/3.test_cases/torchtune/slurm/tutorials/e2e-llama3-70b-development/README.md b/3.test_cases/torchtune/slurm/tutorials/e2e-llama3-70b-development/README.md
@@ -64,26 +64,62 @@ This output confirms that the `torchtune download` command has been executed wit
 By following these steps, you ensure that the necessary model components are in place, setting the stage for subsequent tasks such as pretraining, finetuning, evaluation, and deployment.
 
 
-## 3. Pretrain Llama3 model
+## 3. Full-parameter finetuning
 
-In this step, you will author Llama3 model using c4 dataset. 
+WIP: In this step, you will train the Llama3 model using the C4 dataset.
 
 ```bash
 sbatch tutorials/e2e-llama3-70b-development/pretrain.sbatch
 ```
 
 
+## 4. LoRA parameter-efficient finetuning
 
-## 4. Finetune Llama3 model
+In this step, you will fine-tune the Llama3 model with Low-Rank Adaptation (LoRA), using the Alpaca dataset.
+
+LoRA is a method introduced by Microsoft researchers in 2021 for fine-tuning large language models efficiently. It is a parameter-efficient fine-tuning (PEFT) technique: the pretrained weights stay frozen, and only a pair of small low-rank matrices added alongside selected weight matrices is trained. LoRA rests on the observation that the weight updates needed to adapt a large model have a low intrinsic rank, so they can be represented with far fewer parameters than the full weight matrix. Because the product of the two small matrices has the same shape as the original weight matrix, the update still reaches every entry of that matrix, although a low rank limits how expressive the update can be, which is generally acceptable for most tasks. The result is a drastically smaller number of trainable parameters and a faster, less resource-intensive adaptation process.
+
+The `model` section of the recipe config selects which modules receive adapters and sets the rank:
+
+```yaml
+model:
+  _component_: torchtune.models.llama3.lora_llama3_70b
+  lora_attn_modules: ['q_proj', 'k_proj', 'v_proj']
+  apply_lora_to_mlp: False
+  apply_lora_to_output: False
+  lora_rank: 16
+  lora_alpha: 32
+```
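As a rough sanity check on these settings, the sketch below compares the number of trainable parameters for a full update of one projection matrix against a LoRA update of the same matrix. The 8192 x 8192 shape is an illustrative assumption and is not read from the torchtune config; `rank` and `alpha` mirror `lora_rank` and `lora_alpha` above.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Illustrative dimensions (assumption): one square projection matrix.
d_in=8192
d_out=8192
rank=16    # mirrors lora_rank in the config above
alpha=32   # mirrors lora_alpha in the config above

# Full fine-tuning updates every entry of the d_out x d_in matrix.
full_params=$(( d_in * d_out ))

# LoRA trains B (d_out x rank) and A (rank x d_in) instead.
lora_params=$(( rank * (d_in + d_out) ))

# The low-rank update B*A is scaled by alpha / rank before being added.
scale=$(( alpha / rank ))

echo "full update parameters:      ${full_params}"
echo "LoRA update parameters:      ${lora_params}"
echo "reduction factor:            $(( full_params / lora_params ))x"
echo "update scaling (alpha/rank): ${scale}"
```

Under these assumed dimensions, raising the rank increases the LoRA parameter count linearly, so the ratio is a quick way to judge how much a different `lora_rank` would cost.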
+For this particular example, we use `alpaca_data.json`. This JSON file contains a list of dictionaries, each with the following fields:
+
+* `instruction`: a string that describes the task the model should perform. Each of the 52,000 instructions is unique.
+* `input`: a string providing optional context or input for the task. For instance, if the instruction is "Summarize the following article," the input would be the article text. Approximately 40% of the examples include an input.
+* `output`: a string representing the response to the instruction, as generated by the text-davinci-003 model.
+
+```yaml
+dataset:
+  _component_: torchtune.datasets.alpaca_dataset
+  train_on_input: True
+```
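To make the three fields concrete, here is a minimal sketch of how one record could be turned into a prompt. The record is hypothetical (written for illustration, not taken from `alpaca_data.json`), `build_prompt` is a made-up helper, and the section headers follow the widely used Alpaca layout; the exact template torchtune applies may differ.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical record for illustration only.
instruction="Summarize the following article."
input="LoRA trains small low-rank matrices instead of the full weights."
output="LoRA fine-tunes a model by training low-rank adapters."

# Build a prompt from one record. The "Input" section is emitted only
# when the record carries a non-empty input field.
build_prompt() {
    local instr="$1" inp="$2"
    if [ -n "$inp" ]; then
        printf '### Instruction:\n%s\n\n### Input:\n%s\n\n### Response:\n' "$instr" "$inp"
    else
        printf '### Instruction:\n%s\n\n### Response:\n' "$instr"
    fi
}

# Print the prompt followed by the target response.
prompt="$(build_prompt "$instruction" "$input")"
printf '%s\n%s\n' "$prompt" "$output"
```

Records without an `input` simply skip that section, which is why the field is described as optional.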
 
-In this step, you will fine tune llama model, using Alpaca dataset. 
 
 ```bash
-sbatch 4.finetune.sbatch
+sbatch tutorials/e2e-llama3-70b-development/lora_finetune_distributed.sbatch
 ```
 
 Once the job has been completed, you will see following outputs in the log:
 
+
+```bash
+...
+Executing following command:
+torchtune run --master_addr 10.1.28.89 --master_port 14280 --nproc_per_node=8 --nnodes 1 --nnodes=1 --rdzv_backend=c10d --rdzv_endpoint=p5-st-p5-2 lora_finetune_distributed
+...
+0: wandb: Currently logged in as: <YOURUSERNAME>. Use `wandb login --relogin` to force relogin
+0: wandb: Tracking run with wandb version 0.17.0
+0: wandb: Run data is saved locally in /fsx/ubuntu/models/torchtune/meta-llama/Meta-Llama-3-70B-tuned/log/metrics/wandb/run-20240527_001350-oziekm6j
+0: wandb: Run `wandb offline` to turn off syncing.
+0: wandb: Syncing run helpful-surf-1
+0: wandb: ⭐️ View project at https://wandb.ai/<YOURUSERNAME>/torchtune
+0: wandb: 🚀 View run at https://wandb.ai/<YOURUSERNAME>/torchtune/runs/oziekm6j
+0: 2024-05-27:00:13:50,919 INFO     [metric_logging.py:225] Logging /fsx/ubuntu/models/torchtune/meta-llama/Meta-Llama-3-70B/torchtune_config.yaml to W&B under Files
+```
+
 ```bash
 ==> logs/convert-checkpoint_560.out <==
 0: INFO:torchtune.utils.logging:Model checkpoint of size 4.97 GB saved to /fsx/models/torchtitan-torchtune/meta-llama/Meta-Llama-3-70B-tuned/hf_model_0024_0.pt
@@ -106,6 +142,8 @@ config.json         hf_model_0003_0.pt  hf_model_0006_0.pt  hf_model_0009_0.pt
 hf_model_0001_0.pt  hf_model_0004_0.pt  hf_model_0007_0.pt  hf_model_0010_0.pt  hf_model_0013_0.pt  hf_model_0016_0.pt  hf_model_0019_0.pt  hf_model_0022_0.pt  hf_model_0025_0.pt  hf_model_0028_0.pt
 ```
 
+Notice that you have `adapter_0.pt`, which stores the weights for the LoRA adapter.
+
 ## 5. Evaluate Llama3 model with lm-evaluation harness
 
 In this last section, you will evaluate Llama models. It will make use of [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness). 

diff --git a/.../slurm/train_configs/evaluate_llama3.yaml → ...-development/configs/evaluate_llama3.yaml b/.../slurm/train_configs/evaluate_llama3.yaml → ...-development/configs/evaluate_llama3.yaml
diff --git a/...ma3-70b-development/configs/70b_full.yaml → ...nt/configs/full_finetune_distributed.yaml b/...ma3-70b-development/configs/70b_full.yaml → ...nt/configs/full_finetune_distributed.yaml
diff --git a/.../slurm/train_configs/generate_llama3.yaml → ...-development/configs/generate_llama3.yaml b/.../slurm/train_configs/generate_llama3.yaml → ...-development/configs/generate_llama3.yaml
diff --git a/...ain_configs/finetune_llama3_70b_lora.yaml → ...nt/configs/lora_finetune_distributed.yaml b/...ain_configs/finetune_llama3_70b_lora.yaml → ...nt/configs/lora_finetune_distributed.yaml
@@ -20,11 +20,11 @@ model:
 
 tokenizer:
   _component_: torchtune.models.llama3.llama3_tokenizer
-  path: ${MODEL_PATH}/${HF_MODEL}/original/tokenizer.model
+  path: None
 
 checkpointer:
   _component_: torchtune.utils.FullModelHFCheckpointer
-  checkpoint_dir:  ${MODEL_PATH}/${HF_MODEL}
+  checkpoint_dir:  None
   checkpoint_files: [
       model-00001-of-00030.safetensors,
       model-00002-of-00030.safetensors,
@@ -58,7 +58,7 @@ checkpointer:
       model-00030-of-00030.safetensors,
   ]
   recipe_checkpoint: null
-  output_dir: ${MODEL_PATH}/${HF_MODEL}-tuned
+  output_dir: None
   model_type: LLAMA3
 resume_from_checkpoint: False
 
@@ -88,10 +88,10 @@ max_steps_per_epoch: null
 gradient_accumulation_steps: 1
 
 # Logging
-output_dir: ${MODEL_PATH}/${HF_MODEL}/lora_finetune_output
+output_dir: None
 metric_logger:
-  _component_: torchtune.utils.metric_logging.DiskLogger
-  log_dir: ${MODEL_PATH}/${HF_MODEL}/lora_finetune_output
+  _component_: torchtune.utils.metric_logging.WandBLogger
+  log_dir: None
 log_every_n_steps: 1
 log_peak_memory_stats: False
 

diff --git a/...rm/train_configs/pretrain_llama3_70b.toml → ...elopment/configs/pretrain_llama3_70b.toml b/...rm/train_configs/pretrain_llama3_70b.toml → ...elopment/configs/pretrain_llama3_70b.toml
diff --git a/.../slurm/train_configs/quantize_llama3.yaml → ...-development/configs/quantize_llama3.yaml b/.../slurm/train_configs/quantize_llama3.yaml → ...-development/configs/quantize_llama3.yaml
diff --git a/...orchtune/slurm/evaluation/evaluate.sbatch → ...2e-llama3-70b-development/evaluate.sbatch b/...orchtune/slurm/evaluation/evaluate.sbatch → ...2e-llama3-70b-development/evaluate.sbatch
diff --git a/...ases/torchtune/slurm/tutorials/e2e-llama3-70b-development/lora_finetun_distributed.sbatch b/...ases/torchtune/slurm/tutorials/e2e-llama3-70b-development/lora_finetun_distributed.sbatch
diff --git a/...orchtune/slurm/finetuning/finetune.sbatch → ...elopment/lora_finetune_distributed.sbatch b/...orchtune/slurm/finetuning/finetune.sbatch → ...elopment/lora_finetune_distributed.sbatch
@@ -13,6 +13,14 @@
 #SBATCH --exclusive
 set -euxo pipefail
 
+##################################################################
+########### Check current working directory ######################
+##################################################################
+if [ "$(basename "$(pwd)")" != "slurm" ]
+then
+    echo "Please run this script from the slurm directory"
+    exit 1
+fi
 ##################################################################
 ############# Load environment variables #########################
 ##################################################################
@@ -50,16 +58,9 @@ export NPROC=$SLURM_GPUS_PER_NODE
 export WORLD_SIZE=$(( $NNODES * $NPROC ))
 
 ##################################################################
-############### Create train config ##############################
-##################################################################
-if [ ! -d ${FSX_PATH}/tmp ]; then
-    mkdir -p ${FSX_PATH}/tmp
-fi
-cat ${PWD}/train_configs/finetune_llama3_70b_lora.yaml | envsubst > ${FSX_PATH}/tmp/finetune_llama3_70b_lora.yaml
-
-##################################################################
-################# Set arguments ##################################
+############# Set training arguments #############################
 ##################################################################
+export HF_MODEL="meta-llama/Meta-Llama-3-70B"
 : "${CONTAINER_MOUNT:=$FSX_PATH:$FSX_PATH}"
 declare -a SRUN_ARGS=(
     --container-image $ENROOT_IMAGE
@@ -76,10 +77,19 @@ declare -a TORCHRUN_ARGS=(
     --rdzv_endpoint=$(hostname)
 )
 declare -a TRAIN_ARGS=(
-    --config ${FSX_PATH}/tmp/finetune_llama3_70b_lora.yaml
+    --config  ${PWD}/tutorials/e2e-llama3-70b-development/configs/lora_finetune_distributed.yaml
+    tokenizer.path=${MODEL_PATH}/${HF_MODEL}/original/tokenizer.model
+    checkpointer.checkpoint_dir=${MODEL_PATH}/${HF_MODEL}
+    checkpointer.output_dir=${MODEL_PATH}/${HF_MODEL}-tuned
+    output_dir=${MODEL_PATH}/${HF_MODEL}-tuned/log
+    metric_logger.log_dir=${MODEL_PATH}/${HF_MODEL}-tuned/log/metrics
 )
-
-export TORCHTUNE=${PWD}/torchtune/torchtune/_cli/tune.py
+##################################################################
+################# Run torchtune ##################################
+##################################################################
 export PYTHONPATH=${PWD}/torchtune
-
-srun -l "${SRUN_ARGS[@]}" python ${TORCHTUNE} run  "${TORCHRUN_ARGS[@]}" lora_finetune_distributed "${TRAIN_ARGS[@]}"
+export TORCHTUNE=${PWD}/torchtune/torchtune/_cli/tune.py
+export TORCHTUNE_COMMAND="lora_finetune_distributed"
+echo "Executing following command:"
+echo "torchtune" "run" "${TORCHRUN_ARGS[@]}" "${TORCHTUNE_COMMAND}" "${TRAIN_ARGS[@]}"
+srun -l "${SRUN_ARGS[@]}" python ${TORCHTUNE} run "${TORCHRUN_ARGS[@]}" "${TORCHTUNE_COMMAND}" "${TRAIN_ARGS[@]}"
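Since the paths are now passed as `key=value` overrides rather than substituted into the YAML, it can help to print the assembled arguments before submitting the job. The sketch below is a standalone dry run: the `MODEL_PATH` value and the shortened config path are assumptions for illustration, and only the array shape mirrors `TRAIN_ARGS` in the script.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical values for illustration; the real script reads MODEL_PATH
# from its environment.
MODEL_PATH="/fsx/models"
HF_MODEL="meta-llama/Meta-Llama-3-70B"

# Same shape as TRAIN_ARGS in the script: a config file followed by
# key=value overrides for fields left as placeholders in the YAML.
declare -a TRAIN_ARGS=(
    --config configs/lora_finetune_distributed.yaml
    "tokenizer.path=${MODEL_PATH}/${HF_MODEL}/original/tokenizer.model"
    "checkpointer.checkpoint_dir=${MODEL_PATH}/${HF_MODEL}"
    "checkpointer.output_dir=${MODEL_PATH}/${HF_MODEL}-tuned"
)

# Print one argument per line so each override can be checked by eye.
printf '%s\n' "${TRAIN_ARGS[@]}"
```

Quoting each override keeps it a single array element even if a path were to contain spaces.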
diff --git a/...orchtune/slurm/deployment/quantize.sbatch → ...2e-llama3-70b-development/quantize.sbatch b/...orchtune/slurm/deployment/quantize.sbatch → ...2e-llama3-70b-development/quantize.sbatch