Commit

update

KeitaW committed May 31, 2024
1 parent 7eb0f66 commit d4029d2

Showing 7 changed files with 183 additions and 56 deletions.
6 changes: 3 additions & 3 deletions 3.test_cases/torchtune/README.md
@@ -4,11 +4,11 @@ This guide demonstrates the comprehensive process of developing a Large Language

![LLMOps](docs/LLMOps.png)

1. **Data Preparation**: The journey begins with the collection and preparation of data for training. This step is crucial as it involves exploring the data's characteristics, performing necessary cleaning, and applying preprocessing techniques to ensure the data is in the right shape for model training.

2. **(Continuous) Pretraining the Language Model**: Next, the language model undergoes pretraining on a vast corpus of text data. This step can be bypassed if starting with an already pretrained model. Pretraining is essential for the model to learn the general patterns and structures of language. Refer to the `torchtitan` test case for large-scale pretraining with the latest techniques such as 3D parallelism and `torch.compile`.

3. **Instruction Tuning**: The pretrained model is then fine-tuned to cater to specific tasks by updating its parameters with a new dataset. This process involves partially retraining the model with samples that exemplify the desired behavior, thus refining the model weights for the particular application.

4. **Alignment**: The instruction-tuned model is then aligned with human preferences, typically using feedback-based techniques such as RLHF or DPO, so that its responses are helpful, safe, and consistent with the intended behavior.

5. **Evaluation**: Evaluating the LLM's performance is a critical step. It involves using various metrics to assess the model's accuracy and effectiveness. This step is vital for validating new techniques and objectively comparing different model releases.

Binary file modified 3.test_cases/torchtune/docs/LLMOps.png
@@ -1,10 +1,11 @@
# End-to-End Llama3-70B model development with Torchtune <!-- omit in toc -->

In this tutorial, you will see how to:
* Continuous Pretraining
* Instruction Finetuning
* Alignment
* Evaluation
* Deployment

## 1. Prerequisites
Before starting, ensure you have requested access to Meta-Llama-3-70B by visiting [Meta-Llama-3-70B](https://huggingface.co/meta-llama/Meta-Llama-3-70B) on Hugging Face and following the access request instructions. Additionally, make sure all prerequisites described in the [slurm](..) directory are set up.
@@ -64,16 +65,16 @@ This output confirms that the `torchtune download` command has been executed wit
By following these steps, you ensure that the necessary model components are in place, setting the stage for subsequent tasks such as pretraining, finetuning, evaluation, and deployment.


## 3. Continuous Pretraining

In this step, you will continue pretraining the Llama3 model on the c4 dataset. This step uses full-parameter finetuning, which updates all of the parameters in the original model.

```bash
sbatch tutorials/e2e-llama3-70b-development/pretrain.sbatch
```


## 4. Instruction-tuning

In this step, you will fine-tune the LLaMA model using Low-Rank Adaptation (LoRA) with the Alpaca dataset. We will first cover the basic concepts and relevant configurations found in the [config file](configs/lora_finetune_distributed.yaml), followed by a detailed fine-tuning tutorial.

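LoRA keeps the pretrained weight matrices frozen and learns only a small low-rank update for each adapted layer, which is why it needs far less GPU memory than full-parameter finetuning. The snippet below is a minimal, self-contained PyTorch sketch of that idea (for illustration only; it is not torchtune's implementation, and the dimensions and rank are arbitrary):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal sketch of a LoRA-augmented linear layer (illustrative only)."""

    def __init__(self, in_dim: int, out_dim: int, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(in_dim, out_dim, bias=False)
        self.base.weight.requires_grad = False               # frozen pretrained weight
        self.lora_a = nn.Linear(in_dim, rank, bias=False)    # trainable low-rank factor A
        self.lora_b = nn.Linear(rank, out_dim, bias=False)   # trainable low-rank factor B
        nn.init.zeros_(self.lora_b.weight)                   # start as a no-op update
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Output = frozen base projection + scaled low-rank update B(A(x))
        return self.base(x) + self.scaling * self.lora_b(self.lora_a(x))

layer = LoRALinear(4096, 4096, rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in layer.parameters())
print(f"trainable parameters: {trainable:,} of {total:,}")   # only a tiny fraction is trained
```

During finetuning only the low-rank factors receive gradients, so the optimizer state and checkpoint deltas stay small while the frozen base weights are shared with the original model.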
@@ -111,6 +112,10 @@ dataset:

As the config suggests, we use a predefined dataset class prepared in torchtune.

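To make the data format concrete, the following sketch shows how an Alpaca-style record is typically rendered into a training prompt. The sample record and template below are illustrative; torchtune's dataset class performs this formatting (and tokenization) internally.

```python
# Hypothetical example record and a commonly used Alpaca-style prompt template
# (illustrative only; not torchtune's exact implementation).
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task, paired with an input that "
    "provides further context. Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)

sample = {
    "instruction": "Summarize the following text.",
    "input": "torchtune is a PyTorch library for finetuning large language models.",
    "output": "torchtune helps you finetune LLMs with PyTorch.",
}

prompt = ALPACA_TEMPLATE.format(**sample)      # extra keys in `sample` are ignored
full_text = prompt + sample["output"]          # the model learns to produce the response
print(full_text)
```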
## 5. Alignment



### Submit Finetuning job

You can submit the finetuning job with the following command:
@@ -226,15 +231,33 @@ quantizer:
groupsize: 256
```
`Int4WeightOnlyQuantizer` performs per-axis group quantization, which means it quantizes weights in groups rather than individually. By adjusting the `groupsize`, one can control the trade-off between compression ratio and accuracy. Smaller group sizes typically lead to higher accuracy but lower compression, while larger group sizes achieve higher compression at the potential cost of accuracy.
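To see what group quantization means concretely, here is a small, self-contained PyTorch sketch (illustrative only, not the torchtune/torchao implementation): each group of `groupsize` weights shares one scale, so a smaller group size stores more scales (less compression) but introduces less rounding error.

```python
import torch

def fake_int4_group_quant(w: torch.Tensor, groupsize: int = 256) -> torch.Tensor:
    """Quantize-dequantize weights in groups of `groupsize` values (sketch only)."""
    flat = w.reshape(-1, groupsize)                       # one row per group
    qmax = 7                                              # symmetric int4 range is [-8, 7]
    scale = flat.abs().amax(dim=1, keepdim=True).clamp_min(1e-8) / qmax
    q = torch.clamp(torch.round(flat / scale), -8, 7)     # 4-bit integer codes
    return (q * scale).reshape(w.shape)                   # dequantized approximation

w = torch.randn(4096, 4096)
for gs in (32, 128, 256):
    err = (w - fake_int4_group_quant(w, gs)).abs().mean()
    print(f"groupsize={gs:4d}  mean abs error={err:.5f}")
```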

```bash
sbatch quantize.sbatch
```


```bash
Executing following command:
torchtune run quantize --config /fsx/ubuntu/awsome-distributed-training/3.test_cases/torchtune/slurm/tutorials/e2e-llama3-70b-development/configs/quantize.yaml tokenizer.path=/fsx/ubuntu/models/torchtune/meta-llama/Meta-Llama-3-70B/original/tokenizer.model checkpointer.checkpoint_dir=/fsx/ubuntu/models/torchtune/meta-llama/Meta-Llama-3-70B-tuned checkpointer.output_dir=/fsx/ubuntu/models/torchtune/meta-llama/Meta-Llama-3-70B-quantized
```

The resulting quantized weights are saved as follows:

```bash
0: 2024-05-31:02:10:46,964 DEBUG [seed.py:60] Setting manual seed to local seed 1234. Local seed is seed + rank = 1234 + 0
0: 2024-05-31:02:18:17,728 INFO [quantize.py:90] Model is initialized with precision torch.bfloat16.
0: 2024-05-31:02:20:33,576 INFO [quantize.py:98] Time for quantization: 133.08 sec
0: 2024-05-31:02:20:33,577 INFO [quantize.py:99] Memory used: 40.03 GB
0: 2024-05-31:02:21:18,609 INFO [quantize.py:112] Model checkpoint of size 37.94 GB saved to /fsx/ubuntu/models/torchtune/meta-llama/Meta-Llama-3-70B-quantized/hf_model_0001_0-4w.pt
```


## 7. Generation

Now that you have a production-ready quantized model, this last step tests text generation using the model.

```bash
sbatch 7.generate.sbatch --config configs/generate_llama3.yaml --prompt "Hello, my name is"
```
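Conceptually, text generation is an autoregressive loop: the model predicts a distribution over the next token, a token is chosen, appended to the sequence, and the process repeats. The sketch below shows plain greedy decoding; it is illustrative only, the `model` and `tokenizer` interfaces are assumptions, and the actual recipe (driven by `generate_llama3.yaml`) adds sampling parameters and KV caching.

```python
import torch

def greedy_generate(model, tokenizer, prompt: str, max_new_tokens: int = 32) -> str:
    """Conceptual greedy-decoding loop; model/tokenizer interfaces are assumed."""
    tokens = tokenizer.encode(prompt)                 # assumed: returns a list[int]
    for _ in range(max_new_tokens):
        logits = model(torch.tensor([tokens]))        # assumed: (1, seq_len, vocab_size) logits
        next_token = int(logits[0, -1].argmax())      # greedy: pick the most likely token
        tokens.append(next_token)
        if next_token == tokenizer.eos_id:            # assumed end-of-sequence token id
            break
    return tokenizer.decode(tokens)
```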
@@ -12,42 +12,43 @@ checkpointer:
_component_: torchtune.utils.FullModelHFCheckpointer
checkpoint_dir: ${MODEL_PATH}/${HF_MODEL}
checkpoint_files: [
hf_model_0001_0.pt,
hf_model_0002_0.pt,
hf_model_0003_0.pt,
hf_model_0004_0.pt,
hf_model_0005_0.pt,
hf_model_0006_0.pt,
hf_model_0007_0.pt,
hf_model_0008_0.pt,
hf_model_0009_0.pt,
hf_model_0010_0.pt,
hf_model_0011_0.pt,
hf_model_0012_0.pt,
hf_model_0013_0.pt,
hf_model_0014_0.pt,
hf_model_0015_0.pt,
hf_model_0016_0.pt,
hf_model_0017_0.pt,
hf_model_0018_0.pt,
hf_model_0019_0.pt,
hf_model_0020_0.pt,
hf_model_0021_0.pt,
hf_model_0022_0.pt,
hf_model_0023_0.pt,
hf_model_0024_0.pt,
hf_model_0025_0.pt,
hf_model_0026_0.pt,
hf_model_0027_0.pt,
hf_model_0028_0.pt,
hf_model_0029_0.pt,
hf_model_0030_0.pt,
]
recipe_checkpoint: null
output_dir: ${MODEL_PATH}/${HF_MODEL}-quantized
model_type: LLAMA3

device: cpu
dtype: bf16
seed: 1234

@@ -0,0 +1,95 @@
#!/bin/bash

# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
# SPDX-License-Identifier: MIT-0

#SBATCH --job-name=full-finetuning
#SBATCH --nodes=2
#SBATCH --ntasks=2
#SBATCH --gpus-per-node=8 # Number of GPUs per node
#SBATCH --output=logs/%x_%j.out # logfile for stdout
#SBATCH --error=logs/%x_%j.err # logfile for stderr, remove it to merge both outputs
#SBATCH --wait-all-nodes=1
#SBATCH --exclusive
set -euxo pipefail

##################################################################
########### Check current working directory ######################
##################################################################
if [ $(basename $(pwd)) != "slurm" ]
then
echo "Please run this script from the slurm directory"
exit 1
fi
##################################################################
############# Load environment variables #########################
##################################################################
# Load environment variables
if [ ! -f .env ]
then
echo "Please create a .env file with the required environment variables"
exit 1
else
source .env
fi

##################################################################
######### Define EFA/NCCL/Slurm environment variables ############
##################################################################
## EFA settings
export FI_LOG_LEVEL=1
export FI_PROVIDER=efa # change to eth if you want to use ENA for comparisons
export FI_EFA_USE_HUGE_PAGE=0
# https://discuss.pytorch.org/t/nccl-network-is-unreachable-connection-refused-when-initializing-ddp/137352
# https://github.com/pytorch/pytorch/issues/68893
export NCCL_SOCKET_IFNAME=en
export TORCH_NCCL_ASYNC_ERROR_HANDLING=1
export NCCL_DEBUG=INFO
export HOSTNAMES=`scontrol show hostnames "$SLURM_JOB_NODELIST"`
export MASTER_ADDR=$(scontrol show hostnames "$SLURM_JOB_NODELIST" | head -n 1)
export COUNT_NODE=`scontrol show hostnames "$SLURM_JOB_NODELIST" | wc -l`
export NODES=( $( scontrol show hostnames $SLURM_JOB_NODELIST ) )
export NODES_ARRAY=($NODES)
export HEAD_NODE=${NODES_ARRAY[0]}
export MASTER_ADDR=$(hostname --ip-address)
export MASTER_PORT=$RANDOM
export NNODES=$SLURM_JOB_NUM_NODES
export NPROC=$SLURM_GPUS_PER_NODE
export WORLD_SIZE=$(( $NNODES * $NPROC ))

##################################################################
############# Set training arguments #############################
##################################################################
export HF_MODEL="meta-llama/Meta-Llama-3-70B"
: "${CONTAINER_MOUNT:=$FSX_PATH:$FSX_PATH}"
declare -a SRUN_ARGS=(
--container-image $ENROOT_IMAGE
--container-mounts $CONTAINER_MOUNT
)
declare -a TORCHRUN_ARGS=(
--master_addr $MASTER_ADDR
--master_port $MASTER_PORT
# change this to match the number of GPUs per node:
--nproc_per_node=8
--nnodes=$SLURM_JOB_NUM_NODES
--rdzv_backend=c10d
--rdzv_endpoint=$(hostname)
)
declare -a TRAIN_ARGS=(
--config ${PWD}/tutorials/e2e-llama3-70b-development/configs/lora_finetune_distributed.yaml
tokenizer.path=${MODEL_PATH}/${HF_MODEL}/original/tokenizer.model
checkpointer.checkpoint_dir=${MODEL_PATH}/${HF_MODEL}
checkpointer.output_dir=${MODEL_PATH}/${HF_MODEL}-tuned
output_dir=${MODEL_PATH}/${HF_MODEL}-tuned/log
metric_logger.log_dir=${MODEL_PATH}/${HF_MODEL}-tuned/log/metrics
)
##################################################################
################# Run torchtune ##################################
##################################################################
export PYTHONPATH=${PWD}/torchtune
export TORCHTUNE=${PWD}/torchtune/torchtune/_cli/tune.py
export TORCHTUNE_COMMAND="full_finetune_distributed"
echo "Executing following command:"
echo "torchtune" "run" "${TORCHRUN_ARGS[@]}" "${TORCHTUNE_COMMAND}" "${TORCHTUNE_ARGS[@]}"
srun -l "${SRUN_ARGS[@]}" python ${TORCHTUNE} run "${TORCHRUN_ARGS[@]}" "${TORCHTUNE_COMMAND}" "${TRAIN_ARGS[@]}"
@@ -13,6 +13,14 @@
#SBATCH --exclusive
set -euxo pipefail

##################################################################
########### Check current working directory ######################
##################################################################
if [ $(basename $(pwd)) != "slurm" ]
then
echo "Please run this script from the slurm directory"
exit 1
fi
##################################################################
############# Load environment variables #########################
##################################################################
@@ -50,26 +58,26 @@ export NPROC=$SLURM_GPUS_PER_NODE
export WORLD_SIZE=$(( $NNODES * $NPROC ))

##################################################################
############# Set training arguments #############################
##################################################################
export HF_MODEL="meta-llama/Meta-Llama-3-70B"
: "${CONTAINER_MOUNT:=$FSX_PATH:$FSX_PATH}"
declare -a SRUN_ARGS=(
--container-image $ENROOT_IMAGE
--container-mounts $CONTAINER_MOUNT
)
declare -a TRAIN_ARGS=(
--config ${PWD}/tutorials/e2e-llama3-70b-development/configs/quantize.yaml
tokenizer.path=${MODEL_PATH}/${HF_MODEL}/original/tokenizer.model
checkpointer.checkpoint_dir=${MODEL_PATH}/${HF_MODEL}-tuned
checkpointer.output_dir=${MODEL_PATH}/${HF_MODEL}-quantized
)

##################################################################
################# Run torchtune ##################################
##################################################################
export PYTHONPATH=${PWD}/torchtune

export TORCHTUNE=${PWD}/torchtune/torchtune/_cli/tune.py
export TORCHTUNE_COMMAND="quantize"
echo "Executing following command:"
echo "torchtune" "run" "${TORCHTUNE_COMMAND}" "${TRAIN_ARGS[@]}"
srun -l "${SRUN_ARGS[@]}" python ${TORCHTUNE} run "${TORCHTUNE_COMMAND}" "${TRAIN_ARGS[@]}"
