add llama example #1382

Merged
merged 6 commits into from
Sep 19, 2023
54 changes: 54 additions & 0 deletions examples/onnxruntime/training/text-classification/README.md
@@ -16,6 +16,60 @@ limitations under the License.

# Text classification

The script [`run_classification.py`](https://github.com/huggingface/optimum/blob/main/examples/onnxruntime/training/text-classification/run_classification.py)
uses the [`ONNX Runtime`](https://github.com/microsoft/onnxruntime) accelerator to fine-tune models from the
[Hugging Face Hub](https://huggingface.co/models) on text classification tasks.


__The following example applies the acceleration features powered by ONNX Runtime.__


### ONNX Runtime Training

The following example fine-tunes [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) on the [Amazon Reviews Dataset](https://huggingface.co/datasets/amazon_reviews_multi).

```bash
torchrun --nproc_per_node=NUM_GPUS_YOU_HAVE run_classification.py \
--model_name_or_path meta-llama/Llama-2-7b-hf \
--dataset_name amazon_reviews_multi \
--dataset_config_name en \
--shuffle_train_dataset \
--metric_name accuracy \
--text_column_name 'review_title,review_body,product_category' \
--text_column_delimiter ' ' \
--label_column_name stars \
--do_train \
--do_eval \
--fp16 \
--max_seq_length 128 \
--per_device_train_batch_size 16 \
--learning_rate 2e-5 \
--num_train_epochs 1 \
--deepspeed zero_stage_2.json \
--use_peft \
--output_dir /tmp/ort-llama-2/
```
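
The `--text_column_name` and `--text_column_delimiter` flags describe how several dataset columns are merged into one input string. The sketch below illustrates that idea only; `join_text_columns` and the sample record are hypothetical and are not taken from `run_classification.py`, whose actual preprocessing may differ.

```python
def join_text_columns(example, column_names, delimiter=" "):
    """Concatenate the listed columns of one record into a single string.

    `column_names` is a comma-separated string, mirroring the value passed
    to --text_column_name. This helper is illustrative, not the script's code.
    """
    columns = [name.strip() for name in column_names.split(",")]
    return delimiter.join(str(example[name]) for name in columns)


# Hypothetical record shaped like the columns named in the command above.
example = {
    "review_title": "Great value",
    "review_body": "Works as described.",
    "product_category": "kitchen",
    "stars": 5,
}

text = join_text_columns(
    example, "review_title,review_body,product_category", delimiter=" "
)
print(text)  # Great value Works as described. kitchen
```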

### Performance

We obtained the following results for [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) using mixed-precision training, LoRA, and ZeRO Stage 2 with the PyTorch and ONNX Runtime backends. The experiment
ran for 10 epochs on 8 NVIDIA V100 GPUs:

| Model | Backend | Runtime (s) | Train samples/s |
| --------------------------- |------------- | --------------- | ------------------- |
| meta-llama/Llama-2-7b-hf | PyTorch | 17035.9055 | 117.399 |
| meta-llama/Llama-2-7b-hf | ONNX Runtime | 15532.2403 | 128.764 |

Compared to PyTorch, ONNX Runtime yields the following gains:

| Model | Runtime reduction | Throughput gain |
| ------------------------- | ------- | ---------- |
| meta-llama/Llama-2-7b-hf | 8.83% | 9.68% |
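
The percentages above can be reproduced from the measurements in the results table. The snippet below is a small worked check, assuming both gains are computed relative to the PyTorch baseline; the variable names are illustrative.

```python
# Measurements copied from the results table.
pytorch = {"runtime_s": 17035.9055, "samples_per_s": 117.399}
ort = {"runtime_s": 15532.2403, "samples_per_s": 128.764}

# Relative reduction in total runtime, with PyTorch as the baseline.
runtime_reduction = (
    (pytorch["runtime_s"] - ort["runtime_s"]) / pytorch["runtime_s"] * 100
)

# Relative increase in training throughput, with PyTorch as the baseline.
throughput_gain = (
    (ort["samples_per_s"] - pytorch["samples_per_s"])
    / pytorch["samples_per_s"]
    * 100
)

print(f"Runtime reduction: {runtime_reduction:.2f}%")  # 8.83%
print(f"Throughput gain:   {throughput_gain:.2f}%")    # 9.68%
```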

#### DeepSpeed

[zero_stage_2.json](https://github.com/huggingface/optimum/blob/main/examples/onnxruntime/training/text-classification/zero_stage_2.json) is an example DeepSpeed config file that enables ZeRO Stage 2, which partitions optimizer states and gradients across GPUs, for training meta-llama/Llama-2-7b-hf. More information can be found in [DeepSpeed's official repo](https://github.com/microsoft/DeepSpeed).
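
For orientation, a minimal ZeRO Stage-2 config can be generated as sketched below. This is an illustration under stated assumptions, not the contents of the repository's `zero_stage_2.json`: the selection of keys is a guess at a reasonable minimum, and the `"auto"` values assume the Hugging Face Trainer's DeepSpeed integration fills them in from the command-line arguments.

```python
import json

# Assumed minimal config; the real zero_stage_2.json may set other options.
zero_stage_2_config = {
    # Let the Trainer decide mixed precision from the --fp16 flag.
    "fp16": {"enabled": "auto"},
    # Stage 2 partitions optimizer states and gradients across data-parallel ranks.
    "zero_optimization": {
        "stage": 2,
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
    # Batch settings are derived from the Trainer arguments.
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "train_batch_size": "auto",
}

config_json = json.dumps(zero_stage_2_config, indent=2)
print(config_json)
```

Saving the printed JSON to a file and passing its path to `--deepspeed` would be one way to try it; compare against the linked file before relying on it.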

## GLUE Tasks

By running the script [`run_glue.py`](https://github.com/huggingface/optimum/blob/main/examples/onnxruntime/training/text-classification/run_glue.py),