diff --git a/python/models/mistral/README.md b/python/models/mistral/README.md
index 7c492f9e1..fbe37c0dd 100644
--- a/python/models/mistral/README.md
+++ b/python/models/mistral/README.md
@@ -1,6 +1,6 @@
 # Mistral 7B v0.1 Inference Benchmarking
 
-This demo will show how to run Inference benchmark for comparing ONNX Runtime with Torch Eager mode and Torch compile.
+This demo shows how to run an inference benchmark comparing ONNX Runtime, Torch Eager mode, and Torch compile on the Mistral 7B model.
 
 ## Background
 
@@ -22,23 +22,23 @@ pip install azure-ai-ml azure-identity
 
 #### AzureML Workspace
 - An AzureML workspace is required to run this demo. Download the config.json file ([How to get config.json file from Azure Portal](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-configure-environment#workspace)) for your workspace. Make sure to put this config file in this folder and name it ws_config.json.
 - The workspace should have a gpu cluster. This demo was tested with GPU cluster of SKU [Standard_ND40rs_v2](https://docs.microsoft.com/en-us/azure/virtual-machines/ndv2-series). See this document for [creating gpu cluster](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-create-attach-compute-cluster?tabs=python). We do not recommend running this demo on `NC` series VMs which uses old architecture (K80).
-- Additionally, you'll need to create a [Custom Curated Environment ACPT](https://learn.microsoft.com/en-us/azure/machine-learning/resource-curated-environments) with PyTorch >=2.0.1 and the requirements file in the environment folder:
+- Additionally, you'll need to create a [Custom Curated Environment ACPT](https://learn.microsoft.com/en-us/azure/machine-learning/resource-curated-environments) with PyTorch >=2.0.1 and the requirements file in the environment folder.
 
 ## Run Experiments
 The demo is ready to be run.
 
-#### `aml_submit.py` submits an training job to AML for both Pytorch+DeepSpeed+LoRA and ORT+DeepSpeed+LoRA. This job builds the training environment and runs the fine-tuning script in it.
+#### `aml_submit_mistral_inference.py` submits an inference job to AML for ONNX Runtime, Torch Eager mode, and Torch compile. This job builds the environment and runs the [benchmark script](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/transformers/models/llama/benchmark.py) from the onnxruntime repository.
 
 ```bash
-python aml_submit.py
+python aml_submit_mistral_inference.py
 ```
 
-The above script will generate a URL showing the prompt processing and token generation time for each case..
+The above script prints a job URL where you can view the prompt processing time (the step that produces past_key_values) and the token generation time (the steps that reuse past_key_values) for each case.
 
 ### Run directly on your compute
 
-If you are using CLI by directly logging into your machine then you can follow the below instructions. The below steps assume you have the required packages like Pytorch, ORT Nightly GPU, Transformers and more already installed in your system. For easier setup, you can look at the environment folder.
+If you are logged into your machine and working directly from the CLI, follow the instructions below. They assume that the required packages, such as PyTorch, ORT Nightly GPU, and Transformers, are already installed on your system. For an easier setup, see the environment folder.
 
 ```bash
 cd inference-code
diff --git a/python/models/mistral/aml_submit_mistral_inference.py b/python/models/mistral/aml_submit_mistral_inference.py
index e1426b68e..2e8950086 100644
--- a/python/models/mistral/aml_submit_mistral_inference.py
+++ b/python/models/mistral/aml_submit_mistral_inference.py
@@ -53,10 +53,8 @@ def main(raw_args=None):
 
     # https://huggingface.co/datasets/dair-ai/emotion
     dataset_name = "databricks/databricks-dolly-15k"
-    dataset_config_name = "split"
-    text_column_name = "text"
 
-    pytorch_job = command(
+    inference_job = command(
         code=code_dir, # local path where the code is stored
         command=f"bash inference_setup.sh",
         environment=Environment(build=BuildContext(path=environment_dir)),
@@ -72,12 +70,12 @@ def main(raw_args=None):
         shm_size="16g"
     )
 
-    print("submitting PyTorch job for " + model)
-    pytorch_returned_job = ml_client.create_or_update(pytorch_job)
+    print("submitting Inference job for " + model)
+    inference_returned_job = ml_client.create_or_update(inference_job)
     print("submitted job")
 
-    pytorch_aml_url = pytorch_returned_job.studio_url
-    print("job link:", pytorch_aml_url)
+    inference_aml_url = inference_returned_job.studio_url
+    print("Inference Benchmark job link:", inference_aml_url)
 
 
 if __name__ == "__main__":
diff --git a/python/models/mistral/environment/Dockerfile b/python/models/mistral/environment/Dockerfile
index 6bce36393..134a48dfd 100644
--- a/python/models/mistral/environment/Dockerfile
+++ b/python/models/mistral/environment/Dockerfile
@@ -1,16 +1,9 @@
-#FROM ptebic.azurecr.io/test/internal/aifx/acpt/nightly-ubuntu2004-cu118-py38-torch220dev:20230929_ds_stage3_with_optimum_v2
-# FROM mcr.microsoft.com/azureml/aifx/stable-ubuntu2004-cu118-py38-torch211
 FROM mcr.microsoft.com/aifx/acpt/stable-ubuntu2004-cu118-py38-torch211
 
-RUN pip uninstall onnxruntime-training -y
-
 RUN pip install -U --pre torch --index-url https://download.pytorch.org/whl/nightly/cu118
-
+RUN pip uninstall onnxruntime-training -y
 RUN pip install --pre ort-nightly-gpu --extra-index-url=https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/ORT-Nightly/pypi/simple/
-
-RUN pip install transformers
-RUN pip install optimum
-
+RUN pip install transformers==4.35.2
+RUN pip install optimum==1.14.1
 RUN pip install py3nvml
-
 RUN pip list
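The README hunk above times two distinct phases: prompt processing is the first forward pass, which consumes the whole prompt and produces `past_key_values`, while token generation feeds one new token per step and reuses that cache. Below is an illustrative sketch of that split in plain `transformers`; it is not the benchmark script, and the model id, device, and dtype are assumptions made for illustration.

```python
# Illustrative sketch of the two benchmarked phases; not the benchmark script.
# Assumptions: CUDA device, fp16 weights, mistralai/Mistral-7B-v0.1 from the Hub.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16
).to("cuda").eval()

inputs = tokenizer("What is Dolly?", return_tensors="pt").to("cuda")

with torch.no_grad():
    # Prompt processing: the full prompt goes in, past_key_values comes out.
    out = model(**inputs, use_cache=True)
    past_key_values = out.past_key_values

    # Token generation: one new token in, cached keys/values reused.
    next_token = out.logits[:, -1:].argmax(dim=-1)
    out = model(input_ids=next_token, past_key_values=past_key_values, use_cache=True)
```

The same two phases exist for Torch Eager, `torch.compile`, and ONNX Runtime; only the backend executing the forward pass changes.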
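For readers adapting `aml_submit_mistral_inference.py` to another workspace, here is a minimal sketch of the `azure-ai-ml` submission pattern the script relies on: an `MLClient` loaded from ws_config.json, a `command` job, and `create_or_update`. The compute name and local paths are placeholder assumptions, not values taken from the script.

```python
# Minimal sketch of the AML submission pattern; placeholder values are marked.
from azure.ai.ml import MLClient, command
from azure.ai.ml.entities import BuildContext, Environment
from azure.identity import DefaultAzureCredential

# Load the workspace described by ws_config.json (downloaded from the Azure portal).
ml_client = MLClient.from_config(DefaultAzureCredential(), path="ws_config.json")

inference_job = command(
    code="./inference-code",            # placeholder: folder holding inference_setup.sh
    command="bash inference_setup.sh",  # same entry point the patch uses
    environment=Environment(build=BuildContext(path="./environment")),
    compute="gpu-cluster",              # placeholder: your Standard_ND40rs_v2 cluster
    display_name="mistral-7b-inference-benchmark",
    shm_size="16g",                     # matches the patch
)

returned_job = ml_client.create_or_update(inference_job)
print("Inference Benchmark job link:", returned_job.studio_url)
```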