Add torchbench exports and benchmarks. #845

Status: Open. Wants to merge 20 commits into `main`.

70 changes: 70 additions & 0 deletions models/turbine_models/custom_models/torchbench/README.md
@@ -0,0 +1,70 @@
# SHARK torchbench exports and benchmarks

## Overview

This directory holds scripts and utilities for running a suite of benchmarked inference tasks, to show functional and performance parity between SHARK/IREE and native torch.compile workflows. It is under active development, so benchmark numbers should not be taken as the best achievable with the current state of IREE compiler optimizations.

Eventually, we want this process to plug into the upstream torchbench process by exposing the IREE methodology shown here as a compile/runtime backend for the torch benchmark classes. For now, it is set up for developers to get preliminary results and to reach broad functional coverage of the models listed in export.py.

In a few places, the setup instructions below use "gfx942" as the IREE/LLVM HIP target. This corresponds to MI300X accelerators; you can find a mapping of AMD GPUs to their LLVM target architectures [here](https://llvm.org/docs/AMDGPUUsage.html#amdgpu-architecture-table) and replace "gfx942" in the following commands with your target.

## Setup (docker)

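Use the provided `shark_torchbench.dockerfile` with the following commands to build the image, start a container, and run the export and benchmark script inside it.
These commands make a few assumptions about your machine and distro (for example, that the ROCm device nodes `/dev/kfd` and `/dev/dri` exist and that the `video` group grants GPU access), so read them and make sure they do what you want.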

```shell
docker build --platform linux/amd64 --tag shark_torchbench --file shark_torchbench.dockerfile .
```
```shell
docker run -it --network=host --device=/dev/kfd --device=/dev/dri --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v ./shark_torchbench_outputs:/SHARK-Turbine/models/turbine_models/custom_models/torchbench/outputs -w /SHARK-Turbine/models/turbine_models/custom_models/torchbench shark_torchbench:latest
```
```shell
python3 ./export.py --target=gfx942 --device=rocm --compile_to=vmfb --performance --inference --precision=fp16 --float16 --external_weights=safetensors --external_weights_dir=./torchbench_weights/ --output_csv=./outputs/torchbench_results_SHARK.csv
```
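The export command above passes several flags (`--performance`, `--inference`, `--float16`) that `cmd_opts.py` does not define. Because `cmd_opts.py` parses with `parse_known_args()`, such flags are returned in a separate `unknown` list instead of raising an error. The sketch below reproduces that behavior with a reduced copy of a few options from `cmd_opts.py`; it assumes `export.py` reads its options from `cmd_opts.args`, and what `export.py` then does with the leftover flags is not shown in this diff.

```python
import argparse

# Reduced copy of a few options defined in cmd_opts.py (not the full parser).
p = argparse.ArgumentParser()
p.add_argument("--target", type=str, default="gfx942")
p.add_argument("--device", type=str, default="local-task")
p.add_argument("--compile_to", type=str, default="mlir")
p.add_argument("--precision", type=str, default="fp16")
p.add_argument(
    "--external_weights",
    type=str,
    default="irpa",
    choices=["safetensors", "irpa", "gguf", None],
)
p.add_argument("--external_weights_dir", type=str, default="")
p.add_argument("--output_csv", type=str, default="./benchmark_results.csv")

# Arguments taken from the docker export command above.
argv = [
    "--target=gfx942",
    "--device=rocm",
    "--compile_to=vmfb",
    "--performance",
    "--inference",
    "--precision=fp16",
    "--float16",
    "--external_weights=safetensors",
    "--external_weights_dir=./torchbench_weights/",
    "--output_csv=./outputs/torchbench_results_SHARK.csv",
]

# parse_known_args() returns (namespace, leftover_args) instead of erroring
# on flags that this parser does not define.
args, unknown = p.parse_known_args(argv)
print(args.compile_to, args.external_weights)  # vmfb safetensors
print(unknown)  # ['--performance', '--inference', '--float16']
```

A plain `parse_args()` would instead exit with an "unrecognized arguments" error on the same command line.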


## Setup (source)

### Setup source code and prerequisites

- Install the nightly PyTorch packages built for ROCm 6.1:
```shell
pip install torch==2.5.0.dev20240801+rocm6.1 torchvision==0.20.0.dev20240801+rocm6.1 torchaudio==2.4.0.dev20240801+rocm6.1 --index-url https://download.pytorch.org/whl/nightly/rocm6.1
```
- Work around an `amdsmi` error in pre-release PyTorch+ROCm builds:
```shell
sudo apt install amd-smi-lib
sudo chown -R $USER:$USER /opt/rocm/share/amd_smi
python3 -m pip install /opt/rocm/share/amd_smi
```
- Clone PyTorch and make its `benchmarks` directory importable as a package by adding an empty `__init__.py`:
```shell
git clone https://github.com/pytorch/pytorch
cd pytorch/benchmarks
touch __init__.py
cd ../..
```
- Clone the PyTorch benchmark suite (torchbench), install the required models, and install the suite as an editable package:
```shell
git clone https://github.com/pytorch/benchmark
cd benchmark
python3 install.py --models BERT_pytorch Background_Matting LearningToPaint alexnet dcgan densenet121 hf_Albert hf_Bart hf_Bert hf_GPT2 hf_T5 mnasnet1_0 mobilenet_v2 mobilenet_v3_large nvidia_deeprecommender pytorch_unet resnet18 resnet50 resnext50_32x4d shufflenet_v2_x1_0 squeezenet1_1 timm_nfnet timm_efficientnet timm_regnet timm_resnest timm_vision_transformer timm_vovnet vgg16
pip install -e .
cd ..
```

### Export and compile

```shell
python ./export.py --target=gfx942 --device=rocm --compile_to=vmfb --performance --inference --precision=fp16 --float16 --external_weights=safetensors --external_weights_dir=./torchbench_weights/
```
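To export a subset of models one at a time (for example, to isolate failures), you can loop over model IDs and pass each one with `--model_id`, which otherwise defaults to `all`. The sketch below builds the command lines from the flags shown above; the helper names `export_cmd` and `export_models` and the model IDs in the example are illustrative, and setting `dry_run=False` assumes you are in this directory with the dependencies installed.

```python
import shlex
import subprocess

# Flags copied from the export command above.
BASE_FLAGS = [
    "--target=gfx942",
    "--device=rocm",
    "--compile_to=vmfb",
    "--performance",
    "--inference",
    "--precision=fp16",
    "--float16",
    "--external_weights=safetensors",
    "--external_weights_dir=./torchbench_weights/",
]


def export_cmd(model_id: str) -> list[str]:
    """Hypothetical helper: build the export.py command for a single model."""
    return ["python", "./export.py", *BASE_FLAGS, f"--model_id={model_id}"]


def export_models(model_ids, dry_run=True):
    """Export each model in turn; with dry_run=True, only print the commands.

    Returns the IDs of models whose export exited with a nonzero status.
    """
    failures = []
    for model_id in model_ids:
        cmd = export_cmd(model_id)
        print(shlex.join(cmd))
        if dry_run:
            continue
        # check=False so one failing model does not stop the remaining exports.
        if subprocess.run(cmd, check=False).returncode != 0:
            failures.append(model_id)
    return failures


if __name__ == "__main__":
    export_models(["mobilenet_v3_large", "resnet50"])
```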

### Example: manual benchmark using export.py and the IREE runtime CLI (mobilenet_v3_large)

```shell
python ./export.py --target=gfx942 --device=rocm --compile_to=vmfb --performance --inference --precision=fp16 --float16 --external_weights=safetensors --external_weights_dir=./torchbench_weights/ --model_id=mobilenet_v3_large

iree-benchmark-module --module=generated/mobilenet_v3_large_256_fp16_gfx942.vmfb --input=@generated/mobilenet_v3_large_input0.npy --parameters=model=./torchbench_weights/mobilenet_v3_large_fp16.irpa --device=hip://0 --device_allocator=caching --function=main --benchmark_repetitions=10
```
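To repeat the manual benchmark for other models, the sketch below builds the same `iree-benchmark-module` invocation from its parts. The helper name `benchmark_cmd` is hypothetical, and the file paths are passed in explicitly rather than derived, because the naming pattern of the generated files (for example, the `256` component of the .vmfb name) is only shown for this one model; check the actual files under `generated/` and `./torchbench_weights/`.

```python
import shlex


def benchmark_cmd(vmfb, inputs, weights, device="hip://0", repetitions=10):
    """Hypothetical helper: build an iree-benchmark-module command line.

    vmfb: compiled module path; inputs: list of .npy input paths;
    weights: external parameter file loaded into the "model" scope.
    """
    return [
        "iree-benchmark-module",
        f"--module={vmfb}",
        # One --input flag per function argument, in order.
        *[f"--input=@{path}" for path in inputs],
        f"--parameters=model={weights}",
        f"--device={device}",
        "--device_allocator=caching",
        "--function=main",
        f"--benchmark_repetitions={repetitions}",
    ]


# Reproduces the mobilenet_v3_large command above.
cmd = benchmark_cmd(
    "generated/mobilenet_v3_large_256_fp16_gfx942.vmfb",
    ["generated/mobilenet_v3_large_input0.npy"],
    "./torchbench_weights/mobilenet_v3_large_fp16.irpa",
)
print(shlex.join(cmd))
```

The resulting list can be passed directly to `subprocess.run` to time each model in turn.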
Empty file.
159 changes: 159 additions & 0 deletions models/turbine_models/custom_models/torchbench/cmd_opts.py
@@ -0,0 +1,159 @@
import argparse
import os
from pathlib import Path


def path_expand(s):
    return Path(s).expanduser().resolve()


def is_valid_file(arg):
    if not os.path.exists(arg):
        return None
    else:
        return arg


# Note: this is where command-line options for the scripts in this directory
# are defined along with their defaults. Thus, they should not be referenced
# within modelling or inference code, only at the entry point to the script.

# We should consider separating out the options that are "model configs" from
# the options that control the compiler, runtime, and script behavior,
# when applicable, as the former would best be kept in a separate
# config or imported from huggingface.

p = argparse.ArgumentParser(
    description=__doc__, formatter_class=argparse.ArgumentDefaultsHelpFormatter
)

##############################################################################
# general options
##############################################################################

p.add_argument(
    "--hf_auth_token",
    type=str,
    help="The Hugging Face auth token, if required",
    default=None,
)
p.add_argument(
    "--model_id",
    type=str,
    help="Model ID as it appears in the model lists (see --model_lists), or 'all' for batch export",
    default="all",
)
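p.add_argument(
    "--model_lists",
    type=Path,
    nargs="*",
    help="Paths to one or more JSON lists of models to benchmark.",
    # argparse does not apply `type` to non-string defaults, so use Path here
    # to match the type of user-supplied values.
    default=[
        Path("torchbench_models.json"),
        Path("timm_models.json"),
        Path("torchvision_models.json"),
    ],
)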
p.add_argument(
    "--external_weights_dir",
    type=str,
    default="",
    help="Directory for external weight files. When importing, model weights are saved here; at runtime, they are loaded from here.",
)
p.add_argument(
    "--vmfbs_dir",
    type=str,
    default="",
    help="Directory containing compiled .vmfb modules.",
)
p.add_argument(
    "--benchmark",
    type=str,
    default=None,
    help="A comma-separated list of submodel IDs to report benchmarks for, or 'all' for all components.",
)
p.add_argument(
    "--save_outputs",
    type=str,
    default=None,
    help="A comma-separated list of submodel IDs to save output .npy files for, or 'all' for all components.",
)
p.add_argument("--compile_to", type=str, default="mlir", help="torch, linalg, vmfb")
p.add_argument(
    "--external_weights",
    type=str,
    default="irpa",
    choices=["safetensors", "irpa", "gguf", None],
    help="Externalizes model weights from the torch dialect IR and its successors",
)
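

def str_to_bool(value):
    """Parse a boolean CLI value; type=bool would treat "False" as True."""
    if value.lower() in ("1", "true", "yes", "y", "on"):
        return True
    if value.lower() in ("0", "false", "no", "n", "off"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean value, got {value!r}")


p.add_argument(
    "--run_benchmark",
    type=str_to_bool,
    default=True,
)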
p.add_argument(
    "--num_iters",
    type=int,
    default=10,
)
p.add_argument(
    "--output_csv",
    type=str,
    default="./benchmark_results.csv",
)

##############################################################################
# Modeling and Export Options
# These options are used to control model defining parameters.
# These are MLIR-changing variables! If you change them, you will need
# to import/download and recompile the model.
##############################################################################

p.add_argument("--batch_size", type=int, default=1, help="Batch size for inference")
p.add_argument(
    "--precision",
    type=str,
    default="fp16",
    help="Precision of model weights and graph.",
)
p.add_argument(
    "--decomp_attn",
    default=False,
    action="store_true",
    help="Decompose attention at fx graph level",
)

# See --external_weights (format) and --external_weights_dir (location) to control how and where the model weights are saved.

p.add_argument(
    "--compare_vs_torch",
    action="store_true",
    help="Runs both turbine vmfb and a torch model to compare results",
)
p.add_argument(
    "--input_mlir",
    type=str,
    default=None,
    help="Path to input mlir file to compile. Comma-separate paths to provide more than one input to pipelines.",
)


##############################################################################
# IREE Compiler Options
##############################################################################

p.add_argument(
    "--device",
    type=str,
    default="local-task",
    help="local-task, local-sync, vulkan://0, rocm://0, cuda://0, etc.",
)
p.add_argument(
    "--target",
    type=str,
    default="gfx942",
    help="Usually a rocm chip arch or llvmcpu target triple, e.g. gfx942 or x86_64-linux-gnu.",
)
p.add_argument("--ireec_flags", type=str, default="", help="extra iree-compile options")
p.add_argument(
    "--attn_spec",
    type=str,
    default=None,
    help="extra iree-compile options for models with sdpa ops.",
)


args, unknown = p.parse_known_args()