Summary of Contributions (9th Feb)
- Improve the number of models in TorchBench that work with Dynamo as a tracer: These passing rates are now comparable to those from torch.compile using Inductor. Some of the fixes also improved the tracer that PyTorch/XLA previously used.

| | Inference | Training |
|---|---|---|
| Inductor | 87 | 63 |
| Dynamo | 60 to 82 | 41 to 53 |
| Non-Dynamo | 79 to 82 | 54 to 56 |

- Improve the benchmarking tools used by Google: The initial Google runs benchmarking these models showed a discrepancy of about 15 models with the results reported. We identified and fixed 10+ issues that helped reconcile Google's benchmarks with those reported and, in turn, with the PyTorch HUD.
Current State
This post has two lists:
- Failing inference models
- Failing training models
Each list shows the models that fail under two tracing modes:
- Tracing without Dynamo (Eager-mode)
- Tracing with Dynamo into openxla (Dynamo+`openxla`)
These lists were created using the benchmarking scripts currently in the upstream repository. The following command was executed:
python xla/benchmarks/experiment_runner.py \
--suite-name torchbench \
--accelerator cuda \
--xla PJRT --xla None \
--dynamo openxla --dynamo inductor --dynamo None \
--test eval --test train \
--repeat 30 --iterations-per-run 5 \
--print-subprocess \
--no-resume
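The repeated flags in the command above each contribute one value per occurrence. As a rough illustration only, the sketch below assumes that every combination of those values is a candidate configuration; the real `experiment_runner.py` may well skip combinations it considers invalid, and the helper names `build_parser` and `expand_configs` are hypothetical, not part of the script.

```python
import argparse
import itertools

# Hypothetical parser: it mirrors only the repeated flags from the command
# above and is NOT the actual argument parser of experiment_runner.py.
AXES = ("accelerator", "xla", "dynamo", "test")


def build_parser():
    parser = argparse.ArgumentParser()
    for axis in AXES:
        # action="append" collects every occurrence of a flag into a list.
        parser.add_argument(f"--{axis}", action="append", default=None)
    return parser


def expand_configs(argv):
    """Return the plain cross product of all flag values (an assumption)."""
    args = build_parser().parse_args(argv)
    values = [getattr(args, axis) or [] for axis in AXES]
    return [dict(zip(AXES, combo)) for combo in itertools.product(*values)]


argv = [
    "--accelerator", "cuda",
    "--xla", "PJRT", "--xla", "None",
    "--dynamo", "openxla", "--dynamo", "inductor", "--dynamo", "None",
    "--test", "eval", "--test", "train",
]
configs = expand_configs(argv)
print(len(configs), "candidate configurations")
```

Under that assumption the command spans at most 1 x 2 x 3 x 2 candidate configurations, of which the lists in this post cover the non-Dynamo and Dynamo+`openxla` ones, with inductor as the reference.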
Environment
- GPU: A100 40GB
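The pass rates in this post are written as `passed/total - percent%`. The percentages appear to be truncated rather than rounded (64/66 is about 96.97%, yet it is reported as 96%), so a minimal sketch that reproduces the reported figures, under that assumption, uses integer floor division. The function names are illustrative only.

```python
def pass_rate_percent(passed, total):
    """Percentage of passing models, truncated toward zero (assumed convention)."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= passed <= total:
        raise ValueError("passed must be between 0 and total")
    return (passed * 100) // total


def format_pass_rate(passed, total):
    """Render a pass rate in the 'passed/total - percent%' form used in this post."""
    return f"{passed}/{total} - {pass_rate_percent(passed, total)}%"


# The three distinct figures reported in this post.
for passed, total in [(78, 81), (64, 66), (55, 66)]:
    print(format_pass_rate(passed, total))
```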
Inference
Non-Dynamo. Pass rate: 78/81 - 96% (against inductor)
- [x] DALLE2_pytorch
- Issue: [torchbench] Moving models from CUDA to XLA raise segmentation fault. #6010
- PyTorch/XLA PR: Move CUDA tensors to CPU before moving to XLA. #6060
- PyTorch/XLA PR: Move 0-dimensional tensors to CPU before copying to XLA. #6071
- Moved to canary models: Upgrade numpy to 2.0.0rc benchmark#2311
- cm3leon_generate
- hf_Longformer
- hf_T5_generate
- moco
- Issue: [torchbench] `moco` fails to run. #6083
- Issue: [torchbench] Models require initialization on CUDA device. #6011
- PyTorch/XLA PR: Re-land: Fix model initialization. #6296
- PyTorch/XLA PR: Fix model initialization. #6076
- Issue: [torchbench] Moving models from CUDA to XLA raise segmentation fault. #6010
- PyTorch/XLA PR: Move CUDA tensors to CPU before moving to XLA. #6060
- PyTorch/XLA PR: Move 0-dimensional tensors to CPU before copying to XLA. #6071
- Issue: [torchbench] `moco` inference fails to run on dynamo. #7636
- Issue: [torchbench] `moco` fails to run with CUDA OpenXLA fallback. #7647
- nvidia_deeprecommender
- Issue: [torchbench] `nvidia_deeprecommender` fails to run. #6006
- PyTorch/XLA PR: Re-land: Fix model initialization. #6296
- PyTorch/XLA PR: Fix model initialization. #6076
- pytorch_CycleGAN_and_pix2pix
- Issue: [torchbench] `pytorch_CycleGAN_and_pix2pix` fails to run. #6007
- PyTorch/XLA PR: Re-land: Fix model initialization. #6296
- PyTorch/XLA PR: Fix model initialization. #6076
- [ ] simple_gpt
- RTX 2060 doesn't support BF16
- Issue: [torchbench] Models require initialization on CUDA device. #6011
- PyTorch/XLA PR: Re-land: Fix model initialization. #6296
- PyTorch/XLA PR: Fix model initialization. #6076
- Issue: [torchbench] Moving models from CUDA to XLA raise segmentation fault. #6010
- PyTorch/XLA PR: Move CUDA tensors to CPU before moving to XLA. #6060
- PyTorch/XLA PR: Move 0-dimensional tensors to CPU before copying to XLA. #6071
- SKIP (only work with multiprocess enabled -- torchbench.yaml)
- [ ] simple_gpt_tp_manual
- RTX 2060 doesn't support BF16
- Issue: [torchbench] Models require initialization on CUDA device. #6011
- PyTorch/XLA PR: Re-land: Fix model initialization. #6296
- PyTorch/XLA PR: Fix model initialization. #6076
- Issue: [torchbench] Moving models from CUDA to XLA raise segmentation fault. #6010
- PyTorch/XLA PR: Move CUDA tensors to CPU before moving to XLA. #6060
- PyTorch/XLA PR: Move 0-dimensional tensors to CPU before copying to XLA. #6071
- SKIP (inside skip list -- torchbench.yaml)
- [ ] tacotron2
- Issue: [torchbench] `tacotron2` fails to run in eager-mode. #6112
- Issue: [torchbench] Moving models from CUDA to XLA raise segmentation fault. #6010
- PyTorch/XLA PR: Move CUDA tensors to CPU before moving to XLA. #6060
- PyTorch/XLA PR: Move 0-dimensional tensors to CPU before copying to XLA. #6071
- SKIP (inside skip list -- torchbench.yaml)
- timm_efficientdet
- Issue: [torchbench] Models require initialization on CUDA device. #6011
- PyTorch/XLA PR: Re-land: Fix model initialization. #6296
- PyTorch/XLA PR: Fix model initialization. #6076
- Issue: [torchbench] Moving models from CUDA to XLA raise segmentation fault. #6010
- PyTorch/XLA PR: Move CUDA tensors to CPU before moving to XLA. #6060
- PyTorch/XLA PR: Move 0-dimensional tensors to CPU before copying to XLA. #6071
- vision_maskrcnn
- PyTorch/XLA PR: Fix XLA tensor storage device by using `XlaDeviceToAtenDevice`. #5743
- PyTorch PR: Skip aliasing correction for `lift_fresh`. pytorch#112202
- Issue: [torchbench] `vision_maskrcnn` failing on inference with dynamo after `bfloat16` conversion. #6557
- PyTorch/XLA PR: `index`: fix index of 0-element tensor by 0-element tensor. #7113
Dynamo+`openxla`. Pass rate: 78/81 - 96% (against inductor)
- [x] DALLE2_pytorch
- Issue: [torchbench] Moving models from CUDA to XLA raise segmentation fault. #6010
- PyTorch/XLA PR: Move CUDA tensors to CPU before moving to XLA. #6060
- PyTorch/XLA PR: Move 0-dimensional tensors to CPU before copying to XLA. #6071
- Moved to canary models: Upgrade numpy to 2.0.0rc benchmark#2311
- Super_SloMo
- PyTorch/XLA PR: Add support for `_unsafe_index`. #5707
- PyTorch/benchmark PR: Register grid buffers for Super_SloMo. benchmark#2038
- cm3leon_generate
- detectron2_fasterrcnn_r_101_c4
- detectron2_fasterrcnn_r_101_dc5
- detectron2_fasterrcnn_r_101_fpn
- detectron2_fasterrcnn_r_50_c4
- detectron2_fasterrcnn_r_50_dc5
- detectron2_fasterrcnn_r_50_fpn
- detectron2_fcos_r_50_fpn
- detectron2_maskrcnn
- detectron2_maskrcnn_r_101_c4
- detectron2_maskrcnn_r_101_fpn
- detectron2_maskrcnn_r_50_c4
- detectron2_maskrcnn_r_50_fpn
- dlrm
- hf_BigBird
- hf_GPT2
- PyTorch/XLA PR: [dynamo] Move CPU tensor constructor nodes to XLA. #5922
- hf_GPT2_large
- PyTorch/XLA PR: [dynamo] Move CPU tensor constructor nodes to XLA. #5922
- hf_Longformer
- hf_Reformer
- hf_T5_generate
- moco
- Issue: [torchbench] `moco` fails to run. #6083
- Issue: [torchbench] Models require initialization on CUDA device. #6011
- PyTorch/XLA PR: Re-land: Fix model initialization. #6296
- PyTorch/XLA PR: Fix model initialization. #6076
- Issue: [torchbench] Moving models from CUDA to XLA raise segmentation fault. #6010
- PyTorch/XLA PR: Move CUDA tensors to CPU before moving to XLA. #6060
- PyTorch/XLA PR: Move 0-dimensional tensors to CPU before copying to XLA. #6071
- Issue: [torchbench] `moco` inference fails to run on dynamo. #7636
- Issue: [torchbench] `moco` fails to run with CUDA OpenXLA fallback. #7647
- nvidia_deeprecommender
- Issue: [torchbench] `nvidia_deeprecommender` fails to run. #6006
- PyTorch/XLA PR: Re-land: Fix model initialization. #6296
- PyTorch/XLA PR: Fix model initialization. #6076
- pyhpc_isoneutral_mixing
- pyhpc_turbulent_kinetic_energy
- pytorch_CycleGAN_and_pix2pix
- Issue: [torchbench] `pytorch_CycleGAN_and_pix2pix` fails to run. #6007
- PyTorch/XLA PR: Re-land: Fix model initialization. #6296
- PyTorch/XLA PR: Fix model initialization. #6076
- speech_transformer
- PyTorch/XLA PR: Sync `xla_args` before computation. #5823
- timm_efficientdet
- Issue: [torchbench] Models require initialization on CUDA device. #6011
- PyTorch/XLA PR: Re-land: Fix model initialization. #6296
- PyTorch/XLA PR: Fix model initialization. #6076
- Issue: [torchbench] Moving models from CUDA to XLA raise segmentation fault. #6010
- PyTorch/XLA PR: Move CUDA tensors to CPU before moving to XLA. #6060
- PyTorch/XLA PR: Move 0-dimensional tensors to CPU before copying to XLA. #6071
Models also Failing on Inductor
Inference Failing on Inductor CUDA with the Same Error
Benchmarks that raise the same error on inductor:
- hf_clip
- 'str' object has no attribute 'shape'
- mobilenet_v2_quantized_qat
- resnet50_quantized_qat
Inference Failing on Inductor CUDA with Different Errors
- simple_gpt
- RTX 2060 doesn't support BF16
- Issue: [torchbench] Models require initialization on CUDA device. #6011
- PyTorch/XLA PR: Re-land: Fix model initialization. #6296
- PyTorch/XLA PR: Fix model initialization. #6076
- Issue: [torchbench] Moving models from CUDA to XLA raise segmentation fault. #6010
- PyTorch/XLA PR: Move CUDA tensors to CPU before moving to XLA. #6060
- PyTorch/XLA PR: Move 0-dimensional tensors to CPU before copying to XLA. #6071
- SKIP (only work with multiprocess enabled -- torchbench.yaml)
- simple_gpt_tp_manual
- RTX 2060 doesn't support BF16
- Issue: [torchbench] Models require initialization on CUDA device. #6011
- PyTorch/XLA PR: Re-land: Fix model initialization. #6296
- PyTorch/XLA PR: Fix model initialization. #6076
- Issue: [torchbench] Moving models from CUDA to XLA raise segmentation fault. #6010
- PyTorch/XLA PR: Move CUDA tensors to CPU before moving to XLA. #6060
- PyTorch/XLA PR: Move 0-dimensional tensors to CPU before copying to XLA. #6071
- SKIP (inside skip list -- torchbench.yaml)
- tacotron2
- Issue: [torchbench] Check failed: xtensor #6005
- Issue: [torchbench] Moving models from CUDA to XLA raise segmentation fault. #6010
- PyTorch/XLA PR: Move CUDA tensors to CPU before moving to XLA. #6060
- PyTorch/XLA PR: Move 0-dimensional tensors to CPU before copying to XLA. #6071
- SKIP (inside skip list -- torchbench.yaml)
Training
Non-Dynamo. Pass rate: 64/66 - 96% (against inductor)
- [ ] DALLE2_pytorch
- Issue: [torchbench] Training benchmarks failing with: tensor does not require grad #6084
- Issue: [torchbench] Moving models from CUDA to XLA raise segmentation fault. #6010
- PyTorch/XLA PR: Move CUDA tensors to CPU before moving to XLA. #6060
- PyTorch/XLA PR: Move 0-dimensional tensors to CPU before copying to XLA. #6071
- Moved to canary models: Upgrade numpy to 2.0.0rc benchmark#2311
- demucs
- densenet121
- detectron2_fasterrcnn_r_101_c4
- detectron2_fasterrcnn_r_101_dc5
- detectron2_fasterrcnn_r_101_fpn
- detectron2_fasterrcnn_r_50_c4
- detectron2_fasterrcnn_r_50_dc5
- detectron2_fasterrcnn_r_50_fpn
- detectron2_fcos_r_50_fpn
- Skipped by the benchmarking script
- detectron2_maskrcnn_r_101_c4
- detectron2_maskrcnn_r_101_fpn
- detectron2_maskrcnn_r_50_c4
- detectron2_maskrcnn_r_50_fpn
- dlrm
- hf_GPT2_large
- hf_Longformer
- hf_T5_base
- [ ] llama_v2_7b_16h
- Issue: [torchbench] Training benchmarks failing with: OOM #6003
- SKIP (training not supported -- torchbench.yaml)
- moco
- Issue: [torchbench] `moco` fails to run. #6083
- Issue: [torchbench] Models require initialization on CUDA device. #6011
- PyTorch/XLA PR: Re-land: Fix model initialization. #6296
- PyTorch/XLA PR: Fix model initialization. #6076
- Issue: [torchbench] Moving models from CUDA to XLA raise segmentation fault. #6010
- PyTorch/XLA PR: Move CUDA tensors to CPU before moving to XLA. #6060
- PyTorch/XLA PR: Move 0-dimensional tensors to CPU before copying to XLA. #6071
- Issue: [torchbench] `moco` inference fails to run on dynamo. #7636
- Issue: [torchbench] `moco` fails to run with CUDA OpenXLA fallback. #7647
- nvidia_deeprecommender
- RTX 2060 OOM
- Issue: [torchbench] `nvidia_deeprecommender` fails to run. #6006
- PyTorch/XLA PR: Re-land: Fix model initialization. #6296
- PyTorch/XLA PR: Fix model initialization. #6076
- pytorch_CycleGAN_and_pix2pix
- Issue: [torchbench] `pytorch_CycleGAN_and_pix2pix` fails to run. #6007
- PyTorch/XLA PR: Re-land: Fix model initialization. #6296
- PyTorch/XLA PR: Fix model initialization. #6076
- stable_diffusion_unet
- [ ] tacotron2
- Issue: [torchbench] `tacotron2` fails to run in eager-mode. #6112
- Issue: [torchbench] Moving models from CUDA to XLA raise segmentation fault. #6010
- PyTorch/XLA PR: Move CUDA tensors to CPU before moving to XLA. #6060
- PyTorch/XLA PR: Move 0-dimensional tensors to CPU before copying to XLA. #6071
- SKIP (inside skip list -- torchbench.yaml)
- timm_efficientdet
- Issue: [torchbench] Models require initialization on CUDA device. #6011
- PyTorch/XLA PR: Re-land: Fix model initialization. #6296
- PyTorch/XLA PR: Fix model initialization. #6076
- Issue: [torchbench] Moving models from CUDA to XLA raise segmentation fault. #6010
- PyTorch/XLA PR: Move CUDA tensors to CPU before moving to XLA. #6060
- PyTorch/XLA PR: Move 0-dimensional tensors to CPU before copying to XLA. #6071
- timm_nfnet
- timm_vision_transformer_large
- yolov3
Dynamo+`openxla`. Pass rate: 55/66 - 83% (against inductor)
- demucs
- densenet121
- dlrm
- hf_BigBird
- hf_GPT2
- PyTorch/XLA PR: [dynamo] Move CPU tensor constructor nodes to XLA. #5922
- hf_GPT2_large
- PyTorch/XLA PR: [dynamo] Move CPU tensor constructor nodes to XLA. #5922
- hf_Longformer
- hf_Reformer
- moco
- Issue: [torchbench] `moco` fails to run. #6083
- Issue: [torchbench] Models require initialization on CUDA device. #6011
- PyTorch/XLA PR: Re-land: Fix model initialization. #6296
- PyTorch/XLA PR: Fix model initialization. #6076
- Issue: [torchbench] Moving models from CUDA to XLA raise segmentation fault. #6010
- PyTorch/XLA PR: Move CUDA tensors to CPU before moving to XLA. #6060
- PyTorch/XLA PR: Move 0-dimensional tensors to CPU before copying to XLA. #6071
- Issue: [torchbench] `moco` inference fails to run on dynamo. #7636
- Issue: [torchbench] `moco` fails to run with CUDA OpenXLA fallback. #7647
- nvidia_deeprecommender
- pytorch_CycleGAN_and_pix2pix
- Issue: [torchbench] `pytorch_CycleGAN_and_pix2pix` fails to run. #6007
- PyTorch/XLA PR: Re-land: Fix model initialization. #6296
- PyTorch/XLA PR: Fix model initialization. #6076
- stable_diffusion_unet
- timm_efficientdet
- Issue: [torchbench] Training benchmarks failing with: OOM #6003
- Issue: [torchbench] Models require initialization on CUDA device. #6011
- PyTorch/XLA PR: Re-land: Fix model initialization. #6296
- PyTorch/XLA PR: Fix model initialization. #6076
- Issue: [torchbench] Moving models from CUDA to XLA raise segmentation fault. #6010
- PyTorch/XLA PR: Move CUDA tensors to CPU before moving to XLA. #6060
- PyTorch/XLA PR: Move 0-dimensional tensors to CPU before copying to XLA. #6071
- timm_vision_transformer
- timm_vision_transformer_large
- torch_multimodal_clip
- yolov3
Models also Failing on Inductor
No Training Support on Inductor CUDA
Benchmarks that raise the error: `Model's DEFAULT_TRAIN_BSIZE is not implemented`.
- cm3leon_generate
- detectron2_fcos_r_50_fpn
- doctr_det_predictor
- doctr_reco_predictor
- hf_T5_generate
- llama
- phi_1_5
- pyhpc_equation_of_state
- pyhpc_isoneutral_mixing
- pyhpc_turbulent_kinetic_energy
- sam
- simple_gpt
- simple_gpt_tp_manual
Training Failing on Inductor CUDA with the Same Error
Benchmarks that raise the same error on inductor:
- DALLE2_pytorch
- Issue: [torchbench] Training benchmarks failing with: tensor does not require grad #6084
- Issue: [torchbench] Moving models from CUDA to XLA raise segmentation fault. #6010
- PyTorch/XLA PR: Move CUDA tensors to CPU before moving to XLA. #6060
- PyTorch/XLA PR: Move 0-dimensional tensors to CPU before copying to XLA. #6071
- Moved to canary models: Upgrade numpy to 2.0.0rc benchmark#2311
- llama_v2_7b_16h
- Issue: [torchbench] Training benchmarks failing with: OOM #6003
- SKIP (training not supported -- torchbench.yaml)
- maml
- Issue: [torchbench] Training benchmarks failing with: tensor does not require grad #6084
- SKIP (training not supported -- torchbench.yaml)
- vision_maskrcnn
- targets should not be none when in training mode
- Fix Decomposition for upsample_linear{1d, 3d} pytorch#114774
Training Failing on Inductor CUDA with Different Errors
- detectron2_fasterrcnn_r_101_c4
- detectron2_fasterrcnn_r_101_dc5
- detectron2_fasterrcnn_r_101_fpn
- detectron2_fasterrcnn_r_50_c4
- detectron2_fasterrcnn_r_50_dc5
- detectron2_fasterrcnn_r_50_fpn
- detectron2_maskrcnn
- detectron2_maskrcnn_r_101_c4
- detectron2_maskrcnn_r_101_fpn
- detectron2_maskrcnn_r_50_c4
- detectron2_maskrcnn_r_50_fpn
- opacus_cifar10