Closed
Description
With https://github.com/huggingface/transformers/releases/tag/v4.47.0, the Transformers gpt2, mt5, t5 and umt5 models don't support model parallelism when running on the PyTorch XPU backend (on a system with multiple GPU devices), as can be observed by running the Transformers tests - see logs below.
Can model parallelism be supported for XPU backend?
- For GPT2 model, gpt2: enable model_parallel for xpu backend #35253
- For MT5 model
- For T5 model
- For UMT5 model
$ cat spec.py
import torch
DEVICE_NAME = 'xpu'
MANUAL_SEED_FN = torch.xpu.manual_seed
EMPTY_CACHE_FN = torch.xpu.empty_cache
DEVICE_COUNT_FN = torch.xpu.device_count
$ TRANSFORMERS_TEST_DEVICE_SPEC=spec.py python3 -m pytest -rsf tests/models/ -k "test_model_parallelization or test_model_parallel_equal_results"
<...>
FAILED tests/models/gpt2/test_modeling_gpt2.py::GPT2ModelTest::test_model_parallel_equal_results - ZeroDivisionError: division by zero
FAILED tests/models/gpt2/test_modeling_gpt2.py::GPT2ModelTest::test_model_parallelization - AssertionError: Torch not compiled with CUDA enabled
FAILED tests/models/mt5/test_modeling_mt5.py::MT5ModelTest::test_model_parallel_equal_results - ZeroDivisionError: division by zero
FAILED tests/models/mt5/test_modeling_mt5.py::MT5ModelTest::test_model_parallelization - AssertionError: Torch not compiled with CUDA enabled
FAILED tests/models/mt5/test_modeling_mt5.py::MT5EncoderOnlyModelTest::test_model_parallel_equal_results - ZeroDivisionError: division by zero
FAILED tests/models/mt5/test_modeling_mt5.py::MT5EncoderOnlyModelTest::test_model_parallelization - AssertionError: Torch not compiled with CUDA enabled
FAILED tests/models/t5/test_modeling_t5.py::T5ModelTest::test_model_parallel_equal_results - ZeroDivisionError: division by zero
FAILED tests/models/t5/test_modeling_t5.py::T5ModelTest::test_model_parallelization - AssertionError: Torch not compiled with CUDA enabled
FAILED tests/models/t5/test_modeling_t5.py::T5EncoderOnlyModelTest::test_model_parallel_equal_results - ZeroDivisionError: division by zero
FAILED tests/models/t5/test_modeling_t5.py::T5EncoderOnlyModelTest::test_model_parallelization - AssertionError: Torch not compiled with CUDA enabled
FAILED tests/models/umt5/test_modeling_umt5.py::UMT5EncoderOnlyModelTest::test_model_parallel_equal_results - AttributeError: 'UMT5EncoderModel' object has no attribute 'parallelize'
FAILED tests/models/umt5/test_modeling_umt5.py::UMT5EncoderOnlyModelTest::test_model_parallelization - AssertionError: Torch not compiled with CUDA enabled
=============================== 12 failed, 682 skipped, 76163 deselected, 5 warnings in 24.79s ================================
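The `ZeroDivisionError` failures above look consistent with the layer-to-device split being sized from a CUDA device count, which is 0 on an XPU-only machine. The following is only a minimal sketch of a backend-agnostic split under that assumption, not the Transformers implementation: `build_device_map` and `device_name` are hypothetical helper names, and in a real setup the device list would presumably come from something like `range(torch.xpu.device_count())`, as in the spec file above.

```python
from math import ceil


def build_device_map(n_layers, device_ids):
    """Split layer indices 0..n_layers-1 into contiguous blocks, one per device.

    Hypothetical helper for illustration; it does not depend on any backend.
    """
    device_ids = list(device_ids)
    if not device_ids:
        # With an empty device list the block size below would be n_layers / 0,
        # so fail early with an explicit message instead.
        raise ValueError("no devices available for model parallelism")
    block = ceil(n_layers / len(device_ids))
    layers = list(range(n_layers))
    return {
        dev: layers[i * block:(i + 1) * block]
        for i, dev in enumerate(device_ids)
    }


def device_name(backend, index):
    """Build a device string such as 'xpu:0' rather than hard-coding 'cuda:0'."""
    return f"{backend}:{index}"


if __name__ == "__main__":
    # 12 layers over 2 devices of the backend named in the spec file.
    for dev, block_layers in build_device_map(12, range(2)).items():
        print(device_name("xpu", dev), block_layers)
```

Passing the device list in explicitly keeps the split logic independent of which backend reports the device count, so the same helper would serve CUDA and XPU alike.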