
xpu: parallelize() not supported for PyTorch XPU backend #35252

Closed

@dvrogozh

Description

Observed with https://github.com/huggingface/transformers/releases/tag/v4.47.0.

The Transformers gpt2, mt5, t5 and umt5 models don't support model parallelism when running on the PyTorch XPU backend (on a host with multiple GPU devices), as can be observed by running the Transformers tests - see the logs below.

Can model parallelism be supported for the XPU backend?
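
For reference, the failure can be reproduced outside the test suite with the legacy parallelize() API (a sketch, assuming an XPU-only host; the model choice is arbitrary):

import torch
from transformers import GPT2LMHeadModel

model = GPT2LMHeadModel.from_pretrained("gpt2")
# parallelize() builds its default device map from torch.cuda.device_count(),
# so on a host without CUDA it either divides by zero while splitting layers
# or trips the "Torch not compiled with CUDA enabled" assertion.
model.parallelize()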

$ cat spec.py
# Device spec pointing the Transformers test suite at the XPU backend.
import torch
DEVICE_NAME = 'xpu'
MANUAL_SEED_FN = torch.xpu.manual_seed
EMPTY_CACHE_FN = torch.xpu.empty_cache
DEVICE_COUNT_FN = torch.xpu.device_count

$ TRANSFORMERS_TEST_DEVICE_SPEC=spec.py python3 -m pytest -rsf tests/models/ -k "test_model_parallelization or test_model_parallel_equal_results"
<...>
FAILED tests/models/gpt2/test_modeling_gpt2.py::GPT2ModelTest::test_model_parallel_equal_results - ZeroDivisionError: division by zero
FAILED tests/models/gpt2/test_modeling_gpt2.py::GPT2ModelTest::test_model_parallelization - AssertionError: Torch not compiled with CUDA enabled
FAILED tests/models/mt5/test_modeling_mt5.py::MT5ModelTest::test_model_parallel_equal_results - ZeroDivisionError: division by zero
FAILED tests/models/mt5/test_modeling_mt5.py::MT5ModelTest::test_model_parallelization - AssertionError: Torch not compiled with CUDA enabled
FAILED tests/models/mt5/test_modeling_mt5.py::MT5EncoderOnlyModelTest::test_model_parallel_equal_results - ZeroDivisionError: division by zero
FAILED tests/models/mt5/test_modeling_mt5.py::MT5EncoderOnlyModelTest::test_model_parallelization - AssertionError: Torch not compiled with CUDA enabled
FAILED tests/models/t5/test_modeling_t5.py::T5ModelTest::test_model_parallel_equal_results - ZeroDivisionError: division by zero
FAILED tests/models/t5/test_modeling_t5.py::T5ModelTest::test_model_parallelization - AssertionError: Torch not compiled with CUDA enabled
FAILED tests/models/t5/test_modeling_t5.py::T5EncoderOnlyModelTest::test_model_parallel_equal_results - ZeroDivisionError: division by zero
FAILED tests/models/t5/test_modeling_t5.py::T5EncoderOnlyModelTest::test_model_parallelization - AssertionError: Torch not compiled with CUDA enabled
FAILED tests/models/umt5/test_modeling_umt5.py::UMT5EncoderOnlyModelTest::test_model_parallel_equal_results - AttributeError: 'UMT5EncoderModel' object has no attribute 'parallelize'
FAILED tests/models/umt5/test_modeling_umt5.py::UMT5EncoderOnlyModelTest::test_model_parallelization - AssertionError: Torch not compiled with CUDA enabled
=============================== 12 failed, 682 skipped, 76163 deselected, 5 warnings in 24.79s ================================
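
Both failure modes seem to share a root cause: the legacy parallelize() path hardcodes torch.cuda. The ZeroDivisionError comes from spreading the layer count across range(torch.cuda.device_count()), which is empty on an XPU-only host, and the AssertionError comes from the subsequent torch.cuda calls. A device-agnostic device map could instead resolve the backend from the device type. A minimal sketch below; make_device_map and its signature are hypothetical, not existing Transformers API:

from math import ceil
import torch

def make_device_map(n_layers, device_type="xpu"):
    # Resolve the backend module (torch.cuda, torch.xpu, ...) by name
    # instead of hardcoding torch.cuda.
    backend = getattr(torch, device_type)
    n_devices = backend.device_count()
    if n_devices == 0:
        raise RuntimeError(f"no '{device_type}' devices are visible")
    # Spread layers evenly across devices, following the same blocking
    # scheme as transformers.utils.model_parallel_utils.get_device_map().
    block = ceil(n_layers / n_devices)
    layers = list(range(n_layers))
    return {d: layers[d * block:(d + 1) * block] for d in range(n_devices)}

A map built this way could be passed explicitly as model.parallelize(device_map=...), though the layers would still be moved to 'cuda:<n>' devices unless the internal .to() calls are made device-type aware as well.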

CC: @ArthurZucker @SunMarc
