
[TorchAO] Int4XPULayout Lacks MoE Quant Support #1913

@Stonepia

🚀 The feature, motivation and pitch

Currently, int4_xpu_layout assumes that int_data is 2D, but in the MoE case int_data is 3D. Calling _convert_weight_to_int4pack on such a tensor therefore fails with an assertion error:

_convert_weight_to_int4pack_xpu : expect weight to be 2D tensor.
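
For illustration, one way to hit this assertion is to pass a stacked 3D MoE weight directly to the packing op. This is only a sketch: the dtype, shapes, and innerKTiles value below are assumptions and may need adjusting for the XPU kernel.

import torch

# Hypothetical repro: MoE weights stacked as (num_experts, n, k) instead of (n, k).
# The dtype and innerKTiles value are assumptions, not the exact XPU requirements.
int_data = torch.randint(0, 16, (8, 1024, 1024), dtype=torch.int32, device="xpu")
packed = torch.ops.aten._convert_weight_to_int4pack(int_data, 8)
# RuntimeError: _convert_weight_to_int4pack_xpu : expect weight to be 2D tensor.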

Solution

The relevant Int4XPULayout code is here:
https://github.com/pytorch/ao/blob/1493b15f65917477ce37abb94365b262cc3b1d95/torchao/dtypes/uintx/int4_xpu_layout.py#L255-L260

We need logic similar to the CUDA path, roughly:

def quant_2d(int_data_2d):
    # pack a single 2D weight tensor
    ...

if int_data.dim() == 3:  # MoE case: pack each expert's 2D slice
    ...
else:  # normal 2D case
    ...

See the CUDA code at:
https://github.com/pytorch/ao/blob/418593c0e903f2b76072cc75a3010b3ef5396a20/torchao/dtypes/uintx/tensor_core_tiled_layout.py#L289
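
A minimal sketch of that dispatch follows, assuming a helper named _pack_int4_weight and a pack_fn callable that wraps the 2D packing op (both names are hypothetical, introduced here only for illustration):

import torch

def _pack_int4_weight(int_data, inner_k_tiles, pack_fn):
    # pack_fn is assumed to pack a single 2D int4 weight tensor,
    # e.g. a wrapper around torch.ops.aten._convert_weight_to_int4pack.
    def quant_2d(int_data_2d):
        return pack_fn(int_data_2d, inner_k_tiles)

    if int_data.dim() == 3:  # MoE case: (num_experts, n, k)
        # Pack each expert's 2D weight and re-stack along the expert dimension.
        return torch.stack(
            [quant_2d(int_data[i]) for i in range(int_data.shape[0])], dim=0
        )
    else:  # normal 2D case
        return quant_2d(int_data)

This follows the quant_2d / dim() == 3 structure sketched above; scales and zero points would likely need the same per-expert handling.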

Affected Test Cases

Four test cases are affected in total:

cd ao/test/quantization
python -m pytest -sv -k test_int4wo test_moe_quant.py

FAILED test_moe_quant.py::TestMoEQuantCompile::test_int4wo_base_0_single_token
FAILED test_moe_quant.py::TestMoEQuantCompile::test_int4wo_base_1_multiple_tokens
FAILED test_moe_quant.py::TestMoEQuantCompile::test_int4wo_fake_dim_0_single_token
FAILED test_moe_quant.py::TestMoEQuantCompile::test_int4wo_fake_dim_1_multiple_tokens
