Add support for float8 activation for Int4GroupwisePreshuffleTensor #2437


Open
wants to merge 1 commit into base: jerryzh168/stack/2

Conversation

jerryzh168
Contributor

@jerryzh168 jerryzh168 commented Jun 24, 2025

Stacked PRs:


Add support for float8 activation for Int4GroupwisePreshuffleTensor

Summary:
Added basic op support (linear and bmm). Both the float8 and bf16 activation variants are handled by the same Tensor subclass, since the weight dtype is the same in both cases; the only difference is whether the activation is quantized. There are, however, some differences in implementation:

bf16 activation:

  • group_scale
  • group_zero

fp8 activation:

  • group_scale
  • row_scale
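The role of row_scale in the fp8 path can be illustrated with a small sketch. This is a hedged stand-in, not the fbgemm kernel: it assumes the fp8 path performs dynamic per-row activation quantization, mapping each row's absmax to the float8 e4m3 maximum to produce the per-row scale.

```python
# Hypothetical illustration of the fp8 activation path's row_scale.
# Assumption: dynamic per-row quantization targeting the float8 e4m3 range.
F8_E4M3_MAX = 448.0  # largest finite float8 e4m3 magnitude


def quantize_rowwise_fp8(rows):
    """Scale each activation row so its absmax maps to F8_E4M3_MAX.

    Returns (scaled_rows, row_scales); row_scales are the per-row
    dequantization multipliers kept alongside the int4 weight's group_scale.
    """
    scaled_rows, row_scales = [], []
    for row in rows:
        amax = max(abs(v) for v in row) or 1.0  # guard all-zero rows
        scale = amax / F8_E4M3_MAX
        scaled_rows.append([v / scale for v in row])
        row_scales.append(scale)
    return scaled_rows, row_scales


q, s = quantize_rowwise_fp8([[0.5, -2.0, 1.0], [4.0, 0.25, -8.0]])
# each scaled row now spans the fp8 range; s holds the per-row scales
```

In the bf16 path no such row_scale exists; the weight instead carries a group_zero for asymmetric int4 dequantization.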

Test Plan:
python test/dtypes/test_float8_activation_int4_groupwise_preshuffle.py

Reviewers:

Subscribers:

Tasks:

Tags:


pytorch-bot bot commented Jun 24, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2437

Note: Links to docs will display an error until the docs builds have been completed.

❌ 12 New Failures, 1 Cancelled Job

As of commit cc359e6 with merge base 5a50667:

NEW FAILURES - The following jobs have failed:

CANCELLED JOB - The following job was cancelled. Please retry:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

jerryzh168 added a commit that referenced this pull request Jun 24, 2025
Summary:
Note: slice is not working yet, others are working

Test Plan:
python test/dtypes/test_float8_activation_int4_groupwise_preshuffle.py

Reviewers:

Subscribers:

Tasks:

Tags:

stack-info: PR: #2437, branch: jerryzh168/stack/4
@jerryzh168 jerryzh168 force-pushed the jerryzh168/stack/4 branch from d6d3477 to 26517e8 on June 24, 2025 22:25
@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jun 24, 2025
@jerryzh168 jerryzh168 changed the base branch from jerryzh168/stack/2 to main June 24, 2025 22:26
jerryzh168 added a commit that referenced this pull request Jun 24, 2025
@jerryzh168 jerryzh168 force-pushed the jerryzh168/stack/4 branch from 26517e8 to d187f78 on June 24, 2025 22:26
@jerryzh168 jerryzh168 changed the base branch from main to jerryzh168/stack/2 June 24, 2025 22:26
@jerryzh168 jerryzh168 changed the base branch from jerryzh168/stack/2 to main June 24, 2025 22:28
@jerryzh168 jerryzh168 force-pushed the jerryzh168/stack/4 branch from d187f78 to 2fcff42 on June 24, 2025 22:28
jerryzh168 added a commit that referenced this pull request Jun 24, 2025
@jerryzh168 jerryzh168 changed the base branch from main to jerryzh168/stack/2 June 24, 2025 22:28
@jerryzh168 jerryzh168 changed the base branch from jerryzh168/stack/2 to main June 26, 2025 05:03
jerryzh168 added a commit that referenced this pull request Jun 26, 2025
@jerryzh168 jerryzh168 force-pushed the jerryzh168/stack/4 branch from 2fcff42 to 95856ed on June 26, 2025 05:03
@jerryzh168 jerryzh168 changed the base branch from main to jerryzh168/stack/2 June 26, 2025 05:03
@jerryzh168 jerryzh168 changed the base branch from jerryzh168/stack/2 to main June 27, 2025 19:36
jerryzh168 added a commit that referenced this pull request Jun 27, 2025
@jerryzh168 jerryzh168 force-pushed the jerryzh168/stack/4 branch from 95856ed to 1dec2cb on June 27, 2025 19:37
@jerryzh168 jerryzh168 changed the base branch from main to jerryzh168/stack/2 June 27, 2025 19:37
@jerryzh168 jerryzh168 changed the base branch from jerryzh168/stack/2 to main June 27, 2025 19:38
@jerryzh168 jerryzh168 force-pushed the jerryzh168/stack/4 branch from 1dec2cb to 1645c79 on June 27, 2025 19:38
jerryzh168 added a commit that referenced this pull request Jun 27, 2025
@jerryzh168 jerryzh168 changed the base branch from main to jerryzh168/stack/2 June 27, 2025 19:38
@jerryzh168 jerryzh168 changed the base branch from jerryzh168/stack/2 to main June 27, 2025 19:48
jerryzh168 added a commit that referenced this pull request Jun 27, 2025
@jerryzh168 jerryzh168 force-pushed the jerryzh168/stack/4 branch from 1645c79 to 5e9e869 on June 27, 2025 19:48
@jerryzh168 jerryzh168 changed the base branch from main to jerryzh168/stack/2 June 27, 2025 19:48
jerryzh168 added a commit that referenced this pull request Jun 27, 2025
@jerryzh168 jerryzh168 force-pushed the jerryzh168/stack/4 branch from 43dc85e to 040375e on June 27, 2025 20:09
@jerryzh168 jerryzh168 changed the base branch from main to jerryzh168/stack/2 June 27, 2025 20:09
@jerryzh168 jerryzh168 added topic: improvement Use this tag if this PR is an improvement (doesn't fit into any of the other categories) topic: new feature Use this tag if this PR adds a new feature and removed topic: improvement Use this tag if this PR is an improvement (doesn't fit into any of the other categories) labels Jun 28, 2025
@jerryzh168 jerryzh168 changed the base branch from jerryzh168/stack/2 to main June 30, 2025 21:26
jerryzh168 added a commit that referenced this pull request Jun 30, 2025
@jerryzh168 jerryzh168 force-pushed the jerryzh168/stack/4 branch from 040375e to fb2686e on June 30, 2025 21:26
@jerryzh168 jerryzh168 changed the base branch from main to jerryzh168/stack/2 June 30, 2025 21:26
@jerryzh168 jerryzh168 changed the base branch from jerryzh168/stack/2 to main June 30, 2025 23:01
@jerryzh168 jerryzh168 force-pushed the jerryzh168/stack/4 branch from fb2686e to 84bae22 on June 30, 2025 23:01
@jerryzh168 jerryzh168 changed the title Add support for Float8ActivationInt4GroupwisePreshuffleTensor for fbgemm Add support for float8 activation for Int4GroupwisePreshuffleTensor Jun 30, 2025
@jerryzh168 jerryzh168 changed the base branch from main to jerryzh168/stack/2 June 30, 2025 23:01
@jerryzh168 jerryzh168 mentioned this pull request Jun 30, 2025
@jerryzh168 jerryzh168 changed the base branch from jerryzh168/stack/2 to main July 2, 2025 01:58
@jerryzh168 jerryzh168 force-pushed the jerryzh168/stack/4 branch from 84bae22 to c13fa2b on July 2, 2025 01:58
@jerryzh168 jerryzh168 changed the base branch from main to jerryzh168/stack/2 July 2, 2025 01:58
@@ -0,0 +1,166 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
Contributor

should this test be in the test/quantization/quantize_ folder?

Contributor Author

yeah, the test file to review is test_int4_groupwise_preshuffle.py, I forgot to remove this file

@unittest.skipIf(not is_sm_at_least_90(), "Need sm90+")
class TestInt4GroupwisePreshuffleTensor(TestCase):
def setUp(self):
self.config = FbgemmConfig(
Contributor

or, maybe organize tests by the config? it would make sense for tests for everything in FbgemmConfig to be together.

Contributor Author

was planning to remove FbgemmConfig as well soon

quantized = linear(input)
self.assertTrue(compute_error(original, quantized) > 20)

# @unittest.skip("WIP: this doesn't work yet")
Contributor

remove
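The `compute_error(original, quantized) > 20` assertions in these test excerpts are SQNR checks. A minimal stand-in, assuming `compute_error` returns 20·log10(‖ref‖ / ‖ref − quantized‖) in decibels (so > 20 dB means the quantization error norm is under one tenth of the signal norm):

```python
import math


def compute_error_sketch(ref, quantized):
    """Hedged stand-in for torchao's compute_error (SQNR in dB)."""
    signal = math.sqrt(sum(r * r for r in ref))
    noise = math.sqrt(sum((r - q) ** 2 for r, q in zip(ref, quantized)))
    return float("inf") if noise == 0 else 20 * math.log10(signal / noise)
```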

self.assertTrue(compute_error(original, quantized) > 20)

# @unittest.skip("WIP: this doesn't work yet")
def test_slice(self):
Contributor

is there an integration test we can write to cover this? If this is needed for TP, maybe just have an integration test for TP which is easy to run for all of these configs?

Contributor Author

this is required to run the model in vllm, can add to https://github.com/pytorch/ao/blob/main/test/integration/test_vllm.py when the API is more mature

# making sure param.data is updated
assert param.data.packed_weight[0][0] != orig_value

def test_bmm(self):
Contributor

unit tests are nice, feels like it would be good to also have an integration test to cover all of these ops in one go

Contributor Author

e2e tests can be the ones we run in vllm/sglang integration on real models I think?

@@ -2040,6 +2040,8 @@ class FbgemmConfig(AOBaseConfig):
weight_dtype (torch.dtype): weight dtype of the kernel
output_dtype (torch.dtype): output dtype of the kernel
group_size (int): The group size for weight
preshuffle (bool): whether to preshuffle the weights or not
activation_dtype_for_int4 (str): the dtype for activation for int4 weight, either bf16 or fp8
Contributor

it's confusing to have both input_dtype and activation_dtype_for_int4, what are your thoughts

Contributor Author

yeah it's a bit confusing, although this is temporary, I'm deprecating this later in the stack: #2474
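For context, the fields being discussed can be mocked up as a plain dataclass. This is an illustrative stand-in, not the real torchao FbgemmConfig: the field names follow the docstring in the diff above, and `scales_needed` is a hypothetical helper showing which scale tensors each activation path carries.

```python
from dataclasses import dataclass


@dataclass
class FbgemmConfigSketch:
    # Hypothetical mirror of the documented fields; not the real class.
    input_dtype: str = "bf16"
    weight_dtype: str = "int4"
    output_dtype: str = "bf16"
    group_size: int = 128
    preshuffle: bool = True
    activation_dtype_for_int4: str = "bf16"  # "bf16" or "fp8"

    def scales_needed(self):
        # bf16 activation path: (group_scale, group_zero)
        # fp8 activation path:  (group_scale, row_scale)
        if self.activation_dtype_for_int4 == "fp8":
            return ("group_scale", "row_scale")
        return ("group_scale", "group_zero")
```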

)


class Float8ActivationInt4GroupwisePreshuffleTensor(TorchAOBaseTensor):
Contributor

why not extend the existing Int4GroupwisePreshuffleTensor

Contributor Author

sorry this should be removed

@jerryzh168 jerryzh168 changed the base branch from jerryzh168/stack/2 to main July 2, 2025 20:35
@jerryzh168 jerryzh168 force-pushed the jerryzh168/stack/4 branch from c13fa2b to cc359e6 on July 2, 2025 20:36
@jerryzh168 jerryzh168 changed the base branch from main to jerryzh168/stack/2 July 2, 2025 20:36
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. topic: new feature Use this tag if this PR adds a new feature
3 participants