Add support for float8 activation for Int4GroupwisePreshuffleTensor #2437
base: jerryzh168/stack/2
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2437
Note: Links to docs will display an error until the docs builds have been completed.
❌ 12 New Failures, 1 Cancelled Job as of commit cc359e6 with merge base 5a50667.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Summary: Note: slice is not working yet, others are working
Test Plan: python test/dtypes/test_float8_activation_int4_groupwise_preshuffle.py
stack-info: PR: #2437, branch: jerryzh168/stack/4
@@ -0,0 +1,166 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
should this test be in the test/quantization/quantize_ folder?
yeah, the test file to review is test_int4_groupwise_preshuffle.py; I forgot to remove this file
@unittest.skipIf(not is_sm_at_least_90(), "Need sm90+")
class TestInt4GroupwisePreshuffleTensor(TestCase):
    def setUp(self):
        self.config = FbgemmConfig(
or, maybe organize tests by the config? It would make sense for tests for everything in FbgemmConfig to be together.
I was planning to remove FbgemmConfig soon as well
        quantized = linear(input)
        self.assertTrue(compute_error(original, quantized) > 20)

    # @unittest.skip("WIP: this doesn't work yet")
remove
        self.assertTrue(compute_error(original, quantized) > 20)

    # @unittest.skip("WIP: this doesn't work yet")
    def test_slice(self):
is there an integration test we can write to cover this? If this is needed for TP, maybe just have an integration test for TP which is easy to run for all of these configs?
this is required to run the model in vLLM; it can be added to https://github.com/pytorch/ao/blob/main/test/integration/test_vllm.py when the API is more mature
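For context on why slice matters here: tensor-parallel loading (e.g. in vLLM) shards a quantized Linear by narrowing its weight, which dispatches to the tensor subclass's slice implementation. Below is a minimal sketch of that pattern; `shard_colwise` is a hypothetical helper written only for illustration and assumes the linear's weight has already been quantized with `quantize_()`.

```python
# Hypothetical helper for illustration; not code from this PR or from vLLM.
import torch


def shard_colwise(quantized_linear: torch.nn.Linear, rank: int, world_size: int) -> None:
    """Keep only this rank's slice of the output features (column-parallel Linear)."""
    out_features = quantized_linear.out_features
    assert out_features % world_size == 0
    shard_size = out_features // world_size
    # narrow() dispatches to aten.slice on the weight subclass, which has to slice the
    # packed int4 data and its scales (group_scale, plus row_scale for fp8 activation)
    # consistently; this is the op that is still WIP in this PR.
    weight_shard = quantized_linear.weight.narrow(0, rank * shard_size, shard_size)
    quantized_linear.weight = torch.nn.Parameter(weight_shard, requires_grad=False)
    quantized_linear.out_features = shard_size
    if quantized_linear.bias is not None:
        bias_shard = quantized_linear.bias.narrow(0, rank * shard_size, shard_size)
        quantized_linear.bias = torch.nn.Parameter(bias_shard, requires_grad=False)
```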
        # making sure param.data is updated
        assert param.data.packed_weight[0][0] != orig_value

    def test_bmm(self):
unit tests are nice, feels like it would be good to also have an integration test to cover all of these ops in one go
e2e tests can be the ones we run in vllm/sglang integration on real models I think?
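As a middle ground between these unit tests and a full vLLM/SGLang run, a small self-contained check of the bmm path could look roughly like the sketch below. `BatchedLinear` and `check_bmm` are hypothetical names, and the quantization call is passed in as a parameter because the config API is still in flux.

```python
# Illustrative sketch only; not the PR's actual test code.
import torch

from torchao.quantization.utils import compute_error


class BatchedLinear(torch.nn.Module):
    """Toy module whose forward uses torch.bmm, so quantizing its weight exercises the bmm override."""

    def __init__(self, batch, in_features, out_features, dtype=torch.bfloat16, device="cuda"):
        super().__init__()
        # one (out_features, in_features) weight per batch element
        self.weight = torch.nn.Parameter(
            torch.randn(batch, out_features, in_features, dtype=dtype, device=device)
        )

    def forward(self, x):
        # x: (batch, seq, in_features) -> (batch, seq, out_features)
        return torch.bmm(x, self.weight.transpose(-2, -1))


def check_bmm(quantize_fn, batch=4, seq=32, in_features=128, out_features=256):
    m = BatchedLinear(batch, in_features, out_features)
    x = torch.randn(batch, seq, in_features, dtype=torch.bfloat16, device="cuda")
    original = m(x)
    quantize_fn(m)  # e.g. quantize_(m, config, filter_fn=...) with a bmm-capable config
    quantized = m(x)
    assert compute_error(original, quantized) > 20  # same SQNR bar as the unit tests above
```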
@@ -2040,6 +2040,8 @@ class FbgemmConfig(AOBaseConfig):
        weight_dtype (torch.dtype): weight dtype of the kernel
        output_dtype (torch.dtype): output dtype of the kernel
        group_size (int): The group size for weight
        preshuffle (bool): whether to preshuffle the weights or not
        activation_dtype_for_int4 (str): the activation dtype for the int4 weight kernel, either "bf16" or "fp8"
it's confusing to have both input_dtype and activation_dtype_for_int4, what are your thoughts?
yeah, it's a bit confusing, but this is temporary; I'm deprecating this later in the stack: #2474
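To make the overlap concrete, here is a hedged sketch of what constructing the config with both fields might look like. The keyword names follow the docstring quoted above and this discussion; the exact constructor signature, import path, and values (e.g. `torch.int4`, `group_size=128`) are assumptions for illustration, not the PR's actual API.

```python
# Hedged sketch only; signature and values are assumptions based on the docstring above.
import torch

from torchao.quantization import FbgemmConfig

config = FbgemmConfig(
    input_dtype=torch.bfloat16,        # dtype of the incoming activation tensor
    weight_dtype=torch.int4,           # int4 groupwise weight
    output_dtype=torch.bfloat16,
    group_size=128,
    preshuffle=True,
    activation_dtype_for_int4="fp8",   # quantize the bf16 activation to fp8 before the int4 kernel
)
```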
)


class Float8ActivationInt4GroupwisePreshuffleTensor(TorchAOBaseTensor):
why not extend the existing Int4GroupwisePreshuffleTensor?
sorry this should be removed
Summary: Added basic op support like linear and bmm. We have both float8 and bf16 in the same Tensor because it's the same dtype; the only difference is whether the activation is quantized or not. There are, however, some differences in implementation:
bf16 activation:
* group_scale
* group_zero
fp8 activation:
* group_scale
* row_scale
Test Plan: python test/dtypes/test_float8_activation_int4_groupwise_preshuffle.py
stack-info: PR: #2437, branch: jerryzh168/stack/4
Stacked PRs:
* transpose_input from fbgemm configs #2422
* Add support for float8 activation for Int4GroupwisePreshuffleTensor
Summary:
Added basic op support like linear and bmm. We have both float8 and bf16 in the same Tensor because it's the same dtype; the only difference is whether the activation is quantized or not. There are, however, some differences in implementation:
bf16 activation:
* group_scale
* group_zero
fp8 activation:
* group_scale
* row_scale
Test Plan:
python test/dtypes/test_float8_activation_int4_groupwise_preshuffle.py
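To illustrate the difference between the two layouts described above, here is a rough structural sketch. The field names (packed_weight, group_scale, group_zero, row_scale) come from this summary; the dataclasses themselves are illustrative only and do not mirror the PR's actual tensor subclass implementation.

```python
# Illustrative structure only; not the PR's actual class definitions.
from dataclasses import dataclass

import torch


@dataclass
class Int4PreshuffleWeightBf16Act:
    """bf16 activation path: weight-only int4 with per-group scale and zero point."""
    packed_weight: torch.Tensor  # preshuffled, packed int4 weight data
    group_scale: torch.Tensor    # per-group scale
    group_zero: torch.Tensor     # per-group zero point


@dataclass
class Int4PreshuffleWeightFp8Act:
    """fp8 activation path: the activation is dynamically quantized, so a row scale replaces the zero point."""
    packed_weight: torch.Tensor  # preshuffled, packed int4 weight data
    group_scale: torch.Tensor    # per-group scale
    row_scale: torch.Tensor      # per-row scale used by the fp8-activation kernel
```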