Bump minimum TorchAO version to 0.7.0 #10293
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
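For context, the bump is effectively a floor on the installed torchao version. A minimal, hypothetical sketch of such a guard (diffusers has its own dependency-version helpers; the function name below is an assumption, not the actual implementation):

```python
import importlib.metadata

from packaging import version


def is_torchao_version_at_least(min_version: str = "0.7.0") -> bool:
    """Return True if torchao is installed and at least `min_version`."""
    try:
        installed = version.parse(importlib.metadata.version("torchao"))
    except importlib.metadata.PackageNotFoundError:
        return False
    return installed >= version.parse(min_version)
```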
@@ -276,7 +278,6 @@ def test_int4wo_quant_bfloat16_conversion(self):
        self.assertTrue(isinstance(weight, AffineQuantizedTensor))
        self.assertEqual(weight.quant_min, 0)
        self.assertEqual(weight.quant_max, 15)
        self.assertTrue(isinstance(weight.layout_type, TensorCoreTiledLayoutType))
`layout_type` has become an internal private attribute called `_layout` now, so it does not have to be tested and can be removed. The layout class is also now called `TensorCoreTiledLayout` instead of `TensorCoreTiledLayoutType`.
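As a rough illustration of the rename (assuming the torchao >= 0.7 public import path; the helper below is hypothetical and not part of the test suite):

```python
from torchao.dtypes import TensorCoreTiledLayout  # formerly TensorCoreTiledLayoutType


def uses_tensor_core_tiled_layout(weight) -> bool:
    # `_layout` is private in torchao >= 0.7, so tests should not rely on it;
    # this only shows where the old `layout_type` information now lives.
    return isinstance(getattr(weight, "_layout", None), TensorCoreTiledLayout)
```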
size_quantized_with_not_convert = get_model_size_in_bytes(quantized_model_with_not_convert)
size_quantized = get_model_size_in_bytes(quantized_model)

self.assertTrue(size_quantized < size_quantized_with_not_convert)
Not related to bumping the version, but it makes for a more meaningful test
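For readers following along, a hedged sketch of how the two models being compared might be built; the checkpoint, module name, and `modules_to_not_convert` usage are assumptions for illustration, not the exact test setup:

```python
import torch
from diffusers import FluxTransformer2DModel, TorchAoConfig

# Hypothetical setup: the same checkpoint quantized twice, once excluding a module.
# Verify `modules_to_not_convert` against the diffusers TorchAoConfig API you use.
ckpt = "black-forest-labs/FLUX.1-dev"
quantized_model = FluxTransformer2DModel.from_pretrained(
    ckpt, subfolder="transformer", torch_dtype=torch.bfloat16,
    quantization_config=TorchAoConfig("int8wo"),
)
quantized_model_with_not_convert = FluxTransformer2DModel.from_pretrained(
    ckpt, subfolder="transformer", torch_dtype=torch.bfloat16,
    quantization_config=TorchAoConfig("int8wo", modules_to_not_convert=["proj_out"]),
)
```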
for param in module.parameters():
    if param.__class__.__name__ == "AffineQuantizedTensor":
        data, scale, zero_point = param.layout_tensor.get_plain()
Same reason as above for removing this: `layout_tensor` is an internal private attribute, meaning we shouldn't access it because it could be changed without warning in the future.
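If a test still needs the plain values, a hedged alternative is to go through the public tensor-subclass API rather than the private attribute (a sketch, assuming `AffineQuantizedTensor.dequantize()` is available; the helper name is illustrative):

```python
def dequantized_params(module):
    # Yields high-precision copies of quantized parameters without touching
    # private attributes such as `layout_tensor` / `tensor_impl`.
    for param in module.parameters():
        if param.__class__.__name__ == "AffineQuantizedTensor":
            yield param.dequantize()
```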
self.assertTrue(total_int8wo < total_bf16 < total_int4wo_gs32)
# int4 with default group size quantized very few linear layers compared to a smaller group size of 32
self.assertTrue(quantized_int4wo < quantized_int4wo_gs32 and unquantized_int4wo > unquantized_int4wo_gs32)
total_int4wo = get_model_size_in_bytes(transformer_int4wo)
We use the torchao-provided utility instead now.
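Roughly, the utility replaces a hand-rolled byte count like the one below; the manual version is illustrative only and may not reflect packed quantized storage accurately:

```python
import torch
from torchao.utils import get_model_size_in_bytes


def manual_model_size_in_bytes(model: torch.nn.Module) -> int:
    # Naive count over parameters and buffers; shown only for contrast.
    tensors = list(model.parameters()) + list(model.buffers())
    return sum(t.numel() * t.element_size() for t in tensors)


# The test now relies on the torchao helper instead, e.g.:
# total_int4wo = get_model_size_in_bytes(transformer_int4wo)
```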
@@ -593,7 +589,7 @@ def get_dummy_inputs(self, device: torch.device, seed: int = 0):

    def _test_quant_type(self, quantization_config, expected_slice):
        components = self.get_dummy_components(quantization_config)
        pipe = FluxPipeline(**components).to(dtype=torch.bfloat16)
I think this was the incorrect thing to do here and it slipped past us in a previous PR. We should not call `.to(dtype)` on the pipeline directly if one of its models has been quantized.
The GGUF PR introduced a check in modeling_utils.py here that catches this behaviour.
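A hedged sketch of the corrected pattern, casting only the non-quantized components and leaving the quantized model untouched; the helper and the list of component names are illustrative, not the exact test code:

```python
import torch
from diffusers import FluxPipeline


def build_pipe(components: dict, dtype: torch.dtype = torch.bfloat16) -> FluxPipeline:
    # Avoid `pipe.to(dtype=...)` when a component is quantized; the check added in
    # modeling_utils.py flags dtype casts on quantized models.
    pipe = FluxPipeline(**components)
    for name in ("text_encoder", "text_encoder_2", "vae"):
        module = getattr(pipe, name, None)
        if module is not None:
            module.to(dtype=dtype)
    return pipe
```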
Gentle ping @DN6
* bump min torchao version to 0.7.0
* update
Context: https://huggingface.slack.com/archives/C065E480NN9/p1734425021147699
cc @yiyixuxu