
Bump minimum TorchAO version to 0.7.0 #10293

Merged: 6 commits merged into main from bump-torchao-version on Dec 23, 2024
Conversation

a-r-r-o-w (Member)

a-r-r-o-w requested a review from DN6 on December 18, 2024 at 19:54.
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@@ -276,7 +278,6 @@ def test_int4wo_quant_bfloat16_conversion(self):
self.assertTrue(isinstance(weight, AffineQuantizedTensor))
self.assertEqual(weight.quant_min, 0)
self.assertEqual(weight.quant_max, 15)
self.assertTrue(isinstance(weight.layout_type, TensorCoreTiledLayoutType))
a-r-r-o-w (Member Author):
layout_type is now an internal private attribute called _layout, so it no longer needs to be tested and can be removed. The layout class has also been renamed from TensorCoreTiledLayoutType to TensorCoreTiledLayout.
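For reference, a minimal sketch of how the check could look on torchao >= 0.7.0 if we still wanted to assert the layout (the import paths and the private `_layout` attribute name are my assumptions; since `_layout` is private, the PR simply drops the assertion instead):

```python
# Sketch only: not what the PR does; the PR removes the layout assertion entirely.
from torchao.dtypes import AffineQuantizedTensor, TensorCoreTiledLayout

def check_int4wo_weight(weight):
    assert isinstance(weight, AffineQuantizedTensor)
    assert weight.quant_min == 0 and weight.quant_max == 15
    # `_layout` is private and may change without notice in future torchao releases.
    assert isinstance(weight._layout, TensorCoreTiledLayout)
```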

size_quantized_with_not_convert = get_model_size_in_bytes(quantized_model_with_not_convert)
size_quantized = get_model_size_in_bytes(quantized_model)

self.assertTrue(size_quantized < size_quantized_with_not_convert)
a-r-r-o-w (Member Author):

Not related to bumping the version, but comparing the fully quantized size against the size with some modules left unconverted makes for a more meaningful test.


for param in module.parameters():
if param.__class__.__name__ == "AffineQuantizedTensor":
data, scale, zero_point = param.layout_tensor.get_plain()
a-r-r-o-w (Member Author):

Same reason as above for removing this: layout_tensor is an internal private attribute, so we should not access it, because torchao could change it without warning in the future.
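If a test ever does need the plain tensor data again, one option that avoids private attributes (a sketch, assuming `AffineQuantizedTensor` keeps its public `dequantize()` method; `module` is the model under test, as in the snippet above) would be:

```python
import torch

# Sketch: avoid private attributes like `layout_tensor`; dequantize() returns a plain tensor.
for param in module.parameters():
    if param.__class__.__name__ == "AffineQuantizedTensor":
        plain = param.dequantize()
        assert isinstance(plain, torch.Tensor)
        assert plain.dtype in (torch.bfloat16, torch.float16, torch.float32)
```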

self.assertTrue(total_int8wo < total_bf16 < total_int4wo_gs32)
# int4 with default group size quantized very few linear layers compared to a smaller group size of 32
self.assertTrue(quantized_int4wo < quantized_int4wo_gs32 and unquantized_int4wo > unquantized_int4wo_gs32)
total_int4wo = get_model_size_in_bytes(transformer_int4wo)
a-r-r-o-w (Member Author):

We now use the utility provided by torchao instead.
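The utility referred to is torchao's `get_model_size_in_bytes`. A minimal usage sketch (the model variables are placeholders for the differently quantized copies used in the test):

```python
from torchao.utils import get_model_size_in_bytes

# Placeholders: the same transformer loaded in bf16 and with int8 weight-only quantization.
size_bf16 = get_model_size_in_bytes(transformer_bf16)
size_int8wo = get_model_size_in_bytes(transformer_int8wo)
assert size_int8wo < size_bf16
```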

@@ -593,7 +589,7 @@ def get_dummy_inputs(self, device: torch.device, seed: int = 0):

def _test_quant_type(self, quantization_config, expected_slice):
components = self.get_dummy_components(quantization_config)
pipe = FluxPipeline(**components).to(dtype=torch.bfloat16)
a-r-r-o-w (Member Author):

I think this was the incorrect thing to do here, and it slipped past us in a previous PR. We should not call .to(dtype) on the pipeline directly if any of its models have been quantized.

The GGUF PR introduced a check in modeling_utils.py that catches this behaviour.
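For context, a sketch of the pattern the check is meant to enforce, using the standard diffusers TorchAo flow (the checkpoint and quant type below are just examples): quantize the model at load time, pass `torch_dtype` for the remaining components, and never call `.to(dtype)` on the assembled pipeline.

```python
import torch
from diffusers import FluxPipeline, FluxTransformer2DModel, TorchAoConfig

quant_config = TorchAoConfig("int8_weight_only")
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",  # example checkpoint
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,  # applied only to the non-quantized components
)
# pipe.to(dtype=torch.bfloat16)  # avoid: this would try to cast the quantized transformer
```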

a-r-r-o-w (Member Author):

Gentle ping @DN6

DN6 merged commit ffc0eaa into main on Dec 23, 2024
15 checks passed
a-r-r-o-w deleted the bump-torchao-version branch on December 23, 2024 at 05:35
sayakpaul pushed a commit that referenced this pull request Dec 23, 2024
* bump min torchao version to 0.7.0

* update