
[tests] tighten compilation tests for quantization #12002


Open
wants to merge 4 commits into main

Conversation

sayakpaul (Member)

What does this PR do?

  1. When not using any kind of offloading but just torch.compile with fullgraph=True on quantized models, we don't want any recompilations to get in the way of performance. This PR ensures that.
  2. When using offloading, regional compilation with fullgraph=True is better in terms of both cold-start and overall execution time. When _repeated_blocks is available for a model class, we make use of compile_repeated_blocks() instead of compile() (see the sketch below).
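A minimal sketch of the two paths described above; `model`, `dummy_inputs`, and `uses_offloading` are placeholders and not taken from this PR's diff:

```python
import torch
import torch._dynamo

# Path 1: no offloading. Compile the whole model with fullgraph=True and treat
# any recompilation during the forward pass as an error.
# Path 2: offloading. Use regional compilation via compile_repeated_blocks()
# when the model class declares _repeated_blocks.
if not uses_offloading:
    model = torch.compile(model, fullgraph=True)
    with torch._dynamo.config.patch(error_on_recompile=True):
        _ = model(**dummy_inputs)
elif getattr(model, "_repeated_blocks", None):
    model.compile_repeated_blocks(fullgraph=True)
    _ = model(**dummy_inputs)
```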

@@ -847,6 +847,10 @@ def quantization_config(self):
components_to_quantize=["transformer", "text_encoder_2"],
)

@pytest.mark.xfail(
sayakpaul (Member Author)

@matthewdouglas I get:

- 0/0: expected type of 'module._modules['norm_out']._modules['linear']._parameters['weight'].CB' to be a tensor type, ' but found <class 'NoneType'>

matthewdouglas (Member)

For the time being, I'm not sure we can do a whole lot to avoid this for bnb int8. At the very least, it is not a high priority for us. I'm not 100% sure, but it's possible you could get around this by making a forward pass through the model prior to compiling it.
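A hedged sketch of that workaround, assuming `model` and `dummy_inputs` as placeholders:

```python
import torch

# Warm-up: one eager forward pass so the bnb int8 state (e.g. `weight.CB`)
# is materialized before compilation and guard installation.
with torch.no_grad():
    _ = model(**dummy_inputs)

# Only then compile with fullgraph=True.
model = torch.compile(model, fullgraph=True)
_ = model(**dummy_inputs)
```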

sayakpaul (Member Author)

I will note this down in the xfail reason, then.
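As a hypothetical shape of that marker on the test method (the exact reason string and `strict` setting may differ in the final diff):

```python
import pytest

@pytest.mark.xfail(
    reason=(
        "torch.compile guard fails on bnb int8 params whose `weight.CB` is None; "
        "a prior eager forward pass may work around it."
    ),
    strict=True,
)
def test_torch_compile(self):
    ...
```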

sayakpaul (Member Author)

@anijain2305 I am planning to switch to compile_repeated_blocks(fullgraph=True) for:

def test_torch_compile_with_group_offload_leaf(self, use_stream=False):

This is to get rid of:

torch._dynamo.config.cache_size_limit = 1000
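Concretely, the change I have in mind looks roughly like this (a sketch; `pipe` and `inputs` stand in for the actual test fixtures):

```python
# Before: bump the dynamo cache size to tolerate recompilations coming from the
# offloading hooks, then compile the whole transformer.
# torch._dynamo.config.cache_size_limit = 1000
# pipe.transformer.compile()

# After: regional compilation of the repeated transformer blocks only.
pipe.transformer.compile_repeated_blocks(fullgraph=True)
_ = pipe(**inputs)
```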

However, doing so with pytest tests/quantization/gguf/test_gguf.py::GGUFCompileTests::test_torch_compile_with_group_offload_leaf results in:

E               torch._dynamo.exc.Unsupported: Attempted to call function marked as skipped
E                 Explanation: Dynamo developers have intentionally marked that the function `current_accelerator` in file `/fsx/sayak/miniconda3/envs/diffusers/lib/python3.10/site-packages/torch/accelerator/__init__.py` should not be traced.
E                 Hint: Avoid calling the function `current_accelerator`.
E                 Hint: Apply `@torch._dynamo.dont_skip_tracing` to the function `current_accelerator` to force tracing into the function. More graph breaks may occur as a result of attempting to trace into the function.
E                 Hint: Please file an issue to PyTorch.
E               
E                 Developer debug context: module: torch.accelerator, qualname: current_accelerator, skip reason: <missing reason>
E               
E               
E               from user code:
E                  File "/fsx/sayak/diffusers/src/diffusers/models/transformers/transformer_flux.py", line 448, in forward
E                   norm_hidden_states, gate_msa, shift_mlp, scale_mlp, gate_mlp = self.norm1(hidden_states, emb=temb)
E                 File "/fsx/sayak/diffusers/src/diffusers/models/normalization.py", line 168, in forward
E                   emb = self.linear(self.silu(emb))
E                 File "/fsx/sayak/miniconda3/envs/diffusers/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1778, in _call_impl
E                   return forward_call(*args, **kwargs)
E                 File "/fsx/sayak/diffusers/src/diffusers/hooks/hooks.py", line 188, in new_forward
E                   args, kwargs = function_reference.pre_forward(module, *args, **kwargs)
E                 File "/fsx/sayak/diffusers/src/diffusers/hooks/group_offloading.py", line 339, in pre_forward
E                   self.group.onload_()
E                 File "/fsx/sayak/diffusers/src/diffusers/hooks/group_offloading.py", line 213, in onload_
E                   getattr(torch, torch.accelerator.current_accelerator().type)
E               
E               Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"

I also had to comment out:

@torch.compiler.disable()

Do you have any recommendations?
