
Control on the weight quantization #1123

Closed
balditommaso opened this issue Dec 9, 2024 · 13 comments

@balditommaso

I am asking this question because I am working with a custom implementation of a QuantConv2d layer. During training, the weights of the layer have to be processed with a series of operations that can be done in full precision; the quantized version of the processed weights is then used to apply the convolution.

So far, I have applied the preprocessing to the .value of the quant_weight, which breaks the quantization, and then re-applied the quantization by passing the weights through a QuantIdentity. However, this approach is sub-optimal because it increases the quantization error, and it is tricky to emulate the weight quantization scheme with an activation quantizer.

Is there a way to control when the layer applies the weight quantization?

PS: During inference there are no problems, because the layer will behave like a traditional QuantConv2d.
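
For reference, a rough sketch of the sub-optimal workaround described above (names and quantizer choices are illustrative, not the actual code):

import torch.nn.functional as F
import brevitas.nn as qnn

# Illustrative only: preprocess the dequantized weight in full precision,
# then fake-quantize it again through a separate QuantIdentity. This adds a
# second, activation-style round of quantization on top of the weight quantizer.
requant = qnn.QuantIdentity(return_quant_tensor=True)

def forward_with_preprocessing(layer, x, preprocess):
    w = preprocess(layer.quant_weight().value)  # full-precision processing
    w_q = requant(w)                            # re-quantization via QuantIdentity
    return F.conv2d(x, w_q.value, layer.bias, layer.stride,
                    layer.padding, layer.dilation, layer.groups)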

@Giuseppe5
Collaborator

I would recommend using parametrize.

You register your parametrizations, and they will be applied automatically before quantization.
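
A minimal sketch of that approach (the transformation and layer sizes here are just placeholders):

import torch
import torch.nn as nn
from torch.nn.utils.parametrize import register_parametrization

import brevitas.nn as qnn

class PreprocessWeight(nn.Module):
    # Placeholder full-precision transformation: it runs every time
    # self.weight is accessed, i.e. before Brevitas quantizes the weight.
    def forward(self, weight: torch.Tensor) -> torch.Tensor:
        return weight / weight.abs().max().clamp_min(1e-8)

layer = qnn.QuantConv2d(3, 8, kernel_size=3)
register_parametrization(layer, "weight", PreprocessWeight())

# layer.weight is now the preprocessed weight, and quant_weight() quantizes it;
# gradients flow back to layer.parametrizations.weight.original.
print(layer.quant_weight().value.shape)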

@balditommaso
Author

Sorry, one further question: what is the quantization pipeline during training? Are the weights still updated, or should we rely only on the .value field?

@Giuseppe5
Collaborator

I'm not sure I follow, but gradients should be correctly propagated. If you need an example of how to use quant_weights with custom layers, this is the general implementation of the forward pass for a quantized int layer:

def quant_layer(fn, quant_input, quant_weight, bias, *args, **kwargs):
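
As a rough sketch of the idea (not the actual Brevitas source), the body does something along these lines:

# Sketch only: fn is the functional op (e.g. torch.nn.functional.conv2d).
# The QuantTensors expose the fake-quantized tensor via .value, so autograd
# propagates gradients through the quantization (STE inside Brevitas).
def quant_layer(fn, quant_input, quant_weight, bias, *args, **kwargs):
    input_value = quant_input.value if hasattr(quant_input, 'value') else quant_input
    weight_value = quant_weight.value if hasattr(quant_weight, 'value') else quant_weight
    return fn(input_value, weight_value, bias, *args, **kwargs)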

@Giuseppe5
Collaborator

If your layer is a custom QuantConv2d and you end up calling torch.nn.functional.conv2d passing a QuantTensor, everything is handled by Brevitas.

@balditommaso
Author

balditommaso commented Dec 10, 2024

My layer extends QuantConv2d. Before calling its inner_forward_impl I need to do some computations with the weights that must be traced by the gradients, but in inner_forward_impl I only have access to the quant_weight. However, if I move this processing into the forward before calling inner_forward_impl and work on the plain .weight, does it propagate to the quantization?

@Giuseppe5
Collaborator

Could you post a code snippet so I can get an idea of what you're trying to achieve?

@balditommaso
Author

Here is an example:
I tried parametrization as you suggested, but there are problems when training on GPU because the scales are on the CPU.


import torch
import torch.nn as nn
from torch import Tensor
from torch.nn.common_types import _size_2_t
from torch.nn.utils.parametrize import register_parametrization
from typing import Callable, Optional, Union

import brevitas.nn as qnn
from brevitas.quant import Int8WeightPerTensorFloat
# NOTE: the import path of these type aliases may differ across Brevitas versions.
from brevitas.nn.quant_layer import ActQuantType, BiasQuantType, WeightQuantType


class Operation(nn.Module):
    # Parametrization applied to the weight before Brevitas quantizes it.
    @staticmethod
    def forward(weight: Tensor) -> Tensor:
        return weight + torch.ones_like(weight)


class CustomQuantConv2d(qnn.QuantConv2d):
    def __init__(self,
            in_channels: int,
            out_channels: int,
            kernel_size: _size_2_t,
            padding: Union[_size_2_t, str] = 'same',
            padding_mode: str = 'circular',
            initializer: Optional[Callable] = torch.nn.init.orthogonal_,
            weight_quant: Optional[WeightQuantType] = Int8WeightPerTensorFloat,
            bias_quant: Optional[BiasQuantType] = None,
            input_quant: Optional[ActQuantType] = None,
            output_quant: Optional[ActQuantType] = None,
            return_quant_tensor: bool = False,
            **kwargs):
        # if padding == 'same':
        #     padding = kernel_size[-1] // 2 if isinstance(kernel_size, tuple) else kernel_size // 2
        super().__init__(
            in_channels=in_channels,
            out_channels=out_channels,
            kernel_size=kernel_size,
            padding=padding,
            padding_mode=padding_mode,
            weight_quant=weight_quant,
            bias_quant=bias_quant,
            input_quant=input_quant,
            output_quant=output_quant,
            return_quant_tensor=return_quant_tensor,
            **kwargs)

        if initializer is not None:
            initializer(self.weight)

        # Apply the rescaling as a parametrization, so it runs on self.weight
        # before the weight quantizer sees it.
        register_parametrization(self, "weight", Operation())
    
  
      

@balditommaso
Author

This is the error I am getting. I am sure I am moving the whole model to the right GPU correctly, because I am training with pytorch-lightning.

[Screenshot of the error traceback, taken 2024-12-11 at 11:18:57]

@balditommaso
Author

I think the problem is in _ParameterListStats, where the tracked_parameter_list holds the tensors on the CPU, and when the model is moved to the GPU during training they are not moved along with it.

Hope it helps!

@Giuseppe5
Collaborator

Might I ask you to pull the latest version of dev?
We made some changes to that logic, and maybe that will also solve your issue.

@balditommaso
Author

With the latest version it is working. Thank you so much!

@balditommaso
Author

Sorry, one more question: when using parametrize, are the values stored in quant_weight().value the dequantized version of self.weight or of self.parametrizations.weight.original?

I am asking because I want to be sure that, once I export the model, I will get the quantized version of the modified weights.

@Giuseppe5
Collaborator

The dequantized version of self.weight, which is what you were asking for at the beginning of the thread, right?

The idea is that in Brevitas we rely on self.weight for quantization, which is why I suggested it should work out of the box.

Having said that, I have never used parametrize myself (so far), so if you still have doubts, I'd recommend poking around a bit with a debugger to make sure everything ends up in the correct place.
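
For instance, a quick sanity check along these lines (constructor arguments are just an example) would show which tensor the quantizer actually sees:

import torch

layer = CustomQuantConv2d(3, 8, kernel_size=3, padding=1, padding_mode='zeros')

with torch.no_grad():
    qw = layer.quant_weight().value                     # fake-quantized weight
    parametrized = layer.weight                         # weight after the parametrization
    original = layer.parametrizations.weight.original   # raw stored parameter

    # The quantization error should be small w.r.t. the parametrized weight,
    # not w.r.t. the original one.
    print((qw - parametrized).abs().max())
    print((qw - original).abs().max())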
