
Control on the weight quantization #1123

Closed
balditommaso opened this issue Dec 9, 2024 · 13 comments

@balditommaso

I am asking this question because I am working with a custom implementation of a QuantConv2d layer. During training, the weights of the layer have to be processed with a series of operations that can be done in full precision; the quantized version of the processed weights is then used to apply the convolution.

So far, I have applied the preprocessing to the .value of the quant_weight, which breaks the quantization, and then re-applied the quantization by passing the weights through a QuantIdentity. However, this approach is sub-optimal because it increases the quantization error, and it is tricky to emulate the weight quantization scheme with an activation quantizer.

Is there a way to control when the layer applies the weight quantization?

PS: During inference there are no problems, because the layer will behave like a traditional QuantConv2d.
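
For reference, a rough sketch of the sub-optimal workaround described above (names and quantizer choices are illustrative, not the actual code):

import torch.nn.functional as F
import brevitas.nn as qnn

# Illustrative only: preprocess the dequantized weight in full precision,
# then fake-quantize it again through a separate QuantIdentity. This adds a
# second, activation-style round of quantization on top of the weight quantizer.
requant = qnn.QuantIdentity(return_quant_tensor=True)

def forward_with_preprocessing(layer, x, preprocess):
    w = preprocess(layer.quant_weight().value)  # full-precision processing
    w_q = requant(w)                            # re-quantization via QuantIdentity
    return F.conv2d(x, w_q.value, layer.bias, layer.stride,
                    layer.padding, layer.dilation, layer.groups)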

@Giuseppe5
Collaborator

I would recommend using parametrize.

You register your parametrizations, and they will be applied automatically before quantization.
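
A minimal sketch of that approach (the transformation and layer sizes here are just placeholders):

import torch
import torch.nn as nn
from torch.nn.utils.parametrize import register_parametrization

import brevitas.nn as qnn

class PreprocessWeight(nn.Module):
    # Placeholder full-precision transformation: it runs every time
    # self.weight is accessed, i.e. before Brevitas quantizes the weight.
    def forward(self, weight: torch.Tensor) -> torch.Tensor:
        return weight / weight.abs().max().clamp_min(1e-8)

layer = qnn.QuantConv2d(3, 8, kernel_size=3)
register_parametrization(layer, "weight", PreprocessWeight())

# layer.weight is now the preprocessed weight, and quant_weight() quantizes it;
# gradients flow back to layer.parametrizations.weight.original.
print(layer.quant_weight().value.shape)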

@balditommaso
Author

Sorry, one further question: what is the quantization pipeline during training? Are the weights still updated, or should we rely only on the .value field?

@Giuseppe5
Collaborator

I'm not sure I follow, but gradients should be correctly propagated. If you need an example of how to use quant_weights with custom layers, this is the general implementation of the forward pass for a quantized int layer:

def quant_layer(fn, quant_input, quant_weight, bias, *args, **kwargs):
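
As a rough sketch of the idea (not the actual Brevitas source), the body does something along these lines:

# Sketch only: fn is the functional op (e.g. torch.nn.functional.conv2d).
# The QuantTensors expose the fake-quantized tensor via .value, so autograd
# propagates gradients through the quantization (STE inside Brevitas).
def quant_layer(fn, quant_input, quant_weight, bias, *args, **kwargs):
    input_value = quant_input.value if hasattr(quant_input, 'value') else quant_input
    weight_value = quant_weight.value if hasattr(quant_weight, 'value') else quant_weight
    return fn(input_value, weight_value, bias, *args, **kwargs)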

@Giuseppe5
Collaborator

If your layer is a custom QuantConv2d and you end up calling torch.nn.functional.conv2d passing a QuantTensor, everything is handled by Brevitas.

@balditommaso
Author

balditommaso commented Dec 10, 2024

My layer extends QuantConv2d. Before calling its inner_forward_impl I need to do some computations with the weights that must be traced by the gradients, but in inner_forward_impl I only have access to the quant_weight. However, if I move this processing into the forward before calling inner_forward_impl and work on the plain .weight, does it propagate to the quantization?

@Giuseppe5
Collaborator

Could you post a code snippet so I can get an idea of what you're trying to achieve?

@balditommaso
Author

Here is an example:
I tried parametrization as you suggested, but there are problems when training on GPU because the scales are on the CPU.


import torch
import torch.nn as nn
from torch import Tensor
from torch.nn.common_types import _size_2_t
from torch.nn.utils.parametrize import register_parametrization
from typing import Callable, Optional, Union

import brevitas.nn as qnn
from brevitas.quant import Int8WeightPerTensorFloat
# NOTE: the import path of these type aliases may differ across Brevitas versions.
from brevitas.nn.quant_layer import ActQuantType, BiasQuantType, WeightQuantType


class Operation(nn.Module):
    # Parametrization applied to the weight before Brevitas quantizes it.
    @staticmethod
    def forward(weight: Tensor) -> Tensor:
        return weight + torch.ones_like(weight)


class CustomQuantConv2d(qnn.QuantConv2d):
    def __init__(self,
            in_channels: int,
            out_channels: int,
            kernel_size: _size_2_t,
            padding: Union[_size_2_t, str] = 'same',
            padding_mode: str = 'circular',
            initializer: Optional[Callable] = torch.nn.init.orthogonal_,
            weight_quant: Optional[WeightQuantType] = Int8WeightPerTensorFloat,
            bias_quant: Optional[BiasQuantType] = None,
            input_quant: Optional[ActQuantType] = None,
            output_quant: Optional[ActQuantType] = None,
            return_quant_tensor: bool = False,
            **kwargs):
        # if padding == 'same':
        #     padding = kernel_size[-1] // 2 if isinstance(kernel_size, tuple) else kernel_size // 2
        super().__init__(
            in_channels=in_channels,
            out_channels=out_channels,
            kernel_size=kernel_size,
            padding=padding,
            padding_mode=padding_mode,
            weight_quant=weight_quant,
            bias_quant=bias_quant,
            input_quant=input_quant,
            output_quant=output_quant,
            return_quant_tensor=return_quant_tensor,
            **kwargs)

        if initializer is not None:
            initializer(self.weight)

        # Apply the rescaling as a parametrization, so it runs on self.weight
        # before the weight quantizer sees it.
        register_parametrization(self, "weight", Operation())
    
  
      

@balditommaso
Author

This is the error I am getting. I am sure I am moving the whole model to the right GPU correctly, because I am training with pytorch-lightning.

[Screenshot of the error traceback, taken 2024-12-11 at 11:18:57]

@balditommaso
Author

I think the problem is in _ParameterListStats, where the tracked_parameter_list holds the tensors on the CPU, and when the model is moved to the GPU during training they are not moved along with it.

Hope it helps!

@Giuseppe5
Collaborator

Might I ask you to pull the latest version of dev?
We made some changes to that logic, and maybe that will also solve your issue.

@balditommaso
Author

With the latest version it is working. Thank you so much!

@balditommaso
Author

Sorry, one more question: when using parametrize, are the values stored in quant_weight().value the dequantized version of self.weight or of self.parametrizations.weight.original?

I am asking because I want to be sure that, once I export the model, I will get the quantized version of the modified weights.

@Giuseppe5
Collaborator

The dequantized version of self.weight, which is what you were asking for at the beginning of the thread, right?

The idea is that in Brevitas we rely on self.weight for quantization, which is why I suggested it should work out of the box.

Having said that, I have never used parametrize myself (so far), so if you still have doubts, I'd recommend poking around a bit with a debugger to make sure everything ends up in the correct place.
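
For instance, a quick sanity check along these lines (constructor arguments are just an example) would show which tensor the quantizer actually sees:

import torch

layer = CustomQuantConv2d(3, 8, kernel_size=3, padding=1, padding_mode='zeros')

with torch.no_grad():
    qw = layer.quant_weight().value                     # fake-quantized weight
    parametrized = layer.weight                         # weight after the parametrization
    original = layer.parametrizations.weight.original   # raw stored parameter

    # The quantization error should be small w.r.t. the parametrized weight,
    # not w.r.t. the original one.
    print((qw - parametrized).abs().max())
    print((qw - original).abs().max())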
