Feat: Add QuantConv3d and QuantConv3dTranspose #805
Conversation
… specifically for a mode that pytorch doesn't support
group_size = self.out_channels // self.groups
overlapping_sums = max(round(self.kernel_size[0] / self.stride[0]), 1)
overlapping_sums *= max(round(self.kernel_size[1] / self.stride[1]), 1)
overlapping_sums *= max(round(self.kernel_size[2] / self.stride[2]), 1)
Does this actually calculate the right thing? The way I read this function is that group_size * overlapping_sums should equal the number of terms in my dot product. If so, why is the stride important?
@i-colbert, @Giuseppe5, can you enlighten me here?
I have only worked with the 2D case, but if this is a trivial extension of that, then it is almost correct. Unlike the convolution, which traverses the input space in strides, the deconvolution traverses the output space in strides and creates overlapping sums that are a function of the kernel size and stride. This paper introduces an algorithm that traverses the output space of the deconvolution (see Algorithm 8), and you can see the stride skipping there. However, you can see that it should be a floor rounding rather than a halfway rounding. I would propose the following:
...
patch_size = (self.kernel_size[0] // self.stride[0]) * (self.kernel_size[1] // self.stride[1]) * (self.kernel_size[2] // self.stride[2])
max_uint_output = max_uint_input * max_kernel_val * patch_size * group_size
...
The stride shouldn't really outgrow the kernel_size, since that wouldn't make much sense outside of a niche use case. Not sure if PyTorch throws an error in that case, and also not sure it'd be the job of the quant layer to protect against this.
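For anyone following along, here is a minimal sketch (not part of this PR; sizes and the torch.nn.functional call are my own choice) that checks the overlap count empirically. With an all-ones input and an all-ones kernel, every output element of a transposed convolution equals the number of terms summed into it, so the interior maximum should match the proposed patch_size times the contributing input channels. The example deliberately uses a kernel size divisible by the stride, where the floor division gives the exact overlap count.

import torch
import torch.nn.functional as F

in_channels, out_channels = 1, 1
kernel_size, stride = (4, 4, 4), (2, 2, 2)

# All-ones input and kernel: each output value counts its dot-product terms.
x = torch.ones(1, in_channels, 8, 8, 8)
w = torch.ones(in_channels, out_channels, *kernel_size)
y = F.conv_transpose3d(x, w, stride=stride)

# Floor division per spatial dim, matching the proposed patch_size.
expected = in_channels
for k, s in zip(kernel_size, stride):
    expected *= k // s

print(int(y.max().item()), expected)  # both print 8 here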
Thank you! <3
tests/brevitas/nn/test_a2q.py
elif kwargs['model_type'] == 'QuantConvTranspose1d':  # shape = (in_channels, out_channels, kernel_size)
    quant_weight_per_channel_l1_norm = quant_weight.norm(p=1, dim=(0, 2))
elif kwargs['model_type'] == 'QuantConvTranspose2d':  # shape = (in_channels, out_channels, kernel_size)
    quant_weight_per_channel_l1_norm = quant_weight.norm(p=1, dim=(0, 2, 3))
elif kwargs['model_type'] == 'QuantConvTranspose3d':  # shape = (in_channels, out_channels, kernel_size)
Comment: should this be # shape = (in_channels, out_channels, kernel_size, kernel_size)?
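For context, PyTorch's ConvTransposeNd weights are laid out as (in_channels, out_channels / groups, *kernel_size), which is why the per-output-channel L1 norm keeps dim 1 and reduces over the remaining dims. A quick illustrative check (tensor sizes are arbitrary, not from the test):

import torch

# QuantConvTranspose3d-style weight: (in_channels, out_channels, kT, kH, kW)
w3d = torch.randn(4, 8, 3, 3, 3)
per_channel_l1 = w3d.norm(p=1, dim=(0, 2, 3, 4))
assert per_channel_l1.shape == (8,)  # one L1 norm per output channel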
LGTM!
Issue
Implements #795
Details