Doubt about the __add__ implementation of the IntQuantTensor #1106
Comments
My understanding from your message is that what we currently do in Brevitas is not technically wrong, but maybe it is not the best handling of this case, correct? In Brevitas we tend to preserve and propagate the concept of QuantTensor as much as possible, as in the case you pointed out, since it is always possible to fall back to a normal tensor by calling .value. In the future we might think about introducing a flag that lets the user select one behavior or the other, but for now I would recommend manually dequantizing your tensor after the addition. Going back to the question at the beginning of my answer: if I misunderstood and the result is technically wrong, let us know and we'll promptly fix that. Thanks!
I see your point, but maybe there should be a warning for the user, so they can be aware of this behavior, which is not "canonical" in quantization theory. I discovered it by chance, because I was surprised by the size of the model. Indeed, imagine moving this implementation to hardware where for each weight (e.g. INT8) we need its zero_point (e.g. FP32) value: you would be wasting a lot of memory. I am not sure there is a correct way to handle this case, it's tricky, but thank you for your answer; now I understand what you were going for with this approach!
Can you provide a small script with an example of this and its impact on the size of the model? I imagine it has an impact on the runtime memory you might need to run the model (which is removed once you dequantize), but I'm not sure I understand how it would impact the "static" size of your model. Maybe I am not understanding correctly, so a small example could clarify that.
With this script you will see that the number of zero_point entries is the same as the number of weights.
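The script itself was not preserved in this thread; the effect it demonstrated can be reproduced with a small stand-in (pure NumPy, a hypothetical reconstruction, not the original Brevitas script): once an added float tensor is folded into the zero point, there is one floating-point zero point per weight.

```python
import numpy as np

# pretend these came out of a quantized layer
weights_q = np.random.randint(-128, 128, size=(64, 64)).astype(np.int8)
scale = np.float32(0.05)
zero_point = np.float32(0.0)          # initially a single scalar

# adding a float tensor folds it into the zero point elementwise
added = np.ones(weights_q.shape, dtype=np.float32)
zero_point = zero_point - added / scale   # now one fp32 zero point per weight

print(weights_q.size)    # 4096 int8 weights
print(zero_point.size)   # 4096 fp32 zero points -> 4x the weight storage
```

This is the memory concern raised above: each 1-byte INT8 weight now drags along a 4-byte FP32 zero point.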
Right, but the size of the saved model does not change. And to avoid the issue, as mentioned above, you could simply change x = x + torch.ones_like(x) to x = x.value + torch.ones_like(x). I understand your point and, as mentioned, we are thinking about a clean solution to allow both behaviors to co-exist. I just wanted to make sure that there were no side effects that would cause an increase in the size of the model.
I think we are in agreement, so I am going to close this issue for now, but feel free to re-open if you have further issues with this. Thanks for pointing this out, and for the example; I am sure it will be useful for other Brevitas users as well!
Hi, I was looking at the new implementation of __add__ in the IntQuantTensor class, and I am not sure that you are handling the case where a QuantTensor is added to a Tensor correctly. Indeed, the zero_point is adjusted so that the added tensor is folded into it. By doing so you are basically creating a zero_point value for each weight of the tensor, and this is problematic: the .zero_point and .value tensors will end up with the same shape. I think it might be better to handle that situation by dequantizing the QuantTensor and returning the sum of two Tensors.
Let me know what you think!
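To make the adjustment under discussion concrete, here is a sketch of the arithmetic in plain NumPy (an illustration of the idea, not Brevitas' actual code): adding a float tensor c to a dequantized value (q - z) * s can be absorbed into the zero point as z' = z - c / s, which turns a scalar zero point into a per-element one.

```python
import numpy as np

s = 0.1                                            # scalar scale
z = 3.0                                            # scalar zero point
q = np.array([[4, 7], [2, 9]], dtype=np.float64)   # integer codes
c = np.ones_like(q)                                # float tensor being added

# dequantized sum computed directly
direct = (q - z) * s + c

# same sum expressed by folding c into the zero point:
# (q - z) * s + c == (q - (z - c / s)) * s
z_new = z - c / s                                  # broadcasts to q's shape
folded = (q - z_new) * s

assert np.allclose(direct, folded)
assert z_new.shape == q.shape                      # one zero point per element
```

The result stays numerically correct, which is why the sum is "not technically wrong"; the cost is purely representational, since the zero point is no longer a scalar.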