
How to handle short-cuts with different scales #1101

Closed
balditommaso opened this issue Nov 25, 2024 · 6 comments
Labels
bug Something isn't working

Comments

@balditommaso

I am in a situation where the (quantized) input is added back to the (also quantized) output of a block, but the scales are different and Brevitas raises an error.

How should I handle this situation?

@balditommaso added the bug (Something isn't working) label on Nov 25, 2024
@Giuseppe5
Collaborator

The error is not a bug but a feature.

The idea is that if you have two quantized values, you can't really add them unless they have the same scale factor. Different scales imply different ranges, so adding the integer representations would produce an integer with an unclear range.
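To make this concrete, here is a minimal numeric sketch (plain Python, with made-up integer values and scales, zero-point 0) of why adding the integer representations directly is meaningless:

# real_value = int_value * scale (zero-point 0 for simplicity)
a_int, a_scale = 100, 0.02   # represents 100 * 0.02 = 2.0
b_int, b_scale = 100, 0.05   # represents 100 * 0.05 = 5.0

# The correct sum in real terms:
real_sum = a_int * a_scale + b_int * b_scale   # 7.0

# Adding the integer representations directly:
int_sum = a_int + b_int                        # 200
# No single output scale recovers the right value:
print(int_sum * a_scale)   # 4.0, wrong
print(int_sum * b_scale)   # 10.0, wrong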

There are two main solutions for this:

  • Dequantize the numbers and add them in floating point:
from brevitas.quant_tensor import IntQuantTensor

first_qt = IntQuantTensor(...)
second_qt = IntQuantTensor(...)
assert first_qt.scale != second_qt.scale
# .value is the dequantized tensor, so this addition happens in floating point
output = first_qt.value + second_qt.value
  • Make sure that they have the same scale before adding them. How to do this depends on the quantized network, but in general the simplest way is to re-quantize the two tensors with the same quant activation:
from brevitas.nn import QuantIdentity
from brevitas.quant_tensor import IntQuantTensor

first_qt = IntQuantTensor(...)
second_qt = IntQuantTensor(...)
# A single QuantIdentity shared by both tensors, e.g. with return_quant_tensor=True
shared_requant = QuantIdentity(...)
assert first_qt.scale != second_qt.scale

first_qt = shared_requant(first_qt)
second_qt = shared_requant(second_qt)
assert first_qt.scale == second_qt.scale

output_qt = first_qt + second_qt

This second solution could be optimized to reduce the amount of unnecessary requantization, but it serves as a good starting point; a sketch of how the pattern fits into a residual block follows below.
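For reference, here is a minimal sketch of the shared-requantization pattern inside a residual block. The block structure, layer choices, and return_quant_tensor usage are illustrative assumptions, not the only way to wire it; the key point is that one QuantIdentity instance quantizes both branches, so both operands of the addition carry the same scale:

import torch.nn as nn
from brevitas.nn import QuantConv2d, QuantIdentity, QuantReLU

class QuantResidualBlock(nn.Module):
    # Illustrative sketch: a single shared QuantIdentity re-quantizes
    # both the skip connection and the block output before the add.
    def __init__(self, channels):
        super().__init__()
        self.conv = QuantConv2d(channels, channels, kernel_size=3,
                                padding=1, bias=False,
                                return_quant_tensor=True)
        self.relu = QuantReLU(return_quant_tensor=True)
        # One QuantIdentity shared by both branches of the addition
        self.shared_requant = QuantIdentity(return_quant_tensor=True)

    def forward(self, x):
        x = self.shared_requant(x)        # skip path, shared scale
        out = self.relu(self.conv(x))
        out = self.shared_requant(out)    # main path, same shared scale
        return x + out                    # scales match, so the add is valid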

@balditommaso
Author

I see, thank you for your answer. I am sorry for the wrong tag.

However, if we consider a HW implementation of the network (on an FPGA, for example), we would like to avoid full-precision operations, so I think whoever is interested in a HW implementation should go for the second option. What do you think?

@Giuseppe5
Collaborator

Yes, the second solution is generally preferred for that particular use case, and it is what we use when quantizing networks for FINN.

@balditommaso
Author

Thanks a lot!

@JPPalacios

Hi @balditommaso, sorry to butt in, but do you mind sharing how you are streamlining short-cuts during FINN compilation? The ResNet example is a little vague about the transformation steps. Thanks for your help!

@Giuseppe5
Collaborator

I might suggest opening an issue directly on the FINN repo; @auphelia will be more than happy to help :)

If it is Brevitas-related, feel free to share more details.
