
How to handle short-cuts with different scales #1101

Closed
balditommaso opened this issue Nov 25, 2024 · 6 comments
Labels
bug Something isn't working

Comments

@balditommaso

I am in a situation where the (quantized) input is added back to the (also quantized) output of a block, but the scales are different and Brevitas raises an error.

How should I handle this situation?

@balditommaso added the bug (Something isn't working) label on Nov 25, 2024
@Giuseppe5
Collaborator

The error is not a bug but a feature.

The idea is that if you have two quantized values, you can't really add them unless they have the same scale factor. Different scales imply different ranges, so adding the integer representations would produce an integer with an unclear range.
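To make this concrete, here is a minimal numeric sketch (plain Python, with made-up integer values and scales, zero-point 0) of why adding the integer representations directly is meaningless:

# real_value = int_value * scale (zero-point 0 for simplicity)
a_int, a_scale = 100, 0.02   # represents 100 * 0.02 = 2.0
b_int, b_scale = 100, 0.05   # represents 100 * 0.05 = 5.0

# The correct sum in real terms:
real_sum = a_int * a_scale + b_int * b_scale   # 7.0

# Adding the integer representations directly:
int_sum = a_int + b_int                        # 200
# No single output scale recovers the right value:
print(int_sum * a_scale)   # 4.0, wrong
print(int_sum * b_scale)   # 10.0, wrong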

There are two main solutions for this:

  • Dequantize the numbers and add them in floating point:
from brevitas.quant_tensor import IntQuantTensor

first_qt = IntQuantTensor(...)
second_qt = IntQuantTensor(...)
assert first_qt.scale != second_qt.scale
# .value is the dequantized tensor, so this addition happens in floating point
output = first_qt.value + second_qt.value
  • Make sure that they have the same scale before adding them. How to do this depends on the quantized network, but in general the simplest way is to re-quantize the two tensors with the same quant activation:
from brevitas.nn import QuantIdentity
from brevitas.quant_tensor import IntQuantTensor

first_qt = IntQuantTensor(...)
second_qt = IntQuantTensor(...)
# A single QuantIdentity shared by both tensors, e.g. with return_quant_tensor=True
shared_requant = QuantIdentity(...)
assert first_qt.scale != second_qt.scale

first_qt = shared_requant(first_qt)
second_qt = shared_requant(second_qt)
assert first_qt.scale == second_qt.scale

output_qt = first_qt + second_qt

This second solution could be optimized to reduce the amount of unnecessary requantization, but it serves as a good starting point; a sketch of how the pattern fits into a residual block follows below.
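For reference, here is a minimal sketch of the shared-requantization pattern inside a residual block. The block structure, layer choices, and return_quant_tensor usage are illustrative assumptions, not the only way to wire it; the key point is that one QuantIdentity instance quantizes both branches, so both operands of the addition carry the same scale:

import torch.nn as nn
from brevitas.nn import QuantConv2d, QuantIdentity, QuantReLU

class QuantResidualBlock(nn.Module):
    # Illustrative sketch: a single shared QuantIdentity re-quantizes
    # both the skip connection and the block output before the add.
    def __init__(self, channels):
        super().__init__()
        self.conv = QuantConv2d(channels, channels, kernel_size=3,
                                padding=1, bias=False,
                                return_quant_tensor=True)
        self.relu = QuantReLU(return_quant_tensor=True)
        # One QuantIdentity shared by both branches of the addition
        self.shared_requant = QuantIdentity(return_quant_tensor=True)

    def forward(self, x):
        x = self.shared_requant(x)        # skip path, shared scale
        out = self.relu(self.conv(x))
        out = self.shared_requant(out)    # main path, same shared scale
        return x + out                    # scales match, so the add is valid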

@balditommaso
Author

I see, thank you for your answer. I am sorry for the wrong tag.

However, if we consider a HW implementation of the network (on an FPGA, for example), we would like to avoid full-precision operations, so I think whoever is interested in a HW implementation should go for the second option. What do you think?

@Giuseppe5
Collaborator

Yes, the second solution is generally preferred for that particular use case, and it is what we use when quantizing networks for FINN.

@balditommaso
Author

Thanks a lot!

@JPPalacios

Hi @balditommaso, sorry to butt in, but do you mind sharing how you are streamlining short-cuts during FINN compilation? The ResNet example is a little vague about the transformation steps. Thanks for your help!

@Giuseppe5
Collaborator

I might suggest opening an issue directly on the FINN repo; @auphelia will be more than happy to help :)

If it is Brevitas-related, feel free to share more details.
