Currently, the key matrix is transposed first before being quantized. To simplify the streamlining and detection of the operator pattern in FINN, I would like to have the transpose directly in front of the MatMul operation - currently there is a Quant node (and later on a MultiThreshold) in between. Would it generally be OK to switch the order of these operations? Or is there more reasoning behind the current order that I do not see? Maybe it yields better quantization statistics (if at all, this should only apply to channel-/group-wise quantization)?
For more context on the effort of streamlining the Brevitas exported QuantMultiheadAttention, please see Xilinx/finn#878
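To illustrate the question, here is a minimal NumPy sketch (made-up shapes and a simple per-tensor fake-quant, not the actual Brevitas export) showing that for per-tensor quantization the two orders are numerically equivalent, so any difference could only come from the channel-/group-wise case:

```python
import numpy as np

def fake_quant(x, scale, zero_point=0.0):
    """Uniform affine quantize-dequantize (per-tensor)."""
    return (np.round(x / scale + zero_point) - zero_point) * scale

rng = np.random.default_rng(0)
q = rng.standard_normal((8, 16))   # (seq_len, head_dim), made-up shapes
k = rng.standard_normal((8, 16))
scale = 0.05

# current export order: transpose the key, then quantize
attn_current = q @ fake_quant(k.T, scale)

# proposed order: quantize first, transpose directly in front of the MatMul
attn_proposed = q @ fake_quant(k, scale).T

# for per-tensor quantization both orders give the same result
assert np.allclose(attn_current, attn_proposed)
```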
The idea of having the quantization just before the matmul is to avoid dealing with the transposition of scale factors and zero points, especially in the case of per-channel/per-group quantization.
Although QuantTensor supports transpose even for its quantization metadata, I would first need to check that it is robust enough in these cases, so that we do not have to worry too much about transposing after quantization with different types of quantization.
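A minimal sketch of the bookkeeping this refers to, assuming a per-channel scale along the last axis (made-up shapes, not the actual QuantTensor implementation): if the transpose happens after quantization, the scale and zero point have to be transposed along with the values for broadcasting to stay correct.

```python
import numpy as np

def fake_quant(x, scale, zero_point=0.0):
    # scale/zero_point broadcast against x, so their layout must follow x's layout
    return (np.round(x / scale + zero_point) - zero_point) * scale

rng = np.random.default_rng(0)
k = rng.standard_normal((8, 16))       # (seq_len, head_dim), made-up shapes
scale = np.full((1, 16), 0.05)         # per-channel scale along the last axis

# quantize, then transpose: the scale layout matches k, no extra bookkeeping
k_qt = fake_quant(k, scale).T

# transpose, then quantize: scale (and zero point) must be transposed as well,
# otherwise broadcasting applies the wrong scale to each channel
k_tq = fake_quant(k.T, scale.T)

assert np.allclose(k_qt, k_tq)
```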
See here for the condition on location of the transpose operation I currently use for detecting the pattern: https://github.com/iksnagreb/attention-dummy/blob/infer-op/infer.py#L124
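For reference, a structural condition of this kind could be sketched on a plain ONNX graph roughly as follows (a simplified stand-in for illustration only, not the actual check linked above, which matches a larger pattern):

```python
import onnx

def matmul_has_direct_transpose_input(model: onnx.ModelProto,
                                      matmul: onnx.NodeProto) -> bool:
    # map each tensor name to the node that produces it
    producers = {out: node for node in model.graph.node for out in node.output}
    # true only if some input of the MatMul is fed directly by a Transpose node,
    # i.e. without a Quant/MultiThreshold node in between
    return any(
        inp in producers and producers[inp].op_type == "Transpose"
        for inp in matmul.input
    )
```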