
QuantMultiheadAttention: Transpose keys after quantizer? #756

Open
iksnagreb opened this issue Nov 14, 2023 · 1 comment

Comments


iksnagreb commented Nov 14, 2023

Currently, the key matrix is transposed first, before being quantized. To simplify the streamlining and the detection of the operator pattern in FINN, I would like to have the transpose directly in front of the MatMul operation - currently there is a Quant node (and later on a MultiThreshold) in between. Would it be generally ok to switch the order of these operations? Or is there more reasoning behind the current order which I do not see? Maybe it yields better quantization statistics (if at all, that should only matter for channel- or group-wise quantization)?
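
To make the question concrete, here is a minimal sketch in plain PyTorch (not Brevitas internals; the fake-quantize call just stands in for the exported Quant node, and the shapes and quantization parameters are made up) comparing the two orderings for the key path:

```python
import torch

torch.manual_seed(0)
q = torch.randn(8, 16)   # (tokens, embed_dim), illustrative shapes only
k = torch.randn(8, 16)

scale, zero_point = 0.05, 0  # made-up per-tensor quantization parameters

def quant(x):
    # Per-tensor fake quantization stands in for the exported Quant node.
    return torch.fake_quantize_per_tensor_affine(x, scale, zero_point, -128, 127)

# Current export order: transpose the keys, then quantize, then MatMul.
attn_a = q @ quant(k.transpose(0, 1))

# Proposed order: quantize the keys, then transpose directly before the MatMul.
attn_b = q @ quant(k).transpose(0, 1)

# For per-tensor quantization the two orderings are numerically identical;
# the open question is whether the same holds once per-channel or per-group
# scales are involved.
assert torch.allclose(attn_a, attn_b)
```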

For more context on the effort of streamlining the Brevitas-exported QuantMultiheadAttention, please see Xilinx/finn#878.

See here for the condition on the location of the transpose operation which I currently use to detect the pattern: https://github.com/iksnagreb/attention-dummy/blob/infer-op/infer.py#L124

@Giuseppe5 (Collaborator) commented

The idea of having the quantization just before the matmul is to avoid dealing with the transposition of scale factors and zero points, especially in the case of per-channel/per-group quantization.

Although QuantTensor supports transposing even the quantization metadata, I would first need to check that it is robust enough in these cases, so that we do not have to worry too much about transposing after quantization with the different types of quantization.
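
To illustrate the bookkeeping involved, here is a rough sketch in plain PyTorch (fake-quantize ops standing in for QuantTensor, with made-up shapes and scales): with per-channel scales, moving the transpose after the quantizer only works if the quantization axis, and with it the scale/zero-point metadata, moves along with the tensor.

```python
import torch

torch.manual_seed(0)
k = torch.randn(8, 16)                   # (tokens, embed_dim), illustrative only
scale = torch.rand(16) * 0.1 + 0.01      # one made-up scale per embedding channel
zero_point = torch.zeros(16, dtype=torch.int32)

# Quantize along dim 1 (the embedding channels), then transpose ...
# args: (input, scale, zero_point, axis, quant_min, quant_max)
k_q_then_t = torch.fake_quantize_per_channel_affine(
    k, scale, zero_point, 1, -128, 127
).transpose(0, 1)

# ... which matches transposing first and then quantizing along dim 0 instead:
k_t_then_q = torch.fake_quantize_per_channel_affine(
    k.transpose(0, 1), scale, zero_point, 0, -128, 127
)

# The values agree, but only because the quantization axis was updated together
# with the transpose -- this is the metadata handling QuantTensor would need to
# get right for the reordering to be safe.
assert torch.allclose(k_q_then_t, k_t_then_q)
```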
