
Group Query Attention #22

Open
SimJeg opened this issue Oct 3, 2024 · 4 comments

Comments

SimJeg commented Oct 3, 2024

Hello,

Could you clarify how you handle group query attention? For instance, in Mistral 7B there are 8 key-value heads and 32 query heads, so a given key-value pair is associated with 4 different queries and hence 4 different attention weights. How do you aggregate these 4 values? I do see the num_key_value_groups variable in the update_kv method, but it is not used.

Thanks!
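(To make the shape mismatch concrete, a minimal illustration; the batch size, sequence lengths, and head_dim = 128 below are assumed for exposition only:)

```python
import torch

batch, q_len, k_len, head_dim = 1, 16, 1024, 128
attn_weights = torch.rand(batch, 32, q_len, k_len)  # one attention map per query head
key_cache = torch.rand(batch, 8, k_len, head_dim)   # only 8 KV heads are cached
# 32 // 8 = 4 query heads share each cached KV head, so scoring a cache entry
# for compression means either combining the 4 maps or expanding the cache.
```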

WendyH1108 (Collaborator) commented

Thanks for the question. We keep the head dimension intact via key_states = repeat_kv(key_states, self.num_key_value_groups). In our update_kv, we also keep the full head dimension throughout the calculations.
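(For context, repeat_kv, as defined in HuggingFace Transformers, which SnapKV builds on, simply tiles each cached KV head across its query group rather than aggregating anything; a sketch with shapes annotated:)

```python
import torch

def repeat_kv(hidden_states: torch.Tensor, n_rep: int) -> torch.Tensor:
    # Expand (batch, num_kv_heads, seq_len, head_dim) to the MHA layout
    # (batch, num_kv_heads * n_rep, seq_len, head_dim); equivalent to
    # torch.repeat_interleave(hidden_states, n_rep, dim=1).
    batch, num_kv_heads, seq_len, head_dim = hidden_states.shape
    if n_rep == 1:
        return hidden_states
    hidden_states = hidden_states[:, :, None, :, :].expand(
        batch, num_kv_heads, n_rep, seq_len, head_dim
    )
    return hidden_states.reshape(batch, num_kv_heads * n_rep, seq_len, head_dim)
```

With Mistral 7B this means num_key_value_groups = 32 // 8 = 4: the 8 cached KV heads are expanded to 32, each query head scores its own copy, and no cross-group aggregation of attention weights is needed.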

SimJeg (Author) commented Oct 26, 2024 via email

FdyCN commented Nov 14, 2024

I have the same question. It seems there is no "avg" or other aggregation across the repeated KV heads to reduce back to GQA. Or does this just turn GQA into MHA by always repeating the KV heads?
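(For illustration, here is what such an averaging step could look like; this is a hypothetical sketch of the "avg" option, not what SnapKV does per the reply above: pool the per-query-head attention scores back down to one map per KV head before ranking cache entries.)

```python
import torch

def pool_attn_over_groups(attn_weights: torch.Tensor, num_kv_heads: int) -> torch.Tensor:
    # attn_weights: (batch, num_heads, q_len, k_len), num_heads = num_kv_heads * n_rep.
    # Average the n_rep query heads sharing each KV head, giving
    # (batch, num_kv_heads, q_len, k_len): one importance map per cached head.
    batch, num_heads, q_len, k_len = attn_weights.shape
    n_rep = num_heads // num_kv_heads
    return attn_weights.view(batch, num_kv_heads, n_rep, q_len, k_len).mean(dim=2)
```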

FFY0 commented Nov 15, 2024

Hello, I am the author of Ada-KV, a follow-up work to SnapKV. Recently, we integrated GQA support into both SnapKV and our Ada-KV. Experimental results show that, after enabling GQA with only 25% of the original cache size on Mistral-7B-Instruct-v0.2, both SnapKV and our Ada-KV continue to perform well, with only a slight quality drop. We have made our code and preliminary results on LongBench publicly available in our repository. We sincerely thank the SnapKV team for releasing their paper and code, which greatly advanced our research!
