
Group Query Attention #22

Open
SimJeg opened this issue Oct 3, 2024 · 4 comments

Comments

SimJeg commented Oct 3, 2024

Hello,

Could you clarify how you handle group query attention? For instance, in Mistral 7B there are 8 key-value heads and 32 query heads, so a given key-value pair is associated with 4 different queries and hence 4 different attention weights. How do you aggregate these 4 values? I do see the num_key_value_groups variable in the update_kv method, but it is not used.

Thanks!
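(To make the shape mismatch concrete, a minimal illustration; the batch size, sequence lengths, and head_dim = 128 below are assumed for exposition only:)

```python
import torch

batch, q_len, k_len, head_dim = 1, 16, 1024, 128
attn_weights = torch.rand(batch, 32, q_len, k_len)  # one attention map per query head
key_cache = torch.rand(batch, 8, k_len, head_dim)   # only 8 KV heads are cached
# 32 // 8 = 4 query heads share each cached KV head, so scoring a cache entry
# for compression means either combining the 4 maps or expanding the cache.
```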

WendyH1108 (Collaborator) commented

Thanks for the question. We keep the head dimension intact via key_states = repeat_kv(key_states, self.num_key_value_groups). In our update_kv, we also keep the full head dimension throughout the calculations.
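(For context, repeat_kv, as defined in HuggingFace Transformers, which SnapKV builds on, simply tiles each cached KV head across its query group rather than aggregating anything; a sketch with shapes annotated:)

```python
import torch

def repeat_kv(hidden_states: torch.Tensor, n_rep: int) -> torch.Tensor:
    # Expand (batch, num_kv_heads, seq_len, head_dim) to the MHA layout
    # (batch, num_kv_heads * n_rep, seq_len, head_dim); equivalent to
    # torch.repeat_interleave(hidden_states, n_rep, dim=1).
    batch, num_kv_heads, seq_len, head_dim = hidden_states.shape
    if n_rep == 1:
        return hidden_states
    hidden_states = hidden_states[:, :, None, :, :].expand(
        batch, num_kv_heads, n_rep, seq_len, head_dim
    )
    return hidden_states.reshape(batch, num_kv_heads * n_rep, seq_len, head_dim)
```

With Mistral 7B this means num_key_value_groups = 32 // 8 = 4: the 8 cached KV heads are expanded to 32, each query head scores its own copy, and no cross-group aggregation of attention weights is needed.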

SimJeg (Author) commented Oct 26, 2024 via email

FdyCN commented Nov 14, 2024

I have the same question. It seems there is no "avg" or other aggregation across the repeated KV heads to reduce back to GQA. Or does this just turn GQA into MHA by always repeating the KV heads?
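(For illustration, here is what such an averaging step could look like; this is a hypothetical sketch of the "avg" option, not what SnapKV does per the reply above: pool the per-query-head attention scores back down to one map per KV head before ranking cache entries.)

```python
import torch

def pool_attn_over_groups(attn_weights: torch.Tensor, num_kv_heads: int) -> torch.Tensor:
    # attn_weights: (batch, num_heads, q_len, k_len), num_heads = num_kv_heads * n_rep.
    # Average the n_rep query heads sharing each KV head, giving
    # (batch, num_kv_heads, q_len, k_len): one importance map per cached head.
    batch, num_heads, q_len, k_len = attn_weights.shape
    n_rep = num_heads // num_kv_heads
    return attn_weights.view(batch, num_kv_heads, n_rep, q_len, k_len).mean(dim=2)
```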

FFY0 commented Nov 15, 2024

Hello, I am the author of Ada-KV, a follow-up work to SnapKV. Recently, we integrated GQA support into both SnapKV and our Ada-KV. Experimental results show that, after enabling GQA with only 25% of the original cache size on Mistral-7B-Instruct-v0.2, both SnapKV and our Ada-KV continue to perform well, with only a slight quality drop. We have made our code and preliminary results on LongBench publicly available in our repository. We sincerely thank the SnapKV team for releasing their paper and code, which greatly advanced our research!
