PengWenChen changed the title from "What happens to the total KV length > max-compacity length during generateion?" to "What happens to the total KV length > max-compacity length during response generation?" on Oct 23, 2024
Thanks for the question. Our method mainly focuses on long-context scenarios, where the input is usually much longer than the output, and it improves generation speed in that setting. We didn't consider compression during the generation stage. I believe other work, such as H2O, also compresses during generation.
Hi, thanks for your great work!
It's impressive to compress the long prompt KVs into a constant length.
I'm wondering whether this scenario also considers the case where the generated response grows beyond the maximum capacity?
From the code, it looks like line 127 is reached only during the prefill stage, and during the generation stage execution always goes to line 131.
Is my understanding correct?
https://github.com/FasterDecoding/SnapKV/blob/main/snapkv/monkeypatch/mistral_hijack_4_37.py#L127-L133
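To make the question concrete, here is a toy sketch of the behaviour I am describing. It is not the SnapKV code: `ToyKVCache`, `compress`, and `simulate` are hypothetical names, and `compress` simply keeps the most recent entries as a stand-in for the real selection logic. It only assumes that compression happens once, on the prefill call, and that every later call appends.

```python
def compress(entries, max_capacity):
    # Placeholder policy (an assumption): keep only the last `max_capacity` entries.
    return entries[-max_capacity:]


class ToyKVCache:
    """Toy cache: compress once at prefill, then append during generation."""

    def __init__(self, max_capacity):
        self.max_capacity = max_capacity
        self.entries = []

    def update(self, new_entries):
        if not self.entries:
            # Prefill: the whole prompt arrives in one call and is compressed.
            self.entries = compress(list(new_entries), self.max_capacity)
        else:
            # Generation: new entries are appended without any compression.
            self.entries.extend(new_entries)
        return len(self.entries)


def simulate(prompt_len, new_tokens, max_capacity):
    """Return the cache length after prefill and after each generated token."""
    cache = ToyKVCache(max_capacity)
    lengths = [cache.update(range(prompt_len))]
    for step in range(new_tokens):
        lengths.append(cache.update([prompt_len + step]))
    return lengths
```

Under these assumptions, `simulate(10, 3, 4)` gives a cache length of 4 after prefill that then grows by one per generated token, so the total KV length exceeds the maximum capacity as soon as generation starts.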