What happens to the total KV length > max-capacity length during response generation? #23

PengWenChen · 2024-10-23T02:00:42Z

Hi, thanks for your great work!

Compressing the KVs of a long prompt to a constant length is impressive.
I'm wondering whether the method also considers the case where the generated response grows beyond the maximum capacity?

As far as I can tell, execution reaches line 127 only during the prefill stage, and during the generation stage it always goes to line 131.
Is my understanding correct?
https://github.com/FasterDecoding/SnapKV/blob/main/snapkv/monkeypatch/mistral_hijack_4_37.py#L127-L133

WendyH1108 · 2024-10-26T04:36:04Z

Thanks for the question. Our method mainly focuses on long-context scenarios where the input is usually much longer than the output, and compressing the prompt KVs there benefits generation speed. We didn't consider compression during the generation stage. I believe other work, such as H2O, also compresses during generation.
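To make the behaviour discussed above concrete, here is a minimal sketch of how the cache length would evolve if compression runs once at prefill and every decoded token is then appended uncompressed. It is a toy model, not the SnapKV code: `simulate_cache_length` and its parameters are hypothetical names, and only the number of cached entries is tracked, not which entries are kept.

```python
def simulate_cache_length(prompt_len, max_capacity, new_tokens):
    """Toy model: return the number of cached KV entries after each step.

    Assumption (from the thread): the prompt is compressed once at prefill,
    and nothing is compressed during generation.
    """
    lengths = []

    # Prefill: all prompt keys arrive at once, so the compression branch runs
    # and the cache is capped at max_capacity entries.
    cache_len = min(prompt_len, max_capacity)
    lengths.append(cache_len)

    # Decode: one token per step, appended to the cache without compression.
    for _ in range(new_tokens):
        cache_len += 1
        lengths.append(cache_len)

    return lengths


lengths = simulate_cache_length(prompt_len=8000, max_capacity=1024, new_tokens=2000)
print(lengths[0], lengths[-1])  # 1024 3024
```

Under these assumptions the cache starts at the capped size and then grows by one entry per generated token, so a long enough response does exceed the maximum capacity.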

PengWenChen changed the title ~~What happens to the total KV length > max-capacity length during generateion?~~ What happens to the total KV length > max-capacity length during response generation? Oct 23, 2024
