In this scenario, I want to use a system prompt cache together with a continuous KV cache in the context phase.
I have 14 precomputed key-value pairs stored in the continuous KV cache, which has shape [batch_size, 2, num_kv_heads, max_seq_len, head_size]. When I first compute attention with the GPT attention plugin, I pass this cache as the past_key_value parameter and set context_lengths, sequence_length, and host_past_key_value_lengths to 14.
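To make the setup concrete, here is a minimal sketch of how the buffers described above could be laid out. This is not a complete TensorRT-LLM example; the sizes (batch_size, num_kv_heads, max_seq_len, head_size) are illustrative assumptions, and only the tensor shapes and the length values of 14 come from the description above.

```python
# Illustrative sketch only: builds the buffers described above with
# hypothetical sizes so the shapes and lengths are explicit.
import torch

batch_size   = 1      # assumed
num_kv_heads = 8      # assumed
max_seq_len  = 128    # assumed
head_size    = 64     # assumed
prefix_len   = 14     # number of precomputed system-prompt tokens

# Continuous KV cache: [batch_size, 2, num_kv_heads, max_seq_len, head_size].
# Slots [:, :, :, :prefix_len, :] are assumed to already hold the
# precomputed system-prompt key-value pairs.
past_key_value = torch.zeros(
    batch_size, 2, num_kv_heads, max_seq_len, head_size, dtype=torch.float16
)

# Length tensors passed to the GPT attention plugin, all set to 14 as described.
context_lengths = torch.full((batch_size,), prefix_len, dtype=torch.int32)
sequence_length = torch.full((batch_size,), prefix_len, dtype=torch.int32)
host_past_key_value_lengths = torch.full(
    (batch_size,), prefix_len, dtype=torch.int32
)

print(past_key_value.shape, context_lengths, sequence_length,
      host_past_key_value_lengths)
```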
However, in the context phase I find that the precomputed values in the KV cache are not used during the attention computation: the attention results show that the GPT attention plugin ignores them. In addition, the present KV values computed by the plugin are written to kv_cache[0:14] instead of kv_cache[14:28].
So I wonder: how can I use a continuous KV cache with prefix prompt caching in the GPT attention plugin during the context phase?