
how to use continuous kv cache with prefix prompt caching in gpt attention plugin in context phase? #2593

Open
FPTMMC opened this issue Dec 19, 2024 · 2 comments

FPTMMC commented Dec 19, 2024

In this scenario, I want to use a system prompt cache together with the continuous KV cache in the context phase.

I have 14 precomputed key-value pairs stored in the continuous KV cache, whose shape is [batch_size, 2, num_kv_heads, max_seq_len, head_size]. When I first compute attention with the gpt-attention plugin, I pass the KV cache as the past_key_value parameter and set context_lengths, sequence_length, and host_past_key_value_lengths to 14.
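Here is a minimal sketch of the setup described above. The tensor names mirror the plugin inputs mentioned in this report; the sizes, dtypes, and device placement are hypothetical, for illustration only, and this is not an exact TensorRT-LLM call:

```python
import torch

# Hypothetical sizes for illustration only.
batch_size, num_kv_heads, max_seq_len, head_size = 1, 8, 2048, 128
prefix_len = 14  # number of precomputed key-value pairs

# Continuous KV cache: [batch_size, 2, num_kv_heads, max_seq_len, head_size].
past_key_value = torch.zeros(
    batch_size, 2, num_kv_heads, max_seq_len, head_size,
    dtype=torch.float16, device="cuda",
)
# Slots [0:prefix_len] along the sequence axis hold the precomputed
# prefix keys (index 0 on dim 1) and values (index 1 on dim 1).

# Length inputs as described above: all set to the prefix length.
context_lengths = torch.full((batch_size,), prefix_len, dtype=torch.int32, device="cuda")
sequence_length = torch.full((batch_size,), prefix_len, dtype=torch.int32, device="cuda")
host_past_key_value_lengths = torch.full((batch_size,), prefix_len, dtype=torch.int32)  # host-side tensor
```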

Since it's the context phase, I find that the precomputed values in the KV cache are not used during the attention computation: the attention results show that the gpt-attention plugin ignores them. In addition, the present KV values computed by gpt-attention are stored at kvcache[0:14] instead of [14:28].
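To make the observation concrete, here is the sequence-axis layout I expected versus what I observe, reusing the hypothetical past_key_value tensor from the sketch above (that the new context is also 14 tokens is an assumption implied by the [14:28] range):

```python
# Expected: the precomputed prefix stays in slots [0:14] and the KV
# computed for the new context tokens is appended at slots [14:28].
expected_prefix  = past_key_value[:, :, :, 0:14, :]   # precomputed prefix, reused
expected_present = past_key_value[:, :, :, 14:28, :]  # where the new KV should land

# Observed: the plugin writes the new KV into slots [0:14], overwriting
# the precomputed prefix, and computes attention without using it.
```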

So I wonder: how can I use a continuous KV cache with prefix prompt caching in the gpt attention plugin during the context phase?

@FPTMMC FPTMMC changed the title how to use continuous kv cache in gpt attention plugin in context phase? how to use continuous kv cache with prefix prompt caching in gpt attention plugin in context phase? Dec 19, 2024
FPTMMC (Author) commented Dec 19, 2024

@schetlur-nv @nv-guomingz

nv-guomingz (Collaborator) commented

@byshiue could you please take a look at this question?
