
Feature Request: Multiple prompts for prompt caching #10904

Closed
4 tasks done
firelex opened this issue Dec 19, 2024 · 1 comment
Labels
enhancement New feature or request stale

Comments


firelex commented Dec 19, 2024

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

For function calling, it would be extremely useful if the prompt cache could hold multiple prompts, so that different calls could reuse different cached prompts depending on which one is needed.

Motivation

This would massively improve performance for function calling. I'm on an M4 Max and trying to use speculative decoding for a quantized model, which doesn't work (unclear whether it should or not — one of the pages here said speculative decoding doesn't work for quantized models). But with a multi-prompt cache, I could still get significant performance out of my current setup.

Possible Implementation

I don't think this would be hard to implement; vLLM already supports it.
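To illustrate the idea, here is a minimal Python sketch of a multi-slot prompt cache: a mapping from a prompt to its saved KV state, with LRU eviction so a bounded number of prompts stay cached. All names (`PromptCache`, `get`, `put`) are hypothetical — this is a sketch of the concept, not llama.cpp's actual implementation.

```python
from collections import OrderedDict


class PromptCache:
    """Hypothetical multi-slot prompt cache: maps each prompt to its
    cached KV state, evicting the least-recently-used slot when full."""

    def __init__(self, max_slots=4):
        self.max_slots = max_slots
        self.slots = OrderedDict()  # prompt -> cached KV state

    def get(self, prompt):
        """Return the cached state for this prompt, or None on a miss."""
        state = self.slots.get(prompt)
        if state is not None:
            self.slots.move_to_end(prompt)  # mark slot as recently used
        return state

    def put(self, prompt, state):
        """Store (or refresh) the cached state for a prompt."""
        self.slots[prompt] = state
        self.slots.move_to_end(prompt)
        if len(self.slots) > self.max_slots:
            self.slots.popitem(last=False)  # evict the LRU slot
```

In a function-calling loop, each tool's system prompt would occupy its own slot, so switching between tools skips re-processing the shared prefix instead of invalidating a single global cache.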

@firelex firelex added the enhancement New feature or request label Dec 19, 2024
@github-actions github-actions bot added the stale label Jan 19, 2025
github-actions bot commented Feb 3, 2025

This issue was closed because it has been inactive for 14 days since being marked as stale.

@github-actions github-actions bot closed this as completed Feb 3, 2025