
enable_prefix_caching LMDeploy vs vLLM #3182

Open
radna0 opened this issue Feb 24, 2025 · 2 comments
radna0 commented Feb 24, 2025

I'm currently testing LMDeploy, and it seems that enable_prefix_caching isn't zero-overhead or as performant as vLLM's. With vLLM, when enable_prefix_caching is enabled, generating the same prompts again is almost free.

Is there an example of this for the TurboMind engine that I might be missing, given the same prompts? I know the TurboMind engine doesn't support caching prefix/system prompts, but does this also not work for user prompts?

[Two screenshots: vLLM V1 benchmark results]
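For reference, here is a minimal sketch of how I'm toggling the flag in both engines. The model path, prompt, and timing loop are placeholders for illustration; in practice each engine is benchmarked in its own process so they don't contend for GPU memory:

```python
import time

from lmdeploy import pipeline, TurbomindEngineConfig
from vllm import LLM, SamplingParams

# One long shared prompt, sent twice so the second ("warm") run can
# reuse cached prefix blocks from the first ("cold") run.
PROMPT = "You are a helpful assistant. " * 200 + "Explain prefix caching briefly."

# --- LMDeploy / TurboMind: prefix caching is an engine-config flag ---
pipe = pipeline(
    "internlm/internlm2_5-7b-chat",  # placeholder model
    backend_config=TurbomindEngineConfig(enable_prefix_caching=True),
)
for run in ("cold", "warm"):
    start = time.perf_counter()
    pipe([PROMPT])
    print(f"LMDeploy {run} run: {time.perf_counter() - start:.2f}s")

# --- vLLM: prefix caching is a constructor flag ---
llm = LLM(model="internlm/internlm2_5-7b-chat", enable_prefix_caching=True)
params = SamplingParams(max_tokens=64)
for run in ("cold", "warm"):
    start = time.perf_counter()
    llm.generate([PROMPT], params)
    print(f"vLLM {run} run: {time.perf_counter() - start:.2f}s")
```

With vLLM the warm run is nearly instant, while with LMDeploy I don't see a comparable speedup.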

lvhan028 self-assigned this Feb 26, 2025
lvhan028 (Collaborator) commented
We'll optimize prefix caching next month.

radna0 (Author) commented Feb 26, 2025

Thanks @lvhan028. What's the estimated development time for the optimization?
