
enable_prefix_caching LMDeploy vs vLLM #3182

Open
radna0 opened this issue Feb 24, 2025 · 2 comments
radna0 commented Feb 24, 2025

I'm currently testing LMDeploy, and it seems that enable_prefix_caching isn't zero-overhead or as performant as vLLM's. With vLLM, when enable_prefix_caching is enabled, generating the same prompts again is almost free.

Is there an example of this for the TurboMind engine that I might be missing, given the same prompts? I know the TurboMind engine doesn't support caching prefix/system prompts, but does this also not work for user prompts?

[Two screenshots: vLLM V1 benchmark results]
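For reference, here is a minimal sketch of how I'm toggling the flag in both engines. The model path, prompt, and timing loop are placeholders for illustration; in practice each engine is benchmarked in its own process so they don't contend for GPU memory:

```python
import time

from lmdeploy import pipeline, TurbomindEngineConfig
from vllm import LLM, SamplingParams

# One long shared prompt, sent twice so the second ("warm") run can
# reuse cached prefix blocks from the first ("cold") run.
PROMPT = "You are a helpful assistant. " * 200 + "Explain prefix caching briefly."

# --- LMDeploy / TurboMind: prefix caching is an engine-config flag ---
pipe = pipeline(
    "internlm/internlm2_5-7b-chat",  # placeholder model
    backend_config=TurbomindEngineConfig(enable_prefix_caching=True),
)
for run in ("cold", "warm"):
    start = time.perf_counter()
    pipe([PROMPT])
    print(f"LMDeploy {run} run: {time.perf_counter() - start:.2f}s")

# --- vLLM: prefix caching is a constructor flag ---
llm = LLM(model="internlm/internlm2_5-7b-chat", enable_prefix_caching=True)
params = SamplingParams(max_tokens=64)
for run in ("cold", "warm"):
    start = time.perf_counter()
    llm.generate([PROMPT], params)
    print(f"vLLM {run} run: {time.perf_counter() - start:.2f}s")
```

With vLLM the warm run is nearly instant, while with LMDeploy I don't see a comparable speedup.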

lvhan028 self-assigned this Feb 26, 2025
lvhan028 (Collaborator) commented
We'll optimize prefix caching next month.

radna0 (Author) commented Feb 26, 2025

Thanks @lvhan028. What's the estimated development time for the optimization?
