The model's max seq len (4096) is larger than the maximum number of tokens that can be stored in KV cache (1792). Try increasing `gpu_memory_utilization` or decreasing `max_model_len` when initializing the engine
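As a minimal sketch of the adjustment the error message suggests, assuming a placeholder model name and illustrative values, the two settings can be passed directly to the vLLM engine constructor:

```python
# Minimal sketch: raise gpu_memory_utilization and/or lower max_model_len
# so the KV cache can hold at least max_model_len tokens.
# The model name and the exact values are placeholders, not from the original report.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-2-7b-hf",  # placeholder; use your actual checkpoint
    gpu_memory_utilization=0.95,       # raise from the 0.9 default to give the KV cache more room
    max_model_len=2048,                # or cap the context below the model's default of 4096
)

sampling_params = SamplingParams(temperature=0.0, max_tokens=256)
outputs = llm.generate(["Hello, world!"], sampling_params)
print(outputs[0].outputs[0].text)
```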
Hi, thanks for running our code. It looks like you are encountering an issue with vLLM. You could refer to vllm-project/vllm#2418 and try the solution mentioned there. Since vLLM's behavior may depend on your CUDA and torch versions, I cannot determine the exact solution for your case. If you still run into issues with vLLM, you may switch to Hugging Face inference instead.
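For reference, a minimal sketch of the Hugging Face inference fallback, assuming a placeholder checkpoint name and generic generation settings (not taken from this repository's scripts):

```python
# Minimal sketch: plain Hugging Face transformers generation as a fallback to vLLM.
# Model name, dtype, and generation arguments are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder; use the same checkpoint as before
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)

prompt = "Hello, world!"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```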