Closed
Description
Your current environment
The output of `python collect_env.py`
Model Input Dumps
No response
🐛 Describe the bug
vLLM version: 0.6.4.post1
I trained a LoRA adapter based on Qwen2.5-7B-Instruct, and I start the vLLM service via pm2 with the following configuration:
apps:
  - name: "vllm"
    script: "/home/lucas/envs/nlp-vllm/bin/python"
    args:
      - "-m"
      - "vllm.entrypoints.openai.api_server"
      - "--port=18101" # port setting
      # # Meta-Llama-3.1-8B-Instruct
      # - "--served-model-name=Meta-Llama-3.1-8B-Instruct"
      # - "--model=/data/llms/Meta-Llama-3.1-8B-Instruct"
      # - "--tokenizer=/data/llms/Meta-Llama-3.1-8B-Instruct"
      # qwen
      - "--served-model-name=Qwen2.5-7B-Instruct"
      - "--model=/data/llms/Qwen2.5-7B-Instruct"
      - "--tokenizer=/data/llms/Qwen2.5-7B-Instruct"
      # - "--max-model-len=8192" # max model length
      - "--max-model-len=4096"
      - "--gpu-memory-utilization=0.9" # GPU memory utilization
      # speedup
      # - "--enable-chunked-prefill" # NOTE: LoRA is not supported with chunked prefill yet
      - "--enable-prefix-caching"
      # - "--num-scheduler-steps=8" # NOTE: with LoRA, always uses the base model (BUG)
      - "--enable-lora"
      - "--max-lora-rank=64"
      - "--lora-modules"
      # - '{"name": "nl2filter", "path": "/home/lucas/workspace/github_project/LLaMA-Factory/saves/Meta-Llama-3.1-8B-Instruct/lora/nl2filter-all", "base_model_name": "Meta-Llama-3.1-8B-Instruct"}'
      - '{"name": "nl2filter", "path": "/home/lucas/workspace/github_project/LLaMA-Factory/saves/Qwen2.5-7B-Instruct/lora/nl2filter-all", "base_model_name": "Qwen2.5-7B-Instruct"}'
    env:
      CUDA_VISIBLE_DEVICES: "0"
    log_date_format: "YYYY-MM-DD HH:mm:ss"
    error_file: "/home/lucas/workspace/pm2_logs/error.log"
    out_file: "/home/lucas/workspace/pm2_logs/out.log"
When calling the service, I set model_name=nl2filter. Everything works fine as long as --num-scheduler-steps is not set. However, with --num-scheduler-steps=8 the service starts up normally and requests return successfully, but the responses come from the plain base model (Qwen2.5-7B-Instruct with no LoRA applied) instead of the LoRA adapter. There are no errors or warnings.
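For reference, the request that reproduces this is roughly the following sketch. It builds the JSON body for the OpenAI-compatible `/v1/chat/completions` endpoint on port 18101; the prompt text and the `build_chat_request` helper are illustrative, not my actual workload. The LoRA adapter is selected solely by the `model` field:

```python
import json

def build_chat_request(model_name: str, user_message: str) -> dict:
    """Build the JSON body for POST http://localhost:18101/v1/chat/completions.

    Passing model_name="nl2filter" should route the request to the LoRA
    adapter registered via --lora-modules; the served base-model name
    ("Qwen2.5-7B-Instruct") routes to the unmodified base model.
    """
    return {
        "model": model_name,
        "messages": [{"role": "user", "content": user_message}],
        "temperature": 0,
    }

# Illustrative prompt; with --num-scheduler-steps=8 this request still
# returns a base-model answer even though "model" names the adapter.
payload = build_chat_request("nl2filter", "Convert this query into a filter expression.")
print(json.dumps(payload, indent=2))
```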
Before submitting a new issue...
- Make sure you have already searched for relevant issues and asked the chatbot at the bottom right corner of the documentation page, which can answer many frequently asked questions.