
[Bug]: When using lora and setting num-scheduler-steps simultaneously, the output does not meet expectations. #11086

Open
luoling1993 opened this issue Dec 11, 2024 · 2 comments
Labels
bug Something isn't working

Comments

luoling1993 commented Dec 11, 2024

Your current environment

(Output of `python collect_env.py` not provided.)

Model Input Dumps

No response

🐛 Describe the bug

vLLM version: 0.6.4.post1
I trained a LoRA adapter based on Qwen2.5-7B-Instruct, and I start the vLLM service using pm2 with the following configuration:

apps:
  - name: "vllm"
    script: "/home/lucas/envs/nlp-vllm/bin/python"
    args:
      - "-m"
      - "vllm.entrypoints.openai.api_server"
      - "--port=18101"  # 端口设置

      # # Meta-Llama-3.1-8B-Instruct
      # - "--served-model-name=Meta-Llama-3.1-8B-Instruct"
      # - "--model=/data/llms/Meta-Llama-3.1-8B-Instruct"
      # - "--tokenizer=/data/llms/Meta-Llama-3.1-8B-Instruct"

      # qwen
      - "--served-model-name=Qwen2.5-7B-Instruct"
      - "--model=/data/llms/Qwen2.5-7B-Instruct"
      - "--tokenizer=/data/llms/Qwen2.5-7B-Instruct"

      # - "--max-model-len=8192"  # 最大模型长度
      - "--max-model-len=4096"
      - "--gpu-memory-utilization=0.9"  # GPU内存利用率

      # speedup
      # - "--enable-chunked-prefill"  # NOTE: LoRA is not supported with chunked prefill yet
      - "--enable-prefix-caching"
      # - "--num-scheduler-steps=8" # NOTE: LoRA, will always use base model(BUG)

      - "--enable-lora"
      - "--max-lora-rank=64"
      - "--lora-modules"
      # - '{"name": "nl2filter", "path": "/home/lucas/workspace/github_project/LLaMA-Factory/saves/Meta-Llama-3.1-8B-Instruct/lora/nl2filter-all", "base_model_name": "Meta-Llama-3.1-8B-Instruct"}'
      - '{"name": "nl2filter", "path": "/home/lucas/workspace/github_project/LLaMA-Factory/saves/Qwen2.5-7B-Instruct/lora/nl2filter-all", "base_model_name": "Qwen2.5-7B-Instruct"}'
   
    env:
      CUDA_VISIBLE_DEVICES: "0"

    log_date_format: "YYYY-MM-DD HH:mm:ss"
    error_file: "/home/lucas/workspace/pm2_logs/error.log"
    out_file: "/home/lucas/workspace/pm2_logs/out.log"

When calling the service, I use model_name=nl2filter. Everything works fine when the num-scheduler-steps parameter is not set. However, with --num-scheduler-steps=8, the service starts up normally and the call returns, but the result is not from the LoRA model; it comes from the base model, i.e. Qwen2.5-7B-Instruct without any LoRA weights applied. There are no errors or warnings.
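
For reference, the request goes through the OpenAI-compatible endpoint; a minimal sketch of such a call is below (the prompt, host, and use of the official openai Python client are assumptions for illustration, not the exact client code used):

from openai import OpenAI

# Point the client at the vLLM OpenAI-compatible server configured above (port 18101).
client = OpenAI(base_url="http://localhost:18101/v1", api_key="EMPTY")

# Requesting model="nl2filter" should route to the LoRA adapter; with
# --num-scheduler-steps=8 the completion instead matches the base Qwen2.5-7B-Instruct output.
response = client.chat.completions.create(
    model="nl2filter",
    messages=[{"role": "user", "content": "example prompt"}],
)
print(response.choices[0].message.content)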

jeejeelee (Collaborator) commented Dec 11, 2024

# - "--enable-chunked-prefill"  # NOTE: LoRA is not supported with chunked prefill yet

This was just added; it should be included in the next release, see: #9057
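
(As a quick sanity check, the installed build can be confirmed with a one-liner; a generic sketch, assuming a standard pip install:)

import vllm

# The reporter is on 0.6.4.post1; per the comment above, LoRA with chunked prefill
# is only expected in a later release.
print(vllm.__version__)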

jeejeelee (Collaborator) commented Dec 11, 2024

# - "--num-scheduler-steps=8" # NOTE: LoRA, will always use base model(BUG)

I remember there was a bug here and there was a PR fixing it; see: #9689
