Skip to content

[Bug]: When using lora and setting num-scheduler-steps simultaneously, the output does not meet expectations. #11086

Closed
@luoling1993

Description

@luoling1993

Your current environment

The output of `python collect_env.py`
Your output of `python collect_env.py` here

Model Input Dumps

No response

🐛 Describe the bug

VLLM version: 0.6.4.post1
I have trained a LoRA model based on Qwen2.5-7B-Instruct, and I have started the vllm service using pm2 with the following configuration:

apps:
  - name: "vllm"
    script: "/home/lucas/envs/nlp-vllm/bin/python"
    args:
      - "-m"
      - "vllm.entrypoints.openai.api_server"
      - "--port=18101"  # 端口设置

      # # Meta-Llama-3.1-8B-Instruct
      # - "--served-model-name=Meta-Llama-3.1-8B-Instruct"
      # - "--model=/data/llms/Meta-Llama-3.1-8B-Instruct"
      # - "--tokenizer=/data/llms/Meta-Llama-3.1-8B-Instruct"

      # qwen
      - "--served-model-name=Qwen2.5-7B-Instruct"
      - "--model=/data/llms/Qwen2.5-7B-Instruct"
      - "--tokenizer=/data/llms/Qwen2.5-7B-Instruct"

      # - "--max-model-len=8192"  # 最大模型长度
      - "--max-model-len=4096"
      - "--gpu-memory-utilization=0.9"  # GPU内存利用率

      # speedup
      # - "--enable-chunked-prefill"  # NOTE: LoRA is not supported with chunked prefill yet
      - "--enable-prefix-caching"
      # - "--num-scheduler-steps=8" # NOTE: LoRA, will always use base model(BUG)

      - "--enable-lora"
      - "--max-lora-rank=64"
      - "--lora-modules"
      # - '{"name": "nl2filter", "path": "/home/lucas/workspace/github_project/LLaMA-Factory/saves/Meta-Llama-3.1-8B-Instruct/lora/nl2filter-all", "base_model_name": "Meta-Llama-3.1-8B-Instruct"}'
      - '{"name": "nl2filter", "path": "/home/lucas/workspace/github_project/LLaMA-Factory/saves/Qwen2.5-7B-Instruct/lora/nl2filter-all", "base_model_name": "Qwen2.5-7B-Instruct"}'
   
    env:
      CUDA_VISIBLE_DEVICES: "0"

    log_date_format: "YYYY-MM-DD HH:mm:ss"
    error_file: "/home/lucas/workspace/pm2_logs/error.log"
    out_file: "/home/lucas/workspace/pm2_logs/out.log"

When calling, I use model_name=nl2filter. Everything works fine when the num-scheduler-steps parameter is not set. However, when setting --num-scheduler-steps=8, the service starts up normally and the call also returns, but the result returned is not from the LoRA model. Instead, it is from the base model, which is Qwen2.5-7B-Instruct without any LoRA modifications. There are no errors or warnings.

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions