[Bug] Qwen2.5-VL inference fails on NPU #3237

Open
3 tasks done
tcye opened this issue Mar 10, 2025 · 0 comments
tcye commented Mar 10, 2025

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.

Describe the bug

On NPU, Qwen2.5-VL inference fails with an error. Are there still unsupported operators? The dlinfer documentation, however, indicates that Qwen2.5-VL is supported.

Reproduction

lmdeploy serve api_server /mnt/cephfs/tiancaiye/r1_model/Qwen2.5-VL-72B-Reasoning/ --tp 8 --server-port 8080 --device ascend --backend pytorch --eager-mode --dtype bfloat16

Then chat with the model via the OpenAI client.
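For completeness, the follow-up request can be sketched as below. This is a minimal illustration only: it assumes the server launched above is reachable on localhost:8080, and the image URL is a placeholder (the original report does not include the actual request). The model name is the path passed to `lmdeploy serve api_server`.

```python
import json

# Hypothetical multimodal chat-completion payload in the OpenAI format.
# Any image in the request routes through the vision tower, which is
# where the crash below occurs.
payload = {
    "model": "/mnt/cephfs/tiancaiye/r1_model/Qwen2.5-VL-72B-Reasoning/",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {
                    "type": "image_url",
                    # Placeholder image URL, not from the original report.
                    "image_url": {"url": "https://example.com/demo.jpg"},
                },
            ],
        }
    ],
}

# Posting this JSON to http://localhost:8080/v1/chat/completions with any
# OpenAI-compatible client reproduces the error.
body = json.dumps(payload)
```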

Environment

sys.platform: linux
Python: 3.10.15 (main, Nov 27 2024, 06:37:16) [GCC 11.4.0]
CUDA available: False
MUSA available: False
numpy_random_seed: 2147483648
GCC: gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
PyTorch: 2.3.1+cpu
PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v3.3.6 (Git Hash 86e6af5974177e513fd3fee58425e1063e7f1361)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX512
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=2.3.1, USE_CUDA=0, USE_CUDNN=OFF, USE_CUSPARSELT=OFF, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF, 

TorchVision: 0.18.1+cpu
LMDeploy: 0.7.1+
transformers: 4.49.0
gradio: 5.20.0
fastapi: 0.115.11
pydantic: 2.10.6
triton: Not Found

Error traceback

2025-03-10 08:45:28,436 - lmdeploy - ERROR - model_agent.py:391 - Task <ModelAgentLoop> failed
Traceback (most recent call last):
  File "/opt/lmdeploy/lmdeploy/pytorch/engine/model_agent.py", line 386, in _on_finish_callback
    task.result()
  File "/opt/lmdeploy/lmdeploy/pytorch/engine/model_agent.py", line 374, in _async_loop_background
    await self._async_step_background(
  File "/opt/lmdeploy/lmdeploy/pytorch/engine/model_agent.py", line 322, in _async_step_background
    output = await self._async_model_forward(inputs,
  File "/opt/lmdeploy/lmdeploy/pytorch/engine/model_agent.py", line 243, in _async_model_forward
    ret = await __forward(inputs)
  File "/opt/lmdeploy/lmdeploy/pytorch/engine/model_agent.py", line 220, in __forward
    return await self.async_forward(inputs, swap_in_map=swap_in_map, swap_out_map=swap_out_map)
  File "/opt/lmdeploy/lmdeploy/pytorch/engine/model_agent.py", line 538, in async_forward
    output = self._forward_impl(inputs, swap_in_map=swap_in_map, swap_out_map=swap_out_map)
  File "/opt/lmdeploy/lmdeploy/pytorch/engine/model_agent.py", line 521, in _forward_impl
    output = model_forward(
  File "/usr/local/python3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/opt/lmdeploy/lmdeploy/pytorch/engine/model_agent.py", line 75, in model_forward
    output = model(**input_dict)
  File "/opt/lmdeploy/lmdeploy/pytorch/backends/graph_runner.py", line 24, in __call__
    return self.model(**kwargs)
  File "/usr/local/python3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/python3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/lmdeploy/lmdeploy/pytorch/models/qwen2_5_vl.py", line 432, in forward
    image_embeds = self.visual(pixel_values,
  File "/usr/local/python3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/python3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/lmdeploy/lmdeploy/pytorch/models/qwen2_5_vl.py", line 359, in forward
    hidden_states = blk(hidden_states, cu_seqlens=cu_seqlens_now, rotary_pos_emb=rotary_pos_emb)
  File "/usr/local/python3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/python3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/lmdeploy/lmdeploy/pytorch/models/qwen2_5_vl.py", line 195, in forward
    hidden_states = hidden_states + self.attn(
  File "/usr/local/python3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/python3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/lmdeploy/lmdeploy/pytorch/models/qwen2_5_vl.py", line 136, in forward
    attn_output = self.proj(attn_output)
  File "/usr/local/python3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/python3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/lmdeploy/lmdeploy/pytorch/nn/linear.py", line 1094, in forward
    return self.impl.forward(x, self.weight, self.bias, all_reduce)
  File "/opt/lmdeploy/lmdeploy/pytorch/backends/dlinfer/linear.py", line 23, in forward
    return linear(x, weight, bias, all_reduce)
  File "/opt/lmdeploy/lmdeploy/pytorch/kernels/dlinfer/linear.py", line 9, in linear
    return ext_ops.linear(x, weight, bias=bias, all_reduce=all_reduce)
  File "/opt/dlinfer/dlinfer/graph/custom_op.py", line 70, in patched_func
    return func_with_default(*args, **kwargs)
  File "/opt/dlinfer/dlinfer/ops/llm.py", line 619, in linear
    return vendor_ops_registry["linear"](x, weight, bias, all_reduce)
  File "/opt/dlinfer/dlinfer/vendor/ascend/torch_npu_ops.py", line 496, in linear
    out = torch.ops.npu.npu_mm_all_reduce_base(
  File "/usr/local/python3.10/lib/python3.10/site-packages/torch/_ops.py", line 854, in __call__
    return self_._op(*args, **(kwargs or {}))
RuntimeError: call aclnnMatmulAllReduce failed, detail:EZ1001: [PID: 13141] 2025-03-10-08:45:28.432.661 x1 not implemented for DT_FLOAT, should be in dtype support list [DT_FLOAT16,DT_BFLOAT16,].
        TraceBack (most recent call last):
        Cannot find bin of op FlashAttentionScore, integral key 0/1/|float/ND/float/ND/bf16/ND/float/ND/float/ND/float/ND/float/ND/.
        Cannot find binary for op FlashAttentionScore.
        Kernel GetWorkspace failed. opType: 39
        x1 not implemented for DT_FLOAT, should be in dtype support list [DT_FLOAT16,DT_BFLOAT16,].
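Reading the traceback, a float32 (`DT_FLOAT`) tensor reaches `npu_mm_all_reduce_base` in the vision tower's attention projection, while the Ascend kernel only accepts float16/bfloat16 inputs. The following is a pure-Python sketch of that dtype guard, hypothetical and for illustration only (the function name and structure are not from the actual kernel), showing why `bfloat16` activations pass while a float32 activation produced somewhere upstream fails:

```python
# Hypothetical sketch of the input validation that aclnnMatmulAllReduce
# performs on x1, mirroring the error message in the traceback above.
SUPPORTED_DTYPES = ("float16", "bfloat16")

def mm_all_reduce_dtype_check(x1_dtype: str) -> str:
    """Accept x1 only if its dtype is in the kernel's support list."""
    if x1_dtype not in SUPPORTED_DTYPES:
        raise RuntimeError(
            f"x1 not implemented for {x1_dtype}, should be in dtype "
            f"support list {list(SUPPORTED_DTYPES)}."
        )
    return x1_dtype

# bfloat16 matches the --dtype bfloat16 flag used at launch and passes;
# a float32 activation reaching the op raises, as seen in the traceback.
checked = mm_all_reduce_dtype_check("bfloat16")
```

If this reading is right, the fix would be to ensure the visual block's activations are cast to the serving dtype before the tensor-parallel projection, rather than a missing operator.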