Checklist
1. I have searched related issues but cannot get the expected help.
2. The bug has not been fixed in the latest version.
3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.

Describe the bug

On Ascend NPU, Qwen2.5-VL inference fails with the error shown in the Error traceback section below. Are there operators that are still unsupported? The dlinfer documentation lists Qwen2.5-VL as already supported.

Reproduction

lmdeploy serve api_server /mnt/cephfs/tiancaiye/r1_model/Qwen2.5-VL-72B-Reasoning/ --tp 8 --server-port 8080 --device ascend --backend pytorch --eager-mode --dtype bfloat16

Then send a chat request to the server with the OpenAI client (a minimal sketch follows).
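As a rough illustration of the kind of request that triggers the failure, here is a minimal OpenAI-client sketch against the server started above. The base URL follows from --server-port 8080; the host, api_key value, image URL, and prompt are placeholders, not taken from the original report.

from openai import OpenAI

# Hypothetical client-side call; endpoint derived from the api_server command above.
client = OpenAI(api_key="none", base_url="http://127.0.0.1:8080/v1")
model_name = client.models.list().data[0].id  # name under which the model is served
response = client.chat.completions.create(
    model=model_name,
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://example.com/demo.jpg"}},  # placeholder image
            {"type": "text", "text": "Describe this image."},  # placeholder prompt
        ],
    }],
)
print(response.choices[0].message.content)

Any multimodal request of this shape routes pixel_values through the Qwen2.5-VL vision tower, which is where the traceback below originates.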
Error traceback

2025-03-10 08:45:28,436 - lmdeploy - ERROR - model_agent.py:391 - Task <ModelAgentLoop> failed
Traceback (most recent call last):
File "/opt/lmdeploy/lmdeploy/pytorch/engine/model_agent.py", line 386, in _on_finish_callback
task.result()
File "/opt/lmdeploy/lmdeploy/pytorch/engine/model_agent.py", line 374, in _async_loop_background
await self._async_step_background(
File "/opt/lmdeploy/lmdeploy/pytorch/engine/model_agent.py", line 322, in _async_step_background
output = await self._async_model_forward(inputs,
File "/opt/lmdeploy/lmdeploy/pytorch/engine/model_agent.py", line 243, in _async_model_forward
ret = await __forward(inputs)
File "/opt/lmdeploy/lmdeploy/pytorch/engine/model_agent.py", line 220, in __forward
return await self.async_forward(inputs, swap_in_map=swap_in_map, swap_out_map=swap_out_map)
File "/opt/lmdeploy/lmdeploy/pytorch/engine/model_agent.py", line 538, in async_forward
output = self._forward_impl(inputs, swap_in_map=swap_in_map, swap_out_map=swap_out_map)
File "/opt/lmdeploy/lmdeploy/pytorch/engine/model_agent.py", line 521, in _forward_impl
output = model_forward(
File "/usr/local/python3.10/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/opt/lmdeploy/lmdeploy/pytorch/engine/model_agent.py", line 75, in model_forward
output = model(**input_dict)
File "/opt/lmdeploy/lmdeploy/pytorch/backends/graph_runner.py", line 24, in __call__
return self.model(**kwargs)
File "/usr/local/python3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/python3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/lmdeploy/lmdeploy/pytorch/models/qwen2_5_vl.py", line 432, in forward
image_embeds = self.visual(pixel_values,
File "/usr/local/python3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/python3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/lmdeploy/lmdeploy/pytorch/models/qwen2_5_vl.py", line 359, in forward
hidden_states = blk(hidden_states, cu_seqlens=cu_seqlens_now, rotary_pos_emb=rotary_pos_emb)
File "/usr/local/python3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/python3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/lmdeploy/lmdeploy/pytorch/models/qwen2_5_vl.py", line 195, in forward
hidden_states = hidden_states + self.attn(
File "/usr/local/python3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/python3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/lmdeploy/lmdeploy/pytorch/models/qwen2_5_vl.py", line 136, in forward
attn_output = self.proj(attn_output)
File "/usr/local/python3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/python3.10/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/lmdeploy/lmdeploy/pytorch/nn/linear.py", line 1094, in forward
return self.impl.forward(x, self.weight, self.bias, all_reduce)
File "/opt/lmdeploy/lmdeploy/pytorch/backends/dlinfer/linear.py", line 23, in forward
return linear(x, weight, bias, all_reduce)
File "/opt/lmdeploy/lmdeploy/pytorch/kernels/dlinfer/linear.py", line 9, in linear
return ext_ops.linear(x, weight, bias=bias, all_reduce=all_reduce)
File "/opt/dlinfer/dlinfer/graph/custom_op.py", line 70, in patched_func
return func_with_default(*args, **kwargs)
File "/opt/dlinfer/dlinfer/ops/llm.py", line 619, in linear
return vendor_ops_registry["linear"](x, weight, bias, all_reduce)
File "/opt/dlinfer/dlinfer/vendor/ascend/torch_npu_ops.py", line 496, in linear
out = torch.ops.npu.npu_mm_all_reduce_base(
File "/usr/local/python3.10/lib/python3.10/site-packages/torch/_ops.py", line 854, in __call__
return self_._op(*args, **(kwargs or {}))
RuntimeError: call aclnnMatmulAllReduce failed, detail:EZ1001: [PID: 13141] 2025-03-10-08:45:28.432.661 x1 not implemented forDT_FLOAT, should bein dtype support list [DT_FLOAT16,DT_BFLOAT16,].
TraceBack (most recent call last):
Cannot find bin of op FlashAttentionScore, integral key 0/1/|float/ND/float/ND/bf16/ND/float/ND/float/ND/float/ND/float/ND/.
Cannot find binary for op FlashAttentionScore.
Kernel GetWorkspace failed. opType: 39
x1 not implemented forDT_FLOAT, should bein dtype support list [DT_FLOAT16,DT_BFLOAT16,].
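The final RuntimeError says a float32 (DT_FLOAT) input reached npu_mm_all_reduce_base, which only accepts float16/bfloat16. Purely as a hedged first check, not a confirmed diagnosis, one can verify which dtype the checkpoint itself declares; the path below is the one from the Reproduction command, and the check assumes a standard Hugging Face config.json.

from transformers import AutoConfig

# Assumption: the checkpoint directory contains a config.json with a torch_dtype field.
cfg = AutoConfig.from_pretrained(
    "/mnt/cephfs/tiancaiye/r1_model/Qwen2.5-VL-72B-Reasoning/")
print(cfg.torch_dtype)  # Qwen2.5-VL releases normally declare bfloat16 here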
Environment
sys.platform: linux
Python: 3.10.15 (main, Nov 27 2024, 06:37:16) [GCC 11.4.0]
CUDA available: False
MUSA available: False
numpy_random_seed: 2147483648
GCC: gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
PyTorch: 2.3.1+cpu
PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v3.3.6 (Git Hash 86e6af5974177e513fd3fee58425e1063e7f1361)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX512
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=2.3.1, USE_CUDA=0, USE_CUDNN=OFF, USE_CUSPARSELT=OFF, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF,
TorchVision: 0.18.1+cpu
LMDeploy: 0.7.1+
transformers: 4.49.0
gradio: 5.20.0
fastapi: 0.115.11
pydantic: 2.10.6
triton: Not Found
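For reference, this listing matches the output format of lmdeploy's built-in environment checker, so it can presumably be regenerated inside the container with:

lmdeploy check_env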