[Bug] After quantization on 910B, the deployed service cannot be accessed normally #3201

Open
3 tasks done
jxz542189 opened this issue Mar 3, 2025 · 1 comment
@jxz542189

Checklist

  • 1. I have searched related issues but cannot get the expected help.
  • 2. The bug has not been fixed in the latest version.
  • 3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.

Describe the bug

Current environment: 910B machine (x86_64)

Running AWQ quantization (lmdeploy lite auto_awq DeepSeek-R1-Distill-Qwen-7B --work-dir DeepSeek-R1-Distill-Qwen-7B-AWQ-0301-v1.0 --device npu) succeeds, and the quantized model deploys normally (lmdeploy serve api_server DeepSeek-R1-Distill-Qwen-7B-AWQ-0301-v1.0 --backend pytorch --device ascend --model-format awq --server-port 8005 --model-name deepseek-r1-distill-qwen-7B-awq --session-len 16000), but requests to the service fail:

2025-03-03 03:37:07,235 - lmdeploy - ERROR - engine.py:912 - Task failed
Traceback (most recent call last):
File "/opt/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 907, in __task_callback
task.result()
File "/opt/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 865, in _async_loop_background
await self._async_step_background(
File "/opt/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 735, in _async_step_background
output = await self._async_model_forward(inputs,
File "/opt/lmdeploy/lmdeploy/utils.py", line 243, in __tmp
return (await func(*args, **kwargs))
File "/opt/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 633, in _async_model_forward
ret = await __forward(inputs)
File "/opt/lmdeploy/lmdeploy/pytorch/engine/engine.py", line 610, in __forward
return await self.model_agent.async_forward(inputs, swap_in_map=swap_in_map, swap_out_map=swap_out_map)
File "/opt/lmdeploy/lmdeploy/pytorch/engine/model_agent.py", line 260, in async_forward
output = self._forward_impl(inputs, swap_in_map=swap_in_map, swap_out_map=swap_out_map)
File "/opt/lmdeploy/lmdeploy/pytorch/engine/model_agent.py", line 243, in _forward_impl
output = model_forward(
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/opt/lmdeploy/lmdeploy/pytorch/engine/model_agent.py", line 151, in model_forward
output = model(**input_dict)
File "/opt/lmdeploy/lmdeploy/pytorch/backends/graph_runner.py", line 24, in __call__
return self.model(**kwargs)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 451, in _fn
return fn(*args, **kwargs)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 921, in catch_errors
return callback(frame, cache_entry, hooks, frame_state, skip=1)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 400, in _convert_frame_assert
return _compile(
File "/usr/local/python3.10.5/lib/python3.10/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 676, in _compile
guarded_code = compile_inner(code, one_graph, hooks, transform)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 262, in time_wrapper
r = func(*args, **kwargs)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 535, in compile_inner
out_code = transform_code_object(code, transform)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/bytecode_transformation.py", line 1036, in transform_code_object
transformations(instructions, code_options)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 165, in _fn
return fn(*args, **kwargs)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 500, in transform
tracer.run()
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2149, in run
super().run()
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 810, in run
and self.step()
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 773, in step
getattr(self, inst.opname)(inst)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 489, in wrapper
return inner_fn(self, inst)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1272, in CALL_FUNCTION_KW
self.call_function(fn, args, kwargs)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 674, in call_function
self.push(fn.call_function(self, args, kwargs))
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/variables/nn_module.py", line 336, in call_function
return tx.inline_user_function_return(
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 680, in inline_user_function_return
return InliningInstructionTranslator.inline_call(self, fn, args, kwargs)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2285, in inline_call
return cls.inline_call_(parent, func, args, kwargs)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2399, in inline_call_
tracer.run()
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 810, in run
and self.step()
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 773, in step
getattr(self, inst.opname)(inst)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 489, in wrapper
return inner_fn(self, inst)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1260, in CALL_FUNCTION_EX
self.call_function(fn, argsvars.items, kwargsvars)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 674, in call_function
self.push(fn.call_function(self, args, kwargs))
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py", line 335, in call_function
return super().call_function(tx, args, kwargs)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py", line 289, in call_function
return super().call_function(tx, args, kwargs)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py", line 90, in call_function
return tx.inline_user_function_return(
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 680, in inline_user_function_return
return InliningInstructionTranslator.inline_call(self, fn, args, kwargs)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2285, in inline_call
return cls.inline_call_(parent, func, args, kwargs)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2399, in inline_call_
tracer.run()
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 810, in run
and self.step()
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 773, in step
getattr(self, inst.opname)(inst)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 489, in wrapper
return inner_fn(self, inst)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1272, in CALL_FUNCTION_KW
self.call_function(fn, args, kwargs)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 674, in call_function
self.push(fn.call_function(self, args, kwargs))
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/variables/nn_module.py", line 336, in call_function
return tx.inline_user_function_return(
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 680, in inline_user_function_return
return InliningInstructionTranslator.inline_call(self, fn, args, kwargs)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2285, in inline_call
return cls.inline_call_(parent, func, args, kwargs)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2399, in inline_call_
tracer.run()
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 810, in run
and self.step()
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 773, in step
getattr(self, inst.opname)(inst)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 489, in wrapper
return inner_fn(self, inst)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1260, in CALL_FUNCTION_EX
self.call_function(fn, argsvars.items, kwargsvars)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 674, in call_function
self.push(fn.call_function(self, args, kwargs))
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py", line 335, in call_function
return super().call_function(tx, args, kwargs)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py", line 289, in call_function
return super().call_function(tx, args, kwargs)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py", line 90, in call_function
return tx.inline_user_function_return(
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 680, in inline_user_function_return
return InliningInstructionTranslator.inline_call(self, fn, args, kwargs)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2285, in inline_call
return cls.inline_call_(parent, func, args, kwargs)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2399, in inline_call_
tracer.run()
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 810, in run
and self.step()
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 773, in step
getattr(self, inst.opname)(inst)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 489, in wrapper
return inner_fn(self, inst)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1272, in CALL_FUNCTION_KW
self.call_function(fn, args, kwargs)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 674, in call_function
self.push(fn.call_function(self, args, kwargs))
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/variables/nn_module.py", line 336, in call_function
return tx.inline_user_function_return(
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 680, in inline_user_function_return
return InliningInstructionTranslator.inline_call(self, fn, args, kwargs)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2285, in inline_call
return cls.inline_call_(parent, func, args, kwargs)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2399, in inline_call_
tracer.run()
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 810, in run
and self.step()
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 773, in step
getattr(self, inst.opname)(inst)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 489, in wrapper
return inner_fn(self, inst)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1260, in CALL_FUNCTION_EX
self.call_function(fn, argsvars.items, kwargsvars)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 674, in call_function
self.push(fn.call_function(self, args, kwargs))
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py", line 335, in call_function
return super().call_function(tx, args, kwargs)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py", line 289, in call_function
return super().call_function(tx, args, kwargs)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py", line 90, in call_function
return tx.inline_user_function_return(
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 680, in inline_user_function_return
return InliningInstructionTranslator.inline_call(self, fn, args, kwargs)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2285, in inline_call
return cls.inline_call_(parent, func, args, kwargs)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2399, in inline_call_
tracer.run()
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 810, in run
and self.step()
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 773, in step
getattr(self, inst.opname)(inst)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 489, in wrapper
return inner_fn(self, inst)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1219, in CALL_FUNCTION
self.call_function(fn, args, {})
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 674, in call_function
self.push(fn.call_function(self, args, kwargs))
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py", line 335, in call_function
return super().call_function(tx, args, kwargs)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py", line 289, in call_function
return super().call_function(tx, args, kwargs)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/variables/functions.py", line 90, in call_function
return tx.inline_user_function_return(
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 680, in inline_user_function_return
return InliningInstructionTranslator.inline_call(self, fn, args, kwargs)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2285, in inline_call
return cls.inline_call_(parent, func, args, kwargs)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 2399, in inline_call_
tracer.run()
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 810, in run
and self.step()
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 773, in step
getattr(self, inst.opname)(inst)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 489, in wrapper
return inner_fn(self, inst)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1272, in CALL_FUNCTION_KW
self.call_function(fn, args, kwargs)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 674, in call_function
self.push(fn.call_function(self, args, kwargs))
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/variables/misc.py", line 562, in call_function
return self.obj.call_method(tx, self.name, args, kwargs)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch_npu/utils/_dynamo.py", line 106, in TensorVariable_call_method
return TensorVariable.call_method_raw(self, tx, name, args, kwargs)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/variables/tensor.py", line 442, in call_method
return wrap_fx_proxy(
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/variables/builder.py", line 1330, in wrap_fx_proxy
return wrap_fx_proxy_cls(target_cls=TensorVariable, **kwargs)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/variables/builder.py", line 1415, in wrap_fx_proxy_cls
example_value = get_fake_value(proxy.node, tx, allow_non_graph_fake=True)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 1714, in get_fake_value
raise TorchRuntimeError(str(e)).with_traceback(e.__traceback__) from None
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 1656, in get_fake_value
ret_val = wrap_fake_exception(
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 1190, in wrap_fake_exception
return fn()
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 1657, in <lambda>
lambda: run_node(tx.output, node, args, kwargs, nnmodule)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 1782, in run_node
raise RuntimeError(make_error_message(e)).with_traceback(
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_dynamo/utils.py", line 1766, in run_node
return getattr(args[0], node.target)(*args[1:], **kwargs)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_tensor.py", line 921, in split
return torch._VF.split_with_sizes(self, split_size, dim)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/utils/_stats.py", line 20, in wrapper
return fn(*args, **kwargs)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_subclasses/fake_tensor.py", line 896, in __torch_dispatch__
return self.dispatch(func, types, args, kwargs)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_subclasses/fake_tensor.py", line 1241, in dispatch
return self._cached_dispatch_impl(func, types, args, kwargs)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_subclasses/fake_tensor.py", line 974, in _cached_dispatch_impl
output = self._dispatch_impl(func, types, args, kwargs)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_subclasses/fake_tensor.py", line 1393, in _dispatch_impl
return decomposition_table[func](*args, **kwargs)
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/_decomp/decompositions.py", line 1316, in split_with_sizes
torch._check_with(
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/__init__.py", line 1123, in _check_with
raise error_type(message_evaluated)
torch._dynamo.exc.TorchRuntimeError: Failed running call_method split(*(FakeTensor(..., device='npu:0', size=(s0, 576), dtype=torch.float16), (3584, 512, 512)), **{'dim': -1}):
Split sizes add up to 4608 but got the tensor's size of 576

from user code:
File "/opt/lmdeploy/lmdeploy/pytorch/models/qwen2.py", line 324, in forward
hidden_states = self.model(
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/lmdeploy/lmdeploy/pytorch/models/qwen2.py", line 264, in forward
hidden_states, residual = decoder_layer(
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/lmdeploy/lmdeploy/pytorch/models/qwen2.py", line 188, in forward
hidden_states = self.self_attn(
File "/usr/local/python3.10.5/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/lmdeploy/lmdeploy/pytorch/models/qwen2.py", line 73, in forward
query_states, key_states, value_states = self.qkv_proj.split_qkv(qkv_states)
File "/opt/lmdeploy/lmdeploy/pytorch/nn/linear.py", line 60, in split_qkv
q, k, v = x.split(sections, dim=-1)

Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information

You can suppress this exception and fall back to eager by setting:
import torch._dynamo
torch._dynamo.config.suppress_errors = True

2025-03-03 03:37:07,728 - lmdeploy - ERROR - async_engine.py:791 - session 1 finished, reason "error"
INFO: 172.17.0.1:60900 - "POST /v1/chat/completions HTTP/1.1" 200 OK
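
The traceback ends in a shape mismatch inside split_qkv: the requested sections (3584, 512, 512) sum to 4608, while the tensor's last dimension is 576. The short sketch below only redoes that arithmetic with the numbers from the error message. The final check rests on an assumption that is not confirmed anywhere in this thread: that 4-bit weights are packed eight to a 32-bit word, which would make a factor of 8 a plausible sign that the graph-mode trace is seeing a packed dimension rather than the unpacked one.

```python
# Numbers copied from the TorchRuntimeError message in the log.
split_sizes = (3584, 512, 512)  # q, k, v sections passed to x.split(...)
actual_last_dim = 576           # last dimension of the FakeTensor being split

expected_last_dim = sum(split_sizes)
ratio, remainder = divmod(expected_last_dim, actual_last_dim)

# Assumption (not stated in the issue): 4-bit values packed into 32-bit words.
PACK_FACTOR = 32 // 4

print(f"expected={expected_last_dim} actual={actual_last_dim} ratio={ratio}")
print("ratio equals the assumed 4-bit pack factor:",
      remainder == 0 and ratio == PACK_FACTOR)
```

If the ratio were not a clean multiple, a packing explanation would be unlikely and the mismatch would need a different explanation.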

Reproduction

curl -s http://localhost:8005/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "deepseek-r1-distill-qwen-7B-awq",
"messages": [
{"role": "system", "content": "你是一个数学家."},
{"role": "user", "content": "1+2+3+4…+2025"}
],
"max_tokens": 10000,
"temperature": 0
}'
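
For scripted reproduction, the same request can be sketched in Python with only the standard library. The URL, model name, and payload are copied from the curl command above; the helper names build_request and send, and the timeout value, are illustrative choices and not part of lmdeploy.

```python
import json
import urllib.request

# Port taken from --server-port in the serve command.
BASE_URL = "http://localhost:8005"


def build_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the same POST request as the curl reproduction."""
    payload = {
        "model": "deepseek-r1-distill-qwen-7B-awq",
        "messages": [
            {"role": "system", "content": "你是一个数学家."},
            {"role": "user", "content": "1+2+3+4…+2025"},
        ],
        "max_tokens": 10000,
        "temperature": 0,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 600.0) -> dict:
    """Send the request and decode the JSON body (needs a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, send(build_request()) returns the decoded response body. The log above shows the server answering with HTTP 200 even though the session ended with reason "error", so it is worth inspecting the body rather than relying on the status code alone.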

Environment

absl-py==2.1.0
accelerate==1.4.0
addict==2.4.0
aiohappyeyeballs==2.4.6
aiohttp==3.11.13
aiosignal==1.3.2
annotated-types==0.7.0
anyio==4.8.0
async-timeout==5.0.1
attr==0.3.2
attrs==25.1.0
auto_tune @ file:///root/selfgz859226207/compiler/lib64/auto_tune-0.1.0-py3-none-any.whl#sha256=8f08449dc1164e46c73acc85087e32a503ff77f4047d9b2c3f9597012e8adfb3
certifi==2025.1.31
cffi==1.17.1
charset-normalizer==3.4.1
click==8.1.8
cloudpickle==3.1.1
cmake==3.31.4
dataflow @ file:///root/selfgz859226207/compiler/lib64/dataflow-0.0.1-py3-none-any.whl#sha256=61d02556a49d4a5fa86ca434292c3733448fe3f62be337496414ce809c196215
datasets==2.16.0
decorator==5.2.1
dill==0.3.7
diskcache==5.6.3
distro==1.9.0
-e git+https://github.com/DeepLink-org/dlinfer.git@06f8580ae26768e982444a3937fcb64887291e60#egg=dlinfer_ascend
einops==0.8.1
exceptiongroup==1.2.2
fastapi==0.115.8
filelock==3.13.1
fire==0.7.0
frozenlist==1.5.0
fsspec==2023.10.0
h11==0.14.0
hccl @ file:///root/selfgz2398513867/hccl/lib64/hccl-0.1.0-py3-none-any.whl#sha256=0bfe7f1863fd4c3f056c0cdf806834764594c3254ec260e70255da87f7b12a94
hccl_parser @ file:///usr/local/Ascend/ascend-toolkit/8.0.0/toolkit/tools/hccl_parser-0.1-py3-none-any.whl#sha256=37541281a74de6ae3f8a15c77e1df8cf899bcbeb391a9a47ecfb582caf695fd8
httpcore==1.0.7
httpx==0.28.1
huggingface-hub==0.29.1
idna==3.10
interegular==0.3.3
Jinja2==3.1.4
jiter==0.8.2
jsonschema==4.23.0
jsonschema-specifications==2024.10.1
lark==1.2.2
llm_datadist @ file:///root/selfgz859226207/compiler/lib64/llm_datadist-0.0.1-py3-none-any.whl#sha256=7893ce8709f56d8d06f3d8b64f68b6c8e51a1e571e0d671e6be5bec99d6977a9
llvmlite==0.44.0
-e git+https://github.com/InternLM/lmdeploy@0eb625fa5059d8a1815b1747dbb4bb10e08a6836#egg=lmdeploy
markdown-it-py==3.0.0
MarkupSafe==2.1.5
mdurl==0.1.2
ml_dtypes==0.5.1
mmengine-lite==0.10.6
mpmath==1.3.0
msobjdump @ file:///usr/local/Ascend/ascend-toolkit/8.0.0/toolkit/tools/msobjdump-0.1.0-py3-none-any.whl#sha256=7bfe4926d56b034b7c19dca614b040b732bde240dafb7dcddce195596dabb600
multidict==6.1.0
multiprocess==0.70.15
nest-asyncio==1.6.0
networkx==3.3
ninja==1.11.1.3
numba==0.61.0
numpy==1.24.0
op_compile_tool @ file:///root/selfgz859226207/compiler/lib64/op_compile_tool-0.1.0-py3-none-any.whl#sha256=6c2d27cab88e642baa565cf3093e5c057300b32f68e83ebca34f136126a69e92
op_gen @ file:///usr/local/Ascend/ascend-toolkit/8.0.0/toolkit/tools/op_gen-0.1-py3-none-any.whl#sha256=243815003fd38a68940ad3de96fdd5fac3a465c32e4a834772f69b4c3c216234
op_test_frame @ file:///usr/local/Ascend/ascend-toolkit/8.0.0/toolkit/tools/op_test_frame-0.1-py3-none-any.whl#sha256=ff966bcad4e85cabebbebe368a808724069c909837cc210e4bd9cfe74bb25e33
opc_tool @ file:///root/selfgz859226207/compiler/lib64/opc_tool-0.1.0-py3-none-any.whl#sha256=85eab59b3a9225f65558d8f498c06988e3523714c16de8fb8f44b21f483fd5ff
openai==1.64.0
outlines==0.0.46
packaging==24.2
pandas==2.2.3
pathlib2==2.3.7.post1
peft==0.14.0
pillow==11.0.0
platformdirs==4.3.6
propcache==0.3.0
protobuf==5.29.3
psutil==7.0.0
pyairports==2.1.1
pyarrow==19.0.1
pyarrow-hotfix==0.6
pycountry==24.6.1
pycparser==2.22
pydantic==2.10.6
pydantic_core==2.27.2
Pygments==2.19.1
python-dateutil==2.9.0.post0
pytz==2025.1
PyYAML==6.0.2
referencing==0.36.2
regex==2024.11.6
requests==2.32.3
rich==13.9.4
rpds-py==0.23.1
safetensors==0.5.3
schedule_search @ file:///root/selfgz859226207/compiler/lib64/schedule_search-0.1.0-py3-none-any.whl#sha256=dcd6b3e218d353172396cdd6b299c8ddf28b638533a037bedb2ab0526fecf700
scikit-build==0.18.0
scipy==1.15.2
sentencepiece==0.2.0
shortuuid==1.0.13
show_kernel_debug_data @ file:///usr/local/Ascend/ascend-toolkit/8.0.0/toolkit/tools/show_kernel_debug_data-0.1.0-py3-none-any.whl#sha256=9d0d08c761209b32f4d38b6555b76f87c2ec49a9130f1039ae45cc8d0b7f5363
six==1.17.0
sniffio==1.3.1
starlette==0.45.3
sympy==1.13.3
te @ file:///root/selfgz859226207/compiler/lib64/te-0.4.0-py3-none-any.whl#sha256=6d03aed35fc22ecad63a6217043ec03d75e96410a6fcfa2944447959b86c9de0
termcolor==2.5.0
tiktoken==0.9.0
timm==1.0.15
tokenizers==0.21.0
tomli==2.2.1
torch==2.3.1+cpu
torch-npu==2.3.1
torchvision==0.18.1+cpu
tornado==6.4.2
tqdm==4.67.1
transformers==4.49.0
triton==3.1.0
typing_extensions==4.12.2
tzdata==2025.1
urllib3==2.3.0
uvicorn==0.34.0
xxhash==3.5.0
yapf==0.43.0
yarl==1.18.3

Error traceback

@jxz542189 jxz542189 changed the title from "[Bug]" to "[Bug] After quantization on 910B, the deployed service cannot be accessed normally" Mar 3, 2025
@jinminxi104
Collaborator

Graph mode for quantized models is still under development. For this w4a16 model, please use eager=True.
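
A minimal sketch of how that suggestion might be applied to the serve command from the issue. The collaborator only wrote "eager=True"; the mapping to a --eager-mode CLI flag here is an assumption (the Python-side counterpart is believed to be the eager_mode field of PytorchEngineConfig), so please confirm it against lmdeploy serve api_server --help for the installed version. The helper name build_serve_command is hypothetical.

```python
import shlex


def build_serve_command(eager: bool = True) -> list[str]:
    """Return the serve command from the issue, optionally in eager mode."""
    cmd = [
        "lmdeploy", "serve", "api_server",
        "DeepSeek-R1-Distill-Qwen-7B-AWQ-0301-v1.0",
        "--backend", "pytorch",
        "--device", "ascend",
        "--model-format", "awq",
        "--server-port", "8005",
        "--model-name", "deepseek-r1-distill-qwen-7B-awq",
        "--session-len", "16000",
    ]
    if eager:
        # Assumed flag name for "eager=True"; verify with --help.
        cmd.append("--eager-mode")
    return cmd


# Print a copy-pasteable command line; nothing is executed here.
print(shlex.join(build_serve_command()))
```

Running in eager mode skips the graph capture path that appears throughout the traceback (torch/_dynamo), which is presumably why it avoids the failing split during tracing.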
