
Xinference Chat Bot freezes after multiple conversation turns #1192

Closed
andylzming opened this issue Mar 27, 2024 · 6 comments


andylzming commented Mar 27, 2024

Describe the bug

The Xinference Chat Bot freezes after several conversation turns (usually two or three); see the screenshots.

(two screenshots attached showing the frozen chat interface)

To Reproduce

To help us to reproduce this bug, please provide information below:

  1. Python: 3.10.6
  2. xinference: 0.9.4
  3. Versions of crucial packages.
  4. Full stack of the error.
  5. Minimized code to reproduce the error.


@XprobeBot XprobeBot added this to the v0.9.5 milestone Mar 27, 2024
@andylzming andylzming changed the title from "Xinference Chat Bot freezes after three conversation turns" to "Xinference Chat Bot freezes after multiple conversation turns" Mar 27, 2024
ChengjieLi28 (Contributor) commented Mar 28, 2024

@andylzming I can reproduce this issue with the same model (not 100% of the time; with a different model it sometimes doesn't trigger). gradio versions:

gradio                        3.50.1
gradio_client                 0.6.1

With F12 open you can see errors in the browser console, and in the Network tab the model's answer has in fact come back over the WebSocket; gradio just never renders it. My guess is a problematic gradio version.
See: gradio-app/gradio#6613
gradio-app/gradio#3943

Following those issues, I downgraded gradio to 3.41 and the problem has never recurred for me; you could give that a try.
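Since the suggested fix hinges on which gradio build is actually active in the environment, a small stdlib helper can confirm the installed versions before and after the downgrade (a hedged sketch; the helper name is mine, the version lookup is standard `importlib.metadata`):

```python
# Sketch: report the gradio / gradio_client versions actually installed,
# using only the standard library (importlib.metadata, Python 3.8+).
from importlib.metadata import version, PackageNotFoundError

def installed_version(pkg: str) -> str:
    """Return the installed version of `pkg`, or 'not installed'."""
    try:
        return version(pkg)
    except PackageNotFoundError:
        return "not installed"

if __name__ == "__main__":
    for pkg in ("gradio", "gradio_client"):
        print(pkg, installed_version(pkg))
```

Running this inside the conda env gives an unambiguous answer, independent of which `pip` happens to be on PATH.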

@XprobeBot XprobeBot modified the milestones: v0.10.0, v0.10.1 Mar 29, 2024

andylzming commented Apr 1, 2024

@ChengjieLi28

After downgrading gradio to 3.41, clicking the submit button has no effect.

The conversation freeze occurs with both of the following version sets:

(xinference) [root@gpu-server gradio]# pip list | grep gradio
gradio                        3.47.1
gradio_client                 0.6.0
(xinference) [root@gpu-server depends]# ll xinference-dependences/ | grep gradio
-rw-r--r--. 1 root root  20298198 Dec 19 21:55 gradio-3.50.2-py3-none-any.whl
-rw-r--r--. 1 root root    299220 Dec 19 21:55 gradio_client-0.6.1-py3-none-any.whl

Console output (two screenshots attached):

andylzming (Author) commented:

Multi-turn conversation works fine with the qwen-14b model, but not with chatglm3-6b.
In addition, calling xinference with chatglm3-6b through dify reports the following error:

  • Error log
INFO 04-10 16:30:35 llm_engine.py:653] Avg prompt throughput: 24.8 tokens/s, Avg generation throughput: 6.5 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%
INFO 04-10 16:30:35 async_llm_engine.py:111] Finished request 2f3a1d40-f779-11ee-b1b4-80615f20f615.
2024-04-10 16:30:35,419 xinference.api.restful_api 27390 ERROR    [address=127.0.0.1:34773, pid=24418] 0
Traceback (most recent call last):
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/api/restful_api.py", line 1394, in create_chat_completion
    data = await model.chat(prompt, system_prompt, chat_history, kwargs)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 227, in send
    return self._process_result_message(result)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 659, in send
    result = await self._run_coro(message.message_id, coro)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
    return await coro
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/api.py", line 384, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
  File "xoscar/core.pyx", line 558, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
  File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
    result = await result
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/utils.py", line 45, in wrapped
    ret = await func(*args, **kwargs)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 79, in wrapped_func
    ret = await fn(self, *args, **kwargs)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/api.py", line 462, in _wrapper
    r = await func(self, *args, **kwargs)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 375, in chat
    response = await self._call_wrapper(
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 103, in _async_wrapper
    return await fn(*args, **kwargs)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 325, in _call_wrapper
    ret = await fn(*args, **kwargs)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/model/llm/vllm/core.py", line 439, in async_chat
    return self._tool_calls_completion(
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/model/llm/utils.py", line 601, in _tool_calls_completion
    content, func, args = cls._eval_chatglm3_arguments(c, tools)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/model/llm/utils.py", line 548, in _eval_chatglm3_arguments
    if isinstance(c[0], str):
KeyError: [address=127.0.0.1:34773, pid=24418] 0
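The `KeyError: 0` at the bottom of the stack comes from `_eval_chatglm3_arguments` indexing the content with `c[0]`: when `c` is a dict (the shape of a ChatGLM3 tool-call payload), integer indexing raises `KeyError` because `0` is not a key. A minimal defensive sketch, assuming content arrives either as plain text or as a `{"name": ..., "parameters": ...}` dict — the function and payload shape here are illustrative, not xinference's actual code:

```python
# Hypothetical sketch, not xinference's implementation: dispatch on the
# shape of the ChatGLM3 output instead of unconditionally indexing c[0].
def eval_chatglm3_arguments(c):
    # A dict is a tool-call payload; c[0] on a dict raises KeyError: 0
    # (the error in the traceback above), since 0 is not one of its keys.
    if isinstance(c, dict):
        return None, c.get("name"), c.get("parameters")
    # A plain string is ordinary text content.
    if isinstance(c, str):
        return c, None, None
    # A list whose first element is a string is also text.
    if isinstance(c[0], str):
        return c[0], None, None
    # Otherwise assume a list wrapping a tool-call dict.
    return None, c[0].get("name"), c[0].get("parameters")
```

With a dispatch like this, a tool-call dict returned by chatglm3-6b no longer crashes the `create_chat_completion` path; whether xinference should instead normalize the payload upstream is a separate design question.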

@XprobeBot XprobeBot modified the milestones: v0.10.1, v0.10.2 Apr 12, 2024
@XprobeBot XprobeBot modified the milestones: v0.10.2, v0.10.3, v0.11.0 Apr 19, 2024
@XprobeBot XprobeBot modified the milestones: v0.11.0, v0.11.1, v0.11.2 May 11, 2024
@XprobeBot XprobeBot modified the milestones: v0.11.2, v0.11.3 May 24, 2024
@XprobeBot XprobeBot modified the milestones: v0.11.3, v0.11.4, v0.12.0, v0.12.1 May 31, 2024
@XprobeBot XprobeBot modified the milestones: v0.12.1, v0.12.2 Jun 14, 2024
@XprobeBot XprobeBot modified the milestones: v0.12.2, v0.12.4, v0.13.0, v0.13.1 Jun 28, 2024
@XprobeBot XprobeBot removed this from the v0.13.1 milestone Jul 12, 2024
@XprobeBot XprobeBot added this to the v0.13.2 milestone Jul 12, 2024
@XprobeBot XprobeBot modified the milestones: v0.13.2, v0.13.4 Jul 26, 2024

github-actions bot commented Aug 6, 2024

This issue is stale because it has been open for 7 days with no activity.

@github-actions github-actions bot added the stale label Aug 6, 2024

This issue was closed because it has been inactive for 5 days since being marked as stale.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Aug 12, 2024
Matrix0816 commented:
I also ran into the model crashing after multi-turn conversations (with Qwen 14B) and haven't found a solution.

4 participants