
Xinference Chat Bot freezes after multiple conversation turns #1192

Closed
andylzming opened this issue Mar 27, 2024 · 6 comments


andylzming commented Mar 27, 2024

Describe the bug

The Xinference Chat Bot freezes after several conversation turns (usually two or three); see the screenshots.

(two screenshots attached showing the frozen chat interface)

To Reproduce

To help us to reproduce this bug, please provide information below:

  1. Python: 3.10.6
  2. xinference: 0.9.4
  3. Versions of crucial packages.
  4. Full stack of the error.
  5. Minimized code to reproduce the error.


@XprobeBot XprobeBot added this to the v0.9.5 milestone Mar 27, 2024
@andylzming andylzming changed the title from "Xinference Chat Bot freezes after three conversation turns" to "Xinference Chat Bot freezes after multiple conversation turns" Mar 27, 2024
ChengjieLi28 (Contributor) commented Mar 28, 2024

@andylzming I can reproduce this issue with the same model (not 100% of the time; with a different model it sometimes doesn't trigger). gradio versions:

gradio                        3.50.1
gradio_client                 0.6.1

With F12 open you can see errors in the browser console, and in the Network tab the model's answer has in fact come back over the WebSocket; gradio just never renders it. My guess is a problematic gradio version.
See: gradio-app/gradio#6613
gradio-app/gradio#3943

Following those issues, I downgraded gradio to 3.41 and the problem has never recurred for me; you could give that a try.
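Since the suggested fix hinges on which gradio build is actually active in the environment, a small stdlib helper can confirm the installed versions before and after the downgrade (a hedged sketch; the helper name is mine, the version lookup is standard `importlib.metadata`):

```python
# Sketch: report the gradio / gradio_client versions actually installed,
# using only the standard library (importlib.metadata, Python 3.8+).
from importlib.metadata import version, PackageNotFoundError

def installed_version(pkg: str) -> str:
    """Return the installed version of `pkg`, or 'not installed'."""
    try:
        return version(pkg)
    except PackageNotFoundError:
        return "not installed"

if __name__ == "__main__":
    for pkg in ("gradio", "gradio_client"):
        print(pkg, installed_version(pkg))
```

Running this inside the conda env gives an unambiguous answer, independent of which `pip` happens to be on PATH.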

@XprobeBot XprobeBot modified the milestones: v0.10.0, v0.10.1 Mar 29, 2024

andylzming commented Apr 1, 2024

@ChengjieLi28

After downgrading gradio to 3.41, clicking the submit button has no effect.

The conversation freeze occurs with both of the following version sets:

(xinference) [root@gpu-server gradio]# pip list | grep gradio
gradio                        3.47.1
gradio_client                 0.6.0
(xinference) [root@gpu-server depends]# ll xinference-dependences/ | grep gradio
-rw-r--r--. 1 root root  20298198 Dec 19 21:55 gradio-3.50.2-py3-none-any.whl
-rw-r--r--. 1 root root    299220 Dec 19 21:55 gradio_client-0.6.1-py3-none-any.whl

Console output (two screenshots attached):

andylzming (Author) commented:

Multi-turn conversation works fine with the qwen-14b model, but not with chatglm3-6b.
In addition, calling xinference with chatglm3-6b through dify reports the following error:

  • Error log
INFO 04-10 16:30:35 llm_engine.py:653] Avg prompt throughput: 24.8 tokens/s, Avg generation throughput: 6.5 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%
INFO 04-10 16:30:35 async_llm_engine.py:111] Finished request 2f3a1d40-f779-11ee-b1b4-80615f20f615.
2024-04-10 16:30:35,419 xinference.api.restful_api 27390 ERROR    [address=127.0.0.1:34773, pid=24418] 0
Traceback (most recent call last):
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/api/restful_api.py", line 1394, in create_chat_completion
    data = await model.chat(prompt, system_prompt, chat_history, kwargs)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 227, in send
    return self._process_result_message(result)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
    raise message.as_instanceof_cause()
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 659, in send
    result = await self._run_coro(message.message_id, coro)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
    return await coro
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/api.py", line 384, in __on_receive__
    return await super().__on_receive__(message)  # type: ignore
  File "xoscar/core.pyx", line 558, in __on_receive__
    raise ex
  File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
    async with self._lock:
  File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
    with debug_async_timeout('actor_lock_timeout',
  File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
    result = await result
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/utils.py", line 45, in wrapped
    ret = await func(*args, **kwargs)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 79, in wrapped_func
    ret = await fn(self, *args, **kwargs)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xoscar/api.py", line 462, in _wrapper
    r = await func(self, *args, **kwargs)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 375, in chat
    response = await self._call_wrapper(
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 103, in _async_wrapper
    return await fn(*args, **kwargs)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/core/model.py", line 325, in _call_wrapper
    ret = await fn(*args, **kwargs)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/model/llm/vllm/core.py", line 439, in async_chat
    return self._tool_calls_completion(
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/model/llm/utils.py", line 601, in _tool_calls_completion
    content, func, args = cls._eval_chatglm3_arguments(c, tools)
  File "/home/miniconda3/envs/xinference/lib/python3.10/site-packages/xinference/model/llm/utils.py", line 548, in _eval_chatglm3_arguments
    if isinstance(c[0], str):
KeyError: [address=127.0.0.1:34773, pid=24418] 0
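The `KeyError: 0` at the bottom of the stack comes from `_eval_chatglm3_arguments` indexing the content with `c[0]`: when `c` is a dict (the shape of a ChatGLM3 tool-call payload), integer indexing raises `KeyError` because `0` is not a key. A minimal defensive sketch, assuming content arrives either as plain text or as a `{"name": ..., "parameters": ...}` dict — the function and payload shape here are illustrative, not xinference's actual code:

```python
# Hypothetical sketch, not xinference's implementation: dispatch on the
# shape of the ChatGLM3 output instead of unconditionally indexing c[0].
def eval_chatglm3_arguments(c):
    # A dict is a tool-call payload; c[0] on a dict raises KeyError: 0
    # (the error in the traceback above), since 0 is not one of its keys.
    if isinstance(c, dict):
        return None, c.get("name"), c.get("parameters")
    # A plain string is ordinary text content.
    if isinstance(c, str):
        return c, None, None
    # A list whose first element is a string is also text.
    if isinstance(c[0], str):
        return c[0], None, None
    # Otherwise assume a list wrapping a tool-call dict.
    return None, c[0].get("name"), c[0].get("parameters")
```

With a dispatch like this, a tool-call dict returned by chatglm3-6b no longer crashes the `create_chat_completion` path; whether xinference should instead normalize the payload upstream is a separate design question.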

@XprobeBot XprobeBot modified the milestones: v0.10.1, v0.10.2 Apr 12, 2024
@XprobeBot XprobeBot modified the milestones: v0.10.2, v0.10.3, v0.11.0 Apr 19, 2024
@XprobeBot XprobeBot modified the milestones: v0.11.0, v0.11.1, v0.11.2 May 11, 2024
@XprobeBot XprobeBot modified the milestones: v0.11.2, v0.11.3 May 24, 2024
@XprobeBot XprobeBot modified the milestones: v0.11.3, v0.11.4, v0.12.0, v0.12.1 May 31, 2024
@XprobeBot XprobeBot modified the milestones: v0.12.1, v0.12.2 Jun 14, 2024
@XprobeBot XprobeBot modified the milestones: v0.12.2, v0.12.4, v0.13.0, v0.13.1 Jun 28, 2024
@XprobeBot XprobeBot removed this from the v0.13.1 milestone Jul 12, 2024
@XprobeBot XprobeBot added this to the v0.13.2 milestone Jul 12, 2024
@XprobeBot XprobeBot modified the milestones: v0.13.2, v0.13.4 Jul 26, 2024

github-actions bot commented Aug 6, 2024

This issue is stale because it has been open for 7 days with no activity.

@github-actions github-actions bot added the stale label Aug 6, 2024

This issue was closed because it has been inactive for 5 days since being marked as stale.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Aug 12, 2024
Matrix0816 commented:
I also ran into the model crashing after multi-turn conversations (with Qwen 14B) and haven't found a solution.

4 participants