Error: No available block found in 60 second. #2594
Comments
vLLM seems to have crashed. Does it get restarted automatically?
It did not restart. I have worked around it for now: first I upgraded from 0.16.2 to 0.16.3, then I deleted the previously downloaded model files and downloaded them again. The dimension-mismatch error looked like a problem with the model files, even though I never touched them, which is odd.
There is probably still an underlying problem: the vLLM engine appears to have died, but the mechanism that should kill and restart it automatically did not take effect. That still needs to be fixed.
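Until that is fixed, a crude external watchdog can stand in for the missing auto-restart. The sketch below is only an illustration under stated assumptions, not Xinference's own recovery mechanism: it periodically probes the OpenAI-compatible chat endpoint and, after several consecutive failed probes, runs a user-provided restart script. The host/port, the model UID `my-model`, and `restart_xinference.sh` are all hypothetical and need to be adapted to the actual deployment.

```python
# External watchdog sketch (assumptions: Xinference on 127.0.0.1:9997, a model
# with UID "my-model", and a user-provided restart_xinference.sh that restarts
# the service). This is a workaround illustration, not Xinference's built-in
# health check.
import subprocess
import time

import requests

ENDPOINT = "http://127.0.0.1:9997/v1/chat/completions"  # adjust host/port
MODEL_UID = "my-model"                                   # hypothetical model UID
RESTART_CMD = ["bash", "restart_xinference.sh"]          # hypothetical restart script


def engine_alive(timeout: float = 30.0) -> bool:
    """Send a one-token chat request; a hung engine never answers in time."""
    try:
        resp = requests.post(
            ENDPOINT,
            json={
                "model": MODEL_UID,
                "messages": [{"role": "user", "content": "ping"}],
                "max_tokens": 1,
            },
            timeout=timeout,
        )
        return resp.ok
    except requests.RequestException:
        return False


if __name__ == "__main__":
    failures = 0
    while True:
        failures = 0 if engine_alive() else failures + 1
        if failures >= 3:  # three missed probes in a row -> assume the engine is dead
            subprocess.run(RESTART_CMD, check=False)
            failures = 0
        time.sleep(60)
```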
This issue is stale because it has been open for 7 days with no activity.
This issue was closed because it has been inactive for 5 days since being marked as stale.
I reproduced this problem on v1.1.0 as well. The model could no longer be terminated through the Xinference web UI, and even after directly killing the process the model occupied, relaunching the model would not start. In the end the only fix was to restart the Xinference service.
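Before falling back to a full service restart, it may be worth trying to terminate and relaunch the model through the Python client rather than the web UI. The sketch below is only a suggestion under assumptions: host/port, model names, and GPU count are placeholders, and the client method names should be checked against the installed Xinference version. As noted above, if leftover vLLM worker processes still hold the GPUs, relaunching can still fail and restarting the service remains the last resort.

```python
# Sketch: terminate and relaunch a wedged model via the Xinference Python client.
# Host/port, model UID/name, and n_gpu are assumptions for illustration only.
from xinference.client import Client

client = Client("http://127.0.0.1:9997")  # adjust to your endpoint

try:
    client.terminate_model(model_uid="my-model")  # hypothetical model UID
except Exception as exc:
    # If the vLLM engine is already dead, termination may time out or error out.
    print(f"terminate_model failed: {exc}")

# Relaunch the model; this can still fail if stale worker processes hold GPU memory.
model_uid = client.launch_model(
    model_name="qwen2-instruct",  # hypothetical model name
    model_engine="vllm",
    n_gpu=4,
)
print("relaunched as", model_uid)
```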
This issue is stale because it has been open for 7 days with no activity.
This issue was closed because it has been inactive for 5 days since being marked as stale.
System Info
4 × 32 GB V100 GPUs
Running Xinference with Docker? No (started directly with nohup; see below).
Version info
0.16.2
The command used to start Xinference
Started with nohup as a background process.
Reproduction
How the error occurred:
[Screenshot of the error attached to the original issue]
The model was deployed with vLLM on 4 GPUs and had been running fine. After one chat request, the GPU process on the first card disappeared while the model stayed loaded on the other three cards; subsequent chat calls never got a response, and the only way to recover was to restart the service and reload the model.
The error messages are as follows:
ERROR 11-27 11:41:33 async_llm_engine.py:64] Engine background task failed
ERROR 11-27 11:41:33 async_llm_engine.py:64] Traceback (most recent call last):
ERROR 11-27 11:41:33 async_llm_engine.py:64] File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 54, in _log_task_completion
ERROR 11-27 11:41:33 async_llm_engine.py:64] return_value = task.result()
ERROR 11-27 11:41:33 async_llm_engine.py:64] File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 851, in run_engine_loop
ERROR 11-27 11:41:33 async_llm_engine.py:64] result = task.result()
ERROR 11-27 11:41:33 async_llm_engine.py:64] File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 774, in engine_step
ERROR 11-27 11:41:33 async_llm_engine.py:64] request_outputs = await self.engine.step_async(virtual_engine)
ERROR 11-27 11:41:33 async_llm_engine.py:64] File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 346, in step_async
ERROR 11-27 11:41:33 async_llm_engine.py:64] outputs = await self.model_executor.execute_model_async(
ERROR 11-27 11:41:33 async_llm_engine.py:64] File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/executor/distributed_gpu_executor.py", line 181, in execute_model_async
ERROR 11-27 11:41:33 async_llm_engine.py:64] return await self._driver_execute_model_async(execute_model_req)
ERROR 11-27 11:41:33 async_llm_engine.py:64] File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/executor/multiproc_gpu_executor.py", line 224, in _driver_execute_model_async
ERROR 11-27 11:41:33 async_llm_engine.py:64] return await self.driver_exec_model(execute_model_req)
ERROR 11-27 11:41:33 async_llm_engine.py:64] File "/ai/anaconda3/envs/xinference0161/lib/python3.10/concurrent/futures/thread.py", line 52, in run
ERROR 11-27 11:41:33 async_llm_engine.py:64] result = self.fn(*self.args, **self.kwargs)
ERROR 11-27 11:41:33 async_llm_engine.py:64] File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 303, in execute_model
ERROR 11-27 11:41:33 async_llm_engine.py:64] inputs = self.prepare_input(execute_model_req)
ERROR 11-27 11:41:33 async_llm_engine.py:64] File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 291, in prepare_input
ERROR 11-27 11:41:33 async_llm_engine.py:64] return self._get_driver_input_and_broadcast(execute_model_req)
ERROR 11-27 11:41:33 async_llm_engine.py:64] File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 253, in _get_driver_input_and_broadcast
ERROR 11-27 11:41:33 async_llm_engine.py:64] self.model_runner.prepare_model_input(
ERROR 11-27 11:41:33 async_llm_engine.py:64] File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 1586, in prepare_model_input
ERROR 11-27 11:41:33 async_llm_engine.py:64] model_input = self._prepare_model_input_tensors(
ERROR 11-27 11:41:33 async_llm_engine.py:64] File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 1196, in _prepare_model_input_tensors
ERROR 11-27 11:41:33 async_llm_engine.py:64] return builder.build() # type: ignore
ERROR 11-27 11:41:33 async_llm_engine.py:64] File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 867, in build
ERROR 11-27 11:41:33 async_llm_engine.py:64] attn_metadata = self.attn_metadata_builder.build(
ERROR 11-27 11:41:33 async_llm_engine.py:64] File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/attention/backends/utils.py", line 215, in build
ERROR 11-27 11:41:33 async_llm_engine.py:64] input_block_tables[i, :len(block_table)] = block_table
ERROR 11-27 11:41:33 async_llm_engine.py:64] ValueError: could not broadcast input array from shape (516,) into shape (512,)
2024-11-27 11:41:33,903 xinference.core.model 931290 ERROR [request 7b8ea0e6-ac71-11ef-9b19-fa163ea8cbc1] Leave chat, error: could not broadcast input array from shape (516,) into shape (512,), elapsed time: 6 s
Traceback (most recent call last):
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/core/utils.py", line 78, in wrapped
ret = await func(*args, **kwargs)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/core/model.py", line 709, in chat
response = await self._call_wrapper_json(
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/core/model.py", line 517, in _call_wrapper_json
return await self._call_wrapper("json", fn, *args, **kwargs)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/core/model.py", line 122, in _async_wrapper
return await fn(*args, **kwargs)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/core/model.py", line 526, in _call_wrapper
ret = await fn(*args, **kwargs)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/model/llm/vllm/utils.py", line 30, in _async_wrapper
return await fn(self, *args, **kwargs)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/model/llm/vllm/core.py", line 706, in async_chat
c = await self.async_generate(
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/model/llm/vllm/utils.py", line 30, in _async_wrapper
return await fn(self, *args, **kwargs)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/model/llm/vllm/core.py", line 597, in async_generate
async for request_output in results_generator:
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 1029, in generate
async for output in await self.add_request(
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 112, in generator
raise result
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 54, in _log_task_completion
return_value = task.result()
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 851, in run_engine_loop
result = task.result()
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 774, in engine_step
request_outputs = await self.engine.step_async(virtual_engine)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 346, in step_async
outputs = await self.model_executor.execute_model_async(
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/executor/distributed_gpu_executor.py", line 181, in execute_model_async
return await self._driver_execute_model_async(execute_model_req)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/executor/multiproc_gpu_executor.py", line 224, in _driver_execute_model_async
return await self.driver_exec_model(execute_model_req)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/concurrent/futures/thread.py", line 52, in run
result = self.fn(*self.args, **self.kwargs)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 303, in execute_model
inputs = self.prepare_input(execute_model_req)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 291, in prepare_input
return self._get_driver_input_and_broadcast(execute_model_req)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 253, in _get_driver_input_and_broadcast
self.model_runner.prepare_model_input(
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 1586, in prepare_model_input
model_input = self._prepare_model_input_tensors(
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 1196, in _prepare_model_input_tensors
return builder.build() # type: ignore
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 867, in build
attn_metadata = self.attn_metadata_builder.build(
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/attention/backends/utils.py", line 215, in build
input_block_tables[i, :len(block_table)] = block_table
ValueError: could not broadcast input array from shape (516,) into shape (512,)
2024-11-27 11:41:33,909 xinference.api.restful_api 929546 ERROR [address=0.0.0.0:36473, pid=931290] could not broadcast input array from shape (516,) into shape (512,)
Traceback (most recent call last):
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/api/restful_api.py", line 1998, in create_chat_completion
data = await model.chat(
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xoscar/backends/context.py", line 231, in send
return self._process_result_message(result)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
raise message.as_instanceof_cause()
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xoscar/backends/pool.py", line 659, in send
result = await self._run_coro(message.message_id, coro)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
return await coro
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xoscar/api.py", line 384, in on_receive
return await super().on_receive(message) # type: ignore
File "xoscar/core.pyx", line 558, in on_receive
raise ex
File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.on_receive
async with self._lock:
File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.on_receive
with debug_async_timeout('actor_lock_timeout',
File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.on_receive
result = await result
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/core/model.py", line 98, in wrapped_func
ret = await fn(self, *args, **kwargs)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xoscar/api.py", line 462, in _wrapper
r = await func(self, *args, **kwargs)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/core/utils.py", line 78, in wrapped
ret = await func(*args, **kwargs)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/core/model.py", line 709, in chat
response = await self._call_wrapper_json(
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/core/model.py", line 517, in _call_wrapper_json
return await self._call_wrapper("json", fn, *args, **kwargs)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/core/model.py", line 122, in _async_wrapper
return await fn(*args, **kwargs)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/core/model.py", line 526, in _call_wrapper
ret = await fn(*args, **kwargs)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/model/llm/vllm/utils.py", line 30, in _async_wrapper
return await fn(self, *args, **kwargs)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/model/llm/vllm/core.py", line 706, in async_chat
c = await self.async_generate(
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/model/llm/vllm/utils.py", line 30, in _async_wrapper
return await fn(self, *args, **kwargs)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/model/llm/vllm/core.py", line 597, in async_generate
async for request_output in results_generator:
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 1029, in generate
async for output in await self.add_request(
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 112, in generator
raise result
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 54, in _log_task_completion
return_value = task.result()
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 851, in run_engine_loop
result = task.result()
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 774, in engine_step
request_outputs = await self.engine.step_async(virtual_engine)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 346, in step_async
outputs = await self.model_executor.execute_model_async(
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/executor/distributed_gpu_executor.py", line 181, in execute_model_async
return await self._driver_execute_model_async(execute_model_req)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/executor/multiproc_gpu_executor.py", line 224, in _driver_execute_model_async
return await self.driver_exec_model(execute_model_req)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/concurrent/futures/thread.py", line 52, in run
result = self.fn(*self.args, **self.kwargs)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 303, in execute_model
inputs = self.prepare_input(execute_model_req)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 291, in prepare_input
return self._get_driver_input_and_broadcast(execute_model_req)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 253, in _get_driver_input_and_broadcast
self.model_runner.prepare_model_input(
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 1586, in prepare_model_input
model_input = self._prepare_model_input_tensors(
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 1196, in _prepare_model_input_tensors
return builder.build() # type: ignore
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 867, in build
attn_metadata = self.attn_metadata_builder.build(
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/vllm/attention/backends/utils.py", line 215, in build
input_block_tables[i, :len(block_table)] = block_table
ValueError: [address=0.0.0.0:36473, pid=931290] could not broadcast input array from shape (516,) into shape (512,)
2024-11-27 11:41:34,736 xinference.model.llm.vllm.utils 931290 INFO Detecting vLLM is not health, prepare to quit the process
2024-11-27 11:41:34,736 xinference.model.llm.vllm.core 931290 INFO Stopping vLLM engine
INFO 11-27 11:41:34 multiproc_worker_utils.py:133] Terminating local vLLM worker processes
2024-11-27 11:41:35,202 xinference.api.restful_api 929546 ERROR Remote server 0.0.0.0:36473 closed
Traceback (most recent call last):
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xinference/api/restful_api.py", line 1998, in create_chat_completion
data = await model.chat(
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xoscar/backends/context.py", line 230, in send
result = await self._wait(future, actor_ref.address, send_message) # type: ignore
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xoscar/backends/context.py", line 115, in _wait
return await future
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xoscar/backends/context.py", line 106, in _wait
await asyncio.shield(future)
File "/ai/anaconda3/envs/xinference0161/lib/python3.10/site-packages/xoscar/backends/core.py", line 84, in _listen
raise ServerClosed(
xoscar.errors.ServerClosed: Remote server 0.0.0.0:36473 closed
2024-11-27 11:41:36,476 xinference.core.worker 929804 WARNING Process 0.0.0.0:36473 is down.
(VllmWorkerProcess pid=931684) WARNING 11-27 11:42:27 shm_broadcast.py:396] No available block found in 60 second.
(VllmWorkerProcess pid=931682) WARNING 11-27 11:42:27 shm_broadcast.py:396] No available block found in 60 second.
(VllmWorkerProcess pid=931683) WARNING 11-27 11:42:27 shm_broadcast.py:396] No available block found in 60 second.
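For reference, the `ValueError` in the tracebacks above is an ordinary NumPy broadcasting failure inside vLLM's attention-metadata builder: the pre-allocated block-table buffer has room for 512 blocks per sequence, but one request's block table has grown to 516 entries, which suggests an inconsistency between the model's configured maximum length and the sequences actually being scheduled (the re-downloaded model files mentioned in the comments fit that picture). The trailing `No available block found in 60 second.` warnings are then just the surviving worker processes waiting on a driver process that has already exited. The snippet below only reproduces the NumPy error with assumed sizes; it is not vLLM code.

```python
# Minimal reproduction of the broadcast error, with assumed sizes:
# a buffer allocated for at most 512 blocks per sequence vs. a 516-entry block table.
import numpy as np

max_blocks_per_seq = 512                      # capacity the buffer was allocated for
block_table = np.arange(516, dtype=np.int32)  # one sequence's block table, 4 entries too long

input_block_tables = np.zeros((1, max_blocks_per_seq), dtype=np.int32)

# Same assignment pattern as vllm/attention/backends/utils.py: the slice is clamped
# to 512 columns, so the 516-element right-hand side cannot broadcast into it.
input_block_tables[0, : len(block_table)] = block_table
# ValueError: could not broadcast input array from shape (516,) into shape (512,)
```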
Expected behavior
Fix the bug: the request should not crash the vLLM engine, or at least the dead engine should be detected and restarted automatically.