Qwen2-VL VLLM backend infer_dataset reports "Engine iteration timed out. This should never happen" #2690

heichang12138 · 2024-12-18T09:15:28Z

Describe the bug
What the bug is and how to reproduce it, preferably with screenshots.
[INFO:swift] request_config: RequestConfig(max_tokens=2048, temperature=0.0, top_k=None, top_p=None, repetition_penalty=None, num_beams=1, stop=[], seed=None, stream=False, logprobs=False, top_logprobs=None, n=1, best_of=None, presence_penalty=0.0, frequency_penalty=0.0, length_penalty=1.0)
Map: 100%|██████████████████████████████████████████████████████████████████████████████████████| 49443/49443 [00:05<00:00, 8797.95 examples/s]
[INFO:swift] val_dataset: Dataset({
features: ['messages', 'label', 'images'],
num_rows: 49443
})
0%| | 0/49443 [00:00<?, ?it/s][INFO:swift] Using environment variable IMAGE_FACTOR, Setting image_factor: 8.
[INFO:swift] Setting resized_height: None. You can adjust this hyperparameter through the environment variable: RESIZED_HEIGHT.
[INFO:swift] Setting resized_width: None. You can adjust this hyperparameter through the environment variable: RESIZED_WIDTH.
[INFO:swift] Setting min_pixels: 3136. You can adjust this hyperparameter through the environment variable: MIN_PIXELS.
[INFO:swift] Using environment variable MAX_PIXELS, Setting max_pixels: 65536.
INFO 12-18 17:08:03 preprocess.py:215] Your model uses the legacy input pipeline instead of the new multi-modal processor. Please note that the legacy pipeline will be removed in a future release. For more details, see: vllm-project/vllm#10114
/home/caojunhao/env/envs/swift/lib/python3.10/site-packages/PIL/Image.py:1056: UserWarning: Palette images with Transparency expressed in bytes should be converted to RGBA images
warnings.warn(
ERROR 12-18 17:09:03 async_llm_engine.py:886] Engine iteration timed out. This should never happen!
Exception in callback VllmEngine.patch_remove_log..new_log_task_completion(error_callback=>)(<Task finishe...imeoutError()>) at /media/cfs/caojunhao/workspace/ms-swift/swift/llm/infer/infer_engine/vllm_engine.py:397
handle: <Handle VllmEngine.patch_remove_log..new_log_task_completion(error_callback=>)(<Task finishe...imeoutError()>) at /media/cfs/caojunhao/workspace/ms-swift/swift/llm/infer/infer_engine/vllm_engine.py:397>
Traceback (most recent call last):
File "/home/caojunhao/env/envs/swift/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 866, in run_engine_loop
done, _ = await asyncio.wait(
File "/home/caojunhao/env/envs/swift/lib/python3.10/asyncio/tasks.py", line 384, in wait
return await _wait(fs, timeout, return_when, loop)
File "/home/caojunhao/env/envs/swift/lib/python3.10/asyncio/tasks.py", line 491, in _wait
await waiter
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/caojunhao/env/envs/swift/lib/python3.10/asyncio/events.py", line 80, in _run
self._context.run(self._callback, *self._args)
File "/media/cfs/caojunhao/workspace/ms-swift/swift/llm/infer/infer_engine/vllm_engine.py", line 399, in new_log_task_completion
return_value = task.result()
File "/home/caojunhao/env/envs/swift/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 865, in run_engine_loop
async with asyncio_timeout(ENGINE_ITERATION_TIMEOUT_S):
File "/home/caojunhao/env/envs/swift/lib/python3.10/site-packages/vllm/engine/async_timeout.py", line 95, in __aexit__
self._do_exit(exc_type)
File "/home/caojunhao/env/envs/swift/lib/python3.10/site-packages/vllm/engine/async_timeout.py", line 178, in _do_exit
raise asyncio.TimeoutError
asyncio.exceptions.TimeoutError
Exception in thread Thread-4 (<lambda>):
Traceback (most recent call last):
File "/home/caojunhao/env/envs/swift/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 866, in run_engine_loop
done, _ = await asyncio.wait(
File "/home/caojunhao/env/envs/swift/lib/python3.10/asyncio/tasks.py", line 384, in wait
return await _wait(fs, timeout, return_when, loop)
File "/home/caojunhao/env/envs/swift/lib/python3.10/asyncio/tasks.py", line 491, in _wait
await waiter
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/caojunhao/env/envs/swift/lib/python3.10/asyncio/events.py", line 80, in _run
self._context.run(self._callback, *self._args)
File "/media/cfs/caojunhao/workspace/ms-swift/swift/llm/infer/infer_engine/vllm_engine.py", line 399, in new_log_task_completion
return_value = task.result()
File "/home/caojunhao/env/envs/swift/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 865, in run_engine_loop
async with asyncio_timeout(ENGINE_ITERATION_TIMEOUT_S):
File "/home/caojunhao/env/envs/swift/lib/python3.10/site-packages/vllm/engine/async_timeout.py", line 95, in __aexit__
self._do_exit(exc_type)
File "/home/caojunhao/env/envs/swift/lib/python3.10/site-packages/vllm/engine/async_timeout.py", line 178, in _do_exit
raise asyncio.TimeoutError
asyncio.exceptions.TimeoutError

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/caojunhao/env/envs/swift/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/home/caojunhao/env/envs/swift/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/media/cfs/caojunhao/workspace/ms-swift/swift/llm/infer/infer_engine/infer_engine.py", line 70, in <lambda>
thread = Thread(target=lambda: asyncio.run(_batch_run(new_tasks)))
File "/home/caojunhao/env/envs/swift/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/home/caojunhao/env/envs/swift/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/media/cfs/caojunhao/workspace/ms-swift/swift/llm/infer/infer_engine/infer_engine.py", line 65, in _batch_run
return await asyncio.gather(*tasks)
File "/media/cfs/caojunhao/workspace/ms-swift/swift/llm/infer/infer_engine/infer_engine.py", line 61, in _run_infer
queue.put((i, await task))
File "/media/cfs/caojunhao/workspace/ms-swift/swift/llm/infer/infer_engine/vllm_engine.py", line 389, in infer_async
return await self._infer_full_async(**kwargs)
File "/media/cfs/caojunhao/workspace/ms-swift/swift/llm/infer/infer_engine/vllm_engine.py", line 320, in _infer_full_async
async for result in result_generator:
File "/home/caojunhao/env/envs/swift/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 1051, in generate
async for output in await self.add_request(
File "/home/caojunhao/env/envs/swift/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 948, in add_request
self.start_background_loop()
File "/home/caojunhao/env/envs/swift/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 745, in start_background_loop
raise AsyncEngineDeadError(
vllm.engine.async_llm_engine.AsyncEngineDeadError: Background loop has errored already.

Your hardware and system info
Write your system info here, such as CUDA version, operating system, GPU model, and torch version.
vllm 0.6.4.post1
torch 2.5.1
CUDA Version: 12.2
NVIDIA GeForce RTX 4090 D

Additional context
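Add any other context about the problem here.
I debugged this and found that the cause appears to be the amount of data: with this many samples, the asynchronous execution times out.
I modified swift/llm/infer/infer.py as follows, and it now runs, if in an ugly way. I look forward to a proper optimization from the maintainers.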
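
A minimal sketch of one way to keep the number of in-flight requests bounded, assuming the timeout comes from submitting all 49443 samples to the engine at once. This is not the change made in swift/llm/infer/infer.py, which is not shown in this thread. `infer_batch` is a hypothetical stand-in for whatever call runs one batch through the engine, and the default `chunk_size=256` is an arbitrary illustrative value.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def iter_chunks(items: Sequence[T], chunk_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items` with at most `chunk_size` elements."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]


def infer_in_chunks(
    requests: Sequence[T],
    infer_batch: Callable[[Sequence[T]], List[R]],  # hypothetical engine call
    chunk_size: int = 256,  # assumed value; tune for your GPU and data
) -> List[R]:
    """Run `infer_batch` on bounded chunks and return results in input order."""
    results: List[R] = []
    for chunk in iter_chunks(requests, chunk_size):
        # Each call sees at most `chunk_size` requests, so the engine never
        # has the entire dataset queued at the same time.
        results.extend(infer_batch(chunk))
    return results
```

With a real engine, `infer_batch` would wrap the existing batch inference call. Separately, vLLM reads the engine iteration timeout from the `VLLM_ENGINE_ITERATION_TIMEOUT_S` environment variable (60 seconds by default, which matches the one-minute gap between the 17:08:03 and 17:09:03 log lines above), so raising it may also be worth trying; whether that alone is enough for this workload is not confirmed in this thread.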


songyang23 · 2025-01-10T13:48:11Z

I ran into the same error. It started appearing after I installed the latest swift 3.0. When will this bug be fixed?

Jintao-Huang · 2025-01-10T13:59:13Z

3.0.2.post1 has been released.

https://pypi.org/project/ms-swift/

songyang23 · 2025-01-10T14:25:10Z

Thanks for the reply. I tried it and the problem is solved.
PS: Thanks to the swift team; it is much easier to use than llama_factory.

Jintao-Huang added the bug Something isn't working label Dec 19, 2024

heichang12138 closed this as completed Jan 24, 2025
