Describe the bug
What the bug is and how to reproduce it, preferably with screenshots
[INFO:swift] request_config: RequestConfig(max_tokens=2048, temperature=0.0, top_k=None, top_p=None, repetition_penalty=None, num_beams=1, stop=[], seed=None, stream=False, logprobs=False, top_logprobs=None, n=1, best_of=None, presence_penalty=0.0, frequency_penalty=0.0, length_penalty=1.0)
Map: 100%|██████████████████████████████████████████████████████████████████████████████████████| 49443/49443 [00:05<00:00, 8797.95 examples/s]
[INFO:swift] val_dataset: Dataset({
features: ['messages', 'label', 'images'],
num_rows: 49443
})
0%| | 0/49443 [00:00<?, ?it/s][INFO:swift] Using environment variable IMAGE_FACTOR, Setting image_factor: 8.
[INFO:swift] Setting resized_height: None. You can adjust this hyperparameter through the environment variable: RESIZED_HEIGHT.
[INFO:swift] Setting resized_width: None. You can adjust this hyperparameter through the environment variable: RESIZED_WIDTH.
[INFO:swift] Setting min_pixels: 3136. You can adjust this hyperparameter through the environment variable: MIN_PIXELS.
[INFO:swift] Using environment variable MAX_PIXELS, Setting max_pixels: 65536.
INFO 12-18 17:08:03 preprocess.py:215] Your model uses the legacy input pipeline instead of the new multi-modal processor. Please note that the legacy pipeline will be removed in a future release. For more details, see: vllm-project/vllm#10114
/home/caojunhao/env/envs/swift/lib/python3.10/site-packages/PIL/Image.py:1056: UserWarning: Palette images with Transparency expressed in bytes should be converted to RGBA images
warnings.warn(
ERROR 12-18 17:09:03 async_llm_engine.py:886] Engine iteration timed out. This should never happen!
Exception in callback VllmEngine.patch_remove_log.<locals>.new_log_task_completion(error_callback=>)(<Task finishe...imeoutError()>) at /media/cfs/caojunhao/workspace/ms-swift/swift/llm/infer/infer_engine/vllm_engine.py:397
handle: <Handle VllmEngine.patch_remove_log.<locals>.new_log_task_completion(error_callback=>)(<Task finishe...imeoutError()>) at /media/cfs/caojunhao/workspace/ms-swift/swift/llm/infer/infer_engine/vllm_engine.py:397>
Traceback (most recent call last):
File "/home/caojunhao/env/envs/swift/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 866, in run_engine_loop
done, _ = await asyncio.wait(
File "/home/caojunhao/env/envs/swift/lib/python3.10/asyncio/tasks.py", line 384, in wait
return await _wait(fs, timeout, return_when, loop)
File "/home/caojunhao/env/envs/swift/lib/python3.10/asyncio/tasks.py", line 491, in _wait
await waiter
asyncio.exceptions.CancelledError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/caojunhao/env/envs/swift/lib/python3.10/asyncio/events.py", line 80, in _run
self._context.run(self._callback, *self._args)
File "/media/cfs/caojunhao/workspace/ms-swift/swift/llm/infer/infer_engine/vllm_engine.py", line 399, in new_log_task_completion
return_value = task.result()
File "/home/caojunhao/env/envs/swift/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 865, in run_engine_loop
async with asyncio_timeout(ENGINE_ITERATION_TIMEOUT_S):
File "/home/caojunhao/env/envs/swift/lib/python3.10/site-packages/vllm/engine/async_timeout.py", line 95, in __aexit__
self._do_exit(exc_type)
File "/home/caojunhao/env/envs/swift/lib/python3.10/site-packages/vllm/engine/async_timeout.py", line 178, in _do_exit
raise asyncio.TimeoutError
asyncio.exceptions.TimeoutError
Exception in thread Thread-4 (<lambda>):
Traceback (most recent call last):
File "/home/caojunhao/env/envs/swift/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 866, in run_engine_loop
done, _ = await asyncio.wait(
File "/home/caojunhao/env/envs/swift/lib/python3.10/asyncio/tasks.py", line 384, in wait
return await _wait(fs, timeout, return_when, loop)
File "/home/caojunhao/env/envs/swift/lib/python3.10/asyncio/tasks.py", line 491, in _wait
await waiter
asyncio.exceptions.CancelledError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/caojunhao/env/envs/swift/lib/python3.10/asyncio/events.py", line 80, in _run
self._context.run(self._callback, *self._args)
File "/media/cfs/caojunhao/workspace/ms-swift/swift/llm/infer/infer_engine/vllm_engine.py", line 399, in new_log_task_completion
return_value = task.result()
File "/home/caojunhao/env/envs/swift/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 865, in run_engine_loop
async with asyncio_timeout(ENGINE_ITERATION_TIMEOUT_S):
File "/home/caojunhao/env/envs/swift/lib/python3.10/site-packages/vllm/engine/async_timeout.py", line 95, in __aexit__
self._do_exit(exc_type)
File "/home/caojunhao/env/envs/swift/lib/python3.10/site-packages/vllm/engine/async_timeout.py", line 178, in _do_exit
raise asyncio.TimeoutError
asyncio.exceptions.TimeoutError
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/caojunhao/env/envs/swift/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/home/caojunhao/env/envs/swift/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/media/cfs/caojunhao/workspace/ms-swift/swift/llm/infer/infer_engine/infer_engine.py", line 70, in <lambda>
thread = Thread(target=lambda: asyncio.run(_batch_run(new_tasks)))
File "/home/caojunhao/env/envs/swift/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/home/caojunhao/env/envs/swift/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/media/cfs/caojunhao/workspace/ms-swift/swift/llm/infer/infer_engine/infer_engine.py", line 65, in _batch_run
return await asyncio.gather(*tasks)
File "/media/cfs/caojunhao/workspace/ms-swift/swift/llm/infer/infer_engine/infer_engine.py", line 61, in _run_infer
queue.put((i, await task))
File "/media/cfs/caojunhao/workspace/ms-swift/swift/llm/infer/infer_engine/vllm_engine.py", line 389, in infer_async
return await self._infer_full_async(**kwargs)
File "/media/cfs/caojunhao/workspace/ms-swift/swift/llm/infer/infer_engine/vllm_engine.py", line 320, in _infer_full_async
async for result in result_generator:
File "/home/caojunhao/env/envs/swift/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 1051, in generate
async for output in await self.add_request(
File "/home/caojunhao/env/envs/swift/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 948, in add_request
self.start_background_loop()
File "/home/caojunhao/env/envs/swift/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 745, in start_background_loop
raise AsyncEngineDeadError(
vllm.engine.async_llm_engine.AsyncEngineDeadError: Background loop has errored already.
Your hardware and system info
Write your system info here, such as CUDA version, operating system, GPU model, and torch version
vllm 0.6.4.post1
torch 2.5.1
CUDA Version: 12.2
NVIDIA GeForce RTX 4090 D
Additional context
Add any other context about the problem here
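After some debugging, I found that the likely cause is the amount of data: with this many samples, the asynchronous execution times out.
I changed swift/llm/infer/infer.py like this, and it now runs, if in an ugly way. Looking forward to a proper optimization from the maintainers.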
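The modification itself is not reproduced in the text above, so what follows is only a hypothetical sketch of one common workaround for this kind of timeout: splitting the validation set into smaller chunks and submitting one chunk at a time, so the async engine never has all 49443 requests queued at once. The names `iter_chunks`, `infer_in_chunks`, `infer_fn`, and `chunk_size` are assumptions made for illustration and are not part of the ms-swift or vLLM API.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def iter_chunks(items: Sequence[T], chunk_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items`, each with at most `chunk_size` elements."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]


def infer_in_chunks(
    requests: Sequence[T],
    infer_fn: Callable[[Sequence[T]], List[R]],
    chunk_size: int = 1000,
) -> List[R]:
    """Run `infer_fn` on one chunk at a time and concatenate the results in order.

    `infer_fn` is a hypothetical stand-in for whatever batch inference call is
    in use; it is assumed to take a sequence of requests and return one result
    per request, in the same order.
    """
    results: List[R] = []
    for chunk in iter_chunks(requests, chunk_size):
        # Only `chunk_size` requests are handed to the engine per call.
        results.extend(infer_fn(chunk))
    return results
```

With a real engine, `infer_fn` would wrap the existing batch inference call, and `chunk_size` would need tuning for the GPU and image sizes involved. The traceback shows vLLM taking the limit from `ENGINE_ITERATION_TIMEOUT_S`; if the installed version reads it from the `VLLM_ENGINE_ITERATION_TIMEOUT_S` environment variable (an assumption worth checking against the installed vLLM source), raising that value may be another stopgap.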