updated #50

Open
wants to merge 44 commits into
base: fix-shutdown
Changes from 1 commit
e7c7c5e
[V1][VLM] V1 support for selected single-image models. (#11632)
ywang96 Dec 31, 2024
0c6f998
[Benchmark] Add benchmark script for CPU offloading (#11533)
ApostaC Jan 1, 2025
4db72e5
[Bugfix][Refactor] Unify model management in frontend (#11660)
joerunde Jan 1, 2025
365801f
[VLM] Add max-count checking in data parser for single image models (…
DarkLight1337 Jan 1, 2025
11d8a09
[Misc] Optimize Qwen2-VL LoRA test (#11663)
jeejeelee Jan 1, 2025
f962f42
[Misc] Replace space with - in the file names (#11667)
houseroad Jan 1, 2025
6d70198
[Doc] Fix typo (#11666)
serihiro Jan 1, 2025
7300144
[V1] Implement Cascade Attention (#11635)
WoosukKwon Jan 1, 2025
a115ac4
[VLM] Move supported limits and max tokens to merged multi-modal proc…
DarkLight1337 Jan 1, 2025
23c1b10
[VLM][Bugfix] Multi-modal processor compatible with V1 multi-input (#…
DarkLight1337 Jan 2, 2025
b6087a6
[mypy] Pass type checking in vllm/inputs (#11680)
CloseChoice Jan 2, 2025
8c38ee7
[VLM] Merged multi-modal processor for LLaVA-NeXT (#11682)
DarkLight1337 Jan 2, 2025
84c35c3
According to vllm.EngineArgs, the name should be distributed_executor…
chunyang-wen Jan 2, 2025
2f38518
[Bugfix] Free cross attention block table for preempted-for-recompute…
kathyyu-google Jan 2, 2025
b55ed6e
[V1][Minor] Optimize token_ids_cpu copy (#11692)
WoosukKwon Jan 2, 2025
187e329
[Bugfix] Change kv scaling factor by param json on nvidia gpu (#11688)
bjmsong Jan 2, 2025
5dba257
Resolve race conditions in Marlin kernel (#11493)
wchen61 Jan 2, 2025
68d3780
[Misc] Minimum requirements for SageMaker compatibility (#11576)
nathan-az Jan 2, 2025
2f1e8e8
Update default max_num_batch_tokens for chunked prefill (#11694)
SachinVarghese Jan 3, 2025
07064cb
[Bugfix] Check chain_speculative_sampling before calling it (#11673)
houseroad Jan 3, 2025
fd3a62a
[perf-benchmark] Fix dependency for steps in benchmark pipeline (#11710)
khluu Jan 3, 2025
e1a5c2f
[Model] Whisper model implementation (#11280)
aurickq Jan 3, 2025
1c4b92a
updated
robertgshaw2-redhat Jan 3, 2025
eb9b00b
stash
robertgshaw2-redhat Jan 3, 2025
80c751e
[V1] Simplify Shutdown (#11659)
robertgshaw2-redhat Jan 3, 2025
1da99a8
updated
robertgshaw2-redhat Jan 3, 2025
ca7b92d
Merge branch 'main' into tp-shutdown
robertgshaw2-redhat Jan 3, 2025
2743166
updated
robertgshaw2-redhat Jan 3, 2025
8e257c1
stash
robertgshaw2-redhat Jan 3, 2025
b7c50dc
revert spurious change
robertgshaw2-redhat Jan 3, 2025
dcfd3b8
updated
robertgshaw2-redhat Jan 3, 2025
6e0e0d4
stash
robertgshaw2-redhat Jan 3, 2025
55a6195
updated
robertgshaw2-redhat Jan 3, 2025
aa6954f
updated
robertgshaw2-redhat Jan 3, 2025
1d15ae0
remove cruft
robertgshaw2-redhat Jan 3, 2025
0347baa
Update vllm/v1/executor/multiproc_executor.py
robertgshaw2-redhat Jan 3, 2025
20b8fa2
stash
robertgshaw2-redhat Jan 3, 2025
32840f2
Merge branch 'tp-shutdown' of https://github.com/neuralmagic/vllm int…
robertgshaw2-redhat Jan 3, 2025
884879a
switch to SIGUSR1
robertgshaw2-redhat Jan 3, 2025
bb86a03
updated
robertgshaw2-redhat Jan 3, 2025
405bcc1
Update vllm/v1/engine/core_client.py
robertgshaw2-redhat Jan 3, 2025
25e0fea
update message
robertgshaw2-redhat Jan 3, 2025
efd6270
updated
robertgshaw2-redhat Jan 3, 2025
a5a306e
fixed!
robertgshaw2-redhat Jan 3, 2025
38 changes: 28 additions & 10 deletions vllm/v1/executor/multiproc_executor.py
@@ -3,12 +3,12 @@
 import signal
 import sys
 import time
-import weakref
 from dataclasses import dataclass
 from enum import Enum, auto
 from multiprocessing.process import BaseProcess
 from typing import Any, Dict, List, Optional, Tuple

+import psutil
 import zmq

 from vllm.config import VllmConfig
@@ -19,8 +19,9 @@
 from vllm.executor.multiproc_worker_utils import (
     _add_prefix, set_multiprocessing_worker_envs)
 from vllm.logger import init_logger
-from vllm.utils import (get_distributed_init_method, get_mp_context,
-                        get_open_port, get_open_zmq_ipc_path, zmq_socket_ctx)
+from vllm.utils import (get_distributed_init_method, get_exception_traceback,
+                        get_mp_context, get_open_port, get_open_zmq_ipc_path,
+                        kill_process_tree, zmq_socket_ctx)
 from vllm.v1.executor.abstract import Executor
 from vllm.v1.outputs import ModelRunnerOutput
 from vllm.worker.worker_base import WorkerWrapperBase
@@ -34,10 +35,25 @@
 class MultiprocExecutor(Executor):

     def __init__(self, vllm_config: VllmConfig) -> None:
-        # Call self.shutdown at exit to clean up
Collaborator Author

@robertgshaw2-redhat robertgshaw2-redhat Jan 3, 2025

NOTE: I removed this because it creates a circular reference that can prevent the executor from being garbage-collected (the finalizer's shutdown function cannot be a bound method of `self`). In addition, EngineCore already calls `executor.shutdown()` at its exit.
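
For readers unfamiliar with the pitfall, here is a minimal, self-contained sketch of why a bound method passed to `weakref.finalize` keeps its object alive. The class and helper names (`BoundFinalizer`, `DetachedFinalizer`, `_cleanup`, `survives_collection`) are hypothetical and are not part of vLLM; only `weakref.finalize`, `weakref.ref`, and `gc.collect` are real standard-library calls.

```python
import gc
import weakref
from typing import Callable, List


class BoundFinalizer:
    """Mirrors the removed line: the callback is a bound method of self."""

    def __init__(self) -> None:
        # The finalizer registry holds a strong reference to self.shutdown,
        # and a bound method holds a strong reference to self.
        self._finalizer = weakref.finalize(self, self.shutdown)

    def shutdown(self) -> None:
        pass


def _cleanup(name: str, log: List[str]) -> None:
    # Module-level callback: it owns no reference to the tracked object.
    log.append(name)


class DetachedFinalizer:
    """Alternative shape: the callback does not reference self."""

    def __init__(self, log: List[str]) -> None:
        self._finalizer = weakref.finalize(self, _cleanup, "detached", log)


def survives_collection(factory: Callable[[], object]) -> bool:
    """Return True if the object is still alive after its last name is dropped."""
    obj = factory()
    probe = weakref.ref(obj)
    del obj
    gc.collect()
    return probe() is not None
```

Under these assumptions, `survives_collection(BoundFinalizer)` reports that the object is still alive, whereas the detached variant is collected and its callback runs. This matches the warning in the `weakref.finalize` documentation that the callback should not be a bound method of the tracked object.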

-        # and ensure workers will be terminated.
-        self._finalizer = weakref.finalize(self, self.shutdown)

+        # The child processes will send SIGQUIT when unrecoverable
+        # errors happen. We kill the process tree here so that the
+        # stack trace is very evident.
+        # TODO: rather than killing the main process, we should
+        # figure out how to raise an AsyncEngineDeadError and
+        # handle at the API server level so we can return a better
+        # error code to the clients calling VLLM.
+
+        def sigquit_handler(signum, frame):
+            logger.fatal(
+                "MulitprocExecutor got SIGQUIT from worker processes, shutting "
+                "down. See stack trace above for root cause issue.")
+            # Propagate error up to parent process.
+            parent_process = psutil.Process().parent()
+            parent_process.send_signal(signal.SIGQUIT)
+            kill_process_tree(os.getpid())
+
+        signal.signal(signal.SIGQUIT, sigquit_handler)
         self.vllm_config = vllm_config
         self.parallel_config = vllm_config.parallel_config

@@ -321,7 +337,8 @@ def signal_handler(signum, frame):
         signal.signal(signal.SIGTERM, signal_handler)
         signal.signal(signal.SIGINT, signal_handler)

-        worker = None
+        parent_process = psutil.Process().parent()
+        worker: Optional[WorkerProc] = None
         try:
             worker = WorkerProc(*args, **kwargs)

@@ -335,9 +352,10 @@ def signal_handler(signum, frame):
         except SystemExit:
             logger.debug("Worker interrupted.")

-        except BaseException as e:
-            logger.exception(e)
-            raise
+        except Exception:
+            traceback = get_exception_traceback()
+            logger.error("Worker hit an exception: %s", traceback)
+            parent_process.send_signal(signal.SIGQUIT)

         finally:
             # Clean up once worker exits busy loop
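
The control flow of the new worker-side error path can be sketched in isolation. This is an illustrative reduction, not the vLLM implementation: `run_worker`, `busy_loop`, `notify_parent`, and `log` are hypothetical names, `traceback.format_exc()` stands in for `get_exception_traceback()`, and the `notify_parent` callback stands in for `parent_process.send_signal(signal.SIGQUIT)`.

```python
import traceback
from typing import Callable, List


def run_worker(busy_loop: Callable[[], None],
               notify_parent: Callable[[], None],
               log: List[str]) -> None:
    """Run busy_loop and report unexpected failures to the parent."""
    try:
        busy_loop()
    except SystemExit:
        # Treated as an ordinary interruption: the parent is not notified.
        log.append("interrupted")
    except Exception:
        # Any other error: record the traceback, then tell the parent.
        # Unlike the old `except BaseException: ... raise`, nothing is
        # re-raised here, and KeyboardInterrupt is no longer caught.
        log.append("error: " + traceback.format_exc())
        notify_parent()
    finally:
        # Cleanup runs on every exit path.
        log.append("cleanup")


def failing_loop() -> None:
    raise ValueError("bad weights")


def interrupted_loop() -> None:
    raise SystemExit()
```

Calling `run_worker(failing_loop, ...)` invokes the callback once and still reaches the cleanup step, while `run_worker(interrupted_loop, ...)` skips the callback. Injecting the notification as a callable keeps the sketch runnable without sending real signals.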