-
-
Notifications
You must be signed in to change notification settings - Fork 5.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Begin refactoring executor_base ABC #9392
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
🚀 |
c703862
to
dd4b28c
Compare
vllm/executor/executor_base.py
Outdated
@@ -50,6 +50,15 @@ def __init__( | |||
def _init_executor(self) -> None: | |||
pass | |||
|
|||
@abstractmethod | |||
def _create_worker(self, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function name starting with underline should be an internal function in general practice. I'm not sure if it's a suitable abstract function name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
many of the executors already have a _create_worker method, and it seems to be used internally, so I think this name fits?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jberkhahn apologies I was looking at this again and I'm actually also unsure of why we would add this in the base class, since it's only ever used in a private context. There isn't any place where we call _create_worker on a generic executor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
declaring it here let's me force a standardized set of args to make sure all the _create_worker methods do roughly the same steps in the different executors. currently the separation of functionality is a bit different in different executors. I realize declaring a private method part of an interface is a bit unusual, there are spots where we do reach in and call private methods, so I'm going to have to declare those in the ABC at some point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case does that make sense to just promote it to create_worker
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i could do that. the specific instance i'm referring to where stuff reaches in and calls private methods is stuff like the calls to _run_workers in vllm/engine/llm_engine.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, i went and changed it to create_worker
. (incidentally, cpu_executor has a now private method for boostrapping it's worker, which now makes more sense with this naming scheme)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jberkhahn I'm sorry I still can't wrap my head around why we want this in the top level interface unless there's somewhere in the code that calls it on arbitrary executors (which could include other non-abstract methods in ExecutorBase
).
Does this fix an existing typing issue?
It makes sense to standardize how things are done across the implementations but given the current scoping of what's the concern of each particular impl, it's not "wrong" for them to do this differently.
So concretely my suggestions would be to remove it here and keep the private _create_worker
naming across the classes.
vllm/executor/ray_tpu_executor.py
Outdated
# The driver dummy worker does not actually use any resources. | ||
# It holds the resource for the driver worker. | ||
self.driver_dummy_worker: Optional[RayWorkerWrapper] = None | ||
# The remaining workers are the actual ray actors. | ||
return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is... definitely confusing for me, as I have no other context about how this code works.
Should the code from _init_workers_ray
really be here under create_worker
since that's what actually creates the driver_dummy_worker
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
woops, left a bit on the cutting room floor here, sorry. This is one of the parts that is kind of janky, tho. The ray tpu doesn't have a driver worker, so it just sets it to None, which I extracted into the _create_worker method to have an implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jberkhahn the changes in this file also don't make sense to me. _create_worker is implemented in the TPUExecutor superclass, there's no need to override it here. It's just not used in this class.
Also I don't see what the benefit is in changing this line:
self.driver_dummy_worker: Optional[RayWorkerWrapper] = None
it seems clearer as it is currently. Compare with the RayGPUExecutor class hierarchy which is similar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, i went ahead and removed this bit, it wasn't making things clearer
110ceeb
to
90b46aa
Compare
2826f97
to
0fc499d
Compare
This pull request has merge conflicts that must be resolved before it can be |
0fc499d
to
e349434
Compare
4411794
to
01a40cb
Compare
…od declaration Signed-off-by: jberkhahn <[email protected]>
01a40cb
to
3a1e3f4
Compare
Signed-off-by: jberkhahn <[email protected]>
had to fix merge conflicts because of #9938, some interesting work in that one. might be a good idea to move all the create_worker logic over into worker for all the backend types at some point? |
This was started in support of mypy'ing the remaining libs that need it. It's been extremely difficult to add mypy to anything that touches the executor, because the various diffrent kinds of backends are structured differently, but everything is generally referred to via the abstract base type and then methods are called blindly because we "know" which kind of executor we're dealing with, even if the code doesn't. In addition, the various executor implementations often implement similar functionality decomposed differently - so the same bit of functionality often exists in different places or is referenced differently in different backends.
This is the beginning of a reactor that aims to create more abstract method declarations in executor_base, to allow executor code to be statically type checked, as well as to hopefully let things be structured in a more consistent manner that is easy to understand. This does occasionally mean that a particular backend will have a kind of dummy implementation that doesn't do much, as with the ray_tpu_executor here.
This PR just starts with the _create_worker method, which I've changed to init the driver worker, but not init the device or load the model across all various backend implementations.