[bugfix] some bugs maybe fail to run #896
port = int(os.environ.get("MASTER_PORT", answer))  # type: ignore
port = int(os.environ.get("VLLM_DP_MASTER_PORT", answer))  # type: ignore
Would using envs.VLLM_DP_MASTER_PORT be better?
fixed
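As a rough illustration of the change discussed above, the sketch below reads the data-parallel master port from a dedicated environment variable and falls back to a default when it is unset. The helper name `resolve_dp_master_port` and its `env` parameter are hypothetical and exist only for this example; the actual PR may read the value through `envs.VLLM_DP_MASTER_PORT` instead.

```python
import os
from typing import Mapping


def resolve_dp_master_port(default_port: int,
                           env: Mapping[str, str] = os.environ) -> int:
    """Return the DP master port (hypothetical helper, not from the PR).

    Only VLLM_DP_MASTER_PORT is consulted, so a generic MASTER_PORT set
    for another purpose does not leak into the data-parallel setup.
    """
    raw = env.get("VLLM_DP_MASTER_PORT")
    if raw is None:
        return default_port
    return int(raw)


if __name__ == "__main__":
    # Uses the real process environment; 29500 is an arbitrary default.
    print(resolve_dp_master_port(29500))
```

Passing the environment as a mapping keeps the lookup easy to exercise without mutating `os.environ`.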
9df5c0f to 293eefe
from torch.distributed import ProcessGroup
from torch.distributed.distributed_c10d import (Backend, PrefixStore,
                                                _get_default_timeout,
                                                is_nccl_available)
from torch.distributed.rendezvous import rendezvous
from vllm.config import ParallelConfig


_DP_GROUP = None
There is already a process group for DP in vLLM, so why do we add this one here?
This is used to determine whether to execute the dummy_run of the prefill process. The native stateless process group is not stored in a global variable that we could read.
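A minimal sketch of the pattern described in this reply: a module-level variable records the DP group once it has been created, so other code can later check for it. All function names here are hypothetical, and the condition in `should_run_dummy_prefill` is an assumption made for illustration; the PR's real check may differ.

```python
from typing import Any, Optional

# Module-level handle, mirroring the `_DP_GROUP = None` line in the diff.
_DP_GROUP: Optional[Any] = None


def set_dp_group(group: Any) -> None:
    """Record the DP group after it is created (hypothetical helper)."""
    global _DP_GROUP
    _DP_GROUP = group


def get_dp_group() -> Optional[Any]:
    """Return the recorded DP group, or None if none was created."""
    return _DP_GROUP


def should_run_dummy_prefill() -> bool:
    # Assumption for this sketch: a dummy run is only needed when a DP
    # group exists, because other DP ranks may be waiting on a collective.
    return _DP_GROUP is not None
```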
@@ -21,12 +21,18 @@ def get_etp_group() -> GroupCoordinator:
    return _ETP


def model_parallel_initialized():
    return (_ETP is not None and _EP is not None)
I think we could use EP without ETP, so this check would break that scenario.
No. If ETP is not enabled, communication groups will still be created.
Signed-off-by: ningbenzhe1 <[email protected]>
What this PR does / why we need it?
Fixes a bug where the graph mode is the same for P and D (prefill and decode), along with some other bugs.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Tested by following the end-to-end test.