[ci][distributed] add distributed test gptq_marlin with tp = 2 #6010
base: main
Conversation
Force-pushed from 6a2004c to 8e128d2
Thanks for the PR! You need to move the test into https://github.com/vllm-project/vllm/blob/main/.buildkite/test-pipeline.yaml so that CI actually runs it. In addition, because of some limitations, you may only be able to test the tp=2 case; it is not safe to run two vLLM instances together.
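For illustration, a tp=2-only distributed test might look roughly like the sketch below. This is not the PR's actual code: the model name and assertions are assumptions, while the `vllm_runner` fixture and `vllm.utils.cuda_device_count_stateless` are taken from vLLM's test suite (the latter assuming it is available, per the helper referenced later in this thread).

```python
# A sketch only: gate the distributed test on having 2 GPUs and exercise
# only the tensor_parallel_size=2 case, as the reviewer suggests.
import pytest

from vllm.utils import cuda_device_count_stateless


@pytest.mark.skipif(cuda_device_count_stateless() < 2,
                    reason="Distributed test requires at least 2 GPUs.")
def test_gptq_marlin_tp2(vllm_runner):
    # Run a single vLLM instance only; starting a second multi-GPU
    # instance in the same session is what the reviewer warns against.
    with vllm_runner("TheBloke/Llama-2-7B-GPTQ",  # illustrative model
                     quantization="gptq_marlin",
                     tensor_parallel_size=2) as vllm_model:
        outputs = vllm_model.generate_greedy(["Hello, my name is"],
                                             max_tokens=8)
    assert outputs
```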
Force-pushed from 63b9545 to 49141fb
IMO we should keep the original …
Makes sense - so is it better to abstract the common test code that follows into a new shared block (e.g. …)?
Let's abstract out the code (similar to what I did for the multimodal distributed tests).
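For instance, a shared helper along these lines (a sketch only: `run_gptq_marlin_test` and its parameters are made-up names, while `check_logprobs_close` and the `vllm_runner` fixture come from the existing test suite):

```python
# Hypothetical shared helper so the tp=1 and tp=2 tests reuse one body.
from .utils import check_logprobs_close

PROMPTS = ["Hello, my name is"]  # illustrative


def run_gptq_marlin_test(vllm_runner, model_name, *, tensor_parallel_size=1,
                         max_tokens=32, num_logprobs=5):
    # Generate with the gptq_marlin kernel and with plain gptq, then
    # check that the top logprobs of the two runs stay close.
    with vllm_runner(model_name, quantization="gptq_marlin",
                     tensor_parallel_size=tensor_parallel_size) as marlin_model:
        marlin_outputs = marlin_model.generate_greedy_logprobs(
            PROMPTS, max_tokens, num_logprobs)

    with vllm_runner(model_name, quantization="gptq",
                     tensor_parallel_size=tensor_parallel_size) as gptq_model:
        gptq_outputs = gptq_model.generate_greedy_logprobs(
            PROMPTS, max_tokens, num_logprobs)

    check_logprobs_close(outputs_0_lst=gptq_outputs,
                         outputs_1_lst=marlin_outputs,
                         name_0="gptq",
                         name_1="gptq_marlin")
```

The distributed test then reduces to a one-line call with `tensor_parallel_size=2`.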
Force-pushed from 64c0686 to f12288d
@@ -17,8 +18,6 @@
 from .utils import check_logprobs_close

-os.environ["TOKENIZERS_PARALLELISM"] = "true"
-
Please keep this line as it avoids unnecessary warnings from HuggingFace
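For context, a sketch of why the placement of that line matters (reasoning only, not new code from the PR):

```python
# Must run at import time, before any HuggingFace tokenizer is created:
# the tokenizers library emits a noisy warning about parallelism after a
# process fork unless TOKENIZERS_PARALLELISM is set explicitly.
import os

os.environ["TOKENIZERS_PARALLELISM"] = "true"
```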
@DarkLight1337 it looks like the new unit test (…) is failing.
This happens because you initialized CUDA too early (probably indirectly via imports). Try to avoid importing torch-related things at the top level of your test module.
If the issue persists, #6056 should help you.
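A minimal sketch of the suggested pattern (the function and model names are assumptions, not the PR's actual code):

```python
# Keep the module top level free of torch-related work so that importing
# this test file does not initialize CUDA in the parent process before
# the distributed workers are created.


def test_distributed_gptq_marlin(vllm_runner):
    # Per the review advice: torch is imported lazily, inside the test
    # body, rather than at module import time.
    import torch  # noqa: F401  (illustrative placement)

    with vllm_runner("some-org/some-gptq-model",  # illustrative model
                     quantization="gptq_marlin",
                     tensor_parallel_size=2) as vllm_model:
        assert vllm_model.generate_greedy(["Hello"], max_tokens=4)
```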
Follow-up PR of #6007.