[Bug]: vllm v0.5.0 internal assert failed #5450
Comments
You only have 2 GPUs; why use tensor parallel size = 4?
@youkaichao
If you use a tensor parallel size different from the number of GPUs you have, then this is indeed a known issue. #5473 should solve it.
No, I actually run vLLM on Kubernetes. Every time I modify the tensor parallel size, I manually adjust the number of GPUs to match. The environment description shows only 2 GPUs because I copied it from another issue I raised previously, where I encountered a similar problem on the same computing cluster, so I reused the environment description.
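A quick sanity check for this kind of mismatch, as a minimal sketch (the tensor parallel size value here is an assumption; PyTorch is available inside the vllm/vllm-openai image):

```python
# Minimal sketch: confirm the GPUs visible inside the pod match the
# requested tensor parallel size before launching the server.
import torch

tensor_parallel_size = 4  # assumed: the value used in the report
visible = torch.cuda.device_count()
assert visible == tensor_parallel_size, (
    f"tensor_parallel_size={tensor_parallel_size}, "
    f"but {visible} GPU(s) are visible"
)
```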
You can take a look at #6056.
This issue has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this issue should remain open. Thank you!
This issue has been automatically closed due to inactivity. Please feel free to reopen if you feel it is still relevant. Thank you!
Your current environment
🐛 Describe the bug
I use vllm/vllm-openai:v0.5.0 on k8s to deploy Qwen2 72B Instruct with tensor parallel size = 4.
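As a minimal sketch of an equivalent setup, expressed through vLLM's offline Python API (the model id Qwen/Qwen2-72B-Instruct is an assumption from the description; the actual container args may differ):

```python
# Sketch of the configuration described above via vLLM's Python entrypoint
# (model id and tensor parallel size are assumptions from the report).
from vllm import LLM

llm = LLM(
    model="Qwen/Qwen2-72B-Instruct",  # assumed model id
    tensor_parallel_size=4,
)
```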
Then I got the "internal assert failed" error in the title.
This same config works normally with vllm/vllm-openai:v0.4.3.
I also tried tensor parallel size = 8; then I got a bunch of exceptions like those in #5439, and it took a very long time to launch, so I did not wait to see whether it would start successfully.