
vllm could not be used because of CUDA kernel #21

Open
652994331 opened this issue Nov 18, 2023 · 1 comment

Comments

@652994331

Hi, I ran into a problem when trying to use vllm.
[Screenshot: error output, 2023-11-18 14:42]

My current PyTorch version is:
[Screenshot: PyTorch version, 2023-11-18 14:44]

My GPU machine has a P100, and the NVIDIA driver is 470.141. Could you please look into this problem? Thanks.

@sherdencooper
Owner

Hi, thanks for trying our code. In our experience, vllm can indeed be hard to install because it depends on xformers; see this post. I think you need to load the model as dtype half on the P100 with vllm. One of my servers with a similar environment that runs vllm successfully has the following packages: torch 2.0.1+cu118 and vllm==0.1.6. Another server with the latest vllm has torch 2.1.0+cu121 and vllm==0.2.0. You could try different CUDA versions (no need to upgrade the server's CUDA; it can be shipped with the PyTorch installation in the env) or different vllm versions. Note that we have found some differences between the outputs of vllm 0.1.6 and vllm 0.2.0. You could also try building from source if you hit installation or runtime issues.
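For reference, here is a minimal sketch of loading a model with vllm in half precision; the model path is a placeholder, so substitute the checkpoint you are actually using:

```python
from vllm import LLM, SamplingParams

# dtype="half" forces fp16, which the P100 needs since it lacks bf16 support.
# "path/to/your-model" is a placeholder, not a real checkpoint.
llm = LLM(model="path/to/your-model", dtype="half")

sampling_params = SamplingParams(temperature=0.8, max_tokens=128)
outputs = llm.generate(["Hello, my name is"], sampling_params)
print(outputs[0].outputs[0].text)
```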

If none of the above works, you could stick with Hugging Face, which is easier to install but slower. Please let us know if you still run into installation or runtime issues.
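For the Hugging Face fallback, a minimal sketch (again with a placeholder model path) that loads the model in fp16, matching the dtype-half advice above, could look like this:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder path; use the same checkpoint you tried with vllm.
model_path = "path/to/your-model"

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,  # fp16 for the P100
    device_map="auto",
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```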
