
Why not support the Tesla P100 instead of limiting compute capability to greater than 7.0? #1284

Closed
jianhuaz opened this issue Oct 7, 2023 · 8 comments

Comments


jianhuaz commented Oct 7, 2023

Bfloat16 is only supported on GPUs with compute capability of at least 8.0. Your Tesla P100-PCIE-16GB GPU has compute capability 6.0.
RuntimeError: GPUs with compute capability below 7.0 are not supported.


wasertech commented Oct 7, 2023

As the runtime error states, the P100 only has a compute capability of 6.0, which is below the minimum of 8.0 required for bfloat16 support. This means the Tesla P100 cannot use bfloat16 efficiently or correctly. By requiring a compute capability of 8.0 or higher for bfloat16, the software can ensure the GPU has the right hardware to use it without problems.
Note that fp16 only requires a compute capability of 7.0, but I'm afraid your GPU is simply not built to do such computation efficiently.
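For reference, a quick way to check what a GPU reports, using PyTorch; a minimal sketch, where the 7.0/8.0 thresholds simply mirror the error message above:

```python
import torch

# Query the CUDA compute capability of the default GPU; a Tesla P100 reports (6, 0).
major, minor = torch.cuda.get_device_capability()
print(f"compute capability: {major}.{minor}")
print("bf16 supported:", major >= 8)   # Ampere (8.0) and newer
print("fp16 supported:", major >= 7)   # Volta (7.0) and newer, per the error above
```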


jianhuaz commented Oct 8, 2023

But the P100 is widely used in school teaching, where its lower performance is not a problem. If vLLM is set aside during the teaching phase, and future company employees never learn vLLM well while at school, wouldn't that be a loss for vLLM?


jianhuaz commented Oct 8, 2023

Software is not only used in production environments. Before reaching production, it is used for teaching, for validating technical approaches, and so on.


jianhuaz commented Oct 8, 2023

Software like this targets the whole world, and many developing and underdeveloped countries also need good technology to develop their own countries.

@Harrison-cc

Add this parameter on the command line:
--dtype half
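For vLLM's offline Python API, the equivalent knob is the dtype argument; a minimal sketch, where the model name is just a small placeholder:

```python
from vllm import LLM, SamplingParams

# dtype="half" loads weights and runs compute in fp16 instead of bf16.
llm = LLM(model="facebook/opt-125m", dtype="half")  # placeholder model
outputs = llm.generate(["Hello, my name is"],
                       SamplingParams(temperature=0.8, max_tokens=32))
print(outputs[0].outputs[0].text)
```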


jianhuaz commented Oct 8, 2023

@Harrison-cc Y

jianhuaz closed this as completed Oct 8, 2023
@Harrison-cc

This means loading the model in fp16 (which the V100 supports), but I'm not sure if it performs the same as loading in bf16. fp16 has a narrower dynamic range than bf16 (fp16 has 5 exponent bits, bf16 has 8), though fp16 has more mantissa bits and so is finer-grained within that range.
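The trade-off is easy to see from the numeric limits PyTorch exposes; a small sketch:

```python
import torch

# fp16 spends its bits on the mantissa (precision), bf16 on the exponent (range).
for dtype in (torch.float16, torch.bfloat16):
    info = torch.finfo(dtype)
    print(f"{str(dtype):16s} max={info.max:.2e}  eps={info.eps:.2e}")
# torch.float16    max≈6.55e+04  eps≈9.77e-04  -> finer steps, small range
# torch.bfloat16   max≈3.39e+38  eps≈7.81e-03  -> coarser steps, fp32-like range
```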

@jasonacox (Contributor)

You can use --dtype float. I managed to get a Docker container of vLLM running Mistral on a system with four P100s. Details: #963 (comment)
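For completeness, a minimal sketch of what that setup could look like with the offline API; the model name and tensor_parallel_size=4 are assumptions matching the four-P100 description:

```python
from vllm import LLM

llm = LLM(
    model="mistralai/Mistral-7B-v0.1",  # placeholder Mistral checkpoint
    dtype="float",                      # full fp32, no half-precision kernels needed
    tensor_parallel_size=4,             # shard the model across the four P100s
)
print(llm.generate(["The capital of France is"])[0].outputs[0].text)
```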
