I recently learned about Whisper and was eager to try your CPU-based adaptation for the speed increase.
However, not being too rich, I have an older server running a Pentium G3220.
When I tried quantization on this CPU, the program quit without any message (even when running Python with -vvv).
If I use the model as-is, without weight quantization, the code runs fine (but slowly! ha).
I think the requirement is a CPU supporting AVX2: it works on my main PC (an i7-6700), and a friend also tried quantization on their server, which does not support AVX2, and it failed too (it took a while to figure out why).
So there you have it. Maybe there's a workaround for older CPUs? I did not find any mention of AVX2 being required in the documentation for torch's quantize_dynamic function.
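For context, here is essentially what I'm running: a plain call to torch's dynamic quantization entry point. The model below is a stand-in stack of Linear layers rather than Whisper itself, but it exercises the same code path that dies on my machine:

```python
import torch
import torch.nn as nn

# Stand-in model; the real one is Whisper, but any nn.Linear stack
# goes through the same dynamic-quantization path.
model = nn.Sequential(nn.Linear(80, 512), nn.ReLU(), nn.Linear(512, 512))

# Quantize the Linear layers' weights to int8; activations stay float.
# On the Pentium G3220 (no AVX2) this call is where the process exits
# silently; on the i7-6700 it completes normally.
qmodel = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

out = qmodel(torch.randn(1, 80))
print(out.shape)  # torch.Size([1, 512])
```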
Thanks!
Hello, thanks for your interest in the project! This page https://pytorch.org/docs/stable/quantization.html (section: Backend/Hardware Support) explains that dynamic quantization relies on "fbgemm", a library that accelerates vector operations on the CPU using AVX2 instructions. However, the docs seem to indicate that even without AVX2 it should still work, just a bit more slowly. If you're deploying this on your server for a production use case, or are committed to making this work on your CPU-only server, TensorRT may be more appropriate.
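One diagnostic worth running on the older machine (a sketch, not a guaranteed fix): PyTorch exposes which quantized-kernel backends a given build supports, and lets you switch the active engine. This would at least confirm whether "fbgemm" is present in your build; note that "qnnpack" mainly targets ARM, so switching to it is a long shot on an old x86 CPU:

```python
import torch

# List the quantized-kernel backends compiled into this PyTorch build.
engines = torch.backends.quantized.supported_engines
print(engines)  # e.g. ['none', 'fbgemm', ...] depending on the build

# Try a non-fbgemm engine if one is available (qnnpack mainly targets
# ARM, so it may not help on an old x86 CPU, but it rules fbgemm out).
if "qnnpack" in engines:
    torch.backends.quantized.engine = "qnnpack"
print(torch.backends.quantized.engine)
```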
Hello @MiscellaneousStuff ! Hope you are well.