Slower results from quantized model potentially due to warning prints #9
Comments
Are you getting this using the code from this repository, or are you running the dynamic quantisation in your own script?
Getting it from the repo (although I have since moved over to the faster_whisper repo based on CTranslate2, which seems to have working quantization).
That repo looks more developed than this one, so that is probably a good idea, especially if they have verified that their method has the same performance after optimisation. Good to see others also using quantisation. As for the warnings, I’ve found similar issues when using other Nvidia libraries like TensorRT, so I will still need to look into this one.
Hello, how do you fix the problem? I am using torch.quantization and also want to disable the printing.
I'm getting the following warning, which is printed roughly 1000 times during execution:
[W qlinear_dynamic.cpp:239] Warning: Currently, qnnpack incorrectly ignores reduce_range when it is set to true; this may change in a future release. (function apply_dynamic_impl)
I'm also getting results roughly 30% slower than with the non-quantized model, and I think the warnings might be the cause.
Any idea how to fix this?
(Just using Python's warning filters does not suppress them, for some reason.)
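One possible workaround, offered as a sketch rather than a confirmed fix: the `[W qlinear_dynamic.cpp:239]` prefix suggests the message is written by native code straight to the process's stderr instead of being raised through Python's `warnings` module, which would explain why warning filters have no effect. Under that assumption, stderr can be silenced at the file-descriptor level for the duration of the call. The helper name `suppress_native_stderr` is made up for this illustration and is not part of PyTorch or this repository.

```python
import contextlib
import os
import sys


@contextlib.contextmanager
def suppress_native_stderr():
    """Temporarily point file descriptor 2 (stderr) at os.devnull.

    Hypothetical helper: it hides everything written to stderr,
    including output from native libraries, so real errors are
    hidden too while the block is active.
    """
    sys.stderr.flush()                      # flush pending Python-level output first
    saved_fd = os.dup(2)                    # keep a copy of the real stderr
    devnull_fd = os.open(os.devnull, os.O_WRONLY)
    try:
        os.dup2(devnull_fd, 2)              # fd 2 now discards all writes
        yield
    finally:
        os.dup2(saved_fd, 2)                # restore the original stderr
        os.close(devnull_fd)
        os.close(saved_fd)
```

Wrapping only the inference call in `with suppress_native_stderr():` keeps the rest of the program's error output visible. This only hides the messages; whether printing them accounts for the roughly 30% slowdown would still need to be measured with and without the redirect.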