Slower results from quantized model potentially due to warning prints #9

Mijawel · 2023-02-16T05:36:47Z

I'm getting the following warning which prints probably 1000 times during execution:
[W qlinear_dynamic.cpp:239] Warning: Currently, qnnpack incorrectly ignores reduce_range when it is set to true; this may change in a future release. (function apply_dynamic_impl)

I'm also getting results maybe 30% slower than with the non-quantized model, and I think the repeated warning output might be the cause.

Any idea how to fix?

(Filtering warnings in Python does not suppress them, for some reason.)
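A note on the engine, offered as a sketch and not as a confirmed fix: the warning text names qnnpack, which PyTorch documents as the quantized backend aimed at ARM CPUs, while fbgemm is the one aimed at x86. If the script selects qnnpack on an x86 machine, that alone could plausibly explain both the warning and the slowdown. The snippet below prints the active engine and switches to another one if it is available. `pick_engine` and its preference order are illustrative assumptions, not part of PyTorch or of this repository; `torch.backends.quantized.engine` and `torch.backends.quantized.supported_engines` are the real attributes.

```python
def pick_engine(supported, preferred=("fbgemm", "x86", "qnnpack")):
    """Return the first preferred engine that is supported, or None.

    Hypothetical helper: the preference order is an assumption that
    favours the x86-oriented backends over qnnpack.
    """
    for name in preferred:
        if name in supported:
            return name
    return None


try:
    import torch
except ImportError:  # keeps the sketch importable on machines without PyTorch
    torch = None

if torch is not None:
    supported = torch.backends.quantized.supported_engines
    print("current engine:   ", torch.backends.quantized.engine)
    print("supported engines:", supported)

    choice = pick_engine(supported)
    if choice is not None:
        # Set the engine before quantizing the model, not afterwards.
        torch.backends.quantized.engine = choice
```

On an ARM machine (for example Apple Silicon) qnnpack may be the only supported engine, in which case this changes nothing and the warning would remain.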

MiscellaneousStuff · 2023-02-21T14:39:20Z

Are you getting this using the code from this repository or are you running the dynamic quantisation in your own script?

Mijawel · 2023-02-27T05:56:05Z

Getting it from the repo (although I have since moved over to the faster_whisper repo, which is based on CTranslate2 and seems to have working quantization).

MiscellaneousStuff · 2023-02-27T06:10:10Z

That repo looks more developed than this one, so that is probably a good idea, especially if they have verified that their method has the same performance after optimisation. Good to see others also using quantisation. As for the warnings, I’ve found similar issues when using other libraries such as NVIDIA's TensorRT, so I will still need to look into this one.

cehongwang · 2023-11-01T19:45:23Z

> I'm getting the following warning which prints probably 1000 times during execution:
> [W qlinear_dynamic.cpp:239] Warning: Currently, qnnpack incorrectly ignores reduce_range when it is set to true; this may change in a future release. (function apply_dynamic_impl)
>
> And I'm getting results maybe 30% slower than when running the non quantized model and I think it might be the cause.
>
> Any idea how to fix?
>
> (And just using filter warnings in python is not suppressing them for some reason)

Hello, how did you fix the problem? I am using torch.quantization and also want to disable the printing.
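One possible way to silence the output, as a sketch under assumptions: the `[W qlinear_dynamic.cpp:239]` prefix suggests the message is written straight to stderr by PyTorch's C++ side and does not pass through Python's `warnings` module, which would explain why a Python-level filter has no effect. If that is the case, redirecting file descriptor 2 at the OS level around the inference call should hide it. `suppress_native_stderr` is a name made up for this example and uses only the standard library.

```python
import contextlib
import os
import sys


@contextlib.contextmanager
def suppress_native_stderr():
    """Temporarily point file descriptor 2 (stderr) at os.devnull.

    Works at the OS level, so it also hides text written by native
    (C/C++) code that bypasses Python's sys.stderr object.
    """
    if sys.stderr is not None:
        sys.stderr.flush()          # do not lose text already buffered
    saved_fd = os.dup(2)            # remember where stderr pointed
    devnull_fd = os.open(os.devnull, os.O_WRONLY)
    try:
        os.dup2(devnull_fd, 2)      # stderr now goes to the null device
        yield
    finally:
        if sys.stderr is not None:
            sys.stderr.flush()      # discard anything buffered inside the block
        os.dup2(saved_fd, 2)        # restore the original stderr
        os.close(devnull_fd)
        os.close(saved_fd)
```

Wrap only the noisy call, for example `with suppress_native_stderr(): result = run_inference()`, where `run_inference` is a placeholder for your own code. Note that everything written to stderr inside the block is discarded, including genuine error messages from native libraries, so this is better suited to benchmarking than to leaving on permanently. Timing the same call with and without the wrapper would also show whether the printing really accounts for the slowdown.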
