How does PPQ perform real quantization and achieve speed up? #570

YixuanSeanZhou · 2024-08-08T02:19:39Z

Question

Looking at the forward call of QConv2D, the PPQ torch executor seems to use a fake quantization scheme, where the input and weight go through Q->DQ->Conv rather than Q->INT8_Conv->DQ.
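To make the distinction concrete, here is a minimal sketch of the two execution orders. It is plain Python, not PPQ code: the helper names (`quantize`, `dequantize`, `conv1d_valid`, `fake_quant_conv`, `real_quant_conv`), the 1-D convolution, and the symmetric per-tensor INT8 scales are all assumptions made for illustration.

```python
# Illustrative sketch only: symmetric per-tensor INT8, hypothetical helpers, not PPQ's API.

def quantize(values, scale):
    """Q: map floats to INT8 codes (round to nearest, clamp to [-128, 127])."""
    return [max(-128, min(127, round(v / scale))) for v in values]

def dequantize(codes, scale):
    """DQ: map integer codes back to floats."""
    return [c * scale for c in codes]

def conv1d_valid(x, w):
    """Direct 1-D 'valid' correlation; works on floats or ints."""
    n = len(x) - len(w) + 1
    return [sum(x[i + k] * w[k] for k in range(len(w))) for i in range(n)]

def fake_quant_conv(x, w, x_scale, w_scale):
    """Q -> DQ -> Conv: the multiply-accumulate still runs in floating point."""
    x_fq = dequantize(quantize(x, x_scale), x_scale)
    w_fq = dequantize(quantize(w, w_scale), w_scale)
    return conv1d_valid(x_fq, w_fq)

def real_quant_conv(x, w, x_scale, w_scale):
    """Q -> INT8_Conv -> DQ: integer multiply-accumulate, one rescale at the end."""
    acc = conv1d_valid(quantize(x, x_scale), quantize(w, w_scale))  # integer accumulators
    return dequantize(acc, x_scale * w_scale)

x = [0.50, -1.20, 0.33, 2.00, -0.75]
w = [0.25, -0.50, 0.10]
x_scale, w_scale = 2.0 / 127, 0.5 / 127  # assumed calibration ranges

fake = fake_quant_conv(x, w, x_scale, w_scale)
real = real_quant_conv(x, w, x_scale, w_scale)
max_diff = max(abs(a - b) for a, b in zip(fake, real))
```

In this sketch the two paths agree up to floating-point rounding, so the fake-quantized path reproduces the numerics of the integer path. Any speedup would have to come from the second form, where the accumulation itself runs on integers.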

I wonder whether PPQ has an implementation in which the Q/DQ nodes are resolved and real quantized kernels are invoked. If so, could you please provide a code pointer?

Thanks in advance.
