How should I verify the speedup effect of the algorithm? #38

moonlightian · 2023-07-20T10:52:47Z

Hi~ Thank you for your great work! It seems that GPTQ should lead to significant speedups for end-to-end inference. But after quantizing BLOOM-7B to INT8 with GPTQ, I found that it runs about twice as slow as the FP16 model. How can I reproduce the speedup shown in the paper?
