
rotation+gptq data #20

Open
Andy0422 opened this issue Oct 11, 2024 · 7 comments

Comments

@Andy0422

Hi,

Can you share the rotation+gptq PPL data? Is it better than smoothquant+gptq? Many thanks!

@HandH1998
Owner

Refer to #13 (comment).
In my practice, rotation+gptq is generally better than smooth+gptq for per-channel quantization. However, this is not the case for some models; see #17.

@Andy0422
Author

Andy0422 commented Oct 14, 2024

@HandH1998

Hi, thank you for your kind help.
I ran into another problem, this time with the calibration data.

From my test results below, the wikitext2 results look fine, but the results with the pile calibration dataset do not match your original data. The pile data I used is from https://huggingface.co/datasets/mit-han-lab/pile-val-backup/tree/main.
Could you share your pile dataset with me, or comment on this finding? Email: [email protected]

| Granularity | Method | Llama-2 | Wikitext2 | Pile | Paper data |
| --- | --- | --- | --- | --- | --- |
| per-channel | smooth+gptq | 7B | 5.98 | 6.14 | 5.95 |
| per-group | smooth+gptq | | 5.71 | 5.78 | 5.71 |
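(For reference on how the numbers above are usually defined, not this repo's evaluation script: perplexity is the exponential of the mean per-token negative log-likelihood over the eval set. A minimal sketch:)

```python
import math

def perplexity(token_nlls):
    """Perplexity from per-token negative log-likelihoods (natural log):
    exp of the mean NLL over the whole eval set. Lower is better."""
    assert token_nlls, "need at least one token"
    return math.exp(sum(token_nlls) / len(token_nlls))
```

Small dataset-to-dataset shifts (e.g. wikitext2 vs. pile calibration) show up directly as shifts in this mean NLL.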

@HandH1998
Owner

@Andy0422 In our paper we used pile for smoothing and wikitext2 for gptq. The current code fixes this inconsistency and uses the same dataset for both smoothing and gptq, so it is expected that you cannot reproduce the paper's results with the latest code. It is not related to the pile data itself.

@Andy0422
Author

@HandH1998
Okay, I see. So do you think our test results are correct? Thank you!

@HandH1998
Owner

@Andy0422 It is probably correct.

@Andy0422
Author


@HandH1998 One more question: do you employ the online Hadamard transform before down_proj, or do you skip all online transforms in your implementation? If you do use it, have you evaluated the inference overhead? Thanks~

@HandH1998
Owner

@Andy0422 I don't employ the online Hadamard transform.
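(For context on what the online transform being asked about would involve — a sketch of the general technique, not this repo's code: rotation methods in the QuaRot style apply an orthonormal Walsh-Hadamard transform to the down_proj input at inference time. Because the transform is orthogonal, it preserves norms and its inverse can be folded into the weights offline, so the runtime cost is only the transform itself, roughly O(n log n) per token:)

```python
import math

def fwht(vec):
    """Orthonormal fast Walsh-Hadamard transform of a list whose length
    is a power of two. Scaled by 1/sqrt(n), so the transform is its own
    inverse: applying it twice recovers the input."""
    n = len(vec)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    a = list(vec)
    h = 1
    while h < n:
        # butterfly step: combine pairs (j, j+h) within blocks of size 2h
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                x, y = a[j], a[j + h]
                a[j], a[j + h] = x + y, x - y
        h *= 2
    scale = math.sqrt(n)
    return [v / scale for v in a]
```

The orthogonality is what makes the trick work: quantizing the rotated activations changes the error distribution (spreading outliers across channels) without changing the layer's exact-arithmetic output.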
