
No reduction in model size #15

Open
Sekri0 opened this issue Oct 1, 2024 · 5 comments

Comments

@Sekri0 commented Oct 1, 2024

I used this command to quantize the llama2-7b-chat model, but the model size doesn't change:
```
CUDA_VISIBLE_DEVICES=0 python3 main.py \
    --model /mnt/home/model/llama2-7b-chat-hf \
    --epochs 20 --output_dir ./log/llama2-7b-w2a8 \
    --eval_ppl --wbits 2 --abits 8 --lwc --let \
    --tasks piqa,arc_easy,arc_challenge,boolq,hellaswag,winogrande \
    --real_quant \
    --save_dir /mnt/home/model/abq-llm/llama2-7b-w2a8
```

@zengchao0424

Hello, the weights obtained this way are the calibrated fake-quantized weights. To achieve actual weight compression, a packing operation is required when storing the weights. For example, with INT2 quantization, the packing operation uses one INT32 to store 16 INT2 values.
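
For illustration, here is a minimal NumPy sketch of that packing scheme (not the repository's actual implementation; the function names `pack_int2`/`unpack_int2` are made up for this example). Each INT32 word holds 16 two-bit codes:

```python
import numpy as np

def pack_int2(values: np.ndarray) -> np.ndarray:
    """Pack a flat array of 2-bit codes (0..3) into int32 words, 16 per word."""
    assert values.size % 16 == 0
    v = values.reshape(-1, 16).astype(np.uint32)
    packed = np.zeros(v.shape[0], dtype=np.uint32)
    for i in range(16):
        packed |= v[:, i] << np.uint32(2 * i)  # code i occupies bits [2i, 2i+1]
    return packed.view(np.int32)

def unpack_int2(packed: np.ndarray) -> np.ndarray:
    """Inverse of pack_int2: recover the 16 2-bit codes from each word."""
    p = packed.view(np.uint32)
    return np.stack([(p >> np.uint32(2 * i)) & np.uint32(0x3)
                     for i in range(16)], axis=1).reshape(-1)

# Round trip: 32 random 2-bit codes -> 2 int32 words -> back again.
w = np.random.randint(0, 4, size=32)
assert np.array_equal(unpack_int2(pack_int2(w)), w)
```

With this layout the stored weight tensor shrinks 8x relative to FP16, plus a small overhead for the scales and zero-points.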

@lswzjuer (Collaborator) commented Oct 3, 2024

We are developing a complete pipeline that converts pseudo-quantized models into real packed weights and runs WxAy quantized inference directly in Torch; we expect to release it within a week after the National Day holiday. We had not released this pipeline earlier because the engine used inside ByteDance is a pure C++ solution.
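
For a rough idea of what such a Torch path can look like (purely a sketch under assumed tensor layouts, not the ABQ-LLM API or its released kernels): unpack the INT2 codes, dequantize with per-output-channel scales and zero-points, and run a regular matmul.

```python
import torch

def int2_linear(x: torch.Tensor,
                packed_w: torch.Tensor,  # int32, shape (out_features, in_features // 16)
                scales: torch.Tensor,    # float, shape (out_features, 1)
                zeros: torch.Tensor      # float, shape (out_features, 1)
                ) -> torch.Tensor:
    # Reinterpret the int32 words as unsigned and extract the 16 2-bit codes,
    # matching the bit layout of the packing sketch above.
    p = packed_w.to(torch.int64) & 0xFFFFFFFF
    codes = torch.stack([(p >> (2 * i)) & 0x3 for i in range(16)], dim=-1)
    w_q = codes.reshape(packed_w.shape[0], -1).float()  # (out_features, in_features)
    # Dequantize per output channel, then do a normal floating-point matmul.
    w = (w_q - zeros) * scales
    return x @ w.t()
```

A real deployment would fuse the unpacking and accumulation in a GPU kernel; this floating-point fallback only demonstrates the storage format and should reproduce the fake-quantized model's outputs.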

@Sekri0 (Author) commented Oct 15, 2024

> We are developing a complete pipeline that converts pseudo-quantized models into real packed weights and runs WxAy quantized inference directly in Torch; we expect to release it within a week after the National Day holiday. We had not released this pipeline earlier because the engine used inside ByteDance is a pure C++ solution.

Thank you for your work. May I ask when you are planning to release the code for real weight packing and Torch inference?

@lswzjuer (Collaborator) commented Oct 17, 2024 via email

@limertang

@lswzjuer
Did you release the code?
I quantized the model, and the model size in save_dir is still the same as before.
