Prerequisites
I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
Qwen 2 (and similarly shaped) models can't use the Q5_K and Q6_K ggml_types to quantize ffn_down, because its row size is not a multiple of the 256-weight super-blocks the K-quants require.
This forces a fallback to Q5_1, which is now suboptimal and sometimes subpar to Q5_0, or to Q8_0, which is much bigger.
Ikawrakow published a while ago a Q6_0 type that can quantize such irregularly shaped tensors, and it would be a great alternative to either Q5_0 or Q8_0.
After all, ffn_down represents something like 25% of the weights of a layer, so for Qwen 2 models the fallback costs 0.4 to 0.5 bpw on a Q6_K quantized model. In Q5_K_S or Q5_K_M, the Q5_1 fallback tensors are likewise bigger than Q5_K, with lower quality.
In all these cases, a Q6_0 ggml_type would be more appropriate.
Ikawrakow has already made one on his fork. It'd be great to see such a quant in llama.cpp mainline.
Motivation
Better ratio quality/size for models with irregularly shaped tensors.
Possible Implementation
Port and merge IK's Q6_0 ggml_type.
Or
Develop and implement an equivalent here.