Feature Request: Q6_0 quant #10848

Closed
4 tasks done
Nexesenex opened this issue Dec 16, 2024 · 1 comment
Labels
enhancement New feature or request stale

Comments

@Nexesenex
Contributor

Nexesenex commented Dec 16, 2024

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

Qwen 2 models cannot use the Q5_K and Q6_K ggml_types to quantize ffn_down because of the tensor's irregular shape.
Quantization therefore falls back to Q5_1, which is now suboptimal and sometimes worse than Q5_0, or to Q8_0, which is much bigger.
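
A minimal sketch of the constraint described above, assuming the usual ggml block sizes (256-weight super-blocks for k-quants, 32-weight blocks for Q5_0, Q5_1 and Q8_0). The function name `pick_type`, the `FALLBACK` table, and the row sizes are hypothetical and only mirror the fallbacks mentioned in this issue; this is not the actual llama.cpp code.

```python
# Block sizes: k-quants pack weights in 256-element super-blocks,
# while Q5_0, Q5_1 and Q8_0 use 32-element blocks.
QK_K = 256
QK_LEGACY = 32

# Fallbacks as described in this issue (hypothetical table for illustration).
FALLBACK = {"Q5_K": "Q5_1", "Q6_K": "Q8_0"}


def pick_type(requested: str, row_size: int) -> str:
    """Return the quant type usable for rows holding `row_size` weights."""
    if requested not in FALLBACK:
        return requested  # not a k-quant covered by this sketch
    if row_size % QK_K == 0:
        return requested  # regular shape: the k-quant fits
    if row_size % QK_LEGACY != 0:
        raise ValueError(f"row size {row_size} is not a multiple of {QK_LEGACY}")
    return FALLBACK[requested]  # irregular shape: 32-weight block type


# Illustrative row sizes: 4096 is a multiple of 256, 29568 only of 32.
for rows in (4096, 29568):
    print(rows, pick_type("Q5_K", rows), pick_type("Q6_K", rows))
```

A Q6_0 type with 32-weight blocks would slot into the `FALLBACK` table in place of Q8_0 (and possibly Q5_1).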

A while ago, Ikawrakow published a Q6_0 quant that can quantize irregularly shaped tensors; it would be a great alternative to either Q5_0 or Q8_0.

After all, ffn_down accounts for roughly 25% of a layer's weights, so for Qwen 2 models the fallback to Q8_0 adds 0.4 to 0.5 bpw to a Q6_K-quantized model. In Q5_K_S or Q5_K_M, the fallback Q5_1 tensors are likewise bigger than Q5_K ones, with lower quality.
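
The size estimate above can be reproduced with a short back-of-the-envelope calculation. The bits-per-weight values follow from the ggml block layouts (bytes per block divided by weights per block); the Q6_0 entry is an assumption (one fp16 scale plus 32 six-bit values per block), and the 25% share is the rough figure from this issue. The helper `extra_bpw` is hypothetical.

```python
# Bits per weight = bytes per block * 8 / weights per block.
BPW = {
    "Q5_0": 22 * 8 / 32,    # 5.5
    "Q5_1": 24 * 8 / 32,    # 6.0
    "Q8_0": 34 * 8 / 32,    # 8.5
    "Q5_K": 176 * 8 / 256,  # 5.5
    "Q6_K": 210 * 8 / 256,  # 6.5625
    "Q6_0": 26 * 8 / 32,    # 6.5 (ASSUMED layout: fp16 scale + 32 x 6 bits)
}

FFN_DOWN_SHARE = 0.25  # rough share of a layer's weights, per the issue text


def extra_bpw(intended: str, actual: str, share: float = FFN_DOWN_SHARE) -> float:
    """Extra bits per weight, averaged over the layer, when ffn_down
    is stored as `actual` instead of `intended`."""
    return share * (BPW[actual] - BPW[intended])


print(extra_bpw("Q6_K", "Q8_0"))  # Q6_K model, ffn_down falls back to Q8_0
print(extra_bpw("Q5_K", "Q5_1"))  # Q5_K model, ffn_down falls back to Q5_1
print(extra_bpw("Q6_K", "Q6_0"))  # Q6_K model, ffn_down stored as Q6_0
```

Under these assumptions the Q8_0 fallback costs about 0.48 bpw, in line with the 0.4 to 0.5 bpw quoted above, while a Q6_0 fallback would be roughly size-neutral relative to Q6_K.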

In all these cases, a Q6_0 ggml_type would be more appropriate.

Ikawrakow has already implemented one on his fork. It would be great to see such a quant in llama.cpp mainline.

Motivation

Better quality/size ratio for models with irregularly shaped tensors.

Possible Implementation

Factor and merge IK's Q6_0 ggml_type, or develop and implement an equivalent here.

@Nexesenex Nexesenex added the enhancement New feature or request label Dec 16, 2024
@github-actions github-actions bot added the stale label Jan 16, 2025
Contributor

This issue was closed because it has been inactive for 14 days since being marked as stale.
