
[Badcase]: Gibberish output of Qwen2.5-3B-Instruct with Q2_K quantization, Llama.cpp #1237

Open · 4 tasks done
simmonssong opened this issue Mar 17, 2025 · 3 comments

@simmonssong

Model Series

Qwen2.5

What are the models used?

Qwen2.5-3B-Instruct

What is the scenario where the problem happened?

Qwen2.5-3B-Instruct with Q2_K quantization, Llama.cpp

Is this badcase known and can it be solved using available techniques?

  • I have followed the GitHub README.
  • I have checked the Qwen documentation and cannot find a solution there.
  • I have checked the documentation of the related framework and cannot find useful information.
  • I have searched the issues and there is not a similar one.

Information about environment

  • OS: Windows 10 & 11
  • Python: 3.13
  • Device: CPU

Description

Steps to reproduce

I tried converting Qwen2.5-3B-Instruct to Q2_K quantization on two different machines. The output of the compressed model is always nonsense:
,“||9"363的76...5 31367244一246“),).请-264“3))-64))5761595431636843467435565846"):4843)"),\n5353"34“ ), 3\"6)) the"24\n\n964

It seems that this only happens with Q2_K.

  • Platforms:
    Windows 10 with llama.cpp build b4846.
    Windows 11 with llama.cpp build b4520.

  • Original model:
    https://huggingface.co/Qwen/Qwen2.5-3B-Instruct

  • Conversion script:
    python convert_hf_to_gguf.py ***\Qwen2.5-3B-Instruct --outfile ***\Qwen2.5-3B-Instruct-FP16.gguf

  • Quantization script:
    llama-quantize.exe ***\Qwen2.5-3B-Instruct-FP16.gguf ***\Qwen2.5-3B-Instruct-Q2_K.gguf Q2_K

  • Model testing script:
    llama-cli.exe -m ***\Qwen2.5-3B-Instruct-Q2_K.gguf

Expected results

The model is expected to produce ordinary, coherent text.

Attempts to fix

I have tried several ways to fix this, but none helped, including:

  1. Using different machines and different versions of llama.cpp.
  2. Using different Qwen models, e.g., 7B-Instruct, 1.5B-Instruct, 14B-Instruct.
  3. Using different llama.cpp quantization types, e.g., Q8_0, Q5_0, Q4_0, Q3_K, Q4_K, Q5_K, Q6_K.

Anything else helpful for investigation

None.

@jklj077 (Collaborator) commented Mar 17, 2025

You should provide an importance matrix for Q2_K. Refer to the documentation for how to do that: https://qwen.readthedocs.io/en/latest/run_locally/llama.cpp.html (some parts are a little outdated).
Alternatively, use other quant types such as IQ2_XS or IQ2_XXS; see https://www.github.com/ggml-org/llama.cpp/pull/4897 for reference (an importance matrix is also needed there).

Otherwise, use the ready-made Q2_K quants from https://huggingface.co/Qwen/Qwen2.5-3B-Instruct-GGUF or others.
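
For reference, a minimal sketch of that workflow with llama.cpp's llama-imatrix tool, assuming a recent build where the binary has that name and with calibration.txt standing in for any representative text file (check llama-imatrix --help on your build, since options have changed across versions):

    llama-imatrix.exe -m ***\Qwen2.5-3B-Instruct-FP16.gguf -f calibration.txt -o imatrix.dat
    llama-quantize.exe --imatrix imatrix.dat ***\Qwen2.5-3B-Instruct-FP16.gguf ***\Qwen2.5-3B-Instruct-Q2_K.gguf Q2_K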

@simmonssong (Author)

The importance matrix is optional for Q2_K quantization. I don't think the gibberish output is caused by the missing importance matrix: an importance matrix lowers perplexity somewhat, but the perplexity of gibberish output seems to be effectively infinite.
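
That should be checkable, e.g. with llama.cpp's llama-perplexity tool, assuming a test corpus such as wikitext-2's wiki.test.raw is available (a usable Q2_K quant should score a finite, if elevated, perplexity, while a broken one diverges):

    llama-perplexity.exe -m ***\Qwen2.5-3B-Instruct-Q2_K.gguf -f wiki.test.raw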

@simmonssong (Author)

BTW, computing the importance matrix is too slow on a laptop. Do you have official importance matrices?
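
(Even capping the amount of calibration data processed, e.g. with the --chunks option if the build supports it, it still takes a long time on CPU:

    llama-imatrix.exe -m ***\Qwen2.5-3B-Instruct-FP16.gguf -f calibration.txt -o imatrix.dat --chunks 100)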
