Misc. bug: llama 3.2 3B: Q4-quantized model keeps generating repeated output without hitting [end of text] #10824
Comments
@ngxson, I tried with -c 4096 as well, but got the same result: ./llama-cli -m ./output_file_Llama_3_2_3b_Q4_K_M.gguf -c 4096 -p "<|begin_of_text|>The color of the sky is blue but sometimes it can also be"
It could also be due to the model itself, i.e. are you using the correct non-instruct / instruct model? (This depends on your requirement.) Non-instruct models are usually less deterministic, as there is no public source of info detailing how they should behave.
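If the goal is chat-style output that stops cleanly at an end-of-generation token, one hedged suggestion is to try the instruct variant in conversation mode, so the chat template and the <|eot_id|> stop token are applied automatically (the instruct GGUF filename below is hypothetical):
./llama-cli -m ./Llama-3.2-3B-Instruct-Q4_K_M.gguf -c 4096 -cnv -p "You are a helpful assistant."
Here -cnv enables conversation mode and -p supplies the system prompt; with a base (non-instruct) model, open-ended continuation of the prompt is largely expected behavior.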
Name and Version
build: 4274 (7736837) with cc (Ubuntu 9.4.0-1ubuntu1~20.04.3) 9.4.0 for x86_64-linux-gnu
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
No response
Problem description & steps to reproduce
./llama-cli -m /output_file_Llama_3_2_3b_Q4_K_M.gguf -c 512 -p "<|begin_of_text|>The color of the sky is blue but sometimes it can also be"
Output:
build: 4274 (7736837) with cc (Ubuntu 9.4.0-1ubuntu1~20.04.3) 9.4.0 for x86_64-linux-gnu
main: llama backend init
main: load the model and apply lora adapter, if any
llama_model_loader: loaded meta data with 29 key-value pairs and 255 tensors from /storage_data/snap/ramees/LLAMA_CPP/output_file_Llama_3_2_3b_Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = Llama 3.2 3B
llama_model_loader: - kv 3: general.basename str = Llama-3.2
llama_model_loader: - kv 4: general.size_label str = 3B
llama_model_loader: - kv 5: general.license str = llama3.2
llama_model_loader: - kv 6: general.tags arr[str,6] = ["facebook", "meta", "pytorch", "llam...
llama_model_loader: - kv 7: general.languages arr[str,8] = ["en", "de", "fr", "it", "pt", "hi", ...
llama_model_loader: - kv 8: llama.block_count u32 = 28
llama_model_loader: - kv 9: llama.context_length u32 = 131072
llama_model_loader: - kv 10: llama.embedding_length u32 = 3072
llama_model_loader: - kv 11: llama.feed_forward_length u32 = 8192
llama_model_loader: - kv 12: llama.attention.head_count u32 = 24
llama_model_loader: - kv 13: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 14: llama.rope.freq_base f32 = 500000.000000
llama_model_loader: - kv 15: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 16: llama.attention.key_length u32 = 128
llama_model_loader: - kv 17: llama.attention.value_length u32 = 128
llama_model_loader: - kv 18: general.file_type u32 = 15
llama_model_loader: - kv 19: llama.vocab_size u32 = 128256
llama_model_loader: - kv 20: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 21: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 22: tokenizer.ggml.pre str = llama-bpe
llama_model_loader: - kv 23: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 24: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 25: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv 26: tokenizer.ggml.bos_token_id u32 = 128000
llama_model_loader: - kv 27: tokenizer.ggml.eos_token_id u32 = 128001
llama_model_loader: - kv 28: general.quantization_version u32 = 2
llama_model_loader: - type f32: 58 tensors
llama_model_loader: - type q4_K: 168 tensors
llama_model_loader: - type q6_K: 29 tensors
llm_load_vocab: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
llm_load_vocab: special tokens cache size = 256
llm_load_vocab: token to piece cache size = 0.7999 MB
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 128256
llm_load_print_meta: n_merges = 280147
llm_load_print_meta: vocab_only = 0
llm_load_print_meta: n_ctx_train = 131072
llm_load_print_meta: n_embd = 3072
llm_load_print_meta: n_layer = 28
llm_load_print_meta: n_head = 24
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_swa = 0
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 3
llm_load_print_meta: n_embd_k_gqa = 1024
llm_load_print_meta: n_embd_v_gqa = 1024
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 8192
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_ctx_orig_yarn = 131072
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: ssm_dt_b_c_rms = 0
llm_load_print_meta: model type = 3B
llm_load_print_meta: model ftype = Q4_K - Medium
llm_load_print_meta: model params = 3.21 B
llm_load_print_meta: model size = 1.87 GiB (5.01 BPW)
llm_load_print_meta: general.name = Llama 3.2 3B
llm_load_print_meta: BOS token = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token = 128001 '<|end_of_text|>'
llm_load_print_meta: EOT token = 128009 '<|eot_id|>'
llm_load_print_meta: EOM token = 128008 '<|eom_id|>'
llm_load_print_meta: LF token = 128 'Ä'
llm_load_print_meta: EOG token = 128001 '<|end_of_text|>'
llm_load_print_meta: EOG token = 128008 '<|eom_id|>'
llm_load_print_meta: EOG token = 128009 '<|eot_id|>'
llm_load_print_meta: max token length = 256
llm_load_tensors: CPU_Mapped model buffer size = 1918.35 MiB
.....................................................................................
llama_new_context_with_model: n_seq_max = 1
llama_new_context_with_model: n_ctx = 512
llama_new_context_with_model: n_ctx_per_seq = 512
llama_new_context_with_model: n_batch = 512
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base = 500000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: n_ctx_per_seq (512) < n_ctx_train (131072) -- the full capacity of the model will not be utilized
llama_kv_cache_init: CPU KV buffer size = 56.00 MiB
llama_new_context_with_model: KV self size = 56.00 MiB, K (f16): 28.00 MiB, V (f16): 28.00 MiB
llama_new_context_with_model: CPU output buffer size = 0.49 MiB
llama_new_context_with_model: CPU compute buffer size = 256.50 MiB
llama_new_context_with_model: graph nodes = 902
llama_new_context_with_model: graph splits = 1
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
main: llama threadpool init, n_threads = 20
system_info: n_threads = 20 (n_threads_batch = 20) / 40 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | AVX512 = 1 | AVX512_VNNI = 1 | LLAMAFILE = 1 | OPENMP = 1 | AARCH64_REPACK = 1 |
check_double_bos_eos: Added a BOS token to the prompt as specified by the model but the prompt also starts with a BOS token. So now the final prompt starts with 2 BOS tokens. Are you sure this is what you want?
sampler seed: 4177316350
sampler params:
repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
dry_multiplier = 0.000, dry_base = 1.750, dry_allowed_length = 2, dry_penalty_last_n = -1
top_k = 40, top_p = 0.950, min_p = 0.050, xtc_probability = 0.000, xtc_threshold = 0.100, typical_p = 1.000, temp = 0.800
mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampler chain: logits -> logit-bias -> penalties -> dry -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist
generate: n_ctx = 512, n_batch = 2048, n_predict = -1, n_keep = 1
The color of the sky is blue but sometimes it can also be a mixture of blue and white. The same is the case with the color of the sky which is also a mixture of blue and white. But what exactly is the color of the sky? It is the color of the sky that is the color of the sky.
In this article we will discuss the color of the sky. The color of the sky is the color of the sky but sometimes it can also be a mixture of blue and white. The color of the sky is the color of the sky but sometimes it can also be a mixture of blue and white.
The color of the sky is the color of the sky but sometimes it can also be a mixture of blue and white. The color of the sky is the color of the sky but sometimes it can also be a mixture of blue and white. The color of the sky is the color of the sky but sometimes it can also be a mixture of blue and white.
The color of the sky is the color of the sky but sometimes it can also be a mixture of blue and white. The color of the sky is the color of the sky but sometimes it can also be a mixture of blue and white. The color of the sky is the color of the sky but sometimes it can also be a mixture of blue and white.
The color of the sky is the color of the sky but sometimes it can also be a mixture of blue and white. The color of the sky is the color of the sky but sometimes it can also be a mixture of blue and white. The color of the sky is the color of the sky but sometimes it can also be a mixture of blue and white.
The color of the sky is the color of the sky but sometimes it can also be a mixture of blue and white. The color of the sky is the color of the sky but sometimes it can also be a mixture of blue and white. The color of the sky is the color of the sky but sometimes it can also be a mixture of blue and white.
The color of the sky is the color of the sky but sometimes it can also be a mixture of blue and white. The color of the sky is the color of the sky but sometimes it can also be a mixture of blue and white. The color of the sky is the color of the sky but sometimes it can also be a mixture of blue and white.
The color of the sky is the color of the sky but sometimes it can also be a mixture of blue and white. The color of the sky is the color of the sky but sometimes it can also be a mixture of blue and white. The color of the sky is the color of the sky but sometimes it can also be a mixture of blue and white.
Issue: the same sentence is generated again and again without hitting [end of text].
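Two things in the log above may be worth noting. First, check_double_bos_eos warns that the prompt starts with two BOS tokens, since llama-cli already prepends <|begin_of_text|>; the explicit token in -p can be dropped. Second, the sampler params show repeat_penalty = 1.000 and dry_multiplier = 0.000, i.e. both repetition countermeasures are disabled. A minimal workaround sketch (the penalty values are illustrative, not tuned):
./llama-cli -m ./output_file_Llama_3_2_3b_Q4_K_M.gguf -c 4096 -p "The color of the sky is blue but sometimes it can also be" --repeat-penalty 1.1 --dry-multiplier 0.8
This does not address any root cause; with a base model there is no guarantee that <|end_of_text|> is ever sampled.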
First Bad Commit
No response
Relevant log output
No response