Can't evaluate a gguf model using llama.cpp as inference framework #2525

SurviiingZc opened this issue Nov 29, 2024 · 2 comments

SurviiingZc commented Nov 29, 2024

I want to evaluate the accuracy of a GGUF model using llama.cpp as the inference framework.

I use these commands:

./llama-server -m /root/ICAS_test/models/Qwen-1_8B-Q8_0.gguf

lm_eval --model gguf --model_args base_url=http://localhost:8080 --tasks piqa --device cpu

with the output:

2024-11-29:23:02:13,182 INFO [__main__.py:279] Verbosity set to INFO
2024-11-29:23:02:19,167 INFO [__main__.py:376] Selected Tasks: ['piqa']
2024-11-29:23:02:19,168 INFO [evaluator.py:164] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234
2024-11-29:23:02:19,169 INFO [evaluator.py:201] Initializing gguf model, with arguments: {'base_url': 'http://localhost:8080'}
2024-11-29:23:04:15,092 INFO [task.py:415] Building contexts for piqa on rank 0...
100%|██████████████████████████████████████████████████████████████████████████████| 1838/1838 [00:01<00:00, 1190.89it/s]
2024-11-29:23:04:16,698 INFO [evaluator.py:494] Running loglikelihood requests
0%| | 0/3676 [00:00<?, ?it/s]2024-11-29:23:04:17,085 ERROR [gguf.py:98] Invalid response for loglikelihood. Response: {'content': ' The', 'id_slot': 0, 'stop': True, 'model': '/root/ICAS_test/models/Qwen-1_8B-Q8_0.gguf', 'tokens_predicted': 1, 'tokens_evaluated': 56, 'generation_settings': {'n_ctx': 8192, 'n_predict': 50, 'model': '/root/ICAS_test/models/Qwen-1_8B-Q8_0.gguf', 'seed': 4294967295, 'seed_cur': 4294967295, 'temperature': 0.0, 'dynatemp_range': 0.0, 'dynatemp_exponent': 1.0, 'top_k': 40, 'top_p': 0.949999988079071, 'min_p': 0.05000000074505806, 'tfs_z': 1.0, 'typical_p': 1.0, 'repeat_last_n': 64, 'repeat_penalty': 1.0, 'presence_penalty': 0.0, 'frequency_penalty': 0.0, 'mirostat': 0, 'mirostat_tau': 5.0, 'mirostat_eta': 0.10000000149011612, 'penalize_nl': False, 'stop': [], 'max_tokens': 1, 'n_keep': 0, 'n_discard': 0, 'ignore_eos': False, 'stream': False, 'n_probs': 0, 'min_keep': 0, 'grammar': '', 'samplers': ['top_k', 'tfs_z', 'typ_p', 'top_p', 'min_p', 'temperature']}, 'prompt': "Question: How do I ready a guinea pig cage for it's new occupants?\nAnswer: Provide the guinea pig with a cage full of a few inches of bedding made of ripped paper strips, you will also need to supply it with a water bottle and a food dish.", 'truncated': False, 'stopped_eos': False, 'stopped_word': False, 'stopped_limit': True, 'stopping_word': '', 'tokens_cached': 56, 'timings': {'prompt_n': 56, 'prompt_ms': 383.06, 'prompt_per_token_ms': 6.840357142857143, 'prompt_per_second': 146.1911972014828, 'predicted_n': 1, 'predicted_ms': 0.022, 'predicted_per_token_ms': 0.022, 'predicted_per_second': 45454.545454545456}, 'index': 0}
0%| | 0/3676 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/root/anaconda3/bin/lm_eval", line 8, in
sys.exit(cli_evaluate())
^^^^^^^^^^^^^^
File "/root/ICAS_test/evaluate/lm-evaluation-harness/lm_eval/main.py", line 382, in cli_evaluate
results = evaluator.simple_evaluate(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/ICAS_test/evaluate/lm-evaluation-harness/lm_eval/utils.py", line 397, in _wrapper
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/root/ICAS_test/evaluate/lm-evaluation-harness/lm_eval/evaluator.py", line 301, in simple_evaluate
results = evaluate(
^^^^^^^^^
File "/root/ICAS_test/evaluate/lm-evaluation-harness/lm_eval/utils.py", line 397, in _wrapper
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/root/ICAS_test/evaluate/lm-evaluation-harness/lm_eval/evaluator.py", line 505, in evaluate
resps = getattr(lm, reqtype)(cloned_reqs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/ICAS_test/evaluate/lm-evaluation-harness/lm_eval/models/gguf.py", line 101, in loglikelihood
assert False
^^^^^
AssertionError

Why can't it run the loglikelihood requests, and why is 'predicted_n' just 1?

baberabb (Contributor) commented Nov 30, 2024

Hi! It looks like llama-server doesn't support an echo-like parameter (we need the log-probs of the prompt tokens), so it can't be used with loglikelihood tasks.
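
For reference, a rough sketch of a possible workaround (unverified): as far as I can tell, the harness's gguf adapter was written against an OpenAI-style completions endpoint that can return log-probs for the prompt tokens, which llama-cpp-python's server can provide. Something along these lines might work, assuming the server exposes a --logits_all option and listens on its default port 8000 (check the llama-cpp-python server docs):

pip install 'llama-cpp-python[server]'
# --logits_all is an assumption here: prompt log-probs require keeping logits for every prompt token
python -m llama_cpp.server --model /root/ICAS_test/models/Qwen-1_8B-Q8_0.gguf --logits_all True
# point the harness at this server instead of llama-server
lm_eval --model gguf --model_args base_url=http://localhost:8000 --tasks piqa --device cpu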

SurviiingZc (Author) commented

Thanks,

After using lm_eval, the output of llama-server looks like this:

main: server is listening on 127.0.0.1:8080 - starting the main loop
srv update_slots: all slots are idle
slot launch_slot_: id 0 | task 0 | processing task
slot update_slots: id 0 | task 0 | tokenizing prompt, len = 1
slot update_slots: id 0 | task 0 | prompt tokenized, n_ctx_slot = 8192, n_keep = 0, n_prompt_tokens = 56
slot update_slots: id 0 | task 0 | kv cache rm [0, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_past = 56, n_tokens = 56, progress = 1.000000
slot update_slots: id 0 | task 0 | prompt done, n_past = 56, n_tokens = 56
slot release: id 0 | task 0 | stop processing: n_past = 56, truncated = 0
slot print_timing: id 0 | task 0 |
prompt eval time = 383.02 ms / 56 tokens ( 6.84 ms per token, 146.21 tokens per second)
eval time = 0.02 ms / 1 tokens ( 0.02 ms per token, 43478.26 tokens per second)
total time = 383.04 ms / 57 tokens
srv update_slots: all slots are idle

BTW, I'm just starting to learn about this, and I've made some changes to llama.cpp. May I ask what other methods are available to test the accuracy of a GGUF model using llama.cpp as the inference backend?
