Some errors when doing evaluation with hf models #352

Open
JiaQiSJTU opened this issue Sep 16, 2024 · 1 comment
JiaQiSJTU commented Sep 16, 2024

I would like to express my appreciation for this repository, particularly for its robust evaluation component.
However, during re-implementation, I encountered some unexpected behaviors in model generations. I have a few suggestions for improving the source code:

  1. In this line, it might be more appropriate to replace temperature=0 with do_sample=False, since that is how HF's generate() requests greedy decoding (see the sketch after this list).

  2. For the function generate_completions at this location, consider passing "add_special_tokens=False if args.use_chat_format else True", so special tokens are not added a second time when the chat template has already inserted them.

  3. "Because many tokenizers will treat the word after space differently from the original word alone, to be consistent, we add a space before tokenization and remove it after tokenization." However, with the Llama-3 tokenizer, adding a space before "```" yields a single token with the space merged in, so there is no separate space token left to remove?

@hamishivi
Collaborator

Thanks for your comments! Actually, we use vllm for generation these days wherever possible, so I think we don't bump into 1 and 2 that often. I'll leave this issue open to track these.
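For reference, a minimal sketch (not code from this repository) of the vLLM generation path mentioned above; the model name and prompt are illustrative:

```python
from vllm import LLM, SamplingParams

# In vLLM, temperature=0 already selects greedy decoding, so the do_sample
# concern from point 1 does not come up on this code path.
llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")  # illustrative model choice
outputs = llm.generate(
    ["Write a haiku about model evaluation."],            # illustrative prompt
    SamplingParams(temperature=0, max_tokens=128),
)
print(outputs[0].outputs[0].text)
```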
