Issues: EleutherAI/lm-evaluation-harness
#2569 · Question: Is there an easy way for me to know all the generation_until tasks? · opened Dec 14, 2024 by Ki-Seki
#2564 · When will MATH/HumanEval dataset evaluation be supported? · labels: asking questions · opened Dec 12, 2024 by shawn0wang
#2557 · Reproduce Llama 3 evals · labels: good first issue, validation · opened Dec 10, 2024 by baberabb
#2555 · Fail to reproduce DeepSeek-Math results · labels: asking questions, validation · opened Dec 10, 2024 by zhuqiangLu
#2552 · Hendrycks Math extraction rule seems too strict · labels: good first issue, validation · opened Dec 8, 2024 by fzyzcjy
#2550 · Inconsistent responses for the same case with different limit parameters · opened Dec 7, 2024 by Starry-Liu1
#2548 · Inquiry about the feature to continue evaluation after abnormal termination · labels: asking questions · opened Dec 6, 2024 by minimi-kei
#2539 · Answer extraction logic for Math Lvl 5 (Open LLM Leaderboard 2) may be too strict · opened Dec 5, 2024 by suhara
#2529 · How to evaluate correctness and errors in multiple-choice tasks? · opened Dec 1, 2024 by WuXnkris
#2525 · Can't evaluate a GGUF model using llama.cpp as the inference framework · opened Nov 29, 2024 by SurviiingZc
#2513 · Got near-random-guess results for GPQA · labels: validation · opened Nov 25, 2024 by Ignoramus0817
#2501 · french_bench_xnli dataset doesn't exist · labels: asking questions · opened Nov 18, 2024 by jgcb00
#2498 · Performance drops significantly when increasing the batch_size · labels: bug · opened Nov 17, 2024 by yushengsu-thu
#2490 · OOM issues in MMLU evaluation with lm_eval using vLLM as backend · opened Nov 14, 2024 by wchen61
#2479 · Cannot reproduce LLaMA 3 8B on hendrycks_math · labels: validation · opened Nov 11, 2024 by liuxiaozhu01