Issues: EleutherAI/lm-evaluation-harness

reproduce llama 3 evals
#2557 opened Dec 10, 2024 by baberabb
Open

Issues list

CaseHOLD Task Implementation
#2571 opened Dec 16, 2024 by zolastro
When can support MATH/HumanEval datasets eval
Labels: asking questions
#2564 opened Dec 12, 2024 by shawn0wang
reproduce llama 3 evals
Labels: good first issue, validation
#2557 opened Dec 10, 2024 by baberabb
fail to reproduce Deepseek-math result
Labels: asking questions, validation
#2555 opened Dec 10, 2024 by zhuqiangLu
Hendrycks Math extraction rule seems too strict
Labels: good first issue, validation
#2552 opened Dec 8, 2024 by fzyzcjy
Inquiry about the feature to continue evaluation after abnormal termination
Labels: asking questions
#2548 opened Dec 6, 2024 by minimi-kei
Add Global-MMLU
#2547 opened Dec 6, 2024 by shivalika-singh
Support for for squad dataset
#2538 opened Dec 4, 2024 by danielkorzekwa
import lm_eval
#2526 opened Nov 30, 2024 by fmk345
humaneval
#2516 opened Nov 26, 2024 by fxnie
Got near random guess results for GPQA.
Labels: validation
#2513 opened Nov 25, 2024 by Ignoramus0817
french_bench_xnli dataset doesn't exist
Labels: asking questions
#2501 opened Nov 18, 2024 by jgcb00
Overwrite default tasks
#2487 opened Nov 13, 2024 by jonoillar
Cannot reproduce LLaMA 3 8B on hendrycks_math
Labels: validation
#2479 opened Nov 11, 2024 by liuxiaozhu01