Issues: EleutherAI/lm-evaluation-harness
#2569 · Question: Is there an easy way for me to know all the generation_until tasks? · opened Dec 14, 2024 by Ki-Seki
#2564 · When will MATH/HumanEval dataset evaluation be supported? · labels: asking questions · opened Dec 12, 2024 by shawn0wang
#2557 · Reproduce Llama 3 evals · labels: good first issue, validation · opened Dec 10, 2024 by baberabb
#2555 · Fail to reproduce DeepSeek-Math results · labels: asking questions, validation · opened Dec 10, 2024 by zhuqiangLu
#2552 · Hendrycks Math extraction rule seems too strict · labels: good first issue, validation · opened Dec 8, 2024 by fzyzcjy
#2550 · Inconsistent responses for the same case with different limit parameters · opened Dec 7, 2024 by Starry-Liu1
#2548 · Inquiry about the feature to continue evaluation after abnormal termination · labels: asking questions · opened Dec 6, 2024 by minimi-kei
#2539 · Answer extraction logic for Math Lvl 5 (Open LLM Leaderboard 2) may be too strict · opened Dec 5, 2024 by suhara
#2529 · How to evaluate correctness and errors in multiple-choice tasks? · opened Dec 1, 2024 by WuXnkris
#2525 · Can't evaluate a GGUF model using llama.cpp as the inference framework · opened Nov 29, 2024 by SurviiingZc
#2513 · Got near-random-guess results for GPQA · labels: validation · opened Nov 25, 2024 by Ignoramus0817
#2501 · french_bench_xnli dataset doesn't exist · labels: asking questions · opened Nov 18, 2024 by jgcb00
#2498 · Performance drops significantly when increasing the batch_size · labels: bug · opened Nov 17, 2024 by yushengsu-thu
#2490 · OOM issues in MMLU evaluation with lm_eval using vLLM as backend · opened Nov 14, 2024 by wchen61
#2479 · Cannot reproduce LLaMA 3 8B on hendrycks_math · labels: validation · opened Nov 11, 2024 by liuxiaozhu01