Popular repositories
- lm-evaluation-harness (Public, forked from EleutherAI/lm-evaluation-harness)
  A framework for few-shot evaluation of language models.
- Ollama-MMLU-Pro-IRT (Public, forked from chigkim/Ollama-MMLU-Pro)
  A fork of Ollama-MMLU-Pro that uses a smaller, IRT-tuned subset of MMLU-Pro.
Jupyter Notebook 2
- FastEval (Public, forked from FastEval/FastEval)
  Fast & more realistic evaluation of chat language models. Includes leaderboard.
Python