Popular repositories
-
lm-evaluation-harness (Public, forked from EleutherAI/lm-evaluation-harness)
A framework for few-shot evaluation of language models.
-
Ollama-MMLU-Pro-IRT (Public, forked from chigkim/Ollama-MMLU-Pro)
Ollama-MMLU-Pro fork, using a smaller IRT-tuned subset of MMLU-Pro
Jupyter Notebook 2
-
FastEval (Public, forked from FastEval/FastEval)
Fast & more realistic evaluation of chat language models. Includes leaderboard.
Python
-
MMLU-Pro-IRT (Public, forked from TIGER-AI-Lab/MMLU-Pro)
The scripts for MMLU-Pro, using a smaller IRT-tuned dataset
Python