Ollama-MMLU-Pro

This is a modified version of run_gpt4o.py from TIGER-AI-Lab/MMLU-Pro, and it lets you run MMLU-Pro benchmark via the OpenAI Chat Completion API. It's tested on Ollama and Llama.cpp, but it should also work with LMStudio, Koboldcpp, Oobabooga with openai extension, etc.

I kept the testing and scoring method exactly the same as the original script, adding only a few features to simplify running the test and displaying the results. To see the exact changes, compare between mmlu-pro branch against main with git diff:

git diff mmlu-pro..main -- run_openai.py

Usage

Change the config.toml according to your setup.

pip install -r requirements.txt
python run_openai.py

You can also override settings in configuration file with command line flags like --model, ----category, etc. For example, if you specify --model phi3, all the settings from configuration file will be loaded except model. See python run_openai.py -h for more info.

Additional Notes

If an answer cannot be extracted from the model's response, the script will randomly assign an answer. It's the same way as the original script.
The total score represents the number of correct answers out of the total number of attempts including random guess attempts. This is the score from the original script.
"Random Guess Attempts" indicates the total number of random guesses out of the total attempts.
"Correct Random Guesses" shows the number of random guesses that were correct out of all the random guesses.
"Adjusted Score Without Random Guesses" subtracts all random guesses from the correct answers and the total answers.
The last overall score in the table is calculated as: the total number of correct answers across all categories / the total number of all attempts across all categories * 100.
All the scores in percentage are rounded numbers.

Name		Name	Last commit message	Last commit date
Latest commit History 49 Commits
.editorconfig		.editorconfig
.gitignore		.gitignore
LICENSE		LICENSE
Ollama_MMLU_Pro.ipynb		Ollama_MMLU_Pro.ipynb
README.md		README.md
config.toml		config.toml
requirements.txt		requirements.txt
run_openai.py		run_openai.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Ollama-MMLU-Pro

Usage

Additional Notes

About

Releases

Packages

Languages

License

GlasslessPizza/Ollama-MMLU-Pro

Folders and files

Latest commit

History

Repository files navigation

Ollama-MMLU-Pro

Usage

Additional Notes

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages