Commit
Add new Arabic benchmarks (5) and enhance existing tasks (#372)
* Update arabic_evals.py

  Add new Arabic benchmarks and update existing tasks
  - Renamed `arabic_mmlu` to `arabic_mmlu_mt` to highlight its machine-translated origin.
  - Added new benchmarks: `arabic_mmlu` (ArabicMMLU, https://arxiv.org/abs/2402.12840), `arabic_mmlu_ht` (human-translated), `MadinahQA` from MBZUAI, `arabic_mmmlu` (OpenAI MMMLU), and `AraTrust`, a trustworthiness benchmark for Arabic LLMs (https://arxiv.org/abs/2403.09017).
  - Enhanced prompt functions for better flexibility in answer options.

* Update and rename OALL_tasks.txt to OALL_v1_tasks.txt

  Rename the file to reflect that it lists the v1 leaderboard tasks

* Create OALL_v2_tasks.txt

  Tasks for v2 of OALL

* Update all_arabic_tasks.txt

  Add new and renamed tasks

* Update arabic_evals.py

  Fix formatting issues for

* Update all_arabic_tasks.txt

  Add missing task: OpenAI's MMMLU Arabic subset

* Update all_arabic_tasks.txt

  Correct order

* Update arabic_evals.py

  Remove the OpenAI MMMLU task following the discussion in #372

* Update all_arabic_tasks.txt

  Remove the OpenAI MMMLU task following the discussion in #372

* Update tasks.py

  Add a templated version of Arabic MMLU, based on @hynky1999's request in PR #372

* Update tasks.py

  Remove arabic_mmlu_templated_tasks

---------

Co-authored-by: Clémentine Fourrier <[email protected]>
Co-authored-by: Nathan Habib <[email protected]>
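The note about prompt functions being more flexible in their answer options can be illustrated with a minimal sketch. It is not the code from `arabic_evals.py`: the names `PromptDoc`, `LABELS`, and `build_mcq_prompt` are hypothetical, as is the prompt layout, and the sketch assumes that missing options appear only at the end of the option list.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical label set; the real task code may use different letters.
LABELS = ["A", "B", "C", "D", "E"]


@dataclass
class PromptDoc:
    """Hypothetical container for a formatted multiple-choice item."""
    query: str
    choices: List[str]
    gold_index: int


def build_mcq_prompt(
    question: str, options: List[Optional[str]], answer: str
) -> PromptDoc:
    # Drop missing options so items with 2 to 5 options share one code path.
    # Assumption: missing options only occur at the end of the list.
    valid = [opt for opt in options if opt is not None]
    if not 2 <= len(valid) <= len(LABELS):
        raise ValueError(f"expected 2 to {len(LABELS)} options, got {len(valid)}")

    # Use only as many labels as there are options.
    labels = LABELS[: len(valid)]
    lines = [f"{label}. {text}" for label, text in zip(labels, valid)]
    query = f"Question: {question}\n" + "\n".join(lines) + "\nAnswer:"

    # labels.index raises ValueError if the gold answer is not a valid label.
    return PromptDoc(query=query, choices=labels, gold_index=labels.index(answer))
```

Calling `build_mcq_prompt("2 + 2 = ?", ["3", "4", "5", None], "B")` yields a three-choice document whose gold index points at the second label, so the same function handles items with differing numbers of options.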