MRCEval is a comprehensive benchmark for machine reading comprehension (MRC) designed to assess the reading comprehension (RC) capabilities of LLMs, covering 13 sub-tasks with a total of 2.1K high-quality multi-choice questions.
MRCEval can be loaded from Huggingface. Download and place the dataset file into the `data/` directory.
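The dataset can also be loaded programmatically with the `datasets` library. The sketch below is illustrative only; the dataset ID used here is an assumption, so check the MRCEval page on the Hugging Face Hub for the exact name:

```python
# Minimal sketch: loading MRCEval with the Hugging Face datasets library.
# NOTE: the dataset ID below is a placeholder assumption -- replace it with
# the actual MRCEval ID from the Hugging Face Hub.
from datasets import load_dataset

dataset = load_dataset("MRCEval/MRCEval")
print(dataset)  # inspect the available splits and fields
```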
Create a Python environment, then install the required dependencies:
```bash
pip install -r requirements.txt
```
Choose a `model_id` from Hugging Face, such as `meta-llama/Llama-3.1-8B-Instruct`, or provide your own `model_path`.
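Before launching the full evaluation, it can be useful to check that the chosen `model_id` or `model_path` actually loads. A minimal smoke test with `transformers` might look like the following; the prompt and generation settings are illustrative only and are not part of `eval.py`:

```python
# Minimal sketch: verify that a model_id (or local model_path) loads and generates.
# Requires transformers; device_map="auto" additionally requires accelerate.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B-Instruct"  # or your own model_path
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Toy multiple-choice prompt, just to confirm the model responds.
prompt = (
    "Passage: The cat sat on the mat.\n"
    "Question: Where did the cat sit?\n"
    "A. On the mat\nB. On the chair\n"
    "Answer:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```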
Run `eval.py`:
```bash
python eval.py --model [model_id]
```
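For example, to evaluate the instruct model mentioned above (`eval.py` may accept additional flags not shown here):

```bash
python eval.py --model meta-llama/Llama-3.1-8B-Instruct
```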