Benchmark Data | arXiv | Evaluation Framework

| Dataset | Description |
|---|---|
| FinSM | Evaluation set for the FinSM subtask of the FinAuditing benchmark. This task follows the information retrieval paradigm: given a query describing a financial term that represents either currency or concentration of credit risk, an XBRL filing, and a US-GAAP taxonomy, the output is the set of mismatched US-GAAP tags after retrieval. |
| FinRE | Evaluation set for the FinRE subtask of the FinAuditing benchmark. This is a relation extraction task, given two specific elements |
| FinMR | Evaluation set for the FinMR subtask of the FinAuditing benchmark. This is a mathematical reasoning task, given two questions |
| FinSM_Sub | FinSM subset for ICAIF 2025. |
| FinRE_Sub | FinRE subset for ICAIF 2025. |
| FinMR_Sub | FinMR subset for ICAIF 2025. |
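
Because FinSM outputs a set of mismatched US-GAAP tags, one natural way to inspect a prediction is to compare it with a reference set. The sketch below is only an illustration under that assumption: the function name `set_prf` and the tag strings are hypothetical placeholders, and this is not necessarily the metric used by the official evaluation framework.

```python
def set_prf(predicted, gold):
    """Set-based precision, recall, and F1 between two collections of tags.

    Illustrative only; not the benchmark's official scoring code.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical tag names, used purely as placeholders.
predicted_tags = {"us-gaap:TagA", "us-gaap:TagB", "us-gaap:TagC"}
gold_tags = {"us-gaap:TagA", "us-gaap:TagB", "us-gaap:TagD", "us-gaap:TagE"}

precision, recall, f1 = set_prf(predicted_tags, gold_tags)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Treating the output as an unordered set means that tag order and duplicates do not affect the score; a ranking-aware metric would need the retrieval order as well.
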
If you find our benchmark useful, please cite:
@misc{wang2025finauditingfinancialtaxonomystructuredmultidocument,
title={FinAuditing: A Financial Taxonomy-Structured Multi-Document Benchmark for Evaluating LLMs},
author={Yan Wang and Keyi Wang and Shanshan Yang and Jaisal Patel and Jeff Zhao and Fengran Mo and Xueqing Peng and Lingfei Qian and Jimin Huang and Guojun Xiong and Xiao-Yang Liu and Jian-Yun Nie},
year={2025},
eprint={2510.08886},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2510.08886},
}