The PERSUADE 2.0 corpus builds on the PERSUADE 1.0 corpus by providing holistic essay scores for each persuasive essay in the PERSUADE 1.0 corpus, as well as proficiency scores for each argumentative and discourse element found in the initial corpus. This version also contains all essays (whereas 1.0 linked only to the training set for the Kaggle competition).
In total, the PERSUADE 2.0 corpus comprises over 25,000 argumentative essays produced by 6th-12th grade students in the United States for 15 prompts on two writing tasks: independent and source-based writing. The PERSUADE 2.0 corpus provides detailed individual and demographic information for each writer, as well as the initial annotations for the argumentative and discourse elements found in PERSUADE 1.0.
The .csv files are too large for GitHub. The links for the dataframes are below.
All the argumentative and discourse element annotations and effectiveness scores are available at
PLEASE NOTE: The test set is a password-protected zip file. The password is persuade_test.
You may need to use specific software to decrypt the zip file, such as 7-Zip for Windows or Keka for Mac.
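If you prefer to script the extraction, the following is a minimal sketch using Python's standard `zipfile` module. It is an illustration rather than an official tool: the function name `extract_test_set` and the paths you pass to it are hypothetical, and only the password comes from this README. Note that `zipfile` can only decrypt the legacy zip encryption scheme; if the archive was created with AES encryption (which may be why dedicated software is recommended above), the sketch raises an error pointing you back to 7-Zip or Keka.

```python
import zipfile
from pathlib import Path

# Password as stated in this README.
TEST_SET_PASSWORD = "persuade_test"


def extract_test_set(zip_path, out_dir, password=TEST_SET_PASSWORD):
    """Extract a password-protected zip and return the archive member names.

    zip_path and out_dir are placeholders for wherever you saved the
    downloaded archive and where you want the files written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        try:
            # zipfile expects the password as bytes.
            archive.extractall(path=out_dir, pwd=password.encode("utf-8"))
        except NotImplementedError as exc:
            # Raised when the archive uses a method zipfile cannot handle
            # (for example AES encryption).
            raise RuntimeError(
                "Python's zipfile module cannot decrypt this archive; "
                "use 7-Zip or Keka instead."
            ) from exc
        return archive.namelist()
```

For example, `extract_test_set("test_set.zip", "data/")` would extract into `data/`, where `test_set.zip` stands in for whatever name the downloaded file has.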
The published paper for the dataset is available here.
The reference for the paper is:
Crossley, S. A., Baffour, P., Tian, Y., Franklin, A., Benner, M., & Boser, U. (2024). A large-scale corpus for assessing written argumentation: PERSUADE 2.0. Assessing Writing, 61.
A pre-print of the associated paper is on Zenodo.
The data is provided under a CC BY-NC-SA 4.0 (Attribution-NonCommercial-ShareAlike 4.0 International) license (https://creativecommons.org/licenses/by-nc-sa/4.0/deed.en).