This project entails sentiment analysis and argumentation mining on the recently published UN Security Council (UNSC) speeches corpus, which is publicly accessible here. The UNSC corpus contains ~65,000 UN Security Council speeches from ~5,000 Security Council meetings held between 1995 and 2017. Each meeting is split into the individual speeches given by member countries. Furthermore, speeches are annotated with dates, topics and overall meeting outcomes.
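The meeting and speech hierarchy described above can be pictured as a small data model. The following is a minimal sketch only: the record layout (flat per-speech rows with `meeting_id`, `date`, `topic`, `outcome`, `speech_id`, `country` and `text` keys) and all class and function names are hypothetical and are not taken from the actual corpus files.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List


@dataclass
class Speech:
    # Hypothetical fields; the corpus files may use other names.
    speech_id: str
    country: str
    text: str


@dataclass
class Meeting:
    meeting_id: str
    meeting_date: date
    topic: str
    outcome: str
    speeches: List[Speech] = field(default_factory=list)


def group_by_meeting(rows: List[dict]) -> Dict[str, Meeting]:
    """Group flat, per-speech records into meetings, preserving speech order."""
    meetings: Dict[str, Meeting] = {}
    for row in rows:
        meeting_id = row["meeting_id"]
        if meeting_id not in meetings:
            # Meeting-level metadata is taken from the first speech seen.
            meetings[meeting_id] = Meeting(
                meeting_id, row["date"], row["topic"], row["outcome"]
            )
        meetings[meeting_id].speeches.append(
            Speech(row["speech_id"], row["country"], row["text"])
        )
    return meetings
```

Keeping meeting-level metadata (date, topic, outcome) on the meeting rather than repeating it on every speech mirrors the corpus description, where speeches are nested inside meetings.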
The UNSC corpus is, however, not annotated for argumentation structures or sentiment polarities. In this project, we attempt to produce automatic, machine-driven sentiment and argumentation annotations for the UNSC corpus, which could aid future human-driven annotations.
To find out more about our methodologies, read the following sections of this README. Additionally, a list of documents detailing our methodologies can be found below:
- We developed this repository using Python 3.7.*. To sync Python-based dependencies, we recommend creating a virtual environment and running the following command:
$ pip install -r requirements.txt
- We use R 3.6.* and ggplot2 for pretty visualizations. Execute the following within your R console to get our R-based dependencies:
> install.packages(c("ggplot2", "tikzDevice", "reshape2", "optparse", "ggsci"))
To prepare the necessary data and git hooks, simply run init.sh:
$ ./init.sh
You will then receive the following prompts:
- You will be prompted to download and deploy the UNSC corpus files. This will download and unzip the corresponding files, but can take quite some time due to the large file sizes.
- You will be prompted to download and deploy the US Election Debate corpus, which is publicly accessible here. This will download and unzip the corresponding files, and should be fairly quick.
- Optional: Finally, you will be prompted to initialize a pre-commit hook which keeps Python dependencies up to date in requirements.txt, lints R/shell scripts and converts org files to the GitHub markdown format. This is only necessary if you are further developing this repository.
In this repository, we provide our best-performing argumentation mining model ./model_logs/2020_03_17_09_17_44_MSL512_grid_train/model_1.h5 as a Git LFS entry.
- If git-lfs was already installed on your system prior to cloning this repository, our best-performing model should also have been cloned into the ./model_logs directory.
- If you installed git-lfs on your system after cloning this repository, execute git lfs pull in the repository to pull the best-performing model. In case of syncing problems, check out this GitHub issue for suggested workarounds.
For sentiment analysis, we tested two well-established sentiment analysis tools, namely VADER and AFINN, on the UNSC corpus. For subjectivity analysis, we used TextBlob, a text processing framework for Python. Next, we evaluated the predicted results to check their quality.
For further details on sentiment analysis, check out our dedicated Jupyter notebook.
Our final product for sentiment analysis is a JSON file which maps UNSC speech IDs to automatically produced sentiment and subjectivity scores.
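To illustrate the shape of such an output, here is a minimal sketch that scores each speech with the three tools and writes the result to JSON. It assumes the `vaderSentiment`, `afinn` and `textblob` packages are installed; the JSON keys (`vader_compound`, `afinn`, `subjectivity`) and the function names are hypothetical and may differ from the notebook's actual output. The scorers are passed in as functions so that the pipeline can be exercised without the NLP packages.

```python
import json
from typing import Callable, Dict

Scorer = Callable[[str], float]


def default_scorers() -> Dict[str, Scorer]:
    """Build scorers from VADER, AFINN and TextBlob (imported lazily)."""
    from afinn import Afinn
    from textblob import TextBlob
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

    vader = SentimentIntensityAnalyzer()
    afinn = Afinn()
    return {
        # VADER's compound score is normalised to [-1, 1].
        "vader_compound": lambda text: vader.polarity_scores(text)["compound"],
        # AFINN sums word valences, so longer speeches can reach larger magnitudes.
        "afinn": afinn.score,
        # TextBlob subjectivity lies in [0, 1] (0 = objective, 1 = subjective).
        "subjectivity": lambda text: TextBlob(text).sentiment.subjectivity,
    }


def score_speeches(
    speeches: Dict[str, str], scorers: Dict[str, Scorer]
) -> Dict[str, Dict[str, float]]:
    """Map each speech ID to a dict of named scores."""
    return {
        speech_id: {name: float(score(text)) for name, score in scorers.items()}
        for speech_id, text in speeches.items()
    }


def write_scores(scores: Dict[str, Dict[str, float]], path: str) -> None:
    """Write the speech-ID-to-scores mapping as human-readable JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(scores, handle, indent=2)
```

Usage, with `speeches` mapping speech IDs to raw text and a hypothetical output filename: `write_scores(score_speeches(speeches, default_scorers()), "sentiment.json")`. Note that AFINN scores are unbounded sums while VADER and TextBlob scores are bounded, so they are not directly comparable without normalisation.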
For argumentation mining, we fine-tuned the ALBERT language encoder with custom decoders on a small, annotated political argumentation corpus known as the US Election Debate corpus. Next, we applied the fine-tuned argumentation classifier to the UNSC corpus to predict and extract argumentation candidates.
For further details on argumentation mining, check out our dedicated readme.
Our final products for argumentation mining are twofold: first, the fine-tuned ALBERT language model, and second, a human-readable JSON file mapping UNSC speech IDs to token-level argumentation labels. For the JSON file, we were only able to conduct argumentation mining on shorter UNSC speeches.
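A length filter of this kind, combined with token-level labelling, might look like the sketch below. It assumes a maximum sequence length of 512 tokens (suggested by "MSL512" in the model directory name, not confirmed here) with two slots reserved for special tokens, uses whitespace splitting as a stand-in for ALBERT's SentencePiece tokenizer (which typically produces more tokens per speech), and uses a hypothetical `classify` function in place of the fine-tuned model. The label names are illustrative only.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical classifier signature: tokens in, one label per token out
# (e.g. "O", "Claim", "Premise"; the actual label set may differ).
Classifier = Callable[[List[str]], List[str]]

# Assumed from "MSL512" in the model directory name; 2 slots are
# reserved for the [CLS] and [SEP] special tokens.
MAX_SEQ_LEN = 512
RESERVED = 2


def tokenize(text: str) -> List[str]:
    # Stand-in for ALBERT's SentencePiece tokenizer; real subword counts are higher.
    return text.split()


def label_short_speeches(
    speeches: Dict[str, str],
    classify: Classifier,
    max_tokens: int = MAX_SEQ_LEN - RESERVED,
) -> Dict[str, List[Tuple[str, str]]]:
    """Label speeches that fit in the encoder window; skip the rest."""
    results: Dict[str, List[Tuple[str, str]]] = {}
    for speech_id, text in speeches.items():
        tokens = tokenize(text)
        if len(tokens) > max_tokens:
            continue  # too long for a single forward pass
        labels = classify(tokens)
        if len(labels) != len(tokens):
            raise ValueError(
                f"{speech_id}: expected {len(tokens)} labels, got {len(labels)}"
            )
        results[speech_id] = list(zip(tokens, labels))
    return results
```

The resulting dictionary can be passed to `json.dump`, which serializes each `(token, label)` pair as a two-element list.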
Schönfeld et al. 2019 (UNSC corpus)
@misc{schnfeld2019security,
title={The UN Security Council debates 1995-2017},
author={Mirco Schönfeld and Steffen Eckhard and Ronny Patz and Hilde van Meegdenburg},
year={2019},
eprint={1906.10969},
archivePrefix={arXiv},
primaryClass={cs.DL}
}
Haddadan et al. 2019 (US Election Debate corpus)
@inproceedings{haddadan-etal-2019-yes,
title = "Yes, we can! Mining Arguments in 50 Years of {US} Presidential Campaign Debates",
author = "Haddadan, Shohreh and
Cabrio, Elena and
Villata, Serena",
booktitle = {Proceedings of the 57th Annual Meeting of the Association
for Computational Linguistics},
month = jul,
year = "2019",
address = "Florence, Italy",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/P19-1463",
doi = "10.18653/v1/P19-1463",
pages = "4684--4690"
}
Atreya Shankar, Juliane Hanel
Project Module: Mining Sentiments and Arguments, WiSe 2019/20
Cognitive Systems: Language, Learning, and Reasoning, University of Potsdam