Code accompanying the EMNLP 2023 long paper "Do Differences in Values Influence Disagreements in Online Discussions?"
- Install the package in developer mode:
  pip install -e .
  This will (hopefully) also install the dependencies.
- Done ... (well, almost)
You may need to download or initialize models, but you will be prompted for it.
- You may need to download spaCy models.
- You may need to initialize wandb logging.
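Before running any script, it can help to check whether these two packages are importable at all. The following is a minimal sketch using only the standard library; the helper name `missing_modules` is hypothetical and not part of this repository, and the setup commands in the comments still need a model name that depends on the script you run.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found by the importer."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# spacy and wandb are the two packages mentioned above.
missing = missing_modules(["spacy", "wandb"])
if missing:
    print("Not installed:", ", ".join(missing))
else:
    # Both are importable; models and logging may still need a one-time setup:
    #   python3 -m spacy download <model name>
    #   wandb login
    print("spacy and wandb are importable")
```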
Here's some guidance on the folder structure. Each file should contain more information about what its script is intended to do.
The code for the value extraction, dataset and evaluation metrics. Some classes require external files, listed below:
- Value Dictionary baseline: find the Refined_dictionary.txt file here and place it in data/.
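To confirm that the external file is where the code expects it, a small check like the one below may be useful. It is only a sketch: the helper `find_missing_files` is hypothetical, and it assumes you run it from the repository root so that `data` resolves to the data/ folder.

```python
from pathlib import Path


def find_missing_files(data_dir, required):
    """Return the required file names that are not present in data_dir."""
    root = Path(data_dir)
    return [name for name in required if not (root / name).is_file()]


# Refined_dictionary.txt is the file required by the Value Dictionary baseline.
missing = find_missing_files("data", ["Refined_dictionary.txt"])
if missing:
    print("Please add to data/:", ", ".join(missing))
```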
May contain notebooks made for analysis of generated or scraped data. In our case, it contains code for training the TF-IDF baseline for (dis-)agreement prediction.
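For readers unfamiliar with TF-IDF, the sketch below illustrates the weighting itself in plain Python. It is not the baseline from the notebooks: the function name `tfidf_vectors`, the whitespace tokenization, and the weighting formula tf * log(N / df) are all assumptions chosen for illustration, and the actual baseline may use a library implementation with different smoothing.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents in which each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        vectors.append(
            {term: (count / total) * math.log(n_docs / df[term])
             for term, count in counts.items()}
        )
    return vectors


# Toy comments (made up for illustration, not taken from any dataset).
comments = ["i agree with you", "i disagree with you"]
vectors = tfidf_vectors(comments)
```

Terms that occur in every document receive a weight of zero under this formula, so only the distinguishing terms ("agree" and "disagree" in the toy example) carry weight. A classifier for (dis-)agreement would then be trained on such vectors.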
Folder for storing all data (datasets, user profile information, task instances, survey results). The sources are listed below.
- Experimental data for our paper: https://osf.io/42dns/
- Debagreement: https://scale.com/open-av-datasets/oxford
- ArgValues (internal name is ValueEval): https://zenodo.org/records/6855004 (download webis-argvalues-22.zip).
- ValueNet: https://liang-qiu.github.io/ValueNet/ (download original).
- MFTC: https://osf.io/k5n7y/ (download and follow provided instructions)
Some unittest functionality and other sanity checks. Run them with:
python3 -m unittest discover test
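By default, unittest discovery collects files in the start directory whose names match test*.py and runs every `unittest.TestCase` subclass in them. A new sanity check could therefore look like the sketch below, saved for example as test/test_example.py. Both the file name and the helper `normalize_whitespace` are hypothetical and only serve to show the shape of a test.

```python
import unittest


def normalize_whitespace(text):
    """Hypothetical helper under test: collapse runs of whitespace."""
    return " ".join(text.split())


class TestNormalizeWhitespace(unittest.TestCase):
    def test_collapses_spaces(self):
        self.assertEqual(normalize_whitespace("a  b \n c"), "a b c")

    def test_empty_string(self):
        self.assertEqual(normalize_whitespace(""), "")
```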
Creation of the Bayes Factor scores.
Training and evaluation of models for agreement analysis.
Training and evaluation of models for value extraction.
Download the experimental data from OSF, which contains the links to the Reddit comments analyzed in our work. You can gather these comments using e.g. PRAW. After obtaining the comment data, you can construct user profiles as follows.
- Filter the comments to only include data from relevant subreddits with scripts/filter_subreddits.py. You may need to adjust internal paths and the comment storage format to match that of the RedditBackgroundDataset.
- Filter the content to only include English text using scripts/filter_reddit.py.
- Create user profiles using scripts/get_user_context.py. Depending on the method you are using for constructing the profiles, you may need to have trained value extraction models (see next section).
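The gathering and filtering steps can be sketched roughly as follows. This is not the code from the scripts above: all function names are hypothetical, the record fields (`id`, `author`, `subreddit`, `body`) are an assumed storage format that you may need to adapt, and the English-language filter is left out. `fetch_comment` expects an authenticated `praw.Reddit` instance that you create yourself with your own API credentials.

```python
from collections import defaultdict


def comment_to_record(comment):
    """Convert a PRAW Comment into a plain dict (field names are an assumption)."""
    return {
        "id": comment.id,
        # comment.author is None when the account has been deleted.
        "author": str(comment.author) if comment.author is not None else None,
        "subreddit": comment.subreddit.display_name,
        "body": comment.body,
    }


def fetch_comment(reddit, url):
    """Fetch one comment by its permalink using an authenticated praw.Reddit."""
    return comment_to_record(reddit.comment(url=url))


def filter_by_subreddit(records, allowed):
    """Keep only records from the given subreddits (case-insensitive)."""
    allowed_lower = {name.lower() for name in allowed}
    return [r for r in records if r["subreddit"].lower() in allowed_lower]


def group_by_author(records):
    """Collect comment bodies per author, skipping deleted accounts."""
    profiles = defaultdict(list)
    for record in records:
        if record["author"] is not None:
            profiles[record["author"]].append(record["body"])
    return dict(profiles)
```

Keeping the records as plain dicts makes it easy to dump them to JSON between steps, which may simplify adapting them to the storage format the scripts expect.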
See python3 scripts/training_moral_values/train.py -h
See python3 scripts/training_agreement/train.py -h
Below is a (non-exhaustive) list of scripts you need to run to compute the values as presented in the paper.
- Figure 2: scripts/analyze_profiles.py
- Table 3: scripts/count_debagreements.py for the most significant value in each subcorpus, and scripts/analyze_value_conflict.py for the mean tau distance.
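As background for the last item, the sketch below computes a normalized Kendall tau distance between two rankings. It assumes that "tau distance" refers to the Kendall tau distance between the value rankings of two users; scripts/analyze_value_conflict.py may compute it differently, and the function name `kendall_tau_distance` is hypothetical.

```python
from itertools import combinations


def kendall_tau_distance(rank_a, rank_b):
    """Fraction of item pairs that the two rankings order differently.

    rank_a[i] and rank_b[i] are the ranks given to item i. Returns 0.0 for
    identical orderings and 1.0 for fully reversed ones. Tied pairs are not
    counted as disagreements.
    """
    if len(rank_a) != len(rank_b):
        raise ValueError("Both rankings must cover the same items")
    n = len(rank_a)
    if n < 2:
        return 0.0
    discordant = sum(
        1
        for i, j in combinations(range(n), 2)
        if (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j]) < 0
    )
    return discordant / (n * (n - 1) / 2)
```

A mean tau distance over a subcorpus would then be the average of this quantity over the user pairs of interest.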