This repository contains code to reproduce the results in the paper Quantifying the Vulnerabilities of the Online Public Square to Adversarial Manipulation Tactics by Bao Tran Truong, Xiaodan Lou, Alessandro Flammini, and Filippo Menczer.
- data: contains raw & derived datasets
- example: contains a minimal example to start using the SimSoM model
- libs: contains the SimSoM model package that can be imported into scripts
- report_figures: experiment results, supplementary data and .ipynb notebooks to produce figures reported in the paper
- workflow: scripts to run simulation and Snakemake rules to run sets of experiments
- This code is written and tested with Python>=3.6
- We use conda, a package manager, to manage the development environment. Please make sure you have conda or mamba installed on your machine.
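The prerequisites above can be checked before running the setup. The following is a minimal sketch, not part of the repository: the function names are hypothetical, and it only assumes that conda or mamba is discoverable on PATH.

```python
import shutil
import sys


def find_env_manager(candidates=("conda", "mamba"), which=shutil.which):
    """Return the first environment manager found on PATH, or None."""
    for name in candidates:
        if which(name) is not None:
            return name
    return None


def python_is_supported(version_info=sys.version_info, minimum=(3, 6)):
    """Return True if the interpreter meets the Python>=3.6 requirement."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    manager = find_env_manager()
    print("environment manager:", manager or "none found (install conda or mamba)")
    print("python version ok:", python_is_supported())
```

The `which` argument is injectable only so the check can be exercised without touching the real PATH.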
To set up the environment and install the model, run make from the project directory (SimSoM).
- Create the environment with required packages: run conda env create -n simsom -f environment.yml
- Install the SimSoM module:
  - activate the environment: conda activate simsom
  - run pip install -e ./libs/
The empirical network is created from the Replication Data for: Right and left, partisanship predicts vulnerability to misinformation, where:
- measures.tab contains user information, i.e., one's partisanship and misinformation score.
- anonymized-friends.json is the adjacency list.

We reconstruct the empirical network from the above two files, resulting in data/follower_network.gml. The steps are specified in the script to create the empirical network.
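To illustrate the idea of combining the two files, here is a rough sketch that uses only the standard library. It is not the repository's script, and it rests on several assumptions: that the JSON file maps each user ID to a list of friend IDs, that the tab-separated file has a user ID column (the column name `id` below is hypothetical), and that an edge points from a friend to the user who follows them. Writing the result to GML (for instance with networkx's `write_gml`) is left out.

```python
import csv
import io
import json


def load_measures(tab_file, id_column="id"):
    """Read a tab-separated file into {user_id: row_dict}.

    `id_column` is a HYPOTHETICAL column name; check the real header.
    """
    reader = csv.DictReader(tab_file, delimiter="\t")
    return {row[id_column]: row for row in reader}


def build_edges(friends, measures):
    """Keep only edges whose two endpoints both have user information.

    ASSUMPTION: `friends` maps a user ID to the IDs of accounts they
    follow, and each edge is stored as (friend, user).
    """
    edges = []
    for user, friend_ids in friends.items():
        if str(user) not in measures:
            continue
        for friend in friend_ids:
            if str(friend) in measures:
                edges.append((str(friend), str(user)))
    return edges


if __name__ == "__main__":
    # Tiny in-memory stand-ins for measures.tab and anonymized-friends.json.
    tab = io.StringIO("id\tpartisanship\tmisinfo\n1\t0.2\t0.1\n2\t-0.5\t0.3\n")
    friends = json.loads('{"1": [2, 3], "2": [1]}')
    print(build_edges(friends, load_measures(tab)))
```

Filtering to users present in both files keeps every node annotated with partisanship and misinformation scores; whether the actual script applies the same filter is not confirmed here.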
Check out example to get started.
- Example of the simulation and results: example/run_simulation.ipynb
- From the root directory, unzip the data file: unzip data/data.zip -d .
- Create config files specifying parameters for simulations: workflow/scripts/make_finalconfig.py
  - See example/data/config.json for an example of a config file
- Run a Snakemake rule corresponding to the simulations of interest.
  - e.g.: workflow/rules/shuffle_network.smk runs simulations on different shuffled versions of the empirical network
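As a rough illustration of what "creating config files" for a set of simulations can look like, the sketch below expands a parameter grid into one JSON config per combination. It is not how workflow/scripts/make_finalconfig.py works; the keys `n_runs`, `param_a` and `param_b` are hypothetical placeholders, and the real parameter names should be taken from example/data/config.json.

```python
import itertools
import json


def make_configs(base, grid):
    """Return one config dict per combination of the values in `grid`.

    `base` holds settings shared by all configs; `grid` maps a parameter
    name to the list of values to sweep over.
    """
    keys = sorted(grid)
    configs = []
    for values in itertools.product(*(grid[k] for k in keys)):
        cfg = dict(base)
        cfg.update(zip(keys, values))
        configs.append(cfg)
    return configs


if __name__ == "__main__":
    # HYPOTHETICAL parameter names, for illustration only.
    base = {"n_runs": 1}
    grid = {"param_a": [0.1, 0.5], "param_b": [1, 2]}
    for i, cfg in enumerate(make_configs(base, grid)):
        print(f"config_{i}.json:", json.dumps(cfg))
```

Generating configs from a grid keeps each simulation's parameters explicit on disk, which is convenient when a workflow tool maps one config file to one output.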
The results in the paper are based on averages across multiple simulation runs. To reproduce those results, we suggest running the simulations in parallel, for example on a cluster, since they will need a lot of memory and CPU time.
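The pattern of running several independent simulations in parallel and averaging the outcome can be sketched as follows. This is an illustration under stated assumptions, not the repository's workflow: `run_once` is a hypothetical placeholder that returns a random number, and would need to be replaced by a real simulation call.

```python
import random
import statistics
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor


def run_once(seed: int) -> float:
    """HYPOTHETICAL stand-in for one simulation run.

    Returns a pseudo-random number determined by `seed`; a real run
    would return a measured quantity from the model.
    """
    return random.Random(seed).random()


def average_over_runs(seeds, executor: Executor) -> float:
    """Run one simulation per seed on `executor` and average the results."""
    results = list(executor.map(run_once, seeds))
    return statistics.mean(results)
```

For CPU-bound simulations one would typically pass a `ProcessPoolExecutor`, created under an `if __name__ == "__main__":` guard, for example `with ProcessPoolExecutor() as pool: average_over_runs(range(10), pool)`. On a cluster, each run is more likely to be a separate job scheduled by Snakemake.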