This repository contains the implementation of the RedThread algorithm from this paper.

Step 1: Obtain the data. The experiments use two non-trafficking datasets, Discogs and Memetracker, which can be downloaded from their respective websites. The trafficking data can be shared separately via a Google Drive link.

Step 2: Once you have the data files, run the code in the `exploratory_analysis.ipynb` notebook to convert them into features in the required format. This creates the sample data files inside the `data` folder.

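If you prefer to run the notebook without opening Jupyter, nbconvert can execute it from the command line. The Python sketch below only builds and runs that command; it assumes Jupyter and nbconvert are installed and that the notebook sits in the repository root. The helper names `nbconvert_command` and `run_notebook` are hypothetical.

```python
import subprocess
from typing import List


def nbconvert_command(notebook: str = "exploratory_analysis.ipynb") -> List[str]:
    """Build the command that executes a notebook in place (hypothetical helper)."""
    # --execute runs every cell top to bottom; --inplace overwrites the
    # notebook with the executed version instead of writing a new copy.
    return ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", notebook]


def run_notebook(notebook: str = "exploratory_analysis.ipynb") -> None:
    """Execute the notebook; raises CalledProcessError if nbconvert exits with an error."""
    subprocess.run(nbconvert_command(notebook), check=True)
```

Calling `run_notebook()` from the repository root has the same effect as running all cells in Jupyter.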
Step 3: Next, run `redthread_run.py` to run the RedThread algorithm. The default data path is `data/sample_data/`, which points to the files created in Step 2. To use another path, provide it as a command-line argument. To see the other arguments, run `python redthread_run.py --help`. Note that the current code runs very slowly and needs to be optimized. Due to the large size of the data, the files have been uploaded here.

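The path handling described above can be pictured with the following `argparse` sketch. It is not taken from `redthread_run.py`: the positional argument name `data_path` and the helpers `parse_args`, `has_data` and `main` are hypothetical, so check `python redthread_run.py --help` for the script's actual arguments.

```python
import argparse
import os
from typing import List, Optional

# Default location of the feature files produced in Step 2.
DEFAULT_DATA_PATH = "data/sample_data/"


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse command-line arguments (hypothetical interface, not the script's own)."""
    parser = argparse.ArgumentParser(description="RedThread runner (sketch)")
    # nargs="?" makes the path optional; when it is omitted, the default is used.
    parser.add_argument("data_path", nargs="?", default=DEFAULT_DATA_PATH,
                        help="folder containing the prepared feature files")
    return parser.parse_args(argv)


def has_data(path: str) -> bool:
    """Return True if path is an existing directory with at least one entry."""
    return os.path.isdir(path) and len(os.listdir(path)) > 0


def main(argv: Optional[List[str]] = None) -> str:
    """Resolve the data folder and fail early if Step 2 has not produced it."""
    args = parse_args(argv)
    if not has_data(args.data_path):
        raise SystemExit(f"No data found in {args.data_path}; run exploratory_analysis.ipynb first.")
    return args.data_path
```

Failing early on an empty folder gives a clearer error than a crash partway through a slow run.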
Step 4: The graph models built after Step 3 will be stored in the `models` folder.
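To check which graph models a run produced, a small helper can list the contents of `models`. This is a sketch that makes no assumption about the models' file format or naming: it simply lists every file, most recently modified first. The function name `list_models` is hypothetical.

```python
from pathlib import Path
from typing import List


def list_models(models_dir: str = "models") -> List[Path]:
    """Return the files in models_dir, most recently modified first.

    Returns an empty list if the folder does not exist yet (e.g. before Step 3 has run).
    """
    folder = Path(models_dir)
    if not folder.is_dir():
        return []
    files = [p for p in folder.iterdir() if p.is_file()]
    # Sort by modification time so the latest run's output comes first.
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)
```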