- The Python scripts use Python 3.5.1
- Install dependencies:
pip install -r requirements.txt
- Make the corpus files:
bash scripts/makecorpus.sh
UPDATE: GitHub rejected the corpus files as too large, so they are not included here. Instructions for creating the corpus are in scripts/mkcorpus.py.
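
The setup steps can also be driven from Python. The sketch below is only an illustration, not part of the repository: the names `MIN_PYTHON`, `SETUP_COMMANDS`, `python_is_supported` and `run_setup` are hypothetical, treating 3.5 as a minimum version is an assumption (the notes only state that 3.5.1 was used), and `python -m pip` stands in for the bare `pip` command so that packages land in the interpreter running the script.

```python
import subprocess
import sys

# Assumption: the notes only say the scripts use 3.5.1;
# treating 3.5 as a minimum is a guess, not a documented requirement.
MIN_PYTHON = (3, 5)

# The two setup commands from the steps above, in order.
# "python -m pip" is used in place of bare "pip" (an assumption made here).
SETUP_COMMANDS = [
    [sys.executable, "-m", "pip", "install", "-r", "requirements.txt"],
    ["bash", "scripts/makecorpus.sh"],
]


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


def run_setup(commands=SETUP_COMMANDS, runner=subprocess.check_call):
    """Run each setup command in order, stopping at the first failure."""
    if not python_is_supported():
        raise RuntimeError("Python %d.%d or newer is expected" % MIN_PYTHON)
    for command in commands:
        # check_call raises CalledProcessError on a non-zero exit status.
        runner(command)
```

Calling `run_setup()` from the repository root would run both steps. Passing a different `runner` (for example `print`) shows the commands without executing them.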