Grebe aggregates geo-fenced Canadian Twitter data for research in sociology and public health. View our demo to see how the data collected by Grebe can be analyzed and visualized in various ways.
Please cite the following publication when using our source code for your research.
@inproceedings{SamuelNooriFaraziZaiane2018,
title = {{Context Prediction in the Social Web Using Applied Machine Learning: A Study of Canadian Tweeters}},
author = {Samuel, Hamman and Noori, Benyamin and Farazi, Sara and Zaiane, Osmar},
booktitle = {IEEE/WIC/ACM International Conference on Web Intelligence (WI)},
pages = {230--237},
year = {2018},
organization = {IEEE}
}
A working live web app is available for demo purposes.
- For hosting, you can use IaaS with Cybera or Digital Ocean, PaaS with OpenShift or Heroku, or just use your laptop/computer (not recommended due to space and processing limitations).
- Install Python.
- Install Flask by using `pip install flask`.
- Install Flask's HTTP Auth dependency via `pip install flask-httpauth`.
- Install TwitterAPI via `pip install TwitterAPI`.
- Install MariaDB.
- Run the SQL commands in `schema.sql` to set up a database.
- Edit `config.py` to enter your database username and password.
- Install the MySQL Connector via `pip install mysql-connector`.
- Install Python MySQL Connector by using `pip install mysql-connector-python-rf`.
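After the installation steps above, a quick check can confirm that the Python packages are importable. This is a minimal sketch, not part of Grebe: the function name `missing_modules` is hypothetical, and the import names in `REQUIRED_MODULES` are assumed to be the ones these packages normally expose.

```python
import importlib.util

# Import names assumed for the packages installed in the steps above.
REQUIRED_MODULES = ["flask", "flask_httpauth", "TwitterAPI", "mysql.connector"]


def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec raises when a parent package (e.g. `mysql`) is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_modules(REQUIRED_MODULES)
    print("Missing modules:", ", ".join(absent) if absent else "none")
```

Any name it reports points to the `pip install` step that should be repeated.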
- Aggregate tweets by running `spyder.py`.
- Initialize cache by running `scripts/cacher.py`.
- View web app by running `webapp/server.py`.
- Sign up for a Twitter Developer account.
- Set up your Twitter API keys.
- Edit `config.py` and enter your API keys.
- In a terminal, use the following command to run the aggregator: `python spyder.py [status | search | stream]`.
- If you want to aggregate data automatically, set up instances of the command above to run at scheduled intervals, for example as a cron job or with Task Scheduler.
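Where cron or Task Scheduler is not available, the scheduled runs described above could also be driven from a small Python loop. The sketch below is an assumption-laden illustration rather than part of Grebe: `build_command` and `run_periodically` are hypothetical helpers, and the script is assumed to be started from the folder that contains `spyder.py`.

```python
import subprocess
import sys
import time

MODES = ("status", "search", "stream")  # modes listed in the command above


def build_command(mode, script="spyder.py"):
    """Return the argument list for one aggregator run."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    return [sys.executable, script, mode]


def run_periodically(mode, interval_seconds, runs,
                     runner=subprocess.run, sleep=time.sleep):
    """Run the aggregator `runs` times, pausing between consecutive runs."""
    exit_codes = []
    for i in range(runs):
        result = runner(build_command(mode))
        exit_codes.append(result.returncode)
        if i < runs - 1:
            sleep(interval_seconds)
    return exit_codes
```

For example, `run_periodically("search", 900, 4)` would start the search aggregator four times, fifteen minutes apart. The `runner` and `sleep` parameters exist only so the loop can be exercised without launching real processes.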
- The web app uses a cache to visualize and display data faster. The cache directory is set in `config.py` as `HOME_DIR`.
- To set up the cache, run `python cacher.py [data tags stats]` from the `scripts` folder.
- Clean up your cache directory regularly so it doesn't fill your drive. The sample bash script below can be set up to run regularly (replace `HOME_DIR` with the actual path to your directory).
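#!/bin/bash
# Clear the cache directory once it grows past the limit.
LIMIT=1000000 # in 1 KiB blocks, roughly 1 GB
# -s prints a single total instead of one line per subdirectory
SIZE=$(du -s --apparent-size HOME_DIR | cut -f1)
if (( SIZE > LIMIT ))
then
    rm -f HOME_DIR/*
    echo "Cache cleared"
else
    echo "Cache preserved"
fi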
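On systems without bash, the same cleanup idea as the script above can be sketched in Python. The helper names `directory_size_bytes` and `clear_cache_if_over` are hypothetical, and the default limit is an assumption chosen to mirror the roughly 1 GB threshold of the bash script. Like `rm -f HOME_DIR/*`, it leaves subdirectories in place and deletes only top-level files, although unlike that shell glob it also removes hidden files.

```python
import os


def directory_size_bytes(path):
    """Sum the sizes of all regular files under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if os.path.isfile(file_path):
                total += os.path.getsize(file_path)
    return total


def clear_cache_if_over(path, limit_bytes=1_000_000 * 1024):
    """Delete top-level files in `path` when its size exceeds the limit.

    Returns True if the cache was cleared, False if it was preserved.
    """
    if directory_size_bytes(path) <= limit_bytes:
        return False
    for entry in os.scandir(path):
        if entry.is_file():
            os.remove(entry.path)
    return True
```

Calling `clear_cache_if_over` with the actual cache path from `config.py` on a schedule would play the same role as the cron-driven bash script.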
- In a terminal from the `webapp` folder, use the command `python server.py` to run the Flask server.
- In your web browser, go to http://127.0.0.1:5000/grebe/
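To confirm from a script that the server started above is answering, a small standard-library check may help. This is only a sketch: `is_reachable` is a hypothetical helper, and it treats any HTTP response, including an error status such as 401 from a password-protected page, as proof that the server is up.

```python
import urllib.error
import urllib.request


def is_reachable(url, timeout=3.0):
    """Return True if `url` answers with any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered with an error status, so it is running.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or similar: nothing answered.
        return False
```

For the local setup described here, the call would be `is_reachable("http://127.0.0.1:5000/grebe/")`.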
- When using IaaS hosting, you can serve the Flask web app using uWSGI.
- PaaS hosting configurations depend on the provider, but here is one for Heroku.