SocialKB is a system for modeling Twitter data and reasoning over it to discover malicious content and suspicious users.
SocialKB uses Markov logic networks (MLNs) for modeling and inference, with Tuffy as the MLN engine. Tuffy can be downloaded from http://i.stanford.edu/hazy/tuffy/.
-
Clone the repository
git clone https://github.com/UMKC-BigDataLab/SocialKB.git
-
System Setup
cd scripts && source setup.sh
-
Collect Twitter Data and Construct Evidence
- Update the Twitter API keys and the number of tweets to collect in
scripts/construct-evidence.sh
- Construct evidence db
cd scripts && bash construct-evidence.sh
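Before handing the evidence file to Tuffy, a quick sanity check can catch malformed lines. The sketch below assumes the evidence file holds one ground atom per line (optionally negated with `!`), with blank lines and `//` comments ignored. The function name `check_evidence` and the predicate names in the comments are hypothetical and not part of SocialKB.

```shell
#!/usr/bin/env bash
# Sketch: count lines in an evidence file that do not look like ground atoms.
# Assumption: one atom per line, e.g. tweeted(user1, tweet1) or !verified(user1).
# These predicate names are made up for illustration.
check_evidence() {
  local file="$1"
  # Drop blank lines and // comments, then count lines that fail the atom pattern.
  grep -Ev '^[[:space:]]*(//.*)?$' "$file" \
    | grep -Evc '^[[:space:]]*!?[A-Za-z_][A-Za-z0-9_]*\(.+\)[[:space:]]*$' \
    || true  # grep exits non-zero when the count is 0; the count is still printed
}
```

Running `check_evidence <EVIDENCE_DIR>/evidence.db` prints the number of suspicious lines; `0` means every non-comment line matched the pattern. This is a heuristic only and does not validate predicates against the MLN program.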
-
Setup Tuffy
- Setup PostgreSQL
cd scripts && bash postgresql_setup.sh
- Update tuffy.conf with your PostgreSQL username
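The username update can also be scripted. The sketch below assumes tuffy.conf uses `key = value` lines and that the relevant key is `db_username`, as in the sample configuration distributed with Tuffy. The helper name `set_tuffy_user` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: set the PostgreSQL username in a Tuffy configuration file.
# Assumption: the file uses "key = value" lines and the key is db_username.
set_tuffy_user() {
  local conf="$1" user="$2"
  if grep -Eq '^[[:space:]]*db_username[[:space:]]*=' "$conf"; then
    # Replace the existing entry in place (keeps a .bak copy of the original).
    sed -i.bak -E "s|^[[:space:]]*db_username[[:space:]]*=.*|db_username = ${user}|" "$conf"
  else
    # No entry yet: append one.
    printf 'db_username = %s\n' "$user" >> "$conf"
  fi
}
```

For example, `set_tuffy_user tuffy.conf "$USER"` writes the current shell user. The substitution does not escape `|` or `&`, so usernames containing those characters would need extra handling.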
-
Weight Learning
java -jar tuffy.jar -learnwt -e <EVIDENCE_DIR>/evidence.db -i input/prog.mln -queryFile input/query.db -r lrnt.prog.mln -mcsatSamples 50 -dMaxIter 100
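To avoid retyping the long command, it can be assembled from parameters. The sketch below only prints the command (a dry run) and uses exactly the flags shown above. As we understand the Tuffy options, `-mcsatSamples` sets the number of MC-SAT samples and `-dMaxIter` caps the number of learning iterations. The function name `learn_cmd` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: build the weight-learning command with the evidence directory
# as a parameter. Prints the command instead of running it.
learn_cmd() {
  local evidence_dir="$1"
  local samples="${2:-50}"    # -mcsatSamples value used in this README
  local max_iter="${3:-100}"  # -dMaxIter value used in this README
  printf 'java -jar tuffy.jar -learnwt -e %s/evidence.db -i input/prog.mln -queryFile input/query.db -r lrnt.prog.mln -mcsatSamples %s -dMaxIter %s\n' \
    "$evidence_dir" "$samples" "$max_iter"
}
```

Calling `learn_cmd /path/to/evidence` prints the command for inspection; piping the output to `bash` would run it from the directory containing tuffy.jar.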
-
Inference
- MAP Inference:
java -jar tuffy.jar -e <EVIDENCE_DIR>/evidence.db -i input/lrnt.prog.mln -queryFile input/query.db -r out.txt
- Marginal Inference:
java -jar tuffy.jar -marginal -e <EVIDENCE_DIR>/evidence.db -i input/lrnt.prog.mln -queryFile input/query.db -r out.txt
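After marginal inference, it is often useful to keep only the atoms with high probability. The sketch below assumes each line of out.txt has the form `<probability> <atom>` separated by whitespace; check the actual output before relying on it. The function name `top_atoms` and the default threshold are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list atoms whose marginal probability meets a threshold,
# highest probability first.
# Assumption: each result line is "<probability><whitespace><atom>".
top_atoms() {
  local file="$1" threshold="${2:-0.9}"
  # Keep lines whose first field is a plain decimal number >= threshold.
  awk -v t="$threshold" '$1 ~ /^[0-9]*\.?[0-9]+$/ && $1 + 0 >= t + 0 { print }' "$file" \
    | sort -k1,1nr
}
```

For example, `top_atoms out.txt 0.8` lists every atom with probability at least 0.8. Lines that do not start with a number are skipped, so headers or comments in the file are ignored.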
-
Praveen Rao, Anas Katib, Charles Kamhoua, Kevin Kwiat, and Laurent Njilla. "Probabilistic Inference on Twitter Data to Discover Suspicious Users and Malicious Content." In the 2nd IEEE International Symposium on Security and Privacy in Social Networks and Big Data (SocialSec 2016), pages 407-414, Nadi, Fiji, December 2016.
-
Praveen Rao, Charles Kamhoua, Laurent Njilla, Kevin Kwiat. "Methods to Detect Cyberthreats on Twitter." In Surveillance in Action - Technologies for Civilian, Military and Cyber Surveillance, pages 333-350, Springer, 2017.
- Praveen Rao, Charles Kamhoua, Kevin Kwiat, Laurent Njilla. "System and Article of Manufacture to Analyze Twitter Data to Discover Suspicious Users and Malicious Content," US Patent, Sr. No. 10,348,752, July 9, 2019.
Faculty: Praveen Rao (PI)
PhD Students: Anas Katib, Arun Zachariah
Others: Charles Kamhoua, Kevin Kwiat, and Laurent Njilla
The first author (P.R.) was supported by the U.S. Air Force Summer Faculty Fellowship and the National Research Council Research Associateship Senior Fellowship Award.