In this project, we apply machine learning to unstructured data to detect hate speech in comments from the Civil Comments dataset, with labeling informed by the Online Hate Index Research Project at D-Lab, University of California, Berkeley.
Our goal is to classify comments as hateful or not hateful. Earlier attempts at similar classification have often mislabeled comments as hateful simply because they mention identity groups that are frequent targets of hate speech. We hope to develop more nuanced models that correctly categorize both hateful speech and non-hateful identity references.
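The failure mode described above can be measured directly. The following is a minimal sketch, not the project's own evaluation code: the term list `IDENTITY_TERMS` and the helpers `mentions_identity` and `identity_false_positive_rate` are hypothetical names introduced for illustration. It assumes labels and predictions are encoded as 1 for hateful and 0 for not hateful.

```python
from typing import Iterable, Sequence

# Hypothetical identity terms, for illustration only (not the project's list).
IDENTITY_TERMS = frozenset({"muslim", "gay", "jewish", "black"})


def mentions_identity(comment: str, terms: Iterable[str] = IDENTITY_TERMS) -> bool:
    """Return True if any identity term appears as a whole word in the comment."""
    words = {w.strip(".,!?;:\"'()").lower() for w in comment.split()}
    return any(term in words for term in terms)


def identity_false_positive_rate(
    comments: Sequence[str],
    labels: Sequence[int],
    predictions: Sequence[int],
) -> float:
    """Share of non-hateful, identity-mentioning comments predicted as hateful.

    Assumes 1 = hateful and 0 = not hateful. Returns 0.0 when no comment
    qualifies, to avoid dividing by zero.
    """
    flagged = 0
    total = 0
    for comment, label, prediction in zip(comments, labels, predictions):
        if label == 0 and mentions_identity(comment):
            total += 1
            flagged += int(prediction == 1)
    return flagged / total if total else 0.0
```

A lower value on this metric, at comparable overall accuracy, would suggest a model is less prone to treating identity references as hate speech.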
Python:
Amazon Web Services:
Google Cloud Services:
- NB_final.ipynb: Naive Bayes Model (698 lines)
- SVM_final.ipynb: Support Vector Machines Model (1818 lines)
- neural_network.ipynb: Two Layer Neural Network (536 lines)
- final_lstm.ipynb: Three Layer Bidirectional Long Short-Term Memory Recurrent Neural Network (7514 lines)
- feature_generation_functions.py: Contains modules and functions used to generate text and numerical features for the models. (273 lines)
- feature_generation.ipynb: Python 3 notebook that runs functions from feature_generation_functions.py and pickle_functions.py. It generates features, pickles the data frames, and uploads them to an S3 bucket. (160 lines)
- model_functions.py: Contains modules and functions to build and test the Naive Bayes and SVM models and to compute metrics on them. (226 lines)
- pickle_functions.py: Contains modules and functions used to read and write pickle files hosted in an AWS S3 bucket. (60 lines)
- exploration/exploration_functions.py: Contains modules and functions used to explore the dataset. (103 lines)
- Stepping_Stones: Earlier iterations of each model, built before the final model design and assessment
- Initial_Models_Exploration.ipynb (1697 lines)
- NB_iter1.ipynb (726 lines)
- NB_iter2.ipynb (626 lines)
- NB_iter3.ipynb (865 lines)
- SVM_iter1.ipynb (657 lines)
- SVM_iter2.ipynb (691 lines)
- SVM iter3.ipynb (644 lines)
- initial_lstm.ipynb (1920 lines)
- exec_lstm (587 lines) and rcc_run_model.sh (27 lines)
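As an illustration of the read/write pattern described for pickle_functions.py, here is a minimal sketch under stated assumptions. It is not the repository's code: the function names, bucket name, and key are hypothetical, and `InMemoryS3` is a stand-in so the example runs without AWS access. The client is assumed to expose `put_object` and `get_object` with the same keyword arguments as a boto3 S3 client.

```python
import io
import pickle
from typing import Any


def write_pickle_to_s3(client: Any, bucket: str, key: str, obj: Any) -> None:
    """Serialize obj with pickle and upload the bytes to bucket/key."""
    client.put_object(Bucket=bucket, Key=key, Body=pickle.dumps(obj))


def read_pickle_from_s3(client: Any, bucket: str, key: str) -> Any:
    """Download bucket/key and deserialize it with pickle.

    Only unpickle data from a source you trust.
    """
    response = client.get_object(Bucket=bucket, Key=key)
    return pickle.loads(response["Body"].read())


class InMemoryS3:
    """Tiny stand-in for an S3 client (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._store: dict = {}

    def put_object(self, Bucket: str, Key: str, Body: bytes) -> None:
        self._store[(Bucket, Key)] = Body

    def get_object(self, Bucket: str, Key: str) -> dict:
        # Mimic the response shape: "Body" is a readable byte stream.
        return {"Body": io.BytesIO(self._store[(Bucket, Key)])}


# Round trip with placeholder feature data (not real project output).
client = InMemoryS3()
features = {"comment_id": [1, 2], "char_count": [31, 16]}
write_pickle_to_s3(client, "example-bucket", "features.pkl", features)
restored = read_pickle_from_s3(client, "example-bucket", "features.pkl")
```

With AWS credentials configured, a real client such as `boto3.client("s3")` could be passed in place of `InMemoryS3`.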
If a notebook fails to open, paste its link into the renderer at https://nbviewer.jupyter.org/