
Natural language processing and machine learning to classify hate speech using NLTK, PyTorch, and scikit-learn


lorenh516/no_hate_all_love

 
 


Detecting Hateful Speech in Social Media Comments

In this project, we apply machine learning to unstructured data to detect hate speech in comments from the Civil Comments dataset, with labeling informed by the Online Hate Index Research Project at D-Lab, University of California, Berkeley.

Goal

Our goal is to classify comments as hateful or not hateful. Historically, similar classifiers have mislabeled comments as hateful simply because they mention identity groups that are frequent targets of hate speech. We hope to develop more nuanced models that correctly categorize both hateful speech and non-hateful identity references.
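
One way to check for the failure mode described above is to measure the false positive rate separately on non-hateful comments that do and do not mention an identity group. The sketch below is a minimal illustration under stated assumptions, not the project's evaluation code: the term list, the function names, and the toy data are all hypothetical.

```python
# Hypothetical sketch: compare false positive rates on comments with and
# without identity mentions. The term list and toy data are made up.
import re

IDENTITY_TERMS = {"muslim", "gay", "black", "jewish", "women"}  # illustrative only


def mentions_identity(comment):
    """Return True if the comment contains any term from IDENTITY_TERMS."""
    tokens = set(re.findall(r"[a-z]+", comment.lower()))
    return bool(tokens & IDENTITY_TERMS)


def false_positive_rate(labels, predictions):
    """FPR = FP / (FP + TN), computed over the non-hateful (label 0) examples."""
    negatives = [p for y, p in zip(labels, predictions) if y == 0]
    if not negatives:
        return None  # undefined when there are no non-hateful examples
    return sum(negatives) / len(negatives)


def fpr_by_identity_mention(comments, labels, predictions):
    """Split examples by identity mention and compute the FPR for each group."""
    groups = {True: ([], []), False: ([], [])}
    for comment, y, p in zip(comments, labels, predictions):
        ys, ps = groups[mentions_identity(comment)]
        ys.append(y)
        ps.append(p)
    return {
        "identity": false_positive_rate(*groups[True]),
        "no_identity": false_positive_rate(*groups[False]),
    }


# Toy data: every comment is non-hateful (label 0); the model flags one of them.
comments = [
    "I am a gay man and proud",
    "Jewish holidays are coming up",
    "nice weather today",
    "great article, thanks",
]
labels = [0, 0, 0, 0]
predictions = [1, 0, 0, 0]

rates = fpr_by_identity_mention(comments, labels, predictions)
print(rates)
```

A large gap between the two rates would suggest the model is reacting to identity terms rather than to hateful content.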

Team Members

Technologies

Python:

Amazon Web Services:

Google Cloud Services:

Files & Notebooks

Final Models

Feature Generation

  • feature_generation_functions.py: Contains functions used to generate text and numerical features for the models. (273 lines)
  • feature_generation.ipynb: Python 3 notebook that runs functions from feature_generation_functions.py and pickle_functions.py. Generates features, pickles the data frames, and sends them to an S3 bucket. (160 lines)
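
As a rough illustration of what numerical features for a comment can look like, here is a minimal standard-library sketch. The function name and the specific features are assumptions chosen for the example; they are not taken from feature_generation_functions.py.

```python
# Hypothetical sketch of simple numerical text features. The feature names
# are illustrative and do not come from feature_generation_functions.py.
import re


def numeric_text_features(comment):
    """Compute a few surface-level numerical features for one comment."""
    tokens = re.findall(r"\w+", comment)
    letters = [c for c in comment if c.isalpha()]
    upper = sum(1 for c in letters if c.isupper())
    return {
        "char_count": len(comment),
        "word_count": len(tokens),
        "exclamation_count": comment.count("!"),
        # Guard against division by zero for comments with no letters.
        "uppercase_ratio": upper / len(letters) if letters else 0.0,
    }


features = numeric_text_features("STOP it now!!")
print(features)
```

Features like these would typically be computed per row and stored alongside the text features before the data frame is pickled.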

Helper Functions

  • model_functions.py: Contains functions to build and test Naive Bayes and SVM models and to compute metrics on them. (226 lines)
  • pickle_functions.py: Contains functions used to read and write pickle files hosted in an AWS S3 bucket. (60 lines)
  • exploration/exploration_functions.py: Contains functions used to explore the dataset. (103 lines)
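
The pickle round trip described for pickle_functions.py can be sketched as follows. This is a minimal example under stated assumptions, not the repository's code: a plain dict stands in for the S3 bucket so the example runs offline, and the function names and object key are hypothetical. With a real bucket, the bytes would instead be passed to an S3 client (for example boto3's `put_object` and `get_object`).

```python
# Hypothetical sketch: serialize an object with pickle and store the bytes
# under a key. A plain dict stands in for the S3 bucket (assumption).
import pickle

fake_bucket = {}  # stand-in for an S3 bucket; maps object key -> bytes


def write_pickle(obj, key, bucket=fake_bucket):
    """Serialize obj and store the resulting bytes under key."""
    bucket[key] = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)


def read_pickle(key, bucket=fake_bucket):
    """Load the bytes stored under key and deserialize them."""
    return pickle.loads(bucket[key])


# Toy rows standing in for a pickled data frame; the key name is made up.
rows = [{"comment": "nice weather today", "label": 0}]
write_pickle(rows, "features/train.pkl")
restored = read_pickle("features/train.pkl")
print(restored == rows)
```

Because unpickling can execute arbitrary code, pickle files should only be loaded from storage you trust.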

Intermediate Models

If a notebook fails to open, paste its link into the renderer at https://nbviewer.jupyter.org/


Languages

  • Jupyter Notebook 97.0%
  • Python 2.9%
  • Shell 0.1%