Detecting hate speech

Final project for NYU Statistical Consulting class.

Warning: this project contains vulgar and offensive language.

Read the full paper here and a shortened summary here.

Abstract

The divisive and polarizing rhetoric of the 2016 presidential election sparked concern over the popularization of hateful sentiments towards marginalized populations on Twitter. We analyzed ~100 million tweets spanning 14 years to identify hate speech targeted towards the LGBTQ+ community. We then modeled the prevalence and incidence of hate speech surrounding key political events to test whether these patterns changed significantly.

Hate speech was identified using dictionary-based methods refined by logistic regression, Naive Bayes, and recurrent neural network (RNN) classifiers. A quasi-experimental interrupted time series design was used to quantify the incidence and prevalence of hate speech: incidence is defined as the change in the rate of hate speech, and prevalence as the change in the amount of hate speech in a given time period.
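As a rough illustration of the dictionary-based first pass, the sketch below flags tweets that contain any term from a lexicon. It is not the project's code: the `LEXICON` terms are neutral placeholders, and `tokenize` and `flag_candidates` are hypothetical names chosen for this example.

```python
import re

# Placeholder terms only (assumption); the project's actual lexicon is not reproduced here.
LEXICON = {"slur_a", "slur_b"}

# Lowercase word-like tokens; punctuation is dropped.
TOKEN_RE = re.compile(r"[a-z0-9_']+")


def tokenize(text):
    """Split a tweet into lowercase tokens."""
    return TOKEN_RE.findall(text.lower())


def flag_candidates(tweets, lexicon=LEXICON):
    """Return the tweets that contain at least one lexicon term."""
    return [t for t in tweets if lexicon.intersection(tokenize(t))]


tweets = [
    "an ordinary tweet",
    "this one contains SLUR_A",
    "slur_b, quoted in a news report",
]
candidates = flag_candidates(tweets)
```

A match on a term says nothing about how the term is used (the third tweet here is a quotation), which is presumably why the dictionary output is passed to the classifiers for refinement.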

We found no conclusive evidence of changes in the prevalence or incidence of hate speech around key events. While some events saw brief upticks in prevalence, overall levels of hate speech remained stable. We did find exploratory evidence of a decrease in the incidence of anti-LGBTQ+ hate speech over time (p < 0.001), coinciding with a Twitter policy change that allowed users to report abuse directly.

The incidence and prevalence are visualized in the plot below. Incidence is represented by the change in slope between the two regression lines, and prevalence by the gap between the two regression lines at the date of interest. The most apparent change, in the bottom-right facet, corresponds to the United States v. Windsor decision of June 26, 2013. However, it is confounded by a change in Twitter policy two months later.
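The two quantities read off the plot can be sketched numerically. The example below fits one least-squares line before the event and one after, then reports the change in slope (incidence) and the gap between the lines at the event date (prevalence). It is a minimal sketch on synthetic, noise-free data, not the project's model; `fit_line`, `interrupted_time_series`, and the numbers are assumptions made for illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def interrupted_time_series(days, rates, event_day):
    """Return (slope_change, level_gap) around event_day."""
    pre = [(d, r) for d, r in zip(days, rates) if d < event_day]
    post = [(d, r) for d, r in zip(days, rates) if d >= event_day]
    b0_pre, b1_pre = fit_line(*zip(*pre))
    b0_post, b1_post = fit_line(*zip(*post))
    # Incidence: how much the trend changes after the event.
    slope_change = b1_post - b1_pre
    # Prevalence: vertical gap between the two fitted lines at the event date.
    level_gap = (b0_post + b1_post * event_day) - (b0_pre + b1_pre * event_day)
    return slope_change, level_gap


# Synthetic series (assumption): rising before day 10, lower and falling after.
days = list(range(20))
event_day = 10
rates = [5.0 + 0.2 * d if d < event_day else 4.0 - 0.1 * (d - event_day) for d in days]
slope_change, level_gap = interrupted_time_series(days, rates, event_day)
```

A negative `slope_change` means the trend in hate speech flattened or reversed after the event, and a negative `level_gap` means the level dropped at the event date. Real tweet data would also call for standard errors and a check for autocorrelation before any such change is treated as significant.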
