
Analysis of text in tweets and youtube comments for toxicity


jhochmuth/YouToxic


YouToxic

A Python web application that predicts the toxicity of text using deep learning. The app is no longer deployed on a cluster.

Demo

Basic demo

Specifics

The framework of the app was built primarily with Dash by Plotly.

The predictions are generated with ULMFiT models trained on the dataset provided for the Jigsaw Toxic Comment Classification Kaggle competition. You can find the dataset here.

Currently, predictions can be made for four types of toxicity: general toxicity, insults, obscenity, and prejudice/identity hate.

Usage

To run the application locally, use requirements.txt to build your environment. You also need to set environment variables containing credentials for the Twitter API and the YouTube API. Tweets and YouTube comments cannot be collected without these credentials, but predictions on manually entered text still work.

These are the necessary environment variables: CONSUMER_KEY, CONSUMER_SECRET, ACCESS_KEY, ACCESS_SECRET, YOUTUBE_KEY

The first four variables are for the Twitter API; YOUTUBE_KEY is for the YouTube API.
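As a rough illustration of the credentials described above, the following sketch reports which services have all of their variables set. The variable names come from this README; the helper functions `missing_vars` and `credential_report` are hypothetical and are not part of the YouToxic code base.

```python
import os

# Variable names are taken from the README; grouping them per service
# in tuples like this is an assumption made for the sketch.
TWITTER_VARS = ("CONSUMER_KEY", "CONSUMER_SECRET", "ACCESS_KEY", "ACCESS_SECRET")
YOUTUBE_VARS = ("YOUTUBE_KEY",)


def missing_vars(names, env=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


def credential_report(env=None):
    """Map each service to the list of credentials it is still missing."""
    return {
        "twitter": missing_vars(TWITTER_VARS, env),
        "youtube": missing_vars(YOUTUBE_VARS, env),
    }


if __name__ == "__main__":
    for service, missing in credential_report().items():
        status = "ready" if not missing else "missing " + ", ".join(missing)
        print(f"{service}: {status}")
```

Running a check like this before starting the app shows whether tweet and comment collection will be available or whether only manual text entry will work.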

You must also download the model and mappings files for each type of toxicity. These files can be downloaded here. Add them to the "youtoxic/app/models" directory.

These files are stored on OneDrive because they are too large for storage on GitHub, and Git LFS has a 1 GB storage limit for free users.
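To confirm that the downloaded files ended up in the right place, a small check along these lines may help. It only lists what is present in the models directory named above. The helper `list_model_files` is hypothetical, and because the README does not give the individual file names, the sketch makes no assumption about them.

```python
from pathlib import Path

# Directory named in the README, relative to the repository root.
MODELS_DIR = Path("youtoxic/app/models")


def list_model_files(models_dir=MODELS_DIR):
    """Return sorted file names in models_dir, or an empty list if it is missing."""
    models_dir = Path(models_dir)
    if not models_dir.is_dir():
        return []
    return sorted(p.name for p in models_dir.iterdir() if p.is_file())


if __name__ == "__main__":
    files = list_model_files()
    if files:
        print("Found model files:", ", ".join(files))
    else:
        print(f"No files found in {MODELS_DIR}; download the model and mappings files first.")
```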

Documentation

YouToxic uses Sphinx to build the documentation. To build the HTML documentation, run:

cd docs
make html
