Machine Learning Fall 2022 Final Project

2 labels version

3 labels version

Project Description

We are trying to solve the problem of automating hate speech and offensive language detection.

Hate speech is common on social media, and it would be easier to regulate if a program could detect it automatically. The problem is similar to the language recognition task in the hw3 lab: we take natural language input as a sequence and train a model to predict labels associated with that sequence. What makes this task distinctive is that hate speech and offensive language can be hard to detect, because whether language is hateful or offensive depends on the context in which it is used. By automating hate speech and offensive language detection, we could contribute to a healthier internet environment.

Install Dependencies

On macOS/Linux

pip install -r requirements.txt

Quickstart

ML_project_3labels.ipynb is a Jupyter notebook that contains the 3 labels classifier model we wrote.

ML_project_2labels.ipynb is a Jupyter notebook in which we combine the "Offensive" and "Hate" classes in the dataset into a single class for binary classification.
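The label merge used for the 2 labels version can be sketched as follows. This is a minimal illustration, not the notebook's code: the label strings and the `to_binary` helper are hypothetical, and the dataset may encode its classes differently (for example as integers).

```python
# Hypothetical label names; the dataset's actual encoding may differ.
HATE, OFFENSIVE, NEITHER = "Hate", "Offensive", "Neither"


def to_binary(label):
    """Map a 3-class label to 1 (hate or offensive) or 0 (neither)."""
    if label in (HATE, OFFENSIVE):
        return 1
    if label == NEITHER:
        return 0
    raise ValueError(f"unknown label: {label!r}")


labels = [HATE, NEITHER, OFFENSIVE, NEITHER]
binary_labels = [to_binary(label) for label in labels]
print(binary_labels)
```

Raising on an unknown label, rather than silently mapping it to 0, makes mislabeled rows visible during preprocessing.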

Methods Documentation

create_train_and_test_set_balanced(X, y, train_ratio=0.8)

Parameters

X: array of sentence embeddings
y: labels
train_ratio: fraction of the data assigned to the training set (default 0.8)

Returns

X_train: Training data
X_rem: Testing data
y_train: Training labels
y_rem: Testing labels
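One way such a split could work is sketched below. It assumes that "balanced" means stratified by label, so each class contributes roughly `train_ratio` of its examples to the training set; the notebook's function may balance classes differently (for instance by undersampling). The name `stratified_split` and the `seed` parameter are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(X, y, train_ratio=0.8, seed=0):
    """Split X and y so each class keeps roughly train_ratio in training.

    Assumption: "balanced" is read here as stratified by label.
    """
    rng = random.Random(seed)

    # Group example indices by their label.
    indices_by_label = defaultdict(list)
    for index, label in enumerate(y):
        indices_by_label[label].append(index)

    # Shuffle within each class, then cut each class at train_ratio.
    train_idx, rem_idx = [], []
    for indices in indices_by_label.values():
        rng.shuffle(indices)
        cut = int(len(indices) * train_ratio)
        train_idx.extend(indices[:cut])
        rem_idx.extend(indices[cut:])

    X_train = [X[i] for i in train_idx]
    X_rem = [X[i] for i in rem_idx]
    y_train = [y[i] for i in train_idx]
    y_rem = [y[i] for i in rem_idx]
    return X_train, X_rem, y_train, y_rem


# Toy data: 10 examples of class 0 and 5 of class 1.
X = [[float(i)] for i in range(15)]
y = [0] * 10 + [1] * 5
X_train, X_rem, y_train, y_rem = stratified_split(X, y, train_ratio=0.8)
print(len(X_train), len(X_rem))
```

The return order mirrors the one documented above: training data, remaining data, training labels, remaining labels.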

model.fit(train_loader, epochs=300, lr=1e-5, interval=100)

Parameters

train_loader: Dataloader for the training dataset
epochs: number of epochs in training
lr: learning rate of optimizer
interval: frequency to output loss information

model.validate(valid_loader)

Parameters

valid_loader: Dataloader for the validation dataset

Returns

The average validation loss

model.accuracy(test_loader)

Parameters

test_loader: Dataloader for the testing dataset

Returns

Accuracy score of the model on the testing dataset

model.predict(sentence)

Parameters

sentence: Input sentence to predict its category

Returns

The predicted category: Hate, Offensive, or Neither

model.metrics(test_loader)

Note: this method is only available in the 2 labels version.

Parameters

test_loader: Dataloader for the testing dataset

Returns

Evaluation metrics including accuracy, precision, recall, and F1 score, plus an ROC curve
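For reference, the scalar metrics listed above can be computed from binary labels and predictions as in the sketch below. It uses the standard definitions and is not taken from the notebook; the `binary_metrics` helper is hypothetical, and the ROC curve is omitted because it needs predicted scores rather than hard labels.

```python
def binary_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    # Fall back to 0.0 when a denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
scores = binary_metrics(y_true, y_pred)
print(scores)
```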

