NyAiko/Violence-Tweet-Classification

Violence-Tweet-Classification

Overview

The primary objective of "Classifying Violence Text" is to build a machine learning model that assigns text documents to specific violence classes. By analyzing the text, we look for patterns and features that distinguish the different forms of violence, as a contribution to addressing these societal problems.

Dataset

For this project, I used a dataset from Kaggle containing textual descriptions of violence incidents, each labeled with one of five classes representing different forms of violence. This dataset is the basis for training, validating, and evaluating the classification model.

Data Preprocessing

To ensure the model's efficacy, we subjected the text data to essential preprocessing steps, including:

  • Converting all text to lowercase to maintain uniformity in text representation.
  • Removing punctuation to focus solely on the meaningful words.
  • Eliminating common stop words to reduce noise and enhance signal detection.
  • Applying lemmatization to reduce words to their base (dictionary) form.
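
The four steps above can be sketched as a single function. This is a minimal illustration, not the project's actual code: the stop-word set and the `TOY_LEMMAS` lookup are small hypothetical stand-ins for a full stop-word list and a real lemmatizer, and the `preprocess` name is an assumption.

```python
import string

# Illustrative stop-word subset (assumption); a real run would use a fuller list.
STOP_WORDS = {"a", "an", "the", "is", "was", "were", "by", "and", "to", "of", "in", "her", "his"}

# Toy lemma lookup standing in for a real lemmatizer (assumption).
TOY_LEMMAS = {"threatened": "threaten", "insults": "insult", "children": "child"}

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text, lemmatize=lambda word: TOY_LEMMAS.get(word, word)):
    """Lowercase, strip punctuation, drop stop words, then lemmatize each token."""
    text = text.lower().translate(_PUNCT_TABLE)
    tokens = [word for word in text.split() if word not in STOP_WORDS]
    return " ".join(lemmatize(word) for word in tokens)


cleaned = preprocess("He THREATENED her, and the children were scared!")
print(cleaned)
```

Passing a different `lemmatize` callable swaps in a real lemmatizer without touching the other steps.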

Text Vectorization

To convert the preprocessed text into numerical features, we used the TF-IDF (Term Frequency-Inverse Document Frequency) Vectorizer. TF-IDF weights each word by how often it appears in a document and down-weights words that appear in many documents across the corpus, so terms that are distinctive for a document receive higher scores.
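
To make the weighting concrete, here is a small standard-library sketch of TF-IDF. It assumes the smoothed IDF formula `ln((1 + n) / (1 + df)) + 1` followed by L2 normalization, which is what scikit-learn's `TfidfVectorizer` applies by default; the README does not say which settings were actually used, and the `tfidf` function and sample documents are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document, L2-normalized."""
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed inverse document frequency.
    idf = {term: math.log((1 + n_docs) / (1 + count)) + 1 for term, count in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        raw = {term: count * idf[term] for term, count in tf.items()}
        # Guard against empty documents, whose norm would be zero.
        norm = math.sqrt(sum(weight * weight for weight in raw.values())) or 1.0
        vectors.append({term: weight / norm for term, weight in raw.items()})
    return vectors


# Invented sample documents (assumption), already preprocessed.
docs = ["he hit her", "he insult her", "he control money"]
vectors = tfidf(docs)
for term, weight in sorted(vectors[0].items(), key=lambda item: -item[1]):
    print(f"{term}: {weight:.3f}")
```

In the first document, "hit" appears in only one document and so outranks "her" (two documents) and "he" (all three).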

Machine Learning Model

The machine learning model we chose for this classification task is Logistic Regression. It is simple and efficient for both binary and multi-class classification, which made it a suitable choice for this project. We trained the model on the TF-IDF features to assign each text document to its violence class.
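
A minimal sketch of this setup with scikit-learn might look like the following. The texts and class names are invented for illustration and are not taken from the Kaggle dataset, and the hyperparameters (`max_iter=1000`, default TF-IDF settings) are assumptions rather than the values used in the project.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented toy examples and labels (assumption), not the real dataset.
texts = [
    "he hit me with a belt",
    "she slapped me in the face",
    "he insults me every day",
    "she calls me worthless",
    "he controls all my money",
    "she took away my salary",
]
labels = ["physical", "physical", "emotional", "emotional", "economic", "economic"]

# Chain vectorization and classification so both are fitted together.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(texts, labels)

predictions = model.predict(["he keeps my salary and my money"])
print(predictions)
```

Wrapping both steps in one pipeline means the vectorizer's vocabulary is learned only from the training texts, which avoids leaking test data into the features.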

Tools and Technology Used

In this data science endeavor, I harnessed the capabilities of various Python libraries, including:

  • Pandas: For seamless data manipulation and analysis.
  • Matplotlib and Seaborn: For creating insightful data visualizations and plots.
  • Scikit-Learn: To implement and evaluate the machine learning models.

Results and Insights

The trained Logistic Regression model performed well at classifying violence text, with good accuracy and precision across the violence classes. The results give insight into what distinguishes each form of violence in text, which may help in designing prevention strategies and support mechanisms.
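
Accuracy, per-class precision, and a confusion matrix can be computed with scikit-learn as sketched below. The label lists are hypothetical toy values chosen to show the calls; they are not the project's results.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score

# Hypothetical true and predicted labels (assumption), not real results.
y_true = ["physical", "physical", "emotional", "emotional", "economic", "economic"]
y_pred = ["physical", "emotional", "emotional", "emotional", "economic", "economic"]
class_names = ["economic", "emotional", "physical"]

accuracy = accuracy_score(y_true, y_pred)
# average=None returns one precision value per class, in class_names order.
per_class_precision = precision_score(y_true, y_pred, labels=class_names, average=None)
# Rows are true classes, columns are predicted classes.
matrix = confusion_matrix(y_true, y_pred, labels=class_names)

print(f"accuracy: {accuracy:.3f}")
for name, value in zip(class_names, per_class_precision):
    print(f"precision[{name}]: {value:.3f}")
print(matrix)
```

Reading the matrix row by row shows which classes get confused with each other, which a single accuracy number hides.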

[Figures: eco, emov, harmful, phys_vi, sx_vio, wordcount, confusion_matrix]

Tools Used: Python, Pandas, Matplotlib, Seaborn, Scikit-Learn
