Violence-Tweet-Classification:

Overview

The goal of this project is to build a machine learning model that assigns text documents to specific violence classes. Analyzing the text exposes the patterns and features that distinguish the different forms of violence, which can support efforts to address them.

Dataset

The dataset comes from Kaggle and consists of textual descriptions of violence incidents, each labeled with one of five classes representing a different form of violence. It is used to train, validate, and evaluate the classification model.

Data Preprocessing

Before training, the text data went through the following preprocessing steps:

Converting all text to lowercase so that the same word is always represented the same way.
Removing punctuation so that only the words themselves remain.
Removing common stop words to reduce noise.
Applying lemmatization to reduce each word to its base (dictionary) form.
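
The four steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the function name `preprocess`, the tiny `STOP_WORDS` set, and the pluggable `lemmatize` argument are all assumptions, since the README does not say which stop-word list or lemmatizer was used.

```python
import string

# Small illustrative stop-word list (assumption: the real list is not given in the README).
STOP_WORDS = {"a", "an", "the", "is", "was", "and", "to", "of", "in", "on"}


def preprocess(text, lemmatize=lambda word: word):
    """Lowercase, strip punctuation, drop stop words, then lemmatize each token.

    `lemmatize` is a hypothetical hook: pass any callable that maps a word to
    its base form. By default words are left unchanged.
    """
    text = text.lower()
    # Delete every ASCII punctuation character.
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = [token for token in text.split() if token not in STOP_WORDS]
    return " ".join(lemmatize(token) for token in tokens)


cleaned = preprocess("The man WAS shouting, and hit the door!")

# A toy dictionary-based lemmatizer, only to show where lemmatization plugs in.
toy_lemmas = {"shouting": "shout"}
lemmatized = preprocess(
    "The man WAS shouting, and hit the door!",
    lemmatize=lambda word: toy_lemmas.get(word, word),
)
```

In practice the `lemmatize` hook would be backed by a real lemmatizer from an NLP library rather than a hand-written dictionary.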

Text Vectorization

The preprocessed text is converted into numerical features with the TF-IDF (Term Frequency-Inverse Document Frequency) vectorizer. TF-IDF weights each term by how often it appears in a document, scaled down by how many documents in the corpus contain it, so words that are frequent in one document but rare overall receive the highest weights.
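
As a rough illustration of this step, the sketch below runs scikit-learn's `TfidfVectorizer` with default settings on a three-sentence toy corpus. The corpus is made up for the example, and the notebook may configure the vectorizer differently.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus (illustrative only, not taken from the project's dataset).
docs = [
    "he hit her during the argument",
    "she was threatened online",
    "he hit the wall",
]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)  # sparse matrix: one row per document, one column per term

vocab = vectorizer.vocabulary_  # maps each term to its column index
idf = vectorizer.idf_           # one inverse-document-frequency weight per column

# "hit" occurs in two documents and "threatened" in only one,
# so the rarer term receives the larger idf weight.
idf_hit = idf[vocab["hit"]]
idf_threatened = idf[vocab["threatened"]]
```

Each row of `X` is the feature vector that the classifier receives for the corresponding document.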

Machine Learning Model

The classifier is Logistic Regression, a widely used algorithm that is simple and efficient for both binary and multi-class classification. The model was trained on the TF-IDF features to assign each text document to its violence class.
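
A minimal sketch of training such a model is shown below, assuming a scikit-learn pipeline that chains `TfidfVectorizer` and `LogisticRegression`. The training texts and the labels `class_a`, `class_b`, and `class_c` are placeholders invented for the example (the README does not name the five classes), and `max_iter=1000` is an assumed setting rather than the notebook's.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Placeholder training data: texts and class names are hypothetical.
train_texts = [
    "punch kick hit",
    "hit slap punch",
    "insult shout threat",
    "threat insult yell",
    "stalk follow watch",
    "watch stalk track",
]
train_labels = ["class_a", "class_a", "class_b", "class_b", "class_c", "class_c"]

# The pipeline vectorizes the raw text and feeds the TF-IDF features to the classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(train_texts, train_labels)

predicted = model.predict(["punch hit", "insult threat", "stalk watch"])
```

Wrapping both steps in one pipeline keeps the vectorizer's vocabulary tied to the training data, so new text is transformed the same way at prediction time.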

Tools and Technology Used

The project uses the following Python libraries:

Pandas: For data manipulation and analysis.
Matplotlib and Seaborn: For data visualizations and plots.
Scikit-Learn: For implementing and evaluating the machine learning models.

Results and Insights

The trained Logistic Regression model performed well at classifying violence text, as measured by accuracy and by precision across the violence classes. The results also give some insight into the characteristics that distinguish each form of violence, which may help in devising prevention strategies and support mechanisms.
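
For reference, accuracy and per-class precision can be computed with scikit-learn along the following lines. The labels and predictions here are toy values chosen for the example; they are not the project's results, which this README does not list.

```python
from sklearn.metrics import accuracy_score, precision_score

# Toy ground truth and predictions (hypothetical, for illustration only).
labels = ["class_a", "class_b", "class_c"]
y_true = ["class_a", "class_a", "class_b", "class_b", "class_c", "class_c"]
y_pred = ["class_a", "class_b", "class_b", "class_b", "class_c", "class_a"]

# Fraction of all documents that were classified correctly.
accuracy = accuracy_score(y_true, y_pred)

# Precision per class: of the documents predicted as a class, how many truly belong to it.
per_class_precision = precision_score(y_true, y_pred, labels=labels, average=None)

for label, value in zip(labels, per_class_precision):
    print(f"{label}: precision={value:.2f}")
print(f"accuracy={accuracy:.2f}")
```

Reporting precision per class rather than a single average makes it easier to see which forms of violence the model confuses.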

Tools Used: Python, Pandas, Matplotlib, Seaborn, Scikit-Learn

Repository files

README.md
Text Classification.ipynb
test_data.csv
train_data.csv

NyAiko/Violence-Tweet-Classification