Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

News Articles Category Prediction #402

Merged
merged 4 commits into from
Oct 17, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
117 changes: 117 additions & 0 deletions Prediction Models/News_Category_Prediction/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,117 @@
# 📰 News Article Classification Streamlit App

## Overview 🎯

This Streamlit application classifies news articles into five predefined categories: **Business**, **Entertainment**, **Politics**, **Sports**, and **Technology**. It leverages advanced machine learning techniques, such as **Non-Negative Matrix Factorization (NMF)** and **Support Vector Classifier (SVC)**, to provide an intuitive interface where users can input articles and receive instant predictions.

## Table of Contents 📑

- [Overview 🎯](#overview-🎯)
- [Features ✨](#features-✨)
- [Technologies Used 🔧](#technologies-used-🔧)
- [Dataset 📊](#dataset-📊)
- [Jupyter Notebook 📘](#jupyter-notebook-📘)
- [Installation ⚙️](#installation-⚙️)
- [Usage 💻](#usage-💻)
- [Model Training 🤖](#model-training-🤖)
- [Model Evaluation 📈](#model-evaluation-📈)
- [Model Tuning 🔍](#model-tuning-🔍)
- [Predicting Test Dataset 📉](#predicting-test-dataset-📉)
- [Contributing 🤝](#contributing-🤝)

## Features ✨

- **Interactive Classification**: Input any news article to instantly predict its category.
- **Performance Visualization**: See model accuracy metrics and confusion matrices to analyze the results.
- **Multiple Model Benchmarking**: Test various algorithms and find the best-performing one.
- **Easy-to-Use Interface**: Streamlined design to make the app simple and intuitive for all users.

## Technologies Used 🔧

- [Streamlit](https://streamlit.io/) - Web application framework.
- [scikit-learn](https://scikit-learn.org/stable/) - Machine learning models and evaluation metrics.
- [pandas](https://pandas.pydata.org/) - Data manipulation.
- [numpy](https://numpy.org/) - Numerical computations.
- [matplotlib](https://matplotlib.org/) & [seaborn](https://seaborn.pydata.org/) - Data visualization.

## Dataset 📊

The app uses a **BBC News dataset** consisting of 2,225 articles spread across five categories. This dataset is divided into 1,490 articles for training and 735 for testing. You can download the dataset or use the one provided in this repository.

## Jupyter Notebook 📘

The Jupyter Notebook `news-classification.ipynb` is used for data preprocessing, model training, and evaluation. The key steps include:

- **Data Preprocessing**: Text tokenization, stemming, and stopword removal.
- **Feature Extraction**: TF-IDF to convert text to numerical features.
- **Model Training**: Training with models like Logistic Regression, Naive Bayes, and SVC.
- **Evaluation**: Classification reports and confusion matrices.

## Installation ⚙️

To run this app locally:

1. Clone the repository:
```bash
git clone <repository_url>
cd <repository_directory>
```
2. Install the required packages:
```bash
pip install -r requirements.txt
```

## Usage 💻

1. Run the app using Streamlit:
```bash
streamlit run app.py
```
2. Open your browser and go to `http://localhost:8501`.
3. Input any news article and get its predicted category.

## Model Training 🤖

The app evaluates several models, including:

- **Logistic Regression**: Models relationships between features and class probabilities.
- **Naive Bayes**: Probabilistic classifier with independence assumptions.
- **Support Vector Classifier (SVC)**: Finds an optimal hyperplane for classification.

The SVC model, which performed best during cross-validation, is used in the final prediction process.

## Model Evaluation 📈

Model performance is assessed using accuracy, precision, recall, and F1-score. Confusion matrices allow you to visualize misclassifications and understand model behavior better.

## Model Tuning 🔍

We use **GridSearchCV** to tune hyperparameters and improve model performance. Analyzing the most important words helps refine the model and preprocessing steps.

## Predicting Test Dataset 📉

The best-performing model is applied to predict the test dataset, giving users insights into model accuracy with unseen data.

## Contributing 🤝

Feel free to contribute! Suggestions, bug reports, or feature requests are welcome. To contribute:

1. Fork the repo.
2. Create a new branch:
```bash
git checkout -b feature-branch
```
3. Make your changes and commit them:
```bash
git commit -m "Added new feature"
```
4. Push to your branch:
```bash
git push origin feature-branch
```

## Contributor 👥

Pratik Wayal - Feel free to connect [GitHub](https://github.com/pratikwayal01)

---
24 changes: 24 additions & 0 deletions Prediction Models/News_Category_Prediction/app.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
import streamlit as st
import pandas as pd
import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline


model = joblib.load('news_category_prediction.pkl')

st.title("News Category Prediction")
st.write("""
## Predict the category of a news article based on its context.
""")

# user input
input_text = st.text_input("Enter your News Content: ", "")

# Prediction Button
if st.button("Classify"):
if input_text:
prediction = model.predict([input_text])[0]
st.write(f"Predicted Category: {prediction}")
else:
st.write("Please Enter some text to classify.")
Loading
Loading