This is a simple Fake News Detection project that uses Machine Learning (ML), Deep Learning (DL) with LSTM (Long Short-Term Memory) networks, Natural Language Processing (NLP), stemming, lemmatization, and data visualization techniques to build a model that classifies news articles as real or fake.
Table of Contents:
- Introduction
- Project Overview
- Technologies Used
- Dataset
- Data Preprocessing
- Feature Extraction
- Model Architecture
- Training
- Evaluation
- Usage
Fake news has become a significant issue in the digital age, and developing methods to identify and combat it is crucial. This project aims to address the problem of fake news by leveraging Machine Learning and Deep Learning techniques to build a model that can distinguish between genuine and fake news articles.
The project consists of several key steps:
- Data collection and preprocessing: Obtain a dataset of labelled news articles and clean the data for further processing.
- Feature extraction: Convert the textual data into numerical vectors using techniques like TF-IDF (Term Frequency-Inverse Document Frequency), stemming, and lemmatization.
- Model building: Implement a Deep Learning LSTM model to train on the extracted features.
- Training: Train the LSTM model on the preprocessed data.
- Evaluation: Evaluate the model's performance using various metrics to measure its effectiveness in detecting fake news.
- Data visualization: Visualize the results and important insights gained from the analysis.
The project employs the following technologies:
- Machine Learning (ML) and Deep Learning (DL) techniques
- Long Short-Term Memory (LSTM) neural networks
- Natural Language Processing (NLP) for text data preprocessing
- Stemming and Lemmatization for text normalization
- Data Visualization libraries for presenting results effectively
- Flask for serving the model as a simple web application
The dataset used in this project is obtained from a reliable source and contains labelled news articles. It comprises two classes: "Real News" and "Fake News." The data should be split into training and testing sets to evaluate the model's performance.
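As an illustration, loading the data and creating such a split with pandas and scikit-learn might look like the following (the file name `news.csv` and the column names `text`/`label` are assumptions about the dataset layout, not the actual files in this repository):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Load the labelled dataset (file and column names are placeholders).
df = pd.read_csv("news.csv")  # expected columns: "text", "label" (0 = real, 1 = fake)

# Hold out 20% of the articles for testing, keeping the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=42, stratify=df["label"]
)
print(f"Training articles: {len(X_train)}, test articles: {len(X_test)}")
```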
Data preprocessing is a critical step to clean and prepare the text data for further analysis. The following preprocessing steps will be applied:
- Removing HTML tags and special characters
- Converting text to lowercase
- Removing stopwords
- Tokenization
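A minimal sketch of these preprocessing steps in Python (the function name and regular expressions are illustrative, not the project's exact scripts):

```python
import re
import nltk
from nltk.corpus import stopwords

nltk.download("stopwords")
STOP_WORDS = set(stopwords.words("english"))

def preprocess(text: str) -> list[str]:
    # Remove HTML tags and non-letter characters.
    text = re.sub(r"<[^>]+>", " ", text)
    text = re.sub(r"[^a-zA-Z\s]", " ", text)
    # Convert to lowercase.
    text = text.lower()
    # Simple whitespace tokenization (NLTK's word_tokenize could be used instead),
    # then remove stopwords.
    tokens = text.split()
    return [t for t in tokens if t not in STOP_WORDS]

print(preprocess("<p>Breaking News: 100% REAL story!</p>"))
```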
To convert the textual data into a numerical format for model training, the following feature extraction techniques will be employed (a short sketch follows the list):
- TF-IDF (Term Frequency-Inverse Document Frequency): To represent the importance of words in documents.
- Stemming and Lemmatization: To reduce words to their base or root form.
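For example, stemming, lemmatization, and TF-IDF could be combined with NLTK and scikit-learn roughly as follows (a sketch under assumed parameter values, not the exact pipeline used in this project):

```python
import nltk
from nltk.stem import PorterStemmer, WordNetLemmatizer
from sklearn.feature_extraction.text import TfidfVectorizer

nltk.download("wordnet")

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

def normalize(text: str) -> str:
    # Reduce each word to a base form: lemmatize first, then stem.
    words = text.lower().split()
    return " ".join(stemmer.stem(lemmatizer.lemmatize(w)) for w in words)

corpus = [
    "The senator announced new policies today",
    "Aliens secretly control the senate",
]
vectorizer = TfidfVectorizer(preprocessor=normalize, max_features=5000)
X = vectorizer.fit_transform(corpus)  # sparse TF-IDF matrix, shape (n_docs, n_terms)
print(X.shape)
```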
The model architecture will consist of an LSTM neural network. LSTM is chosen for its ability to process sequential data and handle long-term dependencies, making it suitable for NLP tasks.
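One way such an architecture might be expressed in Keras (the vocabulary size, sequence length, and layer sizes are illustrative assumptions):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Embedding, LSTM, Dropout, Dense

VOCAB_SIZE = 5000   # assumed vocabulary size
MAX_LEN = 300       # assumed (padded) sequence length per article

model = Sequential([
    Input(shape=(MAX_LEN,)),         # integer word indices for one article
    Embedding(VOCAB_SIZE, 64),       # map each word index to a dense vector
    LSTM(128),                       # capture long-range dependencies in the sequence
    Dropout(0.3),                    # regularization against overfitting
    Dense(1, activation="sigmoid"),  # probability that the article is fake
])
model.summary()
```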
The training phase involves feeding the preprocessed data into the LSTM model. The model will be trained on the training dataset with an appropriate optimization algorithm and loss function.
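Continuing the sketch above, compiling and fitting the model could look like this; binary cross-entropy with the Adam optimizer is a typical choice for a binary classifier, and `X_train_seq`/`y_train` are assumed to be the padded integer sequences and 0/1 labels prepared in the earlier steps:

```python
from tensorflow.keras.callbacks import EarlyStopping

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

history = model.fit(
    X_train_seq, y_train,        # padded sequences and labels (assumed names)
    validation_split=0.1,        # hold out 10% of training data for validation
    epochs=5,
    batch_size=64,
    callbacks=[EarlyStopping(patience=2, restore_best_weights=True)],
)
```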
The model's performance will be evaluated using metrics such as accuracy, precision, recall, and F1-score. A confusion matrix and ROC-AUC curve may also be used to assess the model's effectiveness.
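A possible evaluation sketch with scikit-learn, assuming `X_test_seq`, `y_test`, and the trained `model` from the previous steps:

```python
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
    confusion_matrix, roc_auc_score,
)

# Predicted probabilities from the trained model (X_test_seq assumed from earlier steps).
y_prob = model.predict(X_test_seq).ravel()
y_pred = (y_prob >= 0.5).astype(int)

print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("F1-score :", f1_score(y_test, y_pred))
print("ROC-AUC  :", roc_auc_score(y_test, y_prob))
print("Confusion matrix:\n", confusion_matrix(y_test, y_pred))
```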
To use this Fake News Detection project, follow these steps:
- Clone the repository to your local machine.
- Install the required dependencies using the provided requirements.txt file.
- Run the data preprocessing scripts to clean and prepare the dataset.
- Execute the feature extraction scripts to convert the text data into numerical vectors.
- Train the LSTM model on the preprocessed data.
- Evaluate the model's performance using the evaluation scripts.
- Visualize the results and insights gained from the analysis.
- Run the Flask app to launch the web interface, paste in a news article, and the result is shown as real or fake (a minimal sketch of such an app is given below).
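A minimal sketch of how such a Flask app might expose the model (the route, template name, and the `vectorize`/`preprocess` helpers are assumptions, not this repository's exact code):

```python
from flask import Flask, request, render_template

app = Flask(__name__)

@app.route("/", methods=["GET", "POST"])
def index():
    prediction = None
    if request.method == "POST":
        article = request.form.get("article", "")
        # preprocess() and vectorize() are hypothetical helpers wrapping the
        # pipeline built above; model is the trained LSTM classifier.
        x = vectorize(preprocess(article))
        prediction = "Fake" if model.predict(x)[0] >= 0.5 else "Real"
    return render_template("index.html", prediction=prediction)

if __name__ == "__main__":
    app.run(debug=True)
```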
Screenshots: