Detecting-Phishing-Attack-using-ML-DL-Models

Developed a model that distinguishes phishing emails from legitimate ones using the SpamAssassin dataset. Relevant features were extracted by processing the emails with the NLP toolkit. Several ML models were built, including Naïve Bayes, Random Forest, and Voting Ensemble, with a best accuracy of ~72%, along with a deep learning model (a neural network) that reached an accuracy of ~96%.
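To make the feature-extraction step more concrete, here is a minimal sketch using only the Python standard library. It is not the Phishector implementation: the feature names, the `SUSPICIOUS_WORDS` list, and the helpers `get_body_text` and `extract_features` are all assumptions chosen for illustration.

```python
import re
from email import message_from_string
from email.message import Message

URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)
IP_URL_RE = re.compile(r"https?://\d{1,3}(\.\d{1,3}){3}")
# Hypothetical keyword list, chosen only for illustration.
SUSPICIOUS_WORDS = ("verify", "account", "password", "urgent", "login")


def get_body_text(msg: Message) -> str:
    """Concatenate the decoded text parts of a (possibly multipart) message."""
    parts = []
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue
        payload = part.get_payload(decode=True)
        if payload is None:
            continue
        charset = part.get_content_charset() or "utf-8"
        try:
            parts.append(payload.decode(charset, errors="replace"))
        except LookupError:
            # Unknown charset name in the header: fall back to UTF-8.
            parts.append(payload.decode("utf-8", errors="replace"))
    return "\n".join(parts)


def extract_features(raw_email: str) -> dict:
    """Turn one raw email into a small dictionary of illustrative features."""
    msg = message_from_string(raw_email)
    body = get_body_text(msg)
    lowered = body.lower()
    urls = URL_RE.findall(body)
    return {
        "num_urls": len(urls),
        "has_ip_url": any(IP_URL_RE.match(u) for u in urls),
        "has_html": any(p.get_content_type() == "text/html" for p in msg.walk()),
        "num_suspicious_words": sum(lowered.count(w) for w in SUSPICIOUS_WORDS),
        "subject_length": len(msg.get("Subject", "")),
    }
```

A dictionary like this could be converted into a numeric vector and passed to any of the classifiers named above; which features the project actually uses is not stated here.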

Overview

Phishing is when cybercriminals send malicious emails designed to trick people into falling for a scam. The intent is often to get users to reveal financial information, system credentials, or other sensitive data. The term “phishing” came about in the mid-1990s, when hackers began using fraudulent emails to fish for information from unsuspecting users. Cybercriminals use phishing because it is easy, cheap, and effective: email addresses are easy to obtain, and emails are virtually free to send. With little effort and cost, attackers can quickly gain access to valuable data. Detecting these emails and flagging them as spam reduces the impact of such attacks, and various machine learning and deep learning models can be used to do so.

Phishector Architecture

Email Dataset

An experiment was conducted to identify the input/output behavior of the system. Data was collected from two datasets, SpamAssassin and spam/ham, both of which are open source and freely available. The datasets used in the experiment are listed in Table 4.1, which shows the total number of emails in each dataset and the number of phished and legitimate emails among them; these emails were then used to train our model.
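The per-dataset counts described above could be produced with a short helper like the following sketch. It assumes a hypothetical folder layout in which each label (for example `phishing/` and `legitimate/`) is a subfolder of the dataset root; the actual layout of the datasets used here may differ.

```python
from collections import Counter
from pathlib import Path


def count_emails_by_label(dataset_root: str) -> Counter:
    """Count the files found in each label subfolder of dataset_root.

    Assumes one subfolder per label; the folder names are hypothetical.
    """
    counts = Counter()
    for label_dir in Path(dataset_root).iterdir():
        if label_dir.is_dir():
            counts[label_dir.name] = sum(
                1 for f in label_dir.iterdir() if f.is_file()
            )
    return counts


def summarize(counts: Counter) -> str:
    """Format the counts as one line per label plus a total."""
    lines = [f"{label}: {n}" for label, n in sorted(counts.items())]
    lines.append(f"total: {sum(counts.values())}")
    return "\n".join(lines)
```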

Implementation

Accessing the .py file and running the Phishector code.
Entering the path to the folder containing the emails.
Menu choices available to the user.
Choice 1 displays the features extracted from the emails.
Choice 2 performs classification using deep learning, i.e. a neural network.
Choice 3 opens the ML models menu.
Choice 3 in the ML models menu performs classification using the Extra Trees model.
Choices 4 and 5 in the ML models menu perform classification using the AdaBoost and Stochastic Gradient Boosting models, respectively.
Choices 6 and 7 in the ML models menu perform classification using the Voting Ensemble and Naive Bayes models, respectively.
Choice 8 in the ML models menu performs classification using the SVM model, and choice 9 exits the ML models menu and returns to the main menu.
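The menu flow described in these steps can be sketched as a pair of lookup tables. This is an illustrative reconstruction rather than the project's code: the names `MAIN_MENU`, `ML_MENU`, `resolve_choice`, and `run_ml_menu` are hypothetical, and options 1 and 2 of the ML models menu are left out because they are not described here.

```python
MAIN_MENU = {
    1: "Extract features",
    2: "Classify with neural network",
    3: "Open ML models menu",
}

# Options 1 and 2 of the ML models menu are not described in this README,
# so they are deliberately omitted rather than guessed.
ML_MENU = {
    3: "Extra Trees",
    4: "AdaBoost",
    5: "Stochastic Gradient Boosting",
    6: "Voting Ensemble",
    7: "Naive Bayes",
    8: "SVM",
    9: "EXIT",
}


def resolve_choice(menu: dict, raw: str):
    """Return the menu entry for a raw user input, or None if it is invalid."""
    try:
        return menu.get(int(raw.strip()))
    except ValueError:
        return None


def run_ml_menu(inputs) -> list:
    """Consume choices until EXIT and return the selected model names."""
    selected = []
    for raw in inputs:
        entry = resolve_choice(ML_MENU, raw)
        if entry is None:
            continue  # ignore invalid or undescribed options
        if entry == "EXIT":
            break  # option 9 returns to the main menu
        selected.append(entry)
    return selected
```

In an interactive program the `inputs` iterable would be replaced by repeated calls to `input()`; passing a list keeps the sketch easy to test.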

Evaluation Metrics

Graph plot of evaluation metrics vs score for different ML models on SpamAssassin dataset.
Graph plot of evaluation metrics vs score for different ML models on HSD dataset.
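The specific metrics plotted are not listed here. Assuming they are the usual accuracy, precision, recall, and F1 score for binary classification, the following sketch shows how such scores can be computed from true and predicted labels. The function names and the convention that label 1 means "phishing" are assumptions.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary classification task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred, positive=1) -> dict:
    """Compute accuracy, precision, recall, and F1, guarding against zero division."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

For phishing detection, recall on the phishing class matters because a missed phishing email reaches the user, while precision reflects how many legitimate emails are wrongly flagged.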

Result Analysis

Graph plot of Machine Learning models vs Accuracy for SpamAssassin dataset.
Graph plot of Machine Learning Models vs Accuracy for HSD dataset.

Selvagokul/Detecting-Phishing-Attack-using-ML-DL-Models
