Developed a model to distinguish phishing emails from legitimate ones using the SpamAssassin dataset. Extracted relevant features by processing the emails with an NLP toolkit. Built several ML models, including Naïve Bayes, Random Forest, and a Voting Ensemble, with a best accuracy of ~72%, and a deep learning model (a neural network) with an accuracy of ~96%.
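The pipeline summarized above (text processing, feature extraction, then a classifier) can be sketched in plain Python. This is a minimal illustration, not the project's code: it assumes a simple regex tokenizer and a small hypothetical stop-word list in place of the NLP toolkit, and it implements multinomial Naive Bayes with add-one smoothing by hand. The training emails are invented toy examples, not data from SpamAssassin.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list standing in for the NLP toolkit's (larger) list.
STOP_WORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "is", "your", "you"}


def extract_features(text):
    """Lowercase the text, keep alphabetic tokens, drop stop words; return token counts."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, emails, labels):
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.doc_counts}
        for text, label in zip(emails, labels):
            self.word_counts[label].update(extract_features(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        scores = {}
        for c, n_c in self.doc_counts.items():
            score = math.log(n_c / n_docs)  # log prior P(class)
            denom = self.total_words[c] + len(self.vocab)
            for word, count in extract_features(text).items():
                if word in self.vocab:  # ignore words never seen in training
                    score += count * math.log((self.word_counts[c][word] + 1) / denom)
            scores[c] = score
        return max(scores, key=scores.get)


# Toy usage with invented emails.
train = [
    "verify your account password now",
    "urgent click here to claim your prize",
    "meeting agenda for tomorrow attached",
    "lunch on friday with the team",
]
labels = ["phishing", "phishing", "legitimate", "legitimate"]
model = NaiveBayes().fit(train, labels)
print(model.predict("please verify your password"))  # prints: phishing
```

In practice a library implementation (for example scikit-learn's `MultinomialNB` on a bag-of-words matrix) would normally replace the hand-written class; the sketch only shows what such a model computes.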
Phishing is when cybercriminals send malicious emails designed to trick people into falling for a scam. The intent is often to get users to reveal financial information, system credentials, or other sensitive data. The term “phishing” came about in the mid-1990s, when hackers began using fraudulent emails to fish for information from unsuspecting users. Cybercriminals use phishing because it is easy, cheap, and effective: email addresses are easy to obtain, and emails are virtually free to send. With little effort and cost, attackers can quickly gain access to valuable data. Detecting these emails and flagging them as spam can reduce the success of such attacks, and machine learning and deep learning models can be used to perform this detection.
An experiment was conducted to identify the input/output behavior of the system. Data were collected from two open-source, freely available datasets: SpamAssassin and spam/ham. The datasets used in the experiment are listed in Table 4.1, which gives the total number of emails in each dataset and the number of phishing and legitimate emails it contains; these emails were then used to train the model.
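A loader that tallies phishing and legitimate emails per dataset, as Table 4.1 does, might look like the sketch below. It assumes each dataset is unpacked into label-named folders; the folder names in the hypothetical `LABEL_BY_FOLDER` mapping are assumptions modeled on SpamAssassin-style `spam`/`easy_ham`/`hard_ham` directories and should be adjusted to the real layout. Only the standard-library `email` package is used to extract each message body.

```python
import os
from collections import Counter
from email import policy
from email.parser import BytesParser

# Hypothetical mapping from folder name to label; adjust to the actual dataset layout.
LABEL_BY_FOLDER = {
    "spam": "phishing",
    "easy_ham": "legitimate",
    "hard_ham": "legitimate",
    "ham": "legitimate",
}


def load_dataset(root):
    """Walk root/<folder>/<file>, parse each raw email, and return (body, label) pairs."""
    samples = []
    for folder, label in LABEL_BY_FOLDER.items():
        path = os.path.join(root, folder)
        if not os.path.isdir(path):
            continue  # this dataset does not use that folder name
        for name in sorted(os.listdir(path)):
            file_path = os.path.join(path, name)
            if not os.path.isfile(file_path):
                continue
            with open(file_path, "rb") as fh:
                msg = BytesParser(policy=policy.default).parse(fh)
            # Prefer the plain-text part, fall back to HTML.
            body = msg.get_body(preferencelist=("plain", "html"))
            try:
                text = body.get_content() if body is not None else ""
            except LookupError:  # unknown charset declared in the message
                text = body.get_payload(decode=True).decode("latin-1", errors="replace")
            samples.append((text, label))
    return samples


def summarize(samples):
    """Count totals per label, mirroring the columns described for Table 4.1."""
    counts = Counter(label for _, label in samples)
    return {
        "total": len(samples),
        "phishing": counts["phishing"],
        "legitimate": counts["legitimate"],
    }
```

Calling `summarize(load_dataset(path))` once per dataset directory would produce one row of counts per dataset.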
- Choice 1 displays the features extracted from the emails.
- Choice 2 performs classification using deep learning, i.e. a neural network.
- Choice 3 in the ML models menu performs classification using the Extra Trees model.
- Choices 4 and 5 in the ML models menu perform classification using the AdaBoost and Stochastic Gradient Boosting models, respectively.
- Choices 6 and 7 in the ML models menu perform classification using the Voting Ensemble and Naive Bayes models, respectively.
- Choice 8 in the ML models menu performs classification using the SVM model, and choice 9 exits the internal menu and returns to the Main Menu.
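The menu behavior described in the list can be sketched as a small dispatch loop. This is an illustrative assumption, not the project's source: it assumes all nine options live in the same internal menu, and `run_model` is a hypothetical stub standing in for the real training and evaluation routines. Input and output are injectable so the loop can be exercised without a terminal.

```python
EXIT_CHOICE = "9"

# Option labels follow the menu described above; the handler below is a hypothetical stub.
ML_MENU = {
    "1": "Extracted features",
    "2": "Neural network (deep learning)",
    "3": "Extra Trees",
    "4": "AdaBoost",
    "5": "Stochastic Gradient Boosting",
    "6": "Voting Ensemble",
    "7": "Naive Bayes",
    "8": "SVM",
}


def run_model(label):
    """Placeholder: a real implementation would train and evaluate the named model."""
    return f"Running: {label}"


def ml_models_menu(read=input, write=print):
    """Show the menu until choice 9 is entered, then hand control back to the Main Menu."""
    while True:
        for key, label in ML_MENU.items():
            write(f"{key}. {label}")
        write(f"{EXIT_CHOICE}. Exit to Main Menu")
        choice = read("Enter choice: ").strip()
        if choice == EXIT_CHOICE:
            return "main_menu"
        if choice in ML_MENU:
            write(run_model(ML_MENU[choice]))
        else:
            write("Invalid choice, try again.")
```

Keeping the options in a dictionary means adding another model only requires a new entry, rather than another branch in an `if`/`elif` chain.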