PhishSense

Overview

PhishSense is a system that extracts features from URLs, classifies them as legitimate or phishing, and builds a structured dataset for training machine learning models. It consists of Python scripts that read a CSV file of URLs and their corresponding types, then extract features from each URL using web scraping. The extracted features are saved to a CSV file, which can be used to train machine learning models for phishing detection.

Usage

1. Installation

Make sure Python is installed on your system. Then install the required Python packages by running the following command from the repository root (requirements.txt ships with the repository, so clone it first as shown in step 2):

pip install -r requirements.txt

2. Running the System

Clone the Repository:

git clone https://github.com/yourusername/PhishSense.git
cd PhishSense

Prepare the CSV File:

Create a CSV file (your_csv_file.csv) with columns 'url' and 'type', where 'type' indicates whether the URL is legitimate (0) or phishing (1).
```
url,type
http://example.com,0
http://phishing.com,1
```
Run the Main Script:

Execute the main script (main.py) with the input CSV file and desired output file for extracted features. Optionally, you can specify the start and end lines (inclusive) to read from the input CSV file:
```
python main.py --input your_csv_file.csv --output extracted_features.csv --start-line 1 --end-line 100
```
This will read the CSV, extract features from each URL, and save the results to a new CSV file (extracted_features.csv).
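The command-line flags and the inclusive line range can be sketched as follows. This is a minimal illustration, not the code in main.py or read_csv.py: the function names `parse_args` and `read_rows` are hypothetical, the defaults are assumptions, and line 1 is assumed to be the first data row after the header.

```python
import argparse
import csv
import io


def parse_args(argv=None):
    # Flag names mirror the command shown above; the defaults are assumptions.
    parser = argparse.ArgumentParser(description="Extract features from URLs")
    parser.add_argument("--input", required=True)
    parser.add_argument("--output", required=True)
    parser.add_argument("--start-line", type=int, default=1)
    parser.add_argument("--end-line", type=int, default=None)
    return parser.parse_args(argv)


def read_rows(handle, start_line=1, end_line=None):
    """Yield (url, type) pairs for data rows start_line..end_line, inclusive.

    Assumption: row 1 is the first row after the 'url,type' header.
    """
    reader = csv.DictReader(handle)
    for number, row in enumerate(reader, start=1):
        if number < start_line:
            continue
        if end_line is not None and number > end_line:
            break
        yield row["url"], int(row["type"])


# In-memory stand-in for your_csv_file.csv, so the sketch runs without a file.
sample = io.StringIO("url,type\nhttp://example.com,0\nhttp://phishing.com,1\n")
args = parse_args(["--input", "in.csv", "--output", "out.csv",
                   "--start-line", "2", "--end-line", "2"])
rows = list(read_rows(sample, args.start_line, args.end_line))
print(rows)
```

With a real file, the same generator could be fed `open(args.input, newline="")` instead of the in-memory sample.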

3. Output

The system generates a CSV file (extracted_features.csv) containing the extracted features for each URL, including the URL itself, title, number of links, and the type of the website (legitimate or phishing). This file can be used as a labeled dataset for training machine learning models.
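As a rough sketch of how the title and link-count features could be derived from a page, the example below parses an HTML string with the standard library only. The actual extract_features.py may use different libraries and produce more features; the class name `PageFeatureParser`, the function signature, and the column name `num_links` are assumptions made for illustration.

```python
from html.parser import HTMLParser


class PageFeatureParser(HTMLParser):
    """Collects the page title and counts <a> tags that carry an href."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.num_links = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a" and any(name == "href" for name, _ in attrs):
            self.num_links += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_features(url, html, label):
    # 'label' is the 'type' column from the input CSV (0 or 1).
    parser = PageFeatureParser()
    parser.feed(html)
    return {
        "url": url,
        "title": parser.title.strip(),
        "num_links": parser.num_links,
        "type": label,
    }


# A fixed HTML string stands in for a fetched page, so no web request is made.
html = (
    "<html><head><title> Example Domain </title></head>"
    "<body><a href='/a'>A</a><a href='/b'>B</a><a name='x'>no href</a></body></html>"
)
features = extract_features("http://example.com", html, 0)
print(features)
```

Each returned dictionary corresponds to one row of the output CSV and could be written with `csv.DictWriter`.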

Demo

Watch this video to see PhishSense's interface and trained machine learning models in action.

PhishSense.Demo.1.1.mp4

Notes

Ensure that the URLs in the input CSV file are accessible, as the system makes web requests to extract features.
The feature-extraction scripts do not train a model themselves. Use the generated extracted_features.csv file to train your own phishing-detection model, for example with the notebook described under Machine Learning Description.

Machine Learning Description

The IPYNB file (Machine Learning Models.ipynb) contains code for training and evaluating machine learning models for phishing detection. The file includes the following sections:

Setup: Installation of necessary libraries and modules.
Data Loading and Preprocessing: Loading the dataset and preprocessing steps such as standard scaling.
Data Visualization: Visualizing the dataset using Principal Component Analysis (PCA).
Support Vector Machine (SVM): Training, evaluation, and visualization of results for the SVM model.
Neural Networks: Building, training, evaluation, and visualization of results for the neural network model.
Random Forest: Training, evaluation, and visualization of results for the random forest model.
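The scaling, PCA, SVM, and random forest steps listed above can be sketched with scikit-learn as shown below. This is not the notebook's code: it assumes scikit-learn is installed, uses synthetic data from `make_classification` in place of extracted_features.csv, picks hyperparameters arbitrarily, and leaves out the neural network.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the numeric features and the 0/1 'type' label.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# The SVM is sensitive to feature scale, so it is paired with standard scaling.
models = {
    "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

# Project the scaled features onto two principal components for plotting.
X_2d = make_pipeline(StandardScaler(), PCA(n_components=2)).fit_transform(X)

print(scores)
print(X_2d.shape)
```

Putting the scaler inside the pipeline means it is fitted on the training split only, which avoids leaking test-set statistics into the model.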

