Fixes Malware Detection #702

Merged: 4 commits, Nov 1, 2024
2,966 changes: 0 additions & 2,966 deletions Prediction Models/LSTM_Traffic_Forecasting/grab-traffic-mgmt.ipynb

This file was deleted.

80 changes: 80 additions & 0 deletions Prediction Models/Malware Detection/Readme.md
@@ -0,0 +1,80 @@

## Malware 🤖 Detection with Deep Learning 🧑🏻‍💻
This project demonstrates the implementation of a deep learning model for malware detection on Android devices. The model analyzes behavioral characteristics of applications to distinguish between benign and malicious software.

----

## Dataset
The dataset contains 100,000 observations, each representing an Android application. Each observation has 35 columns recorded from the app's behavior in a Unix/Linux-based virtual machine: a file hash, a timestamp, the class label, and 32 process-level statistics. The columns are described in the table below.

----

## Feature Descriptions

| Feature | Description |
|---|---|
| hash | APK/SHA256 file name |
| millisecond | time |
| classification | malware/benign |
| state | flag of unrunnable/runnable/stopped tasks |
| usage_counter | task structure usage counter |
| prio | keeps the dynamic priority of a process |
| static_prio | static priority of a process |
| normal_prio | priority without taking RT-inheritance into account |
| policy | scheduling policy of the process |
| vm_pgoff | the offset of the area in the file, in pages. |
| vm_truncate_count | used to mark a vma as now dealt with |
| task_size | size of current task. |
| cached_hole_size | size of free address space hole. |
| free_area_cache | first address space hole |
| mm_users | address space users |
| map_count | number of memory areas |
| hiwater_rss | peak of resident set size |
| total_vm | total number of pages |
| shared_vm | number of shared pages. |
| exec_vm | number of executable pages. |
| reserved_vm | number of reserved pages. |
| nr_ptes | number of page table entries |
| end_data | end address of code component |
| last_interval | last interval time before thrashing |
| nvcsw | number of voluntary context switches. |
| nivcsw | number of involuntary context switches |
| min_flt | minor page faults |
| maj_flt | major page faults |
| fs_excl_counter | it holds file system exclusive resources. |
| lock | the read-write synchronization lock used for file system access |
| utime | user time |
| stime | system time |
| gtime | guest time |
| cgtime | cumulative group time. Cumulative resource counter |
| signal_nvcsw | used as cumulative resource counter. |
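
To make the column layout concrete, here is a minimal sketch of how a table with these columns might be prepared for modeling. It builds a small synthetic frame using a handful of the column names above (the values are random stand-ins, not the real dataset), maps `classification` to 0/1, shuffles the rows, and standardizes the remaining features. Dropping `hash` and constant columns, the 80/20 split, and fitting the scaler on the training split only are illustrative assumptions, not necessarily the choices made in the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data (assumption: values are random).
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "hash": [f"app_{i % 50}" for i in range(n)],
    "millisecond": rng.integers(0, 1000, size=n),
    "classification": rng.choice(["benign", "malware"], size=n),
    "prio": rng.normal(120.0, 5.0, size=n),
    "static_prio": rng.integers(10000, 30000, size=n).astype(float),
    "usage_counter": np.zeros(n),  # constant column, carries no signal
    "utime": rng.normal(380000.0, 5000.0, size=n),
})

# Map the label to 0 (benign) / 1 (malware), then shuffle the rows.
df["classification"] = df["classification"].map({"benign": 0, "malware": 1}).astype(int)
df = df.sample(frac=1.0, random_state=42).reset_index(drop=True)

# Correlation of each numeric column with the label (NaN for constant columns).
corr = df.drop(columns=["hash"]).corr()["classification"].drop("classification")
print(corr.abs().sort_values(ascending=False))

# Illustrative feature selection: drop the identifier and constant columns.
constant_cols = [c for c in df.columns if df[c].nunique() == 1]
X = df.drop(columns=["hash", "classification"] + constant_cols)
y = df["classification"]

# Hold out a test split, then standardize using training statistics only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)
```

Fitting the scaler on the training split alone keeps test-set statistics out of the preprocessing step; if the notebook scales before splitting, the numbers will differ slightly.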

------


## Methodology
- Data Loading and Preprocessing: The dataset is loaded and the 'classification' column is mapped to binary values (0 for benign, 1 for malware). The data is then shuffled.

- Exploratory Data Analysis: The distribution of the target variable ('classification') is visualized. A correlation matrix is generated to identify relationships between features.

- Feature Selection: Several features are dropped based on low correlation or redundancy.

- Data Normalization: The features are standardized with StandardScaler so that each has zero mean and unit variance.

- Model Building: A deep neural network is constructed using TensorFlow/Keras. The model consists of multiple dense layers with ReLU activation functions and an output layer with a softmax activation function.

- Model Compilation and Training: The model is compiled with the Adam optimizer and sparse categorical cross-entropy loss function. It is trained on the training data with a specified batch size and number of epochs.  

- Evaluation: The trained model is evaluated on the test data, and the test loss and accuracy are printed.

- Fine-tuning: The model is further trained using an SGD optimizer with a lower learning rate and early stopping to potentially improve performance.
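
The model-building, training, evaluation, and fine-tuning steps above can be sketched as follows. This is a rough illustration on synthetic data, not the notebook's code: the number of input features, the layer widths, batch size, epoch counts, SGD learning rate, and early-stopping patience are all assumed values chosen to keep the example small.

```python
import numpy as np
from tensorflow import keras

# Synthetic, already-standardized data (assumption: 20 input features).
rng = np.random.default_rng(0)
n_features = 20
X_train = rng.normal(size=(512, n_features)).astype("float32")
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype("int32")
X_test = rng.normal(size=(128, n_features)).astype("float32")
y_test = (X_test[:, 0] + X_test[:, 1] > 0).astype("int32")


def build_model(n_inputs):
    """Dense ReLU layers followed by a 2-unit softmax output."""
    return keras.Sequential([
        keras.Input(shape=(n_inputs,)),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dense(2, activation="softmax"),
    ])


model = build_model(n_features)

# Initial training: Adam with sparse categorical cross-entropy,
# which expects integer class labels (0 or 1) rather than one-hot vectors.
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
history = model.fit(
    X_train, y_train,
    validation_split=0.2, batch_size=32, epochs=5, verbose=0,
)

test_loss, test_acc = model.evaluate(X_test, y_test, verbose=0)
print(f"test loss: {test_loss:.4f}, test accuracy: {test_acc:.4f}")

# Fine-tuning: recompiling keeps the learned weights and swaps the optimizer.
model.compile(
    optimizer=keras.optimizers.SGD(learning_rate=1e-3),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=3, restore_best_weights=True
)
ft_history = model.fit(
    X_train, y_train,
    validation_split=0.2, batch_size=32, epochs=20,
    callbacks=[early_stop], verbose=0,
)
```

A two-unit softmax paired with sparse categorical cross-entropy matches the description above; a single sigmoid unit with binary cross-entropy would be an equivalent formulation for a two-class problem.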

-------

## Results
The model achieves high accuracy (over 99%) in classifying malware on the test data. The training history is visualized to observe the trend of accuracy and loss over epochs.
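
One way to visualize a training history is sketched below. The helper name `plot_history` is hypothetical, and the numbers passed to it are placeholders that only demonstrate the call; they are not results from this project. In practice the `history.history` dictionary returned by Keras `fit` would be passed instead.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; not needed inside a notebook
import matplotlib.pyplot as plt


def plot_history(history):
    """Plot accuracy and loss curves from a Keras-style history dict."""
    fig, (ax_acc, ax_loss) = plt.subplots(1, 2, figsize=(10, 4))

    # Left panel: training and validation accuracy, if present.
    for key in ("accuracy", "val_accuracy"):
        if key in history:
            ax_acc.plot(history[key], label=key)
    ax_acc.set_xlabel("epoch")
    ax_acc.set_ylabel("accuracy")
    ax_acc.legend()

    # Right panel: training and validation loss, if present.
    for key in ("loss", "val_loss"):
        if key in history:
            ax_loss.plot(history[key], label=key)
    ax_loss.set_xlabel("epoch")
    ax_loss.set_ylabel("loss")
    ax_loss.legend()

    fig.tight_layout()
    return fig


# Placeholder values for demonstration only (NOT measured results).
placeholder = {
    "accuracy": [0.5, 0.6, 0.7],
    "val_accuracy": [0.5, 0.55, 0.65],
    "loss": [0.9, 0.7, 0.6],
    "val_loss": [0.95, 0.8, 0.7],
}
fig = plot_history(placeholder)
```

Plotting training and validation curves side by side makes it easier to spot overfitting, where validation loss rises while training loss keeps falling.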

------

## Conclusion
This project demonstrates the effectiveness of deep learning in malware detection. By analyzing behavioral patterns, the model can identify malicious applications with high accuracy, contributing to improved security on Android devices.