Network Intrusion Detection Project

Authors: Charles Rizzo, Nick Skuda, Austin Saporito

The final project report is titled CS545_Final_Report.pdf and is in the top level of this repo.

Add the data sets UNSW_NB15_training-set.csv and UNSW_NB15_testing-set.csv to the folder data/
Run python3 remove_duplicates.py once for each of the aforementioned data sets (you have to go into main and change the argument by hand for each file; yes, I know this could have been done better)
You should see UNSW_test.csv and UNSW_train.csv in the data/ directory now
Run python3 preprocess_data.py to generate all of the numpy files in the data/ dir
All of those .npy files collectively represent 4 versions of the data set: normal_binary (attack vs. normal), normal_labeled (normal vs. 1 of 9 attack labels), PCA_binary (attack vs. normal), and PCA_labeled (normal vs. 1 of 9 attack labels)
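
For reference, the de-duplication step can be sketched with pandas. This is a minimal sketch and not the actual contents of remove_duplicates.py: the function name dedupe_csv and its subset parameter are hypothetical, and it assumes "duplicate" means rows that match exactly on the compared columns.

```python
import pandas as pd


def dedupe_csv(in_path, out_path, subset=None):
    """Drop duplicate rows from a CSV file (hypothetical helper, not the repo's code).

    subset: optional list of column names to compare; None compares every column.
    Returns (rows_before, rows_after).
    """
    df = pd.read_csv(in_path)
    # Keep the first occurrence of each duplicated row.
    deduped = df.drop_duplicates(subset=subset).reset_index(drop=True)
    deduped.to_csv(out_path, index=False)
    return len(df), len(deduped)
```

Called as dedupe_csv("data/UNSW_NB15_training-set.csv", "data/UNSW_train.csv") and again for the testing set, a helper like this would handle both files without editing main in between.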

With PCA, we decided to retain 90% of the variance in the data set, which reduced the number of features from 44 to 16 and massively reduces the complexity of the data set. It will be interesting to see what the performance hit for providing less information is.
We used an OrdinalEncoder to transform the 3 categorical features from strings to numbers
We also standardized each column in the data set such that the mean was 0 with a variance of 1. Just seemed like the right thing to do.
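
The three preprocessing steps above (ordinal encoding, standardization, PCA at 90% variance) can be sketched with scikit-learn as follows. This is a rough sketch under assumptions, not the code in preprocess_data.py: the function name preprocess and its arguments are hypothetical, the inputs are assumed to be feature-only DataFrames with the label columns already removed, and every transformer is assumed to be fit on the training set only and then applied to the test set.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import OrdinalEncoder, StandardScaler


def preprocess(train_df, test_df, categorical_cols, variance=0.90):
    """Encode, standardize, and PCA-reduce feature frames (hypothetical helper)."""
    numeric_cols = [c for c in train_df.columns if c not in categorical_cols]

    # Map each category string to an integer code; categories that only
    # appear in the test set are encoded as -1 (an assumption of this sketch).
    encoder = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
    cat_train = encoder.fit_transform(train_df[categorical_cols])
    cat_test = encoder.transform(test_df[categorical_cols])

    X_train = np.hstack([train_df[numeric_cols].to_numpy(dtype=float), cat_train])
    X_test = np.hstack([test_df[numeric_cols].to_numpy(dtype=float), cat_test])

    # Standardize every column to mean 0 and variance 1 using training statistics.
    scaler = StandardScaler()
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)

    # A float n_components keeps the smallest number of components whose
    # cumulative explained variance reaches that fraction.
    pca = PCA(n_components=variance, svd_solver="full")
    X_train_pca = pca.fit_transform(X_train)
    X_test_pca = pca.transform(X_test)

    return X_train, X_test, X_train_pca, X_test_pca, pca
```

For the UNSW-NB15 files, categorical_cols would presumably be the three string-valued columns (likely proto, service, and state; check the CSV header). The returned pca object exposes explained_variance_ratio_, which is one way to confirm how many components the 90% threshold actually kept.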
