Clean repository for Oscar Cleve and Sara Gustafsson's bachelor thesis project at KTH.
The raw data is found in the data directory.
Generate all features by executing the generate_features.py file. Make sure the execution saves your extracted features, that saving is not not commented in the code, since this can take quite some time.
This repository contains two separate methods for feature selection after the initial pool of features has been generated.
-
The FRESH algorithm implemented in TSFRESH combined with a correlation filter. This is in the fresh_selector.py file.
-
Random forest selector which can be found and run using the RandomForest_Selector.py file.
Both of these files also contains classification using a decision tree classifier and will present results after execution using the selected features.
The following files are only used imported in other programs: correlation.py load_data.py storage.py
classification_only.py is used to extract only selected features and make a classifcation using a pre-trained classifier.
correlation_group.py can be used to evaluate correlation in a dataset, using the correlation.py functions.
feature_compariosn.py is used to compare two lists of selected features.
find_hyper_parameters.py is used to search for best hyperparameters to use in the Random Forest feature selector method.