AC-DL-LAB-SS-2022-Team03

General Information

This project tackles the human activity recognition problem posed by the HAPT (Human Activities and Postural Transitions) dataset. Documentation and a detailed description of the dataset, its feature set, and its target set are available at the following link

  • the dataset is NOT included in the project on GitHub; link it to your Google Colab instance and add its path in dataset.py
  • trained models are NOT included in the project on GitHub, since they are presented in the paper, presentation, and poster
  • training metrics are NOT included in the project on GitHub, since they serve as presentation material for the publications mentioned above

Running the project

Main project entry point: project-master.ipynb

Prerequisites:

  • the project is designed to run in a Jupyter instance, preferably Google Colab
  • prerequisites are handled in the Setup cell tree of the ipynb file
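
As a rough sketch of what such a Setup cell can do (mount Google Drive, then switch the working directory to the project folder), assuming a hypothetical project path and helper names that are not taken from the notebook:

```python
import os
from pathlib import Path

# Hypothetical location of the project on Google Drive; adjust to your own layout.
PROJECT_DIR = "/content/drive/MyDrive/AC-DL-LAB-SS-2022-Team03"


def mount_drive_if_on_colab(mount_point="/content/drive"):
    """Mount Google Drive when running on Colab; do nothing elsewhere."""
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        return False
    drive.mount(mount_point)
    return True


def enter_project_dir(project_dir):
    """Make project_dir the current working directory, failing early if it is missing."""
    path = Path(project_dir)
    if not path.is_dir():
        raise FileNotFoundError(f"Project directory not found: {path}")
    os.chdir(path)
    return Path.cwd()
```

In a notebook cell, calling `mount_drive_if_on_colab()` followed by `enter_project_dir(PROJECT_DIR)` would reproduce the two steps; the actual Setup cell in project-master.ipynb may be organized differently.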

Running parts of the project:

open the ipynb file and run the Setup cell
  • Google Drive is mounted and used to link the dataset to the project
  • required Python 3 packages (beyond those preinstalled on Colab) are installed
  • cwd is set to the project's path on Google Drive
the validation cells for source code cell tree contains cells that check
  • dataset loading
  • configuration variables such as device (CPU|GPU)
the train-val-test sequence is the main entry point for creating models
  • open main.py

  • in function main(), select the model you would like to train/evaluate

    • uncomment its model declaration
    • for RNNs, do NOT forget the sequence_length variable; uncomment it as well if needed
    • for linear networks, sequence_length has to be None
  • execute cells from the cell tree, for the following purposes:

    • train the currently active (uncommented in main.py) model
    • visualize the training and validation losses throughout training, in order to check that overfitting is being kept under control
    • test the currently active model
    • explain the current model using SHAP: average feature importance, plus a force plot explaining one sample (selected by index) from the test dataset
the (Empirical) Evaluation - model comparison cell tree is for comparing and evaluating models
  • run the first cell to plot training metrics (loss|acc, selected via the train_cmp parameter); pass a number as the sg_w parameter to smooth the curves (1001 is a good smoothing value that makes the differences between metrics clearly visible)
  • the second cell compares the evaluation of every existing trained model by printing
    • (optional, param show_arch) the architecture of the network, together with tensor sizes and network sizes
    • (optional, param conf_cut) the size of the test-dataset subset shown in the confusion matrix, plus the samples per category in that subset (500 is a good value for visual evaluation)
    • evaluation metrics such as accuracy and F1
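
To make the comparison metrics concrete, here is a minimal pure-Python sketch of a confusion matrix, accuracy, and F1. The function names are hypothetical, the F1 is assumed to be macro-averaged (the README does not say which averaging the project uses), and taking the first `conf_cut` samples is only one possible way to pick the subset:

```python
def confusion_matrix(y_true, y_pred, labels, conf_cut=None):
    """Count (true, predicted) pairs; rows are true classes, columns are predictions.

    conf_cut optionally limits the count to the first conf_cut samples
    (an assumption about how the subset is chosen).
    """
    if conf_cut is not None:
        y_true, y_pred = y_true[:conf_cut], y_pred[:conf_cut]
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def macro_f1(matrix):
    """Unweighted mean of per-class F1 scores, F1 = 2TP / (2TP + FP + FN)."""
    n = len(matrix)
    scores = []
    for i in range(n):
        tp = matrix[i][i]
        fp = sum(matrix[r][i] for r in range(n)) - tp
        fn = sum(matrix[i]) - tp
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n if n else 0.0


# Toy example with three activity labels (made-up predictions, not project results).
labels = ["sitting", "standing", "walking"]
y_true = ["sitting", "sitting", "walking", "walking", "standing"]
y_pred = ["sitting", "walking", "walking", "walking", "standing"]
cm = confusion_matrix(y_true, y_pred, labels)
print(cm, accuracy(cm), macro_f1(cm))
```

Reading the matrix alongside the scalar metrics shows which activities are confused with each other, which is the stated reason for plotting it on a subset of the test data.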

TODO List

Done:

  • train-validation-test sequence
  • SHAP (SHapley Additive exPlanations) analysis of trained models
  • empirical evaluation
    • visualization of training metrics as comparison plots
    • model evaluation comparison. For each model:
      • Model architecture
      • Metrics: training accuracy, F1 score
      • Visualize: confusion matrix on a subset of the test dataset (understand metrics)
  • add EarlyStopping to training, and save the loss (train/val datasets) and validation accuracy per epoch
  • enhance the RNN architecture to better capture the task
  • work on GRU/LSTM comparison in particular
  • use SHAP to explain different scenarios (idea: explain how e.g. sitting is affected differently than walking upstairs)
  • use a cluster of multiple network sizes/architectures to visualize the problem's complexity
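
The early stopping with per-epoch metric logging listed above can be sketched as follows. This is a generic patience-based variant under assumed names (`EarlyStopping.step`, `patience`, `min_delta`), not the project's actual class:

```python
class EarlyStopping:
    """Stop training once validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience      # epochs to wait after the last improvement
        self.min_delta = min_delta    # minimum decrease that counts as improvement
        self.best_loss = float("inf")
        self.bad_epochs = 0
        self.history = []             # one record per epoch, for later plotting

    def step(self, train_loss, val_loss, val_acc):
        """Record this epoch's metrics and return True if training should stop."""
        self.history.append(
            {"train_loss": train_loss, "val_loss": val_loss, "val_acc": val_acc}
        )
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up metrics for illustration only.
stopper = EarlyStopping(patience=2)
epochs = [(0.9, 1.0, 0.60), (0.7, 0.8, 0.70), (0.6, 0.9, 0.69), (0.5, 0.85, 0.70)]
for epoch, (train_loss, val_loss, val_acc) in enumerate(epochs):
    if stopper.step(train_loss, val_loss, val_acc):
        print(f"stopping after epoch {epoch}")
        break
```

Keeping the history inside the stopper means the same records can feed the training/validation loss plots used to judge overfitting.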

TODO:

  • use further network types (transformers, ensemble methods, etc.) to demonstrate different theoretical aspects