Run ./run.sh from the command line. It runs the following scripts:
- Divide.py randomly selects files from the folder 'train' and puts them into the folders 'new_train' and 'new_test'. It also divides 'train_label' into the corresponding 'new_train_label' and 'new_test_label'.
- Preprocess.py cleans the training and testing data
- Predict.py makes predictions for 'test' with the model generated from 'train'
- Evaluate.py scores submission.csv against 'new_test_label' (the ground truth)
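
The random split done by Divide.py can be sketched roughly as below. This is only an illustration, not the actual contents of Divide.py: the function name split_files, the 20% test fraction, the fixed seed, and copying (rather than moving) the files are all assumptions. The label split is left out because it depends on how 'train_label' is stored; the returned file names are meant to make a matching label split possible.

```python
import random
import shutil
from pathlib import Path


def split_files(src_dir, train_dir, test_dir, test_fraction=0.2, seed=0):
    """Randomly copy files from src_dir into train_dir and test_dir.

    Hypothetical helper; test_fraction and seed are assumed values.
    Returns (train_names, test_names) so labels can be split the same way.
    """
    src = Path(src_dir)
    # Sort first so the shuffle is reproducible regardless of directory order.
    names = sorted(p.name for p in src.iterdir() if p.is_file())
    rng = random.Random(seed)
    rng.shuffle(names)

    n_test = int(len(names) * test_fraction)
    test_names, train_names = names[:n_test], names[n_test:]

    for dest, chosen in ((train_dir, train_names), (test_dir, test_names)):
        dest_path = Path(dest)
        dest_path.mkdir(parents=True, exist_ok=True)
        for name in chosen:
            # Copy so the original 'train' folder stays untouched.
            shutil.copy2(src / name, dest_path / name)
    return train_names, test_names


# Example call (folder names taken from the list above):
# train_names, test_names = split_files("train", "new_train", "new_test")
```

Keeping the seed fixed means the same files land in 'new_test' on every run, so the scores from Evaluate.py stay comparable between runs.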
Remember not to run 'git add -A', so you don't end up committing all the data.