In this project, we will carry out model evaluation and selection for predictive analytics on image data.
Term: Spring 2018
- Team 9
- Team members
- Fan Yang
- Jingyi Wang
- Xueyao Li
- Yiran Jiang
- Project summary: In this project, we built a classification engine for images of dogs, fried chicken, and blueberry muffins. The baseline model was a GBM with decision stumps trained on SIFT features. For feature extraction, we tried RGB and GIST features in addition to SIFT. As advanced models, we considered SVM (linear and RBF kernels), XGBoost, AdaBoost, and a Convolutional Neural Network (CNN).
After model evaluation and comparison, XGBoost achieved the best performance. We then tuned the RGB hyperparameters and considered combinations of features. Comparing the different features on XGBoost, we chose RGB. The final model reduced the test error to 9.77% with a running time of 48.467 s.
Comparison of baseline and advanced models:
Comparison of different features on XGBoost:
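As a rough illustration of the RGB features and the test error mentioned in the summary, here is a minimal Python sketch. It assumes that an RGB feature is a normalized joint color histogram and that the tunable hyperparameters are the number of bins per channel. The summary does not specify either point, so `rgb_histogram`, `error_rate`, and the `bins` parameter are hypothetical names chosen for this example, not the project's actual code.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255


def rgb_histogram(pixels: Sequence[Pixel],
                  bins: Tuple[int, int, int] = (8, 8, 8)) -> List[float]:
    """Return a normalized joint RGB histogram as a flat feature vector.

    `bins` (bins per channel) is an assumed hyperparameter; the project
    summary does not state which RGB settings were tuned.
    """
    n_r, n_g, n_b = bins
    counts = [0] * (n_r * n_g * n_b)
    for r, g, b in pixels:
        # Map each 0..255 channel value to its bin index.
        i = r * n_r // 256
        j = g * n_g // 256
        k = b * n_b // 256
        counts[(i * n_g + j) * n_b + k] += 1
    total = len(pixels)
    if total == 0:
        return [0.0] * len(counts)
    # Normalize so images of different sizes are comparable.
    return [c / total for c in counts]


def error_rate(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Fraction of predictions that differ from the true labels."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("label sequences must be non-empty and equal length")
    wrong = sum(p != a for p, a in zip(predicted, actual))
    return wrong / len(actual)
```

With this representation, each image becomes a fixed-length vector (512 values for the default 8 bins per channel) that a boosted-tree classifier can consume directly, and the bin counts trade off color resolution against feature dimension.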
- Contribution statement: (Team 9 contribution statement) All team members contributed equally to all stages of this project. All team members approve the work presented in this GitHub repository, including this contribution statement.
Following suggestions by RICH FITZJOHN (@richfitz), this folder is organized as follows.
proj/
├── lib/
├── data/
├── doc/
├── figs/
└── output/
Please see each subfolder for a README file.