An analysis that predicts whether a car bought at auction has issues, by fitting multiple machine learning models in R.
The purpose of this project is to learn the philosophy behind car trading. It is an exercise in applying machine learning methods.
The original data comes from the Kaggle competition Don't Get Kicked!, and is the same as the csv files in the orinigal_data folder. The Kaggle competition has already closed its evaluation access, so there is no specific rank for this project.
Entity:
- Environment: R (R Markdown), Tableau
- Data source: Kaggle Don't Get Kicked!
- Models: classification tree model, Random Forest, XGBoost, support vector machine, neural network
- Libraries: caret, randomForest, xgboost, nnet, pROC, etc.
- Other: PCA knowledge is also needed.
- Import all .rmd files and install libraries as needed.
The files are currently set to knit to a Word document.
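As a minimal sketch of the setup step, the snippet below checks which of the listed libraries are missing before knitting. The helper `missing_packages` and the file name `analysis.Rmd` are hypothetical and not taken from the project. The install and render calls are left commented out so that nothing is changed by accident.

```r
# Libraries named in this README; extend the vector as needed.
required <- c("caret", "randomForest", "xgboost", "nnet", "pROC")

# Hypothetical helper: returns the packages that cannot be loaded.
missing_packages <- function(pkgs) {
  pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
}

to_install <- missing_packages(required)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install from CRAN
}

# Knit one file to Word from the console ("analysis.Rmd" is a placeholder name):
# rmarkdown::render("analysis.Rmd", output_format = "word_document")
```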
In this project, the data is first cleaned using preprocessing and feature selection methods, then analyzed with multiple machine learning models, and finally the models are compared using their ROC curves.
- Tree model
Five variables are chosen for the classification tree model according to the Gini index. After an additional cross-validation check, the model is further improved by adjusting the model complexity to prune the tree.
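The pruning step can be illustrated roughly as follows. This is a sketch under assumptions, not the project's actual code: it uses the `rpart` package (not named in the library list above) and the small `kyphosis` data set bundled with it as a stand-in for the auction data. The tree is grown with Gini splits, and the complexity parameter with the lowest cross-validated error is used for pruning.

```r
library(rpart)  # recommended package shipped with standard R installations

set.seed(42)  # cross-validation folds are random

# Grow a deliberately large tree using the Gini index as split criterion.
# kyphosis is example data bundled with rpart, used here only for illustration.
fit <- rpart(Kyphosis ~ Age + Number + Start,
             data = kyphosis,
             method = "class",
             parms = list(split = "gini"),
             control = rpart.control(cp = 0.001, xval = 10))

# cptable lists the cross-validated error (xerror) for each complexity value.
cp_table <- fit$cptable
best_cp <- cp_table[which.min(cp_table[, "xerror"]), "CP"]

# Prune back to the subtree with the lowest cross-validated error.
pruned <- prune(fit, cp = best_cp)

# Variable importance can guide which predictors to keep.
print(fit$variable.importance)
```

Picking the minimum `xerror` is one common rule. The one-standard-error rule is an alternative that tends to give smaller trees.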
In terms of AUC and recall rate, the best current model is nnet, which gives a 59.48% recall rate. This is based on the cutoff value that maximizes the J statistic (Sensitivity + Specificity - 1).
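To make the cutoff rule concrete, here is a small base R sketch that scans candidate cutoffs and keeps the one with the largest J statistic. The function name `youden_cutoff` and the toy labels and scores are made up for illustration and are not project results. In practice, `pROC::coords()` with `best.method = "youden"` serves the same purpose on a fitted ROC object.

```r
# Hypothetical helper: find the cutoff that maximizes
# J = sensitivity + specificity - 1.
# labels: 0/1 vector (1 = positive class); scores: predicted probabilities.
youden_cutoff <- function(labels, scores) {
  cutoffs <- sort(unique(scores))
  j <- vapply(cutoffs, function(th) {
    pred <- scores >= th                                # predict positive at or above cutoff
    sens <- sum(pred & labels == 1) / sum(labels == 1)  # recall / true positive rate
    spec <- sum(!pred & labels == 0) / sum(labels == 0) # true negative rate
    sens + spec - 1
  }, numeric(1))
  best <- which.max(j)
  list(cutoff = cutoffs[best], j = j[best])
}

# Toy data, invented for this example only.
labels <- c(0, 0, 0, 0, 1, 0, 1, 1)
scores <- c(0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.90)

res <- youden_cutoff(labels, scores)
print(res)
```

With these toy values the chosen cutoff is 0.40, where sensitivity is 1 and specificity is 0.8.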
This is a group project for a course. My teammates Wendy, Nina, Andi and I worked together to complete it.