Skip to content

Latest commit

 

History

History
28 lines (16 loc) · 1.55 KB

README.md

File metadata and controls

28 lines (16 loc) · 1.55 KB

Predict-Default-Loans

The objective of this project is to predict the loans that will be charged-off/default. The dataset is taken from Lending Club with 52 descriptive features with loans over a period of 5 years from 2007-2011.

Dataset and Modelling

The dataset is imbalanced with fully paid(positive class) to charged off(negative class) ratio of 85:15. Three techniques are implemented to balance the data: Under-sampling, over-sampling and using weighted model.

Three algorithms are used to train the data: Random Forest, XGBoost and Neural network using Pytorch and CUDA. The XGBooost and Neural network are trained using GPU. The models are evaluated using AUC, F1 score and confusion matrix.

Results

The best model for each of the technique are:

Technique Algorithm AUC F1 score Confusion Matrix
Undersampling XGBoost 0.98 Charged Off: 0.98
Fully paid: 0.98
TP:2156
FP:14
TN:365
FN:14
Oversampling Neural Network 0.99 Charged Off: 0.97
Fully paid: 1.00
TP:2158
FP:12
TN:371
FN:8
Weighted Model XGBoost 0.98 Charged Off: 0.99
Fully paid: 0.99
TP:120
FP:1
TN:130
FN:5

Libraries installed:

pandas, numpy, matplotlib, seaborn, chart_studio, sklearn, xgboost, torch, torchvision.

Using conda install orca to render static plots from plotly. Command to install:

$ conda install -c plotly plotly-orca