- First we observe that the feature `Credit_Product` has missing values. We impute them by marking each `NaN` value as `'Unknown'`.
- Training on the dataset shows that `Credit_Product` has the highest feature importance. So we split the `'Unknown'` value into `'U1'` and `'U0'` according to the target variable `Is_Lead` (`'U1'` where `Is_Lead` is 1, `'U0'` where it is 0).
- Next, we want to predict the correct `Credit_Product` value from the rest of the dataset. So we train a `RandomForestClassifier` to classify `Credit_Product`. After training, we add its predicted probabilities for each `Credit_Product` class as features to both the train data and the test data.
- We then train a `CatBoostClassifier` on the data for the target variable `Is_Lead`.
- After that, we evaluate its ROC-AUC score.
- Finally, we predict the target variable `Is_Lead` for the test data and save the predictions to `Predictions.csv`.
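The first two steps (imputation and the target-based split of `'Unknown'`) can be sketched in pandas as follows. This is a minimal sketch, assuming the training data is a DataFrame with a `Credit_Product` column and a 0/1 `Is_Lead` column, and reading "as per our target variable" as: unknown rows with `Is_Lead` = 1 become `'U1'`, the rest `'U0'`. The function names are hypothetical. Because the split needs `Is_Lead`, it can only be applied to the training data.

```python
import numpy as np
import pandas as pd


def impute_credit_product(df: pd.DataFrame) -> pd.DataFrame:
    """Step 1: replace every missing Credit_Product value with 'Unknown'."""
    out = df.copy()
    out["Credit_Product"] = out["Credit_Product"].fillna("Unknown")
    return out


def split_unknown_by_target(df: pd.DataFrame) -> pd.DataFrame:
    """Step 2 (train data only, since it needs Is_Lead):
    'Unknown' becomes 'U1' where Is_Lead == 1 and 'U0' where Is_Lead == 0."""
    out = df.copy()
    unknown = out["Credit_Product"] == "Unknown"
    # map() returns a Series aligned on the index of the selected rows.
    out.loc[unknown, "Credit_Product"] = out.loc[unknown, "Is_Lead"].map({1: "U1", 0: "U0"})
    return out


if __name__ == "__main__":
    train = pd.DataFrame(
        {"Credit_Product": ["Yes", None, "No", np.nan], "Is_Lead": [0, 1, 0, 0]}
    )
    train = split_unknown_by_target(impute_credit_product(train))
    print(train["Credit_Product"].tolist())  # ['Yes', 'U1', 'No', 'U0']
```

Working on copies keeps the raw frame intact, so the same imputation can later be reused on the test data without the target-based split.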
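The `RandomForestClassifier` step might look like the sketch below. It assumes the predictors in `feature_cols` are already numeric (scikit-learn's random forest does not accept raw string categories) and that the new columns are named `Credit_Product_proba_<class>`; the function name, column names and feature names are hypothetical. The test set's `Credit_Product` cannot be split into `'U0'`/`'U1'` because `Is_Lead` is unknown there, which is presumably why the classifier's probabilities are added to both train and test instead.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier


def add_credit_product_probas(train, test, feature_cols, random_state=0):
    """Fit a random forest that predicts Credit_Product from feature_cols and
    append one probability column per Credit_Product class to train and test."""
    rf = RandomForestClassifier(n_estimators=200, random_state=random_state)
    rf.fit(train[feature_cols], train["Credit_Product"])

    # rf.classes_ lists the labels in sorted order, which is also the
    # column order of predict_proba (e.g. 'No', 'U0', 'U1', 'Yes').
    proba_cols = [f"Credit_Product_proba_{c}" for c in rf.classes_]

    def with_probas(df):
        proba = rf.predict_proba(df[feature_cols])
        return df.join(pd.DataFrame(proba, columns=proba_cols, index=df.index))

    return with_probas(train), with_probas(test), rf


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical numeric features; real categorical columns need encoding first.
    train = pd.DataFrame({
        "feature_a": rng.normal(size=40),
        "feature_b": rng.normal(size=40),
        "Credit_Product": ["Yes", "No", "U0", "U1"] * 10,
    })
    test = pd.DataFrame({"feature_a": rng.normal(size=5), "feature_b": rng.normal(size=5)})
    train_f, test_f, _ = add_credit_product_probas(train, test, ["feature_a", "feature_b"])
    print(test_f.round(2))
```

As described, the forest scores the same train rows it was fitted on, which tends to give over-confident train probabilities. Out-of-fold probabilities from `sklearn.model_selection.cross_val_predict(..., method="predict_proba")` are one alternative, though that departs from the approach described above.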
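For the last steps, the sketch below scores a fitted binary classifier with ROC-AUC on a held-out validation split and writes the test predictions to `Predictions.csv`. `CatBoostClassifier` exposes `predict_proba`, so a fitted one can be passed as `model`; the demo uses scikit-learn's `LogisticRegression` only as a stand-in so the sketch runs without CatBoost installed. The `ID` column, the validation split, and writing probabilities (rather than 0/1 labels) for `Is_Lead` are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def evaluate_and_save(model, X_val, y_val, X_test, test_ids, path="Predictions.csv"):
    """Report validation ROC-AUC for a fitted binary classifier, then write
    P(Is_Lead = 1) for each test row to a CSV file."""
    # With 0/1 labels, classes_ is [0, 1], so column 1 is the positive class.
    val_auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])
    submission = pd.DataFrame({
        "ID": list(test_ids),  # hypothetical ID column name
        "Is_Lead": model.predict_proba(X_test)[:, 1],
    })
    submission.to_csv(path, index=False)
    return val_auc, submission


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = pd.DataFrame({"feature_a": rng.normal(size=200)})
    y = (X["feature_a"] + rng.normal(scale=0.5, size=200) > 0).astype(int)
    X_tr, X_val, y_tr, y_val = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=0
    )
    # Stand-in for a fitted CatBoostClassifier, which also has predict_proba.
    model = LogisticRegression().fit(X_tr, y_tr)
    X_test = pd.DataFrame({"feature_a": rng.normal(size=5)})
    auc, _ = evaluate_and_save(model, X_val, y_val, X_test, [f"id_{i}" for i in range(5)])
    print(f"validation ROC-AUC: {auc:.3f}")
```

ROC-AUC depends only on how the scores rank the rows, so writing probabilities keeps that ranking; calling `model.predict(X_test)` instead would write hard 0/1 labels.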
- The notebook file is `Main.ipynb`.
- The prediction file is `Predictions.csv`.
- The EDA is performed in `EDA.ipynb`.