AV Job-A-thon 2021

Approach:

First, we observe that the feature Credit_Product has missing values. We impute them by replacing every NaN with the category 'Unknown'.
After an initial round of training, Credit_Product turns out to have the highest feature importance. We therefore split the "Unknown" value into "U1" and "U0" according to the target variable Is_Lead.
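The imputation and the split of "Unknown" can be sketched as follows. This is a minimal illustration on a toy frame, not the code from Main.ipynb; the mapping Is_Lead = 1 to "U1" and Is_Lead = 0 to "U0" is an assumption, since the text only says the split follows the target.

```python
import numpy as np
import pandas as pd

# Toy stand-in for train.csv; only the two columns needed for this step.
train = pd.DataFrame({
    "Credit_Product": ["Yes", np.nan, "No", np.nan, "Yes"],
    "Is_Lead":        [1,     1,      0,    0,      0],
})

# Step 1: mark every missing Credit_Product as its own category.
train["Credit_Product"] = train["Credit_Product"].fillna("Unknown")

# Step 2: split "Unknown" by the target.
# Assumed mapping: Is_Lead == 1 -> "U1", Is_Lead == 0 -> "U0".
unknown = train["Credit_Product"] == "Unknown"
train.loc[unknown & (train["Is_Lead"] == 1), "Credit_Product"] = "U1"
train.loc[unknown & (train["Is_Lead"] == 0), "Credit_Product"] = "U0"

print(train["Credit_Product"].tolist())
# -> ['Yes', 'U1', 'No', 'U0', 'Yes']
```

Because the split depends on Is_Lead, it can only be applied directly to rows where the target is known, that is, to the training data.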
Next, we want to predict the correct Credit_Product value from the rest of the dataset, so we train a RandomForestClassifier to classify Credit_Product. After training, we add the predicted class probabilities for Credit_Product as new features to both the train and test data.
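A rough sketch of this step on synthetic data is shown here. The predictor columns `Age` and `Vintage`, the `CP_proba_` column prefix, and the hyperparameters are placeholders chosen for illustration; the notebook may use different features and settings.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

def make_frame(n, with_label):
    # Synthetic stand-in for the real data; column names are placeholders.
    df = pd.DataFrame({
        "Age": rng.integers(20, 80, size=n),
        "Vintage": rng.integers(1, 120, size=n),
    })
    if with_label:
        df["Credit_Product"] = rng.choice(["Yes", "No", "U1", "U0"], size=n)
    return df

train = make_frame(200, with_label=True)
test = make_frame(50, with_label=False)
predictors = ["Age", "Vintage"]

# Auxiliary model: predict Credit_Product from the remaining features.
rf = RandomForestClassifier(n_estimators=50, random_state=0)
rf.fit(train[predictors], train["Credit_Product"])

def add_probabilities(df):
    # Append one probability column per Credit_Product class.
    proba = rf.predict_proba(df[predictors])
    for i, cls in enumerate(rf.classes_):
        df[f"CP_proba_{cls}"] = proba[:, i]
    return df

train = add_probabilities(train)
test = add_probabilities(test)
proba_cols = [f"CP_proba_{cls}" for cls in rf.classes_]
print(test[proba_cols].head())
```

Note that the probabilities added to the train data here are in-sample, as described above. Out-of-fold predictions (for example via scikit-learn's `cross_val_predict` with `method="predict_proba"`) would be one way to make the train features less optimistic; whether that helps on this dataset is not something this sketch establishes.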
We then train a CatBoostClassifier on the data with Is_Lead as the target variable.
After that, we evaluate its ROC-AUC score.
Finally, we predict the target variable Is_Lead for the test data and save the predictions to Predictions.csv.
