# Customer Churn Prediction

## Overview
This project implements a machine learning solution to predict customer churn using three different algorithms: Random Forest, XGBoost, and Logistic Regression. The model helps identify customers who are likely to discontinue services, enabling proactive retention strategies.

## Table of Contents
- [Features](#features)
- [Requirements](#requirements)
- [Project Structure](#project-structure)
- [Installation](#installation)
- [Model Comparison](#model-comparison)

## Features
- Data preprocessing and feature engineering
- Implementation of three machine learning algorithms:
  - Random Forest Classifier
  - XGBoost Classifier
  - Logistic Regression
- Model performance comparison and evaluation
- Feature importance analysis
- Cross-validation for robust model validation
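
A minimal, illustrative sketch of this workflow (loading the data, preprocessing, 5-fold cross-validation for the three models, and feature importance) is shown below. The target column `Exited`, the dropped identifier columns, and the hyperparameters are assumptions based on the commonly distributed `Churn_Modelling.csv` schema; the authoritative pipeline is `notebooks/Model.ipynb`.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.preprocessing import StandardScaler
from xgboost import XGBClassifier

df = pd.read_csv("data/Churn_Modelling.csv")

# Drop identifier columns and one-hot encode categoricals (assumed schema).
X = pd.get_dummies(
    df.drop(columns=["RowNumber", "CustomerId", "Surname", "Exited"]),
    drop_first=True,
)
y = df["Exited"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scale features (mainly benefits Logistic Regression; tree models are scale-invariant).
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)

models = {
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
    "XGBoost": XGBClassifier(eval_metric="logloss", random_state=42),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

# 5-fold cross-validation on the training split for each model.
for name, model in models.items():
    scores = cross_val_score(model, X_train_scaled, y_train, cv=5, scoring="accuracy")
    print(f"{name}: {scores.mean():.3f} ± {scores.std():.3f}")

# Feature importance from the Random Forest fit on the full training split.
rf = models["Random Forest"].fit(X_train_scaled, y_train)
importances = pd.Series(rf.feature_importances_, index=X_train.columns)
print(importances.sort_values(ascending=False).head(10))
```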

## Requirements
```
python>=3.8
pandas
numpy
scikit-learn
xgboost
matplotlib
seaborn
```

## Project Structure
```
customer-churn-prediction/
├── data/
│   └── Churn_Modelling.csv
├── saved models/
│   ├── Gradient_Boosting_Classifier.joblib
│   └── scaler.joblib
├── notebooks/
│   └── Model.ipynb
└── README.md
```
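
The saved artifacts can be reused outside the notebook with joblib. The sketch below is hedged: it assumes `scaler.joblib` is a fitted scikit-learn scaler, that `Gradient_Boosting_Classifier.joblib` is the boosted model referenced above, and that input rows are already preprocessed into the training feature order; the helper name `predict_churn_probability` is illustrative.

```python
import joblib

# Paths follow the project structure above (note the space in "saved models/").
scaler = joblib.load("saved models/scaler.joblib")
model = joblib.load("saved models/Gradient_Boosting_Classifier.joblib")

def predict_churn_probability(features):
    """features: 2-D array-like in the same feature order used during training."""
    return model.predict_proba(scaler.transform(features))[:, 1]
```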

## Installation
1. Clone the repository:
```bash
git clone https://github.com/yourusername/customer-churn-prediction.git
cd customer-churn-prediction
```

2. Create a virtual environment and activate it:
```bash
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
```

3. Install required packages:
```bash
pip install -r requirements.txt
```

## Model Comparison

### Performance Metrics

| Model | Accuracy | Precision |
|--------------------|----------|-----------|
| Random Forest | 0.87 | 0.83 |
| XGBoost | 0.88 | 0.86 |
| Logistic Regression| 0.82 | 0.77 |
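
The numbers above come from the project's notebook. To reproduce them, a small helper along these lines could be applied to each of the three models, assuming a held-out test split like the one in the earlier training sketch (`X_train_scaled`, `X_test_scaled`, `y_train`, and `y_test` are assumed names):

```python
from sklearn.metrics import accuracy_score, precision_score

def evaluate(model, X_train, y_train, X_test, y_test):
    """Fit a model on the training split and report test-set accuracy and precision."""
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    return accuracy_score(y_test, preds), precision_score(y_test, preds)

# Example (names assumed from the earlier sketch):
# acc, prec = evaluate(models["XGBoost"], X_train_scaled, y_train, X_test_scaled, y_test)
```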

### Key Findings
- XGBoost performed best overall, with the highest accuracy and AUC-ROC scores
- Random Forest showed comparable performance with slightly lower metrics
- Logistic Regression provided a good baseline but was outperformed by both ensemble methods

Threshold vs. Recall and Threshold vs. Precision curves (XGBoost):
![Threshold vs. recall and threshold vs. precision curves for XGBoost](https://github.com/user-attachments/assets/42be4ba5-052d-4e7c-8c16-be57bc929d80)

ROC curve:
![ROC curve](https://github.com/user-attachments/assets/3a3cacb5-15e2-4876-bf49-94d3d3515866)
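
The curves above are produced in the notebook. A hedged, self-contained helper that would generate similar plots from true labels and predicted probabilities (the names in the usage comment are assumptions from the earlier sketch):

```python
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve, roc_auc_score, roc_curve

def plot_threshold_and_roc(y_true, proba):
    """Plot precision/recall vs. decision threshold and the ROC curve."""
    precision, recall, thresholds = precision_recall_curve(y_true, proba)
    fpr, tpr, _ = roc_curve(y_true, proba)

    fig, (ax_thr, ax_roc) = plt.subplots(1, 2, figsize=(12, 4))

    # precision/recall have one more entry than thresholds, so drop the last point.
    ax_thr.plot(thresholds, precision[:-1], label="Precision")
    ax_thr.plot(thresholds, recall[:-1], label="Recall")
    ax_thr.set_xlabel("Decision threshold")
    ax_thr.legend()

    ax_roc.plot(fpr, tpr, label=f"AUC = {roc_auc_score(y_true, proba):.2f}")
    ax_roc.plot([0, 1], [0, 1], linestyle="--")
    ax_roc.set_xlabel("False positive rate")
    ax_roc.set_ylabel("True positive rate")
    ax_roc.legend()

    plt.tight_layout()
    plt.show()

# Example (names assumed from the earlier sketch):
# xgb = models["XGBoost"].fit(X_train_scaled, y_train)
# plot_threshold_and_roc(y_test, xgb.predict_proba(scaler.transform(X_test))[:, 1])
```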

