theidari/customer_churn
1. Project Overview

Predicting customer churn is a vital economic concern for many companies[1]. Simply put, customer churn occurs when customers stop using a company's services, often turning to competitors. Acquiring a new customer can cost up to seven times more than retaining an existing one[2]. For companies that rely on recurring subscription fees, such as those in banking, telecom, or online services, retaining customers is therefore essential, and identifying which customers might leave has become a priority across many industries.

For KKBOX, a subscription-based music streaming platform, maintaining user loyalty is essential. Given that users can opt for manual or auto-renewal upon sign-up and can cancel their memberships at any time, this project aims to leverage four machine learning algorithms, supplemented by GCP AutoML, to accurately identify and predict potential customer churn for KKBOX.

2. Methods and Steps

The following steps outline the customer churn prediction process for this notebook:

Fig 1 - Workflow Diagram

  1. Dataset and Data Preprocessing
     The notebook gathers essential customer details such as age, purchase history, usage frequency, and feedback, then cleans the data by fixing errors, filling in missing values, and removing unusual data points.

  2. Data Profiling
     • Data Preprocessing: cleanse and preprocess the collected data to remove inconsistencies, missing values, and outliers; this step may involve data transformation, feature engineering, and scaling.
     • Feature Selection: identify the features most likely to influence churn; this reduces noise and improves the accuracy of the predictive models.

  3. Feature Engineering and Modeling
     • Model Selection: choose a predictive modeling technique suited to the data and the problem at hand; commonly used techniques include logistic regression, decision trees, random forests, support vector machines (SVM), and artificial neural networks.
     • Model Training: split the dataset into training and testing sets, fit the chosen model to the historical training data, and adjust its parameters to minimize prediction error.
     • Model Evaluation: evaluate the trained model on the testing set; common metrics include accuracy, precision, recall, F1 score, and area under the receiver operating characteristic curve (AUC-ROC).
     • Predictive Analysis: apply the trained model to new, unseen data to predict each customer's likelihood of churn, identifying those at high risk who require targeted retention efforts.

  4. Monitoring
     • Customer Retention Strategies: based on the churn predictions, design and implement personalized retention strategies for at-risk customers, such as special offers, discounts, personalized communication, loyalty programs, or improved customer service.
     • Monitor and Iterate: continuously monitor the performance of the churn prediction model and retention strategies; collect feedback, measure the effectiveness of the implemented measures, and refine both over time.
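As an illustrative sketch of the training and evaluation steps above, the four model families could be compared as follows. This is not the notebook's actual code: it assumes scikit-learn and uses synthetic stand-in data in place of the real KKBOX member, transaction, and user-log tables, so all names and parameters here are illustrative.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the KKBOX features; the real notebook would build
# these from member, transaction, and usage-log data instead.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=2000) > 1.2).astype(int)

# Split into training and testing sets, preserving the churn ratio
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

models = {
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Gradient-Boosted Trees": GradientBoostingClassifier(random_state=0),
    "Linear SVM": make_pipeline(StandardScaler(), LinearSVC()),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    # LinearSVC exposes decision_function rather than predict_proba,
    # but either kind of score works for ranking-based AUC
    if hasattr(model, "predict_proba"):
        scores = model.predict_proba(X_test)[:, 1]
    else:
        scores = model.decision_function(X_test)
    results[name] = (acc, roc_auc_score(y_test, scores))

for name, (acc, auc) in results.items():
    print(f"{name}: accuracy={acc:.3f}, AUC={auc:.3f}")
```

Reporting both accuracy and AUC for each model mirrors the evaluation step described above: accuracy summarizes overall hit rate, while AUC measures how well the model ranks churners above non-churners.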

3. Results

  • member model df

    Analyzing the model outcomes, the Decision Tree achieves a decent accuracy of 92.73%, but its AUC of 35.54% reveals a limitation in distinguishing between classes. Random Forest's accuracy is close at 92.68%, but its higher AUC of 82.06% shows better class differentiation. The Gradient-Boosted Trees model leads with 92.89% accuracy and an AUC of 85.11%. In contrast, the Linear SVM has an accuracy of 92.19% with a moderate AUC of 50%. Overall, Gradient-Boosted Trees is the standout performer, especially in AUC, followed closely by Random Forest; both significantly surpass the Decision Tree and Linear SVM in class differentiation. Validating these findings, Gradient-Boosted Trees maintains its strong performance on the test set, notably with an AUC of 85.11%, emphasizing its effectiveness in class separation.
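The gap between high accuracy and low AUC seen here typically signals class imbalance: when only a small fraction of members churn, even a model that predicts "no churn" for everyone scores high accuracy while discriminating nothing. A minimal sketch of this effect, using illustrative numbers rather than the actual KKBOX churn rate:

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

# Illustrative imbalanced labels: ~7% churners (not the actual KKBOX rate)
rng = np.random.default_rng(1)
y_true = (rng.random(10_000) < 0.07).astype(int)

# A do-nothing baseline: predict "no churn" for every member
y_pred = np.zeros_like(y_true)
scores = np.zeros(len(y_true), dtype=float)  # constant score for all members

acc = accuracy_score(y_true, y_pred)  # high (~0.93) despite learning nothing
auc = roc_auc_score(y_true, scores)   # exactly 0.5: no class separation
print(f"accuracy={acc:.3f}, AUC={auc:.3f}")
```

This is why AUC is the more informative metric in the table above: all four models post roughly 92% accuracy, but only their AUC scores reveal which ones actually separate churners from non-churners.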

4. Conclusions

5. Future Improvement and Discussion

References

[1] https://www.sciencedirect.com/science/article/pii/S0169023X2200091X
[2] https://www.forbes.com/sites/forbesbusinesscouncil/2022/12/12/customer-retention-versus-customer-acquisition/?sh=3ad964471c7d