theidari/customer_churn
1. Project Overview

Predicting customer churn is a vital economic concern for many companies[1]. Simply put, customer churn occurs when customers stop using a company's services, often turning to competitors. Acquiring a new customer can cost up to seven times more than retaining an existing one[2]. For companies that rely on recurring subscription fees, such as those in banking, telecom, or online services, retaining customers is therefore essential, and identifying which customers might leave has become a priority across many industries.

For KKBOX, a subscription-based music streaming platform, maintaining user loyalty is essential. Given that users can opt for manual or auto-renewal upon sign-up and can cancel their memberships at any time, this project aims to leverage four machine learning algorithms, supplemented by GCP AutoML, to accurately identify and predict potential customer churn for KKBOX.

2. Methods and Steps

The following steps outline the customer churn prediction process for this notebook:

Fig 1 - Workflow Diagram

  1. Dataset and Data Preprocessing
     The notebook gathers essential customer details such as age, purchase history, usage frequency, and feedback, then cleans the data by fixing errors, filling in missing values, and removing unusual data points.

  2. Data Profiling
     • Data Preprocessing: cleanse and preprocess the collected data to remove inconsistencies, missing values, and outliers; this step may involve data transformation, feature engineering, and scaling.
     • Feature Selection: identify the features most likely to influence churn; this reduces noise and improves the accuracy of the predictive models.

  3. Feature Engineering and Modeling
     • Model Selection: choose a predictive modeling technique suited to the data and the problem at hand; commonly used techniques include logistic regression, decision trees, random forests, support vector machines (SVM), and artificial neural networks.
     • Model Training: split the dataset into training and testing sets, fit the chosen model to the historical training data, and adjust its parameters to minimize prediction error.
     • Model Evaluation: evaluate the trained model on the testing set; common metrics include accuracy, precision, recall, F1 score, and area under the receiver operating characteristic curve (AUC-ROC).
     • Predictive Analysis: apply the trained model to new, unseen data to predict each customer's likelihood of churn, identifying those at high risk who require targeted retention efforts.

  4. Monitoring
     • Customer Retention Strategies: based on the churn predictions, design and implement personalized retention strategies for at-risk customers, such as special offers, discounts, personalized communication, loyalty programs, or improved customer service.
     • Monitor and Iterate: continuously monitor the performance of the churn prediction model and retention strategies; collect feedback, measure the effectiveness of the implemented measures, and refine both over time.
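As an illustrative sketch of the training and evaluation steps above, the four model families could be compared as follows. This is not the notebook's actual code: it assumes scikit-learn and uses synthetic stand-in data in place of the real KKBOX member, transaction, and user-log tables, so all names and parameters here are illustrative.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the KKBOX features; the real notebook would build
# these from member, transaction, and usage-log data instead.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=2000) > 1.2).astype(int)

# Split into training and testing sets, preserving the churn ratio
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

models = {
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Gradient-Boosted Trees": GradientBoostingClassifier(random_state=0),
    "Linear SVM": make_pipeline(StandardScaler(), LinearSVC()),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    # LinearSVC exposes decision_function rather than predict_proba,
    # but either kind of score works for ranking-based AUC
    if hasattr(model, "predict_proba"):
        scores = model.predict_proba(X_test)[:, 1]
    else:
        scores = model.decision_function(X_test)
    results[name] = (acc, roc_auc_score(y_test, scores))

for name, (acc, auc) in results.items():
    print(f"{name}: accuracy={acc:.3f}, AUC={auc:.3f}")
```

Reporting both accuracy and AUC for each model mirrors the evaluation step described above: accuracy summarizes overall hit rate, while AUC measures how well the model ranks churners above non-churners.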

3. Results

  • member model df

    Analyzing the model outcomes, the Decision Tree achieves a decent accuracy of 92.73%, but its AUC of 35.54% reveals a limitation in distinguishing between classes. Random Forest's accuracy is close at 92.68%, but its higher AUC of 82.06% shows better class differentiation. The Gradient-Boosted Trees model leads with 92.89% accuracy and an AUC of 85.11%. In contrast, the Linear SVM has an accuracy of 92.19% with a moderate AUC of 50%. Overall, Gradient-Boosted Trees is the standout performer, especially in AUC, followed closely by Random Forest; both significantly surpass the Decision Tree and Linear SVM in class differentiation. Validating these findings, Gradient-Boosted Trees maintains its strong performance on the test set, notably with an AUC of 85.11%, emphasizing its effectiveness in class separation.
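The gap between high accuracy and low AUC seen here typically signals class imbalance: when only a small fraction of members churn, even a model that predicts "no churn" for everyone scores high accuracy while discriminating nothing. A minimal sketch of this effect, using illustrative numbers rather than the actual KKBOX churn rate:

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

# Illustrative imbalanced labels: ~7% churners (not the actual KKBOX rate)
rng = np.random.default_rng(1)
y_true = (rng.random(10_000) < 0.07).astype(int)

# A do-nothing baseline: predict "no churn" for every member
y_pred = np.zeros_like(y_true)
scores = np.zeros(len(y_true), dtype=float)  # constant score for all members

acc = accuracy_score(y_true, y_pred)  # high (~0.93) despite learning nothing
auc = roc_auc_score(y_true, scores)   # exactly 0.5: no class separation
print(f"accuracy={acc:.3f}, AUC={auc:.3f}")
```

This is why AUC is the more informative metric in the table above: all four models post roughly 92% accuracy, but only their AUC scores reveal which ones actually separate churners from non-churners.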

4. Conclusions

5. Future Improvement and Discussion

References

[1] https://www.sciencedirect.com/science/article/pii/S0169023X2200091X
[2] https://www.forbes.com/sites/forbesbusinesscouncil/2022/12/12/customer-retention-versus-customer-acquisition/?sh=3ad964471c7d