This repo explores Kedro, Kedro-Viz and kedro-mlflow for ML pipelines. For this project I have chosen a simple use case: churn prediction. The project has four pipelines: data preprocessing, data science, reporting and API.
Customer churn is the percentage of customers your business lost in a set period of time. For instance, if you had 100 customers at the beginning of the month, and lost 10 of them throughout the month, you had a 10% churn rate for the month.
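The arithmetic above can be written as a small helper. This is only an illustrative sketch: the function name `churn_rate` and its argument names are my own and are not taken from the project code.

```python
def churn_rate(customers_at_start: int, customers_lost: int) -> float:
    """Return churn over a period as a percentage of the starting customer count."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    if not 0 <= customers_lost <= customers_at_start:
        raise ValueError("customers_lost must be between 0 and customers_at_start")
    return 100.0 * customers_lost / customers_at_start


# The example from the text: 100 customers at the start of the month, 10 lost.
print(churn_rate(100, 10))  # 10.0
```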
In this project, I have addressed the problem of customer churn. I aggregated data from a telecommunications company and performed exploratory data analysis (EDA) on it. After understanding the data, I applied several classification models to it: decision tree, random forest, XGBoost, logistic regression, LightGBM, SVM, AdaBoost, neural network, Naive Bayes, and random forest after PCA.
The results were:
After that, I built a Flask API that takes all the parameters and returns the prediction. I deployed the project on Heroku.
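To give a feel for the request/response flow, here is a minimal sketch of the kind of logic such an endpoint could wrap. It uses only the standard library rather than Flask itself. Everything in it is an assumption made for illustration: the field names (`tenure`, `monthly_charges`, `contract`), the `handle_request` function, and the made-up rule standing in for the trained model.

```python
import json

# Hypothetical input fields; the real API's parameter names may differ.
REQUIRED_FIELDS = ("tenure", "monthly_charges", "contract")


def predict_churn(features: dict) -> bool:
    """Stand-in for the trained model: a made-up rule so the sketch runs."""
    return features["contract"] == "month-to-month" and features["tenure"] < 12


def handle_request(body: str) -> tuple[str, int]:
    """Parse a JSON request body and return (JSON response, HTTP status code)."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return json.dumps({"error": "body is not valid JSON"}), 400
    if not isinstance(payload, dict):
        return json.dumps({"error": "body must be a JSON object"}), 400

    # Reject requests that do not carry every expected parameter.
    missing = [name for name in REQUIRED_FIELDS if name not in payload]
    if missing:
        return json.dumps({"error": "missing fields", "fields": missing}), 400
    if isinstance(payload["tenure"], bool) or not isinstance(payload["tenure"], (int, float)):
        return json.dumps({"error": "tenure must be a number"}), 400

    return json.dumps({"churn": predict_churn(payload)}), 200
```

In a Flask app, a function like this would sit behind a route that passes in the request body and returns the response string together with the status code.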
I have also made a Power BI report to better understand and visualize the data.
So we talked about what customer churn prediction is, and what it can do for us (among other things).
Let's get this thing running! Follow the next steps:
git clone https://github.com/abideenml/Kedro-ML-pipeline
- Navigate into the project directory
cd path_to_repo
- Create a new venv environment and install the dependencies:
pip install -r requirements.txt
- Run your Kedro project with:
kedro run
- You can run your tests with the command below. To configure the coverage threshold, edit the .coveragerc file.
kedro test
- To generate or update the dependency requirements for your project:
kedro build-reqs
- To launch Kedro-Viz from the command line as a Kedro plugin, use the following command from the root folder of your Kedro project:
kedro viz
A browser tab opens automatically to serve the visualisation at http://127.0.0.1:4141/.
That's it! It should work out of the box once the dependencies listed in requirements.txt are installed.
Finally, there are a couple more todos that I hope to add soon:
- Deploy the entire pipeline on AWS EC2.
- Capture drift and validate data.
- Perform extensive feature engineering and data exploration to fill in the missing parts of the data story.
I found these resources useful (while developing this one):
If you find this code useful, please cite the following:
@misc{Zain2023KedroMLPipeline,
author = {Zain, Abideen},
title = {kedro-ml-pipeline},
year = {2023},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/abideenml/Kedro-ML-pipeline/tree/master/Detecting-Telephone-based-Social-Engineering-Attacks}},
}
If you'd love to have some more AI-related content in your life, consider:
- Connecting with me on LinkedIn and Twitter
- Following me on Medium
- Subscribing to my weekly AI newsletter!