Data Science Projects

This repository contains a collection of data science projects and notebooks. Each notebook explores different data science techniques, analyses, or machine learning models applied to various datasets and problems. Below is a detailed description of each notebook.

Student Performance Indicator
Effect of Government Social Programs on Poverty in Kenya
Effect of Petroleum Prices Changes on the Demand for Petroleum in Kenya
Effect of Taxation on SME Performance
Fine-Tuning English-Swahili Translation Model
Lyrics Finder
English-Kiswahili Translation Notebook
PandemAI
Supervised Learning with SVM
Supervised Learning with Random Forests
Customer Churn Prediction
Causal Inference with Bayesian Networks

1. Student Performance Indicator

Notebook: EDA_STUDENT_PERFORMANCE_.ipynb

This notebook analyzes student performance through a comprehensive Exploratory Data Analysis (EDA). It follows the machine learning project lifecycle, from understanding the problem statement through data preprocessing and model training to choosing the best model.

Key Steps:

Understanding the Problem Statement: Defining the objectives of the analysis.
Data Collection: Gathering the relevant data on student performance.
Data Checks to Perform: Ensuring the data's integrity and suitability for analysis.
Exploratory Data Analysis: Analyzing and visualizing data to uncover patterns and insights.
Data Pre-Processing: Preparing data for modeling by handling missing values, encoding categorical variables, etc.
Model Training: Training various machine learning models.
Choose Best Model: Selecting the most effective model based on evaluation metrics.
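
The preprocessing, training, and model-selection steps above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the data is synthetic, the column names (`reading_score`, `writing_score`, `lunch`, `math_score`) are hypothetical, and the choice of a regression target with two candidate models is an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data; column names are hypothetical.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "reading_score": rng.normal(70, 10, n),
    "writing_score": rng.normal(68, 12, n),
    "lunch": rng.choice(["standard", "reduced"], n),
})
df["math_score"] = (
    0.5 * df["reading_score"] + 0.4 * df["writing_score"] + rng.normal(0, 3, n)
)
df.loc[::25, "reading_score"] = np.nan  # simulate missing values

X = df.drop(columns="math_score")
y = df["math_score"]

# Impute and scale numeric columns; one-hot encode categorical columns.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
preprocess = ColumnTransformer([
    ("num", numeric_steps, ["reading_score", "writing_score"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["lunch"]),
])

candidates = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Score each candidate with 5-fold cross-validated R^2 and keep the best.
scores = {}
for name, model in candidates.items():
    pipeline = Pipeline([("preprocess", preprocess), ("model", model)])
    scores[name] = cross_val_score(pipeline, X, y, cv=5, scoring="r2").mean()

best_model = max(scores, key=scores.get)
print(scores, "best:", best_model)
```

Wrapping the preprocessing and the model in one pipeline means the imputer and scaler are refit inside each cross-validation fold, which avoids leaking information from the validation split.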

2. Effect of Government Social Programs on Poverty in Kenya

Notebook: Effect_of_government_social_programs_on_poverty_in_Kenya.ipynb

This notebook performs descriptive analytics to examine the effect of government social programs on poverty in Kenya. The main focus is on understanding correlations and drawing insights from the data.

Key Steps:

Correlation Analysis: Identifying relationships between variables to understand how social programs may influence poverty.
Descriptive Analytics: Summarizing and visualizing the data to gain insights into the impact of social programs.
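
As a rough sketch of the two steps above, the snippet below summarizes a small table and computes a Pearson correlation with pandas. The values and the column names `program_spending` and `poverty_rate` are made up for illustration and do not come from the notebook's dataset.

```python
import pandas as pd

# Illustrative values only; not taken from the notebook's dataset.
df = pd.DataFrame({
    "program_spending": [10, 12, 15, 18, 22, 25],
    "poverty_rate": [46.0, 45.1, 43.8, 41.5, 39.9, 38.2],
})

summary = df.describe()                   # count, mean, std, quartiles
corr_matrix = df.corr(method="pearson")   # pairwise Pearson correlations
r = corr_matrix.loc["program_spending", "poverty_rate"]

print(summary)
print(f"Pearson r = {r:.3f}")
```

A strong correlation here would only indicate that the two series move together; on its own it does not show that the programs caused the change.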

3. Effect of Petroleum Prices Changes on the Demand for Petroleum in Kenya

Notebook: Effect_of_petroleum_prices_changes_on_the_demand_for_petroleum_in_Kenya.ipynb

This notebook explores the relationship between changes in petroleum prices and the demand for petroleum in Kenya. Through descriptive analytics, it aims to uncover correlations and patterns in the data.

Key Steps:

Correlation Analysis: Analyzing the relationship between petroleum prices and demand.
Descriptive Analytics: Utilizing visualizations and statistical summaries to understand market trends.
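
Because the question is about price changes, one possible approach is to correlate period-over-period percentage changes rather than raw levels. The sketch below assumes a hypothetical monthly table with `price_per_litre` and `litres_sold` columns; the numbers are invented.

```python
import pandas as pd

# Hypothetical monthly series; values are invented for illustration.
data = pd.DataFrame({
    "price_per_litre": [100.0, 105.0, 110.0, 108.0, 120.0, 125.0],
    "litres_sold": [500.0, 490.0, 475.0, 480.0, 450.0, 440.0],
})

# Percentage change from one month to the next; the first row has no
# predecessor, so it is dropped.
changes = data.pct_change().dropna()

# Correlation between price changes and demand changes.
change_corr = changes["price_per_litre"].corr(changes["litres_sold"])
print(changes)
print(f"Correlation of changes: {change_corr:.3f}")
```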

4. Effect of Taxation on SME Performance

Notebook: Effect_of_taxation_on_sme_performance.ipynb

This notebook investigates the impact of taxation on the performance of Small and Medium Enterprises (SMEs). It employs frequency analysis to explore common responses and patterns in the data.

Key Steps:

Frequency Analysis: Identifying the most common responses and trends related to taxation and SME performance.
Descriptive Analytics: Visualizing the data to gain insights into how taxation affects SMEs.
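
Frequency analysis of survey-style responses can be sketched with pandas as shown below. The column names `business_size` and `tax_effect` and the responses themselves are hypothetical placeholders, not the notebook's data.

```python
import pandas as pd

# Hypothetical survey responses.
responses = pd.DataFrame({
    "business_size": ["small", "small", "medium", "small", "medium", "small"],
    "tax_effect": ["negative", "negative", "neutral", "negative", "negative", "positive"],
})

# Absolute and relative frequency of each response.
counts = responses["tax_effect"].value_counts()
shares = responses["tax_effect"].value_counts(normalize=True)

# Cross-tabulation: how responses break down by business size.
table = pd.crosstab(responses["business_size"], responses["tax_effect"])

print(counts)
print(shares.round(2))
print(table)
```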

5. Fine-Tuning English-Swahili Translation Model

Notebook: FineTuningEngSwaModel.ipynb

This notebook demonstrates the process of fine-tuning a translation model for English to Swahili. It utilizes deep learning techniques and frameworks like TensorFlow and Keras for model training and evaluation.

Key Steps:

Import Libraries: Loading TensorFlow, Keras, Matplotlib, Seaborn, NumPy, and scikit-learn.
Load and Preprocess the Dataset: Working with the CIFAR-10 dataset, normalizing images, and converting labels for training.
Model Training: Fine-tuning the translation model using deep learning techniques.
Evaluation: Assessing the model's performance with appropriate metrics.

6. Lyrics Finder

Notebook: LyricsFinder.ipynb

This notebook provides a tool for finding song lyrics by scraping Genius.com. It covers the process of collecting URLs and fetching lyrics for a specified number of songs by an artist.

Key Steps:

Get URLs: Obtaining a list of Genius.com URLs for the desired number of songs by a specific artist.
Fetch Lyrics: Scraping the lyrics from the URLs using BeautifulSoup, including a fix for HTML parsing.
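
The lyric-extraction step might look roughly like the sketch below, which parses an inline HTML string instead of fetching a live page. The `data-lyrics-container` selector is an assumption about Genius.com's markup, and replacing `<br>` tags with newlines is only a guess at what the parsing fix involves; `extract_lyrics` is a hypothetical helper name.

```python
from bs4 import BeautifulSoup


def extract_lyrics(html: str) -> str:
    """Return the lyric text from a page, one lyric line per text line."""
    soup = BeautifulSoup(html, "html.parser")
    # Assumed selector; the real page structure may differ or change.
    containers = soup.select('div[data-lyrics-container="true"]')
    parts = []
    for div in containers:
        # <br> tags carry the line breaks; turn them into newlines so
        # get_text() does not run the lines together.
        for br in div.find_all("br"):
            br.replace_with("\n")
        parts.append(div.get_text())
    return "\n".join(parts).strip()


# Inline sample standing in for a downloaded page.
sample_html = (
    '<html><body><div data-lyrics-container="true">'
    "First line<br/>Second line</div></body></html>"
)
lyrics = extract_lyrics(sample_html)
print(lyrics)
```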

7. English-Kiswahili Translation Notebook

Notebook: eng_kisw_traslation_notebook.ipynb

This notebook focuses on fine-tuning a model for English-Kiswahili translation tasks. It emphasizes the importance of using GPU for accelerated computation and covers various aspects of model fine-tuning.

Key Steps:

Switch Runtime to GPU: Ensuring that the notebook utilizes GPU for faster processing.
Model Fine-Tuning: Fine-tuning a translation model for improved performance on the English-Kiswahili task.
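
After switching the runtime, it can help to confirm that a GPU is actually visible before starting a long fine-tuning run. The sketch below assumes TensorFlow is the framework in use, which this README does not confirm for this notebook; `gpu_available` is a hypothetical helper.

```python
def gpu_available() -> bool:
    """Return True if TensorFlow is installed and can see at least one GPU."""
    try:
        import tensorflow as tf  # assumed framework
    except ImportError:
        return False
    return len(tf.config.list_physical_devices("GPU")) > 0


print("GPU available:", gpu_available())
```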

8. PandemAI

Notebook: pandemai.ipynb

This notebook deals with data cleaning and formatting as part of a larger project named "PandemAI." It outlines the steps involved in preparing data for analysis and modeling.

Key Steps:

Data Cleaning: Removing inconsistencies and preparing the dataset for analysis.
Formatting: Structuring the data in a way that's suitable for further exploration and modeling.
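
A minimal sketch of cleaning and formatting with pandas follows. The table, its column names, and the specific cleaning rules are all hypothetical; the notebook's real data and steps may differ.

```python
import pandas as pd

# Hypothetical raw table with messy headers, inconsistent casing,
# a duplicate row, a missing key, and numbers stored as text.
raw = pd.DataFrame({
    " Country ": ["Kenya", "kenya ", "Uganda", "Uganda", None],
    "Date": ["2020-03-01", "2020-03-02", "2020-03-01", "2020-03-01", "2020-03-03"],
    "Cases": ["1", "3", "2", "2", "5"],
})

clean = raw.copy()
# Formatting: normalize column names, text casing, and dtypes.
clean.columns = clean.columns.str.strip().str.lower()
clean["country"] = clean["country"].str.strip().str.title()
clean["date"] = pd.to_datetime(clean["date"], format="%Y-%m-%d")
clean["cases"] = pd.to_numeric(clean["cases"], errors="coerce")
# Cleaning: drop rows without a country, then exact duplicates.
clean = clean.dropna(subset=["country"]).drop_duplicates().reset_index(drop=True)

print(clean)
```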

9. Supervised Learning with SVM

Notebook: supervised_learning(SVM).ipynb

This notebook explores supervised learning techniques using Support Vector Machines (SVM). It delves into training and evaluating SVM models on various datasets.

Key Steps:

Model Training: Implementing SVM algorithms for supervised learning tasks.
Evaluation: Assessing the performance of SVM models with relevant metrics.
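
As an illustration of the training and evaluation steps, the sketch below fits an SVM classifier with scikit-learn. The Iris dataset, the RBF kernel, and the hyperparameters are stand-ins chosen for the example, not the notebook's confirmed setup.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in dataset; the notebook's datasets may differ.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# SVMs are sensitive to feature scale, so standardize before fitting.
svm_clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
svm_clf.fit(X_train, y_train)

svm_accuracy = accuracy_score(y_test, svm_clf.predict(X_test))
print(f"Test accuracy: {svm_accuracy:.3f}")
```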

10. Supervised Learning with Random Forests

Notebook: supervised_learning(randomForests).ipynb

This notebook examines the application of Random Forest algorithms for supervised learning. It covers the process of training models, selecting attributes, and evaluating performance.

Key Steps:

Attribute Selection: Identifying relevant attributes for modeling.
Model Training: Implementing Random Forest algorithms for classification tasks.
Evaluation: Utilizing confusion matrices and classification reports to measure accuracy and performance.
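
The three steps above could be sketched as follows. The Wine dataset is a stand-in, and ranking attributes by the forest's impurity-based importances is one possible selection method; the notebook may use a different dataset and a different approach.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Stand-in dataset; the notebook's data may differ.
data = load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=0, stratify=data.target
)

forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)

# Rank attributes by impurity-based importance (an assumed method).
ranked = sorted(
    zip(data.feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
top_features = [name for name, _ in ranked[:5]]

# Evaluate on the held-out split.
y_pred = forest.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
report = classification_report(y_test, y_pred, target_names=data.target_names)

print("Top attributes:", top_features)
print(cm)
print(report)
```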

11. Customer Churn Prediction

Notebook: Customer_Churn_Prediction.ipynb

Project Overview

This project focuses on predicting customer churn using various machine learning models to identify factors contributing to customer attrition. The analysis is performed using a dataset of retail customer information, including demographic and behavioral attributes.

Key Components

i. Data Exploration:

Loaded and explored the dataset to understand its structure and the distribution of features.
Visualized target variable distribution, numerical and categorical features, and correlations.

ii. Feature Engineering:

Encoded categorical variables and scaled numerical features for model training.
Split data into training and testing sets.

iii. Model Training and Evaluation:

Trained several classification models: Random Forest, AdaBoost, Support Vector Classifier, and XGBoost.
Evaluated models using accuracy, classification reports, and confusion matrices.

iv. Results:

Compared model performance to select the best performing model for predicting customer churn.
Generated insights into the effectiveness of different machine learning algorithms in the context of customer churn prediction.
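
The model comparison described above can be sketched with scikit-learn as follows. The data is synthetic (generated with `make_classification`) rather than the retail churn dataset, and XGBoost is left out to keep the example dependent on scikit-learn only; an `xgboost.XGBClassifier` could be added to the `models` dictionary in the same way.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in for a churn dataset (about 30% positives).
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5,
    weights=[0.7, 0.3], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

models = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "adaboost": AdaBoostClassifier(random_state=0),
    "svc": SVC(),
}

# Fit each model behind the same scaler and record held-out accuracy.
accuracies = {}
for name, model in models.items():
    clf = make_pipeline(StandardScaler(), model)
    clf.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, clf.predict(X_test))

best = max(accuracies, key=accuracies.get)
print(accuracies, "best:", best)
```

Accuracy alone can look good on imbalanced churn data even when few churners are caught, which is why classification reports and confusion matrices are useful alongside it.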

Dependencies

pandas: 1.5.3
numpy: 1.24.3
matplotlib: 3.8.0
seaborn: 0.14.0
scikit-learn: 1.3.0
xgboost: 2.1.0

Acknowledgements

Dataset: Online Retail Customer Churn Dataset

Getting Started

To explore these notebooks, you can open them directly in Google Colab using the provided links. Each notebook contains the necessary code and instructions to replicate the analyses and results.

Prerequisites

Python 3.x
Jupyter Notebook
Libraries: TensorFlow, Keras, Matplotlib, Seaborn, NumPy, scikit-learn, BeautifulSoup, pandas, etc.

Running the Notebooks

Clone the repository:

```
git clone https://github.com/aueskinj/Data-Science-Projects.git
```

Navigate to the project directory:

```
cd Data-Science-Projects
```

Open a Jupyter Notebook environment and select the desired notebook to run.

Author

Kimuhu Njuguna

Feel free to explore the notebooks, modify the code, and apply these techniques to your own data science projects!
