Data Science Projects

This repository contains a collection of data science projects and notebooks. Each notebook explores different data science techniques, analyses, or machine learning models applied to various datasets and problems. Below is a detailed description of each notebook.

Table of Contents

  1. Student Performance Indicator
  2. Effect of Government Social Programs on Poverty in Kenya
  3. Effect of Petroleum Prices Changes on the Demand for Petroleum in Kenya
  4. Effect of Taxation on SME Performance
  5. Fine-Tuning English-Swahili Translation Model
  6. Lyrics Finder
  7. English-Kiswahili Translation Notebook
  8. PandemAI
  9. Supervised Learning with SVM
  10. Supervised Learning with Random Forests
  11. Customer Churn Prediction
  12. Causal Inference with Bayesian Networks

1. Student Performance Indicator

Notebook: EDA_STUDENT_PERFORMANCE_.ipynb

This notebook analyzes student performance through Exploratory Data Analysis (EDA). It follows the machine learning project lifecycle, from understanding the problem statement through data preprocessing and modeling to choosing the best model.

Key Steps:

  • Understanding the Problem Statement: Defining the objectives of the analysis.
  • Data Collection: Gathering the relevant data on student performance.
  • Data Checks to Perform: Ensuring the data's integrity and suitability for analysis.
  • Exploratory Data Analysis: Analyzing and visualizing data to uncover patterns and insights.
  • Data Pre-Processing: Preparing data for modeling by handling missing values, encoding categorical variables, etc.
  • Model Training: Training various machine learning models.
  • Choose Best Model: Selecting the most effective model based on evaluation metrics.
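The preprocessing, training, and model selection steps above can be sketched as follows. This is a minimal illustration rather than the notebook's actual code: the data is synthetic, the column names (`gender`, `lunch`, `reading_score`, `writing_score`, `math_score`) are hypothetical, and treating the task as regression on a score is an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data; every column name here is hypothetical.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "gender": rng.choice(["female", "male"], size=n),
    "lunch": rng.choice(["standard", "free/reduced"], size=n),
    "reading_score": rng.integers(40, 100, size=n),
    "writing_score": rng.integers(40, 100, size=n),
})
# Target built from the numeric features plus noise, so there is signal to learn.
df["math_score"] = (
    0.5 * df["reading_score"] + 0.4 * df["writing_score"] + rng.normal(0, 5, size=n)
)

X = df.drop(columns="math_score")
y = df["math_score"]

# Pre-processing: one-hot encode categoricals, standardize numerics.
preprocess = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["gender", "lunch"]),
    ("num", StandardScaler(), ["reading_score", "writing_score"]),
])

# Model training: two candidate models chosen for illustration.
candidates = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

# Choose best model: compare mean cross-validated R^2.
scores = {}
for name, model in candidates.items():
    pipe = Pipeline([("prep", preprocess), ("model", model)])
    scores[name] = cross_val_score(pipe, X, y, cv=5, scoring="r2").mean()

best_model = max(scores, key=scores.get)
print(scores)
print("best:", best_model)
```

Wrapping the preprocessing inside the pipeline means the encoder and scaler are refit on each training fold, which avoids leaking information from the validation fold.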

2. Effect of Government Social Programs on Poverty in Kenya

Notebook: Effect_of_government_social_programs_on_poverty_in_Kenya.ipynb

This notebook performs descriptive analytics to examine the effect of government social programs on poverty in Kenya. The main focus is on understanding correlations and drawing insights from the data.

Key Steps:

  • Correlation Analysis: Identifying relationships between variables to understand how social programs may influence poverty.
  • Descriptive Analytics: Summarizing and visualizing the data to gain insights into the impact of social programs.
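As a rough sketch of what these two steps can look like in pandas, the example below summarizes a small table and computes a Pearson correlation. The figures and column names (`program_spending`, `poverty_rate`) are invented for illustration and are not taken from the notebook or from any real dataset.

```python
import pandas as pd

# Invented yearly figures, for illustration only (not real data).
df = pd.DataFrame({
    "program_spending": [10, 12, 15, 18, 22, 25],
    "poverty_rate": [46.0, 45.1, 43.8, 42.0, 40.5, 39.2],
})

# Descriptive analytics: count, mean, spread, and quartiles per column.
print(df.describe())

# Correlation analysis: pairwise Pearson correlation matrix.
corr = df.corr(method="pearson")
r = corr.loc["program_spending", "poverty_rate"]
print("Pearson r:", round(r, 3))
```

A correlation coefficient describes how two variables move together in the sample; on its own it does not show that one causes the other.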

3. Effect of Petroleum Prices Changes on the Demand for Petroleum in Kenya

Notebook: Effect_of_petroleum_prices_changes_on_the_demand_for_petroleum_in_Kenya.ipynb

This notebook explores the relationship between changes in petroleum prices and the demand for petroleum in Kenya. Through descriptive analytics, it aims to uncover correlations and patterns in the data.

Key Steps:

  • Correlation Analysis: Analyzing the relationship between petroleum prices and demand.
  • Descriptive Analytics: Utilizing visualizations and statistical summaries to understand market trends.
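Because the question concerns price changes, one possible approach is to correlate period-over-period changes rather than raw levels. The sketch below assumes monthly data with hypothetical columns `price_per_litre` and `litres_sold`; the numbers are invented and the notebook may approach this differently.

```python
import pandas as pd

# Invented monthly figures for illustration only; not real market data.
df = pd.DataFrame({
    "price_per_litre": [100.0, 104.0, 110.0, 108.0, 115.0, 120.0],
    "litres_sold": [500.0, 490.0, 470.0, 478.0, 455.0, 440.0],
})

# Month-over-month relative changes; the first row has no predecessor, so drop it.
changes = df.pct_change().dropna()

# Correlation between price changes and demand changes.
r_changes = changes["price_per_litre"].corr(changes["litres_sold"])
print(changes.round(4))
print("correlation of changes:", round(r_changes, 3))
```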

4. Effect of Taxation on SME Performance

Notebook: Effect_of_taxation_on_sme_performance.ipynb

This notebook investigates the impact of taxation on the performance of Small and Medium Enterprises (SMEs). It employs frequency analysis to explore common responses and patterns in the data.

Key Steps:

  • Frequency Analysis: Identifying the most common responses and trends related to taxation and SME performance.
  • Descriptive Analytics: Visualizing the data to gain insights into how taxation affects SMEs.
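Frequency analysis of survey-style responses can be sketched with `value_counts` in pandas. The responses and the question name (`tax_burden_limits_growth`) below are hypothetical stand-ins, not data from the notebook.

```python
import pandas as pd

# Hypothetical answers to a single survey question.
responses = pd.Series(
    ["Agree", "Strongly agree", "Agree", "Neutral",
     "Disagree", "Agree", "Strongly agree", "Agree"],
    name="tax_burden_limits_growth",
)

# Absolute frequency of each response.
counts = responses.value_counts()

# Relative frequency as a percentage of all responses.
shares = responses.value_counts(normalize=True).mul(100).round(1)

# Combine both views into one summary table, most common response first.
summary = pd.DataFrame({"count": counts, "percent": shares})
print(summary)
```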

5. Fine-Tuning English-Swahili Translation Model

Notebook: FineTuningEngSwaModel.ipynb

This notebook demonstrates the process of fine-tuning a translation model for English to Swahili. It utilizes deep learning techniques and frameworks like TensorFlow and Keras for model training and evaluation.

Key Steps:

  • Import Libraries: Utilizing TensorFlow, Keras, Matplotlib, Seaborn, NumPy, and scikit-learn for various tasks.
  • Load and Preprocess the Dataset: Working with the CIFAR-10 dataset, normalizing images, and converting labels for training.
  • Model Training: Fine-tuning the translation model using deep learning techniques.
  • Evaluation: Assessing the model's performance with appropriate metrics.

6. Lyrics Finder

Notebook: LyricsFinder.ipynb

This notebook provides a tool for finding song lyrics by scraping Genius.com. It covers the process of collecting URLs and fetching lyrics for a specified number of songs by an artist.

Key Steps:

  • Get URLs: Obtaining a list of Genius.com URLs for the desired number of songs by a specific artist.
  • Fetch Lyrics: Scraping the lyrics from the URLs using BeautifulSoup, including a fix for HTML parsing.
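The lyric extraction step might look roughly like the sketch below, which parses an inline HTML sample instead of fetching a live page. The `data-lyrics-container` attribute is an assumption about how lyric blocks are marked up, and `extract_lyrics` is a hypothetical helper, not the notebook's function. Converting `<br>` tags to newlines is one common parsing wrinkle; whether it matches the notebook's fix is not confirmed here.

```python
from bs4 import BeautifulSoup


def extract_lyrics(html: str) -> str:
    """Return lyric text from a page, one lyric line per text line.

    Assumes lyric blocks are <div> elements carrying the attribute
    data-lyrics-container="true" (an assumption about the page markup).
    """
    soup = BeautifulSoup(html, "html.parser")
    containers = soup.find_all("div", attrs={"data-lyrics-container": "true"})
    parts = []
    for div in containers:
        # <br> tags carry the line breaks; swap them for newlines
        # so get_text() does not glue adjacent lines together.
        for br in div.find_all("br"):
            br.replace_with("\n")
        parts.append(div.get_text())
    return "\n".join(parts).strip()


# Inline sample standing in for a downloaded page.
sample_html = """
<html><body>
<div data-lyrics-container="true">First line<br/>Second line</div>
<div data-lyrics-container="true">Third line</div>
</body></html>
"""

lyrics = extract_lyrics(sample_html)
print(lyrics)
```

In a real run the HTML would come from an HTTP request to each collected URL; check the site's terms of use before scraping.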

7. English-Kiswahili Translation Notebook

Notebook: eng_kisw_traslation_notebook.ipynb

This notebook focuses on fine-tuning a model for English-Kiswahili translation tasks. It emphasizes the importance of using a GPU for accelerated computation and covers various aspects of model fine-tuning.

Key Steps:

  • Switch Runtime to GPU: Ensuring that the notebook uses a GPU for faster processing.
  • Model Fine-Tuning: Fine-tuning a translation model for improved performance on the English-Kiswahili task.

8. PandemAI

Notebook: pandemai.ipynb

This notebook deals with data cleaning and formatting as part of a larger project named "PandemAI." It outlines the steps involved in preparing data for analysis and modeling.

Key Steps:

  • Data Cleaning: Removing inconsistencies and preparing the dataset for analysis.
  • Formatting: Structuring the data in a way that's suitable for further exploration and modeling.
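A minimal sketch of typical cleaning and formatting steps in pandas is shown below. The table and its columns (`region`, `date`, `count`) are hypothetical, and the specific steps are assumptions about what such a notebook might do.

```python
import pandas as pd

# Hypothetical raw table with messy headers, inconsistent text,
# a non-numeric placeholder, and one duplicated row.
raw = pd.DataFrame({
    " Region ": ["Nairobi", "nairobi ", "Mombasa", "Mombasa"],
    "Date": ["2020-03-01", "2020-03-02", "2020-03-01", "2020-03-01"],
    "Count": ["1", "n/a", "3", "3"],
})

# Formatting: normalize column names.
clean = raw.rename(columns=lambda c: c.strip().lower())

# Cleaning: trim whitespace and unify capitalization in text values.
clean["region"] = clean["region"].str.strip().str.title()

# Formatting: parse dates and numbers into proper dtypes;
# values that cannot be parsed as numbers become NaN.
clean["date"] = pd.to_datetime(clean["date"])
clean["count"] = pd.to_numeric(clean["count"], errors="coerce")

# Cleaning: drop exact duplicate rows and rows with no usable count.
clean = clean.drop_duplicates().dropna(subset=["count"]).reset_index(drop=True)
print(clean)
```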

9. Supervised Learning with SVM

Notebook: supervised_learning(SVM).ipynb

This notebook explores supervised learning techniques using Support Vector Machines (SVM). It delves into training and evaluating SVM models on various datasets.

Key Steps:

  • Model Training: Implementing SVM algorithms for supervised learning tasks.
  • Evaluation: Assessing the performance of SVM models with relevant metrics.
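For orientation, training and evaluating an SVM classifier with scikit-learn can be sketched as below. The Iris dataset, the RBF kernel, and the hyperparameters are illustrative choices, not necessarily those used in the notebook.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Illustrative dataset; the notebook's datasets may differ.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# SVMs are sensitive to feature scale, so standardize before fitting.
svm_clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
svm_clf.fit(X_train, y_train)

# Evaluation on the held-out split.
svm_accuracy = accuracy_score(y_test, svm_clf.predict(X_test))
print("test accuracy:", round(svm_accuracy, 3))
```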

10. Supervised Learning with Random Forests

Notebook: supervised_learning(randomForests).ipynb

This notebook examines the application of Random Forest algorithms for supervised learning. It covers the process of training models, selecting attributes, and evaluating performance.

Key Steps:

  • Attribute Selection: Identifying relevant attributes for modeling.
  • Model Training: Implementing Random Forest algorithms for classification tasks.
  • Evaluation: Utilizing confusion matrices and classification reports to measure accuracy and performance.
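The three steps above can be sketched with scikit-learn as follows. The Wine dataset and the use of impurity-based feature importances for attribute ranking are assumptions made for illustration; the notebook may select attributes differently.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Illustrative dataset; not necessarily the one used in the notebook.
data = load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.3, random_state=42, stratify=data.target
)

# Model training.
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)

# Attribute selection (one option): rank features by impurity-based importance.
ranked = sorted(
    zip(data.feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
top_features = [name for name, _ in ranked[:5]]
print("top features:", top_features)

# Evaluation: confusion matrix and per-class precision/recall/F1.
y_pred = forest.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
report = classification_report(y_test, y_pred, target_names=data.target_names)
print(cm)
print(report)
```

Rows of the confusion matrix correspond to true classes and columns to predicted classes, so off-diagonal entries show which classes are being confused.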

11. Customer Churn Prediction

Notebook: Customer Churn Prediction.ipynb

Project Overview

This project focuses on predicting customer churn using various machine learning models to identify factors contributing to customer attrition. The analysis is performed using a dataset of retail customer information, including demographic and behavioral attributes.

Key Components

i. Data Exploration:

  • Loaded and explored the dataset to understand its structure and the distribution of features.
  • Visualized target variable distribution, numerical and categorical features, and correlations.

ii. Feature Engineering:

  • Encoded categorical variables and scaled numerical features for model training.
  • Split data into training and testing sets.

iii. Model Training and Evaluation:

  • Trained several classification models: Random Forest, AdaBoost, Support Vector Classifier, and XGBoost.
  • Evaluated models using accuracy, classification reports, and confusion matrices.

iv. Results:

  • Compared model performance to select the best-performing model for predicting customer churn.
  • Generated insights into the effectiveness of different machine learning algorithms in the context of customer churn prediction.
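A compact sketch of this kind of model comparison is shown below. It uses a synthetic, imbalanced dataset from `make_classification` in place of the retail customer data, and default-style hyperparameters chosen for illustration. XGBoost is imported only if it is installed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for churn data: roughly 20% positive ("churned") class.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5,
    weights=[0.8, 0.2], random_state=1,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=1
)

# Scale features using statistics from the training split only.
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

models = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=1),
    "adaboost": AdaBoostClassifier(n_estimators=100, random_state=1),
    "svc": SVC(kernel="rbf"),
}
try:
    # Optional: include XGBoost when the package is available.
    from xgboost import XGBClassifier
    models["xgboost"] = XGBClassifier(n_estimators=100, random_state=1)
except ImportError:
    pass

# Fit each model and record held-out accuracy.
accuracies = {}
for name, model in models.items():
    model.fit(X_train_s, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test_s))

best_name = max(accuracies, key=accuracies.get)
print(accuracies)
print("best by accuracy:", best_name)
```

With imbalanced classes, accuracy alone can look strong even when few churners are caught, which is why the classification report and confusion matrix are worth reading alongside it.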

Dependencies

  • pandas: 1.5.3
  • numpy: 1.24.3
  • matplotlib: 3.8.0
  • seaborn: 0.14.0
  • scikit-learn: 1.3.0
  • xgboost: 2.1.0

Acknowledgements

Getting Started

To explore these notebooks, you can open them directly in Google Colab using the provided links. Each notebook contains the necessary code and instructions to replicate the analyses and results.

Prerequisites

  • Python 3.x
  • Jupyter Notebook
  • Libraries: TensorFlow, Keras, Matplotlib, Seaborn, NumPy, scikit-learn, BeautifulSoup, pandas, etc.

Running the Notebooks

  1. Clone the repository:
    git clone https://github.com/aueskinj/Data-Science-Projects.git
  2. Navigate to the project directory:
    cd Data-Science-Projects
  3. Open a Jupyter Notebook environment and select the desired notebook to run.

Author

  • Kimuhu Njuguna

Feel free to explore the notebooks, modify the code, and apply these techniques to your own data science projects!

