Data Science Projects Portfolio

This portfolio contains my projects in data science, data analysis, SQL databases and Python programming, which show my self-study progress.

The projects fall into several categories:

  • Data Analysis (data visualization, data cleaning and data exploration) with Python and SQL.
  • Machine Learning (supervised & unsupervised): linear regression, classification, prediction, recommendation, customer segmentation and anomaly detection.
  • Natural Language Processing: text classification, sentiment analysis, spam detection and text summarization.
  • Deep Learning/Computer Vision: image recognition.
  • Python projects: web applications with Flask and Streamlit, a simple pipeline with Python, automation with Python.
  • SQL and Python projects: ETL process, basic CRUD.

Projects:

Machine learning:

ML supervised & unsupervised:

The project concerns predicting advertisement clicks with machine learning. The main aim of this project is to predict who is going to click an ad on a website in the future. The analysis includes data analysis, data preparation and building models with several different machine learning algorithms.

  • Models used: Logistic Regression, Linear SVC, Decision Tree, Random Forest, AdaBoost.
  • Keywords: Ad click prediction, Python: pandas, scikit-learn, seaborn, matplotlib.

The project concerns churn prediction for bank customers. It includes data analysis, data preparation and building models with different machine learning algorithms to predict whether a client is going to leave the bank or not.

  • Models used: Logistic Regression, Random Forest, KNN, SVC, XGBoost;
  • Keywords: Churn prediction, Python: pandas, scikit-learn, seaborn, matplotlib, xgboost.

The project concerns a book recommendation system. It includes data analysis, data preparation and building models with collaborative filtering and matrix factorization to get book recommendations.

  • Models used: KNN, collaborative filtering, matrix factorization;
  • Keywords: recommendation system, Python: pandas, scikit-learn, seaborn, matplotlib.
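A minimal, stdlib-only sketch of item-based collaborative filtering, assuming each book is represented as a vector of user ratings (0 where a user has not rated it) and neighbours are ranked by cosine similarity. This is similar in spirit to a KNN model with a cosine metric on a user-book pivot table; the ratings, titles and function names below are hypothetical, and the project itself uses pandas and scikit-learn.

```python
import math

def book_vectors(ratings):
    """Turn {user: {book: rating}} into {book: [rating per user]}, 0 for unrated."""
    users = sorted(ratings)
    books = sorted({book for user_ratings in ratings.values() for book in user_ratings})
    return {b: [ratings[u].get(b, 0.0) for u in users] for b in books}

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def similar_books(ratings, title, k=2):
    """Return the k books whose rating vectors are closest to `title`."""
    vectors = book_vectors(ratings)
    target = vectors[title]
    scored = [(cosine(target, vec), book) for book, vec in vectors.items() if book != title]
    scored.sort(key=lambda t: (-t[0], t[1]))  # highest similarity first, then by title
    return [book for _, book in scored[:k]]

# Hypothetical ratings on a 1-5 scale.
ratings = {
    "ann": {"Dune": 5, "Emma": 1, "Hyperion": 4},
    "bob": {"Dune": 4, "Hyperion": 5},
    "cat": {"Emma": 5, "Persuasion": 4},
}
print(similar_books(ratings, "Dune"))  # ['Hyperion', 'Emma']
```

Treating unrated entries as 0 keeps the vectors aligned by user, which is what a sparse user-item matrix does as well.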

The project performs customer segmentation using the RFM method (RFM score) and K-Means clustering to create customer segments from the provided data.

  • Models used: K-Means, RFM method;
  • Keywords: RFM, K-Means clustering, Python: pandas, scikit-learn, scipy, matplotlib.
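A minimal sketch of how an RFM score can be computed, assuming the input is a list of `(customer_id, order_date, amount)` rows and a fixed snapshot date. It uses a simple rank-based 1..3 score per dimension (lower recency is better, higher frequency and monetary value are better); the project's exact binning (for example quantiles with pandas) may differ, and all names and data here are hypothetical.

```python
from collections import defaultdict
from datetime import date

def rfm_table(transactions, snapshot):
    """Aggregate (customer_id, order_date, amount) rows into (R, F, M) per customer."""
    last_order = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer, order_date, amount in transactions:
        if customer not in last_order or order_date > last_order[customer]:
            last_order[customer] = order_date
        frequency[customer] += 1
        monetary[customer] += amount
    return {
        c: ((snapshot - last_order[c]).days, frequency[c], monetary[c])
        for c in last_order
    }

def rank_score(values, value, bins=3, higher_is_better=True):
    """Score 1..bins from the value's rank among all customers (ties share a score)."""
    rank = sum(v < value for v in values)
    score = 1 + rank * bins // len(values)
    return score if higher_is_better else bins + 1 - score

def rfm_scores(table, bins=3):
    """Concatenate the R, F and M scores into a segment label such as '331'."""
    recency = [r for r, _, _ in table.values()]
    frequency = [f for _, f, _ in table.values()]
    monetary = [m for _, _, m in table.values()]
    return {
        c: f"{rank_score(recency, r, bins, higher_is_better=False)}"
           f"{rank_score(frequency, f, bins)}"
           f"{rank_score(monetary, m, bins)}"
        for c, (r, f, m) in table.items()
    }

# Hypothetical orders.
orders = [
    ("A", date(2024, 1, 10), 50.0), ("A", date(2024, 3, 1), 30.0),
    ("B", date(2023, 12, 1), 200.0),
    ("C", date(2024, 3, 20), 10.0), ("C", date(2024, 3, 25), 10.0),
    ("C", date(2024, 3, 28), 10.0),
]
table = rfm_table(orders, snapshot=date(2024, 3, 31))
print(rfm_scores(table))  # {'A': '222', 'B': '113', 'C': '331'}
```

The (R, F, M) table, usually scaled first, is also the kind of input a K-Means model would cluster.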

The project concerns anomaly detection in credit card transactions using machine learning models and autoencoders. The main aim of this project is to predict whether a given transaction was fraudulent or not.

  • Models used: Isolation Forest, Local Outlier Factor, Support Vector Machine (OneClassSVM), Autoencoder.
  • Keywords: Anomaly detection, Python: pandas, scikit-learn, tensorflow, seaborn, matplotlib.

The project concerns sales forecasting with a time series model. It includes sales data analysis and a forecast of the number of orders made with the Prophet library.

  • Models used: Time Series (Prophet).
  • Keywords: Exploratory data analysis, Prophet, python.

The project concerns real estate price prediction using regression models. I have built models that predict real estate prices based on historical data.

  • Models used: Ridge, Lasso, Elastic Net, Random Forest, Gradient Descent, XGBoost.
  • Keywords: Linear regression, Python: pandas, numpy, scikit-learn.

Natural Language Processing:

The project concerns the categorization of make-up products based on their descriptions. I have built a multi-class text classification model (with ML algorithms, MLP, CNN and a DistilBERT model) to predict the category (type) of a product. I have also trained Word2vec and Doc2vec models on the data and performed topic modeling and EDA.

  • Models used: MLP, CNN, DistilBERT, Logistic Regression, SVM, Naive Bayes, Random Forest; Word2vec, Doc2vec.
  • Keywords: NLP, text classification, transformers, topic modeling; Python: nltk, gensim, scikit-learn, keras, tensorflow, Hugging Face, LDA.

Text summarization in Python based on extractive and abstractive methods. The analysis includes summarization by word-frequency scoring with the spaCy library, a TF-IDF vectorizer implementation, automatic text summarization with the gensim library and abstractive techniques with the Hugging Face library.

  • Models used: word frequency, TF-IDF vectorizer, BART.
  • Keywords: text summarization, transformers, BART, Python: spacy, nltk, scikit-learn, gensim.
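The word-frequency approach can be sketched with the standard library alone: count non-stop-words, normalise the counts by the most frequent word, score each sentence by the sum of its word weights and keep the top sentences in their original order. The tiny stop-word list and the regex-based sentence splitting are simplifying assumptions standing in for spaCy's tokenizer and stop words.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption; spaCy and nltk ship larger ones).
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "it", "of", "on", "the", "to", "with"}
WORD = re.compile(r"[a-z']+")

def summarize(text, n_sentences=2):
    """Extractive summary: keep the sentences made of the most frequent words."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(w for w in WORD.findall(text.lower()) if w not in STOPWORDS)
    if not freq:
        return ""
    top = max(freq.values())
    weights = {w: count / top for w, count in freq.items()}  # normalise to 0..1
    scored = [
        (sum(weights.get(w, 0.0) for w in WORD.findall(s.lower())), i)
        for i, s in enumerate(sentences)
    ]
    # Highest score first; on ties prefer the earlier sentence.
    best = sorted(scored, key=lambda t: (-t[0], t[1]))[:n_sentences]
    # Re-emit the chosen sentences in their original order.
    return " ".join(sentences[i] for _, i in sorted(best, key=lambda t: t[1]))

text = "Cats are great pets. Cats sleep a lot. Dogs bark."
print(summarize(text, n_sentences=1))  # Cats are great pets.
```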

The project concerns spam detection in SMS messages, i.e. determining whether a message is spam or not. I have built models using a pretrained BERT model and different machine learning algorithms. The analysis also includes text mining with NLP methods to prepare and clean the data.

  • Models used: BERT, Logistic Regression, Naive Bayes, SVM, Random Forest.
  • Keywords: NLP, transformers, spam detection, SMOTE sampling, Python: nltk, scikit-learn, Hugging Face, imbalanced-learn.
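The idea behind SMOTE oversampling, which imbalanced-learn provides as its `SMOTE` class, can be illustrated from scratch: pick a minority-class sample, pick one of its k nearest minority-class neighbours and create a synthetic point at a random position on the segment between them. This is a minimal sketch on small 2-D tuples, not imbalanced-learn's implementation; in the spam project the feature vectors would come from text vectorization.

```python
import math
import random

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic samples by interpolating between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority-class neighbours of the base sample (Euclidean distance)
        neighbours = sorted(
            (p for j, p in enumerate(minority) if j != i),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other, in [0, 1)
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic

minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]  # hypothetical minority-class points
new_points = smote(minority, n_new=5)
```

Because every new point lies between two real minority samples, the minority class grows without exact duplicates, which is the main difference from plain random oversampling.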

The project concerns sentiment analysis of women's clothing reviews. I have built a model to predict whether a review is positive or negative, using different machine learning algorithms and pre-trained GloVe word embeddings with a Bidirectional LSTM. The project also includes EDA and sentiment analysis with the VADER and TextBlob methods.

  • Models used: LSTM, GloVe, Logistic Regression, Naive Bayes, SVM.
  • Keywords: NLP, sentiment analysis, TextBlob, VADER; Keras, TensorFlow, nltk, scikit-learn, pandas.

Computer vision/Image processing:

The project concerns recognizing diseases on apple leaves from their images. The solution includes data analysis, data preparation, and a CNN model with data augmentation and transfer learning to recognize leaf diseases.

  • Models used: Convolutional Neural Network, MobileNet V2.
  • Keywords: Image Recognition, data augmentation, transfer learning; Python: tensorflow, keras, pandas, numpy, scikit-learn, seaborn, pillow, opencv.

The project concerns waste classification, determining whether an item can be recycled or not. In the analysis I have used a Convolutional Neural Network (CNN) model with data augmentation and transfer learning with a pre-trained MobileNet V2 model.

  • Models used: Convolutional Neural Network, MobileNet V2.
  • Keywords: Image Recognition, data augmentation, Python: tensorflow, keras, numpy, matplotlib.

In the project I have used the OpenCV library to detect faces, eyes and smiles in an image.

  • Models used: OpenCV Haar cascade classifiers.
  • Keywords: Face detection, Python: OpenCV, pillow, numpy, matplotlib.

Data analysis:

The project concerns market basket analysis and product recommendation using association rules. I have built a model with the Apriori algorithm to recommend products based on the data.

  • Models used: Apriori algorithm.
  • Keywords: product recommendation, data analysis, python, MLxtend.
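The Apriori algorithm grows frequent itemsets level by level: a k-itemset can only be frequent if all of its (k-1)-subsets are frequent, so candidates with an infrequent subset are pruned before their support is counted. Below is a minimal pure-Python sketch of that loop plus rule confidence, `support(A ∪ B) / support(A)`; it is not MLxtend's implementation, and the baskets are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(itemset <= b for b in baskets) / n

    singletons = {frozenset([item]) for b in baskets for item in b}
    current = {s for s in singletons if support(s) >= min_support}
    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent

def confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]

baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
frequent = apriori(baskets, min_support=0.5)
print(round(confidence(frequent, {"bread"}, {"milk"}), 2))  # 0.67
```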

The project concerns the analysis of the IT job market using data from GitHub, Stack Overflow and web scraping. I have used SQL, Google BigQuery and Python (pandas, numpy, matplotlib, seaborn) to analyze the data.

  • Keywords: data preprocessing, data cleaning, EDA, Python: pandas, numpy, seaborn; SQL, Google BigQuery.

The project contains an analysis of sample sales data with SQL. It showcases my knowledge and skills in SQL, such as data manipulation, analysis and querying.

  • Keywords: SQL, data analysis, Microsoft SQL Server.

The project contains an analysis of employee attrition data and an interactive dashboard created with Power BI.

  • Keywords: Power BI, data analysis, data visualization, dashboard.

The project includes data analysis and outlier detection on air quality data. Outlier detection was performed with several methods, such as Tukey's method (IQR) and the Isolation Forest algorithm.

  • Models used: Isolation Forest.
  • Keywords: data analysis, outlier detection, Python: pandas, numpy, scikit-learn, seaborn.
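Tukey's method flags values outside the fences `Q1 - k·IQR` and `Q3 + k·IQR`, where `IQR = Q3 - Q1` and k is conventionally 1.5. A minimal stdlib sketch follows; it computes quartiles with `statistics.quantiles(method="inclusive")`, and other tools may interpolate quartiles slightly differently, so fences can differ a little. The readings are hypothetical.

```python
from statistics import quantiles

def tukey_outliers(values, k=1.5):
    """Return (outliers, (lower_fence, upper_fence)) using Tukey's IQR rule."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper], (lower, upper)

readings = [10, 12, 11, 13, 12, 11, 95]  # hypothetical air-quality readings
outliers, fences = tukey_outliers(readings)
print(outliers, fences)  # [95] (8.75, 14.75)
```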

The project includes an analysis of world happiness over 5 years (2015-2019). For the analysis I have used SQL (SQLite) and Python.

  • Keywords: data analysis, SQL, Python: SQLite3, pandas, matplotlib, seaborn.

The project builds an interactive dashboard from sales data using the pandas-bokeh library.

  • Keywords: data analysis, data visualization, dashboard, python, pandas, pandas-bokeh.

Python projects:

A REST API web app for sentiment analysis of clothing reviews, built with Flask and a machine learning model.

  • Keywords: Flask, HTML, Python: pandas, scikit-learn, regex, nltk.

A Streamlit application that uses a deep learning model to determine whether a given piece of waste is recyclable or organic. I have used a previously trained CNN (Convolutional Neural Network) model to classify the waste.

  • Keywords: python, streamlit, tensorflow, pillow.

Automating an Excel report with Python and the openpyxl library.

  • Keywords: python, openpyxl, pandas.

This Python script reads a CSV file specified by the user, transforms the data it contains and writes the result to a new CSV file.

  • Keywords: python, pycountry, csv.
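A minimal sketch of a read-transform-write CSV script with the standard `csv` module. The actual transformation in the project is not described beyond its use of pycountry, so this example assumes a `country` column and adds an ISO code from a small hypothetical lookup table (`COUNTRY_CODES`) standing in for pycountry; unknown names get an empty code.

```python
import csv
import io

# Hypothetical lookup table standing in for pycountry.
COUNTRY_CODES = {"poland": "PL", "germany": "DE", "france": "FR"}

def transform(src, dst, column="country"):
    """Read CSV rows from src, add a country_code column, write them to dst."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=list(reader.fieldnames) + ["country_code"])
    writer.writeheader()
    count = 0
    for row in reader:
        row["country_code"] = COUNTRY_CODES.get(row[column].strip().lower(), "")
        writer.writerow(row)
        count += 1
    return count

def transform_file(src_path, dst_path, column="country"):
    """File-path wrapper: open both files with newline='' as the csv module expects."""
    with open(src_path, newline="", encoding="utf-8") as src:
        with open(dst_path, "w", newline="", encoding="utf-8") as dst:
            return transform(src, dst, column)

# Quick in-memory run instead of real files.
out = io.StringIO()
transform(io.StringIO("name,country\nAnna,Poland\nBob,Atlantis\n"), out)
print(out.getvalue())
```

For a command-line version, the source path could come from `input()` and be passed to `transform_file` together with an output path of your choice.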

In this project I have used an API to fetch data and build a dataset. It includes two examples of getting data from an API. The received data was saved in JSON format and then exported to a CSV file.

  • Keywords: python, pandas, requests, json.
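The fetch, save-as-JSON, export-to-CSV flow can be sketched with the standard library. Here `urllib.request` stands in for the `requests` library used in the project, and the records are made up so the example runs offline; any real URL you pass to `fetch_json` is your own choice, not one from the project.

```python
import csv
import io
import json
import urllib.request

def fetch_json(url, timeout=10):
    """GET a URL and decode its JSON body (stdlib stand-in for requests)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)

def save_records(records, json_file, csv_file):
    """Write records (a list of flat dicts) as JSON, then the same records as CSV."""
    json.dump(records, json_file, indent=2)
    # Union of all keys so records with missing fields still fit; gaps become "".
    fieldnames = sorted({key for record in records for key in record})
    writer = csv.DictWriter(csv_file, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)

# Offline example with made-up records in place of a real API response.
records = [{"id": 1, "city": "Krakow"}, {"id": 2, "city": "Gdansk", "pop": 470000}]
json_buf, csv_buf = io.StringIO(), io.StringIO()
save_records(records, json_buf, csv_buf)
```

With real files, open the CSV output with `newline=""` and pass `fetch_json(url)` as `records`.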

SQL and Python projects:

The project includes a simple ETL process using Python and an SQLite database. The pipeline matches reported chargebacks (from an Excel file) with transactions from the database.

  • Keywords: ETL, python, SQLite, pandas.
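A minimal sketch of the extract, transform, load and match steps with `sqlite3`. The Excel extract is replaced by a hypothetical in-memory list (in practice it could be read with something like `pandas.read_excel`), and the table and column names and the ID normalisation rule are assumptions. Matching uses a `LEFT JOIN`, so chargebacks with no transaction are reported as unmatched instead of being dropped.

```python
import sqlite3

def extract_chargebacks():
    """Hypothetical stand-in for reading the chargeback Excel file."""
    return [
        {"transaction_id": " tx-1 ", "amount": 20.0},
        {"transaction_id": "TX-3", "amount": 15.5},
    ]

def transform(chargebacks):
    """Normalise IDs to the assumed database format (upper-case, no spaces)."""
    return [
        {"transaction_id": c["transaction_id"].strip().upper(), "amount": c["amount"]}
        for c in chargebacks
    ]

def load(conn, chargebacks):
    conn.execute("CREATE TABLE IF NOT EXISTS chargebacks (transaction_id TEXT, amount REAL)")
    conn.executemany("INSERT INTO chargebacks VALUES (:transaction_id, :amount)", chargebacks)
    conn.commit()

def match(conn):
    """Return (matched, unmatched) chargeback IDs via a LEFT JOIN on transactions."""
    rows = conn.execute(
        """
        SELECT c.transaction_id, t.transaction_id IS NOT NULL
        FROM chargebacks AS c
        LEFT JOIN transactions AS t ON t.transaction_id = c.transaction_id
        ORDER BY c.transaction_id
        """
    ).fetchall()
    matched = [tx for tx, found in rows if found]
    unmatched = [tx for tx, found in rows if not found]
    return matched, unmatched

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (transaction_id TEXT PRIMARY KEY, amount REAL)")
conn.executemany("INSERT INTO transactions VALUES (?, ?)", [("TX-1", 20.0), ("TX-2", 99.0)])
load(conn, transform(extract_chargebacks()))
print(match(conn))  # (['TX-1'], ['TX-3'])
```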

The script performs basic CRUD operations using Python and SQLite3.

  • Keywords: python, SQLite.
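A minimal sketch of create, read, update and delete functions with the standard `sqlite3` module, using parameterised queries (`?` placeholders) so values are never formatted into the SQL string. The `books` table and its columns are hypothetical; the project's own schema is not described.

```python
import sqlite3

def create_table(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS books ("
        "id INTEGER PRIMARY KEY, title TEXT NOT NULL, year INTEGER)"
    )

def create(conn, title, year):
    """Insert a row and return its new id."""
    cur = conn.execute("INSERT INTO books (title, year) VALUES (?, ?)", (title, year))
    conn.commit()
    return cur.lastrowid

def read(conn, book_id):
    """Return (id, title, year) or None if the id does not exist."""
    return conn.execute(
        "SELECT id, title, year FROM books WHERE id = ?", (book_id,)
    ).fetchone()

def update(conn, book_id, year):
    """Change the year of one book; return the number of rows affected."""
    cur = conn.execute("UPDATE books SET year = ? WHERE id = ?", (year, book_id))
    conn.commit()
    return cur.rowcount

def delete(conn, book_id):
    """Delete one book; return the number of rows affected."""
    cur = conn.execute("DELETE FROM books WHERE id = ?", (book_id,))
    conn.commit()
    return cur.rowcount

conn = sqlite3.connect(":memory:")  # use a file path for a persistent database
create_table(conn)
book_id = create(conn, "Dune", 1965)
print(read(conn, book_id))  # (1, 'Dune', 1965)
```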