This portfolio compiles data projects that I have completed for research and to showcase my work.
Methodologies: Data analysis, machine learning, deep learning, natural language processing, statistics, experiment design
Languages: Python, SQL
Libraries: Pandas, NumPy, PySpark, Scrapy, MySQL, PostgreSQL, Matplotlib, Plotly, scikit-learn, TensorFlow, Keras
Deployment: Streamlit, Docker, Heroku
Framework: Databricks, Tableau, PowerBI, AWS, VSCode, Jupyter Notebook, Google Colab, Git and GitHub
Language: C++
Cloud: Google Cloud Computing
Framework: MLflow
Visual Basic, some VBA, some MATLAB, a little Fortran 90.
Scripting for finite element analysis software such as Ansys APDL, OpenSees and Cast3M
AI-Powered Civil Work Bidding Price Prediction in San Francisco
Based on machine learning algorithms, this app helps users estimate the construction cost of future housing or apartment projects in San Francisco, California.
Domains: supervised machine learning, feature engineering, data visualization, model performance, construction cost, investment, app deployment
➜ App
Getaround Car Rental Price Predictor and Dashboard on a New Feature
Deployment of an online API that predicts Getaround car rental prices through an endpoint serving an XGBoost model, followed by a dashboard that gives insights on implementing a new feature
Domains: data analysis, dashboard, supervised machine learning, FastAPI, Streamlit, app deployment, customer engagement
➜ Dashboard, API/docs and API/predict
Building a credit risk model from loan data to provide a scorecard and a pipeline to calculate exposure loss
Domains: data cleaning, data analysis, supervised machine learning, statistical modeling, hypothesis testing, risk, finance
Hotspot Zone Segmentation in Uber Pickup Data
Creation of pipelines that identify hot zones to guide Uber drivers toward optimal pickups
Domains: data analysis, data visualization, unsupervised machine learning, clustering, optimization
Disaster Tweet Analysis with Natural Language Processing
Building a deep learning model that predicts which tweets talk about real disasters and which do not
Domains: natural language processing, spaCy, tokenizing, deep learning, RNN, GRU, LSTM, word clouds, disasters
Training Naïve Bayes classifiers to detect spam comments on YouTube videos
Domains: natural language processing, vectorizing, machine learning, confusion matrix, social media
Kayak Trip Planning: Extract, Transform and Load
Development of an app that recommends the best destinations in France with up-to-date weather and hotel information
Domains: ETL, web scraping, data lake, data warehouse, AWS, client engagement
Extraction of insights on key factors for a second date
Domains: data cleaning and organizing, exploratory data analysis, insights
- Python tools for spectrum-compatible record selection and modification with a cycle-and-shift algorithm
Domains: algorithm development, automation, optimization, large data sets, data analytics, signal processing, engineering, research
Manuscript on its development
- ALCAMBER - Software package providing improved estimates of camber in concrete bridge girders
Domains: user interface, debugging, predictive models, Visual Basic, bridge engineering, research
Book chapter with a summary on its development
➜ App on Demand
A project on the combined capabilities of Hadoop and Apache Spark for data analytics on a student score dataset
Learning material for C++