This repository contains code and data files for projects completed throughout the Galvanize Immersive Data Science program.
"You are a contract data scientist/consultant hired by a new e-commerce site to try to weed out fraudsters. The company unfortunately does not have much data science expertise, so you must properly scope and present your solution to the manager before you embark on your analysis. Also, you will need to build a sustainable software project that you can hand off to the company's engineers by deploying your model in the cloud."
- Python
- Neural network (MLP)
- Flask
- HTML
- Pandas
- NumPy
- Seaborn
"You and a team of talented data scientists are working for the company Items-Legit, which uses several production recommenders that provide a significant revenue stream. The issue is that these systems have been around a long time, and your head of data science has asked you and your team members to explore new solutions. The main goal here is to improve the RMSE; however, another equally important goal is to present your model and the details of your methods in a clear but technically sound manner. We would also like you to include some discussion about how you would move from prototype to production."
- Python
- Spark
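
Since the recommender brief is scored on RMSE (root mean squared error), here is a minimal sketch of how that metric compares two sets of predictions. It uses plain Python rather than Spark so it runs without dependencies. The ratings and the names `rmse`, `baseline_preds`, and `candidate_preds` are made up for illustration and are not taken from the project code.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical ratings, for illustration only.
actual_ratings = [4.0, 3.0, 5.0, 2.0]
baseline_preds = [3.0, 3.0, 3.0, 3.0]   # e.g. always predict a constant rating
candidate_preds = [4.0, 3.0, 4.0, 2.0]  # a model that misses one rating by 1

baseline_rmse = rmse(actual_ratings, baseline_preds)
candidate_rmse = rmse(actual_ratings, candidate_preds)
print(f"baseline RMSE:  {baseline_rmse:.4f}")
print(f"candidate RMSE: {candidate_rmse:.4f}")
```

A lower value means the predictions sit closer to the observed ratings, which is the sense in which a new recommender would "improve the RMSE" over the existing one.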
"A ride-sharing company (Company X) is interested in predicting rider retention. To help explore this question, we have provided a sample dataset of a cohort of users who signed up for an account in January 2014. The data was pulled on July 1, 2014; we consider a user retained if they were “active” (i.e. took a trip) in the preceding 30 days (from the day the data was pulled). In other words, a user is “active” if they have taken a trip since June 1, 2014. We would like you to use this data set to help understand what factors are the best predictors for retention, and offer suggestions to operationalize those insights to help Company X. Therefore, your task is not only to build a model that minimizes error, but also a model that allows you to interpret the factors that contributed to your predictions."
- Python
- Regression analysis
- Decision trees
- K-nearest neighbors
- Pandas
- Seaborn
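
The retention definition in the ride-sharing brief (a trip within the 30 days before the July 1, 2014 data pull) can be turned into a target label along these lines. This is a sketch using only the standard library. The function name `is_retained`, the idea of a per-user "last trip date", and the sample users are assumptions for illustration, not the dataset's actual schema.

```python
from datetime import date, timedelta

# Both values come from the case-study text.
PULL_DATE = date(2014, 7, 1)
ACTIVE_WINDOW = timedelta(days=30)


def is_retained(last_trip_date, pull_date=PULL_DATE, window=ACTIVE_WINDOW):
    """True if the last trip falls within `window` before `pull_date`.

    A missing last trip date (None) is treated as not retained;
    that handling is an assumption, not something the brief specifies.
    """
    if last_trip_date is None:
        return False
    return pull_date - last_trip_date <= window


# Hypothetical users mapped to their most recent trip date.
users = {
    "user_a": date(2014, 6, 15),
    "user_b": date(2014, 5, 20),
    "user_c": date(2014, 6, 1),
}
labels = {name: is_retained(last_trip) for name, last_trip in users.items()}
print(labels)
```

With these constants the cutoff lands on June 1, 2014, matching the brief's "taken a trip since June 1, 2014" wording. The resulting boolean would then serve as the target for the regression, decision tree, and K-nearest neighbors models listed above.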