Submissions for the COMPLETED ✔️ Springboard Data Science curriculum.
- Submissions
  - Guided Capstone (external repo)
  - Capstone Two
  - Capstone Three
All assignments and submissions practice the data science methodology.
Unit | Name, link | Description | Status | Skills |
---|---|---|---|---|
4.3 | London Calling! | London Housing Case Study | ✔️ | pandas, matplotlib, seaborn, Python data types |
6 | Guided Capstone | First Capstone Data Science Method (DSM) exercise, contained in external repository. Seven submissions in total. | ✔️ | statistics review, scikit-learn, data wrangling/exploration/visualization, PCA, cross-validation, regression, model selection, RandomForest, hyperparameter tuning, data quantity assessment |
7.2 | API Mini-Project | Last assignment of the Data Wrangling portion (7.2) of the DSM. | ✔️ | RESTful APIs, JSON, requests, financial calculations |
7.4 | NASA Meteorites Report | Notebook example showing use of ydata-profiling to generate reports from DataFrames, part of 7.4 - Data Definitions. Example HTML reports contained in folder. | ✔️ | automated EDA, NumPy |
8.3 | SQL Case Study | SQL wrap-up using country club facility/booking/member data. Issues with Springboard's phpMyAdmin server noted. | ✔️ | SQL, SQLite3, joins, filters, etc. |
11.1 | Frequentist Inference Part A, Part B | Statistical inference exercises; introduction to the SciPy package. | ✔️✔️ | SciPy, statistical tests and parameters, confidence intervals, population distributions and sampling, Central Limit Theorem |
11.3 | Hypothesis Testing - Integrating Apps | Case study: were user reviews for X better than for Y? | ✔️ | data cleaning, null and alternate hypotheses, permutation tests, tqdm |
11.4 | Linear Regression - Red Wine | Case study using a Kaggle wine dataset for regression practice. | ✔️ | EDA, correlation, train/test splits, statsmodels, linear regression (Ordinary least squares, multiple, weighted), multicollinearity, feature selection |
11.4 | Linear Regression - Boston Housing | Mini-project: predict house prices using OLS regression. | ✔️ | EDA, linear regression, interpretation of model coefficients, error analysis, coefficient of determination, model comparisons, information criteria, F-statistic, QQ plots, influence plots, outlier analysis, ethical data (see sklearn load_boston) |
14.2 | Logistic Regression | Case study using healthcare patient data, predicting heart disease. Brief discussion of model tuning. Added extra notes to the Confusion Matrix / Precision / Recall section. | ✔️ | logistic regression, wrangling, EDA, preprocessing, categorical feature encoding, training, confusion matrix, precision, recall, accuracy, hyperparameter tuning, GridSearchCV, discriminative vs. generative models |
14.3 | Decision Trees - RR Diner Coffee | Use a customer survey to predict whether other customers will buy a new coffee. | ✔️ | dummy encoding, decision tree classifiers, decision tree hyperparameters, bagging classifier, RandomForest, classification metrics |
14.4 | Random Forest | Random Forest overview and discussion, with a basic demonstration using patient data and classification. | ✔️ | Graphviz, RandomForest, ExtraTrees, data imputation, data scaling and normalization, model feature importances |
14.5 | Gradient Boosting | Gradient Boosting basic demonstration for curve fitting and with Titanic survivors dataset. | ✔️ | Gradient boosting, regression and classification, ROC-AUC, GradientBoosting tuning |
15.2 | Calculating Distances | Visual demonstrations of Euclidean vs Manhattan distance calculations (a minimal sketch of these metrics follows this table). | ✔️ | distance metrics and calculation, Euclidean, Manhattan, PCA |
15.5 | Cosine Similarity | Brief example of calculation using numerical and text data (with preprocessing steps); also covered in the sketch after this table. | ✔️ | cosine similarity, 3D plotting, text data, TF-IDF |
15.6 | PCA, K-Means Clustering - Customer Segmentation | Customer survey clustering example with limited features. Variety of algorithms and evaluation metrics used. | ✔️ | K-means clustering, inertia, elbow, silhouette, and gap statistic methods, PCA, other clustering algorithms: (H)DBSCAN, AffinityPropagation, Spectral, Agglomerative |
16.2 | featuretools Automated Feature Engineering | Tutorial notebook for the featuretools package, customer churn prediction from a Kaggle dataset. | ✔️ | featuretools, dask, automated feature engineering, deep feature synthesis, custom primitives, selected primitives, churn prediction |
18.2 | Grid Search with kNN | Brief hyperparameter tuning example using nearest neighbors and random forest models (a grid search sketch follows this table). | ✔️ | nearest neighbors classification, hyperparameter tuning |
18.2 | Bayesian Optimization | Bayesian optimization (via the bayes_opt package) for hyperparameter tuning of LightGBM and CatBoost models. | ✔️ | bayes_opt, Bayesian optimization for hyperparameter tuning, CatBoost classification, LightGBM, feature encoding, transformation, and engineering |
20.3 | Storytelling | Choose a dataset, explore it, and build a narrative: NFL QB Draft Picks since 1990. | ✔️ | applied data science methodology with focus on data visualization and interpretation |
21.1 | Time Series Analysis, ARIMA model | Case study forecasting sales data using ARIMA. | ✔️ | Time series analysis, ARIMA models, decomposition: trend, seasonality, and noise, stationarity, KPSS, ARIMA scoring, parameters, forecasting |
25.2 | PySpark, Databricks Exercises | Examples of interacting with data and fitting models. | ✔️ | external link, Databricks, Spark, SparkSQL, Spark ML, pipelines |
27.2 | Take Home One | Three-part take-home challenge: time series, experiment design, and classification modeling. | ✔️ | demonstration of skills, EDA, DoE, modeling, hyperparameter tuning with RandomizedSearch grids, relative permutation feature importance, error analysis |
27.2 | Take Home Two | Classification modeling, data analysis and discussion. | ✔️ | demonstration of skills, gradient boosting classifiers (HistGB, LGBM, CatBoost, XGBoost) |
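
As a quick reference for the distance and similarity material in units 15.2 and 15.5 above, here is a minimal sketch using toy vectors; the values and variable names are illustrative assumptions and are not taken from the notebooks.

```python
# Minimal sketch (toy vectors, not from the notebooks): Euclidean vs Manhattan
# distance and cosine similarity with NumPy.
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 0.0, 3.0])

euclidean = np.sqrt(np.sum((a - b) ** 2))   # L2 / straight-line distance
manhattan = np.sum(np.abs(a - b))           # L1 / city-block distance
cosine = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))  # angle-based similarity

print(euclidean, manhattan, cosine)
```

The same quantities underpin the k-means and nearest-neighbors work elsewhere in the table.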
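
Likewise, a minimal sketch of the grid search workflow practiced in unit 18.2, shown here on scikit-learn's built-in iris dataset rather than the notebook's data; the parameter grid is an illustrative assumption.

```python
# Minimal sketch (assumed data and grid, not from the notebook): grid search
# over k-nearest-neighbors hyperparameters with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

param_grid = {"n_neighbors": [3, 5, 7, 9], "weights": ["uniform", "distance"]}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X, y)

# Best hyperparameter combination and its cross-validated accuracy
print(search.best_params_, search.best_score_)
```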
Unit | Name, link | Description | Status |
---|---|---|---|
7.1 | Project Proposal | Final PDF of proposal after discussion and approval. Project ideas not uploaded to repository folder. | ✔️ |
7.6 | Data Wrangling | Notebook containing initial data cleaning steps and descriptions. | ✔️ |
11.5 | Exploratory Data Analysis | Notebook containing initial data exploration steps and descriptions. | ✔️ |
16.3 | Pre-processing and Training | Notebook containing initial data pre-processing and model training steps and descriptions. | ✔️ |
18.3 | Modeling | Notebook containing initial modeling steps and descriptions. | ✔️ |
20.4 | Final Report | Final report for Capstone Two, with brief summary. | ✔️
20.4 | Final Model | Final model parameters and metrics for Capstone Two. | ✔️ |
20.4 | Final Presentation | Final slides for Capstone Two. | ✔️ |
Unit | Name, link | Description | Status |
---|---|---|---|
24.4.1 | Project Proposal | Final PDF of proposal after discussion and approval. Based on Kaggle PlantTraits competition. | ✔️ |
26.2.1 | Data Wrangling and EDA | Data wrangling and EDA notebook. | ✔️ |
28.1.1 | Pre-processing and Modeling | Notebook containing data pre-processing and model training. | ✔️ |
28.1.2 | Documentation | Final report for Capstone Three. | ✔️
28.1.3 | Presentation | Final slides for Capstone Three. | ✔️
- Notes, in progress
- Statistics
- Stat Book
  - Chapter summaries, mostly incomplete
- LI Learning
  - statistical inference, statistical modeling, Bayesian inference
- Stat Book
- Review Topics for Interviews
  - Python basics, SQL, various interview articles
- Machine Learning Units
  - supervised and unsupervised learning, feature engineering, applications
- DataCamp notebooks
  - supervised and unsupervised learning, feature engineering, time-series analysis, PySpark
  - brief list of completed courses
- Statistics
- External