Submissions for Springboard Data Science curriculum.

NBPub/Springboard

Springboard Data Science Submissions

Submissions for COMPLETED ✔️ Springboard Data Science curriculum.

Submissions

All assignments and submissions practice the data science methodology.

| Unit | Name | Description | Status | Skills |
| --- | --- | --- | --- | --- |
| 4.3 | London Calling! | London Housing Case Study | ✔️ | pandas, matplotlib, seaborn, Python data types |
| 6 | Guided Capstone | First capstone: Data Science Method (DSM) exercise, contained in an external repository. Seven submissions in total. | ✔️ | statistics review, scikit-learn, data wrangling/exploration/visualization, PCA, cross-validation, regression, model selection, RandomForest, hyperparameter tuning, data quantity assessment |
| 7.2 | API Mini-Project | Last assignment for the Data Wrangling portion (7.2) of the DSM. | ✔️ | RESTful APIs, JSON, requests, financial calculations |
| 7.4 | NASA Meteorites Report | Notebook showing how to use ydata-profiling to generate reports from DataFrames; part of 7.4 - Data Definitions. Example HTML reports are included in the folder. | ✔️ | automated EDA, NumPy |
| 8.3 | SQL Case Study | SQL wrap-up using country club facility, booking, and member data. Springboard's phpMyAdmin server had issues. | ✔️ | SQL, SQLite3, joins, filters, etc. |
| 11.1 | Frequentist Inference Part A, Part B | Statistical inference exercises; introduction to the SciPy package. | ✔️✔️ | SciPy, statistical tests and parameters, confidence intervals, population distributions and sampling, Central Limit Theorem |
| 11.3 | Hypothesis Testing - Integrating Apps | Case study: were user reviews for X better than those for Y? | ✔️ | data cleaning, null and alternative hypotheses, permutation tests, tqdm |
| 11.4 | Linear Regression - Red Wine | Case study using a Kaggle wine dataset for regression practice. | ✔️ | EDA, correlation, train/test splits, statsmodels, linear regression (ordinary least squares, multiple, weighted), multicollinearity, feature selection |
| 11.4 | Linear Regression - Boston Housing | Mini-project: predict house prices using OLS regression. | ✔️ | EDA, linear regression, interpretation of model coefficients, error analysis, coefficient of determination, model comparisons, information criteria, F-statistic, Q-Q plots, influence plots, outlier analysis, ethical data (see scikit-learn's `load_boston`) |
| 14.2 | Logistic Regression | Case study using healthcare patient data to predict heart disease, with a brief discussion of model tuning. Extra notes added to the Confusion Matrix / Precision / Recall section. | ✔️ | logistic regression, wrangling, EDA, preprocessing, categorical feature encoding, training, confusion matrix, precision, recall, accuracy, hyperparameter tuning, GridSearchCV, discriminative vs. generative models |
| 14.3 | Decision Trees - RR Diner Coffee | Use a customer survey to predict whether other customers will buy a new coffee. | ✔️ | dummy encoding, decision tree classifiers, decision tree hyperparameters, bagging classifier, RandomForest, classification metrics |
| 14.4 | Random Forest | Random Forest overview and discussion, with a basic classification demonstration on patient data. | ✔️ | Graphviz, RandomForest, ExtraTrees, data imputation, data scaling and normalization, model feature importances |
| 14.5 | Gradient Boosting | Basic gradient boosting demonstrations: curve fitting and the Titanic survivors dataset. | ✔️ | gradient boosting, regression and classification, ROC-AUC, GradientBoosting tuning |
| 15.2 | Calculating Distances | Visual demonstrations of Euclidean vs. Manhattan distance calculations. | ✔️ | distance metrics and calculation, Euclidean, Manhattan, PCA |
| 15.5 | Cosine Similarity | Brief example of the calculation using numerical and text data (with preprocessing steps). | ✔️ | cosine similarity, 3D plotting, text data, TF-IDF |
| 15.6 | PCA, K-Means Clustering - Customer Segmentation | Customer survey clustering example with limited features, using a variety of algorithms and evaluation metrics. | ✔️ | K-means clustering, inertia, elbow, silhouette, and gap statistic methods, PCA, other clustering algorithms: (H)DBSCAN, AffinityPropagation, Spectral, Agglomerative |
| 16.2 | featuretools Automated Feature Engineering | Tutorial notebook for the featuretools package: customer churn prediction on a Kaggle dataset. | ✔️ | featuretools, dask, automated feature engineering, deep feature synthesis, custom primitives, selected primitives, churn prediction |
| 18.2 | Grid Search with kNN | Brief hyperparameter tuning example using nearest neighbors and random forest models. | ✔️ | nearest neighbors classification, hyperparameter tuning |
| 18.2 | Bayesian Optimization | Bayesian optimization (bayes_opt package) for hyperparameter tuning of LightGBM and CatBoost models. | ✔️ | bayes_opt, Bayesian optimization for hyperparameter tuning, CatBoost classification, LightGBM, feature encoding, transformation, and engineering |
| 20.3 | Storytelling | Choose a dataset, explore it, and build a narrative: NFL QB draft picks since 1990. | ✔️ | applied data science methodology with a focus on data visualization and interpretation |
| 21.1 | Time Series Analysis, ARIMA model | Case study forecasting sales data using ARIMA. | ✔️ | time series analysis, ARIMA models, decomposition (trend, seasonality, and noise), stationarity, KPSS, ARIMA scoring, parameters, forecasting |
| 25.2 | PySpark, Databricks | Example exercises interacting with data and fitting models (external link). | ✔️ | Databricks, Spark, SparkSQL, Spark ML, pipelines |
| 27.2 | Take Home One | Three-part take-home challenge: time series, experiment design, and classification modeling. | ✔️ | demonstration of skills, EDA, DoE, modeling, hyperparameter tuning with RandomizedSearch grids, relative permutation feature importance, error analysis |
| 27.2 | Take Home Two | Classification modeling, data analysis, and discussion. | ✔️ | demonstration of skills, gradient boosting classifiers (HistGB, LGBM, CatBoost, XGBoost) |
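
The 8.3 SQL case study practices joins and filters on country club facility, booking, and member data. As a minimal sketch of that pattern (not the case study's actual queries), the snippet below uses Python's built-in `sqlite3` module with an in-memory database. The three tables, their columns, and all rows are hypothetical and simplified for illustration.

```python
import sqlite3

# Hypothetical, simplified country club schema (not the case study's real tables)
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE members    (memid INTEGER PRIMARY KEY, surname TEXT, firstname TEXT);
CREATE TABLE facilities (facid INTEGER PRIMARY KEY, name TEXT, membercost REAL);
CREATE TABLE bookings   (bookid INTEGER PRIMARY KEY, facid INTEGER, memid INTEGER, slots INTEGER);
INSERT INTO members    VALUES (1, 'Smith', 'Darren'), (2, 'Jones', 'Ann');
INSERT INTO facilities VALUES (0, 'Tennis Court 1', 5.0), (1, 'Massage Room 1', 9.9);
INSERT INTO bookings   VALUES (1, 0, 1, 2), (2, 1, 1, 1), (3, 0, 2, 3);
""")

# Join bookings to members and facilities, filter on the facility name,
# then aggregate the number of booked slots per member.
query = """
SELECT m.firstname || ' ' || m.surname AS member,
       SUM(b.slots) AS total_slots
FROM bookings AS b
JOIN members AS m    ON b.memid = m.memid
JOIN facilities AS f ON b.facid = f.facid
WHERE f.name LIKE 'Tennis%'
GROUP BY m.memid
ORDER BY total_slots DESC;
"""
results = cur.execute(query).fetchall()
for row in results:
    print(row)
conn.close()
```

With these made-up rows, the query prints `('Ann Jones', 3)` followed by `('Darren Smith', 2)`: only tennis court bookings survive the `WHERE` filter, and `GROUP BY` sums slots per member.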
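
Unit 11.3 uses permutation tests to compare user reviews for two apps. Below is a minimal, standard-library-only sketch of a two-sided permutation test for a difference in means. The function name, its parameters, and the ratings are hypothetical; the notebook itself may use a different test statistic or NumPy-based resampling.

```python
import random
from statistics import mean


def permutation_test(x, y, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means (hypothetical helper).

    Under the null hypothesis the group labels are exchangeable, so the pooled
    data are shuffled, re-split into groups of the original sizes, and we count
    how often the shuffled |mean difference| is at least the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(x) - mean(y))
    pooled = list(x) + list(y)
    n_x = len(x)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_x]) - mean(pooled[n_x:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical ratings for two apps, X and Y
ratings_x = [4.5, 4.0, 4.8, 4.2, 4.6, 4.4]
ratings_y = [3.9, 4.1, 3.7, 4.0, 3.8, 4.2]
obs, p = permutation_test(ratings_x, ratings_y)
print(f"observed difference = {obs:.3f}, p-value = {p:.4f}")
```

Seeding `random.Random` makes the estimated p-value reproducible. The add-one correction is a common convention that avoids reporting a p-value of exactly zero from a finite number of shuffles.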
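
Unit 15.2 contrasts Euclidean and Manhattan distances. As this minimal sketch shows (plain Python, not the notebook's code), the two metrics differ only in how per-coordinate differences are combined. Euclidean distance squares the differences, sums them, and takes a square root. Manhattan distance sums their absolute values.

```python
import math


def euclidean(p, q):
    """Straight-line (L2) distance between two points of equal dimension."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    """City-block (L1) distance: sum of absolute coordinate differences."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    return sum(abs(a - b) for a, b in zip(p, q))


p, q = (0, 0), (3, 4)
print(euclidean(p, q))  # 5.0
print(manhattan(p, q))  # 7
```

For the points (0, 0) and (3, 4), the Euclidean distance is 5.0 and the Manhattan distance is 7, because Manhattan distance can only move along the axes. For real data, `math.dist` (Python 3.8+) and SciPy's `scipy.spatial.distance.cityblock` provide the same calculations.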
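
Unit 15.5 computes cosine similarity on numerical and text data. Below is a minimal sketch of the formula cos θ = (u · v) / (‖u‖ ‖v‖). To keep it simple, the text example uses raw term counts after lowercasing and splitting on whitespace. This is a simplified stand-in for the TF-IDF weighting listed for that unit, not an implementation of it. The helper names and documents are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(u, v):
    """cos(theta) = (u . v) / (||u|| * ||v||) for equal-length numeric vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def text_to_counts(text, vocabulary):
    """Very simple preprocessing: lowercase, split on whitespace, count terms."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]


# Numerical example: parallel vectors give a similarity close to 1.0
print(cosine_similarity([1, 2, 3], [2, 4, 6]))

# Text example with raw term counts (TF-IDF would reweight these counts)
docs = ["The cat sat", "the cat ran"]
vocab = sorted(set(" ".join(docs).lower().split()))
vectors = [text_to_counts(d, vocab) for d in docs]
print(cosine_similarity(*vectors))  # about 0.667
```

Because cosine similarity depends only on the angle between vectors, scaling a vector (as in `[1, 2, 3]` vs. `[2, 4, 6]`) does not change the result, which is why it is often preferred over raw distances for comparing documents of different lengths.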

| Unit | Name | Description | Status |
| --- | --- | --- | --- |
| 7.1 | Project Proposal | Final PDF of the proposal after discussion and approval. Project ideas were not uploaded to the repository folder. | ✔️ |
| 7.6 | Data Wrangling | Notebook containing initial data cleaning steps and descriptions. | ✔️ |
| 11.5 | Exploratory Data Analysis | Notebook containing initial data exploration steps and descriptions. | ✔️ |
| 16.3 | Pre-processing and Training | Notebook containing initial data pre-processing and model training steps and descriptions. | ✔️ |
| 18.3 | Modeling | Notebook containing initial modeling steps and descriptions. | ✔️ |
| 20.4 | Final Report | Final report for Capstone Two. Brief summary. | ✔️ |
| 20.4 | Final Model | Final model parameters and metrics for Capstone Two. | ✔️ |
| 20.4 | Final Presentation | Final slides for Capstone Two. | ✔️ |

| Unit | Name | Description | Status |
| --- | --- | --- | --- |
| 24.4.1 | Project Proposal | Final PDF of the proposal after discussion and approval. Based on the Kaggle PlantTraits competition. | ✔️ |
| 26.2.1 | Data Wrangling and EDA | Data wrangling and EDA notebook. | ✔️ |
| 28.1.1 | Pre-processing and Modeling | Notebook containing data pre-processing and model training. | ✔️ |
| 28.1.2 | Documentation | Final report for Capstone Three. | ✔️ |
| 28.1.3 | Presentation | Final slides for Capstone Three. | ✔️ |

Other
