- Alpine Data Labs http://alpinedatalabs.com - primary project manager
- Pivotal Inc. http://gopivotal.com - primary sponsor
- Nitin Borwankar http://twitter.com/nitin - primary developer
- A collection of Open Data Science Training lessons in the form of IPython Notebooks.
- Associated data sets.
The initial beta release consists of four major topics:
- Linear Regression
- Logistic Regression
- Random Forests
- K-Means Clustering
Each of the above has at least three IPython Notebooks covering:
- Overview (an exposition of the technique for the math-wary)
- Data Exploration (the nuts and bolts of real world data wrangling)
- Analysis (using the technique to get results)
One or more of these may have supplementary material. Each of these has worksheets that contain mostly the code sections, so you can explore iteratively.
Three openly available data sets are used.
- For Linear and Logistic Regression we use a data set on loans and interest rates provided by Lending Club http://lendingclub.com
- For Random Forests we use a data set of Android accelerometer and gyroscope readings used to predict body position and motion from the Human Activity Recognition project http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones
- For K-Means Clustering we use UN data on economic indicators of countries
There is a need for open content that raises awareness of, and training in, the basics of the Data Science field (circa early 2013).
IPython Notebook provides an appropriate platform for rapid iterative exploration and learning.
Starting in 2013 and intended to extend for a long while.
Today GitHub, tomorrow the world.
- A0. How to use this content.ipynb
- A1. Linear Regression - Overview.ipynb
- A2. Linear Regression - Data Exploration - Lending Club.ipynb
- A3. Linear Regression - Analysis.ipynb
- B1. Logistic Regression - Overview.ipynb
- B1a. Odds, LogOdds and Logit Function .ipynb
- B2. Logistic Regression - Data Exploration.ipynb
- B3. Logistic Regression - Analysis.ipynb
- C1. Random Forests - Overview.ipynb
- C2. Random Forests - Data Exploration.ipynb
- C3. Random Forests - Analysis.ipynb
- D1. K-Means Clustering - Overview.ipynb
- D2. K-Means Clustering - Data Exploration.ipynb
- D3. K-Means Clustering Analysis.ipynb
- WA1. Linear Regression Overview Worksheet.ipynb
- WA2. Linear Regression - Data Exploration - Lending Club Worksheet.ipynb
- WA3. Linear Regression - Analysis Worksheet.ipynb
- WB3. Logistic Regression - Analysis- Worksheet.ipynb
- WC3. Random Forests - Analysis - Worksheet.ipynb
- WD2. K-Means Clustering - Data Exploration-Worksheet.ipynb
- WD3. K-Means Clustering Analysis - Worksheet.ipynb
- Z0. A quick tour of the IPython notebook.ipynb
- Z1. Appendix 1 Plotting code snippets.ipynb
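
The file names above appear to follow a prefix convention: a topic letter (A to D, with Z for appendices), a step number, and a leading W for worksheets. The sketch below makes that convention explicit. It is only an illustration: the letter-to-topic mapping, the `Getting Started` label for `A0`, and the helper names `classify` and `group_by_topic` are assumptions inferred from the listing, not part of the project.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed mapping, inferred from the notebook listing (not defined by the project).
TOPICS = {
    "A": "Linear Regression",
    "B": "Logistic Regression",
    "C": "Random Forests",
    "D": "K-Means Clustering",
    "Z": "Appendix",
}

# A0 is the "How to use this content" notebook, so it gets its own (hypothetical) label.
SPECIAL = {"A0": "Getting Started"}

# Optional "W" (worksheet), topic letter, step number, optional sub-letter (e.g. "B1a").
PREFIX = re.compile(r"^(W?)([A-DZ])(\d+)([a-z]?)\.\s")


def classify(filename):
    """Return (topic, step, is_worksheet) for a notebook file name, or None."""
    match = PREFIX.match(filename)
    if match is None:
        return None
    worksheet, letter, number, sub = match.groups()
    topic = SPECIAL.get(letter + number, TOPICS[letter])
    return topic, number + sub, worksheet == "W"


def group_by_topic(filenames):
    """Group notebook file names by topic, skipping names that do not match."""
    groups = defaultdict(list)
    for name in filenames:
        info = classify(name)
        if info is not None:
            groups[info[0]].append(name)
    return dict(groups)


if __name__ == "__main__":
    # Assumes the notebooks sit in the current working directory.
    notebooks = sorted(p.name for p in Path(".").glob("*.ipynb"))
    for topic, names in group_by_topic(notebooks).items():
        print(topic)
        for name in names:
            print("   ", name)
```

Run from the directory that holds the notebooks, this prints each topic followed by its lessons and worksheets; files that do not match the assumed prefix pattern are skipped.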