This is the main repository for my coursework from the Coursera Data Science Specialization by Johns Hopkins University.
- Roger D. Peng, PhD, Associate Professor, Biostatistics
- Brian Caffo, PhD, Professor, Biostatistics
- Jeff Leek, PhD, Associate Professor, Biostatistics
- Course 1: The Data Scientist's Toolbox
- Course 2: R Programming
- Course 3: Getting and Cleaning Data
- Course 4: Exploratory Data Analysis
- Learn from Roger Peng and the other instructors; understand the Johns Hopkins perspective on data science
- Understand the R community within healthcare and biostatistics
- Learn R as a language and understand its tooling and dependencies
- Perform a literature search as it applies to healthcare use cases that use R for publishing research results
- Course 3: Getting and Cleaning Data (analyzing the UCI Human Activity Recognition (HAR) Data Set)
- During this project, I felt most productive and synthesized multiple concepts and skills. It also felt closer to real-world work, since it required more domain analysis and data wrangling.
- R is a domain-specific language (DSL) that most applied statisticians use from the top down; the bottom-up approach would be Python (NumPy, SciPy, etc.)
- R is easy to learn; it is mostly procedural and assignment-based
- Data frames, RShiny, etc. are nice to use
- The OO/module paradigm is complex, with too many ways of doing the same thing
- For complex data science projects that require pipelines
- RStudio has a community (i.e., heavy backing from Microsoft)