Major Overhaul of Data Science Topics

@rudeboybert rudeboybert released this 02 Aug 20:53
· 2222 commits to master since this release

Content changes

  • Incorporated feedback from consultations with Prof. Yana Weinstein, cognitive psychological scientist and co-founder of The Learning Scientists.
  • Restructured/revamped chapters
    • Chapter 1: Introduction
    • Chapter 2: Getting Started. New chapter meant for new R users/coders, including:
      • Discussions on R vs RStudio and how to install both (with support videos)
      • A "How do I code in R?" section with links to DataCamp.com courses that cover the console, data types, vectors, factors, data frames, Boolean operators, functions, etc.
      • Thorough discussion on R packages
      • An end-to-end starter example analysis of the data frames in the nycflights13 package using the console, View(), glimpse() etc.
    • Chapter 3: Data Visualization via ggplot2 is now the first non-intro chapter.
      • Replaced Minard's "Napoleon's March on Moscow" with Hans Rosling's (RIP) "Gapminder" plots as the introductory example to the Grammar of Graphics.
      • Added geom_col() for making barcharts when data is pre-tabulated, instead of using geom_bar(stat="identity")
    • Chapter 4: Tidy Data via tidyr bumped back. Added sections on converting from wide to long/tidy format and importing CSV files.
    • Chapter 5: Data Wrangling via dplyr (renamed from Data Manipulation)
    • Chapter 6: Data Modeling using Regression via broom. Moved up from the end of the book given its pedagogical importance. Added notes on viewing regression in a prediction framework.
    • Chapters 7-9: Sampling, Hypothesis Testing, Confidence Intervals. Mostly unchanged for now; see the pending changes section below.
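
As a rough illustration of the geom_col() change noted for Chapter 3, here is a minimal sketch. It assumes a small made-up data frame of pre-tabulated counts; the name `carrier_counts` and its values are hypothetical and not taken from the book. Under that assumption, the two calls should draw the same bar chart:

```r
library(ggplot2)

# Hypothetical pre-tabulated counts (illustrative values, not from the book)
carrier_counts <- data.frame(
  carrier = c("AA", "DL", "UA"),
  n = c(120, 95, 143)
)

# geom_col() uses the y aesthetic directly as the bar height
p_col <- ggplot(carrier_counts, aes(x = carrier, y = n)) +
  geom_col()

# Older equivalent: geom_bar() with the counting step switched off
p_bar <- ggplot(carrier_counts, aes(x = carrier, y = n)) +
  geom_bar(stat = "identity")
```

When the data are not yet tabulated (one row per observation), plain geom_bar() with only an x aesthetic does the counting itself.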

Technical changes

  • Book is now hosted on ModernDive.com
  • Development version now on original ModernDive site https://ismayc.github.io/moderndiver-book/
  • Added links to digital copies and source code of all past versions of ModernDive in Chapter 1.
  • Cut build/compilation time of book from ~20 minutes to ~1 minute
  • Disabled gitbook PDF output

Pending changes for next version

  • Chapter 6: Data Modeling using Regression via broom
    • Better treatment of experimental design and its effect on bias/causation than currently exists in the chapter.
    • Examples of regression with categorical predictors with 3 or more levels.
    • Multiple regression, in particular the following predictor scenarios: 2 numerical, 2 categorical, and 1 numerical + 1 categorical
    • Interaction effects
  • Chapters 7-9: Sampling, Hypothesis Testing, Confidence Intervals have largely not been updated, pending development of infer, a tidyverse-friendly R package for statistical inference