
Add infer to Chapters 9-11 and links to DataCamp courses. Big additions to Chapters 8 and 12.

@ismayc released this 22 Jul 21:36
afba032

ModernDive 0.4.0

Highlights

  1. The infer package is ready for prime time, so we made a first pass at incorporating it into Chapters 9 and 10 on confidence intervals and hypothesis testing!
  2. Chapter 12 on "Thinking with Data" now includes a case study using the Seattle house prices dataset from Kaggle.com. Chapters 3 and 4 of the new "Modeling with Data in the Tidyverse" DataCamp course by Albert Y. Kim are based on this analysis!
  3. Speaking of DataCamp, we now point readers to DataCamp courses that directly align with specific chapters of the book!
  4. We significantly cleaned up Chapter 8 on sampling! In particular, we added a 2013 Obama approval rating poll example that ties in with our tactile and virtual sampling bowl simulations, and we made it very clear that ultimately we are performing statistical inference via sampling.

All content changes

  • Introduction: Added a section mapping chapters to the corresponding DataCamp courses. Links to the relevant DataCamp courses are also included at the start of each chapter.
  • Chapter 3 - Data visualization:
    • Added simplified geom_jitter() example
    • Added more explanation of how whiskers and outliers are constructed in geom_boxplot()
    • Added a summary table of all 5 named graphs
  • Chapter 4 - Tidy data:
    • Added section on importing Excel data via RStudio
    • Added a tidy vs. non-tidy example using fivethirtyeight::drinks
  • Chapter 5 - Data wrangling:
    • Added a data wrangling case study on computing available seat miles
    • Abandoned the "5 Main Verbs" (5MV) notion
    • Added coverage of the *_join() functions and of group_by() with multiple variables
  • Chapter 6 - Basic regression:
    • Clarified explanations of indicator/dummy variables when using a categorical variable in regression
    • Expanded the "Correlation is not necessarily causation" subsection with the example "Does sleeping with shoes on cause headaches?", including a causal diagram
    • Introduced the concept of a "wrapper function" when presenting the moderndive::get_regression_table() function
    • Replaced all base::summary() with skimr::skim() for quick numerical summaries
  • Chapter 7 - Multiple regression:
    • Replaced all "everything else being equal" interpretation statements with "taking into account/controlling for all other variables in our model"
  • Chapter 8 - Sampling:
    • Significantly cleaned up sampling terminology and definitions, and made it clearer that we are sampling for the purpose of inference
    • Reorganized the section and subsection structure:
      1. Tactile sampling simulation
      2. Virtual sampling simulation
      3. In real-life sampling: Introduced a 2013 Obama approval rating poll example and tied everything back to the sampling bowl
  • Major overhaul: Chapter 9 - Confidence intervals
    • With the infer package now ready for prime time, we made a first pass at incorporating it into the book
  • Major overhaul: Chapter 10 - Hypothesis testing
  • Chapter 11 - Inference for Regression
    • Added a simple linear regression example using the infer package
  • Major overhaul: Chapter 12 - Thinking with data
    • Added a case study of the Seattle house prices dataset from Kaggle, which is now available as the house_prices data frame in the moderndive package
      1. Chapters 3 and 4 of the new "Modeling with Data in the Tidyverse" DataCamp course are based on this analysis
      2. Includes a discussion on the importance of log10-transformations
      3. Introduces modeling/regression for prediction: predicting house prices
    • Laid out an outline for "effective data storytelling" using fivethirtyeight data and added one small example using US births data
    • At the beginning of the chapter, we now come full circle and revisit the discussion of the ModernDive flowchart from the introduction

Other changes

  • Updated the moderndive package on CRAN to version 0.2.0. See NEWS.md.