Merge branch 'master' of http://github.com/ismayc/moderndiver-book

# Conflicts:
#	bib/packages.bib
#	docs/3-tidy.html
#	docs/7-hypo.html
#	docs/8-ci.html
#	docs/index.html
#	docs/ismaykim.pdf
#	docs/ismaykim.tex
#	docs/ismaykim_files/figure-html/jitter-1.png
#	docs/search_index.json
ismayc committed Feb 9, 2017
2 parents 1e05d4f + 5223f11 commit a5cec6c
Showing 28 changed files with 1,125 additions and 176 deletions.
2 changes: 1 addition & 1 deletion 03-tidy.Rmd
@@ -213,7 +213,7 @@ As can be seen here when you just enter the name of an object in R, by default i
**_Learning check_**
```

**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** Run the following block of code in R to load and view each of the four data frames in the `nycflights13` package. Switch between the different tabs that have opened to view each of the four data frames. Describe in two sentences for each data frame what stands out to you and what the most important features are of each.
**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** Run the following block of code in RStudio to load and view each of the four data frames in the `nycflights13` package. Switch between the different tabs that have opened to view each of the four data frames. Describe in two sentences for each data frame what stands out to you and what the most important features are of each.

```{r eval=FALSE}
data(weather)
19 changes: 12 additions & 7 deletions bib/packages.bib
@@ -23,9 +23,9 @@ @Manual{R-broom
@Manual{R-devtools,
title = {devtools: Tools to Make Developing R Packages Easier},
author = {Hadley Wickham and Winston Chang},
note = {R package version 1.12.0.9000},
url = {https://github.com/hadley/devtools},
year = {2016},
note = {R package version 1.12.0},
url = {https://CRAN.R-project.org/package=devtools},
}
@Manual{R-dplyr,
title = {dplyr: A Grammar of Data Manipulation},
@@ -37,8 +37,8 @@ @Manual{R-dplyr
@Manual{R-dygraphs,
title = {dygraphs: Interface to 'Dygraphs' Interactive Time Series Charting Library},
author = {Dan Vanderkam and JJ Allaire and Jonathan Owen and Daniel Gromer and Petr Shevtsov and Benoit Thieurmel},
year = {2017},
note = {R package version 1.1.1.4},
year = {2016},
note = {R package version 1.1.1.3},
url = {https://CRAN.R-project.org/package=dygraphs},
}
@Manual{R-fivethirtyeight,
@@ -52,7 +52,7 @@ @Manual{R-ggplot2
title = {ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics},
author = {Hadley Wickham and Winston Chang},
year = {2016},
note = {R package version 2.2.1},
note = {R package version 2.2.0},
url = {https://CRAN.R-project.org/package=ggplot2},
}
@Manual{R-ggplot2movies,
@@ -107,8 +107,13 @@ @Manual{R-mvtnorm
@Manual{R-nycflights13,
title = {nycflights13: Flights that Departed NYC in 2013},
author = {Hadley Wickham},
<<<<<<< HEAD
year = {2017},
note = {R package version 0.2.2},
=======
year = {2016},
note = {R package version 0.2.0},
>>>>>>> 5223f11c169d8e511baaf89c512ab1e07fdad40e
url = {https://CRAN.R-project.org/package=nycflights13},
}
@Manual{R-okcupiddata,
@@ -130,7 +135,7 @@ @Manual{R-rmarkdown
title = {rmarkdown: Dynamic Documents for R},
author = {JJ Allaire and Joe Cheng and Yihui Xie and Jonathan McPherson and Winston Chang and Jeff Allen and Hadley Wickham and Aron Atkins and Rob Hyndman},
year = {2016},
note = {R package version 1.3},
note = {R package version 1.2},
url = {https://CRAN.R-project.org/package=rmarkdown},
}
@Manual{R-tibble,
@@ -151,6 +156,6 @@ @Manual{R-webshot
title = {webshot: Take Screenshots of Web Pages},
author = {Winston Chang},
year = {2016},
note = {R package version 0.4.0},
note = {R package version 0.3.2},
url = {https://CRAN.R-project.org/package=webshot},
}
3 changes: 1 addition & 2 deletions docs/10-effective-data-storytelling.html
@@ -54,8 +54,7 @@
<script src="libs/dygraphs-1.1.1/dygraph-combined.js"></script>
<script src="libs/moment-2.8.4/moment.js"></script>
<script src="libs/moment-timezone-0.2.5/moment-timezone-with-data.js"></script>
<script src="libs/moment-fquarter-1.0.0/moment-fquarter.min.js"></script>
<script src="libs/dygraphs-binding-1.1.1.4/dygraphs.js"></script>
<script src="libs/dygraphs-binding-1.1.1.3/dygraphs.js"></script>
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
3 changes: 1 addition & 2 deletions docs/2-intro.html
@@ -54,8 +54,7 @@
<script src="libs/dygraphs-1.1.1/dygraph-combined.js"></script>
<script src="libs/moment-2.8.4/moment.js"></script>
<script src="libs/moment-timezone-0.2.5/moment-timezone-with-data.js"></script>
<script src="libs/moment-fquarter-1.0.0/moment-fquarter.min.js"></script>
<script src="libs/dygraphs-binding-1.1.1.4/dygraphs.js"></script>
<script src="libs/dygraphs-binding-1.1.1.3/dygraphs.js"></script>
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
70 changes: 57 additions & 13 deletions docs/3-tidy.html
@@ -54,8 +54,7 @@
<script src="libs/dygraphs-1.1.1/dygraph-combined.js"></script>
<script src="libs/moment-2.8.4/moment.js"></script>
<script src="libs/moment-timezone-0.2.5/moment-timezone-with-data.js"></script>
<script src="libs/moment-fquarter-1.0.0/moment-fquarter.min.js"></script>
<script src="libs/dygraphs-binding-1.1.1.4/dygraphs.js"></script>
<script src="libs/dygraphs-binding-1.1.1.3/dygraphs.js"></script>
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
@@ -404,8 +403,20 @@ <h1><span class="header-section-number">3</span> Tidy Data</h1>
<h3>Needed packages</h3>
<p>At the beginning of this and all subsequent chapters, we’ll always have a list of packages you should have installed and loaded. In particular we load the <code>nycflights13</code> package which we’ll discuss shortly and the <code>dplyr</code> package for data manipulation, the subject of Chapter <a href="5-manip.html#manip">5</a>.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">library</span>(nycflights13)
<<<<<<< HEAD
<span class="kw">library</span>(dplyr)
<span class="kw">library</span>(tibble)</code></pre></div>
=======
<span class="kw">library</span>(dplyr)</code></pre></div>
<pre><code>##
## Attaching package: &#39;dplyr&#39;</code></pre>
<pre><code>## The following objects are masked from &#39;package:stats&#39;:
##
## filter, lag</code></pre>
<pre><code>## The following objects are masked from &#39;package:base&#39;:
##
## intersect, setdiff, setequal, union</code></pre>
>>>>>>> 5223f11c169d8e511baaf89c512ab1e07fdad40e
<!--Subsection on Tidy Data -->
</div>
<div id="what-is-tidy-data" class="section level2">
@@ -464,6 +475,7 @@ <h2><span class="header-section-number">3.1</span> What is tidy data?</h2>
<tbody>
<tr class="odd">
<td align="left">2009-01-01</td>
<<<<<<< HEAD
<td align="right">0.884</td>
<td align="right">-1.016</td>
<td align="right">-1.537</td>
@@ -491,6 +503,35 @@ <h2><span class="header-section-number">3.1</span> What is tidy data?</h2>
<td align="right">-0.784</td>
<td align="right">2.422</td>
<td align="right">5.535</td>
=======
<td align="right">-0.431</td>
<td align="right">-1.670</td>
<td align="right">-5.342</td>
</tr>
<tr class="even">
<td align="left">2009-01-02</td>
<td align="right">-1.302</td>
<td align="right">0.636</td>
<td align="right">4.286</td>
</tr>
<tr class="odd">
<td align="left">2009-01-03</td>
<td align="right">-1.053</td>
<td align="right">1.197</td>
<td align="right">6.420</td>
</tr>
<tr class="even">
<td align="left">2009-01-04</td>
<td align="right">-0.452</td>
<td align="right">2.654</td>
<td align="right">-2.260</td>
</tr>
<tr class="odd">
<td align="left">2009-01-05</td>
<td align="right">-0.146</td>
<td align="right">0.280</td>
<td align="right">-6.077</td>
>>>>>>> 5223f11c169d8e511baaf89c512ab1e07fdad40e
</tr>
</tbody>
</table>
@@ -625,7 +666,7 @@ <h2><span class="header-section-number">3.3</span> How is <code>flights</code> t
<strong><em>Learning check</em></strong>
</p>
</div>
<p><strong>(LC3.8)</strong> Run the following block of code in R to load and view each of the four data frames in the <code>nycflights13</code> package. Switch between the different tabs that have opened to view each of the four data frames. Describe in two sentences for each data frame what stands out to you and what the most important features are of each.</p>
<p><strong>(LC3.8)</strong> Run the following block of code in RStudio to load and view each of the four data frames in the <code>nycflights13</code> package. Switch between the different tabs that have opened to view each of the four data frames. Describe in two sentences for each data frame what stands out to you and what the most important features are of each.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">data</span>(weather)
<span class="kw">data</span>(planes)
<span class="kw">data</span>(airports)
@@ -639,16 +680,15 @@ <h2><span class="header-section-number">3.3</span> How is <code>flights</code> t
<h3><span class="header-section-number">3.3.1</span> Identification variables</h3>
<p>There is a subtle difference between the kinds of variables that you will encounter in data frames. The <code>airports</code> data frame you worked with above contains data in these different kinds. Let’s pull them apart using the <code>glimpse</code> function:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">glimpse</span>(airports)</code></pre></div>
<pre><code>## Observations: 1,458
## Variables: 8
## $ faa &lt;chr&gt; &quot;04G&quot;, &quot;06A&quot;, &quot;06C&quot;, &quot;06N&quot;, &quot;09J&quot;, &quot;0A9&quot;, &quot;0G6&quot;, &quot;0G7...
## $ name &lt;chr&gt; &quot;Lansdowne Airport&quot;, &quot;Moton Field Municipal Airport&quot;,...
## $ lat &lt;dbl&gt; 41.13, 32.46, 41.99, 41.43, 31.07, 36.37, 41.47, 42.8...
## $ lon &lt;dbl&gt; -80.62, -85.68, -88.10, -74.39, -81.43, -82.17, -84.5...
## $ alt &lt;int&gt; 1044, 264, 801, 523, 11, 1593, 730, 492, 1000, 108, 4...
## $ tz &lt;dbl&gt; -5, -6, -6, -5, -5, -5, -5, -5, -5, -8, -5, -6, -5, -...
## $ dst &lt;chr&gt; &quot;A&quot;, &quot;A&quot;, &quot;A&quot;, &quot;A&quot;, &quot;A&quot;, &quot;A&quot;, &quot;A&quot;, &quot;A&quot;, &quot;U&quot;, &quot;A&quot;, &quot;A&quot;...
## $ tzone &lt;chr&gt; &quot;America/New_York&quot;, &quot;America/Chicago&quot;, &quot;America/Chica...</code></pre>
<pre><code>## Observations: 1,396
## Variables: 7
## $ faa &lt;chr&gt; &quot;04G&quot;, &quot;06A&quot;, &quot;06C&quot;, &quot;06N&quot;, &quot;09J&quot;, &quot;0A9&quot;, &quot;0G6&quot;, &quot;0G7&quot;...
## $ name &lt;chr&gt; &quot;Lansdowne Airport&quot;, &quot;Moton Field Municipal Airport&quot;, ...
## $ lat &lt;dbl&gt; 41.13047, 32.46057, 41.98934, 41.43191, 31.07447, 36.3...
## $ lon &lt;dbl&gt; -80.61958, -85.68003, -88.10124, -74.39156, -81.42778,...
## $ alt &lt;int&gt; 1044, 264, 801, 523, 11, 1593, 730, 492, 1000, 108, 40...
## $ tz &lt;dbl&gt; -5, -5, -6, -5, -4, -4, -5, -5, -5, -8, -5, -6, -5, -4...
## $ dst &lt;chr&gt; &quot;A&quot;, &quot;A&quot;, &quot;A&quot;, &quot;A&quot;, &quot;A&quot;, &quot;A&quot;, &quot;A&quot;, &quot;A&quot;, &quot;U&quot;, &quot;A&quot;, &quot;A&quot;,...</code></pre>
<p>The variables <code>faa</code> and <code>name</code> are what we will call <em>identification variables</em>. They are mainly used to provide a name to the observational unit. Here the observational unit is an airport and the <code>faa</code> gives the code provided by the FAA for that airport while the <code>name</code> variable gives the longer more natural name of the airport. These ID variables differ from the other variables that are often called <em>measurement</em> or <em>characteristic</em> variables. The remaining variables (aside from <code>faa</code> and <code>name</code>) are of this type in <code>airports</code>. They don’t uniquely identify the observational unit, but instead describe properties of the observational unit. For organizational purposes, it is best practice to have your identification variables in the far leftmost columns of your data frame.</p>
<hr />
<div class="learncheck">
@@ -800,7 +840,11 @@ <h2><span class="header-section-number">3.4</span> Normal forms of data</h2>
<strong><em>Review questions</em></strong>
</p>
</div>
<<<<<<< HEAD
<p>Review questions have been designed using the <code>fivethirtyeight</code> R package <span class="citation">(Ismay and Chunn <a href="#ref-R-fivethirtyeight">2016</a>)</span> with links to the corresponding FiveThirtyEight.com articles in our free DataCamp course <strong>Effective Data Storytelling using the <code>tidyverse</code></strong>. The material in this chapter is covered in the <strong>Tidy Data</strong> chapter of the DataCamp course available <a href="https://campus.datacamp.com/courses/effective-data-storytelling-using-the-tidyverse/tidy-data">here</a>.</p>
=======
<p><strong>(RQ3.1)</strong></p>
>>>>>>> 5223f11c169d8e511baaf89c512ab1e07fdad40e
<hr />
<hr />
</div>
Empty file added docs/3-viz.html
Empty file.
Empty file added docs/4-manip.html
Empty file.
3 changes: 1 addition & 2 deletions docs/4-viz.html
@@ -54,8 +54,7 @@
<script src="libs/dygraphs-1.1.1/dygraph-combined.js"></script>
<script src="libs/moment-2.8.4/moment.js"></script>
<script src="libs/moment-timezone-0.2.5/moment-timezone-with-data.js"></script>
<script src="libs/moment-fquarter-1.0.0/moment-fquarter.min.js"></script>
<script src="libs/dygraphs-binding-1.1.1.4/dygraphs.js"></script>
<script src="libs/dygraphs-binding-1.1.1.3/dygraphs.js"></script>
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
Empty file added docs/5-hypo.html
Empty file.
45 changes: 22 additions & 23 deletions docs/5-manip.html
@@ -54,8 +54,7 @@
<script src="libs/dygraphs-1.1.1/dygraph-combined.js"></script>
<script src="libs/moment-2.8.4/moment.js"></script>
<script src="libs/moment-timezone-0.2.5/moment-timezone-with-data.js"></script>
<script src="libs/moment-fquarter-1.0.0/moment-fquarter.min.js"></script>
<script src="libs/dygraphs-binding-1.1.1.4/dygraphs.js"></script>
<script src="libs/dygraphs-binding-1.1.1.3/dygraphs.js"></script>
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
@@ -524,12 +523,12 @@ <h3><span class="header-section-number">5.2.2</span> 5MV#2: Summarize variables
<span class="st"> </span><span class="kw">summarize</span>(<span class="dt">mean =</span> <span class="kw">mean</span>(temp, <span class="dt">na.rm =</span> <span class="ot">TRUE</span>), <span class="dt">std_dev =</span> <span class="kw">sd</span>(temp, <span class="dt">na.rm =</span> <span class="ot">TRUE</span>))
summary_temp</code></pre></div>
<pre><code>## # A tibble: 1 × 2
## mean std_dev
## &lt;dbl&gt; &lt;dbl&gt;
## 1 55.2 17.78</code></pre>
## mean std_dev
## &lt;dbl&gt; &lt;dbl&gt;
## 1 55.20351 17.78212</code></pre>
<p>If we’d like to access either of these values directly we can use the <code>$</code> to specify a column in a data frame. For example:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">summary_temp$mean</code></pre></div>
<pre><code>## [1] 55.2</code></pre>
<pre><code>## [1] 55.20351</code></pre>
<p>You’ll often encounter issues with missing values <code>NA</code>. In fact, an entire branch of the field of statistics deals with missing data. However, it is not good practice to include a <code>na.rm = TRUE</code> in your summary commands by default; you should attempt to run them without this argument. The idea being you should at the very least be alerted to the presence of missing values and consider what the impact on the analysis might be if you ignore these values. In other words, <code>na.rm = TRUE</code> should only be used when necessary.</p>
<p>What other summary functions can we use inside the <code>summarize()</code> verb? Any function in R that takes a vector of values and returns just one. Here are just a few:</p>
<ul>
@@ -574,20 +573,20 @@ <h3><span class="header-section-number">5.2.3</span> 5MV#3: Group rows using gro
<span class="dt">std_dev =</span> <span class="kw">sd</span>(temp, <span class="dt">na.rm =</span> <span class="ot">TRUE</span>))
summary_monthly_temp</code></pre></div>
<pre><code>## # A tibble: 12 × 3
## month mean std_dev
## &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;
## 1 1 35.64 10.185
## 2 2 34.15 6.940
## 3 3 39.81 6.225
## 4 4 51.67 8.785
## 5 5 61.59 9.609
## 6 6 72.14 7.603
## 7 7 80.01 7.148
## 8 8 74.40 5.171
## 9 9 67.43 8.476
## 10 10 60.03 8.830
## 11 11 45.11 10.502
## 12 12 38.37 9.941</code></pre>
## month mean std_dev
## &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;
## 1 1 35.64127 10.185459
## 2 2 34.15454 6.940228
## 3 3 39.81404 6.224948
## 4 4 51.67094 8.785250
## 5 5 61.59185 9.608687
## 6 6 72.14500 7.603356
## 7 7 80.00967 7.147631
## 8 8 74.40495 5.171365
## 9 9 67.42582 8.475824
## 10 10 60.03305 8.829652
## 11 11 45.10893 10.502249
## 12 12 38.36811 9.940822</code></pre>
<p>This code is identical to the previous code that created <code>summary_temp</code>, but there is an extra <code>group_by(month)</code> spliced in between. By simply grouping the <code>weather</code> data set by <code>month</code> first and then passing this new data frame into <code>summarize</code> we get a resulting data frame that shows the mean and standard deviation temperature for each month in New York City. Since each row in <code>summary_monthly_temp</code> represents a summary of different rows in <code>weather</code>, the observational units have changed.</p>
<p>It is important to note that <code>group_by</code> doesn’t actually change the data frame. It simply sets <em>meta-data</em> (data about the data), specifically the group structure of the data. It is only after we apply the <code>summarize</code> function that the data frame actually changes. If we would like to remove this group structure meta-data, we can pipe a resulting data frame into the <code>ungroup()</code> function.</p>
<p>We now revisit the <code>n()</code> counting summary function we introduced in the previous section. For example, suppose we’d like to get a sense for how many flights departed each of the three airports in New York City:</p>
@@ -646,9 +645,9 @@ <h3><span class="header-section-number">5.2.4</span> 5MV#4: Create new variables
)
gain_summary</code></pre></div>
<pre><code>## # A tibble: 1 × 8
## min q1 median q3 max mean sd missing
## &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;int&gt;
## 1 -109 -17 -7 3 196 -5.66 18.04 9430</code></pre>
## min q1 median q3 max mean sd missing
## &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;int&gt;
## 1 -109 -17 -7 3 196 -5.659779 18.04365 9430</code></pre>
<p>We’ve recreated the <code>summary</code> function we saw in Chapter <a href="4-viz.html#viz">4</a> here using the <code>summarize</code> function in <code>dplyr</code>.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">ggplot</span>(<span class="dt">data =</span> flights, <span class="dt">mapping =</span> <span class="kw">aes</span>(<span class="dt">x =</span> gain)) +
<span class="st"> </span><span class="kw">geom_histogram</span>(<span class="dt">color =</span> <span class="st">&quot;white&quot;</span>, <span class="dt">bins =</span> <span class="dv">20</span>)</code></pre></div>
Empty file added docs/6-ci.html
Empty file.
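Several hunks in this merge commit add literal Git conflict-marker lines (`<<<<<<< HEAD`, `=======`, `>>>>>>>`) to `bib/packages.bib` and `docs/3-tidy.html`, which suggests some of the conflicts listed in the commit message were committed unresolved. The following is a minimal sketch, not part of this repository, of a check that could catch such leftovers before committing. The helper names (`find_conflict_markers`, `scan_paths`, `main`) are hypothetical. It assumes Git's default 7-character markers (plus the `|||||||` base marker that the `diff3` conflict style adds) and UTF-8 text, and it skips files it cannot decode, such as binary images. A bare `=======` line can also be a legitimate setext heading underline in Markdown or R Markdown, so hits are candidates to review rather than certain conflicts.

```python
import re
from pathlib import Path

# Git's default conflict markers are 7 characters long, at the start of a line:
#   "<<<<<<< ours-label", "||||||| base-label" (diff3 style only),
#   "=======", and ">>>>>>> theirs-label".
MARKER = re.compile(r"^(?:<{7}(?: .*)?|\|{7}(?: .*)?|={7}|>{7}(?: .*)?)$")


def find_conflict_markers(text):
    """Return (1-based line number, line) for each line that looks like a marker."""
    return [
        (lineno, line)
        for lineno, line in enumerate(text.splitlines(), start=1)
        if MARKER.match(line)
    ]


def scan_paths(paths):
    """Map each file path to its marker hits; clean files are left out."""
    report = {}
    for path in map(Path, paths):
        try:
            text = path.read_text(encoding="utf-8")
        except (OSError, UnicodeDecodeError):
            continue  # unreadable or non-UTF-8 (binary) files are skipped
        hits = find_conflict_markers(text)
        if hits:
            report[str(path)] = hits
    return report


def main(paths):
    """Print 'path:line: marker' for every hit; return 1 if any were found, else 0."""
    report = scan_paths(paths)
    for path, hits in report.items():
        for lineno, line in hits:
            print(f"{path}:{lineno}: {line}")
    return 1 if report else 0
```

For example, `raise SystemExit(main(["bib/packages.bib", "docs/3-tidy.html"]))` in a small pre-commit script would be expected to report the marker lines in those files at this commit and exit non-zero. Git's own `git diff --check` also warns when changes introduce conflict markers, which may be the simpler option when a custom script is not needed.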