Commit

Up to 0.1.2
ismayc committed Jan 22, 2017
1 parent f6b757f commit 3081fc6
Showing 35 changed files with 213 additions and 483 deletions.
6 changes: 3 additions & 3 deletions 03-tidy.Rmd
@@ -146,7 +146,7 @@ Note that if you look in the leftmost column of the `View(flights)` output, you
- specify the variables, and
- give the types of variables you are presented with.

The `glimpse()` command in the `dplyr` package provides us with much of the above information and more:
The `glimpse()` command in the `tibble` package provides us with much of the above information and more:

```{r}
glimpse(flights)
@@ -170,10 +170,10 @@ glimpse(flights)

We see that `glimpse` will give you the first few entries of each variable in a row after the variable. In addition, the type of the variable is given immediately after each variable's name inside `< >`. Here, `int` and `num` refer to quantitative variables. In contrast, `chr` refers to categorical variables. One more type of variable is given here with the `time_hour` variable: **dttm**. As you may suspect, this variable corresponds to a specific date and time of day.

Another nice feature of R is the help system. You can get help in R by simply entering a question mark before the name of a function or an object and you will be presented with a page showing the documentation. Note that this output help file is omitted here but can be accessed [here](https://cran.r-project.org/web/packages/nycflights13/nycflights13.pdf) on page 3 of the PDF document.
Another nice feature of R is the help system. You can get help in R by simply entering a question mark before the name of a function or an object and you will be presented with a page showing the documentation. Since `glimpse` is a function defined in the `tibble` package, you can further emphasize that you'd like to look at the help for that specific `glimpse` function by adding the two columns between the package name and the function. Note that these output help files is omitted here but the `flights` help can be accessed [here](https://cran.r-project.org/web/packages/nycflights13/nycflights13.pdf) on page 3 of the PDF document.

```{r eval=FALSE}
?glimpse
?tibble::glimpse
?flights
```

37 changes: 18 additions & 19 deletions 04-viz.Rmd
@@ -1,6 +1,6 @@
# Data Visualization via ggplot2 {#viz}

```{r setup_viz, include=FALSE}
```{r setup-viz, include=FALSE, purl=FALSE}
chap <- 4
lc <- 0
rq <- 0
@@ -52,7 +52,7 @@ Specifically, we can break a graphic into the following three essential componen

In 1812, Napoleon led a French invasion of Russia, marching on Moscow. It was one of the biggest military disasters due in large part to the Russian winter. In 1869, a French civil engineer named Charles Joseph Minard published arguably one of the greatest statistical visualizations of all-time, which summarized this march:

```{r minard, echo=FALSE, fig.cap="Minard's Visualization of Napolean's March"}
```{r minard, echo=FALSE, fig.cap="Minard's Visualization of Napolean's March", purl=FALSE}
knitr::include_graphics("images/Minard.png")
```

@@ -163,7 +163,7 @@ This code snippet makes use of functions in the `dplyr` package for data manipul

***

```{block lc-all_alaska_flights, type='learncheck'}
```{block lc-all_alaska_flights, type='learncheck', purl=FALSE}
**_Learning check_**
```

@@ -198,7 +198,7 @@ In Figure \@ref(fig:noalpha) we see that a positive relationship exists between

***

```{block lc-scatter-plots, type='learncheck'}
```{block lc-scatter-plots, type='learncheck', purl=FALSE}
**_Learning check_**
```

@@ -242,7 +242,7 @@ Note how this function call is identical to the one in Section \@ref(geompoint),

***

```{block lc-overplotting, type='learncheck'}
```{block lc-overplotting, type='learncheck', purl=FALSE}
**_Learning check_**
```

@@ -287,7 +287,7 @@ This is similar to the previous use of the `filter` command in Section \@ref(sca

***

```{block lc-early_january_weather, type='learncheck'}
```{block lc-early_january_weather, type='learncheck', purl=FALSE}
**_Learning check_**
```

@@ -318,7 +318,7 @@ Much as with the `ggplot()` call in Section \@ref(geompoint), we specify the com

***

```{block lc-line-graph, type='learncheck'}
```{block lc-line-graph, type='learncheck', purl=FALSE}
**_Learning check_**
```

@@ -394,7 +394,7 @@ ggplot(data = weather, mapping = aes(x = temp)) +

***

```{block lc-histogram, type='learncheck'}
```{block lc-histogram, type='learncheck', purl=FALSE}
**_Learning check_**
```

@@ -431,7 +431,7 @@ As we might expect, the temperature tends to increase as summer approaches and t

***

```{block lc-facet, type='learncheck'}
```{block lc-facet, type='learncheck', purl=FALSE}
**_Learning check_**
```

@@ -472,7 +472,7 @@ We have introduced a new function called `factor()` here. One of the things thi

***

```{block lc-boxplot, type='learncheck'}
```{block lc-boxplot, type='learncheck', purl=FALSE}
**_Learning check_**
```

@@ -525,7 +525,7 @@ knitr::kable(flights_table)

***

```{block lc-barplot, type='learncheck'}
```{block lc-barplot, type='learncheck', purl=FALSE}
**_Learning check_**
```

@@ -539,8 +539,6 @@ knitr::kable(flights_table)

***



### Must avoid pie charts!

Unfortunately, one of the most common plots seen today for categorical data is the pie chart. While they may see harmless enough, they actually present a problem in that humans are unable to judge angles well. As Naomi Robbins describes in her book "Creating More Effective Graphs" [@robbins2013], we overestimate angles greater than 90 degrees and we underestimate angles less than 90 degrees. In other words, it is difficult for us to determine relative size of one piece of the pie compared to another.
@@ -570,13 +568,13 @@ While it is quite easy to look back at the barplot to get the answer to these qu

[fd]: https://flowingdata.com/2008/09/19/pie-i-have-eaten-and-pie-i-have-not-eaten/ "Pie I Have Eaten and Pie I Have Not Eaten"

```{r echo=FALSE, fig.align='center', fig.cap="The only good pie chart", out.height=if(knitr:::is_latex_output()) '2.5in'}
```{r echo=FALSE, fig.align='center', fig.cap="The only good pie chart", out.height=if(knitr:::is_latex_output()) '2.5in', purl=FALSE}
knitr::include_graphics("images/Pie-I-have-Eaten.jpg")
```

***

```{block lc-pie-charts, type='learncheck'}
```{block lc-pie-charts, type='learncheck', purl=FALSE}
**_Learning check_**
```

@@ -610,7 +608,7 @@ This plot is what is known as a **stacked barplot**. While simple to make, it o

***

```{block lc-barplot-two-var, type='learncheck'}
```{block lc-barplot-two-var, type='learncheck', purl=FALSE}
**_Learning check_**
```

@@ -629,7 +627,7 @@ ggplot(data = flights_namedports, mapping = aes(x = carrier, fill = name)) +

***

```{block lc-barplot-stacked, type='learncheck'}
```{block lc-barplot-stacked, type='learncheck', purl=FALSE}
**_Learning check_**
```

@@ -653,7 +651,7 @@ Note how the `facet_grid` function arguments are written here. We are wanting t

***

```{block lc-barplot-facet, type='learncheck'}
```{block lc-barplot-facet, type='learncheck', purl=FALSE}
**_Learning check_**
```

@@ -699,7 +697,8 @@ In addition, we've created a mind map to help you remember which types of plots

### Script of R code

```{r include=FALSE, eval=FALSE}
```{r include=FALSE, eval=FALSE, purl=FALSE}
dir.create("docs/scripts")
knitr::purl("04-viz.Rmd", "docs/scripts/04-viz.R")
```

55 changes: 19 additions & 36 deletions 05-manip.Rmd
@@ -5,7 +5,7 @@
material here relates to answering those questions
-->

```{r setup_manip, include=FALSE}
```{r setup_manip, include=FALSE, purl=FALSE}
chap <- 5
lc <- 0
rq <- 0
@@ -40,10 +40,6 @@ library(nycflights13)
library(knitr)
```





<!--Subsection on Pipe -->

## The pipe `%>%`
@@ -56,10 +52,6 @@ Before we introduce the five main verbs, we first introduce the the pipe operato

The piping syntax will be our major focus throughout the rest of this book and you'll find that you'll quickly be addicted to the chaining with some practice. If you'd like to see more examples on using `dplyr`, the 5MV (in addition to some other `dplyr` verbs), and `%>%` with the `nycflights13` data set, you can check out Chapter 5 of Hadley and Garrett's book [@rds2016].





<!--Subsection on Verbs -->

## Five Main Verbs - The 5MV
@@ -78,7 +70,7 @@ Just as we had the 5NG (The Five Named Graphs in Chapter \@ref(viz) using `ggplo

### 5MV#1: Filter observations using filter {#filter}

```{r filter, echo=FALSE, fig.cap="Filter diagram from Data Wrangling with dplyr and tidyr cheatsheet"}
```{r filter, echo=FALSE, fig.cap="Filter diagram from Data Wrangling with dplyr and tidyr cheatsheet", purl=FALSE}
knitr::include_graphics("images/filter.png")
```

@@ -143,7 +135,7 @@ As a final note we point out that `filter()` should often be the first verb you'

***

```{block lc-filter, type='learncheck'}
```{block lc-filter, type='learncheck', purl=FALSE}
**_Learning check_**
```

@@ -155,11 +147,11 @@ As a final note we point out that `filter()` should often be the first verb you'

### 5MV#2: Summarize variables using summarize

```{r sum1, echo=FALSE, fig.cap="Summarize diagram from Data Wrangling with dplyr and tidyr cheatsheet"}
```{r sum1, echo=FALSE, fig.cap="Summarize diagram from Data Wrangling with dplyr and tidyr cheatsheet", purl=FALSE}
knitr::include_graphics("images/summarize1.png")
```

```{r sum2, echo=FALSE, fig.cap="Another summarize diagram from Data Wrangling with dplyr and tidyr cheatsheet"}
```{r sum2, echo=FALSE, fig.cap="Another summarize diagram from Data Wrangling with dplyr and tidyr cheatsheet", purl=FALSE}
knitr::include_graphics("images/summary.png")
```

@@ -187,10 +179,6 @@ summary_temp$mean

You'll often encounter issues with missing values `NA`. In fact, an entire branch of the field of statistics deals with missing data. However, it is not good practice to include a `na.rm = TRUE` in your summary commands by default; you should attempt to run them without this argument. The idea being you should at the very least be alerted to the presence of missing values and consider what the impact on the analysis might be if you ignore these values. In other words, `na.rm = TRUE` should only be used when necessary.

<!--
-->

What other summary functions can we use inside the `summarize()` verb? Any function in R that takes a vector of values and returns just one. Here are just a few:

* `min()` and `max()`: the minimum and maximum values respectively
@@ -201,7 +189,7 @@ What other summary functions can we use inside the `summarize()` verb? Any funct

***

```{block lc-summarize, type='learncheck'}
```{block lc-summarize, type='learncheck', purl=FALSE}
**_Learning check_**
```

@@ -223,7 +211,7 @@ summary_temp <- weather %>%

### 5MV#3: Group rows using group_by

```{r groupsummarize, echo=FALSE, fig.cap="Group by and summarize diagram from Data Wrangling with dplyr and tidyr cheatsheet"}
```{r groupsummarize, echo=FALSE, fig.cap="Group by and summarize diagram from Data Wrangling with dplyr and tidyr cheatsheet", purl=FALSE}
knitr::include_graphics("images/group_summary.png")
```

@@ -239,7 +227,8 @@ We believe that you will be amazed at just how simple this is. Run the following
```{r}
summary_monthly_temp <- weather %>%
group_by(month) %>%
summarize(mean = mean(temp, na.rm = TRUE), std_dev = sd(temp, na.rm = TRUE))
summarize(mean = mean(temp, na.rm = TRUE),
std_dev = sd(temp, na.rm = TRUE))
summary_monthly_temp
```

@@ -289,7 +278,7 @@ View(by_monthly_origin)

### 5MV#4: Create new variables/change old variables using mutate

```{r select, echo=FALSE, fig.cap="Mutate diagram from Data Wrangling with dplyr and tidyr cheatsheet"}
```{r select, echo=FALSE, fig.cap="Mutate diagram from Data Wrangling with dplyr and tidyr cheatsheet", purl=FALSE}
knitr::include_graphics("images/mutate.png")
```

@@ -339,7 +328,7 @@ flights <- flights %>%

***

```{block lc-mutate, type='learncheck'}
```{block lc-mutate, type='learncheck', purl=FALSE}
**_Learning check_**
```

@@ -369,21 +358,15 @@ freq_dest
You'll see that by default the values of `dest` are displayed in alphabetical order here. We are interested in finding those airports that appear most:

```{r}
freq_dest %>%
arrange(num_flights)
freq_dest %>% arrange(num_flights)
```

This is actually giving us the opposite of what we are looking for. It tells us the least frequent destination airports first. To switch the ordering to be descending instead of ascending we use the `desc` function:

```{r}
freq_dest %>%
arrange(desc(num_flights))
freq_dest %>% arrange(desc(num_flights))
```





<!--Chapter on joins-->

## Joining data frames
@@ -398,7 +381,7 @@ We see that in `airports`, `carrier` is the carrier code while `name` is the ful

Note that the values in the variable `carrier` in `flights` match the values in the variable `carrier` in `airlines`. In this case, we can use the variable `carrier` as a *key variable* to join/merge/match the two data frames by. Hadley and Garrett [@rds2016] created the following diagram to help us understand how the different data sets are linked:

```{r reldiagram, echo=FALSE, fig.cap="Data relationships in nycflights13 from R for Data Science"}
```{r reldiagram, echo=FALSE, fig.cap="Data relationships in nycflights13 from R for Data Science", purl=FALSE}
knitr::include_graphics("images/relational-nycflights.png")
```

@@ -418,7 +401,7 @@ We observed that the `flights` and `flights_joined` are identical except that `f

A visual representation of the `inner_join` is given below [@rds2016]:

```{r ijdiagram, echo=FALSE, fig.cap="Diagram of inner join from R for Data Science"}
```{r ijdiagram, echo=FALSE, fig.cap="Diagram of inner join from R for Data Science", purl=FALSE}
knitr::include_graphics("images/join-inner.png")
```

@@ -470,7 +453,7 @@ In case you didn't know, `"ORD"` is the airport code of Chicago O'Hare airport a

***

```{block lc-join, type='learncheck'}
```{block lc-join, type='learncheck', purl=FALSE}
**_Learning check_**
```

@@ -490,7 +473,7 @@ In case you didn't know, `"ORD"` is the airport code of Chicago O'Hare airport a

### Select variables using select {#select}

```{r selectfig, echo=FALSE, fig.cap="Select diagram from Data Wrangling with dplyr and tidyr cheatsheet"}
```{r selectfig, echo=FALSE, fig.cap="Select diagram from Data Wrangling with dplyr and tidyr cheatsheet", purl=FALSE}
knitr::include_graphics("images/select.png")
```

@@ -606,7 +589,7 @@ View(ten_freq_dests)

***

```{block lc-other-verbs, type='learncheck'}
```{block lc-other-verbs, type='learncheck', purl=FALSE}
**_Learning check_**
```

@@ -644,7 +627,7 @@ We will focus only on the `dplyr` functions in this book, but you are encouraged

### Script of R code

```{r include=FALSE, eval=FALSE}
```{r include=FALSE, eval=FALSE, purl=FALSE}
knitr::purl("05-manip.Rmd", "docs/scripts/05-manip.R")
```
