forked from fels-bioinformatics/fels_bioinformatics_meetup
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathweek6_practice_stats_in_R.Rmd
128 lines (71 loc) · 2.74 KB
/
week6_practice_stats_in_R.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
---
output: html_document
---
```{r setup, include=FALSE}
library(tidyverse)
library(conflicted)
library(nycflights13)
filter <- dplyr::filter
knitr::opts_chunk$set(echo = TRUE)
```
<br><br>
# Practice: Statistical Tests in R
Before doing anything else, install the broom package in the console below! Once you've done that load the library.
```{r}
library(broom)
```
<br>
## Practice Running the Code
### Summary Statistics
Remember the flights dataset from the nycflights13 package? It has information for flights in 2013 in the New York City metropolitan area airports: Newark (EWR), LaGuardia (LGA), and JFK. In the chunk below, find the mean, median, and standard deviation of the departure delay in the flights table in a SINGLE line of code by airport.
```{r}
```
<br>
Why is there such a large difference between the mean and the median? Plotting a histogram of the departure delay gives the answer the question. Can you explain it now?
```{r}
```
**Write your answer here (so you dont forget later):**
<br>
### t test
Suppose that on Thanksgiving, the average departure delay at NYC airports is 45 mins. Is this unusual? Test it with a t test.
```{r}
```
<br>
### chi square test
The Titanic dataset is built into R and gives survival information for passengers on the Titanic. You'll need to practice your data wrangling skills to get the table in the correct form and don't forget to end by tidying the test with `broom::tidy()`! There's a bit of code in the chunk below to get you started:
```{r}
Titanic %>%
as_tibble()
```
<br>
### correct for multiple testing
The chickwts dataset is built into R and looks at how different feed typs affect baby chick's weight. Use a pairwise t test to test the differences between feed types and don't forget to correct for multiple testing.
```{r}
chickwts
```
<br><br>
## Pick the Test for the Data
In this section there's a question to answer and you need to pick the appropriate test to answer it.
<br>
**Question 1:** In the flights dataset again, is the average departure delay the same as the average arrival delay?
```{r}
```
<br>
**Question 2:** The weather dataset is also in the nycflights13 package, and gives weather information at the three NYC airports from 2013. In the weather dataset, is wind speed significantly different between the airports?
```{r}
```
<br>
**Question 3:** Do flights from different carriers fly about the same distances?
```{r}
```
<br>
**Question 4:** In the built-in HairEyeColor dataset, are there equal numbers of students between the four hair colors (black, blonde, brown, and red)?
```{r}
HairEyeColor %>%
as_tibble()
```
<br>
**Question 5:** Is the variation in depature delays the same at all the airports in the flights table?
```{r}
```
<br><br>