week06/week6_practice_stats_in_R.Rmd

---
output: html_document
---

```{r setup, include=FALSE}
library(tidyverse)
library(conflicted)
library(nycflights13)

filter <- dplyr::filter

knitr::opts_chunk$set(echo = TRUE)
```

<br><br>

# Practice: Statistical Tests in R

Before doing anything else, install the broom package in the console below! Once you've done that load the library.

```{r}
library(broom)
```

<br>

## Practice Running the Code

### Summary Statistics

Remember the flights dataset from the nycflights13 package? It has information for flights in 2013 in the New York City metropolitan area airports: Newark (EWR), LaGuardia (LGA), and JFK. In the chunk below, find the mean, median, and standard deviation of the departure delay in the flights table in a SINGLE line of code by airport.

```{r}

```

<br>

Why is there such a large difference between the mean and the median? Plotting a histogram of the departure delay gives the answer the question. Can you explain it now?

```{r}

```

**Write your answer here (so you dont forget later):**


<br>

### t test

Suppose that on Thanksgiving, the average departure delay at NYC airports is 45 mins. Is this unusual? Test it with a t test.

```{r}

```

<br>

### chi square test

The Titanic dataset is built into R and gives survival information for passengers on the Titanic. You'll need to practice your data wrangling skills to get the table in the correct form and don't forget to end by tidying the test with `broom::tidy()`! There's a bit of code in the chunk below to get you started:

```{r}
Titanic %>% 
  as_tibble()
```

<br>

### correct for multiple testing

The chickwts dataset is built into R and looks at how different feed typs affect baby chick's weight. Use a pairwise t test to test the differences between feed types and don't forget to correct for multiple testing.

```{r}
chickwts
```

<br><br>

## Pick the Test for the Data

In this section there's a question to answer and you need to pick the appropriate test to answer it.

<br>

**Question 1:** In the flights dataset again, is the average departure delay the same as the average arrival delay?

```{r}

```

<br>

**Question 2:** The weather dataset is also in the nycflights13 package, and gives weather information at the three NYC airports from 2013. In the weather dataset, is wind speed significantly different between the airports?

```{r}

```

<br>

**Question 3:** Do flights from different carriers fly about the same distances?

```{r}

```

<br>

**Question 4:** In the built-in HairEyeColor dataset, are there equal numbers of students between the four hair colors (black, blonde, brown, and red)?

```{r}
HairEyeColor %>% 
  as_tibble() 
```

<br>

**Question 5:** Is the variation in depature delays the same at all the airports in the flights table?

```{r}

```

<br><br>