Merge pull request #681 from jhudsl/minor_fixes_stats_dat_out_W25
small edits to slides
clifmckee authored Jan 16, 2025
2 parents 13950b4 + ab55d81 commit bc0d372
Showing 4 changed files with 26 additions and 18 deletions.
10 changes: 6 additions & 4 deletions modules/Data_Output/Data_Output.Rmd
@@ -18,7 +18,7 @@ pre { /* Code block - slightly smaller in this lecture */
</style>


-## Data Output
+## Data Output {.smaller}

While it's nice to be able to read in a variety of data formats, it's equally important to be able to output data somewhere.

@@ -102,7 +102,7 @@ save.image(file = "my_environment.RData")
```


-## Using RStudio for importing/exporting data
+## Using RStudio for importing/exporting data {.smaller}

If there is an `.rds` or `.RData` file that you want to work with, you can open it into your environment using the file icon.

@@ -131,12 +131,14 @@ ggsave(filename = "saved_plot.png", # will save in working directory
width = 6, height = 3.5) # by default in inches
```

-## Summary {.small}
+## Summary

- Use `write_csv()` and `write_delim()` from the `readr` package to write your (modified) data
- `.rds` files can be handy for saving intermediate work
- Can save environment (or subset) using `save()` and `save.image()`
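
A minimal sketch of the three summary points above, assuming the `readr` package is installed. The data frame `dat` and all file names are hypothetical placeholders, and everything is written to a temporary folder rather than the working directory:

```r
library(readr)

# Hypothetical example data (not from the lecture)
dat <- data.frame(id = 1:3, score = c(90, 85, 77))

# Write into a temporary folder so no project files are overwritten
out_dir <- tempdir()
csv_path <- file.path(out_dir, "dat.csv")
tsv_path <- file.path(out_dir, "dat.txt")
rds_path <- file.path(out_dir, "dat.rds")
rdata_path <- file.path(out_dir, "dat.RData")

write_csv(dat, csv_path)                   # comma-delimited text file
write_delim(dat, tsv_path, delim = "\t")   # any delimiter, here a tab
write_rds(dat, rds_path)                   # a single R object, handy for intermediate work
save(dat, file = rdata_path)               # one or more named objects

# Reading the .rds file back returns the object that was saved
dat_back <- read_rds(rds_path)
```

`save.image()` works like `save()` but stores every object in the current environment, so it is left out here to keep the sketch free of side effects.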

+## Resources & Lab {.small}

🏠 [Class Website](https://jhudatascience.org/intro_to_r/)

💻 [Data Output Lab](https://jhudatascience.org/intro_to_r/modules/Data_Output/lab/Data_Output_Lab.Rmd)
@@ -145,7 +147,7 @@ ggsave(filename = "saved_plot.png", # will save in working directory

📃 [Day 2 Cheatsheet](https://jhudatascience.org/intro_to_r/modules/cheatsheets/Day-2.pdf)

-```{r, fig.alt="The End", out.width = "50%", echo = FALSE, fig.align='center'}
+```{r, fig.alt="The End", out.width = "35%", echo = FALSE, fig.align='center'}
knitr::include_graphics(here::here("images/the-end-g23b994289_1280.jpg"))
```

8 changes: 5 additions & 3 deletions modules/Data_Summarization/Data_Summarization.Rmd
@@ -515,15 +515,17 @@ yts_loc %>% unique() %>% length() # similar to n_distinct()
* `range()`: minimum and maximum of the data
* `IQR()`: interquartile range of the data

-## Summary & Lab Part 2 {.small}
+## Summary

- `count(x)`: what unique values do you have?
- `distinct()`: what are the distinct values?
- `n_distinct()` with `pull()`: how many distinct values?
- `group_by()`: changes subsequent functions (remove with `ungroup()`)
- combine with `summarize()` to get statistics per group
- combine with `mutate()` to add column
-- `summarize()` with `n()` gives the count (NAs included)
+- `summarize()` with `n()` gives the count (NAs included)
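
The functions in this summary can be sketched on a small example, assuming the `dplyr` package is installed. The data frame `toy` and its columns are hypothetical and chosen only to include a missing value:

```r
library(dplyr)

# Hypothetical toy data with one missing value
toy <- data.frame(group = c("a", "a", "b", "b", "b"),
                  value = c(1, NA, 3, 4, 4))

toy %>% count(group)       # number of rows for each unique value of group
toy %>% distinct(group)    # the distinct values themselves

# How many distinct values are there?
n_groups <- toy %>% pull(group) %>% n_distinct()

# Statistics per group; n() counts every row, including rows where value is NA
per_group <- toy %>%
  group_by(group) %>%
  summarize(n_rows = n(),
            mean_value = mean(value, na.rm = TRUE))

# Adding a per-group column instead of collapsing rows, then removing the grouping
with_mean <- toy %>%
  group_by(group) %>%
  mutate(group_mean = mean(value, na.rm = TRUE)) %>%
  ungroup()
```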

+## Resources & Lab Part 2 {.small}

🏠 [Class Website](https://jhudatascience.org/intro_to_r/)

@@ -533,7 +535,7 @@ yts_loc %>% unique() %>% length() # similar to n_distinct()

📃 [Posit's data transformation Cheatsheet](https://rstudio.github.io/cheatsheets/data-transformation.pdf)

-```{r, fig.alt="The End", out.width = "20%", echo = FALSE, fig.align='center'}
+```{r, fig.alt="The End", out.width = "35%", echo = FALSE, fig.align='center'}
knitr::include_graphics(here::here("images/the-end-g23b994289_1280.jpg"))
```

10 changes: 7 additions & 3 deletions modules/Reproducibility/Reproducibility.Rmd
@@ -131,7 +131,7 @@ Occasionally we might forget to save a step of our code in our R Markdown file t
knitr::include_graphics("images/clean.png")
```

-## Check if your file knits regularly
+## Check if your file knits regularly {.small}

Checking regularly that your file knits helps you spot a missing step or error early, while there is still little code to search through to find what went wrong.

@@ -143,6 +143,8 @@ knitr::include_graphics("images/knit.png")
knitr::include_graphics("images/error_monster.png")
```

+Image by [Allison Horst](https://allisonhorst.com/data-science-art).

## Tell your future self and others what you did!

Provide sufficient detail so that you can understand what you did and why.
@@ -250,7 +252,7 @@ These are just some quick tips, for more information:
- [Jenny Bryan's organizational strategies](https://www.stat.ubc.ca/~jenny/STAT545A/block19_codeFormattingOrganization.html).
- [Write efficient R code for science](https://www.earthdatascience.org/courses/earth-analytics/automate-science-workflows/write-efficient-code-for-science-r/).

-## Summary {.smaller}
+## Summary

To help make your work more reproducible:

@@ -260,6 +262,8 @@ To help make your work more reproducible:
- Tell your future self and others what you did!
- Print session info!

+## Resources & Lab {.small}

🏠 [Class Website](https://jhudatascience.org/intro_to_r/)

💻 [Lab](https://jhudatascience.org/intro_to_r/modules/Reproducibility/lab/Reproducibility_Lab.Rmd)
@@ -268,7 +272,7 @@ To help make your work more reproducible:

🗒 [RStudio Cheatsheet](https://d33wubrfki0l68.cloudfront.net/374f4c769f97c4ded7300d521eb59b24168a7261/c72ad/lesson-images/cheatsheets-1-cheatsheet.png)

-```{r, fig.alt="The End", out.width = "15%", echo = FALSE, fig.align='center'}
+```{r, fig.alt="The End", out.width = "35%", echo = FALSE, fig.align='center'}
knitr::include_graphics(here::here("images/the-end-g23b994289_1280.jpg"))
```

16 changes: 8 additions & 8 deletions modules/Statistics/Statistics.Rmd
@@ -96,8 +96,8 @@ cor(x, y = NULL, use = c("everything", "complete.obs"),
Function `cor.test()` also computes correlation and tests for association.

```
-cor.test(x, y = NULL, alternative(c("two.sided", "less", "greater")),
-method = c("pearson", "kendall", "spearman"))
+cor.test(x, y = NULL, alternative = c("two.sided", "less", "greater"),
+method = c("pearson", "kendall", "spearman"), ...)
```
- provide two numeric vectors of the same length (arguments `x`, `y`), or
- provide a data.frame / tibble with numeric columns only
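
As a hedged illustration of the corrected signature, the sketch below uses base R only. The vectors `x` and `y` are simulated, hypothetical data rather than the lecture's dataset. The two-vector form is shown with `cor.test()`, and the data-frame form is shown with `cor()`, which returns a correlation matrix:

```r
# Simulated example data (an assumption for illustration only)
set.seed(123)
x <- rnorm(50)
y <- 0.5 * x + rnorm(50)

# Two numeric vectors of the same length
result <- cor.test(x, y, alternative = "two.sided", method = "pearson")

result$estimate   # correlation coefficient
result$p.value    # p-value for the test of association
result$conf.int   # confidence interval (available for method = "pearson")

# A data frame with numeric columns only gives a correlation matrix via cor()
cor_matrix <- cor(data.frame(x = x, y = y))
```

Choosing one value for `alternative` and `method` makes the call explicit; if they are omitted, the first option of each vector (`"two.sided"`, `"pearson"`) is used.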
@@ -164,7 +164,7 @@ glimpse(cor_result)

## Correlation for two vectors with plot {.smaller}

-In plot form... `geom_smooth()` and `annotate()` can help.
+In plot form, `geom_smooth()` and `annotate()` can help.

```{r}
corr_value <- pull(cor_result, estimate) %>% round(digits = 4)
@@ -515,7 +515,7 @@ esdcomp %>% count(residency)

## Linear regression: factors {.smaller}

-Yes relative to No -- baseline is no
+Yes relative to No -- baseline is No

```{r regressbaseline, comment="", fig.height=4,fig.width=8}
fit_3 <- glm(visits ~ residency, data = esdcomp)
@@ -672,9 +672,9 @@ The odds ratio is 21.4. When the predictor is TRUE (aka the individual ate vanil
## Functions you might also see {.smaller}

- the `stat_cor()` function in the `ggpubr` package can add correlation coefficients and p-values as a layer to `ggplot` objects
-- the `graphics::pairs()` or `GGally::ggpairs()` functions are also useful for visualizing correlations across variables in a data frame
+- the `pairs()` (`graphics` package) or `ggpairs()` (`GGally` package) functions are also useful for visualizing correlations across variables in a data frame
- `acf()` in the `stats` package can compute autocorrelation and cross-correlation with lags
-- calculate confidence intervals for intercept and slopes from `glm`/lm` objects using `confint()`
+- calculate confidence intervals for intercept and slopes from `glm`/`lm` objects using `confint()`
- principal components analysis -- use `prcomp()`
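
Two of the functions listed above, `confint()` and `prcomp()`, can be sketched with base R alone. The data frame `sim` and its columns are simulated and hypothetical, and an `lm` fit is used for simplicity:

```r
# Simulated example data (an assumption for illustration only)
set.seed(42)
sim <- data.frame(x1 = rnorm(40), x2 = rnorm(40))
sim$y <- 1 + 2 * sim$x1 + rnorm(40)

# Confidence intervals for the intercept and slopes of a fitted model
fit <- lm(y ~ x1 + x2, data = sim)
ci <- confint(fit, level = 0.95)   # matrix with one row per coefficient
ci

# Principal components analysis on the two predictors
pca <- prcomp(sim[, c("x1", "x2")], center = TRUE, scale. = TRUE)
summary(pca)                       # proportion of variance per component
```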

```{r, fig.alt="There's an R package for everything", out.width = "20%", echo = FALSE, fig.show='hold',fig.align='center'}
@@ -719,15 +719,15 @@ For classes at JHU School of Public Health:
- [PH.140.778 - Statistical Computing, Algorithm, and Software Development](https://www.jhsph.edu/courses/course/36737/2022/140.778.01/statistical-computing-algorithm-and-software-devel) - A more advanced course for working with data in R.
Content for similar topics as this course can also be found on Leanpub.

-## Lab Part 2
+## Lab Part 2 {.small}

🏠 [Class Website](https://jhudatascience.org/intro_to_r/)

💻 [Lab](https://jhudatascience.org/intro_to_r/modules/Statistics/lab/Statistics_Lab.Rmd)

📃 [Day 8 Cheatsheet](https://jhudatascience.org/intro_to_r/modules/cheatsheets/Day-8.pdf)

-```{r, fig.alt="The End", out.width = "50%", echo = FALSE, fig.align='center'}
+```{r, fig.alt="The End", out.width = "35%", echo = FALSE, fig.align='center'}
knitr::include_graphics(here::here("images/the-end-g23b994289_1280.jpg"))
```
