forked from carpentries-incubator/targets-workshop
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
New translations quarto.md (Japanese)
Showing 1 changed file with 198 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,198 @@ | ||
--- | ||
title: Reproducible Reports with Quarto | ||
teaching: 10 | ||
exercises: 2 | ||
--- | ||
|
||
:::::::::::::::::::::::::::::::::::::: questions | ||
|
||
- How can we create reproducible reports? | ||
|
||
:::::::::::::::::::::::::::::::::::::::::::::::: | ||
|
||
::::::::::::::::::::::::::::::::::::: objectives | ||
|
||
- Be able to generate a report using `targets` | ||
|
||
:::::::::::::::::::::::::::::::::::::::::::::::: | ||
|
||
::::::::::::::::::::::::::::::::::::: instructor | ||
|
||
Episode summary: Show how to write reports with Quarto | ||
|
||
::::::::::::::::::::::::::::::::::::: | ||
|
||
```{r} | ||
#| label: setup | ||
#| echo: FALSE | ||
#| message: FALSE | ||
#| warning: FALSE | ||
library(targets) | ||
library(tarchetypes) | ||
library(quarto) # don't actually need to load, but put here so renv catches it | ||
source("files/lesson_functions.R") | ||
# Increase width for printing tibbles | ||
options(width = 140) | ||
``` | ||
|
||
## Copy-paste vs. dynamic documents | ||
|
||
Typically, you will want to communicate the results of a data analysis to a broader audience. | ||
|
||
You may have done this before by copying and pasting statistics, plots, and other results into a text document or presentation. | ||
This may be fine if you only ever do the analysis once. | ||
But that is rarely the case---it is much more likely that you will tweak parts of the analysis or add new data and re-run your pipeline. | ||
With the copy-paste method, you'd have to remember what results changed and manually make sure everything is up-to-date. | ||
This is a perilous exercise! | ||
|
||
Fortunately, `targets` provides functions for keeping a document in sync with pipeline results, so you can avoid such pitfalls. | ||
The main tool we will use to generate documents is **Quarto**. | ||
Quarto can be used separately from `targets` (and is a large topic on its own), but it also happens to be an excellent way to dynamically generate reports with `targets`. | ||
|
||
Quarto allows you to insert the results of R code directly into your documents so that there is no danger of copy-and-paste mistakes. | ||
Furthermore, it can generate output from the same underlying script in multiple formats including PDF, HTML, and Microsoft Word. | ||
|
||
::::::::::::::::::::::::::::::::::::: {.prereq} | ||
|
||
## Install Quarto | ||
|
||
If you haven't done so already, you will need to [install Quarto](https://quarto.org/docs/get-started/), which is separate from R. | ||
|
||
You will also need to install the `quarto` R package with `install.packages("quarto")`. | ||
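
Once both pieces are installed, a quick check like the sketch below can confirm that R is able to find the Quarto program. It assumes only the `quarto` R package named above; the variable name `cli_path` is made up for illustration.

```r
# Look up the Quarto command-line program via the quarto R package.
# quarto_path() returns NULL when the program cannot be found.
cli_path <- if (requireNamespace("quarto", quietly = TRUE)) {
  quarto::quarto_path()
} else {
  NULL
}

if (is.null(cli_path)) {
  message("Quarto was not found: install the `quarto` R package and the Quarto program.")
} else {
  message("Quarto found at: ", cli_path)
  print(quarto::quarto_version())
}
```

If the message reports that Quarto was not found, revisit the installation steps before continuing.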
|
||
::::::::::::::::::::::::::::::::::::: | ||
|
||
## About Quarto files | ||
|
||
`.qmd` or `.Qmd` is the extension for Quarto files, and stands for "Quarto markdown". | ||
Quarto files invert the normal way of writing code and comments: in a typical R script, all text is assumed to be R code, unless you preface it with a `#` to show that it is a comment. | ||
In Quarto, all text is assumed to be prose, and you use special notation to indicate which lines are R code to be evaluated. | ||
Once the code is evaluated, the results get inserted into a final, rendered document, which could be one of various formats. | ||
|
||
![Quarto workflow](https://ucsbcarpentry.github.io/Reproducible-Publications-with-RStudio-Quarto/fig/03-qmd-workflow.png) | ||
|
||
We don't have time to go into the details of Quarto during this lesson, but we recommend the ["Introduction to Reproducible Publications with RStudio" incubator (in-development) lesson](https://ucsbcarpentry.github.io/Reproducible-Publications-with-RStudio-Quarto/) for more on this topic. | ||
|
||
## Recommended workflow | ||
|
||
Dynamic documents like Quarto (or R Markdown, the predecessor to Quarto) can actually be used to manage data analysis pipelines. | ||
But that is not recommended because it doesn't scale well and lacks the sophisticated dependency tracking offered by `targets`. | ||
|
||
Our suggested approach is to conduct the vast majority of data analysis (in other words, the "heavy lifting") in the `targets` pipeline, then use the Quarto document to **summarize** and **plot** the results. | ||
|
||
## Report on bill size in penguins | ||
|
||
Continuing our penguin bill size analysis, let's write a report evaluating each model. | ||
|
||
To save time, the report is already available at <https://github.com/joelnitta/penguins-targets>. | ||
|
||
Copy the [raw code from here](https://raw.githubusercontent.com/joelnitta/penguins-targets/main/penguin_report.qmd) and save it as a new file named `penguin_report.qmd` in your project folder (you may also be able to right-click in your browser and select "Save As"). | ||
|
||
Then, add one more target to the pipeline using the `tar_quarto()` function like this: | ||
|
||
```{r} | ||
#| label = "example-penguins-show-1", | ||
#| eval = FALSE, | ||
#| code = readLines("files/plans/plan_11.R")[2:42] | ||
``` | ||
|
||
```{r} | ||
#| label: example-penguins-hide-1 | ||
#| echo: FALSE | ||
#| eval: FALSE | ||
# FIXME | ||
# Skip eval until can figure out how to install quarto CLI in whatever is | ||
# compiling the lesson | ||
tar_dir({ | ||
library(quarto) | ||
write_example_plan(9) | ||
readr::read_lines("https://raw.githubusercontent.com/joelnitta/penguins-targets/main/penguin_report.qmd") |> | ||
readr::write_lines("penguin_report.qmd") | ||
# Run it | ||
write_example_plan("plan_8.R") | ||
tar_make(reporter = "silent") | ||
write_example_plan("plan_11.R") | ||
tar_make() | ||
}) | ||
``` | ||
|
||
The function to generate the report is `tar_quarto()`, from the `tarchetypes` package. | ||
|
||
As you can see, the "heavy" analysis of running the models is done in the workflow, then there is a single call to render the report at the end with `tar_quarto()`. | ||
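
For readers viewing this file without the rendered plan, here is a minimal sketch of where the `tar_quarto()` call sits in `_targets.R`. It is a fragment, not the lesson's actual plan file: the upstream targets are omitted, and wrapping the targets in a plain `list()` is an assumption made for illustration.

```r
# _targets.R (fragment; upstream targets omitted)
library(targets)
library(tarchetypes)

list(
  # ... targets that build penguin_models_augmented and
  # penguin_models_summary are defined here ...

  # Render the Quarto document as a target named penguin_report.
  tar_quarto(penguin_report, path = "penguin_report.qmd")
)
```

Note that `tar_quarto()` takes the target name first and the path to the `.qmd` file second. The Quarto program must be installed for the pipeline to be defined and run.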
|
||
## How does `targets` know when to render the report? | ||
|
||
It is not immediately apparent from this code how `targets` knows to generate the report **at the end of the workflow** (recall that build order is not determined by the order in which targets are written in the workflow, but rather by their dependencies). | ||
`penguin_report` does not appear to depend on any of the other targets, since they do not show up in the `tar_quarto()` call. | ||
|
||
How does this work? | ||
|
||
The answer lies **inside** the `penguin_report.qmd` file. Let's look at the start of the file: | ||
|
||
````{markdown} | ||
--- | ||
title: "Simpson's Paradox in Palmer Penguins" | ||
format: | ||
html: | ||
toc: true | ||
execute: | ||
echo: false | ||
--- | ||
```{r} | ||
#| label: load | ||
#| message: false | ||
targets::tar_load(penguin_models_augmented) | ||
targets::tar_load(penguin_models_summary) | ||
library(tidyverse) | ||
``` | ||
This is an example analysis of penguins on the Palmer Archipelago in Antarctica. | ||
```` | ||
|
||
The lines in between `---` and `---` at the very beginning are called the "YAML header", and contain directions about how to render the document. | ||
|
||
The R code to be executed is specified by the lines between ` ```{r} ` and ` ``` `. This is called a "code chunk", since it is a portion of code interspersed within prose text. | ||
|
||
Take a closer look at the R code chunk. Notice the two calls to `targets::tar_load()`. Do you remember what that function does? It loads the targets built during the workflow. | ||
|
||
Now things should make a bit more sense: `targets` knows that the report depends on the targets built during the workflow, `penguin_models_augmented` and `penguin_models_summary`, **because they are loaded in the report with `tar_load()`.** | ||
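
One way to check this for yourself is to inspect the dependency graph. The sketch below assumes it is run from the project folder containing `_targets.R`, with the `penguin_report` target already added. The variable names `network` and `report_edges` are made up for illustration.

```r
library(targets)

# Build the dependency graph without checking which targets are outdated.
network <- tar_network(targets_only = TRUE, outdated = FALSE)

# Keep only the edges that point into the report target.
report_edges <- network$edges[network$edges$to == "penguin_report", ]

# Targets that the report depends on.
report_edges$from
```

If the dependencies were detected as described, the result should include `penguin_models_augmented` and `penguin_models_summary`.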
|
||
## Generating dynamic content | ||
|
||
The calls to `tar_load()` at the start of `penguin_report.qmd` are really the key to generating an up-to-date report. Once those targets are loaded from the workflow, we know that they are in sync with the data, and can use them to produce "polished" text and plots. | ||
|
||
::::::::::::::::::::::::::::::::::::: {.challenge} | ||
|
||
## Challenge: Spot the dynamic contents | ||
|
||
Read through `penguin_report.qmd` and try to find instances where the targets built during the workflow (`penguin_models_augmented` and `penguin_models_summary`) are used to dynamically produce text and plots. | ||
|
||
:::::::::::::::::::::::::::::::::: {.solution} | ||
|
||
- In the code chunk labeled `results-stats`, statistics from the models like _P_-value and adjusted _R_ squared are extracted, then inserted into the text with in-line code like `` `r knitr::inline_expr("mod_stats$combined$r.squared")` ``. | ||
|
||
- There are two figures, one for the combined model and one for the separate model (code chunks labeled `fig-combined-plot` and `fig-separate-plot`, respectively). These are built using the points predicted from the model in `penguin_models_augmented`. | ||
|
||
:::::::::::::::::::::::::::::::::: | ||
|
||
::::::::::::::::::::::::::::::::::::: | ||
|
||
You should also interactively run the code in `penguin_report.qmd` to better understand what is going on, starting with `tar_load()`. In fact, that is how this report was written: the code was run in an interactive session, and saved to the report as it was gradually tweaked to obtain the desired results. | ||
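
As a starting point for that interactive session, the sketch below contrasts `tar_load()` with `tar_read()`. It assumes the pipeline has already been built with `tar_make()` and that you are in the project folder. The name `models_summary` is made up for illustration.

```r
library(targets)

# tar_load() creates an object named after the target in the current environment.
tar_load(penguin_models_summary)
penguin_models_summary

# tar_read() returns the target's value, so you can assign it to any name.
models_summary <- tar_read(penguin_models_summary)

# Both approaches retrieve the same stored result.
identical(models_summary, penguin_models_summary)
```

Either function works inside a report. `tar_load()` is convenient when you want the object to keep the target's name, while `tar_read()` is handy for using a value directly in an expression.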
|
||
The best way to learn this approach to generating reports is to **try it yourself**. | ||
|
||
So your final challenge is to construct a `targets` workflow using your own data and generate a report. Good luck! | ||
|
||
::::::::::::::::::::::::::::::::::::: keypoints | ||
|
||
- `tarchetypes::tar_quarto()` is used to render Quarto documents | ||
- You should load targets within the Quarto document using `tar_load()` or `tar_read()` | ||
- It is recommended to do heavy computations in the main targets workflow, and lighter formatting and plot generation in the Quarto document | ||
|
||
:::::::::::::::::::::::::::::::::::::::::::::::: |