From bfece38cf28f3471ddec200a487a5ebd21bb9b46 Mon Sep 17 00:00:00 2001 From: joelnitta Date: Thu, 26 Dec 2024 09:49:00 +0900 Subject: [PATCH 1/3] Update times --- episodes/basic-targets.Rmd | 4 ++-- episodes/functions.Rmd | 6 +++--- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/episodes/basic-targets.Rmd b/episodes/basic-targets.Rmd index 4ac83a0d..fdb0a2e6 100644 --- a/episodes/basic-targets.Rmd +++ b/episodes/basic-targets.Rmd @@ -1,7 +1,7 @@ --- title: 'First targets Workflow' -teaching: 10 -exercises: 2 +teaching: 30 +exercises: 10 --- :::::::::::::::::::::::::::::::::::::: questions diff --git a/episodes/functions.Rmd b/episodes/functions.Rmd index c2e4e316..f4656724 100644 --- a/episodes/functions.Rmd +++ b/episodes/functions.Rmd @@ -1,7 +1,7 @@ --- -title: 'A brief introduction to functions' -teaching: 20 -exercises: 1 +title: 'A Brief Introduction to Functions' +teaching: 30 +exercises: 10 --- :::::::::::::::::::::::::::::::::::::: questions From 2a5d7373dc1ae1ad057e2ad2de599d7f86dbac46 Mon Sep 17 00:00:00 2001 From: joelnitta Date: Thu, 26 Dec 2024 09:50:23 +0900 Subject: [PATCH 2/3] Swap order of first two sections in basic-targets Fixes #33 --- episodes/basic-targets.Rmd | 201 +++++++++++++++++++++++++--------- episodes/files/plans/plan_0.R | 22 ++++ 2 files changed, 170 insertions(+), 53 deletions(-) create mode 100644 episodes/files/plans/plan_0.R diff --git a/episodes/basic-targets.Rmd b/episodes/basic-targets.Rmd index fdb0a2e6..699b257d 100644 --- a/episodes/basic-targets.Rmd +++ b/episodes/basic-targets.Rmd @@ -34,6 +34,11 @@ Episode summary: First chance to get hands dirty by writing a very simple workfl #| message: FALSE #| warning: FALSE library(targets) + +if (interactive()) { + setwd("episodes") +} + source("files/lesson_functions.R") ``` @@ -76,49 +81,10 @@ Once you work through these steps, your RStudio session should look like this: Our project now contains a single file, created by RStudio: `targets-demo.Rproj`. 
You should not edit this file by hand. Its purpose is to tell RStudio that this is a project folder and to store some RStudio settings (if you use version-control software, it is OK to commit this file). Also, you can open the project by double clicking on the `.Rproj` file in your file explorer (try it by quitting RStudio then navigating in your file browser to your Desktop, opening the "targets-demo" folder, and double clicking `targets-demo.Rproj`). -OK, now that our project is set up, we are ready to start using `targets`! - -## Create a `_targets.R` file - -Every `targets` project must include a special file, called `_targets.R` in the main project folder (the "project root"). -The `_targets.R` file includes the specification of the workflow: directions for R to run your analysis, kind of like a recipe. -By using the `_targets.R` file, you won't have to remember to run specific scripts in a certain order. -Instead, R will do it for you (more reproducibility points)! - -### Anatomy of a `_targets.R` file - -We will now start to write a `_targets.R` file. Fortunately, `targets` comes with a function to help us do this. +OK, now that our project is set up, we are (almost) ready to start using `targets`! -In the R console, first load the `targets` package with `library(targets)`, then run the command `tar_script()`. +## Background: non-`targets` version -```{r} -#| label: start-targets-show -#| eval: FALSE -library(targets) -tar_script() -``` - -Nothing will happen in the console, but in the file viewer, you should see a new file, `_targets.R` appear. Open it using the File menu or by clicking on it. - -We can see this default `_targets.R` file includes three main parts: - -- Loading packages with `library()` -- Defining a custom function with `function()` -- Defining a list with `list()`. - -The last part, the list, is the most important part of the `_targets.R` file. -It defines the steps in the workflow. -The `_targets.R` file must always end with this list. 
- -Furthermore, each item in the list is a call of the `tar_target()` function. -The first argument of `tar_target()` is name of the target to build, and the second argument is the command used to build it. -Note that the name of the target is **unquoted**, that is, it is written without any surrounding quotation marks. - -## Set up `_targets.R` file to run example analysis - -### Background: non-`targets` version - -We will use this template to start building our analysis of bill shape in penguins. First though, to get familiar with the functions and packages we'll use, let's run the code like you would in a "normal" R script without using `targets`. Recall that we are using the `palmerpenguins` R package to obtain the data. @@ -139,7 +105,6 @@ penguins_csv_file We will use the `tidyverse` set of packages for loading and manipulating the data. We don't have time to cover all the details about using `tidyverse` now, but if you want to learn more about it, please see the ["Manipulating, analyzing and exporting data with tidyverse" lesson](https://datacarpentry.org/R-ecology-lesson/03-dplyr.html), or the Carpentry incubator lesson [R and the tidyverse for working with datasets](https://carpentries-incubator.github.io/r-tidyverse-4-datasets/). - Let's load the data with `read_csv()`. ```{r} @@ -191,25 +156,134 @@ penguins_data <- penguins_data_raw |> penguins_data ``` -That's better! +We have not run the full analysis yet, but this is enough to get us started with the transition to using `targets`. -### `targets` version +## `targets` version -What does this look like using `targets`? +### About the `_targets.R` file -The biggest difference is that we need to **put each step of the workflow into the list at the end**. +One major difference between a typical R data analysis and a `targets` project is that the latter must include a special file, called `_targets.R` in the main project folder (the "project root"). 
+ -We also define a custom function for the data cleaning step. -That is because the list of targets at the end **should look like a high-level summary of your analysis**. -You want to avoid lengthy chunks of code when defining the targets; instead, put that code in the custom functions. -The other steps (setting the file path and loading the data) are each just one function call so there's not much point in putting those into their own custom functions. +The `_targets.R` file includes the specification of the workflow: these are the directions for R to run your analysis, kind of like a recipe. +By using the `_targets.R` file, **you won't have to remember to run specific scripts in a certain order**; instead, R will do it for you! +This is a **huge win**, both for your future self and anybody else trying to reproduce your analysis. + +### Writing the initial `_targets.R` file + +We will now start to write a `_targets.R` file. Fortunately, `targets` comes with a function to help us do this. + +In the R console, first load the `targets` package with `library(targets)`, then run the command `tar_script()`. + +```{r} +#| label: start-targets-show +#| eval: FALSE +library(targets) +tar_script() +``` -Finally, each step in the workflow is defined with the `tar_target()` function. +Nothing will happen in the console, but in the file viewer, you should see a new file, `_targets.R`, appear. Open it using the File menu or by clicking on it. + +```{r} +#| label: start-targets-hide +#| eval: true +#| echo: false +#| results: "asis" +plan_0_dir <- make_tempdir() +pushd(plan_0_dir) +tar_script() +default_script <- readr::read_lines("_targets.R") +popd() + +cat("```{.r}\n") +cat(default_script, sep = "\n") +cat("```") +``` + +Don't worry about the details of this file. +Instead, notice that it includes three main parts: + +- Loading packages with `library()` +- Defining a custom function with `function()` +- Defining a list with `list()`. 
+ +You may not have used `function()` before. +If not, that's OK; we will cover this in more detail in the [next episode](episodes/functions.Rmd), so we will ignore it for now. + +The last part, the list, is the **most important part** of the `_targets.R` file. +It defines the steps in the workflow. +The `_targets.R` file **must always end with this list**. + +Furthermore, each item in the list is a call to the `tar_target()` function. +The first argument of `tar_target()` is the name of the target to build, and the second argument is the command used to build it. +Note that the name of the target is **unquoted**, that is, it is written without any surrounding quotation marks. + +## Modifying `_targets.R` to run the example analysis + +First, let's load all of the packages we need for our workflow. +Add `library(tidyverse)` and `library(palmerpenguins)` to the top of `_targets.R` after `library(targets)`. + +Next, we can delete the `function()` statement since we won't be using that just yet (we will come back to custom functions soon!). + +The last, and trickiest, part is correctly defining the workflow in the list at the end of the file. + +From [the non-`targets` version](#background-non-targets-version), you can see we have three steps so far: + +1. Define the path to the CSV file with the raw penguins data. +2. Read the CSV file. +3. Clean the raw data. + +Each of these will be one item in the list. +Furthermore, we need to write each item using the `tar_target()` function. +Recall that we write the `tar_target()` function by writing the **name of the target to build** first and the **command to build it** second. + +::::::::::::::::::::::::::::::::::::: {.callout} + +## Choosing good target names + +The name of each target could be anything you like, but it is strongly recommended to **choose names that reflect what the target actually contains**. + +For example, use `penguins_data_raw` for the raw data loaded from the CSV file, not `x`. 
+ +Your future self will thank you! + +:::::::::::::::::::::::::::::::::::::::::: + +::::::::::::::::::::::::::::::::::::: {.challenge} + +## Challenge: Use `tar_target()` + +Can you use `tar_target()` to define the first step in the workflow (setting the path to the CSV file with the penguins data)? + +:::::::::::::::::::::::::::::::::: {.solution} + +```{r} +#| label: challenge-solution-1 +#| eval: false +tar_target(name = penguins_csv_file, command = path_to_file("penguins_raw.csv")) +``` + +The first two arguments of `tar_target()` are the **name** of the target, followed by the **command** to build it. + +These arguments are used so frequently we will typically omit the argument names, instead writing it like this: + +```{r} +#| label: challenge-solution-2 +#| eval: false +tar_target(penguins_csv_file, path_to_file("penguins_raw.csv")) +``` + +:::::::::::::::::::::::::::::::::: + +:::::::::::::::::::::::::::::::::::::::::: + +Now that we've seen how to define the first target, let's continue and add the rest. + +Once you've done that, this is how `_targets.R` should look: ```{r} #| label = "targets-show-workflow", #| eval = FALSE, -#| code = readLines("files/plans/plan_1.R")[2:21] +#| code = readLines("files/plans/plan_0.R")[2:22] ``` I have set `show_col_types = FALSE` in `read_csv()` because we know from the earlier code that the column types were set correctly by default (character for species and numeric for bill length and depth), so we don't need to see the warning it would otherwise issue. @@ -224,13 +298,34 @@ Try running it, and you should see something like this: #| eval: true #| echo: [3] pushd(make_tempdir()) -write_example_plan("plan_1.R") +write_example_plan("plan_0.R") tar_make() popd() ``` Congratulations, you've run your first workflow with `targets`! 
+::::::::::::::::::::::::::::::::::::: {.callout} + +## The workflow cannot be run interactively + +You may be used to running R code interactively by selecting lines and pressing the "Run" button (or using the keyboard shortcut) in RStudio or your IDE of choice. + +You *could* run the list at the end of `_targets.R` this way, but it will not execute the workflow (it will return a list instead). + +**The only way to run the workflow is with `tar_make()`.** + +You do not need to select and run anything interactively in `_targets.R`. +In fact, you do not even need to have the `_targets.R` file open to run the workflow with `tar_make()`---try it for yourself! + +Similarly, **you must not write `tar_make()` in the `_targets.R` file**; you should only use `tar_make()` as a direct command at the R console. + +:::::::::::::::::::::::::::::::::::::::::: + +Remember, now that we are using `targets`, **the only thing you need to do to replicate your analysis is run `tar_make()`**. + +This is true no matter how long or complicated your analysis becomes. 
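To make the point of the callout concrete, here is a minimal sketch, assuming the `targets` package is installed. It suggests that calling `tar_target()` at the console only builds a description of a step, and that the command itself is not computed at that moment. The names `example_target` and `example_sum` are hypothetical and used only for illustration.

```r
library(targets)

# Defining a target stores its command; the expression `1 + 1`
# is not evaluated here.
example_target <- tar_target(example_sum, 1 + 1)

# The result is an object describing the step, not the number 2.
is.numeric(example_target)
class(example_target)
```

Running these lines does not create or update any stored results; in a real project that still happens only when `tar_make()` is called from the R console.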
+ ::::::::::::::::::::::::::::::::::::: keypoints - Projects help keep our analyses organized so we can easily re-run them later diff --git a/episodes/files/plans/plan_0.R b/episodes/files/plans/plan_0.R new file mode 100644 index 00000000..3825e719 --- /dev/null +++ b/episodes/files/plans/plan_0.R @@ -0,0 +1,22 @@ +options(tidyverse.quiet = TRUE) +library(targets) +library(tidyverse) +library(palmerpenguins) + +list( + tar_target(penguins_csv_file, path_to_file("penguins_raw.csv")), + tar_target( + penguins_data_raw, + read_csv(penguins_csv_file, show_col_types = FALSE) + ), + tar_target( + penguins_data, + penguins_data_raw |> + select( + species = Species, + bill_length_mm = `Culmen Length (mm)`, + bill_depth_mm = `Culmen Depth (mm)` + ) |> + drop_na() + ) +) From f840e254d20a8fca0b879132c1d02aa501994013 Mon Sep 17 00:00:00 2001 From: joelnitta Date: Thu, 26 Dec 2024 09:51:33 +0900 Subject: [PATCH 3/3] Add section in functions.Rmd showing how to use fun in workflow --- episodes/cache.Rmd | 2 +- episodes/functions.Rmd | 69 ++++++++++++++++++++++++++++++++++++------ 2 files changed, 60 insertions(+), 11 deletions(-) diff --git a/episodes/cache.Rmd b/episodes/cache.Rmd index c61e2a91..0a585295 100644 --- a/episodes/cache.Rmd +++ b/episodes/cache.Rmd @@ -35,7 +35,7 @@ Episode summary: Show how to get at the objects that we built ## Where does the workflow happen? -So we just finished running our first workflow. +So we just finished running our workflow. Now you probably want to look at its output. But, if we just call the name of the object (for example, `penguins_data`), we get an error. 
```{r} diff --git a/episodes/functions.Rmd b/episodes/functions.Rmd index f4656724..5ddd54e0 100644 --- a/episodes/functions.Rmd +++ b/episodes/functions.Rmd @@ -39,9 +39,7 @@ if (interactive()) { source("files/lesson_functions.R") ``` -## Create a function - -### About functions +## About functions Functions in R are something we are used to thinking of as something that comes from a package. You find, install and use specialized functions from packages to get your work done. @@ -58,10 +56,11 @@ Furthermore, `targets` makes extensive use of custom functions, so a basic under ### Writing a function There is not much difference between writing your own function and writing other code in R, you are still coding with R! -Let's imagine we want to convert the millimeter measurements in the Penguins data to centimeters. +Let's imagine we want to convert the millimeter measurements in the penguins data to centimeters. ```{r} #| label: targets-functions-problem +#| message: FALSE library(palmerpenguins) library(tidyverse) @@ -96,7 +95,14 @@ For our mm to cm conversion the function would look like so: mm2cm <- function(x) { x / 10 } -# use it +``` + +Our custom function will now transform any numerical input by dividing it by 10. + +Let's try it out: + +```{r} +#| label: targets-functions-cm-use penguins |> mutate( bill_length_cm = mm2cm(bill_length_mm), @@ -104,7 +110,7 @@ penguins |> ) ``` -Our custom function will now transform any numerical input by dividing it by 10. +Congratulations, you've created and used your first custom function! ### Make a function from existing code @@ -112,7 +118,7 @@ Many times, we might already have a piece of code that we'd like to use to creat For instance, we've copy-pasted a section of code several times and realize that this piece of code is repetitive, so a function is in order. Or, you are converting your workflow to `targets`, and need to change your script into a series of functions that `targets` will call. 
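As a minimal sketch of the first case, suppose the same unit conversion has been copy-pasted for two measurements. The vectors below hold made-up illustrative values, not the penguins data, and `mm2cm()` simply repeats the definition from earlier in this episode so that the example runs on its own.

```r
# Made-up measurements in millimeters (illustrative values only)
bill_length_mm <- c(39.1, 39.5, 40.3)
flipper_length_mm <- c(181, 186, 195)

# Repetitive version: the same arithmetic is written out twice
bill_length_cm <- bill_length_mm / 10
flipper_length_cm <- flipper_length_mm / 10

# Function version: the shared step is written once and reused
mm2cm <- function(x) {
  x / 10
}

bill_length_cm <- mm2cm(bill_length_mm)
flipper_length_cm <- mm2cm(flipper_length_mm)
```

If the conversion ever needs to change, only the body of `mm2cm()` has to be edited, rather than every copy of the arithmetic.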
-Recall the code snippet we had to clean our Penguins data: +Recall the code snippet we had to clean our penguins data: ```{r} #| label: code-to-convert-to-function @@ -142,6 +148,8 @@ clean_penguin_data <- function(penguins_data_raw) { } ``` +Add this function to `_targets.R` after the part where you load packages with `library()` and before the list at the end. + ::::::::::::::::: callout # RStudio function extraction @@ -178,9 +186,50 @@ vecmean <- function(x) { ::::::::::::::::::::::::::::::::::::: -Congratulations, you've started a whole new journey into functions! -This was a very brief introduction to functions, and you will likely need to get more help in learning about them. -There is an episode in the R Novice lesson from Carpentries that is [all about functions](https://swcarpentry.github.io/r-novice-gapminder/10-functions.html) which you might want to read. +## Using functions in the workflow + +Now that we've defined our custom data cleaning function, we can put it to use in the workflow. + +Can you see how this might be done? + +We need to delete the corresponding code from the last `tar_target()` and replace it with a call to the new function. + +Modify the workflow to look like this: + +```{r} +#| label = "targets-show-fun-add", +#| eval = FALSE, +#| code = readLines("files/plans/plan_1.R")[2:21] +``` + +We should run the workflow again with `tar_make()` to make sure it is up-to-date: + +```{r} +#| label: targets-run-fun +#| eval: true +#| echo: [5] +pushd(make_tempdir()) +write_example_plan("plan_0.R") +tar_make(reporter = "silent") +write_example_plan("plan_1.R") +tar_make() +popd() +``` + +We will learn more soon about the messages that `targets` prints out. + +## Functions make it easier to reason about code + +Notice that now the list of targets at the end is starting to look like a high-level summary of your analysis. 
+ +This is another advantage of using custom functions: **functions allow us to separate the details of each workflow step from the overall workflow**. + +To understand the overall workflow, you don't need to know all of the details about how the data were cleaned; you just need to know that there was a cleaning step. +On the other hand, if you do need to go back and delve into the specifics of the data cleaning, you only need to pay attention to what happens inside that function, and you can ignore the rest of the workflow. +**This makes it easier to reason about the code**, and will lead to fewer bugs and ultimately save you time and mental energy. + +Here we have only scratched the surface of functions, and you will likely need to get more help in learning about them. +For more information, we recommend the episode of the Carpentries R Novice lesson that is [all about functions](https://swcarpentry.github.io/r-novice-gapminder/10-functions.html). ::::::::::::::::::::::::::::::::::::: keypoints