From 71d36aee33180b478071adec42b1f67c0e34a289 Mon Sep 17 00:00:00 2001 From: joelnitta Date: Mon, 2 Dec 2024 20:09:18 +0900 Subject: [PATCH 01/10] Change branching episode to loop over groups --- episodes/branch.Rmd | 103 ++++++++++-------- episodes/files/plans/plan_5.R | 21 ++-- episodes/files/plans/plan_6.R | 23 ++-- episodes/files/plans/plan_6b.R | 28 +++++ episodes/files/tar_functions/model_augment.R | 16 +++ episodes/files/tar_functions/model_glance.R | 16 +++ .../files/tar_functions/model_glance_orig.R | 6 + 7 files changed, 145 insertions(+), 68 deletions(-) create mode 100644 episodes/files/plans/plan_6b.R create mode 100644 episodes/files/tar_functions/model_augment.R create mode 100644 episodes/files/tar_functions/model_glance.R create mode 100644 episodes/files/tar_functions/model_glance_orig.R diff --git a/episodes/branch.Rmd b/episodes/branch.Rmd index 23d4fcb5..b055392c 100644 --- a/episodes/branch.Rmd +++ b/episodes/branch.Rmd @@ -30,6 +30,14 @@ Episode summary: Show how to use branching library(targets) library(tarchetypes) library(broom) + +# sandpaper renders this lesson from episodes/ +# need to emulate this behavior during interactive development +# would be preferable to use here::here() but it doesn't work for some reason +if (interactive()) { + setwd("episodes") +} + source("files/lesson_functions.R") # Increase width for printing tibbles @@ -102,15 +110,14 @@ This seems to indicate that the model is highly significant. But wait a moment... is this really an appropriate model? Recall that there are three species of penguins in the dataset. It is possible that the relationship between bill depth and length **varies by species**. -We should probably test some alternative models. -These could include models that add a parameter for species, or add an interaction effect between species and bill length. 
+Let's try making one model *per* species (three models total) to see how each one performs (this is technically not the correct statistical approach, but our focus here is to learn `targets`, not statistics). Now our workflow is getting more complicated. This is what a workflow for such an analysis might look like **without branching** (make sure to add `library(broom)` to `packages.R`): ```{r} #| label = "example-model-show-1", #| eval = FALSE, -#| code = readLines("files/plans/plan_5.R")[2:31] +#| code = readLines("files/plans/plan_5.R")[2:36] ``` ```{r} @@ -133,19 +140,32 @@ Let's look at the summary of one of the models: #| eval: true #| echo: [2] pushd(plan_5_dir) -tar_read(species_summary) +tar_read(adelie_summary) popd() ``` So this way of writing the pipeline works, but is repetitive: we have to call `glance()` each time we want to obtain summary statistics for each model. -Furthermore, each summary target (`combined_summary`, etc.) is explicitly named and typed out manually. +Furthermore, each summary target (`adelie_summary`, etc.) is explicitly named and typed out manually. It would be fairly easy to make a typo and end up with the wrong model being summarized. +Before moving on, let's define another **custom function**: `model_glance()`. +You will need to write custom functions frequently when using `targets`, so it's good to get used to it! + +As the name `model_glance()` suggests (it is good to write functions with names that indicate their purpose), this will build a model then immediately run `glance()` on it. +The reason for doing so is that we get a **dataframe as a result**, which as previously mentioned is very helpful for branching, as we will see in the next section. 
+Save this in `R/functions.R`: + +```{r} +#| label = "model-glance", +#| eval = FALSE, +#| code = readLines("files/tar_functions/model_glance_orig.R") +``` + ## Example with branching ### First attempt -Let's see how to write the same plan using **dynamic branching**: +Let's see how to write the same plan using **dynamic branching** (after running it, we will go through the new version in detail to understand each step): ```{r} #| label = "example-model-show-3", @@ -165,63 +185,65 @@ pushd(plan_6_dir) # simulate already running the plan once write_example_plan("plan_5.R") tar_make(reporter = "silent") -write_example_plan("plan_6.R") +# run version of plan that uses `model_glance_orig()` (doesn't include species +# names in output) +write_example_plan("plan_6b.R") tar_make() -example_branch_name <- tar_branch_names(model_summaries, 1) +example_branch_name <- tar_branch_names(species_summary, 1) popd() ``` -There is a series of smaller targets (branches) that are each named like `r example_branch_name`, then one overall `model_summaries` target. +There is a series of smaller targets (branches) that are each named like `r example_branch_name`, then one overall `species_summary` target. That is the result of specifying targets using branching: each of the smaller targets are the "branches" that comprise the overall target. Since `targets` has no way of knowing ahead of time how many branches there will be or what they represent, it names each one using this series of numbers and letters (the "hash"). `targets` builds each branch one at a time, then combines them into the overall target. 
-Next, let's look in more detail about how the workflow is set up, starting with how we defined the models: +Next, let's look in more detail at how the workflow is set up, starting with how we set up the data: ```{r} #| label = "model-def", -#| code = readLines("files/plans/plan_6.R")[14:22], +#| code = readLines("files/plans/plan_6.R")[14:19], #| eval = FALSE ``` -Unlike the non-branching version, we defined the models **in a list** (instead of one target per model). -This is because dynamic branching is similar to the `base::apply()` or [`purrrr::map()`](https://purrr.tidyverse.org/reference/map.html) method of looping: it applies a function to each element of a list. -So we need to prepare the input for looping as a list. +Unlike the non-branching version, we added a step that **groups the data**. +This is because dynamic branching is similar to the [`tidyverse` approach](https://dplyr.tidyverse.org/articles/grouping.html) of applying the same function to a grouped dataframe. +So we use the `tar_group_by()` function to specify the groups in our input data: one group per species. -Next, take a look at the command to build the target `model_summaries`. +Next, take a look at the command to build the target `species_summary`. ```{r} #| label = "model-summaries", -#| code = readLines("files/plans/plan_6.R")[23:28], +#| code = readLines("files/plans/plan_6.R")[22:27], #| eval = FALSE ``` -As before, the first argument is the name of the target to build, and the second is the command to build it. +As before, the first argument to `tar_target()` is the name of the target to build, and the second is the command to build it. -Here, we apply the `glance()` function to each element of `models` (the `[[1]]` is necessary because when the function gets applied, each element is actually a nested list, and we need to remove one layer of nesting). +Here, we apply our custom `model_glance()` function to each group (in other words, each species) in `penguins_data_grouped`. 
Finally, there is an argument we haven't seen before, `pattern`, which indicates that this target should be built using dynamic branching. -`map` means to apply the command to each element of the input list (`models`) sequentially. +`map` means to apply the function to each group of the input data (`penguins_data_grouped`) sequentially. Now that we understand how the branching workflow is constructed, let's inspect the output: ```{r} #| label: example-model-show-4 #| eval: FALSE -tar_read(model_summaries) +tar_read(species_summary) ``` ```{r} #| label: example-model-hide-4 #| echo: FALSE pushd(plan_6_dir) -tar_read(model_summaries) +tar_read(species_summary) popd() ``` The model summary statistics are all included in a single dataframe. -But there's one problem: **we can't tell which row came from which model!** It would be unwise to assume that they are in the same order as the list of models. +But there's one problem: **we can't tell which row came from which species!** It would be unwise to assume that they are in the same order as the input data. This is due to the way dynamic branching works: by default, there is no information about the provenance of each target preserved in the output. @@ -230,58 +252,43 @@ How can we fix this? ### Second attempt The key to obtaining useful output from branching pipelines is to include the necessary information in the output of each individual branch. -Here, we want to know the kind of model that corresponds to each row of the model summaries. -To do that, we need to write a **custom function**. -You will need to write custom functions frequently when using `targets`, so it's good to get used to it! +Here, we want to know the species that corresponds to each row of the model summaries. -Here is the function. Save this in `R/functions.R`: +We can achieve this by modifying our `model_glance` function. 
Modify it to include a column for species, and be sure to save the file afterwards: ```{r} #| label: example-model-show-5 #| eval: FALSE -#| file: files/tar_functions/glance_with_mod_name.R +#| file: files/tar_functions/model_glance.R ``` -Our new pipeline looks almost the same as before, but this time we use the custom function instead of `glance()`. +Our new pipeline looks exactly the same as before; we have made a modification, but to a **function**, not the pipeline. -```{r} -#| label = "example-model-show-6", -#| code = readLines("files/plans/plan_7.R")[2:29], -#| eval = FALSE -``` +Since `targets` tracks the contents of each custom function, it realizes that it needs to recompute `species_summary` and runs this target again with the newly modified function. ```{r} #| label: example-model-hide-6 #| echo: FALSE pushd(plan_6_dir) -write_example_plan("plan_7.R") +write_example_plan("plan_6.R") tar_make() popd() ``` -And this time, when we load the `model_summaries`, we can tell which model corresponds to which row (you may need to scroll to the right to see it). +And this time, when we load `species_summary`, we can tell which species corresponds to which row (the `.before = 1` in `mutate()` ensures that the species column shows up before the other columns). ```{r} #| label: example-model-7 #| echo: [2] #| warning: false pushd(plan_6_dir) -tar_read(model_summaries) +tar_read(species_summary) popd() ``` Next we will add one more target, a prediction of bill depth based on each model. These will be needed for plotting the models in the report. -Such a prediction can be obtained with the `augment()` function of the `broom` package. +Such a prediction can be obtained with the `augment()` function of the `broom` package, and we will create a custom function that outputs the predicted points as a dataframe, much like we did for the model summaries. 
-```{r} -#| label: example-augment -#| echo: [2, 3] -#| eval: true -pushd(plan_6_dir) -tar_load(models) -augment(models[[1]]) -popd() -``` ::::::::::::::::::::::::::::::::::::: {.challenge} Can you add the model predictions using `augment()`? You will need to define a custom function. :::::::::::::::::::::::::::::::::: {.solution} -Define the new function as `augment_with_mod_name()`. It is the same as `glance_with_mod_name()`, but use `augment()` instead of `glance()`: +Define the new function as `model_augment()`. It is the same as `model_glance()`, but uses `augment()` instead of `glance()`: ```{r} #| label: example-model-augment-func #| eval: FALSE -#| file: files/tar_functions/augment_with_mod_name.R +#| file: files/tar_functions/model_augment.R ``` Add the step to the workflow: diff --git a/episodes/files/plans/plan_5.R b/episodes/files/plans/plan_5.R index 882876cc..cecaae2b 100644 --- a/episodes/files/plans/plan_5.R +++ b/episodes/files/plans/plan_5.R @@ -16,16 +16,21 @@ tar_plan( bill_depth_mm ~ bill_length_mm, data = penguins_data ), - species_model = lm( - bill_depth_mm ~ bill_length_mm + species, - data = penguins_data + adelie_model = lm( + bill_depth_mm ~ bill_length_mm, + data = filter(penguins_data, species == "Adelie") ), - interaction_model = lm( - bill_depth_mm ~ bill_length_mm * species, - data = penguins_data + chinstrap_model = lm( + bill_depth_mm ~ bill_length_mm, + data = filter(penguins_data, species == "Chinstrap") + ), + gentoo_model = lm( + bill_depth_mm ~ bill_length_mm, + data = filter(penguins_data, species == "Gentoo") ), # Get model summaries combined_summary = glance(combined_model), - species_summary = glance(species_model), - interaction_summary = glance(interaction_model) + adelie_summary = glance(adelie_model), + chinstrap_summary = glance(chinstrap_model), + gentoo_summary = glance(gentoo_model) ) diff --git a/episodes/files/plans/plan_6.R b/episodes/files/plans/plan_6.R index fad7536b..33f30d95 100644 ---
a/episodes/files/plans/plan_6.R +++ b/episodes/files/plans/plan_6.R @@ -11,19 +11,18 @@ tar_plan( ), # Clean data penguins_data = clean_penguin_data(penguins_data_raw), - # Build models - models = list( - combined_model = lm( - bill_depth_mm ~ bill_length_mm, data = penguins_data), - species_model = lm( - bill_depth_mm ~ bill_length_mm + species, data = penguins_data), - interaction_model = lm( - bill_depth_mm ~ bill_length_mm * species, data = penguins_data) + # Group data + tar_group_by( + penguins_data_grouped, + penguins_data, + species ), - # Get model summaries + # Build combined model with all species together + combined_summary = model_glance(penguins_data), + # Build one model per species tar_target( - model_summaries, - glance(models[[1]]), - pattern = map(models) + species_summary, + model_glance(penguins_data_grouped), + pattern = map(penguins_data_grouped) ) ) diff --git a/episodes/files/plans/plan_6b.R b/episodes/files/plans/plan_6b.R new file mode 100644 index 00000000..28ac909c --- /dev/null +++ b/episodes/files/plans/plan_6b.R @@ -0,0 +1,28 @@ +options(tidyverse.quiet = TRUE) +source("R/packages.R") +source("R/functions.R") + +tar_plan( + # Load raw data + tar_file_read( + penguins_data_raw, + path_to_file("penguins_raw.csv"), + read_csv(!!.x, show_col_types = FALSE) + ), + # Clean data + penguins_data = clean_penguin_data(penguins_data_raw), + # Group data + tar_group_by( + penguins_data_grouped, + penguins_data, + species + ), + # Build combined model with all species together + combined_summary = model_glance_orig(penguins_data), + # Build one model per species + tar_target( + species_summary, + model_glance_orig(penguins_data_grouped), + pattern = map(penguins_data_grouped) + ) +) diff --git a/episodes/files/tar_functions/model_augment.R b/episodes/files/tar_functions/model_augment.R new file mode 100644 index 00000000..68d65591 --- /dev/null +++ b/episodes/files/tar_functions/model_augment.R @@ -0,0 +1,16 @@ +model_glance <- 
function(penguins_data) { + # Make model + model <- lm( + bill_depth_mm ~ bill_length_mm, + data = penguins_data) + # Get species name + species_name <- unique(penguins_data$species) + # If this is the combined dataset with multiple + # species, change name to 'combined' + if (length(species_name) > 1) { + species_name <- "combined" + } + # Get model predictions and add species name + augment(model) |> + mutate(species = species_name, .before = 1) +} diff --git a/episodes/files/tar_functions/model_glance.R b/episodes/files/tar_functions/model_glance.R new file mode 100644 index 00000000..c324161f --- /dev/null +++ b/episodes/files/tar_functions/model_glance.R @@ -0,0 +1,16 @@ +model_glance <- function(penguins_data) { + # Make model + model <- lm( + bill_depth_mm ~ bill_length_mm, + data = penguins_data) + # Get species name + species_name <- unique(penguins_data$species) + # If this is the combined dataset with multiple + # species, change name to 'combined' + if (length(species_name) > 1) { + species_name <- "combined" + } + # Get model summary and add species name + glance(model) |> + mutate(species = species_name, .before = 1) +} diff --git a/episodes/files/tar_functions/model_glance_orig.R b/episodes/files/tar_functions/model_glance_orig.R new file mode 100644 index 00000000..a0c3fdd4 --- /dev/null +++ b/episodes/files/tar_functions/model_glance_orig.R @@ -0,0 +1,6 @@ +model_glance_orig <- function(penguins_data) { + model <- lm( + bill_depth_mm ~ bill_length_mm, + data = penguins_data) + broom::glance(model) +} From 7f83f51695253a8249e0e1257734a0e27d5d5e25 Mon Sep 17 00:00:00 2001 From: joelnitta Date: Thu, 12 Dec 2024 07:35:04 +0900 Subject: [PATCH 02/10] Update parallelization episode --- episodes/branch.Rmd | 34 +++++- episodes/files/plans/plan_10.R | 35 +++--- episodes/files/plans/plan_6c.R | 34 ++++++ episodes/files/plans/plan_7.R | 31 ++++-- episodes/files/plans/plan_8.R | 35 +++--- episodes/files/plans/plan_9.R | 35 +++--- 
episodes/files/tar_functions/model_augment.R | 2 +- .../files/tar_functions/model_augment_slow.R | 17 +++ .../files/tar_functions/model_glance_slow.R | 17 +++ episodes/parallel.Rmd | 67 +++++++----- renv/activate.R | 103 ++++++++++++++++-- 11 files changed, 301 insertions(+), 109 deletions(-) create mode 100644 episodes/files/plans/plan_6c.R create mode 100644 episodes/files/tar_functions/model_augment_slow.R create mode 100644 episodes/files/tar_functions/model_glance_slow.R diff --git a/episodes/branch.Rmd b/episodes/branch.Rmd index b055392c..2c646125 100644 --- a/episodes/branch.Rmd +++ b/episodes/branch.Rmd @@ -1,6 +1,6 @@ --- title: 'Branching' -teaching: 10 +teaching: 30 exercises: 2 --- @@ -152,7 +152,7 @@ Before moving on, let's define another **custom function**: `model_glance()`. You will need to write custom functions frequently when using `targets`, so it's good to get used to it! As the name `model_glance()` suggests (it is good to write functions with names that indicate their purpose), this will build a model then immediately run `glance()` on it. -The reason for doing so is that we get a **dataframe as a result**, which as previously mentioned is very helpful for branching, as we will see in the next section. +The reason for doing so is that we get a **dataframe as a result**, which is very helpful for branching, as we will see in the next section. Save this in `R/functions.R`: ```{r} @@ -310,7 +310,7 @@ Add the step to the workflow: ```{r} #| label = "example-model-augment-show", -#| code = readLines("files/plans/plan_8.R")[2:35], +#| code = readLines("files/plans/plan_7.R")[2:36], #| eval = FALSE ``` @@ -318,6 +318,34 @@ Add the step to the workflow: ::::::::::::::::::::::::::::::::::::: +### Further simplify the workflow + +You may have noticed that we can further simplify the workflow: there is no need to have separate `penguins_data` and `penguins_data_grouped` dataframes. 
+In general it is best to keep the number of named objects as small as possible to make it easier to reason about your code. +Let's combine the cleaning and grouping step into a single command: + +```{r} +#| label = "example-model-show-8", +#| eval = FALSE, +#| code = readLines("files/plans/plan_8.R")[2:35] +``` + +And run it once more: + +```{r} +#| label: example-model-hide-8 +#| echo: false +pushd(plan_6_dir) +# simulate already running the plan once +write_example_plan("plan_7.R") +tar_make(reporter = "silent") +# run version of plan that combines the cleaning and grouping steps +write_example_plan("plan_8.R") +tar_make() +popd() +``` + ::::::::::::::::::::::::::::::::::::: {.callout} ## Best practices for branching diff --git a/episodes/files/plans/plan_10.R b/episodes/files/plans/plan_10.R index be92fd01..59bcfed9 100644 --- a/episodes/files/plans/plan_10.R +++ b/episodes/files/plans/plan_10.R @@ -16,27 +16,26 @@ tar_plan( path_to_file("penguins_raw.csv"), read_csv(!!.x, show_col_types = FALSE) ), - # Clean data - penguins_data = clean_penguin_data(penguins_data_raw), - # Build models - models = list( - combined_model = lm( - bill_depth_mm ~ bill_length_mm, data = penguins_data), - species_model = lm( - bill_depth_mm ~ bill_length_mm + species, data = penguins_data), - interaction_model = lm( - bill_depth_mm ~ bill_length_mm * species, data = penguins_data) + # Clean and group data + tar_group_by( + penguins_data, + clean_penguin_data(penguins_data_raw), + species ), - # Get model summaries + # Get summary of combined model with all species together + combined_summary = model_glance(penguins_data), + # Get summary of one model per species tar_target( - model_summaries, - glance_with_mod_name_slow(models), - pattern = map(models) + species_summary, + model_glance_slow(penguins_data), + pattern = map(penguins_data) ), - # Get model predictions + # Get predictions of combined model with all species together + combined_predictions = 
model_augment_slow(penguins_data), + # Get predictions of one model per species tar_target( - model_predictions, - augment_with_mod_name_slow(models), - pattern = map(models) + species_predictions, + model_augment_slow(penguins_data), + pattern = map(penguins_data) ) ) diff --git a/episodes/files/plans/plan_6c.R b/episodes/files/plans/plan_6c.R new file mode 100644 index 00000000..8b72fa69 --- /dev/null +++ b/episodes/files/plans/plan_6c.R @@ -0,0 +1,34 @@ +options(tidyverse.quiet = TRUE) +source("R/functions.R") +source("R/packages.R") + +tar_plan( + # Load raw data + tar_file_read( + penguins_data_raw, + path_to_file("penguins_raw.csv"), + read_csv(!!.x, show_col_types = FALSE) + ), + # Clean and group data + tar_group_by( + penguins_data, + clean_penguin_data(penguins_data_raw), + species + ), + # Get summary of combined model with all species together + combined_summary = model_glance(penguins_data), + # Get summary of one model per species + tar_target( + species_summary, + model_glance(penguins_data), + pattern = map(penguins_data) + ), + # Get predictions of combined model with all species together + combined_predictions = model_augment(penguins_data), + # Get predictions of one model per species + tar_target( + species_predictions, + model_augment(penguins_data), + pattern = map(penguins_data) + ) +) diff --git a/episodes/files/plans/plan_7.R b/episodes/files/plans/plan_7.R index 346cca74..da5f7bc5 100644 --- a/episodes/files/plans/plan_7.R +++ b/episodes/files/plans/plan_7.R @@ -11,19 +11,26 @@ tar_plan( ), # Clean data penguins_data = clean_penguin_data(penguins_data_raw), - # Build models - models = list( - combined_model = lm( - bill_depth_mm ~ bill_length_mm, data = penguins_data), - species_model = lm( - bill_depth_mm ~ bill_length_mm + species, data = penguins_data), - interaction_model = lm( - bill_depth_mm ~ bill_length_mm * species, data = penguins_data) + # Group data + tar_group_by( + penguins_data_grouped, + penguins_data, + species ), - # Get 
model summaries + # Get summary of combined model with all species together + combined_summary = model_glance(penguins_data), + # Get summary of one model per species tar_target( - model_summaries, - glance_with_mod_name(models), - pattern = map(models) + species_summary, + model_glance(penguins_data_grouped), + pattern = map(penguins_data_grouped) + ), + # Get predictions of combined model with all species together + combined_predictions = model_augment(penguins_data_grouped), + # Get predictions of one model per species + tar_target( + species_predictions, + model_augment(penguins_data_grouped), + pattern = map(penguins_data_grouped) ) ) diff --git a/episodes/files/plans/plan_8.R b/episodes/files/plans/plan_8.R index 8a6779ef..8b72fa69 100644 --- a/episodes/files/plans/plan_8.R +++ b/episodes/files/plans/plan_8.R @@ -9,27 +9,26 @@ tar_plan( path_to_file("penguins_raw.csv"), read_csv(!!.x, show_col_types = FALSE) ), - # Clean data - penguins_data = clean_penguin_data(penguins_data_raw), - # Build models - models = list( - combined_model = lm( - bill_depth_mm ~ bill_length_mm, data = penguins_data), - species_model = lm( - bill_depth_mm ~ bill_length_mm + species, data = penguins_data), - interaction_model = lm( - bill_depth_mm ~ bill_length_mm * species, data = penguins_data) + # Clean and group data + tar_group_by( + penguins_data, + clean_penguin_data(penguins_data_raw), + species ), - # Get model summaries + # Get summary of combined model with all species together + combined_summary = model_glance(penguins_data), + # Get summary of one model per species tar_target( - model_summaries, - glance_with_mod_name(models), - pattern = map(models) + species_summary, + model_glance(penguins_data), + pattern = map(penguins_data) ), - # Get model predictions + # Get predictions of combined model with all species together + combined_predictions = model_augment(penguins_data), + # Get predictions of one model per species tar_target( - model_predictions, - 
augment_with_mod_name(models), - pattern = map(models) + species_predictions, + model_augment(penguins_data), + pattern = map(penguins_data) ) ) diff --git a/episodes/files/plans/plan_9.R b/episodes/files/plans/plan_9.R index 164359b1..99958265 100644 --- a/episodes/files/plans/plan_9.R +++ b/episodes/files/plans/plan_9.R @@ -16,27 +16,26 @@ tar_plan( path_to_file("penguins_raw.csv"), read_csv(!!.x, show_col_types = FALSE) ), - # Clean data - penguins_data = clean_penguin_data(penguins_data_raw), - # Build models - models = list( - combined_model = lm( - bill_depth_mm ~ bill_length_mm, data = penguins_data), - species_model = lm( - bill_depth_mm ~ bill_length_mm + species, data = penguins_data), - interaction_model = lm( - bill_depth_mm ~ bill_length_mm * species, data = penguins_data) + # Clean and group data + tar_group_by( + penguins_data, + clean_penguin_data(penguins_data_raw), + species ), - # Get model summaries + # Get summary of combined model with all species together + combined_summary = model_glance(penguins_data), + # Get summary of one model per species tar_target( - model_summaries, - glance_with_mod_name(models), - pattern = map(models) + species_summary, + model_glance(penguins_data), + pattern = map(penguins_data) ), - # Get model predictions + # Get predictions of combined model with all species together + combined_predictions = model_augment(penguins_data), + # Get predictions of one model per species tar_target( - model_predictions, - augment_with_mod_name(models), - pattern = map(models) + species_predictions, + model_augment(penguins_data), + pattern = map(penguins_data) ) ) diff --git a/episodes/files/tar_functions/model_augment.R b/episodes/files/tar_functions/model_augment.R index 68d65591..68875d00 100644 --- a/episodes/files/tar_functions/model_augment.R +++ b/episodes/files/tar_functions/model_augment.R @@ -1,4 +1,4 @@ -model_glance <- function(penguins_data) { +model_augment <- function(penguins_data) { # Make model model <- lm( 
bill_depth_mm ~ bill_length_mm, diff --git a/episodes/files/tar_functions/model_augment_slow.R b/episodes/files/tar_functions/model_augment_slow.R new file mode 100644 index 00000000..8dd99fe6 --- /dev/null +++ b/episodes/files/tar_functions/model_augment_slow.R @@ -0,0 +1,17 @@ +model_augment_slow <- function(penguins_data) { + Sys.sleep(4) + # Make model + model <- lm( + bill_depth_mm ~ bill_length_mm, + data = penguins_data) + # Get species name + species_name <- unique(penguins_data$species) + # If this is the combined dataset with multiple + # species, change name to 'combined' + if (length(species_name) > 1) { + species_name <- "combined" + } + # Get model predictions and add species name + augment(model) |> + mutate(species = species_name, .before = 1) +} diff --git a/episodes/files/tar_functions/model_glance_slow.R b/episodes/files/tar_functions/model_glance_slow.R new file mode 100644 index 00000000..ba37fe66 --- /dev/null +++ b/episodes/files/tar_functions/model_glance_slow.R @@ -0,0 +1,17 @@ +model_glance_slow <- function(penguins_data) { + Sys.sleep(4) + # Make model + model <- lm( + bill_depth_mm ~ bill_length_mm, + data = penguins_data) + # Get species name + species_name <- unique(penguins_data$species) + # If this is the combined dataset with multiple + # species, change name to 'combined' + if (length(species_name) > 1) { + species_name <- "combined" + } + # Get model summary and add species name + glance(model) |> + mutate(species = species_name, .before = 1) +} diff --git a/episodes/parallel.Rmd b/episodes/parallel.Rmd index 1bdcb79b..4b458fc4 100644 --- a/episodes/parallel.Rmd +++ b/episodes/parallel.Rmd @@ -1,6 +1,6 @@ --- title: 'Parallel Processing' -teaching: 10 +teaching: 15 exercises: 2 --- @@ -30,6 +30,11 @@ Episode summary: Show how to use parallel processing library(targets) library(tarchetypes) library(broom) + +if (interactive()) { + setwd("episodes") +} + source("files/lesson_functions.R") # Increase width for printing tibbles @@ -76,7 
+81,7 @@ It should now look like this: There is still one more thing we need to modify only for the purposes of this demo: if we ran the analysis in parallel now, you wouldn't notice any difference in compute time because the functions are so fast. -So let's make "slow" versions of `glance_with_mod_name()` and `augment_with_mod_name()` using the `Sys.sleep()` function, which just tells the computer to wait some number of seconds. +So let's make "slow" versions of `model_glance()` and `model_augment()` using the `Sys.sleep()` function, which just tells the computer to wait some number of seconds. This will simulate a long-running computation and enable us to see the difference between running sequentially and in parallel. Add these functions to `functions.R` (you can copy-paste the original ones, then modify them): @@ -85,8 +90,8 @@ Add these functions to `functions.R` (you can copy-paste the original ones, then #| label: slow-funcs #| eval: false #| file: -#| - files/tar_functions/glance_with_mod_name_slow.R -#| - files/tar_functions/augment_with_mod_name_slow.R +#| - files/tar_functions/model_glance_slow.R +#| - files/tar_functions/model_augment_slow.R ``` Then, change the plan to use the "slow" version of the functions: @@ -109,34 +114,36 @@ Finally, run the pipeline with `tar_make()` as normal. 
# with sandpaper::build_lesson(), even though it only uses 2 when run # interactively # -# plan_10_dir <- make_tempdir() -# pushd(plan_10_dir) -# write_example_plan("plan_9.R") -# tar_make(reporter = "silent") -# write_example_plan("plan_10.R") -# tar_make() -# popd() +plan_10_dir <- make_tempdir() +pushd(plan_10_dir) +write_example_plan("plan_9.R") +tar_make(reporter = "silent") +write_example_plan("plan_10.R") +tar_make() +popd() # Solution for now is to hard-code output -cat("✔ skip target penguins_data_raw_file -✔ skip target penguins_data_raw -✔ skip target penguins_data -✔ skip target models -• start branch model_predictions_5ad4cec5 -• start branch model_predictions_c73912d5 -• start branch model_predictions_91696941 -• start branch model_summaries_5ad4cec5 -• start branch model_summaries_c73912d5 -• start branch model_summaries_91696941 -• built branch model_predictions_5ad4cec5 [4.884 seconds] -• built branch model_predictions_c73912d5 [4.896 seconds] -• built branch model_predictions_91696941 [4.006 seconds] -• built pattern model_predictions -• built branch model_summaries_5ad4cec5 [4.011 seconds] -• built branch model_summaries_c73912d5 [4.011 seconds] -• built branch model_summaries_91696941 [4.011 seconds] -• built pattern model_summaries -• end pipeline [15.153 seconds]") +cat("✔ skipped target penguins_data_raw_file +✔ skipped target penguins_data_raw +✔ skipped target penguins_data +✔ skipped target combined_summary +▶ dispatched branch species_summary_1598bb4431372f32 +▶ dispatched branch species_summary_6b9109ba2e9d27fd +● completed branch species_summary_1598bb4431372f32 [4.815 seconds, 367 bytes] +▶ dispatched branch species_summary_625f9fbc7f62298a +● completed branch species_summary_6b9109ba2e9d27fd [4.813 seconds, 370 bytes] +▶ dispatched target combined_predictions +● completed branch species_summary_625f9fbc7f62298a [4.01 seconds, 367 bytes] +● completed pattern species_summary +▶ dispatched branch species_predictions_1598bb4431372f32 +● 
completed target combined_predictions [4.012 seconds, 370 bytes] +▶ dispatched branch species_predictions_6b9109ba2e9d27fd +● completed branch species_predictions_1598bb4431372f32 [4.014 seconds, 11.585 kilobytes] +▶ dispatched branch species_predictions_625f9fbc7f62298a +● completed branch species_predictions_6b9109ba2e9d27fd [4.01 seconds, 6.25 kilobytes] +● completed branch species_predictions_625f9fbc7f62298a [4.007 seconds, 9.628 kilobytes] +● completed pattern species_predictions +▶ ended pipeline [19.363 seconds]") ``` Notice that although the time required to build each individual target is about 4 seconds, the total time to run the entire workflow is less than the sum of the individual target times! That is proof that processes are running in parallel **and saving you time**. diff --git a/renv/activate.R b/renv/activate.R index d13f9932..8638f7fe 100644 --- a/renv/activate.R +++ b/renv/activate.R @@ -98,6 +98,66 @@ local({ unloadNamespace("renv") # load bootstrap tools + ansify <- function(text) { + if (renv_ansify_enabled()) + renv_ansify_enhanced(text) + else + renv_ansify_default(text) + } + + renv_ansify_enabled <- function() { + + override <- Sys.getenv("RENV_ANSIFY_ENABLED", unset = NA) + if (!is.na(override)) + return(as.logical(override)) + + pane <- Sys.getenv("RSTUDIO_CHILD_PROCESS_PANE", unset = NA) + if (identical(pane, "build")) + return(FALSE) + + testthat <- Sys.getenv("TESTTHAT", unset = "false") + if (tolower(testthat) %in% "true") + return(FALSE) + + iderun <- Sys.getenv("R_CLI_HAS_HYPERLINK_IDE_RUN", unset = "false") + if (tolower(iderun) %in% "false") + return(FALSE) + + TRUE + + } + + renv_ansify_default <- function(text) { + text + } + + renv_ansify_enhanced <- function(text) { + + # R help links + pattern <- "`\\?(renv::(?:[^`])+)`" + replacement <- "`\033]8;;ide:help:\\1\a?\\1\033]8;;\a`" + text <- gsub(pattern, replacement, text, perl = TRUE) + + # runnable code + pattern <- "`(renv::(?:[^`])+)`" + replacement <- 
"`\033]8;;ide:run:\\1\a\\1\033]8;;\a`" + text <- gsub(pattern, replacement, text, perl = TRUE) + + # return ansified text + text + + } + + renv_ansify_init <- function() { + + envir <- renv_envir_self() + if (renv_ansify_enabled()) + assign("ansify", renv_ansify_enhanced, envir = envir) + else + assign("ansify", renv_ansify_default, envir = envir) + + } + `%||%` <- function(x, y) { if (is.null(x)) y else x } @@ -142,7 +202,10 @@ local({ # compute common indent indent <- regexpr("[^[:space:]]", lines) common <- min(setdiff(indent, -1L)) - leave - paste(substring(lines, common), collapse = "\n") + text <- paste(substring(lines, common), collapse = "\n") + + # substitute in ANSI links for executable renv code + ansify(text) } @@ -305,8 +368,11 @@ local({ quiet = TRUE ) - if ("headers" %in% names(formals(utils::download.file))) - args$headers <- renv_bootstrap_download_custom_headers(url) + if ("headers" %in% names(formals(utils::download.file))) { + headers <- renv_bootstrap_download_custom_headers(url) + if (length(headers) && is.character(headers)) + args$headers <- headers + } do.call(utils::download.file, args) @@ -385,10 +451,21 @@ local({ for (type in types) { for (repos in renv_bootstrap_repos()) { + # build arguments for utils::available.packages() call + args <- list(type = type, repos = repos) + + # add custom headers if available -- note that + # utils::available.packages() will pass this to download.file() + if ("headers" %in% names(formals(utils::download.file))) { + headers <- renv_bootstrap_download_custom_headers(repos) + if (length(headers) && is.character(headers)) + args$headers <- headers + } + # retrieve package database db <- tryCatch( as.data.frame( - utils::available.packages(type = type, repos = repos), + do.call(utils::available.packages, args), stringsAsFactors = FALSE ), error = identity @@ -470,6 +547,14 @@ local({ } + renv_bootstrap_github_token <- function() { + for (envvar in c("GITHUB_TOKEN", "GITHUB_PAT", "GH_TOKEN")) { + envval <- 
Sys.getenv(envvar, unset = NA) + if (!is.na(envval)) + return(envval) + } + } + renv_bootstrap_download_github <- function(version) { enabled <- Sys.getenv("RENV_BOOTSTRAP_FROM_GITHUB", unset = "TRUE") @@ -477,16 +562,16 @@ local({ return(FALSE) # prepare download options - pat <- Sys.getenv("GITHUB_PAT") - if (nzchar(Sys.which("curl")) && nzchar(pat)) { + token <- renv_bootstrap_github_token() + if (nzchar(Sys.which("curl")) && nzchar(token)) { fmt <- "--location --fail --header \"Authorization: token %s\"" - extra <- sprintf(fmt, pat) + extra <- sprintf(fmt, token) saved <- options("download.file.method", "download.file.extra") options(download.file.method = "curl", download.file.extra = extra) on.exit(do.call(base::options, saved), add = TRUE) - } else if (nzchar(Sys.which("wget")) && nzchar(pat)) { + } else if (nzchar(Sys.which("wget")) && nzchar(token)) { fmt <- "--header=\"Authorization: token %s\"" - extra <- sprintf(fmt, pat) + extra <- sprintf(fmt, token) saved <- options("download.file.method", "download.file.extra") options(download.file.method = "wget", download.file.extra = extra) on.exit(do.call(base::options, saved), add = TRUE) From cc6cb538aa703279dc4a2fcdf4ebbc8286399911 Mon Sep 17 00:00:00 2001 From: joelnitta Date: Tue, 24 Dec 2024 07:29:02 +0900 Subject: [PATCH 03/10] Set wd for interactive sessions --- episodes/quarto.Rmd | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/episodes/quarto.Rmd b/episodes/quarto.Rmd index e9d724a6..55e1d8d5 100644 --- a/episodes/quarto.Rmd +++ b/episodes/quarto.Rmd @@ -30,6 +30,11 @@ Episode summary: Show how to write reports with Quarto library(targets) library(tarchetypes) library(quarto) # don't actually need to load, but put here so renv catches it + +if (interactive()) { + setwd("episodes") +} + source("files/lesson_functions.R") # Increase width for printing tibbles From 5d1191638d35b2634daad201ec99746dbb347c29 Mon Sep 17 00:00:00 2001 From: joelnitta Date: Tue, 24 Dec 2024 07:29:16 +0900 Subject: 
[PATCH 04/10] Update plan 11 --- episodes/files/plans/plan_11.R | 38 ++++++++++++++++------------------ 1 file changed, 18 insertions(+), 20 deletions(-) diff --git a/episodes/files/plans/plan_11.R b/episodes/files/plans/plan_11.R index 5c9af52f..fa028596 100644 --- a/episodes/files/plans/plan_11.R +++ b/episodes/files/plans/plan_11.R @@ -9,34 +9,32 @@ tar_plan( path_to_file("penguins_raw.csv"), read_csv(!!.x, show_col_types = FALSE) ), - # Clean data - penguins_data = clean_penguin_data(penguins_data_raw), - # Build models - models = list( - combined_model = lm( - bill_depth_mm ~ bill_length_mm, data = penguins_data), - species_model = lm( - bill_depth_mm ~ bill_length_mm + species, data = penguins_data), - interaction_model = lm( - bill_depth_mm ~ bill_length_mm * species, data = penguins_data) + # Clean and group data + tar_group_by( + penguins_data, + clean_penguin_data(penguins_data_raw), + species ), - # Get model summaries + # Get summary of combined model with all species together + combined_summary = model_glance(penguins_data), + # Get summary of one model per species tar_target( - model_summaries, - glance_with_mod_name(models), - pattern = map(models) + species_summary, + model_glance(penguins_data), + pattern = map(penguins_data) ), - # Get model predictions + # Get predictions of combined model with all species together + combined_predictions = model_glance(penguins_data), + # Get predictions of one model per species tar_target( - model_predictions, - augment_with_mod_name(models), - pattern = map(models) + species_predictions, + model_augment(penguins_data), + pattern = map(penguins_data) ), # Generate report tar_quarto( penguin_report, path = "penguin_report.qmd", - quiet = FALSE, - packages = c("targets", "tidyverse") + quiet = FALSE ) ) From 8725b9d84f26b78c299110a09289af1ff6b1b133 Mon Sep 17 00:00:00 2001 From: carpenter Date: Tue, 24 Dec 2024 12:15:46 +0900 Subject: [PATCH 05/10] Fix duplicated chunk label --- episodes/branch.Rmd | 6 +++--- 1 
file changed, 3 insertions(+), 3 deletions(-) diff --git a/episodes/branch.Rmd b/episodes/branch.Rmd index 2c646125..9ec33475 100644 --- a/episodes/branch.Rmd +++ b/episodes/branch.Rmd @@ -333,7 +333,7 @@ Let's combine the cleaning and grouping step into a single command: And run it once more: ```{r} -#| label: example-model-show-8 +#| label: example-model-hide-8 #| echo: false pushd(plan_6_dir) # simulate already running the plan once @@ -350,9 +350,9 @@ popd() ## Best practices for branching -Dynamic branching is designed to work well with **dataframes** (tibbles). +Dynamic branching is designed to work well with **dataframes** (it can also use [lists](https://books.ropensci.org/targets/dynamic.html#list-iteration), but that is more advanced, so we recommend using dataframes when possible). -So if possible, write your custom functions to accept dataframes as input and return them as output, and always include any necessary metadata as a column or columns. +It is recommended to write your custom functions to accept dataframes as input and return them as output, and always include any necessary metadata as a column or columns. 
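The best-practice advice above can be illustrated with a minimal sketch: a custom function that accepts a dataframe, returns a dataframe, and carries its metadata (`species`) along as a column so each branch's output stays identifiable. Note that `summarize_bill_depth()` and the column choices are hypothetical, for illustration only — they are not functions from the lesson.

```r
# A minimal sketch of the recommended pattern: dataframe in, dataframe out,
# with metadata kept as a column. summarize_bill_depth() is a hypothetical
# helper, not one of the lesson's functions.
summarize_bill_depth <- function(penguins) {
  data.frame(
    # keep the grouping metadata as a column so branches stay identifiable
    species = unique(penguins$species),
    n = nrow(penguins),
    mean_bill_depth = mean(penguins$bill_depth_mm, na.rm = TRUE)
  )
}
```

Because each branch then returns a one-row dataframe with a `species` column, `targets` can row-bind the branch results into a single tidy table, in the same way `model_glance()` returns the output of `glance()` as a dataframe.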
::::::::::::::::::::::::::::::::::::: From 15bd5e2f23dc6ec93bfb60a3da13dd8d9dce45fe Mon Sep 17 00:00:00 2001 From: carpenter Date: Tue, 24 Dec 2024 12:16:10 +0900 Subject: [PATCH 06/10] Fix function call --- episodes/files/plans/plan_11.R | 2 +- episodes/files/plans/plan_8.R | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/episodes/files/plans/plan_11.R b/episodes/files/plans/plan_11.R index fa028596..6b23b0b3 100644 --- a/episodes/files/plans/plan_11.R +++ b/episodes/files/plans/plan_11.R @@ -24,7 +24,7 @@ tar_plan( pattern = map(penguins_data) ), # Get predictions of combined model with all species together - combined_predictions = model_glance(penguins_data), + combined_predictions = model_augment(penguins_data), # Get predictions of one model per species tar_target( species_predictions, diff --git a/episodes/files/plans/plan_8.R b/episodes/files/plans/plan_8.R index 8b72fa69..9d76b4a4 100644 --- a/episodes/files/plans/plan_8.R +++ b/episodes/files/plans/plan_8.R @@ -24,7 +24,7 @@ tar_plan( pattern = map(penguins_data) ), # Get predictions of combined model with all species together - combined_predictions = model_glance(penguins_data), + combined_predictions = model_augment(penguins_data), # Get predictions of one model per species tar_target( species_predictions, From 46ca809b9bf20343f9c546505c78e183285b2c84 Mon Sep 17 00:00:00 2001 From: carpenter Date: Tue, 24 Dec 2024 12:16:45 +0900 Subject: [PATCH 07/10] Update quarto episode to use branching over row groups --- episodes/quarto.Rmd | 38 +++++++++++++------------------------- 1 file changed, 13 insertions(+), 25 deletions(-) diff --git a/episodes/quarto.Rmd b/episodes/quarto.Rmd index 55e1d8d5..4f095276 100644 --- a/episodes/quarto.Rmd +++ b/episodes/quarto.Rmd @@ -111,7 +111,6 @@ Then, add one more target to the pipeline using the `tar_quarto()` function like tar_dir({ library(quarto) - write_example_plan(9) 
readr::read_lines("https://raw.githubusercontent.com/joelnitta/penguins-targets/main/penguin_report.qmd") |> readr::write_lines("penguin_report.qmd") # Run it @@ -135,36 +134,25 @@ How does this work? The answer lies **inside** the `penguin_report.qmd` file. Let's look at the start of the file: -``````{markdown} ---- -title: "Simpson's Paradox in Palmer Penguins" -format: - html: - toc: true -execute: - echo: false ---- - ```{r} -#| label: load -#| message: false -targets::tar_load(penguin_models_augmented) -targets::tar_load(penguin_models_summary) - -library(tidyverse) -``` +#| label: show-penguin-report-qmd +#| echo: FALSE +#| results: 'asis' -This is an example analysis of penguins on the Palmer Archipelago in Antarctica. +penguin_qmd <- readr::read_lines("https://raw.githubusercontent.com/joelnitta/penguins-targets/main/penguin_report.qmd") -`````` +cat("````{.markdown}\n") +cat(penguin_qmd[1:24], sep = "\n") +cat("\n````") +``` The lines in between `---` and `---` at the very beginning are called the "YAML header", and contain directions about how to render the document. The R code to be executed is specified by the lines between `` ```{r} `` and `` ``` ``. This is called a "code chunk", since it is a portion of code interspersed within prose text. -Take a closer look at the R code chunk. Notice the two calls to `targets::tar_load()`. Do you remember what that function does? It loads the targets built during the workflow. +Take a closer look at the R code chunk. Notice the use of `targets::tar_load()`. Do you remember what that function does? It loads the targets built during the workflow. 
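As a concrete sketch of the kind of chunk being described, the top of `penguin_report.qmd` might contain something like the following (the target names come from the updated plan; `knitr::kable()` is just one illustrative way to display a loaded object):

```r
# Hypothetical report chunk: each tar_load() call both loads the object and
# registers it as a dependency of the report, so tar_make() re-renders the
# report whenever these targets change.
targets::tar_load(combined_summary)
targets::tar_load(species_summary)

# Once loaded, the targets are ordinary dataframes in the report session
knitr::kable(species_summary)
```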
-Now things should make a bit more sense: `targets` knows that the report depends on the targets built during the workflow, `penguin_models_augmented` and `penguin_models_summary`, **because they are loaded in the report with `tar_load()`.**
+Now things should make a bit more sense: `targets` knows that the report depends on the targets built during the workflow, such as `combined_summary` and `species_summary`, **because they are loaded in the report with `tar_load()`.**

## Generating dynamic content

@@ -174,13 +162,13 @@ The call to `tar_load()` at the start of `penguin_report.qmd` is really the key 

## Challenge: Spot the dynamic contents

-Read through `penguin_report.qmd` and try to find instances where the targets built during the workflow (`penguin_models_augmented` and `penguin_models_summary`) are used to dynamically produce text and plots.
+Read through `penguin_report.qmd` and try to find instances where the targets built during the workflow (`combined_summary`, etc.) are used to dynamically produce text and plots.

:::::::::::::::::::::::::::::::::: {.solution}

-- In the code chunk labeled `results-stats`, statistics from the models like *P*-value and adjusted *R* squared are extracted, then inserted into the text with in-line code like `` `r knitr::inline_expr("mod_stats$combined$r.squared")` ``.
+- In the code chunk labeled `results-stats`, statistics from the models, such as *R* squared, are extracted, then inserted into the text with in-line code like `` `r knitr::inline_expr("combined_r2")` ``.

-- There are two figures, one for the combined model and one for the separate model (code chunks labeled `fig-combined-plot` and `fig-separate-plot`, respectively). These are built using the points predicted from the model in `penguin_models_augmented`.
+- There are two figures, one for the combined model and one for the separate models (code chunks labeled `fig-combined-plot` and `fig-separate-plot`, respectively). 
These are built using the points predicted from the model in `combined_predictions` and `species_predictions`. :::::::::::::::::::::::::::::::::: From 895e9825d5985768baa4a0b31afa9d67de4f08df Mon Sep 17 00:00:00 2001 From: carpenter Date: Tue, 24 Dec 2024 15:01:49 +0900 Subject: [PATCH 08/10] Fix model_glance() vs model_augment() calls --- episodes/files/plans/plan_10.R | 4 ++-- episodes/files/plans/plan_7.R | 2 +- episodes/files/plans/plan_9.R | 2 +- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/episodes/files/plans/plan_10.R b/episodes/files/plans/plan_10.R index 59bcfed9..2dbc9d21 100644 --- a/episodes/files/plans/plan_10.R +++ b/episodes/files/plans/plan_10.R @@ -23,7 +23,7 @@ tar_plan( species ), # Get summary of combined model with all species together - combined_summary = model_glance(penguins_data), + combined_summary = model_glance_slow(penguins_data), # Get summary of one model per species tar_target( species_summary, @@ -31,7 +31,7 @@ tar_plan( pattern = map(penguins_data) ), # Get predictions of combined model with all species together - combined_predictions = model_glance_slow(penguins_data), + combined_predictions = model_augment_slow(penguins_data), # Get predictions of one model per species tar_target( species_predictions, diff --git a/episodes/files/plans/plan_7.R b/episodes/files/plans/plan_7.R index da5f7bc5..af844230 100644 --- a/episodes/files/plans/plan_7.R +++ b/episodes/files/plans/plan_7.R @@ -26,7 +26,7 @@ tar_plan( pattern = map(penguins_data_grouped) ), # Get predictions of combined model with all species together - combined_predictions = model_glance(penguins_data_grouped), + combined_predictions = model_augment(penguins_data_grouped), # Get predictions of one model per species tar_target( species_predictions, diff --git a/episodes/files/plans/plan_9.R b/episodes/files/plans/plan_9.R index 99958265..5eb6ed7b 100644 --- a/episodes/files/plans/plan_9.R +++ b/episodes/files/plans/plan_9.R @@ -31,7 +31,7 @@ tar_plan( pattern 
= map(penguins_data) ), # Get predictions of combined model with all species together - combined_predictions = model_glance(penguins_data), + combined_predictions = model_augment(penguins_data), # Get predictions of one model per species tar_target( species_predictions, From d0345a88339f73f553d0aa984460fabc849f157c Mon Sep 17 00:00:00 2001 From: carpenter Date: Tue, 24 Dec 2024 15:02:06 +0900 Subject: [PATCH 09/10] Delete unused plan --- episodes/files/plans/plan_6c.R | 34 ---------------------------------- 1 file changed, 34 deletions(-) delete mode 100644 episodes/files/plans/plan_6c.R diff --git a/episodes/files/plans/plan_6c.R b/episodes/files/plans/plan_6c.R deleted file mode 100644 index 8b72fa69..00000000 --- a/episodes/files/plans/plan_6c.R +++ /dev/null @@ -1,34 +0,0 @@ -options(tidyverse.quiet = TRUE) -source("R/functions.R") -source("R/packages.R") - -tar_plan( - # Load raw data - tar_file_read( - penguins_data_raw, - path_to_file("penguins_raw.csv"), - read_csv(!!.x, show_col_types = FALSE) - ), - # Clean and group data - tar_group_by( - penguins_data, - clean_penguin_data(penguins_data_raw), - species - ), - # Get summary of combined model with all species together - combined_summary = model_glance(penguins_data), - # Get summary of one model per species - tar_target( - species_summary, - model_glance(penguins_data), - pattern = map(penguins_data) - ), - # Get predictions of combined model with all species together - combined_predictions = model_glance(penguins_data), - # Get predictions of one model per species - tar_target( - species_predictions, - model_augment(penguins_data), - pattern = map(penguins_data) - ) -) From fac93964b3d5c08e89ed5a6c9e2b427f60927908 Mon Sep 17 00:00:00 2001 From: carpenter Date: Tue, 24 Dec 2024 15:02:29 +0900 Subject: [PATCH 10/10] Local build seems to work with mult cores --- episodes/parallel.Rmd | 27 --------------------------- 1 file changed, 27 deletions(-) diff --git a/episodes/parallel.Rmd b/episodes/parallel.Rmd 
index 4b458fc4..1fa7c573 100644 --- a/episodes/parallel.Rmd +++ b/episodes/parallel.Rmd @@ -110,10 +110,6 @@ Finally, run the pipeline with `tar_make()` as normal. #| message: false #| echo: false -# FIXME: parallel code uses all available CPUs and hangs when rendering website -# with sandpaper::build_lesson(), even though it only uses 2 when run -# interactively -# plan_10_dir <- make_tempdir() pushd(plan_10_dir) write_example_plan("plan_9.R") @@ -121,29 +117,6 @@ tar_make(reporter = "silent") write_example_plan("plan_10.R") tar_make() popd() - -# Solution for now is to hard-code output -cat("✔ skipped target penguins_data_raw_file -✔ skipped target penguins_data_raw -✔ skipped target penguins_data -✔ skipped target combined_summary -▶ dispatched branch species_summary_1598bb4431372f32 -▶ dispatched branch species_summary_6b9109ba2e9d27fd -● completed branch species_summary_1598bb4431372f32 [4.815 seconds, 367 bytes] -▶ dispatched branch species_summary_625f9fbc7f62298a -● completed branch species_summary_6b9109ba2e9d27fd [4.813 seconds, 370 bytes] -▶ dispatched target combined_predictions -● completed branch species_summary_625f9fbc7f62298a [4.01 seconds, 367 bytes] -● completed pattern species_summary -▶ dispatched branch species_predictions_1598bb4431372f32 -● completed target combined_predictions [4.012 seconds, 370 bytes] -▶ dispatched branch species_predictions_6b9109ba2e9d27fd -● completed branch species_predictions_1598bb4431372f32 [4.014 seconds, 11.585 kilobytes] -▶ dispatched branch species_predictions_625f9fbc7f62298a -● completed branch species_predictions_6b9109ba2e9d27fd [4.01 seconds, 6.25 kilobytes] -● completed branch species_predictions_625f9fbc7f62298a [4.007 seconds, 9.628 kilobytes] -● completed pattern species_predictions -▶ ended pipeline [19.363 seconds]") ``` Notice that although the time required to build each individual target is about 4 seconds, the total time to run the entire workflow is less than the sum of the individual target 
times! That is proof that processes are running in parallel **and saving you time**.
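For reference, a minimal sketch of the `_targets.R` configuration that enables this kind of parallelism, assuming the `crew` backend is used (the lesson's actual settings may differ):

```r
# _targets.R (sketch): run up to two targets at once with a local crew
# controller; tar_make() then dispatches independent branches in parallel.
library(targets)
library(crew)

tar_option_set(
  controller = crew_controller_local(workers = 2)
)
```

With a controller like this in place, the `species_summary` and `species_predictions` branches can run concurrently, which is why the total pipeline time in the log above (about 19 seconds) is well under the roughly 28 seconds that seven 4-second targets would need sequentially.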