Skip to content

Commit

Permalink
New translations branch.md (Spanish)
Browse files Browse the repository at this point in the history
  • Loading branch information
joelnitta committed Nov 17, 2023
1 parent ee7bd8b commit 298de1f
Showing 1 changed file with 351 additions and 0 deletions.
351 changes: 351 additions & 0 deletions locale/es/episodes/branch.Rmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,351 @@
---
title: Branching
teaching: 10
exercises: 2
---

:::::::::::::::::::::::::::::::::::::: questions

- How can we specify many targets without typing everything out?

::::::::::::::::::::::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::: objectives

- Be able to specify targets using branching

::::::::::::::::::::::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::: instructor

Episode summary: Show how to use branching

:::::::::::::::::::::::::::::::::::::

```{r}
#| label: setup
#| echo: FALSE
#| message: FALSE
#| warning: FALSE
library(targets)
library(tarchetypes)
library(broom)
source("files/lesson_functions.R")
# Increase width for printing tibbles
options(width = 140)
```

## Why branching?

One of the major strengths of `targets` is the ability to define many targets from a single line of code ("branching").
This not only saves you typing, it also **reduces the risk of errors** since there is less chance of making a typo.

## Types of branching

There are two types of branching, **dynamic branching** and **static branching**.
"Branching" refers to the idea that you can provide a single specification for how to make targets (the "pattern"), and `targets` generates multiple targets from it ("branches").
"Dynamic" means that the branches that result from the pattern do not have to be defined ahead of time---they are a dynamic result of the code.

In this workshop, we will only cover dynamic branching since it is generally easier to write (static branching requires use of [meta-programming](https://books.ropensci.org/targets/static.html#metaprogramming), an advanced topic). For more information about each and when you might want to use one or the other (or some combination of the two), [see the `targets` package manual](https://books.ropensci.org/targets/dynamic.html).

## Example without branching

To see how this works, let's continue our analysis of the `palmerpenguins` dataset.

**Our hypothesis is that bill depth decreases with bill length.**
We will test this hypothesis with a linear model.

For example, this is a model of bill depth dependent on bill length:

```{r}
#| label: example-lm
#| eval: FALSE
lm(bill_depth_mm ~ bill_length_mm, data = penguins_data)
```

We can add this to our pipeline. We will call it the `combined_model` because it combines all the species together without distinction:

```{r}
#| label = "example-lm-pipeline-show",
#| eval = FALSE,
#| code = readLines("files/plans/plan_4.R")[2:19]
```

```{r}
#| label: example-lm-pipeline-hide
#| echo: false
plan_4_dir <- make_tempdir()
pushd(plan_4_dir)
write_example_plan("plan_3.R")
tar_make(reporter = "silent")
write_example_plan("plan_4.R")
tar_make()
popd()
```

Let's have a look at the model. We will use the `glance()` function from the `broom` package. Unlike base R `summary()`, this function returns output as a tibble (the tidyverse equivalent of a dataframe), which as we will see later is quite useful for downstream analyses.

```{r}
#| label: example-lm-pipeline-inspect-show
#| eval: true
#| echo: [2, 3, 4]
pushd(plan_4_dir)
library(broom)
tar_load(combined_model)
glance(combined_model)
popd()
```

Notice the small _P_-value.
This seems to indicate that the model is highly significant.

But wait a moment... is this really an appropriate model? Recall that there are three species of penguins in the dataset. It is possible that the relationship between bill depth and length **varies by species**.

We should probably test some alternative models.
These could include models that add a parameter for species, or add an interaction effect between species and bill length.

Now our workflow is getting more complicated. This is what a workflow for such an analysis might look like **without branching** (make sure to add `library(broom)` to `packages.R`):

```{r}
#| label = "example-model-show-1",
#| eval = FALSE,
#| code = readLines("files/plans/plan_5.R")[2:31]
```

```{r}
#| label: example-model-hide-1
#| echo: false
plan_5_dir <- make_tempdir()
pushd(plan_5_dir)
# simulate already running the plan once
write_example_plan("plan_4.R")
tar_make(reporter = "silent")
write_example_plan("plan_5.R")
tar_make()
popd()
```

Let's look at the summary of one of the models:

```{r}
#| label: example-model-show-2
#| eval: true
#| echo: [2]
pushd(plan_5_dir)
tar_read(species_summary)
popd()
```

So this way of writing the pipeline works, but is repetitive: we have to call `glance()` each time we want to obtain summary statistics for each model.
Furthermore, each summary target (`combined_summary`, etc.) is explicitly named and typed out manually.
It would be fairly easy to make a typo and end up with the wrong model being summarized.

## Example with branching

### First attempt

Let's see how to write the same plan using **dynamic branching**:

```{r}
#| label = "example-model-show-3",
#| eval = FALSE,
#| code = readLines("files/plans/plan_6.R")[2:29]
```

What is going on here?

First, let's look at the messages provided by `tar_make()`.

```{r}
#| label: example-model-hide-3
#| echo: false
plan_6_dir <- make_tempdir()
pushd(plan_6_dir)
# simulate already running the plan once
write_example_plan("plan_5.R")
tar_make(reporter = "silent")
write_example_plan("plan_6.R")
tar_make()
example_branch_name <- tar_branch_names(model_summaries, 1)
popd()
```

There is a series of smaller targets (branches) that are each named like `r example_branch_name`, then one overall `model_summaries` target.
That is the result of specifying targets using branching: each of the smaller targets are the "branches" that comprise the overall target.
Since `targets` has no way of knowing ahead of time how many branches there will be or what they represent, it names each one using this series of numbers and letters (the "hash").
`targets` builds each branch one at a time, then combines them into the overall target.

Next, let's look in more detail about how the workflow is set up, starting with how we defined the models:

```{r}
#| label = "model-def",
#| code = readLines("files/plans/plan_6.R")[14:22],
#| eval = FALSE
```

Unlike the non-branching version, we defined the models **in a list** (instead of one target per model).
This is because dynamic branching is similar to the `base::apply()` or [`purrrr::map()`](https://purrr.tidyverse.org/reference/map.html) method of looping: it applies a function to each element of a list.
So we need to prepare the input for looping as a list.

Next, take a look at the command to build the target `model_summaries`.

```{r}
#| label = "model-summaries",
#| code = readLines("files/plans/plan_6.R")[23:28],
#| eval = FALSE
```

As before, the first argument is the name of the target to build, and the second is the command to build it.

Here, we apply the `glance()` function to each element of `models` (the `[[1]]` is necessary because when the function gets applied, each element is actually a nested list, and we need to remove one layer of nesting).

Finally, there is an argument we haven't seen before, `pattern`, which indicates that this target should be built using dynamic branching.
`map` means to apply the command to each element of the input list (`models`) sequentially.

Now that we understand how the branching workflow is constructed, let's inspect the output:

```{r}
#| label: example-model-show-4
#| eval: FALSE
tar_read(model_summaries)
```

```{r}
#| label: example-model-hide-4
#| echo: FALSE
pushd(plan_6_dir)
tar_read(model_summaries)
popd()
```

The model summary statistics are all included in a single dataframe.

But there's one problem: **we can't tell which row came from which model!** It would be unwise to assume that they are in the same order as the list of models.

This is due to the way dynamic branching works: by default, there is no information about the provenance of each target preserved in the output.

How can we fix this?

### Second attempt

The key to obtaining useful output from branching pipelines is to include the necessary information in the output of each individual branch.
Here, we want to know the kind of model that corresponds to each row of the model summaries.
To do that, we need to write a **custom function**.
You will need to write custom functions frequently when using `targets`, so it's good to get used to it!

Here is the function. Save this in `R/functions.R`:

```{r}
#| label: example-model-show-5
#| eval: FALSE
#| file: files/tar_functions/glance_with_mod_name.R
```

Our new pipeline looks almost the same as before, but this time we use the custom function instead of `glance()`.

```{r}
#| label = "example-model-show-6",
#| code = readLines("files/plans/plan_7.R")[2:29],
#| eval = FALSE
```

```{r}
#| label: example-model-hide-6
#| echo: FALSE
pushd(plan_6_dir)
write_example_plan("plan_7.R")
tar_make()
popd()
```

And this time, when we load the `model_summaries`, we can tell which model corresponds to which row (you may need to scroll to the right to see it).

```{r}
#| label: example-model-7
#| echo: [2]
#| warning: false
pushd(plan_6_dir)
tar_read(model_summaries)
popd()
```

Next we will add one more target, a prediction of bill depth based on each model. These will be needed for plotting the models in the report.
Such a prediction can be obtained with the `augment()` function of the `broom` package.

```{r}
#| label: example-augment
#| echo: [2, 3]
#| eval: true
pushd(plan_6_dir)
tar_load(models)
augment(models[[1]])
popd()
```

::::::::::::::::::::::::::::::::::::: {.challenge}

## Challenge: Add model predictions to the workflow

Can you add the model predictions using `augment()`? You will need to define a custom function just like we did for `glance()`.

:::::::::::::::::::::::::::::::::: {.solution}

Define the new function as `augment_with_mod_name()`. It is the same as `glance_with_mod_name()`, but use `augment()` instead of `glance()`:

```{r}
#| label: example-model-augment-func
#| eval: FALSE
#| file: files/tar_functions/augment_with_mod_name.R
```

Add the step to the workflow:

```{r}
#| label = "example-model-augment-show",
#| code = readLines("files/plans/plan_8.R")[2:35],
#| eval = FALSE
```

::::::::::::::::::::::::::::::::::

:::::::::::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::: {.callout}

## Best practices for branching

Dynamic branching is designed to work well with **dataframes** (tibbles).

So if possible, write your custom functions to accept dataframes as input and return them as output, and always include any necessary metadata as a column or columns.

:::::::::::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::: {.challenge}

## Challenge: What other kinds of patterns are there?

So far, we have only used a single function in conjunction with the `pattern` argument, `map()`, which applies the function to each element of its input in sequence.

Can you think of any other ways you might want to apply a branching pattern?

:::::::::::::::::::::::::::::::::: {.solution}

Some other ways of applying branching patterns include:

- crossing: one branch per combination of elements (`cross()` function)
- slicing: one branch for each of a manually selected set of elements (`slice()` function)
- sampling: one branch for each of a randomly selected set of elements (`sample()` function)

You can [find out more about different branching patterns in the `targets` manual](https://books.ropensci.org/targets/dynamic.html#patterns).

::::::::::::::::::::::::::::::::::

:::::::::::::::::::::::::::::::::::::

::::::::::::::::::::::::::::::::::::: keypoints

- Dynamic branching creates multiple targets with a single command
- You usually need to write custom functions so that the output of the branches includes necessary metadata

::::::::::::::::::::::::::::::::::::::::::::::::

0 comments on commit 298de1f

Please sign in to comment.