Commit

differences for PR #18

actions-user committed Jul 10, 2024
1 parent d4b61fc commit d0c0eff
Showing 16 changed files with 444 additions and 2,162 deletions.
38 changes: 19 additions & 19 deletions basic-targets.md
@@ -85,7 +85,7 @@ We will now start to write a `_targets.R` file. Fortunately, `targets` comes wit
In the R console, first load the `targets` package with `library(targets)`, then run the command `tar_script()`.


```r
``` r
library(targets)
tar_script()
```
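(Not part of the diff above — a hedged aside.) The script that `tar_script()` writes is a complete, runnable example pipeline; its exact comments and example targets vary between versions of `targets`, but it has roughly this shape:

``` r
# Rough shape of the default _targets.R written by tar_script()
# (details differ between versions of the targets package)
library(targets)

# Set options such as required packages here, e.g.:
# tar_option_set(packages = c("tibble"))

# The file must end with a list of target objects:
list(
  tar_target(data, data.frame(x = sample.int(100), y = sample.int(100))),
  tar_target(data_summary, colMeans(data))
)
```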
@@ -120,7 +120,7 @@ In real life you probably have externally stored raw data, so **let's use th
The `path_to_file()` function in `palmerpenguins` provides the path to the raw data CSV file (it is inside the `palmerpenguins` R package source code that you downloaded to your computer when you installed the package).


```r
``` r
library(palmerpenguins)

# Get path to CSV file
@@ -129,16 +129,16 @@ penguins_csv_file <- path_to_file("penguins_raw.csv")
penguins_csv_file
```

```{.output}
[1] "/home/runner/.local/share/renv/cache/v5/R-4.3/x86_64-pc-linux-gnu/palmerpenguins/0.1.1/6c6861efbc13c1d543749e9c7be4a592/palmerpenguins/extdata/penguins_raw.csv"
``` output
[1] "/home/runner/.local/share/renv/cache/v5/linux-ubuntu-jammy/R-4.4/x86_64-pc-linux-gnu/palmerpenguins/0.1.1/6c6861efbc13c1d543749e9c7be4a592/palmerpenguins/extdata/penguins_raw.csv"
```

We will use the `tidyverse` set of packages for loading and manipulating the data. We don't have time to cover all the details about using `tidyverse` now, but if you want to learn more about it, please see the ["Manipulating, analyzing and exporting data with tidyverse" lesson](https://datacarpentry.org/R-ecology-lesson/03-dplyr.html).

Let's load the data with `read_csv()`.


```r
``` r
library(tidyverse)

# Read CSV file into R
@@ -148,7 +148,7 @@ penguins_data_raw
```


```{.output}
``` output
Rows: 344 Columns: 17
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
@@ -160,7 +160,7 @@ date (1): Date Egg
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
```

```{.output}
``` output
# A tibble: 344 × 17
studyName `Sample Number` Species Region Island Stage `Individual ID`
<chr> <dbl> <chr> <chr> <chr> <chr> <chr>
@@ -191,7 +191,7 @@ Let's clean up the data to make it easier to use for downstream analyses.
We will also remove any rows with missing data, because this could cause errors for some functions later.


```r
``` r
# Clean up raw data
penguins_data <- penguins_data_raw |>
# Rename columns for easier typing and
@@ -207,7 +207,7 @@ penguins_data <- penguins_data_raw |>
penguins_data
```

```{.output}
``` output
# A tibble: 342 × 3
species bill_length_mm bill_depth_mm
<chr> <dbl> <dbl>
@@ -240,7 +240,7 @@ The other steps (setting the file path and loading the data) are each just one f
Finally, each step in the workflow is defined with the `tar_target()` function.
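(A hedged aside, not part of the diff.) In its simplest form, `tar_target()` just pairs a name with the R expression (command) that produces the target's value; the names in this sketch are placeholders, not ones from the lesson:

``` r
library(targets)

# General shape of a single target: a name plus the command that builds it.
# Placeholder example only; other targets can be referenced by name in the command.
tar_target(
  name = squares,
  command = (1:10)^2
)
```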


```r
``` r
library(targets)
library(tidyverse)
library(palmerpenguins)
@@ -271,18 +271,18 @@ Now that we have a workflow, we can run it with the `tar_make()` function.
Try running it, and you should see something like this:


```r
``` r
tar_make()
```

```{.output}
• start target penguins_csv_file
• built target penguins_csv_file [0.002 seconds]
• start target penguins_data_raw
• built target penguins_data_raw [0.095 seconds]
• start target penguins_data
• built target penguins_data [0.013 seconds]
• end pipeline [0.216 seconds]
``` output
▶ dispatched target penguins_csv_file
● completed target penguins_csv_file [0 seconds]
▶ dispatched target penguins_data_raw
● completed target penguins_data_raw [0.136 seconds]
▶ dispatched target penguins_data
● completed target penguins_data [0.007 seconds]
▶ ended pipeline [0.205 seconds]
```

Congratulations, you've run your first workflow with `targets`!
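(A hedged aside, not shown in the diff.) Once `tar_make()` has finished, the stored value of any target can be pulled back into an interactive R session from the project directory:

``` r
library(targets)

# Return the stored value of a target
tar_read(penguins_data)

# Or load it into the global environment under its own name
tar_load(penguins_data)
```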
142 changes: 71 additions & 71 deletions branch.md
@@ -47,14 +47,14 @@ We will test this hypothesis with a linear model.
For example, this is a model of bill depth dependent on bill length:


```r
``` r
lm(bill_depth_mm ~ bill_length_mm, data = penguins_data)
```

We can add this to our pipeline. We will call it the `combined_model` because it combines all the species together without distinction:


```r
``` r
source("R/packages.R")
source("R/functions.R")

@@ -76,25 +76,25 @@ tar_plan(
```


```{.output}
skip target penguins_data_raw_file
skip target penguins_data_raw
skip target penguins_data
• start target combined_model
• built target combined_model [0.034 seconds]
• end pipeline [0.136 seconds]
``` output
skipped target penguins_data_raw_file
skipped target penguins_data_raw
skipped target penguins_data
▶ dispatched target combined_model
● completed target combined_model [0.052 seconds]
▶ ended pipeline [0.13 seconds]
```

Let's have a look at the model. We will use the `glance()` function from the `broom` package. Unlike base R `summary()`, this function returns output as a tibble (the tidyverse equivalent of a dataframe), which as we will see later is quite useful for downstream analyses.


```r
``` r
library(broom)
tar_load(combined_model)
glance(combined_model)
```

```{.output}
``` output
# A tibble: 1 × 12
r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC deviance df.residual nobs
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <int>
@@ -112,7 +112,7 @@ These could include models that add a parameter for species, or add an interacti
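(A hedged sketch — the lesson defines these models inside the pipeline below, and its exact formulas may differ.) In R formula syntax, "adding a parameter for species" and "adding an interaction" correspond to something like:

``` r
# Species as an additional main effect
lm(bill_depth_mm ~ bill_length_mm + species, data = penguins_data)

# Interaction between bill length and species
lm(bill_depth_mm ~ bill_length_mm * species, data = penguins_data)
```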
Now our workflow is getting more complicated. This is what a workflow for such an analysis might look like **without branching** (make sure to add `library(broom)` to `packages.R`):


```r
``` r
source("R/packages.R")
source("R/functions.R")

@@ -146,32 +146,32 @@ tar_plan(
```


```{.output}
skip target penguins_data_raw_file
skip target penguins_data_raw
skip target penguins_data
skip target combined_model
• start target interaction_model
• built target interaction_model [0.004 seconds]
• start target species_model
• built target species_model [0.011 seconds]
• start target combined_summary
• built target combined_summary [0.008 seconds]
• start target interaction_summary
• built target interaction_summary [0.002 seconds]
• start target species_summary
• built target species_summary [0.003 seconds]
• end pipeline [0.144 seconds]
``` output
skipped target penguins_data_raw_file
skipped target penguins_data_raw
skipped target penguins_data
skipped target combined_model
▶ dispatched target interaction_model
● completed target interaction_model [0.002 seconds]
▶ dispatched target species_model
● completed target species_model [0.001 seconds]
▶ dispatched target combined_summary
● completed target combined_summary [0.006 seconds]
▶ dispatched target interaction_summary
● completed target interaction_summary [0.003 seconds]
▶ dispatched target species_summary
● completed target species_summary [0.035 seconds]
▶ ended pipeline [0.144 seconds]
```

Let's look at the summary of one of the models:


```r
``` r
tar_read(species_summary)
```

```{.output}
``` output
# A tibble: 1 × 12
r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC deviance df.residual nobs
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <int>
Expand All @@ -189,7 +189,7 @@ It would be fairly easy to make a typo and end up with the wrong model being sum
Let's see how to write the same plan using **dynamic branching**:


```r
``` r
source("R/packages.R")
source("R/functions.R")

@@ -225,31 +225,31 @@ What is going on here?
First, let's look at the messages provided by `tar_make()`.


```{.output}
skip target penguins_data_raw_file
skip target penguins_data_raw
skip target penguins_data
• start target models
• built target models [0.006 seconds]
• start branch model_summaries_5ad4cec5
• built branch model_summaries_5ad4cec5 [0.008 seconds]
• start branch model_summaries_c73912d5
• built branch model_summaries_c73912d5 [0.002 seconds]
• start branch model_summaries_91696941
• built branch model_summaries_91696941 [0.003 seconds]
• built pattern model_summaries
• end pipeline [0.149 seconds]
``` output
skipped target penguins_data_raw_file
skipped target penguins_data_raw
skipped target penguins_data
▶ dispatched target models
● completed target models [0.004 seconds]
▶ dispatched branch model_summaries_812e3af782bee03f
● completed branch model_summaries_812e3af782bee03f [0.005 seconds]
▶ dispatched branch model_summaries_2b8108839427c135
● completed branch model_summaries_2b8108839427c135 [0.002 seconds]
▶ dispatched branch model_summaries_533cd9a636c3e05b
● completed branch model_summaries_533cd9a636c3e05b [0.002 seconds]
● completed pattern model_summaries
▶ ended pipeline [0.14 seconds]
```

There is a series of smaller targets (branches) that are each named like model_summaries_5ad4cec5, then one overall `model_summaries` target.
There is a series of smaller targets (branches) that are each named like model_summaries_812e3af782bee03f, then one overall `model_summaries` target.
That is the result of specifying targets using branching: each of the smaller targets is one of the "branches" that comprise the overall target.
Since `targets` has no way of knowing ahead of time how many branches there will be or what they represent, it names each one using this series of numbers and letters (the "hash").
`targets` builds each branch one at a time, then combines them into the overall target.
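(A hedged aside.) Individual branches can also be read back on their own with the `branches` argument of `tar_read()`, which takes branch positions:

``` r
library(targets)

# Read only the first branch of the dynamically branched target
# (branch order follows the order of the `models` list)
tar_read(model_summaries, branches = 1)
```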

Next, let's look in more detail about how the workflow is set up, starting with how we defined the models:


```r
``` r
# Build models
models = list(
combined_model = lm(
@@ -268,7 +268,7 @@ So we need to prepare the input for looping as a list.
Next, take a look at the command to build the target `model_summaries`.


```r
``` r
# Get model summaries
tar_target(
model_summaries,
@@ -287,12 +287,12 @@ Finally, there is an argument we haven't seen before, `pattern`, which indicates
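(The exact command is cut off by the diff view above; this is a hedged sketch of the general shape of a dynamically branched target, with the summarising command assumed.)

``` r
# Sketch only — command assumed. `pattern = map(models)` creates one branch
# per element of the `models` list; within each branch, `models` is a
# length-one sub-list, hence the [[1]].
tar_target(
  model_summaries,
  glance(models[[1]]),
  pattern = map(models)
)
```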
Now that we understand how the branching workflow is constructed, let's inspect the output:


```r
``` r
tar_read(model_summaries)
```


```{.output}
``` output
# A tibble: 3 × 12
r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC deviance df.residual nobs
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <int>
@@ -319,7 +319,7 @@ You will need to write custom functions frequently when using `targets`, so it's
Here is the function. Save this in `R/functions.R`:


```r
``` r
glance_with_mod_name <- function(model_in_list) {
model_name <- names(model_in_list)
model <- model_in_list[[1]]
@@ -331,7 +331,7 @@ glance_with_mod_name <- function(model_in_list) {
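(The end of the function is cut off by the diff view. Judging from the extra `model_name` column in the output further below, a hedged reconstruction of the whole function might look like this:)

``` r
# Hedged reconstruction — assumes the cut-off body simply appends the model
# name to the one-row tibble returned by glance()
glance_with_mod_name <- function(model_in_list) {
  model_name <- names(model_in_list)
  model <- model_in_list[[1]]
  glance(model) |>
    mutate(model_name = model_name)
}
```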
Our new pipeline looks almost the same as before, but this time we use the custom function instead of `glance()`.


```r
``` r
source("R/functions.R")
source("R/packages.R")

@@ -363,29 +363,29 @@ tar_plan(
```


```{.output}
skip target penguins_data_raw_file
skip target penguins_data_raw
skip target penguins_data
skip target models
• start branch model_summaries_5ad4cec5
• built branch model_summaries_5ad4cec5 [0.03 seconds]
• start branch model_summaries_c73912d5
• built branch model_summaries_c73912d5 [0.006 seconds]
• start branch model_summaries_91696941
• built branch model_summaries_91696941 [0.004 seconds]
• built pattern model_summaries
• end pipeline [0.154 seconds]
``` output
skipped target penguins_data_raw_file
skipped target penguins_data_raw
skipped target penguins_data
skipped target models
▶ dispatched branch model_summaries_812e3af782bee03f
● completed branch model_summaries_812e3af782bee03f [0.011 seconds]
▶ dispatched branch model_summaries_2b8108839427c135
● completed branch model_summaries_2b8108839427c135 [0.006 seconds]
▶ dispatched branch model_summaries_533cd9a636c3e05b
● completed branch model_summaries_533cd9a636c3e05b [0.039 seconds]
● completed pattern model_summaries
▶ ended pipeline [0.145 seconds]
```

And this time, when we load the `model_summaries`, we can tell which model corresponds to which row (you may need to scroll to the right to see it).


```r
``` r
tar_read(model_summaries)
```

```{.output}
``` output
# A tibble: 3 × 13
r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC deviance df.residual nobs model_name
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int> <int> <chr>
@@ -398,12 +398,12 @@ Next we will add one more target, a prediction of bill depth based on each model
Such a prediction can be obtained with the `augment()` function of the `broom` package.


```r
``` r
tar_load(models)
augment(models[[1]])
```

```{.output}
``` output
# A tibble: 342 × 8
bill_depth_mm bill_length_mm .fitted .resid .hat .sigma .cooksd .std.resid
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
@@ -431,7 +431,7 @@ Can you add the model predictions using `augment()`? You will need to define a c
Define the new function as `augment_with_mod_name()`. It is the same as `glance_with_mod_name()`, but use `augment()` instead of `glance()`:


```r
``` r
augment_with_mod_name <- function(model_in_list) {
model_name <- names(model_in_list)
model <- model_in_list[[1]]
@@ -443,7 +443,7 @@ augment_with_mod_name <- function(model_in_list) {
Add the step to the workflow:


```r
``` r
source("R/functions.R")
source("R/packages.R")

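(The remainder of the workflow is cut off by the diff view. As a hedged sketch, the target that the challenge asks for would look something like this; the target name is assumed, mirroring `model_summaries` above:)

``` r
# Hedged sketch — target name assumed, not taken from the lesson
tar_target(
  model_predictions,
  augment_with_mod_name(models),
  pattern = map(models)
)
```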