forked from ropensci/tarchetypes
---
output: github_document
---
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```
# tarchetypes <img src='man/figures/logo.png' align="right" height="139"/>
[![ropensci](https://badges.ropensci.org/401_status.svg)](https://github.com/ropensci/software-review/issues/401)
[![zenodo](https://zenodo.org/badge/282774543.svg)](https://zenodo.org/badge/latestdoi/282774543)
[![R Targetopia](https://img.shields.io/badge/R_Targetopia-member-blue?style=flat&labelColor=gray)](https://wlandau.github.io/targetopia/)
[![CRAN](https://www.r-pkg.org/badges/version/tarchetypes)](https://CRAN.R-project.org/package=tarchetypes)
[![status](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active)
[![check](https://github.com/ropensci/tarchetypes/actions/workflows/check.yaml/badge.svg)](https://github.com/ropensci/tarchetypes/actions?query=workflow%3Acheck)
[![codecov](https://codecov.io/gh/ropensci/tarchetypes/branch/main/graph/badge.svg?token=3T5DlLwUVl)](https://app.codecov.io/gh/ropensci/tarchetypes)
[![lint](https://github.com/ropensci/tarchetypes/actions/workflows/lint.yaml/badge.svg)](https://github.com/ropensci/tarchetypes/actions?query=workflow%3Alint)
The `tarchetypes` R package is a collection of target and pipeline archetypes for the [`targets`](https://github.com/ropensci/targets) package. These archetypes express complicated pipelines with concise syntax, which enhances readability and thus reproducibility. Archetypes are possible because of the flexible metaprogramming capabilities of [`targets`](https://github.com/ropensci/targets). In [`targets`](https://github.com/ropensci/targets), one can define a target as an object outside the central pipeline, and the [`tar_target_raw()`](https://docs.ropensci.org/targets/reference/tar_target_raw.html) function completely avoids non-standard evaluation. That means anyone can write their own niche interfaces for specialized projects. `tarchetypes` aims to include the most common and versatile archetypes and usage patterns.
## Grouped data frames
`tarchetypes` has functions for easy dynamic branching over subsets of data frames:
* `tar_group_by()`: define row groups using `dplyr::group_by()` semantics.
* `tar_group_select()`: define row groups using `tidyselect` semantics.
* `tar_group_count()`: define a given number of row groups.
* `tar_group_size()`: define row groups of a given size.
If you define a target with one of these functions, all downstream dynamic targets will automatically branch over the row groups.
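As a rough illustration of what "row groups" means for the functions above, the sketch below partitions a data frame with base R only. It is a conceptual picture, not how `tarchetypes` implements grouping, and the names `by_vars`, `by_size`, and `by_count` as well as the values `size = 5` and `count = 3` are assumptions chosen for the example.

```r
# Hypothetical data with 12 rows, same shape as the example in this README.
data <- expand.grid(var1 = c("a", "b"), var2 = c("c", "d"), rep = c(1, 2, 3))

# Grouping by variables: one group per unique (var1, var2) combination.
by_vars <- split(data, list(data$var1, data$var2), drop = TRUE)

# Grouping by size: consecutive chunks of at most `size` rows.
size <- 5
by_size <- split(data, ceiling(seq_len(nrow(data)) / size))

# Grouping by count: `count` chunks of roughly equal size.
count <- 3
by_count <- split(data, ceiling(seq_len(nrow(data)) * count / nrow(data)))

# Number of rows in each group.
vapply(by_vars, nrow, integer(1))
vapply(by_size, nrow, integer(1))
vapply(by_count, nrow, integer(1))
```

In a pipeline, each element of such a partition would correspond to one dynamic branch of a downstream target.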
```{r, echo = FALSE}
targets::tar_script({
  produce_data <- function() {
    expand.grid(var1 = c("a", "b"), var2 = c("c", "d"), rep = c(1, 2, 3))
  }
  list(
    tarchetypes::tar_group_by(data, produce_data(), var1, var2),
    tar_target(group, data, pattern = map(data))
  )
})
```
```{r, eval = FALSE}
# _targets.R file:
library(targets)
library(tarchetypes)
produce_data <- function() {
  expand.grid(var1 = c("a", "b"), var2 = c("c", "d"), rep = c(1, 2, 3))
}
list(
  tar_group_by(data, produce_data(), var1, var2),
  tar_target(group, data, pattern = map(data))
)
```
```{r}
# R console:
library(targets)
tar_make()
# First row group:
tar_read(group, branches = 1)
# Second row group:
tar_read(group, branches = 2)
```
## Literate programming
Consider the following R Markdown report.
```{r, echo = FALSE, comment = ""}
lines <- c(
  "---",
  "title: report",
  "output: html_document",
  "---",
  "",
  "```{r}",
  "library(targets)",
  "tar_read(dataset)",
  "```"
)
cat(lines, sep = "\n")
```
We want to define a target that renders the report. Because the report calls `tar_read(dataset)`, this target must depend on `dataset`. Without `tarchetypes`, setting up the pipeline correctly is cumbersome.
```{r, eval = FALSE}
# _targets.R
library(targets)
list(
  tar_target(dataset, data.frame(x = letters)),
  tar_target(
    report, {
      # Explicitly mention the symbol `dataset`.
      list(dataset)
      # Return relative paths to keep the project portable.
      fs::path_rel(
        # Need to return/track all input/output files.
        c(
          rmarkdown::render(
            input = "report.Rmd",
            # Always run from the project root
            # so the report can find _targets/.
            knit_root_dir = getwd(),
            quiet = TRUE
          ),
          "report.Rmd"
        )
      )
    },
    # Track the input and output files.
    format = "file",
    # Avoid building small reports on HPC.
    deployment = "main"
  )
)
```
With `tarchetypes`, we can simplify the pipeline with the `tar_render()` archetype.
```{r, eval = FALSE}
# _targets.R
library(targets)
library(tarchetypes)
list(
  tar_target(dataset, data.frame(x = letters)),
  tar_render(report, "report.Rmd")
)
```
Above, `tar_render()` scans code chunks for mentions of targets in `tar_load()` and `tar_read()`, and it enforces the dependency relationships it finds. In our case, it reads `report.Rmd` and then forces `report` to depend on `dataset`. That way, `tar_make()` always processes `dataset` before `report`, and it automatically reruns `report.Rmd` whenever `dataset` changes.
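To make the idea of scanning a report for target mentions concrete, here is a deliberately simplified base R sketch. It is not the detection logic `tarchetypes` actually uses: the helper `find_target_mentions()` is hypothetical, and the regular expression only catches a single bare symbol passed to `tar_read()` or `tar_load()`.

```r
# Hypothetical helper: list symbols passed directly to tar_read() or tar_load().
# Simplification: ignores multi-target selections and non-code text.
find_target_mentions <- function(lines) {
  pattern <- "tar_(read|load)\\([[:space:]]*([A-Za-z.][A-Za-z0-9._]*)"
  hits <- regmatches(lines, regexec(pattern, lines))
  # Each hit holds the full match plus two capture groups;
  # the second capture group is the target name.
  names_found <- lapply(hits, function(m) if (length(m) == 3) m[3] else NULL)
  as.character(unique(unlist(names_found)))
}

# Report contents matching the example report in this README.
report_lines <- c(
  "---", "title: report", "output: html_document", "---", "",
  "```{r}", "library(targets)", "tar_read(dataset)", "```"
)

deps <- find_target_mentions(report_lines)
deps
```

Here `deps` contains `"dataset"`, which is the dependency that `report` needs in the pipeline above.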
## Alternative pipeline syntax
[`tar_plan()`](https://docs.ropensci.org/tarchetypes/reference/tar_plan.html) is a drop-in replacement for [`drake_plan()`](https://docs.ropensci.org/drake/reference/drake_plan.html) in the [`targets`](https://github.com/ropensci/targets) ecosystem.
It lets users write targets as name/command pairs without having to call [`tar_target()`](https://docs.ropensci.org/targets/reference/tar_target.html).
```{r, eval = FALSE}
tar_plan(
  # tar_file() already tracks its result as a file, so no `format` is needed:
  tar_file(raw_data_file, "data/raw_data.csv"),
  # Simple drake-like syntax:
  raw_data = read_csv(raw_data_file, col_types = cols()),
  data = raw_data %>%
    mutate(Ozone = replace_na(Ozone, mean(Ozone, na.rm = TRUE))),
  hist = create_plot(data),
  fit = biglm(Ozone ~ Wind + Temp, data),
  # Needs tar_render() because it is a target archetype:
  tar_render(report, "report.Rmd")
)
```
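For comparison, the sketch below spells out part of the same plan with explicit `tar_target()` calls, which is roughly what the name/command pairs stand for. It is an illustration under assumptions: only three of the targets are shown, and `read_csv()` and `cols()` are assumed to come from `readr`, which the original plan does not load explicitly.

```r
# _targets.R (sketch of the explicit form; not a complete pipeline)
library(targets)
library(tarchetypes)
library(readr) # Assumption: provides read_csv() and cols().
list(
  # A file target: the command returns a path, and targets tracks the file.
  tar_target(raw_data_file, "data/raw_data.csv", format = "file"),
  # Equivalent of the pair `raw_data = read_csv(...)` inside tar_plan():
  tar_target(raw_data, read_csv(raw_data_file, col_types = cols())),
  # Archetypes such as tar_render() are written the same way in both styles.
  tar_render(report, "report.Rmd")
)
```

The name/command style mainly saves typing when most targets are plain commands, while archetypes still have to be called by name.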
## Installation
Type | Source | Command
---|---|---
Release | CRAN | `install.packages("tarchetypes")`
Development | GitHub | `remotes::install_github("ropensci/tarchetypes")`
Development | rOpenSci | `install.packages("tarchetypes", repos = "https://dev.ropensci.org")`
## Documentation
For specific documentation on `tarchetypes`, including the help files of all user-side functions, please visit the [reference website](https://docs.ropensci.org/tarchetypes/). For documentation on [`targets`](https://github.com/ropensci/targets) in general, please visit the [`targets` reference website](https://docs.ropensci.org/targets/). Many of the linked resources use `tarchetypes` functions such as [`tar_render()`](https://docs.ropensci.org/tarchetypes/reference/tar_render.html).
## Help
Please read the [help guide](https://books.ropensci.org/targets/help.html) to learn how best to ask for help using `targets` and `tarchetypes`.
## Code of conduct
Please note that this package is released with a [Contributor Code of Conduct](https://ropensci.org/code-of-conduct/).
## Citation
```{r}
citation("tarchetypes")
```
```{r, echo = FALSE}
unlink("_targets.R")
tar_destroy()
```