Meta make
I'd like to propose three very simple verbs that might get us halfway towards the goal of a DSL (#233). This proposal addresses the low-level technical part, which I think is required for any DSL.
meta_make() takes a drake plan as input and returns a named list. Example of two equivalent plans:
# Proposed plan
plan <- drake::drake_plan(
  x = 1,
  y = 2,
  meta_plan = drake_plan(
    a = x,
    b = y
  ),
  results = meta_make(meta_plan)
)
# Equivalent plan that works in the current implementation
plan <- drake::drake_plan(
  x = 1,
  y = 2,
  meta_plan = NULL,
  results = { meta_plan; list(
    a = x,
    b = y
  )}
)
drake::make(plan)
#> cache /tmp/RtmpArc3UM/.drake
#> connect 1 import: plan
#> connect 4 targets: x, y, meta_plan, results
#> check 1 item: list
#> check 3 items: meta_plan, x, y
#> target meta_plan
#> check 1 item: results
#> load 2 items: x, y
#> target results
drake::readd(results)
#> cache /tmp/RtmpArc3UM/.drake
#> $a
#> [1] 1
#>
#> $b
#> [1] 2
Created on 2018-03-07 by the reprex package (v0.2.0).
The argument to meta_make() can be a target; that's where it becomes really powerful. If meta_make() is called with an up-to-date target and unchanged code, the results remain up to date too.
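To make the intended semantics concrete, here is a minimal base-R sketch of what meta_make() might do. The name meta_make_sketch(), the modeling of a plan as a named list of quoted commands, and the targets argument are all assumptions for illustration; none of this is drake's API, and the sketch ignores caching and up-to-date checks.

```r
# Hypothetical sketch, not drake's API: a "plan" is modeled as a named list
# of unevaluated commands, and `targets` holds targets that were already built.
meta_make_sketch <- function(plan, targets = list()) {
  env <- list2env(targets, parent = baseenv())
  # Evaluate each command against the existing targets. The result is a
  # named list with one element per target of the inner plan.
  lapply(plan, function(command) eval(command, envir = env))
}

meta_plan <- list(a = quote(x), b = quote(y))
results <- meta_make_sketch(meta_plan, targets = list(x = 1, y = 2))
str(results)
```

The point of the sketch is only that the inner plan is data: it can itself be a target, and the list of results is derived from it.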
unpack() takes a named list as input. For each element, a target is created in the plan. Example of two equivalent plans:
# Proposed plan
drake_plan(
  results = list(a = 1, b = 2, c = 3),
  unpack(results)
)
# Equivalent plan that works in the current implementation
drake_plan(
  results = list(a = 1, b = 2, c = 3),
  a = results$a,
  b = results$b,
  c = results$c
)
The arguments to unpack() can be targets; that's where it becomes really powerful. This is related to #283 (multi-file output, and the equivalent for R objects), but I don't think #283 is a prerequisite. If unpack() is called with an up-to-date target and unchanged code, all resulting targets (from the last run) remain up to date too.
The unpacking is a declarative operation; we don't (necessarily) need to materialize all targets. In particular, if the target is the result of a previous call to meta_make(), the results are already unpacked.
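As a rough illustration of the expansion that unpack() implies, the following base-R sketch generates one plan row per list element. The helper unpack_sketch() and its arguments are hypothetical; the only thing borrowed from drake is the convention that a plan is a data frame with target and command columns. The sketch assumes the element names are already known, which in practice would only be true once the list target has been built.

```r
# Hypothetical sketch, not drake's API: given the name of a list-valued
# target and the names of its elements, emit one plan row per element.
unpack_sketch <- function(target, element_names) {
  data.frame(
    target = element_names,
    command = sprintf("%s$%s", target, element_names),
    stringsAsFactors = FALSE
  )
}

rows <- unpack_sketch("results", c("a", "b", "c"))
print(rows)
```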
pack() has semantics identical to tibble::lst(): it constructs a list from a set of targets. The main difference is that this is a declarative operation that doesn't physically construct the list yet. It can be used to bundle targets together for use in a subsequent operation. Example of two equivalent plans:
# Proposed plan
drake_plan(
  a = 1,
  b = 2,
  packed = pack(a, b)
)
# Equivalent plan that works in the current implementation
drake_plan(
  a = 1,
  b = 2,
  packed = tibble::lst(a, b)
)
pack() is essentially the opposite of unpack().
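The naming behavior that pack() borrows from tibble::lst() can be approximated in base R as follows. pack_sketch() is a hypothetical helper, and unlike the proposed verb it builds the list eagerly; it only shows how the element names would be derived from the target names.

```r
# Hypothetical sketch: bundle objects into a list whose element names are
# taken from the expressions passed in, similar to tibble::lst(a, b).
pack_sketch <- function(...) {
  values <- list(...)
  exprs <- as.list(substitute(list(...)))[-1]
  names(values) <- vapply(exprs, deparse, character(1))
  values
}

a <- 1
b <- 2
packed <- pack_sketch(a, b)
str(packed)

# Extracting the elements again recovers the original objects.
identical(packed$a, a) && identical(packed$b, b)
```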
We could do meta-make + unpack as a single operation, and not implement pack at all. I'm following the Unix philosophy here, because I feel that we can only gain by exposing these operations separately, if only for testing. From the separate verbs, we can provide a flat_meta_make() (meta-make + unpack) or even a pack + meta-make + unpack verb. These operations feel simple enough to be understood individually and in combination.
These three verbs seem to me the simplest possible solution, but maybe I'm missing a different decomposition into even simpler operations.
- Delayed plan evaluation, possibly a new target state "unknown"
- Visualization: We don't always want to expand the constructed plans when visualizing them
- Storing object hierarchies: When storing x <- list(a = 1, b = 2), we want to be able to access x$a and x$b without loading x
- ...
The new verbs can be implemented in a similar way to dbplyr: when executed, they return a lightweight data structure that contains all the information necessary to assemble the result. (In dbplyr, tbl %>% select(a, b) %>% filter(a > 5) creates an object that has a sql_render() method which composes the corresponding SQL, and only calling collect() will actually run the query.) This means that the objects returned by meta_make() et al. can just be serialized without special treatment.
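A minimal base-R sketch of this idea, under the assumption that a verb returns a plain description of the work instead of the result. The names lazy_op() and resolve() and the "lazy_op" class are hypothetical and exist only for this illustration; only the pack operation is modeled.

```r
# Hypothetical sketch: a verb returns a lightweight description of the
# operation (plain data with a class attribute), not the result itself.
lazy_op <- function(op, ...) {
  structure(list(op = op, args = list(...)), class = "lazy_op")
}

# Turn a description into an actual value, given the built targets.
resolve <- function(x, targets) {
  if (!inherits(x, "lazy_op")) {
    return(x)
  }
  args <- lapply(x$args, resolve, targets = targets)
  switch(
    x$op,
    pack = targets[unlist(args)],
    stop("unknown operation: ", x$op)
  )
}

targets <- list(a = 1, b = 2, c = 3)
packed <- lazy_op("pack", "a", "b")

# The description is ordinary R data, so it round-trips through
# serialization without special treatment.
restored <- unserialize(serialize(packed, NULL))
stopifnot(identical(restored, packed))

# Only resolving the description touches the actual targets.
str(resolve(restored, targets))
```

As with collect() in dbplyr, the cost of materializing the result is deferred until resolve() is called.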
The examples above use named lists for illustration. This means that names for objects/targets must be strings (just like in the current implementation, so not a restriction).
Ideally I'd prefer arbitrary (multivariate) keys to describe targets, and a nested tibble as data structure. (Let's not discuss this in too much detail for now.) If we support two-column data frames (target + x) from the start, we might be able to support multivariate keys later; I'd prefer this over the named list approach.
Alternatively, we might want to stick with named lists and provide seamless support for the enframe() and deframe() verbs that convert a named list to a two-column tibble and vice versa.
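For reference, the round trip between the two representations could look like this with the tibble package (assumed to be installed). The column names target and x follow the two-column layout mentioned above and are otherwise arbitrary.

```r
library(tibble)

results <- list(a = 1, b = 2)

# Named list -> two-column tibble (names in `target`, values in a list column `x`).
df <- enframe(results, name = "target", value = "x")
print(df)

# Two-column tibble -> named list.
back <- deframe(df)
str(back)
```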
With a data-frame-based approach and multivariate keys, the focus of the DSL will be more efficient/elegant/straightforward ways to construct plans, which are then passed on to meta_make().
On the other hand, restricting target names to simple strings may be enough if our DSL adds multivariate keys on top of that. Again, let's postpone discussion on that detail.