Meta make
I'd like to propose three very simple verbs that might get us halfway towards the goal of a DSL (#233). This proposal addresses the low-level technical part, which I think is required for any DSL.
`meta_make()`
Input: a drake plan. Output: a named list. Example of two equivalent plans:
```r
# Proposed plan
drake_plan(
  x = 1,
  y = 2,
  plan = drake_plan(
    a = x,
    b = y
  ),
  results = meta_make(plan)
)
# Equivalent plan that works in the current implementation
drake_plan(
  x = 1,
  y = 2,
  plan = ...,
  results = { plan; list(
    a = x,
    b = y
  )}
)
```
The arguments to `meta_make()` can be targets; that's where it becomes really powerful. If `meta_make()` is called with an up-to-date target and unchanged code, the results remain up to date too.
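To make the up-to-date rule concrete, here is a minimal base-R sketch. It assumes, purely for illustration, that a plan is a named list of quoted expressions listed in dependency order, and that "up to date" means identical plan code and identical upstream values. `meta_make_sketch()` and its `cache` argument are hypothetical names, not drake functions; drake's own change detection is more elaborate.

```r
# Hypothetical sketch, not drake's API: `plan` is a named list of quoted
# expressions, assumed to be in dependency order (no graph sorting here).
# `upstream` holds the values of outer targets that the plan refers to.
meta_make_sketch <- function(plan, upstream = list(), cache) {
  # Up to date: same plan code and same upstream values as the last run.
  if (identical(cache$plan, plan) && identical(cache$upstream, upstream)) {
    cache$rebuilt <- FALSE
    return(cache$results)
  }
  env <- list2env(upstream, parent = baseenv())
  for (name in names(plan)) {
    assign(name, eval(plan[[name]], envir = env), envir = env)
  }
  results <- mget(names(plan), envir = env)
  # Remember inputs and outputs for the next call.
  cache$plan <- plan
  cache$upstream <- upstream
  cache$results <- results
  cache$rebuilt <- TRUE
  results
}

cache <- new.env()
plan <- list(a = quote(x), b = quote(y * 2))
meta_make_sketch(plan, list(x = 1, y = 2), cache)  # builds list(a = 1, b = 4)
meta_make_sketch(plan, list(x = 1, y = 2), cache)  # reuses the cached results
cache$rebuilt                                      # FALSE
```

Changing either the plan or an upstream value (for example `x = 5`) triggers a rebuild.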
`unpack()`
Input: a named list. For each item (or row), a target is created in the plan. Example of two equivalent plans:
```r
# Proposed plan
drake_plan(
  results = list(a = 1, b = 2, c = 3),
  unpack(results)
)
# Equivalent plan that works in the current implementation
drake_plan(
  results = ...,
  a = { results; 1 },
  b = { results; 2 },
  c = { results; 3 }
)
```
The arguments to `unpack()` can be targets; that's where it becomes really powerful. This is related to #283 (multi-file output, and the equivalent for R objects), but I don't think #283 is a prerequisite. If `unpack()` is called with an up-to-date target and unchanged code, all resulting targets (from the last run) remain up to date too.
Unpacking is a declarative operation: we don't (necessarily) need to materialize all targets. In particular, if the target is the result of a previous call to `meta_make()`, the results are already unpacked.
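One way to keep unpacking declarative is to expand a list-valued target into per-element target definitions that merely refer to the parent, without evaluating anything. The base-R sketch below assumes the element names are already known (for example, from the last build). `unpack_sketch()` is a hypothetical helper. Its generated commands extract each value from the parent, whereas the `{ results; 1 }` workaround above hard-codes the value.

```r
# Hypothetical sketch: unpacking as plan expansion. Each element of the
# list-valued target becomes its own target whose command only refers to
# the parent target; nothing is evaluated here. `keys` are assumed known.
unpack_sketch <- function(target, keys) {
  commands <- lapply(keys, function(key) bquote(.(as.name(target))[[.(key)]]))
  names(commands) <- keys
  commands
}

commands <- unpack_sketch("results", c("a", "b", "c"))
commands$b  # the unevaluated call results[["b"]]
eval(commands$b, list(results = list(a = 1, b = 2, c = 3)))  # 2
```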
`pack()`
Semantics identical to `tibble::lst()`: construct a list from a set of targets. The main difference is that this is a declarative operation that doesn't physically construct the list yet. It can be used to bundle targets together for use in a subsequent operation. Example of two equivalent plans:
```r
# Proposed plan
drake_plan(
  a = 1,
  b = 2,
  packed = pack(a, b)
)
# Equivalent plan that works in the current implementation
drake_plan(
  a = 1,
  b = 2,
  packed = tibble::lst(a, b)
)
```
`pack()` is essentially the opposite of `unpack()`.
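A declarative `pack()` could record only the names of the targets to bundle and defer building the list until the values are needed. In this base-R sketch, `pack_sketch()` and `resolve_pack()` are hypothetical helpers. Unlike `tibble::lst()`, this version accepts only bare target names.

```r
# Hypothetical sketch of a declarative pack(): it records which targets
# to bundle (by name) instead of copying their values.
pack_sketch <- function(...) {
  exprs <- as.list(substitute(list(...)))[-1]   # unevaluated arguments
  targets <- vapply(exprs, deparse, character(1))
  structure(list(targets = targets), class = "lazy_pack")
}

# The list is only built when the bundle is resolved against target values;
# the result has the same shape as tibble::lst(a, b).
resolve_pack <- function(bundle, values) {
  mget(bundle$targets, envir = list2env(values))
}

bundle <- pack_sketch(a, b)   # `a` and `b` are never evaluated here
bundle$targets                # "a" "b"
resolve_pack(bundle, list(a = 1, b = 2, c = 3))
```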
We could combine meta-make and unpack into a single operation and not implement pack at all. I'm following the Unix philosophy here: I feel that we can only gain by exposing these operations separately, if only for testing. From the separate verbs, we can build a `flat_meta_make()` (meta-make + unpack) or even a pack + meta-make + unpack verb. These operations feel simple enough to be understood individually and in combination.
These three verbs seem like the simplest possible solution to me; maybe I'm missing a different decomposition into even simpler operations.
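To illustrate how a `flat_meta_make()` could fall out of the separate verbs, here is a base-R sketch in which targets live in an environment. `meta_make_lite()`, `unpack_into()` and this `flat_meta_make()` are hypothetical stand-ins that ignore caching. The point is only that the combined verb is a plain composition of the two.

```r
# Hypothetical: run a plan (named list of quoted expressions, assumed to be
# in dependency order) against existing targets; return a named list.
meta_make_lite <- function(plan, targets) {
  env <- new.env(parent = targets)
  for (name in names(plan)) {
    assign(name, eval(plan[[name]], envir = env), envir = env)
  }
  mget(names(plan), envir = env)
}

# Hypothetical: turn each element of a named list into a top-level target.
unpack_into <- function(results, targets) {
  list2env(results, envir = targets)
  invisible(names(results))
}

# The combined verb is just the composition of the two.
flat_meta_make <- function(plan, targets) {
  unpack_into(meta_make_lite(plan, targets), targets)
}

targets <- new.env()
assign("x", 1, envir = targets)
assign("y", 2, envir = targets)
flat_meta_make(list(a = quote(x), b = quote(y + 1)), targets)
mget(c("a", "b"), envir = targets)  # a and b are now top-level targets
```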
- Delayed plan evaluation, possibly a new target state "unknown"
- Visualization: We don't always want to expand the constructed plans when visualizing them
- Storing object hierarchies: when storing `x <- list(a = 1, b = 2)`, we want to be able to access `x$a` and `x$b` without loading `x`
- ...
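For the object-hierarchy point, one possible storage layout writes each element of a list-valued target to its own file, so that reading `x$a` never deserializes `x$b`. This layout is an assumption for illustration, not drake's actual cache format. The sketch uses base R's `saveRDS()`/`readRDS()`, assumes element names are safe to use as file names, and its helper names are hypothetical.

```r
# Hypothetical storage layout: one .rds file per element of a named list.
store_elements <- function(x, path) {
  dir.create(path, showWarnings = FALSE, recursive = TRUE)
  for (key in names(x)) {
    saveRDS(x[[key]], file.path(path, paste0(key, ".rds")))
  }
  invisible(path)
}

# Read a single element without touching the others.
read_element <- function(path, key) {
  readRDS(file.path(path, paste0(key, ".rds")))
}

x <- list(a = 1, b = 2)
path <- file.path(tempdir(), "x")
store_elements(x, path)
read_element(path, "a")  # loads only the file holding x$a
```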
The new verbs can be implemented in a similar way to dbplyr: when executed, they return a lightweight data structure that contains all the information necessary to assemble the result. (In dbplyr, `tbl %>% select(a, b) %>% filter(a > 5)` creates an object with a `sql_render()` method that composes the corresponding SQL, and only calling `collect()` actually runs the query.) This means that the objects returned by `meta_make()` et al. can simply be serialized without special treatment.
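A base-R sketch of that idea applied to `meta_make()`: the returned object only records the plan, a render function shows what would run (loosely analogous to `sql_render()`), and a collect function evaluates it (analogous to `collect()`). All names are hypothetical, and the plan is again assumed to be a named list of quoted expressions in dependency order. Because the object contains only names and quoted code, a plain `serialize()` round trip preserves it.

```r
# Hypothetical lazy object: records the plan, runs nothing.
lazy_meta_make <- function(plan) {
  structure(list(plan = plan), class = "lazy_meta_make")
}

# Show the commands that would run (cf. dbplyr's sql_render()).
render_plan <- function(op) {
  paste(names(op$plan), "<-", vapply(op$plan, deparse, character(1)))
}

# Actually evaluate the plan (cf. dbplyr's collect()).
collect_plan <- function(op, upstream = list()) {
  env <- list2env(upstream, parent = baseenv())
  for (name in names(op$plan)) {
    assign(name, eval(op$plan[[name]], envir = env), envir = env)
  }
  mget(names(op$plan), envir = env)
}

op <- lazy_meta_make(list(a = quote(x), b = quote(y * 2)))
render_plan(op)
restored <- unserialize(serialize(op, NULL))  # no special treatment needed
identical(restored, op)
collect_plan(restored, list(x = 1, y = 2))
```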
The examples above use named lists for illustration. This means that names for objects/targets must be strings (just like in the current implementation, so this is not a new restriction).
Ideally, I'd like arbitrary (multivariate) keys to describe targets, with a nested tibble as the data structure. (Let's not discuss this in too much detail for now.) If we support two-column data frames (target + x) from the start, we might be able to support multivariate keys later; I'd prefer this over the named-list approach.
Alternatively, we might want to stick with named lists and provide seamless support for the `enframe()` and `deframe()` verbs, which convert a named list to a two-column tibble and vice versa.
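For reference, the conversion these verbs perform fits in a few lines of base R. `enframe_sketch()` and `deframe_sketch()` are hypothetical stand-ins that mimic the default `name`/`value` columns of `tibble::enframe()`. The `value` column is a list column, so targets of any type fit.

```r
# Named list -> two-column data frame (character `name`, list column `value`).
enframe_sketch <- function(x) {
  df <- data.frame(name = names(x), stringsAsFactors = FALSE)
  df$value <- unname(x)  # assigning a list creates a list column
  df
}

# Two-column data frame -> named list.
deframe_sketch <- function(df) {
  setNames(df$value, df$name)
}

targets <- list(a = 1, b = "two", c = list(nested = TRUE))
frame <- enframe_sketch(targets)
identical(deframe_sketch(frame), targets)  # round trip preserves the list
```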
With a data-frame-based approach and multivariate keys, the DSL could focus on more efficient, elegant, and straightforward ways to construct plans, which are then passed on to `meta_make()`.
On the other hand, restricting target names to simple strings may be enough if our DSL adds multivariate keys on top of that. Again, let's postpone discussing that detail.