
Meta make

Kirill Müller edited this page Mar 6, 2018 · 19 revisions

I'd like to propose three very simple verbs that might get us halfway towards the goal of a DSL (#233). This proposal addresses the low-level technical part, which I think is required for any DSL.

meta-make

Input: A drake plan. Output: A named list (or a tibble with two columns, target and payload). Example of two equivalent plans:

drake_plan(
  x = 1,
  y = 2,
  plan = drake_plan(
    a = x,
    b = y
  ),
  results = meta_make(plan)
)

drake_plan(
  x = 1,
  y = 2,
  plan = ...,
  results = { plan; list(
    a = x,
    b = y
  )}
)

The arguments to meta_make() can themselves be targets, and that's where it becomes really powerful. If meta_make() is called with an up-to-date target and unchanged code, the results remain up to date too.
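To make the semantics concrete, here is a minimal base-R sketch of what meta_make() would compute for the example above. It is not drake code. `meta_make_sketch()` is a hypothetical name, the plan is modeled as a plain data frame with character `target` and `command` columns, rows are assumed to already be in dependency order, and caching and up-to-date checks are left out entirely.

```r
# Hypothetical sketch: evaluate an inner plan and return its targets as a
# named list. Assumes rows are in dependency order (no graph analysis).
meta_make_sketch <- function(plan, upstream = list()) {
  # Upstream targets (x and y in the example) are visible to the inner
  # commands through a private environment.
  env <- list2env(upstream, parent = baseenv())
  for (i in seq_len(nrow(plan))) {
    value <- eval(parse(text = plan$command[[i]]), envir = env)
    assign(plan$target[[i]], value, envir = env)
  }
  # Output: one list element per inner target, in plan order.
  mget(plan$target, envir = env)
}

inner_plan <- data.frame(
  target = c("a", "b"),
  command = c("x", "y"),
  stringsAsFactors = FALSE
)
results <- meta_make_sketch(inner_plan, upstream = list(x = 1, y = 2))
str(results)
```

The upstream values play the role of the outer targets x and y, and the returned named list corresponds to the `results` target.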

unpack

Input: A named list (or a data frame with two columns, target (character) and payload). For each item/row, a target is created in the plan. Example of two equivalent plans:

drake_plan(
  results = list(a = 1, b = 2, c = 3),
  unpack(results)
)

drake_plan(
  results = ...,
  a = { results; 1 },
  b = { results; 2 },
  c = { results; 3 }
)

The arguments to unpack() can be targets too, and again that's where it becomes really powerful. This is related to #283 (multi-file output, and the equivalent for R objects), but I don't think #283 is a prerequisite. If unpack() is called with an up-to-date target and unchanged code, all resulting targets (from the last run) remain up to date too.

Unpacking is a declarative operation: we don't (necessarily) need to materialize all targets. In particular, if the target is the result of a previous call to meta_make(), the results are already unpacked.
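A sketch of unpack() as a pure plan transformation, assuming (hypothetically) that the element names are already known when the plan is built. `unpack_sketch()` only writes new plan rows and computes nothing, which matches the declarative reading above. Each generated command refers to `results`, so the new targets depend on it in the same way as the `{ results; 1 }` commands. When the names are only known after `results` has been built, this approach breaks down; that is the "delayed plan evaluation" challenge.

```r
# Hypothetical sketch: turn a packed target into one plan row per element.
# Only commands are generated; no values are computed here.
unpack_sketch <- function(packed_target, element_names) {
  data.frame(
    target = element_names,
    # Each command indexes into the packed target, creating the dependency.
    command = sprintf('%s[["%s"]]', packed_target, element_names),
    stringsAsFactors = FALSE
  )
}

unpack_sketch("results", c("a", "b", "c"))
```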

pack

Semantics identical to tibble::lst() (or maybe tibble::lst() %>% tibble::enframe()): Construct a list from a set of targets. The main difference is that this is a declarative operation that doesn't physically construct the list yet. It can be used to bundle targets together for use in a subsequent operation.

Essentially the opposite of unpack().
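A lazy pack() can be sketched in the same spirit: like tibble::lst(), it takes bare target names, but it only records them. `pack_sketch()`, `collect_pack()` and the `lazy_pack` class are hypothetical, and a plain environment stands in for drake's cache.

```r
# Hypothetical sketch: record which targets to bundle, without reading them.
pack_sketch <- function(...) {
  # Capture the unevaluated arguments and keep their names as strings.
  args <- as.list(substitute(list(...)))[-1]
  targets <- vapply(args, deparse, character(1), USE.NAMES = FALSE)
  structure(list(targets = targets), class = "lazy_pack")
}

# Materialize the bundle from a store of already built targets.
collect_pack <- function(packed, store) {
  mget(packed$targets, envir = store)
}

store <- list2env(list(a = 1, b = 2, c = 3))
bundle <- pack_sketch(a, b)   # a and b are not evaluated here
collect_pack(bundle, store)   # list(a = 1, b = 2)
```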

Why three verbs?

We could do meta-make + unpack as a single operation, and not implement pack at all. I'm following the Unix philosophy here, because I feel that we can only gain by exposing these operations separately, if only for testing. From the separate verbs, we can provide a flat_meta_make() (meta-make + unpack) or even a pack + meta-make + unpack verb. These operations feel simple enough to be understood individually and in combination.

These three verbs seem like the simplest possible solution to me, but maybe I'm missing a different decomposition into even simpler operations.

Towards a DSL?

I'm not deeply in love with this proposal yet, because it's restricted to strings as names for objects/targets. Ideally I'd prefer arbitrary (multivariate) keys for targets, and a nested tibble as the data structure. (Let's not discuss this in too much detail for now.) If we support two-column data frames (target + payload) from the start, we might be able to support multivariate keys later; I'd prefer this over the named list approach.
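For illustration, converting between the named-list form and the two-column data frame form is straightforward in base R if the payload lives in a list column; additional key columns could later sit next to `target`. The helper names are hypothetical; tibble::enframe() and tibble::deframe() do roughly the same, with `name`/`value` as their default column names.

```r
# Hypothetical helpers: named list <-> data frame with a list column.
list_to_frame <- function(x) {
  df <- data.frame(target = names(x), stringsAsFactors = FALSE)
  df$payload <- unname(x)   # list column: one payload of any type per row
  df
}

frame_to_list <- function(df) {
  stats::setNames(df$payload, df$target)
}

x <- list(a = 1, b = "two", c = 1:3)
df <- list_to_frame(x)
identical(frame_to_list(df), x)   # TRUE
```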

With multivariate keys, the DSL could focus on more efficient/elegant/straightforward ways to construct plans, which are then passed on to meta_make().

Challenges

  • Delayed plan evaluation, possibly a new target state "unknown"
  • Visualization: We don't always want to expand the constructed plans when visualizing them
  • Storing object hierarchies: When storing x <- list(a = 1, b = 2), we want to be able to access x$a and x$b without loading x
  • ...
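One way to approach the storage challenge, sketched under the assumption of a file-per-element layout (not drake's actual cache format): each element of x is saved to its own RDS file, so x$a can be read without deserializing the rest of x. Both helpers are hypothetical.

```r
# Hypothetical sketch: store each element of a list in its own RDS file.
store_hierarchy <- function(x, dir) {
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  for (nm in names(x)) {
    saveRDS(x[[nm]], file.path(dir, paste0(nm, ".rds")))
  }
  invisible(dir)
}

# Read back a single element; other elements' files are not touched.
load_element <- function(dir, name) {
  readRDS(file.path(dir, paste0(name, ".rds")))
}

cache_dir <- file.path(tempdir(), "x")
store_hierarchy(list(a = 1, b = 2), cache_dir)
load_element(cache_dir, "a")   # reads only a.rds
```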

Implementation ideas

The new verbs can be implemented in a similar way to dbplyr: When executed, they return a lightweight data structure that contains all the information necessary to assemble the result. (In dbplyr, tbl %>% select(a, b) %>% filter(a > 5) creates an object that has a sql_render() method which composes the corresponding SQL, and only calling collect() will actually run the query.) This means that the objects returned by meta_make() et al. can just be serialized without special treatment.
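A minimal sketch of the dbplyr analogy, with hypothetical names: `meta_make_lazy()` only records the plan, `render_lazy()` plays the role of sql_render() by describing what would be built, and `collect_lazy()` plays the role of collect(). The plan is again assumed to be a data frame with `target` and `command` columns in dependency order. Because the lazy object is an ordinary list, serialize() round-trips it unchanged.

```r
# Hypothetical lazy object: just the plan plus a class tag.
meta_make_lazy <- function(plan) {
  structure(list(plan = plan), class = "lazy_meta_make")
}

# Analogue of sql_render(): describe the work without doing it.
render_lazy <- function(lazy) {
  paste(lazy$plan$target, "<-", lazy$plan$command)
}

# Analogue of collect(): evaluate the commands in order, given upstream values.
collect_lazy <- function(lazy, upstream = list()) {
  env <- list2env(upstream, parent = baseenv())
  for (i in seq_len(nrow(lazy$plan))) {
    assign(lazy$plan$target[[i]],
           eval(parse(text = lazy$plan$command[[i]]), envir = env),
           envir = env)
  }
  mget(lazy$plan$target, envir = env)
}

plan <- data.frame(target = c("a", "b"), command = c("x", "a + y"),
                   stringsAsFactors = FALSE)
lazy <- meta_make_lazy(plan)
render_lazy(lazy)                             # "a <- x" "b <- a + y"
restored <- unserialize(serialize(lazy, NULL))
collect_lazy(restored, list(x = 1, y = 2))    # list(a = 1, b = 3)
```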
