Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More robust data mask rechaining #1231

Open
lionel- opened this issue Jun 30, 2021 · 1 comment
Open

More robust data mask rechaining #1231

lionel- opened this issue Jun 30, 2021 · 1 comment
Labels
Milestone

Comments

@lionel-
Copy link
Member

lionel- commented Jun 30, 2021

Currently the lexical scope of environments that escape data masking contexts is rather undefined. These environments consistently include the masked data but beyond that it depends which quosure was first evaluated:

e1 <- NULL
q1 <- local({
  foo <- 10
  e1 <<- environment()
  quo(environment())
})

e2 <- NULL
q2 <- local({
  foo <- "wrong"
  e2 <<- environment()
  quo(identity(!!q1))
})

out <- eval_tidy(q2)

evalq(foo, out)
#> [1] "wrong"

identical(env_parent(out), e2)
#> [1] TRUE

That's because of the way data masks are rechained to the quosure env as evaluation progresses from one quosure to the next.

This strategy causes pain for some metaprogramming tasks such as quosure expansion in dplyr::across() (cc @romainfrancois), making it difficult to create quosures inside data masks. We are now at our third patch release of dplyr to deal with edge cases related to this.

An alternative rechaining strategy that would be more predictable and easier to work with would be as follows:

  • Create a new execution environment before evaluating each quosure.
  • Set the parent of that exec env to the data mask. Set the data mask top to the quosure env.
  • After evaluation, set the parent of the exec env back to the quosure env. Set the data mask top to the empty env.

With this design all environments created within data masks would consistently inherit from their original lexical environment. On the other hand, they would no longer inherit from the data mask, which is a potentially important breaking change. We would need to assess damage for tidyverse and tidymodels reverse dependencies (cc @topepo @DavisVaughan).

@lionel- lionel- added this to the future milestone Jun 30, 2021
@lionel-
Copy link
Member Author

lionel- commented Jul 5, 2021

dplyr data masks are marked stale after evaluation so removing the mask from the scope would converge towards dplyr behaviour:

library(dplyr)

out <- mtcars %>%
  transmute(fit = list(lm(disp ~ drat)))

fit <- out$fit[[1]]
e <- environment(fit$terms)

rlang::env_print(rlang::env_parent(e))
#> <environment: 0x7f93126f10f8>
#> Parent: <environment: 0x7f9322012090>
#> Bindings:
#> • gear: <lazy>
#> • drat: <lazy>
#> • wt: <lazy>
#> • fit: <lazy>
#> • mpg: <lazy>
#> • hp: <lazy>
#> • cyl: <lazy>
#> • am: <lazy>
#> • qsec: <lazy>
#> • vs: <lazy>
#> • carb: <lazy>

rlang::env_parent(e)$cyl
#> Error: Obsolete data mask.
#> ✖ Too late to resolve `cyl` after the end of `dplyr::mutate()`.
#> ℹ Did you save an object that uses `cyl` lazily in a column in the `dplyr::mutate()` expression ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant