Unexpected differences in cur_group_id() operations in vs. outside mutate calls #6889

Closed
Nick-Eagles opened this issue Jul 21, 2023 · 1 comment

@Nick-Eagles

Hello, I'm encountering unexpected behavior from cur_group_id(). In particular, I expected operations wrapped around cur_group_id() inside a mutate() call to treat its output as a vector along the rows of the tibble.

In the reprex below, calling duplicated(cur_group_id()) inside the mutate() call gives a different result than calling cur_group_id() inside the mutate() call, extracting the result, and then calling duplicated() on it.

library(tidyverse)
data(mtcars)

#   Pick a column to group by (that has repeated values)
mt_tibble = as_tibble(mtcars) |>
    group_by(cyl)

#  Call 'duplicated' outside the 'mutate' call
mt_tibble |>
    mutate(gid = cur_group_id()) |>
    pull(gid) |>
    duplicated() |>
    table()

## FALSE  TRUE 
##     3    29

# Call 'duplicated' inside 'mutate' call
mt_tibble |>
    mutate(gid = duplicated(cur_group_id())) |>
    pull(gid) |>
    table()

## FALSE 
##    32 

I'd like to know whether I'm fundamentally misusing cur_group_id(), since I'm not very experienced with it. If this is the expected behavior, though, I think it is counterintuitive for many users and seems inconsistent with other dplyr behavior. For example, mutate(n = n() * 2) is a perfectly valid operation that elementwise doubles the values output by n().

Thanks!

Best,

-Nick

Session info (using the latest dplyr 1.1.2):

─ Session info ────────────────────────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.3.1 Patched (2023-07-21 r84719)
 os       CentOS Linux 7 (Core)
 system   x86_64, linux-gnu
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       US/Eastern
 date     2023-07-21
 pandoc   3.1.1 @ /jhpce/shared/jhpce/core/conda/miniconda3-4.11.0/envs/svnR-4.3/bin/pandoc

─ Packages ────────────────────────────────────────────────────────────────────────────────────
 package     * version date (UTC) lib source
 cli           3.6.1   2023-03-23 [2] CRAN (R 4.3.0)
 colorspace    2.1-0   2023-01-23 [2] CRAN (R 4.3.0)
 dplyr       * 1.1.2   2023-04-20 [2] CRAN (R 4.3.0)
 fansi         1.0.4   2023-01-22 [2] CRAN (R 4.3.0)
 forcats     * 1.0.0   2023-01-29 [2] CRAN (R 4.3.0)
 generics      0.1.3   2022-07-05 [2] CRAN (R 4.3.0)
 ggplot2     * 3.4.2   2023-04-03 [2] CRAN (R 4.3.0)
 glue          1.6.2   2022-02-24 [2] CRAN (R 4.3.0)
 gtable        0.3.3   2023-03-21 [2] CRAN (R 4.3.0)
 hms           1.1.3   2023-03-21 [2] CRAN (R 4.3.0)
 lifecycle     1.0.3   2022-10-07 [2] CRAN (R 4.3.0)
 lubridate   * 1.9.2   2023-02-10 [2] CRAN (R 4.3.0)
 magrittr      2.0.3   2022-03-30 [2] CRAN (R 4.3.0)
 munsell       0.5.0   2018-06-12 [2] CRAN (R 4.3.0)
 pillar        1.9.0   2023-03-22 [2] CRAN (R 4.3.0)
 pkgconfig     2.0.3   2019-09-22 [2] CRAN (R 4.3.0)
 purrr       * 1.0.1   2023-01-10 [2] CRAN (R 4.3.0)
 R6            2.5.1   2021-08-19 [2] CRAN (R 4.3.0)
 readr       * 2.1.4   2023-02-10 [2] CRAN (R 4.3.0)
 rlang         1.1.1   2023-04-28 [2] CRAN (R 4.3.0)
 scales        1.2.1   2022-08-20 [2] CRAN (R 4.3.0)
 sessioninfo * 1.2.2   2021-12-06 [2] CRAN (R 4.3.0)
 stringi       1.7.12  2023-01-11 [2] CRAN (R 4.3.0)
 stringr     * 1.5.0   2022-12-02 [2] CRAN (R 4.3.0)
 tibble      * 3.2.1   2023-03-20 [2] CRAN (R 4.3.0)
 tidyr       * 1.3.0   2023-01-24 [2] CRAN (R 4.3.0)
 tidyselect    1.2.0   2022-10-10 [2] CRAN (R 4.3.0)
 tidyverse   * 2.0.0   2023-02-22 [2] CRAN (R 4.3.0)
 timechange    0.2.0   2023-01-11 [2] CRAN (R 4.3.0)
 tzdb          0.4.0   2023-05-12 [2] CRAN (R 4.3.0)
 utf8          1.2.3   2023-01-31 [2] CRAN (R 4.3.0)
 vctrs         0.6.3   2023-06-14 [2] CRAN (R 4.3.1)
 withr         2.5.0   2022-03-03 [2] CRAN (R 4.3.0)

 [1] /users/neagles/R/4.3
 [2] /jhpce/shared/jhpce/core/conda/miniconda3-4.11.0/envs/svnR-4.3/R/4.3/lib64/R/site-library
 [3] /jhpce/shared/jhpce/core/conda/miniconda3-4.11.0/envs/svnR-4.3/R/4.3/lib64/R/library
@DavisVaughan
Member

cur_group_id() returns a single value, the current group id. The duplicated(cur_group_id()) expression is called 3 times, once for each cyl group, so it basically gets called as duplicated(1), duplicated(2), and duplicated(3), all of which return FALSE, and then that FALSE is recycled to the size of each group.
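
The per-group evaluation described above can be checked directly. The following is a minimal sketch using the same mtcars grouping as the reprex; the names by_cyl and per_group are chosen here for illustration only. It measures the length of what cur_group_id() and n() return inside each group. Both are length 1, which suggests that mutate(n = n() * 2) also works through recycling of a single per-group value rather than through elementwise evaluation along the rows.

```r
library(dplyr)

by_cyl <- as_tibble(mtcars) |>
    group_by(cyl)

# Evaluate each expression once per group and record what it returns
per_group <- by_cyl |>
    summarise(
        id        = cur_group_id(),          # the group's integer id
        id_length = length(cur_group_id()),  # length of that result
        n_length  = length(n()),             # n() is also a single value
        rows      = n()                      # size each value is recycled to
    )

print(per_group)

# duplicated() on a length-1 input is always FALSE
duplicated(1L)
```

Because every group hands duplicated() a single value, each call returns one FALSE, and mutate() recycles it to the 11, 7, and 14 rows of the three cyl groups.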

So nothing is really wrong here, but I don't think this is a common usage of duplicated() or cur_group_id().
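
If the goal was the per-row result from the first half of the reprex, two possible approaches are sketched below. This assumes the intent is to flag every row after the first one in its group, which is not stated in the thread; the names ungrouped_dup and grouped_dup are illustrative.

```r
library(dplyr)

by_cyl <- as_tibble(mtcars) |>
    group_by(cyl)

# Option 1: store the ids, drop the grouping, then call duplicated()
# on the full 32-row column
ungrouped_dup <- by_cyl |>
    mutate(gid = cur_group_id()) |>
    ungroup() |>
    mutate(dup = duplicated(gid)) |>
    pull(dup)

# Option 2: stay grouped and flag every row after the first in its group
grouped_dup <- by_cyl |>
    mutate(dup = row_number() > 1) |>
    pull(dup)

table(ungrouped_dup)

# Both options produce the same logical vector
identical(ungrouped_dup, grouped_dup)
```

Both options give 3 FALSE and 29 TRUE, matching the result of calling duplicated() outside mutate() in the reprex. Option 1 keeps the original logic and only changes where duplicated() sees the data; Option 2 avoids cur_group_id() altogether.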

This question may be better for Posit Community https://community.rstudio.com/
