reframe() is dropping columns when you use rowwise() #6903

Closed
mutahiwachira opened this issue Aug 4, 2023 · 2 comments
@mutahiwachira

mutahiwachira commented Aug 4, 2023

Minimal reprex

library(dplyr)
library(tidyr)  # needed for unnest() in the second example
quantile_df <- function(x, probs = c(0.25, 0.5, 0.75)) {
  tibble(
    val = quantile(x, probs, na.rm = TRUE),
    quant = probs
  )
}

# Actual behaviour: drops the original columns and returns only the columns from quantile_df()
starwars %>%
  rowwise() %>% 
  reframe(quantile_df(height)) %>%
  ungroup()
  # 261 rows, 2 cols from quantile_df

# Expected behaviour: preserve the original columns, as this code does
starwars %>% 
  rowwise() %>% 
  mutate(quantiles = list(quantile_df(height))) %>% 
  unnest(quantiles) %>%
  ungroup()
  # 261 rows, all columns preserved

I want to be able to go from 1 row to multiple rows while keeping the previous information.

The use case is described below.

Use case

  • I was simulating the German tank problem using a functional, list-column-heavy workflow. I have a data frame called sensitivities whose columns hold the parameters I need to generate and describe my samples. It looks like this:
# A tibble: 18 × 4
#    pop_size prop_of_pop all_tanks     sample_size
#       <dbl>       <dbl> <list>              <dbl>
#  1     1000         0.1 <int [1,000]>         100
#  2     1000         0.2 <int [1,000]>         200
#  3     1000         0.3 <int [1,000]>         300
#  4     1000         0.4 <int [1,000]>         400
#  5     1000         0.5 <int [1,000]>         500
#  6     1000         0.6 <int [1,000]>         600
#  7     1000         0.7 <int [1,000]>         700
#  8     1000         0.8 <int [1,000]>         800
#  9     1000         0.9 <int [1,000]>         900
# 10     2000         0.1 <int [2,000]>         200
# 11     2000         0.2 <int [2,000]>         400
# 12     2000         0.3 <int [2,000]>         600
# 13     2000         0.4 <int [2,000]>         800
# 14     2000         0.5 <int [2,000]>        1000
# 15     2000         0.6 <int [2,000]>        1200
# 16     2000         0.7 <int [2,000]>        1400
# 17     2000         0.8 <int [2,000]>        1600
# 18     2000         0.9 <int [2,000]>        1800

I have a function called simulate_samples. Each row of the above data frame defines one sensitivity. Given one sensitivity, simulate_samples generates a data frame of 100 samples with an integer column sample_id and a list column sample (each element an integer vector). That is a one-row-to-many-rows operation, so I used reframe() and got the same behaviour as in the minimal reprex.
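The workflow described above might look roughly like the sketch below. The body of simulate_samples() is not shown in this issue, so the version here is a hypothetical stand-in, and the two-row sensitivities table is a cut-down illustration rather than the real data. It assumes dplyr 1.1.0 or later for reframe(), and it names the scalar parameter columns in rowwise() so that they are carried through.

```r
library(dplyr)

# Hypothetical stand-in for the author's simulate_samples(): draws n_samples
# samples without replacement and returns one row per sample.
simulate_samples <- function(all_tanks, sample_size, n_samples = 100) {
  tibble(
    sample_id = seq_len(n_samples),
    sample = lapply(
      seq_len(n_samples),
      function(i) base::sample(all_tanks, sample_size)
    )
  )
}

# Cut-down illustration of the sensitivities table (two rows instead of 18).
sensitivities <- tibble(
  pop_size = c(1000, 2000),
  prop_of_pop = c(0.1, 0.1)
) %>%
  mutate(
    all_tanks = lapply(pop_size, seq_len),
    sample_size = pop_size * prop_of_pop
  )

# Columns named in rowwise() are kept by reframe(); inside a rowwise data
# frame the list column all_tanks is seen as a plain integer vector.
simulated <- sensitivities %>%
  rowwise(pop_size, prop_of_pop, sample_size) %>%
  reframe(simulate_samples(all_tanks, sample_size))
```

Here simulated has one row per sample for each sensitivity, with the parameter columns repeated alongside sample_id and sample.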

@DavisVaughan

Do you just want this then?

starwars %>%
  rowwise(everything()) %>% 
  reframe(quantile_df(height)) %>%
  ungroup()

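reframe() works more like summarise() than like mutate(): it keeps only the grouping columns plus the newly computed ones, so the columns you name in rowwise() are the ones that are preserved.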
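A small self-contained comparison may make the difference easier to see. The data frame df below is invented for illustration (it is not from the issue), and the sketch assumes dplyr 1.1.0 or later for reframe().

```r
library(dplyr)

quantile_df <- function(x, probs = c(0.25, 0.5, 0.75)) {
  tibble(
    val = quantile(x, probs, na.rm = TRUE),
    quant = probs
  )
}

# Toy data, made up for this example.
df <- tibble(id = 1:2, label = c("a", "b"), height = c(150, 180))

# No columns named in rowwise(): only the columns from quantile_df() survive.
dropped <- df %>%
  rowwise() %>%
  reframe(quantile_df(height))

# Columns named in rowwise() are carried through to the result.
kept <- df %>%
  rowwise(id, label) %>%
  reframe(quantile_df(height))

names(dropped)
names(kept)
```

Both results have three rows per input row; only kept retains id and label next to val and quant.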

@mutahiwachira

That's perfect. I also read the docs and I see that the simulation case is mentioned.
Thanks so much, this is very useful. I look forward to seeing what you do with by-row operations in the future, as they are very useful for simulations and for nesting calculations without loops.

This solves my issue so I will close it.
