GH-43894: [R] format_aggregation() should print options too (#43896)
### Rationale for this change

Printing the inner query after `summarize()` showed which aggregation function was being called, but not the options passed to it.
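
A minimal sketch of how to see this, modeled on the test added in this commit. The data frame and its values are made up for illustration, and the exact printed text may differ between arrow versions:

```r
library(arrow)
library(dplyr)

# Hypothetical toy data; the column name `int` mirrors the test in this commit
df <- data.frame(int = c(1L, NA, 3L))

q <- arrow_table(df) |>
  summarize(total = sum(int, na.rm = TRUE))

# The inner query carries the aggregation; its printed form is what this change affects
print(q$.data)
```

With this change the `total` line should include the options, for example `sum(int, {skip_nulls=true, min_count=0})` as in the test's expected output, where the old `paste0()` formatting produced only the function name and its arguments.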

### What changes are included in this PR?

One-line code change plus a test

### Are these changes tested?

Yes. Interestingly, it seems that `format_aggregation()` was not tested before.

### Are there any user-facing changes?

Technically yes, but few users would likely see this.
* GitHub Issue: #43894

Authored-by: Neal Richardson <[email protected]>
Signed-off-by: Nic Crane <[email protected]>
nealrichardson authored Sep 2, 2024
1 parent 7f88ae7 commit a8df190
Showing 2 changed files with 39 additions and 1 deletion.
2 changes: 1 addition & 1 deletion r/R/dplyr-summarize.R
@@ -241,7 +241,7 @@ group_types <- function(.data, schema = NULL) {
}

format_aggregation <- function(x) {
-  paste0(x$fun, "(", paste(map(x$data, ~ .$ToString()), collapse = ","), ")")
+  Expression$create(x$fun, args = x$data, options = x$options)$ToString()
}

# This function evaluates an expression and returns the post-summarize
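
The difference between the two formatting approaches can be sketched outside of `summarize()`. This is an illustration under assumptions, not the package's internal code path: the `agg` list below is a hypothetical stand-in for one aggregation entry (its field names follow the changed line, its values are assumed), and it relies on `Expression` being accessible from the arrow namespace.

```r
library(arrow)

# Hypothetical stand-in for one aggregation entry; field names (fun, data,
# options) follow the changed line, the values are assumed for illustration
agg <- list(
  fun = "sum",
  data = list(Expression$field_ref("int")),
  options = list(skip_nulls = TRUE, min_count = 0L)
)

# Old formatting: function name plus comma-separated arguments, no options
old <- paste0(
  agg$fun, "(",
  paste(vapply(agg$data, function(e) e$ToString(), character(1)), collapse = ","),
  ")"
)

# New formatting: build a call expression so its string form includes the options
new <- Expression$create(agg$fun, args = agg$data, options = agg$options)$ToString()

print(old)
print(new)
```

Delegating to `ToString()` on a constructed expression means the printed form follows whatever Arrow itself reports for the call, rather than a hand-assembled string.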
38 changes: 38 additions & 0 deletions r/tests/testthat/test-dplyr-summarize.R
@@ -955,6 +955,44 @@ test_that("Summarize with 0 arguments", {
)
})

test_that("Printing aggregation expressions", {
  q <- tbl |>
    arrow_table() |>
    summarize(
      total = sum(int, na.rm = TRUE),
      prod = prod(int, na.rm = TRUE),
      any = any(lgl, na.rm = TRUE),
      all = all(lgl, na.rm = TRUE),
      mean = mean(int, na.rm = TRUE),
      sd = sd(int, na.rm = TRUE),
      var = var(int, na.rm = TRUE),
      n_distinct = n_distinct(chr),
      min = min(int, na.rm = TRUE),
      max = max(int, na.rm = TRUE)
    )
  expect_output(
    print(q$.data),
    "Table (query)
int: int32
lgl: bool
chr: string
* Aggregations:
total: sum(int, {skip_nulls=true, min_count=0})
prod: product(int, {skip_nulls=true, min_count=0})
any: any(lgl, {skip_nulls=true, min_count=0})
all: all(lgl, {skip_nulls=true, min_count=0})
mean: mean(int, {skip_nulls=true, min_count=0})
sd: stddev(int, {ddof=1, skip_nulls=true, min_count=0})
var: variance(int, {ddof=1, skip_nulls=true, min_count=0})
n_distinct: count_distinct(chr, {mode=ALL})
min: min(int, {skip_nulls=true, min_count=0})
max: max(int, {skip_nulls=true, min_count=0})
See $.data for the source Arrow object",
    fixed = TRUE
  )
})

test_that("Not supported: window functions", {
compare_dplyr_binding(
.input %>%