power operation on aggregate result uses dplyr fallback #265

Open
Tmonster opened this issue Sep 25, 2024 · 2 comments

@Tmonster
Contributor

Tmonster commented Sep 25, 2024

Some expressions that DuckDB can evaluate natively still cause duckplyr to fall back to dplyr. Here, applying the power operator `^` to an aggregate result triggers the fallback.

I discovered this when benchmarking duckplyr with the db-benchmark. This example comes from group by query q9.

Repro:

.libPaths("./duckplyr/r-duckplyr") # tidyverse/duckplyr#4641
suppressPackageStartupMessages(library("duckplyr", lib.loc="./duckplyr/r-duckplyr", warn.conflicts=FALSE))
ver = packageVersion("duckplyr")

src_grp = "test.csv"

x = as_duckplyr_tibble(data.table::fread(src_grp, showProgress=FALSE, na.strings="", data.table=FALSE))
print(nrow(x))

t = system.time(print(dim(ans<-x %>% summarise(.by = c(id2, id4), r2=cor(v1, v2, use="na.or.complete")^2))))[["elapsed"]]

The duckplyr package is configured to fall back to dplyr when it encounters an incompatibility. Fallback events can be collected and uploaded for analysis to guide future development.
By default, no data will be collected or uploaded.
ℹ A fallback situation just occurred. The following information would have been recorded:
  {"version":"0.4.1","message":"No translation for function
  `^`.","name":"summarise","x":{"...1":"character","...2":"character","...3":"character","...4":"integer","...5":"integer","...6":"integer","...7":"integer","...8":"integer","...9":"numeric"},"args":{"dots":{"...10":"cor(...7,
  ...8, use = \"<character>\")^2"},"by":["...2","...4"]}}
→ Run `duckplyr::fallback_sitrep()` to review the current settings.
→ Run `Sys.setenv(DUCKPLYR_FALLBACK_COLLECT = 1)` to enable fallback logging, and `Sys.setenv(DUCKPLYR_FALLBACK_VERBOSE = TRUE)` in addition to enable printing of fallback situations
  to the console.
→ Run `duckplyr::fallback_review()` to review the available reports, and `duckplyr::fallback_upload()` to upload them.
ℹ See `?duckplyr::fallback()` for details.

test.csv

id1,id2,id3,id4,id5,id6,v1,v2,v3
id010,id007,id0000329755,9,1,707298,1,7,45.741516
id007,id004,id0000136233,5,5,635644,3,1,7.932007
id006,id001,id0000306329,6,4,910916,1,8,92.181312
id007,id009,id0000194009,1,4,378004,3,7,35.369551
id010,id004,id0000067310,5,3,77126,5,5,27.005417
id006,id004,id0000733374,2,6,1416,3,13,3.830562
id007,id010,id0000723276,3,4,567333,5,11,18.993338
id007,id003,id0000191079,5,3,652736,4,1,35.720091
id009,id010,id0000364850,1,5,771296,5,8,90.567817
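
For convenience, here is a sketch of the same repro that does not need a separate `test.csv` or data.table. It assumes the duckplyr 0.4.1 API used above (`as_duckplyr_tibble()`), and the variable name `csv_text` is only illustrative. The data is read with base R's `read.csv()` instead of `data.table::fread()`, which is an assumption that the column types come out the same.

```r
library(duckplyr)

# The rows of test.csv from this issue, inlined as text.
csv_text <- "id1,id2,id3,id4,id5,id6,v1,v2,v3
id010,id007,id0000329755,9,1,707298,1,7,45.741516
id007,id004,id0000136233,5,5,635644,3,1,7.932007
id006,id001,id0000306329,6,4,910916,1,8,92.181312
id007,id009,id0000194009,1,4,378004,3,7,35.369551
id010,id004,id0000067310,5,3,77126,5,5,27.005417
id006,id004,id0000733374,2,6,1416,3,13,3.830562
id007,id010,id0000723276,3,4,567333,5,11,18.993338
id007,id003,id0000191079,5,3,652736,4,1,35.720091
id009,id010,id0000364850,1,5,771296,5,8,90.567817"

# Parse the inline CSV with base R and convert it to a duckplyr tibble.
x <- as_duckplyr_tibble(read.csv(text = csv_text, na.strings = ""))

# Same query as in the repro: squared correlation per (id2, id4) group.
ans <- x %>%
  summarise(.by = c(id2, id4), r2 = cor(v1, v2, use = "na.or.complete")^2)
print(dim(ans))
```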

============================= OLD ISSUE BEFORE EDIT (CAN IGNORE) ======================
repro

library(duckplyr)
library(DBI)
x = as_duckplyr_tibble(iris)
x %>% arrange(sum(Sepal.Length)^2)

The duckplyr package is configured to fall back to dplyr when it encounters an incompatibility. Fallback events can be collected and uploaded for analysis to guide future
development. By default, no data will be collected or uploaded.
ℹ A fallback situation just occurred. The following information would have been recorded:
  {"version":"0.4.1","message":"Can't convert columns of class <factor> to relational. Affected column:
  `...5`.","name":"arrange","x":{"...1":"numeric","...2":"numeric","...3":"numeric","...4":"numeric","...5":"factor"},"args":{"dots":["sum(...1)^2"],".by_group":false}}
→ Run `duckplyr::fallback_sitrep()` to review the current settings.
→ Run `Sys.setenv(DUCKPLYR_FALLBACK_COLLECT = 1)` to enable fallback logging, and `Sys.setenv(DUCKPLYR_FALLBACK_VERBOSE = TRUE)` in addition to enable printing of fallback
  situations to the console.
→ Run `duckplyr::fallback_review()` to review the available reports, and `duckplyr::fallback_upload()` to upload them.
ℹ See `?duckplyr::fallback()` for details.

This occurs on group by query 9 of the db-benchmark.

@Tmonster
Contributor Author

Feel free to close if this is not an issue.

@krlmlr
Member

krlmlr commented Sep 25, 2024

Oh no, it is relevant -- I think we "forgot" the power operator at some point. Are you aware of any semantic differences? I'll see if it passes revdepchecks so that we can enable the translation.

The obvious edge case, `0^0`, gives the same result in both systems:

library(DBI)
conn <- dbConnect(duckdb::duckdb())
0^0
#> [1] 1
dbGetQuery(conn, "SELECT 0^0")
#>   (0 ^ 0)
#> 1       1
dbGetQuery(conn, "SELECT 0.0^0")
#>   (0.0 ^ 0)
#> 1         1
dbGetQuery(conn, "SELECT 0^0.0")
#>   (0 ^ 0.0)
#> 1         1
dbGetQuery(conn, "SELECT 0.0^0.0")
#>   (0.0 ^ 0.0)
#> 1           1

Created on 2024-09-25 with reprex v2.1.0
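
To look for semantic differences beyond `0^0`, a rough sketch along these lines could compare R and DuckDB on a few more edge cases. The table name `cases`, its columns, and the choice of inputs are illustrative assumptions, not an exhaustive test. R documents that `1 ^ y` and `y ^ 0` are always 1, even for `NA`, so the missing-value rows are the ones most worth checking on the DuckDB side. If DuckDB raises an error for any pair, the query fails as a whole and the pairs would need to be tried one at a time.

```r
library(DBI)

conn <- dbConnect(duckdb::duckdb())

# Illustrative edge cases (an assumption, not an exhaustive list):
# zero, negative exponent, negative base, fractional power of a
# negative base, and a missing value on either side.
cases <- data.frame(
  id = 1:6,
  x  = c(0, 2, -2, -8, NA, 1),
  y  = c(0, -1, 2, 0.5, 0, NA)
)
dbWriteTable(conn, "cases", cases)

# Evaluate the power operator in R.
cases$r_result <- cases$x^cases$y

# Evaluate the same pairs in DuckDB; NA is stored as NULL in the table.
duck <- dbGetQuery(
  conn,
  "SELECT id, x ^ y AS duckdb_result FROM cases ORDER BY id"
)
cases$duckdb_result <- duck$duckdb_result

# Strict elementwise comparison: identical() distinguishes NA from NaN.
cases$same <- mapply(identical, cases$r_result, cases$duckdb_result)

print(cases)
dbDisconnect(conn, shutdown = TRUE)
```

Any row where `same` is `FALSE` would be a candidate semantic difference to handle before enabling the translation.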
