-
Notifications
You must be signed in to change notification settings - Fork 174
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support unordered row_number() on Snowflake, MSSQL, and Teradata #1331
Support unordered row_number() on Snowflake, MSSQL, and Teradata #1331
Conversation
SELECT `df`.*, ROW_NUMBER() OVER (PARTITION BY `y` ORDER BY `y`) AS `rown` | ||
SELECT | ||
`df`.*, | ||
ROW_NUMBER() OVER (PARTITION BY `y` ORDER BY (SELECT NULL)) AS `rown` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interestingly, Teradata (via win_rank_tdata()
) was previously defaulting to ORDER BY
the result of win_current_group()
(added in 4109051#diff-c4f086980e5692d16e3337acd7bfefe3a9315135688133ee4f24ea2944ccba3bR174 by @overmar ).
By generalizing the behavior to fall back to SELECT NULL
for all back ends (incl. Snowflake / MSSQL) this now no longer defaults to ordering by the group.
I'm likely not familiar enough with (1) PARTITION vs ORDER and (2) Teradata to determine whether this would have adverse impacts, so would welcome input from more knowledgeable folks!
@mgirlich would you be open to reviewing? thank you! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this!
@mgirlich I'm ok with allowing an argument-less row_number()
; it doesn't really make a lot of sense for databases, but it's still sometimes useful.
|
||
#' @export | ||
#' @rdname win_over | ||
win_rank_tdata <- function(f) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this exported in a released dbplyr?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was, as of 2.3.0 (added in 4109051#diff-c4f086980e5692d16e3337acd7bfefe3a9315135688133ee4f24ea2944ccba3bR167). I'd guess it was primarily to enable row_number()
for Teradata backend.
Open to any recommended patterns for deprecation!
R/translate-sql-window.R
Outdated
#' @rdname win_over | ||
#' @export | ||
win_rank <- function(f) { | ||
win_rank <- function(f, use_default_order_null = FALSE) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This argument name feels a bit long to me, and needs to be documented. Maybe empty_order = TRUE
or order_by = c("required", "optional")
, or something along those lines?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to empty_order
(defaulting to FALSE
to not apply to all backends) and added some docs!
#Conflicts: # man/win_over.Rd
Thanks for the PR! 😄 |
row_number()
on Snowflake #1332row_number()
#1316 (Closesrow_number()
portion of MSSQL Incorrect Translation of Boolean and row_number while Filtering Zero Rows #1233 for MSSQL)Of note, from https://stackoverflow.com/questions/44105691/row-number-without-order-by per second comment, I opted to use
SELECT NULL
overSELECT 1
(which also aligns with the implementation for Teradata).