feat: spark udf array shuffle #17674
Conversation
impl SparkShuffle {
    pub fn new() -> Self {
        Self {
            signature: Signature::any(1, Volatility::Volatile),
-             signature: Signature::any(1, Volatility::Volatile),
+             signature: Signature::arrays(1, None, Volatility::Volatile),
Example:
datafusion/datafusion/functions-nested/src/empty.rs
Lines 72 to 79 in 44cd972
impl ArrayEmpty {
    pub fn new() -> Self {
        Self {
            signature: Signature::arrays(1, None, Volatility::Immutable),
            aliases: vec!["array_empty".to_string(), "list_empty".to_string()],
        }
    }
}
(using arrays() instead of array() to avoid the coercion from FixedSizeList to List)
Although, looking at the Spark doc it says it accepts an optional seed argument; do we need to include that here?
I think that if we implement this with a seed argument, we can write deterministic tests for shuffle without running the output through sort or relying on the shuffled permutation differing from the sorted version.
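To illustrate the point about deterministic tests, here is a minimal, dependency-free sketch of a seeded Fisher-Yates shuffle. Everything in it is an assumption for illustration only: `SplitMix64` and `shuffle_with_seed` are hypothetical names, not code from this PR, and the actual UDF would more likely use a seedable generator from the `rand` crate and operate on Arrow list arrays rather than plain slices.

```rust
/// Tiny SplitMix64 generator. Hypothetical stand-in so the sketch needs
/// no external crates; not what the PR itself uses.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Fisher-Yates shuffle driven by a caller-supplied seed (hypothetical helper).
/// The same seed and input always yield the same permutation.
fn shuffle_with_seed<T>(items: &mut [T], seed: u64) {
    let mut rng = SplitMix64(seed);
    // Walk from the last index down to 1, swapping with a random earlier slot.
    for i in (1..items.len()).rev() {
        // Pick j in 0..=i; modulo bias is ignored in this sketch.
        let j = (rng.next_u64() % (i as u64 + 1)) as usize;
        items.swap(i, j);
    }
}
```

With a helper along these lines, a test can shuffle the same input twice with the same seed and compare the results directly, and separately check that the output is still a permutation of the input.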
    }
}

fn general_array_shuffle<O: OffsetSizeTrait + TryFrom<i64>>(
- fn general_array_shuffle<O: OffsetSizeTrait + TryFrom<i64>>(
+ fn general_array_shuffle<O: OffsetSizeTrait>(
951af82 to faccdab
Which issue does this PR close?
Rationale for this change
Support the Spark shuffle UDF.
What changes are included in this PR?
Support the Spark shuffle UDF.
Are these changes tested?
Yes, with unit tests.
Are there any user-facing changes?
No