-
Notifications
You must be signed in to change notification settings - Fork 824
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor temporal extract date part kernels #5319
Refactor temporal extract date part kernels #5319
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know what you think @tustvold , if this falls in line with what you're imagining.
I tried to make things simpler, without resorting to macros so the code might be duplicated a bit but I tried to reduce as much as I could.
Happy for any feedback
arrow-arith/src/temporal.rs
Outdated
// TODO expand support for Time types, see: https://github.com/apache/arrow-rs/issues/5261 | ||
_ => return_compute_error_with!(format!("{part} does not support"), array.data_type()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can tackle this as separate issue to make this refactoring simpler (maintain existing behaviour)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a nice step forward, I have some ideas on how we might use generic to reduce some of the duplication, but I'd need to try it out to know if it works 😅
arrow-arith/src/temporal.rs
Outdated
} | ||
}); | ||
/// Returns function to extract relevant [`DatePart`] from a [`NaiveDateTime`]. | ||
fn get_naive_date_time_part_extract_fn(part: DatePart) -> fn(NaiveDateTime) -> i32 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious why this doesn't need to be impl fn
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No reason, really. I just stuck with the first signature that worked here 😅
arrow-arith/src/temporal.rs
Outdated
fn date_part_timestamp_ns( | ||
array: &PrimitiveArray<TimestampNanosecondType>, | ||
part: DatePart, | ||
) -> Int32Array { | ||
let map_func = get_naive_date_time_part_extract_fn(part); | ||
array.unary_opt(|d| timestamp_ns_to_datetime(d).map(map_func)) | ||
} | ||
|
||
fn date_part_timestamp_ns_tz( | ||
array: &PrimitiveArray<TimestampNanosecondType>, | ||
part: DatePart, | ||
tz: Tz, | ||
) -> Int32Array { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One way that you might be able to avoid this duplication would be to make the functions take a generic Timezone
argument, and then provide Utc
as the timezone when none is provided
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point, will look into it
I've refactored it to use generics more, let me know if it looks better this way 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very nice, thank you
} | ||
|
||
/// Implement the specialized functions for extracting date part from temporal arrays. | ||
trait ExtractDatePartExt { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, this is exactly what I was thinking of 👍
arrow-arith/src/temporal.rs
Outdated
| DatePart::Microsecond | ||
| DatePart::Nanosecond = part | ||
{ | ||
Ok(self.unary(|_| 0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW it might be cheaper to do Int32Array::new(vec![0; self.len()].into(), self.nulls().clone())
or something
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, applied change
@@ -228,14 +372,14 @@ where | |||
T: ArrowTemporalType + ArrowNumericType, | |||
i64: From<T::Native>, | |||
{ | |||
time_fraction_internal(array, "year", |t| t.year()) | |||
date_part_primitive(array, DatePart::Year) | |||
} | |||
|
|||
/// Extracts the quarter of a given temporal array as an array of integersa within | |||
/// the range of [1, 4]. If the given array isn't temporal primitive or dictionary array, | |||
/// an `Err` will be returned. | |||
pub fn quarter_dyn(array: &dyn Array) -> Result<ArrayRef, ArrowError> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we deprecate these functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably for the best, as if in the future we want to support more parts like century
or decade
in arrow-rs natively, we wouldn't have to add these shim functions for consistency if they are already deprecated.
That does raise a question that comes to mind, that if we add more of these parts, by extending the DatePart
enum, would that constitute a breaking API change? Since we're introducing a new enum variant. Maybe mark it as non-exhaustive, or would this be unnecessary since its mainly used as an input to the date_part()
function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Non exhaustive sounds like a reasonable idea to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add this and the deprecations for the shim functions, then PR should be good to go
Which issue does this PR close?
Closes #5266
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?
No, aside from new function. Existing functions (
hour(...)
,year(...)
) are unchanged, only internals are changed and should maintain existing behaviour.