fix: support sub day increments for date64 #6199
Branch updated from 29aaec2 to d8c4dc1.
I'm marking this ready for review, but I'm hoping a maintainer can take a quick look to see if this is reasonable. If so, I'll spend some time updating the tests, as after these changes they don't exercise the seconds component very well.
```diff
@@ -1035,6 +1035,16 @@ impl Date64Type {
epoch.add(Duration::try_milliseconds(i).unwrap())
}

/// Converts an arrow Date64Type into a chrono::NaiveDateTime
```
This file has the important changes in the PR.
Thanks @tshauck
I did some research on this.

Even though Arrow stores Date64 as "A 64-bit date type representing the elapsed time since UNIX epoch in milliseconds(64 bits)."
https://docs.rs/arrow/latest/arrow/array/types/struct.Date64Type.html
the Arrow spec says "where the values are evenly divisible by 86400000".

Thus I am not sure that allowing sub-day precision is a good idea, as it would permit (and maybe encourage) constructing Date64 values that do not actually follow the Arrow spec 🤔

Proposal

Maybe we can add some comments to Date64Type explaining this nuance and pointing people to other alternatives (like Timestamp).
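To make the spec constraint concrete: a Date64 value conforms only if it is a whole multiple of 86,400,000 ms (one day). The sketch below is plain std Rust, and is_spec_date64 and floor_to_day are hypothetical helpers for illustration, not arrow-rs APIs.

```rust
/// Milliseconds in one day; the Arrow spec requires Date64 values to be multiples of this.
const MS_PER_DAY: i64 = 86_400_000;

/// Hypothetical helper: true if `ms` is a spec-conforming Date64 value.
/// `%` cannot overflow here because the divisor is a positive constant.
fn is_spec_date64(ms: i64) -> bool {
    ms % MS_PER_DAY == 0
}

/// Hypothetical helper: round `ms` down to midnight UTC of the same day.
/// `div_euclid` floors toward negative infinity, so pre-1970 instants map to
/// the start of their own day. Returns None if the floored value would not
/// fit in an i64 (this happens for i64::MIN).
fn floor_to_day(ms: i64) -> Option<i64> {
    ms.div_euclid(MS_PER_DAY).checked_mul(MS_PER_DAY)
}
```

For example, floor_to_day(-1) gives Some(-86_400_000) (the start of 1969-12-31), whereas truncating division would have produced 0 (1970-01-01).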
Perhaps we
```rust
IntervalDayTimeType::make_value(34, 2),
IntervalDayTimeType::make_value(3, -3),
IntervalDayTimeType::make_value(-12, 4),
IntervalDayTimeType::make_value(34, 0),
```
Why are these changes needed? Would it be possible to update the expected output too? Or is the issue that the display looks nasty with millisecond precision?
I see the same values are used in the tests below as well.
```rust
{
let a = PrimitiveArray::<T>::new(
#[test]
fn test_date64_impl() {
```
Maybe this should just be called test_date64 now, as it only tests Date64.
```diff
@@ -1520,4 +1499,94 @@ mod tests {
"Arithmetic overflow: Overflow happened on: 9223372036854775807 - -1"
);
}

#[test]
fn test_date32_impl() {
```
Likewise, it might make sense to call this test_date32.
```rust
/// * `i` - The Date64Type to convert
pub fn to_naive_datetime(i: <Date64Type as ArrowPrimitiveType>::Native) -> NaiveDateTime {
let datetime = NaiveDateTime::default();
datetime.add(Duration::try_milliseconds(i).unwrap())
```
Can we also add some documentation about when this will panic? Or comments explaining why it won't panic 🤔
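On the panic question: as far as I can tell from the chrono docs, the two fallible steps in to_naive_datetime are the unwrap on Duration::try_milliseconds (which returns None for out-of-range inputs, notably i64::MIN) and the NaiveDateTime + Duration addition, which panics if the result falls outside NaiveDateTime's supported range (checked_add_signed is the non-panicking variant). For comparison, here is a std-only sketch of the same decomposition that cannot panic for any i64: it splits the value into whole days and milliseconds-of-day with Euclidean division, then maps days to a proleptic Gregorian (year, month, day) using Howard Hinnant's civil_from_days algorithm. The function names are hypothetical, and unlike chrono this sketch does not restrict the year range.

```rust
/// Milliseconds in one day.
const MS_PER_DAY: i64 = 86_400_000;

/// Hypothetical helper: split a Date64 value into
/// (days since 1970-01-01, milliseconds into that day).
/// Euclidean division by a positive constant cannot overflow, so this is total over i64.
fn split_date64(ms: i64) -> (i64, i64) {
    (ms.div_euclid(MS_PER_DAY), ms.rem_euclid(MS_PER_DAY))
}

/// Days since 1970-01-01 -> (year, month, day), proleptic Gregorian calendar
/// (Howard Hinnant's civil_from_days).
fn civil_from_days(days: i64) -> (i64, u32, u32) {
    let z = days + 719_468; // shift the epoch to 0000-03-01
    let era = z.div_euclid(146_097); // 400-year eras
    let doe = z.rem_euclid(146_097); // day of era, [0, 146096]
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // year of era, [0, 399]
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // March-based day of year, [0, 365]
    let mp = (5 * doy + 2) / 153; // March-based month, [0, 11]
    let day = (doy - (153 * mp + 2) / 5 + 1) as u32; // [1, 31]
    let month = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32; // [1, 12]
    // January and February belong to the next civil year in the March-based count.
    let year = yoe + era * 400 + i64::from(month <= 2);
    (year, month, day)
}
```

For example, split_date64(-1) gives (-1, 86_399_999), and civil_from_days(-1) gives (1969, 12, 31), i.e. one millisecond before midnight on 1969-12-31.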
Thanks for your comments @alamb (and patience 😅)... I think I got anchored to the series generation code in DataFusion, which uses Date32 to generate date series. I think the bug (apache/datafusion#11823) is related to the handling of sub-day increments: adding something smaller than a day to a date doesn't add any days, because the precision is lost, so the loop never exits. I think the proper path for improving this handling looks something like:
Postgres for reference:
Sorry for the delay in responding. Things have been very busy for me the last few days.
Yes, I think this sounds like a good idea.
This (using timestamps in gen series) also seems reasonable to me. I suggest we treat it as a separate ticket, though, as the infinite loop seems like a bug and this part seems more like a feature/improvement.
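The infinite loop from apache/datafusion#11823 can be modeled in a few lines. This is a simplified sketch, not DataFusion's generate_series code: it assumes, as described in the discussion, that adding a step to a Date32 value keeps only whole days, so a one-hour step adds zero days and the loop variable never moves. add_step_date32 and date32_series are hypothetical names, and the guard that rejects a non-advancing step is just one possible fix for the bug, separate from the timestamp feature.

```rust
/// Milliseconds in one day.
const MS_PER_DAY: i64 = 86_400_000;

/// Hypothetical model of Date32 + step: the step is reduced to whole days
/// (truncating toward zero), so any step shorter than a day adds nothing.
/// Returns None on i32 overflow.
fn add_step_date32(day: i32, step_ms: i64) -> Option<i32> {
    let whole_days = i32::try_from(step_ms / MS_PER_DAY).ok()?;
    day.checked_add(whole_days)
}

/// Hypothetical series generator over Date32 values (days since epoch), inclusive
/// of `end`. Only positive steps are modeled. Instead of looping forever, it
/// returns an error when a step fails to advance the current value.
fn date32_series(start: i32, end: i32, step_ms: i64) -> Result<Vec<i32>, String> {
    if step_ms <= 0 {
        return Err("only positive steps are modeled here".to_string());
    }
    let mut out = Vec::new();
    let mut cur = start;
    while cur <= end {
        out.push(cur);
        match add_step_date32(cur, step_ms) {
            Some(next) if next > cur => cur = next,
            // This is the case that loops forever without a guard.
            Some(_) => return Err(format!("a {step_ms} ms step does not advance a Date32 value")),
            // Overflowed i32: no later value can be <= end.
            None => break,
        }
    }
    Ok(out)
}
```

With a one-day step, date32_series(0, 2, MS_PER_DAY) yields Ok(vec![0, 1, 2]); with a one-hour step (3_600_000 ms) it returns an error instead of spinning.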
Sounds reasonable too. Thanks again for the help.
See also #5288 and the corresponding discussion on the mailing list: https://lists.apache.org/thread/q036r1q3cw5ysn3zkpvljx3s9ho18419
IMO, systems shouldn't ever use Date64 other than for compatibility with systems that require it.
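Following that advice, a system that receives Date64 data can convert it to Date32 (days since the epoch, stored as i32) at the boundary. A minimal sketch in plain Rust; date64_to_date32 is a hypothetical helper, not an arrow-rs API. It rejects values with a sub-day component and day counts that do not fit in i32.

```rust
/// Milliseconds in one day.
const MS_PER_DAY: i64 = 86_400_000;

/// Hypothetical helper: convert a Date64 value (ms since the UNIX epoch) to a
/// Date32 value (days since the UNIX epoch).
/// Returns None if the value has a sub-day component (i.e. it is not a
/// spec-conforming Date64) or if the day count does not fit in an i32.
fn date64_to_date32(ms: i64) -> Option<i32> {
    if ms % MS_PER_DAY != 0 {
        return None;
    }
    i32::try_from(ms / MS_PER_DAY).ok()
}
```

Rejecting rather than flooring is a deliberate choice in this sketch: a sub-day component suggests the data is really a timestamp. Values with a time of day fit naturally in a Timestamp(Millisecond) column, whose i64 values are also milliseconds since the UNIX epoch, so no arithmetic is needed for that conversion.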
Which issue does this PR close?
Closes #6198
Rationale for this change
It should be possible to add sub-day seconds to a Date64, but currently they are ignored.
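To illustrate what "ignored" means, here is a simplified model in plain Rust. If both the Date64 value and the offset are reduced to whole days before adding (as I read the chrono docs, adding a Duration to a NaiveDate discards the fractional days, truncating toward zero), a one-hour offset disappears, whereas millisecond arithmetic keeps it. Both functions are hypothetical and are not the arrow-rs implementation.

```rust
/// Milliseconds in one day.
const MS_PER_DAY: i64 = 86_400_000;

/// Hypothetical model of day-precision arithmetic: the value and the offset are
/// each truncated toward zero to whole days before adding, so sub-day parts are lost.
/// Returns None on overflow.
fn add_ms_day_precision(date64: i64, offset_ms: i64) -> Option<i64> {
    let days = (date64 / MS_PER_DAY).checked_add(offset_ms / MS_PER_DAY)?;
    days.checked_mul(MS_PER_DAY)
}

/// Hypothetical millisecond-precision arithmetic, keeping the sub-day offset.
/// Returns None on overflow.
fn add_ms_exact(date64: i64, offset_ms: i64) -> Option<i64> {
    date64.checked_add(offset_ms)
}
```

For a one-hour offset (3_600_000 ms) applied to the epoch, add_ms_day_precision returns Some(0) while add_ms_exact returns Some(3_600_000).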
What changes are included in this PR?
Additional functionality to convert a Date64Type to and from a chrono::NaiveDateTime, plus a couple of unit tests.
Are there any user-facing changes?
No