Extend string parsing support for Date32 #5282

Merged (1 commit) on Jan 11, 2024

gruuya (Contributor) commented Jan 5, 2024

Date32 string parsing now accepts the timestamp format in addition to the plain date format.

Which issue does this PR close?

Closes #5280.

Rationale for this change

PostgreSQL supports casting a valid timestamp string to a date, discarding the time part.

What changes are included in this PR?

Fall back to parsing the string as a datetime when it is too long to be a plain date.
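The fallback can be sketched in std-only Rust as follows. This is a hypothetical illustration, not the arrow-rs implementation: the names `parse_date_with_fallback`, `parse_plain_date`, `is_valid_hms`, `digits` and `days_in_month` are made up, dates are returned as `(year, month, day)` tuples, and the time part is assumed to be a zero-padded `HH:MM:SS` after a space or `T` separator (a real timestamp parser accepts more, such as fractional seconds and time zones).

```rust
/// Hypothetical sketch: parse "YYYY-MM-DD", falling back to a datetime
/// ("YYYY-MM-DD HH:MM:SS" or "YYYY-MM-DDTHH:MM:SS") when the input is longer
/// than a plain date. The time part is validated, then discarded.
fn parse_date_with_fallback(s: &str) -> Option<(i32, u32, u32)> {
    if s.len() > 10 {
        // `get` returns None (instead of panicking) if byte 10 is not a char boundary.
        let (date, rest) = (s.get(..10)?, s.get(10..)?);
        // Assumed separators: a single space or 'T'.
        let time = rest.strip_prefix(' ').or_else(|| rest.strip_prefix('T'))?;
        if !is_valid_hms(time) {
            return None;
        }
        return parse_plain_date(date);
    }
    parse_plain_date(s)
}

/// Strict, zero-padded "YYYY-MM-DD" with a calendar-validity check.
fn parse_plain_date(s: &str) -> Option<(i32, u32, u32)> {
    let b = s.as_bytes();
    if b.len() != 10 || b[4] != b'-' || b[7] != b'-' {
        return None;
    }
    let year = digits(&b[0..4])? as i32;
    let month = digits(&b[5..7])?;
    let day = digits(&b[8..10])?;
    let valid = (1..=12).contains(&month) && day >= 1 && day <= days_in_month(year, month);
    valid.then_some((year, month, day))
}

/// Strict, zero-padded "HH:MM:SS" (no fractional seconds or time zone).
fn is_valid_hms(t: &str) -> bool {
    let b = t.as_bytes();
    if b.len() != 8 || b[2] != b':' || b[5] != b':' {
        return false;
    }
    match (digits(&b[0..2]), digits(&b[3..5]), digits(&b[6..8])) {
        (Some(h), Some(m), Some(sec)) => h < 24 && m < 60 && sec < 60,
        _ => false,
    }
}

/// Parses a non-empty run of ASCII digits; callers pass at most 4 bytes, so no overflow.
fn digits(b: &[u8]) -> Option<u32> {
    if b.is_empty() {
        return None;
    }
    b.iter()
        .try_fold(0u32, |acc, &c| c.is_ascii_digit().then(|| acc * 10 + u32::from(c - b'0')))
}

fn days_in_month(year: i32, month: u32) -> u32 {
    match month {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => 31,
        4 | 6 | 9 | 11 => 30,
        2 if (year % 4 == 0 && year % 100 != 0) || year % 400 == 0 => 29,
        2 => 28,
        _ => 0,
    }
}
```

With this design, `"2000-01-01"` and `"2000-01-01T12:00:00"` both yield `(2000, 1, 1)`, while an invalid time part such as `"25:00:00"` rejects the whole string instead of being silently dropped.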

Are there any user-facing changes?

None, other than support for an additional input format when casting strings to Date32.

github-actions bot added the arrow label (Changes to the arrow crate) on Jan 5, 2024
gruuya force-pushed the parse-timestamp-as-date branch from fdfddd1 to 81ea97a on January 5, 2024 09:34
Comment on lines +1515 to +1516
"2020-9-8 01:02:03",
"2020-09-08 1:2:3",
Contributor Author

It may make sense to support these two as well, though that isn't doable with the current TimestampParser.

Contributor Author

Something like the following could make the supported date(time) formats more flexible while still performing some input validation:

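fn parse_date(string: &str) -> Option<NaiveDate> {
    // Longer than "YYYY-MM-DD": accept only "<date> <time>" with a valid time part
    if string.len() > 10 {
        let mut parts = string.splitn(2, ' ');
        return match (parts.next(), parts.next()) {
            (Some(date), Some(time)) if string_to_time(time).is_some() => parse_date(date),
            _ => None,
        };
    }

    // ... the existing plain-date parsing continues here ...
}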
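A std-only sketch of the same split-based idea, assuming `(year, month, day)` tuples in place of chrono's `NaiveDate` and a hypothetical `is_valid_time` in place of arrow's internal `string_to_time`, shows how both formats quoted above could be accepted. All names here are illustrative, and the accepted formats (4-digit year, 1- or 2-digit month, day and time components, space separator only) are assumptions.

```rust
/// Hypothetical, std-only sketch of the split-based proposal: an input longer
/// than 10 bytes must be "<date> <time>" with a valid time part, after which
/// only the date part is parsed. Components need not be zero-padded.
fn parse_date_lenient(s: &str) -> Option<(i32, u32, u32)> {
    if s.len() > 10 {
        // Split at the first space; the date part is strictly shorter, so recursion ends.
        let (date, time) = s.split_once(' ')?;
        if !is_valid_time(time) {
            return None;
        }
        return parse_date_lenient(date);
    }
    // Plain date: 4-digit year, 1- or 2-digit month and day.
    let mut it = s.split('-');
    let (y, m, d) = (it.next()?, it.next()?, it.next()?);
    if it.next().is_some() || y.len() != 4 || m.len() > 2 || d.len() > 2 {
        return None;
    }
    let (year, month, day) = (num(y)? as i32, num(m)?, num(d)?);
    let valid = (1..=12).contains(&month) && day >= 1 && day <= days_in_month(year, month);
    valid.then_some((year, month, day))
}

/// "H:M:S" with 1- or 2-digit components (no fractional seconds or time zone).
fn is_valid_time(t: &str) -> bool {
    let parts: Vec<&str> = t.split(':').collect();
    if parts.len() != 3 || parts.iter().any(|p| p.len() > 2) {
        return false;
    }
    match (num(parts[0]), num(parts[1]), num(parts[2])) {
        (Some(h), Some(m), Some(sec)) => h < 24 && m < 60 && sec < 60,
        _ => false,
    }
}

/// Parses a non-empty run of ASCII digits (callers limit it to 4 digits).
fn num(s: &str) -> Option<u32> {
    if s.is_empty() {
        return None;
    }
    s.bytes()
        .try_fold(0u32, |acc, c| c.is_ascii_digit().then(|| acc * 10 + u32::from(c - b'0')))
}

fn days_in_month(year: i32, month: u32) -> u32 {
    match month {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => 31,
        4 | 6 | 9 | 11 => 30,
        2 if (year % 4 == 0 && year % 100 != 0) || year % 400 == 0 => 29,
        2 => 28,
        _ => 0,
    }
}
```

Here `"2020-9-8 01:02:03"` and `"2020-09-08 1:2:3"` both parse to `(2020, 9, 8)`. Plain dates skip the split entirely because of the length check; whether the extra split and validation is cheap enough for longer inputs would need benchmarking against the existing parser.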

Contributor

As formulated, this would likely be a major performance regression, but as long as we don't regress performance I have no major objections.

Contributor Author

Yeah, let's go with the current approach, since it handles the majority of use cases anyway.

tustvold (Contributor) left a comment

Just a minor nit

@@ -7500,17 +7500,19 @@ mod tests {
assert!(c.is_valid(0)); // "2000-01-01"
assert_eq!(date_value, c.value(0));

assert!(c.is_valid(1)); // "2000-01-01"
Contributor

This comment is incorrect

Contributor Author

Actually, since we trim away the time part, I think it's correct (I see 10957 as the value for both)?
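The value 10957 is the number of days from 1970-01-01 to 2000-01-01, which is what a Date32 stores. The sketch below checks this with Howard Hinnant's well-known `days_from_civil` algorithm, and shows that flooring a timestamp at noon on that day to whole days yields the same value. The function names are illustrative, and `i64` is used for simplicity (Date32 itself is a 32-bit day count).

```rust
/// Days since 1970-01-01 for a proleptic Gregorian date
/// (after Howard Hinnant's `days_from_civil`).
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    let y = if m <= 2 { y - 1 } else { y }; // computational years start in March
    let era = y.div_euclid(400); // 400-year Gregorian cycle
    let yoe = y - era * 400; // year of era, [0, 399]
    let mp = (m + 9) % 12; // March = 0, ..., February = 11
    let doy = (153 * mp + 2) / 5 + d - 1; // day of year, [0, 365]
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era, [0, 146096]
    era * 146_097 + doe - 719_468 // shift epoch from 0000-03-01 to 1970-01-01
}

/// Drops the time of day from a timestamp in seconds since the epoch
/// (div_euclid floors, so instants before 1970 map to the correct day).
fn seconds_to_days(secs: i64) -> i64 {
    secs.div_euclid(86_400)
}
```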

Contributor

The value being parsed is still 2000-01-01T12:00:00, which is what this comment should read, I think.

Contributor Author

Oh I see; fixed now, thanks!

gruuya force-pushed the parse-timestamp-as-date branch from 81ea97a to 35c761b on January 11, 2024 10:14
tustvold merged commit 72d8a78 into apache:master on Jan 11, 2024
25 checks passed
mildbyte pushed a commit to splitgraph/arrow-rs that referenced this pull request Jan 16, 2024
gruuya deleted the parse-timestamp-as-date branch on February 5, 2024 08:46
Labels: arrow (Changes to the arrow crate)
Linked issue: Support casting strings like '2001-01-01 01:01:01' to Date32
2 participants