Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-36862][table] Implement additional TO_TIMESTAMP_LTZ() functions #25763

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

yiyutian1
Copy link

@yiyutian1 yiyutian1 commented Dec 9, 2024

What is the purpose of the change

The Flink TO_TIMESTAMP_LTZ is lacking some behaviors, compared with other vendors. eg. https://docs.snowflake.com/en/sql-reference/functions/to_timestamp

This PR adds the following additional functions for TO_TIMESTAMP_LTZ.

TO_TIMESTAMP_LTZ(string1) -- Parse timestamp with default format 
TO_TIMESTAMP_LTZ(string1, string2) -- Parse timestamp with custom format 
TO_TIMESTAMP_LTZ(string1, string2, string3) -- Parse timestamp with format and timezone 
TO_TIMESTAMP_LTZ(numeric) -- For epoch milliseconds

Brief change log

  • Add the additional overloading functions for TO_TIMESTAMP_LTZ.
  • Add java test cases.
  • Make sure the current scala tests pass.

Verifying this change

This change is already covered by existing tests, TemporalTypesTest.scala.
Added additional tests in TimeFunctionsITCase.java.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: ( no)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (yes)
  • If yes, how is the feature documented? (documented in sql_functions.yml)

@flinkbot
Copy link
Collaborator

flinkbot commented Dec 9, 2024

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

@yiyutian1 yiyutian1 force-pushed the flink36862 branch 2 times, most recently from bd3ff92 to 77a36af Compare December 11, 2024 05:50
@yiyutian1 yiyutian1 changed the title [FLINK-36862][table] Implement additional TO_TIMESTAMP_LTZ() functions [WIP][FLINK-36862][table] Implement additional TO_TIMESTAMP_LTZ() functions Dec 11, 2024
@yiyutian1 yiyutian1 force-pushed the flink36862 branch 3 times, most recently from 1936203 to 0bf526d Compare December 12, 2024 09:45
@yiyutian1 yiyutian1 marked this pull request as ready for review December 12, 2024 10:05
@yiyutian1 yiyutian1 changed the title [WIP][FLINK-36862][table] Implement additional TO_TIMESTAMP_LTZ() functions [FLINK-36862][table] Implement additional TO_TIMESTAMP_LTZ() functions Dec 12, 2024
return DateTimeUtils.toTimestampData(epoch, precision);
}

public TimestampData eval(Double epoch, Integer precision) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This I didn't get, why do we have anything with floating point here?
Can you please clarify
first what does fractional epoch actually means? Does any vendor support it?
Second floating points have issues with exact data

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Sergey, they are for existing tests like:

Comment on lines 68 to 88
public TimestampData eval(Float value, Integer precision) {
if (value == null || precision == null) {
return null;
}
return DateTimeUtils.toTimestampData(value.longValue(), precision);
}

public TimestampData eval(Byte value, Integer precision) {
if (value == null || precision == null) {
return null;
}
return DateTimeUtils.toTimestampData(value.longValue(), precision);
}

public TimestampData eval(DecimalData epoch, Integer precision) {
if (epoch == null || precision == null) {
return null;
}

return DateTimeUtils.toTimestampData(epoch, precision);
}
Copy link
Contributor

@snuyanzin snuyanzin Dec 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need only numeric (exact numeric)
or did I miss anything?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

Comment on lines 409 to 420
public static TimestampData toTimestampData(long epoch) {
return toTimestampData(epoch, 3);
}

public static TimestampData toTimestampData(double epoch) {
return toTimestampData(epoch, 3);
}

public static TimestampData toTimestampData(DecimalData epoch) {
return toTimestampData(epoch, 3);
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from any user outside of TO_TIMESTAMP_LTZ 3 here is a magic number.
I would suggest to have such calls only in your toTimestampLtzFunction class and don't put them here

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addressed here: 5335325#r1886093369

Comment on lines 98 to 112
public TimestampData eval(Long epoch) {
if (epoch == null) {
return null;
}

return eval(epoch, 3);
}

public TimestampData eval(Float epoch) {
if (epoch == null) {
return null;
}

return eval(epoch, 3);
}
Copy link
Contributor

@snuyanzin snuyanzin Dec 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can not we replace a number of these duplicating methods with something like

    public TimestampData eval(Number epoch) {
        if (epoch == null) {
            return null;
        }

        return eval(epoch, 3);
    }

?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Address here: 5335325#r1886090826

@yiyutian1 yiyutian1 force-pushed the flink36862 branch 2 times, most recently from 408214d to 497f3a0 Compare December 13, 2024 05:04
@yiyutian1
Copy link
Author

@flinkbot run azure

@davidradl
Copy link
Contributor

Reviewed by Chi on 12/12/24 Group to test and/or review outside of the meeting (Nic Townsend from IBM has some insight he will add on this)

@@ -1158,7 +1157,7 @@ class TemporalTypesTest extends ExpressionTestBase {
testAllApis(
toTimestampLtz(100.01.cast(DataTypes.FLOAT()), 0),
"TO_TIMESTAMP_LTZ(CAST(100.01 AS FLOAT), 0)",
"1970-01-01 08:01:40.010")
"1970-01-01 08:01:40.000")
Copy link
Contributor

@snuyanzin snuyanzin Dec 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is it like that?
why are we missing millis now?

Or are we casting to timestamp with precision zero?
If so, do we have a similar test with cast to timestamp with precision 3?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @snuyanzin, the 010 is the original test case. I reverted it back to 010 and the test passes now.
I added a precision with 3 in the new Java test.

@@ -2332,7 +2332,8 @@ ANY, and(logical(LogicalTypeRoot.BOOLEAN), LITERAL)

public static final BuiltInFunctionDefinition TO_TIMESTAMP_LTZ =
BuiltInFunctionDefinition.newBuilder()
.name("TO_TIMESTAMP_LTZ")
.name("toTimestampLtz")
.sqlName("TO_TIMESTAMP_LTZ")
Copy link
Author

@yiyutian1 yiyutian1 Dec 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted the change of function name. Added necessary logic register the function.

TIMESTAMP_LTZ(0).nullable())
null,
TIMESTAMP_LTZ(3).nullable())
.testResult(
Copy link
Author

@yiyutian1 yiyutian1 Dec 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To address this comment, added a FLOAT case with precision 3.


// FLOAT -> TIMESTAMP_LTZ
testAllApis(
toTimestampLtz(100.01.cast(DataTypes.FLOAT()), 0),
"TO_TIMESTAMP_LTZ(CAST(100.01 AS FLOAT), 0)",
"1970-01-01 08:01:40")
"1970-01-01 08:01:40.010")
Copy link
Author

@yiyutian1 yiyutian1 Dec 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert to the original test case.

"Cannot apply 'TO_TIMESTAMP_LTZ' to arguments of type" +
" 'TO_TIMESTAMP_LTZ(<CHAR(16)>, <INTEGER>)'. Supported form(s):" +
" 'TO_TIMESTAMP_LTZ(<NUMERIC>, <INTEGER>)'",
"SQL validation failed. From line 1, column 8 to line 1, column 46: Cannot apply " +
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to change these existing scala tests, because I changed the function signature. The change reflects the expected function behaviors.

@@ -92,74 +121,72 @@ public TimestampData eval(Integer epoch) {
return null;
}

return eval(epoch, 3);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replying to this comment:
This is not possible, because we handle exact numeric and fractional data differently. Therefore, we can’t use the umbrella Number for them.
I keep them as is to be consistent with existing logic.
@snuyanzin , if you have a better way, please shed some lights here.

Can not we replace a number of these duplicating methods with something like

public TimestampData eval(Number epoch) {
    if (epoch == null) {
        return null;
    }

    return eval(epoch, 3);
}

@@ -385,6 +388,18 @@ public static TimestampData toTimestampData(int v, int precision) {
}
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To address this comment, I replaced 3 with a constant.
After giving it some thought, I believe it's better to keep these functions in DateTimeUtils.java rather than moving them to toTimestampLtzFunction, since toTimestampLtzFunction already depends on DateTimeUtils.java as a library. What are your thoughts on this? @snuyanzin

from any user outside of TO_TIMESTAMP_LTZ 3 here is a magic number.
I would suggest to have such calls only in your toTimestampLtzFunction class and don't put them here

@yiyutian1
Copy link
Author

Reviewed by Chi on 12/12/24 Group to test and/or review outside of the meeting (Nic Townsend from IBM has some insight he will add on this)

Hi @davidradl, I appreciate your involvement with this PR. As a new member to this community, this means a lot to me!
However, this function is currently a blocker for a user at my company, Confluent.

I've also just submitted another commit to address the previous comments, added tests, and made other updates, which may have rendered some of your current feedback outdated.

Given the time constraints, can I proceed with the Flink committer's approval, and address your comments in a separate PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants