Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Add support for MonthsBetween #11709

Open
kuhushukla opened this issue Nov 7, 2024 · 1 comment
Open

[FEA] Add support for MonthsBetween #11709

kuhushukla opened this issue Nov 7, 2024 · 1 comment
Labels
? - Needs Triage Need team to review and classify feature request New feature or request

Comments

@kuhushukla
Copy link
Collaborator

Is your feature request related to a problem? Please describe.

cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.MonthsBetween

Describe the solution you'd like
Add a GpuMonthsBetween Implementation for this expression

Describe alternatives you've considered
None
Additional context
Seen in customer queries

@kuhushukla kuhushukla added ? - Needs Triage Need team to review and classify feature request New feature or request labels Nov 7, 2024
@revans2
Copy link
Collaborator

revans2 commented Nov 7, 2024

Wow this is not simple.

months_between(timestamp1, timestamp2[, roundOff]) - If timestamp1 is later than timestamp2, then the result is positive. If timestamp1 and timestamp2 are on the same day of month, or both are the last day of month, time of day will be ignored. Otherwise, the difference is calculated based on 31 days per month, and rounded to 8 digits unless roundOff=false.

@kuhushukla do you know what time zones we need to support (at least beyond UTC)?

The spark code is equally confusing. They convert the timestamps to a LocalDateTime for the desired time zone, and pull out the date from it. Then they convert that to the number of days since the epoch (covert it to a date in the local time zone). After that they convert it back to a LocalDate and calculate the year, month, day, and number of days to the end of the month.

If the days of the month are the same (like both are the 15th) or both of them are the last day of the month, then the number of months between them is calculated exactly as a double value (year1 * 12 + month1) - (year2 * 12 - month2).

otherwise they calculate the number of seconds different between the two timestamps (in a really complicated way where each day has 24 hours worth of seconds and each month has 31 days) It is then rounded to 8 digits.

This is far from simple to do.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
? - Needs Triage Need team to review and classify feature request New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants