-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
implemented sqlite__datediff macro using epoch deltas #56
Open
tom-juntunen
wants to merge
1
commit into
main
Choose a base branch
from
I26-implement-datediff-macro
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,42 +1,89 @@ | ||
|
||
{# TODO: fully implement this and rename #} | ||
{# adapted from postgresql #} | ||
{% macro sqlite__datediff_broken(first_date, second_date, datepart) -%} | ||
|
||
{# | ||
-- The sqlite__datediff macro uses epoch time deltas to calculate differences between dates, offering precision down to the millisecond level. | ||
-- Despite SQLite's limitations in handling sub-millisecond accuracy, the macro reliably handles differences across various date parts | ||
-- (year, month, week, day, hour, minute, second, millisecond). | ||
-- To ensure the macro's effectiveness even with the smallest discernible time differences (e.g., one millisecond), empirical adjustments have been implemented. | ||
-- These adjustments are particularly crucial for broader date parts like 'week'. | ||
-- The decision to use CEIL or FLOOR for rounding was refined through testing to accurately reflect differences when dates are separated by as little as one millisecond. | ||
-- This empirical approach ensures that the macro not only meets expected functional accuracy but also addresses real-world use cases effectively. | ||
-- For example, calculating the week difference between two timestamps only a millisecond apart requires careful consideration of how to round fractional weeks. | ||
-- The choice to use CEIL in certain contexts was driven by the need to acknowledge even the minimal time difference as a full unit where contextually appropriate. | ||
-- TODO: More unit testing should be done with a comprehensive calendar table "solved" in another RDBMS that supports datediff natively for comparing against the results of this macro. | ||
#} | ||
{% macro sqlite__datediff(first_date, second_date, datepart) -%} | ||
{% set datepart = datepart.lower() %} | ||
{% if datepart == 'year' %} | ||
(strftime('%Y', {{second_date}}) - strftime('%Y', {{first_date}})) | ||
{# | ||
{% elif datepart == 'quarter' %} | ||
({{ datediff(first_date, second_date, 'year') }} * 4 + date_part('quarter', ({{second_date}})::date) - date_part('quarter', ({{first_date}})::date)) | ||
#} | ||
(strftime('%Y', {{ second_date }}) - strftime('%Y', {{ first_date }})) | ||
{% elif datepart == 'month' %} | ||
(({{ datediff(first_date, second_date, 'year') }} * 12 + strftime('%m', {{second_date}})) - strftime('%m', {{first_date}})) | ||
((strftime('%Y', {{ second_date }}) - strftime('%Y', {{ first_date }})) * 12) + | ||
(strftime('%m', {{ second_date }}) - strftime('%m', {{ first_date }})) | ||
{% elif datepart == 'day' %} | ||
(floor(cast(strftime('%s', {{second_date}}) - strftime('%s', {{first_date}}) as real) / 86400) + | ||
case when {{second_date}} <= strftime('%Y-%m-%d 23:59:59.999999', {{first_date}}) then -1 else 0 end) | ||
CASE | ||
WHEN | ||
((strftime('%s', {{ second_date }}) - strftime('%s', {{ first_date }})) / 86400.0) >= 0 | ||
THEN CEIL( | ||
(strftime('%s', {{ second_date }}) - strftime('%s', {{ first_date }})) / 86400.0 | ||
) | ||
ELSE FLOOR( | ||
(strftime('%s', {{ second_date }}) - strftime('%s', {{ first_date }})) / 86400.0 | ||
) | ||
END | ||
{% elif datepart == 'week' %} | ||
({{ datediff(first_date, second_date, 'day') }} / 7 + case | ||
when strftime('%w', {{first_date}}) <= strftime('%w', {{second_date}}) then | ||
case when {{first_date}} <= {{second_date}} then 0 else -1 end | ||
else | ||
case when {{first_date}} <= {{second_date}} then 1 else 0 end | ||
end) | ||
CASE | ||
WHEN | ||
((strftime('%s', {{ second_date }}) - strftime('%s', {{ first_date }})) / 604800.0) >= 0.285715 | ||
THEN CEIL( | ||
(strftime('%s', {{ second_date }}) - strftime('%s', {{ first_date }})) / 604800.0 | ||
) | ||
WHEN | ||
((strftime('%s', {{ second_date }}) - strftime('%s', {{ first_date }})) / 604800.0) <= -0.285715 | ||
THEN FLOOR( | ||
(strftime('%s', {{ second_date }}) - strftime('%s', {{ first_date }})) / 604800.0 | ||
) | ||
ELSE CAST( | ||
(strftime('%s', {{ second_date }}) - strftime('%s', {{ first_date }})) / 604800.0 | ||
AS INTEGER) | ||
END | ||
{% elif datepart == 'hour' %} | ||
{# ({{ datediff(first_date, second_date, 'day') }} * 24 + strftime("%H", {{second_date}}) - strftime("%H", {{first_date}})) #} | ||
(ceil(cast(strftime('%s', {{second_date}}) - strftime('%s', {{first_date}}) as real) / 3600)) | ||
CASE | ||
WHEN | ||
((strftime('%s', {{ second_date }}) - strftime('%s', {{ first_date }})) / 3600.0) >= 0 | ||
THEN CEIL( | ||
(strftime('%s', {{ second_date }}) - strftime('%s', {{ first_date }})) / 3600.0 | ||
) | ||
ELSE FLOOR( | ||
(strftime('%s', {{ second_date }}) - strftime('%s', {{ first_date }})) / 3600.0 | ||
) | ||
END | ||
{% elif datepart == 'minute' %} | ||
{# ({{ datediff(first_date, second_date, 'hour') }} * 60 + strftime("%M", {{second_date}}) - strftime("%M", {{first_date}})) #} | ||
(ceil(cast(strftime('%s', {{second_date}}) - strftime('%s', {{first_date}}) as real) / 60)) | ||
CASE | ||
WHEN | ||
((strftime('%s', {{ second_date }}) - strftime('%s', {{ first_date }})) / 60.0) >= 0 | ||
THEN CEIL( | ||
(strftime('%s', {{ second_date }}) - strftime('%s', {{ first_date }})) / 60.0 | ||
) | ||
ELSE FLOOR( | ||
(strftime('%s', {{ second_date }}) - strftime('%s', {{ first_date }})) / 60.0 | ||
) | ||
END | ||
{% elif datepart == 'second' %} | ||
(strftime('%s', {{second_date}}) - strftime('%s', {{first_date}})) | ||
{# | ||
CASE | ||
WHEN | ||
((strftime('%s', {{ second_date }}) + cast(substr(strftime('%f', {{ second_date }}), instr(strftime('%f', {{ second_date }}), '.') + 1) as real) / 1000.0) - | ||
(strftime('%s', {{ first_date }}) + cast(substr(strftime('%f', {{ first_date }}), instr(strftime('%f', {{ first_date }}), '.') + 1) as real) / 1000.0)) >= 0 | ||
THEN CEIL( | ||
(strftime('%s', {{ second_date }}) + cast(substr(strftime('%f', {{ second_date }}), instr(strftime('%f', {{ second_date }}), '.') + 1) as real) / 1000.0) - | ||
(strftime('%s', {{ first_date }}) + cast(substr(strftime('%f', {{ first_date }}), instr(strftime('%f', {{ first_date }}), '.') + 1) as real) / 1000.0) | ||
) | ||
ELSE FLOOR( | ||
(strftime('%s', {{ second_date }}) + cast(substr(strftime('%f', {{ second_date }}), instr(strftime('%f', {{ second_date }}), '.') + 1) as real) / 1000.0) - | ||
(strftime('%s', {{ first_date }}) + cast(substr(strftime('%f', {{ first_date }}), instr(strftime('%f', {{ first_date }}), '.') + 1) as real) / 1000.0) | ||
) | ||
END | ||
{% elif datepart == 'millisecond' %} | ||
({{ datediff(first_date, second_date, 'minute') }} * 60000 + floor(date_part('millisecond', ({{second_date}})::timestamp)) - floor(date_part('millisecond', ({{first_date}})::timestamp))) | ||
{% elif datepart == 'microsecond' %} | ||
({{ datediff(first_date, second_date, 'minute') }} * 60000000 + floor(date_part('microsecond', ({{second_date}})::timestamp)) - floor(date_part('microsecond', ({{first_date}})::timestamp))) | ||
#} | ||
((1000 * (strftime('%s', {{ second_date }}))) + cast(substr(strftime('%f', {{ second_date }}), instr(strftime('%f', {{ second_date }}), '.') + 1) as integer) - | ||
(1000 * (strftime('%s', {{ first_date }}))) + cast(substr(strftime('%f', {{ first_date }}), instr(strftime('%f', {{ first_date }}), '.') + 1) as integer)) | ||
{% else %} | ||
{{ exceptions.raise_compiler_error("Unsupported datepart for macro datediff in sqlite: {!r}".format(datepart)) }} | ||
{{ exceptions.raise_compiler_error("Unsupported datepart for macro datediff in SQLite: '" ~ datepart ~ "'") }} | ||
{% endif %} | ||
|
||
{%- endmacro %} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This empirical value of 0.285715 (~2/7) reflects the dividing line between two scenarios: when the two dates being measured for their weekly difference are greater than two days apart from each other, or within two days.
This is only here because one of the test scenarios says that these two dates below have a week difference of 0:
If greater than two days apart, we use CEIL to ensure the week difference is 1
If less than two days apart, we use FLOOR to ensure the week difference is 0.
It does feels arbitrary to say that 2 days is the magical cut off between when a week difference is seen as either 0 or 1. Maybe there is a precedent for this in other RDBMS systems. I'll take a look tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm reading it correctly, the implementation for the postgres adapter considers whether the 2nd date "rolls over" into the following week, as determined by whether day of week for second_date is less than that for first_date. So it doesn't have to do with a two-day threshold, it has to do with the fact that 2019-12-31 is a Tuesday, so anything from 2020-01-02 through 2020-01-06 would be considered to have a week difference of 0.
https://github.com/dbt-labs/dbt-postgres/blob/118fbbdc26376dd13cf09b43b31e5eba4235bbe8/dbt/include/postgres/macros/utils/datediff.sql#L11-L17
It's tricky for sqlite because there's no easy way to do a "day of week" calculation, at least with the built-in functions. I wonder if we could find a date library to do this. sqlean, which we're already using, doesn't include any date functions, unfortunately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes perfect sense now. Thank you for this explanation!
Also thanks for linking the dbt-postgres implementation, it's good that we reference these existing implementations because I find the edge cases exemplified by the unit tests aren't capturing everything we need yet, but seeing their implementation helps close the gap and shed light.
Regarding sqlite and "day of week", we don't have access to postgres'
dow
, but we can use sqlite's "%w" string format option: https://www.sqlite.org/lang_datefunc.htmlUsing this we could update our
week
part to something like:Though for other parts of the macro, I am now questioning the use of CEIL v FLOOR knowing it likely could go wrong somewhere. I'll look into the postgres native implementation for datediff and see if I can learn a bit more from that to refine this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also think I should consider using the code that was already adapted from postgresql to the left because it might be more performant in some areas, and also reads more like the postgresql one which would be a common reference point.
One example of the performance benefit in the postgresql adapter version is the case statement is only being applied on the second operand used for adjustment, not on the whole expression. This avoids calling the same functions (cast and floor/ceil) multiple times on the same expression.