Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implemented sqlite__datediff macro using epoch deltas #56

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
109 changes: 78 additions & 31 deletions dbt/include/sqlite/macros/utils/datediff.sql
Original file line number Diff line number Diff line change
@@ -1,42 +1,89 @@

{# TODO: fully implement this and rename #}
{# adapted from postgresql #}
{% macro sqlite__datediff_broken(first_date, second_date, datepart) -%}

{#
-- The sqlite__datediff macro uses epoch time deltas to calculate differences between dates, offering precision down to the millisecond level.
-- Despite SQLite's limitations in handling sub-millisecond accuracy, the macro reliably handles differences across various date parts
-- (year, month, week, day, hour, minute, second, millisecond).
-- To ensure the macro's effectiveness even with the smallest discernible time differences (e.g., one millisecond), empirical adjustments have been implemented.
-- These adjustments are particularly crucial for broader date parts like 'week'.
-- The decision to use CEIL or FLOOR for rounding was refined through testing to accurately reflect differences when dates are separated by as little as one millisecond.
-- This empirical approach ensures that the macro not only meets expected functional accuracy but also addresses real-world use cases effectively.
-- For example, calculating the week difference between two timestamps only a millisecond apart requires careful consideration of how to round fractional weeks.
-- The choice to use CEIL in certain contexts was driven by the need to acknowledge even the minimal time difference as a full unit where contextually appropriate.
-- TODO: More unit testing should be done with a comprehensive calendar table "solved" in another RDBMS that supports datediff natively for comparing against the results of this macro.
#}
{% macro sqlite__datediff(first_date, second_date, datepart) -%}
{% set datepart = datepart.lower() %}
{% if datepart == 'year' %}
(strftime('%Y', {{second_date}}) - strftime('%Y', {{first_date}}))
{#
{% elif datepart == 'quarter' %}
({{ datediff(first_date, second_date, 'year') }} * 4 + date_part('quarter', ({{second_date}})::date) - date_part('quarter', ({{first_date}})::date))
#}
(strftime('%Y', {{ second_date }}) - strftime('%Y', {{ first_date }}))
{% elif datepart == 'month' %}
(({{ datediff(first_date, second_date, 'year') }} * 12 + strftime('%m', {{second_date}})) - strftime('%m', {{first_date}}))
((strftime('%Y', {{ second_date }}) - strftime('%Y', {{ first_date }})) * 12) +
(strftime('%m', {{ second_date }}) - strftime('%m', {{ first_date }}))
{% elif datepart == 'day' %}
(floor(cast(strftime('%s', {{second_date}}) - strftime('%s', {{first_date}}) as real) / 86400) +
case when {{second_date}} <= strftime('%Y-%m-%d 23:59:59.999999', {{first_date}}) then -1 else 0 end)
CASE
WHEN
((strftime('%s', {{ second_date }}) - strftime('%s', {{ first_date }})) / 86400.0) >= 0
THEN CEIL(
(strftime('%s', {{ second_date }}) - strftime('%s', {{ first_date }})) / 86400.0
)
ELSE FLOOR(
(strftime('%s', {{ second_date }}) - strftime('%s', {{ first_date }})) / 86400.0
)
END
{% elif datepart == 'week' %}
({{ datediff(first_date, second_date, 'day') }} / 7 + case
when strftime('%w', {{first_date}}) <= strftime('%w', {{second_date}}) then
case when {{first_date}} <= {{second_date}} then 0 else -1 end
else
case when {{first_date}} <= {{second_date}} then 1 else 0 end
end)
CASE
WHEN
((strftime('%s', {{ second_date }}) - strftime('%s', {{ first_date }})) / 604800.0) >= 0.285715
THEN CEIL(
Copy link
Collaborator Author

@tom-juntunen tom-juntunen Apr 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This empirical value of 0.285715 (~2/7) reflects the dividing line between two scenarios: when the two dates being measured for their weekly difference are greater than two days apart from each other, or within two days.

This is only here because one of the test scenarios says that these two dates below have a week difference of 0:

first_date,second_date,datepart,result
2019-12-31 00:00:00,2020-01-02 00:00:00,week,0

If greater than two days apart, we use CEIL to ensure the week difference is 1
If less than two days apart, we use FLOOR to ensure the week difference is 0.

It does feels arbitrary to say that 2 days is the magical cut off between when a week difference is seen as either 0 or 1. Maybe there is a precedent for this in other RDBMS systems. I'll take a look tomorrow.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm reading it correctly, the implementation for the postgres adapter considers whether the 2nd date "rolls over" into the following week, as determined by whether day of week for second_date is less than that for first_date. So it doesn't have to do with a two-day threshold, it has to do with the fact that 2019-12-31 is a Tuesday, so anything from 2020-01-02 through 2020-01-06 would be considered to have a week difference of 0.

https://github.com/dbt-labs/dbt-postgres/blob/118fbbdc26376dd13cf09b43b31e5eba4235bbe8/dbt/include/postgres/macros/utils/datediff.sql#L11-L17

It's tricky for sqlite because there's no easy way to do a "day of week" calculation, at least with the built-in functions. I wonder if we could find a date library to do this. sqlean, which we're already using, doesn't include any date functions, unfortunately.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it doesn't have to do with a two-day threshold, it has to do with the fact that 2019-12-31 is a Tuesday, so anything from 2020-01-02 through 2020-01-06 would be considered to have a week difference of 0.

That makes perfect sense now. Thank you for this explanation!

Also thanks for linking the dbt-postgres implementation, it's good that we reference these existing implementations because I find the edge cases exemplified by the unit tests aren't capturing everything we need yet, but seeing their implementation helps close the gap and shed light.

Regarding sqlite and "day of week", we don't have access to postgres' dow, but we can use sqlite's "%w" string format option: https://www.sqlite.org/lang_datefunc.html

select cast(strftime('%w', datetime('now')) as integer);
(1,)

Using this we could update our week part to something like:

{% elif datepart == 'week' %}
        -- Calculate the initial week difference by finding the difference in days and dividing by 7
        (
            ((strftime('%s', {{ second_date }}) - strftime('%s', {{ first_date }})) / 86400) / 7 +
            -- Adjust based on the days of the week to ensure correct week boundaries
            CASE
                WHEN strftime('%w', {{ first_date }}) <= strftime('%w', {{ second_date }})
                THEN CASE
                        WHEN strftime('%s', {{ first_date }}) <= strftime('%s', {{ second_date }})
                        THEN 0
                        ELSE -1
                    END
                ELSE CASE
                        WHEN strftime('%s', {{ first_date }}) <= strftime('%s', {{ second_date }})
                        THEN 1
                        ELSE 0
                    END
            END
        )

Though for other parts of the macro, I am now questioning the use of CEIL v FLOOR knowing it likely could go wrong somewhere. I'll look into the postgres native implementation for datediff and see if I can learn a bit more from that to refine this PR.

Copy link
Collaborator Author

@tom-juntunen tom-juntunen Apr 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think I should consider using the code that was already adapted from postgresql to the left because it might be more performant in some areas, and also reads more like the postgresql one which would be a common reference point.

One example of the performance benefit in the postgresql adapter version is the case statement is only being applied on the second operand used for adjustment, not on the whole expression. This avoids calling the same functions (cast and floor/ceil) multiple times on the same expression.

(strftime('%s', {{ second_date }}) - strftime('%s', {{ first_date }})) / 604800.0
)
WHEN
((strftime('%s', {{ second_date }}) - strftime('%s', {{ first_date }})) / 604800.0) <= -0.285715
THEN FLOOR(
(strftime('%s', {{ second_date }}) - strftime('%s', {{ first_date }})) / 604800.0
)
ELSE CAST(
(strftime('%s', {{ second_date }}) - strftime('%s', {{ first_date }})) / 604800.0
AS INTEGER)
END
{% elif datepart == 'hour' %}
{# ({{ datediff(first_date, second_date, 'day') }} * 24 + strftime("%H", {{second_date}}) - strftime("%H", {{first_date}})) #}
(ceil(cast(strftime('%s', {{second_date}}) - strftime('%s', {{first_date}}) as real) / 3600))
CASE
WHEN
((strftime('%s', {{ second_date }}) - strftime('%s', {{ first_date }})) / 3600.0) >= 0
THEN CEIL(
(strftime('%s', {{ second_date }}) - strftime('%s', {{ first_date }})) / 3600.0
)
ELSE FLOOR(
(strftime('%s', {{ second_date }}) - strftime('%s', {{ first_date }})) / 3600.0
)
END
{% elif datepart == 'minute' %}
{# ({{ datediff(first_date, second_date, 'hour') }} * 60 + strftime("%M", {{second_date}}) - strftime("%M", {{first_date}})) #}
(ceil(cast(strftime('%s', {{second_date}}) - strftime('%s', {{first_date}}) as real) / 60))
CASE
WHEN
((strftime('%s', {{ second_date }}) - strftime('%s', {{ first_date }})) / 60.0) >= 0
THEN CEIL(
(strftime('%s', {{ second_date }}) - strftime('%s', {{ first_date }})) / 60.0
)
ELSE FLOOR(
(strftime('%s', {{ second_date }}) - strftime('%s', {{ first_date }})) / 60.0
)
END
{% elif datepart == 'second' %}
(strftime('%s', {{second_date}}) - strftime('%s', {{first_date}}))
{#
CASE
WHEN
((strftime('%s', {{ second_date }}) + cast(substr(strftime('%f', {{ second_date }}), instr(strftime('%f', {{ second_date }}), '.') + 1) as real) / 1000.0) -
(strftime('%s', {{ first_date }}) + cast(substr(strftime('%f', {{ first_date }}), instr(strftime('%f', {{ first_date }}), '.') + 1) as real) / 1000.0)) >= 0
THEN CEIL(
(strftime('%s', {{ second_date }}) + cast(substr(strftime('%f', {{ second_date }}), instr(strftime('%f', {{ second_date }}), '.') + 1) as real) / 1000.0) -
(strftime('%s', {{ first_date }}) + cast(substr(strftime('%f', {{ first_date }}), instr(strftime('%f', {{ first_date }}), '.') + 1) as real) / 1000.0)
)
ELSE FLOOR(
(strftime('%s', {{ second_date }}) + cast(substr(strftime('%f', {{ second_date }}), instr(strftime('%f', {{ second_date }}), '.') + 1) as real) / 1000.0) -
(strftime('%s', {{ first_date }}) + cast(substr(strftime('%f', {{ first_date }}), instr(strftime('%f', {{ first_date }}), '.') + 1) as real) / 1000.0)
)
END
{% elif datepart == 'millisecond' %}
({{ datediff(first_date, second_date, 'minute') }} * 60000 + floor(date_part('millisecond', ({{second_date}})::timestamp)) - floor(date_part('millisecond', ({{first_date}})::timestamp)))
{% elif datepart == 'microsecond' %}
({{ datediff(first_date, second_date, 'minute') }} * 60000000 + floor(date_part('microsecond', ({{second_date}})::timestamp)) - floor(date_part('microsecond', ({{first_date}})::timestamp)))
#}
((1000 * (strftime('%s', {{ second_date }}))) + cast(substr(strftime('%f', {{ second_date }}), instr(strftime('%f', {{ second_date }}), '.') + 1) as integer) -
(1000 * (strftime('%s', {{ first_date }}))) + cast(substr(strftime('%f', {{ first_date }}), instr(strftime('%f', {{ first_date }}), '.') + 1) as integer))
{% else %}
{{ exceptions.raise_compiler_error("Unsupported datepart for macro datediff in sqlite: {!r}".format(datepart)) }}
{{ exceptions.raise_compiler_error("Unsupported datepart for macro datediff in SQLite: '" ~ datepart ~ "'") }}
{% endif %}

{%- endmacro %}
1 change: 0 additions & 1 deletion tests/functional/adapter/utils/test_utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -124,7 +124,6 @@ def models(self):
}


@pytest.mark.skip("TODO: implement datediff")
class TestDateDiff(BaseDateDiff):
pass

Expand Down
Loading