[CT-1356] [Feature] incremental models' merge conditional update #6077
Comments
@Yotamho Thanks for opening! This ask sounds to me a lot like the ask to pass additional custom "predicates" to the merge statement. There's a subtle distinction between:

merge into ...
from ...
on <unique_key_match> and <custom_condition> ...
when matched then update ...
when not matched then insert ...

and:

merge into ...
from ...
on <unique_key_match> ...
when matched and <custom_condition> then update ...
when not matched then insert ...

In the first case, if the unique key matches but the custom condition isn't met, the row is inserted as a new record instead of updating the existing one, which could lead to duplicates in the target table. In the second case, if the unique key matches but the custom condition isn't met, that new row goes ... nowhere. Is that preferable behavior, or does it risk greater confusion? It may help to get concrete about specific use cases. Looking at yours:
Is this different from the common logic that we ask users to stick in incremental models? Yes, this is an implicit join within the incremental model logic instead, but I'd rather have it live in the model logic, where it can be more easily seen / debugged, than tucked away in the materialization logic:

select * from {{ ref('upstream_table') }}
{% if is_incremental() %}
where timestamp_col > (select max(timestamp_col) as max_ts from {{ this }})
{% endif %}
Hi @jtcohen6, thanks for responding!
If the name is the unique key and we wish to update records when they arrive with a newer timestamp, then simply basing the incremental model on the timestamp means we will ignore the updated "john" record. When data might come out of order, I can think of two implementations of incremental models (I'm sure there are more approaches):
(Jumping in quickly because @dbeatty10 has pointed out to me that, where I said "subtle distinction" above, I had initially pasted the exact same code twice. Sorry about that!!)
(Doug - assigning both of us, just so that one of us writes a quick follow-up to the conversation we had on Monday!)
Hey,
I am very interested in this. My use case is that I only want to update when a column "hashdiff" has changed. Without that ability, it requires a self-join to identify rows which actually changed. That requires extra processing, whereas adding this simple comparison would avoid it. I am considering a custom materialization to work around this issue.
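For illustration, a rough sketch of the merge that such a condition would produce. The table names, the id column, and the payload column are assumptions made up for this example; DBT_INTERNAL_SOURCE and DBT_INTERNAL_DEST are the aliases dbt uses in its generated merge, and hashdiff comes from the use case above:

merge into analytics.dim_customer as DBT_INTERNAL_DEST
using analytics.dim_customer__dbt_tmp as DBT_INTERNAL_SOURCE
on DBT_INTERNAL_SOURCE.id = DBT_INTERNAL_DEST.id
-- only rewrite rows whose payload actually changed (null handling omitted for brevity)
when matched and DBT_INTERNAL_SOURCE.hashdiff != DBT_INTERNAL_DEST.hashdiff
    then update set hashdiff = DBT_INTERNAL_SOURCE.hashdiff, payload = DBT_INTERNAL_SOURCE.payload
when not matched
    then insert (id, hashdiff, payload)
    values (DBT_INTERNAL_SOURCE.id, DBT_INTERNAL_SOURCE.hashdiff, DBT_INTERNAL_SOURCE.payload);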
After giving this another read through, I see significant overlap between this issue, and one I just responded to yesterday: #6415 (comment) Two points that carry over in particular:
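For context, a hedged sketch of what passing custom predicates through model config can look like; incremental_predicates is the config name used in newer dbt versions, and everything else here (the upstream ref, column names, the seven-day window) is made up for illustration. As I understand it, predicates passed this way constrain the merge join rather than the "when matched" clause, so they correspond to the first case described earlier in this thread, not the conditional update being requested:

{{
    config(
        materialized = 'incremental',
        unique_key = 'id',
        incremental_strategy = 'merge',
        incremental_predicates = [
            "DBT_INTERNAL_DEST.updated_at > dateadd(day, -7, current_date)"
        ]
    )
}}

select * from {{ ref('upstream_table') }}
{% if is_incremental() %}
where updated_at > (select max(updated_at) from {{ this }})
{% endif %}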
Interesting. I will need to update to 1.3 and see if that will allow me to fix my problem.
In reviewing this implementation: for my purposes, I think I would rather simply override default__get_merge_sql to accept a new config setting of "merge_update_only_when_columns_different" (or something) and append it to the "when matched {merge_update_only_when_columns_different code here} then update set ..." clause. I will try that and see how that goes.
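A minimal sketch of the fragment such an override might render, assuming the "merge_update_only_when_columns_different" config name from the comment above holds a raw SQL condition. This is not the actual default__get_merge_sql body, only the "when matched" portion it would need to change, and update_columns is assumed to be the list of column names the surrounding macro already has in scope:

{# read the hypothetical config; the condition string is supplied by the model #}
{%- set update_condition = config.get('merge_update_only_when_columns_different') -%}
when matched {% if update_condition %}and ({{ update_condition }}) {% endif %}then update set
    {% for column_name in update_columns -%}
        {{ column_name }} = DBT_INTERNAL_SOURCE.{{ column_name }}{{ "," if not loop.last }}
    {%- endfor %}

With that in place, a model could set merge_update_only_when_columns_different to something like DBT_INTERNAL_SOURCE.hashdiff != DBT_INTERNAL_DEST.hashdiff.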
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.
Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.
Is this your first time submitting a feature request?
Describe the feature
In some databases (namely Snowflake and Postgres), it is possible to add a condition to a merge update clause:
when matched and <condition> then ...
I want to allow adding this kind of condition to incremental models (by making it an incremental model configuration).
Describe alternatives you've considered
One alternative worth considering would be to join the records in the incremental run with their destination records, check for which rows the condition is not satisfied, and omit those rows.
Such a solution would be less performant and would make the model more complicated to understand (see the sketch below).
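A rough illustration of that alternative, under the assumptions used elsewhere in this issue (a name unique key and a timestamp column; upstream_table is a placeholder ref):

select src.*
from (
    select * from {{ ref('upstream_table') }}
) as src
{% if is_incremental() %}
-- keep a row only if it is new, or strictly newer than what is already stored
left join {{ this }} as dest
    on src.name = dest.name
where dest.name is null
   or src.timestamp > dest.timestamp
{% endif %}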
Who will this benefit?
For example, when out-of-order data is arbitrarily inserted into the source table of a model, this feature allows us to omit that data by using a condition like:
DBT_INTERNAL_SOURCE.timestamp > DBT_INTERNAL_DEST.timestamp
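What the proposed configuration might look like on a model; the config name merge_update_condition is purely hypothetical (the linked forks may use a different name), and the unique key and condition follow the example above:

{# merge_update_condition is a hypothetical config name for the proposed
   "when matched and <condition>" clause #}
{{
    config(
        materialized = 'incremental',
        unique_key = 'name',
        incremental_strategy = 'merge',
        merge_update_condition = 'DBT_INTERNAL_SOURCE.timestamp > DBT_INTERNAL_DEST.timestamp'
    )
}}

select * from {{ ref('upstream_table') }}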
Are you interested in contributing this feature?
Yes.
Anything else?
I have forks with a suggested implementation:
https://github.com/Yotamho/dbt-core
https://github.com/Yotamho/dbt-snowflake