[CT-3375] [Feature] Allow Users to Specify Merge Clauses for Incremental Models #9060
Comments
Here's some code that would integrate this feature request, my feature request in #9056, and the traditional merge behavior into a single macro. (Currently calling it
@dbeatty10 Thanks for the prompt reply and the links! I've read through them and my takeaways are:
FWIW, I think the strategy I've proposed here could be a great solution to all of the above. It handles every specific case that I've seen people asking for in all of the other linked issues. It also handles some pretty complicated merge logic that's impossible to handle with the existing merge strategy, no matter how cleverly you write the model SQL itself. (The limitation is that the existing merge strategy considers at most two cases
Anyway, doesn't matter much to me either way, since I've already implemented this as a working custom strategy in our dbt project (see my comment above for the code). But I think it'd be cool to bring this functionality to the broader dbt user base. And then maybe y'all would stop having to respond to so many feature requests about custom incremental strategies. 😆😉😄
Thanks!
I like this suggestion! I was about to suggest something similar, since I maintain a custom incremental strategy, and I think it's a little clunky. Right now I have to copy and rename this entire macro and slightly adjust the
It would be nice if I could just write that part as a macro (with maybe some new helpers to make it easier):
{% macro custom_incremental_strategy(target, source, unique_key, dest_columns, incremental_predicates) -%}
when matched and <condition> = TRUE then delete
when matched then update set {{ set_columns(...) }}
when not matched then insert
({{ dest_cols_csv }})
values
({{ dest_cols_csv }})
{% endmacro %}
and call that in the
To do this, we could hook in the custom merge statements after all the arguments and data are processed (rather than before, as it is now):
{% macro default__get_merge_sql(target, source, unique_key, dest_columns, incremental_predicates) -%}
    {%- set predicates = [] if incremental_predicates is none else [] + incremental_predicates -%}
    {%- set dest_cols_csv = get_quoted_csv(dest_columns | map(attribute="name")) -%}
    {%- set merge_update_columns = config.get('merge_update_columns') -%}
    {%- set merge_exclude_columns = config.get('merge_exclude_columns') -%}
    {%- set update_columns = get_merge_update_columns(merge_update_columns, merge_exclude_columns, dest_columns) -%}
    {%- set sql_header = config.get('sql_header', none) -%}
    {% if unique_key %}
        {% if unique_key is sequence and unique_key is not mapping and unique_key is not string %}
            {% for key in unique_key %}
                {% set this_key_match %}
                    DBT_INTERNAL_SOURCE.{{ key }} = DBT_INTERNAL_DEST.{{ key }}
                {% endset %}
                {% do predicates.append(this_key_match) %}
            {% endfor %}
        {% else %}
            {% set unique_key_match %}
                DBT_INTERNAL_SOURCE.{{ unique_key }} = DBT_INTERNAL_DEST.{{ unique_key }}
            {% endset %}
            {% do predicates.append(unique_key_match) %}
        {% endif %}
    {% else %}
        {% do predicates.append('FALSE') %}
    {% endif %}
    {{ sql_header if sql_header is not none }}
    {% if config.get('incremental_strategy') %}
        {{ call_custom_incremental_strategy(config.get('incremental_strategy'), args....) }}
    {% else %}
        {{ default_strategy() }}
    {% endif %}
{% endmacro %}
I slightly prefer this approach because I can write a normal Jinja-based SQL macro, with nice syntax highlighting, formatting, and autocomplete in dbt Cloud and VSCode. I don't have those same features when writing lists of dictionaries of strings. That's my personal preference, but I'd also be curious to hear if dbt Labs has a general philosophy on what should be SQL with Jinja and what should be stringified SQL.
Also, this could be done in addition to OP's suggestion. If the call to a custom incremental strategy happens after the arguments are parsed, you could add in the
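For readers who want to trace the key-matching part of the macro above outside dbt, here is a rough Python paraphrase of that logic. It is only a sketch for illustration, not dbt code: the function name `build_merge_predicates` is hypothetical, and it assumes `unique_key` is either a single column name or a list of column names.

```python
from typing import List, Optional, Sequence, Union


def build_merge_predicates(
    unique_key: Union[str, Sequence[str], None],
    incremental_predicates: Optional[List[str]] = None,
) -> List[str]:
    """Hypothetical paraphrase of the predicate-building steps in the macro."""
    # Copy the user-supplied predicates so the caller's list is not mutated.
    predicates = list(incremental_predicates or [])
    if unique_key:
        # A single column name is treated as a one-element list of keys.
        keys = [unique_key] if isinstance(unique_key, str) else list(unique_key)
        for key in keys:
            predicates.append(
                f"DBT_INTERNAL_SOURCE.{key} = DBT_INTERNAL_DEST.{key}"
            )
    else:
        # No unique key: nothing can ever match, so every row is "not matched".
        predicates.append("FALSE")
    return predicates


if __name__ == "__main__":
    print(build_merge_predicates(["id", "loaded_on"], ["DBT_INTERNAL_DEST.id > 0"]))
    print(build_merge_predicates(None))
```

The point of the proposal is that a custom strategy would receive these already-built predicates and only need to supply the `when ... then ...` clauses.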
Thanks for opening this @dbernett-amplify ! 🏆
Well said 👍
Similar to #9312 (comment), we think that custom incremental strategies are the way to go here (like you've already done) rather than making changes to
Then folks can share their custom strategies via the dbt package ecosystem per these instructions.
Per @tmastny's comments, we've created #9223 to consider ways to make it easier to customize incremental materializations that utilize a
Is this your first time submitting a feature request?
Describe the feature
Overview
dbt's existing "merge" strategy is relatively inflexible in that it always states that matched rows should be updated and unmatched rows should be inserted. Snowflake's merge statement is actually quite a bit more flexible than this. Specifically, Snowflake allows you to:
and conditions in your clauses (case predicates)
This update would allow the user to set the merge clauses in the config, giving the user access to the full flexibility of the Snowflake merge statement.
Implementation
I currently have this working in our dbt project. I have it as a separate incremental strategy, but it could also be incorporated into the existing merge strategy with some conditional logic that falls back to the existing default merge clauses if the user doesn't specify any merge clauses.
Here's an example of what it looks like in my config block:
A few things to note:
merge_clauses is a list of dictionaries. So the format is [{dict 1}, {dict 2}, ...].
Each dictionary has the following keys:
{
    "when": required, one of "matched" or "not matched"
    "and": optional, an additional condition that must be satisfied in order for the "then" behavior to occur
    "then": required, what the user wants to happen when the above are satisfied, one of "update", "insert", or "delete".
}
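To make the format above concrete, here is a minimal Python sketch of how such a list of dictionaries could be turned into merge clauses. It is not the macro from this issue (that one is Jinja inside dbt). The function name `render_merge_clauses`, its parameters, the example columns, and the `is_valid` condition in the usage example are all assumptions for illustration. It also assumes Snowflake's rule that matched rows can only be updated or deleted and unmatched rows can only be inserted.

```python
from typing import Dict, List

# Assumption: Snowflake only accepts update/delete for matched rows and
# insert for unmatched rows, so other combinations are rejected up front.
ALLOWED_ACTIONS = {
    "matched": {"update", "delete"},
    "not matched": {"insert"},
}


def render_merge_clauses(
    merge_clauses: List[Dict[str, str]],
    update_columns: List[str],
    dest_columns: List[str],
) -> str:
    """Hypothetical helper: turn merge_clauses dicts into SQL merge clauses."""
    rendered = []
    for clause in merge_clauses:
        when, then = clause["when"], clause["then"]
        if then not in ALLOWED_ACTIONS.get(when, set()):
            raise ValueError(f"unsupported clause: when {when} then {then}")
        condition = clause.get("and")  # optional extra predicate
        head = f"when {when}" + (f" and {condition}" if condition else "")
        if then == "update":
            assignments = ", ".join(
                f"{col} = DBT_INTERNAL_SOURCE.{col}" for col in update_columns
            )
            rendered.append(f"{head} then update set {assignments}")
        elif then == "delete":
            rendered.append(f"{head} then delete")
        else:  # insert
            cols = ", ".join(dest_columns)
            vals = ", ".join(f"DBT_INTERNAL_SOURCE.{col}" for col in dest_columns)
            rendered.append(f"{head} then insert ({cols}) values ({vals})")
    return "\n".join(rendered)


if __name__ == "__main__":
    # Hypothetical config values, not the ones from the original issue.
    example_clauses = [
        {"when": "matched", "and": "DBT_INTERNAL_SOURCE.is_valid = false", "then": "delete"},
        {"when": "matched", "then": "update"},
        {"when": "not matched", "then": "insert"},
    ]
    print(render_merge_clauses(example_clauses, ["name"], ["id", "name"]))
```

Clause order is preserved on purpose: in a merge statement the first `when matched` clause whose condition holds is the one that applies, so the conditional delete has to come before the unconditional update.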
And then my get_specify_merge_clauses_sql macro is identical to the existing get_merge_sql macro, except that I have replaced this code:
with this code
and I have also added {%- set merge_clauses = config.get('merge_clauses')%} at the top along with the other set statements.
Describe alternatives you've considered
No response
Who will this benefit?
The use case that I needed this for was that I wanted to:
is_valid = false
This is just one particular use case, but I imagine that there are lots of use cases where a user might want to have more flexibility than what the existing merge strategy provides. For example, it is also possible to use this to insert if the new record does not match and do nothing otherwise, as I outlined in #9056. However, I think that one is so common (and really should be the default when your only source of duplicates is your lookback period) that it still deserves a separate option that allows the end user to specify that behavior without having to type out the full merge clause dictionary as is done here.
Are you interested in contributing this feature?
Sure
Anything else?
I work in Snowflake. Not sure how this works on other platforms.
There's one small bit of flexibility that this doesn't allow: if the user has multiple "when matched ... then update" clauses (with different "and" conditions), it doesn't let the user specify different columns to update for those different clauses. I feel like that would be a relatively rare use case, though.