Clone modified incremental models as step #1 of dbt Cloud CI job (Snowflake only) #1477

graciegoheen (Maintainer) · 2022-05-24T16:00:11Z

CAUTION: This solution does not work. For dbt to be aware of the cloned objects, the cloning must happen prior to parsing, which means it has to be done via a run-operation. Until selected_resources is accessible from a run-operation, cloning only the modified incremental models in a CI job is not possible.

Background:

Imagine that you've created a Slim CI job in dbt Cloud.

Your CI job:

- defers to your daily production job
- runs the command dbt build --select state:modified+
- is triggered when a PR is opened

Now imagine your dbt project looks like this:

When you open a PR that modifies model_1, your CI job will kick off and build only the modified models and their downstream dependencies (in this case: model_1, model_2, and model_3) into a PR-specific schema. This mimics what will happen once the PR is merged into the main branch (so you have confidence that you're not introducing breaking changes), without requiring a build of your entire dbt project.

But what happens when one of the modified models (or one of their downstream dependencies) is an incremental model?

Because your CI job builds modified models into a PR-specific schema, the first execution of dbt build --select state:modified+ builds the modified incremental model in its entirety: the model does not yet exist in the PR-specific schema, so is_incremental() returns false.
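To make that condition concrete, here is a minimal Python sketch of the three checks dbt documents for is_incremental(): the model is materialized as incremental, it already exists as a table in the target schema, and --full-refresh was not passed. ModelState and its field names are hypothetical stand-ins for illustration. This models the documented behavior and is not dbt's actual source.

```python
from dataclasses import dataclass


@dataclass
class ModelState:
    """Hypothetical stand-in for what dbt knows about one model at run time."""
    materialized: str           # e.g. "incremental", "table", "view"
    exists_as_table: bool       # is there already a table in the target schema?
    full_refresh: bool = False  # was --full-refresh passed?


def is_incremental(state: ModelState) -> bool:
    """Mirror the three documented conditions for is_incremental()."""
    return (
        state.materialized == "incremental"
        and state.exists_as_table
        and not state.full_refresh
    )


# First CI run: the PR-specific schema is empty, so the model is rebuilt in full.
first_ci_run = ModelState(materialized="incremental", exists_as_table=False)

# If the production table were already present in the PR schema (for example
# via a clone), the incremental code path would be taken instead.
after_clone = ModelState(materialized="incremental", exists_as_table=True)

print(is_incremental(first_ci_run))  # False
print(is_incremental(after_clone))   # True
```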

This can cause problems because:

- incremental models typically take a long time to build in their entirety, which slows down development
- a full refresh of the incremental model can pass in your CI job while an incremental build of that same table in prod fails once the PR is merged into main (think schema drift where the on_schema_change config is set to fail)

We can alleviate the above problems by zero-copy cloning these incremental models into our PR-specific schema as the first step of the CI job. This way, the incremental models already exist in the PR-specific schema when you first execute dbt build --select state:modified+, so is_incremental() will return true.

Your CI jobs will run faster, and you're more accurately mimicking the behavior of "what will happen once the PR has been merged into main".

In the past, we've been able to accomplish a similar goal by cloning the entire production schema into the PR-specific schema following this approach. But with the introduction of the new selected_resources context variable in version 1.1, we can clone only the modified+ incremental models, which will decrease the execution time of our CI job.

Step 1:

Make sure you've updated your project to v1.1.

Step 2:

Ensure your target is set to ci for your dbt Cloud CI job.

Step 3:

Add the following clone_modified_incrementals macro to your dbt project:

{% macro clone_modified_incrementals(from_db, from_schema) %}

{%- if execute -%}

    {%- if target.name == 'ci' -%}
    
        {%- for node in graph.nodes.values() -%}
            {%- if node.unique_id in selected_resources and node.resource_type == 'model' and node.config.materialized == 'incremental' -%}
                {%- set from_relation = adapter.get_relation(database=from_db, schema=from_schema, identifier=node.name) -%}
                {#- get_relation returns none when the production relation does not exist, so guard against it -#}
                {%- if from_relation is not none and from_relation.is_table -%}

                create or replace transient table {{ target.database }}.{{ generate_schema_name(custom_schema_name = node.config.schema, node = node) }}.{{ node.name }} clone {{ from_db }}.{{ from_schema }}.{{ node.name }};
                
                {% do log("Cloned incremental model " ~ from_db ~ "." ~ from_schema ~ "." ~ node.name ~ " into target schema.", info=true) %}
                
                {%- endif -%}
                
            {%- endif -%}
            
        {%- endfor -%}

        select 1; {# hooks will error if they don't have valid SQL in them, this handles that! #}
    
    {%- else -%}

    select 2; {# hooks will error if they don't have valid SQL in them, this handles that! #}

    {%- endif -%}

{%- endif -%}

{% endmacro %}
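For readers who want to see which nodes the macro's filter picks up, here is a rough Python model of the same selection logic run against a hand-written, manifest-like dictionary. Everything in the example data is an assumption for illustration: the project name my_project, the materializations assigned to each model, the extra model_4, and the ci_db / dbt_cloud_pr_123 target names. It also leaves out the macro's adapter.get_relation existence check, since that needs a live database.

```python
from typing import Dict, List, Set


def clone_statements(
    nodes: Dict[str, dict],
    selected_resources: Set[str],
    from_db: str,
    from_schema: str,
    target_db: str,
    target_schema: str,
) -> List[str]:
    """Return one clone statement per selected incremental model.

    Simplification: the macro also checks that the production relation
    exists and is a table; that database lookup is left out here.
    """
    statements = []
    for unique_id, node in nodes.items():
        if (
            unique_id in selected_resources
            and node["resource_type"] == "model"
            and node["config"]["materialized"] == "incremental"
        ):
            name = node["name"]
            statements.append(
                f"create or replace transient table {target_db}.{target_schema}.{name} "
                f"clone {from_db}.{from_schema}.{name};"
            )
    return statements


def model(name: str, materialized: str) -> dict:
    """Build a minimal, hypothetical graph node."""
    return {"name": name, "resource_type": "model", "config": {"materialized": materialized}}


# Hypothetical graph: the materializations below are made up for illustration.
nodes = {
    "model.my_project.model_1": model("model_1", "view"),
    "model.my_project.model_2": model("model_2", "incremental"),
    "model.my_project.model_3": model("model_3", "table"),
    "model.my_project.model_4": model("model_4", "incremental"),  # not selected
}

# What state:modified+ would select when model_1 is modified.
selected = {
    "model.my_project.model_1",
    "model.my_project.model_2",
    "model.my_project.model_3",
}

statements = clone_statements(
    nodes, selected, "production_db", "production_schema", "ci_db", "dbt_cloud_pr_123"
)
for statement in statements:
    print(statement)
```

Under these assumptions only model_2 is cloned: it is the one node that is both selected and incremental, while model_4 is incremental but sits outside the selection.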

Step 4:

Add the macro as an on-run-start hook to your dbt_project.yml file:

on-run-start: "{{ clone_modified_incrementals(from_db='production_db', from_schema='production_schema') }}"

Disclaimers

- This macro assumes that all of your production incremental models are built into a single production database and schema.
- Snowflake only.
- Requires dbt version 1.1.0 or greater.
- We must run the macro as an on-run-start hook instead of a run-operation because the selected_resources variable is not currently accessible when using the run-operation command.

graciegoheen (Maintainer, Author) · 2022-11-15T19:46:56Z

FOLLOW UP:

While this does successfully clone the modified incremental models, dbt build --select state:modified+ currently still builds the incremental models from scratch, because dbt has to know that the model exists before the on-run-start hook is triggered.

This will likely not be possible until the selected_resources variable becomes available in a run-operation command.

2 replies

bbrewington Nov 22, 2023

Given it's been a year, just checking back on this to see if there's been any movement on selected_resources availability, or if there's a Discussion, Issue, or PR to tackle it.

graciegoheen Nov 22, 2023
Maintainer Author

We actually introduced a new dbt clone command in 1.6 - docs here https://docs.getdbt.com/reference/commands/clone

You can use this command instead of a macro + pre-hook, writeup here #4359
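As a rough sketch of what a two-step CI job built on dbt clone could look like, the Python below only assembles the two command lines and prints them. The selector string is the one I recall from the dbt documentation's guidance on cloning incremental models in CI, so please verify it against the docs linked above. The ci_steps helper and the path/to/prod-artifacts location are hypothetical, and in dbt Cloud the comparison state is normally supplied by the job's deferral settings rather than an explicit --state flag.

```python
import shlex
from typing import List


def ci_steps(state_dir: str) -> List[List[str]]:
    """Assemble the clone-then-build commands for a CI job (hypothetical helper)."""
    clone = [
        "dbt", "clone",
        # Modified models and their children, restricted to incremental models
        # that already exist in the comparison (production) state.
        "--select", "state:modified+,config.materialized:incremental,state:old",
        "--state", state_dir,
    ]
    build = [
        "dbt", "build",
        "--select", "state:modified+",
        "--state", state_dir,
    ]
    return [clone, build]


# Print the commands instead of running them, so the sketch works without dbt installed.
for step in ci_steps("path/to/prod-artifacts"):
    print(shlex.join(step))
```

Running the clone step first means the incremental models already exist in the PR schema when the build step starts, which is the behavior the original macro was trying to achieve.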
