Add config blocks for the "source" models #391

Open
jsnb-devoted wants to merge 1 commit into main

Conversation

jsnb-devoted
Contributor

Overview

Adds config blocks to the upstream "source" models according to the dbt project file.
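
For context, a dbt config block sits at the top of a model's SQL file. A minimal sketch of what this PR describes, assuming the block mirrors the materialized and on_schema_change values already set for these models in the package's dbt_project.yml:

  {{
      config(
          materialized='incremental',
          on_schema_change='append_new_columns'
      )
  }}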

Update type - breaking / non-breaking

  • Minor bug fix
  • Documentation improvements
  • Quality of Life improvements
  • New features (non-breaking change)
  • New features (breaking change)
  • Other (non-breaking change)
  • Other (breaking change)
  • Release preparation

What does this solve?

I've been working on standardizing model configs in my project, and I set the default materialization to table in my dbt_project.yml. That appears to have overridden the materialization for these "source" models, so subsequent runs were doing a CREATE OR REPLACE and wiping out all of the data without my knowing. At least that is what I think happened. I copied the project config settings from this repo so that mine now includes the sources block, like so:

  dbt_artifacts:
    +schema: dbt_artifacts
    staging:
      +schema: dbt_artifacts
    sources:
      +materialized: incremental
      +on_schema_change: append_new_columns

and I confirmed that my dbt runs now look like:

21:19:30  1 of 34 START sql incremental model dbt_artifacts.exposures .................... [RUN]                                                                                                     
21:19:30  2 of 34 START sql incremental model dbt_artifacts.invocations .................. [RUN]                                                                                                     
...

instead of what they used to look like:

[2023-10-02, 21:02:38 UTC] {dbt_hook.py:191} INFO - 21:02:38  1 of 34 START sql table model dbt_artifacts.exposures .......................... [RUN]
[2023-10-02, 21:02:38 UTC] {dbt_hook.py:191} INFO - 21:02:38  2 of 34 START sql table model dbt_artifacts.invocations ........................ [RUN]

Outstanding questions

Is this actually the expected behavior of the dbt_project configs? I would have thought that, because the materialization for the sources block was set in this package's project config, I wouldn't be able to override it the way I did.

Is there anything lost by setting these as project configs? I don't feel strongly about this change; I'm just throwing it out there that this would have saved me from losing all my data.

What databases have you tested with?

  • Snowflake
  • Google BigQuery
  • Databricks
  • Spark
  • N/A

@jsnb-devoted jsnb-devoted had a problem deploying to Approve Integration Tests October 2, 2023 21:40 — with GitHub Actions Failure
@jared-rimmer
Member

Hi @jsnb-devoted, thanks for opening this PR. I just wanted to clarify my understanding of this.

In your dbt_project.yml file you have configured the materialisation for dbt_artifacts models?

@jsnb-devoted
Contributor Author

Hi @jsnb-devoted, thanks for opening this PR. I just wanted to clarify my understanding of this.

In your dbt_project.yml file you have configured the materialisation for dbt_artifacts models?

Hi @jared-rimmer -- This was my project config:

models:
  +persist_docs:
    relation: true
    columns: true
  devoted:
     <... all my project configs> 
  dbt_artifacts:
    +schema: dbt_artifacts
    staging:
      +schema: dbt_artifacts

Then I set the default materialization for my project to table, so it looked like this:

models:
  +materialized: table
  +transient: true
  devoted:
     <... all my project configs> 
  dbt_artifacts:
    +schema: dbt_artifacts
    staging:
      +schema: dbt_artifacts

And my understanding is that this overrode the "source" models' materialization set in the dbt_artifacts dbt_project.yml, and it started dropping and recreating those tables. I didn't notice for long enough that the Snowflake Time Travel retention expired and I lost all the data.

I assumed that the project config in dbt_artifacts couldn't be overridden like that. Assuming it's not just user error on my part, I figure you probably don't want this happening to anyone; you don't want those models to ever have their materialization overridden.
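
As an aside, one way to keep a project-wide default from cascading into installed packages is to scope it under your own project name instead of at the top level of models:. A minimal sketch based on the configs above, with devoted standing in for the root project name:

  models:
    devoted:
      +materialized: table
      +transient: true
    dbt_artifacts:
      +schema: dbt_artifacts
      staging:
        +schema: dbt_artifacts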

@jsnb-devoted
Contributor Author

bump - @jared-rimmer any thoughts on this?

@jsnb-devoted
Contributor Author

Anyone? Maybe @jaypeedevlin, since it looks like you implemented the original feature? Happy to close this if it was just user error or if there isn't any interest.

@hash0b

hash0b commented Nov 2, 2023

The current documentation has this note about configurations; maybe it addresses your query.

Note that model materializations and on_schema_change configs are defined in this package's dbt_project.yml, so do not set them globally in your dbt_project.yml (see docs on configuring packages):

Configurations made in your dbt_project.yml file will override any configurations in a package (either in the dbt_project.yml file of the package, or in config blocks).

On a side note, with 2.6.1 I get an error on dbt run -s dbt_artifacts, and I had to add incremental_strategy to my dbt_project.yml. I am still checking whether I am missing something:

  Compilation Error in model model_executions (models/sources/model_executions.sql)
    Invalid incremental strategy provided: None
      Expected one of: 'append', 'merge', 'insert_overwrite'

    > in macro dbt_spark_validate_get_incremental_strategy (macros/materializations/incremental/validate.sql)
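
For anyone hitting the same Spark error, a minimal sketch of the kind of override described above; the merge value is only an assumption here, since any of the three strategies listed in the error would pass validation:

  models:
    dbt_artifacts:
      +incremental_strategy: merge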

@jsnb-devoted
Contributor Author

The current documentation has this around configurations, maybe it addresses your query.

Note that model materializations and on_schema_change configs are defined in this package's dbt_project.yml, so do not set them globally in your dbt_project.yml (see docs on configuring packages):

This is a good call out @hash0b. I did not see this note in the README -- had I read this I probably wouldn't have lost all that data.

I don't feel that strongly about it, but it seems like there isn't any benefit to letting someone override the materialization for these particular "source" style models. You can prevent that from happening by hardcoding the materialization like I did in the PR. It's no longer an issue for me, so it is really up to the maintainers @jared-rimmer @jaypeedevlin.

@jaypeedevlin
Contributor

Hi all,

Just to add that I no longer work for Brooklyn Data, so I am not actively involved in maintaining this package.
