[CT-2609] [Feature] Allow defining a default backfill value for incremental models with constraints #7732
Comments
This is an intriguing idea @b-per ! 💡
Database / schema migrations
The manual steps you described are basically like database / schema migrations in Django, Rails, and other similar frameworks. Incremental models already do similar operations as those migrations (i.e., adding columns like you described), but don't do others (like replacing
Things to solve for
The piece of the design that seems most impactful is how to specify the value to use for the new column(s). There's a couple awkward things to solve for:
Specifying default values for not null constraints
Idea 1
One option would be to introduce a It could look similar to this:
{{
  config(
    materialized='incremental',
    unique_key='id',
    on_schema_change='append_new_columns',
    default_values={
      "some_string_column": "'foobar'",
      "some_timestamp_column": "cast('0000-01-01' as timestamp)",
    }
  )
}}
Idea 2
An alternative could be to have the default value added to the configuration of the constraint itself, like this:
models:
  - name: dim_customers
    config:
      contract:
        enforced: true
    columns:
      - name: some_string_column
        data_type: varchar
        constraints:
          - type: not_null
          - default_value: "foobar"
      - name: some_timestamp_column
        data_type: timestamp
        constraints:
          - type: not_null
          - default_value: "cast('0000-01-01' as timestamp)"
Idea 3
Full-blown support for general database migrations. Such a feature would only be relevant to models materialized as incremental because ephemeral, table, and view don't need migrations.
Summary
I prefer idea 2 over idea 1 since there's already a full listing of the column names when a contract is enforced. It solves for the first two things listed in the "things to solve for" section. But neither idea solves for the one-time nature of these migrations. To solve for that one, we'd need idea 3. That typically involves storing state, which we've historically shied away from.
What do you think of those first two options versus each other @b-per? Did you have something different in mind?
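To make Idea 2 a bit more concrete, here is a minimal Python sketch of how the per-column defaults could be read out of a model definition shaped like the YAML above. Everything here is an assumption for illustration: `default_value` is only the key proposed in this thread, not an existing dbt setting, and `collect_defaults` and the `id` column are hypothetical. The model is written as a plain dict so the sketch needs no YAML parser.

```python
def collect_defaults(model: dict) -> dict[str, str]:
    """Return {column_name: default SQL expression} for not_null columns.

    A column is included only when its constraints list contains both a
    not_null entry and a (hypothetical) default_value entry.
    """
    defaults: dict[str, str] = {}
    for column in model.get("columns", []):
        constraints = column.get("constraints", [])
        has_not_null = any(c.get("type") == "not_null" for c in constraints)
        default = next(
            (c["default_value"] for c in constraints if "default_value" in c),
            None,
        )
        if has_not_null and default is not None:
            defaults[column["name"]] = default
    return defaults


# Same shape as the Idea 2 YAML, plus a hypothetical `id` column without a default.
MODEL = {
    "name": "dim_customers",
    "config": {"contract": {"enforced": True}},
    "columns": [
        {
            "name": "some_string_column",
            "data_type": "varchar",
            "constraints": [{"type": "not_null"}, {"default_value": "foobar"}],
        },
        {
            "name": "some_timestamp_column",
            "data_type": "timestamp",
            "constraints": [
                {"type": "not_null"},
                {"default_value": "cast('0000-01-01' as timestamp)"},
            ],
        },
        {
            "name": "id",
            "data_type": "integer",
            "constraints": [{"type": "not_null"}],
        },
    ],
}

DEFAULTS = collect_defaults(MODEL)
```

Columns with a not null constraint but no default (like `id` here) are skipped, which would leave their behaviour unchanged.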
Idea 1 and Idea 2 are tackling slightly different use cases
Knowing that the original issue came more from a constraints use case, I would be edging towards Idea 2, supporting it when constraints are defined. Couple of additional ideas about it:
@b-per good insight about Idea 1 and Idea 2 tackling slightly different use cases. I agree with your assessment and am edging towards Idea 2!
Calling it
I'm concerned that it doesn't actually add the constraint. It feels like it should be all or nothing. Otherwise, it's really more like a pre-insert/merge data test on a batch than a true constraint. Allowing specification of a
What do you think?
Discussing the details here makes me wonder if this type of backfill activity is necessarily difficult. i.e., the Alternative 1 that you mentioned originally is definitely overhead for the end user, but it's clear to them what they are doing and how the system is behaving.
I think that
It might be confusing but it is the current behaviour. Would we be ok changing the current behaviour and making
I think that not having it managed in dbt and letting people do it by hand is difficult.
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.
Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.
I think that this is still worth considering, which is why I reopened it, but feel free to close it for good!
I have a client who asked for the
Is this your first time submitting a feature request?
Describe the feature
When adding a new column to an incremental model (set with append_new_columns) with a not null constraint, the new rows are checked to see if the new column is NULL but the table itself doesn't get its overall metadata/DDL updated (which is normal as there are now NULL values for previous rows).
This is good as data is checked as configured, but the table attributes in the warehouse then don't reflect the fact that the new column is verified to be not null.
The feature suggested would be to:
not null constraint
Describe alternatives you've considered
Alternative 1
It is currently possible to achieve the same outcome by manually:
- running update statements to update the historical rows with NULL values
- running an alter statement on the table to update the column as being not null
But this approach requires writing DDL directly when the suggested feature would handle this automatically.
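The two manual steps in Alternative 1 can be sketched roughly as follows. This is an illustrative Python sketch, not dbt behaviour: `backfill_statements` is a hypothetical helper, the table and column names are borrowed from the examples in the comments, and the `alter` syntax shown is Postgres/Snowflake-style, which is an assumption since other warehouses spell it differently or do not support it. Only the backfill step is executed, against an in-memory SQLite database, because SQLite cannot add a not null constraint to an existing column.

```python
import sqlite3


def backfill_statements(table: str, column: str, default_sql: str) -> list[str]:
    """Build the two manual statements: backfill NULLs, then add NOT NULL.

    `default_sql` is a raw SQL expression such as "'foobar'".
    The ALTER syntax is Postgres/Snowflake-style (an assumption).
    """
    return [
        f"update {table} set {column} = {default_sql} where {column} is null",
        f"alter table {table} alter column {column} set not null",
    ]


def demo() -> int:
    """Run only the backfill step on SQLite; return the remaining NULL count."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table dim_customers (id integer, some_string_column text)")
    # Two historical rows predate the new column; one row already has a value.
    conn.executemany(
        "insert into dim_customers values (?, ?)",
        [(1, None), (2, None), (3, "already set")],
    )
    update_sql, _alter_sql = backfill_statements(
        "dim_customers", "some_string_column", "'foobar'"
    )
    conn.execute(update_sql)
    (remaining,) = conn.execute(
        "select count(*) from dim_customers where some_string_column is null"
    ).fetchone()
    conn.close()
    return remaining
```

Once the backfill leaves no NULL values, the second statement can be run in the warehouse to make the column metadata match what the contract already checks on new rows.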
Alternative 2
Another approach is to run a --full-refresh but this might not be possible for huge tables.
Who will this benefit?
People wanting to leverage the new constraints capabilities while handling large amounts of data stored in incremental models.
Are you interested in contributing this feature?
If need be
Anything else?
No response