Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add on_schema_change possibility #426

Merged

Conversation

Jeremynadal33
Copy link
Contributor

@Jeremynadal33 Jeremynadal33 commented Aug 9, 2024

resolves #330

Description

Adds a call to the macro process_schema_change after creating the temp relation and before merging. Similar to dbt-spark's incremental materialization.

It will allow the "ignore", "fail" and "append_new_columns" as on_schema_change strategy but not the "sync_all_columns" because the macro spark__alter_relation_add_remove_columns does not allow for dropping columns. It could because Delta allow for dropping columns if table properties are :

{
    'delta.columnMapping.mode' = 'name',
    'delta.minReaderVersion' = '2',
    'delta.minWriterVersion' = '5'

It will be possible once dbt-sprak PR #1088 is merged.

WARNING : there is still a limitation. For example, if using "ignore" strategy and removing columns, update will fail because spark__get_merge_sql uses update set * which looks for all columns.

Checklist

  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • I have updated the CHANGELOG.md and added information about my change to the "dbt-glue next" section.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

FYI :
I am not sure wether this change needs testing, if it does, I would be interested in understanding how to!

Tests that were run locally

Add a simple model with incremental strategy (test_change_schema) :

{{
    config(
        materialized='incremental',
        incremental_strategy='merge',
        unique_key=['id'],
        file_format='delta',
        on_schema_change='append_new_columns'
    )
}}

with incoming_data as (
    select 1 as id
    union all
    select 2 as id
)

select * from incoming_data

Run it a first time and add a new columns :

{{
    config(
        materialized='incremental',
        incremental_strategy='merge',
        unique_key=['id'],
        file_format='delta',
        on_schema_change='append_new_columns'
    )
}}

with incoming_data as (
    select 1 as id, 'a' as name
    union all
    select 2 as id, 'b' as name
)

select * from incoming_data

Column is added.

@Jeremynadal33 Jeremynadal33 force-pushed the feature/allow-on-schema-change branch from fbf2fe8 to 7e8fd8d Compare August 9, 2024 10:39
@moomindani moomindani self-assigned this Aug 26, 2024
@moomindani moomindani added the enable-functional-tests This label enable functional tests label Aug 26, 2024
@moomindani
Copy link
Collaborator

Thank you for your contribution!

@moomindani moomindani merged commit 242400f into aws-samples:main Aug 26, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
beginning-contributor enable-functional-tests This label enable functional tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support on_schema_change
2 participants