Flexible delete+insert incremental strategy that purely relies on incremental_predicates #303

Open: wants to merge 2 commits into base: main

@dlord dlord commented Sep 12, 2024

This allows us to perform fine-grained optimizations that do not rely on the unique key. This matters for data warehouses like Redshift, where primary/unique key constraints are informational only (not enforced) and there are no indexes.

resolves dbt-labs/dbt-core#10655

Problem

A full description of the problem can be found in the linked issue, dbt-labs/dbt-core#10655.

Solution

This PR makes incremental_predicates mandatory when no unique_key is supplied. Below is an example config:

{{ config(materialized="incremental",
            incremental_strategy='delete+insert',
            incremental_predicates = [
                "DBT_INTERNAL_DEST.event_time >= (select min(event_time) from DBT_INTERNAL_SOURCE)"
            ],
) }}

The DBT_INTERNAL_DEST and DBT_INTERNAL_SOURCE placeholders are translated to their equivalent table names using a simple string replace. This affects the generated delete statement. Here is an example:

delete
from
  "source_db"."dbt"."source_table"
where
  (
    "source_db"."dbt"."source_table".event_time >= (select min(event_time) from "source_table__dbt_tmp173111887980")
  );
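
To make the placeholder translation concrete, here is a minimal Python sketch of the string-replace step and the delete statement it produces. It is an illustration only: the PR itself presumably does this inside the adapter's macros, and the function names `render_predicates` and `build_delete_sql` are hypothetical, not part of dbt.

```python
# Illustrative sketch only; the helper names are hypothetical and not dbt APIs.

def render_predicates(predicates, target, source):
    """Replace the DBT_INTERNAL_* placeholders with concrete relation names."""
    return [
        p.replace("DBT_INTERNAL_DEST", target).replace("DBT_INTERNAL_SOURCE", source)
        for p in predicates
    ]


def build_delete_sql(target, source, predicates):
    """Build a delete statement whose where clause is the rendered predicates."""
    rendered = render_predicates(predicates, target, source)
    # Multiple predicates are combined with "and" (an assumption of this sketch).
    where = "\n  and ".join(f"({p})" for p in rendered)
    return f"delete\nfrom\n  {target}\nwhere\n  {where};"


sql = build_delete_sql(
    target='"source_db"."dbt"."source_table"',
    source='"source_table__dbt_tmp173111887980"',
    predicates=[
        "DBT_INTERNAL_DEST.event_time >= (select min(event_time) from DBT_INTERNAL_SOURCE)"
    ],
)
print(sql)
```

Because the substitution is a plain string replace, any literal occurrence of DBT_INTERNAL_DEST or DBT_INTERNAL_SOURCE in a predicate is rewritten, including inside quoted strings.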

With this change, the delete+insert strategy can no longer be used as if it were an append strategy by omitting the unique_key.
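
The rule described above (incremental_predicates is required whenever unique_key is absent) can be sketched as a small validation check. This is a hedged Python illustration of the rule, not the PR's code: `ConfigError` and `validate_delete_insert_config` are hypothetical names, and the real check may live in a macro and word its error differently.

```python
# Illustrative sketch only; names and error text are hypothetical.

class ConfigError(ValueError):
    """Raised when a delete+insert model config cannot produce a delete filter."""


def validate_delete_insert_config(unique_key=None, incremental_predicates=None):
    """Require at least one of unique_key or incremental_predicates."""
    if not unique_key and not incremental_predicates:
        raise ConfigError(
            "delete+insert requires incremental_predicates when no unique_key is set"
        )


# Predicates alone are enough; no unique_key is needed.
validate_delete_insert_config(
    incremental_predicates=[
        "DBT_INTERNAL_DEST.event_time >= (select min(event_time) from DBT_INTERNAL_SOURCE)"
    ]
)

# With neither setting, the config is rejected instead of silently appending.
try:
    validate_delete_insert_config()
except ConfigError as err:
    print(f"rejected: {err}")
```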

Checklist

  • I have read the contributing guide and understand what's expected of me
  • I have run this code in development, and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • This PR has no interface changes (e.g. macros, cli, logs, json artifacts, config files, adapter interface, etc.) or this PR has already received feedback and approval from Product or DX

@dlord dlord requested a review from a team as a code owner September 12, 2024 07:32
@cla-bot cla-bot bot added the cla:yes label Sep 12, 2024

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

Successfully merging this pull request may close these issues.

[Feature] Create a flexible delete+insert incremental strategy without relying on primary/unique keys