Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clean up tables when materialization isn´t incremental #49

Closed
wants to merge 3 commits into from

Conversation

datalytics1
Copy link

@datalytics1 datalytics1 commented Dec 14, 2021

Minor change to adapters.sql logic to clean_up table when materialization isn´t incremental.

@owenprough-sift
Copy link

I tested these changes on my local project and they worked great. Thanks for making these changes!

@nialloriordan
Copy link
Contributor

nialloriordan commented Mar 10, 2022

As this branch has conflicts and has not been updated recently, I created a new PR that has been rebased: #73

@Gatsby-Lee
Copy link

@courentin
Hi, I have a question about your change.
If the materialized is "table" and external_location is set, although the table is recreated, doesn't it make an error?

For example, since the external_location is set(static path), although the table is recreated, creating a new table will fail.

what issue did you try to fix with this change?

@jessedobbelaere
Copy link

jessedobbelaere commented Jun 1, 2022

@Gatsby-Lee that's correct. I added a global static external_location in dbt_project.yml like this:

    +external_location: "{{ target.s3_staging_dir }}/data/"

Then using this fix, it still throws a HIVE_PATH_ALREADY_EXISTS error when I run dbt --debug run --threads 1 because the single threaded flow will deploy one model at a time, where the first model works fine, and the second model fails because the S3 directory already contains files from the first model.

The solution, I think, is to make the external_location unique per model; a folder per model on S3. Then, the cleanup using a static path works fine. Either by setting a unique external_location on each sql model, e.g.:

{{
    config(external_location=target.s3_staging_dir + "/data/my_model_name")
}}

Or waiting for #50 to be merged 🤞 Currently, I have an ugly solution where I override the athena__create_table_as macro in my project to append the model name to the external_location path.

Nevertheless, merging this PR would help me out a lot to clean up the model folders on S3, and it unblocks PR #50 too.

Any change to merge this @courentin @Tomme ? 🙏

@owenprough-sift
Copy link

Is this PR superseded by #73?

@jessedobbelaere
Copy link

Same fix is in #73 indeed, but 73 does not have conflicts 👍

@Gatsby-Lee
Copy link

@jessedobbelaere
Thank you for comment.
I built a macro that can generate unique externa_location per model.
BTW, even for a model, there could be a error due to the existing path in S3.
Therefore, I add random prefix on the S3.

I might miss sth to understand your change.
your change is

  {% if config.get('incremental_strategy') == 'insert_overwrite' %}
  {% if config.get('incremental_strategy') != 'append' %}

doesn't incremental_strategy' != 'append' equal to incremental_strategy' != 'insert_overwrite'?

@jessedobbelaere
Copy link

jessedobbelaere commented Jun 1, 2022

Hi @Gatsby-Lee they're not my changes, but I believe the issue got introduced in https://github.com/Tomme/dbt-athena/pull/43/files

In the past, table data always got cleaned up before dropping, as you see in the diff of #43 . Since the changes in #43 the value of incremental_strategy is by default set to NONE when it runs that if-statement, so the cleanup never happens by default anymore. It should only skip the cleanup in case it's an append strategy, so that's what the bugfix is about.

Anyway, #73 introduces the same bugfix as this PR, by someone else, but it's more up to date with master and ready to merge. Hopefully someone can do that 🙏

@Gatsby-Lee
Copy link

@jessedobbelaere got it.
so, it gives the same functionality, but it breaks the existing implantation.

Thank you

@Tomme
Copy link
Owner

Tomme commented Jun 6, 2022

Resolved by #73, thank you for your contribution though!

@Tomme Tomme closed this Jun 6, 2022
@Gatsby-Lee
Copy link

thank you @jessedobbelaere @Tomme

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants