-
Notifications
You must be signed in to change notification settings - Fork 157
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature] Optimize incremental 'insert_overwrite' strategy #1409
Comments
Hi, yes this is also something i've tried but:
I also though about the static partition definition but uses cases requires to have a dynamic partition overwrite :) |
Yes agreed if your incremental refresh a lot of partitions back then it can takes time (usually i refresh around 3 partitions back so the process isn’t so long). For debuging purposes, I have seen some benefits having the tmp table by restoring it with the time travel feature. But its clearly worst for auditing purposes, it should be nice to have labels related to the dbt run id in the copy config btw 😀. I keep your feature in mind in case of future need. Thanks! |
Oh yes it could be nice to have those labels So you won't valide my pull request for now from what I understand ? In any case, feel free to reach me if you want some help or anything else |
Unfortunately I don't have yet the authority to do it :) But I hope someone will do it |
Ok, let's pray someone with this authority will go through this pull request/issue I can create an additional |
@AxelThevenot, I don't think I quite understand your justification for not creating a new |
This is great stuff! But I think we should create a For example, the For example granular partitioned tables do not benefit from free DELETEs 😢 : Using a |
@amychen1776 and @borjavb I got your points, I will go for a I will make the change as soon as possible and tell you when it's done :) But |
Been thinking about this and... how do we feel about an operator/function configuration within The way I've hijacked the Technically we have an incremental strategy that can work with different functions, but apply the same semantics of replacing a whole partition. So basically we can define an
We already have custom configurations for merge like So the config block would look like something like this:
Although I'm torn if we should add the
But this would just be some syntax sugar of how we want to expose the config. But I think with this we can let the user decide which operator is the most performant, and keep full backwards compatibility. What do you think, @amychen1776 and @AxelThevenot? |
Is this your first time submitting a feature request?
Describe the feature
The
MERGE
statement is sub-optimized in BigQuery.This is of course ok for unique keys as this is what we are looking for.
But for the
insert_overwrite
strategy where we are looking to rows at the partition-level there is a better solution and here is why:DELETE
orINSERT
statement is cheapest than aMERGE
statement.DELETE
statement in BigQuery is free at the partition-level.MERGE
statement it reduces the cost by 50.4% and the elapsed time by 35.2%Describe alternatives you've considered
I have considered an alternative to create an additional
delete+insert
incremental strategy.But overriding the existing
insert_overwrite
is the more convenient as this is exactly what we are looking for and the features are the same.Who will this benefit?
Of course, everyone without any changes.
Are you interested in contributing this feature?
Yes
Anything else?
No response
The text was updated successfully, but these errors were encountered: