Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a way of replacing a single partition within a table #2167

Closed
jenkoian opened this issue Dec 19, 2024 · 2 comments
Closed

Add a way of replacing a single partition within a table #2167

jenkoian opened this issue Dec 19, 2024 · 2 comments

Comments

@jenkoian
Copy link

Feature description

As far as I can tell the replace write_disposition only supports replacing an entire table, I would like to be able to replace a single partition within a table.

I can do this with a merge disposition with a merge key, but it does a lot more than I want/need. It creates temp tables and stages data and all that stuff, when I simply want to just delete and re-insert data, I am aware this means there is a non-zero period of time the data is not available.

Are you a dlt user?

Yes, I'm already a dlt user.

Use case

I can use the merge write_disposition but it does a bunch of stuff I don't really need and can be quite slow and expensive (when running on Snowflake anyways). I looked at using upsert but it doesn't handle the case of existing entries being deleted if they no longer exist in the source (afaict).

Proposed solution

Example:

Event
id
type
match_id
id type match_id
1 pass 123
2 shot 123
3 pass 456

Replace everything in Event with the match_id 123 would delete events 1 and 2 and re-add them, but event with id 3 would remain untouched.

Related issues

I think this is similar? #1094

@sh-rp
Copy link
Collaborator

sh-rp commented Jan 2, 2025

@jenkoian I depends a bit what kind of data is incoming, but let's say you are running the pipeline and you know the incoming data is only for one specific partition, you can simply just drop the partition in the destination before the load and then to an append load to that table. You have access to the sql_client of the destination via the pipeline. Or which destination are you using? Dropping a partition is a simple as running "DELETE FROM table_name WHERE partition=value" in most cases.

@jenkoian
Copy link
Author

jenkoian commented Jan 9, 2025

Thanks @sh-rp that's really helpful. It's snowflake destination I'm using btw.

@jenkoian jenkoian closed this as completed Jan 9, 2025
@github-project-automation github-project-automation bot moved this from Todo to Done in dlt core library Jan 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

No branches or pull requests

2 participants