Schema evolution on upsert (merge) #2282

Open
ion-elgreco opened this issue Mar 11, 2024 Discussed in #2281 · 7 comments
Assignees
Labels
binding/python Issues for the Python package binding/rust Issues for the Rust crate enhancement New feature or request help wanted Extra attention is needed

Comments

@ion-elgreco
Collaborator

Discussed in #2281

Originally posted by cesar-vermeulen March 11, 2024
Hello!

With the awesome addition of merge schema support in the write operation (kudos to the contributors of #2246), I was wondering whether similar functionality is on the roadmap for upsert (merge) transactions? It would be a great addition to the current functionality!

Thanks,
Cheers!

@JonasDev1
Contributor

JonasDev1 commented Mar 21, 2024

In order for this function to be useful, we first need to implement an updateAll / insertAll functionality.
Currently you need to specify all updates and inserts manually.

In the future, something like this would be helpful together with schema evolution:

let (table, metrics) = DeltaOps(table)
    .merge(source, "target.id = source.id")
    .with_source_alias("source")
    .with_target_alias("target")
    .when_matched_update(|update| {
        update.updateAll()
    })
    .unwrap()
    .when_not_matched_insert(|insert| {
        insert.insertAll()
    })
    .unwrap()
    .await
    .unwrap();
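
To make concrete what an updateAll / insertAll shortcut would save, here is a minimal std-only sketch of the expansion it implies. It is not delta-rs code: `expand_all_columns` is a hypothetical helper, and the column names and alias are made up for illustration. It only produces the explicit (target column, source expression) pairs that currently have to be written by hand, one per column.

```rust
/// Hypothetical helper (not part of delta-rs): expands an "update all" or
/// "insert all" request into one (target column, source expression) pair
/// per source column, i.e. the assignments that are written manually today.
fn expand_all_columns(source_columns: &[&str], source_alias: &str) -> Vec<(String, String)> {
    source_columns
        .iter()
        // Each target column is assigned the same-named column of the source alias.
        .map(|name| (name.to_string(), format!("{source_alias}.{name}")))
        .collect()
}
```

For example, calling `expand_all_columns(&["id", "value"], "source")` yields the pairs `("id", "source.id")` and `("value", "source.value")`. With schema evolution, the list of source columns could include columns the target table does not have yet.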

@rtyler rtyler added enhancement New feature or request binding/python Issues for the Python package binding/rust Issues for the Rust crate labels Aug 10, 2024
@rtyler
Member

rtyler commented Aug 10, 2024

@ion-elgreco I think with your recent improvements, we have this now, right?

@ion-elgreco
Collaborator Author

> @ion-elgreco I think with your recent improvements, we have this now, right?

No, not on merge yet. We need to build it differently there, since it needs to be part of the projection expressions. We can reuse the merge schema functionality, though, to check whether the source and table schemas differ and whether they can be merged.
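
As a rough illustration of the schema check described above, here is a self-contained sketch. It does not use the actual delta-rs merge schema code: `DataType`, `Field`, `field`, and `merge_schemas` are simplified stand-ins invented for this example. It assumes that only appending new columns counts as a mergeable difference and that any type mismatch on an existing column is an error.

```rust
/// Simplified stand-in for a column type (assumption, not the delta-rs type).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Utf8,
}

/// Simplified stand-in for a schema field.
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: DataType,
}

/// Convenience constructor used in the examples.
fn field(name: &str, data_type: DataType) -> Field {
    Field { name: name.to_string(), data_type }
}

/// Compares the table schema with the source schema.
/// - Ok(None): the source adds nothing new, no evolution is needed.
/// - Ok(Some(merged)): the table schema with the new source columns appended.
/// - Err(..): a column exists in both schemas with different types.
fn merge_schemas(target: &[Field], source: &[Field]) -> Result<Option<Vec<Field>>, String> {
    let mut merged = target.to_vec();
    for src in source {
        match merged.iter().position(|f| f.name == src.name) {
            // Same name and same type: nothing to do.
            Some(i) if merged[i].data_type == src.data_type => {}
            // Same name but different type: the schemas cannot be merged here.
            Some(i) => {
                return Err(format!(
                    "column `{}` is {:?} in the table but {:?} in the source",
                    src.name, merged[i].data_type, src.data_type
                ));
            }
            // Column only exists in the source: append it to the table schema.
            None => merged.push(src.clone()),
        }
    }
    Ok(if merged.len() > target.len() { Some(merged) } else { None })
}
```

In a merge, a result of `Ok(Some(..))` would be the signal that the projection expressions need to cover the added columns as well. How existing rows get values for those columns (presumably nulls) is outside this sketch.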

@npgretz

npgretz commented Oct 10, 2024

My org would love to see this functionality. The Python delta-rs package has proven to be less overhead for our ETL loads than spinning up Spark pods for every load to a delta table.

We cannot make the switch to delta-rs though if we need to completely overwrite the delta table to add new columns.

@ion-elgreco ion-elgreco added the help wanted Extra attention is needed label Dec 7, 2024
@ion-elgreco
Collaborator Author

> My org would love to see this functionality. The Python delta-rs package has proven to be less overhead for our ETL loads than spinning up Spark pods for every load to a delta table.
>
> We cannot make the switch to delta-rs though if we need to completely overwrite the delta table to add new columns.

Unfortunately, no one is working on it or looking at it. If your org is in dire need of it, they might look for contractors who can build this feature.

I can support by reviewing a PR.

@JustinRush80

I can try to take this

@ion-elgreco
Collaborator Author

> I can try to take this

Ok great!


5 participants