feat(python): expose rust writer as additional engine #1872
Closed
ion-elgreco wants to merge 21 commits into delta-io:main from ion-elgreco:feat/expose_rust_writer_as_optional_engine
21 commits
ca2acb8 first version (ion-elgreco)
3f55470 add try from uri with storage options (ion-elgreco)
714cc56 Start to enable overwrite_schema (ion-elgreco)
52f0d6a add tests to check rust py03 writer (ion-elgreco)
c067d4d remove comment (ion-elgreco)
5c5f247 rename and clean up (ion-elgreco)
717b7c7 add float type support in partition cols (ion-elgreco)
dae361e check for pandas (ion-elgreco)
3052f12 add support for name and desc (ion-elgreco)
01f0194 fmt (ion-elgreco)
9911574 improve tests and add config support (ion-elgreco)
a410dfa parametrize write benchmark (ion-elgreco)
8c976b6 add LargeUtf8 support in partition stringify (ion-elgreco)
57565b5 refactor: express log schema in delta types (roeap)
49a298b feat(python): expose `convert_to_deltalake` (#1842) (ion-elgreco)
635313f refactor: merge to use logical plans (#1720) (Blajda)
633fd7f feat: create benchmarks for merge (#1857) (Blajda)
07113c6 Revert "refactor: express log schema in delta types" (ion-elgreco)
3a8c026 Merge branch 'main' into feat/expose_rust_writer_as_optional_engine (ion-elgreco)
3e25561 formatting (ion-elgreco)
d9a4ce0 use fromstr (ion-elgreco)
Is this meant for schema evolution? If so, I'd recommend moving that to a follow-up PR, as it would likely blow up this PR quite a bit.
+1. I think it's fine if we let that return NotImplementedError for now.
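The suggestion to return NotImplementedError for now could take roughly the following shape. This is a minimal sketch, not the code from this PR: the helper name `check_writer_options` and its signature are hypothetical, and only the `overwrite_schema` option and the idea of a selectable engine come from the discussion.

```python
def check_writer_options(engine: str, overwrite_schema: bool) -> None:
    """Reject option combinations the chosen engine does not support yet.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if engine not in ("pyarrow", "rust"):
        raise ValueError(f"Unknown engine: {engine!r}")
    if engine == "rust" and overwrite_schema:
        # Schema evolution is deferred to a follow-up PR, so fail loudly
        # instead of silently dropping columns that are not in the table schema.
        raise NotImplementedError(
            "overwrite_schema is not supported with the rust engine yet"
        )
```

Failing early like this keeps the unsupported path explicit for users until the follow-up lands.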
It was a quick attempt at schema evolution. I was able to write, except it didn't write the columns that were not part of the original schema, so I need to dig through the code more. OK, let's do this as an improvement in another update.
We would also need to update every read path to always add null columns for columns that don't exist in older Parquet files. I haven't looked into it, but this would likely require some larger refactoring, particularly in the DataFusion DeltaScan. That said, we likely also need to validate that added columns are always nullable.
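The two requirements in that comment (null-filling columns missing from older files, and requiring added columns to be nullable) can be illustrated with a small, library-free sketch. Everything here is an assumption made for illustration: `Field`, `validate_added_columns`, and `pad_missing_columns` are hypothetical names, rows are modelled as plain dicts, and none of this reflects how DeltaScan is actually implemented.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class Field:
    """Hypothetical, simplified schema field."""
    name: str
    type: str
    nullable: bool = True


def validate_added_columns(old: List[Field], new: List[Field]) -> None:
    """Raise if a column that only exists in `new` is declared non-nullable."""
    old_names = {f.name for f in old}
    for f in new:
        if f.name not in old_names and not f.nullable:
            raise ValueError(f"added column {f.name!r} must be nullable")


def pad_missing_columns(
    rows: List[Dict[str, Any]], schema: List[Field]
) -> List[Dict[str, Any]]:
    """Project rows from an older file onto `schema`, filling gaps with None."""
    return [{f.name: row.get(f.name) for f in schema} for row in rows]
```

The nullability check is what makes the null-filling legal: rows from older files have no value for an added column, so a non-nullable added column could never be satisfied on read.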
I think that would be schema evolution for appends as well.
The PyArrow writer can do schema evolution, but only combined with overwrite mode. I think that's purely a metadata action then. Would this be doable with the existing deltalake-core crate?
Not sure, we may end up with unreadable tables if we do this... If we replace the whole table, this might work.
With PyArrow it only works together with overwrite so it should be safe. Is there a way to adjust the commit that's written during a write?
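To make the "safe together with overwrite" argument concrete, here is a heavily simplified sketch of the actions such a commit would carry in the Delta log. The action names (`metaData`, `remove`, `add`) and the `schemaString` field come from the Delta protocol; the helper names and the reduced set of fields are assumptions for illustration, and this says nothing about whether deltalake-core exposes a way to build such a commit.

```python
import json
from typing import Any, Dict, List


def overwrite_commit_actions(
    existing_files: List[str], new_files: List[str], new_schema_string: str
) -> List[Dict[str, Any]]:
    """Build a simplified action list for an overwrite that changes the schema.

    Hypothetical helper. Real actions carry more fields (for example table id,
    format, and partition columns on metaData; size and modification time on add).
    """
    actions: List[Dict[str, Any]] = [
        {"metaData": {"schemaString": new_schema_string}}
    ]
    # Every file written under the old schema is removed in the same commit...
    actions += [{"remove": {"path": p, "dataChange": True}} for p in existing_files]
    # ...so only files written with the new schema remain visible to readers.
    actions += [{"add": {"path": p, "dataChange": True}} for p in new_files]
    return actions


def to_commit_json(actions: List[Dict[str, Any]]) -> str:
    """Serialize actions as newline-delimited JSON, one action per line."""
    return "\n".join(json.dumps(a) for a in actions)
```

Because the schema change and the removal of all old files land in one atomic commit, no reader should see old data files under the new schema, which is the concern raised about unreadable tables.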