fix(python): add support for pyarrow 13+ #1804
Conversation
python/deltalake/table.py (outdated)

    def _cast_to_equal_batch(
        batch: pyarrow.RecordBatch, schema: pyarrow.Schema
    ) -> pyarrow.RecordBatch:
        """
        Cast a batch to a schema, if it is already considered equal.

        This is mostly for mapping things like list field names, which arrow-rs
        checks when looking at schema equality, but pyarrow does not.
        """
        if batch.schema == schema:
            return pyarrow.Table.from_batches([batch]).cast(schema).to_batches()[0]
        else:
            return batch
@ion-elgreco If we take this out, do the tests still pass?
Yes, everything still passes. Shall I take it out? Or would we need it later once we enable the compliant types again?
edit: Taking it out for now.
Ideally I think we shouldn’t need it. This was an earlier attempt at fixing the compliant types issue.
We should have backwards compatibility during reads and writes when we use the compliant types. I think spark-delta also doesn't write these compliant types, but I may be wrong here.
Thanks for finishing this up!
Description
I built on top of the branch by @wjones127 (#1602). In pyarrow v13+ the ParquetWriter defaults to compliant_nested_types = True (see the related PR: https://github.com/apache/arrow/pull/35146/files and the docs: https://arrow.apache.org/docs/python/generated/pyarrow.parquet.ParquetWriter.html). arrow-rs/parquet-rs then fails when it compares schemas, because it expects the old non-compliant names. For now we can support pyarrow 13+ by disabling this option, or by updating the file options provided by a user.
Related Issue(s)
pyarrow>=13 (#1744)
Documentation