-
Notifications
You must be signed in to change notification settings - Fork 430
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: use physical name for column name lookup in partitions #1836
Conversation
ACTION NEEDED delta-rs follows the Conventional Commits specification for release automation. The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification. |
seems to be compiling in CI/CD. Where would you propose to add a test for this? Would also have to check if the issue exists with statistics as well |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For testing, I think the tests belong here:
mod arrow_tests { |
Co-authored-by: Will Jones <[email protected]>
Co-authored-by: Will Jones <[email protected]>
Ok, I'll add some tests. How am I supposed to execute those locally? Always get strange compile errors locally. I'm using Windows |
There is currently a build issue on main which may be related. Could you try with the |
Does not help, however in WSL it works just with |
I don't think the build error is related to my changes? At least I would not understand it 🙂 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @aersam for getting us started on column mapping!
Left a comment around error handling, but overall looking good.
One thing I noticed is that there is a lot going on in the example data log. Would it maybe be an option to set up the column mapping on creating the table, which makes it a bit easier to reason about the relevant commit.
I now simplified the test file, it's a lot less files now. |
One comment here: the error disappeared on Windows when doing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the log clean-up - much easier to read now 👍.
Clarified a bit on why to remove the panic and handle via a Result
.
In addition to a lot more error handling, I also added a check for delta.columnMapping.mode which matches the spec. If column mapping mode=none, we should ignore physical name entirely |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking really good!
One more ask is to handle the column mapping analogous to the rest of the table configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for sticking with it and your patience @aersam!
Thanks for your guidance and for merging it! |
…#1836) # Description get_actions wrongly assumes that partition_columns from schema and partitionValues from log must be the same. This is not true since partition_columns are logical column names while partitionValues are physical column names. Tests pending # Related Issue(s) - closes delta-io#1835 # Documentation https://github.com/delta-io/delta/blob/master/PROTOCOL.md#writer-requirements-for-column-mapping "Track partition values and column level statistics with the physical name of the column in the transaction log." --------- Co-authored-by: Will Jones <[email protected]>
Description
get_actions wrongly assumes that partition_columns from schema and partitionValues from log must be the same. This is not true since partition_columns are logical column names while partitionValues are physical column names.
Tests pending
Related Issue(s)
Documentation
https://github.com/delta-io/delta/blob/master/PROTOCOL.md#writer-requirements-for-column-mapping
"Track partition values and column level statistics with the physical name of the column in the transaction log."