fix: workaround for Add actions being read slightly differently out of parquet files #3031
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3031      +/-   ##
==========================================
- Coverage   72.72%   72.71%   -0.01%
==========================================
  Files         128      128
  Lines       41318    41357      +39
  Branches    41318    41357      +39
==========================================
+ Hits        30047    30074      +27
+ Misses       9340     9336       -4
- Partials     1931     1947      +16
☔ View full report in Codecov by Sentry.
Signed-off-by: R. Tyler Croy <[email protected]>
1c450c0 to ede79c1
…f parquet files

This workaround fixes #3030, but I'm not quite happy about how it does so. I believe there is likely an upstream issue in arrow, which I have not been able to reproduce, where the non-nullable members (e.g. storageType) of the nullable struct (deletionVector) are read as empty strings rather than `null`.

This issue started to appear with the v0.22 release, which included an upgrade to datafusion 43 and arrow 53. I believe the latter is causing the issue here since it affects non-datafusion code paths.

This workaround at least allows us to move forward and release a v0.22.1 while continuing to hunt down the issue.

Signed-off-by: R. Tyler Croy <[email protected]>
ede79c1 to 53ccd68
@rtyler did you file an issue in arrow-rs?
@ion-elgreco my suspicion is that it lies upstream, but I don't have any confirmation.
This test should also check that the final state of the table is correct. I expect it not to be: #3030 (comment)
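For readers unfamiliar with the symptom, the kind of workaround described above can be sketched roughly as follows. This is not the code from this PR: the `DeletionVector` and `Add` structs are simplified, hypothetical stand-ins, and `normalize_deletion_vector` / `normalize_add` are illustrative names. The only assumption taken from the description is that a deletionVector that should have been `null` shows up with an empty storageType.

```rust
/// Hypothetical, simplified stand-in for a deletion vector descriptor.
/// Field names are assumptions for illustration, not the real delta-rs types.
#[derive(Debug, Clone, PartialEq)]
pub struct DeletionVector {
    pub storage_type: String,
    pub path_or_inline_dv: String,
    pub size_in_bytes: i32,
    pub cardinality: i64,
}

/// Hypothetical, simplified stand-in for an Add action read from a checkpoint.
#[derive(Debug, Clone, PartialEq)]
pub struct Add {
    pub path: String,
    pub deletion_vector: Option<DeletionVector>,
}

/// Treat a deletion vector whose non-nullable `storage_type` came back as an
/// empty string as if the whole struct had been `null`.
pub fn normalize_deletion_vector(dv: Option<DeletionVector>) -> Option<DeletionVector> {
    dv.filter(|d| !d.storage_type.is_empty())
}

/// Apply the normalization to an Add action after it has been read.
pub fn normalize_add(mut add: Add) -> Add {
    add.deletion_vector = normalize_deletion_vector(add.deletion_vector.take());
    add
}
```

A check like this only hides the symptom on the read path. It assumes an empty storageType is never valid, so it would not distinguish a misread `null` from genuinely corrupt data.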