fix: workaround for Add actions being read slightly differently out of parquet files #3031

Merged 3 commits on Nov 29, 2024

@rtyler (Member) commented Nov 25, 2024

This workaround fixes #3030 but I'm not quite happy about how. I believe there is likely an upstream issue in arrow, which I have not been able to reproduce, in which the non-nullable members (e.g. storageType) of the nullable struct (deletionVector) are read as empty strings rather than null.

This issue first appeared with the v0.22 release, which included an upgrade to datafusion 43 and arrow 53. I believe the latter is the cause, since the issue also affects non-datafusion code paths.

This workaround at least allows us to move forward and release a v0.22.1 while continuing to hunt down the issue.
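
To illustrate the kind of normalization described above, here is a minimal sketch in plain Rust. It assumes the symptom is exactly as stated: a deletionVector that should be null comes back as a struct whose storageType is an empty string. The `DeletionVector` struct and the `normalize_deletion_vector` function are hypothetical names invented for this example; they are not the delta-rs types and not necessarily how the actual workaround is implemented.

```rust
/// Hypothetical, simplified stand-in for a deletion vector descriptor.
/// Only `storage_type` is named in the description above; the second
/// field is included purely for illustration.
#[derive(Debug, Clone, PartialEq)]
pub struct DeletionVector {
    pub storage_type: String,
    pub path_or_inline_dv: String,
}

/// Treat a descriptor whose non-nullable `storage_type` was read as an
/// empty string as if the whole struct had been read as null.
/// Assumption: a real deletion vector never has an empty storage type.
pub fn normalize_deletion_vector(dv: Option<DeletionVector>) -> Option<DeletionVector> {
    dv.filter(|d| !d.storage_type.is_empty())
}
```

With this in place, a placeholder struct with empty fields maps to `None`, while a populated descriptor and an already-null value pass through unchanged.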

@github-actions github-actions bot added the binding/python Issues for the Python package label Nov 25, 2024

codecov bot commented Nov 25, 2024

Codecov Report

Attention: Patch coverage is 82.35294% with 9 lines in your changes missing coverage. Please review.

Project coverage is 72.71%. Comparing base (f5f7ccc) to head (53ccd68).
Report is 3 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| crates/core/src/protocol/checkpoints.rs | 86.84% | 1 Missing and 4 partials ⚠️ |
| crates/core/src/kernel/snapshot/log_data.rs | 55.55% | 4 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3031      +/-   ##
==========================================
- Coverage   72.72%   72.71%   -0.01%     
==========================================
  Files         128      128              
  Lines       41318    41357      +39     
  Branches    41318    41357      +39     
==========================================
+ Hits        30047    30074      +27     
+ Misses       9340     9336       -4     
- Partials     1931     1947      +16     


@rtyler rtyler force-pushed the fix/checkpoint-failure-3030 branch from 1c450c0 to ede79c1 Compare November 28, 2024 15:54
@github-actions github-actions bot added the binding/rust Issues for the Rust crate label Nov 28, 2024
…f parquet files

This workaround fixes #3030 but I'm not quite happy about how. I believe
there is likely an upstream issue in arrow, which I have not been able
to reproduce, in which the non-nullable members (e.g. storageType) of
the nullable struct (deletionVector) are read as empty strings rather
than `null`.

This issue started to appear with the v0.22 release which included an
upgrade to datafusion 43 and arrow 53. I believe the latter is causing
the issue here since it affects non-datafusion code paths.

This workaround at least allows us to move forward and release a v0.22.1
while continuing to hunt down the issue.

Signed-off-by: R. Tyler Croy <[email protected]>
@rtyler rtyler force-pushed the fix/checkpoint-failure-3030 branch from ede79c1 to 53ccd68 Compare November 28, 2024 15:58
@rtyler rtyler changed the title chore: add a reproduction case for the checkpointing failure from #3030 fix: workaround for Add actions being read slightly differently out of parquet files Nov 28, 2024
@rtyler rtyler marked this pull request as ready for review November 28, 2024 16:00
@rtyler rtyler added this to the Rust v1.0.0 milestone Nov 28, 2024
@rtyler rtyler enabled auto-merge November 28, 2024 16:00
@rtyler rtyler added this pull request to the merge queue Nov 29, 2024
@ion-elgreco (Collaborator) commented

@rtyler did you file an issue in arrow-rs?

Merged via the queue into main with commit 1083c8c Nov 29, 2024
28 checks passed
@rtyler rtyler deleted the fix/checkpoint-failure-3030 branch November 29, 2024 06:48
@rtyler (Member, Author) commented Nov 29, 2024

@ion-elgreco my suspicion is that it lies upstream, but I don't have any confirmation.

@echai58 commented Dec 2, 2024

This test should also check that the final state of the table is correct. I expect that it is not: #3030 (comment)

@rtyler rtyler modified the milestones: Rust v1.0.0, v0.23 Jan 1, 2025
Labels
binding/python: Issues for the Python package
binding/rust: Issues for the Rust crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

checkpoint breaks writes on 0.22.0
3 participants