Fix bug with out of date last checkpoint, and clean listing function #354
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #354 +/- ##
==========================================
+ Coverage 74.88% 74.98% +0.10%
==========================================
Files 43 43
Lines 8409 8496 +87
Branches 8409 8496 +87
==========================================
+ Hits 6297 6371 +74
- Misses 1724 1737 +13
Partials 388 388
kernel/src/snapshot.rs
Outdated
.collect::<Result<Vec<_>, Error>>()?
.into_iter()
let mut max_checkpoint_version = checkpoint_metadata.version;
let mut checkpoint_files = Vec::with_capacity(10);
Just curious, any reason why you expect the `Vec` to be at most size 10?
That is odd, because most checkpoints will have just one file?
I wonder if the intent was to pre-size the `commit_files` to 10, since that's the default checkpoint interval for delta-spark?
Yeah, that makes sense. I've swapped them so we reserve 10 for commits.
Probably fine, but will hold off for the rebase because it should clean up the most important logic.
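For readers following the sizing question in this thread, here is a minimal sketch of the swap that was agreed on. The function name `bucket_versions` and the `(version, is_checkpoint)` tuple input are made up for illustration and are not the kernel's types; only the idea of reserving 10 slots for commits comes from the discussion above.

```rust
/// Hypothetical helper (not the kernel's API): split listed versions into
/// commit and checkpoint buckets.
pub fn bucket_versions(listing: &[(u64, bool)]) -> (Vec<u64>, Vec<u64>) {
    // Reserve 10 slots for commits, matching the default delta-spark
    // checkpoint interval mentioned in the thread. This is only an
    // allocation hint: the vector still grows if more commits show up.
    let mut commit_files = Vec::with_capacity(10);
    // Most checkpoints are a single file, so no pre-sizing here.
    let mut checkpoint_files = Vec::new();

    for &(version, is_checkpoint) in listing {
        if is_checkpoint {
            checkpoint_files.push(version);
        } else {
            commit_files.push(version);
        }
    }
    (commit_files, checkpoint_files)
}
```

`Vec::with_capacity(10)` does not cap the length at 10; it only avoids reallocations for the first 10 pushes.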
LGTM
kernel/src/snapshot.rs
Outdated
);
// we may need to drop some commits that are after the actual last checkpoint
commit_files.retain(|parsed_path| parsed_path.version > max_checkpoint_version);
} else if checkpoint_files.len() != checkpoint_metadata.parts.unwrap_or(1) as usize {
Does this work?
- } else if checkpoint_files.len() != checkpoint_metadata.parts.unwrap_or(1) as usize {
+ } else if checkpoint_files.len() != checkpoint_metadata.parts.unwrap_or(1usize) {
(replace `usize` with the suffix for whatever type `parts` has)?
No. The issue is that `checkpoint_metadata.parts` was an `i32`, so the `1` was implicitly an `i32`, and trying to make it a `usize` doesn't work because the cast needs to happen after the unwrap.
Anyway, since the reason we have `parts` is to compare it against a vector len, I made it a `usize` in the struct, and we don't have to cast. It means serde will fail if there's a negative number in the JSON, but that's probably okay since that's a broken `_last_checkpoint` anyway.
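To make the type issue concrete, here is a small sketch of the before and after. The function names are hypothetical and the real struct field is not shown here; the sketch only illustrates why the cast had to follow the unwrap and why a `usize` field removes it.

```rust
use std::convert::TryFrom;

/// Before (as described above): `parts` is an `Option<i32>`, so the literal
/// `1` is inferred as `i32` and the cast has to come after the unwrap.
pub fn parts_match_i32(found: usize, parts: Option<i32>) -> bool {
    found == parts.unwrap_or(1) as usize
}

/// After: `parts` is an `Option<usize>`, so it compares against a length
/// directly and no cast is needed.
pub fn parts_match_usize(found: usize, parts: Option<usize>) -> bool {
    found == parts.unwrap_or(1)
}

/// A negative count has no `usize` representation. A checked conversion
/// rejects it, whereas `as` would silently wrap to a huge value.
pub fn checked_parts(raw: i32) -> Option<usize> {
    usize::try_from(raw).ok()
}
```

Rejecting a negative `parts` value up front is in the same spirit as letting deserialization fail on a broken `_last_checkpoint`.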
lgtm couple nits/followups
Co-authored-by: Zach Schuermann <[email protected]>
We already have constants like `DataType::LONG` that can avoid boilerplate like `DataType::Primitive(PrimitiveType::Long)`, and such constants can be used in match arms. Update the code to use them.
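As an illustration of the pattern this commit describes, here is a sketch using simplified stand-in types. The enums below are not the kernel's real definitions and `type_name` is a hypothetical function; the point is only that an associated constant can appear in a match arm when its type derives `PartialEq` and `Eq`.

```rust
// Simplified stand-ins for the kernel's schema types (assumption).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum PrimitiveType {
    Long,
    Integer,
    String,
    Boolean,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum DataType {
    Primitive(PrimitiveType),
}

impl DataType {
    // Associated constants that stand in for the longer constructor form.
    pub const LONG: DataType = DataType::Primitive(PrimitiveType::Long);
    pub const INTEGER: DataType = DataType::Primitive(PrimitiveType::Integer);
}

/// Hypothetical function showing constants and the long form side by side.
pub fn type_name(data_type: &DataType) -> &'static str {
    // Matching on `*data_type` does not move out of the reference because
    // none of these patterns bind a value.
    match *data_type {
        DataType::LONG => "long",
        DataType::INTEGER => "integer",
        // The boilerplate form still works where no constant exists.
        DataType::Primitive(PrimitiveType::String) => "string",
        _ => "other",
    }
}
```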
Turns out we don't use lazy static in many places, so removing the dependency is potentially straightforward. Lazy static itself even gives an example from the standard library: https://github.com/rust-lang-nursery/lazy-static.rs?tab=readme-ov-file#standard-library
Co-authored-by: Ryan Johnson <[email protected]>
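For reference, a minimal sketch of one standard-library replacement for a `lazy_static!` global, using `std::sync::OnceLock`. The commit message does not say which std type was actually used, and the `reserved_names` function and its contents are invented for illustration.

```rust
use std::collections::HashSet;
use std::sync::OnceLock;

/// Hypothetical lazily initialized global: the set is built on first access
/// and the same instance is returned on every later call.
pub fn reserved_names() -> &'static HashSet<&'static str> {
    static NAMES: OnceLock<HashSet<&'static str>> = OnceLock::new();
    NAMES.get_or_init(|| HashSet::from(["_delta_log", "_last_checkpoint"]))
}
```

Keeping the `static` inside the accessor function hides the initialization detail from callers, which makes the dependency swap a local change.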
remove sync-engine from default features so there are no features enabled by default.
)" This reverts commit 4c214c8.
This reverts commit 0968281.
This reverts commit 57bf817.
…elta-io#354)
If we have a `_last_checkpoint` that is out of date, things can get confused. This code:
1. Cleans up the listing function a bit
2. Ensures we end up with the real latest checkpoint
3. Drops any commit files from the listing that are older than the last checkpoint
4. Emits a `warn!` if `_last_checkpoint` is out of date
5. Adds a test for this case
This code will conflict with delta-io#347, so maybe hold off merging until that merges and then I can rebase and clean this up more.
Co-authored-by: Nick Lanham <[email protected]>
Co-authored-by: Zach Schuermann <[email protected]>
Co-authored-by: Ryan Johnson <[email protected]>
Co-authored-by: Stephen Carman <[email protected]>
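The behavior listed in this commit message can be sketched roughly as follows. This is not the code from `kernel/src/snapshot.rs`: every type and function name below is a hypothetical stand-in, the listing is assumed to start at the hinted version, the warning is printed with `eprintln!` instead of a logging macro, and returning an error on a part-count mismatch is an assumption about the `else if` branch quoted in the review.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum FileKind {
    Commit,
    Checkpoint,
}

/// Hypothetical parsed log path: just a version and a kind.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ParsedPath {
    pub version: u64,
    pub kind: FileKind,
}

impl ParsedPath {
    pub fn commit(version: u64) -> Self {
        ParsedPath { version, kind: FileKind::Commit }
    }
    pub fn checkpoint(version: u64) -> Self {
        ParsedPath { version, kind: FileKind::Checkpoint }
    }
}

/// Stand-in for the contents of `_last_checkpoint`.
pub struct CheckpointHint {
    pub version: u64,
    pub parts: Option<usize>,
}

#[derive(Debug)]
pub struct Listing {
    pub commit_files: Vec<ParsedPath>,
    pub checkpoint_files: Vec<ParsedPath>,
    pub hint_was_stale: bool,
}

pub fn resolve_listing(files: Vec<ParsedPath>, hint: &CheckpointHint) -> Result<Listing, String> {
    let mut max_checkpoint_version = hint.version;
    let mut commit_files = Vec::with_capacity(10);
    let mut checkpoint_files = Vec::new();

    for file in files {
        match file.kind {
            FileKind::Commit => commit_files.push(file),
            FileKind::Checkpoint => {
                if file.version > max_checkpoint_version {
                    // A newer checkpoint than the hint: start over with it.
                    max_checkpoint_version = file.version;
                    checkpoint_files.clear();
                    checkpoint_files.push(file);
                } else if file.version == max_checkpoint_version {
                    checkpoint_files.push(file);
                }
                // Checkpoints older than the current maximum are ignored.
            }
        }
    }

    let hint_was_stale = max_checkpoint_version != hint.version;
    if hint_was_stale {
        eprintln!(
            "_last_checkpoint points at version {}, but the listing has a checkpoint at {}",
            hint.version, max_checkpoint_version
        );
        // Keep only commits newer than the checkpoint that was actually found.
        commit_files.retain(|parsed_path| parsed_path.version > max_checkpoint_version);
    } else if checkpoint_files.len() != hint.parts.unwrap_or(1) {
        // Assumption: a part-count mismatch is reported as an error.
        return Err(format!(
            "expected {} checkpoint part(s), found {}",
            hint.parts.unwrap_or(1),
            checkpoint_files.len()
        ));
    }

    Ok(Listing { commit_files, checkpoint_files, hint_was_stale })
}
```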