Commit
Add methods for constructing `LogSegment` for Snapshot and for TableChanges (#495)

## What changes are proposed in this pull request?

This introduces two methods to construct `LogSegment`. The first constructs a `LogSegment` for Snapshots using `LogSegment::for_snapshot`. The second constructs a `LogSegment` for the upcoming `TableChanges` type.

This PR also refactors the log listing functions to reduce duplication in the code. We do so by creating a function `get_parsed_log_files_iter` to list, filter, and parse log files.

This adds a test function to `test-utils` called `delta_path_for_multipart_checkpoint`. This function can be used to create a multipart checkpoint path.

This replaces the changes proposed in #457

## How was this change tested?

This change introduces tests for the following:
- Reading a log with an out-of-date checkpoint hint
- Reading a log with an up-to-date checkpoint hint
- Creating a snapshot log segment without a checkpoint hint
- Creating a snapshot with a multi-part checkpoint
- Multipart checkpoint with an incorrect number of parts fails
- Creating a snapshot with a start checkpoint and an end time travel version
- Creating a snapshot with a checkpoint hint higher than the time travel version
- Creating log segments for table changes
- Checking that contiguity of the log is always preserved
- Checking that `for_table_changes` fails when the start version > end_version

This PR also adds an ignored test that checks for desired behaviour. The test `build_snapshot_with_missing_checkpoint_part_no_hint` checks that an incomplete checkpoint is not used in a `LogSegment`. A checkpoint is incomplete if it does not have all the parts specified in `LogPathFileType::MultiPartCheckpoint.num_parts`.

---------

Co-authored-by: Ryan Johnson <[email protected]>
Co-authored-by: Zach Schuermann <[email protected]>
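The checks named in the description above (multi-part checkpoint completeness, log contiguity, and rejecting a start version greater than the end version) can be illustrated with a small standalone sketch. This is not the code from this commit: every function name below is hypothetical, and the bodies are assumptions about what such checks might look like. The only external fact relied on is the Delta protocol naming scheme for multi-part checkpoint files, `<version>.checkpoint.<part>.<num_parts>.parquet`, with the version zero-padded to 20 digits and the part fields to 10.

```rust
/// Hypothetical helper (not the `test-utils` function from this PR): builds the
/// file name of one part of a multi-part checkpoint using the Delta protocol
/// naming scheme `<version>.checkpoint.<part>.<num_parts>.parquet`.
fn multipart_checkpoint_name(version: u64, part: u32, num_parts: u32) -> String {
    format!(
        "{:020}.checkpoint.{:010}.{:010}.parquet",
        version, part, num_parts
    )
}

/// Assumed completeness rule: a multi-part checkpoint is usable only when every
/// part number in 1..=num_parts is present exactly once.
fn is_complete_checkpoint(parts_seen: &[u32], num_parts: u32) -> bool {
    if num_parts == 0 || parts_seen.len() != num_parts as usize {
        return false;
    }
    let mut sorted = parts_seen.to_vec();
    sorted.sort_unstable();
    sorted.iter().copied().eq(1..=num_parts)
}

/// Assumed contiguity rule: commit versions must increase by exactly one,
/// with no gaps and no duplicates.
fn is_contiguous(versions: &[u64]) -> bool {
    versions
        .windows(2)
        .all(|pair| pair[0].checked_add(1) == Some(pair[1]))
}

/// Assumed range rule: a table-changes range is rejected when the start
/// version is greater than the end version.
fn validate_version_range(start: u64, end: u64) -> Result<(), String> {
    if start > end {
        Err(format!(
            "start version {} is greater than end version {}",
            start, end
        ))
    } else {
        Ok(())
    }
}
```

For example, `is_complete_checkpoint(&[1, 3], 3)` returns `false` because part 2 is missing, which mirrors the "incorrect number of parts fails" case in the test list. Keeping each rule as a small pure function is only a choice made for this illustration; the commit itself may organize the validation differently.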
1 parent 4ad2f8b, commit 3e7ad45
Showing 3 changed files with 824 additions and 323 deletions.