Failed parsing of log files should be ignored #496

Open
OussamaSaoudi-db opened this issue Nov 15, 2024 · 1 comment · May be fixed by #575
Labels: bug, good first issue

Comments

@OussamaSaoudi-db
Collaborator

Describe the bug

When listing files in the delta log, the kernel parses each file path into a ParsedLogPath. Parsing is fallible, and a failure is currently propagated to the caller. Instead, files that fail to parse should be ignored.

Details:

This issue was first described here.

The ParsedLogPath represents the file types that kernel understands, such as commits, checkpoints, and compacted commits[1]. When parsing a path into a ParsedLogPath encounters an unknown file type, it fails and returns an error, which is then propagated to the caller.

The Delta specification requires that unrecognized protocol fields be ignored:

"Clients must ignore such unrecognized fields, and should not produce an error when reading a table that contains unrecognized fields."

Some Delta table features, such as checkpointv2 and log compaction, generate their own log files. By extension, unrecognized files must be ignored as well.

[1]: See crate::path::LogPathFileType for more details.
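
To make the difference concrete, here is a minimal sketch of strict versus lenient listing. It is not the kernel's code: `UnknownFile`, `parse_commit_version`, `list_strict`, and `list_lenient` are hypothetical names, and the parser only recognizes `<20 decimal digits>.json` commits.

```rust
/// Hypothetical error type for a file name the parser does not understand.
#[derive(Debug, PartialEq)]
struct UnknownFile(String);

/// Hypothetical parser: accepts only `<20 decimal digits>.json` and
/// returns the commit version.
fn parse_commit_version(name: &str) -> Result<u64, UnknownFile> {
    let digits = name
        .strip_suffix(".json")
        .ok_or_else(|| UnknownFile(name.to_string()))?;
    if digits.len() != 20 || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return Err(UnknownFile(name.to_string()));
    }
    // 20 decimal digits can still overflow u64, so this parse may fail too.
    digits
        .parse::<u64>()
        .map_err(|_| UnknownFile(name.to_string()))
}

/// Strict listing: the first unrecognized file aborts the whole listing.
/// This mirrors the behavior the issue describes as a bug.
fn list_strict(names: &[&str]) -> Result<Vec<u64>, UnknownFile> {
    names.iter().map(|n| parse_commit_version(n)).collect()
}

/// Lenient listing: unrecognized files are skipped and the rest are kept.
fn list_lenient(names: &[&str]) -> Vec<u64> {
    names
        .iter()
        .filter_map(|n| parse_commit_version(n).ok())
        .collect()
}
```

With a listing that contains a `.crc` file between two commits, `list_strict` returns an error for the `.crc` file, while `list_lenient` returns the two commit versions.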

To Reproduce

Listing log files must not fail when the _delta_log contains any of the following files:

# hex instead of decimal
00000000deadbeef.commit.json  

# bogus part numbering
00000000000000000000.checkpoint.0000000010.0000000000.parquet 

# v2 checkpoint, as seen by a client that doesn't understand that feature
00000000000000000010.checkpoint.80a083e8-7026-4e79-81be-64bd76c43a11.json

# compacted log file, as seen by a client that doesn't understand that feature
00000000000000000004.00000000000000000006.compacted.json

# CRC files 
00000000000000000001.crc

Credit to @scovich for the examples.
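
The file names above could back a regression test along the following lines. This is a sketch under stated assumptions, not the kernel's API: `LogFileKind`, `ParsedName`, `parse_decimal`, `parse_log_file_name`, and `recognized_files` are hypothetical, and the simplified client is assumed to understand only commits, single-file checkpoints, and multi-part checkpoints.

```rust
/// Hypothetical, simplified stand-in for the file types a client understands.
#[derive(Debug, PartialEq)]
enum LogFileKind {
    Commit,
    SinglePartCheckpoint,
    MultiPartCheckpoint { part: u32, num_parts: u32 },
}

#[derive(Debug, PartialEq)]
struct ParsedName {
    version: u64,
    kind: LogFileKind,
}

/// Parses a fixed-width field that must consist only of decimal digits.
fn parse_decimal(field: &str, width: usize) -> Option<u64> {
    if field.len() == width && field.bytes().all(|b| b.is_ascii_digit()) {
        field.parse().ok()
    } else {
        None
    }
}

/// Returns `None` (rather than an error) for any name this client
/// does not recognize.
fn parse_log_file_name(name: &str) -> Option<ParsedName> {
    let (version_field, rest) = name.split_once('.')?;
    let version = parse_decimal(version_field, 20)?;
    let rest: Vec<&str> = rest.split('.').collect();
    let kind = match rest.as_slice() {
        ["json"] => LogFileKind::Commit,
        ["checkpoint", "parquet"] => LogFileKind::SinglePartCheckpoint,
        ["checkpoint", part, num_parts, "parquet"] => {
            let part = u32::try_from(parse_decimal(part, 10)?).ok()?;
            let num_parts = u32::try_from(parse_decimal(num_parts, 10)?).ok()?;
            // Part numbers are 1-based and cannot exceed the part count.
            if part == 0 || part > num_parts {
                return None;
            }
            LogFileKind::MultiPartCheckpoint { part, num_parts }
        }
        // Anything else (v2 checkpoints, compacted commits, CRC files, ...)
        // is unknown to this simplified client and is skipped.
        _ => return None,
    };
    Some(ParsedName { version, kind })
}

/// Lists only the recognized files; unrecognized names are silently dropped.
fn recognized_files<'a>(names: &[&'a str]) -> Vec<(&'a str, ParsedName)> {
    names
        .iter()
        .filter_map(|name| parse_log_file_name(name).map(|parsed| (*name, parsed)))
        .collect()
}
```

Under these assumptions, every file name in the list above maps to `None`, so `recognized_files` returns only the well-formed commits and checkpoints that appear alongside them.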

Expected behavior

The unrecognized files above must be ignored instead of causing the listing to fail.

Additional context

Once #495 merges, the change will likely be in list_log_files in kernel/src/log_segment.rs

@cg-cognition

I took a pass at fixing this issue (#575), PTAL whenever you get the chance!
