Skip to content
This repository has been archived by the owner on Nov 16, 2023. It is now read-only.

Handle io.EOF errors returned by ReadAt opening files #478

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

metalmatze
Copy link

Hey,

we're opening parquet files with an object storage client and not directly via os.File. This currently fails.

The underlying ReaderAt interface is allowed to return io.EOF. Currently, opening a file fails for us because reading the footer the object storage client reads until the end of the file returning io.EOF yet at the same time the data was written to the buffer just fine.

Now we can fix this in our FrostDB code, however, we think that generally speaking FrostDB' object storage client is compliant with the ReaderAt interface.

What is this project's stance on handling the io.EOF here?

The underlying ReaderAt interface is allowed to return io.EOF
Currently, opening a file fails for us because reading footer the object
storage client reads until the end of the file returning io.EOF.
Copy link

@achille-roussel achille-roussel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, I agree that we should not push this check to the io.ReaderAt implementation 👍

@joe-elliott
Copy link
Contributor

This makes sense to me for reading the footer where you would likely get valid data and an EOF, but do we need this for the column and offset indexes?

@achille-roussel
Copy link

achille-roussel commented Mar 1, 2023

Maybe we can use a helper function to handle unexpected conditions?

func readFullAt(r io.ReaderAt, b []byte, off int64) (int, error) {
  n, err := r.ReadAt(b, off)
  if n == len(b) {
    err = nil
  } else {
    switch err {
    case nil:
      err = io.ErrNoProgress
    case io.EOF:
      err = io.ErrUnexpectedEOF
    }
  }
  return n, err
}
  • we don't make assumptions about the underlying io.ReaderAt implementation
  • we silence errors if we read all the bytes we were looking for as they don't impact correctness

@metalmatze
Copy link
Author

I'd be happy with such a helper function.

@kevinburkesegment
Copy link
Contributor

Apologies to make more work for you, but we've decided to move development on this project to a new organization at https://github.com/parquet-go/parquet-go to ensure its long term success. We appreciate your contribution and would appreciate if you could reopen this PR there if it is still relevant.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants