Add "WithError" versions of NewReader/NewGenericReader #305
base: main
Hi @SgtCoDFish, thanks for your PR! This is certainly a change we can accept. I agree that it's better not to break the API just yet. One thing I'm not sure about, though, is the name of the function. Usually (in the stdlib, for example) functions with … Wondering if we should use something else like …
New(Generic)Reader calls can panic at runtime if they encounter an invalid parquet file, which makes this library difficult to use when an arbitrary or potentially untrusted parquet file might need to be read. This adds OrError versions of these constructors, which make it possible to handle such errors without having to recover from a panic.

Also add tests for creating a reader with a broken file, which illustrate the need for New(Generic)ReaderOrError.
Ah, I much prefer … I thought about the name a bit and ended up just going with my first idea; I'm definitely happy to change it. I've made that change 😁
Something I had documented (but maybe not highlighted enough) is the pattern currently supported for handling configuration errors; here is an example: https://pkg.go.dev/github.com/segmentio/parquet-go#NewBuffer. I wonder whether this would be an acceptable approach, or whether we feel the new APIs are still worth introducing?
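As we understand it, the pattern linked above validates the options up front with a constructor that returns an error, then passes the validated configuration to the panicking constructor, which can then no longer fail on configuration. Below is a self-contained sketch of that shape. `Config`, `Option`, `WithPageSize`, `NewConfig` and `NewThing` are hypothetical names; check the linked `NewBuffer` documentation for the actual parquet-go types. Note that this only guards against invalid options, not against invalid file contents.

```go
package main

import (
	"errors"
	"fmt"
)

// Config is a hypothetical validated configuration.
type Config struct {
	PageSize int
}

// Option is a hypothetical functional option applied to a Config.
type Option interface {
	Configure(*Config)
}

// Configure lets an already-validated *Config be passed as an Option;
// it copies all of its values into dst.
func (c *Config) Configure(dst *Config) { *dst = *c }

type pageSize int

func (p pageSize) Configure(c *Config) { c.PageSize = int(p) }

// WithPageSize is a hypothetical option.
func WithPageSize(n int) Option { return pageSize(n) }

// NewConfig applies the options and reports invalid values as an error.
func NewConfig(options ...Option) (*Config, error) {
	c := &Config{PageSize: 4096} // hypothetical default
	for _, opt := range options {
		opt.Configure(c)
	}
	if c.PageSize <= 0 {
		return nil, errors.New("page size must be positive")
	}
	return c, nil
}

// Thing is a hypothetical type whose constructor panics on bad configuration.
type Thing struct{ config *Config }

func NewThing(options ...Option) *Thing {
	c, err := NewConfig(options...)
	if err != nil {
		panic(err)
	}
	return &Thing{config: c}
}

func main() {
	for _, size := range []int{8192, -1} {
		config, err := NewConfig(WithPageSize(size))
		if err != nil {
			fmt.Println("invalid configuration:", err)
			continue
		}
		// config is known to be valid, so NewThing cannot panic here.
		t := NewThing(config)
		fmt.Println("page size:", t.config.PageSize)
	}
}
```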
I think if configuration errors were the only way that creating a …
Thanks for sharing these details. You could open the file ahead of time to handle errors caused by invalid files, for example:

```go
// Opening the file first surfaces invalid-file errors as a returned error.
f, err := parquet.OpenFile(...)
if err != nil {
	...
}
// The reader is built from the already-validated *parquet.File.
r := parquet.NewGenericReader[Row](f)
```

Then you have the guarantee that the reader constructor won't panic. Let me know if this is an acceptable solution for your use case.
This would probably work for our use case in the short term, but it's quite verbose and fiddly. Using … I like that … Another issue with panics is that there's no easy way for me to know whether a new failure mode that could cause a panic might be added to …
@achille-roussel gentle bump - does my reasoning above seem reasonable? |
@Pryz @achille-roussel Hi, another gentle bump: does this look like it might be acceptable? Just looking to tie off the open PRs I have, but of course I recognise that people are busy 😁
It's currently difficult to use `NewReader`/`NewGenericReader` in a situation where an invalid file might be read, since they can panic when trying to read an invalid file.

This PR adds WithError versions of `NewReader` and `NewGenericReader`, which return an error if something goes wrong rather than panicking.

This PR also adds a testdata file which causes `New(Generic)Reader` to panic, and adds tests ensuring that `NewReaderWithError` and `NewGenericReaderWithError` return an error when trying to read that file. It was that file which alerted me to this issue!

I figured it was easier to raise a PR for discussion than to create an issue for this. If this needs any work, or if this isn't something you want to add to the library, please let me know!
Alternative Approach
Another approach would be to break the API and change the signatures to e.g. `func NewReader(...) (*Reader, error)`. Given that this library isn't at v1 yet, that might be an option some libraries would take.

I assumed that breaking existing code wouldn't be desirable here, though. If this library ever publishes a `v2`, that might be the time to do it.

That said, if you'd be happy to break the API, I'd gladly change this PR to do that instead!