GH-43605: [Go][Parquet] Recover from panic in file reader #43607
Do we know why it was causing the panic in the first place? Can you add the corrupted file to https://github.com/apache/arrow-testing and then use that for a test?
Sorry, but I cannot share this corrupted file. I tried to create a dummy file that reproduces the bug, but I did not find a way to corrupt the dummy file in the same way as the original one because I do not know the cause of the corruption.
apache/parquet-testing#48 has some corrupt Parquet files that can be used, but it might take a few days to check and merge that. Maybe it would be easier to make up a test with the Go writer?
I managed to create dummy data that reproduces the failure.
It's a single-column uint16 dataset filled with 0 values. The same dataset using polars with the following version does not reproduce the bug:
To be able to execute this test, we now need to wait for this other PR to be merged:
A new PR dealing with the cause of the panic has been opened:
Is this really bad data? I found that parquet-c++ can read it.
Maybe it should not be considered bad data, but arrow-go did not manage to open it before my PR.
I'll try to figure it out later. You can try reading another "bad_data" file first; I believe it is easy to reproduce with another bad data file.
@don4get the parquet-testing PR was merged. Can you update the submodule here and push it so that CI can run with the new file?
Hello, the PR needed for testing has been approved but not merged: Can you please merge it? I will update this PR once it is merged.
Would you mind reading the other bad-data files?
The other bad-data files were added 6 years ago; I doubt they are relevant to the issue I am targeting in this PR.
Sigh, I will check. I still don't know, since this PR just recovers from the panic; #43616 is related to the avx2 decoding in
Concerning the panics caused by #41317 and #41321, these are not urgent matters because they are in the same thread as the For the issue I am pointing out in #43605, the problem is more severe because the panic happens in a goroutine: it is not handled by arrow and the user simply cannot handle it. The user's program will crash, with no workaround possible.
#43605 highlights a panic in a goroutine, which in the end is a crash in avx2 decoding. #41317 leads to a panic in validate, far from avx decoding. Plus, validate panics are explicitly written by the go-arrow authors; we could say it is an API contract, even though it is not documented and is difficult for clients to anticipate because I think I cannot reuse these files; the issues are not similar.
Got it! Also, can you provide a link explaining how this issue was fixed in polars? I cannot find it now.
The testing file is merged; sorry for the delay in finding the reason. Just go ahead with testing.
I don't know yet what fixed the issue in
Thank you so much. I just updated the submodule both in the branch and the other one.
5c291fc to ac12c20
ac12c20 to 654c839
if err != nil {
	t.Errorf("unexpected error: %v", err)
}
require.NoError(t, err)
if err != nil {
	t.Errorf("unexpected error: %v", err)
}
require.NoError(t, err)
if err == nil {
	t.Errorf("expected error: %v", err)
}
assert.Error(t, err)
Looks like the last call in the unit test is getting
Deferred function calls are executed in Last In First Out order after the surrounding function returns: https://go.dev/blog/defer-panic-and-recover
I managed to reproduce this locally with
I initially thought it was linked to
The test fails when this flag is set:
What is the error that you get when running it locally without the That particular flag controls whether or not optimized assembly code is used for decoding values. When that flag is set, the library falls back to pure Go implementations of several decoding routines, such as unpacking booleans or packed integers. Without that flag, SIMD (vectorized) implementations of those functions are used instead on a per-architecture basis (falling back to pure Go if there is no optimized assembly for a given platform).
While the default implementation fails to read the table (it returns an error), the Should I adapt the expected error if the flag
@don4get I figured out the issue and uploaded a fix so that the tests should all work properly now.
As the Go implementation has shifted to the github.com/apache/arrow-go repository, once all the CI here passes, I'm going to shift this to the new repo and merge it there instead, as we are no longer merging Go PRs on this repo. (We've only held off on shifting ALL the Go PRs while we get the CI fully complete over there.)
Thanks @zeroshade :)
Closing this to move it to apache/arrow-go#124 for the new Arrow Go repo.
see apache/arrow#43607 original author: @don4get
Rationale for this change
The Parquet file reader should handle panics within its internal goroutines to ensure graceful failure and prevent client-side crashes.
What changes are included in this PR?
ReadRowGroups now defers a panic recovery in its goroutines and returns an error so that file reading fails gracefully.
Are these changes tested?
No more tests were added because I cannot reproduce such a corrupted file with dummy data.
Are there any user-facing changes?
No
This PR contains a "Critical Fix".