Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid panics? #6737

Open
jp0317 opened this issue Nov 16, 2024 · 10 comments
Open

Avoid panics? #6737

jp0317 opened this issue Nov 16, 2024 · 10 comments

Comments

@jp0317
Copy link
Contributor

jp0317 commented Nov 16, 2024

This is a follow-up for the discussion here. In short, the parquet implementation may panic upon invalid inputs. IIUC ideally we should avoid panics as much as possible, especially those can be converted to error results with small efforts.

@jp0317 jp0317 changed the title Convert some panics that happen on invalid parquet files to error results Avoid panics? Nov 16, 2024
@emkornfield
Copy link
Contributor

It seems there is potentially an overall project question of when is using a panic desired and what efforts should be made to clean up existing code with this standard?

@emkornfield
Copy link
Contributor

@alamb @tustvold is this something that has been discussed more widely within the community? having parameters on when panics are allowed would help avoid future friction on contributions we are looking to make.

@alamb
Copy link
Contributor

alamb commented Dec 17, 2024

It seems there is potentially an overall project question of when is using a panic desired and what efforts should be made to clean up existing code with this standard?

Yes I think having clear guidelines on when to panic (basically an uncatchable panic) vs return errors would be helpful

I think there is a tension between the decreased usability of APIs that require handling / propagating errors and allowing errors to propagate. In my opinion, if an API returns an error, it is an error that some program might want to catch. If the only errors that can be returned are due to bugs in code (aka internals errors) it is less clear to me an error is appropriate in that case

For invalid inputs beyond the control of the calling program (e.g. invalid parquet files) i think returning errors rather than panic is a very reasonable behavior

@emkornfield
Copy link
Contributor

Yes I think having clear guidelines on when to panic (basically an uncatchable panic) vs return errors would be helpful

Would this be something to discuss on the mailing list (or in this issue) or something else? I understand this can be contentious and I don't have enough knowledge to actually say what is right for the project but would like to see it move forward so happy to help out where I can.

If the only errors that can be returned are due to bugs in code (aka internals errors) it is less clear to me an error is appropriate in that case

My perspective (which is admittedly not rust centric) comes from operating multi-tenant services. In this case the only cases that we wouldn't want to explicitly catch errors at a higher level is memory corruption (I think there is some gray area for out-of-bounds references). I've seen multiple occurrences of the equivalent in panic in C++, causing wide spread outages due to bugs in the assertion (i.e. the developer did not reason correctly about valid vs invalid state). Using a stack-unwinding panic handler is one way to mitigate but they come with there own issues (e.g. mixed FFI/Rust code).

@emkornfield
Copy link
Contributor

Also CC @etseidl who I think had opinions here

@tustvold
Copy link
Contributor

tustvold commented Dec 17, 2024

https://doc.rust-lang.org/book/ch09-03-to-panic-or-not-to-panic.html#guidelines-for-error-handling

I think is relevant here, panics are a perfectly valid way to handle unexpected or illegal system states, especially when this would have safety implications.

Errors should be used where the situation is expected, e.g. due to user input.

I would be pretty lukewarm on a load of PRs randomly replacing assertions with errors, this just makes code harder to read and blurs what situations are expected.

However, PRs that change panics reachable from public APIs into errors are welcome.

It is also worth noting, although mostly irrelevant in practice that handling results is potentially slower on the happy path than panics, as panic unrolling logic is off the hot path

Using a stack-unwinding panic handler is one way to mitigate but they come with there own issues (e.g. mixed FFI/Rust code).

FWIW anyone using the tokio ecosystem is using such a handler

@alamb
Copy link
Contributor

alamb commented Dec 17, 2024

Would this be something to discuss on the mailing list (or in this issue) or something else? I understand this can be contentious and I don't have enough knowledge to actually say what is right for the project but would like to see it move forward so happy to help out where I can.

I recommend we keep the conversation on this ticket as the convention is communication happens on github and thus is is also easier for people to find / discover in the future

I have had good luck in the past sending notes to the mailing list with a pointer to ticket when looking to solicit wider feedback, for example: https://lists.apache.org/thread/lqoh9mkss6xlod3380h85pjzowfsysw1

@jp0317
Copy link
Contributor Author

jp0317 commented Dec 18, 2024

Thanks for all the comments! IIUC it seems we have a consensus like this? If a state should be unreachable regardless how user input looks like (in which case reaching this state probably indicates a code bug or internal error), then panic is definitely preferred. If a state can be reachable upon invalid/corrupted user inputs, then an error result may be better

@westonpace
Copy link
Member

westonpace commented Dec 18, 2024

If a state should be unreachable regardless how user input looks like (in which case reaching this state probably indicates a code bug or internal error), then panic is definitely preferred. If a state can be reachable upon invalid/corrupted user inputs, then an error result may be better

Yes, with a caveat. If a state is reachable upon invalid / corrupted user input then that does not mean the panic should be converted to an error. It is just as likely that validation needs to be inserted into the pipeline earlier.

For example, let's say I have some invalid parquet input. Maybe a byte array length is larger than the page length. Maybe this causes us to not read all the items in the array and then we generate some kind of panic like arr_len != expected_len. We shouldn't simply change that assert into a Result. The site of the panic may far from the spot where validation should have happened. In other words, if we reach an invalid or unreachable system state, we should identify the part of the input that was incorrect that led to this state, and then insert validation routines to detect that invalid input and report it very clearly (as a result at this point).

@jp0317
Copy link
Contributor Author

jp0317 commented Dec 18, 2024

If a state is reachable upon invalid / corrupted user input then that does not mean the panic should be converted to an error. It is just as likely that validation needs to be inserted into the pipeline earlier

Thanks for calling this out. This is better and clearer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants