Avoid panics? #6737

jp0317 · 2024-11-16T06:01:53Z

This is a follow-up for the discussion here. In short, the parquet implementation may panic upon invalid inputs. IIUC ideally we should avoid panics as much as possible, especially those can be converted to error results with small efforts.

emkornfield · 2024-12-10T20:15:28Z

It seems there is potentially an overall project question of when is using a panic desired and what efforts should be made to clean up existing code with this standard?

emkornfield · 2024-12-17T19:27:57Z

@alamb @tustvold is this something that has been discussed more widely within the community? having parameters on when panics are allowed would help avoid future friction on contributions we are looking to make.

alamb · 2024-12-17T21:04:24Z

It seems there is potentially an overall project question of when is using a panic desired and what efforts should be made to clean up existing code with this standard?

Yes I think having clear guidelines on when to panic (basically an uncatchable panic) vs return errors would be helpful

I think there is a tension between the decreased usability of APIs that require handling / propagating errors and allowing errors to propagate. In my opinion, if an API returns an error, it is an error that some program might want to catch. If the only errors that can be returned are due to bugs in code (aka internals errors) it is less clear to me an error is appropriate in that case

For invalid inputs beyond the control of the calling program (e.g. invalid parquet files) i think returning errors rather than panic is a very reasonable behavior

emkornfield · 2024-12-17T21:27:40Z

Yes I think having clear guidelines on when to panic (basically an uncatchable panic) vs return errors would be helpful

Would this be something to discuss on the mailing list (or in this issue) or something else? I understand this can be contentious and I don't have enough knowledge to actually say what is right for the project but would like to see it move forward so happy to help out where I can.

If the only errors that can be returned are due to bugs in code (aka internals errors) it is less clear to me an error is appropriate in that case

My perspective (which is admittedly not rust centric) comes from operating multi-tenant services. In this case the only cases that we wouldn't want to explicitly catch errors at a higher level is memory corruption (I think there is some gray area for out-of-bounds references). I've seen multiple occurrences of the equivalent in panic in C++, causing wide spread outages due to bugs in the assertion (i.e. the developer did not reason correctly about valid vs invalid state). Using a stack-unwinding panic handler is one way to mitigate but they come with there own issues (e.g. mixed FFI/Rust code).

emkornfield · 2024-12-17T21:29:22Z

Also CC @etseidl who I think had opinions here

tustvold · 2024-12-17T21:37:13Z

https://doc.rust-lang.org/book/ch09-03-to-panic-or-not-to-panic.html#guidelines-for-error-handling

I think is relevant here, panics are a perfectly valid way to handle unexpected or illegal system states, especially when this would have safety implications.

Errors should be used where the situation is expected, e.g. due to user input.

I would be pretty lukewarm on a load of PRs randomly replacing assertions with errors, this just makes code harder to read and blurs what situations are expected.

However, PRs that change panics reachable from public APIs into errors are welcome.

It is also worth noting, although mostly irrelevant in practice that handling results is potentially slower on the happy path than panics, as panic unrolling logic is off the hot path

Using a stack-unwinding panic handler is one way to mitigate but they come with there own issues (e.g. mixed FFI/Rust code).

FWIW anyone using the tokio ecosystem is using such a handler

alamb · 2024-12-17T22:19:25Z

Would this be something to discuss on the mailing list (or in this issue) or something else? I understand this can be contentious and I don't have enough knowledge to actually say what is right for the project but would like to see it move forward so happy to help out where I can.

I recommend we keep the conversation on this ticket as the convention is communication happens on github and thus is is also easier for people to find / discover in the future

I have had good luck in the past sending notes to the mailing list with a pointer to ticket when looking to solicit wider feedback, for example: https://lists.apache.org/thread/lqoh9mkss6xlod3380h85pjzowfsysw1

jp0317 · 2024-12-18T14:53:40Z

Thanks for all the comments! IIUC it seems we have a consensus like this? If a state should be unreachable regardless how user input looks like (in which case reaching this state probably indicates a code bug or internal error), then panic is definitely preferred. If a state can be reachable upon invalid/corrupted user inputs, then an error result may be better

westonpace · 2024-12-18T15:09:24Z

If a state should be unreachable regardless how user input looks like (in which case reaching this state probably indicates a code bug or internal error), then panic is definitely preferred. If a state can be reachable upon invalid/corrupted user inputs, then an error result may be better

Yes, with a caveat. If a state is reachable upon invalid / corrupted user input then that does not mean the panic should be converted to an error. It is just as likely that validation needs to be inserted into the pipeline earlier.

For example, let's say I have some invalid parquet input. Maybe a byte array length is larger than the page length. Maybe this causes us to not read all the items in the array and then we generate some kind of panic like arr_len != expected_len. We shouldn't simply change that assert into a Result. The site of the panic may far from the spot where validation should have happened. In other words, if we reach an invalid or unreachable system state, we should identify the part of the input that was incorrect that led to this state, and then insert validation routines to detect that invalid input and report it very clearly (as a result at this point).

jp0317 · 2024-12-18T15:12:35Z

If a state is reachable upon invalid / corrupted user input then that does not mean the panic should be converted to an error. It is just as likely that validation needs to be inserted into the pipeline earlier

Thanks for calling this out. This is better and clearer.

jp0317 changed the title ~~Convert some panics that happen on invalid parquet files to error results~~ Avoid panics? Nov 16, 2024

jp0317 mentioned this issue Nov 16, 2024

Convert some panics that happen on invalid parquet files to error results #6738

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Avoid panics? #6737

Avoid panics? #6737

jp0317 commented Nov 16, 2024

emkornfield commented Dec 10, 2024

emkornfield commented Dec 17, 2024

alamb commented Dec 17, 2024

emkornfield commented Dec 17, 2024

emkornfield commented Dec 17, 2024

tustvold commented Dec 17, 2024 •

edited

Loading

alamb commented Dec 17, 2024

jp0317 commented Dec 18, 2024

westonpace commented Dec 18, 2024 •

edited

Loading

jp0317 commented Dec 18, 2024

Avoid panics? #6737

Avoid panics? #6737

Comments

jp0317 commented Nov 16, 2024

emkornfield commented Dec 10, 2024

emkornfield commented Dec 17, 2024

alamb commented Dec 17, 2024

emkornfield commented Dec 17, 2024

emkornfield commented Dec 17, 2024

tustvold commented Dec 17, 2024 • edited Loading

alamb commented Dec 17, 2024

jp0317 commented Dec 18, 2024

westonpace commented Dec 18, 2024 • edited Loading

jp0317 commented Dec 18, 2024

tustvold commented Dec 17, 2024 •

edited

Loading

westonpace commented Dec 18, 2024 •

edited

Loading