Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve reqwests error display #6377

Open
crepererum opened this issue Sep 10, 2024 · 10 comments
Open

Improve reqwests error display #6377

crepererum opened this issue Sep 10, 2024 · 10 comments
Assignees
Labels
enhancement Any new improvement worthy of a entry in the changelog good first issue Good for newcomers object-store Object Store Interface

Comments

@crepererum
Copy link
Contributor

Originally reported in #5882 (comment) :

For #5882 we only report Generic S3 error: error decoding response body. We should try to improve the error message. It seems that the Display impl. is not super helpful and we might wanna use Debug instead, see seanmonstar/reqwest#2373

@crepererum crepererum added object-store Object Store Interface enhancement Any new improvement worthy of a entry in the changelog good first issue Good for newcomers labels Sep 10, 2024
@tustvold
Copy link
Contributor

tustvold commented Sep 10, 2024

One thing to be aware of is exposing sensitive headers, or URL path segments.

We could also provide information on common causes of this error - e.g. network/runtime instability

@crepererum crepererum self-assigned this Sep 10, 2024
@itsjunetime
Copy link
Contributor

take

@crepererum crepererum removed their assignment Sep 15, 2024
@erratic-pattern
Copy link
Contributor

take

@itsjunetime itsjunetime removed their assignment Sep 30, 2024
@erratic-pattern
Copy link
Contributor

One thing to be aware of is exposing sensitive headers, or URL path segments. We could also provide information on common causes of this error - e.g. network/runtime instability

The documentation for reqwest::Error only mentions URLs as a concern. Hopefully this means that there are no headers in the output, but I do not know for certain.

There is reqwest::Error::without_url which can be used to scrub the URL from the reqwest error. Either the user could explicitly call this when they have sensitive information in URLs, or we could provide a helper method to accomplish this.

Another option is to have for each object storage API provide its own scrubbing method to eliminate specific sensitive headers from AWS, Azure, etc. I do not know enough about these object storage APIs to make this change.

For now I am just going to change to using the Debug information to see if it provides enough information.

erratic-pattern added a commit to erratic-pattern/arrow-rs that referenced this issue Oct 6, 2024
see apache#6377

This should help in the interim to identify specific causes of, sometimes confusing, network
related errors. I think this change should be temporary until we can find a better way to
improve error messages.
erratic-pattern added a commit to erratic-pattern/arrow-rs that referenced this issue Oct 6, 2024
see apache#6377

This should help in the interim to identify specific causes of, sometimes confusing, network
related errors. I think this change should be temporary until we can find a better way to
improve error messages.
erratic-pattern added a commit to erratic-pattern/arrow-rs that referenced this issue Oct 6, 2024
see apache#6377

This should help in the interim to identify specific causes of, sometimes confusing, network
related errors. I think this change should be temporary until we can find a better way to
improve error messages.
@crepererum
Copy link
Contributor Author

Regarding the argument that @tustvold brought up here: #6518 (comment) and that basically boils down to "people should walk the error chain":

I think the issue here is that object_store is somewhat inconsistent: for some errors, Display prints the entire chain e1: e2: e3 but for reqwest errors it does not. That also make the proposal to "walk the error chain" somewhat impractical, because if you walk the chain and use Display to print the chain parts, you gonna repeat some errors. So our standpoint is to really tell people that the chain should be walked, then stuff like this

#[snafu(display("Error performing DeleteObjects request: {}", source))]
DeleteObjectsRequest { source: crate::client::retry::Error },

is wrong and

#[snafu(display("Error bla bla: {}", source))]

should be

#[snafu(display("Error bla bla"))]

@tustvold
Copy link
Contributor

I personally agree that walking the error chain should be the approach, it's the pattern used by the vast majority of other languages, but unfortunately support for this in the ecosystem is limited. If Rust had supported this from day one, perhaps we wouldn't be in this mess, but alas it did not. As it stands I'm honestly not even sure what the consensus is...

Moving to this particular issue, I think we need to surface enough information within the default display implementation to diagnose issues, however, this does not mean we should just expose everything, including what may be sensitive or internal details. My point w.r.t walking the error chain is that in the absence of a clearly articulated error scenario the only option is to either expose everything, which I don't agree with, or instruct people to walk the error chain to get the more detailed information they require.

Perhaps we could reframe this issue as a concrete, what would I like to be in the output that currently is not

@crepererum
Copy link
Contributor Author

Perhaps we could reframe this issue as a concrete, what would I like to be in the output that currently is not

Fine with me. I think #5882 provides a good example. The reqwest error that is stringified only says error decoding response body which is somewhat misleading w/o looking at the error chain. It sounds like the body data is malformed or the data got scrambled, but if you walk the error chain you'll often see it's simply a timeout that occurred while fetching the body bytes. So I wonder if we should wrap reqwest::Error into another error type that tries to extract SOME useful information (it could itself walk the chain and find the underlying IO error for example).

@tustvold
Copy link
Contributor

tustvold commented Oct 11, 2024

This sounds reasonable to me, it certainly would help users to make more clear that error means that the request was interrupted mid-stream, even if there isn't really anything more that can be said about why.

@erratic-pattern
Copy link
Contributor

I feel that cramming more error sources into the Display is a step in the wrong direction. Despite the half-baked edges of Rust error handling making this a tricky decision for library maintainers, following standard practices should lead to a better outcome for both application developers and library developers.

As @crepererum pointed out, right now for a application developer, there is a confusing incompatibility in error messaging between Arrow ecosystem crates and (most?) other third-party crates. Using Display seems to "just work" for getting detailed information for Arrow crates, until the error goes into a third-party crate and then it doesn't.

"So just use an error reporting library, "as the reqwest maintainers suggest. Well, no. Even if you follow "best practice" you're still hindered as an application developer because using any of the available error reporting crates with arrow-rs will produce very redundant and ugly error reports.

To highlight this visually I've made an example repo that creates a simple "application" using object_store with a popular error reporting crate called color_eyre. It attempts to connect to an S3 bucket that doesn't exist and then displays an error.

Here is the output:

Error:
   0: Generic S3 error: Error after 10 retries in 5.723990834s, max_retries:10, retry_timeout:180s, source:error sending request for url (http://169.254.169.254/latest/api/token)
   1: Error after 10 retries in 5.723990834s, max_retries:10, retry_timeout:180s, source:error sending request for url (http://169.254.169.254/latest/api/token)
   2: error sending request for url (http://169.254.169.254/latest/api/token)
   3: client error (Connect)
   4: tcp connect error: Host is down (os error 64)
   5: Host is down (os error 64)

Location:
   examples/object_store.rs:11

Backtrace omitted. Run with RUST_BACKTRACE=1 environment variable to display it.
Run with RUST_BACKTRACE=full to include source snippets.

Maybe this is tolerable, but it's definitely ugly. You are at least getting as much information about the error chain as possible. Additionally, by cramming more information into the top-level error message, you're helping users in the default case to discover useful errors beyond just "Generic S3 error". So there is an argument to be made that the current state is acceptable, even if not ideal..

I think this discussion is beyond the scope of just "improve reqwests error display" and we should create an issue to discuss a more broad improvement to error messaging with both the default rust panic handler as well as the color_eyre (or similar crate) panic handler. If that sounds reasonable, I can write one up with points made from this discussion.

I would also like to create a similar example repo for DataFusion to bring up the discussion there, since they also use the same approach to error messaging, but I think the discussion needs to happen here primarily since DataFusion is largely required to follow whatever convention arrow-rs takes.

@erratic-pattern
Copy link
Contributor

erratic-pattern commented Oct 14, 2024

From our (InfluxData's) side, we are most likely going to unroll the cause chain for object_store errors manually.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Any new improvement worthy of a entry in the changelog good first issue Good for newcomers object-store Object Store Interface
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants