object_store: Retry on connection duration timeouts? #6287
Comments
The Go GCS client supports automatic retrying here by keeping track of how many bytes the client has already seen and creating a new ranged read request to resume the read where the client left off.
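To make the resume-by-offset idea concrete, here is a minimal std-only sketch. It is not the Go client's code and not an `object_store` API: `read_with_resume`, `flaky_backend` and `FetchResult` are hypothetical names, and the "backend" is an in-memory slice that pretends to drop the connection after a fixed number of bytes per request.

```rust
use std::io;

/// Hypothetical stand-in for one ranged GET: the bytes the server managed to
/// send starting at the requested offset, plus an error if the connection
/// dropped before the end of the object.
pub type FetchResult = (Vec<u8>, Option<io::Error>);

/// Reads an object of `total_len` bytes, re-issuing a ranged request from the
/// last byte seen whenever an attempt is cut short by an error.
pub fn read_with_resume<F>(
    total_len: usize,
    max_retries: usize,
    mut fetch_from: F,
) -> io::Result<Vec<u8>>
where
    F: FnMut(usize) -> FetchResult,
{
    let mut out = Vec::with_capacity(total_len);
    let mut retries = 0;
    while out.len() < total_len {
        // Resume where the previous attempt left off.
        let (chunk, err) = fetch_from(out.len());
        let progressed = !chunk.is_empty();
        out.extend_from_slice(&chunk);
        match err {
            // Clean end of a response that delivered data: keep going.
            None if progressed => {}
            // Clean end with no data but bytes still missing: give up.
            None => {
                return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "short object"));
            }
            // Connection dropped: retry from the new offset, within budget.
            Some(e) => {
                retries += 1;
                if retries > max_retries {
                    return Err(e);
                }
            }
        }
    }
    Ok(out)
}

/// Simulated backend (an assumption of this sketch) that closes the
/// connection after at most `cutoff` bytes per request.
pub fn flaky_backend(data: &'static [u8], cutoff: usize) -> impl FnMut(usize) -> FetchResult {
    move |offset| {
        let end = (offset + cutoff).min(data.len());
        let err = if end < data.len() {
            Some(io::Error::new(io::ErrorKind::ConnectionReset, "connection closed"))
        } else {
            None
        };
        (data[offset..end].to_vec(), err)
    }
}
```

Calling `read_with_resume(data.len(), 5, flaky_backend(data, 4))` reassembles the full object across several simulated drops, while a retry budget of 1 surfaces the connection error to the caller.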
So apparently I feel like the docstring of I'm not that happy yet I'm also not sure if the calls to
take
Closes apache#6287. This PR includes `reqwest::Error::Decode` as an error case to retry on, which can occur when a server drops a connection in the middle of sending the response body.
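As a rough illustration of what "adding an error case to retry on" amounts to, the sketch below classifies failures with a hypothetical `FailureKind` enum. It is not a type from `reqwest` or `object_store`, and apart from the decode case described above, which arms count as retryable is an assumption made for the example.

```rust
/// Hypothetical classification of request failures; not a type from
/// `reqwest` or `object_store`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FailureKind {
    /// The connection could not be established.
    Connect,
    /// The request timed out.
    Timeout,
    /// The response body could not be decoded, e.g. because the server
    /// dropped the connection partway through sending it.
    Decode,
    /// The server answered with this HTTP status code.
    Status(u16),
}

/// Returns true if a request that failed this way is worth re-sending.
/// Treating `Decode` as retryable mirrors the change described above; the
/// other arms are assumptions of this sketch.
pub fn is_retryable(kind: FailureKind) -> bool {
    match kind {
        FailureKind::Connect | FailureKind::Timeout | FailureKind::Decode => true,
        // Assumed policy: retry server errors, never client errors.
        FailureKind::Status(code) => (500..600).contains(&code),
    }
}
```

Note that a classification like this only says the request may be re-sent from the start; it does not by itself resume a stream that a consumer has already partly read.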
I've made a PR, #6519, similar to the closed PR #5383, but only permits
EDIT: I am not sure this is exactly the same issue that @flokli is seeing, but it is the one being seen in issue #5882, so maybe I should associate this PR with that issue instead?
Given the subtle nature of errors and retries in the context of streaming requests, I think the only practical way forward will be to create an example/test case that shows the problem. The example would likely look something like:
This test would also let us explore the various issues and corner cases that could be encountered. Writing the test harness would likely be a significant undertaking, but it would be the best way to inform a proposal for API changes.
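One possible shape for such a failure scenario, sketched with only the standard library: a reader wrapper that injects a connection error after a fixed number of bytes, plus a helper that records what a consumer without retry logic observes. `FailAfter` and `drain` are hypothetical names for this sketch and do not come from the existing retry tests.

```rust
use std::io::{self, Read};

/// Wraps a reader and fails with `ConnectionReset` once `limit` bytes have
/// been handed out, imitating a server that drops the connection mid-body.
pub struct FailAfter<R> {
    inner: R,
    remaining: usize,
}

impl<R: Read> FailAfter<R> {
    pub fn new(inner: R, limit: usize) -> Self {
        Self { inner, remaining: limit }
    }
}

impl<R: Read> Read for FailAfter<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if self.remaining == 0 {
            return Err(io::Error::new(
                io::ErrorKind::ConnectionReset,
                "injected mid-stream failure",
            ));
        }
        // Never hand out more than the remaining byte budget.
        let cap = buf.len().min(self.remaining);
        let n = self.inner.read(&mut buf[..cap])?;
        self.remaining -= n;
        Ok(n)
    }
}

/// Drains `reader`, returning the bytes received before any error together
/// with the error itself, which is what a consumer without retries sees.
pub fn drain<R: Read>(mut reader: R) -> (Vec<u8>, Option<io::Error>) {
    let mut received = Vec::new();
    let mut buf = [0u8; 4];
    loop {
        match reader.read(&mut buf) {
            Ok(0) => return (received, None),
            Ok(n) => received.extend_from_slice(&buf[..n]),
            Err(e) => return (received, Some(e)),
        }
    }
}
```

A test built on this can assert both halves of the current behavior: the prefix that was delivered and the error that ends the stream. A real harness would inject the failure at the HTTP layer instead, but the assertions would have the same shape.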
We already have all of this set up as part of the retry tests - https://github.com/apache/arrow-rs/blob/master/object_store/src/client/retry.rs#L505. It would be a relatively straightforward enhancement to inject a failure mid-stream. Edit: As for what the API would look like, the only streaming request is get, and the corresponding logic is already extracted into GetClientExt, which is shared across all implementations, so this would be the obvious place to put any retry logic for streaming requests.
Perfect! So maybe @erratic-pattern you can make a PR with the failure scenario? It would be valuable to document the current behavior in code, even if we decide we don't have the time to fix it.
**Is your feature request related to a problem or challenge?**

I'm using `object_store` to stream large(r) files to, in this case, the body of a HTTP . I essentially do a `store.get(path).await?.into_stream()` to get the data stream.

When using it with the Azure backend, I noticed that Azure reliably closes the connection after 30 seconds. Other providers (S3) also explicitly inject errors, but keep most of the error handling in their SDKs.

I know there's some retry logic for some error cases in `object_store`, but the "connection closed while getting data as a stream" part doesn't seem to be covered. I think there should be a high-level function that retries receiving the remaining data in these error cases (verifying the etag is still the same).

**Describe alternatives you've considered**

Manually dealing with error handling and retries in all `object_store` consumers
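
The ETag check mentioned in the request can be sketched as follows. This is a std-only illustration under stated assumptions, not a proposed `object_store` API: `Part`, `ResumeError` and `get_verified` are hypothetical, and `get_range` stands in for a ranged GET that returns whatever bytes arrived before the connection ended.

```rust
/// Hypothetical response to a ranged GET: the object's current ETag and the
/// bytes received before the connection ended (possibly fewer than wanted).
pub struct Part {
    pub etag: String,
    pub bytes: Vec<u8>,
}

#[derive(Debug, PartialEq)]
pub enum ResumeError {
    /// The object was replaced between attempts, so the bytes already
    /// received cannot be safely combined with the new ones.
    EtagChanged { expected: String, found: String },
    /// The object was not fully received within the attempt budget.
    RetriesExhausted,
}

/// Fetches `total_len` bytes, resuming from the current offset after each
/// short read and refusing to continue if the ETag differs from the one
/// seen on the first attempt.
pub fn get_verified<F>(
    total_len: usize,
    max_attempts: usize,
    mut get_range: F,
) -> Result<Vec<u8>, ResumeError>
where
    F: FnMut(usize) -> Part,
{
    let mut out = Vec::with_capacity(total_len);
    let mut pinned: Option<String> = None;
    for _ in 0..max_attempts {
        let part = get_range(out.len());
        // Pin the ETag on the first response, compare on every later one.
        let expected = pinned.get_or_insert_with(|| part.etag.clone());
        if *expected != part.etag {
            return Err(ResumeError::EtagChanged {
                expected: expected.clone(),
                found: part.etag,
            });
        }
        out.extend_from_slice(&part.bytes);
        if out.len() >= total_len {
            return Ok(out);
        }
    }
    Err(ResumeError::RetriesExhausted)
}
```

Over real HTTP the comparison could instead be delegated to the server by sending the pinned ETag in an `If-Match` header on each ranged request, so that a changed object fails the precondition rather than returning mismatched bytes.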