-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't retry POSTs #101
Comments
@lognaturel the client session request methods accept an optional
Requests are retried in case where there's a connection timeout or something. When this was added 2 years ago, it wasn't considered realistic that requests would take more 30 seconds to complete, i.e. a connection timeout was a more likely scenario. Do you want to disable retries for all POSTs or just that endpoint/method (or those that accept bulk data)? |
One other thing, maybe @tobiasmcnulty knows the details - when this POST retry issue was triggered, was Central responding to any other requests? i.e was it completely saturated with these bulk entity calls? Just wondering if there's a Central issue here too - like in this case maybe it should accept the request and reply HTTP/202 immediately but process it in the background. |
The error in our case was a read timeout; i.e., the server merely hadn't finished processing the request yet. Central wasn't otherwise under high load, though it's possible our database server was, which slowed down the request processing. If Central knows its own server-side timeout and wants to send back a 202 right before the timeout is hit that might be a nice touch, but I probably wouldn't bother...the error is only occasional on our side, and like you said we can increase the client side timeout, at least, if it gets bothersome. (I still think pyodk should auto-retry only idempotent HTTP methods.) Talking this through, another helpful change might be to increase the default pyodk timeout to something larger than Central's server side timeout. Then, barring connectivity issues, the client should always get back a status code that tells it whether the request was successful, was queued, or failed. |
That makes sense. I think it's much more likely now for other endpoints like the OData endpoints. People have a lot of data now!
This feels right to me too. The risk of issues may be low but I think they'd be hard to track down and may cause subtle cascading problems. There are some POST calls that do end up being idempotent but I think it's simplest to not retry any POST and let clients handle timeouts or other errors.
It's at 120s https://github.com/getodk/central/blob/2d95a612218e7c73ee8b3e715749401f08fc00ac/files/nginx/odk.conf.template#L60 |
OK we seem pretty set on switching to the default urllib3 method list. About the timeout, it seems sensible to match Central. Happy to be corrected on this but I think timeout applies to the original request and each retry request. Each retry has a incremental sleep time which |
That all sounds right to me! |
Many of the resources in Central have no user-specified unique key. This means that running the same POST multiple times can result in multiple resources that are duplicates from the end user's perspective. For example, App Users only have a display name so there's no guard against creating 100 App Users with the same name.
This is particularly problematic with bulk Entity creation. If a user specifies a large CSV, it may take the server a little bit to process. pyodk may get a timeout. If it retries, it adds the same batch of Entities a second time.
The issue is introduced at
pyodk/pyodk/_utils/session.py
Line 54 in b2cffe3
@tobiasmcnulty has pointed out that using the default instead is likely a good fix:
https://urllib3.readthedocs.io/en/stable/reference/urllib3.util.html#urllib3.util.Retry
The text was updated successfully, but these errors were encountered: