Feature timeouts and stats #251
Open
This pull request allows us to configure separate queue and service timeouts instead of just the total timeout.
The problem it addresses is that each request is blocking, so when the client is used in a near-capacity application and request queues start to build up, it can become very inefficient.
The inefficiency in the current timeout mechanism comes from the timeout covering both the queue time and the service time. If the request queue builds up, the requests that we put on the wire are the oldest ones, which are the most likely to expire while being serviced. If one expires while being serviced, the connection is broken and re-formed and the next oldest request is taken. So if we're in a constantly overloaded state, the client will spend all its time disconnecting from and reconnecting to the server and serve no requests at all.
To avoid this, the new mechanism allows us to configure separate queue and servicing timeouts.
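To illustrate the difference, here is a minimal, deterministic sketch of the two policies in Python. It is not this client's code: the `Load` parameters, the function names, and the assumption that a request which has already expired in the queue is dropped without touching the connection are all hypothetical, chosen only to show the failure mode described above.

```python
from dataclasses import dataclass


@dataclass
class Load:
    """Hypothetical workload: one connection, FIFO queue, times in ms."""
    n_requests: int = 200
    arrival_interval: int = 10  # a new request arrives every 10 ms
    service_time: int = 15      # the server needs 15 ms per request
    reconnect_time: int = 20    # time lost re-forming a broken connection


def served_with_total_timeout(load: Load, total_timeout: int) -> int:
    """One timeout covers queue time plus service time."""
    free_at = 0  # time at which the connection can take the next request
    served = 0
    for i in range(load.n_requests):
        arrival = i * load.arrival_interval
        deadline = arrival + total_timeout
        start = max(free_at, arrival)
        if start >= deadline:
            continue  # assumed: expired while queued, connection untouched
        if start + load.service_time <= deadline:
            served += 1
            free_at = start + load.service_time
        else:
            # Expired on the wire: the connection breaks at the deadline
            # and has to be re-formed before the next request can start.
            free_at = deadline + load.reconnect_time
    return served


def served_with_split_timeouts(load: Load, queue_timeout: int,
                               service_timeout: int) -> int:
    """Queue time and service time are limited independently."""
    free_at = 0
    served = 0
    for i in range(load.n_requests):
        arrival = i * load.arrival_interval
        start = max(free_at, arrival)
        if start - arrival > queue_timeout:
            continue  # stale request dropped before it reaches the wire
        if load.service_time <= service_timeout:
            served += 1
            free_at = start + load.service_time
        else:
            free_at = start + service_timeout + load.reconnect_time
    return served


if __name__ == "__main__":
    load = Load()
    print("total timeout :", served_with_total_timeout(load, 50))
    print("split timeouts:", served_with_split_timeouts(load, 20, 30))
```

With these made-up numbers the combined timeout serves only the first 8 of 200 requests and then falls into a break-and-reconnect cycle, while the split timeouts keep serving two out of every three requests (135 of 200). The figures come from the toy model only and are not measurements of the actual client.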
This is demonstrated in overload_demo_test, where we set up a dummy server with a delay and then overload it with 200 requests. The old mechanism can only service the first 2 or 3 requests and will time out for the rest of the test, while the new mechanism will service a constant ~60% of the requests.
The old mechanism is retained and works exactly as before; backward compatibility is tested in timeout_conn_test & timeout_no_conn_test.