Failing of connection establishment and missing messages on rust-utp socket test #439
Comments
I suspect this is caused by the congestion control algorithm (and the changes brought by the
I based my work on top of @maqi's After some attempts, hours of debugging, and observation, I suspect the problem is in the networking logic, specifically in the congestion control algorithm of rust-utp. Missing packets make uTP unreliable: uTP isn't working properly over unreliable connections (which is exactly where UDP is supposed to be used). In the tests, using a custom user timeout, a feature crust uses to make concurrent reads/writes possible, increased the failure/hang rate. My newest try was to change Under the usual
When I changed the value to
With my suspicion confirmed (actually, I'd need to run more tests, but they take too long to complete and I already have something to change and test to see if the problem goes away), the next step is to study the congestion control algorithm of rust-utp/uTP and propose a solution that makes it more reliable (and maybe gives better guarantees about connection timeouts). Until a definitive solution is implemented, crust could change the Also, we could implement solution
So, I found out that one of our PRs broke µTP's LEDBAT. Previously rust-utp would do a read without a user timeout and would block until its own (congestion) timeout was reached. So the algorithm was coded in such a way that, when that timeout was reached, it would resend the packet at the correct time, as specified by LEDBAT. One of the properties of LEDBAT is that it "backs off" and slows down exactly when network usage is high, which is exactly the condition we create to expose the failure. Also, the maximum number of retries is reached much sooner with small timeouts. My idea to fix the problem is to refactor the code so that we compute the elapsed time ourselves. This means storing the "time point" at which the last packet was resent (instead of relying on the wrong assumption that the right amount of time has passed by the time the following line runs) and moving lines around so that, if we're just coming back after a user timeout, we only do a read (and nothing else!) to wait longer for the packet.
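To make the idea concrete, here is a minimal sketch of that bookkeeping. It is not rust-utp's actual code: `RetransmitTimer`, `Action`, and `on_read_timeout` are hypothetical names, and the congestion timeout is kept fixed for simplicity, whereas the real one changes over the life of the connection.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper (not part of rust-utp's API): remembers when the
/// last packet was (re)sent, so a short user timeout cannot trigger a
/// premature resend or burn through the retry budget.
struct RetransmitTimer {
    last_sent: Instant,
    congestion_timeout: Duration, // assumed fixed here for simplicity
    retries: u32,
    max_retries: u32,
}

#[derive(Debug, PartialEq)]
enum Action {
    /// Congestion timeout not yet elapsed: only read again, for at most
    /// the remaining duration.
    KeepWaiting(Duration),
    /// Congestion timeout elapsed: resend the packet now.
    Resend,
    /// Retry budget exhausted: report a connection timeout.
    GiveUp,
}

impl RetransmitTimer {
    fn new(now: Instant, congestion_timeout: Duration, max_retries: u32) -> Self {
        RetransmitTimer {
            last_sent: now,
            congestion_timeout,
            retries: 0,
            max_retries,
        }
    }

    /// Called whenever a read returns without data, e.g. after a user timeout.
    fn on_read_timeout(&mut self, now: Instant) -> Action {
        // Compute the elapsed time ourselves instead of assuming that a
        // returning read means the congestion timeout has passed.
        let elapsed = now.saturating_duration_since(self.last_sent);
        if elapsed < self.congestion_timeout {
            return Action::KeepWaiting(self.congestion_timeout - elapsed);
        }
        if self.retries >= self.max_retries {
            return Action::GiveUp;
        }
        self.retries += 1;
        self.last_sent = now; // store the time point of this resend
        Action::Resend
    }
}
```

With a 500 ms congestion timeout and a 100 ms user timeout, the first four wake-ups would yield `KeepWaiting` and leave `retries` untouched. Only the wake-up at or after 500 ms counts as a retry.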
My changes have more or less been ready for some time already, but when I try to test against I've been debugging what's wrong during this time. Actually I somehow might have broken the
It should be fixed now with the latest version of the Upstream project's CI will time out, but if you let the test run a little longer, tests will pass.
It's merged on
Gonna close this issue as the rust-utp reliability problem was solved. We should open another issue to track the Windows hanging issue.
As described in detail in the Rust-UTP PR maidsafe-archive/rust-utp#8, the test shows a failure rate of around 0.005% when the socket has no timeout and 0.25% when the socket has a timeout.
According to the work in Crust PR #437, the test shows that the crust-level failure rate is about the same as the rust-utp rate when the socket has a timeout.