Failing of connection establishment and missing messages on rust-utp socket test #439

maqi · 2015-12-02T10:20:04Z

As detailed described in the Rust-UTP PR maidsafe-archive/rust-utp#8

the test proves a failing rate at around : 0.005% when socket without timeout and 0.25% when socket with timeout.

according to the work of Crust PR #437 , the test proves the crust level failing rate is around the same as the rust-utp when socket with timeout.

vinipsmaker · 2015-12-02T19:37:41Z

I'm suspecting this is being caused by the congestion control algorithm (and the changes brought by the timeout branch). I'm going to investigate it in detail.

vinipsmaker · 2015-12-04T17:23:23Z

I based my work on top of @maqi's test_network_utp branch.

After some attempts, debugging hours, observations and so on, I'm suspecting the problem is on the logic of the network. I believe the problem is on the congestion control algorithm from rust-utp. Missing packets make uTP unreliable. uTP isn't working properly under unreliable connections (which is exactly where UDP is supposed to be used). On the tests done, it was observed that the use of custom user timeout -- which is a feature used by crust to make concurrent reads/writes possible -- would increase the failure/hang rate.

My newest try was to change CHECK_FOR_NEW_WRITES_INTERVAL_MS to support my claim/suspicion. Making the check more "granular" would increase the chance to not be reading when the packet was sent, which would cause a "lost packet". The "lost packet" is exactly the reason why I believe the test is failing.

Under the usual 50i64, the rates I got from my machine were:

ok	err
6	15

When I changed the value to 200i64, the stats were:

ok	err
11	10

With my suspicion confirmed (actually, I'd need to run more tests, but they take too long to complete and I already have something to change and test to see if the problem goes away), the next step is to study the congestion control algorithm of rust-utp/uTP and propose a solution to make it more reliable (and maybe give better guarantees on connection timed out). Until a definitive solution is implemented, crust could change the CHECK_FOR_NEW_WRITES_INTERVAL_MS value to decrease the rate failure and use ping/pong messages to detect connection loss itself.

Also, we could implement solution #2 from meqif/rust-utp#4 (comment), which would make the design more reliable.

vinipsmaker · 2015-12-16T14:08:12Z

So, I found out that one PR of ours broke µTP LEDBAT.

Previously rust-utp would do a read without a timeout and would block until this timeout would be reached. So the algorithm was coded in such a way that, when the timeout was reached, it'd resend the packet at the correct time, as specified per LEDBAT. And one of the properties from LEDBAT is that it'd "back off" and slowdown exactly when the network usage is high, which is exactly what we're doing the expose the failure. Also, the maximum number of retries would be reached way sooner with small timeouts.

So, my idea to fix the problem is to refactor the code in such a way that we'd compute the passed time ourselves. This means storing the "time point" the last packet was resent (instead of relying on the wrong assumption that enough-time/the-right-amount-of-time has passed on the line following) and moving lines around such that if we're just coming back after a user timeout happened, we'll only do a read (and nothing else!) to wait for the packet for more time.

vinipsmaker · 2015-12-24T09:32:30Z

Kind of my changes are ready for some time already, but when I try to test against test_network_utp from modified (with changes for utp_wrapper reverted) #442, no messages are received at all (actual error being thread 'Service::listen' panicked at 'Error { repr: Custom(Custom { kind: InvalidData, error: StringError("Failed to decode CBOR") }) }', src/state.rs:165 and src/state.rs:165 being trans.receiver.receive_handshake().unwrap()).

I've been debugging what's wrong during this time. Actually I somehow might have broken the connect function from rust-utp (which calls recv, which is the function I needed to change).

vinipsmaker · 2015-12-28T10:09:33Z

I've been debugging what's wrong during this time. Actually I somehow might have broken the connect function from rust-utp (which calls recv, which is the function I needed to change).

It should be fixed now with the latest version of the set-read-timeout patch:

Upstream project's CI will timeout, but if you let the test run a little longer, tests will pass.

vinipsmaker · 2015-12-28T19:03:19Z

It's merged on master now. I'll create a proper PR with the Windows build fix that was suggested earlier among our core devs and then:

Update the network test.
Confirm the Windows hanging issue.
Find an alternative fix that doesn't break current behaviour/usage-patterns.

vinipsmaker · 2015-12-28T21:05:25Z

Gonna close this issue as the rust-utp reliability problem was solved. We should open another issue to track Windows hanging issue.

maqi assigned vinipsmaker Dec 2, 2015

maqi mentioned this issue Dec 2, 2015

Newly added test_network_utp hogs CPU and fails to finish #412

Closed

vinipsmaker mentioned this issue Dec 10, 2015

Improve uTP #445

Closed

vinipsmaker mentioned this issue Dec 28, 2015

add read timeout meqif/rust-utp#10

Open

vinipsmaker closed this as completed Dec 28, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Failing of connection establishment and missing messages on rust-utp socket test #439

Failing of connection establishment and missing messages on rust-utp socket test #439

maqi commented Dec 2, 2015

vinipsmaker commented Dec 2, 2015

vinipsmaker commented Dec 4, 2015

vinipsmaker commented Dec 16, 2015

vinipsmaker commented Dec 24, 2015

vinipsmaker commented Dec 28, 2015

vinipsmaker commented Dec 28, 2015

vinipsmaker commented Dec 28, 2015

Failing of connection establishment and missing messages on rust-utp socket test #439

Failing of connection establishment and missing messages on rust-utp socket test #439

Comments

maqi commented Dec 2, 2015

vinipsmaker commented Dec 2, 2015

vinipsmaker commented Dec 4, 2015

vinipsmaker commented Dec 16, 2015

vinipsmaker commented Dec 24, 2015

vinipsmaker commented Dec 28, 2015

vinipsmaker commented Dec 28, 2015

vinipsmaker commented Dec 28, 2015