Take into account network latency when syncing #55
plugins/net_plugin/net_plugin.cpp
@@ -1642,15 +1644,25 @@ namespace eosio {

sync_reset_lib_num(c);

auto current_time_ns = std::chrono::duration_cast<std::chrono::nanoseconds>(std::chrono::system_clock::now().time_since_epoch()).count();
auto network_latency_ns = current_time_ns - msg.time; // net latency in nanoseconds
What if this is negative?
Negative would mean time skew between the nodes; should just make it 0 if < 0, I guess.
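A minimal sketch of the clamp suggested above, assuming msg_time_ns is the sender's std::chrono::system_clock timestamp in nanoseconds since the epoch (as msg.time is in the diff). The helper name clamped_latency_ns is hypothetical and not part of net_plugin. A negative difference can only come from clock skew, so it is treated as zero latency.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper: one-way latency estimate in nanoseconds.
// msg_time_ns is the sender's system_clock time (ns since epoch) carried in the message.
// A negative difference means the sender's clock is ahead of ours (clock skew),
// so the estimate is clamped to 0 instead of going negative.
inline int64_t clamped_latency_ns(int64_t now_ns, int64_t msg_time_ns) {
    return std::max<int64_t>(0, now_ns - msg_time_ns);
}

// Convenience overload that reads the local clock the same way the diff does.
inline int64_t clamped_latency_ns(int64_t msg_time_ns) {
    const int64_t now_ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
        std::chrono::system_clock::now().time_since_epoch()).count();
    return clamped_latency_ns(now_ns, msg_time_ns);
}
```

Clamping keeps later arithmetic (such as dividing by the block interval and casting to an unsigned type) from wrapping around, at the cost of hiding how large the skew actually is.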
That makes sense if the clock skew is known to be small... but you removed the check for skew. If the clock skew is close to the latency, then one side will see double latency and the other will see 0 latency.
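A small worked sketch of the effect described above, using made-up numbers: if the true one-way latency is L and node B's clock runs S ahead of node A's, then B computes L + S for a message from A, and A computes L - S for a message from B. With S close to L, one side sees roughly double the latency and the other sees roughly zero. The struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Latency each side would compute from a one-way timestamp, given the true
// one-way latency and the skew of B's clock relative to A's (B ahead => skew > 0).
struct MeasuredLatency {
    int64_t a_to_b_ns;  // computed by B: B's receive time - A's send time
    int64_t b_to_a_ns;  // computed by A: A's receive time - B's send time
};

inline MeasuredLatency measured_one_way(int64_t true_latency_ns, int64_t skew_ns) {
    // B's clock reads true_time + skew, A's clock reads true_time.
    return {true_latency_ns + skew_ns, true_latency_ns - skew_ns};
}
```

For example, measured_one_way(100, 100) gives 200 in one direction and 0 in the other, even though both paths really take 100.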
The check for skew that was removed never worked (see the comment I just added to the PR for that section of code). I'm open to suggestions on alternatives, but I don't think there is any way to improve that, right?
It would have to be based on RTT (which can be measured independent of clock skew) rather than one-way latency.
Yes, that would work, but it would require a new protocol version and a new round-trip (RT) message.
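For reference, a sketch of what an RTT-based estimate could compute, using the standard NTP-style four-timestamp calculation (as in RFC 5905). This is not what net_plugin does: the timestamps t1..t4 and the names below are hypothetical and would need the new protocol message discussed above. Each RTT term subtracts two readings of the same clock, so skew cancels; the offset estimate additionally assumes the path is symmetric.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical round-trip exchange, NTP style:
//   t1: request sent     (local clock)
//   t2: request received (peer clock)
//   t3: reply sent       (peer clock)
//   t4: reply received   (local clock)
struct RttSample {
    int64_t rtt_ns;     // network round trip, excluding peer processing time
    int64_t offset_ns;  // estimated peer clock minus local clock
};

inline RttSample rtt_sample(int64_t t1, int64_t t2, int64_t t3, int64_t t4) {
    RttSample s{};
    // (t4 - t1) uses only the local clock, (t3 - t2) only the peer clock,
    // so clock skew cancels out of the round-trip time.
    s.rtt_ns = (t4 - t1) - (t3 - t2);
    // Offset estimate; exact only if latency is equal in both directions.
    s.offset_ns = ((t2 - t1) + (t3 - t4)) / 2;
    return s;
}
```

With a true one-way latency of 100, a peer clock 100 ahead, and 10 of processing time, the samples are t1 = 0, t2 = 200, t3 = 210, t4 = 210, which yields rtt_ns = 200 (so about 100 one way) and offset_ns = 100, recovering the skew that a one-way timestamp cannot see.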
I think it's probably fine to assume low clock skew for now, since we've survived so long without a working check for clock skew. This PR doesn't make things worse in that regard.
("peer", msg.p2p_address)("time", "1 second")); // TODO Add to_variant for std::chrono::system_clock::duration
return false;
}
Adding a note to this PR for future documentation of why this was removed. I removed this code because it could never have worked: time is in microseconds whereas msg.time is in nanoseconds, so time - msg.time is always negative.
Also, there is no way to do what this was trying to do. You don't know how much network latency is involved, so you have no idea how much clock skew is involved.
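A minimal illustration of the unit mismatch described above. The function names are hypothetical; only the microseconds-versus-nanoseconds issue comes from the note. Subtracting raw counts in different units goes negative for any realistic timestamps, while keeping both values as std::chrono durations lets the library convert to a common unit before subtracting.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Subtracting raw counts in different units: a microsecond count is ~1000x
// smaller than a nanosecond count for the same instant, so the result is
// always negative (the bug described above).
inline int64_t buggy_diff(int64_t now_us, int64_t msg_time_ns) {
    return now_us - msg_time_ns;
}

// Keeping both values as std::chrono durations: the subtraction is done in
// their common type (nanoseconds), so the units cannot be mixed up.
inline std::chrono::nanoseconds typed_diff(std::chrono::microseconds now,
                                           std::chrono::nanoseconds msg_time) {
    return now - msg_time;
}
```

With both timestamps about 5 seconds after the epoch and the message sent 1 microsecond earlier, buggy_diff returns a large negative number while typed_diff returns 1000 ns. Passing typed durations around, rather than .count() values, is one way to make this class of bug hard to write.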
plugins/net_plugin/net_plugin.cpp
}
// number of blocks syncing node is behind from a peer node
uint32_t nblk_behind_by_net_latency = static_cast<uint32_t>(network_latency_ns / block_interval_ns);
// Multiplied by 2 to compensate the time it takes for message to reach peer node, and plus 1 to compensate for integer division truncation
If we change it to "to reach back to that peer node", I think the multiplication by 2 will be clearer.
Done
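A sketch of the block-count arithmetic discussed in this thread, assuming the final value is 2 × (latency / block interval) + 1 as the code comment describes; the line that combines them is not shown in the excerpt, so that formula is an assumption. The helper name is hypothetical, and the example assumes a 500 ms block interval.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper mirroring the diff: how many blocks a syncing node is
// behind because of network latency, assuming the "2x + 1" rule in the comment.
inline uint32_t blocks_behind_by_latency(int64_t network_latency_ns, int64_t block_interval_ns) {
    if (network_latency_ns < 0) network_latency_ns = 0;  // clamp skew-induced negatives
    // Blocks produced during the one-way latency (integer division truncates).
    const auto one_way = static_cast<uint32_t>(network_latency_ns / block_interval_ns);
    // x2: the request must also reach back to the peer node; +1: covers truncation.
    return 2 * one_way + 1;
}
```

With a 500 ms interval, 1.2 s of measured latency gives 2 blocks one way and 5 blocks after the adjustment, while anything under 500 ms still yields 1, so a peer is never considered exactly caught up just because of truncation.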
Take into account network latency when syncing from a node to avoid getting stuck in an always lib catchup state.
p2p_high_latency_test.py
) as it requires either iproute-tc or iproute2 installed, depending on platform.
Co-authored-by: Farhad Shahabi <[email protected]>