bug: Future operation cancelled flooding logs #2780
Comments
Hello @Ivansete-status, I'm back with more news about this. I just tested this again. Maybe this information helps:
If you can provide more information about why this log line appears, it would be helpful.
Thanks for the update! We need to pinpoint the actual commit that caused that issue.
After some debugging thanks to @Ivansete-status we found out that:
After verifying with @AlbertoSoutullo, it seems that merging #2823 and #2824 fixed the issue 🥳 Closing it :)) The fixes increased the time to get a stable mesh, which makes sense as we used to establish connections in a very aggressive way. We need to calibrate how many
Problem
From DST, we are testing version `v0.28`, and we noticed a "regression" in how our simulations become stable.
To summarize the simulation: we use 45 nWaku nodes as bootstrap nodes, and then 50 normal nWaku nodes. Each normal nWaku node gets 3 `--discv5-bootstrap-node` values chosen at random. They also use `--relay-peer-exchange`.
Before we inject traffic into the simulation, we make sure that every node is "healthy". We obtain this information from `libp2p_gossipsub_healthy_peers_topics`. This returns the number of topics with peers in the range `d_low < nº peers < d_high` (in our case, just 1).
In version `v0.27`, everything is OK. The aforementioned scenario consistently reaches a healthy state in less than 1 minute.
In version `v0.28`, this behavior is very inconsistent. It can vary from a couple of minutes to more than 10 minutes (cancelled at that point).
The only difference between these 2 cases is the nWaku version; everything else is exactly the same.
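For anyone who wants to replicate the health check, here is a minimal sketch of how the `libp2p_gossipsub_healthy_peers_topics` value could be read from a node's metrics output. It assumes the metrics are available as Prometheus text exposition format; how that text is fetched is not covered here, and the function names `healthy_topics` and `is_healthy` are hypothetical helpers, not part of nWaku or the DST tooling.

```python
import re
from typing import Optional

# Metric name taken from the issue description.
METRIC = "libp2p_gossipsub_healthy_peers_topics"

# A Prometheus sample line looks like: name{optional labels} value [timestamp]
SAMPLE_RE = re.compile(r"^([A-Za-z_:][A-Za-z0-9_:]*)(\{.*\})?\s+(\S+)")


def healthy_topics(metrics_text: str, metric: str = METRIC) -> Optional[float]:
    """Sum all samples of `metric` in the text; return None if it is absent."""
    total = None
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match and match.group(1) == metric:
            total = (total or 0.0) + float(match.group(3))
    return total


def is_healthy(metrics_text: str, expected_topics: int = 1) -> bool:
    """Assumed criterion: at least `expected_topics` topics are healthy."""
    value = healthy_topics(metrics_text)
    return value is not None and value >= expected_topics
```

With a single topic, as in our case, a node would count as healthy once the metric reaches 1.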
What we found so far is that the following line is very present in the nodes that can't reach a healthy state:
Dialing canceled topics="libp2p dialer" tid=7 file=dialer.nim:67 err="Future operation cancelled!"
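To quantify how often this line shows up per node, a small script along these lines could help. It is only a sketch: the regular expression is derived from the single log line quoted above, so other log layouts may need adjustments, and `count_cancelled_dials` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches lines such as:
# Dialing canceled topics="libp2p dialer" tid=7 file=dialer.nim:67 err="Future operation cancelled!"
CANCEL_RE = re.compile(r'Dialing canceled\b.*?file=(\S+).*?err="([^"]*)"')


def count_cancelled_dials(log_lines):
    """Count 'Dialing canceled' entries, grouped by (source location, error text)."""
    counts = Counter()
    for line in log_lines:
        match = CANCEL_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


# Example usage against a node log file:
# with open("nodes-11_waku.log") as log_file:
#     print(count_cancelled_dials(log_file))
```

Comparing these counts between nodes that reach a healthy state and nodes that do not may help narrow down the cause.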
Impact
It considerably slows down the setup time of the simulations for DST. Also, since this was not happening in the previous version, we assume there is unexpected behavior or a possible bug that might be worth looking into.
To reproduce
`libp2p_gossipsub_healthy_peers_topics` metric varies.
Expected behavior
No connections should be cancelled if there is no interruption.
Screenshots/logs
This log is an example of v0.28:
nodes-11_waku.log
nwaku version/commit hash
v0.28.0-rc.0