Deadlock on simultaneous nodeup #91
Comments
Please see https://github.com/kzemek/swarm-deadlock-repro for a reliable reproduction of the issue.
These are the logs produced with
I've also tried manipulating the choice of sync node in the hope that it would resolve the deadlock: kzemek@28516d9. But instead, the states of the All nodes tried to sync to
Seeing this issue as well. When I revert to version 3.1 I don't see any problems with deadlocking on startup.
We've been having this issue as well, and I'm pretty sure we also had it in 3.3.1. In our case we observed the following scenario. Let's say we have nodes A, B and C, and the following happens: All nodes are now in So far we have resolved this with a state timeout in syncing, which stops the syncing and tries another node. It seems to work fine; however, this approach introduced a few complications and made things a bit more complex. So a simpler approach could be to drop the pending_sync_request strategy and just decline sync requests while syncing.
I'm having an issue similar to #60, reproducible very often when I bring up containers with the app at roughly the same time. Looks like each node is waiting for another one, and they're perpetually stuck in the :syncing state. Here are the :sys.get_status(Swarm.Tracker) results from my 5 nodes: https://pastebin.com/EYLg6YNE . No custom options set, all default; clustering with the libcluster gossip strategy.
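To make "each node is waiting for another one" concrete, here is a minimal sketch in Python of checking a snapshot for a wait-for cycle. It is a toy model, not Swarm code: the `find_wait_cycle` helper, the node names, and the idea of a ready-made "node -> peer it is waiting on" map are all assumptions for illustration. In practice that map would have to be assembled by hand from the status dump of each node.

```python
def find_wait_cycle(waiting_on):
    """Return the nodes that form a wait-for cycle, or None if there is none.

    waiting_on maps each node that is still syncing to the single peer
    it is waiting for (hypothetical structure, assembled by hand).
    """
    for start in waiting_on:
        path = []        # nodes visited on this walk, in order
        position = {}    # node -> index in path
        node = start
        # Follow the "waits for" edges until we leave the map or revisit a node.
        while node in waiting_on and node not in position:
            position[node] = len(path)
            path.append(node)
            node = waiting_on[node]
        if node in position:
            # We came back to a node on the current walk: that tail is the cycle.
            return path[position[node]:]
    return None


# Hypothetical snapshot: three nodes that came up at the same time,
# each one waiting for the next to answer its sync request.
snapshot = {"node1": "node2", "node2": "node3", "node3": "node1"}
print(find_wait_cycle(snapshot))  # ['node1', 'node2', 'node3']
```

If the function returns a cycle, no node in it can finish syncing unless something breaks the chain, which is what the state timeout and the "decline while syncing" ideas from the comments are aiming at.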