TCP connect timeout errors #183
Of course, we're happy to provide any other info needed, but at this point we're not sure how to find more useful info.
More info: one of our infra folks has pointed out that we have a backup server in our replica set, which is set unreadable and isolated from network ingress; this could explain the We are thinking it could be the case that this failure is causing the entire pool to shut down, giving the error we get when we try to query from the server. Does this sound like a likely cause for our problems? If so, is there some way we can exclude a host from the topology?
The backup server that cannot be reached is most likely the cause of the tcp connect errors. However, that should not take down the entire application. It should just keep trying to make a connection to that server forever (we should probably change that behaviour in some way). As for the checkout issue, I am really unsure why that is happening. I will try to come up with some tests for you to run in order to figure out this issue.
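One way to confirm which replica set member is unreachable is to probe each host with a plain TCP connect and a timeout. The sketch below is only an illustration under stated assumptions: `ReachabilityProbe` and `probe/3` are hypothetical names and are not part of the mongodb driver; the only library call used is `:gen_tcp.connect/4` from Erlang's standard library.

```elixir
defmodule ReachabilityProbe do
  @moduledoc "Hypothetical diagnostic helper, not part of the mongodb driver."

  # Attempts a plain TCP connection to host:port and gives up after timeout_ms.
  # Returns :ok when the port accepts the connection, {:error, reason} otherwise.
  def probe(host, port, timeout_ms \\ 5_000) do
    opts = [:binary, active: false]

    case :gen_tcp.connect(String.to_charlist(host), port, opts, timeout_ms) do
      {:ok, socket} ->
        # We only care about reachability, so close the socket right away.
        :gen_tcp.close(socket)
        :ok

      {:error, reason} ->
        {:error, reason}
    end
  end
end
```

Running `ReachabilityProbe.probe/3` against each member of the replica set from the application host should show whether the isolated backup server is the one producing the connect errors; a member that silently drops packets would be expected to come back as an error tuple such as `{:error, :timeout}`.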
Would you be able to test my
We most definitely will give your branch a go, thanks @ankhers. We've managed to get a local setup that replicates the problem with a bit of finagling of mongos inside docker composes, so we'll test your branch against that as soon as we're able.
Is there any chance you can share those docker files so that I could do some additional testing?
One of the devs on my team got this sorted (the docker networking setup, not the issue as a whole); I haven't looked at his notes yet, but I'll ask if he can comment on here with some details for you (AFAIK it was a bit of a pain to replicate, I think he ended up going with hacking
Sorry for taking so long to reply. I have written a gist on how to run the containers with sample https://gist.github.com/vgunawan/4588cf26ca84ba57e39008d94dae032a Follow the instructions.md file; it should give you some idea of what to do.
Did you end up testing that branch with your setup?
I'm having the same issue, @ankhers. I've been investigating and debugging the code, and can add more info: this happens to me when using a remote public URI, but once the replica set is updated my original host is removed from the Topology state and the Regarding this problem, I think the mongo spec explains the problem and the expected behaviour here: does it sound good? PS: I tested with the
I had the same issue @ankhers, and I think I resolved it by changing the options for There are two (actually three) timeout values:
The problem is that if connecting to MongoDB times out (the timeout value is 5000), checking out a connection from the pool times out as well (its timeout value is also 5000). A checkout timeout then exits the process (https://github.com/elixir-ecto/db_connection/blob/master/lib/db_connection/connection.ex#L54). After times crash, the So we can change the timeouts for connecting to MongoDB and for checking out a connection from the pool, like this:
timeout: 5000,
pool_timeout: 8000
Actually, we should use
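As a rough illustration of the suggestion above, the two values could be passed in the keyword list given to the driver when it is started. This is a sketch under assumptions: the `:timeout` and `:pool_timeout` keys and their values are taken from the comment above, while `:name` and `:database` are placeholders, and whether your driver version honours these exact keys should be checked against its documentation.

```elixir
# Sketch only. :timeout and :pool_timeout come from the comment above;
# :name and :database are placeholder values for illustration.
mongo_opts = [
  name: :mongo,
  database: "my_app",
  # Connect timeout, as described in the comment above.
  timeout: 5_000,
  # Pool checkout timeout, set higher so a slow connect does not
  # time out the checkout at the same moment.
  pool_timeout: 8_000
]

# Assumed usage in a supervision tree (not executed here):
# children = [worker(Mongo, [mongo_opts])]
```

The point of the larger `pool_timeout` is only to stop both timers from expiring together; the right values depend on your network.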
Just to keep on top of this. We added the
Hey, not 100% sure, but I'm having a similar error and must report that in my case it had everything to do with internal (GCE) domain name resolution: replacing local names with IPs (which I never did with components on other platforms) totally solved my "timeout" problem. P.S. I spoke too soon about "totally solving" the problem. I really have no idea; it "just works" now (except when it doesn't, and fails with a timeout).
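If name resolution is a suspect, it can help to time the lookup separately from the connect. The snippet below is a minimal sketch, assuming the driver ultimately resolves host names through the Erlang resolver; `ResolveTimer` and `time_lookup/1` are hypothetical names, and only `:timer.tc/1` and `:inet.getaddr/2` from Erlang's standard library are used.

```elixir
defmodule ResolveTimer do
  @moduledoc "Hypothetical diagnostic helper, not part of the mongodb driver."

  # Resolves host to an IPv4 address and reports how long the lookup took.
  # Returns {elapsed_ms, {:ok, ip_tuple}} or {elapsed_ms, {:error, reason}}.
  def time_lookup(host) do
    {micros, result} =
      :timer.tc(fn -> :inet.getaddr(String.to_charlist(host), :inet) end)

    {div(micros, 1_000), result}
  end
end
```

A lookup that takes a noticeable share of the 5000 ms connect timeout, or that fails intermittently, would be consistent with the behaviour described above; comparing the result for the internal name against the raw IP is a quick check.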
In my case I solved the issue by passing this explicit option: Should we document the options better? I was connecting remotely to an AWS MongoDB and it was resolving the internal IPs, giving errors; I just wanted to connect to my single server's public DNS name.
I am getting the same error after I upgraded to v5.0.x; the previous version was working fine.
Would anyone be willing to give me access to a database that is having this issue? I do not think there is much I can do if I am unable to see the issue firsthand.
I'm just going to leave this here. Someone asked a question on the elixir forum about this issue. I'm adding it here in case we are able to track it down from that.
We're having issues with connections to our prod cluster. Sometimes a query will work, but more often than not, it will throw:
when called directly from the server. We're not clear on what process it is saying is not alive.
Our logs show tcp connect errors that look like:
and repeat indefinitely.
We're starting mongo in our supervision tree like:
With config:
We are not having any problems on our staging server which is using a standalone server.
We're using the current release, 0.4.3.
Please help!