Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option to close idle connections for dead nodes #1507

Draft
wants to merge 5 commits into
base: release-branch.v7
Choose a base branch
from

Conversation

olivere
Copy link
Owner

@olivere olivere commented Jul 8, 2021

This commit adds a configuration option SetCloseIdleConnections.
The effect of enabling it is that whenever the Client finds a
dead node, it will call CloseIdleConnections on the underlying HTTP
transport.

This is useful for e.g. AWS Elasticsearch Service. When AWS ES
reconfigures the cluster, it may change the underlying IP addresses
while keeping the DNS entry stable. If the Client would not close idle
connections, the underlying HTTP client would re-use existing HTTP
connections and use the old IP addresses. See #1091 for a discussion of
this problem.

The commit also illustrates how to connect to an AWS ES cluster in the
recipes in recipes/aws-mapping-v4 and recipts/aws-es-client.
See the ConnectToAWS method for a blueprint of how to connect to an
AWS ES cluster.

See #1091

This commit adds a configuration option `SetCloseIdleConnections` to a
client. The effect of enabling it is that whenever the Client finds a
dead node, it will call `CloseIdleConnections` on the underlying HTTP
transport.

This is useful for.e.g. AWS Elasticsearch Service. When AWS ES
reconfigures the cluster, it may change the underlying IP addresses
while keeping the DNS entry stable. If the Client would _not_ close idle
connections, the underlying HTTP client would re-use existing HTTP
connections and use the old IP addresses. See #1091 for a discussion of
this problem.

The commit also illustrates how to connect to an AWS ES cluster in the
recipes in
[`recipes/aws-mapping-v4`](https://github.com/olivere/elastic/tree/release-branch.v7/recipes/aws-mapping-v4)
and
[`recipts/aws-es-client`](https://github.com/olivere/elastic/tree/release-branch.v7/recipes/aws-es-client).
See the `ConnectToAWS` method for a blueprint of how to connect to an
AWS ES cluster.

See #1091
@chrisharrisonkiwi
Copy link

Nice!
This will simplify my implementation with an AWS cluster.
Thanks for the solution 👍

@olivere
Copy link
Owner Author

olivere commented Jul 9, 2021

Nice!
This will simplify my implementation with an AWS cluster.
Thanks for the solution 👍

Do you actually have the chance to test it?

@chrisharrisonkiwi
Copy link

chrisharrisonkiwi commented Jul 20, 2021

Yes!
So the interesting part here is when I test it I'm using the cluster health check to determine how many nodes are available.
If zero, i close idle connections.
It works! I see my nodes immediately afterward.

Now the bad news.
The /_cluster/health/{index} cluster health check is usually a nice and fast endpoint.
Until the nodes become unreachable.
The very next health check once that happens is slow. Really slow.

Here's my log lines...
{"time":"2021-07-20T03:53:08.446041591Z","message":"Checking cluster health"}
{"time":"2021-07-20T04:05:07.490156019Z","message":"Nodes count","NodesFound":0}
{"time":"2021-07-20T04:05:07.490174537Z","message":"Closing Idle ES node Connections"}
{"time":"2021-07-20T04:05:07.490226072Z","message":"Finished closing Idle ES node Connections"}
{"time":"2021-07-20T04:05:07.490232218Z","message":"Checking cluster health again"}
{"time":"2021-07-20T04:05:07.567672918Z","message":"Nodes count","NodesFound":1}

12 minutes to do a health check is terrible.

I'm going to add a timeout like health.Timeout("5s") and on timeout failure assume it's the nodes problem.

But the good news at least, is closeIdleConnections sorts it out fast.

@olivere
Copy link
Owner Author

olivere commented Jul 21, 2021

@chrisharrisonkiwi I'm confused. I tried to put the automatic connection reset (by means of closing idle connections) into the client. So why do you still do it in your code?

As for any request, you can use the context to cancel early, like so:

ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()
res, err := service.Do(ctx, ...)
...

@chrisharrisonkiwi
Copy link

@olivere Well now I'm confused also. Has this pull request been merged already?
If not, what is this PR for?

@chrisharrisonkiwi
Copy link

chrisharrisonkiwi commented Jul 22, 2021

Spent some time checking the closeIdleConnections technique once the cluster health timeout took place.
It ended up not refreshing the connection to nodes instantly.

I ended up re-generating my ES client on cluster health timeout.
Not as pretty as I would like, but it was able to recover my service from detection to recovery in under 4 seconds.

I'll keep watching this PR for now in case more detail on CloseIdleConnections comes to hand.

@olivere
Copy link
Owner Author

olivere commented Jul 22, 2021

@olivere Well now I'm confused also. Has this pull request been merged already?
If not, what is this PR for?

No, this PR is not merged yet. I was asking for anyone to test this PR (besides me). I don't have too much experience with running an Elasticsearch cluster on AWS, so I was hoping for a few more thumbs up.

This PR closes idle connections, automatically, once it finds a "dead" node during any of its requests. To enable (and test) this behaviour, you need to check out this PR and use it instead of the official github.com/olivere/elastic/v7 7.0.x. The easiest thing to do so is probably to git-checkout this PR and use a replace in your go.mod to point to the code in this PR, as described here.

To enable the feature of automatically closing idle connections, one also needs to pass elastic.SetCloseIdleConnections(true) as one of the options when creating a new Client. For AWS, I found that the best you can probably do is to to create a "dumb" client which ensures that NS lookups happen once in a while:

client, err := elastic.NewClient(
   elastic.SetSniff(false),
   elastic.SetHealthcheck(false),
   elastic.SetCloseIdleConnections(true),
   ...,
)

@notque
Copy link

notque commented Jul 1, 2022

I use the dumb client to get around it as well

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants