High CPU on forwarder when the server disconnects quickly #535

Closed
jstangroome opened this issue Oct 5, 2015 · 1 comment

@jstangroome

I have ~100 logstash-forwarder 0.4.0 clients sending to Logstash 1.5.4 with logstash-input-lumberjack 1.0.5 behind an AWS ELB.

When Logstash is unavailable (e.g. it's restarting or has died), the ELB continues to accept connections from logstash-forwarder clients but closes each connection once it realises it cannot proxy to Logstash.

This shows up in the client logs as:

```
2015/10/05 09:40:40.943298 Connecting to [x.y.a.b]:5043 (the-elb)
2015/10/05 09:40:40.952560 Connected to x.y.a.b
2015/10/05 09:40:40.952692 Read error looking for ack: EOF
2015/10/05 09:40:40.952738 Setting trusted CA from file: /etc/logstash-forwarder/the-elb.pem
2015/10/05 09:40:40.953257 Connecting to [x.y.c.d]:5043 (the-elb)
2015/10/05 09:40:40.959790 Connected to x.y.c.d
2015/10/05 09:40:40.959946 Read error looking for ack: EOF
2015/10/05 09:40:40.959992 Setting trusted CA from file: /etc/logstash-forwarder/the-elb.pem
2015/10/05 09:40:40.960627 Connecting to [x.y.a.b]:5043 (the-elb)
2015/10/05 09:40:40.970599 Connected to x.y.a.b
2015/10/05 09:40:40.970704 Read error looking for ack: EOF
2015/10/05 09:40:40.970749 Setting trusted CA from file: /etc/logstash-forwarder/the-elb.pem
2015/10/05 09:40:40.971342 Connecting to [x.y.c.d]:5043 (the-elb)
2015/10/05 09:40:40.977973 Connected to x.y.c.d
```

This is essentially the same behaviour as described in issue #293, but I'm less concerned about the cause and much more concerned about the failure mode.

As the timestamps in the log snippet above show, logstash-forwarder enters a tight retry loop in this failure mode, starting a new connection attempt roughly every 7 to 11 ms, which drives CPU usage on the client very high. Left long enough, this causes a cascading failure of other processes competing for CPU on the same machine as logstash-forwarder.

For now I will try to mitigate this with nice(1), but logstash-forwarder should probably limit how aggressively it retries.

There are several places in the current code where time.Sleep() is called after a communication failure; a similar sleep probably needs to be added to the "Read error looking for ack" code path too.

Uses of time.Sleep():
https://github.com/elastic/logstash-forwarder/blob/master/publisher1.go#L65
https://github.com/elastic/logstash-forwarder/blob/master/publisher1.go#L181
https://github.com/elastic/logstash-forwarder/blob/master/publisher1.go#L200
https://github.com/elastic/logstash-forwarder/blob/master/publisher1.go#L211

Likely place for a new time.Sleep() call:
https://github.com/elastic/logstash-forwarder/blob/master/publisher1.go#L112-L113

@jordansissel
Contributor

Thanks for helping make logstash-forwarder better!

Logstash-forwarder is going away and is being replaced by filebeat and its friend, libbeat. If this is still an issue, would you mind opening a ticket there?
