I have ~100 logstash-forwarder 0.4.0 clients sending to Logstash 1.5.4 with logstash-input-lumberjack 1.0.5 behind an AWS ELB.
When Logstash is unavailable (e.g. it is restarting, or has died), the ELB continues to accept connections from logstash-forwarder clients but closes the connection when it realises it cannot proxy to Logstash.
This shows up in the client logs as:
2015/10/05 09:40:40.943298 Connecting to [x.y.a.b]:5043 (the-elb)
2015/10/05 09:40:40.952560 Connected to x.y.a.b
2015/10/05 09:40:40.952692 Read error looking for ack: EOF
2015/10/05 09:40:40.952738 Setting trusted CA from file: /etc/logstash-forwarder/the-elb.pem
2015/10/05 09:40:40.953257 Connecting to [x.y.c.d]:5043 (the-elb)
2015/10/05 09:40:40.959790 Connected to x.y.c.d
2015/10/05 09:40:40.959946 Read error looking for ack: EOF
2015/10/05 09:40:40.959992 Setting trusted CA from file: /etc/logstash-forwarder/the-elb.pem
2015/10/05 09:40:40.960627 Connecting to [x.y.a.b]:5043 (the-elb)
2015/10/05 09:40:40.970599 Connected to x.y.a.b
2015/10/05 09:40:40.970704 Read error looking for ack: EOF
2015/10/05 09:40:40.970749 Setting trusted CA from file: /etc/logstash-forwarder/the-elb.pem
2015/10/05 09:40:40.971342 Connecting to [x.y.c.d]:5043 (the-elb)
2015/10/05 09:40:40.977973 Connected to x.y.c.d
This is essentially the same behaviour as described in issue #293, but I'm less concerned about the cause and much more concerned about the failure mode.
As you can see from the timestamps in the log snippet above, logstash-forwarder enters a tight retry loop in this failure mode, which results in very high CPU usage on the client. Left long enough, this causes a cascading failure of other processes competing for CPU on the same machine as logstash-forwarder.
For now I will attempt to mitigate this with nice(1), but logstash-forwarder should probably limit its retry aggressiveness.
There are several places in the current code where a time.Sleep() is used following a communication failure, and one probably needs to be inserted into the "Read error looking for ack" code path too (see the sketch below).

Existing uses of time.Sleep():
https://github.com/elastic/logstash-forwarder/blob/master/publisher1.go#L65
https://github.com/elastic/logstash-forwarder/blob/master/publisher1.go#L181
https://github.com/elastic/logstash-forwarder/blob/master/publisher1.go#L200
https://github.com/elastic/logstash-forwarder/blob/master/publisher1.go#L211

Likely place for a new time.Sleep() call:
https://github.com/elastic/logstash-forwarder/blob/master/publisher1.go#L112-L113
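To illustrate what I mean, here is a minimal, self-contained sketch of a bounded backoff before re-dialling. The readAck stand-in, the delays, and the loop shape are all assumptions made for this example; this is not the actual publisher1.go code, just the shape of the pause that seems to be missing:

```go
package main

import (
	"fmt"
	"io"
	"time"
)

// readAck stands in for the ack read that currently fails fast; here it always
// returns io.EOF, mimicking the ELB closing the proxied connection.
func readAck() error { return io.EOF }

func main() {
	backoff := 1 * time.Second          // assumed starting delay
	const maxBackoff = 30 * time.Second // assumed cap

	for attempt := 0; attempt < 5; attempt++ { // bounded only so the sketch terminates
		err := readAck()
		if err == nil {
			backoff = 1 * time.Second // reset once an ack is read successfully
			continue
		}
		fmt.Printf("Read error looking for ack: %v (sleeping %v before reconnect)\n", err, backoff)
		time.Sleep(backoff) // the missing pause: avoids the tight reconnect loop
		backoff *= 2
		if backoff > maxBackoff {
			backoff = maxBackoff
		}
	}
}
```

Even a fixed one-second sleep in that path would help; the exponential growth is just a suggestion so that ~100 clients don't hammer the ELB in lockstep during a long Logstash outage.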
Thanks for helping make logstash-forwarder better!
Logstash-forwarder is going away and is being replaced by filebeat and its friend, libbeat. If this is still an issue, would you mind opening a ticket there?