Read error looking for ack: EOF #293
Comments
How are you connecting to Logstash - it says elk-cluster - is it via AWS ELB or something else? Or just direct?
Connecting directly to the logstash (1.4.2) lumberjack input.
On the same physical network too. I had problems with this read error before when load was too high and the logstash agents weren't able to process the events fast enough, but at the moment the cluster is healthy, and the oddity this time is that the slowest machines are experiencing the error, not the ones pumping hundreds or thousands of events per second...
Have you got many servers connecting? It might be that issue where a client failing to connect due to a certificate problem can randomly cause the disconnection of other clients. And because it retries the connection every second, it randomly throws loads of other clients off. You can patch for the above using the current version: it sits in logstash/vendor/bundle/jruby/1.9/gems/jls-lumberjack-0.0.20/lib if I remember correctly. The gem hasn't been updated, so Logstash is still using the old one. If you can, try running Logstash in debug mode; it might tell you what's going on, but I don't think the current gem does any logging - maybe look at #180 in that case, as it will log who's connecting and who's failing.
Hi Jason, that patch worked! :) You're the bomb! Beers are on me if we ever meet IRL! How could we get that patch pushed so it is included in the next logstash package? Cheers,
I am getting this on CentOS 6.4 with a recent update of logstash, and I built logstash-forwarder from master. Is this related to an issue with the SSL cert on my logstash server?
"Read error looking for ack: EOF" means likely that the remote (logstash, On Fri, Dec 19, 2014 at 11:37 AM, Casey Dunham [email protected]
|
Seeing this issue as well on CentOS 5.9 using LSF 3.1.1 and Logstash 1.4.2. I think I read there will be a fix for this in the newest release of Logstash? I see that there is a beta release out; would it be worth switching over to that release to correct the issue?
I'm seeing this as well with Ubuntu 14.04, LSF 3.1.1 and Logstash 1.4.2. I upgraded to 1.5.0.beta1 but no luck with that either, so @jmreicha I don't know if that's going to change.
I also just tried logstash-forwarder 0.4.0 with logstash 1.4.2 to no avail (I made a mistake above; it should have been LSF 0.3.1 for the version).
@rabidscorpio Can you provide the full LSF log output?
I also have this problem. I thought it was an issue with the config in the output { } section that is keeping data from loading.
It seems like LSF 0.4.0 (built from master a week or so ago) is not compatible with Logstash 1.4.2?
I have been troubleshooting this again recently and can see packets from the client (logstash-forwarder) going through the firewall, sending SYN packets endlessly. Is there anything that I can look at on the logstash server side to see why it is closing the connection (or isn't accepting the SYN)? It is strange; for at least a short period of time, logs are being forwarded correctly.
I also upgraded to LSF 0.4.0 (with Logstash 1.4.2) and am facing this issue now. Could someone re-open this ticket, or is there a new one somewhere?
I see this issue with Logstash 1.5.0beta1 (also saw it with 1.4.2) and a docker container for LSF from https://registry.hub.docker.com/u/digitalwonderland/logstash-forwarder/ One thing to note is that the connections tend to eventually become stable. However, I have not been able to find any pattern in which connections succeed, since a client can fail to connect to one logstash server, then attempt to reconnect and join successfully another time.
Let me describe what I believe this message to mean:
EXPERIENCING THIS ISSUE? Please put the following on https://gist.github.com/ and link it here:
If you aren't using Logstash 1.4.2 and logstash-forwarder 0.4.0, please upgrade and let me know if you are still having problems.
Reopening - I need more information to make a determination. Please read my above comment and provide as much information as you can. Thank you!
Thanks for getting back to me so quickly! I'm testing updating to rc2 currently; I'll bring back all the info you need once I complete some changes in the configuration for rc2 deployment.
Hey @jordansissel thanks for reopening. I am going to put what I have so far into the following gist, and I will add to it as much as I can: https://gist.github.com/jmreicha/dc25c3790793ae4a163f Logstash version is 1.5.0.beta1. Everything is running in Docker containers. The logstash stack (ELK) is running on its own server; the containers running the clients are running in a few different ways. The stuff I am having trouble with is running in Kubernetes, so I don't know if that is maybe partially causing this issue? As far as I know there isn't any proxying. Logs that are being shipped by logstash-forwarder from outside of Kubernetes seem to work fine for the most part so far, but I haven't been able to narrow anything down yet. Here's a few things I've tested so far:
Definitely let me know what other details you need. I have been playing around with tcpdump but don't know if any of its output would be useful or not.
@jmreicha we fixed a bug on the lumberjack input in 1.5.0 RC2, btw. Not sure if it's your bug, but it may be.
I have two hypotheses at this time:
@jordansissel For me bumping up to 1.5.0rc2 seems to have corrected the issue; I've not seen the issue repeat today since updating and using the new logstash plugins.
Oh sheesh, I didn't realize there was a new release out :) I will get that put in place and see if it helps. The extent of my Kubernetes networking knowledge is limited. I know it uses a network overlay (flannel) for distributed containers to communicate, and I also know it does some things in the background with iptables; I'm just not sure exactly what. Interestingly enough, with tcpdump running I can see logstash-forwarder traffic coming in to the logstash server. Maybe my issue is a combination of the two?
When logstash-forwarder says things like this: If you see these messages, and then later see "Read error looking for ack", it means something interrupted the previously-healthy connection.
@brendangibat This is very exciting. Thank you for the report!
Ah, that makes sense, thanks for the clarification.
Ah, interesting. The json codec decode actually succeeds - it takes the float at the beginning of the line and returns that, rather than failing. This creates an event that is not a set of key-value pairs; it is just a float. I think this should be raised in https://github.com/logstash-plugins/logstash-codec-json - the codec really should check that the decoded result is a set of key-value pairs; otherwise it crashes the input, when it should just mark the event as failed to decode.
Correct me if I am wrong, but I thought that, based on the configs, only the grok pattern decoder should be invoked, not the json decoder?
I seem to be running into this same problem; I followed the same DO tutorial as naviehuynh and have almost exactly the same config file as him from the 22nd. Are other people able to reproduce this error consistently? I'm seeing it as an intermittent problem where everything works fine and then it breaks. Non-json log statements seem to cause the EOF, but most of the time it picks up and starts processing again; only sometimes does it go into the repeating error state that strzelecki-maciek reported.
I would say don't use the codec if some lines are not json. Use the filter and put a conditional on it to ensure you only decode valid json logs. This is my own method with Log Courier, though I have heard people say the filter has issues. But with the plugin separation, hopefully fixing issues is faster, and it's so easy to update individual filters in 1.5.
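A minimal sketch of the conditional-filter approach described above, assuming the incoming data is in the default message field and that JSON lines can be recognised by a leading brace; option names should be double-checked against the json filter version you actually have installed:

```
filter {
  # Only attempt JSON decoding when the line looks like a JSON object;
  # other lines pass through untouched instead of breaking the pipeline.
  if [message] =~ /^\s*\{/ {
    json {
      source => "message"
      # Lines that still fail to parse are tagged rather than crashing anything.
      tag_on_failure => ["_jsonparsefailure"]
    }
  }
}
```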
@driskell Thanks, I'll try that. Is your filter actually validating the JSON, or just filtering for files that should have valid json? I found the issue because I had a log file with corrupted text (not sure how, but there was weird corrupted text in the middle of a log for a python Traceback of an error), so I can't assume that I will always have valid json in my logs. On Friday I was seeing the issue intermittently; over the weekend, with the corrupted log file, I could put the system into the EOF loop state every time I moved a copy of the file into the path, but my system seems to be in a different state today and only sometimes goes into the EOF loop. Just so that I understand, logs like:
indicate that something went wrong on the logstash server and the connection was closed, and a properly working system should never see that, even if you try to ingest a corrupted file?
EOF usually means a network problem or a crashed input. Checking the logstash error log will tell you what it was, as it will normally log why it failed.
I had the same "Read error looking for ack: EOF" issue. I started logstash in debug mode
and noticed that logstash could not connect to ES using the default protocol. Changing it to http resolved my issue.
So that is probably one of the possible causes of this famous issue.
I can confirm that explicitly declaring a protocol for the elasticsearch output resolved this for us.
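For reference, explicitly setting the protocol on the 1.x-era elasticsearch output looks roughly like this; the host value is a placeholder for your own Elasticsearch address:

```
output {
  elasticsearch {
    host => "localhost"    # replace with your elasticsearch address
    protocol => "http"     # force HTTP instead of the default node/transport protocol
  }
}
```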
[endless loop of] After updating the plugin (bin/plugin update logstash-input-lumberjack), in the logstash logs (with logstash-forwarder <--> logstash constantly disconnecting) I can see: {:timestamp=>"2015-08-05T13:58:42.173000+0200", :message=>"Exception in lumberjack input thread", :exception=>#<ArgumentError: comparison of String with 1 failed>, :level=>:error} This only happens for the redis and redis sentinel logs. Every other logfile works fine.
On the logstash side, removing codec: json fixed the issue.
Same problem with nxlog on Windows. I resolved it by adding a newline to the certificate.
Some of the comments here (the recent ones) make me believe it's related to #293.
Logstash-forwarder: 0.4.0 (client on a different server, collecting logs) 2015/08/14 12:28:29.137179 Socket error, will reconnect: write tcp 10.200.106.101:5043: broken pipe
I had the same issue with all the same error messages on the forwarder and indexer. I was able to verify that there were no firewall issues and could see that the tcp connections were working... checked access... tried stopping Logstash, but it wouldn't shut down... finally killed it :-( ... tried bringing it back up... it came up, but with the same "broken pipe" and other connection errors. After reading github issues 415, 416, and 293, and hearing that some folks had success, I thought I would give it a try. Although not the best option, I stopped all services/connections to Logstash, stopped all the logstash-forwarder instances, stopped Elasticsearch... THEN :-) brought back Logstash first, then all the other services... NO further errors in either of the logs, and Logstash + logstash-forwarder are working fine, as is Elasticsearch, and I can see the messages in Kibana.
Back to square one... the moment I touch anything in Logstash, the "broken pipe" happens and I cannot shut down Logstash gracefully: [root@sever107 software]# service logstash stop
Hi. Env is:
logstash-forwarder config:
logstash-forwarder starts normally:
2015/09/08 18:00:59.490129 --- options -------
}, "files": [
2015/09/08 18:00:59.491422 Waiting for 1 prospectors to initialise
After that (due to the STDIN input) I try to enter some text and the error appears:
2015/09/08 18:02:59.513948 Read error looking for ack: EOF
In logstash at the same time I see errors too:
About my network:
Can you try using the rubydebug codec in a stdout output instead of the lumberjack input?
Yes, it works. The problem is that the decode method is not implemented for the rubydebug codec, and that method is called when you add a codec to an input section.
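In other words, rubydebug belongs on an output, not on the lumberjack input. A sketch of the suggested debugging setup, with the certificate paths as placeholders:

```
input {
  lumberjack {
    port => 5043
    ssl_certificate => "/etc/pki/tls/certs/logstash-forwarder.crt"  # placeholder path
    ssl_key => "/etc/pki/tls/private/logstash-forwarder.key"        # placeholder path
    # no codec here - rubydebug only implements encode, so it cannot decode input
  }
}

output {
  stdout {
    codec => rubydebug   # pretty-print each event for debugging
  }
}
```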
Hi All, FYI for anyone potentially googling their way to this bug (like me): in my case, with the error "Read error looking for ack: EOF" in the logstash-forwarder log, there was no bug in the code. I was getting this error because the logstash server was not able to connect to the output elasticsearch servers due to a configuration error in logstash/ES/firewall on my part. Judging from the logs, this prevented Logstash from flushing events to the output server and kept disconnecting the logstash-forwarder client, with errors logged in logstash.log and the logstash-forwarder log. This error in the logstash logs pointed to the real problem: {:timestamp=>"2015-09-29T04:04:35.592000+0000", :message=>"Got error to send bulk of actions: connect timed out", :level=>:error} Once I fixed the elasticsearch server settings in my logstash config (+ ES + firewall) it started to work fine.
LogStash: 1.5.4
Forwarder Log
LogStash Log
Thanks, Salman
I've managed to reproduce this reliably with logstash-forwarder 0.4.0 and logstash 2.1.1. What I've seen is that copy-truncate can produce a very large Java GC log. This log starts with a bunch of
Logstash
Logstash-forwarder
@jamesblackburn Interesting issue. I haven't tested it yet, but I assume this issue also exists in filebeat: https://github.com/elastic/beats/tree/master/filebeat On the one hand it seems like a log rotation issue, but on the other hand the log crawler should be able to "overcome" it. Did you find a solution to the problem, or is it still an issue? If it is, would you mind testing it with filebeat and, in case it reproduces there, opening an issue in that repo?
Thanks @ruflin. It's still an issue with these files. The workaround is to disable logrotate on the Java GC out files. The biggest issue we have is actually on the jruby logstash end. I'm using exactly the config above - the receiver just pipes data directly to files, no filtering or parsing. Unfortunately the receivers are very heavyweight and can't keep up with the write load. I've got 25 hosts with forwarders, and I'm running 8 receivers. The problems I've seen:
At this point I'm not sure that logstash scales; I'm considering Heka, which claims to handle 10Gb/s/node(!): https://blog.mozilla.org/services/2013/04/30/introducing-heka/
@jamesblackburn As you mentioned yourself, it looks more like a Logstash issue. Perhaps it's better to start a discussion here: https://discuss.elastic.co/c/logstash
He created this issue elastic/logstash#4333
The receiving input is multi-threaded; only the TCP accept loop is not - each connection has its own thread. But you are correct that the output to file is single-threaded.
I have hit the dreaded 'Read error looking for ack: EOF' and 'Connecting to' loop of death. We tracked it down to what looks like a parse error on the logstash side caused by a log message that begins with a number followed by whitespace. I reproduced the issue with a simple logstash setup (lumberjack input and file output, no filtering and no codec specified). When the line in the file logstash-forwarder is watching begins with a number followed by whitespace, the non-recoverable error condition occurs. For example this will cause the error to occur:
using logstash-1.5.4 and logstash-forwarder-0.4.0
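The reproduction setup described above is roughly the following sketch; the offending log line, port, and file paths are hypothetical, but any line starting with a number and whitespace should do:

```
# Hypothetical line in the watched file that triggers the error:
#   12345 some message text

input {
  lumberjack {
    port => 5043
    ssl_certificate => "/path/to/logstash-forwarder.crt"  # placeholder
    ssl_key => "/path/to/logstash-forwarder.key"          # placeholder
    # no codec specified
  }
}

output {
  # no filters; just write events straight to a file
  file {
    path => "/var/log/logstash/lumberjack-test.log"       # placeholder
  }
}
```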
Since upgrading to [0.3.1 / 0632ce](https://github.com/elasticsearch/logstash-forwarder/commit/0632ce3952fb4e941ec520d9fad05a3e10955dc4) I've been getting this error a lot, but strangely only on the boxes where there is relatively little activity. The boxes which are sending hundreds of events per second never have this error, but where there are fewer events being sent it looks like this:
As you can see, always on the dot 10 seconds after the last log event, even though I've got the timeout set to 30s in the logstash-forwarder.conf
Any ideas what could be going on here?
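For comparison, the timeout in question lives in the network section of logstash-forwarder.conf; the server address and file paths below are placeholders for illustration only:

```
{
  "network": {
    "servers": [ "logstash.example.com:5043" ],
    "ssl ca": "/etc/pki/tls/certs/logstash-forwarder.crt",
    "timeout": 30
  },
  "files": [
    { "paths": [ "/var/log/syslog" ] }
  ]
}
```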