Beat stops processing events after OOM but keeps running #309
Comments
The exception handling in
As a workaround, consider adding
@praseodym Thanks for the hint, I've tried it today, but it didn't trigger a JVM exit. I'll downgrade to
The exceptions in
Try updating to 8.0.10
@tomsommer I did try
@praseodym I am running
I am getting the same behavior, version 6.x:
[2018-04-02T16:14:47,537][INFO ][org.logstash.beats.BeatsHandler] [local: 10.16.11.222:5044, remote: 10.16.11.67:42102] Handling exception: failed to allocate 83886080 byte(s) of direct memory (used: 4201761716, max: 4277534720)
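For context (not from the original thread): the "failed to allocate ... direct memory" message comes from Netty's off-heap allocator, which the Beats input uses, and its ceiling follows the JVM's `-XX:MaxDirectMemorySize` (roughly the heap size by default). A minimal sketch of where that ceiling could be adjusted in Logstash's `config/jvm.options`; the values are placeholders, not recommendations:

```
# config/jvm.options (sketch -- values are placeholders, not recommendations)
-Xms4g
-Xmx4g

# Cap the off-heap (direct) memory Netty may allocate for the beats input.
# In the log above, 83886080 bytes is an 80 MB chunk that no longer fit
# under the ~4 GB ceiling already in use.
-XX:MaxDirectMemorySize=4g
```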
Hi, I'm running logstash 6.2.3 with logstash-input-beats (5.0.10) and I have the same issue.
I got some useful advice in this thread. Also, please post an update on whether upgrading to version 8 works for you.
I'm running Logstash 6.2.3 with logstash-input-beats 5.0.11 and facing the same issue. I've enabled both. A few high-volume Filebeat agents have been switched to Redis because they kept crashing when sending to Logstash; their logs kept repeating "connection was reset" errors even though the rest of the Beats agents were still working fine. These high-volume Filebeat agents average around 5k/s and 10 Mbps uncompressed each. The problem as experienced is that after a period of time, the event rate falls off to around 1% of the previous rate. If a particular Filebeat agent is affected, then that one agent drops to 1% of its previous event rate while everything else keeps operating at full speed. If the logstash-input-beats plugin is affected, then the entire Logstash process (including other inputs) slows down to 1% of its previous event rate. All of this started with logstash-input-beats 5.0.6; previous versions worked 100% perfectly.
We have switched back to logstash-input-beats v5.0.6 on Logstash 6.2.3. None of the newer versions (5.0.8-5.0.14) would work; all would eventually OOM and slow down the whole Logstash process and all other input plugins until Logstash would eventually crash. Increasing the JVM size would cause Logstash to OOM and crash sooner; reducing the JVM size would make the OOM and crash much more gradual. Eventually it would OOM and crash, which would cause the various Beats to queue up; once Logstash was restarted they would start to flush the queue, which caused Logstash to OOM and crash sooner. Message load is about 10k/s per Logstash beats input, mostly Winlogbeat.
@packetrevolt Sorry to hear that. How many Beats nodes do you have, and what is the typical (uncompressed) message size?
~800 beats nodes
The largest-volume Beats (bursts of 5k/s+, mostly Domain Controllers and IDS sensors) I had to convert to sending to Redis instead of Logstash, because they would continually fall behind and/or crash Logstash once logstash-input-beats was >5.0.6. Upgrading logstash-input-beats was the only difference; same Beat version and same config in both Logstash/Beats. Compression is the default, level 3 I think.
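For readers who want to try the same workaround: shipping directly to Redis is a stock Filebeat output. A minimal sketch, where the hostname, key, and Redis settings are assumptions rather than anything taken from this thread:

```yaml
# filebeat.yml (sketch -- hostname and key are hypothetical)
output.redis:
  hosts: ["redis.example.internal:6379"]
  key: "filebeat"      # Redis list the events are pushed onto
  db: 0
  timeout: 5
# A separate Logstash pipeline would then read that list via the redis input.
```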
Having to tune the Beats sending side sounds like a terrible workaround considering that the protocol supports backpressure, so the server should be able to indicate when it is too busy to handle the workload. Also, in large production environments with hundreds of Beats nodes deployed it can be challenging to reconfigure them all.
This is the pattern I see. This screenshot is from Logstash 6.2.3 and logstash-input-beats 5.0.14. After a service restart the event rate stays high and the latency low for 4-6 hours, after which the event rate starts dropping and the latency increasing. The only way this recovers is to restart Logstash. This Logstash server has zero filters; it receives from beats/tcp/udp inputs and dumps into Redis (to be processed by a different Logstash server).
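For reference, a sketch of the kind of filter-less relay pipeline described above; the ports, host, and key are hypothetical, not the poster's actual config:

```
# Logstash pipeline (sketch -- ports, host, and key are hypothetical)
input {
  beats { port => 5044 }
  tcp   { port => 5000 }
  udp   { port => 5001 }
}
# no filters, as described above
output {
  redis {
    host      => ["redis.example.internal"]
    data_type => "list"
    key       => "logstash"
  }
}
```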
Same problem: old-gen memory grows until it stops working; the process doesn't die, but it isn't processing either.
Sorry for inadvertently +1'ing this, but is there any chance we might get a look at this sometime soon? This problem still exists in the latest version, and the only real way to keep our pipelines stable is to do as @packetrevolt suggested and simply restart Logstash as memory usage gets high, which is also not an ideal workaround. More than happy to provide diagnostic information as needed.
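For anyone stuck with that restart workaround in the meantime, a rough sketch of automating it against the Logstash monitoring API; the port 9600, the threshold, and the systemd unit name "logstash" are assumptions:

```bash
#!/usr/bin/env bash
# Restart Logstash when heap usage crosses a threshold (sketch).
# Assumes the monitoring API on localhost:9600 and a systemd unit "logstash".
THRESHOLD=90
USED=$(curl -s http://localhost:9600/_node/stats/jvm | jq -r '.jvm.mem.heap_used_percent')
if [ "${USED:-0}" -ge "$THRESHOLD" ]; then
  systemctl restart logstash
fi
```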
Same problem on Logstash 7.4.0 and logstash-input-beats 6.0.1
Having the same problem here.
We got that issue fixed! It was a problem with VMware: the allocated RAM was not actually available on the VMware host. Now we have allocated a fixed RAM size to the VM and it is working fine. Hope this helps others solve the issue.
We have experienced the same issue with Logstash 7.4.2 and Beats input plugin version 6.0.3. Logstash runs in Docker with 4 GB of heap and 6 GB of memory allocated for the container pod.
To minimize the possibility of going OOM, there are some strategies that could be implemented in the Beats input to mitigate this problem.
I am running Logstash 6.2.0 in Docker with the following config:
and 6 GB heap size.
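Purely as an illustration of that kind of setup, heap for the official Logstash Docker image can be passed through the LS_JAVA_OPTS environment variable; this is a hypothetical sketch, not the reporter's actual configuration:

```bash
# Hypothetical equivalent, not the reporter's actual command.
docker run -d --name logstash \
  -p 5044:5044 \
  -e LS_JAVA_OPTS="-Xms6g -Xmx6g" \
  docker.elastic.co/logstash/logstash:6.2.0
```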
When overloading Logstash with a lot of events, with logstash-input-beats (5.0.6) I see the same pattern as in elastic/logstash#9195 (used heap hits max heap, event processing stops). Beats stops processing the incoming events and crashes most of the time. After finding #286, I did an update to logstash-input-beats (5.0.10); now used heap stays low, but I get a bunch of those:
I would prefer it to reliably crash (so it can be restarted automatically) instead of hanging around and doing nothing.
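One possible way to get that "reliably crash" behaviour is the HotSpot flag below (available since JDK 8u92), added to config/jvm.options. Note it only fires for OutOfMemoryErrors raised by the JVM itself, so direct-memory failures thrown from Netty's own accounting may not trigger it, which could explain why the workaround mentioned earlier in the thread didn't produce a JVM exit. A sketch, not part of the original report:

```
# config/jvm.options (sketch)
# Exit the JVM as soon as it throws an OutOfMemoryError, so a supervisor
# (systemd, Docker, etc.) can restart Logstash automatically.
-XX:+ExitOnOutOfMemoryError
# Alternative: produce a crash dump instead of a clean exit.
# -XX:+CrashOnOutOfMemoryError
```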