Logstash 2.3.4 vs Logstash 1.4.2 #5682
This has to do with some changes to the pipeline architecture. Please see our new performance tuning guide, written specifically to address these changes: https://www.elastic.co/guide/en/logstash/current/performance-troubleshooting.html There's also a blog post on this subject coming soon.
Thank you for the link. The problem is: we run this on the same hardware with the same settings applied across the board. When we stop the 1.4.2 instances, 2.3.4 starts processing all messages just fine. We bring 1.4.2 back up and the volume for 2.3.4 goes down. It's as if 1.4.2 somehow does not allow 2.3.4 to get to Redis. And all the parameters - CPU, memory, disk access, etc. - are within the green area on all servers. We tried using the -w and -b parameters with no luck - 1.4.2 just "blocks" 2.3.4 from Redis. Also, when we run all servers at 2.3.4 we see Redis queuing constantly - 2-100 messages are always sitting there waiting to be processed. With 1.4.2 it never happened - Redis had 0 items in the queue. Again - I think there is something wrong with the Redis input in 2.3.x. From what I read, 2.3.x is supposed to be so much faster, but the result is actually the opposite. We even compared the same configs with 1.4.2 and 2.3.4 when reading thousands of lines a second, and 2.3.4 was not outperforming 1.4.2 at all. They were at roughly the same numbers.

I am now questioning the direction Logstash is going - it appears to me there are a lot of incompatibilities/gotchas/bugs that stop anybody from moving forward with a 2.3.x installation. We have more than 600 servers running Logstash 1.4.2 and ES 1.7.2 and we spend a lot of time patching issues/bugs. With 2.3.x a lot of those issues (like the throttling filter not supporting multithreading) are still not addressed and we have to rely on the community to fix them. It would be great if somebody from Elastic wrote a blog on how to upgrade ELK from older versions to new ones and gave some data on performance comparison - and I mean not a 2-3 server installation but a big installation - and showed how "easy" the upgrade process is going to be. At least it would give us an idea of what to expect from the new ELK. So far the results are disappointing.
I see. You're trying to get 1.4.2 and 2.3.4 to both access Redis at the same time. The 2.3.4 method is totally different, and the newer plugin allows batching, whereas 1.4.2 is probably blocking due to its polling method (hence the unavailability you're seeing in 2.3.4). You'll probably have to switch all at once, or somehow parallelize your Redis instances (use different keys, for example) to migrate.
Also, please watch this informative YouTube video (from the Logstash creator, Jordan Sissel himself) about the changes to the pipeline architecture.
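A minimal sketch of the key-separation idea suggested above, assuming hypothetical key names (`logs_legacy`, `logs_new`) and a local Redis; each input would live in its own Logstash instance's config, and the shippers would have to be split across the two keys the same way:

```
# Config for the 1.4.2 instances (key name is an assumption)
input {
  redis {
    host      => "127.0.0.1"
    data_type => "list"
    key       => "logs_legacy"
  }
}

# Config for the 2.3.4 instances (key name is an assumption)
input {
  redis {
    host      => "127.0.0.1"
    data_type => "list"
    key       => "logs_new"
  }
}
```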
Thanks, that makes sense. How does it explain that 2.3.4 is not processing all items in the Redis queue? Example: we use the same script to get stats from Redis, and actually the same Redis instance. With 2.3.4 there is almost always something in the queue, as if Logstash does not get all items from the list somehow.
If you compare the code in your 1.4.2 setup (lib/inputs/redis.rb or something like that) vs. what is now available at https://github.com/logstash-plugins/logstash-input-redis you'll see that many changes have been made. Logstash no longer blocks on read, and it batches differently. A queue of 10 to 100 is not huge, and Logstash reads in batches of 100 by default. This is a tunable parameter.
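For reference, the tunable batching parameter discussed above is the redis input's `batch_count` option; a minimal sketch, with host and key as assumptions:

```
input {
  redis {
    host        => "127.0.0.1"   # assumed
    data_type   => "list"
    key         => "logstash"    # assumed key name
    batch_count => 125           # events pulled per Redis round trip; tune up or down
  }
}
```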
Thanks for the video and explanation. So am I getting it right? 2.3.4 Redis input - threads 10 - means that I still have 10 workers connecting to Redis and pushing events to Filter (threads = number of processors by default) --> Output (Elasticsearch, default workers is 1 but we set it to 2).
Also, you can increase
You need to increase --pipeline-batch-size or --pipeline-workers.
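A sketch of where those flags go on the Logstash 2.3 command line; the config path and the values themselves are illustrative, not recommendations:

```
# In Logstash 2.3, --pipeline-workers (-w) defaults to the number of CPU cores
# and --pipeline-batch-size (-b) defaults to 125. Example override:
bin/logstash -f /etc/logstash/conf.d/ --pipeline-workers 8 --pipeline-batch-size 250
```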
Well - batch_size does not really work, and we did test that before. Here is an example: B - Before the change: 3 servers, each with 20 Redis threads (-b 500 -w 16 on 8-core systems). So again - nothing really improved as far as the new Logstash goes. I did see the material about the new pipeline - so far I cannot prove to myself that it is any faster/better than the 1.4.2 pipeline. All tests show the opposite.
Have you tested throughput without using the Elasticsearch output? One of the biggest changes since 1.4.2 is that the Elasticsearch output now retries unsuccessful writes. I repeat: 1.4.2 does not retry unsuccessful writes, but newer versions do. Given the volume you're trying to send through Redis, it may be that the bottleneck is the Elasticsearch output. If you have debug logging enabled (or perhaps even just
@untergeek - we process a LOT more than I post here. This is just test results and I have full confidence that we get all data into ES without delay. If you see my previous post - threads = 1 and batch_count = 1 is performing much better than threads = 20 and batch_count = 1. Now this is a puzzle.
Are they supposed to be different? Hammering the current Redis input with 20 separate threads may not be necessary. I will try to run some tests against a local Redis today.
@AGirin - LS 1.4 does not directly compare with LS 2.3 in terms of what you need to know to get the best performance from LS. When making such a big jump in versions you need to throw out the old knowledge and build anew.

First, the output: by default outputs are not shared - one output instance is used sequentially by the workers.

Next the filters: LS 2.3 runs multiple worker threads - each thread grabs a batch of events from the synchronous queue and processes the batch through the filters. When it is done filtering, the thread tries to acquire the output - it must wait if it can't. Please note the significance of the synchronous queue - an input thread and a worker thread MUST both be at the queue, ready to exchange an event.

Next the Redis input. Setting more than one Redis "thread" in fact creates a complete duplicate input and thread. Each input thread waits until it can sync up with a worker thread to hand over an event - please note this handover is done one event at a time. Redis, being single threaded, will fulfill one redis input thread request at a time, and the LS synchronous queue will accept an event from one redis input thread at a time.

In order to properly tune your installation I suggest that you add the stdout output with the dots codec so you can measure throughput. Comment out the ES output and establish a baseline with 1 worker.

Once you have a good idea of how the 1) worker count, 2) pipeline batch size, 3) redis batch size, and 4) redis input thread count variables interact with each other, you can uncomment the ES output and tune the 5) ES output worker count, 6) flush size, and 7) idle_flush_time. If you can, try to send exactly the same set of events through redis for each tuning run. Be systematic and post your results here.
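To make the suggested baseline concrete, here is a minimal sketch of a tuning config; the key name, host, and the commented-out Elasticsearch values are assumptions - only the dots-codec stdout output and the "comment out ES" step come from the advice above:

```
# Establish the baseline with a single worker, e.g.: bin/logstash -w 1 -f tuning.conf
input {
  redis {
    host        => "127.0.0.1"   # assumed
    data_type   => "list"
    key         => "logstash"    # assumed key name
    threads     => 1
    batch_count => 125
  }
}

output {
  # Prints one dot per event so throughput can be observed or piped to a counter.
  stdout { codec => dots }

  # Re-enable and tune once the input/filter side is understood:
  # elasticsearch {
  #   hosts           => ["127.0.0.1:9200"]   # assumed
  #   workers         => 1
  #   flush_size      => 500
  #   idle_flush_time => 1
  # }
}
```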
Well - I am getting somewhere. I removed all threads and batch_count settings from all configs on one Logstash server. Throughput went down and the other two servers were processing a lot more data (45%/45%/10% - new config server). From there I added just one config line to the "new config" server - batch_count => 1. Now this server is processing a lot more than the other two. So somehow batch_count is not documented correctly - I think the default is 125 and not 1 as the documentation states. And setting batch_count => 1 explicitly made things a lot better and more promising. Also, I moved the config of that "new config" server back to the default number of workers and the default batch size (so the logstash init script does not have -w or -b parameters). With these settings the "new config" server is processing ~63% of all messages. The other 37% are processed by the other two servers with the old configs.

As a side note, I had to increase timeout => 300 (from the default of 5) because I started to see a lot of these messages after I changed batch_count => 1:

{:timestamp=>"2016-07-25T12:43:47.231000-0500", :message=>"Redis connection problem", :exception=>#<Redis::TimeoutError: Connection timed out>, :level=>:warn}

Timeout 300 fixed the issue.
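A sketch of the redis input settings that comment describes; only batch_count => 1 and timeout => 300 come from the comment itself, the host and key are assumptions:

```
input {
  redis {
    host        => "127.0.0.1"   # assumed
    data_type   => "list"
    key         => "logstash"    # assumed key name
    batch_count => 1             # explicitly set, per the findings above
    timeout     => 300           # raised from the default of 5 to stop Redis::TimeoutError warnings
  }
}
```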
I ran some tests on my laptop (a 9-month-old MacBook Pro). Each test was against 1,000,000 lines, with no grok or other processing of any kind. The following tests were run against Logstash 2.3.4 with Redis 3.0.6. Subsequent tests with 1.4.2 are in progress.

Testing the Redis input

Testing the Redis output

Thread tuning only: no tweaking of batch settings

Batch tuning - 125: note that this is the Logstash default batch size

Batch tuning - 250 (using the command-line flag)
@untergeek - I mistyped and corrected. Sorry about that.
@untergeek and @guyboertje - Thanks a lot for such valuable input! The threads setting came from our Logstash 1.4.2 installation, since that was the only way we could make our setup work and process all those millions of messages. It looks like with 2.3.4 Redis threads are not needed anymore, but batch_count looks to be the culprit. At least in my tests, setting it to 1 (with everything else default) makes 2.3.4 a lot faster and it beats our old configs by far. I will continue testing and let you know my findings.
@untergeek - we are using Redis 2.8.11. What Redis version do you guys consider old? 2.8.x? I know they have 3.x now.
@AGirin If you look at my test output above, a batch_count of 125 shows a slight improvement over 1. I tested 125 because the current default is 125.
I would recommend testing values between 1 and 125 to see if there's a peak and a fall-off in between.
@untergeek - I want to make sure it is clear - when I said threads I meant threads in the Redis input configuration (threads => 1) and NOT -b 1, in case somebody else is following this thread. So in my tests I had -b set to 125 (the default) and in the Redis config I had only batch_count => 1 (using the default threads value of 1).
@untergeek - Were you changing -w in the logstash init script? Or by threads do you mean threads => xxx in the Logstash Redis input config?
Logstash tries to detect the number of cores and sets the default number of pipeline workers (-w) to match.
@AGirin Redis 2.8.11 is fine.
Is my understanding correct? There's a lot of great information in this discussion and I want to make sure I got it right. @guyboertje There are three distinct stages in the pipeline - input, filter, and output. Usually, there is only one input thread per input. The threads => setting in the redis input will in fact run multiple input threads in parallel; this has nothing to do with the pipeline workers. The -w setting comes into play in the filtering and output process. Say, if I have 10 workers but only one input thread, the workers somehow have to sync with that one input thread. Lastly, only one worker can output to Elasticsearch at a time, which means that while output is being done by a single worker, it is blocking the others.
@axfx |
I ran a lot of tests and the final result is:
Thanks again for all the great info!
@AGirin - reading back through the very earliest comments - you questioned whether 2.3.x is better than 1.4.2 for your case. It would be really helpful for future readers if you could write an update, provided that you did find 2.3.4 to be better (given the correct tuning, of course). I don't feel that you need to include tons of detail.
@guyboertje - I will.
Closing - no feedback (v2 must be better than v1.4) |
We are working on a project to update our ELK stack to the latest version. So far we have mixed results with Logstash. Basically we are running multiple Logstash instances side by side. Most of them are 1.4.2 and one is 2.3.4. If you look at the graph below you can see that the 2.3.4 instance has an extremely low processing volume. No matter what we do, it never changes. We use exactly the same configs across all Logstash servers and they all read from Redis. Somehow it feels like the new Logstash 2.3.4 cannot get information from Redis fast enough? We reproduced the same behavior in Staging as well. Any ideas are appreciated.
We also had the same result with Logstash 2.3.3.