Ingesting IPFIX Flows From YAF Oddly Slow #151
Comments
After more benchmarking, it seems that processing YAF flows doesn't scale vertically. The following was captured using Digital Ocean CPU-Optimized Droplets while running MongoDB, Logstash, Elasticsearch, and Kibana. Elasticsearch and Kibana were needed in order to collect the data. The system was benchmarked with the scripts here as well as with two yaf commands run directly on a sample pcap file. The system caps out at around 13000 flows/second due to the MongoDB output plugin and database in the pipeline.
EDIT: Added more details for each benchmark
The YAF flows use the flowStartMilliseconds (152) and flowEndMilliseconds (153) fields (the Cisco flows don't use them), so they take a slightly different path in the code: logstash-codec-netflow/lib/logstash/codecs/netflow.rb Lines 382 to 397 in f131bcf
Thanks for looking at this! However, a very similar code path is taken for Cisco v9: logstash-codec-netflow/lib/logstash/codecs/netflow.rb Lines 297 to 302 in f131bcf
In the YAF IPFIX case, we have a division and a timestamp conversion (line 388). In the netflow case, we have more arithmetic operations and a timestamp conversion. Nonetheless, the netflow data scales well.
I've created a new benchmark script that compares the two. https://github.com/logstash-plugins/logstash-codec-netflow/blob/master/spec/codecs/benchmarks/flowStartMilliseconds.rb From that benchmark it seems v.snapshot is solely responsible for the difference in performance.
Giving it some more thought, the gains are only 2 seconds for 1 million flows.
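To put that 2-second gain in perspective, here is a rough back-of-the-envelope sketch. It borrows the throughput figures from the original report (4000 flows/second for the YAF script, 17500 flows/second for the SonicWall script) and assumes they can be compared directly with the standalone benchmark, which may not hold exactly. The function name is made up for illustration.

```python
def seconds_for(flows, flows_per_second):
    """Wall-clock time needed to process `flows` at a steady rate."""
    return flows / flows_per_second


FLOWS = 1_000_000

# Rates taken from the original report; treating them as steady-state
# rates is an assumption made for this estimate.
yaf_seconds = seconds_for(FLOWS, 4000)
sonicwall_seconds = seconds_for(FLOWS, 17500)

# How much of the YAF vs. SonicWall gap a 2-second saving would close.
gap = yaf_seconds - sonicwall_seconds
saved_share = 2 / gap

print(f"YAF: {yaf_seconds:.1f} s, SonicWall: {sonicwall_seconds:.1f} s")
print(f"A 2 s gain closes {saved_share:.1%} of the gap")
```

Under these assumptions the timestamp path accounts for roughly one percent of the difference, so the bulk of the slowdown has to come from somewhere else.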
Circling back to this issue... perhaps the performance discrepancy can be explained by the huge number of templates YAF sends? Every time a template comes in, we use a mutex to synchronise access to it. The same applies to data packets, because they need access to the template. This makes it a non-parallelizable operation, perhaps explaining your flat scaling curve.
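As a minimal sketch of the pattern described here (in Python rather than the codec's Ruby, and not the codec's actual code), a shared template store guarded by a single lock looks roughly like this. `TemplateCache` and `decode_data_record` are hypothetical names, and how long the real codec holds its mutex is an assumption.

```python
import threading


class TemplateCache:
    """Hypothetical template store shared by all decoding threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self._templates = {}

    def register(self, key, fields):
        # Template packets take the lock to write.
        with self._lock:
            self._templates[key] = fields

    def lookup(self, key):
        # Data packets take the same lock to read, so every record
        # decoded on any thread passes through this one critical section.
        with self._lock:
            return self._templates.get(key)


def decode_data_record(cache, key, values):
    """Pair raw values with field names from the cached template."""
    fields = cache.lookup(key)
    if fields is None:
        return None  # template not seen yet
    return dict(zip(fields, values))


cache = TemplateCache()
cache.register((1, 256), ["sourceIPv4Address", "octetDeltaCount"])
print(decode_data_record(cache, (1, 256), ["10.0.0.1", 1200]))
```

If the lock only covers the lookup, contention should stay small; if it covers the whole decode, worker threads would effectively run one at a time.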
I don't think the number of templates can be the issue. When we run the benchmark script, we see performance just as poor as when running YAF outright. Yet, the benchmark script only sends the template once.
Hmm yes, good point!
I've been benchmarking the netflow codec for a tool that stitches IPFIX/netflow data together into bi-directional sessions. The codec seems to run well when ingesting data from SonicWall and Cisco devices. However, the codec runs oddly slowly when ingesting flows produced by YAF.
Test Machine Specifications:
-Xms8g -Xmx8g
4.1.0
6.2.4
Known benchmarks:
First, I ran yaf with the following command:
yaf --uniflow --in test.pcap --out my.logstash.ip --ipfix-port 2055 --ipfix udp
and saw that I was only processing 3000 flows/second. I did some packet captures and saw that about one fifth of the data was templates. In order to make a better comparison to known benchmarks, I created a similar script (had to change file extension to txt).
With the script, the codec manages 4000 flows/second.
Running yaf with --silk doesn't seem to dramatically change the performance characteristics. I didn't create a test script for silk style flows, but when running the tool, adding --silk resulted in 200 fewer flows/second on average.
I know it is not IPFIX specific since I created a similar test script (can be provided) for flows based on SonicWall's IPFIX implementation. The SonicWall IPFIX benchmark managed 17500 flows/second.
I know it is not template-processing related since the benchmark only sends the template once.
The YAF flows encode timestamps as flowStartMilliseconds, which needs a simple conversion to ISO 8601, while the SonicWall flows only include systemInit relative timestamps, so they don't need a conversion. I thought this may be the cause, but netflow v9 also requires the timestamp conversion on the Cisco data sets. The Cisco data sets perform fine.
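For reference, the two conversions being compared can be sketched as follows. This is an illustrative Python version, not the codec's Ruby code; the function names are hypothetical. The v9 formula relies on the NetFlow v9 header carrying both the export time in Unix seconds and the system uptime in milliseconds.

```python
from datetime import datetime, timezone


def millis_to_iso8601(ms):
    """Absolute epoch milliseconds (as in flowStartMilliseconds) to ISO 8601."""
    seconds, millis = divmod(ms, 1000)
    dt = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%S") + f".{millis:03d}Z"


def uptime_relative_to_iso8601(unix_secs, sys_uptime_ms, switched_ms):
    """NetFlow v9 style: a switched time relative to system uptime."""
    # Boot time in epoch ms is export time minus uptime; add the offset.
    absolute_ms = unix_secs * 1000 - sys_uptime_ms + switched_ms
    return millis_to_iso8601(absolute_ms)


print(millis_to_iso8601(1500000000123))
print(uptime_relative_to_iso8601(1500000000, 60000, 59500))
```

Both paths end in the same formatting step; the v9 path simply performs a few more integer operations first, which is why the timestamp handling alone seems unlikely to explain the gap.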
The YAF flows do have a subTemplateList, but that should be efficiently handled since the field is labeled :skip in the definitions file.
I thought the data size of each flow may affect the speed at which the flows are processed. However, the YAF benchmark bundles 25 flows per packet with an overall packet size of 1415 bytes, while the Cisco ASR benchmark bundles 21 flows per packet with an overall packet size of 1392 bytes. The Cisco ASR actually sends larger flows (~66 bytes) to logstash than YAF does (~56 bytes). Yet, YAF is processed much more slowly.
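The per-flow sizes quoted above follow from dividing the packet size by the number of flows per packet. A quick check, assuming the packet sizes include header overhead (so the results are upper bounds on the actual record sizes):

```python
def bytes_per_flow(packet_bytes, flows_per_packet):
    """Average bytes per flow record, header overhead included."""
    return packet_bytes / flows_per_packet


yaf = bytes_per_flow(1415, 25)  # YAF benchmark packet
asr = bytes_per_flow(1392, 21)  # Cisco ASR benchmark packet

print(f"YAF: {yaf:.1f} bytes/flow, Cisco ASR: {asr:.1f} bytes/flow")
```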
The YAF flows do include an enterprise field, but the field flowAttributes is well accounted for in the definitions file.
I've looked through the code and I can't work out what is causing the slowdown. However, I'm not too good at Ruby, to be honest.
The script also includes the wireshark descriptions of the template and data packets.