Optimizations
Ilya Baldin edited this page Feb 18, 2025
The Linux kernel provides multiple send-side and receive-side optimization options that improve the performance of streaming applications. E2SAR implements a number of these options, which can be mixed and matched to suit the needs of a given application. Note that the availability of each optimization depends on the versions of the kernel and libraries present when E2SAR was built. Available optimizations can be listed with the `get_Optimizations()` call, selected with the `select_Optimizations()` call, and the current selection can be queried with the `get_SelectedOptimizations()` and `is_SelectedOptimization()` calls.
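How E2SAR itself detects availability is not described on this page. As a rough sketch of the idea only, the C++ snippet below probes for the `liburing.h` header at compile time and compares the running kernel's release string against the 5.15 minimum quoted for the liburing optimizations on this page. The helper names (`kernel_at_least`, `running_kernel_release`, `built_with_liburing_header`) are hypothetical and are not part of the E2SAR API.

```cpp
#include <sys/utsname.h>  // uname()
#include <cassert>
#include <cstdio>         // std::sscanf
#include <string>

// Hypothetical helper: true if a release string such as "5.15.0-91-generic"
// denotes a kernel at or above want_major.want_minor.
bool kernel_at_least(const std::string& release, int want_major, int want_minor) {
    assert(want_major >= 0 && want_minor >= 0);
    int major = 0, minor = 0;
    if (std::sscanf(release.c_str(), "%d.%d", &major, &minor) != 2)
        return false;  // unparsable release string: treat as too old
    return major > want_major || (major == want_major && minor >= want_minor);
}

// Hypothetical helper: release string of the running kernel, or "" on failure.
std::string running_kernel_release() {
    utsname u{};
    return uname(&u) == 0 ? std::string(u.release) : std::string();
}

// Hypothetical helper: whether liburing.h was visible when this file was compiled.
constexpr bool built_with_liburing_header() {
#if defined(__has_include)
#  if __has_include(<liburing.h>)
    return true;
#  else
    return false;
#  endif
#else
    return false;
#endif
}
```

A caller could combine the two checks, for example `built_with_liburing_header() && kernel_at_least(running_kernel_release(), 5, 15)`, before deciding whether to request one of the liburing optimizations.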
- sendmmsg - uses the `sendmmsg()` system call, if available, to send all buffers/packets of a single event with a single call, rather than the `sendmsg()` call used by default for every buffer. Available on any recent Linux distribution. Makes one system call per event buffer. Generally 10% or more faster than the default implementation with no optimizations. Should be used whenever it is available.
- liburing_send - if `liburing` is installed and the Linux kernel is relatively recent (5.15.x and above), uses asynchronous calls to submit buffers/packets for sending. Maps file descriptors into the kernel to cut down on overhead. Uses a kernel polling thread to send outgoing fragments and makes no system calls. Generally 10% or more faster than the default implementation with no optimizations. Should be used when trying to reach the highest performance. May use more CPU because the polling thread busy-waits. See bin/e2sar_perf.cpp for examples of how to use this.
- liburing_recv - if `liburing` is installed and the Linux kernel is relatively recent (5.15.x and above), uses asynchronous calls to receive data. Maps buffer space and file descriptors into the kernel to cut down on overhead. See bin/e2sar_perf.cpp for examples of how to use this.
- Both Segmenter and Reassembler allow pinning the entire process or individual threads to CPU cores. Cores should be selected from the NUMA node to which the NIC is attached (this can be determined using e.g. `lstopo` and `lscpu`).
- Both Segmenter and Reassembler allow forcing memory allocation from the NUMA node to which the NIC is attached. The NUMA node can be determined using `lstopo`. For examples of how to use this capability, see bin/e2sar_perf.cpp.
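To illustrate the idea behind the sendmmsg optimization described above, here is a minimal sketch of batching several fragments into one `sendmmsg()` call. It is not E2SAR's implementation: the function names are hypothetical, and a UNIX datagram socket pair stands in for the UDP sockets a real sender would use, so that the example runs without any network setup.

```cpp
#include <sys/socket.h>  // sendmmsg, recv, socketpair, mmsghdr
#include <sys/uio.h>     // iovec
#include <unistd.h>      // close
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: send each fragment as its own datagram using a single
// sendmmsg() call. fd must be a connected datagram socket. Returns the number
// of datagrams accepted by the kernel (possibly fewer than requested) or -1.
int send_fragments_batched(int fd, const std::vector<std::string>& fragments) {
    assert(fd >= 0);
    std::vector<iovec> iov(fragments.size());
    std::vector<mmsghdr> msgs(fragments.size());  // value-initialized to zero
    for (std::size_t i = 0; i < fragments.size(); ++i) {
        iov[i].iov_base = const_cast<char*>(fragments[i].data());
        iov[i].iov_len = fragments[i].size();
        msgs[i].msg_hdr.msg_iov = &iov[i];  // one buffer per datagram
        msgs[i].msg_hdr.msg_iovlen = 1;
    }
    return sendmmsg(fd, msgs.data(), static_cast<unsigned int>(msgs.size()), 0);
}

// Hypothetical demo: push a few small fragments through a socket pair and
// read them back one datagram at a time. Intended for a handful of fragments
// only, since everything is sent before anything is received.
std::vector<std::string> roundtrip_demo(const std::vector<std::string>& fragments) {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) != 0)
        return {};
    int sent = send_fragments_batched(sv[0], fragments);
    std::vector<std::string> received;
    char buf[2048];
    for (int i = 0; i < sent; ++i) {
        ssize_t n = recv(sv[1], buf, sizeof(buf), 0);
        if (n < 0)
            break;
        received.emplace_back(buf, static_cast<std::size_t>(n));
    }
    close(sv[0]);
    close(sv[1]);
    return received;
}
```

Calling `roundtrip_demo({"frag-0", "frag-1", "frag-2"})` issues one send-side system call for all three fragments, whereas a `sendmsg()` loop would issue three. A production sender would also need to handle a return value smaller than the number of fragments by resubmitting the remainder.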
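The thread pinning mentioned above is exposed through E2SAR's own interfaces, which are shown in bin/e2sar_perf.cpp. As a sketch of the underlying Linux mechanism only, the snippet below restricts the calling thread to a single core with `sched_setaffinity()`. The helper names are hypothetical, and choosing a core on the NIC's NUMA node is left to the caller.

```cpp
#include <sched.h>  // sched_getaffinity, sched_setaffinity, CPU_* macros
#include <cassert>
#include <vector>

// Hypothetical helper: cores the calling thread is currently allowed to run on.
std::vector<int> allowed_cores() {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return {};
    std::vector<int> cores;
    for (int c = 0; c < CPU_SETSIZE; ++c)
        if (CPU_ISSET(c, &set))
            cores.push_back(c);
    return cores;
}

// Hypothetical helper: pin the calling thread to one core.
// Returns false if the core number is out of range or the kernel refuses.
bool pin_calling_thread(int core) {
    if (core < 0 || core >= CPU_SETSIZE)
        return false;
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    assert(CPU_COUNT(&set) == 1);  // the mask holds exactly the requested core
    return sched_setaffinity(0, sizeof(set), &set) == 0;  // pid 0 = calling thread
}
```

After a successful `pin_calling_thread(n)`, `allowed_cores()` returns just `{n}` for that thread. Threads created afterwards inherit the mask, so pinning before spawning workers affects them as well.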