mtio-mode #818
Conversation
This is amazing - but we're actually not interested in adding thread support to userland OpenVPN (we had some rudimentary thread support and threw it out between 2.4 and 2.5, if I remember correctly). The recommended way to get maximum throughput on FreeBSD, Linux or Windows is to use the "data channel offload" module (DCO) living in the respective OS kernel, completely avoiding all context switches, etc. - so OpenVPN userland can happily do single-threaded, race-condition-free TLS and key management, and the kernel side can do all-cores packet handling (how many cores are used in the end depends on OS queueing features, the ingress NICs used, etc.).
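For anyone wanting to compare the two data paths themselves, a rough sketch of such an A/B test, assuming an OpenVPN 2.6+ build with DCO compiled in (`server.conf` is a placeholder):

```sh
# DCO is engaged automatically when the configuration is compatible with it
openvpn --config server.conf

# force the userland data path on the same config, for comparison
openvpn --config server.conf --disable-dco
```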
Or, if we do add threaded support to userland again, it would probably be good to have a better control channel/data channel separation, so that userland is no longer the special wacky data channel, but instead we would have a user space implementation that uses the same interface as all the other DCO platforms. So writing a clean-sheet userspace data channel that uses modern techniques, threads and so on would be a great benefit. But again, as with the earlier "bulk mode" patch, it would help to have some idea what the improvements here are and how you measured them. The blog post shows 672 MBit/s, but that is not impressive without something to compare against or knowing the test setup. My macOS OpenVPN does over 1 Gbit/s against a local Linux DCO server without any modifications.
Hah, 1 Gbit/s... :-)
this is "iperf3, 10 TCP streams" over an OpenVPN link between two FreeBSD machines, using DCO. The limitation here is the slow CPU on the OpenVPN server and the single-threadedness of "iperf3" - it's a 10 year old i7-6700, and iperf3 runs at 100% CPU. Using |
To be clear, we are not saying that this isn't useful or doesn't make a difference. It is just hard to know what difference we are looking at without benchmarks.
Well, the patch as it stands is just proof-of-concept code - it breaks UDP, and it needs a cooperating client and server ("two TCP sessions"), so it's incompatible with regular OpenVPN. As such it is definitely interesting, but to go there "for real" would be a much larger effort, which is just not worth it given that DCO will still be faster by running as part of kernel packet processing instead of detouring to userland. Something which could be looked at, if it really must be "userland" and "TCP", is how complicated it would be to run 2 openvpn sessions in p2p mode in parallel (so, 2 on the client side, 2 on the server side) and then either share a tun fd or have the routing layer balance packets between the two instances.
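To illustrate the routing-layer half of that suggestion, a hypothetical Linux setup might look like the following (config file names, device names, port numbers and the 10.9.0.0/24 prefix are all made up for illustration):

```sh
# two independent p2p instances, each with its own tun device and TCP port
openvpn --config p2p-a.conf --dev tun0 --port 1194 --daemon
openvpn --config p2p-b.conf --dev tun1 --port 1195 --daemon

# let the kernel's multipath routing balance traffic across the two tunnels
ip route add 10.9.0.0/24 \
    nexthop dev tun0 weight 1 \
    nexthop dev tun1 weight 1
```

Note that Linux ECMP balances per flow rather than per packet (by default hashing on source/destination addresses; `net.ipv4.fib_multipath_hash_policy` can include ports), so a single TCP stream would still ride one tunnel while multiple concurrent flows spread across both.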
Thank you all for your inputs and reviews on the code! Good catch on the UDP part - I had to rework that middle method to make it compatible with respect to the context pointer. I updated the commit to allow for 4 parallel streams now, generalized it a bit more, and fixed up some edge-case errors as well.

Thanks for providing the extra context surrounding threading and the DCO feature. Looking back on the history of changes made to OpenVPN, I can see what happened here and why. It appears that when WireGuard became popular, a lot of folks were putting pressure on OpenVPN to move to the kernel level for performance reasons, which I guess made you abandon the threading component entirely. The ultimate issue with WireGuard is that it enforces a very small MTU and supports UDP only. This is a bad design in general for VPN software: if you are running a VPN appliance in the middle of your network, operating on behalf of your LAN clients automatically, those clients will assume they are on a 1500-byte-MTU link, and your ISP will enforce a 1500-byte MTU on the WAN side as well. To make matters worse, WG has no fragmentation component built in either, so my performance testing with it was extremely poor for my own personal use case.

I still maintain that one of OpenVPN's distinguishing features is that it can provide a true 1500-byte-MTU TUN/TAP interface as well as operate over TCP, without the need for any fragmentation or compression. And because it operates at the user application level, I can run it almost anywhere I can compile it - for example, in a Linux VM running on a Mac Mini. This means I can place it right in the middle of my core network at home, and with these new changes it operates almost perfectly without any changes needed on any of my network clients.

I am presently running bulk-mode and mtio-mode together at home on my core network, which is piping my entire LAN traffic over a VPN tunnel across my WAN link. With these changes, I am now getting full 6x1500 (~9000) byte TUN MTU reads off the interface and performing a single 9000-byte crypto/write call to the TCP socket, with 4 parallel TCP threads capable of load balancing with each other off of 1 shared TUN interface. My next step is to run 4 TUN interfaces at 4 TCP threads each (16 simultaneous OpenVPN session streams) and load balance across those interfaces somehow, perhaps via Linux network routing.

Even though this is all proof-of-concept work and I haven't officially provided any comparative test results yet, it's been working far better for me personally, and I appreciate your time and work on this open source project. I still think it would be good to add features similar to what I am presenting, because I don't believe that copying WG is the true answer; being able to optimize the application process would distinguish the project from any other VPN product on the market and possibly fulfill use cases that no other product can achieve.

Thanks,
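For readers following along, here is a minimal, self-contained sketch of the fan-out idea described above. It is not the actual patch, just an illustration under stated assumptions: several worker threads share one TUN fd (on Linux, each `read()` on a tun fd returns exactly one packet and concurrent reads are safe, so the kernel effectively hands packets to whichever thread calls `read()` next), while each thread owns its own TCP connection to the peer. Encryption, length framing and socket setup are deliberately omitted; `run_mtio` and `worker` are illustrative names:

```c
#include <pthread.h>
#include <unistd.h>

#define N_THREADS 4
#define BUF_SIZE  9000   /* matches the ~6x1500-byte bulk read described above */

struct worker {
    int tun_fd;   /* shared TUN interface fd */
    int sock_fd;  /* this thread's own TCP connection to the peer */
};

static void *worker_main(void *arg)
{
    struct worker *w = arg;
    char buf[BUF_SIZE];

    for (;;) {
        /* one packet per read; concurrent readers each get whole packets */
        ssize_t n = read(w->tun_fd, buf, sizeof(buf));
        if (n <= 0)
            break;
        /* a real implementation would encrypt and add length framing here */
        if (write(w->sock_fd, buf, (size_t)n) != n)
            break;
    }
    return NULL;
}

/* called with an open TUN fd and N_THREADS already-connected TCP sockets */
void run_mtio(int tun_fd, const int sock_fds[N_THREADS])
{
    pthread_t tid[N_THREADS];
    struct worker w[N_THREADS];

    for (int i = 0; i < N_THREADS; i++) {
        w[i].tun_fd = tun_fd;
        w[i].sock_fd = sock_fds[i];
        pthread_create(&tid[i], NULL, worker_main, &w[i]);
    }
    for (int i = 0; i < N_THREADS; i++)
        pthread_join(tid[i], NULL);
}
```

A real implementation would also need per-thread cipher contexts (or a safe way to share key material) and some reordering tolerance on the receive side, since packets that previously arrived in order over one connection can now arrive interleaved across the four TCP streams.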
We have never copied WireGuard :-) OpenVPN has had a "true 1500 byte MTU" and both UDP and TCP transport for 20+ years, and we had a Linux kernel module 7-8 years ago - able to saturate a 10G link already at that time. That first module was not open sourced, and not in a shape suitable for inclusion in the general Linux kernel, so it took quite a bit of time to get this all into proper shape.

This said, it is plainly impossible to achieve DCO speed in userland, due to the extra overhead for every single packet going to userland and back - so this is explicitly not our goal. Userland has a reasonable dataplane implementation with all bells and whistles, plus the full-featured control plane (which WG does not have at all), and the "fast but only minimal features" dataplane can be offloaded to whatever is available - today the kernel, tomorrow dpdk/vpp, network cards with crypto accelerators, or whatever comes along.
This is an example PR of bulk-mode ++ mtio-mode: a multi-threaded server/client mode of operation that allows for even greater performance.
Blog Post: https://fossjon.com/2025/08/14/re-modifying-openvpn-source-code-to-allow-for-dual-connection-multi-threaded-load-balanced-operation-for-even-more-performance/