mtio-mode #818
Conversation
This is amazing - but we're actually not interested in adding thread support to userland OpenVPN (we had some rudimentary thread support and threw it out between 2.4 and 2.5, if I remember correctly). The recommended way to get maximum throughput on FreeBSD, Linux or Windows is to use the "data channel offload" module (DCO) living in the respective OS kernel, completely avoiding all context switches, etc. - so OpenVPN userland can happily do single-threaded, race-condition-free TLS and key management, and the kernel side can do all-cores packet handling (how many cores are used in the end depends on OS queueing features, the ingress NICs used, etc.).
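For anyone wanting to compare the two data paths themselves, a rough sketch of such an A/B test, assuming an OpenVPN 2.6+ build with DCO compiled in (`server.conf` is a placeholder):

```sh
# DCO is engaged automatically when the configuration is compatible with it
openvpn --config server.conf

# force the userland data path on the same config, for comparison
openvpn --config server.conf --disable-dco
```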
Or, if we do add threaded support to userland again, it would probably be good to have a better control channel/data channel separation, so that userland is no longer the special wacky data channel, but instead we would have a user space implementation that uses the same interface as all the other DCO platforms. So writing a clean-sheet userspace data channel that uses modern techniques, threads and so on would be a great benefit. But again, as with the earlier "bulk mode" patch, it would help to have some idea what the improvements here are and how you measured them. The blog post shows 672 MBit/s, but that is not impressive without something to compare against or knowing the test setup. My macOS OpenVPN does over 1 Gbit/s against a local Linux DCO server without any modifications.
Hah, 1 Gbit/s... :-)
this is "iperf3, 10 TCP streams" over an OpenVPN link between two FreeBSD machines, using DCO. The limitation here is the slow CPU on the OpenVPN server and the single-threadedness of "iperf3" - it's a 10 year old i7-6700, and iperf3 runs at 100% CPU. Using |
To be clear, we are not saying that this isn't useful or doesn't make a difference. It is just hard to know what difference we are looking at without benchmarks.
Well, the patch as it stands is just proof-of-concept code - it breaks UDP, and it needs a cooperating client and server ("two TCP sessions"), so it's incompatible with regular OpenVPN. As such it is definitely interesting, but to go there "for real" would be a much larger effort, which is just not worth it given that DCO will still be faster by running as part of kernel packet processing instead of detouring to userland. Something which could be looked at, if it really must be "userland" and "TCP", is how complicated it would be to run 2 openvpn sessions in p2p mode in parallel (so, 2 on the client side, 2 on the server side) and then either share a tun fd or have the routing layer balance packets between the two instances.
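To illustrate the routing-layer half of that suggestion, a hypothetical Linux setup might look like the following (config file names, device names, port numbers and the 10.9.0.0/24 prefix are all made up for illustration):

```sh
# two independent p2p instances, each with its own tun device and TCP port
openvpn --config p2p-a.conf --dev tun0 --port 1194 --daemon
openvpn --config p2p-b.conf --dev tun1 --port 1195 --daemon

# let the kernel's multipath routing balance traffic across the two tunnels
ip route add 10.9.0.0/24 \
    nexthop dev tun0 weight 1 \
    nexthop dev tun1 weight 1
```

Note that Linux ECMP balances per flow rather than per packet (by default hashing on source/destination addresses; `net.ipv4.fib_multipath_hash_policy` can include ports), so a single TCP stream would still ride one tunnel while multiple concurrent flows spread across both.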
Thank you all for your inputs and reviews on the code! Good catch on the UDP part - I had to rework that middle method to make it compatible with respect to the context pointer. I updated the commit to allow for 4 parallel streams now, generalized it a bit more, and fixed up some edge-case errors as well.

Thanks for providing the extra context surrounding threading and the DCO feature. Looking back on the history of changes made to OpenVPN, I can see what happened here and why. It appears that when WireGuard became popular, a lot of folks were putting pressure on OpenVPN to move to the kernel level for performance reasons, which I guess made you abandon the threading component entirely. The ultimate issue with WireGuard is that it enforces a very small MTU and supports UDP only. This is a bad design in general for VPN software: if you are running a VPN appliance in the middle of your network, operating on behalf of your LAN clients automatically, those clients will assume they are on a 1500-byte-MTU link, and your ISP will enforce a 1500-byte MTU on the WAN side as well. To make matters worse, WG has no fragmentation component built in either, so my performance testing with it was extremely poor for my own personal use case.

I still maintain that one of OpenVPN's distinguishing features is that it can provide a true 1500-byte-MTU TUN/TAP interface as well as operate over TCP, without the need for any fragmentation or compression. And because it operates at the user application level, I can run it almost anywhere I can compile it - for example, in a Linux VM running on a Mac Mini. This means I can place it right in the middle of my core network at home, and with these new changes it operates almost perfectly without any changes needed on any of my network clients.

I am presently running bulk-mode and mtio-mode together at home on my core network, which is piping my entire LAN traffic over a VPN tunnel across my WAN link. With these changes, I am now getting full 6x1500 (~9000) byte TUN MTU reads off the interface and performing a single 9000-byte crypto/write call to the TCP socket, with 4 parallel TCP threads capable of load balancing with each other off of 1 shared TUN interface. My next step is to run 4 TUN interfaces at 4 TCP threads each (16 simultaneous OpenVPN session streams) and load balance across those interfaces somehow, perhaps via Linux network routing.

Even though this is all proof-of-concept work and I haven't officially provided any comparative test results yet, it's been working far better for me personally, and I appreciate your time and work on this open source project. I still think it would be good to add features similar to what I am presenting, because I don't believe that copying WG is the true answer; being able to optimize the application process would distinguish the project from any other VPN product on the market and possibly fulfill use cases that no other product can achieve.

Thanks,
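For readers following along, here is a minimal, self-contained sketch of the fan-out idea described above. It is not the actual patch, just an illustration under stated assumptions: several worker threads share one TUN fd (on Linux, each `read()` on a tun fd returns exactly one packet and concurrent reads are safe, so the kernel effectively hands packets to whichever thread calls `read()` next), while each thread owns its own TCP connection to the peer. Encryption, length framing and socket setup are deliberately omitted; `run_mtio` and `worker` are illustrative names:

```c
#include <pthread.h>
#include <unistd.h>

#define N_THREADS 4
#define BUF_SIZE  9000   /* matches the ~6x1500-byte bulk read described above */

struct worker {
    int tun_fd;   /* shared TUN interface fd */
    int sock_fd;  /* this thread's own TCP connection to the peer */
};

static void *worker_main(void *arg)
{
    struct worker *w = arg;
    char buf[BUF_SIZE];

    for (;;) {
        /* one packet per read; concurrent readers each get whole packets */
        ssize_t n = read(w->tun_fd, buf, sizeof(buf));
        if (n <= 0)
            break;
        /* a real implementation would encrypt and add length framing here */
        if (write(w->sock_fd, buf, (size_t)n) != n)
            break;
    }
    return NULL;
}

/* called with an open TUN fd and N_THREADS already-connected TCP sockets */
void run_mtio(int tun_fd, const int sock_fds[N_THREADS])
{
    pthread_t tid[N_THREADS];
    struct worker w[N_THREADS];

    for (int i = 0; i < N_THREADS; i++) {
        w[i].tun_fd = tun_fd;
        w[i].sock_fd = sock_fds[i];
        pthread_create(&tid[i], NULL, worker_main, &w[i]);
    }
    for (int i = 0; i < N_THREADS; i++)
        pthread_join(tid[i], NULL);
}
```

A real implementation would also need per-thread cipher contexts (or a safe way to share key material) and some reordering tolerance on the receive side, since packets that previously arrived in order over one connection can now arrive interleaved across the four TCP streams.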
We have never copied WireGuard :-) OpenVPN has had a "true 1500 byte MTU" and both UDP and TCP transport for 20+ years, and we had a Linux kernel module 7-8 years ago - able to saturate a 10G link already at that time. That first module was not open sourced, and not in a shape suitable for inclusion in the general Linux kernel, so it took quite a bit of time to get this all into proper shape.

This said, it is plainly impossible to achieve DCO speed in userland, due to the extra overhead for every single packet going to userland and back - so this is explicitly not our goal. Userland has a reasonable dataplane implementation with all bells and whistles, plus the full-featured control plane (which WG does not have at all), and the "fast but only minimal features" dataplane can be offloaded to whatever is available - today the kernel, tomorrow dpdk/vpp, network cards with crypto accelerators, or whatever comes along.
This is an example PR of bulk-mode ++ mtio-mode: a multi-threaded server/client mode of operation that allows for even greater performance.
Blog Post: https://fossjon.com/2025/08/14/re-modifying-openvpn-source-code-to-allow-for-dual-connection-multi-threaded-load-balanced-operation-for-even-more-performance/