This change demonstrates how quinn could make use of Windows Registered IO (RIO), or other kinds of completion-based IO.

Upfront note: The change is incomplete, buggy, and will leak all memory. Don't even think about using it as is. This is just a quick hack to get some ideas about integration and about achievable performance.
Integrating with registered IO requires the following changes to get things working (minimal sketches of several of these points follow after the list):
- Endpoint IO now happens on a dedicated thread instead of running on the shared tokio runtime. This allows it to use any platform-specific IO primitives it requires. In this case we are using RIO and a custom eventloop which waits for new IO becoming possible using a Windows ManualResetEvent; waiting via IOCP, or letting the thread busy-spin, is also possible. The actual loop, which is implemented in `EndpointDriver::run`, is not that different from the existing `EndpointDriver::poll` method (sketched below).
- RIO is completion-based and requires buffers to be registered with kernel space for the complete lifetime of the socket. In order to accommodate those requirements, a buffer pool is allocated when the socket and endpoint are created, and reused throughout the lifetime of the endpoint (sketched below). The endpoint makes sure the maximum possible number of concurrent receive operations is scheduled. Transmit operations get scheduled whenever data to transmit and TX buffers are available.
- Since quinn uses `Vec` to transmit outgoing buffers and `BytesMut` to decode incoming datagrams, all datagrams are copied once from the IO buffers to those higher-level buffers. This could theoretically be optimized.
- The endpoint can no longer receive data from connections using an async channel. This adds a custom channel implementation for this purpose, which consists of a trivial synchronized queue and a wakeup of the endpoint eventloop (sketched below).
- The endpoint can't use `tokio::spawn` anymore to spawn new connections, since it is not running inside a tokio context. Therefore a runtime handle needs to be explicitly propagated (sketched below).
- Sockets need to be created with `WSA_FLAG_REGISTERED_IO`. Therefore UDP sockets created via `std::net::UdpSocket` unfortunately can't be trivially forwarded. It is debatable whether this means the quinn library should be responsible for creating all sockets, or whether it should still accept external sockets but explicitly require that those have been configured with all the necessary flags (sketched below).
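To make the shape of the dedicated eventloop concrete, here is a minimal sketch. It stands in a condvar-based event for the Windows ManualResetEvent, and `drain_completions` is a hypothetical callback covering the RIO completion dequeue plus endpoint processing; none of these names come from the actual change.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Stand-in for a Windows ManualResetEvent: the RIO notification
/// would set this whenever new completions are queued.
#[derive(Default)]
struct IoEvent {
    signaled: Mutex<bool>,
    cond: Condvar,
}

impl IoEvent {
    fn wait(&self) {
        let mut signaled = self.signaled.lock().unwrap();
        while !*signaled {
            signaled = self.cond.wait(signaled).unwrap();
        }
        *signaled = false; // reset for the next round of completions
    }

    fn set(&self) {
        *self.signaled.lock().unwrap() = true;
        self.cond.notify_one();
    }
}

/// Shape of the dedicated driver thread: wait for the event, drain
/// completions, schedule new IO, repeat until the endpoint is done.
fn spawn_endpoint_driver(
    event: Arc<IoEvent>,
    mut drain_completions: impl FnMut() -> bool + Send + 'static,
) -> thread::JoinHandle<()> {
    thread::spawn(move || loop {
        event.wait();
        if !drain_completions() {
            break;
        }
    })
}
```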
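The buffer pool can be as simple as one contiguous slab that would be registered with the kernel once (via `RIORegisterBuffer` on Windows) and carved into fixed-size segments handed out by index. A minimal sketch with illustrative names, leaving the actual registration call out:

```rust
/// One contiguous allocation that can be registered with the kernel
/// once and stays alive for the whole lifetime of the endpoint.
struct BufferPool {
    slab: Vec<u8>,
    segment_size: usize,
    free: Vec<usize>, // indices of currently unused segments
}

impl BufferPool {
    fn new(segment_size: usize, segments: usize) -> Self {
        BufferPool {
            slab: vec![0u8; segment_size * segments],
            segment_size,
            free: (0..segments).collect(),
        }
    }

    /// Hand out a free segment, or None if all buffers are in flight.
    fn acquire(&mut self) -> Option<usize> {
        self.free.pop()
    }

    /// Return a segment once its send/receive operation completed.
    fn release(&mut self, index: usize) {
        self.free.push(index);
    }

    /// The slice backing a segment, e.g. to copy a received datagram
    /// out into a `BytesMut` (the extra copy mentioned above).
    fn segment_mut(&mut self, index: usize) -> &mut [u8] {
        let start = index * self.segment_size;
        &mut self.slab[start..start + self.segment_size]
    }
}
```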
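The custom channel then only needs a mutex-protected queue plus the eventloop's wakeup mechanism, so the IO thread notices new items the next time it runs. A sketch, reusing the `IoEvent` type from the first sketch (again, names are illustrative):

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};

/// Trivial synchronized queue: connections push on one side, the
/// endpoint eventloop drains it after being woken up.
struct EndpointChannel<T> {
    queue: Mutex<VecDeque<T>>,
    wakeup: Arc<IoEvent>, // the event the eventloop blocks on
}

impl<T> EndpointChannel<T> {
    fn send(&self, item: T) {
        self.queue.lock().unwrap().push_back(item);
        // Wake the endpoint thread so it picks the item up.
        self.wakeup.set();
    }

    /// Called from the eventloop after a wakeup.
    fn drain(&self) -> Vec<T> {
        self.queue.lock().unwrap().drain(..).collect()
    }
}
```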
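Propagating the runtime handle is the simplest of the points: capture it while still inside a tokio context, move it onto the IO thread, and use `Handle::spawn` instead of `tokio::spawn`. A sketch:

```rust
use tokio::runtime::Handle;

fn start_io_thread() {
    // Must be called from within the tokio runtime; otherwise
    // Handle::current() panics.
    let handle = Handle::current();

    std::thread::spawn(move || {
        // ... inside the endpoint eventloop, whenever a new
        // connection is accepted, spawn its task onto the runtime:
        handle.spawn(async move {
            // drive the connection here
        });
    });
}
```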
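For the socket creation point, the flag can only be supplied when the socket is created, which is exactly why an existing `std::net::UdpSocket` can't be forwarded. A rough sketch of what this could look like with the `windows-sys` bindings (untested; it assumes `WSAStartup` has already been called and that your `windows-sys` version exposes these WinSock items):

```rust
#[cfg(windows)]
fn create_rio_udp_socket() -> std::io::Result<windows_sys::Win32::Networking::WinSock::SOCKET> {
    use windows_sys::Win32::Networking::WinSock::{
        WSASocketW, AF_INET, INVALID_SOCKET, IPPROTO_UDP, SOCK_DGRAM,
        WSA_FLAG_OVERLAPPED, WSA_FLAG_REGISTERED_IO,
    };

    // WSA_FLAG_REGISTERED_IO has to be passed at creation time; it
    // cannot be enabled on an already existing socket.
    let socket = unsafe {
        WSASocketW(
            AF_INET as i32,
            SOCK_DGRAM as i32,
            IPPROTO_UDP as i32,
            std::ptr::null(),
            0,
            WSA_FLAG_OVERLAPPED | WSA_FLAG_REGISTERED_IO,
        )
    };

    if socket == INVALID_SOCKET {
        return Err(std::io::Error::last_os_error());
    }
    Ok(socket)
}
```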
Most of the points outlined here would also be required to support io_uring or AF_XDP with buffers pre-registered with the kernel, or just `sendmsg`/`sendmmsg` using `MSG_ZEROCOPY`, which has similar requirements.
Performance with this approach varies. The benchmarks indicate a throughput somewhere between 180 MB/s and 330 MB/s. Once a benchmark is started, it will consistently report either the low or the high value. Some comments in the msquic repository indicate that this might be due to RSS. Maybe something can be improved here by making sure the endpoint IO thread runs on the ideal core (sketched below).
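If the RSS theory holds, one cheap experiment would be pinning the IO thread to a specific core before entering the eventloop, e.g. the core handling the NIC's receive queue. A sketch using the `core_affinity` crate; the right core index is machine-specific, so the parameter here is just a placeholder:

```rust
fn run_endpoint_driver_pinned(core_index: usize) {
    std::thread::spawn(move || {
        // Pin this thread before entering the endpoint eventloop.
        if let Some(core_ids) = core_affinity::get_core_ids() {
            if let Some(core_id) = core_ids.get(core_index) {
                core_affinity::set_for_current(*core_id);
            }
        }
        // ... run EndpointDriver::run here ...
    });
}
```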
A follow-up POC which could be built, but isn't part of this demo, is to also move the `Connection` handling onto the new dedicated IO thread.