Allow configuring the maximum size of the frame channel between a connection and one of its streams. #165

Conversation

stormshield-damiend

While doing some benchmarks with iperf3 on a network client/server application using Yamux + Tokio TcpStream + rustls, we noticed that the mpsc channel between a Connection and one of its Streams was filling up too fast, leading to lower bandwidth.

With the default hard-coded value (10 packets), with the client and server running inside two Docker containers on the same laptop, we measured 315 Mbits/sec. With an increased value of 16*1024 packets, we measured 426 Mbits/sec.

This PR adds an API that allows setting this value in YamuxConfig and leaves the default value as is.

@thomaseizinger
Contributor

I wonder whether we need to make this configurable or whether we can just increase the size of the channel. As documented, 10 was picked completely arbitrarily.

A question around your benchmarks: did you run the stream and the connection on different tasks? I'd expect performance to be quite bad if the same task has to switch back and forth between transferring data on the stream (which writes to the queue) and writing data to the wire (which empties this queue).

The former is actually the default in rust-libp2p if you poll streams as part of a ConnectionHandler rather than in a separately spawned task.

How big each individual frame can be also depends on how much credit the remote gave you. See #162 for a related discussion. Thus, I'd expect that, because of our fairly small default window size, you initially need a lot of space in this buffer to achieve good performance. As soon as the receive window has been increased, a single frame could easily be several MB in size, at which point performance should be quite good despite a small buffer size.

I am a bit hesitant to just expose this as a configuration option. I'd rather explore whether we can figure out what a good value is. Given that this is only for outgoing frames, I don't think we need to worry about DoS attacks and could just set it to a high value.

@stormshield-damiend
Author

stormshield-damiend commented Jul 3, 2023

Hi Thomas,

Some more context.

Here are the Yamux settings we use

let mut yamux_config = YamuxConfig::default();
yamux_config.set_window_update_mode(WindowUpdateMode::on_read()); // send window updates as data is read by the application
yamux_config.set_receive_window_size(16 * 1024 * 1024); // 16 MiB receive window
yamux_config.set_max_buffer_size(16 * 1024 * 1024);     // 16 MiB maximum buffer
yamux_config.set_max_stream_backlog(16 * 1024);         // 16 * 1024 frames for the connection-to-stream channel (the value from the benchmark above)

We use Yamux to stream data between client and server; on a dedicated stream we follow a pattern of sending the length of the data (2 bytes) followed by the data itself (around 1400 bytes).
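
For reference, here is a minimal sketch of that framing pattern, written against the futures AsyncWrite trait (which yamux streams implement). The helper name and the big-endian length encoding are illustrative, not code taken from our application:

use futures::io::{AsyncWrite, AsyncWriteExt};

// Illustrative helper: writes one record as a 2-byte big-endian length prefix
// followed by the payload (around 1400 bytes in our case).
async fn send_record<W>(stream: &mut W, payload: &[u8]) -> std::io::Result<()>
where
    W: AsyncWrite + Unpin,
{
    let len = u16::try_from(payload.len()).expect("payload length fits in 2 bytes");
    stream.write_all(&len.to_be_bytes()).await?; // length of data (2 bytes)
    stream.write_all(payload).await?;            // the data itself
    Ok(())
}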

We use the Tokio runtime in current_thread mode; I can run the same tests in multi_thread mode if needed.
We have many spawns all over our code to write and poll data.

Regarding your point about whether to expose this setting: it is true that we could derive a good value from max_buffer_size divided by a suitable average packet size.
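
As a back-of-the-envelope illustration of that idea (the constants come from the configuration and payload size above; suggested_channel_capacity is not an actual API):

// Rough sizing sketch, not an actual API: derive a channel capacity from the
// configured buffer size and an assumed average frame size.
const MAX_BUFFER_SIZE: usize = 16 * 1024 * 1024; // matches set_max_buffer_size above
const AVG_FRAME_SIZE: usize = 1400;              // our typical payload size

fn suggested_channel_capacity() -> usize {
    MAX_BUFFER_SIZE / AVG_FRAME_SIZE // = 11_983, close to the 16 * 1024 we benchmarked
}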

@thomaseizinger
Contributor

thomaseizinger commented Jul 4, 2023

Here is my intuition but this would have to be confirmed by benchmarks:

The best performance should be a combination of:

  • Spawning worker tasks for reading from and writing to streams
  • Using a dedicated task for polling the Connection (see the sketch after this list)
  • Using a multi-threaded runtime (though I'm not sure whether lock contention wouldn't hurt here at some point)
  • Writing to the stream should always be faster than writing to the wire (assuming you can produce data fast enough) because it only needs to write to a channel. This also assumes that your receive window is already ramped up.

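Roughly, the task layout I have in mind looks like the sketch below. It assumes yamux 0.12's poll-based Connection API (poll_next_inbound) and a Tokio runtime; handle_stream is a placeholder for your per-stream logic, and a Tokio TcpStream would need a futures-compat wrapper to satisfy the trait bounds:

use futures::future::poll_fn;
use futures::{AsyncRead, AsyncWrite};
use yamux::{Config, Connection, Mode};

// Runs on its own task and does nothing but drive the connection,
// i.e. move frames between the socket and the per-stream channels.
async fn drive_connection<T>(socket: T)
where
    T: AsyncRead + AsyncWrite + Unpin + Send + 'static,
{
    let mut connection = Connection::new(socket, Config::default(), Mode::Server);

    loop {
        match poll_fn(|cx| connection.poll_next_inbound(cx)).await {
            Some(Ok(stream)) => {
                // Each stream gets its own worker task, so stream I/O never
                // competes with driving the connection itself.
                tokio::spawn(handle_stream(stream));
            }
            Some(Err(e)) => {
                eprintln!("connection error: {e}");
                break;
            }
            None => break, // connection closed
        }
    }
}

// Placeholder for the application's per-stream read/write logic.
async fn handle_stream(stream: yamux::Stream) {
    let _ = stream;
}
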
The channel will fill up eventually, which is important for applying backpressure to your work-producing tasks.

My theory is that the initially small receive windows are what slows performance down. Those tiny bits of data are written to the wire very quickly, and thus the channel is often empty.

It would be great if you could confirm that. For that, you'll need to add logs for when a receiver is "pending". If we see those logs more often with a smaller channel size, that should confirm the theory.

There should be a point beyond which increasing the channel size no longer improves performance. Mind doing a bit of binary-search-style testing to find a good value for the channel size?
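
Something along these lines would already be enough (illustrative only, and a simple doubling search rather than a strict binary search; measure_throughput stands in for an iperf3 run and is not an existing function):

// Keep doubling the channel size until the throughput gain flattens out;
// the last size before the plateau is a reasonable default candidate.
fn find_channel_size(measure_throughput: impl Fn(usize) -> f64) -> usize {
    let mut size = 10; // current hard-coded default
    let mut best = measure_throughput(size);
    loop {
        let candidate = size * 2;
        let throughput = measure_throughput(candidate);
        if throughput <= best * 1.01 {
            // Less than ~1% improvement: stop here.
            return size;
        }
        best = throughput;
        size = candidate;
    }
}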

Perhaps property-based testing can help here.

I don't think we can set it in relation to the window size because we depend on our peer's window size, not our own.

@stormshield-damiend
Author

Hi Thomas,

Sure, I can run some more tests using different sizes; it may take some time before I get back with the results.
