allow setting buffer sizes on server_socket
We add two options to set the recv and send (SO_RCVBUF, ...) buffer
sizes on a listening socket (server_socket). This is mostly useful to
propagate said sizes to all sockets returned by accept().

It is already possible to set the socket option directly on the
connected socket after it is returned by accept(), but experimentally
this results in a socket that has the specified buffer size but whose
receive window is not advertised to the client beyond the default
(64K for current typical kernel defaults). So you get only some of the
benefit of the larger buffer.

Setting the buffer size on the listening socket, however, is mentioned
as the correct approach in tcp(7) and does not suffer from the same
limitation.
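
The inheritance described above can be illustrated with plain POSIX sockets, independent of Seastar. This is a minimal sketch, assuming Linux and a usable loopback interface; `accepted_rcvbuf` is a hypothetical helper name and is not part of this change. It sets SO_RCVBUF on the listening socket before listen() and reads the option back from the socket returned by accept().

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cassert>

// Hypothetical helper (not part of Seastar): sets SO_RCVBUF on a listening
// socket, accepts one loopback connection, and returns the SO_RCVBUF value
// reported for the accepted socket. Returns -1 if any step fails.
int accepted_rcvbuf(int requested) {
    assert(requested > 0);

    int lfd = ::socket(AF_INET, SOCK_STREAM, 0);
    if (lfd < 0) {
        return -1;
    }
    int cfd = -1;
    int sfd = -1;
    int result = -1;

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0; // let the kernel pick a free port
    socklen_t addr_len = sizeof(addr);

    // The size is set before listen(), which is the order tcp(7) asks for.
    bool ok = ::setsockopt(lfd, SOL_SOCKET, SO_RCVBUF, &requested, sizeof(requested)) == 0
        && ::bind(lfd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0
        && ::listen(lfd, 1) == 0
        && ::getsockname(lfd, reinterpret_cast<sockaddr*>(&addr), &addr_len) == 0;

    if (ok) {
        // A blocking connect() on loopback completes once the kernel has
        // queued the connection, so accept() below does not block.
        cfd = ::socket(AF_INET, SOCK_STREAM, 0);
        ok = cfd >= 0
            && ::connect(cfd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0;
    }
    if (ok) {
        sfd = ::accept(lfd, nullptr, nullptr);
    }
    if (sfd >= 0) {
        int val = 0;
        socklen_t val_len = sizeof(val);
        if (::getsockopt(sfd, SOL_SOCKET, SO_RCVBUF, &val, &val_len) == 0) {
            result = val; // size inherited from the listening socket
        }
    }

    if (sfd >= 0) { ::close(sfd); }
    if (cfd >= 0) { ::close(cfd); }
    ::close(lfd);
    return result;
}
```

Calling `accepted_rcvbuf(131072)` would be expected to report at least the requested size on a typical Linux configuration, although the exact value depends on kernel limits such as net.core.rmem_max.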

A test is included which checks that the mechanism, including the
inheritance, works.

Closes scylladb#2458

(cherry picked from commit 4cb7f8e)
travisdowns authored and StephanDollberg committed Oct 23, 2024
1 parent 8a8a90c commit 4663e75
Showing 3 changed files with 107 additions and 3 deletions.
22 changes: 20 additions & 2 deletions include/seastar/net/api.hh
@@ -396,13 +396,31 @@ public:

/// @}

/// Options for creating a listening socket.
///
/// WARNING: these options currently only have an effect when using
/// the POSIX stack: all options are ignored on the native stack as they
/// are not implemented there.
struct listen_options {
    bool reuse_address = false;
    server_socket::load_balancing_algorithm lba = server_socket::load_balancing_algorithm::default_;
    transport proto = transport::TCP;
    int listen_backlog = 100;
    unsigned fixed_cpu = 0u;
    std::optional<file_permissions> unix_domain_socket_permissions;

    /// If set, the SO_SNDBUF size will be set to the given value on the listening socket
    /// via setsockopt. This buffer size is inherited by the sockets returned by
    /// accept and is the preferred way to set the buffer size for these sockets since
    /// setting it directly on the already-accepted socket is not fully effective (see tcp(7)).
    std::optional<int> so_sndbuf;

    /// If set, the SO_RCVBUF size will be set to the given value on the listening socket
    /// via setsockopt. This buffer size is inherited by the sockets returned by
    /// accept and is the preferred way to set the buffer size for these sockets since
    /// setting it directly on the already-accepted socket is not fully effective (see tcp(7)).
    std::optional<int> so_rcvbuf;

    void set_fixed_cpu(unsigned cpu) {
        lba = server_socket::load_balancing_algorithm::fixed;
        fixed_cpu = cpu;
@@ -457,8 +475,8 @@ public:
        return false;
    }

    /**
     * Returns available network interfaces. This represents a
     * snapshot of interfaces available at call time, hence the
     * return by value.
     */
9 changes: 9 additions & 0 deletions src/core/reactor.cc
@@ -1752,6 +1752,15 @@ reactor::posix_listen(socket_address sa, listen_options opts) {
    if (opts.reuse_address) {
        fd.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1);
    }

    if (opts.so_sndbuf) {
        fd.setsockopt(SOL_SOCKET, SO_SNDBUF, *opts.so_sndbuf);
    }

    if (opts.so_rcvbuf) {
        fd.setsockopt(SOL_SOCKET, SO_RCVBUF, *opts.so_rcvbuf);
    }

    if (_reuseport && !sa.is_af_unix())
        fd.setsockopt(SOL_SOCKET, SO_REUSEPORT, 1);

79 changes: 78 additions & 1 deletion tests/unit/socket_test.cc
@@ -27,13 +27,17 @@
#include <seastar/util/std-compat.hh>
#include <seastar/util/later.hh>
#include <seastar/testing/test_case.hh>
#include <seastar/testing/thread_test_case.hh>
#include <seastar/core/abort_source.hh>
#include <seastar/core/sleep.hh>
#include <seastar/core/thread.hh>
#include <seastar/core/when_all.hh>

#include <seastar/net/api.hh>
#include <seastar/net/posix-stack.hh>

#include <optional>
#include <tuple>

using namespace seastar;

future<> handle_connection(connected_socket s) {
@@ -224,3 +228,76 @@ SEASTAR_TEST_CASE(socket_on_close_local_shutdown_test) {
        when_all(std::move(client), std::move(server)).discard_result().get();
    });
}

SEASTAR_THREAD_TEST_CASE(socket_bufsize) {

    // Test that setting the send and recv buffer sizes on the listening
    // socket is propagated to the socket returned by accept().

    auto buf_size = [](std::optional<int> snd_size, std::optional<int> rcv_size) {
        listen_options lo{
            .reuse_address = true,
            .lba = server_socket::load_balancing_algorithm::fixed,
            .so_sndbuf = snd_size,
            .so_rcvbuf = rcv_size
        };

        ipv4_addr addr("127.0.0.1", 1234);
        server_socket ss = seastar::listen(addr, lo);
        connected_socket client = connect(addr).get();
        connected_socket server = ss.accept().get().connection;

        auto sockopt = [&](int option) {
            int val{};
            int ret = server.get_sockopt(SOL_SOCKET, option, &val, sizeof(val));
            BOOST_REQUIRE_EQUAL(ret, 0);
            return val;
        };

        int send = sockopt(SO_SNDBUF);
        int recv = sockopt(SO_RCVBUF);

        ss.abort_accept();
        client.shutdown_output();
        server.shutdown_output();

        return std::make_tuple(send, recv);
    };

    constexpr int small_size = 8192, big_size = 128 * 1024;

    // we pass different sizes for send and recv to catch any copy/paste
    // style bugs
    auto [send_small, recv_small] = buf_size(small_size, small_size * 2);
    auto [send_big, recv_big] = buf_size(big_size, big_size * 2);

    // Setting socket buffer sizes isn't an exact science: the kernel does
    // some rounding, (currently) doubles the requested size, and also
    // applies some limits. So as a basic check, assert simply that the
    // explicit small buffer ends up smaller than the explicit big buffer,
    // and that both results are at least as large as the requested amount.
    // The latter condition could plausibly fail if the OS clamped the size
    // at a small amount, but this is unlikely for the chosen buffer sizes.

    BOOST_CHECK_LT(send_small, send_big);
    BOOST_CHECK_LT(recv_small, recv_big);

    BOOST_CHECK_GE(send_small, small_size);
    BOOST_CHECK_GE(send_big, big_size);

    BOOST_CHECK_GE(recv_small, small_size * 2);
    BOOST_CHECK_GE(recv_big, big_size * 2);

    // not much to check here with "default" sizes, but let's at least call it
    // and check that we get a reasonable answer
    auto [send_default, recv_default] = buf_size({}, {});

    BOOST_CHECK_GE(send_default, 4096);
    BOOST_CHECK_GE(recv_default, 4096);

    // we don't really know the default socket size and it can vary by kernel
    // config, but 20 MB should be enough for everyone.
    BOOST_CHECK_LT(send_default, 20'000'000);
    BOOST_CHECK_LT(recv_default, 20'000'000);
}
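
The test above checks only inequalities because the kernel adjusts the requested size. That adjustment can be observed without Seastar. This is a minimal sketch, assuming Linux, where socket(7) documents that the value passed to setsockopt is doubled and that the doubled value is what getsockopt returns; `effective_sndbuf` is a hypothetical helper name and is not part of this change.

```cpp
#include <sys/socket.h>
#include <unistd.h>
#include <cassert>

// Hypothetical helper (not part of Seastar): requests an SO_SNDBUF size on a
// fresh TCP socket and returns the size the kernel reports back, or -1 if a
// system call fails.
int effective_sndbuf(int requested) {
    assert(requested > 0);

    int fd = ::socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        return -1;
    }

    int result = -1;
    if (::setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &requested, sizeof(requested)) == 0) {
        int val = 0;
        socklen_t val_len = sizeof(val);
        if (::getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &val, &val_len) == 0) {
            // The reported value is typically larger than the request,
            // subject to limits such as net.core.wmem_max.
            result = val;
        }
    }

    ::close(fd);
    return result;
}
```

Because the reported size depends on kernel version and sysctl limits, a check of the form "at least the requested size, and smaller requests yield smaller buffers" is more portable than comparing against an exact number, which is the approach the test takes.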
