[RFC] iostream: extended read_exactly2 interface with alignment #5
base: ceph-octopus-19.06.0-45-g7744693c
@@ -315,16 +315,44 @@ posix_ap_server_socket_impl<Transport>::move_connected_socket(socket_address sa,

```cpp
    }
}

static constexpr size_t min_buf_size = 1 << 9;
static constexpr size_t max_buf_size = 1 << 15;

future<temporary_buffer<char>>
posix_data_source_impl::get() {
    return _fd->read_some(_buf.get_write(), _buf_size).then([this] (size_t size) {
        if (_buf_size == size) {
            _buf_size = std::min(max_buf_size, _buf_size << 2);
        } else if (size < (_buf_size >> 2)) {
            _buf_size = std::max(min_buf_size, _buf_size >> 2);
        }
        _buf.trim(size);
        auto ret = std::move(_buf);
        _buf = make_temporary_buffer<char>(_buffer_allocator, _buf_size);
        return make_ready_future<temporary_buffer<char>>(std::move(ret));
    });
}
```
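The resize rule in `get()` can be pulled out into a pure function, which makes its grow/shrink behaviour easy to check in isolation. This is a minimal sketch, not seastar code: `next_buf_size` is a hypothetical helper, and it only reproduces the arithmetic visible in the hunk above (grow 4x after a read that filled the buffer, shrink 4x after a read that used less than a quarter of it, clamped to [`min_buf_size`, `max_buf_size`] = [512, 32768] bytes).

```cpp
#include <algorithm>
#include <cstddef>

// Same bounds as the diff: 512 bytes .. 32 KiB.
static constexpr std::size_t min_buf_size = std::size_t(1) << 9;
static constexpr std::size_t max_buf_size = std::size_t(1) << 15;

// Hypothetical helper mirroring the resize logic in posix_data_source_impl::get().
std::size_t next_buf_size(std::size_t buf_size, std::size_t read_size) {
    if (read_size == buf_size) {
        // The read filled the whole buffer: more data is probably pending, grow 4x.
        return std::min(max_buf_size, buf_size << 2);
    }
    if (read_size < (buf_size >> 2)) {
        // Less than a quarter was used: the buffer looks oversized, shrink 4x.
        return std::max(min_buf_size, buf_size >> 2);
    }
    // Otherwise keep the current size.
    return buf_size;
}
```

Starting from 512 bytes, three consecutive full reads take the buffer to 2048, 8192 and then 32768 bytes, where it stays capped.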
```cpp
future<size_t, temporary_buffer<char>>
posix_data_source_impl::get_direct(char* buf, size_t size) {
    if (size > _buf_size / 2) {
```
Yeah, in the 1 syscall/msg testing, 4096 currently looks like a reasonable threshold for prefetching.

Yep, it is the same strategy implemented in the current async-messenger.

Is the

I think I cannot move it up, because this is the special case to reduce syscalls for POSIX sockets, which is not general to other concrete
```cpp
        // this is a large read, so we don't prefetch
        return _fd->read_some(buf, size).then([this] (auto read_size) {
            if (_buf_size == read_size) {
                _buf_size = std::min(max_buf_size, _buf_size << 2);
            } else if (read_size < (_buf_size >> 2)) {
                _buf_size = std::max(min_buf_size, _buf_size >> 2);
            }
            return make_ready_future<size_t, temporary_buffer<char>>(
                read_size, temporary_buffer<char>());
        });
    } else {
        // read with prefetch, but with an extra memory copy,
        // because we prefer fewer system calls.
```
What about

For
```cpp
        return data_source_impl::get_direct(buf, size);
    }
}
```
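The comments in `get_direct()` describe a trade-off: a large read goes straight into the caller's memory (no copy), while a small read goes through the prefetch buffer (an extra copy, but fewer syscalls). The following is a rough model of that trade-off, not seastar code: `choose_path` and `count_syscalls` are hypothetical names, the buffer size is held fixed, bytes already prefetched are assumed to be handed out before any new read, and every read is assumed to return the full amount asked for, which a real `read_some()` does not guarantee.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

enum class read_path { direct, prefetch };

// Same rule as the diff: requests larger than half the prefetch buffer bypass it.
read_path choose_path(std::size_t request, std::size_t buf_size) {
    return request > buf_size / 2 ? read_path::direct : read_path::prefetch;
}

// Count read syscalls needed to serve a sequence of requests (simplified model).
std::size_t count_syscalls(const std::vector<std::size_t>& requests,
                           std::size_t buf_size) {
    std::size_t syscalls = 0;
    std::size_t buffered = 0;  // prefetched bytes not yet handed out
    for (std::size_t need : requests) {
        // Serve as much as possible from bytes that were already prefetched.
        std::size_t from_buf = std::min(need, buffered);
        buffered -= from_buf;
        need -= from_buf;
        if (need == 0) {
            continue;
        }
        ++syscalls;  // one read for the remainder
        if (choose_path(need, buf_size) == read_path::prefetch) {
            // The read filled the whole prefetch buffer; keep the surplus
            // (it is later copied out to the caller).
            buffered = buf_size - need;
        }
        // On the direct path the bytes land in the caller's memory; nothing is kept.
    }
    return syscalls;
}
```

Under these assumptions, 100 messages of 64 bytes cost a single read with an 8192-byte buffer, but 100 reads if the buffer were only 64 bytes, because every request would then take the direct path.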
```cpp
future<> posix_data_source_impl::close() {
    _fd->shutdown(SHUT_RD);
    return make_ready_future<>();
}
```
I'm afraid that using `read_exactly2()` to read big chunks will impose a `memcpy` for DPDK because of the contiguity requirement. The new method returns a `temporary_buffer`, which means only one data pointer and one data size. If we expect fragmented payloads from DPDK, we should expect a lot of `memcpy` from `read_exactly2()`.
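The concern above is that a single `temporary_buffer` carries one pointer and one length, so a payload that arrives as several segments has to be copied into one contiguous allocation. A minimal sketch of that cost, using plain `std::vector<char>` fragments as stand-ins for received segments (an assumption; this is neither the DPDK nor the seastar API, and `linearize` is a hypothetical helper):

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

using fragment = std::vector<char>;  // stand-in for one received segment

struct linearized {
    std::vector<char> data;    // one contiguous copy of the payload
    std::size_t bytes_copied;  // memcpy cost paid to produce it
};

// Join fragments into one contiguous buffer, as a single-pointer,
// single-length interface requires. Every payload byte is copied once.
linearized linearize(const std::vector<fragment>& frags) {
    std::size_t total = 0;
    for (const auto& f : frags) {
        total += f.size();
    }
    linearized out{std::vector<char>(total), 0};
    std::size_t off = 0;
    for (const auto& f : frags) {
        if (!f.empty()) {
            std::memcpy(out.data.data() + off, f.data(), f.size());
        }
        off += f.size();
    }
    out.bytes_copied = off;
    return out;
}
```

A zero-copy implementation could hand back a lone fragment as-is; the sketch always copies to keep the point visible: the copy cost grows with the whole payload size, not with the number of fragments.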
No. If crimson-OSD supports fragmented payloads (such as with SPDK), it should explicitly instruct the messenger to use `ceph::net::Socket::read(size)` instead of `ceph::net::Socket::read_exactly(size, alignment)`, because `ceph::net::Socket::read(size)` returns an internally fragmented `bufferlist` as expected. IMO it would be better renamed to `ceph::net::Socket::read_fragmented(size)`.

Also, the current `ceph::net::Socket::read(size)` is already optimal for both the DPDK stack and the POSIX stack if the OSD side supports fragmented DATA payloads:

If the OSD doesn't support fragmented payloads itself (such as the kernel), `ceph::net::Socket::read_exactly(size)` still needs to be used to build up big chunks of aligned payload, regardless of whether the messenger is using the native or the POSIX stack.

My point is that whether to use fragmented or aligned payloads should be decided by the OSD, not by the seastar framework. It's our (the framework user's) specific requirement.
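For the aligned case described above, the bytes have to end up in one buffer whose start address is a multiple of the alignment (4096 is used here only as an example value). A minimal sketch of that gathering step, assuming `std::aligned_alloc` (C++17) and `std::vector<char>` fragments; `gather_aligned` is a hypothetical helper, not the crimson or seastar implementation:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <memory>
#include <vector>

struct free_deleter {
    void operator()(char* p) const { std::free(p); }
};
using aligned_buf = std::unique_ptr<char, free_deleter>;

// Copy `size` bytes from the fragments into one buffer aligned to `alignment`
// (a power of two). Assumes size > 0 and that the fragments hold at least `size` bytes.
aligned_buf gather_aligned(const std::vector<std::vector<char>>& frags,
                           std::size_t size, std::size_t alignment) {
    // std::aligned_alloc expects the size to be a multiple of the alignment.
    std::size_t alloc_size = (size + alignment - 1) / alignment * alignment;
    aligned_buf buf(static_cast<char*>(std::aligned_alloc(alignment, alloc_size)));
    if (!buf) {
        return buf;  // allocation failed
    }
    std::size_t off = 0;
    for (const auto& f : frags) {
        std::size_t n = std::min(f.size(), size - off);
        if (n == 0) {
            continue;
        }
        std::memcpy(buf.get() + off, f.data(), n);
        off += n;
    }
    return buf;
}
```

The copy is the price of alignment here; a fragmented read can skip it only when the consumer accepts fragments directly.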
I disagree with that. For the POSIX stack the SGL will be terribly fragmented, and many syscalls will be issued because of the small, 8 KB prefetch buffer. For instance, reading a 4 MB payload requires 4096 KB / 8 KB = 512 fragments and also 512 syscalls.
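The arithmetic above generalizes to a ceiling division of payload size by chunk size. A small compile-time check, assuming every read returns a full chunk except possibly the last (`reads_needed` is a hypothetical helper); for comparison it also covers the 32 KB `max_buf_size` cap from the diff and a 1 MB chunk:

```cpp
#include <cstddef>

// Reads (and resulting fragments) needed to receive `payload` bytes when each
// read returns at most `chunk` bytes.
constexpr std::size_t reads_needed(std::size_t payload, std::size_t chunk) {
    return (payload + chunk - 1) / chunk;
}

constexpr std::size_t KiB = 1024;
constexpr std::size_t MiB = 1024 * KiB;

static_assert(reads_needed(4 * MiB, 8 * KiB) == 512, "8 KB prefetch buffer");
static_assert(reads_needed(4 * MiB, 32 * KiB) == 128, "max_buf_size cap in the diff");
static_assert(reads_needed(4 * MiB, 1 * MiB) == 4, "1 MB chunks");
```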
I think it's a separate issue, and there is already another PR addressing it (#4). My analysis (#4 (comment)) shows that messenger performance is much better with larger chunks (1 MB), as expected. But I still don't know why `rados bench` disagreed (from kefu).