-
Notifications
You must be signed in to change notification settings - Fork 818
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support splice in http blind tunnel #11890
base: master
Are you sure you want to change the base?
Conversation
Passes unit test and manual integration test on Debian 12. Performance benchmark shows that splice could improve maximum blind tunnel throughput from 300 MB/s to 575 MB/s and reduce latency by 40% for MB level payload on C6in.large EC2 instance. Feel free to benchmark this CR. |
339f013
to
5a1b3ee
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the PR. Overall I think this is a clever idea to introduce zero-copy in a way that fits the existing architecture. I found a few small issues that I hope you can address before merging.
@@ -7274,6 +7274,32 @@ HttpSM::setup_blind_tunnel(bool send_response_hdr, IOBufferReader *initial) | |||
// header buffer into new buffer | |||
client_request_body_bytes += from_ua_buf->write(_ua.get_txn()->get_remote_reader()); | |||
|
|||
#if TS_USE_LINUX_SPLICE | |||
MIOBuffer *from_ua_pipe_buf = new_PipeIOBuffer(BUFFER_SIZE_INDEX_32K); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BUFFER_SIZE_INDEX_32K
is equal to 8. Is this the capacity of the pipe that we're requesting from the kernel?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is the same as the MIOBuffer size we use for blind tunnel. The default linux pipe size is 16 pages so I am trying to save memory by requesting only 8 pages here. However, the system admin still need to lift pipe-user-pages-soft limit to avoid exceptions.
src/iocore/net/UnixNetVConnection.cc
Outdated
|
||
if (r <= 0) { | ||
// Temporary Unavailable, Non-Blocking I/O | ||
if (r == -EAGAIN || r == -ENOTCONN) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible that we get EAGAIN here because the pipe is at capacity? How do we handle that case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is actually impossible to get EAGAIN here because the pipe is at capacity. We will disable the read vio after each successful read and only reenable it when its corresponding pipe is empty again.
However, it is possible that we get EAGAIN because the socket is somehow unavailable. In that case, we wait for next epoll edge trigger same as the logic without zero copy
|
||
#if TS_USE_LINUX_SPLICE | ||
|
||
class PipeIOBufferReader : public IOBufferReader |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Managing header dependency in ATS (specifically iocore) has been an ongoing challenge. One of the reasons is that we often combine multiple classes into a single header file. Please move the new classes to their own header and source files.
@@ -508,6 +508,81 @@ UnixNetVConnection::net_read_io(NetHandler *nh) | |||
read_disable(nh, this); | |||
return; | |||
} | |||
#if TS_USE_LINUX_SPLICE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is getting quite long, with two independent code paths to read from socket to buffer. I would prefer using polymorphism to handle the different socket-to-buffer copies, but at least we should split this function up so that it's more readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have considered using polymorphism here but it looks impossible without significant refactoring because of exposure of low level io operations in this function (we will need a new member function in both MIOBuffer and PipeIOBuffer). Will split this function up for now
src/iocore/eventsystem/P_IOBuffer.h
Outdated
@@ -656,6 +656,15 @@ new_MIOBuffer_internal(const char *location, int64_t size_index) | |||
TS_INLINE void | |||
free_MIOBuffer(MIOBuffer *mio) | |||
{ | |||
#if TS_USE_LINUX_SPLICE | |||
// check if mio is PipeIOBuffer using dynamic_cast | |||
PipeIOBuffer *pipe_mio = dynamic_cast<PipeIOBuffer *>(mio); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Relying on a runtime type check with dynamic_cast to select the proper deallocation mechanism is generally considered an anti-pattern. Instead, ClassAllocator has a Destruct_on_free_
parameter that allows you to use a virtual destructor to clean up properly depending on the underlying type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PipeIOBuffer is managed by pipeIOAllocator instead of ioAllocator. I introduced pipeIOAllocator because it is a new class with additional attributes. The ClassAllocator appears to be designed for concrete classes rather than abstract classes or interfaces.
- I didn’t put the cleanup logic to the virtual destructor because the existing logic for MIOBuffer doesn’t use MIOBuffer's destructor. Instead, it manually handles cleanup in free_MIOBuffer with the following steps:
mio->_writer = nullptr;
mio->dealloc_all_readers();
Interestingly, the destructor is designed to perform this same cleanup, but for some reason, the existing code avoids relying on it. If anyone knows why, I’d appreciate the insight.
- Even if we moved the cleanup logic to the virtual destructor, we would still need a runtime type check since real polymorphism is not yet in place. For example, when dealing with a PipeIOBuffer, we would need to use
THREAD_FREE(mio, pipeIOAllocator, this_thread());
instead of
THREAD_FREE(mio, ioAllocator, this_thread());
to make sure the correct allocator is used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will keep it as it was until figure out how to implement real polymorphism here. More background info required.
Thanks for the comments @moonchen. I will start to resolve the comments |
5a1b3ee
to
0f7fc70
Compare
Slightly reorganized the file structure to decouple PipieIOBuffer from original IOBuffer. |
TS_INLINE char * | ||
PipeIOBufferReader::start() | ||
{ | ||
throw std::runtime_error("Not applicable for PipeIOBufferReader"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, this PR is pretty interesting as concept. However, overriding bunch of methods and throwing runtime errors like this is indicating something is wrong with class design. ( Also, please note that ATS doesn't use Exception in these fundamental code )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m aware of this and have discussed it with Susan. The ultimate solution would be to extract a pure abstract class named IOBuffer, making MIOBuffer and PipeIOBuffer both derived classes of IOBuffer. This would provide a cleaner separation of concerns and eliminate the need for unnecessary overrides. However, this would require significant engineering effort, and I may not have enough resources to invest in this at the moment.
The reason I explicitly throw errors in empty overridden methods is to prevent developers from incorrectly calling MIOBuffer-specific methods on PipeIOBuffer instances, which could lead to undefined behavior. This ensures that such issues are caught early rather than silently causing unintended consequences.
I’m open to discussing alternative approaches if there’s a way to improve the class design while keeping the effort reasonable.
1. Add TS_USE_LINUX_SPLICE as a compilation option. 2. Make MIOBuffer and MIOBufferReader polymorphic classes making their member function virtual. 3. Create PipeIOBuffer and PipeIOBufferReader as derived classed, encapsulating Linux pipe. 4. Use dynamic_cast to enable logic switch in state machines and continuations.
0f7fc70
to
461e3e5
Compare
The latest revision is to address the following comments: |
[approve ci] |
Documentations:
ATS_splice_runbook.md
ATS Performance Benchmark.pdf
ATS_splice_design_doc.pdf