issue: 1792164 Socket error queue support #900
base: master
NOTIFY_ON_EVENTS should be used in all places to provide a single way of passing any epoll events. Signed-off-by: Igor Ivanov <[email protected]>
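The idea of one entry point for posting epoll events can be sketched as below. The structure `sketch_socket`, its fields, and the helper `notify_on_events()` are hypothetical. They only mirror the role the commit gives NOTIFY_ON_EVENTS and are not VMA's actual definitions.

```c
#include <stdint.h>
#include <sys/epoll.h>

/* Hypothetical per-socket state: events that epoll_wait() should report. */
struct sketch_socket {
    uint32_t pending_events;
    unsigned notify_calls; /* how many times the single entry point ran */
};

/* Hypothetical single entry point. Every code path that raises an epoll
 * event (data arrival, writability, error queue) goes through here, so
 * event bookkeeping lives in one place. */
void notify_on_events(struct sketch_socket *s, uint32_t events)
{
    s->pending_events |= events;
    s->notify_calls++;
    /* A real implementation would also wake epoll waiters here. */
}
```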
9c9c072 to 16e0cbf
Test FAILed.
16e0cbf to 3db3bd0
Signed-off-by: Igor Ivanov <[email protected]>
Add a flags argument that comes from the original recv() call. It is needed to return information from the error queue, which must be done when MSG_ERRQUEUE is passed. Signed-off-by: Igor Ivanov <[email protected]>
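The application-side call below is why the flags must reach the rx() path. It is a standard Linux recvmsg() with MSG_ERRQUEUE; an intercepting library only sees that flag if the original flags are passed down. The helper name `read_errqueue()` is hypothetical. With MSG_DONTWAIT, an empty error queue makes the call return -1 with errno set to EAGAIN.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Application-side sketch: try to read one entry from the socket error
 * queue without blocking. Zerocopy notifications carry their payload in
 * the control buffer, so no data iovec is supplied. */
ssize_t read_errqueue(int fd, void *ctrl, size_t ctrl_len)
{
    struct msghdr msg = {0};
    msg.msg_control = ctrl;
    msg.msg_controllen = ctrl_len;
    return recvmsg(fd, &msg, MSG_ERRQUEUE | MSG_DONTWAIT);
}
```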
Signed-off-by: Igor Ivanov <[email protected]>
Signed-off-by: Igor Ivanov <[email protected]>
Zero copy was introduced in Linux kernel 4.14, so previous versions do not have the related options. Signed-off-by: Igor Ivanov <[email protected]>
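A common way to build against headers older than 4.14 is to provide fallback definitions guarded by `#ifndef`. The sketch below assumes the asm-generic value of SO_ZEROCOPY used on x86 and most architectures; some architectures define it differently, so check the platform headers. It is not necessarily how this patch handles the missing options.

```c
#include <sys/socket.h>
#include <time.h>           /* struct timespec, used by linux/errqueue.h */
#include <linux/errqueue.h>

/* Fallbacks for pre-4.14 headers that lack the zerocopy definitions. */
#ifndef SO_ZEROCOPY
#define SO_ZEROCOPY 60            /* asm-generic value */
#endif
#ifndef MSG_ZEROCOPY
#define MSG_ZEROCOPY 0x4000000
#endif
#ifndef SO_EE_ORIGIN_ZEROCOPY
#define SO_EE_ORIGIN_ZEROCOPY 5
#endif
#ifndef SO_EE_CODE_ZEROCOPY_COPIED
#define SO_EE_CODE_ZEROCOPY_COPIED 1
#endif
```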
Passing the MSG_ZEROCOPY flag is the most obvious step to enable copy avoidance, but not the only one. The kernel is permissive when applications pass undefined flags to the send system call: by default it simply ignores them. To avoid enabling copy avoidance mode for legacy processes that accidentally already pass this flag, a process must first signal intent by setting the SO_ZEROCOPY socket option. Signed-off-by: Igor Ivanov <[email protected]>
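The two-step opt-in looks like this. The application sets SO_ZEROCOPY once, and the library honours MSG_ZEROCOPY only on sockets that did so. `enable_so_zerocopy()` uses the real setsockopt() call. `zerocopy_requested()` is a hypothetical helper showing the "ignore the flag unless opted in" rule, not VMA's code.

```c
#include <stdbool.h>
#include <sys/socket.h>

#ifndef SO_ZEROCOPY
#define SO_ZEROCOPY 60 /* asm-generic value; Linux >= 4.14 headers define it */
#endif
#ifndef MSG_ZEROCOPY
#define MSG_ZEROCOPY 0x4000000
#endif

/* Application side: declare zerocopy intent once per socket. */
int enable_so_zerocopy(int fd)
{
    int one = 1;
    return setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &one, sizeof(one));
}

/* Library side (hypothetical): treat MSG_ZEROCOPY as a request only when
 * the socket opted in; otherwise ignore the flag, as the kernel does. */
bool zerocopy_requested(bool so_zerocopy_enabled, int send_flags)
{
    return so_zerocopy_enabled && (send_flags & MSG_ZEROCOPY) != 0;
}
```

After a successful `enable_so_zerocopy(fd)`, the application sends with `send(fd, buf, len, MSG_ZEROCOPY)`. It must not modify `buf` until a completion for that call is read from the error queue.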
Extend the pbuf allocation functions with a new pbuf_type parameter to pass the requested type of memory to the socket. The socket layer needs this information to manage different types of mem_buf_desc_t elements. Signed-off-by: Igor Ivanov <[email protected]>
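A minimal sketch of an allocator that takes the memory type as a parameter follows. The enum, structure, and `sketch_pbuf_alloc()` are hypothetical stand-ins, not the actual lwip/VMA pbuf_type values or signatures. They only show the caller telling the socket layer which kind of descriptor it needs.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical memory kinds; names are illustrative, not VMA's enum. */
enum sketch_pbuf_type {
    SKETCH_PBUF_RAM,      /* payload copied into an internal buffer */
    SKETCH_PBUF_ZEROCOPY  /* descriptor refers to a user buffer */
};

struct sketch_pbuf {
    enum sketch_pbuf_type type;
    size_t len;
};

/* The requested type travels with the allocation, so the socket layer can
 * pick (and later release) the matching kind of memory descriptor. */
struct sketch_pbuf *sketch_pbuf_alloc(size_t len, enum sketch_pbuf_type type)
{
    struct sketch_pbuf *p = malloc(sizeof(*p));
    if (p != NULL) {
        p->type = type;
        p->len = len;
    }
    return p;
}
```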
Three flags are added: VMA_TX_PACKET_ZEROCOPY, used on the sockinfo/dst_entry layers; TCP_WRITE_ZEROCOPY, used inside lwip tcp_write(); and TF_SEG_OPTS_ZEROCOPY, which marks a TCP segment with the zero copy attribute. Signed-off-by: Igor Ivanov <[email protected]>
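The sketch below shows how the attribute could travel down the stack. The flag names come from the commit message, but their bit values and both translation helpers are assumptions made for illustration.

```c
#include <stdint.h>

/* Names from the commit message; bit values here are hypothetical. */
#define VMA_TX_PACKET_ZEROCOPY (1u << 0) /* sockinfo/dst_entry layer */
#define TCP_WRITE_ZEROCOPY     (1u << 1) /* flag passed to lwip tcp_write() */
#define TF_SEG_OPTS_ZEROCOPY   (1u << 2) /* stored on the TCP segment */

/* Hypothetical: sockinfo layer -> lwip, translate the packet attribute. */
uint32_t to_tcp_write_flags(uint32_t tx_packet_flags)
{
    return (tx_packet_flags & VMA_TX_PACKET_ZEROCOPY) ? TCP_WRITE_ZEROCOPY : 0;
}

/* Hypothetical: inside tcp_write(), mark segments built for zerocopy. */
uint32_t to_seg_opts(uint32_t tcp_write_flags)
{
    return (tcp_write_flags & TCP_WRITE_ZEROCOPY) ? TF_SEG_OPTS_ZEROCOPY : 0;
}
```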
3db3bd0 to 74b4965
Test FAILed.
Test PASSed.
9964cb1 to fdc9e75
Test FAILed.
fdc9e75 to bf819fc
Test PASSed.
These changes make the MSG_ZEROCOPY send flow workable, including the notification mechanism. The process must be notified when it is safe to reuse a previously passed buffer, so completion notifications are queued on the socket error queue. Copy avoidance itself is not yet done internally: all data is still copied into internal buffers, as without MSG_ZEROCOPY. Full zcopy support will be implemented later. Signed-off-by: Igor Ivanov <[email protected]>
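On the application side, each error-queue entry is interpreted as shown below. `struct sock_extended_err`, SO_EE_ORIGIN_ZEROCOPY, and SO_EE_CODE_ZEROCOPY_COPIED are the kernel's definitions. The helper names `zerocopy_range()` and `zerocopy_count()` are hypothetical. Per the kernel's MSG_ZEROCOPY design, a completion covers an inclusive range of send-call counters.

```c
#include <stdbool.h>
#include <stdint.h>
#include <time.h>           /* struct timespec, used by linux/errqueue.h */
#include <linux/errqueue.h> /* struct sock_extended_err */

#ifndef SO_EE_ORIGIN_ZEROCOPY
#define SO_EE_ORIGIN_ZEROCOPY 5
#endif
#ifndef SO_EE_CODE_ZEROCOPY_COPIED
#define SO_EE_CODE_ZEROCOPY_COPIED 1
#endif

/* Interpret one error-queue entry as a MSG_ZEROCOPY completion. The range
 * [ee_info, ee_data] (inclusive) names send calls whose buffers may now be
 * reused. `copied` is set when the data was copied instead of sent in place. */
bool zerocopy_range(const struct sock_extended_err *ee,
                    uint32_t *lo, uint32_t *hi, bool *copied)
{
    if (ee->ee_errno != 0 || ee->ee_origin != SO_EE_ORIGIN_ZEROCOPY)
        return false; /* not a zerocopy completion */
    *lo = ee->ee_info;
    *hi = ee->ee_data;
    *copied = (ee->ee_code & SO_EE_CODE_ZEROCOPY_COPIED) != 0;
    return true;
}

/* Number of completed send calls; unsigned arithmetic also handles the
 * 32-bit counter wrapping around. */
uint32_t zerocopy_count(uint32_t lo, uint32_t hi)
{
    return hi - lo + 1;
}
```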
ZCOPY packets should notify the application as soon as possible to confirm that user buffers are free to reuse, so force a completion signal for such work requests. Signed-off-by: Igor Ivanov <[email protected]>
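The rule can be sketched as completion moderation with an override. The counter-based batching below is an assumption about how unsignaled work requests are usually handled; in verbs terms, "signal" would mean setting IBV_SEND_SIGNALED on the work request. `should_signal_completion()` is a hypothetical helper, not VMA's code.

```c
#include <stdbool.h>

/* Hypothetical moderation rule: signal a TX completion once every
 * `signal_every` work requests, but always signal zerocopy requests so
 * their buffers can be released and reported without delay. */
bool should_signal_completion(unsigned *unsignaled, unsigned signal_every,
                              bool is_zcopy)
{
    if (is_zcopy || ++(*unsignaled) >= signal_every) {
        *unsignaled = 0; /* a signaled completion restarts the batch */
        return true;
    }
    return false;
}
```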
A TCP write can create several memory descriptors with an identical zcopy id for a single write call. The notification should be done only when the last one is freed. This change does not fully guarantee correctness: during the same write() call, a memory descriptor may set the current zcopy id after a previous memory descriptor has already received its TX completion and ACK. To track the unique counter correctly, a zcopy operation should allocate its own memory buffer, so tcp_write() should avoid appending a portion of data to an existing pbuf. Signed-off-by: Igor Ivanov <[email protected]>
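"Notify only when the last descriptor is freed" amounts to a reference count per zcopy id. The sketch below assumes that approach, and the structure and helpers are hypothetical. It also exposes the limitation described above: if the first descriptor is freed before the second one is attached, the count reaches zero early and the notification fires too soon.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tracker for one write() call whose data was split across
 * several memory descriptors sharing the same zcopy id. */
struct zcopy_tracker {
    uint32_t id;   /* zcopy id of the write() call */
    unsigned refs; /* descriptors with this id still in flight */
};

void zcopy_desc_attach(struct zcopy_tracker *t)
{
    t->refs++;
}

/* Returns true when this free releases the last descriptor, i.e. when the
 * notification for t->id should be queued on the error queue. */
bool zcopy_desc_free(struct zcopy_tracker *t)
{
    return t->refs > 0 && --t->refs == 0;
}
```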
To process TX completions effectively, VMA should poll TX from the internal thread too. Otherwise TX memory descriptors cannot be freed in time, because the user application would have to call some write() operation to force it. Signed-off-by: Igor Ivanov <[email protected]>
Flexible tuning is added to control RX and TX polling. Signed-off-by: Igor Ivanov <[email protected]>
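One iteration of an internal polling loop, with RX and TX polling switchable, might look like the sketch below. The commit does not name its tuning parameters, so the mask bits, counters, and the functions `poll_rx_cq()`, `poll_tx_cq()`, and `internal_thread_iteration()` are hypothetical. The counters stand in for actually draining completion queues.

```c
/* Hypothetical tuning bits: which completion queues the thread polls. */
enum { SKETCH_POLL_RX = 1 << 0, SKETCH_POLL_TX = 1 << 1 };

struct sketch_counters {
    unsigned rx_polls;
    unsigned tx_polls;
};

/* Hypothetical stand-ins for draining the RX and TX completion queues. */
static void poll_rx_cq(struct sketch_counters *c) { c->rx_polls++; }

/* Draining TX completions frees TX descriptors (and lets zcopy
 * notifications fire) even if the application never writes again. */
static void poll_tx_cq(struct sketch_counters *c) { c->tx_polls++; }

/* One iteration of the internal thread's loop, driven by the tuning mask. */
void internal_thread_iteration(unsigned poll_mask, struct sketch_counters *c)
{
    if (poll_mask & SKETCH_POLL_RX)
        poll_rx_cq(c);
    if (poll_mask & SKETCH_POLL_TX)
        poll_tx_cq(c);
}
```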
rx() processing should allow returning information from the error queue and incoming data in a single call. Depending on the user application, rx() logic should return: 1. only incoming data; 2. only error queue data; 3. both incoming data and error queue data. Error processing logic is done accordingly. Signed-off-by: Igor Ivanov <[email protected]>
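The three cases can be expressed as result bits from one call. This differs from the stock kernel, where a MSG_ERRQUEUE call reads only the error queue. The enum and `rx_result()` are hypothetical, and the sketch only decides what would be returned, not how it is copied.

```c
#include <stdbool.h>

/* Hypothetical result bits for a single rx() call. */
enum { RX_GOT_DATA = 1 << 0, RX_GOT_ERRQUEUE = 1 << 1 };

/* errqueue_requested: the caller passed MSG_ERRQUEUE (and a control buffer).
 * Returns which parts a single call would fill in. */
int rx_result(bool errqueue_requested, bool errqueue_pending,
              bool data_pending)
{
    int result = 0;
    if (errqueue_requested && errqueue_pending)
        result |= RX_GOT_ERRQUEUE; /* error queue is handled first */
    if (data_pending)
        result |= RX_GOT_DATA;
    return result;
}
```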
Signed-off-by: Igor Ivanov <[email protected]>
An LSO operation cannot be done when the payload is smaller than the MSS. This change ensures that LSO is used correctly. Signed-off-by: Igor Ivanov <[email protected]>
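LSO lets the NIC split one large buffer into MSS-sized segments, so it only pays off when there is more than one segment's worth of data. The check can be sketched as below. `use_lso()` is hypothetical, and treating a payload exactly equal to the MSS as a single regular packet is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical send-path check: request LSO only when the payload spans
 * more than one MSS; otherwise send a single regular packet. */
bool use_lso(bool lso_supported, size_t payload_len, size_t mss)
{
    return lso_supported && payload_len > mss;
}
```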
Signed-off-by: Igor Ivanov <[email protected]>
bf819fc to 1392cb6
The zcopy notification mechanism (error queue) adds an EPOLLERR event to the respective epfd_info object, and the event is never removed. As a result, epoll_wait() returns the EPOLLERR event endlessly and never enters the polling loops. Fix this by removing the EPOLLERR event when the socket is no longer "errorable". The fix avoids spurious EPOLLERR events and allows epoll_wait_helper() to perform polling.
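The fix amounts to recomputing EPOLLERR from the socket's state instead of only ever setting it. The structure and `update_epollerr()` below are hypothetical stand-ins for the epfd_info bookkeeping; only EPOLLIN and EPOLLERR are real constants.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/epoll.h>

/* Hypothetical per-socket view of the events held in an epoll set. */
struct sketch_epoll_entry {
    uint32_t ready_events;
};

/* Re-evaluate EPOLLERR whenever the error queue changes. It is set while
 * notifications are pending and cleared once the socket is no longer
 * "errorable", so epoll_wait() stops reporting it and can poll again. */
void update_epollerr(struct sketch_epoll_entry *e, bool errorable)
{
    if (errorable)
        e->ready_events |= EPOLLERR;
    else
        e->ready_events &= ~(uint32_t)EPOLLERR;
}
```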
In a retransmit scenario, duplicate ids can arrive in the zcopy callback. In this case ee_data is rewritten with a value that may be lower than the previous one, which leads to missed notifications. As a workaround, don't overwrite ee_data with a lower value.
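The workaround can be sketched as extending a pending [lo, hi] range only forward. The structure and helpers are hypothetical. The wraparound-aware comparison is an assumption; the commit only says "lower", but zcopy ids are 32-bit counters that can wrap.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pending notification: ids [lo, hi] whose buffers are free,
 * later reported to the application as ee_info/ee_data. */
struct pending_zcopy {
    bool valid;
    uint32_t lo;
    uint32_t hi;
};

/* True when a comes after b in 32-bit wrapping sequence space. */
static bool id_after(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) > 0;
}

/* Zcopy completion callback: record `id`. A retransmission may report an
 * id again, possibly below the current upper bound; hi never moves back. */
void zcopy_complete(struct pending_zcopy *p, uint32_t id)
{
    if (!p->valid) {
        p->valid = true;
        p->lo = p->hi = id;
    } else if (id_after(id, p->hi)) {
        p->hi = id;
    }
}
```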
A control message should be filled in only if the user passes a buffer for it. An error queue request must be processed before data. Signed-off-by: Igor Ivanov <[email protected]>
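A user-space emulation of error-queue delivery might fill the control buffer as sketched below. It writes nothing when the caller supplied no control buffer and reports MSG_CTRUNC when the buffer is too small. The caller would run it before copying any payload. `fill_errqueue_cmsg()` is hypothetical. The cmsg level/type follow the kernel's IPv4 convention (SOL_IP, numerically IPPROTO_IP, with IP_RECVERR), but unlike the kernel, this sketch omits the offender address that follows `struct sock_extended_err`.

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <time.h>           /* struct timespec, used by linux/errqueue.h */
#include <linux/errqueue.h>

#ifndef IP_RECVERR
#define IP_RECVERR 11
#endif

/* Returns 1 if a control message was written, 0 if the caller passed no
 * control buffer, and -1 (with MSG_CTRUNC set) if the buffer is too small. */
int fill_errqueue_cmsg(struct msghdr *msg, const struct sock_extended_err *ee)
{
    if (msg->msg_control == NULL || msg->msg_controllen == 0)
        return 0;
    if (msg->msg_controllen < CMSG_SPACE(sizeof(*ee))) {
        msg->msg_flags |= MSG_CTRUNC;
        return -1;
    }
    struct cmsghdr *cm = CMSG_FIRSTHDR(msg);
    cm->cmsg_level = IPPROTO_IP; /* SOL_IP; IPv6 sockets use SOL_IPV6 */
    cm->cmsg_type = IP_RECVERR;  /* IPv6 sockets use IPV6_RECVERR */
    cm->cmsg_len = CMSG_LEN(sizeof(*ee));
    memcpy(CMSG_DATA(cm), ee, sizeof(*ee));
    msg->msg_controllen = CMSG_SPACE(sizeof(*ee)); /* bytes actually used */
    return 1;
}
```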
b9e1276 to 07a71ec
Can one of the admins verify this patch?