-
On Ubuntu with a 5.19 kernel, I'm testing an app I've converted from epoll to io_uring. This is actually just a test app for measuring how much capacity is available for my application type. The test app does UDP I/O and simulates sending/receiving many RTP media streams. There is one socket per stream being sent/received, and one thread and ring per logical CPU. The media streams are distributed evenly among the rings. Each media stream mimics a G.711 RTP stream with 172-byte UDP payloads sent 20 times a second, spaced as evenly as possible.

Using epoll on this particular hardware, I can manage about 8000 simultaneous streams, which is 160,000 packets/s sent and 160,000 packets/s received. My io_uring implementation can barely reach 1000 streams before weird things start happening. In particular, it runs for a short time and then, on all my ring threads, io_uring_wait_cqe_timeout() blocks for seconds (usually in the range of 2-5 seconds, but I've seen as high as 20s), even though I'm passing a hard-coded 1ms timeout. Also, when it returns from this very long wait, the return value is zero. I should add that this is on a physical machine (Intel i7-9700K with an Intel desktop NIC).

On another system, which happens to be a Hyper-V VM running Ubuntu with kernel 6.1, I see better behavior: I can get up to about 4000 streams before io_uring_wait_cqe_timeout() starts taking too long to reliably send the packets on time (in the range of 300ms when called with a 1ms timeout).

That's a lot of info, and I'm sure I still have issues with my implementation, which I'm not looking for help with. I'm just wondering if anyone has hints on where to look with regard to io_uring_wait_cqe_timeout() blocking for so long?
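For reference, a minimal sketch of the wait in question, using liburing; a return of 0 means a CQE was actually produced, while a pure timeout returns -ETIME. The helper name and error handling are illustrative, not taken from the test app:

```c
#include <errno.h>
#include <liburing.h>

/* Wait at most 1ms for a completion, as described above.
 * io_uring_wait_cqe_timeout() returns 0 when a CQE is available
 * and -ETIME when the wait expired with nothing completed. */
static void wait_one(struct io_uring *ring)
{
    struct io_uring_cqe *cqe;
    struct __kernel_timespec ts = { .tv_sec = 0, .tv_nsec = 1000000 }; /* 1ms */

    int ret = io_uring_wait_cqe_timeout(ring, &cqe, &ts);
    if (ret == -ETIME)
        return;             /* timed out, nothing completed */
    if (ret < 0)
        return;             /* some other error, e.g. -EINTR */

    /* process cqe->res / cqe->user_data here, then mark it seen */
    io_uring_cqe_seen(ring, cqe);
}
```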
-
Did you try getting a perf report? One taken during that pause might show the reason for the hiccup.
-
Make sure your implementation honors the overflow bit. If the completion queue had an overflow, you have to "manually" get it to read from the overflow area first before normal operation will proceed.
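A minimal sketch of such an overflow check with liburing, assuming liburing 2.3+ for io_uring_cq_has_overflow() and io_uring_get_events(); the drain_cqes helper is illustrative, not from the software discussed here:

```c
#include <liburing.h>

/* Drain the completion queue, flushing overflowed CQEs first.
 * io_uring_cq_has_overflow() checks the IORING_SQ_CQ_OVERFLOW flag;
 * io_uring_get_events() enters the kernel so that CQEs buffered on
 * the overflow list get flushed back into the CQ ring. */
static void drain_cqes(struct io_uring *ring)
{
    struct io_uring_cqe *cqe;
    unsigned head, seen = 0;

    if (io_uring_cq_has_overflow(ring))
        io_uring_get_events(ring);

    io_uring_for_each_cqe(ring, head, cqe) {
        /* handle cqe->res / cqe->user_data here */
        seen++;
    }
    io_uring_cq_advance(ring, seen);
}
```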
-
Since it sounds like it's stuck, it might also be worth doing:

cat /proc/$pid/stack

where $pid is the pid of the task. And you can capture the ring state by catting fdinfo/$fd in that same /proc/$pid/ directory, with $fd being the fd value of the ring fd, which might also provide some hints. This isn't normal, and I'm curious what's going on here.
-
And having a reproducer would also be great.
-
Where is the overflow bit being handled in that msim.cpp file? Grep didn't find it for me. If you aren't handling the overflow case, that is still the first place I would look for the cause. Sorry if my armchair quarterbacking is just noise here.
-
Checking and handling the CQ overflow bit was the critical piece for some software we worked on months ago that exhibited choppy throughput. The overflow bit is described in the man pages; I get 30 hits in liburing/man/*, though of course some of those are symlinked duplicate files. The man pages probably describe this better than I can.
-
So, to summarize: the problems I was seeing on 5.19, where io_uring_wait_cqe_timeout() could block for multiple seconds with a 1ms timeout, are all gone in 6.1.39. I'm running the same build of my original test app that had the problems on 5.19 under 6.1.39, and I'm not seeing any extended blocking in the wait; performance is close to epoll, even when not using multishot receives. So apparently there were some internal improvements between 5.19 and 6.1.
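For readers who haven't used them, a minimal sketch of arming a multishot receive, assuming kernel 6.0+, liburing 2.4+ for io_uring_setup_buf_ring(), and an illustrative buffer group id of 0; this is not code from the test app:

```c
#include <liburing.h>
#include <stdlib.h>

/* Arm a single multishot receive on sockfd. One SQE keeps posting a
 * CQE per incoming datagram until it is cancelled or the buffer group
 * runs dry. Buffers come from a provided-buffer ring (group id 0). */
static int arm_multishot_recv(struct io_uring *ring, int sockfd)
{
    int ret;
    /* 256 buffers of 2KB each, registered as buffer group 0 */
    struct io_uring_buf_ring *br =
        io_uring_setup_buf_ring(ring, 256, 0, 0, &ret);
    if (!br)
        return ret;

    for (int i = 0; i < 256; i++) {
        void *buf = malloc(2048);
        io_uring_buf_ring_add(br, buf, 2048, i,
                              io_uring_buf_ring_mask(256), i);
    }
    io_uring_buf_ring_advance(br, 256);

    struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
    io_uring_prep_recv_multishot(sqe, sockfd, NULL, 0, 0);
    sqe->flags |= IOSQE_BUFFER_SELECT;  /* pick buffers from group 0 */
    sqe->buf_group = 0;
    return io_uring_submit(ring);
}
```

A request like this keeps producing one CQE per datagram (each CQE has IORING_CQE_F_MORE set while the request remains armed), which is what removes the per-packet resubmission from the receive path.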