-
On Ubuntu with a 5.19 kernel, I'm testing an app I've converted from epoll to io_uring. This is actually just a test app for measuring how much capacity is available for my application type. The test app does UDP I/O and simulates sending/receiving many RTP media streams. There is one socket per stream being sent/received, and one thread and ring per logical CPU. The media streams are distributed evenly among the rings. Each media stream mimics a G.711 RTP stream with 172-byte UDP payloads sent 20 times a second, spaced as evenly as possible.

Using epoll on this particular hardware, I can manage about 8000 simultaneous streams, which is 160,000 packets/s sent and 160,000 packets/s received. My io_uring implementation can barely reach 1000 streams before weird things start happening. In particular, it runs for a short time and then, on all my ring threads, io_uring_wait_cqe_timeout() blocks for seconds (usually in the range of 2-5 seconds, but I've seen as high as 20s), even though I'm passing a hard-coded 1ms timeout. Also, when it returns from this very long wait, the return value is zero. I should add that this is on a physical machine (Intel i7-9700K with an Intel desktop NIC).

On another system, which happens to be a Hyper-V VM running Ubuntu with kernel 6.1, I see better behavior: I can get up to about 4000 streams before io_uring_wait_cqe_timeout() starts taking too long to reliably send the packets on time (in the range of 300ms when called with a 1ms timeout).

That's a lot of info, and I'm sure I still have issues with my implementation, which I'm not looking for help with. I'm just wondering if anyone has hints on where to look with regard to io_uring_wait_cqe_timeout() blocking for so long?
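For reference, a minimal sketch of the wait in question, using liburing; a return of 0 means a CQE was actually produced, while a pure timeout returns -ETIME. The helper name and error handling are illustrative, not taken from the test app:

```c
#include <errno.h>
#include <liburing.h>

/* Wait at most 1ms for a completion, as described above.
 * io_uring_wait_cqe_timeout() returns 0 when a CQE is available
 * and -ETIME when the wait expired with nothing completed. */
static void wait_one(struct io_uring *ring)
{
    struct io_uring_cqe *cqe;
    struct __kernel_timespec ts = { .tv_sec = 0, .tv_nsec = 1000000 }; /* 1ms */

    int ret = io_uring_wait_cqe_timeout(ring, &cqe, &ts);
    if (ret == -ETIME)
        return;             /* timed out, nothing completed */
    if (ret < 0)
        return;             /* some other error, e.g. -EINTR */

    /* process cqe->res / cqe->user_data here, then mark it seen */
    io_uring_cqe_seen(ring, cqe);
}
```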
-
Did you try getting a perf report? One taken during that pause might show the reason for the hiccup.
-
Make sure your implementation honors the overflow bit. If the completion queue had an overflow, you have to "manually" get it to read from the overflow area first before normal operation will proceed.
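A minimal sketch of such an overflow check with liburing, assuming liburing 2.3+ for io_uring_cq_has_overflow() and io_uring_get_events(); the drain_cqes helper is illustrative, not from the software discussed here:

```c
#include <liburing.h>

/* Drain the completion queue, flushing overflowed CQEs first.
 * io_uring_cq_has_overflow() checks the IORING_SQ_CQ_OVERFLOW flag;
 * io_uring_get_events() enters the kernel so that CQEs buffered on
 * the overflow list get flushed back into the CQ ring. */
static void drain_cqes(struct io_uring *ring)
{
    struct io_uring_cqe *cqe;
    unsigned head, seen = 0;

    if (io_uring_cq_has_overflow(ring))
        io_uring_get_events(ring);

    io_uring_for_each_cqe(ring, head, cqe) {
        /* handle cqe->res / cqe->user_data here */
        seen++;
    }
    io_uring_cq_advance(ring, seen);
}
```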
-
Since it sounds like it's stuck, it might also be worth doing:

cat /proc/$pid/stack

where $pid is the pid of the task. And you can capture the ring state by catting fdinfo/$fd in that same /proc/$pid/ directory, with $fd being the fd value of the ring fd, which might also provide some hints. This isn't normal, and I'm curious what's going on here.
-
And having a reproducer would also be great.
-
Where is the overflow bit being handled in that msim.cpp file? Grep didn't find it for me. If you aren't handling the overflow case, that is still the first place I would look for the cause. Sorry if my armchair quarterbacking is just noise here.
-
Checking and handling the CQ overflow bit was the critical piece for some software we worked on months ago that exhibited choppy throughput. The overflow bit is described in the man pages; I get 30 hits in liburing/man/*, though of course some of those are symlinked duplicate files. The man pages probably describe this better than I can.
-
So, to summarize: the problems I was seeing on 5.19, where io_uring_wait_cqe_timeout() could block for multiple seconds with a 1ms timeout, are all gone in 6.1.39. I'm running the same build of my original test app that had the problems on 5.19 under 6.1.39, and I'm not seeing any extended blocking in the wait; performance is close to epoll, even when not using multishot receives. So apparently there were some internal improvements between 5.19 and 6.1.
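For readers who haven't used them, a minimal sketch of arming a multishot receive, assuming kernel 6.0+, liburing 2.4+ for io_uring_setup_buf_ring(), and an illustrative buffer group id of 0; this is not code from the test app:

```c
#include <liburing.h>
#include <stdlib.h>

/* Arm a single multishot receive on sockfd. One SQE keeps posting a
 * CQE per incoming datagram until it is cancelled or the buffer group
 * runs dry. Buffers come from a provided-buffer ring (group id 0). */
static int arm_multishot_recv(struct io_uring *ring, int sockfd)
{
    int ret;
    /* 256 buffers of 2KB each, registered as buffer group 0 */
    struct io_uring_buf_ring *br =
        io_uring_setup_buf_ring(ring, 256, 0, 0, &ret);
    if (!br)
        return ret;

    for (int i = 0; i < 256; i++) {
        void *buf = malloc(2048);
        io_uring_buf_ring_add(br, buf, 2048, i,
                              io_uring_buf_ring_mask(256), i);
    }
    io_uring_buf_ring_advance(br, 256);

    struct io_uring_sqe *sqe = io_uring_get_sqe(ring);
    io_uring_prep_recv_multishot(sqe, sockfd, NULL, 0, 0);
    sqe->flags |= IOSQE_BUFFER_SELECT;  /* pick buffers from group 0 */
    sqe->buf_group = 0;
    return io_uring_submit(ring);
}
```

A request like this keeps producing one CQE per datagram (each CQE has IORING_CQE_F_MORE set while the request remains armed), which is what removes the per-packet resubmission from the receive path.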