Packet loss / ENOBUFs with kqueue(2) and tap(4) on OpenBSD #374
Comments
@mato I have tested the scenario described and I too see the dropped packets. Issue confirmed. I will set up a |
@adamsteen Thanks for confirming. Let me know how it goes with the UNIX
version, or if you have any questions on how to put that together.
|
@mato I have set something up so far (not working), test_net_2if; I need to have a look at where I went wrong with yield. (Just doing a little bit now and then as I have time.) |
@mato Using my very naive translation, I was able to reproduce this error with a UNIX program. |
@adamsteen I can also reproduce the behaviour on 6.5 with your UNIX version. So, I suggest asking on |
Here is the bug report sent to @bugs: https://marc.info/?l=openbsd-bugs&m=156229879107900&w=2 |
@adamsteen Thanks for filing that. Is there some way I can get myself added to a Cc: list on that bug? |
@mato I have Cc'd you on the bug report. |
@adamsteen Any progress on this? I've not seen any traffic from |
@mato Sorry for the late reply, I have had limited internet access for the last few weeks (holidays). Once I have caught up on things, I will ask on |
I have sent an email to |
I have been reading a little further in tap(4), and found this tidbit:
See the 4th paragraph of the Description in tap(4): https://man.openbsd.org/tap.4 Not sure exactly what that means or where to look, but I wanted to note it down. |
This issue is for the work in progress multiple device support in #373. While testing with the newly added "dual interface" tests/test_net_2if on OpenBSD, I've found what seems like a bug in the interaction between kqueue(2) and tap(4).

This is reproducible on OpenBSD 6.4 (under nested KVM) and OpenBSD 6.5 (on bare metal). This issue does not occur on FreeBSD, which uses identical calls to kqueue(2), or Linux, which uses epoll(2).

To reproduce:
service0 interface:
service1 interface:
service1, a packet is dropped by service0. This shows as a . in the first flood ping's output.
service1 and start a flood ping instead:
service1 interface:
ENOBUFS). Note that netstat -m shows a large (but not inordinately so) amount of mbufs used.

In an attempt to find the root cause of this issue, I've looked at the source to OpenBSD's if_tun.c and written a patch for solo5-hvt that dumps the IFQ_LEN() returned by the kqueue filter to userspace: kqueue-diag.patch.txt.
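
As a rough illustration of the quantity this patch logs, the toy model below treats an interface queue as a bounded FIFO whose current length is what a read filter would report. This is not OpenBSD code. The class name BoundedIfQueue, the helper fill_without_reader, the capacity of 255 (taken from the queue length mentioned in this issue, not from the kernel source), and the use of OSError with ENOBUFS for a full queue are all assumptions made for the sketch.

```python
import errno
from collections import deque


class BoundedIfQueue:
    """Toy stand-in for an interface queue; not the kernel's data structure."""

    def __init__(self, maxlen=255):
        # 255 is an assumption borrowed from the queue length seen in this issue.
        self.maxlen = maxlen
        self._pkts = deque()

    def __len__(self):
        # Plays the role of the queue length a read filter would report.
        return len(self._pkts)

    def enqueue(self, pkt):
        # Refuse new packets once the queue is full, mimicking an ENOBUFS error.
        if len(self._pkts) >= self.maxlen:
            raise OSError(errno.ENOBUFS, "queue full")
        self._pkts.append(pkt)

    def dequeue(self):
        # Return the oldest packet, or None if the queue is empty.
        return self._pkts.popleft() if self._pkts else None


def fill_without_reader(q, attempts):
    """Enqueue `attempts` packets with nobody draining; count the refusals."""
    refused = 0
    for i in range(attempts):
        try:
            q.enqueue(i)
        except OSError as e:
            if e.errno != errno.ENOBUFS:
                raise
            refused += 1
    return refused
```

In this model, a queue that is never drained grows by one per packet until it reaches its limit, after which every further enqueue fails, which is the general shape of the symptom described in this issue.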
Rebuilding the branch with this patch, and repeating up to step (7) above, note that for each ping sent to service1 (tap101, h=1), the queue length (q=) reported by the filter for service0 (tap100, h=0) increases by one and never decreases while the flood ping to service0 (tap100) is running. In fact, if we leave the normal ping from (6) running long enough for the queue length to reach 255, then we see that the flood ping (4) starts reporting ENOBUFS.

My hypothesis is that there is something wrong in the interaction between kqueue(2) and tap(4), perhaps a race condition that is causing events to be dropped? Of course there may also be an implementation error in the kqueue(2) code in #373, but given that FreeBSD does not have this problem, and neither does the functionally equivalent Linux epoll(2) code, this seems unlikely.

@adamsteen I'd appreciate it if you could confirm that you can reproduce this. Not sure what to do about it -- my idea would be to first take test_net_2if.c and rewrite it as a conventional UNIX program by cut-n-pasting from the hvt tender, to see if the problem still occurs. If it still does, then we should report that as a bug (with the non-Solo5 test case) to the OpenBSD tech@ mailing list.

In the mean time, I'll probably just disable the two-interface test for OpenBSD CI in #373.
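
For orientation, the general shape of a standalone two-descriptor event loop like the one suggested above could look like the sketch below. It is written in Python rather than cut from the hvt tender, and it uses UNIX datagram socket pairs as stand-ins for the two tap devices so that it runs anywhere, which means it does not exercise tap(4) at all. selectors.DefaultSelector picks kqueue on the BSDs and epoll on Linux. The function name run_two_fd_loop, the payloads, and the one-second timeout are made up for illustration.

```python
import selectors
import socket


def run_two_fd_loop(n0, n1):
    """Send n0 and n1 datagrams to two endpoints; return how many each received."""
    sel = selectors.DefaultSelector()  # kqueue on BSD, epoll on Linux
    # Socket pairs stand in for tap100/tap101 (an assumption for portability).
    tx0, rx0 = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    tx1, rx1 = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    received = {0: 0, 1: 0}
    sent = 0
    try:
        for handle, rx in ((0, rx0), (1, rx1)):
            rx.setblocking(False)
            sel.register(rx, selectors.EVENT_READ, data=handle)
        for i in range(max(n0, n1)):
            if i < n0:
                tx0.send(b"to-service0")
                sent += 1
            if i < n1:
                tx1.send(b"to-service1")
                sent += 1
            # Wait for readiness, then drain every ready descriptor completely.
            while sum(received.values()) < sent:
                events = sel.select(timeout=1.0)
                if not events:
                    break  # a stall here would be the symptom of interest
                for key, _mask in events:
                    while True:
                        try:
                            key.fileobj.recv(2048)
                        except BlockingIOError:
                            break  # nothing left on this descriptor
                        received[key.data] += 1
    finally:
        sel.close()
        for s in (tx0, rx0, tx1, rx1):
            s.close()
    return received
```

With socket pairs, run_two_fd_loop(5, 3) should account for every datagram on both handles. A real reproduction would open the tap devices instead and compare the per-handle counts against the number of pings sent.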