Confusing AXFR errors #1624
Comments
Thanks for the repro. I just ran that on my laptop without seeing any error. Was some record removed from the zone? |
Unclear if changes have been made to the zone or platform. I reported the issue to IIS who shared internally, but I've not heard back. As far as this DNS library is concerned, I'm trying to understand the errors in order to be more resilient to the behaviour of the host. |
[ Quoting ***@***.***> in "Re: [miekg/dns] Confusing AXFR erro..." ]
Unclear if changes have been made to the zone or platform. I reported the issue
to IIS who shared internally, but I've not heard back.
The symptoms remain for me. My pcaps indicate that the records restart at some
point (sometimes after only a few hundred RRs other times after a few
thousand), which results a TCP segment that is never received. Then it's just
DUP and retransmissions and eventually the connection fails. By restart, I mean
the SOA is re-sent and RRs begin again from the top of the zone.
I suspect this is related to some special behaviour of the zonedata.iis.se
host, particularly over high latency connections. I'm fetching from Australia.
I see the same symptoms in pcaps using dig, but the connection remains
resilient and eventually the axfr completes.
As far as this DNS library is concerned, I'm trying to understand the errors in
order to be more resilient to the behaviour of the host.
Happy to include any fixes needed here. But resent TCP segments are not something this lib
should see (I think), so that would hint at deeper golang problems?
(minimal tcpdump would help here I guess?)
Cheers!
|
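For what it's worth, the "restart" symptom described above (SOA re-sent, then RRs from the top of the zone again) can be detected on the client side. This is only a rough sketch and does not use this library's API: the `record` type, `collectZone` and `errRestart` are hypothetical names, and a real client would consume RRs from a stream rather than a slice. It treats the second SOA for the apex as the end of the transfer and reports an error if anything arrives after it.

```go
package main

import (
	"errors"
	"fmt"
)

// record is a hypothetical, simplified RR: only owner name and type.
type record struct {
	Name string
	Type string
}

// errRestart marks a stream that kept going after the closing SOA.
var errRestart = errors.New("axfr: records received after closing SOA")

// collectZone walks records in arrival order. An AXFR starts with the
// zone's SOA and ends with the same SOA, so anything after the second
// apex SOA suggests the server started the zone again.
func collectZone(stream []record) ([]record, error) {
	if len(stream) == 0 || stream[0].Type != "SOA" {
		return nil, errors.New("axfr: first record is not an SOA")
	}
	apex := stream[0].Name
	for i, rr := range stream[1:] {
		if rr.Type != "SOA" || rr.Name != apex {
			continue
		}
		closing := i + 1 // index of the closing SOA in stream
		if extra := len(stream) - (closing + 1); extra > 0 {
			return nil, fmt.Errorf("%w: %d extra records", errRestart, extra)
		}
		// Return the opening SOA plus the zone body, without the closing SOA.
		return stream[:closing], nil
	}
	return nil, errors.New("axfr: stream ended before closing SOA")
}

func main() {
	restarted := []record{
		{"se.", "SOA"}, {"a.se.", "NS"}, {"se.", "SOA"},
		{"a.se.", "NS"}, {"se.", "SOA"},
	}
	_, err := collectZone(restarted)
	fmt.Println(err)
}
```

Failing the whole transfer here, instead of keeping the duplicated RRs, is a design choice and not something the protocol mandates.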
se_cap_golang-filtered-01.pcap.gz I've attached a pcap starting a few records before the first malformed RR is seen. The library returns |
after loading it up in wireshark, after the TCP stuff, the first DNS seen according to ws is:
... and looking at ws this is indeed a completely broken packet. Then until the next red block, no valid DNS packets are seen. That would mean we could do better by erroring when seeing:
Which would be a better opcode check or something? |
what does this do?
diff --git a/msg.go b/msg.go
index 5fa7f9e8..ead1f5f4 100644
--- a/msg.go
+++ b/msg.go
@@ -885,6 +885,13 @@ func (dns *Msg) Unpack(msg []byte) (err error) {
 	}
 	dns.setHdr(dh)
+	if _, ok := OpcodeToString[dns.Opcode]; !ok {
+		return fmt.Errorf("bad opcode %d", dns.Opcode)
+	}
+	if _, ok := RcodeToString[dns.Rcode]; !ok {
+		return fmt.Errorf("bad rcode %d", dns.Rcode)
+	}
+
 	return dns.unpack(dh, msg, off)
 } |
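For anyone who wants to try the same sanity check on raw bytes from a pcap without patching the library, here is a standalone sketch. It is not the library's code: `checkHeader` and the two tables are hypothetical names, and the sets of "known" opcodes (QUERY, IQUERY, STATUS, NOTIFY, UPDATE) and rcodes (0 through 10) are my assumption of what to accept. Both fields are only 4 bits wide, so a garbled header will often still pass.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// knownOpcodes is an assumed allow-list of header opcodes.
var knownOpcodes = map[int]string{
	0: "QUERY", 1: "IQUERY", 2: "STATUS", 4: "NOTIFY", 5: "UPDATE",
}

// maxKnownRcode is an assumed upper bound (NOTZONE) for the 4-bit header rcode.
const maxKnownRcode = 10

// checkHeader inspects the 12-byte DNS header at the start of msg and
// rejects opcodes or rcodes outside the assumed allow-lists.
func checkHeader(msg []byte) error {
	if len(msg) < 12 {
		return fmt.Errorf("message of %d bytes is shorter than the 12-byte header", len(msg))
	}
	// Bytes 2-3 hold the flags: QR(1) Opcode(4) AA TC RD RA Z(3) RCODE(4).
	flags := binary.BigEndian.Uint16(msg[2:4])
	opcode := int(flags>>11) & 0xF
	rcode := int(flags & 0xF)
	if _, ok := knownOpcodes[opcode]; !ok {
		return fmt.Errorf("bad opcode %d", opcode)
	}
	if rcode > maxKnownRcode {
		return fmt.Errorf("bad rcode %d", rcode)
	}
	return nil
}

func main() {
	hdr := make([]byte, 12)
	hdr[2] = 0x80 | (3 << 3) // QR set, opcode 3 (not in the allow-list)
	fmt.Println(checkHeader(hdr))
}
```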
hmmm this will give you a saner error, but is not the root cause. This may give more hints:
|
That second error message provides more helpful information. The changes to the Unpack function did not result in any difference. The packets I received never triggered those additional error conditions in the Unpack function.
The message in |
what happens if you don't error here, but just return, i.e.
Obviously, the first two errors from your comment lead up to the last two, so it might make sense to instrument those parts well. I'll check and post another diff. I'm more confused that dig continues unfazed when looking at that tcp dump TBH |
Changing the code to
    if lenrd > len(msg) {
        return msg, nil
    }
resulted in these errors being returned:
I've done more testing with
AFAIK this is not behaviour that is part of the protocol, but is probably the result of server logic. The axfr will complete, although I'll have duplicate or triplicate RRs. So the connection stays resilient to malformed packets, but I don't end up with a clean version of the zone. IMO this is probably not something that makes sense to copy. But maybe this is a question for DNSOP as to what the correct client behaviour should be? My personal opinion is that the right thing to do would be to drop the connection and return a clear error. |
Restarting the axfr is indeed not specified, although you can add it to the code. I think godns is probably right in calling this transfer broken: some part is not being (re)sent, which messes up the (length) counters, and there is no mechanism to fix that in the DNS. It's hard to distill something sane, error-wise, from that situation. |
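To illustrate the point about the length counters: DNS over TCP prefixes each message with a two-byte big-endian length, so once bytes go missing every later prefix is read from the wrong offset. The sketch below is standalone and is not how this library reads from the socket; `readFrame` and `readFrames` are hypothetical names, and the "shorter than a header" check is just one cheap heuristic for spotting a stream that is out of sync.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// minDNSHeader is the fixed size of a DNS message header in bytes.
const minDNSHeader = 12

// readFrame reads one length-prefixed DNS message from a TCP-like stream.
// It returns io.EOF only when the stream ends cleanly between messages.
func readFrame(r io.Reader) ([]byte, error) {
	var prefix [2]byte
	if _, err := io.ReadFull(r, prefix[:]); err != nil {
		if err == io.EOF {
			return nil, io.EOF // no bytes of a new frame were read
		}
		return nil, fmt.Errorf("reading length prefix: %w", err)
	}
	n := int(binary.BigEndian.Uint16(prefix[:]))
	if n < minDNSHeader {
		return nil, fmt.Errorf("frame of %d bytes is shorter than a DNS header, stream is likely out of sync", n)
	}
	msg := make([]byte, n)
	if _, err := io.ReadFull(r, msg); err != nil {
		if err == io.EOF {
			err = io.ErrUnexpectedEOF // the stream ended inside a frame
		}
		return nil, fmt.Errorf("frame truncated, wanted %d bytes: %w", n, err)
	}
	return msg, nil
}

// readFrames is a small helper for experiments: it splits an in-memory
// byte stream into frames and stops at the first framing error.
func readFrames(stream []byte) ([][]byte, error) {
	r := bytes.NewReader(stream)
	var out [][]byte
	for {
		msg, err := readFrame(r)
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return out, err
		}
		out = append(out, msg)
	}
}

func main() {
	// Prefix claims 20 bytes but only 3 follow: reported as a truncated frame.
	_, err := readFrames([]byte{0, 20, 1, 2, 3})
	fmt.Println(err)
}
```

Nothing here can resynchronise the stream. The only safe reaction to a framing error is to close the connection and surface the error.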
Description
When attempting an xfr from zonedata.iis.se for se. using a non-TSIG-signed transfer, I see the following error consistently:
dns: overflowing header size
When attempting an xfr from zonedata.iis.se for nu. using a non-TSIG-signed transfer, I see the following two errors occasionally:
dns: overflow unpacking uint16
dns: buffer size too small
However, the transfer attempt for nu. will also work as expected about half of the time.
I'm using just the high-level APIs for a simple transfer. AXFR via
dig @zonedata.iis.se axfr se
works as expected.
Steps to reproduce
Expected Behaviour
All records received for zone file (~8106000 RRs), finishing with the SOA RR.
Actual Behaviour
For the SE zone I receive the error
dns: overflowing header size
at some point during the transfer. The number of records received before the error is not consistent.
For the NU zone I receive one of the following errors at some point during the transfer:
dns: overflow unpacking uint16
or
dns: buffer size too small
As with the SE zone, the number of records received before the error is not consistent. Approximately half of my transfer attempts succeed for the NU zone.