dns server does not behave well with DNS size limits #6342
Comments
In terms of urgency: I think we have a workaround that will allow us to land #5912 and there may be no immediate problem, but I think there's considerable risk in doing nothing right now. For one, I'm not convinced the workaround isn't working by accident, in which case it could break again in a new update. (The workaround is to enable EDNS on the client but we're not actually using EDNS! I think it's just causing hickory-dns to process the full packet instead of ignoring it.) For another, it wouldn't necessarily take a software update to trigger this again. Any change that causes some query to produce more DNS records than it did before (e.g., adding a sled and putting another Nexus on it or something like that) could cause some client to stop working. (Even if we're talking about the same client, there may be another threshold slightly higher than we've hit so far that will trigger this again.) And when this sort of problem comes up, the failure mode is as we saw in #5912, which is sudden, silent failure of all DNS queries. So I think we should probably do this sooner rather than later.
In terms of implementation: right now, our DNS server is quite low-level: parsing packets directly and producing responses using low-level packet builders. I expect those interfaces probably do support what we need. But I see that hickory-dns has a much higher level
If there is a higher-level interface that provides more inbuilt functionality and looks like it satisfies our needs, I'd say give it a go. IIRC, when I was first working with TrustDNS, the higher-level interfaces were a little too tied at the hip to the standalone daemon for what I was trying to accomplish, which is why I went for the lower-level ones.
from a quick look, i think
at the bare minimum, after looking at this, i'm strongly in favor of
which it seems we might have to do ourselves anyway even using on that topic, part of why we've been happily emitting oversized DNS responses might be that the underlying buffer in this also gives us a nice way to plumb the EDNS-advertised maximum size through, if we really need. i'm also uncomfortable with the thought that our DNS client code was getting responses that were somehow unusable and not even warning pretty loudly about it. the change in hickory-dns that made this apparent is mostly interesting because it resulted in this client behavior is clearly a different issue, and i'm not immediately sure how to improve the client side here, so i'll probably digest this part into a different issue tomorrow... edit: looked further into client behavior and wrote it up in #6415. while i think this is more concerning from a debuggability standpoint, i also think this is where we'd need to work with hickory-dns more actively for the right API.
after finding the relevant code in more mechanically, for this workaround to stop working, this buffer size calculation in
after putting together #6415 we should not do this until we support queries over TCP.
strong agree
yes
big fan. alongside all this, knowing that we're sending too-large answers (there's still a 64KiB size limit for TCP answers) seems really important. EDNS pushes that problem down the road another kb or two, TCP is probably Good Enough Forever, but still...
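For reference, the TCP ceiling mentioned above comes from the two-byte length prefix that precedes every DNS message on a TCP stream (RFC 1035, section 4.2.2), so the largest message is 65535 bytes. A minimal sketch of detecting a too-large answer at framing time follows. `frame_for_tcp` and `TooLarge` are hypothetical names for illustration, not part of our server or of hickory-dns.

```rust
/// Hypothetical error type: the encoded message does not fit in a TCP frame.
#[derive(Debug, PartialEq, Eq)]
pub struct TooLarge {
    /// Length of the message that was rejected, in bytes.
    pub len: usize,
}

/// Prefix an already-encoded DNS message with its two-byte, big-endian length,
/// as DNS over TCP requires. Messages longer than 65535 bytes cannot be
/// represented, so they are reported instead of being silently mangled.
pub fn frame_for_tcp(message: &[u8]) -> Result<Vec<u8>, TooLarge> {
    if message.len() > u16::MAX as usize {
        return Err(TooLarge { len: message.len() });
    }
    let len = message.len() as u16;

    let mut framed = Vec::with_capacity(2 + message.len());
    framed.extend_from_slice(&len.to_be_bytes());
    framed.extend_from_slice(message);
    Ok(framed)
}
```

Returning an error here (rather than truncating quietly) would give the server a natural place to log or count oversized answers.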
This issue is deliberately a little vague, but what I mean is:
More research needed to confirm but I think we should probably have the DNS server:
I gather there's some max size negotiation that can happen, at least with EDNS (not sure about non-EDNS), and we should presumably honor that.
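As a rough sketch of what honoring the size limit could look like: plain UDP is limited to 512 bytes (RFC 1035), an EDNS client advertises its UDP payload size in the OPT record (RFC 6891, which also says values below 512 are treated as 512), and TCP is bounded by its 16-bit length prefix. All names below (`Transport`, `max_response_size`, `SendDecision`, `decide`) are hypothetical and only illustrate the decision; they are not existing server or hickory-dns APIs.

```rust
/// Transport the query arrived on.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Transport {
    Udp,
    Tcp,
}

/// UDP limit for clients that did not send an EDNS OPT record (RFC 1035).
pub const PLAIN_UDP_MAX: usize = 512;
/// TCP messages carry a 16-bit length prefix, so this is the hard ceiling.
pub const TCP_MAX: usize = u16::MAX as usize;

/// Largest response we may send for this query.
/// `edns_udp_size` is the payload size from the client's OPT record, if any.
pub fn max_response_size(transport: Transport, edns_udp_size: Option<u16>) -> usize {
    match (transport, edns_udp_size) {
        (Transport::Tcp, _) => TCP_MAX,
        // RFC 6891: advertised sizes below 512 are treated as 512.
        (Transport::Udp, Some(size)) => (size as usize).max(PLAIN_UDP_MAX),
        (Transport::Udp, None) => PLAIN_UDP_MAX,
    }
}

/// What to do with a response whose encoded form is `encoded_len` bytes.
#[derive(Debug, PartialEq, Eq)]
pub enum SendDecision {
    /// The response fits; send it unchanged.
    SendAsIs,
    /// The response is too big: drop records until it fits within `limit`
    /// and set the TC (truncated) bit so the client knows to retry.
    Truncate { limit: usize },
}

pub fn decide(
    encoded_len: usize,
    transport: Transport,
    edns_udp_size: Option<u16>,
) -> SendDecision {
    let limit = max_response_size(transport, edns_udp_size);
    if encoded_len <= limit {
        SendDecision::SendAsIs
    } else {
        SendDecision::Truncate { limit }
    }
}
```

For example, a 600-byte answer to a non-EDNS UDP query would be truncated to 512 bytes with TC set, while the same answer to a client advertising 4096 bytes would go out as-is.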
Once we've done that, we may want to configure our clients to always use EDNS or TCP instead of always trying without those and then retrying when they see the truncated response.