
dns server does not behave well with DNS size limits #6342

Open · davepacheco opened this issue Aug 15, 2024 · 5 comments

@davepacheco (Collaborator)

This issue is deliberately a little vague, but what I mean is:

  • The DNS server appears to happily send DNS responses larger than 512 bytes. I'm not sure whether this is actually wrong, but it seems to break hickory-dns. This appeared to be a cause of the problems in #5912 ("follow trust-dns to its new name: hickory") that have been rather painful to debug.
  • I don't see any code in the DNS server for supporting EDNS.
  • I don't see any code in the DNS server for supporting TCP.

More research is needed to confirm, but I think we should probably have the DNS server:

  • truncate non-EDNS, UDP messages at 512 bytes (and set the "truncated" bit)
  • support EDNS for larger response sizes
  • support TCP for larger response sizes

I gather there's some max size negotiation that can happen, at least with EDNS (not sure about non-EDNS), and we should presumably honor that.
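To make the truncation bullet concrete, here's a rough sketch of the behavior I have in mind, using hickory-proto's Message type. MAX_PLAIN_UDP is a name invented for this sketch (512 is just the RFC 1035 limit), and a real implementation would also echo the question section and response code:

```rust
// Rough sketch only: truncation policy for plain (non-EDNS) UDP responses.
use hickory_proto::op::{Message, MessageType};

const MAX_PLAIN_UDP: usize = 512; // RFC 1035 limit for plain UDP

fn udp_response_bytes(response: &Message) -> Vec<u8> {
    let bytes = response.to_vec().expect("response should encode");
    if bytes.len() <= MAX_PLAIN_UDP {
        return bytes;
    }

    // Too big for a plain UDP reply: send back an empty answer section with
    // TC=1 so the client knows to retry over TCP (or with EDNS). A real
    // implementation would also copy the question section and response code.
    let mut truncated = Message::new();
    truncated.set_id(response.id());
    truncated.set_message_type(MessageType::Response);
    truncated.set_truncated(true);
    truncated.to_vec().expect("empty response should encode")
}
```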

Once we've done that, we may want to configure our clients to always use EDNS or TCP instead of always trying without those and then retrying when they see the truncated response.

@davepacheco (Collaborator, Author)

In terms of urgency: I think we have a workaround that will allow us to land #5912, and there may be no immediate problem, but I think there's considerable risk in doing nothing right now. For one, I'm not convinced the workaround isn't working by accident, in which case it could break again in a future update. (The workaround is to enable EDNS on the client, but we're not actually using EDNS! I think it's just causing hickory-dns to process the full packet instead of ignoring it.) For another, it wouldn't necessarily take a software update to trigger this again: any change that causes some query to produce more DNS records than it did before (e.g., adding a sled and putting another Nexus on it) could cause some client to stop working. (Even for the same client, there may be another threshold slightly higher than the one we've hit so far that will trigger this again.)

And when this sort of problem comes up, the failure mode is what we saw in #5912: sudden, silent failure of all DNS queries. So I think we should probably do this sooner rather than later.

In terms of implementation: right now, our DNS server is quite low-level, parsing packets directly and producing responses using low-level packet builders. I expect those interfaces probably do support what we need. But I see that hickory-dns has a much higher-level ServerFuture interface where you can just register UDP or TCP sockets and plug in a request handler. That seems like a much nicer interface for what we're trying to do. @rcgoodfellow did you have thoughts on this?
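For illustration, my mental model of that interface is roughly the following. This is a sketch based on my reading of hickory-server's docs, not code against the version we pin, so the exact names and signatures may be off:

```rust
// Sketch of the higher-level hickory-server API (names unverified against
// the version we use). ServerFuture owns the sockets and the framing;
// we'd supply a RequestHandler wrapping our DNS data store.
use std::time::Duration;

use hickory_server::server::{
    Request, RequestHandler, ResponseHandler, ResponseInfo, ServerFuture,
};

struct OurHandler; // placeholder for whatever wraps our DNS data

#[async_trait::async_trait]
impl RequestHandler for OurHandler {
    async fn handle_request<R: ResponseHandler>(
        &self,
        _request: &Request,
        _response_handle: R,
    ) -> ResponseInfo {
        // Look up records and send them via the response handle; hickory
        // would handle the UDP-vs-TCP differences for us. Elided here.
        todo!()
    }
}

async fn serve() -> std::io::Result<()> {
    let mut server = ServerFuture::new(OurHandler);
    server.register_socket(tokio::net::UdpSocket::bind("[::1]:5353").await?);
    server.register_listener(
        tokio::net::TcpListener::bind("[::1]:5353").await?,
        Duration::from_secs(5), // per-connection TCP timeout
    );
    let _ = server.block_until_done().await;
    Ok(())
}
```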

@rcgoodfellow (Contributor)

> hickory-dns has a much higher-level ServerFuture interface where you can just register UDP or TCP sockets and plug in a request handler. That seems like a much nicer interface for what we're trying to do. @rcgoodfellow did you have thoughts on this?

If there is a higher-level interface that provides more inbuilt functionality and looks like it satisfies our needs, I'd say give it a go. IIRC, when I was first working with TrustDNS, the higher-level interfaces were a little too tied at the hip to the standalone daemon for what I was trying to accomplish, which is why I went for the lower-level ones.

@iximeow (Member)

iximeow commented Aug 22, 2024

from a quick look, i think ServerFuture can work for us, but we'd need a custom replacement for Catalog, which i suspect is the tied-at-the-hip interface that Ry might have seen?

ServerFuture does some nice things by default, like sanitize_src_address (oops, looks like we'd respond to forged requests from broadcast addresses ... if we bound on ipv4), but i actually don't see any logic here to handle over-sized responses!

at the bare minimum, after looking at this, i'm strongly in favor of

> truncate non-EDNS, UDP messages at 512 bytes (and set the "truncated" bit)

which it seems we might have to do ourselves anyway even using ServerFuture?

on that topic, part of why we've been happily emitting oversized DNS responses might be that the underlying buffer in BinEncoder defaults to u16::MAX as a bound here. we can set a lower bound, which i believe will cause us to fail to encode oversized messages, giving us a point to emit an empty response with the truncated bit set.

this also gives us a nice way to plumb the EDNS-advertised maximum size through, if we really need to.
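concretely, i'm imagining something like this (a sketch; i haven't checked it against the exact BinEncoder API in the version we build with):

```rust
// sketch: bound the encoder so oversized messages fail to encode, giving us
// the spot to fall back to an empty TC=1 response. assumes hickory-proto's
// BinEncoder::set_max_size is the knob that lowers the u16::MAX default.
use hickory_proto::op::Message;
use hickory_proto::serialize::binary::{BinEncodable, BinEncoder};

fn encode_bounded(msg: &Message, limit: u16) -> Option<Vec<u8>> {
    let mut buf = Vec::with_capacity(usize::from(limit));
    let mut encoder = BinEncoder::new(&mut buf);
    encoder.set_max_size(limit);
    // fails once the encoded message would exceed `limit`; the caller can
    // then send a truncated response instead.
    msg.emit(&mut encoder).ok()?;
    Some(buf)
}
```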


i'm also uncomfortable with the thought that our DNS client code was getting responses that were somehow unusable and not even warning pretty loudly about it.

the change in hickory-dns that made this apparent is mostly interesting because it resulted in this recv_buf being correctly sized to 512 bytes, rather than 4096. on deserialization error there's a warn!(), but it's a tracing::warn!(). i think that's why the warning gets lost: we use slog, so we're not registering a tracing subscriber, and i think the message goes to the aether. from a client's perspective, that just looks like losing the server's response.
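for what it's worth, the minimal patch for the lost warning is probably just registering some tracing subscriber at startup, sketched below. whether we'd actually want fmt-to-stderr running next to slog is a separate question:

```rust
// sketch: install a bare tracing subscriber so hickory's tracing::warn!()
// lands on stderr instead of going to the aether. a nicer fix would bridge
// tracing events into our slog logger, but that's more involved.
fn init_tracing_for_hickory() {
    tracing_subscriber::fmt()
        .with_max_level(tracing::Level::WARN)
        .init();
}
```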

client behavior is clearly a different issue, and i'm not immediately sure how to improve the client side here, so i'll probably digest this part into a different issue tomorrow...

edit: looked further into client behavior and wrote it up in #6415. while i think this is more concerning from a debuggability standpoint, i also think this is where we'd need to work with hickory-dns more actively for the right API.

@iximeow (Member)

iximeow commented Aug 22, 2024

> The workaround is to enable EDNS on the client, but we're not actually using EDNS! I think it's just causing hickory-dns to process the full packet instead of ignoring it.

after finding the relevant code in hickory-proto i'm somewhat less concerned about an immediate risk, but this is obviously still worth fixing; my understanding is that the EDNS-advertised size specifically refers to the largest message that can be directed to the sender. so the workaround has clients report that they can accept large UDP messages, but that does not imply that the server should report that it supports large UDP messages. it seems like it is primarily useful for a server to report accurate EDNS support/size to hint to clients that it could support larger queries, if a client needed to send one.

more mechanically, for this workaround to stop working, this buffer size calculation in hickory-proto would have to become more complicated. it is sized to match the client's advertised max size, so for the response's size to influence it, hickory-proto would have to receive the message, parse it, decide to shrink it down, and then do something unfortunate with the data.
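mechanically, the workaround amounts to roughly this on the client side (a sketch; set_max_payload is the hickory-proto Edns setter as i remember it):

```rust
// sketch: what the client-side workaround boils down to: advertise via
// EDNS that *we* can receive large UDP messages. note this says nothing
// about what the server can receive.
use hickory_proto::op::{Edns, Message};

fn advertise_udp_size(query: &mut Message, size: u16) {
    let mut edns = Edns::new();
    edns.set_max_payload(size); // "you may send me UDP messages up to `size`"
    query.set_edns(edns);
}
```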

> truncate non-EDNS, UDP messages at 512 bytes (and set the "truncated" bit)

after putting together #6415, i don't think we should do this until we support queries over TCP. internal-dns can't currently tell that it's getting truncated responses, so once we start setting the truncated bit, we'd just boldly proceed with incomplete answers after failing a TCP fallback.

> support EDNS for larger response sizes

strong agree

> support TCP for larger response sizes

yes

> we may want to configure our clients to always use EDNS or TCP

big fan.

alongside all this, knowing when we're sending too-large answers (there's still a 64KiB size limit for TCP answers) seems really important. EDNS pushes that problem down the road another KiB or two, TCP is probably Good Enough Forever, but still...
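for reference, the 64KiB ceiling falls directly out of the two-byte length prefix that RFC 1035 puts on TCP messages; a pure-std illustration:

```rust
// sketch: DNS-over-TCP prefixes every message with a two-byte length, so a
// message over u16::MAX bytes simply cannot be framed.
fn frame_tcp_message(msg: &[u8]) -> Option<Vec<u8>> {
    let len = u16::try_from(msg.len()).ok()?; // None if the message exceeds 64 KiB
    let mut framed = Vec::with_capacity(2 + msg.len());
    framed.extend_from_slice(&len.to_be_bytes());
    framed.extend_from_slice(msg);
    Some(framed)
}
```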
