[nexus] Allow silo admins to upload new certs #4669
The additional cert validation added in #4100 broke the ability for silo admins to upload new certs, because it introduced a call to fetch the rack DNS configuration (in order to assemble the FQDNs for the silo to check that the cert is valid for them). This PR fixes that by using an elevated internal privilege for that DNS config lookup. Fixes #4532.
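The failure mode and the fix can be sketched with a small model. This is only an illustration under stated assumptions: `Actor`, `UploadError`, `read_dns_zone`, `upload_cert_before`, `upload_cert_after`, the zone name, and the FQDN layout are all hypothetical stand-ins, not the Nexus API.

```rust
// Illustrative model only: simplified stand-ins, not the actual Nexus types.

#[derive(Clone, Copy, Debug, PartialEq)]
enum Actor {
    SiloAdmin,       // the external user making the request
    InternalService, // a built-in identity used for privileged lookups
}

#[derive(Debug, PartialEq)]
enum UploadError {
    Forbidden,
    CertNotValidFor(String),
}

// Assumption: creating a cert is something a silo admin is allowed to do.
fn may_create_cert(actor: Actor) -> bool {
    actor == Actor::SiloAdmin
}

// Assumption: reading the DNS config needs a privilege silo admins lack.
fn read_dns_zone(actor: Actor) -> Result<String, UploadError> {
    match actor {
        Actor::InternalService => Ok("oxide.test".to_string()),
        Actor::SiloAdmin => Err(UploadError::Forbidden),
    }
}

// The cert must name the silo's FQDN (hypothetical layout: silo.sys.zone).
fn check_cert(fqdn: &str, cert_names: &[&str]) -> Result<(), UploadError> {
    if cert_names.iter().any(|name| *name == fqdn) {
        Ok(())
    } else {
        Err(UploadError::CertNotValidFor(fqdn.to_string()))
    }
}

// Before: the DNS lookup runs with the caller's own privileges and fails.
fn upload_cert_before(caller: Actor, silo: &str, cert_names: &[&str]) -> Result<(), UploadError> {
    if !may_create_cert(caller) {
        return Err(UploadError::Forbidden);
    }
    let zone = read_dns_zone(caller)?;
    check_cert(&format!("{}.sys.{}", silo, zone), cert_names)
}

// After: only the DNS lookup is elevated; the caller is still checked
// for the cert creation itself.
fn upload_cert_after(caller: Actor, silo: &str, cert_names: &[&str]) -> Result<(), UploadError> {
    if !may_create_cert(caller) {
        return Err(UploadError::Forbidden);
    }
    let zone = read_dns_zone(Actor::InternalService)?;
    check_cert(&format!("{}.sys.{}", silo, zone), cert_names)
}
```

In this model, `upload_cert_before(Actor::SiloAdmin, ...)` returns `Forbidden` regardless of the cert, while `upload_cert_after` reaches the validation step and rejects only certs that do not name the silo's FQDN.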
nexus/src/app/certificate.rs
    let dns_opctx = OpContext::for_background(
        opctx.log.clone(),
        Arc::clone(&self.authz),
        authn::Context::internal_service_balancer(),
        Arc::clone(self.datastore()),
    );
I see that we've been a little inconsistent on how we create OpContexts that Nexus uses to do privileged things on behalf of unprivileged users.
- For the "instance allocator" and "external authn" OpContexts, we create them once at startup and store them into Nexus. Callers can access the former directly as a field. They can access the latter that way or using self.opctx_external_authn().
- For the service balancer and internal API OpContexts, we have similar methods self.opctx_for_service_balancer() and self.opctx_for_internal_api(), but these construct the OpContext anew each time rather than just doing that once on startup.
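The two approaches in the list above can be contrasted in a rough sketch. Everything here is a hypothetical model: `App`, `Ctx`, `Identity`, and the `log_label` field (standing in for a per-request logger) are assumptions for illustration, not the real Nexus or OpContext definitions.

```rust
// Hypothetical model of the two ways to obtain a privileged context.

#[derive(Clone, Copy, Debug, PartialEq)]
enum Identity {
    ExternalAuthn,
    ServiceBalancer,
}

#[derive(Debug)]
struct Ctx {
    identity: Identity,
    // Stand-in for the logger a real context would carry.
    log_label: String,
}

struct App {
    // Approach 1: built once at startup and stored on the application.
    ctx_external_authn: Ctx,
}

impl App {
    fn new() -> Self {
        App {
            ctx_external_authn: Ctx {
                identity: Identity::ExternalAuthn,
                log_label: "startup".to_string(),
            },
        }
    }

    // Approach 1 accessor: hands out the same stored context every time.
    fn ctx_external_authn(&self) -> &Ctx {
        &self.ctx_external_authn
    }

    // Approach 2: constructs a fresh context per call. Taking the request's
    // log label as a parameter is an assumption made here to show how a
    // per-call context could keep request-specific logging.
    fn ctx_for_service_balancer(&self, request_log_label: &str) -> Ctx {
        Ctx {
            identity: Identity::ServiceBalancer,
            log_label: request_log_label.to_string(),
        }
    }
}
```

The stored context avoids repeated construction but carries whatever label it was given at startup; the per-call variant costs an allocation each time but can be tied to the request that triggered it.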
I generally prefer the first approach but clearly it's common to do the latter too. I was going to suggest not doing this logic inline here, creating a helper instead, although maybe you did this on purpose to preserve the existing request log (in opctx.log)?
There's also the question of whether this is the right OpContext. This definitely isn't the service balancer...but neither is the other user of this OpContext that I found. In a sense, it really doesn't matter. These internal identities only exist to limit the privileges of code paths. You could create a new identity here that can only read DNS and use that. But it might be overkill to do that everywhere we need to use this pattern.
I see that we have an "internal-read" identity that only gets "fleet viewer", not "fleet admin". I also see that we already have Nexus.opctx_alloc using that identity. You could use that here and it'd be a little safer (fewer privileges) and also would avoid constructing a new OpContext inline with the request. The only downside is that the label attached to the log for that one is "InstanceAllocator", which would be a little misleading. You could also just go ahead and construct a new OpContext here using authn::Context::internal_read() instead, or add a helper to do it.
I don't have a strong feeling about all these options. I would at least use the internal_read() identity because there's not much work and I don't think there's any downside. As for whether to create a new identity, whether to create the OpContext inline or use a helper or use the existing one, these all have tradeoffs and one could go either way.
Summarizing an offline chat:
- We can't use "internal-read", because the DNS config policy only grants read to "fleet admin" (which "internal-read" isn't).
- Per the comments on DnsConfig, Nexus uses the "external-authenticator" role when creating silos, so it seems reasonable to use it here as well.
Changed in fa74a7a.
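Why "internal-read" does not work here can be illustrated with a toy role check. This is a sketch under assumptions: the `Role` and `Identity` enums and `can_read_dns_config` are hypothetical, and the way the "external-authenticator" identity obtains DNS read access is modeled here as a direct grant, which the thread does not spell out.

```rust
// Toy role model; not the actual authorization policy.

#[derive(Clone, Copy, Debug, PartialEq)]
enum Role {
    FleetAdmin,
    FleetViewer,
    ExternalAuthenticator,
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum Identity {
    InternalRead,
    ExternalAuthenticator,
}

// Roles held by each built-in identity. "internal-read" gets fleet viewer
// only (per the thread); the other mapping is an assumption.
fn roles(identity: Identity) -> &'static [Role] {
    match identity {
        Identity::InternalRead => &[Role::FleetViewer],
        Identity::ExternalAuthenticator => &[Role::ExternalAuthenticator],
    }
}

// Assumed policy: DNS config read is not available to fleet viewers.
fn can_read_dns_config(identity: Identity) -> bool {
    roles(identity)
        .iter()
        .any(|role| matches!(role, Role::FleetAdmin | Role::ExternalAuthenticator))
}
```

Under this model, `can_read_dns_config(Identity::InternalRead)` is false because fleet viewer is not enough, while the external authenticator identity passes the check.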
TODO before merging (done): confirm fix on madrid (the integration test makes me think this is probably not necessary, but I'd rather be sure). Tested on madrid; was able to use the CLI oxide certificate create command while logged in as a silo admin, which 403'd prior to this PR.