[nexus] Allow silo admins to upload new certs #4669

Merged Dec 12, 2023 (3 commits)

jgallagher (Contributor) commented Dec 11, 2023

The additional cert validation added in #4100 broke the ability for silo admins to upload new certs, because it introduced a call to fetch the rack DNS configuration (in order to assemble the FQDNs for the silo to check that the cert is valid for them). This PR fixes that by using an elevated internal privilege for that DNS config lookup.

Fixes #4532.

TODO before merging (done):

  • Confirm fix on madrid (the integration test makes me think this is probably not necessary, but I'd rather be sure). Done: tested on madrid; was able to use the CLI oxide certificate create command while logged in as a silo admin, which 403'd prior to this PR.
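
To make the description above concrete, here is a minimal sketch of the kind of check that needs the rack DNS configuration: assemble the silo's FQDNs from the external DNS zone names, then confirm the certificate's DNS names cover them. Everything here is hypothetical and not taken from the Nexus source: the `<silo>.sys.<zone>` layout, the function names, the error type, and the choice to require a match for every FQDN (the real validator may accept a match on any one) are all assumptions for illustration.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum CertError {
    NoMatchingName { fqdn: String },
}

/// Build one FQDN per external DNS zone for the given silo.
/// ASSUMPTION: the `<silo>.sys.<zone>` layout is illustrative only.
fn silo_fqdns(silo_name: &str, dns_zones: &[&str]) -> Vec<String> {
    dns_zones
        .iter()
        .map(|zone| format!("{silo_name}.sys.{zone}"))
        .collect()
}

/// Case-insensitive match of a certificate DNS name against a host name.
/// A leading `*.` matches exactly one non-empty leftmost label.
fn san_matches(pattern: &str, host: &str) -> bool {
    let pattern = pattern.to_ascii_lowercase();
    let host = host.to_ascii_lowercase();
    match pattern.strip_prefix("*.") {
        Some(suffix) => match host.split_once('.') {
            Some((label, rest)) => !label.is_empty() && rest == suffix,
            None => false,
        },
        None => pattern == host,
    }
}

/// ASSUMPTION: every FQDN must be covered by at least one certificate name.
fn validate_cert_names(cert_sans: &[&str], fqdns: &[String]) -> Result<(), CertError> {
    for fqdn in fqdns {
        if !cert_sans.iter().any(|san| san_matches(san, fqdn)) {
            return Err(CertError::NoMatchingName { fqdn: fqdn.clone() });
        }
    }
    Ok(())
}
```

The zone names are the only input that has to come from the DNS configuration, which is why the lookup ends up on the certificate-upload path at all.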

Comment on lines 67 to 72
let dns_opctx = OpContext::for_background(
    opctx.log.clone(),
    Arc::clone(&self.authz),
    authn::Context::internal_service_balancer(),
    Arc::clone(self.datastore()),
);
Collaborator

I see that we've been a little inconsistent on how we create OpContexts that Nexus uses to do privileged things on behalf of unprivileged users.

  • For the "instance allocator" and "external authn" OpContexts, we create them once at startup and store them into Nexus. Callers can access the former directly as a field. They can access the latter that way or using self.opctx_external_authn().
  • For the service balancer and internal API OpContexts, we have similar methods self.opctx_for_service_balancer() and self.opctx_for_internal_api(), but these construct the OpContext anew each time rather than just doing that once on startup.
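
The two approaches in the list above can be sketched roughly as follows. This is a simplified model, not the Nexus code: `OpCtx`, `Server`, `opctx_stored`, and `opctx_fresh` are hypothetical names, and the real `OpContext` carries far more than an identity and a log label.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for an operation context: who is acting,
/// plus the label attached to its log output.
#[derive(Debug, Clone, PartialEq)]
struct OpCtx {
    identity: &'static str,
    log_label: String,
}

/// Hypothetical stand-in for the long-lived server object.
struct Server {
    /// Approach 1: built once at startup and shared afterwards.
    opctx_alloc: Arc<OpCtx>,
}

impl Server {
    fn new() -> Self {
        Server {
            opctx_alloc: Arc::new(OpCtx {
                identity: "internal-read",
                log_label: "InstanceAllocator".to_string(),
            }),
        }
    }

    /// Approach 1: hand out the context that was created at startup.
    /// Cheap, but the log label is fixed for every caller.
    fn opctx_stored(&self) -> Arc<OpCtx> {
        Arc::clone(&self.opctx_alloc)
    }

    /// Approach 2: construct a fresh context on every call.
    /// Costs an allocation, but can carry the current request's log label.
    fn opctx_fresh(&self, request_log: &str) -> OpCtx {
        OpCtx {
            identity: "service-balancer",
            log_label: request_log.to_string(),
        }
    }
}
```

In this model the stored context always logs as "InstanceAllocator", while the fresh one can reuse the per-request label, which mirrors the tradeoff raised in the surrounding comment.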

I generally prefer the first approach but clearly it's common to do the latter too. I was going to suggest not doing this logic inline here, creating a helper instead, although maybe you did this on purpose to preserve the existing request log (in opctx.log)?

There's also the question of whether this is the right OpContext. This definitely isn't the service balancer...but neither is the other user of this OpContext that I found. In a sense, it really doesn't matter. These internal identities only exist to limit the privileges of code paths. You could create a new identity here that can only read DNS and use that. But it might be overkill to do that everywhere we need to use this pattern.

I see that we have an "internal-read" identity that only gets "fleet viewer", not "fleet admin". I also see that we already have Nexus.opctx_alloc using that identity. You could use that here and it'd be a little safer (fewer privileges) and also would avoid constructing a new OpContext inline with the request. The only downside is that the label attached to the log for that one is "InstanceAllocator", which would be a little misleading. You could also just go ahead and construct a new OpContext here using authn::Context::internal_read() instead, or add a helper to do it.

I don't have a strong feeling about all these options. I would at least use the internal_read() identity because there's not much work and I don't think there's any downside. As for whether to create a new identity, whether to create the OpContext inline or use a helper or use the existing one, these all have tradeoffs and one could go either way.

Contributor Author

Summarizing an offline chat:

  • We can't use "internal-read", because the DNS config policy only grants read to "fleet admin" (which "internal-read" isn't).
  • Per the comments on DnsConfig, Nexus uses the "external-authenticator" role when creating silos, so it seems reasonable to use it here as well.

Changed in fa74a7a.
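
For readers following the thread, the outcome can be modeled with a small sketch. It is a toy model under stated assumptions, not the real authz policy: the `Actor` flags, the function names, and the example zone are hypothetical, and the rule that the external-authenticator role may read DNS config is inferred from the summary above rather than copied from the policy file.

```rust
/// Hypothetical actor model; the real authz system is richer than two flags.
#[derive(Debug, Clone, Copy)]
struct Actor {
    fleet_admin: bool,
    external_authenticator: bool,
}

/// A silo admin has no fleet-level role.
const SILO_ADMIN: Actor = Actor { fleet_admin: false, external_authenticator: false };
/// "internal-read" is only a fleet viewer, so it is not a fleet admin.
const INTERNAL_READ: Actor = Actor { fleet_admin: false, external_authenticator: false };
/// The identity the thread settled on for the DNS lookup.
const EXTERNAL_AUTHN: Actor = Actor { fleet_admin: false, external_authenticator: true };

#[derive(Debug, PartialEq)]
struct Forbidden;

/// ASSUMPTION: DNS config is readable by fleet admins and by the
/// external-authenticator role, and by nobody else.
fn read_dns_zones(actor: Actor) -> Result<Vec<String>, Forbidden> {
    if actor.fleet_admin || actor.external_authenticator {
        Ok(vec!["oxide.example".to_string()])
    } else {
        Err(Forbidden)
    }
}

/// Before the fix: the lookup runs with the caller's own identity.
fn cert_create_with_caller_identity(caller: Actor) -> Result<usize, Forbidden> {
    let zones = read_dns_zones(caller)?;
    Ok(zones.len())
}

/// After the fix: only the DNS lookup uses the internal identity.
/// Every other check would still use `_caller` (omitted in this sketch).
fn cert_create_with_internal_lookup(_caller: Actor) -> Result<usize, Forbidden> {
    let zones = read_dns_zones(EXTERNAL_AUTHN)?;
    Ok(zones.len())
}
```

In this model a silo admin is rejected on the old path and succeeds on the new one, and `INTERNAL_READ` is rejected by `read_dns_zones` for the same reason given in the first bullet.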

@jgallagher jgallagher enabled auto-merge (squash) December 12, 2023 21:09
@jgallagher jgallagher merged commit 343835c into main Dec 12, 2023
20 checks passed
@jgallagher jgallagher deleted the silo-admin-create-certs branch December 12, 2023 22:15
Successfully merging this pull request may close these issues.

silo admin cannot manage TLS certificates (but is supposed to be able to)