Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update external DNS during blueprint execution #5212

Merged
merged 40 commits into from
Mar 14, 2024

Conversation

davepacheco
Copy link
Collaborator

@davepacheco davepacheco commented Mar 7, 2024

Depends on #5239.

Fixes #5068 and #5220.

@davepacheco davepacheco self-assigned this Mar 7, 2024
@davepacheco davepacheco changed the base branch from main to dap/test-runner March 9, 2024 23:23
Base automatically changed from dap/test-runner to main March 10, 2024 03:20
@@ -122,7 +123,7 @@ impl super::Nexus {
&sled_rows,
&zpool_rows,
&ip_pool_range_rows,
NEXUS_REDUNDANCY,
NEXUS_REDUNDANCY + 1, // XXX-dap
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm planning to remove this before integration -- this is just for testing.

internal_dns_version INT8 NOT NULL,
-- identifies the latest external DNS version when blueprint planning began
-- XXX-dap migration code must set the value for existing blueprints
external_dns_version INT8 NOT NULL
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will obviously fix this before integration but wanted to save it for the last step. The plan will be to do something similar to internal_dns_version (schema version 36.0.0), setting the external DNS version to 1 for existing blueprints.

@davepacheco davepacheco requested a review from jgallagher March 12, 2024 00:20
@davepacheco davepacheco marked this pull request as ready for review March 12, 2024 00:20
@@ -1177,6 +1180,7 @@ mod tests {
let blueprint = BlueprintBuilder::build_initial_from_collection(
&collection,
Generation::new(),
Generation::new(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely not a blocker on this PR: This makes me wonder if we should have typed generations like we're getting typed UUIDs.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That does seem good.

let err = OptionalError::new();
let conn = self.pool_connection_authorized(opctx).await?;
self.transaction_retry_wrapper("nexus_external_addresses")
.transaction(&conn, |conn| {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume the transaction was here to (attempt to?) ensure we got a consistent view of DNS zones + external IPs; does that not matter in practice?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it should matter? The list of DNS zones cannot actually change today at all. If it did, I think you'd already have a problem because we're constructing a list of DNS names based on those zones, and that could become out of date by the time we try to write the DNS changes. The set of external IPs could, too, though only via Reconfigurator. Eventually I think we want to have even the Silo DNS changes happen through Reconfigurator, and then we'll know it will at least converge to correct. Anyway, since the list of DNS zones cannot change today, I think it makes sense to do this and revisit if/when we support that.

@@ -45,6 +47,7 @@ use ref_cast::RefCast;
use uuid::Uuid;

/// Filter a "silo_list" query based on silos' discoverability
#[derive(Clone, Copy)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe while we're here:

Suggested change
#[derive(Clone, Copy)]
#[derive(Debug, Clone, Copy)]

let dendrite_port =
cptestctx.dendrite.get(&switch_location).unwrap().port;
let mgd_port = cptestctx.mgd.get(&switch_location).unwrap().port;
overrides.override_switch_zone_ip(sled_id, ip);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could all the override_* methods also be #[cfg(test)], and/or not be pub?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 7028e0b.

/// structure is empty and returns the default (production) values. The test
/// suite overrides these values.
#[derive(Debug, Default)]
pub struct Overridables {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just thinking out loud, not requesting a change: All these fields are related to the switch zone. If we were to add other kinds of test overrides, would it get unwieldy if we stashed them in here too? Wondering if this would be clearer if it were named something like SwitchZoneAddresses. Just Overrideables implies to me that non-tests might also want to override these values, but really we're passing in the switch zone addresses with a way for tests only to set them to unusual values, right?

Copy link
Collaborator Author

@davepacheco davepacheco Mar 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, your summary is correct. I also thought about calling it TestOverridables though without context it didn't seem clearer. SwitchZoneAddresses is okay though without context I usually think "address" means "IP address" and not TCP ports.

I'm not sure the problem is intrinsically related to the switch zone. It's anything whose addresses or ports we cannot determine from the configuration that's available. It's not a problem for Omicron zones because their addresses and ports are part of the blueprint configuration. It is a problem for the switch zone components because they're not part of that config. In principle there could be other things in that bucket. If we needed to know how to reach the internal DNS servers, we'd almost need this (because in real systems those are at fixed addresses), but we don't because they happen to be in OmicronZoneConfig too. Basically it would be anything that we run on some fixed, known address/port in production, but that runs in the test suite at some dynamic host/port.

I'm indifferent among these names. OverridablesForTest? Though the actual production values are behind that object, too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love OverrideablesForTest for the reason you mentioned, although maybe it'd work better if we added a realize_blueprint_with_test_overrides? Change realize_blueprint to not take this argument at all, but instead call straight through to realize_blueprint_with_test_overrides with Default::default()?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the change to realize_blueprint and I've done that in 7028e0b.

let external_dns_config_blueprint = blueprint_external_dns_config(
blueprint,
&nexus_external_ips,
&silos.iter().collect::<Vec<_>>(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tiny nit - collecting another Vec of silos seems weird when we just did that on line 76. Could blueprint_external_dns_config take &[Silo] instead? Or maybe even impl Iterator<...>s instead of slices so we don't have to collect at all?

@@ -315,40 +437,70 @@ fn dns_compute_update(

#[cfg(test)]
mod test {
use super::blueprint_dns_config;
use super::blueprint_internal_dns_config;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing wrong with this, but I suspect if you used super::* you could drop a bunch of these duplicates.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that's considered more idiomatic I can switch to that but if it's an arbitrary choice I really prefer being explicit instead. I've gotten surprisingly burned by trying to move code out of a block that used use super::*. It's not a big deal though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it probably is? use super::* shows up over 100 times in omicron, and I have it in my "generate a mod tests" snippet. But I don't feel strongly about this at all.

.await
.expect("fetching initial external DNS");

// Now, use it to construct an initial blueprint.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really tiny nitpick: "use it" is referring back to the collection, not the DNS contents we just fetched, right? Could we swap these two chunks?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kept the order to emphasize/ensure that we're collecting state before we change anything, but I have fixed the comment.

assert_eq!(new_zones.len(), 1);
let new_zone_id = *new_zones[0];

// Set this blueprint as the current target. We set it to disabled
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar tiny nit as above, with apologies: "this blueprint" sounds like blueprint2 (the one we just made), but first we have to set blueprint as the target. Could we set blueprint up above after we create, then keep the comment and just deal with blueprint2 here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 7028e0b.

/// _within_ the external DNS zone (i.e., without that zone's suffix)
///
/// This specific naming scheme is determined under RFD 357.
pub fn silo_dns_name(name: &omicron_common::api::external::Name) -> String {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this go above the mod test block?

Copy link
Contributor

@jgallagher jgallagher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@davepacheco davepacheco merged commit 45ccacd into main Mar 14, 2024
22 checks passed
@davepacheco davepacheco deleted the dap/reconfigurator-external-dns branch March 14, 2024 03:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

blueprints: Nexus needs to update external DNS
2 participants