-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensures datasets get propagated into RSS blueprint #7000
Conversation
@@ -3128,6 +3128,7 @@ | |||
"type": "object", | |||
"properties": { | |||
"address": { | |||
"nullable": true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before this PR: The public-facing API to "add a dataset to the DB" required a SocketAddr type.
This is an artifact which was partially fixed by #2000 - unfortunately, there is code which still uses these IP/port pairs if they exist, but they are not required to exist for a dataset record to be created.
Anyway - TL;DR, they're optional, as they should be.
impl SledConfig { | ||
/// Adds a zone to the Sled's configuration, as well as any number of | ||
/// durable datasets. | ||
pub fn add_zone_and_datasets(&mut self, zone: BlueprintZoneConfig) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is RSS's version of "whenever you add a zone, make sure you track all possible datasets used by that zone too".
@@ -119,6 +124,54 @@ pub struct SledConfig { | |||
pub zones: Vec<BlueprintZoneConfig>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Struggling to comment the right line here -- I'm trying to comment on the pub datasets: DatasetsConfig
line above -- but basically:
- We added this field in [nexus] Manage datasets via reconfigurator #6229
- but we never populated it correctly
That's what this PR is most significantly trying to fix
.await | ||
.await?; | ||
|
||
// Ensure all datasets are initialized |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the equivalent of the "blueprint executor", as controlled by RSS. "Request datasets before requesting zones".
.pool_name | ||
.id() | ||
.into_untyped_uuid(), | ||
dataset_id: zone.id.into_untyped_uuid(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was nasty - we re-used the zone UUID as a dataset UUID 😬
We aren't doing that anymore - datasets have their own Uuids that are provisioned as they are created within the plan.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great. Just one bit of confusion from me.
@@ -107,6 +107,7 @@ impl DatasetName { | |||
&self.pool_name | |||
} | |||
|
|||
// TODO: Maybe rename this to "kind"? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 I figure this will have ripple effects which is why you didn't do it in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's a mechanical change, so I figured I'd punt it out when I get to a less conflict-y place
>(()); | ||
}; | ||
|
||
if let sled_agent_client::Error::ErrorResponse(response) = &error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm failing to see how this could happen during an initial rack setup. We won't handoff to nexus until after RSS, which we won't do until we finish calling this method. The only way this can fail in this scenario is if we call RSS again after the rack is already initialized. That is gated elsewhere though and we shouldn't even get here right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this may have been a result of copying "how we initialize zones" to "disks" and now to "datasets".
I've removed this in 6f6a87c - any conflict is now a hard error, for both disk and dataset initialization.
Tested this in a4x2, saw the following:
|
Okay, took this for a spin again on a4x2. Turns out, investing in the blueprint diff logic in #6481 was worthwhile. Here's what I did:
The results? They're almost the same, but turns out they have differing opinions on the compression level:
Seems that's because the Sled Agent initializes this dataset with Gzip-9: omicron/sled-storage/src/dataset.rs Lines 67 to 70 in 15305ae
But the reconfigurator is hard-coded to use
In 084d6f8 , I went ahead and updated this to be "gzip-9" everywhere. End result: This RSS-generated blueprint should now be identical to one immediately generated by Nexus. |
Fixes #6989