Reconfigurator: Add cockroachdb zones as needed #5797
Conversation
Looks fantastic, thanks!
```rust
            DiscretionaryOmicronZone::Nexus,
            DiscretionaryOmicronZone::CockroachDb,
        ] {
            // Count the number of `kind` zones on all in-service sleds. This
```
What do you think about moving this block to another function? Might make it easier to test/call.
I originally had it that way, but the `zone_placement` option that's reused across loop iterations was a little awkward. I've done some other rework since then; let me try it again...
I tried extracting the full body of this loop into a separate function, and didn't like two things:

- The aforementioned `&mut Option<OmicronZonePlacement>` it has to take
- The function ended with an unconditional call to `add_discretionary_zones`, which felt like an artificial split into two pieces that aren't independently testable

I took a second shot in 81b78ee: I kept the `Option<OmicronZonePlacement>` (and its construction, if needed) in the body of this loop, but moved the calculation of how many zones are needed into a new method. I think I like this better than either what was here before or moving the whole body of the loop into a method - what do you think?
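For illustration, here's a minimal sketch of that shape, using simplified, hypothetical types and names (`Planner`, `num_additional_zones_needed`, and the zone-count fields are stand-ins, not the real planner/builder API): the "how many zones are needed" calculation lives in its own, easily testable method, while the lazily constructed `Option<OmicronZonePlacement>` stays in the loop.

```rust
// Hypothetical, simplified sketch of the refactor shape; not the real
// omicron planner/builder types.
#[derive(Clone, Copy)]
enum DiscretionaryOmicronZone {
    Nexus,
    CockroachDb,
}

struct OmicronZonePlacement {
    // ...sled/zpool state used to decide where new zones go...
}

struct Planner {
    target_nexus_zone_count: usize,
    target_cockroachdb_zone_count: usize,
}

impl Planner {
    /// Extracted helper: how many new zones of `kind` are needed, given how
    /// many are already in service? Easy to unit test on its own.
    fn num_additional_zones_needed(
        &self,
        kind: DiscretionaryOmicronZone,
        num_existing: usize,
    ) -> usize {
        let target = match kind {
            DiscretionaryOmicronZone::Nexus => self.target_nexus_zone_count,
            DiscretionaryOmicronZone::CockroachDb => {
                self.target_cockroachdb_zone_count
            }
        };
        target.saturating_sub(num_existing)
    }

    fn add_discretionary_zones(&mut self) {
        // Built lazily and reused across loop iterations, so we only pay for
        // constructing the placement state if some kind needs new zones.
        let mut zone_placement: Option<OmicronZonePlacement> = None;

        for kind in [
            DiscretionaryOmicronZone::Nexus,
            DiscretionaryOmicronZone::CockroachDb,
        ] {
            let num_existing = 0; // placeholder: count `kind` zones in service
            let needed = self.num_additional_zones_needed(kind, num_existing);
            if needed == 0 {
                continue;
            }
            let _placement =
                zone_placement.get_or_insert_with(|| OmicronZonePlacement {});
            // ...ask `_placement` where to put `needed` new zones of `kind`...
        }
    }
}
```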
Looks great!
```
@@ -791,6 +847,46 @@ impl<'a> BlueprintBuilder<'a> {
        allocator.alloc().ok_or(Error::OutOfAddresses { sled_id })
    }

    fn sled_alloc_zpool(
```
Not really actionable on this PR: this function works, but it probably would have been easier to write, and a bit more efficient, if we had a way to iterate through the zpools themselves and check directly which zones they host, rather than the other way around. I'm not sure it makes sense to track that while adding zones, though.
Yeah, understandable. I'm not sure how we'd address this if we wanted to do something different here - maybe keep a separate zpool -> zone map in the builder? It might be worth it if we end up needing this in more places, but it seems annoying to keep in sync with zone changes being made.
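A rough sketch of that alternative, under the assumption of hypothetical, simplified types (`ZpoolZoneMap`, `ZoneKind`, and the ID aliases are illustrations, not the real builder API): the builder would carry a zpool -> zones map, updated on every zone change, and allocation would then scan a sled's pools directly.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical stand-ins for the real IDs and zone kinds.
type ZpoolUuid = u128;

#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum ZoneKind {
    Crucible,
    CockroachDb,
}

#[derive(Default)]
struct ZpoolZoneMap {
    // zpool -> kinds of zones already placed on it
    zones_by_zpool: BTreeMap<ZpoolUuid, BTreeSet<ZoneKind>>,
}

impl ZpoolZoneMap {
    /// Must be called every time a zone is added to the blueprint; this is
    /// the "annoying to keep in sync" part mentioned above.
    fn record_zone(&mut self, zpool: ZpoolUuid, kind: ZoneKind) {
        self.zones_by_zpool.entry(zpool).or_default().insert(kind);
    }

    /// Pick a zpool from the sled's pools that doesn't already host a zone
    /// of this kind, by inspecting the zpools directly.
    fn alloc_zpool(
        &self,
        sled_zpools: &[ZpoolUuid],
        kind: ZoneKind,
    ) -> Option<ZpoolUuid> {
        sled_zpools.iter().copied().find(|zpool| {
            self.zones_by_zpool
                .get(zpool)
                .map_or(true, |kinds| !kinds.contains(&kind))
        })
    }
}
```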
```
@@ -1081,6 +1178,33 @@ pub mod test {
            );
        }
    }

    // On any given zpool, we should have at most one zone of any given
```
Is this function actually used anywhere?
Yeah, most/all of the tests in this file and `planner.rs` call this after generating blueprints to do basic sanity checking.
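Roughly, the invariant that check enforces looks like this (a simplified sketch with hypothetical zone types, not the actual helper in this file): collect `(zpool, zone kind)` pairs across the blueprint's zones and assert no pair repeats.

```rust
use std::collections::BTreeSet;

// Hypothetical, simplified zone representation for illustration.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
enum ZoneKind {
    Crucible,
    CockroachDb,
    Nexus,
}

struct Zone {
    kind: ZoneKind,
    // Not every zone kind uses a durable dataset on a zpool, hence Option.
    zpool: Option<u128>,
}

/// Panic if any zpool hosts more than one zone of the same kind.
fn assert_one_zone_per_kind_per_zpool(zones: &[Zone]) {
    let mut seen = BTreeSet::new();
    for zone in zones {
        if let Some(zpool) = zone.zpool {
            assert!(
                seen.insert((zpool, zone.kind)),
                "zpool {zpool} has more than one {:?} zone",
                zone.kind
            );
        }
    }
}
```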
```rust
        // TODO-cleanup This is wrong, but we don't currently set up any CRDB
        // nodes in our fake system, so this prevents downstream test issues
        // with the planner thinking our system is out of date from the gate.
        let target_cockroachdb_zone_count = 0;
```
I was also wondering about this when looking at our tests the other day. Should we actually deploy CRDB zones and everything else as part of our `ExampleSystem`? It feels like we should. Not in this PR, though.
Agreed, and initially I set this to `COCKROACHDB_REDUNDANCY`. That broke basically all the nontrivial tests, though, so I took the easy way out and punted on this.
Makes sense.
I opened an issue for this: #5808
Thanks!
Force-pushed from a20c18d to 81b78ee.
This builds on #5788 to add support to the planner to add new CRDB zones to blueprints (if the current count is below the policy's target count). No changes were needed on the execution side, and the CRDB zones already bring themselves up as part of the existing cluster by looking up other node names in DNS.

A big chunk of the diff comes from expectorate output - our simulated test system was producing sleds that all had the same physical disk and zpool UUIDs, which messed with the test I wrote for the builder's zpool allocation. I tweaked the `TypedUuidRng` that the sled uses to include the `sled_id` (which itself comes from a "parent" `TypedUuidRng`) as the second seed argument. If that seems unreasonable, I am very open to other fixes!
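To make the seeding idea concrete, here's a generic sketch using the `rand` and `uuid` crates rather than the actual `TypedUuidRng` API (the function names and seed layout are illustrative assumptions): folding the sled id into each sled's seed makes the per-sled UUID streams diverge even when they share a parent seed.

```rust
use rand::rngs::StdRng;
use rand::{RngCore, SeedableRng};
use uuid::Uuid;

/// Derive a per-sled RNG from a shared parent seed plus the sled's own id,
/// so no two sleds produce the same stream of disk/zpool UUIDs.
fn sled_uuid_rng(parent_seed: u64, sled_id: Uuid) -> StdRng {
    let mut seed = [0u8; 32];
    seed[..8].copy_from_slice(&parent_seed.to_le_bytes());
    seed[8..24].copy_from_slice(sled_id.as_bytes());
    StdRng::from_seed(seed)
}

/// Draw the next pseudo-random UUID from the given RNG.
fn next_uuid(rng: &mut impl RngCore) -> Uuid {
    let mut bytes = [0u8; 16];
    rng.fill_bytes(&mut bytes);
    Uuid::from_bytes(bytes)
}

fn main() {
    let parent_seed = 0xfeed;
    let sled_a = Uuid::from_u128(1);
    let sled_b = Uuid::from_u128(2);
    let zpool_a = next_uuid(&mut sled_uuid_rng(parent_seed, sled_a));
    let zpool_b = next_uuid(&mut sled_uuid_rng(parent_seed, sled_b));
    // With the sled id folded into the seed, the two sleds' zpool UUIDs differ.
    assert_ne!(zpool_a, zpool_b);
    println!("sled A zpool: {zpool_a}\nsled B zpool: {zpool_b}");
}
```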