
Reconfigurator: Add cockroachdb zones as needed #5797

Merged · 14 commits into main · Jun 13, 2024

Conversation

jgallagher (Contributor):

This builds on #5788 to add planner support for adding new CRDB zones to blueprints when the current count is below the policy's target count. No changes were needed on the execution side, and the CRDB zones already bring themselves up as part of the existing cluster by looking up other node names in DNS.

A big chunk of the diff comes from expectorate output: our simulated test system was producing sleds that all had the same physical disk and zpool UUIDs, which broke the test I wrote for the builder's zpool allocation. I tweaked the `TypedUuidRng` that the sled uses to include the `sled_id` (which itself comes from a "parent" `TypedUuidRng`) as the second seed argument. If that seems unreasonable, I am very open to other fixes!
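For context, a minimal sketch of that seeding scheme with hypothetical stand-in types (`SledUuidRng` and its API here are illustrative, not omicron's actual `TypedUuidRng`). Mixing the sled id into the seed means sleds derived from the same parent RNG no longer produce identical disk and zpool UUIDs:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

use rand_chacha::rand_core::{RngCore, SeedableRng};
use rand_chacha::ChaCha8Rng;
use uuid::Uuid;

/// Hypothetical stand-in for `TypedUuidRng`: a deterministic UUID
/// generator whose seed combines a parent seed with the sled id, so
/// two sleds derived from the same parent produce distinct UUIDs.
struct SledUuidRng(ChaCha8Rng);

impl SledUuidRng {
    fn new(parent_seed: u64, sled_id: Uuid) -> Self {
        // Use the sled id as the "second seed argument".
        let mut hasher = DefaultHasher::new();
        parent_seed.hash(&mut hasher);
        sled_id.hash(&mut hasher);
        Self(ChaCha8Rng::seed_from_u64(hasher.finish()))
    }

    /// Generate the next deterministic UUID (e.g. for a physical disk
    /// or zpool in the simulated test system).
    fn next_uuid(&mut self) -> Uuid {
        let mut bytes = [0u8; 16];
        self.0.fill_bytes(&mut bytes);
        Uuid::from_bytes(bytes)
    }
}
```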

@sunshowers (Contributor) left a comment:

Looks fantastic, thanks!

DiscretionaryOmicronZone::Nexus,
DiscretionaryOmicronZone::CockroachDb,
] {
// Count the number of `kind` zones on all in-service sleds. This

Contributor:

What do you think about moving this block to another function? Might make it easier to test/call.

jgallagher (Author):

I originally had it that way, but the zone_placement option that's reused across loop iterations was a little awkward. I've done some other rework since then; let me try it again...

jgallagher (Author), May 22, 2024:

I tried extracting the full body of this loop into a separate function, and didn't like two things:

  • The aforementioned &mut Option<OmicronZonePlacement> it has to take
  • The function ended with an unconditional call to add_discretionary_zones, which felt like artificially splitting one operation into two pieces that aren't independently testable anyway

I took a second shot in 81b78ee: I kept the Option<OmicronZonePlacement> (and its construction, if needed) in the body of this loop, but moved the calculation of how many zones are needed into a new method. I think I like this better than either the original version or moving the whole loop body into a method - what do you think?
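A rough sketch of that shape, using hypothetical simplified types (the real code uses `OmicronZonePlacement` and the planner's actual zone kinds): the "how many zones are needed" calculation becomes its own trivially testable function, while the lazily constructed placement state stays in the loop:

```rust
/// Hypothetical stand-ins for the planner's types; names here are
/// illustrative, not the actual omicron API.
#[derive(Clone, Copy, Debug)]
enum ZoneKind {
    Nexus,
    CockroachDb,
}

/// Placeholder for `OmicronZonePlacement`.
struct ZonePlacement;

impl ZonePlacement {
    fn place(&mut self, kind: ZoneKind, count: usize) {
        println!("placing {count} new {kind:?} zone(s)");
    }
}

/// The extracted calculation: how many new zones of a kind are needed,
/// given the policy's target and the current in-service count.
fn num_additional_zones_needed(target: usize, in_service: usize) -> usize {
    target.saturating_sub(in_service)
}

/// The loop keeps ownership of the `Option<ZonePlacement>`, building
/// it only the first time any kind actually needs new zones.
fn plan_discretionary_zones(kinds: &[(ZoneKind, usize, usize)]) {
    let mut placement: Option<ZonePlacement> = None;
    for &(kind, target, in_service) in kinds {
        let needed = num_additional_zones_needed(target, in_service);
        if needed == 0 {
            continue;
        }
        let placement = placement.get_or_insert_with(|| ZonePlacement);
        placement.place(kind, needed);
    }
}
```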

[Resolved review thread on nexus/reconfigurator/planning/src/system.rs (outdated)]
@andrewjstone (Contributor) left a comment:

Looks great!

@@ -791,6 +847,46 @@ impl<'a> BlueprintBuilder<'a> {
allocator.alloc().ok_or(Error::OutOfAddresses { sled_id })
}

fn sled_alloc_zpool(

Contributor:

Not really actionable on this PR: this function works. It would probably have been easier to write, and a bit more efficient, if we had a way to iterate through the zpools themselves and check directly which zones they host, rather than the other way around. I'm not sure it makes sense to track that while adding zones, though.

jgallagher (Author):

Yeah, understandable. I'm not sure how we'd address this if we wanted to do something different here - maybe keep a separate zpool -> zone map in the builder? That might be worth it if we end up needing this lookup in more places than just here, but it seems annoying to keep in sync with the zone changes being made.
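As an illustration only, a sketch of what that separate map might look like, with hypothetical simplified identifiers (the real builder uses typed zpool UUIDs and structured zone configs). The sync burden mentioned above shows up as the requirement that every zone the builder adds or removes also update this map:

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical simplified identifiers for illustration.
type ZpoolId = u64;
type ZoneKind = &'static str;

/// A zpool -> zone-kind map the builder could maintain so allocation
/// can query pools directly instead of scanning all zones per pool.
#[derive(Default)]
struct ZpoolUsage {
    zones_by_pool: BTreeMap<ZpoolId, BTreeSet<ZoneKind>>,
}

impl ZpoolUsage {
    /// Must be called for every zone the builder adds (and mirrored
    /// for removals) - this is the sync burden noted above.
    fn record_zone(&mut self, pool: ZpoolId, kind: ZoneKind) {
        self.zones_by_pool.entry(pool).or_default().insert(kind);
    }

    /// Find a pool that doesn't already host a zone of `kind`.
    fn alloc_pool_for(
        &self,
        mut all_pools: impl Iterator<Item = ZpoolId>,
        kind: ZoneKind,
    ) -> Option<ZpoolId> {
        all_pools.find(|pool| {
            self.zones_by_pool
                .get(pool)
                .map_or(true, |kinds| !kinds.contains(kind))
        })
    }
}
```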

@@ -1081,6 +1178,33 @@ pub mod test {
);
}
}

// On any given zpool, we should have at most one zone of any given

Contributor:

Is this function actually used anywhere?

jgallagher (Author):

Yeah - most/all of the tests in this file and planner.rs call this after generating blueprints to do basic sanity checking.
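A minimal sketch of that kind of post-blueprint sanity check, under hypothetical simplified types (the real helper operates on blueprint zone configs): it asserts that no zpool hosts more than one zone of any given kind.

```rust
use std::collections::BTreeSet;

// Hypothetical simplified types for illustration.
type ZpoolId = u64;
type ZoneKind = &'static str;

struct Zone {
    kind: ZoneKind,
    zpool: ZpoolId,
}

/// Panics if any zpool hosts two zones of the same kind - the
/// invariant the builder's zpool allocation is supposed to uphold.
fn assert_at_most_one_zone_kind_per_zpool(zones: &[Zone]) {
    let mut seen = BTreeSet::new();
    for zone in zones {
        assert!(
            seen.insert((zone.zpool, zone.kind)),
            "zpool {} has multiple {} zones",
            zone.zpool,
            zone.kind
        );
    }
}
```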

// TODO-cleanup This is wrong, but we don't currently set up any CRDB
// nodes in our fake system, so this prevents downstream test issues
// with the planner thinking our system is out of date from the gate.
let target_cockroachdb_zone_count = 0;

Contributor:

I was also wondering about this when looking at our tests the other day. Should we actually deploy CRDB zones and everything else as part of our ExampleSystem? It feels like we should. Not in this PR though.

jgallagher (Author):

Agreed, and initially I set this to COCKROACHDB_REDUNDANCY. That broke basically all the nontrivial tests, though, so I took the easy way out and punted on this.

Contributor:

Makes sense.

Contributor:

I opened an issue for this: #5808

jgallagher (Author):

Thanks!

Base automatically changed from john/blueprint-planning-service-placement to main May 22, 2024 14:15
@jgallagher merged commit c1956b8 into main Jun 13, 2024 (19 checks passed)
@jgallagher deleted the john/reconfigurator-add-cockroachdb branch June 13, 2024 18:59
Nieuwejaar pushed a commit that referenced this pull request Jun 15, 2024