
Reconfigurator: Add cockroachdb zones as needed #5797

Merged · 14 commits into main · Jun 13, 2024

Conversation

jgallagher (Contributor):

This builds on #5788 to add planner support for adding new CRDB zones to blueprints when the current count is below the policy's target count. No changes were needed on the execution side, and the CRDB zones already bring themselves up as part of the existing cluster by looking up other node names in DNS.

A big chunk of the diff comes from expectorate output: our simulated test system was producing sleds that all had the same physical disk and zpool UUIDs, which broke the test I wrote for the builder's zpool allocation. I tweaked the `TypedUuidRng` that the sled uses to include the `sled_id` (which itself comes from a "parent" `TypedUuidRng`) as the second seed argument. If that seems unreasonable, I am very open to other fixes!
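For context, a minimal sketch of that seeding scheme with hypothetical stand-in types (`SledUuidRng` and its API here are illustrative, not omicron's actual `TypedUuidRng`). Mixing the sled id into the seed means sleds derived from the same parent RNG no longer produce identical disk and zpool UUIDs:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

use rand_chacha::rand_core::{RngCore, SeedableRng};
use rand_chacha::ChaCha8Rng;
use uuid::Uuid;

/// Hypothetical stand-in for `TypedUuidRng`: a deterministic UUID
/// generator whose seed combines a parent seed with the sled id, so
/// two sleds derived from the same parent produce distinct UUIDs.
struct SledUuidRng(ChaCha8Rng);

impl SledUuidRng {
    fn new(parent_seed: u64, sled_id: Uuid) -> Self {
        // Use the sled id as the "second seed argument".
        let mut hasher = DefaultHasher::new();
        parent_seed.hash(&mut hasher);
        sled_id.hash(&mut hasher);
        Self(ChaCha8Rng::seed_from_u64(hasher.finish()))
    }

    /// Generate the next deterministic UUID (e.g. for a physical disk
    /// or zpool in the simulated test system).
    fn next_uuid(&mut self) -> Uuid {
        let mut bytes = [0u8; 16];
        self.0.fill_bytes(&mut bytes);
        Uuid::from_bytes(bytes)
    }
}
```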

@sunshowers (Contributor) left a comment:

Looks fantastic, thanks!

DiscretionaryOmicronZone::Nexus,
DiscretionaryOmicronZone::CockroachDb,
] {
// Count the number of `kind` zones on all in-service sleds. This

Contributor:

What do you think about moving this block to another function? Might make it easier to test/call.

jgallagher (Author):

I originally had it that way, but the zone_placement option that's reused across loop iterations was a little awkward. I've done some other rework since then; let me try it again...

jgallagher (Author), May 22, 2024:

I tried extracting the full body of this loop into a separate function, and didn't like two things:

  • The aforementioned &mut Option<OmicronZonePlacement> it has to take
  • The function ended with an unconditional call to add_discretionary_zones, which felt like artificially splitting one operation into two pieces that aren't independently testable anyway

I took a second shot in 81b78ee: I kept the Option<OmicronZonePlacement> (and its construction, if needed) in the body of this loop, but moved the calculation of how many zones are needed into a new method. I think I like this better than either the original version or moving the whole loop body into a method - what do you think?
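A rough sketch of that shape, using hypothetical simplified types (the real code uses `OmicronZonePlacement` and the planner's actual zone kinds): the "how many zones are needed" calculation becomes its own trivially testable function, while the lazily constructed placement state stays in the loop:

```rust
/// Hypothetical stand-ins for the planner's types; names here are
/// illustrative, not the actual omicron API.
#[derive(Clone, Copy, Debug)]
enum ZoneKind {
    Nexus,
    CockroachDb,
}

/// Placeholder for `OmicronZonePlacement`.
struct ZonePlacement;

impl ZonePlacement {
    fn place(&mut self, kind: ZoneKind, count: usize) {
        println!("placing {count} new {kind:?} zone(s)");
    }
}

/// The extracted calculation: how many new zones of a kind are needed,
/// given the policy's target and the current in-service count.
fn num_additional_zones_needed(target: usize, in_service: usize) -> usize {
    target.saturating_sub(in_service)
}

/// The loop keeps ownership of the `Option<ZonePlacement>`, building
/// it only the first time any kind actually needs new zones.
fn plan_discretionary_zones(kinds: &[(ZoneKind, usize, usize)]) {
    let mut placement: Option<ZonePlacement> = None;
    for &(kind, target, in_service) in kinds {
        let needed = num_additional_zones_needed(target, in_service);
        if needed == 0 {
            continue;
        }
        let placement = placement.get_or_insert_with(|| ZonePlacement);
        placement.place(kind, needed);
    }
}
```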

[Resolved review thread on nexus/reconfigurator/planning/src/system.rs (outdated)]
@andrewjstone (Contributor) left a comment:

Looks great!

@@ -791,6 +847,46 @@ impl<'a> BlueprintBuilder<'a> {
allocator.alloc().ok_or(Error::OutOfAddresses { sled_id })
}

fn sled_alloc_zpool(

Contributor:

Not really actionable on this PR: this function works. It would probably have been easier to write, and a bit more efficient, if we had a way to iterate through the zpools themselves and check directly which zones they host, rather than the other way around. I'm not sure it makes sense to track that while adding zones, though.

jgallagher (Author):

Yeah, understandable. I'm not sure how we'd address this if we wanted to do something different here - maybe keep a separate zpool -> zone map in the builder? That might be worth it if we end up needing this lookup in more places than just here, but it seems annoying to keep in sync with the zone changes being made.
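As an illustration only, a sketch of what that separate map might look like, with hypothetical simplified identifiers (the real builder uses typed zpool UUIDs and structured zone configs). The sync burden mentioned above shows up as the requirement that every zone the builder adds or removes also update this map:

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical simplified identifiers for illustration.
type ZpoolId = u64;
type ZoneKind = &'static str;

/// A zpool -> zone-kind map the builder could maintain so allocation
/// can query pools directly instead of scanning all zones per pool.
#[derive(Default)]
struct ZpoolUsage {
    zones_by_pool: BTreeMap<ZpoolId, BTreeSet<ZoneKind>>,
}

impl ZpoolUsage {
    /// Must be called for every zone the builder adds (and mirrored
    /// for removals) - this is the sync burden noted above.
    fn record_zone(&mut self, pool: ZpoolId, kind: ZoneKind) {
        self.zones_by_pool.entry(pool).or_default().insert(kind);
    }

    /// Find a pool that doesn't already host a zone of `kind`.
    fn alloc_pool_for(
        &self,
        mut all_pools: impl Iterator<Item = ZpoolId>,
        kind: ZoneKind,
    ) -> Option<ZpoolId> {
        all_pools.find(|pool| {
            self.zones_by_pool
                .get(pool)
                .map_or(true, |kinds| !kinds.contains(kind))
        })
    }
}
```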

@@ -1081,6 +1178,33 @@ pub mod test {
);
}
}

// On any given zpool, we should have at most one zone of any given

Contributor:

Is this function actually used anywhere?

jgallagher (Author):

Yeah - most/all of the tests in this file and planner.rs call this after generating blueprints to do basic sanity checking.
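A minimal sketch of that kind of post-blueprint sanity check, under hypothetical simplified types (the real helper operates on blueprint zone configs): it asserts that no zpool hosts more than one zone of any given kind.

```rust
use std::collections::BTreeSet;

// Hypothetical simplified types for illustration.
type ZpoolId = u64;
type ZoneKind = &'static str;

struct Zone {
    kind: ZoneKind,
    zpool: ZpoolId,
}

/// Panics if any zpool hosts two zones of the same kind - the
/// invariant the builder's zpool allocation is supposed to uphold.
fn assert_at_most_one_zone_kind_per_zpool(zones: &[Zone]) {
    let mut seen = BTreeSet::new();
    for zone in zones {
        assert!(
            seen.insert((zone.zpool, zone.kind)),
            "zpool {} has multiple {} zones",
            zone.zpool,
            zone.kind
        );
    }
}
```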

// TODO-cleanup This is wrong, but we don't currently set up any CRDB
// nodes in our fake system, so this prevents downstream test issues
// with the planner thinking our system is out of date from the gate.
let target_cockroachdb_zone_count = 0;

Contributor:

I was also wondering about this when looking at our tests the other day. Should we actually deploy CRDB zones and everything else as part of our ExampleSystem? It feels like we should. Not in this PR though.

jgallagher (Author):

Agreed, and initially I set this to COCKROACHDB_REDUNDANCY. That broke basically all the nontrivial tests, though, so I took the easy way out and punted on this.

Contributor:

Makes sense.

Contributor:

I opened an issue for this: #5808

jgallagher (Author):

Thanks!

Base automatically changed from john/blueprint-planning-service-placement to main May 22, 2024 14:15
@jgallagher merged commit c1956b8 into main Jun 13, 2024 (19 checks passed)
@jgallagher deleted the john/reconfigurator-add-cockroachdb branch June 13, 2024 18:59
Nieuwejaar pushed a commit that referenced this pull request Jun 15, 2024