New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

blueprint planner: add nexus on every sled #4959

Merged

jgallagher merged 14 commits into main from john/blueprint-planner-nexus

Feb 8, 2024

Contributor

jgallagher commented Feb 1, 2024

Most of the work in this PR is to give BlueprintBuilder the ability to add Nexus to a sled at all. A very minor bit to enable testing is a change to the planner to attempt to add a Nexus zone to every sled.

I think (?) we probably don't want to merge this into main with the planner making that decision, but I'm opening this for review because (a) maybe I'm wrong, since we're driving all of this by hand, (b) the bulk of the PR is ready for review, and (c) we want a TUF repo with this change to test on madrid.

jgallagher requested review from andrewjstone and davepacheco

February 1, 2024 19:10

Contributor Author

jgallagher commented Feb 1, 2024

Whoops, missed some tests. Working on fixing that and rebasing onto main, please hold the reviews...

jgallagher added 2 commits

February 1, 2024 16:05


          blueprint planner: add nexus on every sled

6a31e65


          fix broken tests

617b537

jgallagher force-pushed the john/blueprint-planner-nexus branch from 285c062 to 617b537 Compare

February 1, 2024 21:16

Contributor Author

jgallagher commented Feb 1, 2024

Whoops, missed some tests. Working on fixing that and rebasing onto main, please hold the reviews...

Ok, rebasing done and tests may be fixed (running locally too, but still slow). Clear for reviews; if there are more test breaks I won't rebase again.

andrewjstone reviewed

View reviewed changes

nexus/deployment/src/blueprint_builder.rs Outdated Show resolved Hide resolved

nexus/deployment/src/blueprint_builder.rs Outdated

+                                  match nic.ip {
+                                      IpAddr::V4(ip) => {
+                                          if !existing_nexus_v4_ips.insert(ip) {
+                                              bail!("duplicate Nexus NIC IP: {ip}");

Contributor

andrewjstone Feb 1, 2024

This seems pretty bad. This is an invariant violation baked into our blueprints AFAICT. Should we log here, or do we expect the caller to log?

Contributor Author

jgallagher Feb 2, 2024

It's definitely bad! The two ways this could happen are:

A bug in blueprint generation where we added a Nexus with a dupe IP (preventing this is the whole point of existing_nexus_v4_ips, etc)
An initial blueprint created from a collection contains dupes; this shouldn't be possible in practice given how RSS works

I don't think we'll ever see either of these, and I was tempted to make this a panic. But panicking nexus on "we should never see this" seems bad, hence the current bail!.

nexus/deployment/src/blueprint_builder.rs Show resolved Hide resolved

andrewjstone approved these changes

View reviewed changes


          Merge branch 'main' into john/blueprint-planner-nexus

894946f

davepacheco reviewed

View reviewed changes

nexus/deployment/src/planner.rs Outdated Show resolved Hide resolved

nexus/deployment/src/planner.rs Outdated Show resolved Hide resolved

nexus/deployment/src/blueprint_builder.rs Outdated Show resolved Hide resolved

nexus/deployment/src/blueprint_builder.rs Outdated Show resolved Hide resolved

nexus/deployment/src/blueprint_builder.rs Outdated Show resolved Hide resolved

nexus/deployment/src/blueprint_builder.rs Show resolved Hide resolved

nexus/deployment/src/blueprint_builder.rs Outdated

+                  fn random_mac(&mut self) -> MacAddr {
+                      let mut mac = MacAddr::random_system();
+                      // TODO-performance if the size of `used_macs` starts to approach some

Collaborator

davepacheco Feb 2, 2024

Feel free to ignore this. It's not a well fleshed-out thought. I worry about using randomness in the planner because I think that'll make it harder to test, and easier to introduce thrashing behavior (where two Nexuses -- or even one! -- fight over an arbitrary decision). You could as well argue the other way around, that by using randomness we'll be forced to encounter and deal with thrashing. I guess it's just a gut feel that randomness in the planner is going to make our lives harder.

Contributor Author

jgallagher Feb 2, 2024

Hmm! It's a fair concern. My gut feeling is that there are going to be other places where we want "just pick a random one" (e.g., which of the N sleds should get an extra Nexus/cockroach/etc.), and if we do something too deterministic (e.g., sort by sled ID) we'll make our lives unnecessarily difficult by picking the same sled for too many things.

Maybe a bad idea: what if the planner seeded an RNG with something deterministic, maybe the parent blueprint ID? Then it has deterministic behavior for any single given parent, but we can still freely use randomness.

Contributor

andrewjstone Feb 2, 2024

Maybe a bad idea: what if the planner seeded an RNG with something deterministic, maybe the parent blueprint ID? Then it has deterministic behavior for any single given parent, but we can still freely use randomness.

I am a fan of this! It's basically how proptest and all the other related tools give you replay determinism.

Collaborator

davepacheco Feb 2, 2024

we'll make our lives unnecessarily difficult by picking the same sled for too many things

I don't think randomness really solves that problem, but admittedly it might be good enough for a while. I think I'd propose an approach where we narrow the set of sleds to which ones are acceptable for a choice, then pick deterministically among those. That deterministic process could initially be "pick the lexicographically-first sled id", but it could also be "pick the one with fewer zones on it" or "pick the one with the least CPU/disk/memory already allocated by the control plane" or "assign each zone type a score based on how much load we expect it to generate and then pick the sled with the lowest total score". That is, we can do something simple (pick the first) or sophisticated (if we find we need to deal better with distribution) but it's still deterministic.

This is a lot more complicated than generating MAC addresses, though. Again, feel free to ignore this but I mention it because the sequential allocator seems no worse here and eliminates the only current source of randomness in the planner (I think).

Maybe a bad idea: what if the planner seeded an RNG with something deterministic, maybe the parent blueprint ID? Then it has deterministic behavior for any single given parent, but we can still freely use randomness.

I'm not sure. I could see that going either way too. It doesn't help test determinism unless we ensure that we use deterministic UUID generation too. I looked into that briefly while writing tests but it seemed like it'd require a lot of surgery.

Contributor

sunshowers Feb 5, 2024 •

edited

Loading

Maybe a bad idea: what if the planner seeded an RNG with something deterministic, maybe the parent blueprint ID? Then it has deterministic behavior for any single given parent, but we can still freely use randomness.

I think a deterministic source of randomness stored in the blueprint would be cool, and would probably end up getting used for a lot of things.

It doesn't help test determinism unless we ensure that we use deterministic UUID generation too. I looked into that briefly while writing tests but it seemed like it'd require a lot of surgery.

Deterministic UUID generation would be pretty useful as well I think, effectively requiring that a source of UUIDs be passed down to the various spots that generate them. Though it may be a fair bit of work.

Contributor

sunshowers Feb 5, 2024

Yeah, thinking about it, it's a bit hard to enforce the invariant that people don't use the Uuid::new_v4 method long-term. One way is probably to not enable the v4 feature, but that seems excessive (using the system RNG is completely fine in most other spots, just not here).

I guess a unit test could generate blueprints with a fixed seed several times, and ensure that the generated UUIDs stay the same.

Contributor

sunshowers Feb 5, 2024

Another idea really is to just ban new_v4 via clippy: https://rust-lang.github.io/rust-clippy/master/#/disallowed_methods

Collaborator

davepacheco Feb 5, 2024

I think it's fine if we want to have a deterministic source of randomness for blueprints. At the same time, I think randomness is not a great solution to any of the problems we've mentioned, or at least not better than an alternative (like sequential MAC address allocation), so I'm not sure it's worth the effort.

I think my aversion to randomness here comes down to a few things:

For automated testing, it's more annoying to verify that the planner generated the blueprint we expected because we can't just hardcode the values we know it will choose. Deterministic-random doesn't necessarily help here but it could. But we already wrote a higher-level diff thing that can tell if two blueprints are different, ignoring things like the "id" and "time_created" (which will always be different anyway). So this isn't that big a deal.
When we get to making this more autonomous, it will become important for Nexus instances to be able to look at some other instance's plan and say "is that okay?" That might be the same as "is that equivalent to the thing I would do?" -- or it might not. I alluded to this earlier. Choosing deterministically makes it easier for us to punt on this for longer (because two Nexus instances will tend to create the same plans), but it doesn't really solve the problem. Also: we're talking about hypotheticals on hypotheticals at this point and I really don't think we want to spend a lot of time on such things now.
Probably the biggest thing: during general operation and especially in support contexts: it means the system's behavior is less predictable. I can imagine wanting to set a customer's expectations -- or even our own -- about what would happen when the planner is enabled or when given some input. With randomness, even deterministic randomness, that's harder.
I've often reached for random approaches in the past and found that they work okay for a while, but in the limit / at scale, the variance winds up being a problem.

Honestly, none of these is that compelling. If there were a compelling reason to use randomness here, I'd say go for it. But I don't see any reason to use randomness, either, aside from being slightly quicker to implement than sequential. Either way, I don't think it's worth investing a lot of time on this part right now.

Contributor Author

jgallagher Feb 5, 2024

I went through this and basically copied what RSS was doing, so I ended up with sequential OPTE IP address generation and random MAC generation. It's a small change to switch over to sequential MAC generation; I can put that in this PR so I'm not introducing a new kind of randomness, and we can punt on whether we should have deterministic UUIDs for a followup.

Contributor Author

jgallagher Feb 5, 2024

9748940 switches to sequential MAC allocation. I'm not going to mark this conversation resolved because I don't want to hide it, but I think it's resolved as far as this PR is concerned; please correct me if that's wrong!

jgallagher added 6 commits

February 5, 2024 15:56


          allow blueprints to place multiple Nexuses on one sled

029dd2b


          cleanup/typos from PR review

a3e0a51


          cargo fmt

7f2d8f2


          move target_nexus_zone_count from Planner to Policy

38071d6


          doc fix

87ba387


          switch from random to sequential mac addr allocation

jgallagher requested review from andrewjstone and davepacheco

February 5, 2024 22:20

Contributor Author

jgallagher commented Feb 5, 2024

I think I've addressed most of the comments above. The changes are substantial enough that this warrants a second look, particularly 029dd2b which switches from "allocate one Nexus on every sled" to "allocate a total of Policy.target_nexus_zone_count Nexus zones, prioritizing sleds that have the fewest Nexus zones currently".


          fix db tests

e0278a5

jgallagher added a commit that referenced this pull request


          blueprint planner: skip non-provisionable sleds

ebb111c

I think I missed this on #4959, but instead of tacking onto that I'm
opening this as a separate PR to confirm the guard is in the right
place.

If a sled has been marked as non-provisionable by an operator, I'm
pretty confident we should NOT choose that sled when we're picking sleds
for additional service zones (e.g., Nexus). I'm less confident on what
we should do about NTP and Crucible zones. The change in this PR will
still attempt to add an NTP zone, but will not attempt to add any
Crucible zones. Is that the right place for it? Or should we still
add Crucible zones to a non-provisionable sled (if it has zpools that
don't yet have Crucible zones)?

jgallagher mentioned this pull request

blueprint planner: skip non-provisionable sleds #5003

Merged

Collaborator

davepacheco commented Feb 6, 2024

Note: this will fix #4886, I assume.

davepacheco approved these changes

View reviewed changes

sled-agent/Cargo.toml Outdated Show resolved Hide resolved

nexus/types/src/deployment.rs Outdated Show resolved Hide resolved

nexus/deployment/src/planner.rs Outdated Show resolved Hide resolved

nexus/deployment/src/planner.rs Show resolved Hide resolved

nexus/deployment/src/planner.rs Show resolved Hide resolved

jgallagher added 4 commits

February 8, 2024 11:22


          Merge branch 'main' into john/blueprint-planner-nexus

c82b1c3


          move NEXUS_REDUNDANCY from nexus-deployment to omicron-common to shar…

d332154

…e with RSS


          comment cleanup from PR review

7ae7461


          blueprint planner: skip non-provisionable sleds (#5003)

b941323

Contributor Author

jgallagher commented Feb 8, 2024

Note: this will fix #4886, I assume.

Not quite - we still need a change on the executor side to insert db records for the external IP and NIC required by Nexus. I'm working on that now.

jgallagher merged commit dcbc9cb into main

21 checks passed

jgallagher deleted the john/blueprint-planner-nexus branch

February 8, 2024 21:54

jgallagher mentioned this pull request

[nexus] Make nexus aware of zone filesystems #5050

Closed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet