-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
blueprint planner: add nexus on every sled #4959
Conversation
Whoops, missed some tests. Working on fixing that and rebasing onto |
285c062
to
617b537
Compare
Ok, rebasing done and tests may be fixed (running locally too, but still slow). Clear for reviews; if there are more test breaks I won't rebase again. |
match nic.ip { | ||
IpAddr::V4(ip) => { | ||
if !existing_nexus_v4_ips.insert(ip) { | ||
bail!("duplicate Nexus NIC IP: {ip}"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems pretty bad. This is an invariant violation baked into our blueprints AFAICT. Should we log here, or do we expect the caller to log?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's definitely bad! The two ways this could happen are:
- A bug in blueprint generation where we added a Nexus with a dupe IP (preventing this is the whole point of
existing_nexus_v4_ips
, etc) - An initial blueprint created from a collection contains dupes; this shouldn't be possible in practice given how RSS works
I don't think we'll ever see either of these, and I was tempted to make this a panic. But panicking nexus on "we should never see this" seems bad, hence the current bail!
.
fn random_mac(&mut self) -> MacAddr { | ||
let mut mac = MacAddr::random_system(); | ||
|
||
// TODO-performance if the size of `used_macs` starts to approach some |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel free to ignore this. It's not a well fleshed-out thought. I worry about using randomness in the planner because I think that'll make it harder to test, and easier to introduce thrashing behavior (where two Nexuses -- or even one! -- fight over an arbitrary decision). You could as well argue the other way around, that by using randomness we'll be forced to encounter and deal with thrashing. I guess it's just a gut feel that randomness in the planner is going to make our lives harder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm! It's a fair concern. My gut feeling is that there are going to be other places where we want "just pick a random one" (e.g., which of the N sleds should get an extra Nexus/cockroach/etc.), and if we do something too deterministic (e.g., sort by sled ID) we'll make our lives unnecessarily difficult by picking the same sled for too many things.
Maybe a bad idea: what if the planner seeded an RNG with something deterministic, maybe the parent blueprint ID? Then it has deterministic behavior for any single given parent, but we can still freely use randomness.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a bad idea: what if the planner seeded an RNG with something deterministic, maybe the parent blueprint ID? Then it has deterministic behavior for any single given parent, but we can still freely use randomness.
I am a fan of this! It's basically how proptest and all the other related tools give you replay determinism.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we'll make our lives unnecessarily difficult by picking the same sled for too many things
I don't think randomness really solves that problem, but admittedly it might be good enough for a while. I think I'd propose an approach where we narrow the set of sleds to which ones are acceptable for a choice, then pick deterministically among those. That deterministic process could initially be "pick the lexicographically-first sled id", but it could also be "pick the one with fewer zones on it" or "pick the one with the least CPU/disk/memory already allocated by the control plane" or "assign each zone type a score based on how much load we expect it to generate and then pick the sled with the lowest total score". That is, we can do something simple (pick the first) or sophisticated (if we find we need to deal better with distribution) but it's still deterministic.
This is a lot more complicated than generating MAC addresses, though. Again, feel free to ignore this but I mention it because the sequential allocator seems no worse here and eliminates the only current source of randomness in the planner (I think).
Maybe a bad idea: what if the planner seeded an RNG with something deterministic, maybe the parent blueprint ID? Then it has deterministic behavior for any single given parent, but we can still freely use randomness.
I'm not sure. I could see that going either way too. It doesn't help test determinism unless we ensure that we use deterministic UUID generation too. I looked into that briefly while writing tests but it seemed like it'd require a lot of surgery.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a bad idea: what if the planner seeded an RNG with something deterministic, maybe the parent blueprint ID? Then it has deterministic behavior for any single given parent, but we can still freely use randomness.
I think a deterministic source of randomness stored in the blueprint would be cool, and would probably end up getting used for a lot of things.
It doesn't help test determinism unless we ensure that we use deterministic UUID generation too. I looked into that briefly while writing tests but it seemed like it'd require a lot of surgery.
Deterministic UUID generation would be pretty useful as well I think, effectively requiring that a source of UUIDs be passed down to the various spots that generate them. Though it may be a fair bit of work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, thinking about it, it's a bit hard to enforce the invariant that people don't use the Uuid::new_v4
method long-term. One way is probably to not enable the v4
feature, but that seems excessive (using the system RNG is completely fine in most other spots, just not here).
I guess a unit test could generate blueprints with a fixed seed several times, and ensure that the generated UUIDs stay the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another idea really is to just ban new_v4
via clippy: https://rust-lang.github.io/rust-clippy/master/#/disallowed_methods
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine if we want to have a deterministic source of randomness for blueprints. At the same time, I think randomness is not a great solution to any of the problems we've mentioned, or at least not better than an alternative (like sequential MAC address allocation), so I'm not sure it's worth the effort.
I think my aversion to randomness here comes down to a few things:
- For automated testing, it's more annoying to verify that the planner generated the blueprint we expected because we can't just hardcode the values we know it will choose. Deterministic-random doesn't necessarily help here but it could. But we already wrote a higher-level
diff
thing that can tell if two blueprints are different, ignoring things like the "id" and "time_created" (which will always be different anyway). So this isn't that big a deal. - When we get to making this more autonomous, it will become important for Nexus instances to be able to look at some other instance's plan and say "is that okay?" That might be the same as "is that equivalent to the thing I would do?" -- or it might not. I alluded to this earlier. Choosing deterministically makes it easier for us to punt on this for longer (because two Nexus instances will tend to create the same plans), but it doesn't really solve the problem. Also: we're talking about hypotheticals on hypotheticals at this point and I really don't think we want to spend a lot of time on such things now.
- Probably the biggest thing: during general operation and especially in support contexts: it means the system's behavior is less predictable. I can imagine wanting to set a customer's expectations -- or even our own -- about what would happen when the planner is enabled or when given some input. With randomness, even deterministic randomness, that's harder.
- I've often reached for random approaches in the past and found that they work okay for a while, but in the limit / at scale, the variance winds up being a problem.
Honestly, none of these is that compelling. If there were a compelling reason to use randomness here, I'd say go for it. But I don't see any reason to use randomness, either, aside from being slightly quicker to implement than sequential. Either way, I don't think it's worth investing a lot of time on this part right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went through this and basically copied what RSS was doing, so I ended up with sequential OPTE IP address generation and random MAC generation. It's a small change to switch over to sequential MAC generation; I can put that in this PR so I'm not introducing a new kind of randomness, and we can punt on whether we should have deterministic UUIDs for a followup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
9748940 switches to sequential MAC allocation. I'm not going to mark this conversation resolved because I don't want to hide it, but I think it's resolved as far as this PR is concerned; please correct me if that's wrong!
I think I've addressed most of the comments above. The changes are substantial enough that this warrants a second look, particularly 029dd2b which switches from "allocate one Nexus on every sled" to "allocate a total of |
I think I missed this on #4959, but instead of tacking onto that I'm opening this as a separate PR to confirm the guard is in the right place. If a sled has been marked as non-provisionable by an operator, I'm pretty confident we should NOT choose that sled when we're picking sleds for additional service zones (e.g., Nexus). I'm less confident on what we should do about NTP and Crucible zones. The change in this PR will still attempt to add an NTP zone, but will not attempt to add any Crucible zones. Is that the right place for it? Or should we still add Crucible zones to a non-provisionable sled (if it has zpools that don't yet have Crucible zones)?
Note: this will fix #4886, I assume. |
Not quite - we still need a change on the executor side to insert db records for the external IP and NIC required by Nexus. I'm working on that now. |
Most of the work in this PR is to give
BlueprintBuilder
the ability to add Nexus to a sled at all. A very minor bit to enable testing is a change to the planner to attempt to add a Nexus zone to every sled.I think (?) we probably don't want to merge this into
main
with the planner making that decision, but I'm opening this for review because (a) maybe I'm wrong, since we're driving all of this by hand, (b) the bulk of the PR is ready for review, and (c) we want a TUF repo with this change to test onmadrid
.