-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow recommissioning of previously-decommissioned sleds #5733
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As always, thanks for the great tests (checking idempotency)!
CONSTRAINT serial_part_revision_unique UNIQUE ( | ||
serial_number, part_number, revision | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, was wondering when we'd eventually drop this constraint, thanks for doing so!
match LookupPath::new(opctx, self) | ||
.sled_id(allocation.sled_id.into_untyped_uuid()) | ||
.optional_fetch_for(authz::Action::Read) | ||
.await? | ||
.map(|(_, sled)| sled.state()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Part of me wonders if this filtering is something we should be doing on the query for subnet allocations in the first place (e.g., join with the sled table if the hw_baseboard_id matches, or else return the highest subnet_octet + 1
), but I think it's fine to keep this as-is.
At first glance I was worried about doing a "query-per-allocation" in a loop, but I guess we're only doing that when the baseboard_id matches, which is pretty infrequent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a good question. We can't join against hw_baseboard_id
directly, because sled
doesn't have one (although I added a TODO
in this PR that maybe it should?). We could left join on the sled_id
, though? I don't have strong feels about this since as you point out, we only do this when the baseboard matches, and we only land in this function at all when someone is trying to add a sled, which is itself quite infrequent.
6418c3e
to
97e2148
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great tests!
This is a pretty small delta on top of #5698 that I've been sitting on for a few days before getting a chance to test it on a4x2 this afternoon. I set up a4x2 with three sleds (g0, g1, g3), and went through 3.5 add/remove cycles on sled g2 (3 add/remove pairs and a 4th add). g2 is present in
omdb db sleds
:I added
SledFilter::Decommissioned
specifically for use with omdb, where we can see the 3 previous timesg2
was present (each with a different sled ID):The current blueprint shows history of all four times the sled has been present, because we don't currently prune expunged zones:
Also visible in that blueprint is evidence that we went through the "add NTP" -> "add crucible" reconfigurator steps. I did not go so far as to add a Nexus to g2 each time (it's considerably more manually intensive to do so, which we might need to address if that's a thing we want to test more thoroughly).
This also fixes a bug in
allocate_sled_underlay_subnet_octets
that I accidentally introduced in #5675, restoring idempotence up until the point where the sled has upserted itself. (The comments added should clarify this.) We can and should still fail on an attempt to add a sled where we (a) have an allocation and (b) have an entry in thesled
table, but no longer fail on attempt to add a sled where we (a) have an allocation but (b) do NOT have an entry in thesled
table.