Allow Nexus to specify Zone filesystem location #5931
Conversation
```rust
// Add a zpool to both sleds, just to ensure that all new zones can find
// a transient filesystem wherever they end up being placed.
let _sled_agent_zpools = DiskTestBuilder::new(&cptestctx)
    .on_sled(SledUuid::from_untyped_uuid(
        cptestctx.sled_agent.sled_agent.id,
    ))
    .with_zpool_count(1)
    .build()
    .await;
let _sled_agent2_zpools = DiskTestBuilder::new(&cptestctx)
    .on_sled(SledUuid::from_untyped_uuid(
        cptestctx.sled_agent2.sled_agent.id,
    ))
    .with_zpool_count(1)
    .build()
    .await;
```
This "works" -- it forces zpools to be allocated on all sleds before we try to create a blueprint -- but it's a little gross.

I've punted cleanup of the `DiskTest` stuff into #5939, just to keep this PR reasonably sized.
```diff
@@ -703,6 +704,53 @@ pub struct TestZpool {
     pub datasets: Vec<TestDataset>,
 }

+pub struct DiskTestBuilder<'a, N: NexusServer> {
```
This builder specifically is cleaned up and used by default in #5939. However, doing so in a "not-just-additive" way changes a lot of code unrelated to this PR, so it got split out.
```rust
// TODO-cleanup This currently returns `None` for zones that only have
// transient datasets. This should change to a non-optional value once Nexus
// is aware of them.
pub fn dataset(&self) -> Option<&OmicronZoneDataset> {
```
I opted to change the name of this function as a way to address this "TODO".
As a follow-up: #5952 actually expunges these zones if the physical disk backing one of these transient zone filesystems goes away.
Looks great. It's awesome to see this control progressing up to Nexus!
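As a rough illustration of the rename discussed above, the sketch below uses stand-in types and a hypothetical method name (`durable_dataset`), not necessarily what this PR landed on; the point is just that naming the accessor after the durable dataset makes the `Option` return type unsurprising for zones that only have a transient zone filesystem.

```rust
// Self-contained sketch with stand-in types (not the real omicron definitions).
// The method name `durable_dataset` is hypothetical; it illustrates one way to
// make "returns None for transient-only zones" part of the API's name.

#[derive(Debug)]
struct OmicronZoneDataset {
    pool_name: String,
}

enum ZoneKind {
    /// A zone backed by a durable dataset (e.g. Crucible).
    Crucible { dataset: OmicronZoneDataset },
    /// A zone that only needs a transient zone filesystem (e.g. Nexus).
    Nexus,
}

impl ZoneKind {
    /// Returns the durable dataset backing this zone, if it has one.
    /// Zones with only a transient zone filesystem return `None`.
    fn durable_dataset(&self) -> Option<&OmicronZoneDataset> {
        match self {
            ZoneKind::Crucible { dataset } => Some(dataset),
            ZoneKind::Nexus => None,
        }
    }
}

fn main() {
    let crucible = ZoneKind::Crucible {
        dataset: OmicronZoneDataset { pool_name: "oxp_example".to_string() },
    };
    println!("crucible durable dataset: {:?}", crucible.durable_dataset());
    println!("nexus durable dataset: {:?}", ZoneKind::Nexus.durable_dataset());
}
```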
```diff
         }
     }

-    for &zpool_id in resources.all_zpools(ZpoolFilter::InService) {
+    for &zpool_id in all_in_service_zpools {
```
I know this isn't explicitly part of this PR, but it appears that we always choose the first zpool that doesn't have a zone of the same type on it. This means that we aren't really balancing the load across zpools. I'm not sure if this matters or not, but it seems like it might be somewhat accidental.
If I'm reading blame correctly, this disposition was originally made in #5797, when the `sled_alloc_zpool` function was added.

@jgallagher - was this to make pool allocation deterministic? I agree that we will probably want the ability to evenly distribute load across different zpools.
> @jgallagher - was this to make pool allocation deterministic?

Yes, and also it was only choosing CRDB zones, so I didn't put much thought into it.

We probably want something like `OmicronZonePlacement` but for disks? (Or to expand `OmicronZonePlacement` to also place disks.) It is deterministic but also distributes load as evenly as it can within the set of restrictions it has. I'll file an issue.
Makes total sense, thanks for filing. I can look into the issue.
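To make the trade-off above concrete, here is a minimal, self-contained sketch (stand-in types and a hypothetical selection function, not `OmicronZonePlacement` or the real `sled_alloc_zpool`) of a rule that stays deterministic while spreading load: filter out pools that already host a zone of the same kind, then take the least-loaded pool, breaking ties by pool id.

```rust
// Sketch only. Determinism comes from iterating pools in a fixed order and
// using the pool id as a tie-breaker; balancing comes from picking the pool
// with the fewest zones already placed on it.

use std::collections::BTreeMap;

type ZpoolId = u64;

fn pick_zpool(
    in_service: &[ZpoolId],
    zones_per_pool: &BTreeMap<ZpoolId, usize>,
    already_has_same_kind: impl Fn(ZpoolId) -> bool,
) -> Option<ZpoolId> {
    in_service
        .iter()
        .copied()
        // Same restriction as today: skip pools that already host a zone of
        // this kind.
        .filter(|&id| !already_has_same_kind(id))
        // Least-loaded pool wins; the pool id breaks ties deterministically.
        .min_by_key(|&id| (zones_per_pool.get(&id).copied().unwrap_or(0), id))
}

fn main() {
    let pools = vec![1, 2, 3];
    let load: BTreeMap<ZpoolId, usize> = [(1, 2), (2, 0), (3, 1)].into_iter().collect();
    // Pool 2 has no zones yet, so it is chosen; rerunning gives the same answer.
    assert_eq!(pick_zpool(&pools, &load, |_| false), Some(2));
}
```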
Builds on #5931

The `DiskTest` utility provides mechanisms to create zpools and datasets within a particular Sled Agent. Historically, this was used on a *single* sled agent within the control plane test context, but more recently, we've added a fair amount of multi-sled tests. This PR updates `DiskTest` to be more usable with multiple sleds:

- It provides a `DiskTestBuilder` to help callers customize behavior
- It lets callers access virtual disks spread across multiple distinct sleds
Provides an internal API to remove disks, and wires it into omdb. Additionally, expands omdb commands for visibility.

- `omdb db physical-disks` can be used to view all "control plane physical disks". This is similar to, but distinct from, the `omdb db inventory physical-disks` command, as it reports control plane disks that have been adopted in the control plane. This command is necessary for identifying the UUID of the associated control plane object, which is not observable via inventory.
- `omdb nexus sleds expunge-disk` can be used to expunge a physical disk from a sled. This relies on many prior patches to operate correctly, but with the combination of #5987, #5965, #5931, #5952, #5601, and #5599, we can observe the following behavior: expunging a disk leads to all "users" of that disk (zone filesystems, datasets, zone bundles, etc) being removed.

I tested this PR on a4x2 using the following steps:

```bash
# Boot a4x2, confirm the Nexus zone is running
# From g0, zlogin oxz_switch

$ omdb db sleds
SERIAL  IP                             ROLE      POLICY      STATE   ID
g2      [fd00:1122:3344:103::1]:12345  -         in service  active  29fede5f-37e4-4528-bcf2-f3ee94924894
g0      [fd00:1122:3344:101::1]:12345  scrimlet  in service  active  6a2c7019-d055-4256-8bad-042b97aa0e5e
g1      [fd00:1122:3344:102::1]:12345  -         in service  active  a611b43e-3995-4cd4-9603-89ca6aca3dc5
g3      [fd00:1122:3344:104::1]:12345  scrimlet  in service  active  f62f2cfe-d17b-4bd6-ae64-57e8224d3672

# We'll plan on expunging a disk on g1, and observing the effects.
$ export SLED_ID=a611b43e-3995-4cd4-9603-89ca6aca3dc5
$ export OMDB_SLED_AGENT_URL=http://[fd00:1122:3344:102::1]:12345
$ omdb sled-agent zones list
    "oxz_cockroachdb_b3fecda8-2eb8-4ff3-9cf6-90c94fba7c50"
    "oxz_crucible_19831c98-3137-4af4-a93d-fc1a17c138f2"
    "oxz_crucible_6adcb8ec-6c9e-4e8a-a8d4-bbf9ad44e2c4"
    "oxz_crucible_74b2f587-10ce-4131-97fd-9832c52c8a41"
    "oxz_crucible_9e422508-f4d5-4c24-8dde-0080c0916419"
    "oxz_crucible_a47e9625-d189-4001-877a-cc3aa5b1f3eb"
    "oxz_crucible_pantry_c3b4e3cb-3e23-4f5e-921b-04e4801924fd"
    "oxz_external_dns_7e669b6f-a3fe-47a9-addd-20e42c58b8bb"
    "oxz_internal_dns_1a45a6e8-5b03-4ab4-a3db-e83fb7767767"
    "oxz_ntp_209ad0d0-a5e7-4ab8-ac8f-e99902697b32"
    "oxz_oximeter_864efebb-790f-4b7a-8377-b2c82c87f5b8"

$ omdb db physical-disks | grep $SLED_ID
ID                                    SERIAL                 VENDOR            MODEL               SLED_ID                               POLICY      STATE
23524716-a331-4d57-aa71-8bd4dbc916f8  synthetic-serial-g1_0  synthetic-vendor  synthetic-model-U2  a611b43e-3995-4cd4-9603-89ca6aca3dc5  in service  active
3ca1812b-55e3-47ed-861f-f667f626c8a0  synthetic-serial-g1_3  synthetic-vendor  synthetic-model-U2  a611b43e-3995-4cd4-9603-89ca6aca3dc5  in service  active
40139afb-7076-45d9-84cf-b96eefe7acf8  synthetic-serial-g1_1  synthetic-vendor  synthetic-model-U2  a611b43e-3995-4cd4-9603-89ca6aca3dc5  in service  active
5c8e33dd-1230-4214-af78-9be892d9f421  synthetic-serial-g1_4  synthetic-vendor  synthetic-model-U2  a611b43e-3995-4cd4-9603-89ca6aca3dc5  in service  active
85780bbf-8e2d-481e-9013-34611572f191  synthetic-serial-g1_2  synthetic-vendor  synthetic-model-U2  a611b43e-3995-4cd4-9603-89ca6aca3dc5  in service  active

# Let's expunge the "0th" disk here.
$ omdb nexus sleds expunge-disk 23524716-a331-4d57-aa71-8bd4dbc916f8 -w
$ omdb nexus blueprints regenerate -w
$ omdb nexus blueprints show $NEW_BLUEPRINT_ID

# Observe that the new blueprint for the sled expunges some zones -- minimally,
# the Crucible zone -- and no longer lists the "g1_0" disk. This should also be
# summarized in the blueprint metadata comment.

$ omdb nexus blueprints target set $NEW_BLUEPRINT_ID enabled -w
$ omdb sled-agent zones list
zones:
    "oxz_crucible_19831c98-3137-4af4-a93d-fc1a17c138f2"
    "oxz_crucible_74b2f587-10ce-4131-97fd-9832c52c8a41"
    "oxz_crucible_9e422508-f4d5-4c24-8dde-0080c0916419"
    "oxz_crucible_a47e9625-d189-4001-877a-cc3aa5b1f3eb"
    "oxz_crucible_pantry_c3b4e3cb-3e23-4f5e-921b-04e4801924fd"
    "oxz_ntp_209ad0d0-a5e7-4ab8-ac8f-e99902697b32"
    "oxz_oximeter_864efebb-790f-4b7a-8377-b2c82c87f5b8"

# As we can see, the expunged zones have been removed.
# We can also access the sled agent logs from g1 to observe that the expected
# requests have been sent to adjust the set of control plane disks and expunge
# the expected zones.
```

This is a major part of #4719

Fixes #5370
This is a newer version of #5050, focused on a slightly smaller scope.

This PR adds an optional `filesystem_pool` field to Sled Agent's API, and to Nexus' blueprint for zone construction. This allows Nexus to control which zpool is used for zone provisioning, rather than leaving that decision to the Sled Agent, as it did previously.

By giving this control to Nexus, Nexus can better understand the impact of faults (e.g., will pulling a physical disk cause a zone to fail?) and can exercise more control over zone re-allocation.

This field is optional largely for backwards compatibility, but it will eventually become mandatory.
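As a rough sketch of the shape of that field (stand-in types and a hypothetical selection function, not the actual sled-agent or nexus-types definitions), the zone request carries an optional pool and the sled agent only falls back to its own choice when Nexus left it unset:

```rust
// Sketch only: hypothetical types illustrating an optional, Nexus-specified
// filesystem pool with a sled-agent-side fallback for older requests.

#[derive(Debug, Clone, PartialEq)]
struct ZpoolName(String);

#[derive(Debug)]
struct ZoneRequest {
    zone_name: String,
    /// Zpool on which this zone's transient filesystem should live.
    /// `None` means "sled agent decides", preserved for backwards
    /// compatibility until the field becomes mandatory.
    filesystem_pool: Option<ZpoolName>,
}

fn choose_filesystem_pool<'a>(
    request: &'a ZoneRequest,
    locally_available: &'a [ZpoolName],
) -> Option<&'a ZpoolName> {
    match &request.filesystem_pool {
        // Nexus specified a pool: honor it only if it is actually present.
        Some(pool) if locally_available.contains(pool) => Some(pool),
        Some(_) => None,
        // Legacy path: the sled agent picks a pool itself.
        None => locally_available.first(),
    }
}

fn main() {
    let pools = vec![ZpoolName("oxp_a".into()), ZpoolName("oxp_b".into())];
    let request = ZoneRequest {
        zone_name: "oxz_nexus_example".into(),
        filesystem_pool: Some(ZpoolName("oxp_b".into())),
    };
    println!(
        "{:?} -> {:?}",
        request.zone_name,
        choose_filesystem_pool(&request, &pools)
    );
}
```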
Fixes #5048
Part of #5929