
[Sled Agent] Expunged disks are not in use after omicron_physical_disks_ensure #5965

Merged (70 commits) Jul 15, 2024

Conversation

@smklein (Collaborator) commented Jun 27, 2024

omicron_physical_disks_ensure is an API exposed by the Sled Agent which allows Nexus to control the set of
"active" control plane disks.

Although this API was exposed, it did not previously stop the Sled Agent from using expunged disks in all
circumstances. This PR adjusts the endpoint to "flush out" all old usage of disks before returning.

This PR:

  • Ensures dump device management lets go of expunged U.2s
  • Ensures Zone bundles let go of expunged U.2s
  • Removes any network probes allocated with a transient filesystem on an expunged U.2
  • Removes any VMMs allocated with a transient filesystem on an expunged U.2

Fixes #5929

Collaborator Author (@smklein) left a comment:

Hello, all! I've added multiple reviewers to this PR because it's pretty cross-cutting, and I wanted to give a "heads-up" that I'm poking at several different subcomponents of the Sled Agent.

I recommend everyone start reviewing this PR from sled-agent/src/sled_agent.rs, in the omicron_physical_disks_ensure function. That provides a top-down view of the changes in this PR.

Basically:

  • First, we update the set of control plane disks used by this Sled (same as before this PR). This can add or remove disks.
  • Then we ask the following subsystems to "stop using old disks":
      • StorageMonitor / dump devices
      • Zone bundler
      • Probe manager
      • Instance Manager
  • If and only if all those subsystems let go of old disks, we return to the caller (Nexus, presumably) and let them know that we've stopped using expunged devices (sketched roughly below).
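
In rough pseudocode, that ordering looks something like the sketch below; the handle and method names are illustrative stand-ins, not the exact sled-agent APIs.

```rust
// Sketch only: illustrative handle/method names, not the real sled-agent code.
pub async fn omicron_physical_disks_ensure(
    &self,
    config: OmicronPhysicalDisksConfig,
) -> Result<DisksManagementResult, Error> {
    // 1. Update the set of control plane disks (this part predates this PR).
    let result =
        self.storage_manager.omicron_physical_disks_ensure(config).await?;

    // The now-current view of disks; anything not in it must be released.
    let disks = self.storage_manager.get_latest_disks().await;

    // 2. Ask each subsystem to stop using disks that are no longer present.
    self.storage_monitor.await_new_config(&disks).await?;     // dump devices
    self.zone_bundler.use_only_these_disks(&disks).await?;    // zone bundles
    self.probe_manager.use_only_these_disks(&disks).await?;   // probes
    self.instance_manager.use_only_these_disks(disks).await?; // VMMs

    // 3. Only once every subsystem has let go do we report success to Nexus.
    Ok(result)
}
```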

Given that this touches so many different subsystems, I wanted to give folks an appropriate heads-up about "what is happening".

  • @lifning - would you be willing to look at the storage monitor / dump setup?
  • @bnaecker - would you be willing to look at the zone bundler?
  • @rcgoodfellow - would you be willing to look at the probe manager?
  • @hawkw - would you be willing to look at the instance manager?


```rust
for id in to_remove {
    info!(self.log, "only_use_disks: Removing instance"; "id" => ?id);
    self.instances.remove(&id);
}
```
Member:

What actually happens when we remove an instance from this map? What happens to the instance as a result?

Per the comment in sled_agent.rs, it seems like we want to mark as failed any instances which are using expunged disks, but I don't actually see where that happens.

Collaborator Author (@smklein):

Great question - we don't actually explicitly mark the instance failed here; we just stop it from running.

This is basically the same pathway a VMM could use to terminate itself.
This is the object we end up removing via the call to self.instances.remove:

```rust
/// A reference to a single instance running a Propolis server.
pub struct Instance {
    tx: mpsc::Sender<InstanceRequest>,
    #[allow(dead_code)]
    runner_handle: tokio::task::JoinHandle<()>,
}
```

And by dropping it, we remove the only sender-side of an mpsc of InstanceRequests.

This should trigger the self-terminating case of the InstanceRunner:

```rust
None => {
    warn!(self.log, "Instance request channel closed; shutting down");
    self.terminate().await;
    break;
},
```
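
As a self-contained toy illustration of that drop-to-shutdown pattern (not the sled-agent code itself): once the last Sender is dropped, recv() returns None and the runner loop tears itself down.

```rust
use tokio::sync::mpsc;

#[derive(Debug)]
enum InstanceRequest {
    Ping, // stand-in for the real request variants
}

#[tokio::main]
async fn main() {
    let (tx, mut rx) = mpsc::channel::<InstanceRequest>(8);

    let runner = tokio::spawn(async move {
        loop {
            match rx.recv().await {
                Some(req) => println!("handling {req:?}"),
                None => {
                    // The only Sender (held by the Instance handle) was dropped.
                    println!("instance request channel closed; shutting down");
                    // In the real runner, self.terminate().await runs here.
                    break;
                }
            }
        }
    });

    tx.send(InstanceRequest::Ping).await.unwrap();
    drop(tx); // Dropping the last Sender closes the channel.
    runner.await.unwrap();
}
```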

Collaborator Author (@smklein):

This is kinda why I mentioned I'd be relying on the background task -- as implemented, I'd need to wait for a background task to notice that the instance is not running on the sled.

I can introduce an upcall to nexus, but I figured that could throw more of a wrench into the instance lifecycle revamp. Lemme know what you prefer!

Collaborator Author (@smklein) commented Jul 5, 2024:

Coming back to this after a couple of days, I guess it's possible that we return from the omicron_physical_disks_ensure function before the instance runner has finished terminating. I'll explicitly terminate here too, which should ensure the runner is not still processing requests.
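
A rough sketch of what that could look like in the removal loop; the `terminate` call below is a hypothetical stand-in for whatever e360dae actually does, not a quote of it.

```rust
for id in to_remove {
    info!(self.log, "use_only_these_disks: removing instance"; "id" => ?id);
    if let Some(instance) = self.instances.remove(&id) {
        // Explicitly ask the runner to terminate and wait for it, rather
        // than relying only on the request channel closing when `instance`
        // is dropped at the end of this iteration.
        if let Err(e) = instance.terminate().await {
            warn!(
                self.log,
                "use_only_these_disks: failed to terminate instance";
                "id" => ?id,
                "error" => ?e,
            );
        }
    }
}
```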

Collaborator Author (@smklein):

Done in e360dae

Member:

> we don't actually explicitly mark the instance failed here, we just stop it from running.

That's good! Eventually, sled-agent will never mark instances failed. I think we do want to mark the vmm state as failed, though.

Collaborator Author (@smklein):

Ack, thanks for bearing with me. I'm giving this a shot in: f242e0a

Right now, I'm just passing an extra boolean to some of the termination functions to decide what VMM state to propagate. Later, I think it might be useful to expand this to a stringified message too, so we can gain some better diagnostic info on why an instance failed or stopped.
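
Roughly the shape of that change, as a sketch; the parameter and helper names below are illustrative and may not match what f242e0a actually does.

```rust
// Sketch only: the runner picks which terminal VMM state to publish based on
// why it is being torn down.
async fn terminate(&mut self, mark_failed: bool) {
    let terminal_state = if mark_failed {
        // e.g. the backing zpool was expunged out from under the VMM.
        VmmState::Failed
    } else {
        // A normal, requested stop.
        VmmState::Destroyed
    };
    // Hypothetical helpers standing in for the real state-publishing and
    // Propolis zone teardown logic.
    self.publish_vmm_state(terminal_state).await;
    self.tear_down_propolis_zone().await;
}
```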

Contributor (@lifning) left a comment:

changes to StorageMonitor and DumpSetup look good to me! (one nit on a variable name)

```rust
// This is particularly useful for disk expungement, when a caller
// wants to ensure that the dump device is no longer accessing an
// old device.
let mut last_update_complete_tx = None;
```
Contributor:

The prefix `last_` is ever-so-slightly confusing as to its purpose - there's no code path where it isn't `.take()`n later in the same iteration of the loop where it was `= Some()`'d. (Of course, `current_` wouldn't be particularly clear either... perhaps `update_and_archiving_complete_tx` or something of the sort?)

Collaborator Author (@smklein):

How about `evaluation_and_archiving_complete_tx`? The fact that it's optional, IMO, implies that it may or may not be set, but we do respect that contract if it is set within a loop iteration.
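
As a toy, self-contained illustration of the pattern under discussion (not the DumpSetup code itself): the oneshot sender is stashed when an update request arrives and `.take()`n at the end of the same loop pass, so a caller can await one full evaluation/archiving cycle.

```rust
use tokio::sync::{mpsc, oneshot};

enum Request {
    UpdateDisks { completion_tx: oneshot::Sender<()> },
}

async fn worker(mut rx: mpsc::Receiver<Request>) {
    loop {
        let mut evaluation_and_archiving_complete_tx = None;

        match rx.recv().await {
            Some(Request::UpdateDisks { completion_tx }) => {
                evaluation_and_archiving_complete_tx = Some(completion_tx);
                // ... update the set of dump/debug devices here ...
            }
            None => break,
        }

        // ... archive logs / core files for this pass here ...

        // Signal whoever requested the update that the pass has finished.
        if let Some(tx) = evaluation_and_archiving_complete_tx.take() {
            let _ = tx.send(());
        }
    }
}

#[tokio::main]
async fn main() {
    let (tx, rx) = mpsc::channel(8);
    let handle = tokio::spawn(worker(rx));

    let (completion_tx, completion_rx) = oneshot::channel();
    tx.send(Request::UpdateDisks { completion_tx }).await.unwrap();
    completion_rx.await.unwrap(); // Returns once the pass has completed.

    drop(tx);
    handle.await.unwrap();
}
```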

Collaborator Author (@smklein):

Done in 691bc85

Contributor (@rcgoodfellow) left a comment:

LGTM. Took this branch for a spin and ran the probes connectivity test in a4x2; things look good there.

```bash
$ pfexec ../target/debug/commtest \
    --api-timeout 30m \
    http://198.51.100.23 run \
    --ip-pool-begin 198.51.100.40 \
    --ip-pool-end 198.51.100.70 \
    --icmp-loss-tolerance 500 \
    --test-duration 200s \
    --packet-rate 10
the api is up
logging in ... done
project does not exist, creating ... done
default ip pool does not exist, creating ...done
ip range does not exist, creating ... done
getting sled ids ... done
checking if probe0 exists
probe0 does not exist, creating ... done
checking if probe1 exists
probe1 does not exist, creating ... done
checking if probe2 exists
probe2 does not exist, creating ... done
checking if probe3 exists
API call error: Communication Error: error sending request for url (http://198.51.100.23/experimental/v1/probes/probe3?project=classone): connection error: Connection timed out (os error 145), retrying in 3 s
probe3 does not exist, creating ... done
testing connectivity to probes
addr            low     avg     high    last    sent    received  lost
198.51.100.41   0.644   1.186   2.225   1.006   1998    1988      9
198.51.100.40   0.666   1.095   2.556   1.259   1998    1999      0
198.51.100.42   0.691   1.289   5.446   0.867   1998    1997      0
198.51.100.43   0.623   1.089   3.092   1.288   1998    1770      2
all connectivity tests within loss tolerance
```

Collaborator Author (@smklein) commented Jul 5, 2024:

I'm wiring up the command to actually expunge sleds in #5994 , which is built on top of this PR.

The basic plumbing appears to work there, though I'm admittedly not running instances on a4x2.

```diff
@@ -333,7 +333,10 @@ impl InstanceManager {
     ///
     /// This function looks for transient zone filesystem usage on expunged
     /// zpools.
-    pub async fn only_use_disks(&self, disks: AllDisks) -> Result<(), Error> {
+    pub async fn use_only_these_disks(
```
Member:

<3

@smklein mentioned this pull request Jul 11, 2024
Base automatically changed from stop-self-managing-disks to main Jul 15, 2024 20:34
@smklein enabled auto-merge (squash) Jul 15, 2024 20:38
@smklein merged commit ad6c92e into main Jul 15, 2024
19 checks passed
@smklein deleted the physical_disks_ensure_lets_go branch Jul 15, 2024 22:29
smklein added a commit that referenced this pull request Jul 15, 2024
Provides an internal API to remove disks, and wires it into omdb.
Additionally, expands omdb commands for visibility.

- `omdb db physical-disks` can be used to view all "control plane
physical disks". This is similar to, but distinct from, the `omdb db
inventory physical-disks` command, as it reports control plane disks
that have been adopted in the control plane. This command is necessary
for identifying the UUID of the associated control plane object, which
is not observable via inventory.
- `omdb nexus sleds expunge-disk` can be used to expunge a physical disk
from a sled. This relies on many prior patches to operate correctly, but
with the combination of: #5987, #5965, #5931, #5952, #5601, #5599, we
can observe the following behavior: expunging a disk leads to all
"users" of that disk (zone filesystems, datasets, zone bundles, etc)
being removed.

I tested this PR on a4x2 using the following steps:

```bash
# Boot a4x2, confirm the Nexus zone is running
# From g0, zlogin oxz_switch

$ omdb db sleds

SERIAL  IP                             ROLE      POLICY      STATE   ID                                   
 g2      [fd00:1122:3344:103::1]:12345  -         in service  active  29fede5f-37e4-4528-bcf2-f3ee94924894 
 g0      [fd00:1122:3344:101::1]:12345  scrimlet  in service  active  6a2c7019-d055-4256-8bad-042b97aa0e5e 
 g1      [fd00:1122:3344:102::1]:12345  -         in service  active  a611b43e-3995-4cd4-9603-89ca6aca3dc5 
 g3      [fd00:1122:3344:104::1]:12345  scrimlet  in service  active  f62f2cfe-d17b-4bd6-ae64-57e8224d3672

# We'll plan on expunging a disk on g1, and observing the effects.
$ export SLED_ID=a611b43e-3995-4cd4-9603-89ca6aca3dc5
$ export OMDB_SLED_AGENT_URL=http://[fd00:1122:3344:102::1]:12345
$ omdb sled-agent zones list

    "oxz_cockroachdb_b3fecda8-2eb8-4ff3-9cf6-90c94fba7c50"
    "oxz_crucible_19831c98-3137-4af4-a93d-fc1a17c138f2"
    "oxz_crucible_6adcb8ec-6c9e-4e8a-a8d4-bbf9ad44e2c4"
    "oxz_crucible_74b2f587-10ce-4131-97fd-9832c52c8a41"
    "oxz_crucible_9e422508-f4d5-4c24-8dde-0080c0916419"
    "oxz_crucible_a47e9625-d189-4001-877a-cc3aa5b1f3eb"
    "oxz_crucible_pantry_c3b4e3cb-3e23-4f5e-921b-04e4801924fd"
    "oxz_external_dns_7e669b6f-a3fe-47a9-addd-20e42c58b8bb"
    "oxz_internal_dns_1a45a6e8-5b03-4ab4-a3db-e83fb7767767"
    "oxz_ntp_209ad0d0-a5e7-4ab8-ac8f-e99902697b32"
    "oxz_oximeter_864efebb-790f-4b7a-8377-b2c82c87f5b8"

$ omdb db physical-disks | grep $SLED_ID
 ID                                    SERIAL                 VENDOR            MODEL               SLED_ID                               POLICY      STATE
 23524716-a331-4d57-aa71-8bd4dbc916f8  synthetic-serial-g1_0  synthetic-vendor  synthetic-model-U2  a611b43e-3995-4cd4-9603-89ca6aca3dc5  in service  active 
 3ca1812b-55e3-47ed-861f-f667f626c8a0  synthetic-serial-g1_3  synthetic-vendor  synthetic-model-U2  a611b43e-3995-4cd4-9603-89ca6aca3dc5  in service  active 
 40139afb-7076-45d9-84cf-b96eefe7acf8  synthetic-serial-g1_1  synthetic-vendor  synthetic-model-U2  a611b43e-3995-4cd4-9603-89ca6aca3dc5  in service  active 
 5c8e33dd-1230-4214-af78-9be892d9f421  synthetic-serial-g1_4  synthetic-vendor  synthetic-model-U2  a611b43e-3995-4cd4-9603-89ca6aca3dc5  in service  active 
 85780bbf-8e2d-481e-9013-34611572f191  synthetic-serial-g1_2  synthetic-vendor  synthetic-model-U2  a611b43e-3995-4cd4-9603-89ca6aca3dc5  in service  active 

# Let's expunge the "0th" disk here.

$ omdb nexus sleds expunge-disk 23524716-a331-4d57-aa71-8bd4dbc916f8 -w
$ omdb nexus blueprints regenerate -w
$ omdb nexus blueprints show $NEW_BLUEPRINT_ID

# Observe that the new blueprint for the sled expunges some zones -- minimally,
# the Crucible zone -- and no longer lists the "g1_0" disk. This should also be
# summarized in the blueprint metadata comment.

$ omdb nexus blueprints target set $NEW_BLUEPRINT_ID enabled -w
$ omdb sled-agent zones list

zones:
    "oxz_crucible_19831c98-3137-4af4-a93d-fc1a17c138f2"
    "oxz_crucible_74b2f587-10ce-4131-97fd-9832c52c8a41"
    "oxz_crucible_9e422508-f4d5-4c24-8dde-0080c0916419"
    "oxz_crucible_a47e9625-d189-4001-877a-cc3aa5b1f3eb"
    "oxz_crucible_pantry_c3b4e3cb-3e23-4f5e-921b-04e4801924fd"
    "oxz_ntp_209ad0d0-a5e7-4ab8-ac8f-e99902697b32"
    "oxz_oximeter_864efebb-790f-4b7a-8377-b2c82c87f5b8"

# As we can see, the expunged zones have been removed.
# We can also access the sled agent logs from g1 to observe that the expected requests have been sent
# to adjust the set of control plane disks and expunge the expected zones.
```

This is a major part of
#4719
Fixes #5370
Successfully merging this pull request may close these issues:
  • Sled Agent: Ensure that all U.2 users "let go" on disk expungement