
[#5333 3/6] Region snapshot replacement start #6294

Merged
merged 14 commits into oxidecomputer:main from snapshot_replacement_part_3 on Aug 14, 2024

Conversation

@jmpesp (Contributor) commented Aug 12, 2024

This commit adds a "region snapshot replacement start" background task that will:

  1. check for snapshots that are stored on disks that were expunged, and
    insert region snapshot replacement requests for them.

  2. check if there are any region snapshot replacement requests in the
    Requested state and run the new "region snapshot replacement start"
    saga for them. This background task will also pick up manually
    requested region snapshot replacements.

Also in this commit is the "region snapshot replacement start saga", which will transition a region snapshot replacement request from Requested to Allocating and:

  • allocate a new Region (with source set to Some so that a clone occurs)

  • create a blank Volume that will be used to stash the snapshot to delete later

  • swap the snapshot being replaced with the new region that was cloned from the others in the read-only region set

  • update the region snapshot replacement request's state to "ReplacementDone" and clear the operating saga id

This represents the first step to be taken after a snapshot goes away: allocate the replacement, and swap it into the affected snapshot volume. Once this is done, anything that uses the snapshot volume as a read-only parent will no longer reference the expunged snapshot, and will work without any issues.

Existing constructed Volumes running in a propolis or pantry context will remain unmodified: a future commit will contain the saga that takes care of performing the necessary read-only target replacements.
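
For orientation, here is a minimal sketch of the activation flow described above. Every type, field, and function name below (RegionSnapshot, ReplacementRequest, activate, start_saga, and so on) is a hypothetical stand-in, not the actual Nexus datastore, background task, or saga API:

```rust
// Illustrative sketch only: all names are hypothetical stand-ins.

#[derive(Clone, Copy, PartialEq)]
enum RequestState {
    Requested,
    Allocating,
}

struct RegionSnapshot {
    snapshot_id: u64,
    disk_expunged: bool,
}

struct ReplacementRequest {
    id: u64,
    state: RequestState,
}

#[derive(Default)]
struct Status {
    requests_created: Vec<u64>,
    sagas_started: Vec<u64>,
    errors: Vec<String>,
}

/// One activation of the hypothetical background task.
fn activate(
    region_snapshots: &[RegionSnapshot],
    requests: &mut Vec<ReplacementRequest>,
    start_saga: &mut dyn FnMut(u64) -> Result<(), String>,
) -> Status {
    let mut status = Status::default();

    // 1) Insert a replacement request for every region snapshot stored on an
    //    expunged disk.
    for rs in region_snapshots.iter().filter(|rs| rs.disk_expunged) {
        let id = rs.snapshot_id; // stand-in for a real request id
        requests.push(ReplacementRequest { id, state: RequestState::Requested });
        status.requests_created.push(id);
    }

    // 2) Run the "region snapshot replacement start" saga for every request
    //    still in the Requested state, including manually inserted requests.
    //    (In the flow described above the saga itself performs the
    //    Requested -> Allocating transition; this sketch flips the state
    //    inline for brevity.)
    for req in requests.iter_mut().filter(|r| r.state == RequestState::Requested) {
        match start_saga(req.id) {
            Ok(()) => {
                req.state = RequestState::Allocating;
                status.sagas_started.push(req.id);
            }
            Err(e) => status.errors.push(format!(
                "sending region snapshot replacement request {} failed: {e}",
                req.id
            )),
        }
    }

    status
}

fn main() {
    let snapshots = vec![
        RegionSnapshot { snapshot_id: 1, disk_expunged: true },
        RegionSnapshot { snapshot_id: 2, disk_expunged: false },
    ];
    let mut requests = Vec::new();
    let mut start_saga = |id: u64| -> Result<(), String> {
        println!("starting replacement saga for request {id}");
        Ok(())
    };
    let status = activate(&snapshots, &mut requests, &mut start_saga);
    assert_eq!(status.requests_created, vec![1]);
    assert_eq!(status.sagas_started, vec![1]);
}
```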

This commit adds a "region snapshot replacement start" background task
that will:

1) check for snapshots that are stored on disks that were expunged, and
   insert region snapshot replacement requests for them.

2) check if there are any region snapshot replacement requests in the
   `Requested` state and run the new "region snapshot replacement start"
   saga for them. This background task will also pick up manually
   requested region snapshot replacements.

Also in this commit is the "region snapshot replacement start saga",
which will transition a region snapshot replacement request from
`Requested` to `Allocating` and:

- allocate a new Region (with source set to Some so that a clone occurs)

- create a blank Volume that will be used to stash the snapshot to
  delete later

- swap the snapshot being replaced with the new region that was cloned
  from the others in the read-only region set

- update the region snapshot replacement request's state to
  "ReplacementDone" and clearing the operating saga id

This represents the first step to be taken after a snapshot goes away:
allocate the replacement, and swap it in to the affected snapshot
volume. Once this is done, anything that uses the snapshot volume as a
read-only parent will no longer reference the expunged snapshot, and
will work without any issues.

Existing constructed Volumes running in a propolis or pantry context
will remain unmodified: a future commit will contain the saga that takes
care of performing the necessary read-only target replacements.
@jmpesp requested review from andrewjstone and leftwo August 12, 2024 18:01
@andrewjstone (Contributor) left a comment

This looks great @jmpesp. Just minor comments from me.


Err(e) => {
    let s =
        format!("error creating replacement request: {e}");
Contributor

Worth logging info about which region_id and snapshot_id the error occurred on, similar to the Ok case above?

Contributor Author

definitely, that's an oversight on my part. added in fea9ad9


Err(e) => {
    let s =
        format!("error looking up replacement request: {e}");
Contributor

Worth logging information about which snapshot was problematic, similar to below?

Contributor Author

also added in fea9ad9


Err(e) => {
    let s = format!(
        "sending region snapshot replacement request failed: \
Contributor

Log info about the request that failed?

Contributor Author

yep! done in 4a0ae26

assert_eq!(result, RegionSnapshotReplacementStartStatus::default());
assert_eq!(starter.count_reset(), 0);

// Add a region snapshot replacement request for a fake region snapshot
Contributor

Probably a big pain in the ass, but this test could be a touch more complete if it also exercised: create_requests_for_region_snapshots_on_expunged_disks. I believe you'd have to create a fake disk in the datastore, and a region, and then expunge the fake disk.

Contributor Author

👍 ended up being not too bad, see 7132858

Contributor

Oh, nice!

// License, v. 2.0. If a copy of the MPL was not distributed with this
// file, You can obtain one at https://mozilla.org/MPL/2.0/.

//! Background task for detecting region snapshots that need replacing and
Contributor

I love how easy to read this whole RPW is.

Contributor Author

thanks! :)

let alloc_region_params =
    sagactx.lookup::<AllocRegionParams>("alloc_region_params")?;

let (.., db_snapshot) = LookupPath::new(&opctx, &osagactx.datastore())
Contributor

You already look up the snapshot in rsrss_get_alloc_region_params. Why do you look it up again here? Can you instead save the volume_id and snapshot_id in AllocRegionParams?

Contributor Author

I like it - done in f2b9fc7 (soon on the list will be to use more typed uuids...)

// downstairs port. Once this goes away, Nexus will require a way to query
// for the repair port!

let mut repair_addr: SocketAddrV6 =
Contributor

I think it would be clearer to explicitly say source_repair_addr here. This makes the call to ensure_region_in_dataset below more explicit in what it does.

Contributor Author

Agreed, added in a2bda34 along with expanding the comment for ensure_region_in_dataset

osagactx
    .datastore()
    .volume_replace_snapshot(
        VolumeWithTarget(old_volume_id),
Contributor

AFAICT, the only change between this undo and the prior method (as stated in the comment) is this swap in parameters and the log message above. It may be worthwhile to create a helper method that does all the lookups and then only perform this last call in the forward/undo functions. That would make it clearer what is different IMO.

Contributor Author

agreed, done in 94bc3cb
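
A minimal, self-contained sketch of the suggested shape: a single helper does the lookups and performs the swap, and the forward and undo saga actions differ only in the direction they pass. All names here (ReplaceDirection, volume_replace_target, replace_snapshot_in_volume) are illustrative assumptions, not the actual code added in 94bc3cb:

```rust
// Illustrative sketch of a shared forward/undo helper; names are hypothetical.

#[derive(Clone, Copy)]
enum ReplaceDirection {
    Forward,
    Undo,
}

struct Volume {
    id: u64,
    target: u64,
}

/// Stand-in for the datastore call that swaps one target for another.
fn volume_replace_target(volume: &mut Volume, old: u64, new: u64) -> Result<(), String> {
    if volume.target != old {
        return Err(format!("volume {} does not reference target {old}", volume.id));
    }
    volume.target = new;
    Ok(())
}

/// Shared helper: do the lookups once, then swap in the requested direction.
fn replace_snapshot_in_volume(
    volume: &mut Volume,
    expunged_target: u64,
    replacement_target: u64,
    direction: ReplaceDirection,
) -> Result<(), String> {
    let (old, new) = match direction {
        ReplaceDirection::Forward => (expunged_target, replacement_target),
        ReplaceDirection::Undo => (replacement_target, expunged_target),
    };
    // The log line keeps the same "replacing {old} with {new}" shape in both
    // directions, which is what the undo-message discussion below is about.
    println!("replacing {old} with {new} in volume {}", volume.id);
    volume_replace_target(volume, old, new)
}

fn main() {
    let mut volume = Volume { id: 1, target: 100 };
    // Forward action swaps the expunged target for the replacement.
    replace_snapshot_in_volume(&mut volume, 100, 200, ReplaceDirection::Forward).unwrap();
    assert_eq!(volume.target, 200);
    // Undo action puts things back exactly as they were.
    replace_snapshot_in_volume(&mut volume, 100, 200, ReplaceDirection::Undo).unwrap();
    assert_eq!(volume.target, 100);
}
```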

@leftwo (Contributor) left a comment

Thanks for getting this out so quickly. No comments of any substance.

A question: this will replace the downstairs for a snapshot.
However, that snapshot could be a read_only_parent for a bunch of other disks.
Is that handled in a future PR?

//! beginning that process
//!
//! This task's responsibility is to create region snapshot replacement requests
//! when physical disks are expunged, and trigger the region snapshot
Contributor

Do we have to wait for expungement, or can other things trigger a snapshot
replacement (not counting omdb)? I don't think at the moment there exists
anything that would want to. I suspect at some point we may want the ability
to "move things around", and that means moving snapshots (or regions) off
a disk that is not expunged.

Contributor Author

Right, we were talking in the storage huddle and in chat about getting snapshot scrubbing for free once this is in - that would look like inserting region snapshot replacement requests directly. That'll come later after R10 :)

//! 1. Allocate a new region
//!
//! 2. Create a blank volume that can be later deleted to stash the snapshot
//! being replaced. This is filled in the `volume_replace_snapshot`
Contributor

Nit, change "This is filled in the .." to "This is populated in the .."

Contributor Author

done in f89a395

//! request, and change the state to "Allocating". Then, it performs the
//! following steps:
//!
//! 1. Allocate a new region
Contributor

This step includes the "clone" of an existing region, correct?

Contributor Author

It does yeah - later on ensure_region_in_dataset is called with a source repair address.

GET_OLD_SNAPSHOT_VOLUME_ID -> "old_snapshot_volume_id" {
+ rsrss_get_old_snapshot_volume_id
}
CREATE_FAKE_VOLUME -> "unused_3" {
Contributor

Why unused_3 before unused_2?

Contributor Author

because I can't count! 4319c0a

.datastore()
.regions_hard_delete(log, vec![region.id()])
.await?;
}
Contributor

If we don't find a new region and we expect to, should we log a message here?

Contributor Author

added in 80b23e3


// With a list of datasets and regions to ensure, other sagas need to have a
// separate no-op forward step for the undo action to ensure that the undo
// step occurs in the case that the ensure partially fails. Here this not
Contributor

Nit: final sentence: "Here this is not required,.."

Contributor Author

nice, 73c95c6


let Some(region_snapshot) = region_snapshot else {
    return Err(ActionError::action_failed(format!(
        "region snapshot {} {} {} deleted!",
Contributor

Do we know this is deleted? Is that what a None means?

Contributor Author

We do yeah - if the corresponding region_snapshot is missing then that means the volume references went to zero, meaning the crucible resources would have been cleaned up and the record hard deleted.
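
A small sketch of that reasoning (types and names are hypothetical, not the saga's actual code): a None lookup is treated as proof that the record was hard deleted, so the step fails rather than proceeding:

```rust
// Hypothetical sketch: a missing region_snapshot record means its volume
// references dropped to zero and the record was hard deleted.
#[derive(Debug)]
struct RegionSnapshot {
    dataset_id: u64,
    region_id: u64,
    snapshot_id: u64,
}

fn require_region_snapshot(
    lookup: Option<RegionSnapshot>,
) -> Result<RegionSnapshot, String> {
    let Some(region_snapshot) = lookup else {
        // None is definitive here: the record was hard deleted, not merely
        // "not yet visible".
        return Err("region snapshot deleted!".to_string());
    };
    Ok(region_snapshot)
}

fn main() {
    assert!(require_region_snapshot(None).is_err());
    let rs = RegionSnapshot { dataset_id: 1, region_id: 2, snapshot_id: 3 };
    println!("{:?}", require_region_snapshot(Some(rs)));
}
```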


info!(
    log,
    "undo: replacing {} with {} in volume {}",
Contributor

At this point is "old" from the broken region we were fixing? Or have we swapped
old/new here? I'm not sure if this log is printing what I think it should be, which is
opposite from the srss_replace_snapshot_in_volume().

Or, wait, by saying undo at the top, you mean we are undo-ing what the previous
step did, which is putting back things like they were. I guess I find this a little
confusing, but if all the undo messages are like this, I'll just live with it.

Contributor Author

The old and new are not swapped in this log message, whereas they are in the volume_replace_snapshot arguments. You're right: adding undo: is meant to signify that it's undoing the forward action. The intention is that someone can search for the text "replacing UUID with UUID in volume UUID" and get two results.

I put a small comment in 1b97525 explaining this intention.

@jmpesp enabled auto-merge (squash) August 14, 2024 16:42
@jmpesp merged commit 823f8ab into oxidecomputer:main Aug 14, 2024
18 checks passed
@jmpesp deleted the snapshot_replacement_part_3 branch August 14, 2024 17:53
davepacheco added a commit that referenced this pull request Aug 15, 2024
davepacheco added a commit that referenced this pull request Aug 15, 2024