
Handle region snapshot replacement volume deletes #7046

Conversation

@jmpesp (Contributor) commented Nov 12, 2024

Volumes can be deleted at any time, but the tasks and sagas that perform region snapshot replacement did not account for this. This commit adds checks in a few places for whether a volume is soft-deleted or hard-deleted, and bails out of any affected region snapshot replacement accordingly (a rough sketch follows the list):

  • if a volume that has the region snapshot being replaced is soft-deleted, then skip making a region snapshot replacement step for it

  • if a region snapshot replacement step has the volume deleted after the step was created, transition it directly to the VolumeDeleted state

  • if a region snapshot replacement step has the volume deleted during the saga invocation, then skip notifying any Upstairs and allow the saga to transition the request to Complete, where the associated clean up can proceed
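
A rough sketch of the shape of these checks, assuming hypothetical helper names (`volume_get`, `set_step_volume_deleted`, `start_region_snapshot_replacement_step_saga`) rather than the exact Nexus datastore API:

```rust
// Hypothetical sketch only: helper names and signatures are assumptions,
// not the actual omicron datastore API. The intent is the shape of the
// "bail out if the volume is gone" checks described above.
match datastore.volume_get(step.volume_id).await? {
    // hard-deleted: the volume record is gone, nothing to replace into
    None => {
        datastore.set_step_volume_deleted(step.id).await?;
    }

    // soft-deleted: the volume is being torn down, so skip this step
    Some(volume) if volume.time_deleted.is_some() => {
        datastore.set_step_volume_deleted(step.id).await?;
    }

    // still live: run the region snapshot replacement step saga
    Some(_volume) => {
        start_region_snapshot_replacement_step_saga(step).await?;
    }
}
```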

An interesting race condition emerged during unit testing: the read-only region allocated to replace a region snapshot would be swapped into the snapshot volume, but would be susceptible to being deleted by the user, and therefore unable to be swapped into other volumes that have that snapshot volume as a read-only parent.

This requires an additional volume that uses that read-only region in order to bump the reference count associated with that region, so that the user cannot delete it before it has been used to replace all other uses of the region snapshot it was meant to replace.

This additional volume lives as long as the region snapshot replacement, and therefore needs to be deleted when the region snapshot replacement is finished. This required a new region snapshot replacement finish saga, which in turn required a new "Completing" state to perform the same type of state-based lock on the replacement request that is done for all the other sagas.

Testing also revealed that there were scenarios where find_deleted_volume_regions would return volumes for hard-deletion prematurely. The function now returns a struct instead of a list of tuples, and in that struct, regions freed for deletion are now distinct from volumes freed for deletion. Volumes are now only returned for hard-deletion when all associated read/write regions have been (or are going to be) deleted.
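
A minimal sketch of the reshaped return value, assuming illustrative type and field names (the PR's actual names may differ):

```rust
use uuid::Uuid;

/// Sketch only: what `find_deleted_volume_regions` conceptually returns
/// after this change, instead of a flat list of tuples.
pub struct FreedCrucibleResources {
    /// Read/write regions whose backing storage can be cleaned up now.
    pub regions_to_delete: Vec<Uuid>,

    /// Volume records that can be hard-deleted, returned only once every
    /// associated read/write region has been (or is about to be) deleted.
    pub volumes_to_hard_delete: Vec<Uuid>,
}
```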

Fixes #6353

@jmpesp removed the request for review from smklein November 18, 2024 19:46
@jmpesp requested a review from gjcolombo November 22, 2024 16:38
@gjcolombo (Contributor) commented:

> An interesting race condition emerged during unit testing: the read-only region allocated to replace a region snapshot would be swapped into the snapshot volume, but would be susceptible to being deleted by the user, and therefore unable to be swapped into other volumes that have that snapshot volume as a read-only parent.
>
> This requires an additional volume that uses that read-only region in order to bump the reference count associated with that region, so that the user cannot delete it before it has been used to replace all other uses of the region snapshot it was meant to replace.

I'm not sure I'm following this--what is the thing the user can delete that causes the region reference count(?) to reach zero? My understanding is/was that regions aren't user-addressable in the API, but the disks/volumes and snapshots that depend on them are. Is the thing being deleted the snapshot that's now having a region replaced?

@jmpesp (Contributor, Author) commented Nov 22, 2024

> Is the thing being deleted the snapshot that's now having a region replaced?

Yep! The user's able to delete the snapshot (and therefore the snapshot volume) at any time. Here's the race:

  1. region snapshot replacement starts, targeting one of the three region snapshots in a snapshot volume:

     targets: [
       RS1,
       RS2, <- replace this one
       RS3
     ]

  2. a read-only region is cloned from either RS1 or RS3

  3. that read-only region is swapped into the snapshot volume:

     targets: [
       RS1,
       read-only region,
       RS3
     ]

  4. if this was the only volume that read-only region was a part of, then its reference count would be 1. if the user deletes the snapshot now, the reference count would move to zero.

  5. the fix is to create an additional volume to reference that read-only region before swapping it into the snapshot volume (see the sketch after this list):

     "new region" volume:
     targets: [
       read-only region
     ]

     snapshot volume:
     targets: [
       RS1,
       RS2, <- replace this one
       RS3
     ]

  6. now the reference count for the read-only region is 1 before the swap takes place, and cannot move to zero unless the "new region" volume is deleted
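
A sketch of the ordering the fix enforces. `volume_replace_snapshot` is referenced elsewhere in this review; every other helper name below is an illustrative assumption, not the exact Nexus API:

```rust
// Sketch only: the point is the ordering — hold a reference to the new
// read-only region *before* swapping it into the snapshot volume.

// 1. clone the new read-only region from a healthy source (RS1 or RS3)
let new_region = clone_read_only_region(healthy_region_snapshot).await?;

// 2. create the "new region" volume whose only target is that region,
//    bumping its reference count to 1
let holder_volume_id = create_reference_holder_volume(&new_region).await?;

// 3. now swap the new region into the snapshot volume; even if the user
//    deletes the snapshot immediately afterwards, the holder volume keeps
//    the region referenced until every other use of RS2 has been replaced
volume_replace_snapshot(snapshot_volume_id, rs2_target, &new_region).await?;

// 4. the holder volume (the bookkeeping reference) is deleted later by the
//    region snapshot replacement finish saga
delete_volume(holder_volume_id).await?;
```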

@gjcolombo (Contributor) commented:

> if this was the only volume that read-only region was a part of, then its reference count would be 1. if the user deletes the snapshot now, the reference count would move to zero.

That seems... correct? Insofar as there really are no volumes referring to the region in this case. It sounds like the deal here is that

  • the region-replacement procedure needs to hold a reference to the region it's swapping in
  • region references are held by volume records; there is no other way to prevent a region from being GC'ed; ergo
  • the replacement procedure needs to create a fake volume to keep its new region alive until the procedure ends (even if the replacement-target volume is deleted midstream)

Is that the right mental model?

@jmpesp (Contributor, Author) commented Nov 22, 2024

> > if this was the only volume that read-only region was a part of, then its reference count would be 1. if the user deletes the snapshot now, the reference count would move to zero.
>
> That seems... correct? Insofar as there really are no volumes referring to the region in this case. It sounds like the deal here is that
>
> * the region-replacement procedure needs to hold a reference to the region it's swapping in
>
> * region references are held by volume records; there is no other way to prevent a region from being GC'ed; ergo
>
> * the replacement procedure needs to create a fake volume to keep its new region alive until the procedure ends (even if the replacement-target volume is deleted midstream)
>
> Is that the right mental model?

You're not wrong: it would be correct if the only use of RS2 was the snapshot volume, but every disk (or image) that was created using that snapshot as a block source will contain (in the read-only parent) a copy of the snapshot volume, and therefore a reference to RS2 (from the previous example). The new read-only region is meant to replace every copy of RS2 in any volume that references it, so it has to stick around until all those references are replaced.

@gjcolombo (Contributor) left a comment:

Overall I think the direction here makes sense. I have a few general clarifying questions/comments, but don't see any substantial synchronization issues.

@@ -133,6 +140,12 @@ pub struct RegionSnapshotReplacement {
pub replacement_state: RegionSnapshotReplacementState,

pub operating_saga_id: Option<Uuid>,

Contributor:

Not related to this PR, but a doc comment explaining the use of this field would be a welcome addition for slow-on-the-uptake readers like me :)

Comment on lines 412 to 413
// Check if the volume was deleted _after_ the replacement step was
// created.

Contributor:

Is this just an optimization? I don't see anything that'd prevent a volume from being deleted between the check on line 416 and the saga execution on line 478. I don't think that'll break anything, though, because volume_replace_snapshot will bail if the volume is deleted.

Assuming this is right I'd leave a comment noting that this is just here to avoid the (not-inconsiderable!) expense of having to run a saga that we know isn't going to do anything.

@jmpesp (author):

You're correct, this is just an optimization - the region_snapshot_replacement_step saga's volume_replace_snapshot will detect if the volume is deleted and do the right thing. Added comments in 9fc46ab

Comment on lines 84 to 89
///      Running        |
///                     | set in region snapshot replacement
///         |           | finish background task
///         |
///         |           |
///         v           |
///                     | responsibility of region snapshot
///      Completing     | replacement finish saga

Contributor:

nit: there's another version of this diagram (in the saga comments, I think) that has a back-edge from completing to running, though I think this is only if it unwinds, and I'm not sure if you're including those edges here

@jmpesp (author):

not a nit, that's important! added in 2cd7881

Comment on lines +517 to +520
// Create a volume to inflate the reference count of the newly created
// read-only region. If this is not done it's possible that a user could
// delete the snapshot volume _after_ the new read-only region was swapped
// in, removing the last reference to it and causing garbage collection.

Contributor:

Just for my understanding: What prevents the region from being garbage-collected when it's just been ensured but hasn't been linked into anything yet? My guess is that completely orphaned regions don't get GC'ed and that possibly-unused regions are found by looking for them in soft-deleted volumes. Is that correct?

@jmpesp (author):

Kinda yeah:

  • for read-only regions, they're only returned from soft_delete_volume for deletion when they appear in the volume being soft-deleted.
  • it's similar for read/write regions, but an additional way they're returned for GC is from find_deleted_volume_regions. That function joins the regions table with the volume table and then only operates on soft-deleted volumes. The relevant section is:
            // only operate on soft deleted volumes
            let soft_deleted = match &volume {
                Some(volume) => volume.time_deleted.is_some(),
                None => false,
            };

            if !soft_deleted {
                continue;
            }

If the region's volume isn't inserted yet, then volume will be None here.

@@ -423,6 +424,21 @@ async fn rsrss_notify_upstairs(
let params = sagactx.saga_params::<Params>()?;
let log = sagactx.user_data().log();

// If the associated volume was deleted, then skip this notification step.
// It's likely there is no Upstairs to talk to, but continue with the saga

Contributor:

"Likely"? :)

I wonder if it'd be worthwhile to have a #[cfg(debug_assertions)] block here in which we assert that if the volume is deleted, then either it has no disk or the disk is detached.

@jmpesp (author):

a78cabf reworks the comment.

I tried your suggestion by adding a `validate_higher_level_resource_deleted` function to the `validate_volume_invariants` check, but it returned an Err during the `disk_create` saga unwind. I'm not sure how to solve this, or even if it can be solved...?

Contributor:

A high-level comment: my understanding of the underlying issue is that it involves sagas, some initiated by user requests and some by background tasks, racing with each other and creating havoc. Is that correct? If it is, do the new tests in this module fail deterministically without the new fixes in place, or do we need races between the sagas they start to break the right way?

@jmpesp (author):

> A high-level comment: my understanding of the underlying issue is that it involves sagas, some initiated by user requests and some by background tasks, racing with each other and creating havoc. Is that correct?

Yes it is!

> do the new tests in this module fail deterministically without the new fixes in place

test_racing_replacements_for_soft_deleted_disk_volume does fail without the fixes in place - I wrote it before writing the fixes! :) I'm not sure which fix(es) are required for that test to pass, or even if it's exclusively fixed by this PR or needs the previous related ones, but it was the intention of that test to first reproduce it, then fix it.

> do we need races between the sagas

More extensive testing would be possible with a testing harness like app::sagas::test_helpers::action_failure_can_unwind_idempotently where, instead of injecting an error, it would run an arbitrary function, which in this case could be a whole snapshot delete. No such functionality exists, so manual testing is required.

@leftwo (Contributor) left a comment:

This is a ton of great work.

nexus/db-queries/src/db/datastore/volume.rs (comment resolved)
nexus/db-model/src/schema.rs (comment resolved)
nexus/db-queries/src/db/datastore/volume.rs (comment resolved)
log,
"{s}";
"request id" => ?request.id,
"volume id" => ?volume.id(),
);
status.errors.push(s);

Contributor:

Our list of regions to look at is built with get_running_region_snapshot_replacements.
That is only good while the region snapshot replacement is actually running, correct?

If so, then I believe any "missed" volume due to repair lock or other errors, must be re-searched for and handled before the running region snapshot replacement job finishes.

Is that all correct? And, could we miss the window here where we don't catch it while the replacement is running and then it becomes an orphan?

Contributor:

I was going to ask a related question about this, but I think I managed instead to convince myself this is OK. I think the way this is meant to work is that

  • the region_snapshot table row for a given snapshot is garbage collected once no more volumes refer to it
  • if there is a snapshot whose dataset is on an expunged sled, the region-snapshot-replace-start background task will try to create a new replacement request for it if none already exists

In the case I think you have in mind, the following would happen:

  • the call to find_volumes_referencing_socket_addr on line 263 will find all the volumes that refer to the snapshot of interest; let's assume there's just one of these
  • assume the call to create_region_snapshot_replacement_step above (line 335) fails because the volume is already locked
  • the region-snapshot-replace-finish task will call in_progress_region_snapshot_replacement_steps and find there are no replacement steps being resolved; this allows the finish task to try to mark the replacement as having Completed
  • however, there's still a volume referring to the snapshot, which keeps it from being deleted; this means a subsequent run of the replace-start task will start another replacement cycle for it

This could go on forever if the region snapshot replacement never manages to acquire the volume lock. I'm not sure there's an easy way to prevent that (we'd have to make the lock fair and I haven't thought of a straightforward way to do that). But absent that kind of persistent unfairness, some replacement attempt or another should eventually get the volume lock and do this work.

@jmpesp does this check out? The other question I had was whether it was possible for an ill-timed "finish" task to prematurely decide that a Running replacement was finished (because it happened to run and evaluate the replacement before the corresponding "create steps" background task got to create any steps for it). Is this possible or is it foreclosed upon by the replacement state machine?

@jmpesp (author):

> the region-snapshot-replace-finish task will call in_progress_region_snapshot_replacement_steps and find there are no replacement steps being resolved; this allows the finish task to try to mark the replacement as having Completed

The only way a region snapshot request transitions out of the Running state (via the finish saga) is if

  1. there are no in-progress region snapshot replacement steps, and
  2. the request's associated region snapshot record was deleted.

There shouldn't be a way to miss volumes: if there are any volumes referencing the region snapshot, then the record will not have been deleted. In the case that (I think) Alan's referring to, we don't create a region snapshot replacement step for a volume due to an existing lock, but the request will not be finished because of #2 above.

As well, if the request is in the Running state, then the snapshot volume's already had a successful replacement performed, meaning no new volumes that reference the region snapshot can be created, meaning there can't be a race between something creating a new reference (by copying the snapshot volume as a read-only parent) and deleting the last reference.
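
A sketch of the finish condition described above. `in_progress_region_snapshot_replacement_steps` is named in this discussion; the other names and signatures are assumptions, not the exact datastore API:

```rust
// Sketch only: a Running request is only handed to the finish saga when
// both conditions below hold.
let in_progress_steps = datastore
    .in_progress_region_snapshot_replacement_steps(opctx, request.id)
    .await?;

// has the region snapshot record being replaced been deleted, i.e. does
// nothing reference it any more? (hypothetical helper name)
let region_snapshot_deleted =
    datastore.region_snapshot_for_request(&request).await?.is_none();

if in_progress_steps == 0 && region_snapshot_deleted {
    // run the region snapshot replacement finish saga, which moves the
    // request Running -> Completing -> Complete and deletes the
    // bookkeeping "new region" volume along the way
    start_region_snapshot_replacement_finish_saga(opctx, request).await?;
}
```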

Contributor:

I misread the comment on line 107 of region_snapshot_replacement_finish.rs (it refers to what happens if the conditional branch on line 131 of that file is taken and not to what has happened by passing the condition on line 106). This makes more sense now!

nexus/tests/integration_tests/crucible_replacements.rs (outdated; comment resolved)

@gjcolombo (Contributor) left a comment:

I have a few more general remarks/questions from the newest commits. I still don't see anything that's blocking, but (as you know) there's a ton going on here and I'm not 100% sure I haven't overlooked or misunderstood something.

I don't want to hold up the PR just for this, but I'd be delighted to have something in the docs directory that explains the overall theory of how these kinds of replacements are meant to work. (A lot of this information is in the various sagas' and tasks' module comments but having a single narrative doc to refer to would be great.)

nexus/tests/integration_tests/crucible_replacements.rs (outdated; comment resolved)
Comment on lines +741 to +747
Err(err.bail(Error::conflict(format!(
    "region snapshot replacement {} set to {:?} \
    (operating saga id {:?})",
    region_snapshot_replacement_id,
    record.replacement_state,
    record.operating_saga_id,
))))

Contributor:

This is unexpected, correct? (It means either that the operating saga ID was wrong, or the caller called this on a replacement that wasn't Completing.)

@jmpesp (author):

It's unexpected, yeah. Even in the case where the saga node is rerun the state should be set to Complete already and the saga shouldn't unwind.

Comment on lines +107 to +108
// An associated volume repair record isn't _strictly_
// needed: snapshot volumes should never be directly

Contributor:

I follow the bit about snapshots not being directly accessed by an upstairs, but I thought the repair record was still needed for mutual exclusion?

@jmpesp (author):

It's not strictly speaking necessary - many replacements could occur on the snapshot volume at the same time, and because it's never constructed there wouldn't be any repair operation required.

Contributor:

They wouldn't contend on the snapshot volume's database record?

@jmpesp (author):

They would only contend around the volume repair record. If there was no lock for the snapshot volume, then the individual replacement transactions could all fire in whatever order they're going to serialize in, and it would probably work.


task, insert into the `requests_completed_ok` vec!

also, add a test for this, which exposed another bug: the continue that
would have skipped starting the start saga was in the wrong place.

@gjcolombo (Contributor) left a comment:

I've looked over the commits individually (and asked some basic questions in DMs) and I think I grok what's going on here pretty thoroughly now. Thanks @jmpesp for all the back-and-forth on this one!

@leftwo self-requested a review December 19, 2024 01:37

@leftwo (Contributor) left a comment:

Just a few final bikeshed comments, great to see this getting done here.

@jmpesp enabled auto-merge (squash) December 19, 2024 04:26
@jmpesp merged commit 09b150f into oxidecomputer:main Dec 19, 2024
16 checks passed
@jmpesp deleted the region_snapshot_replacement_account_for_deleted_volumes branch December 19, 2024 15:05