
Fix for deleted volumes during region replacement #6659

Conversation

@jmpesp jmpesp commented Sep 24, 2024

Volumes can be deleted at any time, but the tasks and sagas that perform region replacement did not account for this. This commit adds checks in a few places for whether a volume is soft-deleted or hard-deleted, and bails out of any affected region replacement accordingly:

  • If the replacement request is in the Requested state and the volume was seen to be soft-deleted or hard-deleted in the "region replacement" background task, then transition the region replacement request to Complete.

  • If the replacement request is in the Running state and the volume was seen to be soft-deleted or hard-deleted in the region replacement drive saga, then skip any operations on that volume in that saga and allow the saga to transition the region replacement request to ReplacementDone. The rest of the region replacement machinery will later transition the request to Complete and clean up resources as appropriate.

Testing this required fleshing out the simulated Crucible Pantry with support for the new endpoints that the region replacement drive saga queries. Full parity is left for future work; the endpoints required for it were left in but commented out.

This commit was peeled off work in progress to address #6353.
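
To make those two transitions concrete, here is a minimal, self-contained sketch of the intended behavior. The types and the `step` function are illustrative stand-ins for this description only, not the actual Nexus datastore or saga code; in particular, the real checks query the database for the volume's deletion state rather than reading a boolean field.

```rust
/// Simplified stand-ins for the request states named above; not the actual
/// Nexus types.
#[derive(Debug, PartialEq)]
enum ReplacementState {
    Requested,
    Running,
    ReplacementDone,
    Complete,
}

struct ReplacementRequest {
    state: ReplacementState,
    /// Result of a soft-delete or hard-delete check against the volume.
    volume_deleted: bool,
}

/// Apply the two rules from the description to a single request.
fn step(request: &mut ReplacementRequest) {
    if !request.volume_deleted {
        return;
    }
    match request.state {
        // Background task case: nothing was started yet, so just close the
        // request out.
        ReplacementState::Requested => {
            request.state = ReplacementState::Complete;
        }
        // Drive saga case: skip operations on the deleted volume and mark the
        // drive phase done; later machinery moves the request to Complete and
        // cleans up.
        ReplacementState::Running => {
            request.state = ReplacementState::ReplacementDone;
        }
        _ => {}
    }
}

fn main() {
    let mut requested = ReplacementRequest {
        state: ReplacementState::Requested,
        volume_deleted: true,
    };
    step(&mut requested);
    assert_eq!(requested.state, ReplacementState::Complete);

    let mut running = ReplacementRequest {
        state: ReplacementState::Running,
        volume_deleted: true,
    };
    step(&mut running);
    assert_eq!(running.state, ReplacementState::ReplacementDone);
}
```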

@leftwo leftwo left a comment

As usual, mostly questions from me :)
How much of a simulation does the simulated pantry actually do?

// sending the start request and instead transition the request
// to completed

let volume_deleted = match self
Contributor:

Because we have mut self here, does that mean that this volume_deleted state
will be guaranteed not to change while this method is running?

Contributor Author:

Unfortunately no - this queries the database, and the result could change immediately after the query.

If it does change to deleted in the middle of the saga, then that could be a problem, but both volume_replace_region and volume_replace_snapshot check whether the volume was hard-deleted during the transaction... and a38b751 updates them to also check whether it is soft-deleted!
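
For intuition, a minimal self-contained sketch of that kind of guard follows. The VolumeRecord type and replace_in_volume function are hypothetical and only model the idea of re-checking deletion state inside the same transaction that performs the swap; they are not the actual volume_replace_region / volume_replace_snapshot code.

```rust
use std::collections::HashMap;

/// Stand-in volume record: `time_deleted` being Some(_) models a soft-delete,
/// and absence from the map models a hard-delete.
struct VolumeRecord {
    data: String,
    time_deleted: Option<u64>,
}

#[derive(Debug, PartialEq)]
enum ReplaceResult {
    Done,
    VolumeHardDeleted,
    VolumeSoftDeleted,
}

/// Stand-in for a transactional replace-style operation: the deletion state is
/// re-read here, inside the same "transaction" that performs the swap, rather
/// than trusting an earlier check.
fn replace_in_volume(
    volumes: &mut HashMap<u64, VolumeRecord>,
    volume_id: u64,
    new_data: String,
) -> ReplaceResult {
    match volumes.get_mut(&volume_id) {
        None => ReplaceResult::VolumeHardDeleted,
        Some(rec) if rec.time_deleted.is_some() => ReplaceResult::VolumeSoftDeleted,
        Some(rec) => {
            rec.data = new_data;
            ReplaceResult::Done
        }
    }
}

fn main() {
    let mut volumes = HashMap::new();
    volumes.insert(
        1,
        VolumeRecord { data: "old".into(), time_deleted: Some(42) },
    );

    // The soft-deleted volume makes the replacement bail out instead of
    // rewriting the volume.
    assert_eq!(
        replace_in_volume(&mut volumes, 1, "new".into()),
        ReplaceResult::VolumeSoftDeleted
    );
}
```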

Contributor:

I just want to be sure that we can handle a delete happening, really, anywhere during this saga. If we can handle that, either by failing the replacement or handling it at the end, then I'm good :)

Contributor Author:

Yeah, the saga will unwind. There's more work to do in a related follow-up commit though: I found a case where the start saga will do some extra unnecessary work allocating regions if there's a hard delete of the volume in the middle of its execution.

Resolved review threads:
- nexus/src/app/background/tasks/region_replacement.rs
- nexus/db-queries/src/db/datastore/volume.rs
- nexus/test-utils/src/background.rs
- nexus/tests/integration_tests/crucible_replacements.rs
- sled-agent/src/sim/http_entrypoints_pantry.rs
@leftwo leftwo left a comment

Just the one question remaining, but good to go

@andrewjstone andrewjstone left a comment

Looks good. Thanks for all the tests!

@@ -600,16 +600,17 @@ task: "physical_disk_adoption"
last completion reported error: task disabled

task: "region_replacement"
configured period: every <REDACTED_DURATION>s
Contributor:

Why did these get unredacted, and are they going to cause test failures in the future?

Contributor Author:

They got unredacted because there wasn't any redaction for "every N minutes", which I've now added in 66e6678.
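
For illustration, a redaction of that shape could look roughly like the following sketch (assuming the regex crate; this is not the actual omicron redaction code, and the pattern is an assumption):

```rust
// Illustrative only: normalize "every N minutes" / "every Ns" style periods in
// captured task output so they do not churn the expected test output.
use regex::Regex;

fn redact_durations(input: &str) -> String {
    // Assumed pattern: "every" followed by a number and a unit.
    let re = Regex::new(r"every \d+(\.\d+)?\s*(minutes?|seconds?|m|s)")
        .expect("valid regex");
    re.replace_all(input, "every <REDACTED_DURATION>").to_string()
}

fn main() {
    assert_eq!(
        redact_durations("configured period: every 30m"),
        "configured period: every <REDACTED_DURATION>"
    );
    assert_eq!(
        redact_durations("configured period: every 3 minutes"),
        "configured period: every <REDACTED_DURATION>"
    );
}
```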

api.register(attach)?;
api.register(attach_activate_background)?;
// api.register(replace)?;
Contributor:

I know it's in the commit message, but it would probably be useful to also have a comment here about why this is commented out.

Contributor Author:

I took them out in 08786cc; I now think this just clutters up the function.

@jmpesp jmpesp commented Oct 1, 2024

@andrewjstone @leftwo debugging the test flakes in CI revealed that the tests (a) were not waiting for the background task invocations to complete, and (b) were not waiting for the sagas to transition the replacement requests. The slowness of the GitHub runners has helped here :)

I've put the appropriate fixes in and also refactored the new tests to use a common test harness. I'd like a re-review of those two commits please, and then we can ship this thing :)
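
The general shape of that kind of wait, as a minimal sketch using tokio (this is not the actual omicron wait_for_condition helper; the wait_until function and its timings are hypothetical):

```rust
// Poll a condition until it holds or a timeout elapses, so the test does not
// race the background task or the saga.
use std::future::Future;
use std::time::Duration;
use tokio::time::{sleep, timeout};

async fn wait_until<F, Fut>(mut condition: F, max: Duration) -> Result<(), &'static str>
where
    F: FnMut() -> Fut,
    Fut: Future<Output = bool>,
{
    timeout(max, async {
        loop {
            if condition().await {
                return;
            }
            // Poll interval is arbitrary for this sketch.
            sleep(Duration::from_millis(50)).await;
        }
    })
    .await
    .map_err(|_| "condition not met before timeout")
}

#[tokio::main]
async fn main() {
    // In a real test the closure would, e.g., re-read the replacement request
    // and check that it has reached the expected state.
    let result = wait_until(|| async { true }, Duration::from_secs(30)).await;
    assert!(result.is_ok());
}
```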

use nexus_client::types::LastResult;
use nexus_client::types::LastResultCompleted;
use nexus_types::internal_api::background::*;
use omicron_test_utils::dev::poll::{wait_for_condition, CondCheckError};
use std::time::Duration;

/// Return the most recent start time for a background task
fn most_recent_start_time(
/// Return the most recent activate time for a background task, returning None
Contributor:

Is my understanding correct that this is returning:

The activate time for the last completed background task.
Returning None if the task is currently running, or has never run.

You only get a Some here if the task has completed at least once and is not currently running?

Contributor Author:

Correct, yeah.
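
In other words, roughly the following, as a self-contained sketch with simplified stand-in types rather than the actual nexus_client types:

```rust
/// Simplified stand-ins for the background-task status types.
enum CurrentStatus {
    Idle,
    Running,
}

enum LastResult {
    NeverCompleted,
    /// Activation time of the last completed run (seconds, for illustration).
    Completed { activate_time: u64 },
}

/// You only get Some(_) when the task is idle and has completed at least once.
fn most_recent_activate_time(
    current: &CurrentStatus,
    last: &LastResult,
) -> Option<u64> {
    match (current, last) {
        // Currently running: the in-flight activation has not completed yet.
        (CurrentStatus::Running, _) => None,
        // Idle and completed before: report that activation's time.
        (CurrentStatus::Idle, LastResult::Completed { activate_time }) => {
            Some(*activate_time)
        }
        // Idle but never completed: nothing to report.
        (CurrentStatus::Idle, LastResult::NeverCompleted) => None,
    }
}

fn main() {
    assert_eq!(
        most_recent_activate_time(
            &CurrentStatus::Idle,
            &LastResult::Completed { activate_time: 7 },
        ),
        Some(7)
    );
    assert_eq!(
        most_recent_activate_time(&CurrentStatus::Running, &LastResult::NeverCompleted),
        None
    );
}
```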

// that the LastResult is NeverCompleted? the Some in
// the second part of the tuple means this ran before,
// so panic here.
panic!("task is idle, but there's no activate time?!");
Contributor:

We panic here because this should not be possible, right?

Is there any chance that task.current could change between when we match on it on line 73 and when we make the call to most_recent_activate_time()?

Contributor Author:

The state of the background task could change, yeah, but we'll pick that up when it gets re-polled.

The state of the world in that part of the match shouldn't ever be possible, no, so I think it's appropriate to panic there.

@leftwo leftwo left a comment

I almost commented before that it seemed like there was a bunch of test code that looked pretty similar between the tests. I'm glad instead of me having to make that comment, you instead read my mind and did what I wanted. Please continue to do that.

@jmpesp jmpesp merged commit 1b82134 into oxidecomputer:main Oct 2, 2024
16 checks passed
@jmpesp jmpesp deleted the region_replacement_account_for_deleted_volumes branch October 2, 2024 14:16
hawkw pushed a commit that referenced this pull request Oct 2, 2024