
Fix for deleted volumes during region replacement #6659

Conversation

@jmpesp jmpesp commented Sep 24, 2024

Volumes can be deleted at any time, but the tasks and sagas that perform region replacement did not account for this. This commit adds checks in a few places for whether a volume is soft-deleted or hard-deleted, and bails out of any affected region replacement accordingly:

  • If the replacement request is in the Requested state and the volume was seen to be soft-deleted or hard-deleted in the "region replacement" background task, then transition the region replacement request to Complete.

  • If the replacement request is in the Running state and the volume was seen to be soft-deleted or hard-deleted in the region replacement drive saga, then skip any operations on that volume in that saga and allow the saga to transition the region replacement request to ReplacementDone. The rest of the region replacement machinery will later transition the request to Complete and clean up resources as appropriate.

Testing this required fleshing out the simulated Crucible Pantry with support for the new endpoints that the region replacement drive saga queries. Full parity is left for future work; the endpoints required for it were left in but commented out.

This commit was peeled off work in progress to address #6353.
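
To make those two transitions concrete, here is a minimal, self-contained sketch of the intended behavior. The types and the `step` function are illustrative stand-ins for this description only, not the actual Nexus datastore or saga code; in particular, the real checks query the database for the volume's deletion state rather than reading a boolean field.

```rust
/// Simplified stand-ins for the request states named above; not the actual
/// Nexus types.
#[derive(Debug, PartialEq)]
enum ReplacementState {
    Requested,
    Running,
    ReplacementDone,
    Complete,
}

struct ReplacementRequest {
    state: ReplacementState,
    /// Result of a soft-delete or hard-delete check against the volume.
    volume_deleted: bool,
}

/// Apply the two rules from the description to a single request.
fn step(request: &mut ReplacementRequest) {
    if !request.volume_deleted {
        return;
    }
    match request.state {
        // Background task case: nothing was started yet, so just close the
        // request out.
        ReplacementState::Requested => {
            request.state = ReplacementState::Complete;
        }
        // Drive saga case: skip operations on the deleted volume and mark the
        // drive phase done; later machinery moves the request to Complete and
        // cleans up.
        ReplacementState::Running => {
            request.state = ReplacementState::ReplacementDone;
        }
        _ => {}
    }
}

fn main() {
    let mut requested = ReplacementRequest {
        state: ReplacementState::Requested,
        volume_deleted: true,
    };
    step(&mut requested);
    assert_eq!(requested.state, ReplacementState::Complete);

    let mut running = ReplacementRequest {
        state: ReplacementState::Running,
        volume_deleted: true,
    };
    step(&mut running);
    assert_eq!(running.state, ReplacementState::ReplacementDone);
}
```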

@leftwo leftwo left a comment

As usual, mostly questions from me :)
How much of a simulation does the simulated pantry actually do?

// sending the start request and instead transition the request
// to completed

let volume_deleted = match self
Contributor:

Because we have mut self here, does that mean that this volume_deleted state
will be guaranteed not to change while this method is running?

Contributor Author:

Unfortunately no - this queries the database, and the result could change immediately after the query.

If it does change to deleted in the middle of the saga, then that could be a problem, but both volume_replace_region and volume_replace_snapshot check whether the volume was hard-deleted during the transaction... and a38b751 updates them to also check whether it is soft-deleted!
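
For intuition, a minimal self-contained sketch of that kind of guard follows. The VolumeRecord type and replace_in_volume function are hypothetical and only model the idea of re-checking deletion state inside the same transaction that performs the swap; they are not the actual volume_replace_region / volume_replace_snapshot code.

```rust
use std::collections::HashMap;

/// Stand-in volume record: `time_deleted` being Some(_) models a soft-delete,
/// and absence from the map models a hard-delete.
struct VolumeRecord {
    data: String,
    time_deleted: Option<u64>,
}

#[derive(Debug, PartialEq)]
enum ReplaceResult {
    Done,
    VolumeHardDeleted,
    VolumeSoftDeleted,
}

/// Stand-in for a transactional replace-style operation: the deletion state is
/// re-read here, inside the same "transaction" that performs the swap, rather
/// than trusting an earlier check.
fn replace_in_volume(
    volumes: &mut HashMap<u64, VolumeRecord>,
    volume_id: u64,
    new_data: String,
) -> ReplaceResult {
    match volumes.get_mut(&volume_id) {
        None => ReplaceResult::VolumeHardDeleted,
        Some(rec) if rec.time_deleted.is_some() => ReplaceResult::VolumeSoftDeleted,
        Some(rec) => {
            rec.data = new_data;
            ReplaceResult::Done
        }
    }
}

fn main() {
    let mut volumes = HashMap::new();
    volumes.insert(
        1,
        VolumeRecord { data: "old".into(), time_deleted: Some(42) },
    );

    // The soft-deleted volume makes the replacement bail out instead of
    // rewriting the volume.
    assert_eq!(
        replace_in_volume(&mut volumes, 1, "new".into()),
        ReplaceResult::VolumeSoftDeleted
    );
}
```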

Contributor:

I just want to be sure that we can handle a delete happening, really, anywhere during this saga. If we can handle that, either by failing the replacement or handling it at the end, then I'm good :)

Contributor Author:

Yeah, the saga will unwind. There's more work to do in a related follow-up commit though: I found a case where the start saga will do some extra unnecessary work allocating regions if there's a hard delete of the volume in the middle of its execution.

Resolved review threads:
- nexus/src/app/background/tasks/region_replacement.rs
- nexus/db-queries/src/db/datastore/volume.rs
- nexus/test-utils/src/background.rs
- nexus/tests/integration_tests/crucible_replacements.rs
- sled-agent/src/sim/http_entrypoints_pantry.rs
@leftwo leftwo left a comment

Just the one question remaining, but good to go

@andrewjstone andrewjstone left a comment

Looks good. Thanks for all the tests!

@@ -600,16 +600,17 @@ task: "physical_disk_adoption"
last completion reported error: task disabled

task: "region_replacement"
configured period: every <REDACTED_DURATION>s
Contributor:

Why did these get unredacted, and are they going to cause test failures in the future?

Contributor Author:

They got unredacted because there wasn't any redaction for "every N minutes", which I've now added in 66e6678.
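
For illustration, a redaction of that shape could look roughly like the following sketch (assuming the regex crate; this is not the actual omicron redaction code, and the pattern is an assumption):

```rust
// Illustrative only: normalize "every N minutes" / "every Ns" style periods in
// captured task output so they do not churn the expected test output.
use regex::Regex;

fn redact_durations(input: &str) -> String {
    // Assumed pattern: "every" followed by a number and a unit.
    let re = Regex::new(r"every \d+(\.\d+)?\s*(minutes?|seconds?|m|s)")
        .expect("valid regex");
    re.replace_all(input, "every <REDACTED_DURATION>").to_string()
}

fn main() {
    assert_eq!(
        redact_durations("configured period: every 30m"),
        "configured period: every <REDACTED_DURATION>"
    );
    assert_eq!(
        redact_durations("configured period: every 3 minutes"),
        "configured period: every <REDACTED_DURATION>"
    );
}
```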

api.register(attach)?;
api.register(attach_activate_background)?;
// api.register(replace)?;
Contributor:

I know it's in the commit message, but it would probably be useful to also have a comment here about why this is commented out.

Contributor Author:

I took them out in 08786cc; I now think this just clutters up the function.

@jmpesp jmpesp commented Oct 1, 2024

@andrewjstone @leftwo debugging the test flakes in CI revealed that the tests (a) were not waiting for the background task invocations to complete, and (b) were not waiting for the sagas to transition the replacement requests. The slowness of the GitHub runners has helped here :)

I've put the appropriate fixes in and also refactored the new tests to use a common test harness. I'd like a re-review of those two commits please, and then we can ship this thing :)
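
The general shape of that kind of wait, as a minimal sketch using tokio (this is not the actual omicron wait_for_condition helper; the wait_until function and its timings are hypothetical):

```rust
// Poll a condition until it holds or a timeout elapses, so the test does not
// race the background task or the saga.
use std::future::Future;
use std::time::Duration;
use tokio::time::{sleep, timeout};

async fn wait_until<F, Fut>(mut condition: F, max: Duration) -> Result<(), &'static str>
where
    F: FnMut() -> Fut,
    Fut: Future<Output = bool>,
{
    timeout(max, async {
        loop {
            if condition().await {
                return;
            }
            // Poll interval is arbitrary for this sketch.
            sleep(Duration::from_millis(50)).await;
        }
    })
    .await
    .map_err(|_| "condition not met before timeout")
}

#[tokio::main]
async fn main() {
    // In a real test the closure would, e.g., re-read the replacement request
    // and check that it has reached the expected state.
    let result = wait_until(|| async { true }, Duration::from_secs(30)).await;
    assert!(result.is_ok());
}
```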

use nexus_client::types::LastResult;
use nexus_client::types::LastResultCompleted;
use nexus_types::internal_api::background::*;
use omicron_test_utils::dev::poll::{wait_for_condition, CondCheckError};
use std::time::Duration;

/// Return the most recent start time for a background task
fn most_recent_start_time(
/// Return the most recent activate time for a background task, returning None
Contributor:

Is my understanding correct that this is returning:

The activate time for the last completed background task.
Returning None if the task is currently running, or has never run.

You only get a Some here if the task has completed at least once and is not currently running?

Contributor Author:

Correct, yeah.
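
In other words, roughly the following, as a self-contained sketch with simplified stand-in types rather than the actual nexus_client types:

```rust
/// Simplified stand-ins for the background-task status types.
enum CurrentStatus {
    Idle,
    Running,
}

enum LastResult {
    NeverCompleted,
    /// Activation time of the last completed run (seconds, for illustration).
    Completed { activate_time: u64 },
}

/// You only get Some(_) when the task is idle and has completed at least once.
fn most_recent_activate_time(
    current: &CurrentStatus,
    last: &LastResult,
) -> Option<u64> {
    match (current, last) {
        // Currently running: the in-flight activation has not completed yet.
        (CurrentStatus::Running, _) => None,
        // Idle and completed before: report that activation's time.
        (CurrentStatus::Idle, LastResult::Completed { activate_time }) => {
            Some(*activate_time)
        }
        // Idle but never completed: nothing to report.
        (CurrentStatus::Idle, LastResult::NeverCompleted) => None,
    }
}

fn main() {
    assert_eq!(
        most_recent_activate_time(
            &CurrentStatus::Idle,
            &LastResult::Completed { activate_time: 7 },
        ),
        Some(7)
    );
    assert_eq!(
        most_recent_activate_time(&CurrentStatus::Running, &LastResult::NeverCompleted),
        None
    );
}
```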

// that the LastResult is NeverCompleted? the Some in
// the second part of the tuple means this ran before,
// so panic here.
panic!("task is idle, but there's no activate time?!");
Contributor:

We panic here because this should not be possible, right?

Is there any chance that task.current could change between when we match on it on line 73 and when we make the call to most_recent_activate_time()?

Contributor Author:

The state of the background task could change, yeah, but we'll pick that up when it gets re-polled.

The state of the world in that part of the match shouldn't ever be possible, no, so I think it's appropriate to panic there.

@leftwo leftwo left a comment

I almost commented before that it seemed like there was a bunch of test code that looked pretty similar between the tests. I'm glad instead of me having to make that comment, you instead read my mind and did what I wanted. Please continue to do that.

@jmpesp jmpesp merged commit 1b82134 into oxidecomputer:main Oct 2, 2024
16 checks passed
@jmpesp jmpesp deleted the region_replacement_account_for_deleted_volumes branch October 2, 2024 14:16
hawkw pushed a commit that referenced this pull request Oct 2, 2024