
[#5333 3/6] Region snapshot replacement start #6294

Merged
merged 14 commits into oxidecomputer:main from snapshot_replacement_part_3 on Aug 14, 2024

Conversation

@jmpesp (Contributor) commented Aug 12, 2024

This commit adds a "region snapshot replacement start" background task that will:

  1. check for snapshots that are stored on disks that were expunged, and
    insert region snapshot replacement requests for them.

  2. check if there are any region snapshot replacement requests in the
    Requested state and run the new "region snapshot replacement start"
    saga for them. This background task will also pick up manually
    requested region snapshot replacements.

Also in this commit is the "region snapshot replacement start saga", which will transition a region snapshot replacement request from Requested to Allocating and:

  • allocate a new Region (with source set to Some so that a clone occurs)

  • create a blank Volume that will be used to stash the snapshot to delete later

  • swap the snapshot being replaced with the new region that was cloned from the others in the read-only region set

  • update the region snapshot replacement request's state to "ReplacementDone" and clear the operating saga id

This represents the first step to be taken after a snapshot goes away: allocate the replacement, and swap it into the affected snapshot volume. Once this is done, anything that uses the snapshot volume as a read-only parent will no longer reference the expunged snapshot, and will work without any issues.

Existing constructed Volumes running in a propolis or pantry context will remain unmodified: a future commit will contain the saga that takes care of performing the necessary read-only target replacements.
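
For orientation, here is a minimal sketch of the activation flow described above. Every type, field, and function name below (RegionSnapshot, ReplacementRequest, activate, start_saga, and so on) is a hypothetical stand-in, not the actual Nexus datastore, background task, or saga API:

```rust
// Illustrative sketch only: all names are hypothetical stand-ins.

#[derive(Clone, Copy, PartialEq)]
enum RequestState {
    Requested,
    Allocating,
}

struct RegionSnapshot {
    snapshot_id: u64,
    disk_expunged: bool,
}

struct ReplacementRequest {
    id: u64,
    state: RequestState,
}

#[derive(Default)]
struct Status {
    requests_created: Vec<u64>,
    sagas_started: Vec<u64>,
    errors: Vec<String>,
}

/// One activation of the hypothetical background task.
fn activate(
    region_snapshots: &[RegionSnapshot],
    requests: &mut Vec<ReplacementRequest>,
    start_saga: &mut dyn FnMut(u64) -> Result<(), String>,
) -> Status {
    let mut status = Status::default();

    // 1) Insert a replacement request for every region snapshot stored on an
    //    expunged disk.
    for rs in region_snapshots.iter().filter(|rs| rs.disk_expunged) {
        let id = rs.snapshot_id; // stand-in for a real request id
        requests.push(ReplacementRequest { id, state: RequestState::Requested });
        status.requests_created.push(id);
    }

    // 2) Run the "region snapshot replacement start" saga for every request
    //    still in the Requested state, including manually inserted requests.
    //    (In the flow described above the saga itself performs the
    //    Requested -> Allocating transition; this sketch flips the state
    //    inline for brevity.)
    for req in requests.iter_mut().filter(|r| r.state == RequestState::Requested) {
        match start_saga(req.id) {
            Ok(()) => {
                req.state = RequestState::Allocating;
                status.sagas_started.push(req.id);
            }
            Err(e) => status.errors.push(format!(
                "sending region snapshot replacement request {} failed: {e}",
                req.id
            )),
        }
    }

    status
}

fn main() {
    let snapshots = vec![
        RegionSnapshot { snapshot_id: 1, disk_expunged: true },
        RegionSnapshot { snapshot_id: 2, disk_expunged: false },
    ];
    let mut requests = Vec::new();
    let mut start_saga = |id: u64| -> Result<(), String> {
        println!("starting replacement saga for request {id}");
        Ok(())
    };
    let status = activate(&snapshots, &mut requests, &mut start_saga);
    assert_eq!(status.requests_created, vec![1]);
    assert_eq!(status.sagas_started, vec![1]);
}
```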

This commit adds a "region snapshot replacement start" background task
that will:

1) check for snapshots that are stored on disks that were expunged, and
   insert region snapshot replacement requests for them.

2) check if there are any region snapshot replacement requests in the
   `Requested` state and run the new "region snapshot replacement start"
   saga for them. This background task will also pick up manually
   requested region snapshot replacements.

Also in this commit is the "region snapshot replacement start saga",
which will transition a region snapshot replacement request from
`Requested` to `Allocating` and:

- allocate a new Region (with source set to Some so that a clone occurs)

- create a blank Volume that will be used to stash the snapshot to
  delete later

- swap the snapshot being replaced with the new region that was cloned
  from the others in the read-only region set

- update the region snapshot replacement request's state to
  "ReplacementDone" and clearing the operating saga id

This represents the first step to be taken after a snapshot goes away:
allocate the replacement, and swap it in to the affected snapshot
volume. Once this is done, anything that uses the snapshot volume as a
read-only parent will no longer reference the expunged snapshot, and
will work without any issues.

Existing constructed Volumes running in a propolis or pantry context
will remain unmodified: a future commit will contain the saga that takes
care of performing the necessary read-only target replacements.
@jmpesp requested review from andrewjstone and leftwo August 12, 2024 18:01
@andrewjstone (Contributor) left a comment

This looks great @jmpesp. Just minor comments from me.


Err(e) => {
    let s =
        format!("error creating replacement request: {e}");
Contributor

Worth logging info about which region_id and snapshot_id the error occurred on, similar to the Ok case above?

Contributor Author

definitely, that's an oversight on my part. added in fea9ad9


Err(e) => {
    let s =
        format!("error looking up replacement request: {e}");
Contributor

Worth logging information about which snapshot was problematic, similar to below?

Contributor Author

also added in fea9ad9


Err(e) => {
    let s = format!(
        "sending region snapshot replacement request failed: \
Contributor

Log info about the request that failed?

Contributor Author

yep! done in 4a0ae26

assert_eq!(result, RegionSnapshotReplacementStartStatus::default());
assert_eq!(starter.count_reset(), 0);

// Add a region snapshot replacement request for a fake region snapshot
Contributor

Probably a big pain in the ass, but this test could be a touch more complete if it also exercised: create_requests_for_region_snapshots_on_expunged_disks. I believe you'd have to create a fake disk in the datastore, and a region, and then expunge the fake disk.

Contributor Author

👍 ended up being not too bad, see 7132858

Contributor

Oh, nice!

// License, v. 2.0. If a copy of the MPL was not distributed with this
// file, You can obtain one at https://mozilla.org/MPL/2.0/.

//! Background task for detecting region snapshots that need replacing and
Contributor

I love how easy to read this whole RPW is.

Contributor Author

thanks! :)

let alloc_region_params =
    sagactx.lookup::<AllocRegionParams>("alloc_region_params")?;

let (.., db_snapshot) = LookupPath::new(&opctx, &osagactx.datastore())
Contributor

You already look up the snapshot in rsrss_get_alloc_region_params. Why do you look it up again here? Can you instead save the volume_id and snapshot_id in AllocRegionParams?

Contributor Author

I like it - done in f2b9fc7 (soon on the list will be to use more typed uuids...)

// downstairs port. Once this goes away, Nexus will require a way to query
// for the repair port!

let mut repair_addr: SocketAddrV6 =
Contributor

I think it would be clearer to explicitly say source_repair_addr here. This makes the call to ensure_region_in_dataset below more explicit in what it does.

Contributor Author

Agreed, added in a2bda34 along with expanding the comment for ensure_region_in_dataset

osagactx
    .datastore()
    .volume_replace_snapshot(
        VolumeWithTarget(old_volume_id),
Contributor

AFAICT, the only change between this undo and the prior method (as stated in the comment) is this swap in parameters and the log message above. It may be worthwhile to create a helper method that does all the lookups and then only perform this last call in the forward/undo functions. That would make it clearer what is different IMO.

Contributor Author

agreed, done in 94bc3cb
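
A minimal, self-contained sketch of the suggested shape: a single helper does the lookups and performs the swap, and the forward and undo saga actions differ only in the direction they pass. All names here (ReplaceDirection, volume_replace_target, replace_snapshot_in_volume) are illustrative assumptions, not the actual code added in 94bc3cb:

```rust
// Illustrative sketch of a shared forward/undo helper; names are hypothetical.

#[derive(Clone, Copy)]
enum ReplaceDirection {
    Forward,
    Undo,
}

struct Volume {
    id: u64,
    target: u64,
}

/// Stand-in for the datastore call that swaps one target for another.
fn volume_replace_target(volume: &mut Volume, old: u64, new: u64) -> Result<(), String> {
    if volume.target != old {
        return Err(format!("volume {} does not reference target {old}", volume.id));
    }
    volume.target = new;
    Ok(())
}

/// Shared helper: do the lookups once, then swap in the requested direction.
fn replace_snapshot_in_volume(
    volume: &mut Volume,
    expunged_target: u64,
    replacement_target: u64,
    direction: ReplaceDirection,
) -> Result<(), String> {
    let (old, new) = match direction {
        ReplaceDirection::Forward => (expunged_target, replacement_target),
        ReplaceDirection::Undo => (replacement_target, expunged_target),
    };
    // The log line keeps the same "replacing {old} with {new}" shape in both
    // directions, which is what the undo-message discussion below is about.
    println!("replacing {old} with {new} in volume {}", volume.id);
    volume_replace_target(volume, old, new)
}

fn main() {
    let mut volume = Volume { id: 1, target: 100 };
    // Forward action swaps the expunged target for the replacement.
    replace_snapshot_in_volume(&mut volume, 100, 200, ReplaceDirection::Forward).unwrap();
    assert_eq!(volume.target, 200);
    // Undo action puts things back exactly as they were.
    replace_snapshot_in_volume(&mut volume, 100, 200, ReplaceDirection::Undo).unwrap();
    assert_eq!(volume.target, 100);
}
```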

@leftwo (Contributor) left a comment

Thanks for getting this out so quickly. No comments of any substance.

A question: this will replace the downstairs for a snapshot.
However, that snapshot could be a read_only_parent for a bunch of other disks.
Is that handled in a future PR?

//! beginning that process
//!
//! This task's responsibility is to create region snapshot replacement requests
//! when physical disks are expunged, and trigger the region snapshot
Contributor

Do we have to wait for expungement, or can other things trigger a snapshot
replacement (not counting omdb)? I don't think at the moment there exists
anything that would want to. I suspect at some point we may want the ability
to "move things around", and that means moving snapshots (or regions) off
a disk that is not expunged.

Contributor Author

Right, we were talking in the storage huddle and in chat about getting snapshot scrubbing for free once this is in - that would look like inserting region snapshot replacement requests directly. That'll come later after R10 :)

//! 1. Allocate a new region
//!
//! 2. Create a blank volume that can be later deleted to stash the snapshot
//! being replaced. This is filled in the `volume_replace_snapshot`
Contributor

Nit, change "This is filled in the .." to "This is populated in the .."

Contributor Author

done in f89a395

//! request, and change the state to "Allocating". Then, it performs the
//! following steps:
//!
//! 1. Allocate a new region
Contributor

This step includes the "clone" of an existing region, correct?

Contributor Author

It does yeah - later on ensure_region_in_dataset is called with a source repair address.

GET_OLD_SNAPSHOT_VOLUME_ID -> "old_snapshot_volume_id" {
+ rsrss_get_old_snapshot_volume_id
}
CREATE_FAKE_VOLUME -> "unused_3" {
Contributor

Why unused_3 before unused_2?

Contributor Author

because I can't count! 4319c0a

.datastore()
.regions_hard_delete(log, vec![region.id()])
.await?;
}
Contributor

If we don't find a new region and we expect to, should we log a message here?

Contributor Author

added in 80b23e3


// With a list of datasets and regions to ensure, other sagas need to have a
// separate no-op forward step for the undo action to ensure that the undo
// step occurs in the case that the ensure partially fails. Here this not
Contributor

Nit: final sentence: "Here this is not required,.."

Contributor Author

nice, 73c95c6


let Some(region_snapshot) = region_snapshot else {
    return Err(ActionError::action_failed(format!(
        "region snapshot {} {} {} deleted!",
Contributor

Do we know this is deleted? Is that what a None means?

Contributor Author

We do yeah - if the corresponding region_snapshot is missing then that means the volume references went to zero, meaning the crucible resources would have been cleaned up and the record hard deleted.
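
A small sketch of that reasoning (types and names are hypothetical, not the saga's actual code): a None lookup is treated as proof that the record was hard deleted, so the step fails rather than proceeding:

```rust
// Hypothetical sketch: a missing region_snapshot record means its volume
// references dropped to zero and the record was hard deleted.
#[derive(Debug)]
struct RegionSnapshot {
    dataset_id: u64,
    region_id: u64,
    snapshot_id: u64,
}

fn require_region_snapshot(
    lookup: Option<RegionSnapshot>,
) -> Result<RegionSnapshot, String> {
    let Some(region_snapshot) = lookup else {
        // None is definitive here: the record was hard deleted, not merely
        // "not yet visible".
        return Err("region snapshot deleted!".to_string());
    };
    Ok(region_snapshot)
}

fn main() {
    assert!(require_region_snapshot(None).is_err());
    let rs = RegionSnapshot { dataset_id: 1, region_id: 2, snapshot_id: 3 };
    println!("{:?}", require_region_snapshot(Some(rs)));
}
```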


info!(
    log,
    "undo: replacing {} with {} in volume {}",
Contributor

At this point is "old" from the broken region we were fixing? Or have we swapped
old/new here? I'm not sure if this log is printing what I think it should be, which is
opposite from the srss_replace_snapshot_in_volume().

Or, wait, by saying undo at the top, you mean we are undo-ing what the previous
step did, which is putting back things like they were. I guess I find this a little
confusing, but if all the undo messages are like this, I'll just live with it.

Contributor Author

The old and new are not swapped in this log message, whereas they are in the volume_replace_snapshot arguments. You're right: adding undo: is meant to signify that it's undoing the forward action. The intention is that someone can search for the text "replacing UUID with UUID in volume UUID" and get two results.

I put a small comment in 1b97525 explaining this intention.

@jmpesp enabled auto-merge (squash) August 14, 2024 16:42
@jmpesp merged commit 823f8ab into oxidecomputer:main Aug 14, 2024
18 checks passed
@jmpesp deleted the snapshot_replacement_part_3 branch August 14, 2024 17:53
davepacheco added a commit that referenced this pull request Aug 15, 2024
davepacheco added a commit that referenced this pull request Aug 15, 2024