Splitting up #5683 first by separating out the DB models, queries, and schema changes required:

1. region replacement records

This commit adds a Region Replacement record, which is a request to replace a region in a volume. It transitions through the following states:

        Requested   <--
                      |
            |         |
            v         |
                      |
        Allocating  --

            |
            v

        Running     <--
                      |
            |         |
            v         |
                      |
        Driving     --

            |
            v

        ReplacementDone  <--
                           |
            |              |
            v              |
                           |
        Completing       --

            |
            v

        Complete

which are captured in the `RegionReplacementState` enum. Transitioning from Requested to Running is the responsibility of the "start" saga, iterating between Running and Driving is the responsibility of the "drive" saga, and transitioning from ReplacementDone to Complete is the responsibility of the "finish" saga. All of these will come in subsequent PRs.

The state transitions themselves are performed by these sagas and all involve a query that:

- checks that the starting state (and other values as required) make sense
- updates the state while setting a unique `operating_saga_id` (and any other fields as appropriate)

As multiple background tasks will be waking up, checking to see what sagas need to be triggered, and requesting that these region replacement sagas run, this is meant to block multiple sagas from running at the same time in an effort to cut down on interference - most will unwind at the first step instead of somewhere in the middle.

2. region replacement step records

As region replacement takes place, Nexus will be making calls to services in order to trigger the necessary Crucible operations meant to actually perform the replacement. These steps are recorded in the database so that they can be consulted by subsequent steps, and additionally act as breadcrumbs if there is an issue.

3. volume repair records

Nexus should take care to only replace one region (or snapshot!) for a volume at a time. Technically, the Upstairs can support two at a time, but codifying "only one at a time" is safer, and rules out the possibility of a Nexus bug replacing all three regions of a region set at the same time (aka total data loss!). This "one at a time" constraint is enforced by each repair also creating a VolumeRepair record, a table for which there is a UNIQUE CONSTRAINT on the volume ID.

4. also, the `volume_replace_region` function

The `volume_replace_region` function is also included in this PR. In a single transaction, this will:

- set the target region's volume id to the replacement's volume id
- set the replacement region's volume id to the target's volume id
- update the target volume's construction request to replace the target region's SocketAddrV6 with the replacement region's

This is called from the "start" saga, after allocating the replacement region, and is meant to transition the Volume's construction request from "indefinitely degraded, pointing to region that is gone" to "currently degraded, but can be repaired".
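The three steps that `volume_replace_region` performs can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not the function from this PR: `FakeDb`, `RegionRef`, the integer ids, and the flat list of addresses standing in for a volume construction request are all hypothetical, and the "transaction" is imitated by running every check before any mutation.

```rust
use std::collections::HashMap;
use std::net::SocketAddrV6;

/// Hypothetical in-memory stand-in for the region and volume tables.
#[derive(Debug, Default)]
struct FakeDb {
    /// region id -> id of the volume that currently owns the region
    region_volume: HashMap<u32, u32>,
    /// volume id -> region addresses listed in its construction request
    volume_targets: HashMap<u32, Vec<SocketAddrV6>>,
}

/// A region together with the address used to reach it (hypothetical type).
#[derive(Debug, Clone, Copy)]
struct RegionRef {
    region_id: u32,
    addr: SocketAddrV6,
}

/// Swap `existing` for `replacement` in the volume that owns `existing`.
/// All lookups happen before any write, so an error leaves `db` untouched,
/// which imitates the all-or-nothing behaviour of a database transaction.
fn volume_replace_region(
    db: &mut FakeDb,
    existing: RegionRef,
    replacement: RegionRef,
) -> Result<(), String> {
    let target_volume = *db
        .region_volume
        .get(&existing.region_id)
        .ok_or("existing region not found")?;
    let replacement_volume = *db
        .region_volume
        .get(&replacement.region_id)
        .ok_or("replacement region not found")?;
    let slot = db
        .volume_targets
        .get(&target_volume)
        .and_then(|targets| targets.iter().position(|a| *a == existing.addr))
        .ok_or("existing address not in construction request")?;

    // Steps 1 and 2: the two regions trade volume ids.
    db.region_volume.insert(existing.region_id, replacement_volume);
    db.region_volume.insert(replacement.region_id, target_volume);

    // Step 3: point the construction request at the replacement region.
    if let Some(targets) = db.volume_targets.get_mut(&target_volume) {
        targets[slot] = replacement.addr;
    }
    Ok(())
}
```

After a successful call the old region belongs to the volume that previously held the replacement, so it can be cleaned up later by deleting that volume, while the target volume refers only to the new region's address.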
Showing 28 changed files with 2,106 additions and 5 deletions.
@@ -0,0 +1,165 @@
// This Source Code Form is subject to the terms of the Mozilla Public
// License, v. 2.0. If a copy of the MPL was not distributed with this
// file, You can obtain one at https://mozilla.org/MPL/2.0/.

use super::impl_enum_type;
use crate::schema::region_replacement;
use crate::Region;
use chrono::DateTime;
use chrono::Utc;
use serde::{Deserialize, Serialize};
use uuid::Uuid;

impl_enum_type!(
    #[derive(SqlType, Debug, QueryId)]
    #[diesel(postgres_type(name = "region_replacement_state", schema = "public"))]
    pub struct RegionReplacementStateEnum;

    #[derive(Copy, Clone, Debug, AsExpression, FromSqlRow, Serialize, Deserialize, PartialEq)]
    #[diesel(sql_type = RegionReplacementStateEnum)]
    pub enum RegionReplacementState;

    // Enum values
    Requested => b"requested"
    Allocating => b"allocating"
    Running => b"running"
    Driving => b"driving"
    ReplacementDone => b"replacement_done"
    Completing => b"completing"
    Complete => b"complete"
);

impl std::str::FromStr for RegionReplacementState {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "requested" => Ok(RegionReplacementState::Requested),
            "allocating" => Ok(RegionReplacementState::Allocating),
            "running" => Ok(RegionReplacementState::Running),
            "driving" => Ok(RegionReplacementState::Driving),
            "replacement_done" => Ok(RegionReplacementState::ReplacementDone),
            "complete" => Ok(RegionReplacementState::Complete),
            "completing" => Ok(RegionReplacementState::Completing),
            _ => Err(format!("unrecognized value {} for enum", s)),
        }
    }
}

/// Database representation of a Region replacement request.
///
/// This record stores the data related to the operations required for Nexus to
/// orchestrate replacing a region in a volume. It transitions through the
/// following states:
///
/// ```text
///     Requested   <--              ---
///                   |              |
///         |         |              |
///         v         |              |  responsibility of region
///                   |              |  replacement start saga
///     Allocating  --               |
///                                  |
///         |                        |
///         v                        ---
///                                  ---
///     Running     <--              |
///                   |              |
///         |         |              |
///         v         |              |  responsibility of region
///                   |              |  replacement drive saga
///     Driving     --               |
///                                  |
///         |                        |
///         v                        ---
///                                  ---
///     ReplacementDone  <--         |
///                        |         |
///         |              |         |
///         v              |         |
///                        |         |  responsibility of region
///     Completing       --          |  replacement finish saga
///                                  |
///         |                        |
///         v                        |
///                                  |
///     Complete                     ---
/// ```
///
/// which are captured in the RegionReplacementState enum. Annotated on the
/// right are which sagas are responsible for which state transitions. The state
/// transitions themselves are performed by these sagas and all involve a query
/// that:
///
/// - checks that the starting state (and other values as required) make sense
/// - updates the state while setting a unique `operating_saga_id` (and any
///   other fields as appropriate)
///
/// As multiple background tasks will be waking up, checking to see what sagas
/// need to be triggered, and requesting that these region replacement sagas
/// run, this is meant to block multiple sagas from running at the same time in
/// an effort to cut down on interference - most will unwind at the first step
/// of performing this state transition instead of somewhere in the middle.
///
/// The correctness of a region replacement relies on certain operations
/// happening only when the record is in a certain state. For example: Nexus
/// should not undo a volume modification _after_ an upstairs has been sent a
/// replacement request, so volume modification happens at the Allocating state
/// (in the start saga), and replacement requests are only sent in the Driving
/// state (in the drive saga) - this ensures that replacement requests are only
/// sent if the start saga completed successfully, meaning the volume
/// modification was committed to the database and will not change or be
/// unwound.
///
/// See also: RegionReplacementStep records
#[derive(
    Queryable,
    Insertable,
    Debug,
    Clone,
    Selectable,
    Serialize,
    Deserialize,
    PartialEq,
)]
#[diesel(table_name = region_replacement)]
pub struct RegionReplacement {
    pub id: Uuid,

    pub request_time: DateTime<Utc>,

    /// The region being replaced
    pub old_region_id: Uuid,

    /// The volume whose region is being replaced
    pub volume_id: Uuid,

    /// A synthetic volume that only is used to later delete the old region
    pub old_region_volume_id: Option<Uuid>,

    /// The new region that will be used to replace the old one
    pub new_region_id: Option<Uuid>,

    pub replacement_state: RegionReplacementState,

    pub operating_saga_id: Option<Uuid>,
}

impl RegionReplacement {
    pub fn for_region(region: &Region) -> Self {
        Self::new(region.id(), region.volume_id())
    }

    pub fn new(old_region_id: Uuid, volume_id: Uuid) -> Self {
        Self {
            id: Uuid::new_v4(),
            request_time: Utc::now(),
            old_region_id,
            volume_id,
            old_region_volume_id: None,
            new_region_id: None,
            replacement_state: RegionReplacementState::Requested,
            operating_saga_id: None,
        }
    }
}
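The "check the starting state, then update it while setting `operating_saga_id`" pattern described in the doc comment above can be sketched in memory as a compare-and-set. This is an illustration only, not the queries from this PR: `Record`, `claim`, `release`, and the `u64` saga ids are hypothetical names, and it assumes that the resting states carry no saga id while the intermediate states carry the id of the saga that claimed the record.

```rust
/// Mirrors the variants of `RegionReplacementState` for illustration.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum State {
    Requested,
    Allocating,
    Running,
    Driving,
    ReplacementDone,
    Completing,
    Complete,
}

/// Hypothetical stand-in for a region replacement row.
#[derive(Debug)]
struct Record {
    state: State,
    operating_saga_id: Option<u64>,
}

/// Move `from` -> `to` and claim the record for `saga_id`, but only if the
/// record is still in `from` and no other saga holds it.
fn claim(record: &mut Record, from: State, to: State, saga_id: u64) -> Result<(), String> {
    if record.state != from {
        return Err(format!("expected {:?}, found {:?}", from, record.state));
    }
    if let Some(owner) = record.operating_saga_id {
        return Err(format!("already claimed by saga {}", owner));
    }
    record.state = to;
    record.operating_saga_id = Some(saga_id);
    Ok(())
}

/// Move `from` -> `to` and give up the claim, but only if `saga_id` is the
/// saga that currently holds the record.
fn release(record: &mut Record, from: State, to: State, saga_id: u64) -> Result<(), String> {
    if record.state != from || record.operating_saga_id != Some(saga_id) {
        return Err(format!("saga {} does not hold this record in {:?}", saga_id, from));
    }
    record.state = to;
    record.operating_saga_id = None;
    Ok(())
}
```

With this shape, a second saga that races for the same record fails in `claim`, its very first step, rather than partway through its work, which is the "unwind at the first step" behaviour the comment describes.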
@@ -0,0 +1,85 @@
// This Source Code Form is subject to the terms of the Mozilla Public
// License, v. 2.0. If a copy of the MPL was not distributed with this
// file, You can obtain one at https://mozilla.org/MPL/2.0/.

use super::impl_enum_type;
use crate::ipv6;
use crate::schema::region_replacement_step;
use crate::SqlU16;
use chrono::{DateTime, Utc};
use serde::{Deserialize, Serialize};
use std::net::SocketAddrV6;
use uuid::Uuid;

impl_enum_type!(
    #[derive(SqlType, Debug, QueryId)]
    #[diesel(postgres_type(name = "region_replacement_step_type", schema = "public"))]
    pub struct RegionReplacementStepTypeEnum;

    #[derive(Copy, Clone, Debug, AsExpression, FromSqlRow, Serialize, Deserialize, PartialEq)]
    #[diesel(sql_type = RegionReplacementStepTypeEnum)]
    pub enum RegionReplacementStepType;

    // What is driving the repair forward?
    Propolis => b"propolis"
    Pantry => b"pantry"
);

/// Database representation of a Region replacement repair step
///
/// As region replacement takes place, Nexus will be making calls to services in
/// order to trigger the necessary Crucible operations meant to actually perform
/// the replacement. These steps are recorded in the database so that they can
/// be consulted by subsequent steps, and additionally act as breadcrumbs if
/// there is an issue.
///
/// See also: RegionReplacement records
#[derive(
    Queryable,
    Insertable,
    Debug,
    Clone,
    Selectable,
    Serialize,
    Deserialize,
    PartialEq,
)]
#[diesel(table_name = region_replacement_step)]
pub struct RegionReplacementStep {
    pub replacement_id: Uuid,

    pub step_time: DateTime<Utc>,

    pub step_type: RegionReplacementStepType,

    pub step_associated_instance_id: Option<Uuid>,
    pub step_associated_vmm_id: Option<Uuid>,

    pub step_associated_pantry_ip: Option<ipv6::Ipv6Addr>,
    pub step_associated_pantry_port: Option<SqlU16>,
    pub step_associated_pantry_job_id: Option<Uuid>,
}

impl RegionReplacementStep {
    pub fn instance_and_vmm_ids(&self) -> Option<(Uuid, Uuid)> {
        if self.step_type != RegionReplacementStepType::Propolis {
            return None;
        }

        let instance_id = self.step_associated_instance_id?;
        let vmm_id = self.step_associated_vmm_id?;

        Some((instance_id, vmm_id))
    }

    pub fn pantry_address(&self) -> Option<SocketAddrV6> {
        if self.step_type != RegionReplacementStepType::Pantry {
            return None;
        }

        let ip = self.step_associated_pantry_ip?;
        let port = self.step_associated_pantry_port?;

        Some(SocketAddrV6::new(*ip, *port, 0, 0))
    }
}
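The two accessors above only return data when the step type matches, even if the other columns happen to be populated. A standalone sketch of that behaviour follows; it is a simplified assumption-laden model, not the type from this PR: `Step`, `StepType`, `describe`, the `u64` ids, and the use of `std::net::Ipv6Addr` and `u16` in place of the crate's wrapper types are all hypothetical.

```rust
use std::net::{Ipv6Addr, SocketAddrV6};

#[derive(Debug, Clone, Copy, PartialEq)]
enum StepType {
    Propolis,
    Pantry,
}

/// Simplified, hypothetical stand-in for `RegionReplacementStep`.
#[derive(Debug, Clone)]
struct Step {
    step_type: StepType,
    instance_id: Option<u64>,
    vmm_id: Option<u64>,
    pantry_ip: Option<Ipv6Addr>,
    pantry_port: Option<u16>,
}

impl Step {
    /// Returns the ids only for a Propolis step with both ids present.
    fn instance_and_vmm_ids(&self) -> Option<(u64, u64)> {
        if self.step_type != StepType::Propolis {
            return None;
        }
        Some((self.instance_id?, self.vmm_id?))
    }

    /// Returns the address only for a Pantry step with both ip and port.
    fn pantry_address(&self) -> Option<SocketAddrV6> {
        if self.step_type != StepType::Pantry {
            return None;
        }
        Some(SocketAddrV6::new(self.pantry_ip?, self.pantry_port?, 0, 0))
    }
}

/// Summarise what is driving a repair step, using only the accessors.
fn describe(step: &Step) -> String {
    match (step.instance_and_vmm_ids(), step.pantry_address()) {
        (Some((instance, vmm)), _) => format!("propolis: instance {instance}, vmm {vmm}"),
        (None, Some(addr)) => format!("pantry at {addr}"),
        (None, None) => "incomplete step record".to_string(),
    }
}
```

Gating on the step type first means a caller never has to decide which set of optional columns to trust; it asks for the data it needs and handles `None`.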