[#3886 1/4] Region replacement models and queries (#5791)
Splitting up #5683 first by separating out the DB models, queries, and
schema changes required:

1. region replacement records

This commit adds a Region Replacement record, which is a request to
replace a region in a volume. It transitions through the following
states:

        Requested   <--
                      |
            |         |
            v         |
                      |
        Allocating  --

            |
            v

         Running    <--
                      |
            |         |
            v         |
                      |
         Driving    --

            |
            v

     ReplacementDone  <--
                        |
            |           |
            v           |
                        |
        Completing    --

            |
            v

        Complete

which are captured in the `RegionReplacementState` enum. Transitioning
from Requested to Running is the responsibility of the "start" saga,
iterating between Running and Driving is the responsibility of the
"drive" saga, and transitioning from ReplacementDone to Complete is the
responsibility of the "finish" saga. All of these sagas will come in
subsequent PRs.

The state transitions themselves are performed by these sagas and all
involve a query that:

- checks that the starting state (and other values as required) make
sense
- updates the state while setting a unique `operating_saga_id` (and
any other fields as appropriate)

As multiple background tasks will be waking up, checking to see what
sagas need to be triggered, and requesting that these region replacement
sagas run, setting `operating_saga_id` is meant to block multiple sagas
from operating on the same record at the same time. This cuts down on
interference: most competing sagas will unwind at the first step instead
of somewhere in the middle.
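
As a rough illustration of the check-then-claim pattern described above, here is a minimal in-memory sketch. It is not the query added by this commit: the names `Record`, `TransitionError`, and `claim_and_transition` are hypothetical, IDs are plain `u64` values rather than UUIDs, and the real implementation performs the check and the update in a single database query.

```rust
// Hypothetical sketch: an in-memory stand-in for the guarded state
// transition. The real code does this as one database query.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Requested,
    Allocating,
    Running,
}

#[derive(Debug)]
pub struct Record {
    pub state: State,
    pub operating_saga_id: Option<u64>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum TransitionError {
    /// The record was not in the state the caller expected.
    UnexpectedState(State),
    /// Another saga already holds the record.
    LockedBy(u64),
}

/// Move `record` from `from` to `to` and claim it for `saga_id`, but only
/// if no saga currently holds it and it is in the expected starting state.
pub fn claim_and_transition(
    record: &mut Record,
    from: State,
    to: State,
    saga_id: u64,
) -> Result<(), TransitionError> {
    // A competing saga fails here, at its first step.
    if let Some(holder) = record.operating_saga_id {
        return Err(TransitionError::LockedBy(holder));
    }
    if record.state != from {
        return Err(TransitionError::UnexpectedState(record.state));
    }
    record.state = to;
    record.operating_saga_id = Some(saga_id);
    Ok(())
}
```

In this sketch a second saga that tries the same transition gets `LockedBy` immediately, which mirrors the "unwind at the first step" behavior the commit message aims for.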

2. region replacement step records

As region replacement takes place, Nexus will be making calls to
services in order to trigger the necessary Crucible operations meant to
actually perform the replacement. These steps are recorded in the
database so that they can be consulted by subsequent steps, and
additionally act as breadcrumbs if there is an issue.

3. volume repair records

Nexus should take care to only replace one region (or snapshot!) for a
volume at a time. Technically, the Upstairs can support two at a time,
but codifying "only one at a time" is safer, and does not allow the
possibility for a Nexus bug to replace all three regions of a region set
at a time (aka total data loss!). This "one at a time" constraint is
enforced by each repair also creating a VolumeRepair record, a table for
which there is a UNIQUE CONSTRAINT on the volume ID.
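
The effect of that uniqueness constraint can be sketched in memory with a map keyed by volume ID. This is only an illustration: `VolumeRepairs`, `AlreadyRepairing`, `begin`, and `finish` are hypothetical names, IDs are `u64` rather than UUIDs, and the `finish` step assumes the record is removed once a repair ends, which this commit message does not state.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Hypothetical stand-in for the volume repair table: the map key plays the
/// role of the unique constraint on the volume ID.
#[derive(Default)]
pub struct VolumeRepairs {
    by_volume: HashMap<u64, u64>, // volume_id -> repair_id
}

#[derive(Debug, PartialEq, Eq)]
pub struct AlreadyRepairing {
    pub existing_repair_id: u64,
}

impl VolumeRepairs {
    /// Record that `repair_id` is repairing `volume_id`. Fails if any repair
    /// is already recorded for that volume, like a constraint violation.
    pub fn begin(
        &mut self,
        volume_id: u64,
        repair_id: u64,
    ) -> Result<(), AlreadyRepairing> {
        match self.by_volume.entry(volume_id) {
            Entry::Occupied(e) => {
                Err(AlreadyRepairing { existing_repair_id: *e.get() })
            }
            Entry::Vacant(e) => {
                e.insert(repair_id);
                Ok(())
            }
        }
    }

    /// Assumption: the record is removed when the repair ends, which frees
    /// the volume for a later repair.
    pub fn finish(&mut self, volume_id: u64) -> Option<u64> {
        self.by_volume.remove(&volume_id)
    }
}
```

Because the check and the insert are one operation on the keyed entry, a buggy caller cannot start a second repair on the same volume by racing the first.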

4. also, the `volume_replace_region` function

The `volume_replace_region` function is also included in this PR. In a
single transaction, this will:

- set the target region's volume id to the replacement's volume id
- set the replacement region's volume id to the target's volume id
- update the target volume's construction request to replace the target
region's SocketAddrV6 with the replacement region's

This is called from the "start" saga, after allocating the replacement
region, and is meant to transition the Volume's construction request
from "indefinitely degraded, pointing to region that is gone" to
"currently degraded, but can be repaired".
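
The three steps performed by `volume_replace_region` can be sketched in memory as follows. This is a simplified illustration, not the function from this PR: `Region`, `Volume`, `ReplaceError`, and `replace_region` are hypothetical, the construction request is reduced to a list of target addresses, and IDs are `u64` rather than UUIDs.

```rust
use std::net::SocketAddrV6;

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Region {
    pub volume_id: u64,
    pub addr: SocketAddrV6,
}

/// Hypothetical, heavily simplified construction request: only the list of
/// region addresses the volume points at.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Volume {
    pub targets: Vec<SocketAddrV6>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ReplaceError {
    TargetNotInVolume,
}

/// Swap the two regions' volume IDs and point the volume at the replacement.
pub fn replace_region(
    volume: &mut Volume,
    target: &mut Region,
    replacement: &mut Region,
) -> Result<(), ReplaceError> {
    // Validate first: nothing is mutated until the lookup succeeds, which
    // stands in for the all-or-nothing behavior of a transaction.
    let slot = volume
        .targets
        .iter()
        .position(|addr| *addr == target.addr)
        .ok_or(ReplaceError::TargetNotInVolume)?;

    // Steps 1 and 2: exchange the volume IDs of the two regions.
    std::mem::swap(&mut target.volume_id, &mut replacement.volume_id);

    // Step 3: replace the target's address with the replacement's.
    volume.targets[slot] = replacement.addr;
    Ok(())
}
```

After a successful call, the old region belongs to the replacement's former volume, so it can be cleaned up separately, while the repaired volume references only the new region's address.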
jmpesp authored May 23, 2024
1 parent c2f3515 commit 4090983
Showing 28 changed files with 2,106 additions and 5 deletions.
6 changes: 6 additions & 0 deletions nexus/db-model/src/lib.rs
@@ -66,6 +66,8 @@ pub mod queries;
mod quota;
mod rack;
mod region;
mod region_replacement;
mod region_replacement_step;
mod region_snapshot;
mod role_assignment;
mod role_builtin;
@@ -98,6 +100,7 @@ mod virtual_provisioning_resource;
mod vmm;
mod vni;
mod volume;
mod volume_repair;
mod vpc;
mod vpc_firewall_rule;
mod vpc_route;
@@ -162,6 +165,8 @@ pub use project::*;
pub use quota::*;
pub use rack::*;
pub use region::*;
pub use region_replacement::*;
pub use region_replacement_step::*;
pub use region_snapshot::*;
pub use role_assignment::*;
pub use role_builtin::*;
@@ -195,6 +200,7 @@ pub use virtual_provisioning_resource::*;
pub use vmm::*;
pub use vni::*;
pub use volume::*;
pub use volume_repair::*;
pub use vpc::*;
pub use vpc_firewall_rule::*;
pub use vpc_route::*;
3 changes: 3 additions & 0 deletions nexus/db-model/src/region.rs
@@ -58,6 +58,9 @@ impl Region {
        }
    }

    pub fn id(&self) -> Uuid {
        self.identity.id
    }
    pub fn volume_id(&self) -> Uuid {
        self.volume_id
    }
165 changes: 165 additions & 0 deletions nexus/db-model/src/region_replacement.rs
@@ -0,0 +1,165 @@
// This Source Code Form is subject to the terms of the Mozilla Public
// License, v. 2.0. If a copy of the MPL was not distributed with this
// file, You can obtain one at https://mozilla.org/MPL/2.0/.

use super::impl_enum_type;
use crate::schema::region_replacement;
use crate::Region;
use chrono::DateTime;
use chrono::Utc;
use serde::{Deserialize, Serialize};
use uuid::Uuid;

impl_enum_type!(
    #[derive(SqlType, Debug, QueryId)]
    #[diesel(postgres_type(name = "region_replacement_state", schema = "public"))]
    pub struct RegionReplacementStateEnum;

    #[derive(Copy, Clone, Debug, AsExpression, FromSqlRow, Serialize, Deserialize, PartialEq)]
    #[diesel(sql_type = RegionReplacementStateEnum)]
    pub enum RegionReplacementState;

    // Enum values
    Requested => b"requested"
    Allocating => b"allocating"
    Running => b"running"
    Driving => b"driving"
    ReplacementDone => b"replacement_done"
    Completing => b"completing"
    Complete => b"complete"
);

impl std::str::FromStr for RegionReplacementState {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "requested" => Ok(RegionReplacementState::Requested),
            "allocating" => Ok(RegionReplacementState::Allocating),
            "running" => Ok(RegionReplacementState::Running),
            "driving" => Ok(RegionReplacementState::Driving),
            "replacement_done" => Ok(RegionReplacementState::ReplacementDone),
            "complete" => Ok(RegionReplacementState::Complete),
            "completing" => Ok(RegionReplacementState::Completing),
            _ => Err(format!("unrecognized value {} for enum", s)),
        }
    }
}

/// Database representation of a Region replacement request.
///
/// This record stores the data related to the operations required for Nexus to
/// orchestrate replacing a region in a volume. It transitions through the
/// following states:
///
/// ```text
///     Requested   <--              ---
///                   |              |
///         |         |              |
///         v         |              |  responsibility of region
///                   |              |  replacement start saga
///     Allocating  --               |
///                                  |
///         |                        |
///         v                        ---
///                                  ---
///      Running    <--              |
///                   |              |
///         |         |              |
///         v         |              |  responsibility of region
///                   |              |  replacement drive saga
///      Driving    --               |
///                                  |
///         |                        |
///         v                        ---
///                                  ---
///  ReplacementDone  <--            |
///                     |            |
///         |           |            |
///         v           |            |
///                     |            |  responsibility of region
///     Completing    --             |  replacement finish saga
///                                  |
///         |                        |
///         v                        |
///                                  |
///     Complete                     ---
/// ```
///
/// which are captured in the RegionReplacementState enum. Annotated on the
/// right are which sagas are responsible for which state transitions. The state
/// transitions themselves are performed by these sagas and all involve a query
/// that:
///
/// - checks that the starting state (and other values as required) make sense
/// - updates the state while setting a unique operating_saga_id (and any
///   other fields as appropriate)
///
/// As multiple background tasks will be waking up, checking to see what sagas
/// need to be triggered, and requesting that these region replacement sagas
/// run, this is meant to block multiple sagas from running at the same time in
/// an effort to cut down on interference - most will unwind at the first step
/// of performing this state transition instead of somewhere in the middle.
///
/// The correctness of a region replacement relies on certain operations
/// happening only when the record is in a certain state. For example: Nexus
/// should not undo a volume modification _after_ an upstairs has been sent a
/// replacement request, so volume modification happens at the Allocating state
/// (in the start saga), and replacement requests are only sent in the Driving
/// state (in the drive saga) - this ensures that replacement requests are only
/// sent if the start saga completed successfully, meaning the volume
/// modification was committed to the database and will not change or be
/// unwound.
///
/// See also: RegionReplacementStep records
#[derive(
    Queryable,
    Insertable,
    Debug,
    Clone,
    Selectable,
    Serialize,
    Deserialize,
    PartialEq,
)]
#[diesel(table_name = region_replacement)]
pub struct RegionReplacement {
    pub id: Uuid,

    pub request_time: DateTime<Utc>,

    /// The region being replaced
    pub old_region_id: Uuid,

    /// The volume whose region is being replaced
    pub volume_id: Uuid,

    /// A synthetic volume that is only used to later delete the old region
    pub old_region_volume_id: Option<Uuid>,

    /// The new region that will be used to replace the old one
    pub new_region_id: Option<Uuid>,

    pub replacement_state: RegionReplacementState,

    pub operating_saga_id: Option<Uuid>,
}

impl RegionReplacement {
    pub fn for_region(region: &Region) -> Self {
        Self::new(region.id(), region.volume_id())
    }

    pub fn new(old_region_id: Uuid, volume_id: Uuid) -> Self {
        Self {
            id: Uuid::new_v4(),
            request_time: Utc::now(),
            old_region_id,
            volume_id,
            old_region_volume_id: None,
            new_region_id: None,
            replacement_state: RegionReplacementState::Requested,
            operating_saga_id: None,
        }
    }
}
85 changes: 85 additions & 0 deletions nexus/db-model/src/region_replacement_step.rs
@@ -0,0 +1,85 @@
// This Source Code Form is subject to the terms of the Mozilla Public
// License, v. 2.0. If a copy of the MPL was not distributed with this
// file, You can obtain one at https://mozilla.org/MPL/2.0/.

use super::impl_enum_type;
use crate::ipv6;
use crate::schema::region_replacement_step;
use crate::SqlU16;
use chrono::{DateTime, Utc};
use serde::{Deserialize, Serialize};
use std::net::SocketAddrV6;
use uuid::Uuid;

impl_enum_type!(
    #[derive(SqlType, Debug, QueryId)]
    #[diesel(postgres_type(name = "region_replacement_step_type", schema = "public"))]
    pub struct RegionReplacementStepTypeEnum;

    #[derive(Copy, Clone, Debug, AsExpression, FromSqlRow, Serialize, Deserialize, PartialEq)]
    #[diesel(sql_type = RegionReplacementStepTypeEnum)]
    pub enum RegionReplacementStepType;

    // What is driving the repair forward?
    Propolis => b"propolis"
    Pantry => b"pantry"
);

/// Database representation of a Region replacement repair step
///
/// As region replacement takes place, Nexus will be making calls to services in
/// order to trigger the necessary Crucible operations meant to actually perform
/// the replacement. These steps are recorded in the database so that they can
/// be consulted by subsequent steps, and additionally act as breadcrumbs if
/// there is an issue.
///
/// See also: RegionReplacement records
#[derive(
    Queryable,
    Insertable,
    Debug,
    Clone,
    Selectable,
    Serialize,
    Deserialize,
    PartialEq,
)]
#[diesel(table_name = region_replacement_step)]
pub struct RegionReplacementStep {
    pub replacement_id: Uuid,

    pub step_time: DateTime<Utc>,

    pub step_type: RegionReplacementStepType,

    pub step_associated_instance_id: Option<Uuid>,
    pub step_associated_vmm_id: Option<Uuid>,

    pub step_associated_pantry_ip: Option<ipv6::Ipv6Addr>,
    pub step_associated_pantry_port: Option<SqlU16>,
    pub step_associated_pantry_job_id: Option<Uuid>,
}

impl RegionReplacementStep {
    pub fn instance_and_vmm_ids(&self) -> Option<(Uuid, Uuid)> {
        if self.step_type != RegionReplacementStepType::Propolis {
            return None;
        }

        let instance_id = self.step_associated_instance_id?;
        let vmm_id = self.step_associated_vmm_id?;

        Some((instance_id, vmm_id))
    }

    pub fn pantry_address(&self) -> Option<SocketAddrV6> {
        if self.step_type != RegionReplacementStepType::Pantry {
            return None;
        }

        let ip = self.step_associated_pantry_ip?;
        let port = self.step_associated_pantry_port?;

        Some(SocketAddrV6::new(*ip, *port, 0, 0))
    }
}
39 changes: 39 additions & 0 deletions nexus/db-model/src/schema.rs
@@ -1036,6 +1036,8 @@ table! {
    }
}

allow_tables_to_appear_in_same_query!(zpool, dataset);

table! {
    region (id) {
        id -> Uuid,
@@ -1051,6 +1053,8 @@ table! {
    }
}

allow_tables_to_appear_in_same_query!(zpool, region);

table! {
    region_snapshot (dataset_id, region_id, snapshot_id) {
        dataset_id -> Uuid,
@@ -1697,6 +1701,41 @@ table! {
    }
}

table! {
    region_replacement (id) {
        id -> Uuid,
        request_time -> Timestamptz,
        old_region_id -> Uuid,
        volume_id -> Uuid,
        old_region_volume_id -> Nullable<Uuid>,
        new_region_id -> Nullable<Uuid>,
        replacement_state -> crate::RegionReplacementStateEnum,
        operating_saga_id -> Nullable<Uuid>,
    }
}

table! {
    volume_repair (volume_id) {
        volume_id -> Uuid,
        repair_id -> Uuid,
    }
}

table! {
    region_replacement_step (replacement_id, step_time, step_type) {
        replacement_id -> Uuid,
        step_time -> Timestamptz,
        step_type -> crate::RegionReplacementStepTypeEnum,

        step_associated_instance_id -> Nullable<Uuid>,
        step_associated_vmm_id -> Nullable<Uuid>,

        step_associated_pantry_ip -> Nullable<Inet>,
        step_associated_pantry_port -> Nullable<Int4>,
        step_associated_pantry_job_id -> Nullable<Uuid>,
    }
}

table! {
    db_metadata (singleton) {
        singleton -> Bool,
3 changes: 2 additions & 1 deletion nexus/db-model/src/schema_versions.rs
@@ -17,7 +17,7 @@ use std::collections::BTreeMap;
///
/// This must be updated when you change the database schema. Refer to
/// schema/crdb/README.adoc in the root of this repository for details.
-pub const SCHEMA_VERSION: SemverVersion = SemverVersion::new(64, 0, 0);
+pub const SCHEMA_VERSION: SemverVersion = SemverVersion::new(65, 0, 0);

/// List of all past database schema versions, in *reverse* order
///
@@ -29,6 +29,7 @@ static KNOWN_VERSIONS: Lazy<Vec<KnownVersion>> = Lazy::new(|| {
        // | leaving the first copy as an example for the next person.
        // v
        // KnownVersion::new(next_int, "unique-dirname-with-the-sql-files"),
        KnownVersion::new(65, "region-replacement"),
        KnownVersion::new(64, "add-view-for-v2p-mappings"),
        KnownVersion::new(63, "remove-producer-base-route-column"),
        KnownVersion::new(62, "allocate-subnet-decommissioned-sleds"),
1 change: 1 addition & 0 deletions nexus/db-model/src/upstairs_repair.rs
@@ -106,6 +106,7 @@ pub struct UpstairsRepairNotification {
    pub upstairs_id: DbTypedUuid<UpstairsKind>,
    pub session_id: DbTypedUuid<UpstairsSessionKind>,

    // The Downstairs being repaired
    pub region_id: DbTypedUuid<DownstairsRegionKind>,
    pub target_ip: ipv6::Ipv6Addr,
    pub target_port: SqlU16,