Split instance state into Instance and VMM tables (#4194)
Refactor the definition of an `Instance` throughout the control plane so that an `Instance` is separate from the `Vmm`s that incarnate it. This confers several advantages:

- VMMs have their own state that sled agent can update without necessarily changing their instance's state. It's also possible to change an instance's active Propolis ID without having to know or update its Propolis IP or current sled ID, since those change along with the active Propolis ID. This removes a great deal of complexity in sled agent, especially when live migrating an instance, and also simplifies the live migration saga considerably.
- Resource reservations for instances have much clearer lifetimes: a reservation can be released when its VMM has moved to a terminal state. Nexus no longer has to infer VMM lifetimes from changes to an instance's Propolis ID columns.
- It's now possible for an Instance not to have an active Propolis at all! This allows an instance to avoid reserving sled resources when it's not running. It also allows an instance to stop and restart on a different sled.
- It's also possible to get a history of an instance's VMMs for, e.g., zone bundle examination purposes ("my VMM had a problem two days ago but it went away when I stopped and restarted it; can you investigate?").

Rework callers throughout Nexus that depend on knowing an instance's current state and/or its current sled ID. In many cases (e.g. disk and NIC attach and detach), the relevant question is whether the instance has an active Propolis; for simplicity, these checks now look for an active Propolis ID instead of trying to fetch both instance and VMM states. A rough sketch of the split model appears after the list below.

## Known issues/remaining work

- The virtual provisioning table is still updated only at instance creation/deletion time. Usage metrics that depend on this table might report strange and wonderful values if a user creates many more instances than can be started at one time.
- Instances still use the generic "resource attachment" CTE to manage attaching and detaching disks. Previously these queries looked at instance states; now they look at an instance's state and whether it has an active Propolis, but not at the active Propolis's state. This will need to be revisited in the future to support disk hotplug.
- `handle_instance_put_result` is still very aggressive about setting instances to the Failed state if sled agent returns errors other than invalid-request-flavored errors. I think we should reconsider this behavior, but this change is big enough as it is. I will file a TODO for this and update the new comments accordingly before this merges.
- The new live migration logic is not tested yet and differs from the "two-table" TLA+ model in RFD 361. More work will be needed here before we can declare live migration fully ready for self-hosting.
- It would be nice to have an `omdb vmm` command; for now I've just updated existing `omdb` commands to deal with the optionality of Propolises and sleds.
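To make the shape of the split concrete, here is a minimal, hypothetical sketch of the two-record model described above. The names (`Vmm`, `VmmState`, `active_propolis_id`, `has_active_propolis`) are illustrative assumptions for this writeup, not the actual Diesel models or APIs introduced by this change.

```rust
// Illustrative sketch only; field and type names do not mirror the real
// Nexus database models.

use std::net::SocketAddr;
use uuid::Uuid; // assumes the `uuid` crate is available

/// States a VMM (a Propolis process hosting an instance) can be in.
#[derive(Clone, Copy, PartialEq, Eq)]
enum VmmState {
    Starting,
    Running,
    Stopping,
    Destroyed, // terminal: the sled resource reservation can be released
}

/// One incarnation of an instance on a particular sled. Sled agent can
/// update this record without touching the owning instance's record.
struct Vmm {
    id: Uuid,
    instance_id: Uuid,
    sled_id: Uuid,
    propolis_addr: SocketAddr,
    state: VmmState,
}

/// The instance itself no longer carries a sled ID or Propolis IP; it only
/// points (optionally) at its active VMM.
struct Instance {
    id: Uuid,
    active_propolis_id: Option<Uuid>,
}

impl Instance {
    /// The simplified check used for operations like disk and NIC
    /// attach/detach: does the instance currently have an active Propolis?
    fn has_active_propolis(&self) -> bool {
        self.active_propolis_id.is_some()
    }
}

impl Vmm {
    /// A reservation's lifetime follows the VMM: it can be released once
    /// the VMM reaches a terminal state.
    fn can_release_reservation(&self) -> bool {
        self.state == VmmState::Destroyed
    }
}
```

The key point of the sketch is the `Option` on `active_propolis_id`: an instance with no active Propolis is now directly representable, which is what lets a stopped instance hold no sled resources and restart on a different sled.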
Tests:

- Unit/integration tests
- On a single-machine dev cluster, created two instances and verified that:
  - The instances only have resource reservations while they're running (and they reserve reservoir space now)
  - The instances can reach each other over their internal and external IPs when they're both running (and can still reach each other even if you try to delete one while it's active)
  - `scadm` shows the appropriate IP mappings being added/deleted as the instances start/stop
  - The instances' serial consoles work as expected
  - Attaching a new disk to an instance is only possible if the instance is stopped
  - Disk snapshot succeeds when invoked on a running instance's attached disk
  - Deleting an instance detaches its disks
- `omicron-stress` on a single-machine dev cluster ran for about an hour and created ~800 instances without any instances going to the Failed state (previously this would happen in the first 5-10 minutes)