Split instance state into Instance and VMM tables #4194

Merged Oct 12, 2023 (59 commits)
Changes from 43 commits
Commits
4f56437
Define vmm table & update instance table
gjcolombo Sep 25, 2023
ed4f8e0
update DB model types for instances and VMMs
gjcolombo Sep 25, 2023
75a2f6a
redefine internal instance/vmm runtime state types
gjcolombo Sep 25, 2023
27eb921
Update sled-agent params types
gjcolombo Sep 26, 2023
f1ada46
sled agent: rework instance state management
gjcolombo Sep 26, 2023
cae4922
Rework InstanceInner -> InstanceStates contract & termination logic
gjcolombo Sep 27, 2023
74257ae
Rework simulated instances
gjcolombo Sep 27, 2023
a5c3488
Fix up simulated sled agent collection tests
gjcolombo Sep 27, 2023
4bcced2
Plumb parameters from sled agent entry points
gjcolombo Sep 27, 2023
39b7e64
sled agent: clean up test transcription errors
gjcolombo Sep 27, 2023
70cc2fb
update sled agent OpenAPI spec
gjcolombo Sep 27, 2023
854c77f
sled agent: reimplement generated-type From impls
gjcolombo Sep 27, 2023
46560b2
nexus: adapt some db queries to instance/vmm split
gjcolombo Sep 27, 2023
df011d5
Draft CTE for updating an instance/VMM in a single statement
gjcolombo Sep 27, 2023
896ec96
nexus: fix network interface queries
gjcolombo Sep 28, 2023
f5241c3
clean up test code that assumes instances always have sleds
gjcolombo Sep 28, 2023
94021a0
nexus: return instance/vmm info tuples; fix up serial console APIs
gjcolombo Sep 29, 2023
38b3939
Fix handling of instance/vmm state changes from sled agent
gjcolombo Sep 29, 2023
7ec7f84
Update Nexus::handle_instance_put_result
gjcolombo Sep 29, 2023
8dfdfdd
Update Nexus::instance_request_state
gjcolombo Sep 29, 2023
6dfeae1
Remove Nexus::instance_sled
gjcolombo Sep 29, 2023
3952f90
Rework instance create saga
gjcolombo Sep 29, 2023
bcc2f3b
Update instance start saga
gjcolombo Sep 29, 2023
0edf89c
Update instance migration saga
gjcolombo Sep 29, 2023
d5ab2f2
Update instance delete saga
gjcolombo Sep 29, 2023
cafa141
Update disk snapshot saga
gjcolombo Sep 29, 2023
781503e
Fix build errors
gjcolombo Sep 29, 2023
28683b8
cleanup: use dendrite deletion helper from start saga undo
gjcolombo Sep 30, 2023
bac9309
Actually check reservoir space when allocating to sleds
gjcolombo Sep 30, 2023
b3df607
bugfix: reorder instance columns in Diesel table schema
gjcolombo Sep 30, 2023
fd30291
bugfix: only query ID in 'find' prongs of instance/vmm update CTE
gjcolombo Sep 30, 2023
902c0a0
bugfix: use a valid state name in instance state subquery
gjcolombo Sep 30, 2023
7ffd8ef
Update Nexus OpenAPI spec
gjcolombo Sep 30, 2023
db4fa9c
Re-enable sled agent -> Nexus updates; fix tests
gjcolombo Oct 1, 2023
6534c38
Merge fix: use connections instead of pool in new code
gjcolombo Oct 2, 2023
521163f
remove dead code
gjcolombo Oct 2, 2023
6114aec
Update omdb to handle vmm optionality
gjcolombo Oct 2, 2023
fe709ad
clippy
gjcolombo Oct 2, 2023
5285226
omdb: fix expected schema version in tests
gjcolombo Oct 2, 2023
b8709e7
Don't abort instance monitor task on RunningState drop
gjcolombo Oct 3, 2023
f635ba1
Report Propolis Stopped state to Nexus as Stopping
gjcolombo Oct 3, 2023
71dc6e9
Remove instances from instance manager if terminated before starting
gjcolombo Oct 3, 2023
9a93a78
fix doc build
gjcolombo Oct 3, 2023
20c0d39
Merge branch 'main' into gjcolombo/your-vmm-table-is-ready
gjcolombo Oct 6, 2023
98bd378
Remove immutable data from VmmRuntimeState API struct
gjcolombo Oct 6, 2023
f20a6d8
Remove useless parameter from db::model::Instance constructor
gjcolombo Oct 6, 2023
f845e3a
clean up instance update-and-check CTE
gjcolombo Oct 6, 2023
32358bf
use Running as the state for instances w/active VMMs
gjcolombo Oct 6, 2023
3f41b17
clarify remarks re safety of rude termination
gjcolombo Oct 6, 2023
e688552
break schema upgrade into multiple statements
gjcolombo Oct 6, 2023
78f72e5
Standardize on "propolis_id" name in most places
gjcolombo Oct 6, 2023
d2a0636
make Nexus sole owner of Instance records' state values
gjcolombo Oct 7, 2023
b8cbde5
amend schema update README to suggest named constraints
gjcolombo Oct 7, 2023
8923fb4
improve comment on instance_and_vmm_update_runtime
gjcolombo Oct 7, 2023
e5b2a54
add issue number to TODO
gjcolombo Oct 7, 2023
2933c4b
clarify doc comment
gjcolombo Oct 9, 2023
6b5dfd3
Merge branch 'main' into gjcolombo/your-vmm-table-is-ready
gjcolombo Oct 11, 2023
ec65227
Merge branch 'main' into gjcolombo/your-vmm-table-is-ready
gjcolombo Oct 12, 2023
acbaf43
pick up Diesel error changes in new code
gjcolombo Oct 12, 2023
31 changes: 1 addition & 30 deletions common/src/api/external/mod.rs
@@ -739,6 +739,7 @@ pub enum ResourceType {
UpdateableComponent,
UserBuiltin,
Zpool,
Vmm,
}

// IDENTITY METADATA
@@ -866,25 +867,6 @@ impl InstanceState {
InstanceState::Destroyed => "destroyed",
}
}

/// Returns true if the given state represents a fully stopped Instance.
/// This means that a transition from an !is_stopped() state must go
/// through Stopping.
pub fn is_stopped(&self) -> bool {
match self {
InstanceState::Starting => false,
InstanceState::Running => false,
InstanceState::Stopping => false,
InstanceState::Rebooting => false,
InstanceState::Migrating => false,

InstanceState::Creating => true,
InstanceState::Stopped => true,
InstanceState::Repairing => true,
InstanceState::Failed => true,
InstanceState::Destroyed => true,
}
}
}

/// The number of CPUs in an Instance
@@ -912,17 +894,6 @@ pub struct InstanceRuntimeState {
pub time_run_state_updated: DateTime<Utc>,
}

impl From<crate::api::internal::nexus::InstanceRuntimeState>
for InstanceRuntimeState
{
fn from(state: crate::api::internal::nexus::InstanceRuntimeState) -> Self {
InstanceRuntimeState {
run_state: state.run_state,
time_run_state_updated: state.time_updated,
}
}
}

/// View of an Instance
#[derive(ObjectIdentity, Clone, Debug, Deserialize, Serialize, JsonSchema)]
pub struct Instance {
75 changes: 50 additions & 25 deletions common/src/api/internal/nexus.rs
@@ -29,40 +29,65 @@ pub struct DiskRuntimeState {
pub time_updated: DateTime<Utc>,
}

/// Runtime state of the Instance, including the actual running state and minimal
/// metadata
///
/// This state is owned by the sled agent running that Instance.
/// The "static" properties of an instance: information about the instance that
/// doesn't change while the instance is running.
#[derive(Clone, Debug, Deserialize, Serialize, JsonSchema)]
pub struct InstanceRuntimeState {
/// runtime state of the Instance
pub run_state: InstanceState,
/// which sled is running this Instance
pub sled_id: Uuid,
/// which propolis-server is running this Instance
pub propolis_id: Uuid,
/// the target propolis-server during a migration of this Instance
pub dst_propolis_id: Option<Uuid>,
/// address of propolis-server running this Instance
pub propolis_addr: Option<SocketAddr>,
/// migration id (if one in process)
pub migration_id: Option<Uuid>,
/// The generation number for the Propolis and sled identifiers for this
/// instance.
pub propolis_gen: Generation,
/// number of CPUs allocated for this Instance
pub struct InstanceProperties {
pub ncpus: InstanceCpuCount,
/// memory allocated for this Instance
pub memory: ByteCount,
/// RFC1035-compliant hostname for the Instance.
/// RFC1035-compliant hostname for the instance.
// TODO-cleanup different type?
pub hostname: String,
/// generation number for this state
}

/// The dynamic runtime properties of an instance: its current VMM ID (if any),
/// migration information (if any), and the instance state to report if there is
/// no active VMM.
#[derive(Clone, Debug, Deserialize, Serialize, JsonSchema)]
pub struct InstanceRuntimeState {
/// The state of the instance if it has no active VMM.
pub fallback_state: InstanceState,
/// The instance's currently active VMM ID.
pub propolis_id: Option<Uuid>,
/// If a migration is active, the ID of the target VMM.
pub dst_propolis_id: Option<Uuid>,
/// If a migration is active, the ID of that migration.
pub migration_id: Option<Uuid>,
/// Generation number for this state.
pub gen: Generation,
/// timestamp for this information
/// Timestamp for this information.
pub time_updated: DateTime<Utc>,
}

/// The dynamic runtime properties of an individual VMM process.
#[derive(Clone, Debug, Deserialize, Serialize, JsonSchema)]
pub struct VmmRuntimeState {
/// The last state reported by this VMM.
pub state: InstanceState,
/// The generation number for this VMM's state.
pub gen: Generation,
/// The sled where this VMM is running.
pub sled_id: Uuid,
/// The IP of this VMM's Propolis server.
pub propolis_addr: SocketAddr,
/// Timestamp for the VMM's state.
pub time_updated: DateTime<Utc>,
}

/// A wrapper type containing a sled's total knowledge of the state of a
/// specific VMM and the instance it incarnates.
#[derive(Clone, Debug, Deserialize, Serialize, JsonSchema)]
pub struct SledInstanceState {
/// The sled's conception of the state of the instance.
pub instance_state: InstanceRuntimeState,

/// The ID of the VMM whose state is being reported.
pub vmm_id: Uuid,

/// The most recent state of the sled's VMM process.
pub vmm_state: VmmRuntimeState,
}

// Oximeter producer/collector objects.

/// Information announced by a metric server, used so that clients can contact it and collect
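The doc comments above describe `fallback_state` as the state to report when an instance has no active VMM, while `VmmRuntimeState::state` carries the last state a VMM reported. A minimal sketch of how a caller might combine the two follows. The types are simplified stand-ins with a subset of variants, and `effective_state` here is a hypothetical free function; the body of the PR's own `effective_state()` is not shown in this diff, and it may apply extra rules (one commit message mentions using Running for instances with active VMMs).

```rust
/// Simplified stand-in for the external `InstanceState` enum.
/// Only a subset of the real variants is listed.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum InstanceState {
    Creating,
    Starting,
    Running,
    Stopping,
    Stopped,
}

/// Stand-in for `InstanceRuntimeState`, reduced to the one field
/// this sketch needs.
struct InstanceRuntimeState {
    fallback_state: InstanceState,
}

/// Stand-in for `VmmRuntimeState`, reduced the same way.
struct VmmRuntimeState {
    state: InstanceState,
}

/// Hypothetical helper (an assumption, not the PR's code): prefer the
/// active VMM's last reported state, and fall back to the instance
/// record's state when no VMM is active.
fn effective_state(
    instance: &InstanceRuntimeState,
    active_vmm: Option<&VmmRuntimeState>,
) -> InstanceState {
    match active_vmm {
        Some(vmm) => vmm.state,
        None => instance.fallback_state,
    }
}
```

Keeping the fallback on the instance record means a stopped instance can still report a state even though no VMM row exists for it.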
132 changes: 103 additions & 29 deletions dev-tools/omdb/src/bin/omdb/db.rs
@@ -23,7 +23,10 @@ use clap::Subcommand;
use clap::ValueEnum;
use diesel::expression::SelectableHelper;
use diesel::query_dsl::QueryDsl;
use diesel::BoolExpressionMethods;
use diesel::ExpressionMethods;
use diesel::JoinOnDsl;
use diesel::NullableExpressionMethods;
use nexus_db_model::Dataset;
use nexus_db_model::Disk;
use nexus_db_model::DnsGroup;
@@ -33,9 +36,11 @@ use nexus_db_model::DnsZone;
use nexus_db_model::Instance;
use nexus_db_model::Region;
use nexus_db_model::Sled;
use nexus_db_model::Vmm;
use nexus_db_model::Zpool;
use nexus_db_queries::context::OpContext;
use nexus_db_queries::db;
use nexus_db_queries::db::datastore::InstanceAndActiveVmm;
use nexus_db_queries::db::identity::Asset;
use nexus_db_queries::db::lookup::LookupPath;
use nexus_db_queries::db::model::ServiceKind;
@@ -56,6 +61,44 @@ use strum::IntoEnumIterator;
use tabled::Tabled;
use uuid::Uuid;

const NO_ACTIVE_PROPOLIS_MSG: &str = "<no active Propolis>";
const NOT_ON_SLED_MSG: &str = "<not on any sled>";

struct MaybePropolisId(Option<Uuid>);
struct MaybeSledId(Option<Uuid>);

impl From<&InstanceAndActiveVmm> for MaybePropolisId {
fn from(value: &InstanceAndActiveVmm) -> Self {
Self(value.instance().runtime().propolis_id)
}
}

impl Display for MaybePropolisId {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
if let Some(id) = self.0 {
write!(f, "{}", id)
} else {
write!(f, "{}", NO_ACTIVE_PROPOLIS_MSG)
}
}
}

impl From<&InstanceAndActiveVmm> for MaybeSledId {
fn from(value: &InstanceAndActiveVmm) -> Self {
Self(value.sled_id())
}
}

impl Display for MaybeSledId {
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
if let Some(id) = self.0 {
write!(f, "{}", id)
} else {
write!(f, "{}", NOT_ON_SLED_MSG)
}
}
}

#[derive(Debug, Args)]
pub struct DbArgs {
/// URL of the database SQL interface
@@ -443,33 +486,54 @@ async fn cmd_db_disk_info(
if let Some(instance_uuid) = disk.runtime().attach_instance_id {
// Get the instance this disk is attached to
use db::schema::instance::dsl as instance_dsl;
let instance = instance_dsl::instance
use db::schema::vmm::dsl as vmm_dsl;
let instances: Vec<InstanceAndActiveVmm> = instance_dsl::instance
.filter(instance_dsl::id.eq(instance_uuid))
.left_join(
vmm_dsl::vmm.on(vmm_dsl::id
.nullable()
.eq(instance_dsl::active_propolis_id)
.and(vmm_dsl::time_deleted.is_null())),
)
.limit(1)
.select(Instance::as_select())
.select((Instance::as_select(), Option::<Vmm>::as_select()))
.load_async(&*conn)
.await
.context("loading requested instance")?;
.context("loading requested instance")?
.into_iter()
.map(|i: (Instance, Option<Vmm>)| i.into())
.collect();

let Some(instance) = instance.into_iter().next() else {
let Some(instance) = instances.into_iter().next() else {
bail!("no instance: {} found", instance_uuid);
};

let instance_name = instance.name().to_string();
let propolis_id = instance.runtime().propolis_id.to_string();
let my_sled_id = instance.runtime().sled_id;
let instance_name = instance.instance().name().to_string();
let disk_name = disk.name().to_string();
let usr = if instance.vmm().is_some() {
let propolis_id =
instance.instance().runtime().propolis_id.unwrap();
let my_sled_id = instance.sled_id().unwrap();

let (_, my_sled) = LookupPath::new(opctx, datastore)
.sled_id(my_sled_id)
.fetch()
.await
.context("failed to look up sled")?;
let (_, my_sled) = LookupPath::new(opctx, datastore)
.sled_id(my_sled_id)
.fetch()
.await
.context("failed to look up sled")?;

let usr = UpstairsRow {
host_serial: my_sled.serial_number().to_string(),
disk_name: disk.name().to_string(),
instance_name,
propolis_zone: format!("oxz_propolis-server_{}", propolis_id),
UpstairsRow {
host_serial: my_sled.serial_number().to_string(),
disk_name,
instance_name,
propolis_zone: format!("oxz_propolis-server_{}", propolis_id),
}
} else {
UpstairsRow {
host_serial: NOT_ON_SLED_MSG.to_string(),
propolis_zone: NO_ACTIVE_PROPOLIS_MSG.to_string(),
disk_name,
instance_name,
}
};
rows.push(usr);
} else {
@@ -661,7 +725,7 @@ async fn cmd_db_disk_physical(
name: disk.name().to_string(),
id: disk.id().to_string(),
state: disk.runtime().disk_state,
instance_name: instance_name,
instance_name,
});
}

@@ -855,17 +919,17 @@ async fn cmd_db_sleds(
struct CustomerInstanceRow {
id: Uuid,
state: String,
propolis_id: Uuid,
sled_id: Uuid,
propolis_id: MaybePropolisId,
sled_id: MaybeSledId,
}

impl From<Instance> for CustomerInstanceRow {
fn from(i: Instance) -> Self {
impl From<InstanceAndActiveVmm> for CustomerInstanceRow {
fn from(i: InstanceAndActiveVmm) -> Self {
CustomerInstanceRow {
id: i.id(),
state: format!("{:?}", i.runtime_state.state.0),
propolis_id: i.runtime_state.propolis_id,
sled_id: i.runtime_state.sled_id,
id: i.instance().id(),
state: format!("{:?}", i.effective_state()),
propolis_id: (&i).into(),
sled_id: (&i).into(),
}
}
}
@@ -876,12 +940,22 @@ async fn cmd_db_instances(
limit: NonZeroU32,
) -> Result<(), anyhow::Error> {
use db::schema::instance::dsl;
let instances = dsl::instance
use db::schema::vmm::dsl as vmm_dsl;
let instances: Vec<InstanceAndActiveVmm> = dsl::instance
.left_join(
vmm_dsl::vmm.on(vmm_dsl::id
.nullable()
.eq(dsl::active_propolis_id)
.and(vmm_dsl::time_deleted.is_null())),
)
.limit(i64::from(u32::from(limit)))
.select(Instance::as_select())
.select((Instance::as_select(), Option::<Vmm>::as_select()))
.load_async(&*datastore.pool_connection_for_tests().await?)
.await
.context("loading instances")?;
.context("loading instances")?
.into_iter()
.map(|i: (Instance, Option<Vmm>)| i.into())
.collect();

let ctx = || "listing instances".to_string();
check_limit(&instances, limit, ctx);
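The omdb changes above follow one pattern: a left join yields `(Instance, Option<Vmm>)` rows, each row is converted into an `InstanceAndActiveVmm`, and the optional sled ID is rendered through a small `Display` wrapper. The sketch below reproduces that flow using only the standard library. The struct definitions are hypothetical stand-ins for the database model types (with `u64` in place of `Uuid`), and `instance_rows` is an illustrative helper that does not appear in the PR.

```rust
use std::fmt;

const NOT_ON_SLED_MSG: &str = "<not on any sled>";

// Hypothetical stand-ins for the database model types; `u64` replaces
// `Uuid` so the sketch needs nothing beyond the standard library.
struct Instance {
    id: u64,
}

struct Vmm {
    sled_id: u64,
}

/// An instance paired with its active VMM, if the left join found one.
struct InstanceAndActiveVmm {
    instance: Instance,
    vmm: Option<Vmm>,
}

impl From<(Instance, Option<Vmm>)> for InstanceAndActiveVmm {
    fn from((instance, vmm): (Instance, Option<Vmm>)) -> Self {
        Self { instance, vmm }
    }
}

impl InstanceAndActiveVmm {
    /// The sled belongs to the VMM, so it is absent when no VMM is active.
    fn sled_id(&self) -> Option<u64> {
        self.vmm.as_ref().map(|v| v.sled_id)
    }
}

/// Display wrapper that prints a placeholder instead of a missing ID.
struct MaybeSledId(Option<u64>);

impl From<&InstanceAndActiveVmm> for MaybeSledId {
    fn from(value: &InstanceAndActiveVmm) -> Self {
        Self(value.sled_id())
    }
}

impl fmt::Display for MaybeSledId {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self.0 {
            Some(id) => write!(f, "{}", id),
            None => write!(f, "{}", NOT_ON_SLED_MSG),
        }
    }
}

/// Illustrative helper (not in the PR): turn raw join rows into
/// `(instance id, sled column text)` pairs.
fn instance_rows(rows: Vec<(Instance, Option<Vmm>)>) -> Vec<(u64, String)> {
    rows.into_iter()
        .map(InstanceAndActiveVmm::from)
        .map(|i| (i.instance.id, MaybeSledId::from(&i).to_string()))
        .collect()
}
```

Wrapping the `Option` in a dedicated type keeps the placeholder text in one place, so table rows can hold the wrapper directly instead of formatting strings at every call site.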
6 changes: 3 additions & 3 deletions dev-tools/omdb/tests/env.out
@@ -7,7 +7,7 @@ sim-b6d65341 [::1]:REDACTED_PORT - REDACTED_UUID_REDACTED_UUID_REDACTED
---------------------------------------------
stderr:
note: using database URL postgresql://root@[::1]:REDACTED_PORT/omicron?sslmode=disable
note: database schema version matches expected (5.0.0)
note: database schema version matches expected (6.0.0)
=============================================
EXECUTING COMMAND: omdb ["db", "--db-url", "junk", "sleds"]
termination: Exited(2)
@@ -172,7 +172,7 @@ stderr:
note: database URL not specified. Will search DNS.
note: (override with --db-url or OMDB_DB_URL)
note: using database URL postgresql://root@[::1]:REDACTED_PORT/omicron?sslmode=disable
note: database schema version matches expected (5.0.0)
note: database schema version matches expected (6.0.0)
=============================================
EXECUTING COMMAND: omdb ["--dns-server", "[::1]:REDACTED_PORT", "db", "sleds"]
termination: Exited(0)
@@ -185,5 +185,5 @@ stderr:
note: database URL not specified. Will search DNS.
note: (override with --db-url or OMDB_DB_URL)
note: using database URL postgresql://root@[::1]:REDACTED_PORT/omicron?sslmode=disable
note: database schema version matches expected (5.0.0)
note: database schema version matches expected (6.0.0)
=============================================