Split instance state into Instance and VMM tables #4194

Merged 59 commits on Oct 12, 2023

Changes from 1 commit

Commits (59)
4f56437
Define vmm table & update instance table
gjcolombo Sep 25, 2023
ed4f8e0
update DB model types for instances and VMMs
gjcolombo Sep 25, 2023
75a2f6a
redefine internal instance/vmm runtime state types
gjcolombo Sep 25, 2023
27eb921
Update sled-agent params types
gjcolombo Sep 26, 2023
f1ada46
sled agent: rework instance state management
gjcolombo Sep 26, 2023
cae4922
Rework InstanceInner -> InstanceStates contract & termination logic
gjcolombo Sep 27, 2023
74257ae
Rework simulated instances
gjcolombo Sep 27, 2023
a5c3488
Fix up simulated sled agent collection tests
gjcolombo Sep 27, 2023
4bcced2
Plumb parameters from sled agent entry points
gjcolombo Sep 27, 2023
39b7e64
sled agent: clean up test transcription errors
gjcolombo Sep 27, 2023
70cc2fb
update sled agent OpenAPI spec
gjcolombo Sep 27, 2023
854c77f
sled agent: reimplement generated-type From impls
gjcolombo Sep 27, 2023
46560b2
nexus: adapt some db queries to instance/vmm split
gjcolombo Sep 27, 2023
df011d5
Draft CTE for updating an instance/VMM in a single statement
gjcolombo Sep 27, 2023
896ec96
nexus: fix network interface queries
gjcolombo Sep 28, 2023
f5241c3
clean up test code that assumes instances always have sleds
gjcolombo Sep 28, 2023
94021a0
nexus: return instance/vmm info tuples; fix up serial console APIs
gjcolombo Sep 29, 2023
38b3939
Fix handling of instance/vmm state changes from sled agent
gjcolombo Sep 29, 2023
7ec7f84
Update Nexus::handle_instance_put_result
gjcolombo Sep 29, 2023
8dfdfdd
Update Nexus::instance_request_state
gjcolombo Sep 29, 2023
6dfeae1
Remove Nexus::instance_sled
gjcolombo Sep 29, 2023
3952f90
Rework instance create saga
gjcolombo Sep 29, 2023
bcc2f3b
Update instance start saga
gjcolombo Sep 29, 2023
0edf89c
Update instance migration saga
gjcolombo Sep 29, 2023
d5ab2f2
Update instance delete saga
gjcolombo Sep 29, 2023
cafa141
Update disk snapshot saga
gjcolombo Sep 29, 2023
781503e
Fix build errors
gjcolombo Sep 29, 2023
28683b8
cleanup: use dendrite deletion helper from start saga undo
gjcolombo Sep 30, 2023
bac9309
Actually check reservoir space when allocating to sleds
gjcolombo Sep 30, 2023
b3df607
bugfix: reorder instance columns in Diesel table schema
gjcolombo Sep 30, 2023
fd30291
bugfix: only query ID in 'find' prongs of instance/vmm update CTE
gjcolombo Sep 30, 2023
902c0a0
bugfix: use a valid state name in instance state subquery
gjcolombo Sep 30, 2023
7ffd8ef
Update Nexus OpenAPI spec
gjcolombo Sep 30, 2023
db4fa9c
Re-enable sled agent -> Nexus updates; fix tests
gjcolombo Oct 1, 2023
6534c38
Merge fix: use connections instead of pool in new code
gjcolombo Oct 2, 2023
521163f
remove dead code
gjcolombo Oct 2, 2023
6114aec
Update omdb to handle vmm optionality
gjcolombo Oct 2, 2023
fe709ad
clippy
gjcolombo Oct 2, 2023
5285226
omdb: fix expected schema version in tests
gjcolombo Oct 2, 2023
b8709e7
Don't abort instance monitor task on RunningState drop
gjcolombo Oct 3, 2023
f635ba1
Report Propolis Stopped state to Nexus as Stopping
gjcolombo Oct 3, 2023
71dc6e9
Remove instances from instance manager if terminated before starting
gjcolombo Oct 3, 2023
9a93a78
fix doc build
gjcolombo Oct 3, 2023
20c0d39
Merge branch 'main' into gjcolombo/your-vmm-table-is-ready
gjcolombo Oct 6, 2023
98bd378
Remove immutable data from VmmRuntimeState API struct
gjcolombo Oct 6, 2023
f20a6d8
Remove useless parameter from db::model::Instance constructor
gjcolombo Oct 6, 2023
f845e3a
clean up instance update-and-check CTE
gjcolombo Oct 6, 2023
32358bf
use Running as the state for instances w/active VMMs
gjcolombo Oct 6, 2023
3f41b17
clarify remarks re safety of rude termination
gjcolombo Oct 6, 2023
e688552
break schema upgrade into multiple statements
gjcolombo Oct 6, 2023
78f72e5
Standardize on "propolis_id" name in most places
gjcolombo Oct 6, 2023
d2a0636
make Nexus sole owner of Instance records' state values
gjcolombo Oct 7, 2023
b8cbde5
amend schema update README to suggest named constraints
gjcolombo Oct 7, 2023
8923fb4
improve comment on instance_and_vmm_update_runtime
gjcolombo Oct 7, 2023
e5b2a54
add issue number to TODO
gjcolombo Oct 7, 2023
2933c4b
clarify doc comment
gjcolombo Oct 9, 2023
6b5dfd3
Merge branch 'main' into gjcolombo/your-vmm-table-is-ready
gjcolombo Oct 11, 2023
ec65227
Merge branch 'main' into gjcolombo/your-vmm-table-is-ready
gjcolombo Oct 12, 2023
acbaf43
pick up Diesel error changes in new code
gjcolombo Oct 12, 2023
Fix up simulated sled agent collection tests
These need some minor restructuring to handle the new shape of the
InstanceStates structure.

Do away with the `is_stopped` function for instance states; it isn't used
anywhere outside of tests, and the tests can instead set their own expectations
for whether a VMM should pass through the Stopping state (see the sketch below).
gjcolombo committed Oct 2, 2023
commit a5c3488a28c55393dcd6fae9b78e96a3c7053e8f
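
Before the diffs, a minimal sketch of the expectation style the tests adopt once `is_stopped()` is removed: each test spells out the exact VMM state sequence it expects. `InstanceState` is the enum defined in common/src/api/external/mod.rs; the helper name `expected_stop_sequence` is hypothetical, used only for illustration, and not part of this commit.

use omicron_common::api::external::InstanceState;

/// Hypothetical helper: the VMM state sequence a test expects after asking a
/// simulated instance to stop, depending on whether the VMM ever started.
fn expected_stop_sequence(vmm_was_started: bool) -> Vec<InstanceState> {
    if vmm_was_started {
        // A started VMM passes through Stopping and Stopped before it is
        // Destroyed (Propolis publishes its own Stopping before Stopped).
        vec![
            InstanceState::Stopping,
            InstanceState::Stopped,
            InstanceState::Destroyed,
        ]
    } else {
        // A VMM that never started is destroyed immediately; the instance's
        // fallback state goes straight to Stopped.
        vec![InstanceState::Destroyed]
    }
}

A test stopping a never-started instance would then compare the observed VMM states against `expected_stop_sequence(false)` rather than consulting `is_stopped()`.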
19 changes: 0 additions & 19 deletions common/src/api/external/mod.rs
@@ -866,25 +866,6 @@ impl InstanceState {
InstanceState::Destroyed => "destroyed",
}
}

/// Returns true if the given state represents a fully stopped Instance.
/// This means that a transition from an !is_stopped() state must go
/// through Stopping.
pub fn is_stopped(&self) -> bool {
match self {
InstanceState::Starting => false,
InstanceState::Running => false,
InstanceState::Stopping => false,
InstanceState::Rebooting => false,
InstanceState::Migrating => false,

InstanceState::Creating => true,
InstanceState::Stopped => true,
InstanceState::Repairing => true,
InstanceState::Failed => true,
InstanceState::Destroyed => true,
}
}
}

/// The number of CPUs in an Instance
195 changes: 113 additions & 82 deletions sled-agent/src/sim/collection.rs
@@ -248,6 +248,9 @@ impl<S: Simulatable + 'static> SimCollection<S> {
if object.object.desired().is_none()
&& object.object.ready_to_destroy()
{
info!(&self.log, "object is ready to destroy";
"object_id" => %id);

(after, Some(object))
} else {
objects.insert(id, object);
@@ -397,6 +400,8 @@ impl<S: Simulatable + Clone + 'static> SimCollection<S> {

#[cfg(test)]
mod test {
use std::net::{Ipv4Addr, SocketAddrV4};

use crate::params::{DiskStateRequested, InstanceStateRequested};
use crate::sim::collection::SimObject;
use crate::sim::disk::SimDisk;
@@ -405,37 +410,48 @@ mod test {
use chrono::Utc;
use dropshot::test_util::LogContext;
use futures::channel::mpsc::Receiver;
use omicron_common::api::external::ByteCount;
use omicron_common::api::external::DiskState;
use omicron_common::api::external::Error;
use omicron_common::api::external::Generation;
use omicron_common::api::external::InstanceCpuCount;
use omicron_common::api::external::InstanceState;
use omicron_common::api::internal::nexus::DiskRuntimeState;
use omicron_common::api::internal::nexus::InstanceRuntimeState;
use omicron_common::api::internal::nexus::SledInstanceState;
use omicron_common::api::internal::nexus::VmmRuntimeState;
use omicron_test_utils::dev::test_setup_log;
use uuid::Uuid;

fn make_instance(
logctx: &LogContext,
) -> (SimObject<SimInstance>, Receiver<()>) {
let initial_runtime = {
InstanceRuntimeState {
run_state: InstanceState::Creating,
sled_id: uuid::Uuid::new_v4(),
propolis_id: uuid::Uuid::new_v4(),
dst_propolis_id: None,
propolis_addr: None,
migration_id: None,
propolis_gen: Generation::new(),
ncpus: InstanceCpuCount(2),
memory: ByteCount::from_mebibytes_u32(512),
hostname: "myvm".to_string(),
gen: Generation::new(),
time_updated: Utc::now(),
}
let propolis_id = Uuid::new_v4();
let instance_vmm = InstanceRuntimeState {
fallback_state: InstanceState::Creating,
propolis_id: Some(propolis_id),
dst_propolis_id: None,
migration_id: None,
gen: Generation::new(),
time_updated: Utc::now(),
};

SimObject::new_simulated_auto(&initial_runtime, logctx.log.new(o!()))
let vmm_state = VmmRuntimeState {
state: InstanceState::Starting,
gen: Generation::new(),
sled_id: Uuid::new_v4(),
propolis_addr: std::net::SocketAddr::V4(SocketAddrV4::new(
Ipv4Addr::new(0, 0, 0, 0),
12400,
)),
time_updated: Utc::now(),
};

let state = SledInstanceState {
instance_state: instance_vmm,
vmm_state,
vmm_id: propolis_id,
};

SimObject::new_simulated_auto(&state, logctx.log.new(o!()))
}

fn make_disk(
@@ -459,32 +475,39 @@ mod test {
let (mut instance, mut rx) = make_instance(&logctx);
let r1 = instance.object.current();

info!(logctx.log, "new instance"; "run_state" => ?r1.run_state);
assert_eq!(r1.run_state, InstanceState::Creating);
assert_eq!(r1.gen, Generation::new());
info!(logctx.log, "new instance"; "state" => ?r1);
assert_eq!(r1.vmm_state.state, InstanceState::Starting);
assert_eq!(r1.vmm_state.gen, Generation::new());

// There's no asynchronous transition going on yet so a
// transition_finish() shouldn't change anything.
assert!(instance.object.desired().is_none());
instance.transition_finish();
let rnext = instance.object.current();
assert!(instance.object.desired().is_none());
assert_eq!(&r1.time_updated, &instance.object.current().time_updated);
assert_eq!(&r1.run_state, &instance.object.current().run_state);
assert_eq!(r1.gen, instance.object.current().gen);
assert_eq!(r1.vmm_state.time_updated, rnext.vmm_state.time_updated);
assert_eq!(r1.vmm_state.state, rnext.vmm_state.state);
assert_eq!(r1.vmm_state.gen, rnext.vmm_state.gen);
assert!(rx.try_next().is_err());

// Stopping an instance that was never started synchronously marks it
// stopped.
// Stopping an instance that was never started synchronously destroys
// its VMM and sets the fallback state to Stopped.
let rprev = r1;
assert!(rprev.run_state.is_stopped());
let dropped =
instance.transition(InstanceStateRequested::Stopped).unwrap();
assert!(dropped.is_none());
assert!(instance.object.desired().is_none());
let rnext = instance.object.current();
assert!(rnext.gen > rprev.gen);
assert!(rnext.time_updated >= rprev.time_updated);
assert_eq!(rnext.run_state, InstanceState::Stopped);
assert!(rnext.instance_state.gen > rprev.instance_state.gen);
assert!(rnext.vmm_state.gen > rprev.vmm_state.gen);
assert!(
rnext.instance_state.time_updated
>= rprev.instance_state.time_updated
);
assert!(rnext.vmm_state.time_updated >= rprev.vmm_state.time_updated);
assert_eq!(rnext.instance_state.fallback_state, InstanceState::Stopped);
assert!(rnext.instance_state.propolis_id.is_none());
assert_eq!(rnext.vmm_state.state, InstanceState::Destroyed);
assert!(rx.try_next().is_err());

logctx.cleanup_successful();
@@ -499,106 +522,114 @@ mod test {
let (mut instance, mut rx) = make_instance(&logctx);
let r1 = instance.object.current();

info!(logctx.log, "new instance"; "run_state" => ?r1.run_state);
assert_eq!(r1.run_state, InstanceState::Creating);
assert_eq!(r1.gen, Generation::new());
info!(logctx.log, "new instance"; "state" => ?r1);
assert_eq!(r1.vmm_state.state, InstanceState::Starting);
assert_eq!(r1.vmm_state.gen, Generation::new());

// There's no asynchronous transition going on yet so a
// transition_finish() shouldn't change anything.
assert!(instance.object.desired().is_none());
instance.transition_finish();
assert!(instance.object.desired().is_none());
assert_eq!(&r1.time_updated, &instance.object.current().time_updated);
assert_eq!(&r1.run_state, &instance.object.current().run_state);
assert_eq!(r1.gen, instance.object.current().gen);
let rnext = instance.object.current();
assert_eq!(r1.vmm_state.time_updated, rnext.vmm_state.time_updated);
assert_eq!(r1.vmm_state.state, rnext.vmm_state.state);
assert_eq!(r1.vmm_state.gen, rnext.vmm_state.gen);
assert!(rx.try_next().is_err());

// Now, if we transition to "Running", we must go through the async
// process.
// Set up a transition to Running. This has no immediate effect on the
// simulated instance's state, but it does queue up a transition.
let mut rprev = r1;
assert!(rx.try_next().is_err());
let dropped =
instance.transition(InstanceStateRequested::Running).unwrap();
assert!(dropped.is_none());
assert!(instance.object.desired().is_some());
assert!(rx.try_next().is_ok());

// The VMM should still be Starting and its generation should not have
// changed (the transition to Running is queued but hasn't executed).
let rnext = instance.object.current();
assert!(rnext.gen > rprev.gen);
assert!(rnext.time_updated >= rprev.time_updated);
assert_eq!(rnext.run_state, InstanceState::Starting);
assert!(!rnext.run_state.is_stopped());
assert_eq!(rnext.vmm_state.gen, rprev.vmm_state.gen);
assert_eq!(rnext.vmm_state.time_updated, rprev.vmm_state.time_updated);
assert_eq!(rnext.vmm_state.state, InstanceState::Starting);
rprev = rnext;

// Now poke the instance. It should transition to Running.
instance.transition_finish();
let rnext = instance.object.current();
assert!(rnext.gen > rprev.gen);
assert!(rnext.time_updated >= rprev.time_updated);
assert!(rnext.vmm_state.gen > rprev.vmm_state.gen);
assert!(rnext.vmm_state.time_updated >= rprev.vmm_state.time_updated);
assert!(instance.object.desired().is_none());
assert!(rx.try_next().is_err());
assert_eq!(rprev.run_state, InstanceState::Starting);
assert_eq!(rnext.run_state, InstanceState::Running);
assert_eq!(rprev.vmm_state.state, InstanceState::Starting);
assert_eq!(rnext.vmm_state.state, InstanceState::Running);
assert!(rnext.vmm_state.gen > rprev.vmm_state.gen);
rprev = rnext;

// There shouldn't be anything left on the queue now.
instance.transition_finish();
let rnext = instance.object.current();
assert_eq!(rprev.gen, rnext.gen);
assert_eq!(rprev.vmm_state.gen, rnext.vmm_state.gen);

// If we transition again to "Running", the process should complete
// immediately.
assert!(!rprev.run_state.is_stopped());
let dropped =
instance.transition(InstanceStateRequested::Running).unwrap();
assert!(dropped.is_none());
assert!(instance.object.desired().is_none());
assert!(rx.try_next().is_err());
let rnext = instance.object.current();
assert_eq!(rnext.gen, rprev.gen);
assert_eq!(rnext.time_updated, rprev.time_updated);
assert_eq!(rnext.run_state, rprev.run_state);
assert_eq!(rnext.vmm_state.gen, rprev.vmm_state.gen);
assert_eq!(rnext.vmm_state.time_updated, rprev.vmm_state.time_updated);
assert_eq!(rnext.vmm_state.state, rprev.vmm_state.state);
rprev = rnext;

// If we go back to any stopped state, we go through the async process
// again.
assert!(!rprev.run_state.is_stopped());
assert!(rx.try_next().is_err());
let dropped =
instance.transition(InstanceStateRequested::Stopped).unwrap();
assert!(dropped.is_none());
assert!(instance.object.desired().is_some());
let rnext = instance.object.current();
assert!(rnext.gen > rprev.gen);
assert!(rnext.time_updated >= rprev.time_updated);
assert_eq!(rnext.run_state, InstanceState::Stopping);
assert!(!rnext.run_state.is_stopped());
assert!(rnext.vmm_state.gen > rprev.vmm_state.gen);
assert!(rnext.vmm_state.time_updated >= rprev.vmm_state.time_updated);
assert_eq!(rnext.vmm_state.state, InstanceState::Stopping);
rprev = rnext;

// Propolis publishes its own transition to Stopping before it publishes
// Stopped.
instance.transition_finish();
let rnext = instance.object.current();
assert!(rnext.gen > rprev.gen);
assert!(rnext.time_updated >= rprev.time_updated);
assert!(rnext.vmm_state.gen > rprev.vmm_state.gen);
assert!(rnext.vmm_state.time_updated >= rprev.vmm_state.time_updated);
assert!(instance.object.desired().is_some());
assert_eq!(rprev.run_state, InstanceState::Stopping);
assert_eq!(rnext.run_state, InstanceState::Stopping);
assert_eq!(rprev.vmm_state.state, InstanceState::Stopping);
assert_eq!(rnext.vmm_state.state, InstanceState::Stopping);
rprev = rnext;

// Stopping goes to Stopped...
instance.transition_finish();
let rnext = instance.object.current();
assert!(rnext.gen > rprev.gen);
assert!(rnext.time_updated >= rprev.time_updated);
assert!(rnext.vmm_state.gen > rprev.vmm_state.gen);
assert!(rnext.vmm_state.time_updated >= rprev.vmm_state.time_updated);
assert!(instance.object.desired().is_some());
assert_eq!(rprev.run_state, InstanceState::Stopping);
assert_eq!(rnext.run_state, InstanceState::Stopped);
assert_eq!(rprev.vmm_state.state, InstanceState::Stopping);
assert_eq!(rnext.vmm_state.state, InstanceState::Stopped);
rprev = rnext;

// ...and Stopped (internally) goes to Destroyed, though the sled agent
// hides this state from clients.
// ...and Stopped (internally) goes to Destroyed. This transition is
// hidden from external viewers of the instance by retiring the active
// Propolis ID.
instance.transition_finish();
let rnext = instance.object.current();
assert!(rnext.gen > rprev.gen);
assert_eq!(rprev.run_state, InstanceState::Stopped);
assert_eq!(rnext.run_state, InstanceState::Stopped);
assert!(rnext.vmm_state.gen > rprev.vmm_state.gen);
assert!(rnext.vmm_state.time_updated >= rprev.vmm_state.time_updated);
assert_eq!(rprev.vmm_state.state, InstanceState::Stopped);
assert_eq!(rnext.vmm_state.state, InstanceState::Destroyed);
assert!(rnext.instance_state.gen > rprev.instance_state.gen);
assert_eq!(rnext.instance_state.fallback_state, InstanceState::Stopped);
logctx.cleanup_successful();
}

@@ -611,9 +642,9 @@ mod test {
let (mut instance, _rx) = make_instance(&logctx);
let r1 = instance.object.current();

info!(logctx.log, "new instance"; "run_state" => ?r1.run_state);
assert_eq!(r1.run_state, InstanceState::Creating);
assert_eq!(r1.gen, Generation::new());
info!(logctx.log, "new instance"; "state" => ?r1);
assert_eq!(r1.vmm_state.state, InstanceState::Starting);
assert_eq!(r1.vmm_state.gen, Generation::new());
assert!(instance
.transition(InstanceStateRequested::Running)
.unwrap()
@@ -626,7 +657,7 @@ mod test {
std::thread::sleep(std::time::Duration::from_millis(100));
}

assert!(rnext.gen > rprev.gen);
assert!(rnext.vmm_state.gen > rprev.vmm_state.gen);

// Now reboot the instance. This is dispatched to Propolis, which will
// move to the Rebooting state and then back to Running.
@@ -635,9 +666,9 @@ mod test {
.unwrap()
.is_none());
let (rprev, rnext) = (rnext, instance.object.current());
assert!(rnext.gen > rprev.gen);
assert!(rnext.time_updated > rprev.time_updated);
assert_eq!(rnext.run_state, InstanceState::Rebooting);
assert!(rnext.vmm_state.gen > rprev.vmm_state.gen);
assert!(rnext.vmm_state.time_updated > rprev.vmm_state.time_updated);
assert_eq!(rnext.vmm_state.state, InstanceState::Rebooting);
instance.transition_finish();
let (rprev, rnext) = (rnext, instance.object.current());

@@ -646,9 +677,9 @@ mod test {
std::thread::sleep(std::time::Duration::from_millis(100));
}

assert!(rnext.gen > rprev.gen);
assert!(rnext.time_updated > rprev.time_updated);
assert_eq!(rnext.run_state, InstanceState::Rebooting);
assert!(rnext.vmm_state.gen > rprev.vmm_state.gen);
assert!(rnext.vmm_state.time_updated > rprev.vmm_state.time_updated);
assert_eq!(rnext.vmm_state.state, InstanceState::Rebooting);
assert!(instance.object.desired().is_some());
instance.transition_finish();
let (rprev, rnext) = (rnext, instance.object.current());
@@ -658,9 +689,9 @@ mod test {
std::thread::sleep(std::time::Duration::from_millis(100));
}

assert!(rnext.gen > rprev.gen);
assert!(rnext.time_updated > rprev.time_updated);
assert_eq!(rnext.run_state, InstanceState::Running);
assert!(rnext.vmm_state.gen > rprev.vmm_state.gen);
assert!(rnext.vmm_state.time_updated > rprev.vmm_state.time_updated);
assert_eq!(rnext.vmm_state.state, InstanceState::Running);
logctx.cleanup_successful();
}