Perform instance state transitions in instance-update saga #5749

Merged
merged 234 commits into from
Aug 9, 2024
Commits
f508831
start sketching saga
hawkw Mar 25, 2024
7a8b30e
wip
hawkw Apr 9, 2024
2c0fb3a
wip
hawkw May 13, 2024
3331f66
remove dead code
hawkw May 13, 2024
76c6960
sketch out the whole "instance destroyed" subsaga
hawkw May 13, 2024
bc93d0e
fix snapshot-create using renamed API
hawkw May 14, 2024
de11162
oh, i guess undo actions return anyhow::Error
hawkw May 14, 2024
195d747
okay this seems more or less right
hawkw May 14, 2024
4405883
add logging to VMM destroyed subsaga
hawkw May 14, 2024
756b78b
fixy
hawkw May 14, 2024
becb3d5
oh we can just call the nexus methods i guess
hawkw May 14, 2024
82f4731
wip bgtask stuff
hawkw May 15, 2024
88f3b8b
plumbing etc
hawkw May 16, 2024
e71f8f0
whoops, missing none
hawkw May 16, 2024
ecbdbca
more plumbing
hawkw May 16, 2024
4b80b2b
add configs
hawkw May 16, 2024
f05e3b3
whoops
hawkw May 16, 2024
15770a0
remaining configs
hawkw May 16, 2024
41321d4
unassign oximeter producer
hawkw May 24, 2024
a5b6d9e
update `delete_v2p_mappings` in light of #5568
hawkw May 24, 2024
8d9cdb2
tear apart most of `cpapi_instances_put`
hawkw May 24, 2024
6850946
rewrite most of the saga
hawkw May 24, 2024
dc807a9
fixup
hawkw May 24, 2024
3a421ed
rm unneeded comment
hawkw May 24, 2024
d8c0e63
WHEW OKAY
hawkw May 25, 2024
0bc3ae3
start ripping out sled-agent instance state munging
hawkw May 28, 2024
b2cf79c
whoops forgot this one
hawkw May 28, 2024
a69caf5
wip dead code cleanup
hawkw May 28, 2024
d1709a5
post merge fixy-wixy
hawkw May 29, 2024
6efd374
it's sagas all the way down
hawkw May 29, 2024
610d5e0
deal with dead code in a more compiley way
hawkw May 29, 2024
d3f8b0b
fix up stuff
hawkw May 29, 2024
f25792f
clean things up a bit
hawkw May 29, 2024
78cb0b3
big post-merge update
hawkw May 30, 2024
65c77f0
post-rebase fixup
hawkw Jun 3, 2024
c59d8f9
regenerate sled-agent openapi
hawkw Jun 3, 2024
63e514e
bunch of saga plumbing fixes
hawkw Jun 3, 2024
6404be6
handle unable-to-lock more gracefully
hawkw Jun 3, 2024
a40b35f
fix lock generations getting eaten
hawkw Jun 3, 2024
b00683a
more consistent naming for logs
hawkw Jun 4, 2024
da48db9
rm dead import
hawkw Jun 5, 2024
8078508
post-rebase remove dead imports
hawkw Jun 10, 2024
b791897
update openapi another time
hawkw Jun 13, 2024
2bc8183
hack up the CTE to do vmm-and-migration updates
hawkw Jun 13, 2024
419d067
actually write migration states to the db
hawkw Jun 14, 2024
dfe2594
remove duplicate code in CTE
hawkw Jun 14, 2024
da7bbb9
add expectorate tests for CTE
hawkw Jun 14, 2024
ec93ca9
uuids are typed now
hawkw Jun 19, 2024
4a5368d
tear up way more of sled-agent
hawkw Jun 19, 2024
53299b3
compiley-ness
hawkw Jun 19, 2024
777cdcc
remove most of `instance_set_migration_ids`
hawkw Jun 20, 2024
67c268b
make instance-migrate sagas just set migration IDs
hawkw Jun 20, 2024
3bbe6bb
cleanup
hawkw Jun 21, 2024
edbfdc8
update CTE expected SQL
hawkw Jun 21, 2024
3bdb094
whoops that was the sled id and not the vmm ID
hawkw Jun 21, 2024
63ec92a
just return instance in instance_set_migration_ids
hawkw Jun 21, 2024
ad94f20
fix sim sled-agent looking at the wrong migration
hawkw Jun 21, 2024
796f759
sled-agent: make sure migration-in is populated
hawkw Jun 21, 2024
64bbb3a
start sketching out migration-update subsaga
hawkw Jun 21, 2024
0cfec53
fix migrate saga sending states without migration IDs
hawkw Jun 21, 2024
74b9e37
more sketching out migration update saga
hawkw Jun 21, 2024
ffcc23f
okay, i think this is the migration part
hawkw Jun 21, 2024
932a178
...you have to actually register the saga actions
hawkw Jun 21, 2024
1bda356
obnoxious clippy nonsense
hawkw Jun 21, 2024
9450f31
shut up clippy in a slightly more polite way
hawkw Jun 21, 2024
437cc7a
fix docs
hawkw Jun 21, 2024
b2cc1a8
update omdb output
hawkw Jun 24, 2024
0e38a5a
fix(?) illumos-only tests
hawkw Jun 24, 2024
15b8333
put that back
hawkw Jun 24, 2024
3170f41
whoops, imports
hawkw Jun 24, 2024
85db24b
docs build unbreakening
hawkw Jun 24, 2024
5091226
you have to actually update the timestamps
hawkw Jun 24, 2024
3930d06
fix virtual provisioning record not being deleted
hawkw Jun 25, 2024
3d35078
move migration stuff out of a subsaga
hawkw Jun 25, 2024
4c4d648
also get rid of subsaga for active-vmm-destroyed
hawkw Jun 25, 2024
d21b60d
clean up network config actions
hawkw Jun 25, 2024
76aee1c
post-rebase fixy-uppy
hawkw Jul 1, 2024
e97e6e0
queue update sagas for terminated migrations
hawkw Jul 1, 2024
d79050c
add instance-updater omdb stuff
hawkw Jul 1, 2024
583b709
remove max gen from `virtual_provisioning_collection_delete_instance`
hawkw Jul 2, 2024
10594de
clippiness
hawkw Jul 2, 2024
882ebb2
whoops the generation number gets used here
hawkw Jul 2, 2024
e3191f1
review feedback from @gjcolombo
hawkw Jul 2, 2024
5a155f5
oops forgot to update tests
hawkw Jul 2, 2024
c526add
whoops i forgot to update tests
hawkw Jul 2, 2024
d17575d
gotta actually put the instance in the NoVmm state
hawkw Jul 2, 2024
a62c99d
oh i guess we can't assert that
hawkw Jul 2, 2024
7b6f5a0
destroy both VMMs from the update saga
hawkw Jul 3, 2024
1b7bfd8
clippy-clean helios sled-agent tests
hawkw Jul 3, 2024
0260f14
post-rebase update for #5985
hawkw Jul 3, 2024
5f3718d
post-rebase update for #5964
hawkw Jul 3, 2024
5e566da
don't wait for saga completion in API endpoint
hawkw Jul 3, 2024
130fca2
use `wait_for_condition` in instance start tests
hawkw Jul 4, 2024
92b5388
also use it in `instance_migrate` tests
hawkw Jul 4, 2024
f805c16
refactor `notify_instance_updated` a bit
hawkw Jul 4, 2024
f0f20c2
also make snapshot_create tests wait for NoVmm
hawkw Jul 4, 2024
ecd0030
clippy cleanliness
hawkw Jul 4, 2024
473299d
fixup instance real state determination
hawkw Jul 8, 2024
5a2404e
make tests wait for states
hawkw Jul 8, 2024
6af03b5
make logging in start saga more useful
hawkw Jul 8, 2024
df9e1eb
don't treat instances as stopped until virtual resources are gone
hawkw Jul 8, 2024
9a6b1bf
temp fix sagas not being rescheduled
hawkw Jul 8, 2024
723ae56
instance_wait_for_state should log successful transitions
hawkw Jul 8, 2024
ef24a5d
remove defunct test output files
hawkw Jul 8, 2024
4455946
fix `instance_wait_for_state` accidentally using the instance tests p…
hawkw Jul 8, 2024
1e0ed7b
add missing wait for stop in disk tests
hawkw Jul 8, 2024
093ee44
found some more places that need to wait for stop
hawkw Jul 8, 2024
edc69a5
WOW THERE'S MORE OF THEM (+authn stuff)
hawkw Jul 8, 2024
77c458a
also include SagaUnwound in stopping
hawkw Jul 8, 2024
8dadd30
oh my god theres even more of them
hawkw Jul 8, 2024
a6af286
WHEW OKAY ACTUALLY DO THE MIGRATION
hawkw Jul 9, 2024
06d01c1
another disk test that needs to wait for stop
hawkw Jul 9, 2024
b9f5a46
oh there's also pantry tests that stop instances
hawkw Jul 9, 2024
a5aa853
clippy cleanliness
hawkw Jul 9, 2024
2cde269
bump instance-updater bg task period in tests
hawkw Jul 9, 2024
b46676a
use migration_mark_failed where it's supposed to be used
hawkw Jul 9, 2024
7a75c7b
don't re-query sled ID for network cfg update
hawkw Jul 9, 2024
46ae4cb
slightly more accurate log message
hawkw Jul 9, 2024
3e5b6d1
rm vestigial line
hawkw Jul 9, 2024
32ea68f
THERES MORE OF THEM AGHGHGHGHHGHHGH
hawkw Jul 9, 2024
bedcfc6
chain into another saga (bad initial version)
hawkw Jul 9, 2024
529684b
Revert "don't re-query sled ID for network cfg update"
hawkw Jul 9, 2024
84ebd74
found another place we need to wait for stop
hawkw Jul 10, 2024
1cebe3e
update omdb again
hawkw Jul 10, 2024
8526697
clean up saga chaining code
hawkw Jul 10, 2024
1eba9cd
Reapply "don't re-query sled ID for network cfg update"
hawkw Jul 10, 2024
896d21f
cleanup migration update computation
hawkw Jul 10, 2024
b09503c
misc review feedback cleanup
hawkw Jul 11, 2024
7ecb9a8
nicer `instance_set_migration_ids`
hawkw Jul 11, 2024
d3db859
actually return UPDATED state in `instance_set_migration_ids`
hawkw Jul 11, 2024
6571b1a
single-step through states in migration test
hawkw Jul 11, 2024
1f36487
fix migration-source sled agents creating state at gen 1
hawkw Jul 11, 2024
7b39931
fix migration update query not bumping generation
hawkw Jul 11, 2024
6ef2120
report instance states as "migrating" until migration resolves
hawkw Jul 11, 2024
dfc0240
poke instances twice in single-steppy migration test
hawkw Jul 11, 2024
c47ed90
gotta wait for update saga to go back to running
hawkw Jul 11, 2024
9f7e101
whoops, migration arm needs to be BEFORE destroyed
hawkw Jul 11, 2024
59cf488
make filter expr more correct and match comment
hawkw Jul 11, 2024
b544332
whoops i forgot to update expectorate queries
hawkw Jul 11, 2024
e6771fb
add instance_wait_for_state saga test helpers
hawkw Jul 12, 2024
a4c3eac
remove test that doesn't really matter
hawkw Jul 12, 2024
9d622f0
remove defunct `instance_put_migration_ids` API
hawkw Jul 12, 2024
91beec6
sled-agent's per-instance logger should have UUIDs
hawkw Jul 12, 2024
96a21d0
misc commentary suggestions from @gjcolombo
hawkw Jul 12, 2024
70a7fbf
wip: instance updater saga tests
hawkw Jul 15, 2024
6951d97
wait for the preceeding update saga to complete
hawkw Jul 16, 2024
579c135
more assertions
hawkw Jul 16, 2024
5f7dbe1
ah, there was already a thingy for that
hawkw Jul 16, 2024
2515d20
additional active_vmm_destroyed tests
hawkw Jul 16, 2024
9f86865
rearrange deck chairs
hawkw Jul 16, 2024
d67395e
migration source completed tests
hawkw Jul 17, 2024
b9e02bc
add unwinding test for migration source success
hawkw Jul 17, 2024
77a5a85
shut up clippy
hawkw Jul 17, 2024
4d7e582
urghh
hawkw Jul 17, 2024
2c86e48
more migration tests
hawkw Jul 17, 2024
8dae386
massively overengineered test DSL thingy
hawkw Jul 18, 2024
98d5c4a
fix disappearing target sled resources
hawkw Jul 18, 2024
1aefe47
avoid spurious update sagas
hawkw Jul 18, 2024
f91a2d1
move update saga needed logic into saga module
hawkw Jul 18, 2024
e74fe4c
get rid of `write_returned_instance_state`
hawkw Jul 18, 2024
20311aa
cleanup warnings
hawkw Jul 18, 2024
77dc84b
docs unhappiness
hawkw Jul 18, 2024
bc4f276
start on the documentation i promised i'd write
hawkw Jul 19, 2024
7abe814
use the same code for determining effective states
hawkw Jul 19, 2024
17bb219
Revert "use the same code for determining effective states"
hawkw Jul 19, 2024
5d699c9
Reapply "use the same code for determining effective states"
hawkw Jul 19, 2024
0146cc7
cleanup visibilities
hawkw Jul 22, 2024
1805b8d
ensure lock is reliably unlocked when unwinding
hawkw Jul 22, 2024
80bb38d
assert instance state is consistent when unwinding
hawkw Jul 22, 2024
f7afb85
start on migration failure tests
hawkw Jul 22, 2024
9cca199
test more destroyed outcomes
hawkw Jul 22, 2024
af207ca
more commentary
hawkw Jul 22, 2024
7998371
more of my nonsense
hawkw Jul 22, 2024
de9391b
misc style consistency/cleanup
hawkw Jul 23, 2024
c0a4dda
don't need to serialize the entire snapshot
hawkw Jul 23, 2024
0090402
finish the Next Great American Novel
hawkw Jul 23, 2024
237831c
whoops i broke it
hawkw Jul 23, 2024
e2f451d
serialize less stuff
hawkw Jul 23, 2024
6902907
oops lol
hawkw Jul 23, 2024
9f3ba51
fix docs links (oops)
hawkw Jul 23, 2024
eab900c
remove second bonus license header
hawkw Jul 23, 2024
77af396
fix typo (thanks @bcantrill)
hawkw Jul 23, 2024
4fd8320
instance_and_vmm_update_runtime is dead code now
hawkw Jul 24, 2024
368dacf
CTE only does VMM and migration updates
hawkw Jul 24, 2024
b9f7711
separate "commit updates and unlock" and "just unlock now please" ope…
hawkw Jul 26, 2024
377bda1
fixup tests
hawkw Jul 26, 2024
6f74ced
fixup tests, add test for unlocking a deleted instance
hawkw Jul 26, 2024
836ea7d
activate network RPWs when they're likely to see new state
hawkw Jul 26, 2024
0208526
post-rebase fixup (PutMigrationIds went away)
hawkw Jul 27, 2024
0f5e340
update saga should also unlink `SagaUnwound` VMMs
hawkw Jul 29, 2024
c4c3ec8
allow start sagas to clobber saga-unwound VMMs
hawkw Jul 29, 2024
8d23d66
properly handle SagaUnwound, part 2
hawkw Jul 29, 2024
f8b44f6
fix target VMM unwound check when setting migration IDs
hawkw Jul 30, 2024
ddbf2fa
placate clippy
hawkw Jul 30, 2024
f301183
remove log file (oops)
hawkw Jul 30, 2024
d68f0de
start addressing @smklein's review suggestions
hawkw Jul 30, 2024
3cdac70
fix comments
hawkw Jul 30, 2024
bfd2c32
`InstanceGestalt` RIDES AGAIN!
hawkw Jul 30, 2024
12f69b4
update openapi (changed a comment)
hawkw Jul 31, 2024
558de3c
don't duplicate `SimulatedMigration` API types
hawkw Jul 31, 2024
b7075b6
better document unwinding behavior
hawkw Jul 31, 2024
48be892
update openapi yet again
hawkw Jul 31, 2024
e2a1ee5
fix typo
hawkw Jul 31, 2024
a0e6042
turns out we can just totally disable it in tests
hawkw Jul 31, 2024
0b89c58
turns out it's fine to not unlock deleted instances
hawkw Jul 31, 2024
9788b40
fix unfinished comments
hawkw Jul 31, 2024
9d331d6
lol, OMDB panics when it's Duration::MAX
hawkw Jul 31, 2024
d61d137
initial test for `vmm_and_migration_update_runtime`
hawkw Aug 1, 2024
b657fbc
actually repro the problem
hawkw Aug 1, 2024
bfb85af
replace vmm/migration CTE with transaction
hawkw Aug 1, 2024
4b7fe6b
don't reject runtime states with no migration
hawkw Aug 1, 2024
c7efd47
start addressing @gjcolombo's feedback
hawkw Aug 5, 2024
8e49853
rm extra backtick
hawkw Aug 5, 2024
5754b3a
activate bg task when dropping lock on unwind
hawkw Aug 5, 2024
4487a6a
don't unset migration IDs in migration unwind
hawkw Aug 5, 2024
90b2d04
fix "succeed idempotently" tests not testing that
hawkw Aug 5, 2024
73cdb72
tests for migration completed but target VMM destroyed
hawkw Aug 5, 2024
b031bd0
correctly handle target destroyed migration success
hawkw Aug 5, 2024
18b2f32
fix start saga unwinding if duplicate child unwinds
hawkw Aug 5, 2024
c8b7421
fix completed updates spawning spurious update sagas
hawkw Aug 5, 2024
20a222b
update tests to new migrate saga unwinding behavior
hawkw Aug 6, 2024
77594e3
saga idempotency tests should test the real saga
hawkw Aug 6, 2024
8cd3378
also run unwinding tests with "real" saga
hawkw Aug 6, 2024
e04b278
add back start saga unwinding/idempotency tests
hawkw Aug 6, 2024
caa4263
release lock if child sagas unwind before locking
hawkw Aug 6, 2024
59adbf4
turns out there's a normal reason that could happen
hawkw Aug 6, 2024
9461168
rename `UpdatesRequired::for_snapshot`
hawkw Aug 6, 2024
2253390
'rename symbol' doesnt work on docs
hawkw Aug 6, 2024
47d8c38
improve errors/log messages
hawkw Aug 7, 2024
58bb353
update instance lock tests to match new behavior
hawkw Aug 7, 2024
c707814
fix `instance_commit_update` idempotency
hawkw Aug 7, 2024
8ed6dc6
fail to commit update if gen has advanced while locked
hawkw Aug 7, 2024
d93a80b
use `InstanceAndVmm::effective_state` in start
hawkw Aug 9, 2024
67c424c
document more saga interactions
hawkw Aug 9, 2024
33 changes: 2 additions & 31 deletions clients/nexus-client/src/lib.rs
@@ -122,22 +122,6 @@ impl From<types::VmmState> for omicron_common::api::internal::nexus::VmmState {
     }
 }

-impl From<omicron_common::api::internal::nexus::InstanceRuntimeState>
-    for types::InstanceRuntimeState
-{
-    fn from(
-        s: omicron_common::api::internal::nexus::InstanceRuntimeState,
-    ) -> Self {
-        Self {
-            dst_propolis_id: s.dst_propolis_id,
-            gen: s.gen,
-            migration_id: s.migration_id,
-            propolis_id: s.propolis_id,
-            time_updated: s.time_updated,
-        }
-    }
-}
-
 impl From<omicron_common::api::internal::nexus::VmmRuntimeState>
     for types::VmmRuntimeState
 {
@@ -153,10 +137,10 @@ impl From<omicron_common::api::internal::nexus::SledInstanceState>
         s: omicron_common::api::internal::nexus::SledInstanceState,
     ) -> Self {
         Self {
-            instance_state: s.instance_state.into(),
             propolis_id: s.propolis_id,
             vmm_state: s.vmm_state.into(),
-            migration_state: s.migration_state.map(Into::into),
+            migration_in: s.migration_in.map(Into::into),
+            migration_out: s.migration_out.map(Into::into),
         }
     }
 }
@@ -169,26 +153,13 @@ impl From<omicron_common::api::internal::nexus::MigrationRuntimeState>
     ) -> Self {
         Self {
             migration_id: s.migration_id,
-            role: s.role.into(),
             state: s.state.into(),
             gen: s.gen,
             time_updated: s.time_updated,
         }
     }
 }

-impl From<omicron_common::api::internal::nexus::MigrationRole>
-    for types::MigrationRole
-{
-    fn from(s: omicron_common::api::internal::nexus::MigrationRole) -> Self {
-        use omicron_common::api::internal::nexus::MigrationRole as Input;
-        match s {
-            Input::Source => Self::Source,
-            Input::Target => Self::Target,
-        }
-    }
-}
-
 impl From<omicron_common::api::internal::nexus::MigrationState>
     for types::MigrationState
 {
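The `SledInstanceState` conversion in this file maps two optional migration fields (`migration_in`, `migration_out`) where it previously mapped a single `migration_state`. A minimal sketch of that `Option::map(Into::into)` pattern follows. The `api` and `client` modules, the `u64` IDs, and the `generation` field name are simplified stand-ins chosen for illustration, not the real omicron or generated client types.

```rust
/// Hypothetical stand-in for the internal API types.
mod api {
    #[derive(Clone, Debug, PartialEq)]
    pub struct MigrationRuntimeState {
        pub migration_id: u64,
        pub generation: u64,
    }

    #[derive(Clone, Debug, PartialEq)]
    pub struct SledInstanceState {
        pub migration_in: Option<MigrationRuntimeState>,
        pub migration_out: Option<MigrationRuntimeState>,
    }
}

/// Hypothetical stand-in for the generated client types.
mod client {
    #[derive(Clone, Debug, PartialEq)]
    pub struct MigrationRuntimeState {
        pub migration_id: u64,
        pub generation: u64,
    }

    #[derive(Clone, Debug, PartialEq)]
    pub struct SledInstanceState {
        pub migration_in: Option<MigrationRuntimeState>,
        pub migration_out: Option<MigrationRuntimeState>,
    }
}

impl From<api::MigrationRuntimeState> for client::MigrationRuntimeState {
    fn from(s: api::MigrationRuntimeState) -> Self {
        Self { migration_id: s.migration_id, generation: s.generation }
    }
}

impl From<api::SledInstanceState> for client::SledInstanceState {
    fn from(s: api::SledInstanceState) -> Self {
        Self {
            // `Option::map(Into::into)` converts the inner value when it is
            // present and leaves `None` untouched.
            migration_in: s.migration_in.map(Into::into),
            migration_out: s.migration_out.map(Into::into),
        }
    }
}
```

Because each direction has its own optional field, no per-state `role` tag needs converting, which is consistent with the removal of the `MigrationRole` conversion in this file.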
79 changes: 64 additions & 15 deletions clients/sled-agent-client/src/lib.rs
@@ -5,6 +5,9 @@
 //! Interface for making API requests to a Sled Agent

 use async_trait::async_trait;
+use schemars::JsonSchema;
+use serde::Deserialize;
+use serde::Serialize;
 use std::convert::TryFrom;
 use uuid::Uuid;

@@ -162,10 +165,10 @@ impl From<types::SledInstanceState>
 {
     fn from(s: types::SledInstanceState) -> Self {
         Self {
-            instance_state: s.instance_state.into(),
             propolis_id: s.propolis_id,
             vmm_state: s.vmm_state.into(),
-            migration_state: s.migration_state.map(Into::into),
+            migration_in: s.migration_in.map(Into::into),
+            migration_out: s.migration_out.map(Into::into),
         }
     }
 }
@@ -177,25 +180,12 @@ impl From<types::MigrationRuntimeState>
         Self {
             migration_id: s.migration_id,
             state: s.state.into(),
-            role: s.role.into(),
             gen: s.gen,
             time_updated: s.time_updated,
         }
     }
 }

-impl From<types::MigrationRole>
-    for omicron_common::api::internal::nexus::MigrationRole
-{
-    fn from(r: types::MigrationRole) -> Self {
-        use omicron_common::api::internal::nexus::MigrationRole as Output;
-        match r {
-            types::MigrationRole::Source => Output::Source,
-            types::MigrationRole::Target => Output::Target,
-        }
-    }
-}
-
 impl From<types::MigrationState>
     for omicron_common::api::internal::nexus::MigrationState
 {
@@ -457,12 +447,29 @@ impl From<types::SledIdentifiers>
 /// are bonus endpoints, not generated in the real client.
 #[async_trait]
 pub trait TestInterfaces {
+    async fn instance_single_step(&self, id: Uuid);
     async fn instance_finish_transition(&self, id: Uuid);
+    async fn instance_simulate_migration_source(
+        &self,
+        id: Uuid,
+        params: SimulateMigrationSource,
+    );
     async fn disk_finish_transition(&self, id: Uuid);
 }

 #[async_trait]
 impl TestInterfaces for Client {
+    async fn instance_single_step(&self, id: Uuid) {
+        let baseurl = self.baseurl();
+        let client = self.client();
+        let url = format!("{}/instances/{}/poke-single-step", baseurl, id);
+        client
+            .post(url)
+            .send()
+            .await
+            .expect("instance_single_step() failed unexpectedly");
+    }
+
     async fn instance_finish_transition(&self, id: Uuid) {
         let baseurl = self.baseurl();
         let client = self.client();
@@ -484,4 +491,46 @@ impl TestInterfaces for Client {
             .await
             .expect("disk_finish_transition() failed unexpectedly");
     }
+
+    async fn instance_simulate_migration_source(
+        &self,
+        id: Uuid,
+        params: SimulateMigrationSource,
+    ) {
+        let baseurl = self.baseurl();
+        let client = self.client();
+        let url = format!("{baseurl}/instances/{id}/sim-migration-source");
+        client
+            .post(url)
+            .json(&params)
+            .send()
+            .await
+            .expect("instance_simulate_migration_source() failed unexpectedly");
+    }
 }
+
+/// Parameters to the `/instances/{id}/sim-migration-source` test API.
+///
+/// This message type is not included in the OpenAPI spec, because this API
+/// exists only in test builds.
+#[derive(Serialize, Deserialize, JsonSchema)]
+pub struct SimulateMigrationSource {
+    /// The ID of the migration out of the instance's current active VMM.
+    pub migration_id: Uuid,
+    /// What migration result (success or failure) to simulate.
+    pub result: SimulatedMigrationResult,
+}
+
+/// The result of a simulated migration out from an instance's current active
+/// VMM.
+#[derive(Serialize, Deserialize, JsonSchema)]
+pub enum SimulatedMigrationResult {
+    /// Simulate a successful migration out.
+    Success,
+    /// Simulate a failed migration out.
+    ///
+    /// # Note
+    ///
+    /// This is not currently implemented by the simulated sled-agent.
+    Failure,
+}
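To make the two new test endpoints concrete, here is a rough, dependency-free sketch of the request paths built in this file and of what the `sim-migration-source` JSON body would plausibly look like. It assumes serde's default representation (struct as an object, unit enum variants as their names), since the derives in the diff carry no `rename` attributes. The helper names and the use of plain `&str` IDs in place of `Uuid` are hypothetical, and the hand-built JSON is for illustration only; the real client serializes with `.json(&params)`.

```rust
/// Hypothetical stand-in for `SimulatedMigrationResult`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SimulatedMigrationResult {
    Success,
    Failure,
}

impl SimulatedMigrationResult {
    /// Variant name, as serde's default enum encoding would emit it.
    fn variant_name(self) -> &'static str {
        match self {
            SimulatedMigrationResult::Success => "Success",
            SimulatedMigrationResult::Failure => "Failure",
        }
    }
}

/// Path used by `instance_single_step` in the diff.
pub fn single_step_url(baseurl: &str, id: &str) -> String {
    format!("{baseurl}/instances/{id}/poke-single-step")
}

/// Path used by `instance_simulate_migration_source` in the diff.
pub fn sim_migration_source_url(baseurl: &str, id: &str) -> String {
    format!("{baseurl}/instances/{id}/sim-migration-source")
}

/// Hand-built approximation of the JSON body (assumption: default serde
/// field and variant names; no escaping is done, so IDs must be plain).
pub fn sim_migration_source_body(
    migration_id: &str,
    result: SimulatedMigrationResult,
) -> String {
    format!(
        "{{\"migration_id\":\"{migration_id}\",\"result\":\"{}\"}}",
        result.variant_name()
    )
}
```

Note that the doc comment in the diff says the `Failure` result is not yet implemented by the simulated sled-agent, so a test would presumably only send `Success` for now.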
59 changes: 26 additions & 33 deletions common/src/api/internal/nexus.rs
@@ -117,18 +117,38 @@ pub struct VmmRuntimeState {
 /// specific VMM and the instance it incarnates.
 #[derive(Clone, Debug, Deserialize, Serialize, JsonSchema)]
 pub struct SledInstanceState {
-    /// The sled's conception of the state of the instance.
-    pub instance_state: InstanceRuntimeState,
-
     /// The ID of the VMM whose state is being reported.
     pub propolis_id: PropolisUuid,

     /// The most recent state of the sled's VMM process.
     pub vmm_state: VmmRuntimeState,

-    /// The current state of any in-progress migration for this instance, as
-    /// understood by this sled.
-    pub migration_state: Option<MigrationRuntimeState>,
+    /// The current state of any inbound migration to this VMM.
+    pub migration_in: Option<MigrationRuntimeState>,
+
+    /// The state of any outbound migration from this VMM.
+    pub migration_out: Option<MigrationRuntimeState>,
 }
+
+#[derive(Copy, Clone, Debug, Default)]
+pub struct Migrations<'state> {
+    pub migration_in: Option<&'state MigrationRuntimeState>,
+    pub migration_out: Option<&'state MigrationRuntimeState>,
+}
+
+impl Migrations<'_> {
+    pub fn empty() -> Self {
+        Self { migration_in: None, migration_out: None }
+    }
+}
+
+impl SledInstanceState {
+    pub fn migrations(&self) -> Migrations<'_> {
+        Migrations {
+            migration_in: self.migration_in.as_ref(),
+            migration_out: self.migration_out.as_ref(),
+        }
+    }
+}

 /// An update from a sled regarding the state of a migration, indicating the
@@ -137,7 +157,6 @@ pub struct SledInstanceState {
 pub struct MigrationRuntimeState {
     pub migration_id: Uuid,
     pub state: MigrationState,
-    pub role: MigrationRole,
     pub gen: Generation,

     /// Timestamp for the migration state update.
@@ -192,32 +211,6 @@ impl fmt::Display for MigrationState {
     }
 }

-#[derive(
-    Clone, Copy, Debug, PartialEq, Eq, Deserialize, Serialize, JsonSchema,
-)]
-#[serde(rename_all = "snake_case")]
-pub enum MigrationRole {
-    /// This update concerns the source VMM of a migration.
-    Source,
-    /// This update concerns the target VMM of a migration.
-    Target,
-}
-
-impl MigrationRole {
-    pub fn label(&self) -> &'static str {
-        match self {
-            Self::Source => "source",
-            Self::Target => "target",
-        }
-    }
-}
-
-impl fmt::Display for MigrationRole {
-    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
-        f.write_str(self.label())
-    }
-}
-
 // Oximeter producer/collector objects.

 /// The kind of metric producer this is.
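With `MigrationRole` removed, a VMM's role in a migration appears to be implied by which field holds the state: `migration_in` means this VMM is the target, `migration_out` means it is the source. The sketch below mirrors the `Migrations` borrowed view from this file using simplified stand-in types. The `u64` migration ID, the `MigrationState` variants, and the `describe` helper are assumptions made for illustration; they are not taken from the diff.

```rust
/// Hypothetical, simplified migration state (variants are assumptions).
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum MigrationState {
    Pending,
    InProgress,
    Completed,
    Failed,
}

/// Simplified stand-in: the real type uses `Uuid`, `Generation`, and a
/// timestamp.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct MigrationRuntimeState {
    pub migration_id: u64,
    pub state: MigrationState,
}

pub struct SledInstanceState {
    pub migration_in: Option<MigrationRuntimeState>,
    pub migration_out: Option<MigrationRuntimeState>,
}

/// Borrowed view over both migration slots, as in the diff.
#[derive(Copy, Clone, Debug, Default)]
pub struct Migrations<'state> {
    pub migration_in: Option<&'state MigrationRuntimeState>,
    pub migration_out: Option<&'state MigrationRuntimeState>,
}

impl SledInstanceState {
    pub fn migrations(&self) -> Migrations<'_> {
        Migrations {
            migration_in: self.migration_in.as_ref(),
            migration_out: self.migration_out.as_ref(),
        }
    }
}

/// Hypothetical helper: derive the role from the slot, not from a tag.
pub fn describe(m: Migrations<'_>) -> Vec<String> {
    let mut out = Vec::new();
    if let Some(m_in) = m.migration_in {
        out.push(format!(
            "target of migration {}: {:?}",
            m_in.migration_id, m_in.state
        ));
    }
    if let Some(m_out) = m.migration_out {
        out.push(format!(
            "source of migration {}: {:?}",
            m_out.migration_id, m_out.state
        ));
    }
    out
}
```

One consequence of this shape is that a single VMM can report both an inbound and an outbound migration at once, which a single `migration_state` field with a `role` tag could not express.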
88 changes: 82 additions & 6 deletions dev-tools/omdb/src/bin/omdb/nexus.rs
@@ -929,6 +929,9 @@ fn print_task_details(bgtask: &BackgroundTask, details: &serde_json::Value) {
             /// number of stale instance metrics that were deleted
             pruned_instances: usize,

+            /// update sagas queued due to instance updates.
+            update_sagas_queued: usize,
+
             /// instance states from completed checks.
             ///
             /// this is a mapping of stringified instance states to the number
@@ -970,6 +973,7 @@ fn print_task_details(bgtask: &BackgroundTask, details: &serde_json::Value) {
             ),
             Ok(TaskSuccess {
                 total_instances,
+                update_sagas_queued,
                 pruned_instances,
                 instance_states,
                 failed_checks,
@@ -987,7 +991,7 @@ fn print_task_details(bgtask: &BackgroundTask, details: &serde_json::Value) {
                 for (state, count) in &instance_states {
                     println!(" -> {count} instances {state}")
                 }
-
+                println!(" update sagas queued: {update_sagas_queued}");
                 println!(" failed checks: {total_failures}");
                 for (failure, count) in &failed_checks {
                     println!(" -> {count} {failure}")
@@ -1239,11 +1243,6 @@ fn print_task_details(bgtask: &BackgroundTask, details: &serde_json::Value) {
     } else if name == "lookup_region_port" {
         match serde_json::from_value::<LookupRegionPortStatus>(details.clone())
         {
-            Err(error) => eprintln!(
-                "warning: failed to interpret task details: {:?}: {:?}",
-                error, details
-            ),
-
             Ok(LookupRegionPortStatus { found_port_ok, errors }) => {
                 println!(" total filled in ports: {}", found_port_ok.len());
                 for line in &found_port_ok {
@@ -1255,6 +1254,83 @@ fn print_task_details(bgtask: &BackgroundTask, details: &serde_json::Value) {
                     println!(" > {line}");
                 }
             }
+
+            Err(error) => eprintln!(
+                "warning: failed to interpret task details: {:?}: {:?}",
+                error, details,
+            ),
+        }
+    } else if name == "instance_updater" {
+        #[derive(Deserialize)]
+        struct UpdaterStatus {
+            /// number of instances found with destroyed active VMMs
+            destroyed_active_vmms: usize,
+
+            /// number of instances found with terminated active migrations
+            terminated_active_migrations: usize,
+
+            /// number of update sagas started.
+            sagas_started: usize,
+
+            /// number of sagas completed successfully
+            sagas_completed: usize,
+
+            /// number of sagas which failed
+            sagas_failed: usize,
+
+            /// number of sagas which could not be started
+            saga_start_failures: usize,
+
+            /// the last error that occurred during execution.
+            error: Option<String>,
+        }
+        match serde_json::from_value::<UpdaterStatus>(details.clone()) {
+            Err(error) => eprintln!(
+                "warning: failed to interpret task details: {:?}: {:?}",
+                error, details
+            ),
+            Ok(UpdaterStatus {
+                destroyed_active_vmms,
+                terminated_active_migrations,
+                sagas_started,
+                sagas_completed,
+                sagas_failed,
+                saga_start_failures,
+                error,
+            }) => {
+                if let Some(error) = error {
+                    println!(" task did not complete successfully!");
+                    println!(" most recent error: {error}");
+                }
+
+                println!(
+                    " total instances in need of updates: {}",
+                    destroyed_active_vmms + terminated_active_migrations
+                );
+                println!(
+                    " instances with destroyed active VMMs: {}",
+                    destroyed_active_vmms,
+                );
+                println!(
+                    " instances with terminated active migrations: {}",
+                    terminated_active_migrations,
+                );
+                println!(" update sagas started: {sagas_started}");
+                println!(
+                    " update sagas completed successfully: {}",
+                    sagas_completed,
+                );
+
+                let total_failed = sagas_failed + saga_start_failures;
+                if total_failed > 0 {
+                    println!(" unsuccessful update sagas: {total_failed}");
+                    println!(
+                        " sagas which could not be started: {}",
+                        saga_start_failures
+                    );
+                    println!(" sagas failed: {sagas_failed}");
+                }
+            }
         };
     } else {
         println!(
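The `instance_updater` branch above derives two totals from the task status before printing: instances needing updates (destroyed active VMMs plus terminated active migrations) and unsuccessful sagas (failed plus could-not-start), with the failure breakdown shown only when that total is non-zero. Below is a small sketch of the same arithmetic as a pure function that returns lines instead of printing them. The `summarize` function and the `Default` derive are hypothetical conveniences for the example; the struct fields follow the diff, minus the serde derive.

```rust
/// Mirrors the fields of the `UpdaterStatus` struct in the diff.
#[derive(Debug, Default)]
pub struct UpdaterStatus {
    pub destroyed_active_vmms: usize,
    pub terminated_active_migrations: usize,
    pub sagas_started: usize,
    pub sagas_completed: usize,
    pub sagas_failed: usize,
    pub saga_start_failures: usize,
    pub error: Option<String>,
}

/// Hypothetical helper: build the summary lines rather than printing them.
pub fn summarize(s: &UpdaterStatus) -> Vec<String> {
    let mut lines = Vec::new();
    if let Some(error) = &s.error {
        lines.push("task did not complete successfully!".to_string());
        lines.push(format!("most recent error: {error}"));
    }

    // Both kinds of finding need an update saga.
    let needing_updates =
        s.destroyed_active_vmms + s.terminated_active_migrations;
    lines.push(format!("total instances in need of updates: {needing_updates}"));
    lines.push(format!(
        "instances with destroyed active VMMs: {}",
        s.destroyed_active_vmms
    ));
    lines.push(format!(
        "instances with terminated active migrations: {}",
        s.terminated_active_migrations
    ));
    lines.push(format!("update sagas started: {}", s.sagas_started));
    lines.push(format!(
        "update sagas completed successfully: {}",
        s.sagas_completed
    ));

    // The failure breakdown is only reported when something went wrong.
    let total_failed = s.sagas_failed + s.saga_start_failures;
    if total_failed > 0 {
        lines.push(format!("unsuccessful update sagas: {total_failed}"));
        lines.push(format!(
            "sagas which could not be started: {}",
            s.saga_start_failures
        ));
        lines.push(format!("sagas failed: {}", s.sagas_failed));
    }
    lines
}
```

Returning lines rather than printing makes the arithmetic easy to check in a unit test; the real omdb code prints directly.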