[nexus] Expunge disk internal API, omdb commands (#5994)
Provides an internal API to remove disks, and wires it into omdb.
It also expands the omdb commands to give more visibility into physical disks.

- `omdb db physical-disks` can be used to view all "control plane
physical disks". This is similar to, but distinct from, the `omdb db
inventory physical-disks` command: it reports the disks that have been
adopted into the control plane, rather than what inventory observed. This
command is necessary for identifying the UUID of the associated control
plane object, which is not observable via inventory. (A brief usage
sketch follows this list.)
- `omdb nexus sleds expunge-disk` can be used to expunge a physical disk
from a sled. This relies on several prior patches to operate correctly;
with the combination of #5987, #5965, #5931, #5952, #5601, and #5599, we
can observe the following behavior: expunging a disk leads to all
"users" of that disk (zone filesystems, datasets, zone bundles, etc.)
being removed.
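
To make the distinction between the two listing commands concrete, here is a
minimal usage sketch. It is derived from the argument definitions in the diff
below rather than from additional testing: the flag spellings follow the clap
derives (`-F`/`--filter` for the control-plane view, `--collection-id` and
`--sled-id` for the inventory view), and `$COLLECTION_ID` / `$SLED_ID` are
placeholder UUIDs.

```bash
# Control-plane view: physical disks that have been adopted by the control
# plane. With no filter this defaults to in-service disks; -F selects a
# different DiskFilter.
omdb db physical-disks -F in-service

# Inventory view: physical disks as observed in a particular inventory
# collection, optionally narrowed to a single sled (--sled-id requires
# --collection-id).
omdb db inventory physical-disks --collection-id $COLLECTION_ID --sled-id $SLED_ID
```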

I tested this PR on a4x2 using the following steps:

```bash
# Boot a4x2, confirm the Nexus zone is running
# From g0, zlogin oxz_switch

$ omdb db sleds

SERIAL  IP                             ROLE      POLICY      STATE   ID                                   
 g2      [fd00:1122:3344:103::1]:12345  -         in service  active  29fede5f-37e4-4528-bcf2-f3ee94924894 
 g0      [fd00:1122:3344:101::1]:12345  scrimlet  in service  active  6a2c7019-d055-4256-8bad-042b97aa0e5e 
 g1      [fd00:1122:3344:102::1]:12345  -         in service  active  a611b43e-3995-4cd4-9603-89ca6aca3dc5 
 g3      [fd00:1122:3344:104::1]:12345  scrimlet  in service  active  f62f2cfe-d17b-4bd6-ae64-57e8224d3672

# We'll plan on expunging a disk on g1, and observing the effects.
$ export SLED_ID=a611b43e-3995-4cd4-9603-89ca6aca3dc5
$ export OMDB_SLED_AGENT_URL=http://[fd00:1122:3344:102::1]:12345
$ omdb sled-agent zones list

zones:
    "oxz_cockroachdb_b3fecda8-2eb8-4ff3-9cf6-90c94fba7c50"
    "oxz_crucible_19831c98-3137-4af4-a93d-fc1a17c138f2"
    "oxz_crucible_6adcb8ec-6c9e-4e8a-a8d4-bbf9ad44e2c4"
    "oxz_crucible_74b2f587-10ce-4131-97fd-9832c52c8a41"
    "oxz_crucible_9e422508-f4d5-4c24-8dde-0080c0916419"
    "oxz_crucible_a47e9625-d189-4001-877a-cc3aa5b1f3eb"
    "oxz_crucible_pantry_c3b4e3cb-3e23-4f5e-921b-04e4801924fd"
    "oxz_external_dns_7e669b6f-a3fe-47a9-addd-20e42c58b8bb"
    "oxz_internal_dns_1a45a6e8-5b03-4ab4-a3db-e83fb7767767"
    "oxz_ntp_209ad0d0-a5e7-4ab8-ac8f-e99902697b32"
    "oxz_oximeter_864efebb-790f-4b7a-8377-b2c82c87f5b8"

$ omdb db physical-disks | grep $SLED_ID
 ID                                    SERIAL                 VENDOR            MODEL               SLED_ID                               POLICY      STATE
 23524716-a331-4d57-aa71-8bd4dbc916f8  synthetic-serial-g1_0  synthetic-vendor  synthetic-model-U2  a611b43e-3995-4cd4-9603-89ca6aca3dc5  in service  active 
 3ca1812b-55e3-47ed-861f-f667f626c8a0  synthetic-serial-g1_3  synthetic-vendor  synthetic-model-U2  a611b43e-3995-4cd4-9603-89ca6aca3dc5  in service  active 
 40139afb-7076-45d9-84cf-b96eefe7acf8  synthetic-serial-g1_1  synthetic-vendor  synthetic-model-U2  a611b43e-3995-4cd4-9603-89ca6aca3dc5  in service  active 
 5c8e33dd-1230-4214-af78-9be892d9f421  synthetic-serial-g1_4  synthetic-vendor  synthetic-model-U2  a611b43e-3995-4cd4-9603-89ca6aca3dc5  in service  active 
 85780bbf-8e2d-481e-9013-34611572f191  synthetic-serial-g1_2  synthetic-vendor  synthetic-model-U2  a611b43e-3995-4cd4-9603-89ca6aca3dc5  in service  active 

# Let's expunge the "0th" disk here.

$ omdb nexus sleds expunge-disk 23524716-a331-4d57-aa71-8bd4dbc916f8 -w
$ omdb nexus blueprints regenerate -w
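# (The regenerate command above should print the ID of the newly generated
# blueprint; that value is what $NEW_BLUEPRINT_ID refers to below.)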
$ omdb nexus blueprints show $NEW_BLUEPRINT_ID

# Observe that the new blueprint for the sled expunges some zones -- minimally,
# the Crucible zone -- and no longer lists the "g1_0" disk. This should also be
# summarized in the blueprint metadata comment.

$ omdb nexus blueprints target set $NEW_BLUEPRINT_ID enabled -w
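# Once the new blueprint is the enabled target, blueprint execution should
# push the updated disk/zone configuration down to the sled, which is why the
# zone list below is shorter.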
$ omdb sled-agent zones list

zones:
    "oxz_crucible_19831c98-3137-4af4-a93d-fc1a17c138f2"
    "oxz_crucible_74b2f587-10ce-4131-97fd-9832c52c8a41"
    "oxz_crucible_9e422508-f4d5-4c24-8dde-0080c0916419"
    "oxz_crucible_a47e9625-d189-4001-877a-cc3aa5b1f3eb"
    "oxz_crucible_pantry_c3b4e3cb-3e23-4f5e-921b-04e4801924fd"
    "oxz_ntp_209ad0d0-a5e7-4ab8-ac8f-e99902697b32"
    "oxz_oximeter_864efebb-790f-4b7a-8377-b2c82c87f5b8"

# As we can see, the expunged zones have been removed.
# We can also access the sled agent logs from g1 to observe that the expected requests have been sent
# to adjust the set of control plane disks and expunge the expected zones.
```

This is a major part of #4719.
Fixes #5370
smklein authored Jul 15, 2024 · commit 748a1d7 (1 parent: ad6c92e)
Showing 11 changed files with 497 additions and 44 deletions.
94 changes: 89 additions & 5 deletions dev-tools/omdb/src/bin/omdb/db.rs
@@ -62,6 +62,7 @@ use nexus_db_model::IpAttachState;
use nexus_db_model::IpKind;
use nexus_db_model::NetworkInterface;
use nexus_db_model::NetworkInterfaceKind;
use nexus_db_model::PhysicalDisk;
use nexus_db_model::Probe;
use nexus_db_model::Project;
use nexus_db_model::Region;
@@ -96,7 +97,10 @@ use nexus_types::deployment::Blueprint;
use nexus_types::deployment::BlueprintZoneDisposition;
use nexus_types::deployment::BlueprintZoneFilter;
use nexus_types::deployment::BlueprintZoneType;
use nexus_types::deployment::DiskFilter;
use nexus_types::deployment::SledFilter;
use nexus_types::external_api::views::PhysicalDiskPolicy;
use nexus_types::external_api::views::PhysicalDiskState;
use nexus_types::external_api::views::SledPolicy;
use nexus_types::external_api::views::SledState;
use nexus_types::identity::Resource;
@@ -281,12 +285,14 @@ pub struct DbFetchOptions {
enum DbCommands {
/// Print information about the rack
Rack(RackArgs),
/// Print information about disks
/// Print information about virtual disks
Disks(DiskArgs),
/// Print information about internal and external DNS
Dns(DnsArgs),
/// Print information about collected hardware/software inventory
Inventory(InventoryArgs),
/// Print information about physical disks
PhysicalDisks(PhysicalDisksArgs),
/// Save the current Reconfigurator inputs to a file
ReconfiguratorSave(ReconfiguratorSaveArgs),
/// Print information about regions
@@ -407,8 +413,8 @@ enum InventoryCommands {
Cabooses,
/// list and show details from particular collections
Collections(CollectionsArgs),
/// show all physical disks every found
PhysicalDisks(PhysicalDisksArgs),
/// show all physical disks ever found
PhysicalDisks(InvPhysicalDisksArgs),
/// list all root of trust pages ever found
RotPages,
}
@@ -437,14 +443,21 @@ struct CollectionsShowArgs {
}

#[derive(Debug, Args, Clone, Copy)]
struct PhysicalDisksArgs {
struct InvPhysicalDisksArgs {
#[clap(long)]
collection_id: Option<CollectionUuid>,

#[clap(long, requires("collection_id"))]
sled_id: Option<SledUuid>,
}

#[derive(Debug, Args)]
struct PhysicalDisksArgs {
/// Show disks that match the given filter
#[clap(short = 'F', long, value_enum)]
filter: Option<DiskFilter>,
}

#[derive(Debug, Args)]
struct ReconfiguratorSaveArgs {
/// where to save the output
@@ -611,6 +624,15 @@ impl DbArgs {
)
.await
}
DbCommands::PhysicalDisks(args) => {
cmd_db_physical_disks(
&opctx,
&datastore,
&self.fetch_opts,
args,
)
.await
}
DbCommands::ReconfiguratorSave(reconfig_save_args) => {
cmd_db_reconfigurator_save(
&opctx,
@@ -1385,6 +1407,68 @@ async fn cmd_db_disk_physical(
Ok(())
}

#[derive(Tabled)]
#[tabled(rename_all = "SCREAMING_SNAKE_CASE")]
struct PhysicalDiskRow {
id: Uuid,
serial: String,
vendor: String,
model: String,
sled_id: Uuid,
policy: PhysicalDiskPolicy,
state: PhysicalDiskState,
}

impl From<PhysicalDisk> for PhysicalDiskRow {
fn from(d: PhysicalDisk) -> Self {
PhysicalDiskRow {
id: d.id(),
serial: d.serial.clone(),
vendor: d.vendor.clone(),
model: d.model.clone(),
sled_id: d.sled_id,
policy: d.disk_policy.into(),
state: d.disk_state.into(),
}
}
}

/// Run `omdb db physical-disks`.
async fn cmd_db_physical_disks(
opctx: &OpContext,
datastore: &DataStore,
fetch_opts: &DbFetchOptions,
args: &PhysicalDisksArgs,
) -> Result<(), anyhow::Error> {
let limit = fetch_opts.fetch_limit;
let filter = match args.filter {
Some(filter) => filter,
None => {
eprintln!(
"note: listing all in-service disks \
(use -F to filter, e.g. -F in-service)"
);
DiskFilter::InService
}
};

let sleds = datastore
.physical_disk_list(&opctx, &first_page(limit), filter)
.await
.context("listing physical disks")?;
check_limit(&sleds, limit, || String::from("listing physical disks"));

let rows = sleds.into_iter().map(|s| PhysicalDiskRow::from(s));
let table = tabled::Table::new(rows)
.with(tabled::settings::Style::empty())
.with(tabled::settings::Padding::new(1, 1, 0, 0))
.to_string();

println!("{}", table);

Ok(())
}

// SERVICES

// Snapshots
@@ -3187,7 +3271,7 @@ async fn cmd_db_inventory_cabooses(
async fn cmd_db_inventory_physical_disks(
conn: &DataStoreConnection<'_>,
limit: NonZeroU32,
args: PhysicalDisksArgs,
args: InvPhysicalDisksArgs,
) -> Result<(), anyhow::Error> {
#[derive(Tabled)]
#[tabled(rename_all = "SCREAMING_SNAKE_CASE")]
(Only the diff for dev-tools/omdb/src/bin/omdb/db.rs is shown here; the diffs for the other 10 changed files are omitted.)
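
As a side note on the output format: the table printed by `omdb db
physical-disks` comes from the `tabled` pattern visible in
`cmd_db_physical_disks` above. Below is a small, self-contained sketch of that
pattern. The row type is a simplified stand-in rather than the real
`nexus_db_model::PhysicalDisk`, and it assumes `tabled` and `uuid` (with the
`v4` feature) versions that support the calls shown, which mirror the diff.

```rust
// Minimal sketch of the omdb table-rendering convention: derive `Tabled` for
// a row type, rename headers to SCREAMING_SNAKE_CASE, and render with an
// empty style (no grid lines) plus one space of horizontal padding per cell.
use tabled::settings::{Padding, Style};
use tabled::{Table, Tabled};
use uuid::Uuid;

#[derive(Tabled)]
#[tabled(rename_all = "SCREAMING_SNAKE_CASE")]
struct DiskRow {
    id: Uuid,
    serial: String,
    policy: String,
    state: String,
}

fn main() {
    let rows = vec![DiskRow {
        id: Uuid::new_v4(),
        serial: "synthetic-serial-g1_0".to_string(),
        policy: "in service".to_string(),
        state: "active".to_string(),
    }];

    let table = Table::new(rows)
        .with(Style::empty())
        .with(Padding::new(1, 1, 0, 0))
        .to_string();

    println!("{table}");
}
```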
