Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Sled Agent] Expunged disks are not in use after omicron_physical_disks_ensure #5965

Merged
merged 70 commits into from
Jul 15, 2024
Merged
Show file tree
Hide file tree
Changes from 68 commits
Commits
Show all changes
70 commits
Select commit Hold shift + click to select a range
cee78bc
Start requiring zone filesystem argument
smklein Jun 20, 2024
3be4b6e
Deprecate the old service format
smklein Jun 20, 2024
8756076
Merge branch 'main' into nexus-zone-filesystems-2
smklein Jun 20, 2024
0aac450
Merge branch 'deprecate-services-migration' into nexus-zone-filesyste…
smklein Jun 20, 2024
3833549
Plumbing through filesystem_pool, still need to make it optional
smklein Jun 21, 2024
aea4bdb
Merge branch 'main' into deprecate-services-migration
smklein Jun 21, 2024
b58352f
review feedback
smklein Jun 21, 2024
a04e9c7
no bail just warn
smklein Jun 21, 2024
4615f1b
Merge branch 'deprecate-services-migration' into nexus-zone-filesyste…
smklein Jun 21, 2024
9f09c32
Merge branch 'main' into deprecate-services-migration
smklein Jun 21, 2024
9db3042
Merge branch 'deprecate-services-migration' into nexus-zone-filesyste…
smklein Jun 21, 2024
a96fc81
optional value
smklein Jun 21, 2024
9858dbf
are we optional yet
smklein Jun 21, 2024
f1e6f7a
lie about filesystem_pools for simulated sled agent
smklein Jun 21, 2024
8a9ade7
Patch test_builder_zones
smklein Jun 24, 2024
d7c462c
Fix test_silos_external_dns_end_to_end
smklein Jun 24, 2024
3c59610
patch v3 schema
smklein Jun 24, 2024
1270098
Patch blueprint edit
smklein Jun 24, 2024
f48fba3
Add schema change
smklein Jun 24, 2024
684932d
fmt
smklein Jun 24, 2024
87b8df9
Merge branch 'main' into nexus-zone-filesystems-2
smklein Jun 24, 2024
52406a6
helios tests
smklein Jun 24, 2024
acaf91f
Merge branch 'main' into nexus-zone-filesystems-2
smklein Jun 24, 2024
b1339d4
Cleanup
smklein Jun 24, 2024
fcea2f1
Merge branch 'main' into nexus-zone-filesystems-2
smklein Jun 25, 2024
53027a3
only pick in-service zpools from reconfigurator - regression test wanted
smklein Jun 25, 2024
ae41399
Merge zpool selection fns
smklein Jun 25, 2024
5b38070
Add colocation test
smklein Jun 25, 2024
f0ab1c2
Merge branch 'main' into nexus-zone-filesystems-2
smklein Jun 26, 2024
b883eec
Ensure expunged disks are not in use after omicron_physical_disks_ensure
smklein Jun 27, 2024
83c7cdf
Fix tests, add comments
smklein Jun 27, 2024
6869d92
Zone bundler
smklein Jun 28, 2024
4292158
Plumb 'PathInPool' structure
smklein Jul 1, 2024
17db428
Destroy instances
smklein Jul 1, 2024
32596df
Remove unused zone code
smklein Jul 1, 2024
d83a553
Merge branch 'main' into nexus-zone-filesystems-2
smklein Jul 1, 2024
d6618e7
Merge branch 'nexus-zone-filesystems-2' into physical_disks_ensure_le…
smklein Jul 1, 2024
2c6eb01
fix helios tests
smklein Jul 1, 2024
e4123a9
Add TODO, re: concurrency safety
smklein Jul 1, 2024
98278d4
Merge branch 'main' into nexus-zone-filesystems-2
smklein Jul 1, 2024
fa91e75
Merge branch 'nexus-zone-filesystems-2' into physical_disks_ensure_le…
smklein Jul 1, 2024
1207c9e
very WIP - adjusting generation
smklein Jul 2, 2024
892a7ca
Stop self-managing disks
smklein Jul 2, 2024
15b8d21
Merge branch 'stop-self-managing-disks' into physical_disks_ensure_le…
smklein Jul 2, 2024
c0e8e07
Fix imports
smklein Jul 2, 2024
654a4ce
Merge branch 'stop-self-managing-disks' into physical_disks_ensure_le…
smklein Jul 2, 2024
187aea3
generation number unity
smklein Jul 2, 2024
7c5a67f
Merge branch 'main' into stop-self-managing-disks
smklein Jul 2, 2024
b50007b
Merge branch 'stop-self-managing-disks' into physical_disks_ensure_le…
smklein Jul 2, 2024
c2ee842
Remove self-managing test too
smklein Jul 2, 2024
a437cc2
imports
smklein Jul 2, 2024
d9ab0e2
Merge branch 'main' into stop-self-managing-disks
smklein Jul 2, 2024
3d91d67
Merge branch 'stop-self-managing-disks' into physical_disks_ensure_le…
smklein Jul 2, 2024
c7d4e2e
Merge branch 'main' into stop-self-managing-disks
smklein Jul 2, 2024
7751f12
Merge branch 'stop-self-managing-disks' into physical_disks_ensure_le…
smklein Jul 2, 2024
8f2301d
Safe against concurrent updates
smklein Jul 2, 2024
48c3578
Patch firmware tests
smklein Jul 2, 2024
e933a46
Add a bunch of logging
smklein Jul 2, 2024
154a071
review feedback
smklein Jul 3, 2024
691bc85
tx naming
smklein Jul 5, 2024
e360dae
more explicit instance termination
smklein Jul 5, 2024
ec013d9
better handling of oneshot tx in instance manager
smklein Jul 5, 2024
a818de2
use_only_these_disks
smklein Jul 5, 2024
f242e0a
Mark vmm failed
smklein Jul 5, 2024
77931fd
Merge branch 'main' into stop-self-managing-disks
smklein Jul 12, 2024
6babd19
Merge branch 'stop-self-managing-disks' into physical_disks_ensure_le…
smklein Jul 12, 2024
d8a5465
Merge branch 'main' into stop-self-managing-disks
smklein Jul 12, 2024
d57ec70
Merge branch 'stop-self-managing-disks' into physical_disks_ensure_le…
smklein Jul 12, 2024
9e1d729
Merge branch 'main' into stop-self-managing-disks
smklein Jul 15, 2024
426daf1
Merge branch 'stop-self-managing-disks' into physical_disks_ensure_le…
smklein Jul 15, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
171 changes: 19 additions & 152 deletions illumos-utils/src/running_zone.rs
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@ use crate::link::{Link, VnicAllocator};
use crate::opte::{Port, PortTicket};
use crate::svc::wait_for_service;
use crate::zone::{AddressRequest, IPADM, ZONE_PREFIX};
use crate::zpool::{PathInPool, ZpoolName};
use camino::{Utf8Path, Utf8PathBuf};
use camino_tempfile::Utf8TempDir;
use ipnetwork::IpNetwork;
Expand Down Expand Up @@ -101,60 +102,6 @@ pub enum EnsureAddressError {
OpteGatewayConfig(#[from] RunCommandError),
}

/// Errors returned from [`RunningZone::get`].
#[derive(thiserror::Error, Debug)]
pub enum GetZoneError {
#[error("While looking up zones with prefix '{prefix}', could not get zones: {err}")]
GetZones {
prefix: String,
#[source]
err: crate::zone::AdmError,
},

#[error("Invalid Utf8 path: {0}")]
FromPathBuf(#[from] camino::FromPathBufError),

#[error("Zone with prefix '{prefix}' not found")]
NotFound { prefix: String },

#[error("Cannot get zone '{name}': it is in the {state:?} state instead of running")]
NotRunning { name: String, state: zone::State },

#[error(
"Cannot get zone '{name}': Failed to acquire control interface {err}"
)]
ControlInterface {
name: String,
#[source]
err: crate::zone::GetControlInterfaceError,
},

#[error("Cannot get zone '{name}': Failed to create addrobj: {err}")]
AddrObject {
name: String,
#[source]
err: crate::addrobj::ParseError,
},

#[error(
"Cannot get zone '{name}': Failed to ensure address exists: {err}"
)]
EnsureAddress {
name: String,
#[source]
err: crate::zone::EnsureAddressError,
},

#[error(
"Cannot get zone '{name}': Incorrect bootstrap interface access {err}"
)]
BootstrapInterface {
name: String,
#[source]
err: crate::zone::GetBootstrapInterfaceError,
},
}

#[cfg(target_os = "illumos")]
static REAPER_THREAD: OnceLock<thread::JoinHandle<()>> = OnceLock::new();

Expand Down Expand Up @@ -407,6 +354,11 @@ impl RunningZone {
self.inner.root()
}

/// Returns the zpool on which the filesystem path has been placed.
pub fn root_zpool(&self) -> Option<&ZpoolName> {
self.inner.zonepath.pool.as_ref()
}

pub fn control_interface(&self) -> AddrObject {
AddrObject::new(self.inner.get_control_vnic_name(), "omicron6").unwrap()
}
Expand Down Expand Up @@ -797,95 +749,6 @@ impl RunningZone {
Ok(())
}

/// Looks up a running zone based on the `zone_prefix`, if one already exists.
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function tries to "derive the zone information from the system" if the Sled Agent didn't know about it before.

This is a little hairy for the new PathInPool type, because we'd need to re-derive the pool location from the filesystem path. This is possible, but I just opted to not do it, because no one is currently accessing this function.

Instead, given that this function has no callers, I just removed it.

///
/// - If the zone was found, is running, and has a network interface, it is
/// returned.
/// - If the zone was not found `Error::NotFound` is returned.
/// - If the zone was found, but not running, `Error::NotRunning` is
/// returned.
/// - Other errors may be returned attempting to look up and accessing an
/// address on the zone.
pub async fn get(
log: &Logger,
vnic_allocator: &VnicAllocator<Etherstub>,
zone_prefix: &str,
addrtype: AddressRequest,
) -> Result<Self, GetZoneError> {
let zone_info = Zones::get()
.await
.map_err(|err| GetZoneError::GetZones {
prefix: zone_prefix.to_string(),
err,
})?
.into_iter()
.find(|zone_info| zone_info.name().starts_with(&zone_prefix))
.ok_or_else(|| GetZoneError::NotFound {
prefix: zone_prefix.to_string(),
})?;

if zone_info.state() != zone::State::Running {
return Err(GetZoneError::NotRunning {
name: zone_info.name().to_string(),
state: zone_info.state(),
});
}

let zone_name = zone_info.name();
let vnic_name =
Zones::get_control_interface(zone_name).map_err(|err| {
GetZoneError::ControlInterface {
name: zone_name.to_string(),
err,
}
})?;
let addrobj = AddrObject::new_control(&vnic_name).map_err(|err| {
GetZoneError::AddrObject { name: zone_name.to_string(), err }
})?;
Zones::ensure_address(Some(zone_name), &addrobj, addrtype).map_err(
|err| GetZoneError::EnsureAddress {
name: zone_name.to_string(),
err,
},
)?;

let control_vnic = vnic_allocator
.wrap_existing(vnic_name)
.expect("Failed to wrap valid control VNIC");

// The bootstrap address for a running zone never changes,
// so there's no need to call `Zones::ensure_address`.
// Currently, only the switch zone has a bootstrap interface.
let bootstrap_vnic = Zones::get_bootstrap_interface(zone_name)
.map_err(|err| GetZoneError::BootstrapInterface {
name: zone_name.to_string(),
err,
})?
.map(|name| {
vnic_allocator
.wrap_existing(name)
.expect("Failed to wrap valid bootstrap VNIC")
});

Ok(Self {
id: zone_info.id().map(|x| {
x.try_into().expect("zoneid_t is expected to be an i32")
}),
inner: InstalledZone {
log: log.new(o!("zone" => zone_name.to_string())),
zonepath: zone_info.path().to_path_buf().try_into()?,
name: zone_name.to_string(),
control_vnic,
// TODO(https://github.com/oxidecomputer/omicron/issues/725)
//
// Re-initialize guest_vnic state by inspecting the zone.
opte_ports: vec![],
links: vec![],
bootstrap_vnic,
},
})
}

/// Return references to the OPTE ports for this zone.
pub fn opte_ports(&self) -> impl Iterator<Item = &Port> {
self.inner.opte_ports()
Expand Down Expand Up @@ -1081,7 +944,7 @@ pub struct InstalledZone {
log: Logger,

// Filesystem path of the zone
zonepath: Utf8PathBuf,
zonepath: PathInPool,

// Name of the Zone.
name: String,
Expand Down Expand Up @@ -1131,7 +994,7 @@ impl InstalledZone {

/// Returns the filesystem path to the zonepath
pub fn zonepath(&self) -> &Utf8Path {
&self.zonepath
&self.zonepath.path
}

pub fn site_profile_xml_path(&self) -> Utf8PathBuf {
Expand All @@ -1147,7 +1010,7 @@ impl InstalledZone {

/// Returns the filesystem path to the zone's root in the GZ.
pub fn root(&self) -> Utf8PathBuf {
self.zonepath.join(Self::ROOT_FS_PATH)
self.zonepath.path.join(Self::ROOT_FS_PATH)
}
}

Expand Down Expand Up @@ -1198,7 +1061,7 @@ pub struct ZoneBuilder<'a> {
/// Allocates the NIC used for control plane communication.
underlay_vnic_allocator: Option<&'a VnicAllocator<Etherstub>>,
/// Filesystem path at which the installed zone will reside.
zone_root_path: Option<&'a Utf8Path>,
zone_root_path: Option<PathInPool>,
/// The directories that will be searched for the image tarball for the
/// provided zone type ([`Self::with_zone_type`]).
zone_image_paths: Option<&'a [Utf8PathBuf]>,
Expand Down Expand Up @@ -1251,7 +1114,7 @@ impl<'a> ZoneBuilder<'a> {
}

/// Filesystem path at which the installed zone will reside.
pub fn with_zone_root_path(mut self, root_path: &'a Utf8Path) -> Self {
pub fn with_zone_root_path(mut self, root_path: PathInPool) -> Self {
self.zone_root_path = Some(root_path);
self
}
Expand Down Expand Up @@ -1345,8 +1208,11 @@ impl<'a> ZoneBuilder<'a> {
self.zone_type?,
self.unique_name,
);
let zonepath = temp_dir
.join(self.zone_root_path?.strip_prefix("/").unwrap())
let mut zonepath = self.zone_root_path?;
zonepath.path = temp_dir
.join(
zonepath.path.strip_prefix("/").unwrap()
)
.join(&full_zone_name);
let iz = InstalledZone {
log: self.log?,
Expand Down Expand Up @@ -1376,7 +1242,7 @@ impl<'a> ZoneBuilder<'a> {
let Self {
log: Some(log),
underlay_vnic_allocator: Some(underlay_vnic_allocator),
zone_root_path: Some(zone_root_path),
zone_root_path: Some(mut zone_root_path),
zone_image_paths: Some(zone_image_paths),
zone_type: Some(zone_type),
unique_name,
Expand Down Expand Up @@ -1440,6 +1306,7 @@ impl<'a> ZoneBuilder<'a> {
net_device_names.sort();
net_device_names.dedup();

zone_root_path.path = zone_root_path.path.join(&full_zone_name);
Zones::install_omicron_zone(
&log,
&zone_root_path,
Expand All @@ -1460,7 +1327,7 @@ impl<'a> ZoneBuilder<'a> {

Ok(InstalledZone {
log: log.new(o!("zone" => full_zone_name.clone())),
zonepath: zone_root_path.join(&full_zone_name),
zonepath: zone_root_path,
name: full_zone_name,
control_vnic,
bootstrap_vnic,
Expand Down
6 changes: 3 additions & 3 deletions illumos-utils/src/zone.rs
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@ use std::net::{IpAddr, Ipv6Addr};

use crate::addrobj::AddrObject;
use crate::dladm::{EtherstubVnic, VNIC_PREFIX_BOOTSTRAP, VNIC_PREFIX_CONTROL};
use crate::zpool::PathInPool;
use crate::{execute, PFEXEC};
use omicron_common::address::SLED_PREFIX;

Expand Down Expand Up @@ -282,7 +283,7 @@ impl Zones {
#[allow(clippy::too_many_arguments)]
pub async fn install_omicron_zone(
log: &Logger,
zone_root_path: &Utf8Path,
zone_root_path: &PathInPool,
zone_name: &str,
zone_image: &Utf8Path,
datasets: &[zone::Dataset],
Expand Down Expand Up @@ -319,10 +320,9 @@ impl Zones {
true,
zone::CreationOptions::Blank,
);
let path = zone_root_path.join(zone_name);
cfg.get_global()
.set_brand("omicron1")
.set_path(&path)
.set_path(&zone_root_path.path)
.set_autoboot(false)
.set_ip_type(zone::IpType::Exclusive);
if !limit_priv.is_empty() {
Expand Down
15 changes: 14 additions & 1 deletion illumos-utils/src/zpool.rs
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@
//! Utilities for managing Zpools.

use crate::{execute, ExecutionError, PFEXEC};
use camino::Utf8Path;
use camino::{Utf8Path, Utf8PathBuf};
use std::str::FromStr;

pub use omicron_common::zpool_name::ZpoolName;
Expand Down Expand Up @@ -181,6 +181,19 @@ impl FromStr for ZpoolInfo {
/// Wraps commands for interacting with ZFS pools.
pub struct Zpool {}

/// A path which exists within a pool.
///
/// By storing these types together, it's possible to answer
/// whether or not a path exists on a particular device.
// Technically we could re-derive the pool name from the path,
// but that involves some string parsing, and honestly I'd just
// Rather Not.
#[derive(Debug, Clone, Eq, PartialEq)]
pub struct PathInPool {
pub pool: Option<ZpoolName>,
pub path: Utf8PathBuf,
}

#[cfg_attr(any(test, feature = "testing"), mockall::automock, allow(dead_code))]
impl Zpool {
pub fn create(
Expand Down
13 changes: 10 additions & 3 deletions sled-agent/src/common/instance.rs
Original file line number Diff line number Diff line change
Expand Up @@ -486,9 +486,15 @@ impl InstanceStates {
/// instance's state in Nexus may become inconsistent. This routine should
/// therefore only be invoked by callers who know that an instance is not
/// migrating.
pub(crate) fn terminate_rudely(&mut self) {
pub(crate) fn terminate_rudely(&mut self, mark_failed: bool) {
let vmm_state = if mark_failed {
PropolisInstanceState(PropolisApiState::Failed)
} else {
PropolisInstanceState(PropolisApiState::Destroyed)
};

let fake_observed = ObservedPropolisState {
vmm_state: PropolisInstanceState(PropolisApiState::Destroyed),
vmm_state,
migration_status: if self.instance.migration_id.is_some() {
ObservedMigrationStatus::Failed
} else {
Expand Down Expand Up @@ -893,7 +899,8 @@ mod test {
assert_eq!(state.propolis_role(), PropolisRole::MigrationTarget);

let prev = state.clone();
state.terminate_rudely();
let mark_failed = false;
state.terminate_rudely(mark_failed);

assert_state_change_has_gen_change(&prev, &state);
assert_eq!(state.instance.gen, prev.instance.gen);
Expand Down
Loading
Loading