Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[sled-agent] Use zone-network-setup service after underlay is up #7260

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
32 changes: 32 additions & 0 deletions illumos-utils/src/smf_helper.rs
Original file line number Diff line number Diff line change
Expand Up @@ -165,6 +165,38 @@ impl<'t> SmfHelper<'t> {
Ok(())
}

pub fn delpropvalue<P, V>(&self, prop: P, val: V) -> Result<(), Error>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this differ from the existing delpropvalue_default_instance method below? There's only one (default) instance of this service in a zone AFAIU, so they seem like they should be equivalent. I could be missing something.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They're almost the same, yes. You can set properties on the instance itself and/or on the service.

Example:

coatlicue@centzon:~$ svccfg -s svc:/system/fmd:default listprop
general                            framework
general/comment                    astring  
general/enabled                    boolean  true
restarter                          framework    NONPERSISTENT
restarter/logfile                  astring  /var/svc/log/system-fmd:default.log
restarter/contract                 count    89
restarter/start_pid                count    1516
restarter/start_method_timestamp   time     1734493550.186287000
restarter/start_method_waitstatus  integer  0
restarter/auxiliary_state          astring  dependencies_satisfied
restarter/next_state               astring  none
restarter/state                    astring  online
restarter/state_timestamp          time     1734493550.187733000
coatlicue@centzon:~$ svccfg -s svc:/system/fmd listprop
dumpadm-fmd                                    dependency
dumpadm-fmd/entities                           fmri     svc:/system/dumpadm
dumpadm-fmd/external                           boolean  true
dumpadm-fmd/grouping                           astring  require_all
dumpadm-fmd/restart_on                         astring  none
dumpadm-fmd/type                               astring  service
SUNWfmd                                        dependency
SUNWfmd/entities                               fmri     file://localhost/usr/lib/fm/fmd/fmd
SUNWfmd/grouping                               astring  require_all
SUNWfmd/restart_on                             astring  none
SUNWfmd/type                                   astring  path
<more properties>

The zone-network-service has it's properties set on the service directly so the other command would not delete the correct property values

where
P: ToString,
V: ToString,
{
match self
.running_zone
.run_cmd(&[
SVCCFG,
"-s",
&self.smf_name,
"delpropvalue",
&prop.to_string(),
&val.to_string(),
])
.map_err(|err| Error::ZoneCommand {
intent: format!("del {} smf property value", prop.to_string()),
err,
}) {
Ok(_) => (),
Err(e) => {
// If a property already doesn't exist we don't need to
// return an error
if !e.to_string().contains("No such property") {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd hit this if one tried to delete a property that doesn't exist, right? As opposed to deleting a value that doesn't exist from a property that does. Is that what we want? I could see an argument for failing if we try to deleting a non-existent property.

Copy link
Contributor Author

@karencfv karencfv Dec 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, yeah I see your point.

In this case, after deleting we run the svccfg addpropvalue command with the value type, which means the property is created if it doesn't exist. So, it seems like it's not a big deal if we don't return an error if the property didn't exist. If the property wasn't created previously for whatever reason, it's OK because we're creating it anyway.

That said, returning the error like you propose, would guarantee that we are not adding a property where it's not meant to be.

I'm a little torn on this one. WDYT?

return Err(e);
}
}
};

Ok(())
}

pub fn delpropvalue_default_instance<P, V>(
&self,
prop: P,
Expand Down
91 changes: 48 additions & 43 deletions sled-agent/src/services.rs
Original file line number Diff line number Diff line change
Expand Up @@ -54,7 +54,6 @@ use illumos_utils::running_zone::{
};
use illumos_utils::smf_helper::SmfHelper;
use illumos_utils::zfs::ZONE_ZFS_RAMDISK_DATASET_MOUNTPOINT;
use illumos_utils::zone::AddressRequest;
use illumos_utils::zpool::{PathInPool, ZpoolName};
use illumos_utils::{execute, PFEXEC};
use internal_dns_resolver::Resolver;
Expand Down Expand Up @@ -541,6 +540,18 @@ struct SwitchZoneConfigLocal {
root: Utf8PathBuf,
}

/// Service that sets up common networking across zones
struct ZoneNetworkSetupService {}

impl illumos_utils::smf_helper::Service for ZoneNetworkSetupService {
fn service_name(&self) -> String {
"zone-network-setup".to_string()
}
fn smf_name(&self) -> String {
format!("svc:/oxide/{}", self.service_name())
}
}

/// Describes either an Omicron-managed zone or the switch zone, used for
/// functions that operate on either one or the other
enum ZoneArgs<'a> {
Expand Down Expand Up @@ -4272,59 +4283,53 @@ impl ServiceManager {
);
*request = new_request;

// Add SMF properties here and restart zone-network-setup service
let first_address = request.addresses.get(0);
let address = first_address
.map(|addr| addr.to_string())
.unwrap_or_else(|| "".to_string());

for addr in &request.addresses {
if *addr == Ipv6Addr::LOCALHOST {
continue;
}
info!(
self.inner.log,
"Ensuring address {} exists",
addr.to_string()
);
let addr_request =
AddressRequest::new_static(IpAddr::V6(*addr), None);
zone.ensure_address(addr_request).await?;
info!(
self.inner.log,
"Ensuring address {} exists - OK",
addr.to_string()
);
}
// Set new properties for the network set up service and refresh
let nw_setup_svc = ZoneNetworkSetupService {};
let nsmfh = SmfHelper::new(&zone, &nw_setup_svc);

nsmfh.delpropvalue("config/gateway", "*")?;
nsmfh.delpropvalue("config/static_addr", "*")?;

// When the request addresses have changed this means the underlay is
// available now as well.
if let Some(info) = self.inner.sled_info.get() {
info!(
nsmfh.addpropvalue_type(
"config/gateway",
&info.underlay_address,
"astring",
)?;
} else {
// It should be impossible for the `sled_info` not to be set here.
// When the request addresses have changed this means the underlay is
// available as well.
error!(
self.inner.log,
"Ensuring there is a default route";
"gateway" => ?info.underlay_address,
concat!(
"sled agent info is not present,",
" even though underlay address exists"
)
);
match zone.add_default_route(info.underlay_address).map_err(
|err| Error::ZoneCommand {
intent: "Adding Route".to_string(),
err,
},
) {
Ok(_) => (),
Err(e) => {
if e.to_string().contains("entry exists") {
info!(
self.inner.log,
"Default route already exists";
"gateway" => ?info.underlay_address,
)
} else {
return Err(e);
}
}
};
}

for address in &request.addresses {
if *address != Ipv6Addr::LOCALHOST {
nsmfh.addpropvalue_type(
"config/static_addr",
&address,
"astring",
)?;
}
}
nsmfh.refresh()?;
info!(
self.inner.log,
"refreshed zone-network-setup service with new configuration"
);

for service in &request.services {
let smfh = SmfHelper::new(&zone, service);

Expand Down
6 changes: 5 additions & 1 deletion smf/zone-network-setup/manifest.xml
Original file line number Diff line number Diff line change
Expand Up @@ -12,7 +12,7 @@
<service_fmri value='svc:/milestone/network:default' />
</dependency>

<!-- Run after the NDP daemon is online.. -->
<!-- Run after the NDP daemon is online. -->
<dependency name='ndp' grouping='require_all' restart_on='none'
type='service'>
<service_fmri value='svc:/network/routing/ndp:default' />
Expand All @@ -31,6 +31,10 @@
timeout_seconds='0' />

<exec_method type='method' name='stop' exec=':true' timeout_seconds='0' />
<!-- We use the same command as the start method as it is safe to rerun. -->
<exec_method type='method' name='refresh'
exec='/opt/oxide/zone-setup-cli/bin/zone-setup common-networking -d %{config/datalink} -s %{config/static_addr} -g %{config/gateway}'
timeout_seconds='0' />
Comment on lines +34 to +37
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if the service is currently running when it's refreshed?

Copy link
Contributor Author

@karencfv karencfv Dec 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing really. All of the commands on that service are idempotent (I also manually refreshed this service multiple times to see what happened).

Additionally, it's highly unlikely this would happen. Several other services on the switch zone depend on this service, which means we wouldn't even get to the point where the service is refreshed, if the zone-network-setup service hadn't exited with 0 (as far as I understand how the switch zone startup works, I could be wrong though).


<property_group name='startd' type='framework'>
<propval name='duration' type='astring' value='transient' />
Expand Down
Loading