Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

blueprint-execution: plumb service firewall rules #5157

Closed
wants to merge 10 commits into from
19 changes: 19 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

3 changes: 3 additions & 0 deletions Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,7 @@ members = [
"nexus",
"nexus/authz-macros",
"nexus/blueprint-execution",
"nexus/capabilities",
"nexus/db-macros",
"nexus/db-model",
"nexus/db-queries",
Expand Down Expand Up @@ -115,6 +116,7 @@ default-members = [
"nexus",
"nexus/authz-macros",
"nexus/blueprint-execution",
"nexus/capabilities",
"nexus/macros-common",
"nexus/db-macros",
"nexus/db-model",
Expand Down Expand Up @@ -254,6 +256,7 @@ newtype_derive = "0.1.6"
mg-admin-client = { path = "clients/mg-admin-client" }
multimap = "0.10.0"
nexus-blueprint-execution = { path = "nexus/blueprint-execution" }
nexus-capabilities = { path = "nexus/capabilities" }
nexus-client = { path = "clients/nexus-client" }
nexus-db-model = { path = "nexus/db-model" }
nexus-db-queries = { path = "nexus/db-queries" }
Expand Down
1 change: 1 addition & 0 deletions nexus/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -78,6 +78,7 @@ trust-dns-resolver.workspace = true
uuid.workspace = true

nexus-blueprint-execution.workspace = true
nexus-capabilities.workspace = true
nexus-defaults.workspace = true
nexus-db-model.workspace = true
nexus-db-queries.workspace = true
Expand Down
1 change: 1 addition & 0 deletions nexus/blueprint-execution/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@ dns-service-client.workspace = true
futures.workspace = true
illumos-utils.workspace = true
internal-dns.workspace = true
nexus-capabilities.workspace = true
nexus-db-model.workspace = true
nexus-db-queries.workspace = true
nexus-types.workspace = true
Expand Down
68 changes: 56 additions & 12 deletions nexus/blueprint-execution/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -7,13 +7,16 @@
//! See `nexus_deployment` crate-level docs for background.

use anyhow::{anyhow, Context};
use nexus_capabilities::Base;
use nexus_capabilities::FirewallRules;
use nexus_db_queries::context::OpContext;
use nexus_db_queries::db::DataStore;
use nexus_types::deployment::Blueprint;
use nexus_types::identity::Asset;
use omicron_common::address::Ipv6Subnet;
use omicron_common::address::SLED_PREFIX;
use slog::info;
use slog::Logger;
use slog_error_chain::InlineErrorChain;
use std::collections::BTreeMap;
use std::net::SocketAddrV6;
Expand Down Expand Up @@ -46,6 +49,31 @@ impl From<nexus_db_model::Sled> for Sled {
}
}

// A vastly-restricted `Nexus` object that allows us access to some of Nexus
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's an example of constructing a not-Nexus-that-can-still-do-Nexus-things.

// proper's capabilities.
struct NexusContext<'a> {
opctx: OpContext,
datastore: &'a DataStore,
}

impl Base for NexusContext<'_> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pattern is interesting - at first read, I kinda expected that this would be the Nexus object.

This really isn't the Nexus object, nor is it truly a subset, right? It's:

  1. Taking a chunk of code that used to be part of struct Nexus
  2. Moving it out, and copying all fields from struct Nexus it used
  3. ... But now it's a totally new object.

I bring this up because in this use-case, you happen to use a single opctx + datastore, but there's a lot more stuff that could be used here by other capabilities, if we keep splitting this stuff up.

For example: app/silo.rs uses self.external_resolver, so presumably that field would need to be copied (or fully moved) into the NexusContext into a "capability to access silos".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This really isn't the Nexus object, nor is it truly a subset, right? It's:

  1. Taking a chunk of code that used to be part of struct Nexus
  2. Moving it out, and copying all fields from struct Nexus it used
  3. ... But now it's a totally new object.

This is all correct. And in particular, in this case the logger we present is different from the one stored in Nexus, which means any logs emitted by plumb_service_firewall_rules (or other nexus-capabilities methods it calls in turn) will use our RPW logger instead of the Nexus logger.

I bring this up because in this use-case, you happen to use a single opctx + datastore, but there's a lot more stuff that could be used here by other capabilities, if we keep splitting this stuff up.

For example: app/silo.rs uses self.external_resolver, so presumably that field would need to be copied (or fully moved) into the NexusContext into a "capability to access silos".

All correct, yes.

fn log(&self) -> &Logger {
&self.opctx.log
}

fn datastore(&self) -> &DataStore {
self.datastore
}
}

impl nexus_capabilities::SledAgent for NexusContext<'_> {
fn opctx_sled_client(&self) -> &OpContext {
&self.opctx
}
}

impl nexus_capabilities::FirewallRules for NexusContext<'_> {}

/// Make one attempt to realize the given blueprint, meaning to take actions to
/// alter the real system to match the blueprint
///
Expand All @@ -64,43 +92,59 @@ where
"comment".to_string(),
blueprint.comment.clone(),
)]));
let nexusctx = NexusContext { opctx, datastore };

info!(
opctx.log,
nexusctx.log(),
"attempting to realize blueprint";
"blueprint_id" => ?blueprint.id
"blueprint_id" => %blueprint.id
);

resource_allocation::ensure_zone_resources_allocated(
&opctx,
datastore,
&nexusctx.opctx,
nexusctx.datastore,
&blueprint.omicron_zones,
)
.await
.map_err(|err| vec![err])?;

let sleds_by_id: BTreeMap<Uuid, _> = datastore
.sled_list_all_batched(&opctx)
let sleds_by_id: BTreeMap<Uuid, _> = nexusctx
.datastore
.sled_list_all_batched(&nexusctx.opctx)
.await
.context("listing all sleds")
.map_err(|e| vec![e])?
.into_iter()
.map(|db_sled| (db_sled.id(), Sled::from(db_sled)))
.collect();
omicron_zones::deploy_zones(&opctx, &sleds_by_id, &blueprint.omicron_zones)
.await?;
omicron_zones::deploy_zones(
&nexusctx,
&sleds_by_id,
&blueprint.omicron_zones,
)
.await?;

// After deploying omicron zones, we may need to refresh OPTE service
// firewall rules. This is an idempotent operation, so we don't attempt
// to optimize out calling it in unnecessary cases, although we expect
// _most_ cases this is not needed.
nexusctx
.plumb_service_firewall_rules(&nexusctx.opctx, &[])
.await
.context("failed to plumb service firewall rules to sleds")
.map_err(|err| vec![err])?;

datasets::ensure_crucible_dataset_records_exist(
&opctx,
datastore,
&nexusctx.opctx,
nexusctx.datastore,
blueprint.all_omicron_zones().map(|(_sled_id, zone)| zone),
)
.await
.map_err(|err| vec![err])?;

dns::deploy_dns(
&opctx,
datastore,
&nexusctx.opctx,
nexusctx.datastore,
String::from(nexus_label),
blueprint,
&sleds_by_id,
Expand Down
60 changes: 23 additions & 37 deletions nexus/blueprint-execution/src/omicron_zones.rs
Original file line number Diff line number Diff line change
Expand Up @@ -4,14 +4,15 @@

//! Manges deployment of Omicron zones to Sled Agents

use crate::NexusContext;
use crate::Sled;
use anyhow::anyhow;
use anyhow::Context;
use futures::stream;
use futures::StreamExt;
use nexus_db_queries::context::OpContext;
use nexus_capabilities::Base;
use nexus_capabilities::SledAgent;
use nexus_types::deployment::OmicronZonesConfig;
use sled_agent_client::Client as SledAgentClient;
use slog::info;
use slog::warn;
use std::collections::BTreeMap;
Expand All @@ -20,7 +21,7 @@ use uuid::Uuid;
/// Idempotently ensure that the specified Omicron zones are deployed to the
/// corresponding sleds
pub(crate) async fn deploy_zones(
opctx: &OpContext,
nexusctx: &NexusContext<'_>,
sleds_by_id: &BTreeMap<Uuid, Sled>,
zones: &BTreeMap<Uuid, OmicronZonesConfig>,
) -> Result<(), Vec<anyhow::Error>> {
Expand All @@ -30,23 +31,24 @@ pub(crate) async fn deploy_zones(
Some(sled) => sled,
None => {
let err = anyhow!("sled not found in db list: {}", sled_id);
warn!(opctx.log, "{err:#}");
warn!(nexusctx.log(), "{err:#}");
return Some(err);
}
};
let client = sled_client(opctx, &db_sled);
let client =
nexusctx.sled_client(*sled_id, db_sled.sled_agent_address);
let result =
client.omicron_zones_put(&config).await.with_context(|| {
client.omicron_zones_put(config).await.with_context(|| {
format!("Failed to put {config:#?} to sled {sled_id}")
});
match result {
Err(error) => {
warn!(opctx.log, "{error:#}");
warn!(nexusctx.log(), "{error:#}");
Some(error)
}
Ok(_) => {
info!(
opctx.log,
nexusctx.log(),
"Successfully deployed zones for sled agent";
"sled_id" => %sled_id,
"generation" => %config.generation,
Expand All @@ -65,29 +67,10 @@ pub(crate) async fn deploy_zones(
}
}

// This is a modified copy of the functionality from `nexus/src/app/sled.rs`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yay, grossness expunged!

// There's no good way to access this functionality right now since it is a
// method on the `Nexus` type. We want to have a more constrained type we can
// pass into background tasks for this type of functionality, but for now we
// just copy the functionality.
fn sled_client(opctx: &OpContext, sled: &Sled) -> SledAgentClient {
let dur = std::time::Duration::from_secs(60);
let client = reqwest::ClientBuilder::new()
.connect_timeout(dur)
.timeout(dur)
.build()
.unwrap();
SledAgentClient::new_with_client(
&format!("http://{}", sled.sled_agent_address),
client,
opctx.log.clone(),
)
}

#[cfg(test)]
mod test {
use super::deploy_zones;
use crate::Sled;
use crate::{NexusContext, Sled};
use httptest::matchers::{all_of, json_decoded, request};
use httptest::responders::status_code;
use httptest::Expectation;
Expand Down Expand Up @@ -135,10 +118,13 @@ mod test {
async fn test_deploy_omicron_zones(cptestctx: &ControlPlaneTestContext) {
let nexus = &cptestctx.server.apictx().nexus;
let datastore = nexus.datastore();
let opctx = OpContext::for_tests(
cptestctx.logctx.log.clone(),
datastore.clone(),
);
let nexusctx = NexusContext {
opctx: OpContext::for_tests(
cptestctx.logctx.log.clone(),
datastore.clone(),
),
datastore,
};

// Create some fake sled-agent servers to respond to zone puts and add
// sleds to CRDB.
Expand All @@ -165,7 +151,7 @@ mod test {
// Get a success result back when the blueprint has an empty set of
// zones.
let blueprint = Arc::new(create_blueprint(BTreeMap::new()));
deploy_zones(&opctx, &sleds_by_id, &blueprint.1.omicron_zones)
deploy_zones(&nexusctx, &sleds_by_id, &blueprint.1.omicron_zones)
.await
.expect("failed to deploy no zones");

Expand Down Expand Up @@ -220,7 +206,7 @@ mod test {
}

// Execute it.
deploy_zones(&opctx, &sleds_by_id, &blueprint.1.omicron_zones)
deploy_zones(&nexusctx, &sleds_by_id, &blueprint.1.omicron_zones)
.await
.expect("failed to deploy initial zones");

Expand All @@ -237,7 +223,7 @@ mod test {
.respond_with(status_code(204)),
);
}
deploy_zones(&opctx, &sleds_by_id, &blueprint.1.omicron_zones)
deploy_zones(&nexusctx, &sleds_by_id, &blueprint.1.omicron_zones)
.await
.expect("failed to deploy same zones");
s1.verify_and_clear();
Expand All @@ -261,7 +247,7 @@ mod test {
);

let errors =
deploy_zones(&opctx, &sleds_by_id, &blueprint.1.omicron_zones)
deploy_zones(&nexusctx, &sleds_by_id, &blueprint.1.omicron_zones)
.await
.expect_err("unexpectedly succeeded in deploying zones");

Expand Down Expand Up @@ -311,7 +297,7 @@ mod test {
}

// Activate the task
deploy_zones(&opctx, &sleds_by_id, &blueprint.1.omicron_zones)
deploy_zones(&nexusctx, &sleds_by_id, &blueprint.1.omicron_zones)
.await
.expect("failed to deploy last round of zones");
s1.verify_and_clear();
Expand Down
18 changes: 18 additions & 0 deletions nexus/capabilities/Cargo.toml
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
[package]
name = "nexus-capabilities"
version = "0.1.0"
edition = "2021"

[dependencies]
async-trait.workspace = true
futures.workspace = true
ipnetwork.workspace = true
nexus-db-queries.workspace = true
omicron-common.workspace = true
reqwest.workspace = true
sled-agent-client.workspace = true
slog.workspace = true
slog-error-chain.workspace = true
uuid.workspace = true

omicron-workspace-hack.workspace = true
Loading
Loading