Skip to content

Commit

Permalink
[nexus] put DNS servers in DNS, so you can DNS while you DNS (#5033)
Browse files Browse the repository at this point in the history
Currently, the DNS propagation background task in Nexus uses the
`services` table to enumerate the list of DNS servers that it's
responsible for keeping up to date. However, we'd really like to get rid
of said `services` table (see #4947), and the DNS propagation code is
the only remaining user of the `services` table.

Therefore, this branch changes how DNS servers are discovered for DNS
propagation. Rather than discovering DNS server addresses from the
`services` table, the `DnsWatcher` background task now discovers DNS
servers...using internal DNS. As described in #4889, this may _seem_
like a cyclical dependency, but, because the initial set of internal DNS
servers operate at known addresses -- by design -- so that they can
always be discovered. And they have to be up and discoverable for Nexus
to even come up and find CockroachDB. So, internal DNS can safely be
assumed to be up if Nexus has come up at all. Now, the `services` table
is no longer used, and

This change breaks the existing tests
`nexus::app::background::init::test_dns_propagation_basic` and
`nexus::app::background::dns_servers::test_basic`. I've rewritten the
`test_dns_propagation_basic` test to test the new expected behavior:
- creating a new internal DNS server and adding a DNS record for it to
the database's DNS config table results in that server's DNS records
being propagated to the existing DNS serve
- the `DnsWatcher` background task then picks up the DNS records for the
new DNS server by querying the existing known DNS server
- the current DNS config generation is then propagated to the new DNS
server
- a subsequent generation is propagated to both the initial and new DNS
servers

The `dns_servers::test_basic` test tested the discovery of DNS server
addresses from the database. Because these no longer come from the db,
and now come from internal DNS, this test would now end up exercising
most of the functionality tested in `test_dns_propagation_basic`. I
didn't think it was necessary to have two tests for this, so I made the
judgement call to delete `dns_servers::test_basic`. If we think having a
more isolated test that exercises only the DNS watcher task and not the
DNS propagation task, we could put this back and create records on the
DNS server by manually hitting its API with new configs, but I didn't
think this was really worth the effort.

I've also removed the `Datastore::upsert_service` method, which was used
only for test code and is now dead. I considered deleting all code
related to querying the `services` table in this branch as well.
However, I noticed that it's still populated when initializing the rack,
and that `omdb` has commands for querying that table. I wasn't sure if
there were alternative data sources for the `omdb` debugging commands
yet, so I didn't remove them. If the data provided by those commands is
available elsewhere, or if their only real purpose is _just_ to print
the state of this table, I'm happy to delete them in this branch, as
well.

Closes #4889


![image](https://github.com/oxidecomputer/omicron/assets/2796466/c37a0d31-26f7-4a5d-9748-ef7212cde9a9)
  • Loading branch information
hawkw authored Feb 12, 2024
1 parent 30571d5 commit e1c3dd7
Show file tree
Hide file tree
Showing 7 changed files with 168 additions and 334 deletions.
12 changes: 6 additions & 6 deletions dev-tools/omdb/tests/env.out
Original file line number Diff line number Diff line change
Expand Up @@ -57,11 +57,11 @@ task: "dns_propagation_internal"


task: "dns_servers_external"
watches list of external DNS servers stored in CockroachDB
watches list of external DNS servers stored in internal DNS


task: "dns_servers_internal"
watches list of internal DNS servers stored in CockroachDB
watches list of internal DNS servers stored in internal DNS


task: "external_endpoints"
Expand Down Expand Up @@ -147,11 +147,11 @@ task: "dns_propagation_internal"


task: "dns_servers_external"
watches list of external DNS servers stored in CockroachDB
watches list of external DNS servers stored in internal DNS


task: "dns_servers_internal"
watches list of internal DNS servers stored in CockroachDB
watches list of internal DNS servers stored in internal DNS


task: "external_endpoints"
Expand Down Expand Up @@ -224,11 +224,11 @@ task: "dns_propagation_internal"


task: "dns_servers_external"
watches list of external DNS servers stored in CockroachDB
watches list of external DNS servers stored in internal DNS


task: "dns_servers_internal"
watches list of internal DNS servers stored in CockroachDB
watches list of internal DNS servers stored in internal DNS


task: "external_endpoints"
Expand Down
8 changes: 4 additions & 4 deletions dev-tools/omdb/tests/successes.out
Original file line number Diff line number Diff line change
Expand Up @@ -251,11 +251,11 @@ task: "dns_propagation_internal"


task: "dns_servers_external"
watches list of external DNS servers stored in CockroachDB
watches list of external DNS servers stored in internal DNS


task: "dns_servers_internal"
watches list of internal DNS servers stored in CockroachDB
watches list of internal DNS servers stored in internal DNS


task: "external_endpoints"
Expand Down Expand Up @@ -313,7 +313,7 @@ task: "dns_servers_internal"
task: "dns_propagation_internal"
configured period: every 1m
currently executing: no
last completed activation: iter 5, triggered by a dependent task completing
last completed activation: iter 4, triggered by a dependent task completing
started at <REDACTED TIMESTAMP> (<REDACTED DURATION>s ago) and ran for <REDACTED DURATION>ms
attempt to propagate generation: 1

Expand Down Expand Up @@ -341,7 +341,7 @@ task: "dns_servers_external"
task: "dns_propagation_external"
configured period: every 1m
currently executing: no
last completed activation: iter 5, triggered by a dependent task completing
last completed activation: iter 4, triggered by a dependent task completing
started at <REDACTED TIMESTAMP> (<REDACTED DURATION>s ago) and ran for <REDACTED DURATION>ms
attempt to propagate generation: 2

Expand Down
2 changes: 1 addition & 1 deletion internal-dns/src/names.rs
Original file line number Diff line number Diff line change
Expand Up @@ -65,7 +65,7 @@ impl ServiceName {

/// Returns the DNS name for this service, ignoring the zone part of the DNS
/// name
pub(crate) fn dns_name(&self) -> String {
pub fn dns_name(&self) -> String {
match self {
ServiceName::Clickhouse
| ServiceName::ClickhouseKeeper
Expand Down
5 changes: 1 addition & 4 deletions nexus/src/app/background/dns_config.rs
Original file line number Diff line number Diff line change
Expand Up @@ -159,7 +159,6 @@ impl BackgroundTask for DnsConfigWatcher {
mod test {
use crate::app::background::common::BackgroundTask;
use crate::app::background::dns_config::DnsConfigWatcher;
use crate::app::background::init::test::read_internal_dns_zone_id;
use crate::app::background::init::test::write_test_dns_generation;
use assert_matches::assert_matches;
use async_bb8_diesel::AsyncRunQueryDsl;
Expand Down Expand Up @@ -197,9 +196,7 @@ mod test {

// Now write generation 2, activate again, and verify that the update
// was sent to the watcher.
let internal_dns_zone_id =
read_internal_dns_zone_id(&opctx, &datastore).await;
write_test_dns_generation(&datastore, internal_dns_zone_id).await;
write_test_dns_generation(&opctx, &datastore).await;
assert_eq!(watcher.borrow().as_ref().unwrap().generation, 1);
let value = task.activate(&opctx).await;
assert_eq!(watcher.borrow().as_ref().unwrap().generation, 2);
Expand Down
214 changes: 34 additions & 180 deletions nexus/src/app/background/dns_servers.rs
Original file line number Diff line number Diff line change
Expand Up @@ -7,22 +7,15 @@
use super::common::BackgroundTask;
use futures::future::BoxFuture;
use futures::FutureExt;
use internal_dns::names::ServiceName;
use internal_dns::resolver::Resolver;
use nexus_db_model::DnsGroup;
use nexus_db_model::ServiceKind;
use nexus_db_queries::context::OpContext;
use nexus_db_queries::db::DataStore;
use omicron_common::api::external::DataPageParams;
use serde::Serialize;
use serde_json::json;
use std::net::{SocketAddr, SocketAddrV6};
use std::num::NonZeroU32;
use std::sync::Arc;
use std::net::SocketAddr;
use tokio::sync::watch;

// This constraint could be relaxed by paginating through the list of servers,
// but we don't expect to have this many servers any time soon.
const MAX_DNS_SERVERS: usize = 10;

#[derive(Debug, Clone, Eq, PartialEq, Serialize)]
pub struct DnsServersList {
pub addresses: Vec<SocketAddr>,
Expand All @@ -31,20 +24,17 @@ pub struct DnsServersList {
/// Background task that keeps track of the latest list of DNS servers for a DNS
/// group
pub struct DnsServersWatcher {
datastore: Arc<DataStore>,
dns_group: DnsGroup,
resolver: Resolver,
last: Option<DnsServersList>,
tx: watch::Sender<Option<DnsServersList>>,
rx: watch::Receiver<Option<DnsServersList>>,
}

impl DnsServersWatcher {
pub fn new(
datastore: Arc<DataStore>,
dns_group: DnsGroup,
) -> DnsServersWatcher {
pub fn new(dns_group: DnsGroup, resolver: Resolver) -> DnsServersWatcher {
let (tx, rx) = watch::channel(None);
DnsServersWatcher { datastore, dns_group, last: None, tx, rx }
DnsServersWatcher { dns_group, last: None, tx, rx, resolver }
}

/// Exposes the latest list of DNS servers for this DNS group
Expand Down Expand Up @@ -75,58 +65,38 @@ impl BackgroundTask for DnsServersWatcher {
};

// Read the latest service configuration for this DNS group.
let service_kind = match self.dns_group {
DnsGroup::Internal => ServiceKind::InternalDns,
DnsGroup::External => ServiceKind::ExternalDns,
let service_name = match self.dns_group {
DnsGroup::Internal => ServiceName::InternalDns,
DnsGroup::External => ServiceName::ExternalDns,
};

let pagparams = DataPageParams {
marker: None,
limit: NonZeroU32::try_from(
u32::try_from(MAX_DNS_SERVERS).unwrap(),
)
.unwrap(),
direction: dropshot::PaginationOrder::Ascending,
let result = self.resolver.lookup_all_socket_v6(service_name).await;
let addresses = match result {
Err(error) => {
warn!(
&log,
"failed to lookup DNS servers";
"error" => format!("{:#}", error)
);
return json!({
"error":
format!(
"failed to read list of DNS servers: {:#}",
error
)
});
}
Ok(addresses) => {
// TODO(eliza): it would be nicer if `Resolver` had a method
// returning an iterator instead of a `Vec`, so we didn't
// have to drain the Vec and then collect it into a new
// one...
addresses.into_iter().map(SocketAddr::V6).collect()
}
};

let result = self
.datastore
.services_list_kind(opctx, service_kind, &pagparams)
.await;

if let Err(error) = result {
warn!(
&log,
"failed to read list of DNS servers";
"error" => format!("{:#}", error)
);
return json!({
"error":
format!(
"failed to read list of DNS servers: {:#}",
error
)
});
}

let services = result.unwrap();
if services.len() >= MAX_DNS_SERVERS {
warn!(
&log,
"found {} servers, which is more than MAX_DNS_SERVERS \
({}). There may be more that will not be used.",
services.len(),
MAX_DNS_SERVERS
);
}

let new_config = DnsServersList {
addresses: services
.into_iter()
.map(|s| SocketAddrV6::new(*s.ip, *s.port, 0, 0).into())
.collect(),
};
let new_addrs_dbg = format!("{:?}", new_config);
let new_config = DnsServersList { addresses };
let new_addrs_dbg = format!("{new_config:?}");
let rv =
serde_json::to_value(&new_config).unwrap_or_else(|error| {
json!({
Expand Down Expand Up @@ -177,119 +147,3 @@ impl BackgroundTask for DnsServersWatcher {
.boxed()
}
}

#[cfg(test)]
mod test {
use crate::app::background::common::BackgroundTask;
use crate::app::background::dns_servers::DnsServersList;
use crate::app::background::dns_servers::DnsServersWatcher;
use crate::app::background::dns_servers::MAX_DNS_SERVERS;
use assert_matches::assert_matches;
use async_bb8_diesel::AsyncRunQueryDsl;
use diesel::ExpressionMethods;
use diesel::QueryDsl;
use nexus_db_model::DnsGroup;
use nexus_db_queries::context::OpContext;
use nexus_db_queries::db::model::Service;
use nexus_db_queries::db::model::ServiceKind;
use nexus_test_utils_macros::nexus_test;
use std::net::Ipv6Addr;
use std::net::SocketAddrV6;
use uuid::Uuid;

type ControlPlaneTestContext =
nexus_test_utils::ControlPlaneTestContext<crate::Server>;

#[nexus_test(server = crate::Server)]
async fn test_basic(cptestctx: &ControlPlaneTestContext) {
let nexus = &cptestctx.server.apictx().nexus;
let datastore = nexus.datastore();
let opctx = OpContext::for_tests(
cptestctx.logctx.log.clone(),
datastore.clone(),
);

// Verify the initial state.
let mut task =
DnsServersWatcher::new(datastore.clone(), DnsGroup::Internal);
let watcher = task.watcher();
assert_matches!(*watcher.borrow(), None);

// The datastore from the ControlPlaneTestContext is initialized with
// one DNS server.
let _ = task.activate(&opctx).await;
assert_matches!(*watcher.borrow(), Some(DnsServersList {
ref addresses
}) if addresses.len() == 1);

// If we add another server, we should see it.
{
use nexus_db_queries::db::schema::service::dsl;
diesel::insert_into(dsl::service)
.values(Service::new(
Uuid::new_v4(),
Uuid::new_v4(),
Some(Uuid::new_v4()),
SocketAddrV6::new(Ipv6Addr::LOCALHOST, 1, 0, 0),
ServiceKind::InternalDns,
))
.execute_async(
&*datastore.pool_connection_for_tests().await.unwrap(),
)
.await
.unwrap();
}

let _ = task.activate(&opctx).await;
assert_matches!(*watcher.borrow(), Some(DnsServersList {
ref addresses
}) if addresses.len() == 2);

// If we add MAX_DNS_SERVERS more servers, we should see
// MAX_DNS_SERVERS.
{
use nexus_db_queries::db::schema::service::dsl;
let new_services = (0..u16::try_from(MAX_DNS_SERVERS).unwrap())
.map(|i| {
Service::new(
Uuid::new_v4(),
Uuid::new_v4(),
Some(Uuid::new_v4()),
SocketAddrV6::new(Ipv6Addr::LOCALHOST, i + 2, 0, 0),
ServiceKind::InternalDns,
)
})
.collect::<Vec<_>>();

diesel::insert_into(dsl::service)
.values(new_services)
.execute_async(
&*datastore.pool_connection_for_tests().await.unwrap(),
)
.await
.unwrap();
}

let _ = task.activate(&opctx).await;
assert_matches!(*watcher.borrow(), Some(DnsServersList {
ref addresses
}) if addresses.len() == MAX_DNS_SERVERS);

// Now delete all the servers and try again.
{
use nexus_db_queries::db::schema::service::dsl;
diesel::delete(
dsl::service.filter(dsl::kind.eq(ServiceKind::InternalDns)),
)
.execute_async(
&*datastore.pool_connection_for_tests().await.unwrap(),
)
.await
.unwrap();
}
let _ = task.activate(&opctx).await;
assert_matches!(*watcher.borrow(), Some(DnsServersList {
ref addresses
}) if addresses.is_empty());
}
}
Loading

0 comments on commit e1c3dd7

Please sign in to comment.