Rpws for all networking #4822

Merged: 48 commits, Mar 16, 2024
a3d0f56
WIP: placeholder to keep PR open
Feb 1, 2024
f289028
Revert "WIP: placeholder to keep PR open"
Feb 5, 2024
ac7a086
Implement BackgroundTask for switch port settings
Feb 5, 2024
5152b0b
bump fixtures
internet-diglett Feb 5, 2024
f69c3ec
Merge branch 'main' into rpws-for-all-networking
internet-diglett Feb 5, 2024
97db963
fix log attribute
internet-diglett Feb 6, 2024
185ae2f
comments for aiding refactor
internet-diglett Feb 8, 2024
2939cd9
Merge branch 'main' into rpws-for-all-networking
internet-diglett Feb 14, 2024
a00f21b
fixup! Merge branch 'main' into rpws-for-all-networking
internet-diglett Feb 14, 2024
09b241c
scaffold new background tasks
internet-diglett Feb 14, 2024
31db009
WIP: port switch port settings saga to RPW
internet-diglett Feb 20, 2024
ab0ed14
more WIP: continue port of saga to RPW
internet-diglett Feb 20, 2024
3d9fc13
clean up debris from refactor
internet-diglett Feb 21, 2024
6e3e390
More WIP: port bootstore updates
internet-diglett Feb 23, 2024
46264e8
more refactor debris cleanup
internet-diglett Feb 24, 2024
cb7ae9c
add more logging to workflow for additional tshooting
internet-diglett Feb 28, 2024
bec23c1
Enable runtime toggling of RPWs
internet-diglett Feb 28, 2024
4b970f1
WIP: remove routing from dendrite API calls
internet-diglett Mar 1, 2024
24aa837
cleanup todo!()
internet-diglett Mar 4, 2024
2820627
Merge branch 'main' into rpws-for-all-networking
internet-diglett Mar 4, 2024
05b37de
first round of pr fixes
Mar 4, 2024
7c7e22a
Second pass of PR fixes
Mar 5, 2024
78818d1
convert route diff methods to use HashSet
Mar 5, 2024
3b0ae08
try reordering rack init logic to resolve race
Mar 6, 2024
1abe3ed
try iterating over *initialized* racks
Mar 6, 2024
25a6ffe
remove toggle table, add index on rack(initialized)
Mar 6, 2024
90e3ed6
add the fully qualified name to the schema
Mar 7, 2024
a32d6fa
cache bootstore configs for subsequent comparisons and auditing
Mar 7, 2024
e01fb5b
track bootstore history
Mar 8, 2024
121c8b5
EXPECTORATE
internet-diglett Mar 8, 2024
61fc7ae
timeout on nexus zone startup
Mar 8, 2024
29dbb07
set the timeout to a more reasonable value
Mar 8, 2024
0126ae6
fix broken sed command
Mar 8, 2024
90cd854
Fix Correctness Issues
Mar 9, 2024
9aa14e8
move loopback address management to rpw
internet-diglett Mar 11, 2024
dbcdc5b
EXPECTORATE
internet-diglett Mar 11, 2024
9c58bf1
Merge branch 'main' into rpws-for-all-networking
internet-diglett Mar 12, 2024
cc2255e
remove duplicates
internet-diglett Mar 12, 2024
2f0fc52
Do not fail if address-lot already exists
Mar 14, 2024
8aadca9
WIP: make bgp creation idempotent
Mar 15, 2024
8db6137
WIP: make bgp call idempotent
Mar 15, 2024
af4fc83
add description to bgp config
Mar 15, 2024
1376116
add timestamp fields to bgp config insertion
Mar 16, 2024
4f6067c
adjust vdev creation parameters for virt disks
Mar 16, 2024
95024d9
Make Address Lot Creation Idempotent
Mar 16, 2024
3efe2de
a bit more cleanup
Mar 16, 2024
7d41d73
bump dendrite
Mar 16, 2024
e28135f
Merge branch 'main' into rpws-for-all-networking
Mar 16, 2024
18 changes: 14 additions & 4 deletions .github/buildomat/jobs/deploy.sh
@@ -231,12 +231,10 @@ first = \"$SERVICE_IP_POOL_START\"
/^last/c\\
last = \"$SERVICE_IP_POOL_END\"
}
/^\\[rack_network_config/,/^$/ {
-/^infra_ip_first/c\\
+/^infra_ip_first/c\\
infra_ip_first = \"$UPLINK_IP\"
-/^infra_ip_last/c\\
+/^infra_ip_last/c\\
infra_ip_last = \"$UPLINK_IP\"
}
/^\\[\\[rack_network_config.ports/,/^\$/ {
/^routes/c\\
routes = \\[{nexthop = \"$GATEWAY_IP\", destination = \"0.0.0.0/0\"}\\]
@@ -335,6 +333,18 @@ while [[ $(pfexec svcs -z $(zoneadm list -n | grep oxz_ntp) \
done
echo "Waited for chrony: ${retry}s"

# Wait for at least one nexus zone to become available
retry=0
until zoneadm list | grep nexus; do
if [[ $retry -gt 300 ]]; then
echo "Failed to start at least one nexus zone after 300 seconds"
exit 1
fi
sleep 1
retry=$((retry + 1))
done
echo "Waited for nexus: ${retry}s"

export RUST_BACKTRACE=1
export E2E_TLS_CERT IPPOOL_START IPPOOL_END
eval "$(./tests/bootstrap)"
17 changes: 17 additions & 0 deletions clients/mg-admin-client/src/lib.rs
@@ -17,11 +17,13 @@ mod inner {
}

pub use inner::types;
use inner::types::Prefix4;
pub use inner::Error;

use inner::Client as InnerClient;
use omicron_common::api::external::BgpPeerState;
use slog::Logger;
use std::hash::Hash;
use std::net::Ipv6Addr;
use std::net::SocketAddr;
use thiserror::Error;
@@ -81,3 +83,18 @@ impl Client {
Ok(Self { inner, log })
}
}

impl Eq for Prefix4 {}

impl PartialEq for Prefix4 {
fn eq(&self, other: &Self) -> bool {
self.value == other.value && self.length == other.length
}
}

impl Hash for Prefix4 {
fn hash<H: std::hash::Hasher>(&self, state: &mut H) {
self.value.hash(state);
self.length.hash(state);
}
}
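
Note: the `Eq` and `Hash` impls above are what allow `Prefix4` values to be stored in a `HashSet`, which lines up with the commit "convert route diff methods to use HashSet". The following is a minimal sketch of such a set-based diff, not the PR's actual code. The `Prefix4` struct here is a local stand-in (field names come from the impls above, field types are assumed), and `diff_prefixes` is a hypothetical helper.

```rust
use std::collections::HashSet;
use std::hash::{Hash, Hasher};
use std::net::Ipv4Addr;

// Stand-in for the generated `Prefix4` type. The field names match the
// impls in the diff; the field types are an assumption.
#[derive(Clone, Debug)]
struct Prefix4 {
    value: Ipv4Addr,
    length: u8,
}

impl PartialEq for Prefix4 {
    fn eq(&self, other: &Self) -> bool {
        self.value == other.value && self.length == other.length
    }
}

impl Eq for Prefix4 {}

impl Hash for Prefix4 {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.value.hash(state);
        self.length.hash(state);
    }
}

/// Hypothetical helper. Returns `(to_add, to_remove)`: prefixes that are
/// desired but not currently present, and prefixes that are present but
/// no longer desired.
fn diff_prefixes(
    desired: &HashSet<Prefix4>,
    current: &HashSet<Prefix4>,
) -> (HashSet<Prefix4>, HashSet<Prefix4>) {
    let to_add = desired.difference(current).cloned().collect();
    let to_remove = current.difference(desired).cloned().collect();
    (to_add, to_remove)
}

fn main() {
    let p = |a: [u8; 4], length: u8| Prefix4 { value: Ipv4Addr::from(a), length };
    let current: HashSet<Prefix4> =
        vec![p([10, 0, 0, 0], 8), p([192, 168, 1, 0], 24)].into_iter().collect();
    let desired: HashSet<Prefix4> =
        vec![p([10, 0, 0, 0], 8), p([172, 16, 0, 0], 12)].into_iter().collect();

    let (to_add, to_remove) = diff_prefixes(&desired, &current);
    println!("add: {:?}", to_add);
    println!("remove: {:?}", to_remove);
}
```

Because `eq` and `hash` use the same two fields, equal prefixes always hash identically, which is the invariant `HashSet` relies on.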
12 changes: 12 additions & 0 deletions dev-tools/omdb/tests/env.out
@@ -92,6 +92,10 @@ task: "service_zone_nat_tracker"
ensures service zone nat records are recorded in NAT RPW table


task: "switch_port_config_manager"
manages switch port settings for rack switches


---------------------------------------------
stderr:
note: using Nexus URL http://127.0.0.1:REDACTED_PORT
@@ -182,6 +186,10 @@ task: "service_zone_nat_tracker"
ensures service zone nat records are recorded in NAT RPW table


task: "switch_port_config_manager"
manages switch port settings for rack switches


---------------------------------------------
stderr:
note: Nexus URL not specified. Will pick one from DNS.
@@ -259,6 +267,10 @@ task: "service_zone_nat_tracker"
ensures service zone nat records are recorded in NAT RPW table


task: "switch_port_config_manager"
manages switch port settings for rack switches


---------------------------------------------
stderr:
note: Nexus URL not specified. Will pick one from DNS.
15 changes: 13 additions & 2 deletions dev-tools/omdb/tests/successes.out
@@ -299,6 +299,10 @@ task: "service_zone_nat_tracker"
ensures service zone nat records are recorded in NAT RPW table


task: "switch_port_config_manager"
manages switch port settings for rack switches


---------------------------------------------
stderr:
note: using Nexus URL http://127.0.0.1:REDACTED_PORT/
@@ -368,7 +372,7 @@ task: "nat_v4_garbage_collector"
currently executing: no
last completed activation: iter 2, triggered by an explicit signal
started at <REDACTED TIMESTAMP> (<REDACTED DURATION>s ago) and ran for <REDACTED DURATION>ms
-warning: unknown background task: "nat_v4_garbage_collector" (don't know how to interpret details: Null)
+last completion reported error: failed to resolve addresses for Dendrite services: no record found for Query { name: Name("_dendrite._tcp.control-plane.oxide.internal."), query_type: SRV, query_class: IN }

task: "blueprint_loader"
configured period: every 1m 40s
@@ -389,7 +393,7 @@ task: "bfd_manager"
currently executing: no
last completed activation: iter 2, triggered by an explicit signal
started at <REDACTED TIMESTAMP> (<REDACTED DURATION>s ago) and ran for <REDACTED DURATION>ms
-warning: unknown background task: "bfd_manager" (don't know how to interpret details: Object {})
+last completion reported error: failed to resolve addresses for Dendrite services: no record found for Query { name: Name("_dendrite._tcp.control-plane.oxide.internal."), query_type: SRV, query_class: IN }

task: "external_endpoints"
configured period: every 1m
@@ -440,6 +444,13 @@ task: "service_zone_nat_tracker"
started at <REDACTED TIMESTAMP> (<REDACTED DURATION>s ago) and ran for <REDACTED DURATION>ms
last completion reported error: inventory collection is None

task: "switch_port_config_manager"
configured period: every 30s
currently executing: no
last completed activation: iter 2, triggered by an explicit signal
started at <REDACTED TIMESTAMP> (<REDACTED DURATION>s ago) and ran for <REDACTED DURATION>ms
warning: unknown background task: "switch_port_config_manager" (don't know how to interpret details: Object {})

---------------------------------------------
stderr:
note: using Nexus URL http://127.0.0.1:REDACTED_PORT/
17 changes: 17 additions & 0 deletions nexus-config/src/nexus_config.rs
@@ -367,6 +367,8 @@ pub struct BackgroundTaskConfig {
pub sync_service_zone_nat: SyncServiceZoneNatConfig,
/// configuration for the bfd manager task
pub bfd_manager: BfdManagerConfig,
/// configuration for the switch port settings manager task
pub switch_port_settings_manager: SwitchPortSettingsManagerConfig,
/// configuration for region replacement task
pub region_replacement: RegionReplacementConfig,
}
@@ -427,6 +429,15 @@ pub struct SyncServiceZoneNatConfig {
pub period_secs: Duration,
}

#[serde_as]
#[derive(Clone, Debug, Deserialize, Eq, PartialEq, Serialize)]
pub struct SwitchPortSettingsManagerConfig {
/// Interval (in seconds) for periodic activations of this background task.
/// This task is also activated on-demand when any of the switch port settings
/// api endpoints are called.
#[serde_as(as = "DurationSeconds<u64>")]
pub period_secs: Duration,
}
#[serde_as]
#[derive(Clone, Debug, Deserialize, Eq, PartialEq, Serialize)]
pub struct InventoryConfig {
@@ -713,6 +724,7 @@ mod test {
blueprints.period_secs_load = 10
blueprints.period_secs_execute = 60
sync_service_zone_nat.period_secs = 30
switch_port_settings_manager.period_secs = 30
region_replacement.period_secs = 30
[default_region_allocation_strategy]
type = "random"
@@ -828,6 +840,10 @@ mod test {
sync_service_zone_nat: SyncServiceZoneNatConfig {
period_secs: Duration::from_secs(30)
},
switch_port_settings_manager:
SwitchPortSettingsManagerConfig {
period_secs: Duration::from_secs(30),
},
region_replacement: RegionReplacementConfig {
period_secs: Duration::from_secs(30),
},
@@ -893,6 +909,7 @@ mod test {
blueprints.period_secs_load = 10
blueprints.period_secs_execute = 60
sync_service_zone_nat.period_secs = 30
switch_port_settings_manager.period_secs = 30
region_replacement.period_secs = 30
[default_region_allocation_strategy]
type = "random"
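
Note: the doc comment on `SwitchPortSettingsManagerConfig` describes two activation paths, a periodic timer (`period_secs`) and on-demand activation from the switch port settings API endpoints. The following is a rough, standard-library-only sketch of that pattern, not how Nexus drives its background tasks. `Trigger`, `run_task`, and the channel-based signalling are all assumptions made for illustration.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::time::Duration;

/// Why a given activation happened.
#[derive(Debug, PartialEq)]
enum Trigger {
    Timer,
    Explicit,
}

/// Hypothetical driver loop: wait up to `period` for an explicit signal and
/// fall back to a periodic activation when the wait times out. Stops after
/// `max` activations or once every sender has been dropped, and returns the
/// triggers it observed.
fn run_task(period: Duration, rx: mpsc::Receiver<()>, max: usize) -> Vec<Trigger> {
    let mut seen = Vec::new();
    while seen.len() < max {
        match rx.recv_timeout(period) {
            // Someone (for example an API handler) asked for an activation.
            Ok(()) => seen.push(Trigger::Explicit),
            // Nothing arrived within the period: periodic activation.
            Err(RecvTimeoutError::Timeout) => seen.push(Trigger::Timer),
            // No one can signal us any more; shut down.
            Err(RecvTimeoutError::Disconnected) => break,
        }
    }
    seen
}

fn main() {
    let (tx, rx) = mpsc::channel();
    // Simulate an API call requesting an immediate activation.
    tx.send(()).unwrap();
    // A short period stands in for `period_secs = 30`.
    let triggers = run_task(Duration::from_millis(20), rx, 2);
    println!("{:?}", triggers);
    drop(tx);
}
```

The periodic path acts as a safety net: if an explicit signal is ever lost, the next timer tick still reconciles the switch port state.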
6 changes: 4 additions & 2 deletions nexus/db-model/src/address_lot.rs
@@ -11,8 +11,10 @@ use omicron_common::api::external;
use serde::{Deserialize, Serialize};
use uuid::Uuid;

pub const INFRA_LOT: &str = "initial-infra";

impl_enum_type!(
-#[derive(SqlType, Debug, Clone, Copy)]
+#[derive(SqlType, Debug, Clone, Copy, QueryId)]
#[diesel(postgres_type(name = "address_lot_kind", schema = "public"))]
pub struct AddressLotKindEnum;

@@ -24,7 +26,7 @@ impl_enum_type!(
FromSqlRow,
PartialEq,
Serialize,
-Deserialize
+Deserialize,
)]
#[diesel(sql_type = AddressLotKindEnum)]
pub enum AddressLotKind;
18 changes: 17 additions & 1 deletion nexus/db-model/src/bootstore.rs
@@ -1,4 +1,5 @@
-use crate::schema::bootstore_keys;
+use crate::schema::{bootstore_config, bootstore_keys};
use chrono::{DateTime, Utc};
use serde::{Deserialize, Serialize};

pub const NETWORK_KEY: &str = "network_key";
Expand All @@ -11,3 +12,18 @@ pub struct BootstoreKeys {
pub key: String,
pub generation: i64,
}

/// BootstoreConfig is a key-value store for bootstrapping data.
/// We serialize the data as json because it is inherently polymorphic and it
/// is not intended to be queried directly.
#[derive(
Queryable, Insertable, Selectable, Clone, Debug, Serialize, Deserialize,
)]
#[diesel(table_name = bootstore_config)]
pub struct BootstoreConfig {
pub key: String,
pub generation: i64,
pub data: serde_json::Value,
pub time_created: DateTime<Utc>,
pub time_deleted: Option<DateTime<Utc>>,
}
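
Note: `BootstoreConfig` rows carry a `key` and a `generation`, and the commits "cache bootstore configs for subsequent comparisons and auditing" and "track bootstore history" suggest each distinct config is kept as its own generation. The following is a minimal in-memory sketch of that idea, not the PR's datastore code. `History`, `latest`, and `record` are hypothetical names, and both starting generations at 1 and skipping unchanged data are assumptions.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory stand-in for a (key, generation) -> data table.
/// The data is kept as a serialized string for simplicity.
#[derive(Default)]
struct History {
    rows: BTreeMap<(String, i64), String>,
}

impl History {
    /// Latest `(generation, data)` recorded for `key`, if any.
    fn latest(&self, key: &str) -> Option<(i64, &str)> {
        self.rows
            .range((key.to_string(), i64::MIN)..=(key.to_string(), i64::MAX))
            .next_back()
            .map(|((_, generation), data)| (*generation, data.as_str()))
    }

    /// Records `data` under the next generation unless it matches the latest
    /// row. Returns the generation that now holds `data`.
    fn record(&mut self, key: &str, data: &str) -> i64 {
        let next = match self.latest(key) {
            // Unchanged config: keep the existing generation.
            Some((generation, current)) if current == data => return generation,
            // Changed config: append a new generation, keeping the old row.
            Some((generation, _)) => generation + 1,
            // First config for this key (assumed to start at 1).
            None => 1,
        };
        self.rows.insert((key.to_string(), next), data.to_string());
        next
    }
}

fn main() {
    let mut history = History::default();
    let g1 = history.record("network_key", r#"{"ports":[]}"#);
    let g2 = history.record("network_key", r#"{"ports":[]}"#); // unchanged
    let g3 = history.record("network_key", r#"{"ports":["qsfp0"]}"#);
    println!("{} {} {}", g1, g2, g3);
}
```

Keeping older generations instead of overwriting them is what makes later comparison and auditing possible.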
12 changes: 11 additions & 1 deletion nexus/db-model/src/schema.rs
@@ -13,7 +13,7 @@ use omicron_common::api::external::SemverVersion;
///
/// This should be updated whenever the schema is changed. For more details,
/// refer to: schema/crdb/README.adoc
-pub const SCHEMA_VERSION: SemverVersion = SemverVersion::new(44, 0, 0);
+pub const SCHEMA_VERSION: SemverVersion = SemverVersion::new(45, 0, 0);

table! {
disk (id) {
@@ -1529,6 +1529,16 @@ table! {
}
}

table! {
bootstore_config (key, generation) {
key -> Text,
generation -> Int8,
data -> Jsonb,
time_created -> Timestamptz,
time_deleted -> Nullable<Timestamptz>,
}
}

table! {
bfd_session (remote, switch) {
id -> Uuid,
1 change: 1 addition & 0 deletions nexus/db-model/src/unsigned.rs
@@ -130,6 +130,7 @@ where
FromSqlRow,
Serialize,
Deserialize,
QueryId,
)]
#[diesel(sql_type = sql_types::BigInt)]
#[repr(transparent)]