Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor Network Operator APIs #6016

Open
wants to merge 42 commits into
base: main
Choose a base branch
from

Conversation

internet-diglett
Copy link
Contributor

@internet-diglett internet-diglett commented Jul 8, 2024

Summary

Our currently deployed APIs for operator networking aren't as user friendly as we'd like them to be. After landing these changes we hope to be able to start breaking ground on Terraform provider updates for rack / operator networking, which should increase user friendliness as well as make testing more streamlined in some scenarios. This will also allow us to migrate from using --json-object with Oxide cli commands, for example:

# Add a route
sled03 ➜  a4x2 git:(main) ✗ oxide --profile a4x2 system networking switch-port configuration route add  \
--configuration "default-uplink0" \
--dst 10.88.0.0/24 \
--gw 10.8.0.1 \
--interface qsfp0
{
  "dst": "10.88.0.0/24",
  "gw": "10.8.0.1/32",
  "interface_name": "qsfp0",
  "port_settings_id": "ec815e89-ff84-4e85-ad54-55020eb812c9"
}

# List routes
sled03 ➜  a4x2 git:(main) ✗ oxide --profile a4x2 system networking switch-port configuration route list \
--configuration "default-uplink0"
[
  {
    "dst": "10.88.0.0/24",
    "gw": "10.8.0.1/32",
    "interface_name": "qsfp0",
    "port_settings_id": "ec815e89-ff84-4e85-ad54-55020eb812c9"
  }
]

# Delete a route
sled03 ➜  a4x2 git:(main) ✗ oxide --profile a4x2 system networking switch-port configuration route remove  \
--configuration "default-uplink0" \
--dst 10.88.0.0/24 \
--gw 10.8.0.1 \
--interface qsfp0

# List routes again
sled03 ➜  a4x2 git:(main) ✗ oxide --profile a4x2 system networking switch-port configuration route list --configuration "default-uplink0"
[]

Changes

The following changes should mostly preserve backwards compatibility with the existing API, so we should be able to continue to use all of our existing automation / tooling / scripts for the time being.

  • Change namespace for api endpoints / commands to switch-port-configuration (switch-port-settings is not plural-friendly)

  • Break AddressLot and AddressLotBlock into separate APIs

  • Add endpoint for viewing which configuration has been applied to a switch port (i.e. configuration_view)

  • Rename settings_apply and settings_clear to configuration_apply and configuration_clear

  • Create separate API endpoints for each child object under SwitchPortSettings

    • geometry configuration
    • link configuration
    • address configuration
    • route configuration
    • bgp peer configuration
      • modification: break out communities
      • modification: break out import list
      • modification: break out allow list
  • Create integration tests for new endpoints

    • geometry configuration
    • link configuration
    • address configuration
    • route configuration
    • bgp peer configuration
    • bgp peer communities
    • bgp peer allowed import lists
    • bgp peer allowed export lists
  • update CLI client library: Refactor network operator APIs oxide.rs#817

Additional Acceptance Criteria

Copy link
Contributor

@ahl ahl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might think about generating the SDK and CLI as you evolve this API. In particular, I'd love to see this list of CLI commands that require the --json-body parameter to shrink: https://github.com/oxidecomputer/oxide.rs/blob/main/cli/tests/data/json-body-required.txt (note about half of them are networking-related)

@@ -9727,7 +9816,7 @@
"last_address"
]
},
"AddressLotBlockCreate": {
"AddressLotBlock2": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just looking through this quickly, but I think we should avoid adding more of these name conflicting types that result in the 2 suffix.

@internet-diglett
Copy link
Contributor Author

You might think about generating the SDK and CLI as you evolve this API. In particular, I'd love to see this list of CLI commands that require the --json-body parameter to shrink: https://github.com/oxidecomputer/oxide.rs/blob/main/cli/tests/data/json-body-required.txt (note about half of them are networking-related)

@ahl do we have a preferred pattern that we want to use to avoid this?

@ahl
Copy link
Contributor

ahl commented Jul 15, 2024

You might think about generating the SDK and CLI as you evolve this API. In particular, I'd love to see this list of CLI commands that require the --json-body parameter to shrink: https://github.com/oxidecomputer/oxide.rs/blob/main/cli/tests/data/json-body-required.txt (note about half of them are networking-related)

@ahl do we have a preferred pattern that we want to use to avoid this?

There are some structures that are challenging to generate, in particular deeply nested structs. Perhaps put another way: you might keep in mind how you'd like the CLI to work for a particular API concept and then organize the API in a way where you can imagine the direct translation e.g. to CLI arguments. There's more that progenitor could and can do, but some structures--such as SwitchPortSettingsCreate--are pretty hard to imagine translated into something rational at the CLI

@internet-diglett
Copy link
Contributor Author

@ahl I generated a local CLI with this PR's openapi spec and it seems to avoid the need for --json-body:

For more information, try '--help'.
$ ./target/debug/oxide system networking address-lot create
error: the following required arguments were not provided:
  --description <description>
  --kind <kind>
  --name <name>

$ ./target/debug/oxide system networking address-lot block add
error: the following required arguments were not provided:
  --address-lot <address-lot>
  --first-address <first-address>
  --last-address <last-address>

Usage: oxide system networking address-lot block add --address-lot <address-lot> --first-address <first-address> --last-address <last-address>

@ahl
Copy link
Contributor

ahl commented Jul 17, 2024

well that's good news!

@internet-diglett
Copy link
Contributor Author

Local testing against a running control plane:

$ ./target/debug/oxide system networking address-lot list
[
  {
    "description": "initial infrastructure ip address lot",
    "id": "4d2fc8ca-8759-46e4-9c36-7e9bbd14ecc1",
    "kind": "infra",
    "name": "initial-infra",
    "time_created": "2024-07-17T21:39:29.384680Z",
    "time_modified": "2024-07-17T21:39:29.384680Z"
  }
]

$ ./target/debug/oxide system networking address-lot block list --address-lot initial-infra
[
  {
    "first_address": "10.85.0.30",
    "id": "60efc85a-34d0-4e07-b3e2-c11f203e054d",
    "last_address": "10.85.0.30"
  }
]

$ ./target/debug/oxide system networking address-lot block add \
--address-lot initial-infra \
--first-address 10.100.0.0 \
--last-address 10.100.0.255
{
  "first_address": "10.100.0.0",
  "id": "5eff580c-9e96-4c01-a580-b9c782bc4196",
  "last_address": "10.100.0.255"
}

$ ./target/debug/oxide system networking address-lot block list --address-lot initial-infra
[
  {
    "first_address": "10.100.0.0",
    "id": "5eff580c-9e96-4c01-a580-b9c782bc4196",
    "last_address": "10.100.0.255"
  }, {
    "first_address": "10.85.0.30",
    "id": "60efc85a-34d0-4e07-b3e2-c11f203e054d",
    "last_address": "10.85.0.30"
  }
]

$ ./target/debug/oxide system networking address-lot block remove \
--address-lot initial-infra \
--first-address 10.100.0.0 \
--last-address 10.100.0.255

$ ./target/debug/oxide system networking address-lot block list --address-lot initial-infra
[
  {
    "first_address": "10.85.0.30",
    "id": "60efc85a-34d0-4e07-b3e2-c11f203e054d",
    "last_address": "10.85.0.30"
  }
]

@internet-diglett internet-diglett marked this pull request as ready for review July 17, 2024 22:34
@internet-diglett internet-diglett changed the title Refactor Address Lot APIs Refactor Network Operator APIs Aug 19, 2024
@david-crespo
Copy link
Contributor

While you're messing with these APIs... #6497

@rcgoodfellow rcgoodfellow added this to the 11 milestone Sep 6, 2024
Copy link
Contributor

@FelixMcFelix FelixMcFelix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the work here Levon! I spent some time reading over this; I really like the more granular updates we can have over switch port configuration. I've left some comments throughout.

common/src/api/external/mod.rs Outdated Show resolved Hide resolved
nexus/types/src/external_api/params.rs Show resolved Hide resolved
nexus/db-queries/src/db/datastore/address_lot.rs Outdated Show resolved Hide resolved
Comment on lines +688 to +703
let port_settings_id =
switch_port_configuration_id(&conn, identity).await?;

let port_config = SwitchPortConfig {
port_settings_id,
geometry: new_geometry,
};

// create or update geometry
diesel::insert_into(dsl::switch_port_settings_port_config)
.values(port_config.clone())
.on_conflict(dsl::port_settings_id)
.do_update()
.set(dsl::geometry.eq(new_geometry))
.execute_async(&conn)
.await?;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this ID check -> use pattern is safe, given that CRDB works with transations set as SERIALIZABLE. I'd have to defer to someone with more knowledge there though!

nexus/db-queries/src/db/datastore/switch_port.rs Outdated Show resolved Hide resolved
Comment on lines +1533 to +1554
NameOrId::Id(id) => {
// verify id is valid
bgp_config::table
.filter(bgp_config::time_deleted.is_null())
.filter(bgp_config::id.eq(id))
.select(bgp_config::id)
.limit(1)
.first_async::<Uuid>(&conn)
.await
.map_err(|e: diesel::result::Error| {
match e {
diesel::result::Error::NotFound => {
err.bail(Error::not_found_by_id(ResourceType::BgpConfig, &id))
},
_ => {
let message = "error while looking up bgp config for bgp peer";
error!(opctx.log, "{message}"; "error" => ?e);
err.bail(Error::internal_error(message))
},
}
})
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should most of the linked config elements here have a lookup_resource! definition? That doesn't give us LookupPath on a conn, but...

I'm also starting to wonder about the NameOrId resolution within each transaction -- I think that resolving to an ID should still occur only once outside the transaction, and then we {use, fetch, check liveness of} the target resource by ID as part of the transaction. I don't see a way that these transactions can cause a retry today, but if that changes I imagine you could hypothetically be working on renamed resources.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we allow renaming of resources today? It is my understanding that Name and ID are not changeable.

Copy link
Contributor

@FelixMcFelix FelixMcFelix Oct 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ID is immutable, yeah -- I had meant to select the NameOrId::Name match branch at the time.

My thinking is in case an update endpoint does crop up in future -- e.g., not all project scoped resources have update endpoints, but the resource name is almost always mutable when such an endpoint exists.

nexus/src/external_api/http_entrypoints.rs Outdated Show resolved Hide resolved
nexus/types/src/external_api/params.rs Outdated Show resolved Hide resolved
nexus/types/src/external_api/params.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@elaine-oxide elaine-oxide left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments. I did a quick review for consistency, but did not go deep into the DB transactions.

common/src/api/external/mod.rs Outdated Show resolved Hide resolved
common/src/api/external/mod.rs Outdated Show resolved Hide resolved
nexus/db-queries/src/db/datastore/switch_port.rs Outdated Show resolved Hide resolved
nexus/db-queries/src/db/datastore/switch_port.rs Outdated Show resolved Hide resolved
keepalive: new_settings.keepalive.into(),
remote_asn: new_settings.remote_asn.map(Into::into),
min_ttl: new_settings.min_ttl.map(Into::into),
md5_auth_key: new_settings.md5_auth_key,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is a clone required?

Also check line 638.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies, I am unsure of what you are referring to?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, I should have noticed/clarified that line 638 was from a different file that has the same name but different directory: nexus/db-model/src/switch_port.rs:638

For example, nexus/db-queries/src/db/datastore/switch_port.rs:590 has code that looks like:

md5_auth_key: p.md5_auth_key.clone(),

I was raising the question of, for any line of code that has a style of md5_auth_key: xyz.md5_auth_key (includes this line here for this comment, and above other file line 638), whether to consider if it should actually be md5_auth_key: xyz.md5_auth_key.clone(), because I've seen usage of clone() for lines of code related to md5_auth_key in both the omicron and oxide.rs repositories in the past.
See:

This was part of my quick checking for consistency throughout the code that I looked at.

Copy link
Contributor Author

@internet-diglett internet-diglett Oct 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, ok! So generally clone() is required for any data you're "giving up ownership of" due to passing it to a different scope (like providing it as a parameter to a function), but you need to keep a copy of the value in the current scope for use later on. So if I wanted to pass that value to multiple function calls, I would need to clone it. Here, we don't need to hold on to this data, so it's okay that we give our only "copy" of it to the function that we pass the struct to.

Fortunately, if you ever miss a clone() the code will fail to compile due to violating ownership / borrow checker rules.

nexus/types/src/external_api/params.rs Outdated Show resolved Hide resolved
nexus/types/src/external_api/params.rs Outdated Show resolved Hide resolved
nexus/types/src/external_api/params.rs Outdated Show resolved Hide resolved
nexus/types/src/external_api/params.rs Outdated Show resolved Hide resolved
nexus/types/src/external_api/params.rs Outdated Show resolved Hide resolved
@internet-diglett
Copy link
Contributor Author

Many thanks for the reviews, going to chip away at these fixes!

@rcgoodfellow rcgoodfellow modified the milestones: 11, 12 Sep 26, 2024
@internet-diglett
Copy link
Contributor Author

@taspelund do you mind kicking the tires on this and oxidecomputer/oxide.rs#817?

@internet-diglett
Copy link
Contributor Author

Since this PR is already very big we'll perform fixes for #6462 and #6398 in a follow up PR.

Copy link
Contributor

@elaine-oxide elaine-oxide left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments. I did a quick review for consistency, viewing only changes from commits since my last review. Did not check tests.

pub remote_asn: Option<u32>,

/// Require messages from a peer have a minimum IP time to live field.
/// Require messages from this peer have a minimum IP time to live field.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: "Require that messages...", to be consistent with line 2675.

@@ -129,6 +129,9 @@ snapshot_view GET /v1/snapshots/{snapshot}

API operations found with tag "system/hardware"
OPERATION ID METHOD URL PATH
networking_switch_port_active_configuration_clear DELETE /v1/system/hardware/racks/{rack_id}/switch/{switch}/switch-port/{port}/configuration
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weird column alignment compared to rest of file, but ok.

@@ -129,6 +129,9 @@ snapshot_view GET /v1/snapshots/{snapshot}

API operations found with tag "system/hardware"
OPERATION ID METHOD URL PATH
networking_switch_port_active_configuration_clear DELETE /v1/system/hardware/racks/{rack_id}/switch/{switch}/switch-port/{port}/configuration
networking_switch_port_active_configuration_set PUT /v1/system/hardware/racks/{rack_id}/switch/{switch}/switch-port/{port}/configuration
networking_switch_port_active_configuration_view GET /v1/system/hardware/racks/{rack_id}/switch/{switch}/switch-port/{port}/configuration
networking_switch_port_apply_settings POST /v1/system/hardware/switch-port/{port}/settings
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm looking at the diffs since my last review, and looking at this file as a whole, and looking in particular for consistency for the DELETE method (due to diff for line 195).

Are lines 50-51 similar to lines 135-136, in that a specific element is not specified at the end of the URL?

instance_ephemeral_ip_attach             POST     /v1/instances/{instance}/external-ips/ephemeral
instance_ephemeral_ip_detach             DELETE   /v1/instances/{instance}/external-ips/ephemeral

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There can only be one active configuration per switch port, so specifying which configuration to delete is unnecessary here.

) -> Result<HttpResponseDeleted, HttpError>;

/// List switch port settings
#[endpoint {
method = GET,
path = "/v1/system/networking/switch-port-settings",
path = "/v1/system/networking/switch-port-configuration",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ordering in nexus/external-api/output/nexus_tags.txt made this GET hard to find there, as it doesn't come right after the POST and DELETE in that file, although it seems to be sorted in alphabetical order by OPERATION ID there. Ok if no changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that file is auto-generated

async fn networking_bgp_config_delete(
rqctx: RequestContext<Self::Context>,
sel: Path<params::BgpConfigSelector>,
) -> Result<HttpResponseUpdatedNoContent, HttpError>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is HttpResponse type ok? Can check other places with this type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be Deleted, will update

&opctx, &port, rack_id, switch, &settings,
)
.await?;
Ok(HttpResponseUpdatedNoContent {})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The syntax looks different here compared to other places (part of my consistency check). Could check other places too, as this appears in multiple places.

@@ -4813,15 +3988,15 @@ async fn networking_switch_port_status(
async fn networking_bgp_announce_set_update(
rqctx: RequestContext<ApiContext>,
config: TypedBody<params::BgpAnnounceSetCreate>,
) -> Result<HttpResponseCreated<BgpAnnounceSet>, HttpError> {
) -> Result<HttpResponseOk<BgpAnnounceSet>, HttpError> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this HttpResponse type ok? In the rest of the code, it is mostly only used for list/view. Double check other places.

@@ -2119,6 +2078,19 @@ pub struct SwitchPortApplySettings {
pub port_settings: NameOrId,
}

/// Select a switch port by name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check if this doc string needs to be updated, as there are other fields that are not just name.

"tags": [
"system/networking"
],
"summary": "Get information about a named set of switch-port-settings",
"operationId": "networking_switch_port_configuration_view",
"summary": "Delete switch port settings",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

settings or configuration? (Fix in original code with docstring, check other places for consistency with new terminology)

@@ -19975,7 +20143,7 @@
]
},
"RouteAddRemove": {
"description": "A route to a destination network through a gateway address to add or remove to an interface.",
"description": "A network route to to add to or remove from an interface.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"to to" (Fix in original code with docstring).

@morlandi7 morlandi7 modified the milestones: 12, 13 Nov 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
7 participants