-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reconfigurator: Record external networking allocations when realizing a blueprint #5045
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great!
// one. Bail out here if this zone already has one or more allocated | ||
// NICs but not the one we think it needs. | ||
// | ||
// This doesn't check the allocated NIC's subnet against our NICs, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems reasonable.
// Only attempt to allocate `external_ip` if it isn't already assigned | ||
// to this zone. | ||
// | ||
// Checking for the existing of the external IP and then creating it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me.
// if somehow this same service got a _different_ NIC allocated | ||
// to it in the TOCTOU race window above. That should be | ||
// impossible with the way we generate blueprints, so we'll just | ||
// return a scary error here and expect to never see it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
✔️
.await | ||
.expect("failed to expand service IP pool"); | ||
|
||
// Generate the values we care about. (Other required zone config params |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strategy of champion testers everywhere!
allocated_ip.first_port == SqlU16(first) | ||
&& allocated_ip.last_port == SqlU16(last) | ||
}) | ||
.unwrap_or(true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does port_range
being None mean here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two kinds of external IPs that have separate methods (ensure_external_service_ip
and ensure_external_service_snat_ip
) that both want to call this method. In the former case, all we have is an IP address, no port range, so when it calls this function it passes None
. In the latter case, we have a SourceNatConfig
value, which includes an IP and a port range, so when it calls this function it passes Some(_)
.
In practice, the non-SNAT IPs are given a port range of (0, 65535)
, but it seemed weird to hardcode that here when it's a decision being made elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm interesting, should we use a type that's isomorphic to Option but not actually that? Definitely was surprised.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(In particular, Ord doesn't make sense I think)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, where is Ord coming into play?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, what I mean is that if we define our own type, we can choose to not implement Ord
on it, since it doesn't make sense in this case. We could also define other operations like subset_of
/superset_of
, which is what I think this code is trying to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(But this is followup material, doesn't have to be done here.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't want subset/superset here; this is just a straight equality check (except for having to convert to SqlU16
to actually do the comparison). Is there a clearer way to express that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, no I think it was just that I misunderstood what this was doing :)
format!( | ||
"failed to allocate IP to {zone_type} {service_id}: \ | ||
{external_ip}" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a note here saying that this is expected in case of a race? (Or maybe this is already done at a higher level.)
I'm thinking about future debugging sessions where a simple error in the logs can be confusing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I hope the error we get from the datastore would make it clear whether a failure is due to a duplication issue or some other transient db issue. I'm hesitant to label this as "expected if we're racing" when the underlying error might be some other kind of db problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, just trying to think it through.
.with_context(|| { | ||
format!( | ||
"failed to allocate snat IP to {zone_type} {service_id}: \ | ||
{snat:?}" | ||
) | ||
})?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment applies here I guess, and maybe it does suggest that there is (or should be) a higher-level note.
service_id: Uuid, | ||
external_ip: IpAddr, | ||
ip_name: &Name, | ||
) -> anyhow::Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thoughts on use anyhow::Result
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No objection, I just have a light preference for this because when I see Result<()>
I have to either remember or look up which typealias it is.
On testbed, I'm seeing this warning as soon as I generate a blueprint for g2, which certainly doesn't have a nexus zone yet:
|
I think the previous error is preventing blueprint execution. No |
@andrewjstone is this possibly being caused by #5056? |
Ah very possibly. |
Yeah, it definitely is, and I strongly suspect the NIC it's warning about is for one of the existing Nexus zones. This PR attempts to reconcile all of the external IP allocations, which should be a no-op for anything set up during RSS, but is failing because of the mismatched slot number. Fixing #5056 is at the top of my list. |
…5065) This makes a several minor changes to plumb slots through: * `IncompleteNetworkInterface` now stores an optional slot, just like it stores an optional IP/MAC address * In `network_interfaces::InsertQuery`, if the incoming slot is set, we use it directly instead of running the `NextItem`-based subquery * Adds a partial index to ensure uniqueness of a slot within a single `parent_id` (I believe this is correct, but would love confirmation from someone more familiar!) * `IncompleteNetworkInterface::new_service()` now takes a _non-optional_ IP, MAC address, and slot. This matches how it was called in all non-test code. * Tweaked the Nexus internal API used for RSS handoff to include the slot in the description of NICs. This is a partial fix for #5056, and should produce correct behavior on new systems that run through RSS, even without a fix for #5055 (because we bypass `NextItem` altogether with this change). In particular, I think this should unblock testing of #5045 on madrid / testbeds. It does not address the already-recorded-NICs-with-incorrect-slots on systems like dogfood; I'll take care of that in a subsequent PR.
…ist-eips` (#5064) This PR has some omdb commands I wanted during dev/debug of #5045. It expands the output of `list-vnics` to include the parent's ID (particularly useful when trying to determine the external IP of a specific Nexus instance, for example): ``` IP PORTS KIND STATE OWNER_KIND OWNER_ID OWNER_DESCRIPTION 10.1.1.3/32 0/65535 floating Attached instance 4e6fb33a-7ba2-4a5e-abc5-dc9b047c01e0 v6/some-vm2 10.1.1.4/32 0/16383 SNAT Attached instance 4e6fb33a-7ba2-4a5e-abc5-dc9b047c01e0 v6/some-vm2 10.1.1.5/32 0/65535 ephemeral Attached instance 4e6fb33a-7ba2-4a5e-abc5-dc9b047c01e0 v6/some-vm2 172.20.26.1/32 0/65535 floating Attached service edd99650-5df1-4241-815d-253e4ef2399c ExternalDns 172.20.26.2/32 0/65535 floating Attached service f500d564-c40a-4eca-ac8a-a26b435f2037 ExternalDns 172.20.26.3/32 0/65535 floating Attached service 65a11c18-7f59-41ac-b9e7-680627f996e7 Nexus 172.20.26.4/32 0/65535 floating Attached service 20b100d0-84c3-4119-aa9b-0c632b0b6a3a Nexus 172.20.26.5/32 0/65535 floating Attached service 2898657e-4141-4c05-851b-147bffc6bbbd Nexus 172.20.26.6/32 0/16383 SNAT Attached service c3ec3d1a-3172-4d36-bfd3-f54a04d5ba55 Ntp ``` and adds a `list-vnics` command to show allocated vnics: ``` IP MAC SLOT PRIMARY KIND SUBNET PARENT_ID DESCRIPTION 172.30.0.5/32 A8:40:25:F8:A5:8C 1 true instance 172.30.0.0/22 2a4afdda-e269-48bc-913f-01ad57c50543 default primary interface for p4 172.30.0.5/32 A8:40:25:F5:AF:F0 1 true instance 172.30.0.0/22 be705808-d507-4693-9a97-186c92970e7b default primary interface for updateinst 172.30.0.5/32 A8:40:25:F7:3B:00 1 true instance 172.30.0.0/22 0ab1939f-af6e-4ea2-a155-71f210e937fc a sample nic 172.30.1.5/32 A8:40:25:FF:B0:9C 1 true service 172.30.1.0/24 edd99650-5df1-4241-815d-253e4ef2399c external_dns service vNIC 172.30.1.6/32 A8:40:25:FF:D0:B4 1 true service 172.30.1.0/24 f500d564-c40a-4eca-ac8a-a26b435f2037 external_dns service vNIC 172.30.2.5/32 A8:40:25:FF:A6:83 1 true service 172.30.2.0/24 65a11c18-7f59-41ac-b9e7-680627f996e7 nexus service vNIC 172.30.2.6/32 A8:40:25:FF:B4:C1 1 true service 172.30.2.0/24 20b100d0-84c3-4119-aa9b-0c632b0b6a3a nexus service vNIC 172.30.2.7/32 A8:40:25:FF:C6:59 0 true service 172.30.2.0/24 2898657e-4141-4c05-851b-147bffc6bbbd nexus service vNIC 172.30.3.5/32 A8:40:25:FF:B2:52 1 true service 172.30.3.0/24 c3ec3d1a-3172-4d36-bfd3-f54a04d5ba55 ntp service vNIC 172.30.3.6/32 A8:40:25:FF:A0:F9 1 true service 172.30.3.0/24 6ea2684c-115e-48a6-8453-ab52d1cecd73 ntp service vNIC ``` (This command immediately revealed issues with the slot number recording on dogfood, which led to opening #5056.)
@andrewjstone I've caught up with main which has fixed #5056, if you want to give this another try. |
Will do! |
I re-ran this and blueprints are executing successfully. Unfortunately I didn't remember to add an extra nexus node to the code, so I'm stuck at 3 nexuses for the original RSS nodes. However, the bug appears to be fixed regarding #5056, since the executor is successfully deploying zones. |
I ran this through a test on madrid, and it was almost successful. I was able to add a 4th Nexus, it did go to the one sled that didn't have one from RSS, the allocations for external IP and vNIC looked valid and were recorded in cockroach, and the background NAT RPW set up the NAT rules for it after the inventory collection picked it up. I ran into two problems (one extremely minor that may not even be worth fixing, and one serious):
I'm going to poke around 2 a bit and will open an issue soon. |
Haha 🤦♂️ the test failure on the latest commit is legit. I think "always request new firewall rules when getting zones" is probably okay once Nexus is up, but can't gate RSS. I need to think about this more. |
We chatted about this in the update sync. I reverted the last commit that attempted to quickly fix the firewall rule problem, and will land this as-is. Everything here is correct and necessary to the best of our knowledge; it just isn't fully complete and functional. I'll open a followup PR with the firewall rule changes later. |
This PR expands blueprint execution to record external IPs and created NICs for Nexus, Boundary NTP, and External DNS in crdb before sending new zone requests to sled-agents.
The implementation has a very obvious TOCTOU race, but I think it's okay (as explained in the comments inline). If two Nexuses try to realize the same blueprint simultaneously and both see no records present, only one will succeed to insert, and the other will spuriously fail. Assuming that failure causes a retry, subsequent attempts to realize the same blueprint will succeed, as the required records will be present. If this seems wrong, please holler!
I'd like to give this a spin on either madrid or one of the software testbeds before merging, but I think this is ready for review.