Floating IP delete/attach/detach should be fully contained in transactions #4628

FelixMcFelix · 2023-12-06T16:52:32Z

As of #4559, the above 3 operations use a fetched FloatingIp/ExternalIp to provide user-friendly error messages based on the current state of parent_id, before firing an UPDATE query conditioned on the correct parent_id field.

While this is not incorrect, this will provide a less user-friendly error message in the event of concurrent modification. We should move these datastore methods into a transaction and/or transaction_retry_wrapper to clean this up.

The text was updated successfully, but these errors were encountered:

bnaecker · 2023-12-06T19:36:49Z

I'm pretty confident one can do this without an interactive transaction (with multiple DB roundtrips). Instead, the update_and_check() machinery could be used to attach or detach a floating IP with the desired conditions, such as parent_id IS NULL or similar.

FelixMcFelix · 2023-12-07T00:02:03Z

That looks like just the ticket, thanks! I hadn't yet come across it -- I'll see about working that in as a solution alongside #4630.

Should close #4628.

This PR adds new endpoints to attach and detach external IPs to/from an individual instance at runtime, when instances are either stopped or started. These new endpoints are: * POST `/v1/floating-ips/{floating_ip}/attach` * POST `/v1/floating-ips/{floating_ip}/detach` * POST `/v1/instances/{instance}/external-ips/ephemeral` * DELETE `/v1/instances/{instance}/external-ips/ephemeral` These follow and enforce the same rules as external IPs registered during instance creation: at most one ephemeral IP, and at most 32 external IPs total. `/v1/floating-ips/{floating_ip}/attach` includes a `kind` field to account for future API resources which a FIP may be bound to -- such as internet gateways, load balancers, and services. ## Interaction with other instance lifecycle changes and sagas Both external IP modify sagas begin with an atomic update to external IP attach state conditioned on $\mathit{state}\in[ \mathit{started},\mathit{stopped}]$. As a result, we know that an external IP saga can only ever start before any other instance state change occurs. We then only need to think about how these other sagas/events must behave when called *during* an attach/detach, keeping in mind that these are worst-case orderings: attach/detach are likely to complete quickly. ### Instance start & migrate Both of these sagas alter an instance's functional sled ID, which controls whether NAT entry insertion and OPTE port state updates are performed. If an IP attach/detach is incomplete when either saga reaches `instance_ensure_dpd_config` or `instance_ensure_registered` (e.g., any IP associated with the target instance is in attaching/detaching state), the start/migrate will unwind with an HTTP 503. Generally, neither should undo in practice since IP attach/detach are fast operations -- particularly when an instance is formerly stopped. This is used solely to guarantee that only one saga is accessing a given external IP at a time, and that the update target remains unchanged. ### Instance stop & delete These operations are either not sagaized (stop), or cannot unwind (delete), and so we cannot block them using IP attach state. IP attach/detach will unwind if a given sled-agent is no longer responsible for an instance. Instance delete will force-detach IP addresses bound to an instance, and if this is seen then IP attach will deliberately unwind to potentially clean up NAT state. OPTE/DPD undo operations are best-effort in such a case to prevent stuck sagas. Instance stop and IP attach may interleave such that the latter adds additional NAT entries after other network state is cleared. Because we cannot unwind in this case, `instance_ensure_dpd_config` will now attempt to remove leftover conflicting RPW entries if they are detected, since we know they are a deviation from intended state. ## Additional/supporting changes * Pool/floating IP specifiers in instance create now take `NameOrId`, parameter names changed to match. * External IP create/bind in instance create no longer double-resolves name on saga unwind. * `views::ExternalIp` can now contain `FloatingIp` body. * DPD NAT insert/remove functions now perform single rule update via ID instead of index into the EIP list -- index-based was unstable under live addition/removal. * NAT RPW ensure is now more authoritative, and will remove conflicting entries if an initial insert fails. * Pool `NameOrId` resolution for floating IP allocation pulled up from `Datastore` into `Nexus`. --- Closes #4630 and #4628.

FelixMcFelix added networking Related to the networking. database Related to database access cleanup Code cleanliness labels Dec 6, 2023

FelixMcFelix added a commit that referenced this issue Dec 18, 2023

Make use of check_and_update for attach/detach

81c7d25

Should close #4628.

FelixMcFelix mentioned this issue Jan 3, 2024

Live attach/detach of external IPs #4694

Merged

FelixMcFelix linked a pull request Jan 3, 2024 that will close this issue

Live attach/detach of external IPs #4694

Merged

FelixMcFelix closed this as completed in #4694 Jan 24, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Floating IP delete/attach/detach should be fully contained in transactions #4628

Floating IP delete/attach/detach should be fully contained in transactions #4628

FelixMcFelix commented Dec 6, 2023

bnaecker commented Dec 6, 2023

FelixMcFelix commented Dec 7, 2023

Floating IP delete/attach/detach should be fully contained in transactions #4628

Floating IP delete/attach/detach should be fully contained in transactions #4628

Comments

FelixMcFelix commented Dec 6, 2023

bnaecker commented Dec 6, 2023

FelixMcFelix commented Dec 7, 2023