-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
VPC Subnet Routing [2/2] -- Custom Routers and NIC 'transit IP' lists #5823
Conversation
OPTE now prevents itself from being unloaded if its underlay state is set. Currently, underlay setup is performed only once, and it seems to be the case that XDE can be unloaded in some scenarios (e.g., `a4x2` setup). However, a consequence is that removing the driver requires an extra operation to explicitly clear the underlay state. This PR adds this operation to the `cargo xtask virtual-hardware destroy` command. This is currently blocked on opte#485 being approved/merged. Closes #5314.
These update in response to VPC subnet changes. Now to plumb them into OPTE.
Currently there are no triggers attached to most of the operations that will cause us to either a) push or b) re-resolve VPC routes, but this lays the basis for sled-agent and the background task to talk in terms of versions.
We'll get to that in the next PR.
495c69b
to
ab7223b
Compare
pub target: RouteTarget, | ||
/// A CIDR block (or named subnet) which this route will apply to. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this accurate? I see a destination can be an IP, IP net, VPC, or subnet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for looking over this!
It can be any of those, except for VPCs -- that will likely relax in future with VPC peering, but it's not something that makes sense right now. I'll try and rework the wording for this, I maybe want to make it clearer that 'destination' matches on the address in routed packets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, sounds good. I was asking because I saw more stuff on the RouteDestination
type. But it's fine if there are things that are allowed by the type that are nonetheless rejected by the endpoint logic as invalid, and I see you're doing that nicely here.
/// Custom routers apply in addition to the VPC-wide *system* router, and have | ||
/// higher priority than the system router for an otherwise | ||
/// equal-prefix-length match. | ||
pub custom_router: Option<NameOrId>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering if this could just be called router
and then on the VpcSubnet
view we can call it router_id
, and we always include the ID. But I see now, "Custom routers apply in addition to the VPC-wide system router" means that doesn't make sense, right? Because when there is a custom router, there is still also the system router. It's not a default vs. custom situation. So I think this is good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The API side looks good to me. transit_ips
vec seems fine. We can always tweak it after we see how it feels in practice.
common/src/api/external/mod.rs
Outdated
pub target: RouteTarget, | ||
/// The set of destination IP addresses or subnets that this route | ||
/// will match packets against. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry to keep nitpicking, but saying "set" and "subnets" might be confusing because this can only be one subnet. I can't think of a better wording — maybe rely on the definiton of RouteDestination
to cover what is inside it, and here emphasize its role in the route. I like this bit in the doc comment on RouteDestination
, though it's kind of long:
/// When traffic is to be sent to a destination that is within a given
/// `RouteDestination`, the corresponding `RouterRoute` applies, and traffic
/// will be forward to the `RouteTarget` for that rule.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the existing RouteDestination
text is good there. I think if we lean into just showing its role and delegating the details to the type, I'd go for:
/// Selects which traffic this routing rule will apply to.
pub destination: RouteDestination,
This PR wires up all the backing machinery for VPC subnet routing, and automatically resolves and pushes updated rules to sleds using an RPW. This allows instances in all subnets of a VPC to talk with one another -- assuming no firewall rules have been configured otherwise. At a high level, this works by a few changes: * During the VPC create saga, we now push two rules explicitly to the system router -- default routes from `(0.0.0.0/0, ::/0) -> inetgw:outbound`. * Any CRUD operation on a VPC subnet will reconcile the set of VPC subnet routes within the system router to have one entry per subnet. This takes the form `subnet:{name} -> subnet:{name}` for each subnet, which are later resolved to both v4 and v6 entries. * Ports are created using route information known to sled-agent -- this defaults to an empty route set for instances/probes, and an internet gateway rule for services to enable early NTP sync. * Routes are sync'd with sleds using a new background task. Broadly, this asks each sled for the set of VPCs and subnets it has ports on, and a version for the current route set installed in each. The background task will use this information to determine which routes must be rebuilt, and will send updated versions out in response. The most immediate consequence in this PR is that hosts within a subnet -- on different VPCs -- will be able to talk with one another at last. The user facing API (#2116) will be re-enabled in a concurrent PR -- #5823 -- as will NIC spoof detection hole-punching. Depends on oxidecomputer/opte#490. Closes #2232, Fixes #1336. --- A few pieces will block tests passing & merge-readiness: - [x] Creation of a `lab-2.0-opte-0.32` image. - [x] Merge of oxidecomputer/maghemite#274 (and updating all the right SHAs in this PR).
This PR builds on #5777 to provide the Custom routers for subnets as described in RFD21. This entails a few things:
unpublished = true
tag from the user API for VPC routers and routes.custom_router
field in subnetPOST
andPUT
requests.transit_ips
list, which denotes an additional set of CIDR blocks that a NIC is allowed to send and receive traffic on. This is set duringPOST
and/orPUT
on instances which are stopped. This is a key feature to enable software routing by instances, as today's default behaviour drops any packets not matching an assigned IP for an instance.There are some allowances around currently non-existent features:
inetgw:outbound
, which appears in our existing rules. Using this target sends packets upstream as it does today.Closes #2116.