The current simple endpoint wiring architecture - while effective in meeting Mizar's initial goals - has several drawbacks. The following diagram shows the current wiring:
We expected that switching the veth driver mode would be sufficient to work around this limitation. However, the current veth XDP driver implementation still has several limitations, including:
The XDP_REDIRECT action is not as fast as running the XDP program directly on the main interface. The best improvement we could achieve over the Generic XDP driver is about 30%, which still lags behind our expectations.
The driver restricts the MTU size to less than 4KB, which is a function of the default memory page size. The MTU limitation is problematic for Mizar since we need to support Jumbo frames to be at parity with Neutron at least.
The driver mandates that we load a dummy XDP program on the veth pair in the container/VM network namespace. While this seems okay, it is still problematic since we want all Mizar functionality to be transparent to containers and VMs, so that a user of the container cannot alter Mizar's behavior in any way (even by removing an XDP program on the veth interface).
Lack of TSO/GSO support and other hardware offloads on the egress path.
Finally, since Mizar loads a transit agent program on each veth peer in the root namespace, it incurs memory usage that grows linearly with the number of endpoints. While this could be negligible in most scenarios, it may be a concern for hosts where we create a large number of containers.
Proposed Changes
In the new architecture, we shall use one Geneve interface for tunneling all outgoing packets. The Geneve interface is common to all endpoints. Inside the container/VM namespace, we will create MACVLAN interfaces and connect them in private mode to the Geneve tunnel interface. A single instance of the transit agent will be attached to the egress clsact hook of the Geneve interface and reused by all the endpoints. The endpoints table will be a global eBPF map.
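As a rough sketch, the host-side setup could look like the following; the use of Geneve external (collect_md) mode and the transit_agent.o object/section names are illustrative assumptions, while tunnel0 matches the provisioning snippet below:
# Create the single shared Geneve tunnel interface
ip link add tunnel0 type geneve external
ip link set dev tunnel0 up
# Attach one shared transit agent instance at the egress clsact hook of the tunnel interface
tc qdisc add dev tunnel0 clsact
tc filter add dev tunnel0 egress bpf da obj transit_agent.o sec agent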
All egress packets will trigger the transit agent in their normal packet processing path (fast-path). The transit agent will encapsulate the packet in Geneve as expected and will rewrite the destination IP address to be either that of the endpoint's transit switch or that of the local host. In the local case, the packet will be picked up immediately by the destination endpoint.
Unlike in the current architecture, ingress packets need not be redirected to the Geneve tunnel interface. It is sufficient to XDP_PASS the packet; the kernel will deliver it normally to the tunnel interface, and the corresponding MACVLAN interface will immediately pick up the decapsulated packet. Since we are not using bridge mode, the only overhead incurred is the encapsulation/decapsulation, which we have to account for anyway (even in the current architecture).
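As a minimal sketch of the ingress side, assuming the main transit XDP program is loaded natively on the host's physical interface (the eth0 name, the transit_xdp.o object, and the section name are illustrative assumptions):
ip link set dev eth0 xdpdrv obj transit_xdp.o sec transit
Packets the program passes with XDP_PASS continue up the stack to tunnel0 and on to the endpoint's MACVLAN interface, with no redirect involved.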
This approach has several benefits:
There is no need to redirect packets at all. The only actions used in the main XDP program are XDP_PASS or XDP_TX. Not using XDP_REDIRECT has performance benefits, since we have shown - at least for the moment - that redirects incur a performance penalty in veth driver mode (even with redirect_map).
Since we will have a single shared transit agent, memory consumption is constant and independent of the number of endpoints on the host.
The control-plane does not need to maintain or discover the host's MAC address, since the kernel's ARP mechanism will be in effect.
VMs shall be treated similarly by using MACVTAP (requires testing); a hypothetical sketch follows the provisioning snippet below.
We shall have a further simplified control-plane workflow. The network control-agent no longer needs to load a transit agent when provisioning an endpoint, and the endpoint creation steps will be minimal. The following snippet shows an example of the NCA's expected steps:
# Create a MACVLAN interface in private mode on top of the shared Geneve tunnel interface
ip link add veth0 link tunnel0 type macvlan mode private
# Move the interface into the endpoint's network namespace
ip link set veth0 netns ns1
# Assign the endpoint's IP and MAC addresses and bring the interface up
ip netns exec ns1 ip addr add 10.0.0.5/24 dev veth0
ip netns exec ns1 ip link set dev veth0 address 0e:73:ae:c8:87:01
ip netns exec ns1 ip link set dev veth0 up
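For the VM case noted above (untested), a hypothetical equivalent would swap the MACVLAN interface for a MACVTAP interface in private mode on the same tunnel interface; the macvtap0 name is illustrative:
ip link add link tunnel0 name macvtap0 type macvtap mode private
ip link set dev macvtap0 up
# The hypervisor (e.g. QEMU/KVM) is then handed the resulting tap device for the VM's NIC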
Performance gain
With this proposal we shall have:
At least a 35% improvement over the XDP redirect_map approach (with the Generic XDP driver).
A 9-10% improvement over an OVS-based setup, even though OVS in that setup tunnels packets directly to the end hosts.