Kubernetes external network orchestration #118
Replies: 10 comments 24 replies
-
I think there are several different problems that should be addressed in Kubernetes network orchestration.
Should we create a separate use case for these?
-
Tagging from the meeting.
-
This problem is only applicable to Multus and/or DANM, given that NSM provides on-demand networks.
-
Is requiring L2 networks an example, or is it an axiom? An axiom, here, would be the indisputable one way to do this (as in OpenStack, for instance, where any external attachment will first cross an OpenStack network which is, give or take, defined to be an L2 domain). The original proposal doesn't really make it clear. I think it's an example. I don't require L2 networks - to be specific, I don't require bridge domains - for everything I do.
I think it might also be worth approaching this from the sorts of packets I might send. For instance, if I'm sending a lot of IP, many techniques work. However, .1q, .1ac and QinQ would work in some cases and not others; MPLS will not go over a VLAN carrier, an L2 bridge domain or a routed network; and so on. The protocols I listed imply that I need something that gets me to the external network in more raw ways than previously, but that could equally be (and, as a fallback for some protocols, would have to be) an L1 point-to-point connection over raw copper or fibre - since all of these are based on Ethernet (which is the default for all network interfaces and would make an axiom) and modern networks always have wires from one device to the other (I don't think we have to consider CSMA/CD and multiple endpoints on a single wire here; another axiom). I can implement bridge domains on top of these raw components, but I don't have to implement them if that's not what I want.
We also don't have much sense of where I'm sending this traffic to, and how I identify this. That problem is only half a cloud problem, since we don't control the external network. I agree with the point that we might want to create new networks, as described; but adding a new network may also involve changing the external network. I can't just create a new network for VLAN 7 on a link and assume it will do what I want. So I think how co-ordination works is an important part of the use case as well.
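To make that co-ordination gap concrete, here is a minimal sketch of the cluster-side half only, assuming Multus is installed and that eth1 is the worker's uplink (both assumptions). It declares a VLAN 7 sub-interface for pods, but nothing in it touches the switch at the other end of the link:

```yaml
# Hypothetical example: defines VLAN 7 on the worker side of the link only.
# The top-of-rack switch still has to be configured to carry VLAN 7 on this
# port; that external half is exactly the co-ordination problem described above.
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: vlan7-net
spec:
  config: '{
    "cniVersion": "0.3.1",
    "type": "vlan",
    "master": "eth1",
    "vlanId": 7,
    "ipam": { "type": "static" }
  }'
```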
-
Btw, I think design-wise we have two things here:
-
Let me ask a provocative question: why multiple interfaces per pod in the first place?
-
Good topic, and other people made good points; here are a few more. First, a quick link to a presentation on the topic.
I want to underscore your point about "pre-provisioning", because it's so crucial. One major problem with many deployments is that they can't just be placed in any vanilla cluster: you indeed need the cluster to have been installed and configured with certain abilities on the hosts, e.g. have a CNI plugin configured in a very particular way. Not only that, but it's a multi-host issue: you might very well have several pods with the same shared requirement, and they might be distributed on multiple hosts in the cluster (which might be a requirement for high availability). So, what we're seeing a lot these days is that a certain product requires you to 1) install stuff on the baremetal nodes, 2) then install K8s on them, and 3) deploy K8s workloads. This defeats many of the benefits of cloud technologies.
So, what can we do? One challenge is that installing a K8s cluster is out of scope for K8s, or at least widely diverse (see the Cluster API), so how would you be able to list these requirements in a way that is even remotely portable? But the more practical challenge is that, well, you can't just easily reinstall the entire cluster when you are deploying a new network service. (You could potentially have standby baremetal nodes that you can on-board into your cluster with the new requirements, but again this is extremely implementation-specific, if the feature exists at all.)
What we need is technologies that allow existing, running hosts to be reconfigured. This is a very difficult issue, as you generally don't want the containers to touch the host. So I think we would need some kind of core K8s component (just like the kubelet) that exposes certain specific and controlled capabilities regarding host networking directly to K8s workloads. This can't be just a CNI plugin, because again the point might be that the CNI plugin wasn't "pre-provisioned". Also note that even if the plugin was installed, it might not be configured according to the workload's requirements. An example of how difficult this could be: you might need to reconfigure something in the current host's BIOS, restart the host (after moving workloads to other hosts), and then rejoin the cluster with the new abilities. And, again, you might need to do the same with other hosts in the cluster, too, and this all needs to happen with minimal disruption. To put it another way, we need to turn the "pre-provisioning" paradox into a "re-provisioning" solution. :)
Final point: we need a way to orchestrate this, but I don't know if "API" is the right answer to that requirement. Cloud-native orchestration solutions are declarative and intent-oriented, not API-driven. This is not a small point: imagine if several different users are calling the same API and asking for networky things that cannot be orchestrated together due to a lack of resources (availability of SR-IOV slots, number of VLAN IDs pre-configured in the switches, etc.). The declarative approach could allow an operator, which can look at the complete picture, to "reconcile" these different requirements in sensible ways, e.g. to prioritize certain users over others.
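As a purely illustrative sketch of that declarative alternative (the kind, API group and field names below are invented for this example, not an existing API), a user could declare intent and leave arbitration to an operator that sees the whole cluster and fabric:

```yaml
# Hypothetical intent object; every name and field here is an assumption
# made for illustration, not part of any existing CRD.
apiVersion: example.org/v1alpha1
kind: ExternalNetworkIntent
metadata:
  name: signalling-plane
spec:
  encapsulation: vlan          # the user asks for a tagged attachment, not a specific VLAN ID
  hostCapabilities:
    - sriov                    # may require (re-)provisioning of the host
  bandwidthMbps: 2000
  priority: high               # lets the reconciling operator arbitrate scarce resources
```

An operator reconciling such objects could, for example, queue or reject low-priority intents when VLAN IDs or SR-IOV virtual functions run out, instead of failing individual imperative API calls.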
-
Thanks for all the great comments. I guess I agree with more or less all of them. Also, as we mentioned in the #tug-networking-orchestration channel:
ENO aims at providing automation APIs for networking solutions underneath the K8s cloud platform, an L2 service across a DC fabric to the GW being one that we have addressed so far. There may be others to be added. Don't necessarily read VLAN in this context as an L2 bridge domain; it really only means a VLAN on the access link between the K8s worker/server and the fabric. A DC fabric can and often will be EVPN-based, so that the L2 service is actually routed. Furthermore, the L3 DCGw function is often hosted on the fabric switches, and in small Edge deployments the "fabric" actually degenerates to a directly connected GW. The ENO API data model should be abstract and generic enough to cover all these scenarios; if not, let's improve it. It should be a matter of the south-bound fabric/GW plugin to map the abstract model to the suitable fabric configuration.
We would like to share the ENO design document that will cover the object model and how E2E network orchestration can be realised. Should I create a separate discussion thread for the ENO design document, or do we have a dedicated space under the cnf-wg repo for design documents?
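For discussion purposes only, here is a rough sketch of what such an abstract L2 service object might look like; the kind, API group and field names are assumptions made for this thread, not the actual ENO object model. The south-bound plugin would map it to EVPN, plain VLAN trunking, or a directly connected GW as appropriate:

```yaml
# Illustrative sketch only; not the ENO data model.
apiVersion: example.org/v1alpha1
kind: L2Service
metadata:
  name: oam-access
spec:
  vlanId: 100                  # VLAN on the access link between worker and fabric
  attachmentPoints:
    - node: worker-1
      interface: eth1
    - node: worker-2
      interface: eth1
  fabricPlugin: evpn           # south-bound plugin chooses the concrete fabric config
  gateway:
    ipv4: 192.0.2.1/24         # optional L3 DCGw function hosted on the fabric
```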
-
19APR2021 CNF-WG Call
Tasks:
Discuss the role of K8s itself vs. the CNF itself.
-
@jeffsaelens what was the outcome of this CNF-WG call?
-
The primary/standard K8s networking model relies on a single NAT'ed interface and a stateful load balancer when interworking with external networks. As a result, it does not allow proper network separation, and its implementation through Linux kernel IP stack mechanisms does not fulfill the performance requirements of many TelCo VNFs.
Secondary/special network attachments were introduced to overcome these TelCo-specific limitations:
Network interfaces to pods are provided by Container Network Interface (CNI) plugins. Multus, as a meta-CNI, is able to handle a pod requesting more than the mandatory primary interface and delegates the plumbing and configuration of those interfaces to the actual CNI plugins responsible for each pod interface.
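As a rough illustration (assuming Multus and the macvlan CNI plugin are installed; the interface name, IPAM choice, subnet and image are placeholder values), a pre-provisioned secondary network and its consumption typically look like this:

```yaml
# NetworkAttachmentDefinition created at (or after) cluster deployment;
# interface name, IPAM type and subnet are assumptions for illustration.
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: ext-net-1
spec:
  config: '{
    "cniVersion": "0.3.1",
    "type": "macvlan",
    "master": "eth1",
    "mode": "bridge",
    "ipam": { "type": "host-local", "subnet": "10.10.0.0/24" }
  }'
---
# A CNF pod requests the extra interface through the Multus annotation.
apiVersion: v1
kind: Pod
metadata:
  name: cnf-pod
  annotations:
    k8s.v1.cni.cncf.io/networks: ext-net-1
spec:
  containers:
    - name: app
      image: registry.example.com/cnf:latest   # placeholder image
```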
However, all of this is achieved through static external networks pre-provisioned during the initial cluster deployment, which cannot be updated on demand in an automated manner. As a result, K8s pods (CNFs) can only consume those preconfigured external networks. This limited support for external network orchestration makes the overall networking solution static and binds CNFs to a specific deployed K8s cluster.
It is expected that external networks can be added/configured on demand during the lifetime of a K8s cluster, so that whenever new K8s application pods (CNFs) have to be instantiated, a cloud admin can orchestrate the external networks and the cloud user can attach their application pods (CNFs) to the provisioned external networks.
There should be a network orchestration API to automate the necessary configuration inside the Kubernetes cluster and on the DC fabric.