-
Uploading the SVG to gist.github.com is an alternative (before the SVG is added to the repo via PR/commit). See https://gist.github.com/taylor/1613a794d1be001577ec5efa303528f7
-
The pull request - which may be a bit premature and get closed in its current form - is https://github.com/cncf/cnf-wg/pull/60. It intends to illustrate an SP-managed, enterprise-VPN-facing BGP server. It includes the use case and (arguably out of place) a discussion of how it will cause us problems in a stock k8s deployment.
(@taylor, to your comment: apparently SVG works in markdown but is not supported as an upload to discussions, so this is a PNG export).
BGP on a customer network
Overview
A service provider wishes to run a BGP server that is attached to a customer network. This is a common use case when using VPNs.
The customer network is a separate network from the internet, having its own addressing domain (i.e. it is a VRF on the service provider network). In order to provide BGP service to this network, the BGP application run by the BGP server must peer with other BGP speakers on the customer network, and therefore needs to have an interface within that VRF. Simultaneously, the BGP speaker will require a management interface (for configuration and telemetry) in the SP network. The Kubernetes API endpoint will also be in the SP network, unavailable to the customer - the customer's network can only access the BGP speaker.
Figure 1: BGP network overview
Network consumption
In NFV we often consider interfaces that provide per-packet networking for the purposes of creating a fast dataplane. This, however, is not one of those use cases. Here, the BGP protocol involves a simple TCP connection, which the kernel network stack provides to the BGP process. Note that the BGP peering sessions will be socket-based connections, and the BGP speaker will want to consume the network via the standard socket APIs (`socket()`, `listen()` and `connect()`).
BGP speakers are true peers; there is no client and no server. Every BGP speaker is expected to listen on a well-known port for its neighbours trying to connect while, at the same time, attempting connections to those peers.
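As a rough sketch of that symmetry (ordinary Go using only the standard library, not a real BGP implementation; the 192.0.2.10 neighbour address is a placeholder), a speaker both listens on port 179 and keeps trying to connect out to the same neighbour:

```go
// A minimal sketch, not a real BGP implementation: the speaker listens on the
// well-known BGP port and, at the same time, keeps trying to connect out to
// its configured neighbour. 192.0.2.10 is a placeholder address.
package main

import (
	"log"
	"net"
	"time"
)

const bgpPort = "179" // binding below 1024 also needs CAP_NET_BIND_SERVICE in a container

func main() {
	peer := "192.0.2.10" // the neighbour's address is configured, not discovered

	// Accept inbound sessions from the neighbour (listen()/accept()).
	go func() {
		ln, err := net.Listen("tcp", net.JoinHostPort("", bgpPort))
		if err != nil {
			log.Fatalf("listen: %v", err)
		}
		for {
			conn, err := ln.Accept()
			if err != nil {
				log.Printf("accept: %v", err)
				continue
			}
			log.Printf("inbound session from %s", conn.RemoteAddr())
			conn.Close() // a real speaker would run the BGP state machine here
		}
	}()

	// Simultaneously attempt an outbound session to the same neighbour (connect()).
	for {
		conn, err := net.DialTimeout("tcp", net.JoinHostPort(peer, bgpPort), 5*time.Second)
		if err != nil {
			log.Printf("dial %s: %v (retrying)", peer, err)
			time.Sleep(5 * time.Second)
			continue
		}
		log.Printf("outbound session to %s", conn.RemoteAddr())
		conn.Close() // again, the OPEN/KEEPALIVE exchange and FSM would go here
		return
	}
}
```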
The BGP session is typically configured at both ends with the address of the opposite end. The Kubernetes-based BGP speaker process will want to know, and reach, the address of its neighbour; similarly, the neighbour will want to know, and reach, the address of the BGP speaker. There is no DNS involved in this lookup.
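To make that mirroring concrete, here is a hypothetical neighbour definition as each end might hold it (addresses and AS numbers are placeholders); each side is configured with the other's literal address, with no name resolution in the path:

```go
// Illustrative only: addresses and AS numbers are placeholders. The point is
// that each end is configured with the other's literal address - there is no
// DNS or service discovery in the path.
package main

import "fmt"

// PeerConfig is a hypothetical neighbour definition, not taken from any real speaker.
type PeerConfig struct {
	LocalAddr string // address this speaker sources the session from
	PeerAddr  string // the neighbour's address, configured explicitly
	LocalAS   uint32
	PeerAS    uint32
}

func main() {
	// As held by the Kubernetes-hosted speaker.
	onCluster := PeerConfig{LocalAddr: "192.0.2.1", PeerAddr: "192.0.2.10", LocalAS: 64512, PeerAS: 64513}
	// As held by the customer router: the mirror image.
	onCustomer := PeerConfig{LocalAddr: "192.0.2.10", PeerAddr: "192.0.2.1", LocalAS: 64513, PeerAS: 64512}
	fmt.Printf("cluster side:  %+v\ncustomer side: %+v\n", onCluster, onCustomer)
}
```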
Problems vs standard Kubernetes
There are three sorts of networking occurring here.
CNI-mediated networking
Kubernetes' CNI interface provides for the internal connectivity - this has no fixed protocol and is likely amenable to the semantics of the CNI. Similarly, the telemetry interface (logging, monitoring and configuration using e.g. a REST interface) will work using the CNI and Kubernetes ingress.
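For illustration, the telemetry/configuration side is plain HTTP, so a sketch like the one below (the port and endpoints are assumptions, not anything defined in the PR) sits happily behind the CNI-provided pod interface and an Ingress:

```go
// Sketch of the telemetry/configuration side: plain HTTP, so it works over
// the CNI-provided pod interface behind a Service or Ingress. The port and
// endpoints here are assumptions for illustration.
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	mux.HandleFunc("/v1/peers", func(w http.ResponseWriter, r *http.Request) {
		// A real speaker would report its actual session state here.
		json.NewEncoder(w).Encode(map[string]string{"192.0.2.10": "Established"})
	})
	log.Fatal(http.ListenAndServe(":8080", mux)) // reachable on the pod IP assigned by the CNI
}
```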
Networking outside the scope of the CNI
However, the BGP interface will not work using the CNI, for a number of reasons.
Multiple VRFs
Kubernetes' CNI is designed to attach to the network outside of the platform through a single point of connectivity - basically, into a single VRF - and the defined APIs do not allow a second, separate point of attachment to be specified. All CNI endpoints are expected to sit in a single addressing domain in which every endpoint can reach every other.
Multiple VRFs attached to one process
Aside from this, if the BGP process is in two VRFs then it, and therefore at least one of its pods, must attach to two routing tables. In Linux this is conventionally done using two network namespaces, which is problematic for stock Kubernetes: it creates a single namespace for each pod and gives control of it to the CNI for the purposes of Kubernetes networking, and it provides no means to create and attach a second namespace.
It is also not possible for an unprivileged container to change network namespaces. Switching requires elevated capabilities (CAP_SYS_ADMIN for the `setns()` call itself, plus CAP_NET_ADMIN to configure anything in the target namespace). These capabilities are also indiscriminate - a process granted them can act on any network namespace on the server, regardless of whether that namespace is assigned to the container in which the pod resides.
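For illustration, this is roughly what attaching to a second network namespace looks like from inside a process, and why it needs privileges that a stock unprivileged pod does not get. The /var/run/netns/customer-vrf handle is hypothetical - nothing in stock Kubernetes creates one for a pod:

```go
// Sketch of what attaching a process to a second network namespace involves,
// and why it needs privileges an unprivileged pod does not have. The
// /var/run/netns/customer-vrf handle is hypothetical: stock Kubernetes never
// creates it, and setns() fails with EPERM without CAP_SYS_ADMIN (network
// configuration inside the namespace additionally needs CAP_NET_ADMIN).
package main

import (
	"log"
	"net"
	"runtime"

	"golang.org/x/sys/unix"
)

func main() {
	// setns() affects only the calling OS thread, so pin this goroutine to it.
	runtime.LockOSThread()
	defer runtime.UnlockOSThread()

	fd, err := unix.Open("/var/run/netns/customer-vrf", unix.O_RDONLY, 0) // hypothetical path
	if err != nil {
		log.Fatalf("open netns handle: %v", err)
	}
	defer unix.Close(fd)

	if err := unix.Setns(fd, unix.CLONE_NEWNET); err != nil {
		log.Fatalf("setns: %v (expected in an unprivileged container)", err)
	}

	// Sockets created on this thread now live in the customer namespace/VRF.
	ln, err := net.Listen("tcp", ":179")
	if err != nil {
		log.Fatalf("listen in customer namespace: %v", err)
	}
	defer ln.Close()
	log.Printf("listening in the customer namespace on %s", ln.Addr())
}
```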
BGP is not designed to be cloud-native
BGP far predates Kubernetes and was not designed with cloud-style failure tolerance in mind. Its connection is designed to run directly from one BGP process to another, with the assumption that the network provides reachability and nothing else.
BGP peers need to know their own address and the address of their peer. Address rewrites (e.g. NAT) in the flow are likely to cause problems rather than help the solution.
BGP sessions last only as long as the underlying TCP connection stays up. If the speaker process dies, the broken connection causes a significant event on the network; having a failover process ready to accept a new connection, while still useful, will not avoid that network event or the bulk of the disruption the failure causes.
So: current BGP speakers are not going to be able to make use of standard forms of address rewriting (e.g. NAT) or load balancing from the platform, and in fact these features will cause problems if they cannot be avoided. If BGP requires special network functionality, it is likely to be specific to the application and therefore built into the application, not general purpose enough to be worth building into the platform.