docs: Updates

rg0now committed Dec 21, 2023
1 parent ae1722d commit d4db3b3

Showing 3 changed files with 66 additions and 66 deletions.
3 changes: 2 additions & 1 deletion README.md
@@ -179,7 +179,8 @@ Values](https://helm.sh/docs/chart_template_guide/values_files).
```console
helm repo add stunner https://l7mp.io/stunner
helm repo update
helm install stunner-gateway-operator stunner/stunner-gateway-operator --create-namespace \
--namespace=stunner-system
```

Find out more about the charts in the [STUNner-helm repository](https://github.com/l7mp/stunner-helm).
90 changes: 46 additions & 44 deletions docs/DEPLOYMENT.md
@@ -6,10 +6,9 @@ can act either as a simple headless STUN/TURN server or a fully fledged ingress
an entire Kubernetes-based media server pool. Second, when STUNner is configured as an ingress
gateway then there are multiple [ICE models](#ice-models), based on whether only the client
connects via STUNner or both clients and media servers use STUNner to set up the media-plane
connection. Third, STUNner can run in one of several [data plane models](#data-plane-models), based
on whether the dataplane is automatically provisioned or the user has to manually supply the
dataplane pods for STUNner.

## Architectural models

@@ -26,11 +25,11 @@ this case the STUN/TURN servers are deployed into Kubernetes.

![STUNner headless deployment architecture](img/stunner_standalone_arch.svg)

<!-- > **Warning** -->
<!-- For STUNner to be able to connect WebRTC clients and servers in the headless model *all* the -->
<!-- clients and servers *must* use STUNner as the TURN server. This is because STUNner opens the -->
<!-- transport relay connections *inside* the cluster, on a private IP address, and this address is -->
<!-- reachable only to STUNner itself, but not for external STUN/TURN servers. -->

### Media-plane deployment model

@@ -52,12 +51,19 @@ for clients' UDP transport streams then STUNner can be scaled freely, otherwise
result in the [disconnection of a small number of client
connections](https://cilium.io/blog/2020/11/10/cilium-19/#maglev).

## ICE models

The peers willing to create a connection via STUNner (e.g., two clients as per the headless model,
or a client and a media server in the media-plane deployment model) need to decide how to create
ICE candidates.

### Asymmetric ICE mode

In *asymmetric ICE mode*, one peer is configured with STUNner as the TURN server and the other peer
runs with no STUN or TURN servers whatsoever. The first peer creates a TURN transport relay
connection via STUNner, to which the other peer can then join directly. Asymmetric ICE mode is the
recommended setup for the media-plane deployment model, that is, for hosting WebRTC media servers
in Kubernetes.

![STUNner asymmetric ICE mode](img/stunner_asymmetric_ice.svg)

@@ -71,37 +77,34 @@ connection. In contrast, servers run without any STUN/TURN server whatsoever, so
only. Due to servers being deployed into ordinary Kubernetes pods, the server's host candidate will
likewise contain a private pod IP address. Then, since in the Kubernetes networking model ["pods
can communicate with all other pods on any other node without a
NAT"](https://kubernetes.io/docs/concepts/services-networking), clients' relay candidates and the
servers' host candidates will have direct connectivity in the Kubernetes private container network
and the ICE connectivity check will succeed. See more explanation
NAT"](https://kubernetes.io/docs/concepts/services-networking), the client's relay candidate and
the server's host candidate will have direct connectivity in the Kubernetes private container
network and the ICE connectivity check will succeed. See more explanation
[here](examples/kurento-one2one-call/README.md#what-is-going-on-here).

> **Warning**
Refrain from configuring additional public STUN/TURN servers apart from STUNner itself. The rules
to follow for setting the [ICE server
configuration](https://github.com/l7mp/stunner#configuring-webrtc-clients) in asymmetric ICE mode
are as below (a client-side configuration sketch follows the list):
- on the client, set STUNner as the *only* TURN server and configure *no* STUN servers, and
- on the server do *not* configure *any* STUN or TURN server whatsoever.

Most users will want to deploy STUNner using the asymmetric ICE mode. In the rest of the docs we
assume the asymmetric ICE mode with the media-plane deployment model, unless noted otherwise.
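
For illustration, a minimal client-side configuration following these rules might look like the
sketch below. The TURN URI, username, and credential are placeholders, not values taken from this
repository: substitute the public address of your STUNner Gateway and the credentials from your
Gateway authentication settings.

```js
// Sketch only: the TURN URI and the credentials are placeholders, to be replaced
// with the public address of your STUNner Gateway and its configured credentials.
const iceConfiguration = {
  iceServers: [
    {
      urls: "turn:203.0.113.10:3478?transport=udp", // public IP/port of the Gateway (placeholder)
      username: "my-user",                          // placeholder
      credential: "my-password",                    // placeholder
    },
  ],
  // Recommended: allow only TURN relay candidates so that STUNner remains the
  // single entry point into the media plane.
  iceTransportPolicy: "relay",
};

const pc = new RTCPeerConnection(iceConfiguration);
```

On the server side, per the second rule, simply omit the ICE server configuration altogether.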

> **Warning**
Deviating from these rules *might* work in certain cases, but may have uncanny and hard-to-debug
side-effects. For instance, configuring clients and servers with public STUN servers in certain
unlucky situations may allow them to connect via server-reflexive ICE candidates, completely
circumventing STUNner. This is on the one hand extremely fragile and, on the other hand, a security
vulnerability; remember, STUNner should be the *only* external access point to your media plane. It
is good advice to set the `iceTransportPolicy` to `relay` on the clients to avoid such side-effects:
this will prevent clients from generating host and server-reflexive ICE candidates, leaving STUNner
as the only option for obtaining ICE candidates.

### Symmetric ICE mode

In the symmetric ICE mode both the client and the server obtain an ICE [relay
candidate](https://developer.mozilla.org/en-US/docs/Web/API/RTCIceCandidate/type) from STUNner and
the connection occurs directly via STUNner. This is the simplest mode for the headless deployment
model, but the symmetric mode can also be used in the media-plane model to connect clients to
media servers.

![STUNner symmetric ICE mode](img/stunner_symmetric_ice.svg)

@@ -118,7 +121,7 @@ priorities](https://www.ietf.org/rfc/rfc5245.txt) to different connection types)
is a good practice to configure the STUNner TURN URI in the server-side ICE server configuration
with the *internal* IP address and port used by STUNner (i.e., the ClusterIP of the `stunner`
Kubernetes service and the corresponding port), otherwise the server might connect via the external
LoadBalancer IP causing an unnecessary roundtrip (hairpinning).
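
As an illustration, a server-side ICE configuration for the symmetric mode might look like the
sketch below, expressed in browser-style WebRTC API terms (most media servers accept an equivalent
ICE server setting). The ClusterIP, port, and credentials are placeholders to be replaced with the
internal address of the `stunner` service and your Gateway credentials.

```js
// Sketch only: point the media server at STUNner's internal ClusterIP service
// address; all values below are placeholders.
const serverIceConfiguration = {
  iceServers: [
    {
      urls: "turn:10.96.42.1:3478?transport=udp", // ClusterIP and port of the stunner service (placeholder)
      username: "my-user",                        // placeholder
      credential: "my-password",                  // placeholder
    },
  ],
  iceTransportPolicy: "relay",
};
```

Clients in the symmetric mode are configured the same way, except that they use the external
(LoadBalancer) address of the Gateway.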

The symmetric mode means more overhead compared to the asymmetric mode, since STUNner now performs
TURN encapsulation/decapsulation for both sides. However, the symmetric mode comes with certain
@@ -127,11 +130,10 @@ internal IP addresses in the ICE candidates from attackers; note that this is no
but feel free to open an issue if [exposing internal IP addresses](SECURITY.md) is blocking
you from adopting STUNner.

## Data plane models

STUNner supports two dataplane provisioning modes. In the default *managed* mode, the dataplane
pods (i.e., the `stunnerd` pods) are provisioned automatically for each Gateway existing in the
cluster. In the *legacy* mode, the dataplane is supposed to be deployed by the user manually, by
installing the `stunner/stunner` Helm chart into the target namespaces. Legacy mode is considered
obsolete at this point and will be removed in a later release.
39 changes: 18 additions & 21 deletions docs/WHY.md
@@ -9,43 +9,40 @@ used outside of this context (e.g., as a regular STUN/TURN server), but this is

## The problem

The pain points STUNner is trying to solve are all related to the fact that Kubernetes and WebRTC
are currently foes, not friends.

Kubernetes has been designed and optimized for the typical HTTP/TCP Web workload, which makes
streaming workloads, and especially UDP/RTP based WebRTC media, feel like a foreign citizen. Most
importantly, Kubernetes runs the media server pods/containers over a private L3 network over a
private IP address, and several rounds of Network Address Translation (NAT) steps are required
to ingest media traffic into this private pod network. Most cloud load-balancers apply a DNAT step
to route packets to a Kubernetes node and then an SNAT step to inject the packet into the private pod
network, so that by the time a media packet reaches a pod essentially all header fields in the [IP
5-tuple](https://www.techopedia.com/definition/28190/5-tuple) are modified except the destination
port. Then, if any pod sends the packet over to another pod via a Kubernetes service load-balancer
then the packet will again undergo a DNAT step, and so on.

The *Kubernetes dataplane teems with NATs*. This is not a big deal for the web protocols Kubernetes
was designed for, since each HTTP/TCP connection involves a session context that the server can use
to identify clients. This is not the case with the WebRTC media protocol stack though, since
UDP/RTP connections do not involve anything remotely similar to an HTTP context. Consequently, the
only "semi-stable" way for a WebRTC server to identify a client is to expect the client's packets
to arrive from a negotiated IP source address and source port. When the IP 5-tuple changes, for
instance because there is a NAT in the datapath, WebRTC media connections break. Due to reasons
that are mostly historical at this point, *UDP/RTP connections do not survive even a single NAT
step*, let alone the 2-3 rounds of NATs a packet regularly undergoes in the Kubernetes dataplane.

## The state-of-the-art

The current stance is that the only way to deploy a WebRTC media server into Kubernetes is to
exploit a [well-documented Kubernetes
anti-pattern](https://kubernetes.io/docs/concepts/configuration/overview): *running the media
server pods in the host network namespace* of the Kubernetes nodes (using the `hostNetwork: true`
setting in the pod template). This way the media server shares the network namespace of the host
(i.e., the Kubernetes node) it is running on, inheriting the public address (if any) of the host
and (hopefully) sidestepping the private pod network and the NATs involved.

There are *lots* of reasons why this deployment model is less than ideal:

