Release The final missing pieces: Third stable release candidate · l7mp/stunner

We are proud to present STUNner v0.19.0, the third stable release candidate of the STUNner Kubernetes media gateway for WebRTC from l7mp.io.

Originally we did not plan to release a third RC before v1, however, a couple of compelling bugs and feature requests required us to revisit this plan. Now that the v1 release milestones in the main STUNner repo and the STUNner gateway operator repo have all been fixed, we decided to roll a third release candidate. We hope this version to become the final v1 release in the coming weeks.

Changes occurred mostly on the side of the Kubernetes gateway operator, including the implementation of a new endpoint discovery controller to restore the graceful backend shutdown feature, further customization in the way dataplane pods are deployed, new Gateway annotations to finetune the way STUNner gateways are exposed to clients, and the implementation of a proper finalizer that makes sure the cluster is left in a well-defined state after the operator has been shut down. With this release, STUNner goes into a soft-freeze state again: only fixes, refactors, and documentation updates will be accepted in the master branch until we release v1.

News

A new controller for endpoint discovery

One of STUNner's main security features is filtering client connections based on the requested peer address: STUNner permits clients to reach only the pods that belong to one of the backend services in the target UDPRoute, and it blocks access to all other pod IPs. This feature relies on STUNner's endpoint discovery mechanism, which makes it possible for the operator to learn the pod IPs that belong to backend services. Until this release, the operator has been watching the legacy Kubernetes Endpoints API to learn IP addresses. Unfortunately, certain limitations of this legacy API have made backend graceful shutdown impossible to support. In particular, when a backend pod is terminated it is immediately removed from the Endpoints resource by Kubernetes, which triggers STUNner into rejecting TURN packets for the terminating backends from that point. This breaks all TURN connections to terminating backends, even though the backend may very well remain functional for a while to finish servicing active client connections (this process is called graceful shutdown).

In this release, STUNner's endpoint discovery mechanism has been rewritten over the newer EndpointSlice API. Since EndpointSlices keep the IP of terminating pods, this change has restored the graceful shutdown functionality.

Note that you need at least Kubernetes v1.21 to take advantage of the EndpointSlice API (STUNner falls back to the legacy Endpoints API when EndpointSlices are unavailable), and at least v1.22 to let EndpointSlices show terminating pods.

New dataplane customization features

With STUNner going into large-scale production, further customization features have been requested by users to finetune the way the stunnerd dataplane pods are deployed. With this release, the Dataplane CRD, used as a template by the operator to create stunnerd deployments, contains some additional settings:

ImagePullSecrets: list of Secret references for pulling the stunnerd image. This is useful when deploying stunnerd from a private container image repository.
TopologySpreadConstraints: this standard Deployment spec describes how the group of stunnerd pods ought to spread across topology domains.
ContainerSecurityContext: this field in the Dataplane spec allows to set container-level security attributes. Setting pod-level security attributes has always been supported, but now it is possible to customize security attributes at the level of each container in the stunnerd pods. This is useful for deploying sidecar containers alongside stunnerd.

For users deploying STUNner with a Helm upgrade: make sure to manually apply the new CRDs (kubectl apply -f deploy/manifests/static/stunner-crd.yaml), otherwise you won't have access to the new Dataplane spec fields.

Retaining clients' source IP

Normally, Kubernetes load balancers apply source IP address translation when ingesting packets into the cluster. This replaces clients' original IP address with a private IP address. For STUNner's intended use case, as an ingress media gateway exposing the cluster's media services over the TURN protocol, this does not matter. However, STUNner can also act as a STUN server, which requires clients' source IP to be retained at the load balancer.

Starting from the new release, this can be achieved by adding the annotation stunner.l7mp.io/external-traffic-policy: local to a Gateway, which will set the service.spec.externalTrafficPolicy field in the Service created by STUNner for the Gateway to Local. Note that this Kubernetes feature comes with fairly complex limitations: if a STUN or TURN request hits a Kubernetes node that is not running a stunnerd pod, then the request will silently fail. This is required for Kubernetes to retain the client IP, which otherwise would be lost when passing packets between nodes. Use this setting at your own risk.

Manually provisioning the dataplane

In some cases it may be useful to manually provision a dataplane for a Gateway. One specific example is when STUNner is used as a STUN server: replacing the stunnerd Deployment created by the operator to run the dataplane for a Gateway with a Daemonset will make sure that at least one stunnerd pod runs on each node, which eliminates the above problem created by the service.spec.externalTrafficPolicy: Local setting.

In the new release, adding the annotation stunner.l7mp.io/disable-managed-dataplane: true to a Gateway will prevent STUNner from spawning a dataplane Deployment for the Gateway (the LB Service will still be created). This then allows one to manually create a stunnerd dataplane and connect it to the CDS server exposed by the operator to load fresh dataplane configuration. Remove the annotation to revert to the default mode and let STUNner to manage the dataplane for the Gateway.

For instance, in order to run the dataplane of a Gateway in a DaemonSet, first dump the automatically created Deployment into a YAML file (this will serve as a template for the manually created DaemonSet), apply the above annotation to make sure the operator removes the automatically created Deployment, edit the template in the YAML file by rewriting the resource kind from apps/v1/Deployment to apps/v1/DaemonSet and remove the useless settings (like replicas), and finally apply the modified YAML. This will deploy stunnerd to all nodes of the cluster, making sure that STUN requests will always find a running STUN server no matter which Kubernetes node they hit. The cost is, however, that the dataplane DaemonSet will have to be manually adjusted every time you change the Gateway.

Manual dataplane provisioning requires intimate knowledge with the STUNner internals, use this feature only if you know what you are doing.

Selecting the NodePort for a Gateway

By default, Kubernetes assigns a random external port from the range [32000-32767] to each listener of a Gateway exposed in a NodePort Service. This requires all ports in the default NodePort range [32000-32767] to be opened on the external firewall, which may raise security concerns for hardened deployments.

In order to assign a specific NodePort to a particular listener, you can now add the annotation stunner.l7mp.io/nodeport: {"listener_name_1":nodeport_1,"listener_name_2":nodeport_2,...} to the Gateway, where each key-value pair is a name of a listener and the selected (numeric) NodePort. The annotation value itself must be proper a JSON map. Unknown listeners are silently ignored.

Note that STUNner makes no specific effort to reconcile conflicting nodeports: whenever the selected nodeport is unavailable Kubernetes will silently reject the Service, which can lead to hard-to-debug failures. Use this feature at your own risk.

Finalizer

So far, when the STUNner gateway operator was removed all the automatically created Kubernetes resources (stunnerd Deployments, Services and ConfigMaps) have kept running, with a status indicating a functional gateway deployment. From this release the operator carefully removes all managed resources and invalidates gateway statuses on exit, which makes sure the cluster is left in a well-defined state.

Commits

chore(Helm): Operator now can be installed as a dependency chart.
feat: Add finalizer to leave cluster in well-defined state on shutdown
feat: Allow for disabling the managed dataplane for a Gateway
feat: Allow Gateways to request specific NodePorts, fix #137
feat: Implement EndpointSlice controller, fixes #26
feat: Set ExternalTrafficPolicy in LB Services, fixes #47
feat: Add new dataplane customization features, fixes #46
fix: Deepcopy K8s resources to be sent to the updater
refactor: Clean up metadata sharing between Gateways and Deployments, fixes #45
test: Refactor integration test cases

Enjoy STUNner and don't forget to support us!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

The final missing pieces: Third stable release candidate