The CNF Test Suite program validates the interoperability of CNF workloads supplied by multiple vendors, orchestrated on Kubernetes platforms that are also supplied by multiple vendors. The goal is to provide an open source test suite that enables both open and closed source CNFs to demonstrate conformance and implementation of best practices. For more detailed CLI documentation, see the usage document.
CNFs should work with any Certified Kubernetes product and any CNI-compatible network that meet their functionality requirements. The CNF Test Suite checks for the use of standard, in-band deployment tools such as Helm (version 3) charts. It also checks whether CNFs support horizontal scaling (across multiple machines) and vertical scaling (between sizes of machines) using the native Kubernetes kubectl. The CNF Test Suite validates this by:
- Performing K8s API usage testing by running API snoop on the cluster, which:
  - Checks alpha endpoint usage
  - Checks beta endpoint usage
  - Checks generally available (GA) endpoint usage
- Testing increasing/decreasing capacity
- Testing small-scale autoscaling with kubectl (see the sample HorizontalPodAutoscaler after this list)
- Testing large-scale autoscaling with load test tools such as CNF Testbed
- Testing if the CNF control layer responds to retries for failed communication (e.g. using Pumba or Blockade for network chaos and Envoy for retries)
- Testing if the install script uses Helm v3
- Testing if the CNF is published to a public Helm chart repository
- Testing if the Helm chart is valid (e.g. using the helm linter)
- Testing if the CNF can perform a rolling update (i.e. kubectl rolling update)
- Performing CNI plugin testing, which:
  - Tests if the CNI plugin follows the CNI specification
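
To make the scaling checks concrete, a CNF that supports horizontal scaling can usually be paired with a standard HorizontalPodAutoscaler like the sketch below; the Deployment name, replica counts, and CPU target are hypothetical and not prescribed by the test suite.

```yaml
# Hypothetical HorizontalPodAutoscaler for a CNF Deployment named
# "example-cnf"; the names and thresholds are placeholders.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-cnf
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-cnf
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```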
The CNF should be developed and delivered as a microservice. The CNF Test Suite runs tests to determine the organizational structure and rate of change of the CNF being tested. Once these are known, we can determine whether or not the CNF is a microservice. See: Microservice-Principles:
- Checks if the CNF has a reasonable startup time (an illustrative startupProbe follows this list).
- Checks the image size of the CNF.
- Checks for a single process type within containers.
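
For the startup-time check, one way a CNF can bound and advertise its startup behavior is with a startupProbe on its (ideally single-process) container; the image, endpoint, and timings below are illustrative assumptions.

```yaml
# Illustrative single-container CNF pod with a startupProbe that bounds
# startup time (image, path, port, and timings are placeholders).
apiVersion: v1
kind: Pod
metadata:
  name: example-cnf
spec:
  containers:
    - name: example-cnf
      image: registry.example.com/example-cnf:1.0.0   # small, single-process image
      startupProbe:
        httpGet:
          path: /healthz
          port: 8080
        periodSeconds: 2
        failureThreshold: 30   # allows up to ~60s to become healthy
```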
The CNF Test Suite checks if state is stored in a custom resource definition or a separate database (e.g. etcd) rather than requiring local storage. It also checks to see if state is resilient to node failure:
- Checks whether a hostPath volume is used.
- Checks that no local volumes are configured.
- Checks if the CNF is using elastic persistent volumes (see the sample PersistentVolumeClaim after this list).
- Checks for K8s database persistence.
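
As a sketch of what the elastic-storage check favors, state can be kept on a dynamically provisioned PersistentVolumeClaim instead of a node-local hostPath; the claim name, storage class, and size below are assumptions.

```yaml
# Illustrative PersistentVolumeClaim backed by a dynamic provisioner,
# rather than a node-local hostPath volume (all values are placeholders).
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-cnf-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard   # any dynamically provisioned storage class
  resources:
    requests:
      storage: 10Gi
```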
The Cloud Native Definition requires systems to be resilient to failures, which are inevitable in cloud environments. CNF resilience should be tested to ensure CNFs are designed to deal with non-carrier-grade, shared cloud HW/SW platforms:
- Checks for network latency
- Performs a disk fill
- Deletes a pod to test reliability and availability.
- Performs a memory hog test for resilience.
- Performs an IO stress test.
- Tests network corruption.
- Tests network duplication.
- Drains a node on the cluster.
- Checks for a liveness entry in the Helm chart and whether the container is responsive to it after a restart (see the probe example after this list).
- Checks for a readiness entry in the Helm chart and whether the container is responsive to it after a restart.
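
The liveness and readiness checks look for probes in the chart's pod template; a minimal sketch of a pod carrying such probes is shown below, with placeholder image, endpoints, port, and timings.

```yaml
# Illustrative pod with liveness and readiness probes, as the checks above
# expect to find in the chart's pod template (all values are placeholders).
apiVersion: v1
kind: Pod
metadata:
  name: example-cnf
spec:
  containers:
    - name: example-cnf
      image: registry.example.com/example-cnf:1.0.0
      livenessProbe:
        httpGet:
          path: /healthz
          port: 8080
        initialDelaySeconds: 10
        periodSeconds: 10
      readinessProbe:
        httpGet:
          path: /ready
          port: 8080
        periodSeconds: 5
```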
In order to maintain, debug, and have insight into a protected environment, infrastructure elements must have the property of being observable. This means these elements must externalize their internal states in a way that lends itself to metrics, tracing, and logging. The Test Suite checks this by:
- Testing to see if there is traffic to Fluentd
- Testing to see if there is traffic to Jaeger
- Testing to see if Prometheus rules for the CNF are configured correctly (e.g. using Promtool)
- Testing to see if there is traffic to Prometheus (a sample scrape-annotation manifest follows this list)
- Testing to see if the monitoring calls are compatible with OpenMetrics
- Testing log output
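
One common (but not mandated) way a CNF makes itself scrapeable by Prometheus is via the prometheus.io annotations on its pods, with the endpoint serving Prometheus/OpenMetrics text format; the port and path below are assumptions.

```yaml
# Illustrative pod using the widely adopted prometheus.io scrape
# annotations (a convention, not a Kubernetes API); values are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: example-cnf
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9090"
    prometheus.io/path: "/metrics"
spec:
  containers:
    - name: example-cnf
      image: registry.example.com/example-cnf:1.0.0
      ports:
        - name: metrics
          containerPort: 9090
```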
CNF containers should be isolated from one another and from the host. The CNF Test Suite uses tools such as OPA Gatekeeper, Falco, and Armosec Kubescape to check the following:
- Checks if any containers are running in privileged mode.
- Checks whether containers run as the root user (see the hardened securityContext example after this list).
- Checks for privilege escalation.
- Checks for symlink file systems.
- Checks for exposed application credentials.
- Checks if the container or pods can access the host network.
- Checks for service accounts and mappings.
- Checks for ingress and egress being blocked.
- Checks for privileged containers.
- Verifies if there are insecure and dangerous capabilities.
- Checks network policies.
- Checks for non root containers.
- Checks PID and IPC privileges.
- Checks for Linux hardening (e.g. whether SELinux is used).
- Checks resource policies defined.
- Checks for immutable file systems.
- Checks if any hostPath mounts are used.
- Checks if there are any shells.
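
Many of the security checks above correspond to fields in a pod's securityContext; a hardened, purely illustrative container spec might look like the following (user ID, image, and resource limits are assumptions).

```yaml
# Illustrative hardened pod: non-root user, no privilege escalation,
# dropped capabilities, read-only root filesystem, no host network,
# no auto-mounted service account token, and resource limits defined.
apiVersion: v1
kind: Pod
metadata:
  name: example-cnf
spec:
  hostNetwork: false
  automountServiceAccountToken: false
  containers:
    - name: example-cnf
      image: registry.example.com/example-cnf:1.0.0
      securityContext:
        runAsNonRoot: true
        runAsUser: 10001
        allowPrivilegeEscalation: false
        privileged: false
        readOnlyRootFilesystem: true
        capabilities:
          drop: ["ALL"]
      resources:
        limits:
          cpu: "500m"
          memory: "256Mi"
```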
Configuration should be managed in a declarative manner, using ConfigMaps, Operators, or other declarative interfaces. The Test Suite checks this by:
- Testing if the CNF is installed using a versioned Helm v3 chart
- Searching for hardcoded IP addresses, subnets, or node ports in the configuration
- Checking if the pod/container can be started without mounting a volume that contains configuration files (e.g. using Helm configuration)
- Testing by resetting any child processes, and when the parent process is started, checking to see if those child processes are reaped (e.g. by monitoring processes with Falco or sysdig-inspect)
- Testing if there are any (non-declarative) hardcoded IP addresses or subnet masks
- Testing that nodePort is not used
- Testing that hostPort is not used
- Checking for secrets used or configured
- Testing immutable ConfigMaps (an example follows this list)
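
As an example of the immutable-ConfigMap check, immutability is declared with the top-level `immutable` field; the name and data below are placeholders.

```yaml
# Illustrative immutable ConfigMap (name and data are placeholders).
apiVersion: v1
kind: ConfigMap
metadata:
  name: example-cnf-config
immutable: true
data:
  LOG_LEVEL: "info"
```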
Tools to study/use for such testing methodology: the previously mentioned Pumba and Blockade, Chaos Mesh, mitmproxy, Istio for "Network Resilience", kill -STOP -CONT, LimitCPU, and the Packet pROcessing eXecution (PROX) engine as an impair gateway.