CNF Test Suite CLI Usage Documentation

Overview

The CNF Test suite can be run in production mode (using an executable) or in developer mode (using crystal lang directly). See the pseudo code documentation for examples of how the internals of WIP tests might work.

Syntax for running any of the tests

# Production mode
./cnf-testsuite <testname>

# Developer mode
crystal src/cnf-testsuite.cr <testname>

⭐ *Note: All usage commands in this document will use the production (binary executable) syntax unless otherwise stated.

  • ✔️ indicates implemented into stable release
  • 💡 indicates Proof of Concept
  • 📝 indicates To Do
  • ❌ indicates WARNINGS*

Results Output

  • ✔️ PASSED indicates it meets best practice, positive points given.
  • ⏭ SKIPPED indicates the test was skipped (output should provide a reason), no points given.
  • ❌ FAILED indicates the test failed, negative points given.

Common Example Commands

Building the executable

This is the command to build the binary executable if in developer mode or using the source install method (requires crystal):

crystal build src/cnf-testsuite.cr

Validating a cnf-testsuite.yml file:

./cnf-testsuite validate_config cnf-config=[PATH_TO]/cnf-testsuite.yml

Running all of the platform and workload tests:

./cnf-testsuite all cnf-config=<path_to_your_config_file>/cnf-testsuite.yml

Running all of the tests (including proofs of concept):

./cnf-testsuite all poc cnf-config=<path_to_your_config_file>/cnf-testsuite.yml

Running all of the workload tests

crystal src/cnf-testsuite.cr workload cnf-config=<path_to_your_config_file>/cnf-testsuite.yml

Running all of the platform or workload tests independently:

Run workload only tests:
./cnf-testsuite workload
Run platform only tests (long running):
./cnf-testsuite platform

Get the available options and see all available tests from the command line:

./cnf-testsuite help

Clean up the CNF Test Suite, the K8s cluster, and upstream projects:

./cnf-testsuite cleanup

Logging Options

Update the loglevel from command line:

# cmd line
./cnf-testsuite -l debug test

If in developer mode, make sure to put -- before the flags when running from source:

crystal src/cnf-testsuite.cr -- -l debug test

You can also set the log level with an environment variable:

LOGLEVEL=DEBUG ./cnf-testsuite test

⭐ Note: When setting log level, the following is the order of precedence:

  1. CLI or Command line flag
  2. Environment variable
  3. CNF-Testsuite Config file

Verbose Option

Setting the verbose option adds extra output to many tasks, which helps with debugging:

./cnf-testsuite test_name verbose

Running The Linter in Developer Mode

See https://github.com/crystal-ameba/ameba for more details on the linter. For details on running cnf-testsuite in developer mode, follow the INSTALL guide starting at the Source Install section.

shards install # only for first install
crystal bin/ameba.cr

Compatibility, Installability, and Upgradability Tests

To run all of the compatibility tests
./cnf-testsuite compatibility
To run both increase and decrease tests, you can use the alias command that calls them both:
./cnf-testsuite increase_decrease_capacity
Or, they can be called individually using the following commands:
./cnf-testsuite increase_capacity
./cnf-testsuite decrease_capacity

Remediation for failing this test:

Check out the kubectl docs for how to manually scale your cnf.

Also here is some info about things that could cause failures.

To run the Helm chart published test, you can use the following command:
./cnf-testsuite helm_chart_published

Remediation for failing this test:

Make sure your CNF helm charts are published in a Helm Repository.

To run the Helm chart valid test, you can use the following command:
./cnf-testsuite helm_chart_valid

Remediation for failing this test:

Make sure your helm charts pass lint tests.

To run the Helm deploy test, you can use the following command:
./cnf-testsuite helm_deploy

Remediation for failing this test:

Make sure your helm charts are valid and can be deployed to clusters.

To run the Rollback test, you can use the following command:
./cnf-testsuite rollback

Remediation for failing this test:

Ensure that you can upgrade your CNF using the Kubectl Set Image command, then rollback the upgrade using the Kubectl Rollout Undo command.

To run the Rolling update test, you can use the following command:
./cnf-testsuite rolling_update

Remediation for failing this test:

Ensure that you can successfully perform a rolling upgrade of your CNF using the Kubectl Set Image command.

To run the Rolling version change test, you can use the following command:
./cnf-testsuite rolling_version_change

Remediation for failing this test:

Ensure that you can successfully roll back the software version of your CNF by using the Kubectl Set Image command.

To run the Rolling downgrade test, you can use the following command:
./cnf-testsuite rolling_downgrade

Remediation for failing this test:

Ensure that you can successfully change the software version of your CNF back to an older version by using the Kubectl Set Image command.

To run the CNI compatible test, you can use the following command:
./cnf-testsuite cni_compatible

Remediation for failing this test:

Ensure that your CNF is compatible with Calico, Cilium and other available CNIs.

To run the Kubernetes Alpha APIs test, you can use the following command:
./cnf-testsuite alpha_k8s_apis

Remediation for failing this test:

Make sure your CNFs are not utilizing any Kubernetes alpha APIs. You can learn more about Kubernetes API versioning here.

Details for Compatibility, Installability and Upgradability Tests To Do's

📝 (To Do) To check if the CNF's CNI plugin accepts valid calls from the CNI specification

crystal src/cnf-testsuite.cr cni_spec

📝 (To Do) To check for the use of beta K8s API endpoints

crystal src/cnf-testsuite.cr api_snoop_beta

📝 (To Do) To check for the use of generally available (GA) K8s API endpoints

crystal src/cnf-testsuite.cr api_snoop_general_apis

📝 (To Do) To test small scale autoscaling

crystal src/cnf-testsuite.cr small_autoscaling

📝 (To Do) To test large scale autoscaling

crystal src/cnf-testsuite.cr large_autoscaling

📝 (To Do) To test if the CNF responds to network chaos

crystal src/cnf-testsuite.cr network_chaos

📝 (To Do) To test if the CNF control layer uses external retry logic

crystal src/cnf-testsuite.cr external_retry

Microservice Tests

To run all of the microservice tests
./cnf-testsuite microservice
To run the Reasonable image size test, you can use the following command:
./cnf-testsuite reasonable_image_size

Remediation for failing this test:

Ensure your CNF's image size is under 5GB.

To run the Reasonable startup time test, you can use the following command:
./cnf-testsuite reasonable_startup_time

Remediation for failing this test:

Ensure that your CNF gets into a running state within 30 seconds.

To run the Single process type test, you can use the following command:
./cnf-testsuite single_process_type

Remediation for failing this test:

Ensure that there is only one process type within a container. This does not count against child processes, e.g. nginx or httpd could be a parent process with 10 child processes and pass this test, but if both nginx and httpd were running, this test would fail.

To run the Service discovery test, you can use the following command:
./cnf-testsuite service_discovery

Remediation for failing this test:

Make sure the CNF exposes any of its containers as a Kubernetes Service. You can learn more about Kubernetes Service here.
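
As an illustration of the remediation above, a minimal Service manifest might look like the following. This is a sketch, not the test suite's required form: the name `my-cnf`, the label value, and the port numbers are all hypothetical placeholders.

```yaml
# Hypothetical Service that exposes the CNF's pods inside the cluster.
apiVersion: v1
kind: Service
metadata:
  name: my-cnf                        # placeholder name
spec:
  selector:
    app.kubernetes.io/name: my-cnf    # must match the labels on the CNF's pods
  ports:
    - name: http
      protocol: TCP
      port: 80                        # port the Service listens on
      targetPort: 8080                # assumed container port
```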

To run the Shared database test, you can use the following command:
./cnf-testsuite shared_database 

Remediation for failing this test:

Make sure that your CNF's containers are not sharing the same database.

To run the Specialized Init System test, you can use the following command:
./cnf-testsuite specialized_init_system

Remediation for failing this test:

Use init systems that are purpose-built for containers like tini, dumb-init, s6-overlay.

State Tests

To run all of the state tests:
./cnf-testsuite state
To run the Node drain test, you can use the following command:
./cnf-testsuite node_drain

Please note that this test requires a cluster with at least two schedulable nodes.

Remediation for failing this test: Ensure that your CNF can be successfully rescheduled when a node fails or is drained.

To run the Volume hostpath not found test, you can use the following command:
./cnf-testsuite volume_hostpath_not_found

Remediation for failing this test: Ensure that none of the containers in your CNFs are using ["hostPath"] to mount volumes.

To run the No local volume configuration test, you can use the following command:
./cnf-testsuite no_local_volume_configuration

Remediation for failing this test: Ensure that your CNF isn't using any persistent volumes that use a ["local"] mount point.

To run the Elastic volume test, you can use the following command:
./cnf-testsuite elastic_volume

Remediation for failing this test: Setup and use elastic persistent volumes instead of local storage.

To run the Database persistence test, you can use the following command:
./cnf-testsuite database_persistence 

Remediation for failing this test: Select a database configuration that uses statefulsets and elastic storage volumes.
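
A database deployed along these lines could be sketched as a StatefulSet with a volume claim template. All names, the image, and the `standard` storage class are assumptions for illustration; whether a given storage class counts as elastic depends on the provisioner behind it in your cluster.

```yaml
# Hypothetical database StatefulSet backed by a dynamically provisioned volume.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: my-db                          # placeholder name
spec:
  serviceName: my-db                   # headless Service assumed to exist
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: my-db
  template:
    metadata:
      labels:
        app.kubernetes.io/name: my-db
    spec:
      containers:
        - name: db
          image: example.com/my-db:1.2.3   # placeholder image
          volumeMounts:
            - name: data
              mountPath: /var/lib/db       # assumed data directory
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: standard         # assumed storage class
        resources:
          requests:
            storage: 10Gi
```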

Reliability, Resilience and Availability

To run all of the resilience tests
./cnf-testsuite resilience
To run the CNF network latency test, you can use the following command:
./cnf-testsuite pod_network_latency

Remediation for failing this test: Ensure that your CNF doesn't stall or get into a corrupted state when network degradation occurs. One mitigation strategy (in this case, keeping the timeout, i.e., access latency, low) is to use middleware that can switch traffic based on SLO parameters.

To run the CNF disk fill test, you can use the following command:
./cnf-testsuite disk_fill

Remediation for failing this test: Ensure that your CNF is resilient and doesn't stall when heavy IO causes a degradation in storage resource availability.

To run the CNF Pod delete test, you can use the following command:
./cnf-testsuite pod_delete

Remediation for failing this test: Ensure that your CNF is resilient and doesn't fail on a forced/graceful pod failure on specific or random replicas of an application.

To run the CNF Pod memory hog test, you can use the following command:
./cnf-testsuite pod_memory_hog

Remediation for failing this test: Ensure that your CNF is resilient to heavy memory usage and can maintain some level of availability.

To run the IO Stress test, you can use the following command:
./cnf-testsuite pod_io_stress

Remediation for failing this test: Ensure that your CNF is resilient to continuous and heavy disk IO load and can maintain some level of availability.

To run the Network corruption test, you can use the following command:
./cnf-testsuite pod_network_corruption

Remediation for failing this test: Ensure that your CNF is resilient to a lossy/flaky network and can maintain a level of availability.

To run the Network duplication test, you can use the following command:
./cnf-testsuite pod_network_duplication

Remediation for failing this test: Ensure that your CNF is resilient to erroneously duplicated packets and can maintain a level of availability.

To run the Pod DNS error test, you can use the following command:
./cnf-testsuite pod_dns_error

Remediation for failing this test: Ensure that your CNF is resilient to DNS resolution failures and can maintain a level of availability.

To run the Helm chart liveness entry test, you can use the following command:
./cnf-testsuite liveness

Remediation for failing this test: Ensure that your CNF has a Liveness Probe configured.

To run the Helm chart readiness entry test, you can use the following command:
./cnf-testsuite readiness

Remediation for failing this test: Ensure that your CNF has a Readiness Probe configured.
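
For the liveness and readiness remediations above, a container with both probes configured might look roughly like this. The pod name, image, port, and the `/healthz` and `/ready` paths are hypothetical; use whatever endpoints your application actually serves.

```yaml
# Hypothetical pod with both a liveness and a readiness probe.
apiVersion: v1
kind: Pod
metadata:
  name: my-cnf                            # placeholder name
spec:
  containers:
    - name: app
      image: example.com/my-cnf:1.2.3     # placeholder image
      ports:
        - containerPort: 8080             # assumed application port
      livenessProbe:                      # restarts the container when it fails
        httpGet:
          path: /healthz                  # assumed health endpoint
          port: 8080
        initialDelaySeconds: 5
        periodSeconds: 10
      readinessProbe:                     # removes the pod from Service endpoints when it fails
        httpGet:
          path: /ready                    # assumed readiness endpoint
          port: 8080
        periodSeconds: 5
```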

Observability and Diagnostic Tests

To run all observability tests, you can use the following command:
./cnf-testsuite observability
To run the stdout/stderr logging test, you can use the following command:
./cnf-testsuite log_output

Remediation for failing this test: Make sure applications and CNFs are sending log output to STDOUT and/or STDERR.

To run the Prometheus installed test, you can use the following command:
./cnf-testsuite prometheus_traffic 

Remediation for failing this test: Install and configure Prometheus for your CNF.

To run the routed logs test, you can use the following command:
./cnf-testsuite routed_logs

Remediation for failing this test: Install and configure fluentd or fluentbit to collect data and logs. See more at fluentd.org for fluentd or fluentbit.io for fluentbit.

To run the OpenMetrics compatible test, you can use the following command:
./cnf-testsuite open_metrics

Remediation for failing this test: Ensure that your CNF is publishing OpenMetrics compatible metrics.

To run the Jaeger tracing test, you can use the following command:
./cnf-testsuite tracing

Remediation for failing this test: Ensure that your CNF is both using & publishing traces to Jaeger.

Security Tests

To run all of the security tests, you can use the following command:
./cnf-testsuite security
To run the Container socket mount test, you can use the following command:
./cnf-testsuite container_sock_mounts

Remediation for failing this test: Make sure your CNF doesn't mount /var/run/docker.sock, /var/run/containerd.sock or /var/run/crio.sock on any containers.

To run the External IPs test, you can use the following command:
./cnf-testsuite external_ips

Remediation for failing this test: Make sure not to define external IPs in your Kubernetes service configuration.

To run the Privilege container test, you can use the following command:
./cnf-testsuite privileged_containers

Remediation for failing this test:

Remove privileged capabilities by setting securityContext.privileged to false. If you must deploy a Pod as privileged, add other restrictions to it, such as a network policy or Seccomp, and still remove all unnecessary capabilities.

To run the Privilege escalation test, you can use the following command:
./cnf-testsuite privilege_escalation

Remediation for failing this test: If your application does not need it, make sure the allowPrivilegeEscalation field of the securityContext is set to false. See more at ARMO-C0016
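
The container-level settings named in the privileged container and privilege escalation remediations can be sketched as follows. The pod name and image are placeholders.

```yaml
# Hypothetical pod that is neither privileged nor allowed to escalate privileges.
apiVersion: v1
kind: Pod
metadata:
  name: my-cnf                            # placeholder name
spec:
  containers:
    - name: app
      image: example.com/my-cnf:1.2.3     # placeholder image
      securityContext:
        privileged: false                 # false is also the default when unset
        allowPrivilegeEscalation: false   # blocks gaining more privileges than the parent process
```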

To run the Symlink file test, you can use the following command:
./cnf-testsuite symlink_file_system

Remediation for failing this test: To mitigate this vulnerability without upgrading kubelet, you can disable the VolumeSubpath feature gate on kubelet and kube-apiserver, or remove any existing Pods using subPath or subPathExpr feature.

To run the Sysctls test, you can use the following command:
./cnf-testsuite sysctls

Remediation for failing this test: The spec.securityContext.sysctls field must be unset or not used.

To run the Application credentials test, you can use the following command:
./cnf-testsuite application_credentials

Remediation for failing this test: Use Kubernetes secrets or Key Management Systems to store credentials.
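
One way to follow this remediation is to store the credential in a Secret and reference it from the container instead of writing the value into the manifest. The names, key, and value below are hypothetical.

```yaml
# Hypothetical Secret holding a credential.
apiVersion: v1
kind: Secret
metadata:
  name: my-cnf-credentials                # placeholder name
type: Opaque
stringData:
  db-password: change-me                  # placeholder value; do not commit real secrets
---
# Hypothetical pod that reads the credential from the Secret.
apiVersion: v1
kind: Pod
metadata:
  name: my-cnf
spec:
  containers:
    - name: app
      image: example.com/my-cnf:1.2.3     # placeholder image
      env:
        - name: DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: my-cnf-credentials
              key: db-password
```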

To run the Host network test, you can use the following command:
./cnf-testsuite host_network

Remediation for failing this test: Only connect PODs to the hostNetwork when it is necessary. If not, set the hostNetwork field of the pod spec to false, or completely remove it (false is the default). Allow only those PODs that must have access to host network by design.

To run the Service account mapping test, you can use the following command:
./cnf-testsuite service_account_mapping

Remediation for failing this test: Disable automatic mounting of service account tokens to PODs either at the service account level or at the individual POD level, by specifying the automountServiceAccountToken: false. Note that POD level takes precedence.
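
Both levels mentioned above can be sketched like this. The service account and pod names are hypothetical.

```yaml
# Hypothetical ServiceAccount that does not auto-mount its token.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-cnf-sa                         # placeholder name
automountServiceAccountToken: false
---
# Hypothetical pod that also opts out at the pod level (this setting takes precedence).
apiVersion: v1
kind: Pod
metadata:
  name: my-cnf
spec:
  serviceAccountName: my-cnf-sa
  automountServiceAccountToken: false
  containers:
    - name: app
      image: example.com/my-cnf:1.2.3     # placeholder image
```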

To run the Ingress and Egress test, you can use the following command:
./cnf-testsuite ingress_egress_blocked

Remediation for failing this test:

By default, you should disable or restrict Ingress and Egress traffic on all pods.
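
A common starting point for this is a default-deny NetworkPolicy, after which specific traffic is allowed with additional policies. The sketch below assumes the CNF runs in a namespace called `my-cnf` and that the cluster's CNI plugin enforces NetworkPolicy.

```yaml
# Hypothetical default-deny policy for every pod in the namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all                  # placeholder name
  namespace: my-cnf                       # assumed namespace
spec:
  podSelector: {}                         # empty selector matches all pods in the namespace
  policyTypes:
    - Ingress
    - Egress
```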

To run the Insecure capabilities test, you can use the following command:
./cnf-testsuite insecure_capabilities

Remediation for failing this test:

Remove all insecure capabilities which aren’t necessary for the container.
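
A typical pattern is to drop every capability and add back only what the container needs. In this sketch, `NET_BIND_SERVICE` is only an example of a capability an application might need; the pod name and image are placeholders.

```yaml
# Hypothetical pod that drops all capabilities and re-adds a single one.
apiVersion: v1
kind: Pod
metadata:
  name: my-cnf                            # placeholder name
spec:
  containers:
    - name: app
      image: example.com/my-cnf:1.2.3     # placeholder image
      securityContext:
        capabilities:
          drop: ["ALL"]
          add: ["NET_BIND_SERVICE"]       # example only; omit if not required
```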

To run the Non-root containers test, you can use the following command:
./cnf-testsuite non_root_containers

Remediation for failing this test:

If your application does not need root privileges, define runAsUser and runAsGroup under the PodSecurityContext with a user ID of 1000 or higher, do not enable allowPrivilegeEscalation, and set runAsNonRoot to true.

To configure the Falco driver to be used for this test, please refer to docs/falco-config.md.
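
The non-root settings described in the remediation above could be expressed as follows. The pod name, image, and the specific ID 1000 are illustrative, and the image must actually be able to run as that user.

```yaml
# Hypothetical pod that runs as a non-root user and group.
apiVersion: v1
kind: Pod
metadata:
  name: my-cnf                            # placeholder name
spec:
  securityContext:                        # pod-level settings apply to all containers
    runAsNonRoot: true
    runAsUser: 1000
    runAsGroup: 1000
  containers:
    - name: app
      image: example.com/my-cnf:1.2.3     # placeholder image
      securityContext:
        allowPrivilegeEscalation: false
```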

To run the Host PID/IPC test, you can use the following command:
./cnf-testsuite host_pid_ipc_privileges

Remediation for failing this test:

Apply least privilege principle and remove hostPID and hostIPC from the yaml configuration privileges unless they are absolutely necessary.

To run the Linux hardening test, you can use the following command:
./cnf-testsuite linux_hardening

Remediation for failing this test:

Use AppArmor, Seccomp, SELinux and Linux Capabilities mechanisms to restrict containers' ability to use unwanted privileges.

To run the Resource policies test, you can use the following command:
./cnf-testsuite resource_policies

Remediation for failing this test:

Define LimitRange and ResourceQuota policies to limit resource usage for namespaces or in the deployment/POD yamls.
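
As a rough sketch, namespace-level policies might look like the following. The namespace, object names, and all quantities are hypothetical and should be sized for your workload.

```yaml
# Hypothetical per-container defaults for the namespace.
apiVersion: v1
kind: LimitRange
metadata:
  name: my-cnf-limits                     # placeholder name
  namespace: my-cnf                       # assumed namespace
spec:
  limits:
    - type: Container
      default:                            # default limits when a container sets none
        cpu: 500m
        memory: 256Mi
      defaultRequest:                     # default requests when a container sets none
        cpu: 100m
        memory: 128Mi
---
# Hypothetical cap on total resource usage in the namespace.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: my-cnf-quota
  namespace: my-cnf
spec:
  hard:
    requests.cpu: "2"
    requests.memory: 2Gi
    limits.cpu: "4"
    limits.memory: 4Gi
```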

To run the Immutable File Systems test, you can use the following command:
./cnf-testsuite immutable_file_systems

Remediation for failing this test:

Set the filesystem of the container to read-only when possible. If the container's application needs to write to the filesystem, you can mount secondary filesystems for the specific directories where the application requires write access.
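
For example, a container with a read-only root filesystem and one writable scratch directory could be sketched like this. The pod name, image, and the choice of `/tmp` as the writable path are assumptions.

```yaml
# Hypothetical pod with a read-only root filesystem and a writable emptyDir.
apiVersion: v1
kind: Pod
metadata:
  name: my-cnf                            # placeholder name
spec:
  containers:
    - name: app
      image: example.com/my-cnf:1.2.3     # placeholder image
      securityContext:
        readOnlyRootFilesystem: true
      volumeMounts:
        - name: scratch
          mountPath: /tmp                 # assumed directory the application writes to
  volumes:
    - name: scratch
      emptyDir: {}
```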

To run the HostPath Mounts test, you can use the following command:
./cnf-testsuite hostpath_mounts

Remediation for failing this test:

Refrain from using a hostPath mount.

To run the SELinux options test, you can use the following command:
./cnf-testsuite selinux_options

Remediation for failing this test: Ensure the following guidelines are followed for any cluster resource that allows SELinux options.

  • If the SELinux option `type` is set, it should only be one of the allowed values: `container_t`, `container_init_t`, or `container_kvm_t`.
  • SELinux options `user` or `role` should not be set.
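
A pod that follows both guidelines might look like the sketch below, which sets only `type` to one of the allowed values and leaves `user` and `role` unset. The pod name and image are placeholders.

```yaml
# Hypothetical pod with an allowed SELinux type and no user or role.
apiVersion: v1
kind: Pod
metadata:
  name: my-cnf                            # placeholder name
spec:
  securityContext:
    seLinuxOptions:
      type: container_t                   # one of the allowed values listed above
  containers:
    - name: app
      image: example.com/my-cnf:1.2.3     # placeholder image
```
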

Details for Security Tests To Do's

📝 (To Do) To check if there are any shells running in the container

crystal src/cnf-testsuite.cr shells

📝 (To Do) To check if there are any protected directories or files that are accessed from within the container

crystal src/cnf-testsuite.cr protected_access

Configuration Tests

To run all Configuration tests, you can use the following command:
./cnf-testsuite configuration_lifecycle
To run the Default namespace test, you can use the following command:
./cnf-testsuite default_namespace

Remediation for failing this test:

Ensure that your CNF is configured to use a Namespace and is not using the default namespace.
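
A minimal sketch of this, assuming a dedicated namespace named `my-cnf` (a hypothetical name), is to create the namespace and set it explicitly on each resource:

```yaml
# Hypothetical dedicated namespace for the CNF.
apiVersion: v1
kind: Namespace
metadata:
  name: my-cnf                            # placeholder name
---
# Hypothetical pod placed in that namespace instead of "default".
apiVersion: v1
kind: Pod
metadata:
  name: my-cnf
  namespace: my-cnf
spec:
  containers:
    - name: app
      image: example.com/my-cnf:1.2.3     # placeholder image
```

With Helm, the namespace is usually chosen at install time instead, for example with the `--namespace` and `--create-namespace` flags of `helm install`.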

To run the Latest tag test, you can use the following command:
./cnf-testsuite latest_tag

Remediation for failing this test:

When specifying container images, always specify a tag, and use an immutable tag that maps to a specific version of an application Pod. Remove any usage of the latest tag, as it is not guaranteed to always point to the same version of the image.

To run the require labels test, you can use the following command:
./cnf-testsuite require_labels

Remediation for failing this test:

Make sure to define app.kubernetes.io/name label under metadata for your CNF.
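
For instance, the label could be set like this. The value `my-cnf`, the pod name, and the image are hypothetical.

```yaml
# Hypothetical pod carrying the recommended name label.
apiVersion: v1
kind: Pod
metadata:
  name: my-cnf                            # placeholder name
  labels:
    app.kubernetes.io/name: my-cnf        # placeholder label value
spec:
  containers:
    - name: app
      image: example.com/my-cnf:1.2.3     # placeholder image
```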

To run the versioned tag test, you can use the following command:
./cnf-testsuite versioned_tag

Remediation for failing this test:

When specifying container images, always specify a tag, and use an immutable tag that maps to a specific version of an application Pod. Remove any usage of the latest tag, as it is not guaranteed to always point to the same version of the image.

To run the nodePort not used test, you can use the following command:
./cnf-testsuite nodeport_not_used

Remediation for failing this test:

Review all Helm Charts & Kubernetes Manifest files for the CNF and remove all occurrences of the nodePort field in your configuration. Alternatively, configure a service or use another mechanism for exposing your container.

To run the hostPort not used test, you can use the following command:
./cnf-testsuite hostport_not_used

Remediation for failing this test:

Review all Helm Charts & Kubernetes Manifest files for the CNF and remove all occurrences of the hostPort field in your configuration. Alternatively, configure a service or use another mechanism for exposing your container.

To run the Hardcoded IP addresses test, you can use the following command:
./cnf-testsuite hardcoded_ip_addresses_in_k8s_runtime_configuration

Remediation for failing this test:

Review all Helm Charts & Kubernetes Manifest files of the CNF and look for any hardcoded IP addresses. If any are found, you will need to use an operator or some other method to abstract the IP management out of your configuration in order to pass this test.

To run the Secrets used test, you can use the following command:
./cnf-testsuite secrets_used

Rules for the test: The whole test passes if any workload resource in the CNF uses a (non-exempt) secret. If no workload resources use a (non-exempt) secret, the test is skipped.

Remediation for failing this test:

Remove any sensitive data stored in configmaps or environment variables, and instead use K8s Secrets for storing such data. Alternatively, you can use an operator or some other method to abstract hardcoded sensitive data out of your configuration.

To run the immutable configmap test, you can use the following command:
./cnf-testsuite immutable_configmap

Remediation for failing this test: Use immutable configmaps for any non-mutable configuration data.
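
An immutable configmap is an ordinary ConfigMap with the `immutable` field set to true. The name and data below are hypothetical.

```yaml
# Hypothetical immutable ConfigMap.
apiVersion: v1
kind: ConfigMap
metadata:
  name: my-cnf-config                     # placeholder name
immutable: true                           # data can no longer be changed once this is set
data:
  log-format: json                        # placeholder setting
```

Once marked immutable, the ConfigMap cannot be edited in place; changing the configuration means creating a new ConfigMap (typically under a new name) and pointing the workload at it.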

5g Tests

To run all 5g tests, you can use the following command:
./cnf-testsuite 5g
To run the 5g core_validator test, you can use the following command:
./cnf-testsuite smf_upf_core_validator
To run the 5g suci_enabled test, you can use the following command:
./cnf-testsuite suci_enabled

RAN Tests

To run all RAN tests, you can use the following command:
./cnf-testsuite ran
To run the oran e2 connection test, you can use the following command:
./cnf-testsuite oran_e2_connection

Platform Tests

To run all Platform tests, you can use the following command:
./cnf-testsuite platform
To run the K8s Conformance test, you can use the following command:
./cnf-testsuite k8s_conformance

Remediation for failing this test: Check that Sonobuoy can be successfully run and passes without failure on your platform. Any failures found by Sonobuoy will provide debug and remediation steps required to get your K8s cluster into a conformant state.

To run the ClusterAPI enabled test, you can use the following command:
./cnf-testsuite clusterapi_enabled

Remediation for failing this test: Enable ClusterAPI and start using it to manage the provisioning and lifecycle of your Kubernetes clusters.

To run all platform hardware and scheduling tests, you can use the following command:
./cnf-testsuite platform:hardware_and_scheduling
To run the OCI Compliant test, you can use the following command:
./cnf-testsuite platform:oci_compliant

Remediation for failing this test:

Check if your Kubernetes Platform is using an OCI Compliant Runtime. If your platform is not using an OCI Compliant Runtime, you'll need to switch to a new runtime that is OCI Compliant in order to pass this test.

(PoC) To run All platform resilience tests, you can use the following command:
./cnf-testsuite platform:resilience poc
To run the Worker reboot recovery test, you can use the following command:
./cnf-testsuite platform:worker_reboot_recovery poc destructive

Remediation for failing this test:

Reboot a worker node in your Kubernetes cluster and verify that the node can recover and re-join the cluster in a schedulable state. Workloads should also be rescheduled to the node once it's back online.

✔️ Run All platform security tests
./cnf-testsuite platform:security 
To run the Cluster admin test, you can use the following command:
./cnf-testsuite platform:cluster_admin

Remediation for failing this test: You should apply least privilege principle. Make sure cluster admin permissions are granted only when it is absolutely necessary. Don't use subjects with high privileged permissions for daily operations.

See more at ARMO-C0035

To run the Control plane hardening test, you can use the following command:
./cnf-testsuite platform:control_plane_hardening

Remediation for failing this test:

Set the insecure-port flag of the API server to zero.

See more at ARMO-C0005

To run the Dashboard exposed test, you can use the following command:
./cnf-testsuite platform:exposed_dashboard

Remediation for failing this test:

Update dashboard version to v2.0.1 or above.

To run the Tiller images test, you can use the following command:
./cnf-testsuite platform:helm_tiller

Remediation for failing this test: Switch to using Helm v3+ and make sure not to pull any images with the name tiller in them.