DNSPolicy scale test #615

Draft
wants to merge 1 commit into main
Conversation

@mikenairn (Member) commented Jan 13, 2025

Adds a DNSPolicy-specific scale test using kube-burner.

Part of #928

Based on the existing scale test, but with a focus on DNSPolicy and shared hostnames being updated by multiple dns operator instances.

The workload will create multiple instances of the dns operator in separate namespaces (kuadrant-dns-operator-x), and multiple test namespaces (scale-test-x) that the corresponding dns operator is configured to watch. The number of dns operator instances and test namespaces created is determined by the JOB_ITERATIONS environment variable.
In each test namespace a test app and service are deployed, and one or more gateways are created, as determined by the NUM_GWS environment variable. The number of listeners added to each gateway is determined by the NUM_LISTENERS environment variable.
Each listener hostname is generated from the listener number and the KUADRANT_ZONE_ROOT_DOMAIN environment variable. In each test namespace a dns provider credential is also created; its type is determined by the DNS_PROVIDER environment variable, and additional environment variables may need to be set depending on the provider type.
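As a rough illustration of how those variables drive the workload, the kube-burner job definition could look something like the fragment below. This is a sketch only: the job name, template file name and variable plumbing are assumptions for illustration, not the exact contents of the config added in this PR.

```yaml
jobs:
  # One iteration per dns operator instance / test namespace (scale-test-x) pair.
  - name: dnspolicy-scale-test
    jobIterations: {{ .JOB_ITERATIONS }}
    namespacedIterations: true
    namespace: scale-test
    objects:
      # Hypothetical template name; replicas controls the number of gateways per namespace.
      - objectTemplate: gateway.yaml
        replicas: {{ .NUM_GWS }}
        inputVars:
          NUM_LISTENERS: "{{ .NUM_LISTENERS }}"
          KUADRANT_ZONE_ROOT_DOMAIN: "{{ .KUADRANT_ZONE_ROOT_DOMAIN }}"
```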

Requires:

Comments/Thoughts:

  • Kube-burner does not have the concept of running workloads across multiple instances, which was one of the asks in this issue. It is probably possible to run multiple kube-burner tasks simultaneously using the same configuration in order to have multiple updates to the same record set from multiple clusters, but there would be no orchestration from kube-burner's point of view. It would also require the use of a single Thanos instance instead of one deployed on each cluster.
  • For these workloads to be of any use we need good metrics and alerts that are expected to fire when things are not working. It's not a test suite with assertions on state; instead it relies on alerts firing in order to fail the test run.
  • Separating the DNS Operator specific templates/metrics/alerts into the dns operator repo makes sense as long as we have a similar scale test in that repo. TBD if we do want that.

Alerts
A small list of alerts that I realised would be useful, but really there are probably hundreds required.

  • Alert when a gateway has not been assigned an address within an appropriate amount of time (can be hit quite easily when using kind if you only have a few IPs available). This isn't strictly a Kuadrant issue.
  • Alert when DNSRecords are in a failing state for a given amount of time.
  • Alert if the managers are restarting an unexpected number of times during the test run. Hit this as part of the DNSRecord scale test and wrote an alert for this here (a rough sketch of such alert entries follows this list).
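As a rough sketch, kube-burner alert profile entries for a couple of these could look like the following. The PromQL is illustrative only: the restart alert uses the standard kube-state-metrics restart counter, while the DNSRecord expression assumes a hypothetical metric name that a real alerts file would replace with whatever the dns operator actually exposes.

```yaml
# Illustrative alert profile entries; thresholds and the DNSRecord metric name are assumptions.
- expr: increase(kube_pod_container_status_restarts_total{namespace=~"kuadrant-dns-operator-.*"}[10m]) > 0
  description: dns operator manager restarted during the test run
  severity: error
- expr: sum(kuadrant_dnsrecord_ready{status="false"}) > 0 # hypothetical metric name
  description: one or more DNSRecords have been in a failing state
  severity: warning
```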

Adds a DNSPolicy specific scale test using kube-burner.

The workload will create multiple instances of the dns operator in
separate namespaces (kuadrant-dns-operator-x), and multiple test
namespaces (scale-test-x) that the corresponding dns operator is
configured to watch.  The number of dns operator instances and test
namespaces created is determined by the `JOB_ITERATIONS` environment
variable.
In each test namespace a test app and service is deployed and one or
more gateways are created determined by the `NUM_GWS` environment
variable.  The number of listeners added to the gateway is determined by
the `NUM_LISTENERS` environment variable.
Each listener hostname is generated using the listener number and the
`KUADRANT_ZONE_ROOT_DOMAIN` environment variable.  In each test
namespace a dns provider credential is created, the type created is
determined by the `DNS_PROVIDER` environment variable, additional
environment variables may need to be set depending on the provider type.

Signed-off-by: Michael Nairn <[email protected]>
@awk 'BEGIN {FS = ":.*?## "} /^[a-zA-Z_-]+:.*?## / {printf "\033[36m%-30s\033[0m %s\n", $$1, $$2}' $(MAKEFILE_LIST)
.PHONY: help
help: ## Display this help.
@awk 'BEGIN {FS = ":.*##"; printf "\nUsage:\n make \033[36m<target>\033[0m\n"} /^[a-zA-Z_0-9-]+:.*?##/ { printf " \033[36m%-15s\033[0m %s\n", $$1, $$2 } /^##@/ { printf "\n\033[1m%s\033[0m\n", substr($$0, 5) } ' $(MAKEFILE_LIST)
mikenairn (Member Author) commented:
Optional change, it just brings it in line with the help in other repos; you can use `##@ foo` to add sections:

Before:

$ make help
commit-acceptance              Runs pre-commit linting checks
reformat                       Reformats testsuite with black
test                           Run all non mgc tests
authorino                      Run only authorino related tests
authorino-standalone           Run only test capable of running with standalone Authorino
limitador                      Run only Limitador related tests
kuadrant                       Run all tests available on Kuadrant
kuadrant-only                  Run Kuadrant-only tests
multicluster                   Run Multicluster only tests
dnstls                         Run DNS and TLS tests
disruptive                     Run disruptive tests
kuadrantctl                    Run Kuadrantctl tests
poetry                         Installs poetry with all dependencies
poetry-no-dev                  Installs poetry without development dependencies
polish-junit                   Remove skipped tests and logs from passing tests
reportportal                   Upload results to reportportal. Appropriate variables for juni2reportportal must be set
help                           Print this help
clean                          Clean all objects on cluster created by running this testsuite. Set the env variable USER to delete after someone else
test-scale-dnspolicy           Run DNSPolicy scale tests.
kube-burner                    Download kube-burner locally if necessary.

After:

$ make help

Usage:
  make <target>
  commit-acceptance  Runs pre-commit linting checks
  reformat         Reformats testsuite with black
  test             Run all non mgc tests
  authorino        Run only authorino related tests
  authorino-standalone  Run only test capable of running with standalone Authorino
  limitador        Run only Limitador related tests
  kuadrant         Run all tests available on Kuadrant
  kuadrant-only    Run Kuadrant-only tests
  multicluster     Run Multicluster only tests
  dnstls           Run DNS and TLS tests
  disruptive       Run disruptive tests
  kuadrantctl      Run Kuadrantctl tests
  poetry           Installs poetry with all dependencies
  poetry-no-dev    Installs poetry without development dependencies
  polish-junit     Remove skipped tests and logs from passing tests
  reportportal     Upload results to reportportal. Appropriate variables for juni2reportportal must be set
  help             Display this help.
  clean            Clean all objects on cluster created by running this testsuite. Set the env variable USER to delete after someone else

Scale Testing
  test-scale-dnspolicy  Run DNSPolicy scale tests.

Build Dependencies
  kube-burner      Download kube-burner locally if necessary.

Deploy the observability stack:
```shell
#kubectl apply --server-side -k github.com/mikenairn/dns-operator/config/observability?ref=add_scale_test
kubectl apply --server-side -k github.com/kuadrant/dns-operator/config/observability?ref=main # Run twice if it fails the first time
```
mikenairn (Member Author) commented:
This needs to be updated.

I think a single kustomization in the kuadrant-operator that can be applied without needing to pull down the repo would be useful, and easier than having to run this set of steps: https://github.com/Kuadrant/kuadrant-operator/tree/main/config/observability#deploying-the-observabilty-stack. It depends how varied the observability setup gets.

Thanos setup should likely be a secondary optional task when working with a single cluster.

@david-martin Any opinions/thoughts on this?

Member commented:
The cmd you have here should be enough to deploy the stack (without Thanos or the example alerts & dashboards).
You would typically need to run it twice if the CRDs don't already exist:
once so that the CRDs get registered, and a second time (once they exist) so that the CRs can be created without errors.

- kind: DNSPolicy
  apiVersion: kuadrant.io/v1alpha1
  labelSelector: {kube-burner-job: dnspolicy-scale-test-loadbalanced}
{{ end }}
mikenairn (Member Author) commented:
I copied the pattern from the original scale test of adding this cleanup to remove the DNSPolicies. I understand why it's here, but it is fairly frustrating that we need it.

I wonder if we should revisit the cleanup of policies/records/secrets and see if there is some reasonable way we can prevent secrets being removed before all resources referencing them are deleted.

@maleck13 I know we have discussed this before, but I think it's probably the single most annoying thing about testing anything DNS related.
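For context, the cleanup being discussed is a kube-burner delete job along the lines of the sketch below; only the DNSPolicy object entry is taken from the actual change, while the job name and surrounding options are assumptions.

```yaml
  # Sketch of the surrounding delete job; name and options here are illustrative.
  - name: cleanup-dnspolicies
    jobType: delete
    waitForDeletion: true
    objects:
      - kind: DNSPolicy
        apiVersion: kuadrant.io/v1alpha1
        labelSelector: {kube-burner-job: dnspolicy-scale-test-loadbalanced}
```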

- https://raw.githubusercontent.com/{{.DNS_OPERATOR_GITHUB_ORG}}/dns-operator/refs/heads/{{.DNS_OPERATOR_GITREF}}/test/scale/alerts.yaml
indexer:
  type: local
  metricsDirectory: ./metrics
mikenairn (Member Author) commented:
I have the alerts and metrics being pulled from the dns operator repo here, but I imagine we could have these pulled from multiple sources, i.e. kuadrant-operator, the testsuite repo and other components, where each defines its own metrics/alerts specific to the resources it provides.

The metrics/alerts configured are, from what I can gather, really needed to make the most out of kube-burner runs, since alerts firing during the run are what will tell us whether things are working or not, and what we would need to improve on if we feel these types of scale tests are useful.
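To illustrate, the alerts list shown above could be extended with entries from other components; the additional URL and local path below are hypothetical and do not exist today:

```yaml
  - https://raw.githubusercontent.com/{{.DNS_OPERATOR_GITHUB_ORG}}/dns-operator/refs/heads/{{.DNS_OPERATOR_GITREF}}/test/scale/alerts.yaml
  # Hypothetical additional sources, one per component owning the resources under test:
  - https://raw.githubusercontent.com/Kuadrant/kuadrant-operator/refs/heads/main/test/scale/alerts.yaml
  - ./alerts/testsuite-alerts.yaml
```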

Member commented:
I wonder if the alerts and metrics files should be maintained in this repo for easier maintenance in the context of running and maintaining tests.
The alternative could result in extra toil, particularly when working out the details of assertions for a test.
