Skip to content

Commit

Permalink
Add E2E automation, debug helpers (#94)
Browse files Browse the repository at this point in the history
* WIP: Add script to automate testing of two node cluster

* Add template for cluster profile

* Remove separate function to build stylus framework image

Earthly stylus goal is now building this goal two

* WIP: automation of cluster launch

* Move test script to test/

* Exract user data template

* Add creation of all machines

* Refactor main

* Add missing required variable

* wip: e2e vmware test

Signed-off-by: Tyler Gillson <[email protected]>

* finish automating e2e provisioning

Signed-off-by: Tyler Gillson <[email protected]>

* tidy whitespace & remove oz's changes to default user-data template

Signed-off-by: Tyler Gillson <[email protected]>

* fixes & docs

Signed-off-by: Tyler Gillson <[email protected]>

* add isTwoNodeCandidate flag to cluster template

Signed-off-by: Tyler Gillson <[email protected]>

* fix: remove invalid node-status-update-frequency flag

Signed-off-by: Tyler Gillson <[email protected]>

* fix: ensure unique VM names in vSphere; docs

Signed-off-by: Tyler Gillson <[email protected]>

* remove invalid arg from cluster profile template

Signed-off-by: Tyler Gillson <[email protected]>

* add livenessSeconds & VIP config

Signed-off-by: Tyler Gillson <[email protected]>

* feat: add destroy_edge_hosts

Signed-off-by: Tyler Gillson <[email protected]>

* install ping for two node

Signed-off-by: Tyler Gillson <[email protected]>

* feat: configurable NIC_NAME

Signed-off-by: Tyler Gillson <[email protected]>

* fix test-two-node.sh

* add debug scripts, cleanup funcs, remove two-node env hack

Signed-off-by: Tyler Gillson <[email protected]>

---------

Signed-off-by: Tyler Gillson <[email protected]>
Co-authored-by: Oz Tiram <[email protected]>
  • Loading branch information
TylerGillson and oz123 authored Nov 6, 2023
1 parent 54ef5b6 commit 7e3f355
Show file tree
Hide file tree
Showing 9 changed files with 741 additions and 15 deletions.
4 changes: 2 additions & 2 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -8,5 +8,5 @@ config.yaml
content-*/*
*.arg
.idea

.DS_Store
hack/*.img
.DS_Store
16 changes: 16 additions & 0 deletions hack/Earthfile
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
VERSION 0.6

ARG OSBUILDER_VERSION=v0.7.11
ARG OSBUILDER_IMAGE=quay.io/kairos/osbuilder-tools:$OSBUILDER_VERSION
ARG ISO_NAME=debug

# replace with your CanvOS provider image
ARG PROVIDER_IMAGE=oci:tylergillson/ubuntu:k3s-1.26.4-v4.0.4-071c2c23

build:
FROM $OSBUILDER_IMAGE
WORKDIR /build
COPY . ./

RUN /entrypoint.sh --name $ISO_NAME --debug build-iso --squash-no-compression --date=false $PROVIDER_IMAGE --output /build/
SAVE ARTIFACT /build/$ISO_NAME.iso kairos.iso AS LOCAL build/$ISO_NAME.iso
17 changes: 17 additions & 0 deletions hack/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
# Debugging Kairos

If you're facing hard-to-diagnose issues with your custom provider image, you can use the scripts in this directory to obtain verbose Kairos output.

## Steps
1. Use earthly to generate an ISO from your CanvOS provider image:
```
earthly +build --PROVIDER_IMAGE=<your_provider_image> # e.g., oci:tylergillson/ubuntu:k3s-1.26.4-v4.0.4-071c2c23
```
If successful, `build/debug.iso` will be created.
2. Launch a local VM based on the debug ISO using QEMU and pipe all output to a log file:
```
./launch-qemu.sh build/debug.iso | tee out.log
```
3. Once the VM boots, use `reboot` to return to the GRUB menu, then select your desired entry and hit `x` to edit it. Add `rd.debug rd.immucore.debug` to the end of the `linux` line for your selected GRUB menu entry, then hit `CTRL+x` to boot with your edits. You should see verbose Kairos debug logs and they will be persisted to `out.log`.
Empty file added hack/build/.keep
Empty file.
25 changes: 25 additions & 0 deletions hack/launch-qemu.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
#!/bin/bash

# Screenshot capability:
# https://unix.stackexchange.com/a/476617

if [ ! -e disk.img ]; then
qemu-img create -f qcow2 disk.img 60g
fi

# -nic bridge,br=br0,model=virtio-net-pci \
qemu-system-x86_64 \
-enable-kvm \
-cpu "${CPU:=host}" \
-nographic \
-spice port=9000,addr=127.0.0.1,disable-ticketing=yes \
-m ${MEMORY:=10096} \
-smp ${CORES:=5} \
-monitor unix:/tmp/qemu-monitor.sock,server=on,wait=off \
-serial mon:stdio \
-rtc base=utc,clock=rt \
-chardev socket,path=qga.sock,server=on,wait=off,id=qga0 \
-device virtio-serial \
-device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0 \
-drive if=virtio,media=disk,file=disk.img \
-drive if=ide,media=cdrom,file="${1}"
64 changes: 64 additions & 0 deletions test/templates/two-node-cluster-profile.json.tmpl
Original file line number Diff line number Diff line change
@@ -0,0 +1,64 @@
{
"metadata": {
"name": "_____place_holder_____",
"description": "",
"labels": {}
},
"spec": {
"version": "1.0.0",
"template": {
"type": "infra",
"cloudType": "edge-native",
"packs": [
{
"name": "edge-native-byoi",
"type": "spectro",
"layer": "os",
"version": "1.0.0",
"tag": "1.0.0",
"values": "pack:\n content:\n images:\n - image: \"{{.spectro.pack.edge-native-byoi.options.system.uri}}\"\n # Below config is default value, please uncomment if you want to modify default values\n #drain:\n #cordon: true\n #timeout: 60 # The length of time to wait before giving up, zero means infinite\n #gracePeriod: 60 # Period of time in seconds given to each pod to terminate gracefully. If negative, the default value specified in the pod will be used\n #ignoreDaemonSets: true\n #deleteLocalData: true # Continue even if there are pods using emptyDir (local data that will be deleted when the node is drained)\n #force: true # Continue even if there are pods that do not declare a controller\n #disableEviction: false # Force drain to use delete, even if eviction is supported. This will bypass checking PodDisruptionBudgets, use with caution\n #skipWaitForDeleteTimeout: 60 # If pod DeletionTimestamp older than N seconds, skip waiting for the pod. Seconds must be greater than 0 to skip.\nstylusPackage: container://OCI_REGISTRY/stylus-linux-amd64:v0.0.0-STYLUS_HASH\noptions:\n system.uri: \"OCI_REGISTRY/ubuntu:k3s-1.26.4-v4.0.4-STYLUS_HASH\"",
"registry": {
"metadata": {
"uid": "_____place_holder_____",
"name": "Public Repo",
"kind": "pack",
"isPrivate": false
}
}
},
{
"name": "edge-k3s",
"type": "spectro",
"layer": "k8s",
"version": "1.26.4",
"tag": "1.26.4",
"values": "cluster:\n config: |\n flannel-backend: host-gw\n disable-network-policy: false\n disable:\n - traefik\n - local-storage\n - servicelb\n - metrics-server\n\n # configure the pod cidr range\n cluster-cidr: \"192.170.0.0/16\"\n\n # configure service cidr range\n service-cidr: \"192.169.0.0/16\"\n\n # kubeconfig must be in run for the stylus operator to manage the cluster\n write-kubeconfig: /run/kubeconfig\n write-kubeconfig-mode: 600\n\n # additional component settings to harden installation\n kube-apiserver-arg:\n - anonymous-auth=true\n - profiling=false\n - disable-admission-plugins=AlwaysAdmit\n - default-not-ready-toleration-seconds=60\n - default-unreachable-toleration-seconds=60\n - enable-admission-plugins=AlwaysPullImages,NamespaceLifecycle,ServiceAccount,NodeRestriction\n - audit-log-path=/var/log/apiserver/audit.log\n - audit-policy-file=/etc/kubernetes/audit-policy.yaml\n - audit-log-maxage=30\n - audit-log-maxbackup=10\n - audit-log-maxsize=100\n - authorization-mode=RBAC,Node\n - tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_128_GCM_SHA256\n kube-controller-manager-arg:\n - profiling=false\n - terminated-pod-gc-threshold=25\n - use-service-account-credentials=true\n - feature-gates=RotateKubeletServerCertificate=true\n - node-monitor-period=5s\n - node-monitor-grace-period=20s\n - pod-eviction-timeout=20s\n kube-scheduler-arg:\n - profiling=false\n kubelet-arg:\n - read-only-port=0\n - event-qps=0\n - feature-gates=RotateKubeletServerCertificate=true\n - protect-kernel-defaults=true\n - tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_128_GCM_SHA256\n - rotate-server-certificates=true\nstages:\n initramfs:\n - sysctl:\n vm.overcommit_memory: 1\n kernel.panic: 10\n kernel.panic_on_oops: 1\n kernel.printk: \"0 4 0 7\"\n - directories:\n - path: \"/var/log/apiserver\"\n permissions: 0644\n files:\n - path: /etc/hosts\n permission: \"0644\"\n content: |\n 127.0.0.1 localhost\n - path: \"/etc/kubernetes/audit-policy.yaml\"\n owner_string: \"root\"\n permission: 0600\n content: |\n apiVersion: audit.k8s.io/v1\n kind: Policy\n rules:\n - level: None\n users: [\"system:kube-proxy\"]\n verbs: [\"watch\"]\n resources:\n - group: \"\" # core\n resources: [\"endpoints\", \"services\", \"services/status\"]\n - level: None\n users: [\"system:unsecured\"]\n namespaces: [\"kube-system\"]\n verbs: [\"get\"]\n resources:\n - group: \"\" # core\n resources: [\"configmaps\"]\n - level: None\n users: [\"kubelet\"] # legacy kubelet identity\n verbs: [\"get\"]\n resources:\n - group: \"\" # core\n resources: [\"nodes\", \"nodes/status\"]\n - level: None\n userGroups: [\"system:nodes\"]\n verbs: [\"get\"]\n resources:\n - group: \"\" # core\n resources: [\"nodes\", \"nodes/status\"]\n - level: None\n users:\n - system:kube-controller-manager\n - system:kube-scheduler\n - system:serviceaccount:kube-system:endpoint-controller\n verbs: [\"get\", \"update\"]\n namespaces: [\"kube-system\"]\n resources:\n - group: \"\" # core\n resources: [\"endpoints\"]\n - level: None\n users: [\"system:apiserver\"]\n verbs: [\"get\"]\n resources:\n - group: \"\" # core\n resources: [\"namespaces\", \"namespaces/status\", \"namespaces/finalize\"]\n - level: None\n users: [\"cluster-autoscaler\"]\n verbs: [\"get\", \"update\"]\n namespaces: [\"kube-system\"]\n resources:\n - group: \"\" # core\n resources: [\"configmaps\", \"endpoints\"]\n # Don't log HPA fetching metrics.\n - level: None\n users:\n - system:kube-controller-manager\n verbs: [\"get\", \"list\"]\n resources:\n - group: \"metrics.k8s.io\"\n # Don't log these read-only URLs.\n - level: None\n nonResourceURLs:\n - /healthz*\n - /version\n - /swagger*\n # Don't log events requests.\n - level: None\n resources:\n - group: \"\" # core\n resources: [\"events\"]\n # node and pod status calls from nodes are high-volume and can be large, don't log responses for expected updates from nodes\n - level: Request\n users: [\"kubelet\", \"system:node-problem-detector\", \"system:serviceaccount:kube-system:node-problem-detector\"]\n verbs: [\"update\",\"patch\"]\n resources:\n - group: \"\" # core\n resources: [\"nodes/status\", \"pods/status\"]\n omitStages:\n - \"RequestReceived\"\n - level: Request\n userGroups: [\"system:nodes\"]\n verbs: [\"update\",\"patch\"]\n resources:\n - group: \"\" # core\n resources: [\"nodes/status\", \"pods/status\"]\n omitStages:\n - \"RequestReceived\"\n # deletecollection calls can be large, don't log responses for expected namespace deletions\n - level: Request\n users: [\"system:serviceaccount:kube-system:namespace-controller\"]\n verbs: [\"deletecollection\"]\n omitStages:\n - \"RequestReceived\"\n # Secrets, ConfigMaps, and TokenReviews can contain sensitive \u0026 binary data,\n # so only log at the Metadata level.\n - level: Metadata\n resources:\n - group: \"\" # core\n resources: [\"secrets\", \"configmaps\"]\n - group: authentication.k8s.io\n resources: [\"tokenreviews\"]\n omitStages:\n - \"RequestReceived\"\n # Get repsonses can be large; skip them.\n - level: Request\n verbs: [\"get\", \"list\", \"watch\"]\n resources:\n - group: \"\" # core\n - group: \"admissionregistration.k8s.io\"\n - group: \"apiextensions.k8s.io\"\n - group: \"apiregistration.k8s.io\"\n - group: \"apps\"\n - group: \"authentication.k8s.io\"\n - group: \"authorization.k8s.io\"\n - group: \"autoscaling\"\n - group: \"batch\"\n - group: \"certificates.k8s.io\"\n - group: \"extensions\"\n - group: \"metrics.k8s.io\"\n - group: \"networking.k8s.io\"\n - group: \"policy\"\n - group: \"rbac.authorization.k8s.io\"\n - group: \"settings.k8s.io\"\n - group: \"storage.k8s.io\"\n omitStages:\n - \"RequestReceived\"\n # Default level for known APIs\n - level: RequestResponse\n resources:\n - group: \"\" # core\n - group: \"admissionregistration.k8s.io\"\n - group: \"apiextensions.k8s.io\"\n - group: \"apiregistration.k8s.io\"\n - group: \"apps\"\n - group: \"authentication.k8s.io\"\n - group: \"authorization.k8s.io\"\n - group: \"autoscaling\"\n - group: \"batch\"\n - group: \"certificates.k8s.io\"\n - group: \"extensions\"\n - group: \"metrics.k8s.io\"\n - group: \"networking.k8s.io\"\n - group: \"policy\"\n - group: \"rbac.authorization.k8s.io\"\n - group: \"settings.k8s.io\"\n - group: \"storage.k8s.io\"\n omitStages:\n - \"RequestReceived\"\n # Default level for all other requests.\n - level: Metadata\n omitStages:\n - \"RequestReceived\"\npack:\n palette:\n config:\n oidc:\n identityProvider: noauth",
"registry": {
"metadata": {
"uid": "_____place_holder_____",
"name": "Public Repo",
"kind": "pack",
"isPrivate": false
}
}
},
{
"name": "cni-custom",
"type": "spectro",
"layer": "cni",
"version": "0.1.0",
"tag": "0.1.0",
"values": "manifests:\n byo-cni:\n contents: |\n apiVersion: v1\n kind: ConfigMap\n metadata:\n name: custom-cni\n data:\n # property-like keys; each key maps to a simple value\n custom-cni: \"byo-cni\"",
"registry": {
"metadata": {
"uid": "_____place_holder_____",
"name": "Public Repo",
"kind": "pack",
"isPrivate": false
}
}
}
]
}
}
}
Loading

0 comments on commit 7e3f355

Please sign in to comment.