Add E2E automation, debug helpers #94

Merged on Nov 6, 2023.
24 commits
- 68be2e4 WIP: Add script to automate testing of two node cluster (oz123, Oct 23, 2023)
- 8d23a9a Add template for cluster profile (oz123, Oct 23, 2023)
- 2204b3c Remove separate function to build stylus framework image (oz123, Oct 23, 2023)
- f6437f6 WIP: automation of cluster launch (oz123, Oct 23, 2023)
- a79478e Move test script to test/ (oz123, Oct 23, 2023)
- 2a59408 Exract user data template (oz123, Oct 23, 2023)
- 8f704fd Add creation of all machines (oz123, Oct 23, 2023)
- 48f4ddd Refactor main (oz123, Oct 23, 2023)
- a945f41 Add missing required variable (oz123, Oct 23, 2023)
- 1a48681 wip: e2e vmware test (TylerGillson, Oct 24, 2023)
- 9dfb2be finish automating e2e provisioning (TylerGillson, Oct 25, 2023)
- 7210693 tidy whitespace & remove oz's changes to default user-data template (TylerGillson, Oct 25, 2023)
- 1876ece fixes & docs (TylerGillson, Oct 25, 2023)
- 8aab13b add isTwoNodeCandidate flag to cluster template (TylerGillson, Oct 25, 2023)
- c750268 fix: remove invalid node-status-update-frequency flag (TylerGillson, Oct 25, 2023)
- 5e1ea92 fix: ensure unique VM names in vSphere; docs (TylerGillson, Oct 25, 2023)
- 33544ae remove invalid arg from cluster profile template (TylerGillson, Oct 26, 2023)
- b9c9d92 add livenessSeconds & VIP config (TylerGillson, Oct 26, 2023)
- 45b89ba feat: add destroy_edge_hosts (TylerGillson, Oct 26, 2023)
- 862c46b install ping for two node (TylerGillson, Oct 26, 2023)
- bd01280 feat: configurable NIC_NAME (TylerGillson, Oct 27, 2023)
- bd04d7b fix test-two-node.sh (TylerGillson, Oct 27, 2023)
- b2d8e72 add debug scripts, cleanup funcs, remove two-node env hack (TylerGillson, Nov 6, 2023)
- 545df52 Merge branch 'two-node' into test-two-node-vmware (TylerGillson, Nov 6, 2023)
4 changes: 2 additions & 2 deletions .gitignore
@@ -8,5 +8,5 @@ config.yaml
 content-*/*
 *.arg
 .idea
-
-.DS_Store
+hack/*.img
+.DS_Store
16 changes: 16 additions & 0 deletions hack/Earthfile
@@ -0,0 +1,16 @@
VERSION 0.6

ARG OSBUILDER_VERSION=v0.7.11
ARG OSBUILDER_IMAGE=quay.io/kairos/osbuilder-tools:$OSBUILDER_VERSION
ARG ISO_NAME=debug

# replace with your CanvOS provider image
ARG PROVIDER_IMAGE=oci:tylergillson/ubuntu:k3s-1.26.4-v4.0.4-071c2c23

build:
    FROM $OSBUILDER_IMAGE
    WORKDIR /build
    COPY . ./

    RUN /entrypoint.sh --name $ISO_NAME --debug build-iso --squash-no-compression --date=false $PROVIDER_IMAGE --output /build/
    SAVE ARTIFACT /build/$ISO_NAME.iso kairos.iso AS LOCAL build/$ISO_NAME.iso
17 changes: 17 additions & 0 deletions hack/README.md
@@ -0,0 +1,17 @@
# Debugging Kairos

If you're facing hard-to-diagnose issues with your custom provider image, you can use the scripts in this directory to obtain verbose Kairos output.

## Steps
1. Use Earthly to generate an ISO from your CanvOS provider image:
   ```
   earthly +build --PROVIDER_IMAGE=<your_provider_image> # e.g., oci:tylergillson/ubuntu:k3s-1.26.4-v4.0.4-071c2c23
   ```
   If successful, `build/debug.iso` will be created.

2. Launch a local VM from the debug ISO using QEMU and pipe all output to a log file:
   ```
   ./launch-qemu.sh build/debug.iso | tee out.log
   ```

3. Once the VM boots, run `reboot` to return to the GRUB menu, select your desired entry, and press `e` to edit it. Append `rd.debug rd.immucore.debug` to the end of the entry's `linux` line, then press `CTRL+x` to boot with your edits. Verbose Kairos debug logs should appear on the console and, because of `tee`, are also saved to `out.log`.
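To check the outcome of step 3, a minimal shell sketch could look like the following. It is not part of this PR: `has_debug_args` and `immucore_lines` are hypothetical helper names, and the log filter assumes immucore's messages contain the string `immucore`. The first function tests a kernel command line (for example the contents of `/proc/cmdline` inside the VM) for both debug flags; the second filters the captured log on the host.

```shell
#!/bin/bash
# Hypothetical helpers (not part of this PR) for checking a Kairos debug boot.

# has_debug_args CMDLINE
# Succeeds only if CMDLINE contains both rd.debug and rd.immucore.debug
# as whole, space-separated tokens.
has_debug_args() {
    # Pad with spaces so flags at the start or end of the line still match.
    local cmdline=" $1 "
    [[ "$cmdline" == *" rd.debug "* && "$cmdline" == *" rd.immucore.debug "* ]]
}

# immucore_lines LOGFILE
# Prints every line of LOGFILE that mentions immucore (case-insensitive).
# Assumes immucore's messages contain the string "immucore".
immucore_lines() {
    grep -i 'immucore' "$1"
}
```

Inside the guest, `has_debug_args "$(cat /proc/cmdline)" && echo "debug flags active"` confirms the GRUB edit took effect; on the host, `immucore_lines out.log | less` narrows the verbose log to immucore output. Matching whole tokens avoids false positives from flags that merely share a prefix, such as `rd.debugx`.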
Empty file added hack/build/.keep
25 changes: 25 additions & 0 deletions hack/launch-qemu.sh
@@ -0,0 +1,25 @@
#!/bin/bash

# Screenshot capability:
# https://unix.stackexchange.com/a/476617

if [ ! -e disk.img ]; then
    qemu-img create -f qcow2 disk.img 60g
fi

# -nic bridge,br=br0,model=virtio-net-pci \
qemu-system-x86_64 \
    -enable-kvm \
    -cpu "${CPU:=host}" \
    -nographic \
    -spice port=9000,addr=127.0.0.1,disable-ticketing=yes \
    -m ${MEMORY:=10096} \
    -smp ${CORES:=5} \
    -monitor unix:/tmp/qemu-monitor.sock,server=on,wait=off \
    -serial mon:stdio \
    -rtc base=utc,clock=rt \
    -chardev socket,path=qga.sock,server=on,wait=off,id=qga0 \
    -device virtio-serial \
    -device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0 \
    -drive if=virtio,media=disk,file=disk.img \
    -drive if=ide,media=cdrom,file="${1}"
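launch-qemu.sh exposes a QEMU monitor on `/tmp/qemu-monitor.sock` and a guest-agent channel on `qga.sock`, and its header points to a screenshot technique. The sketch below is not part of this PR; it assumes `socat` is installed on the host and that the guest has a graphical console, and `monitor_cmd`, `screendump_cmd` and `qga_ping` are hypothetical names. It sends human monitor (HMP) commands such as `screendump` to the monitor socket and a `guest-ping` request to the guest agent.

```shell
#!/bin/bash
# Hypothetical helpers (not part of this PR) for the sockets created by
# launch-qemu.sh. Requires socat on the host.

# Paths match the -monitor and -chardev options in launch-qemu.sh; the
# guest-agent socket is relative to the directory the script was run from.
MONITOR_SOCK="${MONITOR_SOCK:-/tmp/qemu-monitor.sock}"
QGA_SOCK="${QGA_SOCK:-qga.sock}"

# monitor_cmd CMD
# Sends one HMP command, e.g. "info status", and prints the reply.
monitor_cmd() {
    printf '%s\n' "$1" | socat - "UNIX-CONNECT:${MONITOR_SOCK}"
}

# screendump_cmd [FILE]
# Builds the HMP command that saves the guest display as a PPM image.
# QEMU opens FILE itself, so a relative path resolves against QEMU's
# working directory, not the caller's.
screendump_cmd() {
    printf 'screendump %s' "${1:-screen.ppm}"
}

# qga_ping
# Sends a guest-ping request; a reply only arrives if qemu-guest-agent
# is running inside the guest.
qga_ping() {
    printf '%s\n' '{"execute":"guest-ping"}' | socat - "UNIX-CONNECT:${QGA_SOCK}"
}
```

For example, `monitor_cmd "$(screendump_cmd boot.ppm)"` captures the current screen, and `qga_ping` should print `{"return": {}}` once the guest agent is up. Keeping command construction separate from the socket I/O makes the monitor commands easy to test without a running VM.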
64 changes: 64 additions & 0 deletions test/templates/two-node-cluster-profile.json.tmpl
@@ -0,0 +1,64 @@
{
  "metadata": {
    "name": "_____place_holder_____",
    "description": "",
    "labels": {}
  },
  "spec": {
    "version": "1.0.0",
    "template": {
      "type": "infra",
      "cloudType": "edge-native",
      "packs": [
        {
          "name": "edge-native-byoi",
          "type": "spectro",
          "layer": "os",
          "version": "1.0.0",
          "tag": "1.0.0",
          "values": "pack:\n content:\n images:\n - image: \"{{.spectro.pack.edge-native-byoi.options.system.uri}}\"\n # Below config is default value, please uncomment if you want to modify default values\n #drain:\n #cordon: true\n #timeout: 60 # The length of time to wait before giving up, zero means infinite\n #gracePeriod: 60 # Period of time in seconds given to each pod to terminate gracefully. If negative, the default value specified in the pod will be used\n #ignoreDaemonSets: true\n #deleteLocalData: true # Continue even if there are pods using emptyDir (local data that will be deleted when the node is drained)\n #force: true # Continue even if there are pods that do not declare a controller\n #disableEviction: false # Force drain to use delete, even if eviction is supported. This will bypass checking PodDisruptionBudgets, use with caution\n #skipWaitForDeleteTimeout: 60 # If pod DeletionTimestamp older than N seconds, skip waiting for the pod. Seconds must be greater than 0 to skip.\nstylusPackage: container://OCI_REGISTRY/stylus-linux-amd64:v0.0.0-STYLUS_HASH\noptions:\n system.uri: \"OCI_REGISTRY/ubuntu:k3s-1.26.4-v4.0.4-STYLUS_HASH\"",
          "registry": {
            "metadata": {
              "uid": "_____place_holder_____",
              "name": "Public Repo",
              "kind": "pack",
              "isPrivate": false
            }
          }
        },
        {
          "name": "edge-k3s",
          "type": "spectro",
          "layer": "k8s",
          "version": "1.26.4",
          "tag": "1.26.4",
          "values": "cluster:\n config: |\n flannel-backend: host-gw\n disable-network-policy: false\n disable:\n - traefik\n - local-storage\n - servicelb\n - metrics-server\n\n # configure the pod cidr range\n cluster-cidr: \"192.170.0.0/16\"\n\n # configure service cidr range\n service-cidr: \"192.169.0.0/16\"\n\n # kubeconfig must be in run for the stylus operator to manage the cluster\n write-kubeconfig: /run/kubeconfig\n write-kubeconfig-mode: 600\n\n # additional component settings to harden installation\n kube-apiserver-arg:\n - anonymous-auth=true\n - profiling=false\n - disable-admission-plugins=AlwaysAdmit\n - default-not-ready-toleration-seconds=60\n - default-unreachable-toleration-seconds=60\n - enable-admission-plugins=AlwaysPullImages,NamespaceLifecycle,ServiceAccount,NodeRestriction\n - audit-log-path=/var/log/apiserver/audit.log\n - audit-policy-file=/etc/kubernetes/audit-policy.yaml\n - audit-log-maxage=30\n - audit-log-maxbackup=10\n - audit-log-maxsize=100\n - authorization-mode=RBAC,Node\n - tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_128_GCM_SHA256\n kube-controller-manager-arg:\n - profiling=false\n - terminated-pod-gc-threshold=25\n - use-service-account-credentials=true\n - feature-gates=RotateKubeletServerCertificate=true\n - node-monitor-period=5s\n - node-monitor-grace-period=20s\n - pod-eviction-timeout=20s\n kube-scheduler-arg:\n - profiling=false\n kubelet-arg:\n - read-only-port=0\n - event-qps=0\n - feature-gates=RotateKubeletServerCertificate=true\n - protect-kernel-defaults=true\n - tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_AES_128_GCM_SHA256\n - rotate-server-certificates=true\nstages:\n initramfs:\n - sysctl:\n vm.overcommit_memory: 1\n kernel.panic: 10\n kernel.panic_on_oops: 1\n kernel.printk: \"0 4 0 7\"\n - directories:\n - path: \"/var/log/apiserver\"\n permissions: 0644\n files:\n - path: /etc/hosts\n permission: \"0644\"\n content: |\n 127.0.0.1 localhost\n - path: \"/etc/kubernetes/audit-policy.yaml\"\n owner_string: \"root\"\n permission: 0600\n content: |\n apiVersion: audit.k8s.io/v1\n kind: Policy\n rules:\n - level: None\n users: [\"system:kube-proxy\"]\n verbs: [\"watch\"]\n resources:\n - group: \"\" # core\n resources: [\"endpoints\", \"services\", \"services/status\"]\n - level: None\n users: [\"system:unsecured\"]\n namespaces: [\"kube-system\"]\n verbs: [\"get\"]\n resources:\n - group: \"\" # core\n resources: [\"configmaps\"]\n - level: None\n users: [\"kubelet\"] # legacy kubelet identity\n verbs: [\"get\"]\n resources:\n - group: \"\" # core\n resources: [\"nodes\", \"nodes/status\"]\n - level: None\n userGroups: [\"system:nodes\"]\n verbs: [\"get\"]\n resources:\n - group: \"\" # core\n resources: [\"nodes\", \"nodes/status\"]\n - level: None\n users:\n - system:kube-controller-manager\n - system:kube-scheduler\n - system:serviceaccount:kube-system:endpoint-controller\n verbs: [\"get\", \"update\"]\n namespaces: [\"kube-system\"]\n resources:\n - group: \"\" # core\n resources: [\"endpoints\"]\n - level: None\n users: [\"system:apiserver\"]\n verbs: [\"get\"]\n resources:\n - group: \"\" # core\n resources: [\"namespaces\", \"namespaces/status\", \"namespaces/finalize\"]\n - level: None\n users: [\"cluster-autoscaler\"]\n verbs: [\"get\", \"update\"]\n namespaces: [\"kube-system\"]\n resources:\n - group: \"\" # core\n resources: [\"configmaps\", \"endpoints\"]\n # Don't log HPA fetching metrics.\n - level: None\n users:\n - system:kube-controller-manager\n verbs: [\"get\", \"list\"]\n resources:\n - group: \"metrics.k8s.io\"\n # Don't log these read-only URLs.\n - level: None\n nonResourceURLs:\n - /healthz*\n - /version\n - /swagger*\n # Don't log events requests.\n - level: None\n resources:\n - group: \"\" # core\n resources: [\"events\"]\n # node and pod status calls from nodes are high-volume and can be large, don't log responses for expected updates from nodes\n - level: Request\n users: [\"kubelet\", \"system:node-problem-detector\", \"system:serviceaccount:kube-system:node-problem-detector\"]\n verbs: [\"update\",\"patch\"]\n resources:\n - group: \"\" # core\n resources: [\"nodes/status\", \"pods/status\"]\n omitStages:\n - \"RequestReceived\"\n - level: Request\n userGroups: [\"system:nodes\"]\n verbs: [\"update\",\"patch\"]\n resources:\n - group: \"\" # core\n resources: [\"nodes/status\", \"pods/status\"]\n omitStages:\n - \"RequestReceived\"\n # deletecollection calls can be large, don't log responses for expected namespace deletions\n - level: Request\n users: [\"system:serviceaccount:kube-system:namespace-controller\"]\n verbs: [\"deletecollection\"]\n omitStages:\n - \"RequestReceived\"\n # Secrets, ConfigMaps, and TokenReviews can contain sensitive \u0026 binary data,\n # so only log at the Metadata level.\n - level: Metadata\n resources:\n - group: \"\" # core\n resources: [\"secrets\", \"configmaps\"]\n - group: authentication.k8s.io\n resources: [\"tokenreviews\"]\n omitStages:\n - \"RequestReceived\"\n # Get responses can be large; skip them.\n - level: Request\n verbs: [\"get\", \"list\", \"watch\"]\n resources:\n - group: \"\" # core\n - group: \"admissionregistration.k8s.io\"\n - group: \"apiextensions.k8s.io\"\n - group: \"apiregistration.k8s.io\"\n - group: \"apps\"\n - group: \"authentication.k8s.io\"\n - group: \"authorization.k8s.io\"\n - group: \"autoscaling\"\n - group: \"batch\"\n - group: \"certificates.k8s.io\"\n - group: \"extensions\"\n - group: \"metrics.k8s.io\"\n - group: \"networking.k8s.io\"\n - group: \"policy\"\n - group: \"rbac.authorization.k8s.io\"\n - group: \"settings.k8s.io\"\n - group: \"storage.k8s.io\"\n omitStages:\n - \"RequestReceived\"\n # Default level for known APIs\n - level: RequestResponse\n resources:\n - group: \"\" # core\n - group: \"admissionregistration.k8s.io\"\n - group: \"apiextensions.k8s.io\"\n - group: \"apiregistration.k8s.io\"\n - group: \"apps\"\n - group: \"authentication.k8s.io\"\n - group: \"authorization.k8s.io\"\n - group: \"autoscaling\"\n - group: \"batch\"\n - group: \"certificates.k8s.io\"\n - group: \"extensions\"\n - group: \"metrics.k8s.io\"\n - group: \"networking.k8s.io\"\n - group: \"policy\"\n - group: \"rbac.authorization.k8s.io\"\n - group: \"settings.k8s.io\"\n - group: \"storage.k8s.io\"\n omitStages:\n - \"RequestReceived\"\n # Default level for all other requests.\n - level: Metadata\n omitStages:\n - \"RequestReceived\"\npack:\n palette:\n config:\n oidc:\n identityProvider: noauth",
          "registry": {
            "metadata": {
              "uid": "_____place_holder_____",
              "name": "Public Repo",
              "kind": "pack",
              "isPrivate": false
            }
          }
        },
        {
          "name": "cni-custom",
          "type": "spectro",
          "layer": "cni",
          "version": "0.1.0",
          "tag": "0.1.0",
          "values": "manifests:\n byo-cni:\n contents: |\n apiVersion: v1\n kind: ConfigMap\n metadata:\n name: custom-cni\n data:\n # property-like keys; each key maps to a simple value\n custom-cni: \"byo-cni\"",
          "registry": {
            "metadata": {
              "uid": "_____place_holder_____",
              "name": "Public Repo",
              "kind": "pack",
              "isPrivate": false
            }
          }
        }
      ]
    }
  }
}
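The template keeps `OCI_REGISTRY` and `STYLUS_HASH` as literal placeholders in the `edge-native-byoi` values. The code that fills them in is not part of this excerpt; as a minimal sketch, assuming plain textual substitution is enough, a hypothetical `render_profile` function could do it with `sed`, leaving the `_____place_holder_____` name and uid fields for a later step.

```shell
#!/bin/bash
# Hypothetical renderer (not part of this PR) for the cluster profile template.

# render_profile OCI_REGISTRY STYLUS_HASH < TEMPLATE > OUTPUT
# Replaces every OCI_REGISTRY and STYLUS_HASH placeholder read from stdin.
# The "_____place_holder_____" name and uid fields are left untouched.
render_profile() {
    local registry="$1" stylus_hash="$2"
    if [ -z "$registry" ] || [ -z "$stylus_hash" ]; then
        echo "usage: render_profile <oci_registry> <stylus_hash> < template" >&2
        return 1
    fi
    # '|' is the sed delimiter so registry paths containing '/' are safe;
    # values containing '|', '&' or '\' would need escaping first.
    sed -e "s|OCI_REGISTRY|${registry}|g" -e "s|STYLUS_HASH|${stylus_hash}|g"
}
```

With illustrative values, `render_profile registry.example.com/edge abc1234 < test/templates/two-node-cluster-profile.json.tmpl > profile.json` turns `container://OCI_REGISTRY/stylus-linux-amd64:v0.0.0-STYLUS_HASH` into `container://registry.example.com/edge/stylus-linux-amd64:v0.0.0-abc1234`. A JSON-aware tool would be safer if the placeholders ever needed escaping, but `sed` keeps the sketch dependency-free.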