We've observed some weird behavior of Kamaji in our environment. We are not sure whether it's a configuration error on our side, but we wanted to share the results of our investigation to help others and/or help fix this behavior. We are using the latest edge version of Kamaji.
Creating a TCP with an arbitrary number of replicas (we tested 1 and 3) results in multiple adjustments to the Deployment, producing four revisions of the ReplicaSet. This triggers multiple redeployments during the startup phase of the TCP. We deployed a TCP with replicas set to 3 and had around 12 pods in different states (Running, Pending, Terminating) at the same time due to the rapid updates of the ReplicaSet. It's important to mention that these changes (GEN 1 > 4) happened within roughly 5 seconds. Below you can see the differences between each revision. Fields that differ on every generation anyway (creationTimestamp, selfLink, name, uid, the pod-template-hash labels) are not listed. We've also attached all ReplicaSet.yaml manifests captured at the given time, but as .txt because GitHub doesn't accept .yaml attachments. yikes.
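For reproduction context, this is roughly the shape of the TenantControlPlane we create. It's a minimal sketch: the field names are recalled from the v1alpha1 CRD, and the name/version values are placeholders, so double-check against your Kamaji release:

```yaml
apiVersion: kamaji.clastix.io/v1alpha1
kind: TenantControlPlane
metadata:
  name: tenant-sample          # placeholder name
spec:
  controlPlane:
    deployment:
      replicas: 3              # both 1 and 3 were tested; both show the same churn
    service:
      serviceType: LoadBalancer
  kubernetes:
    version: v1.25.2           # placeholder version
  addons:
    konnectivity: {}           # enabling Konnectivity is what drives the extra revisions
```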
TCP Changes
Main observations
Most changes come from the konnectivity-server container and its configuration on the kube-apiserver, which got removed and added back. The same applies to the kine-uds volume.
GEN 1 > 2: Added the konnectivity-server container with its configuration
GEN 2 > 3: Removed konnectivity-server + kine-uds and added them back
GEN 3 > 4: Added the kube-apiserver konnectivity configuration arg back
TCP (yaml) Download
TCP_1.txt
TCP_2.txt
TCP_3.txt
TCP_4.txt
TCP GEN 1 > 2 Differences
metadata.annotations.deployment.kubernetes.io/revision: '1' > '2'
spec.selector.matchLabels.pod-template-hash: 8f9dbbb45 > 6869d9d576
spec.template.metadata.labels.pod-template-hash: 8f9dbbb45 > 6869d9d576
spec.template.spec.volumes: [] > ['konnectivity-uds', 'egress-selector-configuration', 'konnectivity-server-kubeconfig'] (these volumes were added; the other, pre-existing volumes are unchanged)
spec.template.spec.containers[kube-apiserver].args: [] > '--egress-selector-config-file=/etc/kubernetes/konnectivity/configurations/egress-selector-configuration.yaml' (added; see the example after this list)
spec.template.spec.containers[kube-apiserver].volumeMounts: [] > ['konnectivity-uds', 'egress-selector-configuration'] (added)
spec.template.spec.containers[konnectivity-server]: konnectivity-server container wasn't in the 1st generation
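For reference, the egress-selector-configuration file referenced by that new kube-apiserver flag is a standard EgressSelectorConfiguration object pointing the API server at the Konnectivity server's UDS socket. The sketch below is illustrative only; the exact socket path and group version rendered by Kamaji may differ:

```yaml
apiVersion: apiserver.k8s.io/v1beta1
kind: EgressSelectorConfiguration
egressSelections:
  # Route "cluster" traffic (kubelet logs/exec/port-forward, webhooks to pods, ...)
  # through the Konnectivity server instead of dialing nodes directly.
  - name: cluster
    connection:
      proxyProtocol: GRPC
      transport:
        uds:
          # Assumed socket path, shared with the konnectivity-server container
          # through the konnectivity-uds volume seen in the diff above.
          udsName: /run/konnectivity/konnectivity-server.socket
```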
TCP GEN 2 > 3 Differences
metadata.annotations.deployment.kubernetes.io/revision: '2' > '3'
metadata.annotations.kube-apiserver.kamaji.clastix.io/args: '' > '6' (Added in GEN 3)
status.replicas: 0 > 3
status.fullyLabeledReplicas: "" > "3" (Added in GEN 3)
status.readyReplicas: "" > "3" (Added in GEN 3)
status.availableReplicas: "" > "3" (Added in GEN 3)
spec.replicas: 0 > 3
spec.selector.matchLabels.pod-template-hash: 6869d9d576 > 668c9c747d
spec.template.metadata.labels.pod-template-hash: 6869d9d576 > 668c9c747d
spec.template.spec.containers[kube-apiserver].args: '--egress-selector-config-file=/etc/kubernetes/konnectivity/configurations/egress-selector-configuration.yaml' > "" (Removed)
spec.template.spec.containers[kube-apiserver].volumeMounts[kine-uds]: removed and re-added as the last entry of the list within the same GEN
TCP GEN 3 > 4 Differences
metadata.annotations.deployment.kubernetes.io/revision: '3' > '4'
status.readyReplicas: 3 > "" (Removed)
status.availableReplicas: 3 > "" (Removed)
spec.template.spec.containers[kube-apiserver].args: [] > '--egress-selector-config-file=/etc/kubernetes/konnectivity/configurations/egress-selector-configuration.yaml' (added again, after being removed in GEN 3)
You correctly pointed out what the main topic here is, and it's Konnectivity: it requires some mangling of the API Server flags (the EgressSelectorConfiguration one), as well as the Konnectivity server flag --server-count, which needs to match the number of Control Plane replicas.
The latter is absolutely mandatory to ensure each Konnectivity agent on the nodes connects to all the available replicas: when users interact with the API Server for kubelet operations (such as logs, exec, etc.), the agent must already have an established connection with the API Server instance handling the request, and that value is used to check whether the connection has been established against all the available backends.
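To make that coupling concrete, here is an illustrative excerpt of how the konnectivity-server container is typically wired; the image tag, paths, and flag values are assumptions, not copied from Kamaji's templates:

```yaml
# Illustrative konnectivity-server container; exact flags/paths rendered by Kamaji may differ.
- name: konnectivity-server
  image: registry.k8s.io/kas-network-proxy/proxy-server:v0.1.6    # assumed tag
  args:
    - --uds-name=/run/konnectivity/konnectivity-server.socket     # shared with kube-apiserver via the UDS volume
    - --kubeconfig=/etc/kubernetes/konnectivity-server.kubeconfig # assumed path
    - --mode=grpc
    # --server-count must equal the number of Control Plane replicas, so scaling
    # changes the pod template and rolls out a new ReplicaSet.
    - --server-count=3
```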
If you try to scale a Tenant Control Plane with no Konnectivity enabled, you'll notice new replicas are added as expected with no extra rollout: it's mostly Konnectivity that requires these changes, which trigger a new ReplicaSet revision.
I had a discussion with maintainers of other Hosted Control Plane projects, and the idea was to engage with the Konnectivity maintainers to understand whether we can avoid these kinds of reloads upon Control Plane scaling: I never had the time, and the community didn't help.
Let me know if you want to know more, or if we can close the "issue" since it's not a bug per se.