Multiple ReplicaSet revisions triggering redeployments #663

Open
JonnyBDev opened this issue Jan 11, 2025 · 1 comment

@JonnyBDev

We've observed some strange behavior of Kamaji in our environment. We are not sure whether it's a configuration error on our side, but we wanted to share the results of our investigation to help others and/or to get this behavior fixed. We are using the latest edge version of Kamaji.

Creating a TCP with any number of replicas (tested with 1 and 3) results in multiple adjustments to the Deployment, which produces four revisions of the ReplicaSet. This triggers multiple redeployments during the startup phase of the TCP. We deployed a TCP with replicas set to 3 and had around 12 pods in different states (Running, Pending, Terminating) at the same time due to the rapid ReplicaSet updates. It's important to mention that these changes (GEN 1 > 4) happened within roughly 5 seconds. Below you can see the differences between each revision. We have not highlighted every change (such as creationTimestamp, selfLink or labels), and changes that occur on every generation change (such as name, uid, pod-template-hash) are not called out separately. We've also attached all ReplicaSet.yaml manifests at the given times, but as .txt because GitHub doesn't support yaml as a file type.

TCP Changes

Main observations

Most of the changes come from the konnectivity-server and its configuration on the kube-apiserver, which got reverted and added back. The same applies to the kine-uds volume.

  • GEN 1 > 2: Added konnectivity-server with its configuration
  • GEN 2 > 3: Removed the konnectivity-server configuration + kine-uds and added them back
  • GEN 3 > 4: Added the konnectivity-server configuration arg back

TCP (yaml) Download

TCP_1.txt
TCP_2.txt
TCP_3.txt
TCP_4.txt

TCP GEN 1 > 2 Differences

metadata.annotations.deployment.kubernetes.io/revision: '1' > '2'
spec.selector.matchLabels.pod-template-hash: 8f9dbbb45 > 6869d9d576
spec.template.metadata.labels.pod-template-hash: 8f9dbbb45 > 6869d9d576
spec.template.spec.volumes: [] > ['konnectivity-uds', 'egress-selector-configuration', 'konnectivity-server-kubeconfig'] (these volumes were added; the other existing volumes were already present before)
spec.template.spec.containers[kube-apiserver].args: [] > '--egress-selector-config-file=/etc/kubernetes/konnectivity/configurations/egress-selector-configuration.yaml' (same as above)
spec.template.spec.containers[kube-apiserver].volumeMounts: [] > ['konnectivity-uds', 'egress-selector-configuration'] (same as above)
spec.template.spec.containers[konnectivity-server]: the konnectivity-server container was not present in the 1st generation
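
To make the shape of these additions concrete, the GEN 2 pod template roughly gains the pieces sketched below. This is reconstructed from the diff entries above and heavily trimmed; the konnectivity-server image, mount paths and volume sources are placeholders, not values taken from the attached manifests.

```yaml
# Trimmed sketch of the GEN 2 pod template, reconstructed from the diff above.
# Image, mount paths and volume sources are placeholders.
spec:
  template:
    spec:
      containers:
        - name: kube-apiserver
          args:
            # added in GEN 2, removed in GEN 3, re-added in GEN 4
            - --egress-selector-config-file=/etc/kubernetes/konnectivity/configurations/egress-selector-configuration.yaml
          volumeMounts:
            - name: konnectivity-uds
              mountPath: /run/konnectivity                             # placeholder path
            - name: egress-selector-configuration
              mountPath: /etc/kubernetes/konnectivity/configurations   # inferred from the arg above
        - name: konnectivity-server                                    # not present in GEN 1
          image: registry.k8s.io/kas-network-proxy/proxy-server:v0.1.6 # placeholder image
      volumes:
        - name: konnectivity-uds
          emptyDir: {}                                                 # placeholder volume source
        - name: egress-selector-configuration
          configMap:
            name: egress-selector-configuration                        # placeholder ConfigMap name
        - name: konnectivity-server-kubeconfig
          secret:
            secretName: konnectivity-server-kubeconfig                 # placeholder Secret name
```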

TCP GEN 2 > 3 Differences

metadata.annotations.deployment.kubernetes.io/revision: '2' > '3'
metadata.annotations.kube-apiserver.kamaji.clastix.io/args: '' > '6' (Added in GEN 3)
status.replicas: 0 > 3
status.fullyLabeledReplicas: "" > "3" (Added in GEN 3)
status.readyReplicas: "" > "3" (Added in GEN 3)
status.availableReplicas: "" > "3" (Added in GEN 3)
spec.replicas: 0 > 3
spec.selector.matchLabels.pod-template-hash: 6869d9d576 > 668c9c747d
spec.template.metadata.labels.pod-template-hash: 6869d9d576 > 668c9c747d
spec.template.spec.containers[kube-apiserver].args: '--egress-selector-config-file=/etc/kubernetes/konnectivity/configurations/egress-selector-configuration.yaml' > "" (Removed)
spec.template.spec.containers[kube-apiserver].volumeMounts[kine-uds] (removed and re-added as the last entry of the list within the same GEN)

TCP GEN 3 > 4 Differences

metadata.annotations.deployment.kubernetes.io/revision: '3' > '4'
status.readyReplicas: 3 > "" (Removed)
status.availableReplicas: 3 > "" (Removed)
spec.template.spec.containers[kube-apiserver].args: [] > '--egress-selector-config-file=/etc/kubernetes/konnectivity/configurations/egress-selector-configuration.yaml' (added again; it was removed in GEN 3)

@prometherion
Member

Thanks for opening the issue.

You correctly pointed out what the main topic here is, and it's konnectivity: it requires some mangling of the API Server flags (the EgressSelectorConfiguration one), as well as the Konnectivity server flag --server-count, which needs to match the number of Control Plane replicas.
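
For readers not familiar with that flag, the API Server side of this mangling is --egress-selector-config-file (visible in the diffs above), which points kube-apiserver at an EgressSelectorConfiguration. A minimal sketch of such a file, using the upstream format with a placeholder UDS socket path (not Kamaji's generated value), looks roughly like this:

```yaml
# Minimal EgressSelectorConfiguration sketch (upstream apiserver format);
# the UDS socket path is a placeholder, not Kamaji's generated value.
apiVersion: apiserver.k8s.io/v1beta1
kind: EgressSelectorConfiguration
egressSelections:
  - name: cluster
    connection:
      proxyProtocol: GRPC
      transport:
        uds:
          udsName: /run/konnectivity/konnectivity-server.socket   # placeholder path
```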

The latter is absolutely mandatory to ensure each Konnectivity agent on the nodes connects to all the available replicas: when users interact with the API Server for Kubelet operations (such as logs, exec, etc.), the agent must already have an established connection with the API Server instance handling the request, and the --server-count value is used to check whether connections have been established against all the available backends.

If you try to scale a Tenant Control Plane with no konnectivity, you'll notice new replicas are added as expected without a rollout: the rollouts you observed happen mostly because Konnectivity requires these changes, which trigger a new ReplicaSet revision.
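
The mechanics behind that are worth spelling out: spec.replicas sits outside the pod template, so scaling alone reuses the existing ReplicaSet, while --server-count lives inside the template and therefore changes the pod-template-hash on every scale. A schematic sketch (not an actual Kamaji manifest):

```yaml
# Why scaling rolls the Deployment only when konnectivity is enabled (schematic).
spec:
  replicas: 3                  # changing this alone does NOT create a new ReplicaSet
  template:                    # everything below is hashed into pod-template-hash
    spec:
      containers:
        - name: konnectivity-server
          args:
            - --server-count=3   # must track spec.replicas, so every scale edits the
                                 # template and produces a new ReplicaSet revision
```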

I had a discussion with maintainers of other Hosted Control Plane projects, and the idea was to engage with the Konnectivity maintainers to understand whether we can avoid these kinds of reloads upon Control Plane scaling: I never had the time, and the community didn't help.

Let me know if you want to know more, or if we can close the "issue" since it's not a bug per se.
