Drift Exponential Wave-Based Rollout #1775

Open
njtran opened this issue Oct 24, 2024 · 3 comments
Labels
deprovisioning (Issues related to node deprovisioning), kind/feature (Categorizes issue or PR as related to a new feature)

Comments

@njtran
Contributor

njtran commented Oct 24, 2024

Description

What problem are you trying to solve?
When making changes to a NodePool, all nodes for that NodePool could be drifted, at which point Karpenter begins automated upgrades of all the nodes owned by the NodePool. Depending on how NodePools are architected in a cluster, this could be a large percentage of, if not all of, the nodes in your cluster.

Thankfully, there are ways to rate-limit the speed at which these upgrades happen, natively implemented in Karpenter:

  1. Karpenter waits for a drifted node's replacement to be ready and healthy before draining pods on the drifted node. This prioritizes application availability and guards against user errors in the node's bootstrapping logic impacting the cluster.
  2. NodePool Disruption Budgets limit the number of nodes that can be disrupted concurrently at any given time. If a user's nodes go unhealthy shortly after step 1, this limits the number of nodes that could be drifting, potentially halting progress.
  3. PDBs and do-not-disrupt annotations limit how quickly nodes can be drained, never violating the user-defined minimum level of application availability.
  4. Unless empty, drifted nodes are enqueued for disruption one at a time. Only once a node starts draining, after its replacement from step 1 is ready, do we consider another node for drift in parallel.

Yet, this doesn't solve all rollout cases. Users with quick drain times (fewer restrictions on pod evictions) may actually see rollouts happen too quickly, since Karpenter drains nodes as fast as it can, with no "bake time" between upgrades. This is particularly painful for node image issues that only present themselves after some period of time or some level of stress/load.

As such, I'm proposing that Drift be rolled out to a cluster in waves. The waves would be automatically computed based on the number of nodes in a NodePool/cluster, the number of nodes that are drifted, and sane defaults for the total time to roll out the cluster and the number of waves.

As an example, take 100 nodes in a NodePool. If I were to drift the nodes in that NodePool and wanted the rollout to take 24 hours, I could "leak in" nodes that can be considered drifted wave by wave. With an increasing factor of 2, we could drift all 100 nodes in 8 waves (1 -> 2 -> 4 -> ... -> 64 -> 100). Dividing the 24 hours into 8 three-hour intervals, only one node would be driftable in the 0-3h window, 64 nodes in the 18-21h window, and all remaining nodes in the final 21-24h window.
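A minimal sketch of how such a schedule could be computed from these inputs (the function and parameter names here are illustrative, not an existing Karpenter API):

```go
package main

import (
	"fmt"
	"time"
)

// waveSchedule returns how many nodes are allowed to be considered drifted
// in each wave: the allowance starts at 1, grows by `factor` per wave, and
// the final wave is capped at (and opened up to) totalNodes.
func waveSchedule(totalNodes, waves int, factor float64) []int {
	sizes := make([]int, waves)
	allowed := 1.0
	for i := 0; i < waves; i++ {
		n := int(allowed)
		if n > totalNodes || i == waves-1 {
			n = totalNodes
		}
		sizes[i] = n
		allowed *= factor
	}
	return sizes
}

func main() {
	totalNodes, waves := 100, 8
	rollout := 24 * time.Hour
	interval := rollout / time.Duration(waves) // 3h per wave in this example

	for i, n := range waveSchedule(totalNodes, waves, 2.0) {
		start := time.Duration(i) * interval
		fmt.Printf("wave %d (%s-%s): up to %d nodes driftable\n",
			i+1, start, start+interval, n)
	}
}
```

Running this with 100 nodes, 8 waves, and a factor of 2 produces the 1 -> 2 -> 4 -> ... -> 64 -> 100 schedule from the example above, with each wave spanning 3 hours.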

Configuration could be all of, or a subset of, the following (a hypothetical sketch of these knobs follows the list):

  • Total number of waves
  • Exponential rate of increase
  • Idle time in between waves
  • Total time to rollout
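Sketched as a hypothetical Go API block (none of these field names, types, or defaults exist in Karpenter today, and likely only a subset would be configurable at once):

```go
package v1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// DriftRollout is a hypothetical set of knobs for wave-based drift rollout.
// Field names, types, and defaults are illustrative only; this is not part
// of any existing Karpenter API.
type DriftRollout struct {
	// Waves is the total number of waves to spread drift across.
	Waves *int32 `json:"waves,omitempty"`
	// Factor is the exponential rate of increase from one wave to the next.
	Factor *int32 `json:"factor,omitempty"`
	// IdleTime is the bake time between consecutive waves.
	IdleTime *metav1.Duration `json:"idleTime,omitempty"`
	// TotalTime is the total time over which to roll out the NodePool.
	TotalTime *metav1.Duration `json:"totalTime,omitempty"`
}
```

Setting only a total time and a factor, for example, would let the number of waves and per-wave allowances be derived automatically, as in the schedule sketch above.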
  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments; they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
@njtran njtran added the kind/feature Categorizes issue or PR as related to a new feature. label Oct 24, 2024
@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Oct 24, 2024
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If Karpenter contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@njtran njtran added deprovisioning Issues related to node deprovisioning and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Oct 24, 2024
@jwcesign
Contributor

I think the start time/end time needs to be considered.

Like, doing the drift in the middle of the night is a better choice.

@njtran
Contributor Author

njtran commented Oct 28, 2024

I think the start time/end time needs to be considered.
Like, doing the drift in the middle of the night is a better choice.

I could see it either way. Some users might want to be present when upgrades are happening, and some might not. Either way, we'd need to orchestrate/integrate this with Budgets and with when the drift gets kicked off.
