AWSMachinePool does not drain nodes during scale-in #2023
Comments
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-contributor-experience at kubernetes/community.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with /close. Send feedback to sig-contributor-experience at kubernetes/community.
Chatting with @sedefsavas: sync with CAPZ on MachinePool v.Next. @kschumy, any ideas on what we should do here?
We can follow an approach similar to OpenShift's POC, which polls the termination endpoint. More on this is discussed in the cluster-api proposal: kubernetes-sigs/cluster-api#3528
Rotten issues close after 30d of inactivity. Send feedback to sig-contributor-experience at kubernetes/community.
@fejta-bot: Closing this issue.
/reopen
From office hours 2023-04-03:
/triage accepted
Also from office hours discussion: Users define Pod Disruption Budgets to ensure that their Pods are not voluntarily deleted. A scale-in of a MachinePool, if it uses the "providers refresh", will always proceed, even if it violates a budget. For comparison, a scale-in of a MachineDeployment will never proceed if it violates a budget.
This issue is labeled with You can:
For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/
/remove-triage accepted
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues. This bot triages un-triaged issues according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
/remove-lifecycle rotten
Is there any momentum around getting this implemented? We make extensive use of AWSMachinePools and need the ability for the Nodes to be drained to avoid disrupting hosted workloads.
/kind bug
What steps did you take and what happened:
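Create a MachinePool with replicas: 5 and create the associated AWSMachinePool resources. (Note: this AWSMachinePool is not managed by cluster-autoscaler.)
Run a workload protected by a PodDisruptionBudget with maxUnavailable: 1.
Update the MachinePool to replicas: 3.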
This caused the AWSMachineController to set the DesiredInstances in the ASG to 3 without draining nodes at all. The PDB was not honored, and the EC2 instances were terminated by the ASG immediately.
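To make the violation concrete, here is a minimal sketch of the arithmetic behind a maxUnavailable budget. It assumes, purely for illustration, that the protected workload runs one pod on each of the five nodes; the `Budget` class and both helper functions are hypothetical names and not part of any Kubernetes client.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    """A PodDisruptionBudget reduced to the fields this example needs."""
    expected_pods: int    # pods matched by the PDB selector
    healthy_pods: int     # matched pods that are currently Ready
    max_unavailable: int  # spec.maxUnavailable as an absolute number


def disruptions_allowed(b: Budget) -> int:
    """How many more pods may be evicted right now without breaking the budget."""
    desired_healthy = b.expected_pods - b.max_unavailable
    return max(0, b.healthy_pods - desired_healthy)


def can_evict(b: Budget, pods_to_evict: int) -> bool:
    """True if evicting this many pods at once stays within the budget."""
    return pods_to_evict <= disruptions_allowed(b)


# Assumed scenario: 5 healthy pods, one per node, maxUnavailable: 1.
budget = Budget(expected_pods=5, healthy_pods=5, max_unavailable=1)

# Scaling from 5 to 3 nodes removes two pods at once, which exceeds the budget.
two_at_once_ok = can_evict(budget, 2)
one_at_a_time_ok = can_evict(budget, 1)
```

Under these assumptions only one pod may be disrupted at a time, so terminating two instances simultaneously cannot honor the budget; a drain would have to evict one pod, wait for its replacement to become Ready, and only then continue.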
What did you expect to happen:
The nodes should have drained gracefully before the EC2 instances are terminated.
Anything else you would like to add:
In the current AWSMachinePool implementation, the instance selection for scale-in is performed at the AutoScalingGroup. This could be fixed in the non-cluster-autoscaler case by modifying the AWSMachinePool controller to perform node selection for scale-in, drain the selected nodes, and finally utilize the AWS TerminateInstanceInAutoScalingGroup action while setting the request value ShouldDecrementDesiredCapacity: true.
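A rough sketch of that flow, written in Python for brevity rather than in the controller's own language. The `scale_in` function, its selection policy (take the last surplus instances), and the `drain` callback are all hypothetical; the only real API assumed is the boto3 Auto Scaling client method `terminate_instance_in_auto_scaling_group`, which is modeled here as a protocol so the example runs without AWS access.

```python
from typing import Callable, List, Protocol, Sequence


class AutoScalingClient(Protocol):
    """The single boto3 Auto Scaling client method this sketch relies on."""

    def terminate_instance_in_auto_scaling_group(
        self, *, InstanceId: str, ShouldDecrementDesiredCapacity: bool
    ) -> dict: ...


def scale_in(
    asg: AutoScalingClient,
    instance_ids: Sequence[str],
    desired: int,
    drain: Callable[[str], bool],
) -> List[str]:
    """Drain and terminate surplus instances one at a time; return those removed."""
    surplus = max(0, len(instance_ids) - desired)
    removed: List[str] = []
    # Hypothetical selection policy: take the last `surplus` instances.
    for instance_id in list(instance_ids)[len(instance_ids) - surplus:]:
        if not drain(instance_id):
            # Drain did not complete (for example, blocked by a PDB):
            # stop here and let a later reconcile try again.
            break
        asg.terminate_instance_in_auto_scaling_group(
            InstanceId=instance_id,
            # Shrink the group with the instance so the ASG does not replace it.
            ShouldDecrementDesiredCapacity=True,
        )
        removed.append(instance_id)
    return removed
```

Because the controller picks the instance and decrements the desired capacity in the same call, the ASG never gets the chance to choose and terminate an undrained instance on its own. A real implementation would presumably pass `boto3.client("autoscaling")` as `asg` and a drain routine that evicts pods through the Kubernetes eviction API.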
We may also want to consider a lifecycle hook on the autoscaling group that prevents EC2 instance termination until the drain completes. This would help to prevent cases where instances are forcibly terminated without draining when the DesiredInstances value is manipulated via the EC2 console, CLI, or APIs.
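One possible shape for the lifecycle hook idea, again as a hedged Python sketch. The hook name, the timeout, and both function names are assumptions made for illustration; `put_lifecycle_hook` and `complete_lifecycle_action` are the boto3 Auto Scaling client calls the sketch expects, modeled as a protocol. How the controller learns that an instance has entered the Terminating:Wait state is left out.

```python
from typing import Callable, Protocol

HOOK_NAME = "drain-before-terminate"  # hypothetical hook name


class AutoScalingClient(Protocol):
    """The boto3 Auto Scaling client methods this sketch relies on."""

    def put_lifecycle_hook(self, **kwargs) -> dict: ...

    def complete_lifecycle_action(self, **kwargs) -> dict: ...


def ensure_drain_hook(
    asg: AutoScalingClient, group_name: str, timeout_seconds: int = 600
) -> None:
    """Register a hook that pauses instances in Terminating:Wait."""
    asg.put_lifecycle_hook(
        LifecycleHookName=HOOK_NAME,
        AutoScalingGroupName=group_name,
        LifecycleTransition="autoscaling:EC2_INSTANCE_TERMINATING",
        # Assumed upper bound on drain time; termination proceeds once it expires.
        HeartbeatTimeout=timeout_seconds,
        DefaultResult="CONTINUE",
    )


def on_terminating_wait(
    asg: AutoScalingClient,
    group_name: str,
    instance_id: str,
    drain: Callable[[str], bool],
) -> bool:
    """Drain the node, then release the instance so the ASG can terminate it."""
    drained = drain(instance_id)
    # For a terminating instance, CONTINUE lets the termination go ahead.
    asg.complete_lifecycle_action(
        LifecycleHookName=HOOK_NAME,
        AutoScalingGroupName=group_name,
        InstanceId=instance_id,
        LifecycleActionResult="CONTINUE",
    )
    return drained
```

With a hook like this in place, a change to the desired capacity made outside the controller would still leave a window of up to the heartbeat timeout for the drain to run. The hook delays termination but cannot cancel it, so a drain that is still blocked when the timeout expires would be cut short.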
Environment:
Cluster-api-provider-aws version: Commit: 3338cd4
Kubernetes version: (use kubectl version): v1.17.9
OS (e.g. from /etc/os-release): Amazon Linux 2