Improve cluster autoscaler e2e test #7965

Merged: 1 commit, Apr 11, 2024

abhinavmpandey08
Member

Description of changes:
Currently, the cluster autoscaler test creates a busybox deployment with an HPA to simulate load and trigger autoscaling.
The problem with this approach is that we expect the HPA to scale up the pods to the point where some of them are unschedulable, which should cause the autoscaler to scale up the worker nodes, but there's no guarantee that it will. Plus, the HPA can take time to scale the pods.
So instead, we remove the uncertainty of the HPA and just try to schedule 111 pods, which is over the number of pods supported per Kubernetes node (110 by default). This ensures that at least 1 pod is unschedulable, which causes the autoscaler to scale up the MachineDeployment (MD).
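
To illustrate the arithmetic behind the 111-pod choice, here is a minimal sketch, not the code in this PR. It assumes the kubelet default of 110 pods per node is not overridden; the function name `unschedulablePods` and its parameters are hypothetical.

```go
package main

import "fmt"

// defaultMaxPodsPerNode is the kubelet default for max pods per node.
// Assumption: the test cluster does not override this value.
const defaultMaxPodsPerNode = 110

// unschedulablePods (hypothetical helper) returns how many of the requested
// pods cannot be placed, given the number of worker nodes and the number of
// pods already running on them. Only the per-node pod count limit is
// considered; CPU and memory requests are ignored in this sketch.
func unschedulablePods(requested, nodes, existing int) int {
	capacity := nodes*defaultMaxPodsPerNode - existing
	if capacity < 0 {
		capacity = 0
	}
	if requested <= capacity {
		return 0
	}
	return requested - capacity
}

func main() {
	// With a single empty worker node, 111 pods leave 1 pod pending.
	fmt.Println(unschedulablePods(111, 1, 0)) // prints 1
}
```

Any pods already running on the node only increase the number of pending pods, so a request of 111 should always leave at least one pod unschedulable on a single node with the default limit.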

There is currently a bug in cluster autoscaler where it cannot autoscale the cluster initially because of missing permissions on the infrastructure machine template. Cluster autoscaler restarts after ~10 min, after which it functions normally.
Instead of waiting ~10 min, we also force a restart so the e2e test doesn't have to wait.
This can be removed once the following issue is resolved upstream: kubernetes/autoscaler#6490
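
For reference, one way to force such a restart is `kubectl rollout restart`. The sketch below only builds the argument list and is not the code in this PR. The helper name `rolloutRestartArgs`, the kubeconfig path, the namespace, and the deployment name are all hypothetical placeholders.

```go
package main

import (
	"fmt"
	"strings"
)

// rolloutRestartArgs (hypothetical helper) builds the kubectl arguments that
// trigger a rolling restart of a deployment. The caller supplies the
// kubeconfig path, namespace, and deployment name.
func rolloutRestartArgs(kubeconfig, namespace, deployment string) []string {
	return []string{
		"--kubeconfig", kubeconfig,
		"rollout", "restart",
		"deployment/" + deployment,
		"--namespace", namespace,
	}
}

func main() {
	// All three values below are placeholders, not names taken from this PR.
	args := rolloutRestartArgs("mgmt.kubeconfig", "default", "cluster-autoscaler")
	fmt.Println("kubectl " + strings.Join(args, " "))
}
```

The resulting slice could be passed to `exec.Command("kubectl", args...)` from `os/exec`, or to whatever kubectl wrapper the test framework already provides.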

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@eks-distro-bot eks-distro-bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Apr 11, 2024

codecov bot commented Apr 11, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 73.42%. Comparing base (618bd05) to head (f009c57).
Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #7965   +/-   ##
=======================================
  Coverage   73.42%   73.42%           
=======================================
  Files         577      577           
  Lines       35821    35821           
=======================================
  Hits        26302    26302           
  Misses       7854     7854           
  Partials     1665     1665           


Member

@pokearu pokearu left a comment

/lgtm
/approve

@eks-distro-bot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pokearu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@eks-distro-bot eks-distro-bot merged commit 3fe6cc7 into aws:main Apr 11, 2024
9 checks passed
@abhinavmpandey08 abhinavmpandey08 deleted the autoscaler-e2e branch April 11, 2024 22:17
@sp1999
Member

sp1999 commented Jun 11, 2024

/cherry-pick release-0.19

@eks-distro-pr-bot
Contributor

@sp1999: #7965 failed to apply on top of branch "release-0.19":

Applying: Improve cluster autoscaler e2e test
Using index info to reconstruct a base tree...
M	test/framework/cluster.go
M	test/framework/testdata/hpa_busybox.yaml
Falling back to patching base and 3-way merge...
CONFLICT (modify/delete): test/framework/testdata/hpa_busybox.yaml deleted in Improve cluster autoscaler e2e test and modified in HEAD. Version HEAD of test/framework/testdata/hpa_busybox.yaml left in tree.
Auto-merging test/framework/cluster.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 Improve cluster autoscaler e2e test
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

In response to this:

/cherry-pick release-0.19

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Labels
approved lgtm size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
5 participants