Improve cluster autoscaler e2e test #7965

abhinavmpandey08 · 2024-04-11T16:47:30Z

Description of changes:
Currently, the cluster autoscaler test creates a busybox deployment with an HPA to simulate load and trigger autoscaling.
The problem with this approach is that we expect the HPA to scale up the pods to the point where they become unschedulable, which should cause the autoscaler to scale up the worker nodes, but there's no guarantee that it will. The HPA can also take time to scale the pods.
So instead, we remove the uncertainty of the HPA and simply try to schedule 111 pods, which is over the limit of pods supported per Kubernetes node. This ensures that at least 1 pod is unschedulable, which causes the autoscaler to scale up the MD.

There is currently a bug in cluster autoscaler where it's unable to autoscale the cluster initially because of missing permissions on the infrastructure machine template. Cluster Autoscaler restarts after ~10 min, after which it starts functioning normally.
Instead of waiting ~10 min, we also force a restart so the e2e test doesn't have to wait.
This can be removed once the following issue is resolved upstream: kubernetes/autoscaler#6490

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

codecov · 2024-04-11T16:53:26Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 73.42%. Comparing base (618bd05) to head (f009c57).
Report is 1 commit behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #7965   +/-   ##
=======================================
  Coverage   73.42%   73.42%           
=======================================
  Files         577      577           
  Lines       35821    35821           
=======================================
  Hits        26302    26302           
  Misses       7854     7854           
  Partials     1665     1665

pokearu

/lgtm
/approve

eks-distro-bot · 2024-04-11T21:50:51Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pokearu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [pokearu]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

sp1999 · 2024-06-11T01:27:43Z

/cherry-pick release-0.19

eks-distro-pr-bot · 2024-06-11T01:28:18Z

@sp1999: #7965 failed to apply on top of branch "release-0.19":

Applying: Improve cluster autoscaler e2e test
Using index info to reconstruct a base tree...
M	test/framework/cluster.go
M	test/framework/testdata/hpa_busybox.yaml
Falling back to patching base and 3-way merge...
CONFLICT (modify/delete): test/framework/testdata/hpa_busybox.yaml deleted in Improve cluster autoscaler e2e test and modified in HEAD. Version HEAD of test/framework/testdata/hpa_busybox.yaml left in tree.
Auto-merging test/framework/cluster.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 Improve cluster autoscaler e2e test
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

In response to this:

/cherry-pick release-0.19

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Improve cluster autoscaler e2e test

f009c57

eks-distro-bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Apr 11, 2024

pokearu approved these changes Apr 11, 2024

eks-distro-bot assigned pokearu Apr 11, 2024

eks-distro-bot added the lgtm label Apr 11, 2024

eks-distro-bot added the approved label Apr 11, 2024

eks-distro-bot merged commit 3fe6cc7 into aws:main Apr 11, 2024
9 checks passed

abhinavmpandey08 deleted the autoscaler-e2e branch April 11, 2024 22:17

sp1999 mentioned this pull request Jun 11, 2024

[release-0.19] Improve cluster autoscaler e2e test #8281

Merged
