KEP-4951: Configurable tolerance for HPA #4954

jm-franc · 2024-11-06T22:12:32Z

Add KEP-4951: Configurable tolerance for HPA

Issue link: #4951

k8s-ci-robot · 2024-11-06T22:12:42Z

Welcome @jm-franc!

It looks like this is your first PR to kubernetes/enhancements 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/enhancements has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

k8s-ci-robot · 2024-11-06T22:12:42Z

Hi @jm-franc. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

raywainman

This is great! Looking forward to tackling a longstanding user request! :)

keps/sig-autoscaling/4951-configurable-hpa-tolerance/kep.yaml

keps/sig-autoscaling/4951-configurable-hpa-tolerance/README.md

pr00se

Excellent work @jm-franc, thank you!

I left some suggestions in the comments, feel free to use them or tweak them further. I skipped the sections that are needed for Beta, since we're still a ways off (even the Alpha ones may be premature).

I think we can merge the end result as-is, and then work with the sig to get it to the implementable state.

keps/sig-autoscaling/4951-configurable-hpa-tolerance/kep.yaml

keps/sig-autoscaling/4951-configurable-hpa-tolerance/README.md

jm-franc · 2024-11-13T22:42:40Z

> Excellent work @jm-franc, thank you!
>
> I left some suggestions in the comments, feel free to use them or tweak them further. I skipped the sections that are needed for Beta, since we're still a ways off (even the Alpha ones may be premature).
>
> I think we can merge the end result as-is, and then work with the sig to get it to the implementable state.

Thank you for all those suggestions Patryk, I've merged all of them. I'm now sending this KEP for review.

jm-franc · 2024-11-18T17:54:07Z

/assign @gjtempleton

Assigning to Guy for review/approval.

raywainman

/lgtm

raywainman · 2024-11-25T17:59:23Z

keps/sig-autoscaling/4951-configurable-hpa-tolerance/README.md

+
+- [ ] Events
+  - Event Reason:
+- [ ] API .status


Small ask - I wonder if we can emit a message in the HPA status when the tolerance level is preventing a scale up or scale down?

(looking at the code, it might be tricky to add but wonder what your thoughts are and whether it would be useful)

jm-franc

I looked it up, but I can't find any straightforward way to do this.

We don't want to emit a message whenever the HPA is simply within tolerance (that's almost always the case), but we could emit it only when the computed number of replicas differs from the current one.

The message would still trigger frequently for metrics valued just at the boundary between 2 replicas.

It's not clear what to do when some metrics are in tolerance while others aren't (and even if some are in-tolerance, it doesn't mean that a smaller tolerance would change the final recommendation).

This doesn't look unsolvable: we could for example compute the recommendation both normally and with tolerances set to 0, then warn users if the results are different. (But this would be a large refactoring).

If I'm missing a simpler solution, I would be all in favour of this. If it's as complicated as it currently looks to me, I'd keep it out of this KEP, as it is only tangentially related to the problem addressed here.

raywainman

Completely reasonable. Thanks for checking.
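
The approach floated in this thread (compute the recommendation once with the configured tolerance and once with tolerance set to 0, then compare) can be sketched roughly as below. This is a simplified illustration, not the HPA controller's code: the type `metricReading` and the functions `desiredReplicas` and `toleranceChangedOutcome` are hypothetical names, and the sketch assumes only the rule described in the Kubernetes HPA documentation (desired replicas = ceil(current replicas × current value / target value), no change when the ratio is within tolerance of 1.0, and the largest proposal across metrics wins). It ignores missing metrics, unready pods, stabilization windows, and scaling behaviors.

```go
package main

import (
	"fmt"
	"math"
)

// metricReading is a hypothetical pairing of one metric's current value
// with its configured target value.
type metricReading struct {
	current float64
	target  float64
}

// desiredReplicas returns the largest per-metric proposal. A metric whose
// current/target ratio is within `tolerance` of 1.0 proposes no change.
func desiredReplicas(currentReplicas int32, metrics []metricReading, tolerance float64) int32 {
	if len(metrics) == 0 {
		return currentReplicas
	}
	var best int32
	for _, m := range metrics {
		ratio := m.current / m.target
		proposal := currentReplicas
		if math.Abs(ratio-1.0) > tolerance {
			proposal = int32(math.Ceil(ratio * float64(currentReplicas)))
		}
		if proposal > best {
			best = proposal
		}
	}
	return best
}

// toleranceChangedOutcome reports whether the final recommendation would
// differ if the tolerance were 0, i.e. whether tolerance is what is
// holding back a scale-up or scale-down.
func toleranceChangedOutcome(currentReplicas int32, metrics []metricReading, tolerance float64) bool {
	withTolerance := desiredReplicas(currentReplicas, metrics, tolerance)
	withoutTolerance := desiredReplicas(currentReplicas, metrics, 0)
	return withTolerance != withoutTolerance
}

func main() {
	// One metric at 5% over target: tolerance 0.1 suppresses a scale-up.
	single := []metricReading{{current: 105, target: 100}}
	fmt.Println(toleranceChangedOutcome(10, single, 0.1)) // true

	// Two metrics: the first is within tolerance, but the second already
	// drives the recommendation, so a smaller tolerance changes nothing.
	mixed := []metricReading{{current: 105, target: 100}, {current: 150, target: 100}}
	fmt.Println(toleranceChangedOutcome(10, mixed, 0.1)) // false
}
```

The second case in `main` mirrors the concern raised above: a metric being within tolerance does not by itself mean that a smaller tolerance would change the final recommendation, which is why the comparison is made on the final result rather than per metric.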

k8s-ci-robot · 2024-11-25T19:33:42Z

New changes are detected. LGTM label has been removed.

k8s-ci-robot · 2024-11-25T19:33:48Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: jm-franc
Once this PR has been reviewed and has the lgtm label, please ask for approval from gjtempleton. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

keps/sig-autoscaling/OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

jm-franc · 2024-11-25T19:37:46Z

/retest

k8s-ci-robot · 2024-11-25T19:38:01Z

@jm-franc: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

raywainman · 2024-11-25T19:41:47Z

/ok-to-test

jm-franc · 2024-11-25T19:52:56Z

/retest

jm-franc added 3 commits November 6, 2024 01:30

Add KEP on Configurable tolerance for HPA.

b01f5d8

Add details about upgrade/downgrades, feature gate.

32bc885

Set title.

8f9396f

k8s-ci-robot added the do-not-merge/work-in-progress (indicates that a PR should not merge because it is a work in progress) and cncf-cla: yes (indicates the PR's author has signed the CNCF CLA) labels Nov 6, 2024

k8s-ci-robot requested review from gjtempleton and MaciekPytel November 6, 2024 22:12

k8s-ci-robot added the kind/kep (categorizes KEP tracking issues and PRs modifying the KEP directory) label Nov 6, 2024

k8s-ci-robot added the needs-ok-to-test (indicates a PR that requires an org member to verify it is safe to test), sig/autoscaling (categorizes an issue or PR as relevant to SIG Autoscaling), and size/XL (denotes a PR that changes 500-999 lines, ignoring generated files) labels Nov 6, 2024

raywainman reviewed Nov 7, 2024

jm-franc mentioned this pull request Nov 8, 2024

Configurable tolerance for Horizontal Pod Autoscalers #4951

Open

4 tasks

pr00se reviewed Nov 12, 2024

jm-franc and others added 8 commits November 13, 2024 15:46

Update keps/sig-autoscaling/4951-configurable-hpa-tolerance/kep.yaml

da1052d

Co-authored-by: Patryk Prus <[email protected]>

Update keps/sig-autoscaling/4951-configurable-hpa-tolerance/kep.yaml

49c8b2f

Co-authored-by: Patryk Prus <[email protected]>

Update keps/sig-autoscaling/4951-configurable-hpa-tolerance/kep.yaml

71ed0a6

Co-authored-by: Patryk Prus <[email protected]>

Update keps/sig-autoscaling/4951-configurable-hpa-tolerance/README.md

2fa9137

Add a scaling scenario. Co-authored-by: Patryk Prus <[email protected]>

Update keps/sig-autoscaling/4951-configurable-hpa-tolerance/README.md

d3b8b5c

Co-authored-by: Patryk Prus <[email protected]>

Update keps/sig-autoscaling/4951-configurable-hpa-tolerance/README.md

3d48ef6

Specify that this feature won't increase resource usage. Co-authored-by: Patryk Prus <[email protected]>

Update keps/sig-autoscaling/4951-configurable-hpa-tolerance/README.md

d99d1cf

Add version history. Co-authored-by: Patryk Prus <[email protected]>

Update keps/sig-autoscaling/4951-configurable-hpa-tolerance/README.md

ecc3d81

Specify that this feature won't impact SLIs/SLOs. Co-authored-by: Patryk Prus <[email protected]>

k8s-ci-robot added the do-not-merge/invalid-commit-message (indicates that a PR should not merge because it has an invalid commit message) label Nov 13, 2024

jm-franc marked this pull request as ready for review November 13, 2024 22:43

k8s-ci-robot removed the do-not-merge/work-in-progress (indicates that a PR should not merge because it is a work in progress) label Nov 13, 2024

ryanzhang-oss and others added 2 commits November 13, 2024 17:52

update the KEP with cluster access mechanism

dc19885

DRA: update unit test coverage for 1.32

99aa891

This is close to what the code coverage will be in 1.32.

jm-franc and others added 7 commits November 13, 2024 17:52

Update keps/sig-autoscaling/4951-configurable-hpa-tolerance/README.md

9fc2d07

Co-authored-by: Patryk Prus <[email protected]>

Improve description of the 'tolerance' field.

4058fff

Co-authored-by: Patryk Prus <[email protected]>

Update keps/sig-autoscaling/4951-configurable-hpa-tolerance/README.md

b8d7381

Co-authored-by: Patryk Prus <[email protected]>

Update keps/sig-autoscaling/4951-configurable-hpa-tolerance/README.md

566e097

Co-authored-by: Patryk Prus <[email protected]>

Improve the section dedicated to risks.

23c4ff1

Co-authored-by: Patryk Prus <[email protected]>

Apply suggestions from code review.

df5b1b9

Co-authored-by: Patryk Prus <[email protected]>

Fix wording.

0f848b4

Co-authored-by: Patryk Prus <[email protected]>

jm-franc force-pushed the kep-hpa-tolerance branch from 53c6e1f to 0f848b4 Compare November 13, 2024 22:52

jm-franc requested a review from raywainman November 13, 2024 22:54

Merge branch 'kubernetes:master' into kep-hpa-tolerance

e5addf6

k8s-ci-robot assigned gjtempleton Nov 18, 2024

raywainman reviewed Nov 25, 2024

k8s-ci-robot assigned raywainman Nov 25, 2024

k8s-ci-robot added the lgtm ("Looks good to me", indicates that a PR is ready to be merged) label Nov 25, 2024

Update TOC.

bd0b164

k8s-ci-robot removed the lgtm ("Looks good to me", indicates that a PR is ready to be merged) label Nov 25, 2024

k8s-ci-robot added the ok-to-test (indicates a non-member PR verified by an org member that is safe to test) label and removed the needs-ok-to-test (indicates a PR that requires an org member to verify it is safe to test) label Nov 25, 2024
