Cluster autoscaling for AKS: Request Rate Throttling has been detected for your Cluster #1432
Replies: 5 comments
-
Thanks for the report, but can you explain a bit further, please? This sounds more like an AKS problem than a KEDA one.
-
You haven't given much information about your deployment config, but here are some points to consider:
Resource limits can help prevent your pods from overusing resources, but Kubernetes won't use them to schedule your pods; the scheduler places pods based on their resource requests.
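A deliberately simplified Python model of that resource-fit check shows the effect. It is not the real kube-scheduler: it tracks only CPU in millicores, places pods first-fit, and the node size and per-pod request are hypothetical values. A pod with no request counts as 0, so on a one-node cluster all 100 replicas "fit", nothing goes Pending, and the cluster autoscaler, which adds nodes in response to unschedulable pods, never gets a signal to scale out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    allocatable_millicores: int
    requested_millicores: int = 0


def schedule(pod_requests: List[Optional[int]], nodes: List[Node]) -> int:
    """Place pods first-fit by CPU request; return how many stay Pending.

    Simplified model (not the real scheduler): a pod with no request (None)
    is treated as requesting 0 millicores, so it always passes the fit check.
    """
    pending = 0
    for request in pod_requests:
        cpu = request or 0  # missing request counts as 0 for the fit check
        for node in nodes:
            if node.requested_millicores + cpu <= node.allocatable_millicores:
                node.requested_millicores += cpu
                break
        else:
            # No node has room: an unschedulable pod is what the
            # cluster autoscaler reacts to by adding a node.
            pending += 1
    return pending


# Hypothetical single node with 2 CPUs allocatable, 100 replicas.
no_requests = schedule([None] * 100, [Node("node-1", 2000)])
with_requests = schedule([250] * 100, [Node("node-1", 2000)])
print(no_requests, with_requests)  # 0 92
```

With a 250m request per replica only 8 pods fit on the single node, and the 92 Pending pods are exactly the signal that lets the cluster grow.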
-
The deployment generated by func kubernetes deploy does not contain resource requests. I did indeed have to add them to keep the node from becoming NotReady. But my point was about AKS cluster autoscaling: it is so slow, and KEDA creates so many replicas (100), that I run into throttling issues with Azure. So it's indeed not a bug in KEDA, but a bit more guidance would be nice, so that the cluster doesn't 'collapse' on my first attempt to use KEDA.
-
Thanks for letting us know. Would you mind opening an issue on http://github.com/azure/azure-functions-core-tools for the …
I've opened #1450 and kedacore/charts#112 to provide these resources out of the box as guidance.
-
@kwaazaar, you could also configure the scaling behaviour to prevent too many pods from being created in a short amount of time. See support-for-configurable-scaling-behavior, which can be configured in the …
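To see what a scale-up policy buys you, here is a rough discrete-time sketch of a Pods-type policy (at most `value` pods added per `periodSeconds`). The function and its parameters are hypothetical; the real HPA re-evaluates every sync interval, tracks scale events over a sliding window, and can combine several policies, none of which is modeled here. Scale-down is also out of scope.

```python
from typing import List


def scale_up_timeline(current: int, desired: int,
                      max_pods_per_period: int, periods: int) -> List[int]:
    """Replica count at the end of each period under a Pods-type limit.

    Hypothetical, simplified model: each period, at most
    max_pods_per_period replicas are added, never going past `desired`.
    Scale-down is not modeled, so the count never decreases.
    """
    timeline = []
    for _ in range(periods):
        current = max(current, min(desired, current + max_pods_per_period))
        timeline.append(current)
    return timeline


# A burst asks for 100 replicas; a policy of 10 pods per period ramps up gradually.
print(scale_up_timeline(1, 100, 10, 5))  # [11, 21, 31, 41, 51]
```

The point of the design is that a slow ramp gives the cluster autoscaler time to add nodes in smaller steps instead of reacting to 99 Pending pods at once.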
-
Out of the box, on my first attempt to investigate KEDA, I ran into scaling issues with my cluster.
Expected Behavior
Cluster autoscaling should keep working. It's the only reason running functions on AKS (or other autoscaling clusters) makes sense.
Actual Behavior
KEDA scales to 100 replicas. My cluster needs to autoscale (from 1 node up to its maximum of 10) to provide the capacity to run these replicas. This all works, but after a few of these 'bursts' (the queue periodically fills up quickly), I get these throttling errors on the cluster/VMSS and it stops scaling.
Maybe the solution would be not to use 100 replicas as the default maximum; 10 might make more sense.
It might also help to advise using resource limits in the function's deployment, because by default all the replicas were scheduled on a single node, which caused my whole cluster to 'stumble'.
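Roughly why a queue burst goes straight to the cap: for a per-replica queue target, the HPA's desired count is about the queue length divided by the target, rounded up, and then clamped to the min/max replica counts (KEDA's defaults are minReplicaCount 0 and maxReplicaCount 100). This is a hypothetical sketch that ignores the HPA's tolerance and stabilization window, and the queue length and target below are made-up numbers.

```python
import math


def desired_replicas(queue_length: int, target_per_replica: int,
                     min_replicas: int = 0, max_replicas: int = 100) -> int:
    """Approximate replica count for a queue-length target.

    Hypothetical simplification: ceil(queue_length / target_per_replica),
    clamped to [min_replicas, max_replicas]; HPA tolerance is ignored.
    """
    raw = math.ceil(queue_length / target_per_replica)
    return max(min_replicas, min(max_replicas, raw))


# A hypothetical burst of 5000 messages with a target of 5 per replica.
print(desired_replicas(5000, 5))                   # 100 (capped at the default max)
print(desired_replicas(5000, 5, max_replicas=10))  # 10
```

Lowering the maximum bounds how many nodes a single burst can demand, which is the trade-off suggested above.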
Steps to Reproduce the Problem
See above.
Logs from KEDA operator
No errors. The problem is in AKS, but it is indirectly caused by KEDA's behavior.
Specifications