-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
proposal: Add an option that allows users to choose not to generate the root queue capacity based on the actual cluster resources. #3947
Comments
https://github.com/volcano-sh/volcano/blob/master/pkg/scheduler/plugins/capacity/capacity.go#L485 |
/area scheduling |
We could not assume that the parent's |
Sounds good ideas. |
So the key to the problem you want to express is that when combined with the cluster autoscaler, we cannot panic at that place. I agree with @kingeasternsun 's solution |
I think we may let the root queue's real-capability be the max of total-resource and total-deserved. |
From the user's perspective, they should not be aware too much of the code detail, so it's no necessary to add another param or something to add the burden to them, we should provide a best practice and make sure the code is robust enough to cover as many as possible unexpected cases, and also give trouble shooting methods to make users use queue features more easily, and I think we should add some notices to our queue related docs. @JesseStutler |
If so, we will get a wrong real-capability if total-deserved > total-resource, the real-capability should reflect the real resource view. |
I think as mentioned in #3929, just don’t panic when checking realCapability and Guarantee is enough. We shouldn’t delve into overly complex edge cases. I even thought that with autoscaler, we should consider scaling up the cluster resource rather than preempting/reclaiming. |
What is the problem you're trying to solve
The scheduler determines the root queue's real-capability by summing the available resources of all nodes, which may causes vc-scheduler panic here or beging failed here. It is difficult to ensure the real-capability of a cluster which enabled cluster-autoscaler always greater than total deserved and guarantee manually, and now there is not an effective method to manage queues that declared excessive resources automatically.
Describe the solution you'd like
I think we can change the starting point from root to root's children when we need calculating real-capability, and just let the children's real-capability be there capacity. That can lead to the decoupling of the queues' quota from physical resources, so that it will change the result from crash to pod pending when the physical resources less that total guarantee or total deserved.
Of cause we should make it configurable in order to provide users with two modes.
Additional context
related to #3929
The text was updated successfully, but these errors were encountered: