
vGPU scheduler assumes every node has the GPU information annotation; it cannot handle CPU-only nodes or the period before the annotation is patched #125

Open
Fizzbb opened this issue Apr 7, 2022 · 1 comment

Comments

@Fizzbb
Collaborator

Fizzbb commented Apr 7, 2022

To reproduce: when a new node joins, all DaemonSet pods targeting that node get stuck in Pending, and the scheduler logs PostFilter plugin error messages.
This is caused by the Filter plugin rejecting the node because no GPU annotation is found on it.
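A minimal, hypothetical reduction of the failure mode, for illustration only. The plugin's real code is not shown in this issue, so the annotation key `example.com/gpu-info` and the `filterBeforeFix` function below are assumptions. The point is that the filter checks the node annotation for every pod, so a freshly joined node with no annotation rejects everything, including the DaemonSet pod that would eventually publish the GPU information.

```go
package main

import "fmt"

// Hypothetical annotation key; the project's actual key is not given in the issue.
const gpuAnnotationKey = "example.com/gpu-info"

// Node is a simplified stand-in for a Kubernetes node object.
type Node struct {
	Name        string
	Annotations map[string]string
}

// Pod is a simplified stand-in for a Kubernetes pod object.
type Pod struct {
	Name string
}

// filterBeforeFix mirrors the reported behaviour: every pod, GPU or not,
// is admitted only if the node already carries the GPU annotation.
func filterBeforeFix(pod Pod, node Node) bool {
	_, ok := node.Annotations[gpuAnnotationKey] // lookup on a nil map is safe and returns ok == false
	return ok
}

func main() {
	newNode := Node{Name: "node-new"} // annotation not patched yet
	ds := Pod{Name: "gpu-device-plugin"}
	if !filterBeforeFix(ds, newNode) {
		fmt.Printf("pod %s: node %s rejected (no GPU annotation found); pod stays Pending\n", ds.Name, newNode.Name)
	}
}
```

Because the device-plugin pod is itself rejected, the annotation is never written, and the node stays unschedulable for every pod.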

@YHDING23
Collaborator

YHDING23 commented Apr 7, 2022

When a new node joins the cluster, all DaemonSet pods for that node are scheduled by our customized scheduler. The old scheduler design needed to read node information (including the GPU count) for the new node before making any scheduling decision for these DaemonSet pods. However, the GPU information is only captured once the device-plugin DaemonSet pod is running on the node, so the scheduler was waiting on information that could never arrive.

The issue is fixed by adding a function, IsSharingGPU, that determines whether a pod shares a GPU. If the pod does not request any GPU, it is scheduled without relying on any node-level GPU information.
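A sketch of how the fix could look, assuming a pod "shares a GPU" when any container sets a positive limit on a shared-GPU resource. The resource name `example.com/vgpu`, the annotation key, the simplified `Pod`/`Node`/`Container` types, and the exact body of `IsSharingGPU` are all hypothetical; only the function name and the idea of bypassing node-level checks for non-GPU pods come from the comment above.

```go
package main

import (
	"fmt"
	"strconv"
)

// Hypothetical names; the project's real annotation key and resource name are not given.
const gpuAnnotationKey = "example.com/gpu-info"
const sharedGPUResource = "example.com/vgpu"

// Container is a simplified stand-in for a Kubernetes container spec.
type Container struct {
	Name   string
	Limits map[string]string
}

// Pod is a simplified stand-in for a Kubernetes pod object.
type Pod struct {
	Name       string
	Containers []Container
}

// Node is a simplified stand-in for a Kubernetes node object.
type Node struct {
	Name        string
	Annotations map[string]string
}

// IsSharingGPU reports whether any container requests a positive amount
// of the (hypothetical) shared-GPU resource.
func IsSharingGPU(pod Pod) bool {
	for _, c := range pod.Containers {
		if v, ok := c.Limits[sharedGPUResource]; ok {
			if n, err := strconv.Atoi(v); err == nil && n > 0 {
				return true
			}
		}
	}
	return false
}

// Filter admits non-GPU pods unconditionally and only consults the node's
// GPU annotation for pods that actually share a GPU.
func Filter(pod Pod, node Node) bool {
	if !IsSharingGPU(pod) {
		// CPU-only pods, e.g. the device-plugin DaemonSet, skip GPU checks.
		return true
	}
	_, ok := node.Annotations[gpuAnnotationKey]
	return ok
}

func main() {
	newNode := Node{Name: "node-new", Annotations: map[string]string{}}
	devicePlugin := Pod{Name: "device-plugin", Containers: []Container{{Name: "plugin"}}}
	gpuJob := Pod{Name: "train", Containers: []Container{{Name: "worker", Limits: map[string]string{sharedGPUResource: "1"}}}}
	fmt.Println(Filter(devicePlugin, newNode)) // true: scheduled without GPU info
	fmt.Println(Filter(gpuJob, newNode))       // false: waits until the annotation is patched
}
```

The design choice is to make the GPU check conditional rather than to pre-populate annotations: CPU-only nodes and the window before the device plugin reports both stop blocking pods that never needed GPU data.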
