GPU sharing corner case: vGPUs spread to two or more physical GPUs #98

Open
Fizzbb opened this issue Feb 18, 2022 · 0 comments


Fizzbb commented Feb 18, 2022

The main feature of release v0.3.0 is GPU sharing. To use this feature, we assume a user application requests only a fraction of a GPU. If users need one or more whole GPUs, we will direct them to other GPU resources (not alnair/vgpu-memory; to be implemented later).
Therefore, the application code assumes only one visible device and is written for a single-GPU configuration. Accordingly, the Alnair implementation guarantees that all vGPUs allocated to an application fall on one physical GPU.

For example, consider a server with two physical GPUs, each split into 10 vGPUs, for a total capacity of 20. A user can only request fewer than 10 vGPUs. The scheduler is responsible for filtering out nodes on which no single GPU has enough free vGPUs. The device plugin is responsible for picking a GPU card that has enough free vGPUs. On a given node, some GPUs may have enough free vGPUs while others do not; the device plugin selects one that does.
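
The division of work between the scheduler and the device plugin can be sketched as follows. This is only an illustration in Python, not Alnair's actual code: the function names `node_fits` and `pick_gpu`, the node names, and the free-vGPU counts are all hypothetical, and the first-fit choice in `pick_gpu` is an assumption, since the issue does not say which qualifying GPU the device plugin prefers.

```python
from typing import Dict, List, Optional


def node_fits(free_per_gpu: List[int], request: int) -> bool:
    """Scheduler-side filter (hypothetical): keep a node only if a single
    physical GPU on it can hold the whole request."""
    return any(free >= request for free in free_per_gpu)


def pick_gpu(free_per_gpu: List[int], request: int) -> Optional[int]:
    """Device-plugin-side choice (hypothetical, first-fit): return the index
    of the first GPU with enough free vGPUs, or None if there is none."""
    for index, free in enumerate(free_per_gpu):
        if free >= request:
            return index
    return None


# Made-up cluster state: free vGPUs per physical GPU on each node.
# Each GPU is split into 10 vGPUs, as in the example above.
nodes: Dict[str, List[int]] = {
    "node-a": [2, 7],
    "node-b": [3, 3],
}

request = 5
candidates = [name for name, free in nodes.items() if node_fits(free, request)]
chosen_gpu = pick_gpu(nodes["node-a"], request)
print(candidates, chosen_gpu)
```

Here `node-b` has 6 free vGPUs in total but is filtered out, because neither of its GPUs can hold 5 vGPUs on its own.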

However, this could lead to resource fragmentation. We will investigate algorithms and strategies to minimize this drawback, and evaluate the tradeoff between sharing and fragmentation later.
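
To make the fragmentation concern concrete, the sketch below checks whether a request is blocked even though the node has enough free vGPUs in total. The helper name `is_fragmented_for` and the numbers are hypothetical and only illustrate the single-GPU constraint; they do not come from the Alnair code base.

```python
from typing import List


def is_fragmented_for(free_per_gpu: List[int], request: int) -> bool:
    """Return True when the node has enough free vGPUs in total, but no
    single physical GPU can hold the request (hypothetical helper)."""
    enough_in_total = sum(free_per_gpu) >= request
    fits_on_one_gpu = any(free >= request for free in free_per_gpu)
    return enough_in_total and not fits_on_one_gpu


# Made-up state: two GPUs with 10 vGPUs each, 4 vGPUs free on each.
free_per_gpu = [4, 4]

blocked = is_fragmented_for(free_per_gpu, 6)      # 8 free in total, none usable for 6
not_blocked = is_fragmented_for(free_per_gpu, 4)  # fits on either GPU
print(blocked, not_blocked)
```

In this state, 8 of the node's 20 vGPUs are free, yet a request for 6 vGPUs cannot be placed on it.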
