GPU sharing corner case: vGPUs spread to two or more physical GPUs #98

Fizzbb · 2022-02-18T16:43:55Z

In release v0.3.0, the main feature is GPU sharing. To use this feature, we assume a user application requests only a fraction of a GPU. If users need one or more whole GPUs, we will direct them to other GPU resources (not alnair/vgpu-memory; to be implemented later).
Therefore, the application code is assumed to see only one visible device and to be written for a single-GPU configuration. At the same time, the Alnair implementation guarantees that all vGPUs allocated to an application fall on one physical GPU.

For example, consider a server with two physical GPUs, each split into 10 vGPUs, for a total capacity of 20. A user can only request fewer than 10 vGPUs. The scheduler is responsible for filtering out nodes on which no single GPU has enough free vGPUs. The device plugin is responsible for picking a GPU card that has enough vGPUs. On a node, it is possible that some GPUs have enough vGPUs while others do not; the device plugin will select one that does.
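The division of work between the scheduler and the device plugin can be sketched roughly as follows. This is only an illustration under assumptions, not the Alnair code: the function names (`is_valid_request`, `node_fits`, `pick_gpu`), the list-of-free-counts representation, and the first-fit choice in `pick_gpu` are all hypothetical.

```python
from typing import List, Optional

# From the example above: each physical GPU is split into 10 vGPUs.
VGPUS_PER_GPU = 10


def is_valid_request(request: int) -> bool:
    """A request must be fractional: at least 1 and fewer than 10 vGPUs."""
    return 1 <= request < VGPUS_PER_GPU


def node_fits(free_vgpus: List[int], request: int) -> bool:
    """Scheduler-side filter (hypothetical): keep a node only if at least
    one physical GPU can hold the whole request by itself."""
    return any(free >= request for free in free_vgpus)


def pick_gpu(free_vgpus: List[int], request: int) -> Optional[int]:
    """Device-plugin-side choice (hypothetical, first fit): return the index
    of the first physical GPU with enough free vGPUs, or None."""
    for idx, free in enumerate(free_vgpus):
        if free >= request:
            return idx
    return None


# A node with two GPUs: GPU 0 has 3 free vGPUs, GPU 1 has 7.
free = [3, 7]
if is_valid_request(5) and node_fits(free, 5):
    print(pick_gpu(free, 5))
```

In this sketch, a request for 5 vGPUs passes the node filter because GPU 1 alone can hold it, and the plugin never splits the request across GPU 0 and GPU 1.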

However, this could lead to resource fragmentation. We will investigate algorithms and strategies to minimize this drawback, and evaluate the tradeoff between sharing and fragmentation later.
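To make the fragmentation concern concrete, here is a small sketch. It assumes free vGPUs are tracked as one count per physical GPU; `is_fragmented` and `best_fit_gpu` are hypothetical helpers, and best fit is just one common packing heuristic shown for illustration, not a strategy the project has chosen.

```python
from typing import List, Optional


def is_fragmented(free_vgpus: List[int], request: int) -> bool:
    """True when the node has enough free vGPUs in total, but no single
    physical GPU can hold the request (so the request cannot be placed)."""
    return sum(free_vgpus) >= request and max(free_vgpus, default=0) < request


def best_fit_gpu(free_vgpus: List[int], request: int) -> Optional[int]:
    """Illustrative heuristic: among GPUs that can hold the request, pick the
    one with the fewest free vGPUs, leaving larger gaps for larger requests."""
    candidates = [(free, idx) for idx, free in enumerate(free_vgpus) if free >= request]
    if not candidates:
        return None
    return min(candidates)[1]


# Two GPUs with 4 free vGPUs each: 8 free in total, yet a request for 6 fails.
print(is_fragmented([4, 4], 6))
# With 9 and 5 free, a request for 4 goes to the tighter GPU (index 1).
print(best_fit_gpu([9, 5], 4))
```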
