Steps to reproduce
1. Prepare an instance with an NVIDIA GPU, Docker, and CUDA drivers, but without the NVIDIA container runtime (nvidia-container-toolkit).
2. Create and apply an on-prem fleet configuration with the instance.
Actual behaviour
The fleet is created successfully but the GPU is not mentioned in its resources.
FLEET    INSTANCE  BACKEND       RESOURCES                    PRICE  STATUS  CREATED     ERROR
on-prem  0         ssh (remote)  24xCPU, 71GB, 36.4GB (disk)  $0.0   idle    57 sec ago
The user may not notice that the GPU is missing, in which case they will only find out that something is wrong when trying to run a job on the instance.
Run failed with error code CONTAINER_EXITED_WITH_ERROR.
Error: could not select device driver "" with capabilities: []
Check CLI, server, and run logs for more details.
Expected behaviour
Fleet provisioning fails, and the user sees an error about the NVIDIA runtime being misconfigured on the instance.
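To make the expected behaviour concrete, here is a minimal sketch of a provisioning-time check. It is not dstack's implementation: `check_nvidia_runtime`, `probe_host`, and `GpuRuntimeError` are hypothetical names. It also rests on two assumptions: that finding `nvidia-smi` on the host is a good enough signal that a GPU is present, and that a correctly configured host lists an `nvidia` entry in the output of `docker info --format '{{json .Runtimes}}'`. The second is a heuristic, since the toolkit can be installed without registering that runtime.

```python
import json
import shutil
import subprocess


class GpuRuntimeError(RuntimeError):
    """Raised when a GPU is present but Docker has no NVIDIA runtime."""


def check_nvidia_runtime(gpu_present: bool, docker_runtimes_json: str) -> None:
    """Fail early if the host has a GPU that containers cannot use.

    docker_runtimes_json is the raw output of
    `docker info --format '{{json .Runtimes}}'`, a JSON object keyed by
    runtime name.
    """
    runtimes = json.loads(docker_runtimes_json)
    if gpu_present and "nvidia" not in runtimes:
        raise GpuRuntimeError(
            "NVIDIA GPU detected, but Docker has no 'nvidia' runtime. "
            "Install and configure nvidia-container-toolkit on the instance."
        )


def probe_host() -> None:
    """Hypothetical glue: gather both inputs on the instance and run the check."""
    # Assumption: nvidia-smi on PATH means the driver and a GPU are present.
    gpu_present = shutil.which("nvidia-smi") is not None
    result = subprocess.run(
        ["docker", "info", "--format", "{{json .Runtimes}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    check_nvidia_runtime(gpu_present, result.stdout)
```

With a check like this, the instance from the steps above would be rejected during fleet creation instead of being listed as `idle` with no GPU in its resources.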
dstack version
0.18.22
Server logs
Additional information
No response