In the v0.3.0 release, vgpu-server creates a socket at /run/alnair.sock on the host machine, and user containers mount /run/alnair.sock as a hostPath volume to communicate with it.
So far vgpu-server has been deployed directly on the host. Will this communication still work when vgpu-server is deployed in a pod?
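For reference, here is a minimal sketch of how a user pod mounts the host socket under that scheme (the pod/volume names and image are illustrative placeholders, not the actual alnair manifests):

```yaml
# Sketch only: a user pod mounting the host socket created by vgpu-server.
# Pod/volume/container names and the image are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: user-workload
spec:
  containers:
  - name: cuda-app
    image: nvidia/cuda:11.0-base        # placeholder image
    volumeMounts:
    - name: alnair-sock
      mountPath: /run/alnair.sock
  volumes:
  - name: alnair-sock
    hostPath:
      path: /run/alnair.sock
      type: Socket                      # fails fast if the socket is missing on the host
```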
In the manifest, if we mount /run from the host into the vgpu-server container as a hostPath volume, the container cannot start. The error message is:
"Error: failed to start container "alnair-vgpu-server": Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: mount error: file creation failed: /var/lib/docker/overlay2/5afe69b7c6a1edf6187c693b8e9876f1e7df11b0b37badf1a61eae16c7694f4f/merged/run/nvidia-persistenced/socket: no such device or address: unknown"
However, only one GPU node reports this error while the other does not, even after setting up the same Docker (20.10.12), nvidia-docker2 (2.9.1), and nvidia-container-cli (1.8.1) versions on both nodes. See NVIDIA/nvidia-docker#885.
According to NVIDIA/nvidia-docker#825, the mount error is caused by nvidia-docker2; the suggestion there is to install nvidia-container-runtime instead of nvidia-docker2. From that thread:
"@3XX0 is this something that changed in version 2? we've been using nvidia-docker for a while now and we do have nvidia-smi as part of our docker images. we started to get the mount error once we switched to nvidia-docker2"
"No this requirement didn't change between the versions, as part of v2 we prevent the container from starting if you have the NVIDIA driver (e.g: nvidia-smi) in your image as this will lead to undefined behavior."
Back to the pod-deployment question: Kubernetes has an example of using an emptyDir volume for communication between containers within the same pod, but that is not exactly what we want, since an emptyDir is shared only among containers of one pod, while the user containers run in separate pods:
https://kubernetes.io/docs/tasks/access-application-cluster/communicate-containers-same-pod-shared-volume/
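For illustration, the pattern from that page boils down to something like the following sketch (container names and images are placeholders); the shared volume is visible only to containers of this one pod, which is why it does not help for host-to-pod or pod-to-pod traffic:

```yaml
# Sketch of the shared-emptyDir pattern from the Kubernetes docs.
# The volume exists only for the lifetime of this pod and is visible
# only to its own containers.
apiVersion: v1
kind: Pod
metadata:
  name: two-containers
spec:
  volumes:
  - name: shared-run
    emptyDir: {}
  containers:
  - name: server
    image: example/server              # placeholder
    volumeMounts:
    - name: shared-run
      mountPath: /run/shared           # server could create its socket here
  - name: client
    image: example/client              # placeholder
    volumeMounts:
    - name: shared-run
      mountPath: /run/shared           # client sees the same directory
```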
References:
Expose a Unix socket to the host system from inside a Docker container:
https://serverfault.com/questions/881875/expose-a-unix-socket-to-the-host-system-from-inside-from-a-docker-container/881895#881895
Use a Unix socket in Docker:
https://web.archive.org/web/20210411145047/https://www.jujens.eu/posts/en/2017/Feb/15/docker-unix-socket/
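Following the idea in those references (bind-mount only the socket's own directory rather than all of /run), one possible arrangement for the vgpu-server pod might look like the sketch below. This is an untested assumption, not a verified fix: it presumes the socket path can be configured to live in its own directory such as /run/alnair, so the mount does not shadow /run/nvidia-persistenced on the host.

```yaml
# Sketch only: expose the server's socket to the host without bind-mounting
# all of /run (which clashed with /run/nvidia-persistenced/socket above).
# Assumes the socket path is configurable, e.g. /run/alnair/alnair.sock;
# paths, names, and the image are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: alnair-vgpu-server
spec:
  containers:
  - name: vgpu-server
    image: example/alnair-vgpu-server   # placeholder image
    volumeMounts:
    - name: alnair-run
      mountPath: /run/alnair            # server creates its socket in here
  volumes:
  - name: alnair-run
    hostPath:
      path: /run/alnair
      type: DirectoryOrCreate           # created on the host if missing
```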