Committed image cannot run on another node. #8

haijohn · 2021-09-13T03:27:34Z

The template below is mostly useful for bug reports and support questions. Feel free to remove anything which doesn't apply to you and add more information where it makes sense.

1. Issue or feature description

A committed image cannot run on another node.

2. Steps to reproduce the issue

Start a pod with GPU enabled.
Commit the container to an image and push it to a registry.
Start a pod from the committed image on another node.
The container fails to start with the following error:

Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: 
exit status 1, stdout: , stderr: nvidia-container-cli: device error: GPU-caba9b00-6386-2c33-7834-646ef2692cb7: unknown device\\\\n\\\"\"": unknown
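
The error above names one specific GPU UUID, so a quick check is whether that UUID exists on the second node at all. The sketch below compares the UUIDs found in the error text with those in the output of `nvidia-smi -L` collected on that node. The function names are hypothetical and not part of any NVIDIA tool. One possible explanation, which is an assumption and not confirmed in this thread, is that `docker commit` preserved the first node's GPU UUID in the image's environment.

```python
import re

# GPU UUIDs in the form seen in the error above:
# GPU-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx (hex digits).
GPU_UUID_RE = re.compile(
    r"GPU-[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}"
)


def uuids_in(text):
    """Return the set of GPU UUIDs that appear anywhere in text."""
    return set(GPU_UUID_RE.findall(text))


def missing_on_node(error_text, nvidia_smi_list_output):
    """Return UUIDs named in the error that the node does not report.

    nvidia_smi_list_output is the text printed by `nvidia-smi -L` on the
    node where the container failed to start (hypothetical helper input).
    """
    return uuids_in(error_text) - uuids_in(nvidia_smi_list_output)
```

If the returned set is non-empty, the container is requesting a device that the node does not have, which matches the "unknown device" message.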

3. Information to attach (optional if deemed irrelevant)

Common error checking:

The output of nvidia-smi -a on your host
Your docker configuration file (e.g: /etc/docker/daemon.json)
The k8s-device-plugin container logs
The kubelet logs on the node (e.g: sudo journalctl -r -u kubelet)

Additional information that might help better understand your environment and reproduce the bug:

Docker version from docker version: 19.03
Docker command, image and tag used: docker commit
Kernel version from uname -a
Any relevant kernel output lines from dmesg
NVIDIA packages version from dpkg -l '*nvidia*' or rpm -qa '*nvidia*'
NVIDIA container library version from nvidia-container-cli -V
NVIDIA container library logs (see troubleshooting)


archlitchi · 2021-09-13T06:42:54Z

Are you starting it with plain docker on the other node? If you can, let's talk on Slack.

haijohn · 2021-09-14T02:14:53Z

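> Are you starting it with plain docker on the other node? If you can, let's talk on Slack.

Yes. The other node is not using vGPU. If the other node also uses vGPU, the problem does not seem to occur.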

archlitchi · 2021-09-14T04:49:11Z

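> > Are you starting it with plain docker on the other node? If you can, let's talk on Slack.
>
> Yes. The other node is not using vGPU. If the other node also uses vGPU, the problem does not seem to occur.

OK. If you start the container with plain docker, you cannot use --gpus to request GPUs. You have to configure it like this instead: docker run -it --runtime=nvidia -e=NVIDIA_VISIBLE_DEVICES=0,1,2,3 {image} (the value is the list of GPU indices, or all for every GPU).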
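
For scripting the command form suggested above, here is a minimal sketch that builds the argument list. The helper name `docker_run_args` and the image name in the usage lines are hypothetical. Only the flags already given in this thread (`-it`, `--runtime=nvidia`, `-e NVIDIA_VISIBLE_DEVICES=...`) are used.

```python
import shlex


def docker_run_args(image, devices="all", interactive=True):
    """Build argv for `docker run` with the NVIDIA runtime.

    devices is either the string "all" (or any preformatted value) or an
    iterable of GPU indices such as [0, 1, 2, 3].
    """
    if isinstance(devices, str):
        visible = devices
    else:
        visible = ",".join(str(d) for d in devices)
    if not visible:
        raise ValueError("devices must not be empty")

    args = ["docker", "run"]
    if interactive:
        args.append("-it")
    # Explicitly setting NVIDIA_VISIBLE_DEVICES overrides any value that
    # the image itself carries.
    args += ["--runtime=nvidia", "-e", f"NVIDIA_VISIBLE_DEVICES={visible}", image]
    return args


if __name__ == "__main__":
    # Hypothetical image name, for illustration only.
    argv = docker_run_args("registry.example.com/demo/committed:latest", [0, 1, 2, 3])
    print(shlex.join(argv))
```

Returning a list instead of a single string lets the result be passed straight to `subprocess.run` without shell quoting issues.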
