1. Issue or feature description
Requesting 1 GPU in the pod YAML works fine, but when requesting more than 1, the output of nvidia-smi inside the container is broken, while nvidia-smi on the host machine is fine (see the reproduction sketch under section 2 below).
On another machine with a GeForce RTX 2070 SUPER, requesting 2 GPUs works correctly. However, when I run the application locally, it aborts with:

    [4pdvGPU ERROR (pid:697 thread=140106827071488 context.c:189)]: cuCtxGetDevice Not Found. tid=140106827071488 ctx=0x239601906000:0x23960041a000
    home/limengxuan/work/libcuda_override/src/cuda/context.c:189: cuCtxGetDevice: Assertion `0' failed.
2. Steps to reproduce the issue
Ubuntu 20.04 + MicroK8s + Tesla T4 GPU + NVIDIA 510 driver
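For concreteness, a minimal reproduction sketch follows. The resource name (nvidia.com/gpu), the limit syntax, and the container image are assumptions based on a typical vGPU device-plugin setup; adjust them to match the actual deployment.

```bash
# Apply a pod that requests 2 GPUs (resource name assumed to be nvidia.com/gpu).
cat <<'EOF' | microk8s kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
spec:
  containers:
  - name: cuda
    image: nvidia/cuda:11.6.2-base-ubuntu20.04   # assumed image; any CUDA base image should do
    command: ["sleep", "infinity"]
    resources:
      limits:
        nvidia.com/gpu: 2    # works with 1, fails with more than 1
EOF

# Wait for the pod, then check nvidia-smi inside the container.
microk8s kubectl wait --for=condition=Ready pod/gpu-test --timeout=120s
microk8s kubectl exec gpu-test -- nvidia-smi
```

With the limit set to 1 the same pod reportedly works; raising it to 2 or more reproduces the broken in-container nvidia-smi output described above.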
3. Information to attach (optional if deemed irrelevant)
Common error checking:
- nvidia-smi -a on your host
- /etc/docker/daemon.json on your host:

      {
          "default-runtime": "nvidia",
          "runtimes": {
              "nvidia": {
                  "path": "nvidia-container-runtime",
                  "runtimeArgs": []
              }
          }
      }
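If daemon.json is edited, Docker has to be restarted for the default runtime to take effect. A small check sequence using standard Docker/systemd commands is sketched below (note that MicroK8s normally uses containerd rather than Docker, so this only applies where Docker is actually the container runtime):

```bash
# Validate the JSON before restarting the daemon.
python3 -m json.tool /etc/docker/daemon.json

# Restart Docker so the new default runtime takes effect.
sudo systemctl restart docker

# Confirm that the NVIDIA runtime is listed and set as default.
docker info | grep -i runtime
```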
Additional information that might help better understand your environment and reproduce the bug:
- dmesg output from the host (nvidia-smi segfaults):

      nvidia-smi[2260220]: segfault at 0 ip 00007fde46d051ce sp 00007ffe1ae4c9e8 error 4 in libc-2.31.so[7fde46b9d000+178000]
      [89993.700532] Code: fd d7 c9 0f bc d1 c5 fe 7f 27 c5 fe 7f 6f 20 c5 fe 7f 77 40 c5 fe 7f 7f 60 49 83 c0 1f 49 29 d0 48 8d 7c 17 61 e9 c2 04 00 00 <c5> fe 6f 1e c5 fe 6f 56 20 c5 fd 74 cb c5 fd d7 d1 49 83 f8 21 0f
      [90182.697502] nvidia-smi[2265941]: segfault at 0 ip 00007f241971c1ce sp 00007fffff703d08 error 4 in libc-2.31.so[7f24195b4000+178000]
      [90182.697509] Code: fd d7 c9 0f bc d1 c5 fe 7f 27 c5 fe 7f 6f 20 c5 fe 7f 77 40 c5 fe 7f 7f 60 49 83 c0 1f 49 29 d0 48 8d 7c 17 61 e9 c2 04 00 00 <c5> fe 6f 1e c5 fe 6f 56 20 c5 fd 74 cb c5 fd d7 d1 49 83 f8 21 0f
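To collect just the relevant kernel messages, something like the following is usually enough (dmesg -T adds human-readable timestamps; the grep pattern is only a suggestion):

```bash
# Pull NVIDIA-related and segfault lines from the kernel log with readable timestamps.
sudo dmesg -T | grep -iE 'nvidia|nvrm|segfault'
```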
And are memory and fault isolation provided?