ImportError: No module named 'nvdiffrast_plugin' #46
Comments
I installed nvdiffrast in my own Docker image and installed the dependencies as in the Dockerfile, but this issue still exists.
It looks like the build of the plugin somehow fails silently. This should not happen with the Ninja build system, and without an error message telling what went wrong, it is difficult to debug the issue. Just to double-check: are you seeing this behavior using the provided Docker setup, or only in your own?
I am also hitting this problem! Could someone tell me how to solve it?
Hi @LCY850729436, can you be a bit more specific? Is this with the Docker configuration provided by us, or in a different environment? If the latter, do you have the Ninja build system installed?
I have solved this problem. I think the cause is compatibility between the GPU and the CUDA version: the problem occurs when I use a 2080 Ti, but not when I use a Titan.
I use a 3090 GPU.
I use two 2080 Ti cards in Docker; the same problem occurred!
Hi everyone, I'm eager to help in solving this problem, but more information is needed about what exactly goes wrong. We know there are plenty of working installations out there, so something must be different in the setups that exhibit this problem. To start, I repeat my question to everyone who experiences this problem: is this with the Docker configuration provided by us, or in a different environment? If the latter, do you have the Ninja build system installed? Second, I would like to ask you to change verbose=False to verbose=True in the torch.utils.cpp_extension.load() call in nvdiffrast/torch/ops.py so that the compilation log is printed. Finally, if someone has seen this problem and found a way to fix it, please share your solution. The error indicates that the nvdiffrast C++/Cuda plugin could not be loaded, and the most likely reason is that it could not be compiled. I imagine this could occur for a variety of reasons, and therefore there could be multiple different root causes for the same issue.
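As a first diagnostic step, one can check whether the tools that PyTorch's JIT extension builder relies on are visible on the PATH. This is a minimal stdlib-only sketch, not part of nvdiffrast; the tool list here is an assumption about a typical Linux setup:

```python
import shutil

def check_build_tools(tools=("ninja", "nvcc", "g++")):
    """Return a dict mapping each tool name to its resolved path, or None if missing."""
    return {tool: shutil.which(tool) for tool in tools}

missing = [t for t, path in check_build_tools().items() if path is None]
if missing:
    print("Missing build tools:", ", ".join(missing))
else:
    print("All build tools found.")
```

Running this before installing nvdiffrast would surface a missing Ninja or nvcc immediately, instead of a silent plugin build failure later.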
Hi @s-laine, I use the Docker config provided by you, as below:

ARG BASE_IMAGE=pytorch/pytorch:1.6.0-cuda10.1-cudnn7-devel
RUN apt-get update && apt-get install -y --no-install-recommends
#x forward update
ENV PYTHONDONTWRITEBYTECODE=1
#for GLEW
#nvidia-container-runtime
#Default pyopengl to EGL for good headless rendering support
COPY docker/10_nvidia.json /usr/share/glvnd/egl_vendor.d/10_nvidia.json
RUN pip install -i https://pypi.tuna.tsinghua.edu.cn/simple imageio imageio-ffmpeg
COPY nvdiffrast /tmp/pip/nvdiffrast/

And when I run triangle.py, the ImportError happens.
@xjcvip007, thank you for the information. It appears that you are not running the Dockerfile provided in our repo, as the base image in yours is pytorch/pytorch:1.6.0-cuda10.1-cudnn7-devel instead of the pytorch/pytorch:1.7.1-cuda11.0-cudnn8-devel used in ours. Can you try the same experiment with a container built using our Dockerfile?
@s-laine, I cannot use your default Dockerfile because of our GPU cloud platform's constraints, so we changed the base image from pytorch/pytorch:1.7.1-cuda11.0-cudnn8-devel to pytorch/pytorch:1.6.0-cuda10.1-cudnn7-devel and added some packages for our sshd support, but all the needed config and files are included in the Dockerfile.
I tried this on a Linux machine, and I'm unfortunately unable to replicate the problem even when using your Dockerfile (with the missing backslashes added, and imageio/imageio-ffmpeg installed from the default source).

As the container looks to be fine, I suspect you may have outdated graphics drivers, because those depend on the host operating system instead of the container. Alternatively, building the container does not produce the same result for one reason or another, but I don't know enough about Docker to tell why this might happen. What I don't understand is why there are no useful error messages, so I still don't know what exactly fails when you try to run the example. For reference, below is the exact Dockerfile that I used:

ARG BASE_IMAGE=pytorch/pytorch:1.6.0-cuda10.1-cudnn7-devel
FROM $BASE_IMAGE
RUN apt-get update && apt-get install -y --no-install-recommends \
pkg-config \
libglvnd0 \
libgl1 \
libglx0 \
libegl1 \
libgles2 \
libglvnd-dev \
libgl1-mesa-dev \
libegl1-mesa-dev \
libgles2-mesa-dev \
cmake \
curl \
build-essential \
git \
curl \
vim \
wget \
ca-certificates \
libjpeg-dev \
libpng-dev \
apt-utils \
bzip2 \
tmux \
gcc \
g++ \
openssh-server \
software-properties-common \
xauth \
zip \
unzip \
&& apt-get clean
#x forward update
RUN echo "X11UseLocalhost no" >> /etc/ssh/sshd_config \
&& mkdir -p /run/sshd
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1
#for GLEW
ENV LD_LIBRARY_PATH /usr/lib64:$LD_LIBRARY_PATH
#nvidia-container-runtime
ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES compute,utility,graphics
#Default pyopengl to EGL for good headless rendering support
ENV PYOPENGL_PLATFORM egl
COPY docker/10_nvidia.json /usr/share/glvnd/egl_vendor.d/10_nvidia.json
RUN pip install imageio imageio-ffmpeg
#RUN pip install -i https://pypi.tuna.tsinghua.edu.cn/simple imageio imageio-ffmpeg
COPY nvdiffrast /tmp/pip/nvdiffrast/
COPY README.md setup.py /tmp/pip/
RUN cd /tmp/pip && pip install .

I built the container, and I also tried launching a shell into it and running the sample; everything worked without errors.
@s-laine thanks for your effort. I will try the Dockerfile with newer graphics drivers; below is my nvidia-smi result:
This appears to be an incompatibility between PyTorch and the C++ compiler in the Linux distribution. A discussion here mentions this error when trying to build PyTorch extensions on Arch Linux. So this issue isn't specific to nvdiffrast, but prevents the building of any C++-based PyTorch extension on your system. If PyTorch refuses to work with the compiler on the system, there unfortunately isn't anything we can do about it. We recommend using an Ubuntu distribution, as that's what we have tested everything on.
I have solved the problem. I hit it on Windows, where it is caused by Ninja failing to compile the plugin.
Got the same problem on Ubuntu 18.04 under WSL2 (Windows Subsystem for Linux), with an RTX 3060 laptop GPU.
OpenGL/Cuda interop isn't currently supported in WSL2, and thus it won't be able to run the OpenGL rasterizer in nvdiffrast. The next release of nvdiffrast will include a Cuda-based rasterizer that sidesteps the compatibility issues on platforms where OpenGL doesn't work. The release should be out early next week.
The Cuda rasterizer is now released in v0.3.0. Documentation notes here.
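For readers choosing between the two rasterizers, a minimal sketch of selecting a context follows. The class names RasterizeCudaContext and RasterizeGLContext match nvdiffrast's torch API from v0.3.0 onward, but the helper function itself is hypothetical:

```python
def pick_context(dr_module, prefer_cuda=True):
    """Instantiate a rasterizer context from an nvdiffrast.torch-like module.

    On platforms where OpenGL/CUDA interop is unavailable (e.g. WSL2),
    prefer_cuda=True selects the Cuda rasterizer introduced in v0.3.0.
    """
    name = "RasterizeCudaContext" if prefer_cuda else "RasterizeGLContext"
    return getattr(dr_module, name)()
```

With nvdiffrast installed, this would be called as pick_context(nvdiffrast.torch) before rasterization.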
I had an interesting experience when using nvdiffrast. I had downloaded CuDNN and added its path to the system environment variables, and it turned out that the way I had added CuDNN to the environment was the cause of the failure. So I manually compiled the nvdiffrast plugin to surface the error, and fixing the environment variable resolved it. My case may not cover the general situation, but I hope sharing it can help people who make the same mistake I did.
@icewired-yy Thanks for the report! Nvdiffrast does not do anything special about CuDNN or look for the related environment variables, but PyTorch's cpp extension builder seems to have some logic related to it. Upon a quick glance, it looks like PyTorch expects the CuDNN-related environment variables to be set in a particular way. Good to have this noted here if others bump into the same issue.
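To see at a glance what CUDA/CuDNN-related variables a build might pick up, a small stdlib-only sketch can help; the variable names listed here are common conventions and an assumption on my part, not an exhaustive list of what PyTorch reads:

```python
import os

def cuda_env_report(names=("CUDA_HOME", "CUDA_PATH", "CUDNN_HOME", "CUDNN_PATH")):
    """Return {name: value} for each environment variable, with None marking unset ones."""
    return {name: os.environ.get(name) for name in names}

for name, value in cuda_env_report().items():
    print(f"{name} = {value!r}")
```

Comparing this report against the toolkit version PyTorch was built for is a quick way to spot the kind of mismatch described above.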
I spent something like 10 hours trying to get this to work today on Windows 10 with Visual Studio 2022, using Git Bash (note the Unix-style C: paths). I was able to solve the Ninja compilation issues with:

# fixes functional crtdbg.h basetsd.h
export INCLUDE="/c/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.38.33130/include:/c/Program Files (x86)/Windows Kits/10/Include/10.0.22621.0/ucrt:/c/Program Files (x86)/Windows Kits/10/Include/10.0.22621.0/shared"
# fixes kernel32.Lib ucrt.lib
export LIB="/c/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.38.33130/lib/x64/:/c/Program Files (x86)/Windows Kits/10/Lib/10.0.22621.0/um/x64/:/c/Program Files (x86)/Windows Kits/10/Lib/10.0.22621.0/ucrt/x64"

There are no more errors building the plugin with ninja. However, I still see the import error when building the plugin from threestudio on export.
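When setting INCLUDE and LIB by hand like this, a quick sanity check that every listed directory actually exists on disk can save a debugging round-trip. This is a stdlib-only sketch that splits on the platform path separator (on native Windows that is ";", while the Git Bash exports above use ":"):

```python
import os

def missing_entries(var_name):
    """Return the entries of a path-list environment variable that do not exist on disk."""
    raw = os.environ.get(var_name, "")
    entries = [e for e in raw.split(os.pathsep) if e]
    return [e for e in entries if not os.path.isdir(e)]

for var in ("INCLUDE", "LIB"):
    bad = missing_entries(var)
    if bad:
        print(f"{var}: nonexistent entries: {bad}")
```

A stale MSVC or Windows Kits version number in one of these paths is a common reason the exports silently fail to help.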
Has anyone found a fix?

I cloned this repo and ran the install. But then when I try to import the plugin, I get the same issue.

EDIT: Long story short, nvdiffrast_plugin is built against whatever CUDA toolkit the build system finds, and in my case that version did not match the one the rest of my environment was using.
@s-laine @nurpax @jannehellsten I think this should be mentioned in the official documentation as one of the pre-installation steps. As many ML practitioners (you included) have different CUDA versions for different projects, guessing where things went wrong can sometimes be tricky.
I have the same problem, also on a 3090. Looking at the comments, most people having the problem are on a 3090.
Successfully solved the problem. Here is my full solution and reference link:

Solved!
I solved this problem. Note that there was one more catch: after installing CUDA 12.1.0, the CUDA_PATH environment variable was not created automatically among the system environment variables, and I had to point it to the CUDA 12.1.0 directory manually. This is also why, during the ninja build, nvcc resolved to conda's nvcc, which made the ninja compilation fail!
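The mismatch described above, where the nvcc on the PATH belongs to a different toolkit than the one CUDA_PATH points at, can be detected up front. A stdlib-only sketch; the variable names checked here are an assumption about common setups:

```python
import os
import shutil

def nvcc_matches_cuda_home():
    """Compare the nvcc found on PATH with the toolkit CUDA_PATH/CUDA_HOME points to.

    Returns (nvcc_on_path, configured_home, consistent); consistent is None
    when either piece of information is unavailable.
    """
    nvcc = shutil.which("nvcc")
    home = os.environ.get("CUDA_PATH") or os.environ.get("CUDA_HOME")
    if nvcc is None or home is None:
        return nvcc, home, None
    consistent = os.path.normcase(os.path.abspath(nvcc)).startswith(
        os.path.normcase(os.path.abspath(home)))
    return nvcc, home, consistent
```

A False result here (e.g. nvcc resolving inside a conda environment while CUDA_PATH points at a system toolkit) is exactly the situation that breaks the ninja build.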
When I was compiling nvdiffrast, the files generated in the torch extensions directory were incomplete, and the following error was reported:
@zbclovehj I got the same issue, did you fix it? #207
After reviewing the solutions above and troubleshooting the issue myself, I found that the problem arose from having multiple CUDA versions installed on Windows. Here is a summary of the solution; I hope it will help you.

nvdiffrast_plugin: Module Not Found

Issue Description:

When using nvdiffrast, the import fails with:

ImportError: DLL load failed while importing nvdiffrast_plugin: The specified module could not be found.

or

ImportError: No module named 'nvdiffrast_plugin'

Issue Troubleshooting:

1. Enable verbose compilation output. In nvdiffrast/torch/ops.py, change verbose=False to verbose=True in the plugin loader:

torch.utils.cpp_extension.load(name=plugin_name, sources=source_paths, extra_cflags=common_opts+cc_opts, extra_cuda_cflags=common_opts+['-lineinfo'], extra_ldflags=ldflags, with_cuda=True, verbose=False)
# Change verbose to True
torch.utils.cpp_extension.load(name=plugin_name, sources=source_paths, extra_cflags=common_opts+cc_opts, extra_cuda_cflags=common_opts+['-lineinfo'], extra_ldflags=ldflags, with_cuda=True, verbose=True)

2. Run the example and check the detailed compilation logs. Run the following command to compile the plugin:

python .\nvdiffrast\samples\torch\triangle.py --cuda

Observe the compilation log, especially the paths of the CUDA header files, such as:

Note: Including file: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\include\cuda_runtime_api.h
Note: Including file: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\include\crt/host_defines.h
Note: Including file: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\include\builtin_types.h
......

You might notice that the CUDA version referenced in the header files (v11.8 here) is different from the version reported by nvcc:

NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Feb_27_16:28:36_Pacific_Standard_Time_2024
Cuda compilation tools, release 12.4, V12.4.99
Build cuda_12.4.r12.4/compiler.33961263_0

Meanwhile, CUDA_HOME points to the older toolkit:

echo $env:CUDA_HOME
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8

Root Cause of the Issue:

Let's summarize the findings: nvcc reports CUDA 12.4, but CUDA_HOME points to CUDA 11.8, and the compilation log shows headers being pulled from v11.8. Thus, it appears that the plugin was compiled with the incorrect version of CUDA headers.

Solution:

Set CUDA_HOME (and CUDA_PATH) to the CUDA version that nvcc reports, then recompile the plugin. The cached build can be inspected and rebuilt directly with ninja:

cd C:\Users\*\AppData\Local\torch_extensions\torch_extensions\Cache\py311_cu124\nvdiffrast_plugin
ninja
# ninja: no work to do.

Then run the example again:

python ./nvdiffrast/samples/torch/triangle.py --cuda

Observe the compilation logs and ensure that the CUDA header files are now being correctly referenced:

Note: Including file: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cuda_fp16.hpp
Note: Including file: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cublas_v2.h
Note: Including file: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cublas_api.h

If the compilation is successful, you should see something like:

[16/16] "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\bin\Hostx64\x64/link.exe" Buffer.o CudaRaster.o RasterImpl.cuda.o RasterImpl.o common.o rasterize.cuda.o interpolate.cuda.o texture.cuda.o texture.o antialias.cuda.o torch_bindings.o torch_rasterize.o torch_interpolate.o torch_texture.o torch_antialias.o /nologo /DLL c10.lib c10_cuda.lib torch_cpu.lib torch_cuda.lib -INCLUDE:?warp_size@cuda@at@@YAHXZ torch.lib /LIBPATH:C:\ProgramData\anaconda3\envs\tddfav3\Lib\site-packages\torch\lib torch_python.lib /LIBPATH:C:\ProgramData\anaconda3\envs\tddfav3\libs "/LIBPATH:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\lib\x64" cudart.lib /out:nvdiffrast_plugin.pyd
Creating library nvdiffrast_plugin.lib and object nvdiffrast_plugin.exp
Loading extension module nvdiffrast_plugin...
Saving to 'tri.png'.

You're done! The issue should be resolved.
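A small sketch for locating stale nvdiffrast_plugin build caches under the torch extensions directory, so they can be reviewed and removed before rebuilding. The default cache root used below is an assumption based on common Linux setups; PyTorch also honors the TORCH_EXTENSIONS_DIR environment variable, and on Windows the cache lives under AppData as shown in the paths above:

```python
import os

def find_plugin_caches(root=None, plugin_name="nvdiffrast_plugin"):
    """Walk the torch extensions cache and return directories for the given plugin."""
    if root is None:
        root = os.environ.get(
            "TORCH_EXTENSIONS_DIR",
            os.path.join(os.path.expanduser("~"), ".cache", "torch_extensions"))
    hits = []
    for dirpath, dirnames, _ in os.walk(root):
        if plugin_name in dirnames:
            hits.append(os.path.join(dirpath, plugin_name))
    return hits

# Review the returned paths and delete them manually before recompiling.
print(find_plugin_caches())
```

Deleting the cached directory forces torch.utils.cpp_extension.load to rebuild the plugin with the current environment variables.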
@TonyDua excellent write-up, purely amazing. Now I have a few questions for you. I have CUDA_PATH set and nvcc --version shows the expected version, but echoing CUDA_HOME didn't show anything. How do we set CUDA_HOME, and should we set it? And what is the difference between CUDA_PATH and the system variables? So does this main issue come from the CUDA path and system variable setup?
It is a conundrum of the century.
When I run the code in ./samples/torch, there is always an error: No module named 'nvdiffrast_plugin'
Traceback (most recent call last):
File "triangle.py", line 21, in
glctx = dr.RasterizeGLContext()
File "/opt/conda/envs/fomm/lib/python3.7/site-packages/nvdiffrast/torch/ops.py", line 142, in init
self.cpp_wrapper = _get_plugin().RasterizeGLStateWrapper(output_db, mode == 'automatic')
File "/opt/conda/envs/fomm/lib/python3.7/site-packages/nvdiffrast/torch/ops.py", line 83, in _get_plugin
torch.utils.cpp_extension.load(name=plugin_name, sources=source_paths, extra_cflags=opts, extra_cuda_cflags=opts, extra_ldflags=ldflags, with_cuda=True, verbose=False)
File "/opt/conda/envs/fomm/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1091, in load
keep_intermediates=keep_intermediates)
File "/opt/conda/envs/fomm/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1317, in _jit_compile
return _import_module_from_library(name, build_directory, is_python_module)
File "/opt/conda/envs/fomm/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1706, in _import_module_from_library
file, path, description = imp.find_module(module_name, [path])
File "/opt/conda/envs/fomm/lib/python3.7/imp.py", line 299, in find_module
raise ImportError(_ERR_MSG.format(name), name=name)
ImportError: No module named 'nvdiffrast_plugin'
It seems that some files are missing.
I installed nvdiffrast as instructed in the documentation: cd ./nvdiffrast and pip install .
I have uninstalled and reinstalled many times, but the error persists. I tried installing with CUDA 10.0 + torch 1.6, CUDA 11.1 + torch 1.8.1, and CUDA 9.0 + torch 1.6, but all of these setups show the error. I use an NVIDIA 3090 GPU.
Is there anyone who can solve this problem? Thanks.