ImportError: No module named 'nvdiffrast_plugin' #46

Open
sunkymepro opened this issue Sep 24, 2021 · 35 comments

Comments

@sunkymepro

When I run the code in ./samples/torch, I always get this error: No module named 'nvdiffrast_plugin'

Traceback (most recent call last):
  File "triangle.py", line 21, in <module>
    glctx = dr.RasterizeGLContext()
  File "/opt/conda/envs/fomm/lib/python3.7/site-packages/nvdiffrast/torch/ops.py", line 142, in __init__
    self.cpp_wrapper = _get_plugin().RasterizeGLStateWrapper(output_db, mode == 'automatic')
  File "/opt/conda/envs/fomm/lib/python3.7/site-packages/nvdiffrast/torch/ops.py", line 83, in _get_plugin
    torch.utils.cpp_extension.load(name=plugin_name, sources=source_paths, extra_cflags=opts, extra_cuda_cflags=opts, extra_ldflags=ldflags, with_cuda=True, verbose=False)
  File "/opt/conda/envs/fomm/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1091, in load
    keep_intermediates=keep_intermediates)
  File "/opt/conda/envs/fomm/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1317, in _jit_compile
    return _import_module_from_library(name, build_directory, is_python_module)
  File "/opt/conda/envs/fomm/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1706, in _import_module_from_library
    file, path, description = imp.find_module(module_name, [path])
  File "/opt/conda/envs/fomm/lib/python3.7/imp.py", line 299, in find_module
    raise ImportError(_ERR_MSG.format(name), name=name)
ImportError: No module named 'nvdiffrast_plugin'

It seems that some packages are missing.
I installed nvdiffrast following the instructions in the documentation: cd ./nvdiffrast and pip install .
I uninstalled and reinstalled many times, but the error persists. I tried CUDA 10.0 with torch 1.6, CUDA 11.1 with torch 1.8.1, and CUDA 9.0 with torch 1.6, and all of these setups produce the error. I use an Nvidia 3090 GPU.
Can anyone help solve this problem? Thanks.

@sunkymepro
Author

I installed nvdiffrast in my own Docker container and installed the dependencies as in the Dockerfile, but the issue persists.

@s-laine
Collaborator

s-laine commented Sep 27, 2021

It looks like building the plugin somehow fails silently. This should not happen with the ninja build system, and without an error message telling what went wrong, it is difficult to debug the issue.

Just to double check: Are you seeing this behavior using the provided docker setup or only in your own?

@HarshWinterBytes

I have run into this problem too! Could someone tell me how to solve it?

@s-laine
Collaborator

s-laine commented Oct 11, 2021

Hi @LCY850729436, can you be a bit more specific? Is this with the Docker configuration provided by us, or in a different environment? If the latter, do you have the Ninja build system installed?

@HarshWinterBytes

I have solved this problem. I think the cause was a compatibility mismatch between the GPU and the CUDA version. The problem occurs when I use a 2080 Ti, but not when I use a Titan.

@sunkymepro
Author

I use a 3090 GPU

@xjcvip007

I use two 2080 Ti GPUs in Docker, and the same problem occurred!

@s-laine
Collaborator

s-laine commented Oct 25, 2021

Hi everyone,

I'm eager to help in solving this problem, but more information is needed about what exactly goes wrong. We know there are plenty of working installations out there, so something must be different in the setups that exhibit this problem.

To start, I repeat my question to everyone that experiences this problem: Is this with the Docker configuration provided by us, or in a different environment? If the latter, do you have the Ninja build system installed?

Second, I would like to ask you to change verbose=False to verbose=True in the call to torch.utils.cpp_extension.load in nvdiffrast/torch/ops.py line 84, and share the output.

Finally, if someone has seen this problem and found a way to fix it, please share your solution. The error indicates that the nvdiffrast C++/Cuda plugin could not be loaded, and the most likely reason is that it could not be compiled. I imagine this could occur for a variety of reasons, and therefore there could be multiple different root causes for the same issue.
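
Alongside the verbose build output, it may help to report which build tools are visible to the Python process at all. The following is a minimal sketch using only the standard library; the helper name `collect_build_environment` and the list of tool names are illustrative assumptions, not part of nvdiffrast or PyTorch.

```python
import os
import shutil


def collect_build_environment(env=None, which=shutil.which):
    """Report which build tools and CUDA variables this process can see.

    `env` and `which` are injectable so the function can be exercised
    without touching the real environment (hypothetical helper).
    """
    env = os.environ if env is None else env
    tools = ("ninja", "nvcc", "c++", "g++", "cl")
    return {
        # Absolute path of each tool, or None when it is not on PATH.
        "tools": {name: which(name) for name in tools},
        # CUDA location hints as set in the environment, if any.
        "cuda_vars": {key: env.get(key) for key in ("CUDA_HOME", "CUDA_PATH")},
    }


if __name__ == "__main__":
    report = collect_build_environment()
    for name, path in report["tools"].items():
        print(f"{name:>5}: {path or 'NOT FOUND'}")
    for key, value in report["cuda_vars"].items():
        print(f"{key}: {value or 'not set'}")
```

Running it in the same shell or container that runs the samples shows at a glance whether ninja or a compiler is missing, which is one of the questions asked above.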

@xjcvip007

Hi @s-laine, I use the Docker conf provided by you as below:

ARG BASE_IMAGE=pytorch/pytorch:1.6.0-cuda10.1-cudnn7-devel
FROM $BASE_IMAGE

RUN apt-get update && apt-get install -y --no-install-recommends
pkg-config
libglvnd0
libgl1
libglx0
libegl1
libgles2
libglvnd-dev
libgl1-mesa-dev
libegl1-mesa-dev
libgles2-mesa-dev
cmake
curl
build-essential
git
curl
vim
wget \
ca-certificates
libjpeg-dev
libpng-dev \
apt-utils
bzip2 \
tmux
gcc
g++
openssh-server
software-properties-common
xauth
zip
unzip
&& apt-get clean
&& rm -rf /var/lib/apt/lists/*

#x forward update
RUN echo "X11UseLocalhost no" >> /etc/ssh/sshd_config
&& mkdir -p /run/sshd

ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1

#for GLEW
ENV LD_LIBRARY_PATH /usr/lib64:$LD_LIBRARY_PATH

#nvidia-container-runtime
ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES compute,utility,graphics

#Default pyopengl to EGL for good headless rendering support
ENV PYOPENGL_PLATFORM egl

COPY docker/10_nvidia.json /usr/share/glvnd/egl_vendor.d/10_nvidia.json

RUN pip install -i https://pypi.tuna.tsinghua.edu.cn/simple imageio imageio-ffmpeg

COPY nvdiffrast /tmp/pip/nvdiffrast/
COPY README.md setup.py /tmp/pip/
RUN cd /tmp/pip && pip install .

When I run 'triangle.py', the ImportError occurs.
I set verbose=True as you suggested; the errors are shown below:
[image]

@s-laine
Collaborator

s-laine commented Oct 26, 2021

@xjcvip007, thank you for the information. It appears that you are not running the Dockerfile provided in our repo, as the base image in yours is pytorch/pytorch:1.6.0-cuda10.1-cudnn7-devel, whereas in ours it is pytorch/pytorch:1.7.1-cuda11.0-cudnn8-devel. The block with #x forward update is also not from our Dockerfile.

Can you try the same experiment with a container built using our Dockerfile?

@xjcvip007

@s-laine, I cannot use your default Dockerfile because of what our GPU cloud platform supports, so we changed the base image from pytorch/pytorch:1.7.1-cuda11.0-cudnn8-devel to pytorch/pytorch:1.6.0-cuda10.1-cudnn7-devel and added some packages for our sshd support, but all the required configuration and files are included in the Dockerfile.

@s-laine
Collaborator

s-laine commented Oct 27, 2021

I tried this with a Linux machine, and I'm unfortunately unable to replicate the problem even when using your Dockerfile (with the missing backslashes added, and imageio/imageio-ffmpeg installed from the default source).

My test machine has the following operating system, as reported by uname -a:
Linux <hostname> 5.4.0-80-generic #90-Ubuntu SMP Fri Jul 9 22:49:44 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

And nvidia-smi reports the following version information:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+

As the container looks to be fine, I'm suspecting you may have outdated graphics drivers, because those depend on the host operating system instead of the container. Alternatively, building the container does not produce the same result for one reason or another, but I don't know enough about docker to tell why this might happen. What I don't understand is why there are no useful error messages so I still don't know what exactly fails when you try to run the example.

For reference, below is the exact Dockerfile that I used:

ARG BASE_IMAGE=pytorch/pytorch:1.6.0-cuda10.1-cudnn7-devel
FROM $BASE_IMAGE

RUN apt-get update && apt-get install -y --no-install-recommends \
pkg-config \
libglvnd0 \
libgl1 \
libglx0 \
libegl1 \
libgles2 \
libglvnd-dev \
libgl1-mesa-dev \
libegl1-mesa-dev \
libgles2-mesa-dev \
cmake \
curl \
build-essential \
git \
curl \
vim \
wget \
ca-certificates \
libjpeg-dev \
libpng-dev \
apt-utils \
bzip2 \
tmux \
gcc \
g++ \
openssh-server \
software-properties-common \
xauth \
zip \
unzip \
&& apt-get clean

#x forward update
RUN echo "X11UseLocalhost no" >> /etc/ssh/sshd_config \
&& mkdir -p /run/sshd

ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1

#for GLEW
ENV LD_LIBRARY_PATH /usr/lib64:$LD_LIBRARY_PATH

#nvidia-container-runtime
ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES compute,utility,graphics

#Default pyopengl to EGL for good headless rendering support
ENV PYOPENGL_PLATFORM egl

COPY docker/10_nvidia.json /usr/share/glvnd/egl_vendor.d/10_nvidia.json

RUN pip install imageio imageio-ffmpeg
#RUN pip install -i https://pypi.tuna.tsinghua.edu.cn/simple imageio imageio-ffmpeg

COPY nvdiffrast /tmp/pip/nvdiffrast/
COPY README.md setup.py /tmp/pip/
RUN cd /tmp/pip && pip install .

I built the container with ./run_sample.sh --build-container and executed the sample with ./run_sample.sh ./samples/torch/triangle.py.

I also tried launching a shell into the container by doing

docker run --rm -it --gpus all -v `pwd`:/app --workdir /app -e TORCH_EXTENSIONS_DIR=/app/tmp gltorch:latest bash

and running python samples/torch/triangle.py manually from within it, and that worked too.

@xjcvip007

xjcvip007 commented Oct 28, 2021

@s-laine thanks for your effort. I will try the Dockerfile with newer graphics drivers. Below is my 'nvidia-smi' output:
[image]

@DoubleYanLee
Hello, I've run into the same problem. I didn't use Docker. My environment is CUDA 10.2 + PyTorch 1.7.1 + torchvision 0.8.2. I installed nvdiffrast with 'pip install .' in the nvdiffrast directory. When I run 'python pose.py', I get the same error, 'ImportError: No module named 'nvdiffrast_plugin'', along with another error, shown here:
Screenshot 2021-10-30 at 9 15 16 AM

@s-laine
Collaborator

s-laine commented Oct 30, 2021

This appears to be an incompatibility between PyTorch and the C++ compiler in the Linux distribution. A discussion here mentions this error when trying to build PyTorch extensions on Arch Linux.

So this issue isn't specific to nvdiffrast, but prevents the building of any C++ based PyTorch extensions on your system. If PyTorch refuses to work with the compiler on the system, there unfortunately isn't anything we can do about it. We recommend using an Ubuntu distribution as that's what we have tested everything on.

@bo233

bo233 commented Nov 21, 2021

I have solved the problem. I hit it on Windows, and it was caused by ninja failing to compile the plugin.
I added the directories containing cl.exe (C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30113\bin\Hostx64\x64) and ninja.exe (*\Anaconda\envs\*\Lib\site-packages\ninja\data\bin) to the environment variables (I'm not sure whether this step matters). Then I changed verbose=False to verbose=True in the call to torch.utils.cpp_extension.load in nvdiffrast/torch/ops.py line 84, and found the plugin's build folder: C:\Users\*\AppData\Local\torch_extensions\torch_extensions\Cache\nvdiffrast_plugin. I cd'd to that path and ran ninja there, and found that ninja calls cl.exe and that some header files were missing (in my case, cstddef). I then searched for the file and added its directory to the INCLUDE environment variable (C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30113\include), and did the same for LIB (C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30113\lib\x64).
Finally, the plugin compiled successfully.
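
The INCLUDE and LIB changes described above can also be made per process from Python, before the first call that triggers the plugin build, since the compiler is launched as a child process and inherits the environment. This is a minimal sketch; `prepend_paths` is a hypothetical helper and the paths in the demo are placeholders, not real install locations.

```python
import os


def prepend_paths(var, directories, env=None, sep=os.pathsep):
    """Prepend directories to a path-list variable such as INCLUDE or LIB.

    Entries that are already present are moved to the front rather than
    duplicated. Returns the new value of the variable.
    """
    env = os.environ if env is None else env
    directories = list(directories)
    existing = [p for p in env.get(var, "").split(sep) if p]
    merged = directories + [p for p in existing if p not in directories]
    env[var] = sep.join(merged)
    return env[var]


if __name__ == "__main__":
    # Placeholder paths: substitute the directories that hold the missing
    # headers (e.g. cstddef) and libraries on your own machine.
    demo_env = {"INCLUDE": r"C:\existing\include"}
    print(prepend_paths("INCLUDE", [r"C:\path\to\MSVC\include"], demo_env, sep=";"))
```

Passing no `env` argument modifies `os.environ`, which is what a real run would need; the demo uses a plain dict so that nothing on the machine is changed.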

@c1a1o1

c1a1o1 commented Jan 21, 2022

@bo233 How do I add the last two (INCLUDE and LIB)?

@XCR16729438

I got the same problem on Ubuntu 18.04 under WSL2 (Windows Subsystem for Linux), with an RTX 3060 Laptop GPU.
But I succeeded in compiling it on Windows 10 on the same computer.
(Strange, I believed Linux was always better than Windows XD.)

@s-laine
Collaborator

s-laine commented Aug 12, 2022

OpenGL/Cuda interop isn't currently supported in WSL2 and thus it won't be able to run the OpenGL rasterizer in nvdiffrast.

The next release of nvdiffrast will include a Cuda-based rasterizer that sidesteps the compatibility issues on platforms where OpenGL doesn't work. The release should be out early next week.

@s-laine
Collaborator

s-laine commented Aug 17, 2022

The Cuda rasterizer is now released in v0.3.0. Documentation notes here.
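
For code that has to run both with and without the Cuda rasterizer, one option is to pick the context class at run time. The sketch below assumes only the two class names that appear in this thread, `RasterizeCudaContext` and `RasterizeGLContext`, and assumes both accept a `device` argument; `make_raster_context` is a hypothetical helper. Note that either context still requires the plugin to compile, so this sidesteps OpenGL problems, not build failures.

```python
from types import SimpleNamespace


def make_raster_context(dr, prefer_cuda=True, device=None):
    """Create a rasterizer context from an nvdiffrast.torch-like module `dr`.

    Falls back to the OpenGL context when the Cuda context class is not
    available (for example on releases older than v0.3.0).
    """
    if prefer_cuda and hasattr(dr, "RasterizeCudaContext"):
        return dr.RasterizeCudaContext(device=device)
    return dr.RasterizeGLContext(device=device)


if __name__ == "__main__":
    # Stand-in for `import nvdiffrast.torch as dr`, so the sketch runs
    # without a GPU; replace it with the real module in practice.
    fake_dr = SimpleNamespace(
        RasterizeCudaContext=lambda device=None: "cuda-context",
        RasterizeGLContext=lambda device=None: "gl-context",
    )
    print(make_raster_context(fake_dr))
```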

@icewired-yy

icewired-yy commented Jan 15, 2024

I had an interesting experience using nvdiffrast on Windows that I would like to share here.

I had previously downloaded CuDNN and added its path to the system environment variables. I did this by creating a variable named CUDNN_HOME pointing at the base path of the CuDNN directory and adding something like %CUDNN_HOME%\bin to PATH. Then I found that my nvdiffrast compilation failed.

So I compiled nvdiffrast manually via ninja --verbose and found that something was wrong with the contents of build.ninja: CUDNN_HOME unexpectedly appeared in build.ninja, pointing at a wrong path. I now think nvdiffrast automatically detects CuDNN from the system environment variables. At the time, I simply deleted CUDNN_HOME, and now everything works.

My case may not cover the general case, but I hope sharing it helps people who make the same mistake I did.

@s-laine
Collaborator

s-laine commented Jan 15, 2024

@icewired-yy Thanks for the report!

Nvdiffrast does not do anything special about CuDNN or look for the related environment variables, but PyTorch's cpp extension builder seems to have some logic related to it here.

Upon a quick glance, it looks like PyTorch expects CUDNN_HOME, if defined, to point to the main CuDNN directory instead of the bin directory. This may explain build.ninja ending up with broken paths.

Good to have this noted here if others bump into the same issue.
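
A quick sanity check of CUDNN_HOME can be sketched as below. It is a heuristic only: it assumes that a CuDNN root directory contains an `include` subdirectory, which is an assumption about the install layout rather than something confirmed in this thread, and `check_cudnn_home` is a hypothetical helper.

```python
import os


def check_cudnn_home(env=None):
    """Return (status, detail) describing the CUDNN_HOME setting.

    Status is one of "unset", "missing", "suspicious" or "ok".
    """
    env = os.environ if env is None else env
    home = env.get("CUDNN_HOME")
    if not home:
        return "unset", "CUDNN_HOME is not defined"
    if not os.path.isdir(home):
        return "missing", f"{home} does not exist"
    if not os.path.isdir(os.path.join(home, "include")):
        # Typical symptom of pointing at the bin directory instead of the root.
        return "suspicious", f"{home} has no 'include' subdirectory"
    return "ok", home


if __name__ == "__main__":
    print(check_cudnn_home())
```

An "unset" result is harmless here; as reported above, removing the variable altogether also made the build work.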

@createthis

createthis commented Jan 25, 2024

I spent... like 10 hours trying to get this to work today on Windows 10 and Visual Studio 2022 using Git Bash (note the unix style c paths). I was able to solve the ninja compilation issues with:

# fixes  functional crtdbg.h basetsd.h
export INCLUDE="/c/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.38.33130/include:/c/Program Files (x86)/Windows Kits/10/Include/10.0.22621.0/ucrt:/c/Program Files (x86)/Windows Kits/10/Include/10.0.22621.0/shared"
# fixes  kernel32.Lib ucrt.lib
export LIB="/c/Program Files/Microsoft Visual Studio/2022/Community/VC/Tools/MSVC/14.38.33130/lib/x64/:/c/Program Files (x86)/Windows Kits/10/Lib/10.0.22621.0/um/x64/:/c/Program Files (x86)/Windows Kits/10/Lib/10.0.22621.0/ucrt/x64"

There are no more errors building the plugin with ninja. However, I still see:

ImportError: DLL load failed while importing nvdiffrast_plugin: The specified module could not be found.

when building the plugin from threestudio on export.

@createthis

createthis commented Jan 25, 2024

I finally got past this error with another 2-1/2 hours of work on Windows 10 with Visual Studio Community 2022.

First, see my previous comment for how I got nvdiffrast_plugin building correctly using ninja.

Next, I had to figure out why the import was failing. To do this, I needed to manually reproduce the problem:

# This path will be different for each system/person. I got the path from the output of the verbose=True change
cd /c/Users/jesse/AppData/Local/Packages/PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0/LocalCache/Local/torch_extensions/torch_extensions/Cache/py310_cu118/nvdiffrast_plugin
ls -al
python -vvvvvvvvvvvvvv
>>> import nvdiffrast_plugin

Here's a screenshot of my directory listing:
nvdiffrast_plugin_dir_listing

Here's a screenshot of the repro:
nvdiffrast_plugin

Next, I asked myself why the import was failing when the .pyd file was clearly right there. After some googling, I learned a few things:

  1. .pyd files are basically shared DLLs on Windows.
  2. python will give this error if the shared DLL links other shared DLLs and those DLLs cannot be found.

So let's list the other shared DLLs:

 dumpbin //dependents ./nvdiffrast_plugin.pyd

Here's a screenshot of that output:
nvdiffrast_plugin_dumpbin_dependents

Next, I painstakingly identified the location of each of these DLLs and crafted these statements to allow python to find them:

import os
os.add_dll_directory(r"C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.38.33130\lib\x64")
os.add_dll_directory(r"C:\Program Files (x86)\Windows Kits\10\Lib\10.0.22621.0\um\x64")
os.add_dll_directory(r"C:\Program Files (x86)\Windows Kits\10\Lib\10.0.22621.0\ucrt\x64")

# c10.dll
os.add_dll_directory(r"C:\Users\jesse\Documents\ai\threestudio\venv\Lib\site-packages\torch\lib")

# cudart64_12.dll
os.add_dll_directory(r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.3\bin")

Once this has been done, we can import the plugin successfully:
nvdiffrast_plugin_import_working

I then solved the problem in code by adding these statements to the top of launch.py (I'm using this from threestudio).

Hope this saves someone else a day of work. I hate windows! 🤣
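
Because the directories above differ per machine and per installed version, a slightly more defensive variant is to register only the directories that actually exist and report the rest. This is a minimal sketch; `register_dll_directories` is a hypothetical helper, and `os.add_dll_directory` is only available on Windows with Python 3.8 or newer, so on other platforms the function just reports what it found.

```python
import os


def register_dll_directories(directories):
    """Add each existing directory to the DLL search path; skip missing ones.

    Returns (found, missing). Directories should be absolute paths.
    On platforms without os.add_dll_directory nothing is registered and
    the two lists are only informational.
    """
    found, missing = [], []
    add = getattr(os, "add_dll_directory", None)
    for path in directories:
        if os.path.isdir(path):
            if add is not None:
                add(path)
            found.append(path)
        else:
            missing.append(path)
    return found, missing


if __name__ == "__main__":
    # Placeholder list: fill in the directories identified with dumpbin.
    found, missing = register_dll_directories([os.getcwd()])
    print("registered or found:", found)
    print("missing:", missing)
```

The call needs to happen before the import that loads nvdiffrast_plugin, which matches placing the statements at the top of the launch script as described above.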

@cyrildiagne

cyrildiagne commented Apr 25, 2024

Has anyone found a fix?
I'm facing the same issue with RTX 3090, and an environment setup using conda (not using Docker):

  • Ubuntu 22.04
  • Pytorch: 1.7.1
  • CUDA: 11
  • I've installed all the apt packages listed in the Dockerfile

I clone this repo and run pip install . without error:

Collecting numpy
  Downloading numpy-1.19.5-cp36-cp36m-manylinux2010_x86_64.whl (14.8 MB)
     |████████████████████████████████| 14.8 MB 21.6 MB/s 
Building wheels for collected packages: nvdiffrast
  Building wheel for nvdiffrast (setup.py) ... done
  Created wheel for nvdiffrast: filename=nvdiffrast-0.3.1-py3-none-any.whl size=137866 sha256=f6736342f9499bcab7d5fd651434608921671a66bc4337bde1096d18bb1a9a78
  Stored in directory: /tmp/pip-ephem-wheel-cache-4j89pp58/wheels/fd/b0/9b/ee78c398f92015d6a02b99f5db6a08c41b1a47c4be7e2e0631
Successfully built nvdiffrast
Installing collected packages: numpy, nvdiffrast
Successfully installed numpy-1.19.5 nvdiffrast-0.3.1

But then if I try to import:

import nvdiffrast.torch as dr
dr.RasterizeCudaContext(device=device)

I get the same issue:

ImportError: No module named 'nvdiffrast_plugin'

EDIT:
The solution for me was to install CUDA on the system (following this guide) rather than using conda's files

@iiiCpu

iiiCpu commented May 24, 2024

Long story short, nvdiffrast_plugin is built against the CUDA version that CUDA_PATH points to, not the first (or the only) one in PATH.
So, delete the cached nvdiffrast_plugin and set the correct CUDA_PATH before running.

rmdir /S %userprofile%\AppData\Local\torch_extensions\torch_extensions\Cache\py310_cu121\nvdiffrast_plugin
set CUDA_PATH=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\
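
The cache cleanup can also be scripted in a platform-independent way. The sketch below takes the cache root as an argument instead of guessing it; paths seen in this thread are `%userprofile%\AppData\Local\torch_extensions\torch_extensions\Cache` on Windows and `~/.cache/torch_extensions` on Linux. The helper names are hypothetical.

```python
import shutil
from pathlib import Path


def find_cached_plugin_dirs(cache_root, plugin_name="nvdiffrast_plugin"):
    """List cached build directories for the plugin under `cache_root`.

    Looks directly under the root and one level down, since the paths in
    this thread include a per-version folder such as py310_cu121.
    """
    root = Path(cache_root)
    candidates = [root / plugin_name] + sorted(root.glob(f"*/{plugin_name}"))
    return [p for p in candidates if p.is_dir()]


def remove_cached_plugin(cache_root, plugin_name="nvdiffrast_plugin"):
    """Delete every cached build directory found; return what was removed."""
    removed = find_cached_plugin_dirs(cache_root, plugin_name)
    for path in removed:
        shutil.rmtree(path)
    return removed


if __name__ == "__main__":
    # Dry run against an assumed Linux-style cache location: only lists.
    for path in find_cached_plugin_dirs(Path.home() / ".cache" / "torch_extensions"):
        print(path)
```

After removing the cached build, the plugin is recompiled on the next run, so CUDA_PATH should be set correctly before that run.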

@iiiCpu

iiiCpu commented May 25, 2024

@s-laine @nurpax @jannehellsten I think this should be mentioned in the official documentation as one of the pre-installation steps. As many ML practitioners (you included) have different CUDA versions for different projects, guessing where things went wrong can be tricky.

@Linxmotion

I have the same problem, also with a 3090. Looking at the comments, most people have the problem with a 3090.

@Picaloer

Picaloer commented Sep 2, 2024

I successfully solved the problem. Here is my full solution with reference links:
Problem: ImportError: DLL load failed while importing nvdiffrast_plugin: The specified module could not be found.
Solution: Manually run ninja for nvdiffrast_plugin

  1. ImportError: DLL load failed while importing nvdiffrast_plugin: The specified module could not be found.
    [https://github.com/NVlabs/nvdiffrast/issues/46#issuecomment-974756618]
    cd C:\Users\*\AppData\Local\torch_extensions\torch_extensions\Cache\nvdiffrast_plugin
    ninja
    
  2. fatal error C1189: #error: -- unsupported Microsoft Visual Studio version! Only the versions between 2017 and 2022 (inclusive) are supported!
    [https://blog.csdn.net/HaoZiHuang/article/details/125795675]
    a. Install VS2019
    b. add cl.exe to the environment variables
    
  3. LINK : fatal error LNK1104: cannot open file 'msvcprt.lib' or other .lib files not found
    [https://www.reddit.com/r/comfyui/comments/1bf7tv1/comment/kv1dq5f/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button]
     a. Manually add the path to msvcprt.lib
     b. Run ninja again, if other .lib files are still not found, continue to add their paths
     c. The final ldflags variable in the build.ninja file:
     ldflags = /DLL c10.lib c10_cuda.lib torch_cpu.lib torch_cuda.lib -INCLUDE:?warp_size@cuda@at@@YAHXZ torch.lib /LIBPATH:C:\Users\Picaloe\.conda\envs\instantmesh\lib\site-packages\torch\lib torch_python.lib /LIBPATH:C:\Users\Picaloe\.conda\envs\instantmesh\libs "/LIBPATH:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\lib\x64" cudart.lib "/LIBPATH:D:\Softwares\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.29.30133\lib\x64" msvcprt.lib "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\Lib\10.0.22621.0\um\x64" kerbcli.lib "/LIBPATH:C:\Program Files (x86)\Windows Kits\10\Lib\10.0.22621.0\ucrt\x64" ucrt.lib "/LIBPATH:C:\Users\Picaloe\AppData\Local\torch_extensions\torch_extensions\Cache\py310_cu121\nvdiffrast_plugin" nvdiffrast_plugin.exp "/LIBPATH:C:\Users\Picaloe\AppData\Local\torch_extensions\torch_extensions\Cache\py310_cu121\nvdiffrast_plugin" nvdiffrast_plugin.lib
    

Solved!

@opentld

opentld commented Sep 19, 2024

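I solved this problem:
1. First, the paths to cl.exe and ninja must be correct so that both can be found.
2. In your anaconda3\envs\instantmesh\Lib\site-packages\torch\utils\cpp_extension.py, change line 2065 to command = ['ninja', '--version'] to work around the ninja -v error.
3. In your anaconda3\envs\instantmesh\Lib\site-packages\nvdiffrast\torch\ops.py, change line 118 to verbose=True.
4. Run python app.py. It will most likely still report that nvdiffrast_plugin cannot be found. At that point, go to the C:\Users****\AppData\Local\torch_extensions\torch_extensions\Cache\py310_cu121\nvdiffrast_plugin directory and run ninja manually to generate nvdiffrast_plugin.

Note: I had one more issue. After installing CUDA 12.1.0, the CUDA_PATH environment variable was not created automatically in the system environment variables, so it has to be pointed at the CUDA 12.1.0 directory manually. This is also why ninja was using conda's nvcc, which made the ninja build fail!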
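
To see whether more than one nvcc is reachable (for example one from conda and one from a system CUDA install), every match along PATH can be listed in lookup order. This is a minimal standard-library sketch; `find_all_on_path` is a hypothetical helper. The first hit is what a bare `nvcc` command would run, while the build itself may be steered by CUDA_HOME or CUDA_PATH instead, as discussed in this thread.

```python
import os


def find_all_on_path(names, path=None, sep=os.pathsep):
    """Return every file matching one of `names` along PATH, in lookup order."""
    path = os.environ.get("PATH", "") if path is None else path
    hits = []
    for directory in path.split(sep):
        if not directory:
            continue
        for name in names:
            candidate = os.path.join(directory, name)
            if os.path.isfile(candidate) and candidate not in hits:
                hits.append(candidate)
    return hits


if __name__ == "__main__":
    for hit in find_all_on_path(("nvcc", "nvcc.exe")):
        print(hit)
```

More than one line of output suggests that several CUDA toolkits are installed, which is the situation in which the wrong one can end up being used.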

@zbclovehj

zbclovehj commented Oct 28, 2024

When I was compiling nvdiffrast, the generated files in the torch_extensions directory were incomplete, and the following error was reported:
Traceback (most recent call last):
  File "/home/data2/chuan/.conda/envs/DiffTex/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/data2/chuan/.conda/envs/DiffTex/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/data2/chuan/DiffTex/scripts/run_texture.py", line 24, in <module>
    main()
  File "/home/data2/chuan/.conda/envs/DiffTex/lib/python3.10/site-packages/pyrallis/argparsing.py", line 158, in wrapper_inner
    response = fn(cfg, *args, **kwargs)
  File "/home/data2/chuan/DiffTex/scripts/run_texture.py", line 19, in main
    trainer = DiffTex(cfg)
  File "/home/data2/chuan/DiffTex/src/training/trainer.py", line 63, in __init__
    self.mesh_model = self.init_mesh_model()
  File "/home/data2/chuan/DiffTex/src/training/trainer.py", line 75, in init_mesh_model
    model = FacadeMeshModel(self.cfg.guide, self.dataset, device=self.device, exp_pth=self.exp_path)
  File "/home/data2/chuan/DiffTex/src/models/facade_mesh.py", line 35, in __init__
    self.renderer = Renderer(self.device, rast_context=self.guide_opt.rast_context)
  File "/home/data2/chuan/DiffTex/src/models/render.py", line 12, in __init__
    self.glctx = dr.RasterizeCudaContext(device)
  File "/home/data2/chuan/.conda/envs/DiffTex/lib/python3.10/site-packages/nvdiffrast/torch/ops.py", line 184, in __init__
    self.cpp_wrapper = _get_plugin().RasterizeCRStateWrapper(cuda_device_idx)
  File "/home/data2/chuan/.conda/envs/DiffTex/lib/python3.10/site-packages/nvdiffrast/torch/ops.py", line 125, in _get_plugin
    torch.utils.cpp_extension.load(name=plugin_name, sources=source_paths, extra_cflags=common_opts+cc_opts, extra_cuda_cflags=common_opts+['-lineinfo'], extra_ldflags=ldflags, with_cuda=True, verbose=False)
  File "/home/data2/chuan/.conda/envs/DiffTex/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1284, in load
    return _jit_compile(
  File "/home/data2/chuan/.conda/envs/DiffTex/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1535, in _jit_compile
    return _import_module_from_library(name, build_directory, is_python_module)
  File "/home/data2/chuan/.conda/envs/DiffTex/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1929, in _import_module_from_library
    module = importlib.util.module_from_spec(spec)
  File "<frozen importlib._bootstrap>", line 571, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 1176, in create_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
ImportError: /home/data2/chuan/.cache/torch_extensions/py310_cu118/nvdiffrast_plugin/nvdiffrast_plugin.so: cannot open shared object file: No such file or directory

@FurkanGozukara

@zbclovehj I got the same issue. Did you fix it? #207

@TonyDua

TonyDua commented Nov 6, 2024

After reviewing the solutions in the comments above and troubleshooting the issue myself, I found that the problem arose from having multiple CUDA versions on Windows. Here is a summary of the solution, which I hope will help you.

nvdiffrast_plugin: Module Not Found

Issue Description:

When using the nvdiffrast plugin, the following error occurs:

ImportError: DLL load failed while importing nvdiffrast_plugin: The specified module could not be found.
or
ImportError: No module named 'nvdiffrast_plugin'

Issue Troubleshooting:

  1. Enable Detailed Compilation Log Output

    Modify the ops.py file of the nvdiffrast package in your Conda environment (e.g., C:\ProgramData\anaconda3\envs\*\Lib\site-packages\nvdiffrast\torch\ops.py, where * is your environment name). In line 125, change the torch.utils.cpp_extension.load function call as follows:

    torch.utils.cpp_extension.load(name=plugin_name, sources=source_paths, extra_cflags=common_opts+cc_opts, extra_cuda_cflags=common_opts+['-lineinfo'], extra_ldflags=ldflags, with_cuda=True, verbose=False)

    # Change verbose to True
    torch.utils.cpp_extension.load(name=plugin_name, sources=source_paths, extra_cflags=common_opts+cc_opts, extra_cuda_cflags=common_opts+['-lineinfo'], extra_ldflags=ldflags, with_cuda=True, verbose=True)

  2. Run Example and Check Detailed Compilation Logs

Run the following code to compile the nvdiffrast_plugin and output detailed logs:

python .\nvdiffrast\samples\torch\triangle.py --cuda

Observe the compilation log, especially the paths of the CUDA header files, such as:

Note: Including file:      C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\include\cuda_runtime_api.h
Note: Including file:       C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\include\crt/host_defines.h
Note: Including file:       C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\include\builtin_types.h
......

You might notice that the CUDA version referenced in the header files is different from the version reported by nvcc -V, for example:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Feb_27_16:28:36_Pacific_Standard_Time_2024
Cuda compilation tools, release 12.4, V12.4.99
Build cuda_12.4.r12.4/compiler.33961263_0

  3. Check System Environment Variable CUDA_HOME

    Check whether the CUDA_HOME environment variable points to the currently installed CUDA version (for example, v12.4; here I'm using Windows PowerShell):

echo $env:CUDA_HOME
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8

Root Cause of the Issue:

Let's summarize the findings:

  1. The plugin compilation log shows that the referenced CUDA header files are from version 11.8, while nvcc -V reports version 12.4.
  2. The files generated after plugin compilation are based on CUDA 12.4.
  3. The CUDA_HOME environment variable in the terminal points to version 11.8.

Thus, it appears that the plugin was compiled with the incorrect version of CUDA headers.
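
The comparison described in the findings above can be automated on a saved build log. The sketch below is a rough illustration: it assumes the default Windows install layout seen in this thread (a `CUDA\vX.Y` folder in each include path) and the `release X.Y` wording of the `nvcc -V` output shown earlier; the function names are hypothetical.

```python
import re
from collections import Counter

# Matches the version folder in paths such as ...\CUDA\v11.8\include\...
_CUDA_DIR = re.compile(r"CUDA[\\/]v(\d+\.\d+)", re.IGNORECASE)
# Matches the release number in `nvcc -V` output, e.g. "release 12.4,".
_NVCC_RELEASE = re.compile(r"release (\d+\.\d+)")


def cuda_versions_in_log(log_text):
    """Count CUDA toolkit versions referenced by paths in a build log."""
    return Counter(_CUDA_DIR.findall(log_text))


def nvcc_release(nvcc_output):
    """Extract the release number from `nvcc -V` output, or None."""
    match = _NVCC_RELEASE.search(nvcc_output)
    return match.group(1) if match else None


def find_mismatch(log_text, nvcc_output):
    """Return versions seen in the log that differ from the nvcc release."""
    release = nvcc_release(nvcc_output)
    return sorted(v for v in cuda_versions_in_log(log_text) if v != release)
```

A non-empty result from `find_mismatch` corresponds to the situation summarized above, where the headers come from one CUDA version and nvcc reports another.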


Solution:

  1. Clear Old Cache Files
    Go to C:\Users\YourUsername\AppData\Local\torch_extensions\torch_extensions\Cache, find the folder named pyXX_cuXXX (e.g., py311_cu124), and delete the nvdiffrast_plugin directory. This folder may contain old compilation caches.

    Tip: If you run ninja inside this folder, it will tell you that there is no work to do because the build has already been compiled:

cd C:\Users\*\AppData\Local\torch_extensions\torch_extensions\Cache\py311_cu124\nvdiffrast_plugin
ninja
# ninja: no work to do.

  2. Modify System Environment Variable

    You can modify the system environment variable directly or just adjust the environment variable for the current terminal session.
    Note: In some IDEs (like PyCharm), the environment variable changes may not take effect immediately, so it is recommended to verify using the system terminal.

image-20241106115540105

  3. Run Example Again

    After clearing the cache and updating the CUDA_HOME environment variable, run the following example to check the compilation logs again:

python ./nvdiffrast/samples/torch/triangle.py --cuda

Observe the compilation logs and ensure that the CUDA header files are now being correctly referenced:

Note: Including file:        C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cuda_fp16.hpp
Note: Including file:      C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cublas_v2.h
Note: Including file:       C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\include\cublas_api.h

If the compilation is successful, you should see something like:

[16/16] "C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.29.30133\bin\Hostx64\x64/link.exe" Buffer.o CudaRaster.o RasterImpl.cuda.o RasterImpl.o common.o rasterize.cuda.o interpolate.cuda.o texture.cuda.o texture.o antialias.cuda.o torch_bindings.o torch_rasterize.o torch_interpolate.o torch_texture.o torch_antialias.o /nologo /DLL c10.lib c10_cuda.lib torch_cpu.lib torch_cuda.lib -INCLUDE:?warp_size@cuda@at@@YAHXZ torch.lib /LIBPATH:C:\ProgramData\anaconda3\envs\tddfav3\Lib\site-packages\torch\lib torch_python.lib /LIBPATH:C:\ProgramData\anaconda3\envs\tddfav3\libs "/LIBPATH:C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4\lib\x64" cudart.lib /out:nvdiffrast_plugin.pyd
  Creating library nvdiffrast_plugin.lib and object nvdiffrast_plugin.exp
Loading extension module nvdiffrast_plugin...
Saving to 'tri.png'.

image-20241106120340107

You’re done! The issue should be resolved.

@FurkanGozukara

@TonyDua excellent write-up, purely amazing.

Now I have a few questions for you.

I have the CUDA path set and nvcc --version shows it, but echoing the CUDA path did not show anything.

How do we set it so that echo shows the CUDA path, or should we set it at all?

[image]

And what is the difference between the CUDA path and the system variables?

So does this main issue come from the CUDA path and system variable path setup?

@ollyestn

ollyestn commented Nov 8, 2024

It is a conundrum of the century
