Issue for the new NGC images #40
Comments
Which PyTorch and torch-ccl versions are you using?
It seems that your codebase is older than the 1.13.0 tag. PyTorch changed the c10d distributed path in pytorch/pytorch#85780, so you have two choices to fix this issue:
Thanks for your reply! The first option solved my problem. The second option didn't work, but that is not the fault of torch-ccl or PyTorch: the precompiled PyTorch shipped in the NGC image no longer contains the C++ header files, so I had to recompile PyTorch for torch-ccl to compile correctly.
Hi! Recently I was looking at the NGC images site and noticed
It seems that NGC images will no longer provide the conda environment, and the PyTorch-related files will be moved to the Python environment. When I `docker run` one of the new images, such as nvcr.io/nvidia/pytorch:22.11-py3, I found no c10d-related header files in the Python environment under /usr/local/lib/python3.8/dist-packages/torch/include. But ProcessCCL.hpp needs the header file <torch/csrc/distributed/c10d/Utils.hpp>.
So how do we solve this problem so that we can use torch-ccl in the latest ngc image?
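For anyone who wants to reproduce the check, here is a minimal sketch of how the missing header could be detected from Python. It assumes the headers, when present, live in an `include` directory next to the installed `torch` package (as in the path above). The helper names `has_header` and `torch_include_dir` are made up for this example and are not part of torch or torch-ccl.

```python
from pathlib import Path

# Header that torch-ccl includes, as quoted in this issue.
C10D_HEADER = "torch/csrc/distributed/c10d/Utils.hpp"


def has_header(include_dir, header=C10D_HEADER):
    """Return True if `header` exists as a file under `include_dir`."""
    return (Path(include_dir) / header).is_file()


def torch_include_dir():
    """Return the include directory next to the installed torch package,
    or None if torch cannot be imported. Assumes the usual package layout."""
    try:
        import torch
    except ImportError:
        return None
    return Path(torch.__file__).parent / "include"


if __name__ == "__main__":
    inc = torch_include_dir()
    if inc is None:
        print("torch is not installed in this environment")
    elif has_header(inc):
        print(f"found {C10D_HEADER} under {inc}")
    else:
        print(f"missing {C10D_HEADER} under {inc}")
```

Running this inside the container should report whether the header torch-ccl needs is available before a build is attempted.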