[Ray Compiled Graph] NCCL Internal Error #49827
Labels
bug
Something that is supposed to be working; but isn't
compiled-graphs
core
Issues that should be addressed in Ray Core
P1
Issue that should be fixed within a few weeks
What happened + What you expected to happen
I have installed the latest Ray, CUDA 12.6, and the latest NCCL, and I am trying to run Compiled Graph following the instructions from Anyscale, but I am getting an NCCL internal error.
I have two servers in a Ray cluster, with a 3090 and a 3080 Ti.
I have reinstalled the environment, the CUDA Toolkit, CUDA, and NCCL several times, and it does not help.
Versions / Dependencies
ray 2.40.0
cupy-cuda12x 13.3.0
Reproduction script
Start the Ray head node:
ray start --head --node-ip-address=192.168.1.166 --port=6379 --dashboard-port=8265 --dashboard-host=0.0.0.0 --num-gpus=1
Connect the other node to the head:
ray start --address='192.168.1.166:6379' --num-gpus=1
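For anyone reproducing this, the two commands above can be generated from a single place so the head address and port stay consistent between nodes. This is only a minimal sketch: `head_command` and `worker_command` are hypothetical helper names, not part of Ray, and the only flags used are the ones shown in the commands above.

```python
import shlex

# Values taken from the commands in this issue.
HEAD_IP = "192.168.1.166"
GCS_PORT = 6379


def head_command(ip=HEAD_IP, port=GCS_PORT, num_gpus=1):
    """Return the argv list for starting the Ray head node (hypothetical helper)."""
    return [
        "ray", "start", "--head",
        f"--node-ip-address={ip}",
        f"--port={port}",
        "--dashboard-port=8265",
        "--dashboard-host=0.0.0.0",
        f"--num-gpus={num_gpus}",
    ]


def worker_command(ip=HEAD_IP, port=GCS_PORT, num_gpus=1):
    """Return the argv list for joining a node to the head (hypothetical helper)."""
    return ["ray", "start", f"--address={ip}:{port}", f"--num-gpus={num_gpus}"]


if __name__ == "__main__":
    # Print shell-ready command lines; run the first on the head node
    # and the second on the other node.
    print(shlex.join(head_command()))
    print(shlex.join(worker_command()))
```

The argv lists can also be passed directly to `subprocess.run` on each machine, which avoids shell quoting altogether.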
Issue Severity
None