[QST]: Building cugraph for running SSSP inside cpp/src/traversal #4269

Open
2 tasks done
mrprajesh opened this issue Mar 22, 2024 · 10 comments

@mrprajesh

mrprajesh commented Mar 22, 2024

What is your question?

We are interested in running the C++ single-GPU version of SSSP as a baseline for comparison in our paper, so I tried building cugraph by following the build instructions:

git clone git@github.com:rapidsai/cugraph.git
cd cugraph
 ./build.sh clean
 ./build.sh libcugraph
<snip>
CMake Error at /home/rajesh/install/cmake-3.28.3-linux-x86_64/share/cmake-3.28/Modules/FetchContent.cmake:1679 (message):
  Build step for cugraph-ops failed: 1

I understand that cugraph-ops is closed source, so I also tried from a conda env that had libcugraphops installed; however, that gave a different error about the NCCL INCLUDE_DIR variables. Could you please clarify the following?

  1. Is the cpp version usable or buildable at v24.x? or do we have support only for py version?
  2. Can we build cugraph from source via these steps?
  3. Can we run sssp_sg.cu version after installing RAPIDS nightly via conda installation?
  4. Are we on the right lines? Could you please suggest a solution for our objective? Thanks a lot in advance.

Our machine config.

  • Ubuntu 22.04 based LM
  • CUDA 12.2
  • RTX 3060 Notebook
  • Nvidia driver 535.104.05

Code of Conduct

  • I agree to follow cuGraph's Code of Conduct
  • I have searched the open issues and have found no duplicates for this question
@mrprajesh mrprajesh added the question Further information is requested label Mar 22, 2024
@ChuckHastings
Collaborator

You can add the --without_cugraphops option to the build command to skip the cugraph-ops dependency. That will cause some of the sampling algorithms (which rely on some closed-source cugraph-ops features) to fail, but everything else (including SSSP) will function properly.

So you can try:

./build.sh clean
./build.sh libcugraph --without_cugraphops

and that should do what you want.

@ChuckHastings ChuckHastings self-assigned this Mar 25, 2024
@mrprajesh mrprajesh changed the title [QST]: Building cugraph for running SSSP insire cpp/src/traversal [QST]: Building cugraph for running SSSP inside cpp/src/traversal Apr 12, 2024
@mrprajesh
Author

Thanks @ChuckHastings,
After installing NCCL, I was able to move past the NCCL error. However, my Chrome/Cinnamon/laptop nearly crashed while spitting out more errors (below) during the build.

git clone -b v24.04.00 https://github.com/rapidsai/cugraph.git
cd cugraph/
./build.sh clean
./build.sh libcugraph --without_cugraphops
 
#NCCL Error 
CMake Error at /home/rajesh/install/cmake-3.28.3-linux-x86_64/share/cmake-3.28/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
Could NOT find NCCL (missing: NCCL_LIBRARY NCCL_INCLUDE_DIR)


wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt update
sudo apt install libnccl2 libnccl-dev
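
Before rerunning the build, it can help to confirm that the header and library CMake reported as missing (NCCL_INCLUDE_DIR, NCCL_LIBRARY) are actually present. The following is a minimal sketch and not part of cugraph: find_nccl is a hypothetical helper, and the prefixes it is given are assumptions about where an apt or conda install might place the files.

```shell
# Hypothetical helper: print the first prefix that holds both nccl.h and a libnccl.so*.
# Arguments: one or more candidate install prefixes.
find_nccl() {
  for prefix in "$@"; do
    # Header check: <prefix>/include/nccl.h
    if [ -f "$prefix/include/nccl.h" ] &&
       # Library check: look up to two levels below lib/ and lib64/
       # (covers multiarch layouts such as lib/x86_64-linux-gnu).
       find "$prefix/lib" "$prefix/lib64" -maxdepth 2 -name 'libnccl.so*' 2>/dev/null | grep -q .; then
      echo "$prefix"
      return 0
    fi
  done
  return 1
}

# Assumed candidate prefixes; adjust for the machine at hand.
find_nccl /usr /usr/local "${CONDA_PREFIX:-/nonexistent}" ||
  echo "NCCL not found in the searched prefixes"
```

If a prefix is printed, the NCCL files exist there and the remaining question is whether CMake searches that location.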

./build.sh clean
./build.sh libcugraph --without_cugraphops

[1/632] Building CUDA object CMakeFiles/cugraph.dir/src/community/detail/refine_mg.cu.o
FAILED: CMakeFiles/cugraph.dir/src/community/detail/refine_mg.cu.o 
/usr/local/cuda-12.2/bin/nvcc -forward-unknown-to-host-compiler -DCUDA_API_PER_THREAD_DEFAULT_STREAM -DCUTLASS_NAMESPACE=raft_cutlass -DFMT_HEADER_ONLY=1 -DLIBCUDACXX_ENABLE_EXPERIMENTAL_MEMORY_RESOURCE -DRAFT_COMPILED -DRAFT_SYSTEM_LITTLE_ENDIAN=1 -DSPDLOG_FMT_EXTERNAL -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_CUDA -DTHRUST_DISABLE_ABI_NAMESPACE -DTHRUST_HOST_SYSTEM=THRUST_HOST_SYSTEM_CPP -DTHRUST_IGNORE_ABI_NAMESPACE_ERROR -Dcugraph_EXPORTS -I/home/rajesh/temp/cugraph/cpp/../thirdparty -I/home/rajesh/temp/cugraph/cpp/src -I/home/rajesh/temp/cugraph/cpp/include -I/home/rajesh/temp/cugraph/cpp/build/_deps/rmm-src/include -I/home/rajesh/temp/cugraph/cpp/build/_deps/cccl-src/thrust/thrust/cmake/../.. -I/home/rajesh/temp/cugraph/cpp/build/_deps/cccl-src/libcudacxx/lib/cmake/libcudacxx/../../../include -I/home/rajesh/temp/cugraph/cpp/build/_deps/cccl-src/cub/cub/cmake/../.. -I/home/rajesh/temp/cugraph/cpp/build/_deps/fmt-src/include -I/home/rajesh/temp/cugraph/cpp/build/_deps/spdlog-src/include -I/home/rajesh/temp/cugraph/cpp/build/_deps/raft-src/cpp/include -I/home/rajesh/temp/cugraph/cpp/build/_deps/cuco-src/include -I/home/rajesh/temp/cugraph/cpp/build/_deps/nvidiacutlass-src/include -I/home/rajesh/temp/cugraph/cpp/build/_deps/nvidiacutlass-build/include -I/usr/local/cuda-12.2/include -isystem /usr/local/cuda-12.2/targets/x86_64-linux/include -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_86,code=[sm_86]" -Xcompiler=-fPIC --expt-extended-lambda --expt-relaxed-constexpr -Werror=cross-execution-space-call -Wno-deprecated-declarations -Xptxas=--disable-warnings -Xcompiler=-Wall,-Wno-error=sign-compare,-Wno-error=unused-but-set-variable -Xfatbin=-compress-all -DNO_CUGRAPH_OPS -MD -MT CMakeFiles/cugraph.dir/src/community/detail/refine_mg.cu.o -MF CMakeFiles/cugraph.dir/src/community/detail/refine_mg.cu.o.d -x cu -c /home/rajesh/temp/cugraph/cpp/src/community/detail/refine_mg.cu -o CMakeFiles/cugraph.dir/src/community/detail/refine_mg.cu.o
Killed
[2/632] Building CUDA object CMakeFiles/cugraph.dir/src/community/detail/refine_sg.cu.o
FAILED:

Thank you for your patience and your assistance.
Kind regards,
Rajesh

@ChuckHastings
Collaborator

Sorry, I didn't fully read your original input. Let me answer these questions first, then I'll answer your most recent question.

I understand that cugraph-ops is closed source, so I also tried from a conda env that had libcugraphops installed; however, that gave a different error about the NCCL INCLUDE_DIR variables. Could you please clarify the following?

  1. Is the cpp version usable or buildable at v24.x? or do we have support only for py version?

Yes, each branch should be usable/buildable (cpp or python). 24.02 and 24.04 are released branches and should work fine. 24.06 is the latest code and subject to change; however, based on our development/CI process, our latest branch should also be buildable unless one of our dependencies has changed and we haven't updated to reflect that change yet.

  2. Can we build cugraph from source via these steps?

Yes, I skipped to this detail of your question in my first answer.

  3. Can we run sssp_sg.cu version after installing RAPIDS nightly via conda installation?

If you are only interested in calling the functions as is and are on a supported architecture, you could install the conda packages. If you install the conda packages, your environment should contain the necessary headers and libraries already compiled for your environment, and you wouldn't need to build from source. I would certainly recommend this: building libcugraph takes a bit of time, and unless you're on a system that we don't build for (e.g. using an older GCC or a Pascal or older GPU), there's not much benefit in building the code yourself.
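
To check whether a conda environment already provides what is needed to compile against libcugraph, something like the following could be used. This is only a sketch: has_libcugraph is a hypothetical helper, and it assumes the packages place cugraph/algorithms.hpp under include/ and libcugraph.so under lib/ of the environment prefix.

```shell
# Hypothetical helper: succeed if a prefix holds the cugraph C++ header and library.
# $1 = environment prefix (for example, the value of CONDA_PREFIX).
has_libcugraph() {
  [ -f "$1/include/cugraph/algorithms.hpp" ] && [ -e "$1/lib/libcugraph.so" ]
}

# Check the active conda environment, if there is one.
if has_libcugraph "${CONDA_PREFIX:-/nonexistent}"; then
  echo "libcugraph headers and library found"
else
  echo "libcugraph not found in this environment"
fi
```

If the check passes, your own code can be compiled against that prefix without building libcugraph from source.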

@ChuckHastings
Collaborator

There's not enough information in your error message for me to suggest what's going wrong. I see the Killed message in your output. If I had to guess (pure speculation on my part), you may have run out of memory.

We have seen issues where some of our .cu files require a large amount of host memory for the compiler to run. It's possible that your notebook computer doesn't have sufficient memory to complete compilation. That would be even more motivation to use the pre-built versions.
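
If host memory is the limit, one thing to try is lowering the number of parallel compile jobs so that fewer compiler processes run at once. The sketch below rests on several assumptions: safe_jobs is a hypothetical helper, the 4 GiB-per-job figure is a placeholder rather than a measured value, and build.sh is assumed to read the PARALLEL_LEVEL environment variable.

```shell
# Hypothetical helper: choose a job count that fits in host memory.
# $1 = available memory in GiB
# $2 = assumed peak memory per compiler job in GiB (a guess, not a measurement)
# $3 = number of CPU cores
safe_jobs() {
  jobs_fit=$(( $1 / $2 ))
  # Always allow at least one job.
  if [ "$jobs_fit" -lt 1 ]; then jobs_fit=1; fi
  # Never exceed the core count.
  if [ "$jobs_fit" -gt "$3" ]; then jobs_fit=$3; fi
  echo "$jobs_fit"
}

# Example: 16 GiB available, 4 GiB assumed per job, 12 cores.
safe_jobs 16 4 12   # prints 4
```

The result could then be passed to the build, for example as PARALLEL_LEVEL=$(safe_jobs 16 4 12) ./build.sh libcugraph --without_cugraphops. A single translation unit can still exceed available memory on its own, in which case reducing parallelism will not help.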

@mrprajesh
Author

Sorry, I didn't fully read your original input,

Sure, No worries. Thank you for your replies.

you may have run out of memory.

Ah, I see.

We have seen issues where some of our .cu files require a large amount of host memory for the compiler to run. It's possible that your notebook computer doesn't have sufficient memory to complete compilation.

OMG! Thanks.

That would be even more motivation to use the pre-built versions.

Sure. I'll attempt this.

I see there is a lot of development happening in this complex repo and its integrations, partly due to nx-cugraph.
All I wished for was to run this BFS example at https://github.com/rapidsai/cugraph/blob/branch-24.06/cpp/examples/users/single_gpu_application/sg_graph_algorithms.cpp
It looked very much like gunrock's style of programming, so I got interested in checking it out and learning it.

@ChuckHastings
Collaborator

I think you should be able to build those examples from a conda install of the software. Please let us know if you have any issues; the C++ examples are a new feature we just added in the 24.04 release. Any feedback on making them easier to use would be wonderful.

@ChuckHastings
Collaborator

Any luck on either running from conda installation or building things on a system with more memory?

@mrprajesh
Author

Any luck on either running from conda installation or building things on a system with more memory?

Unfortunately, on a system with more memory, we encountered NCCL errors (fixing them requires compiling NCCL from source or using sudo).
We tried using the conda-installed version (back then, before the 24.04 release) but encountered similar roadblocks. I'll have to check again with the release version.

Any feedback on making them easier to use would be wonderful.

It would be nice to have a lite build system, for example one that separates single-GPU code from multi-GPU code, i.e. one that needs only a minimal set of -I include paths rather than building the whole of cugraph.

On build from source

It would be nice if the prerequisites section listed NCCL, cugraph-ops, etc.

Thank you for all your help and patience. Kind regards,

@ChuckHastings
Collaborator

A thought to try.

We have segregated the SG and MG implementations for many of the algorithms into separate source files. The implementation is generally in a common header file, but the instantiation of the actual functions occurs in separate source files. While we don't have an easy way to skip building the MG code, you could try going into CMakeLists.txt and commenting out the compilation of all of the source files that have an _mg suffix (e.g. src/community/louvain_mg.cu). You'd have to also do that in the tests/CMakeLists.txt.

That might work, or if you combine that with commenting out the references to NCCL in the two CMakeLists.txt files you might get a functioning build.
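
The commenting-out step could be scripted along these lines. This is a rough sketch, not a tested recipe for the cugraph tree: comment_out_mg is a hypothetical helper, the variable name CUGRAPH_SOURCES in the demo is illustrative, and it assumes each source file sits on its own line (a line that also carries the closing parenthesis would break if commented out).

```shell
# Hypothetical helper: comment out every line that names an _mg.cu source file.
# $1 = path to a CMakeLists.txt; the original is kept as <path>.bak.
comment_out_mg() {
  # Keep the indentation, skip lines that are already comments,
  # and prefix the rest of the line with "# ".
  sed -i.bak -E 's|^([[:space:]]*)([^#[:space:]].*_mg\.cu.*)$|\1# \2|' "$1"
}

# Demo on a throwaway file rather than the real CMakeLists.txt.
demo=$(mktemp)
printf '%s\n' \
  'set(CUGRAPH_SOURCES' \
  '    src/community/detail/refine_sg.cu' \
  '    src/community/detail/refine_mg.cu' \
  ')' > "$demo"
comment_out_mg "$demo"
cat "$demo"
```

In the demo, only the refine_mg.cu line gains a leading "# "; the _sg line is left as is. Any remaining references to the removed sources or to NCCL would still need to be handled by hand.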

@ChuckHastings
Collaborator

Any luck on this?

If you are using the latest branch (our 24.08 development branch) you will see that we split many of the files into smaller translation units to make the compilation require less memory.
