GGP Build With CMake

GGP and CMake

Starting with version 0.9, GGP must be built using CMake. You need CMake 3.15 or later for GGP version 1.1, and CMake 3.18 or later for the current develop branch. Try

cmake --version

to make sure you have CMake and that your version is recent enough. If you do not have CMake on your system, follow the instructions below; otherwise skip to the Building GGP using CMake section.
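The version check above can also be scripted. The following is a minimal sketch, not part of GGP: the helper name version_ge is made up for illustration, and it assumes a sort that supports -V (as GNU coreutils does).

```shell
# Return success if version $1 is greater than or equal to version $2.
# (version_ge is a hypothetical helper; it relies on "sort -V".)
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

required="3.18"   # use 3.15 instead when building GGP 1.1

if command -v cmake >/dev/null 2>&1; then
    # The first line of "cmake --version" reads "cmake version X.Y.Z".
    found="$(cmake --version | head -n 1 | awk '{print $3}')"
    if version_ge "$found" "$required"; then
        echo "cmake $found is recent enough"
    else
        echo "cmake $found is older than $required"
    fi
else
    echo "cmake not found on PATH"
fi
```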

For multi-GPU builds with OpenMPI, we recommend using at least version 4.0.x, and compiling OpenMPI with a recent version of UCX (at least 1.10) for CUDA-aware MPI support.

Obtaining CMake

You are likely going to build GGP on a remote machine with a module system. Try

module avail cmake

to see if the module system offers CMake. If no CMake module is available, please ask the system administrator to add one. In the meantime, you can download the CMake source code and build it yourself. Once you have gone through the CMake build steps, prepend its bin directory to your PATH so that your environment can find the binaries.
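As an illustration of the PATH step, the sketch below assumes CMake was installed under $HOME/opt/cmake. That prefix and the variable name CMAKE_INSTALL_DIR are placeholders, so adjust them to your own install location.

```shell
# Hypothetical install prefix; change it to wherever you installed CMake.
CMAKE_INSTALL_DIR="$HOME/opt/cmake"

# Prepend the bin directory so this CMake is found before any system copy.
export PATH="$CMAKE_INSTALL_DIR/bin:$PATH"

# Show which cmake the shell now resolves (prints nothing if none is found).
command -v cmake || true
```

Adding the export line to your shell startup file makes the change persistent across logins.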

Building GGP using CMake

It is recommended to build GGP in a separate folder (out-of-source). This has the advantage that you don't need different copies of the GGP source code on your disk to build separate configurations (e.g. for different GPU architectures), nor do you need to trigger a full rebuild in your local GGP copy to build it for a different architecture. For example, suppose you have a machine with two GPU partitions, one with NVIDIA P100 and the other with NVIDIA V100. You can download one copy of the GGP source code (typically named ggp) and then have two build directories (say, build_p100 and build_v100). When the source code is updated or modified, you need only change the source code once, then update each build as required.
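The two-partition example could be set up roughly as follows. This is a sketch under stated assumptions: the source checkout is assumed to sit in ./ggp next to the build directories, configure_build is a made-up helper name, and the architecture values (sm_60 for P100, sm_70 for V100) are passed through the GGP_GPU_ARCH option.

```shell
# Assumed layout: the GGP checkout lives in ./ggp beside the build directories.
SRC_DIR="$PWD/ggp"

# Create one out-of-source build directory and configure it for one architecture.
# (configure_build is a hypothetical helper, not something GGP provides.)
configure_build() {
    build_dir="$1"
    arch="$2"
    mkdir -p "$build_dir"
    if [ -f "$SRC_DIR/CMakeLists.txt" ]; then
        (cd "$build_dir" && cmake "$SRC_DIR" -DGGP_GPU_ARCH="$arch")
    else
        echo "skipping configure of $build_dir: no CMakeLists.txt in $SRC_DIR"
    fi
}

configure_build build_p100 sm_60   # Pascal P100 partition
configure_build build_v100 sm_70   # Volta V100 partition
```

Both build directories refer to the same source tree, so a source update only needs to be followed by a rebuild in each directory.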

cmake vs ccmake

After downloading GGP from GitHub, create a build directory and cd into it (the name is arbitrary; here we use build):

mkdir build
cd build

There are two ways to configure the build. The first is to use ccmake:

ccmake

ccmake ../generic-GPU-project

NOTE: for this to work, you may first need to run

cmake [-DGGP_TARGET_TYPE=<TARGET>] ../generic-GPU-project

and then launch ccmake. This will bring up a text-based GUI for all the GGP CMake options. If you take this route, note that pressing the t key in the GUI will bring up extra CMAKE options. This can seem a little daunting at first, but the majority of the options you see here are automatically populated. Options are grouped into two main parts, CMAKE options (revealed by pressing t) and GGP options, each prefixed accordingly. CMAKE options are more to do with HOW to build GGP, and GGP options are more to do with WHAT parts to build.

The CMAKE options CMAKE_CUDA_HOST_COMPILER, CMAKE_CXX_COMPILER and CMAKE_C_COMPILER dictate which host C++ and C compilers to use. If you want to use a specific compiler, you must set these manually.

After changing the options to your preferences, press c to configure. This forces CMake to find further tools and libraries (for example, it locates MPI if you build with MPI). New variables may appear at this point, and you may need to configure multiple times. As soon as the Press [g] to generate and exit option is shown at the bottom of the screen, you may use it and CMake will generate your configuration.

cmake

If using the text GUI is not to your liking, then you can configure GGP directly using cmake. For example,

cmake ../generic-GPU-project -DGGP_MPI=ON
cmake .

This will configure GGP with the default options, except that GGP_MPI will be turned ON. Make sure you specify the correct architecture for your GPUs in the first configuration step. The default architecture is sm_70, but you may want to specify a different one, such as -DGGP_GPU_ARCH=sm_60 for a Pascal GPU or -DGGP_GPU_ARCH=sm_80 for an A100. The second command, cmake . with no other arguments, is often required to ensure that all configuration is completed. Without this second step, some configuration may not be complete (this is equivalent to ccmake requiring multiple configuration passes).
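To make the architecture choice concrete, here is a small sketch that maps the GPU models mentioned on this page to a GGP_GPU_ARCH value and prints the two configuration commands. The function arch_for_gpu is hypothetical and covers only P100, V100 and A100; it prints the commands rather than running them.

```shell
# Map a GPU model name to the value passed to -DGGP_GPU_ARCH.
# (arch_for_gpu is a made-up helper covering only the GPUs named on this page.)
arch_for_gpu() {
    case "$1" in
        P100) echo sm_60 ;;
        V100) echo sm_70 ;;
        A100) echo sm_80 ;;
        *)    echo "unknown GPU: $1" >&2; return 1 ;;
    esac
}

arch="$(arch_for_gpu A100)"

# Print the two configuration passes: the first with options, the second bare.
echo "cmake ../generic-GPU-project -DGGP_MPI=ON -DGGP_GPU_ARCH=$arch"
echo "cmake ."
```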

Building

In either case, once GGP has been configured, you can build with

make -j N

where N is the number of available CPU cores, or alternatively just make -j, which oversubscribes the CPU cores. The latter approach typically gives the shortest compile time.
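If you prefer not to hard-code N, the core count can be queried from the system. The sketch below assumes nproc (GNU coreutils) or getconf is available, and it only calls make when the current directory looks like a configured build directory.

```shell
# Count the CPU cores; nproc comes with GNU coreutils, getconf is a fallback.
ncores="$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)"
echo "building with $ncores parallel jobs"

# Only invoke make inside a configured build directory.
if [ -f Makefile ] && [ -f CMakeCache.txt ]; then
    make -j "$ncores"
fi
```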

The following are advanced options that can be specified directly to cmake with -D or can be set using ccmake under advanced options:

GGP_PRECISION=n - where n is a 4-bit number that specifies which precisions to enable (8 - double, 4 - single, 2 - half, 1 - quarter). The default value is 14, which corresponds to double/single/half enabled and quarter disabled.
GGP_FAST_COMPILE_REDUCE=ON** - compiles the reduction kernels with block-size = 32 only, dramatically accelerating compilation of the reduction kernels (reduce_ggp.cu, multi_reduce_ggp.cu, etc.). Additionally, the multi-blas kernels will not employ the warp-shfl optimization. This will affect performance, so it should only be used for fast debugging or development builds; hence the default value is OFF.
GGP_MAX_MULTI_BLAS_N=1 - disables some kernel fusion optimizations for BLAS routines

** - signifies this option is post GGP 1.0
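The GGP_PRECISION value is simply the bitwise OR of the precisions you want. As a short sketch using shell arithmetic (the variable names are illustrative only):

```shell
# Bit values for each precision, as listed in the option description.
DOUBLE=8; SINGLE=4; HALF=2; QUARTER=1

# The default: double, single and half enabled, quarter disabled.
default_mask=$(( DOUBLE | SINGLE | HALF ))
echo "default: -DGGP_PRECISION=$default_mask"

# An example custom choice: double and single only.
custom_mask=$(( DOUBLE | SINGLE ))
echo "double+single: -DGGP_PRECISION=$custom_mask"
```

The first mask reproduces the documented default of 14 (8 + 4 + 2).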

By default, GGP builds as a shared library and takes advantage of rpath to avoid needing to set LD_LIBRARY_PATH in most cases. If, for some reason, you would prefer a static library build you can set GGP_BUILD_SHAREDLIB=OFF. We do not recommend this because it creates a large spike in link time and binary sizes.

Improving build times with Ninja

You can use Ninja instead of make to improve parallel builds by specifying it as the CMake generator in the initial cmake run

cmake -GNinja ...

and then build using

ninja

or just use

cmake --build .
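A script that should work on machines with and without Ninja can pick the generator at configure time. This is only a sketch: it falls back to CMake's "Unix Makefiles" generator when ninja is not on the PATH, and it prints the commands instead of running them.

```shell
# Choose the Ninja generator when ninja is installed, otherwise use make.
if command -v ninja >/dev/null 2>&1; then
    generator="Ninja"
else
    generator="Unix Makefiles"
fi

# Print the configure and build commands for the chosen generator.
echo "cmake -G \"$generator\" ../generic-GPU-project"
echo "cmake --build ."
```

Using cmake --build . for the build step keeps the second command the same whichever generator was chosen.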

Improving link times with Mold

A further reduction of the overall build time can be achieved by using an alternative linker like LLVM's lld or mold. To use mold, you can simply run

mold -run ninja

Building GGP with clang as CUDA compiler

While GGP can be built using clang as the CUDA compiler, this support is still considered early: it might not work for all possible options, and performance may not be as expected!

The development version of GGP now supports building GGP with clang as CUDA compiler. This requires

CMake >= 3.18
Clang >= 10 and a compatible CUDA toolkit (see https://www.llvm.org/docs/CompileCudaWithLLVM.html for details)

To use clang as the CUDA compiler, run the initial cmake call with the options

-DCMAKE_CUDA_COMPILER=clang++ -DCMAKE_CXX_COMPILER=clang++

You might need to specify the full path to clang++ and append a version number. If you need to select a specific CUDA toolkit, or have it installed in an uncommon location, you can do that with

-DCUDAToolkit_ROOT=/some/path

Note: The CUDA Toolkit detection is done by FindCUDAToolkit and its documentation has more details on determining the CUDA Toolkit.
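Putting the options above together, a complete configure line might look like the output of the sketch below. The helper clang_cuda_cmake_cmd and the compiler name clang++-14 are purely illustrative, and /some/path stands in for your CUDA toolkit location; the function only prints the command so that you can inspect it first.

```shell
# Assemble the cmake command for a clang CUDA build and print it.
# $1: clang++ name or full path (optional), $2: CUDA toolkit root (optional).
# (clang_cuda_cmake_cmd is a hypothetical helper, not part of GGP.)
clang_cuda_cmake_cmd() {
    compiler="${1:-clang++}"
    cuda_root="${2:-}"
    cmd="cmake ../generic-GPU-project -DCMAKE_CUDA_COMPILER=$compiler -DCMAKE_CXX_COMPILER=$compiler"
    if [ -n "$cuda_root" ]; then
        cmd="$cmd -DCUDAToolkit_ROOT=$cuda_root"
    fi
    echo "$cmd"
}

# Example with a versioned compiler name and an explicit toolkit location.
clang_cuda_cmake_cmd clang++-14 /some/path
```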
